Articles written in Sadhana

    • Sentiment classification with GST tweet data on LSTM based on polarity-popularity model



      One of the biggest issues for the Indian economy in 2017 was the implementation of the Goods and Services Tax (GST), and social networks witnessed many contrasting and conflicting opinions about this new taxation system. Inspired by such a large-scale tax reform, we developed an experimental approach to analyze public sentiment on Twitter based on popular words either directly or indirectly related to GST. We collected almost 200k tweets solely about GST from June 2017 to December 2017 in two phases. To ensure the relevance of the crawled tweets to GST, we prepared a topic-sentiment relevance model. Furthermore, we employed several state-of-the-art lexicons to identify sentiment words and assigned polarity ratings to each tweet. To extract relevant words that are implicitly linked with GST, we propose a new polarity-popularity framework; these popular words were also rated with sentiments. Next, we trained an LSTM model using both types of rated words to predict sentiment on GST tweets and obtained an overall accuracy of 84.51%. We observed that the performance of the system improved when knowledge of the indirectly related GST words was incorporated during training.
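
The lexicon-based polarity rating mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the `rate_tweet` and `polarity_label` helpers, and the averaging rule are all assumptions made for illustration, and the polarity-popularity framework itself is not reproduced here.

```python
import re

# Hypothetical toy lexicon (assumption); the paper reports using several
# published sentiment lexicons, which are not reproduced here.
TOY_LEXICON = {"good": 1.0, "benefit": 0.5, "confusing": -0.5, "burden": -1.0}


def rate_tweet(text, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in the tweet.

    Tweets without any lexicon word get a neutral rating of 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def polarity_label(score):
    """Map a numeric rating to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["GST is a good benefit", "GST is a burden and confusing"]:
        print(tweet, "->", polarity_label(rate_tweet(tweet)))
```

Ratings of this kind could then serve as training labels for a sequence classifier such as an LSTM, which is the role they play in the abstract.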

    • Scientific Text Entailment and a Textual-Entailment-based framework for cooking domain question answering



      Detecting the entailment relationship between two sentences has profoundly impacted several application areas of Natural Language Processing (NLP). Though recognizing textual entailment (TE) is among the widely studied problems, research on detecting entailment between pieces of scientific text is still in its infancy. To this end, the paper discusses the implementation of systems based on Long Short-Term Memory (LSTM) neural network and Support Vector Machine (SVM) classifiers using the SCITAIL entailment dataset, in which both premise and hypothesis consist of scientific text. In addition, a TE-based framework for cooking domain question answering is introduced. The proposed framework exploits the entailment relationship between the user question and the cooking questions contained in a Knowledge Base (KB).
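
The idea of matching a user question against KB questions can be sketched as below. This is only an illustration under stated assumptions: the token-overlap scorer is a crude stand-in for a trained entailment classifier (the paper uses LSTM and SVM systems), and the KB layout as (question, answer) pairs, the `answer_question` helper, and the threshold are hypothetical.

```python
import re


def token_overlap_score(premise, hypothesis):
    """Stand-in scorer (assumption): fraction of hypothesis tokens found in the premise.

    A real system would plug in a trained textual entailment model here.
    """
    p = set(re.findall(r"[a-z]+", premise.lower()))
    h = set(re.findall(r"[a-z]+", hypothesis.lower()))
    return len(p & h) / len(h) if h else 0.0


def answer_question(user_question, kb, scorer=token_overlap_score, threshold=0.5):
    """Return the answer of the best-scoring KB question, or None below the threshold."""
    if not kb:
        return None
    best_question, best_answer = max(kb, key=lambda qa: scorer(user_question, qa[0]))
    if scorer(user_question, best_question) < threshold:
        return None
    return best_answer


# Hypothetical toy knowledge base of (question, answer) pairs.
TOY_KB = [
    ("how long to boil an egg", "About ten minutes."),
    ("how to knead bread dough", "Fold and press for several minutes."),
]

if __name__ == "__main__":
    print(answer_question("How long should I boil an egg?", TOY_KB))
```

Keeping the scorer as a parameter means the overlap heuristic can be swapped for any entailment model that returns a score for a (premise, hypothesis) pair.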

    • Ensemble approach for identifying medical concepts with special attention to lexical scope



      Health-care services are built on information extraction techniques. This extraction process is laborious and time-consuming due to the unavailability of medical experts. Thus, in the present task, we were motivated to develop an automated extraction system for identifying medical and non-medical concepts. These concepts help to extract the key information from medical corpora. Not only medical concepts but also their non-medical counterparts are important for diagnosis purposes. Hence, we have employed three different approaches, namely unsupervised, supervised, and their combined ensemble version, to identify both medical and non-medical terms (words/phrases). The unsupervised module consists of two phases: part-of-speech (POS) tagging followed by searching in a domain-specific lexicon, namely WordNet of Medical Event (WME 3.0). The supervised module, on the other hand, is built from two machine learning classifiers, namely Naive Bayes and Conditional Random Field (CRF), along with various features such as category, POS, and sentiment. Finally, we have combined the important outcomes of the unsupervised and supervised modules and developed two versions of the ensemble module (Ensemble-I and Ensemble-II). All the modules identify uni-gram, bi-gram, tri-gram, and longer medical concepts and separate non-medical words or phrases in a context. In order to evaluate all modules of the concept identification system, we have prepared an experimental dataset. It has been split into three parts, namely training, development, and test. We observed that the ensemble module provides better output than the individual modules and that Ensemble-I outperforms Ensemble-II in identifying medical concepts consisting of all possible n-grams. The result analysis shows that F-measures of 0.91 and 0.94 have been obtained for identifying medical concepts and non-medical words/phrases, respectively, using both of the ensemble modules.

      The present research reports the initial steps toward building an automated concept identification framework in health-care. This system assists in designing various domain-specific applications such as annotation, categorization, and recommendation systems.
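
The lexicon search phase of the unsupervised module described in this abstract can be sketched roughly as follows. This is an illustration only, under several assumptions: the toy lexicon stands in for WME 3.0, the POS tagging phase is omitted, and the greedy longest-match strategy and the `find_medical_concepts` helper are hypothetical choices, not the authors' method.

```python
# Hypothetical toy lexicon standing in for WME 3.0 (assumption);
# entries are stored as tuples of lowercase tokens.
TOY_MEDICAL_LEXICON = {
    ("fever",),
    ("chest", "pain"),
    ("high", "blood", "pressure"),
}


def find_medical_concepts(tokens, lexicon=TOY_MEDICAL_LEXICON, max_n=3):
    """Greedy longest-match lookup of n-grams in the lexicon.

    Returns a pair (medical_concepts, non_medical_tokens).
    """
    tokens = [t.lower() for t in tokens]
    concepts, others = [], []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so that tri-grams win over their sub-grams.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in lexicon:
                concepts.append(" ".join(gram))
                i += n
                break
        else:
            # No n-gram starting here is in the lexicon: treat the token as non-medical.
            others.append(tokens[i])
            i += 1
    return concepts, others


if __name__ == "__main__":
    sentence = "Patient reports chest pain and high blood pressure"
    print(find_medical_concepts(sentence.split()))
```

Concepts longer than `max_n` tokens are not matched in this sketch; raising `max_n` would cover the longer concepts that the abstract also mentions.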


© 2021-2022 Indian Academy of Sciences, Bengaluru.