NILADRI SEKHAR DASH
Articles written in Sadhana
Volume 44 Issue 8 August 2019 Article ID 0181
This paper reports how a supervised methodology has been adapted, with necessary modifications, for the task of Word Sense Disambiguation (WSD) in Bengali. At the initial stage, four commonly used supervised methods, Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN) and Naïve Bayes (NB), are implemented as baselines. These algorithms are applied individually to a data set of the 13 most frequently used ambiguous Bengali words. On an experimental basis, the baseline strategy is modified with two extensions: (a) inclusion of a lemmatization process in the system and (b) bootstrapping of the operational process. As a result, the accuracy of the baseline methods improves slightly, which is a positive signal for the whole disambiguation process, as it opens scope for further modification of the existing methods for better results. In this experiment, the data sets are prepared from the Bengali corpus developed in the Technology Development for Indian Languages (TDIL) project of the Government of India and from the Bengali WordNet developed at the Indian Statistical Institute, Kolkata. The paper also reports the challenges and pitfalls that were closely observed during the experiment.
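The two extensions named in the abstract, lemmatization and bootstrapping, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bag-of-words Naïve Bayes classifier with add-one smoothing, a self-training loop as the form of bootstrapping, and English toy tokens standing in for Bengali contexts. The lemma table, the sense labels, the confidence threshold and the number of rounds are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lemma table; a real system would use a Bengali lemmatizer.
LEMMAS = {"rivers": "river", "loans": "loan"}


def lemmatize(words):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(w, w) for w in words]


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes over context words, with add-one smoothing."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)
        self.word_counts = defaultdict(Counter)
        for words, sense in zip(contexts, senses):
            self.word_counts[sense].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        """Return (best sense, posterior probability of that sense)."""
        n = sum(self.sense_counts.values())
        scores = {}
        for sense, count in self.sense_counts.items():
            total = sum(self.word_counts[sense].values())
            score = math.log(count / n)
            for w in words:
                seen = self.word_counts[sense][w]
                score += math.log((seen + 1) / (total + len(self.vocab)))
            scores[sense] = score
        best = max(scores, key=scores.get)
        z = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / z


def bootstrap(labelled, unlabelled, threshold=0.7, rounds=3):
    """Self-training: move confidently classified contexts into the training set."""
    labelled = [(lemmatize(c), s) for c, s in labelled]
    pool = [lemmatize(c) for c in unlabelled]
    model = NaiveBayesWSD()
    for _ in range(rounds):
        model.fit([c for c, _ in labelled], [s for _, s in labelled])
        remaining = []
        for words in pool:
            sense, confidence = model.predict(words)
            if confidence >= threshold:
                labelled.append((words, sense))
            else:
                remaining.append(words)
        if len(remaining) == len(pool):  # nothing new was added; stop early
            break
        pool = remaining
    model.fit([c for c, _ in labelled], [s for _, s in labelled])
    return model, labelled


# Toy seed data for one ambiguous word ("bank"); purely illustrative.
seed = [
    (["river", "water", "shore"], "bank/river"),
    (["money", "loan", "deposit"], "bank/finance"),
]
raw = [["water", "rivers"], ["loans", "money"]]
model, grown = bootstrap(seed, raw)
```

Lemmatizing before counting lets inflected forms such as "rivers" share evidence with "river", and the threshold controls how much automatically labelled data is admitted in each round.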
Volume 45 Published: 27 June 2020 Article ID 0168
The digital world is flooded with a huge number of documents belonging to multifarious categories. Most of these documents are uncategorized, which is a hindrance to efficient retrieval. In the case of news texts (one of the largest and most common sources of text information), it is often observed that a text does not belong to one particular category and has contents from multiple domains. This demands a text categorization system to segregate it into its respective domains for efficient information retrieval. The main challenge lies in handling the overlap of vocabulary among different domains at the time of categorization, which we have tackled using an approach based on fuzzy logic. In the present work a fuzzy rule inference system is presented, which works with newly proposed statistical features for segregating documents that belong to more than one or an undefined category. The generated model was defuzzified using five different techniques for determining the category of a document, and the highest accuracy of 98.63% was obtained for the Centroid method. Experimentation was also carried out on standard English datasets (Reuters-21578 R8 and 20 Newsgroups). We obtain better results than those of reported works, thereby pointing to the language independence of our system.
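The Centroid method mentioned in the abstract is a standard defuzzification technique: the crisp output is the membership-weighted mean of the output variable, x* = Σ x·μ(x) / Σ μ(x) over a sampled domain. The sketch below illustrates only that step. The triangular output sets, the rule strengths and the Mamdani-style clip-and-max aggregation are assumptions made for the example and are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def aggregate(xs, rules):
    """Clip each output set at its rule strength, then take the pointwise max."""
    return [
        max(min(strength, triangular(x, *tri)) for strength, tri in rules)
        for x in xs
    ]


def centroid_defuzzify(xs, mu):
    """Discrete centroid: sum(x * mu(x)) / sum(mu(x))."""
    total = sum(mu)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(x * m for x, m in zip(xs, mu)) / total


# Output variable sampled on [0, 1]; the two rules below are hypothetical.
xs = [i / 100 for i in range(101)]
rules = [
    (0.2, (0.0, 0.25, 0.5)),  # weakly fired rule, low output set
    (0.8, (0.5, 0.75, 1.0)),  # strongly fired rule, high output set
]
mu = aggregate(xs, rules)
score = centroid_defuzzify(xs, mu)
```

Because the second rule fires more strongly, the crisp score falls in the upper half of the range. A categorizer could then compare such a score against per-category ranges, though how the paper maps scores to categories is not stated in the abstract.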