• Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications

    • Fulltext

       

      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/044/07/0168

    • Keywords

       

      Natural language processing; word sense disambiguation; principal component analysis; context expansion.

    • Abstract

       

      In this work, Word Sense Disambiguation (WSD) in the Bengali language is implemented using an unsupervised methodology. In the first phase of the experiment, sentences are clustered using the Maximum Entropy method, and the clusters are manually labelled with their innate senses so that these sense-tagged clusters can serve as sense inventories for the subsequent experiments. In the next phase, when a test instance is to be disambiguated, the Cosine Similarity Measure is used to find its closeness to the sense-tagged clusters. The test instance is assigned the sense of the cluster from which its distance is minimum. This is taken as the baseline strategy, which yields 35% accuracy in the WSD task. Two extensions are then applied to this baseline: (a) Principal Component Analysis (PCA) over the feature vector, which yields 52% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet, coupled with PCA, which yields 61% accuracy. The data sets used in this work are obtained from the Bengali corpus developed under the Technology Development for the Indian Languages (TDIL) project of the Government of India, and the lexical knowledge base (i.e., the Bengali WordNet) was developed at the Indian Statistical Institute, Kolkata, under the Indradhanush Project of the DeitY, Government of India. The challenges and pitfalls of this work are described in detail in the pre-conclusion section.
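
      The second phase described above (assigning a test sentence the sense of its closest sense-tagged cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words features, the English toy sentences, the `SENSE_CLUSTERS` and `SYNONYMS` tables, and every function name are assumptions made for illustration. The synonym table is a hypothetical stand-in for a Bengali WordNet lookup, and the Maximum Entropy clustering and PCA steps are omitted.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(sentences):
    """Sum the term counts of a cluster's sentences.

    The sum is not normalised, because cosine similarity is scale-invariant.
    """
    total = Counter()
    for tokens in sentences:
        total.update(tokens)
    return total

def expand_context(tokens, synonyms):
    """Append synonyms of each token (hypothetical stand-in for WordNet)."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(synonyms.get(token, []))
    return expanded

def disambiguate(tokens, sense_clusters, synonyms=None):
    """Return the sense label of the cluster most similar to the sentence."""
    if synonyms:
        tokens = expand_context(tokens, synonyms)
    vector = Counter(tokens)
    return max(sense_clusters,
               key=lambda sense: cosine_similarity(vector, sense_clusters[sense]))

# Toy sense-tagged clusters for the ambiguous word "bank" (assumed data).
SENSE_CLUSTERS = {
    "finance": centroid([["money", "deposit", "bank", "loan"],
                         ["bank", "account", "interest"]]),
    "river": centroid([["river", "bank", "water", "boat"],
                       ["bank", "shore", "fishing"]]),
}

# Toy synonym table (assumed data).
SYNONYMS = {"cash": ["money"]}

if __name__ == "__main__":
    print(disambiguate(["boat", "near", "bank"], SENSE_CLUSTERS))
    print(disambiguate(["bank", "cash"], SENSE_CLUSTERS, SYNONYMS))
```

      Choosing the cluster with the highest cosine similarity is equivalent to choosing the one at minimum cosine distance. In the second call, the sentence shares only the word "bank" with both clusters until context expansion adds "money", which tips the decision towards the finance sense.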

    • Author Affiliations

       

      ALOK RANJAN PAL¹, DIGANTA SAHA²

      1. Department of Computer Science and Engineering, College of Engineering and Management, Kolaghat, India
      2. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

© 2022-2023 Indian Academy of Sciences, Bengaluru.