SUKOMAL PAL
Articles written in Sadhana
Volume 43 Issue 6 June 2018 Article ID 0093
Machine transliteration and transliterated text retrieval: a survey
DINESH KUMAR PRABHAKAR, SUKOMAL PAL
The number of WWW users across the globe is increasing rapidly. According to Internet Live Stats, there are more than 3 billion Internet users worldwide today, and a large share of them are native speakers of languages other than English. Many of these users access the Internet in their native languages but use the Roman script to express themselves through communication channels such as messages and posts. With the advent of Web 2.0, user-generated content is growing on the Web at a very rapid rate, and a substantial proportion of it is transliterated text. Leveraging this huge information repository requires a matching effort to process transliterated text. In this article, we survey the recent body of work in the field of transliteration. We start with a definition and discussion of the different types of transliteration, followed by the various deterministic and non-deterministic approaches used to tackle transliteration-related issues in machine translation and information retrieval. Finally, we study the performance of these techniques and present a comparative analysis of them.
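As a rough illustration of what a deterministic approach can look like, the sketch below applies a fixed rule table with greedy longest-match, so the same Roman input always yields the same output. It is a minimal toy, not a method from the survey: the function name `transliterate` and the three-entry table `TOY_RULES` are hypothetical, and a practical system would need a far larger table plus handling of vowel signs, conjuncts and spelling variation.

```python
from typing import Dict


def transliterate(word: str, rules: Dict[str, str]) -> str:
    """Greedy longest-match transliteration: at each position, apply the
    longest Roman substring that has a rule; copy unmatched characters as-is."""
    max_len = max((len(k) for k in rules), default=0)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched at this position: pass the character through.
            out.append(word[i])
            i += 1
    return "".join(out)


# Hypothetical toy rule table (Roman syllable -> Devanagari).
TOY_RULES = {
    "na": "\u0928",                      # न
    "ma": "\u092e",                      # म
    "ste": "\u0938\u094d\u0924\u0947",   # स्ते
}

print(transliterate("namaste", TOY_RULES))  # prints नमस्ते
```

Non-deterministic approaches differ precisely here: instead of one fixed output per input, they score several candidate transliterations (for example with learned models) and rank them.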
Volume 47 Published: 10 January 2022 Article ID 0017
Effect of stopwords in Indian language IR
We explore and evaluate the effect of stopwords on the retrieval performance of several Indian languages, namely Marathi, Bengali, Gujarati and Sanskrit. The issue is investigated from three viewpoints. Does non-corpus-based stopword removal have any impact on the chosen Indian languages, and if so, to what extent? Can we recommend, based on experiments, a number of stopwords for each chosen language that is good enough from a retrieval point of view? Is there any relationship between stopwords and average document length from a retrieval perspective? We observe that stopword removal generally improves mean average precision (MAP) significantly compared with no removal. For each language, stopword lists of different lengths are evaluated, which leads to a suggested optimal length. We also study the effect of stopwords on retrieval performance as a function of document length. Across the four Indian languages, the effect of stopwords is generally found to be quite low in short documents compared with long ones.
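The comparison above rests on two operations: filtering a stopword list out of the token stream, and scoring ranked results with mean average precision. The sketch below shows both under simple assumptions. The function names, the toy run and the relevance judgments (`qrels`) are hypothetical, the English stopwords are placeholders for a language-specific list, and the retrieval model that actually produces the rankings is left out.

```python
from typing import Dict, Iterable, List, Sequence, Set


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in stopwords]


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the number of relevant documents (missed ones add 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: mean of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: placeholder stopwords, one run, and its judgments.
print(remove_stopwords("the effect of stopwords".split(), {"the", "of"}))
# prints ['effect', 'stopwords']

qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
run = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
print(round(mean_average_precision(run, qrels), 4))  # prints 0.6667
```

To study list length as described above, one could index the collection once per candidate stopword list (for example the first k entries of a ranked list, for several values of k) and compare the resulting MAP values against the run without removal.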
© 2022-2023 Indian Academy of Sciences, Bengaluru.