Articles written in Sadhana
Volume 47, Published: 2 June 2022, Article ID 0106
Explicitly unsupervised statistical machine translation analysis on five Indian languages using automatic evaluation metrics
SHEFALI SAXENA SHWETA CHAUHAN PARAS ARORA PHILEMON DANIEL
This letter presents a compendium of eight unsupervised Machine Translation (MT) systems built from monolingual corpora of five Indian languages from the Indo-Aryan and Dravidian language families. Recent research has demonstrated outstanding results in completely unsupervised training of Phrase-based Statistical MT (PBSMT) systems using innovative designs that rely solely on monolingual datasets. Prior research has also shown that Unsupervised Statistical MT (USMT) outperforms Unsupervised Neural MT (UNMT), particularly for language pairs that are not closely related. The purpose of this work is to investigate the architecture of the USMT system using only monolingual datasets for four morphologically rich Indian languages and one low-resource, endangered language, Kangri. The experimental results are evaluated using different natural language toolkit tokenizers and analyzed for each language pair using various fully automatic MT evaluation metrics across different iterations.
Volume 47, Published: 27 June 2022, Article ID 0123
Kinnauri-Pahari (version_0.1): parallel, monolingual dataset and word-embeddings
SHEFALI SAXENA SHWETA CHAUHAN PHILEMON DANIEL
A recent United Nations Educational, Scientific and Cultural Organization (UNESCO) survey states that India has 197 endangered languages. Himachal Pradesh, a state in India, tops the list with seven definitely endangered languages, Kinnauri-Pahari being one of them. Because digitized resources are scarce, compiling a corpus is difficult. This paper presents and releases the Kinnauri-Pahari (ISO 639-3: kjo) dataset, consisting of 43,362 monolingual and 20,307 parallel sentences in version_0.1. The dataset was tested on statistical and neural machine translation systems, and the results were evaluated using different evaluation metrics. The corpus is freely available for non-commercial usage and research (https://github.com/phildani7/dlnith/tree/master/Kinnauri-Pahari).