Kinnauri-Pahari (version_0.1): parallel, monolingual dataset and word-embeddings
SHEFALI SAXENA SHWETA CHAUHAN PHILEMON DANIEL
Permanent link:
https://www.ias.ac.in/article/fulltext/sadh/047/0123
The recent United Nations Educational, Scientific and Cultural Organization (UNESCO) survey states that India has 197 endangered languages. Himachal Pradesh, a state in India, tops the list with seven definitely endangered languages, Kinnauri-Pahari being one of them. Because digitized resources are scarce, compiling a corpus for the language is difficult. This paper presents and releases the Kinnauri-Pahari (ISO 639-3: kjo) dataset, consisting of 43,362 monolingual and 20,307 parallel sentences in version_0.1. The dataset was tested on statistical and neural machine translation systems, and the results were evaluated using different evaluation metrics. The corpus is freely available for non-commercial use and research (https://github.com/phildani7/dlnith/tree/master/Kinnauri-Pahari).
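As an illustration of how a line-aligned parallel corpus of this kind is typically consumed, the following is a minimal sketch, not the authors' tooling. It assumes the parallel data is available as two sequences of lines in which line i of the source corresponds to line i of the target; the function names `pair_sentences` and `corpus_stats` are hypothetical, and the actual file layout of the released repository is not described in the abstract.

```python
from typing import Dict, Iterable, List, Tuple


def pair_sentences(src_lines: Iterable[str],
                   tgt_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line-aligned source/target sentences (assumed layout).

    Pairs in which either side is empty after stripping are skipped.
    """
    src = [line.strip() for line in src_lines]
    tgt = [line.strip() for line in tgt_lines]
    if len(src) != len(tgt):
        # A length mismatch means the two sides are no longer aligned.
        raise ValueError(
            f"misaligned corpus: {len(src)} source vs {len(tgt)} target lines"
        )
    return [(s, t) for s, t in zip(src, tgt) if s and t]


def corpus_stats(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count sentence pairs and whitespace-separated tokens on each side."""
    return {
        "pairs": len(pairs),
        "src_tokens": sum(len(s.split()) for s, _ in pairs),
        "tgt_tokens": sum(len(t.split()) for _, t in pairs),
    }
```

With real data, the two arguments would be open file handles for the source and target sides; checking the pair count against the figure reported for the release is a quick sanity test before training a translation system.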
Volume 48, 2023
© 2022-2023 Indian Academy of Sciences, Bengaluru.