• Fulltext

       


      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/047/0106

    • Keywords

       

      Machine translation; statistical machine translation; low-resource language; evaluation.

    • Abstract

       

      This letter presents a compendium of eight unsupervised Machine Translation (MT) systems built from monolingual corpora of five Indian languages from the Indo-Aryan and Dravidian language families. Recent research has demonstrated outstanding results in completely unsupervised training of Phrase-based Statistical MT (PBSMT) systems using innovative designs that rely solely on monolingual datasets. Moreover, prior research has shown that Unsupervised Statistical MT (USMT) outperforms Unsupervised Neural MT (UNMT), particularly for language pairs that are not closely related. The purpose of this work is to investigate the architecture of the USMT system using only monolingual data for four morphologically rich Indian languages and Kangri, a low-resource endangered language. The experimental results are evaluated using different natural language toolkit tokenizers and analyzed for each language pair using several fully automatic MT evaluation metrics across training iterations.

    • Author Affiliations

       

      SHEFALI SAXENA¹, SHWETA CHAUHAN¹, PARAS ARORA¹, PHILEMON DANIEL¹

      1. Electronics and Communication Department, National Institute of Technology, Hamirpur, HP, India

© 2022-2023 Indian Academy of Sciences, Bengaluru.