• Fulltext

      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/046/0099

    • Keywords

       

      Speech recognition; MFCCs; PLP; monophone; Hidden Markov Model; connected word; Hindi.

    • Abstract

       

      In this paper, a model is proposed to improve monophone-based connected-word speech recognition for the Hindi language using Hidden Markov Models (HMMs). The model combines hybrid subword units with domain-specific syntactic structures. The hybrid units contain both phoneme- and syllable-based subword units. Because syllable-based subword units cover a larger acoustic span, contextual effects are reduced. In the hybrid model, syllable-based acoustic units are used only for modelling nasal sounds, to improve their recognition score. A further improvement is proposed through syntactic structures in the grammar definition used during recognition. Domain-specific syntactic structures in the grammar reduce the recognizer's search space and consequently improve system performance. Two grammar definitions were applied: gram1, with no restrictions, and gram2, with domain-specific structures. The speech recognition framework was implemented using the HMM-based toolkit HTK with five-state HMMs. A self-created connected-word speech dataset with a vocabulary of 240 Hindi words was used. Mel frequency cepstral coefficients (MFCCs), MFCCs with energy (MFCC_E), and perceptual linear prediction coefficients with energy (PLP_E) were used for feature extraction. Further, monophones were trained with and without silence fixing to check the impact of short pauses on the recognizer's performance. The system was tested in both speaker-dependent and speaker-independent modes. It was found that the hybrid model with gram2 and silence fixing provided the best results. Using MFCCs, gram2, phoneme-based modelling, and silence fixing, the system obtained an overall word accuracy of 80.28%, word correct of 80.28%, and a word error rate of 19.72%. For the PLP_E coefficients, hybrid model, silence fixing, and gram2, the system obtained an overall word accuracy of 88.54%, word correct of 88.54%, and a word error rate of 11.46%.
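
The relationship between the three reported scores can be illustrated with a minimal sketch. It assumes the scores follow the usual HTK-style definitions (word correct ignores insertion errors, word accuracy penalizes them, and the word error rate is the complement of word accuracy); the abstract does not state the scoring procedure. The function name and the error counts below are hypothetical and are not taken from the paper.

```python
def htk_word_scores(n_ref, deletions, substitutions, insertions):
    """Return (word_correct, word_accuracy, word_error_rate) in percent.

    Assumed HTK-style definitions (not confirmed by the abstract):
      hits     = N - D - S
      correct  = hits / N * 100
      accuracy = (hits - I) / N * 100
      WER      = (D + S + I) / N * 100 = 100 - accuracy
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    hits = n_ref - deletions - substitutions
    word_correct = 100.0 * hits / n_ref
    word_accuracy = 100.0 * (hits - insertions) / n_ref
    word_error_rate = 100.0 * (deletions + substitutions + insertions) / n_ref
    return word_correct, word_accuracy, word_error_rate


# Hypothetical counts for illustration only (not data from the paper).
correct, accuracy, wer = htk_word_scores(
    n_ref=1000, deletions=50, substitutions=100, insertions=0
)
print(f"correct={correct:.2f}% accuracy={accuracy:.2f}% WER={wer:.2f}%")
```

Under these definitions, word correct and word accuracy coincide only when there are no insertion errors, which would be consistent with the identical pairs reported above (80.28%/80.28% and 88.54%/88.54%), and each reported word error rate equals 100 minus the corresponding word accuracy.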

    • Author Affiliations

       

      SHOBHA BHATT¹, ANURAG JAIN¹, AMITA DEV²

      1. University School of Information and Communication Technology, GGSIP University, New Delhi, India
      2. Indira Gandhi Delhi Technical University for Women, New Delhi, India

© 2021-2022 Indian Academy of Sciences, Bengaluru.