Articles written in Sadhana
Volume 46, Published: 6 May 2021, Article ID 0099
In this paper, a model is proposed to improve monophone-based connected-word speech recognition for the Hindi language using Hidden Markov Models (HMMs). The model combines hybrid subword units with domain-specific syntactic structures. The hybrid units contain both phoneme-based and syllable-based subword units. Because syllable-based subword units cover a larger acoustic span, contextual effects are reduced. In the hybrid model, the syllable-based acoustic units are used only for modelling nasal sounds, in order to improve the recognition score for those sounds. A further improvement is proposed by using syntactic structures in the grammar definition during recognition. Domain-specific syntactic structures in the grammar reduce the recognizer's search space and consequently improve system performance. Two grammar definitions were applied: one with no restrictions (gram1) and one with domain-specific structures (gram2). The speech recognition framework was implemented using the HMM-based toolkit HTK with five-state HMMs. A self-created connected-word speech dataset with a vocabulary of 240 Hindi words was used. Mel frequency cepstral coefficients (MFCCs), MFCCs with energy (MFCC_E), and perceptual linear prediction coefficients with energy (PLP_E) were used for feature extraction. Further, monophones were trained with and without silence fixing to check the impact of short pauses on the recognizer's performance. The system was tested in both speaker-dependent and speaker-independent modes. It was found that the hybrid model and gram2 with silence fixing provided the best results. The system obtained an overall word accuracy of 80.28%, word correct of 80.28%, and a word error rate of 19.72% using MFCCs, gram2, phoneme-based modelling, and silence fixing.
For the PLP_E coefficients, hybrid model, silence fixing, and gram2, the system obtained an overall word accuracy of 88.54%, word correct of 88.54%, and a word error rate of 11.46%.
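The three reported metrics are related, and a minimal sketch may help in reading them. It assumes the usual word-level definitions used by HTK's scoring tool, where N is the number of reference words and D, S, and I are deletions, substitutions, and insertions: word correct = (N - D - S) / N, word accuracy = (N - D - S - I) / N, and word error rate = (D + S + I) / N. The function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def word_level_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy, word_error_rate).

    Assumes the standard word-level definitions:
      correct  = (N - D - S) / N
      accuracy = (N - D - S - I) / N
      WER      = (D + S + I) / N
    All three values are returned as percentages.
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be a positive number of reference words")
    if min(deletions, substitutions, insertions) < 0:
        raise ValueError("error counts must be non-negative")

    # Hits are the reference words that were recognized correctly.
    hits = n_ref - deletions - substitutions
    percent_correct = 100.0 * hits / n_ref
    # Accuracy additionally penalizes inserted words.
    percent_accuracy = 100.0 * (hits - insertions) / n_ref
    word_error_rate = 100.0 * (deletions + substitutions + insertions) / n_ref
    return percent_correct, percent_accuracy, word_error_rate
```

For instance, with hypothetical counts `word_level_scores(100, 5, 10, 0)`, word correct and word accuracy coincide because there are no insertions, and the word error rate is 100 minus the accuracy. If the reported figures follow these definitions, the identical word accuracy and word correct values (80.28% and 88.54%) would correspond to runs with no insertion errors, and the reported error rates (19.72% and 11.46%) are the complements of the accuracies.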