• Speech recognition from spectral dynamics

    • Fulltext

      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/036/05/0729-0744

    • Keywords

      Carrier nature of speech; modulation spectrum; spectral dynamics of speech; coding of linguistic information in speech; machine recognition of speech; data-guided signal processing techniques.

    • Abstract

      Information is carried in changes of a signal. The paper starts by revisiting Dudley’s concept of the carrier nature of speech. It points to the close connection of this concept to the modulation spectra of speech and argues against short-term spectral envelopes as the dominant carriers of linguistic information in speech. The history of spectral representations of speech is briefly discussed. This is followed by some of the history of the gradual infusion of the modulation spectrum concept into automatic speech recognition (ASR), pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features and RelAtive SpecTrAl (RASTA) filtering. Next, the frequency domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a global historical overview of attempts to use spectral dynamics in machine recognition of speech, and does not always provide enough detail on the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.
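      The abstract relates modulation spectrum processing to dynamic (delta) features and RASTA filtering. One way to see that relationship is to treat each band's log-energy trajectory as a signal sampled at the frame rate and to look at both techniques as filters in the modulation frequency domain. The sketch below is illustrative only and is not code from the paper. It assumes a 100 Hz frame rate (10 ms shift), the commonly cited RASTA transfer function 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1) in causal form (some implementations use a pole of 0.94), and the standard regression formula for deltas. All function and constant names (`rasta`, `delta`, `delta_taps`, `freq_response`, `FRAME_RATE_HZ`) are hypothetical. Note that in this form the FIR part of RASTA has the same taps as the N = 2 delta filter, up to time reversal; the leaky integrator (the pole) is what pulls the pass band toward low modulation frequencies.

```python
import numpy as np

# Assumption: 10 ms frame shift, so each spectral trajectory is "sampled" at 100 Hz.
FRAME_RATE_HZ = 100.0

# Assumed RASTA filter in causal form (pole 0.98; some implementations use 0.94):
# H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
RASTA_B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
RASTA_A = np.array([1.0, -0.98])


def delta_taps(N=2):
    """Taps for n = -N..N of the regression delta
    d_t = sum_n n * c_{t+n} / (2 * sum_{n=1}^{N} n^2)."""
    n = np.arange(-N, N + 1, dtype=float)
    return n / (2.0 * np.sum(np.arange(1, N + 1) ** 2))


def freq_response(b, a, freqs_hz, fs=FRAME_RATE_HZ):
    """Complex response of B(z)/A(z) at the given modulation frequencies.
    A pure time shift of the taps changes only the phase, not the magnitude."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz, dtype=float) / fs)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den


def iir_filter(b, a, x):
    """Direct-form difference equation, assuming a[0] == 1 and zero initial state."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t >= k)
        acc -= sum(a[k] * y[t - k] for k in range(1, len(a)) if t >= k)
        y[t] = acc
    return y


def rasta(log_band_energies):
    """RASTA-filter each column (band) of a (frames x bands) array along time."""
    return np.column_stack(
        [iir_filter(RASTA_B, RASTA_A, band) for band in log_band_energies.T]
    )


def delta(features, N=2):
    """Regression deltas along time; edge frames are repeated as padding."""
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.zeros(features.shape)
    for n in range(1, N + 1):
        # padded[N + t] corresponds to features[t]
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / (2.0 * sum(n * n for n in range(1, N + 1)))


if __name__ == "__main__":
    f = np.arange(0.0, 50.0, 0.1)  # modulation frequencies below Nyquist (50 Hz)
    h_rasta = np.abs(freq_response(RASTA_B, RASTA_A, f))
    h_delta = np.abs(freq_response(delta_taps(2), [1.0], f))
    print("RASTA peak near %.1f Hz, delta peak near %.1f Hz"
          % (f[np.argmax(h_rasta)], f[np.argmax(h_delta)]))
```

      Both magnitude responses are zero at 0 Hz, so a constant offset in a log-spectral trajectory (for example, a fixed convolutive channel) is suppressed; with the integrator pole, RASTA passes mainly slow modulations of a few hertz, while the plain delta filter peaks at a higher modulation frequency.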

    • Author Affiliations

      Hynek Hermansky (1)

      1. The Johns Hopkins University, Baltimore, Maryland, USA

© 2017-2019 Indian Academy of Sciences, Bengaluru.