Statistical Modeling in Continuous Speech Recognition (CSR) (Invited Talk)

Steve Young

arXiv:1301.2318 · cs.LG · January 14, 2013

TL;DR

This paper reviews the development of the statistical models used in continuous speech recognition, specifically hidden Markov models (HMMs) and N-grams, and discusses their assumptions, their limitations, and ongoing research to improve system performance.

Contribution

The paper gives an overview of how the statistical modelling techniques behind CSR systems have evolved, where their limitations lie, and which more fundamental modelling work is in progress.

Findings

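01. The review starts from the speech signal and its parameterisation, then discusses the modelling assumptions behind HMMs and N-grams and their consequences.

02. Various techniques can mitigate the effects of these modelling assumptions.

03. The limitations of current modelling techniques remain evident, and more fundamental modelling research is in progress to address them.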

Abstract

Automatic continuous speech recognition (CSR) is sufficiently mature that a variety of real-world applications are now possible, including large vocabulary transcription and interactive spoken dialogues. This paper reviews the evolution of the statistical modelling techniques which underlie current-day systems, specifically hidden Markov models (HMMs) and N-grams. Starting from a description of the speech signal and its parameterisation, the various modelling assumptions and their consequences are discussed. The paper then describes various techniques by which the effects of these assumptions can be mitigated. Despite the progress that has been made, the limitations of current modelling techniques are still evident. The paper therefore concludes with a brief review of some of the more fundamental modelling work now in progress.

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems