Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?

Steven Wegmann; Larry Gillick

arXiv:1003.0206·cs.CL·March 2, 2010·4 cites

TL;DR

This paper investigates why automatic speech recognition using Hidden Markov Models (HMMs) remains challenging. It shows that statistical dependencies between speech frames, which violate the HMM's frame-independence assumption, significantly affect recognition accuracy and must be better understood before HMM-based recognizers can be improved.

Contribution

The study applies standard statistical diagnostic tools to measure how speech data depart from the HMM's assumptions and how these departures, in particular the dependence between frames, affect the accuracy of HMM-based speech recognition.

Findings

01. Real speech data exhibit significant statistical dependencies.

02. Removing dependencies from data reduces recognition errors.

03. Statistical dependency is a key factor in recognition performance.
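One simple way to see the kind of dependence described in the first finding is the sample lag-1 autocorrelation of a feature track. The sketch below is an illustration under stated assumptions, not necessarily a diagnostic the authors use: it assumes a 1-D feature sequence (e.g. one cepstral coefficient) and, for a meaningful comparison, frames that a hypothetical alignment has assigned to a single HMM state. The function name `lag1_autocorrelation` is hypothetical.

```python
def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a 1-D feature track.

    Assumption: xs holds frames aligned to ONE HMM state. The HMM treats such
    frames as i.i.d. draws, which implies a value close to 0; values near 1
    indicate smoothly varying, strongly dependent frames.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two frames")
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant track")
    # Covariance between each frame and its successor, normalized by variance.
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom


print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0]))  # alternating frames
print(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0]))    # slowly rising frames
```

Consistently large positive values within state-aligned segments would be one symptom of a data/model mismatch, since the emission model has no term linking a frame to its neighbours.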

Abstract

Hidden Markov models (HMMs) have been successfully applied to automatic speech recognition for more than 35 years in spite of the fact that a key HMM assumption -- the statistical independence of frames -- is obviously violated by speech data. In fact, this data/model mismatch has inspired many attempts to modify or replace HMMs with alternative models that are better able to take into account the statistical dependence of frames. However, it is fair to say that in 2010 the HMM is the consensus model of choice for speech recognition and that HMMs are at the heart of both commercially available products and contemporary research systems. In this paper we present a preliminary exploration aimed at understanding how speech data depart from HMMs and what effect this departure has on the accuracy of HMM-based speech recognition. Our analysis uses standard diagnostic tools from the field of…
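To make the frame-independence assumption concrete, here is a minimal sketch of the textbook forward algorithm, not the authors' code. It assumes 1-D Gaussian emissions and illustrative parameter values; all function names are hypothetical. Given the state sequence, each frame contributes its own emission term and nothing links it to neighbouring frames, so correlated frames are scored as if each were fresh evidence.

```python
import math


def safe_log(p):
    """log(p), mapping probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(frames, init, trans, means, variances):
    """Log p(frames) under an HMM with 1-D Gaussian emissions (assumed model).

    Each frame's emission term depends only on the current state, never on
    neighbouring frames: this is the frame-independence assumption.
    """
    n = len(init)
    # alpha[j] = log p(frames[0..t], state at time t = j)
    alpha = [safe_log(init[j]) + log_gauss(frames[0], means[j], variances[j])
             for j in range(n)]
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


# One-state model: repeating every frame (perfectly dependent copies that add
# no new information) doubles the log-likelihood, i.e. the copies are counted
# as independent evidence.
one_state = dict(init=[1.0], trans=[[1.0]], means=[0.0], variances=[1.0])
print(forward_log_likelihood([0.5, -1.0], **one_state))
print(forward_log_likelihood([0.5, 0.5, -1.0, -1.0], **one_state))
```

In this toy case the second printed value is exactly twice the first (up to rounding); in real speech, adjacent frames are strongly but not perfectly correlated, so the overcounting is partial rather than exact.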

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing