Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy

Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, and Felix I. Wyss

arXiv:1606.08821 · cs.CL · June 29, 2016

TL;DR

This paper introduces a data-driven method for generating and pruning pronunciation variants to enhance automatic speech recognition accuracy, especially for names, by updating pronunciation dictionaries without harming recognition of similar words.

Contribution

The paper presents an efficient and robust data-driven technique that automatically learns acceptable word pronunciations and updates the lexicon, reducing the name recognition error rate by 42% compared to the baseline.

Findings

1. Reduces the name recognition error rate by 42% on a database of 13,000+ human names, compared to the baseline.

2. Generalizes well across datasets of various sizes.

3. Improves recognition of target words without affecting recognition of other, similar words.

Abstract

Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers or location finders. It is usually challenging due to pronunciation variations. This paper proposes an efficient and robust data-driven technique which automatically learns acceptable word pronunciations and updates the pronunciation dictionary to build a better lexicon without affecting recognition of other words similar to the target word. It generalizes well on datasets of various sizes, and reduces the error rate on a database with 13,000+ human names by 42%, compared to a baseline with regular dictionaries already covering canonical pronunciations of 97%+ of the words in names, plus a well-trained spelling-to-pronunciation (STP) engine.

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems