Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy
Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, and Felix I. Wyss

TL;DR
This paper introduces a data-driven method for generating and pruning pronunciation variants to enhance automatic speech recognition accuracy, especially for names, by updating pronunciation dictionaries without harming recognition of similar words.
Contribution
It presents an efficient technique that automatically learns acceptable pronunciations and updates lexicons, significantly reducing error rates in name recognition tasks.
Findings
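Reduced the name recognition error rate by 42% on a database of 13,000+ human names, relative to a baseline of regular dictionaries plus a spelling-to-pronunciation (STP) engine.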
Generalizes well across datasets of various sizes.
Improves recognition of the target word without degrading recognition of other, similar words.
Abstract
Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers or location finders. It is usually challenging due to pronunciation variations. This paper proposes an efficient and robust data-driven technique which automatically learns acceptable word pronunciations and updates the pronunciation dictionary to build a better lexicon without affecting recognition of other words similar to the target word. It generalizes well on datasets with various sizes, and reduces the error rate on a database with 13000+ human names by 42%, compared to a baseline with regular dictionaries already covering canonical pronunciations of 97%+ words in names, plus a well-trained spelling-to-pronunciation (STP) engine.
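The abstract does not spell out how variants are pruned, so the following is only a minimal sketch of one way a lexicon update could avoid disturbing similar words. It assumes a lexicon that maps each word to a list of phoneme tuples, and it assumes a simple pruning rule: a candidate variant is dropped if the target word already has it or if another word already uses it. The function names, the phoneme symbols, and the collision rule are all hypothetical and are not taken from the paper; the data-driven step that learns candidates from speech is omitted entirely.

```python
def prune_variants(lexicon, target, candidates):
    """Return candidate pronunciations worth adding for `target`.

    Assumed rule (not from the paper): drop a candidate if the target
    already has it, or if it is already a pronunciation of another word.
    """
    # Pronunciations owned by every word other than the target.
    taken = {
        pron
        for word, prons in lexicon.items()
        if word != target
        for pron in prons
    }
    existing = set(lexicon.get(target, ()))
    kept = []
    for pron in candidates:
        pron = tuple(pron)
        if pron in existing or pron in taken or pron in kept:
            continue  # duplicate, or would collide with a similar word
        kept.append(pron)
    return kept


def update_lexicon(lexicon, target, candidates):
    """Return a new lexicon with the surviving variants appended to `target`."""
    updated = {word: list(prons) for word, prons in lexicon.items()}
    updated.setdefault(target, []).extend(
        prune_variants(lexicon, target, candidates)
    )
    return updated


# Hypothetical toy lexicon: two similar-sounding entries.
toy_lexicon = {
    "lee": [("L", "IY")],
    "lay": [("L", "EY")],
}
toy_candidates = [("L", "EY"), ("L", "AH"), ("L", "IY")]
new_lexicon = update_lexicon(toy_lexicon, "lee", toy_candidates)
```

In this toy case the candidate that matches "lay" and the one "lee" already has are both rejected, so only the remaining variant is appended. A real system would likely need a softer criterion than exact collision, such as measuring recognition errors on held-out utterances.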
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
