Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Xiaohui Zhang; Vimal Manohar; Daniel Povey; Sanjeev Khudanpur

arXiv:1706.03747 · cs.CL · June 13, 2017 · 5 cites

Open Access

TL;DR

This paper introduces a data-driven method for automatically learning pronunciations for words in speech recognition systems, combining letter sequences and acoustic evidence, and effectively pruning the lexicon for improved ASR performance.

Contribution

It presents a novel greedy pronunciation selection framework that automatically constructs compact, effective lexicons from transcribed data, outperforming traditional G2P-based methods.

Findings

01 The learned lexicon achieves WER close to that of a full expert lexicon

02 Outperforms lexicons built with G2P alone

03 Pruning reduces lexicon size without sacrificing ASR quality

Abstract

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not available, but for which transcribed data exists. Our method integrates information from the letter sequence and from the acoustic evidence. The novel aspect of the problem that we address is the problem of how to prune entries from such a lexicon (since, empirically, lexicons with too many entries do not tend to be good for ASR performance). Experiments on various ASR tasks show that, with the proposed framework, starting with an initial lexicon of several thousand words, we are able to learn a lexicon which performs close to a full expert lexicon in terms of WER performance on test data, and is better than lexicons built using G2P alone or with a…
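As a rough illustration of what greedy pronunciation selection can look like, here is a minimal sketch. It is not the authors' algorithm: the abstract does not spell out the selection criterion, so everything here is an assumption made for illustration. The function name `greedy_select`, the `min_gain` threshold, the `floor` score, and the toy log-likelihood values are all hypothetical. The sketch assumes each candidate pronunciation (from G2P or from acoustic evidence) already has a per-utterance acoustic log-likelihood, scores each utterance by the best pronunciation kept so far, and stops adding candidates once the gain becomes negligible.

```python
def greedy_select(loglik, min_gain=1.0, floor=-50.0):
    """Greedily pick pronunciations for one word (illustrative assumption only).

    loglik: dict mapping a candidate pronunciation to a list of per-utterance
            log-likelihoods (all lists have the same length).
    min_gain: hypothetical stopping threshold on total log-likelihood gain.
    floor: hypothetical score of an utterance not yet covered by any
           selected pronunciation.
    """
    n_utts = len(next(iter(loglik.values())))
    best = [floor] * n_utts          # best score per utterance so far
    selected = []
    remaining = set(loglik)

    def gain(pron):
        # Total improvement if `pron` were added to the selected set.
        return sum(max(b, s) - b for b, s in zip(best, loglik[pron]))

    while remaining:
        # sorted() makes tie-breaking deterministic.
        cand = max(sorted(remaining), key=gain)
        if gain(cand) < min_gain:
            break                    # nothing left adds enough; prune the rest
        selected.append(cand)
        best = [max(b, s) for b, s in zip(best, loglik[cand])]
        remaining.remove(cand)
    return selected


# Toy scores, invented purely for illustration (three utterances of one word).
toy_scores = {
    "t ah m ey t ow": [-5.0, -6.0, -30.0],
    "t ah m aa t ow": [-28.0, -29.0, -4.0],
    "t ah m ey t ah": [-6.0, -7.0, -31.0],   # near-duplicate of the first
}
kept = greedy_select(toy_scores)
```

With these toy values the two pronunciations that each explain some utterances best are kept, and the near-duplicate is pruned because it adds no gain once the first is selected. That mirrors the motivation stated in the abstract, that lexicons with too many entries tend to hurt ASR performance.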

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing