No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models
Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu

TL;DR
This paper investigates whether pronunciation lexica are necessary in end-to-end speech recognition models. It finds that grapheme-based models outperform phoneme-based ones on English tasks, and that using graphemes simplifies multi-dialect recognition.
Contribution
The paper compares phoneme-based and grapheme-based end-to-end models in detailed experiments, showing that graphemes both simplify the recognition system and improve its accuracy.
Findings
Grapheme-based models outperform phoneme-based models on large vocabulary tasks.
Grapheme models are more effective for multi-dialect English recognition.
Pronunciation lexica offer limited benefits in end-to-end models.
Abstract
For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based versus grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model, or from the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments which are…
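To make the distinction in the abstract concrete, the following is a minimal sketch of how training targets might be derived under each choice of sub-word unit. It is an illustration only, not the authors' pipeline: the toy lexicon, its ARPAbet-style phoneme symbols, and the function names `grapheme_targets` and `phoneme_targets` are all hypothetical.

```python
# Hypothetical toy lexicon; entries and phoneme symbols are illustrative
# assumptions, not taken from the paper.
TOY_LEXICON = {
    "no": ["n", "ow"],
    "need": ["n", "iy", "d"],
}


def grapheme_targets(text):
    """Split a transcript into character targets. No lexicon is required."""
    return ["<space>" if ch == " " else ch for ch in text.lower()]


def phoneme_targets(text, lexicon):
    """Map each word to phonemes through a lexicon.

    Raises KeyError for any word the lexicon does not cover.
    """
    phones = []
    for word in text.lower().split():
        if word not in lexicon:
            raise KeyError(f"word not in lexicon: {word!r}")
        phones.extend(lexicon[word])
    return phones
```

In this sketch the grapheme path works for any transcript, while the phoneme path depends on every word having a lexicon entry, which is the dependency on an expert-curated resource that the abstract describes.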
