No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

Tara N. Sainath; Rohit Prabhavalkar; Shankar Kumar; Seungji Lee; Anjuli Kannan; David Rybach; Vlad Schogol; Patrick Nguyen; Bo Li; Yonghui Wu; Zhifeng Chen; Chung-Cheng Chiu

arXiv:1712.01864·cs.CL·December 7, 2017

TL;DR

This paper investigates whether pronunciation lexica are necessary in end-to-end speech recognition models. It finds that grapheme-based models outperform phoneme-based ones across different English tasks, and that using graphemes simplifies multi-dialect recognition.

Contribution

The paper compares phoneme-based and grapheme-based end-to-end models and shows that graphemes both simplify and improve speech recognition.

Findings

01 Grapheme-based models outperform phoneme-based models on large-vocabulary tasks.

02 Grapheme models are more effective for multi-dialect English recognition.

03 Pronunciation lexica offer limited benefits in end-to-end models.

Abstract

For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based versus grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model, or from the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments which are…
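
To make the distinction in the abstract concrete, the following is a minimal sketch of how training targets could be derived for the two unit types. Everything here is an illustrative assumption rather than the paper's setup: the toy lexicon entries, the phoneme symbols, the `_` word-boundary marker, and the function names `grapheme_targets` and `phoneme_targets` are all hypothetical.

```python
# Toy pronunciation lexicon. These entries are hypothetical and for
# illustration only; they are not taken from the paper or a real lexicon.
TOY_LEXICON = {
    "play": ["p", "l", "ey"],
    "music": ["m", "y", "uw", "z", "ih", "k"],
}


def grapheme_targets(transcript):
    """Grapheme targets need no lexicon: the text itself is the target.

    Spaces are replaced by "_" (an assumed word-boundary marker).
    """
    return list(transcript.replace(" ", "_"))


def phoneme_targets(transcript, lexicon):
    """Phoneme targets require a lexicon lookup for every word."""
    phones = []
    for word in transcript.split():
        if word not in lexicon:
            # A word missing from the lexicon cannot be converted at all.
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones


if __name__ == "__main__":
    print(grapheme_targets("play music"))
    print(phoneme_targets("play music", TOY_LEXICON))
```

In this sketch the grapheme path works for any transcript, while the phoneme path fails on any word absent from the lexicon, which is one way to see why removing the lexicon simplifies the recognition pipeline.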
