Large Scale Distributed Acoustic Modeling With Back-off N-grams

Ciprian Chelba; Peng Xu; Fernando Pereira; Thomas Richardson

arXiv:1302.1123 · cs.CL · February 6, 2013

TL;DR

This paper demonstrates that large-scale distributed acoustic models using back-off n-grams and extensive context can significantly improve speech recognition accuracy on Google Voice Search data.

Contribution

It revives a back-off n-gram approach to acoustic modeling, using distributed computing to scale model size and phonetic context beyond traditional limits.

Findings

01

Achieved an 11% relative reduction in word error rate (WER) with large models.

02

Expanded phonetic context to seven or more phones, but contexts wider than five phones showed no additional benefit.

03

Utilized 87,000 hours of training data for model estimation.

Abstract

The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly beyond triphones, as well as increase the number of Gaussian mixture components for the context-dependent states that allow it. We have experimented with contexts that span seven or more context-independent phones, and up to 620 mixture components per state. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The back-off acoustic model is estimated, stored and served using MapReduce distributed…
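The back-off idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the `back_off` function, the `(left, center, right)` key layout, and the rule of trimming the outermost phone from the longer side (or from both sides when they are equal) are all hypothetical. The sketch only shows the general pattern borrowed from n-gram language modeling: when a wide phonetic context was never seen in training, fall back to progressively narrower contexts until a stored model is found.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical key layout: (left context, center phone, right context).
# Left context is ordered farthest-to-nearest; right is nearest-to-farthest.
Key = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def back_off(
    left: Sequence[str],
    center: str,
    right: Sequence[str],
    models: Dict[Key, str],
) -> Optional[Key]:
    """Return the widest stored context that covers the request, or None.

    Assumed trimming rule (illustrative only): drop the outermost phone
    from the longer side, or from both sides when they have equal length.
    """
    l, r = tuple(left), tuple(right)
    while True:
        key = (l, center, r)
        if key in models:
            return key
        if not l and not r:
            return None  # even the context-independent phone is missing
        if len(l) > len(r):
            l = l[1:]        # drop farthest left phone
        elif len(r) > len(l):
            r = r[:-1]       # drop farthest right phone
        else:
            l, r = l[1:], r[:-1]


# Toy model table: one triphone and one context-independent phone.
toy_models: Dict[Key, str] = {
    (("k",), "ae", ("t",)): "triphone model",
    ((), "ae", ()): "context-independent model",
}

# A five-phone request that was never seen backs off to the stored triphone.
found = back_off(("s", "k"), "ae", ("t", "s"), toy_models)
```

In this toy table, a request whose triphone is also unseen would fall through to the context-independent entry, which mirrors how a back-off n-gram language model falls back to lower-order estimates.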
