Building DNN Acoustic Models for Large Vocabulary Speech Recognition

Andrew L. Maas; Peng Qi; Ziang Xie; Awni Y. Hannun; Christopher T. Lengerich; Daniel Jurafsky; Andrew Y. Ng

arXiv:1406.7806·cs.CL·January 21, 2015

TL;DR

This paper empirically investigates the design choices for DNN acoustic models in large vocabulary speech recognition, comparing architectures and training methods to establish best practices.

Contribution

The paper analyzes how DNN design factors, including network architecture, size, and training loss function, affect speech recognition performance. Its experiments include the roughly 300-hour Switchboard corpus and the first use of locally-connected, untied neural networks for acoustic modeling.

Findings

01 Simple DNN architectures perform strongly.

02 Locally-connected neural networks show promise.

03 Large datasets improve model performance.
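
To make the distinction behind the second finding concrete, here is a minimal sketch contrasting a convolutional layer, where one kernel is shared across all positions, with a locally-connected, untied layer, where each output position has its own kernel. This is a 1-D toy in pure Python, not the paper's implementation: the function names `conv1d` and `locally_connected1d` and the example inputs are hypothetical, and real acoustic models apply such layers to time-frequency features with many filters and nonlinearities.

```python
def conv1d(x, kernel):
    """Valid 1-D cross-correlation: the same kernel is reused at every position."""
    k = len(kernel)
    return [
        sum(w * x[i + j] for j, w in enumerate(kernel))
        for i in range(len(x) - k + 1)
    ]


def locally_connected1d(x, kernels):
    """Same local receptive fields as conv1d, but weights are untied:
    output position i uses its own kernel, kernels[i]."""
    k = len(kernels[0])
    n_out = len(x) - k + 1
    if len(kernels) != n_out:
        raise ValueError("need exactly one kernel per output position")
    return [
        sum(w * x[i + j] for j, w in enumerate(kernels[i]))
        for i in range(n_out)
    ]


# Hypothetical toy input and weights, for illustration only.
x = [1, 2, 3, 4]
shared = [1, -1]
untied = [[1, 0], [0, 1], [1, 1]]

conv_out = conv1d(x, shared)                 # 2 parameters in total
local_out = locally_connected1d(x, untied)   # 3 positions * 2 weights = 6 parameters
```

With identical kernels at every position the untied layer reduces to the convolution, so the untied layer is strictly more flexible at the cost of more parameters.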

Abstract

Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation of which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments uses the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic…
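
The abstract reports results as word error rates. As a reference point, the sketch below computes word error rate in the usual way: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; standard scoring tools also apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one insertion
# against a three-word reference.
wer = word_error_rate("a b c", "a x c d")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.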

Code & Models

Repositories

pannous/caffe-speech-recognition
caffe2
