Deep Speech: Scaling up end-to-end speech recognition

Awni Hannun; Carl Case; Jared Casper; Bryan Catanzaro; Greg Diamos; Erich Elsen; Ryan Prenger; Sanjeev Satheesh; Shubho Sengupta; Adam Coates; Andrew Y. Ng

arXiv:1412.5567 · cs.CL · December 23, 2014

TL;DR

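Deep Speech is an end-to-end deep learning speech recognition system that replaces the laboriously engineered pipelines of traditional systems with a single model learned directly from data. It outperforms previously published results on Switchboard Hub5'00 and is more robust in noisy environments, relying on a well-optimized multi-GPU RNN training system and novel data synthesis techniques.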

Contribution

The paper introduces a simplified end-to-end speech recognition system that needs no hand-designed components for background noise, reverberation, or speaker variation, and no phoneme dictionary, while improving accuracy and noise robustness over traditional pipelines.

Findings

01

Achieves 16.0% error on the full Switchboard Hub5'00 test set.

02

Handles challenging noisy environments better than widely used commercial speech systems.

03

Uses novel data synthesis techniques to obtain a large amount of varied training data.
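
The first finding reports its figure simply as "error". On Switchboard-style benchmarks this is conventionally the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below illustrates that arithmetic under this assumption; the function name `word_error_rate` and the example sentences are hypothetical, and this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative only: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

Under this reading, a 16.0% error means roughly 16 word-level edits per 100 reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.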

Abstract

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely…

Code & Models

Models

mbazaNLP/kinyarwanda-coqui-stt-model (Hugging Face model · 29 downloads · 3 likes)