The CAPIO 2017 Conversational Speech Recognition System
Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

TL;DR
This paper presents a state-of-the-art conversational speech recognition system using densely connected LSTMs and a simple acoustic model adaptation scheme, achieving record low word error rates and surpassing human parity on Switchboard data.
Contribution
Introduces densely connected LSTMs for acoustic modeling and an adaptation method that averages the parameters of a seed acoustic model and its adapted version, achieving state-of-the-art results on the NIST 2000 Hub5 English evaluation set.
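To make the "densely connected" idea concrete, here is a minimal sketch of DenseNet-style wiring, assuming each layer receives the input features concatenated with the outputs of all preceding layers. The exact connectivity used in the paper may differ. The names `dense_stack`, `make_toy_layer`, and `dense_input_widths` are hypothetical, the toy layer is only a stand-in for a real LSTM layer, and the dimensions in the usage lines are illustrative rather than taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense_stack(features: Vector, layers: Sequence[Layer]) -> Vector:
    """Apply layers so that each one sees the input plus all earlier outputs."""
    collected = list(features)   # running concatenation fed to the next layer
    output = list(features)
    for layer in layers:
        output = layer(collected)
        collected = collected + output  # dense connection: keep every output
    return output


def make_toy_layer(width: int) -> Layer:
    """Hypothetical stand-in for an LSTM layer: emits `width` copies of the input mean."""
    def layer(inputs: Vector) -> Vector:
        mean = sum(inputs) / len(inputs)
        return [mean] * width
    return layer


def dense_input_widths(feature_dim: int, hidden_dim: int, num_layers: int) -> List[int]:
    """Input width of each layer under concatenation-based dense connections."""
    return [feature_dim + i * hidden_dim for i in range(num_layers)]


# Illustrative dimensions only (not from the paper).
widths = dense_input_widths(feature_dim=40, hidden_dim=512, num_layers=4)
toy_output = dense_stack([1.0, 3.0], [make_toy_layer(2), make_toy_layer(2)])
```

The width calculation shows the main cost of this wiring: the input to each layer grows linearly with depth, which is the trade-off for giving later layers direct access to earlier representations.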
Findings
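Achieved 5.0% WER on the Switchboard portion and 9.1% on the CallHome portion of the NIST 2000 Hub5 English evaluation set, surpassing human parity on Switchboard.
Adaptation by parameter averaging with the CallHome training corpus improved individual systems by 6.1% relative on average on the CallHome portion, with no performance loss on the Switchboard portion.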
Set new best results for conversational speech recognition.
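The "relative" improvement quoted in these findings is a ratio against the baseline error rate, not a difference in percentage points. A small sketch of the arithmetic follows. The function name `relative_wer_reduction` is hypothetical, and the baseline and adapted WER values are made up for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Made-up numbers for illustration only (not from the paper):
# a drop from 10.00% to 9.39% WER is 0.61 points absolute, about 6.1% relative.
example_reduction = relative_wer_reduction(baseline_wer=10.00, new_wer=9.39)
```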
Abstract
In this paper we show how we have achieved the state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected LSTMs, inspired by the densely connected convolutional networks recently introduced for image classification tasks. We also propose an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version. This method was applied with the CallHome training corpus and improved individual system performances by on average 6.1% (relative) against the CallHome portion of the evaluation set with no performance loss on the Switchboard portion. With RNN-LM rescoring and lattice combination on the 5 systems trained across three different phone sets, our 2017 speech recognition system has obtained 5.0% and 9.1% on Switchboard and CallHome, respectively, both of which…
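The adaptation scheme described in the abstract, averaging the parameters of a seed model and its adapted version, can be sketched as follows. This is a minimal illustration under stated assumptions: models are represented as plain dictionaries mapping hypothetical parameter names to flat lists of floats, and the function `average_parameters` with its `weight` argument is an invention for this example. A weight of 0.5 corresponds to simple averaging; other weights are shown only as a generalization and are not claimed to be part of the paper's method.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def average_parameters(seed: Params, adapted: Params, weight: float = 0.5) -> Params:
    """Interpolate two parameter sets that share names and shapes.

    weight=0.5 gives the plain average of the seed and adapted models.
    """
    if seed.keys() != adapted.keys():
        raise ValueError("models must share the same parameter names")
    averaged: Params = {}
    for name, seed_values in seed.items():
        adapted_values = adapted[name]
        if len(seed_values) != len(adapted_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [
            (1.0 - weight) * s + weight * a
            for s, a in zip(seed_values, adapted_values)
        ]
    return averaged


# Toy parameter values (hypothetical, for illustration only).
seed_model = {"lstm.weight": [0.25, -0.5], "output.bias": [1.0]}
adapted_model = {"lstm.weight": [0.75, 0.0], "output.bias": [0.0]}
averaged_model = average_parameters(seed_model, adapted_model)
```

Averaging keeps the result close to the seed model, which is one plausible reading of why the abstract reports no performance loss on the Switchboard portion while CallHome improves.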
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
