The CAPIO 2017 Conversational Speech Recognition System

Kyu J. Han; Akshay Chandrashekaran; Jungsuk Kim; Ian Lane

arXiv:1801.00059 · cs.CL · April 11, 2018 · 77 cites

TL;DR

This paper presents a state-of-the-art conversational speech recognition system that uses densely connected LSTMs and a simple acoustic model adaptation scheme, achieving record-low word error rates and surpassing human parity on Switchboard data.

Contribution

Introduces densely connected LSTMs for speech recognition and an acoustic model adaptation method that averages the parameters of a seed model and its adapted version, achieving state-of-the-art performance on the NIST 2000 Hub5 English evaluation set.
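
As a rough illustration of what dense connectivity implies for a stacked LSTM, the sketch below computes the input width of each layer. It assumes DenseNet-style wiring, where every layer receives the concatenation of the original features and the outputs of all earlier layers. The exact wiring used in the paper may differ, and the function name and dimensions are hypothetical.

```python
from typing import List


def dense_input_widths(feature_dim: int, hidden_dim: int, num_layers: int) -> List[int]:
    """Input width of each layer under assumed DenseNet-style connectivity.

    Layer k (0-based) sees the original features plus the outputs of the
    k layers before it, all concatenated along the feature axis.
    """
    if feature_dim <= 0 or hidden_dim <= 0 or num_layers <= 0:
        raise ValueError("all dimensions must be positive")
    return [feature_dim + layer * hidden_dim for layer in range(num_layers)]


# Hypothetical sizes chosen only for illustration.
widths = dense_input_widths(feature_dim=40, hidden_dim=512, num_layers=4)
```

The input width grows linearly with depth, which is the main cost of this connectivity pattern compared with a plain stack where every layer after the first sees only `hidden_dim` inputs.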

Findings

01

Achieved 5.0% WER on Switchboard, surpassing human parity.

02

Improved individual system performance on CallHome by 6.1% relative on average through parameter-averaging adaptation, with no loss on Switchboard.

03

Obtained 9.1% on CallHome after RNN-LM rescoring and lattice combination of 5 systems trained across three phone sets.

Abstract

In this paper we show how we have achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected LSTMs, inspired by the densely connected convolutional networks recently introduced for image classification tasks. We also propose an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version. This method was applied with the CallHome training corpus and improved individual system performance by 6.1% (relative) on average against the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion. With RNN-LM rescoring and lattice combination on the 5 systems trained across three different phone sets, our 2017 speech recognition system has obtained 5.0% and 9.1% on Switchboard and CallHome, respectively, both of which…
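
The adaptation scheme described in the abstract, averaging the parameters of a seed model and its adapted version, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dictionary-of-lists parameter format, the function name `average_parameters`, and the optional `weight` argument are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_parameters(seed: Params, adapted: Params, weight: float = 0.5) -> Params:
    """Interpolate two parameter sets; weight=0.5 is plain averaging."""
    if seed.keys() != adapted.keys():
        raise ValueError("models must share the same parameter names")
    averaged: Params = {}
    for name, seed_values in seed.items():
        adapted_values = adapted[name]
        if len(seed_values) != len(adapted_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        averaged[name] = [
            (1.0 - weight) * s + weight * a
            for s, a in zip(seed_values, adapted_values)
        ]
    return averaged


# Toy values standing in for a seed model and its adapted copy.
seed_model = {"layer1.weight": [0.2, -0.4], "layer1.bias": [0.0]}
adapted_model = {"layer1.weight": [0.6, 0.0], "layer1.bias": [1.0]}
merged = average_parameters(seed_model, adapted_model)
```

Averaging only makes sense when both models share the same architecture and the adapted model was initialized from the seed, which is why the sketch rejects mismatched parameter names or sizes.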

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing