Improvements to deep convolutional neural networks for LVCSR

Tara N. Sainath; Brian Kingsbury; Abdel-rahman Mohamed; George E. Dahl; George Saon; Hagen Soltau; Tomas Beran; Aleksandr Y. Aravkin; Bhuvana Ramabhadran

arXiv:1309.1501·cs.LG·December 11, 2013

TL;DR

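This paper improves deep CNNs for large vocabulary continuous speech recognition (LVCSR) by comparing limited and full weight sharing, applying pooling strategies from computer vision, incorporating fMLLR speaker adaptation into log-mel features, and using dropout during Hessian-free sequence training. Together these yield relative WER reductions of 2-3% on a 50-hour Broadcast News task and 4-5% on a 400-hour one.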

Contribution

It introduces a method for incorporating fMLLR speaker adaptation into log-mel features and a strategy for using dropout during Hessian-free sequence training of CNNs for LVCSR.

Findings

01 Achieved a 2-3% relative WER reduction on a 50-hour Broadcast News (BN) task

02 Achieved a 4-5% relative WER reduction on a 400-hour BN task

03 Measured both gains against the authors' previous best CNN baseline
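
The "relative" figures above are ratios against the baseline WER, not absolute percentage-point drops. A minimal sketch of that arithmetic follows; the function name and the WER values in the usage line are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (e.g. 20.0 for 20%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Absolute drop divided by the baseline, expressed as a percentage.
    return (baseline_wer - improved_wer) / baseline_wer * 100.0


# Hypothetical values, for illustration only: a 0.5-point absolute drop
# from a 20.0% baseline is a relative reduction of about 2.5%.
print(f"{relative_wer_reduction(20.0, 19.5):.1f}% relative")
```

The same absolute drop corresponds to a larger relative reduction when the baseline WER is lower, which is why relative figures are usually reported alongside the task they were measured on.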

Abstract

Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs), as they are able to better reduce spectral variation in the input signal. This has also been confirmed experimentally, with CNNs showing improvements in word error rate (WER) of 4-12% relative compared to DNNs across a variety of LVCSR tasks. In this paper, we describe different methods to further improve CNN performance. First, we conduct a deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features. Second, we apply various pooling strategies that have shown improvements in computer vision to an LVCSR speech task. Third, we introduce a method to effectively incorporate speaker adaptation, namely fMLLR, into log-mel features. Fourth, we introduce an effective strategy to use dropout during Hessian-free sequence training. We find that with these…
