A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition

Stefan Braun; Daniel Neil; Shih-Chii Liu

arXiv:1606.06864 · cs.CL · September 19, 2016 · 1 citation

TL;DR

This paper introduces a curriculum learning approach called accordion annealing (ACCAN), combined with per-epoch noise mixing (PEM), to improve noise robustness in automatic speech recognition, reducing word error rates (WER) in noisy conditions by up to 31.4%.

Contribution

It proposes a curriculum training strategy (ACCAN) and an online noise mixing method (PEM) that improve noise robustness without adding components to the recognition system.

Findings

1. ACCAN reduces WER by up to 31.4% in noisy environments.
2. The methods outperform conventional multi-condition training.
3. The methods are shown to be effective on the Wall Street Journal (WSJ) corpus.

Abstract

The performance of automatic speech recognition systems in noisy environments still leaves room for improvement. Speech enhancement or feature enhancement techniques for increasing the noise robustness of these systems usually add components to the recognition system that need careful optimization. In this work, we propose the use of a relatively simple curriculum training strategy called accordion annealing (ACCAN). It uses a multi-stage training schedule where samples at signal-to-noise ratio (SNR) values as low as 0 dB are added first, and samples at increasingly higher SNR values are gradually added up to an SNR value of 50 dB. We also use a method called per-epoch noise mixing (PEM) that generates noisy training samples online during training and thus enables dynamically changing the SNR of our training data. Both the ACCAN and the PEM methods are evaluated on an end-to-end speech…
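The two training ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`accan_stages`, `mix_at_snr`, `pem_epoch`) are hypothetical, the 5 dB spacing between SNR levels and the fixed number of epochs per stage are assumptions, and the noise is a single synthetic recording. ACCAN is modeled as a list of growing SNR sets that starts at 0 dB and widens toward 50 dB; PEM is modeled as re-mixing every clean utterance with a freshly drawn SNR and noise segment at each epoch, using the usual power-ratio definition SNR = 10 log10(P_clean / P_noise).

```python
import math
import random
from typing import List, Sequence

# Hypothetical helper names; a sketch of the ideas in the abstract, not the authors' code.


def accan_stages(snr_low: int = 0, snr_high: int = 50, step: int = 5) -> List[List[int]]:
    """Return the SNR set (dB) for each ACCAN-style stage.

    Stage 1 contains only the lowest SNR; each later stage adds the next
    higher SNR until the full snr_low..snr_high range is covered.
    The 5 dB step is an assumption.
    """
    levels = list(range(snr_low, snr_high + 1, step))
    return [levels[: i + 1] for i in range(len(levels))]


def power(x: Sequence[float]) -> float:
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               snr_db: float, rng: random.Random) -> List[float]:
    """Add a randomly chosen noise segment to `clean`, scaled to reach `snr_db`."""
    if len(noise) < len(clean):
        raise ValueError("noise recording must be at least as long as the utterance")
    start = rng.randrange(len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    noise_power = power(segment)
    if noise_power == 0.0:
        raise ValueError("selected noise segment is silent")
    # Solve P_clean / (gain**2 * P_noise) = 10**(snr_db / 10) for gain.
    gain = math.sqrt(power(clean) / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, segment)]


def pem_epoch(utterances: Sequence[Sequence[float]], noise: Sequence[float],
              snr_set: Sequence[float], rng: random.Random) -> List[List[float]]:
    """Per-epoch noise mixing: a fresh SNR and noise segment per utterance, every epoch."""
    return [mix_at_snr(u, noise, rng.choice(snr_set), rng) for u in utterances]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[math.sin(0.01 * k * t) for t in range(1600)] for k in range(1, 4)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    epochs_per_stage = 2  # hypothetical; the stage-switching rule is not given here
    for stage, snrs in enumerate(accan_stages(), start=1):
        for _ in range(epochs_per_stage):
            noisy_batch = pem_epoch(clean, noise, snrs, rng)
            # train_one_epoch(model, noisy_batch)  # placeholder for an ASR training step
        print(f"stage {stage}: SNRs {snrs} dB")
```

Because the noisy copies are regenerated every epoch rather than precomputed, the same clean utterance is seen at different SNRs and with different noise segments across epochs, which is what allows the SNR range to change from stage to stage without rebuilding the training set.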

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing