A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition
Stefan Braun, Daniel Neil, Shih-Chii Liu

TL;DR
This paper introduces a curriculum learning approach called accordion annealing combined with per-epoch noise mixing to enhance noise robustness in automatic speech recognition, significantly reducing word error rates in noisy conditions.
Contribution
It proposes a novel curriculum training strategy and online noise mixing method that improve noise robustness without complex system modifications.
Findings
ACCAN reduces word error rate (WER) by up to 31.4% in noisy environments.
The methods outperform conventional multi-condition training.
The methods prove effective on the Wall Street Journal corpus.
Abstract
The performance of automatic speech recognition systems in noisy environments still leaves room for improvement. Speech enhancement or feature enhancement techniques for increasing the noise robustness of these systems usually add components to the recognition system that need careful optimization. In this work, we propose the use of a relatively simple curriculum training strategy called accordion annealing (ACCAN). It uses a multi-stage training schedule in which samples at signal-to-noise ratio (SNR) values as low as 0dB are added first, and samples at increasingly higher SNR values are then gradually added, up to an SNR value of 50dB. We also use a method called per-epoch noise mixing (PEM) that generates noisy training samples online during training and thus enables dynamically changing the SNR of our training data. Both the ACCAN and the PEM methods are evaluated on an end-to-end speech…
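The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mix_at_snr` and `accordion_stages`, the 5 dB step size, and the random choice of noise segment are all assumptions made for illustration. The only details taken from the abstract are that noisy samples are generated online at a chosen SNR and that the set of SNR values starts at 0dB and grows upward to 50dB.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db, rng):
    """Add a randomly chosen noise segment to `speech` at the requested SNR (dB).

    Hypothetical helper: assumes 1-D float arrays and non-silent noise.
    """
    # Repeat the noise if it is shorter than the utterance.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)

    # Pick a random segment so each epoch can see a different mixture.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(segment ** 2)

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * segment


def accordion_stages(snr_min=0, snr_max=50, step=5):
    """Yield the SNR values available at each curriculum stage.

    Assumed schedule: start with the lowest SNR only, then add one higher
    SNR level per stage. The 5 dB step is an illustrative guess.
    """
    levels = list(range(snr_min, snr_max + 1, step))
    for k in range(1, len(levels) + 1):
        yield levels[:k]
```

In a training loop built on this sketch, each epoch would draw an SNR from the current stage's list and call `mix_at_snr` again for every utterance, so the noisy data is regenerated online rather than stored. When and how to advance from one stage to the next is not specified in this excerpt and would need to come from the paper itself.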
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
