Noisy Training Improves E2E ASR for the Edge

Dilin Wang; Yuan Shangguan; Haichuan Yang; Pierce Chuang; Jiatong Zhou; Meng Li; Ganesh Venkatesh; Ozlem Kalinli; Vikas Chandra

arXiv:2107.04677 · cs.CL · July 13, 2021 · 1 citation

TL;DR

This paper introduces a noisy training strategy that adds random noise to model parameters during training, leading to improved generalization and significant WER reductions in end-to-end ASR models on LibriSpeech data.

Contribution

The paper proposes a simple noisy training method that improves the generalization of E2E ASR models and yields consistent WER reductions on both dense and sparse Emformer models, on top of existing regularization techniques.

Findings

01 Achieved 12% WER reduction on LibriSpeech Test-other with sparse models.

02 Achieved 14% WER reduction on LibriSpeech Test-clean with sparse models.

03 Consistent improvements across different Emformer models.

Abstract

Automatic speech recognition (ASR) has become increasingly ubiquitous on modern edge devices. Past work developed streaming End-to-End (E2E) all-neural speech recognizers that can run compactly on edge devices. However, E2E ASR models are prone to overfitting and have difficulties in generalizing to unseen testing data. Various techniques have been proposed to regularize the training of ASR models, including layer normalization, dropout, spectrum data augmentation and speed distortions in the inputs. In this work, we present a simple yet effective noisy training strategy to further improve the E2E ASR model training. By introducing random noise to the parameter space during training, our method can produce smoother models at convergence that generalize better. We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction.…
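
The core idea in the abstract, adding random noise to the parameters during training, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the noise distribution, its scale, or where in the training loop it is applied. The sketch below assumes zero-mean Gaussian noise with a fixed standard deviation, evaluates the gradient at the perturbed weights, and applies the update to the clean weights. The names `noisy_sgd_step`, `mse_grad`, and `fit_line`, and the toy least-squares problem, are hypothetical and chosen only for illustration.

```python
import random


def noisy_sgd_step(weights, grad_fn, lr=0.1, noise_std=0.01, rng=random):
    """One SGD step with the gradient evaluated at perturbed weights.

    Assumption: zero-mean Gaussian noise with a fixed std. The abstract
    does not state the paper's actual noise distribution or schedule.
    """
    # Sample a perturbed copy of the parameters, used for this step only.
    noisy = [w + rng.gauss(0.0, noise_std) for w in weights]
    # The gradient is computed at the perturbed point...
    grads = grad_fn(noisy)
    # ...but the update is applied to the clean (unperturbed) parameters.
    return [w - lr * g for w, g in zip(weights, grads)]


def mse_grad(data):
    """Return a gradient function for least squares on y ~ w[0] * x + w[1]."""
    def grad_fn(w):
        n = len(data)
        g0 = sum(2 * (w[0] * x + w[1] - y) * x for x, y in data) / n
        g1 = sum(2 * (w[0] * x + w[1] - y) for x, y in data) / n
        return [g0, g1]
    return grad_fn


def fit_line(data, steps=500, lr=0.1, noise_std=0.01, seed=0):
    """Fit a line with noisy training on a toy dataset (illustrative only)."""
    rng = random.Random(seed)
    grad_fn = mse_grad(data)
    w = [0.0, 0.0]
    for _ in range(steps):
        w = noisy_sgd_step(w, grad_fn, lr=lr, noise_std=noise_std, rng=rng)
    return w


# Toy data drawn from y = 3x + 1 on x in [-1, 1].
toy_data = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(-10, 11)]
slope, intercept = fit_line(toy_data)
```

On this toy problem the fitted slope and intercept land close to 3 and 1, with a small residual jitter from the injected noise. In a real E2E ASR setup the same pattern would be applied to the network's weight tensors, and the noise scale would be a hyperparameter to tune; the abstract gives no value for it.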

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing