Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

TL;DR
This paper introduces a noise-robust speech representation learning method that combines contrastive learning with a reconstruction module, substantially improving ASR performance in noisy environments. The reconstruction module is used only during pre-training and is not needed at inference.
Contribution
It proposes a novel multi-task continual pre-training framework that enhances noise robustness of speech representations using a reconstruction module alongside contrastive learning.
Findings
Reduces word error rate (WER) by around 4.1/7.5% on noisy LibriSpeech test sets
Achieves state-of-the-art performance on CHiME-4 noisy speech recognition
Performs comparably to supervised methods with only 16% of the labeled data
Abstract
Noise robustness is essential for deploying automatic speech recognition (ASR) systems in real-world environments. One way to reduce the effect of noise interference is to employ a preprocessing module that conducts speech enhancement, and then feed the enhanced speech to an ASR backend. In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. We propose to combine a reconstruction module with contrastive learning and perform multi-task continual pre-training on noisy data. The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference. Experiments demonstrate the effectiveness of our proposed method. Our model substantially reduces the…
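The multi-task objective described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the InfoNCE-style contrastive term with cosine similarity, the L1 reconstruction term, and the names `temperature` and `recon_weight` are all hypothetical choices made here for clarity, since the summary does not specify the exact losses or weighting.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def contrastive_loss(context, target, distractors, temperature=0.1):
    """InfoNCE-style loss for one frame (assumed form): pick the true
    target out of a set that also contains distractor vectors."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]

def reconstruction_loss(predicted, clean):
    """Mean absolute error between reconstructed and clean features
    (L1 is an assumption; the summary does not name the loss)."""
    return sum(abs(p - c) for p, c in zip(predicted, clean)) / len(clean)

def multitask_loss(context, target, distractors, predicted, clean,
                   recon_weight=1.0):
    """Contrastive term plus a weighted auxiliary reconstruction term.
    recon_weight is a hypothetical hyperparameter."""
    return (contrastive_loss(context, target, distractors)
            + recon_weight * reconstruction_loss(predicted, clean))

# Toy example: a context vector aligned with its target, one distractor.
loss = multitask_loss(
    context=[1.0, 0.0], target=[1.0, 0.0], distractors=[[0.0, 1.0]],
    predicted=[1.0, 2.0], clean=[1.0, 4.0], recon_weight=0.5,
)
```

Because the reconstruction term only shapes the representation during pre-training, a sketch like this would drop the reconstruction branch at inference and keep just the encoder, in line with the abstract's statement that the module is not required at that stage.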
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Test · Contrastive Learning
