A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition

Qiu-Shi Zhu; Jie Zhang; Zi-Qiang Zhang; Ming-Hui Wu; Xin Fang; Li-Rong Dai

arXiv:2201.08930·eess.AS·May 10, 2022

TL;DR

This paper introduces an enhanced wav2vec2.0 model that improves noise robustness in speech recognition by leveraging both noisy and clean speech during training, achieving better performance in noisy environments with minimal impact on clean speech recognition.

Contribution

The paper proposes a novel noise-robust pre-training approach for wav2vec2.0 that uses both noisy and clean speech to improve ASR performance under noisy conditions.

Findings

1. Improved ASR accuracy on noisy test sets.
2. Minimal performance loss on clean test sets.
3. Effective across various noise types.

Abstract

Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It was shown that wav2vec2.0 has a good robustness against the domain shift, while the noise robustness is still unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments. We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, in this work we propose an enhanced wav2vec2.0 model. Specifically, the noisy speech and the corresponding clean version are fed into the same feature encoder, where the clean speech provides training targets for the model. Experimental results reveal that the proposed method can not only…
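
The idea of letting the clean speech provide training targets for the noisy input can be illustrated with a toy sketch. This is not the authors' implementation: the `encode` function is a hypothetical stand-in for the shared feature encoder, the additive Gaussian noise is an assumed way of producing the noisy version, and the InfoNCE-style loss is a simplified contrastive objective that omits the masking, quantization, and Transformer context network of wav2vec2.0. It only shows the pairing: each noisy frame is pulled toward the clean frame at the same time step, with the other clean frames acting as distractors.

```python
import math
import random


def encode(wave, frame=4):
    """Toy stand-in (assumption) for the shared feature encoder.

    Maps each non-overlapping frame to a 2-d feature: [mean, energy].
    """
    feats = []
    for i in range(0, len(wave) - frame + 1, frame):
        chunk = wave[i:i + frame]
        mean = sum(chunk) / frame
        energy = sum(x * x for x in chunk) / frame
        feats.append([mean, energy])
    return feats


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def contrastive_loss(noisy_feats, clean_feats, temperature=0.1):
    """InfoNCE-style loss with clean frames as targets.

    For noisy frame t, the positive is clean frame t; all other clean
    frames are distractors. Returns the mean loss over frames.
    """
    total = 0.0
    for t, query in enumerate(noisy_feats):
        logits = [cosine(query, key) / temperature for key in clean_feats]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[t]
    return total / len(noisy_feats)


# Synthetic clean/noisy pair (assumption: additive Gaussian noise).
rng = random.Random(0)
clean = [math.sin(0.3 * n) for n in range(64)]
noisy = [x + rng.gauss(0.0, 0.1) for x in clean]

# Both versions pass through the same encoder.
clean_feats = encode(clean)
noisy_feats = encode(noisy)

loss = contrastive_loss(noisy_feats, clean_feats)
print(f"frames: {len(noisy_feats)}, loss: {loss:.4f}")
```

In a real system the encoder would be a trained network and the loss would be minimized by gradient descent; here the loss is only evaluated once to show how the noisy and clean streams are paired.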
