Exploring WavLM on Speech Enhancement
Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

TL;DR
This paper investigates the effectiveness of WavLM, a self-supervised speech model, for speech enhancement, proposing new training methods and demonstrating improved performance especially with limited fine-tuning data.
Contribution
It introduces a regression-based training objective and noise-mixing data configuration for WavLM, enhancing speech enhancement performance in low-resource scenarios.
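To make the idea of noise mixing concrete, here is a minimal sketch of how a clean utterance can be combined with a noise clip at a chosen signal-to-noise ratio (SNR). This summary does not describe the paper's actual data configuration, so this is a generic illustration rather than the authors' recipe: the function names `mix_at_snr` and `measured_snr_db`, the looping of short noise clips, and the 5 dB target are all assumptions made for the example.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech` so that the speech-to-noise power ratio is `snr_db`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Loop the noise clip if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # Choose the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def measured_snr_db(speech: np.ndarray, mixture: np.ndarray) -> float:
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = mixture - speech
    return float(10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2)))


# Usage with synthetic signals standing in for real audio (assumed 16 kHz, 1 s).
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
mixture = mix_at_snr(speech, noise, snr_db=5.0)
```

In a training pipeline, a step like this would typically be applied on the fly with randomly drawn noise clips and SNR values, so the model sees a different noisy version of each utterance in every epoch.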
Findings
WavLM improves speech quality and recognition accuracy in enhancement tasks.
The proposed methods boost performance especially with limited fine-tuning data.
In high-resource conditions, the word error rate is reduced significantly.
Abstract
Interest in self-supervised learning approaches for end-to-end speech encoding has surged in recent years, as they have achieved great success. In particular, WavLM has shown state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work we design and conduct a series of experiments under three resource conditions by combining WavLM with two high-quality speech enhancement systems. We also propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost the downstream enhancement performance. The experiments on the DNS challenge dataset and a simulation dataset show that WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially for low fine-tuning resources. For the high…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
