Investigating self-supervised learning for speech enhancement and separation

Zili Huang; Shinji Watanabe; Shu-wen Yang; Paola Garcia; Sanjeev Khudanpur

arXiv:2203.07960 · eess.AS · March 16, 2022 · ICASSP

TL;DR

This paper evaluates 13 self-supervised learning (SSL) methods on speech enhancement and separation, shows that some of them outperform traditional features such as STFT magnitude and FBANK, and analyzes both the challenges of applying SSL to these tasks and the representation properties the tasks require.

Contribution

The paper provides a comprehensive evaluation of SSL methods on speech enhancement and separation, shows that some SSL representations are effective for these tasks, and analyzes the key factors that affect how well SSL can be applied to them.

Findings

01. Some SSL representations consistently outperform baseline features such as STFT magnitude and FBANK.

02. An analysis of the factors that make existing SSL frameworks difficult to apply to speech enhancement and separation.

03. A discussion of the representation properties that are desirable for enhancement and separation.

Abstract

Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise while speech separation extracts target speech from interfering speakers. Despite a great number of supervised learning-based enhancement and separation methods having been proposed and achieving good performance, studies on applying self-supervised learning (SSL) to enhancement and separation are limited. In this paper, we evaluate 13 SSL upstream methods on speech enhancement and separation downstream tasks. Our experimental results on Voicebank-DEMAND and Libri2Mix show that some SSL representations consistently outperform baseline features including the short-time Fourier transform (STFT) magnitude and log Mel filterbank (FBANK). Furthermore, we analyze the factors that make existing SSL frameworks difficult to apply to speech enhancement and…
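The abstract compares SSL representations against two baseline input features, STFT magnitude and log Mel filterbank (FBANK). The sketch below shows, in plain NumPy, roughly how those two baselines are computed from a waveform. It is an illustration under assumed settings, not the paper's configuration: the 16 kHz sample rate, 25 ms window (400 samples), 10 ms hop (160 samples), 512-point FFT, 80 mel bands, HTK-style mel formula, and the function names `stft_magnitude`, `mel_filterbank`, and `log_mel` are all hypothetical choices made here.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=160, win_len=400):
    """Magnitude STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop  # assumes len(x) >= win_len; no padding
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    # rfft zero-pads each windowed frame from win_len to n_fft samples
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale (assumed)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=512, sr=16000):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge frequencies, equally spaced on the mel scale from 0 Hz to Nyquist
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bin_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)  # center frequency of each rfft bin
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at center
    return fb

def log_mel(x, sr=16000, n_fft=512, hop=160, win_len=400, n_mels=80):
    """Log mel filterbank (FBANK) features, shape (frames, n_mels)."""
    power = stft_magnitude(x, n_fft, hop, win_len) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)

# Usage: 1 s of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
mag = stft_magnitude(x)
fbank = log_mel(x, sr=sr)
print(mag.shape, fbank.shape)  # (98, 257) (98, 80)
```

Both outputs are frame-by-feature matrices, which is the form in which an SSL upstream's frame-level representation would be compared against them. Toolkit-specific details such as pre-emphasis, dithering, or filter normalization are deliberately left out of this sketch.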

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques