Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

Yi Li; Yang Sun; Syed Mohsen Naqvi

arXiv:2112.11459·cs.SD·January 2, 2022

Open Access

TL;DR

This paper introduces a multi-task pre-training approach for self-supervised monaural speech enhancement, leveraging limited clean speech data and multiple pre-tasks to improve denoising performance on reverberant mixtures.

Contribution

It proposes a novel multi-task pre-training framework combining a pre-training autoencoder and a downstream autoencoder for enhanced speech denoising.

Findings

01  Outperforms state-of-the-art speech enhancement methods

02  Effective with limited clean speech data

03  Improves denoising on unseen reverberant mixtures

Abstract

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve speech enhancement performance with self-supervised learning. Within the pre-training autoencoder (PAE), only a limited set of clean speech signals is required to learn their latent representations. Meanwhile, to overcome the limitation of a single pre-task, the proposed masking module exploits the dereverberated mask and the estimated ratio mask to denoise the mixture as the second pre-task. Unlike the PAE, where the target speech signals are estimated, the downstream task autoencoder (DAE) utilizes a large number of unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The trained DAE is shared by the learned…
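To make the masking idea in the abstract concrete, the following is a minimal sketch of time-frequency masking on magnitude spectrograms. It is not the authors' masking module: the function names `ideal_ratio_mask` and `apply_masks`, the textbook ratio-mask formula, and the choice to combine the two masks by element-wise multiplication are all assumptions made for illustration. In the paper the masks are estimated by a network, whereas here the ratio mask is computed from known speech and noise magnitudes.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Textbook ratio mask per time-frequency bin: sqrt(S^2 / (S^2 + N^2)).

    Hypothetical helper; the paper's estimated ratio mask is predicted by a
    network rather than computed from ground-truth magnitudes.
    """
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))


def apply_masks(mixture_mag, dereverb_mask, ratio_mask):
    """Apply a dereverberation mask and a ratio mask to a mixture magnitude.

    Assumption: both masks are applied by element-wise multiplication.
    """
    return mixture_mag * dereverb_mask * ratio_mask


# Toy spectrogram with one frame and two frequency bins.
speech = np.array([[3.0, 0.0]])
noise = np.array([[4.0, 1.0]])
mixture = np.array([[5.0, 1.0]])        # stand-in mixture magnitudes
no_dereverb = np.ones_like(mixture)     # identity mask: no dereverberation

irm = ideal_ratio_mask(speech, noise)
enhanced = apply_masks(mixture, no_dereverb, irm)
```

In this toy case the bin that contains only noise receives a mask value of zero and is removed entirely, while the bin that contains both speech and noise is attenuated in proportion to the speech share of its energy.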

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies