Self-supervised Learning for Speech Enhancement

Yu-Che Wang; Shrikant Venkataramani; Paris Smaragdis

arXiv:2006.10388 · eess.AS · June 19, 2020 · 20 citations

TL;DR

This paper introduces a self-supervised learning approach for speech enhancement that leverages autoencoding and shared latent representations, eliminating the need for labeled noisy-clean speech pairs.

Contribution

The paper presents a novel self-supervised training schema that enables speech enhancement without labeled training data or human intervention.

Findings

01. Self-supervised autoencoding effectively maps noisy speech to clean speech.

02. The approach reduces dependency on labeled datasets for speech enhancement.

03. The paper demonstrates an autonomous training process for speech enhancement networks.

Abstract

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the conditions on the training data, we consider the task of training speech enhancement networks in a self-supervised manner. We first use a limited training set of clean speech sounds and learn a latent representation by autoencoding on their magnitude spectrograms. We then autoencode on speech mixtures recorded in noisy environments and train the resulting autoencoder to share a latent representation with the clean examples. We show that using this training schema, we can now map noisy speech to its clean version using a network that is autonomously trainable without requiring labeled training examples or human intervention.
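
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a toy NumPy illustration, not the authors' implementation: the one-layer ReLU autoencoders, the synthetic tones standing in for speech, the latent cycle term in stage 2, and every name and hyperparameter (`magnitude_frames`, `train_clean`, `train_noisy`, `enhance`, `lam`, `k`) are assumptions made for this example. The abstract only states that the noisy autoencoder is trained to share a latent representation with the clean one; the exact losses are not given there.

```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_frames(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per Hann-windowed frame, scaled by its max."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return mag / mag.max()

def relu(z):
    return np.maximum(z, 0.0)

def init(n_in, n_out):
    return 0.1 * rng.standard_normal((n_in, n_out))

def train_clean(X, k=8, steps=1000, lr=0.02):
    """Stage 1: autoencode clean magnitude frames (plain gradient descent)."""
    n, f = X.shape
    enc, dec = init(f, k), init(k, f)
    losses = []
    for _ in range(steps):
        Z = X @ enc
        H = relu(Z)                          # latent code
        R = H @ dec - X                      # reconstruction residual
        losses.append(np.mean(np.sum(R ** 2, axis=1)))
        g_dec = 2.0 / n * H.T @ R
        g_enc = 2.0 / n * X.T @ ((R @ dec.T) * (Z > 0))
        enc -= lr * g_enc
        dec -= lr * g_dec
    return enc, dec, losses

def train_noisy(X, enc_c, dec_c, steps=1000, lr=0.02, lam=1.0):
    """Stage 2: autoencode noisy frames; the clean pair is frozen (never updated).

    Assumed loss: noisy reconstruction + lam * latent cycle term, where the
    latent is decoded by the clean decoder and re-encoded by the clean encoder.
    """
    n, f = X.shape
    enc, dec = init(f, enc_c.shape[1]), init(enc_c.shape[1], f)
    losses = []
    for _ in range(steps):
        Z = X @ enc
        H = relu(Z)
        R = H @ dec - X                      # noisy reconstruction residual
        Z2 = (H @ dec_c) @ enc_c             # clean decode, then clean re-encode
        C = relu(Z2) - H                     # latent cycle residual
        losses.append(np.mean(np.sum(R ** 2, axis=1))
                      + lam * np.mean(np.sum(C ** 2, axis=1)))
        g_H = (2.0 / n * (R @ dec.T)
               + 2.0 * lam / n * (((C * (Z2 > 0)) @ enc_c.T) @ dec_c.T - C))
        g_dec = 2.0 / n * H.T @ R
        g_enc = X.T @ (g_H * (Z > 0))
        enc -= lr * g_enc
        dec -= lr * g_dec
    return enc, dec, losses

def enhance(X_noisy, enc_n, dec_c):
    """Inference: noisy encoder into the latent space, clean decoder out."""
    return relu(X_noisy @ enc_n) @ dec_c

# Unpaired toy data: the clean set and the noisy set come from different signals.
t = np.arange(4000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
other = np.sin(2 * np.pi * 660 * t) * (1 + 0.5 * np.sin(2 * np.pi * 5 * t))
noisy = other + 0.3 * rng.standard_normal(t.size)

X_clean = magnitude_frames(clean)
X_noisy = magnitude_frames(noisy)
enc_c, dec_c, loss_c = train_clean(X_clean)
enc_n, dec_n, loss_n = train_noisy(X_noisy, enc_c, dec_c)
X_hat = enhance(X_noisy, enc_n, dec_c)       # estimated clean magnitudes
```

The point of the sketch is the wiring: no paired noisy-clean examples are used at any stage, and enhancement is obtained by routing the noisy encoder's latent code through the decoder learned from clean speech. It makes no claim about enhancement quality; that depends on the network architecture and loss terms defined in the paper itself.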

Code & Models

Repositories

jeffreyjeffreywang/SSE (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing