Continual self-training with bootstrapped remixing for speech enhancement

Efthymios Tzinis; Yossi Adi; Vamsi K. Ithapu; Buye Xu; Anurag Kumar

arXiv:2110.10103·cs.SD·November 14, 2022

TL;DR

RemixIT is a self-supervised speech enhancement method in which a student model is trained on remixes of a teacher's estimated speech and noise signals, while the teacher is periodically updated from the student. It outperforms several previous state-of-the-art self-supervised methods.

Contribution

The paper presents RemixIT, a self-training approach that requires neither assumptions about the in-domain noise distribution nor access to clean target signals, and that can be applied to various separation tasks.

Findings

01. RemixIT outperforms several previous state-of-the-art self-supervised methods.

02. It is effective for semi-supervised and unsupervised domain adaptation.

03. It is applicable to any separation model and task.

Abstract

We propose RemixIT, a simple and novel self-supervised training method for speech enhancement. The proposed method is based on a continuous self-training scheme that overcomes limitations of previous studies, including assumptions about the in-domain noise distribution and the need for access to clean target signals. Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures. Next, we bootstrap the mixing process by generating artificial mixtures using permuted estimated clean and noise signals. Finally, the student model is trained using the permuted estimated sources as targets while we periodically update the teacher's weights using the latest student model. Our experiments show that RemixIT outperforms several previous state-of-the-art self-supervised methods under multiple speech…
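The remixing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_teacher`, `bootstrapped_remix`, `update_teacher`, the scalar "weights", and the `momentum` parameter are all hypothetical names and simplifications introduced here, and the real method uses neural separation models rather than a scalar gain.

```python
import numpy as np

def toy_teacher(mixtures, weight):
    """Hypothetical stand-in for a separation model (assumption): splits each
    mixture into a speech estimate and a noise estimate that sum to the input."""
    speech = weight * mixtures
    noise = mixtures - speech
    return speech, noise

def bootstrapped_remix(speech_est, noise_est, rng):
    """Permute the noise estimates across the batch and add them back to the
    speech estimates, producing new artificial mixtures and their targets."""
    perm = rng.permutation(noise_est.shape[0])
    permuted_noise = noise_est[perm]
    new_mixtures = speech_est + permuted_noise
    return new_mixtures, speech_est, permuted_noise

def update_teacher(teacher_weight, student_weight, momentum=0.0):
    """Assumed update rule: momentum=0 copies the student into the teacher;
    a value in (0, 1) blends the two as a moving average."""
    return momentum * teacher_weight + (1.0 - momentum) * student_weight

rng = np.random.default_rng(0)
mixtures = rng.standard_normal((4, 16000))  # batch of 4 in-domain mixtures

# 1) The teacher infers speech and noise estimates for the batch.
speech_est, noise_est = toy_teacher(mixtures, weight=0.7)

# 2) Bootstrapped remixing: the student would take new_mix as input and be
#    trained to recover target_speech and target_noise.
new_mix, target_speech, target_noise = bootstrapped_remix(speech_est, noise_est, rng)
```

In a full training loop, the student's loss would be computed against `target_speech` and `target_noise`, and `update_teacher` would be called at some interval with the student's latest parameters.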

Code & Models

Repositories

etzinis/unsup_speech_enh_adaptation (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques