TL;DR
RemixIT is a self-supervised speech enhancement method that uses bootstrapped remixing and iterative self-training to improve performance without relying on clean in-domain signals, effectively handling domain mismatch.
Contribution
It introduces a novel self-training scheme with remixing for speech enhancement that does not require clean target signals and can adapt across domains.
Findings
Outperforms prior speech enhancement methods on various datasets
Works with any separation model and extends to domain adaptation tasks
Student models improve despite degraded pseudo-targets
Abstract
We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without the need for a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods, which depend on clean in-domain target signals and are thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets, which are used to train the student network. Vice versa, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on…
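The remixing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_teacher`, `bootstrapped_remix`, and `refresh_teacher` are hypothetical names, the teacher is a trivial stand-in for a pre-trained separation model, and the weighted-average teacher refresh (with `gamma` as a made-up parameter) is only one assumed way to reuse student parameters, since the abstract does not specify the update rule.

```python
import numpy as np


def toy_teacher(mixtures):
    """Hypothetical stand-in for a pre-trained teacher model.

    Returns a (speech_estimate, noise_estimate) pair whose sum equals
    the input mixture. A real teacher would be a neural separation model.
    """
    speech_est = 0.7 * mixtures
    noise_est = mixtures - speech_est
    return speech_est, noise_est


def bootstrapped_remix(speech_est, noise_est, rng):
    """Permute the noise estimates across the batch and remix.

    speech_est, noise_est: arrays of shape (batch, samples).
    Returns the bootstrapped mixtures plus the pseudo-targets
    (speech and permuted noise) that a student would be trained on.
    """
    perm = rng.permutation(speech_est.shape[0])
    permuted_noise = noise_est[perm]
    new_mixtures = speech_est + permuted_noise
    return new_mixtures, speech_est, permuted_noise


def refresh_teacher(teacher_params, student_params, gamma=0.0):
    """Assumed teacher refresh: weighted average of parameter dicts.

    gamma=0.0 copies the student outright; values in (0, 1) keep part
    of the old teacher. This update rule is an illustration only.
    """
    return {
        name: gamma * teacher_params[name] + (1.0 - gamma) * student_params[name]
        for name in teacher_params
    }


rng = np.random.default_rng(0)
in_domain_mixtures = rng.standard_normal((4, 16))  # 4 noisy clips, 16 samples each
speech_est, noise_est = toy_teacher(in_domain_mixtures)
student_inputs, target_speech, target_noise = bootstrapped_remix(speech_est, noise_est, rng)
```

In this sketch the student would take `student_inputs` as input and regress onto `target_speech` and `target_noise`; no clean in-domain signal enters at any point, only the teacher's estimates.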
