Training Noisy Single-Channel Speech Separation With Noisy Oracle Sources: A Large Gap and A Small Step
Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur

TL;DR
This paper addresses the challenge of training single-channel speech separation models in noisy conditions by proposing a new training objective that leverages the inseparability of noise, resulting in improved separation performance.
Contribution
It introduces a novel SI-SDR-inspired training objective that exploits noise inseparability to enhance training with noisy oracle sources.
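For orientation, the standard SI-SDR measure that the proposed objective takes as its starting point can be sketched as below. This is the widely used baseline definition only, not the paper's modified objective, whose exact form is not given on this page. The function name `si_sdr`, the `eps` stabilizer, and the random stand-in signals are illustrative assumptions.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Standard scale-invariant SDR in dB (baseline metric, not the paper's variant)."""
    # Remove DC offset so the projection is not biased by a constant term.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Scale the reference so that it best matches the estimate (least squares).
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything in the estimate not explained by the scaled reference.
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )

# Assumption: white noise stands in for one second of 16 kHz speech.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = 0.1 * rng.standard_normal(16000)

score = si_sdr(speech + noise, speech)
rescaled_score = si_sdr(3.0 * (speech + noise), speech)
```

Rescaling the estimate leaves the score essentially unchanged, which is the "scale-invariant" property. Note that when the reference itself contains noise, this baseline measure rewards reproducing that noise as well.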
Findings
The proposed method improves separation quality in noisy conditions.
Training with noisy oracle sources becomes more effective using the new objective.
Noise is relatively inseparable, and training on mixtures of noisy speech significantly degrades system performance.
Abstract
As the performance of single-channel speech separation systems has improved, there has been a desire to move to more challenging conditions than the clean, near-field speech that initial systems were developed on. When training deep learning separation models, a need for ground truth leads to training on synthetic mixtures. As such, training in noisy conditions requires either using noise synthetically added to clean speech, preventing the use of in-domain data for a noisy-condition task, or training using mixtures of noisy speech, requiring the network to additionally separate the noise. We demonstrate the relative inseparability of noise and that this noisy speech paradigm leads to significant degradation of system performance. We also propose an SI-SDR-inspired training objective that tries to exploit the inseparability of noise to implicitly partition the signal and discount noise…
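The two training-data paradigms contrasted in the abstract can be illustrated with a toy sketch. All signals here are random stand-ins, and the variable names and noise level are assumptions made for illustration. It is not the authors' data pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000  # assumption: one second at 16 kHz

# Hypothetical stand-ins for two clean utterances and two noise recordings.
speech_a, speech_b = rng.standard_normal(n), rng.standard_normal(n)
noise_a, noise_b = 0.3 * rng.standard_normal(n), 0.3 * rng.standard_normal(n)

# Paradigm 1: noise is added synthetically to clean speech.
# The training references are clean, but the data is not in-domain.
mixture_synth = speech_a + speech_b + noise_a
refs_clean = np.stack([speech_a, speech_b])

# Paradigm 2: two already-noisy recordings are mixed.
# The only available "oracle" references contain noise themselves.
noisy_a = speech_a + noise_a
noisy_b = speech_b + noise_b
mixture_noisy = noisy_a + noisy_b
refs_noisy = np.stack([noisy_a, noisy_b])

# What is left of each mixture after subtracting all of its references.
leftover_synth = mixture_synth - refs_clean.sum(axis=0)  # the added noise
leftover_noisy = mixture_noisy - refs_noisy.sum(axis=0)  # nothing
```

In the second paradigm the references sum exactly to the mixture, so a network trained against them is implicitly asked to assign each noise component to the correct speaker, which is the extra separation burden the abstract describes.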
