Training Noisy Single-Channel Speech Separation With Noisy Oracle Sources: A Large Gap and A Small Step

Matthew Maciejewski; Jing Shi; Shinji Watanabe; Sanjeev Khudanpur

arXiv:2010.12430·eess.AS·February 23, 2021

TL;DR

This paper shows that training single-channel speech separation models on mixtures of noisy speech significantly degrades performance, and proposes an SI-SDR-inspired training objective that exploits the relative inseparability of noise to discount noise in the oracle sources.

Contribution

It introduces a novel SI-SDR-inspired training objective that exploits noise inseparability to enhance training with noisy oracle sources.
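For context, the standard scale-invariant signal-to-distortion ratio (SI-SDR) that the proposed objective takes as its starting point can be sketched as follows. This is the conventional SI-SDR only, not the paper's modified objective, whose exact form is not given on this page. The function name `si_sdr` and the `eps` stabilizer are illustrative choices.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Standard SI-SDR in dB between a 1-D estimate and a 1-D reference.

    Assumption: this is the conventional definition, not the paper's
    noise-aware variant.
    """
    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference: the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference

    # Everything in the estimate that the scaled reference does not explain.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))
```

When the reference is itself noisy, the noise it contains counts as part of the target in this measure, which is why training against noisy oracle sources asks the network to reproduce noise as well as speech.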

Findings

01. The proposed method improves separation quality in noisy conditions.

02. Training with noisy oracle sources becomes more effective using the new objective.

03. Noise is relatively inseparable, and training on mixtures of noisy speech significantly degrades system performance.

Abstract

As the performance of single-channel speech separation systems has improved, there has been a desire to move to more challenging conditions than the clean, near-field speech that initial systems were developed on. When training deep learning separation models, a need for ground truth leads to training on synthetic mixtures. As such, training in noisy conditions requires either using noise synthetically added to clean speech, preventing the use of in-domain data for a noisy-condition task, or training using mixtures of noisy speech, requiring the network to additionally separate the noise. We demonstrate the relative inseparability of noise and that this noisy speech paradigm leads to significant degradation of system performance. We also propose an SI-SDR-inspired training objective that tries to exploit the inseparability of noise to implicitly partition the signal and discount noise…
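The two training paradigms contrasted in the abstract can be illustrated with a minimal sketch. The function name `make_training_pairs` and its arguments are hypothetical, and the noisy recordings are simulated here as clean speech plus noise purely for illustration; with real in-domain noisy recordings the clean and noise components are not available separately, which is the source of the problem.

```python
import numpy as np


def make_training_pairs(clean_a, clean_b, noise_a, noise_b):
    """Build (mixture, targets) pairs under the two paradigms.

    Assumption: all inputs are equal-length 1-D arrays; names are illustrative.
    """
    # Paradigm 1: noise synthetically added to clean speech.
    # The targets are clean, but the data is not in-domain noisy speech.
    synthetic_mix = clean_a + clean_b + noise_a
    synthetic_targets = np.stack([clean_a, clean_b])

    # Paradigm 2: mixture of already-noisy recordings.
    # The only available "oracle" targets carry their own noise.
    noisy_a = clean_a + noise_a  # stands in for a real noisy recording
    noisy_b = clean_b + noise_b  # stands in for a real noisy recording
    noisy_mix = noisy_a + noisy_b
    noisy_targets = np.stack([noisy_a, noisy_b])

    return (synthetic_mix, synthetic_targets), (noisy_mix, noisy_targets)
```

In the second paradigm the targets sum exactly to the mixture, so a network trained on them is rewarded for assigning each noise component to the correct speaker.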
