Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation
Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng

TL;DR
This paper introduces a unified neural network model that combines speech enhancement and separation with gradient modulation to improve noise robustness in monaural speech separation, achieving state-of-the-art results on noisy datasets.
Contribution
The paper proposes a novel unified network with gradient modulation for joint speech enhancement and separation, enhancing noise robustness in monaural speech separation.
Findings
Achieves state-of-the-art SI-SNRi on Libri2Mix-noisy and Libri3Mix-noisy datasets.
Demonstrates improved noise robustness over existing methods.
Validates effectiveness through extensive experiments.
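For readers unfamiliar with the metric named above, SI-SNRi is the gain in scale-invariant signal-to-noise ratio of a separated estimate over the unprocessed mixture, both measured against the same reference source. The sketch below uses the standard definition for a single source; the function names are illustrative, and it omits the permutation search over speakers that a multi-speaker evaluation normally needs. It is not taken from the paper's code.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D waveforms."""
    # Remove the DC offset so the measure ignores constant shifts.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" part.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Whatever is left after the projection counts as noise.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the measure is called scale-invariant.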
Abstract
Recent studies in neural network-based monaural speech separation (SS) have achieved remarkable success thanks to the increasing ability of long-sequence modeling. However, they degrade significantly under realistic noisy conditions, as the background noise can be mistaken for a speaker's speech and thus interfere with the separated sources. To alleviate this problem, we propose a novel network that unifies speech enhancement and separation with gradient modulation to improve noise robustness. Specifically, we first build a unified network by combining speech enhancement (SE) and separation modules, optimized with multi-task learning, where SE is supervised by the parallel clean mixture to reduce noise for downstream speech separation. Furthermore, in order to avoid suppressing valid speaker information when reducing noise, we propose a gradient modulation (GM) strategy to…
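The multi-task setup described in the abstract can be written, as a rough sketch, as a weighted sum of the two task losses. The weight $\lambda$ and the symbols below are assumptions introduced for illustration; the excerpt does not state the exact form of either loss, and the GM strategy is not modeled here because its description is cut off above.

```latex
% Assumed notation (not from the source):
%   \hat{x}: SE module output,  x_{\mathrm{clean}}: parallel clean mixture
%   \hat{s}_k, s_k: separated and reference sources for speaker k
%   \lambda: hypothetical weight balancing the two tasks
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{SS}}\bigl(\{\hat{s}_k\}, \{s_k\}\bigr)
  + \lambda \, \mathcal{L}_{\mathrm{SE}}\bigl(\hat{x}, x_{\mathrm{clean}}\bigr)
```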
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
