Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation

Yuchen Hu; Chen Chen; Heqing Zou; Xionghu Zhong; Eng Siong Chng

arXiv:2302.11131 · eess.AS · February 23, 2023 · 1 cites

TL;DR

This paper introduces a unified neural network that combines speech enhancement and separation, trained with gradient modulation, to improve noise robustness in monaural speech separation. It reports state-of-the-art SI-SNRi on the Libri2Mix-noisy and Libri3Mix-noisy datasets.

Contribution

The paper proposes a novel unified network with gradient modulation for joint speech enhancement and separation, enhancing noise robustness in monaural speech separation.

Findings

01. Achieves state-of-the-art SI-SNRi on Libri2Mix-noisy and Libri3Mix-noisy datasets.

02. Demonstrates improved noise robustness over existing methods.

03. Validates effectiveness through extensive experiments.
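
The first finding is stated in terms of SI-SNRi (scale-invariant signal-to-noise ratio improvement). As a rough illustration of what that metric measures, here is a minimal sketch using the standard SI-SNR definition. It is written in plain Python for readability and is not taken from the paper's repository; the function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are assumptions made for this example.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Illustrative helper (name and eps are assumptions, not the paper's code).
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to get the "signal" component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]

    # Whatever is left after the projection counts as "noise".
    residual = [e - p for e, p in zip(est, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


if __name__ == "__main__":
    # Toy signals, chosen only to exercise the functions.
    target = [1.0, -1.0, 1.0, -1.0]
    mixture = [1.5, -0.5, 0.5, -1.5]    # target plus a strong interferer
    estimate = [1.1, -0.9, 0.9, -1.1]   # target plus a weak residual
    print(si_snr_improvement(estimate, mixture, target))
```

Because the estimate is projected onto the target before energies are compared, rescaling the estimate leaves the score unchanged, which is why the metric suits separation models whose output gain is arbitrary.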

Abstract

Recent studies in neural network-based monaural speech separation (SS) have achieved remarkable success thanks to the increasing ability of long-sequence modeling. However, they degrade significantly under realistic noisy conditions, as the background noise can be mistaken for a speaker's speech and thus interfere with the separated sources. To alleviate this problem, we propose a novel network that unifies speech enhancement and separation with gradient modulation to improve noise robustness. Specifically, we first build a unified network by combining speech enhancement (SE) and separation modules, with multi-task learning for optimization, where SE is supervised by the parallel clean mixture to reduce noise for downstream speech separation. Furthermore, in order to avoid suppressing valid speaker information when reducing noise, we propose a gradient modulation (GM) strategy to…
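
The multi-task setup mentioned in the abstract (a separation objective plus an enhancement objective supervised by the clean mixture) could plausibly be written as a weighted sum of two losses. The form below is only a sketch under stated assumptions: the symbols $\hat{s}_c$ (separated outputs), $s_c$ (clean sources), $\hat{x}$ (SE output), $x$ (clean mixture), the speaker count $C$, the weight $\lambda$, and the reading of "clean mixture" as the sum of clean sources are all introduced here for illustration and are not confirmed by the excerpt. The gradient modulation step is not shown because the excerpt truncates before describing it.

```latex
% Assumed notation, not taken from the paper:
%   \hat{s}_c : separated output for speaker c,  s_c : clean source c
%   \hat{x}   : SE module output,                x   : clean (noise-free) mixture
%   \lambda   : hypothetical weight balancing the two tasks
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{SS}}\bigl(\hat{s}_1,\dots,\hat{s}_C;\; s_1,\dots,s_C\bigr)
  + \lambda\,\mathcal{L}_{\text{SE}}\bigl(\hat{x};\; x\bigr),
\qquad
x = \sum_{c=1}^{C} s_c
```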

Code & Models

Repositories

yuchen005/unified-enhance-separation (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research