Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

Chien-Chun Wang; Li-Wei Chen; Hung-Shin Lee; Berlin Chen; Hsin-Min Wang

arXiv:2409.01545 · cs.SD · September 4, 2024

TL;DR

This paper introduces a noise-aware data simulation approach that uses GANs and dynamic stochastic perturbation to improve cross-domain speech enhancement, especially under unseen noise conditions, and reports gains over existing data simulation baselines on the VoiceBank-DEMAND benchmark.

Contribution

The paper proposes a novel noise-aware data simulation method leveraging noise embeddings and dynamic stochastic perturbation for better domain adaptation in speech enhancement.

Findings

01. Outperforms existing data simulation baselines on VoiceBank-DEMAND

02. Effectively generalizes to unseen noise conditions

03. Enhances speech quality in cross-domain scenarios

Abstract

Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited target noisy speech data. Notably, our method employs a noise encoder to extract noise embeddings from target-domain data. These embeddings aptly guide the generator to synthesize utterances acoustically fitted to the target domain while authentically preserving the phonetic content of the input clean speech. Furthermore, we introduce the notion of dynamic stochastic perturbation, which can inject controlled perturbations into the noise embeddings during inference, thereby enabling the model to…
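The abstract's idea of injecting controlled perturbations into a noise embedding at inference time can be illustrated with a minimal sketch. This is not the authors' implementation: the additive zero-mean Gaussian form, the function name `perturb_embedding`, the `scale` parameter, and the four-dimensional example embedding are all assumptions chosen for illustration, since the truncated abstract does not specify how the perturbation is defined.

```python
import random
from typing import List, Optional


def perturb_embedding(
    embedding: List[float],
    scale: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a copy of `embedding` with zero-mean Gaussian noise added.

    ASSUMPTION: additive Gaussian noise is one plausible reading of
    "controlled perturbation"; the paper may define it differently.
    `scale` is the standard deviation of the noise and acts as the control knob.
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    if rng is None:
        rng = random.Random()
    # Perturb every dimension independently; the input list is left untouched.
    return [x + rng.gauss(0.0, scale) for x in embedding]


# Hypothetical noise embedding, standing in for a noise encoder's output.
noise_embedding = [0.12, -0.53, 0.88, 0.05]

# Draw several perturbed variants from one target-domain embedding.
rng = random.Random(0)  # seeded so the sketch is reproducible
variants = [perturb_embedding(noise_embedding, scale=0.1, rng=rng) for _ in range(3)]
```

Under these assumptions, each variant could condition a generator separately, so a single target-domain embedding yields several slightly different simulated noise conditions. Setting `scale` to 0 returns the embedding unchanged.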

Code & Models

Repositories

JethroWangSir/NADA-GAN
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis