Study of GANs for Noisy Speech Simulation from Clean Speech
Leander Melroy Maben, Zixun Guo, Chen Chen, Utkarsh Chudiwal, Chng Eng Siong

TL;DR
This paper investigates several GAN architectures for simulating noisy speech from clean speech, with the goal of producing training data for speech models that must operate in noisy conditions. The proposed models improve Multi-Scale Spectral Loss (MSSL) over the baseline by up to 55.8%.
Contribution
It proposes four GAN models for noisy speech simulation: three non-parallel translators (SpeechAttentionGAN, SimuGAN, and MaskCycleGAN-Augment) and one parallel translator (Speech2Speech-Augment).
Findings
Achieved MSSL improvements over the baseline of 55.8% on RATS and 28.9% on TIMIT-Cabin, plus 22.8% on a third dataset.
Showed that GANs can simulate UHF/VHF noise, additive stationary and non-stationary noise, and codec distortion.
Improved speech model robustness in noisy conditions.
Abstract
The performance of speech processing models trained on clean speech drops significantly in noisy conditions. Training with noisy datasets alleviates the problem, but procuring such datasets is not always feasible. Noisy speech simulation models that generate noisy speech from clean speech help remedy this issue. In our work, we study the ability of Generative Adversarial Networks (GANs) to simulate a variety of noises. Noise from the Ultra-High-Frequency/Very-High-Frequency (UHF/VHF), additive stationary and non-stationary, and codec distortion categories are studied. We propose four GANs, including the non-parallel translators, SpeechAttentionGAN, SimuGAN, and MaskCycleGAN-Augment, and the parallel translator, Speech2Speech-Augment. We achieved improvements of 55.8%, 28.9%, and 22.8% in terms of Multi-Scale Spectral Loss (MSSL) as compared to the baseline for the RATS, TIMIT-Cabin, and…
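The abstract reports results as MSSL but does not define it. The following is a minimal sketch, assuming the commonly used formulation in which L1 distances between linear and log magnitude spectrograms are summed over several FFT sizes. The function names, FFT sizes, hop length, and equal weighting are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft, hop):
    """Magnitude STFT with a Hann window; frames are taken without padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(reference, estimate,
                              fft_sizes=(512, 256, 128, 64), eps=1e-7):
    """Sum over FFT sizes of mean L1 distance on linear and log magnitudes.

    Assumed formulation; both signals must have the same length, at least
    as long as the largest FFT size.
    """
    total = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4  # assumed 75% overlap
        s_ref = magnitude_spectrogram(reference, n_fft, hop)
        s_est = magnitude_spectrogram(estimate, n_fft, hop)
        linear_term = np.mean(np.abs(s_ref - s_est))
        log_term = np.mean(np.abs(np.log(s_ref + eps) - np.log(s_est + eps)))
        total += linear_term + log_term
    return float(total)

def relative_improvement(baseline_loss, model_loss):
    """Percentage reduction of a loss relative to a baseline."""
    return 100.0 * (baseline_loss - model_loss) / baseline_loss
```

Under this reading, a reported improvement such as 55.8% would correspond to `relative_improvement(baseline_mssl, model_mssl)`, where both losses are computed between simulated and real noisy speech. Lower MSSL means the simulated spectrum is closer to the real one.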
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
