TL;DR
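This paper examines why neural generative speech coding models degrade when the input contains background noise, and finds that a denoising preprocessing stage at feature extraction, combined with clean speech targets during training, is the best performing of the strategies it discusses.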
Contribution
It places a denoising preprocessing stage at feature extraction and trains the model to target clean speech, so that neural speech models hold up better on noisy input.
Findings
Denoising preprocessing improves speech quality in noisy conditions.
Training with clean speech targets enhances model robustness.
Preprocessing is the most effective strategy among tested methods.
Abstract
Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Placing a denoising preprocessing stage at feature extraction and targeting clean speech during training is shown to be the best performing strategy.
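The strategy named in the abstract can be illustrated with a minimal sketch of how one training pair might be assembled: conditioning features are computed from a denoised version of the noisy input, while the training target stays the clean signal. Everything here is an assumption made for illustration and is not taken from the paper: the helper names (`mix_at_snr`, `placeholder_denoise`, `frame_log_energy`, `make_training_pair`), the moving-average stand-in for a real denoiser, the toy log-energy feature, and the choice to run the denoiser before feature computation.

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale noise so the clean-to-noise power ratio equals snr_db, then add it."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

def placeholder_denoise(signal, width=5):
    """Hypothetical stand-in denoiser (moving average), NOT the paper's method."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def frame_log_energy(signal, frame_len=160):
    """Toy feature (assumption): log energy of non-overlapping frames."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        feats.append(math.log(sum(x * x for x in frame) + 1e-9))
    return feats

def make_training_pair(clean, noise, snr_db):
    """Features come from the denoised noisy input; the target is the clean signal."""
    noisy = mix_at_snr(clean, noise, snr_db)
    features = frame_log_energy(placeholder_denoise(noisy))
    return features, clean

# Synthetic example: a 200 Hz tone at 16 kHz sampling plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
features, target = make_training_pair(clean, noise, snr_db=10)
```

In this sketch the model never sees noisy audio as a target, which is the point of the pairing; a real system would swap in an actual speech denoiser and the coder's own feature set.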
