Handling Background Noise in Neural Speech Generation

Tom Denton; Alejandro Luebs; Felicia S. C. Lim; Andrew Storus; Hengchin Yeh; W. Bastiaan Kleijn; Jan Skoglund

arXiv:2102.11906·eess.AS·February 25, 2021

TL;DR

This paper investigates the impact of background noise on neural speech generation and proposes a denoising preprocessing step to improve model robustness in noisy conditions.

Contribution

The paper places a denoising preprocessing stage in front of feature extraction and trains the generative model against clean speech targets, so that neural speech models hold up better in noisy environments.

Findings

01 Denoising preprocessing significantly improves speech quality in noisy conditions.

02 Training with clean speech targets enhances model robustness.

03 Preprocessing is the most effective strategy among tested methods.

Abstract

Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Placing a denoising preprocessing stage before feature extraction and targeting clean speech during training is shown to be the best-performing strategy.
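
The strategy named in the last sentence of the abstract can be illustrated with a minimal sketch of how one training pair might be assembled: the conditioning features come from a denoised version of the noisy input, while the training target stays the clean waveform. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the 5-tap moving average standing in for a real denoiser, the log-magnitude spectral features, and the frame sizes are all hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled to the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def placeholder_denoiser(signal):
    """Hypothetical stand-in for a real denoiser: a 5-tap moving average."""
    kernel = np.ones(5) / 5
    return np.convolve(signal, kernel, mode="same")


def log_spectral_features(signal, frame_len=320, hop=160):
    """Log-magnitude spectrum per windowed frame (assumed feature type)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-6)


def make_training_pair(clean, noise, snr_db):
    """Build (conditioning features, target waveform) for one example."""
    noisy = mix_at_snr(clean, noise, snr_db)
    # Denoise first, then extract the features that condition the generator.
    conditioning = log_spectral_features(placeholder_denoiser(noisy))
    # The generative model is trained to reproduce the clean signal.
    target = clean
    return conditioning, target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 220 * t)   # synthetic stand-in for speech
    noise = rng.standard_normal(16000)    # synthetic background noise
    features, target = make_training_pair(clean, noise, snr_db=10.0)
    print(features.shape, target.shape)
```

In a real system the placeholder denoiser would be replaced by a trained enhancement model and the features by whatever representation the codec actually transmits. The point of the sketch is only the data flow: noise is removed before feature extraction, and the noise never appears in the training target.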

Code & Models

Repositories

google/lyra