Generative Speech Enhancement Based on Cloned Networks

Michael Chinen; W. Bastiaan Kleijn; Felicia S. C. Lim; Jan Skoglund

arXiv:1909.04776 · eess.AS · September 12, 2019 · 1 citation

TL;DR

This paper introduces a novel generative speech enhancement method using cloned networks to extract noise-robust salient features, resulting in high-quality, natural-sounding speech restoration that surpasses existing techniques.

Contribution

The paper presents a new cloned network architecture for robust feature extraction and a generative WaveNet-based system for speech enhancement, achieving state-of-the-art results.

Findings

01. Outperforms existing systems across all SNR ranges

02. Produces natural speech with fewer artifacts in noisy conditions

03. Achieves state-of-the-art performance in MUSHRA-like tests

Abstract

We propose to implement speech enhancement by the regeneration of clean speech from a salient representation extracted from the noisy signal. The network that extracts salient features is trained using a set of weight-sharing clones of the extractor network. The clones receive mel-frequency spectra of different noisy versions of the same speech signal as input. By encouraging the outputs of the clones to be similar for these different input signals, we train a feature extractor network that is robust to noise. At inference, the salient features form the input to a WaveNet network that generates a natural and clean speech signal with the same attributes as the ground-truth clean signal. As the signal becomes noisier, our system produces natural-sounding errors that stay on the speech manifold, in place of the traditional artifacts found in other systems. Our experiments confirm that our…
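The clone training idea in the abstract can be illustrated with a toy sketch. The excerpt does not state the actual loss or network architecture, so everything below is an assumption made for illustration: the extractor is a single linear layer, the "mel-frequency spectrum" is a random 8-value frame, the noise is additive Gaussian, and the similarity term is the mean squared deviation of each clone's output from the clone average. The names `extract`, `clone_similarity_loss`, and `make_noisy_versions` are hypothetical.

```python
import random

def extract(weights, frame):
    """Shared feature extractor: one linear layer (hypothetical stand-in)."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]

def clone_similarity_loss(weights, noisy_frames):
    """Mean squared deviation of each clone's output from the clone average.

    Every clone uses the same `weights`, which is what weight sharing means here.
    """
    outputs = [extract(weights, frame) for frame in noisy_frames]
    n_clones, dim = len(outputs), len(outputs[0])
    mean = [sum(o[d] for o in outputs) / n_clones for d in range(dim)]
    total = sum((o[d] - mean[d]) ** 2 for o in outputs for d in range(dim))
    return total / (n_clones * dim)

def make_noisy_versions(clean_frame, n_clones, noise_std, rng):
    """Create several differently corrupted copies of one clean frame (assumed Gaussian noise)."""
    return [[x + rng.gauss(0.0, noise_std) for x in clean_frame]
            for _ in range(n_clones)]

rng = random.Random(0)
clean = [rng.random() for _ in range(8)]  # toy stand-in for one mel-spectrum frame
weights = [[rng.gauss(0.0, 0.5) for _ in range(8)] for _ in range(3)]

noisy = make_noisy_versions(clean, n_clones=4, noise_std=0.1, rng=rng)
loss = clone_similarity_loss(weights, noisy)                  # positive: clones disagree
identical_loss = clone_similarity_loss(weights, [clean] * 4)  # about zero: clones agree
```

Minimizing such a term over the shared weights pushes the extractor to ignore whatever differs between the noisy versions. A similarity term on its own is also minimized by an extractor that outputs a constant, so a practical system needs further terms or constraints to keep the features informative; the excerpt above does not describe them.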

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing