Generative Speech Enhancement Based on Cloned Networks
Michael Chinen, W. Bastiaan Kleijn, Felicia S. C. Lim, Jan Skoglund

TL;DR
This paper introduces a generative speech enhancement method that uses weight-sharing cloned networks to extract noise-robust salient features, from which a WaveNet regenerates high-quality, natural-sounding speech that surpasses existing techniques in listening tests.
Contribution
The paper presents a new cloned network architecture for robust feature extraction and a generative WaveNet-based system for speech enhancement, achieving state-of-the-art results.
Findings
Outperforms existing systems across all tested SNR ranges
Produces natural-sounding speech with fewer artifacts in noisy conditions
Achieves state-of-the-art performance in MUSHRA-like listening tests
Abstract
We propose to implement speech enhancement by regenerating clean speech from a salient representation extracted from the noisy signal. The network that extracts salient features is trained using a set of weight-sharing clones of the extractor network. The clones receive mel-frequency spectra of different noisy versions of the same speech signal as input. By encouraging the outputs of the clones to be similar for these different input signals, we train a feature extractor network that is robust to noise. At inference, the salient features form the input to a WaveNet network that generates a natural and clean speech signal with the same attributes as the ground-truth clean signal. As the signal becomes noisier, our system produces natural-sounding errors that stay on the speech manifold, in place of the traditional artifacts found in other systems. Our experiments confirm that our…
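The clone-training idea in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. `mix_at_snr`, `extractor`, and `clone_consistency_loss` are hypothetical names. A single tanh layer applied to a raw frame stands in for both the mel-spectrum front end and the extractor network. The loss (mean squared deviation of each clone's output from the clones' mean output) is one plausible way to "encourage the outputs of the clones to be similar", not necessarily the one the paper uses.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean power / scaled-noise power = 10^(snr_db/10), then add."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]


def extractor(weights, frame):
    """Hypothetical stand-in for the feature extractor: one tanh layer.

    Every clone calls this with the SAME `weights`; that is what weight sharing means.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]


def clone_consistency_loss(weights, noisy_versions):
    """Mean squared distance of each clone's output from the mean output over clones.

    Assumed loss form; small when the shared extractor maps different noisy
    versions of the same speech to similar features.
    """
    outputs = [extractor(weights, v) for v in noisy_versions]  # one clone per input
    k, dim = len(outputs), len(outputs[0])
    mean = [sum(o[d] for o in outputs) / k for d in range(dim)]
    return sum((o[d] - mean[d]) ** 2 for o in outputs for d in range(dim)) / (k * dim)


if __name__ == "__main__":
    random.seed(0)
    dim_in, dim_out = 16, 4
    clean = [math.sin(0.3 * t) for t in range(dim_in)]  # toy "speech" frame
    # Three noisy versions of the same clean frame at different SNRs (dB).
    noisy = [mix_at_snr(clean, [random.gauss(0, 1) for _ in range(dim_in)], snr)
             for snr in (0, 10, 20)]
    weights = [[random.gauss(0, 0.3) for _ in range(dim_in)] for _ in range(dim_out)]
    print(clone_consistency_loss(weights, noisy))
```

Because every clone evaluates `extractor` with the same `weights`, a gradient step on this loss updates a single shared network. Note that this term alone is minimized by any constant output (all-zero weights give a loss of exactly 0), so a practical system needs an additional term or constraint that keeps the features informative about the speech; the abstract does not say how the paper handles this.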
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
