VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
Kyungguen Byun, Jason Filos, Erik Visser, and Sunkuk Moon

TL;DR
This paper introduces VC-ENHANCE, a two-stage speech restoration framework that applies noise suppression followed by voice conversion. It improves speech quality and extends bandwidth, with a focus on balancing intelligibility against quality.
Contribution
The paper presents a diffusion-based voice conversion approach for restoring speech after noise suppression, including a content encoder adapted to noisy environments.
Findings
The two-stage NS+VC framework outperforms single-stage enhancement models on objective speech quality metrics
The restoration stage also achieves bandwidth extension, de-reverberation, and in-painting
Speech intelligibility is slightly lower, but overall quality improves
Abstract
Noise suppression (NS) algorithms are effective in improving speech quality in many cases. However, aggressive noise suppression can damage the target speech, reducing both speech intelligibility and quality despite removing the noise. This study proposes an explicit speech restoration method using a voice conversion (VC) technique for restoration after noise suppression. We observed that high-quality speech can be restored through a diffusion-based voice conversion stage, conditioned on the target speaker embedding and speech content information extracted from the de-noised speech. This speech restoration can achieve enhancement effects such as bandwidth extension, de-reverberation, and in-painting. Our experimental results demonstrate that this two-stage NS+VC framework outperforms single-stage enhancement models in terms of output speech quality, as measured by objective metrics,…
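The data flow described in the abstract (noise suppression first, then a voice conversion stage conditioned on a speaker embedding and on content extracted from the de-noised speech) can be sketched as below. This is only an illustration of the two-stage structure, not the authors' implementation: `restore`, `toy_suppress`, `toy_encode_content`, `toy_convert`, and `FRAME` are hypothetical names, and the toy stand-ins (an amplitude gate, per-frame RMS energy, a modulated sinusoid) merely take the place of the NS model, content encoder, and diffusion-based decoder.

```python
from typing import Callable

import numpy as np

FRAME = 160  # assumed frame length in samples (hypothetical, not from the paper)


def restore(
    noisy: np.ndarray,
    suppress: Callable[[np.ndarray], np.ndarray],
    encode_content: Callable[[np.ndarray], np.ndarray],
    convert: Callable[[np.ndarray, np.ndarray], np.ndarray],
    speaker_embedding: np.ndarray,
) -> np.ndarray:
    """Two-stage flow: NS, then VC conditioned on speaker and content."""
    denoised = suppress(noisy)                  # stage 1: noise suppression
    content = encode_content(denoised)          # content from de-noised speech
    return convert(content, speaker_embedding)  # stage 2: voice conversion


def toy_suppress(x: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # Stand-in for an NS model: zero out low-amplitude samples.
    return np.where(np.abs(x) >= threshold, x, 0.0)


def toy_encode_content(x: np.ndarray) -> np.ndarray:
    # Stand-in for a content encoder: one RMS energy value per frame.
    n = len(x) // FRAME
    frames = x[: n * FRAME].reshape(n, FRAME)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def toy_convert(content: np.ndarray, speaker_embedding: np.ndarray) -> np.ndarray:
    # Stand-in for the diffusion-based decoder: a sinusoid whose frequency
    # comes from the speaker embedding, scaled by the per-frame content.
    t = np.arange(len(content) * FRAME)
    carrier = np.sin(2.0 * np.pi * float(speaker_embedding[0]) * t)
    return np.repeat(content, FRAME) * carrier


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(10 * FRAME)
    out = restore(noisy, toy_suppress, toy_encode_content, toy_convert,
                  np.array([0.01]))
    print(out.shape)
```

Passing the three stages in as callables keeps the sketch neutral about which models fill each role; only the order of operations and the conditioning inputs are taken from the abstract.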
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
