VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Kyungguen Byun, Jason Filos, Erik Visser, and Sunkuk Moon

arXiv:2409.06126 · eess.AS · September 11, 2024

Open Access

TL;DR

This paper introduces VC-ENHANCE, a two-stage speech restoration framework that applies noise suppression followed by voice conversion. The approach improves output speech quality and extends bandwidth, at the cost of slightly lower intelligibility.

Contribution

The paper presents a novel diffusion-based voice conversion approach that restores speech after noise suppression, together with a content encoder adapted to noisy environments.

Findings

01 Outperforms single-stage models in speech quality metrics

02 Achieves bandwidth extension, de-reverberation, and in-painting

03 Slightly lower speech intelligibility but improved overall quality

Abstract

Noise suppression (NS) algorithms are effective in improving speech quality in many cases. However, aggressive noise suppression can damage the target speech, reducing both speech intelligibility and quality despite removing the noise. This study proposes an explicit speech restoration method using a voice conversion (VC) technique for restoration after noise suppression. We observed that high-quality speech can be restored through a diffusion-based voice conversion stage, conditioned on the target speaker embedding and speech content information extracted from the de-noised speech. This speech restoration can achieve enhancement effects such as bandwidth extension, de-reverberation, and in-painting. Our experimental results demonstrate that this two-stage NS+VC framework outperforms single-stage enhancement models in terms of output speech quality, as measured by objective metrics,…
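The data flow described in the abstract (suppress noise, extract content from the de-noised speech, then generate speech conditioned on that content and a target speaker embedding) can be sketched as follows. This is only an illustration of the staging, not the authors' implementation: the class `RestorationPipeline` and the three `toy_*` functions are hypothetical names, and the toy generator is a trivial stand-in for the diffusion-based voice conversion stage, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]   # mono samples (assumed representation)
Features = List[float]   # content features or speaker embedding (assumed)


@dataclass
class RestorationPipeline:
    """Two-stage NS+VC data flow; every component is a hypothetical stand-in."""
    suppress_noise: Callable[[Waveform], Waveform]
    encode_content: Callable[[Waveform], Features]
    generate_speech: Callable[[Features, Features], Waveform]

    def restore(self, noisy: Waveform, speaker_embedding: Features) -> Waveform:
        # Stage 1: noise suppression on the noisy input.
        denoised = self.suppress_noise(noisy)
        # Content information is extracted from the de-noised speech.
        content = self.encode_content(denoised)
        # Stage 2: generation conditioned on content and target speaker.
        return self.generate_speech(content, speaker_embedding)


def toy_noise_gate(samples: Waveform, threshold: float = 0.1) -> Waveform:
    # Placeholder for a real NS model: zero out low-amplitude samples.
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def toy_content_encoder(samples: Waveform) -> Features:
    # Placeholder for a content encoder: keep only the sign of each sample.
    return [1.0 if s > 0 else -1.0 if s < 0 else 0.0 for s in samples]


def toy_generator(content: Features, speaker_embedding: Features) -> Waveform:
    # Placeholder for the diffusion-based VC stage: scale the content by a
    # gain derived from the speaker embedding.
    gain = sum(speaker_embedding) / len(speaker_embedding)
    return [gain * c for c in content]


pipeline = RestorationPipeline(toy_noise_gate, toy_content_encoder, toy_generator)
restored = pipeline.restore([0.05, 0.5, -0.3, -0.02], speaker_embedding=[0.5, 0.5])
```

Keeping the three components as injected callables mirrors the point that the second stage sees only the de-noised content and the speaker embedding, so each stage could be swapped independently in such a sketch.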

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing