Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, and Tomoki Toda

arXiv:2111.07116 · cs.SD · November 16, 2021 · 1 citation

TL;DR

This paper improves a noisy-to-noisy (N2N) voice conversion framework that converts speaker identity while preserving background sounds. Its VC module models the noisy speech waveform directly, which addresses the distortion introduced by the denoising module and yields significantly higher naturalness scores than the previous framework.

Contribution

The paper proposes an improved VC module for the noisy-to-noisy voice conversion framework that directly models noisy speech, improving the naturalness and speaker similarity of the converted speech while preserving background sounds.

Findings

01. Significant improvement over the previous framework in naturalness scores

02. Speaker similarity comparable to the upper bound

03. Background sounds are effectively preserved in noisy-to-noisy conversion

Abstract

Beyond conventional voice conversion (VC), where the speaker information is converted without altering the linguistic content, background sounds are informative and need to be retained in some real-world scenarios, such as VC in movies/videos and VC in music, where the voice is entangled with background sounds. As a new VC framework, we have developed a noisy-to-noisy (N2N) VC framework to convert the speaker's identity while preserving the background sounds. Although our framework, consisting of a denoising module and a VC module, handles the background sounds well, the VC module is sensitive to the distortion caused by the denoising module. To address this distortion issue, in this paper we propose an improved VC module that directly models the noisy speech waveform while controlling the background sounds. The experimental results have demonstrated that our improved framework…
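The two-module data flow described in the abstract (a denoising module followed by a VC module) can be illustrated with a minimal sketch. This is not the authors' implementation and does not model the improved VC module. The function name `n2n_convert`, the residual-based background estimate, and the additive recombination are all assumptions made for illustration, and the denoiser and converter are passed in as hypothetical placeholder callables.

```python
from typing import Callable, List

Waveform = List[float]


def n2n_convert(
    noisy: Waveform,
    denoise: Callable[[Waveform], Waveform],
    convert: Callable[[Waveform], Waveform],
) -> Waveform:
    """Toy noisy-to-noisy pipeline (illustrative assumption, not the paper's code).

    1. Estimate the speech with a denoiser.
    2. Treat the residual (noisy minus estimated speech) as the background.
    3. Convert the estimated speech.
    4. Add the background back onto the converted speech.
    """
    speech_est = denoise(noisy)
    # Assumption: the background is whatever the denoiser removed.
    background_est = [n - s for n, s in zip(noisy, speech_est)]
    # Any distortion in speech_est is passed straight into the converter here.
    converted = convert(speech_est)
    # Assumption: background and converted speech are recombined additively.
    return [c + b for c, b in zip(converted, background_est)]


# Placeholder modules, purely hypothetical stand-ins for trained models.
def toy_denoise(noisy: Waveform) -> Waveform:
    return [0.5, 1.5, 2.0]


def toy_convert(speech: Waveform) -> Waveform:
    return [2.0 * s for s in speech]


result = n2n_convert([1.0, 2.0, 3.0], toy_denoise, toy_convert)
```

In this toy flow the converter only ever sees the denoiser's output, which is one way to picture why a VC module in such a cascade would be sensitive to denoising distortion.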

Code & Models

Repositories

chaoxiefs/n2nvc (Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing