Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion

Wen-Chin Huang; Yi-Chiao Wu; Kazuhiro Kobayashi; Yu-Huai Peng; Hsin-Te Hwang; Patrick Lumban Tobing; Yu Tsao; Hsin-Min Wang; Tomoki Toda

arXiv:1907.11898·eess.AS·July 30, 2019·SSW·1 cites

Open Access

TL;DR

This paper introduces a flexible waveform modification method for voice conversion that eliminates the need for retraining spectral differential models, improving quality and generalizability across models.

Contribution

It proposes a novel F0 transformation-based DIFFVC framework that simplifies waveform generation and enhances compatibility with various spectral conversion models.

Findings

01. Outperforms vocoder-based baseline in quality

02. Compatible with non-parallel VAE spectral conversion

03. Eliminates retraining of spectral differential models

Abstract

We present a modification to the spectrum differential based direct waveform modification for voice conversion (DIFFVC) so that it can be directly applied as a waveform generation module to voice conversion models. The recently proposed DIFFVC avoids the use of a vocoder while preserving rich spectral details, and is therefore capable of generating high-quality converted voice. To apply the DIFFVC framework, a model that can estimate the spectral differential from the F0-transformed input speech needs to be trained beforehand. This requirement imposes several constraints, including a limitation of the estimation model to parallel training and the need for extra training on each conversion pair, which make DIFFVC inflexible. Motivated by these constraints, we propose a new DIFFVC framework based on an F0 transformation in the residual domain. By performing inverse filtering on the input signal…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing