A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data

Xiaohai Tian; Eng Siong Chng; Haizhou Li

arXiv:1902.03705·eess.AS·September 18, 2019·6 cites

TL;DR

This paper introduces a vocoder-free voice conversion method in which WaveNet maps Phonetic PosteriorGrams (PPGs) directly to speech waveform samples, avoiding the estimation errors of vocoder analysis and feature conversion when training on non-parallel data.

Contribution

The paper proposes a vocoder-free approach in which WaveNet converts Phonetic PosteriorGrams (PPGs) directly to waveform samples, avoiding vocoder-related estimation errors and reducing the feature mismatch found in WaveNet-vocoder-based approaches.

Findings

1. Significantly better speech quality than the baseline methods

2. Effective non-parallel training with PPGs

3. Reduces estimation errors caused by vocoders

Abstract

In a typical voice conversion system, a vocoder is commonly used for speech-to-features analysis and features-to-speech synthesis. However, the vocoder can be a source of speech quality degradation. This paper presents a vocoder-free voice conversion approach using WaveNet for non-parallel training data. Instead of dealing with intermediate features, the proposed approach uses WaveNet to map Phonetic PosteriorGrams (PPGs) directly to waveform samples. In this way, we avoid the estimation errors caused by the vocoder and by feature conversion. Additionally, as PPGs are assumed to be speaker independent, the proposed method also reduces the feature mismatch problem in WaveNet-vocoder-based approaches. Experimental results on the CMU-ARCTIC database show that the proposed approach significantly outperforms the baseline approaches in terms of speech quality.
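To make the PPG-to-waveform idea concrete, here is a minimal sketch of how frame-level PPGs could condition a WaveNet-style layer. It is not the authors' implementation: the function names, the two-dimensional toy PPGs, the hop size of 4, and all weights are assumptions chosen for illustration. It uses the standard conditional gated activation from WaveNet, tanh(filter) multiplied by sigmoid(gate), with a single scalar channel and kernel size 2. A real model would add many channels, stacked dilations, residual and skip connections, and an output distribution over waveform samples.

```python
import math

def upsample_ppg(ppg_frames, hop):
    """Repeat each frame-level PPG vector `hop` times to reach sample rate.
    (Repetition is an assumed upsampling scheme, chosen for simplicity.)"""
    return [frame for frame in ppg_frames for _ in range(hop)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_causal_layer(x, cond, dilation, w_f, w_g, v_f, v_g):
    """One scalar-channel gated dilated causal convolution (kernel size 2).

    x    : waveform samples seen so far, one float per time step
    cond : sample-rate conditioning vectors (upsampled PPGs)
    w_*  : (weight on x[t - dilation], weight on x[t]) for filter / gate
    v_*  : per-dimension conditioning weights for filter / gate
    Output at step t depends only on x[0..t], so it can be used to
    predict sample t + 1.
    """
    out = []
    for t in range(len(x)):
        # Zero left-padding: no access to samples before the start.
        past = x[t - dilation] if t >= dilation else 0.0
        c_f = sum(v * c for v, c in zip(v_f, cond[t]))
        c_g = sum(v * c for v, c in zip(v_g, cond[t]))
        f = math.tanh(w_f[0] * past + w_f[1] * x[t] + c_f)
        g = sigmoid(w_g[0] * past + w_g[1] * x[t] + c_g)
        out.append(f * g)
    return out

# Toy example: 2 PPG frames with 2 phonetic classes each, 4 samples per frame.
ppg = [[0.9, 0.1], [0.2, 0.8]]
cond = upsample_ppg(ppg, hop=4)
x = [0.0, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.1]
y = gated_causal_layer(x, cond, dilation=2,
                       w_f=(0.5, -0.3), w_g=(0.2, 0.4),
                       v_f=[0.1, -0.1], v_g=[0.3, 0.2])
```

Because the conditioning input is the PPG sequence rather than vocoder features, no vocoder analysis or synthesis step appears anywhere in this path, which is the property the abstract emphasizes.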

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing