Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

Chin-Cheng Hsu; Hsin-Te Hwang; Yi-Chiao Wu; Yu Tsao; Hsin-Min Wang

arXiv:1610.04019·stat.ML·October 14, 2016·23 cites

TL;DR

This paper introduces a variational auto-encoder based spectral conversion framework that effectively utilizes non-parallel corpora, overcoming the limitations of previous methods requiring aligned data.

Contribution

It presents a novel VAE-based spectral conversion approach that eliminates the need for parallel corpora or phonetic alignments, broadening practical applications.

Findings

01. Achieves comparable spectral conversion quality without aligned data.

02. Demonstrates effectiveness through objective and subjective evaluations.

03. Outperforms traditional methods requiring parallel corpora.

Abstract

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence, either to learn conversion functions or to synthesize a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC because parallel corpora are scarce or even unavailable. We propose an SC framework based on a variational auto-encoder, which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our…
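The encoder/decoder split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ToyVAE`, the dimensions, the single linear layers, the random untrained weights, and the choice to decode from the posterior mean at conversion time are all assumptions made for illustration. It only shows the data flow: the encoder sees the spectral frame but no speaker identity, and the decoder receives the latent code together with a one-hot speaker code, so conversion amounts to decoding with the target speaker's code.

```python
import math
import random

FEAT_DIM = 8     # hypothetical size of one spectral frame
LATENT_DIM = 4   # hypothetical size of the latent (phonetic) code
N_SPEAKERS = 2   # hypothetical number of speakers in the corpus


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w, v):
    """Matrix-vector product for list-based matrices."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyVAE:
    """Untrained linear stand-in for a speaker-conditioned VAE."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_mu = init_matrix(LATENT_DIM, FEAT_DIM, rng)
        self.w_logvar = init_matrix(LATENT_DIM, FEAT_DIM, rng)
        # The decoder input is the latent code concatenated with the speaker code.
        self.w_dec = init_matrix(FEAT_DIM, LATENT_DIM + N_SPEAKERS, rng)

    def encode(self, frame):
        """Return mean and log-variance of q(z | x); no speaker input here."""
        return matvec(self.w_mu, frame), matvec(self.w_logvar, frame)

    def sample(self, mu, logvar, rng):
        """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, speaker):
        """Reconstruct a frame from the latent code and a speaker identity."""
        return matvec(self.w_dec, z + one_hot(speaker, N_SPEAKERS))

    def convert(self, frame, target_speaker):
        """Encode a source frame, then decode it as the target speaker."""
        mu, _ = self.encode(frame)  # assumption: use the mean, not a sample
        return self.decode(mu, target_speaker)


model = ToyVAE()
frame = [0.1 * i for i in range(FEAT_DIM)]
as_spk0 = model.convert(frame, 0)  # same latent code, speaker 0
as_spk1 = model.convert(frame, 1)  # same latent code, speaker 1
```

Because the speaker identity enters only at the decoder, training such a model needs frames labeled with their own speaker but no paired or time-aligned utterances across speakers, which is the property the abstract emphasizes.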

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing