ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo

arXiv:1808.05092 · stat.ML · October 13, 2020 · 48 cites

TL;DR

This paper introduces ACVAE-VC, a non-parallel many-to-many voice conversion method that uses an auxiliary classifier VAE with fully convolutional networks and information-theoretic regularization, so that the attribute class label keeps its effect on the converted voice characteristics.

Contribution

It presents a novel ACVAE-based approach with convolutional architectures and information-theoretic regularization for effective non-parallel voice conversion.

Findings

1. Effective non-parallel many-to-many voice conversion demonstrated.

2. Convolutional architectures capture time dependencies in speech.

3. Regularization ensures attribute class information is preserved.

Abstract

This paper proposes a non-parallel many-to-many voice conversion (VC) method using a variant of the conditional variational autoencoder (VAE) called an auxiliary classifier VAE (ACVAE). The proposed method has three key features. First, it adopts fully convolutional architectures to construct the encoder and decoder networks so that the networks can learn conversion rules that capture time dependencies in the acoustic feature sequences of source and target speech. Second, it uses an information-theoretic regularization for the model training to ensure that the information in the attribute class label will not be lost in the conversion process. With regular CVAEs, the encoder and decoder are free to ignore the attribute class label input. This can be problematic since in such a situation, the attribute class label will have little effect on controlling the voice characteristics of input…
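
The second feature in the abstract can be made concrete with a small sketch. The code below is not the authors' implementation: it only illustrates the general idea of adding an auxiliary classifier term to a conditional VAE loss so that the class label cannot be ignored. The function names, the squared-error reconstruction term, the single weight `lam`, and the assumption that `clf_logits` are a classifier's outputs on the decoder's reconstruction are all hypothetical choices made for illustration.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over all elements."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def acvae_style_loss(x, x_recon, mu, log_var, clf_logits, label, lam=1.0):
    """Hypothetical per-utterance loss: CVAE terms plus a classifier penalty.

    x, x_recon : acoustic feature sequence and its reconstruction (same shape)
    mu, log_var: encoder outputs parameterizing the latent Gaussian
    clf_logits : auxiliary classifier logits computed on the decoder output
    label      : integer attribute class (e.g. target speaker index)
    lam        : weight of the classifier term (an assumed hyperparameter)
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2)  # Gaussian reconstruction term
    kl = kl_to_standard_normal(mu, log_var)   # latent prior regularizer
    aux = cross_entropy(clf_logits, label)    # penalizes ignoring the label
    return recon + kl + lam * aux
```

If the decoder ignores the class label, the classifier cannot recover it from the decoder output and the `aux` term stays large, which is the failure mode the abstract attributes to regular CVAEs.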

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Auxiliary Classifier