Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion

Weida Liang; Lantian Li; Wenqiang Du; Dong Wang

arXiv:2204.03847 · cs.SD · April 13, 2022 · 1 citation

TL;DR

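This paper introduces an enhanced exemplar autoencoder (eAE) with a cycle consistency loss for any-to-one voice conversion. It improves conversion quality by encouraging the latent code to stay the same when speech is decoded by any speaker-specific decoder and then encoded again, so that the code carries content rather than speaker information.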

Contribution

It proposes training the eAEs of multiple speakers with a shared encoder and a cycle consistency loss, which pushes the latent code to represent only content and thereby improves both content representation and reconstruction.

Findings

1. Consistent improvement over the baseline eAE in experiments
2. Effective at preserving content in voice conversion
3. Source code and examples provided

Abstract

Recent research showed that an autoencoder trained with speech of a single speaker, called exemplar autoencoder (eAE), can be used for any-to-one voice conversion (VC). Compared to large-scale many-to-many models such as AutoVC, the eAE model is easy and fast to train, and may recover more details of the target speaker. To ensure VC quality, the latent code should represent content information and only content information. However, this is not easy to attain for eAE, as it is unaware of any speaker variation during training. To tackle the problem, we propose a simple yet effective approach based on a cycle consistency loss. Specifically, we train eAEs of multiple speakers with a shared encoder, and meanwhile encourage the speech reconstructed from any speaker-specific decoder to yield a latent code consistent with that of the original speech when cycled back and encoded again. Experiments conducted on…
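The training setup described in the abstract (a shared encoder, one decoder per speaker, and a cycle consistency term) can be sketched as a toy example. This is a minimal illustration under stated assumptions, not the authors' released code: random linear maps stand in for the neural networks, random vectors stand in for mel-spectrogram frames, mean squared error is assumed for both losses, and the sizes, the weight `lam`, and the names `encode`, `decode`, `eae_losses` and `convert` are all hypothetical.

```python
import numpy as np

# Toy sketch (hypothetical sizes and names, not the authors' code).
FEAT_DIM = 80    # assumed: mel-spectrogram bins per frame
CODE_DIM = 16    # assumed: size of the latent content code
N_SPEAKERS = 3   # assumed: number of target speakers, one eAE decoder each

rng = np.random.default_rng(0)

# Random linear maps stand in for the neural encoder/decoders (assumption).
W_enc = rng.normal(scale=0.1, size=(FEAT_DIM, CODE_DIM))  # shared by all speakers
W_dec = [rng.normal(scale=0.1, size=(CODE_DIM, FEAT_DIM)) for _ in range(N_SPEAKERS)]


def encode(x):
    """Shared encoder: frames of shape (T, FEAT_DIM) -> codes of shape (T, CODE_DIM)."""
    return np.tanh(x @ W_enc)


def decode(z, spk):
    """Decoder of speaker `spk`: codes (T, CODE_DIM) -> frames (T, FEAT_DIM)."""
    return z @ W_dec[spk]


def eae_losses(x, spk, lam=1.0):
    """Return (reconstruction, cycle, total) losses for speech x of speaker `spk`."""
    z = encode(x)
    # Reconstruction: the speaker's own decoder should reproduce its speech.
    rec = np.mean((decode(z, spk) - x) ** 2)
    # Cycle consistency: speech decoded by ANY speaker's decoder, once encoded
    # again, should give back the same latent code z (MSE assumed).
    cyc = np.mean([np.mean((encode(decode(z, j)) - z) ** 2) for j in range(N_SPEAKERS)])
    return rec, cyc, rec + lam * cyc


def convert(x_src, target):
    """Any-to-one conversion: encode arbitrary source speech, decode as `target`."""
    return decode(encode(x_src), target)


x = rng.normal(size=(100, FEAT_DIM))  # random frames standing in for speaker 0's speech
rec, cyc, total = eae_losses(x, spk=0)
print(f"reconstruction={rec:.4f} cycle={cyc:.4f} total={total:.4f}")
print(convert(x, target=2).shape)  # (100, 80)
```

In a real system both terms would be minimized by gradient descent over the shared encoder and all decoders; whether gradients flow through the original code `z` inside the cycle term is a design choice this sketch leaves open. The `convert` helper shows why the cycle term matters: conversion only works if the shared encoder's code carries content and not the source speaker's identity.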

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling