Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang, Lantian Li, Wenqiang Du, Dong Wang

TL;DR
This paper introduces an enhanced exemplar autoencoder (eAE) with a cycle consistency loss for any-to-one voice conversion. It improves conversion quality by encouraging the latent content code to stay the same when speech is converted to another speaker's voice and encoded again.
Contribution
It trains the eAEs of multiple speakers with a shared encoder and adds a cycle consistency loss, so that the latent code carries content information rather than speaker information, which improves both content representation and reconstruction.
Findings
Consistent improvement over baseline eAE in experiments
Effective in preserving content in voice conversion
Source code and examples provided
Abstract
Recent research has shown that an autoencoder trained with speech from a single speaker, called an exemplar autoencoder (eAE), can be used for any-to-one voice conversion (VC). Compared to large-scale many-to-many models such as AutoVC, the eAE model is easy and fast to train, and may recover more details of the target speaker. To ensure VC quality, the latent code should represent content information, and only content information. However, this is hard to achieve with an eAE, because it is unaware of any speaker variation during training. To tackle the problem, we propose a simple yet effective approach based on a cycle consistency loss. Specifically, we train eAEs of multiple speakers with a shared encoder, and meanwhile encourage the speech reconstructed by any speaker-specific decoder to yield the same latent code as the original speech when it is cycled back and encoded again. Experiments conducted on…
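The training scheme described in the abstract (shared encoder, speaker-specific decoders, reconstruction plus cycle consistency) can be illustrated with a toy loss computation. This is a minimal sketch under our own assumptions, not the authors' implementation: the encoder and decoders are random linear maps in pure Python rather than neural networks on acoustic features, both terms are mean squared errors, the cycle term averages over every decoder (including the source speaker's own), and the names `MultiSpeakerEAE`, `losses`, and `cycle_weight` are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error (assumed distance; the paper's choice may differ)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights scaled by 1/sqrt(cols) to keep outputs in a sane range."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


class MultiSpeakerEAE:
    """Hypothetical toy model: one shared encoder, one decoder per speaker."""

    def __init__(self, speakers: List[str], feat_dim: int, code_dim: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        # Shared encoder: acoustic frame -> latent (content) code.
        self.encoder: Matrix = random_matrix(code_dim, feat_dim, rng)
        # Speaker-specific decoders: latent code -> acoustic frame.
        self.decoders: Dict[str, Matrix] = {
            s: random_matrix(feat_dim, code_dim, rng) for s in speakers
        }

    def encode(self, x: Vector) -> Vector:
        return matvec(self.encoder, x)

    def decode(self, z: Vector, speaker: str) -> Vector:
        return matvec(self.decoders[speaker], z)

    def losses(self, x: Vector, speaker: str,
               cycle_weight: float = 1.0) -> Dict[str, float]:
        z = self.encode(x)
        # Reconstruction: the source speaker's own decoder should rebuild x.
        recon = mse(self.decode(z, speaker), x)
        # Cycle consistency: decode with each speaker's decoder, re-encode,
        # and penalize any change in the latent code.
        cycle_terms = [mse(self.encode(self.decode(z, t)), z)
                       for t in self.decoders]
        cycle = sum(cycle_terms) / len(cycle_terms)
        return {"recon": recon, "cycle": cycle,
                "total": recon + cycle_weight * cycle}


if __name__ == "__main__":
    model = MultiSpeakerEAE(["spk_a", "spk_b", "spk_c"], feat_dim=4, code_dim=2)
    frame = [0.5, -1.0, 0.25, 2.0]  # hypothetical acoustic feature frame
    print(model.losses(frame, "spk_a", cycle_weight=0.5))
```

The cycle term is zero only when converting a frame to another speaker's voice and encoding it again returns the original code, which is the property the abstract asks of a content-only latent code. In a real system both terms would be minimized jointly by gradient descent over the shared encoder and all decoders.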
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
