# Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

**Authors:** Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran

arXiv: 1905.03864 · 2019-05-13

## TL;DR

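This paper introduces an adversarially trained autoencoder approach to voice conversion that needs neither parallel utterances nor phoneme-level time alignment. Because a single encoder is shared across speakers, it can also convert the voices of speakers not seen in training into the voices of speakers in the training set.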

## Contribution

The paper proposes an architecture with one speaker-independent encoder and multiple speaker-dependent decoders. An adversarial loss from an auxiliary classifier pushes the encoder output to be speaker independent, so the model can be trained without parallel data.

## Key findings

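- Subjective tests corroborate the method's voice conversion performance.
- The single shared encoder lets the method convert out-of-training source speakers to target speakers in the training dataset.
- Training needs neither the same utterances from every speaker nor time alignment over phonemes.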

## Abstract

We present a method for converting the voices between a set of speakers. Our method is based on training multiple autoencoder paths, where there is a single speaker-independent encoder and multiple speaker-dependent decoders. The autoencoders are trained with an addition of an adversarial loss which is provided by an auxiliary classifier in order to guide the output of the encoder to be speaker independent. The training of the model is unsupervised in the sense that it does not require collecting the same utterances from the speakers nor does it require time aligning over phonemes. Due to the use of a single encoder, our method can generalize to converting the voice of out-of-training speakers to speakers in the training dataset. We present subjective tests corroborating the performance of our method.
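The training signal described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a mean-squared reconstruction error and an adversarial term written as the negative of the classifier's cross-entropy, which is one common formulation; the summary does not state the paper's exact loss. The names `autoencoder_loss`, `classifier_loss`, `convert`, and `adv_weight` are hypothetical, and the encoder and decoders are stand-in callables rather than neural networks.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, speaker):
    # Negative log-probability assigned to the true speaker index.
    return -math.log(softmax(logits)[speaker])


def mse(x, y):
    # Mean squared error between two equal-length feature vectors.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def classifier_loss(clf_logits, speaker):
    # The auxiliary classifier tries to identify the speaker from the code.
    return cross_entropy(clf_logits, speaker)


def autoencoder_loss(frame, reconstruction, clf_logits, speaker, adv_weight=1.0):
    # Assumed form: reconstruct well while making the classifier fail,
    # so the encoder is rewarded when its code hides speaker identity.
    return mse(frame, reconstruction) - adv_weight * cross_entropy(clf_logits, speaker)


def convert(frame, encoder, decoders, target_speaker):
    # Conversion: encode with the shared encoder, then decode with the
    # decoder that belongs to the target speaker.
    return decoders[target_speaker](encoder(frame))


# Toy stand-ins for the networks (hypothetical, for illustration only).
encoder = lambda frame: [0.5 * x for x in frame]
decoders = {
    "A": lambda code: [x + 1.0 for x in code],
    "B": lambda code: [x - 1.0 for x in code],
}
converted = convert([2.0, 4.0], encoder, decoders, "B")
```

In this sketch `convert` never looks at the identity of the source speaker, which mirrors why a single shared encoder allows out-of-training source speakers, while the target must have a trained decoder.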

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.03864/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1905.03864/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1905.03864/full.md

---
Source: https://tomesphere.com/paper/1905.03864