Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization

Mingjie Chen; Yanpei Shi; Thomas Hain

arXiv:2010.11646 · cs.SD · April 13, 2021

TL;DR

This paper introduces a low-resource StarGAN-based voice conversion model that uses weight adaptive instance normalization to improve data efficiency, enabling many-to-many conversion across a large number of speakers when only limited training data is available per speaker.

Contribution

The work proposes a StarGAN-based model that conditions on speaker embeddings from a speaker encoder and applies adaptive instance normalization to convolutional weights, improving many-to-many non-parallel voice conversion in low-resource settings.

Findings

01

Outperforms baseline methods in objective evaluations.

02

Achieves higher naturalness and similarity in subjective tests.

03

Effective with as few as 5 samples per speaker.

Abstract

Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years, and StarGAN-based models have attracted particular interest. However, most StarGAN-based methods have only addressed scenarios in which the number of speakers is small and the amount of training data is large. In this work, we aim to improve the data efficiency of the model and achieve many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples. To improve data efficiency, the proposed model uses a speaker encoder to extract speaker embeddings and applies adaptive instance normalization (AdaIN) to convolutional weights. Experiments are conducted with 109 speakers under two low-resource situations, where the number of training samples is 20 and 5 per…
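The two data-efficiency components named in the abstract (a speaker encoder producing an embedding, and AdaIN applied to convolutional weights) can be illustrated with a small NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation (the official PyTorch code is in MingjieChen/LowResourceVC). It assumes that weight AdaIN normalizes each output channel's kernel to zero mean and unit variance over its input-channel and kernel axes, then scales and shifts it with γ and β predicted from the speaker embedding. The speaker encoder (`speaker_encoder`), the projection matrices (`proj`, `w_gamma`, `w_beta`), and all shapes are hypothetical placeholders, and the inputs are random arrays rather than real acoustic features.

```python
import numpy as np


def speaker_encoder(mel, proj):
    """Hypothetical stand-in speaker encoder: frame-wise linear projection
    plus tanh, then mean pooling over time.
    mel: (n_mels, T); proj: (emb_dim, n_mels) -> (emb_dim,)"""
    frames = np.tanh(proj @ mel)   # (emb_dim, T) frame-level features
    return frames.mean(axis=1)     # utterance-level speaker embedding


def weight_adain(weight, gamma, beta, eps=1e-5):
    """Assumed weight AdaIN: instance-normalize each output channel's kernel
    over (in_ch, k), then apply a per-output-channel scale and shift.
    weight: (out_ch, in_ch, k); gamma, beta: (out_ch,)"""
    mean = weight.mean(axis=(1, 2), keepdims=True)
    std = weight.std(axis=(1, 2), keepdims=True)
    normed = (weight - mean) / (std + eps)
    return gamma[:, None, None] * normed + beta[:, None, None]


def conv1d_same(x, weight):
    """Cross-correlation (as in PyTorch's Conv1d), stride 1, zero padding
    that preserves length for odd k.
    x: (in_ch, T); weight: (out_ch, in_ch, k) -> (out_ch, T)"""
    k = weight.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    windows = np.stack([xp[:, t:t + k] for t in range(T)])  # (T, in_ch, k)
    return np.einsum("tik,oik->ot", windows, weight)


def wadain_conv(x, weight, emb, w_gamma, w_beta):
    """Speaker-conditioned convolution. Assumption: gamma and beta come from
    hypothetical linear projections of the speaker embedding emb: (emb_dim,)."""
    gamma = 1.0 + w_gamma @ emb    # (out_ch,), centred near 1
    beta = w_beta @ emb            # (out_ch,)
    return conv1d_same(x, weight_adain(weight, gamma, beta))


# Usage with random placeholder data (not real features or trained weights).
rng = np.random.default_rng(0)
n_mels, T, emb_dim, in_ch, out_ch, k = 36, 50, 16, 36, 8, 5
proj = rng.standard_normal((emb_dim, n_mels))
weight = rng.standard_normal((out_ch, in_ch, k))
w_gamma = 0.1 * rng.standard_normal((out_ch, emb_dim))
w_beta = 0.1 * rng.standard_normal((out_ch, emb_dim))

ref_utt = rng.standard_normal((n_mels, 80))  # target-speaker reference utterance
src = rng.standard_normal((in_ch, T))        # source-speaker features
emb = speaker_encoder(ref_utt, proj)
y = wadain_conv(src, weight, emb, w_gamma, w_beta)
print(y.shape)  # (8, 50)
```

In this sketch the speaker identity enters by re-parameterizing every kernel of the layer, rather than by normalizing activations as standard AdaIN does. The exact normalization axes and how γ and β are predicted should be checked against the official repository.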

Code & Models

Repositories

MingjieChen/LowResourceVC
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing