Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution

Mingjie Chen; Yanghao Zhou; Heyan Huang; Thomas Hain

arXiv:2203.17172 · eess.AS · April 1, 2022 · 5 citations

TL;DR

This paper introduces DYGAN-VC, a compact, non-autoregressive GAN model for voice conversion that uses VQWav2vec features and dynamic convolution. On VCC2020 it reaches MOS scores of up to 3.86 with roughly half the model size and up to 8 times faster decoding.

Contribution

The paper presents a novel, smaller, and faster voice conversion model that maintains high quality by integrating VQWav2vec embeddings and dynamic convolution in a non-autoregressive GAN framework.
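
To illustrate what "vector quantised embeddings" means in practice, the following is a minimal sketch of a nearest-neighbour codebook lookup, assuming a k-means-style quantiser. The function name `quantise`, the array shapes, and the toy codebook are hypothetical; this listing does not describe the paper's actual feature extraction pipeline or codebook configuration.

```python
import numpy as np

def quantise(frames, codebook):
    """Map each frame to its nearest codebook entry (illustrative only).

    frames:   (T, D) array of continuous frame-level features
    codebook: (V, D) array of V code vectors
    Returns (indices of shape (T,), quantised frames of shape (T, D)).
    """
    # Squared Euclidean distance between every frame and every code vector.
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code vector for each frame.
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy example with a hypothetical 2-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
frames = np.array([[0.1, -0.1], [0.9, 1.2]])
indices, quantised = quantise(frames, codebook)
```

Each frame is replaced by a discrete index (and its code vector), so a downstream model receives a compact, discrete representation of the content rather than the raw continuous features.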

Findings

01. Achieved MOS scores up to 3.86 on VCC2020.

02. Reduced model size by approximately 50%.

03. Increased decoding speed by up to 8 times.

Abstract

It was recently shown that a combination of ASR and TTS models yields highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020). To obtain good performance, both models require pretraining on large amounts of data, resulting in large models that are potentially inefficient in use. In this work we present a model that is significantly smaller, and thereby faster in processing, while obtaining equivalent performance. To achieve this, the proposed model, Dynamic-GAN-VC (DYGAN-VC), uses a non-autoregressive structure and makes use of vector quantised embeddings obtained from a VQWav2vec model. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring a small number of parameters. Objective and subjective evaluation was performed using the VCC2020 task, yielding MOS scores of up to 3.86, and…
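
The abstract's claim that dynamic convolution needs few parameters can be made concrete with a small sketch. The code below assumes the general formulation of dynamic convolution, in which a kernel is predicted from the input at each time step, normalised with a softmax over the kernel taps, and shared by all channels within a head. The function `dynamic_conv1d`, the projection `w_proj`, and all shapes are illustrative assumptions, not the configuration used in DYGAN-VC.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_conv1d(x, w_proj, num_heads):
    """Dynamic convolution over time (illustrative sketch).

    x:      (T, C) input sequence
    w_proj: (C, num_heads * K) projection that predicts the kernels
    Returns an array of shape (T, C).
    """
    T, C = x.shape
    K = w_proj.shape[1] // num_heads
    assert C % num_heads == 0 and K % 2 == 1
    pad = K // 2
    # One kernel per time step and head, normalised over the K taps.
    kernels = softmax((x @ w_proj).reshape(T, num_heads, K), axis=-1)
    # Zero-pad along time so the output keeps length T.
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    # windows[t, k, c] = x_pad[t + k, c]
    windows = np.stack([x_pad[k:k + T] for k in range(K)], axis=1)
    # Split channels into heads; channels in a head share one kernel.
    windows = windows.reshape(T, K, num_heads, C // num_heads)
    out = np.einsum("thk,tkhd->thd", kernels, windows)
    return out.reshape(T, C)

# Toy usage: 5 frames, 4 channels, 2 heads, kernel width 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
w_proj = rng.standard_normal((4, 2 * 3))
y = dynamic_conv1d(x, w_proj, num_heads=2)
```

In this sketch the only learned weights are the `C × num_heads × K` entries of `w_proj`, whereas a standard 1-D convolution with `C` input and `C` output channels has `C × C × K` weights. That gap is the kind of saving the abstract refers to.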

Code & Models

Repositories

mingjiechen/dyganvc (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling

Methods: Convolution