An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion

Sandipan Dhar; Nanda Dulal Jana; Swagatam Das

arXiv:2104.12159·cs.SD·April 27, 2021

TL;DR

This paper introduces ALGAN-VC, an adaptive learning-based GAN model with a Dense Residual Network architecture for improved one-to-one voice conversion, enhancing speech quality and speaker similarity.

Contribution

The paper proposes a novel adaptive learning mechanism and a Dense Residual Network architecture within a GAN framework for more effective voice conversion.

Findings

01. Achieved high speaker similarity in converted speech.

02. Demonstrated improved speech quality through subjective and objective evaluations.

03. Validated on multiple datasets including VCC 2016, 2018, 2020, and a custom Indian language dataset.

Abstract

Voice Conversion (VC) has emerged in recent years as a significant research area within speech synthesis, driven by applications such as voice-assistant technology, automated movie dubbing, and speech-to-singing conversion. VC converts the vocal style of one speaker to that of another while keeping the linguistic content unchanged. The VC task is performed through a three-stage pipeline consisting of speech analysis, speech feature mapping, and speech reconstruction. Generative Adversarial Network (GAN) models are now widely used for speech feature mapping from the source to the target speaker. In this paper, we propose an adaptive learning-based GAN model called ALGAN-VC for efficient one-to-one VC. Our ALGAN-VC framework consists of some approaches to improve the speech quality and voice similarity between source and…
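
To make the three-stage pipeline in the abstract concrete, here is a minimal, purely illustrative Python sketch. It is not the ALGAN-VC method: the function names (`analyze`, `map_features`, `reconstruct`, `convert`), the per-frame log-energy feature, and the mean/variance matching step are all assumptions chosen for brevity. In a GAN-based system the mapping stage would be a trained generator, and analysis and reconstruction would typically be handled by a vocoder.

```python
import math

def analyze(wave, frame_len=4):
    """Stage 1 (speech analysis): cut the signal into non-overlapping frames
    and compute one toy feature per frame, the log-energy.
    Trailing samples that do not fill a whole frame are dropped."""
    frames = [wave[i:i + frame_len]
              for i in range(0, len(wave) - frame_len + 1, frame_len)]
    feats = [math.log(sum(x * x for x in f) / frame_len + 1e-8) for f in frames]
    return frames, feats

def map_features(feats, src_stats, tgt_stats):
    """Stage 2 (feature mapping): hypothetical stand-in for a learned mapping.
    Here the source feature statistics are simply matched to the target's."""
    (src_mean, src_std), (tgt_mean, tgt_std) = src_stats, tgt_stats
    return [(f - src_mean) / src_std * tgt_std + tgt_mean for f in feats]

def reconstruct(frames, feats, mapped):
    """Stage 3 (speech reconstruction): rescale each frame so that its
    log-energy moves from the analyzed value to the mapped value."""
    out = []
    for frame, old, new in zip(frames, feats, mapped):
        gain = math.exp((new - old) / 2.0)  # energy scales with gain squared
        out.extend(x * gain for x in frame)
    return out

def convert(wave, src_stats, tgt_stats, frame_len=4):
    """Run analysis, mapping, and reconstruction in sequence."""
    frames, feats = analyze(wave, frame_len)
    mapped = map_features(feats, src_stats, tgt_stats)
    return reconstruct(frames, feats, mapped)

if __name__ == "__main__":
    wave = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
    # Identical source and target statistics leave the signal unchanged.
    print(convert(wave, (0.0, 1.0), (0.0, 1.0)))
```

The value of the sketch is the separation of stages: only `map_features` depends on the speaker pair, so a learned source-to-target model could replace it without touching the other two functions.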

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders