DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Ha-Yeong Choi; Sang-Hoon Lee; Seong-Whan Lee

arXiv:2305.15816 · eess.AS · May 26, 2023 · 1 cites

TL;DR

This paper introduces DDDM-VC, a novel voice conversion method using decoupled diffusion models with disentangled representations and prior mixup, enabling precise style control and robustness across various speech attributes.

Contribution

It proposes a decoupled diffusion framework with disentangled speech representations and a prior mixup technique for improved, controllable, and robust voice conversion.

Findings

01. Outperforms existing VC models in quality.

02. Provides robust performance across different model sizes.

03. Enables precise control over speech attributes.

Abstract

Diffusion-based generative models have exhibited powerful generative performance in recent years. However, as many attributes exist in the data distribution and owing to several limitations of sharing the model parameters across all levels of the generation process, it remains challenging to control specific styles for each attribute. To address the above problem, this paper presents decoupled denoising diffusion models (DDDMs) with disentangled representations, which can control the style for each attribute in generative models. We apply DDDMs to voice conversion (VC) tasks to address the challenges of disentangling and controlling each speech attribute (e.g., linguistic information, intonation, and timbre). First, we use a self-supervised representation to disentangle the speech representation. Subsequently, the DDDMs are applied to resynthesize the speech from the disentangled…

Code & Models

Repositories

hayeong0/DDDM-VC (PyTorch, official)

Videos

DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Mixup