Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

Clément Le Moine Veillon; Nicolas Obin; Axel Roebel

arXiv:2104.07283·eess.AS·April 16, 2021

TL;DR

This paper introduces an end-to-end neural network framework utilizing Dual-GAN and convolutional wavelet kernels for expressive F0 voice conversion, enabling efficient multi-scale F0 representation and emotion transformation.

Contribution

It proposes a novel end-to-end F0 transformation model combining wavelet-based feature extraction with adversarial learning for emotion conversion.

Findings

1. Effective multi-scale F0 encoding achieved
2. Successful emotion-to-emotion F0 transformation demonstrated
3. End-to-end training improves F0 conversion quality

Abstract

This paper presents an end-to-end framework for F0 transformation in the context of expressive voice conversion. A single neural network is proposed, in which a first module learns F0 representations over different temporal scales and a second, adversarial module learns the transformation from one emotion to another. The first module is composed of a convolution layer with wavelet kernels so that the various temporal scales of F0 variations can be efficiently encoded. The single decomposition/transformation network makes it possible to learn, end to end and directly from the raw F0 signal, the F0 decomposition that is optimal with respect to the transformation.
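To make the idea of a convolution layer with wavelet kernels concrete, here is a minimal NumPy sketch that convolves an F0 contour with one wavelet kernel per temporal scale. Everything specific in it is an assumption for illustration and is not taken from the paper: the Mexican-hat wavelet, the dyadic scales, the function names `mexican_hat` and `wavelet_conv_layer`, and the synthetic contour. The kernels here are fixed, whereas the abstract describes a decomposition learned jointly with the transformation.

```python
import numpy as np


def mexican_hat(scale, half_width_factor=5):
    """Sampled Mexican-hat wavelet at a given scale (in frames).

    Hypothetical choice of mother wavelet, used only for illustration.
    """
    half = int(np.ceil(half_width_factor * scale))
    t = np.arange(-half, half + 1) / scale
    psi = (1.0 - t**2) * np.exp(-0.5 * t**2)
    return psi / np.sqrt(scale)


def wavelet_conv_layer(f0, scales):
    """Convolve one F0 contour with one wavelet kernel per scale.

    Returns an array of shape (len(scales), len(f0)); the contour is
    implicitly zero-padded at both ends.
    """
    f0 = np.asarray(f0, dtype=float)
    rows = []
    for s in scales:
        kernel = mexican_hat(s)
        half = (len(kernel) - 1) // 2
        full = np.convolve(f0, kernel, mode="full")
        # Crop the centre so every row is aligned with the input frames.
        rows.append(full[half:half + len(f0)])
    return np.stack(rows)


# Synthetic log-F0 contour: a slow phrase-level trend plus a fast ripple.
frames = np.arange(200)
log_f0 = np.log(120.0
                + 20.0 * np.sin(2 * np.pi * frames / 200)
                + 5.0 * np.sin(2 * np.pi * frames / 10))

scales = [1, 2, 4, 8, 16]  # assumed dyadic scales, in frames
coeffs = wavelet_conv_layer(log_f0, scales)
```

Each row of `coeffs` responds to F0 variations at one temporal scale: small scales pick up the fast ripple, large scales the slow trend. In a trainable version, the scales or kernel shapes would become parameters updated together with the adversarial transformation module.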

Taxonomy

Methods: Convolution