Invertible Voice Conversion

Zexin Cai; Ming Li

arXiv:2201.10687·eess.AS·January 27, 2022

Open Access

TL;DR

This paper introduces INVVC, an invertible deep learning framework for voice conversion that keeps the source identity traceable: converted speech can be reversed back to the source input with the same model parameters.

Contribution

The paper presents a novel invertible framework for voice conversion that allows for both high-quality conversion and the ability to revert to the original voice using the same model parameters.

Findings

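01. The framework achieves impressive voice conversion performance in the reported experiments.

02. Converted voices can be reversed to the original source inputs using the same parameters as the forward pass.

03. The framework improves security by making the source identity traceable.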

Abstract

In this paper, we propose an invertible deep learning framework called INVVC for voice conversion. It is designed against the possible threats that inherently come along with voice conversion systems. Specifically, we develop an invertible framework that makes the source identity traceable. The framework is built on a series of invertible $1 \times 1$ convolutions and flows consisting of affine coupling layers. We apply the proposed framework to one-to-one voice conversion and many-to-one conversion using parallel training data. Experimental results show that this approach yields impressive performance on voice conversion and, moreover, the converted results can be reversed back to the source inputs utilizing the same parameters as in forwarding.
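
The two building blocks named in the abstract, invertible $1 \times 1$ convolutions and affine coupling layers, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class names, the toy one-hidden-layer conditioner, the random untrained weights, and the feature shape (channels by frames) are all assumptions made for illustration. The sketch only shows why a stack of such layers can be run backwards with the same parameters used in the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)


class Invertible1x1Conv:
    """A 1x1 convolution over channels is a matrix multiply per frame."""

    def __init__(self, channels):
        # Assumption: orthogonal init via QR, so the matrix is invertible.
        q, _ = np.linalg.qr(rng.normal(size=(channels, channels)))
        self.w = q

    def forward(self, x):
        return self.w @ x

    def inverse(self, y):
        # Solve w @ x = y for x with the same weight matrix.
        return np.linalg.solve(self.w, y)


class AffineCoupling:
    """Half of the channels pass through unchanged and parameterize a
    scale and shift applied to the other half."""

    def __init__(self, channels, hidden=16):
        self.half = channels // 2
        rest = channels - self.half
        # Hypothetical toy conditioner: x_a -> (log_s, t).
        self.w1 = rng.normal(scale=0.1, size=(hidden, self.half))
        self.w2 = rng.normal(scale=0.1, size=(2 * rest, hidden))

    def _params(self, xa):
        h = np.tanh(self.w1 @ xa)
        log_s, t = np.split(self.w2 @ h, 2, axis=0)
        return log_s, t

    def forward(self, x):
        xa, xb = x[:self.half], x[self.half:]
        log_s, t = self._params(xa)
        return np.concatenate([xa, xb * np.exp(log_s) + t], axis=0)

    def inverse(self, y):
        # y_a equals x_a, so the same scale and shift can be recomputed.
        ya, yb = y[:self.half], y[self.half:]
        log_s, t = self._params(ya)
        return np.concatenate([ya, (yb - t) * np.exp(-log_s)], axis=0)


def run_forward(layers, x):
    for layer in layers:
        x = layer.forward(x)
    return x


def run_inverse(layers, y):
    # Undo the layers in reverse order.
    for layer in reversed(layers):
        y = layer.inverse(y)
    return y


channels, frames = 8, 5
layers = []
for _ in range(3):
    layers += [Invertible1x1Conv(channels), AffineCoupling(channels)]

source = rng.normal(size=(channels, frames))   # stand-in for acoustic features
converted = run_forward(layers, source)
recovered = run_inverse(layers, converted)
max_error = float(np.max(np.abs(recovered - source)))
print("max reconstruction error:", max_error)
```

The inverse never needs to invert the conditioner network itself, only to re-evaluate it on the unchanged half, which is why the coupling layer stays invertible regardless of how complex that network is.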

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Affine Coupling