Invertible Voice Conversion
Zexin Cai, Ming Li

TL;DR
This paper introduces INVVC, an invertible deep learning framework for voice conversion that keeps the source identity traceable and the conversion reversible, making voice transformation more secure and accountable.
Contribution
The paper presents a novel invertible voice conversion framework that produces high-quality conversions and can map converted speech back to the original voice using the same model parameters.
Findings
Strong voice conversion performance in one-to-one and many-to-one settings.
Converted speech can be inverted back to the original source input.
The framework improves security by making the source identity traceable.
Abstract
In this paper, we propose an invertible deep learning framework called INVVC for voice conversion. It is designed to counter the potential threats inherent in voice conversion systems. Specifically, we develop an invertible framework that makes the source identity traceable. The framework is built on a series of invertible convolutions and flows consisting of affine coupling layers. We apply the proposed framework to one-to-one and many-to-one voice conversion using parallel training data. Experimental results show that this approach yields impressive voice conversion performance and, moreover, that the converted results can be reversed back to the source inputs using the same parameters as in the forward pass.
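The reversibility claim rests on each building block having an exact inverse. The sketch below illustrates that property with a stack of invertible 1x1 convolutions (channel-mixing matrices applied at every frame) alternating with affine coupling layers. It assumes Glow/WaveGlow-style blocks; the class names, the toy per-frame coupling "network", and all sizes are hypothetical illustrations, not the INVVC architecture, which the summary does not specify in detail.

```python
import math
import random


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_inverse(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class Invertible1x1Conv:
    """Hypothetical 1x1 conv: the same C x C matrix W mixes channels at every frame."""

    def __init__(self, channels, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(channels)]
        self.w_inv = mat_inverse(self.w)  # random Gaussian W is invertible almost surely

    def forward(self, frames):
        return [mat_vec(self.w, f) for f in frames]

    def inverse(self, frames):
        return [mat_vec(self.w_inv, f) for f in frames]


class AffineCoupling:
    """y_a = x_a;  y_b = x_b * exp(log_s(x_a)) + t(x_a)."""

    def __init__(self, channels, rng):
        self.half = channels // 2
        rows = channels - self.half
        # Toy stand-in for the coupling network (real models use a deep net over time).
        self.ws = [[rng.gauss(0.0, 0.5) for _ in range(self.half)] for _ in range(rows)]
        self.wt = [[rng.gauss(0.0, 0.5) for _ in range(self.half)] for _ in range(rows)]

    def _scale_shift(self, xa):
        log_s = [math.tanh(v) for v in mat_vec(self.ws, xa)]  # bounded log-scale
        return log_s, mat_vec(self.wt, xa)

    def forward(self, frames):
        out = []
        for f in frames:
            xa, xb = f[:self.half], f[self.half:]
            log_s, t = self._scale_shift(xa)
            out.append(xa + [b * math.exp(s) + sh for b, s, sh in zip(xb, log_s, t)])
        return out

    def inverse(self, frames):
        out = []
        for f in frames:
            ya, yb = f[:self.half], f[self.half:]
            log_s, t = self._scale_shift(ya)  # y_a == x_a, so s and t are recomputed exactly
            out.append(ya + [(b - sh) * math.exp(-s) for b, s, sh in zip(yb, log_s, t)])
        return out


class TinyFlow:
    """Alternating conv/coupling steps; inverse runs the same layers in reverse order."""

    def __init__(self, channels=4, n_steps=3, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for _ in range(n_steps):
            self.layers.append(Invertible1x1Conv(channels, rng))
            self.layers.append(AffineCoupling(channels, rng))

    def forward(self, frames):
        for layer in self.layers:
            frames = layer.forward(frames)
        return frames

    def inverse(self, frames):
        for layer in reversed(self.layers):
            frames = layer.inverse(frames)
        return frames
```

Usage: build `flow = TinyFlow()`, run `y = flow.forward(x)` on a list of frames (each a list of 4 channel values), then `flow.inverse(y)` recovers `x` up to floating-point error using the same parameters. The coupling layer is invertible without ever inverting its inner network, because the untouched half `x_a` is enough to recompute the scale and shift; the 1x1 convolutions then mix channels so that every channel is eventually transformed.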
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Affine Coupling
