ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
Meiying Chen, Zhiyao Duan

TL;DR
ControlVC is a neural voice conversion system that enables time-varying control over pitch and speed, improving speech quality and controllability in zero-shot, non-parallel scenarios.
Contribution
The paper introduces what it describes as the first neural VC system with interpretable, time-varying control over pitch and speed, built from pre-trained encoders and a vocoder.
Findings
Outperforms baselines in speech quality
Successfully achieves time-varying pitch control
Effective in zero-shot, non-parallel conversion
Abstract
Recent developments in neural speech synthesis and vocoding have sparked a renewed interest in voice conversion (VC). Beyond timbre transfer, achieving controllability on para-linguistic parameters such as pitch and speed is critical in deploying VC systems in many application scenarios. Existing studies, however, either only provide utterance-level global control or lack interpretability on the controls. In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and speed. ControlVC uses pre-trained encoders to compute pitch and linguistic embeddings from the source utterance and speaker embeddings from the target utterance. These embeddings are then concatenated and converted to speech using a vocoder. It achieves speed control through TD-PSOLA pre-processing on the source utterance, and achieves pitch control by…
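The concatenation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `concat_embeddings`, the embedding dimensions (32, 128, 256), the frame count, and the assumption that the pitch and linguistic embeddings are already frame-aligned are all hypothetical choices made for illustration. The sketch only shows one plausible way to combine frame-level source embeddings with a single utterance-level speaker embedding before handing the result to a vocoder.

```python
import numpy as np


def concat_embeddings(pitch_emb, ling_emb, spk_emb):
    """Join frame-level pitch and linguistic embeddings with one speaker embedding.

    pitch_emb: (n_frames, d_pitch) array from the source utterance (assumed shape)
    ling_emb:  (n_frames, d_ling) array from the source utterance (assumed shape)
    spk_emb:   (d_spk,) array from the target utterance (assumed shape)
    """
    if pitch_emb.shape[0] != ling_emb.shape[0]:
        raise ValueError("pitch and linguistic embeddings must have the same frame count")
    n_frames = pitch_emb.shape[0]
    # Repeat the utterance-level speaker embedding on every frame.
    spk_tiled = np.tile(spk_emb[None, :], (n_frames, 1))
    # Concatenate along the feature axis to form the vocoder input.
    return np.concatenate([pitch_emb, ling_emb, spk_tiled], axis=1)


# Random stand-ins for encoder outputs; real values would come from pre-trained encoders.
rng = np.random.default_rng(0)
pitch_emb = rng.standard_normal((50, 32))
ling_emb = rng.standard_normal((50, 128))
spk_emb = rng.standard_normal(256)

vocoder_input = concat_embeddings(pitch_emb, ling_emb, spk_emb)
print(vocoder_input.shape)  # (50, 416)
```

Because the pitch embedding keeps its own slice of the feature axis in this layout, a time-varying change to the pitch track would alter only those columns and leave the linguistic and speaker columns untouched.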
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
