R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion

Junjie Zheng; Gongyu Chen; Chaofan Ding; Zihao Chen

arXiv:2510.20677 · cs.SD · October 24, 2025

Open Access

TL;DR

R2-SVC is a novel singing voice conversion framework that enhances robustness to noise and artifacts, improves expressiveness, and achieves state-of-the-art results in real-world noisy environments.

Contribution

The paper introduces simulation-based robustness enhancement, speaker representations enriched with domain-specific singing data, and Neural Source-Filter (NSF) integration for natural and controllable singing voice conversion.

Findings

01. State-of-the-art performance under noisy conditions

02. Improved robustness through data augmentation techniques

03. Enhanced naturalness and expressiveness of converted singing

Abstract

In real-world singing voice conversion (SVC) applications, environmental noise and the demand for expressive output pose significant challenges. Conventional methods, however, are typically designed without accounting for real deployment scenarios, as both training and inference usually rely on clean data. This mismatch hinders practical use, given the inevitable presence of diverse noise sources and artifacts from music separation. To tackle these issues, we propose R2-SVC, a robust and expressive SVC framework. First, we introduce simulation-based robustness enhancement through random fundamental frequency ($F_0$) perturbations and music separation artifact simulations (e.g., reverberation, echo), substantially improving performance under noisy conditions. Second, we enrich speaker representation using domain-specific singing data: alongside clean vocals, we incorporate…
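The abstract names two kinds of simulated degradation, random F0 perturbation and separation-style artifacts such as echo, but does not say how either is sampled or applied. The following is a minimal sketch of what such augmentations could look like, not the authors' implementation. The function names `perturb_f0` and `add_echo`, the per-frame uniform semitone shift, the single-tap echo, and all default parameter values are assumptions made for illustration.

```python
import numpy as np


def perturb_f0(f0, rng, max_semitones=1.0):
    """Randomly shift voiced frames of an F0 contour (Hz).

    Assumption: each voiced frame gets an independent uniform shift in
    [-max_semitones, max_semitones). Unvoiced frames (f0 == 0) stay at 0.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    shift = rng.uniform(-max_semitones, max_semitones, size=f0.shape)
    voiced = f0 > 0
    out = f0.copy()
    # A shift of s semitones multiplies the frequency by 2^(s / 12).
    out[voiced] = f0[voiced] * 2.0 ** (shift[voiced] / 12.0)
    return out


def add_echo(wave, sample_rate, delay_s=0.12, decay=0.4):
    """Mix one delayed, attenuated copy of the waveform into itself.

    Assumption: a single-tap echo stands in for the richer reverberation
    and echo artifacts that real music separation leaves behind.
    """
    wave = np.asarray(wave, dtype=np.float64)
    delay = int(round(delay_s * sample_rate))
    out = wave.copy()
    if 0 < delay < len(wave):
        out[delay:] += decay * wave[:-delay]
    return out


# Usage on synthetic data: a short F0 contour and a 1-second 220 Hz tone.
rng = np.random.default_rng(0)
f0 = np.array([0.0, 220.0, 220.0, 0.0, 440.0])
noisy_f0 = perturb_f0(f0, rng)

sr = 16000
t = np.arange(sr) / sr
vocal = 0.5 * np.sin(2 * np.pi * 220.0 * t)
echoed = add_echo(vocal, sr)
```

In a training pipeline, augmentations like these would typically be applied on the fly to clean inputs so that the model sees degraded F0 and degraded audio while still being supervised by the clean target. Whether R2-SVC does this per frame, per utterance, or with a different distribution is not stated in the abstract.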

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders