LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling
Yubo Huang, Xin Lai, Muyang Ye, Anran Zhu, Zixi Wang, Jingzehua Xu, Shuai Zhang, Zhiyuan Zhou, Weijie Niu

TL;DR
LHQ-SVC introduces a lightweight diffusion-based singing voice conversion model that achieves high quality output with reduced computational requirements, suitable for CPU deployment.
Contribution
The paper presents LHQ-SVC, an efficient SVC model that balances high audio quality with low resource consumption, making CPU-only deployment more practical.
Findings
Maintains competitive voice conversion quality.
Significantly improves processing speed and efficiency.
Optimized for CPU execution with parallel computing.
Abstract
Singing Voice Conversion (SVC) has emerged as a significant subfield of Voice Conversion (VC), enabling the transformation of one singer's voice into another while preserving musical elements such as melody, rhythm, and timbre. Traditional SVC methods have limitations in terms of audio quality, data requirements, and computational complexity. In this paper, we propose LHQ-SVC, a lightweight, CPU-compatible model based on the SVC framework and diffusion model, designed to reduce model size and computational demand without sacrificing performance. We incorporate features to improve inference quality, and optimize for CPU execution by using performance tuning tools and parallel computing frameworks. Our experiments demonstrate that LHQ-SVC maintains competitive performance, with significant improvements in processing speed and efficiency across different devices. The results suggest that…
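The abstract mentions optimizing CPU execution with parallel computing frameworks but does not name them in this excerpt. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below splits a signal into chunks and converts them concurrently with Python's standard `concurrent.futures` module. The names `split_into_chunks`, `placeholder_convert`, and `convert_parallel` are hypothetical, and the "model" is a trivial gain change standing in for a real conversion step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(samples: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split a mono signal into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(samples[i:i + chunk_size]) for i in range(0, len(samples), chunk_size)]


def placeholder_convert(chunk: List[float]) -> List[float]:
    """Hypothetical stand-in for a per-chunk conversion model (assumption: halves the gain)."""
    return [0.5 * x for x in chunk]


def convert_parallel(
    samples: Sequence[float],
    chunk_size: int,
    convert: Callable[[List[float]], List[float]] = placeholder_convert,
    max_workers: int = 4,
) -> List[float]:
    """Convert chunks concurrently and stitch the results back together in order."""
    chunks = split_into_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        converted = list(pool.map(convert, chunks))
    # Flatten the per-chunk outputs into one signal.
    return [x for chunk in converted for x in chunk]


if __name__ == "__main__":
    signal = [float(i) for i in range(10)]
    print(convert_parallel(signal, chunk_size=3))
```

Threads only give a real speedup when the per-chunk work releases the interpreter lock, as native inference runtimes typically do; for pure-Python work a process pool would be the usual substitute. Independent chunking also ignores context across chunk boundaries, which a real conversion model would likely need to handle, for example with overlapping windows.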
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
