Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

Chen Geng; Meng Chen; Ruohua Zhou; Ruolan Liu; Weifeng Zhao

arXiv:2605.12310 · cs.SD · May 13, 2026

TL;DR

Poly-SVC is a singing voice conversion system that models the residual harmonies left behind by vocal separation instead of discarding them, and it outperforms baseline methods in naturalness and timbre similarity.

Contribution

It introduces a zero-shot, cross-lingual SVC system that combines a CQT-based pitch extractor, a random sampler, and a diffusion decoder based on Conditional Flow Matching (CFM) to handle residual harmonies in polyphonic recordings.
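A single-F0 extractor reports one pitch per frame, so a residual harmony sitting under the lead melody is lost. The NumPy sketch below uses a naive single-frame constant-Q analysis to illustrate why a CQT representation can keep both lines. It is not the paper's pitch extractor: the helper names `cqt_magnitudes` and `pick_peaks`, the 55 Hz / 72-bin / 12-bins-per-octave settings, and the peak threshold are illustrative assumptions, and the `argmax` merely stands in for a generic single-F0 tracker.

```python
import numpy as np


def cqt_magnitudes(signal, sr, fmin=55.0, n_bins=72, bins_per_octave=12):
    """Naive constant-Q magnitudes for one analysis frame at the clip centre.

    Bin k is centred at f_k = fmin * 2**(k / bins_per_octave). Its Hann window
    has N_k = Q * sr / f_k samples, so the window shrinks as pitch rises and
    every bin keeps the same ratio of centre frequency to bandwidth (constant Q).
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    mags = np.zeros(n_bins)
    for k, f_k in enumerate(freqs):
        n_k = int(np.ceil(q * sr / f_k))
        if n_k > len(signal):
            continue  # window longer than the clip: leave this bin at zero
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * n / sr)
        start = (len(signal) - n_k) // 2
        # Normalise by window length so bins with different N_k are comparable
        mags[k] = np.abs(np.dot(signal[start:start + n_k], kernel)) / n_k
    return freqs, mags


def pick_peaks(freqs, mags, rel_threshold=0.2):
    """Frequencies of local maxima whose magnitude exceeds rel_threshold * max."""
    floor = rel_threshold * mags.max()
    return [
        float(freqs[k])
        for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]
    ]


# Toy "separated vocal": a lead at A4 plus a quieter residual harmony a major
# third above (C#5), standing in for backing vocals left after separation.
sr = 16000
t = np.arange(sr) / sr
lead_hz = 440.0
harmony_hz = 440.0 * 2.0 ** (4 / 12)
mix = np.sin(2 * np.pi * lead_hz * t) + 0.5 * np.sin(2 * np.pi * harmony_hz * t)

freqs, mags = cqt_magnitudes(mix, sr)
single_f0 = float(freqs[np.argmax(mags)])  # what a single-F0 tracker keeps
pitches = pick_peaks(freqs, mags)           # both lines survive in the CQT bins
```

Here `single_f0` keeps only the 440 Hz lead, while `pitches` contains both 440 Hz and the roughly 554 Hz harmony. A practical front end would compute a frame-by-frame CQT with an efficient library routine (for example `librosa.cqt`) rather than this per-bin loop.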

Findings

01

Poly-SVC outperforms baseline models in naturalness and timbre similarity.

02

Poly-SVC handles both harmony-rich and single-melody recordings, reconstructing harmonies where they are present.

03

Poly-SVC preserves harmonies better than existing methods.

Abstract

Singing Voice Conversion (SVC) aims to transform a source singing voice into the voice of a target singer while preserving the lyrics and melody. Most existing SVC methods depend on F0 extractors to capture the lead melody from clean vocals. However, no existing method can reliably extract clean vocals from accompanied recordings without leaving residual harmonies behind. In this paper, we propose Poly-SVC, a zero-shot, cross-lingual singing voice conversion system designed to process residual harmonies. Poly-SVC is composed of three key components: a Constant-Q Transform (CQT)-based pitch extractor that preserves both the lead melody and residual harmony, a random sampler that reduces interfering information from the CQT, and a diffusion decoder based on Conditional Flow Matching (CFM) that fuses pitch, content, and timbre features into natural-sounding polyphonic outputs. Experiments…
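The abstract names a decoder based on Conditional Flow Matching but gives no equations here. As a hedged illustration, the NumPy sketch below implements the standard optimal-transport CFM path from the flow-matching literature (Lipman et al., 2023), x_t = (1 - (1 - sigma_min) t) x0 + t x1 with target velocity x1 - (1 - sigma_min) x0, plus a first-order Euler sampler. It is not the Poly-SVC decoder: `velocity_fn` stands in for the trained network, the (batch, n_mels, frames) mel-spectrogram shape is an assumption, `cond` is an opaque placeholder for the fused pitch, content, and timbre features, and `oracle_velocity` is a hypothetical toy used only to check the arithmetic.

```python
import numpy as np


def cfm_loss(velocity_fn, x1, cond, rng, sigma_min=1e-4):
    """Conditional flow matching loss on the optimal-transport path.

    x1: clean target features, assumed here to be mel-spectrograms shaped
        (batch, n_mels, frames).
    cond: conditioning passed to the decoder (pitch, content and timbre in
        Poly-SVC; how they are fused is not shown in this sketch).
    """
    x0 = rng.standard_normal(x1.shape)                 # Gaussian noise sample
    t = rng.uniform(size=(x1.shape[0], 1, 1))          # one time value per item
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1  # point on the straight path
    target = x1 - (1.0 - sigma_min) * x0               # d x_t / d t along that path
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, shape, cond, rng, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from noise at t=0 to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = np.full((shape[0], 1, 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x_t, t, cond, sigma_min=1e-4):
    """Hypothetical sanity check: exact velocity when the data is one known x1.

    Here `cond` *is* the target, so x0 can be solved from x_t and the velocity
    is exact; a real decoder would be a trained network instead.
    """
    x1 = cond
    x0 = (x_t - t * x1) / (1.0 - (1.0 - sigma_min) * t)
    return x1 - (1.0 - sigma_min) * x0


rng = np.random.default_rng(0)
x1_batch = rng.standard_normal((4, 80, 50))  # stand-in "mel-spectrogram" batch
loss = cfm_loss(oracle_velocity, x1_batch, x1_batch, rng)           # ~0 for the oracle
sample = euler_sample(oracle_velocity, x1_batch.shape, x1_batch, rng)
```

Because the optimal-transport path is a straight line, the oracle velocity is constant along each trajectory, so ten Euler steps land on x1 + sigma_min * x0, essentially the target. A learned velocity field is only approximately straight, which is why the number of sampling steps matters in practice.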
