Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck

Zhetao Hu; Yiquan Zhou; Wenyu Wang; Zhiyu Wu; Xin Gao; Jihua Zhu

arXiv:2604.05526 · cs.SD · April 8, 2026

TL;DR

This paper introduces a singing style conversion system that improves fine-grained style control and output fidelity while suppressing source-style leakage. It does so with a boundary-aware bottleneck and high-frequency augmentation, and it achieved the best naturalness in the SVCC2025 evaluations.
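High-frequency augmentation of this kind can be pictured as a spectral splice. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it keeps the primary model's STFT magnitude bins below a cutoff and takes the bins at and above the cutoff from an auxiliary 48 kHz model's output for the same frames. The function names, the hard (non-crossfaded) cutoff, the 12 kHz default and the 2048-point FFT are all hypothetical. The summary does not say where the band is split, or whether the completion is applied to training data or to inference output.

```python
from typing import List

# Hypothetical sketch: one STFT frame is a list of magnitudes, one per bin.
Frame = List[float]


def bin_frequency(k: int, sample_rate: int, n_fft: int) -> float:
    """Centre frequency in Hz of STFT bin k."""
    return k * sample_rate / n_fft


def complete_high_band(
    primary: List[Frame],
    auxiliary: List[Frame],
    sample_rate: int = 48000,
    n_fft: int = 2048,
    cutoff_hz: float = 12000.0,  # assumed split point, not from the paper
) -> List[Frame]:
    """Keep primary bins below cutoff_hz; copy bins at/above it from auxiliary.

    Both spectrograms must have the same number of frames and bins.
    """
    if len(primary) != len(auxiliary):
        raise ValueError("frame counts differ")
    out: List[Frame] = []
    for p_frame, a_frame in zip(primary, auxiliary):
        if len(p_frame) != len(a_frame):
            raise ValueError("bin counts differ")
        # Hard splice: low band from the primary model, high band from the auxiliary one.
        merged = [
            a if bin_frequency(k, sample_rate, n_fft) >= cutoff_hz else p
            for k, (p, a) in enumerate(zip(p_frame, a_frame))
        ]
        out.append(merged)
    return out
```

A real system would more likely crossfade across a transition band to avoid an audible seam; the hard split here only keeps the idea visible.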

Contribution

The system combines a boundary-aware Whisper bottleneck, an explicit frame-level technique matrix, and high-frequency band completion to improve style conversion quality and controllability with limited data.
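A frame-level technique matrix can be illustrated as a binary frames-by-techniques grid built from time-stamped technique annotations. In this minimal sketch the technique inventory, the 10 ms hop and the helper name `technique_matrix` are hypothetical, and the targeted F0 processing the paper applies at inference is not modelled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical technique inventory; the real label set is not given here.
TECHNIQUES: Sequence[str] = ("breathy", "falsetto", "vibrato")


def technique_matrix(
    segments: List[Tuple[str, float, float]],
    num_frames: int,
    hop_seconds: float = 0.01,  # assumed 10 ms hop
    techniques: Sequence[str] = TECHNIQUES,
) -> List[List[int]]:
    """Return a num_frames x len(techniques) matrix of 0/1 flags.

    segments holds (technique, start_s, end_s) annotations; each is converted
    to the half-open frame range [start_frame, end_frame), clipped to the clip.
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(techniques)}
    matrix = [[0] * len(techniques) for _ in range(num_frames)]
    for name, start_s, end_s in segments:
        col = index[name]  # raises KeyError for an unknown technique label
        start_f = max(0, round(start_s / hop_seconds))
        end_f = min(num_frames, round(end_s / hop_seconds))
        for t in range(start_f, end_f):
            matrix[t][col] = 1
    return matrix
```

Rounding seconds to frame indices, rather than comparing floating-point timestamps frame by frame, keeps segment edges stable when times like 0.05 s are not exactly representable.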

Findings

1. Achieved the best naturalness in the SVCC2025 evaluations.
2. Maintained competitive speaker similarity and style control.
3. Performed well with significantly less training data than other systems.

Abstract

This paper presents the S4 team's submission to the Singing Voice Conversion Challenge 2025 (SVCC2025): a novel singing style conversion system that advances fine-grained style conversion and control in in-domain settings. To address the critical challenges of style leakage, dynamic rendering, and high-fidelity generation with limited data, we introduce three key innovations: a boundary-aware Whisper bottleneck that pools phoneme-span representations to suppress residual source style while preserving linguistic content; an explicit frame-level technique matrix, enhanced by targeted F0 processing during inference, for stable and distinct dynamic style rendering; and a perceptually motivated high-frequency band completion strategy that leverages an auxiliary standard 48 kHz SVC model to augment the high-frequency spectrum, thereby overcoming data scarcity without overfitting. In…
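The boundary-aware bottleneck is described as pooling phoneme-span representations. One way to read this, sketched below under assumptions: take frame-level encoder features (for example from Whisper), average them within each phoneme span given by boundaries from some aligner, and repeat the span mean across that span's frames so the sequence keeps its frame rate. Mean pooling, the repetition back to frame rate, and the helper name `pool_phoneme_spans` are hypothetical, not details confirmed by the abstract. The intuition, as we read it, is that averaging flattens within-phoneme frame-to-frame variation, where residual source style can survive, while the per-phoneme content is kept.

```python
from typing import List, Tuple

Vector = List[float]


def pool_phoneme_spans(
    frames: List[Vector], spans: List[Tuple[int, int]]
) -> List[Vector]:
    """Hypothetical sketch: replace each frame in a span [start, end) with the span mean.

    Frames not covered by any span are passed through unchanged.
    """
    out = [list(f) for f in frames]
    for start, end in spans:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"bad span ({start}, {end})")
        dim = len(frames[start])
        # Mean over the span, computed per feature dimension.
        mean = [
            sum(frames[t][d] for t in range(start, end)) / (end - start)
            for d in range(dim)
        ]
        # Broadcast the pooled vector back to every frame of the span.
        for t in range(start, end):
            out[t] = list(mean)
    return out
```

Spans are half-open frame-index ranges; in this sketch, frames outside every span (for example silence the aligner did not label) are left as they were.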
