TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Yu Zhang; Ziyue Jiang; Ruiqi Li; Changhao Pan; Jinzheng He; Rongjie Huang; Chuxin Wang; Zhou Zhao

arXiv:2409.15977 · eess.AS · June 2, 2025




TL;DR

TCSinger is a zero-shot singing voice synthesis (SVS) model that supports style transfer and multi-level style control across languages and singing methods. It generates high-quality, stylistically nuanced singing voices from audio and text prompts.

Contribution

TCSinger introduces a multi-module framework for zero-shot style transfer in singing voice synthesis. The framework covers style encoding, style and duration prediction, and style-adaptive decoding.
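The sketch below gives a rough idea of how a predicted duration and a style vector could enter a decoder. It uses two assumed techniques. First, a FastSpeech-style length regulator expands phoneme-level features to frame level. Second, a generic style-adaptive layer normalization rescales each frame. The function names `expand_by_duration` and `style_adaptive_norm` are hypothetical, as are the projection matrices `w_scale` and `w_shift`. None of them come from the TCSinger code.

```python
import math
from typing import List

Vector = List[float]


def expand_by_duration(phoneme_feats: List[Vector], durations: List[int]) -> List[Vector]:
    """Repeat each phoneme-level vector `d` times to get a frame-level sequence.

    Hypothetical FastSpeech-style length regulator (an assumption, not TCSinger's code).
    """
    if len(phoneme_feats) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[Vector] = []
    for feat, d in zip(phoneme_feats, durations):
        frames.extend([list(feat) for _ in range(d)])
    return frames


def linear(w: List[Vector], v: Vector) -> Vector:
    """Matrix-vector product w @ v on plain Python lists."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def style_adaptive_norm(x: Vector, style: Vector, w_scale: List[Vector],
                        w_shift: List[Vector], eps: float = 1e-5) -> Vector:
    """Normalize one frame, then scale and shift it with values projected from `style`.

    Generic adaptive layer normalization; w_scale / w_shift are hypothetical weights.
    """
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    normed = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    gamma = linear(w_scale, style)  # per-channel scale predicted from the style vector
    beta = linear(w_shift, style)   # per-channel shift predicted from the style vector
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]


# Toy usage: two phonemes with predicted durations of 2 and 1 frames.
phonemes = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]
frames = expand_by_duration(phonemes, [2, 1])
style = [1.0, 0.5]
w_scale = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # gamma = 1 on every channel
w_shift = [[0.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # beta = (0, 1, 0)
out = [style_adaptive_norm(f, style, w_scale, w_shift) for f in frames]
print(len(frames), [round(v, 3) for v in out[0]])
```

Conditioning through normalization statistics is a common design choice because one style vector can modulate every frame without changing the sequence length produced by the duration step.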

Findings

1. Outperforms baseline models in quality and style similarity
2. Effective cross-lingual and speech-to-singing style transfer
3. Enables detailed multi-level style control

Abstract

Zero-shot singing voice synthesis (SVS) with style transfer and style control aims to generate high-quality singing voices with unseen timbres and styles (including singing method, emotion, rhythm, technique, and pronunciation) from audio and text prompts. However, the multifaceted nature of singing styles poses a significant challenge for effective modeling, transfer, and control. Furthermore, current SVS models often fail to generate singing voices rich in stylistic nuances for unseen singers. To address these challenges, we introduce TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control. Specifically, TCSinger proposes three primary modules: 1) the clustering style encoder employs a clustering vector quantization model to stably condense style information into a compact latent space; 2) the…
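The clustering style encoder compresses style information with clustering vector quantization. The stdlib-only Python sketch below shows a generic version of that idea under stated assumptions:

- each continuous style vector is snapped to its nearest codebook entry;
- codes that are used drift toward the mean of the features assigned to them;
- codes whose running usage falls below a threshold are re-anchored onto a real feature, so they do not stay idle.

The re-anchoring rule and the hyperparameters `decay`, `lr` and `min_usage` are hypothetical. This is not the paper's exact update.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feats: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Assign each style vector to its nearest code; return indices and quantized vectors."""
    idx = [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k])) for f in feats]
    return idx, [list(codebook[k]) for k in idx]


def update_codebook(feats: List[Vector], codebook: List[Vector], usage: List[float],
                    decay: float = 0.9, lr: float = 0.5, min_usage: float = 0.05,
                    rng: Optional[random.Random] = None) -> Tuple[List[Vector], List[float]]:
    """One clustering-style codebook update (hypothetical rule, for illustration only)."""
    rng = rng or random.Random(0)
    idx, _ = quantize(feats, codebook)
    new_codebook = [list(c) for c in codebook]
    new_usage = []
    for k, code in enumerate(codebook):
        members = [f for f, i in zip(feats, idx) if i == k]
        # Exponential moving average of the share of features assigned to code k.
        u = decay * usage[k] + (1 - decay) * len(members) / len(feats)
        new_usage.append(u)
        if members:
            # Move the code part of the way toward the centroid of its members.
            centroid = [sum(col) / len(members) for col in zip(*members)]
            new_codebook[k] = [(1 - lr) * c + lr * m for c, m in zip(code, centroid)]
        elif u < min_usage:
            # Idle code: re-anchor it on a real feature so it can win assignments again.
            new_codebook[k] = list(rng.choice(feats))
    return new_codebook, new_usage


# Toy usage: code 2 starts far from every feature and is never selected.
feats = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
codebook = [[0.0, 0.0], [1.0, 1.0], [100.0, 100.0]]
usage = [0.3, 0.3, 0.01]
idx, _ = quantize(feats, codebook)
new_codebook, new_usage = update_codebook(feats, codebook, usage)
idx_after, _ = quantize(feats, new_codebook)
print(idx, idx_after)
```

Re-anchoring idle codes is one common way to keep codebook utilization high. That is the kind of stability the abstract attributes to clustering VQ.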


Code & Models

Repositories

AaronZ345/TCSinger (PyTorch, official)

Models

AaronZ345/TCSinger (Hugging Face model)

Videos

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control (Underline)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing