TL;DR
TCSinger 2 is a multilingual zero-shot singing voice synthesis (SVS) model that improves transition smoothness, style control, and robustness without relying on detailed phoneme and note boundary annotations, enabling high-quality, customizable singing synthesis.
Contribution
It introduces a multi-task model with novel modules for boundary prediction, style transfer, and contrastive learning, advancing zero-shot multilingual singing voice synthesis capabilities.
Findings
Outperforms baseline models on both subjective and objective metrics
Enables smooth phoneme and note transitions without boundary annotations
Provides effective multi-level style control via diverse prompts
Abstract
Customizable multilingual zero-shot singing voice synthesis (SVS) has various potential applications in music composition and short-video dubbing. However, existing SVS models depend heavily on phoneme and note boundary annotations, which limits their robustness in zero-shot scenarios and produces poor transitions between phonemes and notes. They also lack effective multi-level style control via diverse prompts. To overcome these challenges, we introduce TCSinger 2, a multi-task multilingual zero-shot SVS model with style transfer and style control based on various prompts. TCSinger 2 mainly includes three key modules: 1) the Blurred Boundary Content (BBC) Encoder, which predicts duration, extends the content embedding, and applies masking to the boundaries to enable smooth transitions; 2) the Custom Audio Encoder, which uses contrastive learning to extract aligned representations from singing, speech,…
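The BBC Encoder combines three steps: predict a duration per content token, expand (length-regulate) the token embeddings to frame level, and mask frames around token boundaries so the decoder must infer the transition itself. A minimal sketch of that pipeline follows, assuming FastSpeech-style length regulation (each embedding repeated for its rounded predicted frame count) and a fixed-width, deterministic mask on both sides of every internal boundary. The function names, the `mask_width` parameter, and the zero fill value are hypothetical; the abstract does not specify TCSinger 2's actual duration predictor or masking rule, so predicted durations are simply passed in here.

```python
from typing import List, Sequence

Vector = List[float]


def round_durations(pred_durations: Sequence[float]) -> List[int]:
    """Convert predicted (float) frame counts to integers, keeping at least 1 frame per token."""
    return [max(1, int(round(d))) for d in pred_durations]


def length_regulate(content: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Repeat each token embedding durations[i] times to obtain a frame-level sequence."""
    if len(content) != len(durations):
        raise ValueError("exactly one duration per content token is required")
    frames: List[Vector] = []
    for emb, d in zip(content, durations):
        frames.extend(list(emb) for _ in range(d))
    return frames


def boundary_mask(durations: Sequence[int], mask_width: int) -> List[bool]:
    """Flag frames within mask_width frames of each internal token boundary.

    Hypothetical scheme: a fixed-width window on both sides of every boundary.
    """
    total = sum(durations)
    masked = [False] * total
    boundary = 0
    for d in durations[:-1]:  # only boundaries between consecutive tokens
        boundary += d
        for t in range(max(0, boundary - mask_width), min(total, boundary + mask_width)):
            masked[t] = True
    return masked


def blurred_boundary_content(
    content: Sequence[Vector],
    pred_durations: Sequence[float],
    mask_width: int = 1,
    mask_value: float = 0.0,
) -> List[Vector]:
    """Length-regulate content embeddings, then overwrite boundary frames with mask_value."""
    durations = round_durations(pred_durations)
    frames = length_regulate(content, durations)
    mask = boundary_mask(durations, mask_width)
    dim = len(content[0]) if content else 0
    return [[mask_value] * dim if m else f for f, m in zip(frames, mask)]


if __name__ == "__main__":
    # Three toy 2-dimensional "phoneme" embeddings with predicted durations.
    out = blurred_boundary_content(
        [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [4.2, 3.8, 2.9], mask_width=1
    )
    for i, frame in enumerate(out):
        print(i, frame)
```

Here the predicted durations round to 4, 4 and 3 frames, and with `mask_width=1` the single frame on each side of the two token boundaries (frames 3, 4, 7 and 8) is replaced by the mask value, leaving only the interior frames of each token visible.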
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Dropout · Adam · Multi-Head Attention · Dense Connections · Layer Normalization · Contrastive Learning
