TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching

Wenxiang Guo, Yu Zhang, Changhao Pan, Rongjie Huang, Li Tang, Ruiqi Li, Zhiqing Hong, Yongqi Wang, Zhou Zhao

arXiv:2502.12572 · cs.SD · April 22, 2025

TL;DR

TechSinger is a novel singing voice synthesis system that provides precise, multi-technique control across five languages using flow-matching models and natural language prompts, significantly improving expressiveness and realism.

Contribution

TechSinger introduces a flow-matching-based generative model for controllable singing voice synthesis with multilingual and multi-technique support, together with automatic phoneme-level technique annotation and control through natural-language prompts.
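The flow-matching objective can be illustrated with a minimal sketch. It assumes the common linear (rectified-flow / optimal-transport) path between Gaussian noise x0 and a data sample x1, under which the regression target for the velocity network is the constant x1 - x0; this page does not state whether TechSinger uses exactly this path or parameterization. The function names are hypothetical. During training, t would be drawn uniformly from [0, 1], and a real system would predict the velocity with a neural network (presumably conditioned on lyrics, notes and technique labels, which is also an assumption) rather than with the closed-form field used in the usage note.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def cfm_training_pair(x1: Vector, t: float, rng: random.Random) -> Tuple[Vector, Vector]:
    """Build one (x_t, target velocity) pair for conditional flow matching.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I),
    whose time derivative, and hence the regression target, is u = x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return xt, target


def cfm_loss(pred: Vector, target: Vector) -> float:
    """Mean squared error between predicted and target velocities."""
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(
    velocity: Callable[[Vector, float], Vector], x0: Vector, steps: int = 10
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # left endpoint of each step, so t = 1 is never evaluated
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

As a sanity check, if the data distribution is concentrated at a single point mu, the exact velocity is (mu - x) / (1 - t), so `euler_sample(lambda x, t: [(m - xi) / (1.0 - t) for m, xi in zip(mu, x)], noise, steps=8)` should return mu up to floating-point error. Because the sampler only evaluates left endpoints, this closed-form field never divides by zero.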

Findings

01. Outperforms existing methods in audio quality and technique control
02. Supports five languages and seven vocal techniques
03. Enhances the expressiveness and realism of synthetic singing voices

Abstract

Singing voice synthesis has made remarkable progress in generating natural and high-quality voices. However, existing methods rarely provide precise control over vocal techniques such as intensity, mixed voice, falsetto, bubble, and breathy tones, thus limiting the expressive potential of synthetic voices. We introduce TechSinger, an advanced system for controllable singing voice synthesis that supports five languages and seven vocal techniques. TechSinger leverages a flow-matching-based generative model to produce singing voices with enhanced expressive control over various techniques. To enhance the diversity of training data, we develop a technique detection model that automatically annotates datasets with phoneme-level technique labels. Additionally, our prompt-based technique prediction model enables users to specify desired vocal attributes through natural language, offering…
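The abstract does not describe how the detection model turns its outputs into phoneme-level labels. One plausible post-processing step, sketched here purely as an assumption, pools a detector's frame-level technique probabilities over each phoneme's aligned frames and thresholds the averages into multi-hot labels. The detector outputs, the (phoneme, start, end) alignment spans, the 0.5 threshold, the function name, and the binary treatment of every technique are all hypothetical; only the five techniques the abstract names are listed, and the remaining two are not guessed.

```python
from typing import Dict, List, Sequence, Tuple

# Only the five techniques named in the abstract; the full system uses seven.
TECHNIQUES = ["intensity", "mixed_voice", "falsetto", "bubble", "breathy"]


def phoneme_level_labels(
    frame_probs: Sequence[Sequence[float]],
    phoneme_spans: Sequence[Tuple[str, int, int]],
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Pool frame-level technique probabilities into phoneme-level labels.

    frame_probs[f][k] is a hypothetical detector's probability that technique k
    is active in frame f; phoneme_spans holds (phoneme, start_frame, end_frame)
    with an exclusive end, e.g. from a forced aligner.
    """
    labels = []
    for phoneme, start, end in phoneme_spans:
        frames = frame_probs[start:end]
        if not frames:
            raise ValueError(f"empty span for phoneme {phoneme!r}")
        # Average each technique's probability over the phoneme's frames.
        means = [sum(f[k] for f in frames) / len(frames) for k in range(len(TECHNIQUES))]
        active = [name for name, p in zip(TECHNIQUES, means) if p >= threshold]
        labels.append({"phoneme": phoneme, "techniques": active})
    return labels


if __name__ == "__main__":
    probs = [
        [0.1, 0.9, 0.0, 0.0, 0.2],
        [0.2, 0.8, 0.1, 0.0, 0.3],
        [0.1, 0.1, 0.0, 0.0, 0.9],
        [0.0, 0.2, 0.0, 0.1, 0.7],
    ]
    print(phoneme_level_labels(probs, [("l", 0, 2), ("a", 2, 4)]))
    # [{'phoneme': 'l', 'techniques': ['mixed_voice']}, {'phoneme': 'a', 'techniques': ['breathy']}]
```

Averaging before thresholding keeps a single noisy frame from flipping a phoneme's label; a learned pooling layer or a per-technique threshold would be equally reasonable choices.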

Code & Models

Repositories

gwx314/techsinger (PyTorch, official)

Models

verstar/TechSinger (Hugging Face model)

Videos

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching · Underline

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing