Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

Changhao Pan; Dongyu Yao; Yu Zhang; Wenxiang Guo; Jingyu Lu; Zhiyuan Zhu; Zhou Zhao

arXiv:2601.13910·eess.AS·January 22, 2026

Open Access

TL;DR

This survey comprehensively reviews deep-learning-based singing voice synthesis systems, categorizing architectures, analyzing core technologies, and discussing datasets and evaluation methods to guide future research and development.

Contribution

The survey offers what its authors present as the first systematic categorization and analysis of SVS architectures, core technologies, datasets, and evaluation benchmarks.

Findings

01

Cascaded and end-to-end architectures are the main paradigms.

02

Core technologies include singing modeling and control techniques.

03

The survey reviews datasets, annotation tools, and evaluation benchmarks that support training and assessment.

Abstract

Recent advances in singing voice synthesis (SVS) have attracted substantial attention from both academia and industry. With the advent of large language models and novel generative paradigms, producing controllable, high-fidelity singing voices has become an attainable goal. Yet the field still lacks a comprehensive survey that systematically analyzes deep-learning-based singing voice synthesis systems and their enabling technologies. To address this gap, this survey first categorizes existing systems by task type and then organizes current architectures into two major paradigms: cascaded and end-to-end approaches. We then provide an in-depth analysis of core technologies, covering singing modeling and control techniques. Finally, we review relevant datasets, annotation tools, and evaluation benchmarks that support training and assessment. In the appendix, we introduce…

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Music and Audio Processing