A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems
Yin-Ping Cho, Fu-Rong Yang, Yung-Chuan Chang, Ching-Ting Cheng, Xiao-Han Wang, Yi-Wen Liu

TL;DR
This survey reviews recent deep learning-based singing voice synthesis systems, highlighting their architectures, strengths, limitations, and the future challenges in achieving human-like singing voice synthesis.
Contribution
It provides a comprehensive overview of state-of-the-art deep learning models for SVS, summarizing their architectures and identifying key challenges for future research.
Findings
Deep learning significantly improves singing voice naturalness.
Current systems face challenges in achieving human-like expressiveness.
Future research needs to address limitations in model generalization and quality.
Abstract
Singing voice synthesis (SVS) is a task that aims to generate audio signals according to musical scores and lyrics. Because the task concerns both music and language, producing singing voices indistinguishable from those of human singers has long remained an unfulfilled pursuit. Nonetheless, advances in deep learning techniques have brought about a substantial leap in the quality and naturalness of synthesized singing voices. This paper reviews several state-of-the-art deep learning-driven SVS systems. We summarize the model architectures they deploy and identify the strengths and limitations of each system. In doing so, we trace the recent trajectory of the field and outline the challenges that remain to be resolved in both commercial applications and academic research.
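To make the task definition concrete, the sketch below shows one simple way a musical score and its lyrics could be turned into the kind of frame-level input an SVS model conditions on. It is a hypothetical illustration, not the front-end of any system covered by the survey. The `Note` structure, the 10 ms hop size, syllable-level (rather than phoneme-level) lyrics, and the 440 Hz equal-tempered tuning are all assumptions made for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    """Hypothetical score event: one sung syllable on one pitch."""
    lyric: str         # syllable sung on this note (real systems often use phonemes)
    midi_pitch: int    # MIDI note number, e.g. 69 = A4
    duration_s: float  # note length in seconds, taken from the score and tempo


def midi_to_hz(midi_pitch: int) -> float:
    """Convert a MIDI note number to frequency, assuming 12-TET with A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def score_to_frames(notes: List[Note], hop_s: float = 0.01) -> List[Tuple[str, float]]:
    """Expand a note sequence into per-frame (lyric, target F0 in Hz) pairs.

    Each note is repeated for round(duration / hop) frames; rounding per note
    can drift slightly over long scores, which is acceptable for a sketch.
    """
    frames: List[Tuple[str, float]] = []
    for note in notes:
        n_frames = round(note.duration_s / hop_s)
        f0 = midi_to_hz(note.midi_pitch)
        frames.extend([(note.lyric, f0)] * n_frames)
    return frames


if __name__ == "__main__":
    score = [Note("la", 69, 0.5), Note("li", 72, 0.25)]
    frames = score_to_frames(score)
    print(len(frames), frames[0], frames[-1])
```

The flat, note-level F0 produced here is only a score-derived target; a key difficulty for SVS systems is generating the natural deviations from it (vibrato, overshoot, timing variation) that make singing sound human.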
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
