YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
Chunbo Hao, Junjie Zheng, Guobin Ma, Yuepeng Jiang, Huakang Chen, Wenjie Tian, Gongyu Chen, Zihao Chen, Lei Xie

TL;DR
YingMusic-Singer-Plus is a diffusion-based singing voice synthesis model that allows flexible lyric editing and melody control without manual alignment, outperforming existing methods in melody preservation and lyric adherence.
Contribution
It introduces a novel diffusion model for controllable singing synthesis with lyric manipulation and presents a new benchmark for evaluating melody-preserving lyric modifications.
Findings
YingMusic-Singer-Plus outperforms Vevo2 in melody preservation and lyric adherence.
The model supports lyric editing without manual alignment.
Code and benchmark are publicly available at the provided GitHub link.
Abstract
Regenerating singing voices with altered lyrics while preserving melody consistency remains challenging, as existing methods either offer limited controllability or require laborious manual alignment. We propose YingMusic-Singer-Plus, a fully diffusion-based model enabling melody-controllable singing voice synthesis with flexible lyric manipulation. The model takes three inputs: an optional timbre reference, a melody-providing singing clip, and modified lyrics, without manual alignment. Trained with curriculum learning and Group Relative Policy Optimization, YingMusic-Singer-Plus achieves stronger melody preservation and lyric adherence than Vevo2, the most comparable baseline supporting melody control without manual alignment. We also introduce LyricEditBench, the first benchmark for melody-preserving lyric modification evaluation. The code, weights, benchmark, and demos are publicly…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
