DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization

Huakang Chen; Yuepeng Jiang; Guobin Ma; Chunbo Hao; Shuai Wang; Jixun Yao; Ziqian Ning; Meng Meng; Jian Luan; Lei Xie

arXiv:2507.12890 · eess.AS · July 25, 2025

TL;DR

DiffRhythm+ is a diffusion-based model that generates full-length, expressive songs with enhanced controllability and diversity by using a balanced dataset, multi-modal style conditioning, and preference optimization.

Contribution

The paper introduces DiffRhythm+, which improves the controllability, diversity, and quality of full-length song generation through dataset balancing, multi-modal style conditioning, and preference-guided optimization.

Findings

01. Significant improvements in naturalness and musical expressiveness.

02. Enhanced controllability over musical styles via multi-modal conditioning.

03. Higher listener satisfaction and arrangement complexity.

Abstract

Songs, as a central form of musical art, exemplify the richness of human intelligence and creativity. While recent advances in generative modeling have enabled notable progress in long-form song generation, current systems for full-length song synthesis still face major challenges, including data imbalance, insufficient controllability, and inconsistent musical quality. DiffRhythm, a pioneering diffusion-based model, advanced the field by generating full-length songs with expressive vocals and accompaniment. However, its performance was constrained by an unbalanced model training dataset and limited controllability over musical style, resulting in noticeable quality disparities and restricted creative flexibility. To address these limitations, we propose DiffRhythm+, an enhanced diffusion-based framework for controllable and flexible full-length song generation. DiffRhythm+ leverages a…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation