Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

Changhao Jiang; Jiahao Chen; Zhenghao Xiang; Zhixiong Yang; Hanchen Wang; Jiabao Zhuang; Xinmeng Che; Jiajun Sun; Hui Li; Yifei Cao; Shihan Dou; Ming Zhang; Junjie Ye; Tao Ji; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2601.03973·cs.SD·January 13, 2026

TL;DR

Muse is an open-source system for long-form song generation that offers fine-grained style control, using a synthetic dataset and a simple training approach to achieve competitive quality and controllability.

Contribution

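This work introduces a fully open-source, reproducible pipeline for long-form song generation with style control. It comprises a licensed synthetic dataset, training and evaluation pipelines, and a model trained by single-stage supervised finetuning, with no task-specific losses, auxiliary objectives, or additional architectural components.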

Findings

01 Muse achieves competitive performance on phoneme error rate.

02 Muse enables controllable segment-level music generation.

03 The system performs well despite modest data and model size.
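
The phoneme error rate named in the first finding is commonly computed as the edit distance between a reference and a recognized phoneme sequence, divided by the reference length. The following is a minimal sketch under that assumption; the function names and the example phoneme labels are hypothetical, and the paper's actual evaluation pipeline (including how phonemes are extracted from generated audio) is not described on this page.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # reference phoneme missing from hyp
            insertion = curr[j - 1] + 1  # extra phoneme in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one of four reference phonemes is dropped.
reference = ["HH", "AH", "L", "OW"]
recognized = ["HH", "AH", "L"]
per = phoneme_error_rate(reference, recognized)
```

Lower values indicate that the sung lyrics are recognized more faithfully; because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.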

Abstract

Recent commercial systems such as Suno demonstrate strong capabilities in long-form song generation, while academic research remains largely non-reproducible due to the lack of publicly available training data, hindering fair comparison and progress. To this end, we release a fully open-source system for long-form song generation with fine-grained style conditioning, including a licensed synthetic dataset, training and evaluation pipelines, and Muse, an easy-to-deploy song generation model. The dataset consists of 116k fully licensed synthetic songs with automatically generated lyrics and style descriptions paired with audio synthesized by SunoV5. We train Muse via single-stage supervised finetuning of a Qwen-based language model extended with discrete audio tokens using MuCodec, without task-specific losses, auxiliary objectives, or additional architectural components. Our evaluations…
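
One common way to extend a text language model with discrete audio tokens, as the abstract describes, is to append the codec's code indices to the end of the text vocabulary and train with ordinary next-token prediction. The sketch below illustrates that idea only. The class and function names, the vocabulary and codebook sizes, the single-codebook layout, and the choice to mask the prompt out of the loss are all assumptions made for illustration; none of them is confirmed by the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# -100 is the default ignore_index of PyTorch's cross-entropy loss;
# using it here to mask prompt positions is an assumption.
IGNORE_INDEX = -100


@dataclass(frozen=True)
class ExtendedVocab:
    """Hypothetical layout: text ids first, audio codes appended after."""
    text_vocab_size: int
    codebook_size: int

    @property
    def size(self) -> int:
        return self.text_vocab_size + self.codebook_size

    def audio_token_id(self, code: int) -> int:
        """Map a codec code index to an id in the extended vocabulary."""
        if not 0 <= code < self.codebook_size:
            raise ValueError(f"code {code} outside codebook")
        return self.text_vocab_size + code

    def code_from_token_id(self, token_id: int) -> int:
        """Inverse mapping, used when decoding generated ids to audio."""
        if not self.text_vocab_size <= token_id < self.size:
            raise ValueError(f"token id {token_id} is not an audio token")
        return token_id - self.text_vocab_size


def build_example(
    vocab: ExtendedVocab,
    prompt_ids: Sequence[int],
    audio_codes: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and audio ids; supervise only the audio span."""
    audio_ids = [vocab.audio_token_id(c) for c in audio_codes]
    input_ids = list(prompt_ids) + audio_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + audio_ids
    return input_ids, labels


# Toy sizes chosen for readability, not taken from the paper.
vocab = ExtendedVocab(text_vocab_size=1000, codebook_size=16)
input_ids, labels = build_example(vocab, prompt_ids=[5, 6], audio_codes=[0, 15])
```

With a layout like this, the model needs only a larger embedding table and output projection, which is consistent with the abstract's claim of no additional architectural components.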

Code & Models

Datasets

bolshyC/Muse (dataset · 2.6k downloads)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Artificial Intelligence in Games