SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment
Dapeng Wu, Shun Lei, Wei Tan, Guangzheng Li, Yunzhe Wang, Huaicheng Zhang, Lishi Zuo, Zhiyong Wu

TL;DR
SongBench is a new benchmark framework for detailed, multi-aspect evaluation of song quality, aiding the development of more professional and musically coherent text-to-song models.
Contribution
SongBench introduces a fine-grained, expert-annotated evaluation framework that assesses song generation models across seven musical dimensions.
Findings
High correlation between SongBench scores and expert ratings
Fine-grained performance gaps revealed in current state-of-the-art models
An expert-annotated database of 11,717 samples
Abstract
Recent advancements in Text-to-Song generation have enabled realistic musical content production, yet existing evaluation benchmarks lack the professional granularity to capture multi-dimensional aesthetic nuances. In this paper, we propose SongBench, a specialized framework for fine-grained song assessment across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. Using this framework, we construct a database of 11,717 samples generated by state-of-the-art models and annotated by music professionals. Extensive experimental results demonstrate that SongBench achieves high correlation with expert ratings. By revealing fine-grained performance gaps in current state-of-the-art models, SongBench serves as a diagnostic benchmark to steer development toward more professional and musically coherent song generation.
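The abstract reports that SongBench "achieves high correlation with expert ratings" but does not name the correlation measure. The sketch below is a minimal illustration, assuming agreement is measured per dimension with Pearson (linear) and Spearman (rank) correlation between automatically predicted scores and expert scores for the same samples. The function names and the dict-of-lists data layout are hypothetical and are not SongBench's released code.

```python
import math
from statistics import mean

# The seven SongBench dimensions, as listed in the abstract.
DIMENSIONS = ("Vocal", "Instrument", "Melody", "Structure",
              "Arrangement", "Mixing", "Musicality")


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def per_dimension_agreement(predicted, expert):
    """Hypothetical helper: predicted/expert map each dimension to a list of
    scores over the same samples; returns dimension -> (pearson, spearman)."""
    return {
        dim: (pearson(predicted[dim], expert[dim]),
              spearman(predicted[dim], expert[dim]))
        for dim in DIMENSIONS
    }
```

Spearman is shown alongside Pearson because opinion-style ratings are often treated as ordinal. Rank correlation only asks whether an evaluator orders samples the way the experts do, while Pearson also rewards a linear match in score magnitudes.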
