Benchmarking Music Generation Models and Metrics via Human Preference Studies

Florian Grötschla; Ahmet Solak; Luca A. Lanzendörfer; Roger Wattenhofer

arXiv:2506.19085 · cs.LG · June 25, 2025

TL;DR

This paper benchmarks state-of-the-art music generation models by comparing human preferences with various metrics through large-scale listening tests, providing insights into model quality and metric effectiveness.

Contribution

The paper introduces a large human preference dataset for music generation models and, to the authors' knowledge, is the first to rank current state-of-the-art models and metrics by human preference.

Findings

01 How strongly existing metrics correlate with human preferences varies from metric to metric.

02 The dataset of generated music and human evaluations supports better evaluation of music generation quality.

03 Open access to the dataset promotes further research on subjective metric evaluation.

Abstract

Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.
