Music Arena: Live Evaluation for Text-to-Music

Yonghyun Kim; Wayne Chi; Anastasios N. Angelopoulos; Wei-Lin Chiang; Koichi Saito; Shinji Watanabe; Yuki Mitsufuji; Chris Donahue

arXiv:2507.20900 · cs.SD · November 4, 2025

TL;DR

Music Arena is an open platform for scalable, live human preference evaluation of text-to-music (TTM) models. It aims to advance TTM research through a standardized comparison protocol, data transparency, and features designed specifically for music.

Contribution

The paper introduces a live evaluation platform with features tailored to TTM, including an LLM-based routing system, and uses it to collect a renewable source of human preference data.

Findings

01. Established a scalable human preference evaluation protocol.

02. Created a renewable, privacy-preserving preference data resource.

03. Demonstrated the platform's utility in benchmarking TTM systems.

Abstract

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering *live* evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing…
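The abstract says that pairwise preferences are compiled into a leaderboard but, in the excerpt shown here, does not say how. As a rough illustration only, the sketch below fits a Bradley-Terry model to (winner, loser) pairs, a common choice for arena-style rankings. It is an assumption, not the paper's stated method. The function names, the tuple format, and the system names "A", "B", and "C" are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(battles, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: the input format is an assumption, not the
    schema of any released dataset. Returns strengths that sum to 1.
    """
    wins = defaultdict(float)         # total wins per system
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    systems = set()
    for winner, loser in battles:
        if winner == loser:
            continue  # a system cannot be compared against itself
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        systems.update((winner, loser))
    if not systems:
        return {}

    strength = {s: 1.0 / len(systems) for s in systems}
    for _ in range(iters):
        updated = {}
        for s in systems:
            # Standard minorization-maximization update:
            # p_s = wins_s / sum_j n_sj / (p_s + p_j)
            denom = 0.0
            for pair, n in pair_counts.items():
                if s in pair:
                    (other,) = pair - {s}
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom if denom > 0 else strength[s]
        total = sum(updated.values())
        strength = {s: v / total for s, v in updated.items()}
    return strength


def leaderboard(battles):
    """Return system names ordered from strongest to weakest."""
    strength = bradley_terry(battles)
    return sorted(strength, key=strength.get, reverse=True)


# Made-up votes for three hypothetical systems.
example_battles = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("B", "C")] * 7 + [("C", "B")] * 3
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
example_ranking = leaderboard(example_battles)
```

A model-based fit like this accounts for which opponents each system faced, which a raw win rate does not. That matters when systems are not paired uniformly at random.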

Code & Models

Datasets

music-arena/music-arena-dataset
dataset · 4.2k downloads
