CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa

TL;DR
CMI-Bench introduces a comprehensive benchmark for evaluating music instruction-following capabilities of audio-text LLMs across diverse MIR tasks, highlighting current model limitations and biases.
Contribution
This paper presents CMI-Bench, a standardized benchmark for assessing music instruction following in audio-text LLMs across multiple MIR tasks, enabling fair comparison between models.
Findings
Significant performance gaps between LLMs and supervised models.
Identification of cultural, chronological, and gender biases in models.
Benchmark supports multiple open-source audio-text LLMs.
Abstract
Recent advances in audio-text large language models (LLMs) have opened new possibilities for music understanding and generation. However, existing benchmarks are limited in scope, often relying on simplified tasks or multi-choice evaluations that fail to reflect the complexity of real-world music analysis. We reinterpret a broad range of traditional MIR annotations as instruction-following formats and introduce CMI-Bench, a comprehensive music instruction following benchmark designed to evaluate audio-text LLMs on a diverse set of music information retrieval (MIR) tasks. These include genre classification, emotion regression, emotion tagging, instrument classification, pitch estimation, key detection, lyrics transcription, melody extraction, vocal technique recognition, instrument performance technique detection, music tagging, music captioning, and (down)beat tracking: reflecting core…
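To illustrate what reinterpreting an MIR annotation as an instruction-following example might look like, here is a minimal Python sketch for genre classification. The label set, prompt wording, `InstructionExample` type, and scoring rule are all hypothetical and chosen for illustration; they are not the prompts or metrics used by CMI-Bench.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    audio_path: str   # path to the audio clip given to the model
    instruction: str  # natural-language task description
    reference: str    # ground-truth answer taken from the annotation


# Hypothetical label set, for illustration only.
GENRES = ["blues", "classical", "jazz", "rock"]


def genre_annotation_to_instruction(audio_path: str, label: str) -> InstructionExample:
    """Wrap a (clip, genre label) annotation as an instruction example."""
    if label not in GENRES:
        raise ValueError(f"unknown genre label: {label!r}")
    instruction = (
        "Listen to the audio clip and name its genre. "
        "Answer with one of: " + ", ".join(GENRES) + "."
    )
    return InstructionExample(audio_path, instruction, label)


def score_genre_response(response: str, reference: str) -> bool:
    """Accept a free-text response only if the reference is the sole genre it mentions."""
    text = response.lower()
    mentioned = [g for g in GENRES if g in text]
    return mentioned == [reference]
```

A call such as `genre_annotation_to_instruction("clip.wav", "jazz")` yields an example whose `reference` is `"jazz"`, and `score_genre_response` can then be applied to the model's free-text output. Rejecting responses that name several genres is one possible design choice to avoid rewarding hedged answers.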
Taxonomy
Topics: Diverse Music Education Insights
Methods: Sparse Evolutionary Training
