The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Jiajia Li; Lu Yang; Mingni Tang; Cong Chen; Zuchao Li; Ping Wang; Hai Zhao

arXiv:2406.15885·cs.SD·June 25, 2024·1 cites

TL;DR

ZIQI-Eval is a large-scale, comprehensive benchmark designed to evaluate the musical abilities of large language models, revealing that current models perform poorly in this domain and highlighting the need for further development.

Contribution

The paper introduces ZIQI-Eval, the first dedicated large-scale music benchmark for LLMs, enabling standardized assessment of their musical capabilities.

Findings

01 All 16 evaluated LLMs perform poorly on the benchmark.

02 The benchmark covers 10 major categories and 56 subcategories.

03 The dataset contains over 14,000 curated questions.

Abstract

Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive, large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. Using ZIQI-Eval, we conduct a comprehensive evaluation of 16 LLMs to analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we…

Code & Models

Repositories

zcli-charlie/ziqi-eval (PyTorch, official)

Videos

The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models · Underline

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies