Exploring GPT's Ability as a Judge in Music Understanding
Kun Fang, Ziyu Wang, Gus Xia, and Ichiro Fujinaga

TL;DR
This paper investigates GPT's ability to serve as a judge in music information retrieval (MIR) tasks by converting music data into symbolic form and evaluating how well it detects annotation errors in beat tracking, chord extraction, and key estimation.
Contribution
It introduces a systematic prompt engineering approach and a concept augmentation method to assess GPT's music reasoning and error detection in MIR tasks.
Findings
GPT achieves error detection accuracies of 65.20% in beat tracking, 64.80% in chord extraction, and 59.72% in key estimation.
Error detection accuracy improves with more concept information.
Results surpass the random baseline, indicating GPT's potential in MIR evaluation.
Abstract
Recent progress in text-based Large Language Models (LLMs) and their extended ability to process multi-modal sensory data have led us to explore their applicability in addressing music information retrieval (MIR) challenges. In this paper, we use a systematic prompt engineering approach for LLMs to solve MIR problems. We convert the music data to symbolic inputs and evaluate LLMs' ability to detect annotation errors in three key MIR tasks: beat tracking, chord extraction, and key estimation. A concept augmentation method is proposed to evaluate LLMs' music reasoning consistency with the provided music concepts in the prompts. Our experiments tested the MIR capabilities of Generative Pre-trained Transformers (GPT). Results show that GPT has an error detection accuracy of 65.20%, 64.80%, and 59.72% in beat tracking, chord extraction, and key estimation tasks, respectively, all…
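The abstract describes converting music data to symbolic inputs and asking the model whether an annotation contains an error. The Python sketch below illustrates that general idea for beat tracking only. It is not the authors' pipeline: the function names, the 0.2-second shift used to corrupt a beat, and the prompt wording are all assumptions made for illustration, and no model is actually called.

```python
import random


def inject_beat_error(beats, rng, shift=0.2):
    """Return a copy of `beats` with one interior beat shifted, plus its index.

    The shift size is a hypothetical choice, not a value from the paper.
    """
    corrupted = list(beats)
    idx = rng.randrange(1, len(beats) - 1)  # keep the first and last beats intact
    corrupted[idx] = round(corrupted[idx] + shift, 3)
    return corrupted, idx


def build_prompt(beats):
    """Serialize beat times (in seconds) into a plain-text yes/no question.

    The wording is an assumed example, not the prompt used in the paper.
    """
    listing = ", ".join(f"{t:.2f}" for t in beats)
    return (
        "The following are annotated beat times in seconds:\n"
        f"{listing}\n"
        "Does this annotation contain an error? Answer 'yes' or 'no'."
    )


def detection_accuracy(predictions, labels):
    """Fraction of examples where the judge's verdict matches the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


rng = random.Random(0)
clean = [0.5 * i for i in range(8)]          # a steady 120 BPM beat grid
corrupted, idx = inject_beat_error(clean, rng)
prompt = build_prompt(corrupted)
```

In a setup like this, half of the prompts could carry a corrupted annotation and half a clean one, so a judge that guesses at random would be expected to score about 50% under `detection_accuracy`.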
Taxonomy
Topics: Diverse Interdisciplinary Research Innovations · Online Learning and Analytics
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Discriminative Fine-Tuning · Layer Normalization · Cosine Annealing · Dense Connections · Adam · Softmax
