MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Benno Weck; Ilaria Manco; Emmanouil Benetos; Elio Quinton; George Fazekas; Dmitry Bogdanov

arXiv:2408.01337·cs.SD·August 5, 2024·3 cites

TL;DR

MuChoMusic is a comprehensive benchmark designed to evaluate the music understanding capabilities of multimodal audio-language models, highlighting current limitations and guiding future improvements in the field.

Contribution

The paper introduces a new, human-validated benchmark of diverse multiple-choice questions for assessing multimodal models' music understanding, and it analyzes the performance and shortcomings of existing models.

Findings

1. Models often rely too heavily on the language modality.

2. Current models struggle with fundamental musical concepts.

3. The benchmark reveals significant room for improvement in multimodal integration.

Abstract

Multimodal models that jointly process audio and language hold great promise in audio understanding and are increasingly being adopted in the music domain. By allowing users to query via text and obtain information about a given audio input, these models have the potential to enable a variety of music understanding tasks via language-based interfaces. However, their evaluation poses considerable challenges, and it remains unclear how to effectively assess their ability to correctly interpret music-related inputs with current methods. Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in multimodal language models focused on audio. MuChoMusic comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks sourced from two publicly available music datasets, and covering a wide variety of genres. Questions in the…
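As a rough illustration of how a multiple-choice benchmark of this kind can be scored, the sketch below computes plain accuracy over a set of questions. The `Question` schema, its field names, and the toy items are hypothetical and are not taken from the MuChoMusic release; the benchmark's actual data format and evaluation protocol may differ.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical schema, not the benchmark's real format)."""
    track_id: str
    prompt: str
    options: list[str]
    answer_index: int  # index of the correct option within `options`


def accuracy(questions: list[Question], predictions: list[int]) -> float:
    """Return the fraction of questions whose predicted option index is correct."""
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not questions:
        return 0.0
    correct = sum(q.answer_index == p for q, p in zip(questions, predictions))
    return correct / len(questions)


# Toy items invented purely for illustration.
toy = [
    Question("t1", "Which instrument carries the melody?", ["piano", "violin", "flute"], 1),
    Question("t2", "What is the overall mood?", ["calm", "aggressive", "playful"], 0),
]

print(accuracy(toy, [1, 2]))  # 0.5: the first answer is right, the second is wrong
```

In practice the predicted indices would come from parsing a model's free-text answer, a step that is omitted here.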

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies