BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning
Min Jang, Orevaoghene Ahia, Nazif Tamer, Sachin Kumar, Yulia Tsvetkov, Noah A. Smith

TL;DR
BASS is a comprehensive benchmark for evaluating music understanding in audio language models, focusing on structural, semantic, and reasoning tasks across diverse music genres, revealing current models' strengths and limitations.
Contribution
This work introduces BASS, a large-scale, multi-task benchmark for assessing reasoning and understanding in music-focused audio language models, highlighting their current capabilities and gaps.
Findings
State-of-the-art models excel at lyric transcription.
Models struggle with structural segmentation and artist collaboration.
Current models effectively leverage linguistic priors.
Abstract
Music understanding is a complex task that often requires reasoning over both structural and semantic elements of audio. We introduce BASS, designed to evaluate music understanding and reasoning in audio language models across four broad categories: structural segmentation, lyric transcription, musicological analysis, and artist collaboration. BASS comprises 2658 questions spanning 12 tasks and 1993 unique songs, covering over 138 hours of music from a wide range of genres and tracks, crafted to assess musicological knowledge and reasoning in real-world scenarios. We evaluate 14 open-source and frontier multimodal LMs, finding that even state-of-the-art models struggle on higher-level reasoning tasks such as structural segmentation and artist collaboration, while performing best on lyric transcription. Our analysis reveals that current models leverage linguistic priors effectively but…
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Music Technology and Sound Studies
