TinyMU: A Compact Audio-Language Model for Music Understanding
Xiquan Li, Aurian Quelennec, Slim Essid

TL;DR
TinyMU is a compact, 229-million-parameter music-language model that rivals larger models in music understanding tasks while being efficient enough for edge deployment.
Contribution
The paper introduces TinyMU, a small yet high-performing music-language model trained on a new curated dataset, enabling effective music understanding with reduced computational costs.
Findings
TinyMU achieves 82% of SOTA performance on the MuChoMusic benchmark.
TinyMU is 35 times smaller than comparable large models.
The model performs well in both basic and complex music reasoning tasks.
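As a rough back-of-the-envelope illustration of the size figures above (not a calculation taken from the paper), the sketch below works out what a 35x size ratio implies for the larger model's parameter count and how much memory 229M parameters occupy at common numeric precisions. The helper names, the choice of precisions, and the weights-only accounting (activations, caches, and runtime overhead are ignored) are assumptions made for illustration.

```python
# Illustrative arithmetic only; the helper names and precisions are assumptions,
# not details from the TinyMU paper.

TINYMU_PARAMS = 229e6   # 229M parameters, as stated in the summary
SIZE_RATIO = 35         # "35 times smaller", as stated in the findings

# Bytes per parameter for a few common storage precisions (assumed, not from the paper).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def implied_baseline_params(small_params: float, size_ratio: float) -> float:
    """Parameter count of a model that is `size_ratio` times larger."""
    return small_params * size_ratio


def weight_memory_mb(params: float, bytes_per_param: float) -> float:
    """Weights-only memory footprint in megabytes (1 MB = 1e6 bytes)."""
    return params * bytes_per_param / 1e6


if __name__ == "__main__":
    baseline = implied_baseline_params(TINYMU_PARAMS, SIZE_RATIO)
    print(f"Implied larger-model size: about {baseline / 1e9:.1f}B parameters")
    for precision, nbytes in BYTES_PER_PARAM.items():
        small_mb = weight_memory_mb(TINYMU_PARAMS, nbytes)
        large_mb = weight_memory_mb(baseline, nbytes)
        print(f"{precision}: TinyMU ~{small_mb:.0f} MB, larger model ~{large_mb:.0f} MB")
```

Under these assumptions, the 35x ratio corresponds to a comparison model of roughly 8 billion parameters, and TinyMU's weights fit in well under 1 GB at every listed precision, which is consistent with the edge-deployment claim.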
Abstract
Music understanding and reasoning are central challenges in the Music Information Research field, with applications ranging from retrieval and recommendation to music agents and virtual assistants. Recent Large Audio-Language Models (LALMs) have shown remarkable progress in answering music-related questions by following user instructions. However, their massive scale, often billions of parameters, results in expensive training, slow inference, and limited deployability on edge devices. In this work, we present TinyMU, a lightweight (229M) Music-Language Model (MLM) that achieves performance comparable to much larger LALMs while remaining efficient and compact. To train TinyMU, we introduce MusicSkills-3.5M, a carefully curated, music-grounded question-answering dataset with 3.5M samples. Spanning multiple-choice, binary, and open-ended formats, this dataset provides fine-grained…
