Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

TL;DR
This paper evaluates how well current foundation models generalize across diverse musical traditions, revealing their strengths and limitations in capturing world music representations.
Contribution
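It provides a comprehensive evaluation of five audio foundation models on six music corpora (Western popular, Greek, Turkish, and Indian classical traditions). It assesses cross-cultural music understanding with three complementary methodologies: probing, targeted fine-tuning, and few-shot learning.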
Findings
Larger models generally perform better on non-Western music.
Model performance declines with increasing cultural distance.
The proposed approaches achieve state-of-the-art results on five of the six evaluated datasets, setting benchmarks for future research.
Abstract
Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize across diverse musical traditions. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions. We employ three complementary methodologies to investigate these models' cross-cultural capabilities: probing to assess inherent representations, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Our analysis shows varying cross-cultural generalization, with larger models typically performing better on non-Western music, though performance declines for culturally distant traditions. Notably, our approaches achieve state-of-the-art performance on five out of six evaluated datasets,…
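Probing, the first of the three methodologies, keeps the foundation model frozen and trains only a lightweight classifier on its embeddings. The sketch below illustrates this idea for multi-label tagging, under stated assumptions. It uses one logistic-regression head per tag, fitted by plain gradient descent in NumPy, on synthetic stand-in embeddings. The function names (`train_linear_probe`, `predict`), the hyperparameters, and the data are hypothetical illustrations. They are not the paper's actual probe architecture or training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_linear_probe(emb, labels, lr=0.5, epochs=500, weight_decay=1e-4):
    """Fit one logistic-regression head per tag on frozen embeddings (hypothetical helper).

    emb:    (n_clips, dim) array of frozen foundation-model embeddings
    labels: (n_clips, n_tags) multi-hot array with values in {0, 1}
    """
    n, d = emb.shape
    k = labels.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        probs = sigmoid(emb @ W + b)      # independent sigmoid per tag (multi-label)
        grad = (probs - labels) / n       # gradient of mean binary cross-entropy w.r.t. logits
        W -= lr * (emb.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(emb, W, b, threshold=0.5):
    # Each tag is switched on independently when its probability passes the threshold.
    return (sigmoid(emb @ W + b) >= threshold).astype(int)

# Synthetic stand-in data: 400 "clips" with 16-dim embeddings and 3 tags.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(16, 3))
emb = rng.normal(size=(400, 16))
labels = (emb @ true_W > 0).astype(int)

# Train on the first 300 clips; the encoder that produced `emb` is never updated.
W, b = train_linear_probe(emb[:300], labels[:300])
preds = predict(emb[300:], W, b)
acc = (preds == labels[300:]).mean()
print(f"held-out per-tag accuracy: {acc:.2f}")
```

Because only `W` and `b` are learned, probe accuracy reflects what the frozen representation already encodes. This is why probing suits comparing models across traditions. Fine-tuning 1-2 layers, by contrast, lets the representation itself adapt.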
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Music Technology and Sound Studies
