Layer-wise Investigation of Large-Scale Self-Supervised Music Representation Models
Yizhi Zhou, Haina Zhu, Hangting Chen

TL;DR
This paper investigates layer-wise information in large-scale self-supervised music models, analyzing their effectiveness across tasks and how layer selection impacts performance, to better understand their capabilities.
Contribution
It provides a detailed layer-wise analysis of SSL music models, revealing how different layers contribute to various music information retrieval tasks.
Findings
SSL models outperform traditional methods in multiple tasks
Layer specialization varies across different music tasks
Selecting specific layers can optimize model performance
Abstract
Pre-trained models for music information retrieval based on self-supervised learning (SSL) have recently become popular, showing success in various downstream tasks. However, there is limited research on the specific meanings of the encoded information and their applicability. Exploring these aspects can help us better understand their capabilities and limitations, leading to more effective use in downstream tasks. In this study, we analyze the advanced music representation model MusicFM and the newly emerged SSL model MuQ. We focus on three main aspects: (i) validating the advantages of SSL models across multiple downstream tasks, (ii) exploring the specialization of layer-wise information for different tasks, and (iii) comparing performance differences when selecting specific layers. Through this analysis, we reveal insights into the structure and potential applications of SSL…
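As a rough illustration of what "selecting specific layers" can mean in practice, the sketch below contrasts taking features from a single layer with a softmax-weighted sum over all layers. It is not the authors' code and does not use the MusicFM or MuQ APIs: the hidden states are random placeholders, and the function names, the 13-layer count, and the feature dimension are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def select_layer(hidden_states, layer):
    """Time-averaged features from one layer.

    hidden_states: array of shape (num_layers, num_frames, dim).
    """
    return hidden_states[layer].mean(axis=0)

def weighted_layer_sum(hidden_states, layer_logits):
    """Softmax-weighted combination of all layers, then a time average.

    layer_logits would normally be learned with the downstream probe;
    here they are simply passed in.
    """
    weights = softmax(layer_logits)                           # (num_layers,)
    combined = np.tensordot(weights, hidden_states, axes=1)   # (num_frames, dim)
    return combined.mean(axis=0)

# Placeholder hidden states standing in for a real model's per-layer outputs.
# All sizes are hypothetical.
rng = np.random.default_rng(0)
num_layers, num_frames, dim = 13, 50, 16
hidden_states = rng.standard_normal((num_layers, num_frames, dim))

single = select_layer(hidden_states, layer=6)
mixed = weighted_layer_sum(hidden_states, np.zeros(num_layers))
```

With all logits set to zero the weights are uniform, so `mixed` reduces to a plain average over layers and frames. In a probing setup, either feature vector would be fed to a small classifier for each downstream task, and per-layer scores would then be compared.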
Taxonomy
Topics: Music and Audio Processing · Recommender Systems and Techniques · Information Retrieval and Search Behavior
Methods: Focus
