TL;DR
This paper investigates how music audio representation models behave when trained on limited data. It finds that, in certain scenarios, limited-data models and even random models perform comparably to models trained on large datasets.
Contribution
The paper provides a comprehensive analysis of music representation models trained on limited data, characterizing their robustness and limitations and comparing them against handcrafted features.
Findings
Limited-data models can perform similarly to large-data models on some tasks.
Representations from random models sometimes match learned representations in effectiveness.
Handcrafted features outperform learned representations on certain music information retrieval tasks.
Abstract
Large deep-learning models for music, including those focused on learning general-purpose music audio representations, are often assumed to require substantial training data to achieve high performance. If true, this would pose challenges in scenarios where audio data or annotations are scarce, such as for underrepresented music traditions, non-popular genres, and personalized music creation and listening. Understanding how these models behave in limited-data scenarios could be crucial for developing techniques to tackle them. In this work, we investigate the behavior of several music audio representation models under limited-data learning regimes. We consider music models with various architectures, training paradigms, and input durations, and train them on data collections ranging from 5 to 8,000 minutes long. We evaluate the learned representations on various music information…
