Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems
Yan-Martin Tamm, Anna Aljanaki

TL;DR
This study evaluates nine pretrained audio models for music recommendation tasks, revealing significant performance differences and highlighting the potential for improved recommendation systems using transfer learning.
Contribution
The study systematically assesses pretrained audio representations across multiple recommendation approaches and scenarios, filling a gap between MIR and recommender systems research.
Findings
Pretrained models show varied effectiveness in music recommendation tasks.
Performance on MIR tasks differs significantly from performance in recommendation scenarios.
The study provides a foundation for future research on pretrained audio representations in MRS.
Abstract
Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning has demonstrated the effectiveness of pretrained backend models for a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. Our research addresses this gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) in the context of MRS. We assess them using five recommendation approaches: K-Nearest Neighbours (KNN), Shallow Neural Network, Contrastive Multi-Modal projection, a Hybrid model,…
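To illustrate how a K-Nearest Neighbours recommender can run directly on pretrained audio representations, here is a minimal sketch. It assumes each track already has a fixed-size embedding pooled from one of the pretrained backends, and it ranks candidate tracks by cosine similarity to the mean embedding of a user's listening history. The function names (`l2_normalize`, `knn_recommend`) and the mean-profile design are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def knn_recommend(item_embeddings: np.ndarray, history: list[int], k: int = 10) -> list[int]:
    """Return indices of the k tracks most similar to a user's history.

    item_embeddings: (n_items, dim) array; assumed here to be track-level
        embeddings pooled from a pretrained audio backend (hypothetical setup).
    history: indices of tracks the user has already interacted with.
    """
    emb = l2_normalize(item_embeddings)
    # User profile (an assumption for this sketch): mean of the normalized
    # embeddings of the listened tracks, re-normalized to unit length.
    profile = emb[history].mean(axis=0)
    profile /= max(np.linalg.norm(profile), 1e-12)
    scores = emb @ profile          # cosine similarity to every track
    scores[history] = -np.inf       # exclude tracks the user already knows
    top = np.argsort(-scores)[:k]   # highest similarity first
    return top.tolist()


# Toy example: two "clusters" of tracks in a 2-D embedding space.
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(knn_recommend(items, history=[0], k=2))  # [1, 3]
```

Normalizing the embeddings first makes the score a cosine similarity, which keeps the ranking insensitive to differences in embedding magnitude across tracks; a real system would typically replace the exhaustive dot product with an approximate nearest-neighbour index once the catalogue grows large.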
