Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

Yan-Martin Tamm; Anna Aljanaki

arXiv:2604.23077·cs.IR·April 28, 2026

TL;DR

This study evaluates nine pretrained audio models as backends for music recommender systems, finding substantial performance differences between them and showing the potential of transfer learning for recommendation.

Contribution

The paper systematically assesses pretrained audio representations across multiple recommendation approaches and scenarios, filling a gap between MIR and recommender systems research.

Findings

01. Pretrained models show varied effectiveness in music recommendation tasks.

02. Performance disparity exists between MIR tasks and recommendation scenarios.

03. The study provides a foundation for future research on pretrained audio representations in MRS.

Abstract

Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning has shown pretrained backend models to be effective across a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. Our research addresses this gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) in the context of MRS. We assess them using five recommendation approaches: K-Nearest Neighbours (KNN), Shallow Neural Network, Contrastive Multi-Modal projection, a Hybrid model,…
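
To make the KNN approach named in the abstract more concrete, here is a minimal sketch of one way a nearest-neighbour recommender could sit on top of frozen pretrained audio embeddings. It is an illustration under stated assumptions, not the authors' implementation: the user profile as a mean of listened-track embeddings, the cosine similarity measure, the function names `cosine` and `recommend_knn`, and the toy 3-dimensional vectors are all hypothetical. In practice the vectors would come from one of the pretrained models listed above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_knn(track_embeddings, listened, k=2):
    """Rank unheard tracks by similarity to the mean embedding of listened tracks.

    Assumption: the user profile is the plain average of the listened tracks'
    embeddings. The paper may build profiles differently.
    """
    dim = len(next(iter(track_embeddings.values())))
    profile = [
        sum(track_embeddings[t][i] for t in listened) / len(listened)
        for i in range(dim)
    ]
    # Only tracks the user has not heard yet are candidates.
    candidates = [t for t in track_embeddings if t not in listened]
    candidates.sort(
        key=lambda t: cosine(profile, track_embeddings[t]), reverse=True
    )
    return candidates[:k]


# Toy vectors standing in for frozen pretrained audio embeddings (hypothetical).
toy_embeddings = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
    "track_d": [0.7, 0.3, 0.0],
}
recommendations = recommend_knn(toy_embeddings, listened={"track_a"}, k=2)
print(recommendations)
```

Because the embeddings stay frozen, a setup like this needs no training, which makes it a direct probe of how much recommendation-relevant information a pretrained representation already carries.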
