Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems

Yan-Martin Tamm; Anna Aljanaki

arXiv:2409.08987 · cs.IR · September 16, 2024

TL;DR

This paper evaluates six pretrained audio models for music recommendation systems, revealing significant variability in their effectiveness and highlighting the need for task-specific adaptation in MIR applications.

Contribution

It provides a comparative analysis of pretrained audio representations in the context of music recommender systems, an area previously underexplored.

Findings

01

Pretrained models show varied performance in MRS tasks.

02

Traditional MIR models may not directly transfer to recommendation tasks.

03

The study establishes a baseline for future research in pretrained audio representations for MRS.

Abstract

Over the years, Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models with a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio…
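To make the KNN setting mentioned in the abstract concrete, here is a minimal sketch of nearest-neighbour recommendation over frozen track embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function name `recommend_knn`, the mean-of-listened-tracks user profile, the cosine similarity measure, and the toy 2-dimensional vectors are all hypothetical. In practice the embeddings would come from one of the pretrained models listed above.

```python
import numpy as np


def recommend_knn(track_embeddings, listened, k=2):
    """Return indices of the k unseen tracks closest to the user's profile.

    Assumption: the user profile is the mean of the unit-normalised
    embeddings of the tracks the user has listened to.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    unit = track_embeddings / np.clip(norms, 1e-12, None)

    # Hypothetical user profile: average of the listened tracks' unit vectors.
    profile = unit[listened].mean(axis=0)

    # Score every track against the profile.
    scores = unit @ profile

    # Never recommend tracks the user has already heard.
    scores[listened] = -np.inf

    # Highest-scoring tracks first.
    return np.argsort(-scores)[:k].tolist()


# Toy stand-in for pretrained audio embeddings (5 tracks, 2 dimensions).
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.3],
    [0.0, 1.0],
    [0.1, 0.9],
])

recs = recommend_knn(embeddings, listened=[0, 1], k=2)
print(recs)  # [2, 4]
```

The audio model stays frozen in this sketch, so the recommendation quality depends entirely on how well the embedding space reflects listening preferences.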

Code & Models

Repositories

Darel13712/pretrained-audio-representations
PyTorch · Official

Taxonomy

Methods: Dense Connections · Convolution · Dilated Convolution · VQ-VAE · Layer Normalization · Position-Wise Feed-Forward Layer · Residual Connection · Jukebox