Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks

Florian Grötschla; Luca Strässle; Luca A. Lanzendörfer; Roger Wattenhofer

arXiv:2409.09026 · cs.SD · September 16, 2024

TL;DR

This paper explores the use of contrastively pretrained neural audio embeddings, especially CLAP, to improve music recommendation systems by addressing cold-start issues and enriching content-based information.

Contribution

The paper integrates contrastively pretrained neural audio embeddings, such as CLAP, into graph-based music recommendation frameworks and demonstrates their effectiveness.

Findings

01. Neural embeddings outperform traditional hand-crafted features.

02. CLAP embeddings significantly improve cold-start recommendations.

03. Contrastive pretraining enhances content-based music representations.

Abstract

Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based…
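To make the cold-start idea concrete, the following is a minimal sketch of how content-based embeddings could place a new track among existing ones when no interaction data is available. It is not the paper's method: the toy three-dimensional vectors stand in for real audio embeddings (for example, ones produced by a model such as CLAP), and the names `cosine_similarity`, `nearest_tracks`, `catalog`, and `new_track` are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_tracks(new_embedding, catalog, k=2):
    """Return the k catalog tracks whose embeddings are closest to the new one."""
    scored = [
        (track_id, cosine_similarity(new_embedding, embedding))
        for track_id, embedding in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Assumption: toy vectors standing in for precomputed audio embeddings.
catalog = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 1.0, 0.0],
}

# A new track with no listening history; only its audio embedding is known.
new_track = [1.0, 0.05, 0.0]

neighbors = nearest_tracks(new_track, catalog, k=2)
for track_id, score in neighbors:
    print(f"{track_id}: {score:.3f}")
```

In a graph-based recommender, neighbours found this way could plausibly serve as initial edges or node features for the new track, letting it borrow signal from well-connected items until real interactions accumulate.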

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis