LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation
Rebecca Salganik, Xiaohao Liu, Yunshan Ma, Jian Kang, and Tat-Seng Chua

TL;DR
LARP is a multi-modal contrastive learning framework that improves cold-start playlist continuation by integrating language, audio, and relational signals into content representations. It outperforms uni-modal and multi-modal baselines on public datasets.
Contribution
The paper introduces LARP, a novel three-stage contrastive learning model that effectively incorporates multi-modal and relational signals for cold-start playlist continuation.
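To make the idea of contrastive alignment between modalities concrete, here is a minimal sketch of a symmetric InfoNCE loss between language and audio embeddings. It is an illustration under assumptions, not the authors' implementation: the summary does not describe the three stages in detail, and the function names (`l2_normalize`, `info_nce`), the temperature value, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(text_embs, audio_embs, temperature=0.1):
    """Symmetric InfoNCE loss (illustrative, not LARP's exact objective).

    The i-th text embedding and the i-th audio embedding are treated as the
    positive pair; every other pairing in the batch acts as a negative.
    """
    t = [l2_normalize(v) for v in text_embs]
    a = [l2_normalize(v) for v in audio_embs]
    n = len(t)

    # Temperature-scaled cosine similarity matrix: sim[i][j] = <t_i, a_j> / tau.
    sim = [
        [sum(x * y for x, y in zip(t[i], a[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(c) for c in zip(*sim)]  # transpose for the audio-to-text direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(cols))


# Toy batch of two tracks (hypothetical 2-d embeddings).
text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(text, audio_aligned)
loss_swapped = info_nce(text, audio_swapped)
```

In this toy batch the loss is lower when each track's audio embedding points the same way as its text embedding, which is the behaviour a contrastive objective rewards. A relational variant could reuse the same loss with positives drawn from tracks that share a playlist, but that pairing is an assumption rather than something this summary specifies.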
Findings
LARP outperforms uni-modal and multi-modal baselines on public datasets.
The three-stage contrastive framework enhances content representations for cold-start scenarios.
Code and datasets are publicly available for reproducibility.
Abstract
As online music consumption increasingly shifts towards playlist-based listening, the task of playlist continuation, in which an algorithm suggests songs to extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming. Currently, many existing playlist continuation approaches rely on collaborative filtering methods to perform recommendation. However, such methods will struggle to recommend songs that lack interaction data, an issue known as the cold-start problem. Current approaches to this challenge design complex mechanisms for extracting relational signals from sparse collaborative data and integrating them into content representations. However, these approaches leave content representation learning out of scope and utilize frozen, pre-trained content models that may not be aligned with the distribution or format of a specific…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research
Methods: Contrastive Learning
