TALKPLAY: Multimodal Music Recommendation with Large Language Models

Seungheon Doh; Keunwoo Choi; Juhan Nam

arXiv:2502.13713 · cs.IR · May 27, 2025

Open Access · 1 dataset

TL;DR

TALKPLAY is a multimodal music recommendation system built on large language models. It encodes diverse music data into tokens, which enables end-to-end conversational recommendation with natural language responses and better recommendation accuracy than unimodal approaches.

Contribution

The paper presents a novel multimodal music tokenizer and vocabulary expansion for LLMs, unifying recommendation and dialogue into a single end-to-end system.

Findings

1. Outperforms unimodal approaches in recommendation accuracy.
2. Effectively handles long conversational contexts.
3. Generates natural language responses for user interaction.

Abstract

We present TALKPLAY, a novel multimodal music recommendation system that reformulates recommendation as a token generation problem using large language models (LLMs). By leveraging the instruction-following and natural language generation capabilities of LLMs, our system effectively recommends music from diverse user queries while generating contextually relevant responses. While pretrained LLMs are primarily designed for text modality, TALKPLAY extends their scope through two key innovations: a multimodal music tokenizer that encodes audio features, lyrics, metadata, semantic tags, and playlist co-occurrence signals; and a vocabulary expansion mechanism that enables unified processing and generation of both linguistic and music-relevant tokens. By integrating the recommendation system directly into the LLM architecture, TALKPLAY transforms conventional systems by: (1) unifying previous…
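The vocabulary expansion idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the token format (`<audio_2>`), the one-code-per-modality scheme, and the helper names `expand_vocabulary`, `tokenize_track`, and `encode` are all assumptions made for illustration. The sketch only shows how music tokens and text tokens could share one ID space, so that a single sequence model could read and emit both.

```python
from typing import Dict, List

# Hypothetical modality names, loosely following the five signals
# listed in the abstract (audio, lyrics, metadata, tags, playlists).
MODALITIES = ["audio", "lyrics", "metadata", "tags", "playlist"]


def expand_vocabulary(text_vocab: Dict[str, int], codes_per_modality: int) -> Dict[str, int]:
    """Return a copy of text_vocab with one new token per (modality, code) pair."""
    vocab = dict(text_vocab)
    next_id = max(vocab.values()) + 1 if vocab else 0
    for modality in MODALITIES:
        for code in range(codes_per_modality):
            vocab[f"<{modality}_{code}>"] = next_id
            next_id += 1
    return vocab


def tokenize_track(codes: Dict[str, int]) -> List[str]:
    """Map a track's per-modality codes to music tokens in a fixed modality order."""
    return [f"<{m}_{codes[m]}>" for m in MODALITIES]


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Look up the integer ID of every token; raises KeyError on unknown tokens."""
    return [vocab[t] for t in tokens]


# Toy text vocabulary (assumption: a real LLM vocabulary is far larger).
text_vocab = {"<pad>": 0, "play": 1, "something": 2, "calm": 3}
vocab = expand_vocabulary(text_vocab, codes_per_modality=4)

# Assumed per-modality codes for one track, e.g. cluster indices of its embeddings.
track_codes = {"audio": 2, "lyrics": 0, "metadata": 3, "tags": 1, "playlist": 2}
track_tokens = tokenize_track(track_codes)

# A user query followed by the recommended track, as one mixed token sequence.
sequence = ["play", "something", "calm"] + track_tokens
ids = encode(sequence, vocab)
```

In this toy setup the original text IDs stay unchanged and the music tokens are appended after them, so recommending a track amounts to generating its music tokens as the continuation of the dialogue text.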

Code & Models

Datasets

talkpl-ai/TalkPlayData-1
dataset · 21 downloads

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Music History and Culture