Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation

Alessandro B. Melchiorre; Elena V. Epure; Shahed Masoudian; Gustavo Escobedo; Anna Hausberger; Manuel Moussallam; Markus Schedl

arXiv:2507.15826 · cs.IR · July 22, 2025

TL;DR

JAM is a lightweight, multimodal, and personalized framework for natural language music recommendation. It models user-query-item interactions as vector translations in a shared latent space, so that a single retrieval step accounts for both the text query and the user's long-term preferences.

Contribution

This paper introduces JAM, a novel, scalable framework that models user-query-item interactions as vector translations, integrating multimodal features and long-term preferences for music recommendation.
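To make the vector-translation idea concrete, here is a minimal sketch in the spirit of TransE, where a head plus a relation should land near a tail. It assumes the user embedding acts as the head, the query embedding as the translation, and the item embedding as the tail, with negative Euclidean distance as the score. The function names, the toy 2-D vectors, and the choice of distance are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def translation_scores(user_vec, query_vec, item_matrix):
    """Score every item by how close it lies to (user + query).

    Assumption: items are ranked by negative Euclidean distance, as in
    TransE-style models; the paper's exact scoring function may differ.
    """
    target = user_vec + query_vec  # translate the user by the query
    return -np.linalg.norm(item_matrix - target, axis=1)


def recommend(user_vec, query_vec, item_matrix, k=2):
    """Return the indices of the k highest-scoring items."""
    scores = translation_scores(user_vec, query_vec, item_matrix)
    return np.argsort(-scores)[:k]


# Toy embeddings (hypothetical values chosen for illustration only).
user = np.array([1.0, 0.0])
query = np.array([0.0, 1.0])
items = np.array([
    [1.0, 1.0],  # item 0: exactly user + query
    [3.0, 3.0],  # item 1: far from the translated point
    [1.0, 0.5],  # item 2: close to the translated point
])

top = recommend(user, query, items, k=2)
print(top)
```

In this toy setup, item 0 sits exactly at the translated point and item 2 is the next nearest, so those two are retrieved ahead of item 1.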

Findings

01. JAM achieves accurate recommendations in experiments.

02. JAM produces intuitive, practical representations.

03. JAM can be integrated into existing systems.

Abstract

Natural language interfaces offer a compelling approach for music recommendation, enabling users to express complex preferences conversationally. While Large Language Models (LLMs) show promise in this direction, their scalability in recommender systems is limited by high costs and latency. Retrieval-based approaches using smaller language models mitigate these issues but often rely on single-modal item representations, overlook long-term user preferences, and require full model retraining, posing challenges for real-world deployment. In this paper, we present JAM (Just Ask for Music), a lightweight and intuitive framework for natural language music recommendation. JAM models user-query-item interactions as vector translations in a shared latent space, inspired by knowledge graph embedding methods like TransE. To capture the complexity of music and user intent, JAM aggregates multimodal…
