MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong, Bin Chen, Xiulong Liu, Pawel Polak, Peng Zhang

TL;DR
MuseChat is a dialogue-based system that personalizes music recommendations for videos: it incorporates user preferences, explains its suggestions through multi-modal reasoning, and outperforms existing video-based music retrieval methods.
Contribution
MuseChat introduces a conversational music recommendation system for videos that integrates reasoning and explanation capabilities, built on a large language model extended to multi-modal inputs.
Findings
MuseChat outperforms existing video-based music retrieval methods.
The system offers strong interpretability and user interaction.
A large-scale dataset was created for evaluation.
Abstract
Music recommendation for videos attracts growing interest in multi-modal research. However, existing systems focus primarily on content compatibility, often ignoring users' preferences. Their inability to interact with users for further refinement or to provide explanations leads to a less satisfying experience. We address these issues with MuseChat, a first-of-its-kind dialogue-based recommendation system that personalizes music suggestions for videos. Our system consists of two key functionalities with associated modules: recommendation and reasoning. The recommendation module takes a video, along with optional information including previously suggested music and the user's preferences, as input and retrieves appropriate music matching the context. The reasoning module, powered by a Large Language Model (Vicuna-7B) extended to multi-modal inputs, is able to provide…
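To make the recommendation module's inputs and output concrete, here is a minimal sketch of the interface the abstract describes: a video, an optional user preference, and an optional previously suggested track go in, and one track comes out. Everything below is an assumption made for illustration and not the authors' implementation: the names `build_query` and `recommend`, the `pref_weight` parameter, the use of precomputed embedding vectors, the weighted-sum fusion, cosine-similarity ranking, and the choice to skip the previously suggested track.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_query(video: Vector,
                preference: Optional[Vector] = None,
                pref_weight: float = 1.0) -> List[float]:
    """Fuse the video embedding with an optional preference embedding.

    Hypothetical fusion rule: a weighted sum. The source does not say
    how MuseChat combines these inputs.
    """
    query = list(video)
    if preference is not None:
        query = [q + pref_weight * p for q, p in zip(query, preference)]
    return query


def recommend(video: Vector,
              catalog: Dict[str, Vector],
              preference: Optional[Vector] = None,
              previous_id: Optional[str] = None,
              pref_weight: float = 1.0) -> str:
    """Return the id of the catalog track most similar to the fused query.

    Assumption: the previously suggested track, if given, is skipped so
    that a refinement turn yields a different suggestion.
    """
    query = build_query(video, preference, pref_weight)
    candidates = {tid: emb for tid, emb in catalog.items() if tid != previous_id}
    if not candidates:
        raise ValueError("catalog has no candidate tracks left")
    return max(candidates, key=lambda tid: cosine(query, candidates[tid]))


# Toy two-dimensional embeddings, invented for this example.
catalog = {"calm": [1.0, 0.0], "upbeat": [0.0, 1.0], "mixed": [0.7, 0.7]}
video = [1.0, 0.1]

first = recommend(video, catalog)
refined = recommend(video, catalog, preference=[0.0, 1.0], previous_id=first)
```

In a dialogue setting, each user turn would call `recommend` again with an updated preference and the last suggestion, which is the refinement loop the abstract refers to. The reasoning module that explains each suggestion is not modeled here.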
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Music Technology and Sound Studies
