Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles
Zihan Wang, Xiaocui Yang, Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang

TL;DR
Muse is the first multimodal conversational recommendation dataset that uses scenario-grounded user profiles and multimodal large language models to generate realistic, high-quality dialogues in the clothing domain, bridging the gap between research and real-world applications.
Contribution
Muse is a scalable, scenario-grounded multimodal conversational recommendation dataset synthesized with multimodal large language models, intended to support research on multimodal conversational recommender systems.
Findings
High-quality multimodal conversations validated by human and LLM evaluations.
Fine-tuning experiments show Muse's effectiveness in learning recommendation patterns.
Demonstrates the feasibility of automatic data synthesis for multimodal dialogue datasets.
Abstract
Current conversational recommendation systems focus predominantly on text. However, real-world recommendation settings are generally multimodal, leaving a significant gap between existing research and practical applications. To address this issue, we propose Muse, the first multimodal conversational recommendation dataset. Muse comprises 83,148 utterances from 7,000 conversations in the clothing domain. Each conversation contains comprehensive multimodal interactions, rich elements, and natural dialogues. The data in Muse are automatically synthesized by a multi-agent framework powered by multimodal large language models (MLLMs). For better scalability, the framework derives user profiles from real-world scenarios rather than relying on manual design or historical data, and then performs conversation simulation and optimization. Both human and LLM evaluations demonstrate…
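The abstract outlines a three-stage synthesis pipeline: derive a user profile from a real-world scenario, simulate a conversation between agents, then optimize it. The sketch below mirrors only that structure and is not the authors' implementation. The `LLM` callable type, the prompts, the `catalog` of image IDs, the `echo` stand-in model, and the rule used in `optimize` are all hypothetical placeholders. It also checks the reported statistics: 83,148 utterances over 7,000 conversations works out to roughly 11.88 utterances per conversation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type for a (multimodal) LLM call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Turn:
    speaker: str                                        # "user" or "recommender"
    text: str
    image_ids: List[str] = field(default_factory=list)  # hypothetical item-image references


@dataclass
class Conversation:
    profile: Dict[str, str]
    turns: List[Turn]


def derive_profile(scenario: str, llm: LLM) -> Dict[str, str]:
    """Stage 1: ground a user profile in a real-world scenario
    instead of a hand-written persona or interaction history."""
    prompt = f"Describe a shopper whose clothing needs arise from: {scenario}"
    return {"scenario": scenario, "profile": llm(prompt)}


def simulate(profile: Dict[str, str], user: LLM, recommender: LLM,
             catalog: List[str], max_turns: int = 4) -> Conversation:
    """Stage 2: alternate a user agent and a recommender agent.
    Assumption: each recommender reply attaches one catalog image."""
    turns: List[Turn] = []
    for i in range(max_turns):
        if i % 2 == 0:
            turns.append(Turn("user", user(f"As {profile['profile']}, respond.")))
        else:
            item = catalog[(i // 2) % len(catalog)]
            turns.append(Turn("recommender", recommender(f"Recommend {item}."), [item]))
    return Conversation(profile, turns)


def optimize(conv: Conversation) -> Conversation:
    """Stage 3 (placeholder rule): drop empty turns and consecutive
    duplicates from the same speaker. A real optimizer would likely
    rewrite turns with an MLLM."""
    cleaned: List[Turn] = []
    for t in conv.turns:
        if not t.text.strip():
            continue
        if cleaned and cleaned[-1].speaker == t.speaker and cleaned[-1].text == t.text:
            continue
        cleaned.append(t)
    return Conversation(conv.profile, cleaned)


def avg_utterances(total_utterances: int, n_conversations: int) -> float:
    """Average number of utterances per conversation."""
    return total_utterances / n_conversations


def echo(prompt: str) -> str:
    """Stand-in model so the sketch runs without any real MLLM."""
    return f"[reply to: {prompt}]"


p = derive_profile("attending a friend's outdoor summer wedding", echo)
conv = optimize(simulate(p, echo, echo, catalog=["img_001", "img_002"]))
print(len(conv.turns), round(avg_utterances(83_148, 7_000), 2))  # 4 11.88
```

Passing the agents in as plain callables keeps the pipeline stages independent of any particular model API, so a real MLLM client could be swapped in for `echo` without changing the stage functions.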
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems
