Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles

Zihan Wang; Xiaocui Yang; Yongkang Liu; Shi Feng; Daling Wang; Yifei Zhang

arXiv:2412.18416 · cs.MM · April 16, 2025

Open Access

TL;DR

Muse is the first multimodal conversational recommendation dataset that uses scenario-grounded user profiles and multimodal large language models to generate realistic, high-quality dialogues in the clothing domain, bridging the gap between research and real-world applications.

Contribution

The paper introduces a scalable, scenario-grounded multimodal conversational recommendation dataset synthesized with multimodal large language models, supporting research on multimodal conversational recommendation systems.

Findings

01

High-quality multimodal conversations validated by human and LLM evaluations.

02

Fine-tuning experiments show Muse's effectiveness in learning recommendation patterns.

03

Demonstrates the feasibility of automatic data synthesis for multimodal dialogue datasets.

Abstract

Current conversational recommendation systems focus predominantly on text. However, real-world recommendation settings are generally multimodal, causing a significant gap between existing research and practical applications. To address this issue, we propose Muse, the first multimodal conversational recommendation dataset. Muse comprises 83,148 utterances from 7,000 conversations centered around the Clothing domain. Each conversation contains comprehensive multimodal interactions, rich elements, and natural dialogues. Data in Muse are automatically synthesized by a multi-agent framework powered by multimodal large language models (MLLMs). It innovatively derives user profiles from real-world scenarios rather than depending on manual design and history data for better scalability, and then it fulfills conversation simulation and optimization. Both human and LLM evaluations demonstrate…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems