Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
Se-eun Yoon, Hyunsik Jeon, Julian McAuley

TL;DR
This paper introduces a multimodal dataset in which users express preferences through images to request book and music recommendations. It shows that current models struggle on this data and proposes chain-of-imagery prompting to improve their performance.
Contribution
The paper presents a new multimodal dataset for image-based conversational recommendation. It also introduces chain-of-imagery prompting, which helps vision-language models make better use of their visual capabilities.
Findings
Vision-language models show no significant advantage over language-only models that are given descriptions of the images.
Chain-of-imagery prompting yields notable performance improvements.
The dataset enables new research in multimodal recommendation systems.
Abstract
We introduce a multimodal dataset where users express preferences through images. These images encompass a broad spectrum of visual expressions ranging from landscapes to artistic depictions. Users request recommendations for books or music that evoke similar feelings to those captured in the images, and recommendations are endorsed by the community through upvotes. This dataset supports two recommendation tasks: title generation and multiple-choice selection. Our experiments with large foundation models reveal their limitations in these tasks. In particular, vision-language models show no significant advantage over language-only counterparts that use descriptions, which we hypothesize is due to underutilized visual capabilities. To better harness these abilities, we propose chain-of-imagery prompting, which results in notable improvements. We release our code and datasets.
Taxonomy
Topics: Digital Storytelling and Education · Education and Critical Thinking Development · Language, Metaphor, and Cognition
