Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke, Wai Lam

TL;DR
This paper adds images to clarifying questions in conversational search, reports significant retrieval performance gains, and contributes a new dataset for research on the task.
Contribution
It proposes the task of asking multimodal clarifying questions, introduces the Melon dataset, and develops the Marto model with a prompt-based training strategy.
Findings
Adding images improves retrieval performance by up to 90%.
Marto outperforms discriminative baselines in effectiveness and efficiency.
The Melon dataset contains over 4,000 multimodal clarifying questions, enriched with over 14,000 images.
Abstract
In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query. These questions aim to uncover users' information needs and resolve query ambiguities. We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information. Therefore, we propose to add images to clarifying questions and formulate the novel task of asking multimodal clarifying questions in open-domain, mixed-initiative conversational search systems. To facilitate research into this task, we collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images. We also propose a multimodal query clarification model named Marto and adopt a prompt-based, generative fine-tuning strategy to perform the training of…
Taxonomy
Topics: Speech and dialogue systems
