ReCQR: Incorporating conversational query rewriting to improve Multimodal Image Retrieval
Yuan Hu, ZhiYu Cao, PeiFeng Li, QiaoMing Zhu

TL;DR
This paper introduces conversational query rewriting (CQR) for multimodal image retrieval, creating a new dataset and demonstrating improved retrieval accuracy through enhanced query understanding.
Contribution
The paper brings the CQR task to image retrieval, builds the ReCQR dataset of approximately 7,000 curated multimodal dialogues, and reports benchmarks showing improved image retrieval performance with CQR integration.
Findings
CQR significantly improves image retrieval accuracy.
The ReCQR dataset contains approximately 7,000 high-quality multimodal dialogues, curated with an LLM-as-Judge mechanism and manual review.
Benchmark results highlight the effectiveness of CQR in multimodal systems.
Abstract
With the rise of multimodal learning, image retrieval plays a crucial role in connecting visual information with natural language queries. Existing image retrievers struggle with processing long texts and handling unclear user expressions. To address these issues, we introduce the conversational query rewriting (CQR) task into the image retrieval domain and construct a dedicated multi-turn dialogue query rewriting dataset. Built on full dialogue histories, CQR rewrites users' final queries into concise, semantically complete ones that are better suited for retrieval. Specifically, we first leverage Large Language Models (LLMs) to generate rewritten candidates at scale and employ an LLM-as-Judge mechanism combined with manual review to curate approximately 7,000 high-quality multimodal dialogues, forming the ReCQR dataset. Then, we benchmark several SOTA multimodal models on the ReCQR…
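The generate-then-judge curation step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Dialogue` type, the `curate` function, the `threshold` parameter, and the toy `toy_generate` and `toy_judge` stand-ins are hypothetical. In the paper, an LLM fills both the generator and judge roles, and manual review follows.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dialogue:
    history: List[str]  # earlier turns, oldest first
    final_query: str    # the user's last, possibly underspecified, query


def curate(
    dialogue: Dialogue,
    generate: Callable[[Dialogue], List[str]],
    judge: Callable[[Dialogue, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Optional[str]:
    """Return the best-scoring candidate rewrite, or None if none passes."""
    candidates = generate(dialogue)
    scored = [(judge(dialogue, c), c) for c in candidates]
    passing = [(s, c) for s, c in scored if s >= threshold]
    if not passing:
        return None  # nothing acceptable; such a dialogue would be dropped
    return max(passing, key=lambda sc: sc[0])[1]


# Toy stand-ins for the LLM generator and LLM judge (assumptions).
def toy_generate(d: Dialogue) -> List[str]:
    return [d.final_query, "a red vintage bicycle leaning against a brick wall"]


def toy_judge(d: Dialogue, rewrite: str) -> float:
    # Penalize rewrites that still depend on the dialogue context.
    vague = {"it", "that", "this", "one"}
    return 0.0 if any(w in vague for w in rewrite.lower().split()) else 1.0


dialogue = Dialogue(
    history=["I am looking for photos of vintage bicycles.", "Red ones look nice."],
    final_query="Show me that one leaning against a brick wall",
)
best = curate(dialogue, toy_generate, toy_judge)
print(best)
```

The original final query is rejected by the toy judge because it still refers back to the dialogue ("that one"), while the self-contained rewrite passes and would be the text handed to an image retriever.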
