Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval
Giacomo Pacini, Fabio Carrara, Nicola Messina, Nicola Tonellotto, Giuseppe Amato, Fabrizio Falchi

TL;DR
This paper introduces CroQS, a new benchmark and task for query suggestion in cross-modal retrieval. The task focuses on suggesting minimal textual modifications to a query that let users explore visually consistent subsets of the results, and the paper evaluates various methods on it, including LLMs and captioning models.
Contribution
The paper presents CroQS, a novel dataset and evaluation framework for query suggestion in cross-modal retrieval, addressing a gap in existing research.
Findings
LLM-based and captioning-based methods outperform baselines.
These methods improve recall on cluster specificity by over 115%.
They increase representativeness mAP by more than 52%.
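As a reading aid for the percentages above, here is a minimal sketch of how a relative gain is usually computed, assuming the figures are relative improvements over some reference score. The function name `relative_gain` and the two scores are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline` as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores for illustration only; NOT values reported in the paper.
baseline_recall = 0.20
improved_recall = 0.43

gain = relative_gain(baseline_recall, improved_recall)
print(f"relative gain: {gain:.0%}")
```

Under this reading, a gain above 100% means the score more than doubled with respect to the reference.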
Abstract
Query suggestion, a technique widely adopted in information retrieval, enhances system interactivity and the browsing experience of document collections. In cross-modal retrieval, many works have focused on retrieving relevant items from natural language queries, while few have explored query suggestion solutions. In this work, we address query suggestion in cross-modal retrieval, introducing a novel task that focuses on suggesting minimal textual modifications needed to explore visually consistent subsets of the collection, following the premise of "Maybe you are looking for". To facilitate the evaluation and development of methods, we present a tailored benchmark named CroQS. This dataset comprises initial queries, grouped result sets, and human-defined suggested queries for each group. We establish dedicated metrics to rigorously evaluate the performance of various methods on this…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Semantic Web and Ontologies
