A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback
Bulat Khaertdinov, Mirela Popa, Nava Tintarev

TL;DR
This paper introduces relevance feedback mechanisms for vision-language models to improve text-to-image retrieval performance without fine-tuning, demonstrating consistent gains and robustness across different models and settings.
Contribution
It proposes four relevance feedback strategies, including generative and attentive feedback, to enhance retrieval accuracy and robustness in vision-language models at inference time.
Findings
Relevance feedback improves retrieval by 3-5% in MRR@5 for smaller VLMs.
The methods are more robust than traditional pseudo-relevance feedback in multi-turn retrieval.
Relevance feedback enables interactive, adaptive visual search without additional fine-tuning.
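For readers unfamiliar with the metric in the first finding, MRR@5 (mean reciprocal rank, cut off at rank 5) can be sketched as below. This is a minimal illustration, assuming one ground-truth image per text query; the function name `mrr_at_k` and the toy rankings are hypothetical and not taken from the paper.

```python
def mrr_at_k(ranked_ids_per_query, relevant_id_per_query, k=5):
    """Mean reciprocal rank over queries, counting only the top-k results.

    Assumes exactly one relevant item per query (an assumption of this sketch).
    """
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query):
        # Look for the relevant item among the first k retrieved items.
        for rank, item in enumerate(ranked[:k], start=1):
            if item == relevant:
                total += 1.0 / rank
                break
        # If the item is absent from the top k, the query contributes 0.
    return total / len(ranked_ids_per_query)


# Toy example: the target is ranked 1st, 3rd, and not retrieved at all.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
targets = ["a", "z", "missing"]
score = mrr_at_k(rankings, targets, k=5)  # (1 + 1/3 + 0) / 3
```

Under this definition, a gain in MRR@5 means the correct image tends to appear closer to the top of the first five results.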
Abstract
Large vision-language models (VLMs) enable intuitive visual search using natural language queries. However, improving their performance often requires fine-tuning and scaling to larger model variants. In this work, we propose a mechanism inspired by traditional text-based search to improve retrieval performance at inference time: relevance feedback. While relevance feedback can serve as an alternative to fine-tuning, its model-agnostic design also enables use with fine-tuned VLMs. Specifically, we introduce and evaluate four feedback strategies for VLM-based retrieval. First, we revise classical pseudo-relevance feedback (PRF), which refines query embeddings based on top-ranked results. To address its limitations, we propose generative relevance feedback (GRF), which uses synthetic captions for query refinement. Furthermore, we introduce an attentive feedback summarizer (AFS), a custom…
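The abstract describes pseudo-relevance feedback as refining the query embedding using top-ranked results. One common way to do this is a Rocchio-style update, sketched below with NumPy. This is an assumption for illustration only: the paper's exact update rule is not given in the text above, and the names `pseudo_relevance_feedback`, `top_k`, `alpha`, and `beta`, as well as the random embeddings standing in for VLM outputs, are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def pseudo_relevance_feedback(query_emb, image_embs, top_k=3, alpha=1.0, beta=0.5):
    """Rocchio-style refinement (an assumed formulation, not the paper's).

    The top_k images retrieved for the original query are treated as relevant,
    and the query is shifted toward their mean embedding.
    """
    query_emb = l2_normalize(query_emb)
    image_embs = l2_normalize(image_embs)
    scores = image_embs @ query_emb            # cosine similarity per image
    top_idx = np.argsort(-scores)[:top_k]      # indices of the best matches
    centroid = image_embs[top_idx].mean(axis=0)
    refined = l2_normalize(alpha * query_emb + beta * centroid)
    return refined, top_idx


# Random vectors stand in for text and image embeddings from a VLM.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 16))
query_emb = rng.normal(size=16)

refined, top_idx = pseudo_relevance_feedback(query_emb, image_embs)
# Re-rank the gallery with the refined query.
new_ranking = np.argsort(-(l2_normalize(image_embs) @ refined))
```

Because the top-ranked images are assumed relevant without verification, a poor initial ranking can pull the query in the wrong direction, which is consistent with the limitations of PRF that the abstract says the other strategies aim to address.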
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Information Retrieval and Search Behavior
