Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
Zijun Long, Kangheng Liang, Gerardo Aragon-Camarasa, Richard Mccreadie, Paul Henderson

TL;DR
This paper introduces Diffusion Augmented Retrieval (DAR), a training-free method for interactive text-to-image retrieval that leverages diffusion models and dialogue refinements to improve generalization and performance on complex queries.
Contribution
DAR is a novel framework that avoids finetuning large models by using diffusion models and dialogue-based intermediate representations for better retrieval accuracy.
Findings
DAR matches finetuned models on simple queries.
DAR outperforms finetuned models on complex, multi-turn queries by up to 7.61% in Hits@10.
Extensive experiments validate DAR's effectiveness across four benchmarks.
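The Hits@10 figure above is a top-k retrieval metric. As a minimal sketch, assuming the common definition (the share of queries whose ground-truth image appears among the top k retrieved results), it could be computed as follows. The function name `hits_at_k` and the toy image IDs are hypothetical and used only for illustration; the paper's exact evaluation protocol (for example, whether hits are accumulated across dialogue turns) is not stated in this summary and may differ.

```python
from typing import Sequence


def hits_at_k(
    ranked_ids: Sequence[Sequence[str]],
    target_ids: Sequence[str],
    k: int = 10,
) -> float:
    """Return the fraction of queries whose target is in the top-k results.

    ranked_ids[i] is the retrieval ranking (best first) for query i, and
    target_ids[i] is the ground-truth image ID for that query.
    This is an illustrative definition, not the paper's evaluation code.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("ranked_ids and target_ids must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not target_ids:
        return 0.0
    # Count queries where the target occurs within the first k ranked items.
    hits = sum(
        1
        for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return hits / len(target_ids)


# Toy example with three queries (hypothetical image IDs).
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]

score_at_2 = hits_at_k(rankings, targets, k=2)  # only the first query hits
score_at_3 = hits_at_k(rankings, targets, k=3)  # first and second queries hit
```

Under this definition, a gain of 7.61% in Hits@10 would mean that the target image lands in the top 10 for that many more queries; whether the figure is absolute or relative is not specified here.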
Abstract
Interactive text-to-image retrieval (I-TIR) is an important enabler for a wide range of state-of-the-art services in domains such as e-commerce and education. However, current methods rely on finetuned Multimodal Large Language Models (MLLMs), which are costly to train and update, and exhibit poor generalizability. This latter issue is of particular concern because: 1) finetuning narrows the pretrained distribution of MLLMs, thereby reducing generalizability; and 2) I-TIR introduces increasing query diversity and complexity. As a result, I-TIR solutions are highly likely to encounter queries and images not well represented in any training dataset. To address this, we propose leveraging Diffusion Models (DMs) for text-to-image mapping, to avoid finetuning MLLMs while preserving robust performance on complex queries. Specifically, we introduce Diffusion Augmented Retrieval (DAR), a framework…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Diffusion
