DIR-TIR: Dialog-Iterative Refinement for Text-to-Image Retrieval

Zongwei Zhen; Biqing Zeng

arXiv:2511.14449 · cs.CV · November 19, 2025

Open Access

TL;DR

DIR-TIR introduces a dialogue-based iterative refinement framework for text-to-image retrieval, significantly improving accuracy and user control through multi-turn interactions and specialized modules.

Contribution

The paper proposes an interactive framework built on a Dialog Refiner Module and an Image Refiner Module, which together improve retrieval precision over conventional single-query methods.

Findings

01. Outperforms conventional single-query baselines in target image hit accuracy

02. Improves user controllability and fault tolerance through multi-turn dialogue

03. Achieves higher precision when the two refiner modules are integrated

Abstract

This paper addresses the task of interactive, conversational text-to-image retrieval. Our DIR-TIR framework progressively refines the target image search through two specialized modules: the Dialog Refiner Module and the Image Refiner Module. The Dialog Refiner actively queries users to extract essential information and generate increasingly precise descriptions of the target image. Complementarily, the Image Refiner identifies perceptual gaps between generated images and user intentions, strategically reducing the visual-semantic discrepancy. By leveraging multi-turn dialogues, DIR-TIR provides superior controllability and fault tolerance compared to conventional single-query methods, significantly improving target image hit accuracy. Comprehensive experiments across diverse image datasets demonstrate our dialogue-based approach substantially outperforms…
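To make the multi-turn idea in the abstract concrete, here is a minimal sketch of dialogue-driven retrieval. It is not the authors' implementation: the bag-of-words `embed` function stands in for a real text or image encoder, the scripted `answers` stand in for a live user, and every name (`embed`, `rank`, `dialog_retrieve`, `GALLERY`) is hypothetical. It illustrates only the description-refinement step; nothing resembling the Image Refiner is modeled.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(description, gallery):
    """Return image ids sorted from most to least similar to the description."""
    query = embed(description)
    return sorted(
        gallery,
        key=lambda image_id: cosine(query, embed(gallery[image_id])),
        reverse=True,
    )


def dialog_retrieve(initial_query, answers, gallery):
    """Re-rank the gallery after each user answer is folded into the description."""
    description = initial_query
    history = [rank(description, gallery)]
    for answer in answers:
        # Hypothetical refinement rule: simply append the new detail.
        description = f"{description} {answer}"
        history.append(rank(description, gallery))
    return history


# Hypothetical gallery: image id -> caption used as a proxy for image content.
GALLERY = {
    "img_a": "a brown dog running on a beach",
    "img_b": "a brown dog sleeping on a red sofa",
    "img_c": "a white cat sleeping on a red sofa",
}

history = dialog_retrieve("a brown dog", ["sleeping", "on a red sofa"], GALLERY)
top_per_turn = [ranking[0] for ranking in history]
print(top_per_turn)
```

In this toy setup the initial query ranks `img_a` first, and the first follow-up answer is enough to move the intended `img_b` to the top. This is the kind of fault tolerance that a single-query system cannot offer.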

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques