DIR-TIR: Dialog-Iterative Refinement for Text-to-Image Retrieval
Zongwei Zhen, Biqing Zeng

TL;DR
DIR-TIR introduces a dialogue-based iterative refinement framework for text-to-image retrieval, significantly improving accuracy and user control through multi-turn interactions and specialized modules.
Contribution
The paper proposes an interactive framework whose dialog and image refinement modules improve retrieval precision over conventional single-query methods.
Findings
Outperforms baseline methods in retrieval accuracy
Enhances user control and fault tolerance
Achieves higher precision when the modules are integrated
Abstract
This paper addresses the task of interactive, conversational text-to-image retrieval. Our DIR-TIR framework progressively refines the target image search through two specialized modules: the Dialog Refiner Module and the Image Refiner Module. The Dialog Refiner actively queries users to extract essential information and generate increasingly precise descriptions of the target image. Complementarily, the Image Refiner identifies perceptual gaps between generated images and user intentions, strategically reducing the visual-semantic discrepancy. By leveraging multi-turn dialogues, DIR-TIR provides superior controllability and fault tolerance compared to conventional single-query methods, significantly improving target image hit accuracy. Comprehensive experiments across diverse image datasets demonstrate our dialogue-based approach substantially outperforms…
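The multi-turn refinement idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify models, prompts, or scoring, so every name here (`rank_images`, `refine_over_turns`, `CAPTIONS`, `DIALOG`) is hypothetical. Word overlap against image captions stands in for a real text-image similarity model, the questions and answers are scripted rather than generated, and the Image Refiner is not modeled at all.

```python
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def rank_images(description: str, captions: Dict[str, str]) -> List[str]:
    """Rank image ids by word overlap between the description and each caption.

    Assumption: word overlap is a placeholder for a learned text-image
    similarity score; it is used only to keep the sketch self-contained.
    """
    query = tokenize(description)
    scores = {img_id: len(query & tokenize(cap)) for img_id, cap in captions.items()}
    # Highest score first; ties are broken by image id so the order is deterministic.
    return sorted(scores, key=lambda img_id: (-scores[img_id], img_id))


def refine_over_turns(
    initial_query: str,
    captions: Dict[str, str],
    dialog: List[Tuple[str, str]],
    target_id: str,
) -> List[int]:
    """Return the 1-based rank of the target image before and after each turn.

    Each dialog entry is a (question, answer) pair. The answer is appended to
    the running description, which is then used to re-rank the gallery.
    """
    description = initial_query
    ranks = [rank_images(description, captions).index(target_id) + 1]
    for _question, answer in dialog:
        description = f"{description} {answer}"
        ranks.append(rank_images(description, captions).index(target_id) + 1)
    return ranks


# Hypothetical three-image gallery described by captions.
CAPTIONS = {
    "img_a": "a dog running on a beach",
    "img_b": "a dog sleeping on a red sofa",
    "img_c": "a cat sleeping on a red sofa",
}

# Scripted questions and user answers (assumed, not generated by a model).
DIALOG = [
    ("What is the dog doing?", "sleeping"),
    ("Where is it?", "on a red sofa"),
]

ranks = refine_over_turns("a dog", CAPTIONS, DIALOG, target_id="img_b")
print(ranks)  # [2, 1, 1]
```

In this toy setting the initial query "a dog" cannot separate the two dog images, and the first answer is enough to move the target to rank 1. A single-query system would have no opportunity to recover from the ambiguous initial description, which is the kind of fault tolerance the abstract attributes to multi-turn dialogue.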
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
