Dialog-based Interactive Image Retrieval
Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Schmidt Feris

TL;DR
This paper presents a novel dialog-based interactive image retrieval system that uses natural language feedback and reinforcement learning to improve retrieval accuracy and user experience.
Contribution
It introduces a reinforcement learning framework for dialog-based image retrieval using natural language feedback, trained with a user simulator, and demonstrates superior performance in footwear retrieval.
Findings
Better retrieval accuracy than baseline methods
Natural language feedback enhances communication and effectiveness
Effective on both simulated and real-world data
Abstract
Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained…
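The rank-based reward mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: images are represented as plain feature tuples, candidates are ranked by Euclidean distance to a query vector, and the per-turn reward is the raw decrease in the target's rank. The function names `target_rank` and `rank_improvement_reward` are hypothetical.

```python
import math


def target_rank(query, candidates, target_index):
    """Return the 0-based rank of the target image among the candidates.

    Candidates are ordered by Euclidean distance to the query vector
    (an assumed similarity measure); rank 0 means the target is closest.
    """
    dists = [math.dist(query, c) for c in candidates]
    target_dist = dists[target_index]
    # Count the candidates that are strictly closer to the query than the target.
    return sum(1 for d in dists if d < target_dist)


def rank_improvement_reward(prev_rank, new_rank):
    """Hypothetical per-turn reward: positive when the target moves up the list."""
    return prev_rank - new_rank


# Toy database of three 2-D "image features"; the target is the last one.
candidates = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
target_index = 2

# Query representation before and after one turn of user feedback (made-up values).
rank_before = target_rank((0.0, 0.0), candidates, target_index)  # 2
rank_after = target_rank((2.5, 0.0), candidates, target_index)   # 0

reward = rank_improvement_reward(rank_before, rank_after)        # 2
```

In this toy setup the feedback moves the query closer to the target, so the target climbs from the last position to the first and the turn earns a positive reward. A turn that pushed the target down the list would earn a negative one.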
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
