IntRec: Intent-based Retrieval with Contrastive Refinement
Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Yue Lu

TL;DR
IntRec is an interactive object retrieval framework that uses user feedback and contrastive refinement to improve accuracy in complex scenes, outperforming existing methods with minimal latency.
Contribution
We introduce IntRec, a novel interactive retrieval method that maintains dual memory sets and employs contrastive alignment for fine-grained disambiguation based on user feedback.
Findings
Achieves 35.4 AP on LVIS, surpassing competitors.
Improves by +7.9 AP on LVIS-Ambiguous after one feedback.
Operates with less than 30 ms latency per interaction.
Abstract
Retrieving user-specified objects from complex scenes remains a challenging task, especially when queries are ambiguous or involve multiple similar objects. Existing open-vocabulary detectors operate in a one-shot manner, lacking the ability to refine predictions based on user feedback. To address this, we propose IntRec, an interactive object retrieval framework that refines predictions based on user feedback. At its core is an Intent State (IS) that maintains dual memory sets for positive anchors (confirmed cues) and negative constraints (rejected hypotheses). A contrastive alignment function ranks candidate objects by maximizing similarity to positive cues while penalizing rejected ones, enabling fine-grained disambiguation in cluttered scenes. Our interactive framework provides substantial improvements in retrieval accuracy without additional supervision. On LVIS, IntRec achieves…
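The abstract describes the Intent State only at a high level: a positive memory of confirmed cues, a negative memory of rejected hypotheses, and a contrastive alignment function that rewards similarity to the former while penalizing similarity to the latter. The excerpt does not give the exact scoring function. The sketch below is therefore a minimal illustration under stated assumptions, not the authors' implementation. It assumes candidate objects and the text query are already embedded as plain vectors, and it scores a candidate as its mean cosine similarity to the query plus positive anchors, minus a weighted maximum cosine similarity to any rejected hypothesis. The names `IntentState`, `update`, `score`, `rank`, and the `neg_weight` parameter are hypothetical.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


@dataclass
class IntentState:
    """Hypothetical intent state: a query embedding plus two feedback memories."""
    query: Vector
    positives: list[Vector] = field(default_factory=list)  # confirmed cues (positive anchors)
    negatives: list[Vector] = field(default_factory=list)  # rejected hypotheses (negative constraints)
    neg_weight: float = 1.0  # assumed penalty weight; not specified in the source

    def update(self, embedding: Vector, accepted: bool) -> None:
        """Store one round of user feedback in the matching memory set."""
        (self.positives if accepted else self.negatives).append(embedding)

    def score(self, candidate: Vector) -> float:
        """Assumed contrastive score: mean similarity to anchors minus penalty for nearest rejection."""
        anchors = [self.query] + self.positives
        pos = sum(cosine(candidate, a) for a in anchors) / len(anchors)
        # Only positive similarity to a rejected hypothesis is penalized.
        neg = max([0.0] + [cosine(candidate, n) for n in self.negatives])
        return pos - self.neg_weight * neg

    def rank(self, candidates: dict[str, Vector]) -> list[str]:
        """Return candidate ids ordered from highest to lowest score."""
        return sorted(candidates, key=lambda k: self.score(candidates[k]), reverse=True)


if __name__ == "__main__":
    # Two near-identical mugs are equally close to the query "mug"; a plate is not.
    state = IntentState(query=[1.0, 0.0])
    cands = {"left_mug": [1.0, 0.2], "right_mug": [1.0, -0.2], "plate": [0.0, 1.0]}
    # The user rejects the left mug, which disambiguates between the two mugs.
    state.update(cands["left_mug"], accepted=False)
    print(state.rank(cands))  # ['right_mug', 'left_mug', 'plate']
```

In this toy example the query alone cannot separate the two mugs, which score identically. A single rejection pushes the left mug down and, because the right mug is less similar to the rejected embedding, lifts it to the top. Taking the maximum over negatives (rather than the mean) is one plausible design choice: it suppresses any candidate that closely resembles even one rejected hypothesis.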
Taxonomy
Topics: Multimodal Machine Learning Applications · Information Retrieval and Search Behavior · Advanced Image and Video Retrieval Techniques
