Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval
Yuxin Yang, Yinan Zhou, Yuxin Chen, Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, Bing Li, Jun Gao, Weiming Hu

TL;DR
This paper introduces Object-Anchored Composed Image Retrieval (OACIR), a new fine-grained retrieval task emphasizing instance fidelity, supported by a large-scale benchmark and a novel attention-based framework called AdaFocal.
Contribution
The paper proposes OACIR as a new task, creates OACIRR, the first large-scale benchmark for it, and develops AdaFocal, an attention framework that improves instance-level retrieval accuracy.
Findings
AdaFocal outperforms existing models in maintaining instance fidelity.
The OACIRR benchmark contains over 160K quadruples with hard-negative distractors.
AdaFocal effectively balances attention between the anchored object and context.
Abstract
Composed Image Retrieval (CIR) has demonstrated significant potential by enabling flexible multimodal queries that combine a reference image and modification text. However, CIR inherently prioritizes semantic matching, struggling to reliably retrieve a user-specified instance across contexts. In practice, emphasizing concrete instance fidelity over broad semantics is often more consequential. In this work, we propose Object-Anchored Composed Image Retrieval (OACIR), a novel fine-grained retrieval task that mandates strict instance-level consistency. To advance research on this task, we construct OACIRR (OACIR on Real-world images), the first large-scale, multi-domain benchmark comprising over 160K quadruples and four challenging candidate galleries enriched with hard-negative instance distractors. Each quadruple augments the compositional query with a bounding box that visually anchors…
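The abstract describes each OACIRR quadruple as a compositional query (reference image plus modification text) augmented with a bounding box that anchors the instance, paired with a target. The minimal Python sketch below shows one way such a record might be represented, along with a generic retrieval step that ranks a candidate gallery by cosine similarity. The field names, the `(x_min, y_min, x_max, y_max)` box convention, and the `rank_gallery` helper are hypothetical illustrations. They are not the OACIRR file format or the AdaFocal model, and the query and gallery embeddings are assumed to come from some unspecified encoder.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class OACIRQuadruple:
    """Hypothetical record for one object-anchored query and its target."""
    reference_image: str    # ID or path of the reference image
    bbox: BBox              # box anchoring the user-specified instance
    modification_text: str  # text describing the desired change
    target_image: str       # ID or path of the ground-truth target

    def __post_init__(self) -> None:
        # Reject boxes with zero or negative width/height.
        x_min, y_min, x_max, y_max = self.bbox
        if not (x_min < x_max and y_min < y_max):
            raise ValueError(f"degenerate bounding box: {self.bbox}")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query_embedding: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery IDs sorted by descending similarity to the query."""
    return sorted(gallery,
                  key=lambda gid: cosine(query_embedding, gallery[gid]),
                  reverse=True)


if __name__ == "__main__":
    query = OACIRQuadruple("ref_001.jpg", (10, 20, 110, 220),
                           "the same mug on a wooden desk", "tgt_001.jpg")
    # Toy embeddings: the target, a look-alike hard negative, an unrelated image.
    gallery = {
        "tgt_001.jpg": [0.9, 0.1, 0.0],
        "distractor_mug.jpg": [0.7, 0.7, 0.0],
        "unrelated.jpg": [0.0, 0.0, 1.0],
    }
    print(rank_gallery([1.0, 0.0, 0.0], gallery))
```

In this toy setup the look-alike distractor still ranks above the unrelated image. That is the kind of hard negative the benchmark's galleries are described as containing, where a model must rely on instance identity rather than broad semantic similarity.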
