Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval

Yuxin Yang; Yinan Zhou; Yuxin Chen; Ziqi Zhang; Zongyang Ma; Chunfeng Yuan; Bing Li; Jun Gao; Weiming Hu

arXiv:2604.05393·cs.CV·April 8, 2026

TL;DR

This paper introduces Object-Anchored Composed Image Retrieval (OACIR), a new fine-grained retrieval task emphasizing instance fidelity, supported by a large-scale benchmark and a novel attention-based framework called AdaFocal.

Contribution

The paper proposes OACIR as a new task, builds OACIRR, the first large-scale benchmark for it, and develops AdaFocal, an attention-based framework intended to improve instance-level retrieval accuracy.

Findings

01. AdaFocal outperforms existing models in maintaining instance fidelity.

02. The OACIRR benchmark contains over 160K quadruples, with hard-negative instance distractors in its candidate galleries.

03. AdaFocal effectively balances attention between the anchored object and its context.

Abstract

Composed Image Retrieval (CIR) has demonstrated significant potential by enabling flexible multimodal queries that combine a reference image and modification text. However, CIR inherently prioritizes semantic matching, struggling to reliably retrieve a user-specified instance across contexts. In practice, emphasizing concrete instance fidelity over broad semantics is often more consequential. In this work, we propose Object-Anchored Composed Image Retrieval (OACIR), a novel fine-grained retrieval task that mandates strict instance-level consistency. To advance research on this task, we construct OACIRR (OACIR on Real-world images), the first large-scale, multi-domain benchmark comprising over 160K quadruples and four challenging candidate galleries enriched with hard-negative instance distractors. Each quadruple augments the compositional query with a bounding box that visually anchors…
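To make the query format concrete, here is a minimal sketch of how one OACIR quadruple might be represented. The abstract is truncated before it spells out the full layout, so the four fields, their names, and the pixel-coordinate box convention below are all assumptions for illustration, not the actual OACIRR schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnchoredQuery:
    """Hypothetical quadruple layout; NOT the published OACIRR schema."""

    reference_image: str  # path or ID of the reference image (assumed field)
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    anchor_box: Tuple[float, float, float, float]
    modification_text: str  # text describing the desired change
    target_image: str  # path or ID of the image to retrieve (assumed field)

    def __post_init__(self) -> None:
        # Reject boxes with negative coordinates or zero/negative extent.
        x_min, y_min, x_max, y_max = self.anchor_box
        if not (0 <= x_min < x_max and 0 <= y_min < y_max):
            raise ValueError(f"degenerate anchor box: {self.anchor_box}")


# Illustrative values only; none of these come from the dataset.
example = AnchoredQuery(
    reference_image="ref.jpg",
    anchor_box=(10.0, 20.0, 110.0, 220.0),
    modification_text="the same mug, now on a wooden desk",
    target_image="target.jpg",
)
```

The box is what distinguishes this from an ordinary CIR query: it marks which object in the reference image must reappear as the same instance in the retrieved result.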

Code & Models

Models

HaHaJun1101/AdaFocal (model, 1 like)

Datasets

HaHaJun1101/OACIRR (dataset, 565 downloads)