Composed Object Retrieval: Object-level Retrieval via Composed Expressions

Tong Wang; Guanyu Yang; Nian Liu; Zongyan Han; Jinxing Zhou; Salman Khan; Fahad Shahbaz Khan

arXiv:2508.04424 · cs.CV · November 24, 2025


TL;DR

This paper introduces Composed Object Retrieval (COR), a new task for object-level image retrieval based on composed expressions, along with a large-scale benchmark and a unified model that outperforms existing methods.

Contribution

The paper proposes the first large-scale COR benchmark and a novel end-to-end model that effectively performs object-level retrieval with composed expressions.

Findings

1. CORE, the proposed end-to-end model, outperforms existing models in retrieval accuracy.

2. The COR127K benchmark contains 127,166 triplets across 408 categories.

3. The approach enables precise object localization based on complex composed expressions.

Abstract

Retrieving fine-grained visual content based on user intent remains a challenge in multi-modal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a brand-new task that goes beyond image-level retrieval to achieve object-level precision, allowing the retrieval and segmentation of target objects based on composed expressions combining reference objects and retrieval texts. COR presents significant challenges in retrieval flexibility, which requires systems to identify arbitrary objects satisfying composed expressions while avoiding semantically similar but irrelevant negative objects within the same scene. We construct COR127K, the first large-scale COR benchmark that contains 127,166…
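The abstract defines COR only at the task level. As a rough illustration of what "retrieving an object with a composed expression" means, the sketch below fuses a reference-object embedding with a retrieval-text embedding and ranks candidate objects from one scene by cosine similarity. It assumes precomputed object and text embeddings and uses simple additive fusion, a common composed-retrieval baseline; this is not the CORE model, whose architecture is not described here. All names (`CandidateObject`, `compose_query`, `retrieve`) and the toy vectors are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CandidateObject:
    """A hypothetical object in the searched scene."""
    name: str
    embedding: np.ndarray  # assumed precomputed object-level feature


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def compose_query(ref_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Fuse reference-object and retrieval-text embeddings.

    Additive fusion is an assumption for illustration; it is not CORE.
    """
    return normalize(normalize(ref_emb) + normalize(text_emb))


def retrieve(ref_emb, text_emb, candidates):
    """Return the best-matching candidate name and all cosine scores."""
    query = compose_query(ref_emb, text_emb)
    scores = [float(normalize(c.embedding) @ query) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best].name, scores


# Toy 3-d embeddings (hypothetical): axis 0 ~ "car", axis 1 ~ "blue".
reference = np.array([1.0, 0.0, 0.0])  # reference object: a car
text = np.array([0.0, 1.0, 0.0])       # retrieval text: "make it blue"
scene = [
    CandidateObject("red car", np.array([1.0, 0.0, 0.0])),   # similar negative
    CandidateObject("blue car", np.array([1.0, 1.0, 0.0])),  # intended target
    CandidateObject("blue sky", np.array([0.0, 1.0, 0.2])),  # text-only match
]
best_name, scores = retrieve(reference, text, scene)
print(best_name)  # blue car
```

The "red car" candidate matches the reference object alone but loses to the "blue car" once the text is folded into the query, which mirrors the abstract's point about rejecting semantically similar but irrelevant objects in the same scene. Producing a segmentation mask for the selected object, which COR also requires, is outside this sketch.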
