Composed Object Retrieval: Object-level Retrieval via Composed Expressions
Tong Wang, Guanyu Yang, Nian Liu, Zongyan Han, Jinxing Zhou, Salman Khan, Fahad Shahbaz Khan

TL;DR
This paper introduces Composed Object Retrieval (COR), a new task for object-level image retrieval based on composed expressions, along with a large-scale benchmark and a unified model that outperforms existing methods.
Contribution
The paper proposes the first large-scale COR benchmark and a novel end-to-end model that effectively performs object-level retrieval with composed expressions.
Findings
CORE, the proposed unified end-to-end model, outperforms existing models in retrieval accuracy
The COR127K benchmark contains 127,166 retrieval triplets across 408 categories
The approach enables precise object localization based on complex expressions
Abstract
Retrieving fine-grained visual content based on user intent remains a challenge in multi-modal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a brand-new task that goes beyond image-level retrieval to achieve object-level precision, allowing the retrieval and segmentation of target objects based on composed expressions combining reference objects and retrieval texts. COR presents significant challenges in retrieval flexibility, which requires systems to identify arbitrary objects satisfying composed expressions while avoiding semantically similar but irrelevant negative objects within the same scene. We construct COR127K, the first large-scale COR benchmark that contains 127,166…
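To make the task setup in the abstract concrete, the following is a minimal sketch of how a COR query and its candidate objects might be represented. All names here (`ComposedExpression`, `CandidateObject`, `retrieve_object`, `toy_score`) are hypothetical and do not come from the paper. The sketch assumes the reference object is supplied as a binary mask, and the word-overlap scorer is only a stand-in for a learned model such as CORE.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Mask = List[List[int]]  # binary mask as rows of 0/1 (assumed representation)


@dataclass(frozen=True)
class ComposedExpression:
    """A composed query: a reference object plus a retrieval text."""
    reference_mask: Mask   # reference object (assumed to be given as a mask)
    retrieval_text: str    # text describing the desired target object


@dataclass(frozen=True)
class CandidateObject:
    """One object in the target scene that could be retrieved."""
    object_id: int
    mask: Mask             # segmentation mask of this candidate
    label: str             # hypothetical description, used only by toy_score


def retrieve_object(
    query: ComposedExpression,
    candidates: Sequence[CandidateObject],
    score_fn: Callable[[ComposedExpression, CandidateObject], float],
) -> Optional[CandidateObject]:
    """Return the highest-scoring candidate, or None if the scene is empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: score_fn(query, c))


def toy_score(query: ComposedExpression, cand: CandidateObject) -> float:
    """Stand-in scorer: counts words shared by the retrieval text and the label.

    A real system would score using visual features of the reference object
    and the candidate; this toy version ignores the masks entirely.
    """
    text_words = set(query.retrieval_text.lower().split())
    label_words = set(cand.label.lower().split())
    return float(len(text_words & label_words))
```

With a query text such as "the same dog but sitting" and candidates labeled "dog standing", "dog sitting", and "cat sitting", the toy scorer prefers "dog sitting" over the two semantically similar negatives. That is the kind of within-scene distinction the abstract describes.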
