Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features

Hila Levi; Guy Heller; Dan Levi; Ethan Fetaya

arXiv:2309.14999 · cs.CV · December 30, 2024

Open Access

TL;DR

This paper introduces a scalable object-centric image retrieval method that aggregates dense CLIP embeddings, improving accuracy over global-feature approaches by up to 15 mAP points while remaining efficient enough for large-scale retrieval.

Contribution

The authors propose a novel aggregation of dense CLIP embeddings for object-centric retrieval, balancing scalability and detailed object identification.
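To make the contrast between a single global embedding and aggregated dense embeddings concrete, here is a minimal sketch. It is not the authors' method: the aggregation rule (averaging consecutive groups of patch embeddings), the function names, and the two-dimensional toy vectors standing in for CLIP features are all assumptions chosen for illustration. It only shows why keeping several local vectors per image can help when the queried object covers a small part of the image.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def global_embedding(patch_embeddings):
    """One vector per image: the normalized mean of all patch embeddings."""
    return l2_normalize(mean_vector(patch_embeddings))

def aggregate_dense(patch_embeddings, group_size):
    """Hypothetical aggregation: average consecutive groups of patches.

    This stands in for whatever aggregation the paper uses; it simply
    reduces many patch vectors to a few local vectors per image.
    """
    groups = [patch_embeddings[i:i + group_size]
              for i in range(0, len(patch_embeddings), group_size)]
    return [l2_normalize(mean_vector(g)) for g in groups]

def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

def score_image(query, image_vectors):
    """Image score = best match between the query and any stored vector."""
    return max(cosine(query, v) for v in image_vectors)

# Toy data (assumed, not real CLIP outputs): 8 patches, 1 contains the object.
background = [1.0, 0.0]
small_object = [0.0, 1.0]
image_with_object = [background] * 7 + [small_object]
image_without_object = [background] * 8
query = l2_normalize([0.0, 1.0])  # stands in for a text-query embedding

global_score = score_image(query, [global_embedding(image_with_object)])
aggregated_score = score_image(
    query, aggregate_dense(image_with_object, group_size=2))
negative_score = score_image(
    query, aggregate_dense(image_without_object, group_size=2))

print(f"global: {global_score:.3f}  aggregated: {aggregated_score:.3f}  "
      f"negative: {negative_score:.3f}")
```

In this toy setup the single object patch is diluted by seven background patches in the global embedding, while the aggregated representation keeps one local vector in which the object still dominates. The storage cost grows with the number of vectors kept per image, which is the scalability trade-off the contribution statement refers to.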

Findings

01. Achieves an improvement of up to 15 mAP points over global-feature approaches.

02. Combines scalability with object-level retrieval capability.

03. Demonstrates advantages in large-scale retrieval frameworks.
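For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of mean average precision (mAP) over ranked retrieval results. The function names and the toy relevance lists are assumptions, and the paper's exact evaluation protocol is not described on this page. The sketch assumes every relevant image appears somewhere in the ranked list, and it reports mAP on a 0 to 1 scale, on which a gain of 15 mAP points would conventionally correspond to 0.15.

```python
def average_precision(ranked_relevance):
    """AP for one query.

    ranked_relevance: list of booleans, ordered from the top-ranked image
    down, where True marks an image that contains the queried object.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_query_relevance):
    """Mean of the per-query AP values."""
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps)

# Toy rankings for two hypothetical text queries.
rankings = [
    [True, False, True],   # relevant images at ranks 1 and 3
    [False, True],         # relevant image at rank 2
]
print(f"mAP = {mean_average_precision(rankings):.4f}")
```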

Abstract

The task of open-vocabulary object-centric image retrieval involves the retrieval of images containing a specified object of interest, delineated by an open-set text query. As working on large image datasets becomes standard, solving this task efficiently has gained significant practical importance. Applications include targeted performance analysis of retrieved images using ad-hoc queries and hard example mining during training. Recent advancements in contrastive-based open-vocabulary systems have yielded remarkable breakthroughs, facilitating large-scale open-vocabulary image retrieval. However, these approaches use a single global embedding per image, thereby constraining the system's ability to retrieve images containing relatively small object instances. Alternatively, incorporating local embeddings from detection pipelines faces scalability challenges, making it unsuitable for…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training