SORCE: Small Object Retrieval in Complex Environments
Chunxu Liu, Chi Xie, Xiaxu Chen, Wei Li, Feng Zhu, Rui Zhao, Limin Wang

TL;DR
This paper introduces SORCE, a new benchmark and approach for small object retrieval in complex images, demonstrating that multi-embedding representations significantly improve retrieval performance over existing methods.
Contribution
The paper proposes a novel multi-embedding method using multimodal large language models (MLLMs) and Regional Prompts for small object retrieval, along with a new benchmark dataset, SORCE-1K.
Findings
Existing text-to-image retrieval (T2IR) methods struggle with small objects in complex environments.
Multi-embedding representations outperform single-embedding approaches.
The proposed method achieves significant improvements on SORCE-1K.
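To illustrate why several embeddings per image can help with small objects, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the max-over-embeddings scoring rule, the function names (`cosine`, `score_image`, `rank_gallery`), and the toy 2-D vectors are all hypothetical and chosen only for illustration. In the toy data, the second axis stands in for the small-object concept, and the extra regional embedding of `img_a` is assumed to capture it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_image(query_emb, image_embs):
    """Score an image as the best match among its embeddings.

    Assumption: max-similarity aggregation. The source does not state
    how per-image embeddings are combined.
    """
    return max(cosine(query_emb, emb) for emb in image_embs)


def rank_gallery(query_emb, gallery):
    """Return image ids sorted from best to worst match.

    gallery maps image id -> list of embeddings (one or many per image).
    """
    scores = {img: score_image(query_emb, embs) for img, embs in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: axis 0 ~ overall scene, axis 1 ~ small object.
query = [0.0, 1.0]  # text query describing the small object

# Single embedding per image: the scene dominates, so the small
# object in img_a contributes very little to its global embedding.
single = {
    "img_a": [[1.0, 0.1]],  # contains the small object
    "img_b": [[1.0, 0.3]],  # does not contain it
}

# Multiple embeddings per image: one extra regional embedding each.
multi = {
    "img_a": [[1.0, 0.1], [0.1, 1.0]],  # regional embedding captures the object
    "img_b": [[1.0, 0.3], [1.0, 0.2]],
}

print(rank_gallery(query, single))
print(rank_gallery(query, multi))
```

With a single global embedding, the image that actually contains the object is ranked below a distractor; with an additional regional embedding, it moves to the top. Real systems would obtain the embeddings from a trained model rather than hand-written vectors.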
Abstract
Text-to-Image Retrieval (T2IR) is a highly valuable task that aims to match a given textual query to images in a gallery. Existing benchmarks primarily focus on textual queries describing overall image semantics or foreground salient objects, possibly overlooking inconspicuous small objects, especially in complex environments. Such small object retrieval is crucial, as in real-world applications, the targets of interest are not always prominent in the image. Thus, we introduce SORCE (Small Object Retrieval in Complex Environments), a new subfield of T2IR, focusing on retrieving small objects in complex images with textual queries. We propose a new benchmark, SORCE-1K, consisting of images with complex environments and textual queries describing less conspicuous small objects with minimal contextual cues from other salient objects. Preliminary analysis on SORCE-1K finds that existing…
Taxonomy
Topics: Web Data Mining and Analysis · Image Processing and 3D Reconstruction
Methods: Focus · Sparse Evolutionary Training
