ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections

Ziling Huang; Yidan Zhang; Shin'ichi Satoh

arXiv:2506.15180 · cs.CV · June 19, 2025

TL;DR

ReSeDis introduces a new unified task that combines large-scale image retrieval with fine-grained object localization from natural language descriptions, two problems that existing methods address only separately.

Contribution

The paper presents the first benchmark and task for joint corpus-level retrieval and pixel-level grounding, along with a zero-shot baseline built on frozen vision-language models.
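The summary describes the baseline only as zero-shot and built on frozen vision-language models. The sketch below shows one plausible retrieve-then-ground structure under stated assumptions: the query text, each whole image, and each candidate region already have embeddings from some frozen model, and the short vectors used here are made-up stand-ins. All class and function names are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    box: tuple        # (x1, y1, x2, y2) in pixels
    embedding: list   # stand-in for a frozen-VLM region embedding (hypothetical)


@dataclass
class CorpusImage:
    image_id: str
    embedding: list   # stand-in for a frozen-VLM whole-image embedding (hypothetical)
    regions: list     # candidate Region objects for this image


def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_then_ground(text_emb, corpus, top_k=2):
    """Hypothetical two-stage zero-shot pipeline: rank images, then ground in each hit."""
    # Stage 1 (retrieval): rank whole images by text-image cosine similarity.
    ranked = sorted(corpus, key=lambda im: cosine(text_emb, im.embedding), reverse=True)
    results = []
    for im in ranked[:top_k]:
        if not im.regions:
            continue
        # Stage 2 (grounding): keep the region whose embedding best matches the text.
        best = max(im.regions, key=lambda r: cosine(text_emb, r.embedding))
        results.append((im.image_id, best.box, cosine(text_emb, best.embedding)))
    return results


# Toy corpus with 2-D stand-in embeddings.
corpus = [
    CorpusImage("a", [1.0, 0.0], [Region((0, 0, 10, 10), [0.9, 0.1]),
                                  Region((5, 5, 20, 20), [0.0, 1.0])]),
    CorpusImage("b", [0.0, 1.0], [Region((0, 0, 4, 4), [0.1, 0.9])]),
    CorpusImage("c", [0.7, 0.7], [Region((1, 1, 8, 8), [1.0, 0.0])]),
]
hits = retrieve_then_ground([1.0, 0.0], corpus, top_k=2)
print([image_id for image_id, _, _ in hits])  # ['a', 'c']
```

Note that this structure always returns a box for each of the top-k images, so it makes no explicit decision about whether the object is actually present. That presence decision is exactly what the ReSeDis task adds on top of plain retrieval.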

Findings

01. Benchmark dataset with unique description-to-object mappings

02. Proposed metric combining retrieval recall and localization precision

03. Baseline results indicating significant room for improvement
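The summary does not give the exact formula for the combined metric. As a hedged illustration only, one way to couple retrieval recall with localization precision is to count a relevant image as recovered only when it appears among the top-K predictions *and* the predicted box overlaps the ground-truth box with IoU at or above a threshold. The 0.5 threshold, the one-box-per-image simplification, and the function names below are all assumptions, not the paper's definition.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounded_recall_at_k(predictions, ground_truth, k, iou_thresh=0.5):
    """Hypothetical metric. predictions: ranked list of (image_id, box);
    ground_truth: {image_id: box}, assuming one target box per relevant image.
    An image counts as recovered only if a top-k prediction on it has IoU >= iou_thresh."""
    if not ground_truth:
        return 0.0
    found = set()
    for image_id, box in predictions[:k]:
        gt_box = ground_truth.get(image_id)
        if gt_box is not None and iou(box, gt_box) >= iou_thresh:
            found.add(image_id)
    return len(found) / len(ground_truth)


gt = {"img1": (0, 0, 10, 10), "img2": (20, 20, 40, 40)}
preds = [("img1", (0, 0, 10, 10)),    # right image, exact box
         ("img3", (0, 0, 5, 5)),      # image without the object
         ("img2", (30, 30, 50, 50))]  # right image, poorly placed box (IoU ~0.14)
print(grounded_recall_at_k(preds, gt, k=3))                  # 0.5
print(grounded_recall_at_k(preds, gt, k=3, iou_thresh=0.1))  # 1.0
```

Unlike plain image-level recall, which would score 1.0 here because both relevant images are retrieved, this variant penalizes the badly localized hit on `img2`.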

Abstract

Large-scale visual search engines are expected to solve a dual problem at once: (i) locate every image that truly contains the object described by a sentence and (ii) identify the object's bounding box or exact pixels within each hit. Existing techniques address only one side of this challenge. Visual grounding yields tight boxes and masks but rests on the unrealistic assumption that the object is present in every test image, producing a flood of false alarms when applied to web-scale collections. Text-to-image retrieval excels at sifting through massive databases to rank relevant images, yet it stops at whole-image matches and offers no fine-grained localization. We introduce Referring Search and Discovery (ReSeDis), the first task that unifies corpus-level retrieval with pixel-level grounding. Given a free-form description, a ReSeDis model must decide whether the queried object…
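The false-alarm argument can be made concrete with a small count. A grounding model that assumes the object is present returns a box for every image, so every image lacking the object becomes a false alarm. The sketch below assumes each image carries a hypothetical presence score and contrasts that behavior with a model that abstains below a threshold. The scores, labels, and threshold are made up for illustration.

```python
def count_false_alarms(presence_scores, contains_object, threshold=None):
    """presence_scores: per-image confidence that the described object is present.
    contains_object: per-image ground truth (True if the object is in the image).
    threshold=None mimics a grounder that always outputs a box."""
    false_alarms = 0
    for score, present in zip(presence_scores, contains_object):
        predicts_box = threshold is None or score >= threshold
        if predicts_box and not present:
            false_alarms += 1
    return false_alarms


scores = [0.92, 0.15, 0.08, 0.81, 0.30]
truth = [True, False, False, True, False]
print(count_false_alarms(scores, truth))                 # 3
print(count_false_alarms(scores, truth, threshold=0.5))  # 0
```

In a web-scale collection where only a handful of images contain the target, the first count grows with the corpus size. This is the flood of false alarms the abstract attributes to presence-assuming grounding.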

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques