WeDetect: Fast Open-Vocabulary Object Detection as Retrieval

Shenghao Fu; Yukun Su; Fengyun Rao; Jing Lyu; Xiaohua Xie; Wei-Shi Zheng

arXiv:2512.12309 · cs.CV · December 16, 2025

TL;DR

WeDetect is a fast open-vocabulary object detection framework that treats recognition as retrieval, matching image regions to text queries in a shared embedding space. It reports state-of-the-art performance and supports applications including object retrieval and referring expression comprehension.

Contribution

The paper presents a family of non-fusion, retrieval-based detection models that surpasses fusion models in both speed and accuracy, and applies it to new tasks such as historical data retrieval and referring expression comprehension (REC).

Findings

01 State-of-the-art open-vocabulary detection performance

02 Real-time detection with a dual-tower architecture

03 Effective retrieval of objects in historical data
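
One way to picture retrieval over historical data is as a lookup against region embeddings that were computed once and stored. The sketch below is illustrative only and is not the paper's implementation: `RegionIndex`, its methods, the image ids, the boxes, and the two-dimensional embeddings are all hypothetical. It assumes region and text embeddings live in the same space and are compared by cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class RegionIndex:
    """Hypothetical store of precomputed region embeddings for past images."""

    def __init__(self):
        self._entries = []  # (image_id, box, unit-length embedding)

    def add(self, image_id, box, embedding):
        # Embeddings are normalized once at insert time, so a later
        # query only needs a dot product per stored region.
        self._entries.append((image_id, box, l2_normalize(embedding)))

    def search(self, text_embedding, top_k=3):
        # Rank stored regions by cosine similarity to the text query.
        query = l2_normalize(text_embedding)
        scored = [
            (sum(a * b for a, b in zip(emb, query)), image_id, box)
            for image_id, box, emb in self._entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Toy data: assumed embeddings, not outputs of any real model.
index = RegionIndex()
index.add("img_001", (10, 20, 50, 80), [0.9, 0.1])
index.add("img_002", (5, 5, 40, 40), [0.1, 0.9])
index.add("img_003", (0, 0, 30, 30), [0.7, 0.7])

hits = index.search([1.0, 0.0], top_k=2)
for score, image_id, box in hits:
    print(f"{image_id} {box} score={score:.3f}")
```

Under this assumption, a new text query never requires the stored images to be reprocessed; only the query embedding is computed and compared against the index.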

Abstract

Open-vocabulary object detection aims to detect arbitrary classes via text prompts. Methods without cross-modal fusion layers (non-fusion) offer faster inference by treating recognition as a retrieval problem, i.e., matching regions to text queries in a shared embedding space. In this work, we fully explore this retrieval philosophy and demonstrate its unique advantages in efficiency and versatility through a model family named WeDetect: (1) State-of-the-art performance. WeDetect is a real-time detector with a dual-tower architecture. We show that, with well-curated data and full training, the non-fusion WeDetect surpasses other fusion models and establishes a strong open-vocabulary foundation. (2) Fast backtrack of historical data. WeDetect-Uni is a universal proposal generator based on WeDetect. We freeze the entire detector and only finetune an objectness prompt to retrieve generic…
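
The retrieval view of recognition described in the abstract can be sketched as a similarity lookup between region embeddings and text embeddings. This is a minimal sketch under stated assumptions, not WeDetect's actual code: the function names, the cosine similarity measure, the `threshold` value, and the toy three-dimensional embeddings are all hypothetical. In a real dual-tower detector the two sets of vectors would come from the image and text towers respectively.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(region_embs, text_embs):
    """Return a regions-by-classes matrix of cosine similarities."""
    regions = [l2_normalize(r) for r in region_embs]
    texts = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(r, t)) for t in texts] for r in regions]


def classify_regions(region_embs, text_embs, class_names, threshold=0.5):
    """Assign each region its best-matching class, or None below threshold."""
    results = []
    for row in cosine_scores(region_embs, text_embs):
        best = max(range(len(row)), key=row.__getitem__)
        label = class_names[best] if row[best] >= threshold else None
        results.append((label, row[best]))
    return results


# Toy data: assumed embeddings, not outputs of any real model.
class_names = ["cat", "dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_embs = [
    [0.9, 0.1, 0.0],  # close to the "cat" query
    [0.1, 0.8, 0.1],  # close to the "dog" query
    [0.0, 0.0, 1.0],  # matches neither query
]

predictions = classify_regions(region_embs, text_embs, class_names)
for label, score in predictions:
    print(label, round(score, 3))
```

Because no fusion layer mixes the two modalities, the text embeddings in this sketch depend only on the class names, so they can be computed once and reused across images.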

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems