T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection
Jiazhou Zhou, Qing Jiang, Kanghao Chen, Lutao Jiang, Yuanhuiyi Lyu, Ying-Cong Chen, Lei Zhang

TL;DR
T-Rex-Omni is a framework that incorporates negative visual prompts into open-set object detection, reducing false positives from distractors and improving zero-shot detection performance, especially in long-tailed scenarios.
Contribution
The paper presents a negative-prompt integration method built from a unified visual prompt encoder, a training-free Negating Negative Computing (NNC) module, and a discriminative loss, moving open-set detection beyond the positive-only paradigm.
Findings
Achieves state-of-the-art zero-shot detection performance.
Significantly narrows the gap between visual-prompted and text-prompted methods.
Excels in long-tailed detection scenarios, e.g., 51.2 AP_r on LVIS-minival.
Abstract
Object detection methods have evolved from closed-set to open-set paradigms over the years. Current open-set object detectors, however, remain constrained by their exclusive reliance on positive indicators based on given prompts such as text descriptions or visual exemplars. This positive-only paradigm is consistently vulnerable to visually similar but semantically different distractors. We propose T-Rex-Omni, a novel framework that addresses this limitation by incorporating negative visual prompts to negate hard negative distractors. Specifically, we first introduce a unified visual prompt encoder that jointly processes positive and negative visual prompts. Next, a training-free Negating Negative Computing (NNC) module is proposed to dynamically suppress negative responses during the probability computing stage. To further boost performance through fine-tuning, our Negating…
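The idea of suppressing negative responses when computing probabilities can be illustrated with a toy sketch. The abstract does not give the NNC formula, so everything below is an assumption made for illustration: the function name `score_with_negatives`, the weight `beta`, and the rule itself (subtract the strongest non-negative negative-prompt similarity from the positive similarity before a sigmoid) are hypothetical and should not be read as the authors' method.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_with_negatives(query, positive, negatives, beta=1.0):
    """Hypothetical scoring rule (NOT the paper's formula).

    The positive-prompt logit is reduced by the strongest negative-prompt
    logit (clipped at zero, scaled by the assumed weight `beta`) before
    the sigmoid turns it into a probability.
    """
    pos_logit = dot(query, positive)
    # With no negative prompts this falls back to the positive-only score.
    neg_logit = max((dot(query, n) for n in negatives), default=0.0)
    return sigmoid(pos_logit - beta * max(neg_logit, 0.0))


# Toy 2-D embeddings: the distractor looks similar to the positive prompt
# but matches the negative prompt more closely.
positive = [1.0, 0.0]
negative = [0.8, 0.6]
target = [1.0, 0.0]
distractor = [0.8, 0.6]

for name, q in [("target", target), ("distractor", distractor)]:
    pos_only = score_with_negatives(q, positive, [])
    with_neg = score_with_negatives(q, positive, [negative])
    print(f"{name}: positive-only={pos_only:.3f}, with negative={with_neg:.3f}")
```

In this toy setup the distractor scores above 0.5 under the positive-only rule and below 0.5 once the negative prompt is supplied, while the true target still outranks it. No training is involved, which mirrors the training-free character of the module described above, though only loosely.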
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
