T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang; Feng Li; Zhaoyang Zeng; Tianhe Ren; Shilong Liu; Lei Zhang

arXiv:2403.14610·cs.CV·March 22, 2024·3 cites

TL;DR

T-Rex2 is a versatile open-set object detection model that synergizes text and visual prompts through contrastive learning, enabling effective zero-shot detection across diverse real-world scenarios.

Contribution

The paper introduces a framework that combines text and visual prompts within a single model to improve open-set object detection.

Findings

01 Exhibits strong zero-shot detection performance across various scenarios.

02 Demonstrates effective synergy between text and visual prompts.

03 Enables handling of diverse input formats for flexible detection.

Abstract

We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual examples, but fall short in conveying the abstract concept of objects as effectively as text prompts. Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning. T-Rex2 accepts inputs in diverse formats, including text prompts, visual prompts, and the combination of both, so that it can handle different scenarios by switching between the two prompt modalities.…
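The abstract says the two prompt types are tied together through contrastive learning but does not spell out the loss. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style objective that pulls the text embedding and the visual-prompt embedding of the same category together and pushes mismatched pairs apart. The function names, the temperature value, and the symmetric form are assumptions for illustration, not the authors' formulation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def contrastive_alignment_loss(text_embs, visual_embs, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss between paired prompt embeddings.

    text_embs[i] and visual_embs[i] are assumed to describe the same category;
    every other pairing in the batch is treated as a negative.
    """
    text = [l2_normalize(t) for t in text_embs]
    vis = [l2_normalize(v) for v in visual_embs]
    n = len(text)
    # sim[i][j]: scaled cosine similarity of text prompt i and visual prompt j.
    sim = [[sum(a * b for a, b in zip(t, v)) / temperature for v in vis] for t in text]
    # Text-to-visual direction: each row should peak on its diagonal entry.
    t2v = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Visual-to-text direction: each column should peak on its diagonal entry.
    v2t = sum(cross_entropy_row([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (t2v + v2t)


# Toy 2-D embeddings: matched pairs give a much smaller loss than swapped pairs.
aligned = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is near zero when each text embedding points the same way as its paired visual embedding, and large when the pairs are swapped, which is the behaviour a contrastive alignment objective is meant to reward.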

Code & Models

Models

linhuixiao/Awesome-Visual-Grounding

Taxonomy

Topics: Handwritten Text Recognition Techniques · Video Analysis and Summarization