T-Rex: Counting by Visual Prompting
Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang

TL;DR
T-Rex is an interactive, open-set object counting model that uses visual prompts for detection and counting, achieving state-of-the-art results and demonstrating strong zero-shot capabilities across diverse scenarios.
Contribution
The paper introduces T-Rex, a novel interactive object counting framework that integrates visual prompts for open-set detection and counting, with new benchmarks and practical applications.
Findings
Achieves state-of-the-art performance on class-agnostic counting benchmarks.
Demonstrates exceptional zero-shot counting capabilities.
Effective in diverse real-world scenarios.
Abstract
We introduce T-Rex, an interactive object counting model designed to first detect and then count any objects. We formulate object counting as an open-set object detection task with the integration of visual prompts. Users can specify the objects of interest by marking points or boxes on a reference image, and T-Rex then detects all objects with a similar pattern. Guided by the visual feedback from T-Rex, users can also interactively refine the counting results by prompting on missing or falsely-detected objects. T-Rex has achieved state-of-the-art performance on several class-agnostic counting benchmarks. To further exploit its potential, we established a new counting benchmark encompassing diverse scenarios and challenges. Both quantitative and qualitative results show that T-Rex possesses exceptional zero-shot counting capabilities. We also present various practical application…
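The detect-then-count loop described in the abstract can be illustrated with a small sketch. This is not the T-Rex architecture: it assumes that some upstream model has already produced candidate boxes with feature embeddings, and it stands in a plain cosine-similarity match for the learned detector. The names `Candidate`, `cosine`, and `detect_and_count`, the `threshold` value, and the idea of expressing refinement as positive and negative prompt embeddings are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Candidate:
    """A candidate object region (hypothetical structure, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    embedding: Vector                       # feature vector for the region


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_and_count(
    candidates: Sequence[Candidate],
    positives: Sequence[Vector],
    negatives: Sequence[Vector] = (),
    threshold: float = 0.8,  # assumed value, chosen only for this sketch
) -> List[Candidate]:
    """Keep candidates that resemble a positive prompt more than any negative one.

    The count is simply the number of kept detections, mirroring the
    "first detect, then count" formulation.
    """
    kept = []
    for cand in candidates:
        # Best match against the objects the user marked as being of interest.
        pos = max((cosine(cand.embedding, p) for p in positives), default=0.0)
        # Best match against objects the user marked as false detections.
        neg = max((cosine(cand.embedding, n) for n in negatives), default=-1.0)
        if pos >= threshold and pos > neg:
            kept.append(cand)
    return kept
```

Under these assumptions, interactive refinement amounts to calling `detect_and_count` again with an extended prompt set: a prompt on a missed object is appended to `positives`, a prompt on a falsely detected object is appended to `negatives`, and `len(...)` of the result gives the updated count.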
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Visual Attention and Saliency Detection
