Zero-Shot Object Counting with Language-Vision Models
Jingyi Xu, Hieu Le, Dimitris Samaras

TL;DR
This paper introduces zero-shot object counting (ZSC), enabling counting of arbitrary object classes using only class names, by leveraging large language-vision models to identify representative patches without human annotations.
Contribution
The paper proposes a novel zero-shot counting framework that uses large models to select exemplars and estimate counting errors, eliminating the need for human-annotated exemplars.
Findings
Effective zero-shot counting on the FSC-147 dataset
Outperforms existing class-agnostic counting methods
Demonstrates robustness to unseen classes
Abstract
Class-agnostic object counting aims to count object instances of an arbitrary class at test time. It is challenging but also enables many potential applications. Current methods require human-annotated exemplars as inputs, which are often unavailable for novel categories, especially for autonomous systems. Thus, we propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time. This obviates the need for human annotators and enables automated operation. To perform ZSC, we propose finding a few object crops from the input image and using them as counting exemplars. The goal is to identify patches that contain the objects of interest while also being visually representative of all instances in the image. To do this, we first construct class prototypes using large language-vision models, including CLIP and Stable Diffusion, to select the patches…
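The patch-selection idea in the abstract can be illustrated with a minimal sketch. It assumes that candidate patches and the class prototype have already been embedded as vectors in a shared feature space; the function names, the use of cosine similarity, and the value of k are illustrative assumptions, not the paper's exact procedure (the abstract above is truncated before those details).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_exemplar_patches(patch_features, class_prototype, k=3):
    """Return indices of the k patches most similar to the class prototype.

    Hypothetical helper: ranks patches by cosine similarity to the
    prototype and breaks ties by the lower patch index.
    """
    scored = [
        (cosine_similarity(feature, class_prototype), index)
        for index, feature in enumerate(patch_features)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


# Toy 2-D features standing in for real patch embeddings (assumed data).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
prototype = [1.0, 0.0]
print(select_exemplar_patches(patches, prototype, k=2))  # prints [0, 2]
```

In a real pipeline the selected patches would then be passed to a counting model as exemplars; that step is outside the scope of this sketch.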
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training · Diffusion
