Zero-Shot Object Counting with Language-Vision Models

Jingyi Xu; Hieu Le; Dimitris Samaras

arXiv:2309.13097 · cs.CV · September 26, 2023

TL;DR

This paper introduces zero-shot object counting (ZSC), enabling counting of arbitrary object classes using only class names, by leveraging large language-vision models to identify representative patches without human annotations.

Contribution

The paper proposes a novel zero-shot counting framework that uses large models to select exemplars and estimate counting errors, eliminating the need for human-annotated exemplars.

Findings

1. Effective zero-shot counting on the FSC-147 dataset

2. Outperforms existing class-agnostic counting methods

3. Demonstrates robustness to unseen classes

Abstract

Class-agnostic object counting aims to count object instances of an arbitrary class at test time. It is challenging but also enables many potential applications. Current methods require human-annotated exemplars as inputs, which are often unavailable for novel categories, especially for autonomous systems. Thus, we propose zero-shot object counting (ZSC), a new setting where only the class name is available at test time. This obviates the need for human annotators and enables automated operation. To perform ZSC, we propose finding a few object crops from the input image and using them as counting exemplars. The goal is to identify patches that contain the objects of interest and are also visually representative of all instances in the image. To do this, we first construct class prototypes using large language-vision models, including CLIP and Stable Diffusion, to select the patches…
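
The patch-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a feature vector for each candidate patch and a class prototype vector have already been computed by some language-vision model, and it simply ranks patches by cosine similarity to the prototype. The function names `cosine_similarity` and `select_exemplar_patches`, the parameter `k`, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_exemplar_patches(patch_features, class_prototype, k=3):
    """Return indices of the k patches most similar to the class prototype.

    Assumption: patch_features and class_prototype live in the same
    embedding space; how they are produced is outside this sketch.
    """
    scored = [
        (cosine_similarity(feature, class_prototype), index)
        for index, feature in enumerate(patch_features)
    ]
    # Highest similarity first; ties are broken by the lower patch index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


# Toy example: four candidate patches and one prototype (made-up values).
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
prototype = [1.0, 0.0]
print(select_exemplar_patches(patches, prototype, k=2))
```

In this toy case the two patches whose features point most nearly in the prototype's direction are selected. A real system would also need the representativeness criterion that the abstract mentions, which this sketch leaves out.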

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training · Diffusion