Expanding Zero-Shot Object Counting with Rich Prompts

Huilin Zhu; Senyao Li; Jingling Yuan; Zhengwei Yang; Yu Guo; Wenxuan Liu; Xian Zhong; Shengfeng He

arXiv:2505.15398·cs.CV·May 27, 2025

TL;DR

RichCount is a two-stage training framework that improves zero-shot object counting by strengthening text-image feature alignment, which lets it generalize better to unseen categories.

Contribution

The paper introduces RichCount, which first refines the text encoder with a feed-forward network and adapter trained on text-image similarity, then applies the refined encoder to counting so that prompts associate more reliably with objects in images.

Findings

01. Achieves state-of-the-art zero-shot counting performance

02. Enhances generalization to unseen categories in open-world scenarios

03. Outperforms existing methods on benchmark datasets

Abstract

Expanding pre-trained zero-shot counting models to handle unseen categories requires more than simply adding new prompts, as this approach does not achieve the necessary alignment between text and visual features for accurate counting. We introduce RichCount, the first framework to address these limitations, employing a two-stage training strategy that enhances text encoding and strengthens the model's association with objects in images. RichCount improves zero-shot counting for unseen categories through two key objectives: (1) enriching text features with a feed-forward network and adapter trained on text-image similarity, thereby creating robust, aligned representations; and (2) applying this refined encoder to counting tasks, enabling effective generalization across diverse prompts and complex images. In this manner, RichCount goes beyond simple prompt expansion to establish…
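The first objective in the abstract (enriching text features with an adapter trained on text-image similarity) can be illustrated with a small sketch. This is not the authors' implementation: the residual bottleneck layout, the zero initialization, the mixing weight `alpha`, the `1 - cosine` loss, and every function name below are assumptions chosen for illustration, and the random vectors stand in for real encoder outputs.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def adapt_text_feature(text_feat, W_down, W_up, alpha=0.5):
    """Hypothetical residual bottleneck adapter over a frozen text embedding.

    The adapter output is added back to the original feature, so the
    pre-trained representation is preserved when the update is zero.
    """
    hidden = relu(matvec(W_down, text_feat))
    delta = matvec(W_up, hidden)
    return [t + alpha * d for t, d in zip(text_feat, delta)]


def alignment_loss(text_feat, image_feat):
    """Assumed similarity objective: 1 - cosine(text, image)."""
    return 1.0 - cosine(text_feat, image_feat)


# Toy stand-ins for encoder outputs (assumed dimensions, random values).
rng = random.Random(0)
dim, bottleneck = 8, 4
text_feat = [rng.gauss(0, 1) for _ in range(dim)]
image_feat = [rng.gauss(0, 1) for _ in range(dim)]

W_down = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
# Zero-initialized up-projection: the adapter starts as an identity map.
W_up = [[0.0] * bottleneck for _ in range(dim)]

enriched = adapt_text_feature(text_feat, W_down, W_up)
loss = alignment_loss(enriched, image_feat)
```

In a two-stage setup of the kind the abstract describes, a first stage would update only the adapter weights to reduce a loss like this one, and a second stage would reuse the refined text features for counting. The sketch covers only the forward pass of the first stage.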

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification · Video Surveillance and Tracking Methods

Methods: Adapter