Expanding Zero-Shot Object Counting with Rich Prompts
Huilin Zhu, Senyao Li, Jingling Yuan, Zhengwei Yang, Yu Guo, Wenxuan Liu, Xian Zhong, Shengfeng He

TL;DR
RichCount is a two-stage training framework for zero-shot object counting that strengthens text-image feature alignment so the model generalizes better to unseen categories.
Contribution
The paper introduces RichCount, which enriches text features with a feed-forward network and adapter trained on text-image similarity, then applies the refined text encoder to the counting task so that it generalizes across diverse prompts and unseen categories.
Findings
Achieves state-of-the-art zero-shot counting performance
Enhances generalization to unseen categories in open-world scenarios
Outperforms existing methods on benchmark datasets
Abstract
Expanding pre-trained zero-shot counting models to handle unseen categories requires more than simply adding new prompts, as this approach does not achieve the necessary alignment between text and visual features for accurate counting. We introduce RichCount, the first framework to address these limitations, employing a two-stage training strategy that enhances text encoding and strengthens the model's association with objects in images. RichCount improves zero-shot counting for unseen categories through two key objectives: (1) enriching text features with a feed-forward network and adapter trained on text-image similarity, thereby creating robust, aligned representations; and (2) applying this refined encoder to counting tasks, enabling effective generalization across diverse prompts and complex images. In this manner, RichCount goes beyond simple prompt expansion to establish…
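The first objective in the abstract (an adapter over text features, trained on text-image similarity) can be illustrated with a minimal NumPy sketch. The abstract does not give the architecture or the loss, so everything here is an assumption made for illustration: the residual bottleneck design, the names TextAdapter and similarity_loss, the mixing weight alpha, and the CLIP-style symmetric cross-entropy with temperature 0.07. It shows only the forward pass and loss, not the authors' implementation, and it does not cover the second (counting) stage.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class TextAdapter:
    """Hypothetical residual bottleneck adapter applied to frozen text features."""

    def __init__(self, dim, hidden, alpha=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (dim, hidden))  # project down
        self.w_up = rng.normal(0.0, 0.02, (hidden, dim))    # project back up
        self.alpha = alpha  # assumed weight of the adapted branch

    def __call__(self, text_feats):
        hidden = np.maximum(text_feats @ self.w_down, 0.0)  # ReLU bottleneck
        adapted = hidden @ self.w_up
        # The residual connection keeps the original text features in the mix.
        return l2_normalize(text_feats + self.alpha * adapted)


def similarity_loss(text_feats, image_feats, temperature=0.07):
    """Symmetric cross-entropy over text-image cosine similarities (assumed, CLIP-style).

    Row i of text_feats is taken to be the match for row i of image_feats.
    """
    logits = l2_normalize(text_feats) @ l2_normalize(image_feats).T / temperature

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # matched pairs lie on the diagonal

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(4, 16))   # stand-in for frozen text-encoder outputs
    image = rng.normal(size=(4, 16))  # stand-in for pooled image features
    adapter = TextAdapter(dim=16, hidden=8)
    print("similarity loss:", similarity_loss(adapter(text), image))
```

In a real training loop the adapter weights would be updated by gradient descent on this loss in an autodiff framework while the pre-trained encoders stay frozen. The sketch leaves that step out.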
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification · Video Surveillance and Tracking Methods
Methods: Adapter
