Count Anything at Any Granularity
Chang Liu, Haoning Wu, Weidi Xie

TL;DR
This paper introduces a multi-grained counting framework that explicitly models five levels of counting granularity, supported by a new large-scale dataset, KubriCount, and a specialized counting model, HieraCount.
Contribution
It redefines open-world counting as multi-grained counting, introduces the KubriCount dataset with comprehensive annotations, and develops HieraCount to improve fine-grained counting accuracy.
Findings
Multimodal large language models struggle with fine-grained prompt distinctions.
Existing counting models exhibit severe prompt-following failures at fine-grained levels.
HieraCount significantly improves multi-grained counting accuracy and robustness.
Abstract
Open-world object counting remains brittle: despite rapid advances in vision-language models (VLMs), reliably counting the objects a user intends is far from solved. We argue that a central reason is that counting granularity is left implicit; users may refer to a specific identity, an attribute, an instance type, a category, or an abstract concept, yet most methods treat "what to count" as a single, category-level matching problem. In this work, we redefine open-world counting as multi-grained counting, where visual exemplars specify target appearance and fine-grained text, with optional negative prompts, specifies the intended semantic granularity across five explicit levels. Making granularity explicit, however, exposes a critical data bottleneck: existing counting datasets lack the multi-category scenes, controlled distractors, and instance-level annotations needed to verify…
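The abstract describes the granularity levels only in prose. The sketch below shows how a ground-truth count for a multi-grained text prompt with optional negative prompts could be derived from instance-level annotations. It assumes that each object carries exactly one label per level, except for attributes, which are stored as a set. The names `Granularity`, `Instance`, `CountQuery`, `matches`, and `count`, the field names, and the example labels are all hypothetical and do not come from the paper, KubriCount, or HieraCount. The sketch also ignores visual exemplars and says nothing about how HieraCount actually predicts counts.

```python
from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    """The five levels named in the abstract (hypothetical encoding)."""
    IDENTITY = "identity"
    ATTRIBUTE = "attribute"
    INSTANCE_TYPE = "instance_type"
    CATEGORY = "category"
    CONCEPT = "concept"


@dataclass(frozen=True)
class Instance:
    """Hypothetical instance-level annotation for one object in a scene."""
    identity: str
    attributes: frozenset[str]
    instance_type: str
    category: str
    concept: str


@dataclass
class CountQuery:
    """Hypothetical query: a positive target plus optional negative prompts."""
    level: Granularity
    target: str
    negatives: list[tuple[Granularity, str]] = field(default_factory=list)


def matches(inst: Instance, level: Granularity, value: str) -> bool:
    # Attributes are a set per instance; every other level is a single label
    # stored in the field whose name equals the enum value.
    if level is Granularity.ATTRIBUTE:
        return value in inst.attributes
    return getattr(inst, level.value) == value


def count(instances: list[Instance], query: CountQuery) -> int:
    # Count instances that match the positive target and none of the negatives.
    return sum(
        1
        for inst in instances
        if matches(inst, query.level, query.target)
        and not any(matches(inst, lvl, val) for lvl, val in query.negatives)
    )


# Toy scene with hypothetical labels.
scene = [
    Instance("mug_a", frozenset({"red"}), "mug", "kitchenware", "container"),
    Instance("mug_b", frozenset({"blue"}), "mug", "kitchenware", "container"),
    Instance("bowl_a", frozenset({"red"}), "bowl", "kitchenware", "container"),
    Instance("apple_a", frozenset({"red"}), "apple", "fruit", "food"),
]

print(count(scene, CountQuery(Granularity.IDENTITY, "mug_a")))       # 1
print(count(scene, CountQuery(Granularity.ATTRIBUTE, "red")))        # 3
print(count(scene, CountQuery(Granularity.INSTANCE_TYPE, "mug")))    # 2
print(count(scene, CountQuery(Granularity.CATEGORY, "kitchenware"))) # 3
print(count(scene, CountQuery(
    Granularity.CONCEPT, "container",
    negatives=[(Granularity.INSTANCE_TYPE, "bowl")],
)))                                                                  # 2
```

The same scene produces different counts depending on the level the prompt targets, and a negative prompt removes distractors that would otherwise match. This illustrates why leaving granularity implicit makes "what to count" ambiguous.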
