Count Anything at Any Granularity

Chang Liu; Haoning Wu; Weidi Xie

arXiv:2605.10887·cs.CV·May 12, 2026

TL;DR

This paper introduces a multi-grained counting framework that makes counting granularity explicit across five levels (identity, attribute, instance type, category, and abstract concept), supported by a new large dataset, KubriCount, and a specialized counting model, HieraCount.

Contribution

It redefines open-world counting as multi-grained, creates the KubriCount dataset with comprehensive annotations, and develops HieraCount to improve fine-grained counting accuracy.

Findings

01. Multimodal large language models struggle with fine-grained prompt distinctions.

02. Existing counting models exhibit severe prompt-following failures at fine-grained levels.

03. HieraCount significantly improves multi-grained counting accuracy and robustness.

Abstract

Open-world object counting remains brittle: despite rapid advances in vision-language models (VLMs), reliably counting the objects a user intends is far from solved. We argue that a central reason is that counting granularity is left implicit; users may refer to a specific identity, an attribute, an instance type, a category, or an abstract concept, yet most methods treat "what to count" as a single, category-level matching problem. In this work, we redefine open-world counting as multi-grained counting, where visual exemplars specify target appearance and fine-grained text, with optional negative prompts, specifies the intended semantic granularity across five explicit levels. Making granularity explicit, however, exposes a critical data bottleneck: existing counting datasets lack the multi-category scenes, controlled distractors, and instance-level annotations needed to verify…
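To make the role of granularity concrete, here is a minimal Python sketch of how the same scene yields different counts depending on the level a query targets. It is an illustration only, not HieraCount or the KubriCount annotation format: the names `Granularity`, `CountQuery`, and `count_targets` are hypothetical, the scene is invented, visual exemplars are omitted, and counting is done over ground-truth labels rather than predicted from an image.

```python
from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    """The five levels named in the abstract (identifiers are hypothetical)."""
    IDENTITY = "identity"
    ATTRIBUTE = "attribute"
    INSTANCE_TYPE = "instance_type"
    CATEGORY = "category"
    CONCEPT = "concept"


@dataclass
class CountQuery:
    """A text query at one granularity level, with optional negative prompts."""
    level: Granularity
    text: str
    # Each negative is a (level, label) pair describing what must NOT be counted.
    negatives: list = field(default_factory=list)


def count_targets(instances, query):
    """Count labelled instances that match the query and no negative prompt."""
    total = 0
    for inst in instances:
        if inst.get(query.level) != query.text:
            continue  # does not match the positive prompt at this level
        if any(inst.get(lvl) == txt for lvl, txt in query.negatives):
            continue  # excluded by a negative prompt
        total += 1
    return total


G = Granularity
# Toy scene (assumed for illustration): 3 red mugs, 2 blue mugs, 1 red bowl.
scene = (
    [{G.ATTRIBUTE: "red", G.INSTANCE_TYPE: "mug", G.CATEGORY: "tableware"}] * 3
    + [{G.ATTRIBUTE: "blue", G.INSTANCE_TYPE: "mug", G.CATEGORY: "tableware"}] * 2
    + [{G.ATTRIBUTE: "red", G.INSTANCE_TYPE: "bowl", G.CATEGORY: "tableware"}]
)

category_count = count_targets(scene, CountQuery(G.CATEGORY, "tableware"))
type_count = count_targets(scene, CountQuery(G.INSTANCE_TYPE, "mug"))
red_mug_count = count_targets(
    scene, CountQuery(G.INSTANCE_TYPE, "mug", negatives=[(G.ATTRIBUTE, "blue")])
)
print(category_count, type_count, red_mug_count)  # prints: 6 5 3
```

A method that treats every query as category-level matching would answer 6 for all three queries, which is the kind of prompt-following failure the findings describe.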


Code & Models

Datasets

liuchang666/KubriCount
dataset · 470 downloads
