Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Giacomo Pacini, Luca Ciampi, Nicola Messina, Nicola Tonellotto, Giuseppe Amato, Fabrizio Falchi

TL;DR
This paper evaluates the semantic grounding ability of text-guided class-agnostic counting models, revealing their weaknesses and proposing new benchmarks and metrics for more reliable assessment.
Contribution
It introduces the PrACo++ and MUCCA datasets, along with evaluation protocols, to systematically assess and improve semantic grounding in class-agnostic counting (CAC) models.
Findings
Current models often misalign textual prompts with visual objects.
Models perform poorly under the new evaluation protocols despite strong results on standard counting metrics.
Semantic similarity influences counting failures significantly.
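To illustrate how a model can look accurate on standard metrics yet fail at grounding, here is a minimal Python sketch. It is not the paper's PrACo++ protocol or its metrics: the toy images, both counters, and the `negative_prompt_score` function are hypothetical and chosen only to make the failure mode concrete. The sketch assumes single-class images and compares a counter that respects the prompt with one that counts every object regardless of the prompt.

```python
from statistics import mean

# Toy single-class "images" (hypothetical): each maps the one class present to its true count.
IMAGES = [{"apple": 4}, {"car": 10}, {"bird": 2}]
CLASSES = ["apple", "car", "bird"]


def grounded_counter(image, prompt):
    """Counts only the prompted class; returns 0 when that class is absent."""
    return image.get(prompt, 0)


def ungrounded_counter(image, prompt):
    """Ignores the prompt and counts every object in the image."""
    return sum(image.values())


def positive_mae(counter):
    """Standard-style error: prompt with the class that is actually present."""
    return mean(abs(counter(img, cls) - n) for img in IMAGES for cls, n in img.items())


def negative_prompt_score(counter):
    """Illustrative grounding check (an assumption, not the paper's metric):
    prompt with absent classes and normalize the prediction by the true count
    of the present class. 0 means the model correctly counts nothing."""
    ratios = []
    for img in IMAGES:
        present, n = next(iter(img.items()))
        for cls in CLASSES:
            if cls != present:
                ratios.append(counter(img, cls) / n)
    return mean(ratios)


for name, counter in [("grounded", grounded_counter), ("ungrounded", ungrounded_counter)]:
    print(name, positive_mae(counter), negative_prompt_score(counter))
```

Both counters reach zero error when prompted with the class that is present, so a single-category benchmark cannot tell them apart. Only the absent-class prompts expose that the second counter ignores the text entirely.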
Abstract
Open-world text-guided class-agnostic counting (CAC) has emerged as a flexible paradigm for counting arbitrary object classes by using natural language prompts. However, current evaluation protocols primarily focus on standard counting errors within single-category images, overlooking a fundamental requirement: the ability to correctly ground the textual prompt in the visual scene. In this paper, we show that several state-of-the-art CAC models often struggle to determine which object class should be counted based on the given prompt, revealing a misalignment between textual semantics and visual object representations. This limitation leads to spurious counting responses and reduced reliability in real-world scenarios. To systematically address these limitations, we propose a new evaluation framework focused on model robustness and trustworthiness. Our contribution is two-fold: (i) we…
