Boosting Quantitive and Spatial Awareness for Zero-Shot Object Counting

Da Zhang; Bingyu Li; Feiyu Wang; Zhiyuan Zhao; Junyu Gao

arXiv:2603.16129 · cs.CV · March 18, 2026

TL;DR

This paper introduces QICA, a framework enhancing zero-shot object counting by integrating quantity perception and spatial aggregation, improving fine-grained reasoning and generalization across unseen categories and domains.

Contribution

QICA combines quantity perception with spatial aggregation, using a novel prompting strategy and cost aggregation decoder to improve zero-shot counting accuracy and robustness.

Findings

01

Achieves competitive results on the FSC-147 dataset.

02

Demonstrates superior zero-shot generalization on CARPK and ShanghaiTech-A.

03

Effectively maintains numerical consistency across the pipeline.

Abstract

Zero-shot object counting (ZSOC) aims to enumerate objects of arbitrary categories specified by text descriptions without requiring visual exemplars. However, existing methods often treat counting as a coarse retrieval task and therefore lack fine-grained quantity awareness. Furthermore, they frequently exhibit spatial insensitivity and degraded generalization due to feature space distortion during model adaptation. To address these challenges, we present QICA, a novel framework that synergizes Quantity perceptIon with robust spatial Cost Aggregation. Specifically, we introduce a Synergistic Prompting Strategy (SPS) that adapts vision and language encoders through numerically conditioned prompts, bridging the gap between semantic recognition and quantitative reasoning. To mitigate feature distortion, we propose a…
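
The two ingredients named above, numerically conditioned prompts and a cost aggregation decoder, can be illustrated with a minimal sketch. This is not the paper's implementation: it only shows the generic building blocks such a pipeline might start from, namely a prompt template that includes a count and a cosine-similarity cost map between image patch features and a text embedding. The function names (`numeric_prompts`, `cost_map`), the prompt wording, and the toy feature values are all hypothetical, and the aggregation step that a decoder would apply to the cost map is left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cost_map(patch_features, text_embedding):
    """Cosine similarity between every patch feature and one text embedding.

    patch_features: H x W grid (nested lists) of D-dimensional vectors.
    text_embedding: one D-dimensional vector for the category prompt.
    Returns an H x W grid of similarities in [-1, 1].
    """
    t = l2_normalize(text_embedding)
    return [
        [sum(p * q for p, q in zip(l2_normalize(patch), t)) for patch in row]
        for row in patch_features
    ]


def numeric_prompts(category, counts):
    """Hypothetical prompt template that conditions the text on a count."""
    return [f"a photo of {n} {category}" for n in counts]


# Toy 2 x 2 grid of 2-D patch features (assumed values, for illustration only).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [-1.0, 0.0]],
]
text = [3.0, 0.0]  # stand-in for an encoded category prompt

costs = cost_map(features, text)
prompts = numeric_prompts("apples", [2, 5])
```

In this toy example, patches whose features point in the same direction as the text embedding get a similarity of 1, orthogonal patches get 0, and opposite patches get -1. A decoder would then aggregate such a map spatially before predicting a count.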

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications