CountZES: Counting via Zero-Shot Exemplar Selection
Muhammad Ibraheem Siddiqui, Muhammad Haris Khan

TL;DR
CountZES is an inference-only zero-shot object counting method that selects diverse, accurate exemplars through detection refinement, density-guided self-supervision, and feature clustering, and it outperforms existing zero-shot approaches across several datasets.
Contribution
The paper proposes CountZES, an inference-only zero-shot exemplar selection framework that enhances object counting accuracy by combining detection refinement, density-based exemplar discovery, and feature clustering.
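The feature-clustering idea can be illustrated with a small sketch. It assumes that each candidate exemplar has already been encoded as a feature vector. It groups the candidates with plain k-means and keeps the candidate nearest each centroid, so the chosen exemplars cover distinct appearance modes. This is a generic diversity-selection technique, not the authors' Feature-Consensus Exemplar (FCE) procedure. The function name, the deterministic farthest-point initialization, and the L2 normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def select_diverse_exemplars(features: np.ndarray, k: int, iters: int = 20) -> list[int]:
    """Hypothetical sketch: cluster candidate features, keep the member nearest each centroid.

    Not the CountZES FCE stage; a generic k-means diversity heuristic.
    """
    n = features.shape[0]
    if k >= n:
        return list(range(n))
    # L2-normalize (small eps avoids division by zero) so distances reflect appearance.
    x = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-point initialization, starting from candidate 0.
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(d))])
    centers = np.stack(centers)
    for _ in range(iters):
        # Assign every candidate to its nearest center, then recompute the means.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, then pick the most central member of each cluster.
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            chosen.append(int(members[np.argmin(dists[members, j])]))
    return sorted(chosen)


feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(select_diverse_exemplars(feats, k=2))  # one index from {0, 1}, one from {2, 3}
```

Choosing the member nearest each centroid, rather than the centroid itself, keeps every exemplar tied to a real image region.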
Findings
Outperforms existing zero-shot counting methods on multiple datasets.
Effectively generalizes across different domains.
Achieves superior counting accuracy with diverse exemplar sets.
Abstract
Object counting in complex scenes is particularly challenging in the zero-shot (ZS) setting, where instances of unseen categories are counted using only a class name. Existing ZS counting methods that infer exemplars from text often rely on off-the-shelf open-vocabulary detectors (OVDs), which in dense scenes suffer from semantic noise, appearance variability, and frequent multi-instance proposals. Other methods instead sample random image patches, which fail to delineate object instances accurately. To address these issues, we propose CountZES, an inference-only approach for object counting via ZS exemplar selection. CountZES discovers diverse exemplars through three synergistic stages: Detection-Anchored Exemplar (DAE), Density-Guided Exemplar (DGE), and Feature-Consensus Exemplar (FCE). DAE refines OVD detections to isolate precise single-instance exemplars. DGE introduces a…
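Refining detector output into single-instance exemplars can be sketched under explicit assumptions. In this sketch, the OVD returns `(x1, y1, x2, y2)` boxes with confidence scores. A box is treated as a likely multi-instance proposal if it contains the centers of more than `max_inner` other confident boxes. The surviving boxes are then deduplicated with greedy non-maximum suppression. The center-containment heuristic, the thresholds, and the function names are hypothetical and do not reproduce the authors' DAE stage.

```python
import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return float(inter / union) if union > 0 else 0.0


def refine_detections(boxes, scores, score_thr=0.3, iou_thr=0.5, max_inner=1) -> list[int]:
    """Hypothetical sketch: drop low-score and multi-instance boxes, then apply NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = np.flatnonzero(scores >= score_thr)
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2
    single = []
    for i in keep:
        x1, y1, x2, y2 = boxes[i]
        # Count other confident boxes whose centers fall inside box i.
        inner = sum(
            1 for j in keep
            if j != i and x1 <= centers[j, 0] <= x2 and y1 <= centers[j, 1] <= y2
        )
        # A duplicate of the same object may contribute one center; more suggests several objects.
        if inner <= max_inner:
            single.append(i)
    # Greedy NMS: visit boxes from highest to lowest score.
    selected: list[int] = []
    for i in sorted(single, key=lambda i: scores[i], reverse=True):
        if all(box_iou(boxes[i], boxes[s]) < iou_thr for s in selected):
            selected.append(int(i))
    return selected


boxes = [[0, 0, 10, 10], [20, 0, 30, 10], [0, 0, 30, 10], [1, 1, 11, 11], [40, 0, 50, 10]]
scores = [0.9, 0.8, 0.85, 0.7, 0.1]
print(refine_detections(boxes, scores))  # [0, 1]
```

In this example, box 2 spans two objects and is discarded. Box 3 duplicates box 0 and is suppressed by NMS, and box 4 falls below the score threshold.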
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
