The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting
Corentin Dumery, Niki Amini-Naieni, Shervin Naini, Pascal Fua

TL;DR
The paper introduces MixCount, a new dataset and benchmark for mixed-object counting, created via an automatic synthesis pipeline to improve model performance on real-world tasks.
Contribution
It presents a scalable, automated method for generating diverse, realistic counting data and establishes MixCount as a benchmark to evaluate and enhance counting models.
Findings
Training on MixCount's synthetic data significantly improves real-world counting accuracy.
State-of-the-art models show severe performance drops in mixed-object scenarios.
Training on MixCount reduces mean absolute error (MAE) by over 20% on key benchmarks.
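To make the MAE finding concrete, here is a minimal sketch of how mean absolute error and a relative reduction are typically computed for counting models. The function names and all count values are made up for illustration. They are not taken from the paper and do not reproduce its results.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted count - true count| over all images."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(baseline_mae: float, improved_mae: float) -> float:
    """Fraction by which the MAE dropped relative to the baseline."""
    return (baseline_mae - improved_mae) / baseline_mae


# Hypothetical per-image object counts (illustrative only).
true_counts = [12, 7, 30, 5]
baseline_preds = [15, 7, 24, 6]   # hypothetical model before extra training
finetuned_preds = [13, 7, 27, 5]  # hypothetical model after extra training

baseline_mae = mean_absolute_error(baseline_preds, true_counts)
finetuned_mae = mean_absolute_error(finetuned_preds, true_counts)
reduction = relative_reduction(baseline_mae, finetuned_mae)

print(f"baseline MAE:  {baseline_mae:.2f}")
print(f"finetuned MAE: {finetuned_mae:.2f}")
print(f"relative reduction: {reduction:.0%}")
```

Under this definition, a reduction of "over 20%" means the improved MAE is below 0.8 times the baseline MAE on the same test set.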
Abstract
Object counting is a foundational vision task with over a decade of dedicated research, yet state-of-the-art models still fail systematically in the mixed-object setting that dominates real-world applications such as industrial inspection and product sorting. We show that this gap is strongly driven by limitations in existing training and evaluation data: real counting datasets are prohibitively expensive to annotate and suffer from labeling noise, while existing synthetic alternatives lack diversity and realism. We address this with MixCount, a dataset and benchmark for mixed-object counting designed to target the failure modes of current counting models. To overcome the high cost of constructing and labeling such data, we develop an automatic generation pipeline that synthesizes images, fine-grained textual descriptions, and pixel-perfect counting annotations at scale, eliminating the…
