CoDi -- an exemplar-conditioned diffusion model for low-shot counting
Grega Šuštar, Jer Pelhan, Alan Lukežič, Matej Kristan

TL;DR
CoDi introduces a novel exemplar-conditioned diffusion model for low-shot object counting, significantly improving localization and counting accuracy in dense, small-object scenarios compared to existing methods.
Contribution
CoDi is the first latent diffusion-based low-shot counter, with an exemplar conditioning module that enhances object localization and counting performance.
Findings
Outperforms the state of the art by 15% in MAE on the FSC benchmark
Sets a new state of the art on the MCAC benchmark with a 44% MAE improvement
Produces high-quality density maps that enable accurate object localization
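For readers unfamiliar with the metric, the following is a minimal sketch of how mean absolute error (MAE) over per-image counts and a relative MAE improvement are commonly computed. It assumes the percentages above are relative reductions in MAE, which this summary does not state explicitly. The function names and all counts are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def mean_absolute_error(pred_counts, true_counts):
    """Average absolute difference between predicted and true per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))

def relative_improvement(baseline_mae, new_mae):
    """Fraction by which new_mae is lower than baseline_mae."""
    return (baseline_mae - new_mae) / baseline_mae

# Hypothetical counts for three images (not data from the paper).
true_counts = [10, 50, 200]
baseline_pred = [14, 40, 220]   # absolute errors: 4, 10, 20
new_pred = [12, 45, 210]        # absolute errors: 2, 5, 10

baseline_mae = mean_absolute_error(baseline_pred, true_counts)
new_mae = mean_absolute_error(new_pred, true_counts)
improvement = relative_improvement(baseline_mae, new_mae)
```

In this toy example every error is halved, so the relative improvement is 0.5, which would be reported as a 50% MAE improvement.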
Abstract
Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only a few or no annotated test-time exemplars. A considerable challenge for modern low-shot counters is dense regions with small objects. While total counts in such situations are typically well addressed by density-based counters, their usefulness is limited by poor localization capabilities. Localization is better addressed by point-detection-based counters, which are based on query-based detectors. However, due to the limited number of pre-trained queries, they underperform on images with very large numbers of objects and resort to ad-hoc techniques like upsampling and tiling. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maxima suppression. Our core contribution is the new…
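The idea of reading object locations off a density map by non-maxima suppression can be sketched as follows. This is a generic illustration, not CoDi's implementation: the window size, the threshold, the synthetic Gaussian density map, and the function names `gaussian_blob` and `local_maxima_nms` are all assumptions made for the example.

```python
import numpy as np

def gaussian_blob(shape, center, sigma=1.5):
    """Unit-mass Gaussian centered at `center`, standing in for one object."""
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    g = np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def local_maxima_nms(density, window=3, threshold=0.01):
    """Keep pixels that equal the maximum of their window and exceed `threshold`.

    Returns an array of (row, col) peak coordinates in row-major order.
    Flat plateaus of equal values would yield several peaks; a real
    pipeline would need a tie-breaking rule.
    """
    density = np.asarray(density, dtype=float)
    h, w = density.shape
    pad = window // 2
    padded = np.pad(density, pad, mode="constant", constant_values=-np.inf)
    # Sliding-window maximum built from shifted views of the padded map.
    neighborhood_max = np.full((h, w), -np.inf)
    for dy in range(window):
        for dx in range(window):
            neighborhood_max = np.maximum(neighborhood_max, padded[dy:dy + h, dx:dx + w])
    keep = (density >= neighborhood_max) & (density > threshold)
    return np.argwhere(keep)

# Synthetic density map with two well-separated objects (hypothetical data).
shape = (32, 32)
density = gaussian_blob(shape, (8, 8)) + gaussian_blob(shape, (20, 25))

total_count = density.sum()          # density maps integrate to the object count
peaks = local_maxima_nms(density)    # one (row, col) location per detected object
```

Summing the map gives the count estimate, while the surviving peaks give one location per object. For closely packed objects the window size and threshold trade off missed detections against duplicates.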
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Generative Adversarial Networks and Image Synthesis
