Diffusion-based Data Augmentation for Object Counting Problems
Zhen Wang, Yuelei Li, Jia Wan, Nuno Vasconcelos

TL;DR
This paper introduces a diffusion-based data augmentation pipeline for object counting, generating synthetic crowd images conditioned on location maps to improve deep learning model performance on limited datasets.
Contribution
It is the first to generate location-conditioned crowd images using diffusion models and to incorporate these for data augmentation in counting tasks.
Findings
Improved counting accuracy on multiple datasets
Enhanced ControlNet performance with smoothed density maps
Effective use of synthetic data for training deep models
Abstract
Crowd counting is an important problem in computer vision due to its wide range of applications in image understanding. Currently, this problem is typically addressed using deep learning approaches, such as Convolutional Neural Networks (CNNs) and Transformers. However, deep networks are data-driven and are prone to overfitting, especially when the available labeled crowd dataset is limited. To overcome this limitation, we have designed a pipeline that utilizes a diffusion model to generate extensive training data. We are the first to generate images conditioned on a location dot map (a binary dot map that specifies the location of human heads) with a diffusion model. We are also the first to use these diverse synthetic data to augment the crowd counting models. Our proposed smoothed density map input for ControlNet significantly improves ControlNet's performance in generating crowds in…
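The abstract contrasts a binary location dot map with a smoothed density map as the ControlNet condition. The sketch below shows one common way to derive a smoothed density map from head coordinates: a normalized Gaussian is placed at each head, so the map still sums to the head count. It is a minimal illustration under stated assumptions, not the authors' preprocessing. The fixed `sigma`, the 3-sigma kernel truncation, the border renormalization, the [0, 1] rescaling for the conditioning image, and the helper names `dot_map`, `smoothed_density_map`, and `to_control_image` are all hypothetical choices.

```python
import numpy as np


def dot_map(points, height, width):
    """Binary location map: 1 at each (row, col) head position, 0 elsewhere."""
    dots = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        dots[int(r), int(c)] = 1.0
    return dots


def gaussian_kernel(sigma):
    """Square 2-D Gaussian kernel truncated at 3*sigma (assumption), summing to 1."""
    radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1, dtype=np.float32)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def smoothed_density_map(points, height, width, sigma=4.0):
    """Sum of per-head Gaussians; each head contributes a total mass of 1.

    The fixed sigma is an illustrative assumption; geometry-adaptive
    kernels are another common choice.
    """
    density = np.zeros((height, width), dtype=np.float32)
    kernel = gaussian_kernel(sigma)
    radius = kernel.shape[0] // 2
    for r, c in points:
        r, c = int(r), int(c)
        # Clip the kernel window to the image borders.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, height)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, width)
        patch = kernel[r0 - (r - radius): r1 - (r - radius),
                       c0 - (c - radius): c1 - (c - radius)]
        # Renormalize the clipped patch so heads near the border still count as 1.
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density


def to_control_image(density):
    """Hypothetical conditioning format: scale to [0, 1], replicate to (H, W, 3)."""
    peak = density.max()
    scaled = density / peak if peak > 0 else density
    return np.repeat(scaled[..., None], 3, axis=-1)


heads = [(10, 12), (40, 50), (0, 63)]
dens = smoothed_density_map(heads, 64, 64, sigma=4.0)
print(round(float(dens.sum()), 3))  # 3.0
```

Because every Gaussian integrates to 1, summing the density map recovers the head count. This is the property that lets a density map double as a counting target, while the smoothing gives the generator a spatially spread signal in place of isolated single pixels.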
Taxonomy
Topics: Video Surveillance and Tracking Methods · Air Quality Monitoring and Forecasting · Human Mobility and Location-Based Analysis
Methods: Diffusion · ALIGN
