The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models
Roy Voetman, Maya Aghaei, Klaas Dijkstra

TL;DR
This paper demonstrates that synthetic datasets generated by fine-tuned diffusion models can effectively train deep object detection models, achieving comparable performance to models trained on real data, thus reducing the need for extensive real-world data collection.
Contribution
The paper introduces a framework for generating synthetic datasets with fine-tuned diffusion models to train object detectors, and shows that the resulting detectors perform comparably to models trained on real data.
Findings
Models trained on synthetic data achieve accuracy similar to models trained on real data.
For apple detection, the average precision of the synthetic-data detectors deviates from the real-data baseline by 0.09 to 0.12.
Synthetic datasets can be a viable alternative to real-world data collection.
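As a rough illustration of the comparison behind the reported deviation, the sketch below subtracts each detector's average precision from the baseline's. It assumes the deviation is a simple difference in AP on the same test set, which the summary does not state explicitly. The function name `ap_deviation`, the detector names, and all numbers are hypothetical placeholders, not values from the paper.

```python
def ap_deviation(baseline_ap: float, candidate_aps: dict[str, float]) -> dict[str, float]:
    """Return (baseline AP - candidate AP) for each candidate, rounded to 4 decimals.

    Assumption: "deviation" means a plain difference in average precision
    measured on the same real-world test set.
    """
    return {name: round(baseline_ap - ap, 4) for name, ap in candidate_aps.items()}


# Placeholder numbers for illustration only; these are NOT results from the paper.
baseline = 0.90
synthetic = {"detector_a": 0.80, "detector_b": 0.75}

deviations = ap_deviation(baseline, synthetic)
print(deviations)
```

A positive value means the detector trained on synthetic data scored below the baseline trained on real data.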
Abstract
Despite the notable accomplishments of deep object detection models, a major challenge that persists is the requirement for extensive amounts of training data. The process of procuring such real-world data is a laborious undertaking, which has prompted researchers to explore new avenues of research, such as synthetic data generation techniques. This study presents a framework for the generation of synthetic datasets by fine-tuning pretrained stable diffusion models. The synthetic datasets are then manually annotated and employed for training various object detection models. These detectors are evaluated on a real-world test set of 331 images and compared against a baseline model that was trained on real-world images. The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model. In the context of apple detection in…
Taxonomy
Topics: Smart Agriculture and AI · Remote-Sensing Image Classification · Spectroscopy and Chemometric Analyses
Methods: Diffusion
