Stable Diffusion Dataset Generation for Downstream Classification Tasks
Eugenio Lomurno, Matteo D'Oria, Matteo Matteucci

TL;DR
This paper adapts Stable Diffusion 2.0 for synthetic dataset generation using transfer learning, fine-tuning, and generation parameter optimisation. In a third of cases, classifiers trained on the resulting synthetic datasets outperformed those trained on real data.
Contribution
It introduces a class-conditional Stable Diffusion model with optimization techniques to enhance synthetic data utility for downstream classification.
Findings
Models trained on synthetic datasets outperformed those trained on real datasets in a third of cases.
Class-conditional model improves dataset relevance for classification.
Optimization of generation parameters enhances model performance.
Abstract
Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
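The abstract names a Class-Encoder but does not describe its internals. As a rough illustration only, the sketch below assumes it behaves like a lookup table that maps a class index to a conditioning vector, which would stand in for the text embedding the diffusion model normally conditions on. The class name `ClassEncoder`, the single-token output shape, the initialisation scale, and the 1024-wide embedding are all assumptions made for this example, not details taken from the paper.

```python
import random


class ClassEncoder:
    """Hypothetical lookup-table encoder: class index -> conditioning vector.

    Illustrative only; the paper's actual Class-Encoder may differ.
    """

    def __init__(self, num_classes, embed_dim, seed=0):
        rng = random.Random(seed)
        # One vector per class. In a trained model these would be learned
        # parameters; here they are random placeholders.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
            for _ in range(num_classes)
        ]

    def __call__(self, labels):
        # Return one single-token conditioning sequence per label,
        # laid out as [batch][1][embed_dim].
        return [[list(self.table[y])] for y in labels]


# embed_dim=1024 is an assumed conditioning width for this sketch.
encoder = ClassEncoder(num_classes=10, embed_dim=1024)
cond = encoder([3, 7, 3])
```

Under this assumption, every image of a given class is generated from the same conditioning vector, so sample diversity has to come from the diffusion noise and the generation parameters rather than from prompt wording.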
Taxonomy
Topics: Neural Networks and Applications
Methods: Diffusion
