Stable Diffusion Dataset Generation for Downstream Classification Tasks

Eugenio Lomurno; Matteo D'Oria; Matteo Matteucci

arXiv:2405.02698·cs.LG·May 7, 2024

TL;DR

This paper adapts Stable Diffusion 2.0 for synthetic dataset generation using transfer learning, fine-tuning, and generation parameter optimisation. In a third of cases, classifiers trained on the resulting synthetic datasets outperformed those trained on real data.

Contribution

The paper introduces a class-conditional version of Stable Diffusion 2.0, built around a Class-Encoder, together with optimisation of key generation parameters, to improve the utility of synthetic data for downstream classification.

Findings

01. Models trained on synthetic datasets outperformed those trained on real datasets in a third of cases.

02. The class-conditional model improves the relevance of the generated dataset for classification.

03. Optimising the generation parameters improves the performance of downstream models.

Abstract

Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
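
The abstract does not describe the Class-Encoder in detail, so the following is only a minimal sketch of the general idea of class conditioning: a class label is mapped to a learned conditioning matrix that a diffusion model could consume in place of text-prompt embeddings. The name `ClassEncoder`, the shapes `num_tokens` and `dim`, and the random initialisation are all assumptions made for illustration and are not taken from the paper.

```python
import random


class ClassEncoder:
    """Toy stand-in for a class encoder (hypothetical, not the paper's code).

    Maps an integer class label to a (num_tokens x dim) conditioning matrix.
    In a real model these entries would be trainable parameters.
    """

    def __init__(self, num_classes, num_tokens=4, dim=8, seed=0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        self.num_tokens = num_tokens
        self.dim = dim
        # One conditioning matrix per class, initialised with small random values.
        self.table = [
            [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]
            for _ in range(num_classes)
        ]

    def __call__(self, labels):
        # Look up the conditioning matrix for every label in the batch.
        for y in labels:
            if not 0 <= y < self.num_classes:
                raise ValueError(f"label {y} outside [0, {self.num_classes})")
        return [self.table[y] for y in labels]


# Assumed setup: a 10-class dataset and a batch of three labels.
encoder = ClassEncoder(num_classes=10)
cond = encoder([3, 3, 7])
print(len(cond), len(cond[0]), len(cond[0][0]))  # batch size, tokens, dim
```

Under these assumptions, samples with the same label share one conditioning matrix, so generation can be steered per class without writing text prompts.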

Taxonomy

Topics: Neural Networks and Applications

Methods: Diffusion