From Prompts to Deployment: Auto-Curated Domain-Specific Dataset Generation via Diffusion Models

Dongsik Yoon; Jongeun Kim

arXiv:2601.08095 · cs.CV · January 14, 2026

TL;DR

This paper introduces an automated pipeline using diffusion models to generate high-quality, domain-specific synthetic datasets, improving deployment readiness by addressing distribution shifts and reducing real-world data collection needs.

Contribution

It proposes a novel three-stage framework combining inpainting, multi-modal validation, and user-preference classification for synthetic dataset generation.

Findings

01. Effective synthesis of domain-specific objects within backgrounds.

02. Validated datasets with high quality and deployment relevance.

03. Reduced need for extensive real-world data collection.

Abstract

In this paper, we present an automated pipeline for generating domain-specific synthetic datasets with diffusion models, addressing the distribution shift between pre-trained models and real-world deployment environments. Our three-stage framework first synthesizes target objects within domain-specific backgrounds through controlled inpainting. The generated outputs are then validated via a multi-modal assessment that integrates object detection, aesthetic scoring, and vision-language alignment. Finally, a user-preference classifier is employed to capture subjective selection criteria. This pipeline enables the efficient construction of high-quality, deployable datasets while reducing reliance on extensive real-world data collection.
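
The second and third stages described above amount to a filtering step followed by a learned selection step. The sketch below is one minimal way to express that flow, not the authors' implementation. It assumes the detection, aesthetic, and vision-language alignment scores have already been computed by external models. The names `Candidate`, `Thresholds`, `passes_validation`, and `curate`, the score ranges, and all threshold values are hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """One inpainted image with precomputed scores (hypothetical structure)."""
    image_id: str
    detection_conf: float   # detector confidence that the target object is present
    aesthetic_score: float  # aesthetic predictor output, assumed here to lie in [0, 10]
    alignment_score: float  # image-text similarity, assumed here to lie in [0, 1]


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; the abstract does not state any thresholds.
    detection: float = 0.5
    aesthetic: float = 5.0
    alignment: float = 0.25


def passes_validation(c: Candidate, t: Thresholds) -> bool:
    """Stage 2 sketch: keep a candidate only if every check clears its threshold."""
    return (
        c.detection_conf >= t.detection
        and c.aesthetic_score >= t.aesthetic
        and c.alignment_score >= t.alignment
    )


def curate(
    candidates: Iterable[Candidate],
    thresholds: Thresholds,
    prefers: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Apply the validation gate, then a user-preference predicate (stage 3 stand-in)."""
    validated = [c for c in candidates if passes_validation(c, thresholds)]
    return [c for c in validated if prefers(c)]


if __name__ == "__main__":
    pool = [
        Candidate("a", 0.9, 6.0, 0.30),
        Candidate("b", 0.2, 8.0, 0.40),  # fails the detection check
        Candidate("c", 0.8, 5.5, 0.30),
    ]
    # A trained preference classifier would replace this hand-written predicate.
    kept = curate(pool, Thresholds(), lambda c: c.aesthetic_score >= 5.8)
    print([c.image_id for c in kept])
```

Treating the three validation signals as a conjunction of independent thresholds is an assumption made for simplicity; a weighted combination of the scores would fit the abstract's description equally well.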

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning