BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training
Thejas Venkatesh, Suguna Varshini Velury

TL;DR
BlendFusion is a scalable framework that generates high-quality synthetic image-caption data from 3D scenes, addressing issues of visual inconsistency and model collapse in diffusion model training.
Contribution
It introduces a novel pipeline with object-centric camera placement, filtering, and captioning, and provides the curated FineBLEND dataset for diffusion model training.
Findings
The FineBLEND dataset is diverse and high-quality.
Object-centric camera placement improves data quality.
The framework is highly configurable for community use.
Abstract
With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit visual inconsistencies, and training models on such data can create an autophagous feedback loop that leads to model collapse, commonly referred to as Model Autophagy Disorder (MAD). To address these challenges, we propose BlendFusion, a scalable framework for synthetic data generation from 3D scenes using path tracing. Our pipeline incorporates an object-centric camera placement strategy, robust filtering mechanisms, and automatic captioning to produce high-quality image-caption pairs. Using this pipeline, we curate FineBLEND, an image-caption dataset constructed from a diverse set of 3D scenes. We empirically analyze the quality of FineBLEND and…
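The abstract names an object-centric camera placement strategy but does not spell out how it works. As a rough illustration only, the sketch below shows one common way to do this kind of placement: orbit an object's bounding box at a fixed elevation and aim every camera at the box center. The function name `object_centric_cameras`, its parameters (`extent`, `margin`, `elevation_deg`), and the Z-up convention are all assumptions made for this example, not details taken from BlendFusion.

```python
import math

def object_centric_cameras(center, extent, n_views=8, elevation_deg=30.0, margin=2.0):
    """Hypothetical helper: ring of cameras around an object, all aimed at its center.

    center -- (x, y, z) of the object's bounding-box center (Z-up assumed)
    extent -- (dx, dy, dz) full side lengths of the bounding box
    margin -- orbit radius as a multiple of half the box diagonal (assumed heuristic)
    """
    # Orbit radius scales with object size so small and large objects fill the frame similarly.
    radius = margin * 0.5 * math.sqrt(sum(e * e for e in extent))
    if radius <= 0.0:
        raise ValueError("bounding box must have non-zero size")

    elev = math.radians(elevation_deg)
    cameras = []
    for i in range(n_views):
        azim = 2.0 * math.pi * i / n_views  # evenly spaced azimuth angles
        pos = (
            center[0] + radius * math.cos(elev) * math.cos(azim),
            center[1] + radius * math.cos(elev) * math.sin(azim),
            center[2] + radius * math.sin(elev),
        )
        # Unit vector from the camera toward the object center.
        forward = tuple((c - p) / radius for c, p in zip(center, pos))
        cameras.append((pos, forward))
    return cameras

if __name__ == "__main__":
    for pos, forward in object_centric_cameras((0.0, 0.0, 1.0), (2.0, 2.0, 1.0), n_views=4):
        print(pos, forward)
```

Each returned pair gives a camera position and a unit viewing direction; a renderer-specific step (converting the direction into a rotation, checking for occlusion, and so on) would still be needed before path tracing.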
