TL;DR
SGDiff is a style-guided diffusion model for fashion image synthesis. It adds image-based style guidance to a pretrained text-to-image diffusion model, which reduces training costs and gives more control over the generated style than text-only input.
Contribution
It introduces a novel classifier-free guidance method for multi-modal feature fusion and provides a new high-resolution fashion dataset for synthesis tasks.
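For readers unfamiliar with classifier-free guidance, the following is a minimal sketch of the standard guidance rule, in which the sampler extrapolates from the unconditional noise prediction toward the conditional one. It is not SGDiff's exact formulation, which this summary does not give. The function name `classifier_free_guidance`, the toy vectors, and the idea that the conditional prediction comes from a fused text and style condition are assumptions made for illustration only.

```python
from typing import List


def classifier_free_guidance(
    eps_uncond: List[float],
    eps_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Standard classifier-free guidance on two noise predictions.

    Returns eps_uncond + guidance_scale * (eps_cond - eps_uncond), elementwise.
    A scale of 0 gives the unconditional prediction, 1 gives the conditional
    one, and values above 1 push further toward the condition.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for a denoiser's outputs (hypothetical values, not real data).
eps_uncond = [0.0, 1.0, 2.0]  # prediction with the condition dropped
eps_cond = [1.0, 1.0, 0.0]    # prediction given an assumed fused text + style condition

guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=2.0)
print(guided)
```

In a real sampler the two predictions would come from the same denoising network, evaluated once with the condition and once without it, at every denoising step.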
Findings
Effective generation of fashion images with desired categories and styles
Reduced training costs compared to existing models
Validated approach through comprehensive ablation studies
Abstract
This paper reports on the development of a novel style-guided diffusion model (SGDiff) which overcomes certain weaknesses inherent in existing models for image synthesis. The proposed SGDiff combines image modality with a pretrained text-to-image diffusion model to facilitate creative fashion image synthesis. It addresses the limitations of text-to-image diffusion models by incorporating supplementary style guidance, substantially reducing training costs, and overcoming the difficulties of controlling synthesized styles with text-only inputs. This paper also introduces a new dataset, SG-Fashion, specifically designed for fashion image synthesis applications, offering high-resolution images and an extensive range of garment categories. By means of a comprehensive ablation study, we examine the application of classifier-free guidance to a variety of conditions and validate the…
Taxonomy
Methods: Diffusion
