FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

Abhishek Kumar Singh; Ioannis Patras

arXiv:2404.18591 · cs.CV · April 30, 2024 · 1 citation

TL;DR

FashionSD-X introduces a multimodal latent diffusion model that generates high-quality fashion images from text and sketches, enhancing design workflows with improved realism and control.

Contribution

This paper presents a novel generative pipeline that combines ControlNet and LoRA fine-tuning for multimodal fashion image synthesis from text and sketches, and that outperforms traditional Stable Diffusion models on FID, CLIP Score, and KID.
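To make the LoRA part of the contribution concrete, here is a minimal NumPy sketch of a low-rank adapter on a single linear layer. It is an illustration of the general LoRA idea only, not the authors' training code: the function names `lora_forward` and `merge_lora`, the layer sizes, the rank, and the `alpha` scaling value are all assumptions chosen for the example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Shapes: x (n, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def merge_lora(W, A, B, alpha):
    """Fold the adapter into the base weight for inference."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init: adapter starts as a no-op
x = rng.normal(size=(3, d_in))

y0 = lora_forward(x, W, A, B, alpha=4.0)  # equals x @ W.T while B is still zero
```

Only `A` and `B` would be updated during fine-tuning, which is why the approach is cheap: the adapter has `r * (d_in + d_out)` parameters instead of `d_in * d_out`.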

Findings

01 Significantly outperforms baseline Stable Diffusion models on FID, CLIP Score, and KID.

02 Integrating sketch data improves the realism of generated fashion images.

03 Shows potential for interactive and personalized fashion design applications.

Abstract

The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal inputs such as text and sketches. We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data. Our evaluation, utilizing metrics like FID, CLIP Score, and KID, demonstrates that our model significantly outperforms traditional stable diffusion models. The results not only highlight the effectiveness of our model in generating fashion-appropriate outputs but also underscore the potential of diffusion models in…
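As a rough illustration of one of the metrics named in the abstract, the sketch below computes the Fréchet distance that underlies FID between two sets of feature vectors. It is not the authors' evaluation code. In a real FID computation the features come from a pretrained Inception-v3 network; here random vectors stand in for them, and the function name `frechet_distance` and all sizes are assumptions.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    d^2 = ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 * Tr((S1 S2)^(1/2))
    Each input has shape (num_samples, feature_dim).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    # Tr((S1 S2)^(1/2)) is the sum of square roots of the eigenvalues of S1 S2,
    # which are real and non-negative for positive semi-definite S1, S2.
    eigvals = np.linalg.eigvals(s1 @ s2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * trace_sqrt)

# Stand-in features (assumption): 500 samples of dimension 16.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
shifted = real + 3.0  # same covariance, mean moved by 3 in every dimension

d_same = frechet_distance(real, real)
d_shift = frechet_distance(real, shifted)
```

Lower values mean the generated distribution is closer to the real one. With identical inputs the distance is numerically zero, and shifting every feature by a constant leaves only the squared mean difference.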

Code & Models

Models

🤗 Abhi5ingh/ControlnetDresscode (model · 18 downloads · 1 like)

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Fashion and Cultural Textiles

Methods: Contrastive Language-Image Pre-training · Diffusion