Improving Virtual Try-On with Garment-focused Diffusion Models
Siqi Wan, Yehao Li, Jingwen Chen, Yingwei Pan, Ting Yao, Yang Cao, and Tao Mei

TL;DR
This paper introduces GarDiff, a novel diffusion model that enhances virtual try-on by focusing on garment details and appearance, achieving superior photorealistic results compared to existing methods.
Contribution
We propose GarDiff, a garment-focused diffusion model that incorporates appearance priors and a local adapter to improve detail preservation in virtual try-on images.
Findings
Outperforms state-of-the-art VTON methods on VITON-HD and DressCode datasets.
Effectively preserves garment textures and appearance details.
Demonstrates high-fidelity, photorealistic virtual try-on results.
Abstract
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks. Nevertheless, it is not trivial to directly apply diffusion models to synthesize an image of a target person wearing a given in-shop garment, i.e., the image-based virtual try-on (VTON) task. The difficulty arises because the diffusion process should not only produce a holistically high-fidelity, photorealistic image of the target person, but also locally preserve every appearance and texture detail of the given garment. To address this, we design a new diffusion model, namely GarDiff, which triggers a garment-focused diffusion process with amplified guidance from both the basic visual appearance and the detailed textures (i.e., high-frequency details) derived from the given garment. GarDiff first remoulds a pre-trained latent diffusion model with additional appearance priors…
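The abstract equates "detailed textures" with high-frequency details but, as excerpted here, does not say how they are obtained. As a generic illustration only (not GarDiff's method), one common way to isolate high-frequency content is to subtract a blurred copy of the image from the original. The function names `box_blur` and `high_frequency`, the box filter, and the `radius` parameter are all assumptions made for this sketch.

```python
import numpy as np

def box_blur(image: np.ndarray, radius: int = 2) -> np.ndarray:
    """Mean-filter a 2-D grayscale image with a (2*radius+1)^2 window, edge-padded."""
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    # Sum all k*k shifted copies of the padded image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def high_frequency(image: np.ndarray, radius: int = 2) -> np.ndarray:
    """Hypothetical helper: high-pass residual = image minus its low-pass (blurred) version."""
    image = image.astype(np.float64)
    return image - box_blur(image, radius)

# A flat patch has no texture; a striped patch has strong fine detail.
flat = np.full((8, 8), 0.5)
stripes = np.tile(np.array([0.0, 1.0]), (8, 4))  # alternating columns, shape (8, 8)

flat_detail = high_frequency(flat)
stripe_detail = high_frequency(stripes)
```

In this toy setting the residual is zero everywhere for the flat patch and clearly non-zero for the stripes, which is the sense in which fine garment patterns count as high-frequency content.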
Taxonomy
Topics: Manufacturing Process and Optimization
Methods: Diffusion · Adapter · Latent Diffusion Model · Contrastive Language-Image Pre-training
