StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, and Jaegul Choo

TL;DR
StableVITON leverages a pre-trained diffusion model with novel attention mechanisms to improve virtual try-on by accurately preserving clothing details and generating high-quality, realistic images.
Contribution
It introduces zero cross-attention blocks and a new attention total variation loss to effectively learn semantic correspondence within the diffusion model for virtual try-on.
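Below is a minimal sketch of the attention total variation (ATV) loss named above. It is an illustration under stated assumptions, not the authors' implementation. It assumes each person-side query position is summarized by an attention "center", meaning the attention-weighted mean (row, column) coordinate over the clothing-side key grid. It also assumes the loss is the mean absolute difference between the centers of neighboring query positions. The function names `attention_centers` and `atv_loss`, the [0, 1] coordinate normalization, and the absence of any masking or loss weighting are hypothetical choices made for illustration.

```python
def attention_centers(attn, key_h, key_w):
    """Expected (row, col) coordinate of each query's attention over a key_h x key_w grid.

    attn: one list per query position, each holding key_h * key_w weights
    (row-major key order) that sum to 1. Coordinates are normalized to [0, 1].
    """
    centers = []
    for weights in attn:
        if len(weights) != key_h * key_w:
            raise ValueError("each attention row must cover the full key grid")
        r = c = 0.0
        for idx, a in enumerate(weights):
            i, j = divmod(idx, key_w)  # row-major position of this key
            r += a * i / max(key_h - 1, 1)
            c += a * j / max(key_w - 1, 1)
        centers.append((r, c))
    return centers


def atv_loss(attn, query_h, query_w, key_h, key_w):
    """Hypothetical ATV loss: mean L1 difference between centers of adjacent query positions."""
    centers = attention_centers(attn, key_h, key_w)
    total, count = 0.0, 0
    for qi in range(query_h):
        for qj in range(query_w):
            idx = qi * query_w + qj
            r, c = centers[idx]
            if qi + 1 < query_h:  # neighbor below
                r2, c2 = centers[idx + query_w]
                total += abs(r2 - r) + abs(c2 - c)
                count += 1
            if qj + 1 < query_w:  # neighbor to the right
                r2, c2 = centers[idx + 1]
                total += abs(r2 - r) + abs(c2 - c)
                count += 1
    return total / count if count else 0.0
```

Under this formulation, a correspondence in which adjacent person positions attend to adjacent clothing positions scores lower than one in which neighbors jump across the garment. That ordering is the intuition behind penalizing variation in the center map.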
Findings
Outperforms baseline methods in qualitative assessments.
Achieves higher quantitative accuracy in clothing detail preservation.
Produces sharper, more precise clothing representations.
Abstract
Given a clothing image and a person image, image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image. In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task. The main challenge is to preserve the clothing details while effectively utilizing the robust generative capability of the pre-trained model. To tackle this challenge, we propose StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of the pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent…
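The following minimal sketch illustrates the zero cross-attention block described in the abstract. It is not the authors' code and rests on several assumptions. Queries are taken from person-side (U-Net) features, and keys and values from clothing-side features. "Zero" is read as a zero-initialized output projection added back residually, by analogy with ControlNet's zero convolutions. The function and parameter names (`zero_cross_attention_block`, `W_q`, `W_k`, `W_v`, `W_out`) are hypothetical, and the sketch uses a single attention head on plain Python lists for readability.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def zero_cross_attention_block(person_feats, cloth_feats, W_q, W_k, W_v, W_out):
    """Hypothetical single-head cross-attention block with a residual connection.

    person_feats: N query tokens (person side), each of length C.
    cloth_feats:  M key/value tokens (clothing side), each of length C_c.
    W_q: d x C, W_k: d x C_c, W_v: d x C_c, W_out: C x d.
    Returns (updated person features, N x M attention weights).
    """
    queries = [matvec(W_q, x) for x in person_feats]
    keys = [matvec(W_k, c) for c in cloth_feats]
    values = [matvec(W_v, c) for c in cloth_feats]
    scale = 1.0 / math.sqrt(len(keys[0]))

    updated, weights = [], []
    for x, q in zip(person_feats, queries):
        # Scaled dot-product scores of this person token against every clothing token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Attention-weighted sum of clothing values.
        attended = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        # Output projection; an all-zero W_out contributes nothing here.
        delta = matvec(W_out, attended)
        # Residual connection back onto the person features.
        updated.append([xi + di for xi, di in zip(x, delta)])
    return updated, weights


def zeros(rows, cols):
    """Zero-initialized matrix, used here for W_out."""
    return [[0.0] * cols for _ in range(rows)]
```

With `W_out = zeros(C, d)`, the block initially returns `person_feats` unchanged, so inserting it leaves the pre-trained model's behavior intact at the start of training. `W_out` still receives a nonzero gradient, so the block can gradually learn to inject clothing information.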
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Visual Attention and Saliency Detection
Methods: Diffusion
