StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, and Jaegul Choo

arXiv:2312.01725 · cs.CV · December 5, 2023 · 2 citations

TL;DR

StableVITON adapts a pre-trained latent diffusion model for virtual try-on, adding zero cross-attention blocks that preserve clothing details while retaining the model's ability to generate high-quality, realistic images.

Contribution

StableVITON introduces zero cross-attention blocks and a new attention total variation loss to learn the semantic correspondence between the clothing and the human body within the diffusion model for virtual try-on.
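The attention total variation loss can be illustrated with a short sketch. It assumes the loss works by computing, for each person-side query location, the attention-weighted average (the "center") of the clothing-side key coordinates, and then penalizing differences between the centers of neighbouring query locations, so that nearby body pixels attend to nearby clothing pixels. The function names `attention_centers` and `attention_tv_loss`, the coordinate normalization, and the optional `mask` argument are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def attention_centers(attn, key_hw):
    """Expected (row, col) of the attended key location for each query.

    attn: (num_queries, kh * kw) array whose rows sum to 1.
    key_hw: (kh, kw), the spatial grid the keys (clothing features) lie on.
    """
    kh, kw = key_hw
    rows, cols = np.meshgrid(np.arange(kh), np.arange(kw), indexing="ij")
    # Coordinates normalized to [0, 1], flattened in the same row-major order as the keys.
    grid = np.stack(
        [rows.ravel() / max(kh - 1, 1), cols.ravel() / max(kw - 1, 1)], axis=-1
    )  # (kh * kw, 2)
    return attn @ grid  # (num_queries, 2)


def attention_tv_loss(attn, query_hw, key_hw, mask=None):
    """Hypothetical total variation of the attention-center map.

    query_hw: (qh, qw), the spatial grid of the queries (person features); both >= 2.
    mask: optional (qh, qw) array of 0/1 values (assumption: e.g. a clothing-region mask).
    """
    qh, qw = query_hw
    centers = attention_centers(attn, key_hw).reshape(qh, qw, 2)
    dy = np.abs(centers[1:, :, :] - centers[:-1, :, :])  # vertical neighbours
    dx = np.abs(centers[:, 1:, :] - centers[:, :-1, :])  # horizontal neighbours
    if mask is not None:
        # Keep only differences between neighbouring queries that are both inside the mask.
        dy = dy * (mask[1:, :] * mask[:-1, :])[..., None]
        dx = dx * (mask[:, 1:] * mask[:, :-1])[..., None]
    return dy.mean() + dx.mean()
```

A smooth, order-preserving correspondence (for example, an identity mapping between two equal-sized grids) typically yields a lower value than a scrambled one, which is the behaviour such a loss is meant to encourage.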

Findings

1. Outperforms baseline methods in qualitative assessments.
2. Achieves higher quantitative accuracy in clothing-detail preservation.
3. Produces sharper, more precise clothing representations.

Abstract

Given a clothing image and a person image, image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image. In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task. The main challenge is to preserve the clothing details while effectively utilizing the robust generative capability of the pre-trained model. To tackle this challenge, we propose StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of the pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent…
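A zero cross-attention block can be sketched as a cross-attention layer in which the diffusion UNet's features supply the queries and the clothing features supply the keys and values. The sketch below assumes that "zero" refers to a zero-initialized output projection, similar in spirit to ControlNet's zero convolutions, so that the block initially adds nothing to the pre-trained features. It is a single-head NumPy illustration; the class name `ZeroCrossAttentionBlock`, the shapes, and the initialization scales are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ZeroCrossAttentionBlock:
    """Hypothetical single-head sketch: UNet features attend to clothing features."""

    def __init__(self, dim, cond_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_k = rng.standard_normal((cond_dim, dim)) / np.sqrt(cond_dim)
        self.w_v = rng.standard_normal((cond_dim, dim)) / np.sqrt(cond_dim)
        # Zero-initialized output projection (assumption: this is what "zero" refers to).
        self.w_out = np.zeros((dim, dim))

    def __call__(self, h, c):
        """h: (N, dim) flattened UNet feature map; c: (M, cond_dim) clothing features."""
        q = h @ self.w_q
        k = c @ self.w_k
        v = c @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)  # (N, M), rows sum to 1
        out = h + (attn @ v) @ self.w_out  # residual add of the projected attention output
        return out, attn
```

Because `w_out` starts at zero, the block behaves as an identity at initialization, so the pre-trained model's outputs are unchanged when training begins; `w_out` still receives a nonzero gradient through `attn @ v`, letting the block gradually learn to inject clothing information.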

Code & Models

Repositories

rlawjdghek/stableviton (PyTorch, official)

Models

rlawjdghek/StableVITON (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Visual Attention and Saliency Detection

Methods: Diffusion