Rethinking Garment Conditioning in Diffusion-based Virtual Try-On
Kihyun Na, Jinyoung Choi, and Injung Kim

TL;DR
This paper introduces Re-CatVTON, an efficient single UNet model for virtual try-on that achieves high fidelity at lower computational and memory cost than Dual UNet models, using new conditioning strategies grounded in visualization and theoretical analysis.
Contribution
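The paper proposes Re-CatVTON, a single UNet architecture that combines a modified classifier-free guidance strategy for spatial concatenation conditioning with direct injection of the ground-truth garment latent. It improves on CatVTON in accuracy and on Dual UNet models in efficiency.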
Findings
Re-CatVTON outperforms CatVTON in FID, KID, and LPIPS scores.
Re-CatVTON requires less computation and memory than Dual UNet models.
The model maintains comparable SSIM while improving FID, KID, and LPIPS.
Abstract
Virtual Try-On (VTON) is the task of synthesizing an image of a person wearing a target garment, conditioned on a person image and a garment image. While diffusion-based VTON models featuring a Dual UNet architecture demonstrate superior fidelity compared to single UNet models, they incur substantial computational and memory overhead due to their heavy structure. In this study, through visualization analysis and theoretical analysis, we derived three hypotheses regarding the learning of context features to condition the denoising process. Based on these hypotheses, we developed Re-CatVTON, an efficient single UNet model that achieves high performance. We further enhance the model by introducing a modified classifier-free guidance strategy tailored for VTON's spatial concatenation conditioning, and by directly injecting the ground-truth garment latent derived from the clean garment…
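The abstract mentions classifier-free guidance applied to spatial concatenation conditioning. The paper's modified strategy is not described in the text above, so the following is only a minimal sketch of the standard guidance formula, eps = eps_uncond + s * (eps_cond - eps_uncond), applied to a garment latent concatenated beside a noisy person latent. The names `concat_width`, `cfg_noise`, and `toy_denoiser` are hypothetical, as are the choice of width as the concatenation axis and the use of an all-zero garment latent as the null condition.

```python
from typing import Callable, List

Latent = List[List[float]]  # toy 2-D latent: rows x columns


def concat_width(left: Latent, right: Latent) -> Latent:
    """Place two latents side by side (spatial concatenation along width)."""
    if len(left) != len(right):
        raise ValueError("latents must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def zeros_like(latent: Latent) -> Latent:
    """Return an all-zero latent of the same shape (assumed null condition)."""
    return [[0.0 for _ in row] for row in latent]


def cfg_noise(
    denoiser: Callable[[Latent], Latent],
    person_noisy: Latent,
    garment: Latent,
    scale: float,
) -> Latent:
    """Standard classifier-free guidance over a concatenated input.

    This is NOT the paper's modified strategy, only the usual formula.
    """
    width = len(person_noisy[0])
    # Conditional pass: the garment latent sits next to the noisy person latent.
    eps_cond = denoiser(concat_width(person_noisy, garment))
    # Unconditional pass: the garment half is replaced by zeros.
    eps_uncond = denoiser(concat_width(person_noisy, zeros_like(garment)))
    # Combine the two predictions on the person half only.
    return [
        [u + scale * (c - u) for c, u in zip(c_row[:width], u_row[:width])]
        for c_row, u_row in zip(eps_cond, eps_uncond)
    ]


def toy_denoiser(x: Latent) -> Latent:
    """Hypothetical stand-in for a UNet: adds each row's mean to its entries."""
    return [[v + sum(row) / len(row) for v in row] for row in x]


person = [[1.0, 2.0], [3.0, 4.0]]
garment = [[4.0, 4.0], [0.0, 0.0]]
print(cfg_noise(toy_denoiser, person, garment, 2.0))
# prints [[5.75, 6.75], [4.75, 5.75]]
```

With `scale=1.0` the result equals the conditional prediction, and with `scale=0.0` it equals the unconditional one. The second row is unchanged at every scale because its garment entries are already zero, so both passes see the same input.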
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Face recognition and analysis
