Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On
Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang

TL;DR
This paper presents a novel self-supervised Vision Transformer combined with a diffusion model to improve the realism and detail accuracy of virtual clothes try-on, enhancing online shopping experiences.
Contribution
It introduces a new method integrating ViT and diffusion models with self-supervision for detailed virtual clothing visualization.
Findings
Significant improvement in visual realism and detail accuracy.
Outperforms existing virtual try-on technologies.
Enhanced focus on key clothing regions improves detail reproduction.
Abstract
Virtual clothes try-on has emerged as a vital feature in online shopping, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts. Techniques such as conditional guidance and focus on key regions have been integrated into our approach. These combined strategies empower the diffusion model to reproduce clothing details with increased clarity and realism. The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences, significantly surpassing the capabilities of existing technologies.
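The abstract does not spell out how local and global clothing embeddings are contrasted, so the following is only a minimal sketch of one common way to do it: an InfoNCE-style loss in NumPy. The function name `local_global_info_nce`, the temperature value, and the use of random vectors in place of real ViT embeddings are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def local_global_info_nce(local_emb, global_emb, temperature=0.1):
    """Hypothetical InfoNCE-style loss between local and global embeddings.

    local_emb, global_emb: arrays of shape (batch, dim). Row i of each array
    is assumed to come from the same garment image (a local crop and the full
    image), so the matching pairs sit on the diagonal of the similarity matrix.
    """
    local_emb = l2_normalize(local_emb)
    global_emb = l2_normalize(global_emb)
    logits = local_emb @ global_emb.T / temperature  # (batch, batch)
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal entries as the positive class.
    return float(-np.mean(np.diag(log_probs)))


# Random stand-ins for ViT outputs (assumption: 4 garments, 16-dim embeddings).
rng = np.random.default_rng(0)
global_emb = rng.normal(size=(4, 16))
aligned_local = global_emb + 0.01 * rng.normal(size=(4, 16))
shuffled_local = aligned_local[[1, 2, 3, 0]]  # deliberately mismatched pairs

loss_aligned = local_global_info_nce(aligned_local, global_emb)
loss_shuffled = local_global_info_nce(shuffled_local, global_emb)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when each local embedding is paired with the global embedding of the same garment than when the pairs are shuffled, which is the behavior a contrastive objective of this kind is meant to reward.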
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Simulation and Modeling Applications · Virtual Reality Applications and Impacts
Methods: Residual Connection · Softmax · Layer Normalization · Focus · Byte Pair Encoding · Label Smoothing · Diffusion · Adam · Attention Is All You Need · Linear Layer
