VITON: An Image-based Virtual Try-on Network
Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis

TL;DR
VITON is a novel image-based virtual try-on network that synthesizes realistic images of a person wearing new clothing items without relying on 3D data, using a coarse-to-fine generative approach.
Contribution
The paper introduces a 2D virtual try-on framework that does not require 3D information and employs a coarse-to-fine strategy for realistic image synthesis.
Findings
Outperforms state-of-the-art generative models on the newly collected Zalando dataset
Produces photo-realistic images with natural clothing deformation
Effectively integrates target clothing onto person images
Abstract
We present an image-based VIrtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initial blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply to the person in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art…
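The refinement step described in the abstract (learning how much clothing detail to use, and where) can be illustrated with a per-pixel blend. This is a minimal sketch under assumptions, not the authors' implementation: the function name compose, the arrays coarse and warped_clothing, and the mask alpha are all hypothetical, and the mask is hand-written here where a trained network would predict it.

```python
import numpy as np

def compose(coarse, warped_clothing, alpha):
    """Blend two images per pixel (hypothetical helper, not from the paper).

    alpha = 1 takes the pixel from the clothing image,
    alpha = 0 keeps the pixel from the coarse result.
    """
    coarse = np.asarray(coarse, dtype=np.float32)
    warped_clothing = np.asarray(warped_clothing, dtype=np.float32)
    alpha = np.clip(np.asarray(alpha, dtype=np.float32), 0.0, 1.0)
    if alpha.ndim == 2:
        # Add a channel axis so an (H, W) mask broadcasts over (H, W, C) images.
        alpha = alpha[..., None]
    return alpha * warped_clothing + (1.0 - alpha) * coarse

# Toy data: a flat grey "coarse" image and a flat white "clothing" image.
coarse = np.full((4, 4, 3), 0.5, dtype=np.float32)
clothing = np.ones((4, 4, 3), dtype=np.float32)

# Assumed mask: use clothing detail only in the central 2x2 region.
alpha = np.zeros((4, 4), dtype=np.float32)
alpha[1:3, 1:3] = 1.0

refined = compose(coarse, clothing, alpha)
```

In this toy case the central pixels of refined come from the clothing image and the border pixels stay at the coarse value. A soft mask with values between 0 and 1 would mix the two sources instead of switching between them.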
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis
