MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo

TL;DR
MV-VTON introduces a multi-view virtual try-on system using diffusion models, incorporating frontal and back clothing views to generate more accurate and view-consistent images of dressed persons, outperforming existing methods.
Contribution
The paper presents a multi-view virtual try-on framework built on diffusion models. It combines view-adaptive feature selection with joint attention to improve how clothing is fitted across views.
Findings
Achieves state-of-the-art results on the multi-view try-on (MV-VTON) task, evaluated on the MVG dataset.
Outperforms existing methods on frontal-view try-on tasks.
Introduces a new MVG dataset with multi-view images.
Abstract
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. However, existing methods solely focus on the frontal try-on using the frontal clothing. When the views of the clothing and person are significantly inconsistent, particularly when the person's view is non-frontal, the results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes. Given that single-view clothes provide insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and back views of the clothing, to encompass the complete view as much as possible. Moreover, we adopt diffusion models that have demonstrated superior abilities to perform our MV-VTON. In particular, we propose a view-adaptive…
Taxonomy
Topics: Simulation Techniques and Applications
Methods: Diffusion · ALIGN · Focus
