TL;DR
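This paper introduces M3D-VTON, a Monocular-to-3D Virtual Try-On Network that reconstructs a 3D try-on mesh from a single person image and a target clothing image. It combines 2D and 3D techniques so that try-on does not depend on the annotated 3D human shapes and garment templates that existing 3D methods require.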
Contribution
It makes the first attempt to reconstruct a 3D try-on mesh taking only a person image and the target clothing as inputs, by integrating 2D information and learning a mapping that lifts the 2D representation to 3D.
Findings
Produces detailed 3D human models with realistic clothing fit
Outperforms existing 3D virtual try-on methods in efficiency
Creates a high-quality dataset with front and back depth maps
Abstract
Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their applications in practical scenarios. 2D virtual try-on approaches provide a faster alternative to manipulate clothed humans, but lack the rich and realistic 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating 2D information efficiently and learning a mapping that lifts the 2D representation to 3D, we make the first attempt to reconstruct a 3D try-on mesh only taking the target clothing and a person image as inputs. The proposed M3D-VTON includes three modules: 1) The Monocular Prediction Module (MPM) that estimates an…
