SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart

TL;DR
SCULPT introduces a novel unpaired learning framework for generating detailed, pose-dependent 3D human meshes with clothing and textures, leveraging both 3D scans and 2D images to overcome data limitations.
Contribution
The paper proposes an unpaired learning approach that combines 3D scan data and 2D images to generate pose-dependent clothed human meshes with textures, addressing data scarcity issues.
Findings
Effective learning from limited 3D scans and large 2D image datasets.
Achieves realistic, pose-dependent human mesh generation with clothing and textures.
Outperforms existing 3D human body generative models.
Abstract
We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans, and that multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data. We represent this as per-vertex displacements w.r.t. the SMPL model. Next, we train a geometry-conditioned texture generator…
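The per-vertex displacement representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `apply_vertex_displacements` and the three-vertex toy mesh are assumptions made for illustration only (a real SMPL template has 6,890 vertices), and the sketch covers just the idea of storing clothing geometry as an offset added to each body vertex.

```python
import numpy as np


def apply_vertex_displacements(body_vertices, displacements):
    """Add one 3D offset to each body vertex (hypothetical helper).

    body_vertices: (V, 3) array of vertex positions of the unclothed body.
    displacements: (V, 3) array of per-vertex offsets encoding clothing.
    Returns a (V, 3) array of displaced ("clothed") vertex positions.
    """
    body_vertices = np.asarray(body_vertices, dtype=np.float64)
    displacements = np.asarray(displacements, dtype=np.float64)
    if body_vertices.ndim != 2 or body_vertices.shape[1] != 3:
        raise ValueError("body_vertices must have shape (V, 3)")
    if displacements.shape != body_vertices.shape:
        raise ValueError("displacements must match body_vertices in shape")
    # The mesh topology (faces) is unchanged; only positions move.
    return body_vertices + displacements


# Toy stand-in for a body mesh: three vertices instead of SMPL's 6,890.
body = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
# Illustrative offsets along z, e.g. fabric sitting slightly off the skin.
offsets = np.array([[0.0, 0.0, 0.02],
                    [0.0, 0.0, 0.01],
                    [0.0, 0.0, 0.00]])
clothed = apply_vertex_displacements(body, offsets)
```

Because the offsets share the body mesh's vertex ordering, the clothed mesh keeps the same topology as the underlying body, so a texture defined over that body mesh's UV layout could be applied to many different geometries.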
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: BLIP: Bootstrapping Language-Image Pre-training · Contrastive Language-Image Pre-training
