Disentangling 3D Attributes from a Single 2D Image: Human Pose, Shape and Garment
Xue Hu, Xinghui Li, Benjamin Busam, Yiren Zhou, Ales Leonardis, Shanxin Yuan

TL;DR
This paper introduces a novel method for extracting and disentangling 3D human pose, shape, and garment attributes from single 2D images, enabling detailed 3D reconstructions and attribute manipulation.
Contribution
The paper presents what the authors believe to be the first approach to cross-domain disentanglement of pose, shape, and garments from 2D images, using an implicit embedding and a 2D-to-3D encoder-decoder architecture.
Findings
Successfully disentangles pose, shape, and garments in 3D reconstructions.
Enables meaningful attribute transfer and manipulation.
Implicit shape loss improves reconstruction detail.
Abstract
For visual manipulation tasks, we aim to represent image content with semantically meaningful features. However, learning implicit representations from images often lacks interpretability, especially when attributes are intertwined. We focus on the challenging task of extracting disentangled 3D attributes only from 2D image data. Specifically, we focus on human appearance and learn implicit pose, shape and garment representations of dressed humans from RGB images. Our method learns an embedding with disentangled latent representations of these three image properties and enables meaningful re-assembling of features and property control through a 2D-to-3D encoder-decoder structure. The 3D model is inferred solely from the feature map in the learned embedding space. To the best of our knowledge, our method is the first to achieve cross-domain disentanglement for this highly…
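The "re-assembling of features" described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the flat embedding layout, the `Latent` container, and the helper names `split_embedding`, `transfer`, and `flatten` are all assumptions made for illustration. The paper's encoder produces a learned feature map, which is stood in for here by a plain list of floats.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Latent:
    """Hypothetical container for three disentangled codes."""
    pose: List[float]
    shape: List[float]
    garment: List[float]


def split_embedding(z: List[float], pose_dim: int, shape_dim: int) -> Latent:
    """Slice a flat embedding into pose, shape and garment parts.

    The [pose | shape | garment] layout is an assumption for this sketch.
    """
    if len(z) <= pose_dim + shape_dim:
        raise ValueError("embedding too short to hold a garment code")
    return Latent(
        pose=list(z[:pose_dim]),
        shape=list(z[pose_dim:pose_dim + shape_dim]),
        garment=list(z[pose_dim + shape_dim:]),
    )


def transfer(target: Latent, source: Latent, attribute: str) -> Latent:
    """Return a copy of `target` with one attribute code taken from `source`."""
    if attribute not in ("pose", "shape", "garment"):
        raise ValueError(f"unknown attribute: {attribute}")
    return replace(target, **{attribute: getattr(source, attribute)})


def flatten(latent: Latent) -> List[float]:
    """Re-assemble the three codes into one embedding for a decoder."""
    return latent.pose + latent.shape + latent.garment


# Two stand-in embeddings, as if produced by an encoder from two images.
person_a = split_embedding([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], pose_dim=2, shape_dim=2)
person_b = split_embedding([1.1, 1.2, 1.3, 1.4, 1.5, 1.6], pose_dim=2, shape_dim=2)

# Keep A's pose and shape, dress A in B's garment code.
mixed = transfer(person_a, person_b, "garment")
```

In a real system the re-assembled embedding would be passed to the 3D decoder. The swap only yields a meaningful result if the three codes are actually disentangled, which is the property the paper's training aims for.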
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
