Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos
Yasamin Jafarian, Hyun Soo Park

TL;DR
This paper introduces a self-supervised learning approach for high-fidelity 3D human reconstruction from social media videos, overcoming the lack of ground truth data by enforcing temporal coherence and geometric consistency.
Contribution
The paper presents a self-supervised method that uses local transformations and temporal coherence to learn detailed 3D human geometry without ground-truth labels.
Findings
Outperforms state-of-the-art depth estimation methods
Achieves high-fidelity depth and surface normal predictions
Provides theoretical bounds for self-supervised learning performance
Abstract
A key challenge of learning a visual representation for the 3D high fidelity geometry of dressed humans lies in the limited availability of the ground truth data (e.g., 3D scanned models), which results in the performance degradation of 3D human reconstruction when applying to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearance, clothing styles, performances, and identities. Each video depicts dynamic movements of the body and clothes of a single person while lacking the 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method to use the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant. This allows self-supervision by enforcing a…
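The warping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the local transformation is a single rigid motion (rotation plus translation) fitted with the Kabsch algorithm, and that the predicted 3D points of a local patch in the two frames are already in one-to-one correspondence. The function names fit_rigid_transform and warp_consistency_loss are hypothetical. The residual left after warping is the kind of quantity a self-supervised consistency term could penalize.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Assumption: a rigid motion is an adequate local model for the patch.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


def warp_consistency_loss(points_a, points_b):
    """Mean distance between patch A warped into frame B and patch B itself.

    points_a, points_b: (N, 3) predicted local geometry at two time instants.
    """
    R, t = fit_rigid_transform(points_a, points_b)
    warped = points_a @ R.T + t
    return float(np.mean(np.linalg.norm(warped - points_b, axis=1)))
```

In a training loop, points_a and points_b would come from the network's depth predictions for matching image regions in two frames, and the loss would be written in a differentiable framework. Both of those details are assumptions beyond what the abstract states.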
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Vision and Imaging
