Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos

Yasamin Jafarian; Hyun Soo Park

arXiv:2103.03319 · cs.CV · December 29, 2022

TL;DR

This paper introduces a self-supervised learning approach for high-fidelity 3D human reconstruction from social media videos, overcoming the lack of ground truth data by enforcing temporal coherence and geometric consistency.

Contribution

It presents a novel self-supervised method leveraging local transformations and temporal coherence to learn detailed 3D human geometry without ground truth labels.

Findings

1. Outperforms state-of-the-art depth estimation methods
2. Achieves high-fidelity depth and surface normal predictions
3. Provides theoretical bounds for self-supervised learning performance

Abstract

A key challenge in learning a visual representation of the high-fidelity 3D geometry of dressed humans is the limited availability of ground-truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when it is applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearances, clothing styles, performances, and identities. Each video depicts the dynamic movements of the body and clothes of a single person but lacks 3D ground-truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses the local transformation that warps the predicted local geometry of the person in one image to that in another image at a different time instant. This allows self-supervision by enforcing a…
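The warping idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a local transformation that is rigid and already known (`R`, `t`), and pixel correspondences between the two frames that are given. Under those assumptions, predicted depths at one time instant are lifted to 3D, warped, and compared with the matched 3D points predicted at another time instant; the residual could serve as a self-supervision signal.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a predicted depth to a 3D camera-space point.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def apply_rigid(R, t, p):
    """Apply rotation R (3x3 nested lists) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def warp_consistency_loss(pts_a, pts_b, R, t):
    """Mean squared distance between warped frame-A points and matched frame-B points.

    pts_a[i] and pts_b[i] are assumed to be corresponding 3D points of the
    same local body region at two time instants.
    """
    total = 0.0
    for p, q in zip(pts_a, pts_b):
        w = apply_rigid(R, t, p)
        total += sum((wi - qi) ** 2 for wi, qi in zip(w, q))
    return total / len(pts_a)

# Toy usage: frame B is generated from frame A with a known rigid motion,
# so consistent depth predictions give a loss near zero.
R, t = rot_z(0.3), (0.1, -0.2, 0.05)
pixels_and_depths = [(300, 200, 2.0), (350, 260, 2.1), (310, 250, 1.9)]
pts_a = [backproject(u, v, d, 500.0, 500.0, 320.0, 240.0)
         for u, v, d in pixels_and_depths]
pts_b = [apply_rigid(R, t, p) for p in pts_a]
print(warp_consistency_loss(pts_a, pts_b, R, t))
```

In a learning setup the depths would come from a network and the loss would be minimized with respect to its parameters; how the transformation and correspondences are obtained is outside what this excerpt states.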

Code & Models

Repositories

yasaminjafarian/HDNet_TikTok
tf · Official

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Vision and Imaging