Unsupervised 3D Pose Estimation with Geometric Self-Supervision
Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, and James M. Rehg

TL;DR
This paper introduces an unsupervised method for 3D human pose estimation from 2D images using geometric self-supervision, eliminating the need for multi-view data or 3D annotations.
Contribution
It proposes a novel self-consistency training framework with a discriminator and a 2D domain adapter to improve 3D pose estimation without supervision.
Findings
Outperforms previous unsupervised methods by 30% on Human3.6M
Achieves state-of-the-art results among weakly supervised approaches
Demonstrates effective use of 2D pose data in the wild
Abstract
We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multi-view image data, 3D skeletons, correspondences between 2D-3D points, or use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected on random camera viewpoints to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them in the original camera view, we can define self-consistency loss both in 3D and in 2D. The training can thus be self-supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose…
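The lift-reproject-lift cycle described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the perspective camera model, the fixed `CAMERA_DISTANCE`, the per-joint depth-offset parametrization, rotation about the vertical axis only, and all function names are hypothetical choices made for this example. The `lifter` argument stands in for the lifting network, and the adversarial 2D pose term mentioned at the end of the abstract is omitted.

```python
import numpy as np

CAMERA_DISTANCE = 10.0  # assumption: depth of the skeleton root in camera coordinates


def lift(pose_2d, depth_offsets):
    """Back-project 2D joints (J, 2) to 3D (J, 3) from per-joint depth offsets (J,)."""
    z = np.maximum(1.0, CAMERA_DISTANCE + depth_offsets)  # keep depths in front of the camera
    return np.column_stack([pose_2d[:, 0] * z, pose_2d[:, 1] * z, z])


def project(pose_3d):
    """Perspective projection of 3D joints (J, 3) to 2D (J, 2)."""
    return pose_3d[:, :2] / pose_3d[:, 2:3]


def rotation_y(angle):
    """Rotation matrix about the vertical (y) axis; angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def self_consistency_losses(pose_2d, lifter, rotation):
    """Run one lift-reproject-lift cycle and return (loss_3d, loss_2d).

    lifter: hypothetical callable mapping a (J, 2) pose to (J,) depth offsets.
    rotation: 3x3 matrix defining the random "synthetic" viewpoint.
    """
    center = np.array([0.0, 0.0, CAMERA_DISTANCE])

    # 1. Lift the input 2D pose to 3D.
    x3d = lift(pose_2d, lifter(pose_2d))
    # 2. Rotate about the assumed root position and project to a synthetic 2D pose.
    x3d_rot = (x3d - center) @ rotation.T + center
    synth_2d = project(x3d_rot)
    # 3. Lift the synthetic 2D pose back to 3D.
    x3d_rot_hat = lift(synth_2d, lifter(synth_2d))
    # 4. Undo the rotation and re-project into the original camera view.
    x3d_back = (x3d_rot_hat - center) @ rotation + center
    pose_2d_back = project(x3d_back)

    # Self-consistency in 3D (rotated skeleton) and in 2D (original view).
    loss_3d = np.mean(np.sum((x3d_rot_hat - x3d_rot) ** 2, axis=1))
    loss_2d = np.mean(np.sum((pose_2d_back - pose_2d) ** 2, axis=1))
    return loss_3d, loss_2d
```

In this sketch a lifter that always predicts a flat skeleton is perfectly consistent under the identity rotation but incurs a nonzero 3D loss under any other viewpoint, which is the signal the cycle supplies during training. A real training loop would sample `rotation` at random per batch and backpropagate through a differentiable version of these steps.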
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
