X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation
Yuchen Yang, Xuanyi Liu, Xing Gao, Zhihang Zhong, Xiao Sun

TL;DR
This paper introduces an unsupervised framework for monocular 3D human pose estimation that tackles depth ambiguity with a multi-hypothesis detector and pretext tasks built on 3D human priors from the SMPL model, achieving state-of-the-art results among unsupervised methods.
Contribution
The paper proposes an unsupervised approach that combines a multi-hypothesis detector with regularization from 3D human priors to handle depth ambiguity in monocular 3D pose estimation.
Findings
Achieves state-of-the-art unsupervised 3D pose estimation performance.
Demonstrates strong generalization on larger datasets and animal data.
Effectively manages multi-solution depth ambiguity in pose estimation.
Abstract
Recent unsupervised methods for monocular 3D pose estimation have endeavored to reduce dependence on limited annotated 3D data, but most are solely formulated in 2D space, overlooking the inherent depth ambiguity issue. Due to the information loss in 3D-to-2D projection, multiple potential depths may exist, yet only some of them are plausible in human structure. To tackle depth ambiguity, we propose a novel unsupervised framework featuring a multi-hypothesis detector and multiple tailored pretext tasks. The detector extracts multiple hypotheses from a heatmap within a local window, effectively managing the multi-solution problem. Furthermore, the pretext tasks harness 3D human priors from the SMPL model to regularize the solution space of pose estimation, aligning it with the empirical distribution of 3D human structures. This regularization is partially achieved through a GCN-based…
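The abstract's idea of reading several hypotheses from a heatmap within a local window can be illustrated with a minimal sketch. This is not the authors' detector: the function name `extract_hypotheses`, the parameters `window` and `k`, and the rule of taking the top-k responses in a window centred on the heatmap peak are all assumptions made for illustration, since the abstract does not specify the selection rule.

```python
import numpy as np

def extract_hypotheses(heatmap, window=5, k=3):
    """Return up to k (x, y, score) candidates near the heatmap peak.

    Assumed rule (not taken from the paper): find the global peak, crop a
    window x window patch around it, and keep the k strongest responses.
    """
    h, w = heatmap.shape
    # Location of the single strongest response.
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    r = window // 2
    # Clip the local window to the heatmap borders.
    y0, y1 = max(py - r, 0), min(py + r + 1, h)
    x0, x1 = max(px - r, 0), min(px + r + 1, w)
    patch = heatmap[y0:y1, x0:x1]
    k = min(k, patch.size)
    # Indices of the k largest values in the patch, strongest first.
    flat = np.argsort(patch, axis=None)[::-1][:k]
    ys, xs = np.unravel_index(flat, patch.shape)
    # Convert patch coordinates back to heatmap coordinates.
    return [(int(x + x0), int(y + y0), float(patch[y, x]))
            for y, x in zip(ys, xs)]
```

Keeping several nearby candidates instead of a single argmax leaves room for a later stage, such as a prior-based regularizer, to choose the hypothesis that best fits a plausible human structure.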
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Human Motion and Animation
Methods: Heatmap
