In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations
Ikhsanul Habibie, Weipeng Xu, Dushyant Mehta, Gerard Pons-Moll, Christian Theobalt

TL;DR
This paper introduces a novel deep learning approach for monocular 3D human pose estimation that leverages explicit 2D features and intermediate 3D representations, improving accuracy and generalization to in-the-wild images.
Contribution
It proposes a disentangled hidden space encoding for 2D and 3D features and a learned projection model, enabling joint training on 2D and 3D labeled data for better real-world performance.
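As a rough illustration of what joint training on 2D- and 3D-labeled data can look like, the sketch below combines a 2D loss computed over every sample with a 3D loss restricted to the samples that carry 3D annotations. It is not the authors' implementation: the function name `mixed_supervision_loss`, the `has_3d` mask, the weight `w_3d`, and the use of mean squared error are all assumptions made for illustration.

```python
import numpy as np

def mixed_supervision_loss(pred_2d, pred_3d, gt_2d, gt_3d, has_3d, w_3d=1.0):
    """Hypothetical mixed 2D/3D supervision loss (illustrative only).

    pred_2d, gt_2d: arrays of shape (batch, joints, 2)
    pred_3d, gt_3d: arrays of shape (batch, joints, 3)
    has_3d:         array of shape (batch,), nonzero where a 3D label exists
    w_3d:           assumed weight balancing the 3D term against the 2D term
    """
    # 2D term: every sample is assumed to have 2D joint annotations.
    loss_2d = np.mean((pred_2d - gt_2d) ** 2)

    # 3D term: only samples flagged as having 3D annotations contribute.
    mask = np.asarray(has_3d).astype(bool)
    if mask.any():
        loss_3d = np.mean((pred_3d[mask] - gt_3d[mask]) ** 2)
    else:
        loss_3d = 0.0  # batch contains in-the-wild (2D-only) samples only

    return loss_2d + w_3d * loss_3d
```

With this kind of masking, a batch can mix in-studio images (2D and 3D labels) with in-the-wild images (2D labels only), and entries of `gt_3d` for unlabeled samples are simply ignored.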
Findings
Achieves state-of-the-art accuracy on in-the-wild datasets.
Demonstrates improved generalization to diverse real-world scenes.
Supports training with both 2D and 3D labeled data.
Abstract
Convolutional Neural Network based approaches for monocular 3D human pose estimation usually require a large amount of training images with 3D pose annotations. While it is feasible to provide 2D joint annotations for large corpora of in-the-wild images with humans, providing accurate 3D annotations to such in-the-wild corpora is hardly feasible in practice. Most existing 3D labelled data sets are either synthetically created or feature in-studio images. 3D pose estimation algorithms trained on such data often have limited ability to generalize to real world scene diversity. We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. It has a network architecture that comprises a new disentangled hidden space encoding of explicit 2D and 3D features, and uses supervision by a new…
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Advanced Vision and Imaging
