TL;DR
This paper introduces an end-to-end deep learning framework that jointly estimates 2D and 3D human pose from a single RGB image. It reports state-of-the-art accuracy on Human3.6M and is described as extremely efficient.
Contribution
It presents a unified probabilistic and CNN-based approach for joint 2D and 3D human pose estimation from monocular images, improving accuracy over previous methods.
Findings
Achieves state-of-the-art results on the Human3.6M dataset.
Outperforms previous approaches on both 2D and 3D pose estimation error.
The entire pipeline is trained end-to-end and is reported to be extremely efficient.
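To make "pose estimation error" concrete, here is a minimal sketch of mean per-joint position error (MPJPE), a metric commonly reported on Human3.6M. The summary above does not say which metric the paper uses, so treat the choice of MPJPE as an assumption. The function name `mpjpe` and the three-joint poses are hypothetical and purely illustrative.

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding joints.

    Works for 2D (x, y) or 3D (x, y, z) joints, as long as both poses
    list the same joints in the same order.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        total += math.dist(p, g)  # per-joint Euclidean distance
    return total / len(predicted)

# Hypothetical 3-joint poses (units assumed to be millimetres).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 4.0)]
error_mm = mpjpe(pred, truth)  # per-joint distances are 0, 5 and 6
```

The same function applies to 2D joints in pixels, which is one way a single routine could score both error types mentioned above.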
Abstract
We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations. The entire process is trained end-to-end, is extremely efficient and obtains state-of-the-art results on Human3.6M, outperforming previous approaches both on 2D and 3D errors.
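The idea of using plausible 3D landmark locations to refine 2D locations can be illustrated with a toy sketch. This is not the paper's implementation. The orthographic projection, the Gaussian belief maps, the equal fusion weight, and every function name below are assumptions made for illustration only. A CNN belief map with two equally strong peaks (for example, a left/right ambiguity) is fused with a belief map rendered from the projection of a plausible 3D joint, and the fused map resolves the ambiguity.

```python
import math

def gaussian_map(height, width, center, sigma=1.0):
    """Belief map with a Gaussian bump at center = (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def project_orthographic(joint_3d, scale=1.0):
    """Drop depth to get a 2D location (an assumed camera model)."""
    x, y, _z = joint_3d
    return (scale * x, scale * y)

def fuse(cnn_map, projected_map, weight=0.5):
    """Weighted average of the CNN belief map and the projected-pose map."""
    return [[weight * c + (1.0 - weight) * p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cnn_map, projected_map)]

def argmax_2d(belief):
    """Return (x, y) of the first maximum in row-major order."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(belief):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy

# Hypothetical ambiguous CNN output: equal peaks at (1, 2) and (4, 2).
H, W = 5, 6
peak_a = gaussian_map(H, W, (1, 2))
peak_b = gaussian_map(H, W, (4, 2))
cnn_map = [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(peak_a, peak_b)]

# Hypothetical plausible 3D joint from a pose prior; it projects to (4, 2).
plausible_joint = (4.0, 2.0, 7.0)
projected_map = gaussian_map(H, W, project_orthographic(plausible_joint))

fused = fuse(cnn_map, projected_map)
refined_xy = argmax_2d(fused)  # the peak supported by the 3D prior wins
```

In the abstract's description this kind of feedback sits inside a multi-stage CNN and is trained end-to-end; the sketch only shows the fusion step in isolation.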
