A simple yet effective baseline for 3d human pose estimation
Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little

TL;DR
This paper demonstrates that a simple deep learning approach can effectively lift 2d joint locations to 3d poses with low error, revealing that much of the current error in 3d pose estimation stems from visual analysis rather than the lifting process itself.
Contribution
The authors show that a relatively simple deep feed-forward network that lifts 2d joint locations to 3d can outperform more complex end-to-end systems, and that feeding it the outputs of a 2d detector yields state-of-the-art results.
Findings
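Given ground-truth 2d joint locations, a simple lifting network outperforms the best previously reported result by about 30% on Human3.6M.
Running the same network on 2d detector outputs achieves state-of-the-art 3d pose estimation.
A large portion of the remaining error in 3d pose estimation originates from 2d visual analysis rather than from the 2d-to-3d lifting step.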
Abstract
Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that given 2d joint locations predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose…
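To make the idea of "lifting" concrete, here is a minimal sketch of a feed-forward network that maps 2d joint locations to 3d positions. It is an illustration only, not the authors' implementation: the joint count (NUM_JOINTS), hidden width (HIDDEN), single residual block, initialization, and the function names init_params and lift are all assumptions chosen for brevity, and the weights are random rather than trained.

```python
import numpy as np

NUM_JOINTS = 16  # assumption: number of body joints per pose
HIDDEN = 64      # assumption: hidden width, kept small for illustration


def init_params(rng, num_joints=NUM_JOINTS, hidden=HIDDEN):
    """Create random (untrained) weights for a small lifting network."""
    def layer(n_in, n_out):
        # He-style initialization, a common choice before ReLU layers.
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        return w, np.zeros(n_out)

    return {
        "in": layer(2 * num_joints, hidden),   # flattened 2d pose -> hidden
        "res1": layer(hidden, hidden),         # residual block, layer 1
        "res2": layer(hidden, hidden),         # residual block, layer 2
        "out": layer(hidden, 3 * num_joints),  # hidden -> flattened 3d pose
    }


def relu(x):
    return np.maximum(x, 0.0)


def lift(params, pose_2d):
    """Map 2d poses of shape (batch, joints, 2) to 3d poses (batch, joints, 3)."""
    batch = pose_2d.shape[0]
    x = pose_2d.reshape(batch, -1)

    w, b = params["in"]
    h = relu(x @ w + b)

    # One residual block: two linear + ReLU layers plus a skip connection.
    w1, b1 = params["res1"]
    w2, b2 = params["res2"]
    r = relu(h @ w1 + b1)
    r = relu(r @ w2 + b2)
    h = h + r

    w, b = params["out"]
    y = h @ w + b
    return y.reshape(batch, -1, 3)


rng = np.random.default_rng(0)
params = init_params(rng)
pose_2d = rng.normal(size=(4, NUM_JOINTS, 2))  # stand-in for 2d joint inputs
pose_3d = lift(params, pose_2d)
print(pose_3d.shape)  # (4, 16, 3)
```

The network never sees image pixels: its input is only the 2d coordinates, which is what allows lifting error to be measured separately from 2d detection error. Swapping ground-truth 2d joints for detector outputs changes only the input array, not the network.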
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
