Structured Prediction of 3D Human Pose with Deep Neural Networks
Bugra Tekin, Isinsu Katircioglu, Mathieu Salzmann, Vincent Lepetit, Pascal Fua

TL;DR
This paper presents a deep learning approach using an overcomplete auto-encoder for structured 3D human pose prediction from monocular images, improving accuracy and joint dependency modeling.
Contribution
It introduces a novel deep regression architecture with an overcomplete auto-encoder to better capture joint dependencies in 3D pose estimation.
Findings
Outperforms state-of-the-art methods in accuracy
Better preserves human pose structure
More efficient inference than max-margin structured learning frameworks
Abstract
Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from image to 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images that relies on an overcomplete auto-encoder to learn a high-dimensional latent pose representation and account for joint dependencies. We demonstrate that our approach outperforms state-of-the-art ones both in terms of structure preservation and prediction accuracy.
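The abstract's central idea is to regress to a learned high-dimensional pose code rather than directly to joint coordinates, and to let a decoder trained only on poses map that code back to a pose. The minimal sketch below illustrates this in three stages: train an overcomplete auto-encoder on poses, train a regressor from inputs to the frozen latent codes, and predict with one feed-forward pass through regressor and decoder. Everything concrete here is an assumption for illustration, not the paper's setup: synthetic toy poses (2 joints in 3D), hypothetical hand-made feature vectors, a linear regressor standing in for the CNN on images, a single sigmoid hidden layer, plain SGD, and arbitrary layer sizes and learning rates.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """Return W @ x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class OvercompleteAutoEncoder:
    """One sigmoid hidden layer whose width exceeds the pose dimension."""

    def __init__(self, pose_dim, latent_dim, rng):
        assert latent_dim > pose_dim, "overcomplete: latent_dim must exceed pose_dim"
        self.W_enc, self.b_enc = rand_matrix(latent_dim, pose_dim, rng), [0.0] * latent_dim
        self.W_dec, self.b_dec = rand_matrix(pose_dim, latent_dim, rng), [0.0] * pose_dim

    def encode(self, y):
        return [sigmoid(z) for z in affine(self.W_enc, y, self.b_enc)]

    def decode(self, h):
        return affine(self.W_dec, h, self.b_dec)  # linear output: joint coordinates

    def loss(self, y):
        return 0.5 * sum((a - b) ** 2 for a, b in zip(self.decode(self.encode(y)), y))

    def sgd_step(self, y, lr):
        h = self.encode(y)
        err = [a - b for a, b in zip(self.decode(h), y)]  # dLoss/dOutput
        # Gradient w.r.t. hidden pre-activations, computed before W_dec changes.
        dh = [sum(self.W_dec[i][k] * err[i] for i in range(len(err))) * h[k] * (1.0 - h[k])
              for k in range(len(h))]
        for i, e in enumerate(err):
            self.W_dec[i] = [w - lr * e * hk for w, hk in zip(self.W_dec[i], h)]
            self.b_dec[i] -= lr * e
        for k, d in enumerate(dh):
            self.W_enc[k] = [w - lr * d * yj for w, yj in zip(self.W_enc[k], y)]
            self.b_enc[k] -= lr * d

class LinearRegressor:
    """Hypothetical stand-in for the image-to-latent CNN."""

    def __init__(self, in_dim, out_dim, rng):
        self.W, self.b = rand_matrix(out_dim, in_dim, rng), [0.0] * out_dim

    def predict(self, x):
        return affine(self.W, x, self.b)

    def sgd_step(self, x, target, lr):
        err = [p - t for p, t in zip(self.predict(x), target)]
        for i, e in enumerate(err):
            self.W[i] = [w - lr * e * xj for w, xj in zip(self.W[i], x)]
            self.b[i] -= lr * e

def toy_data(n, rng):
    """Synthetic (features, pose) pairs: 2 joints in 3D driven by 2 hidden angles."""
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        pose = [math.cos(a), math.sin(a), 0.5 * b, math.cos(a + b), math.sin(a + b), b]
        features = [a, b, a * b, a - b]  # hypothetical image features
        data.append((features, pose))
    return data

rng = random.Random(0)
data = toy_data(50, rng)
ae = OvercompleteAutoEncoder(pose_dim=6, latent_dim=24, rng=rng)
initial_ae_loss = sum(ae.loss(y) for _, y in data) / len(data)

# Stage 1: learn the high-dimensional latent pose representation from poses alone.
for _ in range(200):
    for _, y in data:
        ae.sgd_step(y, lr=0.05)
final_ae_loss = sum(ae.loss(y) for _, y in data) / len(data)

# Stage 2: regress features to the (frozen) latent codes of the training poses.
reg = LinearRegressor(in_dim=4, out_dim=24, rng=rng)
for _ in range(200):
    for x, y in data:
        reg.sgd_step(x, ae.encode(y), lr=0.05)

# Stage 3: inference is one feed-forward pass, features -> latent code -> decoded pose.
def predict_pose(x):
    return ae.decode(reg.predict(x))

pose_hat = predict_pose(data[0][0])
print(f"AE loss {initial_ae_loss:.3f} -> {final_ae_loss:.3f}; pose dim {len(pose_hat)}")
```

The design choice this illustrates is that the regressor never outputs joint coordinates directly: its prediction passes through a decoder trained only on poses, which is how the approach aims to account for joint dependencies. Inference stays a single forward pass, in contrast to the high inference-time cost the abstract attributes to max-margin structured learning. The sketch omits any end-to-end fine-tuning and any real image model.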
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
