Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Jonathan Tompson, Arjun Jain, Yann LeCun, Christoph Bregler

TL;DR
This paper introduces a hybrid deep learning and graphical model architecture for human pose estimation, leveraging structural constraints to improve accuracy in monocular images.
Contribution
It presents a novel joint training approach for combining convolutional networks with Markov Random Fields for pose estimation.
Findings
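Joint training of the Convolutional Network and the Markov Random Field improves performance over using the two model paradigms separately
The architecture exploits structural domain constraints, such as geometric relationships between body joint locations
The combined model significantly outperforms existing state-of-the-art techniques for articulated human pose estimation in monocular images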
Abstract
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
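The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each body joint is represented by a 2D heatmap of location scores (as a ConvNet part detector might produce) and that the geometric relationship between two joints is a prior over their relative displacement. The names `refine_with_neighbor`, `offset_prior`, and `bias`, along with the toy 5x5 grids, are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: a pairwise geometric prior re-scores one joint's
# heatmap using a neighboring joint's heatmap. All names are hypothetical.

def argmax2d(heatmap):
    """Return (row, col) of the largest value in a 2D list."""
    best, best_pos = float("-inf"), None
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best:
                best, best_pos = value, (y, x)
    return best_pos

def refine_with_neighbor(unary_a, unary_b, offset_prior, bias=1e-6):
    """Re-score joint A's heatmap with evidence from joint B.

    offset_prior maps a displacement (dy, dx) = position(A) - position(B)
    to a prior weight. For every candidate location of A, the message sums
    the evidence for B at each location consistent with that displacement.
    The small bias (an assumption here) keeps scores non-zero when B gives
    no support at all.
    """
    height, width = len(unary_a), len(unary_a[0])
    refined = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            message = 0.0
            for (dy, dx), weight in offset_prior.items():
                by, bx = y - dy, x - dx  # implied location of joint B
                if 0 <= by < height and 0 <= bx < width:
                    message += weight * unary_b[by][bx]
            refined[y][x] = unary_a[y][x] * (message + bias)
    total = sum(sum(row) for row in refined)
    return [[value / total for value in row] for row in refined]

# Toy detector outputs: the face heatmap has a strong false positive at (4, 4)
# and a weaker true response at (1, 2); the shoulder is detected at (2, 2).
face = [[0.0] * 5 for _ in range(5)]
face[4][4] = 0.6
face[1][2] = 0.4
shoulder = [[0.0] * 5 for _ in range(5)]
shoulder[2][2] = 1.0

# Assumed prior: the face sits exactly one row above the shoulder.
face_given_shoulder = {(-1, 0): 1.0}

refined = refine_with_neighbor(face, shoulder, face_given_shoulder)
raw_peak = argmax2d(face)         # (4, 4): the false positive wins
refined_peak = argmax2d(refined)  # (1, 2): the geometrically consistent peak wins
```

In this toy case the raw detector prefers the anatomically implausible location, while the re-scored heatmap prefers the one consistent with the shoulder. Joint training, as described in the abstract, would go further by learning the detector and the geometric prior together rather than fixing the prior by hand as done here.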
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
Methods: Heatmap
