Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation
Sijin Li, Weichen Zhang, Antoni B. Chan

TL;DR
This paper introduces a deep structured-output learning framework for 3D human pose estimation from monocular images, achieving state-of-the-art results by jointly learning image and pose embeddings with a maximum-margin approach.
Contribution
It proposes a novel deep network architecture that combines structured-output learning with maximum-margin training for improved 3D human pose estimation.
Findings
Achieved state-of-the-art performance on the Human3.6m dataset.
Demonstrated effective joint embedding of images and poses.
Visualized learned embedding space for pose and orientation.
Abstract
This paper focuses on structured-output learning using deep neural networks for 3D human pose estimation from monocular images. Our network takes an image and 3D pose as inputs and outputs a score value, which is high when the image-pose pair matches and low otherwise. The network structure consists of a convolutional neural network for image feature extraction, followed by two sub-networks for transforming the image features and pose into a joint embedding. The score function is then the dot-product between the image and pose embeddings. The image-pose embedding and score function are jointly trained using a maximum-margin cost function. Our proposed framework can be interpreted as a special form of structured support vector machines where the joint feature space is discriminatively learned using deep neural networks. We test our framework on the Human3.6m dataset and obtain…
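As a rough illustration of the scoring idea in the abstract, the sketch below embeds an image feature vector and a pose with two small sub-networks, takes their dot product as the score, and computes a margin-rescaled hinge loss over a set of candidate poses. It is a minimal sketch, not the authors' implementation: the single-layer sub-networks, the `pose_error` measure, the explicit candidate set, and all names (`linear_relu`, `score`, `max_margin_loss`, the `params` keys) are assumptions made for illustration, and the convolutional feature extractor is replaced by a precomputed feature vector.

```python
def linear_relu(x, W, b):
    """One fully connected layer followed by ReLU (stand-in for a sub-network)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def score(img_feat, pose, params):
    """Dot product between the image embedding and the pose embedding."""
    f_img = linear_relu(img_feat, params["W_img"], params["b_img"])
    f_pose = linear_relu(pose, params["W_pose"], params["b_pose"])
    return sum(a * b for a, b in zip(f_img, f_pose))


def pose_error(pose_a, pose_b):
    """Assumed margin term: mean absolute difference between pose coordinates."""
    return sum(abs(a - b) for a, b in zip(pose_a, pose_b)) / len(pose_a)


def max_margin_loss(img_feat, true_pose, candidate_poses, params):
    """Margin-rescaled structured hinge loss (one common max-margin form).

    The loss is zero only when the true pose outscores every candidate
    by at least that candidate's pose error.
    """
    s_true = score(img_feat, true_pose, params)
    worst = max(score(img_feat, p, params) + pose_error(true_pose, p)
                for p in candidate_poses)
    return max(0.0, worst - s_true)


if __name__ == "__main__":
    # Toy 2-D example with identity weights (hypothetical values).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    params = {"W_img": identity, "b_img": [0.0, 0.0],
              "W_pose": identity, "b_pose": [0.0, 0.0]}
    img_feat, true_pose = [1.0, 2.0], [3.0, 4.0]
    candidates = [[3.0, 4.0], [5.0, 4.0]]
    print(max_margin_loss(img_feat, true_pose, candidates, params))
```

Under this sketch, prediction for a new image would amount to choosing the candidate pose with the highest score. How candidates are generated or searched is left open here.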
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis
