TL;DR
This paper leverages temporal information in sequences of 2D human poses to improve 3D pose estimation accuracy and temporal coherence, outperforming previous methods on the Human3.6M dataset.
Contribution
It introduces a sequence-to-sequence LSTM network with temporal smoothness constraints for more accurate and consistent 3D human pose estimation from 2D pose sequences.
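The summary does not give the exact form of the temporal smoothness constraint. Assuming it acts as a penalty in the training loss, one plausible version adds the squared change between consecutive predicted poses to the per-frame reconstruction error. The function below sketches that version. `sequence_loss` and `smooth_weight` are hypothetical names, and poses are flattened 3D joint coordinates held in plain Python lists.

```python
from typing import List, Sequence

Pose = Sequence[float]  # flattened 3D pose: [x1, y1, z1, x2, y2, z2, ...]


def squared_distance(a: Pose, b: Pose) -> float:
    """Sum of squared coordinate differences between two flattened poses."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sequence_loss(pred: List[Pose], target: List[Pose], smooth_weight: float = 1.0) -> float:
    """Per-frame reconstruction error plus an (assumed) first-order smoothness penalty.

    The reconstruction term is averaged over frames and the smoothness term
    over consecutive frame pairs of the *predicted* sequence.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    recon = sum(squared_distance(p, t) for p, t in zip(pred, target)) / len(pred)
    if len(pred) < 2:
        return recon  # no frame pairs, so no smoothness term
    smooth = sum(
        squared_distance(pred[t], pred[t - 1]) for t in range(1, len(pred))
    ) / (len(pred) - 1)
    return recon + smooth_weight * smooth
```

A larger `smooth_weight` gives smoother trajectories at the cost of per-frame accuracy. A variant could instead penalize the difference between predicted and ground-truth frame-to-frame motion, which avoids damping genuinely fast movement.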
Findings
Improves 3D pose estimation accuracy by 12.2% on the Human3.6M dataset.
Enhances the temporal consistency of 3D pose predictions.
Robustly recovers 3D poses even when 2D detections fail.
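Results on Human3.6M are commonly reported as mean per-joint position error (MPJPE): the average Euclidean distance between predicted and ground-truth 3D joints. The summary above does not name the metric behind the 12.2% figure. Assuming it is a relative reduction in such an error, the sketch below shows both computations. The helper names are hypothetical, and the numbers in the usage line are toy values, not results from the paper.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]  # (x, y, z), e.g. in millimetres
Frame = List[Joint]      # all joints of one frame


def mpjpe(pred: List[Frame], target: List[Frame]) -> float:
    """Mean per-joint position error: mean Euclidean joint distance over all frames."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        if len(pred_frame) != len(target_frame):
            raise ValueError("frames must have the same number of joints")
        for p, t in zip(pred_frame, target_frame):
            total += math.dist(p, t)  # Euclidean distance between two joints
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def relative_error_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the error of `baseline` (0.122 means 12.2%)."""
    return (baseline - improved) / baseline


# Toy usage: one frame, one joint that is 5 units off.
print(mpjpe([[(0.0, 0.0, 0.0)]], [[(3.0, 4.0, 0.0)]]))  # 5.0
```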
Abstract
In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. Although the recent success of deep networks has led many state-of-the-art methods for 3D pose estimation to train deep networks end-to-end to predict from images directly, the top-performing approaches have shown the effectiveness of dividing the task of 3D pose estimation into two steps: using a state-of-the-art 2D pose estimator to estimate the 2D pose from images and then mapping them into 3D space. They also showed that a low-dimensional representation like 2D locations of a set of joints can be discriminative enough to estimate 3D pose with high accuracy. However, estimation of 3D pose for individual frames leads to temporally incoherent estimates due to independent error in each frame causing jitter. Therefore, in this work we utilize the temporal information across a sequence of…
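The jitter argument in the abstract can be made concrete with a toy example. It assumes a single joint coordinate that is truly static, estimated frame by frame with independent errors that happen to alternate in sign. `jitter` (mean absolute frame-to-frame change) and `moving_average` are hypothetical helpers. The moving average is only a simple stand-in for using temporal context and is not the paper's LSTM-based method.

```python
from typing import List


def jitter(track: List[float]) -> float:
    """Mean absolute frame-to-frame change of a 1D joint coordinate."""
    if len(track) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(track, track[1:])) / (len(track) - 1)


def moving_average(track: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(track)):
        start, stop = max(0, i - half), min(len(track), i + half + 1)
        out.append(sum(track[start:stop]) / (stop - start))
    return out


# The true coordinate is 0.0 in every frame; each per-frame estimate is off by 0.5
# with a sign that flips between frames (toy data).
per_frame = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
print(jitter(per_frame))                  # 1.0
print(jitter(moving_average(per_frame)))  # smaller: neighbouring errors partly cancel
```

A fixed filter like this also smooths away genuine fast motion. That is one motivation for learning the temporal model from data instead.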
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
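The taxonomy lists the building blocks of the model: LSTM layers whose gates use sigmoid activations and whose candidate cell state and output use tanh. The sketch below implements one standard LSTM cell in plain Python and runs it over a short toy sequence of flattened 2D poses. It is illustrative only, not the authors' implementation. The class name, toy sizes, and uniform initialization are assumptions. A sequence-to-sequence network typically chains such cells into an encoder that reads the 2D pose sequence and a decoder that emits the 3D pose sequence.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b for one gate."""
    return [
        sum(w * x_j for w, x_j in zip(w_row, x))
        + sum(u * h_j for u, h_j in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and tanh candidate (g)."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate; biases start at zero.
        self.params = {
            name: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for name in self.GATES
        }

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        pre = {name: affine(*self.params[name], x, h) for name in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]         # how much new content to write
        f = [sigmoid(v) for v in pre["f"]]         # how much old cell state to keep
        o = [sigmoid(v) for v in pre["o"]]         # how much cell state to expose
        cand = [math.tanh(v) for v in pre["g"]]    # candidate content in (-1, 1)
        c_new = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, cand)]
        h_new = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_new)]  # |h| < 1
        return h_new, c_new

    def run(self, sequence: List[Vector]) -> List[Vector]:
        """Process a sequence from zero initial state and return every hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


cell = LSTMCell(input_size=4, hidden_size=3)  # toy sizes: two 2D joints in, 3 hidden units
pose_sequence = [[0.1, -0.2, 0.3, 0.05]] * 5  # five identical toy frames
outputs = cell.run(pose_sequence)
print(len(outputs), len(outputs[0]))           # 5 3
```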
