Recurrent Network Models for Human Dynamics
Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

TL;DR
The paper introduces the Encoder-Recurrent-Decoder (ERD) model, a recurrent neural network architecture for recognizing and predicting human body pose in videos and motion capture data. It outperforms a per-frame body part detector on pose labeling and a first-order optical-flow motion model on video pose forecasting.
Contribution
The ERD model extends LSTM architectures with nonlinear encoder and decoder networks, enabling joint learning of representations and dynamics for human motion tasks.
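The encoder, recurrent, decoder ordering described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ERDSketch`, the layer sizes, the single-layer tanh encoder, the two-layer decoder, the random untrained weights, and the omission of bias terms are all assumptions made to keep the example short. It shows only how a pose passes through an encoder, one LSTM step, and a decoder, and how predictions can be fed back in to generate future frames.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix as a list of rows (untrained, illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Matrix-vector product for list-of-lists weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ERDSketch:
    """Hypothetical encoder -> LSTM -> decoder stack; sizes and layers are assumed."""

    def __init__(self, pose_dim, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_enc = make_matrix(feat_dim, pose_dim, rng)
        # One weight matrix per LSTM gate, applied to [encoded features; previous hidden].
        self.W_gates = {
            name: make_matrix(hidden_dim, feat_dim + hidden_dim, rng)
            for name in ("input", "forget", "output", "candidate")
        }
        self.W_dec1 = make_matrix(feat_dim, hidden_dim, rng)
        self.W_dec2 = make_matrix(pose_dim, feat_dim, rng)

    def step(self, pose, h, c):
        # Encoder: nonlinear map from the raw pose vector to a feature vector.
        z = [math.tanh(v) for v in matvec(self.W_enc, pose)]
        zh = z + h  # list concatenation
        # Recurrent layer: a standard LSTM cell update (biases omitted).
        i = [sigmoid(v) for v in matvec(self.W_gates["input"], zh)]
        f = [sigmoid(v) for v in matvec(self.W_gates["forget"], zh)]
        o = [sigmoid(v) for v in matvec(self.W_gates["output"], zh)]
        g = [math.tanh(v) for v in matvec(self.W_gates["candidate"], zh)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        # Decoder: nonlinear map from the hidden state back to pose space.
        d = [math.tanh(v) for v in matvec(self.W_dec1, h_new)]
        pred = matvec(self.W_dec2, d)
        return pred, h_new, c_new

    def rollout(self, seed_poses, n_future):
        """Condition on observed poses, then feed predictions back as inputs."""
        if not seed_poses:
            raise ValueError("seed_poses must contain at least one pose")
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for pose in seed_poses:
            pred, h, c = self.step(pose, h, c)
        generated = []
        for _ in range(n_future):
            generated.append(pred)
            pred, h, c = self.step(pred, h, c)
        return generated
```

With trained weights in place of the random ones, `ERDSketch(pose_dim=4, feat_dim=8, hidden_dim=6).rollout(observed_poses, 5)` would return five predicted pose vectors; here the output only demonstrates shapes and data flow.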
Findings
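ERD outperforms a per-frame body part detector in pose labeling by resolving left-right body part confusions.
ERD predicts body joint displacements over a 400 ms horizon, outperforming a first-order motion model based on optical flow.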
ERD synthesizes novel motions and maintains long-term stability.
Abstract
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drift over long periods of time. For human pose labeling, ERD outperforms a per-frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400 ms and outperforms a first-order motion model based on optical flow. ERDs extend previous Long Short…
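To make the forecasting baseline concrete, a first-order motion model can be read as constant-velocity extrapolation: each joint keeps moving at its most recent per-frame velocity. The sketch below is a simplified stand-in, not the paper's baseline. The paper derives motion from optical flow, whereas this example assumes the velocity comes from two consecutive 2D joint positions; the function name, argument names, and the 40 ms frame spacing in the usage note are hypothetical.

```python
def constant_velocity_forecast(prev_joints, curr_joints, frame_dt_ms, horizon_ms):
    """Predict per-joint displacement over horizon_ms, assuming constant velocity.

    prev_joints, curr_joints: lists of (x, y) joint positions in consecutive frames.
    frame_dt_ms: time between the two frames, in milliseconds (assumed known).
    horizon_ms: forecasting horizon, in milliseconds.
    """
    if frame_dt_ms <= 0:
        raise ValueError("frame_dt_ms must be positive")
    if len(prev_joints) != len(curr_joints):
        raise ValueError("joint lists must have the same length")
    # Number of frame intervals covered by the horizon.
    steps = horizon_ms / frame_dt_ms
    displacements = []
    for (px, py), (cx, cy) in zip(prev_joints, curr_joints):
        # Per-frame velocity, scaled up to the full horizon.
        vx, vy = cx - px, cy - py
        displacements.append((vx * steps, vy * steps))
    return displacements
```

For example, with frames 40 ms apart and a 400 ms horizon, a joint that moved by (1, 2) pixels between the two frames is forecast to move by (10, 20) pixels. A learned model such as ERD is useful precisely where this straight-line assumption breaks down.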
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization
