LSTM Pose Machines

Yue Luo; Jimmy Ren; Zhouxia Wang; Wenxiu Sun; Jinshan Pan; Jianbo Liu; Jiahao Pang; Liang Lin

arXiv:1712.06316 · cs.CV · March 12, 2018

TL;DR

This paper introduces a recurrent neural network with LSTM units for video human pose estimation. Compared with conventional multi-stage CNN approaches, it improves temporal consistency, copes better with image quality degradation such as motion blur and occlusion, and runs faster.

Contribution

It reformulates multi-stage CNNs as a recurrent network with shared weights, enabling LSTM integration for better temporal modeling and efficiency in video pose estimation.
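The recurrent reading of a pose machine can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' code: frames are flat vectors, dense layers with tanh stand in for convolutional modules, and the names `encode`, `stage`, and `recurrent_pose` are hypothetical. It shows only the structural point: one stage with one set of weights is applied once per frame, and the belief produced at frame t-1 is an input at frame t.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, JOINTS = 8, 4  # hypothetical toy sizes: feature length, number of joints

# Hypothetical toy modules: dense layers stand in for convolutions.
W_enc = rng.standard_normal((FEAT, FEAT)) * 0.5               # per-frame encoder
W_stage = rng.standard_normal((JOINTS, FEAT + JOINTS)) * 0.5  # the single shared stage


def encode(frame):
    """Map one frame to a feature vector (tanh of a linear map)."""
    return np.tanh(W_enc @ frame)


def stage(features, prev_belief):
    """Shared stage: combine current-frame features with the previous belief."""
    return np.tanh(W_stage @ np.concatenate([features, prev_belief]))


def recurrent_pose(frames):
    """Apply the same stage once per frame, carrying the belief forward in time."""
    belief = np.zeros(JOINTS)  # no prior belief before the first frame
    beliefs = []
    for frame in frames:
        belief = stage(encode(frame), belief)
        beliefs.append(belief)
    return np.stack(beliefs)


video = rng.standard_normal((5, FEAT))  # 5 toy "frames", each a flat vector
out = recurrent_pose(video)
print(out.shape)  # (5, 4)
```

In this sketch the parameter count does not grow with video length, and each frame costs a single stage evaluation; that is the intuition behind the efficiency claim, not a measurement of it.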

Findings

01. Outperforms state-of-the-art methods on large-scale video benchmarks.

02. Significantly faster inference on videos.

03. LSTM memory improves geometric consistency and stability across frames.
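The memory behind the third finding comes from LSTM units, whose sigmoid and tanh gates also appear in the Methods list. As a hedged sketch, here is a standard dense LSTM step in NumPy. In a convolutional pose network the gate computations would act on feature maps, so the matrix products here are a simplification, and the function name `lstm_step` and the gate ordering are assumptions of this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, X+H) and b has shape (4*H,).
    Gate order (an assumption of this sketch): input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate memory content
    c = f * c_prev + i * g       # cell state carries information across frames
    h = o * np.tanh(c)           # hidden state passed to the next frame
    return h, c


rng = np.random.default_rng(0)
X, H = 6, 3  # hypothetical feature and memory sizes
W = rng.standard_normal((4 * H, X + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for frame_features in rng.standard_normal((5, X)):  # 5 toy frames
    h, c = lstm_step(frame_features, h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Setting a large positive forget-gate bias and a large negative input-gate bias makes the new cell state nearly equal to the previous one, which is the mechanism that lets the memory smooth over a blurred or occluded frame.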

Abstract

We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding their superior performance on static images, applying these models to videos is not only computationally intensive but also suffers from performance degeneration and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency, to handle severe image quality degradation (e.g., motion blur and occlusion), and to capture the temporal correlation among video frames. In this paper, we propose a novel recurrent network to tackle these problems. We show that if the weight-sharing scheme is imposed on the multi-stage CNN, it can be rewritten as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple…
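The weight-sharing argument can be checked numerically on a toy model. In the sketch below (hypothetical names, dense layers instead of convolutions, one image's features reused at every stage), `multi_stage` runs an unrolled stack in which each stage has its own weight matrix, and `recurrent` applies a single matrix repeatedly. Tying all stage weights makes the two compute exactly the same thing, while the untied stack carries T times as many stage parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT, JOINTS, T = 8, 4, 3  # hypothetical toy sizes and number of stages

features = rng.standard_normal(FEAT)  # features of one image, reused by every stage


def stage(W, feats, belief):
    """One refinement stage parameterised by weight matrix W."""
    return np.tanh(W @ np.concatenate([feats, belief]))


def multi_stage(stage_weights, feats):
    """Unrolled multi-stage network: stage t uses stage_weights[t]."""
    belief = np.zeros(JOINTS)
    for W in stage_weights:
        belief = stage(W, feats, belief)
    return belief


def recurrent(W, feats, steps):
    """The same computation written as a recurrence with one weight matrix."""
    belief = np.zeros(JOINTS)
    for _ in range(steps):
        belief = stage(W, feats, belief)
    return belief


W_shared = rng.standard_normal((JOINTS, FEAT + JOINTS)) * 0.5
untied = [rng.standard_normal((JOINTS, FEAT + JOINTS)) * 0.5 for _ in range(T)]

# Tying every stage to W_shared turns the unrolled stack into the recurrence.
print(np.allclose(multi_stage([W_shared] * T, features),
                  recurrent(W_shared, features, T)))  # True

untied_params = sum(W.size for W in untied)
shared_params = W_shared.size
print(untied_params, shared_params)  # 144 48
```

The equivalence holds only because the weights are tied; with distinct per-stage weights there is no single function to iterate, so the network cannot be written as a recurrence.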

Code & Models

Repositories

lawy623/LSTM_Pose_Machines (official)

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Neural Network Applications

Methods: Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory