LSTM Pose Machines
Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin

TL;DR
This paper introduces a recurrent neural network with LSTM units for video human pose estimation, improving temporal consistency, handling image quality issues, and increasing speed over traditional multi-stage CNN approaches.
Contribution
It reformulates multi-stage CNNs as a recurrent network with shared weights, enabling LSTM integration for better temporal modeling and efficiency in video pose estimation.
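The weight-sharing idea can be illustrated with a toy sketch. It is not the paper's architecture: `make_stage`, `multi_stage`, and `recurrent` are hypothetical names, and the "stage" is a simple blend of numbers standing in for a CNN stage that refines a belief map. The point it shows is only that, once every stage uses the same weights, running the stages in sequence is the same computation as unrolling one recurrent step, which can then be fed a different frame at each step.

```python
from typing import Callable, List

Vector = List[float]
Stage = Callable[[Vector, Vector], Vector]


def make_stage(weight: float, bias: float) -> Stage:
    """Build a toy 'stage' (hypothetical stand-in for a CNN stage).

    It refines the previous belief using the current frame's features.
    """
    def stage(features: Vector, belief: Vector) -> Vector:
        return [weight * b + (1.0 - weight) * f + bias
                for f, b in zip(features, belief)]
    return stage


def multi_stage(stages: List[Stage], features: Vector, init: Vector) -> Vector:
    """Multi-stage pipeline: each stage refines the belief for ONE image."""
    belief = init
    for stage in stages:
        belief = stage(features, belief)
    return belief


def recurrent(stage: Stage, frames: List[Vector], init: Vector) -> List[Vector]:
    """Recurrent form: ONE shared stage applied once per video frame."""
    belief = init
    outputs = []
    for features in frames:
        belief = stage(features, belief)
        outputs.append(belief)
    return outputs


shared = make_stage(weight=0.5, bias=0.1)
image = [1.0, 2.0, 3.0]
start = [0.0, 0.0, 0.0]

# Three stages with shared weights on one image ...
staged = multi_stage([shared, shared, shared], image, start)
# ... match three recurrent steps that see the same image each time.
unrolled = recurrent(shared, [image, image, image], start)
```

In the recurrent form, nothing forces the three inputs to be the same image, so each step can consume the next video frame instead.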
Findings
Outperforms state-of-the-art on large-scale video benchmarks.
Significantly faster inference for videos.
LSTM memory enhances geometric consistency and stability.
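To make the role of the LSTM memory concrete, here is a minimal sketch of one step of a standard LSTM cell on scalars. It is an illustration under assumptions: the paper's model operates on convolutional feature maps, whereas this example uses single numbers, and `GateParams` and `lstm_step` are hypothetical names. The gates use the sigmoid and tanh activations listed under Methods.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GateParams:
    """Scalar weights for one gate (hypothetical; real models use conv kernels)."""
    w_x: float   # weight on the current input
    w_h: float   # weight on the previous hidden state
    bias: float

    def pre(self, x: float, h: float) -> float:
        return self.w_x * x + self.w_h * h + self.bias


def lstm_step(x: float, h: float, c: float,
              p_in: GateParams, p_forget: GateParams,
              p_out: GateParams, p_cand: GateParams) -> Tuple[float, float]:
    """One standard LSTM update; returns (next hidden state, next cell state)."""
    i = sigmoid(p_in.pre(x, h))        # input gate: how much new content to write
    f = sigmoid(p_forget.pre(x, h))    # forget gate: how much old memory to keep
    o = sigmoid(p_out.pre(x, h))       # output gate: how much memory to expose
    g = math.tanh(p_cand.pre(x, h))    # candidate content from the current frame
    c_next = f * c + i * g
    h_next = o * math.tanh(c_next)
    return h_next, c_next


# Forget gate saturated open, input gate saturated closed:
# the cell carries its memory past an unreliable (e.g. blurred) input.
h_next, c_next = lstm_step(
    x=5.0, h=0.0, c=0.7,
    p_in=GateParams(0.0, 0.0, -100.0),
    p_forget=GateParams(0.0, 0.0, 100.0),
    p_out=GateParams(0.0, 0.0, 0.0),
    p_cand=GateParams(1.0, 0.0, 0.0),
)
```

With these gate settings the cell state stays at its previous value regardless of the input, which is the mechanism by which remembered information from earlier frames can stabilize predictions on a degraded frame.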
Abstract
We observed that recent state-of-the-art results on single image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding the superior performance on static images, the application of these models to videos is not only computationally intensive but also suffers from performance degradation and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and to handle severe image quality degradation (e.g. motion blur and occlusion), as well as the inability to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose the weight sharing scheme on the multi-stage CNN, it could be rewritten as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory
