Skeleton-based Action Recognition Using LSTM and CNN
Chuankun Li, Pichao Wang, Shuang Wang, Yonghong Hou, and Wanqing Li

TL;DR
This paper presents a method combining LSTM and CNN for skeleton-based action recognition, effectively capturing spatial-temporal features and achieving state-of-the-art results on NTU RGB+D datasets.
Contribution
It introduces a fusion approach of LSTM and CNN to improve action recognition accuracy from 3D skeleton data.
Findings
Achieved 87.40% accuracy on NTU RGB+D dataset.
Score fusion of CNN and LSTM outperforms other combinations.
Ranked 1st in Large Scale 3D Human Activity Analysis Challenge.
Abstract
Recent methods based on 3D skeleton data have achieved outstanding performance owing to the conciseness, robustness, and view-independent representation of such data. With the development of deep learning, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based learning methods have achieved promising performance for action recognition. However, CNN-based methods inevitably lose some temporal information when a sequence is encoded into images. To capture as much spatial-temporal information as possible, LSTM and CNN are both adopted for recognition, and their outputs are combined by late score fusion. In addition, experimental results show that score fusion between CNN and LSTM performs better than score fusion between LSTM and LSTM for the same feature. Our method achieved state-of-the-art results on NTU RGB+D datasets for 3D human action analysis. The proposed method achieved 87.40%…
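As a rough illustration of the late score fusion mentioned in the abstract, the sketch below combines per-class probabilities from two classifiers and picks the highest fused score. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which fusion rule is used, so the product, average, and max rules are offered only as common options, and the function names and logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(cnn_scores, lstm_scores, mode="product"):
    """Late fusion of two per-class score vectors.

    The fusion rules here are assumptions for illustration; the abstract
    only states that CNN and LSTM scores are fused.
    """
    if len(cnn_scores) != len(lstm_scores):
        raise ValueError("score vectors must cover the same classes")
    pairs = list(zip(cnn_scores, lstm_scores))
    if mode == "product":
        return [c * l for c, l in pairs]
    if mode == "average":
        return [(c + l) / 2 for c, l in pairs]
    if mode == "max":
        return [max(c, l) for c, l in pairs]
    raise ValueError(f"unknown fusion mode: {mode}")


def predict(fused_scores):
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


# Hypothetical logits for three action classes from each stream.
cnn_probs = softmax([2.0, 1.0, 0.1])
lstm_probs = softmax([0.5, 2.5, 0.3])
label = predict(fuse_scores(cnn_probs, lstm_probs, mode="product"))
```

Because fusion happens on the output scores, the two networks can be trained independently and combined only at inference time.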
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
