Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition
Hong Liu, Juanhui Tu, Mengyuan Liu

TL;DR
This paper introduces a novel two-stream 3D CNN model for skeleton-based action recognition, effectively capturing spatial and temporal features and outperforming RNN-based methods on benchmark datasets.
Contribution
First application of 3D CNN to skeleton-based action recognition, with a multi-temporal extension that improves the capture of global features.
Findings
Outperforms most RNN-based methods on benchmark datasets
Demonstrates robustness to noise in skeleton data
Validates the complementary nature of spatial and temporal features
Abstract
It remains a challenge to efficiently extract spatial-temporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks, which achieve outstanding performance, one shortcoming of these methods is their tendency to overemphasize temporal information. Since the 3D convolutional neural network (3D CNN) is a powerful tool for simultaneously learning features from both spatial and temporal dimensions by capturing the correlations between three-dimensional signals, this paper proposes a novel two-stream model using 3D CNN. To the best of our knowledge, this is the first application of 3D CNN in skeleton-based action recognition. Our method consists of three stages. First, skeleton joints are mapped into a 3D coordinate space, and the spatial and temporal information is then encoded, respectively.…
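The first stage described in the abstract, mapping skeleton joints into a 3D coordinate space, can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption made for illustration: the function name `voxelize_skeleton`, the `grid_size` parameter, the per-sequence min-max normalization, and the binary occupancy encoding are all hypothetical and are not taken from the paper. The resulting volume has the kind of shape a 3D CNN could consume.

```python
import numpy as np


def voxelize_skeleton(joints, grid_size=16):
    """Map joints of shape (T, J, 3) to a binary volume of shape (T, G, G, G).

    Hypothetical illustration only: the paper's actual mapping and its
    spatial/temporal encodings are not given in the abstract.
    """
    joints = np.asarray(joints, dtype=np.float64)
    num_frames, num_joints, _ = joints.shape

    # Normalize each coordinate axis to [0, 1] over the whole sequence.
    lo = joints.min(axis=(0, 1))
    hi = joints.max(axis=(0, 1))
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    unit = (joints - lo) / span

    # Convert normalized coordinates to integer voxel indices.
    idx = np.minimum((unit * grid_size).astype(int), grid_size - 1)

    # Mark every voxel that contains at least one joint in a given frame.
    volume = np.zeros((num_frames, grid_size, grid_size, grid_size),
                      dtype=np.float32)
    frame_idx = np.repeat(np.arange(num_frames), num_joints)
    flat = idx.reshape(-1, 3)
    volume[frame_idx, flat[:, 0], flat[:, 1], flat[:, 2]] = 1.0
    return volume


if __name__ == "__main__":
    # Toy sequence: 2 frames, 2 joints each (made-up coordinates).
    toy = np.array([
        [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
        [[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]],
    ])
    vol = voxelize_skeleton(toy, grid_size=4)
    print(vol.shape)
```

A binary occupancy grid is only one possible choice. It discards joint identity, which a real encoding would likely need to preserve in some form.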
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
