Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Sijie Yan, Yuanjun Xiong, Dahua Lin

TL;DR
This paper introduces ST-GCN, a graph convolutional model that learns spatial and temporal patterns directly from skeleton sequences, improving human action recognition accuracy on the Kinetics and NTU-RGBD datasets.
Contribution
The paper presents a spatial-temporal graph convolutional network that learns spatial and temporal patterns directly from skeleton data instead of relying on hand-crafted parts or traversal rules, which yields greater expressive power and stronger generalization.
Findings
Achieves substantial improvements over mainstream methods on the Kinetics and NTU-RGBD datasets.
Outperforms conventional approaches built on hand-crafted parts or traversal rules.
Generalizes better than those approaches on both large-scale datasets.
Abstract
Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. This formulation not only leads to greater expressive power but also stronger generalization capability. On two large datasets, Kinetics and NTU-RGBD, it achieves substantial improvements over mainstream methods.
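To make the idea of combining spatial and temporal patterns concrete, the following is a minimal sketch of one spatial-temporal aggregation step over a skeleton sequence. It is an illustration under stated assumptions, not the authors' implementation: the five-joint skeleton, the row-normalized adjacency, the fixed averaging window, and all function names are hypothetical, and the sketch omits the learnable weights and nonlinearities that a trained network would have.

```python
# Hypothetical toy skeleton: 5 joints, with joint 1 connected to all others.
# This layout is an assumption for illustration, not a real dataset's skeleton.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Build a row-normalized adjacency matrix with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    # Divide each row by its sum so every joint averages over itself
    # and its neighbours.
    return [[value / sum(row) for value in row] for row in a]


def spatial_aggregate(frame, a_hat):
    """Mix each joint's features with its neighbours within one frame.

    frame has shape [V][C]: one feature vector per joint.
    """
    num_joints = len(frame)
    channels = len(frame[0])
    return [
        [sum(a_hat[v][u] * frame[u][c] for u in range(num_joints))
         for c in range(channels)]
        for v in range(num_joints)
    ]


def temporal_aggregate(sequence, window=3):
    """Average each joint's features over a centred window of frames.

    sequence has shape [T][V][C]. The window is clipped at both ends
    of the sequence rather than padded.
    """
    half = window // 2
    num_frames = len(sequence)
    out = []
    for t in range(num_frames):
        span = sequence[max(0, t - half):min(num_frames, t + half + 1)]
        out.append([
            [sum(f[v][c] for f in span) / len(span)
             for c in range(len(span[0][v]))]
            for v in range(len(span[0]))
        ])
    return out


def st_block(sequence, a_hat, window=3):
    """Apply spatial aggregation per frame, then temporal aggregation."""
    spatial = [spatial_aggregate(frame, a_hat) for frame in sequence]
    return temporal_aggregate(spatial, window)


if __name__ == "__main__":
    a_hat = normalized_adjacency(NUM_JOINTS, EDGES)
    # Four frames, five joints, two channels (for example, 2D coordinates).
    seq = [[[float(t + v), float(t - v)] for v in range(NUM_JOINTS)]
           for t in range(4)]
    print(st_block(seq, a_hat)[0])
```

The spatial step shares information along the skeleton's bones within a frame, and the temporal step shares information for the same joint across neighbouring frames. In a learned model both steps would carry trainable parameters, which is what lets the patterns come from data rather than from fixed rules.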
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Gait Recognition and Analysis
