Hierarchical Graph Convolutional Skeleton Transformer for Action Recognition
Ruwen Bai, Min Li, Bo Meng, Fengfa Li, Miao Jiang, Junxing Ren, Degang Sun

TL;DR
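This paper introduces the Hierarchical Graph Convolutional skeleton Transformer (HGCT), a hierarchical model that combines graph convolutional networks (GCNs) and Transformers. Its disentangled spatiotemporal transformer (DSTT) block aims to improve skeleton-based action recognition by disentangling spatial and temporal features and capturing both global and local spatiotemporal information.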
Contribution
It proposes a new architecture, HGCT, that combines the complementary strengths of GCNs and Transformers through a DSTT block to improve action recognition performance.
Findings
HGCT outperforms existing methods in accuracy.
The DSTT block effectively disentangles spatiotemporal features.
The model is lightweight and computationally efficient.
Abstract
Graph convolutional networks (GCNs) have emerged as dominant methods for skeleton-based action recognition. However, they still suffer from two problems, namely, neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of graph topology to solve the first problem, but they have yet to fully explore the latter. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome the above limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. Thereon, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), to employ the complementary…
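The three DSTT steps named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`dstt_sketch`, `self_attention`) are hypothetical, and the concrete choices are assumptions made only for illustration, namely mean pooling for disentanglement, single-head attention with identity projections, a temporal moving average for local enhancement, and additive fusion. The input is assumed to be a skeleton feature tensor of shape (frames T, joints V, channels C).

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head scaled dot-product self-attention over N tokens of width C.
    # Identity query/key/value projections are an assumption for brevity.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (N, N) pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ tokens, weights


def dstt_sketch(x, kernel=3):
    # x: skeleton features of shape (T frames, V joints, C channels).
    T, V, C = x.shape

    # (i) Feature disentanglement (assumed here: pooling over the other axis).
    spatial_tokens = x.mean(axis=0)    # (V, C): one token per joint
    temporal_tokens = x.mean(axis=1)   # (T, C): one token per frame

    # (ii) Global attention: every joint attends to every joint,
    # and every frame attends to every frame.
    spatial_ctx, attn_s = self_attention(spatial_tokens)
    temporal_ctx, attn_t = self_attention(temporal_tokens)

    # (iii) Local enhancement (assumed here: moving average over
    # neighbouring frames, with edge padding at the sequence ends).
    pad = kernel // 2
    padded = np.pad(x, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    local = np.stack([padded[t:t + kernel].mean(axis=0) for t in range(T)])

    # Fuse local and global information by broadcasting the context tokens.
    out = local + spatial_ctx[None, :, :] + temporal_ctx[:, None, :]
    return out, attn_s, attn_t


x = np.random.default_rng(0).standard_normal((8, 25, 16))  # 8 frames, 25 joints
out, attn_s, attn_t = dstt_sketch(x)
print(out.shape, attn_s.shape, attn_t.shape)
```

The point of the sketch is the separation of concerns: the joint-to-joint attention map is not restricted to skeleton neighbours, which is how global attention sidesteps the neighborhood constraint of a fixed graph, while the local step reintroduces short-range temporal context.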
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Layer Normalization
