Hierarchical Graph Convolutional Skeleton Transformer for Action Recognition

Ruwen Bai, Min Li, Bo Meng, Fengfa Li, Miao Jiang, Junxing Ren, Degang Sun

arXiv:2109.02860 · cs.CV · January 11, 2022 · 1 citation

TL;DR

This paper introduces HGCT, a hierarchical model that combines GCNs and Transformers through a novel disentangled spatiotemporal transformer (DSTT) block, improving skeleton-based action recognition by disentangling spatial and temporal features and capturing both global and local spatiotemporal information.

Contribution

It proposes a new architecture, HGCT, that combines the complementary strengths of GCNs and Transformers by means of a DSTT block to improve action recognition performance.
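To make the GCN/Transformer complementarity concrete, the NumPy sketch below (our own illustration, not the authors' code) contrasts the two operators on a toy skeleton: one graph convolution only mixes a joint with its direct neighbors, whereas self-attention lets every joint attend to every other joint in a single step. The 5-joint chain, the identity weight matrices, and the hypothetical `hybrid_block` helper that simply sums a local GCN path and a global attention path are all assumptions; the actual HGCT layers are not specified in this summary.

```python
import numpy as np


def normalized_adjacency(num_joints, edges):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One GCN layer: each joint aggregates only itself and its direct neighbors."""
    return a_hat @ x @ w


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over all joints."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def hybrid_block(x, a_hat, w, wq, wk, wv):
    """Hypothetical local + global block: GCN output plus attention output."""
    local = graph_conv(x, a_hat, w)
    global_out, _ = self_attention(x, wq, wk, wv)
    return local + global_out


# Toy demo: a 5-joint chain 0-1-2-3-4 with an impulse feature on joint 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(5, edges)
x = np.zeros((5, 4))
x[0, 0] = 1.0
eye = np.eye(4)  # identity weights keep the example easy to read

local = graph_conv(x, a_hat, eye)
_, attn = self_attention(x, eye, eye, eye)
print(local[4])    # all zeros: joint 4 is four hops away from joint 0
print(attn[4, 0])  # nonzero (0.2, uniform weights): attention reaches joint 0 directly
```

Stacking GCN layers would eventually propagate the impulse to joint 4, but only after four hops; this receptive-field limit is the "neighborhood constraint" that the abstract attributes to GCNs.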

Findings

01. HGCT outperforms existing methods in accuracy.

02. The DSTT block effectively disentangles spatiotemporal features.

03. The model is lightweight and computationally efficient.

Abstract

Graph convolutional networks (GCNs) have emerged as dominant methods for skeleton-based action recognition. However, they still suffer from two problems, namely, neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of graph topology to solve the first problem, but they have yet to fully explore the latter. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome the above limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. On this basis, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), to employ the complementary…
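The three DSTT steps listed in the abstract can be illustrated with the NumPy sketch below. It is a simplified, hedged reading rather than the published block: the abstract does not give the operators, so the mean pooling used to disentangle spatial and temporal descriptors, the parameter-free attention, the moving-average "local enhancement", and the broadcast recombination are all our own assumptions. The input is a skeleton feature tensor of shape (T, V, C): frames, joints, channels.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(tokens):
    """Parameter-free scaled dot-product self-attention over the first axis."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens


def local_enhance(seq, kernel=3):
    """Zero-padded moving average along the first axis (stand-in for a local conv)."""
    pad = kernel // 2
    padded = np.pad(seq, ((pad, pad), (0, 0)))
    return np.stack([padded[i:i + kernel].mean(axis=0) for i in range(seq.shape[0])])


def dstt_sketch(x):
    """Hypothetical DSTT-style block. x: (T, V, C); returns the same shape."""
    # (i) disentangle: pool over time for spatial (V, C) descriptors
    #     and over joints for temporal (T, C) descriptors
    spatial = x.mean(axis=0)
    temporal = x.mean(axis=1)
    # (ii) global attention along each axis: every joint attends to all joints,
    #      every frame attends to all frames
    spatial = attend(spatial)
    temporal = attend(temporal)
    # (iii) local enhancement: add a short-window temporal smoothing
    temporal = temporal + local_enhance(temporal)
    # recombine by broadcasting, with a residual connection to the input
    return x + temporal[:, None, :] + spatial[None, :, :]


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 25, 8))  # e.g. 16 frames, 25 joints, 8 channels
y = dstt_sketch(x)
print(y.shape)  # (16, 25, 8)
```

Attending along each axis separately costs O(V² + T²) score entries instead of O((T·V)²) for attention over all T·V joint-frame tokens; whether HGCT makes the same trade-off is not stated in this summary.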

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Layer Normalization