SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do; Munchurl Kim

arXiv:2403.09508·cs.CV·July 18, 2024·2 cites

TL;DR

SkateFormer introduces a skeletal-temporal transformer that partitions joints and frames to efficiently capture key relations for improved human action recognition from skeleton data.

Contribution

The paper proposes a partition-based self-attention mechanism that models skeletal and temporal relations efficiently and reports better results than existing GCN-based and transformer-based methods.

Findings

01 Outperforms recent state-of-the-art methods on benchmark datasets.

02 Effectively captures key skeletal-temporal relations with partitioned attention.

03 Reduces computational resources compared to full self-attention models.

Abstract

Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent advancements have introduced transformer-based methods. However, capturing correlations between all joints in all frames requires substantial memory resources. To alleviate this, we propose a novel approach called Skeletal-Temporal Transformer (SkateFormer) that partitions joints and frames based on different types of skeletal-temporal relation (Skate-Type) and performs skeletal-temporal self-attention (Skate-MSA) within each partition. We categorize the key skeletal-temporal relations for…
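The memory argument in the abstract can be illustrated with a rough count of attention scores. The sketch below is only an assumption-laden illustration: it splits the frame-by-joint token grid into equal-sized blocks, which is not the paper's Skate-Type partitioning, and the function names and the sizes `T = 64`, `V = 24` are hypothetical values chosen for the arithmetic.

```python
def full_attention_pairs(num_frames: int, num_joints: int) -> int:
    """Pairwise scores when every (frame, joint) token attends to every token."""
    tokens = num_frames * num_joints
    return tokens * tokens


def partitioned_attention_pairs(
    num_frames: int, num_joints: int, frame_block: int, joint_block: int
) -> int:
    """Pairwise scores when attention is restricted to equal-sized partitions.

    Assumption: a plain grid of blocks, not the paper's Skate-Type partitions.
    """
    if num_frames % frame_block or num_joints % joint_block:
        raise ValueError("block sizes must divide the frame and joint counts")
    num_partitions = (num_frames // frame_block) * (num_joints // joint_block)
    tokens_per_partition = frame_block * joint_block
    return num_partitions * tokens_per_partition ** 2


# Illustrative sizes only (not taken from the paper).
T, V = 64, 24
full = full_attention_pairs(T, V)
part = partitioned_attention_pairs(T, V, frame_block=8, joint_block=6)
print(full, part, full // part)  # prints: 2359296 73728 32
```

With equal-sized partitions the saving equals the number of partitions, since each partition computes only its own block of the score matrix. How the actual Skate-MSA cost compares depends on the partition types defined in the paper.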

Code & Models

Repositories

KAIST-VICLab/SkateFormer (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Residual Connection · Softmax · Linear Layer · Multi-Head Attention · EfficientNet