Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

Helei Qiu; Biao Hou; Bo Ren; Xiaohua Zhang

arXiv:2201.02849 · cs.CV · November 4, 2022 · 38 citations

TL;DR

This paper introduces STTFormer, a novel transformer-based model that captures joint correlations across space and time in skeleton sequences, improving action recognition accuracy.

Contribution

The paper proposes a spatio-temporal tuples self-attention module and a feature aggregation module that together model joint relationships across frames; the resulting model surpasses existing methods.
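The core idea of tuple self-attention can be illustrated with a minimal sketch. It assumes a sequence of T frames with V joints and C channels per joint, split into non-overlapping tuples of n consecutive frames. Within each tuple, all n·V joint tokens attend to one another, so a joint in one frame can weigh a different joint in the next frame. The function names, the single attention head, and the absence of learned query/key/value projections and positional encodings are hypothetical simplifications, not the official heleiqiu/sttformer code.

```python
import math
import random


def partition_into_tuples(seq, n):
    """Split a skeleton sequence into non-overlapping tuples of n consecutive frames.

    seq: list of T frames; each frame is a list of V joints; each joint is a list of C floats.
    Returns T // n tuples, each a flat list of n * V joint tokens.
    Assumption: T is divisible by n.
    """
    T = len(seq)
    if T % n != 0:
        raise ValueError("sequence length must be divisible by the tuple size n")
    tuples = []
    for start in range(0, T, n):
        # Flatten (frame, joint) into one token axis so joints of different frames can interact.
        tokens = [joint for frame in seq[start:start + n] for joint in frame]
        tuples.append(tokens)
    return tuples


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tuple_self_attention(tokens):
    """Single-head scaled dot-product self-attention over all n * V tokens of one tuple.

    Simplification: queries, keys and values are the raw joint features (no learned projections).
    Returns (outputs, weights): outputs has the same shape as tokens,
    weights is an (n * V) x (n * V) matrix whose rows sum to 1.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors (here the tokens themselves).
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)])
    return outputs, weights


random.seed(0)
T, V, C, n = 6, 5, 3, 3  # toy sizes: 6 frames, 5 joints, 3 channels, tuples of 3 frames
seq = [[[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(V)] for _ in range(T)]
tuples = partition_into_tuples(seq, n)
out, attn = tuple_self_attention(tuples[0])
print(len(tuples), len(tuples[0]), len(attn[0]))  # 2 15 15
```

With T = 6 and n = 3 there are 2 tuples of 15 tokens each, and every attention row spans all 15 tokens, including joints from the other frames of the same tuple. A frame-wise attention would instead restrict each row to the V = 5 joints of a single frame.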

Findings

01. Achieves superior performance on large-scale datasets

02. Effectively models joint correlations across frames

03. Outperforms state-of-the-art methods in accuracy

Abstract

Capturing the dependencies between joints is critical in the skeleton-based action recognition task. Transformers show great potential for modeling the correlation of important joints. However, existing Transformer-based methods cannot capture the correlation of different joints between frames, and this correlation is very useful because different body parts (such as the arms and legs in "long jump") move together across adjacent frames. To address this problem, a novel spatio-temporal tuples Transformer (STTFormer) method is proposed. The skeleton sequence is divided into several parts, and the several consecutive frames contained in each part are encoded. Then, a spatio-temporal tuples self-attention module is proposed to capture the relationship of different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the…
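The abstract is cut off before it describes the feature aggregation module, so the following is only a hypothetical sketch of one way to relate non-adjacent frames. After tuple partitioning, the token at the same position (same joint, same offset inside the tuple) in neighbouring tuples comes from frames n steps apart, and a small 1-D convolution along the tuple axis mixes those features. The function name, the fixed kernel, and the zero padding are assumptions; an actual module would learn its weights.

```python
def aggregate_across_tuples(tuples, kernel):
    """Mix each token with the token at the same position in neighbouring tuples.

    tuples: list of num_tuples tuples; each is a list of P tokens (P = n * V);
            each token is a list of C floats.
    kernel: odd-length list of weights slid along the tuple axis with zero padding,
            so the output has the same shape as the input ('same' 1-D convolution,
            computed as cross-correlation like deep-learning frameworks do).
    Hypothetical stand-in for the paper's feature aggregation module: weights are fixed, not learned.
    """
    K = len(kernel)
    if K % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = K // 2
    num_tuples, P, C = len(tuples), len(tuples[0]), len(tuples[0][0])
    out = []
    for t in range(num_tuples):
        agg_tuple = []
        for p in range(P):
            acc = [0.0] * C
            for k, w in enumerate(kernel):
                src = t + k - half  # neighbouring tuple; frames n steps away from tuple t
                if 0 <= src < num_tuples:  # zero padding outside the sequence
                    for c in range(C):
                        acc[c] += w * tuples[src][p][c]
            agg_tuple.append(acc)
        out.append(agg_tuple)
    return out


# Toy input: 3 tuples, 2 tokens per tuple, 1 channel per token.
tuples = [[[1.0], [10.0]], [[2.0], [20.0]], [[3.0], [30.0]]]
agg = aggregate_across_tuples(tuples, kernel=[0.25, 0.5, 0.25])
result = [[tok[0] for tok in tup] for tup in agg]
print(result)  # [[1.0, 10.0], [2.0, 20.0], [2.0, 20.0]]
```

Because each output token only mixes the same position across tuples, this step complements in-tuple attention: attention relates different joints within a few consecutive frames, while the aggregation relates frames that lie in different tuples.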

Code & Models

Repositories

heleiqiu/sttformer (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization · Dense Connections · Softmax · Byte Pair Encoding · Label Smoothing