SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network

Hamza Bouzid; Lahoucine Ballihi

arXiv:2306.17574·cs.CV·May 31, 2024

TL;DR

SpATr is a 3D human action recognition method that pairs a spiral auto-encoder with a transformer to capture spatial and temporal features from fixed-topology mesh sequences while keeping memory use low.

Contribution

The paper presents a new approach for 3D human action recognition that combines spiral convolutions with a transformer, specifically designed for fixed-topology mesh data, improving scalability and memory efficiency.

Findings

01 Competitive performance on the Babel, MoVi, and BMLrub datasets

02 Efficient memory usage compared to prior methods

03 Effective long-range temporal dependency modeling

Abstract

Recent technological advancements have significantly expanded the potential of human action recognition by harnessing 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that rely on sampling 2D depth images, skeleton points, or point clouds, which often incur substantial memory requirements and can handle only short sequences, we introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network), specifically designed for fixed-topology mesh sequences. The SpATr model disentangles space and time in the mesh sequences. A lightweight auto-encoder, based on spiral convolutions, is employed to extract…
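To make the space/time split described in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that each frame of a fixed-topology mesh is encoded independently by one spiral convolution (gather each vertex's spiral neighbourhood, flatten, apply a shared linear map), and that the resulting per-frame embeddings are then mixed over time by a single self-attention layer. All names (`spiral_conv`, `encode_frame`, `self_attention`), the mean pooling, the layer sizes, and the random spiral indices are illustrative assumptions.

```python
import numpy as np

def spiral_conv(x, spirals, weight, bias):
    """One spiral convolution on a fixed-topology mesh (illustrative sketch).

    x:       (V, C_in) per-vertex features
    spirals: (V, L) integer indices; row i lists the spiral neighbourhood of vertex i
    weight:  (L * C_in, C_out) linear map shared by all vertices
    bias:    (C_out,)
    """
    num_vertices = spirals.shape[0]
    gathered = x[spirals]                       # (V, L, C_in)
    flat = gathered.reshape(num_vertices, -1)   # (V, L * C_in)
    return flat @ weight + bias                 # (V, C_out)

def encode_frame(verts, spirals, params):
    """Spatial stage: one mesh frame -> one embedding vector.

    Mean pooling over vertices is an assumption made for brevity.
    """
    h = np.maximum(spiral_conv(verts, spirals, params["w1"], params["b1"]), 0.0)  # ReLU
    return h.mean(axis=0) @ params["w_out"]     # (D,)

def self_attention(seq, wq, wk, wv):
    """Temporal stage: single-head scaled dot-product self-attention over frames."""
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])                 # (T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)          # rows sum to 1
    return attn @ v, attn

# Toy sizes (all hypothetical): frames, vertices, spiral length, channels, embedding, classes.
rng = np.random.default_rng(0)
T, V, L, C, D, n_classes = 6, 5, 3, 8, 4, 3

frames = rng.normal(size=(T, V, 3))             # xyz coordinates per vertex per frame
spirals = rng.integers(0, V, size=(V, L))       # random stand-in for real spiral orderings
spirals[:, 0] = np.arange(V)                    # a spiral starts at its own vertex

params = {
    "w1": rng.normal(size=(L * 3, C)) * 0.1,
    "b1": np.zeros(C),
    "w_out": rng.normal(size=(C, D)) * 0.1,
    "wq": rng.normal(size=(D, D)) * 0.1,
    "wk": rng.normal(size=(D, D)) * 0.1,
    "wv": rng.normal(size=(D, D)) * 0.1,
    "w_cls": rng.normal(size=(D, n_classes)) * 0.1,
}

# Space first: every frame is reduced to a D-dimensional embedding on its own.
embeddings = np.stack([encode_frame(f, spirals, params) for f in frames])  # (T, D)

# Time second: attention only ever sees the (T, D) embedding sequence.
context, attn = self_attention(embeddings, params["wq"], params["wk"], params["wv"])
logits = context.mean(axis=0) @ params["w_cls"]                            # (n_classes,)
```

In this sketch the temporal stage operates on a `(T, D)` array rather than on `T` full meshes, which is one plausible reading of why disentangling space and time helps with memory on long sequences.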

Code & Models

Repositories

h-bouzid/spatr
PyTorch · Official

Taxonomy

Topics

Human Pose and Action Recognition · Advanced Neural Network Applications · Hand Gesture Recognition Systems

Methods

Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection