SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network
Hamza Bouzid, Lahoucine Ballihi

TL;DR
SpATr introduces a novel 3D human action recognition method using spiral auto-encoders and transformers, effectively capturing spatial and temporal features from fixed-topology mesh sequences with efficient memory use.
Contribution
The paper presents a new approach for 3D human action recognition that combines spiral convolutions with a transformer, specifically designed for fixed-topology mesh data, improving scalability and memory efficiency.
Findings
Competitive performance on Babel, MoVi, and BMLrub datasets
Efficient memory usage compared to prior methods
Effective long-range temporal dependency modeling
Abstract
Recent technological advancements have significantly expanded the potential of human action recognition through harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that rely on sampling 2D depth images, skeleton points, or point clouds, which often leads to substantial memory requirements and limits them to short sequences, we introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network), specifically designed for fixed-topology mesh sequences. The SpATr model disentangles space and time in the mesh sequences. A lightweight auto-encoder, based on spiral convolutions, is employed to extract…
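To make the "disentangle space and time" idea concrete, here is a minimal NumPy sketch of the spatial step only. It assumes a single spiral-convolution layer, precomputed spiral index lists, a toy ring-shaped "mesh", and mean pooling over vertices; all of these are illustrative assumptions, not the authors' architecture. The names `spiral_conv` and `encode_sequence` are hypothetical.

```python
import numpy as np

def spiral_conv(x, spirals, weight, bias):
    """One spiral convolution (illustrative sketch).

    x:       (V, C_in) per-vertex features of one mesh frame
    spirals: (V, L) integer indices; row v lists vertex v and its
             neighbours in a fixed spiral order (precomputed once,
             which is possible because the topology is fixed)
    weight:  (L * C_in, C_out), bias: (C_out,)
    """
    V, _ = spirals.shape
    gathered = x[spirals]            # (V, L, C_in): features along each spiral
    flat = gathered.reshape(V, -1)   # (V, L * C_in): concatenate along the spiral
    return flat @ weight + bias      # (V, C_out): shared linear map per vertex

def encode_sequence(frames, spirals, weight, bias):
    """Embed each frame independently: (T, V, C_in) -> (T, C_out).

    ReLU followed by mean pooling over vertices is an assumption made
    here to obtain one vector per frame.
    """
    return np.stack([
        np.maximum(spiral_conv(f, spirals, weight, bias), 0.0).mean(axis=0)
        for f in frames
    ])

# Toy data: 4 frames, 5 vertices on a ring, spiral length 3, xyz inputs.
rng = np.random.default_rng(0)
T, V, L, C_in, C_out = 4, 5, 3, 3, 8
spirals = np.array([[v, (v + 1) % V, (v + 2) % V] for v in range(V)])
frames = rng.normal(size=(T, V, C_in))
weight = rng.normal(size=(L * C_in, C_out))
bias = np.zeros(C_out)

embeddings = encode_sequence(frames, spirals, weight, bias)  # one vector per frame
```

Because every frame shares the same spiral indices, the spatial encoder can run frame by frame, and only the compact `(T, C_out)` sequence needs to be handed to a temporal model such as a transformer.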
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Hand Gesture Recognition Systems
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection
