Memory Attention Networks for Skeleton-based Action Recognition
Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han, Changqing Zou, and Jianzhuang Liu

TL;DR
This paper introduces Memory Attention Networks (MANs) that enhance skeleton-based action recognition by combining temporal attention recalibration and spatio-temporal convolution, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper proposes a novel end-to-end network architecture with a temporal attention module and a CNN-based spatio-temporal module for improved action recognition.
Findings
Significantly improves recognition accuracy on four benchmark datasets.
Outperforms existing methods in skeleton-based action recognition.
Demonstrates the effectiveness of combining attention and CNN modules.
Abstract
The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints, and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs), which consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-calibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of skeleton data. These two modules (TARM and STCM) seamlessly form a…
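The temporal-then-spatial flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes the attention network has already produced one raw score per frame (the `frame_scores` argument is a hypothetical stand-in for that learned network), normalizes the scores with a softmax, applies them through a residual connection, and then reshapes the recalibrated sequence into a frames x joints x channels grid of the kind a CNN could consume. The function names and the residual form `x + a * x` are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_recalibrate(seq, frame_scores):
    """Reweight frames with attention inside a residual connection.

    seq: list of T frames, each a flat list of joint coordinates.
    frame_scores: T raw scores, a hypothetical stand-in for the output
        of a learned attention network (assumption, not from the paper).
    Returns (recalibrated sequence, attention weights).
    """
    weights = softmax(frame_scores)
    # Assumed residual form: out[t] = seq[t] + weights[t] * seq[t]
    out = [[v + w * v for v in frame] for frame, w in zip(seq, weights)]
    return out, weights


def to_image(seq, num_joints, channels=3):
    """Reshape flat frames into a T x num_joints x channels nested list,
    i.e. the image-like layout a CNN-based module could take as input."""
    return [
        [frame[j * channels:(j + 1) * channels] for j in range(num_joints)]
        for frame in seq
    ]


# Toy usage: 4 frames, 2 joints, 3 coordinates per joint.
toy_seq = [[1.0] * 6 for _ in range(4)]
recalibrated, attn = temporal_recalibrate(toy_seq, [0.0, 0.0, 0.0, 0.0])
image = to_image(recalibrated, num_joints=2)
```

With equal scores every frame receives the same weight, so the residual path scales all frames uniformly; unequal scores would emphasize some frames over others before the convolutional stage.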
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
Methods: Convolution
