SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition

Wenhan Wu; Yilei Hua; Ce Zheng; Shiqian Wu; Chen Chen; Aidong Lu

arXiv:2209.02399·cs.CV·May 12, 2023·1 cites

TL;DR

SkeletonMAE is a spatial-temporal masked autoencoder framework for self-supervised skeleton-based action recognition. It combines a spatial-temporal masking strategy with an encoder-decoder transformer to learn generalizable features from unlabeled skeleton sequences, and it reports state-of-the-art results on the NTU RGB+D datasets.

Contribution

The paper proposes a new spatial-temporal masking strategy and a transformer-based autoencoder for self-supervised skeleton action recognition, enhancing feature generalization and reducing reliance on labeled data.

Findings

01. Achieves state-of-the-art performance on NTU RGB+D datasets.

02. Effectively learns spatial-temporal skeleton features with self-supervised pre-training.

03. Outperforms existing methods in skeleton-based action recognition.

Abstract

Fully supervised skeleton-based action recognition has achieved great progress with the blooming of deep learning techniques. However, these methods require sufficient labeled data, which is not easy to obtain. In contrast, self-supervised skeleton-based action recognition has attracted more attention. By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem and reduce the demand for massive labeled training data. Inspired by MAE, we propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition (SkeletonMAE). Following MAE's masking and reconstruction pipeline, we utilize a skeleton-based encoder-decoder transformer architecture to reconstruct the masked skeleton sequences. A novel masking strategy, named Spatial-Temporal Masking, is introduced in terms of both joint-level and…
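As a rough illustration of the masking-and-reconstruction idea, the following NumPy sketch masks a skeleton sequence and scores a reconstruction on the masked entries only, as MAE does. It is not the authors' implementation. The abstract is truncated after "joint-level", so the second masking level used here (dropping whole frames) is an assumption, and the function names, mask ratios, and sequence length are all hypothetical. The 25 joints per frame match the NTU RGB+D skeleton layout.

```python
import numpy as np


def spatial_temporal_mask(num_frames, num_joints, joint_ratio, frame_ratio, rng):
    """Return a boolean (num_frames, num_joints) array; True marks a masked entry.

    Hypothetical scheme, not taken from the paper:
    - joint level: a random subset of joints is masked in every frame;
    - frame level (assumed): a random subset of frames is masked entirely.
    """
    mask = np.zeros((num_frames, num_joints), dtype=bool)
    n_joints = int(round(joint_ratio * num_joints))
    n_frames = int(round(frame_ratio * num_frames))
    masked_joints = rng.choice(num_joints, size=n_joints, replace=False)
    masked_frames = rng.choice(num_frames, size=n_frames, replace=False)
    mask[:, masked_joints] = True
    mask[masked_frames, :] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked entries only (MAE-style loss)."""
    sq_err = (pred - target) ** 2  # shape (T, J, 3)
    return float(sq_err[mask].mean())


rng = np.random.default_rng(0)
T, J = 16, 25                          # hypothetical clip length; 25 joints as in NTU RGB+D
seq = rng.standard_normal((T, J, 3))   # stand-in for 3D joint coordinates
mask = spatial_temporal_mask(T, J, joint_ratio=0.4, frame_ratio=0.25, rng=rng)

visible = seq[~mask]                   # only these entries would be fed to the encoder
pred = np.zeros_like(seq)              # placeholder for a decoder's reconstruction
loss = masked_mse(pred, seq, mask)
```

In a real pipeline the placeholder `pred` would come from a transformer decoder that receives the encoded visible tokens plus mask tokens; the sketch only shows how the mask partitions the input and restricts the loss.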

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems

Methods: Masked autoencoder