SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition
Wenhan Wu, Yilei Hua, Ce Zheng, Shiqian Wu, Chen Chen, Aidong Lu

TL;DR
SkeletonMAE is a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition. It masks parts of a skeleton sequence and trains an encoder-decoder transformer to reconstruct them, learning generalizable features from unlabeled data, and it reports state-of-the-art performance on the NTU RGB+D datasets.
Contribution
The paper proposes a new spatial-temporal masking strategy and a transformer-based autoencoder for self-supervised skeleton action recognition, enhancing feature generalization and reducing reliance on labeled data.
Findings
Achieves state-of-the-art performance on NTU RGB+D datasets.
Effectively learns spatial-temporal skeleton features with self-supervised pre-training.
Outperforms existing methods in skeleton-based action recognition.
Abstract
Fully supervised skeleton-based action recognition has achieved great progress with the blooming of deep learning techniques. However, these methods require sufficient labeled data, which is not easy to obtain. In contrast, self-supervised skeleton-based action recognition has attracted more attention. By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem and reduce the demand for massive labeled training data. Inspired by MAE, we propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition (SkeletonMAE). Following MAE's masking and reconstruction pipeline, we utilize a skeleton-based encoder-decoder transformer architecture to reconstruct the masked skeleton sequences. A novel masking strategy, named Spatial-Temporal Masking, is introduced in terms of both joint-level and…
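The masking-and-reconstruction idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated after "joint-level and…", so the second masking axis used here (whole frames) is an assumption made for illustration only. The function names `spatial_temporal_mask` and `masked_mse`, the mask ratios, and the choice to score reconstruction only on masked entries (as the original MAE does for image patches) are likewise hypothetical.

```python
import random


def spatial_temporal_mask(num_frames, num_joints, joint_ratio, frame_ratio, seed=None):
    """Return a num_frames x num_joints grid of booleans; True marks a masked entry.

    Assumption (not confirmed by the source): a fraction of joints is hidden
    in every frame, and a fraction of frames is hidden for every joint.
    """
    rng = random.Random(seed)
    masked_joints = set(rng.sample(range(num_joints), int(num_joints * joint_ratio)))
    masked_frames = set(rng.sample(range(num_frames), int(num_frames * frame_ratio)))
    return [
        [(t in masked_frames) or (j in masked_joints) for j in range(num_joints)]
        for t in range(num_frames)
    ]


def masked_mse(target, prediction, mask):
    """Mean squared error over the coordinates of masked entries only."""
    total, count = 0.0, 0
    for t, row in enumerate(mask):
        for j, is_masked in enumerate(row):
            if not is_masked:
                continue  # visible entries do not contribute to the loss
            for a, b in zip(target[t][j], prediction[t][j]):
                total += (a - b) ** 2
                count += 1
    return total / count if count else 0.0


# Illustrative sizes: 8 frames, 25 joints, 3D coordinates per joint.
mask = spatial_temporal_mask(8, 25, joint_ratio=0.4, frame_ratio=0.25, seed=0)
target = [[(0.0, 0.0, 0.0) for _ in range(25)] for _ in range(8)]
prediction = [[(1.0, 1.0, 1.0) for _ in range(25)] for _ in range(8)]
loss = masked_mse(target, prediction, mask)
```

In a real pipeline the encoder would see only the unmasked entries and the decoder would predict the masked ones; here `prediction` is a placeholder so the loss computation can be followed by hand.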
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
Methods: Masked autoencoder
