Masked Motion Predictors are Strong 3D Action Representation Learners

Yunyao Mao; Jiajun Deng; Wengang Zhou; Yao Fang; Wanli Ouyang; Houqiang Li

arXiv:2308.07092 · cs.CV · August 15, 2023 · 1 citation

TL;DR

This paper introduces MAMP, a self-supervised pre-training framework for 3D human action recognition that predicts the temporal motion of masked joints in skeleton sequences, improving the performance of transformer-based models on standard benchmark datasets.

Contribution

The paper proposes a masked motion prediction framework that replaces the prevalent pretext task of reconstructing the masked joints themselves (masked self-component reconstruction) with explicit contextual motion modeling, yielding more effective 3D action representations.
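To make the contrast between the two pretext targets concrete, here is a minimal pure-Python sketch. It is not taken from the official maoyunyao/mamp code: the function names `reconstruction_target` and `motion_target` are hypothetical, and motion is assumed to be a plain frame difference `x[t] - x[t - stride]`, where `stride` is likewise a hypothetical parameter.

```python
from typing import List

# Skeleton sequence as nested lists: frames x joints x (x, y, z) coordinates.
Sequence = List[List[List[float]]]


def reconstruction_target(seq: Sequence) -> Sequence:
    """Self-component reconstruction target: the raw joint coordinates themselves."""
    return [[list(joint) for joint in frame] for frame in seq]


def motion_target(seq: Sequence, stride: int = 1) -> Sequence:
    """Motion target, ASSUMED here to be the frame difference x[t] - x[t - stride].

    `stride` is a hypothetical parameter; the result has len(seq) - stride frames.
    """
    return [
        [
            [c - p for c, p in zip(cur, prev)]
            for cur, prev in zip(seq[t], seq[t - stride])
        ]
        for t in range(stride, len(seq))
    ]


# Toy sequence: 3 frames, 2 joints. Joint 0 stays still, joint 1 moves along x.
seq = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
]

print(reconstruction_target(seq)[2])  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(motion_target(seq))
# [[[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]]
```

In the toy example the static joint has a non-trivial coordinate target that is identical in every frame, so it could be recovered by copying a neighboring frame, while its motion target is zero and only the moving joint carries signal. This is one informal reading of why the abstract ties motion targets to the high temporal redundancy of skeleton sequences.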

Findings

1. MAMP improves transformer-based models on the NTU-60, NTU-120, and PKU-MMD datasets.
2. It achieves state-of-the-art results without additional bells and whistles.
3. Using motion prediction as the pretext task helps the model focus on the semantic content of skeleton sequences.

Abstract

In 3D human action recognition, limited supervised data makes it challenging to fully tap into the modeling potential of powerful networks such as transformers. As a result, researchers have been actively investigating effective self-supervised pre-training strategies. In this work, we show that instead of following the prevalent pretext task to perform masked self-component reconstruction in human joints, explicit contextual motion modeling is key to the success of learning effective feature representation for 3D action recognition. Formally, we propose the Masked Motion Prediction (MAMP) framework. To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints. Considering the high temporal redundancy of the skeleton sequence, in our MAMP, the motion information also acts as an…
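The input/target pairing described in the abstract (a masked skeleton sequence in, the temporal motion of the masked joints out) can be illustrated with a small pure-Python sketch. It is not the official implementation: uniform random masking over (frame, joint) tokens, zero-filling of masked joints, first-order frame-difference motion with zeros at the first frame, and all function names (`motion`, `random_mask`, `apply_mask`, `masked_mse`) are assumptions for illustration. The masking strategy the abstract goes on to describe is cut off above and is not reproduced here. The point shown is that the loss is computed only on masked joints, and against motion rather than coordinates.

```python
import random
from typing import List

# Skeleton sequence as nested lists: frames x joints x (x, y, z) coordinates.
Sequence = List[List[List[float]]]
Mask = List[List[bool]]


def motion(seq: Sequence) -> Sequence:
    """ASSUMED motion definition: x[t] - x[t-1], with zeros at the first frame."""
    zeros = [[0.0] * len(joint) for joint in seq[0]]
    diffs = [
        [[c - p for c, p in zip(cur, prev)] for cur, prev in zip(seq[t], seq[t - 1])]
        for t in range(1, len(seq))
    ]
    return [zeros] + diffs


def random_mask(num_frames: int, num_joints: int, ratio: float, seed: int = 0) -> Mask:
    """Mark round(ratio * T * J) randomly chosen (frame, joint) tokens as masked."""
    rng = random.Random(seed)
    tokens = [(t, j) for t in range(num_frames) for j in range(num_joints)]
    chosen = set(rng.sample(tokens, round(ratio * len(tokens))))
    return [[(t, j) in chosen for j in range(num_joints)] for t in range(num_frames)]


def apply_mask(seq: Sequence, mask: Mask, fill: float = 0.0) -> Sequence:
    """Model input: masked joints are overwritten with `fill`, visible joints kept."""
    return [
        [[fill] * len(joint) if mask[t][j] else list(joint) for j, joint in enumerate(frame)]
        for t, frame in enumerate(seq)
    ]


def masked_mse(pred: Sequence, target: Sequence, mask: Mask) -> float:
    """Mean squared error over the coordinates of masked joints only."""
    total, count = 0.0, 0
    for t, frame in enumerate(target):
        for j, joint in enumerate(frame):
            if mask[t][j]:
                total += sum((p - g) ** 2 for p, g in zip(pred[t][j], joint))
                count += len(joint)
    return total / count if count else 0.0


# Toy sequence: 4 frames, 2 joints, both moving +1 along x per frame.
seq = [[[float(t), 0.0, 0.0] for _ in range(2)] for t in range(4)]
target = motion(seq)
mask = random_mask(num_frames=4, num_joints=2, ratio=0.5, seed=0)
model_input = apply_mask(seq, mask)

# A placeholder "predictor" that always outputs zero motion.
pred = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(4)]
all_masked = [[True, True] for _ in range(4)]

print(sum(row.count(True) for row in mask))  # 4
print(masked_mse(target, target, mask))  # 0.0
print(masked_mse(pred, target, all_masked))  # 0.25
```

A real pre-training setup would typically replace the zero-output placeholder with a transformer encoder-decoder operating on batched tensors; the list-based version only shows which positions feed the loss and what they are compared against.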

Code & Models

Repositories

maoyunyao/mamp (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications