Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning

Jiahang Zhang; Lilang Lin; Jiaying Liu

arXiv:2308.03975 · cs.CV · August 9, 2023

TL;DR

This paper introduces PCM$^{3}$, a self-supervised learning framework that combines contrastive learning and masked motion modeling for versatile 3D action representation learning, improving generalization across multiple downstream tasks.

Contribution

The paper proposes a unified framework that integrates contrastive learning with masked motion modeling through a dual-prompted multi-task pretraining strategy, addressing the limitations of using either paradigm alone.
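As a rough illustration of what combining the two objectives can look like, the sketch below adds an InfoNCE-style contrastive term (sequence level) to a reconstruction error computed only on masked joints (joint level). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`info_nce`, `masked_mse`, `joint_loss`), the `temperature` and `weight` parameters, and the plain weighted sum are all hypothetical, and the paper's dual-prompted strategy is not modeled here.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss: row i of view_a is the positive for row i of view_b.

    All other rows of view_b act as negatives. Hypothetical helper.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def masked_mse(pred, target, mask):
    """Mean squared error over masked joints only.

    pred/target: lists of joint coordinates, e.g. (x, y, z) per joint.
    mask: list of bools, True where the joint was masked out of the input.
    """
    sq_err, count = 0.0, 0
    for p, t, is_masked in zip(pred, target, mask):
        if is_masked:
            sq_err += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
            count += len(p)
    return sq_err / count if count else 0.0


def joint_loss(view_a, view_b, pred, target, mask, weight=1.0):
    """Assumed combination: contrastive term plus weighted masked-prediction term."""
    return info_nce(view_a, view_b) + weight * masked_mse(pred, target, mask)
```

A plain weighted sum like this is the kind of naive combination the abstract warns can cause interference during training; it is shown only to make the two loss terms concrete.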

Findings

1. Outperforms state-of-the-art methods on five downstream tasks
2. Demonstrates superior generalization across three large-scale datasets
3. Reduces training interference between the two tasks

Abstract

Self-supervised learning has proved effective for skeleton-based human action understanding, which is an important yet challenging topic. Previous works mainly rely on the contrastive learning or masked motion modeling paradigms to model skeleton relations. However, these methods cannot effectively handle sequence-level and joint-level representation learning at the same time. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining the two paradigms naively leaves the synergy between them untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM$^{3}$, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Human Motion and Animation

Methods: Contrastive Learning