Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning
Jiahang Zhang, Lilang Lin, Jiaying Liu

TL;DR
This paper introduces PCM³, a self-supervised learning framework that combines contrastive learning and masked motion modeling for versatile 3D action representation learning, improving generalization across multiple downstream tasks.
Contribution
The paper proposes a unified framework that integrates contrastive learning with masked motion modeling through a dual-prompted multi-task pretraining strategy, addressing the limitations of using either paradigm alone or combining the two naively.
Findings
Outperforms state-of-the-art methods on five downstream tasks
Demonstrates superior generalization across three large-scale datasets
Effectively reduces training interference between tasks
Abstract
Self-supervised learning has proved effective for skeleton-based human action understanding, an important yet challenging topic. Previous works mainly rely on either the contrastive learning or the masked motion modeling paradigm to model skeleton relations. However, these methods cannot effectively handle sequence-level and joint-level representation learning simultaneously. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining the two paradigms naively leaves their synergy untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM³, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which…
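The abstract contrasts sequence-level learning (contrastive) with joint-level learning (masked prediction). As a rough illustration only, here is a generic joint objective that pairs an InfoNCE contrastive loss on per-sequence embeddings with a mean-squared-error loss on masked joints. This is not the PCM³ objective: the dual prompts and the mechanism that makes the two tasks mutually beneficial are omitted. The function names, the temperature of 0.1, the `weight` parameter, and the (N, T, J, C) tensor layout are all hypothetical choices.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(z_a, z_b, temperature=0.1):
    """Sequence-level contrastive loss (hypothetical setup): row i of z_a and
    row i of z_b form a positive pair; all other rows of z_b act as negatives."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                    # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives lie on the diagonal

def masked_motion_loss(pred, target, mask):
    """Joint-level masked prediction: MSE over masked (frame, joint) entries only.
    pred/target: (N, T, J, C) coordinates; mask: (N, T, J), 1 where a joint was masked."""
    sq_err = ((pred - target) ** 2).mean(axis=-1)         # (N, T, J) per-joint error
    return (sq_err * mask).sum() / max(mask.sum(), 1)     # unmasked joints contribute nothing

def combined_loss(z_a, z_b, pred, target, mask, weight=1.0):
    # Hypothetical weighted sum of the two paradigms; 'weight' is illustrative.
    return info_nce(z_a, z_b) + weight * masked_motion_loss(pred, target, mask)

# Usage with random tensors (N sequences, T frames, J joints, C coords, D embedding dim).
rng = np.random.default_rng(0)
N, T, J, C, D = 4, 16, 25, 3, 32
z_a = rng.normal(size=(N, D))
z_b = z_a + 0.01 * rng.normal(size=(N, D))                # two closely matching "views"
target = rng.normal(size=(N, T, J, C))
pred = target + 0.1 * rng.normal(size=(N, T, J, C))
mask = (rng.random(size=(N, T, J)) < 0.5).astype(float)   # mask roughly half the joints
loss = combined_loss(z_a, z_b, pred, target, mask)
```

The naive sum above is exactly the kind of combination the abstract warns can cause training interference; it shows only what the two loss terms measure, not how PCM³ reconciles them.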
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Human Motion and Animation
Methods: Contrastive Learning
