CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li

TL;DR
This paper introduces a novel cross-modal mutual distillation framework for self-supervised 3D action recognition, leveraging bidirectional knowledge transfer between skeleton modalities to improve representation learning.
Contribution
It proposes the Cross-modal Mutual Distillation (CMD) framework, which models cross-modal interaction as bidirectional knowledge distillation with continuously updated knowledge, and it introduces neighboring similarity distributions to represent the knowledge learned in each modality.
Findings
Outperforms existing self-supervised methods and sets new state-of-the-art results on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II.
Demonstrates the effectiveness of bidirectional mutual distillation for 3D action recognition.
Abstract
In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other…
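The two ideas named in the abstract, a neighboring similarity distribution per modality and distillation in both directions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cosine-similarity softmax, the KL objective, the two temperatures, and the use of a fixed anchor set per modality are all assumptions made for illustration, and the sketch computes only a loss value with no gradient handling.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def neighbor_similarity_distribution(query, anchors, temperature):
    """Softmax over cosine similarities between each query and a set of anchors.

    query:   (B, D) embeddings from one modality (hypothetical shapes).
    anchors: (K, D) neighbor embeddings from the same modality.
    Returns a (B, K) array whose rows sum to 1.
    """
    sims = l2_normalize(query) @ l2_normalize(anchors).T
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def mutual_distillation_loss(z_a, z_b, anchors_a, anchors_b,
                             tau_teacher=0.05, tau_student=0.1):
    """Symmetric loss: each modality acts as teacher for the other.

    The two temperatures are an assumption of this sketch; a sharper
    teacher distribution is one common choice in distillation setups.
    """
    teacher_a = neighbor_similarity_distribution(z_a, anchors_a, tau_teacher)
    teacher_b = neighbor_similarity_distribution(z_b, anchors_b, tau_teacher)
    student_a = neighbor_similarity_distribution(z_a, anchors_a, tau_student)
    student_b = neighbor_similarity_distribution(z_b, anchors_b, tau_student)
    # Modality A teaches B, and modality B teaches A.
    return kl_divergence(teacher_a, student_b) + kl_divergence(teacher_b, student_a)

# Usage with random stand-ins for two skeleton-modality embeddings.
rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
anchors_a, anchors_b = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
loss = mutual_distillation_loss(z_a, z_b, anchors_a, anchors_b)
```

Because both directions are summed, neither modality is a fixed teacher: in a training loop the anchors and embeddings would change every step, which is one way to read the abstract's "continuously updated" knowledge.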
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems
Methods: Knowledge Distillation
