CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

Yunyao Mao; Wengang Zhou; Zhenbo Lu; Jiajun Deng; Houqiang Li

arXiv:2208.12448 · cs.CV · May 26, 2023 · 1 citation

TL;DR

This paper introduces a novel cross-modal mutual distillation framework for self-supervised 3D action recognition, leveraging bidirectional knowledge transfer between skeleton modalities to improve representation learning.

Contribution

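The paper proposes the Cross-modal Mutual Distillation (CMD) framework, which models cross-modal interaction as bidirectional knowledge distillation in which the transferred knowledge is continuously updated rather than taken from a fixed teacher. It uses neighboring similarity distributions to represent the knowledge learned in each modality.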

Findings

01 Outperforms existing self-supervised methods on benchmark datasets.

02 Sets new state-of-the-art records on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II.

03 Demonstrates the effectiveness of bidirectional mutual distillation for 3D action recognition.

Abstract

In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other…
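The idea described in the abstract can be illustrated with a small sketch: each modality summarizes its knowledge as a similarity distribution over a set of neighboring (anchor) embeddings, and each modality's distribution is then used as a soft target for the other. This is a minimal illustration, not the authors' implementation. The function names, the cosine similarity, the softmax temperatures (including the choice of a sharper teacher), and the KL objective are all assumptions introduced here, and details that the truncated abstract does not state are not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(query, anchors, temperature):
    """Softmax over the similarities between a query and its anchors.

    The result is a probability distribution describing how the query
    relates to its neighbors within one modality.
    """
    logits = [cosine(query, a) / temperature for a in anchors]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(query_a, anchors_a, query_b, anchors_b,
                             t_teacher=0.05, t_student=0.1):
    """Bidirectional distillation between two modalities (hypothetical sketch).

    Each modality acts as teacher for the other. The temperature values
    are illustrative assumptions, not values taken from the paper.
    """
    teacher_a = similarity_distribution(query_a, anchors_a, t_teacher)
    teacher_b = similarity_distribution(query_b, anchors_b, t_teacher)
    student_a = similarity_distribution(query_a, anchors_a, t_student)
    student_b = similarity_distribution(query_b, anchors_b, t_student)
    loss_a_to_b = kl_divergence(teacher_a, student_b)  # modality A teaches B
    loss_b_to_a = kl_divergence(teacher_b, student_a)  # modality B teaches A
    return loss_a_to_b + loss_b_to_a


# Toy usage: the same sample embedded in two hypothetical skeleton modalities.
anchors_joint = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
anchors_motion = [[0.9, 0.1], [0.2, 1.0], [1.0, 0.8]]
loss = mutual_distillation_loss([1.0, 0.2], anchors_joint,
                                [0.1, 1.0], anchors_motion)
print(f"mutual distillation loss: {loss:.4f}")
```

In a training setup, both encoders would be updated at every step, so the teacher distributions change continuously instead of coming from a fixed, pre-trained model. The sketch shows only the loss for a single sample.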

Code & Models

Repositories

maoyunyao/cmd (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems

Methods: Knowledge Distillation