I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

TL;DR
This paper introduces I$^2$MD, a novel self-supervised 3D action learning framework that employs bidirectional inter- and intra-modal mutual distillation to improve representation quality, outperforming existing methods.
Contribution
The work proposes a new mutual distillation framework with continuous bidirectional knowledge transfer and local cluster-level contrasting for 3D action representation learning.
Findings
Sets new state-of-the-art results on three datasets.
Effectively leverages inter- and intra-modal information.
Improves robustness when many positive samples are similar, as happens with limited action categories.
Abstract
Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, because they are optimized to distinguish self-augmented samples, models struggle with numerous similar positive instances when action categories are limited. In this work, we tackle these problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Unlike existing distillation solutions that transfer the knowledge of a pre-trained, fixed teacher to the student, in CMD the knowledge is continuously updated and bidirectionally distilled between modalities…
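The bidirectional distillation idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes that each modality's "knowledge" is a softmax distribution over similarities to a bank of anchor embeddings, and that each modality is pulled toward the other's distribution with a KL term. The function names, the anchor bank, and the temperatures `tau_teacher` and `tau_student` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def neighbor_distribution(emb, anchors, tau):
    # Cosine similarity of each embedding to every anchor,
    # turned into a probability distribution with temperature tau.
    sims = l2_normalize(emb) @ l2_normalize(anchors).T
    return softmax(sims / tau)

def kl(p, q, eps=1e-12):
    # Mean KL(p || q) over the batch.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mutual_distillation_loss(emb_a, anchors_a, emb_b, anchors_b,
                             tau_teacher=0.05, tau_student=0.1):
    # Assumption: each modality acts as teacher (sharper temperature)
    # and as student (softer temperature) at the same time.
    t_a = neighbor_distribution(emb_a, anchors_a, tau_teacher)
    t_b = neighbor_distribution(emb_b, anchors_b, tau_teacher)
    s_a = neighbor_distribution(emb_a, anchors_a, tau_student)
    s_b = neighbor_distribution(emb_b, anchors_b, tau_student)
    # Knowledge flows a -> b and b -> a.
    return kl(t_a, s_b) + kl(t_b, s_a)

# Toy usage: two modalities (e.g. joint and motion), batch of 4, 16 anchors.
rng = np.random.default_rng(0)
emb_joint, emb_motion = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
bank_joint, bank_motion = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
loss = mutual_distillation_loss(emb_joint, bank_joint, emb_motion, bank_motion)
```

In a real training loop the teacher distributions would typically be treated as constants (no gradient), so that each direction only updates the student side; NumPy has no autograd, so that detail is omitted here.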
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
