I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

arXiv:2310.15568 · cs.CV · October 25, 2023 · 1 citation

TL;DR

This paper introduces I$^2$MD, a self-supervised 3D action representation learning framework that uses bidirectional inter- and intra-modal mutual distillation to improve representation quality and outperforms existing methods.

Contribution

The work proposes a new mutual distillation framework with continuous bidirectional knowledge transfer and local cluster-level contrasting for 3D action representation learning.

Findings

1. Sets new state-of-the-art results on three datasets.
2. Effectively leverages inter- and intra-modal information.
3. Improves robustness against similar positive samples.

Abstract

Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, because they are optimized to distinguish self-augmented samples, models struggle with numerous similar positive instances when the number of action categories is limited. In this work, we tackle these problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Unlike existing distillation solutions that transfer the knowledge of a pre-trained, fixed teacher to the student, in CMD the knowledge is continuously updated and bidirectionally distilled between modalities…
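The abstract describes CMD only at a high level. The following is a minimal sketch of one bidirectional distillation step, written under assumptions that are not taken from the paper: each modality (for example, joint and motion streams) embeds the query clip and keeps a memory bank of embeddings for the same set of other clips; a modality's "knowledge" is expressed as a softmax distribution over cosine similarities to its bank; and each direction is a KL divergence whose teacher side uses a sharper temperature. All function names and temperature values are hypothetical. The code is pure Python so it runs without a deep-learning library; in an autograd framework the teacher distribution would be detached, so each modality receives gradients only in its student role.

```python
import math
from typing import List

# Hypothetical sketch of bidirectional cross-modal distillation. Names,
# temperatures, and the similarity-distribution formulation are assumptions,
# not the authors' code. There is no autograd here; in e.g. PyTorch the
# teacher distribution would be detached (stop-gradient).
Vector = List[float]


def normalize(v: Vector) -> Vector:
    # Scale to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def softmax(logits: Vector) -> Vector:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(query: Vector, bank: List[Vector], temperature: float) -> Vector:
    # Cosine similarity of the query to every bank entry, turned into a
    # probability distribution; a lower temperature makes it sharper.
    q = normalize(query)
    logits = [sum(a * b for a, b in zip(q, normalize(k))) / temperature for k in bank]
    return softmax(logits)


def kl_divergence(p: Vector, q: Vector) -> float:
    # KL(p || q), with p the teacher target and q the student prediction.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cross_modal_mutual_distillation(
    query_a: Vector, bank_a: List[Vector],
    query_b: Vector, bank_b: List[Vector],
    t_teacher: float = 0.05, t_student: float = 0.1,
) -> float:
    # bank_a[i] and bank_b[i] must embed the same clip in modalities A and B;
    # otherwise the two distributions would not be comparable.
    if len(bank_a) != len(bank_b):
        raise ValueError("memory banks must index the same clips")
    # Direction A -> B: modality A acts as teacher, modality B as student.
    loss_a_to_b = kl_divergence(
        similarity_distribution(query_a, bank_a, t_teacher),
        similarity_distribution(query_b, bank_b, t_student),
    )
    # Direction B -> A: the roles are swapped.
    loss_b_to_a = kl_divergence(
        similarity_distribution(query_b, bank_b, t_teacher),
        similarity_distribution(query_a, bank_a, t_student),
    )
    return loss_a_to_b + loss_b_to_a


# Usage with toy 2-D embeddings for three bank clips (made-up numbers).
joint_bank = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
motion_bank = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
loss = cross_modal_mutual_distillation([1.0, 0.2], joint_bank, [0.8, 0.3], motion_bank)
print(loss)
```

Because each modality is teacher and student at the same time, and both encoders keep training, the distillation targets change throughout training. This is the difference the abstract draws from distilling a pre-trained, fixed teacher into a student.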

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning