Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

Rui Dai; Srijan Das; Francois Bremond

arXiv:2108.03619 · cs.CV · August 10, 2021

TL;DR

This paper introduces a cross-modal knowledge distillation framework that enhances RGB representations for action detection by transferring temporal and contextual knowledge from additional modalities during training, so that only RGB input is needed at inference.

Contribution

The paper proposes a two-level distillation method that transfers both atomic and sequence-level temporal knowledge from multiple modalities to RGB for action detection.

Findings

01. Outperforms existing cross-modal distillation methods in action detection.

02. Achieves competitive results with only RGB during inference.

03. Demonstrates the effectiveness of multi-level knowledge transfer for temporal understanding.

Abstract

In video understanding, most cross-modal knowledge distillation (KD) methods are tailored for classification tasks and focus on the discriminative representation of trimmed videos. However, action detection requires not only categorizing actions but also localizing them in untrimmed videos. Transferring knowledge about temporal relations is therefore critical for this task, yet it is missing from previous cross-modal KD frameworks. To this end, we aim to learn an augmented RGB representation for action detection, taking advantage of additional modalities at training time through KD. We propose a KD framework consisting of two levels of distillation. On the one hand, atomic-level distillation encourages the RGB student to learn the sub-representation of the actions from the teacher in a contrastive manner. On the other hand, sequence-level distillation encourages the student…
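The abstract says the atomic-level distillation is contrastive but does not give the loss. As a rough illustration only, the sketch below uses a generic InfoNCE-style objective, which may differ from the paper's actual formulation. The function name `contrastive_distillation_loss`, the `temperature` value, and the feature shapes are all assumptions made for this example. Row `i` of the student (RGB) features is pulled toward row `i` of the teacher (additional modality) features and pushed away from the other rows.

```python
import numpy as np


def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    student, teacher: arrays of shape (N, D). Row i of each is assumed to
    describe the same video snippet in the RGB and teacher modality.
    """
    # L2-normalise so the dot product is a cosine similarity.
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)

    # (N, N) similarity matrix; diagonal entries are the positive pairs.
    logits = (s @ t.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching teacher snippet as the target.
    return float(-np.mean(np.diag(log_probs)))


# Hypothetical usage with random features standing in for real embeddings.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(8, 16))
aligned_student = teacher_feats + 0.01 * rng.normal(size=(8, 16))
misaligned_student = aligned_student[::-1]  # snippets paired with the wrong teacher rows

loss_aligned = contrastive_distillation_loss(aligned_student, teacher_feats)
loss_misaligned = contrastive_distillation_loss(misaligned_student, teacher_feats)
```

In this toy setup the loss is lower when student and teacher rows are correctly paired, which is the behaviour a contrastive distillation term is meant to reward. The sequence-level distillation is not sketched here because the abstract text above is cut off before describing it.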

Taxonomy

Methods: Knowledge Distillation