Feature-Supervised Action Modality Transfer

Fida Mohammad Thoker; Cees G. M. Snoek

arXiv:2108.03329 · cs.CV · August 10, 2021

TL;DR

This paper introduces a feature-supervised transfer learning method that recognizes actions in non-RGB video modalities by learning from large-scale labeled RGB datasets, improving performance when only limited labeled non-RGB data is available.

Contribution

It proposes a two-step training process with feature-supervision strategies for cross-modal action transfer from RGB to non-RGB modalities.
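The feature-supervision idea can be illustrated with a minimal sketch, assuming the teacher (run on the RGB or optical-flow view) and the student (run on the depth or skeleton view of the same clip) each emit a feature vector of the same size. The two losses below, mean squared error and cosine distance, are common feature-matching choices used here for illustration only; this summary does not state which supervision strategies the paper uses, and the function names and feature values are hypothetical.

```python
import math
from typing import Sequence


def mse_feature_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between a student and a teacher feature vector."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_feature_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    if norm_s == 0.0 or norm_t == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_s * norm_t)


# Hypothetical features for one paired clip: the teacher sees the RGB (or optical-flow)
# view, the student sees the depth or skeleton view of the same moment.
teacher_feat = [1.0, 0.0, 2.0]
student_feat = [0.5, 0.0, 1.0]
print(mse_feature_loss(student_feat, teacher_feat))     # ≈ 0.4167
print(cosine_feature_loss(student_feat, teacher_feat))  # ≈ 0.0 (same direction, half the scale)
```

The example shows why the choice matters: the student points in exactly the teacher's direction at half the magnitude, so MSE still penalizes it while cosine distance does not. Which behavior is preferable is a design decision this sketch does not settle.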

Findings

1. Optical-flow teachers outperform RGB teachers for feature transfer.
2. The transfer method improves action recognition when labeled non-RGB data is scarce.
3. The method generalizes across different datasets and tasks.

Abstract

This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB modality, and the optical-flow modality derived from it, many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB datasets that have limited amounts of labeled examples available. Unfortunately, large-scale labeled action datasets for other modalities are unavailable for pre-training. In this paper, our goal is to recognize actions from limited examples in non-RGB video modalities, by learning from large-scale labeled RGB data. To this end, we propose a two-step training process: (i) we extract action representation knowledge from an RGB-trained teacher network and adapt it to a non-RGB student network.…
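Step (i) can be pictured as regressing student features onto frozen teacher features over paired clips. This is a toy under stated assumptions, not the authors' training code: the student is a single hypothetical linear layer, the loss is mean squared error, training is plain full-batch gradient descent, and, to keep the example self-contained, a fixed linear map applied to the same synthetic vector stands in for the teacher's output on the paired RGB view.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Hypothetical student forward pass: one linear layer standing in for a video network."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def feature_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between student and teacher feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def train_student(pairs: List[Tuple[List[float], List[float]]], d_in: int, d_out: int,
                  lr: float = 0.5, epochs: int = 300, seed: int = 0) -> Matrix:
    """Fit the student so its features match the frozen teacher's features.

    Each pair is (non_rgb_input, teacher_feature). No action labels are used:
    the teacher's features on the paired view are the only supervision.
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    n = len(pairs)
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, target in pairs:
            pred = matvec(w, x)
            for i in range(d_out):
                # dL/dW_ij = (2 / d_out) * (pred_i - target_i) * x_j, averaged over pairs
                err = 2.0 * (pred[i] - target[i]) / d_out
                for j in range(d_in):
                    grad[i][j] += err * x[j] / n
        # Full-batch gradient step on the student only; the teacher is never updated.
        for i in range(d_out):
            for j in range(d_in):
                w[i][j] -= lr * grad[i][j]
    return w


# Toy data (assumption): one synthetic vector per clip serves as the non-RGB input,
# and a fixed linear map of it plays the frozen teacher's feature output.
teacher_map = [[1.0, -2.0, 0.5], [0.0, 1.5, -1.0]]
data_rng = random.Random(1)
inputs = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
pairs = [(x, matvec(teacher_map, x)) for x in inputs]

student = train_student(pairs, d_in=3, d_out=2)
final = sum(feature_mse(matvec(student, x), t) for x, t in pairs) / len(pairs)
print(f"mean feature loss after training: {final:.6f}")
```

Because the toy teacher is linear, a perfect student exists and the loss falls close to zero. Real teacher and student networks would be deep video models, and whether features are matched exactly this way is not specified in this summary.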

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging