Feature-Supervised Action Modality Transfer
Fida Mohammad Thoker, Cees G. M. Snoek

TL;DR
This paper introduces a feature-supervised transfer learning method that recognizes actions in non-RGB video modalities by learning from large-scale labeled RGB datasets, improving performance when labeled non-RGB data is limited.
Contribution
It proposes a two-step training process with feature-supervision strategies for cross-modal action transfer from RGB to non-RGB modalities.
Findings
Optical-flow teachers outperform RGB teachers for feature transfer.
The transfer method improves action recognition when labeled non-RGB data is scarce.
The method generalizes across different datasets and tasks.
Abstract
This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB, and derived optical-flow, modality many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB datasets that have limited amounts of labeled examples available. Unfortunately, large-scale labeled action datasets for other modalities are unavailable for pre-training. In this paper, our goal is to recognize actions from limited examples in non-RGB video modalities, by learning from large-scale labeled RGB data. To this end, we propose a two-step training process: (i) we extract action representation knowledge from an RGB-trained teacher network and adapt it to a non-RGB student network.…
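To make step (i) concrete, the following is a minimal sketch of feature-level supervision, not the authors' implementation. It assumes a frozen teacher whose features are already computed on clips paired with the target-modality clips, a hypothetical linear student, and a mean-squared-error feature loss. The linear model, the MSE loss, the synthetic data, and the names `feature_supervision_step` and `run_demo` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def feature_supervision_step(student_w, target_x, teacher_feats, lr=0.5):
    """One gradient step pulling student features toward frozen teacher features.

    student_w:     (d_in, d_feat) weights of a hypothetical linear student
    target_x:      (n, d_in) non-RGB inputs, e.g. flattened depth clips
    teacher_feats: (n, d_feat) features the frozen teacher produced on paired clips
    """
    student_feats = target_x @ student_w
    diff = student_feats - teacher_feats
    # Assumed feature loss: mean squared error over all feature entries.
    loss = float(np.mean(diff ** 2))
    # Gradient of the mean squared error with respect to the student weights.
    grad = 2.0 * target_x.T @ diff / diff.size
    return student_w - lr * grad, loss


def run_demo(steps=200, seed=0):
    """Train the student on synthetic paired data; no action labels are used."""
    rng = np.random.default_rng(seed)
    n, d_in, d_feat = 64, 8, 4
    target_x = rng.normal(size=(n, d_in))
    # Synthetic stand-in for teacher features on the paired source clips.
    teacher_feats = target_x @ rng.normal(size=(d_in, d_feat))
    student_w = np.zeros((d_in, d_feat))
    losses = []
    for _ in range(steps):
        student_w, loss = feature_supervision_step(student_w, target_x, teacher_feats)
        losses.append(loss)
    return losses
```

Calling `run_demo()` returns the per-step feature loss, which should shrink as the student's features approach the teacher's. The point of the sketch is that the supervision signal is the teacher's feature vector rather than an action label, so only paired clips are needed at this stage.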
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging
