Attention Distillation for Learning Video Representations

Miao Liu; Xin Chen; Yun Zhang; Yin Li; James M. Rehg

arXiv:1904.03249 · cs.CV · August 18, 2020 · 6 cites

TL;DR

This paper introduces an attention distillation method that transfers motion representations from flow networks to RGB networks, improving video recognition performance and action localization.

Contribution

We propose a novel attention distillation technique that enhances RGB video models with motion cues learned from flow networks.

Findings

01. Significant performance improvements on major action benchmarks.

02. Attention maps effectively leverage motion cues for action localization.

03. Method consistently outperforms baseline RGB networks.

Abstract

We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules, and develop a novel method for attention distillation. Our method is evaluated on major action benchmarks, and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that our attention maps can leverage motion cues in learning to identify the location of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models.…
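The idea of using attention maps as the vehicle for distillation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the spatial softmax used to form the maps, the choice of KL divergence as the matching loss, and the toy 2x2 score grids. The sketch shows an attention map highlighting locations, attention-weighted feature aggregation, and a loss that pulls the RGB (student) map toward the flow (teacher) map.

```python
import math

def spatial_softmax(scores):
    """Turn a 2D grid of raw attention scores into a map that sums to 1."""
    m = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - m) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    return [[e / total for e in row] for row in exps]

def attention_pool(features, attention):
    """Aggregate per-location feature vectors, weighted by the attention map."""
    dim = len(features[0][0])
    pooled = [0.0] * dim
    for feat_row, att_row in zip(features, attention):
        for vec, a in zip(feat_row, att_row):
            for k in range(dim):
                pooled[k] += a * vec[k]
    return pooled

def distillation_loss(teacher_att, student_att, eps=1e-8):
    """KL(teacher || student) over locations; an assumed choice of loss."""
    loss = 0.0
    for t_row, s_row in zip(teacher_att, student_att):
        for t, s in zip(t_row, s_row):
            loss += t * (math.log(t + eps) - math.log(s + eps))
    return loss

# Hypothetical score grids: the flow network attends top-left,
# the RGB network attends bottom-right.
flow_scores = [[2.0, 0.0], [0.0, 0.0]]
rgb_scores = [[0.0, 0.0], [0.0, 2.0]]

flow_att = spatial_softmax(flow_scores)
rgb_att = spatial_softmax(rgb_scores)
loss = distillation_loss(flow_att, rgb_att)
print(f"distillation loss: {loss:.4f}")
```

In a training setup of this kind, such a term would typically be added to the recognition loss of the RGB network, so that the student learns where the motion-based teacher looks without needing optical flow at test time. How the paper weights or formulates this term is not stated in the abstract.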

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications