Hierarchical Attention Network for Action Recognition in Videos

Yilin Wang; Suhang Wang; Jiliang Tang; Neil O'Hare; Yi Chang; Baoxin Li

arXiv:1607.06416·cs.CV·July 22, 2016·78 cites

Open Access

TL;DR

This paper introduces a Hierarchical Attention Network that effectively captures long-range temporal structures and important spatial regions in videos, significantly improving action recognition performance on standard benchmarks.

Contribution

The novel Hierarchical Attention Network integrates static, short-term, and long-term information with attention mechanisms for enhanced video action understanding.

Findings

01 Outperforms state-of-the-art on UCF-101

02 Outperforms state-of-the-art on HMDB-51

03 Efficiently models long-range temporal dependencies

Abstract

Understanding human actions in wild videos is an important task with a broad range of applications. In this paper we propose a novel approach named Hierarchical Attention Network (HAN), which incorporates static spatial information, short-term motion information, and long-term video temporal structures for complex human action understanding. Compared to recent convolutional neural network based approaches, HAN has the following advantages: (1) HAN can efficiently capture video temporal structures over a longer range; (2) HAN is able to reveal temporal transitions between frame chunks with different time steps, i.e., it explicitly models the temporal transitions between frames as well as between video segments; and (3) with a multiple-step spatial-temporal attention mechanism, HAN automatically learns important regions in video frames and important temporal segments in the video. The proposed model is…
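The two attention levels named in the abstract (regions within a frame, then segments within the video) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dot-product scoring, the fixed query vectors, and the mean pooling over frame chunks are all assumptions made for illustration, and the pooling stands in for whatever temporal layers the paper actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(features, query):
    """Soft attention: weight each feature vector by softmax(query . feature).

    Returns the weighted sum of the features and the attention weights.
    The dot-product score is an illustrative assumption.
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def hierarchical_attention(video, spatial_query, temporal_query, chunk_size):
    """Hypothetical two-level attention over a video.

    video: list of frames, each frame a list of region feature vectors.
    Level 1: spatial attention pools the regions of every frame.
    Level 2: frames are grouped into chunks of `chunk_size`, each chunk is
             mean-pooled (an assumed stand-in for a temporal model), and
             temporal attention pools the resulting segment vectors.
    """
    frame_vecs = [attend(regions, spatial_query)[0] for regions in video]
    chunks = [frame_vecs[i:i + chunk_size]
              for i in range(0, len(frame_vecs), chunk_size)]
    segment_vecs = [[sum(col) / len(chunk) for col in zip(*chunk)]
                    for chunk in chunks]
    return attend(segment_vecs, temporal_query)
```

Calling `hierarchical_attention(video, spatial_query, temporal_query, chunk_size=2)` on a four-frame video returns one pooled vector plus two segment weights that sum to 1. In a trained model the queries would be learned rather than fixed, so the weights here only show where the attention would be applied, not what it would select.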

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications