Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos

Swathikiran Sudhakaran; Oswald Lanz

arXiv:1808.09892·cs.CV·August 30, 2018

TL;DR

This paper introduces TA-VLAD, a deep recurrent model with spatial attention that improves action recognition in videos by focusing on discriminative regions, achieving state-of-the-art results on HMDB51 and UCF101.

Contribution

It presents a novel top-down attention mechanism integrated with VLAD encoding for video action recognition: class-specific activation maps from a CNN pre-trained for image classification are used to weight local appearance features before they are encoded.
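The top-down attention step can be sketched as follows. This is a minimal NumPy sketch, not the authors' code. It assumes a CAM-style classifier (global average pooling followed by a linear layer with weights `fc_weights`), takes the top-scoring class as the attended class, and normalizes the map with a spatial softmax. The function names and these choices are assumptions for illustration.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class activation map (CAM) for one class (hypothetical helper).

    features:   (H, W, C) feature map from the last conv layer
    fc_weights: (C, K) weights of the linear classifier applied after
                global average pooling
    class_idx:  index of the class whose activation map is wanted
    Returns an (H, W) map: the class-weighted sum of feature channels.
    """
    return features @ fc_weights[:, class_idx]

def top_down_attention(features, fc_weights):
    """Weight features by a spatial softmax of the top class's CAM (assumed scheme)."""
    pooled = features.mean(axis=(0, 1))   # global average pooling -> (C,)
    scores = pooled @ fc_weights          # class scores -> (K,)
    cam = class_activation_map(features, fc_weights, int(np.argmax(scores)))
    # Spatial softmax: attention weights are positive and sum to 1 over H*W.
    exp = np.exp(cam - cam.max())         # subtract max for numerical stability
    attn = exp / exp.sum()                # (H, W)
    # Broadcast the (H, W) attention over the channel axis.
    return features * attn[..., None], attn
```

Because the CAM is linear in the features, its spatial mean equals the class score obtained through global average pooling, so the map can be read as a per-location decomposition of that score.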

Findings

01. Achieves state-of-the-art accuracy on HMDB51
02. Outperforms previous methods on UCF101
03. Effectively suppresses background noise in video features

Abstract

Most recent approaches for action recognition from video leverage deep architectures to encode the video clip into a fixed-length representation vector that is then used for classification. For this to be successful, the network must be capable of suppressing irrelevant scene background and extracting the representation from the most discriminative part of the video. Our contribution builds on the observation that spatio-temporal patterns characterizing actions in videos are highly correlated with objects and their location in the video. We propose Top-down Attention Action VLAD (TA-VLAD), a deep recurrent architecture with built-in spatial attention that performs temporally aggregated VLAD encoding for action recognition from videos. We adopt a top-down approach of attention, by using class-specific activation maps obtained from a deep CNN pre-trained for image classification, to weight…
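The attention-weighted, temporally aggregated VLAD encoding described in the abstract can be illustrated with a simplified sketch. It is not the paper's formulation. Assumptions: soft assignment by a softmax over negative squared distances to fixed cluster centers (NetVLAD-style, with a hypothetical sharpness `alpha`), per-descriptor attention weights that scale each descriptor's contribution, and a plain sum over frames standing in for the paper's recurrent aggregation. All function names are hypothetical.

```python
import numpy as np

def soft_assign(x, centers, alpha=10.0):
    """Soft assignment of descriptors to cluster centers.
    x: (N, D) local descriptors, centers: (K, D). Returns (N, K); rows sum to 1."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)

def vlad_frame(x, centers, weights):
    """Attention-weighted VLAD residuals for one frame -> (K, D), unnormalized.
    weights: (N,) attention weight per descriptor (assumed given)."""
    a = soft_assign(x, centers) * weights[:, None]     # (N, K)
    residuals = x[:, None, :] - centers[None, :, :]    # (N, K, D)
    return (a[:, :, None] * residuals).sum(axis=0)     # (K, D)

def normalize_vlad(v, eps=1e-12):
    """Intra-normalize each cluster's residual, then L2-normalize the flat vector."""
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)
    v = v.ravel()
    return v / (np.linalg.norm(v) + eps)

def video_descriptor(frames, centers, frame_weights):
    """Sum per-frame VLAD residuals over time, then normalize.
    The plain sum is a simplified stand-in for recurrent temporal aggregation."""
    acc = np.zeros_like(centers, dtype=float)
    for x, w in zip(frames, frame_weights):
        acc += vlad_frame(x, centers, w)
    return normalize_vlad(acc)
```

Setting a descriptor's attention weight to zero removes its contribution entirely, which is how, in this sketch, attention can suppress background locations regardless of their appearance.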