Action Recognition using Visual Attention

Shikhar Sharma; Ryan Kiros; Ruslan Salakhutdinov

arXiv:1511.04119 · cs.LG · February 16, 2016 · 360 citations

TL;DR

This paper introduces a soft attention mechanism integrated with deep RNNs for action recognition in videos, enabling the model to focus on relevant frame regions and improve classification accuracy.

Contribution

It presents a novel attention-based deep RNN model that learns to selectively focus on important parts of video frames for action recognition.

Findings

01 Effective attention mechanism improves recognition accuracy.

02 Model adapts focus based on scene and action context.

03 Evaluations on UCF-11 (YouTube Action), HMDB-51, and Hollywood2 demonstrate robustness across datasets.

Abstract

We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model learns to focus selectively on parts of the video frames and classifies videos after taking a few glimpses. The model essentially learns which parts in the frames are relevant for the task at hand and attaches higher importance to them. We evaluate the model on UCF-11 (YouTube Action), HMDB-51 and Hollywood2 datasets and analyze how the model focuses its attention depending on the scene and the action being performed.
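
The abstract does not give the attention equations, so the following is only a minimal sketch of one soft attention step under a common formulation: each spatial location of a frame's feature map gets a score from the previous recurrent state, the scores are normalized with a softmax, and the frame is summarized as the attention-weighted average of its location features. The function names, the per-location scoring vectors in `weights`, and the toy dimensions are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden, weights):
    """Return (attention weights, attended feature vector) for one frame.

    features: list of location vectors (one per grid cell), each of length D
    hidden:   previous recurrent state of length H (assumed input)
    weights:  one length-H scoring vector per location (hypothetical parameters)
    """
    # Score each location by a dot product with the previous hidden state.
    scores = [sum(w * h for w, h in zip(w_loc, hidden)) for w_loc in weights]
    # Normalize so the weights are non-negative and sum to 1.
    attn = softmax(scores)
    # Expected feature vector: attention-weighted sum over locations.
    dim = len(features[0])
    attended = [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
    return attn, attended


# Toy example: a 2x2 grid (4 locations), 3-dim features, 2-dim hidden state.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
hidden = [0.5, -0.5]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
attn, attended = soft_attention(features, hidden, weights)
```

In a full model the attended vector would be fed to the LSTM at each time step, and the resulting hidden state would produce the next frame's attention weights; because the weighting is a differentiable average rather than a hard selection, the whole pipeline can be trained with ordinary backpropagation.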

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications