VideoLSTM Convolves, Attends and Flows for Action Recognition
Zhenyang Li, Efstratios Gavves, Mihir Jain, Cees G. M. Snoek

TL;DR
VideoLSTM is an end-to-end recurrent architecture for action recognition in video. It builds convolutions and motion-based attention into a soft-attention LSTM and uses the resulting attention to localize actions from the action class label alone.
Contribution
VideoLSTM adapts the soft-attention LSTM to the video medium: convolutions are hardwired into the recurrence to exploit the spatial layout of frames, motion is used to guide the attention, and the attention supports action localization using only the action class label.
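Hardwiring convolutions into an LSTM can be sketched as a convolutional LSTM step: the usual sigmoid and tanh gates, with every matrix product replaced by a convolution, so the input, hidden state and cell state all stay (C, H, W) feature maps. This is a generic NumPy sketch under stated assumptions, not the authors' implementation; the helper names (`conv2d_same`, `conv_lstm_step`), the 3x3 kernels and the weight dictionary layout are hypothetical choices for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Multi-channel 2D cross-correlation (what deep-learning 'conv' layers
    compute) with zero padding so the output keeps the input's H and W.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel, k odd.
    Returns a (C_out, H, W) map.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) neighborhood
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv_lstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM update (hypothetical parameter names Wx*, Wh*, b*)."""
    gates = {}
    for g in ("i", "f", "o", "g"):
        pre = (conv2d_same(x, params["Wx" + g])
               + conv2d_same(h_prev, params["Wh" + g])
               + params["b" + g][:, None, None])
        # Input/forget/output gates use sigmoid; the candidate uses tanh.
        gates[g] = np.tanh(pre) if g == "g" else sigmoid(pre)
    c = gates["f"] * c_prev + gates["i"] * gates["g"]
    h = gates["o"] * np.tanh(c)
    return h, c


# Usage: run five steps over random 3-channel 7x7 "frame features".
rng = np.random.default_rng(0)
C_IN, C_HID, K, H, W = 3, 4, 3, 7, 7
params = {}
for g in ("i", "f", "o", "g"):
    params["Wx" + g] = 0.1 * rng.standard_normal((C_HID, C_IN, K, K))
    params["Wh" + g] = 0.1 * rng.standard_normal((C_HID, C_HID, K, K))
    params["b" + g] = np.zeros(C_HID)
h = np.zeros((C_HID, H, W))
c = np.zeros((C_HID, H, W))
for t in range(5):
    x_t = rng.standard_normal((C_IN, H, W))
    h, c = conv_lstm_step(x_t, h, c, params)
print(h.shape)  # (4, 7, 7)
```

Because each gate is a convolution, the gate value at a location depends only on a local neighborhood of the input and previous hidden state, and the number of weights does not grow with H and W.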
Findings
Effective on challenging action classification datasets
Enables action localization from its attention, using only the action class label
Outperforms existing methods in accuracy
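How attention becomes a localization is not spelled out on this page, so the following is only one simple, hypothetical post-processing step: keep the attention cells whose weight is at least a fraction of the map's maximum and return their bounding rectangle scaled to frame pixels. The function name `box_from_attention`, the 0.5 relative threshold and the 7x7 grid over a 224x224 frame are assumptions for illustration.

```python
import numpy as np


def box_from_attention(att, frame_hw, rel_thresh=0.5):
    """Hypothetical: bounding box (x0, y0, x1, y1) in frame pixels, with x1/y1
    exclusive, around the cells whose weight is >= rel_thresh * max weight."""
    grid_h, grid_w = att.shape
    rows, cols = np.nonzero(att >= rel_thresh * att.max())
    frame_h, frame_w = frame_hw
    sy, sx = frame_h / grid_h, frame_w / grid_w  # pixels covered by one cell
    return (int(cols.min() * sx), int(rows.min() * sy),
            int((cols.max() + 1) * sx), int((rows.max() + 1) * sy))


# Toy 7x7 attention map peaked on a 2x2 block of cells, plus weak background.
att = np.full((7, 7), 0.01)
att[2:4, 3:5] = 0.2
box = box_from_attention(att, frame_hw=(224, 224))
print(box)  # (96, 64, 160, 128)
```

A threshold relative to the maximum makes the box independent of how the map is scaled, so it works the same on raw or softmax-normalized attention.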
Abstract
We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout; to exploit the spatial correlation, we hardwire convolutions into the soft-attention LSTM architecture. Second, motion not only informs us about the action content but also better guides the attention towards the relevant spatio-temporal locations, so we introduce motion-based attention. Finally, we demonstrate how the attention from VideoLSTM can be used for action localization by relying on just the action class label. Experiments and comparisons on challenging datasets for action…
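Motion-based attention can be sketched as follows: attention logits are computed from optical-flow features and a recurrent hidden state, normalized with a softmax over all spatial locations, and used to re-weight the appearance features cell by cell. This is a hedged NumPy sketch, not the paper's exact formulation: the 1x1 convolutions (written as `np.tensordot`), the function names, which hidden state drives the attention, and the element-wise re-weighting (rather than pooling into a single vector) are assumptions chosen so the output keeps its spatial layout for a convolutional LSTM.

```python
import numpy as np


def spatial_softmax(z):
    """Softmax over all H*W positions of one (H, W) logit map."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def motion_attention(x_app, x_flow, h_prev, w_x, w_h, w_z):
    """Hypothetical motion-guided soft attention (1x1 convolutions for brevity).

    x_app:    (C, H, W) appearance feature map at time t
    x_flow:   (C, H, W) optical-flow feature map at time t
    h_prev:   (C, H, W) previous hidden state of the recurrence driving attention
    w_x, w_h: (D, C) 1x1-conv weights; w_z: (D,) projection to one logit per cell
    """
    pre = (np.tensordot(w_x, x_flow, axes=([1], [0]))
           + np.tensordot(w_h, h_prev, axes=([1], [0])))       # (D, H, W)
    logits = np.tensordot(w_z, np.tanh(pre), axes=([0], [0]))  # (H, W)
    a = spatial_softmax(logits)                                # sums to 1
    # Re-weight appearance features cell by cell; the (C, H, W) layout is
    # kept, so the result can feed a convolutional LSTM.
    return a, a[None, :, :] * x_app


rng = np.random.default_rng(0)
C, D, H, W = 8, 4, 7, 7
x_app = rng.standard_normal((C, H, W))
x_flow = rng.standard_normal((C, H, W))
h_prev = np.zeros((C, H, W))
a, x_att = motion_attention(x_app, x_flow, h_prev,
                            rng.standard_normal((D, C)),
                            rng.standard_normal((D, C)),
                            rng.standard_normal(D))
print(a.shape, x_att.shape)  # (7, 7) (8, 7, 7)
```

In this sketch the logits depend on flow, not appearance, so where the weights go is decided by motion while the features being weighted remain appearance features.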
Videos
VideoLSTM: Convolves, attends, and flows for action recognition (YouTube)
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
