Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Sebastian Agethen; Winston H. Hsu

arXiv:1908.08990 · cs.CV · August 27, 2019

TL;DR

This paper introduces a multi-kernel convolutional LSTM network with an attention mechanism for video action recognition and reports improved accuracy on the UCF-101 and Sports-1M datasets.

Contribution

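The paper proposes a multi-kernel convolutional LSTM architecture, in which multiple convolutional kernels and layers replace the single kernel of a standard convolutional LSTM, together with an attention mechanism designed specifically for that extension.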

Findings

01. Improved accuracy on the UCF-101 dataset

02. Enhanced performance on the Sports-1M dataset

03. Qualitative analysis of model characteristics

Abstract

Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and…
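
The multi-kernel idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes single-channel frames, kernel sizes of 1, 3 and 5, and a fixed weighted sum that merges the per-kernel responses, none of which are stated in this excerpt. All names (`KERNEL_SIZES`, `multi_kernel`, `init_params`, `cell_step`) are hypothetical, and the stacked layers and the attention mechanism are left out.

```python
import math
import random

KERNEL_SIZES = (1, 3, 5)          # assumed sizes, not taken from the paper
GATES = ("i", "f", "o", "g")      # input, forget, output gates and candidate


def conv2d_same(frame, kernel):
    """Single-channel 2-D cross-correlation with zero padding (same output size)."""
    rows, cols = len(frame), len(frame[0])
    k = len(kernel)               # square, odd-sized kernel assumed
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += frame[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def multi_kernel(frame, kernels, mix):
    """Convolve with every kernel, then merge the responses per pixel.

    The weighted sum is an assumed stand-in for whatever merging the paper uses.
    """
    responses = [conv2d_same(frame, k) for k in kernels]
    rows, cols = len(frame), len(frame[0])
    return [[sum(m * r[y][x] for m, r in zip(mix, responses))
             for x in range(cols)] for y in range(rows)]


def init_params(seed=0):
    """Random kernel banks for the input and hidden paths of each gate."""
    rng = random.Random(seed)

    def bank():
        kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
                   for k in KERNEL_SIZES]
        mix = [1.0 / len(KERNEL_SIZES)] * len(KERNEL_SIZES)
        return kernels, mix

    return {g: {"x": bank(), "h": bank(), "b": 0.0} for g in GATES}


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cell_step(x, h_prev, c_prev, params):
    """One convolutional LSTM step where each transition uses a kernel bank."""
    rows, cols = len(x), len(x[0])
    pre = {}
    for g in GATES:
        from_x = multi_kernel(x, *params[g]["x"])        # input-to-hidden
        from_h = multi_kernel(h_prev, *params[g]["h"])   # hidden-to-hidden
        b = params[g]["b"]
        pre[g] = [[from_x[r][c] + from_h[r][c] + b for c in range(cols)]
                  for r in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            i = sigmoid(pre["i"][r][c])
            f = sigmoid(pre["f"][r][c])
            o = sigmoid(pre["o"][r][c])
            cand = math.tanh(pre["g"][r][c])
            c_new[r][c] = f * c_prev[r][c] + i * cand    # cell state update
            h_new[r][c] = o * math.tanh(c_new[r][c])     # hidden state
    return h_new, c_new


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Run three random 4x6 "frames" through the cell.
params = init_params()
rng = random.Random(1)
h, c = zeros(4, 6), zeros(4, 6)
for _ in range(3):
    frame = [[rng.random() for _ in range(6)] for _ in range(4)]
    h, c = cell_step(frame, h, c, params)
```

Setting `KERNEL_SIZES = (3,)` reduces the sketch to a single-kernel convolutional LSTM, which makes the contrast described in the abstract easy to see: the single-kernel cell must commit to one receptive field, while the kernel bank covers several at once.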

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory