Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition
Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

TL;DR
This paper introduces a deep fusion framework combining CNNs and LSTMs for human action recognition in videos, achieving higher accuracy than existing methods across multiple datasets.
Contribution
It proposes a simple yet effective hierarchical multi-stream fusion method that leverages spatial and temporal features for improved action recognition.
Findings
Outperforms state-of-the-art methods on the UCF11, UCFSports, and jHMDB datasets.
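When both feature sets are combined, the fully connected features act as an attention mechanism that directs the LSTM to informative parts of the convolutional feature sequence.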
Demonstrates the effectiveness of combining CNN and LSTM features in video analysis.
Abstract
In this paper we address the problem of human action recognition from video sequences. Inspired by the exemplary results obtained via automatic feature learning and deep learning approaches in computer vision, we focus our attention towards learning salient spatial features via a convolutional neural network (CNN) and then map their temporal relationship with the aid of Long-Short-Term-Memory (LSTM) networks. Our contribution in this paper is a deep fusion framework that more effectively exploits spatial features from CNNs with temporal features from LSTM models. We also extensively evaluate their strengths and weaknesses. We find that by combining both the sets of features, the fully connected features effectively act as an attention mechanism to direct the LSTM to interesting parts of the convolutional feature sequence. The significance of our fusion method is its simplicity and…
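To make the fusion idea in the abstract concrete, the following is a minimal sketch in pure Python. It is not the authors' implementation. It assumes per-frame convolutional and fully connected features have already been extracted by a CNN, runs one LSTM over each sequence, and fuses the two streams by concatenating their final hidden states before a softmax classifier. The concatenation step, the layer sizes, the random weights, and all function names (`make_lstm`, `run_lstm`, `fuse_and_classify`) are illustrative assumptions; the truncated abstract does not specify the exact fusion architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v, b):
    # Dense layer: W @ v + b, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def make_lstm(input_size, hidden_size, rng):
    # One weight matrix per gate (input, forget, candidate, output),
    # each acting on the concatenation [x_t, h_{t-1}].
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {"W": {gate: mat() for gate in "ifgo"},
            "b": {gate: [0.0] * hidden_size for gate in "ifgo"},
            "hidden_size": hidden_size}


def run_lstm(params, sequence):
    # Standard LSTM recurrence; returns the final hidden state.
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    W, b = params["W"], params["b"]
    for x in sequence:
        z = list(x) + h
        i = [sigmoid(a) for a in matvec(W["i"], z, b["i"])]    # input gate
        f = [sigmoid(a) for a in matvec(W["f"], z, b["f"])]    # forget gate
        g = [math.tanh(a) for a in matvec(W["g"], z, b["g"])]  # candidate cell
        o = [sigmoid(a) for a in matvec(W["o"], z, b["o"])]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def fuse_and_classify(conv_seq, fc_seq, conv_lstm, fc_lstm, W_out, b_out):
    # ASSUMPTION: fusion by concatenating the two streams' final hidden states.
    fused = run_lstm(conv_lstm, conv_seq) + run_lstm(fc_lstm, fc_seq)
    logits = matvec(W_out, fused, b_out)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    frames, conv_dim, fc_dim, classes = 5, 8, 4, 3  # hypothetical sizes
    conv_seq = [[rng.random() for _ in range(conv_dim)] for _ in range(frames)]
    fc_seq = [[rng.random() for _ in range(fc_dim)] for _ in range(frames)]
    conv_lstm = make_lstm(conv_dim, 6, rng)
    fc_lstm = make_lstm(fc_dim, 3, rng)
    W_out = [[rng.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(classes)]
    print(fuse_and_classify(conv_seq, fc_seq, conv_lstm, fc_lstm,
                            W_out, [0.0] * classes))
```

The weights here are random and untrained, so the class probabilities carry no meaning; the sketch only shows how two temporal streams could be combined into a single prediction. The sigmoid and tanh calls inside `run_lstm` correspond to the gate and cell activations of a standard LSTM.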
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
