Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue

TL;DR
This paper introduces a hybrid deep learning framework that combines CNNs and LSTMs to model static spatial information, short-term motion, and long-term temporal clues for video classification.
Contribution
The work presents a hybrid framework that fuses spatial and short-term motion features from two CNNs in a regularized feature fusion network, and applies LSTMs on top of the same features to model longer-term temporal clues, achieving state-of-the-art results on UCF-101 and CCV.
Findings
Achieved 91.3% accuracy on the UCF-101 benchmark.
Achieved 83.5% accuracy on the CCV benchmark.
Fusing the spatial and short-term motion features outperforms classifying directly with the CNNs.
Abstract
Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the…
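The data flow described in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: random vectors stand in for the per-frame outputs of the two CNNs, a single linear layer over mean-pooled features stands in for the regularized feature fusion network (its regularization is not reproduced), biases are omitted, and averaging the three score vectors is an assumption, since the text above does not say how the predictions are combined. All names (`TinyLSTM`, `classify_video`, `fusion_W`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(frames):
    # Average each feature dimension over time.
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


class TinyLSTM:
    """Single-layer LSTM over per-frame feature vectors (biases omitted)."""

    def __init__(self, in_dim, hid_dim, n_classes, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}
        self.W_out = rand_matrix(n_classes, hid_dim, rng)

    def forward(self, frames):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in frames:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
            g = [math.tanh(a) for a in matvec(self.W["g"], z)]  # candidate cell
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        # Classify the video from the last hidden state.
        return softmax(matvec(self.W_out, h))


def classify_video(spatial_frames, motion_frames, fusion_W, lstm_s, lstm_m):
    # Stand-in for the fusion network: pool over time, concatenate, classify.
    fused_in = mean_pool(spatial_frames) + mean_pool(motion_frames)
    p_fusion = softmax(matvec(fusion_W, fused_in))
    # One LSTM per feature type models the longer-term temporal order.
    p_s = lstm_s.forward(spatial_frames)
    p_m = lstm_m.forward(motion_frames)
    # Assumption: combine the three predictions by simple averaging.
    return [(a + b + c) / 3.0 for a, b, c in zip(p_fusion, p_s, p_m)]


rng = random.Random(0)
T, D_S, D_M, HID, N_CLASSES = 5, 8, 6, 4, 3  # toy sizes, chosen arbitrarily
spatial = [[rng.random() for _ in range(D_S)] for _ in range(T)]
motion = [[rng.random() for _ in range(D_M)] for _ in range(T)]
fusion_W = rand_matrix(N_CLASSES, D_S + D_M, rng)
probs = classify_video(spatial, motion, fusion_W,
                       TinyLSTM(D_S, HID, N_CLASSES, rng),
                       TinyLSTM(D_M, HID, N_CLASSES, rng))
print(probs)
```

With untrained random weights the printed class probabilities are close to uniform; the point is only to show where the spatial stream, the motion stream, the fusion step, and the two LSTMs sit relative to one another.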
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory
