Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Zuxuan Wu; Xi Wang; Yu-Gang Jiang; Hao Ye; Xiangyang Xue

arXiv:1504.01561 · cs.CV · April 8, 2015 · 26 citations

TL;DR

This paper introduces a hybrid deep learning framework that combines CNNs and LSTMs to model static spatial information, short-term motion, and long-term temporal clues, improving video classification performance.

Contribution

The work presents a hybrid framework that integrates CNN-based feature fusion with LSTM modeling to capture several complementary aspects of video content, reporting state-of-the-art results on the UCF-101 and CCV benchmarks at the time of publication.
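
The framework produces class scores from more than one component (the fusion network and the LSTMs). How the paper weights them is not stated in this summary, so the following is a minimal sketch assuming a simple weighted average of per-class scores. The stream names, the weights, and the functions `combine_scores` and `predict` are hypothetical.

```python
from typing import Dict, List


def combine_scores(streams: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-class scores from several model streams (assumed scheme)."""
    total = sum(weights[name] for name in streams)
    num_classes = len(next(iter(streams.values())))
    fused = [0.0] * num_classes
    for name, scores in streams.items():
        if len(scores) != num_classes:
            raise ValueError(f"stream {name!r} has {len(scores)} scores, expected {num_classes}")
        w = weights[name] / total  # normalize so the weights sum to 1
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(fused: List[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical per-class probabilities for one video (3 classes).
streams = {
    "fusion_net": [0.7, 0.2, 0.1],
    "lstm_spatial": [0.5, 0.3, 0.2],
    "lstm_motion": [0.2, 0.6, 0.2],
}
weights = {"fusion_net": 1.0, "lstm_spatial": 1.0, "lstm_motion": 1.0}
fused = combine_scores(streams, weights)
print(fused, predict(fused))
```

Setting a weight to zero drops that stream entirely; in a real system such weights would presumably be tuned on held-out data, which is also an assumption here.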

Findings

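1. Achieved 91.3% accuracy on the UCF-101 benchmark.
2. Achieved 83.5% mean average precision (mAP) on the Columbia Consumer Videos (CCV) benchmark.
3. Fusing the spatial and short-term motion features in the regularized fusion network outperforms classifying directly with the individual CNNs.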
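
UCF-101 results are usually reported as top-1 accuracy, while CCV results are conventionally reported as mean average precision (mAP), so the two numbers are not directly comparable. A minimal sketch of both metrics follows, assuming per-class score lists and non-interpolated AP; the function names are hypothetical and this is not either benchmark's official evaluation code.

```python
from typing import List, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of videos whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        correct += int(pred == label)
    return correct / len(labels)


def average_precision(class_scores: Sequence[float], relevant: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    ranked = sorted(zip(class_scores, relevant), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(scores, label_matrix) -> float:
    """Average of per-class AP; label_matrix[v][c] is 1 if video v belongs to class c."""
    num_classes = len(scores[0])
    aps = [
        average_precision([row[c] for row in scores], [lab[c] for lab in label_matrix])
        for c in range(num_classes)
    ]
    return sum(aps) / num_classes


scores = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
labels = [0, 1, 1]
one_hot = [[int(c == y) for c in range(2)] for y in labels]
print(top1_accuracy(scores, labels), mean_average_precision(scores, one_hot))
```

In this toy example the third video is misclassified, so accuracy is 2/3, yet each class still ranks all of its positives above its negatives and mAP is 1.0: accuracy checks the top prediction per video, while AP checks the ranking of videos per class.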

Abstract

Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the…
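
The regularized feature fusion network described in the abstract can be sketched as a forward pass plus a regularized loss. This minimal plain-Python sketch assumes early concatenation of the two CNN feature vectors, a single ReLU fusion layer, a softmax output, and an ℓ2,1 (group-lasso) penalty on the fusion weights. The exact layer layout and regularizer used by the authors are not given in this summary, so every name and choice below is an assumption.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def affine(W: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    """W has shape (out_dim, in_dim); returns W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def l21_norm(W: Matrix) -> float:
    """Assumed regularizer: sum over input dimensions (columns of W) of each column's L2 norm."""
    return sum(math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0])))


def fusion_forward(spatial, motion, W_f, b_f, W_o, b_o) -> List[float]:
    x = list(spatial) + list(motion)  # early concatenation of the two CNN features
    h = relu(affine(W_f, x, b_f))  # shared fusion layer
    return softmax(affine(W_o, h, b_o))  # class probabilities


def regularized_loss(probs, label: int, W_f: Matrix, lam: float) -> float:
    """Cross-entropy for one video plus lam times the l2,1 penalty on the fusion weights."""
    return -math.log(probs[label]) + lam * l21_norm(W_f)


# Toy example: 2-d spatial feature, 2-d motion feature, 2 hidden units, 2 classes.
W_f = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
b_f = [0.0, 0.0]
W_o = [[1.0, 0.0], [0.0, 1.0]]
b_o = [0.0, 0.0]
probs = fusion_forward([1.0, 0.0], [0.0, 2.0], W_f, b_f, W_o, b_o)
loss = regularized_loss(probs, label=1, W_f=W_f, lam=0.1)
print(probs, loss)
```

Grouping the penalty by input column pushes whole dimensions of the concatenated feature toward zero together, which is one plausible way for a fusion layer to learn which spatial and motion dimensions to rely on.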

Code & Models

Repositories

tejgvsl/Camera-motion-classification-in-a-video-file (TensorFlow)

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory
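
The methods listed above (sigmoid, tanh, softmax, LSTM) correspond to the standard LSTM cell followed by a softmax classifier. A minimal plain-Python sketch, assuming a single-layer LSTM run over a sequence of per-frame feature vectors (for example, CNN features) with a softmax readout on the final hidden state; the readout choice, the sizes, and all function names are assumptions rather than the paper's configuration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]
Gate = Tuple[Mat, Mat, Vec]  # (W: hidden x input, U: hidden x hidden, b: hidden)


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gate_preact(gate: Gate, x: Sequence[float], h: Sequence[float]) -> Vec:
    """Compute W x + U h + b for one gate."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(W[k], x)) + sum(u * hj for u, hj in zip(U[k], h)) + b[k]
        for k in range(len(b))
    ]


def lstm_step(x, h, c, params: Dict[str, Gate]) -> Tuple[Vec, Vec]:
    i = [sigmoid(a) for a in gate_preact(params["i"], x, h)]  # input gate
    f = [sigmoid(a) for a in gate_preact(params["f"], x, h)]  # forget gate
    o = [sigmoid(a) for a in gate_preact(params["o"], x, h)]  # output gate
    g = [math.tanh(a) for a in gate_preact(params["g"], x, h)]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(z: Sequence[float]) -> Vec:
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [v / s for v in e]


def classify_sequence(frames, params: Dict[str, Gate], W_y: Mat, b_y: Vec) -> Vec:
    """Run the LSTM over all frames, then apply softmax to a linear map of the last h."""
    hidden = len(params["i"][2])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    logits = [sum(w * hk for w, hk in zip(row, h)) + bk for row, bk in zip(W_y, b_y)]
    return softmax(logits)


def zero_gate(input_dim: int, hidden_dim: int) -> Gate:
    """All-zero gate parameters, handy for testing."""
    return (
        [[0.0] * input_dim for _ in range(hidden_dim)],
        [[0.0] * hidden_dim for _ in range(hidden_dim)],
        [0.0] * hidden_dim,
    )


params = {name: zero_gate(2, 3) for name in ("i", "f", "o", "g")}
W_y = [[0.0] * 3 for _ in range(4)]
b_y = [0.0] * 4
probs = classify_sequence([[0.1, 0.2], [0.3, 0.4]], params, W_y, b_y)
print(probs)  # all-zero weights give a uniform distribution over the 4 classes
```

With trained weights, the same loop would be run separately over the spatial and the motion feature sequences, matching the abstract's description of LSTMs applied on top of the two features.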