Evaluating Two-Stream CNN for Video Classification

Hao Ye; Zuxuan Wu; Rui-Wei Zhao; Xi Wang; Yu-Gang Jiang; Xiangyang Xue

arXiv:1504.01920 · cs.CV · April 9, 2015

TL;DR

This paper thoroughly evaluates the implementation choices of a two-stream CNN for video classification, providing practical guidelines and achieving competitive results on benchmark datasets.

Contribution

It offers a comprehensive analysis of various design options for two-stream CNNs, guiding future research in effective video classification methods.

Findings

01 Optimal network architectures identified

02 Model fusion improves classification accuracy

03 Guidelines for parameter settings established
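
As a rough illustration of what fusing the two streams can look like, the sketch below combines per-class scores from a spatial (static frame) stream and a temporal (optical flow) stream by weighted averaging of their softmax outputs. This is an assumption for illustration only: the summary above does not say which fusion scheme the paper uses, and the names `fuse_scores`, `predict`, and `temporal_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion by weighted averaging of the two streams' probabilities.

    temporal_weight is a hypothetical tuning parameter in [0, 1];
    0 uses only the spatial stream, 1 uses only the temporal stream.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(p_spatial, p_temporal)
    ]


def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_scores(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes; not data from the paper.
spatial = [2.0, 1.0, 0.1]
temporal = [0.5, 2.5, 0.3]
fused = fuse_scores(spatial, temporal)
label = predict(spatial, temporal)
```

In this toy input the spatial stream alone favors class 0, while the temporal stream is confident enough in class 1 that the equally weighted fusion picks class 1. In practice the weight would be chosen on held-out validation data.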

Abstract

Videos contain very rich semantic information. Traditional hand-crafted features are known to be inadequate in analyzing complex video semantics. Inspired by the huge success of the deep learning methods in analyzing image, audio and text data, significant efforts are recently being devoted to the design of deep nets for video analytics. Among the many practical needs, classifying videos (or video clips) based on their major semantic categories (e.g., "skiing") is useful in many applications. In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification. Our evaluations are conducted on top of a recent two-stream convolutional neural network (CNN) pipeline, which uses both static frames and motion optical flows, and has demonstrated competitive performance against the state-of-the-art…
