Towards Good Practices for Very Deep Two-Stream ConvNets

Limin Wang; Yuanjun Xiong; Zhe Wang; Yu Qiao

arXiv:1507.02159 · cs.CV · July 9, 2015 · 385 cites

TL;DR

This paper introduces very deep two-stream convolutional networks for action recognition in videos, combining good training practices with multi-GPU training to improve accuracy on the UCF101 dataset.

Contribution

It adapts recent very deep architectures to video action recognition and proposes effective training practices for small datasets.
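Adapting an image-domain network to the temporal (optical-flow) stream requires changing its first convolution, because the input is no longer a 3-channel RGB image. One approach, assumed for this sketch rather than taken from the summary above, initializes that layer from ImageNet-pretrained RGB filters: average each filter over its three color channels, then replicate the averaged kernel once per flow channel. The 20-channel default (10 stacked frames × horizontal and vertical flow), the nested-list filter layout, and the name `adapt_rgb_filters` are all hypothetical choices for illustration.

```python
# Hypothetical sketch: initialize a temporal-stream first layer from
# pretrained RGB filters by channel averaging and replication (assumed scheme).
from typing import List

# Assumed layout of one filter: [in_channels][kernel_h][kernel_w]
Filter = List[List[List[float]]]


def adapt_rgb_filters(rgb_filters: List[Filter], flow_channels: int = 20) -> List[Filter]:
    """Average each filter over its input (RGB) channels, then copy the
    averaged kernel once for every stacked optical-flow channel."""
    adapted = []
    for filt in rgb_filters:
        n_in = len(filt)
        kh, kw = len(filt[0]), len(filt[0][0])
        # Mean over input channels, position by position.
        mean_kernel = [
            [sum(filt[c][i][j] for c in range(n_in)) / n_in for j in range(kw)]
            for i in range(kh)
        ]
        # Independent copies so later training can update each channel separately.
        adapted.append([[row[:] for row in mean_kernel] for _ in range(flow_channels)])
    return adapted
```

For example, a single 1×1 filter with RGB weights 0.3, 0.6 and 0.9 becomes 20 input channels that each hold roughly 0.6. Averaging rather than picking one color channel keeps the filter's overall response scale close to the pretrained one.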

Findings

01

Achieved 91.4% accuracy on the UCF101 dataset.

02

Demonstrated that very deep architectures are effective for action recognition when combined with good training practices.

03

Extended Caffe for efficient multi-GPU training.
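Multi-GPU training of this kind typically relies on synchronous data parallelism: each GPU computes gradients on a slice of the minibatch, and the averaged gradient drives a single shared update. The sketch below illustrates only that general pattern, with workers simulated sequentially in pure Python; it is not the authors' Caffe extension, and `data_parallel_step` and its parameters are hypothetical names.

```python
# Hypothetical sketch of synchronous data-parallel SGD, the general pattern
# behind multi-GPU training. Workers are simulated one after another.
from typing import Callable, List, Sequence

Grad = List[float]


def data_parallel_step(
    params: List[float],
    batch: Sequence,
    grad_fn: Callable[[List[float], Sequence], Grad],
    num_workers: int,
    lr: float,
) -> List[float]:
    """Split the minibatch into shards, weight each shard's mean gradient by
    its size, average over the whole batch, and apply one SGD update."""
    shard_size = (len(batch) + num_workers - 1) // num_workers
    shards = [batch[i:i + shard_size] for i in range(0, len(batch), shard_size)]
    total = [0.0] * len(params)
    for shard in shards:  # on real hardware, each shard runs on its own GPU
        g = grad_fn(params, shard)  # assumed to return the shard's mean gradient
        for k, gk in enumerate(g):
            total[k] += gk * len(shard)
    avg = [t / len(batch) for t in total]
    return [p - lr * a for p, a in zip(params, avg)]
```

Because each shard's contribution is weighted by its size, the update matches what a single device would compute on the full minibatch, whatever the number of workers. For instance, with a squared-error gradient `w - x` averaged over the batch `[1.0, 2.0, 3.0, 4.0]`, starting from `w = 0.0` and `lr = 0.1`, one step gives `w` close to 0.25 for 1, 2 or 3 workers.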

Abstract

Deep convolutional networks have achieved great success in object recognition in still images. For action recognition in videos, however, the improvement they bring is less evident. We argue that two reasons could explain this result. First, current network architectures (e.g. Two-stream ConvNets) are relatively shallow compared with the very deep models used in the image domain (e.g. VGGNet, GoogLeNet), so their modeling capacity is constrained by their depth. Second, and probably more importantly, action recognition training datasets are extremely small compared with ImageNet, which makes it easy to over-fit the training data. To address these issues, this report presents very deep two-stream ConvNets for action recognition, obtained by adapting recent very deep architectures to the video domain. However, this…
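Two-stream ConvNets process appearance (RGB frames) and motion (optical flow) in separate networks and then combine their predictions. A common way to combine them is late fusion, averaging the two streams' class scores. The sketch below assumes a weighted average of softmax probabilities with user-chosen weights; the weights, the function names, and the choice of softmax scores are assumptions, not the exact fusion rule reported in the paper.

```python
# Hypothetical sketch of two-stream late fusion: a weighted average of the
# spatial (RGB) and temporal (optical-flow) networks' class probabilities.
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_two_stream(
    spatial_logits: List[float],
    temporal_logits: List[float],
    w_spatial: float = 1.0,  # assumed fusion weight
    w_temporal: float = 1.0,  # assumed fusion weight
) -> Tuple[int, List[float]]:
    """Return (predicted class index, fused probabilities)."""
    ps = softmax(spatial_logits)
    pt = softmax(temporal_logits)
    total = w_spatial + w_temporal
    fused = [(w_spatial * a + w_temporal * b) / total for a, b in zip(ps, pt)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

With equal weights, a confident temporal stream can override a weakly confident spatial stream; raising `w_spatial` shifts the decision back toward appearance. Averaging probabilities rather than raw logits keeps the two streams on a common scale.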

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications