SmallBigNet: Integrating Core and Contextual Views for Video Classification

Xianhang Li; Yali Wang; Zhipeng Zhou; Yu Qiao

arXiv:2006.14582·cs.CV·June 26, 2020

TL;DR

SmallBigNet enhances video classification by integrating core and contextual views through a dual-branch network, improving accuracy and robustness while maintaining model compactness.

Contribution

The paper introduces SmallBigNet, a novel architecture that combines small and big view branches with shared convolutions to improve video representation learning.

Findings

01. Outperforms recent state-of-the-art methods on Kinetics400 and other benchmarks.

02. Achieves high accuracy with model size comparable to 2D CNNs.

03. Demonstrates improved robustness and discriminative power in video classification.

Abstract

Temporal convolution has been widely used for video classification. However, it is performed on spatio-temporal contexts in a limited view, which often weakens its capacity of learning video representation. To alleviate this problem, we propose a concise and novel SmallBig network, with the cooperation of small and big views. For the current time step, the small view branch is used to learn the core semantics, while the big view branch is used to capture the contextual semantics. Unlike traditional temporal convolution, the big view branch can provide the small view branch with the most activated video features from a broader 3D receptive field. Via aggregating such big-view contexts, the small view branch can learn more robust and discriminative spatio-temporal representations for video classification. Furthermore, we propose to share convolution in the small and big view branch, which…
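The two-branch idea in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `SmallBigUnit`, the 3x3x3 max pooling used as the "most activated features from a broader 3D receptive field", the 1x3x3 shared kernel, the separate batch-norm layers, and the additive fusion are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallBigUnit(nn.Module):
    """Illustrative small/big view unit (hypothetical name and layout)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # One convolution whose weights are reused by both branches,
        # so the big view adds no extra convolution parameters.
        self.shared_conv = nn.Conv3d(
            in_channels, out_channels,
            kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False,
        )
        # Big view (assumed): keep the strongest activation in each
        # 3x3x3 spatio-temporal neighbourhood; stride 1 preserves shape.
        self.big_pool = nn.MaxPool3d(kernel_size=3, stride=1, padding=1)
        # Assumed: each branch gets its own normalisation layer.
        self.small_bn = nn.BatchNorm3d(out_channels)
        self.big_bn = nn.BatchNorm3d(out_channels)

    def forward(self, x):
        # x has shape (batch, channels, time, height, width).
        small = self.small_bn(self.shared_conv(x))
        big = self.big_bn(self.shared_conv(self.big_pool(x)))
        # Assumed fusion: add the contextual features to the core ones.
        return F.relu(small + big)


if __name__ == "__main__":
    unit = SmallBigUnit(in_channels=4, out_channels=8)
    clip = torch.randn(1, 4, 4, 8, 8)
    print(unit(clip).shape)
```

Because the pooling layer has no learnable weights and the convolution is shared, the only parameters the big view branch adds in this sketch are those of its batch-norm layer, which is one way to read the claim that model size stays comparable to a 2D CNN.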

Code & Models

Repositories

xhl-video/SmallBigNet (PyTorch, official)

Videos

SmallBigNet: Integrating Core and Contextual Views for Video Classification · YouTube

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods

Methods: Convolution