Recurrent Residual Module for Fast Inference in Videos

Bowen Pan; Wuwei Lin; Xiaolin Fang; Chaoqin Huang; Bolei Zhou; Cewu Lu

arXiv:1802.09723 · cs.CV · February 28, 2018

TL;DR

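This paper introduces the Recurrent Residual Module (RRM), a framework that accelerates CNN inference on videos by exploiting the similarity between the feature maps of consecutive frames. It reports 2x to 12x acceleration on standard CNNs and up to 500x on binary networks such as XNOR-Nets, while maintaining similar recognition performance.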

Contribution

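The paper presents the RRM framework, which reduces redundant computation in CNNs for video recognition by reusing the previous frame's intermediate results. Unlike previous work, it still computes the feature maps of each frame precisely, so the speedup does not come from approximating them.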

Findings

01. Achieves 2x to 12x acceleration on standard CNNs

02. Attains 500x speedup on binary networks like XNOR-Nets

03. Maintains recognition performance while significantly speeding up inference

Abstract

Deep convolutional neural networks (CNNs) have made impressive progress in many video recognition tasks such as video pose estimation and video object detection. However, CNN inference on video is computationally expensive because dense frames are processed individually. In this work, we propose a framework called Recurrent Residual Module (RRM) to accelerate CNN inference for video recognition tasks. The framework exploits the similarity between the intermediate feature maps of two consecutive frames to greatly reduce redundant computation. One unique property of the proposed method compared to previous work is that the feature maps of each frame are precisely computed. The experiments show that, while maintaining similar recognition performance, our RRM yields an average 2x acceleration on commonly used CNNs such as AlexNet, ResNet, deep compression model (thus…
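The core idea in the abstract, reusing the previous frame's feature maps and computing only what changed, rests on the linearity of convolutional and fully connected layers: W·x_t = W·x_{t-1} + W·(x_t − x_{t-1}). The following is a minimal NumPy sketch of that identity for a single dense layer, not the authors' implementation. The function names (`dense_layer`, `residual_layer`), the layer sizes, and the choice of 16 changed inputs are all assumptions made for illustration.

```python
import numpy as np


def dense_layer(W, x):
    """Reference path: full pre-activation computed from scratch for one frame."""
    return W @ x


def residual_layer(W, prev_preact, delta):
    """Hypothetical helper: update a cached pre-activation from a sparse input change.

    Only the columns of W that meet a nonzero entry of `delta` are used,
    so the work scales with the number of changed inputs.
    """
    nz = np.flatnonzero(delta)
    return prev_preact + W[:, nz] @ delta[nz], nz.size


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))      # illustrative layer weights
frame_prev = rng.standard_normal(256)   # flattened input for frame t-1

# Assume frame t differs from frame t-1 in only 16 of 256 positions.
frame_curr = frame_prev.copy()
changed = rng.choice(256, size=16, replace=False)
frame_curr[changed] += 1.0

cached = dense_layer(W, frame_prev)     # computed once for frame t-1
delta = frame_curr - frame_prev         # sparse difference between frames
updated, n_nonzero = residual_layer(W, cached, delta)

# The nonlinearity is applied after the pre-activation has been recovered.
activation = np.maximum(updated, 0.0)

reference = dense_layer(W, frame_curr)  # what a from-scratch pass would give
max_abs_error = float(np.max(np.abs(updated - reference)))
mac_ratio = W.shape[1] / n_nonzero      # dense vs. sparse multiply-adds per output

print(f"nonzero inputs: {n_nonzero}, max abs error: {max_abs_error:.2e}")
print(f"multiply-add reduction for this layer: {mac_ratio:.1f}x")
```

In this sketch the updated pre-activation matches the from-scratch result up to floating-point rounding, which is the sense in which each frame's feature maps are still precisely computed. The reduction in multiply-adds depends entirely on how sparse the frame-to-frame difference is, so the ratio above says nothing about the speedups reported in the paper.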

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods

Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Kaiming Initialization · Residual Connection · Convolution · Residual Block · Local Response Normalization