TL;DR
CBinfer exploits the sparsity of frame-to-frame pixel changes to accelerate convolutional neural network inference on video streams, reaching an average 9.1x speed-up over cuDNN on the Tegra X2 with less than 0.1% accuracy loss and enabling real-time processing on embedded devices.
Contribution
The paper introduces a novel algorithm that exploits temporal pixel change sparsity for faster CNN inference without retraining, achieving substantial speed-ups on embedded platforms.
Findings
Average speed-up of 9.1x over cuDNN on the Tegra X2 for semantic segmentation, at an accuracy loss below 0.1% and without retraining
Average speed-up of 7.0x for a pose detection DNN
5x fewer arithmetic operations for object detection
Abstract
The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms. We adopt an orthogonal viewpoint and propose a novel algorithm exploiting the spatio-temporal sparsity of pixel changes. This optimized inference procedure resulted in an average speed-up of 9.1x over cuDNN on the Tegra X2 platform at a negligible accuracy loss of <0.1% and no retraining of the network for a semantic segmentation application. Similarly, an average speed-up of 7.0x has been…
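To make the idea of exploiting pixel-change sparsity concrete, here is a minimal sketch in Python with NumPy. It is not the CBinfer implementation, which targets the GPU; it is a single-channel CPU illustration written under stated assumptions. The function names (`conv2d_same`, `change_based_update`), the `threshold` parameter, and the choice to keep the previous input value wherever the change stays below the threshold are all hypothetical and chosen only for illustration. The sketch detects changed input pixels, dilates that mask by the filter footprint to find the affected output pixels, and recomputes only those.

```python
import numpy as np


def conv2d_same(x, w):
    """Dense reference: 2D cross-correlation with zero padding (odd square kernel)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, p)
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out


def change_based_update(prev_in, prev_out, new_in, w, threshold=0.0):
    """Recompute only the output pixels whose receptive field saw a change.

    Illustrative sketch, not the paper's kernels. Returns the input state to
    carry forward, the updated output, and the number of recomputed pixels.
    """
    h, wd = new_in.shape
    k = w.shape[0]
    p = k // 2

    # 1. Change detection: pixels whose value moved by more than the threshold.
    changed = np.abs(new_in - prev_in) > threshold

    # Assumption: sub-threshold changes are ignored by keeping the old value.
    cur_in = np.where(changed, new_in, prev_in)

    # 2. Dilate the change mask by the kernel footprint to find every output
    #    pixel that depends on at least one changed input pixel.
    mp = np.pad(changed, p)
    affected = np.zeros_like(changed)
    for di in range(k):
        for dj in range(k):
            affected |= mp[di:di + h, dj:dj + wd]

    # 3. Recompute only the affected outputs; reuse the rest from last frame.
    xp = np.pad(cur_in, p)
    out = prev_out.copy()
    for i, j in zip(*np.nonzero(affected)):
        out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)

    return cur_in, out, int(affected.sum())
```

With a threshold of zero the update reproduces the dense result exactly; a positive threshold trades a small output error for fewer recomputed pixels. In a mostly static scene only a small fraction of output pixels is touched per frame, which is where the reduction in arithmetic operations comes from.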
