Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

Jiong Gong; Haihao Shen; Guoming Zhang; Xiaoli Liu; Shane Li; Ge Jin; Niharika Maheshwari; Evarist Fomenko; Eden Segal

arXiv:1805.08691·cs.CV·May 23, 2018·1 cites

TL;DR

This paper introduces IntelCaffe, an optimized deep learning framework that enables efficient 8-bit low precision inference on Intel Xeon processors, significantly improving throughput and latency with minimal accuracy loss.

Contribution

It presents the first Intel-optimized framework supporting automatic 8-bit model inference without retraining, boosting performance of CNNs on Intel hardware.

Findings

01 Inference throughput improved by 1.38X-2.9X over the IntelCaffe FP32 baseline

02 Latency improved by 1.35X-3X over the IntelCaffe FP32 baseline

03 Negligible accuracy loss compared to the IntelCaffe FP32 baseline

Abstract

High-throughput, low-latency inference of deep neural networks is critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel-optimized deep learning framework that supports efficient 8-bit low-precision inference and model optimization techniques for convolutional neural networks on Intel Xeon Scalable Processors. The 8-bit optimized model is generated automatically from an FP32 model by a calibration process, without the need for fine-tuning or retraining. We show that the inference throughput and latency of ResNet-50, Inception-v3 and SSD improve by 1.38X-2.9X and 1.35X-3X respectively over the IntelCaffe FP32 baseline with negligible accuracy loss, and by 56X-75X and 26X-37X over BVLC Caffe. All these techniques have been open-sourced on IntelCaffe GitHub, and the artifact is provided to…
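The abstract does not say how the calibration process chooses its quantization parameters. As a rough illustration only, the sketch below uses a generic symmetric max-abs scheme, which is an assumption and not necessarily what IntelCaffe does. It scans sample FP32 values, derives a single scale, and maps values to signed 8-bit integers. The function names (`calibrate_scale`, `quantize`, `dequantize`) are hypothetical and do not come from the IntelCaffe code base.

```python
# Hypothetical sketch of post-training 8-bit calibration (symmetric, max-abs).
# This is a generic scheme chosen for illustration; it is NOT taken from IntelCaffe.

def calibrate_scale(calibration_batches, qmax=127):
    """Derive one scale so the largest observed |value| maps to qmax."""
    max_abs = max(abs(x) for batch in calibration_batches for x in batch)
    if max_abs == 0.0:
        return 1.0  # all-zero data: any scale works, pick 1.0
    return qmax / max_abs

def quantize(values, scale, qmax=127):
    """Scale, round, and clamp FP32 values to integers in [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]

def dequantize(q_values, scale):
    """Map quantized integers back to approximate FP32 values."""
    return [q / scale for q in q_values]

# Calibration data: FP32 values observed while running sample inputs.
batches = [[0.5, -1.0, 0.25], [2.0, -0.75, 1.5]]
scale = calibrate_scale(batches)

# Values outside the calibrated range (3.0 here) saturate at qmax.
q = quantize([2.0, -0.5, 0.0, 3.0], scale)
approx = dequantize(q, scale)
```

No weights are updated in this scheme: the scale is computed once from sample data, which is consistent with the abstract's claim that no fine-tuning or retraining is needed.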

Code & Models

Repositories

intel/caffe (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Neural Networks and Applications

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD