CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

Alexandros Kouris; Stylianos I. Venieris; Christos-Savvas Bouganis

arXiv:1807.05053·cs.CV·July 16, 2018

TL;DR

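CascadeCNN is an automated FPGA toolflow that raises CNN inference throughput by aggressively quantising a model and running it in a two-stage cascade: a low-precision unit classifies every input, and a confidence evaluation unit forwards suspected misclassifications to a high-precision unit. It reports gains of up to 55% for VGG-16 and 48% for AlexNet over a baseline design, without retraining.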

Contribution

The paper introduces a cascade architecture with confidence evaluation for quantised CNNs, enabling high-throughput inference without retraining the model or accessing the training data.

Findings

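01. Up to 55% performance boost for VGG-16 over the baseline design, for the same resource budget and accuracy

02. Up to 48% performance boost for AlexNet over the baseline design, for the same resource budget and accuracy

03. Both gains are achieved without retraining the model or accessing the training data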

Abstract

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and a high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision unit and forward them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need to retrain the model or access the training data.
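
The control flow of the cascade described in the abstract can be illustrated with a small software sketch. This is not the authors' implementation, which targets FPGA hardware. The function names, the confidence metric (the margin between the two largest softmax probabilities), and the threshold value are all assumptions made for illustration; the abstract does not specify how confidence is computed.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping an input to a list of class logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_confident(probs: Sequence[float], threshold: float) -> bool:
    """Hypothetical confidence check (an assumption, not the paper's metric):
    accept the prediction if the gap between the two largest class
    probabilities is at least `threshold`."""
    top_two = sorted(probs, reverse=True)[:2]
    return (top_two[0] - top_two[1]) >= threshold


def cascade_classify(
    inputs: Sequence[Sequence[float]],
    low_precision: Model,
    high_precision: Model,
    threshold: float = 0.5,
) -> Tuple[List[int], int]:
    """Classify every input with the low-precision model and re-process
    only the low-confidence cases with the high-precision model.

    Returns the predicted labels and the number of forwarded inputs.
    """
    labels: List[int] = []
    forwarded = 0
    for x in inputs:
        probs = softmax(low_precision(x))
        if not is_confident(probs, threshold):
            # Suspected misclassification: fall back to the high-precision unit.
            probs = softmax(high_precision(x))
            forwarded += 1
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels, forwarded
```

In this sketch the fraction of forwarded inputs determines how much of the low-precision unit's speed advantage survives: the fewer inputs the confidence check forwards, the closer the overall throughput stays to that of the low-precision unit alone.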

Taxonomy

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax