CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks
Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis

TL;DR
CascadeCNN is an automated FPGA toolflow that raises CNN inference throughput by aggressively quantizing models and pairing a low-precision unit with a high-precision unit in a two-stage cascade with confidence evaluation, achieving performance gains of up to 55% over the baseline design without retraining.
Contribution
It introduces a novel cascade architecture with confidence evaluation for quantized CNNs, enabling high-performance inference without retraining or training data access.
Findings
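Up to 55% performance boost for VGG-16 over the baseline design, for the same resource budget and accuracy
Up to 48% performance boost for AlexNet over the baseline design, for the same resource budget and accuracy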
Achieves these gains without retraining or training data
Abstract
This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision unit and forward them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need of retraining the model or accessing the training data.
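The cascade control flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolflow's implementation: the summary does not specify the confidence metric, so the sketch assumes a simple margin between the two largest class probabilities, and the names `margin_confidence`, `cascade_predict`, and the `threshold` parameter are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier maps an input to a list of class probabilities.
Classifier = Callable[[Sequence[float]], List[float]]


def margin_confidence(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest probability.

    Assumed metric for illustration; the source does not define the one used.
    """
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1]


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def cascade_predict(
    x: Sequence[float],
    low_precision: Classifier,
    high_precision: Classifier,
    threshold: float = 0.5,  # hypothetical value, would need tuning
) -> Tuple[int, str]:
    """Classify with the low-precision unit; re-process on low confidence."""
    probs = low_precision(x)
    if margin_confidence(probs) >= threshold:
        # Confident enough: accept the low-precision result.
        return argmax(probs), "low"
    # Suspected misclassification: forward to the high-precision unit.
    return argmax(high_precision(x)), "high"
```

In a sketch like this, the threshold trades throughput against accuracy: a higher value forwards more inputs to the high-precision unit, which recovers accuracy at the cost of extra work per input.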
Taxonomy
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
