Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
Philipp Gysel

TL;DR
Ristretto is an automated framework that approximates CNNs by reducing bit-width and removing multipliers, enabling energy-efficient hardware implementations with minimal accuracy loss.
Contribution
The paper introduces Ristretto, a fast, GPU-accelerated tool for hardware-oriented approximation of CNNs. It covers both quantization and multiplier elimination, with the goal of efficient deployment on embedded hardware.
Findings
Successfully compresses networks to 8-bit precision with 1% accuracy loss
Reduces hardware resource requirements significantly
Enables fast network compression using GPU acceleration
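To make the 8-bit finding more concrete, the following is a minimal sketch of one common way to approximate floating-point weights with 8-bit fixed-point values. It is an illustration under stated assumptions, not Ristretto's actual code: the function names `choose_fractional_length` and `quantize` are hypothetical, and sharing a single fractional length across a whole group of values is an assumed scheme that the summary above does not confirm.

```python
import math


def choose_fractional_length(values, bit_width=8):
    """Pick the number of fractional bits so the largest magnitude still fits.

    Hypothetical helper: one fractional length is shared by the whole group.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    # Bits needed for the integer part, plus one sign bit.
    integer_length = math.floor(math.log2(max_abs)) + 2
    return bit_width - integer_length


def quantize(values, bit_width=8):
    """Round each value to the nearest representable fixed-point number."""
    fl = choose_fractional_length(values, bit_width)
    scale = 2.0 ** fl
    q_min = -(2 ** (bit_width - 1))      # most negative signed integer code
    q_max = 2 ** (bit_width - 1) - 1     # most positive signed integer code
    quantized = []
    for v in values:
        code = round(v * scale)                # integer code stored in hardware
        code = max(q_min, min(q_max, code))    # saturate instead of overflowing
        quantized.append(code / scale)         # value the code represents
    return quantized, fl


weights = [0.9, -0.5, 0.1, 0.003]
approx, fractional_bits = quantize(weights)
print(fractional_bits, approx)
```

In this sketch the rounding error per weight is bounded by half a quantization step as long as no value saturates, which is why small bit-widths can preserve accuracy when the value range of a layer is narrow.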
Abstract
Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To bring the power of deep learning to embedded devices such as smartphones, Google glasses and monitoring cameras, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where a fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization · Max Pooling
