Hardware-oriented Approximation of Convolutional Neural Networks
Philipp Gysel, Mohammad Motamedi, Soheil Ghiasi

TL;DR
This paper introduces Ristretto, a framework that approximates CNN models with fixed-point arithmetic to reduce computational complexity and power consumption, enabling efficient deployment on mobile devices.
Contribution
Ristretto provides a hardware-oriented model approximation method that converts CNNs to fixed-point representation with fine-tuning, achieving significant model size reduction.
Findings
Ristretto successfully condenses CaffeNet and SqueezeNet to 8-bit representations.
Both condensed networks stay within a maximum error tolerance of 1%.
Fixed-point CNNs are more hardware-efficient than their floating-point counterparts.
Abstract
High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator development is hardware-oriented model approximation. In this paper we present Ristretto, a model approximation framework that analyzes a given CNN with respect to numerical resolution used in representing weights and outputs of convolutional and fully connected layers. Ristretto can condense models by using fixed point arithmetic and representation instead of floating point. Moreover, Ristretto fine-tunes the resulting fixed point network. Given a maximum error tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
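To make the idea of replacing floating point with 8-bit fixed point concrete, here is a minimal sketch in plain Python. It is not taken from the Ristretto code base. The function names `choose_frac_bits` and `quantize`, and the rule of picking the fractional length from the largest magnitude in a layer, are assumptions made for illustration only.

```python
import math


def choose_frac_bits(values, bit_width=8):
    """Hypothetical helper: pick a fractional length so that the largest
    magnitude in `values` fits into a signed `bit_width`-bit number."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    # Bits needed for the integer part of the largest magnitude.
    int_bits = math.floor(math.log2(max_abs)) + 1
    # One bit is reserved for the sign; the rest go to the fraction.
    return bit_width - 1 - int_bits


def quantize(value, bit_width=8, frac_bits=7):
    """Round `value` to the nearest representable fixed-point number and
    saturate it to the signed `bit_width`-bit range."""
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    q = round(value * scale)       # nearest integer code
    q = max(q_min, min(q_max, q))  # saturate instead of wrapping around
    return q / scale               # back to a real value for comparison


# Example: quantize a small, made-up set of weights to 8 bits.
weights = [0.9, -0.3, 0.05]
frac_bits = choose_frac_bits(weights, bit_width=8)
quantized = [quantize(w, 8, frac_bits) for w in weights]
```

With this scheme the rounding error of any value inside the representable range is at most half of one quantization step, that is, 2^-(frac_bits + 1). Values outside the range are clipped. Fine-tuning the network after quantization, as the abstract describes, is a separate step not shown here.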
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization · Max Pooling
