Hardware-oriented Approximation of Convolutional Neural Networks

Philipp Gysel; Mohammad Motamedi; Soheil Ghiasi

arXiv:1604.03168 · cs.CV · October 21, 2016 · 280 cites

TL;DR

This paper introduces Ristretto, a framework that approximates CNN models with fixed-point arithmetic to reduce computational complexity and power consumption, enabling efficient deployment on mobile devices.

Contribution

Ristretto provides a hardware-oriented model approximation method that converts CNNs from floating-point to fixed-point representation and then fine-tunes the resulting fixed-point network, condensing CaffeNet and SqueezeNet to 8-bit.

Findings

01

Ristretto successfully condenses CaffeNet and SqueezeNet to 8-bit representations.

02

The 8-bit networks are obtained under a maximum error tolerance of 1%.

03

Fixed-point CNNs outperform floating-point models in hardware efficiency.

Abstract

High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator development is hardware-oriented model approximation. In this paper we present Ristretto, a model approximation framework that analyzes a given CNN with respect to numerical resolution used in representing weights and outputs of convolutional and fully connected layers. Ristretto can condense models by using fixed point arithmetic and representation instead of floating point. Moreover, Ristretto fine-tunes the resulting fixed point network. Given a maximum error tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
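To make the idea of replacing floating point with fixed point concrete, here is a minimal sketch in plain Python. It is not Ristretto's code or API: the function name `quantize_fixed_point`, its parameters, and the rule for choosing the fractional length from the largest magnitude are all assumptions made for illustration.

```python
import math


def quantize_fixed_point(values, bit_width=8, frac_bits=None):
    """Round floats to signed fixed point with `bit_width` total bits.

    Hypothetical helper for illustration only, not part of Ristretto.
    If `frac_bits` is None, it is chosen so that the integer part just
    covers the largest magnitude in `values` (an assumption of this sketch).
    Returns the de-quantized values and the fractional length used.
    """
    max_abs = max(abs(v) for v in values)
    if frac_bits is None:
        if max_abs > 0:
            # Smallest integer length with 2**int_bits > max_abs (may be negative).
            int_bits = math.floor(math.log2(max_abs)) + 1
        else:
            int_bits = 0
        # One bit is reserved for the sign.
        frac_bits = bit_width - 1 - int_bits

    scale = 2.0 ** frac_bits
    q_min = -(2 ** (bit_width - 1))      # e.g. -128 for 8 bits
    q_max = 2 ** (bit_width - 1) - 1     # e.g. +127 for 8 bits

    quantized = []
    for v in values:
        q = round(v * scale)             # nearest integer code
        q = max(q_min, min(q_max, q))    # saturate instead of wrapping
        quantized.append(q / scale)      # back to a float for comparison
    return quantized, frac_bits


if __name__ == "__main__":
    weights = [0.3, -0.12, 0.05, -0.29]
    approx, frac_bits = quantize_fixed_point(weights, bit_width=8)
    print("fractional bits:", frac_bits)
    for w, a in zip(weights, approx):
        print(f"{w:+.5f} -> {a:+.5f} (error {abs(w - a):.5f})")
```

For values that do not saturate, the rounding error is at most half of one quantization step, i.e. 0.5 / 2**frac_bits. Whether a given bit width keeps a network within an accuracy tolerance still has to be measured on the network itself, which this sketch does not attempt.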

Code & Models

Repositories

pmgysel/caffe (official)

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization · Max Pooling