Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

Julian Faraone, Nicholas Fraser, Giulio Gambardella, Michaela Blott, and Philip H.W. Leong

arXiv:1709.06262·cs.CV·October 11, 2017

TL;DR

This paper introduces a training method for sparse ternary neural networks that accounts for hardware implementation costs during training, reducing model size and computational complexity on the MNIST and CIFAR10 datasets.

Contribution

The authors propose a three-stage training approach (training with L2 regularization and a quantization threshold regularizer, then quantization pruning, then retraining) that produces highly sparse ternary networks with improved accuracy and hardware efficiency.

Findings

01

Networks are up to 98% sparse.

02

Models are 5 and 11 times smaller than equivalent binary and ternary models, respectively.

03

The compression translates to significant resource and speed benefits for hardware implementations.

Abstract

A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. Resulting networks achieve improved accuracy, reduced memory footprint and reduced computational complexity compared with conventional methods, on MNIST and CIFAR10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models, respectively, translating to significant resource and speed benefits for hardware implementations.
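
To make the link between a quantization threshold and sparsity concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple symmetric rule (weights with magnitude at or below a threshold become 0, the rest become -1 or +1); the function names `ternarize` and `sparsity`, the threshold values, and the random weights are all hypothetical and chosen only for illustration. The abstract does not specify the regularizers or the pruning rule, so none of that is modeled here.

```python
import numpy as np

def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a symmetric threshold.

    Assumed rule for illustration only: |w| <= threshold -> 0,
    w > threshold -> +1, w < -threshold -> -1.
    """
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > threshold] = 1
    q[weights < -threshold] = -1
    return q

def sparsity(q):
    """Fraction of quantized weights that are exactly zero."""
    return float(np.mean(q == 0))

# Hypothetical layer weights: small values centered on zero.
rng = np.random.default_rng(0)
w = rng.normal(loc=0.0, scale=0.1, size=(256, 256))

# A larger threshold zeroes more weights, so sparsity rises.
for t in (0.05, 0.1, 0.2):
    print(f"threshold={t:.2f} sparsity={sparsity(ternarize(w, t)):.3f}")
```

Under this assumed rule, anything that pushes weights toward zero or widens the threshold increases the share of zero-valued weights, which is the quantity reported as sparsity in the abstract.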
