Compact and Computationally Efficient Representation of Deep Neural Networks

Simon Wiedemann; Klaus-Robert Müller; Wojciech Samek

arXiv:1805.10692·cs.LG·December 19, 2018

TL;DR

This paper introduces matrix representations for deep neural network weights whose storage and dot product complexity are bounded in terms of the entropy of the matrix, yielding significant compression, speed, and energy savings during inference.

Contribution

The authors propose novel matrix formats whose complexity bounds are tied to entropy, improving the efficiency of neural network inference over traditional dense or sparse formats.
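To make the notion of matrix entropy concrete, the following is a minimal sketch that computes the Shannon entropy, in bits per element, of the empirical value distribution of a small quantized weight matrix. The function name `element_entropy` and the example matrix `W` are illustrative assumptions and do not come from the paper.

```python
import math
from collections import Counter


def element_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical distribution
    of the values in `matrix`, given as a list of rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    values = [w for row in matrix for w in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A pruned and quantized toy matrix: mostly zeros, two other values.
W = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, -0.5, 0.0],
]

H = element_entropy(W)
print(f"entropy: {H:.4f} bits per element")
```

Pruning and quantization concentrate the values on a few symbols, which lowers this quantity. A matrix whose elements are all distinct and equally frequent reaches the maximum of log2(number of distinct values) bits.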

Findings

01 Achieved compression ratios of up to 42x

02 Realized inference speedups of up to 5x

03 Reduced energy consumption by up to 90x

Abstract

At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the most computational resources. A common approach to reduce the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented either by a dense or sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can in many cases be suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy…
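One way to see why a low-entropy matrix can admit a cheaper dot product is to group the column indices of a row by shared weight value and apply the distributive law, so that each distinct value is multiplied only once. The sketch below illustrates that general idea only. It is not the storage format proposed in the paper, and the names `group_row`, `grouped_dot`, and `dense_dot` are hypothetical.

```python
from collections import defaultdict


def group_row(row):
    """Map each distinct nonzero weight in `row` to the list of column
    indices where it occurs. Zeros are skipped, as in a sparse format."""
    groups = defaultdict(list)
    for j, w in enumerate(row):
        if w != 0:
            groups[w].append(j)
    return dict(groups)


def grouped_dot(groups, x):
    """Dot product using one multiplication per distinct weight value:
    sum over values w of w * (sum of x[j] over the indices j sharing w)."""
    return sum(w * sum(x[j] for j in idx) for w, idx in groups.items())


def dense_dot(row, x):
    """Reference dense dot product: one multiplication per element."""
    return sum(w * xj for w, xj in zip(row, x))


# A quantized toy row with two distinct nonzero values.
row = [0.5, 0.0, 0.5, -1.0, 0.5, 0.0, -1.0, 0.5]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

groups = group_row(row)
print(grouped_dot(groups, x), dense_dot(row, x))
print("multiplications:", len(groups), "grouped vs", len(row), "dense")
```

In this toy row the grouped version performs 2 multiplications instead of 8, at the cost of storing index lists. The trade-off between those index lists and the saved multiplications is what a practical format would have to balance.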

Taxonomy

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Batch Normalization · Convolution · Average Pooling · Local Response Normalization · Concatenated Skip Connection · Global Average Pooling · Dense Block · Grouped Convolution