Compact and Computationally Efficient Representation of Deep Neural Networks
Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

TL;DR
This paper introduces new matrix representations for deep neural network weights whose storage and dot-product complexity are bounded by the entropy of the matrix, leading to significant compression, speedups, and energy savings during inference.
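For reference, the entropy of a weight matrix is commonly taken to be the Shannon entropy of the empirical distribution of its elements. Assuming that convention (this page does not spell it out), for an m×n matrix W whose entries take the K distinct values ω_1, …, ω_K:

```latex
p_k = \frac{\lvert\{(i,j) : W_{ij} = \omega_k\}\rvert}{mn}, \qquad
H(W) = -\sum_{k=1}^{K} p_k \log_2 p_k \quad \text{(bits per element)}
```

Under an i.i.d. model of the entries, no lossless code can store W in fewer than about mn·H(W) bits on average, which is why a representation whose cost is bounded by H(W) is a natural target.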
Contribution
The authors propose novel matrix formats whose complexity is bounded by the entropy of the weight matrix, making neural network inference more efficient than with traditional dense or sparse formats.
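One intuition behind entropy-bounded dot products can be illustrated with a toy row format that groups equal weights: within each row, the inputs that share the same nonzero weight value are summed first and multiplied by that value only once, so the number of multiplications depends on how many distinct values a row holds rather than on how many nonzeros it has. The sketch below is a hypothetical illustration of this idea in plain Python; the functions `to_shared_value_rows` and `shared_value_dot` are assumptions for illustration, not the exact formats defined in the paper.

```python
from collections import defaultdict


def to_shared_value_rows(matrix):
    """Hypothetical format: for each row, map every distinct nonzero value
    to the list of column indices where that value occurs."""
    rows = []
    for row in matrix:
        groups = defaultdict(list)
        for col, w in enumerate(row):
            if w != 0:
                groups[w].append(col)
        rows.append(dict(groups))
    return rows


def shared_value_dot(rows, x):
    """Compute W @ x from the grouped rows.

    Returns the result vector and the number of multiplications performed."""
    out = []
    mults = 0
    for groups in rows:
        acc = 0.0
        for value, cols in groups.items():
            # Sum all inputs that share this weight value (additions only) ...
            partial = sum(x[c] for c in cols)
            # ... then multiply once per distinct value in the row.
            acc += value * partial
            mults += 1
        out.append(acc)
    return out, mults


# Toy example: a 2x4 matrix with only two distinct nonzero values.
W = [[0.5, 0.0, 0.5, -1.0],
     [0.0, 0.5, 0.5, 0.5]]
x = [1.0, 2.0, 3.0, 4.0]
y, n_mult = shared_value_dot(to_shared_value_rows(W), x)
print(y, n_mult)  # [-2.0, 4.5] 3
```

For comparison, a dense product of this matrix needs 8 multiplications and a CSR product needs 6 (one per nonzero), whereas the grouped version needs 3. The gap grows as rows contain many repeated values, which is exactly the low-entropy regime produced by quantization.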
Findings
Achieved compression ratios of up to 42x
Achieved inference speedups of up to 5x
Reduced energy consumption by up to 90x
Abstract
At the core of any inference procedure in deep neural networks are dot product operations, which are the components that require the most computational resources. A common approach to reducing the cost of inference is to reduce its memory complexity by lowering the entropy of the network's weight matrices, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented in either a dense or a sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the resulting inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which in many cases can be suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy…
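The claim that dense and sparse formats are not bounded by entropy can be checked with a rough bit count. The sketch below is a simplified accounting under stated assumptions: 32-bit values and 32-bit indices, CSR stored as values plus column indices plus m+1 row pointers, and entropy measured over the empirical distribution of all elements. The helper names `empirical_entropy` and `storage_bits` are hypothetical, and the paper's own cost model may differ.

```python
import math
from collections import Counter


def empirical_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical value distribution."""
    values = [w for row in matrix for w in row]
    n = len(values)
    counts = Counter(values)
    # p * log2(1/p) summed over distinct values; zero when all entries are equal.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def storage_bits(matrix, value_bits=32, index_bits=32):
    """Approximate storage cost in bits of dense and CSR layouts,
    next to the entropy lower bound m * n * H(W)."""
    m, n = len(matrix), len(matrix[0])
    nnz = sum(1 for row in matrix for w in row if w != 0)
    dense = m * n * value_bits
    # CSR: one value and one column index per nonzero, plus m + 1 row pointers.
    csr = nnz * (value_bits + index_bits) + (m + 1) * index_bits
    entropy = m * n * empirical_entropy(matrix)
    return {"dense": dense, "csr": csr, "entropy": entropy}


# An 8x8 checkerboard of 0.5 and 0.0: 50% sparse, one bit of entropy per element.
W = [[0.5 if (i + j) % 2 == 0 else 0.0 for j in range(8)] for i in range(8)]
bits = storage_bits(W)
print(bits)  # {'dense': 2048, 'csr': 2336, 'entropy': 64.0}
```

Both the dense and the CSR cost scale with the matrix size and the number of nonzeros, not with its entropy: here they need over 2000 bits while the entropy bound is only 64 bits. Using fewer bits per quantized value would shrink the constants but not remove that dependence.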
Taxonomy
Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Batch Normalization · Convolution · Average Pooling · Local Response Normalization · Concatenated Skip Connection · Global Average Pooling · Dense Block · Grouped Convolution
