Self-Compressing Neural Networks
Szabolcs Cséfalvay, James Imber

TL;DR
This paper introduces Self-Compression, a method to significantly reduce neural network size by removing redundant weights and decreasing bit precision, enabling efficient training and inference without specialized hardware.
Contribution
The paper presents a novel, general approach that simultaneously compresses weights and reduces bit precision using a generalized loss function.
Findings
Matches floating-point accuracy with as few as 3% of the bits remaining
Removes up to 82% of weights (18% remaining) while maintaining accuracy
Reduces network size in a form that needs no specialized hardware to exploit
Abstract
This work focuses on reducing neural network size, which is a major driver of neural network execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a manner that can be exploited readily for efficient training and inference without the need for specialized hardware. We propose Self-Compression: a simple, general method that simultaneously achieves two goals: (1) removing redundant weights, and (2) reducing the number of bits required to represent the remaining weights. This is achieved using a generalized loss function to minimize overall network size. In our experiments we demonstrate floating point accuracy with as few as 3% of the bits and 18% of the weights remaining in the network.
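The abstract does not spell out the loss, so the following is only a minimal sketch of how a size-aware objective of this kind could look. It assumes that each output channel has its own bit depth and exponent, that weights are rounded onto a signed fixed-point grid, and that the size term is the average number of bits per weight, scaled by a hypothetical trade-off factor `gamma`. The names `quantize`, `size_in_bits`, and `total_loss` are illustrative and do not come from the paper. A channel whose bit depth reaches zero carries no information, which is how a single size term can cover both weight removal and precision reduction.

```python
import numpy as np

def quantize(w, bits, exponent):
    """Round w onto a signed fixed-point grid (assumed form, not the paper's code).

    bits     -- integer bit depth of the channel; 0 means the channel is dropped
    exponent -- the grid step is 2**exponent
    """
    if bits <= 0:
        # Zero bits: every weight collapses to 0, so the channel can be removed.
        return np.zeros_like(w)
    scale = 2.0 ** exponent
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return scale * np.clip(np.round(w / scale), lo, hi)

def size_in_bits(weights_per_channel, bits_per_channel):
    """Total storage: number of weights in each channel times its bit depth."""
    return sum(n * max(b, 0) for n, b in zip(weights_per_channel, bits_per_channel))

def total_loss(task_loss, weights_per_channel, bits_per_channel, gamma):
    """Task loss plus gamma times the average bits per weight (assumed form)."""
    total_weights = sum(weights_per_channel)
    avg_bits = size_in_bits(weights_per_channel, bits_per_channel) / total_weights
    return task_loss + gamma * avg_bits

# Three channels with 10, 20, and 30 weights; the second has been pruned to 0 bits.
counts, depths = [10, 20, 30], [4, 0, 2]
print(size_in_bits(counts, depths))
print(total_loss(1.0, counts, depths, gamma=0.5))
```

This sketch only evaluates the objective. Learning the bit depths during training would additionally need them to be continuous parameters with a gradient path through the rounding step, which is not shown here.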
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Parallel Computing and Optimization Techniques
