Linearity-based neural network compression
Silas Dobler, Florian Lemmerich

TL;DR
This paper introduces a novel neural network compression method based on the linearity of neurons with ReLU-like activations, achieving significant size reduction without loss of performance.
Contribution
It proposes a new linearity-based compression technique that complements existing importance-based methods, enabling more efficient neural network models.
Findings
Achieves lossless compression down to 25% of the original size in the majority of tested models.
Combining it with importance-based pruning shows very little interference between the two types of compression.
Lays foundation for a new class of neural network compression methods.
Abstract
In neural network compression, most current methods reduce unnecessary parameters by measuring importance and redundancy. To augment already highly optimized existing solutions, we propose linearity-based compression as a novel way to reduce weights in a neural network. It is based on the intuition that with ReLU-like activation functions, neurons that are almost always activated behave linearly, allowing for merging of subsequent layers. We introduce the theory underlying this compression and evaluate our approach experimentally. Our novel method achieves a lossless compression down to 1/4 of the original model size in the majority of tested models. Applying our method to models already pruned with importance-based methods shows very little interference between the different types of compression, demonstrating that the techniques can be combined successfully. Overall, our work lays the…
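The intuition in the abstract can be illustrated with a minimal sketch. It covers only the simplest special case, in which every hidden neuron is always active, so the ReLU acts as the identity and two consecutive layers fold into one affine map. This is not the authors' algorithm; the function names `merge_linear_layers` and `two_layer_relu`, the layer sizes, and the way the inputs are constructed are all assumptions made for illustration.

```python
import numpy as np

def merge_linear_layers(W1, b1, W2, b2):
    """Fold W2 @ (W1 @ x + b1) + b2 into a single affine map (W, b).

    Valid only when the activation between the two layers is the identity
    on the inputs of interest, e.g. a ReLU whose pre-activations are
    always positive (hypothetical special case, not the paper's method).
    """
    return W2 @ W1, W2 @ b1 + b2

def two_layer_relu(x, W1, b1, W2, b2):
    """Reference two-layer network with a ReLU in between."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

rng = np.random.default_rng(0)

# Assumption for the demo: non-negative first-layer weights and inputs plus a
# positive bias guarantee that every hidden pre-activation is positive, so
# every hidden neuron is "always activated".
W1 = rng.uniform(0.0, 1.0, size=(8, 4))
b1 = rng.uniform(0.5, 1.0, size=8)
W2 = rng.normal(size=(3, 8))
b2 = rng.normal(size=3)
x = rng.uniform(0.0, 1.0, size=4)

W, b = merge_linear_layers(W1, b1, W2, b2)
original = two_layer_relu(x, W1, b1, W2, b2)
merged = W @ x + b

params_before = W1.size + b1.size + W2.size + b2.size
params_after = W.size + b.size
print(np.allclose(original, merged), params_before, params_after)
```

In this toy setting the merged map reproduces the original output exactly while storing fewer parameters, because the hidden layer is wider than the input and output. Handling neurons that are only almost always active, or only a subset of a layer, requires the theory developed in the paper and is not covered by this sketch.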
Taxonomy
Topics: Advanced Data Compression Techniques · Neural Networks and Applications
