Exploiting Non-Linear Redundancy for Neural Model Compression
Muhammad A. Shah, Raphael Olivier, Bhiksha Raj

TL;DR
This paper introduces a neural network compression method that exploits linear dependencies among neuron activations. The method is provably lossless when applied during training and achieves up to 99% size reduction with minimal performance loss.
Contribution
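The paper presents a compression technique that eliminates entire neurons whose activations are linearly dependent on those of other neurons and redistributes their activations over the remaining neurons. The technique is combined with an annealing algorithm and can be applied during training or to an already trained model.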
Findings
Up to 99% reduction in network size achieved.
Method is lossless during training and effective on trained models.
Theoretical proof of capturing redundancies in overparametrized ReLU networks.
Abstract
Deploying deep learning models, comprising non-linear combinations of millions, even billions, of parameters, is challenging given the memory, power, and compute constraints of the real world. This situation has led to research into model compression techniques, most of which rely on suboptimal heuristics and do not consider the parameter redundancies due to linear dependence between neuron activations in overparametrized networks. In this paper, we propose a novel model compression approach based on the exploitation of linear dependence, which compresses networks by eliminating entire neurons and redistributing their activations over other neurons in a manner that is provably lossless while training. We combine this approach with an annealing algorithm that may be applied during training, or even on a trained model, and demonstrate, using popular datasets, that our method results in a…
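The core idea in the abstract, removing a neuron and redistributing its activation over other neurons, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the function name `remove_neuron`, the use of ordinary least squares to find the dependence coefficients, and the toy data are all assumptions made for illustration, and the annealing procedure and bias terms are left out. When a neuron's activation is an exact linear combination of the others, folding its outgoing weights into theirs leaves the next layer's input unchanged.

```python
import numpy as np

def remove_neuron(acts, w_out, j):
    """Hypothetical helper: drop hidden neuron j and fold its contribution
    into the remaining neurons.

    acts:  (n_samples, n_hidden) hidden activations on some data
    w_out: (n_hidden, n_out) weights of the following layer
    """
    keep = [k for k in range(acts.shape[1]) if k != j]
    # Least-squares coefficients c with acts[:, j] ~= acts[:, keep] @ c
    # (assumption: plain least squares stands in for the paper's procedure)
    c, *_ = np.linalg.lstsq(acts[:, keep], acts[:, j], rcond=None)
    # Redistribute neuron j's outgoing weights over the kept neurons
    w_new = w_out[keep] + np.outer(c, w_out[j])
    return keep, w_new

# Toy data: ReLU-like activations where neuron 4 is exactly dependent
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(64, 5)), 0.0)
acts[:, 4] = 2.0 * acts[:, 0] + 0.5 * acts[:, 2]
w_out = rng.normal(size=(5, 3))

keep, w_new = remove_neuron(acts, w_out, j=4)
before = acts @ w_out            # next-layer input with all 5 neurons
after = acts[:, keep] @ w_new    # next-layer input with 4 neurons
max_err = float(np.abs(before - after).max())
print(max_err)
```

In this toy case the dependence is exact, so the printed error is at floating-point precision. When the dependence is only approximate, the same fold introduces a residual error that would need to be corrected, for example by further training.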
