Compressing Deep Neural Networks via Layer Fusion
James O'Neill, Greg Ver Steeg, Aram Galstyan

TL;DR
This paper introduces layer fusion, a model compression method that fuses the weights of similar layers in a neural network, substantially reducing its size while keeping performance competitive on vision and language tasks.
Contribution
It proposes a layer fusion technique that applies to fully-connected, convolutional and attention layers, enabling substantial compression with little loss in performance.
Findings
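On CIFAR-10, deep CNNs stay within 2 percentage points of the original accuracy up to a compression ratio of 3.33 when iteratively retrained with layer fusion.
On WikiText-2, pretrained transformer models are compressed to 20% of their original size while staying within 5 perplexity points of the original network.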
Identifies a performance inflection point indicating limits of compression.
Abstract
This paper proposes layer fusion, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computational overhead, while maintaining competitive performance. From experiments on CIFAR-10, we find that various deep convolutional neural networks can remain within 2 percentage points of the accuracy of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. For experiments on the WikiText-2 language modelling dataset, where pretrained transformer models are used, we achieve compression that leads to a network that is 20% of its original size while being within 5 perplexity points of the original network. We also find that other well-established…
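The idea of finding similar layers and fusing their weights can be illustrated with a minimal sketch. The abstract does not say which similarity measure or fusion operator the paper uses, so cosine similarity and elementwise averaging are assumptions chosen for illustration. The function names are hypothetical, and the iterative retraining step mentioned in the abstract is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_pair(layers):
    """Return the indices (i, j), i < j, of the two most similar layers."""
    best, best_pair = -math.inf, None
    for i in range(len(layers)):
        for j in range(i + 1, len(layers)):
            sim = cosine_similarity(layers[i], layers[j])
            if sim > best:
                best, best_pair = sim, (i, j)
    return best_pair

def fuse_once(layers):
    """Fuse the most similar pair by averaging (an assumed fusion operator)."""
    i, j = most_similar_pair(layers)
    fused = [(x + y) / 2 for x, y in zip(layers[i], layers[j])]
    # Layer i takes the fused weights; layer j is dropped from the network.
    return [fused if k == i else w for k, w in enumerate(layers) if k != j]

# Toy "network": four layers with flattened weights of the same shape.
layers = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
compressed = fuse_once(layers)

# Compression ratio taken here as original layer count over compressed count.
compression_ratio = len(layers) / len(compressed)
```

In this toy case layers 0 and 2 are the closest pair, so one fusion step leaves three layers. Under the same definition of the ratio, a network at 20% of its original size corresponds to a compression ratio of 5.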
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Neural Networks and Applications
Methods: Convolution
