Compressing Deep Neural Networks via Layer Fusion

James O' Neill; Greg Ver Steeg; Aram Galstyan

arXiv:2007.14917 · cs.LG · July 30, 2020 · 6 cites

Open Access

TL;DR

This paper introduces layer fusion, a model compression method that fuses the weights of similar layers in a neural network, reducing the number of layers while keeping performance competitive on image classification (CIFAR-10) and language modelling (WikiText-2).

Contribution

The paper proposes layer fusion, a technique that discovers which weights to combine and then fuses similar fully-connected, convolutional and attention layers, reducing the number of layers with little additional computation overhead.

Findings

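01. Achieves a compression ratio of up to 3.33 on deep convolutional neural networks on CIFAR-10 while staying within 2% accuracy points of the original networks.

02. Reduces pretrained transformer models to 20% of their original size on WikiText-2 while staying within 5 perplexity points of the original network.

03. Identifies a performance inflection point indicating limits of compression.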
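
The two headline numbers are stated in different units (a compression ratio for the CNNs, a retained size for the transformers). A small sketch for converting between them, assuming the ratio is defined as original size divided by compressed size; the function names are hypothetical and not taken from the paper:

```python
def ratio_to_fraction(ratio: float) -> float:
    """Fraction of the original size that is kept.

    Assumes ratio = original_size / compressed_size (an assumption,
    not a definition quoted from the paper).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


def fraction_to_ratio(fraction: float) -> float:
    """Compression ratio implied by keeping `fraction` of the original size."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


if __name__ == "__main__":
    # CNN result: compression ratio of 3.33
    print(f"{ratio_to_fraction(3.33):.3f} of the original size is kept")
    # Transformer result: 20% of the original size
    print(f"{fraction_to_ratio(0.20):.2f}x compression")
```

Under that assumption, a ratio of 3.33 corresponds to keeping roughly 30% of the original size, and keeping 20% corresponds to a ratio of 5.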

Abstract

This paper proposes layer fusion, a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computation overhead, while maintaining competitive performance. From experiments on CIFAR-10, we find that various deep convolutional neural networks can remain within 2% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. For experiments on the WikiText-2 language modelling dataset where pretrained transformer models are used, we achieve compression that leads to a network that is 20% of its original size while being within 5 perplexity points of the original network. We also find that other well-established…
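
The abstract describes two steps: find which layers are similar, then fuse their weights. A minimal sketch of one such step is given here under loud assumptions: cosine similarity as the similarity measure and element-wise averaging as the fusion rule are illustrative choices, not necessarily the ones used in the paper, and all function names are hypothetical. Layers are plain nested lists so the example has no dependencies.

```python
import math
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[float]]  # one layer's weight matrix, stored as rows


def shape(w: Matrix) -> Tuple[int, int]:
    return (len(w), len(w[0]) if w else 0)


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Cosine similarity of two flattened weight matrices (assumed measure)."""
    fa = [x for row in a for x in row]
    fb = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def most_similar_pair(layers: List[Matrix]) -> Tuple[int, int]:
    """Indices (i, j), i < j, of the most similar same-shaped layers."""
    best = None
    for i, j in combinations(range(len(layers)), 2):
        if shape(layers[i]) != shape(layers[j]):
            continue  # only layers with identical shapes can be fused here
        sim = cosine_similarity(layers[i], layers[j])
        if best is None or sim > best[0]:
            best = (sim, i, j)
    if best is None:
        raise ValueError("no two layers share a shape")
    return best[1], best[2]


def fuse(a: Matrix, b: Matrix) -> Matrix:
    """Fuse two layers by element-wise averaging (assumed fusion rule)."""
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_once(layers: List[Matrix]) -> List[Matrix]:
    """Replace the most similar pair with a single fused layer.

    Layer i is overwritten by the fused weights and layer j is dropped,
    so one fewer weight matrix is stored after the call.
    """
    i, j = most_similar_pair(layers)
    fused = fuse(layers[i], layers[j])
    return [fused if k == i else w for k, w in enumerate(layers) if k != j]


if __name__ == "__main__":
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
        [[1.0, 0.0], [0.0, 0.9]],
    ]
    print(len(fuse_once(layers)), "layers remain after one fusion step")
```

In the iterative setting the abstract mentions, a step like `fuse_once` would alternate with retraining the network; the retraining loop is omitted here because the abstract gives no details about it.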

Taxonomy

Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Neural Networks and Applications

Methods: Convolution