Towards Hardware-Specific Automatic Compression of Neural Networks

Torben Krieger; Bernhard Klein; Holger Fröning

arXiv:2212.07818·cs.LG·December 16, 2022

TL;DR

This paper introduces Galen, a reinforcement learning framework that automatically finds hardware-specific neural network compression policies, optimizing for inference latency on target devices while maintaining accuracy.

Contribution

Galen automatically searches for neural network compression policies that combine pruning and quantization via reinforcement learning, using inference latency measured on the target hardware as the optimization goal.

Findings

01

Galen compressed ResNet18 to 20% of its original inference latency on an ARM processor with minimal accuracy loss.

02

Jointly applying pruning and quantization outperforms either method applied alone.

03

Reinforcement learning effectively finds hardware-aware compression policies.

Abstract

Compressing neural network architectures is important to allow the deployment of models to embedded or mobile devices, and pruning and quantization are the major approaches to compress neural networks nowadays. Both methods benefit when compression parameters are selected specifically for each layer. Finding good combinations of compression parameters, so-called compression policies, is hard as the problem spans an exponentially large search space. Effective compression policies consider the influence of the specific hardware architecture on the used compression methods. We propose an algorithmic framework called Galen to search such policies using reinforcement learning utilizing pruning and quantization, thus providing automatic compression for neural networks. Contrary to other approaches we use inference latency measured on the target hardware device as an optimization goal. With…
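
To make the abstract's point about the search space concrete, the following is a minimal Python sketch, not the Galen implementation. It counts the per-layer policies available when each layer independently picks a pruning ratio and a bit width, and shows one possible way to fold measured latency into a scalar reward. The names `policy_space_size` and `reward`, the candidate values, the 18-layer count, and the reward formula itself are all assumptions made for illustration; the source does not specify them.

```python
from typing import Sequence


def policy_space_size(num_layers: int,
                      keep_ratios: Sequence[float],
                      bit_widths: Sequence[int]) -> int:
    """Count distinct policies when every layer independently picks
    one pruning keep-ratio and one quantization bit width."""
    choices_per_layer = len(keep_ratios) * len(bit_widths)
    return choices_per_layer ** num_layers


def reward(accuracy: float,
           latency: float,
           base_latency: float,
           target_ratio: float = 0.2,
           beta: float = 1.0) -> float:
    """Hypothetical reward (an assumption, not the paper's formula):
    accuracy minus a penalty for exceeding the latency target, where
    the target is a fraction of the uncompressed model's latency."""
    overshoot = max(0.0, latency / base_latency - target_ratio)
    return accuracy - beta * overshoot


if __name__ == "__main__":
    # Illustrative numbers only: 18 layers, 4 keep-ratios, 2 bit widths.
    size = policy_space_size(18, [0.25, 0.5, 0.75, 1.0], [4, 8])
    print(f"policies to consider: {size}")
    # A policy that meets the latency target keeps its full accuracy score.
    print(reward(accuracy=0.9, latency=2.0, base_latency=10.0))
    # A policy that misses the target is penalized.
    print(reward(accuracy=0.9, latency=5.0, base_latency=10.0))
```

Even with only eight choices per layer, the count grows as 8 to the power of the number of layers, which is why exhaustive enumeration is impractical and a learned search strategy is attractive. In a hardware-aware setup, `latency` would come from a measurement on the target device rather than from a proxy such as operation counts.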

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · CCD and CMOS Imaging Sensors

Methods: Pruning