Towards Hardware-Specific Automatic Compression of Neural Networks
Torben Krieger, Bernhard Klein, Holger Fröning

TL;DR
This paper introduces Galen, a reinforcement learning framework that automatically finds hardware-specific neural network compression policies, optimizing for inference latency on target devices while maintaining accuracy.
Contribution
Galen is the first framework to automatically optimize neural network compression policies considering hardware-specific latency, combining pruning and quantization via reinforcement learning.
Findings
Galen compresses ResNet18 to 20% of its original inference latency on an ARM processor with minimal accuracy loss.
Combining pruning and quantization outperforms applying either method alone.
Reinforcement learning effectively finds hardware-aware compression policies.
Abstract
Compressing neural network architectures is important to allow the deployment of models to embedded or mobile devices, and pruning and quantization are the major approaches to compress neural networks nowadays. Both methods benefit when compression parameters are selected specifically for each layer. Finding good combinations of compression parameters, so-called compression policies, is hard as the problem spans an exponentially large search space. Effective compression policies consider the influence of the specific hardware architecture on the used compression methods. We propose an algorithmic framework called Galen to search such policies using reinforcement learning utilizing pruning and quantization, thus providing automatic compression for neural networks. Contrary to other approaches we use inference latency measured on the target hardware device as an optimization goal. With…
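To make the notion of a compression policy and its exponentially large search space concrete, the following is a minimal sketch, not Galen's implementation. The names `LayerChoice`, `PRUNE_RATIOS`, and `BIT_WIDTHS`, as well as the specific candidate values, are assumptions chosen only for illustration. A policy is modeled as one pruning/quantization choice per layer, so the number of policies grows as (options per layer) to the power of the number of layers.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LayerChoice:
    """Compression parameters for a single layer (hypothetical structure)."""
    prune_ratio: float  # fraction of channels removed from the layer
    bits: int           # quantization bit-width for the layer


# Illustrative candidate values; assumptions, not taken from the paper.
PRUNE_RATIOS = (0.0, 0.25, 0.5)
BIT_WIDTHS = (8, 4)


def layer_options():
    """All pruning/quantization combinations available to one layer."""
    return [LayerChoice(p, b) for p, b in product(PRUNE_RATIOS, BIT_WIDTHS)]


def search_space_size(num_layers):
    """Number of distinct policies: options_per_layer ** num_layers."""
    return len(layer_options()) ** num_layers


def enumerate_policies(num_layers):
    """Lazily yield every policy as a tuple with one LayerChoice per layer."""
    return product(layer_options(), repeat=num_layers)


if __name__ == "__main__":
    for n in (1, 2, 18):
        print(n, "layers ->", search_space_size(n), "policies")
```

Even with only six options per layer, exhaustive enumeration becomes impractical at the depth of a network such as ResNet18, which is why a learned search strategy such as reinforcement learning is attractive here.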
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · CCD and CMOS Imaging Sensors
Methods: Pruning
