HALO: Learning to Prune Neural Networks with Shrinkage
Skyler Seto, Martin T. Wells, Wenyu Zhang

TL;DR
HALO is a Bayesian hierarchical sparsity penalty that learns to adaptively sparsify neural networks. The resulting highly sparse models remain competitive in accuracy and outperform existing pruning methods at similar sparsity levels.
Contribution
The paper proposes HALO, a new adaptive sparsity penalty based on Bayesian hierarchical models, which effectively prunes neural networks without fine-tuning.
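To illustrate why a hierarchical prior yields an *adaptive* penalty, here is a minimal sketch of a generic textbook construction, not the exact HALO penalty from the paper. Each weight w_j gets a Laplace prior with its own rate τ_j, and each τ_j gets a Gamma(a, b) hyperprior. Minimizing the joint negative log-density over τ_j gives τ_j = a / (|w_j| + b), so small weights receive large penalty rates and are pushed harder toward zero. The function names and the hyperparameters `a` and `b` are illustrative assumptions.

```python
import math

# Generic hierarchical-Laplace sketch (an assumption for illustration, NOT the
# exact HALO penalty): w_j | tau_j ~ Laplace(rate tau_j), tau_j ~ Gamma(a, b).


def joint_penalty(w, tau, a, b):
    """Negative log joint density of (w, tau), dropping additive constants.

    -log Laplace(w | tau)  = tau*|w| - log(tau) + const
    -log Gamma(tau | a, b) = b*tau - (a - 1)*log(tau) + const
    """
    return tau * (abs(w) + b) - a * math.log(tau)


def optimal_rate(w, a, b):
    """Rate tau that minimizes joint_penalty for a fixed weight w."""
    return a / (abs(w) + b)


def profiled_penalty(w, a, b):
    """joint_penalty evaluated at optimal_rate: a log-type (reweighted L1) penalty."""
    return a * math.log(abs(w) + b) + a - a * math.log(a)


if __name__ == "__main__":
    a, b = 2.0, 0.1
    for w in (0.01, 0.5, 2.0):
        tau = optimal_rate(w, a, b)
        print(f"w={w:5.2f}  tau*={tau:7.3f}  penalty={profiled_penalty(w, a, b):.4f}")
```

Under this construction the effective penalty grows like a·log(|w| + b) rather than linearly in |w|, which is one standard way to see how a hierarchical prior produces weight-specific shrinkage.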
Findings
HALO learns highly sparse networks that retain only 5% of their parameters while keeping high accuracy.
HALO outperforms state-of-the-art magnitude pruning methods at similar sparsity levels.
Highly sparse networks maintain strong performance.
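The findings compare HALO against magnitude pruning at matched sparsity; retaining only 5% of the parameters corresponds to 95% sparsity. The following minimal sketch shows that baseline, assuming global unstructured pruning over a flat list of weights (the function names are hypothetical): keep the k largest-magnitude weights and zero the rest.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero all but the round(keep_fraction * n) largest-magnitude weights.

    Global, unstructured magnitude pruning over a flat list. Ties keep the
    earlier index because Python's sort is stable (also with reverse=True).
    """
    n = len(weights)
    k = round(keep_fraction * n)
    # Indices ordered from largest to smallest absolute value.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def density(weights):
    """Fraction of nonzero weights, i.e. 1 - sparsity."""
    return sum(1 for w in weights if w != 0) / len(weights)


if __name__ == "__main__":
    # 100 toy weights with alternating signs and magnitudes 1..100.
    weights = [(-1) ** i * (i + 1) for i in range(100)]
    pruned = magnitude_prune(weights, keep_fraction=0.05)
    print("density:", density(pruned))
    print("sparsity:", 1 - density(pruned))
    print("kept:", [w for w in pruned if w != 0])
```

Magnitude pruning decides which weights survive after (or between rounds of) training, typically followed by fine-tuning; a penalty-based method such as HALO instead shapes sparsity during training.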
Abstract
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data; however, this performance is closely tied to model size. Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity-inducing penalty, and (3) training a binary mask jointly with the weights of the network. We study different sparsity-inducing penalties from the perspective of Bayesian hierarchical models and present a novel penalty called Hierarchical Adaptive Lasso (HALO), which learns to adaptively sparsify the weights of a given network via trainable parameters. When used to train over-parametrized networks, our penalty yields small subnetworks with high accuracy without fine-tuning. Empirically, on image recognition tasks, we find that HALO is able to learn highly sparse networks (only 5% of…
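Training with a sparsity-inducing penalty can set weights exactly to zero during optimization, which is why no separate prune-then-fine-tune stage is required. As a minimal sketch (not the paper's training procedure), the code below runs proximal gradient descent on a toy separable objective 0.5·Σ(w_j − t_j)² + λ·Σ s_j·|w_j|, where the per-weight scales s_j act as adaptive penalty weights. Here the scales are fixed inputs, whereas HALO learns its penalty parameters; the objective, targets, and function names are illustrative assumptions.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward 0 by t, clipping at 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_gradient(targets, scales, lam, lr=0.5, steps=200):
    """Minimize 0.5*sum((w - t)^2) + lam*sum(s*|w|) by proximal gradient descent.

    Each step takes a gradient step on the smooth squared-error term, then
    applies weighted soft-thresholding, which sets small weights exactly to 0.
    """
    w = [0.0] * len(targets)
    for _ in range(steps):
        for j, (t, s) in enumerate(zip(targets, scales)):
            grad = w[j] - t  # gradient of 0.5*(w_j - t_j)^2
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * s)
    return w


if __name__ == "__main__":
    targets = [0.9, -0.05, 0.3, -1.2]
    scales = [1.0, 1.0, 5.0, 0.5]  # larger scale = stronger shrinkage
    w = proximal_gradient(targets, scales, lam=0.1)
    print([round(v, 4) for v in w])
```

For this separable objective the minimizer is w_j = soft_threshold(t_j, λ·s_j). The third weight (target 0.3) is zeroed because its scale of 5 raises its threshold to 0.5; under a plain lasso (all scales 1, threshold 0.1) it would survive as 0.2. That per-weight control is what distinguishes an adaptive penalty from an ordinary lasso.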
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Advanced Neural Network Applications
Methods: Pruning
