Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning
Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

TL;DR
This paper introduces a probabilistic approach to neural network pruning using stochastic masks, analyzes its dynamics in linear regression, and employs PAC-Bayes bounds for self-bounded learning with improved generalization.
Contribution
It proposes a novel probabilistic fine-tuning method for pruning masks, analyzes the training dynamics in linear models, and develops a PAC-Bayes based self-bounded learning algorithm for neural networks.
Findings
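In linear regression, stochastic pruning masks induce a data-adaptive L1 regularization term.
Fine-tuning stochastic masks improves test error over the baseline masks they start from, even after the fine-tuned masks are thresholded.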
PAC-Bayes bounds effectively control generalization error.
Abstract
We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regularization term known to underlie dropout in linear regression. We also observe a preference to prune weights that are less well-aligned with the data labels. We evaluate probabilistic fine-tuning for optimizing stochastic pruning masks for neural networks, starting from masks produced by several baselines. In each case, we see improvements in test error over baselines, even after we threshold fine-tuned stochastic pruning masks. Finally, since a stochastic pruning mask induces a stochastic neural…
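As a rough illustration of the objective the abstract describes, the sketch below computes the expected squared loss of a linear predictor whose weights are masked by independent Bernoulli variables, and checks the closed form against Monte Carlo sampling. It is not the paper's derivation or code: it only uses the standard mean-plus-variance decomposition of a squared error, and it does not attempt to reproduce the L1 effect the authors observe in the training dynamics. The names `expected_masked_loss`, `monte_carlo_masked_loss`, and `keep_prob` are hypothetical, and `keep_prob` is assumed to be one minus the weight-specific pruning probability.

```python
import numpy as np


def expected_masked_loss(X, y, w, keep_prob):
    """Expected mean squared error under independent Bernoulli masks.

    Each weight w[i] is kept with probability keep_prob[i] and zeroed
    otherwise (assumption: keep_prob = 1 - pruning probability).
    Returns (total, fit_term, penalty_term).
    """
    # Squared error of the mean predictor, which uses weights keep_prob * w.
    mean_pred = X @ (keep_prob * w)
    fit = np.mean((y - mean_pred) ** 2)
    # Variance of the masked prediction: sum_i x_i^2 w_i^2 p_i (1 - p_i).
    penalty = np.mean((X ** 2) @ (w ** 2 * keep_prob * (1.0 - keep_prob)))
    return fit + penalty, fit, penalty


def monte_carlo_masked_loss(X, y, w, keep_prob, n_samples, rng):
    """Estimate the same expectation by sampling masks explicitly."""
    masks = rng.random((n_samples, w.size)) < keep_prob  # shape (S, d)
    preds = (masks * w) @ X.T                            # shape (S, n)
    return np.mean((y - preds) ** 2)


# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w = rng.normal(size=5)
y = X @ w + 0.1 * rng.normal(size=50)
keep_prob = np.array([0.9, 0.5, 0.2, 1.0, 0.0])

total, fit, penalty = expected_masked_loss(X, y, w, keep_prob)
estimate = monte_carlo_masked_loss(X, y, w, keep_prob, 20000, rng)
print(f"closed form: {total:.4f} (fit {fit:.4f} + penalty {penalty:.4f})")
print(f"monte carlo: {estimate:.4f}")
```

The penalty term vanishes when every keep probability is exactly 0 or 1, which is one way to see why a thresholded (deterministic) mask carries no extra variance cost in this toy setting.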
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Test · Pruning · Dropout · L1 Regularization
