Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning
Meet P. Vadera, Adam D. Cobb, Brian Jalaian, Benjamin M. Marlin

TL;DR
This paper explores how sparsity in neural networks affects stochastic gradient MCMC methods for Bayesian deep learning, showing that random substructures can match pruning methods in performance while reducing training time.
Contribution
It introduces the use of random sparse network structures within stochastic gradient MCMC frameworks, demonstrating efficient trade-offs between model complexity and inference quality.
Findings
Randomly selected substructures perform comparably to pruning-based structures.
Sparse structures significantly reduce training times.
Certain sparse configurations maintain uncertainty quantification abilities.
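To make the comparison in these findings concrete, here is a minimal sketch of two ways a sparse structure could be chosen for one weight matrix: uniformly at random, or by keeping the largest-magnitude weights. The function names `random_mask` and `magnitude_mask` are hypothetical, and magnitude-based selection is only an assumed stand-in for "pruning-based structures"; the summary above does not say which pruning criteria the paper uses.

```python
import numpy as np

def random_mask(shape, sparsity, rng):
    """Boolean mask that keeps a uniformly random (1 - sparsity) fraction of entries."""
    n = int(np.prod(shape))
    n_keep = int(round((1.0 - sparsity) * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_keep, replace=False)] = True
    return mask.reshape(shape)

def magnitude_mask(weights, sparsity):
    """Boolean mask that keeps the largest-magnitude (1 - sparsity) fraction of entries.

    Assumption: magnitude pruning is used here only as a familiar example of
    a pruning-based criterion.
    """
    n = weights.size
    n_keep = int(round((1.0 - sparsity) * n))
    mask = np.zeros(n, dtype=bool)
    if n_keep > 0:
        keep = np.argsort(np.abs(weights).ravel())[-n_keep:]
        mask[keep] = True
    return mask.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))            # stand-in for a trained weight matrix
m_rand = random_mask(w.shape, sparsity=0.75, rng=rng)
m_mag = magnitude_mask(w, sparsity=0.75)
```

Both masks keep the same number of weights, so any difference in downstream performance would come from which weights are kept rather than how many.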
Abstract
Bayesian methods hold significant promise for improving the uncertainty quantification ability and robustness of deep neural network models. Recent research has seen the investigation of a number of approximate Bayesian inference methods for deep neural networks, building on both the variational Bayesian and Markov chain Monte Carlo (MCMC) frameworks. A fundamental issue with MCMC methods is that the improvements they enable are obtained at the expense of increased computation time and model storage costs. In this paper, we investigate the potential of sparse network structures to flexibly trade off model storage costs and inference run time against predictive performance and uncertainty quantification ability. We use stochastic gradient MCMC methods as the core Bayesian inference method and consider a variety of approaches for selecting sparse network structures. Surprisingly, our…
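As an illustration of how a fixed sparse structure can be combined with stochastic gradient MCMC, the sketch below runs stochastic gradient Langevin dynamics (SGLD) on a toy Bayesian linear regression while holding masked-out weights at zero. This is not the paper's implementation: the choice of SGLD, the function name `sgld_sparse`, the toy model, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def sgld_sparse(X, y, mask, step=5e-4, n_steps=2000, batch=32,
                prior_var=1.0, noise_var=0.25, seed=0):
    """SGLD for Bayesian linear regression with a fixed binary sparsity mask.

    Masked-out coordinates start at zero and receive neither gradient
    nor injected noise, so they stay exactly zero in every sample.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    theta = np.zeros(d)
    samples = []
    for t in range(n_steps):
        idx = rng.choice(N, size=batch, replace=False)
        resid = y[idx] - X[idx] @ theta
        # Minibatch estimate of the log-likelihood gradient, rescaled to the full data set.
        grad_lik = (N / batch) * (X[idx].T @ resid) / noise_var
        # Gradient of the zero-mean Gaussian log-prior.
        grad_prior = -theta / prior_var
        # Langevin noise with variance equal to the step size.
        noise = rng.normal(scale=np.sqrt(step), size=d)
        theta = theta + mask * (0.5 * step * (grad_lik + grad_prior) + noise)
        if t >= n_steps // 2:            # discard the first half as burn-in
            samples.append(theta.copy())
    return np.array(samples)

# Toy data: only the weights selected by the (random) mask are nonzero.
rng = np.random.default_rng(1)
N, d = 200, 10
mask = np.zeros(d)
mask[rng.choice(d, size=4, replace=False)] = 1.0
w_true = mask * rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ w_true + rng.normal(scale=0.5, size=N)

samples = sgld_sparse(X, y, mask)
post_mean = samples.mean(axis=0)
```

Because only the active coordinates are ever updated, each stored sample needs only the mask and the active values, which is the kind of storage saving the abstract refers to.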
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning
