Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications

Yating Wang; Wei Deng; Lin Guang

arXiv:2006.16376 · math.NA · March 17, 2021

TL;DR

This paper introduces a Bayesian sparse deep learning algorithm based on preconditioned stochastic gradient MCMC. Adapting the step size to the local geometry of the parameter space improves convergence and efficiency in non-convex models, while the hyperparameters of the sparsity prior are optimized during training.

Contribution

The paper proposes a Bayesian sparse learning method based on preconditioned stochastic gradient Langevin dynamics (SGLD). The method addresses the slow mixing of standard SGLD and comes with asymptotic convergence guarantees for complex neural network models.
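For reference, plain SGLD and the RMSprop-style preconditioned SGLD of Li et al. (2016) update the parameters as follows. That the paper uses exactly this preconditioner is an assumption made here for illustration. In the notation below, ε_t is the step size, N the dataset size, S_t a minibatch of size n, and α, λ the decay and damping constants.

```latex
\begin{aligned}
\text{SGLD:}\quad & \theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\,\widetilde{\nabla}\log p(\theta_t \mid \mathcal{D}) + \sqrt{\epsilon_t}\,\xi_t,
  \qquad \xi_t \sim \mathcal{N}(0, I),\\
& \widetilde{\nabla}\log p(\theta \mid \mathcal{D}) = \nabla \log p(\theta) + \frac{N}{n}\sum_{i \in S_t} \nabla \log p(d_i \mid \theta),\\[4pt]
\text{pSGLD:}\quad & \bar g_t = \frac{1}{n}\sum_{i \in S_t} \nabla \log p(d_i \mid \theta_t),
  \qquad V_t = \alpha V_{t-1} + (1-\alpha)\,\bar g_t \odot \bar g_t,\\
& G_t = \operatorname{diag}\!\big(\mathbf{1} \oslash (\lambda \mathbf{1} + \sqrt{V_t})\big),\\
& \theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\Big[G_t\,\widetilde{\nabla}\log p(\theta_t \mid \mathcal{D}) + \Gamma(\theta_t)\Big] + \mathcal{N}(0,\, \epsilon_t G_t).
\end{aligned}
```

The diagonal G_t gives each component its own effective step size ε_t·G_t, in contrast to the single ε_t that SGLD applies everywhere; Γ collects derivatives of G and is often dropped in practice.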

Findings

01. Demonstrates improved accuracy on synthetic regression tasks.
02. Shows efficient learning of solutions to elliptic PDEs.
03. Provides theoretical convergence guarantees.

Abstract

In this work, we propose a Bayesian-type sparse deep learning algorithm. The algorithm places a set of spike-and-slab priors on the parameters of the deep neural network. The hierarchical Bayesian mixture is trained using an adaptive empirical method: the algorithm alternately samples from the posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD) and optimizes the latent variables via stochastic approximation. Sparsity of the network is achieved while the hyperparameters are optimized with adaptive searching and penalizing. A popular stochastic gradient MCMC (SG-MCMC) approach is stochastic gradient Langevin dynamics (SGLD). However, because the model parameter space in non-convex learning has complex geometry, updating every component with the same universal step size, as SGLD does, may cause slow mixing. To address this issue, we apply a computationally manageable…
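To make the alternating scheme concrete, here is a minimal NumPy sketch on a hypothetical toy problem (sparse linear regression rather than a deep network). It is not the authors' implementation. Assumptions not taken from the source: the RMSprop-style diagonal preconditioner of Li et al. (2016) with its correction term Γ dropped, Gaussian spike and slab components with standard deviations `SD_SPIKE` and `SD_SLAB`, the stochastic-approximation schedule ω_k = (k + 10)^(-0.7), a closed-form update of the prior inclusion rate `delta`, and all helper names (`inclusion_prob`, `make_toy_data`, `run_psgld_sa`).

```python
import numpy as np

# Hypothetical toy settings (not from the paper).
SIGMA = 0.5                    # observation noise sd, assumed known
SD_SPIKE, SD_SLAB = 0.05, 2.0  # sds of the spike and slab Gaussian components


def log_normal(x, sd):
    """Elementwise log density of N(0, sd^2) at x."""
    return -0.5 * (x / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)


def inclusion_prob(beta, delta):
    """P(gamma_j = 1 | beta_j, delta): chance that each weight comes from the slab."""
    log_odds = (np.log(delta) - np.log1p(-delta)
                + log_normal(beta, SD_SLAB) - log_normal(beta, SD_SPIKE))
    return 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50.0, 50.0)))


def make_toy_data(n=500, p=20, seed=1):
    """Hypothetical sparse regression data: only the first three weights are nonzero."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [3.0, -2.0, 1.5]
    y = X @ beta_true + SIGMA * rng.standard_normal(n)
    return X, y, beta_true


def run_psgld_sa(X, y, n_iter=3000, burn_in=1000, batch=50, lr=1e-4,
                 alpha=0.99, lam=1e-5, seed=0):
    """Alternate pSGLD sampling of beta with SA updates of the latent variables."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    beta = np.zeros(p)
    V = np.zeros(p)          # moving average of squared mean per-sample gradients
    rho = np.full(p, 0.5)    # SA estimates of the inclusion probabilities
    delta = 0.5              # prior inclusion rate (hyperparameter)
    beta_sum = np.zeros(p)

    for k in range(n_iter):
        idx = rng.choice(N, size=batch, replace=False)
        resid = y[idx] - X[idx] @ beta
        g_bar = X[idx].T @ resid / (batch * SIGMA ** 2)  # mean per-sample log-lik gradient
        # Gradient of the expected log prior under the current estimates rho.
        grad_prior = -beta * (rho / SD_SLAB ** 2 + (1.0 - rho) / SD_SPIKE ** 2)
        grad = N * g_bar + grad_prior  # stochastic gradient of the log posterior

        # RMSprop-style diagonal preconditioner; the correction term Gamma is dropped.
        V = alpha * V + (1.0 - alpha) * g_bar ** 2
        G = 1.0 / (lam + np.sqrt(V))
        noise = np.sqrt(lr * G) * rng.standard_normal(p)
        beta = beta + 0.5 * lr * G * grad + noise

        # Stochastic approximation for the latent variables, then a closed-form
        # update of delta from the current inclusion estimates.
        omega = (k + 10) ** -0.7
        rho = (1.0 - omega) * rho + omega * inclusion_prob(beta, delta)
        delta = float(np.clip(rho.mean(), 1e-3, 1.0 - 1e-3))

        if k >= burn_in:
            beta_sum += beta

    return beta_sum / (n_iter - burn_in), rho
```

For example, `X, y, beta_true = make_toy_data()` followed by `beta_hat, rho = run_psgld_sa(X, y)` returns the post-burn-in average of the sampled weights and the final inclusion probabilities; thresholding `rho` at 0.5 selects the sparse support. The key design point is the per-coordinate factor `G`, which replaces the single step size `lr` that plain SGLD would apply to every component.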
