Scalable Bayesian Learning with posteriors
Samuel Duffield, Kaelan Donatella, Johnathan Chiu, Phoebe Klett, Daniel Simpson

TL;DR
This paper introduces 'posteriors', a PyTorch library that makes scalable Bayesian learning feasible for large models and datasets, combining stochastic gradient MCMC, optimization, and ensemble methods.
Contribution
It presents a new library, 'posteriors', and a tempered stochastic gradient MCMC framework that unifies Bayesian inference, optimization, and ensemble methods for large-scale models.
Findings
The library enables scalable Bayesian inference in large models.
The tempered SGMCMC approach bridges Bayesian sampling and optimization.
Experiments compare the utility of Bayesian approximations, including an investigation into the cold posterior effect and applications with large language models.
Abstract
Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high-dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior; and (iii) demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.
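The "tempered framing" in the abstract can be illustrated with a generic tempered Langevin update, theta <- theta - step_size * grad U(theta) + sqrt(2 * step_size * temperature) * noise. The sketch below is not the posteriors API and not necessarily the paper's exact parameterization. It assumes a one-dimensional Gaussian target with a full (non-stochastic) gradient, and the names `grad_potential`, `tempered_langevin`, and `mean_and_variance` are hypothetical. It shows that temperature 1 yields approximate posterior samples, while temperature 0 removes the noise and leaves plain gradient descent.

```python
import math
import random


def grad_potential(theta, mu=2.0, sigma=0.5):
    """Gradient of U(theta) = (theta - mu)^2 / (2 sigma^2), a Gaussian target (assumed toy example)."""
    return (theta - mu) / sigma**2


def tempered_langevin(theta0, step_size, temperature, n_steps, rng):
    """Unadjusted Langevin steps targeting a density proportional to exp(-U(theta) / temperature).

    With temperature = 0 the noise term vanishes and each step is plain gradient descent.
    """
    theta = theta0
    trace = []
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        theta = theta - step_size * grad_potential(theta) + noise_scale * noise
        trace.append(theta)
    return trace


def mean_and_variance(values):
    """Sample mean and (biased) sample variance of a list of floats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


rng = random.Random(0)

# temperature = 1: approximate samples from the Gaussian target (mean 2.0, variance 0.25).
sampled = tempered_langevin(0.0, 0.01, 1.0, 50_000, rng)
sample_mean, sample_var = mean_and_variance(sampled[5_000:])  # discard burn-in

# temperature = 0: the same update rule reduces to gradient descent on U.
optimized = tempered_langevin(0.0, 0.01, 0.0, 2_000, rng)

print(f"T=1 mean {sample_mean:.3f}, variance {sample_var:.3f}")
print(f"T=0 final iterate {optimized[-1]:.6f}")
```

Under the same assumptions, running several such chains from different starting points at temperature 0 resembles a deep ensemble. Restoring a nonzero temperature adds the noise term back, which is the general kind of modification the abstract refers to.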
Peer Reviews
Decision: ICLR 2025 Poster
- Software release that allows other researchers and practitioners to build on the proposed and baseline methods. Overall appears to contain MCMC-based methods, variational inference, and Laplace approximations, all of which are relevant and practical methods.
- Clearly written motivation for Bayesian inference in general
- Interface seems very clear and flexible due to functional paradigm
- Additional contribution in form of a new SGMCMC algorithm?
- Convincing experiments on modern architectur
- Only diagonal variational and Laplace approximations, which are known to have certain issues and are generally outperformed by more structured posterior approximations. For example, Laplace approximations perform better across the board when using layer-wise Kronecker-factored structure (Ritter et al, ICLR 2018, https://discovery.ucl.ac.uk/id/eprint/10080902/1/kflaplace.pdf). Apparently, extending the library to such approximations seems out of scope? (lines 209-210)
- I find the interweaved S
The paper is in general well-written and the empirical results are presented clearly. I like the key features of the proposed library which are lacking in existing libraries. Below are the key strengths.
- The key features of the proposed library as sketched in section 3 are important for modern Bayesian deep learning.
- The experiments cover various scenarios in current deep learning landscape.
- The library is open-sourced and supports customization.
I think the paper can be improved by better scoping the problems that the library aims to solve, the inference approaches provided by the library, and the practical tradeoffs between the proposed Bayesian approach and the non-Bayesian approaches (e.g. SGD, LoRA and Deep Ensembles and their various combinations). In particular, the following questions should be addressed:
- What problems can the proposed library solve? Examples include disentangling various sources of uncertainties, continu
- The paper is mostly clear and concise, presents a straightforward approach, keeping primary discussions in the main sections and moving technical details to the appendix.
- The paper provides extensive experimentation, including large datasets and Bayesian inference with large-scale models, showing the library's potential across applications.
- The library is composable, extendable, and scalable; and promotes compatibility with pytorch and other popular libraries, and could therefore have a rel
- The clarity of writing in some technical sections could be improved. Certain descriptions of key concepts, particularly the tempered SGMCMC framework (section 4), would benefit from a more step-by-step breakdown. I felt it was a bit too rushed.
- The main text could further emphasize how "posteriors" differentiates itself technically, showing specific implementations or optimizations unique to this library (in a similar spirit to Figure 3). The current presentation could be misinterpreted as i
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
Methods: Lib · Deep Ensembles
