Coresets for Scalable Bayesian Logistic Regression
Jonathan H. Huggins, Trevor Campbell, Tamara Broderick

TL;DR
This paper introduces an efficient method for creating small, weighted data subsets called coresets for Bayesian logistic regression, enabling scalable inference with theoretical guarantees and practical efficiency.
Contribution
It develops a novel coreset construction algorithm for Bayesian logistic regression with proven size and approximation guarantees, applicable in streaming and parallel contexts.
Findings
Coresets substantially reduce data size with little loss in approximation quality.
Constructing the coreset takes negligible time compared with running MCMC on it.
In practice, the coreset size is independent of the original dataset size.
Abstract
The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification. In this paper, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide…
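To make the idea of a weighted subset concrete, here is a minimal sketch of a weighted logistic-regression log-likelihood evaluated on a small subset in place of the full dataset. It is an illustration under stated assumptions, not the paper's algorithm: the subset is drawn by plain uniform sampling with weights N / M, standing in for the paper's coreset construction. The function names (`log_likelihood`, `uniform_weighted_subset`), the synthetic data, and the sizes N and M are all hypothetical.

```python
import numpy as np

def log_likelihood(theta, X, y, weights=None):
    """Weighted logistic-regression log-likelihood; labels y are in {-1, +1}."""
    margins = y * (X @ theta)
    # log sigmoid(m) = -log(1 + exp(-m)), computed stably via logaddexp
    per_point = -np.logaddexp(0.0, -margins)
    if weights is None:
        weights = np.ones(len(y))
    return float(np.dot(weights, per_point))

def uniform_weighted_subset(n_total, subset_size, rng):
    """Stand-in construction (assumption): uniform sample, weight N / M each."""
    idx = rng.choice(n_total, size=subset_size, replace=False)
    weights = np.full(subset_size, n_total / subset_size)
    return idx, weights

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
N, D, M = 20000, 3, 2000
X = rng.normal(size=(N, D))
theta_true = np.array([1.0, -2.0, 0.5])
p = 1.0 / (1.0 + np.exp(-(X @ theta_true)))
y = np.where(rng.random(N) < p, 1.0, -1.0)

idx, w = uniform_weighted_subset(N, M, rng)

# Compare the full and subset log-likelihoods at an arbitrary parameter value.
theta = np.array([0.5, -1.0, 0.0])
full = log_likelihood(theta, X, y)
approx = log_likelihood(theta, X[idx], y[idx], w)
rel_err = abs(full - approx) / abs(full)
print(f"full={full:.1f}  subset={approx:.1f}  relative error={rel_err:.4f}")
```

Because the subset only changes the data and weights passed to `log_likelihood`, any inference routine that accepts per-point weights could consume it unchanged, which is the property the abstract highlights. Uniform sampling gives no guarantee of the kind the paper proves; it is used here only because it is simple to state.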
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Statistical Methods and Inference
Methods: Logistic Regression
