Scalable MCMC for Large Data Problems using Data Subsampling and the Difference Estimator
Matias Quiroz, Mattias Villani, Robert Kohn

TL;DR
This paper introduces a scalable MCMC algorithm that uses data subsampling and the difference estimator to efficiently approximate the likelihood, enabling faster Bayesian inference on large datasets.
Contribution
The authors develop a generic MCMC method leveraging the difference estimator for accurate likelihood estimation from small data subsamples, reducing computational complexity.
Findings
Significant speed-up over standard MCMC on large datasets.
Sampling from a perturbed posterior that is within O(m^{-1/2}) of the true posterior, where m is the subsample size.
Effective application to logistic regression for bankruptcy prediction.
Abstract
We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling literature to estimate the log-likelihood accurately using only a small fraction of the data. Our algorithm improves on the complexity of regular MCMC by operating over local data clusters instead of the full sample when computing the likelihood. The likelihood estimate is used in a Pseudo-marginal framework to sample from a perturbed posterior which is within O(m^{-1/2}) of the true posterior, where m is the subsample size. The method is applied to a logistic regression model to predict firm bankruptcy for a large data set. We document a significant speed up in comparison to the standard MCMC on the full dataset.
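To make the difference estimator mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the textbook survey-sampling form: each log-likelihood term l_k(theta) gets a cheap proxy q_k(theta), and the full log-likelihood is estimated as sum_k q_k + (n/m) * sum over the subsample of (l_k - q_k), using simple random sampling with replacement. The proxy used here (a first-order Taylor expansion in theta around a fixed reference point) is a stand-in chosen for brevity; the abstract indicates that the paper builds its approximations from local data clusters instead. All function names are hypothetical.

```python
import math
import random


def loglik_term(theta, x, y):
    """Log-likelihood contribution of one observation in logistic regression."""
    z = sum(t * xi for t, xi in zip(theta, x))
    # Numerically stable log(1 + exp(z)).
    log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return y * z - log1pexp


def taylor_proxy(theta, theta_ref, x, y):
    """Hypothetical control variate q_k(theta): first-order Taylor expansion
    of loglik_term in theta around theta_ref (an assumption of this sketch,
    not the cluster-based construction described in the abstract)."""
    z_ref = sum(r * xi for r, xi in zip(theta_ref, x))
    sigmoid = 0.5 * (1.0 + math.tanh(0.5 * z_ref))  # stable logistic function
    shift = sum((t - r) * xi for t, r, xi in zip(theta, theta_ref, x))
    return loglik_term(theta_ref, x, y) + (y - sigmoid) * shift


def difference_estimate(theta, X, Y, q, m, rng):
    """Difference estimator of the full-data log-likelihood.

    q   : list of proxies q_k(theta), one per observation
    m   : subsample size
    rng : random.Random instance
    """
    n = len(Y)
    # Simple random sampling with replacement.
    idx = [rng.randrange(n) for _ in range(m)]
    # Only the m sampled residuals l_k - q_k need exact likelihood terms.
    resid = sum(loglik_term(theta, X[i], Y[i]) - q[i] for i in idx)
    # For clarity, sum(q) is an O(n) pass here; a practical scheme would
    # obtain this sum cheaply, which is where the computational saving lies.
    return sum(q) + (n / m) * resid


if __name__ == "__main__":
    rng = random.Random(1)
    theta_ref = [0.5, -1.0]
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    Y = [float(rng.random() < 0.5) for _ in X]  # synthetic labels
    theta = [0.55, -0.95]
    q = [taylor_proxy(theta, theta_ref, x, y) for x, y in zip(X, Y)]
    print("estimate:", difference_estimate(theta, X, Y, q, 50, rng))
    print("full    :", sum(loglik_term(theta, x, y) for x, y in zip(X, Y)))
```

The estimator is unbiased for the log-likelihood whatever proxy is used, and its variance shrinks as the proxies track the true terms more closely, so a good q_k allows a small m. In a pseudo-marginal sampler the exponentiated estimate would take the place of the exact likelihood in the acceptance ratio; that step is not shown here.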
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Logistic Regression
