parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
Alexey Miroshnikov, Erin Conlon

TL;DR
The paper introduces the R package parallelMCMCcombine, which implements methods for combining posterior samples drawn independently on subsets of a large data set, making Bayesian inference on such data scalable.
Contribution
It provides an R package implementing four techniques for combining subset posterior samples, making these methods readily usable in practical Bayesian analysis of big data.
Findings
The combining approaches have been shown effective for logistic regression, Gaussian mixture, and hierarchical models
Demonstrated on simulation and real data examples
Supports models with fixed-dimensional parameters in continuous spaces
Abstract
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only because of their large sample sizes; these methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine, which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods…
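The partition, sample, and combine workflow described in the abstract can be illustrated with a minimal sketch. This is not the package's R code. It is a Python illustration under stated assumptions: a toy model for a normal mean with known variance, a prior raised to the power 1/M on each of the M subsets (a common convention so that the product of subset posteriors matches the full posterior), and exact conjugate draws standing in for the per-subset MCMC runs. It shows two simple combination rules of the kind the abstract describes: averaging the g-th draw across subsets, and a consensus-style weighted average using inverse subset sample variances as weights. All function names are hypothetical.

```python
import random
import statistics


def subset_posterior_draws(x_s, n_subsets, sigma2, mu0, tau2, n_draws, rng):
    """Draw from one subset posterior of a normal mean with known variance.

    Assumption: the prior N(mu0, tau2) is raised to the power 1/n_subsets,
    which divides its precision by n_subsets. Exact conjugate draws stand in
    for the MCMC output each subset analysis would produce.
    """
    prec = 1.0 / (n_subsets * tau2) + len(x_s) / sigma2
    mean = (mu0 / (n_subsets * tau2) + sum(x_s) / sigma2) / prec
    sd = (1.0 / prec) ** 0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def sample_avg(draws):
    """Hypothetical combiner: average the g-th draw across all subsets."""
    return [statistics.fmean(col) for col in zip(*draws)]


def consensus_indep(draws):
    """Hypothetical combiner: weighted average of the g-th draws, with each
    subset weighted by the inverse of its sample variance."""
    weights = [1.0 / statistics.variance(d) for d in draws]
    total = sum(weights)
    return [sum(w * t for w, t in zip(weights, col)) / total for col in zip(*draws)]


# Toy data set, partitioned round-robin into M subsets.
rng = random.Random(1)
sigma2, mu0, tau2 = 4.0, 0.0, 100.0
n, M, G = 10_000, 10, 5_000
x = [rng.gauss(1.5, sigma2 ** 0.5) for _ in range(n)]
subsets = [x[i::M] for i in range(M)]
draws = [subset_posterior_draws(xs, M, sigma2, mu0, tau2, G, rng) for xs in subsets]

# Exact full-data posterior, available here only because the model is conjugate.
full_prec = 1.0 / tau2 + n / sigma2
full_mean = (mu0 / tau2 + sum(x) / sigma2) / full_prec

for name, combined in [("sample avg", sample_avg(draws)),
                       ("consensus", consensus_indep(draws))]:
    print(name, statistics.fmean(combined), statistics.stdev(combined))
print("full", full_mean, full_prec ** -0.5)
```

Because this toy model is Gaussian and the subsets have equal sizes, both rules recover the full-data posterior up to Monte Carlo error; for non-Gaussian subset posteriors such combination rules are generally only approximations.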
