Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC
Qi Liu, Anindya Bhadra, and William S. Cleveland

TL;DR
This paper proposes a Divide & Recombine (D&R) procedure for approximating the likelihood function of a model fitted to big data: a parametric density is fitted to MCMC draws from each subset's likelihood function, and the fitted subset densities are then recombined.
Contribution
It proposes a D&R procedure that computes likelihood functions of data-model parameters for big data by fitting a parametric density to MCMC draws from each subset likelihood and recombining the fitted densities.
Findings
Effective estimation of likelihood functions for big data models.
Illustration on the logistic regression data-model, using normal and skew-normal likelihood-models.
Demonstrated scalability and accuracy of the method.
Abstract
In Divide & Recombine (D&R), big data are divided into subsets, each analytic method is applied to subsets, and the outputs are recombined. This enables deep analysis and practical computational performance. An innovative D&R procedure is proposed to compute likelihood functions of data-model (DM) parameters for big data. The likelihood-model (LM) is a parametric probability density function of the DM parameters. The density parameters are estimated by fitting the density to MCMC draws from each subset DM likelihood function, and then the fitted densities are recombined. The procedure is illustrated using normal and skew-normal LMs for the logistic regression DM.
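The steps in the abstract can be sketched in Python for the simplest case, a normal LM with a logistic regression DM. This is a minimal illustration under stated assumptions, not the authors' implementation: the random-walk Metropolis sampler, its step size, the random split into subsets, and all function names are hypothetical choices made here. The recombination step relies on the standard fact that a product of normal densities is proportional to a normal density whose precision is the sum of the subset precisions and whose mean is the precision-weighted average of the subset means. Whether the paper recombines in exactly this way is an assumption, and the skew-normal LM is not covered.

```python
import numpy as np

def log_lik(beta, X, y):
    """Logistic regression log-likelihood for one subset (the DM likelihood)."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def metropolis_draws(X, y, n_draws=4000, burn_in=1000, step=0.08, rng=None):
    """Random-walk Metropolis draws from the normalized subset likelihood.

    Assumption: a flat prior, so the target is the likelihood itself.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = np.zeros(X.shape[1])
    current = log_lik(beta, X, y)
    draws = []
    for i in range(n_draws + burn_in):
        proposal = beta + step * rng.standard_normal(beta.size)
        candidate = log_lik(proposal, X, y)
        # Accept with probability min(1, L(proposal) / L(beta)).
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta.copy())
    return np.asarray(draws)

def fit_normal(draws):
    """Fit the normal LM to MCMC draws: sample mean and sample covariance."""
    return draws.mean(axis=0), np.cov(draws, rowvar=False)

def recombine_normals(fits):
    """Multiply the fitted normal densities: precisions add, means are precision-weighted."""
    precisions = [np.linalg.inv(cov) for _, cov in fits]
    cov = np.linalg.inv(sum(precisions))
    mean = cov @ sum(p @ m for p, (m, _) in zip(precisions, fits))
    return mean, cov

def dr_normal_likelihood(X, y, n_subsets=4, seed=0):
    """Divide the rows at random, fit a normal LM per subset, then recombine."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    fits = []
    for idx in np.array_split(order, n_subsets):
        draws = metropolis_draws(X[idx], y[idx], rng=rng)
        fits.append(fit_normal(draws))
    return recombine_normals(fits)

if __name__ == "__main__":
    # Simulated data with hypothetical coefficients, for illustration only.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 2))
    true_beta = np.array([0.5, -1.0])
    y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
    mean, cov = dr_normal_likelihood(X, y)
    print("recombined mean:", mean)
    print("recombined covariance:\n", cov)
```

Each subset is processed independently, so the per-subset sampling and fitting could run in parallel; only the fitted means and covariances need to be brought together for the recombination step.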
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models
Methods: Logistic Regression
