Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC

Qi Liu, Anindya Bhadra, and William S. Cleveland

arXiv:1801.05007 · stat.ME · January 17, 2018

TL;DR

This paper introduces a Divide & Recombine (D&R) method that estimates likelihood functions for large datasets by fitting a parametric density to MCMC samples from each data subset and then recombining the fitted densities.

Contribution

It proposes a D&R procedure that computes the likelihood function of data-model parameters for big data: a parametric density is fitted to MCMC draws from each subset likelihood, and the fitted subset densities are recombined.

Findings

01. Effective estimation of likelihood functions for big data models.

02. Application to logistic regression with normal and skew-normal models.

03. Demonstrated scalability and accuracy of the method.

Abstract

In Divide & Recombine (D&R), big data are divided into subsets, each analytic method is applied to the subsets, and the outputs are recombined. This enables deep analysis and practical computational performance. An innovative D&R procedure is proposed to compute likelihood functions of data-model (DM) parameters for big data. The likelihood-model (LM) is a parametric probability density function of the DM parameters. The density parameters are estimated by fitting the density to MCMC draws from each subset DM likelihood function, and then the fitted densities are recombined. The procedure is illustrated using normal and skew-normal LMs for the logistic regression DM.
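The fit-then-recombine step described in the abstract can be sketched for the normal LM case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_normal_lm` and `recombine_normal_lms` are hypothetical, the "MCMC draws" are simulated stand-ins rather than output from a real sampler, and recombination is taken to be the product of the fitted normal densities (precisions add, means are precision-weighted). The skew-normal LM is not covered here.

```python
import numpy as np


def fit_normal_lm(draws):
    """Fit a multivariate normal to draws (rows = draws, columns = parameters)."""
    draws = np.asarray(draws, dtype=float)
    mean = draws.mean(axis=0)
    cov = np.atleast_2d(np.cov(draws, rowvar=False))
    return mean, cov


def recombine_normal_lms(fits):
    """Multiply fitted normal densities (assumed recombination rule).

    The product of normal densities is proportional to a normal whose
    precision is the sum of the subset precisions and whose mean is the
    precision-weighted average of the subset means.
    """
    dim = fits[0][0].shape[0]
    total_precision = np.zeros((dim, dim))
    weighted_mean = np.zeros(dim)
    for mean, cov in fits:
        precision = np.linalg.inv(cov)
        total_precision += precision
        weighted_mean += precision @ mean
    cov = np.linalg.inv(total_precision)
    return cov @ weighted_mean, cov


# Stand-in for MCMC output: four subsets, each with 2000 draws of a
# two-parameter vector. A real run would sample each subset likelihood.
rng = np.random.default_rng(0)
true_theta = np.array([0.5, -1.0])
subset_draws = [
    rng.multivariate_normal(true_theta, 0.04 * np.eye(2), size=2000)
    for _ in range(4)
]

fits = [fit_normal_lm(d) for d in subset_draws]
theta_hat, cov_hat = recombine_normal_lms(fits)
```

With four subsets of equal spread, the recombined covariance is roughly one quarter of each subset covariance, which reflects the information pooled across subsets.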

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models

Methods: Logistic Regression