Distributed inference for heterogeneous mixture models using multi-site data
Xiaokang Liu, Rui Duan, Raymond J. Carroll, Yang Ning, Yong Chen

TL;DR
This paper introduces a distributed EM algorithm for fitting heterogeneous mixture models across multiple sites without sharing individual-level data. The method accommodates site-specific heterogeneity and comes with theoretical convergence guarantees.
Contribution
It proposes a novel distributed EM framework for multi-site heterogeneous data, built on a density-ratio-tilted surrogate Q function, that maintains statistical efficiency.
Findings
The estimator achieves the same contraction rate as EM run on the pooled data.
The framework accommodates heterogeneity in mixing proportions across sites.
The method respects privacy and data-sharing constraints: no individual-level data are shared across sites.
Abstract
Mixture models postulate the overall population as a mixture of finite subpopulations with unobserved membership. Fitting mixture models usually requires large sample sizes and combining data from multiple sites can be beneficial. However, sharing individual participant data across sites is often less feasible due to various types of practical constraints, such as data privacy concerns. Moreover, substantial heterogeneity may exist across sites, and locally identified latent classes may not be comparable across sites. We propose a unified modeling framework where a common definition of the latent classes is shared across sites and heterogeneous mixing proportions of latent classes are allowed to account for between-site heterogeneity. To fit the heterogeneous mixture model on multi-site data, we propose a novel distributed Expectation-Maximization (EM) algorithm where at each iteration…
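To make the modeling framework concrete, here is a minimal sketch of a mixture in which every site shares the same component definitions but has its own mixing proportions. It is an ordinary EM fit on pooled data, not the paper's distributed algorithm, and it does not implement the density-ratio-tilted surrogate Q function. The one-dimensional Gaussian components, the function names (`simulate_sites`, `em_heterogeneous_mixture`), and all simulation settings are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_sites(rng, site_props, means, sds, n_per_site):
    """Draw 1-D data per site: shared components, site-specific proportions."""
    data = []
    for props in site_props:
        labels = rng.choice(len(means), size=n_per_site, p=props)
        data.append(rng.normal(means[labels], sds[labels]))
    return data


def em_heterogeneous_mixture(data, n_components, n_iter=200):
    """Pooled-data EM with shared Gaussian components and one mixing vector per site."""
    pooled = np.concatenate(data)
    # Initialise the shared means at evenly spaced quantiles of the pooled data.
    means = np.quantile(pooled, (np.arange(n_components) + 0.5) / n_components)
    sds = np.full(n_components, pooled.std())
    props = np.full((len(data), n_components), 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities at each site use that site's own proportions.
        resp = []
        for j, x in enumerate(data):
            log_dens = (-0.5 * ((x[:, None] - means) / sds) ** 2
                        - np.log(sds) - 0.5 * np.log(2 * np.pi))
            w = props[j] * np.exp(log_dens)
            resp.append(w / w.sum(axis=1, keepdims=True))
        # M-step: proportions are updated per site, component parameters jointly.
        props = np.array([r.mean(axis=0) for r in resp])
        r_all = np.vstack(resp)
        weight = r_all.sum(axis=0)
        means = (r_all * pooled[:, None]).sum(axis=0) / weight
        var = (r_all * (pooled[:, None] - means) ** 2).sum(axis=0) / weight
        sds = np.sqrt(np.maximum(var, 1e-8))
    return means, sds, props


# Hypothetical example: three sites, two shared latent classes.
rng = np.random.default_rng(0)
true_means = np.array([-3.0, 3.0])
true_sds = np.array([1.0, 1.0])
true_props = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]])
sites = simulate_sites(rng, true_props, true_means, true_sds, n_per_site=2000)
means, sds, props = em_heterogeneous_mixture(sites, n_components=2)
```

In this sketch the component means and standard deviations are estimated from all sites together, which is what keeps the latent classes comparable across sites, while each row of `props` is estimated from a single site. A distributed version would need to replace the pooled M-step with quantities that can be computed without moving individual records.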
Taxonomy
Topics: Bayesian Methods and Mixture Models · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
