Grouped Heterogeneous Mixture Modeling for Clustered Data
Shonosuke Sugasawa

TL;DR
This paper introduces grouped heterogeneous mixture modeling, a flexible approach for clustered data that yields interpretable cluster-wise distributions. It provides an estimation algorithm, a model selection criterion, and asymptotic analysis, and demonstrates the method on simulations and real data.
Contribution
It proposes a grouped heterogeneous mixture model for clustered data, together with a simple generalized EM algorithm, an information criterion for model selection, structured grouping strategies based on a penalized likelihood, and an asymptotic analysis.
Findings
Effective in modeling clustered data with interpretable groups
Demonstrated superior performance in simulations
Successfully applied to crime risk data in Tokyo
Abstract
Clustered data is ubiquitous in a variety of scientific fields. In this paper, we propose a flexible and interpretable modeling approach, called grouped heterogeneous mixture modeling, for clustered data, which models cluster-wise conditional distributions by mixtures of latent conditional distributions common to all the clusters. In the model, we assume that clusters are divided into a finite number of groups and mixing proportions are the same within the same group. We provide a simple generalized EM algorithm for computing the maximum likelihood estimator, and an information criterion to select the numbers of groups and latent distributions. We also propose structured grouping strategies by introducing penalties on grouping parameters in the likelihood function. Under the settings where both the number of clusters and cluster sizes tend to infinity, we present asymptotic properties of…
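The model structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes Gaussian latent distributions with no covariates, fixes the component means and standard deviations, and shows only two ingredients that a generalized EM scheme of this kind might use (assigning each cluster to a group, and updating the group-level mixing proportions). All function and variable names are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y (assumed form of the latent distributions)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def cluster_loglik(ys, pi_g, means, sds):
    """Log-likelihood of one cluster's observations under mixing proportions pi_g."""
    total = 0.0
    for y in ys:
        dens = sum(p * normal_pdf(y, m, s) for p, m, s in zip(pi_g, means, sds))
        total += math.log(dens)
    return total

def assign_groups(clusters, pis, means, sds):
    """Assign each cluster to the group whose mixing proportions fit it best."""
    return [
        max(range(len(pis)), key=lambda g: cluster_loglik(ys, pis[g], means, sds))
        for ys in clusters
    ]

def responsibilities(ys, pi_g, means, sds):
    """E-step: posterior probability of each latent component for each observation."""
    out = []
    for y in ys:
        w = [p * normal_pdf(y, m, s) for p, m, s in zip(pi_g, means, sds)]
        tot = sum(w)
        out.append([v / tot for v in w])
    return out

def update_mixing(clusters, groups, pis, means, sds):
    """M-step for mixing proportions: average responsibilities within each group."""
    n_groups, n_comp = len(pis), len(means)
    sums = [[0.0] * n_comp for _ in range(n_groups)]
    counts = [0] * n_groups
    for ys, g in zip(clusters, groups):
        for r in responsibilities(ys, pis[g], means, sds):
            for k in range(n_comp):
                sums[g][k] += r[k]
            counts[g] += 1
    # A group with no clusters keeps its previous proportions.
    return [
        [s / counts[g] for s in sums[g]] if counts[g] > 0 else list(pis[g])
        for g in range(n_groups)
    ]

# Toy data (made up for illustration): three clusters, two latent components.
clusters = [[-2.1, -1.9, -2.0, 2.0], [2.1, 1.8, 2.2, -2.0], [-1.8, -2.2, -2.0, -1.9]]
means, sds = [-2.0, 2.0], [1.0, 1.0]
pis = [[0.8, 0.2], [0.2, 0.8]]  # one row of mixing proportions per group

groups = assign_groups(clusters, pis, means, sds)
new_pis = update_mixing(clusters, groups, pis, means, sds)
print(groups, new_pis)
```

Clusters in the same group share one row of `pis`, which is the restriction the abstract describes; the latent components (`means`, `sds`) are common to all clusters. A fuller implementation would also update the component parameters and iterate until the likelihood stabilizes.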
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Data-Driven Disease Surveillance
