Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure
David Mimno, David M Blei, Barbara E Engelhardt

TL;DR
This paper introduces posterior predictive checks (PPCs) to evaluate the fit of admixture models in genetic studies, highlighting their utility in detecting model inadequacies across diverse datasets.
Contribution
Develops PPC-based methods for validating admixture models, enabling assessment of model fit for five population-level statistics used in population genetics.
Findings
PPCs reveal study-specific model fit issues.
Model fit varies substantially from one dataset to another.
PPCs are useful for large genomic studies.
Abstract
Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of a statistical model. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the model estimates for four qualitatively different population…
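As an illustration of the general idea behind a posterior predictive check, the sketch below simulates replicated genotype matrices from an admixture model and compares a discrepancy statistic on the replicates with the same statistic on the observed data. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the binomial genotype likelihood, and the `mean_locus_variance` statistic are hypothetical stand-ins chosen for illustration. The posterior draws of `theta` (individual admixture proportions) and `phi` (population allele frequencies) are assumed to come from some separately fitted model.

```python
import numpy as np


def simulate_genotypes(theta, phi, rng):
    """Draw one replicated genotype matrix (individuals x loci), entries in {0, 1, 2}.

    Assumption: theta has shape (N, K) with rows summing to 1, and phi has
    shape (K, L) with entries in [0, 1].
    """
    p = theta @ phi  # per-individual, per-locus allele frequency
    return rng.binomial(2, p)  # two allele copies per diploid individual


def mean_locus_variance(genotypes):
    """Hypothetical discrepancy: genotype variance across individuals, averaged over loci."""
    return genotypes.var(axis=0).mean()


def posterior_predictive_check(observed, posterior_draws, statistic, rng):
    """Compare the statistic on observed data with its distribution on replicates.

    posterior_draws is an iterable of (theta, phi) pairs; one replicated
    dataset is simulated per draw.
    """
    obs = statistic(observed)
    reps = np.array(
        [statistic(simulate_genotypes(theta, phi, rng)) for theta, phi in posterior_draws]
    )
    # Fraction of replicates at least as extreme as the observed value
    p_value = float(np.mean(reps >= obs))
    return obs, reps, p_value
```

A posterior predictive p-value close to 0 or 1 would suggest that the fitted model does not reproduce this aspect of the observed data. Any other discrepancy function that maps a genotype matrix to a number can be passed as `statistic`.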
