Evaluation of population structure inferred by principal component analysis or the admixture model
Jan van Waaij, Song Li, Genís Garcia-Erill, Anders Albrechtsen, Carsten Wiuf

TL;DR
This paper introduces a statistical method to evaluate how well PCA and admixture models fit genetic data, helping to identify violations of assumptions and individuals poorly represented by these models.
Contribution
A novel approach to assess the fit of PCA and admixture models in population genetics using residual covariance analysis.
Findings
Method detects violations of PCA assumptions.
Guides interpretation of population structure.
Identifies individuals not well modeled.
Abstract
Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and to show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violation of the model assumptions. Based on simulations and genome wide human data we show that our assessment of fit can be used to…
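The idea described in the abstract (predict genotypes from the top principal components, then inspect the correlation of the residuals between individuals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `residual_correlation` and `mean_within_population`, the simulated three-population data, and the use of a plain, uncorrected `np.corrcoef` on the residuals are all assumptions made for illustration.

```python
import numpy as np

def residual_correlation(G, k):
    """Naive residual correlation between individuals after a rank-k PCA fit.

    G is an (n_individuals, n_sites) genotype matrix with entries 0/1/2.
    Hypothetical helper for illustration, not the paper's estimator.
    """
    G = np.asarray(G, dtype=float)
    site_means = G.mean(axis=0)
    centered = G - site_means                      # center each site
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    predicted = (U[:, :k] * s[:k]) @ Vt[:k] + site_means  # rank-k prediction
    residuals = G - predicted                      # observed minus predicted
    return np.corrcoef(residuals)                  # rows are individuals

def mean_within_population(C, labels):
    """Average off-diagonal correlation among individuals sharing a label."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return C[same & off_diag].mean()

# Simulated data (an assumption): three unadmixed populations with
# independent allele frequencies, so two PCs are needed to capture the structure.
rng = np.random.default_rng(0)
n_per_pop, n_sites = 20, 2000
freqs = rng.uniform(0.1, 0.9, size=(3, n_sites))
G = np.vstack([rng.binomial(2, f, size=(n_per_pop, n_sites)) for f in freqs])
labels = np.repeat(np.arange(3), n_per_pop)

within_k1 = mean_within_population(residual_correlation(G, 1), labels)
within_k2 = mean_within_population(residual_correlation(G, 2), labels)
print(f"k=1: {within_k1:.3f}  k=2: {within_k2:.3f}")
```

With too few components (`k=1`), individuals from the same population keep positively correlated residuals, which signals a poor fit. With `k=2` the positive correlation disappears. Note that this naive estimate tends to be slightly negative even when the model fits, because the components are estimated from the same data they are used to predict, so it should be read as a qualitative diagnostic only.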
Taxonomy
Topics: Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock · Genetics and Plant Breeding
