Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations
Katarzyna Bryc, Wlodek Bryc, Jack W. Silverstein

TL;DR
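This paper gives a mathematical model and analysis for using principal component analysis (PCA) of biallelic genetic marker data to detect the number of subpopulations in a sample, and indicates that the method's power relies more on the number of individuals genotyped than on the number of markers.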
Contribution
It introduces a mathematical model that justifies and quantifies the effectiveness of PCA in identifying subpopulations from genotype data.
Findings
The power of the technique relies more on the number of individuals genotyped than on the number of markers
The mathematical analysis justifies the use of PCA to detect the number of subpopulations
The analysis quantifies the separation of the largest eigenvalues
Abstract
We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.
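The idea in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or its detection criterion. It assumes independent uniform allele frequencies per subpopulation, binomial genotypes coded 0/1/2, and a simple largest-ratio heuristic for finding the eigenvalue gap. The function names, parameter values, and the `window` cutoff are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_genotypes(k, n_per_pop, n_markers, seed=0):
    """Simulate 0/1/2 genotypes for k discrete subpopulations.

    Assumption (not from the paper): each subpopulation gets its own
    allele frequency per marker, drawn independently from Uniform(0.1, 0.9).
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(k, n_markers))
    labels = np.repeat(np.arange(k), n_per_pop)
    # Each genotype is the number of copies of one allele: Binomial(2, p).
    return rng.binomial(2, freqs[labels])


def sorted_eigenvalues(genotypes):
    """Eigenvalues (descending) of the individuals-by-individuals matrix."""
    centered = genotypes - genotypes.mean(axis=0)  # center each marker
    gram = centered @ centered.T / genotypes.shape[1]
    return np.linalg.eigvalsh(gram)[::-1]


def count_leading(eigs, window=10):
    """Illustrative heuristic: position of the largest ratio between
    consecutive eigenvalues among the top `window` values."""
    ratios = eigs[:window] / eigs[1:window + 1]
    return int(np.argmax(ratios)) + 1


genotypes = simulate_genotypes(k=3, n_per_pop=40, n_markers=500)
eigs = sorted_eigenvalues(genotypes)
# After centering, k subpopulations are expected to give k - 1 large eigenvalues.
estimated_k = count_leading(eigs) + 1
print("largest eigenvalues:", np.round(eigs[:5], 2))
print("estimated number of subpopulations:", estimated_k)
```

Rerunning the script with different values of `n_per_pop` and `n_markers` is one way to explore how the gap between the leading eigenvalues and the rest responds to the number of individuals versus the number of markers.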
