Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations

Katarzyna Bryc; Wlodek Bryc; Jack W. Silverstein

arXiv:1301.4511·q-bio.PE·October 19, 2017


TL;DR

This paper provides a mathematical framework for using principal component analysis to detect subpopulations in genetic data, emphasizing the importance of sample size over marker quantity.

Contribution

It introduces a mathematical model that justifies and quantifies the effectiveness of PCA in identifying subpopulations from genotype data.

Findings

01 Power depends more on the number of individuals than on the number of markers

02 Mathematical analysis supports PCA's use in population detection

03 Quantitative measures of eigenvalue separation

Abstract

We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.
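As a rough illustration of the idea in the abstract, the following sketch simulates biallelic genotypes from a few discrete subpopulations and looks for a gap in the eigenvalue spectrum. It is not the paper's model: the allele-frequency distribution (independent uniform draws per subpopulation), the function names `simulate_genotypes`, `sorted_eigenvalues` and `count_separated`, the sample sizes, and the ratio-gap heuristic are all assumptions made for this example.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_markers, n_pops, rng):
    """Simulate genotypes (0, 1 or 2 allele copies) for discrete subpopulations.

    Assumption for this sketch: each subpopulation gets its own allele
    frequency per marker, drawn independently from Uniform(0.1, 0.9).
    """
    freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_markers))
    labels = np.repeat(np.arange(n_pops), n_per_pop)
    # A diploid genotype at a biallelic marker is Binomial(2, p).
    genotypes = rng.binomial(2, freqs[labels])
    return genotypes, labels


def sorted_eigenvalues(genotypes):
    """Eigenvalues (descending) of the individuals-by-individuals matrix."""
    # Subtract each marker's mean across individuals (column centering).
    centered = genotypes - genotypes.mean(axis=0)
    gram = centered @ centered.T / genotypes.shape[1]
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(gram)[::-1]


def count_separated(eigvals):
    """Heuristic: position of the largest ratio gap in the top half of the spectrum.

    Only the top half is inspected because column centering forces the
    smallest eigenvalue to (numerically) zero, which would otherwise
    produce a spurious gap at the bottom of the spectrum.
    """
    top = eigvals[: len(eigvals) // 2]
    ratios = top[:-1] / top[1:]
    return int(np.argmax(ratios)) + 1


rng = np.random.default_rng(0)
genotypes, labels = simulate_genotypes(n_per_pop=20, n_markers=2000, n_pops=3, rng=rng)
eigvals = sorted_eigenvalues(genotypes)
n_large = count_separated(eigvals)
print("leading eigenvalues:", np.round(eigvals[:5], 3))
print("eigenvalues separated from the bulk:", n_large)
```

With three strongly differentiated subpopulations, this setup should show two eigenvalues well above the rest. The count is one fewer than the number of subpopulations because, in this sketch, column centering removes one direction of between-group variation. Changing `n_per_pop` and `n_markers` is a simple way to explore how the gap responds to the number of individuals versus the number of markers.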
