An Eigenvalue Ratio Approach to Inferring Population Structure from Whole Genome Sequencing Data
Yuyang Xu, Zhonghua Liu, Jianfeng Yao

TL;DR
This paper introduces ERStruct, a new eigenvalue ratio method for inferring population structure from whole genome sequencing data, addressing limitations of traditional PCA-based approaches in high-dimensional, linkage disequilibrium-rich data.
Contribution
The paper proposes ERStruct, an eigenvalue ratio approach that improves population structure inference from sequencing data by addressing two weaknesses of the traditional PCA-based method: a sample-to-marker ratio n/p that is nearly zero, and linkage disequilibrium among variants.
Findings
ERStruct outperforms traditional methods in simulations.
It accurately determines the number of top principal components that capture population structure.
The method is demonstrated on HapMap 3 and 1000 Genomes data.
Abstract
Inference of population structure from genetic data plays an important role in population and medical genetics studies. With the advancement and decreasing cost of sequencing technology, the increasingly available whole genome sequencing data provide much richer information about the underlying population structure. The traditional method (Patterson, Price, and Reich, 2006) originally developed for array-based genotype data for computing and selecting top principal components that capture population structure may not perform well on sequencing data for two reasons. First, the number of genetic variants p is much larger than the sample size n in sequencing data such that the sample-to-marker ratio n/p is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method might not be able to handle the linkage disequilibrium well in sequencing data.…
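To make the idea of an eigenvalue ratio concrete, here is a minimal Python sketch of a generic eigenvalue ratio heuristic for choosing the number of top principal components. It is not the ERStruct algorithm, whose details are not given in the truncated abstract above; it only illustrates the general technique of comparing consecutive sample eigenvalues. The function names `simulate_genotypes` and `eigenvalue_ratio_k`, the simulation settings, and the `k_max` cutoff are all assumptions made for illustration.

```python
import numpy as np


def simulate_genotypes(n_pops=3, n_per_pop=50, p=2000, seed=0):
    """Toy data (hypothetical): each population gets its own allele frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(n_pops, p))
    labels = np.repeat(np.arange(n_pops), n_per_pop)
    # Genotypes are minor-allele counts in {0, 1, 2}; shape (n, p) with n << p.
    return rng.binomial(2, freqs[labels])


def eigenvalue_ratio_k(genotypes, k_max=10):
    """Pick k minimizing lambda_{k+1} / lambda_k (generic heuristic, not ERStruct)."""
    X = np.asarray(genotypes, dtype=float)
    X = X[:, X.std(axis=0) > 0]                # drop monomorphic markers
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each marker
    n, p = Z.shape
    gram = Z @ Z.T / p                         # n x n matrix, cheap when n << p
    eig = np.linalg.eigvalsh(gram)[::-1]       # eigenvalues in descending order
    eig = eig[: n - 1]                         # centering makes the last one ~0
    k_max = min(k_max, len(eig) - 1)
    ratios = eig[1 : k_max + 1] / eig[:k_max]  # ratios[k-1] = lambda_{k+1}/lambda_k
    return int(np.argmin(ratios)) + 1, ratios
```

Calling `eigenvalue_ratio_k(simulate_genotypes())` returns the estimated number of top components together with the consecutive ratios. Working with the n x n matrix rather than the p x p covariance matrix keeps the computation feasible when n/p is nearly zero, which is the regime the abstract describes. A sharp drop in the ratio marks the boundary between structure-bearing and noise eigenvalues in this toy setting; real sequencing data with linkage disequilibrium would need the more careful treatment the paper is about.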
Taxonomy
Topics: Genetic Associations and Epidemiology · Evolution and Genetic Dynamics · Gene expression and cancer classification
