A Spectral Graph Approach to Discovering Genetic Ancestry
Ann B. Lee, Diana Luca, and Kathryn Roeder

TL;DR
This paper introduces a spectral graph method for analyzing genetic data that improves the delineation of human ancestry, offering more stability and flexibility than traditional PCA-based approaches.
Contribution
It develops a novel spectral embedding technique based on the normalized graph Laplacian that improves genetic clustering and ancestry inference relative to PCA.
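To illustrate the contribution, here is a sketch that builds a similarity graph over individuals and embeds it with eigenvectors of the normalized Laplacian. It is a minimal sketch under stated assumptions, not the authors' implementation. The Gaussian kernel, the median-distance bandwidth, the function names `gaussian_similarity` and `normalized_laplacian_embedding`, and the simulated two-population genotypes are all hypothetical choices.

```python
import numpy as np

def gaussian_similarity(G, sigma=None):
    """Gaussian-kernel similarity between the rows (individuals) of G.

    G is an n x m genotype matrix coded 0/1/2. The kernel and the default
    median-distance bandwidth are illustrative assumptions.
    """
    G = np.asarray(G, dtype=float)
    sq = np.sum(G**2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G @ G.T, 0.0)
    if sigma is None:
        off_diag = d2[~np.eye(len(G), dtype=bool)]
        sigma = np.sqrt(np.median(off_diag))
    return np.exp(-d2 / (2.0 * sigma**2))

def normalized_laplacian_embedding(W, k=2):
    """Spectral embedding from L_sym = I - D^{-1/2} W D^{-1/2}.

    Returns the k smallest nontrivial eigenvalues and the coordinates
    D^{-1/2} v, i.e. the corresponding random-walk Laplacian eigenvectors.
    """
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Column 0 is the trivial eigenvector (eigenvalue 0), so skip it.
    return evals[1:k + 1], d_inv_sqrt[:, None] * evecs[:, 1:k + 1]

# Hypothetical data: two populations with different allele frequencies.
rng = np.random.default_rng(0)
m = 200
G = np.vstack([rng.binomial(2, 0.2, size=(20, m)),
               rng.binomial(2, 0.8, size=(20, m))])
evals, coords = normalized_laplacian_embedding(gaussian_similarity(G), k=2)
```

Replacing `gaussian_similarity` with any other non-negative, symmetric similarity matrix leaves the embedding step unchanged. This is one way a graph-based embedding can accommodate different similarity measures.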
Findings
More meaningful ancestry delineation than PCA
Stable to outliers and adaptable to various similarity measures
Effective in large, heterogeneous genetic samples
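Because the method adapts to various similarity measures, the graph weights need not come from Euclidean distances. As one hedged example, the sketch below computes identity-by-state (IBS) allele sharing for genotypes coded 0/1/2 and sparsifies it into a symmetric k-nearest-neighbor graph. The summary here confirms neither choice. `ibs_similarity`, `knn_graph`, and the toy genotypes are illustrative assumptions.

```python
import numpy as np

def ibs_similarity(G):
    """Identity-by-state allele-sharing similarity for genotypes coded 0/1/2.

    S[i, j] = mean over SNPs of (2 - |g_i - g_j|) / 2, so S lies in [0, 1]
    and S[i, i] = 1.
    """
    G = np.asarray(G, dtype=float)
    diff = np.abs(G[:, None, :] - G[None, :, :])  # n x n x m differences
    return np.mean(1.0 - diff / 2.0, axis=2)

def knn_graph(S, k=5):
    """Keep each node's k largest off-diagonal similarities, then symmetrize.

    The k-nearest-neighbor sparsification is a hypothetical choice.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    off = S.copy()
    np.fill_diagonal(off, -np.inf)                 # exclude self-similarity
    nbrs = np.argsort(off, axis=1)[:, ::-1][:, :k]  # top-k neighbors per row
    rows = np.repeat(np.arange(n), k)
    W = np.zeros_like(S)
    W[rows, nbrs.ravel()] = S[rows, nbrs.ravel()]
    return np.maximum(W, W.T)                      # union of neighbor sets

# Toy genotypes: individuals 0-1 and 2-3 are genetically close.
G = np.array([[0, 1, 2, 2],
              [0, 1, 2, 1],
              [2, 1, 0, 0],
              [2, 2, 0, 0]])
S = ibs_similarity(G)
W = knn_graph(S, k=1)
```

The dense n x n x m difference array is fine for a toy example but would need chunking for genome-wide data.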
Abstract
Mapping human genetic variation is fundamentally interesting in fields such as anthropology and forensic inference. At the same time, patterns of genetic diversity confound efforts to determine the genetic basis of complex disease. Thanks to technological advances, it is now possible to measure hundreds of thousands of genetic variants per individual across the genome. Principal component analysis (PCA) is routinely used to summarize the genetic similarity between subjects, and the eigenvectors are interpreted as dimensions of ancestry. We build on this idea using a spectral graph approach. In the process we draw on connections between multidimensional scaling and spectral kernel methods. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce a more meaningful delineation of ancestry than PCA. The method is stable to outliers and can more…
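The abstract's link between PCA and multidimensional scaling can be checked numerically. For a column-centered genotype matrix, classical MDS on Euclidean distances recovers the same coordinates as PCA, up to the sign of each axis. The sketch below assumes per-SNP standardization (dropping monomorphic SNPs) and uses simulated genotypes. The helper names `standardize`, `pca_scores`, and `classical_mds` are hypothetical.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column; drop monomorphic (zero-variance) SNPs."""
    G = np.asarray(G, dtype=float)
    G = G[:, G.std(axis=0) > 0]
    return (G - G.mean(axis=0)) / G.std(axis=0)

def pca_scores(X, k=2):
    """PCA scores U_k * s_k for the rows of a column-centered matrix X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def classical_mds(X, k=2):
    """Classical (Torgerson) MDS from squared Euclidean distances between rows."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]            # indices of the k largest
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

# Hypothetical data: two populations with perturbed allele frequencies.
rng = np.random.default_rng(1)
m = 300
p1 = rng.uniform(0.1, 0.9, m)
p2 = np.clip(p1 + rng.normal(0.0, 0.2, m), 0.05, 0.95)
G = np.vstack([rng.binomial(2, p1, size=(25, m)),
               rng.binomial(2, p2, size=(25, m))])
X = standardize(G)
pcs = pca_scores(X)
mds = classical_mds(X)
# The two embeddings should agree up to the sign of each column.
print(np.allclose(np.abs(pcs), np.abs(mds), atol=1e-6))
```

This works because double-centering the squared-distance matrix of a centered X returns the Gram matrix X Xᵀ. Its eigenvectors, scaled by the square roots of its eigenvalues, are the PCA scores. A spectral graph approach keeps the eigendecomposition step but replaces the Gram matrix with an operator built from a similarity graph.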
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Genetic Mapping and Diversity in Plants and Animals
