Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data

Rosemary Braun; William Rowe; Carl Schaefer; Jinghui Zhang; Kenneth Buetow

arXiv:0902.1506·q-bio.GN·September 24, 2015

TL;DR

This paper critically evaluates a genetic distance metric used to identify individuals in pooled genomic data, revealing its limitations in specificity and exploring potential improvements and applications.

Contribution

The study provides a comprehensive analysis of the assumptions, limitations, and potential uses of a novel genetic distance metric for individual identification.

Findings

01 Low specificity, in several circumstances, when identifying individuals as members of a population sample

02 Misclassifications stem from violations of the method's assumptions

03 Potential for future research in ancestry and disease prediction

Abstract

Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present an exploration of the strengths and limitations of that method. In addition to analytical investigations of the underlying assumptions, we use both real and simulated genotypes to test empirically the method's accuracy. The results reveal that, when used as a means by which to identify individuals as members of a population sample, the specificity is low in several circumstances. We find that the misclassifications stem from violations of…
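The abstract does not state the metric's formula, so the following is only a minimal sketch under an assumed form: for each SNP j, D_j = |Y_j - Ref_j| - |Y_j - Pool_j|, where Y_j is the individual's allele fraction and Ref_j and Pool_j are the marginal allele frequencies of a reference sample and of the pooled sample, summarized by a one-sample t statistic across SNPs. All function names, sample sizes, and the simulation setup (independent SNPs, one homogeneous population) are hypothetical choices for illustration, not the authors' procedure.

```python
import math
import random
import statistics


def distance_per_snp(y, ref_freq, pool_freq):
    """Assumed per-SNP distance: |y_j - ref_j| - |y_j - pool_j|."""
    return [abs(yj - rj) - abs(yj - pj)
            for yj, rj, pj in zip(y, ref_freq, pool_freq)]


def t_statistic(d):
    """One-sample t statistic of the per-SNP distances against a mean of zero."""
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def simulate(n_snps=10000, pool_size=25, ref_size=25, seed=0):
    """Simulate one pool member and one outsider drawn from the same population."""
    rng = random.Random(seed)
    # Hypothetical population allele frequencies, one per independent SNP.
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]

    def person():
        # Genotype encoded as an allele fraction: 0, 0.5, or 1.
        return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]

    pool = [person() for _ in range(pool_size)]
    ref = [person() for _ in range(ref_size)]
    outsider = person()

    # Only marginal allele frequencies of each sample are used by the metric.
    pool_freq = [sum(col) / pool_size for col in zip(*pool)]
    ref_freq = [sum(col) / ref_size for col in zip(*ref)]

    t_in = t_statistic(distance_per_snp(pool[0], ref_freq, pool_freq))
    t_out = t_statistic(distance_per_snp(outsider, ref_freq, pool_freq))
    return t_in, t_out


if __name__ == "__main__":
    t_member, t_outsider = simulate()
    print(f"pool member t = {t_member:.2f}, outsider t = {t_outsider:.2f}")
```

In this idealized setting the pool member should score well above the outsider. The abstract's point is that specificity degrades once the underlying assumptions are violated, a case this sketch deliberately does not model.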
