Estimating heterozygosity from a low-coverage genome sequence, leveraging data from other individuals sequenced at the same sites
Katarzyna Bryc, Nick Patterson, and David Reich

TL;DR
This paper introduces a novel method to estimate an individual's genome-wide heterozygosity from low-coverage sequencing data by jointly modeling shared allele distributions, sequencing errors, and biases, validated on simulated and real human data.
Contribution
The authors present a new approach that accurately estimates heterozygosity without calling genotypes, overcoming the challenges of low-coverage sequencing.
Findings
The method performs well on simulated data.
Estimates from low-coverage data are consistent with those from high-coverage data.
Heterozygosity ratios are more reliable than absolute estimates.
Abstract
High-throughput shotgun sequence data makes it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual's genome, which is informative about inbreeding and effective population size. However, in many cases, the available sequence data of an individual is limited to low coverage, preventing the confident calling of genotypes necessary to directly count the proportion of heterozygous sites. Here, we present a method for estimating an individual's genome-wide rate of heterozygosity from low-coverage sequence data, without an intermediate step calling genotypes. Our method jointly learns the shared allele distribution between the individual and a panel of other individuals, together with the sequencing error distributions and the…
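The abstract's point that low coverage prevents directly counting heterozygous sites can be illustrated with a small calculation. The sketch below is not the authors' method. It assumes an idealized model, introduced here only for illustration, in which each read at a truly heterozygous site samples either allele with probability 1/2, independently and without sequencing error. The function names and the heterozygosity value of 0.001 are hypothetical.

```python
def prob_both_alleles_seen(depth: int) -> float:
    """Probability that reads at a truly heterozygous site show both alleles.

    Assumption (illustrative only): each read samples one of the two alleles
    with probability 1/2, independently, with no sequencing error.
    """
    if depth < 1:
        return 0.0
    # All reads carry the same allele with probability 2 * (1/2)**depth,
    # so both alleles appear with probability 1 - (1/2)**(depth - 1).
    return 1.0 - 0.5 ** (depth - 1)


def naive_observed_heterozygosity(true_het: float, depth: int) -> float:
    """Expected fraction of sites that look heterozygous by direct counting."""
    return true_het * prob_both_alleles_seen(depth)


if __name__ == "__main__":
    # 0.001 is an arbitrary illustrative rate, not a value from the paper.
    for depth in (1, 2, 4, 8, 30):
        print(depth, naive_observed_heterozygosity(0.001, depth))
```

Under these assumptions, direct counting recovers none of the heterozygous sites at depth 1 and only half of them at depth 2, which is why an approach that avoids genotype calls is attractive at low coverage. Real data also contain sequencing errors and biases, which this sketch ignores.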
Taxonomy
Topics: Digestive system and related health · Forensic and Genetic Research · Genomics and Phylogenetic Studies
