Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data

Darren Kessner, Tom Turner, and John Novembre

arXiv:1209.4128·q-bio.QM·February 7, 2013

TL;DR

This paper introduces an EM algorithm that estimates the frequencies of known haplotypes directly from pooled sequencing reads, improves accuracy over methods based on single-site allele frequencies, and comes with an open-source implementation.

Contribution

The paper presents a novel EM-based method for estimating the frequencies of known haplotypes from pooled sequence data, applicable to microbiome and population studies.

Findings

01 Outperforms existing methods based on single-site allele frequencies

02 Relevant to pooled sequencing data from selection experiments and to estimating strain proportions in metagenomic samples

03 Implemented as open-source software

Abstract

DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g. bacterial species comprising a microbiome, or pathogen strains in a blood sample). We present an expectation-maximization (EM) algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different strains within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well…
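
To make the idea in the abstract concrete, here is a minimal sketch of a generic mixture-model EM for this setting. It is not the authors' implementation: the read representation (a dict mapping site index to observed allele), the single per-site `error_rate`, the biallelic 0/1 coding, and the function names `read_likelihood`, `em_haplotype_frequencies`, and `simulate_reads` are all assumptions made for illustration.

```python
import random


def read_likelihood(read, haplotype, error_rate):
    """P(read | haplotype) under an assumed independent per-site error model."""
    p = 1.0
    for site, allele in read.items():
        p *= (1.0 - error_rate) if haplotype[site] == allele else error_rate
    return p


def em_haplotype_frequencies(reads, haplotypes, error_rate=0.01,
                             n_iter=500, tol=1e-10):
    """Estimate mixture proportions of known haplotypes from pooled reads."""
    k = len(haplotypes)
    # Read-by-haplotype likelihoods do not depend on the frequencies,
    # so they are computed once up front.
    lik = [[read_likelihood(r, h, error_rate) for h in haplotypes]
           for r in reads]
    freqs = [1.0 / k] * k  # uniform starting point
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in lik:
            # E-step: posterior probability that this read came from each haplotype.
            weights = [f * l for f, l in zip(freqs, row)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is incompatible with every haplotype; skip it
            for j in range(k):
                totals[j] += weights[j] / norm
        # M-step: new frequencies are the normalized expected read counts.
        total = sum(totals)
        new_freqs = [t / total for t in totals]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs


def simulate_reads(haplotypes, true_freqs, n_reads, read_span,
                   error_rate=0.01, seed=0):
    """Draw short reads from a pool with the given haplotype frequencies."""
    rng = random.Random(seed)
    n_sites = len(haplotypes[0])
    reads = []
    for _ in range(n_reads):
        hap = rng.choices(haplotypes, weights=true_freqs)[0]
        start = rng.randrange(n_sites - read_span + 1)
        read = {}
        for site in range(start, start + read_span):
            allele = hap[site]
            if rng.random() < error_rate:
                allele = 1 - allele  # sequencing error flips the allele
            read[site] = allele
        reads.append(read)
    return reads


if __name__ == "__main__":
    haplotypes = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 1],
        [0, 1, 1, 1, 0, 1],
    ]
    reads = simulate_reads(haplotypes, [0.6, 0.3, 0.1],
                           n_reads=5000, read_span=2, seed=1)
    print(em_haplotype_frequencies(reads, haplotypes))
```

In this sketch, reads that span more than one variable site carry linkage information that per-site allele counts discard, which is one plausible reading of why a read-level approach could do better than single-site estimates; the source itself gives no further detail here.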
