TL;DR
This paper examines haplotype assembly based on the Minimum Error Correction (MEC) metric, shows that it can reconstruct haplotypes incorrectly from error-prone long reads produced by several sequencing devices, and suggests coverage requirements to improve accuracy.
Contribution
It demonstrates that MEC can produce incorrect haplotypes with long read data and provides coverage guidelines to mitigate this issue.
Findings
MEC may lead to incorrect haplotypes with long reads.
A read coverage of 25× is recommended for Pacific BioSciences RS data.
MEC performance varies with error rates and coverage levels.
Abstract
The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome-phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes, exploiting reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used for reconstruction of haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach to demonstrate that it may result in incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific BioSciences and Oxford Nanopore Technologies. We show that imprecise haplotypes may be…
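For readers unfamiliar with the MEC metric, the following is a minimal sketch of how an MEC score is commonly computed. It is not taken from the paper. It assumes biallelic SNPs encoded as "0"/"1", reads already aligned to the SNP positions with "-" marking uncovered sites, and a diploid genome in which the second haplotype is the complement of the first. The function names (`mismatches`, `mec_score`, `complement`, `min_mec_bruteforce`) and the toy reads are hypothetical.

```python
from itertools import product


def mismatches(read: str, hap: str) -> int:
    """Count covered SNP sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)


def complement(hap: str) -> str:
    """Second haplotype under the all-heterozygous assumption."""
    return "".join("1" if a == "0" else "0" for a in hap)


def mec_score(reads: list[str], hap1: str, hap2: str) -> int:
    """Each read is assigned to its closer haplotype; the MEC score is the
    total number of allele calls that must be corrected to make every read
    consistent with its assigned haplotype."""
    return sum(min(mismatches(r, hap1), mismatches(r, hap2)) for r in reads)


def min_mec_bruteforce(reads: list[str], n_sites: int) -> tuple[int, str]:
    """Exhaustive search over all haplotypes; only feasible for tiny inputs."""
    candidates = ("".join(bits) for bits in product("01", repeat=n_sites))
    return min((mec_score(reads, h, complement(h)), h) for h in candidates)


# Toy data (assumed): five reads over four SNP sites, one sequencing error.
reads = ["010-", "0110", "-001", "1001", "10-1"]
hap = "0110"
score = mec_score(reads, hap, complement(hap))  # → 1
best_score, best_hap = min_mec_bruteforce(reads, 4)
```

MEC-based assemblers search for the haplotype pair that minimizes this score. The minimizer is only as reliable as the reads: with high per-base error rates and low coverage, a wrong haplotype can score as well as or better than the true one, which is the failure mode the paper investigates.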
