# GenHap: A Novel Computational Method Based on Genetic Algorithms for Haplotype Assembly

**Authors:** Andrea Tangherloni, Simone Spolaor, Leonardo Rundo, Marco S. Nobile, Paolo Cazzaniga, Giancarlo Mauri, Pietro Liò, Ivan Merelli, Daniela Besozzi

arXiv: 1812.07689 · 2018-12-20

## TL;DR

GenHap is a genetic algorithm-based method for haplotype assembly that reaches high accuracy and runs faster than HapCol, a state-of-the-art algorithm, which makes it well suited to large instances produced by long-read, high-coverage sequencing.

## Contribution

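This paper presents GenHap, a novel genetic algorithm approach to haplotype assembly that addresses the weighted Minimum Error Correction (wMEC) problem. On two synthetic datasets it obtains high-accuracy solutions while running faster than HapCol, and it is also assessed on two real datasets.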

## Key findings

- GenHap always obtains high-accuracy solutions, measured by haplotype error rate.
- GenHap is up to 4x faster than HapCol on Roche/454 instances.
- GenHap is up to 20x faster than HapCol on the PacBio RS II dataset.

## Abstract

The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, the knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists in computing the two haplotypes that partition the sequencing reads into two disjoint subsets, with the least number of corrections to the SNP values. To this end, we propose here GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4x faster than HapCol in the case of Roche/454 instances and up to 20x faster when compared on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. Future-generation sequencing technologies, producing longer reads with higher coverage, can greatly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07689/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07689/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1812.07689/full.md

---
Source: https://tomesphere.com/paper/1812.07689