The Hydrostructure: a Universal Framework for Safe and Complete Algorithms for Genome Assembly
Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu, and Elia C. Zirondelli

TL;DR
This paper introduces the hydrostructure framework, a universal method for safe and complete genome assembly algorithms that generalizes previous approaches and adapts to practical assembly models.
Contribution
The paper presents the hydrostructure, a novel graph structure that unifies and extends safe genome assembly algorithms, enabling practical and optimal solutions.
Findings
Unified framework for safe genome assembly algorithms.
Extension of safe assembly concepts to practical models.
Algorithms with optimal verification and enumeration capabilities.
Abstract
Genome assembly is a fundamental problem in bioinformatics: reconstructing a source genome from an assembly graph built from a set of reads (short strings sequenced from the genome). One notion of a genome assembly solution is an arc-covering walk of the graph. Since assembly graphs admit many solutions, the goal is to find what is definitely present in all solutions, that is, what is safe. Most practical assemblers are based on heuristics built around unitigs, namely paths whose internal nodes have unit in-degree and out-degree, which are clearly safe. The long-standing open problem of finding all the safe parts of the solutions was recently solved [RECOMB 2016], yielding a 60% increase in contig length. This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results for different notions of…
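To make the unitig definition in the abstract concrete, here is a minimal Python sketch that lists the maximal unitigs of a small directed graph. It is an illustration only and is not the paper's hydrostructure algorithm. The function name `maximal_unitigs`, the arc-list input format, and the toy graph are all assumptions introduced for this example.

```python
from collections import defaultdict


def maximal_unitigs(arcs):
    """Return maximal unitigs of a directed graph given as (u, v) arcs.

    A unitig is taken here to be a path whose internal nodes all have
    in-degree 1 and out-degree 1. Isolated cycles made only of such
    nodes are skipped by this sketch.
    """
    out_nbrs = defaultdict(list)
    in_deg = defaultdict(int)
    for u, v in arcs:
        out_nbrs[u].append(v)
        in_deg[v] += 1

    def is_unary(node):
        # True when the node may appear in the interior of a unitig.
        return in_deg[node] == 1 and len(out_nbrs[node]) == 1

    nodes = {n for arc in arcs for n in arc}
    unitigs = []
    for start in sorted(nodes):
        if is_unary(start):
            continue  # unitigs are grown only from non-unary nodes
        for nxt in list(out_nbrs[start]):
            path = [start, nxt]
            # Extend while the last node has exactly one way in and out.
            while is_unary(path[-1]):
                path.append(out_nbrs[path[-1]][0])
            unitigs.append(path)
    return unitigs


# Hypothetical toy graph: a -> b -> c, then c branches to d and e.
example_arcs = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
print(maximal_unitigs(example_arcs))
```

On this toy graph, node `b` is the only node with unit in-degree and out-degree, so the path `a, b, c` is reported as one unitig and the two arcs leaving `c` are reported separately. Unitigs are safe in the abstract's sense, but they are generally not all that is safe, which is the gap the safe and complete algorithms address.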
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms · Chromosomal and Genetic Variations
