A hierarchical network heuristic for solving the orientation problem in genome assembly
Karl R. B. Schmitt, Aleksey V. Zimin, Guillaume Marçais, James A. Yorke, Michelle Girvan

TL;DR
This paper introduces a hierarchical clustering algorithm that robustly solves the orientation problem in genome assembly, effectively handling errors and noise in both simulated and real bacterial data.
Contribution
The paper presents a novel hierarchical clustering-based method for genome assembly orientation that improves robustness against errors and noise compared to existing algorithms.
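One way to picture a hierarchical heuristic for orientation is greedy agglomerative merging over a signed contig graph. The sketch below is a generic illustration under stated assumptions, not the authors' algorithm. It assumes each contig pair carries a signed weight `w` (positive favors the same orientation, negative favors opposite orientations). Every contig starts in its own cluster, and the heuristic repeatedly merges the two clusters joined by the largest net evidence, flipping the absorbed cluster as a block when that evidence is negative. `hierarchical_orient` is a hypothetical name.

```python
from collections import defaultdict


def hierarchical_orient(contigs, weights):
    """Greedy agglomerative orientation sketch (hypothetical, not the paper's method).

    contigs: list of contig ids (strings).
    weights: {(a, b): w}; w > 0 favors "same orientation", w < 0 "opposite".
    Returns {contig: +1 or -1}, defined up to a flip of each connected component.
    """
    # Each cluster maps its member contigs to orientations relative to the cluster.
    clusters = {c: {c: 1} for c in contigs}
    owner = {c: c for c in contigs}  # contig -> id of the cluster holding it

    while True:
        # Net signed evidence between every pair of distinct clusters,
        # given the orientations already fixed inside each cluster.
        between = defaultdict(int)
        for (a, b), w in weights.items():
            ka, kb = owner[a], owner[b]
            if ka == kb:
                continue
            key = (ka, kb) if ka < kb else (kb, ka)
            between[key] += w * clusters[ka][a] * clusters[kb][b]
        if not between:
            break  # every linked contig already shares a cluster
        (keep, absorb), score = max(between.items(), key=lambda kv: abs(kv[1]))
        if score == 0:
            break  # remaining links carry no net evidence either way
        # Flip the absorbed cluster as a block if the evidence says "opposite".
        flip = 1 if score > 0 else -1
        for c, o in clusters.pop(absorb).items():
            clusters[keep][c] = o * flip
            owner[c] = keep
    return {c: clusters[owner[c]][c] for c in contigs}
```

Because whole clusters are flipped at once, a single weak, erroneous link between two well-supported clusters is outvoted rather than propagated. For example, with weights `{("a", "b"): 5, ("b", "c"): 4, ("a", "c"): -1, ("c", "d"): -3}`, the contradictory `("a", "c")` link loses to the stronger `a`, `b`, `c` chain, and only `d` ends up flipped.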
Findings
Successfully solves the orientation problem on simulated data
Accurately orients contigs in real R. sphaeroides data
Demonstrates robustness to errors in the data and to the choice of initial conditions
Abstract
In the past several years, the problem of genome assembly has received considerable attention from both biologists and computer scientists. An important component of current assembly methods is the scaffolding process. This process builds ordered and oriented linear collections of contigs (contiguous sequences assembled from overlapping reads), called scaffolds, and relies on mate pair data. A mate pair is a set of two reads sequenced from the two ends of a single DNA fragment, so the two reads have opposite mutual orientations. When the two reads of a mate pair are placed into two different contigs, one can infer the mutual orientation of those contigs. Several orientation algorithms exist as parts of assembly programs, but all of them run into difficulties when solving the orientation problem because of errors from mis-assembled contigs or incorrect read placements. In this paper we…
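The mate-pair inference described above can be made concrete with a small sketch. It assumes each placement is recorded as a hypothetical tuple `(contig_a, strand_a, contig_b, strand_b)`, with strand `+1` if the read aligns forward on its contig and `-1` if reverse, and that the two mates lie on opposite genomic strands. If contig orientations are `o_a` and `o_b`, the reads' genomic strands are `o_a * strand_a` and `o_b * strand_b`; requiring these to be opposite gives `o_a * o_b = -strand_a * strand_b`. Summing these votes per contig pair yields a signed weight, and an orientation assignment can then be scored by the net evidence it agrees with. All function names are illustrative, not taken from the paper.

```python
from collections import defaultdict


def relative_orientation(strand_a, strand_b):
    """+1: the two contigs share an orientation; -1: one must be flipped.

    Assumes the two mates of a pair lie on opposite genomic strands.
    """
    return -strand_a * strand_b


def build_edges(placements):
    """Sum orientation votes per contig pair (hypothetical input format).

    placements: iterable of (contig_a, strand_a, contig_b, strand_b).
    Returns {(a, b): net_vote} with a < b.
    """
    weights = defaultdict(int)
    for ca, sa, cb, sb in placements:
        if ca == cb:
            continue  # both mates in one contig: no inter-contig information
        key = (ca, cb) if ca < cb else (cb, ca)
        weights[key] += relative_orientation(sa, sb)
    return dict(weights)


def agreement_score(orient, weights):
    """Net evidence (agreeing minus disagreeing votes) for an assignment."""
    return sum(w * orient[a] * orient[b] for (a, b), w in weights.items())
```

With noisy data the votes for one pair can conflict; `build_edges` keeps only their net sum, which is one simple way to let the majority of mate pairs outvote a few misplaced reads.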
Taxonomy
Topics: Genomics and Phylogenetic Studies · Chromosomal and Genetic Variations · RNA and protein synthesis mechanisms
