Linear-Time Superbubble Identification Algorithm for Genome Assembly
Ljiljana Brankovic, Costas S. Iliopoulos, Ritu Kundu, Manal Mohamed, Solon P. Pissis, Fatima Vayani

TL;DR
This paper introduces a linear-time algorithm for detecting superbubbles in directed acyclic graphs. Superbubbles are subgraph structures that genome assemblers must detect and filter out, and the new algorithm finds them faster than the previously best-known O(m log m)-time method.
Contribution
The paper presents the first linear-time algorithm for superbubble detection in DAGs, lowering the time complexity of this genome assembly step from O(m log m) to O(n+m).
Findings
Detects all superbubbles in O(n+m) time for a DAG with n nodes and m edges, improving on the best-known O(m log m)-time algorithm.
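Applies to the large graphs built from whole-genome sequencing reads, where superbubbles arise from sequencing errors and genome repeats.
Speeds up a critical filtering step in de Bruijn graph-based genome assembly.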
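An O(n+m) bound means a constant amount of work per vertex and per edge. A familiar algorithm in that class is Kahn's topological sort, sketched below in Python. It is included only to illustrate the cost model, under the assumption that a topological ordering of the DAG is a natural preprocessing step for superbubble detection; it is not the authors' algorithm, and the function name `topological_order` is hypothetical.

```python
from collections import deque


def topological_order(vertices, edges):
    """Kahn's algorithm: return the vertices of a DAG in topological order.

    Runs in O(n + m): every vertex is enqueued once and every edge is
    inspected once when its source vertex is dequeued.
    """
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Start from all sources (vertices with no incoming edges).
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1          # edge (v, w) is now accounted for
            if indeg[w] == 0:
                queue.append(w)

    # Vertices left over lie on a cycle, so the input was not a DAG.
    if len(order) != len(indeg):
        raise ValueError("graph has a cycle; not a DAG")
    return order


print(topological_order([1, 2, 3, 4, 5], [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]))
# [1, 2, 3, 4, 5]
```

Each in-degree is decremented once per incoming edge, which is where the linear cost comes from; the final length check doubles as a cheap test that the input really is acyclic.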
Abstract
DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et…
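The abstract does not spell out what a superbubble is. Under the commonly used definition, an ordered pair (s, t) of distinct vertices forms a superbubble when (1) t is reachable from s; (2) the set of vertices reachable from s without passing through t equals the set of vertices from which t is reachable without passing through s; (3) the subgraph induced by that set is acyclic; and (4) no vertex of that set other than t forms a pair with s satisfying (1) to (3). The Python sketch below checks these conditions directly, pair by pair. It is a naive reference checker, not the paper's O(n+m) algorithm: it runs breadth-first searches for every ordered pair, so its cost is at least quadratic in n. Condition (3) is taken for granted because the input is assumed to be a DAG, and all function names are hypothetical.

```python
from collections import deque


def _reach(adj, start, stop):
    """Vertices reachable from `start` along `adj` without passing through `stop`.

    `stop` is included if it is reached, but its neighbours are not explored.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == stop:
            continue  # do not pass through the stop vertex
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def _is_candidate(succ, pred, s, t):
    """Check reachability (1) and matching (2) for the ordered pair (s, t)."""
    forward = _reach(succ, s, t)      # reachable from s, not through t
    if t not in forward:              # condition (1) fails
        return False, forward
    backward = _reach(pred, t, s)     # can reach t, not through s
    return forward == backward, forward


def naive_superbubbles(vertices, edges):
    """Enumerate superbubbles (s, t) of a DAG straight from the definition.

    Acyclicity (3) is assumed to hold because the input graph is a DAG.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)

    result = []
    for s in vertices:
        for t in vertices:
            if s == t:
                continue
            ok, inside = _is_candidate(succ, pred, s, t)
            if not ok:
                continue
            # Minimality (4): no interior vertex x may form a valid pair (s, x).
            if any(_is_candidate(succ, pred, s, x)[0]
                   for x in inside if x not in (s, t)):
                continue
            result.append((s, t))
    return result


# Diamond 1 -> {2, 3} -> 4 followed by the edge 4 -> 5.
print(naive_superbubbles([1, 2, 3, 4, 5], [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]))
# [(1, 4), (4, 5)]
```

In the example, (1, 5) satisfies conditions (1) and (2) but is rejected by minimality because (1, 4) already closes the diamond; the single edge (4, 5) qualifies as a trivial superbubble under this definition.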
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · RNA and protein synthesis mechanisms
