Supregraph: Enabling Information-Optimal Assembly Graph Representation of a Read Set
Anton Bankevich

TL;DR
This paper introduces supregraphs, a new class of assembly graphs that preserve the full information in a read set. They overcome limitations of traditional de Bruijn and overlap graphs and provide a theoretical basis for optimal genome assembly.
Contribution
The paper presents a formal mathematical model for genome assembly, introducing supregraphs as a novel, information-preserving graph representation derived from de Bruijn graphs.
Findings
Supregraphs can be constructed by transforming de Bruijn graphs through multiplexing.
Supregraphs provide a foundation for theoretically optimal genome assemblies.
Assuming error-free reads, a correct representation of a read set exists in the form of a supregraph.
Abstract
The first step in any genome assembly algorithm is the conversion from the domain of strings and overlaps to the language of graphs and paths, typically using one of two conventional methods: de Bruijn graphs or overlap graphs. However, both standard approaches have known limitations. De Bruijn graphs do not retain the complete information contained in the reads, while overlap graphs often produce artificial breaks in contigs because contained reads must be discarded as a preliminary step. In this work we present a mathematical model for genome assembly that provides a formal framework for determining what constitutes a correct conversion of a read set into an assembly graph, under the assumption of error-free reads. We prove that a correct representation of a read set exists in the form of a new class of assembly graphs, which we call supregraphs. We show that supregraphs can…
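Both limitations named in the abstract can be reproduced on toy inputs. The sketch below is not taken from the paper: the read sets, the genome, the helper names (`de_bruijn_graph`, `overlap_graph`, `drop_contained`, `spell`) and the minimum overlap of 3 are hypothetical choices made for illustration. It uses a plain de Bruijn graph with (k-1)-mer nodes and an exact suffix-prefix overlap graph. It does not implement supregraphs. In part 1, two different read sets yield identical de Bruijn graphs for k = 3, so the graph cannot tell whether `ACG` continues to `C` or to `A`. In part 2, a read from one copy of a repeat is a substring of a read covering the other copy. Discarding it as "contained" leaves the first read with no overlaps, even though the full read set spells the genome.

```python
from collections import Counter

# --- Part 1: a de Bruijn graph can lose information present in the reads ---

def de_bruijn_graph(reads, k):
    """Edge multiset: one edge (k-1)-mer -> (k-1)-mer per k-mer occurrence in a read."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges

# Hypothetical read sets: different reads, identical 3-mer content.
reads_1 = ["ACGTC", "TCGTA"]
reads_2 = ["ACGTA", "TCGTC"]
dbg_1 = de_bruijn_graph(reads_1, k=3)
dbg_2 = de_bruijn_graph(reads_2, k=3)

# --- Part 2: discarding contained reads can break an overlap graph ---

def overlap_len(a, b, min_len):
    """Longest proper suffix of a equal to a prefix of b; 0 if shorter than min_len (assumes min_len >= 1)."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0

def overlap_graph(reads, min_len):
    """Map each directed pair (a, b) with a sufficient overlap to its overlap length."""
    edges = {}
    for a in reads:
        for b in reads:
            if a != b:
                olen = overlap_len(a, b, min_len)
                if olen:
                    edges[(a, b)] = olen
    return edges

def drop_contained(reads):
    """Common preprocessing step: remove every read that is a substring of another read."""
    return [r for r in reads if not any(r != other and r in other for other in reads)]

def spell(path, edges):
    """Concatenate the reads along a path, trimming each overlap."""
    seq = path[0]
    for a, b in zip(path, path[1:]):
        seq += b[edges[(a, b)]:]
    return seq

# Hypothetical genome with the repeat "GATTACA" occurring twice.
genome = "AAAA" + "GATTACA" + "GGGG" + "TTTT" + "GATTACA" + "CCCC"
left = genome[0:8]     # ends inside the first repeat copy
rep = genome[4:11]     # exactly the first repeat copy
mid = genome[7:19]     # starts inside the first repeat copy
right = genome[15:30]  # covers the second repeat copy, so it contains `rep`
reads = [left, rep, mid, right]  # error-free reads covering the whole genome

full = overlap_graph(reads, min_len=3)
reduced = overlap_graph(drop_contained(reads), min_len=3)

if __name__ == "__main__":
    print(reads_1 != reads_2, dbg_1 == dbg_2)        # True True
    print(spell([left, rep, mid, right], full) == genome)  # True
    print(reduced)  # {('TACAGGGGTTTT', 'TTTTGATTACACCCC'): 4}
```

In the reduced graph, the read `AAAAGATT` has no edges. As a result, a genome that the full read set spells as one path would be assembled as two contigs.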
