An External-Memory Algorithm for String Graph Construction
Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

TL;DR
This paper presents an external-memory algorithm for constructing string graphs, which are crucial in genome assembly, addressing the challenge of processing large genomic datasets efficiently beyond main memory limitations.
Contribution
It introduces a novel external-memory algorithm specifically designed for string graph construction, filling a gap in scalable genome assembly methods.
Findings
Efficiently constructs string graphs from large datasets
Reduces memory usage in genome assembly processes
Enables processing of larger genomic datasets than previous methods
Abstract
Some recent results have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows-Wheeler Transform (BWT) of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches that are currently used to tackle this problem are memory-intensive. This fact does not bode well for the ongoing increase in the availability of genomic data. A data structure that is used in genome assembly is the string graph, where vertices correspond to samples and arcs represent two overlapping samples. In this paper we address an open problem: to design an external-memory algorithm to compute the…
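To make the two objects named in the abstract concrete, here is a minimal in-memory sketch in Python. It is not the paper's external-memory algorithm: it computes the BWT by sorting all rotations and finds overlaps by brute force, both of which are only practical for toy inputs. The function names (`bwt`, `overlap_len`, `overlap_graph`), the `$` sentinel, and the `min_len` threshold are assumptions made for illustration. The result is the unreduced overlap graph; a string graph is usually obtained from it by removing transitive arcs, a step omitted here.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text+sentinel, take the last column.

    Assumes the sentinel does not occur in text and sorts before every
    other character (true for "$" versus ASCII letters).
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def overlap_len(a, b, min_len):
    """Length of the longest proper suffix of a that is a prefix of b.

    Returns 0 if no such overlap has length at least min_len.
    """
    longest = min(len(a), len(b)) - 1
    shortest = max(min_len, 1)  # k = 0 would make a[-k:] the whole string
    for k in range(longest, shortest - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len):
    """Map each arc (i, j) to the overlap length between reads[i] and reads[j].

    Brute force over all ordered pairs, so quadratic in the number of reads.
    """
    arcs = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_len)
                if k:
                    arcs[(i, j)] = k
    return arcs


if __name__ == "__main__":
    print(bwt("banana"))
    print(overlap_graph(["ACGTA", "GTACC", "ACCTT"], min_len=3))
```

On the three toy reads, the sketch links read 0 to read 1 and read 1 to read 2, each with an overlap of length 3. Every read and every rotation is held in memory at once, which is exactly the cost that external-memory approaches aim to avoid.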
