FSG: Fast String Graph Construction for De Novo Assembly of Reads Data
Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

TL;DR
This paper introduces FSG, a fast method for constructing string graphs in de novo genome assembly. It works only on the FM-index (built on the Burrows-Wheeler Transform) of the reads, and runs significantly faster than SGA while keeping memory use moderate.
Contribution
We present a novel FM-index-based algorithm for string graph construction that does not require access to the original reads, integrated into the SGA assembler.
Findings
FSG is significantly faster than the original SGA.
FSG maintains moderate memory usage.
FSG shows practical advantages when run on multiple threads.
Abstract
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FM-index and Burrows-Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the SGA assembler as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that FSG is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads.
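To make the role of the FM-index concrete, here is a minimal sketch of how suffix-prefix overlaps between reads, which become the edges of a string graph, can be found by backward search on the BWT of the read collection. It is not the FSG algorithm or SGA code. The names `build_index` and `suffix_prefix_overlaps` are hypothetical, the index is built naively by sorting suffixes, and the suffix array is kept only to map rows back to read ids. Unlike FSG, the sketch also takes the query read as a string, whereas the paper's algorithm works without accessing the input reads.

```python
def build_index(reads):
    """Naive FM-index over reads, each terminated by '$' (illustrative only)."""
    text = "".join(r + "$" for r in reads)
    # Suffix array by plain sorting; fine for toy inputs, not for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (cyclic at position 0).
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    # Map each read's start position in the text to its read id.
    starts, pos = {}, 0
    for rid, r in enumerate(reads):
        starts[pos] = rid
        pos += len(r) + 1
    return sa, bwt, C, occ, starts


def suffix_prefix_overlaps(query, index, min_len):
    """Return (read_id, k) pairs where the length-k suffix of `query`
    is a prefix of read `read_id`, for every k >= min_len."""
    sa, bwt, C, occ, starts = index
    lo, hi = 0, len(bwt)  # half-open interval of rows matching the suffix
    found = []
    for k in range(1, len(query) + 1):
        c = query[-k]
        if c not in C:
            break
        # Backward-search step: extend the matched suffix by one character.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            break
        if k >= min_len:
            # A '$' in the BWT marks a row that begins at the start of a read,
            # so that read has the current suffix as a prefix.
            for row in range(lo, hi):
                if bwt[row] == "$":
                    found.append((starts[sa[row]], k))
    return found


reads = ["ACGTAC", "TACGGA", "GGATTT"]
index = build_index(reads)
overlaps_of_read0 = suffix_prefix_overlaps(reads[0], index, min_len=3)
```

In this toy input, the suffix `TAC` of the first read is a prefix of the second read, so the result contains the pair `(1, 3)`. It also contains the trivial full-length match of the read with itself, which a caller would filter out before adding edges to the graph.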
