Prefix-free graphs and suffix array construction in sublinear space
Andrej Baláž, Alessia Petescia

TL;DR
This paper introduces prefix-free graphs, a new pangenomic data structure from which classical string data structures such as the suffix array can be built in sublinear space, supporting efficient operations on pangenomes.
Contribution
The paper introduces prefix-free graphs, shows how to construct them, and shows how to use them to obtain classical string data structures for pangenomes in sublinear space.
Findings
Constructed prefix-free graphs for pangenomes
Achieved sublinear space suffix array construction
Enabled many efficient operations on pangenomes, with read mapping as the motivating use case
Abstract
A recent paradigm shift in bioinformatics from a single reference genome to a pangenome brought with it several graph structures. These graph structures must implement operations such as efficient construction from multiple genomes and read mapping. Read mapping is a well-studied problem on sequential data, where data structures such as the suffix array and the Burrows-Wheeler transform allow for efficient computation. Attempts to achieve comparably high performance on graphs bring many complications, since the common data structures on strings are not easily obtainable for graphs. In this work, we introduce prefix-free graphs, a novel pangenomic data structure; we show how to construct them and how to use them to obtain well-known data structures from stringology in sublinear space, allowing for many efficient operations on pangenomes.
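For readers unfamiliar with the string data structures the abstract names, the following is a minimal Python sketch of a suffix array, the Burrows-Wheeler transform derived from it, and pattern lookup of the kind read mapping relies on. It is not the paper's method: it runs on a single plain string, uses a naive construction that needs at least linear working space, and the function names and the `$` terminator are assumptions made for illustration only.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix text.

    Illustrative only; this needs linear (or worse) working space,
    unlike the sublinear-space construction the paper describes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """The BWT lists, for each sorted suffix, the character preceding it.

    text[i - 1] wraps around to the last character when i == 0.
    """
    return "".join(text[i - 1] for i in sa)


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern via binary search on the suffix array."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana$"          # '$' is a hypothetical end-of-text terminator
sa = suffix_array(text)   # [6, 5, 3, 1, 0, 4, 2]
bwt = bwt_from_sa(text, sa)  # "annb$aa"
hits = find_occurrences(text, sa, "ana")  # [1, 3]
```

All suffixes that start with a given pattern occupy one contiguous range of the suffix array, which is why two binary searches suffice. The difficulty the abstract points to is that a graph has no single linear order of suffixes to sort, so structures like these are not directly available on pangenome graphs.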
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Genomics and Chromatin Dynamics
