Linear Time Construction of Indexable Founder Block Graphs
Veli Mäkinen, Bastien Cazaux, Massimo Equi, Tuukka Norri, and Alexandru I. Tomescu

TL;DR
This paper presents a linear time algorithm for constructing indexable founder block graphs from multiple sequence alignments, enabling efficient string matching in pangenome representations.
Contribution
It introduces a novel linear time method to build segment repeat-free founder block graphs with an integrated succinct index for fast queries.
Findings
A compact founder block graph was constructed from a SARS-CoV-2 MSA in one minute.
The graph contains 3900 nodes and 4440 edges, with node labels up to length 12.
The index size is only 3% of the original MSA size.
Abstract
We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that then can be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs. We give a linear time algorithm to construct a segment repeat-free founder block graph given an MSA. The…
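To make the definitions in the abstract concrete, here is a minimal Python sketch of how a segmentation of a gapless MSA induces a founder block graph, together with a naive check that every node label occurs in the MSA rows only at its own block's starting column. This is an illustration under stated assumptions, not the paper's linear time algorithm: the function names `build_block_graph` and `is_segment_repeat_free`, the `cuts` parameter (the start column of each block), and the quadratic row-scanning check are all hypothetical choices made for this example, and the row-based check is used here as a simple stand-in for the graph property described above.

```python
from typing import List, Set, Tuple

Node = Tuple[int, str]  # (block index, label); hypothetical representation


def block_bounds(n: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Turn sorted block start columns (first must be 0) into [start, end) pairs."""
    return list(zip(cuts, cuts[1:] + [n]))


def build_block_graph(msa: List[str], cuts: List[int]):
    """Build the block graph induced by a segmentation of a gapless MSA.

    Nodes are the distinct row substrings inside each block; an edge joins
    two nodes of consecutive blocks when some row spells both in sequence.
    """
    bounds = block_bounds(len(msa[0]), cuts)
    nodes = [sorted({row[a:b] for row in msa}) for a, b in bounds]
    edges: Set[Tuple[Node, Node]] = set()
    for k in range(len(bounds) - 1):
        (a, b), (c, d) = bounds[k], bounds[k + 1]
        for row in msa:
            edges.add(((k, row[a:b]), (k + 1, row[c:d])))
    return nodes, edges


def is_segment_repeat_free(msa: List[str], cuts: List[int]) -> bool:
    """Naive check: each block label occurs in the rows only at its block start."""
    for a, b in block_bounds(len(msa[0]), cuts):
        for label in {row[a:b] for row in msa}:
            for row in msa:
                pos = row.find(label)
                while pos != -1:
                    if pos != a:  # label repeats outside its own block
                        return False
                    pos = row.find(label, pos + 1)
    return True


if __name__ == "__main__":
    msa = ["ACGT", "ACCT", "TCGT"]
    nodes, edges = build_block_graph(msa, [0, 2])
    print(nodes)
    print(sorted(edges))
    print(is_segment_repeat_free(msa, [0, 2]))
```

In this toy input, the rows `ACGT` and `ACCT` share the founder segment `AC` in the first block, so the graph merges them into one node with two outgoing edges. A segmentation such as `[0, 2]` on the rows `ACAC` and `ACGT` would fail the check, because the label `AC` reappears at column 2.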
