Identifying all snarls and superbubbles in linear-time, via a unified SPQR-tree framework
Francisco Sena, Aleksandr Politov, Corentin Moumard, Manuel Cáceres, Sebastian Schmidt, Juha Harviainen, Alexandru I. Tomescu

TL;DR
This paper introduces a linear-time algorithm for identifying all snarls and superbubbles in pangenome graphs using a unified SPQR-tree framework. Its implementation runs up to two times faster than vg on snarls and up to 50 times faster than BubbleGun on superbubbles.
Contribution
The authors present the first linear-time algorithm for identifying all snarls, handle superbubble detection within the same SPQR-tree framework, and provide an efficient implementation.
Findings
Up to two times faster than vg for snarl detection.
Up to 50 times faster than BubbleGun for superbubble detection.
Algorithms successfully evaluated on various pangenomic datasets.
Abstract
Snarls and superbubbles are fundamental pangenome decompositions capturing variant sites. These bubble-like structures underpin key tasks in computational pangenomics, including structural-variant genotyping, distance indexing, haplotype sampling, and variant annotation. Snarls can be quadratically-many in the size of the graph, and since their introduction in 2018 with the vg toolkit, there has been no work on identifying all snarls in linear time. Moreover, while it is known how to find superbubbles in linear time, this result is a highly specialized solution only achieved after a long series of papers. We present the first algorithm identifying all snarls in linear time. This is based on a new representation of all snarls, of size linear in the input graph size, and which can be computed in linear time. Our algorithm is based on a unified framework that also provides a new…
Taxonomy
Topics: Genetic Associations and Epidemiology · Biomedical Text Mining and Ontologies · Gene expression and cancer classification
