Fast, parallel, and cache-friendly suffix array construction
Jamshed Khan, Tobias Rubel, Erin Molloy, Laxman Dhulipala, Rob Patro

TL;DR
This paper introduces a new fast and efficient algorithm for building suffix arrays, which are important in bioinformatics, and makes it publicly available.
Contribution
The novel contribution is a scalable parallel algorithm, caps-sa, with improved performance and memory efficiency for suffix array construction.
Findings
caps-sa outperforms existing state-of-the-art parallel suffix array construction algorithms on modern hardware.
The algorithm achieves strong performance due to excellent memory locality and fewer cache misses.
Abstract
String indexes such as the suffix array (sa) and the closely related longest common prefix (lcp) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these are known, and the existing algorithms can be highly non-trivial to implement and parallelize. In this paper we present caps-sa, a simple and scalable parallel algorithm for constructing these string indexes inspired by samplesort and utilizing an LCP-informed mergesort. Due to its design, caps-sa has excellent memory-locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies. We show that despite its simple design, caps-sa outperforms existing state-of-the-art parallel sa and lcp-array construction algorithms on modern hardware. Finally,…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAlgorithms and Data Compression · Genomics and Phylogenetic Studies · RNA modifications and cancer
