On the Optimisation of the GSACA Suffix Array Construction Algorithm
Jannik Olbrich, Enno Ohlebusch, Thomas Büchler

TL;DR
This paper analyzes and optimizes the GSACA suffix array construction algorithm, resulting in a significantly faster linear-time algorithm that outperforms existing methods like DivSufSort and DSH.
Contribution
The paper presents an optimised linear-time suffix array construction algorithm, based on the sorting principle used in GSACA, with better real-world performance than the original.
Findings
The new algorithm is significantly faster than GSACA.
It outperforms DivSufSort and DSH in benchmarks.
The optimization leverages properties of the GSACA sorting principle.
Abstract
The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving the algorithm's subpar real-world performance. There is a super-linear algorithm DSH which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. This paper is concerned with analysing the sorting principle used in GSACA and DSH and exploiting its properties in order to give an optimised linear-time algorithm. Our resulting algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH.
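For readers unfamiliar with the data structure, the following is a minimal sketch of what a suffix array contains. It is a naive comparison-based construction, not GSACA, DSH, or DivSufSort, and it is far from linear time; the function name `naive_suffix_array` is a hypothetical helper chosen for illustration only.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    ordered by the lexicographic rank of the suffixes.

    Illustrative baseline only: sorting materialised suffixes costs
    O(n^2 log n) time in the worst case, whereas the algorithms
    discussed in the abstract aim for linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    word = "banana"
    sa = naive_suffix_array(word)
    for pos in sa:
        # Print each suffix next to its start position, in sorted order.
        print(pos, word[pos:])
```

For the input "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", so the resulting array is [5, 3, 1, 0, 4, 2]. A baseline like this can also serve as a correctness oracle when testing a faster construction on small inputs.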
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Web Data Mining and Analysis
