On the Optimisation of the GSACA Suffix Array Construction Algorithm

Jannik Olbrich; Enno Ohlebusch; Thomas Büchler

arXiv:2206.12222 · cs.DS · August 31, 2022 · 1 citation

TL;DR

This paper analyzes and optimizes the GSACA suffix array construction algorithm, resulting in a significantly faster linear-time algorithm that outperforms existing methods like DivSufSort and DSH.

Contribution

It provides an optimized linear-time suffix array construction algorithm, based on the GSACA sorting principle, that is significantly faster in practice than GSACA.

Findings

1. The new algorithm is significantly faster than GSACA.
2. It outperforms DivSufSort and DSH in benchmarks.
3. The optimization leverages properties of the GSACA sorting principle.
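The page does not spell out the GSACA sorting principle. As we understand it from the GSACA and DSH literature (an assumption, not stated here), suffixes are first grouped by their Lyndon prefixes, i.e. the longest prefix of each suffix that is a Lyndon word. A known combinatorial fact links this to suffix order: the longest Lyndon prefix of the suffix starting at position i has length nss(i) - i, where nss(i) is the start of the next lexicographically smaller suffix (or n if there is none). The Python sketch below only illustrates that fact; the function name `lyndon_prefix_lengths` is hypothetical, and unlike GSACA it cheats by building the suffix array naively first, then finding each nss(i) as a "next smaller value" over inverse-suffix-array ranks with a stack.

```python
def lyndon_prefix_lengths(text: str) -> list[int]:
    """Length of the longest Lyndon prefix of every suffix of `text`.

    Illustrative sketch only (not the paper's algorithm): it relies on
    lyn(i) = nss(i) - i, where nss(i) is the next lexicographically
    smaller suffix after position i, or n if none exists.
    """
    n = len(text)
    # Naive suffix array: sort start positions by their suffixes.
    sa = sorted(range(n), key=lambda i: text[i:])
    # Inverse suffix array: isa[i] is the rank of the suffix starting at i.
    isa = [0] * n
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    # Next smaller value on isa = next smaller suffix; n means "none".
    nss = [n] * n
    stack: list[int] = []  # positions still waiting for a smaller successor
    for j in range(n):
        while stack and isa[j] < isa[stack[-1]]:
            nss[stack.pop()] = j
        stack.append(j)
    return [nss[i] - i for i in range(n)]
```

For example, `lyndon_prefix_lengths("banana")` returns `[1, 2, 1, 2, 1, 1]`: the suffix "anana" has the Lyndon prefix "an", and its next smaller suffix "ana" starts two positions later.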

Abstract

The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving the algorithm's subpar real-world performance. There is a super-linear algorithm, DSH, which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. This paper is concerned with analysing the sorting principle used in GSACA and DSH and exploiting its properties in order to give an optimised linear-time algorithm. Our resulting algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH.
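To make the output of a SACA concrete: the suffix array of a text lists the starting positions of all its suffixes in lexicographic order. The Python sketch below is a brute-force reference written for illustration (it is not code from the paper or the linked repository). It sorts suffixes by direct string comparison, which costs O(n² log n) time and O(n²) memory for the key slices in the worst case, so it is nothing like GSACA, DivSufSort or DSH. The helper `is_suffix_array` (a hypothetical name) could serve as an oracle for sanity-checking a faster implementation on small inputs.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Suffix array of `text` by sorting all suffixes directly.

    Brute-force reference only: every comparison may scan whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_suffix_array(text: str, sa: list[int]) -> bool:
    """Check that `sa` is a permutation of 0..n-1 listing suffixes in order."""
    n = len(text)
    if sorted(sa) != list(range(n)):
        return False
    # All suffixes are distinct, so adjacent entries must be strictly increasing.
    return all(text[sa[k]:] < text[sa[k + 1]:] for k in range(n - 1))
```

For instance, `naive_suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`, i.e. the suffixes "a", "ana", "anana", "banana", "na", "nana".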

Code & Models

Repositories

https://gitlab.com/qwerzuiop/lfgsaca

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Web Data Mining and Analysis