Suffix sorting via matching statistics

Zsuzsanna Lipták; Francesco Masillo; Simon J. Puglisi

arXiv:2207.00972·cs.DS·April 16, 2024

TL;DR

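This paper presents an algorithm for constructing the generalized suffix array of a collection of highly similar strings. It builds a compressed representation of the collection's matching statistics with respect to a reference string and uses it to speed up suffix sorting; on such collections, the prototype runs in time competitive with or faster than the fastest available methods.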

Contribution

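The paper introduces a suffix array construction method for collections of highly similar strings. A compressed representation of the matching statistics with respect to a reference string is used first to distribute suffixes into a partial order, and then to speed up the suffix comparisons that complete the generalized suffix array.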
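To make the target object concrete, here is a minimal sketch of a generalized suffix array built by brute-force sorting. It is not the paper's algorithm, only a baseline definition. The function name `naive_generalized_suffix_array` is hypothetical, and the conventions used (each string implicitly ends with a sentinel smaller than every character, and equal suffixes are ordered by string index) are assumptions; other conventions exist.

```python
from typing import List, Tuple


def naive_generalized_suffix_array(strings: List[str]) -> List[Tuple[int, int]]:
    """Return (string index, start position) pairs in lexicographic suffix order.

    Assumed conventions (not taken from the paper):
    - a suffix that is a proper prefix of another sorts first, which is what
      an implicit end-of-string sentinel smaller than every character gives;
    - identical suffixes from different strings are ordered by string index.
    Brute force: materializes every suffix, so only suitable for tiny inputs.
    """
    suffixes = [(d, i) for d, s in enumerate(strings) for i in range(len(s))]
    return sorted(suffixes, key=lambda p: (strings[p[0]][p[1]:], p[0]))


if __name__ == "__main__":
    # The suffixes are "ab" (string 0), "b" (string 0), and "b" (string 1).
    print(naive_generalized_suffix_array(["ab", "b"]))
```

A brute-force sort like this compares suffixes character by character, which is exactly the cost that becomes wasteful when the strings in the collection are nearly identical.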

Findings

01

On collections of highly similar strings, suffix array construction time is competitive with or faster than the fastest available methods

02

Describes a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest

03

Prototype implementation 'sacamats' demonstrates competitive performance

Abstract

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.
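As an illustration of the matching statistics mentioned above, the following is a minimal quadratic-time sketch under the common textbook definition: for each position i of a string S, record the length of the longest prefix of S[i..] that occurs in the reference R, together with one position in R where it occurs. This is neither the compressed representation nor the fast heuristic described in the paper. The function name `naive_matching_statistics` and the use of -1 for "no match" are assumptions made for this example.

```python
from typing import List, Tuple


def naive_matching_statistics(s: str, ref: str) -> List[Tuple[int, int]]:
    """For each i, return (pos, length): the longest prefix of s[i:] that
    occurs in ref, and the leftmost position in ref where it occurs.

    pos is -1 when not even one character matches (an assumed convention).
    Brute force: tries every starting position in ref for every i.
    """
    ms = []
    for i in range(len(s)):
        best_pos, best_len = -1, 0
        for j in range(len(ref)):
            k = 0
            # Extend the match between s[i:] and ref[j:] as far as possible.
            while i + k < len(s) and j + k < len(ref) and s[i + k] == ref[j + k]:
                k += 1
            if k > best_len:  # strict '>' keeps the leftmost best position
                best_pos, best_len = j, k
        ms.append((best_pos, best_len))
    return ms


if __name__ == "__main__":
    # "ab" occurs in the reference at position 1, "b" at position 2,
    # and "c" does not occur at all.
    print(naive_matching_statistics("abc", "xabx"))
```

When S is highly similar to R, most entries describe long matches, which suggests why such a table can help avoid repeated character comparisons between suffixes.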

Code & Models

Repositories

fmasillo/sacamats
Official
