TL;DR
This paper presents an algorithm that constructs the generalized suffix array of a collection of highly similar strings using compressed matching statistics computed against a reference string. On such collections it runs in time competitive with or faster than the fastest available methods.
Contribution
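The paper introduces a method that builds a compressed representation of the collection's matching statistics with respect to a reference string, uses it to distribute suffixes into a partial order, and then uses it again to speed up the suffix comparisons that complete the generalized suffix array.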
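For orientation, the generalized suffix array of a collection lists every suffix of every string in lexicographic order. The sketch below is a brute-force rendering of that definition, not the paper's algorithm. The function name `naive_generalized_suffix_array` and the tie-breaking rule (end of string sorts before any character, equal suffixes are ordered by string index) are assumptions made for illustration.

```python
def naive_generalized_suffix_array(strings):
    """Return all (string_index, offset) pairs sorted by the suffix they denote.

    Brute-force baseline for illustration only: it materializes every
    suffix, so it is far too slow for real collections.
    Assumed convention: a suffix that is a proper prefix of another sorts
    first, and identical suffixes are ordered by string index.
    """
    suffixes = [(d, i) for d, s in enumerate(strings) for i in range(len(s))]
    # Python compares strings lexicographically, and a proper prefix
    # compares smaller than the longer string, which matches the
    # assumed end-of-string convention.
    return sorted(suffixes, key=lambda p: (strings[p[0]][p[1]:], p[0]))


collection = ["ab", "b"]
print(naive_generalized_suffix_array(collection))
# prints [(0, 0), (0, 1), (1, 0)]
```

Each output pair names a suffix by the string it comes from and its starting offset, which is the information any generalized suffix array has to encode in some form.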
Findings
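Constructs the generalized suffix array in time competitive with or faster than the fastest available methods on collections of highly similar strings
Describes a heuristic for fast computation of the matching statistics of two strings
Supports these results with experiments on a prototype implementation, sacamats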
Abstract
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.
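The matching statistics of a string S with respect to a reference R record, for each position i of S, the length of the longest prefix of S[i:] that occurs somewhere in R, together with one position in R where it occurs. The sketch below computes them naively by trying every alignment. It is neither the compressed representation nor the fast heuristic the abstract mentions, and the function name `matching_statistics` and the use of -1 for "no match" are assumptions made for illustration.

```python
def matching_statistics(s, ref):
    """Return a list of (pos, length) pairs, one per position of s.

    length is the length of the longest prefix of s[i:] occurring in ref;
    pos is the leftmost position in ref where that match starts, or -1
    (an assumed convention) when not even one character matches.
    Naive scan over all alignments, for illustration only.
    """
    ms = []
    for i in range(len(s)):
        best_pos, best_len = -1, 0
        for j in range(len(ref)):
            k = 0
            # Extend the match between s[i:] and ref[j:] as far as possible.
            while i + k < len(s) and j + k < len(ref) and s[i + k] == ref[j + k]:
                k += 1
            if k > best_len:
                best_pos, best_len = j, k
        ms.append((best_pos, best_len))
    return ms


print(matching_statistics("banana", "ananas"))
# prints [(-1, 0), (0, 5), (1, 4), (0, 3), (1, 2), (0, 1)]
```

When the strings in a collection are highly similar to the reference, most match lengths are long and consecutive entries are closely related, which is what makes a compressed representation plausible.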
