Sampling the suffix array with minimizers
Szymon Grabowski, Marcin Raniszewski

TL;DR
This paper introduces a suffix array sampling method based on minimizers. Its only requirement is a minimum pattern length, which makes it a practical alternative to the alphabet sampling scheme of Claude et al., where every sought pattern must contain at least one character from a chosen subalphabet.
Contribution
It proposes a suffix sampling approach based on minimizers that relaxes pattern constraints, improving practicality and maintaining competitive efficiency.
Findings
Achieves competitive time-space tradeoffs on most standard benchmark data
Requires only a minimum pattern length, not specific characters
More flexible than the alphabet sampling scheme of Claude et al., which requires each pattern to contain at least one character from the chosen subalphabet
Abstract
Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches compared to the sparse suffix array, for long enough patterns. A drawback of their approach is the requirement that sought patterns need to contain at least one character from the chosen subalphabet. In this work we propose an alternative suffix sampling approach with only a minimum pattern length as a requirement, which seems more convenient in practice. Experiments show that our algorithm achieves competitive time-space tradeoffs on most standard benchmark data.
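To make the minimum-length requirement concrete, here is a minimal Python sketch of the general idea of sampling suffixes at minimizer positions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameters k (k-mer length) and w (window size), the lexicographic ordering of k-mers, and the leftmost tie-breaking rule are all choices made for this example. Under these choices, any pattern of length at least w + k - 1 contains a full window of w k-mers, so the minimizer of its first window is also a sampled position in every occurrence in the text.

```python
def minimizer_positions(text, k, w):
    """Start positions of (w, k)-minimizers: for every window of w consecutive
    k-mers, keep the position of the smallest k-mer (leftmost on ties)."""
    n_kmers = len(text) - k + 1
    positions = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        positions.add(min(window, key=lambda i: (text[i:i + k], i)))
    return sorted(positions)


def build_sampled_sa(text, k, w):
    """Sparse suffix array holding only suffixes that start at a minimizer.
    Naive sort on full suffixes; fine for a small illustration only."""
    return sorted(minimizer_positions(text, k, w), key=lambda i: text[i:])


def _lower_bound(text, sa, s):
    """First index in sa whose suffix, truncated to len(s), is >= s."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(s)] < s:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find(text, sampled_sa, pattern, k, w):
    """All start positions of pattern in text, in increasing order.
    Requires len(pattern) >= w + k - 1 (the minimum pattern length)."""
    if len(pattern) < w + k - 1:
        raise ValueError("pattern must have length at least w + k - 1")
    # Offset of the minimizer inside the pattern's first window of w k-mers.
    off = min(range(w), key=lambda i: (pattern[i:i + k], i))
    tail = pattern[off:]
    hits = []
    j = _lower_bound(text, sampled_sa, tail)
    # Scan the run of sampled suffixes that start with the pattern tail.
    while j < len(sampled_sa) and text.startswith(tail, sampled_sa[j]):
        start = sampled_sa[j] - off
        # Verify the part of the pattern to the left of the minimizer.
        if start >= 0 and text.startswith(pattern[:off], start):
            hits.append(start)
        j += 1
    return sorted(hits)
```

For example, with text = "abracadabra_abracadabra", k = 2 and w = 3, patterns of length 4 or more can be searched with find(text, build_sampled_sa(text, k, w), pattern, k, w), and no condition is placed on which characters the pattern contains.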
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Network Packet Processing and Optimization
