Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space
Dominik Kempa, Tomasz Kociumaka

TL;DR
This paper introduces a new compressed index that supports suffix array queries within optimal compressed space, effectively collapsing the hierarchy of compressed data structures and enabling faster construction for repetitive texts.
Contribution
The authors present a novel index that supports suffix array queries in optimal compressed space with sub-polynomial query time, along with a faster method for constructing the index from the text's LZ77 parsing.
Findings
Supports suffix array queries in optimal compressed space
Achieves faster construction for highly repetitive texts
Develops new techniques for LCE queries in compressed space
Abstract
In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the compressed representation of the text. A widespread phenomenon in compressed indexing is that more powerful queries require larger indexes. For example, random access, the most basic query, can be supported in $O(\delta \log \frac{n \log \sigma}{\delta \log n})$ space (where $n$ is the text length, $\sigma$ is the alphabet size, and $\delta$ is the text's substring complexity), which is the asymptotically smallest space to represent a string, for all $n$, $\sigma$, and $\delta$ (Kociumaka, Navarro, Prezza; IEEE Trans. Inf. Theory 2023). The other end of the hierarchy is occupied by indexes supporting the powerful suffix array (SA) queries. The currently smallest one…
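To make the queries and the measure named in the abstract concrete, here is a minimal, uncompressed Python sketch. It is not the paper's index: it uses quadratic-or-worse brute force and linear space, and only pins down what an SA query, an LCE query, and the substring complexity return. The function names are hypothetical, and the definition of substring complexity used here (the maximum over k of the number of distinct length-k substrings divided by k) is the standard one from the cited Kociumaka, Navarro, Prezza work, not something spelled out in this excerpt.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: starting positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lce(text: str, i: int, j: int) -> int:
    """Longest common extension: length of the longest common prefix of text[i:] and text[j:]."""
    length = 0
    while i + length < len(text) and j + length < len(text) \
            and text[i + length] == text[j + length]:
        length += 1
    return length


def substring_complexity(text: str) -> float:
    """Assumed definition: max over k >= 1 of (#distinct substrings of length k) / k."""
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {text[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


if __name__ == "__main__":
    t = "banana"
    print(suffix_array(t))          # an SA query at rank r returns suffix_array(t)[r]
    print(lce(t, 1, 3))             # compares "anana" with "ana"
    print(substring_complexity(t))
```

A compressed index in the sense of the abstract would answer the same queries without ever materializing the full array; the sketch only fixes the semantics against which such an index can be checked on small inputs.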
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization
