Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space

Dominik Kempa; Tomasz Kociumaka

arXiv:2308.03635 · cs.DS · September 24, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a new compressed index that supports suffix array queries within optimal compressed space, effectively collapsing the hierarchy of compressed data structures and enabling faster construction for repetitive texts.

Contribution

The authors present a novel index that supports suffix array queries in optimal compressed space with sub-polynomial query time, and a faster construction method from LZ77 parsing.

Findings

01. Supports suffix array queries in optimal compressed space

02. Achieves faster construction for highly repetitive texts

03. Develops new techniques for LCE queries in compressed space
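For readers unfamiliar with the term, an LCE (longest common extension) query asks for the length of the longest common prefix of two suffixes of the text. The sketch below is only the textbook definition answered by direct character comparison on the uncompressed text; it is not the paper's compressed-space technique, and the function name `lce` is chosen here for illustration.

```python
def lce(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:].

    Naive scan over the plain text: time proportional to the answer.
    Illustrative only; a compressed index would not store `text` explicitly.
    """
    n = len(text)
    length = 0
    while (i + length < n and j + length < n
           and text[i + length] == text[j + length]):
        length += 1
    return length
```

For example, `lce("banana", 1, 3)` compares the suffixes "anana" and "ana", which share the prefix "ana".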

Abstract

In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the compressed representation of the text. A widespread phenomenon in compressed indexing is that more powerful queries require larger indexes. For example, random access, the most basic query, can be supported in $O(\delta \log \frac{n \log \sigma}{\delta \log n})$ space (where $n$ is the text length, $\sigma$ is the alphabet size, and $\delta$ is text's substring complexity), which is the asymptotically smallest space to represent a string, for all $n$, $\sigma$, and $\delta$ (Kociumaka, Navarro, Prezza; IEEE Trans. Inf. Theory 2023). The other end of the hierarchy is occupied by indexes supporting the powerful suffix array (SA) queries. The currently smallest one…
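To make the quantities named in the abstract concrete, here is a brute-force sketch. It assumes the standard definition of substring complexity (the maximum over k of the number of distinct length-k substrings divided by k), evaluates the space expression with constants ignored, and builds a suffix array by plain sorting. All function names are hypothetical, and none of this reflects how the paper's index works internally; it only illustrates what is being measured and queried.

```python
import math


def substring_complexity(text: str) -> float:
    """delta = max over k >= 1 of (number of distinct length-k substrings) / k.

    Brute force (polynomial time and space); fine for short illustrative inputs.
    """
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = len({text[i:i + k] for i in range(n - k + 1)})
        best = max(best, distinct / k)
    return best


def space_bound(n: int, sigma: int, delta: float) -> float:
    """Value of delta * log2(n log sigma / (delta log n)), constants ignored.

    Assumes n >= 2 and sigma >= 2 so the logarithms are positive.
    """
    return delta * math.log2(n * math.log2(sigma) / (delta * math.log2(n)))


def suffix_array(text: str) -> list:
    """Starting positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])
```

An SA query for rank r then simply returns `suffix_array(text)[r]`; the point of a compressed index is to answer the same query without storing the full array or the plain text.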

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization