Efficient Fully-Compressed Sequence Representations
Jeremy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich

TL;DR
This paper introduces a fully compressed sequence representation that supports access, rank, and select in worst-case O(log log σ) time while compressing the redundancy as well as the data, and that improves several other compressed data structures built on these queries.
Contribution
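The paper presents a compressed sequence representation that matches the best previous worst-case query time, compresses the redundancy along with the data, and achieves an average-case query time the authors describe as unprecedented. The result also improves on prior compressed representations of several other data structures.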
Findings
Supports access, rank, and select in worst-case O(log log σ) time
Compresses the redundancy as well as the data, which matters most on highly compressible sequences
Enhances performance of self-indexes, permutations, and dynamic collections
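The three queries named above can be pinned down with a naive reference version. This is only a sketch of what access, rank, and select return: it scans the sequence in linear time and is not the compressed structure the paper describes. The function names and the 1-based position convention are assumptions made for illustration.

```python
def access(s, i):
    """Return the symbol at position i (1-based; an illustrative convention)."""
    return s[i - 1]


def rank(s, c, i):
    """Return the number of occurrences of symbol c among the first i symbols of s."""
    return s[:i].count(c)


def select(s, c, j):
    """Return the 1-based position of the j-th occurrence of c, or None if there are fewer than j."""
    seen = 0
    for pos, sym in enumerate(s, start=1):
        if sym == c:
            seen += 1
            if seen == j:
                return pos
    return None
```

With s = "abracadabra", access(s, 1) is "a", rank(s, "a", 4) counts the two occurrences of "a" in "abra", and select(s, "a", 3) finds the third "a" at position 6. A compressed representation answers the same questions without scanning the sequence.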
Abstract
We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH₀(s) + o(n)(H₀(s) + 1) bits, where H₀(s) is the zero-order entropy of s. This structure supports the queries access, rank, and select, which are fundamental building blocks for many other compressed data structures, in worst-case time O(log log σ) and average time O(log H₀(s)). The worst-case complexity matches the best previous results, yet these had been achieved with data structures using nH₀(s) + o(n log σ) bits. On highly compressible sequences the o(n log σ) bits of the redundancy may be significant compared to the nH₀(s) bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar…
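To make the space comparison in the abstract concrete, the following sketch computes the zero-order entropy of a sequence using the standard empirical definition, together with the size of the entropy-coded data and of a plain fixed-width encoding. The function names are hypothetical, and nothing here reproduces the paper's data structure.

```python
import math
from collections import Counter


def zero_order_entropy(s):
    """Empirical zero-order entropy in bits per symbol:
    the sum over distinct symbols c of (n_c / n) * log2(n / n_c)."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((cnt / n) * math.log2(n / cnt) for cnt in Counter(s).values())


def entropy_bits(s):
    """n times the zero-order entropy: size of the entropy-coded data, in bits."""
    return len(s) * zero_order_entropy(s)


def plain_bits(s, sigma):
    """n * log2(sigma): size of a plain fixed-width encoding, in bits (not rounded up)."""
    return len(s) * math.log2(sigma)
```

For a skewed sequence such as fifteen copies of "a" followed by one "b", drawn from an alphabet of size 256, entropy_bits is far smaller than plain_bits. In that regime, any extra space that grows with n log σ can exceed the entropy-coded data itself, which is the situation the abstract highlights.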
Taxonomy
Topics: Algorithms and Data Compression · Cellular Automata and Applications · DNA and Biological Computing
