Efficient Fully-Compressed Sequence Representations

Jeremy Barbay; Francisco Claude; Travis Gagie; Gonzalo Navarro; Yakov Nekrich

arXiv:0911.4981 · cs.DS · April 3, 2012

TL;DR

This paper introduces a fully compressed sequence data structure that supports the fundamental queries access, rank, and select efficiently. It compresses the redundancy as well as the data and improves average-case query time over previous methods, with applications to self-indexes, permutations, and dynamic collections.

Contribution

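The paper presents a compressed sequence representation in which the redundancy is compressed as well. It uses o(n)(H0(s) + 1) extra bits instead of o(n log σ), and it achieves an average query time of O(log H0(s)), which is unprecedented. These improvements carry over to several data structures built on such sequences.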
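
Both the space bound and the average query time are stated in terms of the zero-order entropy $H_0(s) = \sum_c (n_c/n)\lg(n/n_c)$, where $n_c$ is the number of occurrences of symbol $c$ in $s$. The Python sketch below is only an illustration and the function names are hypothetical. It computes $H_0(s)$ and the $nH_0(s)$ bits that encode the data, then compares them with the $n\lg\sigma$ bits of a plain encoding to show why a redundancy that scales with $n\lg\sigma$ can dominate on highly compressible sequences.

```python
import math
from collections import Counter


def zero_order_entropy(s):
    """H0(s) in bits per symbol: sum over symbols c of (n_c/n) * log2(n/n_c)."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((nc / n) * math.log2(n / nc) for nc in Counter(s).values())


def entropy_bits(s):
    """n * H0(s): the bits that encode the data in the space bound."""
    return len(s) * zero_order_entropy(s)


def plain_bits(s, sigma):
    """n * log2(sigma): bits of a plain, uncompressed encoding over [1..sigma]."""
    return len(s) * math.log2(sigma)


if __name__ == "__main__":
    # A highly compressible sequence: 999 'a's and a single 'b', sigma = 256.
    s = "a" * 999 + "b"
    print(f"H0          = {zero_order_entropy(s):.4f} bits/symbol")
    print(f"n*H0        = {entropy_bits(s):.1f} bits")
    print(f"n*lg(sigma) = {plain_bits(s, 256):.1f} bits")
```

For this input, $nH_0(s)$ is roughly 11 bits while $n\lg\sigma$ is 8000 bits. Even a redundancy that is only a small fraction of $n\lg\sigma$ would therefore dwarf the entropy term.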

Findings

01. Supports access, rank, and select in worst-case O(log log σ) time and average O(log H0(s)) time
02. Compresses the redundancy too, using o(n)(H0(s) + 1) extra bits instead of o(n log σ), which pays off on highly compressible sequences
03. Improves the performance of self-indexes, permutations, and dynamic collections
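
The following naive Python reference pins down what access, rank, and select return, using the 1-based positions of $s[1..n]$. It is only a sketch with hypothetical function names. It scans the sequence in O(n) time and stores nothing compressed, so it illustrates the semantics of the queries, not the paper's O(log log σ)-time structure.

```python
def access(s, i):
    """access(s, i): the symbol s[i], with 1-based i."""
    return s[i - 1]


def rank(s, c, i):
    """rank_c(s, i): number of occurrences of c in s[1..i] (0 <= i <= n)."""
    return sum(1 for x in s[:i] if x == c)


def select(s, c, j):
    """select_c(s, j): 1-based position of the j-th c in s, or None if there is none."""
    seen = 0
    for pos, x in enumerate(s, start=1):
        if x == c:
            seen += 1
            if seen == j:
                return pos
    return None


if __name__ == "__main__":
    s = "abracadabra"
    print(access(s, 5))       # c
    print(rank(s, "a", 8))    # 4
    print(select(s, "b", 2))  # 9
```

Rank and select act as inverses here: `rank(s, c, select(s, c, j)) == j` whenever the j-th occurrence of `c` exists.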

Abstract

We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $nH_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg\lg\sigma)$ and average time $O(\lg H_0(s))$. The worst-case complexity matches the best previous results, yet these had been achieved with data structures using $nH_0(s) + o(n\lg\sigma)$ bits. On highly compressible sequences the $o(n\lg\sigma)$ bits of the redundancy may be significant compared to the $nH_0(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar…
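
The abstract attributes the result to partitioning the alphabet. The Python sketch below rests on our own simplifying assumptions. It shows how such a partition can reduce the three queries on $s$ to queries on a class sequence $t$, which records the class of each position, plus one subsequence per class. The class rule used here, $\lceil\lg(n/n_c)\rceil$, groups symbols of similar frequency. It is a hypothetical stand-in, not the paper's exact rule. Every piece is a plain Python list scanned in linear time, where a practical version would use compact rank/select representations. All function names are hypothetical.

```python
import math
from collections import Counter


def _rank(seq, c, i):
    """Occurrences of c in seq[1..i] (naive linear scan)."""
    return sum(1 for x in seq[:i] if x == c)


def _select(seq, c, j):
    """1-based position of the j-th c in seq, or None (naive linear scan)."""
    seen = 0
    for pos, x in enumerate(seq, start=1):
        if x == c:
            seen += 1
            if seen == j:
                return pos
    return None


def build_partition(s):
    """Split s into a class map, a class sequence t, and one subsequence per class.

    Hypothetical class rule: symbol c with n_c occurrences goes to class
    ceil(log2(n / n_c)), so symbols sharing a class have similar frequencies.
    """
    n = len(s)
    cls = {c: math.ceil(math.log2(n / nc)) for c, nc in Counter(s).items()}
    t = [cls[x] for x in s]              # t[i] = class of s[i]
    sub = {}                             # class k -> symbols of s in class k, in order
    for x in s:
        sub.setdefault(cls[x], []).append(x)
    return cls, t, sub


def part_access(part, i):
    cls, t, sub = part
    k = t[i - 1]                         # class of s[i]
    return sub[k][_rank(t, k, i) - 1]    # s[i] is the rank_k(t, i)-th symbol of class k


def part_rank(part, c, i):
    cls, t, sub = part
    if c not in cls:
        return 0
    k = cls[c]
    # The first rank_k(t, i) entries of sub[k] are exactly the class-k symbols of s[1..i].
    return _rank(sub[k], c, _rank(t, k, i))


def part_select(part, c, j):
    cls, t, sub = part
    if c not in cls:
        return None
    k = cls[c]
    p = _select(sub[k], c, j)            # position of the j-th c inside sub[k]
    return None if p is None else _select(t, k, p)  # map back to a position in s
```

Because a symbol's class is fixed, each query on $s$ costs one query on $t$ plus one query on a single class subsequence. The class sequence has a much smaller alphabet than $s$, and each subsequence contains only symbols of similar frequency.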

Taxonomy

Topics: Algorithms and Data Compression · Cellular Automata and Applications · DNA and Biological Computing