Counting Colours in Compressed Strings
Travis Gagie, Juha Kärkkäinen

TL;DR
This paper introduces a space-efficient data structure for counting distinct characters in substrings of a string, achieving near-optimal compression and fast query times, with some support for updates.
Contribution
It presents a novel compressed data structure that supports substring color counting in near-optimal space with sub-logarithmic query time, and that can be made partially dynamic.
Findings
Uses space close to the zero-order entropy of the string
Supports substring color counting in \(O(\sqrt{\log n})\) time
Can be made partially dynamic for updates
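To make the space bound concrete, the following minimal sketch computes the 0th-order empirical entropy \(H_0(s)\) of a string, using the standard definition \(H_0(s) = \sum_c (n_c / n) \log_2 (n / n_c)\), where \(n_c\) is the number of occurrences of character \(c\). The function name `zero_order_entropy` is an illustrative choice, not something taken from the paper.

```python
from collections import Counter
from math import log2


def zero_order_entropy(s: str) -> float:
    """Return the 0th-order empirical entropy H_0(s) in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum (n_c / n) * log2(n / n_c) over the distinct characters of s.
    return sum((count / n) * log2(n / count) for count in Counter(s).values())


if __name__ == "__main__":
    s = "abracadabra"
    h0 = zero_order_entropy(s)
    # n * H_0(s) is the leading term of the space bound, in bits.
    print(f"H_0(s) = {h0:.3f} bits per character")
    print(f"n * H_0(s) = {len(s) * h0:.1f} bits")
```

A string over a single character has \(H_0(s) = 0\), while a string in which \(\sigma\) characters occur equally often has \(H_0(s) = \log_2 \sigma\).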
Abstract
Suppose we are asked to preprocess a string \(s[1..n]\) such that later, given a substring's endpoints, we can quickly count how many distinct characters it contains. In this paper we give a data structure for this problem that takes \(n H_0(s) + O(n) + o(n H_0(s))\) bits, where \(H_0(s)\) is the 0th-order empirical entropy of \(s\), and answers queries in time for any constant \(\epsilon > 0\). We also show how our data structure can be made partially dynamic.
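The query itself can be illustrated with a small uncompressed baseline. The sketch below relies on a textbook reduction: if `prev[k]` is the position of the previous occurrence of `s[k]` (or 0 if there is none), then the number of distinct characters in \(s[i..j]\) equals the number of positions \(k\) in \([i, j]\) with `prev[k] < i`. This is not the paper's compressed structure; it stores a plain array and scans the range in linear time, and all function names are hypothetical.

```python
def build_prev(s: str) -> list[int]:
    """prev[k-1] is the 1-based position of the previous occurrence of s[k], or 0."""
    last_seen: dict[str, int] = {}
    prev = []
    for k, ch in enumerate(s, start=1):
        prev.append(last_seen.get(ch, 0))
        last_seen[ch] = k
    return prev


def count_distinct(prev: list[int], i: int, j: int) -> int:
    """Count distinct characters in s[i..j] (1-based, inclusive).

    A position k in [i, j] is the first occurrence of its character
    inside the range exactly when prev[k] < i.
    """
    return sum(1 for k in range(i, j + 1) if prev[k - 1] < i)


def count_distinct_naive(s: str, i: int, j: int) -> int:
    """Reference answer computed directly from the substring."""
    return len(set(s[i - 1:j]))


if __name__ == "__main__":
    s = "abracadabra"
    prev = build_prev(s)
    # Cross-check the reduction against the naive answer on every range.
    n = len(s)
    assert all(
        count_distinct(prev, i, j) == count_distinct_naive(s, i, j)
        for i in range(1, n + 1)
        for j in range(i, n + 1)
    )
    print(count_distinct(prev, 1, n))
```

The baseline spends time linear in the length of the range on each query and stores one integer per position; the point of a compressed structure is to answer the same query faster while staying close to \(n H_0(s)\) bits.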
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
