Random Access in Persistent Strings and Segment Selection
Philip Bille, Inge Li Gørtz

TL;DR
This paper introduces a space-efficient data structure for random access in collections of similar strings represented by trees, achieving optimal query times and extending to a geometric segment selection problem.
Contribution
It presents a novel $O(n)$ space data structure supporting $O(\frac{\log n}{\log \log n})$ query time for random access in persistent strings, improving previous bounds and proving optimality.
Findings
Achieves $O(n)$ space and $O(\frac{\log n}{\log \log n})$ query time for string collections.
Reduces string random access to a geometric segment selection problem.
Proves the optimality of the query time for near-linear space solutions.
Abstract
We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly-repetitive collections. Given a tree with $n$ nodes, we show how to represent the corresponding collection in $O(n)$ space and $O(\frac{\log n}{\log \log n})$ query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower…
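To make the input model concrete, the following is a minimal Python sketch of the edit tree described in the abstract, with a naive random access query that replays every edit on the root-to-node path. It is a baseline for illustration only, not the paper's data structure: it uses $O(n)$ space for the tree but takes time proportional to the path length (or worse) per query. The names `Node`, `materialize`, and `access`, the `(op, position, char)` edit encoding, and the assumption that the root represents the empty string are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical edit encoding: (operation, position, character).
# For "delete" the character is ignored.
Edit = Tuple[str, int, str]


@dataclass
class Node:
    """A node of the edit tree; the root has no parent and no edit."""
    parent: Optional["Node"] = None
    edit: Optional[Edit] = None


def materialize(node: Node) -> str:
    """Rebuild the string at `node` by replaying edits from the root.

    Assumes the root represents the empty string.
    """
    # Walk up to the root, collecting the edit on each edge.
    edits = []
    while node is not None and node.edit is not None:
        edits.append(node.edit)
        node = node.parent

    # Replay the edits in root-to-node order.
    chars = []
    for op, pos, ch in reversed(edits):
        if op == "insert":
            chars.insert(pos, ch)
        elif op == "delete":
            del chars[pos]
        elif op == "replace":
            chars[pos] = ch
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return "".join(chars)


def access(node: Node, i: int) -> str:
    """Naive random access: character i of the string at `node`."""
    return materialize(node)[i]


# Example: two versions branching from the same parent version.
root = Node()
v1 = Node(root, ("insert", 0, "a"))
v2 = Node(v1, ("insert", 1, "b"))
v3 = Node(v2, ("replace", 0, "x"))  # first branch from v2
v4 = Node(v2, ("delete", 0, ""))    # second branch from v2
print(materialize(v3), materialize(v4))
```

Because `v3` and `v4` share the path through `v2`, the tree stores the common prefix of their edit histories only once. The cost of this baseline is in the query, which is the part the paper's $O(\frac{\log n}{\log \log n})$ bound addresses.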
Taxonomy
Topics: Algorithms and Data Compression · Data Management and Algorithms · Genomics and Phylogenetic Studies
