A Self-Index on Block Trees

Gonzalo Navarro

arXiv:1606.06617·cs.DS·October 11, 2017

TL;DR

This paper builds a self-index on top of the Block Tree, a data structure whose compression is close to that of Lempel-Ziv parsing. The index supports efficient pattern searches while using space proportional to that of the Block Tree itself.

Contribution

It presents a self-index on Block Trees that supports efficient pattern search while using space proportional to that of the underlying Block Tree.

Findings

01

Uses O(z log(n/z)) words of space, where n is the text length and z is the number of Lempel-Ziv phrases

02

Finds the occ occurrences of a pattern of length m in O(m log n + occ log^ε n) time, for any constant ε > 0

03

Builds on the Block Tree, which supports efficient direct access to text substrings

Abstract

The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure. More precisely, if a Lempel-Ziv parse cuts a text of length $n$ into $z$ non-overlapping phrases, then our index uses $O(z \log(n/z))$ words and finds the $occ$ occurrences of a pattern of length $m$ in time $O(m \log n + occ \log^{\epsilon} n)$ for any constant $\epsilon > 0$.
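To make the quantities in the space bound concrete, the following is a minimal sketch, not the paper's construction. It computes a greedy parse into non-overlapping phrases and evaluates z log(n/z) with the hidden constant dropped. The function names `lz_parse_no_overlap` and `space_bound_words` are hypothetical, the quadratic-time parsing loop is for illustration only, and the exact Lempel-Ziv variant used in the paper may differ from the one assumed here.

```python
import math


def lz_parse_no_overlap(text):
    """Greedy left-to-right parse (an assumed LZ variant for illustration).

    Each phrase is the longest prefix of the remaining text that occurs
    entirely inside the already-parsed prefix, or a single fresh character
    when no such occurrence exists.
    """
    phrases = []
    i = 0
    n = len(text)
    while i < n:
        length = 0
        # Extend the candidate while it still occurs entirely within text[:i].
        while i + length < n and text.find(text[i:i + length + 1], 0, i) != -1:
            length += 1
        length = max(length, 1)  # a character never seen before is its own phrase
        phrases.append(text[i:i + length])
        i += length
    return phrases


def space_bound_words(n, z):
    """Evaluate z * log2(n / z), ignoring the constant hidden by the O-notation."""
    return z * math.log2(n / z)


if __name__ == "__main__":
    text = "abababab"
    phrases = lz_parse_no_overlap(text)
    n, z = len(text), len(phrases)
    print(phrases)                  # ['a', 'b', 'ab', 'abab']
    print(n, z, space_bound_words(n, z))
```

On highly repetitive inputs z grows much more slowly than n, which is the regime in which a bound of the form z log(n/z) is far below the n words needed to store the text explicitly.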
