Spanner Evaluation over SLP-Compressed Documents

Markus L. Schmid; Nicole Schweikardt

arXiv:2101.10890·cs.DS·January 27, 2021

TL;DR

This paper presents methods for evaluating regular spanners directly on SLP-compressed documents, enabling efficient processing without decompression, and demonstrating potential advantages over traditional algorithms on uncompressed data.

Contribution

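The paper introduces algorithms for evaluating regular spanners on SLP-compressed documents, with complexity bounds stated in terms of the SLP's size and depth. On highly compressible documents, these bounds can beat algorithms that run on the uncompressed data.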

Findings

01

Model checking and non-emptiness checking run in time O(size(S)) in data complexity, i.e., linear in the size of the SLP S.

02

Enumeration of the extracted span-tuples needs O(size(S)) preprocessing and has delay O(depth(S)), which is logarithmic in the uncompressed document size when the SLP is balanced.

03

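Because size(S) can be far smaller than the document it represents, these algorithms can outperform algorithms for uncompressed documents when the document is highly compressible.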
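To make the compressibility point concrete, here is a minimal Python sketch of an SLP whose size grows linearly while the document it derives grows exponentially. The rule encoding (each nonterminal maps to one character or to a pair of nonterminals) and the helper names `doubling_slp` and `derived_length` are assumptions made for illustration, not notation taken from the paper.

```python
from typing import Dict, Tuple, Union

# Assumed encoding: a rule is a single terminal character
# or a pair of nonterminal names.
Rule = Union[str, Tuple[str, str]]


def doubling_slp(n: int) -> Dict[str, Rule]:
    """Hypothetical helper: SLP whose symbol A{n} derives 'a' repeated 2**n times."""
    rules: Dict[str, Rule] = {"A0": "a"}
    for i in range(1, n + 1):
        rules[f"A{i}"] = (f"A{i - 1}", f"A{i - 1}")
    return rules


def derived_length(rules: Dict[str, Rule], start: str) -> int:
    """Length of the derived document, computed without decompressing it."""
    lengths: Dict[str, int] = {}
    stack = [start]
    while stack:
        x = stack[-1]
        rule = rules[x]
        if isinstance(rule, str):
            lengths[x] = 1
            stack.pop()
            continue
        left, right = rule
        missing = [y for y in (left, right) if y not in lengths]
        if missing:
            # Resolve the children first, then revisit x.
            stack.extend(missing)
        else:
            lengths[x] = lengths[left] + lengths[right]
            stack.pop()
    return lengths[start]


slp = doubling_slp(40)
print(len(slp), derived_length(slp, "A40"))
```

With 41 rules, this SLP stands for a document of 2**40 characters, so any algorithm whose running time depends on the SLP rather than on the document has a large advantage here.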

Abstract

We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line programs (SLPs) -- a lossless compression scheme for textual data widely used in different areas of theoretical computer science and particularly well-suited for algorithmics on compressed data. In terms of data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set M(D) of all span-tuples extracted from D can be done in time O(size(S) · size(M(D))), and enumeration of M(D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S's derivation…
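The following Python sketch illustrates why a bound like O(size(S)) is plausible for a fixed automaton. It is not the paper's spanner algorithm: it only runs a plain DFA over an SLP, using the standard idea of computing, for each nonterminal, the state-to-state effect of the string it derives. The rule encoding, the example DFA, and the names `topological_order`, `depth`, and `run_dfa_on_slp` are all assumptions made for this illustration.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed encoding: a rule is one terminal character or a pair of nonterminals.
Rule = Union[str, Tuple[str, str]]


def topological_order(rules: Dict[str, Rule], start: str) -> List[str]:
    """Nonterminals reachable from start, children listed before parents."""
    order: List[str] = []
    done: Set[str] = set()
    stack = [(start, False)]
    while stack:
        x, expanded = stack.pop()
        if x in done:
            continue
        rule = rules[x]
        if expanded or isinstance(rule, str):
            done.add(x)
            order.append(x)
        else:
            stack.append((x, True))
            stack.extend((y, False) for y in rule)
    return order


def depth(rules: Dict[str, Rule], start: str) -> int:
    """Depth of the derivation tree, computed on the SLP itself."""
    d: Dict[str, int] = {}
    for x in topological_order(rules, start):
        rule = rules[x]
        d[x] = 0 if isinstance(rule, str) else 1 + max(d[rule[0]], d[rule[1]])
    return d[start]


def run_dfa_on_slp(rules, start, num_states, delta, initial, accepting) -> bool:
    """Decide DFA acceptance of the derived document in O(size(S) * num_states)."""
    effect: Dict[str, Tuple[int, ...]] = {}
    for x in topological_order(rules, start):
        rule = rules[x]
        if isinstance(rule, str):
            effect[x] = tuple(delta[(q, rule)] for q in range(num_states))
        else:
            left, right = rule
            # Reading the left part, then the right part, composes the effects.
            effect[x] = tuple(effect[right][effect[left][q]] for q in range(num_states))
    return effect[start][initial] in accepting


# Example SLP: S derives 'a' * 2**30 followed by 'b'.
rules: Dict[str, Rule] = {"A0": "a", "B": "b"}
for i in range(1, 31):
    rules[f"A{i}"] = (f"A{i - 1}", f"A{i - 1}")
rules["S"] = ("A30", "B")

# Example DFA over {a, b}: state 2 is reached once the factor "ab" has been read.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}

print(run_dfa_on_slp(rules, "S", 3, delta, 0, {2}), depth(rules, "S"))
```

Each rule is processed once, and the work per rule depends only on the number of automaton states, so for a fixed automaton the running time is linear in the number of rules, even though the derived document here has more than a billion characters.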
