Spanner Evaluation over SLP-Compressed Documents
Markus L. Schmid, Nicole Schweikardt

TL;DR
This paper presents methods for evaluating regular spanners directly on SLP-compressed documents, enabling efficient processing without decompression, and demonstrating potential advantages over traditional algorithms on uncompressed data.
Contribution
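The paper introduces algorithms for evaluating regular spanners on SLP-compressed documents. Their complexity bounds depend on the size of the SLP rather than on the length of the document, so they can outperform algorithms for uncompressed data in big-data scenarios.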
Findings
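Model checking and non-emptiness checking run in time O(size(S)) in data complexity, where S is the SLP representing the document.
Enumeration of the extracted span-tuples needs O(size(S)) preprocessing and has delay O(depth(S)), which is logarithmic in the uncompressed document size when the SLP is balanced.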
Algorithms can outperform traditional methods on highly compressible documents.
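To give a feel for why evaluation time can depend on the SLP size rather than on the document length, here is a minimal sketch of the standard technique for running a DFA on an SLP-compressed document: compute one state-transition map per nonterminal and compose the maps bottom-up. This is not the paper's spanner algorithm (spanners additionally track markers for span variables). The rule encoding, the function names, and the example automaton are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical SLP encoding: every nonterminal maps either to one terminal
# character or to a pair of nonterminals (Chomsky normal form).
Rule = Union[str, Tuple[str, str]]


def transition_maps(rules: Dict[str, Rule], start: str,
                    delta: Dict[Tuple[int, str], int],
                    num_states: int) -> Dict[str, Tuple[int, ...]]:
    """For each nonterminal A reachable from `start`, compute maps[A], where
    maps[A][q] is the DFA state reached from q after reading A's expansion."""
    maps: Dict[str, Tuple[int, ...]] = {}
    stack = [start]  # explicit stack: SLPs may be too deep for recursion
    while stack:
        a = stack[-1]
        if a in maps:
            stack.pop()
            continue
        rhs = rules[a]
        if isinstance(rhs, str):
            # Terminal rule: one DFA step per state.
            maps[a] = tuple(delta[(q, rhs)] for q in range(num_states))
            stack.pop()
            continue
        left, right = rhs
        pending = [b for b in (left, right) if b not in maps]
        if pending:
            stack.extend(pending)
        else:
            # Read the left expansion first, then the right one.
            maps[a] = tuple(maps[right][maps[left][q]]
                            for q in range(num_states))
            stack.pop()
    return maps


def slp_accepts(rules: Dict[str, Rule], start: str,
                delta: Dict[Tuple[int, str], int], num_states: int,
                q0: int, accepting: Set[int]) -> bool:
    """Decide acceptance of the compressed document without expanding it."""
    return transition_maps(rules, start, delta, num_states)[start][q0] in accepting


# Example SLP: X1 derives "ab", X4 derives "ab" repeated 8 times.
RULES: Dict[str, Rule] = {
    "A": "a", "B": "b",
    "X1": ("A", "B"), "X2": ("X1", "X1"),
    "X3": ("X2", "X2"), "X4": ("X3", "X3"),
}

# Example DFA over {a, b} accepting words that contain the factor "ba".
# State 0: start, state 1: last letter was b, state 2: "ba" seen (sink).
DELTA = {(0, "a"): 0, (0, "b"): 1,
         (1, "a"): 2, (1, "b"): 1,
         (2, "a"): 2, (2, "b"): 2}

print(slp_accepts(RULES, "X4", DELTA, 3, 0, {2}))
print(slp_accepts(RULES, "X1", DELTA, 3, 0, {2}))
```

Each nonterminal is processed once and costs one pass over the DFA states, so the running time is O(size(S) · |Q|), which is linear in the SLP size for a fixed automaton, however long the expanded document is.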
Abstract
We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line programs (SLPs) -- a lossless compression scheme for textual data widely used in different areas of theoretical computer science and particularly well-suited for algorithmics on compressed data. In terms of data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set M(D) of all span-tuples extracted from D can be done in time O(size(S) · size(M(D))), and enumeration of M(D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S's derivation…
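The quantities size(S), depth(S) and the document length that appear in these bounds can be illustrated with a small sketch. It assumes a hypothetical SLP encoding in Chomsky normal form, counts size as the number of rules, and counts depth in edges of the derivation tree; the paper's exact conventions may differ. The helper names are made up for this example.

```python
from typing import Callable, Dict, Tuple, Union

# Hypothetical SLP encoding: a nonterminal maps to one terminal character
# or to a pair of nonterminals.
Rule = Union[str, Tuple[str, str]]


def bottom_up(rules: Dict[str, Rule], start: str,
              leaf: Callable[[str], int],
              combine: Callable[[int, int], int]) -> int:
    """Evaluate a value per nonterminal, visiting each rule once."""
    values: Dict[str, int] = {}
    stack = [start]  # explicit stack instead of recursion
    while stack:
        a = stack[-1]
        if a in values:
            stack.pop()
            continue
        rhs = rules[a]
        if isinstance(rhs, str):
            values[a] = leaf(rhs)
            stack.pop()
            continue
        pending = [b for b in rhs if b not in values]
        if pending:
            stack.extend(pending)
        else:
            values[a] = combine(values[rhs[0]], values[rhs[1]])
            stack.pop()
    return values[start]


def doc_length(rules: Dict[str, Rule], start: str) -> int:
    """Length of the represented document, computed without expanding it."""
    return bottom_up(rules, start, lambda c: 1, lambda l, r: l + r)


def slp_depth(rules: Dict[str, Rule], start: str) -> int:
    """Depth of the derivation tree (assumed convention: a terminal rule
    contributes one edge)."""
    return bottom_up(rules, start, lambda c: 1, lambda l, r: 1 + max(l, r))


# Example: X0 -> a and Xi -> X(i-1) X(i-1), so X30 derives 'a' repeated 2**30 times.
N = 30
RULES: Dict[str, Rule] = {"X0": "a"}
for i in range(1, N + 1):
    RULES[f"X{i}"] = (f"X{i-1}", f"X{i-1}")

print("rules:", len(RULES))
print("depth:", slp_depth(RULES, f"X{N}"))
print("document length:", doc_length(RULES, f"X{N}"))
```

In this example a document of more than a billion characters is described by 31 rules, which is the kind of gap between the compressed and the uncompressed size that makes bounds in terms of size(S) attractive.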
