Constant-Delay Enumeration for Nondeterministic Document Spanners

Antoine Amarilli; Pierre Bourhis; Stefan Mengel; Matthias Niewerth

arXiv:1807.09320·cs.DB·December 8, 2020

TL;DR

This paper presents an enumeration algorithm for document spanners whose preprocessing is linear in the document size and whose delay is constant in the document size and polynomial in the size of the automaton. Earlier algorithms had either linear delay in the document size or an exponential dependency on the automaton size.

Contribution

The paper introduces an enumeration algorithm for nondeterministic sequential variable-set automata that is tractable in combined complexity and has optimal data complexity bounds.

Findings

01 Preprocessing linear in the document size

02 Delay polynomial in the automaton size and independent of the document size

03 Efficient enumeration for extended VAs

Abstract

We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in the size of the (generally nondeterministic) input…
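
To make the enumeration setting in the abstract concrete, here is a minimal Python sketch of the two-phase interface: a preprocessing pass that is linear in the document, followed by an enumeration phase with constant delay between results. It is not the paper's algorithm and does not use variable-set automata. The toy extraction task (every non-empty span consisting only of one letter), the function names `preprocess` and `enumerate_spans`, and the half-open span convention are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple

# Hypothetical convention: a span is a half-open interval [start, end).
Span = Tuple[int, int]


def preprocess(doc: str, letter: str = "a") -> Tuple[List[int], List[int]]:
    """Linear-time pass over the document (toy task, not the paper's method).

    Returns:
      starts  -- positions i with doc[i] == letter, in increasing order
      run_end -- run_end[i] is the exclusive end of the run of `letter`
                 beginning at i (equal to i if doc[i] != letter)
    """
    n = len(doc)
    run_end = [0] * (n + 1)
    run_end[n] = n
    # Right-to-left scan so each entry is derived from its right neighbour.
    for i in range(n - 1, -1, -1):
        if doc[i] == letter:
            run_end[i] = max(run_end[i + 1], i + 1)
        else:
            run_end[i] = i
    starts = [i for i in range(n) if doc[i] == letter]
    return starts, run_end


def enumerate_spans(starts: List[int], run_end: List[int]) -> Iterator[Span]:
    """Yield every span made only of the chosen letter, without duplicates.

    Every position in `starts` produces at least one span, so a constant
    amount of work separates consecutive results, whatever the document size.
    """
    for s in starts:
        for e in range(s + 1, run_end[s] + 1):
            yield (s, e)
```

For example, `list(enumerate_spans(*preprocess("aabxa")))` gives `[(0, 1), (0, 2), (1, 2), (4, 5)]`. The number of results can be quadratic in the document length, which is why the results are produced lazily rather than materialized during preprocessing.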

Code & Models

Repositories

PoDMR/enum-spanner-rs