Recursive Programs for Document Spanners
Liat Peterfreund, Balder ten Cate, Ronald Fagin, Benny Kimelfeld

TL;DR
This paper shows that recursive Datalog over regex formulas captures exactly the document spanners computable in polynomial time, which sharpens what is known about the expressive power of formalisms for information extraction.
Contribution
The paper establishes a precise correspondence between recursive Datalog programs over regex formulas and polynomial-time document spanners. It also compares these programs with existing formalisms, such as core spanners.
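One direction of this correspondence, that such programs can be evaluated in polynomial time, comes down to counting. The bound below is our own back-of-the-envelope sketch and not the paper's proof. It assumes a fixed program, a document d of length n, spans written [i, j⟩ with 1 ≤ i ≤ j ≤ n+1, and k as the largest arity of any IDB relation. Every round of a naive bottom-up fixpoint either adds at least one tuple or terminates. Each round joins relations of polynomial size, so the number of rounds and the total work both stay polynomial in n.

```latex
\[
|\mathrm{Spans}(d)| \;=\; \binom{n+1}{2} + (n+1) \;=\; \frac{(n+1)(n+2)}{2},
\qquad
\#\mathrm{rounds} \;\le\; \sum_{R \in \mathrm{IDB}} \left(\frac{(n+1)(n+2)}{2}\right)^{\mathrm{arity}(R)} \;=\; O\!\left(n^{2k}\right).
\]
```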
Findings
Recursive Datalog over regex formulas captures exactly polynomial-time spanners.
Comparison with core spanners and their closure under difference.
Extension of results to a generalized framework for relational and document spanners.
Abstract
A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (a string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are obtained by adding capture variables to regular expressions. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (extracting relations that play the role of EDBs from the input document). In this paper, we investigate the expressive power of recursive Datalog over regex formulas. Our main result is that such programs capture precisely the document spanners computable in polynomial time. Additional results compare recursive programs to known formalisms such as the language…
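The following minimal Python sketch shows what a recursive program over regex formulas can look like. It is our own illustration, not taken from the paper. The relation names `Empty`, `Wrap` and `AnBn`, the specific regex formulas, and the 0-based half-open span convention are all hypothetical choices. Python's `re` module stands in for a regex-formula engine. A regex formula yields every matching assignment, including overlapping ones, whereas `re.finditer` does not. The sketch therefore tests every span with `Pattern.fullmatch`. Regex formulas extract the EDB relations, and the recursive rules are evaluated by a naive fixpoint. The program selects the spans whose text has the form aⁿbⁿ, a language that no regular expression defines, which gives a small taste of what recursion adds.

```python
import re

# A span is a half-open pair (i, j) over 0-based positions (our convention;
# the paper writes spans as [i, j> over 1-based positions).
Span = tuple[int, int]

# Hypothetical regex formula  Σ* x{ a y{Σ*} b } Σ*,  emulated with Python's re:
# the whole span x must match, and group 1 plays the role of capture variable y.
WRAP = re.compile(r"a(.*)b", re.DOTALL)

def all_spans(doc: str) -> list[Span]:
    n = len(doc)
    return [(i, j) for i in range(n + 1) for j in range(i, n + 1)]

def empty_edb(doc: str) -> set[Span]:
    # EDB Empty(x): every empty span, i.e. the regex formula  Σ* x{ε} Σ*.
    return {(i, j) for (i, j) in all_spans(doc) if i == j}

def wrap_edb(doc: str) -> set[tuple[Span, Span]]:
    # EDB Wrap(x, y): the text of x is a·(text of y)·b. Every span is tried,
    # because a regex formula also reports overlapping matches.
    out = set()
    for (i, j) in all_spans(doc):
        m = WRAP.fullmatch(doc, i, j)
        if m:
            out.add(((i, j), m.span(1)))  # m.span(1) is absolute in doc
    return out

def anbn(doc: str) -> set[Span]:
    # Hypothetical recursive program, evaluated by naive bottom-up fixpoint:
    #   AnBn(x) :- Empty(x).
    #   AnBn(x) :- Wrap(x, y), AnBn(y).
    wrap = wrap_edb(doc)
    result = empty_edb(doc)
    while True:
        derived = {x for (x, y) in wrap if y in result}
        if derived <= result:  # nothing new: least fixpoint reached
            return result
        result |= derived

# Non-empty selected spans of "aabb":
print(sorted(s for s in anbn("aabb") if s[0] < s[1]))  # [(0, 4), (1, 3)]
```

On "abab", the sketch selects the two "ab" spans but not the whole document. The interior of the whole document is "ba", which neither EDB relation derives.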
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
