SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

Dean Light; Ahmad Aiashy; Mahmoud Diab; Daniel Nachmias; Stijn Vansummeren; Benny Kimelfeld

arXiv:2409.01736·cs.DB·September 5, 2024

TL;DR

SpannerLib is a Python library that integrates declarative document spanners with imperative workflows, enabling seamless development of information extraction programs that combine rule-based and ML-based methods within Jupyter notebooks.

Contribution

The paper introduces SpannerLib, a library for embedding document spanners in Python code. Declarative rules can be written inside Python, and those rules can in turn invoke custom Python code, which simplifies the development of IE programs.

Findings

01 Supports complex IE workflows within Jupyter Notebook

02 Enables integration of ML-based NLP models with declarative rules

03 Facilitates development of flexible, hybrid IE programs

Abstract

Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from the industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs, with increasing levels of complexity, within Jupyter Notebook.
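To make the idea of a document spanner concrete, here is a minimal sketch in plain Python. It is not SpannerLib's API and not Spannerlog syntax: it uses only the standard `re` module, and the names `Span`, `rgx_spans`, `is_known_domain`, and `email_users` are hypothetical, chosen for illustration. The sketch shows a regex extractor that returns spans (character intervals) rather than strings, combined with a Python predicate that stands in for a user-defined function such as a call to an ML model.

```python
import re
from typing import Iterator, NamedTuple


class Span(NamedTuple):
    """A character interval [start, end) inside a document."""
    start: int
    end: int


def rgx_spans(pattern: str, text: str) -> Iterator[tuple]:
    """Yield one tuple per regex match, holding one Span per capture group."""
    for match in re.finditer(pattern, text):
        yield tuple(
            Span(*match.span(group))
            for group in range(1, match.re.groups + 1)
        )


def is_known_domain(domain: str) -> bool:
    """Hypothetical user-defined function; a real one might call an NLP model."""
    return domain in {"example.org"}


def email_users(text: str) -> list[tuple[Span, str]]:
    """Imperative rendering of a rule of the (informal) shape:
    User(u) <- rgx(pattern, text) -> (u, d), is_known_domain(d)
    """
    pattern = r"(\w+)@(\w+\.\w+)"
    results = []
    for user, domain in rgx_spans(pattern, text):
        # Filter the extracted relation with the Python predicate.
        if is_known_domain(text[domain.start:domain.end]):
            results.append((user, text[user.start:user.end]))
    return results


doc = "Contact alice@example.org or bob@other.net."
result = email_users(doc)
print(result)
```

Returning spans instead of substrings keeps each extracted value tied to its position in the document, which is what lets later rules reason about where a match occurred as well as what it contains.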

Code & Models

Repositories

DeanLight/spannerlib (official)

Taxonomy

MethodsLib