TL;DR
SpannerLib is a Python library that integrates declarative document spanners with imperative workflows, enabling seamless development of information extraction programs that combine rule-based and ML-based methods within Jupyter notebooks.
Contribution
It introduces SpannerLib, a library for embedding document spanners in Python. SpannerLib supports two-way interaction between declarative rules and custom Python code, which eases IE program development.
Findings
Supports complex IE workflows within Jupyter Notebook
Enables integration of ML-based NLP models with declarative rules
Facilitates development of flexible, hybrid IE programs
Abstract
Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs of increasing complexity within Jupyter Notebook.
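To make the two-way interaction concrete, here is a minimal plain-Python sketch of the underlying semantics. It is not SpannerLib's API and does not use Spannerlog syntax. It assumes that a span is a half-open interval [start, end) over a document, that a regex spanner returns one span per capture group, and that a user-defined function (here the hypothetical `toy_classifier`, standing in for an ML-based NLP model) may emit zero or more output values. All names (`Span`, `rgx_spanner`, `toy_classifier`, `eval_mention_rule`) are illustrative. The code evaluates a single non-recursive, conjunctive rule, not a full Datalog engine.

```python
import re
from typing import Callable, Iterable, NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) over one document."""
    doc_id: str
    start: int
    end: int


def rgx_spanner(pattern: str, doc_id: str, text: str) -> list[tuple[Span, ...]]:
    """Regex-based spanner: one tuple per match, one span per capture group."""
    compiled = re.compile(pattern)
    rows = []
    for m in compiled.finditer(text):
        rows.append(tuple(Span(doc_id, m.start(g), m.end(g))
                          for g in range(1, compiled.groups + 1)))
    return rows


def toy_classifier(token: str) -> list[str]:
    """Hypothetical stand-in for an ML-based NLP model called as a UDF.
    Returns a list because an IE function may emit zero or more outputs."""
    return ["ORG"] if token.isupper() else ["PERSON"]


def eval_mention_rule(
    docs: dict[str, str],
    pattern: str,
    udf: Callable[[str], Iterable[str]],
) -> set[tuple[Span, str]]:
    """Evaluate a rule of the (illustrative, not Spannerlog) shape
         Mention(s, label) <- Docs(d, text), rgx(pattern, text) -> (s), udf(s) -> (label)
    as a nested-loop join over Docs, the spanner output, and the UDF output."""
    result: set[tuple[Span, str]] = set()
    for doc_id, text in docs.items():            # scan the Docs relation
        for row in rgx_spanner(pattern, doc_id, text):
            span = row[0]                        # assumes exactly one capture group
            token = text[span.start:span.end]    # materialize the span's text
            for label in udf(token):             # call into arbitrary Python code
                result.add((span, label))
    return result


if __name__ == "__main__":
    docs = {"d1": "Alice met Bob at IBM."}
    for span, label in sorted(eval_mention_rule(docs, r"\b([A-Z][A-Za-z]+)\b", toy_classifier)):
        print(span, label)
```

The nested loops only emulate the join that a rule engine would perform. The point of the sketch is that the declarative part (the regex spanner and the join) and the imperative part (the Python UDF) exchange spans and values through ordinary relations.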
