Revisiting Weighted Information Extraction: A Simpler and Faster Algorithm for Ranked Enumeration

Pawel Gawrychowski; Florin Manea; Markus L. Schmid

arXiv:2409.18563·cs.DS·October 8, 2024

TL;DR

This paper introduces a simpler, faster algorithm for weighted information extraction that improves delay bounds and leverages shortest path enumeration techniques, combining algebra, geometry, and linear programming.

Contribution

The paper presents a new algorithm for weighted (ranked) enumeration with linear-time preprocessing and improved delay bounds. It is both simpler and faster than previous methods.

Findings

01

Achieves linear-time preprocessing and output-linear delay O(|s|) with high probability.

02

Significantly improves delay bounds over previous algorithms.

03

Combines algebra, geometry, and linear programming techniques.

Abstract

Information extraction from textual data, where the query is represented by a finite transducer and the task is to enumerate all results without repetition, and its extension to the weighted case, where each output element has a weight and the output elements are to be enumerated sorted by their weights, are important and well studied problems in database theory. On the one hand, the first framework already covers the well-known case of regular document spanners, while the latter setting covers several practically relevant tasks that cannot be described in the unweighted setting. It is known that in the unweighted case this problem can be solved with linear time preprocessing $O(|D|)$ and output-linear delay $O(|s|)$ in data complexity, where $D$ is the input data and $s$ is the current output element. For the weighted case, Bourhis, Grez, Jachiet, and Riveros [ICDT 2021] recently…
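
To make the ranked-enumeration task concrete, here is a minimal sketch of the general idea of enumerating results in order of weight by way of shortest-path enumeration. It is not the paper's algorithm and does not achieve its delay bounds. It assumes, purely for illustration, that each output element corresponds to one source-to-target path in a small weighted DAG; the function name `ranked_paths` and the adjacency-list layout are hypothetical.

```python
import heapq
from itertools import count


def ranked_paths(graph, source, target):
    """Yield (weight, path) for every source-to-target path of a weighted
    DAG, in nondecreasing order of weight and without repetition.

    graph maps a node to a list of (successor, edge_weight) pairs.
    Assumption: the graph is acyclic and small enough for recursion.
    """
    INF = float("inf")
    to_target = {}  # memo: weight of the lightest path from a node to target

    def best(v):
        if v == target:
            return 0
        if v not in to_target:
            to_target[v] = min(
                (w + best(u) for u, w in graph.get(v, [])), default=INF
            )
        return to_target[v]

    if best(source) == INF:
        return  # target unreachable: nothing to enumerate

    tie = count()  # tie-breaker so the heap never compares paths
    # Heap entries: (weight of best completion, tie, weight so far, path).
    heap = [(best(source), next(tie), 0, (source,))]
    while heap:
        _, _, cost, path = heapq.heappop(heap)
        v = path[-1]
        if v == target:
            yield cost, list(path)
            continue
        for u, w in graph.get(v, []):
            if best(u) < INF:  # skip extensions that cannot reach target
                heapq.heappush(
                    heap, (cost + w + best(u), next(tie), cost + w, path + (u,))
                )


# Hypothetical toy instance.
example = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("t", 5)],
    "b": [("t", 1)],
}
results = list(ranked_paths(example, "s", "t"))
# results == [(3, ['s', 'a', 'b', 't']), (5, ['s', 'b', 't']), (6, ['s', 'a', 't'])]
```

Because every partial path is keyed by the exact weight of its best completion, keys never decrease along an extension, so complete paths leave the heap in sorted order. The cost per output in this sketch depends on the heap size, which is exactly the kind of data-dependent delay the paper aims to avoid.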

Taxonomy

Topics: Data Management and Algorithms · Rough Sets and Fuzzy Logic · Advanced Database Systems and Queries