Efficient Semiring-Weighted Earley Parsing

Andreas Opedal; Ran Zmigrod; Tim Vieira; Ryan Cotterell; Jason Eisner

arXiv:2307.02982·cs.CL·July 7, 2023

TL;DR

This paper gives a reference description of Earley's parsing algorithm as a deduction system, covering speed-ups that improve its worst-case runtime, a generalization to semiring weights, and implementation details suited to the large, weighted grammars used in natural language processing.

Contribution

The paper presents a deduction system for Earley's algorithm with several speed-ups, adds a partly novel variant that represents the grammar compactly as a single finite-state automaton to tighten the runtime bound, and extends the approach to semiring-weighted parsing with practical implementation guidance.

Findings

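1. Presents a worst-case runtime of O(N^3 |G|), improving on Earley's original O(N^3 |G| |R|), which is impractical for large grammars.

2. Provides a semiring-weighted deduction framework for Earley's algorithm.

3. Gives implementation details under which the semiring-weighted versions keep an asymptotic runtime comparable to the unweighted methods.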
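To make the phrase "semiring-weighted" concrete, here is a minimal Python sketch, not taken from the paper or its repository. The names `Semiring`, `BOOLEAN`, `REAL`, `VITERBI`, and `total_weight` are hypothetical, and the rule weights are made-up illustration values. It enumerates derivations explicitly, which a real parser avoids: a weighted deduction system computes the same quantity by dynamic programming over shared items.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable, Sequence


@dataclass(frozen=True)
class Semiring:
    """A semiring: `plus` combines alternatives, `times` combines parts."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any  # identity for plus
    one: Any   # identity for times


# Three standard choices (illustrative names, not the paper's API).
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def total_weight(semiring: Semiring, derivations: Iterable[Sequence[Any]]) -> Any:
    """Sum (plus) over derivations of the product (times) of rule weights."""
    total = semiring.zero
    for rule_weights in derivations:
        weight = reduce(semiring.times, rule_weights, semiring.one)
        total = semiring.plus(total, weight)
    return total


# Two hypothetical derivations of one sentence, each a list of rule weights.
derivations = [[0.5, 0.5], [0.25, 0.5]]
print(total_weight(REAL, derivations))     # 0.375: total weight of the sentence
print(total_weight(VITERBI, derivations))  # 0.25: weight of the best derivation
```

Swapping the semiring changes what is computed (recognition, total weight, best-derivation weight) without changing the aggregation code, which is the property that lets one deduction system serve all three uses.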

Abstract

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O(N^3 |G| |R|)$, which is unworkable for the large grammars that arise in natural language processing, to $O(N^3 |G|)$, which matches the runtime of CKY on a binarized version of the grammar $G$. Here $N$ is the length of the sentence, $|R|$ is the number of productions in $G$, and $|G|$ is the total length of those productions. We also provide a version that achieves runtime of $O(N^3 |M|)$ with $|M| \leq |G|$ when the grammar is represented compactly as a single finite-state automaton $M$ (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles,…
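For readers unfamiliar with the baseline, the following is a minimal Python sketch of a textbook, unweighted Earley recognizer with the usual predict, scan, and complete steps. It is not the paper's sped-up $O(N^3 |G|)$ or automaton-based variant, and it is not code from the official repository. The names `Item` and `earley_recognize`, the grammar encoding, and the toy grammar are assumptions made for illustration, and the sketch assumes the grammar has no empty right-hand sides.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs with a dot position, begun at column `start`.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of `grammar` is a terminal.
    Assumption: no right-hand side is empty.
    """
    n = len(tokens)
    columns = [set() for _ in range(n + 1)]
    for rhs in grammar[start_symbol]:
        columns[0].add(Item(start_symbol, rhs, 0, 0))

    for j in range(n + 1):
        agenda = list(columns[j])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # PREDICT: start every rule for the expected nonterminal.
                    new_items = [Item(symbol, rhs, 0, j) for rhs in grammar[symbol]]
                elif j < n and tokens[j] == symbol:
                    # SCAN: the expected terminal matches the next token.
                    columns[j + 1].add(item._replace(dot=item.dot + 1))
                    new_items = []
                else:
                    new_items = []
            else:
                # COMPLETE: advance items that were waiting for item.lhs.
                new_items = [
                    waiting._replace(dot=waiting.dot + 1)
                    for waiting in columns[item.start]
                    if waiting.dot < len(waiting.rhs)
                    and waiting.rhs[waiting.dot] == item.lhs
                ]
            for new in new_items:
                if new not in columns[j]:
                    columns[j].add(new)
                    agenda.append(new)

    return any(
        item.lhs == start_symbol and item.dot == len(item.rhs) and item.start == 0
        for item in columns[n]
    )


# Toy grammar (hypothetical): E -> E + T | T, T -> a
toy_grammar = {"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}
print(earley_recognize(toy_grammar, "E", "a + a".split()))  # True
print(earley_recognize(toy_grammar, "E", "a +".split()))    # False
```

The complete step scans a whole earlier column for waiting items, which is one source of the extra grammar-size factor in the naive runtime; the speed-ups described in the abstract concern how such lookups and rule representations are organized.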

Code & Models

Repositories

rycolab/earleys-algo (official)