An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
Andreas Stolcke (SRI International, Menlo Park, CA 94025)

TL;DR
This paper introduces an extension of Earley's parser for stochastic context-free grammars that efficiently computes prefix probabilities, substring probabilities, most likely parses, and production counts in a single left-to-right pass.
Contribution
The paper extends Earley's parser so that one algorithm computes prefix probabilities, substring probabilities, Viterbi parses, and expected production counts for a stochastic context-free grammar, with no conversion of the grammar to a normal form.
Findings
Computes prefix and substring probabilities incrementally.
Handles any context-free rule format without conversion.
Works efficiently on sparse grammars using top-down control.
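To make the idea of incremental prefix probabilities concrete, here is a minimal sketch of a probabilistic Earley pass in Python. It is not the paper's implementation. It assumes a restricted grammar class: no empty productions, no unit productions (nonterminal to single nonterminal), and no left recursion. The function name `earley_prefix_probs`, the dictionary-based grammar encoding, and the empty-string dummy start symbol are illustrative choices only. Each chart state carries a forward probability (alpha) and an inner probability (gamma); the prefix probability after each token is the sum of alpha over the states produced by scanning that token.

```python
from collections import defaultdict
from functools import lru_cache


def earley_prefix_probs(rules, start, tokens):
    """Sketch only. rules maps nonterminal -> list of (rhs_tuple, prob).

    Assumes no empty, unit, or left-recursive productions, terminal names
    distinct from nonterminal names, and no nonterminal named "".
    Returns (list of prefix probabilities, probability of the whole string).
    """
    def next_sym(state):
        _, rhs, dot, _ = state
        return rhs[dot] if dot < len(rhs) else None

    # left[Z][Y]: probability that Z expands to something starting with Y.
    left = defaultdict(lambda: defaultdict(float))
    for lhs, prods in rules.items():
        for rhs, p in prods:
            if rhs[0] in rules:
                left[lhs][rhs[0]] += p

    @lru_cache(maxsize=None)
    def closure(z, y):
        # Reflexive transitive left-corner closure; the recursion only
        # terminates because left recursion is assumed absent.
        return (1.0 if z == y else 0.0) + sum(
            p * closure(w, y) for w, p in left[z].items())

    def predict(chart_i, i):
        # Total forward probability waiting for each nonterminal Z.
        demand = defaultdict(float)
        for state, (alpha, _) in chart_i.items():
            sym = next_sym(state)
            if sym in rules:
                demand[sym] += alpha
        for y, prods in rules.items():
            total = sum(a * closure(z, y) for z, a in demand.items())
            if total > 0.0:
                for rhs, p in prods:
                    chart_i[(y, rhs, 0, i)] = [total * p, p]

    chart = [defaultdict(lambda: [0.0, 0.0]) for _ in range(len(tokens) + 1)]
    chart[0][("", (start,), 0, 0)] = [1.0, 1.0]  # dummy start state
    predict(chart[0], 0)

    prefix_probs = []
    for i, tok in enumerate(tokens):
        nxt = chart[i + 1]
        # Scanning: advance the dot over the matching terminal.
        for state, (alpha, gamma) in chart[i].items():
            if next_sym(state) == tok:
                lhs, rhs, dot, origin = state
                nxt[(lhs, rhs, dot + 1, origin)] = [alpha, gamma]
        prefix_probs.append(sum(a for a, _ in nxt.values()))
        # Completion: later origins first, so every inner probability is
        # final before it is used (valid under the assumptions above).
        for j in range(i, -1, -1):
            done = [(s, g) for s, (_, g) in nxt.items()
                    if next_sym(s) is None and s[3] == j]
            for (y, _, _, _), g_inner in done:
                for state, (alpha, gamma) in list(chart[j].items()):
                    if next_sym(state) == y:
                        lhs, rhs, dot, origin = state
                        entry = nxt[(lhs, rhs, dot + 1, origin)]
                        entry[0] += alpha * g_inner
                        entry[1] += gamma * g_inner
        predict(nxt, i + 1)

    final = chart[-1].get(("", (start,), 1, 0), [0.0, 0.0])
    return prefix_probs, final[1]


# Toy grammar (hypothetical): S -> A B; A -> a (0.7) | a A (0.3); B -> b.
toy = {
    "S": [(("A", "B"), 1.0)],
    "A": [(("a",), 0.7), (("a", "A"), 0.3)],
    "B": [(("b",), 1.0)],
}
prefixes, whole = earley_prefix_probs(toy, "S", ["a", "a", "b"])
print(prefixes, whole)  # approximately [1.0, 0.3, 0.21] and 0.21
```

In the toy grammar every sentence has the form a...ab, so the prefix "a" has probability 1, "a a" has probability 0.3 (at least two a's), and "a a b" has probability 0.3 × 0.7 = 0.21, which is also the probability of the complete string. Handling left recursion, unit productions, and empty productions requires additional machinery that this sketch deliberately leaves out.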
Abstract
We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal…
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling
