Towards History-based Grammars: Using Richer Models for Probabilistic Parsing
Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos

TL;DR
This paper introduces HBG (history-based grammar), a generative probabilistic model that uses detailed linguistic information from the parse tree to resolve ambiguity, raising parsing accuracy from 60% to 75% over a P-CFG baseline.
Contribution
The paper presents a novel history-based grammar model that integrates lexical, syntactic, semantic, and structural information using decision trees, enhancing parsing accuracy.
Findings
HBG achieves 75% parsing accuracy in head-to-head tests, up from 60% for the P-CFG baseline.
Relative to P-CFG, this is a 37% reduction in parsing errors.
Incorporating rich linguistic features improves disambiguation.
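The 37% figure follows from the two accuracy numbers: the error rate falls from 40% to 25%, and the drop is measured relative to the baseline error. A minimal sketch of that arithmetic is below; the function name `relative_error_reduction` is introduced here for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors that the new model eliminates."""
    baseline_err = 1.0 - baseline_acc  # P-CFG: 1 - 0.60 = 0.40
    new_err = 1.0 - new_acc            # HBG:   1 - 0.75 = 0.25
    return (baseline_err - new_err) / baseline_err


# Accuracy figures as reported in the abstract.
reduction = relative_error_reduction(0.60, 0.75)
print(f"{reduction:.1%}")  # 37.5%, which the paper reports as 37%
```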
Abstract
We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. This stands in contrast to the usual approach of further grammar tailoring via the usual linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction…
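The contrast with P-CFG can be illustrated with a toy sketch. In a P-CFG, each rule's probability depends only on the nonterminal being expanded; a history-based model conditions each expansion on information from the parse tree built so far. The example below uses only the parent label as the conditioning context, which is a deliberate simplification: the paper draws on richer lexical, syntactic, semantic, and structural information selected by decision trees. All rules, probabilities, and function names here are invented for illustration and are not taken from the paper.

```python
import math

# A derivation as a list of (parent_label, lhs, rhs) steps.
# parent_label is the hypothetical "history" feature: the label of the
# node directly above the one being expanded (None at the root).
derivation = [
    (None, "S", ("NP", "VP")),
    ("S", "NP", ("she",)),
    ("S", "VP", ("V", "NP")),
    ("VP", "V", ("saw",)),
    ("VP", "NP", ("stars",)),
]

# P-CFG: P(rhs | lhs). Toy numbers, not estimated from any corpus.
pcfg_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.5,
    ("NP", ("stars",)): 0.5,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
}

# History-conditioned: P(rhs | lhs, parent_label). Toy numbers again.
history_probs = {
    (None, "S", ("NP", "VP")): 1.0,
    ("S", "NP", ("she",)): 0.9,    # subject position favours the pronoun
    ("VP", "NP", ("stars",)): 0.8,  # object position favours the noun
    ("S", "VP", ("V", "NP")): 1.0,
    ("VP", "V", ("saw",)): 1.0,
}


def pcfg_log_prob(steps, rule_probs):
    """Sum of log P(rhs | lhs); the parent context is ignored."""
    return sum(math.log(rule_probs[(lhs, rhs)]) for _, lhs, rhs in steps)


def history_log_prob(steps, cond_probs):
    """Sum of log P(rhs | lhs, parent_label) over the derivation."""
    return sum(math.log(cond_probs[(ctx, lhs, rhs)]) for ctx, lhs, rhs in steps)


pcfg_score = pcfg_log_prob(derivation, pcfg_probs)
history_score = history_log_prob(derivation, history_probs)
print(pcfg_score, history_score)
```

With these toy numbers the history-conditioned model assigns the derivation a higher probability, because the context lets it distinguish subject and object noun phrases that the P-CFG treats identically.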
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
