Bagging and Boosting a Treebank Parser
John C. Henderson, Eric Brill

TL;DR
This paper explores the application of bagging and boosting techniques to improve a trainable statistical parser, achieving significant gains in parsing accuracy and revealing annotation inconsistencies in the Penn Treebank.
Contribution
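The paper applies two ensemble methods, bagging and boosting, to natural language parsing, shows that they improve a trainable statistical parser, and uses error analysis of the boosting results to suggest a semi-automatic way of finding inconsistent treebank annotations.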
Findings
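The boosted parser achieves a higher F-measure than the baseline parser.
The best ensemble system gains roughly as much F-measure as doubling the corpus size would.
Error analysis of the boosting results exposes inconsistent annotations in the Penn Treebank.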
Abstract
Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large of a gain in F-measure as doubling the corpus size. Error analysis of the result of the boosting technique reveals some inconsistent annotations in the Penn Treebank, suggesting a semi-automatic method for finding inconsistent treebank annotations.
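To make the bagging idea concrete, here is a minimal sketch of how a bagged parser ensemble could be built and scored. It is an illustration under stated assumptions, not the authors' implementation: the names `bootstrap_sample`, `bagged_constituents`, and `f_measure` are hypothetical, `train_parser` stands in for any trainable parser, and combining parsers by majority vote over labeled constituents is an assumed combination rule that this summary does not confirm.

```python
import random
from collections import Counter


def bootstrap_sample(corpus, rng):
    """Draw len(corpus) training items uniformly at random, with replacement."""
    return [rng.choice(corpus) for _ in corpus]


def bagged_constituents(train_parser, corpus, sentence, n_bags=5, seed=0):
    """Train n_bags parsers on bootstrap replicates and combine their output.

    train_parser (hypothetical) takes a list of training items and returns
    a function mapping a sentence to a set of (label, start, end) tuples.
    Assumption: a constituent is kept if a strict majority of parsers
    propose it.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_bags):
        parser = train_parser(bootstrap_sample(corpus, rng))
        votes.update(set(parser(sentence)))
    return {c for c, n in votes.items() if n > n_bags / 2}


def f_measure(predicted, gold):
    """Harmonic mean of labeled-constituent precision and recall."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Boosting differs in that each new parser is trained on a reweighted corpus that emphasizes sentences the earlier parsers handled poorly, rather than on a uniform bootstrap replicate. Sentences that stay hard across rounds are natural candidates for the annotation inconsistencies mentioned in the abstract.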
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
