Bagging and Boosting a Treebank Parser

John C. Henderson; Eric Brill

arXiv:cs/0006011 · cs.CL · May 23, 2007 · 37 cites

TL;DR

This paper explores the application of bagging and boosting techniques to improve a trainable statistical parser, achieving significant gains in parsing accuracy and revealing annotation inconsistencies in the Penn Treebank.

Contribution

It introduces the use of ensemble methods like bagging and boosting for natural language parsing, demonstrating their effectiveness and uncovering annotation issues.

Findings

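01. Boosting improves the parser's F-measure.

02. Ensembles of parsers outperform a single parser; the best system's gain in F-measure is roughly as large as that from doubling the corpus size.

03. Error analysis of the boosted parser reveals inconsistent annotations in the Penn Treebank.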

Abstract

Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large a gain in F-measure as doubling the corpus size. Error analysis of the result of the boosting technique reveals some inconsistent annotations in the Penn Treebank, suggesting a semi-automatic method for finding inconsistent treebank annotations.
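To make the terms in the abstract concrete, the following is a minimal sketch of the generic bagging recipe and of the F-measure, not the authors' implementation. It assumes a parse can be represented as a set of labeled spans `(label, start, end)`, and that ensemble members are combined by majority vote over those spans; the representation, the combination rule, and all function names are illustrative assumptions.

```python
import random
from collections import Counter


def bootstrap_replicate(corpus, rng):
    """Draw len(corpus) items with replacement: one bagging training set."""
    return [rng.choice(corpus) for _ in corpus]


def majority_constituents(parses):
    """Keep the labeled spans proposed by more than half of the parsers.

    Assumed combination rule, chosen for illustration only.
    """
    counts = Counter(span for parse in parses for span in set(parse))
    return {span for span, n in counts.items() if n > len(parses) / 2}


def f_measure(predicted, gold):
    """Harmonic mean of labeled precision and recall over span sets."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: three hypothetical parser outputs for one 3-word sentence.
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
parses = [
    {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)},
    {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)},
    {("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)},
]
combined = majority_constituents(parses)
score = f_measure(combined, gold)
```

In this toy case the vote keeps three spans, two of which match the gold parse, so precision and recall are both 2/3. In a bagging setup, each ensemble member would be trained on a different `bootstrap_replicate` of the training corpus before its output is combined.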

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies