Statistical Decision-Tree Models for Parsing

David M. Magerman

arXiv:cmp-lg/9504030 · cmp-lg · February 3, 2008 · 46 cites

Open Access

TL;DR

This paper introduces SPATTER, a statistical parser based on decision-tree learning that constructs a complete parse for every sentence and, in the reported experiments, outperforms IBM's grammar-based parser.

Contribution

The paper presents SPATTER, a statistical parser built on decision-tree learning that relies heavily on lexical and contextual information rather than a manually developed grammar, and reports higher accuracy than existing grammar-based methods.

Findings

01. SPATTER achieves 86-91% precision and recall on Wall Street Journal data.

02. SPATTER outperforms IBM's grammar-based parser in experiments.

03. The parser effectively handles sentences up to 40 words with high accuracy.
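
For readers unfamiliar with how precision and recall are computed for parsers, the following is a minimal sketch. It assumes the figures above are constituent-level scores obtained by matching labeled spans against a reference parse; the function name `bracket_precision_recall` and the toy spans are hypothetical and do not come from the paper.

```python
from collections import Counter

def bracket_precision_recall(predicted, gold):
    """Return (precision, recall) over labeled constituent spans.

    Each constituent is a (label, start, end) tuple. Counters (multisets)
    are used so that a repeated span is matched at most as many times as
    it occurs in both parses.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection keeps the minimum count of each shared span.
    matched = sum((pred_counts & gold_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall

# Toy example (hypothetical spans over a 5-word sentence).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]
p, r = bracket_precision_recall(predicted, gold)
```

In this toy case three of the five predicted constituents appear in the reference parse, and three of the four reference constituents are recovered, so precision is 3/5 and recall is 3/4.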

Abstract

Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing $n$-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies