Natural Language Parsing as Statistical Pattern Recognition

David M. Magerman

arXiv:cmp-lg/9405009 · cmp-lg · August 31, 2016 · 226 cites

Open Access

TL;DR

This paper introduces a method for automatically acquiring a statistical parser from a set of parsed sentences using decision trees. By drawing on contextual and lexical information, the resulting parser significantly outperformed a traditional rule-based parser on accuracy.

Contribution

It presents a novel method for automatically acquiring a statistical parser using decision trees, reducing manual effort and improving disambiguation accuracy over traditional grammar-based approaches.

Findings

01. The decision tree parser achieved 78% accuracy.

02. It outperformed a rule-based parser that had been developed over a ten-year period.

03. It used contextual and lexical features for disambiguation.

Abstract

Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. These decision…
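
The abstract's central idea, decision trees that assign a probability distribution over parse trees, can be illustrated with a minimal sketch. It assumes that a parse is built as a sequence of decisions, that each decision's probability is read from the leaf of a decision tree reached by asking yes/no questions about the context, and that the parse probability is the product of those decision probabilities. The names `Node`, `leaf_distribution`, `parse_log_prob`, and `toy_tree`, along with the toy question and probabilities, are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

Context = Dict[str, str]


@dataclass
class Node:
    """A node of a toy decision tree (hypothetical structure, not the paper's)."""
    question: Optional[Callable[[Context], bool]] = None  # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    dist: Optional[Dict[str, float]] = None  # leaf distribution over decisions


def leaf_distribution(tree: Node, context: Context) -> Dict[str, float]:
    """Answer yes/no questions from the root down and return the leaf's distribution."""
    node = tree
    while node.question is not None:
        node = node.yes if node.question(context) else node.no
    return node.dist


def parse_log_prob(tree: Node, steps: Sequence[Tuple[Context, str]]) -> float:
    """Sum the log-probabilities of each (context, decision) pair in a derivation.

    Assumption: the parse probability factors into one term per decision.
    """
    total = 0.0
    for context, decision in steps:
        p = leaf_distribution(tree, context).get(decision, 0.0)
        if p == 0.0:
            return float("-inf")  # the tree gives this decision no probability mass
        total += math.log(p)
    return total


# Toy tree with a single lexical question; the probabilities are made up.
toy_tree = Node(
    question=lambda ctx: ctx["word"] == "saw",
    yes=Node(dist={"VERB": 0.8, "NOUN": 0.2}),
    no=Node(dist={"NOUN": 0.6, "VERB": 0.1, "DET": 0.3}),
)

steps = [({"word": "saw"}, "VERB"), ({"word": "dog"}, "NOUN")]
score = parse_log_prob(toy_tree, steps)
```

A parser in this style would compare `score` across candidate derivations and prefer the highest. Working in log space avoids numerical underflow when a derivation involves many decisions.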

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems