Three Generative, Lexicalised Models for Statistical Parsing

Michael Collins (University of Pennsylvania)

arXiv:cmp-lg/9706022 · cmp-lg · February 3, 2008 · 50 cites

Open Access

TL;DR

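This paper proposes a generative model of lexicalised context-free grammar for statistical parsing, extends it with a probabilistic treatment of subcategorisation and wh-movement, and reports 88.1% constituent precision and 87.5% constituent recall on Wall Street Journal text.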

Contribution

The paper presents a generative, lexicalised parsing model and extends it with a probabilistic treatment of subcategorisation and wh-movement, improving constituent precision and recall over Collins (1996).

Findings

01

Achieves 88.1% constituent precision and 87.5% constituent recall on Wall Street Journal text

02

Improves on Collins (1996) by an average of 2.3% in constituent precision/recall

03

Incorporates probabilistic treatment of subcategorisation and wh-movement

Abstract

In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).
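
The abstract reports constituent precision and recall. As a rough illustration of what those two numbers measure, the sketch below scores a predicted parse against a gold parse. Each constituent is represented as a (label, start, end) span. This is a simplified sketch under stated assumptions, not the evaluation procedure used in the paper: the function name `precision_recall`, the span representation, and the toy sentence are all hypothetical, and details that a real scorer would handle (punctuation, label equivalences, and so on) are ignored.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Return (precision, recall) over labelled constituents.

    gold, predicted: iterables of (label, start, end) tuples.
    Hypothetical helper for illustration only.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a predicted constituent counts as correct
    # only if an identical (label, start, end) span is in the gold parse.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    return precision, recall


# Toy example (invented spans, not data from the paper).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]

p, r = precision_recall(gold, pred)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here three of the five predicted constituents match the gold parse, so precision is 3/5, and three of the four gold constituents are recovered, so recall is 3/4.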

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems