Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

Ted Briscoe (Cambridge University) and John Carroll (Cambridge University)

arXiv:cmp-lg/9510005 · cmp-lg · February 3, 2008 · 40 cites

TL;DR

This paper presents a probabilistic LR parser for sequences of part-of-speech and punctuation labels, reports its coverage on several corpora of naturally occurring English, and measures the contribution of punctuation by parsing the same texts with and without it.

Contribution

It combines a unification-based grammar of part-of-speech and punctuation labels with a probabilistic LR parser and evaluates the approach on several corpora, with a focus on the role of punctuation.
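To make the with/without punctuation comparison concrete, here is a minimal sketch of how one tagged text can yield two label sequences for the parser. It assumes the input is a list of (word, label) pairs; the function `label_sequence`, the set `PUNCT_LABELS` and the CLAWS-style tags in the example are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical set of punctuation labels, for illustration only;
# the paper's actual tagset may differ.
PUNCT_LABELS = {",", ".", ":", ";", "?", "!", "(", ")", "-"}

# A tagged token: (word, part-of-speech or punctuation label).
Token = Tuple[str, str]


def label_sequence(tokens: List[Token], keep_punct: bool = True) -> List[str]:
    """Return the label sequence passed to the parser.

    With keep_punct=False, punctuation labels are dropped so the same text
    can be parsed without its naturally occurring punctuation.
    """
    return [label for _, label in tokens if keep_punct or label not in PUNCT_LABELS]


tokens = [("However", "RR"), (",", ","), ("the", "AT"), ("parser", "NN1"),
          ("ran", "VVD"), (".", ".")]
print(label_sequence(tokens))                    # ['RR', ',', 'AT', 'NN1', 'VVD', '.']
print(label_sequence(tokens, keep_punct=False))  # ['RR', 'AT', 'NN1', 'VVD']
```

Because only the labels differ between the two runs, any change in parse accuracy can be attributed to the presence or absence of punctuation rather than to a change in the underlying text.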

Findings

01. Punctuation significantly improves parsing accuracy.

02. The parser achieves broad coverage across different corpora.

03. Probabilistic models enhance the robustness of syntactic parsing.

Abstract

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
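The core idea of a probabilistic LR parser trained on bracketed data can be illustrated with a simplified sketch. It assumes each training tree has already been converted into the sequence of LR actions (shift or reduce) that builds it, estimates P(action | state, lookahead) by relative frequency, and scores a candidate analysis as the product of its action probabilities. The names `estimate_action_probs` and `derivation_prob`, the toy state numbers and the CLAWS-style tags are hypothetical, and the paper's model may condition and normalise its action probabilities differently, so this is an illustration of the general technique rather than the authors' method.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One LR parse step: (state, lookahead label, action), e.g. (0, "AT", "shift").
Step = Tuple[int, str, str]
Model = Dict[Tuple[int, str], Dict[str, float]]


def estimate_action_probs(derivations: List[List[Step]]) -> Model:
    """Relative-frequency estimate of P(action | state, lookahead) from
    action sequences assumed to be read off bracketed training trees."""
    counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for derivation in derivations:
        for state, lookahead, action in derivation:
            counts[(state, lookahead)][action] += 1
    model: Model = {}
    for context, actions in counts.items():
        total = sum(actions.values())
        model[context] = {a: c / total for a, c in actions.items()}
    return model


def derivation_prob(model: Model, derivation: List[Step]) -> float:
    """Probability of a derivation as the product of its action probabilities;
    unseen (state, lookahead, action) triples get probability 0."""
    p = 1.0
    for state, lookahead, action in derivation:
        p *= model.get((state, lookahead), {}).get(action, 0.0)
    return p


# Toy training data: two derivations that disagree at (state 5, lookahead VVD).
train = [
    [(0, "AT", "shift"), (3, "NN1", "shift"), (5, "VVD", "reduce NP->AT NN1")],
    [(0, "AT", "shift"), (3, "NN1", "shift"), (5, "VVD", "shift")],
]
model = estimate_action_probs(train)
print(model[(5, "VVD")])                 # {'reduce NP->AT NN1': 0.5, 'shift': 0.5}
print(derivation_prob(model, train[0]))  # 0.5
```

Competing analyses of the same label sequence can then be ranked by `derivation_prob`; a practical parser would work with log probabilities and smooth the estimates so that unseen actions do not receive zero probability.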

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems