Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels
Ted Briscoe (Cambridge University) and John Carroll (Cambridge University)

TL;DR
This paper presents a probabilistic LR parser for part-of-speech and punctuation labels, demonstrating its robustness and the impact of punctuation on syntactic analysis through extensive experiments on natural English text.
Contribution
It introduces a novel probabilistic LR parsing approach that incorporates punctuation, and evaluates its effectiveness across multiple corpora with a focus on punctuation's role.
Findings
Punctuation significantly improves parsing accuracy.
The parser achieves broad coverage across different corpora.
Probabilistic models enhance robustness of syntactic parsing.
Abstract
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
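The with/without-punctuation comparison described above can be pictured as preparing two label sequences from the same tagged sentence. The following is only a minimal sketch: the function name `to_labels`, the `PUNCT_LABELS` set, and the Penn-Treebank-style tags in the example are illustrative assumptions, not the tagset or tooling used by the authors.

```python
from typing import List, Tuple

# Hypothetical set of punctuation labels (assumption: the paper's actual
# tagset is not given in this summary).
PUNCT_LABELS = {",", ".", ";", ":", "(", ")", "-", "?", "!"}


def to_labels(tagged: List[Tuple[str, str]], keep_punct: bool = True) -> List[str]:
    """Discard the words and return only the label sequence a parser would see.

    With keep_punct=False, punctuation labels are dropped, giving the
    'without punctuation' version of the same sentence.
    """
    return [tag for _word, tag in tagged if keep_punct or tag not in PUNCT_LABELS]


# Example sentence with illustrative (assumed) part-of-speech tags.
sentence = [
    ("However", "RB"), (",", ","), ("the", "DT"),
    ("parser", "NN"), ("works", "VBZ"), (".", "."),
]

with_punct = to_labels(sentence, keep_punct=True)
without_punct = to_labels(sentence, keep_punct=False)
```

Both sequences would then be given to the same parser, so that any difference in the resulting analyses can be attributed to the punctuation marks alone.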
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
