Nightmare at test time: How punctuation prevents parsers from generalizing

Anders Søgaard; Miryam de Lhoneux; Isabelle Augenstein

arXiv:1809.00070·cs.CL·September 5, 2018

TL;DR

This paper investigates how reliance on punctuation affects dependency parser performance, showing that training without punctuation improves robustness, especially when punctuation is absent or used creatively.

Contribution

It demonstrates that neural parsers are highly sensitive to punctuation and that training without punctuation enhances their generalization to informal and corrupted text.

Findings

01

Neural parsers are more sensitive to punctuation than vintage parsers.

02

Training without punctuation improves parser performance on corrupted and informal data.

03

Punctuation reliance hinders parser generalization in real-world scenarios.

Abstract

Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however, since human language processing does not rely on punctuation to the same extent, and in informal texts, we therefore often leave out punctuation. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive to both absence of punctuation and to alternative uses; (b) neural parsers tend to be more sensitive than vintage parsers; (c) training neural parsers without punctuation outperforms all out-of-the-box parsers across all scenarios where punctuation departs from standard punctuation. Our main experiments are on synthetically corrupted data to study the effect of punctuation in isolation and avoid potential confounds, but we also…
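To make the idea of punctuation-free training data concrete, here is a minimal sketch of removing punctuation tokens from a dependency-annotated sentence and renumbering the remaining heads. It is an illustration under stated assumptions, not the authors' procedure: the `(form, upos, head)` tuple layout and the `strip_punctuation` helper are hypothetical, and the only convention borrowed from real data is the Universal Dependencies `PUNCT` tag with 1-based heads and 0 for the root.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout: (form, upos, head).
# head is a 1-based position in the sentence; 0 marks the root.
Token = Tuple[str, str, int]


def strip_punctuation(sentence: List[Token]) -> List[Token]:
    """Drop PUNCT tokens and renumber the heads of the tokens that remain."""
    # Map the old 1-based position of each kept token to its new position.
    new_index: Dict[int, int] = {}
    for old, (_, upos, _) in enumerate(sentence, start=1):
        if upos != "PUNCT":
            new_index[old] = len(new_index) + 1

    def resolve(head: int) -> int:
        # If a token's head was a removed punctuation mark, climb to that
        # mark's own head until a kept token or the root (0) is reached.
        while head != 0 and head not in new_index:
            head = sentence[head - 1][2]
        return new_index.get(head, 0)

    return [
        (form, upos, resolve(head))
        for old, (form, upos, head) in enumerate(sentence, start=1)
        if old in new_index
    ]


example = [
    ("Hello", "INTJ", 0),
    (",", "PUNCT", 3),
    ("world", "NOUN", 1),
    ("!", "PUNCT", 1),
]
stripped = strip_punctuation(example)
```

Applied to both training and test sentences, a transformation of this kind yields trees that are comparable with and without punctuation, since only the punctuation tokens and their arcs disappear.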
