ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors
Qinchan Li, Sophie Hao

TL;DR
This paper introduces ERAS, a benchmark for assessing Chinese NLP models' robustness to morphological garden path errors caused by segmentation ambiguities, revealing vulnerabilities in current models' handling of morphosyntactic context.
Contribution
The paper proposes ERAS, a novel benchmark for evaluating Chinese NLP models' sensitivity to segmentation ambiguities, and demonstrates that existing models are susceptible to the resulting garden path errors.
Findings
Word segmentation models make garden path errors on ambiguous sentences.
Sentiment analysis models with character-level tokenization also exhibit garden path errors.
Models often fail to incorporate morphosyntactic context in Chinese text segmentation.
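To make the notion of a garden path error concrete, here is a minimal sketch using a greedy forward maximum matching segmenter. This is not one of the models evaluated in the paper, and the toy vocabulary and the example sentence (a commonly cited ambiguity, meaning "study the origin of life") are assumptions chosen for illustration rather than ERAS data.

```python
def forward_max_match(sentence, vocab, max_len=3):
    """Greedy left-to-right segmentation: always take the longest vocabulary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary (not from the paper).
TOY_VOCAB = {"研究", "研究生", "生命", "起源", "的"}

# "研究生" (graduate student) is a valid word locally, but the sentence as a
# whole requires "研究" (study) + "生命" (life).
intended = ["研究", "生命", "的", "起源"]
garden_path = forward_max_match("研究生命的起源", TOY_VOCAB)

print(garden_path)
print(garden_path == intended)
```

The greedy segmenter commits to the locally plausible word "研究生" and strands "命" as a leftover character, which is the kind of failure to use sentence-level context that the findings describe.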
Abstract
In languages without orthographic word boundaries, NLP models perform word segmentation, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, ERAS, that tests a model's vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word…
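The comparison the abstract describes, between behavior on sentences with and without local segmentation ambiguities, could be sketched roughly as follows. The function names, the exact-match error criterion, the data format, and the canned predictions standing in for a model are all assumptions made for illustration; the paper's actual metric may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data format: (sentence, gold segmentation).
Example = Tuple[str, List[str]]
Segmenter = Callable[[str], List[str]]


def error_rate(segment: Segmenter, examples: Sequence[Example]) -> float:
    """Fraction of sentences whose predicted segmentation differs from gold."""
    wrong = sum(1 for sentence, gold in examples if segment(sentence) != gold)
    return wrong / len(examples)


def ambiguity_gap(segment: Segmenter,
                  ambiguous: Sequence[Example],
                  unambiguous: Sequence[Example]) -> float:
    """Error rate on locally ambiguous sentences minus that on unambiguous ones."""
    return error_rate(segment, ambiguous) - error_rate(segment, unambiguous)


# Canned outputs standing in for a real model (assumed, not measured results).
CANNED = {
    "研究生命的起源": ["研究生", "命", "的", "起源"],  # garden path error
    "探索生命的起源": ["探索", "生命", "的", "起源"],  # no local ambiguity
}

ambiguous = [("研究生命的起源", ["研究", "生命", "的", "起源"])]
unambiguous = [("探索生命的起源", ["探索", "生命", "的", "起源"])]

gap = ambiguity_gap(CANNED.__getitem__, ambiguous, unambiguous)
print(gap)
```

A positive gap indicates that errors concentrate on the locally ambiguous sentences, which is the pattern the abstract reports for word segmentation models.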
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
