Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset
Guanyi Chen, Fahime Same, Kees van Deemter

TL;DR
This paper introduces a new multilingual dataset, built on OntoNotes, for neural referential form selection (RFS). It shows that the dataset is better suited than WebNLG for assessing RFS models and reveals differences between English and Chinese RFS.
Contribution
The paper builds a dataset covering a broader, more realistic range of referring expression use in English and Chinese, evaluates and probes neural RFS models on it, and compares the two languages and the suitability of the two datasets.
Findings
The OntoNotes-based dataset is better suited than WebNLG for assessing REG/RFS models.
Chinese RFS depends more on discourse context than English RFS does.
Neural RFS models perform better on the new dataset.
Abstract
Previous work on Neural Referring Expression Generation (REG) has all used WebNLG, an English dataset that has been shown to reflect a very limited range of referring expression (RE) use. To tackle this issue, we build a dataset based on the OntoNotes corpus that contains a broader range of RE use in both English and Chinese (a language that uses zero pronouns). We build neural Referential Form Selection (RFS) models accordingly, assess them on the dataset and conduct probing experiments. The experiments suggest that, compared to WebNLG, OntoNotes is better for assessing REG/RFS models. We compare English and Chinese RFS and confirm that, in line with linguistic theories, Chinese RFS depends more on discourse context than English.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
