To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese
Yukiko Ishizuki, Tatsuki Kuribayashi, Yuichiroh Matsubayashi, Ryohei Sasano, Kentaro Inui

TL;DR
This study investigates how native Japanese speakers decide on argument omission in sentences, providing a large annotated dataset and analyzing the performance of language models in predicting these judgments.
Contribution
The paper introduces a large-scale annotated dataset of Japanese argument ellipsis judgments and evaluates how well language models predict those human judgments, clarifying which linguistic factors are involved.
Findings
Native speakers share common criteria for ellipsis decisions.
Language models show gaps compared to human judgments in specific linguistic aspects.
The annotations quantify how linguistic factors related to ellipsis judgments are distributed across the balanced corpus.
Abstract
Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages. This study addresses a question about ellipsis -- what can explain the native speakers' ellipsis decisions? -- motivated by the interest in human discourse processing and writing assistance for this choice. To this end, we first collect large-scale human annotations of whether and why a particular argument should be omitted across over 2,000 data points in the balanced corpus of Japanese, a prototypical pro-drop language. The data indicate that native speakers overall share common criteria for such judgments and further clarify their quantitative characteristics, e.g., the distribution of related linguistic factors in the balanced corpus. Furthermore, the performance of the language model-based argument ellipsis judgment model is examined, and the gap…
Taxonomy
Topics: Linguistics and Discourse Analysis · Natural Language Processing Techniques
