Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
Xiaomeng Zhu, Zhenghao Zhou, Simon Charlow, Robert Frank

TL;DR
This paper introduces an anaphora accessibility task to evaluate discourse-level understanding in humans and language models, revealing differences in how they process structural versus lexical information.
Contribution
It proposes a new discourse-level understanding assessment based on anaphora accessibility and provides an evaluation dataset inspired by dynamic semantics.
Findings
Humans and LLMs align on some anaphora accessibility tasks and diverge on others.
LLMs rely on specific lexical items during comprehension, whereas humans are sensitive to structural abstractions.
This reliance on lexical cues can explain where LLM and human judgments diverge.
Abstract
We present a hierarchy of natural language understanding abilities and argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level. We propose the task of anaphora accessibility as a diagnostic for assessing discourse understanding, and to this end, present an evaluation dataset inspired by theoretical research in dynamic semantics. We evaluate human and LLM performance on our dataset and find that LLMs and humans align on some tasks and diverge on others. Such divergence can be explained by LLMs' reliance on specific lexical items during language comprehension, in contrast to human sensitivity to structural abstractions.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: ALIGN
