Expect the Unexpected? Testing the Surprisal of Salient Entities
Jessica Lin, Amir Zeldes

TL;DR
This study investigates how the salience of discourse entities relates to surprisal. It finds that globally salient entities tend to have higher surprisal themselves, yet make surrounding content more predictable when used as prompts, with the strength of the effect varying by genre.
Contribution
It analyzes the effect of entity salience on surprisal, using 70K manually annotated mentions across 16 genres of English and a novel minimal-pair prompting method.
Findings
Globally salient entities have higher surprisal than non-salient ones.
Salient entities reduce surprisal in surrounding content when used as prompts.
The effect of salience on surprisal varies across genres and is strongest in topic-coherent texts.
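For readers unfamiliar with the metric, the following is a minimal sketch of how surprisal and a minimal-pair comparison could be computed. It assumes surprisal is defined in the standard way, as the negative log probability of a token given its context, and that a mention's surprisal is the mean over its tokens. The function names and all probabilities are hypothetical illustrations, not the authors' code or data, and the paper's actual aggregation and prompting setup may differ.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of one token in bits: -log2 p(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def mean_surprisal(token_probs: list[float]) -> float:
    """Mean per-token surprisal over a span, e.g. one entity mention."""
    if not token_probs:
        raise ValueError("span must contain at least one token")
    return sum(surprisal_bits(p) for p in token_probs) / len(token_probs)


# Hypothetical per-token probabilities (assumed values, not from the paper).
salient_mention = [0.125, 0.5]     # lower probability -> higher surprisal
nonsalient_mention = [0.5, 0.5]

# Minimal-pair idea: score the SAME continuation under two prompts that
# differ only in which entity is mentioned (assumed setup, for illustration).
probs_given_salient_prompt = [0.5, 0.25]
probs_given_nonsalient_prompt = [0.25, 0.125]

# Positive delta means the salient prompt made the continuation less surprising.
delta = (mean_surprisal(probs_given_nonsalient_prompt)
         - mean_surprisal(probs_given_salient_prompt))

print(mean_surprisal(salient_mention), mean_surprisal(nonsalient_mention), delta)
```

In practice the per-token probabilities would come from a language model rather than hand-set lists; the comparison logic stays the same.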
Abstract
Previous work examining the Uniform Information Density (UID) hypothesis has shown that while information as measured by surprisal metrics is distributed more or less evenly across documents overall, local discrepancies can arise due to functional pressures corresponding to syntactic and discourse structural constraints. However, work thus far has largely disregarded the relative salience of discourse participants. We fill this gap by studying how overall salience of entities in discourse relates to surprisal using 70K manually annotated mentions across 16 genres of English and a novel minimal-pair prompting method. Our results show that globally salient entities exhibit significantly higher surprisal than non-salient ones, even controlling for position, length, and nesting confounds. Moreover, salient entities systematically reduce surprisal for surrounding content when used as…
