Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse
Eleftheria Tsipidi, Franz Nowak, Ryan Cotterell, Ethan Wilcox, Mario Giulianelli, Alex Warstadt

TL;DR
This paper argues that the Uniform Information Density (UID) hypothesis is incomplete: it proposes that discourse structure also modulates information rate, and shows that predictors based on hierarchical discourse structure predict surprisal contours in natural language better than uniform models do.
Contribution
It introduces the Structured Context Hypothesis, linking discourse hierarchy to information rate modulation, and empirically shows hierarchical predictors outperform uniform models.
Findings
Hierarchical predictors significantly predict surprisal contours.
Deeply nested discourse structures are more predictive.
UID alone does not fully explain information rate fluctuations.
Abstract
The Uniform Information Density (UID) hypothesis posits that speakers tend to distribute information evenly across linguistic units to achieve efficient communication. Of course, information rate in texts and discourses is not perfectly uniform. While these fluctuations can be viewed as theoretically uninteresting noise on top of a uniform target, another explanation is that UID is not the only functional pressure regulating information content in a language. Speakers may also seek to maintain interest, adhere to writing conventions, and build compelling arguments. In this paper, we propose one such functional pressure, namely that speakers modulate information rate based on location within a hierarchically-structured model of discourse. We term this the Structured Context Hypothesis and test it by predicting the surprisal contours of naturally occurring discourses extracted from large…
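To make the term "surprisal contour" concrete: the surprisal of a unit is standardly defined as its negative log probability given the preceding context, and a contour is the sequence of these values across a text. The following is a minimal sketch, assuming per-token probabilities are already available from some language model. The function names and the toy probabilities are illustrative and do not come from the paper, and the variance measure is only one simple way to quantify departure from a uniform rate, not necessarily the measure the authors use.

```python
import math
from statistics import pvariance


def surprisal_contour(token_probs):
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]


def uid_deviation(contour):
    """Population variance of surprisal around its mean.

    A value of 0 means every token carries the same amount of
    information, i.e. a perfectly uniform information rate.
    """
    return pvariance(contour)


# Toy conditional probabilities (hypothetical, chosen for easy arithmetic).
toy_probs = [0.5, 0.25, 0.5, 0.125]
contour = surprisal_contour(toy_probs)

print(contour)                 # surprisal per token, in bits
print(uid_deviation(contour))  # spread of the contour around its mean
```

Under a strict reading of UID, the deviation would be treated as noise around a flat target. The hypothesis described in the abstract instead asks whether part of that variation is systematically related to a unit's position in the discourse hierarchy.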
Taxonomy
Topics: Discourse Analysis in Language Studies · Natural Language Processing Techniques
