# The Relationship Between Surprisal and Prosodic Prominence in Conversation Reflects Intelligibility‐Oriented Pressures

**Authors:** Thomas Hikaru Clark, Moshe Poliak, Tamar Regev, A. J. Haskins, Caroline Robertson, Edward Gibson

PMC · DOI: 10.1111/cogs.70134 · 2025-10-27

## TL;DR

This study examines how word unpredictability (surprisal) in conversation relates to prosodic features such as duration and pitch, and argues that speakers emphasize unpredictable words to ease listener comprehension.

## Contribution

The study provides new evidence that GPT-2 surprisal predicts prosodic prominence in naturalistic conversation (the CANDOR corpus), supporting intelligibility-oriented accounts of language production.

## Key findings

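- GPT-2 surprisal predicts longer word duration, and it predicts higher maximum pitch and wider pitch range even when controlling for duration; evidence for an effect on intensity is mixed.
- Listener backchannels (short interjections like “yeah” or “mhm”) tend to be accompanied and followed by a spike in the surprisal of the speaker's words.
- The effect of context window size on model fit diverges between maximum pitch and the other variables.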

## Abstract

Conversation is a dynamic, multimodal activity involving the exchange of complex streams of information like words, prosody, gesture, eye contact, and backchannels. Understanding how these different channels interact in naturalistic scenarios is essential for understanding the mechanisms governing human communication. Past studies suggested that the duration of words is tied to their predictability in context, but it remains unclear whether this relationship is speaker‐oriented (e.g., retrieval or production‐based) or due to listener‐oriented, intelligibility‐based pressures (i.e., emphasizing unpredictable words to ease comprehension). This study aims to examine the relationship between predictability and additional acoustic variables, to test how much intelligibility‐oriented principles impact conversation. We use the GPT‐2 large language model to assess the relationship between surprisal, a measure of unpredictability, and several variables known to play an important role in conversation—the prosodic features of duration, intensity, and pitch. We perform this analysis on the CANDOR corpus of naturalistic spoken video call conversation between strangers in English. In keeping with previous results using n‐gram predictability, we find that GPT‐2 surprisal predicts significantly higher values for duration. Moreover, surprisal also predicts maximum pitch and pitch range even when controlling for duration, with mixed evidence for an effect of surprisal on intensity. Additionally, we investigated listener backchannels (short interjections like “yeah” or “mhm”) and found that listener backchannels tended to be accompanied and followed by a spike in the surprisal of speakers' words. Finally, we demonstrate a divergence between the effect of context window size on the model fit of surprisal to maximum pitch and to other variables. 
The results provide additional support for intelligibility‐based accounts, which hold that language production is sensitive to a pressure for successful communication, not just speaker‐oriented pressures. Our data and analysis code are shared: https://osf.io/sqpn6/?view_only=e4d9e36c68b54863bc781e359463e1fe.
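
For readers unfamiliar with the measure, surprisal is the negative log probability of a word given its preceding context, so less predictable words receive higher values. The paper estimates these probabilities with GPT‐2. The minimal sketch below swaps in a toy bigram model with add-one smoothing so that it runs without external dependencies. The corpus, the function names, and the smoothing choice are all illustrative assumptions and are not taken from the authors' analysis code.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def surprisal_bits(prev, word, unigrams, bigrams):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing.

    A toy stand-in for a language model; the paper uses GPT-2 instead.
    """
    vocab_size = len(unigrams)
    prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(prob)


# Hypothetical toy corpus, for illustration only.
corpus = "i went to the store and i went to the park".split()
unigrams, bigrams = train_bigram(corpus)

# "store" has followed "the" before; "i" never has, so it is more surprising.
expected = surprisal_bits("the", "store", unigrams, bigrams)
unexpected = surprisal_bits("the", "i", unigrams, bigrams)
print(f"expected: {expected:.2f} bits, unexpected: {unexpected:.2f} bits")
```

Under an intelligibility-oriented account, the word with the higher surprisal is the one a speaker would be expected to produce with greater prosodic prominence.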

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12558650/full.md

---
Source: https://tomesphere.com/paper/PMC12558650