Is Information Density Uniform when Utterances are Grounded on Perception and Discourse?
Matteo Gay, Coleman Haley, Mario Giulianelli, Edoardo Ponti

TL;DR
This study investigates how visual grounding affects the distribution of information in language, showing that multimodal context leads to more uniform information flow across diverse languages and discourse types.
Contribution
This paper presents the first computational analysis of the Uniform Information Density (UID) hypothesis in visually grounded settings, showing that perceptual grounding increases information uniformity across multiple languages.
Findings
Grounding on perception increases information uniformity.
Visual and discourse context further reduce surprisal at discourse onsets.
The effect holds for both global and local uniformity across typologically diverse languages, relative to text-only settings.
Abstract
The Uniform Information Density (UID) hypothesis posits that speakers are subject to a communicative pressure to distribute information evenly within utterances, minimising surprisal variance. While this hypothesis has been tested empirically, prior studies are limited exclusively to text-only inputs, abstracting away from the perceptual context in which utterances are produced. In this work, we present the first computational study of UID in visually grounded settings. We estimate surprisal using multilingual vision-and-language models over image-caption data in 30 languages and visual storytelling data in 13 languages, together spanning 11 families. We find that grounding on perception consistently smooths the distribution of information, increasing both global and local uniformity across typologically diverse languages compared to text-only settings. In visual narratives, grounding…
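To make "minimising surprisal variance" and the global/local distinction concrete, here is a minimal sketch of two common UID operationalisations: variance of per-token surprisal around the utterance mean (global) and mean squared change between adjacent tokens (local). This excerpt does not give the paper's exact metric definitions, so both functions are assumptions, and the surprisal values are invented for illustration rather than taken from the paper.

```python
from statistics import fmean


def global_uid_variance(surprisals):
    """Mean squared deviation of per-token surprisal from the utterance mean.

    Lower values mean information is spread more evenly over the utterance.
    """
    mu = fmean(surprisals)
    return fmean((s - mu) ** 2 for s in surprisals)


def local_uid_variance(surprisals):
    """Mean squared difference between consecutive per-token surprisals.

    Lower values mean smoother token-to-token transitions.
    """
    if len(surprisals) < 2:
        return 0.0  # no adjacent pairs to compare
    return fmean((b - a) ** 2 for a, b in zip(surprisals, surprisals[1:]))


# Hypothetical per-token surprisals (in bits) for the same caption,
# scored once without and once with the image as context.
text_only = [6.0, 1.0, 5.0, 2.0]
grounded = [4.0, 3.0, 4.0, 3.0]

print(global_uid_variance(text_only), global_uid_variance(grounded))
print(local_uid_variance(text_only), local_uid_variance(grounded))
```

In this toy case both sequences have the same mean surprisal, but the grounded one scores lower on both measures, which is the pattern the abstract describes as increased global and local uniformity.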
Taxonomy
Topics: Categorization, perception, and language · Neurobiology of Language and Bilingualism · Language and cultural evolution
