Linguistic Structure from a Bottleneck on Sequential Information Processing
Richard Futrell, Michael Hahn

TL;DR
This paper demonstrates that systematic linguistic structures like words and phrases emerge from constraints on predictive information, linking language structure to cognitive processing limits.
Contribution
It introduces a novel connection between language structure and a statistical complexity measure, predictive information, supported by simulations and linguistic data analysis.
Findings
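In simulations, codes constrained by predictive information break messages into groups of approximately independent features that are expressed systematically and locally, resembling words and phrases.
Across crosslinguistic text corpora, human languages show lower predictive information than baselines in phonology, morphology, syntax, and lexical semantics.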
Language structures are shaped by cognitive constraints on information processing.
Abstract
Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systematicity arises in codes that are constrained by a statistical measure of complexity called predictive information, also known as excess entropy. Predictive information is the mutual information between the past and future of a stochastic process. In simulations, we find that such codes break messages into groups of approximately independent features which are expressed systematically and locally, corresponding to words and phrases. Next, drawing on crosslinguistic text corpora, we find that actual human languages are structured in a way that reduces predictive information compared to baselines at the levels of phonology, morphology, syntax, and lexical semantics. Our results establish a link between the…
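To make the definition in the abstract concrete, the following is a minimal sketch of a finite-window, plug-in estimate of predictive information for a symbol sequence. It is an illustration only and not the estimator used in the paper: the true quantity is defined over the entire past and future of the process, whereas this sketch truncates both to windows of `k` symbols and uses raw empirical frequencies, which are biased upward for short sequences. The function names `entropy_bits` and `windowed_predictive_information` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(counts, total):
    """Plug-in Shannon entropy (in bits) of an empirical count table."""
    return -sum((c / total) * log2(c / total) for c in counts.values())


def windowed_predictive_information(seq, k):
    """Estimate I(past_k; future_k) = H(past) + H(future) - H(past, future).

    Assumption: past and future are truncated to k symbols each, so this
    only approximates the mutual information between the full past and
    full future of the process.
    """
    if len(seq) < 2 * k:
        raise ValueError("sequence must contain at least 2*k symbols")
    # Slide a window of length 2k; first half is the past, second the future.
    windows = [tuple(seq[i:i + 2 * k]) for i in range(len(seq) - 2 * k + 1)]
    n = len(windows)
    past = Counter(w[:k] for w in windows)
    future = Counter(w[k:] for w in windows)
    joint = Counter(windows)
    return entropy_bits(past, n) + entropy_bits(future, n) - entropy_bits(joint, n)


if __name__ == "__main__":
    periodic = "ab" * 50   # the past symbol fully determines the next one
    constant = "a" * 100   # nothing to learn from the past
    print(windowed_predictive_information(periodic, 1))
    print(windowed_predictive_information(constant, 1))
```

With `k = 1`, the strictly alternating sequence yields an estimate close to 1 bit, because the previous symbol determines the next, while the constant sequence yields 0 bits, because the past carries no information about the future.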
Taxonomy
Topics: Natural Language Processing Techniques
