On the origin of long-range correlations in texts
Eduardo G. Altmann, Giampaolo Cristadoro, and Mirko Degli Esposti

TL;DR
This paper investigates the origin of long-range correlations in texts. It shows how correlations flow from highly structured linguistic levels down to words and letters, where they appear as bursty sequences of events tied to the semantically relevant topics of the text. The authors note that the same mechanisms can apply to other hierarchical settings.
Contribution
It explains how long-range correlations in texts emerge from hierarchical linguistic levels and argues that the mechanisms identified are general enough to apply to other hierarchical settings.
Findings
Correlations take the form of bursty sequences of events as one approaches the semantically relevant topics of a text
Hierarchical linguistic structures generate long-range correlations
The mechanisms are applicable to other hierarchical systems
Abstract
The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. One example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
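To make the two quantities in the abstract concrete, the following is a minimal sketch of how one might map a text onto a binary sequence and then measure its correlation and burstiness. It is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the autocorrelation is the plain sample covariance at a given lag, and the burstiness coefficient (sigma - mu) / (sigma + mu) of the gaps between occurrences is one commonly used measure that the paper may or may not employ.

```python
from statistics import mean, pstdev


def indicator_sequence(tokens, target):
    """Binary sequence: 1 where a token equals `target`, 0 elsewhere."""
    return [1 if tok == target else 0 for tok in tokens]


def autocorrelation(x, lag):
    """Sample autocovariance of x at `lag` (assumed estimator, not from the paper)."""
    n = len(x) - lag
    m = mean(x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n)) / n


def burstiness(x):
    """(sigma - mu) / (sigma + mu) of the gaps between successive 1s.

    Close to -1 for perfectly regular occurrences, positive for
    clustered (bursty) occurrences. Needs at least two 1s in x.
    """
    positions = [i for i, v in enumerate(x) if v]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


# Toy usage: the tokens could equally be letters instead of words.
tokens = "the whale and the sea and the whale".split()
seq = indicator_sequence(tokens, "whale")
```

On a real text, one would compute `autocorrelation(seq, lag)` over a wide range of lags and compare sequences built from different words or letters; the toy sequences here are far too short to show long-range behaviour.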
