Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words
Eduardo G. Altmann, Janet B. Pierrehumbert, and Adilson E. Motter

TL;DR
The distances between successive occurrences of a word follow a stretched exponential (Weibull) distribution rather than the exponential distribution expected from a Poisson process. How strongly a word deviates depends on its semantic type, and the paper introduces a generative model for these bursty patterns, a regularity that goes beyond Zipf's law.
Contribution
The paper demonstrates that word recurrence times deviate from a Poisson process in a bursty way and are well characterized by a stretched exponential (Weibull) distribution. It also presents a generative model for this behavior.
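For reference, a common textbook parametrization of the stretched exponential (Weibull) distribution is given here. The symbols \tau (recurrence time), \tau_0 (scale), and \beta (shape) are notation assumed for this summary, and the paper's own parametrization may differ.

```latex
f(\tau) = \frac{\beta}{\tau_0}\left(\frac{\tau}{\tau_0}\right)^{\beta-1}
          \exp\!\left[-\left(\frac{\tau}{\tau_0}\right)^{\beta}\right],
\qquad
S(\tau) = \Pr(T > \tau) = \exp\!\left[-\left(\frac{\tau}{\tau_0}\right)^{\beta}\right],
\qquad \tau > 0 .
```

Setting \beta = 1 recovers the exponential distribution expected for a Poisson process. For \beta < 1, both very short and very long recurrence times are more likely than in the exponential case, which corresponds to the bursts and lulls in the title.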
Findings
Word recurrence times follow a stretched exponential distribution.
The extent of burstiness depends strongly on a word's semantic type.
A generative model accurately captures the observed recurrence dynamics.
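The findings above can be explored with a minimal sketch like the following. It is not the authors' code: the function names `recurrence_times` and `fit_stretched_exponential` are hypothetical, and the fit is a simple least-squares line on a Weibull plot, using ln(-ln S(tau)) = beta * ln(tau) - beta * ln(tau0) with S the empirical survival function, rather than whatever estimator the paper uses. The demo data are synthetic gaps drawn from a Weibull distribution, standing in for a real corpus.

```python
import math
import random


def recurrence_times(tokens, word):
    """Distances (in tokens) between successive occurrences of `word`."""
    positions = [i for i, token in enumerate(tokens) if token == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def fit_stretched_exponential(gaps):
    """Estimate (beta, tau0) of a Weibull distribution from positive gaps.

    Fits a straight line to the points (ln tau, ln(-ln S(tau))), where S is
    the empirical survival function. The slope is beta and the intercept is
    -beta * ln(tau0). This is a rough estimator, not maximum likelihood.
    """
    n = len(gaps)
    if len(set(gaps)) < 2:
        raise ValueError("need at least two distinct recurrence times")

    xs, ys = [], []
    for rank, tau in enumerate(sorted(gaps), start=1):
        # Midpoint plotting position keeps the survival value strictly in (0, 1).
        survival = 1.0 - (rank - 0.5) / n
        xs.append(math.log(tau))
        ys.append(math.log(-math.log(survival)))

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx
    tau0 = math.exp(mean_x - mean_y / beta)
    return beta, tau0


if __name__ == "__main__":
    # Toy text: the gaps between successive occurrences of "a".
    print(recurrence_times("a b a c c a".split(), "a"))

    # Synthetic stand-in for corpus data: Weibull gaps with scale 100, shape 0.5.
    rng = random.Random(0)
    synthetic_gaps = [rng.weibullvariate(100.0, 0.5) for _ in range(5000)]
    beta, tau0 = fit_stretched_exponential(synthetic_gaps)
    print(f"estimated beta={beta:.2f}, tau0={tau0:.1f}")
```

An estimated beta near 1 would be consistent with Poisson-like usage, while values below 1 indicate burstier usage. On real text the gaps are integers with many ties, so the estimate is cruder than on the continuous synthetic data used here.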
Abstract
Background: Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well. Methodology/Principal Findings: By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type -- a…
