The dynamics of discovery and the Heaps-Zipf relationship
Célestin Zimmerlin, Thomas Louail, Manuel Moussallam, Marc Barthelemy

TL;DR
This paper explores how temporal correlations in sequences such as music listening or web browsing histories affect the growth of the number of unique elements. It challenges the assumption that this growth reflects only the frequency distribution of the elements.
Contribution
It demonstrates that temporal structure significantly influences type-token growth, revealing deviations from traditional Zipf-Heaps models in real-world data.
Findings
Temporal correlations cause deviations from Zipf-Heaps predictions.
A minimal model reproduces diverse type-token trajectories.
Type growth depends on both frequency distribution and temporal structure.
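The third finding can be illustrated with a toy sequence: keep the frequency of every type fixed and change only the order of the tokens. The sketch below uses hypothetical illustration data, not the paper's data or model. It compares an extreme "bursty" ordering, in which each type is consumed in one contiguous run, with a random shuffle of the same tokens. The names `type_token_curve`, `counts`, `bursty` and `shuffled` are introduced for this example only.

```python
import random
from collections import Counter


def type_token_curve(tokens):
    """Return V(t), the number of distinct types among the first t tokens, for t = 1..len(tokens)."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


# Hypothetical toy data: 50 types, type k occurs 100 // (k + 1) times (a Zipf-like frequency profile).
counts = {k: 100 // (k + 1) for k in range(50)}

# Extreme temporal correlation: each type is consumed in a single contiguous burst.
bursty = [k for k, c in counts.items() for _ in range(c)]

# Same multiset of tokens, temporal order destroyed by a seeded shuffle.
shuffled = bursty[:]
random.Random(0).shuffle(shuffled)

# The frequency distribution is identical in both sequences; only the ordering differs.
assert Counter(bursty) == Counter(shuffled)

half = len(bursty) // 2
print("tokens:", len(bursty))
print("types after half the tokens (bursty):  ", type_token_curve(bursty)[half - 1])
print("types after half the tokens (shuffled):", type_token_curve(shuffled)[half - 1])
```

Both orderings end with the same 50 types. Halfway through the sequence, however, the bursty ordering has seen only 5 of them, while the shuffled ordering has typically seen most of them. Real listening or browsing sequences fall somewhere between these two extremes.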
Abstract
When following a sequence, such as reading a text or tracking a user's activity, one can measure how the "dictionary" of distinct elements (types) grows with the number of observations (tokens). When this growth follows a power law, it is referred to as Heaps' law, a regularity often associated with Zipf's law and frequently used to characterize human discovery processes. While random sampling from a Zipf-like distribution can reproduce Heaps' law, this connection relies on the assumption of temporal independence, which is common in the literature but often violated in real-world systems. Here, we investigate how temporal correlations in token sequences affect the type-token curve. In human behaviors such as music listening and web browsing, domain-specific correlations in token ordering lead to systematic deviations from the Zipf-Heaps framework, effectively…
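The Zipf-to-Heaps connection mentioned in the abstract can be checked numerically under the temporal-independence assumption. The sketch below draws tokens i.i.d. from a truncated Zipf distribution P(r) ∝ r^(-alpha), builds the type-token curve V(t), and fits its log-log slope. A standard asymptotic result for such i.i.d. sampling is that V(t) grows roughly as t^(1/alpha) when alpha > 1, and nearly linearly when alpha < 1 until the vocabulary saturates. With alpha = 2, the fitted exponent should therefore come out close to 0.5. The parameter values, the fitting range and all function names are assumptions chosen for illustration, not taken from the paper.

```python
import math
import random
from itertools import accumulate


def zipf_iid_sequence(alpha, vocab_size, n_tokens, seed=0):
    """Draw n_tokens i.i.d. ranks r = 1..vocab_size with P(r) proportional to r**(-alpha)."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    # Cumulative weights let random.choices sample by bisection without renormalizing.
    cum_weights = list(accumulate(r ** -alpha for r in ranks))
    return rng.choices(ranks, cum_weights=cum_weights, k=n_tokens)


def type_token_curve(tokens):
    """Return V(t), the number of distinct types among the first t tokens."""
    seen, curve = set(), []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def heaps_exponent(curve, t_min=100, n_points=41):
    """Least-squares slope of log V(t) versus log t over log-spaced t in [t_min, len(curve)]."""
    n = len(curve)
    steps = n_points - 1
    ts = sorted({int(round(t_min * (n / t_min) ** (i / steps))) for i in range(n_points)})
    xs = [math.log(t) for t in ts]
    ys = [math.log(curve[t - 1]) for t in ts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


alpha = 2.0
tokens = zipf_iid_sequence(alpha, vocab_size=100_000, n_tokens=100_000)
curve = type_token_curve(tokens)
beta = heaps_exponent(curve)
print(f"alpha = {alpha}, fitted Heaps exponent beta = {beta:.2f}, 1/alpha = {1 / alpha:.2f}")
```

Because every token here is drawn independently, reordering the sequence would not change its statistics. The abstract's point is that real listening or browsing sequences lack this property, so their type-token curves can depart from this i.i.d. baseline even when their frequency distribution is Zipf-like.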
