A version of Herbert A. Simon's model with slowly fading memory and its connections to branching processes
Jean Bertoin

TL;DR
This paper introduces a recursive word-generation model inspired by Herbert A. Simon's work, analyzing its long-term behavior and connection to branching processes, revealing power-law and exponential decay regimes based on parameters.
Contribution
It extends Simon's model by incorporating slowly fading memory and establishes links to branching processes, providing new insights into word frequency distributions.
Findings
The proportion of distinct words occurring exactly a given number of times converges as the string length grows.
The limiting distribution has a power-law tail when the innovation probability p lies below a critical value.
The tail of the limiting distribution decays exponentially fast when p lies above that critical value.
Abstract
Construct recursively a long string of words $w_1 \ldots w_n$, such that at each step $k$, $w_{k+1}$ is a new word with a fixed probability $p \in (0,1)$, and repeats some preceding word with complementary probability $1-p$. More precisely, given a repetition occurs, $w_{k+1}$ repeats the $j$-th word with probability proportional to $j^{\alpha}$ for $j = 1, \ldots, k$. We show that the proportion of distinct words occurring exactly $\ell$ times converges as the length $n$ of the string goes to infinity to some probability mass function in the variable $\ell \geq 1$, whose tail decays as a power function when $1-p > \alpha/(1+\alpha)$, and exponentially fast when $1-p < \alpha/(1+\alpha)$.
Jean Bertoin Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. Email: [email protected]
Keywords: Yule-Simon model, preferential attachment, memory, continuous state branching process, Crump-Mode-Jagers branching process, heavy tail distributions.
AMS subject classifications: 60J85 ; 05C85.
1 Introduction
Partly inspired by the earlier work of G. U. Yule, Herbert A. Simon [16] argued in 1955 that an elementary stochastic model could explain the occurrence of power tail distributions in a variety of empirical data. In short, he introduced a simple random algorithm to produce a long string of words $w_1 w_2 \ldots w_n$. This algorithm depends on a parameter $p \in (0,1)$ which, in some sense, measures the innovation, and can be described as follows. Once the first word $w_1$ has been written, for each $k \geq 1$, with probability $p$, $w_{k+1}$ is a new word different from all the preceding, and with complementary probability $1-p$, $w_{k+1}$ is copied from a uniform sample from $w_1, \ldots, w_k$.
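Simon's algorithm as described above is short enough to sketch directly. The following is a minimal illustration, not taken from the paper: words are encoded as integers in order of first appearance, and the function name `simulate_simon` and the values of `n`, `p` and `seed` are assumptions chosen for the example.

```python
import random


def simulate_simon(n, p, seed=0):
    """Generate n word labels with Simon's original copy rule.

    Words are encoded as integers 0, 1, 2, ... in order of first appearance
    (an encoding chosen here for illustration).
    """
    rng = random.Random(seed)
    words = [0]                       # the first word is new by construction
    next_label = 1
    while len(words) < n:
        if rng.random() < p:
            words.append(next_label)  # innovation: a word never used before
            next_label += 1
        else:
            # repetition: copy the word at a uniformly chosen earlier position
            words.append(words[rng.randrange(len(words))])
    return words


words = simulate_simon(n=20000, p=0.3, seed=1)
distinct_fraction = len(set(words)) / len(words)   # expected to be close to p
```

Copying a uniformly chosen earlier position means a word that already occurs often is proportionally more likely to be repeated, which is the preferential attachment mechanism in its simplest form.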
Simon’s model can be viewed as the germ of many network growth dynamics with preferential attachment that have flourished since the turn of the millennium. We merely refer here to [2] for an application to the World Wide Web, and to the textbooks [5, 6, 18] and their bibliographies, which contain a wealth of relevant references. During the last decade or so, several works in this field have aimed at taking into account further relevant features for specific models. In particular, an important issue, e.g. in citation networks, is to incorporate aging, or decay of relevance, or fading memory phenomena. Dorogovtsev and Mendes [4, 5] first generalized the Barabási-Albert model by letting nodes lose attraction as their ages increase. Cattuto et al. [3] modified Simon’s model by assigning different weights to different words; we shall present their framework explicitly later on as it constitutes the basis of the present work. In a somewhat different direction, the linear rate birth processes related to Yule’s model or to classical preferential attachment dynamics have been replaced by nonlinear time fractional birth processes to emulate the effect of a slowly-decaying memory in [11, 13, 14], whereas [7] rather introduces fitness and age-dependency for those birth processes. Further generalizations can be found in the literature; see for instance [17, 19].
The modification of Simon’s model that Cattuto et al. considered concerns the copy mechanism, which is no longer made uniformly at random but rather depends on weights assigned to each word. In general, the weight of a word is a function of both the rank of that word and the current total length of the string. Recall that the word $w_{k+1}$ is attached at the $k$-th step of the algorithm to the string $w_1 \ldots w_k$; just as for the original model of Simon, $w_{k+1}$ is a new word different from all the preceding with a fixed probability $p$. However, with probability $1-p$, $w_{k+1}$ is a copy of one of the preceding words sampled now at random with probability proportional to its weight, meaning that $w_{k+1}$ repeats the word at some random index with distribution
[TABLE]
We refer henceforth to this modified algorithm as the weighted Simon’s model. When weights increase with the index, recent words are more likely to be repeated than old ones, hence emulating a fading memory effect; Simon’s original model is recovered plainly when all weights are equal. Obviously, assigning weights to words also changes profoundly the mathematical analysis of the algorithm. Notably, the process which counts the number of occurrences of some given word as a function of the number of steps of the algorithm is a Markov chain with explicit transition distributions for Simon’s original model, but the Markov property is lost when weights are not all equal.
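The weighted copy mechanism can be sketched for an arbitrary weight function. In the illustration below, `weight(j, k)` is a hypothetical callable returning the weight of the word of rank `j` when the string has length `k`; the function name and the two example weight functions are assumptions, and the quadratic running time is accepted for the sake of readability.

```python
import random


def simulate_weighted_simon(n, p, weight, seed=0):
    """Generate n word labels; repetitions are sampled proportionally to weight(j, k)."""
    rng = random.Random(seed)
    words = [0]                       # the first word is new by construction
    next_label = 1
    for k in range(1, n):             # k is the current length of the string
        if rng.random() < p:
            words.append(next_label)  # innovation
            next_label += 1
        else:
            # weights of the ranks 1..k at the current length k
            weights = [weight(j, k) for j in range(1, k + 1)]
            pos = rng.choices(range(k), weights=weights)[0]   # 0-based position
            words.append(words[pos])
    return words


# Equal weights: the copy is uniform, as in Simon's original model.
uniform = simulate_weighted_simon(2000, 0.3, lambda j, k: 1.0, seed=2)

# All weight on the most recent word, and no innovation: every word copies its predecessor.
copy_last = simulate_weighted_simon(50, 0.0, lambda j, k: 1.0 if j == k else 0.0)
```

When the weight depends on the rank only, the cumulative sums of the weights can be updated incrementally and each repetition sampled by bisection, which avoids recomputing the whole weight vector at every step.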
Cattuto et al. [3] considered hyperbolic weights, viz. , where is a characteristic time-scale. Quite recently, [15] dealt with short-ranged memory with , where should be thought of as the memory range. In the present work, we shall focus on weights depending as a power function on the rank of the word only,
[TABLE]
where is another parameter. Note that for any , the probability of repeating some word with at the -th step tends to as for the choice (1), whereas this probability would converge to $0$ in Cattuto’s setting. Hence, memory fades away more slowly in the present framework than in Cattuto’s.
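The comparison between the two settings can be made explicit by a short computation. The notation in this sketch is assumed rather than taken from the displays: $\alpha$ is the exponent of the power weights $j^{\alpha}$ (as in the abstract), $a \in (0,1)$ is the fraction of the string regarded as old, and the hyperbolic weights are taken in the form $1/(\tau + k - j)$ with time-scale $\tau > 0$.

```latex
% Power weights: old words keep a positive share of the total weight.
\[
  \frac{\sum_{j \le ak} j^{\alpha}}{\sum_{j \le k} j^{\alpha}}
  \;\sim\; \frac{(ak)^{1+\alpha}/(1+\alpha)}{k^{1+\alpha}/(1+\alpha)}
  \;=\; a^{1+\alpha},
  \qquad k \to \infty ,
\]
% so the (k+1)-th word repeats one of the first ak words with probability
% close to (1-p) a^{1+\alpha} > 0.

% Hyperbolic weights: the share of old words vanishes.
\[
  \frac{\sum_{j \le ak} (\tau + k - j)^{-1}}{\sum_{j \le k} (\tau + k - j)^{-1}}
  \;\sim\; \frac{\log\bigl(1/(1-a)\bigr)}{\log k}
  \;\longrightarrow\; 0 ,
  \qquad k \to \infty .
\]
```

In the first case the numerator and the denominator grow at the same polynomial rate, whereas in the second the numerator stays bounded while the denominator diverges logarithmically.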
Our purpose here is not to increase marginally the already rich variety of models that exist in this area, but rather to point out that, despite the loss of the Markov property, a fairly detailed analysis can be made thanks to connections with some rather simple branching processes that may be interesting in their own right. A priori, the appearance of branching processes in this setting should certainly not come as a surprise, as it is well-known that they play a fundamental role in preferential attachment dynamics. Nonetheless, the connection here is somewhat less direct: it only holds asymptotically, and it does not seem straightforward to describe a rigorous construction of weighted Simon’s models from branching processes.
To start with, recall two fundamental features of Simon’s original model. First, the proportion of distinct words which are repeated some fixed amount of times converges as the length of the string goes to infinity. Specifically, note that the number of different words which have been used when the string has total length is close to when . If for every , we write for the number of different words which occur exactly times in a string of total length , then
[TABLE]
where and stands for the Beta function. The right-hand side above is a probability mass function called the Yule-Simon distribution (with parameter ). Second, since as ; the tail of the Yule-Simon distribution decays as a power function with exponent .
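The Yule-Simon distribution is easy to evaluate numerically. The sketch below assumes its standard form, $\rho\,\mathrm{B}(\ell, \rho+1)$ for $\ell \geq 1$ with $\rho = 1/(1-p)$ for Simon's model; the symbol $\rho$, the function name and the numerical values are choices made for this illustration. It checks that the masses sum to one and that they are asymptotically equivalent to $\rho\,\Gamma(\rho+1)\,\ell^{-(\rho+1)}$.

```python
import math


def yule_simon_pmf(ell, rho):
    """rho * B(ell, rho + 1), computed through log-gamma for numerical stability."""
    log_beta = math.lgamma(ell) + math.lgamma(rho + 1) - math.lgamma(ell + rho + 1)
    return rho * math.exp(log_beta)


p = 0.3
rho = 1.0 / (1.0 - p)

# The masses should sum to 1 up to the (small) truncated tail.
total = sum(yule_simon_pmf(ell, rho) for ell in range(1, 200001))

# Ratio of the mass at a large ell to its power-law equivalent.
ell = 10**5
ratio = yule_simon_pmf(ell, rho) / (rho * math.gamma(rho + 1) * ell ** -(rho + 1))
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow in the Beta function for large arguments.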
We now state a simple version of our main result for the weighted Simon’s model when weights are given by (1). In short, as the length of the string tends to infinity, the proportion of different words with any given occurrence number converges towards some mass distribution. Further, $1/(1+\alpha)$ is the critical innovation parameter, in the sense that this distribution has a power tail when $p < 1/(1+\alpha)$, and an exponential tail when $p > 1/(1+\alpha)$.
Theorem 1.
Consider the number of different words which are repeated exactly $\ell$ times in a string of total length $n$ generated by the weighted Simon’s model, with weights given by (1). Then we have:
(i) for every $\ell \geq 1$,
[TABLE]
where the limit is the probability mass function of some random variable with values in $\{1, 2, \ldots\}$,
(ii) if $p < 1/(1+\alpha)$, then
[TABLE]
for some ,
(iii) if $p > 1/(1+\alpha)$, then there exists with
[TABLE]
More precisely, we shall describe the probability mass function that appears in Theorem 1 in two different ways. First, it can be seen as the law of the number of birth events in some continuous state branching process that occurred before an independent exponential random time. Equivalently, it can also be seen as the law of a general (Crump-Mode-Jagers) branching process also evaluated at an independent exponential random time. Although we have not been able to provide any explicit expressions, we shall establish the power or exponential tail behaviors stated in Theorem 1 by making use of classical martingales associated to branching processes.
The rest of this work is organized as follows. Section 2 introduces the notion of attraction of a given word as a function of the length of the string. This enables us to circumvent the lack of the Markov property of the occurrence counting alone by viewing it as a component of a Markovian pair, and then to analyze its asymptotic behavior. Section 3 describes representations of the limiting processes of Section 2 in terms of simple branching processes, and describes their behaviors as time goes to infinity using classical martingales in this setting. Finally, Theorem 1 is proved in Section 4.
2 Occurrence counting and attraction
As it has already been mentioned in the Introduction, an obvious obstacle in the analysis of frequencies of words for a weighted Simon’s model is that the process counting the number of occurrences of a word is not Markov. We shall circumvent this difficulty by considering another natural functional of the algorithm, which enjoys the Markov property and from which occurrence counting can be recovered.
To start with, we say that the -th word in a string is new if for all (for any , this event has probability ); otherwise, we say that the -th word is a repetition. When the -th word is new, we define its occurrence counting and its attraction as a function of length of the string, respectively by
[TABLE]
and
[TABLE]
Observe that the second quantity is proportional to the probability that the -th word will be repeated at the -th step (the normalization is chosen for the purpose of convenience as it should become clear later on). When the -th word is a repetition, we set .
When the -th word is new, the evolution of its occurrence counting and of its attraction bear obvious similarities, especially when : if is repeated at the -th step, the occurrence counting increases exactly by and the attraction by slightly less than , and otherwise the occurrence counting is unchanged whereas the attraction slightly decreases. In particular there is the identity
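The joint evolution of occurrence counting and attraction can be followed along a simulated string. The sketch below rests on an assumed reading of the two definitions: for power weights $j^{\alpha}$, the occurrence counting of a word at length $k$ is the number of positions $i \leq k$ holding that word, and its attraction is $k^{-\alpha}$ times the sum of $i^{\alpha}$ over those positions. This normalization is an assumption, chosen so that a repetition raises the attraction by slightly less than 1. The function name and parameter values are illustrative.

```python
import bisect
import random


def track_first_word(n, p, alpha, seed=0):
    """Simulate the model with weights j**alpha and follow the first word.

    Returns two lists indexed by k - 1: the occurrence counting N(k) and the
    attraction A(k) = k**(-alpha) * sum of i**alpha over positions i <= k
    holding the first word (assumed normalization).
    """
    rng = random.Random(seed)
    words = [0]
    cum = [1.0]                  # cum[i - 1] = 1**alpha + ... + i**alpha
    raw = 1.0                    # un-normalized attraction of the first word
    count = 1
    counts, attractions = [1], [1.0]
    next_label = 1
    for k in range(1, n):        # the string has length k; word k + 1 is appended
        if rng.random() < p:
            new = next_label     # innovation
            next_label += 1
        else:
            # position i (0-based) is chosen with probability prop. to (i + 1)**alpha
            u = rng.random() * cum[-1]
            i = bisect.bisect_right(cum, u, 0, k - 1)
            new = words[i]
        words.append(new)
        w = (k + 1) ** alpha     # weight of the position just written
        cum.append(cum[-1] + w)
        if new == 0:
            raw += w
            count += 1
        counts.append(count)
        attractions.append(raw / (k + 1) ** alpha)
    return counts, attractions


counts, attractions = track_first_word(n=5000, p=0.2, alpha=1.0, seed=3)
```

Under this normalization the attraction never exceeds the occurrence counting, since each occurrence at position $i \leq k$ contributes $(i/k)^{\alpha} \leq 1$.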
[TABLE]
Our purpose here is to check that up to a simple time-rescaling, occurrence counting and attraction converge jointly in distribution as the total length of the string goes to infinity. To describe their limits, we introduce a time-inhomogeneous Markov process with càdlàg paths in , . The law of depends on a parameter which should be thought of as a birth-time. Specifically, under , we have that for and , the (right) slope of the path at time is , and further for , jumps of size occur with intensity . In other words, its infinitesimal generator at time is given by
[TABLE]
for any and any bounded and continuously differentiable function . The slope and jump rates being bounded, the existence and uniqueness of this process are immediate. We further denote the counting process of the jumps of by
[TABLE]
We shall now check that the distribution of the pair of processes under arises as the weak limit of a time-rescaled version of the attraction and occurrence counting.
Lemma 1.
(i) Let be a sequence with . The conditional distribution of the pair of time-rescaled processes
[TABLE]
given that the -th word is new converges in the sense of Skorohod towards the law of under .
(ii) Let be a second sequence with and for all sufficiently large. The joint conditional distribution of the two pairs of time-rescaled counting processes
[TABLE]
given that the -th and the -th words are both new converges in the sense of Skorohod towards the law of two pairs of independent processes distributed as under and under , respectively.
Proof.
We shall only prove the first assertion, the argument for the second is similar but with heavier notation and details are left to scrupulous readers.
We see from the dynamics of the -weighted Simon’s model that, given that the -th word is new, the pair is an inhomogeneous Markov chain started at time from , with probability transitions given for any , and by
[TABLE]
and
[TABLE]
where
[TABLE]
Then take and . It follows that for every bounded function which is continuously differentiable in the first variable,
[TABLE]
On the other hand, the process under is an inhomogeneous Feller process, with for all , , and thanks to (2), its infinitesimal generator given by the right-hand side above. Our claim can now be derived from basic Markov chains approximation; see e.g. Theorem 19.28 in [9]. ∎
3 Connection to continuous state and Crump-Mode-Jagers branching processes
The time-inhomogeneity of the Markov process introduced in Section 2 is essentially artificial, in the sense that actually results from a time-homogeneous Markov process by a deterministic logarithmic time substitution, as we shall now explain. Consider a time homogeneous Markov process on with infinitesimal generator given for any smooth function by
[TABLE]
where is some parameter. We shall always deal with the situation when the process starts from , and shall not mention the initial state any further. The following observation is immediate by comparing (3) with (2).
Lemma 2.
Take
[TABLE]
For any fixed , the process given by
[TABLE]
has the law .
The process is a simple instance of a continuous state branching process (in short, CSBP). We now recall some basic features in this setting and refer to Section VI.6 in [1] or Chapter 12 in [10] for background. Lamperti described a construction of general CSBPs from Lévy processes with no negative jumps (see Theorem 12.2 in [10]), which we specialize to our setting.
Consider first a standard Poisson process started from , and then
[TABLE]
So is a Lévy process with no negative jumps started from , and more precisely, all its jumps have unit size. Next write
[TABLE]
for the first hitting time of $0$, with the usual convention $\inf \emptyset = \infty$. The indefinite integral yields a bijection from to ; we denote the inverse bijection by . Then the time-changed process is a version of .
Conversely, one notes the identity
[TABLE]
so that if we define
[TABLE]
then we can identify . That is, is the number of birth events (i.e. the number of jumps of ) occurring on the time-interval , agreeing that is viewed as the first birth event (i.e. jump time of ). We now derive from Lemma 2 the following:
Corollary 1.
Let be given by (4) and fix any . In the notation above, the process defined by
[TABLE]
has the same law as under .
As we shall see later on, the tail estimates in Theorem 1(ii-iii) depend crucially on the behavior of under as . Corollary 1 enables us to translate this question in terms of the asymptotic behavior of as . In this direction, it is well-known that, loosely speaking, the large time asymptotic behavior of a supercritical branching process depends on integrability properties of a remarkable martingale. In the present setting, the CSBP is supercritical if and only if , and we claim the following.
Lemma 3.
When , the process
[TABLE]
is a martingale which is bounded in for all .
Proof.
That is a martingale is well-known and indeed immediate from (3). We shall now check by induction that for every , there exists some constant such that
[TABLE]
For , (8) is actually an equality with , thanks to the martingale property of . Let us now assume that for some , (8) holds for all . Take , so that
[TABLE]
Combining Kolmogorov’s forward equation
[TABLE]
with (8), we deduce that for some , there is the inequality
[TABLE]
On the other hand, we know from Jensen’s inequality that
[TABLE]
and hence
[TABLE]
We conclude that
[TABLE]
So (8) also holds for and the proof is complete. ∎
We write for the terminal value of the martingale in Lemma 3 and for its overall supremum. We immediately deduce the following strong limit theorem for the number of birth events .
Corollary 2.
When , there is the convergence
[TABLE]
Further,
[TABLE]
Proof.
Writing
[TABLE]
yields the first claim by dominated convergence. The inequality of the second claim is obvious and the assertion that follows from Lemma 3 and Doob’s inequality. ∎
Remark 1.
In the case , is the standard Yule process, and for each , has thus the geometric distribution with parameter . Theoretically, the calculation in the proof of Lemma 3 allows us to compute inductively the integer moments of for any , and then also those of by standard methods. Because is stochastically dominated by a Yule process, the moment problem is determinate. So in theory, this approach enables one to characterize the law of ; however I have not been able to get an explicit formula.
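The geometric law of the standard Yule process mentioned in Remark 1 can be checked by simulation. The sketch below assumes the usual definition (one initial individual, each individual splitting into two at unit rate) and relies on the classical fact that the population at time $t$ is then geometric on $\{1, 2, \ldots\}$ with success probability $e^{-t}$. The function name, the choice $t = 1$ and the sample size are illustrative.

```python
import math
import random


def yule_population(t, rng):
    """Size at time t of a Yule process started from one individual.

    With n individuals alive, the next birth occurs after an exponential
    waiting time with rate n.
    """
    n, clock = 1, 0.0
    while True:
        clock += rng.expovariate(n)
        if clock > t:
            return n
        n += 1


rng = random.Random(4)
t = 1.0
samples = [yule_population(t, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)      # compare with exp(t)
empirical_p1 = samples.count(1) / len(samples)    # compare with exp(-t)
```

A geometric variable with success probability $e^{-t}$ has mean $e^{t}$ and puts mass $e^{-t}$ on the value 1, which gives two simple quantities to compare with the empirical ones.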
The final lemma of this section deals with the sub-critical case, and will provide the key to Theorem 1(iii).
Lemma 4.
When , is subcritical, a.s. and one has
[TABLE]
for .
Proof.
The first two assertions should be plain, and we just need to establish the displayed identity. To start with, we recall from Lamperti’s transformation that the integral coincides with the hitting time of $0$ by the Lévy process .
The Laplace exponent of the Lévy process is for , meaning that . This function reaches its minimum at , with . It follows from a well-known formula for the Laplace transform of first-passage times (see e.g. Theorem 3.12 in [10]) that
[TABLE]
which establishes our claim. ∎
We now conclude this section by pointing at a connection with another branching process, namely of Crump-Mode-Jagers (in short, C-M-J) type. Recall that a C-M-J branching process is a model for the evolution of a population in continuous time, where individuals beget children according to independent copies of a point process on and locations of atoms are interpreted as birth of children. In other words, a C-M-J process can be constructed from a branching random walk on with reproduction law given by the distribution of , by viewing any atom, say located at , at any generation of the branching random walk, as an individual born at time in the C-M-J process. Then individuals alive at time in the C-M-J process correspond to locations of atoms in the branching random walk.
Now take for a Poisson point measure with intensity , and assign to each individual in the C-M-J process a size which decays exponentially with time. Specifically the size of an individual at age is , so that each individual in the C-M-J process begets children at rate given precisely by its current size. It follows that, if we further assume that the C-M-J process starts from a single ancestor at time $0$, then the process describing the sum of sizes of individuals as a function of time, is a version of the CSBP . As a consequence, the process has the same law as the process of the number of individuals alive of the population in this C-M-J process. In this framework, the so-called Malthus exponent is readily identified with , and the martingale is known as the intrinsic martingale. We refer to [12, 8] which provide in particular a criterion for uniform integrability of intrinsic martingales and demonstrate the importance of their roles in limit processes for C-M-J branching processes. Notably, the first part of Corollary 2 can also be derived from Theorem 5.4 in [12].
4 Proof of Theorem 1
We have now all the ingredients needed for the proof of Theorem 1. To start with, recall that for every ,
[TABLE]
denotes the number of different words which have been repeated exactly times when the string reaches the length . Recall also from Section 2 the definition of the law for any , and of the counting process of jumps . We now define a probability mass function on by
[TABLE]
With this notation at hand, we easily deduce Theorem 1(i) from Lemma 1 by first and second moment calculations.
Proof of Theorem 1(i).
Let be a uniform random variable on independent of the weighted Simon’s model. For every , write , so that is uniformly distributed on and . Recall that for every , the probability that the -th word is new equals and that if the -th word is a repetition. We immediately deduce from Lemma 1(i) that
[TABLE]
Next, let be a second uniform random variable on , independent of and the weighted Simon’s model, and set . So is uniformly distributed on and independent of , and writing
[TABLE]
we deduce similarly from Lemma 1(ii) that
[TABLE]
which establishes our claim. ∎
The proof of Theorem 1(ii-iii) relies on the CSBP introduced in Section 3; we henceforth let the parameter there be given by (4).
Proof of Theorem 1(ii).
Thanks to Corollary 1, we have
[TABLE]
where is a random variable independent of and with the standard exponential distribution.
Next define for every the variables
[TABLE]
and observe from Corollary 2 that for any , we have by dominated convergence
[TABLE]
We first treat the lower-bound. Fix . By the definition of , the event holds whenever
[TABLE]
and hence a fortiori whenever
[TABLE]
This yields the lower bound
[TABLE]
On the one hand, recalling that is given by (4) and that is exponentially distributed and independent of , we get
[TABLE]
On the other hand, since for any (see Corollary 2), we have by Markov’s inequality
[TABLE]
Putting the pieces together, we have shown that for any ,
[TABLE]
Letting and using (10), we conclude that
[TABLE]
We now turn our attention to the upper-bound and fix . By the definition of , there is the inclusion of events
[TABLE]
with
[TABLE]
Using the identity and then observing that
[TABLE]
we see that implies
[TABLE]
This yields the upper-bound
[TABLE]
By the same argument as for the lower-bound, we arrive at
[TABLE]
Letting and using (10), we conclude that
[TABLE]
∎
Remark 2.
The proof above of Theorem 1(ii) identifies the constant there as
[TABLE]
In this direction, recall that for , one has , so is simply the Yule process and follows the standard exponential distribution. This yields , which agrees with what was known for Simon’s original model.
Finally, the last part of Theorem 1 follows readily from Lemma 4.
Proof of Theorem 1(iii).
We take and write for the limit as of the counting process . We have from the definition of and the inequality that
[TABLE]
Since , we know from Corollary 1 and Lemma 4 that the law of under is the same as that of . Lemma 4 further shows that for any , and we conclude that
[TABLE]
∎
To conclude this work, we observe that the characterization (9) of the limiting probability mass function in terms of a branching process extends the fact that the Yule-Simon distribution can be realized as the law of a standard Yule process evaluated at an independent exponentially distributed random time with parameter $1/(1-p)$. It is also interesting to stress that similar variables, associated however to different branching processes, also arise in [7], even though the framework of [7] and the present one seem to be rather different (in particular, the degree distribution plays a key role there, whereas we are here rather concerned with the distribution of the total population size).
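The realization of the Yule-Simon distribution through a Yule process at an exponential time can be verified numerically. The sketch below assumes the classical law $P(Y_t = \ell) = e^{-t}(1 - e^{-t})^{\ell-1}$ of a standard Yule process and an independent exponential time with rate $\rho$ (with $\rho = 1/(1-p)$ for Simon's model); the symbol $\rho$, the function names and the midpoint quadrature are choices made for this illustration.

```python
import math


def yule_at_exponential_time_pmf(ell, rho, steps=200000):
    """P(Y_T = ell) for a standard Yule process Y and T exponential with rate rho.

    The substitution u = exp(-t) turns the mixture over t into the integral
    over (0, 1) of rho * u**rho * (1 - u)**(ell - 1), evaluated here with the
    midpoint rule.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += rho * u ** rho * (1.0 - u) ** (ell - 1)
    return total * h


def yule_simon_pmf(ell, rho):
    """rho * B(ell, rho + 1), the Yule-Simon mass at ell."""
    log_beta = math.lgamma(ell) + math.lgamma(rho + 1) - math.lgamma(ell + rho + 1)
    return rho * math.exp(log_beta)


rho = 1.0 / (1.0 - 0.3)
gaps = [abs(yule_at_exponential_time_pmf(ell, rho) - yule_simon_pmf(ell, rho))
        for ell in (1, 2, 5)]
```

The agreement reflects the identity $\int_0^1 \rho\, u^{\rho} (1-u)^{\ell-1}\, du = \rho\, \mathrm{B}(\rho+1, \ell)$, which is the Beta integral in disguise.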
References
[1] Krishna B. Athreya and Peter E. Ney. Branching processes. Springer-Verlag, New York-Heidelberg, 1972. Die Grundlehren der mathematischen Wissenschaften, Band 196.
[2] Stefan Bornholdt and Holger Ebel. World Wide Web scaling exponent from Simon’s 1955 model. Phys. Rev. E, 64:035104, Aug 2001.
[3] C. Cattuto, V. Loreto, and V. D. P. Servedio. A Yule-Simon process with memory. EPL (Europhysics Letters), 76(2):208, 2006.
[4] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks with aging of sites. Phys. Rev. E, 62:1842–1845, Aug 2000.
[5] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, 2003.
[6] Rick Durrett. Random graph dynamics, volume 20 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2007.
[7] Alessandro Garavaglia, Remco van der Hofstad, and Gerhard Woeginger. The dynamics of power laws: fitness and aging in preferential attachment trees. J. Stat. Phys., 168(6):1137–1179, 2017.
[8] Peter Jagers. General branching processes as Markov fields. Stochastic Process. Appl., 32(2):183–212, 1989.
