Variations of the elephant random walk
Allan Gut, Ulrich Stadtmüller

TL;DR
This paper investigates variations of the elephant random walk, focusing on models where the walker has limited memory, and extends the analysis to more general step sizes, providing new insights into memory-dependent random walks.
Contribution
It introduces and analyzes new models of elephant random walks with restricted memory and generalized step sizes, extending previous work on memory-dependent stochastic processes.
Findings
Characterization of walk behavior with limited memory
Extension to generalized step size models
Analogs of classical results for restricted-memory walks
Abstract
In the classical simple random walk the steps are independent, viz., the walker has no memory. In contrast, in the elephant random walk, which was introduced by Schütz and Trimper in 2004, the walker remembers the whole past, and the next step always depends on the whole path so far. Our main aim is to prove analogous results when the elephant has only a restricted memory, for example remembering only the most remote step(s), the most recent step(s) or both. We also extend the models to cover more general step sizes.
Allan Gut
Uppsala University
Ulrich Stadtmüller
Ulm University
AMS 2000 subject classifications. Primary 60F05, 60G50; Secondary 60F15, 60J10.
Keywords and phrases. Elephant random walk, law of large numbers, asymptotic (non)normality, method of moments, difference equation, Markov chain.
Abbreviated title. Elephant random walk.
1 Introduction
In the classical simple random walk the steps are equal to plus or minus one and independent. In this model the walker has no memory. This random walk is, in particular, Markovian. Motivated by applications, although interesting in its own right, is the case when the walker has some memory. The extreme case is, of course, when the walker has a complete memory, that is, when "the next step" depends on the whole process so far. This so-called elephant random walk (ERW) was introduced by Schütz and Trimper [11] in 2004, the name being inspired by the fact that elephants have a very long memory.
The first, more substantial, paper on elephant random walks is, to the best of our knowledge, Bercu’s paper [1], in which he proves a number of limit theorems. A main point is that there is a kind of phase transition at the point , which divides the problem into the diffusive regime, , the critical regime, , and the superdiffusive regime, , with somewhat different asymptotics.
A main device in his paper is the use of martingale theory due to the observation that a multiplicative scaling of the random walk constitutes a martingale.
Our main interest is the situation in which the elephant has only a limited memory, either that he or she remembers only some distant past, only a recent past or a mixture of both. No paper on exact results and proofs seems to exist, only simulations, in which case a given fraction of the distant/recent past is remembered; [12, 2, 10].
The first task in this direction is to consider the cases when the walker only remembers the first (two) step(s) or only the most recent (previous) step. In particular the latter case involves rather cumbersome computations and we therefore invite the reader(s) to try to push our results further. It should also be mentioned that the paper by Engländer and Volkov [4] is devoted to this latter case, although from a different angle, in that the next step is not generated by flipping a coin, rather by turning it over or not. They have a somewhat different focus, in particular, they consider the case with different -values in each step.
The cases with limited memory behave very differently mathematically in that some of the walks are still non-Markovian while others are Markovian, but there is no convenient martingale around. Moreover, there are no phase transitions in these cases.
A second point concerns the extension of (some of) Bercu’s results in [1] from the simple random walk to general sums, that is, to the case when the steps have an arbitrary distribution on the integers.
We begin by defining the various models in Section 2. After some preliminaries in Section 3, some results for general ERW:s are obtained in Section 4. Sections 5 and 6 are devoted to the distant past and Sections 8 and 9 to the recent past, respectively. These "one-sided" memories are then followed up in Sections 10 and 11 where we consider mixed cases, that is, when the memory contains some early steps as well as some recent ones, after which we briefly discuss some different models. We close with a section containing some questions and remarks. For easier reading we collect some of the somewhat more lengthy (elementary and tedious) computations in the Appendix.
2 Background
The elephant random walk is defined as a simple random walk, where, however, the steps are not i.i.d. but dependent as follows. The first step equals 1 with probability and is equal to with probability . After steps, that is, at position , one defines
[TABLE]
where has a uniform distribution on the integers . With this means (formula (2.2) of [1]) that
[TABLE]
after which, setting , it turns out that is a martingale.
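The construction above can be sketched in code. The following Python sketch assumes the standard rule of Schütz and Trimper, since the displayed formulas are not legible here: the first step is +1 with probability p, and every later step copies a uniformly chosen earlier step with probability p and reverses it otherwise. The function names are hypothetical. The second function computes the conditional mean of the next step directly from that rule, which under this assumption equals (2p-1) times the current average position.

```python
import random

def simulate_erw(n, p, rng):
    """Simulate n steps of a full-memory elephant random walk.

    Assumed rule: X_1 = +1 with probability p, else -1; for k >= 1,
    X_{k+1} = X_K with probability p and -X_K otherwise, where K is
    uniform on {1, ..., k}.
    """
    steps = [1 if rng.random() < p else -1]
    for k in range(1, n):
        remembered = steps[rng.randrange(k)]      # uniformly chosen past step
        sign = 1 if rng.random() < p else -1      # keep or reverse it
        steps.append(sign * remembered)
    return steps

def conditional_mean_next(steps, p):
    """E(X_{n+1} | whole past), computed from the assumed rule."""
    frac_plus = sum(1 for x in steps if x == 1) / len(steps)
    prob_plus = p * frac_plus + (1 - p) * (1 - frac_plus)
    return 2 * prob_plus - 1

if __name__ == "__main__":
    path = simulate_erw(1000, 0.7, random.Random(0))
    print(sum(path) / len(path), conditional_mean_next(path, 0.7))
```

The generator is passed in explicitly so that runs are reproducible and the same helper can be reused with different seeds.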
Our main aim is to extend these results to the case when the elephant has only a restricted memory, for example remembering only the most remote step(s) and/or the most recent one(s). A result in Section 4 allows us to conclude that our results remain true (suitably modified) also when the steps of the ERW:s follow a general distribution on the integers.
First in line is the case when the elephant only remembers the distant past, the most extreme one being when the memory is reduced to the first step only, viz.,
[TABLE]
Somewhat more sophisticated is when the memory covers the first two steps, in which case
[TABLE]
where .
Technically more complicated is the case when the elephant only remembers the recent past. Here we focus on the very recent past, which is the last step, that is,
[TABLE]
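The three restricted memories just introduced differ only in the set of past steps from which the elephant may choose. A minimal Python sketch, assuming the same keep-or-reverse rule with probability p as in the full-memory walk and assuming that the first step is +1 with probability p; all names are hypothetical:

```python
import random

def memory_first(k):
    """Indices (0-based) remembered after k steps: the first step only."""
    return [0]

def memory_first_two(k):
    """The first two steps (only the first one while k == 1)."""
    return [0, 1][:k]

def memory_last(k):
    """The most recent step only."""
    return [k - 1]

def simulate_restricted(n, p, memory, rng, first_step=None):
    """Simulate n steps of a walk whose elephant remembers memory(k) after k steps."""
    if first_step is None:
        first_step = 1 if rng.random() < p else -1
    steps = [first_step]
    for k in range(1, n):
        indices = memory(k)
        remembered = steps[indices[rng.randrange(len(indices))]]
        sign = 1 if rng.random() < p else -1      # keep with probability p
        steps.append(sign * remembered)
    return steps

if __name__ == "__main__":
    rng = random.Random(1)
    for memory in (memory_first, memory_first_two, memory_last):
        print(memory.__name__, sum(simulate_restricted(1000, 0.8, memory, rng)))
```

Passing the memory as a function keeps the step rule in one place, so further variants (for example, first and last step together) only need a new index function.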
We begin, throughout, by assuming that , and generalize our findings in this setting (for simplicity) to the case . We denote our partial sums with , , when the first variable(s) is/are fixed and let be reserved for the case when they are random.
In order to move from to we also need to discuss the behavior of the walk when the initial value equals . However, in that case the evolution of the walk is the same except for the fact that the trend of the walk is reversed, viz., the corresponding walk equals the mirrored image in the time axis. This implies that the mean after steps equals , but the dynamics being the same, implies that the variance remains the same ( for a random variable ). In fact, the second moments of the walk remain the same. The same goes for higher order moments—odd moments equal the negative of those when , and even moments remain the same. In Sections 6 and 11 we depart from the assumption that and are fixed, and then the additional case has to be taken care of.
Finally, in order to avoid special effects we assume throughout that ; note that corresponds to for all , and the case of alternating summands.
3 Some auxiliary material
For easier access to the arguments below we briefly present some auxiliary results from probability and analysis.
3.1 Disturbed limit distributions
The following (well-known) result (which is a special case of the Cramér–Slutsky theorem) will be used in order to go from a special case to a more general one.
Proposition 3.1
Let be a sequence of random variables and suppose that is independent of all of them. If as , then as .
Proof. Using characteristic functions and bounded convergence we have, as ,
[TABLE]
An application of the continuity theorem for characteristic functions finishes the proof.
3.2 Conditioning in case of a restricted memory
Let be an ERW, let denote the -algebras generated by the memory of the elephant and let stand for the full memory. We already know from (2.1) above that . Our aim is to establish analogs when the elephant has a restricted memory, that is, analogs for .
Toward this end, let , where is the memory of the elephant. Then,
[TABLE]
that is, the conditional mean equals the average of the possible choices multiplied by the expected value of the sign; in analogy with (2.1).
If, for example, the elephant only remembers the most recent step, and means that he/she only remembers the first and the most recent steps; these are two cases that will be considered in the sequel. In these cases (3.1) states that
[TABLE]
respectively.
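The verbal description of the conditional mean (the average of the possible choices multiplied by the expected value of the sign) can be checked numerically. The sketch below is illustrative only; it assumes that the sign is +1 with probability p and -1 otherwise, so that its expected value is 2p-1, and it uses 1-based memory indices. The function names are hypothetical.

```python
def conditional_mean(path, memory_indices, p):
    """(2p - 1) times the average of the remembered steps."""
    remembered = [path[i - 1] for i in memory_indices]
    return (2 * p - 1) * sum(remembered) / len(remembered)

def conditional_mean_by_enumeration(path, memory_indices, p):
    """The same quantity, summed over every (chosen step, sign) outcome."""
    total = 0.0
    pick_prob = 1 / len(memory_indices)
    for i in memory_indices:
        x = path[i - 1]
        total += pick_prob * (p * x + (1 - p) * (-x))
    return total

if __name__ == "__main__":
    path = [1, -1, -1, 1, -1]
    n = len(path)
    print(conditional_mean(path, [n], 0.8))        # most recent step only
    print(conditional_mean(path, [1, n], 0.8))     # first and most recent steps
```

For the path in the usage example the first and the most recent steps have opposite signs, so the second conditional mean vanishes.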
The next problem is when we condition on steps that are not contained in the memory. In words, if they do not, the elephant does not remember them, and, hence, cannot choose among them in a following step. More precisely, mathematically is defined as those steps in the past on which the elephant bases the next step. Technically, let be an arbitrary set of indices, such that . Then
[TABLE]
It follows, in particular, that
[TABLE]
and that
[TABLE]
This, and the fact that , will be useful several times for the computation of second moments as follows:
[TABLE]
3.3 Difference equations
In the proofs we use several difference equations. For convenience and easy reference we summarize here some well-known facts about linear difference equations that are used on and off.
Proposition 3.2
(i) Consider the first order equation
[TABLE]
Then
[TABLE]
If, in addition, and with , then
[TABLE]
(ii) If, in particular, and , then
[TABLE]
(iii) Next is the homogeneous, second order equation
[TABLE]
Then, with , provided ,
[TABLE]
(iv) As for the inhomogeneous second order equation
[TABLE]
we have , where is some solution of the inhomogeneous equation, where the constants in are chosen properly. If and we may choose .
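Facts of this kind are easy to sanity-check numerically. The sketch below treats two constant-coefficient instances only, written in a hypothetical notation that need not match the one of the proposition: the first order equation x_{n+1} = a x_n + b with a different from 1, and the homogeneous second order equation x_{n+1} = a x_n + b x_{n-1} with distinct characteristic roots. In both cases plain iteration is compared with the textbook closed form.

```python
import cmath

def iterate_first_order(a, b, x0, n):
    """x_n for x_{k+1} = a x_k + b, by direct iteration."""
    x = x0
    for _ in range(n):
        x = a * x + b
    return x

def closed_first_order(a, b, x0, n):
    """Closed form a^n x_0 + b (1 - a^n) / (1 - a), valid for a != 1."""
    return a**n * x0 + b * (1 - a**n) / (1 - a)

def iterate_second_order(a, b, x0, x1, n):
    """x_n for x_{k+1} = a x_k + b x_{k-1}, by direct iteration."""
    if n == 0:
        return x0
    prev, cur = x0, x1
    for _ in range(n - 1):
        prev, cur = cur, a * cur + b * prev
    return cur

def closed_second_order(a, b, x0, x1, n):
    """c1 * lam1^n + c2 * lam2^n, with lam1 != lam2 the roots of lam^2 = a lam + b."""
    disc = cmath.sqrt(a * a + 4 * b)          # may be complex
    lam1, lam2 = (a + disc) / 2, (a - disc) / 2
    c1 = (x1 - lam2 * x0) / (lam1 - lam2)
    c2 = x0 - c1
    return (c1 * lam1**n + c2 * lam2**n).real

if __name__ == "__main__":
    print(iterate_first_order(0.6, 0.2, 1.0, 25), closed_first_order(0.6, 0.2, 1.0, 25))
    print(iterate_second_order(0.2, -0.5, 1.0, 0.0, 20), closed_second_order(0.2, -0.5, 1.0, 0.0, 20))
```

Using `cmath.sqrt` lets the same formula cover complex conjugate roots, for which the solution is still real.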
3.4 Some notation
We use the standard to denote the distribution function with a jump of height one at . Constants and are always numerical constants that may change between appearances.
4 General elephant random walks
Let be an ERW, and suppose that is a random variable with distribution function that is independent of the walk. If as for some normalizing positive sequence as , and some random variable , it follows from Proposition 3.1 that as . An immediate consequence of this fact is that we can extend Theorems 3.1, 3.4 and (the first half of) Theorem 3.7 of [1] to cover more general step sizes. Namely, consider the ERW for which , and let the random variables , , be constructed as in Section 2 with this special as starting point. Furthermore, let be a random variable, independent of , and consider , , and, hence, .
The following theorem (which reduces to the cited results of [1] if is a coin-tossing random variable), holds for :
Theorem 4.1
(a) For , ;
(b) For , ;
(c) For , ,
where is a non-degenerate random variable.
As for convergence in distribution, we have to distinguish more carefully between the three cases.
Theorem 4.2
For we obtain
[TABLE]
Moreover, if , then and as .
Proof. As and are independent we find that
[TABLE]
by dominated convergence which yields the desired result.
The second part is immediate, since is independent of everything else.
Remark 4.1
If with probabilities and , respectively, the limit distributions of and are the same, and we rediscover Theorem 3.3 of [1].
Remark 4.2
For the critical case, one similarly obtains, using [1], Theorem 3.6, that
[TABLE]
The supercritical case, , has a different evolution and no analogous result exists.
5 Remembering only the distant past 1
This turns out to be the easiest case, since convenient independence is inherent. We begin by assuming that the elephant only remembers the first step, i.e., that , and begin with the assumption that (recall that partial sums are denoted with the letter ). Then,
[TABLE]
and, hence,
[TABLE]
Moreover, applying (3.5) to we find that
[TABLE]
which, after telescoping, yields
[TABLE]
and, finally,
[TABLE]
A completely analogous calculation for characteristic functions, with an eye on (3.2), shows that
[TABLE]
after which telescoping tells us that
[TABLE]
after which a standard computation shows that
[TABLE]
Next we note that the computations so far prove that the increments are uncorrelated, suggesting independence … In fact, recalling that we have assumed that , we have, setting if , for , and 0 otherwise,
[TABLE]
for and different.
This means that the ERW coincides with the classical simple random walk, except for the fact that the first step is always equal to one. This is—after some thinking—rather obvious, because (in the language of [4]) we might interpret as a coin that we either flip or not before each new step. Hence we obtain:
Proposition 5.1
The strong law of large numbers, the central limit theorem, and the law of the iterated logarithm all hold for .
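The claim that, with the first step fixed at one, the walk is a classical simple random walk from the second step on can be verified exactly for small n by enumerating all paths. The sketch assumes the rule that each later step equals the first step with probability p and its negative otherwise. Under that assumption the mean is 1 + (n-1)(2p-1) and the variance is 4(n-1)p(1-p); these two closed forms are the standard ones for independent steps and are not copied from the displays above. The function names are hypothetical.

```python
from itertools import product

def path_probability(path, p):
    """P(X_2, ..., X_n | X_1) when only the first step is remembered."""
    prob = 1.0
    for x in path[1:]:
        prob *= p if x == path[0] else 1 - p
    return prob

def exact_moments(n, p):
    """Exact mean and variance of T_n = X_1 + ... + X_n given X_1 = 1."""
    mean = second = 0.0
    for tail in product((1, -1), repeat=n - 1):
        path = (1,) + tail
        w = path_probability(path, p)
        t = sum(path)
        mean += w * t
        second += w * t * t
    return mean, second - mean**2

if __name__ == "__main__":
    n, p = 8, 0.7
    print(exact_moments(n, p))
    print(1 + (n - 1) * (2 * p - 1), 4 * (n - 1) * p * (1 - p))
```

Enumeration is exponential in n and is meant only as a check on small cases.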
If, on the other hand, the first step is equal to , then, by symmetry, , the variance remains the same (recall the discussion toward the end of Section 2), and, again, by symmetry, normalized by is asymptotically normal and the SLLN and the LIL do hold again.
As a consequence, assuming that is a coin-tossing random variable, we are (asymptotically) confronted with two normal distributions, one for each of the two portions of the probability space. In fact, if we imagine the situation that is close to zero or one it is rather apparent how the very first step determines along which branch it will evolve.
One also notes, more formally, that , so that , implying that (and not of order ) as . Thus, an ordinary CLT is not valid, with the exception that if the two "branches" determined by the first step collapse (asymptotically) into one, and we are ultimately faced with a classical simple symmetric random walk.
Hence, the following limit result is always available in the general case:
Theorem 5.1
Let . Then,
(a) ;
(b) .
Proof of (a). If we know from above that , and that . This tells us that, as . The conclusion follows.
Proof of (b). Immediate.
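The two-branch behavior behind the theorem can be illustrated by simulation. The sketch assumes that the first step is +1 with probability p and that every later step repeats the first step with probability p; under these assumptions the ratio of position to time settles near +(2p-1) on one branch and near -(2p-1) on the other. The function name is hypothetical.

```python
import random

def sample_ratio(n, p, rng):
    """Return (X_1, S_n / n) for a walk that remembers only its first step."""
    first = 1 if rng.random() < p else -1
    total = first
    for _ in range(n - 1):
        total += first if rng.random() < p else -first
    return first, total / n

if __name__ == "__main__":
    rng = random.Random(2024)
    samples = [sample_ratio(400, 0.8, rng) for _ in range(500)]
    plus = [r for first, r in samples if first == 1]
    minus = [r for first, r in samples if first == -1]
    print(len(plus) / len(samples), sum(plus) / len(plus), sum(minus) / len(minus))
```

In a histogram of the simulated ratios the mass splits into two clusters, one for each value of the first step.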
Remark 5.1
(i) An interpretation of the limit in (a) is that the random walk at hand, on average, behaves, asymptotically, like a coin-tossing random variable with values at the points .
(ii) An alternative way of phrasing the conclusion of the theorem is that
[TABLE]
However, if we use a random normalization we obtain the following result:
Theorem 5.2
Let . Then,
(a) ;
(b) ;
(c)
Proof of (a). We use the fact that
[TABLE]
together with Theorem 4.2 and its Remark 4.1.
Alternatively, one may condition on the value of . This procedure will be exploited in the proof of Theorem 6.2 in the next section.
Proof of (b) and (c). Define and . After renormalization the original probability measure will be a probability measure on . Based on this measure on we obtain an SLLN and an LIL for . Similarly on . Combining them yields the desired result.
Remark 5.2
The strong law can also be formulated with a random RHS:
[TABLE]
Remark 5.3
If is a general random variable with distribution having no mass at zero, then
[TABLE]
A special case is, once again, :
Corollary 5.1
If , then
[TABLE]
6 Remembering only the distant past 2
In this section we begin by assuming that the elephant only remembers the first two steps, so that , and suppose that . Then, for ,
[TABLE]
for all , and, hence,
[TABLE]
(since ). Extending the idea from the previous section that the walk evolves as an ordinary simple random walk beginning at the third step, a natural guess is that
[TABLE]
To see this we first observe that , that is, the formula is correct for . Assuming it is correct for we have
[TABLE]
since by (3.4) and the fact that ,
[TABLE]
Next, by modifying the computations involving the characteristic function from Section 5, we obtain
[TABLE]
By continuing as before one obtains, after proper centering, a limiting normal distribution for these initial -values. Similarly for the other ones, but only for each "branch" separately. One can also ascertain that the variance is not linear if we assume random beginnings, except, as before, when and the three main limit theorems (SLLN, CLT, LIL) hold (as in Corollary 5.1).
The following analog of Theorem 5.1 holds in the general case (as one might expect):
Theorem 6.1
Let . Then
(a) ;
(b) $E(S_n/n)\to p(2p-1)^2$ and $\mathrm{Var}(S_n/n)\to p(1-p)(2p-1)^2\big(4p^2+1\big)$.
Proof of (a). If we know from above that , and that . Moreover, whenever and have different signs. The variance remains the same (with ). This, together with the fact that , , and helps us to finish the proof of the first part. Part (b) follows.
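The limits in part (b) can be checked against a direct computation. The sketch assumes that the first step is +1 with probability p, that the second step equals the first with probability p, and that every later step copies a uniformly chosen one of the first two steps with probability p and reverses it otherwise. Under these assumptions the walk has three branches (both initial steps +1, both -1, or mixed), and the limit of S_n/n is a three-point distribution whose mean and variance should reproduce the expressions in (b). The function names are hypothetical.

```python
from itertools import product

def prob_plus(path, p):
    """P(next step = +1 | path) when only the first two steps are remembered."""
    remembered = path[:2]
    frac_plus = remembered.count(1) / len(remembered)
    return p * frac_plus + (1 - p) * (1 - frac_plus)

def exact_step_means(n, p):
    """Exact E(X_k), k = 1..n, by enumerating all paths."""
    means = [0.0] * n
    for path in product((1, -1), repeat=n):
        w = p if path[0] == 1 else 1 - p
        for k in range(1, n):
            q = prob_plus(path[:k], p)
            w *= q if path[k] == 1 else 1 - q
        for k, x in enumerate(path):
            means[k] += w * x
    return means

def branch_limit_moments(p):
    """Mean and variance of the assumed three-point limit of S_n / n."""
    rho = 2 * p - 1
    atoms = [(rho, p * p), (-rho, p * (1 - p)), (0.0, 1 - p)]
    mean = sum(x * w for x, w in atoms)
    var = sum(x * x * w for x, w in atoms) - mean**2
    return mean, var

if __name__ == "__main__":
    p = 0.8
    print(exact_step_means(7, p))
    print(branch_limit_moments(p), p * (2 * p - 1) ** 2)
```

From the third step on, every step has the same mean, which is why the average position converges to that common value.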
Remark 6.1
(i) In analogy with Remark 5.1 we have the interpretation that the elephant, asymptotically, on average, performs a random walk on the points and [math].
(ii) Mimicking Remark 5.1 we may rewrite the conclusion of the theorem as
[TABLE]
Once again random normalization produces further limit results:
Theorem 6.2
Let . Then,
(a) ;
(b) ;
(c)
where
Proof of (a). Conditioning on the value of we obtain
[TABLE]
Parts (b) and (c) follow along the lines of the proof of Theorem 5.1.
7 The distant past; higher order
If one remembers the first random variables for some , the following obvious extension of the above results emerges.
Theorem 7.1
For , $r_k=\big((m-k)p+k(1-p)\big)/m$, and , where and ,
[TABLE]
and
[TABLE]
Proof. As before we write
[TABLE]
and observe that in each conditional case we have a random walk with the appropriate success probabilities, i.e., for the success probability is $r_k=\big((m-k)p+k(1-p)\big)/m$, and, hence, the expectation is .
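A small numerical illustration of the success probabilities may help. The sketch reads k as the number of the first m steps that are equal to -1, which is an assumption about the conditioning event; with that reading each later step is +1 with probability r_k and has mean 2 r_k - 1 = (2p-1)(m-2k)/m. The function names are hypothetical.

```python
def success_probability(m, k, p):
    """r_k = ((m - k) p + k (1 - p)) / m."""
    return ((m - k) * p + k * (1 - p)) / m

def branch_drift(m, k, p):
    """Mean 2 r_k - 1 of each later step in the branch indexed by k."""
    return 2 * success_probability(m, k, p) - 1

if __name__ == "__main__":
    m, p = 4, 0.7
    for k in range(m + 1):
        print(k, success_probability(m, k, p), branch_drift(m, k, p))
```

For m = 1 this gives the two branches of Section 5, and for m = 2 and k = 1 it gives a fair coin.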
Remark 7.1
(i) The probabilities at the jumps are relatively complicated and therefore not expressed in detail, but and .
(ii) A more detailed analysis shows that the probability mass of the limit distribution of concentrates near zero as increases.
(iii) One easily checks that the variance for each "branch" equals , which, in turn, is dominated by , which, consequently, tells us that the analog of Theorems 5.1 and 6.1 holds.
(iv) Once again, the case is special as described in the two previous sections.
8 Remembering only the recent past 1
This situation is much more complex, because, even though one remembers only recent steps, the path depends on the whole history so far (some remarks on that will be given in Subsection 12.3). Once again we begin by assuming that the elephant only remembers the very last step, which means that . This setting is reminiscent of [4], where one turns over a coin instead of tossing it. The main focus there, however, is on different -values at each step and, e.g., how this may affect phase transitions and behavior at critical values.
We begin, as always, by assuming that . Then, , and
[TABLE]
for all . By iterating this it follows that for, ,
[TABLE]
and
[TABLE]
For the second moment we have, by (3.5) and (3.2),
[TABLE]
For the middle term we obtain by (3.2),
[TABLE]
which in turn, after iteration, yields
[TABLE]
Now we can calculate the second moment:
[TABLE]
By telescoping we obtain
[TABLE]
which implies the following formula for the asymptotic variance:
[TABLE]
Noticing that and that , a glance at (8.1) and (8.2) shows that and that as , suggesting the following result:
Theorem 8.1
For ,
[TABLE]
Our next task is to apply the method of moments in order to prove that this is indeed true. We thus wish to prove that
[TABLE]
This amounts to lengthy computations of various higher order mixed moments. The reason for this is that higher order moments of can be expressed as linear combinations of lower order moments of and with the aid of the binomial theorem.
Convergence of mean and variance has already been established above. For higher order moments we use induction.
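For reference, the target values in the method of moments are the moments of the standard normal distribution: zero for odd orders and the double factorial (k-1)!! for even order k. A minimal Python helper (the name is hypothetical):

```python
def standard_normal_moment(k):
    """E(N^k) for N ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for j in range(1, k, 2):      # 1 * 3 * ... * (k - 1)
        result *= j
    return result

if __name__ == "__main__":
    print([standard_normal_moment(k) for k in range(9)])
```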
Throughout the following, , with or without an index, are numerical constants which may differ from line to line and are quantities of smaller order than the leading term.
Lemma 8.1
For we have, as ,
[TABLE]
where denotes individual remainder terms.
The proof of the lemma amounts to extending the above computations for mean and variance to higher order variants and is deferred to the Appendix, Subsection A.1.
Proof of Theorem 8.1. As already mentioned, the proof exploits the method of moments. For the lemma tells us that
[TABLE]
which verifies (8.3). For we recall from the end of Section 2 that even moments remain the same and that odd moments are the same except for a change of sign, which yields the same conclusion. The limit result for then follows as in Theorem 4.2.
Remark 8.1
The sequence is a stationary recurrent Markov chain with finite state space which, hence, is uniformly ergodic. The asymptotic normality of therefore also follows from a CLT for Markov chains, see, e.g., Corollary 5 of [8] (cf. also [7], Theorem 19.1).
The Markov property also provides a strong law.
Theorem 8.2
We have
[TABLE]
Proof. The stationary distribution of the ergodic Markov chain is , which has expectation zero. An application of Theorem 6.1 in [3] yields the conclusion.
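The Markov structure makes exact computations cheap. The sketch below assumes that each step repeats the previous one with probability p and reverses it otherwise, starts from a first step equal to one, and propagates the joint distribution of (partial sum, last step). For a symmetric two-state chain with lag-one correlation rho = 2p-1, the standard asymptotic variance per step is (1+rho)/(1-rho) = p/(1-p); this value is stated here as an assumption and is not copied from the display of Theorem 8.1. The function name is hypothetical.

```python
def exact_variance_recent_step(n, p):
    """Exact Var(T_n) given X_1 = 1 when each step repeats the previous one w.p. p."""
    dist = {(1, 1): 1.0}                  # (partial sum, last step) -> probability
    for _ in range(n - 1):
        new = {}
        for (s, last), w in dist.items():
            for step, q in ((last, p), (-last, 1 - p)):
                key = (s + step, step)
                new[key] = new.get(key, 0.0) + w * q
        dist = new
    mean = sum(w * s for (s, _), w in dist.items())
    second = sum(w * s * s for (s, _), w in dist.items())
    return second - mean**2

if __name__ == "__main__":
    n, p = 400, 0.7
    print(exact_variance_recent_step(n, p) / n, p / (1 - p))
```

The state space grows only linearly in n, so this dynamic program remains feasible where full path enumeration is not.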
9 Remembering only the recent past 2
In this section we assume that the elephant remembers the two most recent steps, that is, at time the next step is based on the steps and . The computations are as before, although more elaborate. We have, as always, , ,
[TABLE]
and, for ,
[TABLE]
Computing the moments one obtains the following result. For the proof we refer to the Appendix, Subsection A.2.
Lemma 9.1
As ,
[TABLE]
The expectation of tends to zero geometrically fast.
Remark 9.1
For the process reduces, as usual, to a simple symmetric random walk.
For the following limit theorems we lean on the Markov property (and invite the reader to try the moment method).
Theorem 9.1
We have
[TABLE]
Proof. The sequence now forms a Markov chain of order two. Theorem 6.1 in [3] yields the strong law, and the results in [5], Section 3, or [6], combined with Corollary 5 of [8], yield the asymptotic normality with the moments as calculated above.
Remark 9.2
If we suppose that the elephant remembers a fixed but finite number, say, of the most recent steps, the sequence of steps forms a Markov chain of order , and we obtain, by (basically) the same arguments as above, that will be asymptotically normal (a Markov chain of order can be considered as a -dimensional Markov chain, and one may use, e.g., [6]).
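The passage from a chain of order two to a two-dimensional chain can be made concrete. The sketch assumes that the next step copies a uniformly chosen one of the two most recent steps with probability p and reverses it otherwise, builds the transition probabilities of the pair chain, and approximates its stationary distribution by power iteration. The function names are hypothetical.

```python
def transition_matrix(p):
    """Transitions of the pair chain (X_{n-1}, X_n) -> (X_n, X_{n+1})."""
    states = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    matrix = {}
    for a, b in states:
        frac_plus = ((a == 1) + (b == 1)) / 2
        q = p * frac_plus + (1 - p) * (1 - frac_plus)    # P(X_{n+1} = +1)
        matrix[(a, b)] = {(b, 1): q, (b, -1): 1 - q}
    return states, matrix

def stationary_distribution(p, iterations=2000):
    """Approximate stationary law of the pair chain by power iteration."""
    states, matrix = transition_matrix(p)
    dist = {s: 0.25 for s in states}
    for _ in range(iterations):
        new = {s: 0.0 for s in states}
        for s, w in dist.items():
            for t, q in matrix[s].items():
                new[t] += w * q
        dist = new
    return dist

if __name__ == "__main__":
    pi = stationary_distribution(0.75)
    print(pi, sum(w * s[1] for s, w in pi.items()))
```

By the symmetry of the rule under a global sign change, the stationary mean of a single step is zero.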
10 Remembering the distant as well as the recent past 1
Next we consider the case when the elephant has a clear memory of the early steps as well as the very recent ones.
One can think of a(n old) person who remembers the early childhood and events from the last few days but nothing in between. The most elementary case is , for all . Following the approach of earlier variants we begin by assuming that . Then, for ,
[TABLE]
and
[TABLE]
Exploiting Proposition 3.2(i) we obtain, for ,
[TABLE]
and, hence, that
[TABLE]
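The first order difference equation for the mean can be iterated directly. The sketch assumes that, with the first step fixed at one, the next step copies either the first or the most recent step (each chosen with probability 1/2) with probability p and reverses it otherwise, so that the conditional mean of the next step is (2p-1)(1 + X_n)/2. Under this assumption the step means converge to the fixed point (2p-1)/(3-2p); this closed form is derived from the assumed rule and is not copied from the displays above. The function names are hypothetical.

```python
def step_means(n, p):
    """E(X_k), k = 1..n, given X_1 = 1, for memory {first step, latest step}."""
    rho = 2 * p - 1
    means = [1.0]
    for _ in range(n - 1):
        # E(X_{k+1}) = rho * (1 + E(X_k)) / 2, since X_1 = 1
        means.append(rho * (1 + means[-1]) / 2)
    return means

def limiting_step_mean(p):
    """Fixed point of m = rho * (1 + m) / 2 with rho = 2p - 1."""
    rho = 2 * p - 1
    return rho / (2 - rho)

if __name__ == "__main__":
    p = 0.8
    print(step_means(10, p))
    print(limiting_step_mean(p), 2 * p - 1)
```

The limiting value is smaller in absolute value than 2p-1, the corresponding value when only the first step is remembered.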
Next we note that , and, by (3.5), that, for ,
[TABLE]
In order to establish a difference equation for the second moment we first have to compute the mixed moment. For the computational details we refer to Appendix A.3 and obtain (formula (A.7)),
[TABLE]
Joining the expressions for the first two moments, finally, tells us that the variance is linear in :
[TABLE]
where
[TABLE]
Given the expressions for mean and variance, a weak law is immediate:
[TABLE]
In analogy with our earlier results this suggests that is asymptotically normal. That this is, indeed, the case follows from the fact that is, once again, a uniformly ergodic Markov chain, since the only random piece from the past is the previous step. We may thus apply Corollary 5 of [8] (cf. also [7], Theorem 19.1) to conclude that is asymptotically normal with mean zero and variance , with as defined in (10.5), which, in view of (10.2), establishes that
[TABLE]
An appeal to the discussion at the end of Section 2 now allows us to conclude that
[TABLE]
which tells us that
[TABLE]
Furthermore, in analogy to Theorem 5.1, we arrive at the following asymptotic distributional behavior of :
Theorem 10.1
We have
[TABLE]
Moreover, for all , since for all .
Remark 10.1
Comparing this with Theorem 5.1 we see that the jump points are closer together here. This can be explained by the fact that the current random variables are less dependent than those in Section 5.
Finally, by combining (10.7) with the obvious analog for the case , asymptotic normality follows with a random centering:
Theorem 10.2
We have
[TABLE]
Proof. We first note that it follows from the discussion following (10.7) that the CLT there remains true when with a replacing the in the numerator. We thus may argue as in the proof of Theorem 5.1, via the fact that
[TABLE]
Alternatively, condition on the value of and proceed as in the proof of Theorem 6.2.
11 Remembering the recent as well as the distant past 2
In this section we extend the previous one in that we assume that , for all . Following the approach of earlier variants we begin by assuming that . Then , and, for ,
[TABLE]
Exploiting Proposition 3.2(i) yields
[TABLE]
and, hence,
[TABLE]
As for second moments, , , , and, generally, that,
[TABLE]
Concerning the mixed moments and other details we refer to Appendix A.4, from which we obtain
[TABLE]
The variance, finally, turns out as
[TABLE]
Following the path of the previous section we now immediately obtain a weak law:
[TABLE]
It remains to consider the general case with arbitrary and . There is a slight change here from the previous section. Namely, we first have the case when , for which the arguments from the previous section carry over without change, that is, the mean equals and the second moment equals . However, now we also have a mixed case which behaves somewhat differently.
Namely, consider the case when the first two summands are not equal; , . Then,
[TABLE]
and, for ,
[TABLE]
from which we conclude that, for ,
[TABLE]
For the calculation of the second moment we refer again to Appendix A.4 and find that
[TABLE]
where the last equality is due to the fact that . The weak law now runs slightly differently, in that
[TABLE]
We note in passing that the mean is linear in and that the second moment is of order when the first two summands are equal, whereas the mean is zero and the second moment is linear in when they are not. However, the variance is linear in in all cases.
As for central limit theorems, the main arguments are the same as in Section 10, in that
[TABLE]
for the cases and , respectively, and
[TABLE]
when the first two summands are unequal.
Switching to moments of , using , and for the three cases, we obtain,
[TABLE]
Collecting the various pieces tells us that
[TABLE]
Finally, by modifying our earlier results of this kind, one ends up as follows:
Theorem 11.1
We have
[TABLE]
Moreover, for all , since for all .
We finally wish to combine the asymptotic normality for the three different beginnings of the process in order to arrive at a limit theorem for the -process. This works (in theory) the same way as in Section 10. However, there is a problem with the variance. Namely, in Theorem 10.2 both cases had the same variance, whereas here the variance, when and are equal, is not the same as when they are different. Nevertheless, here is the result.
Theorem 11.2
We have
[TABLE]
with as given in (11.4).
Proof. The conclusion follows by conditioning on the value of , and proceeding as in the proof of Theorem 6.2.
12 Miscellanea
We close by mentioning some further specific models and by describing some problems and challenges for further research.
12.1 More on restricted memories
(i) The next logical step would be to check the case when . By modifying the computations in Appendix A.2, setting and , we find that
[TABLE]
after which Proposition 3.2(iv), and a glance at the computations in Appendix A.2, tell us that
[TABLE]
where , with , , defined in Appendix A.2, and it follows that
[TABLE]
If , then, with and , one similarly obtains that
[TABLE]
In fact, theoretically it is possible to obtain results of the above kind for any fixed number of early and/or late memory steps.
(ii) A more subtle case is when the number of memory steps depends on , such as or .
(iii) Another model is when the elephant remembers everything except the first step, more generally, the elephant remembers all but the first steps for some . Set , , and , and let . Then
[TABLE]
where . With
[TABLE]
one can, as in [1], show that is a martingale. From the same paper it follows, provided that , that
[TABLE]
which implies that
[TABLE]
The quantity depends on the construction used for the steps .
Other cases one might think of are when the memory covers everything except
- the last steps;
- the first steps and the last steps;
- the first steps and or the last steps for some ;
- the first steps and or the last steps for some ;
- the first steps and or the last steps for some ;
- and so on, aiming at more general (final) results.
12.2 Phase transition
The results of Bercu [1] show that for the full memory one has a phase transition at . There is no such thing in our results. An obvious, as well as interesting, question would be to find the breaking point. There exist some papers on this topic using simulations, see e.g., [12, 2, 10] and further papers cited therein, but we are not aware of any theoretical results concerning this matter.
12.3 Remembering the first vs. the last step
There is a fundamental difference in behavior in these extreme cases; it is not just a matter of recalling some earlier step. Namely, it is a matter of comparing
[TABLE]
with
[TABLE]
In order to see the difference more clearly, let us imagine that is close to one.
In the first case every new step most likely equals the first one, that is, a typical path will then consist of an overwhelming amount of :s interspersed with an occasional . In the second case every new step most likely equals the most recent one, that is, a typical path will consist of an overwhelming amount of :s followed by an overwhelming amount of :s, followed by ..., that is, alternating long stretches of the same kind.
Moreover, since, in the first case, every new step is a function of just the first one, the independence structure does not come as a surprise, whereas in the second case the next step depends on the previous one, which in turn depends on its previous one, etcetera, which implies that the next step, in fact, depends on the whole path so far.
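The contrast between the two path shapes is easy to see in a simulation. The sketch below fixes the first step at +1 and assumes that each new step equals the remembered step (the first one or the most recent one, respectively) with probability p and its negative otherwise. It compares the share of +1 steps and the average length of the stretches of -1 steps. All names are hypothetical.

```python
import random

def walk_first_step_memory(n, p, rng):
    """Every new step repeats the first step with probability p."""
    steps = [1]
    for _ in range(n - 1):
        steps.append(steps[0] if rng.random() < p else -steps[0])
    return steps

def walk_last_step_memory(n, p, rng):
    """Every new step repeats the most recent step with probability p."""
    steps = [1]
    for _ in range(n - 1):
        steps.append(steps[-1] if rng.random() < p else -steps[-1])
    return steps

def mean_run_length(steps, value):
    """Average length of maximal runs of `value` in the path (0.0 if none)."""
    runs, current = [], 0
    for x in steps:
        if x == value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

if __name__ == "__main__":
    rng = random.Random(7)
    first = walk_first_step_memory(5000, 0.95, rng)
    last = walk_last_step_memory(5000, 0.95, rng)
    for name, path in (("first", first), ("last", last)):
        print(name, path.count(1) / len(path), mean_run_length(path, -1))
```

With p close to one, the first walk shows isolated -1 steps among long stretches of +1, whereas the second shows long stretches of both signs.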
12.4 Final Remarks
(i) We have seen that the more the elephant remembers, the more cumbersome the computations become. However, once again, in theory it would be possible to compute higher order moments and thus, e.g., use the moment method to prove limit theorems.
(ii) By using the device from Section 4 one can extend all limit theorems for ERW:s to the case with general steps.
Appendix A
In this appendix we collect more technical calculations.
A.1 Proof of Lemma 8.1
Recall that even powers of are always equal to 1, and, moreover, that = if is odd. One consequence of this is the following fact that will be used repeatedly below:
[TABLE]
As mentioned in connection with the statement of Theorem 8.1 we use induction. We thus assume that we know that the moments up to order converge properly, in particular we may choose so large that for , which, by symmetry, implies that , for and , and for , for some small (recall that are the moments of the standard normal distribution as given in (8.3)).
Proof of (8.4).
[TABLE]
Taking expectations on either side yields
[TABLE]
Exploiting (A.1) yields a bound for the remainder:
[TABLE]
By iterating (A.2) we then obtain that
[TABLE]
Proof of (8.5).
[TABLE]
Taking expectations on either side yields
[TABLE]
The estimation of the remainder is the same as above. The remaining part of the proof follows the exact same lines and is therefore omitted.
Having estimates for the mixed moments, we are now able to attack the "pure" moments. These estimates will be used without explicit mention. Moreover, the estimates for the remainders are, again, the same.
Proof of (8.6).
[TABLE]
Taking expectations on either side yields
[TABLE]
by the induction hypothesis. Summing up the differences leads to the desired result.
Proof of (8.7).
[TABLE]
Taking expectations on either side yields
[TABLE]
by the induction hypothesis. Summing up, finally, leads to the desired result with .
A.2 Proof of Lemma 9.1
Set . Then,
[TABLE]
For we have
[TABLE]
With (note that ) this difference equation, with the two starting values and , has, for , the solution
[TABLE]
For we have , but the solution is still real. Next,
[TABLE]
The second moment is more tedious. We begin with
[TABLE]
and obtain
[TABLE]
As for the mixed moments,
[TABLE]
By the usual trick we find
[TABLE]
With we find that
[TABLE]
from which it follows that , the stationary solution.
Next,
[TABLE]
We finally arrive, recalling (A.4), at
[TABLE]
and thus, via telescoping, at
[TABLE]
A.3 Calculation of second moments in Section 10
We first note that , and, by (3.5), that, for ,
[TABLE]
At this point we have to pause and compute the mixed moments: We first note that , and that
[TABLE]
so that
[TABLE]
For we exploit (3.4), (10.2), and the fact that , to obtain
[TABLE]
Another application of Proposition 3.2(i) then tells us that
[TABLE]
Hence, using (A.5), we obtain
[TABLE]
after which we, via telescoping, obtain that
[TABLE]
A.4 Calculation of second moments in Section 11
The point of departure in this case is (11.3), viz.,
[TABLE]
For the mixed moments we use (3.4):
[TABLE]
We thus find, using (11.2), that for ,
[TABLE]
Invoking Proposition 3.2(i) then tells us that
[TABLE]
which, inserted into (A.8), yields
[TABLE]
and, after summation,
[TABLE]
We, finally, turn our attention to the second moment for the case when , where, again, the mixed moment is first in focus. Now, , , and . For we follow the usual pattern. Due to the fact that the mean is zero, an application of (3.4) now yields
[TABLE]
which, together with Proposition 3.2(i), tells us that
[TABLE]
Moving on to second moments, , , and . For we insert our findings from (A.11) into (A.8):
[TABLE]
so that, via telescoping,
[TABLE]
Acknowledgement
The work on this paper was initiated during U.S.’s visit to Uppsala in May 2018. U.S. is grateful for the kind hospitality, and we both wish to thank Kungliga Vetenskapssamhället i Uppsala for financial support.
[1] Bercu, B. (2018). A martingale approach for the elephant random walk. J. Phys. A: Math. Theor. 81, 015201.
[2] Cressoni, J.C., da Silva, M.A.A., and Viswanathan, G.M. (2007). Amnestically induced persistence in random walks. J. Phys. A: Math. Theor. 46, 505002.
[3] Doob, J.L. (1953). Stochastic Processes. J. Wiley & Sons, New York.
[4] Engländer, J., and Volkov, S. (2018). Turning a coin over instead of tossing it. J. Theor. Probab. 31, 1097–1118.
[5] Herkenrath, U. (2003). A new approach to Markov processes of order 2. Ann. Univ. Craiova, Math. Comp. Sci. Ser. 30, 106–115.
[6] Herkenrath, U., Iosifescu, M., and Rudolph, A. (2003). A note on invariance principles for iterated random functions. J. Appl. Probab. 40, 834–837.
[7] Ibragimov, I.A., and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters–Noordhoff, Groningen.
[8] Jones, G.L. (2004). On the Markov chain central limit theorem. Prob. Surveys 1, 299–320.
