Variations of the elephant random walk

Allan Gut; Ulrich Stadtmüller

arXiv:1812.01915·math.PR·June 22, 2023·J. Appl. Probab.

TL;DR

This paper investigates variations of the elephant random walk, focusing on models where the walker has limited memory, and extends the analysis to more general step sizes, providing new insights into memory-dependent random walks.

Contribution

It introduces and analyzes new models of elephant random walks with restricted memory and generalized step sizes, extending previous work on memory-dependent stochastic processes.

Findings

1. Characterization of walk behavior with limited memory

2. Extension to generalized step size models

3. Analogs of classical results for restricted-memory walks

Abstract

In the classical simple random walk the steps are independent, viz., the walker has no memory. In contrast, in the elephant random walk, which was introduced by Schütz and Trimper in 2004, the walker remembers the whole past, and the next step always depends on the whole path so far. Our main aim is to prove analogous results when the elephant has only a restricted memory, for example remembering only the most remote step(s), the most recent step(s) or both. We also extend the models to cover more general step sizes.

Equations

$X_{n+1}=\begin{cases}+X_{K},&\mbox{with probability }p\in[0,1],\\ -X_{K},&\mbox{with probability }1-p,\end{cases}$

$E(X_{n+1}\mid{\cal G}_{n})=(2p-1)\cdot\frac{S_{n}}{n},$

$X_{n+1}=\begin{cases}+X_{1},&\mbox{with probability }p\in[0,1],\\ -X_{1},&\mbox{with probability }1-p.\end{cases}$

$X_{n+1}=\begin{cases}+X_{K},&\mbox{with probability }p\in[0,1],\\ -X_{K},&\mbox{with probability }1-p,\end{cases}$

$X_{n+1}=\begin{cases}+X_{n},&\mbox{with probability }p\in[0,1],\\ -X_{n},&\mbox{with probability }1-p.\end{cases}$

$\varphi_{U_{n}V}(t)$

$E(X_{n+1}\mid{\cal F}_{n})=p\cdot\sum_{i\in I_{n}}\frac{1}{|I_{n}|}X_{i}+(1-p)\cdot\sum_{i\in I_{n}}\frac{1}{|I_{n}|}(-X_{i})=(2p-1)\cdot\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|},$

$E(X_{n+1}\mid{\cal F}_{n})=(2p-1)X_{n}\quad\mbox{and}\quad E(X_{n+1}\mid{\cal F}_{n})=(2p-1)\frac{X_{1}+X_{n}}{2},$

$E(X_{n+1}\mid\sigma\{I_{n}\cup I\})=E(X_{n+1}\mid{\cal F}_{n})=(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|}.$

$E(X_{n+1}\mid{\cal G}_{n})=(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|},$

$E(S_{n}X_{n+1}\mid{\cal G}_{n})=S_{n}E\big(X_{n+1}\mid{\cal G}_{n}\big)=S_{n}(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|}.$

$E(S_{n+1}^{2})$

$x_{n+1}=ax_{n}+b_{n}\quad\mbox{for}\quad n\geq 1,\quad\mbox{with }x_{1}^{*}\mbox{ given}.$

$x_{n}=a^{n-1}x_{1}^{*}+\sum_{\nu=0}^{n-2}a^{\nu}b_{n-1-\nu}.$

$x_{n}=\frac{b_{n-1}}{1-a}-\frac{\gamma ab_{n-1}}{n(1-a)^{2}}\big(1+o(1)\big)\quad\mbox{as}\quad n\to\infty.$

$x_{n}=\frac{b}{1-a}+a^{n-1}\big(x_{1}^{*}-\frac{b}{1-a}\big)=\frac{b}{1-a}\big(1+o(1)\big)\quad\mbox{as}\quad n\to\infty.$

$x_{n+1}=ax_{n}+bx_{n-1}\quad\mbox{for}\quad n\geq 2,\quad\mbox{with }x_{1}^{*},x_{2}^{*}\mbox{ given}.$

$x_{n}^{h}=c_{1}\lambda_{1}^{n}+c_{2}\lambda_{2}^{n}\quad\mbox{with }c_{1},c_{2}\mbox{ chosen such that }x_{i}^{h}=x_{i}^{*}\mbox{ for }i=1,2.$

$x_{n+1}=ax_{n}+bx_{n-1}+d_{n}\quad\mbox{for}\quad n\geq 2,\quad\mbox{with }x_{1}^{*},x_{2}^{*}\mbox{ given},$

$\frac{S_{n}}{\sqrt{n}}\stackrel{d}{\to}\int_{\mathbb{R}\setminus\{0\}}{\cal N}_{0,\frac{1}{3-4p}}(\cdot/|t|)\,dF_{R}(t)+P(R=0)\cdot\delta_{[0,\infty)}(\cdot)\quad\mbox{as}\quad n\to\infty.$

$P\Big(\frac{S_{n}}{\sqrt{n}}\leq x\Big)$

$\frac{S_{n}}{\sqrt{n\log n}}\stackrel{d}{\to}\int_{\mathbb{R}\setminus\{0\}}{\cal N}_{0,1}(\cdot/|t|)\,dF_{R}(t)+P(R=0)\cdot\delta_{[0,\infty)}(\cdot)\quad\mbox{as}\quad n\to\infty.$

$E(X_{n+1}\mid{\cal F}_{n})=E(X_{n+1}\mid X_{1})=(2p-1)\cdot 1=E(X_{n+1})\quad\mbox{for all}\quad n\geq 1,$

$E(T_{n+1}\mid{\cal F}_{n})=1+n(2p-1)=E(T_{n+1}).$

$E(T_{n+1}^{2})$

$E(T_{n+1}^{2})=1+(2p-1)^{2}n(n+1)+\big(4(2p-1)(1-p)+1\big)n,$

$\mathrm{Var\,}(T_{n+1})=\big(1-(2p-1)^{2}\big)n=4p(1-p)n.$

$\varphi_{T_{n+1}}(t)$

$\varphi_{T_{n+1}}(t)=\big(pe^{it}+(1-p)e^{-it}\big)^{n}\cdot e^{it},$

$\varphi_{(T_{n}-n(2p-1))/\sqrt{n}}(t)\to\exp\{-2p(1-p)t^{2}\}\quad\mbox{as}\quad n\to\infty.$

Full text

Variations of the elephant random walk

Allan Gut

Uppsala University

Ulrich Stadtmüller

Ulm University

Abstract

In the classical simple random walk the steps are independent, viz., the walker has no memory. In contrast, in the elephant random walk, which was introduced by Schütz and Trimper [11] in 2004, the walker remembers the whole past, and the next step always depends on the whole path so far. Our main aim is to prove analogous results when the elephant has only a restricted memory, for example remembering only the most remote step(s), the most recent step(s) or both. We also extend the models to cover more general step sizes.

AMS 2000 subject classifications. Primary 60F05, 60G50; Secondary 60F15, 60J10.

Keywords and phrases. Elephant random walk, law of large numbers, asymptotic (non)normality, method of moments, difference equation, Markov chain.

Abbreviated title. Elephant random walk.

1 Introduction

In the classical simple random walk the steps are equal to plus or minus one and independent: $P(X=1)=1-P(X=-1)=p$ ($0<p<1$). In this model the walker has no memory, and the random walk is, in particular, Markovian. Motivated by applications, although interesting in its own right, is the case when the walker has some memory. The extreme case is, of course, when the walker has a complete memory, that is, when "the next step" depends on the whole process so far. This so-called elephant random walk (ERW) was introduced by Schütz and Trimper [11] in 2004, the name being inspired by the fact that elephants have a very long memory.

The first, more substantial, paper on elephant random walks is, to the best of our knowledge, Bercu’s paper [1], in which he proves a number of limit theorems. A main point is that there is a kind of phase transition at the point $p=P(X=1)=3/4$ , which divides the problem into the diffusive regime, $0\leq p<3/4$ , the critical regime, $p=3/4$ , and the superdiffusive regime, $3/4<p\leq 1$ , with somewhat different asymptotics.

A main device in his paper is the use of martingale theory due to the observation that a multiplicative scaling of the random walk constitutes a martingale.

Our main interest is the situation in which the elephant has only a limited memory: he or she remembers only some distant past, only a recent past, or a mixture of both. No paper with exact results and proofs seems to exist, only simulations in which a given fraction of the distant/recent past is remembered; see [12, 2, 10].

The first task in this direction is to consider the cases when the walker only remembers the first (two) step(s) or only the most recent (previous) step. In particular the latter case involves rather cumbersome computations and we therefore invite the reader(s) to try to push our results further. It should also be mentioned that the paper by Engländer and Volkov [4] is devoted to this latter case, although from a different angle, in that the next step is not generated by flipping a coin, rather by turning it over or not. They have a somewhat different focus, in particular, they consider the case with different $p$ -values in each step.

The cases with limited memory behave very differently mathematically: some of the walks are still non-Markovian while others are Markovian, but there is no convenient martingale around. Moreover, there are no phase transitions in these cases.

A second point concerns the extension of (some of) Bercu’s results in [1] from the simple random walk to general sums, that is, to the case when the steps have an arbitrary distribution on the integers.

We begin by defining the various models in Section 2. After some preliminaries in Section 3, some results for general ERWs are obtained in Section 4. Sections 5 and 6 are devoted to the distant past and Sections 8 and 9 to the recent past, respectively. These "one-sided" memories are then followed up in Sections 10 and 11, where we consider mixed cases, that is, when the memory contains some early steps as well as some recent ones, after which we briefly discuss some different models. We close with a section containing some questions and remarks. For easier reading we collect some of the somewhat more lengthy (elementary and tedious) computations in the Appendix.

2 Background

The elephant random walk is defined as a simple random walk, where, however, the steps are not i.i.d. but dependent as follows. The first step $X_{1}$ equals 1 with probability $r\in[0,1]$ and is equal to $-1$ with probability $1-r$ . After $n$ steps, that is, at position $S_{n}=\sum^{n}_{k=1}X_{k}$ , one defines

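$$X_{n+1}=\begin{cases}+X_{K},&\mbox{with probability }p\in[0,1],\\ -X_{K},&\mbox{with probability }1-p,\end{cases}$$

where $K$ has a uniform distribution on the integers $1,2,\ldots,n$ . With ${\cal G}_{n}=\sigma\{X_{1},X_{2},\ldots,X_{n}\}$ this means (formula (2.2) of [1]) that

$$E(X_{n+1}\mid{\cal G}_{n})=(2p-1)\cdot\frac{S_{n}}{n},\qquad(2.1)$$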

after which, setting $a_{n}=\Gamma(n)\cdot\Gamma(2p)/\Gamma(n+2p-1)$ , it turns out that $\{M_{n}=a_{n}S_{n},\,n\geq 1\}$ is a martingale.
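To make the definition concrete, the following is a minimal simulation sketch of the full-memory walk. The function name `simulate_erw` and the seeding convention are illustrative assumptions, not part of the paper; the update rule is the one stated above (copy a uniformly chosen earlier step with probability $p$, reverse it otherwise).

```python
import random


def simulate_erw(n, p, r, seed=None):
    """Simulate n steps of the full-memory elephant random walk.

    Returns the list of steps X_1, ..., X_n (each +1 or -1).
    """
    rng = random.Random(seed)
    # First step: +1 with probability r, otherwise -1.
    steps = [1 if rng.random() < r else -1]
    for _ in range(n - 1):
        # K is uniform on the indices of all previous steps.
        remembered = rng.choice(steps)
        # Repeat the remembered step with probability p, otherwise reverse it.
        steps.append(remembered if rng.random() < p else -remembered)
    return steps


steps = simulate_erw(n=1000, p=0.6, r=0.5, seed=1)
position = sum(steps)  # S_n
```

With $p=1$ every step repeats the first one, which gives a quick sanity check of the update rule.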

Our main aim is to extend these results to the case when the elephant has only a restricted memory, for example remembering only the most remote step(s) and/or the most recent one(s). A result in Section 4 allows us to conclude that our results remain true (suitably modified) also when the steps of the ERWs follow a general distribution on the integers.

First in line is the case when the elephant only remembers the distant past, the most extreme one being when the memory is reduced to the first step only, viz.,

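$$X_{n+1}=\begin{cases}+X_{1},&\mbox{with probability }p\in[0,1],\\ -X_{1},&\mbox{with probability }1-p.\end{cases}$$

Somewhat more sophisticated is when the memory covers the first two steps, in which case

$$X_{n+1}=\begin{cases}+X_{K},&\mbox{with probability }p\in[0,1],\\ -X_{K},&\mbox{with probability }1-p,\end{cases}$$

where $P(K=1)=P(K=2)=1/2$ .

Technically more complicated is when the elephant only remembers the recent past. Here we focus on the very recent past, which is the last step, that is,

$$X_{n+1}=\begin{cases}+X_{n},&\mbox{with probability }p\in[0,1],\\ -X_{n},&\mbox{with probability }1-p.\end{cases}$$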

We begin, throughout, by assuming that $X_{1}=1$ , and generalize our findings in this setting (for simplicity) to the case $r=p$ . We denote our partial sums with $T_{n}$ , $n\geq 1$ , when the first variable(s) is/are fixed and let $S_{n}$ be reserved for the case when they are random.

In order to move from $T_{n}$ to $S_{n}$ we also need to discuss the behavior of the walk when the initial value equals $-1$. In that case the evolution of the walk is the same except that the trend of the walk is reversed, viz., the corresponding walk is the mirror image in the time axis. This implies that the mean after $n$ steps equals $-E(T_{n})$, whereas, the dynamics being the same, the variance remains the same ($\mathrm{Var\,}(-Y)=\mathrm{Var\,}(Y)$ for a random variable $Y$). In fact, the second moments of the walk remain the same. The same goes for higher order moments: odd moments equal the negative of those when $X_{1}=1$, and even moments remain the same. In Sections 6 and 11 we depart from the assumption that $X_{1}$ and $X_{2}$ are fixed, and then the additional case $X_{1}+X_{2}=0$ has to be taken care of.

Finally, in order to avoid special effects we assume throughout that $0<p<1$; note that $p=1$ corresponds to $X_{n}=X_{1}$ for all $n$, and $p=0$ to the case of alternating summands.
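The restricted-memory models of this section differ only in which past indices the elephant may choose from. The sketch below simulates them with one routine; the names `simulate_restricted`, `first_only`, `first_two`, `last_only` and `first_and_last` are hypothetical helpers introduced for illustration, and the initial step(s) are taken as fixed, as in the definition of $T_{n}$.

```python
import random


def simulate_restricted(n, p, memory, first_steps, seed=None):
    """Simulate a restricted-memory walk and return its first n steps.

    memory(k) lists the 1-based indices the elephant remembers after k steps;
    the next step copies a uniformly chosen remembered step with probability p
    and reverses it with probability 1 - p.
    """
    rng = random.Random(seed)
    steps = list(first_steps)          # fixed initial step(s), e.g. [1] or [1, 1]
    while len(steps) < n:
        k = len(steps)                 # number of steps taken so far
        index = rng.choice(memory(k))  # index of the remembered step
        chosen = steps[index - 1]
        steps.append(chosen if rng.random() < p else -chosen)
    return steps


def first_only(k):
    return [1]        # distant past: the first step


def first_two(k):
    return [1, 2]     # distant past: the first two steps


def last_only(k):
    return [k]        # recent past: the most recent step


def first_and_last(k):
    return [1, k]     # mixed case: first and most recent step


t_steps = simulate_restricted(1000, 0.7, last_only, first_steps=[1], seed=2)
t_n = sum(t_steps)    # T_n for the last-step memory
```

In the boundary case $p=0$ with `last_only` the steps alternate, matching the remark about alternating summands.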

3 Some auxiliary material

For easier access to the arguments below we briefly present some auxiliary results from probability and analysis.

3.1 Disturbed limit distributions

The following (well-known) result (which is a special case of the Cramér–Slutsky theorem) will be used in order to go from a special case to a more general one.

Proposition 3.1

Let $\{U_{n},\,n\geq 1\}$ be a sequence of random variables and suppose that $V$ is independent of all of them. If $U_{n}\stackrel{{\scriptstyle d}}{{\to}}U$ as $n\to\infty$ , then $U_{n}V\stackrel{{\scriptstyle d}}{{\to}}UV$ as $n\to\infty$ .

Proof. Using characteristic functions and bounded convergence we have, as $n\to\infty$ ,

[TABLE]

An application of the continuity theorem for characteristic functions finishes the proof. $\Box$

3.2 Conditioning in case of a restricted memory

Let $\{S_{n},\,n\geq 1\}$ be an ERW, let $\{{\cal F}_{n},\,n\geq 1\}$ denote the $\sigma$ -algebras generated by the memory of the elephant and let ${\cal G}_{n}=\sigma\{X_{1},X_{2},\ldots,X_{n}\}$ stand for the full memory. We already know from (2.1) above that $E(X_{n+1}\mid{\cal G}_{n})=(2p-1)S_{n}/n$ . Our aim is to establish analogs when the elephant has a restricted memory, that is, analogs for $E(X_{n+1}\mid{\cal F}_{n})$ .

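Toward this end, let $I_{n}=\{i\leq n:i\in{\mathfrak{M}}\}$ , where ${\mathfrak{M}}=$ the memory of the elephant. Then,

$$E(X_{n+1}\mid{\cal F}_{n})=p\cdot\sum_{i\in I_{n}}\frac{1}{|I_{n}|}X_{i}+(1-p)\cdot\sum_{i\in I_{n}}\frac{1}{|I_{n}|}(-X_{i})=(2p-1)\cdot\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|},\qquad(3.1)$$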

that is, the conditional mean equals the average of the possible choices multiplied by the expected value of the sign; in analogy with (2.1).

If, for example, $I_{n}=\{n\}$ the elephant only remembers the most recent step, and $I_{n}=\{1,n\}$ means that he/she only remembers the first and the most recent steps; these are two cases that will be considered in the sequel. In these cases (3.1) states that

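$$E(X_{n+1}\mid{\cal F}_{n})=(2p-1)X_{n}\quad\mbox{and}\quad E(X_{n+1}\mid{\cal F}_{n})=(2p-1)\frac{X_{1}+X_{n}}{2},$$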

respectively.
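The conditional mean of a restricted-memory step can be checked by direct enumeration. The sketch below uses exact rational arithmetic; the function name `conditional_mean` and the sample remembered values are illustrative assumptions.

```python
from fractions import Fraction


def conditional_mean(remembered, p):
    """E(X_{n+1} | F_n) by enumerating the uniform choice among remembered steps."""
    weight = Fraction(1, len(remembered))   # each remembered step is equally likely
    total = Fraction(0)
    for x in remembered:
        # Copy x with probability p, reverse it with probability 1 - p.
        total += weight * (p * x + (1 - p) * (-x))
    return total


p = Fraction(7, 10)
remembered = [1, -1, 1, 1]                  # hypothetical values X_i, i in I_n
closed_form = (2 * p - 1) * Fraction(sum(remembered), len(remembered))
assert conditional_mean(remembered, p) == closed_form
```

For `remembered = [x_n]` this reduces to $(2p-1)X_{n}$, and for `[x_1, x_n]` to $(2p-1)(X_{1}+X_{n})/2$.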

The next problem is when we condition on steps that are not contained in the memory. In words, the elephant does not remember such steps and, hence, cannot choose among them in a following step. More precisely, ${\mathfrak{M}}$ is defined as those steps in the past on which the elephant bases the next step. Technically, let $I\subset\{1,2,\ldots,n\}$ be an arbitrary set of indices, such that $I\cap I_{n}=\emptyset$ . Then

$$E(X_{n+1}\mid\sigma\{I_{n}\cup I\})=E(X_{n+1}\mid{\cal F}_{n})=(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|}.$$

It follows, in particular, that

$$E(X_{n+1}\mid{\cal G}_{n})=(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|},$$

and that

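$$E(S_{n}X_{n+1}\mid{\cal G}_{n})=S_{n}E\big(X_{n+1}\mid{\cal G}_{n}\big)=S_{n}(2p-1)\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|}.$$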

This, and the fact that $X_{n+1}^{2}=1$ , will be useful several times for the computation of second moments as follows:

[TABLE]
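As a worked sketch of how these two facts combine (our own expansion under the assumption $S_{n+1}=S_{n}+X_{n+1}$, not a reproduction of the authors' display), squaring the sum and taking expectations gives a recursion for the second moment:

```latex
\begin{aligned}
E(S_{n+1}^{2})
  &= E(S_{n}^{2}) + 2\,E(S_{n}X_{n+1}) + E(X_{n+1}^{2}) \\
  &= E(S_{n}^{2})
     + 2\,E\big(E(S_{n}X_{n+1}\mid{\cal G}_{n})\big) + 1 \\
  &= E(S_{n}^{2})
     + 2(2p-1)\,E\Big(S_{n}\,\frac{\sum_{i\in I_{n}}X_{i}}{|I_{n}|}\Big) + 1 .
\end{aligned}
```

The middle term is the only model-dependent part: it depends on which indices $I_{n}$ the elephant remembers.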

3.3 Difference equations

In the proofs we use several difference equations. For convenience and easy reference we summarize here some well-known facts about linear difference equations that are used on and off.

Proposition 3.2

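(i) Consider the first order equation

$$x_{n+1}=ax_{n}+b_{n}\quad\mbox{for}\quad n\geq 1,\quad\mbox{with }x_{1}^{*}\mbox{ given}.$$

Then

$$x_{n}=a^{n-1}x_{1}^{*}+\sum_{\nu=0}^{n-2}a^{\nu}b_{n-1-\nu}.$$

If, in addition, $|a|<1$ and $b_{n}=bn^{\gamma}$ with $\gamma>-1$ , then

$$x_{n}=\frac{b_{n-1}}{1-a}-\frac{\gamma ab_{n-1}}{n(1-a)^{2}}\big(1+o(1)\big)\quad\mbox{as}\quad n\to\infty.$$

(ii) If, in particular, $|a|<1$ and $x_{n+1}=ax_{n}+b$ , then

$$x_{n}=\frac{b}{1-a}+a^{n-1}\big(x_{1}^{*}-\frac{b}{1-a}\big)=\frac{b}{1-a}\big(1+o(1)\big)\quad\mbox{as}\quad n\to\infty.$$

(iii) Next is the homogeneous, second order equation

$$x_{n+1}=ax_{n}+bx_{n-1}\quad\mbox{for}\quad n\geq 2,\quad\mbox{with }x_{1}^{*},x_{2}^{*}\mbox{ given}.$$

Then, with $\lambda_{1/2}=(a\pm\sqrt{a^{2}+4b})/2$ , provided $a^{2}+4b\not=0$ ,

$$x_{n}^{h}=c_{1}\lambda_{1}^{n}+c_{2}\lambda_{2}^{n}\quad\mbox{with }c_{1},c_{2}\mbox{ chosen such that }x_{i}^{h}=x_{i}^{*}\mbox{ for }i=1,2.$$

(iv) As for the inhomogeneous second order equation

$$x_{n+1}=ax_{n}+bx_{n-1}+d_{n}\quad\mbox{for}\quad n\geq 2,\quad\mbox{with }x_{1}^{*},x_{2}^{*}\mbox{ given},$$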

we have $x_{n}=x^{h}_{n}+y_{n}$ , where $y_{n}$ is some solution of the inhomogeneous equation, where the constants $c_{1},c_{2}$ in $x^{h}_{n}$ are chosen properly. If $d_{n}\equiv d$ and $a+b\not=1$ we may choose $y_{n}=d/(1-a-b)$ .
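The closed forms in Proposition 3.2 can be compared numerically with direct iteration. The sketch below is illustrative only: the function names are hypothetical, and the second order helper assumes two distinct, nonzero real roots $\lambda_{1},\lambda_{2}$.

```python
import math


def iterate_first_order(a, b_seq, x1, n):
    """Return x_n for x_{k+1} = a*x_k + b_k with x_1 = x1; b_seq(k) gives b_k."""
    x = x1
    for k in range(1, n):
        x = a * x + b_seq(k)
    return x


def closed_first_order(a, b_seq, x1, n):
    """x_n = a^(n-1) x_1 + sum_{nu=0}^{n-2} a^nu b_{n-1-nu}."""
    return a ** (n - 1) * x1 + sum(a ** nu * b_seq(n - 1 - nu) for nu in range(n - 1))


def closed_second_order(a, b, x1, x2, n):
    """x_n = c1*l1^n + c2*l2^n for x_{k+1} = a*x_k + b*x_{k-1}.

    Assumes a^2 + 4b > 0 and both roots nonzero.
    """
    root = math.sqrt(a * a + 4 * b)
    l1, l2 = (a + root) / 2, (a - root) / 2
    # Solve c1*l1 + c2*l2 = x1 and c1*l1^2 + c2*l2^2 = x2 for c1, c2.
    c2 = (x2 - l1 * x1) / (l2 * (l2 - l1))
    c1 = (x1 - c2 * l2) / l1
    return c1 * l1 ** n + c2 * l2 ** n


# Part (ii): constant inhomogeneity b with |a| < 1.
a, b, x1, n = 0.5, 3.0, 2.0, 8
iterated = iterate_first_order(a, lambda k: b, x1, n)
closed = b / (1 - a) + a ** (n - 1) * (x1 - b / (1 - a))
assert math.isclose(iterated, closed)
```

As a usage example for part (iii), $a=b=1$ with $x_{1}=x_{2}=1$ gives the Fibonacci numbers, so `closed_second_order(1, 1, 1, 1, 10)` is close to 55.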

3.4 Some notation

We use the standard $\delta_{a}(x)$ to denote the distribution function with a jump of height one at $a$ . Constants $c$ and $C$ are always numerical constants that may change between appearances.

4 General elephant random walks

Let $\{\widetilde{S}_{n},\,n\geq 1\}$ be an ERW, and suppose that $R$ is a random variable with distribution function $F_{R}$ that is independent of the walk. If $\widetilde{S}_{n}/a_{n}\stackrel{{\scriptstyle a.s.}}{{\to}}Z$ as $n\to\infty$ for some normalizing positive sequence $a_{n}\to\infty$ as $n\to\infty$ , and some random variable $Z$ , it follows from Proposition 3.1 that $R\widetilde{S}_{n}/a_{n}\stackrel{{\scriptstyle a.s.}}{{\to}}RZ$ as $n\to\infty$ . An immediate consequence of this fact is that we can extend Theorems 3.1, 3.4 and (the first half of) Theorem 3.7 of [1] to cover more general step sizes. Namely, consider the ERW for which $\widetilde{X}_{1}\equiv 1$ , and let the random variables $\widetilde{X}_{n}$ , $n\geq 2$ , be constructed as in Section 2 with this special $\widetilde{X}_{1}$ as starting point. Furthermore, let $R$ be a random variable, independent of $\{\widetilde{X}_{n},\,n\geq 1\}$ , and consider $X_{n}=R\cdot\widetilde{X}_{n}$ , $n\geq 1$ , and, hence, $S_{n}=R\cdot\widetilde{S}_{n}$ .

The following theorem (which reduces to the cited results of [1] if $R$ is a coin-tossing random variable), holds for $S_{n}=R\tilde{S}_{n}$ :

Theorem 4.1

(a) For $0<p<3/4$ , $\dfrac{S_{n}}{n}\stackrel{{\scriptstyle a.s.}}{{\to}}0\quad\mbox{ as}\quad n\to\infty$ ;

(b) For $p=3/4$ , $\dfrac{S_{n}}{\sqrt{n}\log n}\stackrel{{\scriptstyle a.s.}}{{\to}}0\quad\mbox{ as}\quad n\to\infty$ ;

(c) For $3/4<p<1$ , $\dfrac{S_{n}}{n^{2p-1}}\stackrel{{\scriptstyle a.s.}}{{\to}}RL\quad\mbox{ as}\quad n\to\infty$ ,

where $L$ is a non-degenerate random variable.

As for convergence in distribution, we have to distinguish more carefully between the three cases.

Theorem 4.2

For $0<p<3/4$ we obtain

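$$\frac{S_{n}}{\sqrt{n}}\stackrel{d}{\to}\int_{\mathbb{R}\setminus\{0\}}{\cal N}_{0,\frac{1}{3-4p}}(\cdot/|t|)\,dF_{R}(t)+P(R=0)\cdot\delta_{[0,\infty)}(\cdot)\quad\mbox{as}\quad n\to\infty.$$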

Moreover, if $E(R^{2})<\infty$ , then $E(S_{n}/\sqrt{n})\to 0$ and $E((S_{n}/\sqrt{n})^{2})\to E(R^{2})/(3-4p)$ as $n\to\infty$ .

Proof. As $R$ and $S_{n}$ are independent we find that

[TABLE]

by dominated convergence which yields the desired result.

The second part is immediate, since $R$ is independent of everything else. $\Box$

Remark 4.1

If $R=\pm 1$ with probabilities $r$ and $(1-r)$ , respectively, the limit distributions of $S_{n}/\sqrt{n}$ and $\widetilde{S}_{n}/\sqrt{n}$ are the same, and we rediscover Theorem 3.3 of [1].

Remark 4.2

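For the critical case, $p=3/4$ , one similarly obtains, using [1], Theorem 3.6, that

$$\frac{S_{n}}{\sqrt{n\log n}}\stackrel{d}{\to}\int_{\mathbb{R}\setminus\{0\}}{\cal N}_{0,1}(\cdot/|t|)\,dF_{R}(t)+P(R=0)\cdot\delta_{[0,\infty)}(\cdot)\quad\mbox{as}\quad n\to\infty.$$

The supercritical case, $3/4<p<1$ , has a different evolution and no analogous result exists. $\Box$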

5 Remembering only the distant past 1

This turns out to be the easiest case, since convenient independence is inherent. We first assume that the elephant only remembers the first step, i.e., that ${\cal F}_{n}=\sigma\{X_{1}\}$ , and that $X_{1}=1$ (recall that partial sums are then denoted with the letter $T$ ). Then,

$$E(X_{n+1}\mid{\cal F}_{n})=E(X_{n+1}\mid X_{1})=(2p-1)\cdot 1=E(X_{n+1})\quad\mbox{for all}\quad n\geq 1,$$

and, hence,

$$E(T_{n+1}\mid{\cal F}_{n})=1+n(2p-1)=E(T_{n+1}).$$

Moreover, applying (3.5) to $T_{n}$ we find that

[TABLE]

which, after telescoping, yields

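$$E(T_{n+1}^{2})=1+(2p-1)^{2}n(n+1)+\big(4(2p-1)(1-p)+1\big)n,$$

and, finally,

$$\mathrm{Var\,}(T_{n+1})=\big(1-(2p-1)^{2}\big)n=4p(1-p)n.$$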

A completely analogous calculation for characteristic functions, with an eye on (3.2), shows that

[TABLE]

after which telescoping tells us that

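$$\varphi_{T_{n+1}}(t)=\big(pe^{it}+(1-p)e^{-it}\big)^{n}\cdot e^{it},$$

after which a standard computation shows that

$$\varphi_{(T_{n}-n(2p-1))/\sqrt{n}}(t)\to\exp\{-2p(1-p)t^{2}\}\quad\mbox{as}\quad n\to\infty.$$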

Next we note that the computations so far prove that the increments are uncorrelated, suggesting independence … In fact, recalling that we have assumed that $X_{1}=1$ , we have, setting $\alpha_{k}=1$ if $X_{k}=+X_{1}$ , for $k\geq 2$ , and 0 otherwise,

[TABLE]

for $i,j\geq 2$ and different.

This means that the ERW coincides with the classical simple random walk, except for the fact that the first step is always equal to one. This is—after some thinking—rather obvious, because (in the language of [4]) we might interpret $X_{1}$ as a coin that we either flip or not before each new step. Hence we obtain:

Proposition 5.1

The strong law of large numbers, the central limit theorem, and the law of the iterated logarithm all hold for $\{T_{n},\,n\geq 1\}$ .
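Since, given $X_{1}=1$, the walk is a simple random walk started after a fixed first step, its exact law is easy to compute. The following sketch verifies $E(T_{n+1})=1+n(2p-1)$, $E(T_{n+1}^{2})=1+(2p-1)^{2}n(n+1)+(4(2p-1)(1-p)+1)n$ and $\mathrm{Var}(T_{n+1})=4p(1-p)n$ in exact arithmetic; the function name `law_of_T` is a hypothetical helper.

```python
from fractions import Fraction


def law_of_T(n_steps, p):
    """Exact law of T_{n_steps} when X_1 = 1 and each later step is
    +X_1 with probability p and -X_1 with probability 1 - p."""
    law = {1: Fraction(1)}                      # T_1 = X_1 = 1
    for _ in range(n_steps - 1):
        nxt = {}
        for value, prob in law.items():
            nxt[value + 1] = nxt.get(value + 1, 0) + prob * p
            nxt[value - 1] = nxt.get(value - 1, 0) + prob * (1 - p)
        law = nxt
    return law


p, n = Fraction(2, 3), 12
law = law_of_T(n + 1, p)
mean = sum(v * q for v, q in law.items())
second = sum(v * v * q for v, q in law.items())

assert mean == 1 + n * (2 * p - 1)
assert second == 1 + (2 * p - 1) ** 2 * n * (n + 1) + (4 * (2 * p - 1) * (1 - p) + 1) * n
assert second - mean ** 2 == 4 * p * (1 - p) * n
```

Exact fractions are used so that the identities hold with equality rather than up to rounding.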

If, on the other hand, the first step is equal to $-1$ , then, by symmetry, $E(T_{n+1})=-n(2p-1)-1$ , the variance remains the same (recall the discussion toward the end of Section 2), and, again, by symmetry, $T_{n+1}+n(2p-1)$ normalized by $\sqrt{n}$ is asymptotically normal and the SLLN and the LIL do hold again.

As a consequence, assuming that $X_{1}$ is a coin-tossing random variable, we are (asymptotically) confronted with two normal distributions, one for each of the two portions of the probability space. In fact, if we imagine the situation that $r=P(X_{1}=+1)$ is close to zero or one it is rather apparent how the very first step determines along which branch it will evolve.

One also notes, more formally, that $E(X_{n+1}\mid{\cal F}_{n})=(2p-1)X_{1}$ , so that, for $r=p$ , $E(X_{n+1})=(2p-1)E(X_{1})=(2p-1)^{2}$ , implying that $\mathrm{Var\,}(S_{n})={\cal O}(n^{2})$ (and not of order $n$ ) as $n\to\infty$ . Thus, an ordinary CLT is not valid, with the exception that if $p=1/2$ the two "branches" determined by the first step collapse (asymptotically) into one, and we are ultimately faced with a classical simple symmetric random walk.

Hence, the following limit result is always available in the general case:

Theorem 5.1

Let $S_{n}=\sum^{n}_{k=1}X_{k}$ . Then,

(a) $\displaystyle\frac{S_{n}}{n}\stackrel{{\scriptstyle d}}{{\to}}\begin{cases}\phantom{-(}2p-1,&\quad\mbox{ with probability}\quad p,\\[5.69054pt] -(2p-1),&\quad\mbox{ with probability}\quad 1-p,\end{cases}\quad\mbox{ as}\quad n\to\infty$ ;

(b) $E(S_{n}/n)\to(2p-1)^{2}\quad\mbox{ and}\quad\mathrm{Var\,}(S_{n}/n)\to 4p(1-p)(2p-1)^{2}\quad\mbox{ as}\quad n\to\infty$ .

Proof of (a). If $X_{1}=\pm 1$ we know from above that $E(T_{n})=\pm(1+(n-1)(2p-1))$ , and that $\mathrm{Var\,}(T_{n})=4p(1-p)(n-1)$ . This tells us that, $\frac{T_{n}}{n}\stackrel{{\scriptstyle p}}{{\to}}\pm(2p-1)$ as $n\to\infty$ . The conclusion follows.

Proof of (b). Immediate. $\Box$
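For readers who want the step behind part (b) spelled out, here is a short worked sketch. It assumes only the two-point limit law from part (a), denoted $Y$ here (our notation), together with the bound $|S_{n}/n|\leq 1$, which lets moments pass to the limit.

```latex
\begin{aligned}
E(Y)   &= p\,(2p-1) - (1-p)\,(2p-1) = (2p-1)^{2}, \\
E(Y^{2}) &= (2p-1)^{2}, \\
\mathrm{Var}(Y) &= (2p-1)^{2} - (2p-1)^{4}
               = (2p-1)^{2}\big(1-(2p-1)^{2}\big)
               = 4p(1-p)(2p-1)^{2}.
\end{aligned}
```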

Remark 5.1

(i) An interpretation of the limit in (a) is that the random walk at hand, on average, behaves, asymptotically, like a coin-tossing random variable with values at the points $\pm(2p-1)$ .

(ii) An alternative way of phrasing the conclusion of the theorem is that

[TABLE]

However, if we use a random normalization we obtain the following result:

Theorem 5.2

Let $S_{n}=\sum^{n}_{k=1}X_{k}$ . Then,

(a) $\displaystyle\frac{S_{n}-n(2p-1)X_{1}}{\sqrt{4np(1-p)}}\stackrel{{\scriptstyle d}}{{\to}}{\cal N}_{0,1}\quad\mbox{ as}\quad n\to\infty$ ;

(b) $\displaystyle\frac{S_{n}-n(2p-1)X_{1}}{n}\stackrel{{\scriptstyle a.s.}}{{\to}}0\quad\mbox{ as}\quad n\to\infty$ ;

(c) $\displaystyle\limsup_{n\to\infty}\,(\liminf_{n\to\infty})\frac{S_{n}-n(2p-1)X_{1}}{\sqrt{8np(1-p)\log\log n}}=1\;(-1)\quad\mbox{ a.s.}\quad$

Proof of (a). We use the fact that

[TABLE]

together with Theorem 4.2 and its Remark 4.1.

Alternatively, one may condition on the value of $X_{1}$ . This procedure will be exploited in the proof of Theorem 6.2 in the next section.

Proof of (b) and (c). Define $\Omega_{1}=\{\omega\in\Omega:X_{1}(\omega)=1\}$ and $\Omega_{2}=\Omega_{1}^{c}$ . After renormalization the original probability measure will be a probability measure on $\Omega_{1}$ . Based on this measure on $\Omega_{1}$ we obtain an SLLN and an LIL for $S_{n}-n(2p-1)X_{1}$ . Similarly on $\Omega_{2}$ . Combining them yields the desired result. $\Box$

Remark 5.2

The strong law can also be formulated with a random RHS:

[TABLE]

Remark 5.3

If $X_{1}$ is a general random variable with distribution $F$ having no mass at zero, then

[TABLE]

$\Box$

A special case is, once again, $p=1/2$ :

Corollary 5.1

If $p=1/2$ , then

[TABLE]

6 Remembering only the distant past 2

In this section we begin by assuming that the elephant only remembers the first two steps, so that ${\cal F}_{n}=\sigma\{X_{1},X_{2}\}$ , and suppose that $X_{1}=X_{2}=1$ . Then, for $n\geq 2$ ,

[TABLE]

for all $n\geq 2$ , and, hence,

[TABLE]

(since $E(T_{2})=2$ ). Extending the idea from the previous section that the walk evolves as an ordinary simple random walk beginning at the third step, a natural guess is that

[TABLE]

To see this we first observe that $\mathrm{Var\,}(T_{3})=\mathrm{Var\,}(X_{3})=4p(1-p)$ , that is, the formula is correct for $n=2$ . Assuming it is correct for $n-1$ we have

[TABLE]

since by (3.4) and the fact that $X_{1}=X_{2}=1$ ,

[TABLE]

Next, by modifying the computations involving the characteristic function from Section 5, we obtain

[TABLE]

By continuing as before one obtains, after proper centering, a limiting normal distribution for these initial $X$ -values, and similarly for the other ones, but only for each "branch" separately. One can also ascertain that the variance is not linear if we assume random beginnings, except, as before, when $p=1/2$ , in which case the three main limit theorems (SLLN, CLT, LIL) hold (as in Corollary 5.1).

The following analog of Theorem 5.1 holds in the general case (as one might expect):

Theorem 6.1

Let $S_{n}=\sum^{n}_{k=1}X_{k}$ . Then

(a) $\displaystyle\frac{S_{n}}{n}\stackrel{{\scriptstyle d}}{{\to}}\begin{cases}\phantom{-(}2p-1,&\quad\mbox{ with probability}\quad p^{2},\\ \phantom{5555}0,&\quad\mbox{ with probability}\quad 1-p,\\ -(2p-1),&\quad\mbox{ with probability}\quad p(1-p),\end{cases}\quad\mbox{ as}\quad n\to\infty$ ;

(b) $E(S_{n}/n)\to p(2p-1)^{2}\quad\mbox{ and}\quad\mathrm{Var\,}(S_{n}/n)\to p(1-p)(2p-1)^{2}\big(4p^{2}+1\big).$

Proof of (a). If $X_{1}=X_{2}=\pm 1$ we know from above that $E(T_{n})=\pm(n(2p-1)+3-2p)$ , and that $\mathrm{Var\,}(T_{n})=4p(1-p)(n-2)$ . Moreover, $E(T_{n})=0$ whenever $X_{1}$ and $X_{2}$ have different signs; in that case the variance formula remains the same, but with $p=1/2$ . This, together with the fact that $P(X_{1}=X_{2}=1)=p^{2}$ , $P(X_{1}=X_{2}=-1)=(1-p)p$ , and $P(X_{1}\neq X_{2})=p(1-p)+(1-p)^{2}=1-p$ helps us to finish the proof of the first part. Part (b) follows. $\Box$
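The step "Part (b) follows" amounts to computing the mean and variance of the three-point limit law in part (a). A minimal sketch in exact arithmetic is given below; the function name `limit_moments_two_step` is a hypothetical helper.

```python
from fractions import Fraction


def limit_moments_two_step(p):
    """Mean and variance of the three-point limit law of S_n/n in Theorem 6.1(a)."""
    atoms = [
        (2 * p - 1, p ** 2),         # X_1 = X_2 = +1
        (Fraction(0), 1 - p),        # X_1 and X_2 have different signs
        (-(2 * p - 1), p * (1 - p)), # X_1 = X_2 = -1
    ]
    assert sum(prob for _, prob in atoms) == 1
    mean = sum(value * prob for value, prob in atoms)
    second = sum(value ** 2 * prob for value, prob in atoms)
    return mean, second - mean ** 2


p = Fraction(3, 5)
mean, variance = limit_moments_two_step(p)
assert mean == p * (2 * p - 1) ** 2
assert variance == p * (1 - p) * (2 * p - 1) ** 2 * (4 * p ** 2 + 1)
```

The atoms are kept in a list rather than a dictionary so that the case $p=1/2$, where all three values coincide at zero, is handled without collisions.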

Remark 6.1

(i) In analogy with Remark 5.1 we have the interpretation that the elephant, asymptotically, on average, performs a random walk on the points $\pm(2p-1)$ and $0$.

(ii) Mimicking Remark 5.1, we may rewrite the conclusion of the theorem as

[TABLE]

$\Box$

Once again random normalization produces further limit results:

Theorem 6.2

Let $S_{n}=\sum^{n}_{k=1}X_{k}$ . Then,

(a) $\displaystyle\frac{S_{n}-n(2p-1)\,(X_{1}+X_{2})/2}{\sqrt{n}}\stackrel{{\scriptstyle d}}{{\to}}p\cdot{\cal N}_{0,4p(1-p)}+(1-p)\cdot{\cal N}_{0,1}\quad\mbox{ as}\quad n\to\infty$ ;

(b) $\displaystyle\frac{S_{n}-n(2p-1)\,(X_{1}+X_{2})/2}{n}\stackrel{{\scriptstyle a.s.}}{{\to}}0\quad\mbox{ as}\quad n\to\infty$ ;

(c) $\displaystyle\limsup_{n\to\infty}(\liminf_{n\to\infty})\frac{S_{n}-n(2p-1)\,(X_{1}+X_{2})/2}{\sqrt{2n\log\log n}}=\sigma(X_{1},X_{2})\;(-\sigma(X_{1},X_{2}))\quad\mbox{ a.s.,}\quad$

where $\sigma(X_{1},X_{2})=\begin{cases}4p(1-p),&\quad\mbox{ for}\quad\omega\in\{\omega\in\Omega:X_{1}(\omega)\cdot X_{2}(\omega)=1\},\\ 1,&\quad\mbox{ otherwise.}\end{cases}$

Proof of (a). Conditioning on the value of $(X_{1}+X_{2})/2$ we obtain

[TABLE]

Parts (b) and (c) follow along the lines of the proof of Theorem 5.2. $\Box$

7 The distant past; higher order

If one remembers the first $m$ random variables for some $m\in\mathbf{N}$ , the following obvious extension of the above results emerges.

Theorem 7.1

For $q_{k}=P(S_{m}=m-2k)$ , $r_{k}=\big{(}(m-k)p+k(1-p)\big{)}/m$ , and $p_{k}=(m-2k)(2p-1)/m$ , where $0\leq k\leq m$ and $m\in\mathbf{N}$ ,

[TABLE]

and

[TABLE]

Proof. As before we write

[TABLE]

and observe that in each conditional case we have a random walk with the appropriate success probabilities, i.e., for $S_{m}=m-2k$ the success probability is $r_{k}=\big{(}(m-k)p+k(1-p)\big{)}/m$ , and, hence, the expectation is $p_{k}=2r_{k}-1=(m-2k)(2p-1)/m$ .

Remark 7.1

(i) The probabilities at the jumps are relatively complicated and therefore not expressed in detail, but $q_{0}=p^{m}$ and $q_{m}=(1-p)p^{m-1}$ .

(ii) A more detailed analysis shows that the probability mass of the limit distribution of $S_{n}/n$ concentrates near zero as $m$ increases.

(iii) One easily checks that the variance for each "branch" equals $4p(1-p)(n-m)$ , which, in turn, is bounded by $4p(1-p)n$ , which, consequently, tells us that the analog of Theorems 5.1 and 6.1 holds.

(iv) Once again, the case $p=1/2$ is special as described in the two previous sections. $\Box$
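The quantities $q_{k}$, $r_{k}$ and $p_{k}$ can be tabulated for small $m$. The sketch below assumes that the first $m$ steps are generated as a full-memory ERW with $r=p$ (consistent with the stated values of $q_{0}$ and $q_{m}$); the function names are hypothetical helpers.

```python
from fractions import Fraction


def law_of_first_m_steps(m, p):
    """q[k] = P(S_m = m - 2k), where k counts the -1 steps among the first m.

    Assumes the first m steps form a full-memory ERW with P(X_1 = 1) = p.
    """
    q = {0: p, 1: 1 - p}                 # after one step
    for j in range(1, m):                # j steps taken so far
        nxt = {}
        for k, prob in q.items():
            # The next step is +1 if a +1 is copied or a -1 is reversed.
            plus = ((j - k) * p + k * (1 - p)) / j
            nxt[k] = nxt.get(k, 0) + prob * plus
            nxt[k + 1] = nxt.get(k + 1, 0) + prob * (1 - plus)
        q = nxt
    return q


def success_probability(m, k, p):
    """r_k: probability that a later step equals +1 given S_m = m - 2k."""
    return ((m - k) * p + k * (1 - p)) / m


m, p = 4, Fraction(2, 3)
q = law_of_first_m_steps(m, p)
assert q[0] == p ** m and q[m] == (1 - p) * p ** (m - 1)
for k in range(m + 1):
    # p_k = 2 r_k - 1 = (m - 2k)(2p - 1)/m
    assert 2 * success_probability(m, k, p) - 1 == (m - 2 * k) * (2 * p - 1) / m
```

The limit law of $S_{n}/n$ then puts mass $q_{k}$ at the point $p_{k}$, which is one way to explore part (ii) of the remark numerically.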

8 Remembering only the recent past 1

This situation is much more complex, because, even though one remembers only recent steps, the path depends on the whole history so far (some remarks on that will be given in Subsection 12.3). Once again we begin by assuming that the elephant only remembers the very last step, which means that ${\cal F}_{n}=\sigma\{X_{n}\}$ . This setting is reminiscent of [4], where one turns over a coin instead of tossing it. The main focus there, however, is on different $p$ -values at each step and, e.g., how this may affect phase transitions and behavior at critical values.

We begin, as always, by assuming that $X_{1}=1$ . Then, $E(X_{1})=1$ , and

[TABLE]

for all $n\geq 2$ . By iterating this it follows that, for $n\geq 0$ ,

[TABLE]

and

[TABLE]

For the second moment we have, by (3.5) and (3.2),

[TABLE]

For the middle term we obtain by (3.2),

[TABLE]

which in turn, after iteration, yields

[TABLE]

Now we can calculate the second moment:

[TABLE]

By telescoping we obtain

[TABLE]

which implies the following formula for the asymptotic variance:

[TABLE]

Noticing that $S_{n}=X_{1}T_{n}$ and that $X_{1}=\pm 1$ , a glance at (8.1) and (8.2) shows that $\frac{T_{n}}{n}\stackrel{{\scriptstyle p}}{{\to}}0$ and that $\frac{S_{n}}{n}\stackrel{{\scriptstyle p}}{{\to}}0$ as $n\to\infty$ , suggesting the following result:

Theorem 8.1

For $X_{1}=\pm 1$ ,

[TABLE]
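As a numerical sanity check on the first two moments, the following sketch computes the exact law of $(X_{n},T_{n})$ by dynamic programming, assuming $X_{1}=1$ and the transition rule $E(X_{n+1}\mid X_{n})=(2p-1)X_{n}$. The comparison values $(1-\rho^{n})/(1-\rho)$ for the mean and $p/(1-p)$ for the variance per step, with $\rho=2p-1$, are the standard expressions for a two-state chain with lag-one correlation $\rho$; they are stated here as assumptions for the check, and the function name is hypothetical.

```python
from collections import defaultdict

def exact_moments_last_step(n, p):
    """Exact mean and variance of T_n when X_1 = 1 and each new step
    repeats the previous one with probability p."""
    # dist maps (last step x, partial sum t) -> probability
    dist = {(1, 1): 1.0}
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (x, t), prob in dist.items():
            nxt[(x, t + x)] += prob * p          # repeat the last step
            nxt[(-x, t - x)] += prob * (1 - p)   # reverse the last step
        dist = nxt
    mean = sum(t * pr for (_, t), pr in dist.items())
    second = sum(t * t * pr for (_, t), pr in dist.items())
    return mean, second - mean ** 2

p, n = 0.7, 400
mean, var = exact_moments_last_step(n, p)
rho = 2 * p - 1
print(mean, (1 - rho ** n) / (1 - rho))   # exact mean vs. geometric sum
print(var / n, p / (1 - p))               # variance per step vs. assumed limit
```

The two printed pairs should be close, with the variance per step approaching its limit at rate $1/n$.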

Our next task is to apply the method of moments in order to prove that this is indeed true. We thus wish to prove that

[TABLE]

This amounts to lengthy computations of various higher order mixed moments. The reason for this is that higher order moments of $T_{n}$ can be expressed as linear combinations of lower order moments of $T_{n}$ and $X_{n}$ with the aid of the binomial theorem.

Convergence of mean and variance has already been established above. For higher order moments we use induction.

Throughout the following, $C(p,m)$, with or without an index, denote numerical constants which may differ from line to line, and $E(R_{n}(p,m))$ are quantities of smaller order than the leading term.

Lemma 8.1

For $m\geq 1$ we have, as $n\to\infty$ ,

[TABLE]

where $E(R_{n}(p,m))$ denotes individual remainder terms.

The proof of the lemma amounts to extending the above computations for mean and variance to higher order variants and is deferred to the Appendix, Subsection A.1.

Proof of Theorem 8.1. As already mentioned, the proof exploits the method of moments. For $X_{1}=1$ the lemma tells us that

[TABLE]

which verifies (8.3). For $X_{1}=-1$ we recall from the end of Section 3 that even moments remain the same and that odd moments are the same except for a change of sign, which yields the same conclusion. The limit result for $S_{n}$ then follows as in Theorem 4.2. $\Box$

Remark 8.1

The sequence $\{X_{n},\,n\geq 1\}$ is a stationary recurrent Markov chain with finite state space which, hence, is uniformly ergodic. The asymptotic normality of $T_{n}$ therefore also follows from a CLT for Markov chains, see, e.g., Corollary 5 of [8] (cf. also [7], Theorem 19.1). $\Box$

The Markov property also provides a strong law.

Theorem 8.2

We have

[TABLE]

Proof. The stationary distribution of the ergodic Markov chain $\{X_{n},\,n\geq 1\}$ is $(1/2,1/2)$ , which has expectation zero. An application of Theorem 6.1 in [3] yields the conclusion. $\Box$
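The stationary distribution used in the proof can be checked numerically. The sketch below power-iterates the two-state chain on $\{+1,-1\}$ with $P(\text{stay})=p$ and $P(\text{flip})=1-p$, starting from $X_{1}=+1$; the function name and the number of iterations are illustrative choices.

```python
def step_chain_stationary(p, iterations=200):
    """Power-iterate the two-state chain on {+1, -1} with
    P(stay) = p and P(flip) = 1 - p, starting from X_1 = +1."""
    pi_plus, pi_minus = 1.0, 0.0
    for _ in range(iterations):
        pi_plus, pi_minus = (p * pi_plus + (1 - p) * pi_minus,
                             (1 - p) * pi_plus + p * pi_minus)
    return pi_plus, pi_minus

pi_plus, pi_minus = step_chain_stationary(0.7)
stationary_mean = pi_plus - pi_minus   # expectation under the limiting law
print(pi_plus, pi_minus, stationary_mean)
```

The difference $\pi_{+}-\pi_{-}$ shrinks by the factor $2p-1$ at each iteration, which is why the limiting law is symmetric for every $0<p<1$.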

9 Remembering only the recent past 2

In this section we assume that the elephant remembers the two most recent steps, that is, at time $n$ the next step is based on the steps $X_{n}$ and $X_{n-1}$ . The computations are as before, although more elaborate. We have, as always, $X_{1}=1$ , $E(X_{2}\mid{\cal F}_{1})=(2p-1)X_{1}$ ,

[TABLE]

and, for $n\geq 3$ ,

[TABLE]

Computing the moments one obtains the following result. For the proof we refer to the Appendix, Subsection A.2.

Lemma 9.1

As $n\to\infty$ ,

[TABLE]

The expectation of $X_{n}$ tends to zero geometrically fast.

Remark 9.1

For $p=1/2$ the process reduces, as usual, to a simple symmetric random walk. $\Box$

For the following limit theorems we lean on the Markov property (and invite the reader to try the moment method).

Theorem 9.1

We have

[TABLE]

Proof. The sequence $\{X_{n},\,n\geq 1\}$ now forms a Markov chain of order two. Theorem 6.1 in [3] yields the strong law, and the results in [5], Section 3, or [6], combined with Corollary 5 of [8], yield the asymptotic normality with the moments as calculated above. $\Box$

Remark 9.2

If we suppose that the elephant remembers a fixed, finite number, $k$ say, of the most recent steps, then the sequence of steps forms a Markov chain of order $k$, and we obtain, by (basically) the same arguments as above, that $S_{n}/\sqrt{n}$ is asymptotically normal (a Markov chain of order $k$ can be considered as a $k$-dimensional Markov chain, to which one can apply, e.g., [6]). $\Box$
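The reduction of an order-$k$ chain to a first-order chain on $k$-tuples can be sketched as follows. The code assumes that the elephant picks one of the $k$ remembered steps uniformly at random, copies it with probability $p$ and flips it with probability $1-p$; the state is the tuple of the last $k$ steps. The function name, the starting state, and the iteration count are hypothetical choices.

```python
from itertools import product

def order_k_stationary(k, p, iterations=600):
    """Approximate stationary law of the chain whose state is the last k steps."""
    states = list(product((-1, 1), repeat=k))
    dist = {s: 0.0 for s in states}
    dist[(1,) * k] = 1.0                      # start from k steps equal to +1
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for s, prob in dist.items():
            plus = sum(1 for x in s if x == 1)
            # copy a uniformly chosen remembered step w.p. p, flip it w.p. 1 - p
            up = (plus * p + (k - plus) * (1 - p)) / k
            nxt[s[1:] + (1,)] += prob * up
            nxt[s[1:] + (-1,)] += prob * (1 - up)
        dist = nxt
    return dist

dist = order_k_stationary(k=3, p=0.7)
mean_step = sum(s[-1] * prob for s, prob in dist.items())
total = sum(dist.values())
print(total, mean_step)
```

Since the dynamics are invariant under a global sign change, the limiting mean of a single step should vanish, consistent with a centered normal limit for $S_{n}/\sqrt{n}$.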

10 Remembering the distant as well as the recent past 1

Next we consider the case when the elephant has a clear memory of the early steps as well as the very recent ones.

One can think of a(n old) person who remembers the early childhood and events from the last few days but nothing in between. The most elementary case is ${\cal F}_{n}=\sigma\{X_{1},X_{n}\}$ , for all $n\geq 2$ . Following the approach of earlier variants we begin by assuming that $X_{1}=1$ . Then, for $n\geq 2$ ,

[TABLE]

and

[TABLE]

Exploiting Proposition 3.2(i) we obtain, for $n\geq 1$ ,

[TABLE]

and, hence, that

[TABLE]
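The behavior of the mean can be illustrated numerically. The sketch below assumes $X_{1}=1$ and that the elephant chooses between $X_{1}$ and $X_{n}$ with probability $1/2$ each, so that $E(X_{n+1})=a\,(1+E(X_{n}))$ with $a=(2p-1)/2$. The fixed point $(2p-1)/(3-2p)$ of this recursion is our own computation under that assumption; the function name is hypothetical.

```python
def mean_step_first_and_last(n, p):
    """E(X_k), k = 1..n, given X_1 = 1, when the elephant copies X_1 or the
    most recent step (probability 1/2 each) w.p. p and flips it w.p. 1 - p."""
    a = (2 * p - 1) / 2
    e = 1.0                    # E(X_1) = 1
    means = [e]
    for _ in range(n - 1):
        e = a * (1.0 + e)      # E(X_{k+1}) = a * (X_1 + E(X_k))
        means.append(e)
    return means

p, n = 0.8, 2000
means = mean_step_first_and_last(n, p)
mean_T_over_n = sum(means) / n            # E(T_n) / n
limit = (2 * p - 1) / (3 - 2 * p)         # fixed point of the recursion
print(means[1], mean_T_over_n, limit)
```

The deviation $E(X_{k})-\text{limit}$ decays geometrically at rate $a$, so $E(T_{n})$ is linear in $n$ up to a bounded correction.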

Next we note that $E(T_{1}^{2})=1$ , and, by (3.5), that, for $n\geq 1$ ,

[TABLE]

In order to establish a difference equation for the second moment we first have to compute the mixed moment. For the computational details we refer to Appendix A.3 and obtain (formula (A.7)),

[TABLE]

Joining the expressions for the first two moments, finally, tells us that the variance is linear in $n$ :

[TABLE]

where

[TABLE]

Given the expressions for mean and variance, a weak law is immediate:

[TABLE]

In analogy with our earlier results this suggests that $T_{n}$ is asymptotically normal. That this is, indeed, the case follows from the fact that $\{T_{n},\ n\geq 1\}$ is, once again, a uniformly ergodic Markov chain, since the only random piece from the past is the previous step. We may thus apply Corollary 5 of [8] (cf. also [7], Theorem 19.1) to conclude that $T_{n}-E(T_{n})$ is asymptotically normal with mean zero and variance $\sigma^{2}_{T}n$, with $\sigma^{2}_{T}$ as defined in (10.5), which, in view of (10.2), establishes that

[TABLE]

An appeal to the discussion at the end of Section 2 now allows us to conclude that

[TABLE]

which tells us that

[TABLE]

Furthermore, in analogy to Theorem 5.1, we arrive at the following asymptotic distributional behavior of $S_{n}$ :

Theorem 10.1

We have

[TABLE]

Moreover, $E(S_{n}/n)^{r}\to E(S^{r})$ for all $r>0$ , since $|S_{n}/n|\leq 1$ for all $n$ .

Remark 10.1

Comparing this with Theorem 5.1 we see that the jump points are closer together here. This can be explained by the fact that the current random variables are less dependent than those in Section 5. $\Box$

Finally, by combining (10.7) with the obvious analog for the case $X_{1}=-1$ , asymptotic normality follows with a random centering:

Theorem 10.2

We have

[TABLE]

Proof. We first note that it follows from the discussion following (10.7) that the CLT there remains true when $X_{1}=-1$ with a $+$ replacing the $-$ in the numerator. We thus may argue as in the proof of Theorem 5.1, via the fact that

[TABLE]

Alternatively, condition on the value of $X_{1}$ and proceed as in the proof of Theorem 6.2. $\Box$

11 Remembering the recent as well as the distant past 2

In this section we extend the previous one in that we assume that ${\cal F}_{n}=\sigma\{X_{1},X_{2},X_{n}\}$ , for all $n\geq 3$ . Following the approach of earlier variants we begin by assuming that $X_{1}=X_{2}=1$ . Then $E(X_{1})=E(X_{2})=1$ , $E(X_{3})=(2p-1)$ and, for $n\geq 3$ ,

[TABLE]

Exploiting Proposition 3.2(i) yields

[TABLE]

and, hence,

[TABLE]

As for second moments, $E(T_{1})^{2}=1$ , $E(T_{2})^{2}=4$ , $E(T_{3})^{2}=E(1+1+X_{3})^{2}=4+4E(X_{3})+1=4+4(2p-1)+1=8p+1$ , and, generally, that,

[TABLE]

Concerning the mixed moments and other details we refer to Appendix A.4, from which we obtain

[TABLE]

The variance, finally, turns out as

[TABLE]

Following the path of the previous section we now immediately obtain a weak law:

[TABLE]

It remains to consider the general case with arbitrary $X_{1}$ and $X_{2}$ . There is a slight change here from the previous section. Namely, we first have the case when $X_{1}=X_{2}=-1$ , for which the arguments from the previous section carry over without change, that is, the mean equals $E(-T_{n})$ and the second moment equals $E(T_{n}^{2})$ . However, now we also have a mixed case which behaves somewhat differently.

Namely, consider the case when the first two summands are not equal; $X_{1}+X_{2}=0$ , $X_{1}X_{2}=-1$ . Then,

[TABLE]

and, for $n\geq 3$ ,

[TABLE]

from which we conclude that, for $n\geq 2$ ,

[TABLE]

For the calculation of the second moment we refer again to Appendix A.4 and find that

[TABLE]

where the last equality is due to the fact that $E(T_{n})=0$ . The weak law now runs slightly differently, in that

[TABLE]

We note in passing that the mean is linear in $n$ and that the second moment is of order $n^{2}$ when the first two summands are equal, whereas the mean is zero and the second moment is linear in $n$ when they are not. However, the variance is linear in $n$ in all cases.
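The contrast between the starting configurations can be checked with an exact computation. The sketch below assumes that $X_{3}$ is drawn from the memory $\{X_{1},X_{2}\}$ and that, for $n\geq 3$, step $n+1$ copies one of $X_{1},X_{2},X_{n}$ (probability $1/3$ each) with probability $p$ and flips it otherwise. The comparison value $(2p-1)/(2-p)$ for $E(T_{n})/n$ in the case $X_{1}=X_{2}=1$ is the fixed point of the resulting mean recursion, computed by us under these assumptions; the function name is hypothetical.

```python
from collections import defaultdict

def exact_moments_two_first_one_last(n, p, x1, x2):
    """Exact mean and variance of T_n (n >= 3) given X_1 = x1 and X_2 = x2."""
    def up_prob(memory):
        # probability that the next step is +1 given the remembered steps
        return sum(p if r == 1 else 1 - p for r in memory) / len(memory)

    u = up_prob((x1, x2))                       # X_3 uses the memory {X_1, X_2}
    dist = {(1, x1 + x2 + 1): u, (-1, x1 + x2 - 1): 1 - u}
    for _ in range(n - 3):
        nxt = defaultdict(float)
        for (x, t), prob in dist.items():
            u = up_prob((x1, x2, x))            # memory {X_1, X_2, X_n}
            nxt[(1, t + 1)] += prob * u
            nxt[(-1, t - 1)] += prob * (1 - u)
        dist = nxt
    mean = sum(t * pr for (_, t), pr in dist.items())
    second = sum(t * t * pr for (_, t), pr in dist.items())
    return mean, second - mean ** 2

p, n = 0.8, 500
mean_pp, var_pp = exact_moments_two_first_one_last(n, p, 1, 1)
mean_mm, var_mm = exact_moments_two_first_one_last(n, p, -1, -1)
mean_pm, var_pm = exact_moments_two_first_one_last(n, p, 1, -1)
m3, v3 = exact_moments_two_first_one_last(3, p, 1, 1)
print(mean_pp / n, (2 * p - 1) / (2 - p), mean_pm, var_pp / n, var_pm / n)
```

For $n=3$ and $X_{1}=X_{2}=1$ the second moment `v3 + m3 ** 2` reproduces the value $8p+1$ quoted above, which gives a quick consistency check of the assumed selection rule.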

As for central limit theorems, the main arguments are the same as in Section 10, in that

[TABLE]

for the cases $X_{1}=X_{2}=-1$ and $X_{1}=X_{2}=1$ , respectively, and

[TABLE]

when the first two summands are unequal.

Switching to moments of $S_{n}$ , using $T_{n}^{+}$ , $T_{n}^{-}$ and $T_{n}^{0}$ for the three cases, we obtain,

[TABLE]

Collecting the various pieces tells us that

[TABLE]

Finally, by modifying our earlier results of this kind, one ends up as follows:

Theorem 11.1

We have

[TABLE]

Moreover, $E(S_{n}/n)^{r}\to E(S^{r})$ for all $r>0$, since $|S_{n}/n|\leq 1$ for all $n$.

We finally wish to combine the asymptotic normality for the three different beginnings of the process in order to arrive at a limit theorem for the $S$ -process. This works (in theory) the same way as in Section 10. However, there is a problem with the variance. Namely, in Theorem 10.2 both cases had the same variance, whereas here the variance, when $X_{1}$ and $X_{2}$ are equal, is not the same as when they are different. Nevertheless, here is the result.

Theorem 11.2

We have

[TABLE]

with $\sigma_{T}^{2}$ as given in (11.4).

Proof. The conclusion follows by conditioning on the value of $(X_{1}+X_{2})/2$ , and proceeding as in the proof of Theorem 6.2. $\Box$

12 Miscellanea

We close by mentioning some further specific models and by describing some problems and challenges for further research.

12.1 More on restricted memories

(i) The next logical step would be to check the case when ${\cal F}_{n}=\sigma\{X_{1},X_{n-1},X_{n}\}$ . By modifying the computations in Appendix A.2, setting $a=\frac{2p-1}{3}$ and $d=3a^{2}$ , we find that

[TABLE]

after which Proposition 3.2(iv), and a glance at the computations in Appendix A.2, tell us that

[TABLE]

where $q=\max\{|\lambda_{1}|,|\lambda_{2}|\}<1$ , with $\lambda_{i}$ , $i=1,2$ , defined in Appendix A.2, and it follows that

[TABLE]

If ${\cal F}_{n}=\sigma\{X_{1},X_{2},X_{n-1},X_{n}\}$ , then, with $a=\frac{2p-1}{4}$ and $d=2p(2p-1)^{2}/4$ , one similarly obtains that

[TABLE]

In fact, theoretically it is possible to obtain results of the above kind for any fixed number of early and/or late memory steps.

(ii) A more subtle case is when the number of memory steps depends on $n$ , such as $\log n$ or $\sqrt{n}$ .

(iii) Another model is when the elephant remembers everything except the first step or, more generally, all but the first $k$ steps for some $k\in\mathbb{N}$. Set $p_{k}=P(X_{k+1}=1)$, $V_{n}=X_{k+1}+\dots+X_{n}$, and ${\cal H}_{n}=\sigma\{X_{k+1},\dots,X_{n}\}$, and let $n\geq k+1$. Then

[TABLE]

where $\gamma_{n}=(n-3+2p_{k})/(n-2)$ . With

[TABLE]

one can, as in [1], show that $\tilde{a}_{n}V_{n}$ is a martingale. From the same paper it follows, provided that $0<p_{k}<3/4$ , that

[TABLE]

which implies that

[TABLE]

The quantity $p_{k}$ depends on the construction used for the $k$ steps $X_{2},...,X_{k+1}$ .

Other cases one might think of are those where the memory covers everything except

• the last $j$ steps;

• the first $k$ steps and the last $j$ steps;

• the first $\alpha\log n$ steps and/or the last $\beta\log n$ steps for some $\alpha,\beta>0$;

• the first $\alpha\sqrt{n}$ steps and/or the last $\beta\sqrt{n}$ steps for some $\alpha,\beta>0$;

• the first $\alpha\log n$ steps and/or the last $\beta\sqrt{n}$ steps for some $\alpha,\beta>0$;

• and so on, aiming at more general (final) results.

12.2 Phase transition

The results of Bercu [1] show that for the full memory one has a phase transition at $p=3/4$ . There is no such thing in our results. An obvious, as well as interesting, question would be to find the breaking point. There exist some papers on this topic using simulations, see e.g., [12, 2, 10] and further papers cited therein, but we are not aware of any theoretical results concerning this matter.

12.3 Remembering the first vs. the last step

There is a fundamental difference in behavior between these extreme cases; it is not just a matter of recalling some earlier step. Namely, it is a matter of comparing

[TABLE]

with

[TABLE]

In order to see the difference more clearly, let us imagine that $p$ is close to one.

In the first case every new step most likely equals the first one, that is, a typical path will consist of an overwhelming number of $X_{1}$'s interspersed with an occasional $-X_{1}$. In the second case every new step most likely equals the most recent one, that is, a typical path will consist of a long stretch of $X_{1}$'s followed by a long stretch of $-X_{1}$'s, followed by …, that is, alternating long stretches of the same kind.

Moreover, since, in the first case, every new step is a function of just the first one, the independence structure does not come as a surprise, whereas in the second case the next step depends on the previous one, which in turn depends on its previous one, etcetera, which implies that the next step, in fact, depends on the whole path so far.
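The qualitative difference between the two extreme cases can be seen in a small simulation. The sketch below assumes $X_{1}=1$ and $p=0.95$; in the first path every step copies $X_{1}$ with probability $p$, in the second every step copies its predecessor with probability $p$. Function names, seed, and parameters are hypothetical choices.

```python
import random

def path_first_step(n, p, rng):
    """Every new step copies X_1 = 1 w.p. p and equals -X_1 w.p. 1 - p."""
    return [1] + [1 if rng.random() < p else -1 for _ in range(n - 1)]

def path_last_step(n, p, rng):
    """Every new step copies the previous step w.p. p and flips it w.p. 1 - p."""
    path = [1]
    for _ in range(n - 1):
        path.append(path[-1] if rng.random() < p else -path[-1])
    return path

def sign_changes(path):
    """Number of positions at which consecutive steps differ."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)

rng = random.Random(2018)      # fixed seed so that the run is reproducible
n, p = 10_000, 0.95
first = path_first_step(n, p, rng)
last = path_last_step(n, p, rng)
frac_plus_first = first.count(1) / n
frac_plus_last = last.count(1) / n
changes_first = sign_changes(first)
changes_last = sign_changes(last)
print(frac_plus_first, frac_plus_last, changes_first, changes_last)
```

In the first path the fraction of $+1$ steps should be close to $p$, whereas in the second it should be roughly balanced, with runs of average length about $1/(1-p)$.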

12.4 Final Remarks

(i) We have seen that the more the elephant remembers, the more cumbersome the computations become. However, once again, in theory it would be possible to compute higher order moments and thus, e.g., use the moment method to prove limit theorems.

(ii) By using the device from Section 4 one can extend all limit theorems for ERWs to the case with general steps. $\Box$

Appendix A Appendix

In this appendix we collect more technical calculations.

A.1 Proof of Lemma 8.1

Recall that even powers of $X_{n}$ are always equal to 1, and, moreover, that $X_{n}^{k}=X_{n}$ if $k$ is odd. One consequence of this is the following fact that will be used repeatedly below:

[TABLE]

As mentioned in connection with the statement of Theorem 8.1 we use induction. We thus assume that the moments up to order $2m-2$ converge properly; in particular, we may choose $n$ so large that $|E(T_{n}/\sqrt{n})^{k}|\leq 2\mu_{k}$ for $k\leq 2m-2$, which, by symmetry, implies that $|E(T_{n})^{k}|\leq 4\mu_{k}n^{k/2}$ for $k$ even, $k\leq 2m-1$, and $|E(T_{n})^{k}|\leq 4\varepsilon n^{k/2}$ for $k$ odd, $k\leq 2m-1$, for some small $\varepsilon$ (recall that $\mu_{k}$ are the moments of the standard normal distribution as given in (8.3)).

Proof of (8.4).

[TABLE]

Taking expectations on either side yields

[TABLE]

Exploiting (A.1) yields a bound for the remainder:

[TABLE]

By iterating (A.2) we then obtain that

[TABLE]

Proof of (8.5).

[TABLE]

Taking expectations on either side yields

[TABLE]

The estimation of the remainder is the same as above. The remaining part of the proof follows the exact same lines and is therefore omitted.

Having estimates for the mixed moments we are now able to attack the "pure" moments. This will be done without explicit mention. Moreover, the estimates for the remainders are, again, the same.

Proof of (8.6).

[TABLE]

Taking expectations on either side yields

[TABLE]

by the induction hypothesis. Summing up the differences leads to the desired result.

Proof of (8.7).

[TABLE]

Taking expectations on either side yields

[TABLE]

by the induction hypothesis. Summing up, finally, leads to the desired result with $C_{2}(p,m)=\frac{(2m+1)(2p-1)}{m}\cdot C_{1}(p,m)+(2m-1)\cdot C_{2}(p,m-1)$ . $\Box$

A.2 Proof of Lemma 9.1

Set $a=p-1/2\in(-1/2,\,1/2)$ . Then,

[TABLE]

For $n\geq 2$ we have

[TABLE]

With $\lambda_{1,2}=(a\pm\sqrt{a^{2}+4a})/2$ (note that $|\lambda_{1,2}|<1$) this difference equation, with the two starting values $2a$ and $4a^{2}$, has, for $n\geq 1$, the solution

[TABLE]

For $p<1/2$ we have $\sqrt{a^{2}+4a}=i\sqrt{|a^{2}+4a|}$ , but the solution is still real. Next,

[TABLE]
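The remark that the solution of the difference equation stays real even when the roots are complex can be checked numerically. The sketch below assumes the recursion $e_{n+1}=a\,(e_{n}+e_{n-1})$, inferred from the characteristic roots $\lambda_{1,2}=(a\pm\sqrt{a^{2}+4a})/2$, and assigns the starting values $2a$ and $4a^{2}$ to the indices $n=1$ and $n=2$; both the form of the recursion and this indexing are assumptions made for illustration, as is the function name.

```python
import cmath

def closed_form_sequence(a, n_max):
    """Solve e_{n+1} = a * (e_n + e_{n-1}) with e_1 = 2a, e_2 = 4a^2 via the
    roots of lambda^2 = a * lambda + a; returns (roots, closed-form values)."""
    disc = cmath.sqrt(a * a + 4 * a)            # imaginary when -4 < a < 0
    lam1, lam2 = (a + disc) / 2, (a - disc) / 2
    # fit e_n = c1 * lam1**n + c2 * lam2**n to the two starting values
    det = lam1 * lam2 ** 2 - lam2 * lam1 ** 2
    c1 = (2 * a * lam2 ** 2 - 4 * a * a * lam2) / det
    c2 = (4 * a * a * lam1 - 2 * a * lam1 ** 2) / det
    values = [c1 * lam1 ** n + c2 * lam2 ** n for n in range(1, n_max + 1)]
    return (lam1, lam2), values

p = 0.3
a = p - 0.5                                     # a < 0, so the roots are complex
(lam1, lam2), closed = closed_form_sequence(a, 20)

direct = [2 * a, 4 * a * a]                     # iterate the recursion directly
while len(direct) < 20:
    direct.append(a * (direct[-1] + direct[-2]))

max_imag = max(abs(z.imag) for z in closed)
max_diff = max(abs(z.real - d) for z, d in zip(closed, direct))
print(abs(lam1), abs(lam2), max_imag, max_diff)
```

For $p<1/2$ the two roots are complex conjugates with modulus $\sqrt{-a}<1$, so the imaginary parts cancel in the closed form and the sequence decays geometrically.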

The second moment is more tedious. We begin with

[TABLE]

and obtain

[TABLE]

As for the mixed moments,

[TABLE]

By the usual trick we find

[TABLE]

With $\zeta_{n}=E(S_{n}X_{n})$ we find that

[TABLE]

from which it follows that $\zeta_{n}\to\frac{p^{2}-2p+7/4}{(1-p)(3-2p)}$ , the stationary solution.

Next,

[TABLE]

We finally arrive, recalling (A.4), at

[TABLE]

and thus, via telescoping, at

[TABLE]

$\Box$

A.3 Calculation of second moments in Section 10

We first note that $E(T_{1}^{2})=1$ , and, by (3.5), that, for $n\geq 1$ ,

[TABLE]

At this point we have to pause and compute the mixed moments: We first note that $E(T_{1}X_{1})=1$ , and that

[TABLE]

so that

[TABLE]

For $n\geq 2$ we exploit (3.4), (10.2), and the fact that $X_{n}^{2}=1$ , to obtain

[TABLE]

Another application of Proposition 3.2(i) then tells us that

[TABLE]

Hence, using (A.5), we obtain

[TABLE]

after which we, via telescoping, obtain that

[TABLE]

A.4 Calculation of second moments in Section 11

The point of departure in this case is (11.3), viz.,

[TABLE]

For the mixed moments we use (3.4):

[TABLE]

We thus find, using (11.2), that for $n\geq 3$ ,

[TABLE]

Invoking Proposition 3.2(i) then tells us that

[TABLE]

which, inserted into (A.8), yields

[TABLE]

and, after summation,

[TABLE]

We, finally, turn our attention to the second moment for the case when $X_{1}\cdot X_{2}=-1$, where, again, the mixed moment is first in focus. Now, $E(T_{1}X_{1})=1$, $E(T_{2}X_{2})=E(X_{1}X_{2}+X_{2}^{2})=-1+1=0$, and $E(T_{3}X_{3})=E(T_{2}X_{3}+X_{3}^{2})=0+1=1$. For $n\geq 3$ we follow the usual pattern. Due to the fact that the mean is zero, an application of (3.4) now yields

[TABLE]

which, together with Proposition 3.2(i), tells us that

[TABLE]

Moving into second moments, $E(T_{1}^{2})=1$ , $E(T_{2}^{2})=0$ , and $E(T_{3}^{2})=E(X_{3}^{2})=1$ . For $n\geq 3$ we insert our findings in (A.11) into (A.8):

[TABLE]

so that, via telescoping,

[TABLE]

Acknowledgement

The results of this paper were initiated during U.S.'s visit to Uppsala in May 2018. U.S. wishes to express his thanks for the kind hospitality, and we both wish to thank Kungliga Vetenskapssamhället i Uppsala for financial support.

Bibliography

[1] Bercu, B. (2018). A martingale approach for the elephant random walk. J. Phys. A: Math. Theor. 81, 015201.
[2] Cressoni, J.C., da Silva, M.A.A., and Viswanathan, G.M. (2007). Amnestically induced persistence in random walks. J. Phys. A: Math. Theor. 46, 505002.
[3] Doob, J.L. (1953). Stochastic Processes. J. Wiley & Sons, New York.
[4] Engländer, J., and Volkov, S. (2018). Turning a coin over instead of tossing it. J. Theor. Probab. 31, 1097–1118.
[5] Herkenrath, U. (2003). A new approach to Markov processes of order 2. Ann. Univ. Craiova, Math. Comp. Sci. Ser. 30, 106–115.
[6] Herkenrath, U., Iosifescu, M., and Rudolph, A. (2003). A note on invariance principles for iterated random functions. J. Appl. Probab. 40, 834–837.
[7] Ibragimov, I.A., and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters–Noordhoff, Groningen.
[8] Jones, G.L. (2004). On the Markov chain central limit theorem. Prob. Surveys 1, 299–320.