Fluctuations of the Empirical Measure of Freezing Markov Chains

Florian Bouguet; Bertrand Cloez

arXiv:1705.02121·math.PR·May 8, 2017


TL;DR

This paper studies the long-term behavior of empirical measures in a class of freezing Markov chains with decreasing transition probabilities, extending existing results to more general freezing speeds and providing detailed convergence characterizations.

Contribution

It generalizes previous convergence results for freezing Markov chains to arbitrary freezing speeds using stochastic approximation, offering improved limit distribution descriptions and convergence rates.

Findings

1. Generalized convergence results for any freezing speed
2. Characterized limit distributions and convergence rates
3. Provided functional convergence analysis

Abstract

In this work, we consider a finite-state inhomogeneous-time Markov chain whose probabilities of transition from one state to another tend to decrease over time. This can be seen as a cooling of the dynamics of an underlying Markov chain. We are interested in the long time behavior of the empirical measure of this freezing Markov chain. Some recent papers provide almost sure convergence and convergence in distribution in the case of the freezing speed $n^{-\theta}$, with different limits depending on whether $\theta<1$, $\theta=1$ or $\theta>1$. Using stochastic approximation techniques, we generalize these results to any freezing speed, and we obtain a better characterization of the limit distribution, together with rates of convergence and functional convergence.


Equations

$$\mathbb{P}(i_{n+1}=j\mid i_{n}=i)=q_{n}(i,j),\qquad q_{n}(i,j)=p_{n}\left(q(i,j)+r_{n}(i,j)\right),$$

$$\mathcal{L}_{\text{O}}f(y)=-y\cdot\nabla f(y)+\nabla f(y)^{\top}\Sigma^{(p,\Upsilon)}\nabla f(y),$$

$$\Sigma_{k,l}^{(p,\Upsilon)}=\frac{1}{1+\Upsilon}\sum_{i=1}^{D}\nu_{i}\left[\sum_{j=1}^{D}q(i,j)(h_{l,j}-h_{l,i})(h_{k,j}-h_{k,i})-p(\nu_{k}-\mathds{1}_{i=k})(\nu_{l}-\mathds{1}_{i=l})\right],$$

$$\mathcal{L}_{\text{Z}}f(x,i)=(e_{i}-x)\cdot\nabla_{x}f(x,i)+\sum_{j\neq i}aq(i,j)\left[f(x,j)-f(x,i)\right].$$

$$f^{(N)}=\partial_{1}^{N_{1}}\dots\partial_{d}^{N_{d}}f,\qquad\|f^{(N)}\|_{\infty}=\sup_{x\in\mathbb{R}^{d}}|f^{(N)}(x)|.$$

$$\triangle=\left\{(x_{1},\dots,x_{D})\in\mathbb{R}^{D}:x_{i}\in[0,1],\ \sum_{i=1}^{D}x_{i}=1\right\},$$

$$d_{\mathscr{F}}(\mu,\nu)=\sup_{f\in\mathscr{F}}|\mu(f)-\nu(f)|.$$

$$W(\mu,\nu)=\sup_{|f(x)-f(y)|\leq|x-y|}|\mu(f)-\nu(f)|,\qquad d_{\text{TV}}(\mu,\nu)=\sup_{\|f\|_{\infty}\leq 1}|\mu(f)-\nu(f)|$$

$$x\mapsto\frac{\Gamma\left(\sum_{k=1}^{D}\theta_{k}\right)}{\prod_{k=1}^{D}\Gamma(\theta_{k})}\prod_{k=1}^{D}x_{k}^{\theta_{k}-1}\mathds{1}_{\{x\in\triangle\}}.$$

$$x\mapsto\frac{\Gamma(\theta_{1}+\theta_{2})}{\Gamma(\theta_{1})\Gamma(\theta_{2})}x^{\theta_{1}-1}(1-x)^{\theta_{2}-1}\mathds{1}_{0<x<1}.$$

$$\mathbb{P}(i_{n+1}=j\mid i_{n}=i)=q_{n}(i,j).$$

$$q_{n}(i,j)=p_{n}\left(q(i,j)+r_{n}(i,j)\right),$$

$$q(i,i)=-\sum_{j\neq i}q(i,j),\qquad q_{n}(i,i)=-\sum_{j\neq i}q_{n}(i,j).$$

$$\left[\begin{array}{ccc}0&1&0\\ 1&0&0\\ 1/3&1/3&1/3\end{array}\right]$$

$$P^{\top}(\operatorname{Id}+q)P=\left[\begin{array}{ll}A&0\\ B&A^{\prime}\end{array}\right],$$

$$q_{n}=\left[\begin{array}{cc}-n^{-\theta}&n^{-\theta}\\ n^{-(\theta+\widetilde{\theta})}&-n^{-(\theta+\widetilde{\theta})}\end{array}\right],\quad\theta,\widetilde{\theta}>0$$

$$q=\left[\begin{array}{cc}-1&1\\ 0&0\end{array}\right],\quad p_{n}=n^{-\theta}.$$

$$\lim_{n\to+\infty}i_{n}=\nu^{\top}\quad\text{in distribution}.$$

$$\gamma_{n}=\frac{1}{n},\qquad\alpha_{n}=\frac{p_{n}}{\gamma_{n}},$$

$$x_{n}=\gamma_{n}\sum_{k=1}^{n}e_{i_{k}},\qquad y_{n}=\alpha_{n}(x_{n}-\nu).$$

$$x_{n+1}=\frac{\gamma_{n+1}}{\gamma_{n}}x_{n}+\gamma_{n+1}e_{i_{n+1}},$$

$$|x-\widetilde{x}|=\frac{1}{2}d_{\text{TV}}(x^{\top},\widetilde{x}^{\top})=\frac{1}{2}d_{\text{TV}}\left(\sum_{i=1}^{D}x_{i}\delta_{i},\sum_{i=1}^{D}\widetilde{x}_{i}\delta_{i}\right).$$

$$x_{n}=\frac{1}{\sum_{k=1}^{n}\omega_{k}}\sum_{k=1}^{n}\omega_{k}e_{i_{k}},$$

$$|x_{n}-\nu|\leq C\exp\left(-v\sum_{k=1}^{n}\gamma_{k}\right).$$

$$\lambda(\gamma,\epsilon)=-\limsup_{n\to+\infty}\frac{\log(\gamma_{n}\vee\epsilon_{n})}{\sum_{k=1}^{n}\gamma_{k}}.$$

$$\sum_{j\neq i}q(i,j)(h_{j}-h_{i})=\nu-e_{i},\quad\text{or equivalently}\quad\sum_{j\neq i}q(i,j)(h_{k,j}-h_{k,i})=\nu_{k}-\mathds{1}_{i=k}$$

$$h_{i}=-\int_{0}^{+\infty}\left(e_{i}^{\top}e^{t(\operatorname{Id}+q)}-\nu^{\top}\right)dt.$$

$$p_{n}\underset{n\to+\infty}{\sim}\frac{a}{n}.$$

$$\limsup_{n\to+\infty}\frac{\gamma_{n}}{p_{n}}=0.$$

$$\frac{p_{n+1}}{p_{n}}=1+\frac{\Upsilon}{n}+o\left(\frac{1}{n}\right),\qquad\lim_{n\to+\infty}\frac{R_{n}}{p_{n}\gamma_{n}}=0,$$


Taxonomy

Topics: Stochastic processes and statistical mechanics · Markov Chains and Monte Carlo Methods · Advanced Queuing Theory Analysis

Full text

Florian Bouguet (Inria Nancy – Grand Est, BIGS, IECL)

Bertrand Cloez (MISTEA, INRA, Montpellier SupAgro, Univ. Montpellier)

Abstract

In this work, we consider a finite-state inhomogeneous-time Markov chain whose probabilities of transition from one state to another tend to decrease over time. This can be seen as a cooling of the dynamics of an underlying Markov chain. We are interested in the long time behavior of the empirical measure of this freezing Markov chain. Some recent papers provide almost sure convergence and convergence in distribution in the case of the freezing speed $1/n^{\theta}$, with different limits depending on whether $\theta<1$, $\theta=1$ or $\theta>1$. Using stochastic approximation techniques, we generalize these results to any freezing speed, and we obtain a better characterization of the limit distribution, together with rates of convergence and functional convergence.

Keywords: Markov chain; Long-time behavior; Piecewise-deterministic Markov process; Ornstein-Uhlenbeck process; Asymptotic pseudotrajectory

Mathematics Subject Classification: 60J10; 60J25; 60F05

Contents

1 Introduction
2 Freezing Markov chains
2.1 Notation
2.2 Assumptions and main results
3 The auxiliary Markov processes
3.1 The exponential zig-zag process
3.2 The Ornstein-Uhlenbeck process
3.3 Acceleration of the jumps
4 Complete graph
4.1 General case
4.2 The turnover algorithm
5 Proofs
5.1 Asymptotic pseudotrajectories in the non-standard setting
5.2 ODE and SDE methods in the standard setting

1 Introduction

Let $(i_{n})_{n\geq 1}$ be an inhomogeneous-time Markov chain with state space $\{1,\dots,D\}$ with the following transitions when $i\neq j$ :

$$\mathbb{P}(i_{n+1}=j\mid i_{n}=i)=q_{n}(i,j),\qquad q_{n}(i,j)=p_{n}\left(q(i,j)+r_{n}(i,j)\right),$$

where $(p_{n})_{n\geq 1}$ is a decreasing sequence converging toward some $p\in[0,1]$, the remainders $r_{n}(i,j)$ tend to $0$ (fast enough) and $q$ is the discrete generator of some $\{1,\dots,D\}$-valued ergodic Markov chain. This model is related to the simulated annealing algorithm, and the sequence $(p_{n})_{n\geq 1}$ can be interpreted as the cooling scheme of an underlying Markov chain generated by $q$. If $p<1$, since $\lim_{n\to+\infty}q_{n}(i,j)=pq(i,j)$, the probability that $(i_{n})_{n\geq 1}$ moves decreases over time, hence the name freezing Markov chain.

The behavior of $(i_{n})_{n\geq 1}$ is simple enough to understand, and depends on the summability of the sequence $(p_{n})_{n\geq 1}$ . The chain $(i_{n})_{n\geq 1}$ shall converge in distribution to the unique invariant probability $\nu^{\top}$ associated to $q$ if $\sum_{n=1}^{\infty}p_{n}=+\infty$ (see Theorem 2.4 below). On the other hand, if $\sum_{n=1}^{\infty}p_{n}<+\infty$ , the Markov chain shall freeze along the way, as a consequence of the Borel-Cantelli Lemma. Then, we shall assume that $\sum_{n=1}^{\infty}p_{n}=+\infty$ , so that we can investigate the convergence of the empirical distribution $x_{n}^{\top}=\frac{1}{n}\sum_{k=1}^{n}\delta_{i_{k}}$ .
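The dynamics described above can be explored numerically. The following is a minimal sketch, assuming $r_{n}(i,j)=0$ and a user-supplied freezing speed; the function name `simulate_empirical_measure` and its parameters are illustrative choices, not notation from the paper.

```python
import random


def simulate_empirical_measure(q, p, n_steps, start=0, seed=0):
    """Simulate the freezing chain (i_n) and return its empirical measure x_n.

    Assumptions of this sketch: r_n = 0, so that for i != j the chain jumps
    from i to j with probability p(n) * q[i][j], and it stays put otherwise.
    States are labelled 0, ..., D-1.
    """
    rng = random.Random(seed)
    d = len(q)
    counts = [0] * d
    state = start
    counts[state] += 1  # i_1 = start
    for n in range(1, n_steps):
        p_n = p(n)
        u = rng.random()
        acc = 0.0
        current = state
        for j in range(d):
            if j == current:
                continue
            acc += p_n * q[current][j]
            if u < acc:
                state = j  # jump i_n -> i_{n+1} = j
                break
        counts[state] += 1
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    # Two-state example with freezing speed p_n = n^{-1/2}.
    q = [[-1.0, 1.0], [1.0, -1.0]]
    print(simulate_empirical_measure(q, lambda n: n ** -0.5, 10_000))
```

With a summable speed such as `lambda n: n ** -2.0`, repeated runs with different seeds illustrate the freezing phenomenon: the chain stops moving after finitely many steps.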

The problem of the convergence of this empirical measure can be traced back to the thesis of Dobrušin [Dob53], and several questions are still open, as pointed out in the recent article [EV16]. Some results can be obtained from the general theory developed in [SV05, Pel12], and [DS07, EV16] study the present model. In particular, convergence results are only obtained for a freezing rate of the form $p_{n}=a/n^{\theta}$ (and $r_{n}(i,j)=0$ ). More precisely,

• if $\theta<1$ then $(x_{n})_{n\geq 1}$ converges to $\nu$ in probability; see [DS07, Theorem 1.2].

• if $\theta<1/2$, then $(x_{n})_{n\geq 1}$ converges to $\nu$ a.s. This can be extended to $1/2\leq\theta<1$ when the state space contains only two points; see [DS07, Theorem 1.2] and [EV16, Corollary 2].

• if $\theta<1$ and $D=2$, then, up to an appropriate scaling, the empirical measure $(x_{n})_{n\geq 1}$ converges in distribution to a Gaussian distribution; see [EV16, Theorem 2].

• if $\theta=1$ then $(x_{n})_{n\geq 1}$ converges in distribution, and the moments of the limit probability are explicit. If $q$ corresponds to the complete graph (see Section 4) then this limit probability is the Dirichlet distribution. When $D=2$, this covers classical distributions such as the Beta, uniform, Arcsine or Wigner distributions; see [DS07, Theorems 1.3 and 1.4] and [EV16, Theorem 1].

• when $D=2$, some convergence results are established for $(x_{n})_{n\geq 1}$ for general sequences $(p_{n})_{n\geq 1}$, under technical conditions that we find hard to check in practice; see [EV16, Theorem 3].

We shall refer to the case $\theta<1$ as standard, since it is related to classic laws of large numbers and central limit theorems. This case was called subcritical in [EV16], in comparison with the critical case $\theta=1$ . Since we can slightly generalize this critical case here, the term non-standard will be preferred from now on. In the present article, we generalize the aforementioned results by proving that, in the standard case, if $\sum_{n=1}^{\infty}(p_{n}n^{2})^{-1}<+\infty$ then $(x_{n})_{n\geq 1}$ converges to $\nu$ a.s., and we also give weaker conditions for convergence in probability; this is the purpose of Theorem 2.11. Under slightly stronger assumptions and up to a rescaling, we obtain convergence of $(x_{n})_{n\geq 1}$ to a Gaussian distribution with explicit variance in Theorem 2.12. Finally, if $p_{n}\sim a/n$ , then $(x_{n})_{n\geq 1}$ converges in distribution exponentially fast to a limit probability (see Theorem 2.9). This distribution is characterized as the stationary measure of a piecewise-deterministic Markov process (PDMP), possesses a density with respect to the Lebesgue measure and satisfies a system of transport equations; see Propositions 3.1 and 3.4. Furthermore, Corollary 3.9 links the standard and non-standard setting by providing a convergence of the rescaled stationary measure of the PDMP to a Gaussian distribution as the switching accelerates. We also investigate the complete graph dynamics in Section 4 and are able to derive explicit results in Propositions 4.1 and 4.2. Most of our convergence results are also provided with quantitative speeds and functional convergences.

In contrast with the Pólya urn model (see for instance [Gou97]), none of these convergences in distribution holds almost surely. However, note that, by letting $p_{n}=1$ for all $n\geq 1$, we can recover classical limit theorems for homogeneous-time Markov chains (see [Jon04]). Furthermore, the remainder term $r_{n}(i,j)$ enables us to deal with different freezing schemes (see Remark 2.3). In particular, the proofs in [DS07] and [EV16] are mainly based on the method of moments, which is why more stringent assumptions are considered there. Our approach is completely different, and is based on the theory of asymptotic pseudotrajectories detailed in [Ben99] and revisited in [BBC16].

Briefly, a sequence is an asymptotic pseudotrajectory of a flow if, for any given time window, the sequence and the flow starting from the same point evolve close to each other (see for instance [BH96, Ben99]). This definition can be formalized for dynamical systems and be extended to discrete sequences of probabilities and continuous Markov semi-groups. This theory allows us to derive the behavior of the sequence of empirical measures $(x_{n})_{n\geq 1}$ from the one of auxiliary continuous-time Markov processes. The interested reader may find illustrations of this phenomenon in [BBC16, Figures 3.1, 3.2 and 3.3], see also Figure 5.1. In the present paper, depending on whether we work in a standard or non-standard setting, these processes are either a diffusive process or a switching PDMP. The careful study of these limit processes is of interest per se, and is done in Section 3. More precisely, Gaussian distributions appear naturally since we deal with an Ornstein-Uhlenbeck process generated by

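$$\mathcal{L}_{\text{O}}f(y)=-y\cdot\nabla f(y)+\nabla f(y)^{\top}\Sigma^{(p,\Upsilon)}\nabla f(y),$$

where $\Sigma^{(p,\Upsilon)}$ is a $D\times D$ real-valued matrix such that

$$\Sigma_{k,l}^{(p,\Upsilon)}=\frac{1}{1+\Upsilon}\sum_{i=1}^{D}\nu_{i}\left[\sum_{j=1}^{D}q(i,j)(h_{l,j}-h_{l,i})(h_{k,j}-h_{k,i})-p(\nu_{k}-\mathds{1}_{i=k})(\nu_{l}-\mathds{1}_{i=l})\right],$$

with $p$ and $h$ respectively defined in Assumption 2.1 and in (2.6). On the contrary, we shall see that, in a non-standard framework, the empirical measure is linked to a PDMP, called the exponential zig-zag process, generated by

$$\mathcal{L}_{\text{Z}}f(x,i)=(e_{i}-x)\cdot\nabla_{x}f(x,i)+\sum_{j\neq i}aq(i,j)\left[f(x,j)-f(x,i)\right].$$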

These Markov processes shall be defined and studied more rigorously in Section 3. In particular, besides some classic long-time properties (regularity, invariant measure, rate of convergence…), we prove in Theorem 3.3 the convergence of the exponential zig-zag process to the Ornstein-Uhlenbeck process when the frequency of jumps accelerates, i.e. when $a\to+\infty$ .

The rest of this paper is organized as follows. In Section 2, we specify the notation and assumptions mentioned earlier, which will be used throughout the paper. We also state convergence results for $(x_{n})_{n\geq 1}$, which are Theorems 2.9, 2.11 and 2.12. We study the long-time behavior of the two auxiliary Markov processes in Section 3 and investigate the case of the complete graph in Section 4, for which it is possible to get explicit formulas. The paper is then concluded with the proofs of the main theorems in Section 5.

2 Freezing Markov chains

2.1 Notation

We shall use the following notation throughout the paper:

• If $d$ is a positive integer, a multi-index is a $d$-tuple $N=(N_{1},\dots,N_{d})\in(\{0,1,\dots\}\cup\{+\infty\})^{d}$; the set of multi-indices is endowed with the order $N\leq\widetilde{N}$ if, for all $1\leq i\leq d$, $N_{i}\leq\widetilde{N}_{i}$. We define $|N|=\sum_{i=1}^{d}N_{i}$ and we identify an integer $N$ with the multi-index $(N,\dots,N)$. Likewise, for any $x\in\mathbb{R}^{d}$, let $|x|=\sum_{i=1}^{d}|x_{i}|$.

• For some multi-index $N$ and an open set $U\subseteq\mathbb{R}^{d}$, $\mathscr{C}^{N}(U)$ is the set of functions $f:U\to\mathbb{R}$ which are $N_{i}$ times continuously differentiable in the direction $i$. For any $f\in\mathscr{C}^{N}(U)$, we define

$$f^{(N)}=\partial_{1}^{N_{1}}\dots\partial_{d}^{N_{d}}f,\qquad\|f^{(N)}\|_{\infty}=\sup_{x\in\mathbb{R}^{d}}|f^{(N)}(x)|.$$

When there is no ambiguity, we write $\mathscr{C}^{N}$ instead of $\mathscr{C}^{N}(U)$, and denote by $\mathscr{C}^{N}_{b}$ and $\mathscr{C}^{N}_{c}$ the respective sets of bounded $\mathscr{C}^{N}$ functions and of compactly supported $\mathscr{C}^{N}$ functions.

• Let $\triangle$ be the simplex of $\mathbb{R}^{D}$ defined by

$$\triangle=\left\{(x_{1},\dots,x_{D})\in\mathbb{R}^{D}:x_{i}\in[0,1],\ \sum_{i=1}^{D}x_{i}=1\right\},$$

and $E=\triangle\times\{1,\dots,D\}$.

• We denote by $\mathscr{L}(X)$ the probability distribution of a random vector $X$, and we identify the measures over $\{1,\dots,D\}$ with the $1\times D$ real-valued matrices. Let $\mathbb{L}$ be the Lebesgue measure over $\mathbb{R}^{D}$.

• If $\mu,\nu$ are probability measures and $f$ is a function, we write $\mu(f)=\int f(x)\mu(dx)$. For a class of functions $\mathscr{F}$, we define

$$d_{\mathscr{F}}(\mu,\nu)=\sup_{f\in\mathscr{F}}|\mu(f)-\nu(f)|.$$

Note that, for every class of functions $\mathscr{F}$ considered in this paper, convergence in $d_{\mathscr{F}}$ implies (and is often equivalent to) convergence in distribution (see [BBC16, Lemma 5.1]). In particular, let

$$W(\mu,\nu)=\sup_{|f(x)-f(y)|\leq|x-y|}|\mu(f)-\nu(f)|,\qquad d_{\text{TV}}(\mu,\nu)=\sup_{\|f\|_{\infty}\leq 1}|\mu(f)-\nu(f)|$$

be respectively the Wasserstein distance and the total variation distance.

• For $\theta\in(0,+\infty)^{D}$, let $\mathscr{D}(\theta)$ be the Dirichlet distribution over $\mathbb{R}^{D}$, i.e. the probability distribution with probability density function

$$x\mapsto\frac{\Gamma\left(\sum_{k=1}^{D}\theta_{k}\right)}{\prod_{k=1}^{D}\Gamma(\theta_{k})}\prod_{k=1}^{D}x_{k}^{\theta_{k}-1}\mathds{1}_{\{x\in\triangle\}}.$$

For $\theta_{1},\theta_{2}>0$, let $\beta(\theta_{1},\theta_{2})$ be the Beta distribution over $\mathbb{R}$, i.e. the probability distribution with probability density function

$$x\mapsto\frac{\Gamma(\theta_{1}+\theta_{2})}{\Gamma(\theta_{1})\Gamma(\theta_{2})}x^{\theta_{1}-1}(1-x)^{\theta_{2}-1}\mathds{1}_{0<x<1}.$$

• Let $x\wedge y:=\min(x,y)$ and $x\vee y:=\max(x,y)$ for any $x,y\in\mathbb{R}$.

• We write, for $n\geq 1$, $u_{n}=O(v_{n})$ if there exists some bounded sequence $(h_{n})_{n\geq 1}$ such that $u_{n}=h_{n}v_{n}$. Moreover, if $\lim_{n\to+\infty}h_{n}=0$, then we write $u_{n}=o(v_{n})$.

2.2 Assumptions and main results

Let $D$ be a positive integer and $(i_{n})_{n\geq 1}$ be a $\{1,\dots,D\}$ -valued inhomogeneous-time Markov chain such that, $\forall i\neq j$ ,

$$\mathbb{P}(i_{n+1}=j\mid i_{n}=i)=q_{n}(i,j).$$

The following assumption, which will be in force in the rest of the paper, describes the behavior of the transitions $q_{n}$ as time goes by.

Assumption 2.1 (Freezing speed).

Assume that the matrix $\operatorname{Id}+q$ is irreducible and, for $n\geq 1$ and $i\neq j$,

$$q_{n}(i,j)=p_{n}\left(q(i,j)+r_{n}(i,j)\right),$$

where $(p_{n})$ is a sequence decreasing to $p\in[0,1]$ such that $\sum_{n=1}^{\infty}p_{n}=+\infty$, and $\lim_{n\to+\infty}r_{n}(i,j)=0$. For $i\neq j$, assume $q(i,j)\geq 0$, $q_{n}(i,j)\geq 0$ and

$$q(i,i)=-\sum_{j\neq i}q(i,j),\qquad q_{n}(i,i)=-\sum_{j\neq i}q_{n}(i,j).$$

Note that we do not require $(p_{n})_{n\geq 1}$ to converge to 0. Of course, if $p>0$, then the series $\sum_{n}p_{n}$ trivially diverges; as pointed out in the introduction, if this series converges then the problem is trivial. In fact, if $p_{n}=1$ and $r_{n}(i,j)=0$ for any integers $i,j,n$, then the freezing Markov chain $(i_{n})_{n\geq 1}$ is a classic Markov chain. When $p=0$, the dynamics of Assumption 2.1 corresponds to the lazier and lazier random walk introduced in [BBC16].
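As an illustration of Assumption 2.1, the one-step transition matrix $\operatorname{Id}+q_{n}$ at a given time can be assembled from $q$, $r_{n}$ and $p_{n}$. The sketch below is only a plausible way to organize this computation; the helper name `transition_matrix` is hypothetical and the sanity checks are additions of this example.

```python
def transition_matrix(q, r_n, p_n):
    """Return Id + q_n as a list of rows.

    Off the diagonal, q_n(i, j) = p_n * (q(i, j) + r_n(i, j)); the diagonal
    entry of Id + q_n is chosen so that each row sums to one, which is the
    convention q_n(i, i) = -sum_{j != i} q_n(i, j).
    """
    d = len(q)
    P = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j:
                rate = p_n * (q[i][j] + r_n[i][j])
                if rate < 0.0:
                    raise ValueError("q_n(i, j) must be nonnegative for i != j")
                P[i][j] = rate
        # P[i][i] is still 0.0 here, so sum(P[i]) is the off-diagonal mass.
        P[i][i] = 1.0 - sum(P[i])
        if P[i][i] < 0.0:
            raise ValueError("row %d of Id + q_n is not a probability vector" % i)
    return P


if __name__ == "__main__":
    q = [[-1.0, 1.0], [1.0, -1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(transition_matrix(q, zero, 1.0))   # classic chain: p_n = 1
    print(transition_matrix(q, zero, 0.25))  # slowed-down chain
```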

Remark 2.2 (Irreducibility or indecomposability).

The irreducibility of the transition matrix $\operatorname{Id}+q$ associated to $q$ is a classic hypothesis when it comes to Markov chains, since otherwise we can split their state space into different recurrent classes. However, the results of the present article can be extended to indecomposable Markov chains, which is a weaker concept. (Footnote 1: the algebraic term indecomposable also exists for matrices, and is sometimes mistaken for irreducibility. Throughout this paper, a Markov chain, or its associated transition matrix, is said to be indecomposable if it admits a unique recurrent class.) For instance, the transition matrix

$$\left[\begin{array}{ccc}0&1&0\\ 1&0&0\\ 1/3&1/3&1/3\end{array}\right]$$

is indecomposable but not irreducible. Namely, $\operatorname{Id}+q$ is irreducible if it cannot be written as

$$P^{\top}(\operatorname{Id}+q)P=\left[\begin{array}{ll}A&0\\ B&A^{\prime}\end{array}\right],$$

where $A,A^{\prime}$ are square matrices and $P$ is a permutation matrix. We could allow such a decomposition, as long as $B$ has a nonzero entry.

In any case, $\operatorname{Id}+q$ possesses a unique absorbing class of states on which it is irreducible. Using the Perron-Frobenius theorem (see [Gan59, Theorem 2, p. 53]), the matrix $\operatorname{Id}+q$ possesses a unique invariant measure $\nu^{\top}$, and the associated chain converges toward it under aperiodicity assumptions (see also Remark 3.2). Note that aperiodicity hypotheses are not relevant for the freezing Markov chain whenever $p<1$, since the freezing scheme automatically provides aperiodicity to the Markov chain.

$\diamondsuit$

Under Assumption 2.1, $\operatorname{Id}+q$ possesses a unique invariant distribution $\nu^{\top}$, which satisfies $\nu^{\top}q=0$; let $\nu\in\triangle$ be its associated vector.
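For a concrete matrix, $\nu$ can be approximated numerically. The sketch below uses power iteration on the lazy matrix $(\operatorname{Id}+P)/2$ with $P=\operatorname{Id}+q$, which has the same invariant distribution as $P$ and avoids periodicity issues; this numerical scheme and the name `invariant_distribution` are choices made for this example, not part of the paper.

```python
def invariant_distribution(P, n_iter=500):
    """Approximate nu such that nu^T P = nu^T by iterating the lazy chain.

    P is a row-stochastic matrix (list of rows) with a unique recurrent
    class. One iteration maps mu to mu (Id + P) / 2.
    """
    d = len(P)
    mu = [1.0 / d] * d
    for _ in range(n_iter):
        mu = [
            0.5 * mu[j] + 0.5 * sum(mu[i] * P[i][j] for i in range(d))
            for j in range(d)
        ]
    return mu


if __name__ == "__main__":
    # Indecomposable but not irreducible example from Remark 2.2.
    P = [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0],
        [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
    ]
    print(invariant_distribution(P))
```

On the matrix of Remark 2.2, the third state is transient, so the approximation puts essentially all of its mass on the first two states.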

Remark 2.3 (Interpretation of the term $r_{n}(i,j)$ ).

The remainder $r_{n}(i,j)$ in (2.1) can either model small perturbations of the main freezing speed $p_{n}q(i,j)$ , or a multiscale freezing scheme with $p_{n}$ being the slowest freezing speed. For instance, the case

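$$q_{n}=\left[\begin{array}{cc}-n^{-\theta}&n^{-\theta}\\ n^{-(\theta+\widetilde{\theta})}&-n^{-(\theta+\widetilde{\theta})}\end{array}\right],\quad\theta,\widetilde{\theta}>0$$

is covered by Assumption 2.1, with

$$q=\left[\begin{array}{cc}-1&1\\ 0&0\end{array}\right],\quad p_{n}=n^{-\theta}.$$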

$\diamondsuit$
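The multiscale example of Remark 2.3 can be checked numerically. In the sketch below, the remainder $r_{n}$ is not given explicitly in the text; the value used, $r_{n}(2,1)=n^{-\widetilde{\theta}}$ with $r_{n}(1,2)=0$, is the one forced by the identity $q_{n}=p_{n}(q+r_{n})$ and should be read as a derivation made for this example. Function names are hypothetical.

```python
def q_n_multiscale(n, theta, theta_tilde):
    """The two-speed generator q_n of Remark 2.3, written directly."""
    fast = n ** -theta
    slow = n ** -(theta + theta_tilde)
    return [[-fast, fast], [slow, -slow]]


def q_n_decomposed(n, theta, theta_tilde):
    """The same matrix rebuilt as p_n * (q + r_n).

    Here q = [[-1, 1], [0, 0]], p_n = n^{-theta}, and r_n is derived for
    this example: only the second row is perturbed, by n^{-theta_tilde}.
    """
    q = [[-1.0, 1.0], [0.0, 0.0]]
    r = [[0.0, 0.0], [n ** -theta_tilde, -(n ** -theta_tilde)]]
    p_n = n ** -theta
    return [[p_n * (q[i][j] + r[i][j]) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    print(q_n_multiscale(10, 0.5, 0.25))
    print(q_n_decomposed(10, 0.5, 0.25))
```

Since $\widetilde{\theta}>0$, the derived remainder tends to 0 as $n$ grows, in line with the requirement on $r_{n}$ in Assumption 2.1.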

The following result characterizes the long-time behavior of the inhomogeneous Markov chain $(i_{n})_{n\geq 1}$ .

Theorem 2.4 (Convergence of the freezing Markov chain).

Under Assumption 2.1, if either $p<1$ , or $p=1$ and $\operatorname{Id}+q$ is aperiodic,

$$\lim_{n\to+\infty}i_{n}=\nu^{\top}\quad\text{in distribution}.$$

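Now, let $(e_{1},\dots,e_{D})$ be the natural basis of $\mathbb{R}^{D}$ and introduce two different scaling rates

$$\gamma_{n}=\frac{1}{n},\qquad\alpha_{n}=\frac{p_{n}}{\gamma_{n}},$$

and the associated rescaled vectors

$$x_{n}=\gamma_{n}\sum_{k=1}^{n}e_{i_{k}},\qquad y_{n}=\alpha_{n}(x_{n}-\nu).$$

It is clear that (2.3) can be rewritten as

$$x_{n+1}=\frac{\gamma_{n+1}}{\gamma_{n}}x_{n}+\gamma_{n+1}e_{i_{n+1}},$$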

that the vector $x_{n}$ belongs to the simplex $\triangle$ and that $(x_{n},i_{n})\in E=\triangle\times\{1,\dots,D\}$ . We highlight the fact that, in general, the sequence $(x_{n})_{n\geq 1}$ is not a Markov chain by itself, but $(x_{n},i_{n})_{n\geq 1}$ is.
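The recursive form of the empirical measure, $x_{n+1}=\frac{\gamma_{n+1}}{\gamma_{n}}x_{n}+\gamma_{n+1}e_{i_{n+1}}$ with $\gamma_{n}=1/n$, lends itself to an online update. The following sketch compares it with the direct average; function names and the zero-based state labels are conventions of this example.

```python
def empirical_measure_batch(states, d):
    """x_n = (1/n) * sum_k e_{i_k}, computed directly from the whole path."""
    n = len(states)
    x = [0.0] * d
    for i in states:
        x[i] += 1.0 / n
    return x


def empirical_measure_online(states, d):
    """Same vector, computed with x_n = (gamma_n / gamma_{n-1}) x_{n-1} + gamma_n e_{i_n}.

    With gamma_n = 1/n the ratio gamma_n / gamma_{n-1} equals (n - 1) / n;
    the first step uses the convention x_0 = 0.
    """
    x = [0.0] * d
    for n, i in enumerate(states, start=1):
        ratio = (n - 1) / n
        x = [ratio * xk for xk in x]
        x[i] += 1.0 / n
    return x


if __name__ == "__main__":
    path = [0, 1, 1, 2, 0, 1]
    print(empirical_measure_batch(path, 3))
    print(empirical_measure_online(path, 3))
```

The online version only needs the current pair $(x_{n},i_{n})$, which mirrors the remark that $(x_{n},i_{n})_{n\geq 1}$ is a Markov chain.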

Remark 2.5 (Interpretation of $\triangle$ ).

The transpose $x\mapsto x^{\top}$ is a natural bijection between $\triangle$ and the set of probability measures over $\{1,\dots,D\}$ . Then, the sequence $(x_{n}^{\top})_{n\geq 1}$ can be viewed as the sequence of empirical measures of the Markov chain $(i_{n})_{n\geq 1}$ . From that viewpoint, we highlight the fact that the $L^{1}$ norm over $\triangle$ can be interpreted (up to a multiplicative constant) as the total variation distance: indeed, for any $x,\widetilde{x}\in\triangle,$

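$$|x-\widetilde{x}|=\frac{1}{2}d_{\text{TV}}(x^{\top},\widetilde{x}^{\top})=\frac{1}{2}d_{\text{TV}}\left(\sum_{i=1}^{D}x_{i}\delta_{i},\sum_{i=1}^{D}\widetilde{x}_{i}\delta_{i}\right).$$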

$\diamondsuit$

Remark 2.6 (Weighted means).

Note that one could consider weighted means of the form

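$$x_{n}=\frac{1}{\sum_{k=1}^{n}\omega_{k}}\sum_{k=1}^{n}\omega_{k}e_{i_{k}},$$

for any sequence of positive weights $(\omega_{n})_{n\geq 1}$, as in [BC15, Remark 1.1] or [BBC16, Section 3.1]. Then, we define $\gamma_{n}=\sum_{k=1}^{n}\omega_{k}$, and Theorem 2.11 below still holds with the bound

$$|x_{n}-\nu|\leq C\exp\left(-v\sum_{k=1}^{n}\gamma_{k}\right).$$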

$\diamondsuit$

Following [Ben99, BBC16], and given sequences $(\gamma_{n})_{n\geq 1},(\epsilon_{n})_{n\geq 1}$ , we define the following parameter which rules the speed of convergence in the context of standard fluctuations:

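$$\lambda(\gamma,\epsilon)=-\limsup_{n\to+\infty}\frac{\log(\gamma_{n}\vee\epsilon_{n})}{\sum_{k=1}^{n}\gamma_{k}}.$$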

Finally, we need to introduce a fundamental tool in the study of the standard fluctuations: the matrix $h$ , which is solution of the multidimensional Poisson equation

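$$\sum_{j\neq i}q(i,j)(h_{j}-h_{i})=\nu-e_{i},\quad\text{or equivalently}\quad\sum_{j\neq i}q(i,j)(h_{k,j}-h_{k,i})=\nu_{k}-\mathds{1}_{i=k}$$

for all $1\leq i,k\leq D$, where we denoted by $h_{i}$ the $i$-th column vector of the matrix $h$. This solution is classically defined by

$$h_{i}=-\int_{0}^{+\infty}\left(e_{i}^{\top}e^{t(\operatorname{Id}+q)}-\nu^{\top}\right)dt.$$

With the help of the Perron-Frobenius theorem (see [Gan59, Theorem 2, p. 53]), it is easy to see that $h$ is well-defined.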

Throughout the paper, we shall treat two different cases, which entail different limit behaviors for the fluctuations of $(x_{n})_{n\geq 1}$ or $(y_{n})_{n\geq 1}$ . Each of these cases corresponds to one of the two following assumptions.

Assumption 2.7 (Non-standard behavior).

Assume that

$$p_{n}\underset{n\to+\infty}{\sim}\frac{a}{n}.$$

Note that, under Assumption 2.7, the sequences $(\gamma_{n})_{n\geq 1}$ and $(p_{n})_{n\geq 1}$ are equivalent up to a multiplicative constant and the scaling $(\alpha_{n})_{n\geq 1}$ is trivial, hence we are not interested in the behavior of $(y_{n})_{n\geq 1}$ .

Assumption 2.8 (Standard behavior).

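i) Assume that

$$\limsup_{n\to+\infty}\frac{\gamma_{n}}{p_{n}}=0.$$

ii) Assume that

$$\frac{p_{n+1}}{p_{n}}=1+\frac{\Upsilon}{n}+o\left(\frac{1}{n}\right),\qquad\lim_{n\to+\infty}\frac{R_{n}}{p_{n}\gamma_{n}}=0,$$

with $R_{n}=\sup_{i}\sum_{j\neq i}|r_{n}(i,j)|$.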

Now, we have all the tools needed to study the behavior of the empirical measure $(x_{n})_{n\geq 1}$ .

Theorem 2.9 (Non-standard fluctuations).

Under Assumptions 2.1 and 2.7,

[TABLE]

where $\pi$ is characterized in Propositions 3.1 and 3.4.

Moreover, if there exist positive constants $A\geq 1,\theta\leq 1$ such that

[TABLE]

then, denoting by $\rho$ the spectral gap of $\operatorname{Id}+q$ , for any

[TABLE]

there exist a class of functions $\mathscr{F}$ defined in (5.4) and a positive constant $C$ such that

[TABLE]

It should be noted that our approach for the study of the long-time behavior of $(x_{n},i_{n})_{n\geq 0}$ also provides functional convergence for some interpolated process $(X_{t},I_{t})_{t\geq 0}$ defined in (5.3) (see Lemma 5.1, of which Theorem 2.9 is a straightforward consequence). Moreover, note that the speed of convergence provided by Theorem 2.9 can be stated as follows: for any function $f:\triangle\times\{1,\dots,D\}\to\mathbb{R}$ that is twice differentiable in the first variable, there exists a constant $C_{f}$ such that

[TABLE]

Remark 2.10 (Is it possible to generalize Assumption 2.7?).

This remark leans heavily on the proof of Theorem 2.9 and may be omitted at first reading. It is interesting to wonder whether it is possible to obtain non-standard fluctuations for a more general freezing speed $(p_{n})_{n\geq 1}$. To that end, let us try to mimic the computations of the proof of Lemma 5.1 with $(\widetilde{x}_{n},i_{n})_{n\geq 1}$ with

[TABLE]

for any vanishing sequences $(\gamma_{n})_{n\geq 1}$ and $(\widetilde{\gamma}_{n})_{n\geq 1}$ . Our method being based on asymptotic pseudotrajectories, the limit of the rescaled process of $(x_{n},i_{n})_{n\geq 1}$ belongs to a certain class of PDMPs which can be attained if, and only if,

[TABLE]

with $C_{1},C_{2},C_{3}>0$ . Without loss of generality, one can choose $\gamma_{n}=\widetilde{\gamma}_{n}$ and $C_{2}=C_{3}=1$ . Then, the third term of (2.7) entails $\gamma_{n}=(n+o(1))^{-1}$ as $n\to+\infty$ , which in turn implies $p_{n}=C_{1}n^{-1}+o(n^{-1})$ when injected in the first term of (2.7).

Also, note that assuming $A<1$ or $\theta>1$ in Theorem 2.9 would not provide better speeds of convergence, since one would obtain a speed of the form

[TABLE]

$\diamondsuit$

Theorem 2.11 (Standard convergence of the empirical measure).

Under Assumptions 2.1 and 2.8.i),

[TABLE]

or equivalently in $L^{1}$ .

Moreover, if $\sum_{n=1}^{\infty}\gamma_{n}^{2}p_{n}^{-1}<+\infty,$ then $\lim_{n\to+\infty}x_{n}=\nu$ a.s.

Moreover, if $\ell=\lambda(\gamma,\gamma/p)\wedge\lambda(\gamma,R)>0,$ then, for any $v<\ell$ there exists a (random) constant $C>0$ such that

[TABLE]

Theorem 2.12 (Standard fluctuations).

Under Assumptions 2.1 and 2.8, $(y_{n})_{n\geq 1}$ converges in distribution to the Gaussian distribution $\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right)$.

The precise proofs of the main results are deferred to Section 5. As pointed out in the introduction, our proofs of Theorems 2.9 and 2.12 rely on comparing $(x_{n})_{n\geq 1}$ and $(y_{n})_{n\geq 1}$ with auxiliary continuous-time Markov processes, using the theory of asymptotic pseudotrajectories and the SDE method. Then, these discrete Markov chains will inherit some properties of the Markov processes that we shall prove in Section 3. In particular, the results we use provide functional convergence of the rescaled interpolating processes to the auxiliary Markov processes (see [BBC16, Theorem 2.12] and [Duf96, Théorème 4.II.4]).

Remark 2.13 (Examples of freezing rates).

For the sake of simplicity, consider $r_{n}(i,j)=0$ for all $i,j,n$ . Assumption 2.8 covers sequences $(p_{n})_{n\geq 1}$ of the form $p_{n}=n^{-\theta}$ for any $0<\theta<1$ , since $\gamma_{n}^{2}p_{n}^{-1}=n^{\theta-2}$ . In this case, $\ell=\lambda(n^{-1},n^{\theta-1})=1-\theta>0$ .
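The value of $\ell$ in this example can be recovered by unfolding the definition of $\lambda$ from Section 2.2. The computation below is a sketch written for this purpose; it uses $R_{n}=0$ (since $r_{n}=0$), the fact that $n^{\theta-1}\geq n^{-1}$ for $\theta>0$, and the classical estimate $\sum_{k=1}^{n}k^{-1}=\log n+O(1)$.

```latex
\lambda\left(n^{-1},n^{\theta-1}\right)
  = -\limsup_{n\to+\infty}
    \frac{\log\left(n^{-1}\vee n^{\theta-1}\right)}{\sum_{k=1}^{n}k^{-1}}
  = -\lim_{n\to+\infty}\frac{(\theta-1)\log n}{\log n+O(1)}
  = 1-\theta,
\qquad
\lambda\left(n^{-1},0\right)
  = -\lim_{n\to+\infty}\frac{-\log n}{\log n+O(1)}
  = 1,
\qquad\text{hence}\qquad
\ell=(1-\theta)\wedge 1=1-\theta .
```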

But we can also consider more exotic freezing rates, for instance $p_{n}=\log(n)^{\zeta}n^{-1},$ for some $\zeta\geq 1$ . Then, $\gamma_{n}^{2}p_{n}^{-1}=n^{-1}\log(n)^{-\zeta}$ . If $\zeta>1$ , then the series converges and $\ell=1$ . Our results do not provide almost sure convergence in the case $\zeta=1$ , however, but only convergence in probability.

It should be noted that assuming that $(p_{n})_{n\geq 1}$ is decreasing, $\lim_{n\to+\infty}p_{n}=0$ and $\sum_{n=1}^{\infty}p_{n}=+\infty$ do not imply in general that $p_{n+1}\sim p_{n}$ . A slight modification of the proof shows that, if $p_{n+1}$ is not equivalent to $p_{n}$ , we have to assume the existence of a sequence $(\beta_{n})_{n\geq 1}$ such that

[TABLE]

and such that the sequences $(\gamma_{n}^{2}\beta_{n}^{2}p_{n}^{-1})_{n\geq 1}$ and $(\beta_{n}\gamma_{n})_{n\geq 1}$ are decreasing; then the conclusion of Theorem 2.12 holds.

$\diamondsuit$

3 The auxiliary Markov processes

In this section, we study the ergodicity of the processes arising as limits of the freezing Markov chain from Section 2. We also study their invariant measures, and provide explicit formulas when possible.

3.1 The exponential zig-zag process

In this section, we investigate the asymptotic properties of the exponential zig-zag process, which arises from the non-standard scaling of the Markov chain $(i_{n})_{n\geq 1}$. To this end, let $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ be the strong solution of the following SDE (see [IW89]), with values in $E$:

[TABLE]

where the $N_{i,j}$ are independent Poisson processes of intensity $aq(i,j)\mathds{1}_{\{i\neq j\}}$ and

[TABLE]

Thus, the infinitesimal generator of this process is $\mathcal{L}_{\text{Z}}$ defined in (1.3) (see e.g. [EK86, Dav93, Kol11]). Actually, the exponential zig-zag process is a PDMP; the interested reader can consult [Dav93, BLBMZ15] for a detailed construction of the process $(\mathbf{X},\mathbf{I})$ . Let us describe briefly its dynamics: setting $\mathbf{I}_{0}=i$ , the process possesses a continuous component $\mathbf{X}$ which is exponentially attracted to the vector $e_{i}$ . The discrete component $\mathbf{I}_{t}$ is piecewise-constant, and jumps from $i$ to $j$ following the epochs of the processes $N_{i,j}$ , which in turn leads the continuous component to be attracted to $e_{j}$ (see Figure 3.1 for sample paths of the exponential zig-zag process, and Figure 4.2 for a typical path in the framework of Section 4.2).
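The dynamics just described can be simulated exactly, because between two jumps the continuous component solves $\dot{x}=e_{i}-x$, that is $x(t)=e_{i}+(x(0)-e_{i})e^{-t}$. The sketch below is one possible implementation under that reading; the function name `simulate_zigzag`, its parameters and the zero-based state labels are assumptions of this example rather than notation from the paper.

```python
import math
import random


def simulate_zigzag(q, a, x0, i0, t_end, seed=0):
    """Return (X_t, I_t) at time t_end for the exponential zig-zag process.

    Between jumps, X is attracted exponentially to e_I; the discrete
    component I jumps from i to j != i at rate a * q[i][j].
    """
    rng = random.Random(seed)
    d = len(q)
    x, i, t = list(x0), i0, 0.0
    while True:
        rates = [a * q[i][j] if j != i else 0.0 for j in range(d)]
        total = sum(rates)
        wait = rng.expovariate(total) if total > 0.0 else math.inf
        remaining = t_end - t
        dt = min(wait, remaining)
        decay = math.exp(-dt)
        # Exact flow of dx/dt = e_i - x over a time span dt.
        x = [
            (1.0 if k == i else 0.0) + (x[k] - (1.0 if k == i else 0.0)) * decay
            for k in range(d)
        ]
        if wait >= remaining:
            return x, i
        t += wait
        # Pick the new discrete state proportionally to the jump rates.
        u = rng.random() * total
        acc = 0.0
        for j in range(d):
            acc += rates[j]
            if u < acc:
                i = j
                break


if __name__ == "__main__":
    q = [[-1.0, 1.0], [1.0, -1.0]]
    print(simulate_zigzag(q, a=2.0, x0=[0.5, 0.5], i0=0, t_end=10.0))
```

Because each update is a convex combination of the current position and a vertex $e_{i}$, the simulated continuous component stays in the simplex $\triangle$.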

The following result might be seen as a direct consequence of [BLBMZ12, Theorem 1.10] or [CH15, Theorem 1.4], although these articles do not provide explicit rates of convergence, which are useful for instance in the proof of Corollary 3.9.

Proposition 3.1 (Ergodicity).

The exponential zig-zag process $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ admits a unique stationary distribution $\pi$ . If $\rho$ is the spectral gap of $q$ , then for any $v<a\rho(1+a\rho)^{-1}$ , there exists a constant $C>0$ such that

[TABLE]

Moreover, if $\mathscr{L}(\mathbf{I}_{0})=\nu^{\top}$ , then

[TABLE]

Note that the speed of convergence provided in Proposition 3.1 can be improved when $D=2$ , since we are able to use more refined couplings (see Proposition 4.5).

Proof of Proposition 3.1:

The pattern of this proof follows [BLBMZ12]. Let $(\mathbf{X}_{t},\mathbf{I}_{t},\widetilde{\mathbf{X}}_{t},\widetilde{\mathbf{I}}_{t})_{t\geq 0}$ be the coupling for which the discrete components $\mathbf{I}$ and $\widetilde{\mathbf{I}}$ remain equal forever once they have met. Let $t>0$ and $\alpha\in(0,1)$ . Firstly, note that, if $\mathbf{I}_{\alpha t}=\widetilde{\mathbf{I}}_{\alpha t}$ , then the processes have common jumps from time $\alpha t$ onwards and

[TABLE]

From the Perron-Frobenius theorem (see [Gan59, SC97]), for any $\varepsilon>0$ , there exists $\widetilde{C}>0$ such that

[TABLE]

Then there exists a coupling of the random variables $\mathbf{I}_{\alpha t}$ and $\widetilde{\mathbf{I}}_{\alpha t}$ such that

[TABLE]

Now, combining (3.3) and (3.4),

[TABLE]

One can optimize this speed of convergence by taking $\alpha=(1+a\rho-\varepsilon)^{-1}$ , and get

[TABLE]

with $C=2\widetilde{C}+2$ and $v=(a\rho-\varepsilon)(1+a\rho-\varepsilon)^{-1}$ . Then, $(\mathscr{L}(\mathbf{X}_{t},\mathbf{I}_{t}))_{t\geq 0}$ is a Cauchy sequence and converges to a (stationary) distribution $\pi$ . Letting $\mathscr{L}(\widetilde{\mathbf{X}}_{0},\widetilde{\mathbf{I}}_{0})=\pi$ in (3.5) completes the proof in the general case.

Now, if $\mathscr{L}(\mathbf{I}_{0})=\nu^{\top}$ , then $\mathscr{L}(\mathbf{I}_{0})=\mathscr{L}(\widetilde{\mathbf{I}}_{0})$ ; we can let $\mathbf{I}_{0}=\widetilde{\mathbf{I}}_{0}$ , and then it suffices to use (3.3) with $\alpha=0$ . ∎
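The choice $\alpha=(1+a\rho-\varepsilon)^{-1}$ made in the proof can be checked numerically. The sketch below assumes (this is an assumption about the form of the bound, not a quotation of it) that the combined estimate is a sum of two exponentials, one decaying like $\operatorname{e}^{-(1-\alpha)t}$ (contraction of the flow after time $\alpha t$ ) and one decaying like $\operatorname{e}^{-(a\rho-\varepsilon)\alpha t}$ (failure of the coupling of the discrete components before time $\alpha t$ ). The function name `optimal_split` is hypothetical.

```python
def optimal_split(a, rho, eps):
    """Return (alpha, v) balancing the two assumed exponential rates.

    Assumption: the bound is of the form
        exp(-(1 - alpha) * t) + C * exp(-(a * rho - eps) * alpha * t),
    with a * rho - eps > 0.
    """
    lam = a * rho - eps        # assumed decay rate for the discrete coupling
    alpha = 1.0 / (1.0 + lam)  # value of alpha chosen in the proof
    v = lam / (1.0 + lam)      # resulting common rate, as in the proof
    return alpha, v

alpha, v = optimal_split(a=2.0, rho=0.5, eps=0.1)
# At the optimum both exponents coincide with v.
assert abs((1.0 - alpha) - v) < 1e-12
assert abs((2.0 * 0.5 - 0.1) * alpha - v) < 1e-12
```

Any other $\alpha\in(0,1)$ makes one of the two exponents smaller than $v$ , which is why this choice is the best one under the assumed form of the bound.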

If Assumption 2.1 is in force, there exists a unique invariant measure $\pi$ , which satisfies

[TABLE]

for any function $f$ smooth enough. Now, let us establish the absolute continuity of this invariant distribution with respect to the Lebesgue measure $\mathbb{L}$ .

Lemma 3.2 (Absolute continuity of the exponential zig-zag process).

Let $K\subset\mathring{\triangle}$ be a compact set. There exist constants $t_{0},c_{0}>0$ and a neighborhood $V$ of $K$ such that, for any $(x,i)\in E$ and for all $t\geq t_{0}$ ,

[TABLE]

Remark 3.3 (When $\operatorname{Id}+q$ is only indecomposable).

This remark echoes Remark 2.1 and describes the behavior of the Markov chain $(x_{n},i_{n})_{n\geq 1}$ when $\operatorname{Id}+q$ is reducible but indecomposable. In that case, Proposition 3.1 holds as well. However, $\operatorname{Id}+q$ possesses a unique recurrent class which is strictly contained in $\{1,\dots,D\}$ , the vector $\nu$ possesses at least one zero and belongs to the boundary of the simplex $\triangle$ , and $\pi(\mathring{\triangle})=0$ . It is then impossible to obtain an equivalent to Proposition 3.1 with a convergence in total variation; when $\operatorname{Id}+q$ is irreducible, this is possible using techniques inspired by [BMP+15, Proposition 2.5].

If $\operatorname{Id}+q$ is indecomposable, one can obtain equivalents of Lemma 3.2 and Proposition 3.4 below by replacing the Lebesgue measure $\mathbb{L}$ on $\mathbb{R}^{D}$ by the Lebesgue measure on the linear subspace spanned by the recurrent class of $\operatorname{Id}+q$ .

$\diamondsuit$

Proof of Lemma 3.2:

The proof is mainly based on Hörmander-type conditions for switching dynamical systems obtained in [BH12, BLBMZ15]. Using the notation of [BLBMZ15], let $F^{i}:x\mapsto e_{i}-x$ and then, if $D\geq 3$ ,

[TABLE]

where $\operatorname{Vect}A$ denotes the vector space spanned by $A\subseteq\mathbb{R}^{D}$ . If $D=2,$ then $\mathcal{G}_{1}(x)=\mathbb{R}^{2}$ . As a consequence, the strong bracket condition of [BLBMZ15, Definition 4.3] is satisfied. In particular, using [BLBMZ15, Theorems 4.2 and 4.4], we have that, for every $x\in\triangle$ , $y\in\mathring{\triangle}$ , there exist $t_{0}(x),c_{0}(x)>0$ and open sets $U_{0}(x),V(x,y)$ , such that for all $x_{0}\in U_{0}(x),i,j\in\{1,\dots,D\},A\subseteq\triangle$ and $t>t_{0}(x)$ ,

[TABLE]

Now, $\triangle=\cup_{x\in\triangle}U_{0}(x)$ and is compact, so there exist $x_{1},...,x_{n}$ such that $\triangle=\cup_{k=1}^{n}U_{0}(x_{k})$ . In particular, setting $V(y)=\cup_{k=1}^{n}V(x_{k},y)$ , $c_{0}=\min_{1\leq k\leq n}c_{0}(x_{k})$ , $t_{0}=\max_{1\leq k\leq n}t_{0}(x_{k})$ , we have, for all $x_{0}\in\triangle,i,j\in\{1,\dots,D\},A\subseteq\triangle$ and $t>t_{0}$ ,

[TABLE]

Once again, $K$ is compact so we can extract a finite family from the open sets $(V(y))_{y\in K}$ . Using the Markov property, this holds for every $t\geq t_{0}$ , which entails (3.6). ∎

Proposition 3.4 (System of transport equations for $\pi$ ).

The distribution $\pi$ introduced in Proposition 3.1 admits the following decomposition:

[TABLE]

where the function $\varphi$ satisfies, for any $(x,i)\in E$ ,

[TABLE]

Once we have proved that $\pi$ admits the decomposition (3.7), the next step is the characterization of $\varphi$ . Indeed, since it satisfies

[TABLE]

for every smooth enough function $f$ , all we have to do is compute the adjoint operator of $\mathcal{L}_{\text{Z}}$ . For a general switching model, it would not be possible to characterize $\varphi$ as a solution of a simple system of PDEs like (3.8). However, the present form of the flow enables us to derive a simple expression for the adjoint operator of $\mathcal{L}_{\text{Z}}$ . Before turning to the proof of Proposition 3.4, let us present the following formula of integration by parts over the simplex $\triangle$ .

Lemma 3.5 (Integration by parts over $\triangle$ ).

For all $f,g\in\mathscr{C}_{c}^{1}\left(\mathring{\triangle}\right)$ , and $k,l\in\{1,\dots,D\}$ , we have

[TABLE]

Proof of Lemma 3.5:

Fix $l=1$ and let $\triangle_{1}=\left\{x_{2},\dots,x_{D}\in[0,1]:\sum_{i=2}^{D}x_{i}\leq 1\right\}$ . Then,

[TABLE]

Now, as $g(0,x_{2},\dots)=f(0,x_{2},\dots)=0$ and $\partial_{k}1=0$ , use a (classic) multidimensional integration by parts to establish that

[TABLE]

which entails Lemma 3.5. ∎

Proof of Proposition 3.4:

Integrating (3.6) with respect to the unique invariant measure $\pi$ , we obtain that $\pi$ admits an absolutely continuous part (note that uniqueness comes from Proposition 3.1). Since $\pi$ cannot have an absolutely continuous part and a singular one (see [BH12, Theorem 6]), $\pi$ admits a density with respect to the Lebesgue measure, which entails (3.7).

Now, let us characterize the function $\varphi$ . We have

[TABLE]

and, using Lemma 3.5, for any $1\leq i\leq D$ ,

[TABLE]

Hence, (3.9) writes

[TABLE]

It follows that $\varphi$ is the solution of (3.8). ∎

3.2 The Ornstein-Uhlenbeck process

In this short section, we recall a classic property of multidimensional Ornstein-Uhlenbeck processes, which is useful to characterize the behavior of $(y_{n})_{n\geq 1}$ in a standard setting. Thus, we define $(\mathbf{Y}_{t})_{t\geq 0}$ as the strong solution of the following SDE, with values in $\mathbb{R}^{D}$ :

[TABLE]

where $W$ is a standard $D$ -dimensional Brownian motion and $(\Sigma^{(p,\Upsilon)})^{1/2}$ is the square root of the positive-definite symmetric matrix $\Sigma^{(p,\Upsilon)}$ , i.e. $(\Sigma^{(p,\Upsilon)})^{1/2}((\Sigma^{(p,\Upsilon)})^{1/2})^{\top}=\Sigma^{(p,\Upsilon)}$ . The process $\mathbf{Y}$ is a classic Ornstein-Uhlenbeck process with infinitesimal generator $\mathcal{L}_{\text{O}}$ defined in (1.1). Such processes have already been thoroughly studied, so we present only the following proposition, which quantifies the speed of convergence of $\mathbf{Y}$ to its equilibrium.

Proposition 3.6 (Ergodicity of the Ornstein-Uhlenbeck process).

The Markov process $(\mathbf{Y}_{t})_{t\geq 0}$ generated by $\mathcal{L}_{\text{O}}$ in (1.1), with values in $\mathbb{R}^{D}$ , admits a unique stationary distribution $\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right).$

Moreover,

[TABLE]

Proof of Proposition 3.6:

First, since

[TABLE]

a straightforward integration by parts shows that, for any $f\in\mathscr{C}^{2}_{c}$ , $\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right)(\mathcal{L}_{\text{O}}f)=0$ so that $\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right)$ is an invariant measure for the Ornstein-Uhlenbeck process $\mathbf{Y}$ .

It is well-known and easy to check that $(\mathbf{Y}_{t})_{t\geq 0}$ writes

[TABLE]

where $W$ is a standard Brownian motion. Consequently, if we consider $\widetilde{\mathbf{Y}}$ another Ornstein-Uhlenbeck process generated by $\mathcal{L}_{\text{O}}$ and driven by the (same) Brownian motion $W$ ,

[TABLE]

Taking the infimum over all the couplings gives a contraction in Wasserstein distance. Now, if $\mathscr{L}(\widetilde{\mathbf{Y}}_{0})=\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right)$ and $(\mathbf{Y}_{0},\widetilde{\mathbf{Y}}_{0})$ is the optimal coupling between $\mathscr{L}(\mathbf{Y}_{0})$ and $\mathscr{N}\left(0,\Sigma^{(p,\Upsilon)}\right)$ with respect to $W$ , then (3.11) writes

[TABLE]

which entails the uniqueness of the invariant probability distribution as well as the exponential ergodicity of the process. ∎
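The synchronous coupling used in this proof can be illustrated numerically. The following one-dimensional Python sketch assumes an SDE of the form $d\mathbf{Y}_{t}=-\mathbf{Y}_{t}dt+\sigma dW_{t}$ , with a hypothetical scalar diffusion coefficient `sigma` standing in for the matrix square root, and uses an Euler-Maruyama discretisation (an approximation introduced here, not a method from the paper). Because both copies are driven by the same noise, the gap between them evolves deterministically.

```python
import math
import random

def coupled_ou_gap(y0, y0_tilde, sigma, dt, n_steps, rng):
    """Euler-Maruyama for dY = -Y dt + sigma dW, two copies sharing the noise.

    Returns the list of gaps |Y - Y~| along the discretised path.
    """
    y, y_tilde = y0, y0_tilde
    gaps = [abs(y - y_tilde)]
    for _ in range(n_steps):
        # The same Brownian increment drives both processes.
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y = y - y * dt + noise
        y_tilde = y_tilde - y_tilde * dt + noise
        gaps.append(abs(y - y_tilde))
    return gaps
```

In this discretisation the gap after $k$ steps equals $(1-dt)^{k}$ times the initial gap, the discrete counterpart of the factor $\operatorname{e}^{-t}$ obtained for the continuous-time processes, whatever the value of `sigma`.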

3.3 Acceleration of the jumps

The current section links the Sections 3.1 and 3.2 in the following sense:

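[Diagram: Markov chain $(i_{n})_{n\geq 1}$ → Exponential zig-zag process $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ (fast freezing); Markov chain $(i_{n})_{n\geq 1}$ → Ornstein-Uhlenbeck process $(\mathbf{Y}_{t})_{t\geq 0}$ (slow freezing); Exponential zig-zag process → Ornstein-Uhlenbeck process (acceleration of the jumps).]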

Indeed, we prove in Theorem 3.7 the convergence of the (rescaled) exponential zig-zag process to a diffusive process as the jump rates go to infinity. Such results are fairly standard and are already known in the cases of (linear) zig-zag processes (see [FGM12, BD16]) or of particle transport processes (see [CK06]). Heuristically, since there are more frequent jumps, the process tends to concentrate around its mean $\nu$ , and the effect of the discrete component fades away. This phenomenon can be seen on Figure 3.1. We shall end this section with Corollary 3.9, which provides the convergence of the stationary distribution of the exponential zig-zag process toward a Gaussian distribution.

To this end, let $(a_{n})_{n\geq 1}$ be a sequence of positive numbers such that $a_{n}\to+\infty$ as $n\to+\infty$ and, for any integer $n$ , let $(\mathbf{X}_{t}^{(n)},\mathbf{I}_{t}^{(n)})_{t\geq 0}$ be a Markov process with values in $E$ generated by

[TABLE]

We define $\mathbf{Y}^{(n)}_{t}=\sqrt{a_{n}}(\mathbf{X}^{(n)}_{t}-\nu)$ and denote by $\mathbf{Y}^{(n)}_{t}(k)$ and $\mathbf{X}^{(n)}_{t}(k)$ the respective $k^{\text{th}}$ components of $\mathbf{Y}^{(n)}_{t}$ and $\mathbf{X}^{(n)}_{t}$ .

Theorem 3.7 (Convergence of the processes).

If $(\mathbf{Y}^{(n)}_{0})_{n\geq 1}$ converges in distribution to a probability distribution $\mu$ , then the sequence of processes $(\mathbf{Y}^{(n)})_{n\geq 1}$ converges in distribution to the diffusive Markov process generated by

[TABLE]

with initial condition $\mu$ .

Proof of Theorem 3.7:

We shall use a diffusion approximation and follow the proof of [FGM12, Proposition 1.1]. For now, we drop the superscript $(n)$ , and let, for any $1\leq k,l\leq D$ ,

[TABLE]

Then,

[TABLE]

Then, by Dynkin’s formula, for fixed $n$ , the processes $(M_{t}(k))_{t\geq 0}$ and $(N_{t}(k,l))_{t\geq 0}$ are local martingales with respect to the filtration generated by $(\mathbf{X}^{(n)},\mathbf{I}^{(n)})$ , where

[TABLE]

Remark that, for any $1\leq i\leq D$ , if $\sigma_{k,l}(i)=\sum_{j=1}^{D}q(i,j)(h_{l,j}-h_{l,i})(h_{k,j}-h_{k,i})$ ,

[TABLE]

Then, denoting by $\mathbf{Z}_{t}(k)=\int_{0}^{t}\mathbf{Y}_{s}(k)ds$ ,

[TABLE]

and

[TABLE]

By integration by parts,

[TABLE]

hence

[TABLE]

Finally, for any $1\leq k,l\leq D$ , the processes $M^{(n)}(k)-B^{(n)}(k)$ and $M^{(n)}(k)M^{(n)}(l)-A^{(n)}(k,l)$ are local martingales, with

[TABLE]

Note that $\mathbf{I}^{(n)}$ is a Markov process on its own, generated by

[TABLE]

In other words, for any $t>0$ , we can write $\mathbf{I}^{(n)}_{t}=\mathbf{I}_{a_{n}t}$ a.s., for some pure-jump Markov process $(\mathbf{I}_{t})_{t\geq 0}$ generated by

[TABLE]

Using the ergodicity of $(\mathbf{I}_{t})_{t\geq 0}$ together with $\lim_{n\to+\infty}a_{n}=+\infty$ , we have

[TABLE]

Thus, the processes $\mathbf{Y}^{(n)}(k),B^{(n)}(k),A^{(n)}(k,l)$ satisfy the assumptions of [EK86, Chapter 7, Theorem 4.1], which entails Theorem 3.7. ∎

Remark 3.8 (Heuristics for a direct Taylor expansion of the generator).

As for many limit theorems for Markov processes, one would like to predict the convergence of the exponential zig-zag process to the Ornstein-Uhlenbeck diffusion from a Taylor expansion of the generator. Let us describe here a quick heuristic argument based on [CK06], which justifies the particular choice of functions $\varphi_{k}$ in the proof of Theorem 3.7. For the sake of simplicity, let us work in the setting of Section 4.2, that is the generator of $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ is of the form

[TABLE]

where $g_{i}:x\mapsto(\mathds{1}_{\{i=1\}}-x)$ . For some smooth function $f:\mathbb{R}^{D}\to\mathbb{R}$ , we have $\mathcal{L}_{\text{Z}}f(x,i)=g_{i}(x)\cdot\nabla_{x}f(x)$ which cannot be rescaled to converge to some diffusive operator. We need an approximation $f_{a}$ of $f$ in the sense that $\lim_{a\to+\infty}f_{a}=f$ and $\mathcal{L}_{\text{Z}}f_{a}$ has the form of a second order operator. Then, let

[TABLE]

where $h(x,i)$ is the solution of the multidimensional Poisson equation associated to the transitions of the flows

[TABLE]

Then,

[TABLE]

Here, $\sum_{j}\nu_{j}g_{j}(x)-g_{i}(x)=\nu_{1}-\mathds{1}_{\{i=1\}}$ does not depend on $x$ , neither does the function $h$ , which is thus defined by (2.6). Furthermore, $h(x,i)=(\theta_{1}+\theta_{2})^{-1}\mathds{1}_{i=1}$ . Moreover $\lim_{a\to+\infty}g_{i}(\nu+y/\sqrt{a})=e_{i}-\nu$ , so $\lim_{a\to+\infty}\mathcal{L}_{\text{Z}}f_{a}(x,i)=\mathcal{L}_{\text{O}}f(x)$ up to renormalization.

$\diamondsuit$

From Proposition 3.1, for any fixed $n\geq 1$ , the process $(\mathbf{X}_{t}^{(n)},\mathbf{I}_{t}^{(n)})_{t\geq 0}$ admits and converges to a unique invariant distribution $\pi^{(n)}$ , characterized in (3.7) as

[TABLE]

Let $\bar{\pi}^{(n)}$ be the first margin of the invariant measure of the Markov process $(\mathbf{Y}_{t}^{(n)},\mathbf{I}_{t}^{(n)})_{t\geq 0}$ , i.e. the probability distribution over $\mathbb{R}^{D}$ defined by

[TABLE]

Corollary 3.9 (Convergence of the stationary distributions).

The sequence of probability measures $(\bar{\pi}^{(n)})_{n\geq 1}$ converges to $\mathscr{N}\left(0,\Sigma^{(0,1)}\right)$ .

Proof of Corollary 3.9:

Let $n\geq 1,t\geq 0$ and

[TABLE]

Up to a constant, $d_{\mathscr{F}}$ is the Fortet-Mourier distance and metrizes the weak convergence. Fix $t\geq 0$ and let $\mathbf{X}_{0}^{(n)}=\nu$ and $\mathscr{L}(\mathbf{I}^{(n)}_{0})=\nu^{\top}$ . From Theorem 3.7,

[TABLE]

where $\mathbf{Y}$ is an Ornstein-Uhlenbeck process with generator $\mathcal{L}_{\text{O}}$ and initial condition $0$ . Using the definition of $d_{\mathscr{F}}$ and Proposition 3.1,

[TABLE]

Let us check that the term $W\left(\delta_{0},\bar{\pi}^{(n)}\right)$ is uniformly bounded. To that end, let

[TABLE]

so that

[TABLE]

Since $\pi^{(n)}\left(\mathcal{L}^{(n)}f^{(n)}\right)=0$ ,

[TABLE]

Hence, with $C=\sum_{k=1}^{D}h_{k,k}\nu_{k}-\min_{i,j}h_{i,j}$ , and since $\int_{E}x_{k}\pi^{(n)}(dx,di)=\nu_{k}$ ,

[TABLE]

By Hölder’s inequality,

[TABLE]

As a consequence of Proposition 3.6,

[TABLE]

Then,

[TABLE]

which goes to 0 as $t\to+\infty$ . ∎

4 Complete graph

In this section, we consider a particular case of freezing Markov chain, where all the states are connected, and the jump rate to a state does not depend on the position of the chain. This example of Markov chain has already been studied in the literature, for instance in [DS07]. Section 4.1 deals with the general $D$ -dimensional case, for which most of the results of Section 3 can be written explicitly, notably the invariant measure of the exponential zig-zag process, which is a mixture of Dirichlet distributions (see Figure 4.1). Section 4.2 studies more deeply the case $D=2$ , where we can refine the speed of convergence provided in Proposition 3.1.

4.1 General case

Throughout this section, following [DS07], we assume that there exists a positive vector $\theta\in(0,+\infty)^{D}$ such that, for any $1\leq i,j\leq D$ ,

[TABLE]

and we will recover [DS07, Theorem 1.4]. If $D=2$ , let us highlight that an irreducible matrix $\operatorname{Id}+q$ automatically satisfies (4.1) (if $\operatorname{Id}+q$ is indecomposable then this is true as soon as $q(1,2)q(2,1)\neq 0$ ).

Proposition 4.1 (Limit distribution for the complete graph in the non-standard setting).

Under Assumptions 2.1 and 2.7, and if $q$ satisfies (4.1), then $\nu_{i}=\theta_{i}|\theta|^{-1}$ and

[TABLE]

In particular,

[TABLE]

Proof of Proposition 4.1:

If $q$ satisfies (4.1), it is straightforward that its invariant distribution $\nu^{\top}$ is given by $\nu_{i}=\theta_{i}|\theta|^{-1}$ for any $1\leq i\leq D$ . The convergence of $(i_{n})_{n\geq 1}$ to $\nu^{\top}$ and of $(x_{n},i_{n})_{n\geq 1}$ to some distribution $\pi$ are direct corollaries of Theorems 2.4 and 2.9. Moreover, Proposition 3.4 holds, hence $\pi$ satisfies (3.7) and it is clear that

[TABLE]

is the unique (up to a multiplicative constant) solution of (3.8), which entails that

[TABLE]

Finally, if $\mathscr{L}(X,I)=\pi$ , it is clear that $\mathscr{L}(I)=\nu^{\top}$ and that

[TABLE]

∎
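The formula $\nu_{i}=\theta_{i}|\theta|^{-1}$ can be verified numerically. The sketch below assumes that condition (4.1) reads $q(i,j)=\theta_{j}$ for $i\neq j$ , which is consistent with the identification $\theta_{1}=q(2,1)$ , $\theta_{2}=q(1,2)$ used when $D=2$ , and that the diagonal of $q$ is chosen so that each row sums to zero. Both points are assumptions of this illustration, and the function names are hypothetical.

```python
def complete_graph_generator(theta):
    """Rate matrix with q(i, j) = theta[j] for i != j and zero row sums."""
    d = len(theta)
    q = [[theta[j] if j != i else 0.0 for j in range(d)] for i in range(d)]
    for i in range(d):
        q[i][i] = -sum(q[i])  # diagonal entry balances the off-diagonal rates
    return q

def invariant_vector(theta):
    """Candidate invariant distribution nu_i = theta_i / |theta|."""
    total = sum(theta)
    return [t / total for t in theta]

theta = [1.0, 2.0, 3.0]
q = complete_graph_generator(theta)
nu = invariant_vector(theta)
# Check the stationarity equation nu^T q = 0 component by component.
residual = [sum(nu[i] * q[i][j] for i in range(3)) for j in range(3)]
assert all(abs(r) < 1e-12 for r in residual)
```

The check reflects the computation $\sum_{i\neq j}\nu_{i}\theta_{j}-\nu_{j}\sum_{k\neq j}\theta_{k}=\theta_{j}-\nu_{j}|\theta|=0$ .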

In the framework of (4.1), it is also possible to obtain explicitly the solution of the Poisson equation related to $q$ as well as the covariance matrix of the limit distribution in the standard setting. This is the purpose of the following result, whose proof is straightforward using Theorem 2.12 together with the expressions (1.2) and (2.6).

Proposition 4.2 (Limit distribution for the complete graph in the standard setting).

Under Assumptions 2.1 and 2.8, and if $q$ satisfies (4.1), then $\nu=|\theta|^{-1}\theta$ and $h_{i}=|\theta|^{-1}e_{i}$ and

[TABLE]

Finally, let us emphasize the fact that Corollary 3.9 provides an interesting convergence of rescaled Dirichlet distributions, when considered in the particular case of the complete graph.

Corollary 4.3 (Convergence of the rescaled Dirichlet distribution to a Gaussian law).

For any vector $\theta\in(0,+\infty)^{D}$ , if $(X_{n})_{n\geq 1}$ is a sequence of independent random variables such that $\mathscr{L}(X_{n})=\mathscr{D}(a_{n}\theta)$ , then

[TABLE]
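A quick sanity check of this Gaussian scaling is possible at the level of second moments. The Python sketch below uses the classical closed-form variance of a Dirichlet coordinate, $\operatorname{Var}(X(i))=\alpha_{i}(\alpha_{0}-\alpha_{i})\alpha_{0}^{-2}(\alpha_{0}+1)^{-1}$ with $\alpha_{0}=\sum_{k}\alpha_{k}$ , and assumes the rescaling $\sqrt{a_{n}}(X_{n}-\nu)$ , matching the definition of $\mathbf{Y}^{(n)}$ in Section 3.3. The function names are hypothetical, and the check concerns marginal variances only, not the full covariance matrix.

```python
def scaled_dirichlet_variance(theta, a, i):
    """a * Var(X(i)) for X ~ Dirichlet(a * theta), from the closed-form variance."""
    alpha = [a * t for t in theta]
    alpha0 = sum(alpha)
    var = alpha[i] * (alpha0 - alpha[i]) / (alpha0 ** 2 * (alpha0 + 1.0))
    return a * var

def limiting_variance(theta, i):
    """Limit of a * Var(X(i)) as a grows: nu_i * (1 - nu_i) / |theta|."""
    total = sum(theta)
    nu_i = theta[i] / total
    return nu_i * (1.0 - nu_i) / total

theta = [1.0, 2.0, 3.0]
# For large a the rescaled variance is close to its limit.
assert abs(scaled_dirichlet_variance(theta, 1e8, 0) - limiting_variance(theta, 0)) < 1e-6
```

Indeed $a\operatorname{Var}(X(i))=a\nu_{i}(1-\nu_{i})(a|\theta|+1)^{-1}$ , which converges to $\nu_{i}(1-\nu_{i})|\theta|^{-1}$ as $a\to+\infty$ .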

4.2 The turnover algorithm

In this subsection, we consider the turnover algorithm introduced in [EV16]. This algorithm studies the empirical frequency of heads when a coin is turned over with a certain probability, instead of being tossed as usual. The authors provide various convergences in distribution for this proportion, depending on the asymptotic behavior of the turnover probability, which corresponds to $(p_{n})_{n\geq 1}$ in the present paper. However, this turnover algorithm can be seen as a particular case of freezing Markov chain, and can then be written as the stochastic algorithm defined in (2.4), in the special case $D=2$ . Since $x_{n}(1)=1-x_{n}(2)$ , there is only one relevant variable in this section, which belongs to $[0,1]$ :

[TABLE]
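To fix ideas, here is a minimal Python sketch of the turnover mechanism. It is written under simplifying assumptions that are not taken from the displayed recursion: the symmetric case $q(1,2)=q(2,1)=1$ with $r_{n}=0$ , step sizes $\gamma_{n}=1/n$ , and turnover probabilities $p_{n}=an^{-\theta}$ truncated at 1 (the form considered in [EV16]). The function name `turnover_frequency` and the choice of starting on heads are hypothetical.

```python
import random

def turnover_frequency(n_steps, a, theta, rng):
    """Empirical frequency of state 1 when the coin is turned over w.p. p_n.

    Returns (x, heads): the recursively updated frequency and the raw count.
    """
    state = 1   # start on heads (state 1); arbitrary choice
    x = 1.0     # frequency of heads after the first observation
    heads = 1
    for n in range(1, n_steps):
        p_n = min(1.0, a * n ** (-theta))  # turnover probability at step n
        if rng.random() < p_n:
            state = 3 - state              # switch between states 1 and 2
        indicator = 1.0 if state == 1 else 0.0
        heads += int(indicator)
        gamma = 1.0 / (n + 1)
        x += gamma * (indicator - x)       # stochastic-approximation update
    return x, heads
```

With $\gamma_{n}=1/n$ the recursive update is just a running mean, so the returned `x` agrees with `heads / n_steps` up to rounding errors.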

Note that we are in the framework of Section 4.1, with $\theta_{1}=q(2,1)$ and $\theta_{2}=q(1,2)$ , and that Propositions 4.1 and 4.2 hold. In particular, we have $\nu_{i}=\theta_{i}(\theta_{1}+\theta_{2})^{-1}$ . Then, for any $y\in\mathbb{R}$ and $(x,i)\in[0,1]\times\{1,2\}$ , the infinitesimal generators defined in (1.1) and (1.3) write

[TABLE]

and

[TABLE]

Remark 4.4 (Comparison with [EV16]).

In the present paper, we recover [EV16, Theorems 1 and 2] as direct consequences of Theorems 2.9 and 2.12. The aforementioned results are extended by allowing $q(1,2)\neq q(2,1)$ , but mostly by obtaining results for general sequences $(p_{n})_{n\geq 1}$ while [EV16] deals only with $p_{n}=an^{-\theta}$ for positive constants $a$ and $\theta$ . It should be noted that, in order to perfectly mimic the algorithm of the aforementioned article, one should consider the chain $x^{\star}_{n}=\gamma_{n}\sum_{k=1}^{n}\left(\mathds{1}_{\{i_{k}=1\}}-\mathds{1}_{\{i_{k}=2\}}\right)$ , which evolves in $[-1,1]$ . The behavior of this sequence being completely similar to the one we are studying, we chose to work with (4.2) for the sake of consistency.

However, the reader should notice that the invariant measure of the process generated by (4.3) is a Gaussian distribution with variance $\Sigma^{(p,\Upsilon)}_{1,1}$ . In the particular case where $p=0$ and $\theta_{1}=\theta_{2}$ , this variance writes

[TABLE]

which is, at first glance, different from the variance provided in [EV16], which is (under our notation)

[TABLE]

The factor $a^{2}$ comes from the fact that [EV16] studies the behavior of $a^{-1}y_{n}$ . The factor 2 comes from the choice of normalization mentioned earlier, since $x_{n}\in[0,1]$ and $x_{n}^{\star}\in[-1,1]$ .

$\diamondsuit$

Whenever $D=2$ , it is easier to visualize the dynamics of $(\mathbf{X},\mathbf{I})$ (see Figure 4.2), and we can improve the results of Proposition 3.1 concerning the speed of convergence of the exponential zig-zag process to its stationary measure $\pi$ .

Proposition 4.5 (Ergodicity when $D=2$ ).

The Markov process $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ generated by $\mathcal{L}_{\text{Z}}$ in (4.4), with values in $[0,1]\times\{1,2\}$ , admits a unique stationary distribution

[TABLE]

Moreover, let $v=a(\theta_{1}\vee\theta_{2})$ , then

[TABLE]

Since the inter-jump times of the exponential zig-zag process are spread-out, it is also possible to show convergence in total variation with a method similar to [BMP+15, Proposition 2.5]. Note that, following Proposition 4.1, the limit distribution of $(X_{t})_{t\geq 0}$ is the first margin of $\pi$ , namely $\beta(a\theta_{1},a\theta_{2})$ .

Proof of Proposition 4.5:

Without loss of generality, let us assume that $\theta_{1}\geq\theta_{2}$ , that is $v=a\theta_{1}$ . Using Proposition 4.1, it is clear that $\pi$ is the limit distribution of $(\mathbf{X},\mathbf{I})$ . Let us turn to the quantification of the ergodicity of the process. Since the flow is exponentially contracting at rate 1, one can expect the Wasserstein distance of the spatial component $\mathbf{X}$ to decrease exponentially. The only issue is to bring $\mathbf{I}$ to its stationary measure first. So, consider the Markov coupling $\left((\mathbf{X},\mathbf{I}),(\widetilde{\mathbf{X}},\widetilde{\mathbf{I}})\right)$ of $\mathcal{L}_{\text{Z}}$ on $E\times E$ , which evolves independently if $\mathbf{I}\neq\widetilde{\mathbf{I}}$ , and else follows the same flow with common jumps. We set $T_{0}=0$ and denote by $T_{n}$ the epoch of its $n^{\text{th}}$ jump. If $\mathbf{I}_{0}\neq\widetilde{\mathbf{I}}_{0}$ , the first jump is not common a.s., but in any case, since $D=2,$ $\mathbf{I}_{T_{1}}=\widetilde{\mathbf{I}}_{T_{1}}$ a.s. and $\mathscr{L}(T_{1})=\mathscr{E}(v)$ . Consequently,

[TABLE]

Note that if $\mathscr{L}(\mathbf{I}_{0})=\mathscr{L}(\widetilde{\mathbf{I}}_{0})$ , let $\mathbf{I}_{0}=\widetilde{\mathbf{I}}_{0}$ , so that the coupling $\left((\mathbf{X},\mathbf{I}),(\widetilde{\mathbf{X}},\widetilde{\mathbf{I}})\right)$ always has common jumps and

[TABLE]

Letting $(\mathbf{X}_{0},\widetilde{\mathbf{X}}_{0})$ be the optimal Wasserstein coupling entails Wasserstein contraction. The results above hold for any initial conditions $(\widetilde{\mathbf{X}}_{0},\widetilde{\mathbf{I}}_{0})$ . Then, let $\mathscr{L}(\widetilde{\mathbf{X}}_{0},\widetilde{\mathbf{I}}_{0})=\pi$ to complete the proof; in particular, $\mathscr{L}(\widetilde{\mathbf{I}}_{0})=\nu^{\top}=\frac{\theta_{1}}{\theta_{1}+\theta_{2}}\delta_{1}+\frac{\theta_{2}}{\theta_{1}+\theta_{2}}\delta_{2}$ . ∎

5 Proofs

In this section, we provide the proofs of the main results of this paper that were stated throughout Section 2.

Proof of Theorem 2.4:

Under Assumption 2.1, let us first assume that $p>0$ . The matrix $(\operatorname{Id}+q)$ is irreducible, and so is $(\operatorname{Id}+pq)$ . Moreover, $\nu^{\top}$ is also the invariant measure of $pq$ , and Perron-Frobenius Theorem entails that there exist $C>0$ and $\rho\in(0,1)$ such that for every $n\geq 1$ and $i\in\{1,\dots,D\}$ ,

[TABLE]

Now, let us prove that $(i_{n})_{n\geq 1}$ is an asymptotic pseudotrajectory of the dynamical system induced by $\operatorname{Id}+pq$ . The limit set of such a system being contained in every global attractor (see [Ben99, Theorems 6.9 and 6.10]), we have

[TABLE]

and the right-hand side of (5.1) converges to 0, which ends the proof.

The case $p=0$ is a mere application of [BBC16, Proposition 3.13]. ∎

5.1 Asymptotic pseudotrajectories in the non-standard setting

In this section, we prove Theorem 2.9 using results from [BBC16], based on the theory of asymptotic pseudotrajectories for inhomogeneous-time Markov chains. Indeed, with the convention $\sum_{k=1}^{0}=0$ , let

[TABLE]

and define the piecewise-constant processes

[TABLE]

We shall show that, as $t\to+\infty$ , the process $(X_{t},I_{t})_{t\geq 0}$ converges in a way (see Figure 5.1) to the exponential zig-zag process $(\mathbf{X}_{t},\mathbf{I}_{t})_{t\geq 0}$ solution of (3.1), that we already studied in Section 3.1. To that end, let $(P_{t})_{t\geq 0}$ be the Markov semigroup of $(\mathbf{X},\mathbf{I})$ , $N_{1}=(2,\dots,2,0)$ and

[TABLE]

Note that convergence with respect to $d_{\mathscr{F}}$ implies convergence in distribution (see [BBC16, Lemma 5.1]).

Lemma 5.1 (Asymptotic pseudotrajectory for non-standard fluctuations).

Under the assumptions of Theorem 2.9, the sequence of probability distributions $(\mu_{t})_{t\geq 0}$ is an asymptotic pseudotrajectory of $(P_{t}f)_{t\geq 0}$ with respect to $d_{\mathscr{F}}$ .

Moreover, if there exist positive constants $A\geq 1,\theta\leq 1$ such that

[TABLE]

then, for any $v<a\rho(1+a\rho)^{-1}$ , there exists a positive constant $C$ such that

[TABLE]

Moreover, the sequence of processes $\left((X_{n+t},I_{n+t})_{t\geq 0}\right)_{n\geq 1}$ converges in distribution, as $n\to+\infty$ , toward $(\mathbf{X}^{\pi}_{t},\mathbf{I}^{\pi}_{t})_{t\geq 0}$ in the Skorokhod space, where $(\mathbf{X}^{\pi},\mathbf{I}^{\pi})$ is a process generated by $\mathcal{L}_{\text{Z}}$ with initial condition $\pi$ .

The proof of Lemma 5.1 consists in checking [BBC16, Assumptions 2.1, 2.2, 2.3 and 2.7.ii)] and relies on three ingredients:

•

Convergence of a kind of discrete infinitesimal generator $\mathcal{L}_{n}$ , which characterizes the dynamics of $(X,I)$ , to $\mathcal{L}_{\text{Z}}$ defined in (1.3).

•

Smoothness of the limit semigroup $(P_{t})_{t\geq 0}$ and control of its derivatives with respect to the initial condition of the process.

•

Uniform boundedness of the moments of $(x_{n},i_{n})_{n\geq 1}$ up to some order, which is trivially satisfied here since $E$ is compact.

Proof of Lemma 5.1:

In what follows, the notation $O$ (as $n\to+\infty$ ) is uniform over $x,i,f$ . We define $\mathcal{L}_{n}f(x,i)=\gamma_{n+1}^{-1}\mathbb{E}[f(x_{n+1},i_{n+1})-f(x,i)|x_{n}=x,i_{n}=i]$ , and we study the convergence of $\mathcal{L}_{n}$ to $\mathcal{L}_{\text{Z}}$ in the sense of [BBC16]. Let $(x,i)\in E$ and $\chi_{N_{1}}(x,i)=\prod_{k=1}^{D}x_{k}^{2}$ . We recall that $q_{n}(i,i)=1+O(p_{n})$ as $n\to+\infty$ . With

[TABLE]

we have

[TABLE]

We turn to the study of the regularity of the limit semigroup, following [Kun84]. Let $t>0$ and note that $\|P_{t}f\|_{\infty}\leq\|f\|_{\infty}$ . Moreover, the process $(\mathbf{X},\mathbf{I})$ is solution of the following SDE (we emphasize below the dependence on the initial condition):

[TABLE]

where $N_{i,j}$ is a Poisson process of intensity $aq(i,j)\mathds{1}_{\{i\neq j\}}$ and the matrices $A$ and $B_{i,j}$ are defined in (3.2). Then, if we denote by $\eta_{t}^{x,i,k,h}=h^{-1}\left[(\mathbf{X}^{x+he_{k},i}_{t},\mathbf{I}^{x+he_{k},i}_{t})-(\mathbf{X}^{x,i}_{t},\mathbf{I}^{x,i}_{t})\right]$ , we recover from (5.6) that the process $\eta^{x,i,k,h}$ satisfies the ODE

[TABLE]

so that $\eta_{t}^{x,i,k,h}=(\operatorname{e}^{-t}e_{k},0)$ . Thus, $\eta^{x,i,k,h}$ admits a continuous modification (notably at $h=0$ ) and $\partial_{k}(\mathbf{X}^{x,i},\mathbf{I}^{x,i})=(\operatorname{e}^{-t}e_{k},0)$ is continuous. Using similar arguments, $\partial_{k}\partial_{l}(\mathbf{X}^{x,i},\mathbf{I}^{x,i})=0$ . Gathering those expressions, and since $f^{N}$ is bounded for every multi-index $N\leq N_{1}$ , it is clear that $P_{t}f\in\mathscr{C}^{N_{1}}$ , with, for any $j,k\leq D$ ,

[TABLE]

Hence, for any $j\leq N_{1},\|(P_{t}f)^{(j)}\|_{\infty}\leq\|f^{(j)}\|_{\infty}$ . Finally, for any $n\geq 1,|x_{n}|\leq 1$ , so that

[TABLE]

Hence, we can apply [BBC16, Theorems 2.6 and 2.8.ii)] with $N_{1}=N_{2}=d_{1}=d=(2,\dots,2,0),C_{T}=1,M_{2}=3,$ to obtain the existence and the announced properties of $\pi$ as well as

[TABLE]

Moreover, following [BBC16, Remark 2.5],

[TABLE]

Finally, using Proposition 3.1 together with [BBC16, Theorem 2.8.ii)] entails (5.5). Since $E$ is compact, we can apply [BBC16, Theorem 2.12], which completes the proof. ∎

5.2 ODE and SDE methods in the standard setting

In the present section, we successively provide proofs for Theorems 2.11 and 2.12. We shall prove the former with a method involving an asymptotic pseudotrajectory for some interpolated process, similarly to Section 5.1 and [BC15]. On the contrary, the fluctuations obtained for $(x_{n})_{n\geq 1}$ in Theorem 2.12 are obtained through a more classic result for stochastic algorithms, namely the SDE method developed in [Duf96] (see also [KY03]).

Proof of Theorem 2.11:

In the following, we mimic the proof of [BC15, Lemma 2.4] (see also [MP87, Ben97]). Indeed, for any $n\geq 1$ , (2.4) writes

[TABLE]

Let the sequence $(\tau_{n})_{n\geq 0}$ and the function $m$ be as in (5.2), and define the interpolated process

[TABLE]

for all $s\in[0,\gamma_{n+1})$ and $n\geq 0$ . We will show that $\widehat{X}$ is an asymptotic pseudotrajectory (and an $\ell$-pseudotrajectory) for the flow $\Phi(x,t)=\nu+e^{-t}(x-\nu)$ associated to the ODE $\partial_{t}\Phi(x,t)=\nu-\Phi(x,t)$ . From [Ben99, Proposition 4.1] it suffices to show that, for all $T>0,$

[TABLE]

and

[TABLE]

Consider $h$ defined in (2.6). Then,

[TABLE]

We shall bound each term of the sum (5.9) separately. We easily have

[TABLE]

and

[TABLE]

where $\|h\|_{1}=\sup_{j}\sum_{i}|h_{i,j}|$ and $R_{n}=\sup_{i}\sum_{j}|r_{n}(i,j)|.$ Also, for some constant $C>0$ ,

[TABLE]

Note that $(\gamma_{n}\sum_{j=1}^{D}q(i_{n},j)h_{j}-\gamma_{n+1}\sum_{j=1}^{D}q(i_{n+1},j)h_{j})$ is the main term of a telescoping series. It remains to bound the norm of the sum of $\gamma_{n+1}p_{n}^{-1}\left(h_{i_{n+1}}-\mathbb{E}[h_{i_{n+1}}|i_{n}]\right)$ . For all $m,n\geq 1$ and $l=1,...,D$ , set

[TABLE]

The sequence $(M_{m,n}(l))_{m\geq n}$ is a martingale and

[TABLE]

Moreover, as

[TABLE]

by Theorem 2.11, we obtain

[TABLE]

As a consequence of (5.10), there exists some constant $C>0$ such that

[TABLE]

By Doob’s inequality and Assumption 2.8, it follows that, for every $k\geq 0$ ,

[TABLE]

which implies that $\lim_{k\to+\infty}\sup_{0\leq h\leq T}|M_{m(kT),m(kT+h)}|=0$ and then $\lim_{k\to+\infty}\Delta(kT,T)=0$ in probability. By the triangle inequality and [Ben99, Proposition 4.1], (5.7) holds.

Under the assumption that $\sum_{n=1}^{\infty}\gamma_{n+1}^{2}p_{n}^{-1}<+\infty,$

[TABLE]

which implies $\lim_{k\to+\infty}\sup_{0\leq h\leq T}|M_{m(kT),m(kT+h)}|=0$ a.s. Then, $\lim_{k\to+\infty}\Delta(kT,T)=0$ a.s. and $\lim_{t\to+\infty}\Delta(t,T)=0$ since

[TABLE]

In order to obtain an $\ell$-pseudotrajectory, use Markov’s and Doob’s inequalities so that

[TABLE]

Now, for all $\varepsilon>0$ and $k$ large enough,

[TABLE]

where $\lambda(\gamma,\gamma/p)$ is defined in (2.5). Hence,

[TABLE]

and by the Borel-Cantelli lemma, we have

[TABLE]

Then, bounding all the other terms of (5.9), we find

[TABLE]

with

[TABLE]

Since the flow $\Phi$ converges to $\nu$ exponentially fast at rate $1$, use [Ben99, Theorem 6.9 and Lemma 8.7] to conclude the proof. ∎
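To illustrate the kind of convergence just proved, here is a minimal simulation sketch of a freezing Markov chain and of its empirical measure. Everything specific in it is an assumption made for illustration, not the paper's setting: two states, the freezing speed $p_{n}=n^{-\theta}$ with $\theta=1/4$, a vanishing perturbation $r_{n}=0$, off-diagonal rates `q`, and the reading of $\nu$ as the invariant probability of the generator built from `q`.

```python
import random


def simulate_empirical_measure(n_steps, theta, q, seed=0):
    """Simulate i_1, ..., i_n and return the empirical measure x_n.

    Assumption: q_n(i, j) = n**(-theta) * q[i][j] for j != i (so r_n = 0),
    and the chain stays where it is with the remaining probability.
    """
    rng = random.Random(seed)
    d = len(q)
    state = 0
    counts = [0] * d
    for n in range(1, n_steps + 1):
        counts[state] += 1
        p_n = n ** (-theta)
        u = rng.random()
        acc = 0.0
        # Pick the next state by inverting the cumulative jump probabilities.
        for j in range(d):
            if j == state:
                continue
            acc += p_n * q[state][j]
            if u < acc:
                state = j
                break
    return [c / n_steps for c in counts]


# Hypothetical two-state rates: 0 -> 1 at rate 0.6, 1 -> 0 at rate 0.3.
q = [[0.0, 0.6], [0.3, 0.0]]
# Invariant probability of the corresponding generator: nu_0 * 0.6 = nu_1 * 0.3.
nu = [1 / 3, 2 / 3]

x_n = simulate_empirical_measure(200_000, 0.25, q, seed=12345)
error = max(abs(a - b) for a, b in zip(x_n, nu))
```

With these choices `x_n` should be close to `nu`. Increasing `theta` towards $1$ slows the chain down and makes the empirical measure noticeably more variable for the same number of steps.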

Proof of Theorem 2.12:

We have

[TABLE]

Recall (5.9), so that

[TABLE]

with a remainder term $b_{n}$ converging to 0. Now, we want to use [Duf96, Théorème 4.II.4]. In our setting, its notation reads

[TABLE]

with

[TABLE]

and

[TABLE]

Then, by (5.10) and similar computations,

[TABLE]

where $\Sigma^{(p,\Upsilon)}$ is defined in (1.2). Classically, we should prove that $\lim_{n\to+\infty}\|\widehat{r}_{n}\|=0$ in order to work in the framework of [Duf96, Hypothèse H4-4], which is quite difficult. A weaker decomposition, labelled (5.11), is enough: this assertion can be easily checked at the end of [Duf96, p.156], whose proof is based on usual arguments on diffusion approximation, such as [EK86]. The decomposition (5.11) is often assumed in more recent generalizations, see for instance [For15]. Note that we cannot directly use [For15], which moreover does not provide functional convergence. Thus, rather than checking that $\lim_{n\to+\infty}\|\widehat{r}_{n}\|=0$, it is sufficient to prove that

[TABLE]

for any $T>0$ , where $m(t)$ is defined in (5.2). Then, let

[TABLE]

The sequence $(\widehat{r}^{(1)}_{n})_{n\geq 1}$ goes to $0$ a.s. and in $L^{1}$ straightforwardly under our assumptions. Furthermore,

[TABLE]

The first line of (5.12) is a telescoping series and is bounded by $\alpha_{n}\gamma_{n+1}$, which goes to $0$. The second line of (5.12) is bounded by

[TABLE]

for some $C>0$. Since (5.12) is a telescoping series as well, and goes to $0$, we have established the announced decomposition (5.11). In conclusion, the diffusive limit $(\mathbf{Y}_{t})_{t\geq 0}$ is the solution of (3.10), which trivially admits $V:z\mapsto z$ as a Lyapunov function, as required in [Duf96, Hypothèse H4-3]. The only use of an assumption on the eigenelements of $\Sigma^{(p,\Upsilon)}$ would be to guarantee the existence and uniqueness of, and convergence to, an invariant distribution for $\mathbf{Y}$, which was already proved in Proposition 3.6. ∎
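The convergence of the diffusive limit to an invariant distribution can be illustrated on a scalar toy model. The sketch below assumes a one-dimensional Ornstein-Uhlenbeck process $dY_{t}=-Y_{t}\,dt+\sqrt{2\sigma^{2}}\,dW_{t}$, where `sigma2` is a hypothetical scalar stand-in for $\Sigma^{(p,\Upsilon)}$; this normalization is an assumption, under which the stationary variance is $\sigma^{2}$. Rather than sampling paths, it iterates the exact variance recursion of the Euler scheme $Y_{k+1}=(1-\delta)Y_{k}+\sqrt{2\sigma^{2}\delta}\,\xi_{k}$ with independent standard Gaussian $\xi_{k}$.

```python
def euler_variance(sigma2, dt, n_steps, v0=0.0):
    """Variance of the Euler scheme after n_steps, started with variance v0.

    Recursion: v_{k+1} = (1 - dt)**2 * v_k + 2 * sigma2 * dt.
    """
    v = v0
    for _ in range(n_steps):
        v = (1.0 - dt) ** 2 * v + 2.0 * sigma2 * dt
    return v


# Hypothetical parameters (illustration only).
sigma2 = 0.5
dt = 1e-3

# Start far from equilibrium and run up to time 20.
v_long = euler_variance(sigma2, dt, 20_000, v0=4.0)

# Fixed point of the recursion, which tends to sigma2 as dt -> 0.
v_fixed = sigma2 / (1.0 - dt / 2.0)
```

The initial variance is forgotten exponentially fast, and `v_long` agrees with `v_fixed`, which itself differs from `sigma2` only by a discretization error of order `dt`.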

**Acknowledgements:** Both authors acknowledge financial support from the ANR PIECE (ANR-12-JS01-0006-01) and the Chaire Modélisation Mathématique et Biodiversité.

Bibliography

[BBC16] M. Benaïm, F. Bouguet, and B. Cloez. Ergodicity of inhomogeneous Markov chains through asymptotic pseudotrajectories. ArXiv e-prints, January 2016.
[BC15] M. Benaïm and B. Cloez. A stochastic approximation approach to quasi-stationary distributions on finite spaces. Electron. Commun. Probab., 20:no. 37, 14, 2015.
[BD16] J. Bierkens and A. Duncan. Limit theorems for the Zig-Zag process. ArXiv e-prints, July 2016.
[Ben97] M. Benaïm. Vertex-reinforced random walks and a conjecture of Pemantle. Ann. Probab., 25(1):361–392, 1997.
[Ben99] M. Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de Probabilités, XXXIII, volume 1709 of Lecture Notes in Math., pages 1–68. Springer, Berlin, 1999.
[BH96] M. Benaïm and M. W. Hirsch. Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dynam. Differential Equations, 8(1):141–176, 1996.
[BH12] Y. Bakhtin and T. Hurth. Invariant densities for dynamical systems with random switching. Nonlinearity, 25(10):2937–2952, 2012.
[BLBMZ12] M. Benaïm, S. Le Borgne, F. Malrieu, and P.-A. Zitt. Quantitative ergodicity for some switched dynamical systems. Electron. Commun. Probab., 17:no. 56, 14, 2012.
