Mixing time of the adjacent walk on the simplex

Pietro Caputo; Cyril Labbé; Hubert Lacoin

arXiv:1904.01088·math.PR·November 16, 2020

TL;DR

This paper determines the spectral gap of the adjacent walk on the simplex, proves cutoff for its mixing time in both total variation and separation distance, and extends the results to resampling from symmetric log-concave Beta distributions.

Contribution

It determines the spectral gap and the leading-order mixing times of the adjacent walk on the simplex, establishes the cutoff phenomenon, and extends these results to resampling from symmetric log-concave Beta distributions.

Findings

01. The spectral gap of the adjacent walk is determined explicitly.

02. Both the total variation distance and the separation distance exhibit cutoff.

03. The separation mixing time is twice the total variation mixing time.

Abstract

By viewing the $N$-simplex as the set of positions of $N-1$ ordered particles on the unit interval, the adjacent walk is the continuous time Markov chain obtained by updating independently at rate 1 the position of each particle with a sample from the uniform distribution over the interval given by the two particles adjacent to it. We determine its spectral gap and prove that both the total variation distance and the separation distance to the uniform distribution exhibit a cutoff phenomenon, with mixing times that differ by a factor $2$. The results are extended to the family of log-concave distributions obtained by replacing the uniform sampling by a symmetric log-concave Beta distribution.

Equations

$$\Omega_{N}:=\{x=(x_{1},\ldots,x_{N-1})\in\mathbb{R}^{N-1}\,:\,0\leq x_{1}\leq\ldots\leq x_{N-1}\leq N\}.$$

$$\mathbf{X}^{x}(t)=(X_{1}(t),\ldots,X_{N-1}(t)),$$

$$(\mathcal{L}_{N}f)(x)=\sum_{i=1}^{N-1}\int_{0}^{1}\left(f(x^{(i,u)})-f(x)\right)du,$$

$$x_{j}^{(i,u)}=\begin{cases}x_{j},&i\neq j,\\ u\,x_{i-1}+(1-u)\,x_{i+1},&i=j.\end{cases}$$

$$\operatorname{gap}_{N}=1-\cos\left(\frac{\pi}{N}\right),$$

$$f_{N}(x)=\sum_{k=1}^{N-1}\sin\left(\frac{\pi k}{N}\right)(x_{k}-k).$$

$$d_{N}(t):=\sup_{x\in\Omega_{N}}\|P_{t}^{x}-\pi_{N}\|_{TV},$$

$$\|\mu-\nu\|_{TV}=\sup_{B\in\mathcal{B}(\Omega_{N})}\left(\mu(B)-\nu(B)\right),$$

$$T_{N}(\varepsilon):=\inf\{t\geq 0\,:\,d_{N}(t)<\varepsilon\}.$$

$$\lim_{N\to\infty}\frac{T_{N}(\varepsilon)}{N^{2}\log N}=\frac{1}{\pi^{2}}.$$

$$T_{N}(\varepsilon)\stackrel{N\to\infty}{\sim}\frac{\log N}{2\operatorname{gap}_{N}}.$$

$$\mathbb{E}\left[f_{N}(\mathbf{X}^{x}(t))\right]=f_{N}(x)\,e^{-\operatorname{gap}_{N}t},$$

$$\rho_{\alpha}(u):=\frac{\Gamma(2\alpha)}{\Gamma(\alpha)^{2}}\,[u(1-u)]^{\alpha-1}.$$

$$(\mathcal{L}_{N,\alpha}f)(x)=\sum_{i=1}^{N-1}\int_{0}^{1}\left(f(x^{(i,u)})-f(x)\right)\rho_{\alpha}(u)\,du.$$

$$\pi_{N,\alpha}(dx):=\frac{\Gamma(N\alpha)}{\Gamma(\alpha)^{N}N^{N\alpha-1}}\prod_{i=1}^{N}(x_{i}-x_{i-1})^{\alpha-1}\,dx$$

$$\operatorname{gap}_{N}=1-\cos\left(\frac{\pi}{N}\right),\qquad f_{N}(x)=\sum_{k=1}^{N-1}\sin\left(\frac{\pi k}{N}\right)(x_{k}-k).$$

$$\lim_{N\to\infty}\frac{T_{N,\alpha}(\varepsilon)}{N^{2}\log N}=\frac{1}{\pi^{2}}.$$

$$(\mathcal{G}_{N,\alpha}f)(\eta)=\frac{1}{N}\sum_{1\leq i<j\leq N}\int_{0}^{1}\left(f(\eta^{(i,j,u)})-f(\eta)\right)\rho_{\alpha}(u)\,du,$$

$$\eta_{k}^{(i,j,u)}=\begin{cases}\eta_{k},&k\neq i,j,\\ u\,(\eta_{i}+\eta_{j}),&k=i,\\ (1-u)\,(\eta_{i}+\eta_{j}),&k=j,\end{cases}$$

$$\operatorname{gap}(\mathcal{G}_{N,\alpha})=\frac{\alpha N+1}{(2\alpha+1)N},$$

$$g_{N}(\eta)=\mathrm{const.}+\sum_{i=1}^{N}\eta_{i}^{2}.$$

$$d_{N,\alpha}^{\,\mathrm{sep}}(t):=1-\inf_{A,B\in\mathcal{B}(\Omega_{N})}\frac{P_{t}(A,B)}{\pi_{N,\alpha}(A)\,\pi_{N,\alpha}(B)}$$

$$P_{t}(A,B):=\int_{A}P_{t}^{x}(B)\,\pi_{N,\alpha}(dx),$$

$$T_{N,\alpha}^{\,\mathrm{sep}}(\varepsilon):=\inf\{t\geq 0\,:\,d_{N,\alpha}^{\,\mathrm{sep}}(t)<\varepsilon\}.$$

$$\lim_{N\to\infty}\frac{T_{N,\alpha}^{\,\mathrm{sep}}(\varepsilon)}{N^{2}\log N}=\frac{2}{\pi^{2}}.$$

$$d_{N,\alpha}^{\prime}(t):=1-\inf_{x\in\Omega_{N},\,B\in\mathcal{B}(\Omega_{N})}\frac{P_{t}(x,B)}{\pi_{N,\alpha}(B)}.$$

$$T_{N}(1/4)\geq c\,N^{2}\log N,$$

$$T_{N,\alpha}(\varepsilon)\geq\frac{1}{2\operatorname{gap}_{N}}\left(\log N-C_{\alpha,\varepsilon}\right),$$

$$T_{N,\alpha}(1/4)\leq C\,N^{2}\log N,$$

$$\Omega_{N}^{+}:=\{x=(x_{1},\ldots,x_{N})\in\mathbb{R}^{N}\,:\,0\leq x_{1}\leq x_{2}\leq\ldots\leq x_{N}\}.$$

Full text

Pietro Caputo

Department of Mathematics and Physics, Roma Tre University, Largo San Murialdo 1, 00146 Roma, Italy.

Cyril Labbé

Université Paris-Dauphine, PSL University, Ceremade, CNRS, 75775 Paris Cedex 16, France.

Hubert Lacoin

IMPA, Estrada Dona Castorina 110, Rio de Janeiro, Brasil.

(Date: January 26, 2020)

MSC 2010 subject classifications: Primary 60J25; Secondary 37A25, 82C22.

Keywords: Spectral gap; Mixing time; Cutoff; Adjacent walk.

1. Introduction

Randomized algorithms based on Markov chains are commonly used to sample points uniformly at random in a convex body and to simulate other log-concave distributions in $N$-dimensional Euclidean space. The mixing time of the associated random walk in $\mathbb{R}^{N}$ is often known to be polynomial in $N$; see e.g. [DFK91, LV03]. In analogy with the setting of Markov chains with discrete state space, where the theory seems to be more advanced (see e.g. the monographs [AF02, LPW17]), it is of interest to develop a finer analysis of the asymptotic growth of mixing times in high dimensions.

In this paper we address that question for a specific model, namely the adjacent walk on the $N$-simplex. By viewing the $N$-simplex as the set of positions of $N-1$ ordered particles on the unit interval, this is the process obtained by updating independently at rate 1 the position of each particle with a sample from the uniform distribution over the interval given by the two particles adjacent to it. The process defines a Gibbs sampler for the uniform distribution over the simplex.

The adjacent walk on the $N$-simplex has been previously analysed in [RW05b], where the authors proved upper and lower bounds on the mixing time that are tight up to constant factors; see also [RW05a, Smi14, Smi13] for similar estimates in related models. A version of this model with open boundary conditions was introduced in [KMP82] to study the heat flow in a chain of one-dimensional oscillators.

Here we determine the mixing time to leading order and we establish the so-called cutoff phenomenon for the adjacent walk on the $N$-simplex. We also show that the same results hold if the uniform distribution over the sampling interval is replaced by a symmetric log-concave Beta law, in which case the chain converges to a log-concave distribution over the simplex.

1.1. Model and results

It is convenient to replace the unit interval by the interval $[0,N]$, that is, to rescale the $N$-simplex to the set $\Omega_{N}$ of positions of $N-1$ ordered particles on the interval $[0,N]$ defined by

$$\Omega_{N}:=\{x=(x_{1},\ldots,x_{N-1})\in\mathbb{R}^{N-1}\,:\,0\leq x_{1}\leq\ldots\leq x_{N-1}\leq N\}.$$

We also set $x_{0}=0$, $x_{N}=N$. An element $x$ in $\Omega_{N}$ can be viewed as a non-decreasing interface $k\mapsto x_{k}$ connecting $(0,0)$ to $(N,N)$: this viewpoint will be adopted throughout the article. Given an initial condition $x\in\Omega_{N}$, we consider the process

$$\mathbf{X}^{x}(t)=(X_{1}(t),\ldots,X_{N-1}(t)),$$

where $\mathbf{X}(0)=x$, and independently for each $i$, at rate $1$, $X_{i}$ is resampled according to the uniform distribution on $[X_{i-1},X_{i+1}]$, where we use the convention $X_{0}:=0$ and $X_{N}:=N$.

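More formally, this is the continuous time Markov chain with generator

$$(\mathcal{L}_{N}f)(x)=\sum_{i=1}^{N-1}\int_{0}^{1}\left(f(x^{(i,u)})-f(x)\right)du,$$

where, given $u\in[0,1]$ and $i$, the transformation $x\mapsto x^{(i,u)}$ is defined by

$$x_{j}^{(i,u)}=\begin{cases}x_{j},&i\neq j,\\ u\,x_{i-1}+(1-u)\,x_{i+1},&i=j.\end{cases}$$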
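The dynamics just described can be simulated directly. The following is a minimal sketch, not taken from the paper: it assumes that the $N-1$ independent rate-one clocks are merged into a single rate-$(N-1)$ clock with a uniformly chosen particle, and the function name `simulate_adjacent_walk` is hypothetical.

```python
import random

def simulate_adjacent_walk(x, t_max, rng):
    """Run the adjacent walk from interior positions x (length N-1, N >= 2) up to time t_max."""
    n = len(x) + 1                        # N, so that the boundary values are 0 and N
    pos = [0.0] + list(x) + [float(n)]    # add the fixed boundaries x_0 = 0 and x_N = N
    t = 0.0
    while True:
        # Superposition of N-1 rate-one clocks: next ring after an Exp(N-1) waiting time.
        t += rng.expovariate(n - 1)
        if t > t_max:
            break
        i = rng.randint(1, n - 1)         # particle whose clock rang, uniform on 1..N-1
        u = rng.random()                  # uniform weight u in [0, 1)
        pos[i] = u * pos[i - 1] + (1.0 - u) * pos[i + 1]
    return pos[1:-1]

rng = random.Random(0)
N = 8
start = [float(N)] * (N - 1)              # extremal initial condition x_i = N
final = simulate_adjacent_walk(start, t_max=50.0, rng=rng)
print(final)
```

Each update replaces one coordinate by a convex combination of its two neighbours, so the ordering of the particles is preserved along the whole trajectory.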

Let $\pi_{N}$ denote the uniform distribution over $\Omega_{N}$. Noting that $\int_{0}^{1}f(x^{(i,u)})du$ coincides with the conditional expectation $\pi_{N}[f\,|\,x_{j},j\neq i]$ shows that the generator is a finite linear combination of orthogonal projections in $L^{2}(\Omega_{N},\pi_{N})$. In particular, $\mathcal{L}_{N}$ is a bounded self-adjoint operator, and the Markov chain is reversible with respect to $\pi_{N}$. Note that $\pi_{N}$ can be obtained by conditioning $N$ i.i.d. exponential random variables of parameter $1$ to have a sum equal to $N$, and by taking the $x_{i}$'s to be their partial sums.

The generator is nonpositive definite and has the trivial eigenvalue zero associated to constant functions. Our first result is a characterization of the spectral gap, defined as the smallest nontrivial eigenvalue of $-\mathcal{L}_{N}$ .

Proposition 1.

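For any $N\geq 2$, the spectral gap of the generator is given by

$$\operatorname{gap}_{N}=1-\cos\left(\frac{\pi}{N}\right),$$

and the corresponding eigenfunction is

$$f_{N}(x)=\sum_{k=1}^{N-1}\sin\left(\frac{\pi k}{N}\right)(x_{k}-k). \tag{4}$$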

We are interested in the time needed for the total variation distance to equilibrium, starting from the “worst” initial condition, to pass below some given threshold $\varepsilon\in(0,1)$. When this time is, to leading order in $N$, insensitive to the choice of $\varepsilon$, one speaks of a cutoff phenomenon. More precisely, we set

$$d_{N}(t):=\sup_{x\in\Omega_{N}}\|P_{t}^{x}-\pi_{N}\|_{TV},$$

where $P_{t}^{x}$ is the law of $\mathbf{X}^{x}(t)$, and for $\mu,\nu$ probability measures on $\Omega_{N}$, the total variation distance is given by

$$\|\mu-\nu\|_{TV}=\sup_{B\in\mathcal{B}(\Omega_{N})}\left(\mu(B)-\nu(B)\right),$$

the supremum ranging over all Borel subsets of $\Omega_{N}$. For any $\varepsilon\in(0,1)$, the mixing time is defined by

$$T_{N}(\varepsilon):=\inf\{t\geq 0\,:\,d_{N}(t)<\varepsilon\}.$$

Our main result is the following.

Theorem 1.

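For any $\varepsilon\in(0,1)$,

$$\lim_{N\to\infty}\frac{T_{N}(\varepsilon)}{N^{2}\log N}=\frac{1}{\pi^{2}}.$$

As a consequence, the sequence of Markov chains displays a cutoff phenomenon.

In view of Proposition 1, the above theorem can be restated as

$$T_{N}(\varepsilon)\stackrel{N\to\infty}{\sim}\frac{\log N}{2\operatorname{gap}_{N}}.$$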

Note that if $f_{N}$ is as in (4) and we start with the extremal initial condition $x_{i}\equiv N$, then $t=\frac{\log N}{2\operatorname{gap}_{N}}$ is exactly the time it takes for the expected value

$$\mathbb{E}\left[f_{N}(\mathbf{X}^{x}(t))\right]=f_{N}(x)\,e^{-\operatorname{gap}_{N}t},$$

to drop from the initial value $f_{N}(x)=\Theta(N^{2})$ to a value $\Theta(N^{3/2})$, which is the typical size of fluctuations of $f_{N}$ at equilibrium.
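The two statements above can be checked numerically. The sketch below is only an illustration (the helper names `gap` and `cutoff_time` are hypothetical): it compares $\frac{\log N}{2\operatorname{gap}_{N}}$ with $\frac{N^{2}\log N}{\pi^{2}}$ and verifies that the exponential factor at that time equals $N^{-1/2}$, which is what turns $\Theta(N^{2})$ into $\Theta(N^{3/2})$.

```python
import math

def gap(n):
    """Spectral gap 1 - cos(pi/N) from Proposition 1."""
    return 1.0 - math.cos(math.pi / n)

def cutoff_time(n):
    """Leading-order mixing time log N / (2 gap_N)."""
    return math.log(n) / (2.0 * gap(n))

for n in (10, 100, 1000):
    # Ratio of the two expressions for the mixing time; it should approach 1.
    ratio = cutoff_time(n) / (n * n * math.log(n) / math.pi ** 2)
    # Decay factor exp(-gap_N * t) at t = cutoff_time(n), rescaled by sqrt(N).
    rescaled_decay = math.exp(-gap(n) * cutoff_time(n)) * math.sqrt(n)
    print(n, ratio, rescaled_decay)
```

Since $1-\cos(\pi/N)=\frac{\pi^{2}}{2N^{2}}\,(1+O(N^{-2}))$, the printed ratio differs from 1 by a correction of order $N^{-2}$.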

Strikingly, Proposition 1 and Theorem 1 take the exact same form in the case of the symmetric simple exclusion process on the $N$-segment; see [Wil04, Lac16].

1.2. Generalization to Beta-resampling

Given $\alpha\in(0,\infty)$, the symmetric Beta distribution of parameter $\alpha$ is the probability measure on $[0,1]$ with density

$$\rho_{\alpha}(u):=\frac{\Gamma(2\alpha)}{\Gamma(\alpha)^{2}}\,[u(1-u)]^{\alpha-1}. \tag{6}$$

We define a generalization of the process described in (2) by resampling points according to a Beta distribution

$$(\mathcal{L}_{N,\alpha}f)(x)=\sum_{i=1}^{N-1}\int_{0}^{1}\left(f(x^{(i,u)})-f(x)\right)\rho_{\alpha}(u)\,du. \tag{7}$$

While $\rho_{\alpha}(u)du$ could be replaced by any probability on $[0,1]$, the Beta laws are the only resampling laws that make the dynamics reversible with respect to probability measures with a product structure; see Remark 6 below. The associated equilibrium distribution is given by

$$\pi_{N,\alpha}(\mathrm{d}x):=\frac{\Gamma(N\alpha)}{\Gamma(\alpha)^{N}N^{N\alpha-1}}\prod_{i=1}^{N}(x_{i}-x_{i-1})^{\alpha-1}\,\mathrm{d}x$$

where $\mathrm{d}x$ is Lebesgue's measure on $\Omega_{N}$. Since $\int_{0}^{1}f(x^{(i,u)})\rho_{\alpha}(u)du$ equals the conditional expectation $\pi_{N,\alpha}[f\,|\,x_{j},j\neq i]$, it follows that $\mathcal{L}_{N,\alpha}$ is a bounded self-adjoint operator in $L^{2}(\Omega_{N},\pi_{N,\alpha})$. The following theorem extends the results of Proposition 1 and Theorem 1.
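As a quick sanity check of the Beta density and of the resampling step, the sketch below verifies numerically that $\rho_{\alpha}$ integrates to one and performs a single Beta update. It is an illustration only: the midpoint rule, the restriction to $\alpha\geq 1$ in the check (to avoid endpoint singularities), and the names `rho`, `midpoint_integral` and `beta_update` are assumptions made here, not part of the paper.

```python
import math
import random

def rho(alpha, u):
    """Symmetric Beta(alpha, alpha) density on (0, 1)."""
    c = math.gamma(2 * alpha) / math.gamma(alpha) ** 2
    return c * (u * (1.0 - u)) ** (alpha - 1.0)

def midpoint_integral(alpha, m=20000):
    """Midpoint-rule approximation of the integral of rho over [0, 1]."""
    h = 1.0 / m
    return sum(rho(alpha, (j + 0.5) * h) for j in range(m)) * h

def beta_update(pos, i, alpha, rng):
    """Resample pos[i] between its two neighbours with a Beta(alpha, alpha) weight."""
    u = rng.betavariate(alpha, alpha)
    pos[i] = u * pos[i - 1] + (1.0 - u) * pos[i + 1]

for alpha in (1.0, 2.0, 3.5):
    print(alpha, midpoint_integral(alpha))

pos = [0.0, 1.0, 4.0]                     # x_0, x_1, x_2 with N = 2 rescaled to [0, 4]
beta_update(pos, 1, alpha=2.0, rng=random.Random(0))
print(pos)
```

For $\alpha=1$ the density is constant and the update reduces to the uniform resampling of the adjacent walk.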

Theorem 2.

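For any $\alpha\geq 1$, the spectral gap and the corresponding eigenfunction are given by

$$\operatorname{gap}_{N}=1-\cos\left(\frac{\pi}{N}\right),\qquad f_{N}(x)=\sum_{k=1}^{N-1}\sin\left(\frac{\pi k}{N}\right)(x_{k}-k).$$

Moreover, for any $\varepsilon\in(0,1)$ the $\varepsilon$-mixing time associated with the Beta-resampling process satisfies

$$\lim_{N\to\infty}\frac{T_{N,\alpha}(\varepsilon)}{N^{2}\log N}=\frac{1}{\pi^{2}}.$$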

In particular, the sequence of Markov chains displays a cutoff phenomenon.

Remark 2.

We have chosen $x_{N}:=N$ so that the average inter-particle spacing at equilibrium is one. However, this convention has no influence on the result and we may as well take $x_{N}=1$ or any other constant (the only effect of this change being a dilation or contraction of space). In the course of the proof it is sometimes convenient to consider also the case of a random $x_{N}$ (since the process leaves $x_{N}$ fixed, the dynamics is not irreducible in that case).

Remark 3.

The restriction $\alpha\geq 1$ is due to the fact that certain parts of our proof require log-concavity of the probability density $\rho_{\alpha}$ , which in turn implies the validity of the FKG property for the equilibrium measure $\pi_{N,\alpha}$ , see Section 2.2. However, we believe the result to be valid for all $\alpha>0$ .

Remark 4.

The spectrum of $\mathcal{L}_{N,\alpha}$ restricted to the invariant subspace consisting of linear functions can be computed explicitly (see Section 2.3 below) and the fact that $\operatorname{\mathrm{gap}}_{N,\alpha}=1-\cos\left(\frac{\pi}{N}\right)$ is equivalent to the statement that the spectral gap is attained within this subspace. As the following mean field example shows, this phenomenon is not always to be expected for the exchange dynamics obtained by replacing the segment with another graph. Consider for instance the exchange process on the complete graph with generator

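$$(\mathcal{G}_{N,\alpha}f)(\eta)=\frac{1}{N}\sum_{1\leq i<j\leq N}\int_{0}^{1}\left(f(\eta^{(i,j,u)})-f(\eta)\right)\rho_{\alpha}(u)\,du,$$

where

$$\eta_{k}^{(i,j,u)}=\begin{cases}\eta_{k},&k\neq i,j,\\ u\,(\eta_{i}+\eta_{j}),&k=i,\\ (1-u)\,(\eta_{i}+\eta_{j}),&k=j,\end{cases}$$

and $\eta=\{\eta_{k}\}$ denotes the increment variables $\eta_{k}=x_{k}-x_{k-1}$, $k=1,\ldots,N$. In other words, $\mathcal{G}_{N,\alpha}$ defines the mean field version of (7). Notice that $\mathcal{G}_{N,\alpha}$ is reversible w.r.t. $\pi_{N,\alpha}$. Using the arguments in [CCL03] or [Cap08] one can show that for each $N\geq 2$, $\alpha>0$ the spectral gap of $\mathcal{G}_{N,\alpha}$ satisfies

$$\operatorname{gap}(\mathcal{G}_{N,\alpha})=\frac{\alpha N+1}{(2\alpha+1)N}, \tag{10}$$

and that the corresponding eigenfunction is the symmetric quadratic function

$$g_{N}(\eta)=\mathrm{const.}+\sum_{i=1}^{N}\eta_{i}^{2}.$$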
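As a consistency check of the mean field formula against the segment, one can look at the two extreme cases $N=2$ and $N\to\infty$. The following worked computation is a sketch added for illustration; it uses only the formulas stated in this remark and the fact that, for $N=2$, a single update resamples $x_{1}$ from equilibrium, so that $-\mathcal{L}_{2,\alpha}$ has spectral gap $1=1-\cos(\pi/2)$ for every $\alpha>0$.

```latex
% N = 2: the complete graph has the single pair (1,2), and by symmetry of \rho_\alpha
% updating (\eta_1,\eta_2) has the same law as resampling x_1, so
\mathcal{G}_{2,\alpha} = \tfrac12\,\mathcal{L}_{2,\alpha}, \qquad
\operatorname{gap}(\mathcal{G}_{2,\alpha}) = \frac{2\alpha+1}{(2\alpha+1)\cdot 2} = \frac12
  = \tfrac12\Bigl(1-\cos\frac{\pi}{2}\Bigr).
% Large N: the mean field gap stays of order one, while the segment gap vanishes:
\lim_{N\to\infty}\frac{\alpha N+1}{(2\alpha+1)N} = \frac{\alpha}{2\alpha+1}, \qquad
1-\cos\frac{\pi}{N} \sim \frac{\pi^{2}}{2N^{2}} .
```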

1.3. Mixing time for the separation distance

Another distance for which mixing time can be considered is the separation distance (see [GS02]). Its use in the study of mixing times for Markovian systems was initially advocated by Aldous and Diaconis because of its appealing relation to strong-stationary times [AD87]. While to our knowledge, separation distance has been considered so far only for countable state spaces, it can be generalized to a continuous setup via the definition

$$d_{N,\alpha}^{\,\mathrm{sep}}(t):=1-\inf_{A,B\in\mathcal{B}(\Omega_{N})}\frac{P_{t}(A,B)}{\pi_{N,\alpha}(A)\,\pi_{N,\alpha}(B)}$$

where

$$P_{t}(A,B):=\int_{A}P_{t}^{x}(B)\,\pi_{N,\alpha}(\mathrm{d}x),$$

and the infimum ranges over all Borel subsets of $\Omega_{N}$ of positive Lebesgue measure. We can then define the separation mixing time associated with our chain as

$$T_{N,\alpha}^{\,\mathrm{sep}}(\varepsilon):=\inf\{t\geq 0\,:\,d_{N,\alpha}^{\,\mathrm{sep}}(t)<\varepsilon\}.$$

The separation mixing time is always larger than or equal to its total variation counterpart. It is known (cf. [LPW17, Lemma 19.3] in the discrete setup, and Section 4 below for an adaptation of the argument to the continuous case) that in the case of reversible chains, the separation mixing time is at most twice as large as that in total variation. Because of this factor $2$, cutoff in total variation does not necessarily imply that the same phenomenon holds for the separation distance (see [HLP16] for counter-examples).
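The comparison between the two distances can be illustrated on a small finite chain. The sketch below is an illustration under stated assumptions, not the continuous chain of the paper: it uses a lazy symmetric walk on a path (a hypothetical toy example with uniform stationary law) and checks numerically that $d(t)\leq s(t)$, together with the standard inequality $s(2t)\leq 1-(1-\bar d(t))^{2}$ for reversible chains, where $\bar d(t)$ is the largest total variation distance between two rows of $P^{t}$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lazy_path_walk(n):
    """Symmetric lazy walk on {0, ..., n-1}: reversible with uniform stationary law."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                p[i][j] = 0.25
        p[i][i] = 1.0 - sum(p[i])         # remaining mass stays put
    return p

def power(p, t):
    """t-step transition matrix P^t."""
    n = len(p)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        out = mat_mul(out, p)
    return out

def tv_distance(pt):
    """Worst-case total variation distance to the uniform law."""
    n = len(pt)
    return max(0.5 * sum(abs(q - 1.0 / n) for q in row) for row in pt)

def tv_between_rows(pt):
    """Largest total variation distance between two rows of P^t."""
    return max(0.5 * sum(abs(a - b) for a, b in zip(r1, r2)) for r1 in pt for r2 in pt)

def separation_distance(pt):
    """Worst-case separation distance to the uniform law."""
    n = len(pt)
    return max(1.0 - q * n for row in pt for q in row)

p = lazy_path_walk(6)
for t in (1, 5, 20):
    pt, p2t = power(p, t), power(p, 2 * t)
    print(t, tv_distance(pt), separation_distance(pt), separation_distance(p2t))
```

The second inequality is what lies behind the factor 2 between the two mixing times: control of total variation at time $t$ yields control of separation at time $2t$.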

We prove nonetheless that our dynamics displays a cutoff phenomenon for the separation distance and that the associated mixing times are twice as large as those for the total variation distance.

Theorem 3.

For all $\alpha\geq 1$ and every $\varepsilon\in(0,1)$, the $\varepsilon$-mixing time for the separation distance satisfies

$$\lim_{N\to\infty}\frac{T_{N,\alpha}^{\,\mathrm{sep}}(\varepsilon)}{N^{2}\log N}=\frac{2}{\pi^{2}}.$$

Here again, our result is analogous to the one obtained for the symmetric simple exclusion process [Wil04, Lac16].

Remark 5.

Note that another natural definition of the separation distance in the continuous setup would be

$$d_{N,\alpha}^{\prime}(t):=1-\inf_{x\in\Omega_{N},\,B\in\mathcal{B}(\Omega_{N})}\frac{P_{t}(x,B)}{\pi_{N,\alpha}(B)}.$$

Using explicit bounds on the total mass of the singular part of the law at time $t$, see the proof of Lemma 38, one can prove that the two distances have a cutoff phenomenon at the same time. Because of the regularity of our setting, we strongly believe that $d^{\prime}_{N,\alpha}(t)=d^{\,\mathrm{sep}}_{N,\alpha}(t)$; however, this is certainly not true for every reversible Markov chain.

1.4. Comments on the proof and related work

The cutoff phenomenon is a widely studied topic in Markov chain theory, see [Dia96, AF02, LPW17] for an introduction. Thanks to recent remarkable efforts, many interesting examples are known of Markov chains exhibiting this particular type of phase transition. Unfortunately, these seem to be mostly confined to the setting of Markov chains with discrete state space, see however [HJ17] for a recent exception. One of the reasons is possibly the fact that the analysis of total variation distance in a continuous state space is more demanding.

Let us briefly describe the main arguments used to establish the results of this article. Concerning the spectrum, the first observation is that when restricted to the invariant subspace of linear functions, the generator can be easily diagonalised. In particular, one finds that $f_{N}$ is an eigenfunction of $\mathcal{L}_{N,\alpha}$ with eigenvalue $\cos(\pi/N)-1$ and therefore, for all $\alpha>0$ , the spectral gap is at most $1-\cos(\pi/N)$ . Proving that this is actually the spectral gap is more delicate. We prove it only in the case $\alpha\geq 1$ since we rely on the FKG inequality. In this case, using the strict monotonicity of $f_{N}$ , we show that any eigenfunction associated to an eigenvalue larger than $\cos(\pi/N)-1$ needs to be non-decreasing as well, and the FKG inequality is then applied to conclude that the only such eigenfunction is the constant one; see Section 2.3 below. Concerning the case $\alpha\in(0,1)$ , we can only assert that the spectral gap is not smaller than $cN^{-2}$ for some constant $c=c(\alpha)>0$ . Indeed, this is a direct consequence of the mean field spectral gap (10) and a comparison argument (see e.g. [Cap08]).

To establish the lower bound on the mixing time of Theorem 2, we use Wilson’s method [Wil04]. For $\alpha=1$ , this was already used in [RW05b] to obtain the weaker lower bound

$$T_{N}(1/4)\geq c\,N^{2}\log N,$$

for some constant $c>0$ . Here we sharpen this, and obtain for any $\alpha>0$ and for all $\varepsilon>0$

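$$T_{N,\alpha}(\varepsilon)\geq\frac{1}{2\operatorname{gap}_{N}}\left(\log N-C_{\alpha,\varepsilon}\right),$$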

see Proposition 11 below: note that we prove this result also for $\alpha\in(0,1)$ , but in this case $\operatorname{\mathrm{gap}}_{N}$ is not defined as the spectral gap but simply as being $1-\cos(\pi/N)$ . This suggests a cutoff window of order $N^{2}$ , but we do not have a corresponding upper bound. The proof of the above lower bound is based on Wilson’s method with a careful choice of the initial condition $x$ and a comparison argument which allows us to control the variance of $f_{N}({\bf X}^{x}(t))$ ; see Section 3.

The proof of the upper bound on the mixing time of Theorem 2 is the most involved part of the paper, and is worked out in Section 5 and Section 6. Our strategy can be roughly outlined as follows; see Section 5.1 below for a more detailed overview. In [RW05b], and for $\alpha=1$ , it was shown that

$$T_{N,\alpha}(1/4)\leq C\,N^{2}\log N, \tag{13}$$

for some constant $C>0$. This upper bound was obtained by estimating, under some monotone grand coupling, the hitting time of [math] for the area comprised between the two extremal processes, that is, the processes starting from the highest ($x_{1}=\ldots=x_{N-1}=N$) and the lowest ($x_{1}=\ldots=x_{N-1}=0$) initial conditions. Note that under such a coupling this hitting time bounds from above the coalescing time starting from any two initial conditions. The proof consisted of two steps that we now briefly recall. First, using the decay of the heat equation solved by the expectation of the area, one can show that after a time $C(\delta)N^{2}\log N$, the area lies below $N^{-\delta}$ with large probability. Choosing $\delta$ large enough, the second step relies on a brute force argument that shrinks the area to [math] in a time of order $\log N$ with large probability. In Appendix A, we give a proof of (13) in the general case $\alpha>0$: in contrast with the case $\alpha=1$, there is no simple monotone grand coupling for a general $\alpha$ that achieves an efficient coalescing time of the two extremal processes. We are then led to control the coalescing time of the stationary process and the process starting from some arbitrary initial condition.

The constant $C$ of (13) is dictated by the smallest value $C(\delta)$ that one can take in the aforementioned strategy: since $\delta$ needs to be very large for the second step to apply, this value is much larger than desired. Our main contribution consists in introducing a sequence of intermediate steps that reduce the time necessary to bring the area to a small enough threshold from which a brute force argument can be applied. Namely, we use the first step above up to the target mixing time, which guarantees a shrinking of the expectation of the area down to $N^{3/2}$ (corresponding to $\delta=-3/2$ ), and then present a coupling under which, through a sequence of successive stages, we are able to bring the area from $N^{3/2}$ down to $N^{-1}$ within a time of order $N^{2}$ with large probability, that is, a time negligible compared to the target mixing time. The last step is then a (slightly different) brute force argument that shrinks the area to [math] with large probability within a time of order $\log N$ . This program is carried out in Section 5.

At a technical level, we use estimates on the derivative of the predictable bracket process associated to the evolution of the area together with diffusive estimates on supermartingales in order to upper bound the time needed for the area to descend through the intermediate values mentioned above. A key ingredient of these hitting time estimates is the control of the gradients of the particle positions for the extremal processes, that is, the processes started from the maximal and minimal initial particle positions. The control of the gradients, in turn, is obtained by coupling the extremal processes with the stationary distribution. To this end we need to establish, with an independent argument, a sharp upper bound on the time needed for convergence to equilibrium of the two extremal processes.

The sharp upper bound for the extremal processes (by symmetry, we may consider only the highest process) is performed in Section 6. This part of the proof is based on a strategy developed in the discrete setting by [Lac16] for the adjacent transposition process. It relies on the FKG inequality and a version of the censoring inequality from [PW13]. Censoring inequalities are known to hold provided one has monotonicity of the density of the initial condition with respect to the invariant measure. While in the discrete setting this adapts well to the extremal process, in our context a nontrivial adaptation is required since the initial condition is singular with respect to the invariant measure.

2. Preliminaries

Let us consider a larger state-space where there is no constraint on $x_{N}$

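$$\Omega_{N}^{+}:=\{x=(x_{1},\ldots,x_{N})\in\mathbb{R}^{N}\,:\,0\leq x_{1}\leq x_{2}\leq\ldots\leq x_{N}\}.$$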

The generator $\mathcal{L}_{N,\alpha}$ defines a dynamics also on this enlarged space. If we let

[TABLE]

denote the increments of $x$ , one can easily check that the distribution under which the increments $\eta_{i}$ are i.i.d. random variables with distribution $\Gamma(\alpha,\lambda)$ , $\lambda>0$ being an arbitrary parameter, is a reversible measure for the dynamics. Here $\Gamma(\alpha,\lambda)$ stands for the Gamma distribution with mean $\alpha/\lambda$ and variance $\alpha/\lambda^{2}$ .

Remark 6.

We remark that the Beta resampling rule (6) is a natural choice in our context. Indeed, suppose we define a generator as in (7) with $\rho_{\alpha}(u)du$ replaced by another probability $\nu(du)$ on the unit interval. Then one can check that if $\mu$ is a probability measure on $\Omega^{+}_{N}$ under which the increments $\eta_{i}$ are independent, the generator is reversible with respect to $\mu$ if and only if $\mu$ is the product of $\Gamma(\alpha,\lambda)$ , and $\nu(du)=\rho_{\alpha}(u)du$ , for some $\alpha,\lambda>0$ . This rests on the well known fact that if $X,Y$ are two independent random variables such that $X+Y$ and $X/(X+Y)$ are independent, then $X$ and $Y$ must have the gamma distribution, see [Luk55].

2.1. Monotone grand coupling

There are two natural orders on $\Omega^{+}_{N}$ associated with the dynamics. The coordinate order

[TABLE]

and the gradient order

[TABLE]

Here and below we use the notation $\llbracket i,j\rrbracket$ for the integer interval $[i,j]\cap\mathbb{Z}$ . Note that the coordinate order is natural in both $\Omega_{N}$ and $\Omega_{N}^{+}$ , while the gradient order is only relevant for the unconstrained space $\Omega_{N}^{+}$ . An important observation is that the Beta resampling dynamics preserves both orders in the following sense.

Proposition 7 (Existence of a grand coupling).

For any $\alpha>0$ , and $N\geq 2$ , we can construct the trajectories $(\mathbf{X}^{x}(t))_{t\geq 0}$ of the Markov chain on $\Omega^{+}_{N}$ with generator $\mathcal{L}_{N,\alpha}$ on the same probability space in such a way that $\mathbb{P}$ -a.s, for all $x,x^{\prime}\in\Omega^{+}_{N}$

[TABLE]

Proof.

The coupling invoked above is the usual graphical construction (see e.g. [Lig05]). To each $k\in\llbracket 1,N-1\rrbracket$ we associate a Poisson clock process $(\mathcal{T}^{(k)}_{i})_{i\geq 1}$ whose increments are i.i.d. rate one exponentials, and a sequence $(U^{(k)}_{i})_{i\geq 1}$ of i.i.d. symmetric Beta variables of parameter $\alpha$. Then, for all $x\in\Omega^{+}_{N}$, $(\mathbf{X}^{x}(t))_{t\geq 0}$ is chosen to be càdlàg and constant outside of the update times $(\mathcal{T}^{(k)}_{i})_{k\in\llbracket 1,N-1\rrbracket,i\geq 1}$. At time $t=\mathcal{T}^{(k)}_{i}$, if $U^{(k)}_{i}=u$ the $k$-th coordinate is updated as follows

[TABLE]

One then checks by inspection that the process generated in this manner is the desired Markov chain. Moreover, the above construction implies that it preserves the two above mentioned orders in the sense of (17). ∎
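The graphical construction can be sketched in a few lines: every initial condition is driven by the same clock rings and the same Beta variables. The code below is an illustration under assumptions, not the authors' construction verbatim: the update rule is taken to be $x_{k}\mapsto u\,x_{k-1}+(1-u)\,x_{k+1}$ as in the generator (7), the $N-1$ clocks are merged into one rate-$(N-1)$ clock, and the name `coupled_run` is hypothetical.

```python
import random

def coupled_run(starts, alpha, t_max, rng):
    """Evolve several initial conditions in Omega_N^+ with shared clocks and Beta variables."""
    n = len(starts[0])                    # N coordinates x_1..x_N (N >= 2); x_N is never updated
    paths = [[0.0] + list(x) for x in starts]   # prepend the boundary value x_0 = 0
    t = 0.0
    while True:
        t += rng.expovariate(n - 1)       # superposition of the N-1 rate-one clocks
        if t > t_max:
            break
        k = rng.randint(1, n - 1)         # coordinate whose clock rang
        u = rng.betavariate(alpha, alpha) # the same u is applied to every trajectory
        for p in paths:
            p[k] = u * p[k - 1] + (1.0 - u) * p[k + 1]
    return [p[1:] for p in paths]

low = [1.0, 2.0, 3.0, 4.0]
high = [1.5, 2.5, 3.5, 5.0]               # low <= high coordinate by coordinate
end_low, end_high = coupled_run([low, high], alpha=2.0, t_max=10.0, rng=random.Random(0))
print(end_low)
print(end_high)
```

Because each update is a convex combination with the same weight $u$ in both trajectories, an inequality between neighbouring coordinates before the update is inherited by the updated coordinate, which is the mechanism behind the preservation of the coordinate order.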

Let us finally introduce the maximal $\wedge$ and minimal $\vee$ configurations for the coordinate order $\geq$ that we defined above:

[TABLE]

2.2. The FKG inequality

When $\alpha\geq 1$, the equilibrium measure $\pi_{N,\alpha}$ can be interpreted as a one dimensional gradient field associated with a convex potential. More precisely we can write

[TABLE]

where $V$ is the convex potential

[TABLE]

Below we use a standard application of Holley’s criterion (see [Pre74]) to show that $\pi_{N,\alpha}$ satisfies the so-called FKG inequality which entails positive correlation between increasing functions.

We say that $f$ defined on $\Omega_{N}$ is increasing if

[TABLE]

A subset $A\subset\Omega_{N}$ is said to be increasing if $\mathbf{1}_{A}$ is an increasing function. We let $\wedge$ and $\vee$ denote the following operations in $\Omega_{N}$

[TABLE]

(Note that there is a little clash of notation with the maximal and minimal configurations introduced in (18): however we believe that this will never raise any confusion in the sequel).

Proposition 8 (FKG inequality).

For any $\alpha\geq 1$ and $N\geq 2$ , if $f$ and $g$ are increasing then

[TABLE]

Moreover, the inequality remains valid if $\pi_{N,\alpha}$ is replaced by its restriction $\pi_{N,\alpha}(\cdot|A)$ to any set $A$ which is stable under the operations $\vee$ and $\wedge$ .

Proof.

Setting $H(x):=\sum_{k=1}^{N}V(x_{k}-x_{k-1})$ , the density of $\pi_{N,\alpha}$ with respect to Lebesgue is given by $e^{-H(x)}$ and Holley’s criterion shows that the inequality holds provided that

[TABLE]

The inequality (22) can be deduced as a consequence of the convexity of $V$; see [Gia02, Appendix B1]. For the measure restricted to a set $A$ it is sufficient to check that (22) is valid when $H$ is replaced by

[TABLE]

which is an immediate consequence of (22) under our assumption on $A$ . ∎

Let us point out that the first inequality of Proposition 8 could be obtained directly from the monotonicity stated in Proposition 7, see for instance [LPW17, Th 22.16], and therefore does not require the convexity of the potential (that is, it holds also for $\alpha\in(0,1)$ ).

For two probability measures $\mu,\nu$ on $\Omega_{N}$ , we write $\mu\leq\nu$ and say that $\mu$ is stochastically dominated by $\nu$ if for all increasing $f$ (in the sense of (20)) one has $\mu(f)\leq\nu(f)$ . Let us mention another application of Holley’s criterion which we are going to use in the proof.

Lemma 9.

Let $A$ and $B$ be two increasing subsets of $\Omega_{N}$ such that

[TABLE]

Then

[TABLE]

Proof.

Let $\mu_{A}:=\pi_{N,\alpha}(\cdot\ |\ A)$ and $\mu_{B}:=\pi_{N,\alpha}(\cdot\ |\ B)$ . These probability measures have density proportional to $\exp(-H_{A}(x))$ and $\exp(-H_{B}(x))$ respectively, where the potentials $H_{A},H_{B}$ are defined as in (23). The result then directly follows from [Pre74, Proposition 1] if one can show that for every $x,x^{\prime}\in\Omega_{N}$

[TABLE]

This in turn follows from the inequality (22) for $H$ and our assumption on $A,B$ .

∎

2.3. Identification of the spectral gap

Here we prove the first statement of Theorem 2. Fix $\alpha\geq 1$ and $N\geq 2$, and write $\mathcal{L}$ for the generator $\mathcal{L}_{N,\alpha}$. Using the expression (7), the action of the generator on the coordinate map $h_{k}(x):=x_{k}$, $k=1,\ldots,N-1$, is given by

[TABLE]

where $\Delta$ denotes the discrete Laplace operator

[TABLE]

Summation by parts and (25) then show that for every $j\in\llbracket 1,N-1\rrbracket$ the map

[TABLE]

is an eigenfunction of $\mathcal{L}$ with the eigenvalue $-\lambda_{N}^{(j)}$ where

[TABLE]

In the case $j=1$ , we simply write $f_{N}$ for $f_{N}^{(1)}$ and $\lambda_{N}$ for $\lambda_{N}^{(1)}$ . We now turn to show that $\lambda_{N}$ is actually the spectral gap.
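For $j=1$ the eigenfunction property can be verified by hand. The following is a sketch derived directly from the generator (7), the symmetry of $\rho_{\alpha}$ and the definition (4) of $f_{N}$; the shorthand $y_{k}$ and $s_{k}$ is introduced here for convenience and does not appear in the paper.

```latex
% Since \rho_\alpha is symmetric, \int_0^1 u\,\rho_\alpha(u)\,du = 1/2, so by (7)
\[
(\mathcal{L}h_k)(x) = \int_0^1 \bigl(u\,x_{k-1} + (1-u)\,x_{k+1} - x_k\bigr)\rho_\alpha(u)\,du
                    = \tfrac12\,(x_{k-1} + x_{k+1}) - x_k .
\]
% Put y_k := x_k - k (so y_0 = y_N = 0) and s_k := \sin(\pi k/N) (so s_0 = s_N = 0).
% Constants are killed by \mathcal{L}, and summation by parts (no boundary terms) gives
\[
\mathcal{L}f_N = \sum_{k=1}^{N-1} s_k\Bigl(\tfrac12(y_{k-1}+y_{k+1}) - y_k\Bigr)
               = \sum_{k=1}^{N-1} y_k\Bigl(\tfrac12(s_{k-1}+s_{k+1}) - s_k\Bigr).
\]
% The identity \sin(a-b)+\sin(a+b) = 2\sin a\,\cos b yields s_{k-1}+s_{k+1} = 2\cos(\pi/N)\,s_k, hence
\[
\mathcal{L}f_N = \bigl(\cos(\pi/N) - 1\bigr)\,f_N .
\]
```

This agrees with the eigenvalue $\cos(\pi/N)-1$ quoted in Section 1.4, and the computation uses only the mean of $\rho_{\alpha}$, so it is valid for every $\alpha>0$.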

It is not hard to check that for any $n\geq 1$ , the set of all polynomials of degree at most $n$ in the variables $x_{1},\ldots,x_{N-1}$ is stable under the action of the generator. When restricted to any such set, the generator admits a finite complete decomposition into eigenvalues / eigenfunctions. By density of polynomials in $L^{2}(\Omega_{N},\pi_{N,\alpha})$ , there exists an orthonormal basis of polynomial eigenfunctions and therefore the generator has pure point spectrum in $L^{2}(\Omega_{N},\pi_{N,\alpha})$ .

$\mathcal{L}$ maps to zero all constant functions and any nontrivial eigenvalue of $\mathcal{L}$ must be associated with an eigenfunction with mean zero. Therefore it is sufficient to show that if $g$ is a normalized polynomial eigenfunction such that $\pi_{N,\alpha}(g)=0$ and $\mathcal{L}g=-\mu g$ , with $\mu<\lambda_{N}$ , then $g=0$ . Since $g$ is polynomial, and since $f_{N}$ is strictly increasing in all its variables, there exists $\epsilon>0$ such that $f_{N}+\epsilon g$ remains increasing in all its variables. Next, we define the following normalized function

[TABLE]

where $P_{t}=e^{t\mathcal{L}}$ denotes the semigroup generated by $\mathcal{L}$ , and $\|\cdot\|_{2}$ is the $L^{2}(\Omega_{N},\pi_{N,\alpha})$ -norm. From our assumptions one has $\pi_{N,\alpha}(f_{N}g)=0$ and

[TABLE]

with $\mu<\lambda_{N}$ . It follows that $v_{t}\to g$ as $t\to\infty$ in $L^{2}(\Omega_{N},\pi_{N,\alpha})$ . On the other hand, the semigroup preserves monotonicity by Proposition 7, so that $v_{t}$ is a non-decreasing function at any time $t$ . Thus, $g$ must be also non-decreasing. Notice that so far the argument is valid for any $\alpha>0$ . We shall now use the assumption $\alpha\geq 1$ .

The FKG inequality and the orthogonality of $f_{N}$ and $g$ imply that the centered coordinate maps $\bar{h}_{k}(x):=x_{k}-k$ satisfy

[TABLE]

Indeed, by Proposition 8 one has $\pi_{N,\alpha}(\bar{h}_{k}g)\geq 0$ and if this is positive for some $k$ , then $\pi_{N,\alpha}(f_{N}g)>0$ .

Let $A_{\varepsilon}$ be the event $h_{1}\geq N-\varepsilon$ , with $\varepsilon\in(0,1)$ . Since both $A_{\varepsilon}$ and $A_{\varepsilon}^{\complement}$ are stable for the operations $\wedge$ and $\vee$ introduced in (21), by Proposition 8 the restrictions $\pi_{N,\alpha}(\cdot\ |\ A_{\varepsilon})$ and $\pi_{N,\alpha}\left(\cdot\ |\ A^{\complement}_{\varepsilon}\right)$ satisfy the FKG inequality. Therefore,

[TABLE]

The FKG inequality implies that both terms in the last expression are nonnegative: indeed, $A_{\varepsilon}$ is increasing and therefore

[TABLE]

Hence the first term in the right hand side of (27) must be zero. Since $\bar{h}_{1}\mathbf{1}_{A_{\varepsilon}}\geq N-1-\varepsilon>0$ , this is possible only if $\pi_{N,\alpha}(g\,|A_{\varepsilon})=0$ . However, that and the continuity of $g$ imply that the extremal configuration $x_{i}\equiv N$ satisfies

[TABLE]

Hence $g(x)\leq 0$ for all $x\in\Omega_{N}$ and therefore $g\equiv 0$ . ∎

2.4. Absolute continuity

Recall that under $\pi_{N,\alpha}$ , the r.v. $(\eta_{k})_{k=1}^{N}$ sum up to $N$ . At several places in the proofs, it will be convenient to deal with independent r.v. instead. To that end, we use the following informal fact: for any $p\in(0,1)$ , the law of $(\eta_{1},\eta_{2},\ldots,\eta_{\lfloor pN\rfloor})$ under $\pi_{N,\alpha}$ is uniformly over $N$ absolutely continuous w.r.t. the law of $\lfloor pN\rfloor$ independent $\Gamma(\alpha,\alpha)$ r.v. The formal version is stated below:

Lemma 10.

Fix $p\in(0,1)$ . There exists a constant $C_{p}>0$ such that for all $N\geq 1$ , writing $n=\lfloor pN\rfloor$ , for any bounded measurable function $f:\mathbb{R}^{n}\to\mathbb{R}_{+}$ we have

[TABLE]

where $\nu_{N}$ is the law on $\Omega_{N}^{+}$ under which the $\eta_{k}$ ’s are IID $\Gamma(\alpha,\alpha)$ r.v.

Proof.

Let $g(\cdot)$ be the density function of a centered Gaussian distribution of unit variance. Let $q_{k}$ be the density function of a centered $\Gamma(k\alpha,\alpha)$ r.v. By the Local Limit Theorem [Pet75, Th.VII.2.7], we have

[TABLE]

Using the above together with the fact that $g$ is maximized at $0$ , we have for all $N$ sufficiently large

[TABLE]

The result follows by tuning the value of $C_{p}$ to also include the first few values of $N$ . ∎

3. Lower bound on total variation distance

In this section we prove the lower bound on the mixing time displayed in Theorems 1 and 2. We obtain in fact a more quantitative lower bound which is valid for all values $\alpha>0$ .

Proposition 11.

For any $\alpha>0$ , for any $\varepsilon\in(0,1)$ there exists $C_{\alpha,\varepsilon}>0$ such that for all $N\geq 2$

[TABLE]

We are going to follow a variant of the method introduced by Wilson in [Wil04]: to obtain the lower bound (28), we select a specific test function $f$ and show that by time $t=\frac{N^{2}}{\pi^{2}}(\log N-C_{\alpha,\varepsilon})$ , the value $f({\bf X}(t))$ is far from the equilibrium value $\pi_{N,\alpha}(f)$ with large probability. This is achieved by picking a suitable initial condition and by evaluating the first two moments of $f({\bf X}(t))$ . As in the case of the exclusion process, we select $f=f_{N}$ , the eigenfunction appearing in Proposition 1. As noted in the proof of that proposition, $f_{N}$ is an eigenfunction with eigenvalue $\operatorname{\mathrm{gap}}_{N}$ for all $\alpha>0$ .
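To experiment with this strategy, one can simulate the walk and evaluate the test function along the trajectory. The following is a minimal sketch for the case $\alpha=1$ (uniform resampling). The normalisation of `f_N` as $\sum_{k}\sin(\pi k/N)(x_{k}-k)$ is an assumption made for illustration, and all function names are hypothetical.

```python
import math
import random

def simulate_adjacent_walk(x0, N, t_end, rng):
    """Run the alpha = 1 adjacent walk from x0 = (x_1, ..., x_{N-1}) up to time t_end."""
    x = [0.0] + list(x0) + [float(N)]  # fixed endpoints x_0 = 0 and x_N = N
    t = 0.0
    while True:
        # N-1 independent rate-1 clocks: the next ring comes after an Exp(N-1) time.
        t += rng.expovariate(N - 1)
        if t > t_end:
            break
        k = rng.randint(1, N - 1)               # particle whose clock rang
        x[k] = rng.uniform(x[k - 1], x[k + 1])  # uniform resampling between neighbours
    return x[1:N]

def f_N(x, N):
    """Assumed first Fourier mode of the centered profile k -> x_k - k."""
    return sum(math.sin(math.pi * k / N) * (x[k - 1] - k) for k in range(1, N))

if __name__ == "__main__":
    rng = random.Random(1)
    N = 8
    start = [float(N)] * (N - 1)  # maximal configuration, all particles at N
    end = simulate_adjacent_walk(start, N, 5.0, rng)
    print(f_N(start, N), f_N(end, N))
```

Averaging `f_N` over many independent runs gives an empirical counterpart of the first two moments used in the argument.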

The initial condition $\mathbf{X}(0)$ is defined as follows. When $\alpha=1$ we let $\eta_{k},k=1,\ldots,N/2$ be i.i.d. exponential random variables with mean $2$ conditioned on $\eta_{1}+\ldots+\eta_{N/2}=N$ (in this section, we write $N/2$ for $\lfloor N/2\rfloor$ to lighten the notation), and set $\eta_{k}=0$ for all $k\in\llbracket N/2+1,N\rrbracket$ . More generally, for arbitrary $\alpha>0$ we choose the distribution of $\eta_{k}$ for $k\leq N/2$ to be i.i.d. $\Gamma(\alpha,\alpha/2)$ , with the same conditioning on the sum, and set again $\eta_{k}=0$ for all $k\in\llbracket N/2+1,N\rrbracket$ . For the rest of this section, we let ${\bf X}(t)$ be the process starting from the random initial condition with increments $\eta_{k}$ ,

[TABLE]

We shall use $\mathbb{P},\mathbb{E}$ for the corresponding probability measure and expectation, and ${\rm Var}$ for the associated variance. For later use we also prove a result concerning the variance of other Fourier coefficients of $\mathbf{X}(t)$ , namely the functions $f_{N}^{(j)}$ from (26).
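A possible way to sample this initial condition is sketched below. It relies on the standard fact that i.i.d. gamma variables conditioned on their sum are distributed as that sum times a symmetric Dirichlet vector, so the rate parameter plays no role once the conditioning is imposed. The partial-sum convention $X_{k}(0)=\sum_{i\leq k}\eta_{i}$ and the function names are assumptions made for illustration.

```python
import random

def sample_initial_increments(N, alpha, rng):
    """Increments eta_1..eta_N: the first N//2 are gamma(alpha) conditioned to sum to N."""
    half = N // 2
    g = [rng.gammavariate(alpha, 1.0) for _ in range(half)]
    total = sum(g)
    # Normalising i.i.d. gammas gives a Dirichlet vector; rescale it to total mass N.
    return [N * v / total for v in g] + [0.0] * (N - half)

def positions(eta):
    """Partial sums X_k = eta_1 + ... + eta_k for k = 1, ..., N-1."""
    x, s = [], 0.0
    for e in eta[:-1]:
        s += e
        x.append(s)
    return x

if __name__ == "__main__":
    rng = random.Random(0)
    eta = sample_initial_increments(10, 2.0, rng)
    print(eta)
    print(positions(eta))
```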

Lemma 12.

For the process described above with initial condition (29), there exists $C>0$ such that for all $t\geq 0$ and all $N$ large enough

[TABLE]

and for every $j\in\llbracket 2,N-1\rrbracket$

[TABLE]

With this lemma at hand, the proof of the lower bound is simple.

Proof of Proposition 11.

Define

[TABLE]

and note that for all $t\geq 0$ :

[TABLE]

Chebyshev’s inequality and Lemma 12 imply that

[TABLE]

Similarly, noting that ${\rm Var}_{\pi_{N,\alpha}}(f_{N})=\pi_{N,\alpha}(f_{N}^{2})=\lim_{t\to\infty}{\rm Var}[f_{N}(\mathbf{X}(t))]$ , one has

[TABLE]

Recalling that $\operatorname{\mathrm{gap}}_{N}=1-\cos(\pi/N)$ concludes the proof. ∎

We close this section with the proof of the lemma.

Proof of Lemma 12.

First of all, since $f_{N}$ is an eigenfunction of $\mathcal{L}_{N,\alpha}$ associated with the eigenvalue $\operatorname{\mathrm{gap}}_{N}$ , we have

[TABLE]

where the last bound follows from the fact that the initial condition satisfies

[TABLE]

To compute the variance at a fixed time $t>0$ (we focus on the case $j=1$ to keep the notation light and explain the general case at the end of the proof), we introduce the process

[TABLE]

Since $\mathbb{E}[f_{N}(\mathbf{X}(s))|\mathbf{X}(u)]=e^{-\operatorname{\mathrm{gap}}_{N}(s-u)}f_{N}(\mathbf{X}(u))$ , $u\in[0,s]$ , it follows that $(M_{s}^{t},\,s\in[0,t])$ is a martingale. The associated increasing predictable process, or angle bracket, is denoted $\langle M^{t}_{\cdot}\rangle_{s}$ , $s\in[0,t]$ . In particular, we look for an upper bound on ${\rm Var}[f_{N}(\mathbf{X}(t))]=\mathbb{E}[\langle M^{t}_{\cdot}\rangle_{t}]$ .

Note that independently of the value of $\alpha$ , when an update is performed at coordinate $k$ , the value of $f_{N}({\mathbf{X}}(t))$ varies at most by $\eta_{k}(s)+\eta_{k+1}(s)$ , where $\eta_{k}(s)=(X_{k}-X_{k-1})(s)$ . Thus,

[TABLE]

Therefore,

[TABLE]

Now using monotonicity of the dynamics for the gradients (Proposition 7), we can replace $\eta_{k}(s)$ by the gradients corresponding to a dynamics on $\Omega^{+}_{N}$ starting from a larger initial condition for the order $\succcurlyeq$ defined in (16). A natural choice is to pick an initial condition which is stationary for the dynamics so that the dependence in $s$ vanishes in the integral.

Let $(\eta^{\prime}_{k})_{k=1}^{N}$ be i.i.d. variables with distribution $\Gamma(\alpha,\alpha/2)$ , conditioned to

[TABLE]

and consider $\mathbf{X}^{\prime}(t)$ the dynamics on $\Omega^{+}_{N}$ with initial condition $X^{\prime}_{k}(0):=\sum_{i=1}^{k}\eta^{\prime}_{i}$ . Note that one can construct $\eta$ and $\eta^{\prime}$ on the same probability space in such a way that $\eta_{k}\leq\eta^{\prime}_{k}$ for all $k$ by setting

[TABLE]

and $\eta_{k}=0$ , for $k\in\llbracket N/2+1,N\rrbracket$ . Indeed, by a standard property of the gamma distribution, the variables $\eta^{\prime}_{k}/\sum_{j\leq N/2}\eta_{j}^{\prime}$ and $\sum_{j\leq N/2}\eta_{j}^{\prime}$ are independent and therefore the $\eta_{k}$ defined in (31) has the correct distribution. Hence using Proposition 7 we can assume $\mathbf{X}(t)\preccurlyeq\mathbf{X}^{\prime}(t)$ for all $t\geq 0$ and thus in particular $\mathbb{E}\left[\eta_{k}(s)^{2}\right]\leq\mathbb{E}\left[\eta^{\prime}_{k}(s)^{2}\right]$ .

Finally let $(\eta^{\prime\prime}_{k})_{k=1}^{N}$ be (unconditioned) i.i.d. with distribution $\Gamma(\alpha,\alpha/2)$ . Defining the dynamics $\eta^{\prime\prime}_{k}(s)$ with this initial condition one has

[TABLE]

By stationarity,

[TABLE]

Since the expected value of each $\eta^{\prime\prime}_{k}(0)$ is $2$ , the central limit theorem shows that the event $\sum_{j\leq N/2}\eta_{j}^{\prime\prime}(0)\geq N$ has probability at least $1/3$ if $N$ is sufficiently large. In conclusion,

[TABLE]

Using this bound in (30) shows that ${\rm Var}[f_{N}(\mathbf{X}(t))]\leq CN/\operatorname{\mathrm{gap}}_{N}\leq C^{\prime}N^{3}$ uniformly in $t\geq 0$ .
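The central limit step used in this proof can be probed by simulation. The sketch below estimates the probability that $N/2$ i.i.d. $\Gamma(\alpha,\alpha/2)$ variables (second parameter read as a rate, so that the mean is $2$ ) sum to at least $N$ . It is only a Monte Carlo illustration with hypothetical names, not part of the argument.

```python
import random

def prob_sum_at_least_N(N, alpha, n_samples, rng):
    """Monte Carlo estimate of P(eta''_1 + ... + eta''_{N/2} >= N)."""
    half = N // 2
    hits = 0
    for _ in range(n_samples):
        # Gamma with shape alpha and rate alpha/2 has scale 2/alpha and mean 2.
        s = sum(rng.gammavariate(alpha, 2.0 / alpha) for _ in range(half))
        if s >= N:
            hits += 1
    return hits / n_samples

if __name__ == "__main__":
    rng = random.Random(0)
    print(prob_sum_at_least_N(200, 1.0, 2000, rng))
```

Since the sum has mean $N$ , the estimate should be close to $1/2$ for large $N$ , comfortably above the value $1/3$ used in the proof.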

For $j\in\llbracket 2,N-1\rrbracket$ , we repeat the above procedure for the martingale

[TABLE]

and obtain ${\rm Var}[f^{(j)}_{N}(\mathbf{X}(t))]\leq CN/\lambda^{(j)}_{N}\leq CN^{3}j^{-2}$ . This concludes the proof of Lemma 12.

∎

4. Mixing time for the separation distance

Here we prove Theorem 3. The main result of this section is the following lower bound.

Proposition 13.

For any $\alpha>0$ and any $\varepsilon\in(0,1)$ , there exists $C_{\alpha,\varepsilon}>0$ such that for all $N\geq N_{\varepsilon}$ sufficiently large we have

[TABLE]

With this result at hand, and assuming the validity of Theorem 2, the derivation of the asymptotic of the separation mixing times is somewhat standard.

Proof of Theorem 3.

Proposition 13 gives the desired lower bound on the mixing times. Regarding the upper bound, we adapt the argument used in the discrete setup (see e.g. [LPW17, Lemma 19.3]) to show that

[TABLE]

where $(\cdot)_{+}$ denotes the positive part. Theorem 2 and (33) clearly imply the desired upper bound in Theorem 3.

Recalling the notation $P_{t}(A,B)$ defined below (11) we notice that reversibility implies that $P_{t}(A,B)=P_{t}(B,A)$ and as a consequence

[TABLE]

Thus for any $A,B$ with positive Lebesgue measure, using the semigroup property at the first line and reversibility at the second line

[TABLE]

By Schwarz’ inequality, we have

[TABLE]

Let $\bar{P}_{t}(A,\,\text{\rm d}z)=(\pi_{N,\alpha}(A))^{-1}P_{t}(A,\,\text{\rm d}z)$ denote the normalized version of $P_{t}(A,\,\text{\rm d}z)$ . From (34) it follows that $\frac{P^{z}_{t}(A)}{\pi_{N,\alpha}(A)}$ and $\frac{P^{z}_{t}(B)}{\pi_{N,\alpha}(B)}$ are the respective densities of $\bar{P}_{t}(A,\cdot)$ and $\bar{P}_{t}(B,\cdot)$ w.r.t. $\pi_{N,\alpha}$ . Therefore, using the triangle inequality one has

[TABLE]

Taking the infimum over $A$ and $B$ yields (33). ∎

Proof of Proposition 13.

We are going to show that for all $\alpha,\varepsilon>0$ , there exists $C>0$ such that if

[TABLE]

then one can find measurable sets $A,B\subset\Omega_{N}$ such that

[TABLE]

A natural choice to minimize $P_{t_{0}}(A,B)$ is to choose $A,B$ as tiny neighbourhoods of the opposite extremal configurations. We define the neighbourhoods of our extremal configurations $\vee$ and $\wedge$ as follows

[TABLE]

and we are going to prove

[TABLE]

We assume for simplicity that $N$ is even just for the sake of notation. The first step of the proof is to reduce (36) to a simple statement about the process at time $t_{0}/2$

[TABLE]

and the second step is to prove (37).

Let us now show that (36) follows from (37) as a consequence of reversibility and the FKG inequality. The proof is in fact very similar to the one developed in [Lac16, Section 7.1] in a discrete setup. We have

[TABLE]

Now we can split the integral on $y$ into two contributions

[TABLE]

with $U_{+}\cup U_{-}=\Omega_{N}$ and $\pi_{N,\alpha}(U_{+}\cap U_{-})=0$ . Using reversibility and symmetry we have

[TABLE]

so that the right hand side of (38) is equal to

[TABLE]

Now according to the observations made in Section 6 (see Lemma 32), the measure

[TABLE]

has an increasing density with respect to $\pi_{N,\alpha}$ (call it $\rho_{+}$ ). Also we observe that by monotone coupling $P_{t_{0}/2}(y,\underline{\vee})$ is decreasing in $y$ . Hence using the FKG inequality (Proposition 8 for the restriction to the stable set $U_{-}$ ) we obtain that the quantity (40) satisfies

[TABLE]

Using stationarity we see that the second integral is smaller than

[TABLE]

Hence using the fact that $\pi_{N,\alpha}(U_{-})=1/2$ and replacing $\rho_{+}(x)\pi_{N,\alpha}(\,\text{\rm d}x)$ by its definition we can conclude that

[TABLE]

This proves that (36) follows from (37). It remains to prove (37), or in other words that starting from a random initial condition in $\bar{\wedge}$ the probability that $\mathbf{X}_{t_{0}/2}\in U_{-}$ is small.

Note that (37) follows from a slightly stronger result for the maximal initial condition

[TABLE]

Indeed, with the grand coupling described in the proof of Proposition 7, for any $x\in\bar{\wedge}$ , $i\in\llbracket 1,N-1\rrbracket$ and any $t>0$ we have $X^{x}_{i}(t)\geq X^{\wedge}_{i}(t)-1$ .

Now by monotonicity we can replace $\mathbf{X}^{\wedge}(t)$ by the dynamics with the random initial condition considered in Lemma 12.

The function $i\mapsto X_{i}(t)-i$ can be decomposed on the orthonormal basis of the discrete Laplacian (endowed with Dirichlet b.c.). Using (26), we thus obtain

[TABLE]

Using the fact that $\mathbb{E}[f^{(j)}_{N}(\mathbf{X}(t))]=e^{-\lambda^{(j)}_{N}t}\mathbb{E}[f^{(j)}_{N}(\mathbf{X}(0))]$ , and observing that all the terms with $j\geq 2$ become negligible, it follows that if the constant $C$ is large enough, then

[TABLE]

Now the important part is to control the variance of $X_{N/2}(t)$ . Our control is uniform in time.

From the very rough bound ${\rm Var}(\sum_{i\in I}Z_{i})\leq\left(\sum_{i\in I}\sqrt{{\rm Var}Z_{i}}\right)^{2}$ valid for any sequence of random variables $(Z_{i})_{i\in I}$ , and using Lemma 12 one obtains

[TABLE]

Then (43) can easily be deduced by combining the inequalities for the two first moments (45) and (46). ∎
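The rough variance bound used above is a form of the triangle inequality for the $L^{2}$ norm, so it holds for any joint law, including an empirical one. The sketch below checks it on correlated samples; the function name and the data layout are assumptions made for illustration.

```python
import math
import random
import statistics

def variance_of_sum_and_bound(samples):
    """samples[i] lists the observed values of Z_i (all lists of equal length).

    Returns (Var(sum_i Z_i), (sum_i sqrt(Var Z_i))**2) under the empirical measure.
    """
    sums = [sum(col) for col in zip(*samples)]
    var_sum = statistics.pvariance(sums)
    bound = sum(math.sqrt(statistics.pvariance(z)) for z in samples) ** 2
    return var_sum, bound

if __name__ == "__main__":
    rng = random.Random(0)
    base = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # Three positively correlated variables sharing the same base noise.
    samples = [[b + rng.gauss(0.0, 0.5) for b in base] for _ in range(3)]
    print(variance_of_sum_and_bound(samples))
```

The bound is attained when the variables are perfectly positively correlated, which is why it is described as very rough.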

5. Upper bound on total variation distance

5.1. Decomposition of the proof

In this section we prove the upper bound on the total variation mixing time displayed in Theorems 1 and 2. Given $\delta>0$ we set

[TABLE]

and we are going to prove that

[TABLE]

Since $\delta>0$ is arbitrary and $\operatorname{\mathrm{gap}}_{N}{\sim}\frac{1}{2}\pi^{2}N^{-2}$ this yields the desired upper bound. We establish this statement via two intermediate propositions. Firstly, we show that by time $t_{\delta}$ we can bring to equilibrium the extremal process started in the maximal configuration $\wedge$ .

Proposition 14.

For any $\delta>0$ we have

[TABLE]

Secondly, we show that with large probability and for any given initial condition $x$ , we can couple the process starting from $x$ with the maximal process.

Proposition 15.

For any $\delta>0$ we have

[TABLE]

By the triangle inequality, (47) is an immediate consequence of the two propositions above.

Even though Proposition 14 is a consequence of Proposition 15, our proof of the latter will actually rely on the former result; see also Remark 42 below. Consequently, we will need an independent proof of Proposition 14 and this will be carried out in Section 6. The remainder of this section is concerned with the proof of Proposition 15.

From now on, $x\in\Omega_{N}$ is fixed; however all the constants that will appear below will be independent of this chosen $x$ . In the forthcoming Section 5.2, we construct the processes $\mathbf{X}^{\wedge}$ and $\mathbf{X}^{x}$ on the same probability space in such a way that (recall (15))

[TABLE]

In the remainder of this proof $\mathbb{P}$ denotes the probability associated with the probability space on which this coupling is constructed. We let $A_{t}$ denote the area comprised between the graphs of $\mathbf{X}^{x}$ and $\mathbf{X}^{\wedge}$ :

[TABLE]

and we aim at bounding its hitting time of $0$

[TABLE]

As the ordering given by (48) implies that $\mathbf{X}^{\wedge}(t)=\mathbf{X}^{x}(t)$ for $t\geq\tau$ , we have

[TABLE]

Of course the distribution of $\tau$ depends tremendously on the coupling. For instance, the reader can check that for the coupling provided by Proposition 7 which satisfies (48), we have $\tau=\infty$ with probability one. The coupling presented below is constructed with the aim of minimizing the merging time $\tau$ . The proof is split into three main steps:

(1) The area passes below $N^{3/2}$ by time $t_{\delta/2}$ with large probability.

(2) Within an additional time of order $N^{2}$ , the area is very likely to decline from $N^{3/2}$ to $N^{-1}$ .

(3) The area goes from $N^{-1}$ down to $0$ within a time of order $\log N$ with large probability.

The above steps clearly ensure that the event $\tau>t_{\delta}$ has a small probability, which will conclude the proof of Proposition 15.

The first step is a rather simple consequence of the fact that $\mathbb{E}[X_{k}^{\wedge}(t)-X^{x}_{k}(t)]$ is a solution of the discrete heat equation.

The third step is a brute force argument inspired by Randall and Winkler [RW05b]. More precisely, for this last step we build on the main idea of [RW05b], but we improve it in a quantitative manner using estimates on the minimum gradient of $\mathbf{X}^{\wedge}$ .

The second step above is by far the most delicate one. The strategy is to introduce a sequence of intermediate thresholds between the values $N^{3/2}$ and $N^{-1}$ , and then analyse the associated hitting times for the area process. Our control of these hitting times relies on diffusive estimates for the supermartingale $A_{t}$ and on estimates on the corresponding angle bracket process. One of the ingredients of the latter estimates is a fine control of the gradients of $\mathbf{X}^{\wedge}$ , which is in turn derived from a combination of equilibrium estimates and Proposition 14.

5.2. The coupling and preliminary lemmas

We start by defining the coupling $\mathbb{P}$ . For any $k\in\llbracket 1,N-1\rrbracket$ and any configuration $x\in\Omega_{N}$ , we define the “interval of resampling of the $k$ -th coordinate” as $I(x,k):=[x_{k-1},x_{k+1}]$ and write $|I(x,k)|=\nabla x_{k}$ for its length, where the “gradient” $\nabla x_{k}$ is defined by

[TABLE]

For simplicity we sometimes use the short-hand notation $I^{\wedge}=I(\mathbf{X}^{\wedge}(t_{-}),k)$ and $I^{x}=I(\mathbf{X}^{x}(t_{-}),k)$ . In our construction, we are going to try, at each resampling event, to couple $\mathbf{X}^{x}_{k}$ and $\mathbf{X}^{\wedge}_{k}$ with the maximal probability. Letting $\mathrm{Beta}_{\alpha}(I)$ denote the distribution with density given by (when $I=[a,b]$ )

[TABLE]

we set

[TABLE]

and $q:=1-p$ . In the case $\alpha=1$ we have

[TABLE]

but there is no such simple expression for $p$ when $\alpha>1$ ; however we will be able to provide good estimates for it (cf. Lemma 16).

Then, we define $\nu_{1}$ , $\nu_{2}$ , $\nu_{3}$ to be the probability measures with respective densities $\rho_{1},\rho_{2}$ , and $\rho_{3}$ given by

[TABLE]

As a consequence of our assumption $\alpha\geq 1$ (which makes the functions $B_{\alpha}(I)$ unimodal), and the fact that by monotonicity both extremities of $I^{\wedge}$ are larger than their counterparts in $I^{x}$ , the supports of $\rho_{1}$ and $\rho_{3}$ are intervals $I_{1}$ and $I_{3}$ , the lower extremity of $I_{3}$ being larger than or equal to the upper extremity of $I_{1}$ . To see this it is sufficient to check that the equation $B_{\alpha}(I^{x})(u)=B_{\alpha}(I^{\wedge})(u)>0$ has at most one solution $u$ . We refer to Figure 1 for the case $\alpha=1$ and to Figure 2 for the general case of a unimodal density.

Our coupled dynamics is defined as follows. Each pair of coordinates $(X^{x}_{k},X^{\wedge}_{k})$ , $k=1,\ldots,N-1$ , is updated with rate one independently, and when an update occurs at time $t$ then

• with probability $p$ , the two new coordinates are set to the same value $X_{k}^{x}(t)=X_{k}^{\wedge}(t)$ drawn from the distribution $\nu_{2}$ ,

• with probability $q$ , the two new coordinates are sampled independently with respective distributions $\nu_{1}$ and $\nu_{3}$ , and therefore satisfy $X_{k}^{x}(t)\leq X_{k}^{\wedge}(t)$ .
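For $\alpha=1$ the update rule can be made fully explicit, since all densities are piecewise constant. The sketch below is one possible implementation under the assumptions that $I^{x}=[l_{1},r_{1}]$ and $I^{\wedge}=[l_{2},r_{2}]$ satisfy $l_{1}\leq l_{2}$ and $r_{1}\leq r_{2}$ , and that $p$ is the integral of the minimum of the two uniform densities, which equals the length of the overlap divided by the length of the longer interval. The function names are hypothetical.

```python
import random

def coupling_probability(ix, iw):
    """Integral of the minimum of the uniform densities on ix and iw."""
    (l1, r1), (l2, r2) = ix, iw
    overlap = max(0.0, min(r1, r2) - max(l1, l2))
    return overlap / max(r1 - l1, r2 - l2)

def _residual(rng, only, shared, len_self, len_other):
    """Sample from the normalised positive part of (own density - other density)."""
    w_only = (only[1] - only[0]) / len_self  # mass outside the other interval
    w_shared = max(0.0, shared[1] - shared[0]) * max(0.0, 1.0 / len_self - 1.0 / len_other)
    if rng.random() * (w_only + w_shared) < w_only:
        return rng.uniform(*only)
    return rng.uniform(*shared)

def coupled_update(ix, iw, rng):
    """One coupled resampling for alpha = 1, assuming l1 <= l2 and r1 <= r2."""
    (l1, r1), (l2, r2) = ix, iw
    len1, len2 = r1 - l1, r2 - l2
    if rng.random() < coupling_probability(ix, iw):
        z = rng.uniform(l2, r1)  # common value drawn uniformly from the overlap
        return z, z
    shared = (l2, r1)  # overlap; has non-positive length when the intervals are disjoint
    zx = _residual(rng, (l1, min(l2, r1)), shared, len1, len2)
    zw = _residual(rng, (max(l2, r1), r2), shared, len2, len1)
    return zx, zw

if __name__ == "__main__":
    rng = random.Random(0)
    print(coupling_probability((0.0, 1.0), (0.5, 2.5)))
    print(coupled_update((0.0, 1.0), (0.5, 2.5), rng))
```

At most one of the two residual laws charges the overlap (the one attached to the shorter interval), which is why the uncoupled pair stays ordered.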

For convenience, we set

[TABLE]

(in the remainder of the proof $\delta$ as a positive parameter is not used anymore so that this should not yield confusion). We also introduce the “mean” interfaces $\bar{\mathbf{X}}^{\wedge}$ and $\bar{\mathbf{X}}^{x}$ by setting

[TABLE]

and similarly for $\bar{\mathbf{X}}^{x}$ . Note that $\bar{X}^{\wedge}_{k}(t)$ is the midpoint of the interval of resampling of the $k$ -th coordinate. We finally set

[TABLE]

We now collect a few facts on our coupling. We start by showing that the probability $q(t,k)$ can be fairly approximated by

[TABLE]

and that, as soon as $q$ is close enough to $1$ , the overlap between the resampling intervals represents a small fraction of the largest resampling interval. This second bound will be convenient in order to bound from below the derivative of the bracket of the area, see the proof of Proposition 18.

Lemma 16.

For any $\alpha\geq 1$ , there is a constant $C$ (that depends on $\alpha$ ) such that for all $t\geq 0$ and $k\in\llbracket 1,N-1\rrbracket$

[TABLE]

Furthermore, there exists a constant $c_{1}>0$ such that

[TABLE]

(where the notation used is the one introduced below (50)).

Proof.

Note that (54) is simply a result concerning the total variation between two $\mathrm{Beta}_{\alpha}$ variables defined on two intervals $I_{1}:=[l_{1},r_{1}]$ and $I_{2}:=[l_{2},r_{2}]$ such that $l_{1}\leq l_{2}$ and $r_{1}\leq r_{2}$ . Recalling the definition (51)

[TABLE]

and (54) holds if we can prove (for a different constant $C$ ) that

[TABLE]

By symmetry, we can assume that $I_{1}$ is the larger of the two intervals and by scaling invariance we can assume without loss of generality that $I_{1}=[0,1]$ and $I_{2}=:[a,a+b]$ , $a>0$ , $b\in(1-a,1]$ , in which case

[TABLE]

We can further assume that $a<1$ as the result is trivially valid when $a\geq 1$ .

For better readability of the proof we replace $\mathrm{Beta}_{\alpha}$ by a generic unimodal function $\rho$ which is positive on $(0,1)$ , integrates to $1$ and is symmetric around $1/2$ . We replace $\mathrm{Beta}_{\alpha}[a,a+b]$ by $\rho_{[a,a+b]}=b^{-1}\rho((\cdot-a)/b)$ with the convention that $\rho(u)=0$ for $u\notin[0,1]$ .

We first let the reader check that $\int_{[0,1]}\left[\rho(u)-\rho_{[a,a+b]}(u)\right]_{+}\,\text{\rm d}u$ is increasing in $a$ and $b$ simply because the integrand displays the same monotonicities.

For (55), we simply observe, using these monotonicities in $a$ and $b$ , that if $a<1/2$ then

[TABLE]

As a consequence, setting $c_{1}=1-\int_{[0,1]}\left[\rho(u)-\rho_{[1/2,3/2]}(u)\right]_{+}\,\text{\rm d}u$ , we deduce that if $q\geq 1-c_{1}$ , we have $a\geq 1/2$ and $|I_{1}\cap I_{2}|\leq 1/2$ thus yielding (55).

To prove (54), we show that for every $a\in[0,1]$

[TABLE]

which allows us to prove (57) with $C=\max\left(\|\rho\|_{\infty},4/\rho(1/4)\right)$ . By monotonicity it is sufficient to check the upper-bound for $b=1$ . Figure 2 provides a graphical proof of the following inequality

[TABLE]

Now concerning the lower bound, using again the graphical proof of Figure 2 (by symmetry of $\rho$ the two curves intersect at $(1+a)/2$ ), we have for $a\leq 1/2$

[TABLE]

and thus the l.h.s. being increasing in $a$ we conclude that for all $a\in(0,1)$ (the factor $1/2$ is present so that the inequality is also valid for $a\in(1/2,1]$ )

[TABLE]

Using symmetry and invariance by translation at the first line and the triangle inequality for the total variation distance at the second line, we have

[TABLE]

This lower bound is also valid for $b\in(1-a,1)$ by monotonicity in $b$ , thus completing the proof of (58). ∎

Remark 17.

Let us observe for later use that the upper bound in (57) is also valid without the assumption $l_{1}\leq l_{2}$ , $r_{1}\leq r_{2}$ (at the cost of taking $C$ twice as large). This can be deduced from the other case. Indeed, assume without loss of generality that $[l_{1},r_{1}]\subset[l_{2},r_{2}]$ and set $I_{3}=[l_{1},r_{2}]$ . Then, using the previous bounds for the pairs $(I_{1},I_{3})$ and $(I_{2},I_{3})$ ,

[TABLE]

Using the expression for the generator (25) and the fact that our coupling preserves monotonicity the reader can check that for every $t\geq 0$

[TABLE]

and hence that $A_{t}$ is a supermartingale for the filtration $(\mathcal{F}_{t})_{t\geq 0}$ defined by $\mathcal{F}_{t}:=\sigma(\mathbf{X}^{x}_{s},\mathbf{X}^{\wedge}_{s},s\leq t)$ . We write $\{\langle A_{\cdot}\rangle_{t},t\geq 0\}$ , for the associated angle bracket process, namely the increasing predictable process that compensates the square of the martingale part of $A_{t}$ .

Proposition 18.

There exists a constant $c>0$ such that for all $N$ large enough and for all $t\geq 0$ , we have

[TABLE]

Remark 19.

The proof will actually establish the same bound but with $\nabla X^{\wedge}_{k}(t)$ replaced by the maximum of the latter and $\nabla X^{x}_{k}(t)$ .

Proof.

First of all, recalling the coupling defined in Section 5.2, we have

[TABLE]

where $J(t,k)$ is the mean square displacement corresponding to an instantaneous uncoupled jump of $X^{\wedge}_{k}-X^{x}_{k}$ at time $t$ . We are going to prove a lower bound for each term in the sum, and as before we omit from now on the dependence in $k$ and $t$ . Without loss of generality we assume that $\nabla X^{\wedge}\geq\nabla X^{x}$ .

We let $Z^{x}$ and $Z^{\wedge}$ denote the two independent variables with respective distribution $\nu_{1}$ and $\nu_{3}$ (whose densities are described in Equation (52)) that are used in the coupling. We are going to prove first that

[TABLE]

This is achieved by computing explicitly $J$ (here $E$ denotes expectation for the pair of variables $(Z^{x},Z^{\wedge})$ )

[TABLE]

Replacing $J$ by its value, observing that $\delta\bar{X}=qE[Z^{\wedge}-Z^{x}]$ , and taking the minimum over all possible values for $\delta X$ we obtain that the l.h.s. of (61) is larger than

[TABLE]

which is the desired result.

To conclude from (61), we consider $c_{1}$ from (55) in Lemma 16. If $p\geq c_{1}$ then the r.h.s. of (61) is larger than $c_{1}(\delta\bar{X})^{2}/q$ and we can conclude using the upper bound in (58).

When $p\leq c_{1}$ then we use (55) which ensures that

[TABLE]

so that with probability at least $1/2$ , $Z^{\wedge}$ coincides with a $\mathrm{Beta}_{\alpha}(I^{\wedge})$ r.v. conditioned on being larger than its median $\bar{X}^{\wedge}$ . Since the variance of the latter conditional law is of order $|I^{\wedge}|^{2}=(\nabla X^{\wedge})^{2}$ , for some adequate choice of $c>0$ we have

[TABLE]

which concludes the proof.

∎

5.3. Successive hitting times

To prove Proposition 15, our argument is to show that by time $t_{\delta/2}+N^{2}$ , the area has become very small (smaller than $N^{-1}$ ) and then to use some brute force argument to show that $\tau$ cannot be much larger.

Our strategy to control the decay of the area requires several steps and we introduce the successive hitting times by the area of a sequence of well-chosen thresholds. For $\eta>0$ small, we define

[TABLE]

We first show that with large probability $\mathcal{T}_{2}$ is equal to $t_{\delta/2}$ . Then setting $L:=\min\{i\geq 1:\frac{3}{2}-i\eta<-1\}$ , we show that the increments $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ are small for all $i\leq L$ . The argument to control the increment $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ differs for different ranges of $i$ so that it is practical for us to introduce the following intermediate thresholds

[TABLE]
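The number $L$ of thresholds is a simple function of $\eta$ . The sketch below computes it with exact rational arithmetic, together with the exponents $\frac{3}{2}-i\eta$ , under the assumption that the $i$ -th threshold is of the form $N^{3/2-i\eta}$ . The function names are hypothetical.

```python
from fractions import Fraction

def number_of_thresholds(eta):
    """L = min{i >= 1 : 3/2 - i*eta < -1}, computed exactly."""
    eta = Fraction(eta)
    i = 1
    while Fraction(3, 2) - i * eta >= -1:
        i += 1
    return i

def threshold_exponents(eta):
    """Exponents 3/2 - i*eta for i = 1, ..., L."""
    eta = Fraction(eta)
    L = number_of_thresholds(eta)
    return [Fraction(3, 2) - i * eta for i in range(1, L + 1)]

if __name__ == "__main__":
    print(number_of_thresholds(Fraction(1, 10)))
    print(threshold_exponents(Fraction(1, 2)))
```

Only the last exponent lies strictly below $-1$ , so the area has dropped below $N^{-1}$ once the $L$ -th threshold is reached.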

We now introduce the following events which allow us to split our proof in four parts

[TABLE]

We prove in the forthcoming Sections 5.4, 5.7, 5.8, and 5.9 respectively that each of these four events holds with probability tending to 1. This implies in particular that

[TABLE]

In Section 5.10 we use this last statement to conclude the proof of Proposition 15. The remaining Sections 5.5-5.6 are dedicated to the introduction of technical material which is used throughout Sections 5.7-5.10.

5.4. Initial contraction of $A_{t}$ and control of $\mathcal{T}_{2}$

The probability of $\mathcal{B}_{(1)}^{N}$ can be controlled using Markov’s inequality for the non-negative random variable $A_{t}$ . Indeed $\mathbb{E}[A_{t}]$ has an explicit expression in terms of the discrete heat equation. This argument does not exploit our specific coupling $\mathbb{P}$ .

Lemma 20.

For any $\eta>0$ we have as $N\to\infty$

[TABLE]

In particular, if $\eta\leq\delta/10$ then $\lim_{N\to\infty}\mathbb{P}(\mathcal{B}_{(1)}^{N})=1$ .

Proof.

Let us set $a(t,k):=\mathbb{E}[X^{\wedge}_{k}(t)-X_{k}^{x}(t)]$ . From the expression (25), we deduce that

[TABLE]

In addition we have $a(t,0)=a(t,N)=0$ . Expanding $k\mapsto a(t,k)$ on the orthonormal basis of the discrete Laplacian as in (44) and estimating all the corresponding eigenvalues by the main eigenvalue, it follows that

[TABLE]

By the Cauchy-Schwarz inequality,

[TABLE]

Markov’s inequality then yields the asserted result. ∎
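The decay used in this proof can be illustrated numerically. The sketch below assumes that $a(t,\cdot)$ solves $\partial_{t}a=\frac{1}{2}\Delta a$ with zero boundary values, expands the initial profile in the orthonormal sine basis, and damps the $j$ -th coefficient by $e^{-(1-\cos(j\pi/N))t}$ . The function names and the choice of initial profile are assumptions made for illustration.

```python
import math

def heat_profile(a0, t):
    """Solve d/dt a = (1/2) Delta a with a(t,0) = a(t,N) = 0, from a0 = (a_1..a_{N-1})."""
    N = len(a0) + 1
    out = [0.0] * (N - 1)
    for j in range(1, N):
        # Orthonormal eigenvector of the Dirichlet Laplacian on {1, ..., N-1}.
        mode = [math.sqrt(2.0 / N) * math.sin(j * math.pi * k / N) for k in range(1, N)]
        coeff = sum(m * a for m, a in zip(mode, a0))
        decay = math.exp(-(1.0 - math.cos(j * math.pi / N)) * t)
        for k in range(N - 1):
            out[k] += coeff * decay * mode[k]
    return out

def l2_norm(v):
    """Euclidean norm of a finite sequence."""
    return math.sqrt(sum(c * c for c in v))

if __name__ == "__main__":
    N = 16
    a0 = [float(N)] * (N - 1)  # largest possible gap between the two interfaces
    gap = 1.0 - math.cos(math.pi / N)
    for t in (0.0, 50.0, 200.0):
        a_t = heat_profile(a0, t)
        print(t, sum(a_t), math.sqrt(N - 1) * math.exp(-gap * t) * l2_norm(a0))
```

Each printed line compares the expected area $\sum_{k}a(t,k)$ with the bound obtained by replacing every eigenvalue by the gap and then applying the Cauchy-Schwarz inequality.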

5.5. Technical preliminaries to control the probability of $\mathcal{B}_{(2)}^{N}$ , $\mathcal{B}_{(3)}^{N}$ , $\mathcal{B}_{(4)}^{N}$

Before going into the specifics of each case let us introduce the common framework which allows us to control $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ for $i\in\llbracket 3,L\rrbracket$ . Our main idea is to exploit the fact that $(A_{t})_{t\geq 0}$ is a supermartingale for which we have a reasonable control on the jumps. For such processes the hitting time can be estimated if one can control the angle bracket of the martingale, as shown in the following result from [LL18].

Proposition 21.

Let $(M_{t})_{t\geq 0}$ be a pure-jump supermartingale with bounded jump rate and jump amplitude. Given $a\in\mathbb{R}$ and $b\leq a$ , set

[TABLE]

If the amplitude of the jumps of $(M_{t})_{t\geq 0}$ is bounded above by $d$ , then we have for any $v\geq 4d^{2}$

[TABLE]

where $\langle M_{\cdot}\rangle$ denotes the bracket of $(M_{t})_{t\geq 0}$ (the predictable process which compensates the square of the martingale part of $M$ ).

Proof.

The only difference with the statement of [LL18, Proposition 29] is that $d$ is not necessarily below $a-b$ . However, a careful inspection of the proof therein shows that the present statement holds (note that the proof therein relies on [LL18, Lemma 30] and this result does not need any modification to cover our setting). ∎

We apply the previous proposition to the supermartingale $M_{t}=A_{t}$ . Our idea is to combine the above result with estimates on the increments of the bracket of $A$ , $\Delta_{i}\langle A\rangle:=\langle A_{\cdot}\rangle_{\mathcal{T}_{i}}-\langle A_{\cdot}\rangle_{\mathcal{T}_{i-1}}$ , in order to obtain upper-tail probability bounds for the increments of the hitting times $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ . The control of $\Delta_{i}\langle A\rangle$ as a function of $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ is technically involved. Our general strategy is to restrict ourselves to an event of large probability on which we have (for some adequate constant $C(i,N)$ )

[TABLE]

The specific events that we have to consider are introduced in the following section.

5.6. Restriction to the right set of events

Our convenient event is the intersection of two events $\mathcal{A}^{(N)}_{1}$ and $\mathcal{A}^{(N)}_{2}$ that we now introduce (for these events and others we make the dependence on $N$ explicit only when necessary). Regarding $\mathcal{A}_{1}$ , we only impose that the increments of the higher interface are not too large “at all times”:

[TABLE]

That the probability of $\mathcal{A}_{1}$ goes to $1$ will follow from the fact that the higher interface is close to equilibrium by Proposition 14 and from simple estimates under the invariant measure. To define $\mathcal{A}_{2}$ , we introduce the following events that impose some restrictions on the interfaces at a given time $t$ . The events $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ require respectively the gradients of the higher interface to be not too small, and the distance between the two interfaces to be not too large:

[TABLE]

Given $x\in\Omega_{N}$ , we let $(a_{i}(x))_{i=1}^{N-1}$ be the increasingly ordered sequence of the values $(\nabla x_{k})_{k=1}^{N-1}$ . Then we set

[TABLE]

Finally we define

[TABLE]

The event $\mathcal{A}_{2}$ then requires that for a large proportion of the interval of time $[t_{\delta/2},t_{\delta/2}+N^{2}]$ , the four events $\mathcal{C}_{i}(t)$ are satisfied:

[TABLE]

Note that the probability of $\mathcal{A}_{2}$ still goes to $1$ if $1-2^{-(L+1)}$ is replaced by any factor $1-c\in[0,1)$ . However, for later use we need this factor to be larger than $1-2^{-L}$ , and this explains our particular choice $1-2^{-(L+1)}$ .

Proposition 22.

We have $\lim_{N\to\infty}\mathbb{P}[\mathcal{A}^{(N)}_{1}\cap\mathcal{A}^{(N)}_{2}]=1$ .

Proof.

Recall that $\pi_{N,\alpha}$ is the invariant measure of our dynamics, and let $\nu_{N}$ be the law on $\Omega_{N}^{+}$ under which the $\eta_{k}$ ’s are independent $\Gamma(\alpha,\alpha)$ r.v. Without further mention, we will apply Lemma 10 that allows us to bound some functionals under $\pi_{N,\alpha}$ by the same functionals under $\nu_{N}$ .

We start with the event $\mathcal{A}_{1}$ . We have (recall that $\alpha\geq 1$ )

[TABLE]

By Proposition 14, it follows that

[TABLE]

Therefore, it suffices to work with the process starting from the stationary measure $\pi_{N,\alpha}$ : we denote by $\mathbf{P}$ the law of such a process. Let us subdivide the interval $[t_{\delta/2},t_{\delta/2}+N^{2}]$ into disjoint intervals $[t_{i},t_{i+1}]$ of size $N^{-5}$ : note that there are of order $N^{7}$ such intervals. Then, a standard argument on independent Poisson clocks ensures that the probability that on each interval $[t_{i},t_{i+1}]$ there is no more than $1$ resampling event is larger than $1-CN^{-1}$ for some constant $C>0$ , hence on that event, at any time $s\in[t_{\delta/2},t_{\delta/2}+N^{2}]$ , $\eta_{k}(s)$ is equal to some $\eta_{k}(t_{i})$ for some $i$ . Moreover, a simple union bound combined with (72) shows that the probability that for all $t_{i}$ and all $k$ , $\eta_{k}(t_{i})<10\log N$ goes to $1$ . Therefore,

[TABLE]
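The "standard argument on independent Poisson clocks" can be spelled out as follows. This is a sketch under the assumption (consistent with the rate-1 updates of the model) that the $N-1$ clocks are independent Poisson processes of rate $1$; the notation $Z_{J}$ and the explicit constants are introduced here for illustration and are not taken from the original argument.

```latex
% Z_J = number of rings (over all sites) in an interval J of length N^{-5}.
% The N-1 independent rate-1 clocks superpose into a Poisson process of rate N-1,
% so Z_J is Poisson with mean m = (N-1) N^{-5} \le N^{-4}.
% The number of intervals J covering a window of length N^2 is at most N^7 + 1 \le 2 N^7.
\begin{align*}
\mathbb{P}(Z_J \ge 2) &= 1 - e^{-m}(1+m) \le \frac{m^{2}}{2} \le \frac{1}{2} N^{-8}, \\
\mathbb{P}\bigl(\exists J : Z_J \ge 2\bigr) &\le 2N^{7} \cdot \frac{1}{2} N^{-8} = N^{-1}.
\end{align*}
```

Under these assumptions the probability of seeing at most one resampling event in every interval is at least $1-N^{-1}$, which is of the form $1-CN^{-1}$ used above.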

We turn to the event $\mathcal{A}_{2}$ . By Markov’s inequality, it suffices to show that for every $i\in\llbracket 1,4\rrbracket$

[TABLE]

To handle the events $\mathcal{C}_{1}$ , $\mathcal{C}_{3}$ and $\mathcal{C}_{4}$ , Proposition 14 ensures that one can work under the stationary measure $\pi_{N,\alpha}$ . We are going to make extensive use of the following tail estimates for the $\Gamma(2\alpha,\alpha)$ distribution:

[TABLE]

By Lemma 10, the bound is also valid under $\pi_{N,\alpha}$ (with a different constant). To control the probability of $\mathcal{C}^{\complement}_{1}$ it is sufficient to observe that by union bound and exchangeability of the increments

[TABLE]

We turn to $\mathcal{C}_{4}$ . By union bound and exchangeability again we have

[TABLE]

for all $\lambda\geq 0$ . Since $\nu_{N}[(\nabla x_{2k})^{2}]=(4+2\alpha^{-1})>1$ , it is simple to check that for small enough $\lambda$ , we have $\mathbb{E}[e^{-\lambda(\nabla x_{2})^{2}}]\leq e^{-\lambda}$ which suffices to conclude.
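The claim that $\mathbb{E}[e^{-\lambda(\nabla x_{2})^{2}}]\leq e^{-\lambda}$ for small $\lambda$ can be checked by a second-order expansion. The sketch below assumes that, under $\nu_{N}$ , the variable $Y:=\nabla x_{2}$ has the $\Gamma(2\alpha,\alpha)$ law (shape $2\alpha$ , rate $\alpha$ ), which is consistent with the second moment $4+2\alpha^{-1}$ quoted above; the explicit threshold on $\lambda$ is one possible choice, not the one of the original proof.

```latex
% Y \sim \Gamma(2\alpha,\alpha) (shape 2\alpha, rate \alpha): mean 2, variance 2/\alpha,
% and all moments of Y are finite.
\begin{align*}
\mathbb{E}[Y^{2}] &= \operatorname{Var}(Y) + \mathbb{E}[Y]^{2} = \frac{2}{\alpha} + 4, \\
\mathbb{E}\bigl[e^{-\lambda Y^{2}}\bigr]
  &\le 1 - \lambda\Bigl(4 + \frac{2}{\alpha}\Bigr) + \frac{\lambda^{2}}{2}\,\mathbb{E}[Y^{4}]
  && \text{since } e^{-u} \le 1 - u + \tfrac{u^{2}}{2} \text{ for } u \ge 0, \\
  &\le 1 - \lambda \le e^{-\lambda}
  && \text{whenever } 0 \le \lambda \le \frac{2(3 + 2\alpha^{-1})}{\mathbb{E}[Y^{4}]}.
\end{align*}
```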

We now consider $\mathcal{C}_{3}$ . We let $b_{i}(x)$ denote the set of increasingly ordered values of $(\nabla x_{2k})_{k=1}^{N/2}$ . We are going to show that $\pi_{N,\alpha}(\mathcal{C}^{\prime\complement}_{3})$ tends to zero where

[TABLE]

The same bound concerning the odd gradients $(\nabla x_{2k-1})_{k=1}^{N/2}$ is then sufficient to conclude. By union bound and exchangeability of the variables $(\nabla x_{2k})_{k=1}^{N/2}$ , we have

[TABLE]

Using Lemma 10 and (73) we obtain for some positive constant $C$

[TABLE]

Regarding $\mathcal{C}_{2}$ , we first show that for $N$ sufficiently large

[TABLE]

By symmetry, we can restrict to $k\in\llbracket 0,N/2\rrbracket$ and by Lemma 10, it suffices to prove this bound under $\nu_{N}$ . For every $k\in\llbracket 0,N/2\rrbracket$ the law of $x_{k}$ under $\nu_{N}$ is $\Gamma(k\alpha,\alpha)$ . Using a Chernoff bound for all $N$ large enough we obtain

[TABLE]

(Note that a finer upper bound would be $e^{-c(\log N)^{2}}$ for some small enough $c>0$ ). Consequently, (76) follows. By Proposition 14, we deduce that

[TABLE]

By symmetry the same is valid for $X_{k}^{\vee}(t)$ . By Proposition 7 $X_{k}^{x}(t)$ is stochastically dominated by $X_{k}^{\wedge}(t)$ and stochastically dominates $X_{k}^{\vee}(t)$ . This implies that

[TABLE]

and the result is obtained by observing that $\mathcal{C}_{2}(t)^{\complement}$ is contained in the union of the two events in (77) and (78). ∎

5.7. Controlling the $\mathcal{T}_{i}$ increments for $i\in\llbracket 3,I\rrbracket$

We are now ready to prove

Lemma 23.

The event $\mathcal{B}_{(2)}^{N}$ satisfies

[TABLE]

Recall that for any $t\in[t_{\delta/2},\mathcal{T}_{I})$ , the area satisfies $A_{t}>N^{1+\kappa}$ for some $\kappa>0$ . Our proof relies then on the following observation.

Lemma 24.

For all $t\in[t_{\delta/2},t_{\delta/2}+N^{2}]$ , on the event $\mathcal{A}_{1}\cap\mathcal{C}_{2}(t)\cap\mathcal{C}_{4}(t)\cap\{A_{t}\geq N(\log N)^{4}\}$ we have

[TABLE]

Proof of Lemma 23.

Proposition 21 applied to the supermartingale $(A_{\mathcal{T}_{i-1}+s})_{s\geq 0}$ (whose maximal jump size is $N$ ) with $v=N^{3-2(i-1)\eta}\log N$ , yields that $\lim_{N\to\infty}\mathbb{P}(\mathcal{A}^{(N)}_{3})=1$ where

[TABLE]

To conclude we only need to show that

[TABLE]

We proceed by contradiction. On the event $\mathcal{A}_{1}\cap\mathcal{A}_{2}\cap\mathcal{A}_{3}$ consider the smallest integer $j$ in $\llbracket 3,I\rrbracket$ such that $\mathcal{T}_{j}-\mathcal{T}_{j-1}>2^{-j}N^{2}$ . Applying Lemma 24 on the interval $[\mathcal{T}_{j-1},\mathcal{T}_{j-1}+2^{-j}N^{2}]\subset[t_{\delta/2},t_{\delta/2}+N^{2}]$ , on which $A_{t}\geq N^{3/2-j\eta}$ we obtain

[TABLE]

where the last inequality uses the definition of $\mathcal{A}_{2}$ to assert that

[TABLE]

∎

Proof of Lemma 24.

Since $A_{t}\geq N(\log N)^{4}$ , we have for $N$ large enough

[TABLE]

Since we work on $\mathcal{C}_{2}(t)$ , we get

[TABLE]

The bound on the gradients given by $\mathcal{A}_{1}$ ensures that

[TABLE]

so that for $N$ large enough we get

[TABLE]

and

[TABLE]

Consequently by Proposition 18 on the event $\mathcal{A}_{1}$

[TABLE]

The set over which the sum is taken on the r.h.s. can be decomposed into connected components of size at least $(\log N)^{2}$ . On $\mathcal{C}_{4}(t)$ , we thus deduce that there exists $c^{\prime}>0$ such that

[TABLE]

where in the last line we simply used (82). ∎

5.8. Controlling the $\mathcal{T}_{i}$ increments for $i\in\llbracket I+1,K\rrbracket$

We can now prove

Lemma 25.

We have

[TABLE]

We follow essentially the same line of proof as in the previous section, but using a different inequality for the bracket derivative. We also need an extra trick to control the maximal amplitude of jumps. Recall that for every $t\in[\mathcal{T}_{I},\mathcal{T}_{K})$ , we have $A_{t}>2N^{1/4}$ .

Lemma 26.

Fix $\varepsilon>0$ . On the event $\mathcal{A}_{1}\cap\mathcal{C}_{3}(t)\cap\mathcal{C}_{4}(t)\cap\{A_{t}>2N^{\varepsilon}\}$ , we have for all $t\in[t_{\delta/2},t_{\delta/2}+N^{2}]$

[TABLE]

Proof of Lemma 25.

In order to diminish the restriction on $v$ imposed in Proposition 21 we introduce the following stopping times with the objective of considering a process with smaller jump amplitude

[TABLE]

We consider the supermartingale $(A_{(\mathcal{T}_{i-1}+t)\wedge\mathcal{R}_{i}})_{t\geq 0}$ . For every $i\leq L$ , its maximal jump size up to time $\mathcal{T}_{i}-\mathcal{T}_{i-1}$ is bounded by

[TABLE]

Indeed, the maximal variation of the area due to an update of site $k$ at time $t$ is bounded above by

[TABLE]

We consider now the event

[TABLE]

with $v_{i}:=4(10\log N+N^{\frac{3}{2}-(i-2)\eta})^{2}$ . The standard union bound and a supermartingale version of Doob’s maximal inequality (sometimes referred to as Ville’s inequality) show that $\mathbb{P}[(\mathcal{A}^{(N)}_{4,1})^{\complement}]\leq 3(L-I)N^{-\eta}$ , and hence converges to $0$ . Now, applying Proposition 21 to $(A_{(\mathcal{T}_{i-1}+t)\wedge\mathcal{R}_{i}})_{t\geq 0}$ with $v=v_{i}$ yields $\lim_{N\to\infty}\mathbb{P}(\mathcal{A}^{(N)}_{4,2})=1$ .

Our final step is to prove

[TABLE]

We place ourselves on the event $\mathcal{A}_{1}\cap\mathcal{A}_{2}\cap\mathcal{A}_{4}\cap\mathcal{B}_{(2)}$ and we proceed by contradiction. Consider the smallest element $j$ of $\llbracket 1,K\rrbracket$ such that $\mathcal{T}_{j}-\mathcal{T}_{j-1}\geq 2^{-j}N^{2}$ (note that as we are on $\mathcal{B}_{(2)}^{N}$ we must have $j\geq I+1$ ). Because we are on the event $\mathcal{A}_{1}\cap\mathcal{A}_{4,1}$ , the reader can check that we must have

[TABLE]

Hence by Lemma 26 , $\mathcal{A}_{4,2}$ and $\mathcal{A}_{2}$ imply that

[TABLE]

where for the inequality we proceed as in the proof of Lemma 23 to show that the integral is larger than $2^{-j-1}N^{2}$ . This yields the desired contradiction. ∎

We are left with proving Lemma 26. The first step is the following technical estimate.

Lemma 27.

Given $B>0$ , let $(a_{i})_{i=1}^{n}$ be an increasing sequence of positive real numbers with $\max a_{i}\leq B$ , and $(b_{i})_{i=1}^{n}$ an arbitrary sequence in $\mathbb{R}_{+}$ with $\max b_{i}\leq B$ , and let us set $\sigma:=\sum_{i=1}^{n}b_{i}$ , and $K:=\lfloor\sigma/B\rfloor$ . Then we have

[TABLE]

If $K=0$ we have

[TABLE]

Proof.

Let us start with the case when $\sigma\geq B$ . Using that $a_{i}$ and $b_{i}$ are bounded by $B$ we have

[TABLE]

Now the reader can check that if $\sigma$ is fixed, the r.h.s. is minimized when $b_{i}=B$ for $i\leq K$ , $b_{K+1}=\sigma-BK$ , and $b_{i}=0$ for $i>K+1$ . When $\sigma<B$ , it is sufficient to check (88) when $\sigma\leq a_{1}$ by monotonicity, and in that case we have

[TABLE]

∎

Proof of Lemma 26.

First assume that $\max_{k}|X^{\wedge}_{k}(t)-X^{x}_{k}(t)|>N^{\varepsilon}$ . As in the proof of Lemma 24, the bound on the gradients given by $\mathcal{A}_{1}$ ensures that

[TABLE]

Consequently, if we let $\ell$ be an integer for which $\delta X_{\ell}(t)>N^{\varepsilon}$ we get the bound

[TABLE]

By $\mathcal{C}_{4}(t)$ , we thus deduce that

[TABLE]

Next, assume that $\max_{k}|X^{\wedge}_{k}(t)-X^{x}_{k}(t)|\leq N^{\varepsilon}$ . Set $B=N^{\varepsilon}$ . We apply Lemma 27 with $(a_{i})_{i=1}^{N}$ being the ordered sequence of the $(\nabla X_{k}^{\wedge}(t))_{k=1}^{N}$ and $(b_{i})_{i=1}^{N}$ being the corresponding $\delta\bar{X}_{k}(t)$ ’s. Since $\sum_{k}\delta\bar{X}_{k}(t)\geq A_{t}/2$ and since $A_{t}\geq 2N^{\varepsilon}$ , the lemma combined with $\mathcal{C}_{3}(t)$ and Proposition 18 gives the following lower bound

[TABLE]

∎

5.9. Controlling the $\mathcal{T}_{i}$ increments for $i\in\llbracket K+1,L\rrbracket$

Following our plan we now prove

Lemma 28.

We have

[TABLE]

This time we use the following control for the martingale bracket, which is almost immediate to prove

Lemma 29.

On the event $\mathcal{A}_{1}\cap\mathcal{C}_{1}(t)$ we have

[TABLE]

Proof.

Combining Proposition 18 and Lemma 27, we obtain

[TABLE]

∎

Proof of Lemma 28.

We only need to show that

[TABLE]

A contradiction can be obtained exactly as in the proof of Lemma 28. More precisely assuming that $\mathcal{B}_{(4)}^{N}$ does not hold we obtain that for some $j\in\llbracket K+1,L\rrbracket$

[TABLE]

∎

5.10. Proof of Proposition 15

We finally conclude the proof by proving

[TABLE]

We first need to restrict ourselves to an adequate event of large probability. Using the fact that $(A_{\mathcal{T}_{L}+t})_{t\geq 0}$ is a supermartingale, and combining the martingale stopping theorem with Doob’s inequality, we obtain that

[TABLE]

Since Lemmas 20, 23, 25 and 28 assert that $\mathcal{T}_{L}\leq t_{\delta/2}+N^{2}$ with large probability, we have $\lim_{N\to\infty}\mathbb{P}[\mathcal{A}^{(N)}_{5,1}]=1$ where

[TABLE]

We also need to make sure that the gradients in our dynamics are not too small.

Lemma 30.

Setting

[TABLE]

we have

[TABLE]

Proof.

First of all, by Proposition 14 it suffices to control the probability of this event for the stationary process in the time interval $[0,2\log N]$ , which we denote by $\mathbf{X}$ for the rest of this proof. Recalling (73) we have

[TABLE]

Now for each $k\in\llbracket 1,N-1\rrbracket$ , consider the ordered set of update times $(t^{(k)}_{i})_{i\geq 1}$ of the coordinate $k$ for our dynamics. We let $n_{k}$ be the number of updates of the site $k$ occurring in the time interval $[0,2\log N]$ .

As the dynamics is of heat-bath type, for every $i$ and $k$ , $\mathbf{X}(t^{(k)}_{i})$ is distributed like $\pi_{N,\alpha}$ . This is also valid conditionally on the realization $(t^{(j)}_{i})_{i,j}$ of the update times. Hence considering that $\nabla X_{k}$ can only be altered at times $t^{(k\pm 1)}_{i}$ , from the union bound and (94),

[TABLE]

Taking the expectation of the above we obtain the desired result. ∎

For every site $k\in\llbracket 1,N-1\rrbracket$ , let $(t^{(k)}_{i})_{i=1}^{n_{k}}$ denote the random set of update times occurring in the interval $[t_{\delta/2}+N^{2},t_{\delta/2}+N^{2}+2\log N]$ . We set

[TABLE]

The probability of $\mathcal{A}^{(N)}_{5,3}$ goes to one (by a standard coupon collector argument for the lower bound while the upper bound is a direct union bound) and one can thus safely restrict oneself to the event $\mathcal{A}_{5}:=\cap_{i=1}^{3}\mathcal{A}_{5,i}$ .
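The two bounds behind this claim can be made quantitative with a short computation. The display defining $\mathcal{A}_{5,3}$ is not reproduced here, so the sketch below assumes it requires $1\leq n_{k}\leq 10\log N$ for every site $k$ (the upper limit matches the "at most $10\log N$ updates per site" used later); it also assumes each $n_{k}$ is Poisson with mean $2\log N$ , as for a rate-1 clock on an interval of length $2\log N$ . The function names are hypothetical.

```python
import math

def prob_some_site_never_updated(N):
    """Union bound over N-1 sites of P(Poisson(2 log N) = 0) = N**-2."""
    return (N - 1) * math.exp(-2.0 * math.log(N))

def chernoff_poisson_upper(mean, m):
    """Chernoff bound P(Poisson(mean) >= m) <= exp(-mean) * (e*mean/m)**m, for m > mean."""
    if m <= mean:
        raise ValueError("the bound is only valid for m > mean")
    return math.exp(-mean + m * (1.0 + math.log(mean / m)))

def prob_some_site_overupdated(N):
    """Union bound over N-1 sites of P(Poisson(2 log N) >= 10 log N)."""
    mean = 2.0 * math.log(N)
    return (N - 1) * chernoff_poisson_upper(mean, 10.0 * math.log(N))
```

Both quantities tend to $0$ as $N\to\infty$ : the first is below $N^{-1}$ , and the second decays polynomially with a much larger exponent, so under the stated assumption the upper constraint is by far the less restrictive one.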

To conclude the proof, we are going to show that

[TABLE]

We are in fact going to prove an upper bound for this probability conditioned on both $(t^{(k)}_{i})_{i=1}^{n_{k}}$ and the state of the system at the initial time $t_{\delta/2}+N^{2}$ . We use the short-hand notation $\widetilde{\mathbb{P}},\,\widetilde{\mathbb{E}}$ for this conditional probability and the corresponding expectation.

Say that an update at time $t^{(k)}_{i}$ is successful if $X^{\wedge}_{k}(t^{(k)}_{i})=X^{x}_{k}(t^{(k)}_{i})$ . The strategy is to work recursively on the successive update times, and to use the following fact. If all the previous updates have been successful, then using Lemma 16 the probability that the next update (occurring at site $k$ say) is not successful is bounded from above by (a constant times) $\delta\bar{X}_{k}$ divided by the largest gradient, this ratio being bounded by $\delta\bar{X}_{k}\sqrt{N}\log N$ on the event $\mathcal{A}_{5,2}$ . Since there are at most $10\log N$ updates per site, since $\sum_{k}\delta\bar{X}_{k}$ is bounded by twice the area and since the area is non-increasing as long as the updates are all successful, we deduce that the probability (95) that there exists an unsuccessful update is bounded by $A_{t_{\delta/2}+N^{2}}\sqrt{N}\log N<N^{-1/2}\log N$ on the event $\mathcal{A}_{5,1}$ . Thus, one can conclude. To put these heuristic observations on firm ground, we need to introduce some notation.

Let $\widetilde{\tau}$ be, among the set of update times $(t^{(k)}_{i})_{i=1}^{n_{k}}$ , the time of the first unsuccessful update: on the event that there are no unsuccessful updates, we set arbitrarily $\widetilde{\tau}:=t_{\delta/2}+N^{2}+2\log N$ . Note that the event $\mathcal{A}_{5,1}\cap\mathcal{A}_{5,3}$ is measurable w.r.t. the sigma-field generated by $(t^{(k)}_{i})_{i=1}^{n_{k}}$ and the state of the system at the initial time $t_{\delta/2}+N^{2}$ . Since the probability of this event goes to $1$ , it suffices to show that

[TABLE]

in order to deduce that the merging time $\tau$ satisfies

[TABLE]

To prove (96), we set

[TABLE]

Then, almost surely

[TABLE]

The first term on the r.h.s. goes to $0$ by Lemma 30. Regarding the second term, we argue as follows. Recall that $\mathcal{F}_{t}$ is the sigma-field generated by the system up to time $t$ , and let $\mathcal{F}_{t_{-}}=\sigma(\cup_{s<t}\mathcal{F}_{s})$ . Using Lemma 16 at the second line, for all $i$ and $k$ we have

[TABLE]

Consequently,

[TABLE]

and on the event $\mathcal{A}_{5,1}\cap\mathcal{A}_{5,3}$ this last expression is bounded by

[TABLE]

for some new constant $C^{\prime}>0$ . This concludes the proof of (96).

6. From the top down to equilibrium

The goal of this section is to prove Proposition 14, that is: when started from the maximal configuration $\wedge=(N,\dots,N)$ , setting $t_{\delta}=(1+\delta)\frac{\log N}{2\operatorname{\mathrm{gap}}_{N}}$ , one has

[TABLE]

for any fixed $\delta>0$ .

Inspired by a strategy that was introduced in [Lac16] in the context of the adjacent interchange process, we shall base our proof on a two-scale argument, that can be roughly described as follows. For any integer $K\geq 2$ , consider the $K-1$ particles with labels $u_{i}:=\lfloor iN/K\rfloor$ , $i=1,\dots,K-1$ . These will be called the special particles. The proof of (97) consists of three main steps.

Step 1. Starting from $\wedge$ , after a time $t=t_{\delta/2}$ , if $K$ is fixed and $N$ tends to $\infty$ , then the joint distribution of the positions of the special particles

[TABLE]

is arbitrarily close to the corresponding equilibrium distribution, see Proposition 39 below. This step is based on a subtle use of the FKG inequality together with the control of the expected value of the variables $Y_{i}(t)$ .

Step 2. Consider the censored dynamics obtained by freezing the positions of the special particles and letting the rest of the particles evolve as usual. We will show that if $K$ is taken proportional to $\delta^{-1}$ , uniformly in the initial condition, the censored dynamics at time $s_{\delta}:=\delta\,\frac{\log N}{2\operatorname{\mathrm{gap}}_{N}}$ has essentially reached the conditional equilibrium given by $\pi_{N,\alpha}$ conditioned on the positions of the special particles. For this step it will be sufficient to exploit an upper bound on the mixing time that is tight up to a constant factor as e.g. the one obtained in [RW05b] in the case $\alpha=1$ , see (13) above.

Step 3. We combine the results in the previous two steps to obtain the desired conclusion. The key point is that the distance to equilibrium at time $t_{\delta}=t_{\delta/2}+s_{\delta/2}$ appearing in (97) satisfies

[TABLE]

where the distribution $P_{t_{\delta},*}^{\wedge}$ is obtained by running the standard dynamics, starting from $\wedge$ , for a time $t_{\delta/2}$ and then by running from there the censored dynamics for a time $s_{\delta/2}$ . This step requires an adaptation to our continuous setting of the so called censoring inequality of Peres and Winkler [PW13].
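The time decomposition used in Step 3 follows directly from the definitions $t_{\delta}=(1+\delta)\frac{\log N}{2\operatorname{gap}_{N}}$ and $s_{\delta}=\delta\,\frac{\log N}{2\operatorname{gap}_{N}}$ given above; the short computation below only makes the arithmetic explicit.

```latex
\begin{align*}
t_{\delta/2} + s_{\delta/2}
  &= \Bigl(1 + \frac{\delta}{2}\Bigr)\frac{\log N}{2\operatorname{gap}_{N}}
     + \frac{\delta}{2}\cdot\frac{\log N}{2\operatorname{gap}_{N}} \\
  &= (1+\delta)\,\frac{\log N}{2\operatorname{gap}_{N}} = t_{\delta}.
\end{align*}
```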

We start developing the above program with a discussion of the censoring inequality. We then move to the proof of the mentioned steps in the given order.
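As a concrete illustration of the two-scale setup, the labels $u_{i}=\lfloor iN/K\rfloor$ of the special particles and the sizes of the blocks they delimit can be computed as follows. This is a minimal sketch; the function names are hypothetical, and the restriction $N\geq K$ is an assumption added here so that the labels are distinct and lie in $\llbracket 1,N-1\rrbracket$ .

```python
def special_labels(N, K):
    """Labels u_i = floor(i*N/K) for i = 1, ..., K-1 (assumes 2 <= K <= N)."""
    if K < 2 or N < K:
        raise ValueError("need 2 <= K <= N")
    return [(i * N) // K for i in range(1, K)]

def block_sizes(N, K):
    """Gaps between consecutive special labels, with boundary labels 0 and N added."""
    labels = [0] + special_labels(N, K) + [N]
    return [b - a for a, b in zip(labels, labels[1:])]
```

For instance, `special_labels(10, 4)` gives `[2, 5, 7]` and `block_sizes(10, 4)` gives `[2, 3, 2, 3]` : each block contains about $N/K$ particle gaps, which is the scale on which the censored dynamics of Step 2 has to equilibrate.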

6.1. Censoring lemma

The censoring inequality established by Peres and Winkler [PW13] allows one to compare the distance to equilibrium at time $t$ for the process under consideration with the distance to equilibrium at time $t$ for a censored process in which some of the updates have been omitted according to a given censoring scheme. In the context of Glauber dynamics for monotone, finite state spin systems, their argument rests on the following two key properties of a monotone dynamics: If the initial state has a distribution $\nu$ whose density w.r.t. the equilibrium measure $\pi$ is increasing, then for any $t\geq 0$ , the distribution $\nu P_{t}$ of the state at time $t$ satisfies

1) $\nu P_{t}$ has an increasing density w.r.t. $\pi$ ,

2) $\nu P_{t}$ is stochastically lower than the distribution of the state of the censored dynamics, say $\nu P_{t,*}$ , for any valid censoring scheme.

Properties 1 and 2 then allow one to prove the censoring inequality

[TABLE]

We shall follow the same line of reasoning here. However, a technical problem arises with respect to the usual discrete spin setting: to prove the censoring inequality (99) we need to start with the distribution $\nu=\delta_{\wedge}$ which has no density w.r.t. equilibrium, while the strategy outlined above is crucially based on the existence of such a density. Notice also that $P_{t}^{\wedge}=\delta_{\wedge}P_{t}$ has no density w.r.t. equilibrium at any time $t\geq 0$ and so one cannot get around this problem by regularising the measure with a burn-in time. We shall need a more general version of the above properties which extends to a certain family of measures with a singular part.

We start by defining the latter. For $k\in\llbracket 1,N\rrbracket$ , define the nested sets

[TABLE]

Thus $\Omega_{1,N}=\{\wedge\}$ contains only the maximal configuration, while $\Omega_{N,N}=\Omega_{N}$ is the whole set of particle positions. Let $\pi_{k,\alpha}$ denote the probability measure supported on $\Omega_{k,N}$ defined as the law of the random vector $(x_{1},\dots,x_{N-1})$ of the partial sums $x_{j}=\sum_{i=1}^{j}\eta_{i}$ where $\eta_{1},\dots,\eta_{k}$ are i.i.d. with distribution $\Gamma(\alpha,\lambda)$ , for some arbitrary $\lambda>0$ , conditioned on $\sum_{i=1}^{k}\eta_{i}=N$ , while $\eta_{k+1},\dots,\eta_{N}\equiv 0$ . Notice that this notation is consistent with our notation $\pi_{N,\alpha}$ for the equilibrium measure on $\Omega_{N}$ . If a probability measure $\mu$ supported on $\Omega_{k,N}$ is absolutely continuous w.r.t. $\pi_{k,\alpha}$ , we write $\frac{d\mu}{d\pi_{k,\alpha}}$ for the corresponding density, and say that $\mu$ belongs to the family $\mathcal{S}_{k}$ if $\frac{d\mu}{d\pi_{k,\alpha}}$ is an increasing function. In particular, $\mathcal{S}_{1}=\{\delta_{\wedge}\}$ , while $\mathcal{S}_{N}$ coincides with the set of distributions on $\Omega_{N}$ with an increasing density with respect to equilibrium. Finally, we define the family of measures $\mathcal{S}$ consisting of all probability measures $\mu$ on $\Omega_{N}$ such that

[TABLE]

for some $\gamma_{k}\geq 0$ with $\sum_{k=1}^{N}\gamma_{k}=1$ . Notice that (102) is a decomposition into mutually singular measures, since if $1\leq j<k\leq N$ , then $\mu_{j}(\Omega_{j,N})=1$ while $\mu_{k}(\Omega_{j,N})=0$ . The following lemma is a key fact about the set $\mathcal{S}$ .

Lemma 31.

If $\mu,\nu$ are two probability measures on $\Omega_{N}$ such that $\mu\in\mathcal{S}$ and $\mu\leq\nu$ , then

[TABLE]

Proof.

Write

[TABLE]

where $\mu^{\prime}=(1-\gamma_{N})^{-1}\sum_{j=1}^{N-1}\gamma_{j}\mu_{j}$ is singular w.r.t. $\pi_{N,\alpha}$ while $\mu_{N}$ is absolutely continuous w.r.t. $\pi_{N,\alpha}$ , with an increasing density $\varphi_{N}$ . We have

[TABLE]

We then define $A=\{x\in\Omega_{N}:\,\varphi_{N}(x)\geq 1/\gamma_{N}\}$ . It is easy to check that this event maximises the second term on the r.h.s. of the last equation. Therefore, setting $B=A\cup\Omega_{N-1,N}$ , and observing that one has $\mu^{\prime}(B)=\mu^{\prime}(\Omega_{N-1,N})=1$ and $\mu_{N}(B)=\mu_{N}(A)$ , $\pi_{N,\alpha}(B)=\pi_{N,\alpha}(A)$ , we deduce that

[TABLE]

Since $A$ and $\Omega_{N-1,N}$ are increasing, the set $B$ is also increasing. Therefore, $\mu(B)\leq\nu(B)$ and

[TABLE]

∎

Let $Q_{i}:L^{2}(\Omega_{N},\pi_{N,\alpha})\to L^{2}(\Omega_{N},\pi_{N,\alpha})$ , $i=1,\dots,N-1$ , denote the orthogonal projection onto functions that do not depend on the position of the $i$ -th particle:

[TABLE]

If $\mu$ is a probability on $\Omega_{N}$ , we write $\mu Q_{i}$ for the probability measure defined by

[TABLE]

This is the distribution obtained from $\mu$ after one update at $i$ .

Lemma 32.

If $\mu\in\mathcal{S}$ then for any $i=1,\dots,N-1$

1) $\mu Q_{i}\in\mathcal{S}$ ;

2) $\mu Q_{i}$ is stochastically lower than $\mu$ .

Proof.

Pick $\mu_{k}\in\mathcal{S}_{k}$ . If $i>k$ , then $\mu_{k}Q_{i}=\mu_{k}$ , since $x_{k}=N$ forces $x_{j}=N$ for all $j>k$ . If $i<k$ , writing $\varphi_{k}=\frac{d\mu_{k}}{d\pi_{k,\alpha}}$ , for any bounded measurable $f$ one has

[TABLE]

where we note that if $x_{k}=N$ and $i<k$ then $Q_{i}f(x)=\pi_{k,\alpha}[f\,|\,x_{j},j\neq i]$ , and therefore $Q_{i}$ is self-adjoint in $L^{2}(\Omega_{k,N},\pi_{k,\alpha})$ . Thus, $\mu_{k}Q_{i}$ has density $Q_{i}\varphi_{k}$ w.r.t. $\pi_{k,\alpha}$ . Recall the notation $x^{(i,u)}$ for the configuration $x$ updated at the $i$ -th particle. For any $x,y\in\Omega_{k,N}$ with $x\leq y$ , using $x^{(i,u)}\leq y^{(i,u)}$ for all $u\in[0,1]$ , it follows that

[TABLE]

In other words $Q_{i}\varphi_{k}$ is increasing, and $\mu_{k}Q_{i}\in\mathcal{S}_{k}$ if $i<k$ . When $i=k$ , observe that if $x_{k+1}=N$ , then $Q_{k}(f)=\pi_{k+1,\alpha}\left[f\,|\,x_{j},j<k\right]$ . Therefore, if $\psi_{k,\alpha}$ denotes the density of the marginal of $\pi_{k,\alpha}$ on $(x_{1},\dots,x_{k-1})$ w.r.t. the marginal of $\pi_{k+1,\alpha}$ on the same variables,

[TABLE]

This shows that $\mu_{k}Q_{k}$ is supported on $\Omega_{k+1,N}$ and has density $\psi_{k,\alpha}\varphi_{k}$ w.r.t. $\pi_{k+1,\alpha}$ . A direct computation shows that

[TABLE]

for some positive constant $C_{\alpha,k,N}$ . In particular, $\psi_{k,\alpha}$ is increasing for any $\alpha>0$ . It follows that if $\mu_{k}\in\mathcal{S}_{k}$ , then $\mu_{k}Q_{k}\in\mathcal{S}_{k+1}$ , for all $k=1,\dots,N-1$ . Taking a generic $\mu\in\mathcal{S}$ , by linearity the above implies that $\mu Q_{i}\in\mathcal{S}$ for any $i=1,\dots,N-1$ .

To prove the stochastic domination $\mu Q_{i}\leq\mu$ , for $\mu\in\mathcal{S}$ , it is sufficient to show that $\mu_{k}Q_{i}\leq\mu_{k}$ , for $\mu_{k}\in\mathcal{S}_{k}$ , for all $k,i$ . Pick $\mu_{k}\in\mathcal{S}_{k}$ for some $k=1,\dots,N$ and an increasing function $g$ on $\Omega_{N}$ . We are going to show that $\mu_{k}Q_{i}(g)\leq\mu_{k}(g)$ for any $i=1,\dots,N-1$ . If $i>k$ then $\mu_{k}Q_{i}=\mu_{k}$ and there is nothing to prove. If $i<k$ , as above we may write

[TABLE]

Since $\varphi_{k}$ is also increasing, the FKG inequality on $\mathbb{R}$ , which is valid for any probability measure, implies that $(Q_{i}\varphi_{k})(Q_{i}g)\leq Q_{i}(\varphi_{k}g)$ pointwise. Therefore

[TABLE]

Finally, if $i=k$ , then as before we have

[TABLE]

On the other hand, defining the function $\bar{g}(x):=g(x_{1},\dots,x_{k-1},N,\dots,N)$ , one has

[TABLE]

Thus, the conclusion $\mu_{k}Q_{k}(g)\leq\mu_{k}(g)$ follows from the fact that $g\leq\bar{g}$ . ∎

In the next lemma we consider the effect of a sequence of updates on a measure $\mu\in\mathcal{S}$ and compare it with the effect of another sequence obtained from the first by removing some of the updates.

Lemma 33.

Pick $n\in\mathbb{N}$ and fix a sequence $z:=(z_{1},\dots,z_{n})\in\llbracket 1,N-1\rrbracket^{n}$ . For any $\mu\in\mathcal{S}$ , if $\mu^{z}$ denotes the new measure

[TABLE]

then $\mu^{z}\in\mathcal{S}$ . Moreover, if $z^{\prime}$ denotes a sequence obtained from $z$ by removing some of the entries, then $\mu^{z}\leq\mu^{z^{\prime}}$ and

[TABLE]

Proof.

Lemma 32 shows that $\mu^{z}\in\mathcal{S}$ for any $\mu\in\mathcal{S}$ and any sequence $z$ . For the second part of the lemma, by a telescoping argument it is sufficient to consider the case where $z$ and $z^{\prime}$ differ by the removal of a single update, say $z_{j}$ , so that

[TABLE]

Let $\mu_{1}=\mu Q_{z_{1}}\cdots Q_{z_{j}}$ , and $\mu_{2}=\mu Q_{z_{1}}\cdots Q_{z_{j-1}}$ . Then $\mu_{1}=\mu_{2}Q_{z_{j}}$ and thus, by Lemma 32 one has $\mu_{1}\leq\mu_{2}$ . Moreover,

[TABLE]

where the inequality follows from the fact that each update preserves the monotonicity, see Proposition 7. The conclusion follows from Lemma 31. ∎

We can now state and prove the censoring inequality in our setup. A censoring scheme $\mathcal{C}$ is defined as a càdlàg map

[TABLE]

where $\mathcal{P}(A)$ denotes the set of all subsets of a set $A$ . The subset $\mathcal{C}(s)$ , at any time $s\geq 0$ , represents the set of particles whose update is to be suppressed at that time. More precisely, given a censoring scheme $\mathcal{C}$ , and an initial condition $x\in\Omega_{N}$ , we write $P_{t,\mathcal{C}}^{x}$ for the law of the random variable obtained by starting at $x$ and applying the standard graphical construction (see Proposition 7) with the proviso that if the particle with label $j$ rings at time $s$ , then the update is performed if and only if $j\notin\mathcal{C}(s)$ . In particular, the uncensored evolution $P^{x}_{t}$ corresponds to $P^{x}_{t,\mathcal{C}}$ when $\mathcal{C}(s)\equiv\emptyset$ . Given a distribution $\mu$ on $\Omega_{N}$ , we write

[TABLE]
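The censored graphical construction can be sketched in a few lines of code. The sketch below makes several assumptions that are not taken from the text: it treats only the uniform case $\alpha=1$ , it realizes the update as $x^{(i,u)}_{i}=x_{i-1}+u(x_{i+1}-x_{i-1})$ with boundary values $x_{0}=0$ and $x_{N}=N$ , and it represents the clock rings as an explicit list of triples (time, label, $u$ ). The names `update` and `run` are hypothetical.

```python
def update(x, i, u, N):
    """Return x^{(i,u)}: particle i is moved to x_{i-1} + u * (x_{i+1} - x_{i-1}).

    x holds (x_1, ..., x_{N-1}); the boundary values are x_0 = 0 and x_N = N.
    This is the uniform (alpha = 1) resampling rule, assumed for illustration.
    """
    left = x[i - 2] if i >= 2 else 0.0
    right = x[i] if i <= N - 2 else float(N)
    y = list(x)
    y[i - 1] = left + u * (right - left)
    return y

def run(x, rings, N, censor=lambda s: frozenset()):
    """Apply the rings (s, j, u) in time order, skipping a ring when j is in censor(s)."""
    for s, j, u in sorted(rings):
        if j not in censor(s):
            x = update(x, j, u, N)
    return list(x)
```

Feeding the same list of rings to two ordered initial conditions produces ordered outputs, which is the monotonicity of the graphical construction. The sketch does not claim a pathwise ordering between the censored and the uncensored trajectories: the comparison in Lemma 34 below is a statement about laws, for initial distributions in $\mathcal{S}$ .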

Lemma 34.

For any $\alpha>0$ , if $\mu\in\mathcal{S}$ , and $\mathcal{C}$ is a censoring scheme, then for all $t\geq 0$ :

1) $\mu P_{t}\in\mathcal{S}$ and $\mu P_{t,\mathcal{C}}\in\mathcal{S}$ ,

2) $\mu P_{t,\mathcal{C}}$ is stochastically higher than $\mu P_{t}$ .

Moreover, for all $t\geq 0$ :

[TABLE]

Proof.

It is sufficient to prove 1) and 2) above, since the conclusion (106) is then a consequence of Lemma 31. To prove 1) and 2) note that by conditioning on the realization $\mathcal{T}_{t}$ of the Poisson clocks $\mathcal{T}^{(j)}$ , $j\in\llbracket 1,N-1\rrbracket$ up to time $t$ in the graphical construction, one has that the uncensored and the censored evolution are measures of the form $\mu^{z}$ and $\mu^{z^{\prime}}$ respectively; see (104). By Lemma 33 one has $\mu^{z},\mu^{z^{\prime}}\in\mathcal{S}$ and $\mu^{z}\leq\mu^{z^{\prime}}$ . Taking the expectation over $\mathcal{T}_{t}$ shows that $\mu P_{t}\in\mathcal{S}$ and $\mu P_{t,\mathcal{C}}\in\mathcal{S}$ , and that $\mu P_{t}\leq\mu P_{t,\mathcal{C}}$ .

∎

6.2. Relaxation of the special particles

Here we show that the special particles have reached equilibrium by time $t_{\delta/2}$ ; see Proposition 39. The key to this result will be Proposition 36 below. Recall the notation $\mathcal{S}_{N}$ introduced in Section 6.1 for the set of probability measures on $\Omega_{N}$ with an increasing density w.r.t. $\pi_{N,\alpha}$ . Given a probability $\mu$ on $\Omega_{N}$ , we write $\bar{\mu}$ for the marginal of $\mu$ on the special particle positions $y:=(y_{1},\dots,y_{K-1})$ .

Lemma 35.

If $\mu\in\mathcal{S}_{N}$ then $\bar{\mu}$ is absolutely continuous w.r.t. $\bar{\pi}_{N,\alpha}$ and the corresponding density is increasing on $\mathbb{R}^{K-1}$ .

Proof.

Let $\varphi=d\mu/d\pi_{N,\alpha}$ . The density of $\bar{\mu}$ w.r.t. $\bar{\pi}_{N,\alpha}$ is given by the conditional expectation

[TABLE]

To prove that it is increasing, we have to show that $\pi_{N,\alpha}(\varphi|y)\geq\pi_{N,\alpha}(\varphi|y^{\prime})$ whenever $y\geq y^{\prime}$. Since $\varphi$ is increasing, it suffices to show that $\pi_{N,\alpha}(\cdot|y)$ stochastically dominates $\pi_{N,\alpha}(\cdot|y^{\prime})$. This can be seen as follows. Let $x$ be the highest configuration of particle positions such that $y_{i}=x_{u_{i}}$, $i=1,\ldots,K-1$, and let $x^{\prime}$ be the lowest configuration of particle positions such that $y^{\prime}_{i}=x^{\prime}_{u_{i}}$, $i=1,\ldots,K-1$. Clearly, $x\geq x^{\prime}$. Then use $(x,x^{\prime})$ as initial conditions in the graphical construction (Proposition 7) for the censored dynamics where all updates of the special particles are suppressed. As time goes to infinity the two distributions converge weakly to $\pi_{N,\alpha}(\cdot|y)$ and $\pi_{N,\alpha}(\cdot|y^{\prime})$ respectively. Since $x\geq x^{\prime}$ and the graphical construction preserves the order, this shows that $\pi_{N,\alpha}(\cdot|y)\geq\pi_{N,\alpha}(\cdot|y^{\prime})$. ∎
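The order-preservation argument in this proof can be illustrated numerically. The following Python sketch makes two assumptions that are not stated in this section: it takes $\alpha=1$, and it assumes that the graphical construction of Proposition 7 moves particle $k$ to $(1-U)x_{k-1}+Ux_{k+1}$ using a uniform variable $U$ shared by all copies. The function name, the labels of the special particles and the numerical values are hypothetical.

```python
import random

def censored_coupled_run(x_hi, x_lo, censored, t_max, seed=0):
    """Evolve two configurations with shared clocks and shared uniforms.

    x_hi, x_lo: lists of the N-1 particle positions, with x_hi >= x_lo.
    censored:   set of labels whose updates are suppressed at all times.
    """
    rng = random.Random(seed)
    N = len(x_hi) + 1
    hi = [0.0] + list(x_hi) + [float(N)]   # fixed boundary values x_0 = 0, x_N = N
    lo = [0.0] + list(x_lo) + [float(N)]
    t = rng.expovariate(N - 1)             # N-1 rate-1 clocks ring at total rate N-1
    while t <= t_max:
        k = rng.randint(1, N - 1)          # label of the ringing particle
        u = rng.random()                   # update variable shared by both copies
        if k not in censored:              # censored rings are ignored in both copies
            hi[k] = (1 - u) * hi[k - 1] + u * hi[k + 1]
            lo[k] = (1 - u) * lo[k - 1] + u * lo[k + 1]
        t += rng.expovariate(N - 1)
    return hi[1:-1], lo[1:-1]

# Hypothetical example: N = 12, special labels u_1 = 4 and u_2 = 8,
# y = (5, 9) and y' = (3, 7). x_hi is the highest configuration compatible
# with y, and x_lo is the lowest configuration compatible with y'.
x_hi = [5.0] * 4 + [9.0] * 4 + [12.0] * 3
x_lo = [0.0] * 3 + [3.0] * 4 + [7.0] * 4
hi, lo = censored_coupled_run(x_hi, x_lo, censored={4, 8}, t_max=50.0)
print(all(h >= l for h, l in zip(hi, lo)))
```

In this sketch the special particles never move, so each copy relaxes towards its own conditional equilibrium while the coordinatewise order between the two copies is kept at every update.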

We use the following notation for the centered height of the special particles:

[TABLE]

and write $\mu(W)=\bar{\mu}(W)$ for the expected value of $W$. Note that at equilibrium, for $N$ large, the vector $(w_{1},\ldots,w_{K-1})$ behaves roughly as a normal vector, and the fluctuations of $W$ are of order $\sqrt{N}$ for each fixed $K$. The results below are valid for all $\alpha\geq 1$.

Proposition 36.

For any $\varepsilon>0$ , $K\in\mathbb{N}$ , there exists $\eta=\eta(K,\varepsilon)>0$ such that for all $N\geq 2$ , $\mu\in\mathcal{S}_{N}$ one has:

[TABLE]

Proof.

We follow [Lac16, Section 5], where a similar statement was proved in the context of random permutations. Given a constant $\lambda>0$ , define the events

[TABLE]

Let us first show that for any $\mu\in\mathcal{S}_{N}$ , $a>0$

[TABLE]

Let $\varphi$ denote the density $d\mu/d\pi_{N,\alpha}$ . The sets $A_{i}$ and $A_{i}^{\complement}$ are stable under the operations $\wedge$ and $\vee$ introduced in (21), and the FKG inequality applied to $\pi_{N,\alpha}(\cdot|A_{i})$ shows that

[TABLE]

Similarly, the FKG inequality for $\pi_{N,\alpha}(\cdot|A_{i}^{\complement})$ shows that

[TABLE]

Therefore, using $\pi_{N,\alpha}(w_{i})=0$ :

[TABLE]

Since $A_{i}$ and $w_{i}$ are increasing and since $\pi_{N,\alpha}(w_{i})=0$ , one has $(\mu(A^{\complement}_{i})-\pi_{N,\alpha}(A^{\complement}_{i}))\pi_{N,\alpha}(w_{i}|A^{\complement}_{i})\geq 0$ and therefore

[TABLE]

Consider the function

[TABLE]

Then $\psi$ is increasing, and applying FKG we obtain

[TABLE]

Summing in (109),

[TABLE]

This proves (108). Next, let us show that for any $\mu\in\mathcal{S}_{N}$ , $a>0$

[TABLE]

The sets $A,B$ satisfy the assumptions of Lemma 9. Therefore,

[TABLE]

If $\mu(A)\leq(1+a)\pi_{N,\alpha}(A)$ , then

[TABLE]

From Lemma 35 we know that $\bar{\varphi}=d\bar{\mu}/d\bar{\pi}_{N,\alpha}$ is increasing. If $x,x^{\prime}$ are two particle configurations such that $x^{\prime}\in B^{\complement}$ and $x\in A$, then $y_{i}=x_{u_{i}}\geq x^{\prime}_{u_{i}}=y^{\prime}_{i}$ for all $i=1,\ldots,K-1$. Since $A,B$ are measurable w.r.t. the $y$ variables, we write $\bar{A},\bar{B}$ for the corresponding subsets of $\mathbb{R}^{K-1}$, so that $x\in A\iff y\in\bar{A}$ and $x\in B\iff y\in\bar{B}$. Thus, we have

[TABLE]

Integrating over $y$ w.r.t. $\bar{\pi}_{N,\alpha}(\cdot|\bar{A})$ in the above inequality one finds

[TABLE]

where we have used the assumption $\mu(A)\leq(1+a)\pi_{N,\alpha}(A)$ . In conclusion,

[TABLE]

Using (111) we obtain (110).

Finally, we need a lower bound on the probability $\pi_{N,\alpha}(A)$ and an upper bound on the probability $\pi_{N,\alpha}(B)$ . At equilibrium $w_{i}$ is approximately normal with mean zero and variance $iN/K$ , and therefore

[TABLE]

for all $i=1,\ldots,K-1$, $N\geq 2$, for some constants $b_{2}(\lambda,K)\to 0$ as $\lambda\to\infty$ and $b_{1}(\lambda,K)>0$ for all $\lambda>0$. As for the lower bound on $\pi_{N,\alpha}(A)$, observe that by the FKG inequality one has

[TABLE]

On the other hand,

[TABLE]

Then, for any $\varepsilon>0$ , any fixed $K$ , taking $\lambda$ large enough, we find that there exists a constant $\delta_{1}=\delta_{1}(\varepsilon,K)>0$ such that

[TABLE]

Once we have (108), (110) and (112) we can conclude as follows. Suppose that $\mu(W)\leq\eta\sqrt{N}$. If $a=2\eta/(\lambda\delta_{1})$, then by (108) and (112) one must have $\mu(A)\leq(1+a)\pi_{N,\alpha}(A)$. Therefore by (110) and (112)

[TABLE]

The desired conclusion follows by taking $\eta=\varepsilon\lambda\delta_{1}/4$ . ∎

Next, we address the problem of controlling the expected value of $W$ at time $t$ when started from the maximal configuration.

Proposition 37.

For any $k=1,\ldots,N-1$ and any $t\geq 0$:

[TABLE]

In particular, if $\mu_{t}=\delta_{\wedge}P_{t}$ , then for all $t\geq 0$ :

[TABLE]

Proof.

Defining

[TABLE]

one has for all $j,k=1,\ldots,N-1$:

[TABLE]

where

[TABLE]

Set $v(t)=(v_{1}(t),\ldots,v_{N-1}(t))$, where

[TABLE]

Expanding the vector $v(t)$ in the orthonormal basis $\varphi_{j}$, $j=1,\ldots,N-1$, one finds $v_{k}(t)=\sum_{j=1}^{N-1}a_{j}(t)\varphi_{j}(k)$, where $a_{j}(t)=\sum_{k=1}^{N-1}\varphi_{j}(k)v_{k}(t)$. Using (25) one finds

[TABLE]

In particular, $|a_{j}(0)|\leq N^{3/2}/\sqrt{2}$ , and

[TABLE]

Using $\lambda_{j}\geq j\lambda_{1}$ it follows that

[TABLE]

If $t$ is such that $e^{-\lambda_{1}t}\leq 1/2$ then this implies $v_{k}(t)\leq 2Ne^{-\lambda_{1}t}$ . On the other hand if $e^{-\lambda_{1}t}\geq 1/2$ then clearly $v_{k}(t)\leq N\leq 2Ne^{-\lambda_{1}t}$ . Since $\lambda_{1}=\operatorname{\mathrm{gap}}_{N}$ , this proves the desired upper bound. ∎
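The bound $v_{k}(t)\leq 2Ne^{-\lambda_{1}t}$ can be checked numerically from the expansion in the basis $\varphi_{j}$. The displayed formulas are not reproduced here, so the Python sketch below rests on assumptions: $\varphi_{j}(k)=\sqrt{2/N}\,\sin(j\pi k/N)$, $\lambda_{j}=1-\cos(j\pi/N)$, and $v_{k}(0)=N-k$ (the maximal configuration centered by the equilibrium mean $k$). These choices are consistent with the stated bound $|a_{j}(0)|\leq N^{3/2}/\sqrt{2}$, but the function names are illustrative only.

```python
import math

def phi(N, j, k):
    # Assumed orthonormal eigenfunctions of the discrete Laplacian on {1, ..., N-1}.
    return math.sqrt(2.0 / N) * math.sin(j * math.pi * k / N)

def lam(N, j):
    # Assumed eigenvalues; lam(N, 1) plays the role of gap_N.
    return 1.0 - math.cos(j * math.pi / N)

def initial_coefficients(N):
    # a_j(0) = sum_k phi_j(k) v_k(0), with the assumed initial profile v_k(0) = N - k.
    return [sum(phi(N, j, k) * (N - k) for k in range(1, N)) for j in range(1, N)]

def v_profile(N, t):
    # v_k(t) = sum_j a_j(0) exp(-lambda_j t) phi_j(k), for k = 1, ..., N-1.
    a0 = initial_coefficients(N)
    return [
        sum(a0[j - 1] * math.exp(-lam(N, j) * t) * phi(N, j, k) for j in range(1, N))
        for k in range(1, N)
    ]

N = 16
for t in (0.0, 10.0, 100.0):
    print(t, max(v_profile(N, t)), 2 * N * math.exp(-lam(N, 1) * t))
```

The printed pairs compare the largest coordinate of $v(t)$ with the upper bound at a few times; the comparison is only a sanity check under the stated assumptions and does not replace the proof.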

Next we want to use the bound in Proposition 37 to obtain, via Proposition 36, the desired control on the convergence of special particles. However, again a technical problem arises due to the fact that $\mu_{t}=\delta_{\wedge}P_{t}$ has no density w.r.t. equilibrium. We overcome this by showing that the singular part of $\mu_{t}$ has very small mass if $t$ is large.

Lemma 38.

For any $t\geq 0$ , the measure $\mu_{t}=\delta_{\wedge}P_{t}$ satisfies

[TABLE]

where $\mu_{t,N}\in\mathcal{S}_{N}$ , and $\gamma_{t,N}\geq 1-Ne^{-{t/N}}$ , for some probability measure $\mu^{\prime}_{t}$ .

Proof.

We know from Lemma 34 that $\mu_{t}\in\mathcal{S}$ for all $t\geq 0$ . Thus

[TABLE]

with $\mu_{t,N}\in\mathcal{S}_{N}$ for some coefficients $\gamma_{t,N}$, and some singular measure $\mu^{\prime}_{t}$. It remains to show that $\gamma_{t,N}\geq 1-Ne^{-{t/N}}$. By conditioning on the realization $\mathcal{T}_{t}$ of the Poisson clocks $\mathcal{T}^{(j)}$, $j\in\llbracket 1,N-1\rrbracket$, up to time $t$ in the graphical construction, one has that the distribution of the particles at time $t$ is of the form $\mu^{z}$, for some sequence of updates $z$; see (104). In the proof of Lemma 32 we have seen that if $\mu\in\mathcal{S}_{k}$, then $\mu Q_{k}\in\mathcal{S}_{k+1}$, and therefore if the sequence $z$ contains the full sweep $(1,2,\ldots,N-1)$ as a subsequence, then $\mu^{z}\in\mathcal{S}_{N}$. Let $E_{t}$ denote the event that $z$ contains $(1,2,\ldots,N-1)$ as a subsequence. Since $\mathcal{S}_{N}$ is stable under convex combinations, taking the expectation over $\mathcal{T}_{t}$ one finds that $\gamma_{t,N}\geq\mathbb{P}(E_{t})$. A rough lower bound on the latter can be obtained by dividing the interval $[0,t]$ into $N-1$ intervals of equal length and by requiring that for each $i$ the clock of the particle with label $i$ rings during the $i$-th time interval. Each of these $N-1$ requirements fails with probability $e^{-t/(N-1)}\leq e^{-t/N}$, so a union bound shows that $\mathbb{P}(E_{t})\geq 1-Ne^{-{t/N}}$. ∎
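The rough lower bound on $\mathbb{P}(E_{t})$ is elementary arithmetic and can be checked directly. The Python sketch below assumes the $N-1$ windows have equal length $t/(N-1)$ and that the clocks are independent rate 1 Poisson processes; the function name is hypothetical. It returns the exact probability that clock $i$ rings in window $i$ for every $i$ (itself only a lower bound on $\mathbb{P}(E_{t})$), the union bound, and the bound stated in Lemma 38.

```python
import math

def sweep_probability_bounds(N, t):
    """Return (exact window probability, union bound, bound stated in Lemma 38)."""
    window = t / (N - 1)                   # length of each of the N-1 windows
    miss = math.exp(-window)               # P(a rate-1 clock is silent in one window)
    exact = (1.0 - miss) ** (N - 1)        # disjoint windows, independent clocks
    union = 1.0 - (N - 1) * miss           # union bound over the N-1 windows
    stated = 1.0 - N * math.exp(-t / N)    # weaker but simpler bound of the lemma
    return exact, union, stated

for N, t in ((10, 50.0), (100, 1000.0)):
    print(N, t, sweep_probability_bounds(N, t))
```

The three quantities are ordered from largest to smallest, which shows how much is lost at each simplification; the stated bound is only informative once $t$ is larger than roughly $N\log N$.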

We are ready to accomplish the first and most delicate step in the program outlined at the beginning of this section.

Proposition 39.

Fix $\alpha\geq 1$ and $K\in\mathbb{N}$. Let $\mu_{t}=\delta_{\wedge}P_{t}$ and let $\bar{\mu}_{t}$ denote the marginal of $\mu_{t}$ on the special particle positions $Y_{i}(t)$, $i=1,\ldots,K-1$. If $\bar{\pi}_{N,\alpha}$ denotes the corresponding equilibrium distribution, then for any fixed $\delta>0$, with $t_{\delta}=(1+\delta)\frac{\log N}{2\operatorname{\mathrm{gap}}_{N}}$, one has

[TABLE]

Proof.

From Lemma 38 it follows that

[TABLE]

where $\mu_{t_{\delta},N}\in\mathcal{S}_{N}$. Since $Ne^{-t_{\delta}/N}\to 0$ as $N\to\infty$, from Proposition 36, (117) follows if we show that

[TABLE]

By Proposition 37 we know that

[TABLE]

On the other hand, Lemma 38 shows that

[TABLE]

∎
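The choice of $t_{\delta}$ is what makes the bound of Proposition 37 small on the scale $\sqrt{N}$: by definition $e^{-\operatorname{gap}_{N}t_{\delta}}=N^{-(1+\delta)/2}$, so the per-particle bound $2Ne^{-\operatorname{gap}_{N}t_{\delta}}$ derived in the proof of Proposition 37 equals $2N^{(1-\delta)/2}$. The Python sketch below evaluates this together with the quantity $Ne^{-t_{\delta}/N}$ from Lemma 38. It assumes $\operatorname{gap}_{N}=1-\cos(\pi/N)$, which is not restated in this section, and all function names are hypothetical.

```python
import math

def gap(N):
    # Assumed spectral gap 1 - cos(pi/N), written in a numerically stable form.
    return 2.0 * math.sin(math.pi / (2 * N)) ** 2

def t_delta(N, delta):
    # t_delta = (1 + delta) * log(N) / (2 * gap_N), as in Proposition 39.
    return (1.0 + delta) * math.log(N) / (2.0 * gap(N))

def relative_height_bound(N, delta):
    # 2 N exp(-gap_N t_delta) measured in units of sqrt(N); equals 2 N^(-delta/2).
    return 2.0 * N * math.exp(-gap(N) * t_delta(N, delta)) / math.sqrt(N)

def singular_mass_bound(N, delta):
    # N exp(-t_delta / N), the upper bound on the mass of the singular part.
    return N * math.exp(-t_delta(N, delta) / N)

for N in (10**2, 10**4, 10**6):
    print(N, relative_height_bound(N, 0.2), singular_mass_bound(N, 0.2))
```

Both quantities tend to zero as $N$ grows for any fixed $\delta>0$, the first one only polynomially slowly, which is why $\delta$ cannot be taken equal to zero in this step.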

6.3. Relaxation of the censored dynamics

Here we establish the second step in the proof of Proposition 14. Consider the censored process obtained by suppressing all updates of the special particles. In other words, we use the censoring scheme $\mathcal{C}$ such that $\mathcal{C}(s)=\{u_{1},\ldots,u_{K-1}\}$, $s\geq 0$.

Proposition 40.

Fix $\alpha>0$. Let $P_{t,\mathcal{C}}^{x}=\delta_{x}P_{t,\mathcal{C}}$ and let $\pi_{N,\alpha}(\cdot|y)$ denote the equilibrium distribution given the special particle positions $y_{i}=x_{u_{i}}$, $i=1,\ldots,K-1$. For any $\delta>0$ small enough, setting $K=\lfloor\delta^{-1}\rfloor$ and $s_{\delta}=\delta\frac{\log N}{2\operatorname{\mathrm{gap}}_{N}}$, one has, for $N$ sufficiently large

[TABLE]

Proof.

By construction, the censored process corresponds to the product of $K$ independent adjacent walks, each on the $n$-simplex, with $n:=\lfloor N/K\rfloor$. The bound (120) implies that, for any given $\varepsilon\in(0,1)$, the mixing time of a system of size $n$ satisfies $T_{n}(\varepsilon)\leq Cn^{2}\log n$ if $n$ is large enough. Therefore,

[TABLE]

for $\delta>0$ small enough. Thus if $d_{N}(t)$ is the distance defined in (5), then

[TABLE]

as required. ∎
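The comparison of time scales behind this proof can be made concrete. The Python sketch below assumes that (120) takes the form $T_{n}(\varepsilon)\leq 5\operatorname{gap}_{n}^{-1}\log n$, as suggested by the discussion of the constant $5$ in Appendix A, and that $\operatorname{gap}_{n}=1-\cos(\pi/n)$; neither formula is restated in this section, and the function names are hypothetical. It computes the ratio between the assumed mixing bound for one block of size $n=\lfloor N/K\rfloor$ and the available time $s_{\delta}$.

```python
import math

def gap(n):
    # Assumed spectral gap 1 - cos(pi/n), written in a numerically stable form.
    return 2.0 * math.sin(math.pi / (2 * n)) ** 2

def block_time_ratio(N, delta):
    """Ratio of the assumed block mixing bound 5 log(n) / gap_n to s_delta."""
    K = math.floor(1.0 / delta)            # number of blocks, K = floor(1/delta)
    n = N // K                             # size of each block
    block_bound = 5.0 * math.log(n) / gap(n)
    s_delta = delta * math.log(N) / (2.0 * gap(N))
    return block_bound / s_delta

for delta in (0.05, 0.02, 0.01):
    print(delta, block_time_ratio(10**4, delta))
```

Since $\operatorname{gap}_{n}^{-1}$ is of order $n^{2}\approx\delta^{2}N^{2}$ while $s_{\delta}$ is of order $\delta N^{2}\log N$, the ratio is of order $\delta$, so each block has ample time to mix once $\delta$ is small.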

6.4. Proof of Proposition 14

With the previous results at hand it is relatively simple to conclude the proof of the desired estimate. We formulate the result as follows. Recall the notation $t_{\delta}=(1+\delta)\frac{\log N}{2\operatorname{\mathrm{gap}}_{N}}$ .

Proposition 41.

Fix $\alpha\geq 1$ . For any $\delta>0$ ,

[TABLE]

Proof.

Set $K=\lfloor\delta^{-1}\rfloor$ and let $\mathcal{C}^{\prime}$ denote the censoring scheme defined by $\mathcal{C}^{\prime}(s)=\emptyset$ for $s\in[0,t_{\delta/2})$ and $\mathcal{C}^{\prime}(s)=\{u_{1},\ldots,u_{K-1}\}$ for $s\geq t_{\delta/2}$. Let also $P^{\wedge}_{t,*}=\delta_{\wedge}P_{t,\mathcal{C}^{\prime}}$ denote the corresponding censored process. From Lemma 34 we have

[TABLE]

A coupling of $P_{t_{\delta},*}^{\wedge}$ and $\pi_{N,\alpha}$ can be achieved as follows. Call $\mu_{t}=\delta_{\wedge}P_{t}$ . Use the optimal coupling attaining the total variation distance $\|\bar{\mu}_{t_{\delta/2}}-\bar{\pi}_{N,\alpha}\|_{TV}$ to couple the special particles at time $t_{\delta/2}$ . If the special particles are coupled at time $t_{\delta/2}$ and if $x$ is the configuration at that time, then couple the remaining particles at time $t_{\delta}=t_{\delta/2}+s_{\delta/2}$ with the optimal coupling attaining the total variation distance $\|P_{s_{\delta/2},\mathcal{C}}^{x}-\pi_{N,\alpha}(\cdot|y)\|_{TV}$ , where $\mathcal{C}$ is as in Proposition 40. This shows that

[TABLE]

From Proposition 39 and Proposition 40,

[TABLE]

The distance $\|P_{t_{\delta}}^{\wedge}-\pi_{N,\alpha}\|_{TV}$ is decreasing as a function of $\delta$ , and therefore we may take $\delta\to 0$ in the right hand side above to conclude. ∎

Remark 42.

Proposition 41 can be strengthened to obtain that for any $\delta>0$ ,

[TABLE]

for an arbitrary $\mu\in\mathcal{S}$. Indeed, for any $\mu\in\mathcal{S}$ one has $\mu P_{t}\in\mathcal{S}$ and $\mu P_{t}\leq P^{\wedge}_{t}$, and therefore the claim follows from Lemma 31. However, at this point we cannot infer that the same holds for an arbitrary initial distribution without the requirement that it belongs to $\mathcal{S}$.

Appendix A. An extension of the Randall-Winkler upper bound

As already discussed in the previous sections, one of the ingredients of our main results is the upper bound (13) on the mixing time that captures the right order of magnitude up to a multiplicative constant, which in the case $\alpha=1$ was obtained by Randall and Winkler [RW05a]. In this section, we explain how one can extend that bound to the Beta-resampling case with $\alpha\geq 1$ . We are going to show more precisely that for any $\varepsilon>0$ , if $N$ is sufficiently large, then

[TABLE]

Let us fix $x\in\Omega_{N}$ and construct a coupling of the two processes $\mathbf{X}^{x}$ and $\mathbf{X}^{\mbox{\tiny eq}}$ , where $\mathbf{X}^{x}$ has initial state $x$ and $\mathbf{X}^{\mbox{\tiny eq}}$ has initial distribution $\pi_{N,\alpha}$ . Clearly,

[TABLE]

Our manner of coupling the two processes is time dependent and is as follows:

(A) For $t\leq t_{1}:=4(\log N)\operatorname{\mathrm{gap}}_{N}^{-1}$, we couple the two trajectories using the construction presented in the proof of Proposition 7.

(B) For $t>t_{1}$, we couple the two trajectories according to the construction described in Section 5.2, namely we use a coupling that maximizes the probability of sticking at each update.
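For intuition on a coupling that maximizes the probability of sticking, the Python sketch below implements one standard maximal coupling of two uniform distributions on intervals, which corresponds to the case $\alpha=1$. It is not the construction of Section 5.2, which is not reproduced here; the function names and the intervals used in the example are hypothetical.

```python
import random

def maximal_coupling_uniform(I, J, rng):
    """Sample (X, Y) with X ~ Unif(I) and Y ~ Unif(J), maximizing P(X == Y)."""
    (a, b), (c, d) = I, J
    p = lambda z: 1.0 / (b - a) if a <= z <= b else 0.0   # density of Unif(I)
    q = lambda z: 1.0 / (d - c) if c <= z <= d else 0.0   # density of Unif(J)
    x = rng.uniform(a, b)
    if rng.random() * p(x) < q(x):         # stick with probability min(1, q(x)/p(x))
        return x, x
    while True:                            # otherwise draw Y from the residual part of q
        y = rng.uniform(c, d)
        if rng.random() * q(y) >= p(y):    # accept with probability 1 - min(1, p(y)/q(y))
            return x, y

def sticking_probability(I, J):
    """Exact P(X == Y) under a maximal coupling: the integral of min(p, q)."""
    overlap = max(0.0, min(I[1], J[1]) - max(I[0], J[0]))
    return overlap / max(I[1] - I[0], J[1] - J[0])

rng = random.Random(0)
I, J = (0.0, 1.0), (0.5, 1.5)              # hypothetical resampling intervals
pairs = [maximal_coupling_uniform(I, J, rng) for _ in range(10000)]
print(sum(x == y for x, y in pairs) / len(pairs), sticking_probability(I, J))
```

The empirical sticking frequency should be close to the exact value, which for uniform laws is the length of the overlap divided by the length of the longer interval. When the two resampling intervals nearly coincide, as after step (A), this probability is close to 1.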

We first observe that at time $t_{1}$ the two processes have come close to one another:

[TABLE]

To see this, referring to the proof of Proposition 7, we consider the two processes $(\mathbf{X}^{\wedge}(t))_{t\in[0,t_{1}]}$ , $(\mathbf{X}^{\vee}(t))_{t\in[0,t_{1}]}$ obtained by using the same clock processes $\mathcal{T}$ and update variables $U$ as in the construction of $\mathbf{X}^{x}(t)$ and $\mathbf{X}^{\mbox{\tiny eq}}(t)$ . Since this construction is order preserving, we necessarily have

[TABLE]

and (122) can be proved for $X_{k}^{\wedge}(t_{1})-X^{\vee}_{k}(t_{1})$ by repeating the computation leading to (65).

Now for $t>t_{1}$, we remark that in contrast with Randall-Winkler’s case $\alpha=1$, step (B) described above does not descend from a monotone grand coupling, and thus it is not sufficient to couple $X^{\wedge}_{k}(t)$ and $X^{\vee}_{k}(t)$ by time $5(\log N)\operatorname{\mathrm{gap}}_{N}^{-1}$ to prove the desired mixing time bound.

Instead, we need to couple $\mathbf{X}^{x}(t)$ and $\mathbf{X}^{\mbox{\tiny eq}}(t)$ for arbitrary $x$ . To this end we are going to repeat the recursive argument of Section 5.10 with $X^{\wedge}_{k}(t)$ replaced by $X^{\mbox{\tiny eq}}_{k}(t)$ . In particular, when we perform the update at a site $k$ , we have to replace the interval $I^{\wedge}$ (see Section 5.2) by $I^{\mbox{\tiny eq}}$ , which we define as the resampling interval of $X_{k}^{\mbox{\tiny eq}}$ . As a consequence of this modification, the supports of the densities $\rho_{1}$ and $\rho_{3}$ presented in (52) are not necessarily intervals. Define the events

[TABLE]

From (122) and Markov’s inequality it follows that $\mathbb{P}(\bar{\mathcal{A}}^{(N)}_{1})\to 1$ , and from the proof of Lemma 30 it follows that $\mathbb{P}(\bar{\mathcal{A}}^{(N)}_{2})\to 1$ .

As in Section 5.10 we consider the probability of an unsuccessful update in the time interval $[t_{1},t_{1}+2\log N]$ . Let $\bar{\mathcal{A}}^{(N)}_{3}$ denote the event that each site is updated at least once and no more than $10\log N$ times in this time interval. The probability of $\bar{\mathcal{A}}^{(N)}_{3}$ tends to 1 (cf. Section 5.10).

Using the upper bound (57) and recalling Remark 17, we see that conditioned on the realization of the update times $(t_{i}^{(k)})_{i,k}$ in the time interval $[t_{1},t_{1}+2\log N]$ , on the event $\bar{\mathcal{A}}^{(N)}_{1}\cap\bar{\mathcal{A}}^{(N)}_{3}$ , the probability that there exists at least one unsuccessful update within $[t_{1},t_{1}+2\log N]$ is bounded above by $C\,N^{-1/2}(\log N)^{2}+o(1)$ . This shows that the probability that $\mathbf{X}^{x}(t_{1}+2\log N)\neq\mathbf{X}^{\mbox{\tiny eq}}(t_{1}+2\log N)$ vanishes as $N\to\infty$ , which implies (120). Note that the constant $5$ in this bound is not optimal.

Finally, we remark that the proof given above can be extended to the case $\alpha\in(0,1)$, with $5$ replaced by a constant which depends on $\alpha$. However, this requires proving an inequality that replaces the upper bound of (57), namely, that when $\alpha\in(0,1)$ there exists $C_{\alpha}>0$ such that for any pair of intervals

[TABLE]

We leave the details of the adaptation as well as the proof of (123) to the interested reader.

Acknowledgements

H.L. is grateful to Werner Krauth for bringing to his knowledge the work of Randall and Winkler and for a stimulating discussion on the subject. This work was initiated during a stay of P.C. and C.L. at IMPA, and continued during a stay of P.C. and H.L. at Dauphine: we are grateful for hospitality and support provided by these two institutions. H.L. also acknowledges support from a productivity grant of CNPq and a JCNE grant from FAPERj. C.L. acknowledges support from the grant SINGULAR ANR-16-CE40-0020-01.

Bibliography

[AD87] D. Aldous and P. Diaconis. Strong uniform times and finite random walks. Adv. in Appl. Math. 8, no. 1, (1987), 69–97. doi:10.1016/0196-8858(87)90006-6.
[AF02] D. Aldous and J. Fill. Reversible Markov chains and random walks on graphs, 2002. URL https://www.stat.berkeley.edu/~aldous/RWG/book.pdf.
[Cap08] P. Caputo. On the spectral gap of the Kac walk and other binary collision processes. Alea 4, (2008), 205–222.
[CCL03] E. A. Carlen, M. C. Carvalho, and M. Loss. Determination of the spectral gap for Kac’s master equation and related stochastic evolution. Acta Mathematica 191, no. 1, (2003), 1–54.
[DFK91] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM) 38, no. 1, (1991), 1–17.
[Dia96] P. Diaconis. The cutoff phenomenon in finite Markov chains. Proceedings of the National Academy of Sciences 93, no. 4, (1996), 1659–1664.
[Gia02] G. Giacomin. Aspects of statistical mechanics of random surfaces. Lecture Notes for course given at IHP. 2002.
[GS02] A. L. Gibbs and F. E. Su. On choosing and bounding probability metrics. Internat. Statist. Rev. 419–435.
