Long-term concentration of measure and cut-off

Andrew Barbour; Graham Brightwell; Malwina Luczak

arXiv:1902.00822·math.PR·May 24, 2022

TL;DR

This paper develops new concentration inequalities for Markov chains, enabling the analysis of the cut-off phenomenon in both finite and infinite state spaces, with applications to models like Bernoulli-Laplace, disease spread, and supermarket queues.

Contribution

It introduces generalized concentration of measure inequalities for Markov chains, extending cut-off analysis to infinite state spaces and diverse models.

Findings

01. Probabilistic proof of cut-off for the Bernoulli-Laplace model

02. Extension of the cut-off concept to chains with infinite state space

03. Concentration results for the supermarket model

Abstract

We present new concentration of measure inequalities for Markov chains, generalising results for chains that are contracting in Wasserstein distance. These are particularly suited to establishing the cut-off phenomenon for suitable chains. We apply our discrete-time inequality to the well-studied Bernoulli-Laplace model of diffusion, and give a probabilistic proof of cut-off, recovering and improving the bounds of Diaconis and Shahshahani. We also extend the notion of cut-off to chains with an infinite state space, and illustrate this in a second example, of a two-host model of disease in continuous time. We give a third example, giving concentration results for the supermarket model, illustrating the full generality and power of our results.

Equations

d_{n}(t)=\max_{x\in{S}^{(n)}}d_{TV}\Bigl{(}{\mathcal{L}}_{x}\bigl{(}X^{(n)}(t)\bigr{)},\pi^{(n)}\Bigr{)},

\lim_{s\to\infty}\liminf_{n\to\infty} d_{n}(t_{n}-sw_{n}) = 1;\quad \lim_{s\to\infty}\limsup_{n\to\infty} d_{n}(t_{n}+sw_{n}) = 0.

\displaystyle d_{TV}\Bigl{(}{\mathcal{L}}_{x}\bigl{(}X^{(n)}(t_{n}(x)-s(\varepsilon)w_{n})\bigr{)},\pi^{(n)}\Bigr{)}\ >\ 1-\varepsilon,

\displaystyle d_{TV}\Bigl{(}{\mathcal{L}}_{x}\bigl{(}X^{(n)}(t_{n}(x)+s(\varepsilon)w_{n})\bigr{)},\pi^{(n)}\Bigr{)}\ <\ \varepsilon,

N (x)

(P^{k}f)(x) := \mathbb{E}_{x}[f(X(k))],\quad x\in S,

\displaystyle\big{|}(P^{i}f)(x)-(P^{i}f)(y)\big{|}\ \leq\ \beta,\quad x\in{\widetilde{S}},\ y\in N(x);

\displaystyle\sum_{y\in N(x)}P(x,y)\bigl{(}(P^{i}f)(x)-(P^{i}f)(y)\bigr{)}^{2}\ \leq\ \alpha_{i},\quad x\in{\widetilde{S}},

\operatorname{\mathbb{P}{}}_{x_{0}}\Bigl{(}\bigl{\{}\bigl{|}f(X(k))-(P^{k}f)(x_{0})\bigr{|}\geq m\bigr{\}}\cap A_{k}\Bigr{)}\ \leq\ 2e^{-m^{2}/(2a_{k}+4\beta m/3)}.

L^{2}\sum_{y\in N(x)}P(x,y)W\bigl{(}{\mathcal{L}}_{x}(X(i)),{\mathcal{L}}_{y}(X(i))\bigr{)}^{2},

\operatorname{var}(f(X(k)))

{\mathbb{P}}\big{(}\big{|}Z-\mu\big{|}\geq m\big{)}\ \leq\ 2e^{-m^{2}/(2\delta+2\gamma m/3)}.

A(\delta,\gamma)\ :=\ \Bigl{\{}\sum_{i=1}^{k}\operatorname{\mathrm{v}ar}(Z_{i}\mid\mathcal{F}_{i-1})\leq\delta\Bigr{\}}\cap\bigl{\{}\big{|}Z_{i}-Z_{i-1}\big{|}\leq\gamma,\,1\leq i\leq k\bigr{\}}.

\displaystyle{\mathbb{P}}\Bigl{(}\big{\{}\big{|}Z-\mu\big{|}\geq m\big{\}}\cap A(\delta,\gamma)\Bigr{)}\ \leq\ 2e^{-m^{2}/(2\delta+2\gamma m/3)}.

\operatorname{\mathbb{E}{}}\Bigl{(}Ie^{h\sum_{i=1}^{k}Y_{i}}\mid\mathcal{F}_{0}\Bigr{)}\ \leq\ \operatorname*{ess\,sup}\Bigl{(}I\prod_{i=1}^{k}\operatorname{\mathbb{E}{}}(e^{hY_{i}}\mid\mathcal{F}_{i-1})\mid\mathcal{F}_{0}\Bigr{)}.

\mathbb{E}\bigl(e^{h(Z_{i}-Z_{i-1})}\mid\mathcal{F}_{i-1}\bigr) \leq e^{h^{2}g(h\,\mathrm{dev}_{i}^{+})\,\mathrm{var}_{i}}.

\mathbb{E}\bigl(I e^{h(Z-\mu)}\bigr)

\mathbb{P}\bigl(\{Z-\mu\geq m\}\cap A(\delta,\gamma)\bigr)

\mathbb{P}\bigl(\{Z-\mu\geq m\}\cap A(\delta,\gamma)\bigr) \leq e^{-m^{2}/(2\delta+2\gamma m/3)}.

Z_{i} = \mathbb{E}_{x_{0}}[f(X(k))\mid\mathcal{F}_{i}] = (P^{k-i}f)(X(i)).

\operatorname{var}(Z_{i}\mid X(i-1)=x_{i-1})

\sum_{i=1}^{k}\operatorname{var}(Z_{i}\mid\mathcal{F}_{i-1}) \leq \sum_{j=0}^{k-1}\alpha_{j} = a_{k},

Z_{i-1}=\operatorname{\mathbb{E}{}}\bigl{\{}\operatorname{\mathbb{E}{}}(f(X(k))\,|\,\mathcal{F}_{i})\,|\,\mathcal{F}_{i-1}\bigr{\}}=\!\sum_{z\in N(X(i-1))}P(X(i-1),z)(P^{k-i}f)(z).

\bigl{|}(P^{i}f)(y)-(P^{i}f)(z)\bigr{|}\ \leq\ 2\beta.

\displaystyle\bigl{|}Z_{i}-Z_{i-1}\bigr{|}

\mathbb{E}\bigl[d(X^{(1)}(1),X^{(2)}(1))\mid(X^{(1)}(0),X^{(2)}(0))=(x,y)\bigr] \leq (1-\rho)\,d(x,y).

\sup_{x,y\in S}\frac{W_{d}(\mathcal{L}_{x}(X(1)),\mathcal{L}_{y}(X(1)))}{d(x,y)} \leq 1-\rho,

\sup_{x\sim y}W_{d}(\mathcal{L}_{x}(X(1)),\mathcal{L}_{y}(X(1))) \leq 1-\rho,

\sum_{y\in N(x)}P(x,y)\,d(x,y)^{2} \leq D_{2}

\mathbb{P}_{x}\Bigl(\bigl|f(X(k))-\mathbb{E}_{x}[f(X(k))]\bigr|\geq m\Bigr)

Full text

A. D. Barbour

Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich

Graham Brightwell

Department of Mathematics, LSE

Malwina Luczak

School of Mathematics and Statistics, University of Melbourne

Key words and phrases:

Markov chains, concentration of measure, coupling, cut-off

2000 Mathematics Subject Classification:

60J75, 60C05, 60F15

ADB: Work supported in part by Australian Research Council Grants Nos DP150101459 and DP150103588, and by the ARC Centre of Excellence for Mathematical and Statistical Frontiers, CE140100049. Thanks to the mathematics departments of the University of Melbourne and Monash University for their kind hospitality.

MJL: Research partly supported by an EPSRC Leadership Fellowship EP/J004022/2 and partly by ARC Future Fellowship FT170100409.

1. Introduction

We have two main aims in this paper. The first is to develop some new concentration of measure inequalities for Markov chains, both in discrete and continuous time, and the second is to introduce a wider perspective on the cut-off phenomenon for convergence to equilibrium of Markov chains. Our past work suggests a strong connection between long-term concentration of measure, rapid mixing, and cut-off: this paper is an attempt to formalise, explain and illustrate this.

Our concentration of measure inequalities generalise and extend earlier results applicable to chains contracting in Wasserstein distance. This means that there is a metric on the state space such that the chain makes only short steps with respect to the metric, and a coupling of two copies of the chain under which the distance between the two copies decreases in expectation; in the language of Ollivier [26], the chain has positive coarse Ricci curvature. For discrete-time Markov chains with positive coarse Ricci curvature, Ollivier proves that any real-valued function of the Markov chain that is Lipschitz with respect to the metric remains well-concentrated around its expectation for all time, and in equilibrium; a similar result follows from results of Luczak [19], proved independently at around the same time. Paulin [27] gives a more general framework, obtaining concentration results, and bounds on the mixing time, in cases where the “multi-step coarse Ricci curvature” is positive, even if the coarse Ricci curvature is not. The concentration results proved in these papers, as well as in the present paper, are of the “Gaussian then exponential” type, akin to Bernstein’s inequalities: the probability of a deviation of at least $m$ from the mean is of order $e^{-cm^{2}}$ for small $m$ and $e^{-cm}$ for large $m$. Ollivier gives examples where this is the best possible form of the concentration inequality.
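
To make the “Gaussian then exponential” shape concrete, the short Python sketch below evaluates an exponent of the Bernstein-type form $m^{2}/(2a+4\beta m/3)$, the form that appears in Theorem 2.1. The values of $a$ (a variance proxy) and $\beta$ (a bound on one-step changes) are arbitrary illustrative choices, not taken from any model in the paper.

```python
import math


def tail_exponent(m: float, a: float, beta: float) -> float:
    """Exponent in a Bernstein-type bound 2*exp(-m^2 / (2a + 4*beta*m/3))."""
    return m * m / (2.0 * a + 4.0 * beta * m / 3.0)


def tail_bound(m: float, a: float, beta: float) -> float:
    """The two-sided tail bound itself, capped at 1 since it bounds a probability."""
    return min(1.0, 2.0 * math.exp(-tail_exponent(m, a, beta)))


a, beta = 100.0, 1.0  # illustrative values only
for m in (0.1, 10.0, 100.0, 1e6):
    gaussian = m * m / (2.0 * a)          # dominant regime when beta*m << a
    exponential = 3.0 * m / (4.0 * beta)  # dominant regime when beta*m >> a
    print(f"m={m:g}: exponent={tail_exponent(m, a, beta):.4g}, "
          f"gaussian approx={gaussian:.4g}, exponential approx={exponential:.4g}")
```

For $\beta m \ll a$ the exponent is close to $m^{2}/(2a)$, and for $\beta m \gg a$ it is close to $3m/(4\beta)$; in general it is at most the smaller of the two, since the denominator dominates each of its terms.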

Our new results in discrete time do not rely on the existence of a well-behaved metric on the state space, and require only conditions on the function of interest. Thus we obtain stronger concentration results for functions of the chain whose values change much less often than the chain itself makes transitions, as long as these functions are contractive in a suitable sense. We recover essentially the same result as Ollivier in the case of positive coarse Ricci curvature, and we can also obtain results very similar to those of Paulin, but our results can also be used to prove concentration of measure in other settings. The application we give in the final section of our paper gives a concentration result that we do not know how to obtain by other means.

We also give analogous concentration inequalities for continuous-time Markov chains. These are entirely new, although, for chains contracting in Wasserstein distance, similar results could be obtained via the methods and results of Ollivier [26], Luczak [19] and Paulin [27]. Veysseire [30] gives definitions and results for coarse Ricci curvature in continuous time, but does not prove any results that are closely related to ours.

We now turn to the cut-off phenomenon. For a Markov chain $(X(t))$ , with initial state $X(0)=x$ , consider the total variation distance between the law of the process at time $t$ and the equilibrium distribution. The chain is said to exhibit the cut-off phenomenon if this distance falls from near 1 to near 0 over a window of time that is much shorter than the mixing time. In previous work, it is assumed that the state space is finite, and the starting state $x$ is chosen to maximise the mixing time. We present a version of the definition allowing for an infinite state space, and for variation of the mixing time over a region of potential initial states, with a cut-off window of width that is uniform across this region.

Our concentration of measure inequalities, combined with coupling arguments, are well-suited to proving cut-off, and we illustrate this with two examples of independent interest. The first is the well-known Bernoulli-Laplace model of diffusion: there are initially $n$ red balls in one urn and $n$ black balls in another, and at each time step one ball from each urn is chosen uniformly at random and the two balls are exchanged. Cut-off was proved for this model in 1987 by Diaconis and Shahshahani [6] using algebraic techniques: we provide a probabilistic proof, essentially recovering the bound of Diaconis and Shahshahani for the upper tail of the distribution of the mixing time, while providing a sharper bound for the lower tail.

Our second application concerns a continuous-time model of a disease with two types of host, each infecting the other; the disease is supported at a low level in a population by immigration of both types of infected host from outside. This example illustrates both the application of our new continuous-time concentration inequality and our new concept of cut-off, as the state space is infinite and the mixing time varies significantly depending on the initial conditions.

In both of the sample applications above, the chain we examine is contractive in Wasserstein distance, and variants of the results we obtain could also be obtained from concentration inequalities in earlier work. We also present a third application which uses the full power of our new continuous-time inequality; this treats the supermarket model, a well-known queueing system, with a certain range of parameter values. In this example, we utilise facts about the equilibrium distribution from a paper of Brightwell, Fairthorne and Luczak [2], alongside our long-term concentration result, to show tight concentration in equilibrium of the number of empty queues.

1.1. Concentration of measure inequalities

Our general concentration inequality for discrete-time Markov chains appears as Theorem 2.1, and the special case where the chain is contracting in Wasserstein distance with respect to a suitable metric as Corollary 2.3. Part (a) of Theorem 2.3 is very similar to Theorem 32 of Ollivier [26] – that result is for the equilibrium distribution of the chain, whereas ours is for finite-time distributions, but Ollivier’s Remark 34 indicates that the proof in his paper transfers to the finite-time case. A similar result for chains contracting in Wasserstein distance also follows readily from Theorem 4.5 of Luczak [19]. We give more details after we have given precise definitions and statements of theorems.

There is another quite different recent strand of work providing tools to show concentration of measure and rapid mixing for a given function of a Markov chain, useful in circumstances where the function mixes more rapidly than the chain itself. See Watanabe and Hayashi [32] and Rabinovich, Ramdas, Jordan and Wainwright [29].

Results similar to Theorem 2.1 appear in earlier works of the third author, some unpublished, and a number of other applications are to be found in these papers, as well as in Gheissari, Lubetzky and Peres [11]. The flavour of the inequality is similar to that of Luczak [19], but Theorem 2.1 can be much more powerful when the chain makes frequent transitions that do not alter the value of the function of interest.

One example where this is relevant is the supermarket model, as studied in the final section of this paper, where the number of queues of length $k$ only changes infrequently for some values of $k$ .

Another example is the alternative routing model of Gibbens, Hunt and Kelly [12]. Here, there are links of limited capacity between each pair of nodes in a telephone network; requests for pairs of nodes to be connected arrive according to a Poisson process, and these can be met either by using the direct link or by using some path of two links. Different protocols have been proposed and studied for choosing the route; one is to use the direct link if it has spare capacity, and if not, to inspect $d\geq 1$ two-link routes and use one of those with the most spare capacity. In an unpublished preprint of Luczak [20], an earlier version of Theorem 2.1 is used to prove a differential equation approximation for this model, extending earlier results of Crametz and Hunt [5] and Graham and Méléard [13]. The equilibrium behaviour of the model is studied via a similar approach in an unpublished preprint of Brightwell and Luczak [3]. The same methods can be used to treat other routing protocols. The key principle is that quantities such as the number of occupied links incident with a given node change far less often than the overall state of the network. Our new result, Theorem 2.1, improves on the earlier version in Luczak [20] (Theorem 2.3) by weakening and simplifying its hypotheses.

The corresponding inequality for continuous-time Markov chains is Theorem 3.1, and the special case for chains that are contracting in Wasserstein distance is Theorem 3.3. Our proof for continuous time uses different methods from those used for discrete time (although both proofs draw on principles of concentration of measure for martingales), and it is perhaps a little surprising that the resulting theorems are nearly exact analogues of each other. In Brightwell and Luczak [3], a continuous-time model is analysed (somewhat awkwardly) by applying discrete-time concentration of measure inequalities from [20] to its jump chain; it seems that this analysis would be eased by direct application of our new continuous-time inequalities, and we plan to produce an improved version of [3] in the future.

Our notion of contraction in Wasserstein distance is very different in flavour from that of contraction in total variation distance, as studied by Marton [22] and others subsequently. In particular, for a chain to exhibit contraction in total variation distance, it is necessary that, from any two states, there is a positive probability that two coupled chains started in these states coalesce in a single step.

1.2. Cut-off

We now discuss the cut-off phenomenon in the convergence to equilibrium for sequences $X^{(n)}$ of Markov chains.

Let ${\mathcal{L}}_{x}(X^{(n)}(t))$ denote the distribution of $X^{(n)}(t)$ when $X^{(n)}(0)=x$, and let $\pi^{(n)}$ be the equilibrium distribution of $X^{(n)}$. Let $S^{(n)}$ denote the state space of the chain $X^{(n)}$.

In earlier papers (for instance, Diaconis and Shahshahani [6] and Levin, Luczak and Peres [17]), cut-off is defined as follows, in the case where the state space $S^{(n)}$ is finite for each $n$ . The worst-case distance to stationarity for the chain $X^{(n)}$ at time $t$ is

$$d_{n}(t)=\max_{x\in S^{(n)}} d_{TV}\Bigl(\mathcal{L}_{x}\bigl(X^{(n)}(t)\bigr),\pi^{(n)}\Bigr),$$

and the sequence $X^{(n)}$ of chains is said to exhibit cut-off at time $t_{n}$ with window width $w_{n}$ if $w_{n}=o(t_{n})$ and

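$$\lim_{s\to\infty}\liminf_{n\to\infty} d_{n}(t_{n}-sw_{n}) = 1;\qquad \lim_{s\to\infty}\limsup_{n\to\infty} d_{n}(t_{n}+sw_{n}) = 0.$$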

In other words, for a large constant $s$ , at time $t_{n}+sw_{n}$ , the chain $X^{(n)}$ is nearly in equilibrium, whatever the starting state; on the other hand, there is a starting state $x\in{S}^{(n)}$ such that the chain $X^{(n)}$ starting from state $x$ is very far from equilibrium at time $t_{n}-sw_{n}$ .
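
For a single finite chain, the worst-case distance $d_{n}(t)$ can be computed by brute force: propagate the point mass at each starting state for $t$ steps and take the largest total variation distance to the equilibrium distribution. The Python sketch below does this for a dense transition matrix. The lazy random walk on a cycle used as the example is our own illustrative choice, and the cost grows quickly with the size of the state space, so this is only practical for small chains.

```python
from typing import List

Matrix = List[List[float]]


def step(dist: List[float], P: Matrix) -> List[float]:
    """One step of the chain: returns the row vector dist @ P."""
    out = [0.0] * len(P)
    for x, px in enumerate(dist):
        if px:
            for y, pxy in enumerate(P[x]):
                out[y] += px * pxy
    return out


def tv_distance(mu: List[float], nu: List[float]) -> float:
    """Total variation distance between two distributions on {0, ..., N-1}."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def worst_case_distance(P: Matrix, pi: List[float], t: int) -> float:
    """d(t) = max_x d_TV(L_x(X(t)), pi), by brute force over starting states."""
    worst = 0.0
    for x in range(len(P)):
        dist = [0.0] * len(P)
        dist[x] = 1.0
        for _ in range(t):
            dist = step(dist, P)
        worst = max(worst, tv_distance(dist, pi))
    return worst


def lazy_cycle(N: int) -> Matrix:
    """Lazy simple random walk on Z_N: hold w.p. 1/2, else move to a uniform neighbour."""
    P = [[0.0] * N for _ in range(N)]
    for x in range(N):
        P[x][x] += 0.5
        P[x][(x + 1) % N] += 0.25
        P[x][(x - 1) % N] += 0.25
    return P


N = 8
P, pi = lazy_cycle(N), [1.0 / N] * N  # the uniform law is stationary here
print([round(worst_case_distance(P, pi, t), 4) for t in (0, 5, 20, 80)])
```

Cut-off is a statement about a sequence of such profiles as $n\to\infty$; a single chain only shows how $d(t)$ decreases from near 1 to near 0.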

In many cases where cut-off, with window width $w_{n}$ , can be proven, the situation is typically as follows, with a proof involving two separate arguments. The state space has a metric, and the Markov chain makes jumps that are small with respect to this metric. The equilibrium distribution is concentrated around some point $y$ (suitably scaled with $n$ ) in the state space. If the chain is started at some “distant” point $x$ , one shows that its trajectory is concentrated around its expectation, up until some time $t_{n}(x)$ when the expectation becomes suitably close to $y$ . Once in the neighbourhood of $y$ , one seeks a coupling with a copy of the chain in equilibrium, where coalescence takes place in time of order $w_{n}$ . One example of such a proof was given by Levin, Luczak and Peres [17], and our examples in Sections 5 and 6 both illustrate this general approach.

Similar behaviour is often to be found in examples where the state space is infinite, and there is no “most distant” starting point from equilibrium. For instance, in a population model, there may be no effective upper bound on the initial size of a population. Thus we find it useful to introduce a more general notion of cut-off, where the mixing time $t_{n}(x)$ depends on the initial state, but the window width $w_{n}$ is independent of the starting state. The proof scheme above can then be applied, provided we restrict the class of allowed initial states to exclude (a) states $x$ too close to the point $y$ around which the equilibrium is concentrated, where the “travel time” $t_{n}(x)$ from $x$ to $y$ will be of similar or smaller order than the time $w_{n}$ required for coalescence of the coupled chains in the neighbourhood of $y$, and (b) possibly also states $x$ extremely distant from $y$, where the fluctuation in the travel time exceeds the window width $w_{n}$.

We now give our formal definition of cut-off, which extends the previous definition, and in particular allows for an infinite state space. For $E_{n}$ a subset of the state space $S^{(n)}$ of $X^{(n)}$ , let $(t_{n}(x),\,x\in E_{n})$ be a collection of non-random times, and let $(w_{n})$ be a sequence of numbers such that $\lim_{n\to\infty}\inf_{x\in E_{n}}t_{n}(x)/w_{n}=\infty$ . We say that $X^{(n)}$ exhibits cut-off at time $t_{n}(x)$ on $E_{n}$ with window width $w_{n}$ , if there exist (non-random) constants $(s(\varepsilon),\,\varepsilon>0)$ such that, for any $\varepsilon>0$ and for all $n$ large enough,

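$$d_{TV}\Bigl(\mathcal{L}_{x}\bigl(X^{(n)}(t_{n}(x)-s(\varepsilon)w_{n})\bigr),\pi^{(n)}\Bigr) > 1-\varepsilon,$$
$$d_{TV}\Bigl(\mathcal{L}_{x}\bigl(X^{(n)}(t_{n}(x)+s(\varepsilon)w_{n})\bigr),\pi^{(n)}\Bigr) < \varepsilon,$$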

uniformly for all $x\in E_{n}$ .

In some examples, the travel time $t_{n}(x)$ can be taken not to depend on $x$ , as long as $x\in E_{n}$ . We say that $X^{(n)}$ exhibits cut-off at $t_{n}$ on $E_{n}$ with window width $w_{n}$ , for a sequence $(t_{n},\,n\geq 1)$ , if the $t_{n}(x)$ in the definition above can be set equal to $t_{n}$ for all $n$ and all $x\in E_{n}$ . An illustration of this last concept comes in Section 5; the idea here is that the expected “travel times” from all suitably distant starting states are nearly equal.

Our concentration of measure results are suited to showing that a Markov chain closely follows an almost deterministic trajectory until it reaches a neighbourhood of the region where the equilibrium distribution is concentrated. In order to complete a proof of cut-off, one needs to show that convergence to equilibrium is rapid once that neighbourhood has been reached. Proposition 4.1 gives conditions guaranteeing that a Markov chain taking non-negative real values, with a non-positive drift in all positive states, reaches 0 quickly with high probability. This implies an upper bound on the coalescence time for the two copies of the chain in a contracting coupling. We give such a result only in continuous time, and apply it in our continuous-time sample application in Section 6. Our proof of Proposition 4.1 is based on the proof of a discrete-time analogue appearing as Proposition 17.19 of Levin, Peres and Wilmer [18]. Our application in Section 5 requires a sharper coupling result specific to the model; using some version of Proposition 17.19 from [18] would give weaker bounds on the tail of the distribution of the mixing time.

1.3. Applications

We give three examples. The first two feature chains that are contracting in Wasserstein distance, illustrating both our methods and the cut-off phenomenon. In the third example, we prove results about concentration of the equilibrium distribution by using the full strength of our new concentration inequalities.

Section 5 concerns the Bernoulli–Laplace model of diffusion, originally investigated in the context of cut-off by Diaconis and Shahshahani [6]. In this discrete-time model, there are two urns each containing $n$ balls, with $n$ red and $n$ black balls in total: at each time step, one ball is chosen uniformly at random from each urn and the two are exchanged. The state of the system after $r$ steps is captured by the number $X(r)$ of red balls in the left urn, and one compares the distribution of $X(r)$ with the stationary distribution (which is concentrated around $n/2$ ). Diaconis and Shahshahani prove cut-off for $X(r)$ at time $\frac{1}{4}n\log n$ with window width $n$ . Indeed, their proof establishes cut-off not only for the most distant starting states (where $X(0)=0$ or $n$ ) but on any set $E_{n}(\varepsilon)=\{j:|j-\frac{n}{2}|\geq\varepsilon n\}$ . They also give specific exponential rates for the tail of the distribution of the mixing time. The methods used by Diaconis and Shahshahani are algebraic: we give an alternative proof, using our concentration of measure results. Our proof gives the same exponential rate for the upper tail as in [6], although our proof does not give information about the extreme end of the tail, where the total variation distance between the distribution at time $r$ and the equilibrium distribution is below $n^{-1/2}\log^{2}n$ . Our methods yield a doubly exponential rate for the lower tail, improving on the results of Diaconis and Shahshahani.
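
Because $X(r)$ is a birth-and-death chain on $\{0,\dots,n\}$, its law can be propagated exactly for moderate $n$, which gives a quick numerical picture of the cut-off. The Python sketch below is our own illustration, not the method of [6] or of Section 5. It uses the transition probabilities implied by the description above: $X$ decreases by 1 when a red ball leaves the left urn and a black ball leaves the right urn (probability $(X/n)^{2}$), increases by 1 in the opposite case (probability $((n-X)/n)^{2}$), and otherwise stays put. The stationary law is hypergeometric, $\pi(j)=\binom{n}{j}^{2}/\binom{2n}{n}$, which can be checked via detailed balance.

```python
import math
from typing import List, Tuple


def bl_step_probs(n: int, x: int) -> Tuple[float, float]:
    """(P(x, x-1), P(x, x+1)) for X = number of red balls in the left urn."""
    p_down = (x / n) ** 2        # red ball leaves the left urn, black ball arrives
    p_up = ((n - x) / n) ** 2    # black ball leaves the left urn, red ball arrives
    return p_down, p_up


def bl_stationary(n: int) -> List[float]:
    """Hypergeometric law: pi(j) = C(n, j)^2 / C(2n, n)."""
    total = math.comb(2 * n, n)
    return [math.comb(n, j) ** 2 / total for j in range(n + 1)]


def bl_tv_profile(n: int, x0: int, steps: int) -> List[float]:
    """d_TV(L_{x0}(X(r)), pi) for r = 0, 1, ..., steps, computed exactly."""
    pi = bl_stationary(n)
    dist = [0.0] * (n + 1)
    dist[x0] = 1.0
    profile = []
    for _ in range(steps + 1):
        profile.append(0.5 * sum(abs(p - q) for p, q in zip(dist, pi)))
        new = [0.0] * (n + 1)
        for x, p in enumerate(dist):
            if p == 0.0:
                continue
            down, up = bl_step_probs(n, x)
            new[x] += p * (1.0 - down - up)
            if down:
                new[x - 1] += p * down
            if up:
                new[x + 1] += p * up
        dist = new
    return profile


n = 100
profile = bl_tv_profile(n, x0=0, steps=600)  # start with no red balls on the left
t_n = 0.25 * n * math.log(n)  # predicted cut-off location, about 115 for n = 100
for r in (50, 100, 115, 150, 200, 300):
    print(r, round(profile[r], 4))
```

The profile should stay close to 1 until $r$ is near $\frac14 n\log n$ and then fall over a stretch of order $n$ steps; we do not claim specific printed values here.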

In Section 6, we consider a toy model of a subcritical two-host infection, maintained by immigration of infectives from outside, at rates that are constant multiples of a scale parameter $n$ . Our model is appropriate in circumstances where the number of infectives is small compared to the total population size, and the expected number of infectives of each type of host satisfies a linear equation with a fixed point $n{\mathbf{c}}\in{\mathbb{R}}_{+}^{2}$ . We consider an arbitrary starting state ${\mathbf{x}}$ within an annular region $E_{n}(\zeta)=\{{\mathbf{y}}:n\zeta\leq|{\mathbf{y}}-n{\mathbf{c}}|\leq n/\zeta\}$ , where $\zeta\in(0,1)$ , and we show cut-off at $t_{n}({\mathbf{x}})$ with window width 1 over this region. Here the travel time $t_{n}({\mathbf{x}})$ is bounded between two constants times $\log n$ , but varies over the region $E_{n}(\zeta)$ , for any $\zeta\in(0,1)$ .

In Section 7, we consider the supermarket model. In this $n$-server queueing model, customers arrive according to a Poisson process at rate $\lambda n$, where $\lambda<1$, and inspect $d\geq 1$ queues before joining a shortest queue among these $d$. The service time of each customer is exponential of mean 1. We consider a parameter regime where $\lambda$ tends to 1 as $1-n^{-\alpha}$, and $d$ grows as $n^{\beta}$, where $\alpha$ and $\beta$ are constants satisfying certain inequalities. We choose the precise parameter range so that, as shown by Brightwell, Fairthorne and Luczak [2], the maximum queue length in equilibrium is 2 with high probability, and most queues have length exactly 2. For this model, we study the distribution of the number of empty queues, and show that it is concentrated within order $n^{\frac{1}{2}(1-\beta)}$ of its mean $n^{1-\alpha}$. The application is chosen to illustrate the power of our general results; most transitions of the chain do not affect the number of empty queues, so that our methods give stronger concentration results than we are able to obtain by any other means. The techniques we use will extend readily to other parameter ranges.
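
A small event-driven simulation makes the last point tangible: one can record, for each jump of the chain, whether it changes the number of empty queues. The Python sketch below is a hypothetical illustration with arbitrary small parameters, not the regime $\lambda=1-n^{-\alpha}$, $d=n^{\beta}$ studied in Section 7. We assume that the $d$ queues are sampled uniformly with replacement and that ties among shortest queues are broken by sampling order; both are modelling choices of ours. Since only the sequence of jumps matters here, the sketch simulates the jump chain and does not track holding times.

```python
import random
from typing import List, Tuple


def supermarket_jump_chain(n: int, lam: float, d: int, jumps: int,
                           seed: int = 0) -> Tuple[List[int], float]:
    """Simulate `jumps` transitions of the supermarket model's jump chain.

    Arrivals occur at total rate lam*n; each nonempty queue completes service
    at rate 1. Returns the final queue lengths and the fraction of jumps that
    changed the number of empty queues.
    """
    rng = random.Random(seed)
    q = [0] * n
    changed = 0
    for _ in range(jumps):
        busy = [i for i in range(n) if q[i] > 0]
        arrival_rate = lam * n
        if rng.random() < arrival_rate / (arrival_rate + len(busy)):
            # Arrival: sample d queues (with replacement) and join a shortest one.
            sampled = [rng.randrange(n) for _ in range(d)]
            j = min(sampled, key=lambda i: q[i])
            if q[j] == 0:
                changed += 1
            q[j] += 1
        else:
            # Departure from a uniformly chosen nonempty queue.
            j = rng.choice(busy)
            if q[j] == 1:
                changed += 1
            q[j] -= 1
    return q, changed / jumps


q, frac = supermarket_jump_chain(n=100, lam=0.95, d=10, jumps=20_000)
print(f"fraction of jumps changing the number of empty queues: {frac:.3f}")
```

Comparing this fraction across values of $\lambda$ and $d$ gives a rough sense of how rarely the empty-queue count moves; we do not report specific values here.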

Further consequences of the inequalities in Theorems 2.1 and 3.1 will be explored in future work.

2. Concentration inequalities: discrete time

In this section, we first state and prove a general concentration of measure inequality designed for the analysis of discrete-time Markov chains, generalizing results of Luczak [19]. We then show how to recover a version of a result of Ollivier [26] for contracting chains, which is perhaps more appealing and still fairly widely applicable. Next, we outline how to use the inequality when we have a coupling of two copies of the chain which is “approximately contracting” in the function of interest. Finally, we give a toy example to illustrate the application of the inequalities.

2.1. Main result

Here and throughout, we use ${\mathbb{Z}}_{+}$ to denote the non-negative integers. Let $X=(X(i))_{i\in{\mathbb{Z}}_{+}}$ be a discrete-time Markov chain with a discrete state space ${S}$ and transition probabilities $P(x,y)$ for $x,y\in S$ . We allow $X$ to be lazy; that is, we allow $P(x,x)>0$ for $x\in S$ .

For $x\in S$ , we set

$$N(x) := \{y\in S : P(x,y)>0\}.$$

For $k\in{\mathbb{Z}}_{+}$ and a function $f\colon S\to{\mathbb{R}}$ , define the function $P^{k}f$ by

$$(P^{k}f)(x) := \mathbb{E}_{x}[f(X(k))],\quad x\in S,$$

whenever it exists, where $\operatorname{\mathbb{E}{}}_{x}$ and ${\mathbb{P}}_{x}$ are used to denote conditional expectation and probability given $X(0)=x$ .

Theorem 2.1.

Let $P$ be the transition matrix of a discrete-time Markov chain $(X(i))_{i\in{\mathbb{Z}}_{+}}$ with discrete state space $S$. Let ${\widetilde{S}}$ be a subset of $S$. Let $f\colon S\to{\mathbb{R}}$ be a function such that $(P^{i}f)(x)$ exists for all $x\in S$ and $i\in{\mathbb{Z}}_{+}$, and which satisfies, for all $i\in{\mathbb{Z}}_{+}$,

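$$\bigl|(P^{i}f)(x)-(P^{i}f)(y)\bigr| \leq \beta,\quad x\in\widetilde{S},\ y\in N(x); \tag{2.1}$$
$$\sum_{y\in N(x)}P(x,y)\bigl((P^{i}f)(x)-(P^{i}f)(y)\bigr)^{2} \leq \alpha_{i},\quad x\in\widetilde{S}, \tag{2.2}$$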

where $\beta$ and $(\alpha_{i})_{i\in{\mathbb{Z}}_{+}}$ are positive constants. Set $a_{k}:=\sum_{i=0}^{k-1}\alpha_{i}$ , $k\geq 1$ . Define $A_{k}:=\{X(i)\in{\widetilde{S}}\mbox{ for }0\leq i\leq k-1\}$ , the event that $(X(i))$ stays in ${\widetilde{S}}$ for the first $k-1$ steps. Then, for all $x_{0}\in{\widetilde{S}}$ and all $m\geq 0$ ,

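$$\mathbb{P}_{x_{0}}\Bigl(\bigl\{\bigl|f(X(k))-(P^{k}f)(x_{0})\bigr|\geq m\bigr\}\cap A_{k}\Bigr) \leq 2e^{-m^{2}/(2a_{k}+4\beta m/3)}.$$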

The conditions of the theorem are what is needed to fit into the framework of bounded differences (Bernstein-like) inequalities, and the expression in the assumption on $f$ is, as we shall see, exactly what emerges when we bound conditional variances.
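
The two hypotheses on $f$ (the bound $\beta$ on neighbouring differences and the bound $\alpha_{i}$ in (2.2)) can be checked numerically for a small chain. The Python sketch below is our own illustration: it takes $\widetilde S=S$, stores each row of $P$ as a dictionary over its support (so $N(x)$ is taken to be $\{y: P(x,y)>0\}$, an assumption on our part), iterates $g=P^{i}f$, and records the smallest admissible $\beta$ and $\alpha_{i}$ for $i<k$. As an example it uses the Bernoulli-Laplace chain with $f(x)=x$, for which $(P^{i}f)(x)=n/2+(1-2/n)^{i}(x-n/2)$, so one expects $\beta=1$ and $\alpha_{i}=(1-2/n)^{2i}$.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[int, float]  # y -> P(x, y), restricted to the support N(x)


def apply_P(P: List[Row], g: List[float]) -> List[float]:
    """(Pg)(x) = sum_y P(x, y) g(y)."""
    return [sum(p * g[y] for y, p in row.items()) for row in P]


def theorem21_constants(P: List[Row], f: List[float], k: int) -> Tuple[float, List[float]]:
    """Smallest beta and alpha_0, ..., alpha_{k-1} satisfying the hypotheses
    for i = 0, ..., k-1, with S_tilde = S."""
    g = list(f)  # g = P^i f, starting from i = 0
    beta, alphas = 0.0, []
    for _ in range(k):
        alpha_i = 0.0
        for x, row in enumerate(P):
            s = 0.0
            for y, p in row.items():
                diff = g[x] - g[y]
                beta = max(beta, abs(diff))
                s += p * diff * diff
            alpha_i = max(alpha_i, s)
        alphas.append(alpha_i)
        g = apply_P(P, g)
    return beta, alphas


def bernoulli_laplace_rows(n: int) -> List[Row]:
    rows = []
    for x in range(n + 1):
        down, up = (x / n) ** 2, ((n - x) / n) ** 2
        moves = ((x - 1, down), (x + 1, up), (x, 1.0 - down - up))
        rows.append({y: p for y, p in moves if p > 0})
    return rows


n, k = 20, 50
beta, alphas = theorem21_constants(bernoulli_laplace_rows(n), list(range(n + 1)), k)
a_k = sum(alphas)
m = 5.0
print(beta, a_k, 2 * math.exp(-m * m / (2 * a_k + 4 * beta * m / 3)))
```

The last printed value is the right-hand side of the theorem's bound at $m=5$ with these constants; it is an upper bound, not an estimate of the probability. Here $a_{k}$ stays below $n^{2}/(4(n-1))$ for every $k$, so the bound is uniform in time.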

Evidently $a_{k}$ increases with $k$ . Under a contractivity assumption, as we shall see shortly, the $\alpha_{i}$ can be taken to tend to 0 exponentially, so that the $a_{k}$ are uniformly bounded: this means that we have a concentration of measure bound that is uniform in $k$ . The result can also be applied in circumstances where the $\alpha_{i}$ either converge more slowly to 0, or increase not too rapidly: in these cases, we obtain tighter concentration of $f(X(k))$ for smaller values of $k$ .
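
One way to see why contractivity makes the $\alpha_{i}$ decay geometrically is the following short derivation. It is our own sketch rather than a result stated in the paper. It assumes that $f$ is $L$-Lipschitz with respect to a metric $d$, that $W_{d}(\mathcal{L}_{x}(X(1)),\mathcal{L}_{y}(X(1)))\leq(1-\rho)\,d(x,y)$ for all $x,y\in S$ with $0<\rho\leq 1$, and that $\sum_{y\in N(x)}P(x,y)\,d(x,y)^{2}\leq D_{2}$ for all $x$. The first line follows from Kantorovich-Rubinstein duality, which gives $|(Pg)(x)-(Pg)(y)|\leq \mathrm{Lip}(g)\,W_{d}(\mathcal{L}_{x}(X(1)),\mathcal{L}_{y}(X(1)))$, applied $i$ times.

```latex
\begin{align*}
\bigl|(P^{i}f)(x)-(P^{i}f)(y)\bigr| &\le L(1-\rho)^{i}\,d(x,y), \qquad x,y\in S,\\
\sum_{y\in N(x)}P(x,y)\bigl((P^{i}f)(x)-(P^{i}f)(y)\bigr)^{2}
  &\le L^{2}(1-\rho)^{2i}D_{2} \;=:\; \alpha_{i},\\
a_{k}=\sum_{i=0}^{k-1}\alpha_{i}
  &\le \frac{L^{2}D_{2}}{1-(1-\rho)^{2}} \;=\; \frac{L^{2}D_{2}}{\rho(2-\rho)}, \qquad k\ge 1 .
\end{align*}
```

If, in addition, $d(x,y)\leq 1$ whenever $y\in N(x)$, the first line shows that one can take $\beta=L$.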

Theorem 2.1 improves on Theorem 4.5 of Luczak [19] by using (2.2) to define $\alpha_{i}$ , instead of the cruder bound

$$L^{2}\sum_{y\in N(x)}P(x,y)\,W\bigl(\mathcal{L}_{x}(X(i)),\mathcal{L}_{y}(X(i))\bigr)^{2},$$

where $f$ is assumed to be a Lipschitz function with Lipschitz constant $L$ , and $W$ denotes the Wasserstein distance (both defined with respect to the same metric on the state space $S$ ). This is particularly important in contexts in which $f(X(i))$ evolves significantly more slowly than $X(i)$ itself, because many of the transitions of $X$ do not change the value of $f$ . An example where this is relevant is the supermarket model, discussed in the final section of this paper, as well as the alternative routing model of Gibbens, Hunt and Kelly [12] and its generalisation, as studied in Brightwell and Luczak [3]. (These particular examples are set up as continuous-time Markov chains, for which our companion inequality, Theorem 3.1, is more naturally applicable, though it is also natural to consider their discrete-time analogues.) Theorem 2.1 also improves on Theorem 2.3 of Luczak [20], by weakening and simplifying its hypotheses.

In the case where the hypotheses of Theorem 2.1 are satisfied with ${\widetilde{S}}=S$ , we can immediately derive a bound on the variance of $f(X(k))$ , valid for any fixed starting state $x_{0}$ . Indeed, we have

[TABLE]

2.2. Proof of Theorem 2.1

To prove Theorem 2.1, we use a slight extension of a result of McDiarmid [23]. Inequality (2.4) in Lemma 2.2 below is a ‘two-sided’ version of inequality (3.28) in Theorem 3.15 of McDiarmid [23]; inequality (2.5) is a slight extension of inequality (3.29) of McDiarmid [23], in that we work with a non-deterministic bound on $|Z_{i}-Z_{i-1}|$ , and is also two-sided.

For a square integrable random variable $Y$ and a $\sigma$ -field $\mathcal{G}\subseteq\mathcal{F}$ , we use $\operatorname{\mathrm{v}ar}(Y\mid\mathcal{G})$ to denote the conditional variance of $Y$ given $\mathcal{G}$ .

Lemma 2.2.

Let $(\Omega,\mathcal{F},{\mathbb{P}})$ be a probability space equipped with a filtration $\{\emptyset,\Omega\}=\mathcal{F}_{0}\subseteq\mathcal{F}_{1}\subseteq\cdots\subseteq\mathcal{F}_{k}$ in $\mathcal{F}$ . Let $Z$ be an $\mathcal{F}_{k}$ -measurable random variable with $\operatorname{\mathbb{E}{}}Z=\mu$ , and let $Z_{i}=\operatorname{\mathbb{E}{}}(Z\mid\mathcal{F}_{i})$ , for $i=0,\dots,k$ . Let $\gamma$ and $\delta$ be constants such that $\sum_{i=1}^{k}\operatorname{\mathrm{v}ar}(Z_{i}\mid\mathcal{F}_{i-1})(\omega)\leq\delta$ a.s. and $|Z_{i}(\omega)-Z_{i-1}(\omega)|\leq\gamma$ a.s. for all $i=1,\dots,k$ . Then for any $m\geq 0$ ,

[TABLE]

More generally, the following holds. For $\delta,\gamma\geq 0$ , let

[TABLE]

For any $m\geq 0$ and any values $\delta,\gamma\geq 0$ ,

[TABLE]

The proof is that of Theorem 3.15 (inequalities (3.28) and (3.29)) in McDiarmid [23], except that we use the indicator of the event $A(\delta,\gamma)$ instead of the indicator of the event $\Bigl{\{}\sum_{i=1}^{k}\operatorname{\mathrm{v}ar}(Z_{i}\mid\mathcal{F}_{i-1})\leq\delta\Bigr{\}}$ . The argument resembles a stopping argument, but avoids some of its technicalities.

Proof.

Following McDiarmid [23], we use Lemma 3.16 of [23], which states the following. If $(Y_{i})$ is a martingale difference sequence with respect to a filtration $(\mathcal{F}_{i})$ , where each $Y_{i}$ is bounded above, if $I$ is an indicator random variable, and if $h$ is a real number, then

[TABLE]

(The statement in [23] involves the supremum instead of the essential supremum: the notionally stronger version is obtained by changing the $Y_{i}$ on a set of measure 0. The proof is fairly straightforward by induction over a single-step inequality.)

Now, for any random variable $X$ such that $X\leq b$ and $\operatorname{\mathbb{E}{}}X=0$ , we have $\operatorname{\mathbb{E}{}}(e^{X})\leq e^{g(b)\operatorname{\mathrm{v}ar}X}$ , where $g(x):=(e^{x}-1-x)/x^{2}$ (see Lemma 2.8 in McDiarmid [23]). So, for any $h$ , defining the (possibly infinite) $\mathcal{F}_{i-1}$ -measurable random variables $\operatorname{\mathrm{v}ar}_{i}:=\operatorname{\mathrm{v}ar}(Z_{i}\mid\mathcal{F}_{i-1})$ and $\operatorname{\mathrm{d}ev}_{i}^{+}:=\operatorname*{ess\,sup}(Z_{i}-Z_{i-1}\mid\mathcal{F}_{i-1})$ , we have

[TABLE]

Let $I$ be the indicator of the event $A(\delta,\gamma)$ . It then follows that

[TABLE]

Hence

[TABLE]

Optimising in $h$ , we set $h=\frac{1}{\gamma}\log(1+\frac{m\gamma}{\delta})$ and use the inequality $(1+x)\log(1+x)-x\geq x^{2}/(2+2x/3)$ for $x\geq 0$ , as in the proof of Theorem 2.7 in McDiarmid [23].

We obtain that

[TABLE]

The same proof gives the same upper bound on ${\mathbb{P}}(\{Z-\mu\leq-m\}\cap A(\delta,\gamma))$ , and the result follows. ∎
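The two analytic facts used in this proof can be checked numerically. The sketch below assumes, as is standard for Bernstein-type bounds, that before optimisation the argument yields $\exp(-hm+g(h\gamma)h^{2}\delta)$ as the bound on ${\mathbb{P}}(\{Z-\mu\geq m\}\cap A(\delta,\gamma))$; all function names are hypothetical. It tests Lemma 2.8 of McDiarmid [23] on mean-zero two-point laws, confirms that the choice $h=\frac{1}{\gamma}\log(1+\frac{m\gamma}{\delta})$ minimises the exponent, and checks the final simplification to $m^{2}/(2\delta+2\gamma m/3)$.

```python
import math


def g(x: float) -> float:
    """g(x) = (e^x - 1 - x) / x^2, with its continuous value 1/2 at x = 0."""
    if abs(x) < 1e-8:
        return 0.5
    return (math.expm1(x) - x) / (x * x)


def two_point(b: float, p: float) -> tuple:
    """Mean-zero X equal to b w.p. p, else -b*p/(1-p); returns (E e^X, Var X)."""
    a = -b * p / (1 - p)
    mgf = p * math.exp(b) + (1 - p) * math.exp(a)
    var = p * b * b + (1 - p) * a * a
    return mgf, var


def log_bound(h: float, m: float, delta: float, gamma: float) -> float:
    """Log of exp(-h m + g(h gamma) h^2 delta), the bound before optimising in h."""
    return -h * m + g(h * gamma) * h * h * delta


def optimal_h(m: float, delta: float, gamma: float) -> float:
    """The choice h = (1/gamma) log(1 + m gamma / delta) made in the proof."""
    return math.log1p(m * gamma / delta) / gamma


def bernstein_exponent(m: float, delta: float, gamma: float) -> float:
    """Simplified exponent m^2 / (2 delta + 2 gamma m / 3)."""
    return m * m / (2 * delta + 2 * gamma * m / 3)


if __name__ == "__main__":
    m, delta, gamma = 3.0, 2.0, 0.5
    h = optimal_h(m, delta, gamma)
    print(log_bound(h, m, delta, gamma), -bernstein_exponent(m, delta, gamma))
```

Since $-hm+(\delta/\gamma^{2})(e^{h\gamma}-1-h\gamma)$ is convex in $h$, perturbing the optimal $h$ in either direction can only increase it, which is what the check exploits.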

Proof of Theorem 2.1. We start by assuming that ${\widetilde{S}}=S$ . Let $({\mathcal{F}}_{i})$ denote the natural filtration of $(X(i))_{i\in{\mathbb{Z}}_{+}}$ . We fix a function $f\colon S\rightarrow{\mathbb{R}}$ , a natural number $k$ , and an initial state $x_{0}\in S$ . We consider the evolution of $(X(i))_{i\in{\mathbb{Z}}_{+}}$ for $k$ steps, conditional on $X(0)=x_{0}$ . Define the random variable $Z:=f(X(k))$ . Then, for $i=0,\ldots,k$ , $Z_{i}$ is given by

[TABLE]

To apply Lemma 2.2, we need to bound the conditional variances $\operatorname{\mathrm{v}ar}(Z_{i}\mid\mathcal{F}_{i-1})$ , for $1\leq i\leq k$ . Conditional on the event $X(i-1)=x_{i-1}$ , $Z_{i}$ takes the value $(P^{k-i}f)(x)$ with probability $P(x_{i-1},x)$ . Since $\operatorname{\mathrm{v}ar}Z\leq\operatorname{\mathbb{E}{}}\{(Z-c)^{2}\}$ for any $c\in{\mathbb{R}}$ , it follows that

[TABLE]

with $c_{i-1}:=(P^{k-i}f)(x_{i-1})$ . Using Assumption (2.2), this yields

[TABLE]

uniformly in $x_{i-1}\in S$ . It thus follows that

[TABLE]

so we set $\delta=a_{k}$ .

We also need a uniform upper bound on $|Z_{i}-Z_{i-1}|$ . We note that

[TABLE]

Note that, from Assumption (2.1), if $y,z\in N(x)$ for some $x\in S$ , then

[TABLE]

It then follows from (2.8) that, on the event $\{X(i-1)=x_{i-1}\}$ ,

[TABLE]

uniformly in $x_{i-1}\in S$ , since, in the last sum, both $X(i)$ and $z$ belong to $N(x_{i-1})$ . Accordingly, we take $\gamma=2\beta$ .

Theorem 2.1 now follows from inequality (2.4) in Lemma 2.2, in the case where ${\widetilde{S}}=S$ .

In general, for each $i$ , (2.7) and (2.9) hold if $x_{i-1}\in{\widetilde{S}}$ , and so all the above bounds hold on the event $A_{k}=\{X(i)\in{\widetilde{S}}\mbox{ for }i=0,\ldots,k-1\}$ . Thus $A_{k}\subseteq A(\delta,\gamma)$ , as defined in Lemma 2.2, and the full statement of Theorem 2.1 follows from inequality (2.5) in Lemma 2.2. $\square$
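The variance bookkeeping in this proof can be reproduced exactly on a small chain. This sketch takes Assumption (2.2) to be $\sum_{y}P(x,y)\{(P^{i}f)(y)-(P^{i}f)(x)\}^{2}\leq\alpha_{i}$ for all $x$ (our reading of it); the lazy walk on $\{0,\dots,n\}$ and $f(x)=x$ are arbitrary illustrative choices, and the helper names are hypothetical. For each $i$ it computes the worst-case conditional variance of $Z_{i}=(P^{k-i}f)(X(i))$ given $X(i-1)$, compares it with $\alpha_{k-i}$, and checks that the exact variance of $f(X(k))$ is at most $a_{k}$.

```python
def lazy_walk(n):
    """Lazy walk on {0,...,n}: up/down w.p. 1/4 each where possible, hold otherwise."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        if x > 0:
            P[x][x - 1] = 0.25
        if x < n:
            P[x][x + 1] = 0.25
        P[x][x] = 1.0 - sum(P[x])
    return P


def apply(P, h):
    """(P h)(x) = sum_y P(x, y) h(y)."""
    return [sum(p * hy for p, hy in zip(row, h)) for row in P]


def alpha(P, h):
    """max_x sum_y P(x, y) (h(y) - h(x))^2; with h = P^i f this is the assumed alpha_i."""
    return max(sum(p * (hy - h[x]) ** 2 for p, hy in zip(row, h)) for x, row in enumerate(P))


def worst_conditional_variance(P, h):
    """max_x Var(h(X(1)) | X(0) = x); with h = P^(k-i) f this bounds Var(Z_i | F_{i-1})."""
    worst = 0.0
    for row in P:
        mean = sum(p * hy for p, hy in zip(row, h))
        worst = max(worst, sum(p * (hy - mean) ** 2 for p, hy in zip(row, h)))
    return worst


def exact_variance(P, f, x0, k):
    """Var f(X(k)) for the chain started at x0, from the exact law of X(k)."""
    mu = [1.0 if x == x0 else 0.0 for x in range(len(P))]
    for _ in range(k):
        mu = [sum(mu[x] * P[x][y] for x in range(len(P))) for y in range(len(P))]
    mean = sum(m * fx for m, fx in zip(mu, f))
    return sum(m * (fx - mean) ** 2 for m, fx in zip(mu, f))


n, k, x0 = 6, 12, 0
P = lazy_walk(n)
f = [float(x) for x in range(n + 1)]
Pf = [f]                                     # Pf[j] = P^j f
for _ in range(k):
    Pf.append(apply(P, Pf[-1]))
a_k = sum(alpha(P, Pf[j]) for j in range(k))  # a_k = alpha_0 + ... + alpha_{k-1}
cond = [worst_conditional_variance(P, Pf[k - i]) for i in range(1, k + 1)]
print(sum(cond), exact_variance(P, f, x0, k), a_k)
```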

2.3. Contracting chains

We next show how to use Theorem 2.1 to recover a version of Ollivier’s results on chains with positive coarse Ricci curvature.

Let $d(\cdot,\cdot)$ be a metric on the state space $S$ of a discrete-time Markov chain $X=(X(i))_{i\geq 0}$ . A Markovian coupling $(X^{(1)},X^{(2)})$ of two copies of the chain is contracting with respect to the metric if, for some positive constant $\rho$ and for all $x,y\in S$ ,

[TABLE]

If condition (2.10) holds for all $x,y$ in some subset ${\widetilde{S}}$ of $S$ , then we say that the coupling is contracting on ${\widetilde{S}}$ .

The existence of a coupling satisfying (2.10) for all pairs of states is equivalent to the inequality

[TABLE]

where $W_{d}$ denotes the Wasserstein distance between two measures with respect to the metric $d$ on a space $S$ : $W_{d}(\mu,\nu)$ is the infimum of $\operatorname{\mathbb{E}{}}d(X,Y)$ over all pairs $(X,Y)$ of $S$ -valued random variables, with ${\mathcal{L}}(X)=\mu$ and ${\mathcal{L}}(Y)=\nu$ . Ollivier [26] defines a Markov chain to have coarse Ricci curvature at least $\rho$ if (2.11) holds: we prefer to say that the Markov chain is contracting in Wasserstein distance.

In the case where $d$ is a graph distance – i.e., $d(x,y)$ is the length of a shortest path in a graph between vertices $x$ and $y$ – inequality (2.11) is equivalent to

[TABLE]

where $\sim$ denotes adjacency in the graph. Gheissari, Lubetzky and Peres [11] call a chain satisfying (2.12) $(1-\rho)$ -contracting. We prefer to use the term contracting in Wasserstein distance to avoid confusion with the concept of contraction introduced by Marton [22], which is contraction in total variation distance.

For a Markov chain that is contracting in Wasserstein distance with respect to a metric $d$ , we now prove concentration of measure for any real-valued function $f$ on the state space that is Lipschitz with respect to $d$ . Part (a) of the theorem below applies when the Markov chain is contracting on the entire state space; part (b) is for when the contraction is only on some “good set”.

For an event $A$ , we let $\overline{A}$ denote its complement.

Theorem 2.3.

Let $X$ be a discrete-time chain on a discrete state space $S$ with transition matrix $P$ . Suppose that $d(\cdot,\cdot)$ is a metric on $S$ , and let $f\colon S\to{\mathbb{R}}$ be a function such that, for some constant $L$ , $|f(x)-f(y)|\leq Ld(x,y)$ for all $x,y\in S$ . Suppose also that $D$ is a positive constant such that $d(x,y)\leq D$ whenever $P(x,y)>0$ .

(a) If $X$ is contracting in Wasserstein distance, with constant $\rho$ , and $D_{2}$ is a constant such that, for all $x\in S$ ,

[TABLE]

then, for all $x\in S$ , $m\geq 0$ , and $k\in{\mathbb{N}}$ ,

[TABLE]

(b) More generally, suppose that $X$ is contracting in Wasserstein distance on a subset ${\widehat{S}}$ of $S$ , with constant $\rho$ , and let ${\widetilde{S}}$ be a further subset of $S$ such that ${\widetilde{S}}^{+}:={\widetilde{S}}\cup\bigcup_{x\in{\widetilde{S}}}N(x)\subseteq{\widehat{S}}$ . Suppose that (2.13) holds for all $x\in{\widetilde{S}}$ . For $k$ a positive integer, let $A_{k}=\{X(j)\in{\widetilde{S}}\mbox{ for }0\leq j\leq k-1\}$ , and define

[TABLE]

Then, for all $x\in{\widetilde{S}}$ and $m\geq 0$ ,

[TABLE]

Note that we may always take $D_{2}=D^{2}$ , but sometimes it is possible to take $D_{2}$ significantly smaller. In part (b), we would expect to be able to choose the various sets so that $e_{k}$ is very small. In order to apply part (b) effectively, one would need to know that $\operatorname{\mathbb{P}{}}(\overline{A_{k}})$ is small, and this will not be true if the starting state is “close to the boundary” of ${\widetilde{S}}$ : a natural approach is to have three nested sets of states $S^{*}\subset{\widetilde{S}}\subset\widehat{S}$ , with the starting state restricted to $S^{*}$ , and with the probability of escaping from one set to the next over the time interval of interest being small; then we obtain concentration of measure over that time interval, uniformly over starting states in $S^{*}$ .

Proof.

For part (a), we apply Theorem 2.1 with ${\widetilde{S}}=S$ . For states $x$ and $y$ with $y\in N(x)$ , let $(X^{(1)}(i))$ and $(X^{(2)}(i))$ be copies of the chain with $X^{(1)}(0)=x$ and $X^{(2)}(0)=y$ , coupled so that $\operatorname{\mathbb{E}{}}[d(X^{(1)}(i),X^{(2)}(i))]\leq d(x,y)(1-\rho)^{i}$ for each $i\in{\mathbb{Z}}^{+}$ . Then we have

[TABLE]

whenever $y\in N(x)$ and $i\in{\mathbb{Z}}^{+}$ . Thus we may take $\beta=LD$ in (2.1) and $\alpha_{i}=(1-\rho)^{2i}L^{2}D_{2}$ in (2.2) for each $i\in{\mathbb{Z}}^{+}$ . Since then $a_{k}\leq L^{2}D_{2}/(2\rho-\rho^{2})$ for all $k\geq 1$ , the inequality follows.

For part (b), our plan is to apply Theorem 2.1 to the “inner” set ${\widetilde{S}}$ , so we need bounds on $|(P^{i}f)(x)-(P^{i}f)(y)|$ valid whenever $x\in{\widetilde{S}}$ and $y\in N(x)\subseteq{\widetilde{S}}^{+}$ . Accordingly, we fix such a pair $(x,y)$ , and $k\in{\mathbb{N}}$ . We now consider two copies $(X^{(1)}(i))$ and $(X^{(2)}(i))$ of the chain, with $X^{(1)}(0)=x$ and $X^{(2)}(0)=y$ , with a contractive coupling on $\widehat{S}$ with constant $\rho$ . For $i\geq 1$ , let $B_{i}$ be the event that both copies of the chain are in $\widehat{S}$ for all $j<i$ , and note that $\operatorname{\mathbb{P}{}}(\overline{B_{i}})\leq 2e_{i}$ . We claim that, for each $i$ ,

[TABLE]

This is true for $i=0$ . If the inequality is true for $i-1$ , then

[TABLE]

as claimed. As each step of either chain increases the distance between them by at most $D$ , we also have the bound

[TABLE]

for $i\geq 1$ and also for $i=0$ , and therefore

[TABLE]

Hence we have

[TABLE]

whenever $x\in{\widetilde{S}}$ and $y\in N(x)$ . Additionally we have that

[TABLE]

for all $x\in{\widetilde{S}}$ . Thus we can apply Theorem 2.1 with $\beta=LD(1+6ke_{k})$ , for $k\geq 1$ , and $\alpha_{i}=2L^{2}((1-\rho)^{2i}D_{2}+36i^{2}D^{2}e_{i}^{2})$ for each $i$ . Since then $a_{k}\leq 2L^{2}(D_{2}/\rho+12k^{3}D^{2}e_{k}^{2})$ , the inequality follows. ∎
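For the record, the bound on $a_{k}$ in part (b) follows from the elementary estimates below. They assume (our reading of the definition of $e_{i}$) that $e_{i}$ is non-decreasing in $i$, and use $2\rho-\rho^{2}\geq\rho$ for $0<\rho\leq 1$ together with $\sum_{i=0}^{k-1}i^{2}=(k-1)k(2k-1)/6$:

```latex
\begin{align*}
a_k &= \sum_{i=0}^{k-1} 2L^2\bigl((1-\rho)^{2i}D_2 + 36 i^2 D^2 e_i^2\bigr)\\
    &\le 2L^2\Bigl(\frac{D_2}{1-(1-\rho)^2} + 36 D^2 e_k^2 \sum_{i=0}^{k-1} i^2\Bigr)\\
    &= 2L^2\Bigl(\frac{D_2}{2\rho-\rho^2} + 6 D^2 e_k^2\,(k-1)k(2k-1)\Bigr)\\
    &\le 2L^2\Bigl(\frac{D_2}{\rho} + 12 k^3 D^2 e_k^2\Bigr).
\end{align*}
```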

Both parts of Theorem 2.3 follow directly, with essentially the same proof as here, from Theorem 4.5 of Luczak [19]. Part (a) of the result is also very similar to Theorem 33 of Ollivier [26]. Ollivier’s result is for the equilibrium distribution, although he notes in Remark 39 that a similar result can be obtained for the finite-time distributions. Ollivier’s bounds are stated in terms of a quantity called the coarse diffusion constant $\sigma(x)$ , at a state $x$ , which is closely related to our $D_{2}$ , and a quantity called the local dimension $n_{x}$ , that is of constant order in most applications with discrete state spaces. Our proof of Theorem 2.1 could be reworked to use the coarse diffusion constant directly (when bounding the conditional variances, we could instead use that $\operatorname{\mathrm{v}ar}(Z)=\frac{1}{2}\operatorname{\mathbb{E}{}}(Z_{1}-Z_{2})^{2}$ , where $Z_{1}$ and $Z_{2}$ are independent copies of $Z$ – see the proof of Lemma 4.6 in [19]). The conclusion of our result translates to essentially the same as Ollivier’s, with different constants. The concentration result is of the “Gaussian-then-exponential” type.
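To see the size of the constant in part (a) on a concrete chain (an illustration of our own, not taken from the text), consider the lazy random walk on $\{0,1\}^{n}$ that picks a coordinate uniformly and resets it to a fair random bit, with Hamming distance $d$ and $f(x)$ the number of ones. Using the same coordinate and the same bit in both copies gives a coupling that is contracting with $\rho=1/n$, and $L=D=1$. Assuming (2.13) bounds $\sum_{y}P(x,y)d(x,y)^{2}$, we may take $D_{2}=1/2$. Summing the conditional variances of the martingale $(Z_{i})$ gives $\operatorname{\mathrm{v}ar}f(X(k))\leq a_{k}\leq L^{2}D_{2}/(2\rho-\rho^{2})=n^{2}/(2(2n-1))$, only slightly above the equilibrium variance $n/4$. The sketch computes the exact law of $f(X(k))$, a birth-and-death chain on $\{0,\dots,n\}$, and checks this.

```python
def ones_law(n, k):
    """Exact law of the number of ones after k steps, started from 00...0.

    From j ones: up w.p. (n-j)/(2n) (a zero is picked and set to 1),
    down w.p. j/(2n) (a one is picked and set to 0), otherwise unchanged.
    """
    mu = [0.0] * (n + 1)
    mu[0] = 1.0
    for _ in range(k):
        new = [0.0] * (n + 1)
        for j, pj in enumerate(mu):
            up, down = (n - j) / (2 * n), j / (2 * n)
            new[j] += pj * (1.0 - up - down)
            if j < n:
                new[j + 1] += pj * up
            if j > 0:
                new[j - 1] += pj * down
        mu = new
    return mu


def variance(mu):
    """Variance of a law on {0,...,n} given as a list of probabilities."""
    mean = sum(j * p for j, p in enumerate(mu))
    return sum((j - mean) ** 2 * p for j, p in enumerate(mu))


def part_a_bound(n):
    """L^2 D_2 / (2 rho - rho^2) with L = 1, D_2 = 1/2, rho = 1/n."""
    rho = 1.0 / n
    return 0.5 / (2 * rho - rho * rho)


n = 10
for k in (1, 5, 20, 100):
    print(k, variance(ones_law(n, k)), part_a_bound(n))
```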

2.4. Approximately $f$ -contracting chains

We next illustrate how Theorem 2.1 can be applied in other settings, without even a metric on the state space. One can obtain a result by analysing the direct effect a coupling has on the function $f$ of interest, if the coupling is “approximately $f$ -contracting”, as we now describe. As before, let $(X^{(1)})$ and $(X^{(2)})$ be two coupled copies of the Markov chain, and let $f:S\to{\mathbb{R}}$ be any function. Suppose that $\sum_{y\in N(x)}P(x,y)|f(x)-f(y)|^{2}\leq F^{2}$ for any $x\in S$ , and that, for all states $x,y\in S$ ,

[TABLE]

for some constant $\rho>0$ , and some “error function” $\varepsilon$ . (An example where there is a need for such an error function is in Lemma 3.1 of [3].)

An induction argument then gives that, for all $x,y\in S$ and every $k\in{\mathbb{N}}$ ,

[TABLE]

where

[TABLE]

A convenient assumption, which is satisfied in the example from [3], is that $\operatorname{\mathbb{E}{}}_{x,y}[\varepsilon(X^{(1)}_{i},X^{(2)}_{i})]\leq\varepsilon_{0}(1-\rho)^{i}$ , for all $i$ and all $x,y\in S$ with $y\in N(x)$ , so that $\eta_{k}(x,y)\leq\varepsilon_{0}(k+1)(1-\rho)^{k}$ for each $k$ and each $x$ and $y$ with $y\in N(x)$ . It follows in this case that, for $x\in S$ and every $i\in{\mathbb{N}}$ ,

[TABLE]

So we may take $\alpha_{i}=2(F^{2}+\varepsilon_{0}^{2}(i+1)^{2})(1-\rho)^{2i}$ in Theorem 2.1, and hence $a_{k}\leq a:=2F^{2}/\rho+4\varepsilon_{0}^{2}/\rho^{3}$ for all $k$ . Also we may take $\beta=G+\varepsilon_{0}$ , where $G$ is a uniform bound on $|f(x)-f(y)|$ for all $x\in S$ and $y\in N(x)$ . Applying Theorem 2.1 with these constants (and $a$ in place of $a_{k}$ ) then gives a concentration inequality valid for all $x\in S$ and all $m\geq 0$ :

[TABLE]
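The uniform bound on $a_{k}$ can be checked numerically. With $r=(1-\rho)^{2}$ one has $\sum_{i\geq 0}(i+1)^{2}r^{i}=(1+r)/(1-r)^{3}$, and for $0<\rho\leq 1$ we have $1-r=2\rho-\rho^{2}\geq\rho$ and $1+r\leq 2$, which gives $a_{k}\leq 2F^{2}/\rho+4\varepsilon_{0}^{2}/\rho^{3}$. The parameter values in this sketch are arbitrary.

```python
def a_k(F, eps0, rho, k):
    """a_k = sum_{i<k} alpha_i with alpha_i = 2 (F^2 + eps0^2 (i+1)^2) (1-rho)^(2i)."""
    r = (1.0 - rho) ** 2
    return sum(2.0 * (F**2 + eps0**2 * (i + 1) ** 2) * r**i for i in range(k))


def a_limit(F, eps0, rho):
    """lim a_k, via sum_i r^i = 1/(1-r) and sum_i (i+1)^2 r^i = (1+r)/(1-r)^3."""
    r = (1.0 - rho) ** 2
    return 2.0 * F**2 / (1 - r) + 2.0 * eps0**2 * (1 + r) / (1 - r) ** 3


def a_bound(F, eps0, rho):
    """The uniform bound 2 F^2 / rho + 4 eps0^2 / rho^3."""
    return 2.0 * F**2 / rho + 4.0 * eps0**2 / rho**3


for rho in (0.05, 0.3, 0.9):
    print(rho, a_k(1.0, 0.2, rho, 5000), a_limit(1.0, 0.2, rho), a_bound(1.0, 0.2, rho))
```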

2.5. A toy example

Many of the chains we might be interested in have stationary distributions, and under suitable conditions our results on long-term concentration of measure imply concentration of measure in equilibrium. This is explored in Corollary 4.2 of Luczak [19], giving circumstances where the chain is guaranteed to have a stationary distribution, and where concentration results carry over to equilibrium. The main focus of the paper of Ollivier [26] is also concentration of measure in equilibrium. In the example in Section 7 of this paper, we use facts from elsewhere about the equilibrium distribution, as well as our long-term concentration results, to prove concentration of measure of a suitable function in equilibrium.

We finish this section with a very simple class of examples, illustrating very different circumstances when our results can be applied. These examples have no stationary distributions, and our results can be applied to show concentration of measure within a window whose width may be constant, or may increase with time.

Consider the discrete-time chain $X(k)$ with state space ${\mathbb{Z}}_{+}$ , $X(0)=0$ , and transition probabilities $p(i,i)=p(i,i+1)=1/2$ . This is thus a pure-birth chain, stepping up with probability 1/2 at each time. We also consider a function $f:{\mathbb{Z}}_{+}\to{\mathbb{R}}$ , and we are interested in the long-term behaviour of $f(X(k))$ . Of course, this is easy to analyse directly since $X(k)$ has a Binomial distribution with parameters $(k,1/2)$ . If, for example, $f(x)=x^{r}$ for some constant $r\in(0,1]$ , then $f(X(k))$ is concentrated within a window of width $ck^{r-1/2}$ around $(k/2)^{r}$ .
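The window width quoted here can be confirmed by exact summation over the Binomial $(k,1/2)$ law: by the delta method, the standard deviation of $X(k)^{r}$ should be close to $r(k/2)^{r-1}\cdot\sqrt{k}/2=r2^{-r}k^{r-1/2}$. A small sketch (the function name is hypothetical):

```python
import math


def power_moments(k: int, r: float) -> tuple:
    """Exact mean and standard deviation of X**r, X ~ Binomial(k, 1/2)."""
    total = 2 ** k
    probs = [math.comb(k, j) / total for j in range(k + 1)]  # exact big-int ratio, then float
    values = [j ** r for j in range(k + 1)]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, math.sqrt(var)


for r in (0.25, 0.75, 1.0):
    for k in (100, 1000):
        mean, sd = power_moments(k, r)
        # mean / (k/2)^r should be near 1; sd / k^(r - 1/2) near r * 2^(-r)
        print(r, k, mean / (k / 2) ** r, sd / k ** (r - 0.5), r * 2 ** (-r))
```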

We start by explaining why the hypotheses of Theorem 2.3 are too restrictive to encompass these examples. Consider a coupling of two copies of the chain, so that at each step either both copies move up, or neither moves up. (Choosing a different coupling would not make any difference.) Suppose that this coupling is contracting, with constant $\rho>0$ , with respect to some metric $d$ on ${\mathbb{Z}}_{+}$ . Then we have

[TABLE]

for each pair $(i,j)$ , which amounts to $d(i+1,i)\leq(1-2\rho)d(i,i-1)$ for each $i\geq 1$ . If the function $f$ is Lipschitz with respect to $d$ , with constant $L$ , then $|f(i+1)-f(i)|\leq L(1-2\rho)^{i}d(1,0)$ . This condition can only be satisfied if $(f(i))$ converges to a limit $f_{\infty}$ , and moreover $|f(i)-f_{\infty}|\leq C(1-2\rho)^{i}$ for some constant $C$ . In particular, none of the functions $f(x)=x^{r}$ satisfy the hypotheses, even though a time-independent concentration result does hold when $r\leq 1/2$ .

We now show how to apply our more general result, Theorem 2.1, to the class of functions $f(x)=x^{r}$ , with $0<r\leq 1$ . We note that $f(x+1)-f(x)$ is non-increasing in $x$ , and that $\operatorname{\mathbb{P}{}}(X(i)\leq i/3)\leq e^{-i/36}$ from the Chernoff bound. Then we have, for any $x$ , and $i$ sufficiently large,

[TABLE]

Hence we may take $\alpha_{i}=2(i/3)^{2r-2}$ for large enough $i$ , and then $a_{k}=\sum_{i=0}^{k-1}\alpha_{i}$ is at most a constant $C(r)$ for $r\in(0,1/2)$ , and at most $C(r)k^{2r-1}$ for $r>1/2$ . We may also take $\beta=1$ . For $r<1/2$ , applying Theorem 2.1 with ${\widetilde{S}}$ equal to the entire state space ${\mathbb{Z}}_{+}$ gives a uniform bound on the concentration:

[TABLE]

for all $k$ , showing that $f(X(k))$ remains concentrated within a window of constant width around its mean for all $k$ . Of course, this is still far from a sharp result. For $r>1/2$ , we obtain that

[TABLE]

so that $f(X(k))$ is concentrated within $O(k^{r-1/2})$ of its expectation, which in this case is the correct order of magnitude.
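Both ingredients of the estimates for $f(x)=x^{r}$ can be checked directly: the Chernoff bound by exact binomial sums, and the growth of $a_{k}$ from the large- $i$ form $\alpha_{i}=2(i/3)^{2r-2}$. The sketch ignores the finitely many small $i$ for which that form is not asserted (they shift $a_{k}$ only by a constant); the cut-offs and function names are arbitrary choices of ours. For $r=3/4$ the partial sums behave like $2\sqrt{3}\cdot 2\sqrt{k}$, so $a_{k}/k^{1/2}$ should approach $4\sqrt{3}$.

```python
import math


def lower_tail(i: int) -> float:
    """Exact P(X(i) <= i/3) for X(i) ~ Binomial(i, 1/2)."""
    return sum(math.comb(i, j) for j in range(i // 3 + 1)) / 2 ** i


def a_tail(r: float, k: int) -> float:
    """sum_{i=1}^{k-1} 2 (i/3)^(2r-2): the large-i part of a_k."""
    return sum(2.0 * (i / 3.0) ** (2 * r - 2) for i in range(1, k))


# Chernoff: P(X(i) <= i/3) <= exp(-i/36) for every i checked.
print(all(lower_tail(i) <= math.exp(-i / 36) for i in range(1, 400)))
# r < 1/2: partial sums settle down; r > 1/2: they grow like k^(2r-1).
print(a_tail(0.25, 10**4), a_tail(0.25, 10**5))
print(a_tail(0.75, 10**5) / (10**5) ** 0.5, 4 * math.sqrt(3))
```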

3. Concentration inequality: continuous time

We now state and prove a continuous-time version of Theorem 2.1. For definitions concerning continuous-time Markov chains, see Anderson [1], in particular pages 13 and 81 (we use the term “non-explosive” in place of “regular”).

Let $\widehat{X}=(\widehat{X}(t))_{t\in{\mathbb{R}}^{+}}$ be a stable, conservative, non-explosive continuous-time Markov chain with a discrete state space $S$ and $Q$ -matrix $(\widehat{Q}(x,y):x,y\in S)$ . Let $\widehat{P}^{t}=e^{\widehat{Q}t}$ denote the transition probabilities of $\widehat{X}$ . Much as before, for a function $f:S\to{\mathbb{R}}$ , we write $(\widehat{P}^{t}f)(x)$ to denote $\operatorname{\mathbb{E}{}}_{x}f(\widehat{X}(t))$ , whenever it exists.

For $x\in S$ , we set

[TABLE]

Theorem 3.1.

Let $(\widehat{Q}(x,y):x,y\in S)$ be the $Q$ -matrix of a stable, conservative, non-explosive continuous-time Markov chain $(\widehat{X}(t))_{t\geq 0}$ with discrete state space $S$ . Writing $q_{x}=-\widehat{Q}(x,x)$ , let ${\widehat{S}}$ be a subset of $S$ , for which $q:=\sup_{x\in{\widehat{S}}}\{q_{x}\}<\infty$ . Let $f\colon S\to{\mathbb{R}}$ be a function such that $(\widehat{P}^{t}f)(x):={\mathbb{E}}_{x}f(\widehat{X}(t))$ exists for all $t\geq 0$ and $x\in S$ , and suppose that $\widehat{\beta}$ is a constant such that

[TABLE]

for all $s\geq 0$ , all $x\in{\widehat{S}}$ and all $y\in N(x)$ . Assume also that the continuous function $\widehat{\alpha}:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ satisfies

[TABLE]

for all $x\in{\widehat{S}}$ and all $s\geq 0$ . Define $\widehat{a}_{t}:=\int_{s=0}^{t}\widehat{\alpha}(s)\,ds$ . Finally, let $A_{t}:=\{\widehat{X}(s)\in{\widehat{S}}\mbox{ for all }0\leq s<t\}$ . Then, for all $x_{0}\in{\widehat{S}}$ , $t\geq 0$ and $m\geq 0$ ,

[TABLE]

Exactly as in the discrete case, a bound on the variance of $f(\widehat{X}(t))$ follows in the case where ${\widehat{S}}=S$ .

In order to prove the theorem, we first need to show that, for any fixed $x\in{\widehat{S}}$ , the function $s\mapsto(\widehat{P}^{s}f)(x)$ has zero quadratic variation on any finite $s$ -interval. Since a continuously differentiable function has zero quadratic variation, this is a consequence of the next lemma.

Lemma 3.2.

Under the above assumptions, for each $x\in{\widehat{S}}$ , $(\widehat{P}^{s}f)(x)$ is continuously differentiable with respect to $s$ .

Proof.

We can suppose that $f(x)\geq 0$ for all $x\in S$ ; if not, it suffices to consider the positive and negative parts $f^{+}$ and $f^{-}$ of $f$ separately. This enables the exchange of sums and integrals in the argument that follows.

First, by considering what happens up to time $s$ , we have

[TABLE]

Thus, from (3.1), for $x\in{\widehat{S}}$ and $y\in N(x)$ , it follows that

[TABLE]

Now, since

[TABLE]

the Kolmogorov backward equations imply that, for any $x\in{\widehat{S}}$ and $s>0$ , we have

[TABLE]

In view of (3.3), and because $\sum_{z\neq x}\widehat{Q}(x,z)=q_{x}<\infty$ , the integrand on the right-hand side of (3.4) is uniformly bounded on $[0,t]$ for any $t<\infty$ , implying that the indefinite integral is continuous in $s$ . From this, it follows immediately that $(\widehat{P}^{s}f)(x)$ is continuous in $s$ also. But then, for $x\in{\widehat{S}}$ ,

[TABLE]

is a uniformly convergent sum, in view of (3.1), and so the integrand in (3.4) is continuous; thus the indefinite integral is continuously differentiable with respect to $s$ , and hence $(\widehat{P}^{s}f)(x)$ is also. ∎
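Lemma 3.2 rests on the Kolmogorov backward equation $\frac{d}{ds}(\widehat{P}^{s}f)(x)=\sum_{z}\widehat{Q}(x,z)(\widehat{P}^{s}f)(z)$ . For a finite chain this is easy to check numerically. The sketch computes $\widehat{P}^{s}f$ by uniformisation, $\widehat{P}^{s}=\sum_{n\geq 0}e^{-qs}\frac{(qs)^{n}}{n!}K^{n}$ with $K=I+\widehat{Q}/q$ and $q=\max_{x}q_{x}$ , and compares a central difference quotient with $\widehat{Q}\widehat{P}^{s}f$ . Uniformisation is our choice of method (valid since $q<\infty$ , and numerically sound only for moderate $qs$ ); the birth-and-death rates are an arbitrary example.

```python
import math


def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def semigroup_apply(Q, f, t, tol=1e-13, max_terms=10_000):
    """(P^t f) with P^t = exp(Q t), by uniformisation.

    K = I + Q/q is stochastic when q >= max_x q_x, and P^t = sum_n Poisson(q t; n) K^n.
    Suitable for moderate q t only: exp(-q t) underflows once q t exceeds about 700.
    """
    size = len(Q)
    q = max(-Q[x][x] for x in range(size))
    K = [[(1.0 if x == y else 0.0) + Q[x][y] / q for y in range(size)] for x in range(size)]
    weight = math.exp(-q * t)            # Poisson(q t) mass at n = 0
    term = list(f)                       # K^n f, starting with n = 0
    result = [weight * v for v in term]
    mass, n = weight, 0
    while 1.0 - mass > tol and n < max_terms:
        n += 1
        term = mat_vec(K, term)
        weight *= q * t / n
        mass += weight
        result = [acc + weight * v for acc, v in zip(result, term)]
    return result


def birth_death_q(n, birth=1.0, death=0.5):
    """Q-matrix on {0,...,n}: x -> x+1 at rate `birth` (x < n), x -> x-1 at rate death * x."""
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        if x < n:
            Q[x][x + 1] = birth
        if x > 0:
            Q[x][x - 1] = death * x
        Q[x][x] = -sum(Q[x])
    return Q


Q = birth_death_q(4)
f = [float(x * x) for x in range(5)]
s, h = 0.7, 1e-4
plus, minus = semigroup_apply(Q, f, s + h), semigroup_apply(Q, f, s - h)
numeric = [(a - b) / (2 * h) for a, b in zip(plus, minus)]
backward = mat_vec(Q, semigroup_apply(Q, f, s))   # (Q P^s f)(x)
print(max(abs(a - b) for a, b in zip(numeric, backward)))
```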

Proof of Theorem 3.1. Fix $\widehat{X}(0)=x_{0}\in{\widehat{S}}$ and, for $0\leq s\leq t$ , define

[TABLE]

note that $Z_{t}=f(\widehat{X}(t))-(\widehat{P}^{t}f)(x_{0})$ and that $Z_{0}=0$ . Then $(Z_{s})_{0\leq s\leq t}$ is a martingale, and so is $({\widehat{Z}}_{s})_{0\leq s\leq t}$ , where ${\widehat{Z}}_{s}:=Z_{s\wedge\tau_{0}}$ , and

[TABLE]

We now use a supermartingale derived from ${\widehat{Z}}$ to prove a concentration bound.

In view of Lemma 3.2, the continuous part of $Z$ has no quadratic variation until $\tau_{0}$ , and so the predictable quadratic variation of ${\widehat{Z}}$ is given by

[TABLE]

Hence, by (3.2),

[TABLE]

Let the jump times of $\widehat{X}$ be denoted by $0<\sigma_{1}<\sigma_{2}<\cdots$ , and write

[TABLE]

where $g(x)=(e^{x}-1-x)/x^{2}$ , as in the proof of Lemma 2.2, and, for $i$ such that $\sigma_{i}\leq\tau_{0}$ ,

[TABLE]

using the continuity of $(\widehat{P}^{s}f)(x)$ in $s\geq 0$ for each $x\in{\widehat{S}}$ .

Let $V^{h}$ denote the compensator of $U^{h}$ . We first note that $V^{h}_{s}$ is finite, at least for $s\leq\tau_{0}$ . This is because, for $0\leq v<s\leq\tau_{0}$ , we have

[TABLE]

by (3.1), as $g$ is increasing on $[0,\infty)$ . Hence, noting that $A_{t}=\{\tau_{0}\geq t\}$ , we see that

[TABLE]

in view of (3.5).

Now ${\widehat{Z}}$ is a square integrable martingale, because of (3.5), and hence, from the proof of Lemma 2.2 in van de Geer [10], $\exp\{h{\widehat{Z}}_{s}-V^{h}_{s\wedge\tau_{0}}\}$ is a non-negative supermartingale with initial value $1$ , since the continuous part of ${\widehat{Z}}$ has no quadratic variation. Thus

[TABLE]

On the other hand, using (3.6),

[TABLE]

Hence

[TABLE]

or

[TABLE]

We again optimise in $h$ , as in the proof of Theorem 2.7 in McDiarmid [23], and then repeat the argument for a bound on ${\mathbb{P}}_{x}[\{f(\widehat{X}(t))-(\widehat{P}^{t}f)(x)\leq-m\}\cap A_{t}]$ . $\square$

Let $(\widehat{X}(t))_{t\geq 0}$ be a stable, conservative, non-explosive continuous-time chain with state space $S$ , and let $d(\cdot,\cdot)$ be a metric on $S$ . A Markovian coupling of two copies of $(\widehat{X}(t))_{t\geq 0}$ is itself a continuous-time Markov chain, with a generator that we denote $\mathcal{A}$ . The coupling is said to be contracting with respect to $d$ , with constant $\rho>0$ , if, for all $x,y\in S$ ,

[TABLE]

If the above holds for all $x$ and $y$ in some ${\widehat{S}}\subseteq S$ , then we say that the coupling is contracting on ${\widehat{S}}$ . We say that $(\widehat{X}(t))_{t\geq 0}$ is contracting in Wasserstein distance if there is a coupling satisfying (3.7) for all $x,y\in S$ . This definition corresponds to that of positive coarse Ricci curvature for continuous-time chains given by Veysseire [30], in the setting of jump chains.
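As an illustration of (3.7), assuming it is the generator condition $\mathcal{A}d(x,y)\leq-\rho\,d(x,y)$ , consider the immigration-death chain (the $M/M/\infty$ queue, our own example) with arrival rate $\lambda$ , per-customer departure rate $\mu$ and $d(x,y)=|x-y|$ . Let the two copies share arrivals, let the customers they have in common depart together, and let the $|x-y|$ extra customers of the larger copy depart on their own. Each marginal is then a copy of the chain, and the sketch confirms $\mathcal{A}d(x,y)=-\mu|x-y|$ , so this coupling is contracting with $\rho=\mu$ . The function names are hypothetical.

```python
import math


def coupled_moves(x, y, lam, mu):
    """(rate, (x', y')) pairs of the coupled generator, for x >= y."""
    moves = [(lam, (x + 1, y + 1))]                 # a shared arrival
    if y > 0:
        moves.append((mu * y, (x - 1, y - 1)))      # a common customer leaves both queues
    if x > y:
        moves.append((mu * (x - y), (x - 1, y)))    # one of the x - y extra customers leaves
    return moves


def generator_of_distance(x, y, lam, mu):
    """(A d)(x, y) for d(x, y) = |x - y|: sum of rate * (change in distance)."""
    if x < y:
        x, y = y, x                                  # the coupling treats the copies symmetrically
    return sum(rate * (abs(nx - ny) - (x - y)) for rate, (nx, ny) in coupled_moves(x, y, lam, mu))


def marginal_rates(x, y, lam, mu):
    """Up/down rates of each copy (x >= y): should be (lam, mu*x) and (lam, mu*y)."""
    moves = coupled_moves(x, y, lam, mu)
    up_x = sum(r for r, (nx, _) in moves if nx == x + 1)
    down_x = sum(r for r, (nx, _) in moves if nx == x - 1)
    up_y = sum(r for r, (_, ny) in moves if ny == y + 1)
    down_y = sum(r for r, (_, ny) in moves if ny == y - 1)
    return (up_x, down_x), (up_y, down_y)


print(generator_of_distance(7, 3, lam=2.0, mu=0.5))   # -mu * |7 - 3| = -2.0
```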

The next result establishes concentration of measure for continuous-time chains that are contracting in Wasserstein distance. We state our result only for the case when the Markov chain is contracting on the entire state space, but there is not necessarily a global upper bound on the total transition rate out of a state. We could also provide a version for use when the contraction property only holds on a “good set”, but it seems hard to cover all the possible cases where such a result might be useful: an issue is that we need some mild control on the growth of $f$ in the unlikely event that the chain leaves the good set (in the discrete case, we used that the chain makes a bounded number of steps of bounded distance) and the form of the bounds will depend on the manner of that control.

Theorem 3.3.

Let $\widehat{X}$ be a stable, conservative, non-explosive continuous-time Markov chain on a discrete state space $S$ , with $Q$ -matrix $\widehat{Q}:=(\widehat{Q}(x,y):x,y\in S)$ . Suppose that $d(\cdot,\cdot)$ is a metric on $S$ , and let $f\colon S\to{\mathbb{R}}$ be a function such that, for some constant $L$ , $|f(x)-f(y)|\leq Ld(x,y)$ for all $x,y\in S$ .

Let ${\widehat{S}}$ be a subset of $S$ , and let $q$ and $D$ be constants such that $-\widehat{Q}(x,x)\leq q$ for all $x\in{\widehat{S}}$ and $d(x,y)\leq D$ whenever $x\in{\widehat{S}}$ and $y\in N(x)$ . For $t>0$ , let $A_{t}=\{\widehat{X}(s)\in{\widehat{S}}\mbox{ for }0\leq s<t\}$ .

Suppose that $\widehat{X}$ is contracting in Wasserstein distance, as in (3.7), with constant $\rho$ . Then, for all $x\in{\widehat{S}}$ , $t>0$ and $m\geq 0$ ,

[TABLE]

Proof.

It follows from (3.7) that, under a contracting coupling of two copies $\widehat{X}^{(1)}$ and $\widehat{X}^{(2)}$ , the process $\bigl{\{}e^{\rho t}d(\widehat{X}^{(1)}(t),\widehat{X}^{(2)}(t))\bigr{\}}_{t\geq 0}$ is a non-negative local supermartingale. Thus, if $(\widehat{X}^{(1)}(0),\widehat{X}^{(2)}(0))=(x,y)$ , then

[TABLE]

We can now apply Theorem 3.1, with

[TABLE]

and so, for any $t>0$ ,

[TABLE]

The result now follows from Theorem 3.1. ∎

Note that the upper bound in Theorem 3.3 on the deviations of $f(\widehat{X}(t))$ from its expectation does not depend on $t$ . As in the discrete case, in many applications, the distribution of $\widehat{X}(t)$ will approach an equilibrium, and the bound above implies a bound on the concentration of $f(\widehat{X}(t))$ in equilibrium. However, it might well be the case that $\operatorname{\mathbb{P}{}}(A_{t})\to 0$ as $t\to\infty$ : eventually the chain leaves the good set, and once it does we cannot hope to say much about its behaviour.

4. Upper bounds on coalescence times

In this section, we prove an auxiliary result for continuous-time Markov chains, which we will use (primarily in Section 6) to show that a chain with a contracting coupling mixes rapidly once it enters a region $R$ of the state space where the equilibrium distribution is concentrated; this is therefore a useful ingredient in a proof of cut-off, showing that the mixing time from any “distant” state is dominated by the “travel time” to reach $R$ .

We study a non-negative real-valued function of a continuous-time Markov chain, with non-positive drift wherever the function is positive, and prove a lower bound on the probability that it hits 0 by a given time (equivalently, an upper bound on the tail of the hitting time of 0). For a contracting coupling $(X(t),Y(t))$ of two copies of a Markov chain with respect to the metric $d$ on their state space $S$ , we can apply our result below to the function $d(X(t),Y(t))$ of the Markov chain $(X(t),Y(t))$ , in order to show that coalescence occurs quickly once the distance between the two copies is reasonably small: we illustrate this method in Section 6.

We deal only with the continuous-time case. Proposition 17.19 of Levin, Peres and Wilmer [18] gives an analogous result for discrete-time chains, which can often be used in a similar way to that described above; our proof of the proposition below follows theirs.

Proposition 4.1.

Let $X$ be a stable, conservative, non-explosive continuous-time Markov jump chain, with state space $S$ and $Q$ -matrix $Q$ . Let $B$ and $\sigma^{2}$ be positive, and let $f\colon S\to{\mathbb{R}}_{+}$ be a function. Set $S_{0}:=\{x\colon f(x)=0\}$ , and assume that:

(i)

the drift $\sum_{y}Q(x,y)\big{(}f(y)-f(x)\big{)}$ of $f$ is non-positive for all $x$ in $S\setminus S_{0}$ ;

(ii)

$f(X)$ makes jumps of magnitude at most $B$ ;

(iii)

$\sum_{y}Q(x,y)\big{(}f(y)-f(x)\big{)}^{2}\geq\sigma^{2}$ for all $x\in S\setminus S_{0}$ .

Define $T_{*}:=\inf\{t\colon f(X(t))=0\}$ , the hitting time of $S_{0}$ . Then, for any $t_{0}\geq 2B^{2}/\sigma^{2}$ ,

[TABLE]

Notes:

(a)

The nature of the underlying state space $S$ is not relevant, and we do not need to assume that the set $\{f(x):x\in S\}$ is discrete.

(b)

It is not a priori obvious that $S_{0}$ is non-empty or that $T_{*}$ is a.s. finite, but these follow from the result.

(c)

Suppose that $f(X(0))\geq B/2$ . In the case where $t_{0}<2B^{2}/\sigma^{2}$ , we then have $\operatorname{\mathbb{P}{}}(T_{*}\geq t_{0})\leq 1\leq\frac{2\sqrt{2}f(X(0))}{\sigma\sqrt{t_{0}}}$ , and so (4.1) holds without any condition on $t_{0}$ .

The motivating example underlying the proposition is that of a simple random walk $X(t)$ on ${\mathbb{Z}}_{+}$ (with $f(x)=x$ ), making steps up and down each at rate $1$ , until the walk hits 0, so that the sum in (iii) is equal to $2$ for each positive state. In this case, the proposition says that the walk hits 0 before time $t_{0}$ with probability at least $1-\frac{2X(0)}{\sqrt{t_{0}}}$ , which is best possible up to a constant factor. The proposition then gives conditions, for more general processes, under which the same behaviour holds.
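The random-walk case is easy to simulate. The sketch below (seed, starting point and horizon are arbitrary choices) estimates $\mathbb{P}(T_{*}\geq t_{0})$ for the walk started at $x_{0}$ and compares it with the bound $2\sqrt{2}x_{0}/(\sigma\sqrt{t_{0}})=2x_{0}/\sqrt{t_{0}}$ from (4.1), with $\sigma^{2}=2$ and $B=1$ ; here $t_{0}\geq 2B^{2}/\sigma^{2}=1$ .

```python
import math
import random


def survives_until(x0: int, t0: float, rng: random.Random) -> bool:
    """Continuous-time simple random walk from x0 > 0, jumping +1 and -1 each at rate 1.
    Return True if it has not hit 0 by time t0."""
    x, t = x0, 0.0
    while True:
        t += rng.expovariate(2.0)          # holding time: total jump rate 2
        if t >= t0:
            return True
        x += 1 if rng.random() < 0.5 else -1
        if x == 0:
            return False


def estimate_survival(x0: int, t0: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(T_* >= t0)."""
    rng = random.Random(seed)
    return sum(survives_until(x0, t0, rng) for _ in range(trials)) / trials


x0, t0, sigma = 3, 100.0, math.sqrt(2.0)
bound = 2 * math.sqrt(2) * x0 / (sigma * math.sqrt(t0))   # equals 2 * x0 / sqrt(t0) = 0.6
survival = estimate_survival(x0, t0, trials=4000)
print(survival, bound)
```

The estimate falls well inside the bound; the proposition only claims the correct order $x_{0}/\sqrt{t_{0}}$ , not the constant.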

As mentioned already, we shall apply Proposition 4.1 to a Markovian coupling $(X,Y)$ , where $X$ and $Y$ are two copies of a jump Markov chain with a state space $S$ equipped with a metric $d$ , and $f\big{(}(x,y)\big{)}=d(x,y)$ . The conclusion then says that the chains have coalesced by time $t_{0}$ with probability at least $1-2\sqrt{2}\,d(X(0),Y(0))/(\sigma\sqrt{t_{0}})$ (unless the two chains start within distance $B/2$ of each other and $t_{0}$ is less than $2B^{2}/\sigma^{2}$ , where $B$ is the maximum size of a jump in the distance). If the coupling is contracting with respect to $d$ , then condition (i) is satisfied. A lower bound $\sigma^{2}$ on the expression in condition (iii) can be obtained when, under the coupling, the distance between the two copies changes by at least $\eta$ at rate at least $r$ , for suitable $\eta$ and $r$ (one may then take $\sigma^{2}=r\eta^{2}$ ).

Our proof follows that of Proposition 17.19 in Levin, Peres and Wilmer [18].

Proof.

Let $D(t)=f(X(t))$ , so that $T_{*}=\inf\{t:D(t)=0\}$ . For some $h\geq B\vee D(0)$ to be chosen later, let $T_{h}=\inf\{t:D(t)=0\mbox{ or }D(t)\geq h\}$ . We note that, for any $t_{0}\geq 0$ ,

[TABLE]

We now give bounds on the two terms on the right above.

By (i), the process $(D(t\wedge T_{h}))$ is a supermartingale, and by (ii) it is bounded between $0$ and $h+B$ . Therefore, by the Optional Stopping Theorem, we have $D(0)\geq\operatorname{\mathbb{E}{}}D(t_{0}\wedge T_{h})\geq h\operatorname{\mathbb{P}{}}(D(t_{0}\wedge T_{h})\geq h)$ , and so $\operatorname{\mathbb{P}{}}(D(t_{0}\wedge T_{h})\geq h)\leq D(0)/h$ .

For $t\geq 0$ , we set $G(t)=D(t)^{2}-2hD(t)-\sigma^{2}t$ . We claim that $(G(t\wedge T_{h}))$ is a submartingale. For $s<t\wedge T_{h}$ , we have

[TABLE]

As

[TABLE]

we have

[TABLE]

for all $u<T_{h}$ , by (i) and (iii), and so indeed $\operatorname{\mathbb{E}{}}[G(t\wedge T_{h})\mid X(s)]\geq G(s)$ for $s<t\wedge T_{h}$ .

For $t\leq T_{h}$ , we have $2hD(t)-D(t)^{2}=(2h-D(t))D(t)\geq 0$ , as $0\leq D(t)\leq h+B\leq 2h$ (since $h\geq B$ ) for $t\leq T_{h}$ . Thus we have, for any $t\geq 0$ , $\operatorname{\mathbb{E}{}}(2hD(t\wedge T_{h})-D(t\wedge T_{h})^{2})\geq 0$ , and so

[TABLE]

Hence we obtain, for any $t\geq 0$ , $\operatorname{\mathbb{E}{}}(t\wedge T_{h})\leq 2hD(0)/\sigma^{2}$ . Letting $t$ tend to infinity and applying the Monotone Convergence Theorem, we obtain the same upper bound on $\operatorname{\mathbb{E}{}}T_{h}$ . Therefore, for any $t_{0}>0$ ,

$$\mathbb{P}(T_{h}>t_{0})\leq\frac{\mathbb{E}\,T_{h}}{t_{0}}\leq\frac{2hD(0)}{\sigma^{2}t_{0}}.$$

We conclude that

$$\mathbb{P}(T_{*}>t_{0})\leq\frac{D(0)}{h}+\frac{2hD(0)}{\sigma^{2}t_{0}}.$$

Optimising this bound by setting $h=\sigma\sqrt{t_{0}/2}$ now gives, provided $t_{0}\geq 2(B\vee D(0))^{2}/\sigma^{2}$ (so that $h\geq B\vee D(0)$ ),

$$\mathbb{P}(T_{*}>t_{0})\leq\frac{2\sqrt{2}\,D(0)}{\sigma\sqrt{t_{0}}}.$$

If $D(0)>\sigma\sqrt{t_{0}/2}$ , then the result is trivial, so we obtain the bound above under the condition $t_{0}\geq 2B^{2}/\sigma^{2}$ . ∎
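The choice of $h$ in the proof balances two terms: $D(0)/h$ from optional stopping, and $2hD(0)/(\sigma^{2}t_{0})$ from Markov's inequality applied to $\mathbb{E}\,T_{h}\leq 2hD(0)/\sigma^{2}$. As a quick numerical illustration (our own sketch, with hypothetical values of $D(0)$, $\sigma$ and $t_0$), the following Python code evaluates the sum at $h=\sigma\sqrt{t_{0}/2}$, compares it with $2\sqrt{2}D(0)/(\sigma\sqrt{t_{0}})$, and confirms by a crude grid search that no nearby $h$ does better.

```python
import math


def coalescence_bound(h, d0, sigma, t0):
    """Sum of the two estimates in the proof of Proposition 4.1:
    D(0)/h (optional stopping) plus 2 h D(0) / (sigma^2 t0)
    (Markov's inequality with E[T_h] <= 2 h D(0) / sigma^2)."""
    return d0 / h + 2.0 * h * d0 / (sigma**2 * t0)


def optimal_h(sigma, t0):
    # a/h + b*h is minimised at h = sqrt(a/b); here a = D(0), b = 2 D(0)/(sigma^2 t0).
    return sigma * math.sqrt(t0 / 2.0)


# Hypothetical values, chosen only for illustration (they satisfy h_star >= D(0)).
d0, sigma, t0 = 3.0, 0.7, 400.0
h_star = optimal_h(sigma, t0)
closed_form = 2.0 * math.sqrt(2.0) * d0 / (sigma * math.sqrt(t0))

# Crude grid search over h in [h_star/2, 3 h_star/2].
grid = [h_star * (0.5 + 0.01 * i) for i in range(101)]
best_on_grid = min(coalescence_bound(h, d0, sigma, t0) for h in grid)
```

The minimiser does not depend on $D(0)$, which is why a single choice of $h$ works for every starting distance below $h$.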

We remark that the assumption of bounded jumps cannot be dropped. Let $(X(t))$ be a chain on $\mathbb{Q}$ with $Q$-matrix given by (a) for $x<1$, $Q(x,x/2)=1$ and $Q(x,x+1/x)=x^{2}/2$, and (b) for $x\geq 1$, $Q(x,x+1/2)=Q(x,x-1/2)=1$. Then $(X(t))$ is a non-explosive jump chain satisfying conditions (i) and (iii) with $\sigma^{2}=1/2$, but its upward jumps, of size $1/x$ from $x<1$, are unbounded. From a state $x<1$, the probability that all subsequent jumps are downward is $\prod_{k=0}^{\infty}1/(1+x^{2}/2^{2k+1})>0$. Thus, almost surely, the chain makes only finitely many visits to $[1,\infty)$ before entering $(0,1)$ and making only downward jumps thereafter, but $(X(t))$ can never reach $0$.

Alternatively, consider the chain on $\mathbb{Q}$ with a $Q$-matrix such that $Q(x,x+1)=1$ for all $x$, $Q(x,x/2)=2/x$ for $x\leq 2$, and $Q(x,x-1)=1$ for $x\geq 2$. This chain satisfies all of (i)–(iii), with $\sigma^{2}=1$, but is explosive: starting from a state $x\leq 2$, the probability that the chain makes infinitely many downward jumps before the first upward jump is $\prod_{k=1}^{\infty}2^{k}/(2^{k}+x)>0$, and on this event the total time spent making these jumps has finite mean $\sum_{k=1}^{\infty}1/(1+2^{k}/x)$. State $0$ is not reached before the explosion time.
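Both counterexamples can be checked with exact arithmetic. The sketch below (function names are ours) lists the jumps out of a state as (size, rate) pairs, verifies that the drift is zero and that the quadratic-variation rate is at least $\sigma^{2}$ at a few sample states, and evaluates truncations of the two infinite products. In the second chain we read the overlapping rules at $x=2$ as a single jump to $1$ at rate $1$; this reading is our assumption.

```python
from fractions import Fraction as F


def drift_and_qv(jumps):
    """For (jump size, rate) pairs out of one state, return the drift
    sum(rate * size) and the quadratic-variation rate sum(rate * size^2)."""
    drift = sum(rate * size for size, rate in jumps)
    qv = sum(rate * size * size for size, rate in jumps)
    return drift, qv


def first_chain(x):
    # First example: from x < 1 the upward jump has size 1/x (unbounded).
    if x < 1:
        return [(x / 2 - x, F(1)), (1 / x, x * x / 2)]
    return [(F(1, 2), F(1)), (F(-1, 2), F(1))]


def second_chain(x):
    # Second example: jumps bounded by 1. At x = 2 both rules describe the
    # same jump (to 1, at rate 1), so it is counted once (assumed reading).
    jumps = [(F(1), F(1))]
    if x <= 2:
        jumps.append((x / 2 - x, 2 / x))
    else:
        jumps.append((F(-1), F(1)))
    return jumps


def prob_all_down_first(x, terms=60):
    # Truncation of prod_{k>=0} 1 / (1 + x^2 / 2^(2k+1)).
    p = 1.0
    for k in range(terms):
        p /= 1.0 + x * x / 2 ** (2 * k + 1)
    return p


def prob_all_down_second(x, terms=60):
    # Truncation of prod_{k>=1} 2^k / (2^k + x).
    p = 1.0
    for k in range(1, terms + 1):
        p *= 2**k / (2**k + x)
    return p


def mean_time_all_down_second(x, terms=60):
    # On the all-down path the chain sits at x / 2^(k-1) for an Exp(1 + 2^k / x) time.
    return sum(1.0 / (1.0 + 2**k / x) for k in range(1, terms + 1))
```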

5. Bernoulli–Laplace diffusion model

As our first example, we re-examine the Bernoulli–Laplace chain (Feller [9], Example XV.2(f)), for which cut-off was first established in Diaconis and Shahshahani [6]. In this model there are two urns, the left urn initially containing $n$ red balls and the right urn $n$ black balls. At each time step, a ball is chosen uniformly at random from each urn, and the two chosen balls are switched.

The state of the system at any time $r\geq 0$ is captured by the number $X^{(n)}(r)$ of red balls in the left urn at time $r$ . The chain $X^{(n)}$ can be viewed as a discrete-time lazy random walk with state space $\{0,\dots,n\}\subset{\mathbb{Z}}$ , with state-dependent transition probabilities

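$$\mathbb{P}\big(X^{(n)}(r+1)=j+1\mid X^{(n)}(r)=j\big)=\Big(1-\frac{j}{n}\Big)^{2},\qquad\mathbb{P}\big(X^{(n)}(r+1)=j-1\mid X^{(n)}(r)=j\big)=\Big(\frac{j}{n}\Big)^{2},$$
$$\mathbb{P}\big(X^{(n)}(r+1)=j\mid X^{(n)}(r)=j\big)=\frac{2j(n-j)}{n^{2}}.$$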

Diaconis and Shahshahani examine the total variation distance between the distribution of $X^{(n)}(r)$ and its equilibrium distribution $\pi=\pi^{(n)}$ , a hypergeometric distribution with parameters $(2n,n,n)$ , defined by

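$$\pi^{(n)}(j)=\frac{\binom{n}{j}\binom{n}{n-j}}{\binom{2n}{n}}=\frac{\binom{n}{j}^{2}}{\binom{2n}{n}},\qquad j\in\{0,\dots,n\}.$$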
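From the urn description, the count moves from $j$ to $j+1$ with probability $(1-j/n)^{2}$ (a black ball leaves the left urn and a red ball enters it), to $j-1$ with probability $(j/n)^{2}$, and otherwise stays put. As a sanity check that is not part of the original argument, the following Python sketch (function names ours) verifies with exact rational arithmetic that the hypergeometric law is invariant under one step of the chain for a few small $n$.

```python
from fractions import Fraction
from math import comb


def bl_transition(n, j):
    """One-step law of the Bernoulli-Laplace count from state j:
    probabilities of moving to j+1, to j-1, and of staying at j."""
    up = Fraction(n - j, n) ** 2     # black from left urn, red from right urn
    down = Fraction(j, n) ** 2       # red from left urn, black from right urn
    return up, down, 1 - up - down


def hypergeometric(n):
    # pi(j) = C(n, j) C(n, n - j) / C(2n, n)
    total = comb(2 * n, n)
    return [Fraction(comb(n, j) * comb(n, n - j), total) for j in range(n + 1)]


def one_step(n, dist):
    """Push a distribution on {0, ..., n} through one step of the chain."""
    new = [Fraction(0)] * (n + 1)
    for j, p in enumerate(dist):
        up, down, stay = bl_transition(n, j)
        new[j] += p * stay
        if j < n:
            new[j + 1] += p * up
        if j > 0:
            new[j - 1] += p * down
    return new
```

Invariance here is in fact a consequence of detailed balance, $\pi(j)(1-j/n)^{2}=\pi(j+1)((j+1)/n)^{2}$, which follows from $\binom{n}{j+1}=\binom{n}{j}(n-j)/(j+1)$.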

As before, we use ${\mathcal{L}}_{j}$, ${\mathbb{P}}_{j}$ and ${\mathbb{E}}_{j}$ to refer to distributions conditional on $X^{(n)}(0)=j$, and we use ${\mathcal{L}}_{\pi^{(n)}}$, ${\mathbb{P}}_{\pi^{(n)}}$ and ${\mathbb{E}}_{\pi^{(n)}}$ when $X^{(n)}(0)$ has the equilibrium distribution $\pi^{(n)}$.

Letting $r_{n}(\delta):=\lfloor\tfrac{1}{4}n\log n+\delta n\rfloor$ , Diaconis and Shahshahani [6] show that there are universal constants $C_{1},C_{2}>0$ such that

[TABLE]

Their proofs, especially that of (5.1), are based on algebraic techniques. Although they only consider starting from state $n$ , which is easily seen to maximise the mixing time, their proofs extend readily to cover other starting states. The upper bound (5.1) holds for any starting state. If the chain is started in a state $j$ in

$$E_{n}(\varepsilon):=\Big\{j\in\{0,\dots,n\}:\ \big|j-\tfrac{n}{2}\big|\geq\varepsilon n\Big\},$$

then a minor adjustment to their proof yields a bound of the form

[TABLE]

for some universal constant $C_{3}$ .

Thus, in the language introduced in Section 1, we have the following result.

Theorem 5.1.

For any $\varepsilon>0$ , the Bernoulli–Laplace chain exhibits cut-off at $\tfrac{1}{4}n\log n$ on $E_{n}(\varepsilon)$ with window width $n$ .

We use the results of the previous sections to give an alternative proof of Theorem 5.1, based on coupling, which yields the bounds in the result below.

Theorem 5.2.

Let $X^{(n)}(r)$ be a copy of the Bernoulli–Laplace chain. For $\delta\in{\mathbb{R}}$, set $r_{n}(\delta):=\lfloor\tfrac{1}{4}n\log n+\delta n\rfloor$.

(a) For $-\tfrac{1}{4}\log n\leq\delta<0$ , we have

[TABLE]

for any $\varepsilon>0$ , any $j\in E_{n}(\varepsilon)$ , and $n\geq 4$ .

(b) For $0\leq\delta\leq\tfrac{1}{4}\log n-\log\log n$ , we have

[TABLE]

for any $j\in\{0,\dots,n\}$ , and $n$ sufficiently large.

Thus our upper bound in Theorem 5.2(b) matches that of Diaconis and Shahshahani in (5.1), except that our proof requires a mild upper bound on $\delta$ , and our lower bound in part (a) improves on (5.2). The inequalities above are more than enough to imply Theorem 5.1.

Extensions and generalisations of the result of Diaconis and Shahshahani have also been obtained. For instance, Donnelly, Lloyd and Sudbury [7] showed cut-off for the separation distance mixing time for this model, and recently Eskenazis and Nestoridi [8] showed cut-off for the version where $k>1$ balls are exchanged at each step. All of these papers make some use of algebraic techniques.

We now give a brief overview of our proof of Theorem 5.2. The first step is to use our discrete-time concentration of measure inequality, Theorem 2.3(a), to show that, for any starting state $j=X^{(n)}(0)$ and any $r$ , $X^{(n)}(r)$ is well-concentrated around its mean. An easy estimate for the mean then shows that, with high probability, $X^{(n)}(r)$ is far from $n/2$ for $r\leq r_{n}(0)$ , and this is enough to give part (a).

The proof of (b) is more complicated. The concentration of measure result shows that $X^{(n)}(r)$ is unlikely to leave a neighbourhood of $n/2$ for a long period of time after $r_{n}(0)$ ; while it is in this neighbourhood, we can approximate the transitions of the chain by the transitions of a simpler chain whose long-term behaviour is easy to analyse, and show that the two chains therefore have approximately the same distributions over a suitably long time interval.

We proceed by stating and proving a sequence of lemmas. In what follows we drop the superscript $(n)$ , writing $X(r)$ instead of $X^{(n)}(r)$ , to lighten the notation.

Lemma 5.3.

Let $X(r)=X^{(n)}(r)$ be a copy of the Bernoulli–Laplace chain, with $n\geq 4$. For all starting states $j\in\{0,\dots,n\}$, all $r\in{\mathbb{Z}}_{+}$, and all $c$ with $0\leq c\leq 3\sqrt{n}/4$, we have

[TABLE]

where

$$x_{j}(r):=\frac{1}{2}+\Big(\frac{j}{n}-\frac{1}{2}\Big)\Big(1-\frac{2}{n}\Big)^{r}.$$

Proof.

Our plan is to use Theorem 2.3, and accordingly our first step is to describe a contractive coupling.

We fix $n\geq 4$ and $j_{0}\in\{0,\ldots,n-1\}$, and let $(X^{1}(r))$ and $(X^{2}(r))$ be two copies of the chain starting in $j_{0}$ and $j_{0}+1$ respectively. We describe a coupling of the chains such that $|X^{1}(r)-X^{2}(r)|$ remains equal to 1 until dropping to 0. When the two chains are in adjacent states $j$ and $j+1$ with $1\leq j\leq n-2$, say with $X^{1}(r)=j$ and $X^{2}(r)=j+1$, the next step of the coupling is as follows. The two chains both jump up by 1 with probability $(1-(j+1)/n)^{2}$ and both jump down by 1 with probability $(j/n)^{2}$. Additionally, the lower chain $X^{1}(r)$ jumps up by 1 alone with probability $(1-j/n)^{2}-(1-(j+1)/n)^{2}=(2n-2j-1)/n^{2}$, and the higher chain $X^{2}(r)$ jumps down by 1 alone with probability $((j+1)/n)^{2}-(j/n)^{2}=(2j+1)/n^{2}$. This leaves probability $1-\frac{1}{n^{2}}\big((n-j)^{2}+(j+1)^{2}\big)=\frac{2j(n-j-1)-1}{n^{2}}$ that both chains stay in their current states. Note that indeed $X^{2}(r+1)-X^{1}(r+1)$ is either 1 or 0, and that

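$$\mathbb{E}\Big[\big|X^{2}(r+1)-X^{1}(r+1)\big|\;\Big|\;X^{1}(r)=j,\,X^{2}(r)=j+1\Big]=1-\frac{2n-2j-1}{n^{2}}-\frac{2j+1}{n^{2}}=1-\frac{2}{n}$$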

for $1\leq j\leq n-2$ .

The rules above do not define a coupling in the case where $j=0$ or $j=n-1$ . In the case $j=0$ , for instance, $X^{1}(r)$ jumps from 0 to 1 with probability 1, and $X^{2}(r)$ jumps to one of 0, 1, or 2 with probabilities $(1/n)^{2}$ , $2/n-2/n^{2}$ , and $(1-1/n)^{2}$ respectively. There is thus no monotone coupling possible. However, when $X^{1}(r)=0$ and $X^{2}(r)=1$ , the next step of the coupling is forced since $X^{1}(r+1)=1$ with probability 1, and it is still the case that $|X^{2}(r+1)-X^{1}(r+1)|$ is either 1 or 0. We have

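$$\mathbb{E}\Big[\big|X^{2}(r+1)-X^{1}(r+1)\big|\;\Big|\;X^{1}(r)=0,\,X^{2}(r)=1\Big]=\frac{1}{n^{2}}+\Big(1-\frac{1}{n}\Big)^{2}=1-\frac{2}{n}+\frac{2}{n^{2}},$$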

and similarly for $j=n-1$ . Hence our coupling is contractive with constant $\rho=2/n-2/n^{2}$ .
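The coupling just described can be checked mechanically. The following Python sketch, an illustration with our own function names, builds the joint one-step law from $(j,j+1)$, confirms that each coordinate has the correct Bernoulli–Laplace marginal, and computes the expected distance after one step, which should be $1-2/n$ for $1\leq j\leq n-2$ and $1-2/n+2/n^{2}$ at the boundary states $j=0$ and $j=n-1$.

```python
from fractions import Fraction


def up(n, i):
    # P(i -> i+1) for the Bernoulli-Laplace count
    return Fraction(n - i, n) ** 2


def down(n, i):
    # P(i -> i-1)
    return Fraction(i, n) ** 2


def single_law(n, i):
    """Marginal one-step law from state i, with zero-probability moves dropped."""
    law = {i + 1: up(n, i), i - 1: down(n, i), i: 1 - up(n, i) - down(n, i)}
    return {k: p for k, p in law.items() if p != 0}


def coupled_step(n, j):
    """Joint law of (X1(r+1), X2(r+1)) given (X1(r), X2(r)) = (j, j+1)."""
    if j == 0:      # X1 moves from 0 to 1 surely; X2 follows its own law from 1
        return {(1, 2): up(n, 1), (1, 0): down(n, 1), (1, 1): 1 - up(n, 1) - down(n, 1)}
    if j == n - 1:  # X2 moves from n to n-1 surely; X1 follows its own law from n-1
        return {(n, n - 1): up(n, n - 1), (n - 2, n - 1): down(n, n - 1),
                (n - 1, n - 1): 1 - up(n, n - 1) - down(n, n - 1)}
    law = {
        (j + 1, j + 2): up(n, j + 1),             # both chains step up
        (j - 1, j): down(n, j),                   # both chains step down
        (j + 1, j + 1): up(n, j) - up(n, j + 1),  # lower chain steps up alone
        (j, j): down(n, j + 1) - down(n, j),      # higher chain steps down alone
    }
    law[(j, j + 1)] = 1 - sum(law.values())       # both chains stay put
    return law


def marginals(law):
    m1, m2 = {}, {}
    for (a, b), p in law.items():
        m1[a] = m1.get(a, 0) + p
        m2[b] = m2.get(b, 0) + p
    return ({k: p for k, p in m1.items() if p != 0},
            {k: p for k, p in m2.items() if p != 0})


def expected_distance(law):
    return sum(p * abs(b - a) for (a, b), p in law.items())
```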

We take $f(x)=x$ in Theorem 2.3(a), with $S=\{0,\dots,n\}$ , $d(x,y)=|x-y|$ , $L=D=D_{2}=1$ , and $\rho=2/n-2/n^{2}$ , so that $2/(2\rho-\rho^{2})\leq n$ for all $n\geq 4$ . Then, by Theorem 2.3(a), for all $j\in\{0,\dots,n\}$ , all $r\in{\mathbb{Z}}_{+}$ , and all $m>0$ , we have

[TABLE]

If we set $m=c\sqrt{n}$ , for $0\leq c\leq 3\sqrt{n}/4$ , we obtain that $n+4m/3\leq 2n$ , and so

[TABLE]
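The constants fed into Theorem 2.3(a) can be checked with exact rational arithmetic. The short Python sketch below (names ours) computes $\rho=2/n-2/n^{2}$ and the factor $2/(2\rho-\rho^{2})$, confirming that the factor is at most $n$ for $4\leq n<3000$; for large $n$ it is close to $n/2$, so the inequality has room to spare.

```python
from fractions import Fraction


def rho(n):
    # Contraction constant of the coupling used for Lemma 5.3.
    return Fraction(2, n) - Fraction(2, n * n)


def variance_factor(n):
    # The quantity 2 / (2 rho - rho^2) appearing in this application of Theorem 2.3(a).
    r = rho(n)
    return 2 / (2 * r - r * r)
```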

To complete the proof, it remains to verify the formula for $x_{j}(r):=\operatorname{\mathbb{E}{}}_{j}X(r)/n$ . Observe that

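$$\mathbb{E}\big[X(r+1)-X(r)\mid X(r)=i\big]=\Big(1-\frac{i}{n}\Big)^{2}-\Big(\frac{i}{n}\Big)^{2}=1-\frac{2i}{n},$$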

so that

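$$x_{j}(r+1)=x_{j}(r)+\frac{1}{n}\big(1-2x_{j}(r)\big),\quad\text{that is,}\quad x_{j}(r+1)-\frac{1}{2}=\Big(1-\frac{2}{n}\Big)\Big(x_{j}(r)-\frac{1}{2}\Big),$$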

and hence

$$x_{j}(r)=\frac{1}{2}+\Big(\frac{j}{n}-\frac{1}{2}\Big)\Big(1-\frac{2}{n}\Big)^{r},$$

as claimed ∎
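The closed form for the mean can be confirmed against an exact computation. The sketch below (our own illustration; helper names hypothetical) pushes the whole distribution of $X(r)$ forward with rational arithmetic, using the transition probabilities $(1-i/n)^{2}$ up and $(i/n)^{2}$ down, and compares $\mathbb{E}_{j}X(r)/n$ with $\tfrac{1}{2}+(\tfrac{j}{n}-\tfrac{1}{2})(1-\tfrac{2}{n})^{r}$ step by step.

```python
from fractions import Fraction


def mean_path(n, j, steps):
    """Exact E_j X(r) / n for r = 0, ..., steps, obtained by pushing the full
    distribution of the Bernoulli-Laplace chain forward one step at a time."""
    dist = {j: Fraction(1)}
    means = []
    for _ in range(steps + 1):
        means.append(sum(i * p for i, p in dist.items()) / n)
        new = {}
        for i, p in dist.items():
            up, down = Fraction(n - i, n) ** 2, Fraction(i, n) ** 2
            for target, q in ((i + 1, up), (i - 1, down), (i, 1 - up - down)):
                if q:
                    new[target] = new.get(target, 0) + p * q
        dist = new
    return means


def mean_formula(n, j, r):
    # x_j(r) = 1/2 + (j/n - 1/2) (1 - 2/n)^r
    return Fraction(1, 2) + (Fraction(j, n) - Fraction(1, 2)) * (1 - Fraction(2, n)) ** r
```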

A matching tail bound for the equilibrium distribution $\pi^{(n)}$ follows from Lemma 5.3. In fact, unsurprisingly, sharper tail bounds on the hypergeometric distribution are known: results of Hoeffding [15] (see Section 6 and Theorem 1) imply that, for any $c\geq 0$ ,

[TABLE]

An alternative proof was given by Chvátal [4].

It is now not hard to obtain the claimed lower bound on total variation distance for $r<\tfrac{1}{4}n\log n$ .

Proof of Theorem 5.2(a).

For $r=r_{n}(\delta)=\lfloor\tfrac{1}{4}n\log n+\delta n\rfloor$ , and $\delta<0$ , we have seen that both $X(r)$ and the equilibrium distribution are well-concentrated around their respective means. We will show that, if $j=X(0)$ is in $E_{n}(\varepsilon)$ for some fixed $\varepsilon>0$ , so that $|j-\frac{n}{2}|\geq\varepsilon n$ , then the means are still far apart at time $r$ .

From (5.4), we have that, uniformly in $-\tfrac{1}{4}\log n\leq\delta\leq 0$ ,

[TABLE]

for all $n\geq 4$ (so that $n^{1/2}\Big{(}1-\frac{2}{n}\Big{)}^{\tfrac{1}{4}n\log n}\geq 1/2$ ).
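The elementary inequality $n^{1/2}(1-2/n)^{\frac{1}{4}n\log n}\geq 1/2$ for $n\geq 4$ can be spot-checked numerically: the quantity is about $0.77$ at $n=4$ and tends to $1$ as $n$ grows. A quick Python check (function name ours), working on the logarithmic scale to avoid underflow:

```python
import math


def damping_factor(n):
    """n^(1/2) (1 - 2/n)^((1/4) n log n), computed on the log scale."""
    log_value = 0.5 * math.log(n) + 0.25 * n * math.log(n) * math.log1p(-2.0 / n)
    return math.exp(log_value)
```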

For fixed $\varepsilon>0$ and $\delta$ with $-\tfrac{1}{4}\log n\leq\delta\leq 0$ , we set

[TABLE]

By (5.5), we have

[TABLE]

Similarly, using (5.6) and Lemma 5.3, we have that, for any $j\in E_{n}(\varepsilon)$ ,

[TABLE]

for all $n\geq 4$ . Hence we have

[TABLE]

uniformly in $-\tfrac{1}{4}\log n\leq\delta\leq 0$ , which is the required result. ∎

Our proof of the lower bound above is actually very similar to that of Diaconis and Shahshahani: we have obtained an improved result by using Lemma 5.3, giving Gaussian concentration for $X(r_{n}(\delta))$ , instead of appealing to Chebyshev’s inequality.

We now turn to the upper bound. We start by using Lemma 5.3 to show that, for a long period beyond time $r_{n}(0)=\lfloor\frac{1}{4}n\log n\rfloor$ , the process $X(r)$ is unlikely to leave an interval of width $C\sqrt{n\log n}$ around $n/2$ .

Lemma 5.4.

For $n\geq 2e^{4}$ , any $s\in{\mathbb{Z}}_{+}$ , and any starting state $j$ ,

[TABLE]

Proof.

For $t\geq r_{n}(0)=\lfloor\frac{1}{4}n\log n\rfloor$ and any starting state $j$ , we have from (5.4) that

[TABLE]

for all $n\geq 5$ .

Therefore, at times $r_{n}(0)+r$ , $r\geq 0$ , for any starting state $j$ and for $n\geq 5$ , we have

[TABLE]

Combining this with Lemma 5.3, we have for $c\leq 3\sqrt{n}/4$ ,

[TABLE]

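We apply this inequality with $c=4\sqrt{\tfrac{1}{2}\log(n/2)}-3/4$, which is greater than $\sqrt{6\log(n/2)}$ for $n>2e^{4}$ (since then $\sqrt{\log(n/2)}>2$, so that $3/4<\tfrac{3}{8}\sqrt{\log(n/2)}$ and hence $c>(2\sqrt{2}-3/8)\sqrt{\log(n/2)}$, while $2\sqrt{2}-3/8>\sqrt{6}$), and deduce that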

[TABLE]

The required result now follows. ∎
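The numerical facts used in this proof are easy to confirm. The sketch below (names ours) checks that $2\sqrt{2}-3/8>\sqrt{6}$ (the margin is only about $0.004$), that $c=4\sqrt{\tfrac12\log(n/2)}-3/4$ exceeds $\sqrt{6\log(n/2)}$ on a sample of integers $n>2e^{4}$, and that $c$ stays within the range $c\leq 3\sqrt{n}/4$ required by Lemma 5.3.

```python
import math


def lemma54_c(n):
    # c = 4 sqrt((1/2) log(n/2)) - 3/4, the value used in the proof of Lemma 5.4
    return 4.0 * math.sqrt(0.5 * math.log(n / 2.0)) - 0.75


def lemma54_target(n):
    # sqrt(6 log(n/2)), the level that c must exceed
    return math.sqrt(6.0 * math.log(n / 2.0))


N0 = math.ceil(2.0 * math.exp(4.0))  # smallest integer above 2 e^4
```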

We remark here that it would be relatively straightforward to complete the proof of cut-off at this point: we can exhibit a coupling between two copies of the chain both remaining close to $n/2$ , such that the distance between the two copies is stochastically dominated by a simple lazy random walk – such a proof would show quickly that the two copies coalesce by time $r_{n}(0)+\delta n$ with probability $1-O(\delta^{-1/2})$ . (A similar argument is used by Eskenazis and Nestoridi [8], based on a discrete-time analogue of Proposition 4.1.) In order to establish the bound (5.3), we need a more precise argument.

For the moment we assume, for simplicity of exposition, that $n=4k$ for some positive integer $k$ . We consider the walk $Y=Y^{(n)}$ defined by $Y(r)=X(r_{n}(0)+r)-n/2=X(r_{n}(0)+r)-2k$ , $r\geq 0$ , which describes the evolution of $X$ beyond the time $r_{n}(0)$ . The transitions of this walk are given by:

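$$\mathbb{P}\big(Y(r+1)=j+1\mid Y(r)=j\big)=\Big(\frac{1}{2}-\frac{j}{4k}\Big)^{2},\qquad\mathbb{P}\big(Y(r+1)=j-1\mid Y(r)=j\big)=\Big(\frac{1}{2}+\frac{j}{4k}\Big)^{2},$$
$$\mathbb{P}\big(Y(r+1)=j\mid Y(r)=j\big)=\frac{1}{2}-\frac{j^{2}}{8k^{2}},\tag{5.8}$$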

for $-2k\leq j\leq 2k$ .

At least when $j/4k$ is small, $Y$ has transition probabilities close to those of the simpler process ${\widetilde{Y}}:=({\widetilde{Y}}^{(n)}(r),\,r\geq 0)$ , with ${\widetilde{Y}}(0)=Y(0)$ , and transition probabilities given by

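$$\mathbb{P}\big(\widetilde{Y}(r+1)=j+1\mid\widetilde{Y}(r)=j\big)=\frac{k-j}{4k},\qquad\mathbb{P}\big(\widetilde{Y}(r+1)=j-1\mid\widetilde{Y}(r)=j\big)=\frac{k+j}{4k},$$
$$\mathbb{P}\big(\widetilde{Y}(r+1)=j\mid\widetilde{Y}(r)=j\big)=\frac{1}{2},\qquad -k\leq j\leq k.\tag{5.9}$$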

We shall use ${\widetilde{Y}}$ as a surrogate for $Y$ in the argument to come.
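To see how close the two walks are, note that with $n=4k$ the urn rule gives $Y$ the probabilities $(\tfrac{1}{2}-\tfrac{j}{4k})^{2}$ up and $(\tfrac{1}{2}+\tfrac{j}{4k})^{2}$ down, whereas $\widetilde{Y}$, in the Ehrenfest description used in the proof of Lemma 5.6, moves up with probability $(k-j)/(4k)$, down with probability $(k+j)/(4k)$, and stays with probability $1/2$. The Python sketch below (names ours) checks exactly that, for $|j|\leq k/2$, every ratio of corresponding transition probabilities differs from $1$ by at most $(j/k)^{2}$. This simple bound is our own and is not claimed to be the form of (5.10).

```python
from fractions import Fraction


def y_probs(k, j):
    """(up, down, stay) for Y(r) = X(r_n(0) + r) - 2k when n = 4k, from state j."""
    up = (Fraction(1, 2) - Fraction(j, 4 * k)) ** 2
    down = (Fraction(1, 2) + Fraction(j, 4 * k)) ** 2
    return up, down, 1 - up - down


def ytilde_probs(k, j):
    """(up, down, stay) for the Ehrenfest surrogate with 2k balls, from state j."""
    return Fraction(k - j, 4 * k), Fraction(k + j, 4 * k), Fraction(1, 2)


def max_relative_gap(k, j):
    # Largest |p_tilde / p - 1| over the three possible moves from state j.
    return max(abs(pt / p - 1) for p, pt in zip(y_probs(k, j), ytilde_probs(k, j)))
```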

The similarity of the transition probabilities (5.8) and (5.9), together with Lemma 5.4, is next used to show that, with high probability, the processes $Y$ and ${\widetilde{Y}}$ are almost indistinguishable for a long time.

For a sequence $y:=(y(r),\,r\geq 0)$ , we denote the initial segment up to time $s$ by $y([0,s]):=(y(0),y(1),\ldots,y(s))$ .

Lemma 5.5.

For $n=4k\geq 8000$ , and $\displaystyle s\leq\frac{k^{2}}{2500\log^{2}(2k)}$ , we have

[TABLE]

Proof.

For a sequence $y:=(y(r),\,r\geq 0)$ such that the $y(r)$ are integers with $|y(r)-y(r-1)|\leq 1$ for all $r\geq 1$ , let the likelihood ratio of the process ${\widetilde{Y}}$ compared to $Y$ on the segment $y([0,s])$ be given by

[TABLE]

For $k\geq 2000$ , we set $\varepsilon_{k}=5\sqrt{2\log(2k)/k}$ , and note that $\varepsilon_{k}\leq 1/2$ . If $|j|/k\leq\varepsilon_{k}$ , we then have, from the formulae for the transition probabilities, that

[TABLE]

so that, if $\Lambda(y([0,s]))\leq 2$ , it follows that

[TABLE]

Replacing $y$ by a path of $Y$ , we note that $(\Lambda(Y([0,s])),\,s\geq 0)$ is a martingale. Defining

[TABLE]

it follows from (5.11) that the quadratic variation of the martingale $\Lambda(Y([0,r]))$ until time $s\wedge\tau$ is at most $s\varepsilon_{k}^{4}$ . Since also ${\mathbb{E}}\Lambda(Y([0,s\wedge\tau]))=1$ , it follows from the Burkholder–Davis–Gundy inequality that

[TABLE]

Define the events $A_{s}$ and $B_{s}$ by

[TABLE]

Then

[TABLE]

and, on $B_{s}$ , $s=s\wedge\tau$ . Hence,

[TABLE]

From (5.12) and Kolmogorov’s inequality, and from Lemma 5.4, we have

[TABLE]

Hence, for $s\leq\varepsilon_{k}^{-4}$ , we have

[TABLE]

as required. ∎

Thus, with error at most $100s^{1/2}k^{-1}\log 2k$ , we can replace $Y([0,s])$ by ${\widetilde{Y}}([0,s])$ when calculating probabilities, and make only a small error if $s\ll(k/\log k)^{2}$ . Recalling that $n=4k$ , this means that the approximation of $Y$ by ${\widetilde{Y}}$ is asymptotically accurate over time intervals of length $o\bigl{(}(n/\log n)^{2}\bigr{)}$ .

We now use a coupling argument to show how fast ${\widetilde{Y}}$ converges to its equilibrium distribution ${\widetilde{\pi}}^{(k)}$ .

Lemma 5.6.

For any $k\geq 1$ and $r\geq 4k$ , we have

[TABLE]

Proof.

First, we note that the process ${\widetilde{Y}}$ can equivalently be described by way of a discrete Ehrenfest ball scheme. There are $2k$ balls, each of which is in state $0$ or $1$. At each step, a ball is chosen uniformly at random from the $2k$ balls, and its state is reset to $0$ or $1$, each with probability $1/2$, independently of the whole past of the process. If $k+j$ balls are in state $1$ and $k-j$ in state $0$ at step $r$, we say that ${\widetilde{Y}}(r)=j$; then the probabilities for ${\widetilde{Y}}(r+1)$ are easily seen to be given by (5.9), and its equilibrium distribution ${\widetilde{\pi}}^{(k)}$ to be $\operatorname{Bi}(2k,1/2)*\delta_{-k}$.
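The equilibrium claim for the Ehrenfest scheme can be verified exactly for small $k$: from state $j$ the scheme moves up with probability $(k-j)/(4k)$ (a ball in state $0$ is chosen and reset to $1$), down with probability $(k+j)/(4k)$, and stays with probability $1/2$. The Python sketch below (function names ours) checks that the shifted binomial law is invariant under one step.

```python
from fractions import Fraction
from math import comb


def ehrenfest_step(k, dist):
    """Push a law on {-k, ..., k} through one step of the 2k-ball scheme,
    where the state is (number of balls in state 1) - k."""
    new = {}
    for j, p in dist.items():
        moves = ((j + 1, Fraction(k - j, 4 * k)),   # a state-0 ball is reset to 1
                 (j - 1, Fraction(k + j, 4 * k)),   # a state-1 ball is reset to 0
                 (j, Fraction(1, 2)))               # the chosen ball keeps its state
        for target, q in moves:
            if q:
                new[target] = new.get(target, 0) + p * q
    return new


def shifted_binomial(k):
    # Bi(2k, 1/2) shifted by -k
    return {i - k: Fraction(comb(2 * k, i), 4 ** k) for i in range(2 * k + 1)}
```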

We now define a coupling of two copies ${\widetilde{Y}}^{1}$ and ${\widetilde{Y}}^{2}$ of the process ${\widetilde{Y}}$, with ${\widetilde{Y}}^{1}(0)\geq{\widetilde{Y}}^{2}(0)$. Pair the balls in the two processes so that those initially in state $1$ in ${\widetilde{Y}}^{2}$ are paired with balls in state $1$ in ${\widetilde{Y}}^{1}$, and those initially in state $0$ in ${\widetilde{Y}}^{1}$ are paired with balls in state $0$ in ${\widetilde{Y}}^{2}$; then pair the remaining ${\widetilde{Y}}^{1}(0)-{\widetilde{Y}}^{2}(0)$ balls of ${\widetilde{Y}}^{1}$ (all in state $1$) with the remaining balls of ${\widetilde{Y}}^{2}$ (all in state $0$). Couple the evolution by selecting one of these $2k$ pairs of balls uniformly at random at each step, and re-assigning its state independently (the new state being the same for both ${\widetilde{Y}}^{1}$ and ${\widetilde{Y}}^{2}$). Let $M(r)$ denote the number of pairs of balls that have not been drawn up to step $r$, made up of $M_{1}(r)$ pairs in state $1$ in both processes, $M_{0}(r)$ pairs in state $0$ in both processes, and $M_{2}(r)=M(r)-M_{1}(r)-M_{0}(r)$ pairs from the ${\widetilde{Y}}^{1}(0)-{\widetilde{Y}}^{2}(0)$ pairs of balls with differing initial states. Conditional on $M_{0}(r)$, $M_{1}(r)$ and $M_{2}(r)$, we have

[TABLE]

where $Z(r)$ has distribution $\operatorname{Bi}(2k-M(r),1/2)$ . Now, since the distribution $\operatorname{Bi}(m,1/2)$ is unimodal with mode $\lfloor m/2\rfloor$ , we have, for all $m\geq 1$ , that

[TABLE]

It follows that

[TABLE]

implying that

[TABLE]

Now ${\mathbb{P}}(M(r)>k)$ is at most the probability that all the $r$ draws come from some subset of $k$ of the $2k$ matched pairs of balls, and so, for $r\geq 4k$,

[TABLE]

We also have ${\mathbb{E}}M_{2}(r)=({\widetilde{Y}}^{1}(0)-{\widetilde{Y}}^{2}(0))(1-1/(2k))^{r}\leq({\widetilde{Y}}^{1}(0)-{\widetilde{Y}}^{2}(0))e^{-r/2k}$ . Hence, allowing either ordering of ${\widetilde{Y}}^{1}(0)$ and ${\widetilde{Y}}^{2}(0)$ , it follows from (5.14) that

[TABLE]

Setting ${\widetilde{Y}}^{1}(0)=Y(0)$ , and taking ${\widetilde{Y}}^{2}(0)\sim{\widetilde{\pi}}^{(k)}$ to be in equilibrium, we deduce, by taking expectations in (5.15), that

[TABLE]

as desired. ∎
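The two elementary estimates used for $M(r)$ in this proof can be checked directly. If $M(r)>k$, then fewer than $k$ distinct pairs have been drawn, so all $r$ draws lie in some set of $k$ pairs, and a union bound gives $\mathbb{P}(M(r)>k)\leq\binom{2k}{k}2^{-r}\leq 4^{k}2^{-r}$, which is at most $2^{-r/2}$ once $r\geq 4k$; and $(1-1/(2k))^{r}\leq e^{-r/(2k)}$ gives the bound on $\mathbb{E}M_{2}(r)$. This arithmetic is our own and the displayed bounds in the proof may be stated differently. The Python sketch below (names ours) also computes $\mathbb{P}(M(r)>k)$ exactly by inclusion–exclusion.

```python
from fractions import Fraction
from math import comb, exp


def surjections(r, d):
    # Number of maps from r labelled draws onto exactly d given pairs (inclusion-exclusion).
    return sum((-1) ** i * comb(d, i) * (d - i) ** r for i in range(d + 1))


def prob_many_undrawn(k, r):
    """Exact P(M(r) > k): fewer than k of the 2k pairs are hit by r uniform draws."""
    favourable = sum(comb(2 * k, d) * surjections(r, d) for d in range(k))
    return Fraction(favourable, (2 * k) ** r)


def union_bound(k, r):
    # All r draws fall inside one of the C(2k, k) subsets of k pairs.
    return Fraction(comb(2 * k, k), 2 ** r)


def undrawn_fraction(k, r):
    # (1 - 1/(2k))^r: probability that one given pair is never drawn in r steps.
    return (1 - Fraction(1, 2 * k)) ** r
```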

Proof of Theorem 5.2(b).

We combine Lemma 5.5 with Lemma 5.6, replacing $Y(r)$ by $X(r_{n}(0)+r)-n/2$ , to deduce that, for any $j$ ,

[TABLE]

where we have used (5.7) to reach the last inequality, provided $8000\leq 4k=n\leq r\leq n^{2}/\big(40000\log^{2}(n/2)\big)$.

The bound in (5.16) remains valid for any initial distribution; taking $X(0)\sim\pi^{(n)}$ , so that also $X(r_{n}(0))\sim\pi^{(n)}$ , this implies that

[TABLE]

also. (The bound above is valid for any $r$ , and is minimised for $r$ of order $n\log n$ . One could obtain a stronger bound, of order $n^{-1}$ , by direct computation, but this is rather delicate and the gain is not relevant to us.)

Hence, for $n$ a sufficiently large multiple of 4, and $n\leq r\leq\tfrac{1}{4}n\log n-n\log\log n$ , we have,

[TABLE]

This bound also holds trivially for $r\leq n$ . Taking $r=\delta n$ , this proves the result in the case where $n$ is a multiple of 4.

If $n$ is not divisible by $4$, the argument remains almost the same. Define $k:=\lfloor n/4\rfloor$, and set $Y(r):=X(r_{n}(0)+r)-2k$, as above. The transition probabilities for $Y$ are not quite as in (5.8), but they are very close, resulting only in an extra contribution of order $k^{-1}$ to the bounds in (5.10). This correction is of smaller order than $\varepsilon_{k}^{2}$, and can be absorbed into the bound (5.13) provided $k$ is sufficiently large. The rest of the proof is unchanged. ∎

Diaconis and Shahshahani [6], and other authors, actually consider a more general version, with boxes of unequal sizes. The first box initially contains $n^{\prime}$ red balls, and the second $2n-n^{\prime}$ black balls. The mixing process runs as before. Our approach can be used for this model as well. The jump probabilities for the process counting the number $X$ of red balls in the first box are again quadratic in the current state $j$ of the process. When evaluated close to the equilibrium mean $n^{\prime}p$, where $p:=n^{\prime}/(2n)$, these probabilities are close to the linear jump probabilities near equilibrium of another process ${\widetilde{Y}}$ consisting of $\ell$ balls, coloured red or black, with the following dynamics. At each time step, a ball is chosen. It is left with unchanged colour with probability $1-\theta$; otherwise, it is re-coloured red with probability $q$ and black with probability $1-q$, independently of everything else (so that its colour may in fact still be unchanged). Then ${\widetilde{Y}}(r)$ denotes the number of red balls at time $r$. The values of $\ell$, $\theta$ and $q$ that best match the original process are found to be

[TABLE]

note that, for $n^{\prime}=n$ , as previously, we have $p=1/2=q$ , $\theta=1$ and $\ell=\lfloor n/2\rfloor$ , corresponding to the approximation made before. With these modifications, an analogous argument can be carried out, to establish cut-off.

6. A two-host model of disease

Our next example is a two-dimensional Markov chain $\widehat{X}^{(n)}$ in continuous time, representing a two-host model of disease, in which transmission only occurs between one host type and the other (snails and human beings in schistosomiasis (Jordan, Webbe and Sturrock [16]), or males and females in sexually transmitted diseases (Hethcote and Yorke [14])). Our framework is appropriate for a disease that is not naturally endemic in a region, being supported at a low level through immigration from outside. In state ${\mathbf{x}}:=(x_{1},x_{2})^{T}\in{\mathbb{Z}}_{+}^{2}$, there are $x_{1}$ type-$1$ hosts and $x_{2}$ type-$2$ hosts infected. From any state ${\mathbf{x}}$, there are four possible transitions, whose rates are as follows:

[TABLE]

Here, $\alpha$ , $\beta$ , $\gamma$ , $\delta$ , $\mu$ and $\nu$ are fixed positive constants, and the parameter $n$ is a measure of the typical size of the infected population. The first transition corresponds to the infection of a type $1$ host, by a type $2$ host or from outside, and the second to the infection of a type $2$ host. The third transition corresponds to the recovery of a type $1$ host, and the fourth to the recovery of a type $2$ host. The infection transition rates are appropriate in circumstances in which the host population is so large that the reduction in infection rate caused by some of the population already being infected is negligible, or for diseases such as malaria, when ‘super-infection’ is possible: a host infected more than once is proportionately more infectious – in this case, $\bf x$ denotes the total number of infections of each type of host.

Let ${\mathbf{m}}(t):={\mathbf{m}}_{\mathbf{x}}(t):=n^{-1}\operatorname{\mathbb{E}{}}_{\mathbf{x}}\{\widehat{X}^{(n)}(t)\}$ , where $\operatorname{\mathbb{E}{}}_{\mathbf{x}},{\mathbb{P}}_{\mathbf{x}}$ and ${\mathcal{L}}_{\mathbf{x}}$ refer to the distribution conditional on $\widehat{X}^{(n)}(0)={\mathbf{x}}$ . It follows that ${\mathbf{m}}$ satisfies the differential equation $d{\mathbf{m}}/dt=A{\mathbf{m}}+{\mathbf{b}}$ , where

[TABLE]

with initial condition ${\mathbf{m}}(0)=n^{-1}{\mathbf{x}}$ . We define $R:=\alpha\beta/\gamma\delta$ , and assume from now on that $R<1$ , so that $A$ has both eigenvalues negative, and we denote them by ${-\rho>-\rho^{\prime}}$ , with corresponding unit (right) eigenvectors ${\bf v}$ and ${\bf v}^{\prime}$ . The differential equation has a non-trivial equilibrium at

$${\mathbf{c}}:=-A^{-1}{\mathbf{b}},$$

and its full solution is

$${\mathbf{m}}(t)={\mathbf{c}}+e^{At}\big({\mathbf{m}}(0)-{\mathbf{c}}\big),$$

showing that the equilibrium ${\mathbf{c}}$ is globally attractive when $R<1$ .
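A small numerical illustration of the mean dynamics follows. It is a sketch under stated assumptions: we take $A=\begin{pmatrix}-\gamma&\alpha\\ \beta&-\delta\end{pmatrix}$, which matches the transition-rate differences used in the proof of Proposition 6.3, and we use hypothetical rates and a hypothetical immigration vector $\mathbf{b}$. The code checks that $R<1$ gives two negative eigenvalues, computes $\mathbf{c}=-A^{-1}\mathbf{b}$, and confirms by crude Euler integration that $\mathbf{m}(t)$ approaches $\mathbf{c}$.

```python
import math

# Hypothetical rates with R = alpha*beta/(gamma*delta) = 1/6 < 1, and a hypothetical
# immigration vector b; A has the assumed form [[-gamma, alpha], [beta, -delta]].
alpha, beta, gamma, delta = 1.0, 0.5, 2.0, 1.5
b = (0.3, 0.2)
A = [[-gamma, alpha], [beta, -delta]]
R = alpha * beta / (gamma * delta)


def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 real matrix with real spectrum, larger one first."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(tr * tr / 4.0 - det)
    return tr / 2.0 + disc, tr / 2.0 - disc


def equilibrium(a, b):
    # c = -A^{-1} b, using the explicit inverse of a 2x2 matrix
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (-(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            -(a[0][0] * b[1] - a[1][0] * b[0]) / det)


def euler_mean(a, b, m0, t, steps=100_000):
    """Crude explicit Euler integration of dm/dt = A m + b from m(0) = m0."""
    h = t / steps
    m1, m2 = m0
    for _ in range(steps):
        d1 = a[0][0] * m1 + a[0][1] * m2 + b[0]
        d2 = a[1][0] * m1 + a[1][1] * m2 + b[1]
        m1, m2 = m1 + h * d1, m2 + h * d2
    return m1, m2


lam1, lam2 = eigenvalues_2x2(A)
c = equilibrium(A, b)
m_late = euler_mean(A, b, (2.0, 0.0), 20.0)
```

For these values the eigenvalues are $-1$ and $-2.5$, so $\rho=1$ and $\rho^{\prime}=2.5$.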

For any $n$ and any ${\mathbf{x}}\in{\mathbb{Z}}_{+}^{2}$ , we define the travel time from state ${\mathbf{x}}$ (to within $n^{-1/2}$ of ${\mathbf{c}}$ ) to be

[TABLE]

which, in view of (6.3), is therefore the infimum of times $t$ such that $|\operatorname{\mathbb{E}{}}_{\mathbf{x}}\{\widehat{X}^{(n)}(t)\}-n{\mathbf{c}}|\leq n^{1/2}$ .

For $0<\zeta<1$ , let

[TABLE]

We shall prove the following theorem.

Theorem 6.1.

Suppose that $R<1$ . Then, for any $0<\zeta<1$ , $\widehat{X}^{(n)}$ exhibits cut-off at $t_{n}({\mathbf{x}})$ on $E_{n}(\zeta)$ , with window width $1$ .

We first consider the problem of estimating $t_{n}({\mathbf{x}})$ for ${\mathbf{x}}\in E_{n}(\zeta)$ . Writing $n^{-1}{\mathbf{x}}-{\mathbf{c}}$ as a linear combination $\lambda{\mathbf{v}}+\lambda^{\prime}{\mathbf{v}}^{\prime}$ of the unit eigenvectors ${\mathbf{v}}$ and ${\mathbf{v}}^{\prime}$ of $A$ , we have

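$${\mathbf{m}}_{\mathbf{x}}(t)-{\mathbf{c}}=\lambda e^{-\rho t}{\mathbf{v}}+\lambda^{\prime}e^{-\rho^{\prime}t}{\mathbf{v}}^{\prime}.$$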

Then $t_{n}({\mathbf{x}})\sim\max\{\rho^{-1}\log(n^{1/2}|\lambda|),(\rho^{\prime})^{-1}\log(n^{1/2}|\lambda^{\prime}|)\}$.

For $\zeta\in(0,1)$ , there is a constant $L_{\zeta}$ such that, for all ${\mathbf{x}}\in E_{n}(\zeta)$ , $t_{n}({\mathbf{x}})\leq\frac{1}{2}\rho^{-1}\log n+L_{\zeta}$ . For “most” states in $E_{n}(\zeta)$ , there is a matching lower bound, but $t_{n}({\mathbf{x}})$ is as small as $\frac{1}{2}(\rho^{\prime})^{-1}\log n+O(1)$ when $\frac{1}{n}{\mathbf{x}}-{\mathbf{c}}$ is close to a multiple of ${\mathbf{v}}^{\prime}$ .

The rest of this section is devoted to a proof of Theorem 6.1: we give a brief road map of the proof here. Our basic plan is to apply Theorem 3.3 to our chain, showing concentration of measure for $\widehat{X}^{(n)}(t)$ while $t\leq t_{n}({\mathbf{x}})$ . To this end, we specify a suitable metric, and a Markovian coupling of two copies of the chain which is contracting in Wasserstein distance with respect to that metric. We show that the chain remains within a good set (where, in particular, the total transition rate is bounded) over a long time period. Then we apply Theorem 3.3 to each of the two coordinate projections, showing that both remain concentrated around their means for a long time. We deduce readily that the chain is far from its equilibrium for times less than $t_{n}({\mathbf{x}})$ . On the other hand, once the chain reaches a neighbourhood of $n{\mathbf{c}}$ , we can use Proposition 4.1 to show that it couples rapidly with an equilibrium copy of the chain, so the total variation distance to the equilibrium copy is small for times only slightly greater than $t_{n}({\mathbf{x}})$ .

The two left eigenvectors of $A$ can be written in the form $(1,\xi)$ , where $\xi$ is a solution of the equation $\delta-\alpha/\xi=\gamma-\beta\xi$ , with the common value $\delta-\alpha/\xi$ being minus the corresponding eigenvalue. This equation has one negative solution $\xi=\theta^{\prime}$ , corresponding to the eigenvalue $-\rho^{\prime}$ , and the other solution $\xi=\theta$ lying in the interval $(\alpha/\delta,\gamma/\beta)$ . Thus we have

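$$\rho=\gamma-\beta\theta=\delta-\frac{\alpha}{\theta},\qquad\rho^{\prime}=\gamma-\beta\theta^{\prime}=\delta-\frac{\alpha}{\theta^{\prime}}.$$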
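Multiplying $\delta-\alpha/\xi=\gamma-\beta\xi$ through by $\xi$ gives the quadratic $\beta\xi^{2}+(\delta-\gamma)\xi-\alpha=0$, whose roots have opposite signs because their product is $-\alpha/\beta<0$. The Python sketch below solves it for hypothetical parameter values with $R<1$, checks that $\theta$ lies in $(\alpha/\delta,\gamma/\beta)$, and, assuming $A=\begin{pmatrix}-\gamma&\alpha\\ \beta&-\delta\end{pmatrix}$, checks that $(1,\theta)$ is a left eigenvector of $A$ with eigenvalue $-\rho$. Function names are ours.

```python
import math

# Hypothetical rates with R = alpha*beta/(gamma*delta) = 1/6 < 1.
ALPHA, BETA, GAMMA, DELTA = 1.0, 0.5, 2.0, 1.5


def left_eigen_ratios(alpha, beta, gamma, delta):
    """Roots of beta*xi^2 + (delta - gamma)*xi - alpha = 0, i.e. of
    delta - alpha/xi = gamma - beta*xi; returns (theta, theta_prime)."""
    p = delta - gamma
    disc = math.sqrt(p * p + 4.0 * alpha * beta)
    return (-p + disc) / (2.0 * beta), (-p - disc) / (2.0 * beta)


def decay_rates(alpha, beta, gamma, delta):
    # rho = gamma - beta*theta and rho' = gamma - beta*theta'
    theta, theta_prime = left_eigen_ratios(alpha, beta, gamma, delta)
    return gamma - beta * theta, gamma - beta * theta_prime


def left_action(v, alpha, beta, gamma, delta):
    # Row vector v times A, with A = [[-gamma, alpha], [beta, -delta]] (assumed form)
    return (-gamma * v[0] + beta * v[1], alpha * v[0] - delta * v[1])


theta, theta_prime = left_eigen_ratios(ALPHA, BETA, GAMMA, DELTA)
rho, rho_prime = decay_rates(ALPHA, BETA, GAMMA, DELTA)
```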

We introduce the norm $\|\cdot\|_{\theta}$ on ${\mathbb{R}}^{2}$ , with

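$$\|{\mathbf{x}}\|_{\theta}:=|x_{1}|+\theta|x_{2}|,\qquad{\mathbf{x}}=(x_{1},x_{2})^{T}\in{\mathbb{R}}^{2}.$$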

We shall shortly prove that our chain has a contracting coupling with respect to the distance $\|{\mathbf{x}}-{\mathbf{y}}\|_{\theta}$ .

Next, we collect some elementary properties of the Markov chain $\widehat{X}^{(n)}$ . First, we note that, for $R<1$ , $\widehat{X}^{(n)}$ is a $2$ -type subcritical Markov branching process with immigration, and hence has an equilibrium distribution $\pi^{(n)}$ . Furthermore, since the process without immigration is sub-critical and has birth and death rates that do not depend on $n$ , whereas the immigration rates are multiples of $n$ , the mean of $\pi^{(n)}$ is $n{\mathbf{c}}$ , and its covariance matrix is of the form $n\Sigma$ , for $\Sigma$ not depending on $n$ (see, for example, Quine [28] (Theorem on p. 414 and Equation (29)) for analogues in discrete time).

Next, for use with Theorem 3.3, we show that the chain rarely gets too far from the origin, so that the total transition rate remains bounded. For $H>0$ , we define

$$D_{n}(H):=\{{\mathbf{x}}\in{\mathbb{Z}}_{+}^{2}:\ \|{\mathbf{x}}\|_{\theta}\leq nH\}.$$

Proposition 6.2.

Suppose that $R<1$ . Then there exist positive constants $C$ and $\psi$ , depending on the parameters of the model but not on $n$ , such that, for any $H\geq 4\|{\mathbf{b}}\|_{\theta}/\rho$ , any $n\in{\mathbb{N}}$ , any ${\mathbf{x}}\in D_{n}(H)$ , and any $T,w>0$ ,

[TABLE]

Proof.

Let $\mathcal{A}^{(n)}$ denote the generator of $\widehat{X}^{(n)}$ , and define $h_{\psi}({\mathbf{x}}):=\exp\{\psi\|{\mathbf{x}}\|_{\theta}\}$ . The first step is to show that, for sufficiently small positive $\psi$ , $(\mathcal{A}^{(n)}h_{\psi})({\mathbf{x}})<0$ for all ${\mathbf{x}}$ such that $\|{\mathbf{x}}\|_{\theta}$ is large enough.

Setting $g(s):=s^{-2}(e^{s}-1-s)$ for $s\neq 0$ , and $g(0)=1/2$ , we have:

[TABLE]
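The function $g$ is what turns each exponential increment into a quadratic term: $e^{s}=1+s+s^{2}g(s)$. Writing $g(s)=\int_{0}^{1}(1-u)e^{su}\,du$ shows that $g$ is increasing, so $g(s)\leq g(1)=e-2<1$ whenever $|s|\leq 1$, which is the bound used below. A quick numerical check in Python (the grid is our own choice):

```python
import math


def g(s):
    """g(s) = (e^s - 1 - s) / s^2 for s != 0 and g(0) = 1/2, so that
    e^s = 1 + s + s^2 g(s); equivalently g(s) = int_0^1 (1 - u) e^{su} du."""
    if s == 0:
        return 0.5
    return (math.expm1(s) - s) / (s * s)


GRID = [i / 100 for i in range(-100, 101)]
```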

We now see that

[TABLE]

We bound the $\psi^{2}$ term in the expression above by noting that $g(\pm\psi)$ and $g(\pm\theta\psi)$ are all at most 1, provided $\psi\leq 1/(1\vee\theta)$ (since $g$ is increasing and $g(1)=e-2<1$), and hence

[TABLE]

Hence, for $\psi\leq\min(1/(1\vee\theta),\frac{1}{2}\rho/(\alpha/\theta+\beta\theta^{2}+\gamma+\delta\theta))$ , we have

[TABLE]

which is non-positive whenever $\|{\mathbf{x}}\|_{\theta}\geq 4n\|{\mathbf{b}}\|_{\theta}/\rho$ .

Now fix some $H\geq 4\|{\mathbf{b}}\|_{\theta}/\rho$ , and some starting state ${\mathbf{x}}\in D_{n}(H)$ , so that $\|{\mathbf{x}}\|_{\theta}\leq nH$ and therefore $x_{1}\leq nH$ and $x_{2}\leq nH\theta^{-1}$ . Fix also some $w>0$ . We will show that the probability that $\widehat{X}^{(n)}$ ever exits the set $D_{n}(H+w)$ during a fixed time interval $[0,T]$ is very small for large $n$ .

We consider the excursions out of the set $D_{n}(H)$ during $[0,T]$. Note that, each time that $\widehat{X}^{(n)}$ enters $D_{n}(H)$, it remains there at least for the holding time of the state at which it first enters, which has an exponential distribution with mean at least $1/(nq(H))$, for

[TABLE]

This implies that the number of exits of $\widehat{X}^{(n)}$ from $D_{n}(H)$ in $[0,T]$ is stochastically dominated by a Poisson random variable with mean $nTq(H)$ .

We claim that, each time that $\widehat{X}^{(n)}$ leaves $D_{n}(H)$ , the probability that $\|\widehat{X}^{(n)}\|_{\theta}$ exceeds the value $n(H+w)$ before $\widehat{X}^{(n)}$ returns to $D_{n}(H)$ is exponentially small in $n$ . To prove this, consider starting in some state $\bf y$ which can be reached in one step from $D_{n}(H)$ , so that $\|{\mathbf{y}}\|_{\theta}\leq nH+(1\vee\theta)$ , and let

[TABLE]

In view of (6.6), $h_{\psi}(\widehat{X}^{(n)}(t\wedge\tau_{1}))$ is a non-negative supermartingale in $t\geq 0$ . Stopping at $\min\{\tau_{2},\tau_{1}\}$ , it thus follows that,

[TABLE]

from which it follows that

[TABLE]

It follows that the expected number of times that $\widehat{X}^{(n)}$ exits $D_{n}(H+w)$ in the interval $[0,T]$ is at most $nTq(H)e^{\psi(1\vee\theta)}e^{-n\psi w}$ , establishing the proposition. ∎

We now introduce a Markovian coupling of two copies of the Markov chain $\widehat{X}^{(n)}$ , which we will then show to be contracting with respect to the metric $d({\mathbf{x}},{\mathbf{y}})=\|{\mathbf{x}}-{\mathbf{y}}\|_{\theta}$ on ${\mathbb{Z}}_{+}^{2}$ . In this coupling, the two copies $U^{(n)}$ and $V^{(n)}$ make moves independently in any co-ordinate where they currently differ (so in particular the two copies a.s. never move together in such a co-ordinate), but make moves together as far as possible in co-ordinates where they currently agree.

For each ${\bf J}\in\mathcal{J}:=\{(1,0)^{T},(0,1)^{T},(-1,0)^{T},(0,-1)^{T}\}$, we denote the transition rate of $\widehat{X}^{(n)}$ from ${\mathbf{x}}$ to ${\mathbf{x}}+{\bf J}$, given in (6.1), by $r_{\bf J}({\mathbf{x}})$. We then couple copies $U^{(n)}$ and $V^{(n)}$ of $\widehat{X}^{(n)}$ as follows.

Suppose that $U^{(n)}(t)={\mathbf{u}}$ and $V^{(n)}(t)={\mathbf{v}}$ . If $u_{1}\not=v_{1}$ , then for ${\bf J}=(1,0)^{T}$ or $(-1,0)^{T}$ , there is a transition to $({\mathbf{u}}+{\bf J},{\mathbf{v}})$ at rate $r_{\bf J}({\mathbf{u}})$ , and a transition to $({\mathbf{u}},{\mathbf{v}}+{\bf J})$ at rate $r_{\bf J}({\mathbf{v}})$ . If $u_{1}=v_{1}$ , then there is a transition to $({\mathbf{u}}+{\bf J},{\mathbf{v}}+{\bf J})$ at rate $\min(r_{\bf J}({\mathbf{u}}),r_{\bf J}({\mathbf{v}}))$ , a transition to $({\mathbf{u}}+{\bf J},{\mathbf{v}})$ at rate $\max(0,r_{\bf J}({\mathbf{u}})-r_{\bf J}({\mathbf{v}}))$ , and a transition to $({\mathbf{u}},{\mathbf{v}}+{\bf J})$ at rate $\max(0,r_{\bf J}({\mathbf{v}})-r_{\bf J}({\mathbf{u}}))$ . The transitions in directions $(0,1)^{T}$ and $(0,-1)^{T}$ are defined analogously.
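The coupling can be implemented directly, and the contraction asserted in Proposition 6.3 checked on a small grid of states. This is a sketch under stated assumptions: the rates are taken as $r_{(1,0)^{T}}(\mathbf{x})=\alpha x_{2}+nb_{1}$, $r_{(0,1)^{T}}(\mathbf{x})=\beta x_{1}+nb_{2}$, $r_{(-1,0)^{T}}(\mathbf{x})=\gamma x_{1}$ and $r_{(0,-1)^{T}}(\mathbf{x})=\delta x_{2}$, whose slopes match the rate differences used in the proof below, while the immigration terms $nb_{1},nb_{2}$ and all parameter values are hypothetical. For these values $\theta=2$ and $\rho=1$, and the code evaluates the generator of the coupled process applied to $d$ and compares it with $-\rho\,d$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters (alpha, beta, gamma, delta, b1, b2) with R = 1/6 < 1;
# for these rates theta = 2 and rho = 1.
PARAMS = (F(1), F(1, 2), F(2), F(3, 2), F(3, 10), F(1, 5))
THETA, RHO, N = F(2), F(1), 10
JUMPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]
GRID = list(product(range(5), repeat=2))


def rates(x, n):
    """Assumed rate function r_J(x): slopes as in the proof of Proposition 6.3,
    immigration terms n*b1 and n*b2 hypothetical."""
    alpha, beta, gamma, delta, b1, b2 = PARAMS
    return {(1, 0): alpha * x[1] + n * b1, (0, 1): beta * x[0] + n * b2,
            (-1, 0): gamma * x[0], (0, -1): delta * x[1]}


def dist(u, v):
    # d(u, v) = ||u - v||_theta = |u1 - v1| + theta |u2 - v2|
    return abs(u[0] - v[0]) + THETA * abs(u[1] - v[1])


def coupled_moves(u, v, n):
    """(new u, new v, rate) triples: independent moves in a co-ordinate where
    u and v differ, maximal joint moves in a co-ordinate where they agree."""
    ru, rv = rates(u, n), rates(v, n)
    moves = []
    for J in JUMPS:
        i = 0 if J[0] != 0 else 1
        uJ, vJ = (u[0] + J[0], u[1] + J[1]), (v[0] + J[0], v[1] + J[1])
        if u[i] != v[i]:
            moves += [(uJ, v, ru[J]), (u, vJ, rv[J])]
        else:
            common = min(ru[J], rv[J])
            moves += [(uJ, vJ, common), (uJ, v, ru[J] - common), (u, vJ, rv[J] - common)]
    return moves


def drift_of_distance(u, v, n):
    # Generator of the coupled process applied to d, evaluated at (u, v).
    d0 = dist(u, v)
    return sum(rate * (dist(a, b) - d0) for a, b, rate in coupled_moves(u, v, n))
```

The immigration terms cancel in every rate difference, which is why the contraction constant does not depend on $n$.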

Proposition 6.3.

The coupling defined above for $\widehat{X}^{(n)}$ is contracting with respect to the metric $d({\mathbf{x}},{\mathbf{y}})=\|{\mathbf{x}}-{\mathbf{y}}\|_{\theta}$ , with constant $\rho$ .

Proof.

If both chains make the same transition at $t$ , then the distance between them does not change: $d(U^{(n)}(t),V^{(n)}(t))=d(U^{(n)}(t-),V^{(n)}(t-))$ . Otherwise, the distance changes by $\pm 1$ as a result of a jump by either copy in either $1$ -direction, or by $\pm\theta$ as a result of a jump by either copy in either $2$ -direction.

Let the generator of the process $(U^{(n)},V^{(n)})$ be denoted by ${\widehat{\mathcal{A}}}^{(n)}$ . We start by looking at the contribution of the $(-1,0)^{T}$ jumps to $({\widehat{\mathcal{A}}}^{(n)}d)({\mathbf{u}},{\mathbf{v}})$ . If $u_{1}=v_{1}$ , then $r_{(-1,0)^{T}}({\mathbf{u}})=\gamma u_{1}=\gamma v_{1}=r_{(-1,0)^{T}}({\mathbf{v}})$ , so the two chains always make this transition together, contributing no change to the distance. If $u_{1}>v_{1}$ , then the $(-1,0)^{T}$ jump in $U^{(n)}$ occurs at rate $\gamma u_{1}$ and reduces the distance by 1, while the $(-1,0)^{T}$ jump in $V^{(n)}$ occurs at rate $\gamma v_{1}$ and increases the distance by 1: overall, the net contribution is $-\gamma|u_{1}-v_{1}|$ . The same calculation applies if $u_{1}<v_{1}$ , so in all cases the contribution of this jump is $-\gamma|u_{1}-v_{1}|$ . Similarly, the contribution of the $(0,-1)^{T}$ jump is $-\delta\theta|u_{2}-v_{2}|$ .

We now turn to the $(1,0)^{T}$ jump. If $u_{1}=v_{1}$ , the distance increases by 1 whenever one chain makes this jump and the other does not, which occurs at rate $|r_{(1,0)^{T}}({\mathbf{u}})-r_{(1,0)^{T}}({\mathbf{v}})|=\alpha|u_{2}-v_{2}|$ . If $u_{1}\not=v_{1}$ , a $(1,0)^{T}$ jump in one of the chains increases the distance by 1, while the same jump in the other chain decreases the distance by 1, so the net contribution from this jump is at most $|r_{(1,0)^{T}}({\mathbf{u}})-r_{(1,0)^{T}}({\mathbf{v}})|$ , which is again equal to $\alpha|u_{2}-v_{2}|$ . Similarly, the contribution of the $(0,1)^{T}$ jump is at most $\beta\theta|u_{1}-v_{1}|$ .
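The four contributions can be collected as follows. This assumes $\|{\mathbf{x}}\|_{\theta}=|x_{1}|+\theta|x_{2}|$, which is consistent with the jumps of size 1 and $\theta$ noted at the start of the proof. Display (6.4) is not reproduced here, so the identification of an admissible $\rho$ in the last line is indicative only.

```latex
\begin{aligned}
({\widehat{\mathcal{A}}}^{(n)}d)(\mathbf{u},\mathbf{v})
 &\le -\gamma|u_{1}-v_{1}| - \delta\theta|u_{2}-v_{2}| + \alpha|u_{2}-v_{2}| + \beta\theta|u_{1}-v_{1}| \\
 &= -(\gamma-\beta\theta)\,|u_{1}-v_{1}| - \bigl(\delta-\alpha\theta^{-1}\bigr)\,\theta|u_{2}-v_{2}| \\
 &\le -\min\bigl(\gamma-\beta\theta,\ \delta-\alpha\theta^{-1}\bigr)\,\|\mathbf{u}-\mathbf{v}\|_{\theta}.
\end{aligned}
```

Any $\rho>0$ no larger than this minimum therefore gives $({\widehat{\mathcal{A}}}^{(n)}d)(\mathbf{u},\mathbf{v})\le-\rho\, d(\mathbf{u},\mathbf{v})$.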

Referring to (6.4), it follows that, for all states ${\mathbf{u}},{\mathbf{v}}$ ,

[TABLE]

as required. ∎

We will now apply Theorem 3.3 to the Markov chain $\widehat{X}^{(n)}$ , with $f({\mathbf{x}})$ either of the two co-ordinate projections $f_{1}({\mathbf{x}})=x_{1}$ or $f_{2}({\mathbf{x}})=x_{2}$ . We fix some $0<\zeta<1$ , and note that, for any ${\mathbf{x}}\in E_{n}(\zeta)$ , we have $|{\mathbf{x}}-n{\mathbf{c}}|\leq n/\zeta$ , and therefore

[TABLE]

Now we take $H=\max((1\vee\theta)(1/\zeta+|{\mathbf{c}}|),4\|{\mathbf{b}}\|_{\theta}/\rho)$ , so that $E_{n}(\zeta)\subseteq D_{n}(H)$ , and apply Proposition 6.2 with $w=H$ . We see that, for any ${\mathbf{x}}\in E_{n}(\zeta)$ and any $T>0$ , the probability that the chain exits the set $D_{n}(2H)$ before time $T$ is at most $CnTe^{-n\psi H}$ , for some constants $C$ and $\psi$ . To apply Theorem 3.3, we take ${\widehat{S}}=D_{n}(2H)$ , and note that, for ${\mathbf{y}}\in{\widehat{S}}$ , the total transition rate $-\widehat{Q}({\mathbf{y}},{\mathbf{y}})$ out of state ${\mathbf{y}}$ is at most $q:=n\bigl{[}\mu+\nu+2(\theta^{-1}(\alpha+\delta)+\beta+\gamma)H\bigr{]}$ . If $f$ is the first co-ordinate projection $f_{1}$ , we have $|f_{1}({\mathbf{x}})-f_{1}({\mathbf{y}})|\leq\|{\mathbf{x}}-{\mathbf{y}}\|_{\theta}$ , so we may take $L=1$ ; for $f=f_{2}$ , we need $L=1/\theta$ instead. We may also take $D=1\vee\theta$ .
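The two Lipschitz constants can be read off directly. This again assumes $\|{\mathbf{x}}\|_{\theta}=|x_{1}|+\theta|x_{2}|$, as suggested by the jump sizes in the proof of Proposition 6.3.

```latex
|f_{1}(\mathbf{x})-f_{1}(\mathbf{y})| = |x_{1}-y_{1}| \le |x_{1}-y_{1}|+\theta|x_{2}-y_{2}| = \|\mathbf{x}-\mathbf{y}\|_{\theta},
\qquad
|f_{2}(\mathbf{x})-f_{2}(\mathbf{y})| = \theta^{-1}\bigl(\theta|x_{2}-y_{2}|\bigr) \le \theta^{-1}\|\mathbf{x}-\mathbf{y}\|_{\theta}.
```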

Theorem 3.3 now tells us that, for $i=1,2$ , all $t>0$ , all $c>0$ and all ${\mathbf{x}}\in E_{n}(\zeta)$ ,

[TABLE]

where

[TABLE]

Thus, for some constant $b$ depending on the parameters of the model and on $\zeta$ , and all $c\leq\varepsilon\sqrt{n}$ , where $\varepsilon>0$ is sufficiently small, we have

[TABLE]

for $i=1,2$ , all $t>0$ and all ${\mathbf{x}}\in E_{n}(\zeta)$ .

Moreover, for a suitable constant $K$ , $t\leq n$ , and $c\leq\varepsilon\sqrt{n}$ for some sufficiently small $\varepsilon>0$ ,

[TABLE]

From (6.9) and (6.10), it now follows that, for $0<t\leq n$ , ${\mathbf{x}}\in E_{n}(\zeta)$ , and $c\leq\varepsilon\sqrt{n}$ ,

[TABLE]

for suitable constants $b$ , $\varepsilon$ and $K$ , depending on the parameters of the model and on the choice of $\zeta$ .

We are now in a position to prove cut-off for our model.

Proof of Theorem 6.1. A lower bound on the mixing time can now easily be proved, much as in the previous example, by considering the distribution of $\widehat{X}^{(n)}(t_{n}({\bf x})-s)$ , for $s>0$ . Let ${\kappa}>0$ , depending on the parameters of the model, be such that

[TABLE]

By (6.3) and the definition of $t_{n}(\cdot)$ , we have

[TABLE]

Therefore, using (6.12),

[TABLE]

Let $B_{s}:=\{{\mathbf{w}}\in{\mathbb{Z}}_{+}^{2}\colon|{\mathbf{w}}-n{\mathbf{c}}|\leq\tfrac{1}{2}\kappa n^{1/2}e^{\rho s}\}$ . Then, from (6.11) with $c=\frac{1}{4}\kappa e^{\rho s}$ , noting that $t_{n}({\mathbf{x}})\leq n$ for ${\mathbf{x}}\in E_{n}(\zeta)$ provided $n$ is sufficiently large, we have

[TABLE]

On the other hand, as stated in the discussion before Proposition 6.2, the covariance matrix of the equilibrium distribution of $\widehat{X}^{(n)}$ is of the form $n\Sigma$ , with $\Sigma$ being independent of $n$ . It hence follows, using Chebyshev’s inequality, that $\pi^{(n)}(B_{s})\geq 1-4\kappa^{-2}ve^{-2\rho s}$ , with $v:={\rm Tr\,}(\Sigma)$ .
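The Chebyshev step can be spelled out. This assumes that the equilibrium mean of $\widehat{X}^{(n)}$ is $n{\mathbf{c}}$ and that $|\cdot|$ is the Euclidean norm, so that $\mathbb{E}|{\mathbf{w}}-n{\mathbf{c}}|^{2}$ is the trace of the covariance matrix.

```latex
\pi^{(n)}(B_{s}^{c})
 = \pi^{(n)}\Bigl(|\mathbf{w}-n\mathbf{c}| > \tfrac12\kappa n^{1/2}e^{\rho s}\Bigr)
 \le \frac{\mathbb{E}_{\pi^{(n)}}|\mathbf{w}-n\mathbf{c}|^{2}}{\tfrac14\kappa^{2} n e^{2\rho s}}
 = \frac{{\rm Tr\,}(n\Sigma)}{\tfrac14\kappa^{2} n e^{2\rho s}}
 = 4\kappa^{-2} v e^{-2\rho s}.
```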

This then gives, for a suitable constant $K^{\prime}$ and $s$ and $n$ sufficiently large,

[TABLE]

for any ${\mathbf{x}}\in E_{n}(\zeta)$ . This establishes the first part of the definition of cut-off in (1.1).

We now turn to the upper bound. We will apply Proposition 4.1 to the Markov chain $(U^{(n)},V^{(n)})$ , where $U^{(n)}$ is a copy of $\widehat{X}^{(n)}$ started close to $n{\mathbf{c}}$ , $V^{(n)}$ is another copy in equilibrium, and the pair are coupled as in Proposition 6.3. We use the proposition to show that, with high probability, coalescence occurs quickly.

Consider a copy $U^{(n)}$ of $\widehat{X}^{(n)}$ starting from state ${\mathbf{x}}$ and couple it with an equilibrium copy $V^{(n)}$ , as in Proposition 6.3. For any fixed $\varepsilon>0$ , we choose $c{=c(\varepsilon)}$ so that $(4+K)e^{-bc^{2}}\leq\varepsilon/4$ , and use (6.11) and (6.13) to conclude that

[TABLE]

and similarly for the equilibrium copy $V^{(n)}(t_{n}({\mathbf{x}}))$ . Therefore, with probability at least $1-\varepsilon/2$ , we have

[TABLE]

We are now in a position to apply Proposition 4.1 to the function $\|U^{(n)}(t_{n}({\mathbf{x}})+s)-V^{(n)}(t_{n}({\mathbf{x}})+s)\|_{\theta}$ , for $s\geq 0$ . Condition (i) of the proposition is satisfied by Proposition 6.3, and condition (ii) is satisfied with $B=1\vee\theta$ . For condition (iii), note that, if ${\bf u}\not={\bf v}$ , each of the chains moves while the other does not – and so the distance between the two chains changes by at least $1\wedge\theta$ – at rate at least $(\mu\wedge\nu)n$ . Hence the generator of the quadratic variation process is at least $\sigma^{2}:=(1\wedge\theta)^{2}(\mu\wedge\nu)n$ from all states where coalescence has not occurred.

Proposition 4.1 then implies that, on the event that $\|U^{(n)}(t_{n}({\mathbf{x}}))-V^{(n)}(t_{n}({\mathbf{x}}))\|_{\theta}\leq 2(c(\varepsilon)+1)(1\vee\theta)n^{1/2}$ , the probability that coalescence has not occurred by time $s$ is at most

[TABLE]

where $\varphi(\varepsilon):=\frac{4(c(\varepsilon)+1)(1\vee\theta)}{(1\wedge\theta)\sqrt{\mu\wedge\nu}}$ . For $s=s(\varepsilon)=4\varphi(\varepsilon)^{2}/\varepsilon^{2}$ , we conclude that

[TABLE]

Since $V^{(n)}(t_{n}({\mathbf{x}})+s(\varepsilon))$ is in equilibrium, it follows that

[TABLE]

as required for the second part of the definition of cut-off in (1.1). $\square$
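The choice of $c(\varepsilon)$, $\varphi(\varepsilon)$ and $s(\varepsilon)$ in this proof is explicit and can be traced numerically. In the sketch below, $b$ and $K$ stand for the model-dependent constants of the concentration bound for $\widehat{X}^{(n)}$. They are treated as given inputs because their values are not computed in the text. The parameter values in the test are arbitrary, and the function name is illustrative.

```python
import math


def cutoff_window(eps: float, b: float, K: float, theta: float,
                  mu: float, nu: float):
    """Return (c(eps), phi(eps), s(eps)) as chosen in the proof of Theorem 6.1.

    b and K are the model-dependent constants of the concentration bound;
    they are inputs here because their values are not computed in the text.
    """
    # Smallest c with (4 + K) * exp(-b c^2) <= eps / 4.
    c = math.sqrt(math.log(4.0 * (4.0 + K) / eps) / b)
    phi = (4.0 * (c + 1.0) * max(1.0, theta)
           / (min(1.0, theta) * math.sqrt(min(mu, nu))))
    s = 4.0 * phi ** 2 / eps ** 2
    return c, phi, s
```

Nothing in this recipe depends on $n$. This is why the window $s(\varepsilon)$ stays bounded as $n\to\infty$.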

7. Supermarket model

In this section, we apply our general continuous-time inequality, Theorem 3.1, to a range of instances of the supermarket model. This is a simple and natural model of a queuing system, introduced by Mitzenmacher [25] and Vvedenskaya, Dobrushin and Karpelevich [31], and studied extensively since; see, for instance, [21] and [2], which contain further references to the related literature.

The supermarket model (in continuous time) with parameters $(n,d,\lambda)$ ( $n$ and $d$ natural numbers, $\lambda\in(0,1)$ ) is defined as follows. There are $n$ servers, each with their own queue of customers, and customers arrive according to a Poisson process with rate $\lambda n$ . Each arriving customer inspects the queues for $d$ of the servers, chosen uniformly at random with replacement, and joins one of the shortest queues among these $d$ ; customers cannot subsequently switch to a different queue. At each server, customer service times are iid exponential of mean 1.

The memoryless property of the arrival and service processes means that the supermarket model can be viewed as a continuous-time jump Markov chain, whose state space is the set ${\mathbb{Z}}_{+}^{n}$ of possible $n$ -tuples of queue lengths. The possible transitions are of two types: (i) departures, where each queue of positive length is shortened by one at rate $1$ , and (ii) arrivals, at total rate $\lambda n$ , where some queue, chosen by the procedure described above, is lengthened by 1. To be precise, on an arrival, an ordered $d$ -tuple of queues is chosen uniformly at random from all the $n^{d}$ possibilities, and the first shortest queue in the list receives the arriving customer and is thus lengthened by 1.
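The following is a minimal simulation sketch of the continuous-time model. It uses the equivalent description given later in this section: events occur at total rate $(1+\lambda)n$, and each event is an arrival with probability $\lambda/(1+\lambda)$ and otherwise a potential departure from a uniformly chosen queue. The function names and the parameter values in the test are illustrative.

```python
import random
from typing import List


def apply_event(queues: List[int], d: int, lam: float, rng: random.Random) -> None:
    """Apply one event of the uniformised supermarket chain, in place."""
    n = len(queues)
    if rng.random() < lam / (1.0 + lam):
        # Arrival: inspect an ordered d-tuple of queues, chosen uniformly with
        # replacement; min() returns the first minimal entry, i.e. the first
        # shortest queue in the list.
        picks = [rng.randrange(n) for _ in range(d)]
        queues[min(picks, key=lambda i: queues[i])] += 1
    else:
        # Potential departure from a uniformly chosen queue.
        i = rng.randrange(n)
        if queues[i] > 0:
            queues[i] -= 1


def simulate(n: int, d: int, lam: float, T: float, seed: int = 0) -> List[int]:
    """Queue lengths at time T, starting from all queues empty."""
    rng = random.Random(seed)
    queues = [0] * n
    total_rate = (1.0 + lam) * n                 # events occur at rate (1+lam)n
    t = rng.expovariate(total_rate)
    while t <= T:
        apply_event(queues, d, lam, rng)
        t += rng.expovariate(total_rate)
    return queues
```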

Much of the initial interest in the supermarket model stemmed from its properties as a “low-cost” load-balancing mechanism: for $\lambda<1$ a constant, the maximum queue length in equilibrium is of order $\log n$ when $d=1$ , but of order $\log\log n$ when $d$ is a constant at least 2. In [2] and this paper, we are interested in different ranges of parameters, where $\lambda$ tends to 1 from below as $n\to\infty$ , while $d$ tends to infinity. In these ranges, as shown in [2], the load-balancing among the servers in equilibrium is close to perfect – the maximum queue length is a given constant $k$ with high probability, and most queues have length exactly $k$ – even though the system is nearly at full capacity.

For the rest of this section, as in [2], we set $\lambda=1-n^{-\alpha}$ , and $d=n^{\beta}$ , where $\alpha$ and $\beta$ are fixed constants in $(0,1)$ . We will assume throughout that

[TABLE]

For $\beta\leq 1/3$ , the corresponding range of $\alpha$ is thus $(\beta,2\beta)$ ; for $1/3\leq\beta<1$ , the corresponding range for $\alpha$ is $(\beta,(1+\beta)/2)$ . Other parameter ranges come into the scope of [2] and, with a little more work, we could prove concentration results for those too.
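Read together, the two cases give the admissible range $(\beta,\min(2\beta,(1+\beta)/2))$ for $\alpha$, because $2\beta\le(1+\beta)/2$ exactly when $\beta\le1/3$. The helper below assumes that (7.1) describes precisely these ranges; the display itself is not reproduced here, and the function names are illustrative.

```python
def alpha_range(beta: float) -> tuple:
    """Open interval of admissible alpha for a given beta:
    (beta, 2*beta) if beta <= 1/3, and (beta, (1+beta)/2) if beta >= 1/3."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return beta, min(2.0 * beta, (1.0 + beta) / 2.0)


def in_range(alpha: float, beta: float) -> bool:
    """True if (alpha, beta) lies in the range described after (7.1)."""
    lo, hi = alpha_range(beta)
    return lo < alpha < hi
```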

Theorem 6.1 from [2] gives the general behaviour of the model in a variety of ranges, including this one (referring to that theorem, assumptions (7.1) are equivalent to setting $k=2$ ). The basic result is that, in equilibrium, the chain lies in a “good set” where all queues have length at most 2, with very high probability; it also states that, if the chain is started anywhere within an “interior good set”, then with high probability it remains in the good set for a long period of time. We first set up notation, and then state the part of the result covering our range.

In fact, the model analysed in [2] is a discrete-time variant of the continuous-time model studied here. In that variant, the transition at each time-step is an arrival with probability $\lambda/(1+\lambda)$ and a potential departure with probability $1/(1+\lambda)$ . If the transition is an arrival, a queue is chosen as in the continuous-time version, and the length of that queue is increased by 1. If the transition is a potential departure, a queue is chosen uniformly at random, and the length of that queue is decreased by 1 if it is not empty. If an empty queue is chosen for a departure, then the chain remains in its current state. An alternative description of the continuous-time model is that events occur according to a Poisson process with rate $(1+\lambda)n$ , and the transition associated with an event is chosen as for the discrete-time model above. A consequence is that the two models have the same equilibrium distribution, and if the probability that the chain remains in some set $S$ of states for $k$ steps of the discrete chain is at least $q$ , then the probability that the chain remains in $S$ up to time $k/4n$ in the continuous model is at least $q$ minus the probability that a Poisson random variable with mean $k(1+\lambda)/4\leq k/2$ is greater than $k$ , which is at least $q-e^{-k/8}$ . Similarly, provided $\lambda\geq 1/2$ , if the total variation distance between the discrete-time supermarket model after $k$ or more steps and the equilibrium distribution is at most $p$ , then the total variation distance between the continuous-time model and the equilibrium distribution is at most $p+e^{-k/16}$ for all times at least $k/n$ .
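The two Poisson estimates can be checked numerically. Over time $k/4n$, the number of events is Poisson with mean $(1+\lambda)k/4\le k/2$. Over any time of at least $k/n$, it is Poisson with mean at least $(1+\lambda)k\ge 3k/2$ when $\lambda\ge1/2$. The upper tail increases with the mean and the lower tail decreases with it, so it suffices to check the extreme means. The sketch below is only a sanity check over a finite range of $k$. Both bounds in fact hold for every $k$, by the standard Chernoff bound for the Poisson distribution.

```python
import math


def log_pmf(mean: float, j: int) -> float:
    """log P(Poisson(mean) = j)."""
    return j * math.log(mean) - mean - math.lgamma(j + 1)


def upper_tail(mean: float, k: int) -> float:
    """P(Poisson(mean) > k), summed directly (no 1 - cdf cancellation)."""
    top = k + 1 + int(10 * mean + 100)           # generous truncation point
    return sum(math.exp(log_pmf(mean, j)) for j in range(k + 1, top))


def lower_tail(mean: float, k: int) -> float:
    """P(Poisson(mean) < k)."""
    return sum(math.exp(log_pmf(mean, j)) for j in range(k))
```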

For $n\in{\mathbb{N}}$ , a state $x$ in ${\mathbb{Z}}_{+}^{n}$ , and $j\in{\mathbb{N}}$ , let $u_{j}(x)$ denote the proportion of queues in $x$ of length at least $j$ . Let $\varepsilon=\varepsilon(n)$ be any function such that $\varepsilon\leq 1/100$ and $\varepsilon(n)^{-1}=o(n^{\delta})$ for every $\delta>0$ . For $n\in{\mathbb{N}}$ , and $\alpha$ and $\beta$ satisfying the inequalities in (7.1), let $\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ be the set of states $x$ such that:

[TABLE]

A state $x$ in $\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ will thus have between $(1-6\varepsilon)n^{1-\alpha}$ and $(1+6\varepsilon)n^{1-\alpha}$ empty queues, and between $(1-6\varepsilon)n^{1-\alpha+\beta}$ and $(1+6\varepsilon)n^{1-\alpha+\beta}$ queues of length 0 or 1 (most of which will then have length 1), with all the remaining queues of length 2. As $\beta<\alpha$ , the proportion of queues of length 0 or 1, which is of order $n^{-\alpha+\beta}$ , tends to 0, so the proportion of queues of length exactly 2 tends to 1 as $n\to\infty$ .

The following result is taken from Theorems 6.1 and 1.2 of [2]. In the application of Theorem 1.2, we take $t\geq n^{2}$ , so that $t/3200n^{1+\beta}>\frac{1}{4}\log^{2}n$ for $n$ sufficiently large; as is remarked after Theorem 10.5 of [2], the conclusion is valid for the full range of $\varepsilon$ stated above. Note that the results in [2] are stated for the discrete-time version of the model. We have derived the corresponding results for the continuous-time version as described above, bounding the error probabilities involved in the translation by $e^{-\frac{1}{4}\log^{2}n}$ .

Theorem 7.1 (Brightwell, Fairthorne and Luczak).

Given $n$ and $\varepsilon(n)$ as above, and $\alpha$ and $\beta$ satisfying the inequalities in (7.1), let $(Y(t))$ be a copy of the supermarket process with parameters $(n,d,\lambda)$ , where $\lambda=1-n^{-\alpha}$ and $d=n^{\beta}$ , in equilibrium. Then, for $n$ sufficiently large,

[TABLE]

Moreover, if $(X(t))$ is a copy of the supermarket process with $X(0)\in\mathcal{N}^{\varepsilon/6}(n,\alpha,\beta)$ , then

[TABLE]

and, for $n$ sufficiently large and $t\geq n$ ,

[TABLE]

where $\Pi$ denotes the equilibrium distribution.

We will focus on the number $V(x)=n(1-u_{1}(x))$ of empty queues, and investigate how well $V(Y(t))$ is concentrated around its mean for an equilibrium copy $(Y(t))$ of the supermarket process with parameters as above. For $(Y(t))$ , the mean total arrival rate is $\lambda n=n(1-n^{-\alpha})$ , while the mean total departure rate is the expected number of non-empty queues, which is $n-\operatorname{\mathbb{E}{}}V(Y(t))=n\operatorname{\mathbb{E}{}}u_{1}(Y(t))$ . In equilibrium, the mean arrival rate is equal to the mean departure rate, so we have $\operatorname{\mathbb{E}{}}V(Y(t))=n^{1-\alpha}$ . States $x$ in $\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ thus all have $V(x)$ within $6\varepsilon n^{1-\alpha}$ of the mean $\operatorname{\mathbb{E}{}}V(Y(t))$ . We shall prove that we have concentration of $V(Y(t))$ within $n^{(1-\beta)/2}$ of its mean $n^{1-\alpha}$ : as $(1-\beta)/2<1-\alpha$ , this is a sharper concentration result than is given by Theorem 7.1. It is remarked in [2] that the proof of Theorem 7.1 goes through for $\varepsilon=n^{-\delta}$ , where $\delta$ is sufficiently small: the implied result is still not as strong as we shall prove here, since $\delta$ would have to be strictly less than the minimum of several quantities, one of which is $(1-\alpha)-(1-\beta)/2$ , and this is the smallest of the quantities for part, but not all, of our range – more details can be found in the arXiv version of [2].
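The balance of rates and the comparison of exponents used above can be written out as follows. The second equivalence uses the constraint $\alpha<(1+\beta)/2$, which holds throughout the range described after (7.1).

```latex
\lambda n = n - \mathbb{E}V(Y(t))
\;\Longrightarrow\;
\mathbb{E}V(Y(t)) = (1-\lambda)\,n = n^{-\alpha}\cdot n = n^{1-\alpha};
\qquad
\alpha < \tfrac{1+\beta}{2} \iff \tfrac{1-\beta}{2} < 1-\alpha .
```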

The supermarket model is also used as an example by Luczak in [19], to illustrate the concentration inequality derived in that paper. That analysis is based on a natural coupling $(X(t),Z(t))$ of two copies of the supermarket model with the same parameters, which we now describe – our proof is also based on this coupling. In the coupling, the arrival times for the two processes are identical, and on an arrival the same ordered $d$ -tuple of queues is inspected in the two processes. For each of the queues, a “potential departure” from the queue occurs at rate 1: for each of the copies of the process, if the queue is non-empty at the time of the potential departure, a customer is served and leaves the system at that time. If states $x$ and $z$ are adjacent (i.e., one can be reached from the other by a single transition), then they differ by 1 in exactly one queue. For an adjacent pair $(x,z)$ , we call the queue where the two states differ the unbalanced queue, and we say that $x>z$ if the unbalanced queue is longer in $x$ than in $z$ . If $X(0)=x$ and $Z(0)=z$ , where $x$ and $z$ are adjacent with $x>z$ , then we claim that, under the coupling, the pair $(X(t),Z(t))$ remains adjacent, with $X(t)>Z(t)$ , until the two copies coalesce. On a departure from the unbalanced queue, coalescence occurs if that queue is already empty in $Z(t)$ , and otherwise the queue remains unbalanced. If an arriving customer joins the unbalanced queue in $x$ , they join that queue in $z$ as well. It is also possible that an arriving customer joins the unbalanced queue in $z$ and a different queue in $x$ ; the states remain adjacent, but a different queue becomes unbalanced.
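The coupling, and the claim that the pair stays adjacent with $X(t)>Z(t)$ until coalescence, can be checked empirically with a sketch along the following lines. It uses the uniformised description of the model (events at total rate $(1+\lambda)n$, each an arrival with probability $\lambda/(1+\lambda)$). The helper names and the parameter values in the test are illustrative assumptions.

```python
import random
from typing import List, Optional


def coupled_event(x: List[int], z: List[int], d: int, lam: float,
                  rng: random.Random) -> None:
    """One event of the natural coupling, applied in place to both copies."""
    n = len(x)
    if rng.random() < lam / (1.0 + lam):
        # Same arrival, same ordered d-tuple, in both copies; each copy
        # sends the customer to its own first shortest inspected queue.
        picks = [rng.randrange(n) for _ in range(d)]
        for q in (x, z):
            q[min(picks, key=lambda i: q[i])] += 1
    else:
        # Same potential departure in both copies.
        i = rng.randrange(n)
        for q in (x, z):
            if q[i] > 0:
                q[i] -= 1


def unbalanced(x: List[int], z: List[int]) -> Optional[int]:
    """Index of the unbalanced queue if x and z are adjacent with x > z,
    None if x == z; raise ValueError otherwise."""
    diff = [i for i in range(len(x)) if x[i] != z[i]]
    if not diff:
        return None
    if len(diff) == 1 and x[diff[0]] - z[diff[0]] == 1:
        return diff[0]
    raise ValueError("pair is neither equal nor adjacent with x > z")
```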

The analysis in [19] assumes that $d$ is a constant, but it is easy to see that the proof there gives concentration around the mean only to within order $\sqrt{nd}$ . For small enough $\beta$ and $\alpha$ , this is still a stronger result than that implied by Theorem 7.1, but the result we prove below always gives stronger concentration.

Theorem 7.2.

Let $(Y(t))$ be a copy of the supermarket model with parameters $(n,d,\lambda)$ , in equilibrium, where $\lambda=1-n^{-\alpha}$ , $d=n^{\beta}$ , and $(\alpha,\beta)$ satisfy (7.1). Then, for $n$ sufficiently large, and any $m$ ,

[TABLE]

In particular, if $c=c(n)$ is positive, with $c=o(\log n)$ , and $n$ is sufficiently large, we have

[TABLE]

Our proof will be an application of Theorem 3.1 to the (well-behaved) continuous-time chain $(Y(t))$ . We give the proof below, postponing the proof of a key lemma.

Proof.

We shall apply Theorem 3.1 to the supermarket model with the given parameters, with $f(x)=V(x)$ , the number of empty queues in state $x$ . We set $\varepsilon=\varepsilon(n)=1/\log n$ , and let $\widehat{S}$ be the set $\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ . We then consider starting in a state $X(0)\in\mathcal{N}^{\varepsilon/6}(n,\alpha,\beta)$ . Note that, for any state $x$ , the total transition rate $q_{x}$ out of state $x$ is at most $2n$ . In order to apply the result, we need to identify a constant $\widehat{\beta}$ satisfying (3.1), and a function $\widehat{\alpha}(s)$ satisfying (3.2). We obtain these by analysing the natural coupling of two copies of the chain described above.

Accordingly, we consider a pair of copies $(X,Z)$ starting in adjacent states $x$ and $z$ with $x>z$ , evolving according to the coupling described above, so that the two copies remain adjacent until coalescence. At any time $t$ , $X(t)$ and $Z(t)$ are adjacent or equal, and if they are adjacent then there is one unbalanced queue. Let $L(t)$ denote the length of the longer unbalanced queue, or 0 if there is none, at time $t$ : the random process $(L(t))$ is thus a function of the coupled pair $(X(t),Z(t))$ , taking values in ${\mathbb{Z}}_{+}$ , making steps up and down by 1, until it steps from 1 to 0 and remains at 0 thereafter. For a pair $(x,z)$ of adjacent initial states, and $s\geq 0$ , let $a_{1}(s)=a_{1}^{xz}(s)$ denote the probability that $L(s)$ is equal to $1$ .

For an initial adjacent pair of states $(X(0),Z(0))=(x,z)$ with $x>z$ , and any time $s$ , the difference $V(X(s))-V(Z(s))$ is equal to 1 when $L(s)=1$ and 0 otherwise, so the quantity $(\widehat{P}^{s}V)(x)-(\widehat{P}^{s}V)(z)=\operatorname{\mathbb{E}{}}_{x}V(X(s))-\operatorname{\mathbb{E}{}}_{z}V(Z(s))$ is exactly equal to $a_{1}^{xz}(s)$ . In particular, we thus have $|(\widehat{P}^{s}V)(x)-(\widehat{P}^{s}V)(z)|\leq 1$ , so we may take $\widehat{\beta}=1$ .

If $x\in{\widehat{S}}=\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ , and $z$ is adjacent to $x$ , then either $z\in\mathcal{N}^{2\varepsilon}(n,\alpha,\beta)$ , or $z>x$ and $z$ has a queue of length 3. In the latter case, the transition from $x$ to $z$ is an arrival in which only queues of length 2 are inspected. From any state $x\in\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ , the probability that an arrival inspects only such queues is at most $(1-\frac{1}{2}n^{-\alpha+\beta})^{d}\leq\exp(-\frac{1}{2}n^{-\alpha+2\beta})$ , so the rate of such arrivals is at most $\lambda n\exp(-\frac{1}{2}n^{-\alpha+2\beta})\leq\exp(-\frac{1}{4}\log^{2}n)$ , for $n$ sufficiently large. As the total transition rate out of any state $x$ is at most $2n$ , we have, a little crudely,

[TABLE]

where the maximum is over initial pairs $(x,z)$ where $z$ is adjacent to $x$ and both are in $\mathcal{N}^{2\varepsilon}(n,\alpha,\beta)$ .
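The estimate for arrivals that inspect only queues of length 2 can be spelled out, including the factor $\lambda n\le n$ for the total arrival rate. A state in $\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ has at least $(1-6\varepsilon)n^{1-\alpha+\beta}\ge\frac12 n^{1-\alpha+\beta}$ queues of length 0 or 1, since $\varepsilon\le1/100$. The estimate then uses $1-y\le e^{-y}$ and $d=n^{\beta}$. The last inequality holds for $n$ large because $2\beta-\alpha>0$ in the range (7.1), so $n^{2\beta-\alpha}$ eventually dominates $\log n+\frac14\log^{2}n$.

```latex
\lambda n\Bigl(1-\tfrac{1}{2}n^{-\alpha+\beta}\Bigr)^{d}
 \le n\exp\Bigl(-\tfrac{1}{2}n^{-\alpha+\beta}\cdot n^{\beta}\Bigr)
 = \exp\Bigl(\log n-\tfrac{1}{2}n^{-\alpha+2\beta}\Bigr)
 \le \exp\bigl(-\tfrac{1}{4}\log^{2}n\bigr).
```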

Lemma 7.3.

[TABLE]

whenever $x$ and $z$ are adjacent states in $\mathcal{N}^{2\varepsilon}(n,\alpha,\beta)$ , and $s\leq e^{\frac{1}{5}\log^{2}n}$ .

We postpone a proof of Lemma 7.3 until later, but we now indicate briefly what the terms in (7.3) signify. The final term accounts for the possibility of leaving the set $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ . The first term accounts for the probability that $L(0)=1$ and no transition has occurred before time $s$ to change the length of the unbalanced queue. The second term is the main term; roughly speaking, it arises from showing that coalescence occurs in time of order $d$ , and, conditional on coalescence not occurring before time $s$ , the probability that $L(s)=1$ is of order $1/d$ .

We continue with the main thread of our proof, assuming the bound (7.3) in Lemma 7.3. Given this bound, we may take $\widehat{\alpha}(s)$ in (3.2) to be

[TABLE]

and

[TABLE]

for $t\leq e^{\frac{1}{5}\log^{2}n}$ and $n$ sufficiently large.

Now consider starting at any state $X(0)\in\mathcal{N}^{\varepsilon/6}(n,\alpha,\beta)$ , and let $A_{t}$ be the event that the process stays within $\widehat{S}=\mathcal{N}^{\varepsilon}(n,\alpha,\beta)$ until time $t$ . For $t\leq e^{\frac{1}{5}\log^{2}n}$ , Theorem 7.1 tells us that the probability of $A_{t}^{c}$ is at most $2e^{-\frac{1}{4}\log^{2}n}$ . We now apply Theorem 3.1, and obtain that, for any $m\geq 0$ , and any $t\leq e^{\frac{1}{5}\log^{2}n}$ ,

[TABLE]

For $m<n/d$ , we have $\frac{m^{2}}{110n/(d+2)+2m/3}\geq\frac{m^{2}d}{111n}$ ; for $m\geq n/d$ , we have that $\frac{m^{2}}{110n/(d+2)+2m/3}\geq\frac{m}{111}\geq\frac{1}{4}\log^{2}n$ for $n$ sufficiently large (depending on $\beta$ ). Therefore we have, for any $m$ and $t\leq e^{\frac{1}{5}\log^{2}n}$ ,

[TABLE]
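The case split above can be checked directly. For $m<n/d$, the denominator is below $(110+\frac23)n/d<111n/d$. For $m\ge n/d$, it is below $110m+\frac23m<111m$. The sketch below checks both bounds numerically on a grid of $m$, for illustrative values of $n$ and $\beta$, with $d=n^{\beta}$ treated as real. The function names are illustrative.

```python
import math


def tail_exponent(m: float, n: float, d: float) -> float:
    """The exponent m^2 / (110 n/(d+2) + 2m/3) from the bound above."""
    return m * m / (110.0 * n / (d + 2.0) + 2.0 * m / 3.0)


def check_case_split(n: float, beta: float, num: int = 400) -> bool:
    """Check both lower bounds on a geometric grid m = 1, ..., n (d = n**beta)."""
    d = n ** beta
    for k in range(num + 1):
        m = n ** (k / num)
        e = tail_exponent(m, n, d)
        bound = m * m * d / (111.0 * n) if m < n / d else m / 111.0
        if e < bound:
            return False
    return True


def large_m_condition(n: float, beta: float) -> bool:
    """Whether (n/d)/111 >= log(n)^2 / 4, which settles the case m >= n/d."""
    return (n / n ** beta) / 111.0 >= 0.25 * math.log(n) ** 2
```

The last check fails for $n=10^{4}$, $\beta=0.3$ but holds for $n=10^{6}$. This illustrates why $n$ must be sufficiently large, depending on $\beta$.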

The final part of Theorem 7.1 tells us that, for $t\geq n$ and any $X(0)$ in $\mathcal{N}^{\varepsilon/6}(n,\alpha,\beta)$ , the total variation distance between ${\mathcal{L}}_{t}^{X(0)}$ and the equilibrium distribution is at most $7ne^{-\frac{1}{4}\log^{2}n}$ . Thus, choosing $t$ so that $n\leq t\leq e^{\frac{1}{5}\log^{2}n}$ , and using $0\leq V\leq n$ , we see firstly that $|\operatorname{\mathbb{E}{}}V(X(t))-\operatorname{\mathbb{E}{}}V(Y(t))|\leq 7n^{2}e^{-\frac{1}{4}\log^{2}n}\leq 1$ , for $n$ sufficiently large, where $Y(t)$ is a copy in equilibrium. Recalling that $\operatorname{\mathbb{E}{}}V(Y(t))=n^{1-\alpha}$ , this yields that

[TABLE]

for $m\geq 224$ , and the inequality also holds trivially for $m<224$ provided $n$ is sufficiently large. We then further deduce that

[TABLE]

which implies the claimed result. ∎

It remains to prove Lemma 7.3. For this, we will use the following technical lemma, a variant of Gronwall’s Lemma.

Lemma 7.4.

If $f$ is continuous on $[0,\tau]$ and, for some $\gamma>0$ ,

[TABLE]

then $f(s)e^{\gamma s}$ is non-increasing on $[0,\tau]$ and so $f(s)\leq f(0)e^{-\gamma s}$ for all $s\in[0,\tau]$ .

Proof.

Suppose for a contradiction that $f(s)e^{\gamma s}>f(t)e^{\gamma t}$ , where $0\leq t<s\leq\tau$ . Now take $t^{\prime}$ to be the maximum value in $[t,s)$ such that $f(t^{\prime})e^{\gamma t^{\prime}}=f(t)e^{\gamma t}$ . By continuity, it follows that $f(u)e^{\gamma u}\geq f(t^{\prime})e^{\gamma t^{\prime}}$ for all $u\in[t^{\prime},s]$ .

Applying the hypothesis to the times $t^{\prime}$ and $s$ , we obtain that

[TABLE]

which gives the desired contradiction. ∎

Proof of Lemma 7.3.

We need to show that, for $n$ sufficiently large, (7.3) holds whenever $x$ and $z$ are adjacent states in $\mathcal{N}^{2\varepsilon}(n,\alpha,\beta)$ , and $s\leq e^{\frac{1}{5}\log^{2}n}$ . Fix adjacent states $x$ and $z$ in $\mathcal{N}^{2\varepsilon}(n,\alpha,\beta)$ with $x>z$ . Let $p_{\rm exit}$ be the probability that either copy of the chain exits $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ before time $e^{\frac{1}{5}\log^{2}n}$ ; by Theorem 7.1 (with $\varepsilon/6$ replaced by $2\varepsilon$ ), we have

[TABLE]

Until the copies coalesce, there is an unbalanced queue, with length $L(t)$ in $X(t)$ and length $L(t)-1$ in $Z(t)$ . Whatever the length of the unbalanced queue, the rate of departures from the unbalanced queue is 1, and a departure would lead to coalescence if $L(t)=1$ , or reduce the unbalanced queue lengths by 1 if $L(t)\geq 2$ . If $L(t)=2$ , then an arrival does not change $L(t)$ unless the process leaves $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ . If $L(t)=1$ , then an arrival increases the length of the unbalanced queue exactly when the arriving customer joins the (empty) unbalanced queue in $Z(t)$ . The rate $R_{t}$ of such arrivals depends on the number of empty queues in $Z(t)$ ; we could give an exact expression, but we content ourselves with loose bounds that are easy to derive. The rate $R_{t}$ is certainly at most the rate of arrivals in which the unbalanced queue is inspected, which is equal to $\lambda n(1-(1-1/n)^{d})\leq\lambda d\leq d$ . Any arriving customer who inspects the unbalanced queue and no other empty queue in $Z(t)$ (we call such an arrival a critical arrival) will join the unbalanced queue and thus cause $L(t)$ to increase from 1 to 2. While $Z(t)$ is in $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ , the proportion $p_{t}$ of empty queues in $Z(t)$ is at most $2n^{-\alpha}$ , and so the rate of critical arrivals is

[TABLE]

for $n$ sufficiently large. Hence $R_{t}\geq d/2$ as long as $L(t)=1$ and $Z(t)$ is in $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ .
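The estimate $R_{t}\ge d/2$ can be made concrete. Since the $d$ inspected queues are chosen with replacement, the critical-arrival rate is exactly $\lambda n[(1-(e-1)/n)^{d}-(1-e/n)^{d}]$ when $Z(t)$ has $e$ empty queues, the unbalanced one included. The sketch below evaluates this rate at illustrative parameter values. These are $\alpha=0.5$ and $\beta=0.3$, which lie in the range described after (7.1). It is a sanity check, not part of the proof.

```python
def critical_arrival_rate(n: int, d: int, lam: float, empty: int) -> float:
    """Exact rate of critical arrivals.

    The ordered d-tuple is drawn with replacement; a critical arrival
    inspects the unbalanced queue at least once and no other empty queue.
    `empty` counts the empty queues of Z(t), the unbalanced one included.
    """
    no_other_empty = (1.0 - (empty - 1) / n) ** d   # avoids the other empties
    no_empty_at_all = (1.0 - empty / n) ** d        # ... and the unbalanced one
    return lam * n * (no_other_empty - no_empty_at_all)
```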

In summary, if $L(t)=1$ , then $L$ decreases at rate 1, and increases at a rate $R_{t}$ between $d/2$ (provided $Z(t)\in\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ ) and $d$ . If $L(t)=2$ , then $L$ decreases at rate 1.
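The heuristic behind the argument can be illustrated by freezing $R_{t}$ at a constant $R$ and ignoring exits from $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$. This is an idealisation: in the proof, $R_{t}$ only lies between $d/2$ and $d$. Under it, $L$ is a three-state chain on $\{0,1,2\}$, and $(a_{1},a_{2})$ solves a $2\times2$ linear system whose matrix exponential has a closed form. The sketch uses Putzer's formula for that matrix exponential; the function name is illustrative.

```python
import math


def idealised_L(R: float, s: float):
    """(a1(s), a2(s)) for an idealised version of L(t), started from L(0) = 1.

    States 0, 1, 2; from 1: to 0 at rate 1, to 2 at rate R; from 2: to 1 at
    rate 1; 0 is absorbing. The forward equations are a' = M a with
    M = [[-(1+R), 1], [R, -1]], solved by Putzer's formula for 2x2 matrices:
    exp(Ms) = r1 I + r2 (M - l1 I).
    """
    trace, det = -(2.0 + R), 1.0
    disc = math.sqrt(trace * trace - 4.0 * det)   # real, since trace^2 > 4
    l1, l2 = (trace + disc) / 2.0, (trace - disc) / 2.0  # l1: slow eigenvalue
    r1 = math.exp(l1 * s)
    r2 = (math.exp(l1 * s) - math.exp(l2 * s)) / (l1 - l2)
    # First column of exp(Ms), i.e. exp(Ms) applied to (1, 0).
    a1 = r1 + r2 * (-(1.0 + R) - l1)
    a2 = r2 * R
    return a1, a2
```

In this idealised chain, the slow eigenvalue lies below $-1/(R+2)$, so the survival probability $a_{\geq 1}(s)$ decays at least like $e^{-s/(R+2)}$. For large $s$, the ratio $a_{1}(s)/a_{\geq 1}(s)$ settles near $1/(R+2)$, and the mean coalescence time from $L=1$ is exactly $1+R$. This mirrors, for a simplified chain, the behaviour established for the true process in the argument below.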

We consider the coupled pair of chains up to time $\tau=e^{\frac{1}{5}\log^{2}n}$ , starting from the initial state $(x,z)$ . Extending our earlier notation, we let $a_{j}(t)=\operatorname{\mathbb{P}{}}(L(t)=j)$ and $a_{\geq j}(t)=\operatorname{\mathbb{P}{}}(L(t)\geq j)$ , for $j=1,2$ .

Applying Dynkin’s formula, as well as the facts we have established about the rates of transitions for $L(t)$ , we have that, for $t<s$ ,

[TABLE]

We also note that, for $u\leq\tau$ ,

[TABLE]

Recall that $p_{\rm exit}$ is the probability that either copy leaves the set $\mathcal{N}^{12\varepsilon}(n,\alpha,\beta)$ before time $\tau$ . Note that

[TABLE]

for all $t\in[0,\tau]$ .

Our aim is to prove the upper bound (7.3) on $a_{1}(s)=a_{1}^{xz}(s)$ for all $s\leq\tau$ . We shall establish that $a_{\geq 2}(s)$ is of order $da_{1}(s)$ , for $s$ larger than about $1/d$ , and that $a_{\geq 1}(s)$ falls off at least as fast as roughly $e^{-s/d}$ . This implies that the time to coalescence is approximately dominated by an exponential random variable with mean $d$ , while, for $t$ greater than about $1/d$ , conditional on coalescence not having occurred, the probability that $L(t)=1$ is of order $1/d$ ; these bounds will yield (7.3). In our formal analysis, we shall use Lemma 7.4 several times.

We first consider the function

[TABLE]

From (7.6), (7.7), (7.8) and (7.9), we have

[TABLE]

and therefore from Lemma 7.4 we have that $r(s)\leq r(0)e^{-(d+2)s}\leq e^{-(d+2)s}$ . Rearranging, we obtain that, for $s\leq\tau$ ,

[TABLE]

This tells us that, roughly speaking, after a lead-in time of order $1/d$ , the probability $a_{1}(s)$ that the unbalanced queue has length 1 is at least about $1/(d+2)$ times the probability $a_{\geq 1}(s)$ that coalescence has not occurred.

The next step is to use the above to show that $a_{\geq 1}(s)$ falls off at least as fast as roughly $e^{-s/d}$ . We see from (7.5) and (7.10) that

[TABLE]

Now we consider the function

[TABLE]

We have

[TABLE]

and so

[TABLE]

Therefore, by Lemma 7.4, $v(s)\leq v(0)e^{-s/(d+2)}\leq 2e^{-s/(d+2)}$ , and we deduce that, for $s\leq\tau$ ,

[TABLE]

Finally, we show that $a_{1}(s)$ is at most about $2/d$ times $a_{\geq 1}(s)$ . We apply Lemma 7.4 to the function

[TABLE]

From (7.6), (7.7) and (7.8), we have, for $t<s$ ,

[TABLE]

We obtain that $q(s)\leq q(0)e^{-(d/2+1)s}\leq\frac{d}{2}e^{-(d/2+1)s}$ , so

[TABLE]

Summing with (7.11) yields, for $s\leq\tau$ ,

[TABLE]

and so

[TABLE]

which is the required bound. ∎

Theorem 7.2 gives concentration of the random variable $V(Y(t))$ about its mean within order $\sqrt{n/d}$ . We note that no such bound can be shown if we rely only on the fact that $V(x)$ is a Lipschitz function of the state space. Indeed, coalescence of the Markov chain takes time of order $d$ , and the results of [19], [26] or [27] would only give concentration within order $\sqrt{nd}$ of the mean.

We indicate briefly why we expect that concentration of $V(Y(t))$ within order $\sqrt{n/d}$ of its expectation is best possible. If we look at the transitions of the process over a time period $[0,t]$ of length $t=nd$ , the number of arrivals has fluctuations of order $\sqrt{nd}$ . The analysis in the proof of Theorem 7.2 and Lemma 7.3 suggests that a positive proportion of the extra customers will still be in the system at the end of the period, and approximately a proportion $1/d$ of these will be in queues of length 1, so that fluctuations of order $\sqrt{nd}$ in the number of arrivals during $[0,t]$ result in fluctuations of order $\sqrt{n/d}$ in the number of empty queues at time $t$ .

We believe that a similar proof can be used to show sharp concentration of measure results for the supermarket model in the range where $\lambda<1$ and $d\geq 2$ are fixed constants. Here it is known that the proportion of queues of length at least $k$ , for each fixed $k$ , is close to $v(k)=\lambda^{(d^{k}-1)/(d-1)}$ in equilibrium. For $k\geq 1$ , let $f_{k}(x)$ be the number of queues of length at least $k$ . For $x$ any state with approximately $nv(k)$ queues of length at least $k$ for each $k$ , and $k$ large, the quantity $Q(x,y)((\widehat{P}^{s}f_{k})(x)-(\widehat{P}^{s}f_{k})(y))^{2}$ is dominated by terms where the transition from $x$ to $y$ creates an unbalanced queue of length $k$ , and there is no departure from the unbalanced queue before time $s$ . Thus we may take $\widehat{\alpha}(s)$ to be at most a constant times $nv(k)e^{-2s}$ , and obtain concentration within order $\sqrt{nv(k)}$ for $f_{k}(x)$ in equilibrium, at least for $k$ large.
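For fixed $\lambda<1$ and $d\ge2$, the limiting tail proportions are easy to tabulate. They satisfy $v(0)=1$ and $v(k+1)=\lambda v(k)^{d}$, which gives a convenient consistency check. A minimal sketch follows; the function names are illustrative.

```python
import math


def fixed_point_tail(lam: float, d: int, k: int) -> float:
    """v(k) = lam ** ((d**k - 1) / (d - 1)) for fixed lam < 1 and d >= 2."""
    exponent = (d ** k - 1) // (d - 1)     # = 1 + d + ... + d**(k-1), an integer
    return lam ** exponent


def concentration_scale(n: int, lam: float, d: int, k: int) -> float:
    """Heuristic fluctuation scale sqrt(n v(k)) suggested in the text."""
    return math.sqrt(n * fixed_point_tail(lam, d, k))
```

The doubly exponential decay of $v(k)$ in $k$ is what makes $\sqrt{nv(k)}$ much smaller than $\sqrt{n}$ for large $k$.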

Acknowledgements. MJL thanks Monash University for their kind hospitality while part of this work was accomplished. GRB thanks the University of Melbourne for their equally kind hospitality while a different part of the work was accomplished.

Bibliography

[1] W.J. Anderson (1991). Continuous-time Markov chains: An applications-oriented approach. Springer Series in Statistics. Springer Verlag, New York Inc.
[2] G.R. Brightwell, M. Fairthorne and M.J. Luczak (2018). The supermarket model with bounded queue lengths in equilibrium. J. Stat. Phys. 173, 1149–1194.
[3] G.R. Brightwell and M.J. Luczak (2013). A fixed-point approximation for a routing model in equilibrium. Preprint, arXiv:1306.5002.
[4] V. Chvátal (1979). The tail of the hypergeometric distribution. Discr. Math. 25, 285–287.
[5] J.P. Crametz and P.J. Hunt (1991). A limit result respecting graph structure for a fully connected loss network with alternative routing. Ann. Appl. Probab. 1, 436–444.
[6] P. Diaconis and M. Shahshahani (1987). Time to reach stationarity in the Bernoulli–Laplace diffusion model. SIAM J. Math. Anal. 18, 208–218.
[7] P.J. Donnelly, P. Lloyd and A. Sudbury (1994). Approach to stationarity of the Bernoulli–Laplace diffusion model. Adv. Appl. Probab. 26, 715–727.
[8] A. Eskenazis and E. Nestoridi (2020). Cutoff for the Bernoulli–Laplace urn model with $o(n)$ swaps. Ann. Inst. H. Poincaré Probab. Statist. 56, 2621–2639.


