Long-term concentration of measure and cut-off
Andrew Barbour, Graham Brightwell, Malwina Luczak

TL;DR
This paper develops new concentration inequalities for Markov chains, enabling the analysis of the cut-off phenomenon in both finite and infinite state spaces, with applications to models like Bernoulli-Laplace, disease spread, and supermarket queues.
Contribution
It introduces generalized concentration of measure inequalities for Markov chains, extending cut-off analysis to infinite state spaces and diverse models.
Findings
Probabilistic proof of cut-off for Bernoulli-Laplace model
Extended cut-off concept to infinite state space chains
Concentration results for supermarket model
Abstract
We present new concentration of measure inequalities for Markov chains, generalising results for chains that are contracting in Wasserstein distance. These are particularly suited to establishing the cut-off phenomenon for suitable chains. We apply our discrete-time inequality to the well-studied Bernoulli-Laplace model of diffusion, and give a probabilistic proof of cut-off, recovering and improving the bounds of Diaconis and Shahshahani. We also extend the notion of cut-off to chains with an infinite state space, and illustrate this in a second example, of a two-host model of disease in continuous time. A third example, giving concentration results for the supermarket model, illustrates the full generality and power of our results.
A. D. Barbour, Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich
Graham Brightwell, Department of Mathematics, LSE
Malwina Luczak, School of Mathematics and Statistics, University of Melbourne
Key words and phrases:
Markov chains, concentration of measure, coupling, cut-off
2000 Mathematics Subject Classification:
60J75, 60C05, 60F15
ADB: Work supported in part by Australian Research Council Grants Nos DP150101459 and DP150103588, and by the ARC Centre of Excellence for Mathematical and Statistical Frontiers, CE140100049. Thanks to the mathematics departments of the University of Melbourne and Monash University for their kind hospitality.
MJL: Research partly supported by an EPSRC Leadership Fellowship EP/J004022/2 and partly by ARC Future Fellowship FT170100409.
1. Introduction
We have two main aims in this paper. The first is to develop some new concentration of measure inequalities for Markov chains, both in discrete and continuous time, and the second is to introduce a wider perspective on the cut-off phenomenon for convergence to equilibrium of Markov chains. Our past work suggests a strong connection between long-term concentration of measure, rapid mixing, and cut-off: this paper is an attempt to formalise, explain and illustrate this.
Our concentration of measure inequalities generalise and extend earlier results applicable to chains contracting in Wasserstein distance. This means that there is a metric on the state space such that the chain makes only short steps with respect to the metric, and a coupling of two copies of the chain such that the distance between the two copies decreases in expectation; in the language of Ollivier [26], the chain has positive coarse Ricci curvature. For discrete-time Markov chains with positive coarse Ricci curvature, Ollivier proves that any real-valued function of the Markov chain that is Lipschitz with respect to the metric remains well concentrated around its expectation for all time, and in equilibrium; a similar result follows from results of Luczak [19], proved independently at around the same time. Paulin [27] gives a more general framework, obtaining concentration results, and bounds on the mixing time, in cases where the “multi-step coarse Ricci curvature” is positive, even if the coarse Ricci curvature is not. The concentration results proved in these papers, as well as in the present paper, are of the “Gaussian then exponential” type, akin to Bernstein’s inequalities: the probability of a deviation of at least $t$ from the mean is of order $e^{-c_1 t^2}$ for small $t$ and $e^{-c_2 t}$ for large $t$, for positive constants $c_1$, $c_2$. Ollivier gives examples where this is the best possible form of the concentration inequality.
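The two regimes can be seen in the classical two-sided Bernstein form $2\exp\{-t^2/(2v + 2bt/3)\}$. Here $v$ is a variance proxy and $b$ a bound on individual jumps. The Python sketch below is only an illustration: the textbook constants used in it are an assumption, not the constants of the theorems that follow. The exponent behaves like $t^2/(2v)$ for $t \ll v/b$ and like $3t/(2b)$ for $t \gg v/b$.

```python
import math


def bernstein_exponent(t: float, v: float, b: float) -> float:
    """Exponent E in the Bernstein-type bound 2 * exp(-E) (textbook constants, assumed)."""
    return t * t / (2.0 * v + 2.0 * b * t / 3.0)


def bernstein_tail(t: float, v: float, b: float) -> float:
    """Two-sided Bernstein-type tail bound 2 * exp(-t^2 / (2v + 2bt/3))."""
    return 2.0 * math.exp(-bernstein_exponent(t, v, b))


if __name__ == "__main__":
    v, b = 100.0, 1.0  # illustrative variance proxy and jump bound
    for t in [1.0, 10.0, 100.0, 1_000.0, 10_000.0]:
        e = bernstein_exponent(t, v, b)
        gaussian = t * t / (2.0 * v)       # small-t (Gaussian) approximation
        exponential = 3.0 * t / (2.0 * b)  # large-t (exponential) approximation
        print(f"t={t:>8}: exponent={e:12.3f}  t^2/2v={gaussian:14.3f}  3t/2b={exponential:10.3f}")
```

The ratio of the exponent to the appropriate approximation tends to 1 at each end, and the crossover occurs where $t$ is of order $v/b$.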
Our new results in discrete time do not rely on the existence of a well-behaved metric on the state space, and require only conditions regarding the function of interest. Thus we obtain stronger concentration results for functions of the chain that evolve much more slowly than the total transition rate of the chain, as long as they are contractive, in a suitable sense. We recover essentially the same result as Ollivier in the case of positive coarse Ricci curvature, and we can also obtain results very similar to those of Paulin, but our results can also be used to prove concentration of measure in other settings. The application we give in the final section of our paper gives a concentration result that we do not know how to obtain by other means.
We also give analogous concentration inequalities for continuous-time Markov chains. These are entirely new, although, for chains contracting in Wasserstein distance, similar results could be obtained via the methods and results of Ollivier [26], Luczak [19] and Paulin [27]. Veysseire [30] gives definitions and results for coarse Ricci curvature in continuous time, but does not prove any results that are closely related to ours.
We now turn to the cut-off phenomenon. For a Markov chain $X$ with initial state $x_0$, consider the total variation distance between the law of $X_t$ and the equilibrium distribution. The chain is said to exhibit the cut-off phenomenon if this distance falls from near 1 to near 0 over a window of time that is much shorter than the mixing time. In previous work, the state space is assumed to be finite, and the starting state is chosen to maximise the mixing time. We present a version of the definition that allows for an infinite state space and for variation of the mixing time over a region of potential initial states, with a cut-off window whose width is uniform across this region.
Our concentration of measure inequalities, combined with coupling arguments, are well suited to proving cut-off, and we illustrate this with two examples of independent interest. The first is the well-known Bernoulli-Laplace model of diffusion: there are initially $n$ red balls in one urn and $n$ black balls in the other, and at each time step one ball from each urn is chosen uniformly at random and the two balls are exchanged. Cut-off was proved for this model in 1987 by Diaconis and Shahshahani [6] using algebraic techniques. We provide a probabilistic proof, essentially recovering the bound of Diaconis and Shahshahani for the upper tail of the distribution of the mixing time, while providing a sharper bound for the lower tail.
Our second application concerns a continuous-time model of a disease with two types of host, each infecting the other; the disease is supported at a low level in a population by immigration of both types of infected host from outside. This example illustrates both the application of our new continuous-time concentration inequality and our new concept of cut-off, as the state space is infinite and the mixing time varies significantly depending on the initial conditions.
In both of the sample applications above, the chain we examine is contractive in Wasserstein distance, and variants of the results we obtain could also be obtained from concentration inequalities in earlier work. We also present a third application which uses the full power of our new continuous-time inequality; this treats the supermarket model, a well-known queueing system, with a certain range of parameter values. In this example, we utilise facts about the equilibrium distribution from a paper of Brightwell, Fairthorne and Luczak [2], alongside our long-term concentration result, to show tight concentration in equilibrium of the number of empty queues.
1.1. Concentration of measure inequalities
Our general concentration inequality for discrete-time Markov chains appears as Theorem 2.1, and the special case where the chain is contracting in Wasserstein distance with respect to a suitable metric appears as Theorem 2.3. Part (a) of Theorem 2.3 is very similar to Theorem 32 of Ollivier [26]. That result is for the equilibrium distribution of the chain, whereas ours is for finite-time distributions, but Ollivier’s Remark 34 indicates that the proof in his paper transfers to the finite-time case. A similar result for chains contracting in Wasserstein distance also follows readily from Theorem 4.5 of Luczak [19]. We give more details after we have given precise definitions and statements of theorems.
There is another, quite different, recent strand of work providing tools to show concentration of measure and rapid mixing for a given function of a Markov chain, useful in circumstances where the function mixes more rapidly than the chain itself. See Watanabe and Hayashi [32] and Rabinovich, Ramdas, Jordan and Wainwright [29].
Results similar to Theorem 2.1 appear in earlier works of the third author, some unpublished, and a number of other applications are to be found in these papers, as well as in Gheissari, Lubetzky and Peres [11]. The flavour of the inequality is similar to that of Luczak [19], but Theorem 2.1 can be much more powerful when the chain makes frequent transitions that do not alter the value of the function of interest.
One example where this is relevant is the supermarket model, as studied in the final section of this paper, where the number of queues of length $j$ changes only infrequently for some values of $j$.
Another example is the alternative routing model of Gibbens, Hunt and Kelly [12]. Here, there are links of limited capacity between each pair of nodes in a phone network; requests for pairs of nodes to be connected arrive according to a Poisson process, and these can be met either by using the direct link or by using some path of two links. Different protocols have been proposed and studied for choosing the route; one such is to use the direct link if it has spare capacity, and if not then to inspect links of two routes, and use one of those with most spare capacity. In an unpublished preprint of Luczak [20], an earlier version of Theorem 2.1 is used to prove a differential equation approximation for this model, extending earlier results of Crametz and Hunt [5] and Graham and Méléard [13]. The equilibrium behaviour of the model is studied via a similar approach in an unpublished preprint of Brightwell and Luczak [3]. The same methods can be used to treat other routing protocols. The key principle is that quantities such as the number of occupied links incident with a given node change far less often than the overall state of the network. Our new result, Theorem 2.1, improves on the earlier version in Luczak [20] (Theorem 2.3) by weakening and simplifying its hypotheses.
The corresponding inequality for continuous-time Markov chains is Theorem 3.1, and the special case for chains that are contracting in Wasserstein distance is Theorem 3.3. Our proof for continuous time uses methods different from those used for discrete time (although both proofs draw on principles of concentration of measure for martingales), and it is perhaps a little surprising that the resulting theorems are nearly exact analogues of each other. In Brightwell and Luczak [3], a continuous-time model is analysed (somewhat awkwardly) by applying discrete-time concentration of measure inequalities from [20] to its jump chain. This analysis would likely be eased by direct application of our new continuous-time inequalities, and we plan to produce an improved version of [3] in the future.
Our notion of contraction in Wasserstein distance is very different in flavour from that of contraction in total variation distance, as studied by Marton [22] and others subsequently. In particular, for a chain to exhibit contraction in total variation distance, it is necessary that, from any two states, there is a positive probability that two coupled chains started in these states coalesce in a single step.
1.2. Cut-off
We now discuss the cut-off phenomenon in the convergence to equilibrium for sequences of Markov chains.
Let $P^t_n(x,\cdot)$ denote the distribution of $X^n_t$ when $X^n_0 = x$, and let $\pi_n$ be the equilibrium distribution of $X^n$. Let $S_n$ denote the state space of the chain $X^n$.
In earlier papers (for instance, Diaconis and Shahshahani [6] and Levin, Luczak and Peres [17]), cut-off is defined as follows, in the case where the state space $S_n$ is finite for each $n$. The worst-case distance to stationarity for the chain $X^n$ at time $t$ is
$$ d_n(t) = \max_{x \in S_n} \bigl\| P^t_n(x,\cdot) - \pi_n \bigr\|_{\mathrm{TV}}, $$
and the sequence of chains is said to exhibit cut-off at time $t_n$ with window width $w_n$ if $w_n = o(t_n)$ and
$$ \lim_{c\to\infty} \limsup_{n\to\infty} d_n(t_n + c\,w_n) = 0, \qquad \lim_{c\to\infty} \liminf_{n\to\infty} d_n(t_n - c\,w_n) = 1. $$
In other words, for a large constant $c$, at time $t_n + c w_n$ the chain is nearly in equilibrium, whatever the starting state; on the other hand, there is a starting state $x$ such that the chain starting from $x$ is very far from equilibrium at time $t_n - c w_n$.
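As a concrete instance of $d_n(t)$, the Python sketch below computes the worst-case total variation distance exactly for the Bernoulli–Laplace chain of Section 5, taking the state $k$ to be the number of red balls in the left urn. We derived the transition probabilities, $((n-k)/n)^2$ for a step up and $(k/n)^2$ for a step down, and the hypergeometric stationary law $\binom{n}{k}^2/\binom{2n}{n}$, from the model description; they are not quoted from [6]. The value $n = 40$ is an arbitrary illustrative size. The profile shows the drop from near 1 to near 0 that the definition formalises. At such a small $n$, however, the window is not yet narrow compared with the cut-off time, because their ratio decays only like $1/\log n$.

```python
from math import comb


def bl_step(dist, n):
    """Apply one step of the Bernoulli-Laplace chain to a distribution on {0, ..., n}."""
    new = [0.0] * (n + 1)
    for k, p in enumerate(dist):
        if p == 0.0:
            continue
        up = ((n - k) / n) ** 2  # black from the left urn swapped with red from the right
        down = (k / n) ** 2      # red from the left urn swapped with black from the right
        if k < n:
            new[k + 1] += p * up
        if k > 0:
            new[k - 1] += p * down
        new[k] += p * (1.0 - up - down)
    return new


def stationary(n):
    """Hypergeometric equilibrium law of the number of red balls in the left urn."""
    total = comb(2 * n, n)
    return [comb(n, k) ** 2 / total for k in range(n + 1)]


def tv(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def worst_case_tv_profile(n, t_max):
    """Return [d_n(0), ..., d_n(t_max)], maximising over every starting state."""
    pi = stationary(n)
    dists = [[1.0 if j == x else 0.0 for j in range(n + 1)] for x in range(n + 1)]
    profile = []
    for _ in range(t_max + 1):
        profile.append(max(tv(d, pi) for d in dists))
        dists = [bl_step(d, n) for d in dists]
    return profile


if __name__ == "__main__":
    prof = worst_case_tv_profile(40, 200)
    for t in range(0, 201, 10):
        print(t, round(prof[t], 4))
```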
In many cases where cut-off with window width $w_n$ can be proved, the situation is as follows, and the proof involves two separate arguments. The state space has a metric, and the Markov chain makes jumps that are small with respect to this metric. The equilibrium distribution is concentrated around some point $x^*_n$ (suitably scaled with $n$) in the state space. If the chain is started at some “distant” point $x_0$, one shows that its trajectory is concentrated around its expectation until the expectation becomes suitably close to $x^*_n$. Once the chain is in the neighbourhood of $x^*_n$, one seeks a coupling with a copy of the chain in equilibrium, under which coalescence takes place in time of order $w_n$. One example of such a proof was given by Levin, Luczak and Peres [17], and our examples in Sections 5 and 6 both illustrate this general approach.
Similar behaviour is often found in examples where the state space is infinite and there is no “most distant” starting point from equilibrium. For instance, in a population model, there may be no effective upper bound on the initial size of a population. We therefore find it useful to introduce a more general notion of cut-off, in which the mixing time depends on the initial state but the window width is independent of the starting state. The proof scheme above can then be applied, provided we restrict the class of allowed initial states to exclude two kinds of state. The first kind, (a), consists of states $x_0$ too close to the point $x^*_n$ around which the equilibrium is concentrated, where the “travel time” from $x_0$ to $x^*_n$ is of similar or smaller order than the time required for coalescence of the coupled chains in the neighbourhood of $x^*_n$. The second kind, (b), possibly also needs excluding: states extremely distant from $x^*_n$, where the fluctuation in the travel time exceeds the window width $w_n$.
We now give our formal definition of cut-off, which extends the previous definition, and in particular allows for an infinite state space. For a subset of the state space of , let be a collection of non-random times, and let be a sequence of numbers such that . We say that exhibits cut-off at time on with window width , if there exist (non-random) constants such that, for any and for all large enough,
[TABLE]
uniformly for all .
In some examples, the travel time can be taken not to depend on , as long as . We say that exhibits cut-off at on with window width , for a sequence , if the in the definition above can be set equal to for all and all . An illustration of this last concept comes in Section 5; the idea here is that the expected “travel times” from all suitably distant starting states are nearly equal.
Our concentration of measure results are suited to showing that a Markov chain closely follows an almost deterministic trajectory until it reaches the neighbourhood near where the equilibrium is concentrated. In order to complete a proof of cut-off, one needs to show that convergence to equilibrium is rapid once that neighbourhood has been reached. Proposition 4.1 gives conditions guaranteeing that a Markov chain taking non-negative real values, with a non-positive drift in all positive states, reaches 0 quickly with high probability. This implies an upper bound on the coalescence time for the two copies of the chain in a contracting coupling. We give such a result only in continuous time, and apply it in our continuous-time sample application in Section 6. Our proof of Proposition 4.1 is based on the proof of a discrete-time analogue appearing as Proposition 17.19 of Levin, Peres and Wilmer [18]. Our application in Section 5 requires a sharper coupling result specific to the model; using some version of Proposition 17.19 from [18] would give weaker bounds on the tail of the distribution of the mixing time.
1.3. Applications
We give three examples. The first two feature chains that are contracting in Wasserstein distance, illustrating both our methods and the cut-off phenomenon. In the third example, we prove results about concentration of the equilibrium distribution by using the full strength of our new concentration inequalities.
Section 5 concerns the Bernoulli–Laplace model of diffusion, originally investigated in the context of cut-off by Diaconis and Shahshahani [6]. In this discrete-time model, there are two urns each containing $n$ balls, with $n$ red and $n$ black balls in total. At each time step, one ball is chosen uniformly at random from each urn and the two are exchanged. The state of the system after $t$ steps is captured by the number $X_t$ of red balls in the left urn, and one compares the distribution of $X_t$ with the stationary distribution, which is concentrated around $n/2$. Diaconis and Shahshahani prove cut-off for $(X_t)$ at time $\frac14 n \log n$ with window width $n$. Indeed, their proof establishes cut-off not only for the most distant starting states (where $X_0 = 0$ or $X_0 = n$) but also on suitable sets of starting states. They also give specific exponential rates for the tail of the distribution of the mixing time. The methods used by Diaconis and Shahshahani are algebraic; we give an alternative proof using our concentration of measure results. Our proof gives the same exponential rate for the upper tail as in [6]. However, it gives no information about the extreme end of the tail, where the total variation distance between the distribution at time $t$ and the equilibrium distribution is very small. Our methods yield a doubly exponential rate for the lower tail, improving on the results of Diaconis and Shahshahani.
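A one-line computation from the transition probabilities (our own, not quoted from [6]) gives $E[X_{t+1} - n/2 \mid X_t] = (1 - 2/n)(X_t - n/2)$. Hence $E X_t = n/2 + (X_0 - n/2)(1-2/n)^t$. The Python sketch below checks this by simulation from $X_0 = 0$; the sample size and seeds are arbitrary. The small empirical spread, compared with $n$, illustrates the concentration around a deterministic trajectory that a proof of this kind exploits.

```python
import random
import statistics


def bl_simulate(n, x0, steps, rng):
    """Simulate the chain k = number of red balls in the left urn (each urn holds n balls)."""
    k = x0
    for _ in range(steps):
        left_red = rng.random() < k / n         # ball drawn from the left urn is red
        right_red = rng.random() < (n - k) / n  # ball drawn from the right urn is red
        if left_red and not right_red:
            k -= 1
        elif right_red and not left_red:
            k += 1
    return k


def bl_mean(n, x0, t):
    """Exact mean E X_t = n/2 + (x0 - n/2) (1 - 2/n)^t."""
    return n / 2 + (x0 - n / 2) * (1 - 2 / n) ** t


if __name__ == "__main__":
    rng = random.Random(12345)
    n, t, runs = 100, 100, 2000
    samples = [bl_simulate(n, 0, t, rng) for _ in range(runs)]
    print("empirical mean:", statistics.fmean(samples))
    print("exact mean:    ", bl_mean(n, 0, t))
    print("empirical sd:  ", statistics.pstdev(samples))
```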
In Section 6, we consider a toy model of a subcritical two-host infection, maintained by immigration of infectives from outside, at rates that are constant multiples of a scale parameter . Our model is appropriate in circumstances where the number of infectives is small compared to the total population size, and the expected number of infectives of each type of host satisfies a linear equation with a fixed point . We consider an arbitrary starting state within an annular region , where , and we show cut-off at with window width 1 over this region. Here the travel time is bounded between two constants times , but varies over the region , for any .
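The exact rates of the Section 6 model are not reproduced here, so the Python sketch below uses a hypothetical linear mean equation of the kind described. In it, type-1 and type-2 infectives immigrate at rates $n a_1$ and $n a_2$ and recover at per-capita rates $g_1$ and $g_2$. Each infective of one type creates infectives of the other type at rate $b_1$ or $b_2$. The parameters $a_i, b_i, g_i$ and the form of the equations are all assumptions made for illustration, and subcriticality here means $b_1 b_2 < g_1 g_2$. The sketch computes the fixed point, which scales linearly in $n$. It then finds the time the deterministic trajectory from twice the fixed point needs to come within $\sqrt{n}$ of it. In this toy linear setting that time grows like $\log n$.

```python
import math


def fixed_point(n, a1, a2, b1, b2, g1, g2):
    """Solve 0 = n*a1 + b2*m2 - g1*m1 and 0 = n*a2 + b1*m1 - g2*m2 (hypothetical model)."""
    det = g1 * g2 - b1 * b2
    if det <= 0:
        raise ValueError("not subcritical: need b1*b2 < g1*g2")
    m1 = n * (a1 * g2 + a2 * b2) / det
    m2 = n * (a2 * g1 + a1 * b1) / det
    return m1, m2


def travel_time(n, params, start_factor=2.0, dt=1e-3, t_max=100.0):
    """Euler-integrate the mean equations from start_factor * fixed point until within sqrt(n)."""
    a1, a2, b1, b2, g1, g2 = params
    f1, f2 = fixed_point(n, *params)
    m1, m2 = start_factor * f1, start_factor * f2
    t = 0.0
    while math.hypot(m1 - f1, m2 - f2) > math.sqrt(n):
        if t > t_max:
            raise RuntimeError("did not reach the neighbourhood of the fixed point")
        dm1 = n * a1 + b2 * m2 - g1 * m1
        dm2 = n * a2 + b1 * m1 - g2 * m2
        m1, m2 = m1 + dt * dm1, m2 + dt * dm2
        t += dt
    return t


if __name__ == "__main__":
    params = (1.0, 0.5, 0.8, 0.6, 1.0, 1.2)  # hypothetical (a1, a2, b1, b2, g1, g2)
    for n in [10**2, 10**4, 10**6]:
        print(n, fixed_point(n, *params), round(travel_time(n, params), 3))
```

Each hundredfold increase in $n$ adds roughly the same amount to the travel time, which is what logarithmic growth looks like here.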
In Section 7, we consider the supermarket model. In this $n$-server queueing model, customers arrive according to a Poisson process at rate $\lambda n$, where $\lambda < 1$, and each customer inspects $d$ queues before joining a shortest queue among these $d$. The service time of each customer is exponential with mean 1. We consider a parameter regime in which $\lambda$ tends to 1 and $d$ grows as $n \to \infty$, at rates determined by constants satisfying certain inequalities. We choose the precise parameter range so that, as shown by Brightwell, Fairthorne and Luczak [2], the maximum queue length in equilibrium is 2 with high probability, and most queues have length exactly 2. For this model, we study the distribution of the number of empty queues, and show that it is tightly concentrated around its mean. The application is chosen to illustrate the power of our general results. Most transitions of the chain do not affect the number of empty queues, so our methods give stronger concentration results than we are able to obtain by any other means. The techniques we use extend readily to other parameter ranges.
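Below is a minimal Python simulation of the supermarket model. It assumes that each arrival samples its $d$ queues uniformly with replacement and breaks ties by sampling order, details we chose for the sketch. It uses uniformisation: events occur at the constant rate $n(1+\lambda)$, and a departure event that selects an empty queue is fictitious. The parameters $n = 200$, $\lambda = 0.8$, $d = 2$ are arbitrary and far from the regime of Section 7. The check is only that the time-averaged proportion of empty queues is close to $1 - \lambda$. This must hold in equilibrium, because the departure rate, namely the number of busy servers, balances the arrival rate $\lambda n$. The sketch also records the fraction of real events that change the number of empty queues.

```python
import random


def simulate_supermarket(n, lam, d, t_end, burn_in, seed=0):
    """Return (time-averaged fraction of empty queues over [burn_in, t_end],
    fraction of real events that changed the number of empty queues)."""
    rng = random.Random(seed)
    queues = [0] * n
    empty = n
    total_rate = n * (1.0 + lam)  # uniformised event rate: arrivals lam*n plus n service clocks
    t, area = 0.0, 0.0
    events = changed = 0
    while True:
        dt = rng.expovariate(total_rate)
        # integrate the empty-queue count over the part of [t, t + dt] inside [burn_in, t_end]
        overlap = min(t + dt, t_end) - max(t, burn_in)
        if overlap > 0:
            area += empty * overlap
        t += dt
        if t >= t_end:
            break
        if rng.random() < lam / (1.0 + lam):
            # arrival: sample d queues and join a shortest one
            j = min((rng.randrange(n) for _ in range(d)), key=queues.__getitem__)
            events += 1
            if queues[j] == 0:
                empty -= 1
                changed += 1
            queues[j] += 1
        else:
            # service clock of a uniformly chosen queue; fictitious if that queue is empty
            j = rng.randrange(n)
            if queues[j] > 0:
                events += 1
                queues[j] -= 1
                if queues[j] == 0:
                    empty += 1
                    changed += 1
    return area / (n * (t_end - burn_in)), changed / max(events, 1)


if __name__ == "__main__":
    frac_empty, frac_changed = simulate_supermarket(200, 0.8, 2, 300.0, 50.0)
    print("time-averaged fraction of empty queues:", round(frac_empty, 4), "(1 - lambda = 0.2)")
    print("fraction of events changing the empty count:", round(frac_changed, 4))
```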
Further consequences of the inequalities in Theorems 2.1 and 3.1 will be explored in future work.
2. Concentration inequalities: discrete time
In this section, we first state and prove a general concentration of measure inequality designed for the analysis of discrete-time Markov chains, generalizing results of Luczak [19]. We then show how to recover a version of a result of Ollivier [26] for contracting chains, which is perhaps more appealing and still fairly widely applicable. Next, we outline how to use the inequality when we have a coupling of two copies of the chain which is “approximately contracting” in the function of interest. Finally, we give a toy example to illustrate the application of the inequalities.
2.1. Main result
Here and throughout, we use $\mathbb{Z}_+$ to denote the non-negative integers. Let $X = (X_t)_{t \in \mathbb{Z}_+}$ be a discrete-time Markov chain with a discrete state space $S$ and transition probabilities $P(x,y)$ for $x, y \in S$. We allow $X$ to be lazy; that is, we allow $P(x,x) > 0$ for $x \in S$.
For , we set
[TABLE]
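For $j \in \mathbb{Z}_+$ and a real-valued function $f$ on the state space, define the function $P^j f$ by
$$ P^j f(x) := E_x f(X_j) = \sum_{y} P_x(X_j = y)\, f(y), $$
whenever it exists, where $E_x$ and $P_x$ denote conditional expectation and probability given $X_0 = x$.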
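We read the definition above as the $j$-step expectation $P^j f(x) = E_x f(X_j)$; the original display did not survive extraction. On that reading, $P^j f$ can be computed by applying the transition matrix $j$ times to the vector $f$, without forming $P^j$. The Python sketch below does this for the Bernoulli–Laplace chain with $f(k) = k$, where the result can be checked against the closed form $n/2 + (k - n/2)(1 - 2/n)^j$. The differences $P^j f(k+1) - P^j f(k)$ between neighbouring states shrink geometrically in $j$. The constants in the hypotheses of Theorem 2.1 are meant to capture behaviour of this kind.

```python
def bl_kernel(n):
    """Transition matrix of the Bernoulli-Laplace chain on {0, ..., n}, as nested lists."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        up, down = ((n - k) / n) ** 2, (k / n) ** 2
        if k < n:
            P[k][k + 1] = up
        if k > 0:
            P[k][k - 1] = down
        P[k][k] = 1.0 - up - down
    return P


def apply_kernel(P, f):
    """One-step expectation: x -> sum_y P[x][y] * f[y]."""
    return [sum(p * fy for p, fy in zip(row, f)) for row in P]


def iterate_kernel(P, f, j):
    """Return P^j f by applying P to f j times (f is a list indexed by state)."""
    g = list(f)
    for _ in range(j):
        g = apply_kernel(P, g)
    return g


if __name__ == "__main__":
    n, j = 20, 15
    P = bl_kernel(n)
    g = iterate_kernel(P, list(range(n + 1)), j)
    closed = [n / 2 + (k - n / 2) * (1 - 2 / n) ** j for k in range(n + 1)]
    print("max error vs closed form:", max(abs(a - b) for a, b in zip(g, closed)))
    print("neighbour differences:", [round(g[k + 1] - g[k], 6) for k in range(3)])
```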
Theorem 2.1**.**
Let be the transition matrix of a discrete-time Markov chain with discrete state space . Let be a subset of . Let be a function such that exists for all and , and satisfying, for all ,
[TABLE]
where and are positive constants. Set , . Define , the event that stays in for the first steps. Then, for all and all ,
[TABLE]
The conditions of the theorem are what is needed to fit into the framework of bounded differences (Bernstein-like) inequalities, and the expression in the assumption on is, as we shall see, exactly what emerges when we bound conditional variances.
Evidently increases with . Under a contractivity assumption, as we shall see shortly, the can be taken to tend to 0 exponentially, so that the are uniformly bounded: this means that we have a concentration of measure bound that is uniform in . The result can also be applied in circumstances where the either converge more slowly to 0, or increase not too rapidly: in these cases, we obtain tighter concentration of for smaller values of .
Theorem 2.1 improves on Theorem 4.5 of Luczak [19] by using (2.2) to define , instead of the cruder bound
[TABLE]
where is assumed to be a Lipschitz function with Lipschitz constant , and denotes the Wasserstein distance (both defined with respect to the same metric on the state space ). This is particularly important in contexts in which evolves significantly more slowly than itself, because many of the transitions of do not change the value of . An example where this is relevant is the supermarket model, discussed in the final section of this paper, as well as the alternative routing model of Gibbens, Hunt and Kelly [12] and its generalisation, as studied in Brightwell and Luczak [3]. (These particular examples are set up as continuous-time Markov chains, for which our companion inequality, Theorem 3.1, is more naturally applicable, though it is also natural to consider their discrete-time analogues.) Theorem 2.1 also improves on Theorem 2.3 of Luczak [20], by weakening and simplifying its hypotheses.
In the case where the hypotheses of Theorem 2.1 are satisfied with , we can immediately derive a bound on the variance of , valid for any fixed starting state . Indeed, we have
[TABLE]
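Variance bounds of this kind rest on the Doob martingale $M_j = P^{t-j} f(X_j)$, $0 \le j \le t$. Its increments are orthogonal, so
$$ \operatorname{var}_{x_0} f(X_t) = \sum_{j=1}^{t} E_{x_0} V_j(X_{j-1}), $$
where $V_j(x) = \sum_y P(x,y)\,(P^{t-j}f(y))^2 - (P^{t-j+1}f(x))^2$ is the conditional variance of the $j$th increment. As we read the proof, a uniform bound on these conditional variances is what a hypothesis such as (2.2) supplies. The Python sketch below verifies the identity exactly for a small chain and function chosen arbitrarily. It illustrates the mechanism only, not the constants of the theorem.

```python
def apply_kernel(P, f):
    """(Pf)(x) = sum_y P[x][y] f(y)."""
    return [sum(p * fy for p, fy in zip(row, f)) for row in P]


def push_forward(mu, P):
    """Law after one step: (mu P)(y) = sum_x mu(x) P[x][y]."""
    m = len(P)
    return [sum(mu[x] * P[x][y] for x in range(m)) for y in range(m)]


def variance_two_ways(P, f, x0, t):
    """Return (var f(X_t) computed directly, sum of expected conditional variances)."""
    m = len(P)
    powers = [list(f)]  # powers[k] = P^k f
    for _ in range(t):
        powers.append(apply_kernel(P, powers[-1]))
    laws = [[1.0 if x == x0 else 0.0 for x in range(m)]]  # laws[j] = law of X_j
    for _ in range(t):
        laws.append(push_forward(laws[-1], P))
    mean = sum(p * fx for p, fx in zip(laws[t], f))
    direct = sum(p * (fx - mean) ** 2 for p, fx in zip(laws[t], f))
    total = 0.0
    for j in range(1, t + 1):
        g_next, g_prev = powers[t - j], powers[t - j + 1]
        # V_j(x): conditional variance of M_j given X_{j-1} = x
        V = [sum(P[x][y] * g_next[y] ** 2 for y in range(m)) - g_prev[x] ** 2 for x in range(m)]
        total += sum(laws[j - 1][x] * V[x] for x in range(m))
    return direct, total


if __name__ == "__main__":
    P = [[0.5, 0.5, 0.0, 0.0],
         [0.2, 0.3, 0.5, 0.0],
         [0.0, 0.4, 0.1, 0.5],
         [0.0, 0.0, 0.6, 0.4]]
    f = [0.0, 1.0, 3.0, 2.0]
    print(variance_two_ways(P, f, x0=0, t=6))
```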
2.2. Proof of Theorem 2.1
To prove Theorem 2.1, we use a slight extension of a result of McDiarmid [23]. Inequality (2.4) in Lemma 2.2 below is a ‘two-sided’ version of inequality (3.28) in Theorem 3.15 of McDiarmid [23]; inequality (2.5) is a slight extension of inequality (3.29) of McDiarmid [23], in that we work with a non-deterministic bound on , and is also two-sided.
For a square integrable random variable and a -field , we use to denote the conditional variance of on .
Lemma 2.2**.**
Let be a probability space equipped with a filtration in . Let be an -measurable random variable with , and let , for . Let and be constants such that a.s. and a.s. for all . Then for any ,
[TABLE]
More generally, the following holds. For , let
[TABLE]
For any and any values ,
[TABLE]
The proof is that of Theorem 3.15 (inequalities (3.28) and (3.29)) in McDiarmid [23], except that we use the indicator of the event instead of the event $\bigl\{\sum_{i=1}^{k} \operatorname{var}(Z_i \mid \mathcal{F}_{i-1}) \le \delta \bigr\}$. The proof resembles a stopping argument, while avoiding some technicalities.
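The Python sketch below is a quick sanity check of the Bernstein-type martingale inequality on the simplest martingale: a sum of $t$ independent increments taking the values $-1, 0, +1$ with probabilities $q/2, 1-q, q/2$. The increments are bounded by $b = 1$, and the conditional variances sum to $V = tq$. The sketch compares Monte Carlo tail frequencies with $2\exp\{-s^2/(2V + 2bs/3)\}$, which is the two-sided form of McDiarmid's inequality (3.28) as we understand it. The exact constants in (2.4) are not visible in this extraction, so this form is an assumption.

```python
import math
import random


def bernstein_bound(s, V, b):
    """Two-sided Bernstein-type bound 2 exp(-s^2 / (2V + 2bs/3)) (assumed form)."""
    return 2.0 * math.exp(-s * s / (2.0 * V + 2.0 * b * s / 3.0))


def lazy_walk_endpoint(t, q, rng):
    """Sum of t i.i.d. increments in {-1, 0, +1} with P(+1) = P(-1) = q/2."""
    total = 0
    for _ in range(t):
        u = rng.random()
        if u < q / 2:
            total -= 1
        elif u < q:
            total += 1
    return total


def empirical_tails(t, q, thresholds, runs, seed=0):
    """Monte Carlo estimates of P(|S_t| >= s) for each threshold s."""
    rng = random.Random(seed)
    ends = [abs(lazy_walk_endpoint(t, q, rng)) for _ in range(runs)]
    return {s: sum(e >= s for e in ends) / runs for s in thresholds}


if __name__ == "__main__":
    t, q = 100, 0.1
    V, b = t * q, 1.0
    for s, freq in empirical_tails(t, q, [2, 4, 6, 8, 10], runs=10000).items():
        print(s, freq, round(min(1.0, bernstein_bound(s, V, b)), 4))
```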
Proof.
Following McDiarmid [23], we use Lemma 3.16 [23], which is as follows. If is a martingale difference sequence with respect to a filtration , where each is bounded above, if is an indicator random variable, and if is a real number, then
[TABLE]
(The statement in [23] involves the supremum instead of the essential supremum: the notionally stronger version is obtained by changing the on a set of measure 0. The proof is fairly straightforward by induction over a single-step inequality.)
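Now, for any random variable $Z$ such that $E Z = 0$ and $Z \le b$, and any $\lambda > 0$, we have $E e^{\lambda Z} \le \exp\{ g(\lambda b)\, \lambda^2 \operatorname{var} Z \}$, where $g(x) = (e^x - 1 - x)/x^2$ (see Lemma 2.8 in McDiarmid [23]). So, for any , defining the (possibly infinite) random variables and , we have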
[TABLE]
Let be the indicator of the event . It then follows that
[TABLE]
Hence
[TABLE]
Optimising in , we set and use the inequality for , as in the proof of Theorem 2.7 in McDiarmid [23].
We obtain that
[TABLE]
The same proof gives the same upper bound on , and the result follows. ∎
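The analytic facts used in this proof can be checked numerically; the Python sketch below does so, though it proves nothing. With $g(x) = (e^x - 1 - x)/x^2$, McDiarmid's Lemma 2.8 gives $E e^{\lambda Z} \le \exp\{g(\lambda b)\lambda^2 \operatorname{var} Z\}$ whenever $E Z = 0$ and $Z \le b$. The elementary bound $g(x) \le 1/(2(1 - x/3))$ holds for $0 \le x < 3$. Substituting it into $e^{-\lambda s + g(\lambda b)\lambda^2 V}$ and choosing $\lambda = s/(V + bs/3)$ gives $\exp\{-s^2/(2V + 2bs/3)\}$. The symbols $s$, $V$, $b$ are ours, because the inline notation of the proof was lost. The sketch tests the first inequality on two-point laws (the extremal case) and the second on a grid, and checks the algebra of the choice of $\lambda$.

```python
import math
import random


def g(x):
    """g(x) = (e^x - 1 - x) / x^2, extended by continuity with g(0) = 1/2."""
    if x == 0.0:
        return 0.5
    return (math.expm1(x) - x) / (x * x)


def check_mgf_bound(lam, b, p):
    """Two-point Z: value b w.p. p, value -p*b/(1-p) w.p. 1-p (so E Z = 0 and Z <= b).
    Return (E exp(lam Z), exp(g(lam b) lam^2 var Z))."""
    a = p * b / (1.0 - p)
    mgf = p * math.exp(lam * b) + (1.0 - p) * math.exp(-lam * a)
    var = p * b * b / (1.0 - p)
    return mgf, math.exp(g(lam * b) * lam * lam * var)


def bernstein_exponent_after_choice(s, V, b):
    """lam*s - lam^2 V / (2 (1 - lam b / 3)) evaluated at lam = s / (V + b s / 3)."""
    lam = s / (V + b * s / 3.0)
    return lam * s - lam * lam * V / (2.0 * (1.0 - lam * b / 3.0))


if __name__ == "__main__":
    rng = random.Random(0)
    worst = 0.0
    for _ in range(10000):
        lhs, rhs = check_mgf_bound(rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0), rng.uniform(0.01, 0.9))
        worst = max(worst, lhs / rhs)
    print("largest ratio E e^{lam Z} / bound:", worst)  # expected not to exceed 1
    print(bernstein_exponent_after_choice(5.0, 10.0, 1.0), 25.0 / (20.0 + 10.0 / 3.0))
```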
Proof of Theorem 2.1. We start by assuming that . Let denote the natural filtration of . We fix a function , a natural number , and an initial state . We consider the evolution of for steps, conditional on . Define the random variable . Then, for , is given by
[TABLE]
To apply Lemma 2.2, we need to bound the conditional variances , for . Conditional on the event , takes the value with probability . Since for any , it follows that
[TABLE]
with . Using Assumption (2.2), this yields
[TABLE]
uniformly in . It thus follows that
[TABLE]
so we set .
We also need a uniform upper bound on . We note that
[TABLE]
Note that, from Assumption (2.1), if for some , then
[TABLE]
It then follows from (2.8) that, on the event ,
[TABLE]
uniformly in , since, in the last sum, both and belong to . Accordingly, we take .
Theorem 2.1 now follows from inequality (2.4) in Lemma 2.2, in the case where .
In general, for each , (2.7) and (2.9) hold if , and so all the above bounds hold on the event . Thus , as defined in Lemma 2.2, and the full statement of Theorem 2.1 follows from inequality (2.5) in Lemma 2.2.
2.3. Contracting chains
We next show how to use Theorem 2.1 to recover a version of Ollivier’s results on chains with positive coarse Ricci curvature.
Let $d$ be a metric on the state space $S$ of a discrete-time Markov chain $X$. A Markovian coupling $(X_t, Y_t)$ of two copies of the chain is contracting with respect to the metric $d$ if, for some positive constant $\kappa$ and for all $x, y \in S$,
$$ E\bigl[ d(X_1, Y_1) \mid X_0 = x,\ Y_0 = y \bigr] \le (1 - \kappa)\, d(x,y). \tag{2.10} $$
If condition (2.10) holds for all $x, y$ in some subset $S_0$ of $S$, then we say that the coupling is contracting on $S_0$.
The existence of a coupling satisfying (2.10) for all pairs of states is equivalent to the inequality
$$ W_d\bigl( P(x,\cdot), P(y,\cdot) \bigr) \le (1 - \kappa)\, d(x,y), \qquad x, y \in S, \tag{2.11} $$
where $W_d$ denotes the Wasserstein distance between two measures with respect to the metric $d$ on a space $S$: $W_d(\mu, \nu)$ is the infimum of $E\, d(U, V)$ over all pairs $(U, V)$ of $S$-valued random variables with $U \sim \mu$ and $V \sim \nu$. Ollivier [26] defines a Markov chain to have coarse Ricci curvature at least $\kappa$ if (2.11) holds; we prefer to say that the Markov chain is contracting in Wasserstein distance.
In the case where $d$ is a graph distance (that is, $d(x,y)$ is the length of a shortest path in a graph between vertices $x$ and $y$), inequality (2.11) is equivalent to
$$ W_d\bigl( P(x,\cdot), P(y,\cdot) \bigr) \le 1 - \kappa \quad \text{whenever } x \sim y, \tag{2.12} $$
where denotes adjacency in the graph. Gheissari, Lubetzky and Peres [11] call a chain satisfying (2.12) -contracting. We prefer to use the term contracting in Wasserstein distance to avoid confusion with the concept of contraction introduced by Marton [22], which is contraction in total variation distance.
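Condition (2.12) is easy to test numerically on a path. For this illustration we assume a chain on $\{0, \dots, n\}$ with $d(x,y) = |x - y|$, the graph distance on the path. Then the Wasserstein distance between two laws is $\sum_j |F(j) - G(j)|$, where $F$ and $G$ are the distribution functions. The Python sketch below applies this to the Bernoulli–Laplace chain of Section 5 over all adjacent pairs. By our own calculation, the ratio is exactly $1 - 2/n$ in the interior and $1 - 2/n + 2/n^2$ at the two boundary pairs. The chain is therefore contracting for this metric with $\kappa = 2/n - 2/n^2$.

```python
from itertools import accumulate


def bl_row(n, k):
    """Law of the next state of the Bernoulli-Laplace chain from state k, on {0, ..., n}."""
    row = [0.0] * (n + 1)
    up, down = ((n - k) / n) ** 2, (k / n) ** 2
    if k < n:
        row[k + 1] = up
    if k > 0:
        row[k - 1] = down
    row[k] = 1.0 - up - down
    return row


def wasserstein_line(mu, nu):
    """W1 for laws on {0, 1, ..., m} with metric |x - y|: sum of absolute CDF differences."""
    return sum(abs(F - G) for F, G in zip(accumulate(mu), accumulate(nu)))


def adjacent_ratios(n):
    """W1(P(k,.), P(k+1,.)) for k = 0, ..., n-1; each such pair is at distance 1."""
    return [wasserstein_line(bl_row(n, k), bl_row(n, k + 1)) for k in range(n)]


if __name__ == "__main__":
    n = 20
    ratios = adjacent_ratios(n)
    print("max ratio:", max(ratios), "  1 - 2/n + 2/n^2 =", 1 - 2 / n + 2 / n**2)
    print("interior ratio:", ratios[n // 2], "  1 - 2/n =", 1 - 2 / n)
```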
For a Markov chain that is contracting in Wasserstein distance with respect to a metric , we now prove concentration of measure for any real-valued function on the state space that is Lipschitz with respect to . Part (a) of the theorem below applies when the Markov chain is contracting on the entire state space; part (b) is for when the contraction is only on some “good set”.
For an event , we let denote its complement.
Theorem 2.3**.**
Let be a discrete-time chain on discrete state space with transition matrix . Suppose that is a metric on , and let be a function such that, for some constant , for all . Suppose also that is a positive constant such that whenever .
(a) If is contracting in Wasserstein distance, with constant , and is a constant such that, for all ,
[TABLE]
then, for all , , and ,
[TABLE]
(b) More generally, suppose that is contracting in Wasserstein distance on a subset of , with constant , and let be a further subset of such that . Suppose that (2.13) holds for all . For a positive integer, let , and define
[TABLE]
Then, for all and ,
[TABLE]
Note that we may always take , but sometimes it is possible to take significantly smaller. In part (b), we would expect to be able to choose the various sets so that is very small. In order to apply part (b) effectively, one would need to know that is small, and this will not be true if the starting state is “close to the boundary” of : a natural approach is to have three nested sets of states , with the starting state restricted to , and with the probability of escaping from one set to the next over the time interval of interest being small; then we obtain concentration of measure over that time interval, uniformly over starting states in .
Proof.
For part (a), we apply Theorem 2.1 with . For states and with , let and be copies of the chain with and , coupled so that for each . Then we have
[TABLE]
[TABLE]
whenever and . Thus we may take in (2.1) and in (2.2) for each . Since then for all , the inequality follows.
For part (b), our plan is to apply Theorem 2.1 to the “inner” set , so we need bounds on valid whenever and . Accordingly, we fix such a pair , and . We now consider two copies and of the chain, with and , with a contractive coupling on with constant . For , let be the event that both copies of the chain are in for all , and note that . We claim that, for each ,
[TABLE]
This is true for . If the inequality is true for , then
[TABLE]
as claimed. As each step of either chain increases the distance between them by at most , we also have the bound
[TABLE]
for and also for , and therefore
[TABLE]
Hence we have
[TABLE]
whenever and . Additionally we have that
[TABLE]
for all . Thus we can apply Theorem 2.1 with , for , and for each . Since then , the inequality follows. ∎
Both parts of Theorem 2.3 follow directly, with essentially the same proof as here, from Theorem 4.5 of Luczak [19]. Part (a) of the result is also very similar to Theorem 33 of Ollivier [26]. Ollivier’s result is for the equilibrium distribution, although he notes in Remark 39 that a similar result can be obtained for the finite-time distributions. Ollivier’s bounds are stated in terms of two quantities. The first is the coarse diffusion constant $\sigma(x)$ at a state $x$, which plays a role close to that of the conditional variance bounds in our hypotheses. The second is the local dimension $n_x$, which is of constant order in most applications with discrete state spaces. Our proof of Theorem 2.1 could be reworked to use the coarse diffusion constant directly. When bounding the conditional variances, we could instead use the identity $\operatorname{var} Y = \frac12 E(Y - Y')^2$, valid when $Y'$ is an independent copy of $Y$ (see the proof of Lemma 4.6 in [19]). The conclusion of our result translates to essentially the same as Ollivier’s, with different constants. The concentration result is of the “Gaussian-then-exponential” type.
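The identity behind the alternative route mentioned above is $\operatorname{var} Y = \tfrac12 E(Y - Y')^2$ for an independent copy $Y'$ of $Y$. It holds because $E(Y - Y')^2 = 2E Y^2 - 2(E Y)^2$. Suppose $Y = f(X_1)$ with $X_0 = x$ and $f$ is 1-Lipschitz. Then $\operatorname{var} Y \le \tfrac12 \sum_{y,z} P(x,y) P(x,z)\, d(y,z)^2$, which is the squared coarse diffusion constant as we recall Ollivier's definition. The Python sketch below checks both statements for a single hypothetical row of a transition matrix, with states on a line and $d(y,z) = |y - z|$.

```python
def variance(p, values):
    """Variance of a discrete random variable with P(Y = values[i]) = p[i]."""
    mean = sum(pi * v for pi, v in zip(p, values))
    return sum(pi * (v - mean) ** 2 for pi, v in zip(p, values))


def half_mean_square_difference(p, values):
    """(1/2) E (Y - Y')^2 for independent copies Y, Y' with the same law."""
    return 0.5 * sum(pi * pj * (vi - vj) ** 2
                     for pi, vi in zip(p, values) for pj, vj in zip(p, values))


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4]
    row = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical law of X_1 given X_0 = x
    f = [0.0, 0.5, 1.5, 2.0, 2.5]    # 1-Lipschitz with respect to d(y, z) = |y - z|
    print(variance(row, f), half_mean_square_difference(row, f))
    # diffusion-type quantity (1/2) sum_{y,z} P(x,y) P(x,z) d(y,z)^2, an upper bound for var f(X_1)
    print(half_mean_square_difference(row, states))
```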
2.4. Approximately -contracting chains
We next illustrate how Theorem 2.1 can be applied in other settings, without even a metric on the state space. One can obtain a result by analysing the direct effect a coupling has on the function of interest, if the coupling is “approximately -contracting”, as we now describe. As before, let and be two coupled copies of the Markov chain, and let be any function. Suppose that for any , and that, for all states ,
[TABLE]
for some constant , and some “error function” . (An example where there is a need for such an error function is in Lemma 3.1 of [3].)
An induction argument then gives that, for all and every ,
[TABLE]
where
[TABLE]
A convenient assumption, which is satisfied in the example from [3], is that , for all and all with , so that for each and each and with . It follows in this case that, for and every ,
[TABLE]
So we may take in Theorem 2.1, and hence for all . Also we may take , where is a uniform bound on for all and . Applying Theorem 2.1 with these constants then gives a concentration inequality valid for all and all :
[TABLE]
2.5. A toy example
Many of the chains we might be interested in have stationary distributions, and under suitable conditions our results on long-term concentration of measure imply concentration of measure in equilibrium. This is explored in Corollary 4.2 of Luczak [19], giving circumstances where the chain is guaranteed to have a stationary distribution, and where concentration results carry over to equilibrium. The main focus of the paper of Ollivier [26] is also concentration of measure in equilibrium. In the example in Section 7 of this paper, we use facts from elsewhere about the equilibrium distribution, as well as our long-term concentration results, to prove concentration of measure of a suitable function in equilibrium.
We finish this section with a very simple class of examples, illustrating very different circumstances when our results can be applied. These examples have no stationary distributions, and our results can be applied to show concentration of measure within a window whose width may be constant, or may increase with time.
Consider the discrete-time chain with state space , , and transition probabilities . This is thus a pure-birth chain, stepping up with probability 1/2 at each time. We also consider a function , and we are interested in the long-term behaviour of . Of course, this is easy to analyse directly since has a Binomial distribution with parameters . If, for example, for some constant , then is concentrated within a window of width around .
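A minimal sketch of the direct analysis. It assumes, for illustration only, that the chain starts at 0 on the non-negative integers, and the helper names are hypothetical. It propagates the law of the chain one step at a time with exact fractions and checks that the result is the Binomial law with parameters t and 1/2, so that the mean is t/2 and the variance t/4.

```python
from fractions import Fraction
from math import comb

def birth_chain_distribution(t):
    """Exact law of X_t for the pure-birth chain started at 0 (assumed):
    at each step it moves up by 1 with probability 1/2, otherwise it stays."""
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}
    for _ in range(t):
        new = {}
        for x, p in dist.items():
            new[x] = new.get(x, 0) + half * p          # stay put
            new[x + 1] = new.get(x + 1, 0) + half * p  # step up
        dist = new
    return dist

def binomial_pmf(t, k):
    """P(Bin(t, 1/2) = k) as an exact fraction."""
    return Fraction(comb(t, k), 2 ** t)

t = 12
dist = birth_chain_distribution(t)
assert all(dist[k] == binomial_pmf(t, k) for k in range(t + 1))
mean = sum(k * p for k, p in dist.items())
var = sum((k - mean) ** 2 * p for k, p in dist.items())
print(mean, var)  # 6 3, i.e. t/2 and t/4
```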
We start by explaining why the hypotheses of Theorem 2.3 are too restrictive to encompass these examples. Consider a coupling of two copies of the chain, so that at each step either both copies move up, or neither moves up. (Choosing a different coupling would not make any difference.) Suppose that this coupling is contracting, with constant , with respect to some metric on . Then we have
[TABLE]
for each pair , which amounts to for each . If the function is Lipschitz with respect to , with constant , then . This condition is only satisfied if converges to a limit , and moreover for some constant . In particular, none of the functions satisfy the hypotheses, even though a time-independent concentration result does hold when .
We now show how to apply our more general result, Theorem 2.1, to the class of functions , with . We note that is non-increasing in , and that from the Chernoff bound. Then we have, for any , and sufficiently large,
[TABLE]
Hence we may take for large enough , and then is at most a constant for , and at most for . We may also take . For , applying Theorem 2.1 with equal to the entire state space , gives a uniform bound on the concentration:
[TABLE]
for all , showing that remains concentrated within a window of constant width around its mean for all . Of course, this is still far from a sharp result. For , we obtain that
[TABLE]
so that is concentrated within of its expectation, which in this case is the correct order of magnitude.
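The window of width of order √t can be checked numerically. The sketch below is an illustration under stated assumptions, not the bound of Theorem 2.1: it assumes the chain starts at 0, so that X_t ~ Bin(t, 1/2), and compares the exact two-sided tail with Hoeffding's inequality P(|X_t - t/2| >= a) <= 2 exp(-2a^2/t), a standard Chernoff-type bound. With a proportional to √t the bound is the same for every t. The helper names are hypothetical.

```python
from math import comb, exp, sqrt

def binomial_two_sided_tail(t, a):
    """Exact P(|X - t/2| >= a) for X ~ Bin(t, 1/2)."""
    hits = sum(comb(t, k) for k in range(t + 1) if abs(k - t / 2) >= a)
    return hits / 2 ** t

def hoeffding_bound(t, a):
    """Hoeffding's inequality for t independent fair 0/1 steps:
    P(|X - t/2| >= a) <= 2 exp(-2 a^2 / t), capped at 1."""
    return min(1.0, 2 * exp(-2 * a * a / t))

for t in (100, 400, 1600):
    a = 2 * sqrt(t)  # window proportional to sqrt(t)
    print(t, binomial_two_sided_tail(t, a), hoeffding_bound(t, a))
```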
3. Concentration inequality: continuous time
We now state and prove a continuous-time version of Theorem 2.1. For definitions concerning continuous-time Markov chains, see Anderson [1], in particular pages 13 and 81 (we use the term “non-explosive” in place of “regular”).
Let be a stable, conservative, non-explosive continuous-time Markov chain with a discrete state space and -matrix . Let denote the transition probabilities of . Much as before, for a function , we write to denote , whenever it exists.
For , we set
[TABLE]
Theorem 3.1.
Let be the -matrix of a stable, conservative, non-explosive continuous-time Markov chain with discrete state space . Writing , let be a subset of , for which . Let be a function such that exists for all and , and suppose that is a constant such that
[TABLE]
for all , all and all . Assume also that the continuous function satisfies
[TABLE]
for all and all . Define . Finally, let . Then, for all , and ,
[TABLE]
Exactly as in the discrete case, a bound on the variance of follows in the case where .
In order to prove the theorem, we first need to show that, for any fixed , the function has zero quadratic variation on any finite -interval. This follows from the following lemma.
Lemma 3.2.
Under the above assumptions, for each , is continuously differentiable with respect to .
Proof.
We can suppose that for all ; if not, it suffices to consider the positive and negative parts and of separately. This enables the exchange of sums and integrals in the argument that follows.
First, by considering what happens up to time , we have
[TABLE]
Thus, from (3.1), for and , it follows that
[TABLE]
Now, since
[TABLE]
the Kolmogorov backward equations imply that, for any and , we have
[TABLE]
In view of (3.3), and because , the integrand on the right hand side of (3.4) is uniformly bounded on for any , implying that the indefinite integral is continuous in . From this, it follows immediately that is continuous in also. But then, for ,
[TABLE]
is a uniformly convergent sum, in view of (3.1), and so the integrand in (3.4) is continuous; thus the indefinite integral is continuously differentiable with respect to , and hence is also. ∎
Proof of Theorem 3.1. Fix and, for , define
[TABLE]
note that and that . Then is a martingale, and so is , where , and
[TABLE]
We now use a supermartingale derived from to prove a concentration bound.
In view of Lemma 3.2, the continuous part of has no quadratic variation until , and so the predictable quadratic variation of is given by
[TABLE]
Hence, by (3.2),
[TABLE]
Let the jump times of be denoted by , and write
[TABLE]
where , as in the proof of Lemma 2.2, and, for such that ,
[TABLE]
using the continuity of in for each .
Let denote the compensator of . We first note that is finite, at least for . This is because, for , we have
[TABLE]
by (3.1), as is increasing on . Hence, noting that , we see that
[TABLE]
in view of (3.5).
Now is a square integrable martingale, because of (3.5), and hence, from the proof of Lemma 2.2 in van de Geer [10], is a non-negative supermartingale with initial value , since the continuous part of has no quadratic variation. Thus
[TABLE]
On the other hand, using (3.6),
[TABLE]
Hence
[TABLE]
or
[TABLE]
We again optimise in , as in the proof of Theorem 2.7 in McDiarmid [23], and then repeat the argument for a bound on .
Let be a stable, conservative, non-explosive continuous-time chain with state space , and let be a metric on . A Markovian coupling of two copies of is itself a continuous-time Markov chain, with a generator that we denote . The coupling is said to be contracting with respect to , with constant , if, for all ,
[TABLE]
If the above holds for all and in some , then we say that the coupling is contracting on . We say that is contracting in Wasserstein distance if there is a coupling satisfying (3.7) for all . This definition corresponds to that of positive coarse Ricci curvature for continuous-time chains given by Veysseire [30], in the setting of jump chains.
The next result establishes concentration of measure for continuous-time chains that are contracting in Wasserstein distance. We state our result only for the case when the Markov chain is contracting on the entire state space, but there is not necessarily a global upper bound on the total transition rate out of a state. We could also provide a version for use when the contraction property only holds on a “good set”, but it seems hard to cover all the possible cases where such a result might be useful: an issue is that we need some mild control on the growth of in the unlikely event that the chain leaves the good set (in the discrete case, we used that the chain makes a bounded number of steps of bounded distance) and the form of the bounds will depend on the manner of that control.
Theorem 3.3.
Let be a stable, conservative, non-explosive continuous-time Markov chain on a discrete state space , with -matrix . Suppose that is a metric on , and let be a function such that, for some constant , for all .
Let be a subset of , and let and be constants such that for all and whenever and . For , let .
Suppose that is contracting in Wasserstein distance, as in (3.7), with constant . Then, for all , and ,
[TABLE]
Proof.
It follows from (3.7) that, under a contracting coupling of two copies and , the process $\bigl\{e^{\rho t}d(\widehat{X}^{(1)}(t),\widehat{X}^{(2)}(t))\bigr\}_{t\geq 0}$ is a non-negative local supermartingale. Thus, if , then
[TABLE]
We can now apply Theorem 3.1, with
[TABLE]
and so, for any ,
[TABLE]
The result now follows from Theorem 3.1. ∎
Note that the upper bound in Theorem 3.3 on the deviations of from its expectation does not depend on . As in the discrete case, in many applications, the distribution of will approach an equilibrium, and the bound above implies a bound on the concentration of in equilibrium. However, it might well be the case that as : eventually the chain leaves the good set, and once it does we cannot hope to say much about its behaviour.
4. Upper bounds on coalescence times
In this section, we prove an auxiliary result for continuous-time Markov chains, which we will use (primarily in Section 6) to show that a chain with a contracting coupling mixes rapidly once it enters a region of the state space where the equilibrium distribution is concentrated; this is therefore a useful ingredient in a proof of cut-off, showing that the mixing time from any “distant” state is dominated by the “travel time” to reach .
We study a function of a continuous-time Markov chain on the non-negative reals, with non-positive drift in all positive states, and prove a lower bound on the probability that it hits state 0 within a given time. For a contracting coupling of two copies of a Markov chain with respect to the metric on their state space , we can apply our result below to the function of the Markov chain , in order to show that coalescence occurs quickly once the distance between the two copies is reasonably small: we illustrate this method in Section 6.
We deal only with the continuous-time case. Proposition 17.19 of Levin, Peres and Wilmer [18] gives an analogous result for discrete-time chains, which can often be used in a similar way to that described above; our proof of the proposition below follows theirs.
Proposition 4.1.
Let be a stable, conservative, non-explosive continuous-time Markov jump chain, with state space and -matrix . Let and be positive, and let be a function. Set , and assume that:
- (i) the drift $\sum_{y}Q(x,y)\bigl(f(y)-f(x)\bigr)$ of is non-positive for all in ;
- (ii) makes jumps of magnitude at most ;
- (iii) $\sum_{y}Q(x,y)\bigl(f(y)-f(x)\bigr)^{2}\geq\sigma^{2}$ for all .
Define , the hitting time of . Then, for any ,
[TABLE]
Notes:
- (a) The nature of the underlying state space is not relevant, and we do not need to assume that the set is discrete.
- (b) It is not a priori obvious that is non-empty or that is a.s. finite, but these follow from the result.
- (c) Suppose that . In the case where , we then have , and so (4.1) holds without any condition on .
The motivating example underlying the proposition is that of a simple random walk on (with ), making steps up and down each at rate , until the walk hits 0, so that the sum in (iii) is equal to for each positive state. In this case, the proposition says that the walk hits 0 before time with probability at least , which is best possible up to a constant factor. The proposition then gives conditions, for more general processes, under which the same behaviour holds.
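The scaling in this motivating example can be checked numerically. The sketch assumes unit jumps and equal up and down rates `rate`, so that the sum in (iii) equals sigma^2 = 2 * rate; the helper names are hypothetical. Because the jump times of the continuous-time walk form a Poisson process of intensity 2 * rate that is independent of the jump directions, P(tau <= t) is a Poisson mixture of discrete-time hitting probabilities, which are computed by dynamic programming. The printed ratio sqrt(t) * P(tau > t) / k stays roughly constant, the k/(sigma sqrt(t)) behaviour one expects by analogy with the discrete-time bound of Levin, Peres and Wilmer; the exact constant in (4.1) is not reproduced here.

```python
from math import exp, lgamma, log, sqrt

def discrete_hit_probs(k, n_max):
    """h[n] = P(simple symmetric +/-1 walk from k >= 1 has hit 0 within n jumps)."""
    size = k + n_max + 2
    dist = [0.0] * size          # mass on positive states not yet absorbed at 0
    dist[k] = 1.0
    absorbed = 0.0
    h = [0.0]                    # no jumps yet, and k >= 1
    for _ in range(n_max):
        new = [0.0] * size
        for x in range(1, size - 1):
            p = dist[x]
            if p:
                new[x + 1] += 0.5 * p        # jump up
                if x == 1:
                    absorbed += 0.5 * p      # jump down into 0
                else:
                    new[x - 1] += 0.5 * p    # jump down
        dist = new
        h.append(absorbed)
    return h

def hit_by_time(k, rate, t):
    """P(tau_0 <= t) for the continuous-time walk jumping up and down at rate `rate` each.
    Jump times form a Poisson process of intensity 2*rate, independent of the jump
    directions, so we average the discrete answer over a Poisson(2*rate*t) jump count."""
    lam = 2 * rate * t
    n_max = int(lam + 12 * sqrt(lam) + 50)   # truncate the negligible Poisson tail
    h = discrete_hit_probs(k, n_max)
    return sum(exp(-lam + n * log(lam) - lgamma(n + 1)) * h[n] for n in range(1, n_max + 1))

k, rate = 5, 0.5            # unit jumps, so sigma^2 = 2 * rate = 1
for t in (100.0, 400.0):
    survive = 1 - hit_by_time(k, rate, t)
    print(t, round(survive, 3), round(survive * sqrt(t) / k, 3))
```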
As mentioned already, we shall apply Proposition 4.1 to a Markovian coupling , where and are two copies of a jump Markov chain with a state space equipped with a metric , and $f\bigl((x,y)\bigr)=d(x,y)$. The conclusion is equivalent to saying that the chains have coalesced by time with probability at least (unless the two chains start within distance of each other, where is the maximum size of a jump in the distance, and is less than ). If the coupling is contracting with respect to , then condition (i) is satisfied. A lower bound on the expression in condition (iii) can be obtained when, under the coupling, the distance between the two copies changes by at least at rate at least , for suitable and .
Our proof follows that of Proposition 17.19 in Levin, Peres and Wilmer [18].
Proof.
Let , so that . For some to be chosen later, let . We note that, for any ,
[TABLE]
We now give bounds on the two terms on the right above.
By (i), the process is a supermartingale, and by (ii) it is bounded between [math] and . Therefore, by the Optional Stopping Theorem, we have , and so .
For , we set . We claim that is a submartingale. For , we have
[TABLE]
As
[TABLE]
we have
[TABLE]
for all , by (i) and (iii), and so indeed for .
For , we have , as (since ) for . Thus we have, for any , , and so
[TABLE]
Hence we obtain, for any , . Letting tend to infinity and applying the Monotone Convergence Theorem, we obtain the same upper bound on . Therefore, for any ,
[TABLE]
We conclude that
[TABLE]
Optimising this bound by setting now gives, provided (so that ),
[TABLE]
If , then the result is trivial, so we obtain the bound above under the condition . ∎
We remark that the assumption of bounded jumps cannot be dropped. Let be a chain on with -matrix given by (a) for , and , and (b) for , . Then is a non-explosive jump chain satisfying conditions (i) and (iii) with . From a state , the probability that all subsequent jumps are down is equal to . Thus the chain makes a.s. finitely many visits to before entering and making only downward jumps thereafter, but can never reach 0.
Alternatively, consider the chain on with a -matrix such that for all , for , and for . This chain satisfies all of (i)-(iii), with , but is explosive: starting from a state , the probability that the chain makes infinitely many downward jumps before the first upward jump is . State 0 is not reached before the explosion time.
5. Bernoulli–Laplace diffusion model
As our first example, we re-examine the Bernoulli–Laplace chain (Feller [9], Example XV.2(f)), for which cut-off was first established in Diaconis and Shahshahani [6]. In this model, there are two urns, the left urn initially containing red balls, and the right urn black balls. Then, at each time step, a ball is chosen at random in each urn, and the two balls are switched.
The state of the system at any time is captured by the number of red balls in the left urn at time . The chain can be viewed as a discrete-time lazy random walk with state space , with state-dependent transition probabilities
[TABLE]
Diaconis and Shahshahani examine the total variation distance between the distribution of and its equilibrium distribution , a hypergeometric distribution with parameters , defined by
[TABLE]
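A small exact check, as a sketch. Reading the one-step probabilities off the urn description (one ball drawn uniformly from each urn of n balls, then the two swapped), the chain moves from j to j+1 with probability ((n-j)/n)^2 and from j to j-1 with probability (j/n)^2. Taking the hypergeometric equilibrium to be pi(j) = C(n,j) C(n,n-j) / C(2n,n), the code verifies detailed balance with exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bl_transitions(n, j):
    """One step from state j (red balls in the left urn), each urn holding n balls.
    Returns (P(up), P(down), P(stay)) as exact fractions."""
    up = Fraction((n - j) ** 2, n * n)   # black ball drawn on the left, red on the right
    down = Fraction(j * j, n * n)        # red ball drawn on the left, black on the right
    return up, down, 1 - up - down

def hypergeometric(n, j):
    """Assumed equilibrium: pi(j) = C(n, j) C(n, n - j) / C(2n, n)."""
    return Fraction(comb(n, j) * comb(n, n - j), comb(2 * n, n))

def is_stationary(n):
    """Detailed balance on every edge j <-> j+1 of this birth-death chain implies pi P = pi."""
    for j in range(n):
        up = bl_transitions(n, j)[0]
        down_next = bl_transitions(n, j + 1)[1]
        if hypergeometric(n, j) * up != hypergeometric(n, j + 1) * down_next:
            return False
    return sum(hypergeometric(n, j) for j in range(n + 1)) == 1

print(all(is_stationary(n) for n in range(1, 13)))  # True
```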
Analogously to earlier, we use , and to refer to distributions conditional on , and we also use , and to refer to the equilibrium distribution.
Letting , Diaconis and Shahshahani [6] show that there are universal constants such that
[TABLE]
Their proofs, especially that of (5.1), are based on algebraic techniques. Although they only consider starting from state , which is easily seen to maximise the mixing time, their proofs extend readily to cover other starting states. The upper bound (5.1) holds for any starting state. If the chain is started in a state in
[TABLE]
then a minor adjustment to their proof yields a bound of the form
[TABLE]
for some universal constant .
Thus, in the language introduced in Section 1, we have the following result.
Theorem 5.1.
For any , the Bernoulli–Laplace chain exhibits cut-off at on with window width .
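The cut-off is easy to see numerically for moderate n. The sketch below iterates the exact distribution of the chain from state n (all red balls in the left urn), using the one-step probabilities ((n-j)/n)^2 up and (j/n)^2 down read off the urn description and the hypergeometric equilibrium C(n,j) C(n,n-j) / C(2n,n), and prints the total variation distance at a few times around (1/4) n log n. It is an illustration, not a proof, and the helper names are hypothetical.

```python
from math import comb, log

def bl_tv_profile(n, t_max, start=None):
    """d(t) = total variation distance between the law of X_t and the hypergeometric
    equilibrium, t = 0..t_max, for the Bernoulli-Laplace chain started at `start`."""
    start = n if start is None else start
    pi = [comb(n, j) * comb(n, n - j) / comb(2 * n, n) for j in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[start] = 1.0
    profile = []
    for _ in range(t_max + 1):
        profile.append(0.5 * sum(abs(p - q) for p, q in zip(dist, pi)))
        new = [0.0] * (n + 1)
        for j, p in enumerate(dist):
            up, down = ((n - j) / n) ** 2, (j / n) ** 2
            new[j] += (1 - up - down) * p      # lazy step
            if j < n:
                new[j + 1] += up * p
            if j > 0:
                new[j - 1] += down * p
        dist = new
    return profile

n = 50
t_star = round(n * log(n) / 4)     # candidate cut-off time (1/4) n log n
d = bl_tv_profile(n, 4 * t_star)
for t in (t_star // 2, t_star, t_star + n // 2, t_star + n, 4 * t_star):
    print(t, round(d[t], 3))
```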
We use the results of the previous sections to give an alternative, coupling proof of Theorem 5.1, yielding the bounds in the result below.
Theorem 5.2.
Let be a copy of the Bernoulli-Laplace chain. For , set .
(a) For , we have
[TABLE]
for any , any , and .
(b) For , we have
[TABLE]
for any , and sufficiently large.
Thus our upper bound in Theorem 5.2(b) matches that of Diaconis and Shahshahani in (5.1), except that our proof requires a mild upper bound on , and our lower bound in part (a) improves on (5.2). The inequalities above are more than enough to imply Theorem 5.1.
Extensions and generalisations of the result of Diaconis and Shahshahani have also been obtained. For instance, Donnelly, Lloyd and Sudbury [7] showed cut-off for the separation distance mixing time for this model, and recently Eskenazis and Nestoridi [8] showed cut-off for the version where balls are exchanged at each step. All of these papers make some use of algebraic techniques.
We now give a brief overview of our proof of Theorem 5.2. The first step is to use our discrete-time concentration of measure inequality, Theorem 2.3(a), to show that, for any starting state and any , is well-concentrated around its mean. An easy estimate for the mean then shows that, with high probability, is far from for , and this is enough to give part (a).
The proof of (b) is more complicated. The concentration of measure result shows that is unlikely to leave a neighbourhood of for a long period of time after ; while it is in this neighbourhood, we can approximate the transitions of the chain by the transitions of a simpler chain whose long-term behaviour is easy to analyse, and show that the two chains therefore have approximately the same distributions over a suitably long time interval.
We proceed by stating and proving a sequence of lemmas. In what follows we drop the superscript , writing instead of , to lighten the notation.
Lemma 5.3.
Let be a copy of the Bernoulli-Laplace chain, with . For all starting states , all , and all with , we have
[TABLE]
where
[TABLE]
Proof.
Our plan is to use Theorem 2.3, and accordingly our first step is to describe a contractive coupling.
We fix , and , and let and be two copies of the chain starting in and respectively. We describe a coupling of the chains such that remains equal to 1 until dropping to 0. When the two chains are in adjacent states and with , say with and , then the next step of the coupling is as follows. The two chains jump together up by 1 with probability and down by 1 with probability . Additionally, the lower chain jumps up by 1 alone with probability , and the higher chain jumps down by 1 alone with probability . This leaves probability $1-\frac{1}{n^{2}}\bigl((n-j)^{2}+(j+1)^{2}\bigr)$ that both chains stay in their current state. Note that indeed is either 1 or 0, and that
[TABLE]
[TABLE]
for .
The rules above do not define a coupling in the case where or . In the case , for instance, jumps from 0 to 1 with probability 1, and jumps to one of 0, 1, or 2 with probabilities , , and respectively. There is thus no monotone coupling possible. However, when and , the next step of the coupling is forced since with probability 1, and it is still the case that is either 1 or 0. We have
[TABLE]
and similarly for . Hence our coupling is contractive with constant .
We take in Theorem 2.3(a), with , , , and , so that for all . Then, by Theorem 2.3(a), for all , all , and all , we have
[TABLE]
If we set , for , we obtain that , and so
[TABLE]
To complete the proof, it remains to verify the formula for . Observe that
[TABLE]
so that
[TABLE]
and hence
[TABLE]
as claimed. ∎
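The joint-move probabilities of the coupling used in the proof above are not legible in this copy, so the sketch below assumes the natural monotone choice: from (j, j+1) the chains move up together with probability ((n-j-1)/n)^2 (the higher chain's up-probability), down together with probability (j/n)^2 (the lower chain's down-probability), and each chain makes its excess move alone. Under that assumption it checks with exact fractions that both marginals are Bernoulli–Laplace steps and that the expected distance after one step from adjacent states is 1 - 2/n. The helper names are hypothetical.

```python
from fractions import Fraction

def bl_step_probs(n, j):
    """Marginal one-step law of the Bernoulli-Laplace chain from state j."""
    up = Fraction((n - j) ** 2, n * n)
    down = Fraction(j * j, n * n)
    return {j + 1: up, j - 1: down, j: 1 - up - down}

def adjacent_coupling(n, j):
    """ASSUMED monotone coupling from (lower, higher) = (j, j+1), 1 <= j <= n-2:
    move together as often as the marginals allow; each chain makes its excess
    move alone. Returns {(new_lower, new_higher): probability}."""
    moves = {
        (j + 1, j + 2): Fraction((n - j - 1) ** 2, n * n),                 # up together
        (j - 1, j): Fraction(j * j, n * n),                                # down together
        (j + 1, j + 1): Fraction((n - j) ** 2 - (n - j - 1) ** 2, n * n),  # lower up alone
        (j, j): Fraction((j + 1) ** 2 - j * j, n * n),                     # higher down alone
    }
    moves[(j, j + 1)] = 1 - sum(moves.values())                            # both stay
    return moves

def check_coupling(n, j):
    """Check both marginals against the chain; return E[distance after one step]."""
    moves = adjacent_coupling(n, j)
    assert all(p >= 0 for p in moves.values())
    lower, higher = {}, {}
    for (a, b), p in moves.items():
        lower[a] = lower.get(a, 0) + p
        higher[b] = higher.get(b, 0) + p
    assert lower == bl_step_probs(n, j) and higher == bl_step_probs(n, j + 1)
    return sum(p * abs(b - a) for (a, b), p in moves.items())

n = 10
print([check_coupling(n, j) for j in range(1, n - 1)])  # each entry equals 1 - 2/n = 4/5
```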
A matching tail bound for the equilibrium distribution follows from Lemma 5.3. In fact, unsurprisingly, sharper tail bounds on the hypergeometric distribution are known: results of Hoeffding [15] (see Section 6 and Theorem 1) imply that, for any ,
[TABLE]
An alternative proof was given by Chvátal [4].
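A numerical sanity check of a Hoeffding-type bound for the equilibrium. The sketch uses the one-sided form P(X - n/2 >= a) <= exp(-2a^2/n), which Hoeffding's Theorem 1 gives for n draws without replacement; this is one standard form, and the display in the text may be stated differently (for instance two-sided). Here X is the number of red balls in the left urn in equilibrium, and the helper names are hypothetical.

```python
from math import comb, exp

def hypergeom_upper_tail(n, a):
    """Exact P(X - n/2 >= a), X = red balls in the left urn in equilibrium
    (n draws without replacement from n red and n black balls)."""
    hits = sum(comb(n, j) * comb(n, n - j) for j in range(n + 1) if j - n / 2 >= a)
    return hits / comb(2 * n, n)

def hoeffding_without_replacement(n, a):
    """Hoeffding's bound exp(-2 a^2 / n) for n draws without replacement of values in [0, 1]."""
    return exp(-2 * a * a / n)

n = 60
for a in (2, 5, 8, 12):
    print(a, hypergeom_upper_tail(n, a), hoeffding_without_replacement(n, a))
```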
It is now not hard to obtain the claimed lower bound on total variation distance for .
Proof of Theorem 5.2(a).
For , and , we have seen that both and the equilibrium distribution are well-concentrated around their respective means. We will show that, if is in for some fixed , so that , then the means are still far apart at time .
From (5.4), we have that, uniformly in ,
[TABLE]
for all (so that $n^{1/2}\bigl(1-\frac{2}{n}\bigr)^{\frac{1}{4}n\log n}\geq 1/2$).
For fixed and with , we set
[TABLE]
By (5.5), we have
[TABLE]
Similarly, using (5.6) and Lemma 5.3, we have that, for any ,
[TABLE]
for all . Hence we have
[TABLE]
uniformly in , which is the required result. ∎
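The lower-bound argument rests on how fast the mean moves towards n/2. Assuming the one-step probabilities ((n-j)/n)^2 up and (j/n)^2 down read off the urn description, the drift from j is ((n-j)^2 - j^2)/n^2 = -(2/n)(j - n/2), which is linear in j, so E X_t - n/2 = (1 - 2/n)^t (X_0 - n/2) exactly. The sketch below confirms this with exact arithmetic for a small n, and also checks numerically, over a finite range of n only, the side condition n^{1/2}(1 - 2/n)^{(1/4) n log n} >= 1/2. The helper names are hypothetical.

```python
from fractions import Fraction
from math import log, sqrt

def bl_mean_path(n, start, t_max):
    """Exact E[X_t], t = 0..t_max, for the Bernoulli-Laplace chain from `start`."""
    dist = {start: Fraction(1)}
    means = []
    for _ in range(t_max + 1):
        means.append(sum(j * p for j, p in dist.items()))
        new = {}
        for j, p in dist.items():
            up = Fraction((n - j) ** 2, n * n)
            down = Fraction(j * j, n * n)
            for k, q in ((j + 1, up), (j - 1, down), (j, 1 - up - down)):
                if q:
                    new[k] = new.get(k, 0) + p * q
        dist = new
    return means

n, start = 12, 12
means = bl_mean_path(n, start, 20)
predicted = [Fraction(n, 2) + (1 - Fraction(2, n)) ** t * (start - Fraction(n, 2))
             for t in range(21)]
print(means == predicted)  # True

# side condition, checked numerically for 3 <= m <= 10000 only (not a proof)
worst = min(sqrt(m) * (1 - 2 / m) ** (m * log(m) / 4) for m in range(3, 10001))
print(worst)
```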
Our proof of the lower bound above is actually very similar to that of Diaconis and Shahshahani: we have obtained an improved result by using Lemma 5.3, giving Gaussian concentration for , instead of appealing to Chebyshev’s inequality.
We now turn to the upper bound. We start by using Lemma 5.3 to show that, for a long period beyond time , the process is unlikely to leave an interval of width around .
Lemma 5.4.
For , any , and any starting state ,
[TABLE]
Proof.
For and any starting state , we have from (5.4) that
[TABLE]
for all .
Therefore, at times , , for any starting state and for , we have
[TABLE]
Combining this with Lemma 5.3, we have for ,
[TABLE]
We apply this inequality with , which is greater than for (since ), and deduce that
[TABLE]
The required result now follows. ∎
We remark here that it would be relatively straightforward to complete the proof of cut-off at this point: we can exhibit a coupling between two copies of the chain both remaining close to , such that the distance between the two copies is stochastically dominated by a simple lazy random walk – such a proof would show quickly that the two copies coalesce by time with probability . (A similar argument is used by Eskenazis and Nestoridi [8], based on a discrete-time analogue of Proposition 4.1.) In order to establish the bound (5.3), we need a more precise argument.
For the moment we assume, for simplicity of exposition, that for some positive integer . We consider the walk defined by , , which describes the evolution of beyond the time . The transitions of this walk are given by:
[TABLE]
for .
At least when is small, has transition probabilities close to those of the simpler process , with , and transition probabilities given by
[TABLE]
We shall use as a surrogate for in the argument to come.
The similarity of the transition probabilities (5.8) and (5.9), together with Lemma 5.4, is next used to show that, with high probability, the processes and are almost indistinguishable for a long time.
For a sequence , we denote the initial segment up to time by .
Lemma 5.5.
For , and , we have
[TABLE]
Proof.
For a sequence such that the are integers with for all , let the likelihood ratio of the process compared to on the segment be given by
[TABLE]
For , we set , and note that . If , we then have, from the formulae for the transition probabilities, that
[TABLE]
so that, if , it follows that
[TABLE]
Replacing by a path of , we note that is a martingale. Defining
[TABLE]
it follows from (5.11) that the quadratic variation of the martingale until time is at most . Since also , it follows from the Burkholder–Davis–Gundy inequality that
[TABLE]
Define the events and by
[TABLE]
Then
[TABLE]
and, on , . Hence,
[TABLE]
From (5.12) and Kolmogorov’s inequality, and from Lemma 5.4, we have
[TABLE]
Hence, for , we have
[TABLE]
as required. ∎
Thus, with error at most , we can replace by when calculating probabilities, and make only a small error if . Recalling that , this means that the approximation of by is asymptotically accurate over time intervals of length $o\bigl((n/\log n)^{2}\bigr)$.
We now use a coupling argument to show how fast converges to its equilibrium distribution .
Lemma 5.6.
For any and , we have
[TABLE]
Proof.
First, we note that the process can equivalently be described by way of a discrete Ehrenfest ball scheme. There are balls, each of which is in state [math] or . At each step, a ball is chosen independently at random from the balls, and its state is chosen to be [math] or , each with probability , independently of the whole past of the process. If balls are in state and in state [math] at step , we say that ; then the probabilities for are easily seen to be given by (5.9), and its equilibrium distribution to be .
We now define a coupling of two copies and of the process , with . Pair the balls in the two processes so that those initially in state in are paired with balls in state in , and those initially in state [math] in are paired with balls in state [math] in ; then pair the remaining balls in the two processes. Couple the evolution by selecting one of these pairs of balls at each step, and re-assigning its state independently (the new state being the same for both and ). Let denote the number of pairs of balls that have not been drawn up to step , made up of in state , in state [math], and of from the pairs of balls with differing initial states. Conditional on , and , we have
[TABLE]
where has distribution . Now, since the distribution is unimodal with mode , we have, for all , that
[TABLE]
It follows that
[TABLE]
implying that
[TABLE]
Now is the probability that all the draws come from some subset of of the matched pairs of balls, and so, for ,
[TABLE]
We also have . Hence, allowing either ordering of and , it follows from (5.14) that
[TABLE]
Setting , and taking to be in equilibrium, we deduce, by taking expectations in (5.15), that
[TABLE]
as desired. ∎
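The ball scheme and the coupling in this proof can be checked directly. The sketch below works with a generic number N of balls and labels the two ball states 0 and 1; N, the labels and the helper names are illustrative. With k balls in state 1, one step moves the count up with probability (N-k)/(2N), down with probability k/(2N), and leaves it unchanged with probability 1/2, and the Binomial(N, 1/2) law satisfies the balance equations exactly. The coupling pairs matching initial states first and, at each step, gives one uniformly chosen pair a common fresh fair state; once every initially mismatched pair has been drawn, the two counts agree forever. For orientation, a simple union bound over the d mismatched pairs gives P(counts differ at step t) <= d (1 - 1/N)^t; this need not coincide with the bound derived in the proof.

```python
import random
from fractions import Fraction
from math import comb

def ehrenfest_step_probs(N, k):
    """Count chain for N balls, k in state 1: a uniformly chosen ball is re-set
    to state 0 or 1 with probability 1/2 each. Returns (P(up), P(down), P(stay))."""
    up = Fraction(N - k, 2 * N)      # a state-0 ball is chosen and set to 1
    down = Fraction(k, 2 * N)        # a state-1 ball is chosen and set to 0
    return up, down, 1 - up - down   # the stay probability is always 1/2

def binomial_is_stationary(N):
    """Check the balance equations pi P = pi for pi = Binomial(N, 1/2), exactly."""
    pi = [Fraction(comb(N, k), 2 ** N) for k in range(N + 1)]
    for k in range(N + 1):
        inflow = pi[k] * ehrenfest_step_probs(N, k)[2]
        if k > 0:
            inflow += pi[k - 1] * ehrenfest_step_probs(N, k - 1)[0]
        if k < N:
            inflow += pi[k + 1] * ehrenfest_step_probs(N, k + 1)[1]
        if inflow != pi[k]:
            return False
    return True

def coupled_counts(N, kx, ky, steps, rng):
    """Paired-ball coupling of two copies with kx and ky balls in state 1.
    Returns ([(count_x, count_y) after each step], first step by which every
    initially mismatched pair has been drawn, or None)."""
    m1, m0 = min(kx, ky), min(N - kx, N - ky)   # pairs with matching initial states
    d = N - m1 - m0                              # = |kx - ky| mismatched pairs
    mismatch = (1, 0) if kx > ky else (0, 1)
    pairs = [(1, 1)] * m1 + [(0, 0)] * m0 + [mismatch] * d
    undrawn = set(range(m1 + m0, N))             # indices of the mismatched pairs
    coalesced_at = 0 if d == 0 else None
    history = []
    for t in range(1, steps + 1):
        i = rng.randrange(N)                     # choose a pair uniformly
        bit = rng.randrange(2)                   # common fresh state, fair coin
        pairs[i] = (bit, bit)
        undrawn.discard(i)
        if coalesced_at is None and not undrawn:
            coalesced_at = t
        history.append((sum(a for a, _ in pairs), sum(b for _, b in pairs)))
    return history, coalesced_at

def union_bound(N, d, t):
    """P(some of d given pairs is still undrawn after t uniform draws) <= d (1 - 1/N)^t."""
    return min(1.0, d * (1 - 1 / N) ** t)

print(all(binomial_is_stationary(N) for N in range(1, 15)))  # True
history, tau = coupled_counts(20, 15, 5, 400, random.Random(2024))
print(tau, history[-1], union_bound(20, 10, 100))
```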
Proof of Theorem 5.2(b).
We combine Lemma 5.5 with Lemma 5.6, replacing by , to deduce that, for any ,
[TABLE]
where we have used (5.7) to reach the last inequality, provided .
The bound in (5.16) remains valid for any initial distribution; taking , so that also , this implies that
[TABLE]
also. (The bound above is valid for any , and is minimised for of order . One could obtain a stronger bound, of order , by direct computation, but this is rather delicate and the gain is not relevant to us.)
Hence, for a sufficiently large multiple of 4, and , we have
[TABLE]
This bound also holds trivially for . Taking , this proves the result in the case where is a multiple of 4.
If is not divisible by , the argument remains almost the same. Define , and set , as above. The transition rates for are not quite as in (5.8), but they are very close, resulting only in an extra contribution of order to the bounds in (5.10). This correction is of smaller order than , and can be absorbed into the bound (5.13) provided is sufficiently large. The rest of the proof is unchanged. ∎
Diaconis and Shahshahani [6], and other authors, actually consider a more general version, with boxes of unequal sizes. The first box initially contains red balls, and the second black balls. The mixing process runs as before. Our approach can be used for this model as well. The jump probabilities for the process counting the number of red balls in the first box are again quadratic in the current state of the process. When evaluated close to the equilibrium mean , where , these probabilities are close to the linear jump probabilities near equilibrium of another process consisting of balls, coloured red or black, with the following dynamics. At each time step, a ball is chosen. It is left with unchanged colour with probability ; otherwise, it is re-coloured red with probability and black with probability , independently of everything else (so that its colour may in fact still be unchanged). Then denotes the number of red balls at time . The values of and to best match the original process are found to be
[TABLE]
note that, for , as previously, we have , and , corresponding to the approximation made before. With these modifications, an analogous argument can be carried out, to establish cut-off.
6. A two host model of disease
Our next example is a two-dimensional Markov chain in continuous time, representing a two host model of disease, in which transmission only occurs between one host type and the other (snails and human beings in schistosomiasis (Jordan, Webbe and Sturrock [16]), or males and females in sexually transmitted diseases (Hethcote and Yorke [14])). Our framework is appropriate for a disease that is not naturally endemic in a region, being supported at a low level through immigration from outside. In state , there are type- hosts and type- hosts infected. From any state , there are four possible transitions, whose rates are as follows:
[TABLE]
Here, , , , , and are fixed positive constants, and the parameter is a measure of the typical size of the infected population. The first transition corresponds to the infection of a type host, by a type host or from outside, and the second to the infection of a type host. The third transition corresponds to the recovery of a type host, and the fourth to the recovery of a type host. The infection transition rates are appropriate in circumstances in which the host population is so large that the reduction in infection rate caused by some of the population already being infected is negligible, or for diseases such as malaria, when ‘super-infection’ is possible: a host infected more than once is proportionately more infectious – in this case, denotes the total number of infections of each type of host.
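The rate table is not legible in this copy, so the Gillespie sketch below uses an assumed linear parametrisation of the four transitions, consistent with the verbal description but not taken from the paper: type-1 infection at rate lam1*y + N*mu1 (by a type-2 host or from outside), type-2 infection at lam2*x + N*mu2, and recoveries at rho1*x and rho2*y. All parameter names are hypothetical. The chain waits an exponential time with the total rate, then picks a transition with probability proportional to its rate.

```python
import random

def simulate_two_host(x, y, N, params, t_end, rng):
    """Gillespie simulation with ASSUMED linear rates (illustrative, not the paper's):
      (x, y) -> (x + 1, y) at lam1 * y + N * mu1   type-1 infection, by type 2 or from outside
      (x, y) -> (x, y + 1) at lam2 * x + N * mu2   type-2 infection, by type 1 or from outside
      (x, y) -> (x - 1, y) at rho1 * x             type-1 recovery
      (x, y) -> (x, y - 1) at rho2 * y             type-2 recovery
    Returns [(time, x, y)] recorded at time 0 and after every jump up to t_end."""
    lam1, lam2, mu1, mu2, rho1, rho2 = params
    t, path = 0.0, [(0.0, x, y)]
    while True:
        rates = [lam1 * y + N * mu1, lam2 * x + N * mu2, rho1 * x, rho2 * y]
        total = sum(rates)                 # > 0 because mu1, mu2 > 0
        t += rng.expovariate(total)        # exponential holding time
        if t > t_end:
            return path
        r = rng.random() * total           # pick a transition proportional to its rate
        if r < rates[0]:
            x += 1
        elif r < rates[0] + rates[1]:
            y += 1
        elif r < rates[0] + rates[1] + rates[2]:
            x -= 1
        else:
            y -= 1
        path.append((t, x, y))

path = simulate_two_host(0, 0, N=200, params=(0.5, 0.8, 1.0, 1.0, 1.0, 1.0),
                         t_end=20.0, rng=random.Random(7))
print(len(path), path[-1])
```

With these illustrative values the mean equation has equilibrium (500, 600), and the final state should fluctuate around it on a scale of order the square root of N.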
Let , where and refer to the distribution conditional on . It follows that satisfies the differential equation , where
[TABLE]
with initial condition . We define , and assume from now on that , so that has both eigenvalues negative, and we denote them by , with corresponding unit (right) eigenvectors and . The differential equation has a non-trivial equilibrium at
[TABLE]
and its full solution is
[TABLE]
showing that the equilibrium is globally attractive when .
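To illustrate the structure of this solution, the sketch assumes a linear parametrisation of the rates (hypothetical names: x -> x+1 at lam1*y + N*mu1, y -> y+1 at lam2*x + N*mu2, x -> x-1 at rho1*x, y -> y-1 at rho2*y), which is not taken from the paper. The mean then solves m' = A m + c with A = [[-rho1, lam1], [lam2, -rho2]] and c = N(mu1, mu2). Since A has negative trace and non-negative off-diagonal entries, its eigenvalues are real, and both are negative exactly when rho1*rho2 > lam1*lam2. The code computes the equilibrium and the explicit solution m(t) = m* + e^{At}(m(0) - m*), using Sylvester's formula for the 2x2 matrix exponential.

```python
from math import exp, sqrt

def mean_dynamics(lam1, lam2, mu1, mu2, rho1, rho2, N):
    """m' = A m + c for the ASSUMED rates: x -> x+1 at lam1*y + N*mu1,
    y -> y+1 at lam2*x + N*mu2, x -> x-1 at rho1*x, y -> y-1 at rho2*y."""
    A = ((-rho1, lam1), (lam2, -rho2))
    c = (N * mu1, N * mu2)
    return A, c

def eigenvalues(A):
    """Real eigenvalues l1 < l2 of a 2x2 matrix with positive off-diagonal entries."""
    (a, b), (e, d) = A
    tr, det = a + d, a * d - b * e
    disc = sqrt(tr * tr - 4 * det)   # = sqrt((a - d)^2 + 4 b e) > 0
    return (tr - disc) / 2, (tr + disc) / 2

def equilibrium(A, c):
    """Solve A m* + c = 0 by Cramer's rule."""
    (a, b), (e, d) = A
    det = a * d - b * e
    return ((-c[0] * d + b * c[1]) / det, (e * c[0] - a * c[1]) / det)

def mean_at(t, m0, A, c):
    """m(t) = m* + exp(At)(m0 - m*), using Sylvester's formula
    exp(At) = (e^{l1 t}(A - l2 I) - e^{l2 t}(A - l1 I)) / (l1 - l2)."""
    l1, l2 = eigenvalues(A)
    ms = equilibrium(A, c)
    u = (m0[0] - ms[0], m0[1] - ms[1])

    def shifted(l):  # (A - l I) u
        return ((A[0][0] - l) * u[0] + A[0][1] * u[1],
                A[1][0] * u[0] + (A[1][1] - l) * u[1])

    p, q = shifted(l2), shifted(l1)
    e1, e2 = exp(l1 * t), exp(l2 * t)
    return tuple(ms[i] + (e1 * p[i] - e2 * q[i]) / (l1 - l2) for i in range(2))

A, c = mean_dynamics(0.5, 0.8, 1.0, 1.0, 1.0, 1.0, N=200)
print(eigenvalues(A), equilibrium(A, c))
for t in (0.0, 2.0, 10.0):
    print(t, mean_at(t, (0.0, 0.0), A, c))
```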
For any and any , we define the travel time from state (to within of ) to be
[TABLE]
which, in view of (6.3), is therefore the infimum of times such that .
For , let
[TABLE]
We shall prove the following theorem.
Theorem 6.1.
Suppose that . Then, for any , exhibits cut-off at on , with window width .
We first consider the problem of estimating for . Writing as a linear combination of the unit eigenvectors and of , we have
[TABLE]
Then .
For , there is a constant such that, for all , . For “most” states in , there is a matching lower bound, but is as small as when is close to a multiple of .
The rest of this section is devoted to a proof of Theorem 6.1: we give a brief road map of the proof here. Our basic plan is to apply Theorem 3.3 to our chain, showing concentration of measure for while . To this end, we specify a suitable metric, and a Markovian coupling of two copies of the chain which is contracting in Wasserstein distance with respect to that metric. We show that the chain remains within a good set (where, in particular, the total transition rate is bounded) over a long time period. Then we apply Theorem 3.3 to each of the two coordinate projections, showing that both remain concentrated around their means for a long time. We deduce readily that the chain is far from its equilibrium for times less than . On the other hand, once the chain reaches a neighbourhood of , we can use Proposition 4.1 to show that it couples rapidly with an equilibrium copy of the chain, so the total variation distance to the equilibrium copy is small for times only slightly greater than .
The two left eigenvectors of can be written in the form , where is a solution of the equation , with the common value being minus the corresponding eigenvalue. This equation has one negative solution , corresponding to the eigenvalue , and the other solution lying in the interval . Thus we have
[TABLE]
We introduce the norm on , with
[TABLE]
We shall shortly prove that our chain has a contracting coupling with respect to the distance .
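The role of the left eigenvectors can be checked numerically. The sketch below assumes a generic 2×2 mean-drift matrix M with negative diagonal and positive off-diagonal entries. The values are hypothetical and are not those of the two-host model. The code computes both roots of the quadratic satisfied by a when (1, a) is a left eigenvector. It then checks that the weighted ℓ¹ norm |z₁| + a₊|z₂|, built from the positive root, decays at rate ρ = −(M₁₁ + a₊M₂₁) along z' = Mz. Using the positive root as the weight is our illustrative assumption about the form of the norm.

```python
import math

def left_eigen_roots(M):
    """Roots a of M21 a^2 + (M11 - M22) a - M12 = 0, i.e. (1, a) M = mu (1, a).

    Assumes M21 > 0 and M12 > 0, so the roots are real with opposite signs.
    """
    (m11, m12), (m21, m22) = M
    gap = m11 - m22
    disc = math.sqrt(gap * gap + 4 * m21 * m12)
    return (-gap - disc) / (2 * m21), (-gap + disc) / (2 * m21)

def weighted_norm_drift(M, w, z):
    """d/dt of w1|z1| + w2|z2| along z' = M z, at a point with z1, z2 != 0."""
    mz = (M[0][0] * z[0] + M[0][1] * z[1], M[1][0] * z[0] + M[1][1] * z[1])
    return sum(wi * math.copysign(1.0, zi) * dzi for wi, zi, dzi in zip(w, z, mz))
```

For M = [[-0.5, 0.3], [0.2, -0.8]], the roots are about −2.19 and 0.69. The drift of |z₁| + 0.69|z₂| is at most −0.36 times the norm, with equality when z₁ and z₂ have the same sign.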
Next, we collect some elementary properties of the Markov chain . First, we note that, for , is a -type subcritical Markov branching process with immigration, and hence has an equilibrium distribution . Furthermore, since the process without immigration is sub-critical and has birth and death rates that do not depend on , whereas the immigration rates are multiples of , the mean of is , and its covariance matrix is of the form , for not depending on (see, for example, Quine [28] (Theorem on p. 414 and Equation (29)) for analogues in discrete time).
Next, for use with Theorem 3.3, we show that the chain rarely gets too far from the origin, so that the total transition rate remains bounded. For , we define
[TABLE]
Proposition 6.2.
Suppose that . Then there exist positive constants and , depending on the parameters of the model but not on , such that, for any , any , any , and any ,
[TABLE]
Proof.
Let denote the generator of , and define . The first step is to show that, for sufficiently small positive , for all such that is large enough.
Setting for , and , we have:
[TABLE]
We now see that
[TABLE]
We bound the term in (6) above by noting that and are all at most 1, provided , and hence
[TABLE]
Hence, for , we have
[TABLE]
which is non-positive whenever .
Now fix some , and some starting state , so that and therefore and . Fix also some . We will show that the probability that ever exits the set during a fixed time interval is very small for large .
We consider the excursions out of the set during . Note that, each time that enters , it remains there at least for the holding time of the state at which it first enters, which has an exponential distribution with mean at least , for
[TABLE]
This implies that the number of exits of from in is stochastically dominated by a Poisson random variable with mean .
We claim that, each time that leaves , the probability that exceeds the value before returns to is exponentially small in . To prove this, consider starting in some state which can be reached in one step from , so that , and let
[TABLE]
In view of (6.6), is a non-negative supermartingale in . Stopping at , it thus follows that,
[TABLE]
from which it follows that
[TABLE]
It follows that the expected number of times that exits in the interval is at most , establishing the proposition. ∎
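The exponential bound in this proof comes from optional stopping applied to a non-negative supermartingale. A toy version shows the mechanism. Here a biased simple random walk stands in for the excursion process; this substitution is ours and is for illustration only. With up-probability p < 1/2 and r = (1 − p)/p, the process r^{S_t} is a martingale. Optional stopping therefore gives P(hit H before 0 | S₀ = 1) ≤ r^{1−H}. The sketch compares this bound with the exact gambler's-ruin probability.

```python
def exact_hit_prob(p, H, start=1):
    """P(simple walk from `start` hits H before 0), up-probability p != 1/2."""
    r = (1 - p) / p
    return (r ** start - 1) / (r ** H - 1)

def supermartingale_bound(p, H, start=1):
    """Optional stopping for the martingale r**S: r**H * P(hit H) <= r**start."""
    r = (1 - p) / p
    return r ** (start - H)
```

The bound decays geometrically in H. This is the discrete analogue of the "exponentially small" excursion probability in the proof.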
We now introduce a Markovian coupling of two copies of the Markov chain , which we will then show to be contracting with respect to the metric on . In this coupling, the two copies and make moves independently in any co-ordinate where they currently differ (so in particular the two copies a.s. never move together in such a co-ordinate), but make moves together as far as possible in co-ordinates where they currently agree.
For each , we denote the transition rate of from to , given in (6.1), by . We then couple copies and of as follows.
Suppose that and . If , then for or , there is a transition to at rate , and a transition to at rate . If , then there is a transition to at rate , a transition to at rate , and a transition to at rate . The transitions in directions and are defined analogously.
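The coupling rule can be written as a function that lists the joint transitions of the pair and their rates. In the sketch below, `toy_rates` is a hypothetical stand-in for the rates of (6.1). For a state in Z₊², it returns a dictionary mapping unit increments to rates. The check confirms two things. First, each copy viewed on its own still jumps with its original rates. Second, the copies never jump together in a coordinate where they differ.

```python
from collections import defaultdict

def toy_rates(x):
    """Hypothetical affine jump rates on Z_+^2; a stand-in for the rates in (6.1)."""
    x1, x2 = x
    return {(1, 0): 1.0 + 0.5 * x1 + 0.3 * x2, (-1, 0): 1.0 * x1,
            (0, 1): 0.5 + 0.2 * x1 + 0.4 * x2, (0, -1): 1.2 * x2}

def coupled_rates(x, y, rate_fn=toy_rates):
    """Joint transitions {(x_new, y_new): rate} of the coupling described above."""
    rx, ry = rate_fn(x), rate_fn(y)
    out = defaultdict(float)
    for e in set(rx) | set(ry):
        i = 0 if e[0] != 0 else 1          # the coordinate that e moves
        a, b = rx.get(e, 0.0), ry.get(e, 0.0)
        nx = (x[0] + e[0], x[1] + e[1])
        ny = (y[0] + e[0], y[1] + e[1])
        if x[i] == y[i]:                   # copies agree here: jump together at the common rate
            m = min(a, b)
            out[(nx, ny)] += m
            out[(nx, y)] += a - m
            out[(x, ny)] += b - m
        else:                              # copies differ here: jump independently
            out[(nx, y)] += a
            out[(x, ny)] += b
    return {k: v for k, v in out.items() if v > 0}
```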
Proposition 6.3.
The coupling defined above for is contracting with respect to the metric , with constant .
Proof.
If both chains make the same transition at , then the distance between them does not change: . Otherwise, the distance changes by as a result of a jump by either copy in either -direction, or by as a result of a jump by either copy in either -direction.
Let the generator of the process be denoted by . We start by looking at the contribution of the jumps to . If , then , so the two chains always make this transition together, contributing no change to the distance. If , then the jump in occurs at rate and reduces the distance by 1, while the jump in occurs at rate and increases the distance by 1: overall, the net contribution is . The same calculation applies if , so in all cases the contribution of this jump is . Similarly, the contribution of the jump is .
We now turn to the jump. If , the distance increases by 1 whenever one chain makes this jump and the other does not, which occurs at rate . If , a jump in one of the chains increases the distance by 1, while the same jump in the other chain decreases the distance by 1, so the net contribution from this jump is at most , which is again equal to . Similarly, the contribution of the jump is at most .
Referring to (6.4), it follows that, for all states ,
[TABLE]
as required. ∎
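The generator computation in this proof can be reproduced mechanically. The sketch below again uses hypothetical affine rates rather than those of the two-host model. It evaluates the generator of the coupled pair applied to a weighted ℓ¹ distance, coordinate by coordinate, following the case analysis above. Where the copies differ, they jump independently. Where they agree, only the excess rates |r(x) − r(y)| separate them. The code then checks Ld ≤ −ρd on a grid of pairs. The weight θ and the rate ρ come from the positive left eigenvector of the drift matrix of these toy rates; that choice is our assumption.

```python
import itertools
import math

# Hypothetical affine rates (a stand-in for (6.1)): up and down rates in coordinate i.
def up(x, i):
    return 1.0 + 0.5 * x[0] + 0.3 * x[1] if i == 0 else 0.5 + 0.2 * x[0] + 0.4 * x[1]

def down(x, i):
    return 1.0 * x[0] if i == 0 else 1.2 * x[1]

# Mean-drift matrix of these rates: M[i][j] = d(up_i - down_i)/dx_j.
M = [[0.5 - 1.0, 0.3], [0.2, 0.4 - 1.2]]
# Positive root of M21 a^2 + (M11 - M22) a - M12 = 0, so (1, theta) is a left eigenvector.
gap = M[0][0] - M[1][1]
theta = (-gap + math.sqrt(gap * gap + 4 * M[1][0] * M[0][1])) / (2 * M[1][0])
rho = -(M[0][0] + theta * M[1][0])   # (1, theta) M = -rho (1, theta)
weights = (1.0, theta)

def dist(x, y):
    """Weighted l1 distance |x1 - y1| + theta |x2 - y2|."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def generator_of_distance(x, y):
    """(L d)(x, y) under the coupling; coordinates contribute separately."""
    total = 0.0
    for i, w in enumerate(weights):
        du = up(x, i) - up(y, i)
        dd = down(x, i) - down(y, i)
        if x[i] == y[i]:
            total += w * (abs(du) + abs(dd))   # only the excess rate separates the copies
        else:
            s = 1.0 if x[i] > y[i] else -1.0
            total += w * s * (du - dd)         # independent jumps of the two copies
    return total
```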
We will now apply Theorem 3.3 to the Markov chain , with either of the two co-ordinate projections or . We fix some , and note that, for any , we have , and therefore
[TABLE]
Now we take , so that , and apply Proposition 6.2 with . We see that, for any , and any , the probability that the chain exits the set before time is at most , for some constants and . To apply Theorem 3.3, we take , and note that, for , the total transition rate out of state is at most $q := n\bigl[\mu+\nu+2(\theta^{-1}(\alpha+\delta)+\beta+\gamma)H\bigr]$. If is the first co-ordinate projection , we have , so we may take : for , we need instead . We may also take .
Theorem 3.3 now tells us that, for , all and all , and all ,
[TABLE]
where
[TABLE]
Thus, for some constant depending on the parameters of the model and on , and all , where is sufficiently small, we have
[TABLE]
for , all and all .
Moreover, for a suitable constant , , and for some sufficiently small ,
[TABLE]
From (6.9) and (6.10), it now follows that, for , , and ,
[TABLE]
for suitable constants , and , depending on the parameters of the model and on the choice of .
We are now in a position to prove cut-off for our model.
Proof of Theorem 6.1. A lower bound on the mixing time can now easily be proved, much as in the previous example, by considering the distribution of , for . Let , depending on the parameters of the model, be such that
[TABLE]
By (6.3) and the definition of , we have
[TABLE]
Therefore, using (6.12),
[TABLE]
Let . Then, from (6.11) with , noting that for provided is sufficiently large, we have
[TABLE]
On the other hand, as stated in the discussion before Proposition 6.2, the covariance matrix of the equilibrium distribution of is of the form , with being independent of . It hence follows, using Chebyshev’s inequality, that , with .
This then gives, for a suitable constant and and sufficiently large,
[TABLE]
for any . This establishes the first part of the definition of cut-off in (1.1).
We now turn to the upper bound. We will apply Proposition 4.1 to the Markov chain , where is a copy of the started close to , is another copy in equilibrium, and the pair are coupled as in Proposition 6.3. We use the proposition to show that coalescence occurs quickly with high probability.
Consider a copy of starting from state and couple it with an equilibrium copy , as in Proposition 6.3. For any fixed , we choose so that , and use (6.11) and (6.13) to conclude that
[TABLE]
and similarly for the equilibrium copy . Therefore, with probability at least , we have
[TABLE]
We are now in a position to apply Proposition 4.1 to the function , for . Condition (i) of the proposition is satisfied by Proposition 6.3, and condition (ii) is satisfied with . For condition (iii), note that, if , each of the chains moves while the other does not – and so the distance between the two chains changes by at least – at rate at least . Hence the generator of the quadratic variation process is at least from all states where coalescence has not occurred.
Proposition 4.1 then implies that, on the event that , the probability that coalescence has not occurred by time is at most
[TABLE]
where . For , we conclude that
[TABLE]
Since is in equilibrium, it follows that
[TABLE]
as required for the second part of the definition of cut-off in (1.1).
7. Supermarket model
In this section, we apply our general continuous-time inequality, Theorem 3.1, to a range of instances of the supermarket model. This is a simple and natural model of a queuing system, introduced by Mitzenmacher [25] and Vvedenskaya, Dobrushin and Karpelevich [31], and studied extensively since; see, for instance, [21] and [2], which contain further references to the related literature.
The supermarket model (in continuous time) with parameters ( and natural numbers, ) is defined as follows. There are servers, each with their own queue of customers, and customers arrive according to a Poisson process with rate . Each arriving customer inspects the queues for of the servers, chosen uniformly at random with replacement, and joins one of the shortest queues among these ; customers cannot subsequently switch to a different queue. At each server, customer service times are iid exponential of mean 1.
The memoryless property of the arrival and service processes means that the supermarket model can be viewed as a continuous-time jump Markov chain, whose state space is the set of possible -tuples of queue lengths. The possible transitions are of two types: (i) departures, where each queue of positive length is shortened by one at rate , and (ii) arrivals, at total rate , where some queue, chosen by the procedure described above, is lengthened by 1. To be precise, on an arrival, an ordered -tuple of queues is chosen uniformly at random from all the possibilities, and the first shortest queue in the list receives the arriving customer and is thus lengthened by 1.
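The d inspected queues are drawn uniformly with replacement, and the customer joins a shortest one. The length of the joined queue is therefore the minimum of d i.i.d. draws from the empirical queue-length distribution. Write u_k for the proportion of queues of length at least k. Then an arrival joins a queue of current length k with probability u_k^d − u_{k+1}^d. The brute-force check below enumerates every ordered d-tuple for a small hypothetical configuration and applies the "first shortest queue in the list" rule. The function names and the example lengths are ours.

```python
import itertools
from fractions import Fraction

def chosen_queue(lengths, tup):
    """Index of the first shortest queue in the ordered tuple `tup`."""
    best = tup[0]
    for q in tup[1:]:
        if lengths[q] < lengths[best]:   # strict '<' keeps the earliest among ties
            best = q
    return best

def join_length_probs_bruteforce(lengths, d):
    """P(arrival joins a queue of length k), by enumerating all n**d ordered tuples."""
    n = len(lengths)
    counts = {}
    for tup in itertools.product(range(n), repeat=d):
        k = lengths[chosen_queue(lengths, tup)]
        counts[k] = counts.get(k, 0) + 1
    return {k: Fraction(c, n ** d) for k, c in counts.items()}

def join_length_probs_formula(lengths, d):
    """u_k^d - u_{k+1}^d, where u_k is the proportion of queues of length >= k."""
    n = len(lengths)
    u = lambda k: Fraction(sum(1 for x in lengths if x >= k), n)
    return {k: u(k) ** d - u(k + 1) ** d for k in set(lengths)}
```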
Much of the initial interest in the supermarket model stemmed from its properties as a “low-cost” load-balancing mechanism: for a constant, the maximum queue length in equilibrium is of order when , but of order when is a constant at least 2. In [2] and this paper, we are interested in different ranges of parameters, where tends to 1 from below as , while tends to infinity. In these ranges, as shown in [2], the load-balancing among the servers in equilibrium is close to perfect – the maximum queue length is a given constant with high probability, and most queues have length exactly – even though the system is nearly at full capacity.
For the rest of this section, as in [2], we set , and , where and are fixed constants in . We will assume throughout that
[TABLE]
For , the corresponding range of is thus ; for , the corresponding range for is . Other parameter ranges come into the scope of [2] and, with a little more work, we could prove concentration results for those too.
Theorem 6.1 from [2] gives the general behaviour of the model in a variety of ranges, including this one (referring to that theorem, assumptions (7.1) are equivalent to setting ). The basic result is that, in equilibrium, the chain lies in a “good set” where all queues have length at most 2, with very high probability; it also states that, if the chain is started anywhere within an “interior good set”, then with high probability it remains in the good set for a long period of time. We first set up notation, and then state the part of the result covering our range.
In fact, the model analysed in [2] is a discrete-time variant of the continuous-time model studied here. In that variant, the transition at each time-step is an arrival with probability and a potential departure with probability . If the transition is an arrival, a queue is chosen as in the continuous-time version, and the length of that queue is increased by 1. If the transition is a potential departure, a queue is chosen uniformly at random, and the length of that queue is decreased by 1 if it is not empty. If an empty queue is chosen for a departure, then the chain remains in its current state. An alternative description of the continuous-time model is that events occur according to a Poisson process with rate , and the transition associated with an event is chosen as for the discrete-time model above. A consequence is that the two models have the same equilibrium distribution, and if the probability that the chain remains in some set of states for steps of the discrete chain is at least , then the probability that the chain remains in up to time in the continuous model is at least minus the probability that a Poisson random variable with mean is greater than , which is at least . Similarly, provided , if the total variation distance between the discrete-time supermarket model after or more steps and the equilibrium distribution is at most , then the total variation distance between the continuous-time model and the equilibrium distribution is at most for all times at least .
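The translation from discrete to continuous time involves the upper tail of a Poisson number of events. A small helper for that tail is sketched below. It is a generic utility, the names are ours, and the values in the checks are illustrative rather than the quantities used in [2].

```python
import math

def poisson_pmf(j, mu):
    """P(N = j) for N ~ Poisson(mu), mu > 0, computed in log space."""
    return math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))

def poisson_sf(k, mu):
    """P(N > k) for N ~ Poisson(mu), mu > 0."""
    if k < mu:
        # Below the mean the tail is not small, so 1 - CDF is accurate enough.
        return 1.0 - sum(poisson_pmf(j, mu) for j in range(k + 1))
    # At or above the mean the terms decrease, so sum the tail directly.
    total, j = 0.0, k + 1
    while True:
        term = poisson_pmf(j, mu)
        total += term
        if term <= 1e-18 * total:
            break
        j += 1
    return total
```

Events occur at rate (1 + λ)n, assuming unit-rate potential departures at each of the n queues. The number of discrete steps completed by time T is then Poisson with mean (1 + λ)nT. A call such as `poisson_sf(k, (1 + lam) * n * T)` evaluates the correction term when k discrete steps are compared with elapsed time T.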
For , a state in , and , let denote the proportion of queues in of length at least . Let be any function such that and for every . For , and and satisfying the inequalities in (7.1), let be the set of states such that:
[TABLE]
A state in will thus have between empty queues, between queues of length 0 or 1 – most of which will then have length 1 – and the remaining queues all of length 2. As , this implies that the proportion of queues of length exactly 2 tends to 1 as .
The following result is taken from Theorems 6.1 and 1.2 of [2] – in the application of Theorem 1.2, we take so that for sufficiently large; as is remarked after Theorem 10.5 of [2], the conclusion is valid for the full range of stated above. Note that the results in [2] are stated for the discrete-time version of the model; we have derived results for the continuous-time version as described above, and bounded above the error probabilities involved in the translation by .
Theorem 7.1 (Brightwell, Fairthorne and Luczak).
Given and as above, and and satisfying the inequalities in (7.1), let be a copy of the supermarket process with parameters , where and , in equilibrium. Then, for sufficiently large,
[TABLE]
Moreover, if is a copy of the supermarket process with , then
[TABLE]
and, for sufficiently large and ,
[TABLE]
where denotes the equilibrium distribution.
We will focus on the number of empty queues, and investigate how well is concentrated around its mean for an equilibrium copy of the supermarket process with parameters as above. For , the mean total arrival rate is , while the mean total departure rate is the expected number of non-empty queues, which is . In equilibrium, the mean arrival rate is equal to the mean departure rate, so we have . States in thus all have within of the mean . We shall prove that we have concentration of within of its mean : as , this is a sharper concentration result than is given by Theorem 7.1. It is remarked in [2] that the proof of Theorem 7.1 goes through for , where is sufficiently small: the implied result is still not as strong as we shall prove here, since would have to be strictly less than the minimum of several quantities, one of which is , and this is the smallest of the quantities for part, but not all, of our range – more details can be found in the arXiv version of [2].
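The balance identity E Z = (1 − λ)n, where Z is the number of empty queues, can be checked by simulation. The sketch uses the uniformised description given earlier in this section. Events occur at rate (1 + λ)n, and each event is an arrival with probability λ/(1 + λ) and otherwise a potential departure from a uniformly chosen queue. Because the event rate does not depend on the state, averaging over events estimates the time average. The parameters (n = 200, λ = 0.7, d = 2, run length and burn-in) are illustrative assumptions and lie far from the heavy-traffic regime (7.1) studied here.

```python
import random

def simulate_empty_fraction(n, lam, d, steps, burn_in, seed=0):
    """Average of Z/n over events after `burn_in` events; Z = number of empty queues."""
    rng = random.Random(seed)
    lengths = [0] * n
    empty = n
    acc = 0
    p_arrival = lam / (1.0 + lam)
    for step in range(burn_in + steps):
        if rng.random() < p_arrival:
            # Arrival: inspect d queues uniformly with replacement, join the first shortest.
            best = rng.randrange(n)
            for _ in range(d - 1):
                q = rng.randrange(n)
                if lengths[q] < lengths[best]:
                    best = q
            if lengths[best] == 0:
                empty -= 1
            lengths[best] += 1
        else:
            # Potential departure from a uniformly chosen queue (no effect if it is empty).
            q = rng.randrange(n)
            if lengths[q] > 0:
                lengths[q] -= 1
                if lengths[q] == 0:
                    empty += 1
        if step >= burn_in:
            acc += empty
    return acc / (steps * n)
```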
The supermarket model is also used as an example by Luczak in [19], to illustrate the concentration inequality derived in that paper. That analysis is based on a natural coupling of two copies of the supermarket model with the same parameters, which we now describe – our proof is also based on this coupling. In the coupling, the arrival times for the two processes are identical, and on an arrival the same ordered -tuple of queues is inspected in the two processes. For each of the queues, a “potential departure” from the queue occurs at rate 1: for each of the copies of the process, if the queue is non-empty at the time of the potential departure, a customer is served and leaves the system at that time. If states and are adjacent (i.e., one can be reached from the other by a single transition), then they differ by 1 in exactly one queue. For an adjacent pair , we call the queue where the two states differ the unbalanced queue, and we say that if the unbalanced queue is longer in than in . If and , where and are adjacent with , then we claim that, under the coupling, the pair remains adjacent, with , until the two copies coalesce. On a departure from the unbalanced queue, coalescence occurs if that queue is already empty in , and otherwise the queue remains unbalanced. If an arriving customer joins the unbalanced queue in , they join that queue in as well. It is also possible that an arriving customer joins the unbalanced queue in and a different queue in ; the states remain adjacent, but a different queue becomes unbalanced.
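The claim that the coupled copies stay adjacent until they coalesce can be tested by running the coupling on a small system. Adjacency here means the unbalanced queue is longer in x. In the sketch, both copies share the inspected d-tuple on arrivals and the queue index on potential departures, as in the coupling just described. Events are generated in the uniformised form with rate (1 + λ)n. The system size, the parameters and the starting states are hypothetical. Write U for the length in x of the unbalanced queue, with U = 0 after coalescence (the symbol is ours). The check also confirms that y has exactly one more empty queue than x when U = 1, and the same number otherwise.

```python
import random

def first_shortest(lengths, tup):
    """Index of the first shortest queue in the ordered tuple `tup`."""
    best = tup[0]
    for q in tup[1:]:
        if lengths[q] < lengths[best]:
            best = q
    return best

def coupled_run(x, y, lam, d, steps, seed=0):
    """Run the natural coupling from adjacent x >= y; yield (x, y) after each event."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    n = len(x)
    for _ in range(steps):
        if rng.random() < lam / (1.0 + lam):
            tup = [rng.randrange(n) for _ in range(d)]   # same inspected tuple in both copies
            x[first_shortest(x, tup)] += 1
            y[first_shortest(y, tup)] += 1
        else:
            q = rng.randrange(n)                         # same potential departure in both
            if x[q] > 0:
                x[q] -= 1
            if y[q] > 0:
                y[q] -= 1
        yield tuple(x), tuple(y)

def unbalanced_length(x, y):
    """U: length in x of the unbalanced queue, or 0 once the copies have coalesced.

    Raises ValueError if the pair is not adjacent with the unbalanced queue longer in x.
    """
    diff = [i for i in range(len(x)) if x[i] != y[i]]
    if not diff:
        return 0
    if len(diff) != 1 or x[diff[0]] != y[diff[0]] + 1:
        raise ValueError("pair is not adjacent with x >= y")
    return x[diff[0]]
```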
The analysis in [19] assumes that is a constant, but it is easy to see that the proof there gives concentration around the mean only to within order . For small enough and , this is still a stronger result than that implied by Theorem 7.1, but the result we prove below always gives stronger concentration.
Theorem 7.2.
Let be a copy of the supermarket model with parameters , in equilibrium, where , , and satisfy (7.1). Then, for sufficiently large, and any ,
[TABLE]
In particular, if is positive, with , and is sufficiently large, we have
[TABLE]
Our proof will be an application of Theorem 3.1 to the (well-behaved) continuous-time chain . We give the proof below, postponing the proof of a key lemma.
Proof.
We shall apply Theorem 3.1 to the supermarket model with the given parameters, with , the number of empty queues in state . We set , and let be the set . We then consider starting in a state . Note that, for any state , the total transition rate out of state is at most . In order to apply the result, we need to identify a constant satisfying (3.1), and a function satisfying (3.2). We obtain these by analysing the natural coupling of two copies of the chain described above.
Accordingly, we consider a pair of copies starting in adjacent states and with , evolving according to the coupling described above, so that the two copies remain adjacent until coalescence. At any time , and are adjacent or equal, and if they are adjacent then there is one unbalanced queue. Let denote the length of the longer unbalanced queue, or 0 if there is none, at time : the random process is thus a function of the coupled pair , taking values in , making steps up and down by 1, until it steps from 1 to 0 and remains at 0 thereafter. For a pair of adjacent initial states, and , let denote the probability that is equal to .
For an initial adjacent pair of states with , and any time , the difference is equal to 1 when and 0 otherwise, so the quantity is exactly equal to . In particular, we thus have , so we may take .
If , and is adjacent to , then either , or and has a queue of length 3; in the latter case, the transition from to is an arrival in which only queues of length 2 are inspected, and the rate of such arrivals from any state is at most , for sufficiently large. As the total transition rate out of any state is at most , we have, a little crudely,
[TABLE]
where the maximum is over initial pairs where is adjacent to and both are in .
Lemma 7.3.
[TABLE]
whenever and are adjacent states in , and .
We postpone a proof of Lemma 7.3 until later, but we now indicate briefly what the terms in (7.3) signify. The final term accounts for the possibility of leaving the set . The first term accounts for the probability that and no transition has occurred before time to change the length of the unbalanced queue. The second term is the main term; roughly speaking, it arises from showing that coalescence occurs in time of order , and, conditional on coalescence not occurring before time , the probability that is of order .
We continue with the main thread of our proof, assuming the bound (7.3) in Lemma 7.3. Given this bound, we may take in (3.2) to be
[TABLE]
and
[TABLE]
for and sufficiently large.
Now consider starting at any state , and let be the event that the process stays within until time . For , Theorem 7.1 tells us that the probability of is at most . We now apply Theorem 3.1, and obtain that, for any , and any ,
[TABLE]
For , we have ; for , we have that for sufficiently large (depending on ). Therefore we have, for any and ,
[TABLE]
The final part of Theorem 7.1 tells us that, for and any in , the total variation distance between and the equilibrium distribution is at most . Thus, choosing so that , we see firstly that , for sufficiently large, where is a copy in equilibrium. Recalling that , this yields that
[TABLE]
for , and the inequality also holds trivially for provided is sufficiently large. We then further deduce that
[TABLE]
which implies the claimed result. ∎
It remains to prove Lemma 7.3. For this, we will use the following technical lemma, a variant of Gronwall’s Lemma.
Lemma 7.4.
If is continuous on and, for some ,
[TABLE]
then is non-increasing on and so for all .
Proof.
Suppose for a contradiction that , where . Now take to be the maximum value in such that . By continuity, it follows that for all .
Applying the hypothesis to the times and , we obtain that
[TABLE]
[TABLE]
which gives the desired contradiction. ∎
Proof of Lemma 7.3.
We need to show that, for sufficiently large, (7.3) holds whenever and are adjacent states in , and . Fix adjacent states and in with . Let be the probability that either copy of the chain exits before time ; by Theorem 7.1 (with replaced by ), we have
[TABLE]
Until the copies coalesce, there is an unbalanced queue, with length in and length in ; whatever the length of the unbalanced queue, the rate of departures from the unbalanced queue is 1, and a departure would lead to coalescence if , or reduce the unbalanced queue lengths by 1 if . If , then an arrival does not change unless the process leaves . If , then an arrival can increase the length of the unbalanced queue only if the arriving customer joins the (empty) unbalanced queue in . The rate of such arrivals depends on the number of empty queues in ; we could give an exact expression, but we content ourselves with loose bounds that are easy to derive. The rate is certainly at most the rate of arrivals in which the unbalanced queue is inspected, which is equal to . Any arriving customer who inspects the unbalanced queue and no other empty queue in – we call such an arrival a critical arrival – will join the unbalanced queue and thus cause to increase from 1 to 2: while is in , the proportion of empty queues in is at most , and so the rate of critical arrivals is
[TABLE]
[TABLE]
for sufficiently large. Hence as long as and is in .
In summary, if , then decreases at rate 1, and increases at a rate between (provided ) and . If , then decreases at rate 1.
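The two bounds on the rate at which U moves from 1 to 2 can be checked exhaustively on a toy configuration. Here U is the length in x of the unbalanced queue (our notation). Take adjacent x ≥ y whose unbalanced queue j has x_j = 1 and y_j = 0. The sketch enumerates every ordered d-tuple, applies the coupled arrival rule, and records whether U increases. It confirms that every critical arrival increases U, and that U can increase only when j is inspected. Let m be the number of empty queues in y, including j. Then the critical tuples have probability ((n − m + 1)/n)^d − ((n − m)/n)^d, and j is inspected with probability 1 − (1 − 1/n)^d. The configuration and the function names are hypothetical.

```python
import itertools
from fractions import Fraction

def first_shortest(lengths, tup):
    """Index of the first shortest queue in the ordered tuple `tup`."""
    best = tup[0]
    for q in tup[1:]:
        if lengths[q] < lengths[best]:
            best = q
    return best

def u_after_arrival(x, y, tup):
    """Length in x of the unbalanced queue after a coupled arrival inspecting `tup`."""
    x, y = list(x), list(y)
    x[first_shortest(x, tup)] += 1
    y[first_shortest(y, tup)] += 1
    diff = [i for i in range(len(x)) if x[i] != y[i]]
    return x[diff[0]] if diff else 0

def check_rates(x, y, j, d):
    """Exact probabilities (critical, U increases, j inspected) for one arrival."""
    n, m = len(y), y.count(0)
    empties_other = {i for i in range(n) if y[i] == 0 and i != j}
    inspected = critical = increased = 0
    for tup in itertools.product(range(n), repeat=d):
        up = u_after_arrival(x, y, tup) == 2
        crit = j in tup and not empties_other.intersection(tup)
        assert not crit or up          # every critical arrival increases U
        assert not up or j in tup      # U increases only if j is inspected
        inspected += j in tup
        critical += crit
        increased += up
    total = n ** d
    assert Fraction(critical, total) == Fraction(n - m + 1, n) ** d - Fraction(n - m, n) ** d
    assert Fraction(inspected, total) == 1 - Fraction(n - 1, n) ** d
    return Fraction(critical, total), Fraction(increased, total), Fraction(inspected, total)
```

The enumeration also exhibits arrivals that join the empty unbalanced queue in y without increasing U. This happens, for example, when another queue that is empty in both copies appears later in the tuple. The critical-arrival lower bound and the inspection upper bound are therefore the quantities to work with.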
We consider the coupled pair of chains up to time , starting from the initial state . Extending our earlier notation, we let and , for .
Applying Dynkin’s formula, as well as the facts we have established about the rates of transitions for , we have that, for ,
[TABLE]
[TABLE]
[TABLE]
We also note that, for ,
[TABLE]
Recall that is the probability that either copy leaves the set before time . Note that
[TABLE]
for all .
Our aim is to prove the upper bound (7.3) on for all . We shall establish that is of order , for larger than about , and that falls off at least as fast as roughly . This implies that the time to coalescence is approximately dominated by an exponential random variable with mean , while, for greater than about , conditional on coalescence not having occurred, the probability that is of order ; these bounds will yield (7.3). In our formal analysis, we shall use Lemma 7.4 several times.
We first consider the function
[TABLE]
From (7.6), (7.7), (7.8) and (7.9), we have
[TABLE]
and therefore from Lemma 7.4 we have that . Rearranging, we obtain that, for ,
[TABLE]
This tells us that, roughly speaking, after a lead-in time of order , the probability that the unbalanced queue has length 1 is at least about times the probability that coalescence has not occurred.
The next step is to use the above to show that falls off at least as fast as roughly . We see from (7.5) and (7.10) that
[TABLE]
Now we consider the function
[TABLE]
We have
[TABLE]
and so
[TABLE]
Therefore, by Lemma 7.4, , and we deduce that, for ,
[TABLE]
Finally, we show that is at most about times . We apply Lemma 7.4 to the function
[TABLE]
From (7.6), (7.7) and (7.8), we have, for ,
[TABLE]
We obtain that , so
[TABLE]
Summing with (7.11) yields, for ,
[TABLE]
and so
[TABLE]
which is the required bound. ∎
Theorem 7.2 gives concentration of the random variable about its mean within order . We note that no such bound can be shown if we rely only on the fact that is a Lipschitz function of the state space. Indeed, coalescence of the Markov chain takes time of order , and the results of [19], [26] or [27] would only give concentration within order of the mean.
We indicate briefly why we expect that concentration of within order of its expectation is best possible. If we look at the transitions of the process over a time period of length , the number of arrivals has fluctuations of order . The analysis in the proof of Theorem 7.2 and Lemma 7.3 suggests that a positive proportion of the extra customers will still be in the system at the end of the period, and approximately a proportion of these will be in queues of length 1, so that fluctuations of order in the number of arrivals during result in fluctuations of order in the number of empty queues at time .
We believe that a similar proof can be used to show sharp concentration of measure results for the supermarket model in the range where and are fixed constants. Here it is known that the proportion of queues of length at least , for each fixed, is close to in equilibrium. For , let be the number of queues of length at least ; for any state with approximately queues of length for each , and large, the quantity is dominated by terms where the transition from to creates an unbalanced queue of length , and there is no departure from the unbalanced queue before time . Thus we may take at most some constant times , and obtain concentration within order for in equilibrium, at least for large.
Acknowledgements. MJL thanks Monash University for their kind hospitality while part of this work was accomplished. GRB thanks the University of Melbourne for their equally kind hospitality while a different part of the work was accomplished.
- [1] W.J. Anderson (1991). Continuous-time Markov chains: An applications-oriented approach. Springer Series in Statistics. Springer-Verlag, New York Inc.
- [2] G.R. Brightwell, M. Fairthorne and M.J. Luczak (2018). The supermarket model with bounded queue lengths in equilibrium. J. Stat. Phys. 173, 1149–1194.
- [3] G.R. Brightwell and M.J. Luczak (2013). A fixed-point approximation for a routing model in equilibrium. Preprint, arXiv:1306.5002.
- [4] V. Chvátal (1979). The tail of the hypergeometric distribution. Discr. Math. 25, 285–287.
- [5] J.P. Crametz and P.J. Hunt (1991). A limit result respecting graph structure for a fully connected loss network with alternative routing. Ann. Appl. Probab. 1, 436–444.
- [6] P. Diaconis and M. Shahshahani (1987). Time to reach stationarity in the Bernoulli–Laplace diffusion model. SIAM J. Math. Anal. 18, 208–218.
- [7] P.J. Donnelly, P. Lloyd and A. Sudbury (1994). Approach to stationarity of the Bernoulli–Laplace diffusion model. Adv. Appl. Probab. 26, 715–727.
- [8] A. Eskenazis and E. Nestoridi (2020). Cutoff for the Bernoulli–Laplace urn model with o(n) swaps. Ann. Inst. H. Poincaré Probab. Statist. 56, 2621–2639.
