Attracting Random Walks

Julia Gaudio; Yury Polyanskiy

arXiv:1903.00427·math.PR·June 1, 2020

TL;DR

This paper introduces the Attracting Random Walks model, analyzing its phase transition in mixing times on graphs, revealing a rich get richer dynamic with a critical temperature affecting convergence speed.

Contribution

The paper defines a new non-reversible Markov chain model on graphs, demonstrating a phase transition in mixing times without relying on Gibbsian stationary distributions.

Findings

1. Mixing time is $O(n\log n)$ at high temperature.

2. Mixing time is exponential in $n$ at low temperature.

3. The model exhibits a dynamic phase transition independent of stationary distribution decomposition.

Abstract

This paper introduces the Attracting Random Walks model, which describes the dynamics of a system of particles on a graph with $n$ vertices. At each step, a single particle moves to an adjacent vertex (or stays at the current one) with probability proportional to the exponent of the number of other particles at a vertex. From an applied standpoint, the model captures the rich get richer phenomenon. We show that the Markov chain exhibits a phase transition in mixing time, as the parameter governing the attraction is varied. Namely, mixing time is $O(n\log n)$ when the temperature is sufficiently high and $\exp(\Omega(n))$ when temperature is sufficiently low. When $\mathcal{G}$ is the complete graph, the model is a projection of the Potts model, whose mixing properties and the critical temperature have been known previously. However, for any other graph our model is non-reversible and…

Equations

$$P(x,y)=\begin{cases}\dfrac{x(i)}{n}\,\dfrac{\exp\left(\frac{\beta}{n}x(j)\right)}{Z} & \text{if } i\sim j,\\[6pt] \dfrac{x(i)}{n}\,\dfrac{\exp\left(\frac{\beta}{n}\left(x(i)-1\right)\right)}{Z} & \text{if } i=j,\end{cases}$$

$$Q(x,y)\triangleq\pi(x)P(x,y).$$

$$\Phi(S)\triangleq\frac{Q(S,S^{c})}{\pi(S)}.$$

$$\Phi_{*}\triangleq\min_{S:\,\pi(S)\leq\frac{1}{2}}\Phi(S).$$

$$\delta(s(i),s(j))\triangleq\begin{cases}1 & \text{for } s(i)=s(j),\\ 0 & \text{for } s(i)\neq s(j).\end{cases}$$

$$\pi(s)=\frac{1}{Z}\exp\left(\frac{\beta}{n}\sum_{(i,j),\,i\neq j}\delta(s(i),s(j))\right).$$

$$\begin{aligned}\pi(x)&=\frac{1}{Z}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{n}\left(x(v(i))-1\right)\right)\\ &=\frac{1}{Z}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{k}x(i)^{2}-\frac{\beta}{2}\right)\\ &=\frac{1}{Z^{\prime}}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{k}x(i)^{2}\right).\end{aligned}$$

$$\left(\prod_{j=1}^{l-1}P(i_{j},i_{j+1})\right)P(i_{l},i_{1})=P(i_{1},i_{l})\left(\prod_{j=0}^{l-2}P(i_{l-j},i_{l-j-1})\right).$$

$$\left(\frac{2}{n}\frac{f(n-2)}{f(n-2)+f(1)+1+d_{v}}\right)\left(\frac{1}{n}\frac{1}{f(n-1)+1+1+d_{v}}\right)\left(\frac{n-1}{n}\frac{1}{f(n-2)+1+d_{u}}\right)\left(\frac{1}{n}\frac{f(1)}{f(1)+1+d_{w}}\right).$$

$$\left(\frac{2}{n}\frac{1}{f(n-2)+f(1)+1+d_{v}}\right)\left(\frac{1}{n}\frac{f(n-2)}{f(n-2)+1+f(1)+d_{v}}\right)\left(\frac{1}{n}\frac{1}{1+1+d_{w}}\right)\left(\frac{n-1}{n}\frac{f(1)}{f(n-2)+f(1)+d_{u}}\right).$$

$$\left(f(n-1)+1+1+d_{v}\right)\left(f(n-2)+1+d_{u}\right)\left(f(1)+1+d_{w}\right)$$

$$\left(f(n-2)+1+f(1)+d_{v}\right)\left(1+1+d_{w}\right)\left(f(n-2)+f(1)+d_{u}\right).$$

$$t_{\text{mix}}\left(X,(1-p)\tfrac{1}{k}\right)\geq\inf\left\{t:\min_{x}\mathbb{P}\left(T_{x}\leq t\right)\geq p\right\}.$$

$$T_{x}(\delta)\triangleq\inf\{t:X_{t}(u)\leq(1-\delta)n,\,X_{0}=x\}=\inf\{t:\tilde{X}_{t}(0)\leq(1-\delta)n,\,\tilde{X}_{0}=F(x)\}.$$

$$S\triangleq\{x\in\tilde{\Omega}:x(0)>(1-\delta)n\}\quad\text{and}\quad S^{c}\triangleq\tilde{\Omega}\setminus S.$$

$$p\triangleq\frac{1}{e^{\beta\delta}+\Delta},$$

$$q\triangleq\frac{\exp\left(\beta(1-\delta)-\frac{\beta}{n}\right)}{\exp\left(\beta(1-\delta)-\frac{\beta}{n}\right)+e^{\beta\delta}+\Delta-1},$$

$$\frac{\exp\left(\beta(1-\delta)\right)}{\exp\left(\beta(1-\delta)\right)+e^{\beta\delta}+\Delta-1}>q.$$

$$\sum_{r=0}^{d}Z_{t}(r)\stackrel{\text{st}}{\leq}\sum_{r=0}^{d}\tilde{X}_{t}(r)$$

$$\mathbb{P}_{\pi_{Z}}\left(Z(0)\leq\mathbb{E}_{\pi_{Z}}\left[Z(0)\right]-\overline{\epsilon}n\right)\leq 2\exp\left(-2\overline{\epsilon}^{2}n\right),$$

$$\mathbb{P}_{\pi_{Z}}\left(Z(0)\leq(1-\delta)n\right)\leq 2\exp\left(-2\overline{\epsilon}^{2}n\right).$$

$$t_{\text{mix}}\left(X,\tfrac{1}{2k}\right)\geq\min\left\{t:\mathbb{P}\left(T_{x}(\delta)\leq t\right)\geq\tfrac{1}{2},\ \forall x\in\Omega\right\}=\min\left\{t:\mathbb{P}\left(T_{x}(\delta)\leq t\right)\geq\tfrac{1}{2},\ \forall x\in S\right\}.$$

$$T_{x}^{Z}\triangleq\inf\{t:Z_{t}\in S^{c},\,Z_{0}=F(x)\}.$$

$$\mathbb{P}\left(T_{x}(\delta)\leq t\right)\leq\mathbb{P}\left(T_{x}^{Z}\leq t\right)$$

$$t_{\text{mix}}\left(X,\tfrac{1}{2k}\right)\geq\min\left\{t:\mathbb{P}\left(T_{x}^{Z}\leq t\right)\geq\tfrac{1}{2},\ \forall x\in S\right\}.$$

$$T_{\pi_{Z}}^{Z}\stackrel{\text{st}}{\geq}\mathrm{Geom}\left(2\exp\left(-2\overline{\epsilon}^{2}n\right)\right).$$

$$\sum_{v\in\mathcal{V}}\pi(S_{v})\geq\mathbb{P}_{\pi}\left(\bigcup_{v\in\mathcal{V}}\left\{x(v)=\max_{w}x(w)\right\}\right)=1.$$

Full text

Attracting Random Walks

Julia Gaudio (Massachusetts Institute of Technology, United States of America)

Yury Polyanskiy (Massachusetts Institute of Technology, United States of America)

Some key suggestions for the proof ideas in this paper came from David Gamarnik, Patrick Jaillet, Eyal Lubetzky, Reza Gheissari, and Yuval Peres, detailed in the acknowledgements. The authors are extremely grateful to these people for their guidance and help during the progression of this project.

Abstract

This paper introduces the Attracting Random Walks model, which describes the dynamics of a system of particles on a graph with $n$ vertices. At each step, a single particle moves to an adjacent vertex (or stays at the current one) with probability proportional to the exponent of the number of other particles at a vertex. From an applied standpoint, the model captures the rich get richer phenomenon. We show that the Markov chain exhibits a phase transition in mixing time, as the parameter governing the attraction is varied. Namely, mixing time is $O(n\log n)$ when the temperature is sufficiently high and $\exp(\Omega(n))$ when temperature is sufficiently low. When $\mathcal{G}$ is the complete graph, the model is a projection of the Potts model, whose mixing properties and the critical temperature have been known previously. However, for any other graph our model is non-reversible and does not seem to admit a simple Gibbsian description of a stationary distribution. Notably, we demonstrate existence of the dynamic phase transition without decomposing the stationary distribution into phases.

1 Introduction

In this paper, we introduce the Attracting Random Walks (ARW) model. The motivation of the model is to understand the formation of wealth disparities in an economic network. Consider a network of economic agents, each with a certain number of coins representing their wealth. At each time step, one coin is selected uniformly at random, and moves to a neighbor of its owner with a probability that depends on how wealthy the neighbors are. Those who are well-connected and initially wealthy will tend to accumulate more wealth. We refer to particles instead of coins in what follows.

This is a flexible model based on a few principles: There are a fixed number of particles moving around on a graph. Movements are asynchronous, and particles make choices about where to move based on their local environment. The model can encompass a variety of situations. Further, the model can be extended by allowing for multiple particle types, with intra– and inter–group attraction parameters, though we do not consider this extension in this paper. There are many more applications beyond the economic application. As an interacting particle system, it could be relevant for physics or chemistry applications.

This paper analyzes the Attracting Random Walks model and establishes phase transition properties. The difficulty in bounding mixing times, particularly in finding lower bounds, is due to the fact that the stationary distribution cannot be simply formulated. Additionally, the model is not reversible unless the graph is complete (Theorem 3), meaning that familiar techniques do not apply.

We establish the existence of a phase transition in mixing time as the attraction parameter, $\beta$, is varied. Slow mixing for $\beta$ large enough is established by relating the mixing time to a suitable hitting time. Fast mixing for $\beta$ small enough is proven by a path coupling approach that relates the Attracting Random Walks chain to the simple (non-interacting) random walk on the same graph (i.e. with $\beta=0$). As a corollary of our main results, we establish properties of the Cheeger cut for the stationary distribution. We find it interesting that even though the stationary distribution is not known analytically for general graphs, we can show that it undergoes a phase transition (i.e. develops an exponentially small Cheeger cut) by arguing indirectly via mixing times.

The rest of the paper is structured as follows. We describe the dynamics of the model in Section 2, along with some possible applications. The remainder of the paper is focused on properties of the Markov chain governing the dynamics. In Section 2.2 we discuss a link to the Potts model. Section 3 proves the existence of a phase transition in mixing time for general graphs, and is the main theoretical contribution of this work. In Section 4, we collect partial results on the version of the model in which particles repel each other instead of attracting, a model we call “Repelling Random Walks.”

2 The Model

2.1 Definitions and Main Results

The model is a discrete time process on a simple graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ , where $\mathcal{V}$ is the set of vertices and $\mathcal{E}$ is the set of undirected edges. We assume throughout that $\mathcal{G}$ is connected. We write $i\sim j$ if $(i,j)\in\mathcal{E}$ . Let $k=|\mathcal{V}|$ . Initially, $n$ indistinguishable particles are placed on the vertices of $\mathcal{G}$ in some configuration. Let $x(i)$ be the number of particles at vertex $i$ . The particle configuration is updated in two stages, according to a fixed parameter $\beta$ :

1. Choose a particle uniformly at random. Let $i$ be the location of that particle.

2. Move the particle to a vertex $j\sim i$, $j\neq i$, with probability ${1\over Z}\exp\left(\frac{\beta}{n}x(j)\right)$. Keep the particle at vertex $i$ with probability ${1\over Z}\exp\left(\frac{\beta}{n}\left(x(i)-1\right)\right)$, where $Z$ is the normalization constant.

Let $P$ be the transition probability matrix of the resulting Markov chain. Let $e_{i}$ denote the $i$ th standard basis vector in $\mathbb{R}^{k}$ . Then for two configurations $x$ and $y$ such that $y=x-e_{i}+e_{j}$ for $i\sim j$ or $i=j$ , we have

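$$P(x,y)=\begin{cases}\dfrac{x(i)}{n}\,\dfrac{\exp\left(\frac{\beta}{n}x(j)\right)}{Z} & \text{if } i\sim j,\\[6pt] \dfrac{x(i)}{n}\,\dfrac{\exp\left(\frac{\beta}{n}\left(x(i)-1\right)\right)}{Z} & \text{if } i=j,\end{cases}$$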

with $Z=\sum_{l\sim i}\exp\left(\frac{\beta}{n}{x(l)}\right)+\exp\left(\frac{\beta}{n}{\left(x(i)-1\right)}\right)$ .
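The two-stage update can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name `arw_step`, the adjacency-dictionary representation, and the three-vertex example are assumptions made here.

```python
import math
import random


def arw_step(x, adj, beta, rng):
    """Apply one ARW update in place to the occupancy list x and return it."""
    n = sum(x)
    # Stage 1: a uniformly random particle sits at vertex i with probability x[i] / n.
    i = rng.choices(range(len(x)), weights=x)[0]
    # Stage 2: staying is weighted by the other x[i] - 1 particles at i,
    # moving to a neighbor j is weighted by its occupancy x[j].
    targets = [i] + list(adj[i])
    weights = [math.exp(beta * (x[i] - 1) / n)]
    weights += [math.exp(beta * x[j] / n) for j in adj[i]]
    j = rng.choices(targets, weights=weights)[0]
    x[i] -= 1
    x[j] += 1
    return x


# Hypothetical example: a path on three vertices carrying n = 6 particles.
adj = {0: [1], 1: [0, 2], 2: [1]}
rng = random.Random(0)
x = [2, 2, 2]
for _ in range(1000):
    arw_step(x, adj, beta=2.0, rng=rng)
```

`random.choices` normalizes the weights internally, so the constant $Z$ never has to be computed explicitly.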

The probabilities are a function of the numbers of particles at each vertex, excluding the particle that is to move. This modeling choice means that the moving particle is neutral toward itself, and relates the ARW model to the Potts model, as will be explained below.

When $\beta$ is positive (ferromagnetic dynamics), the particle is more likely to travel to a vertex that has more particles. Greater $\beta$ encourages stronger aggregation of the particles. On the other hand, taking $\beta<0$ (antiferromagnetic dynamics) encourages particles to spread. Note that $\beta=0$ corresponds to the case of independent (lazy) random walks.

For an application with $\beta<0$, consider an ensemble of identical gas particles in a container. We can discretize the container into blocks. Each block becomes a vertex in our graph. Vertices are connected by an edge whenever the corresponding blocks share a face. Depending on the type of gas, particles may primarily repel each other, in which case taking $\beta<0$, which discourages particles from occupying the same block, is reasonable. The focus of this paper is the case $\beta>0$, though we collect some results on the $\beta<0$ case as well.

To get an idea of the effect of $\beta$ , Figure 1 displays some instances of the Attracting Random Walks model run for $10^{5}$ steps for different values of $\beta$ . The graph is the $8\times 8$ grid graph, with $n=320$ , for an average of $5$ particles per vertex.

We now state our main results regarding the phase transition in mixing time. We let $\left\|P-Q\right\|_{\text{TV}}$ denote the total variation distance between two discrete probability measures $P$ and $Q$ , and let $d(X,t)\triangleq\max_{x\in\mathcal{X}}\left\|P^{t}(x,\cdot)-\pi\right\|_{\text{TV}}$ be the worst-case (with respect to the initial state) total variation distance for a chain $\{X_{t}\}$ with stationary distribution $\pi$ . Let $t_{\text{mix}}(X,\epsilon)\triangleq\min\left\{t:d(X,t)\leq\epsilon\right\}$ denote the mixing time of a chain $\{X_{t}\}$ .
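For a chain small enough to enumerate, these definitions can be evaluated directly. The sketch below assumes the transition matrix is given as a list of rows and that the stationary distribution `pi` is already known; the function names and the `t_max` cutoff are illustrative choices, not part of the paper.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def push_forward(dist, P):
    """One step of the chain: return the row vector dist * P."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps, or None if t_max is exceeded."""
    size = len(P)
    # Row x holds P^t(x, .); at t = 0 this is the point mass at x.
    rows = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for t in range(t_max + 1):
        if max(tv_distance(row, pi) for row in rows) <= eps:
            return t
        rows = [push_forward(row, P) for row in rows]
    return None
```

The cost grows with the square of the number of states per step, so this is only practical for very small $k$ and $n$.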

Theorem 1.

For any graph $\mathcal{G}$ , there exists $\beta_{+}>0$ such that if $\beta>\beta_{+}$ , the mixing time of the ARW model is $\exp(\Omega(n))$ .

Theorem 2.

For any graph $\mathcal{G}$ , there exists $\beta_{-}>0$ such that if $0\leq\beta<\beta_{-}$ , the mixing time of the ARW model is $O(n\log n)$ .

Note that we do not prove that one value $\beta_{+}=\beta_{-}$ satisfies both statements.

Through our analysis of mixing time, we establish a transition in the dynamics of the chain. By standard results, this also indirectly implies that the stationary distribution develops multiple almost disjoint phases for $\beta>\beta_{+}$ , while this is not the case for $\beta<\beta_{-}$ . More precisely, we have the following corollary.

Definition 1 (Cheeger constant [7]).

Let $P$ be the transition matrix of a Markov chain that is irreducible and aperiodic. Let $\mathcal{X}$ denote the state space of the chain, and let $\pi$ be the stationary distribution. Define the edge measure $Q$ by

$$Q(x,y)\triangleq\pi(x)P(x,y).$$

For two sets $A,B\subset\mathcal{X}$ , let $Q(A,B)=\sum_{x\in A,y\in B}Q(x,y)$ . For $S\subset\mathcal{X}$ , let

$$\Phi(S)\triangleq\frac{Q(S,S^{c})}{\pi(S)}.$$

Finally, the Cheeger constant is defined as

$$\Phi_{*}\triangleq\min_{S:\,\pi(S)\leq\frac{1}{2}}\Phi(S).$$
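On a toy chain, the Cheeger constant of Definition 1 can be computed by brute force over all subsets. The sketch below is an illustration under stated assumptions: the stationary distribution is approximated by power iteration (which presumes the chain is irreducible and aperiodic, as in the definition), and the helper names `stationary` and `cheeger_constant` are hypothetical. The subset enumeration is exponential in the number of states.

```python
from itertools import combinations


def stationary(P, iters=2000):
    """Approximate the stationary distribution by repeatedly applying P."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
    return pi


def cheeger_constant(P):
    """Minimize Q(S, S^c) / pi(S) over nonempty sets S with pi(S) <= 1/2."""
    size = len(P)
    pi = stationary(P)
    best = float("inf")
    for r in range(1, size):
        for S in combinations(range(size), r):
            mass = sum(pi[x] for x in S)
            # Small tolerance so that sets of mass exactly 1/2 are not lost to rounding.
            if mass <= 0.0 or mass > 0.5 + 1e-12:
                continue
            members = set(S)
            flow = sum(pi[x] * P[x][y] for x in S for y in range(size) if y not in members)
            best = min(best, flow / mass)
    return best
```

For the two-state chain with transition matrix `[[0.75, 0.25], [0.25, 0.75]]`, the stationary distribution is uniform and each singleton gives $\Phi(S)=0.25$.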

Our results on fast and slow mixing allow us to indirectly bound the Cheeger constant of the Attracting Random Walks chain on a given graph. We obtain the following corollary of Theorems 1 and 2.

Corollary 1.

Fix a graph $\mathcal{G}$, and let $P$ be the transition probability matrix of the Attracting Random Walks chain on $\mathcal{G}$. Let $\Phi_{*}$ be the Cheeger constant of $P$. Then if $0\leq\beta<\beta_{-}$ we have $\Phi_{*}=\frac{1}{O(n\log n)}$. If $\beta>\beta_{+}$ then $\Phi_{*}=\exp(-\Omega(n))$.

2.2 Connection to the Potts Model

In the case where $\mathcal{G}$ is the complete graph, the Attracting Random Walks model is a projection of Glauber dynamics of the Curie–Weiss Potts model. The Potts model is a multicolor generalization of the Ising model, and the Curie–Weiss version considers a complete graph. In the Curie–Weiss Potts model, the vertices of a complete graph are assigned a color from $[q]=\left\{1,\dots,q\right\}$ . Setting $q=2$ corresponds to the Ising model.

Let $s(i)$ be the color of vertex $i$ for each $1\leq i\leq n$ . Define

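$$\delta(s(i),s(j))\triangleq\begin{cases}1 & \text{for } s(i)=s(j),\\ 0 & \text{for } s(i)\neq s(j).\end{cases}$$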

The stationary distribution of the Potts model, with no external field, is

$$\pi(s)=\frac{1}{Z}\exp\left(\frac{\beta}{n}\sum_{(i,j),\,i\neq j}\delta(s(i),s(j))\right).$$

The Glauber dynamics for the Curie–Weiss Potts model are as follows:

1. Choose a vertex $i$ uniformly at random.

2. Update the color of vertex $i$ to color $k\in[q]$ with probability proportional to $\exp\left(\frac{\beta}{n}\sum_{j\neq i}\delta\left(k,s(j)\right)\right)$.

Observe that the summation $\sum_{j\neq i}\delta\left(k,s(j)\right)$ is equal to the number of vertices, apart from vertex $i$ , that have color $k$ . Therefore if each vertex in the Potts model corresponds to a particle in the ARW model, and each color in the Potts model corresponds to a vertex in the ARW model, then the ARW model is a projection of the Glauber dynamics for the Potts model. The correspondence is illustrated in Figure 2. Under the correspondence, the ARW chain is exactly the “vector of proportions” chain in the Potts model.

Let $v(i)$ be the vertex location of the $i$ th particle in the ARW model, for $1\leq i\leq n$ . By the correspondence, we show that the stationary distribution of the ARW model is

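$$\begin{aligned}\pi(x)&=\frac{1}{Z}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{n}\left(x(v(i))-1\right)\right)\\ &=\frac{1}{Z}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{k}x(i)^{2}-\frac{\beta}{2}\right)\\ &=\frac{1}{Z^{\prime}}\binom{n}{x(1),x(2),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i=1}^{k}x(i)^{2}\right).\end{aligned}$$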

Observe that the $\exp\left({\frac{\beta}{2n}\sum_{i}x(i)^{2}}\right)$ factor encourages particle aggregation, while the multinomial encourages particle spread.
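As a sanity check, one can verify numerically that the multinomial-times-exponential form of $\pi(x)$, with weight $\binom{n}{x(1),\dots,x(k)}\exp\left(\frac{\beta}{2n}\sum_{i}x(i)^{2}\right)$, satisfies detailed balance with the ARW transition probabilities on a small complete graph. The sketch below is illustrative only; the function names and the choice $k=3$, $n=4$ are assumptions made here.

```python
from itertools import product
from math import exp, factorial, prod


def configurations(k, n):
    """All occupancy vectors of n particles on k vertices."""
    return [x for x in product(range(n + 1), repeat=k) if sum(x) == n]


def move_prob(x, i, j, beta):
    """ARW probability of moving one particle from i to j != i on the complete graph."""
    n = sum(x)
    Z = sum(exp(beta * x[l] / n) for l in range(len(x)) if l != i)
    Z += exp(beta * (x[i] - 1) / n)  # weight of staying at i
    return (x[i] / n) * exp(beta * x[j] / n) / Z


def weight(x, beta):
    """Unnormalized stationary weight: multinomial coefficient times the exponential factor."""
    n = sum(x)
    multinomial = factorial(n) // prod(factorial(c) for c in x)
    return multinomial * exp(beta / (2 * n) * sum(c * c for c in x))


def max_detailed_balance_gap(k, n, beta):
    """Largest relative gap between pi(x) P(x, y) and pi(y) P(y, x) over single moves."""
    gap = 0.0
    for x in configurations(k, n):
        for i in range(k):
            for j in range(k):
                if i == j or x[i] == 0:
                    continue
                y = list(x)
                y[i] -= 1
                y[j] += 1
                y = tuple(y)
                lhs = weight(x, beta) * move_prob(x, i, j, beta)
                rhs = weight(y, beta) * move_prob(y, j, i, beta)
                gap = max(gap, abs(lhs - rhs) / max(lhs, rhs))
    return gap
```

A gap at the level of floating-point rounding is consistent with reversibility on the complete graph; self-loops need no check because detailed balance holds trivially for them.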

The reader is encouraged to refer to [3] for a detailed study of the mixing time of the Curie–Weiss Potts model, for different values of $\beta$. For instance, [3] shows that there exists $\beta_{s}(q)$ such that if $\beta<\beta_{s}(q)$, the mixing time is $\Theta(n\log n)$, and if $\beta>\beta_{s}(q)$, the mixing time is exponential in $n$. In the ARW context, these results hold with $q$ replaced by $k$. On the other hand, when $\mathcal{G}$ is not the complete graph, the correspondence to the Potts model is lost. In fact, the following can be shown:

Theorem 3.

For $n\geq 3$ , the ARW Markov chain is reversible for all $\beta$ if and only if the graph $\mathcal{G}$ is complete.

The non-reversibility can be shown by applying Kolmogorov’s cycle criterion, demonstrating a cycle of states (configurations) that violates the criterion.

Lemma 1 (Kolmogorov’s criterion).

A finite state space Markov chain associated with the transition probability matrix $P$ is reversible if and only if for all cyclic sequences of states $i_{1},i_{2},\dots,i_{l-1},i_{l},i_{1}$ it holds that

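$$\left(\prod_{j=1}^{l-1}P(i_{j},i_{j+1})\right)P(i_{l},i_{1})=P(i_{1},i_{l})\left(\prod_{j=0}^{l-2}P(i_{l-j},i_{l-j-1})\right).$$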

In other words, the forward product of transition probabilities must equal the reverse product, for all cycles of states.

Proof of Theorem 3.

First, if the graph is complete, then the chain is a projection of Glauber dynamics, which is automatically reversible. Now suppose $\mathcal{G}$ is not complete. We apply Kolmogorov’s cycle criterion. In the ARW model, a state is a particle configuration. A cycle of states is then a sequence of particle configurations such that

1. Subsequent configurations differ by the movement of a single particle.

2. The first and last configurations are the same.

If $\mathcal{G}$ is not a complete graph, then it is straightforward to show that there exist three vertices $u\sim v\sim w$ such that $u\nsim w$ . Now we demonstrate a cycle of states that breaks Kolmogorov’s criterion. We have the following situation, illustrated by Figure 3. The values $d_{u}$ , $d_{v}$ , and $d_{w}$ indicate the degrees of the vertices, excluding the named vertices. Place $n-2$ particles at $u$ and $2$ particles at $v$ . The particle movements are as follows: $v\rightarrow u$ , $v\rightarrow w$ , $u\rightarrow v$ , $w\rightarrow v$ .

For clarity, let $f(z)=\exp\left(\frac{\beta}{n}z\right)$ . The forward transition probabilities are

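$$\left(\frac{2}{n}\frac{f(n-2)}{f(n-2)+f(1)+1+d_{v}}\right)\left(\frac{1}{n}\frac{1}{f(n-1)+1+1+d_{v}}\right)\left(\frac{n-1}{n}\frac{1}{f(n-2)+1+d_{u}}\right)\left(\frac{1}{n}\frac{f(1)}{f(1)+1+d_{w}}\right).$$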

The reverse transition probabilities are

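$$\left(\frac{2}{n}\frac{1}{f(n-2)+f(1)+1+d_{v}}\right)\left(\frac{1}{n}\frac{f(n-2)}{f(n-2)+1+f(1)+d_{v}}\right)\left(\frac{1}{n}\frac{1}{1+1+d_{w}}\right)\left(\frac{n-1}{n}\frac{f(1)}{f(n-2)+f(1)+d_{u}}\right).$$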

Canceling factors that appear in both products, we are left comparing

$$\left(f(n-1)+1+1+d_{v}\right)\left(f(n-2)+1+d_{u}\right)\left(f(1)+1+d_{w}\right)$$

to

$$\left(f(n-2)+1+f(1)+d_{v}\right)\left(1+1+d_{w}\right)\left(f(n-2)+f(1)+d_{u}\right).$$

Observe that $f(z_{1})f(z_{2})=f(z_{1}+z_{2})$. Taking leading terms, the first product is therefore a degree-$(2n-2)$ polynomial in $e^{\beta/n}$. Since $n-2\geq 1$, the second is a degree-$(2n-4)$ polynomial in $e^{\beta/n}$. Two polynomials of different degrees agree for only finitely many values of $e^{\beta/n}$, and therefore of $\beta$ itself. Hence the criterion fails for all but finitely many $\beta$, and the Markov chain is not reversible for all $\beta$. ∎
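The cycle used in the proof can be evaluated numerically on the smallest non-complete example, the path $u\sim v\sim w$ (so $d_{u}=d_{v}=d_{w}=0$). The sketch below is an illustration, with the vertex labels `0, 1, 2` for $u, v, w$ and the function names chosen here as assumptions.

```python
from math import exp


def move_prob(x, i, j, adj, beta):
    """ARW probability that a particle at i is chosen and ends at j (a neighbor of i, or i)."""
    n = sum(x)
    weights = {l: exp(beta * x[l] / n) for l in adj[i]}
    weights[i] = exp(beta * (x[i] - 1) / n)  # staying ignores the moving particle itself
    return (x[i] / n) * weights[j] / sum(weights.values())


def cycle_products(n, beta):
    """Forward and reverse products of transition probabilities around the proof's cycle."""
    adj = {0: [1], 1: [0, 2], 2: [1]}  # u = 0, v = 1, w = 2
    moves = [(1, 0), (1, 2), (0, 1), (2, 1)]  # v->u, v->w, u->v, w->v
    x = [n - 2, 2, 0]
    forward = 1.0
    for i, j in moves:
        forward *= move_prob(x, i, j, adj, beta)
        x[i] -= 1
        x[j] += 1
    # x is back at the starting configuration; now undo the moves in reverse order.
    reverse = 1.0
    for i, j in reversed(moves):
        reverse *= move_prob(x, j, i, adj, beta)
        x[j] -= 1
        x[i] += 1
    return forward, reverse
```

With $n=3$, the two products coincide at $\beta=0$ (independent lazy walks) and differ at $\beta=3$, which is the violation of Kolmogorov's criterion that the proof exploits.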

3 Mixing Time on General Graphs

In this section, we show the existence of a phase transition in mixing time in the ARW model when $\beta$ is varied, for a general fixed graph. First, we show exponentially slow mixing for $\beta$ suitably large; that is, we prove Theorem 1 by relating mixing times to hitting times. Next, we show polynomial time mixing for small values of $\beta$ (Theorem 2). The proof is by an adaptation of path coupling. We use definitions and notations on Markov chains from [7].

3.1 Slow Mixing

The idea of the proof of slow mixing is to show that with substantial probability, the chain takes an exponential time to access a constant portion of the state space. We now outline the proof, deferring the proofs of the lemmas. First we state a helper lemma.

Lemma 2.

For any graph $\mathcal{G}=\left(\mathcal{V},\mathcal{E}\right)$ , there exists a vertex $v\in\mathcal{V}$ such that for the set of configurations $S_{v}\triangleq\{x:x(v)=\max_{w}x(w)\}$ , it holds that $\pi(S_{v})\geq\nicefrac{{1}}{{k}}$ . In other words, the states where $v$ has the greatest number of particles contribute at least $\nicefrac{{1}}{{k}}$ to the stationary probability mass.

By Lemma 2, there exists a vertex $v$ such that $\pi(S_{v})\geq\nicefrac{{1}}{{k}}$ . Choose any other vertex $u$ . Whenever $x(u)>\nicefrac{{n}}{{2}}$ , we can be sure that $v$ is not the maximizing vertex, and therefore that a set of states having at least $\nicefrac{{1}}{{k}}$ mass under the stationary measure has not been reached. It therefore suffices to lower bound the time until vertex $u$ has lost sufficient particles for vertex $v$ to have the maximum number of particles.

Let $T_{x}\triangleq\inf\{t:X_{t}(u)\leq\frac{1}{2}n,X_{0}=x\}$. If the probability that $\{X_{t}\}$ has reached the set $\{x\in\Omega:x(u)\leq\nicefrac{n}{2}\}$ by time $t$ is less than some $p$, then the total variation distance at time $t$ is at least $(1-p)\frac{1}{k}$. Therefore we get the following relationship between the mixing time and hitting time:

Proposition 1.

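$$t_{\text{mix}}\left(X,(1-p)\tfrac{1}{k}\right)\geq\inf\left\{t:\min_{x}\mathbb{P}\left(T_{x}\leq t\right)\geq p\right\}.$$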

The problem now reduces to lower bounding this hitting time. The idea is that when particles leave vertex $u$ , there is a strong drift back to $u$ . However, controlling the hitting times of a multidimensional Markov chain is challenging, and direct comparison is difficult to establish. We instead reason by comparison to another Markov chain, $Z$ , which lower-bounds the particle occupancy at vertex $u$ .

Let $l(w)$ be the length of the shortest path connecting vertex $u$ to vertex $w$. Let $\tilde{X}_{t}$ be a projection of the $X_{t}$ chain defined by $\tilde{X}_{t}(d)\triangleq\sum_{w:l(w)=d}X_{t}(w)$, and let $\tilde{\Omega}$ be its state space. In other words, the $d$th coordinate of the projected chain counts the number of particles that are a distance $d$ away from vertex $u$. Note that $\tilde{X}_{t}(0)=X_{t}(u)$. We let $F$ denote this projection, writing $\tilde{X}=F(X)$. For any $0<\delta<\nicefrac{1}{2}$, define

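$$T_{x}(\delta)\triangleq\inf\{t:X_{t}(u)\leq(1-\delta)n,\,X_{0}=x\}=\inf\{t:\tilde{X}_{t}(0)\leq(1-\delta)n,\,\tilde{X}_{0}=F(x)\}.$$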

For some $\delta>0$ to be determined, let

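$$S\triangleq\{x\in\tilde{\Omega}:x(0)>(1-\delta)n\}\quad\text{and}\quad S^{c}\triangleq\tilde{\Omega}\setminus S.$$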

We now build a chain $Z$ on $\tilde{\Omega}$ coupled to $\tilde{X}$ such that as long as $\tilde{X}_{t}\in S$, $Z_{t}(0)\stackrel{\text{st}}{\leq}\tilde{X}_{t}(0)$. Then $T_{x}(\delta)\stackrel{\text{st}}{\geq}\inf\{t:Z_{t}\in S^{c}\}$. The remainder of the proof of slow mixing is as follows.

1. Construct a lower-bounding comparison chain $Z$ satisfying $Z_{t}(0)\stackrel{\text{st}}{\leq}\tilde{X}_{t}(0)$ when $t\leq T_{x}(\delta)$.

2. Compute $\mathbb{E}_{\pi_{Z}}\left[Z(0)\right]$ and use a concentration bound to show that $\pi_{Z}$ places exponentially little mass on the set $S^{c}$.

3. Comparing the chain $X$ to $Z$, show that $X$ takes exponential time to achieve $X(u)\leq(1-\delta)n$. The result is complete by $1-\delta>\nicefrac{1}{2}$.

We now define the lower-bounding comparison chain $Z$, which is a chain on $n$ independent particles. These particles move on the discrete line with points $\{0,1,\dots,D\}$, where $D=\operatorname{diam}(\mathcal{G})$. We first describe the case $D\geq 2$. Since the comparison needs to hold only when $\tilde{X}_{t}(0)\geq(1-\delta)n$, we assume that $\tilde{X}_{t}(0)\geq(1-\delta)n$. The idea is to identify a uniform constant lower bound on the probability of a particle moving closer to $u$ under this assumption, which tells us that once the particle is at $u$, there is a high probability of remaining there.

Let $\mathcal{N}(u)$ denote the neighbourhood of $u$ , i.e. $\mathcal{N}(u)=\{w:w\sim u\}$ . In the $X$ chain, when a particle is at a vertex $w\notin\{u\}\cup\mathcal{N}(u)$ , its probability of moving to any one of its neighbors is at least

$$p\triangleq\frac{1}{e^{\beta\delta}+\Delta},$$

where $\Delta$ is the maximum degree of the graph. This is because the lowest probability when $\beta$ is large corresponds to placing all $\delta n$ movable particles at some other neighbor of $w$ . When a particle is at a vertex $u$ , it stays there with probability at least

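$$q\triangleq\frac{\exp\left(\beta(1-\delta)-\frac{\beta}{n}\right)}{\exp\left(\beta(1-\delta)-\frac{\beta}{n}\right)+e^{\beta\delta}+\Delta-1}.$$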

When a particle is at a vertex $w\in\mathcal{N}(u)$ , it moves to $u$ with probability at least

$$\frac{\exp\left(\beta(1-\delta)\right)}{\exp\left(\beta(1-\delta)\right)+e^{\beta\delta}+\Delta-1}>q.$$

Note that $q>p$ .

The transitions of the $Z$ chain are chosen in order to maintain comparison. At each time step, a particle is selected uniformly at random. When the chosen particle is located at $d\notin\{0,1\}$, the particle moves to $d-1$ with probability $p$ and moves to $\min\{d+1,D\}$ with probability $(1-p)$. When the chosen particle is located at $d\in\{0,1\}$, it moves to $0$ with probability $q$, and moves to $d+1$ with probability $1-q$. The transition probabilities for single particle movements are depicted in Figure 4. When $D=1$ (i.e., $\mathcal{G}$ is the complete graph), we instead have the transitions depicted by Figure 5. Lemma 3 establishes the comparison.
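The comparison chain for $D\geq 2$ is simple enough to simulate. The sketch below follows the transition rule just described, with $p$ and $q$ as defined earlier in this section; the function names, the list-of-positions representation, and the example parameter values are assumptions made for illustration.

```python
import math
import random


def comparison_rates(beta, delta, n, max_degree):
    """Return (p, q) for the comparison chain; max_degree plays the role of Delta."""
    p = 1.0 / (math.exp(beta * delta) + max_degree)
    a = math.exp(beta * (1 - delta) - beta / n)
    q = a / (a + math.exp(beta * delta) + max_degree - 1)
    return p, q


def z_step(positions, D, p, q, rng):
    """Move one uniformly chosen particle on the line {0, ..., D}; assumes D >= 2."""
    idx = rng.randrange(len(positions))
    d = positions[idx]
    if d in (0, 1):
        # From 0 or 1: go to 0 with probability q, otherwise one step right.
        positions[idx] = 0 if rng.random() < q else d + 1
    else:
        # From d >= 2: one step left with probability p, otherwise right (capped at D).
        positions[idx] = d - 1 if rng.random() < p else min(d + 1, D)
    return positions


# Hypothetical parameters, chosen only to exercise the code.
p, q = comparison_rates(beta=5.0, delta=0.1, n=100, max_degree=4)
rng = random.Random(0)
positions = [0] * 100
for _ in range(10_000):
    z_step(positions, D=3, p=p, q=q, rng=rng)
```

Because the particles in $Z$ do not interact, each one performs an independent walk on the line, which is what later makes a concentration bound for $Z(0)$ available.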

Let $\pi_{Z}$ denote the stationary distribution of the $Z$ chain, and let $\lambda(w)$ be the probability according to $\pi_{Z}$ of a particular particle being located at vertex $w$ in the line graph. The following results about the $Z$ chain are required to complete the proof.

Lemma 3.

For a configuration $x\in\Omega$ , set $Z_{0}=\tilde{X}_{0}=F(x)$ . As long as $t\leq T_{x}(\delta)$ , the chain $Z_{t}$ satisfies

$$\sum_{r=0}^{d}Z_{t}(r)\stackrel{\text{st}}{\leq}\sum_{r=0}^{d}\tilde{X}_{t}(r)$$

for all $d\in\{0,1,\dots,D\}$ and $t\in\{0,1,2,\dots\}$. In particular, $Z_{t}(0)\stackrel{\text{st}}{\leq}\tilde{X}_{t}(0)$.

Lemma 4.

Recall that $D=\operatorname{diam}(\mathcal{G})$. Let $\delta=\frac{1}{3D}$ and fix $0<\overline{\epsilon}<\nicefrac{\delta}{2}$. For all $\beta$ large enough, $\mathbb{E}_{\pi_{Z}}\left[Z(0)\right]\geq(1-\delta+\overline{\epsilon})n$. Moreover,

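$$\mathbb{P}_{\pi_{Z}}\left(Z(0)\leq\mathbb{E}_{\pi_{Z}}\left[Z(0)\right]-\overline{\epsilon}n\right)\leq 2\exp\left(-2\overline{\epsilon}^{2}n\right),$$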

which implies

$$\mathbb{P}_{\pi_{Z}}\left(Z(0)\leq(1-\delta)n\right)\leq 2\exp\left(-2\overline{\epsilon}^{2}n\right).$$

Proof of Theorem 1.

Recall the choices of $u$ and $v$ above. Lemma 4 tells us that the $Z$ chain places exponentially little stationary mass on the set $S^{c}$ . We now combine this fact with the comparison established in Lemma 3.

Recall $T_{x}(\delta)=\inf\{t:X_{t}(u)\leq(1-\delta)n,X_{0}=x\}=\inf\{t:\tilde{X}_{t}(0)\leq(1-\delta)n,\tilde{X}_{0}=F(x)\}.$ Applying Proposition 1 with $p=\nicefrac{{1}}{{2}}$ ,

[TABLE]

Since $\nicefrac{{1}}{{2}}<1-\delta$ , it also holds that

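$$t_{\text{mix}}\left(X,\tfrac{1}{2k}\right)\geq\min\left\{t:\mathbb{P}\left(T_{x}(\delta)\leq t\right)\geq\tfrac{1}{2},\ \forall x\in\Omega\right\}=\min\left\{t:\mathbb{P}\left(T_{x}(\delta)\leq t\right)\geq\tfrac{1}{2},\ \forall x\in S\right\}.$$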

The last equality is due to the fact that $\mathbb{P}(T_{x}(\delta)\leq t)=1$ for all $x$ in $S^{c}$ .

Additionally define

$$T_{x}^{Z}\triangleq\inf\{t:Z_{t}\in S^{c},\,Z_{0}=F(x)\}.$$

Now because $Z_{t}$ is a lower-bounding chain, it holds that

$$\mathbb{P}\left(T_{x}(\delta)\leq t\right)\leq\mathbb{P}\left(T_{x}^{Z}\leq t\right)$$

for all $x\in S$ and $t\geq 0$ . Therefore,

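$$t_{\text{mix}}\left(X,\tfrac{1}{2k}\right)\geq\min\left\{t:\mathbb{P}\left(T_{x}^{Z}\leq t\right)\geq\tfrac{1}{2},\ \forall x\in S\right\}.$$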

Finally, from Lemma 4 we know that $\pi_{Z}(S^{c})\leq 2\exp\left(-2\overline{\epsilon}^{2}n\right)$. Suppose that $Z_{0}$ is distributed according to $\pi_{Z}$ and consider the hitting time $T_{\pi_{Z}}^{Z}$. It holds that

$$T_{\pi_{Z}}^{Z}\stackrel{\text{st}}{\geq}\mathrm{Geom}\left(2\exp\left(-2\overline{\epsilon}^{2}n\right)\right).$$

Therefore, $t=e^{\Theta(n)}$ time is required for $\mathbb{P}\left(T_{\pi_{Z}}^{Z}\leq t\right)\geq\frac{1}{2}$ . The same is true when $Z_{0}=x$ , for some $x\in S$ . Therefore $\min\left\{t:\mathbb{P}\left(T_{x}^{Z}\leq t\right)\geq\frac{1}{2},\forall x\in S\right\}=e^{\Theta(n)}$ and $t_{\text{mix}}\left(X,\frac{1}{2k}\right)=e^{\Omega(n)}$ , which proves Theorem 1. ∎
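The step from the geometric lower bound to an exponential time can be made explicit. The following is a short worked bound, assuming $\mathrm{Geom}(\rho)$ denotes the number of independent trials up to and including the first success, so that $\mathbb{P}(\mathrm{Geom}(\rho)\leq t)=1-(1-\rho)^{t}$. The shorthand $\rho$ is introduced here for illustration and is not notation from the paper.

```latex
\begin{align*}
\rho &\triangleq 2\exp\left(-2\overline{\epsilon}^{2}n\right),\\
\mathbb{P}\left(T_{\pi_{Z}}^{Z}\leq t\right) &\leq 1-(1-\rho)^{t}\leq t\rho
  && \text{(Bernoulli's inequality)},\\
\mathbb{P}\left(T_{\pi_{Z}}^{Z}\leq t\right)\geq\tfrac{1}{2}
  &\implies t\geq\frac{1}{2\rho}=\frac{1}{4}\exp\left(2\overline{\epsilon}^{2}n\right).
\end{align*}
```

So any $t$ at which the hitting probability reaches $\nicefrac{1}{2}$ is at least exponential in $n$, with rate $2\overline{\epsilon}^{2}$.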

We now provide the deferred proofs.

Proof of Lemma 2.

By the Union Bound,

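$$\sum_{v\in\mathcal{V}}\pi(S_{v})\geq\mathbb{P}_{\pi}\left(\bigcup_{v\in\mathcal{V}}\left\{x(v)=\max_{w}x(w)\right\}\right)=1.$$

Since the sum has $k$ terms, at least one vertex $v$ satisfies $\pi(S_{v})\geq\nicefrac{1}{k}$. ∎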

Proof of Lemma 3.

We show that there exists a coupling $(\tilde{X}_{t},Z_{t})$ satisfying

[TABLE]

for all $d\in\{0,1,\dots,D\}$ and $t\leq T_{x}(\delta)$. Since $Z_{0}=\tilde{X}_{0}$, we can pair up the particles at time $t=0$ and design a synchronous coupling, i.e. when a certain particle is chosen in the $\tilde{X}$ process, its copy is chosen in the $Z$ chain. We design the coupling so that for each particle, the $\tilde{X}$-copy is at least as close to $0$ as the $Z$-copy, for all $t\leq T_{x}(\delta)$. Note that this implies (2) for all $d\in\{0,1,\dots,D\}$ and $t<T_{x}(\delta)$. The uniformity of $p$ and $q$ over all configurations in $S$ ensures that the coupling will maintain the requirement (2), which is established by induction on $t$. The following analysis applies to both $D\geq 2$ and $D=1$ by considering the relevant cases.

The base case $(t=0)$ holds since $Z_{0}=\tilde{X}_{0}$. Suppose that at time $t<T_{x}(\delta)$, each particle in the $\tilde{X}$ chain is at least as close to $0$ as its copy in the $Z$ chain. We will show that the same property holds for time $t+1$. First consider a particle located at $0$ in the $Z$ chain. By the inductive hypothesis, its copy must be located at $0$ in the $\tilde{X}$ process also, and the corresponding particle in the $X$ chain must be at $u$. The probability of the particle staying at $0$ in the $Z$ chain is smaller than the probability of the corresponding particle staying at $u$ in the $X$ chain, since $q$ is a uniform lower bound on the probability of staying at $u$. Therefore in this case, the property is maintained in the next time step.

Next consider a particle located at vertex $d\neq 0$ in the $Z$ chain and suppose its copy is located at vertex $d^{\prime}$ in the $\tilde{X}$ process. By the inductive hypothesis, $d^{\prime}\leq d$ . If $d^{\prime}<d-1$ , then clearly the property is maintained in the next step. It remains to consider the cases $d^{\prime}=d$ and $d^{\prime}=d-1$ . Consider the case $d=d^{\prime}$ . We couple the particles so that if the particle in the $Z$ chain moves left to vertex $d-1$ , then the particle in the $\tilde{X}$ process makes the same transition. This coupling is possible by the uniformity of $p$ and $q$ . Otherwise, the particle in the $Z$ chain moves right, and the property is maintained.

Next consider the case $d^{\prime}=d-1$ . It suffices to design a coupling such that if the particle in the $\tilde{X}$ process moves right, then so does the particle in the $Z$ chain. If $d\geq 3$ , this is possible due to the fact that $1-p$ is a uniform upper bound on the probability of moving right from these states. Next suppose that $d=2$ and $d^{\prime}=1$ . The particle in the $\tilde{X}$ process moves right with probability upper-bounded by $1-q$ , which is smaller than $1-p$ for $\delta$ sufficiently small and $n$ sufficiently large. Therefore we can ensure the property in the next step. Finally, suppose $d=1$ and $d^{\prime}=0$ . Due to the fact that $1-q$ is a uniform upper bound on the probability of moving right from these states, we can again construct a coupling that maintains the property. ∎

To prove Lemma 4, we need the stationary probability $\lambda(0)$ .

Proposition 2.

It holds that

[TABLE]

The proof of Proposition 2 is deferred to the appendix.

Proof of Lemma 4.

When $D=1$ , we have $\lambda(0)=q$ . Since $q(\beta)\to 1$ as $\beta\to\infty$ , it holds that $\lambda(0)\geq 1-\delta+\overline{\epsilon}$ for $\beta$ large enough. Next, for $D\geq 2$ we have

[TABLE]

We show that $\lambda(0)\geq 1-\delta+\overline{\epsilon}$ for $\beta$ and $n$ large enough. Again, for $\beta$ large enough, we have $q\geq 1-\delta+2\overline{\epsilon}$ . Next,

[TABLE]

where the last inequality holds for $\beta$ and $n$ large enough, since the first term of (4) dominates. Since $p<\frac{1}{\Delta+1}$ , we have $1-2p>\frac{\Delta-1}{\Delta+1}$ . Finally,

[TABLE]

where the first inequality holds for $\beta$ large enough. Substituting into (3), we obtain for $\beta$ and $n$ large enough

[TABLE]

where the second inequality holds for $\beta$ large enough, due to $\nicefrac{{2}}{{D}}-\nicefrac{{3}}{{2}}<0$ for $D\geq 2$ .

We conclude that the expectation is linearly separated from the boundary:

[TABLE]

Next we show concentration. Label all the particles, and define $U_{i}=1$ if particle $i$ is at vertex $0$ in the line graph, and $U_{i}=0$ otherwise. Then $Z(0)=\sum_{i}U_{i}$ , and $U_{i}$ is independent of $U_{j}$ for all $i\neq j$ . Applying Hoeffding’s inequality,

[TABLE]

for $c>0$ . Let $c=\overline{\epsilon}$ . Then the above implies

[TABLE]

3.2 Fast Mixing

The proof is by a modification of path coupling, which is a method to find an upper bound on mixing time through contraction of the Wasserstein distance. (An alternative proof of fast mixing uses a variable-length path coupling, as introduced in [6]; for further details, see [5].) The following definition can be found in [7], p. 189.

Definition 2 (Transportation metric).

Given a metric $\rho$ on a state space $\Omega$ , the associated transportation metric $\rho_{T}$ for two probability distributions $\mu$ and $\nu$ is defined as

$$\rho_{T}(\mu,\nu)=\inf_{X\sim\mu,\,Y\sim\nu}\mathbb{E}\left[\rho(X,Y)\right],$$

where the infimum is over all couplings of $\mu$ and $\nu$ on $\Omega\times\Omega$ .

Definition 3 (Wasserstein distance).

Let $P$ be the transition probability matrix of a Markov chain on a state space $\Omega$ , and let $\rho$ be a metric on $\Omega$ . The Wasserstein distance $W_{\rho}^{P}(x,y)$ of two states $x,y\in\Omega$ with respect to $P$ and $\rho$ is defined as follows:

$$W_{\rho}^{P}(x,y)=\rho_{T}\left(P(x,\cdot),P(y,\cdot)\right).$$

In other words, the Wasserstein distance is the transportation metric distance between the next state distributions from initial states $x$ and $y$ .

The following lemma is the path coupling result which can be found in [2] and [7]. Given a Markov chain on state space $\Omega$ with transition probability matrix $P$ , consider a connected graph $\mathcal{H}=\left(\Omega,\mathcal{E}_{\mathcal{H}}\right)$ , i.e. the vertices of $\mathcal{H}$ are the states in $\Omega$ and the edges are $\mathcal{E}_{\mathcal{H}}$ . Let $l$ be a “length function” for the edges of $\mathcal{H}$ , which is an arbitrary function $l:\mathcal{E}_{\mathcal{H}}\to[1,\infty)$ . For $x,y\in\Omega$ , define $\rho(x,y)$ to be the path metric, i.e. $\rho(x,y)$ is the length of the shortest path from $x$ to $y$ in terms of $l$ and $\mathcal{H}$ .

Lemma 5 (Path Coupling).

Under the above construction, if there exists $\delta>0$ such that for all $x,y$ that are connected by an edge in $\mathcal{H}$ it holds that

[TABLE]

then

[TABLE]

where $\text{diam}(\Omega)=\max_{x,y\in\Omega}\rho(x,y)$ is the diameter of the graph $\mathcal{H}$ with respect to $\rho$ .

Our proof of rapid mixing for small enough $\beta$ relies on rapid mixing of a single random walk. The following lemma demonstrates the existence of a contracting metric for a single random walk. It is possible that such a result appears elsewhere, but we are not aware of a published proof.

Lemma 6.

Consider a random walk on $\mathcal{G}$ which makes a uniform choice among staying or moving to any of the neighbors and denote by $Q$ its transition matrix. Let $d(x,y)$ be the expected meeting time of two independent copies of a random walk on a graph started from states $x$ and $y$ . Then $d(x,y)$ is a metric and $Q$ contracts the respective Wasserstein distance. In particular,

$$W_{d}^{Q}(x,y)\leq d(x,y)\left(1-\nicefrac{{1}}{{d_{\text{max}}}}\right),$$

where $d_{\text{max}}=\max_{x,y}d(x,y)$ . Furthermore, if $x\sim y$ , then

$$W_{d}^{Q}(x,y)\leq d(x,y)\left(1-\nicefrac{{1}}{{d^{\prime}_{\text{max}}}}\right),$$

where $d^{\prime}_{\text{max}}=\max_{x,y:x\sim y}d(x,y)$ . (The statement for $x\sim y$ was pointed out by the reviewer.)

Remark 1.

In fact, we can show a stronger result (i.e. with a smaller value in the place of $d_{\max}$ ): we can allow arbitrary Markovian coupling between two copies of the random walk and define $d(x,y)$ to be the meeting time under that coupling.

In order to apply path coupling, we let $\mathcal{H}=\left(\Omega,\mathcal{E}_{\mathcal{H}}\right)$ be a graph on particle configurations, where $(x,y)\in\mathcal{E}_{\mathcal{H}}$ whenever $y=x-e_{i}+e_{j}$ for some pair of distinct vertices $i$ and $j$ in $\mathcal{G}$ . In other words, $x$ and $y$ differ by the position of a single particle. Note that $i$ and $j$ need not be neighboring vertices in $\mathcal{G}$ . For such a pair of neighboring configurations $(x,y)$ , let $l(x,y)=d(i,j)$ . Clearly, $l(x,y)\geq\mathbbm{1}\{x\neq y\}$ . Now for any two configurations $x,y\in\Omega$ , let $\rho(x,y)$ denote the path metric induced by $\mathcal{H}$ and $l(\cdot,\cdot)$ . We show that $\rho(x,y)=l(x,y)$ for neighboring configurations.

Proposition 3.

For any two configurations $x,y$ such that $y=x-e_{i}+e_{j}$ , it holds that $\rho(x,y)=l(x,y)$ .

Let $P_{x}(i,\cdot)$ be the probability distribution of the next location of the selected particle, when it is initially located at vertex $i\in\mathcal{V}$ in configuration $x$ . Recall that $Q(i,\cdot)$ is the probability distribution of the next location of a simple random walk on $\mathcal{G}$ , initially located at vertex $i$ . Note that when $\beta=0$ , it holds that $P_{x}(i,\cdot)=Q(i,\cdot)$ . When $\beta$ is small, $P_{x}(i,\cdot)\approx Q(i,\cdot)$ . Lemma 7 quantifies this statement.

Lemma 7.

For all configurations $x$ and vertices $i\in\mathcal{G}$ , it holds that

[TABLE]
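The closeness of $P_{x}(i,\cdot)$ and $Q(i,\cdot)$ can be checked numerically on a small graph. The following is a minimal sketch, not the authors' code: it assumes the transition rule from the model definition (a neighbor $j$ gets weight $e^{\beta x(j)/n}$, and staying at $i$ gets weight $e^{\beta(x(i)-1)/n}$), and it compares the worst total variation distance against the quantity $\nicefrac{{(e^{\nicefrac{{\beta}}{{2}}}-1)}}{{(e^{\nicefrac{{\beta}}{{2}}}+1)}}$ that is used in the proof of Theorem 2. The function names and the star-graph example are illustrative assumptions.

```python
import itertools
import math

def arw_step_distribution(x, i, neighbors, beta):
    """Next-location distribution P_x(i, .) of a particle at vertex i (assumed ARW rule)."""
    n = sum(x)
    # Staying at i is weighted by the number of *other* particles at i.
    weights = {i: math.exp(beta * (x[i] - 1) / n)}
    for j in neighbors[i]:
        weights[j] = math.exp(beta * x[j] / n)
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

def uniform_step_distribution(i, neighbors):
    """Next-location distribution Q(i, .): uniform over staying or moving to a neighbor."""
    options = [i] + list(neighbors[i])
    return {v: 1.0 / len(options) for v in options}

def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)

def compositions(n, k):
    """All placements of n indistinguishable particles on k vertices (stars and bars)."""
    for cuts in itertools.combinations(range(n + k - 1), k - 1):
        prev, parts = -1, []
        for c in cuts + (n + k - 1,):
            parts.append(c - prev - 1)
            prev = c
        yield tuple(parts)

def worst_tv(neighbors, n, beta):
    """Maximum of TV(P_x(i, .), Q(i, .)) over configurations x and occupied vertices i."""
    k = len(neighbors)
    worst = 0.0
    for x in compositions(n, k):
        for i in range(k):
            if x[i] == 0:
                continue  # no particle can be selected at an empty vertex
            p = arw_step_distribution(x, i, neighbors, beta)
            q = uniform_step_distribution(i, neighbors)
            worst = max(worst, total_variation(p, q))
    return worst

if __name__ == "__main__":
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # hypothetical example graph
    beta = 1.0
    bound = (math.exp(beta / 2) - 1) / (math.exp(beta / 2) + 1)
    print(worst_tv(star, n=5, beta=beta), "<=", bound)
```

At $\beta=0$ the two distributions coincide, so the computed distance is zero; for $\beta>0$ the exhaustive search only covers the chosen small instance and is a sanity check rather than a proof.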

Next, consider two neighboring configurations $x$ and $y$ . Because only the position of one particle is different between the two configurations, $P_{x}(v,\cdot)\approx P_{y}(v,\cdot)$ . The following lemma makes this precise.

Lemma 8.

Let $x$ and $y$ be neighboring configurations. Recall that $\Delta$ is the maximum degree of the vertices in $\mathcal{V}$ . The following holds:

[TABLE]

With these results stated, we prove Theorem 2.

Proof of Theorem 2.

Suppose $d(i,j)\geq 1\{i\neq j\}$ is a metric on $\mathcal{G}$ such that a single-particle random walk’s kernel $Q$ satisfies

[TABLE]

for all $i\neq j$ and $d(i,j)\leq d_{max}$ . Note that the existence of such a metric $d(\cdot,\cdot)$ was established in Lemma 6 with an estimate of $\delta=\nicefrac{{1}}{{d_{max}}}$ .

Now we wish to bound $W_{\rho}^{P}(x,y)$ for all neighboring particle configurations $x$ and $y$ related by $y=x-e_{i}+e_{j}$ . We may choose any coupling in order to obtain an upper bound. The coupling will be synchronous: the choice of particle to be moved will be coordinated between the chains. Namely, if the “extra” particle is chosen in configuration $x$ , then so too will the “extra” particle be chosen in configuration $y$ . Similarly, if some other particle is chosen in $x$ , then a particle at the same vertex will be chosen in $y$ . For an illustration, see Figure 6.

Let $X_{1}\sim P(x,\cdot)$ and $Y_{1}\sim P(y,\cdot)$ denote the coupled random variables corresponding to the next configurations. Let $p^{\star}$ be the “extra” particle. Let $\tilde{p}$ be a random variable that denotes the uniformly selected particle. Since our coupling gives an upper bound, we can write

[TABLE]

First, suppose the “extra” particle, $p^{\star}$ , is chosen in both chains. This happens with probability $\frac{1}{n}$ . By Lemma 7, we can couple the distributions $P_{x}(i,\cdot)$ and $P_{y}(j,\cdot)$ to $Q(i,\cdot)$ and $Q(j,\cdot)$ respectively with probability at least $1-\nicefrac{{(e^{\nicefrac{{\beta}}{{2}}}-1)}}{{(e^{\nicefrac{{\beta}}{{2}}}+1)}}$ . In that case, we get contraction by a factor of $(1-\delta)$ . With the remaining probability, we assume the worst-case distance of $d_{\text{max}}$ . Therefore, the conditional Wasserstein distance is upper bounded as follows:

[TABLE]

Next, suppose some other particle (located at $v$ ) is chosen in both chains. This happens with probability $\frac{n-1}{n}$ . We claim

[TABLE]

Indeed, by Lemma 8, we can couple particle $\tilde{p}$ so that it moves to the same vertex in both chains with probability at least

[TABLE]

By Proposition 3, it holds that $\rho(X_{1},Y_{1})=d(i,j)=\rho(x,y)$ in the case that the particle $\tilde{p}$ moves to the same vertex in both chains. Otherwise, an additional distance of at most $2d_{\text{max}}$ is incurred.

Finally, we substitute the bounds (8) and (9) into (7).

[TABLE]

where the last inequality is due to $\rho(x,y)\geq 1$ and $\frac{n-1}{n}<1$ . In order to show contraction, it is sufficient that the expression multiplying $\frac{1}{n}$ be positive:

[TABLE]

As an example of a $\beta$ satisfying this condition, choose $\beta$ so that

[TABLE]

Therefore, we can choose

[TABLE]

Substituting $\beta=\beta_{-}$ into (10), we obtain for some $\delta^{\prime}>0$

[TABLE]

Applying the path coupling lemma (Lemma 5), we obtain

[TABLE]

Setting the right hand side to be less than $\epsilon>0$ in order to bound $t_{\text{mix}}(X,\epsilon)$ ,

[TABLE]

Since

[TABLE]

we have

[TABLE]

Therefore, $t_{\text{mix}}(X,\epsilon)=O(n\log n)$ , which completes the proof of Theorem 2. ∎
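The last step of the argument, going from a per-step contraction to a bound on the number of steps, can be illustrated with a small calculation. The sketch below assumes a contraction factor of the form $1-\nicefrac{{c}}{{n}}$ per step and a generic diameter value; the names `c`, `diameter` and `eps` are hypothetical parameters introduced for illustration and are not the constants of the proof. It uses $-\log(1-u)\geq u$ to obtain the closed-form sufficient number of steps $\lceil\frac{n}{c}\log(\text{diameter}/\epsilon)\rceil$, which is $O(n\log n)$ whenever the diameter is polynomial in $n$.

```python
import math

def steps_for_contraction(n, c, diameter, eps):
    """Return (exact, sufficient) step counts for (1 - c/n)**t * diameter <= eps.

    `exact` solves the inequality for t directly (up to floating-point rounding);
    `sufficient` is the closed-form bound obtained from -log(1 - u) >= u.
    """
    rate = 1.0 - c / n
    exact = math.ceil(math.log(diameter / eps) / -math.log(rate))
    sufficient = math.ceil((n / c) * math.log(diameter / eps))
    return exact, sufficient

if __name__ == "__main__":
    # Hypothetical numbers: n particles, contraction 1 - 0.5/n, diameter 10 * n.
    for n in (100, 1000, 10000):
        exact, sufficient = steps_for_contraction(n, c=0.5, diameter=10 * n, eps=0.25)
        print(n, exact, sufficient, sufficient / (n * math.log(n)))
```

The printed ratio in the last column stays bounded as $n$ grows, which is the $O(n\log n)$ behaviour in this illustrative setting.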

Remark 2.

Arguably, a more natural approach to show fast mixing would be through a more traditional path coupling approach: Let $\mathcal{H}$ have an edge between configurations $x$ and $y=x-e_{i}+e_{j}$ if $i$ and $j$ are adjacent vertices in $\mathcal{G}$ . Set $l(x,y)=1$ for adjacent configurations. However, this approach does not yield contraction in the Wasserstein distance, which we show at the end of this section.

We now provide the deferred proofs.

Proof of Lemma 6.

First we verify that $d(x,y)$ is a metric. It holds that $d(x,y)=d(y,x)$ , and $d(x,y)\geq 0$ with equality if and only if $x=y$ . To show the triangle inequality, start three random walks from vertices $x,y,z$ and let $\tau(x,y)$ be the meeting time of the walks started from $x$ and $y$ . The three random walks are advanced according to the independent coupling, and if a pair of walks collides, they are advanced identically starting from that time. Under this coupling, observe that

[TABLE]

and take expectations. Next we show that $W_{d}^{Q}(x,y)\leq d(x,y)-1$ for $x\neq y$ . Any coupling of $X_{1}\sim Q(x,\cdot)$ and $Y_{1}\sim Q(y,\cdot)$ gives an upper bound. Taking $X_{1}$ and $Y_{1}$ to be independent, we have

[TABLE]

and

[TABLE]

These two equations imply $W_{d}^{Q}(x,y)\leq d(x,y)-1$ . Finally, since $d(x,y)\leq d_{\text{max}}$ , we have $d(x,y)-1\leq d(x,y)\left(1-\nicefrac{{1}}{{d_{\text{max}}}}\right)$ . If $x\sim y$ , then $d(x,y)\leq d^{\prime}_{\text{max}}$ and we conclude $d(x,y)-1\leq d(x,y)\left(1-\nicefrac{{1}}{{d^{\prime}_{\text{max}}}}\right)$ . ∎
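The meeting-time metric of Lemma 6 can be computed exactly on a small graph by solving the first-step linear system $d(x,y)=1+\sum_{x^{\prime},y^{\prime}}Q(x,x^{\prime})Q(y,y^{\prime})d(x^{\prime},y^{\prime})$ for $x\neq y$ , with $d(x,x)=0$ . The sketch below is an illustration under that assumption, using exact rational arithmetic; the helper names and the three-vertex path example are hypothetical.

```python
from fractions import Fraction

def lazy_uniform_kernel(neighbors):
    """Kernel Q: uniform choice among staying or moving to any neighbor."""
    k = len(neighbors)
    Q = [[Fraction(0)] * k for _ in range(k)]
    for v in range(k):
        options = [v] + list(neighbors[v])
        for w in options:
            Q[v][w] = Fraction(1, len(options))
    return Q

def solve(A, b):
    """Gauss-Jordan elimination over the rationals for a nonsingular system A s = b."""
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [entry / piv for entry in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def expected_meeting_times(neighbors):
    """Expected meeting time d(x, y) of two independent Q-walks started at x and y."""
    k = len(neighbors)
    Q = lazy_uniform_kernel(neighbors)
    pairs = [(x, y) for x in range(k) for y in range(k) if x != y]
    index = {p: r for r, p in enumerate(pairs)}
    A = [[Fraction(0)] * len(pairs) for _ in pairs]
    b = [Fraction(1)] * len(pairs)
    for (x, y), r in index.items():
        A[r][r] += 1
        for x2 in range(k):
            for y2 in range(k):
                if x2 != y2 and Q[x][x2] and Q[y][y2]:
                    A[r][index[(x2, y2)]] -= Q[x][x2] * Q[y][y2]
    sol = solve(A, b)
    d = [[Fraction(0)] * k for _ in range(k)]
    for (x, y), r in index.items():
        d[x][y] = sol[r]
    return Q, d

def one_step_expected_distance(Q, d, x, y):
    """E[d(X1, Y1)] when X1 ~ Q(x, .) and Y1 ~ Q(y, .) are independent."""
    k = len(Q)
    return sum(Q[x][a] * Q[y][b] * d[a][b] for a in range(k) for b in range(k))

if __name__ == "__main__":
    path = [[1], [0, 2], [1]]  # hypothetical example: path on three vertices
    Q, d = expected_meeting_times(path)
    d_max = max(max(row) for row in d)
    for x in range(3):
        for y in range(3):
            if x != y:
                print(x, y, d[x][y], one_step_expected_distance(Q, d, x, y))
    print("d_max =", d_max)
```

For every pair $x\neq y$ the one-step expectation under the independent coupling equals $d(x,y)-1$ , which is at most $d(x,y)\left(1-\nicefrac{{1}}{{d_{\text{max}}}}\right)$ .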

Proof of Proposition 3.

Consider any path from $x$ to $y$ : $(x=x_{0},x_{1},\dots,x_{m-1},x_{m}=y)$ , where $x_{r+1}=x_{r}-e_{i_{r}}+e_{j_{r}}$ for $r\in\{0,1,\dots,m-1\}$ . Then we have

[TABLE]

We claim that we can rearrange this summation to be of the form

[TABLE]

for some sequence $l_{1},\dots,l_{m-1}$ . Indeed, let $\mathcal{I}=\{i_{r}:0\leq r\leq m-1\}$ and $\mathcal{J}=\{j_{r}:0\leq r\leq m-1\}$ be the multisets that collect the “outbound” and “inbound” particle transfers, respectively. The value $i$ must appear one more time in $\mathcal{I}$ than in $\mathcal{J}$ . Similarly, the value $j$ must appear one more time in $\mathcal{J}$ than in $\mathcal{I}$ . All other values appear an equal number of times in $\mathcal{I}$ and $\mathcal{J}$ . By choosing terms $d(i_{r},j_{r})$ in order, beginning with $d(i,l_{1})$ , it is possible to rearrange the sum into the given form. By the triangle inequality for $d(\cdot,\cdot)$ ,

[TABLE]

Therefore, the shortest distance between $x$ and $y$ is along the edge connecting them, and we conclude that $\rho(x,y)=l(x,y)$ for neighboring configurations. ∎

To prove Lemma 7, we state the following proposition.

Proposition 4.

The set of distributions $\{P_{x}(i,\cdot):x\in\Omega\}$ parametrized by the configuration $x$ is contained within the convex set

[TABLE]

Proof.

To show this claim, we compute the ratio $\frac{P_{x}(i,j_{1})}{P_{x}(i,j_{2})}$ when $j_{1},j_{2}\in\mathcal{N}(i)\cup\{i\}$ and $j_{1}\neq j_{2}$ , and show that it is upper bounded by $e^{\beta}$ . There are three cases to consider.

1. The case $j_{1}=i$ .

[TABLE]

Since $x(j_{1})-x(j_{2})-1\leq n-1<n$ , it holds that $\frac{P_{x}(i,j_{1})}{P_{x}(i,j_{2})}<e^{\beta}$ .

2. The case $j_{2}=i$ .

[TABLE]

Since $j_{2}=i$ , we have $x(j_{2})\geq 1$ . Therefore, again $\frac{P_{x}(i,j_{1})}{P_{x}(i,j_{2})}<e^{\beta}$ .

3. The case $j_{1},j_{2}\neq i$ .

[TABLE]

Proof of Lemma 7.

Recall that $\mathcal{N}(i)$ is the neighbor set of vertex $i$ in graph $\mathcal{G}$ . Let $d=\left|\mathcal{N}(i)\right|$ . We have

[TABLE]

and

[TABLE]

Using Proposition 4,

[TABLE]

The inequality is due to the fact that $\{P_{x}(i,\cdot):x\in\Omega\}\subset P_{\beta}$ and the equality is due to the fact that the maximum of a convex function over a closed and bounded convex set is achieved at an extreme point, namely $\left(\frac{e^{\beta}}{d+e^{\beta}},\frac{1}{d+e^{\beta}},\dots,\frac{1}{d+e^{\beta}}\right)$ . To maximize the right hand side of (11), let $f(d)=\frac{e^{\beta}}{d+e^{\beta}}-\frac{1}{d+1}$ . Then

[TABLE]

Setting $f^{\prime}(d)=0$ we obtain the solutions $d=\pm e^{\nicefrac{{\beta}}{{2}}}$ . The solution $d=e^{\nicefrac{{\beta}}{{2}}}$ is the maximizer. Substituting $d=e^{\nicefrac{{\beta}}{{2}}}$ into (11),

[TABLE]

which completes the proof. ∎
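For reference, the optimization over $d$ in the proof above can be written out as follows. This is a worked sketch for $\beta>0$ , treating $d$ as a real variable as the proof does; it uses only the function $f$ defined there.

```latex
\begin{align*}
f(d) &= \frac{e^{\beta}}{d+e^{\beta}}-\frac{1}{d+1},\\
f'(d) &= -\frac{e^{\beta}}{(d+e^{\beta})^{2}}+\frac{1}{(d+1)^{2}},\\
f'(d)=0 &\iff (d+e^{\beta})^{2}=e^{\beta}(d+1)^{2}
        \iff d+e^{\beta}=\pm e^{\beta/2}(d+1)
        \iff d=\pm e^{\beta/2},\\
f\left(e^{\beta/2}\right) &= \frac{e^{\beta}}{e^{\beta/2}+e^{\beta}}-\frac{1}{e^{\beta/2}+1}
        =\frac{e^{\beta/2}}{e^{\beta/2}+1}-\frac{1}{e^{\beta/2}+1}
        =\frac{e^{\beta/2}-1}{e^{\beta/2}+1}.
\end{align*}
```

The negative root is not admissible since $d$ counts neighbors, which leaves $d=e^{\nicefrac{{\beta}}{{2}}}$ as the maximizer.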

Proof of Lemma 8.

First,

[TABLE]

We will show that each term is upper bounded by $\frac{2\beta}{n}$ . Since there are at most $\Delta+1$ terms, the bound follows.

We compute $\max_{x,y:x\sim y}\left|P_{x}(v,w)-P_{y}(v,w)\right|$ for $w\in\mathcal{N}(v)\cup\{v\}$ . Since $x$ and $y$ are interchangeable, we can drop the absolute value.

[TABLE]

First consider the case that $v\neq w$ . Then

[TABLE]

Let

[TABLE]

Note that $A(y)\leq e^{\nicefrac{{\beta}}{{n}}}A(x)$ for $x\sim y$ . We have

[TABLE]

Next, we consider the case $v=w$ . We have

[TABLE]

We now show that the approach for proving Theorem 2 based on the natural one-step path coupling does not yield the required contraction.

Theorem 4.

Let $\mathcal{H}$ have an edge between configurations $x$ and $y=x-e_{i}+e_{j}$ whenever $i$ and $j$ are adjacent vertices in $\mathcal{G}$ . Let $l(x,y)=1$ for adjacent configurations. There exists a graph $\mathcal{G}$ such that for $\beta=0$ ,

[TABLE]

for some adjacent configurations $x,y$ .

Proof.

Let $\mathcal{G}$ be the 4-vertex path graph. Label the vertices $1,2,3,4$ in order along the path, and consider $x$ and $y$ related by $y=x-e_{2}+e_{3}$ so that the two configurations differ by a transfer from one middle vertex to the other. When $\beta=0$ , the transition probabilities are simple: given that a particle is chosen at vertex $v$ , it moves to vertex $w\in\mathcal{N}(v)\cup\{v\}$ with probability $\frac{1}{\deg(v)+1}$ . The optimal coupling of $P(x,\cdot)$ and $P(y,\cdot)$ may be expressed as an optimal solution of a linear program, as follows. Write $x^{\prime}\sim x$ if $x^{\prime}$ is adjacent to $x$ in $\mathcal{H}$ or $x^{\prime}=x$ . For each $x^{\prime}\sim x$ and $y^{\prime}\sim y$ , let $z(x^{\prime},y^{\prime})$ be the probability of the next states being $x^{\prime}$ and $y^{\prime}$ in a coupling. The constraints require the collection of $z$ variables to be a valid coupling, and the objective function calculates the expected distance under the coupling.

[TABLE]

This linear program is known as a Kantorovich problem. Our goal is to show that the optimal objective value is at least $1$ . We will first write down the dual problem. By weak duality, any feasible solution to the dual problem gives a lower bound on the optimal value of the primal problem. Next we will construct a primal solution with objective value equal to $1$ , and apply the complementary slackness condition to help us construct a dual solution whose objective value is also equal to $1$ . Finally we will conclude that the optimal value of the primal problem is equal to $1$ , by strong duality. For a reference to linear programming duality, see e.g. Chapter 4 of [1].

First we take the dual of the linear program, introducing dual variables $u(x^{\prime})$ for $x^{\prime}\sim x$ and $v(y^{\prime})$ for $y^{\prime}\sim y$ :

[TABLE]

This linear program is a Kantorovich dual problem. By weak duality, if there exists a dual feasible solution with objective value $Z$ , then the optimal value of the primal is at least $Z$ . Therefore our goal is to find a dual feasible solution with objective value at least $1$ .

For $x^{\prime}=x-e_{a}+e_{b}$ with $a,b\in\{1,2,3,4\}$ , $P(x,x^{\prime})=\frac{x(a)}{n\left(\deg(a)+1\right)}$ . Similarly, for $y^{\prime}=y-e_{a}+e_{b}$ , $P(y,y^{\prime})=\frac{y(a)}{n\left(\deg(a)+1\right)}$ . The value of $\rho$ is given by

[TABLE]

There exists a primal solution with objective value $1$ : Set

[TABLE]

and

[TABLE]

Other values of $z(x^{\prime},y^{\prime})$ are set to zero. In other words, $z$ describes a synchronous coupling according to the pairing in Figure 6, with particles moving in the same direction always. Now supposing this is an optimal solution, we apply complementary slackness to identify candidate dual optimal solutions. The complementary slackness condition states that if $z$ and $(u,v)$ are optimal primal and dual solutions, then it holds that for all $x^{\prime}\sim x,y^{\prime}\sim y$ ,

[TABLE]

If our primal solution $z$ is optimal, then whenever $z(x^{\prime},y^{\prime})\neq 0$ , we need $u(x^{\prime})+v(y^{\prime})=\rho(x^{\prime},y^{\prime})$ . These additional constraints help us construct the following dual feasible solution:

[TABLE]

We find that the objective value of this solution is equal to $1$ . By strong duality, we conclude that the optimal value of the primal problem is equal to $1$ , and therefore there does not exist a contractive coupling. ∎
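The primal side of the argument, namely that the synchronous coupling with particles always moving in the same direction has expected distance exactly $1$ , can be checked by direct enumeration. The sketch below is an illustration, not the authors' computation. It assumes $\beta=0$ , uses 0-based vertex labels $0,1,2,3$ for the path, and evaluates $\rho$ as the minimum number of single-particle moves along edges of the path, computed through running sums of $x-y$ (the standard one-dimensional transport formula). It does not verify the dual solution.

```python
from fractions import Fraction

NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path on four vertices, 0-based

def rho(x, y):
    """Minimum number of unit particle moves along path edges turning x into y."""
    total, running = 0, 0
    for a, b in zip(x, y):
        running += a - b
        total += abs(running)  # net flow that must cross the edge to the right
    return total

def move(cfg, src, dst):
    """Configuration cfg - e_src + e_dst (unchanged when src == dst)."""
    out = list(cfg)
    out[src] -= 1
    out[dst] += 1
    return tuple(out)

def expected_distance_synchronous(x):
    """Expected rho after one coupled step from x and y = x - e_1 + e_2 (0-based)."""
    n = sum(x)
    y = move(x, 1, 2)
    total = Fraction(0)
    # Particles shared by both configurations: paired by vertex, moved identically.
    shared = list(x)
    shared[1] -= 1  # remove the "extra" particle from the count at vertex 1
    for v, count in enumerate(shared):
        options = [v] + NEIGHBORS[v]
        for w in options:
            prob = Fraction(count, n) * Fraction(1, len(options))
            total += prob * rho(move(x, v, w), move(y, v, w))
    # The "extra" particle sits at vertex 1 in x and vertex 2 in y; both have degree 2,
    # so left / stay / right each have probability 1/3 and can be matched by direction.
    for shift in (-1, 0, 1):
        prob = Fraction(1, n) * Fraction(1, 3)
        total += prob * rho(move(x, 1, 1 + shift), move(y, 2, 2 + shift))
    return total

if __name__ == "__main__":
    x = (1, 2, 1, 1)  # hypothetical starting configuration with a particle at vertex 1
    print(expected_distance_synchronous(x))
```

For any configuration $x$ with at least one particle at the second vertex of the path, the returned value is $1$ , matching the objective value of the primal solution described above.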

Remark 3.

The argument in the proof of Theorem 4 should apply to all graphs $\mathcal{G}$ that contain a four-vertex path graph as a subgraph, and possibly to other graphs as well.

3.3 Bounding the Cheeger constant

The following results will be useful in proving Corollary 1.

Lemma 9 ([7]).

Let $\Phi_{*}$ be the Cheeger constant of an aperiodic and irreducible Markov chain, and let $t_{\text{mix}}$ be its mixing time. Then

[TABLE]

The following result follows directly from Equation 2.28 in [4].

Lemma 10.

Let $P$ be the transition probability matrix of an aperiodic and irreducible Markov chain on state space $\mathcal{X}$ , satisfying $P(x,x)\geq\frac{1}{2}$ for all $x\in\mathcal{X}$ . Let $\pi$ denote the stationary distribution, and let $\pi_{\text{min}}=\min_{x\in\mathcal{X}}\pi(x)$ . Let $\Phi_{*}$ be the Cheeger constant of this chain, and let $t_{\text{mix}}$ be its mixing time. Then

[TABLE]

Proof of Corollary 1.

We first prove the lower bound. Let $0\leq\beta<\beta_{-}$ . By Lemma 9 and Theorem 2, we have

[TABLE]

We next prove the upper bound. Let $\overline{P}=\frac{1}{2}(P+I)$ denote the lazy version of the ARW chain. Note that $\pi$ is also the stationary distribution of the lazy chain. Let $\overline{\Phi}$ denote its Cheeger constant. Observe that for all $S\subset\Omega$ , it holds that $\Phi(S)=2\overline{\Phi}(S)$ . Therefore,

[TABLE]

and so it suffices to show $\overline{\Phi}_{*}=e^{-\Omega(n)}$ .
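The identity $\Phi(S)=2\overline{\Phi}(S)$ can be verified in one line. The derivation below is a sketch that assumes the standard bottleneck-ratio definition $\Phi(S)=\frac{1}{\pi(S)}\sum_{x\in S,\,y\notin S}\pi(x)P(x,y)$ ; it uses only the facts that $\overline{P}(x,y)=\frac{1}{2}P(x,y)$ for $x\neq y$ and that $\pi$ is stationary for both chains.

```latex
\begin{align*}
\overline{\Phi}(S)
  &= \frac{1}{\pi(S)}\sum_{x\in S,\;y\notin S}\pi(x)\,\overline{P}(x,y)
   = \frac{1}{\pi(S)}\sum_{x\in S,\;y\notin S}\pi(x)\cdot\tfrac{1}{2}P(x,y)
   = \tfrac{1}{2}\,\Phi(S),\\
\overline{\Phi}_{*}
  &= \min_{S:\,\pi(S)\leq\frac{1}{2}}\overline{\Phi}(S)
   = \tfrac{1}{2}\min_{S:\,\pi(S)\leq\frac{1}{2}}\Phi(S)
   = \tfrac{1}{2}\,\Phi_{*}.
\end{align*}
```

The laziness only rescales the off-diagonal transition probabilities, so the minimizing set $S$ is the same for both chains.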

Fix $\beta>\beta^{+}$ . We claim that the mixing time of the lazy chain is $e^{\Omega(n)}$ . To show this, we modify the proof of Theorem 1 as follows. We define $\overline{Z}$ to be the lazy version of the $Z$ chain. That way, we can couple the $\overline{Z}$ chain to the chain advancing according to $\overline{P}$ . Observe that the lower-bounding property still holds. Furthermore, the hitting time of the $\overline{Z}$ chain to the set $\{x\in\tilde{\Omega}:x(0)\leq(1-\delta)n\}$ is greater than the hitting time for the $Z$ chain. Finally, since the stationary distributions of the $\overline{Z}$ and $Z$ chains coincide, the rest of the proof follows identically.

We now apply Lemma 10 to the lazy chain. We need a lower bound on $\pi_{\text{min}}$ . Observe that if $t\in\mathbb{N}$ and $p\in[0,1]$ are such that $P^{t}(x,y)\geq p$ for all $x,y\in\Omega$ , then $\pi_{\text{min}}\geq p$ . Set $t=n\cdot\text{diam}(\mathcal{G})$ . There exists at least one sequence of possible transitions in $t$ steps to get from $x$ to $y$ . Each step in the sequence has probability at least $\frac{1}{n(\Delta+e^{\beta})}$ . Therefore,

[TABLE]

By Lemma 10, we have

[TABLE]

Since $\log(x)\geq 1-\frac{1}{x}$ for $x>0$ , we have

[TABLE]

Substituting the lower bound for $\pi_{\text{min}}$ , we obtain

[TABLE]

which implies $\overline{\Phi}_{*}=e^{-\Omega(n)}$ . ∎

4 Repelling Random Walks

Throughout our analysis, we have only considered $\beta\geq 0$ . However, the case $\beta<0$ (“Repelling Random Walks”) is theoretically and practically interesting to study also. Simulations confirm the intuition that the particles behave like independent random walks when $\beta$ is close to zero, and spread evenly when $\beta$ is very negative (see Figure 7). We conjecture that there are not any hard-to-escape subsets of the state space for all $\beta<0$ .

Conjecture 1.

For all $\beta<0$ and any graph, the mixing time of the ARW model is polynomial in $n$ .

We consider two cases: the extreme case of $\beta=-\infty$ , and the case where $\mathcal{G}$ is the complete graph, for certain values of $\beta$ .

4.1 The Case $\beta=-\infty$

Theorem 5.

When $\beta=-\infty$ , the mixing time of the Attracting Random Walks model is $O(n^{2})$ .

Proof.

When $\beta=-\infty$ , the dynamics are simplified. Suppose a particle is chosen at vertex $i$ . Let $A$ be the set of vertices corresponding to the minimal value(s) of $\{x(i)-1\}\cup\{x(j):j\sim i\}$ . The chosen particle moves to a vertex among those in $A$ , uniformly at random.

Our goal is to show that the set

[TABLE]

satisfies the following three properties: (1) It is absorbing, meaning that once the chain enters $C$ , it cannot escape $C$ ; (2) The chain enters $C$ in polynomial time; (3) Within $C$ , the chain mixes in constant time with respect to $n$ .

We claim that the maximum particle occupancy cannot increase, and the minimum particle occupancy cannot decrease. We now show that the maximum particle occupancy, $M_{t}\triangleq\max_{v}X_{t}(v)$ , is monotonically non-increasing over time. Suppose that at time $t$ , a particle at vertex $i$ is selected and moves to vertex $j$ . There are five cases:

1. $i=j$ . The maximum does not change.

2. $i\neq j$ , and both are maximizers. This case is not possible, since $x(j)>x(i)-1$ .

3. $i\neq j$ , $i$ is a maximizer, and $j$ is not. The new maximum value is at most $M_{t}$ , in the case that $X_{t}(j)=X_{t}(i)-1$ .

4. $i\neq j$ , $i$ is not a maximizer, and $j$ is. This case is not possible, since $x(j)>x(i)-1$ .

5. $i\neq j$ , $i$ and $j$ are not maximizers. The new maximum value is at most $M_{t}$ , in the case that $X_{t}(j)=X_{t}(i)-1$ .

Therefore $M_{t+1}\leq M_{t}$ . A similar argument shows that the minimum particle occupancy is monotonically non-decreasing over time. Together, they imply Property (1).
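The monotonicity of the maximum and minimum occupancies can be observed in a short simulation of the $\beta=-\infty$ dynamics described at the start of the proof. The sketch below is illustrative only: the cycle graph, the starting configuration and the function names are assumptions, and membership in $C$ is tested as "maximum minus minimum occupancy at most $1$", which is inferred from the description of $C$ under Property (3) rather than taken from its displayed definition.

```python
import random

def repelling_step(x, neighbors, rng):
    """One step of the beta = -infinity dynamics; mutates and returns the list x."""
    # A uniformly random particle is chosen, so vertex i is selected w.p. x[i] / n.
    i = rng.choices(range(len(x)), weights=x)[0]
    loads = {i: x[i] - 1}  # staying is scored by the other particles at i
    for j in neighbors[i]:
        loads[j] = x[j]
    lowest = min(loads.values())
    target = rng.choice([v for v, load in loads.items() if load == lowest])
    x[i] -= 1
    x[target] += 1
    return x

def in_C(x):
    """Assumed membership test for C: occupancies differ by at most one."""
    return max(x) - min(x) <= 1

def simulate(x0, neighbors, steps, seed=0):
    """Run the chain and record (max occupancy, min occupancy) at every time."""
    rng = random.Random(seed)
    x = list(x0)
    history = [(max(x), min(x))]
    for _ in range(steps):
        repelling_step(x, neighbors, rng)
        history.append((max(x), min(x)))
    return x, history

if __name__ == "__main__":
    cycle = [[(v - 1) % 5, (v + 1) % 5] for v in range(5)]  # hypothetical 5-cycle
    final, history = simulate([12, 0, 0, 0, 0], cycle, steps=2000)
    print(final, history[0], history[-1], in_C(final))
```

Along any run, the recorded maxima never increase and the recorded minima never decrease, so once the test `in_C` becomes true it stays true.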

Next, we show Property (2). Assume $X_{t}\notin C$ . Let $\mathcal{M}_{t}$ be the set of maximizing vertices at time $t$ . We claim there exists at least one vertex $u\in\mathcal{M}_{t}$ such that there exists a path of distinct vertices $u=i_{1}\sim i_{2}\sim\dots\sim i_{p}$ satisfying $x_{i_{2}}=x_{i_{3}}=\dots=x_{i_{p-1}}=M_{t}-1$ and $x_{i_{p}}\leq M_{t}-2$ (allowing $p=2$ ). In other words, there is a walkable path from $u=i_{1}$ to $i_{p}$ . The maximum length of the path is $k-1$ . The probability that a particle is transferred along this path before any other events happen is therefore lower bounded by

[TABLE]

Therefore the probability that such a transfer happens within $T_{1}$ trials is at least

[TABLE]

If there had been at least two maximizing vertices to start, the number of maximizing vertices would have fallen by $1$ . If there had been only one maximizing vertex to start, the maximum value itself would have fallen by $1$ .

We see that there are two types of “good” events: reducing the number of maximizing vertices while the maximum value stays the same, or reducing the maximum value. We claim that the number of “good” events that happen before the chain enters the set $C$ is upper bounded by $n^{2}$ . Indeed, imagine that the particles at each vertex are stacked vertically. A particle movement from vertex $i$ to vertex $j$ is interpreted as a particle moving from the top of the stack at vertex $i$ to the top of the stack at vertex $j$ . Observe that the height of a particle cannot increase. Further, each particle’s height can fall by at most $n-1$ units over time, and can therefore drop at most $n-1$ times. Since all good events require a particle’s height to drop, the number of good events is at most $n(n-1)<n^{2}$ . Let $T_{2}=\lceil 2n^{2}\frac{1}{p}\rceil$ be the number of trials of length $T_{1}$ each. Let $N$ be the number of successes during the $T_{2}$ trials. By the Hoeffding inequality,

[TABLE]

Since $\mathbb{E}[N]=p\lceil 2n^{2}\frac{1}{p}\rceil\geq 2n^{2}$ ,

[TABLE]

Therefore the probability that the chain is in $C$ after $T_{1}\times(k-1)\times T_{2}$ steps is at least $1-2\exp(-\frac{1}{2}pn^{2})$ . For example, we can set $T_{1}=1$ . Then

[TABLE]

Therefore, within $O(n^{2})$ steps, the chain is in $C$ with high probability.

Finally, we show Property (3). Once the chain is in $C$ , there are two types of vertices: those that have $\left\lfloor\frac{n}{k}\right\rfloor$ particles, and those that have $\left\lfloor\frac{n}{k}\right\rfloor+1$ particles. Note that there are always $\tilde{k}\triangleq n-k\lfloor\frac{n}{k}\rfloor$ vertices with the higher number of particles. Therefore it is equivalent to study an exclusion process with just $\tilde{k}$ particles on the graph $\mathcal{G}$ . With probability $\left\lfloor\frac{n}{k}\right\rfloor\cdot\frac{k-\tilde{k}}{n}$ , an unoccupied vertex is selected, and the chain stays in place. With the remaining probability, an occupied vertex is chosen uniformly at random. Its particle then moves to a neighboring empty vertex or stays where it is, uniformly at random. Equivalently, the chain is lazy with probability $\left\lfloor\frac{n}{k}\right\rfloor\cdot\frac{k-\tilde{k}}{n}$ , and otherwise one of the $\tilde{k}$ particles is chosen, and either stays or moves to a neighbor. Since the number of particles $\tilde{k}$ can be upper and lower bounded by constants ( $0\leq\tilde{k}\leq k$ ), the mixing time within $C$ is independent of $n$ . Therefore, we conclude that the overall mixing time is $O(n^{2})$ . ∎

4.2 The Complete Graph Case

Note that the complete graph case for $\beta<0$ is equivalent to the vector of proportions chain in the antiferromagnetic Curie–Weiss Potts model.

Theorem 6.

On the complete graph with $k$ vertices, the mixing time is $O\left(n\log n\right)$ for all $\beta$ satisfying $-\frac{k}{10}<\beta\leq 0$ .

The proof relies on the following two lemmas.

Lemma 11.

Let $\left(X_{t},t\geq 0\right)$ be the ARW chain for any $\beta<0$ and let $\left(Y_{t},t\geq 0\right)$ be a chain of independent particles ( $\beta=0$ ). Set $X_{0}=Y_{0}$ . For every vertex $v$ and time $t$ ,

[TABLE]

For $\lambda\geq 0$ , let $C(\lambda)\triangleq\left\{x:\left|x(v)-\frac{n}{k}\right|\leq\lambda n\right\}$ .

Lemma 12.

On the complete graph, if $y=x-e_{i}+e_{j}$ and $x,y\in C(\lambda)$ , then

[TABLE]

for

[TABLE]

The proof of Lemma 11 appears later in this section, and the proof of Lemma 12 is deferred to the appendix due to its technical nature.

Proof of Theorem 6.

We may assume that $n$ is large enough so that

[TABLE]

Let $\{Y(v),v\in\mathcal{V}\}$ be a random variable distributed according to the stationary distribution of the $\{Y_{t}(v),v\in\mathcal{V},t\geq 0\}$ chain. At stationarity, the vertex occupancies are strongly concentrated around their means. By the Hoeffding Inequality, for every $\lambda>0$ ,

[TABLE]

for every vertex $v$ .

Fix $\epsilon>0$ . We wish to upper bound $t_{\text{mix}}(X,\epsilon)$ . Note that the mixing time of the $Y$ chain is $O(n\log n)$ . To see this, consider a synchronous coupling. The expected amount of time to select all the particles is $O(n\log n)$ , and whenever a particle is selected, it moves to a uniformly random location, which is coupled. Now, for all $\epsilon^{\prime}$ , $T_{1}\triangleq t_{\text{mix}}\left(Y,\epsilon^{\prime}\right)=O(n\log n)$ . Therefore at time $T_{1}$ , for every $\lambda>0$ ,

[TABLE]

for every vertex $v$ . By Lemma 11, it also holds that for every $\lambda>0$ ,

[TABLE]

for every vertex $v$ . Recall that $C(\lambda)=\left\{x:\left|x(v)-\frac{n}{k}\right|\leq\lambda n\right\}$ . Then by the Union Bound,

[TABLE]

for every $\lambda$ and $v$ . We observe that for $n$ large enough, there is always an $\epsilon^{\prime}$ small enough so that

[TABLE]

Then with probability at least $1-\nicefrac{{\epsilon}}{{2}}$ , $X_{T_{1}}$ belongs to $C(\lambda)$ .

Next, we establish that for every $\beta<0$ , there exists $\lambda_{\beta}$ such that (1) once the chain enters $C(\lambda_{\beta})$ , it takes exponential time to leave $C(2\lambda_{\beta})$ , with high probability; (2) we can apply path coupling within $C(2\lambda_{\beta})$ . The first claim is due to comparison with the $\beta=0$ chain, as established above.

We now demonstrate the required contraction for path coupling within $C(2\lambda)$ . Recall that we need to define the edges of the graph $\mathcal{H}=\left(\Omega,\mathcal{E}_{\mathcal{H}}\right)$ and choose a length function on the edges. Let $(x,y)\in\mathcal{E}_{\mathcal{H}}$ if $y=x-e_{i}+e_{j}$ for some $i\neq j$ , and let $l(x,y)=1$ . Consider any pair of neighboring configurations $x$ and $y$ . We employ a synchronous coupling, as in Figure 6. Namely, the “extra” particle at vertex $i$ in configuration $x$ is paired to the “extra” particle at vertex $j$ in configuration $y$ . All other particles are paired by vertex location. When a particle is selected to be moved in the $x$ configuration, the particle that it is paired to in the $y$ configuration is also selected to be moved.

With probability $\frac{n-1}{n}$ , one of the $(n-1)$ pairs that has the same vertex location is chosen. Suppose it is located at vertex $v$ . We couple the transitions in the two chains according to the coupling achieving the total variation distance $\left\|P_{x}(v,\cdot)-P_{y}(v,\cdot)\right\|_{\text{TV}}$ .

By Lemma 12, when one of the $(n-1)$ particles paired by vertex location is chosen, we can couple them so that they move to the same vertex with probability at least

[TABLE]

With the remaining probability, the distance increases by at most $2$ .

With the remaining $\frac{1}{n}$ probability, the “extra” particle is chosen in both chains. The chains can then equalize with probability $1$ because $P_{x}(i,\cdot)=P_{y}(j,\cdot)$ on the complete graph. Therefore, we can bound the Wasserstein distance as follows:

[TABLE]

Therefore, in order to achieve contraction, it suffices that

[TABLE]

Fix $0<\delta<1$ , and let $\lambda_{\beta}=\frac{1}{4\beta}\log(1-\delta)>0$ . Then substituting $\lambda=\lambda_{\beta}$ , we obtain the condition

[TABLE]

When $\beta\leq 0$ is such that $-10\beta<k$ , there exists $\delta>0$ small enough so that the condition (13) holds. We conclude that contraction holds for $-\nicefrac{{k}}{{10}}<\beta\leq 0$ .

To summarize the argument, we have shown that in time $O(n\log n)$ , the chain enters $C(\lambda_{\beta})$ . After that, the chain leaves the larger set, $C(2\lambda_{\beta})$ , with exponentially small probability, which can be disregarded. Within $C(2\lambda_{\beta})$ , the Wasserstein distance with respect to the chosen $\mathcal{H}$ and $\rho$ contracts by a factor of $\left(1-\Theta\left(\frac{1}{n}\right)\right)$ , so an additional $O\left(n\log n\right)$ steps are sufficient. Therefore, the overall mixing time is $O\left(n\log n\right)$ . ∎
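The behavior described in this proof can be observed in a small simulation. The sketch below implements one step of the chain on the complete graph using the transition rule from the model definition: a uniformly random particle is selected, then it moves to vertex $j$ with probability proportional to $\exp\left(\frac{\beta}{n}\cdot\#\{\text{other particles at } j\}\right)$. The function names and the choices $n=400$, $k=4$, $\beta=-0.3$ and the number of steps are assumptions made for illustration. This is not code from the paper.

```python
import math
import random

def move_distribution(x, i, beta):
    """Destination distribution for a particle at vertex i on the complete graph.

    The moving particle does not count itself, hence x[i] - 1 at its own vertex.
    """
    n = sum(x)
    weights = [math.exp(beta / n * (x[j] - (1 if j == i else 0))) for j in range(len(x))]
    z = sum(weights)
    return [w / z for w in weights]

def step(x, beta, rng):
    """One step: pick a uniformly random particle, then move it (it may stay)."""
    k = len(x)
    i = rng.choices(range(k), weights=x)[0]  # vertex holding a uniform particle
    j = rng.choices(range(k), weights=move_distribution(x, i, beta))[0]
    y = list(x)
    y[i] -= 1
    y[j] += 1
    return y

def max_deviation(x):
    """Largest |x(v) - n/k| / n over vertices; x is in C(lam) iff this is <= lam."""
    n, k = sum(x), len(x)
    return max(abs(c - n / k) for c in x) / n

rng = random.Random(0)
x = [400, 0, 0, 0]                # all particles start at one vertex
for _ in range(8000):             # roughly 20 selections per particle
    x = step(x, beta=-0.3, rng=rng)
print(x, max_deviation(x))
```

Starting from a fully concentrated configuration, the occupancies should end up near $n/k$ at every vertex, consistent with the chain entering a set of the form $C(\lambda)$ quickly.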

Proof of Lemma 11.

We claim that there exists a coupling of $\{X_{t},Y_{t}\}$ such that for all $v$ and $t$ , $\left|X_{t}(v)-\frac{n}{k}\right|\leq\left|Y_{t}(v)-\frac{n}{k}\right|$ . Let $\tilde{X}_{t}(v)=\left|X_{t}(v)-\frac{n}{k}\right|$ and define $\tilde{Y}_{t}(v)$ similarly. We claim that for all configurations $x$ and vertices $v$ , if $x(v)\neq\frac{n}{k}$ , then

[TABLE]

and

[TABLE]

If $x(v)=\frac{n}{k}$ , then

[TABLE]

and

[TABLE]

In other words, the inequalities (14)-(17) state that the $X$ chain is less likely to move in the absolute value–increasing direction, and more likely to move in the absolute value–decreasing direction. These inequalities, along with the fact that $X_{0}=Y_{0}$ , suffice to prove the lemma.

The transitions for the $Y_{t}(v)$ process are $+1$ with probability $\left(1-\frac{Y_{t}(v)}{n}\right)\frac{1}{k}$ , and $-1$ with probability $\frac{Y_{t}(v)}{n}\frac{k-1}{k}$ . With the remaining probability, $Y_{t+1}(v)=Y_{t}(v)$ . Suppose first that $x(v)\neq\frac{n}{k}$ . There are two cases to analyze:

1. $X_{t}(v)<\frac{n}{k}$ . The probability that $X_{t+1}(v)=X_{t}(v)-1$ is upper bounded by $\frac{X_{t}(v)}{n}\frac{k-1}{k}$ , because vertex $v$ is a more likely than average destination. In other words, it is harder to lose a particle from vertex $v$ that has fewer than the average number of particles when $\beta<0$ , compared to when $\beta=0$ . Formally,

[TABLE]

For the same reason, the probability that $X_{t+1}(v)=X_{t}(v)+1$ is lower bounded by

[TABLE]

Therefore, inequalities (14) and (15) hold in this case.

2. $X_{t}(v)>\frac{n}{k}$ . This time, $v$ is a less likely than average destination. The probability that $X_{t+1}(v)=X_{t}(v)-1$ is lower bounded by

[TABLE]

The probability that $X_{t+1}(v)=X_{t}(v)+1$ is upper bounded by

[TABLE]

Therefore, inequalities (14) and (15) hold in this case also.

Finally, suppose $x(v)=\frac{n}{k}$ . Then the probability of losing a particle is upper bounded by $\frac{1}{k}\frac{k-1}{k}$ , and the probability of gaining a particle is upper bounded by $\frac{k-1}{k}\frac{1}{k}$ . Therefore, inequalities (16) and (17) hold.

We conclude that such a coupling exists, and therefore the stochastic dominance holds. ∎
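For intuition about why the $\beta=0$ comparison chain concentrates around $\frac{n}{k}$, the one-step drift of $Y_{t}(v)$ can be computed directly from the transition probabilities stated in the proof. This is a short worked computation added for illustration, not a step of the original argument:

$$\mathbb{E}\left[Y_{t+1}(v)-Y_{t}(v)\mid Y_{t}(v)=y\right]=\left(1-\frac{y}{n}\right)\frac{1}{k}-\frac{y}{n}\cdot\frac{k-1}{k}=\frac{1}{k}-\frac{y}{n}=\frac{1}{n}\left(\frac{n}{k}-y\right).$$

The drift is positive when $y<\frac{n}{k}$ and negative when $y>\frac{n}{k}$, so the $Y$ chain is pulled toward the balanced occupancy $\frac{n}{k}$ at every vertex.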

5 Conclusion

In this paper we have introduced a new interacting particle system model. We have shown that for any fixed graph, the mixing time of the Attracting Random Walks Markov chain exhibits a phase transition. We have also partially investigated the Repelling Random Walks model, and we conjecture that this model is always fast mixing. Beyond theoretical results, it is our hope that the model will find practical use.

6 Appendix

Proof of Proposition 2.

To compute the stationary probabilities $\lambda(r),r\in\{0,1,\dots,D\}$ , note that we can disregard the initial uniform particle choice, and simply consider a Markov chain on a graph with $(D+1)$ nodes as in Figure 4 or 5.

When $D=1$ , we have $\lambda(0)=q\lambda(0)+q\lambda(1)\implies\lambda(0)=q$ .
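The implication in the $D=1$ case uses only the normalization $\lambda(0)+\lambda(1)=1$. Written out as a brief worked step:

$$\lambda(0)=q\lambda(0)+q\lambda(1)=q\left(\lambda(0)+\lambda(1)\right)=q,\qquad\lambda(1)=1-\lambda(0)=1-q.$$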

Next, consider $D=2$ . We have

[TABLE]

Since $\lambda(0)+\lambda(1)+\lambda(2)=1$ , we have

[TABLE]

and so

[TABLE]

Finally, consider the case $D\geq 3$ . We solve the equations for the stationary distribution.

[TABLE]

Using Equations (18)-(20),

[TABLE]

Since $\sum_{i=0}^{D}\lambda(i)=1$ ,

[TABLE]

Substituting into (21),

[TABLE]

Proof of Lemma 12.

Let

[TABLE]

Then we can write

[TABLE]

and

[TABLE]

To check the sign of $P_{x}(v,w)-P_{y}(v,w)$ , it is equivalent to check the sign of

[TABLE]

Next we show that for fixed $v$ , the sign of $P_{x}(v,w)-P_{y}(v,w)$ is the same for all $w\not\in\{i,j\}$ . Suppose $w\not\in\{i,j\}$ . Then $\exp\left(\frac{\beta}{n}{x(w)}\right)=\exp\left(\frac{\beta}{n}{y(w)}\right)$ , and it is equivalent to check the sign of the expression $D(x)-C(x)$ . Since this expression does not depend on $w$ , we conclude that the sign is the same for all $w\not\in\{i,j\}$ .

If $P_{x}(v,w)-P_{y}(v,w)\geq 0$ for all $w\not\in\{i,j\}$ , then

[TABLE]

Similarly, if $P_{x}(v,w)-P_{y}(v,w)<0$ for all $w\not\in\{i,j\}$ , then

[TABLE]

Therefore,

[TABLE]

Consider the ratio of denominators of $P_{x}(v,w)$ and $P_{y}(v,w)$ . We have

[TABLE]

We first bound $\left|P_{x}(v,i)-P_{y}(v,i)\right|$ . If $v\neq i$ , we obtain

[TABLE]

Similarly, if $v=i$ , we obtain

[TABLE]

We similarly bound $\left|P_{x}(v,j)-P_{y}(v,j)\right|$ . If $v\neq j$ , we obtain

[TABLE]

If $v=j$ , we obtain

[TABLE]

For any choice of $v$ ,

[TABLE]

Therefore,

[TABLE]

Recall that $x\in C(\lambda)$ . We upper bound by setting $x(i)$ and $x(j)$ to their lower bounds, and $x(u)$ to its upper bound for $u\not\in\{i,j\}$ .

[TABLE]

where in the second-last inequality we have used the fact that $1+z\leq e^{z}$ and the last inequality holds when $e^{-\nicefrac{{3\beta}}{{n}}}\leq\nicefrac{{5}}{{4}}$ . ∎
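The final condition is a mild requirement on $n$. As a worked step (added for illustration, with $\log$ denoting the natural logarithm), for $\beta<0$ taking logarithms gives

$$e^{-\nicefrac{3\beta}{n}}\leq\frac{5}{4}\iff\frac{3|\beta|}{n}\leq\log\frac{5}{4}\iff n\geq\frac{3|\beta|}{\log(5/4)}\approx 13.44\,|\beta|,$$

so for fixed $\beta$ the inequality holds for all sufficiently large $n$. For $\beta=0$ it holds trivially.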

Bibliography

[1] D. Bertsimas and J. N. Tsitsiklis, Introduction to linear optimization, Athena Scientific, Dynamic Ideas, 1997.
[2] R. Bubley and M. E. Dyer, Path coupling: A technique for proving rapid mixing in Markov chains, 38th Annual Symposium on Foundations of Computer Science (1997).
[3] P. Cuff, J. Ding, O. Louidor, E. Lubetzky, Y. Peres, and A. Sly, Glauber dynamics for the mean-field Potts model, Journal of Statistical Physics 149 (2012), 432–477.
[4] J. A. Fill, Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the Exclusion Process, The Annals of Applied Probability 1 (1991), no. 1, 62–87.
[5] Julia Gaudio, Investigations in applied probability and high-dimensional statistics, Ph.D. thesis, Massachusetts Institute of Technology, 2020.
[6] Thomas P. Hayes and Eric Vigoda, Variable length path coupling, Random Structures and Algorithms 31 (2007), no. 3, 251–272.
[7] David A. Levin and Yuval Peres, Markov chains and mixing times, 2 ed., American Mathematical Society, 2017.
