
TL;DR
This paper introduces the Attracting Random Walks model, analyzing its phase transition in mixing times on graphs, revealing a rich get richer dynamic with a critical temperature affecting convergence speed.
Contribution
The paper defines a new non-reversible Markov chain model on graphs, demonstrating a phase transition in mixing times without relying on Gibbsian stationary distributions.
Findings
Mixing time is $O(n \log n)$ at high temperature.
Mixing time is exponential in $n$ at low temperature.
The model exhibits a dynamic phase transition independent of stationary distribution decomposition.
Attracting Random Walks
Julia Gaudio (Massachusetts Institute of Technology, United States of America)
Yury Polyanskiy (Massachusetts Institute of Technology, United States of America)
Some key suggestions for the proof ideas in this paper came from David Gamarnik, Patrick Jaillet, Eyal Lubetzky, Reza Gheissari, and Yuval Peres, detailed in the acknowledgements. The authors are extremely grateful to these people for their guidance and help during the progression of this project.
Abstract
This paper introduces the Attracting Random Walks model, which describes the dynamics of a system of particles moving on the vertices of a graph. At each step, a single particle moves to an adjacent vertex (or stays at the current one) with probability proportional to the exponent of the number of other particles at a vertex. From an applied standpoint, the model captures the rich get richer phenomenon. We show that the Markov chain exhibits a phase transition in mixing time, as the parameter governing the attraction is varied. Namely, mixing time is $O(n \log n)$ when the temperature is sufficiently high and $\exp(\Omega(n))$ when the temperature is sufficiently low, where $n$ is the number of particles. When $G$ is the complete graph, the model is a projection of the Potts model, whose mixing properties and critical temperature have been known previously. However, for any other graph our model is non-reversible and does not seem to admit a simple Gibbsian description of a stationary distribution. Notably, we demonstrate existence of the dynamic phase transition without decomposing the stationary distribution into phases.
1 Introduction
In this paper, we introduce the Attracting Random Walks (ARW) model. The motivation of the model is to understand the formation of wealth disparities in an economic network. Consider a network of economic agents, each with a certain number of coins representing their wealth. At each time step, one coin is selected uniformly at random, and moves to a neighbor of its owner with a probability that depends on how wealthy the neighbors are. Those who are well-connected and initially wealthy will tend to accumulate more wealth. We refer to particles instead of coins in what follows.
This is a flexible model based on a few principles: There are a fixed number of particles moving around on a graph. Movements are asynchronous, and particles make choices about where to move based on their local environment. The model can encompass a variety of situations. Further, the model can be extended by allowing for multiple particle types, with intra- and inter-group attraction parameters, though we do not consider this extension in this paper. There are many more applications beyond the economic application. As an interacting particle system, it could be relevant for physics or chemistry applications.
This paper analyzes the Attracting Random Walks model and establishes phase transition properties. The difficulty in bounding mixing times, particularly in finding lower bounds, is due to the fact that the stationary distribution cannot be simply formulated. Additionally, the model is not reversible unless the graph is complete (Theorem 3), meaning that familiar techniques do not apply.
We establish the existence of a phase transition in mixing time as the attraction parameter, $\beta$, is varied. Slow mixing for large enough $\beta$ is established by relating the mixing time to a suitable hitting time. Fast mixing for small enough $\beta$ is proven by a path coupling approach that relates the Attracting Random Walks chain to the simple (non-interacting) random walk on the same graph (i.e. with $\beta = 0$). As a corollary of our main results, we establish properties of the Cheeger cut for the stationary distribution. We find it interesting that even though the stationary distribution is not known analytically for general graphs, we have shown that it undergoes a phase transition (i.e. develops an exponentially small Cheeger cut) by arguing indirectly via mixing times.
The rest of the paper is structured as follows. We describe the dynamics of the model in Section 2, along with some possible applications. The remainder of the paper is focused on properties of the Markov chain governing the dynamics. In Section 2.2 we discuss a link to the Potts model. Section 3 proves the existence of phase transition in mixing time for general graphs, and is the main theoretical contribution of this work. In Section 4, we collect partial results on the version of the model in which particles repel each other instead of attracting, a model we call “Repelling Random Walks.”
2 The Model
2.1 Definitions and Main Results
The model is a discrete time process on a simple graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of undirected edges. We assume throughout that $G$ is connected. We write $i \sim j$ if $(i, j) \in E$. Let $k = |V|$. Initially, $n$ indistinguishable particles are placed on the vertices of $G$ in some configuration. Let $x(i)$ be the number of particles at vertex $i$. The particle configuration is updated in two stages, according to a fixed parameter $\beta$:
1. Choose a particle uniformly at random. Let $i$ be the location of that particle.
2. Move the particle to a vertex $j \sim i$, with probability $\frac{1}{Z} \exp\left(\frac{\beta}{n} x(j)\right)$. Keep the particle at vertex $i$ with probability $\frac{1}{Z} \exp\left(\frac{\beta}{n} (x(i) - 1)\right)$, where $Z$ is the normalization constant.
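Let $P$ be the transition probability matrix of the resulting Markov chain. Let $e_i$ denote the $i$th standard basis vector in $\mathbb{R}^k$. Then for two configurations $x$ and $y$ such that $y = x - e_i + e_j$ for $j \sim i$, we have
$$P(x, y) = \frac{x(i)}{n} \cdot \frac{\exp\left(\frac{\beta}{n} x(j)\right)}{Z_i(x)},$$
with $Z_i(x) = \exp\left(\frac{\beta}{n} (x(i) - 1)\right) + \sum_{l \sim i} \exp\left(\frac{\beta}{n} x(l)\right)$, and $P(x, x) = 1 - \sum_{y \neq x} P(x, y)$.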
The probabilities are a function of the numbers of particles at each vertex, excluding the particle that is to move. This modeling choice means that the moving particle is neutral toward itself, and relates the ARW model to the Potts model, as will be explained below.
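The two-stage update can be sketched in Python. This is a minimal sketch rather than the authors' implementation: it assumes that a candidate vertex is weighted by $\exp(\frac{\beta}{n} \cdot c)$, where $c$ counts the particles at that vertex excluding the moving particle, and the names `arw_step` and `simulate` as well as the adjacency-dictionary input format are illustrative choices.

```python
import math
import random


def arw_step(x, adj, beta, rng):
    """Perform one ARW update in place on the occupancy list x.

    x[v]   : number of particles at vertex v
    adj[v] : list of neighbours of v in a simple connected graph
    beta   : attraction parameter (the beta / n scaling is an assumption)
    """
    n = sum(x)
    # Stage 1: a uniformly chosen particle sits at vertex i with probability x[i] / n.
    i = rng.choices(range(len(x)), weights=x)[0]
    # Stage 2: weigh each candidate by the number of *other* particles located there.
    candidates = [i] + list(adj[i])
    weights = [
        math.exp(beta / n * (x[j] - (1 if j == i else 0))) for j in candidates
    ]
    j = rng.choices(candidates, weights=weights)[0]
    x[i] -= 1
    x[j] += 1
    return x


def simulate(x0, adj, beta, steps, seed=0):
    """Run the chain for a given number of steps from configuration x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        arw_step(x, adj, beta, rng)
    return x
```

For instance, `simulate([4, 4, 4], {0: [1], 1: [0, 2], 2: [1]}, beta=2.0, steps=500)` runs the chain on a three-vertex path. Under the stated assumption, setting `beta=0` makes every candidate equally likely, so each particle performs a lazy random walk.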
When $\beta$ is positive (ferromagnetic dynamics), the particle is more likely to travel to a vertex that has more particles. Greater $\beta$ encourages stronger aggregation of the particles. On the other hand, taking $\beta < 0$ (antiferromagnetic dynamics) encourages particles to spread. Note that $\beta = 0$ corresponds to the case of independent (lazy) random walks.
As an application, consider an ensemble of identical gas particles in a container. We can discretize the container into blocks. Each block becomes a vertex in our graph. Vertices are connected by an edge whenever the corresponding blocks share a face. Note that depending on the type of gas, particles may primarily repel each other, in which case taking $\beta < 0$, which discourages particles from occupying the same block, becomes reasonable. The focus of this paper is the case of $\beta > 0$, though we collect some results on the $\beta < 0$ case as well.
To get an idea of the effect of , Figure 1 displays some instances of the Attracting Random Walks model run for steps for different values of . The graph is the grid graph, with , for an average of particles per vertex.
We now state our main results regarding the phase transition in mixing time. We let $\|\mu - \nu\|_{\text{TV}}$ denote the total variation distance between two discrete probability measures $\mu$ and $\nu$, and let $d(t) = \max_{x} \|P^t(x, \cdot) - \pi\|_{\text{TV}}$ be the worst-case (with respect to the initial state) total variation distance for a chain with stationary distribution $\pi$. Let $t_{\text{mix}}(X, \epsilon) = \min\{t : d(t) \leq \epsilon\}$ denote the mixing time of a chain $X$.
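For a chain small enough to store its transition matrix, these quantities can be computed directly. The sketch below assumes the standard conventions of [7] (total variation as half the $\ell_1$ distance, and mixing time as the first time the worst-case distance drops below a threshold, here defaulting to $1/4$); the function names are illustrative.

```python
def total_variation(mu, nu):
    """Total variation distance between two distributions on states 0..m-1."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def worst_case_distance(P, pi, t):
    """Largest TV distance to pi over all starting states after t steps."""
    m = len(P)
    # Row i holds the distribution of the chain started from state i.
    rows = [[1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(t):
        rows = [
            [sum(r[k] * P[k][j] for k in range(m)) for j in range(m)] for r in rows
        ]
    return max(total_variation(r, pi) for r in rows)


def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t whose worst-case distance is at most eps, or None if not reached."""
    for t in range(t_max + 1):
        if worst_case_distance(P, pi, t) <= eps:
            return t
    return None
```

The state space of the ARW chain grows quickly with the number of particles, so this brute-force computation is only practical for very small instances.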
Theorem 1.
For any graph $G$, there exists $\beta_+ > 0$ such that if $\beta > \beta_+$, the mixing time of the ARW model is $e^{\Omega(n)}$.
Theorem 2.
For any graph $G$, there exists $\beta_- > 0$ such that if $0 \leq \beta < \beta_-$, the mixing time of the ARW model is $O(n \log n)$.
Note that we do not prove that one value satisfies both statements.
Through our analysis of mixing time, we establish a transition in the dynamics of the chain. By standard results, this also indirectly implies that the stationary distribution develops multiple almost disjoint phases for $\beta > \beta_+$, while this is not the case for $\beta < \beta_-$. More precisely, we have the following corollary.
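Definition 1 (Cheeger constant [7]).
Let $P$ be the transition matrix of a Markov chain that is irreducible and aperiodic. Let $\Omega$ denote the state space of the chain, and let $\pi$ be the stationary distribution. Define the edge measure $Q$ by
$$Q(x, y) = \pi(x) P(x, y).$$
For two sets $A, B \subseteq \Omega$, let $Q(A, B) = \sum_{x \in A, y \in B} Q(x, y)$. For $S \subseteq \Omega$, let
$$\Phi(S) = \frac{Q(S, S^c)}{\pi(S)}.$$
Finally, the Cheeger constant is defined as
$$\Phi_\star = \min_{S \,:\, \pi(S) \leq \frac{1}{2}} \Phi(S).$$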
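For a very small chain, the Cheeger constant can be evaluated by enumerating all subsets of the state space. The sketch below assumes the standard definition from [7]: the stationary flow out of a set, divided by the stationary mass of the set, minimized over sets of mass at most $1/2$. The function name and the list-of-lists matrix format are illustrative.

```python
from itertools import combinations


def cheeger_constant(P, pi):
    """Brute-force Cheeger constant of a chain with matrix P and stationary law pi."""
    m = len(P)
    best = float("inf")
    for size in range(1, m):
        for S in combinations(range(m), size):
            mass = sum(pi[x] for x in S)
            if mass == 0 or mass > 0.5:
                continue  # only sets with positive mass at most 1/2 are admissible
            inside = set(S)
            # Stationary flow from S to its complement.
            flow = sum(
                pi[x] * P[x][y] for x in S for y in range(m) if y not in inside
            )
            best = min(best, flow / mass)
    return best
```

The enumeration is exponential in the number of states, so it serves only as an illustration of the definition, not as a way to study the ARW chain at scale.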
Our results on fast and slow mixing allow us to indirectly bound the Cheeger constant of the Attracting Random Walks chain on a given graph. We obtain the following corollary of Theorems 1 and 2.
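Corollary 1.
Fix a graph $G$, and let $P$ be the transition probability matrix of the Attracting Random Walks chain on $G$. Let $\Phi_\star$ be the Cheeger constant of $P$. Then if $\beta < \beta_-$ we have $\Phi_\star = \Omega\left(\frac{1}{n \log n}\right)$. If $\beta > \beta_+$ then $\Phi_\star = e^{-\Omega(n)}$.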
2.2 Connection to the Potts Model
In the case where $G$ is the complete graph, the Attracting Random Walks model is a projection of Glauber dynamics of the Curie–Weiss Potts model. The Potts model is a multicolor generalization of the Ising model, and the Curie–Weiss version considers a complete graph. In the Curie–Weiss Potts model, the vertices of a complete graph are assigned a color from $[q] = \{1, \dots, q\}$. Setting $q = 2$ corresponds to the Ising model.
Let $\sigma(i)$ be the color of vertex $i$ for each vertex $i$. Define
[TABLE]
The stationary distribution of the Potts model, with no external field, is
[TABLE]
The Glauber dynamics for the Curie–Weiss Potts model are as follows:
1. Choose a vertex $i$ uniformly at random.
2. Update the color of vertex $i$ to color $c \in [q]$ with probability proportional to $\exp\left(\frac{\beta}{n} \sum_{j \neq i} \mathbb{1}\{\sigma(j) = c\}\right)$, where $\sigma(j)$ denotes the color of vertex $j$ and $n$ is the number of vertices in the Potts model.
Observe that the summation is equal to the number of vertices, apart from vertex $i$, that have color $c$. Therefore if each vertex in the Potts model corresponds to a particle in the ARW model, and each color in the Potts model corresponds to a vertex in the ARW model, then the ARW model is a projection of the Glauber dynamics for the Potts model. The correspondence is illustrated in Figure 2. Under the correspondence, the ARW chain is exactly the “vector of proportions” chain in the Potts model.
Let be the vertex location of the th particle in the ARW model, for . By the correspondence, we show that the stationary distribution of the ARW model is
[TABLE]
Observe that the factor encourages particle aggregation, while the multinomial encourages particle spread.
The reader is encouraged to refer to [3] for a detailed study of the mixing time of the Curie–Weiss Potts model, for different values of $\beta$. For instance, [3] shows that there exists $\beta_s(q)$ such that if $\beta < \beta_s(q)$, the mixing time is $\Theta(n \log n)$, and if $\beta > \beta_s(q)$, the mixing time is exponential in $n$. In the ARW context, these results hold with $q$ replaced by $k$. On the other hand, when $G$ is not the complete graph, the correspondence to the Potts model is lost. In fact, the following can be shown:
Theorem 3.
For , the ARW Markov chain is reversible for all if and only if the graph is complete.
The non-reversibility can be shown by applying Kolmogorov’s cycle criterion, demonstrating a cycle of states (configurations) that violates the criterion.
Lemma 1 (Kolmogorov’s criterion).
A finite state space Markov chain associated with the transition probability matrix $P$ is reversible if and only if for all cyclic sequences of states $x_1, x_2, \dots, x_m, x_{m+1} = x_1$ it holds that
$$\prod_{t=1}^{m} P(x_t, x_{t+1}) = \prod_{t=1}^{m} P(x_{t+1}, x_t).$$
In other words, the forward product of transition probabilities must equal the reverse product, for all cycles of states.
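The cycle criterion is easy to evaluate numerically for a single candidate cycle. The sketch below is illustrative only: the function name `cycle_products` and the list-of-lists matrix format are assumptions, and the example chains are generic three-state chains, not ARW chains.

```python
def cycle_products(P, cycle):
    """Return (forward, reverse) products of transition probabilities around a cycle.

    cycle lists the distinct states visited; the return to cycle[0] is implicit.
    """
    forward = reverse = 1.0
    m = len(cycle)
    for t in range(m):
        a, b = cycle[t], cycle[(t + 1) % m]
        forward *= P[a][b]
        reverse *= P[b][a]
    return forward, reverse


# A chain with a rotational bias: the two products differ, so it is not reversible.
biased = [[0.0, 0.75, 0.25], [0.25, 0.0, 0.75], [0.75, 0.25, 0.0]]
# A symmetric chain: the two products agree on this cycle.
symmetric = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
```

A single violating cycle certifies non-reversibility, whereas establishing reversibility requires the equality for every cycle.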
Proof of Theorem 3.
First, if the graph is complete, then the chain is a projection of Glauber dynamics, which is automatically reversible. Now suppose $G$ is not complete. We apply Kolmogorov’s cycle criterion. In the ARW model, a state is a particle configuration. A cycle of states is then a sequence of particle configurations such that
1. Subsequent configurations differ by the movement of a single particle.
2. The first and last configurations are the same.
If is not a complete graph, then it is straightforward to show that there exist three vertices such that . Now we demonstrate a cycle of states that breaks Kolmogorov’s criterion. We have the following situation, illustrated by Figure 3. The values , , and indicate the degrees of the vertices, excluding the named vertices. Place particles at and particles at . The particle movements are as follows: , , , .
For clarity, let . The forward transition probabilities are
[TABLE]
The reverse transition probabilities are
[TABLE]
Canceling factors that appear in both products, we are left comparing
[TABLE]
to
[TABLE]
Observe that . Taking leading terms, the first product is therefore a degree- polynomial in . Since , the second is a degree- polynomial in . These polynomials have a finite number of solutions for , and therefore itself. Therefore the Markov chain is not reversible. ∎
3 Mixing Time on General Graphs
In this section, we show the existence of a phase transition in mixing time in the ARW model when $\beta$ is varied, for a general fixed graph. First, we show exponentially slow mixing for $\beta$ suitably large, namely we prove Theorem 1 by relating mixing times to hitting times. Next, we show polynomial time mixing for small values of $\beta$. The proof is by an adaptation of path coupling. We use definitions and notations on Markov chains from [7].
3.1 Slow Mixing
The idea of the proof of slow mixing is to show that with substantial probability, the chain takes an exponential time to access a constant portion of the state space. We now outline the proof, deferring the proofs of the lemmas. First we state a helper lemma.
Lemma 2.
For any graph $G$, there exists a vertex $v$ such that for the set of configurations $S_v = \{x : x(v) = \max_{w \in V} x(w)\}$, it holds that $\pi(S_v) \geq \frac{1}{k}$. In other words, the states where $v$ has the greatest number of particles contribute at least $\frac{1}{k}$ to the stationary probability mass.
By Lemma 2, there exists a vertex such that . Choose any other vertex . Whenever , we can be sure that is not the maximizing vertex, and therefore that a set of states having at least mass under the stationary measure has not been reached. It therefore suffices to lower bound the time until vertex has lost sufficient particles for vertex to have the maximum number of particles.
Let If the probability that has reached the set by time is less than some , then the total variation distance at time is at least . Therefore we get the following relationship between the mixing time and hitting time:
Proposition 1.
[TABLE]
The problem now reduces to lower bounding this hitting time. The idea is that when particles leave vertex , there is a strong drift back to . However, controlling the hitting times of a multidimensional Markov chain is challenging, and direct comparison is difficult to establish. We instead reason by comparison to another Markov chain, , which lower-bounds the particle occupancy at vertex .
Let be the length of the shortest path connecting vertex to vertex . Let be a projection of the chain defined by , and let be its state space. In other words, the th coordinate of the projected chain counts the number of particles that are a distance away from vertex . Note that . We let denote this projection, writing, . For any , define
[TABLE]
For some to be determined, let
[TABLE]
We now build a chain on coupled to such that as long as , . Then . The remainder of the proof of slow mixing is as follows.
1. Construct a lower-bounding comparison chain satisfying when .
2. Compute and use a concentration bound to show that places exponentially little mass on the set .
3. Comparing the chain to , show that takes exponential time to achieve . The result is complete by .
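The distance-based projection used in this argument, whose coordinates count the particles at each graph distance from a reference vertex, can be sketched as follows. The function names and the adjacency-dictionary format are illustrative assumptions; distances are computed by breadth-first search.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance from source to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_profile(x, adj, v):
    """Entry d of the result counts the particles at distance d from vertex v."""
    dist = bfs_distances(adj, v)
    profile = [0] * (max(dist.values()) + 1)
    for u, count in enumerate(x):
        profile[dist[u]] += count
    return profile
```

Entry 0 of the profile is the occupancy of the reference vertex itself, and the entries always sum to the total number of particles.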
We now define the lower-bounding comparison chain , which is a chain on independent particles. These particles move on the discrete line with points , where . We first describe the case . Since the comparison needs to hold only when , we assume that . The idea is to identify a uniform constant lower bound on the probability of a particle moving closer to under this assumption, which tells us that once the particle is at , there is a high probability of remaining there.
Let denote the neighbourhood of , i.e. . In the chain, when a particle is at a vertex , its probability of moving to any one of its neighbors is at least
[TABLE]
where is the maximum degree of the graph. This is because the lowest probability when is large corresponds to placing all movable particles at some other neighbor of . When a particle is at a vertex , it stays there with probability at least
[TABLE]
When a particle is at a vertex , it moves to with probability at least
[TABLE]
Note that .
The transitions of the chain are chosen in order to maintain comparison. At each time step, a particle is selected uniformly at random. When the chosen particle is located at , the particle moves to with probability and moves to with probability . When the chosen particle is located at , it moves to [math] with probability , and moves to with probability . The transition probabilities for single particle movements are depicted in Figure 4. When (i.e., is the complete graph), we instead have the transitions depicted by Figure 5. Lemma 3 establishes the comparison.
Let denote the stationary distribution of the chain, and let be the probability according to of a particular particle being located at vertex in the line graph. The following results about the chain are required to complete the proof.
Lemma 3.
For a configuration , set . As long as , the chain satisfies
[TABLE]
for all and . In particular, .
Lemma 4.
Recall that . Let and fix . For all large enough, . Moreover,
[TABLE]
which implies
[TABLE]
Proof of Theorem 1.
Recall the choices of and above. Lemma 4 tells us that the chain places exponentially little stationary mass on the set . We now combine this fact with the comparison established in Lemma 3.
Recall Applying Proposition 1 with ,
[TABLE]
Since , it also holds that
[TABLE]
The last equality is due to the fact that for all in .
Additionally define
[TABLE]
Now because is a lower-bounding chain, it holds that
[TABLE]
for all and . Therefore,
[TABLE]
Finally, from Lemma 4 we know that . Suppose that is distributed according to and consider the hitting time . It holds that
[TABLE]
Therefore, time is required for . The same is true when , for some . Therefore and , which proves Theorem 1. ∎
We now provide the deferred proofs.
Proof of Lemma 2.
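By the union bound, writing $S_v$ for the set of configurations in which vertex $v$ holds the greatest number of particles,
$$1 = \pi\left(\bigcup_{v \in V} S_v\right) \leq \sum_{v \in V} \pi(S_v).$$
Since the sum has $k = |V|$ terms, at least one vertex $v$ satisfies $\pi(S_v) \geq \frac{1}{k}$. ∎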
Proof of Lemma 3.
We show that there exists a coupling satisfying
[TABLE]
for all and . Since , we can pair up the particles at time and design a synchronous coupling, i.e. when a certain particle is chosen in the process, its copy is chosen in the chain. We design the coupling so that for each particle, the –copy is at least as close to [math] as the –copy, for all . Note that this implies for all and . The uniformity of and over all configurations in ensures that the coupling will maintain the requirement (2), which is established by induction on . The following analysis applies to both and by considering the relevant cases.
The base case holds since . Suppose that at time , each particle in the chain is at least as close to [math] as its copy in the chain. We will show that the same property holds for time . First consider a particle located at [math] in the chain. By the inductive hypothesis, its copy must be located at [math] in the process also, and the corresponding particle in the chain must be at . The probability of the particle staying at [math] in the chain is smaller than the probability of the corresponding particle staying at in the chain, since is a uniform lower bound on the probability of staying at . Therefore in this case, the property is maintained in the next time step.
Next consider a particle located at vertex in the chain and suppose its copy is located at vertex in the process. By the inductive hypothesis, . If , then clearly the property is maintained in the next step. It remains to consider the cases and . Consider the case . We couple the particles so that if the particle in the chain moves left to vertex , then the particle in the process makes the same transition. This coupling is possible by the uniformity of and . Otherwise, the particle in the chain moves right, and the property is maintained.
Next consider the case . It suffices to design a coupling such that if the particle in the process moves right, then so does the particle in the chain. If , this is possible due to the fact that is a uniform upper bound on the probability of moving right from these states. Next suppose that and . The particle in the process moves right with probability upper-bounded by , which is smaller than for sufficiently small and sufficiently large. Therefore we can ensure the property in the next step. Finally, suppose and . Due to the fact that is a uniform upper bound on the probability of moving right from these states, we can again construct a coupling that maintains the property. ∎
To prove Lemma 4, we need the stationary probability .
Proposition 2.
It holds
[TABLE]
The proof of Proposition 2 is deferred to the appendix.
Proof of Lemma 4.
When , we have . Since as , it holds that for large enough. Next, for we have
[TABLE]
We show that for and large enough. Again, for large enough, we have . Next,
[TABLE]
where the last inequality holds for and large enough, since the first term of (4) dominates. Since , we have . Finally,
[TABLE]
where the first inequality holds for large enough. Substituting into (3), we obtain for and large enough
[TABLE]
where the second inequality holds for large enough, due to for .
We conclude that the expectation is linearly separated from the boundary:
[TABLE]
Next we show concentration. Label all the particles, and define if particle is at vertex [math] in the line graph, and otherwise. Then , and is independent of for all . Applying Hoeffding’s inequality,
[TABLE]
for . Let . Then the above implies
[TABLE]
3.2 Fast Mixing
The proof is by a modification of path coupling, which is a method to find an upper bound on mixing time through contraction of the Wasserstein distance. (An alternative proof of fast mixing uses a variable-length path coupling, as introduced in [6]; for further details, see [5].) The following definition can be found in [7], p. 189.
Definition 2 (Transportation metric).
Given a metric $\rho$ on a state space $\Omega$, the associated transportation metric $\rho_K$ for two probability distributions $\mu$ and $\nu$ is defined as
$$\rho_K(\mu, \nu) = \inf_{(X, Y)} \mathbb{E}\left[\rho(X, Y)\right],$$
where the infimum is over all couplings $(X, Y)$ of $\mu$ and $\nu$ on $\Omega \times \Omega$.
Definition 3 (Wasserstein distance).
Let $P$ be the transition probability matrix of a Markov chain on a state space $\Omega$, and let $\rho$ be a metric on $\Omega$. The Wasserstein distance of two states $x, y \in \Omega$ with respect to $P$ and $\rho$ is defined as follows:
$$W_\rho^P(x, y) = \rho_K\left(P(x, \cdot), P(y, \cdot)\right).$$
In other words, the Wasserstein distance is the transportation metric distance between the next state distributions from initial states $x$ and $y$.
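Because the transportation metric is an infimum over couplings, the expected distance under any particular coupling is an upper bound on it. The sketch below evaluates that expectation for the independent (product) coupling, which is the simplest choice; the function name and the dictionary representation of distributions are illustrative assumptions.

```python
def product_coupling_cost(mu, nu, rho):
    """Expected rho(X, Y) when X ~ mu and Y ~ nu are drawn independently.

    mu, nu : dicts mapping states to probabilities
    rho    : function returning the distance between two states
    """
    return sum(mu[a] * nu[b] * rho(a, b) for a in mu for b in nu)


# Two identical distributions: the transportation distance is 0 (couple X = Y),
# yet the independent coupling costs 0.5, so it is only an upper bound.
mu = {0: 0.5, 1: 0.5}
cost = product_coupling_cost(mu, mu, lambda a, b: abs(a - b))
```

The gap between the two values in this example shows why the choice of coupling matters when one wants a contraction estimate.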
The following lemma is the path coupling result which can be found in [2] and [7]. Given a Markov chain on state space $\Omega$ with transition probability matrix $P$, consider a connected graph $H = (\Omega, E_H)$, i.e. the vertices of $H$ are the states in $\Omega$ and the edges are $E_H$. Let $\ell$ be a “length function” for the edges of $H$, which is an arbitrary function $\ell : E_H \to [1, \infty)$. For $x, y \in \Omega$, define $\rho(x, y)$ to be the path metric, i.e. $\rho(x, y)$ is the length of the shortest path from $x$ to $y$ in terms of $H$ and $\ell$.
Lemma 5 (Path Coupling).
Under the above construction, if there exists such that for all that are connected by an edge in it holds that
[TABLE]
then
[TABLE]
where is the diameter of the graph with respect to .
Our proof of rapid mixing for small enough relies on rapid mixing of a single random walk. The following lemma demonstrates the existence of a contracting metric for a single random walk. It is possible that such a result appears elsewhere, but we are not aware of a published proof.
Lemma 6.
Consider a random walk on which makes a uniform choice among staying or moving to any of the neighbors and denote by its transition matrix. Let be the expected meeting time of two independent copies of a random walk on a graph started from states and . Then is a metric and contracts the respective Wasserstein distance. In particular,
[TABLE]
where . Furthermore, if , then
[TABLE]
where . 444The statement for was pointed out by the reviewer.
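The expected meeting times in Lemma 6 can be computed on a small graph by first-step analysis. The sketch below is an illustration under stated assumptions: both walks move simultaneously and independently, each choosing uniformly among staying or moving to a neighbour, and they meet when they occupy the same vertex at the same time. The function names and the fixed number of sweeps are illustrative choices.

```python
def lazy_walk_kernel(adj):
    """Uniform choice among staying put or moving to any neighbour."""
    P = {}
    for v, nbrs in adj.items():
        options = [v] + list(nbrs)
        P[v] = {u: 1.0 / len(options) for u in options}
    return P


def expected_meeting_times(adj, sweeps=2000):
    """Iterate tau(x, y) = 1 + E[tau(X1, Y1)] for x != y, with tau(x, x) = 0."""
    P = lazy_walk_kernel(adj)
    V = list(adj)
    tau = {(x, y): 0.0 for x in V for y in V}
    for _ in range(sweeps):
        new = {}
        for x in V:
            for y in V:
                if x == y:
                    new[(x, y)] = 0.0
                else:
                    new[(x, y)] = 1.0 + sum(
                        px * py * tau[(a, b)]
                        for a, px in P[x].items()
                        for b, py in P[y].items()
                    )
        tau = new
    return P, tau
```

At the fixed point, the expected meeting time after one independent step equals the current value minus one for any two distinct starting vertices. This is the mechanism by which the independent coupling yields a contraction whose rate depends on the largest meeting time.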
Remark 1.
In fact, we can show a stronger result (i.e. with a smaller value in the place of ): we can allow arbitrary Markovian coupling between two copies of the random walk and define to be the meeting time under that coupling.
In order to apply path coupling, we let be a graph on particle configurations, where whenever for some pair of distinct vertices and in . In other words, and differ by the position of a single particle. Note that and need not be neighboring vertices in . For such a pair of neighboring configurations , let . Clearly, . Now for any two configurations , let denote the path metric induced by and . We show that for neighboring configurations.
Proposition 3.
For any two configurations such that , it holds that .
Let be the probability distribution of the next location of the selected particle, when it is initially located at vertex in configuration . Recall that is the probability distribution of the next location of a simple random walk on , initially located at vertex . Note that when , it holds that . When is small, . Lemma 7 quantifies this statement.
Lemma 7.
For all configurations and vertices , it holds that
[TABLE]
Next, consider two neighbouring configurations and . Because only the position of one particle is different between the two configurations, . The following lemma makes this precise.
Lemma 8.
Let and be neighbouring configurations. Recall that is the maximum degree of the vertices in . The following holds:
[TABLE]
With these results stated, we prove Theorem 2.
Proof of Theorem 2.
Suppose is a metric on such that a single-particle random walk’s kernel satisfies
[TABLE]
for all and . Note that the existence of such a metric was established in Lemma 6 with an estimate of .
Now we wish to bound the Wasserstein distance for all neighboring particle configurations $x$ and $y$ related by $y = x - e_i + e_j$. We may choose any coupling in order to obtain an upper bound. The coupling will be synchronous: the choice of particle to be moved will be coordinated between the chains. Namely, if the “extra” particle is chosen in configuration $x$, then so too will the “extra” particle be chosen in configuration $y$. Similarly, if some other particle is chosen in $x$, then a particle at the same vertex will be chosen in $y$. For an illustration, see Figure 6.
Let and denote the coupled random variables corresponding to the next configurations. Let be the “extra” particle. Let be a random variable that denotes the uniformly selected particle. Since our coupling gives an upper bound, we can write
[TABLE]
First, suppose the “extra” particle, , is chosen in both chains. This happens with probability . By Lemma 7, we can couple the distributions and to and respectively with probability at least . In that case, we get contraction by a factor of . With the remaining probability, we assume the worst-case distance of . Therefore, the conditional Wasserstein distance is upper bounded as follows:
[TABLE]
Next, suppose some other particle (located at ) is chosen in both chains. This happens with probability . We claim
[TABLE]
Indeed, by Lemma 8, we can couple particle so that it moves to the same vertex in both chains with probability at least
[TABLE]
By Proposition 3, it holds that in the case that the particle moves to the same vertex in both chains. Otherwise, an additional distance of at most is incurred.
Finally, we substitute the bounds (8) and (9) into (7).
[TABLE]
where the last inequality is due to and . In order to show contraction, it is sufficient that the expression multiplying be positive:
[TABLE]
For an example of a satisfying , choose so that
[TABLE]
Therefore, we can choose
[TABLE]
Substituting into (10), we obtain for some
[TABLE]
Applying the path coupling lemma (Lemma 5), we obtain
[TABLE]
Setting the right hand side to be less than in order to bound ,
[TABLE]
Since
[TABLE]
we have
[TABLE]
Therefore, , which completes the proof of Theorem 2. ∎
Remark 2.
Arguably, a more natural approach to show fast mixing would be through a more traditional path coupling approach: Let $H$ have an edge between configurations $x$ and $y = x - e_i + e_j$ if $i$ and $j$ are adjacent vertices in $G$. Set $\ell(x, y) = 1$ for adjacent configurations. However, this approach does not yield contraction in the Wasserstein distance, which we show at the end of this section.
We now provide the deferred proofs.
Proof of Lemma 6.
First we verify that is a metric. It holds that , and with equality if and only if . To show the triangle inequality, start three random walks from vertices and let be the meeting time of the walks started from and . The three random walks are advanced according to the independent coupling, and if a pair of walks collides, they are advanced identically starting from that time. Under this coupling, observe that
[TABLE]
and take expectations. Next we show that for . We can choose any coupling of and to show an upper bound. Letting and be independent, we have
[TABLE]
and
[TABLE]
These two equations imply . Finally, . If , then we conclude . ∎
Proof of Proposition 3.
Consider any path from to : , where for . Then we have
[TABLE]
We claim that we can rearrange this summation to be of the form
[TABLE]
for some sequence . Indeed, let and be the multisets that collect the “outbound” and “inbound” particle transfers, respectively. The value must appear one more time in than in . Similarly, the value must appear one more time in than in . All other values appear an equal number of times in and . By choosing terms in order, beginning with , it is possible to rearrange the sum into the given form. By the triangle inequality for ,
[TABLE]
Therefore, the shortest distance between and is along the edge connecting them, and we conclude that for neighboring configurations. ∎
To prove Lemma 7, we state the following proposition.
Proposition 4.
The set of distributions parametrized by the configuration is contained within the convex set
[TABLE]
Proof.
To show this claim, we compute the ratio when and , and show that it is upper bounded by . There are three cases to consider.
1. The case .
[TABLE]
Since , it holds that .
2. The case .
[TABLE]
Since , we have . Therefore, again .
3. The case .
[TABLE]
Proof of Lemma 7.
Recall that is the neighbor set of vertex in graph . Let . We have
[TABLE]
and
[TABLE]
Using Proposition 4,
[TABLE]
The inequality is due to the fact that and the equality is due to the fact that the maximum of a convex function over a closed and bounded convex set is achieved at an extreme point, namely . To maximize the right hand side of (11), let . Then
[TABLE]
Setting we obtain the solutions . The solution is the maximizer. Substituting into (11),
[TABLE]
which completes the proof. ∎
Proof of Lemma 8.
First,
[TABLE]
We will show that each term is upper bounded by . Since there are at most terms, the bound follows.
We compute for . Since and are interchangeable, we can drop the absolute value.
[TABLE]
First consider the case that . Then
[TABLE]
Let
[TABLE]
Note that for . We have
[TABLE]
Next, we consider the case . We have
[TABLE]
We now show that the approach for proving Theorem 2 based on the natural one-step path coupling does not yield the required contraction.
Theorem 4.
Let have an edge between configurations and whenever and are adjacent vertices in . Let for adjacent configurations. There exists a graph such that for ,
[TABLE]
for some adjacent configurations .
Proof.
Let be the 4-vertex path graph. Label the vertices in order along the path, and consider and related by so that the two configurations differ by a transfer from one middle vertex to the other. When , the transition probabilities are simple: given that a particle is chosen at vertex , it moves to vertex with probability . The optimal coupling of and may be expressed as an optimal solution of a linear program, as follows. Write if is adjacent to in or . For each and , let be the probability of the next states being and in a coupling. The constraints require the collection of variables to be a valid coupling, and the objective function calculates the expected distance under the coupling.
[TABLE]
This linear program is known as a Kantorovich problem. Our goal is to show that the optimal objective value is at least . We first write down the dual problem. By weak duality, any feasible solution to the dual problem gives a lower bound on the optimal value of the primal problem. Next we construct a primal solution with objective value equal to , and apply the complementary slackness condition to help us construct a dual solution whose objective value is also equal to . Finally, we conclude by strong duality that the optimal value of the primal problem is equal to . For a reference on linear programming duality, see e.g. Chapter 4 of [1].
First we take the dual of the linear program, introducing dual variables for and for :
[TABLE]
This linear program is a Kantorovich dual problem. By weak duality, if there exists a dual feasible solution with objective value , then the optimal value of the primal is at least . Therefore our goal is to find a dual feasible solution with objective value at least .
For with , . Similarly, for , . The value of is given by
[TABLE]
There exists a primal solution with objective value : Set
[TABLE]
[TABLE]
and
[TABLE]
Other values of are set to zero. In other words, describes a synchronous coupling according to the pairing in Figure 6, with particles moving in the same direction always. Now supposing this is an optimal solution, we apply complementary slackness to identify candidate dual optimal solutions. The complementary slackness condition states that if and are optimal primal and dual solutions, then it holds that for all ,
[TABLE]
If our primal solution is optimal, then whenever , we need . These additional constraints help us construct the following dual feasible solution:
[TABLE]
We find that the objective value of this solution is equal to . By strong duality, we conclude that the optimal value of the primal problem is equal to , and therefore there does not exist a contractive coupling. ∎
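The certificate logic used in this proof (a feasible coupling and a feasible dual pair with equal objective values must both be optimal) can be illustrated on a toy transport problem. The sketch below is not the linear program from the proof: the four points, the metric `|x - y|`, the distributions `mu` and `nu`, and the names `q`, `u`, `v` are all assumptions chosen for illustration.

```python
from fractions import Fraction as F

def is_coupling(q, mu, nu):
    """Check that q (dict (x, y) -> mass) is nonnegative with marginals mu and nu."""
    if any(m < 0 for m in q.values()):
        return False
    if any(x not in mu or y not in nu for (x, y) in q):
        return False
    row = {x: sum(m for (a, _), m in q.items() if a == x) for x in mu}
    col = {y: sum(m for (_, b), m in q.items() if b == y) for y in nu}
    return row == mu and col == nu

def primal_value(q, dist):
    """Expected distance under the coupling q."""
    return sum(m * dist(x, y) for (x, y), m in q.items())

def is_dual_feasible(u, v, dist):
    """Kantorovich dual constraints: u(x) + v(y) <= dist(x, y) for all x, y."""
    return all(u[x] + v[y] <= dist(x, y) for x in u for y in v)

def dual_value(u, v, mu, nu):
    return sum(u[x] * mu[x] for x in mu) + sum(v[y] * nu[y] for y in nu)

# Toy instance (hypothetical): four points on a line with distance |x - y|.
dist = lambda x, y: abs(x - y)
points = range(4)
mu = {0: F(1, 2), 1: F(1, 2), 2: F(0), 3: F(0)}
nu = {0: F(0), 1: F(1, 2), 2: F(1, 2), 3: F(0)}

# Candidate primal solution: shift each half unit of mass one step to the right.
q = {(0, 1): F(1, 2), (1, 2): F(1, 2)}

# Candidate dual solution: u(x) = -x, v(y) = y, so u(x) + v(y) = y - x <= |x - y|.
u = {x: F(-x) for x in points}
v = {y: F(y) for y in points}

primal = primal_value(q, dist)
dual = dual_value(u, v, mu, nu)
print(primal, dual)  # equal values certify that both solutions are optimal
```

Because weak duality bounds every coupling's cost from below by every dual feasible value, matching values close the gap without solving the linear program numerically.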
Remark 3.
The argument in the proof of Theorem 4 should apply to all graphs that contain a four-vertex path graph as a subgraph, and possibly to other graphs as well.
3.3 Bounding the Cheeger constant
The following results will be useful in proving Corollary 1.
Lemma 9 ([7]).
Let be the Cheeger constant of an aperiodic and irreducible Markov chain, and let be its mixing time. Then
[TABLE]
The following result follows directly from Equation 2.28 in [4].
Lemma 10.
Let be the transition probability matrix of an aperiodic and irreducible Markov chain on state space , satisfying for all . Let denote the stationary distribution, and let . Let be the Cheeger constant of this chain, and let be its mixing time. Then
[TABLE]
Proof of Corollary 1.
We first prove the lower bound. Let . By Lemma 9 and Theorem 2, we have
[TABLE]
We next prove the upper bound. Let denote the lazy version of the ARW chain. Note that is also the stationary distribution of the lazy chain. Let denote its Cheeger constant. Observe that for all , it holds that . Therefore,
[TABLE]
and so it suffices to show .
Fix . We claim that the mixing time of the lazy chain is . To show this, we modify the proof of Theorem 1 as follows. We define to be the lazy version of the chain. That way, we can couple the chain to the chain advancing according to . Observe that the lower-bounding property still holds. Furthermore, the hitting time of the chain to the set is greater than the hitting time for the chain. Finally, since the stationary distributions of the and chains coincide, the rest of the proof follows identically.
We now apply Lemma 10 to the lazy chain. We need a lower bound on . Observe that if and are such that for all , then . Set . There exists at least one sequence of possible transitions in steps to get from to . Each step in the sequence has probability at least . Therefore,
[TABLE]
By Lemma 10, we have
[TABLE]
Since for , we have
[TABLE]
Substituting the lower bound for , we obtain
[TABLE]
which implies . ∎
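To make the Cheeger constant concrete, the following sketch computes it by brute force for a small chain. It assumes the usual bottleneck-ratio definition, the minimum over sets S with stationary mass at most 1/2 of Q(S, S^c)/π(S), where Q(x, y) = π(x)P(x, y). The example chain (a lazy random walk on a 6-cycle) and the function names are illustrative choices, not objects from the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def cheeger_constant(P, pi):
    """Brute-force bottleneck ratio: min over S with pi(S) <= 1/2 of Q(S, S^c) / pi(S)."""
    n = len(pi)
    best = None
    for k in range(1, n):
        for S in combinations(range(n), k):
            mass = sum(pi[x] for x in S)
            if mass > F(1, 2):
                continue
            inside = set(S)
            # Stationary probability flow from S to its complement.
            flow = sum(pi[x] * P[x][y] for x in S for y in range(n) if y not in inside)
            ratio = flow / mass
            if best is None or ratio < best:
                best = ratio
    return best

def lazy_cycle(n):
    """Lazy simple random walk on an n-cycle: hold w.p. 1/2, step left or right w.p. 1/4 each."""
    P = [[F(0)] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = F(1, 2)
        P[x][(x + 1) % n] += F(1, 4)
        P[x][(x - 1) % n] += F(1, 4)
    return P

n = 6
P = lazy_cycle(n)
pi = [F(1, n)] * n  # uniform, since this transition matrix is doubly stochastic
phi = cheeger_constant(P, pi)
print(phi)  # 1/6, attained by an arc of three consecutive vertices
```

The enumeration is exponential in the number of states, so it is only usable on tiny chains; the point of results like Lemma 9 and Lemma 10 is to relate this quantity to the mixing time without enumerating sets.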
4 Repelling Random Walks
Throughout our analysis, we have only considered . However, the case (“Repelling Random Walks”) is also interesting to study, both theoretically and practically. Simulations confirm the intuition that the particles behave like independent random walks when is close to zero, and spread evenly when is very negative (see Figure 7). We conjecture that there are no hard-to-escape subsets of the state space for all .
Conjecture 1.
For all and any graph, the mixing time of the ARW model is polynomial in .
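A simulation of the kind mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: a uniformly chosen particle at vertex v moves to a vertex w in the closed neighborhood of v with probability proportional to `exp(beta * c / n)`, where c is the number of other particles at w. The name `beta`, the division by the particle count n, and the 4-vertex path graph are assumptions made here for illustration.

```python
import math
import random

def arw_step(positions, neighbors, beta, rng):
    """Advance the particle system by one step (assumed dynamics, see lead-in)."""
    n = len(positions)
    counts = {}
    for v in positions:
        counts[v] = counts.get(v, 0) + 1
    i = rng.randrange(n)                  # choose a particle uniformly at random
    v = positions[i]
    candidates = [v] + list(neighbors[v])  # stay put or move to a neighbor
    # Occupancy by *other* particles: the chosen particle does not count itself.
    others = [counts.get(w, 0) - (1 if w == v else 0) for w in candidates]
    weights = [math.exp(beta * c / n) for c in others]
    positions[i] = rng.choices(candidates, weights=weights, k=1)[0]

def occupancy(positions, num_vertices):
    occ = [0] * num_vertices
    for v in positions:
        occ[v] += 1
    return occ

# Hypothetical setup: 20 particles, all starting at one end of a 4-vertex path.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
positions = [0] * 20
for _ in range(5000):
    arw_step(positions, path4, beta=-40.0, rng=rng)
final_occupancy = occupancy(positions, 4)
print(final_occupancy)
```

Rerunning with `beta` near zero versus strongly negative gives a quick qualitative check of the behavior described above, though a single run says nothing rigorous about mixing time.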
We consider two cases: the extreme case of , and the case where is the complete graph, for certain values of .
4.1 The Case
Theorem 5.
When , the mixing time of the Attracting Random Walks model is .
Proof.
When , the dynamics are simplified. Suppose a particle is chosen at vertex . Let be the set of vertices corresponding to the minimal value(s) of . The chosen particle moves to a vertex among those in , uniformly at random.
Our goal is to show that the set
[TABLE]
satisfies the following three properties: (1) It is absorbing, meaning that once the chain enters , it cannot escape ; (2) The chain enters in polynomial time; (3) Within , the chain mixes in constant time with respect to .
We claim that the maximum particle occupancy cannot increase, and the minimum particle occupancy cannot decrease. We now show that the maximum particle occupancy, , is monotonically non-increasing over time. Suppose that at time , a particle at vertex is selected and moves to vertex . There are five cases:
1. . The maximum does not change.
2. , and both are maximizers. This case is not possible, since .
3. , is a maximizer, and is not. The new maximum value is at most , in the case that .
4. , is not a maximizer, and is. This case is not possible, since .
5. , and are not maximizers. The new maximum value is at most , in the case that .
Therefore . A similar argument shows that the minimum particle occupancy is monotonically non-decreasing over time. Together, they imply Property (1).
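The monotonicity claim can be checked empirically with a short simulation of the extreme dynamics. The sketch assumes that the "minimal value(s)" refer to the occupancy by other particles over the closed neighborhood of the chosen particle's vertex; the 5-vertex path, the initial configuration, and all names are illustrative.

```python
import random

def repel_step(counts, neighbors, rng):
    """One step of the extreme repelling dynamics (assumed form, see lead-in)."""
    # A uniformly random particle sits at vertex v with probability counts[v] / n.
    v = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    candidates = [v] + list(neighbors[v])
    # Occupancy seen by the chosen particle, excluding itself.
    others = {w: counts[w] - (1 if w == v else 0) for w in candidates}
    low = min(others.values())
    w = rng.choice([u for u in candidates if others[u] == low])
    counts[v] -= 1
    counts[w] += 1

# Hypothetical setup: 12 particles stacked on one end of a 5-vertex path.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rng = random.Random(1)
counts = [12, 0, 0, 0, 0]
max_history = [max(counts)]
min_history = [min(counts)]
for _ in range(3000):
    repel_step(counts, path5, rng)
    max_history.append(max(counts))
    min_history.append(min(counts))

print(max_history[0], max_history[-1], min_history[0], min_history[-1])
```

Under this reading of the dynamics, the recorded maximum never increases and the recorded minimum never decreases, matching the case analysis above.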
Next, we show Property (2). Assume . Let be the set of maximizing vertices at time . We claim there exists at least one vertex such that there exists a path of distinct vertices satisfying and (allowing ). In other words, there is a walkable path from to . The maximum length of the path is . The probability that a particle is transferred along this path before any other events happen is therefore lower bounded by
[TABLE]
Therefore the probability that such a transfer happens within trials is at least
[TABLE]
If there had been at least two maximizing vertices to start, the number of maximizing vertices would have fallen by . If there had been only one maximizing vertex to start, the maximum value itself would have fallen by .
We see that there are two types of “good” events: reducing the number of maximizing vertices while the maximum value stays the same, or reducing the maximum value. We claim that the number of “good” events that happen before the chain enters the set is upper bounded by . Indeed, imagine that the particles at each vertex are stacked vertically. A particle movement from vertex to vertex is interpreted as a particle moving from the top of the stack at vertex to the top of the stack at vertex . Observe that the height of a particle cannot increase. Further, each particle’s height can fall by at most units over time, and can therefore drop at most times. Since all good events require a particle’s height to drop, the number of good events is at most . Let be the number of trials of length each. Let be the number of successes during the trials. By the Hoeffding inequality,
[TABLE]
Since ,
[TABLE]
Therefore the probability that the chain is in after steps is at least . For example, we can set . Then
[TABLE]
Therefore, within steps, the chain is in with high probability.
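As a sanity check on this kind of concentration step, the sketch below compares the Hoeffding bound on a lower tail with the exact binomial tail. It simplifies the setting by assuming independent trials that each succeed with probability exactly `p`; the values of `m`, `p`, and `k` are arbitrary illustrative choices, not quantities from the proof.

```python
import math

def binomial_lower_tail(m, p, k):
    """Exact P(S <= k) for S ~ Binomial(m, p)."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))

def hoeffding_lower_tail(m, p, k):
    """Hoeffding bound: P(S <= m*(p - t)) <= exp(-2*m*t^2), with t = p - k/m > 0."""
    t = p - k / m
    if t <= 0:
        return 1.0  # the bound is vacuous when k is at or above the mean
    return math.exp(-2 * m * t * t)

m, p, k = 200, 0.3, 40  # hypothetical: 200 trials, success probability 0.3
exact = binomial_lower_tail(m, p, k)
bound = hoeffding_lower_tail(m, p, k)
print(exact, bound)
```

The bound is loose but decays exponentially in the number of trials, which is all the argument needs.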
Finally, we show Property (3). Once the chain is in , there are two types of vertices: those that have particles, and those that have particles. Note that there are always vertices with the higher number of particles. Therefore it is equivalent to study an exclusion process with just particles on the graph . With probability , an unoccupied vertex is selected, and the chain stays in place. With the remaining probability, an occupied vertex is chosen uniformly at random. Its particle then moves to a neighboring empty vertex or stays where it is, uniformly at random. Equivalently, the chain is lazy with probability , and otherwise one of the particles is chosen, and either stays or moves to a neighbor. Since the number of particles can be upper and lower bounded by constants (), the mixing time within is independent of . Therefore, we conclude that the overall mixing time is . ∎
4.2 The Complete Graph Case
Note that the complete graph case for is equivalent to the vector of proportions chain in the antiferromagnetic Curie–Weiss Potts model.
Theorem 6.
On the complete graph with vertices, the mixing time is for all satisfying .
The proof relies on the following two lemmas.
Lemma 11.
Let be the ARW chain for any and let be a chain of independent particles (). Set . For every vertex and time ,
[TABLE]
For , let .
Lemma 12.
On the complete graph, if and , then
[TABLE]
for
[TABLE]
The proof of Lemma 11 appears later in this section, and the proof of Lemma 12 is deferred to the appendix due to its technical nature.
Proof of Theorem 6.
We may assume that is large enough so that
[TABLE]
Let be a random variable distributed according to the stationary distribution of the chain. At stationarity, the vertex occupancies are strongly concentrated around their means. By the Hoeffding inequality, for every ,
[TABLE]
for every vertex .
Fix . We wish to upper bound . Note that the mixing time of the chain is . To see this, consider a synchronous coupling. The expected amount of time to select all the particles is , and whenever a particle is selected, it moves to a uniformly random location, which is coupled. Now, for all , . Therefore at time , for every ,
[TABLE]
for every vertex . By Lemma 11, it also holds that for every ,
[TABLE]
for every vertex . Recall that . Then by the Union Bound,
[TABLE]
for every and . We observe that for large enough, there is always an small enough so that
[TABLE]
Then with probability at least , belongs to .
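The mixing time of the independent-particle chain used above rests on the time needed to select every particle at least once, which is the coupon collector problem. The sketch below computes the exact expectation, n times the n-th harmonic number, and simulates one run. The function names and the choice n = 50 are illustrative.

```python
import random
from fractions import Fraction as F

def expected_select_all(n):
    """Exact expected number of uniform draws until all n particles have been selected."""
    return n * sum(F(1, k) for k in range(1, n + 1))

def simulate_select_all(n, rng):
    """Draw particles uniformly at random until every one has been selected at least once."""
    seen = set()
    steps = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        steps += 1
    return steps

n = 50  # hypothetical number of particles
rng = random.Random(0)
exact = expected_select_all(n)
sample = simulate_select_all(n, rng)
print(float(exact), sample)
```

Since the harmonic number grows like log n, the expectation is of order n log n, which is where that rate for the independent chain comes from.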
Next, we establish that for every , there exists such that (1) once the chain enters , it takes exponential time to leave , with high probability; (2) we can apply path coupling within . The first claim is due to comparison with the chain, as established above.
We now demonstrate the required contraction for path coupling within . Recall that we need to define the edges of the graph and choose a length function on the edges. Let if for some , and let . Consider any pair of neighboring configurations and . We employ a synchronous coupling, as in Figure 6. Namely, the “extra” particle at vertex in configuration is paired to the “extra” particle at vertex in configuration . All other particles are paired by vertex location. When a particle is selected to be moved in the configuration, the particle that it is paired to in the configuration is also selected to be moved.
With probability , one of the pairs that has the same vertex location is chosen. Suppose it is located at vertex . We couple the transitions in the two chains according to the coupling achieving the total variation distance .
By Lemma 12, when one of the particles paired by vertex location is chosen, we can couple them so that they move to the same vertex with probability at least
[TABLE]
With the remaining probability, the distance increases by at most .
With the remaining probability, the “extra” particle is chosen in both chains. The chains can then equalize with probability because on the complete graph. Therefore, we can bound the Wasserstein distance as follows:
[TABLE]
Therefore, in order to achieve contraction, it suffices that
[TABLE]
Fix , and let . Then substituting , we obtain the condition
[TABLE]
When is such that , there exists small enough so that the condition (13) holds. We conclude that contraction holds for .
To summarize the argument, we have shown that in time , the chain enters . After that, the chain leaves the larger set, , with exponentially small probability, which can be disregarded. Within , the Wasserstein distance with respect to the chosen and contracts by a factor of , so an additional steps are sufficient. Therefore, the overall mixing time is . ∎
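For reference, the standard way a one-step contraction is converted into a mixing time bound can be written as follows. The symbols here are assumptions introduced for this note and are not taken from the proof: $\rho$ is the path metric (with $\rho(x,y) \ge 1$ for $x \ne y$), $\alpha \in (0,1]$ is the contraction rate, and $D$ is the diameter of the state space under $\rho$. The statement is the textbook path coupling bound (see [7]) for the case where contraction holds on the whole state space; the proof above applies it inside a subset and handles the exit probability separately.

```latex
% One-step contraction for neighboring configurations x, y:
\mathbb{E}\,\rho(X_1, Y_1) \;\le\; (1-\alpha)\,\rho(x, y)
% implies, by the path coupling theorem,
\quad\Longrightarrow\quad
\max_{x}\,\bigl\| P^t(x,\cdot) - \pi \bigr\|_{\mathrm{TV}}
  \;\le\; (1-\alpha)^t D \;\le\; e^{-\alpha t} D ,
% and therefore
\qquad
t_{\mathrm{mix}}(\varepsilon) \;\le\;
  \left\lceil \frac{\log D + \log(1/\varepsilon)}{\alpha} \right\rceil .
```

With $\alpha$ of order $1/n$ and $D$ polynomial in $n$, the right-hand side is of order $n \log n$.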
Proof of Lemma 11.
We claim that there exists a coupling of such that for all and , . Let and define similarly. We claim that for all configurations and vertices , if , then
[TABLE]
and
[TABLE]
If , then
[TABLE]
and
[TABLE]
In other words, the inequalities (14)-(17) state that the chain is less likely to move in the absolute value–increasing direction, and more likely to move in the absolute value–decreasing direction. These inequalities, along with the fact that , suffice to prove the lemma.
The transitions for the process are with probability , and with probability . With the remaining probability, . Suppose . There are two cases to analyze when :
1. . The probability that is upper bounded by , because vertex is a more likely than average destination. In other words, it is harder to lose a particle from vertex that has fewer than the average number of particles when , compared to when . Formally,
[TABLE]
For the same reason, the probability that is lower bounded by
[TABLE]
Therefore, inequalities (14) and (15) hold in this case.
2. . This time, is a less likely than average destination. The probability that is lower bounded by
[TABLE]
The probability that is upper bounded by
[TABLE]
Therefore, inequalities (14) and (15) hold in this case also.
Finally, suppose . Then the probability of losing a particle is upper bounded by , and the probability of gaining a particle is upper bounded by . Therefore, inequalities (16) and (17) hold.
We conclude that such a coupling exists, and therefore the stochastic dominance holds. ∎
5 Conclusion
In this paper we have introduced a new interacting particle system model. We have shown that for any fixed graph, the mixing time of the Attracting Random Walks Markov chain exhibits a phase transition. We have also partially investigated the Repelling Random Walks model, and we conjecture that this model is always fast mixing. Beyond theoretical results, it is our hope that the model will find practical use.
6 Appendix
Proof of Proposition 2.
To compute the stationary probabilities , note that we can disregard the initial uniform particle choice, and simply consider a Markov chain on a graph with nodes as in Figure 4 or 5.
When , we have .
Next, consider . We have
[TABLE]
Since , we have
[TABLE]
and so
[TABLE]
Finally, consider the case . We solve the equations for the stationary distribution.
[TABLE]
[TABLE]
Since ,
[TABLE]
Substituting into (21)
[TABLE]
Proof of Lemma 12.
Let
[TABLE]
Then we can write
[TABLE]
and
[TABLE]
Checking the sign of is equivalent to checking the sign of
[TABLE]
Next we show that for fixed , the sign of is the same for all . Suppose . Then , and it is equivalent to check the sign of the expression . Since this expression does not depend on , we conclude that the sign is the same for all .
If for all , then
[TABLE]
Similarly, if for all , then
[TABLE]
Therefore,
[TABLE]
Consider the ratio of denominators of and . We have
[TABLE]
We first bound . If , we obtain
[TABLE]
Similarly, if , we obtain
[TABLE]
We similarly bound . If , we obtain
[TABLE]
If , we obtain
[TABLE]
For any choice of ,
[TABLE]
Therefore,
[TABLE]
Recall that . We upper bound by setting and to their lower bounds, and to its upper bound for .
[TABLE]
where in the second-last inequality we have used the fact that and the last inequality holds when . ∎
References
- [1] D. Bertsimas and J. N. Tsitsiklis, Introduction to linear optimization, Athena Scientific, Dynamic Ideas, 1997.
- [2] R. Bubley and M. E. Dyer, Path coupling: A technique for proving rapid mixing in Markov chains, 38th Annual Symposium on Foundations of Computer Science (1997).
- [3] P. Cuff, J. Ding, O. Louidor, E. Lubetzky, Y. Peres, and A. Sly, Glauber dynamics for the mean-field Potts model, Journal of Statistical Physics 149 (2012), 432–477.
- [4] J. A. Fill, Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the Exclusion Process, The Annals of Applied Probability 1 (1991), no. 1, 62–87.
- [5] Julia Gaudio, Investigations in applied probability and high-dimensional statistics, Ph.D. thesis, Massachusetts Institute of Technology, 2020.
- [6] Thomas P. Hayes and Eric Vigoda, Variable length path coupling, Random Structures and Algorithms 31 (2007), no. 3, 251–272.
- [7] David A. Levin and Yuval Peres, Markov chains and mixing times, 2 ed., American Mathematical Society, 2017.
