The interchange process on high-dimensional products
Jonathan Hermon, Justin Salez

TL;DR
This paper proves that the mixing time of the interchange process on high-dimensional hypercubes is proportional to the dimension, showing rapid emergence of macroscopic cycles and providing bounds on related constants.
Contribution
It resolves a long-standing conjecture about the mixing time on hypercubes and extends results to products of arbitrary fixed-size graphs.
Findings
Mixing time on hypercube is of order n
Macroscopic cycles emerge in constant time
Log-Sobolev constant is of order 1
Abstract
We resolve a long-standing conjecture of Wilson (2004), reiterated by Oliveira (2016), asserting that the mixing time of the unit-rate Interchange Process on the $n$-dimensional hypercube is of order $n$. This follows from a sharp inequality established at the level of Dirichlet forms, from which we also deduce that macroscopic cycles emerge in constant time, and that the log-Sobolev constant of the exclusion process is of order $1$. Beyond the hypercube, our results apply to cartesian products of arbitrary graphs of fixed size, shedding light on a broad conjecture of Oliveira (2013).
1 Introduction
1.1 Interchange process
Let $G = (V_G, E_G)$ be a finite undirected connected graph. The interchange process (ip) on $G$ is the continuous-time random walk $(\xi_t)_{t \ge 0}$ on the symmetric group $\mathfrak{S}(V_G)$ with initial condition $\xi_0 = \mathrm{id}$ and the following Markov generator: for all observables $f \colon \mathfrak{S}(V_G) \to \mathbb{R}$,
$$(\mathcal{L}^{\mathrm{ip}}_G f)(\sigma) := \sum_{e \in E_G} \bigl( f(\tau_e \sigma) - f(\sigma) \bigr), \tag{1}$$
where $\tau_e$ denotes the transposition of the endpoints of the edge $e$. One may think of each vertex as carrying a labelled particle, and of the edges as being equipped with independent unit-rate Poisson clocks: whenever a clock rings, the particles sitting at the endpoints of the corresponding edge simply exchange their positions.
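The Poisson-clock description above translates directly into a simulation. The following is a minimal sketch, not taken from the paper: the function name `simulate_interchange`, the edge-list representation and the 4-cycle example are illustrative assumptions. It relies on the standard fact that superposing independent unit-rate clocks produces rings at total rate equal to the number of edges, each ring hitting a uniformly chosen edge.

```python
import random

def simulate_interchange(edges, vertices, t_max, rng):
    """Run the interchange process with unit-rate edge clocks up to time t_max.

    Returns a dict mapping each particle (named by its starting vertex)
    to the vertex it occupies at time t_max.
    """
    position = {v: v for v in vertices}   # particle -> current vertex
    occupant = {v: v for v in vertices}   # vertex -> particle sitting there
    total_rate = len(edges)               # one unit-rate clock per edge
    t = rng.expovariate(total_rate)       # time of the first ring
    while t <= t_max:
        x, y = rng.choice(edges)          # edge whose clock rang
        px, py = occupant[x], occupant[y]
        occupant[x], occupant[y] = py, px # the two particles swap places
        position[px], position[py] = y, x
        t += rng.expovariate(total_rate)
    return position

# Hypothetical example: the cycle on 4 vertices.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
final = simulate_interchange(edges, vertices, t_max=2.0, rng=random.Random(0))
```

Whatever the graph, the returned map is a bijection of the vertex set, i.e. a permutation.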
Since this generator is symmetric and irreducible, the law of the process at time $t$ converges to that of a uniform permutation as $t \to \infty$. We shall here be interested in the time-scale on which this convergence occurs, as traditionally measured by the total-variation mixing time:
[TABLE]
Understanding the relation between this fundamental quantity and the geometry of $G$ is a challenging problem, to which a remarkable variety of tools have been applied: representation theory [13, 9], couplings [25, 32, 2, 29], eigenvectors [32, 20], functional inequalities [24, 33, 8], comparison methods [11, 4], etc. The question is of course particularly meaningful when the number of states becomes large, and one is thus naturally led to study asymptotics along various growing sequences of graphs $(G_n)_{n \ge 1}$.
The case of the clique (complete graph) on $n$ vertices has been extensively studied under the name random transposition shuffle. In particular, Diaconis and Shahshahani [13] proved that
[TABLE]
In fact, this was shown for any precision $\varepsilon \in (0,1)$ instead of the fixed threshold in (2), thereby establishing the very first instance of what is now called a cutoff phenomenon [10]. Another well-understood case is the path on $n$ vertices, for which Lacoin [23] recently proved cutoff at time
[TABLE]
There are, however, many simple graph sequences along which even the order of magnitude of the mixing time is unknown. An emblematic example (which was the initial motivation for our work) is the boolean hypercube $\mathbb{Z}_2^n$, for which Wilson [32] conjectured in 2004 that
$$t_{\mathrm{mix}}^{\mathrm{ip}}(\mathbb{Z}_2^n) \asymp n. \tag{5}$$
This was reiterated as Problem 4.2 of the AIM workshop Markov Chains Mixing Times [28]. Here and throughout the paper, $\asymp$ and $\lesssim$ denote equality and inequality up to universal positive multiplicative constants. The current best estimates are
$$n \lesssim t_{\mathrm{mix}}^{\mathrm{ip}}(\mathbb{Z}_2^n) \lesssim n \log n. \tag{6}$$
The lower bound is due to Wilson [32, Section 9.1], and the upper bound was recently obtained by Alon and Kozma [4, Corollary 10] as a special case of a much more general estimate which we will now discuss.
1.2 The big picture
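An important observation about the ip is that the motion of a single particle is itself a Markov process. The generator is the usual graph Laplacian, which acts on functions $f \colon V_G \to \mathbb{R}$ by
$$(\mathcal{L}^{\mathrm{rw}}_G f)(x) := \sum_{y \sim x} \bigl( f(y) - f(x) \bigr), \tag{7}$$
where the sum runs over the neighbors $y$ of $x$ in $G$.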
It is natural to expect the mixing properties of the ip and of the single-particle dynamics to be intimately related. Indeed, a celebrated conjecture of Aldous, now resolved by Caputo, Liggett and Richthammer [8], asserts that the relaxation times (inverse spectral gaps) of these two operators coincide:
$$t_{\mathrm{rel}}^{\mathrm{ip}}(G) = t_{\mathrm{rel}}^{\mathrm{rw}}(G). \tag{8}$$
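The identity between the two spectral gaps can be checked numerically on a very small graph. The sketch below is only an illustration under stated assumptions: the helper names (`rw_laplacian`, `ip_generator`, `spectral_gap`) and the choice of the path on 4 vertices are hypothetical, and the ip generator is built by brute force over all 24 permutations, which is only feasible for tiny graphs.

```python
import itertools
import numpy as np

def rw_laplacian(n, edges):
    """Generator of the single-particle walk with unit edge rates."""
    L = np.zeros((n, n))
    for x, y in edges:
        L[x, y] += 1.0
        L[y, x] += 1.0
        L[x, x] -= 1.0
        L[y, y] -= 1.0
    return L

def ip_generator(n, edges):
    """Generator of the interchange process, indexed by all arrangements."""
    perms = list(itertools.permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}
    Q = np.zeros((len(perms), len(perms)))
    for p in perms:                       # p[x] = particle sitting at vertex x
        for x, y in edges:
            q = list(p)
            q[x], q[y] = q[y], q[x]       # swap the particles across the edge
            Q[index[p], index[tuple(q)]] += 1.0
            Q[index[p], index[p]] -= 1.0
    return Q

def spectral_gap(Q):
    """Smallest non-zero eigenvalue of -Q for a symmetric generator Q."""
    return np.linalg.eigvalsh(-Q)[1]

path_edges = [(0, 1), (1, 2), (2, 3)]
gap_rw = spectral_gap(rw_laplacian(4, path_edges))
gap_ip = spectral_gap(ip_generator(4, path_edges))
```

The relaxation times are the inverses of `gap_rw` and `gap_ip`; on this example both gaps equal $2 - \sqrt{2}$ up to rounding.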
Recall that the relaxation time classically controls the mixing time of the single-particle dynamics (7), up to a correction which is only logarithmic in the number of vertices:
[TABLE]
Inspired by the identity (8), Oliveira [29] conjectured that the same control applies to the mixing time of the ip. More precisely, he proposed the following simple-looking but far-reaching estimate, which is sharp in the three very different graph examples mentioned above (see Table 1).
Conjecture 1 (Oliveira [29]).
For any connected graph $G$,
[TABLE]
Some partial progress on Conjecture 1 can be found in [18, 17]. It is easy to prove that the right-hand side of (10) is comparable, up to universal constants, to the mixing time of $|V_G|$ independent particles [18, §2.1], and thus Oliveira’s conjecture has the following probabilistic interpretation: the mixing time of the interchange process is at most some universal constant multiple of the mixing time of $|V_G|$ independent particles (in fact, this is how it is phrased in [29]). We also note that under a mild spectral condition one has that [18, Theorem 1.4], but that in general one can have that is of smaller order than . (Footnote 1: One such example can be obtained by taking to be the graph obtained by attaching a path of some diverging length to a clique of size , where . A similar example is analyzed in [16], where it is shown that the order of the total variation mixing time of the interchange process may increase as a result of increasing some of the edge rates by a multiplicative factor, or by adding a small number of edges to the base graph, in a manner that makes the original graph and the new graph quasi-isometric.)
One of the most powerful techniques to bound the mixing time of a complicated Markov chain consists in comparing its Dirichlet form with that of a better understood chain having the same state space and stationary law, see the seminal paper by Diaconis and Saloff-Coste [11]. In the case of ip, the Dirichlet form is given by
[TABLE]
and a natural candidate for the comparison is the mean-field version , where denotes the complete graph on the same vertex set as . Let us therefore define the comparison constant of the ip on as the smallest number such that the inequality
[TABLE]
holds for all . This constant is the optimal price to pay in order to systematically transfer quantitative estimates from ip on to ip on . In a recent breakthrough, Alon and Kozma [4, Theorem 1] established the following remarkably general estimate.
Theorem 1 (Alon and Kozma [4]).
For any regular connected graph $G$,
[TABLE]
In particular, they deduced the following bound on the mixing times.
Corollary 1 (Alon and Kozma [4]).
For any regular connected graph $G$,
[TABLE]
Note that this proves Conjecture 1 along sequences satisfying . Examples include , , or the discrete tori , etc. On the other hand, for various other graphs such as the hypercube or bounded-degree expanders, one has
[TABLE]
and Theorem 1 fails at capturing the conjectured asymptotics. In light of this, the next step towards Conjecture 1 should naturally consist in understanding the mixing properties of the ip on graphs satisfying (15). This is precisely the program to which the present paper is intended to contribute.
2 Results
2.1 Comparison constant and mixing time
A natural and important class of graphs satisfying (15) are the “high-dimensional” graphs obtained by taking cartesian products of a large number of small graphs. Recall that the cartesian product $G = G_1 \times \cdots \times G_n$ of graphs $G_1, \ldots, G_n$ is the graph with vertex set $V_{G_1} \times \cdots \times V_{G_n}$, and where the neighbors of a vertex $x = (x_1, \ldots, x_n)$ are obtained by replacing an arbitrary coordinate $x_i$ ($1 \le i \le n$) with an arbitrary neighbor of $x_i$ in the graph $G_i$. Note that $G$ is connected as soon as $G_1, \ldots, G_n$ are. We will allow the dimension $n$ to grow arbitrarily but will keep the side-length fixed, meaning that
[TABLE]
for some fixed integer $\ell \ge 2$. Two simple examples are the $n$-dimensional torus $\mathbb{Z}_\ell^n$ and the $n$-dimensional Hamming graph $K_\ell^n$. In particular, when $\ell = 2$, we recover the hypercube. Our main result is the determination of the exact order of magnitude of the comparison constant on all product graphs of fixed side-length.
Theorem 2 (Comparison).
All connected product graphs $G$ of side-length $\ell$ satisfy
[TABLE]
where $\asymp_\ell$ means equality up to multiplicative constants that depend only on $\ell$.
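The cartesian product construction underlying Theorem 2 can be sketched in a few lines. This is an illustration only: the adjacency-dictionary representation and the function name `cartesian_product` are assumptions made for the example, not notation from the paper.

```python
import itertools

def cartesian_product(factors):
    """Cartesian product of graphs given as adjacency dicts {vertex: set_of_neighbours}."""
    vertices = list(itertools.product(*[sorted(g) for g in factors]))
    adjacency = {v: set() for v in vertices}
    for v in vertices:
        for i, g in enumerate(factors):
            # Replace coordinate i by one of its neighbours in the i-th factor.
            for w in g[v[i]]:
                adjacency[v].add(v[:i] + (w,) + v[i + 1:])
    return adjacency

K2 = {0: {1}, 1: {0}}                     # complete graph on 2 vertices
K3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}    # complete graph on 3 vertices

hypercube = cartesian_product([K2] * 3)   # side-length 2, dimension 3
hamming = cartesian_product([K3] * 2)     # Hamming graph, side-length 3, dimension 2
```

In the hypercube every vertex has one neighbour per coordinate; in the Hamming graph of side-length 3 every vertex has two neighbours per coordinate.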
Our estimate on classically yields an upper bound on the mixing time, even in the strong sense (see [4, Lemma 6]). Moreover, a standard application of Wilson’s method (see [20, Proposition 1.2]) yields a matching lower bound. We thus obtain the following result, which confirms in particular Wilson’s long-standing prediction (5).
Corollary 2 (Mixing time).
All connected product graphs $G$ of side-length $\ell$ satisfy
[TABLE]
Note that on product graphs, the single-particle dynamics (7) updates each coordinate independently. Consequently, any connected product graph of side-length satisfies
[TABLE]
(The double logarithm comes from the fact that there are $n$ coordinates, and that the time it takes to update all of them a constant number of times is logarithmic in the number of coordinates.) Thus, our Corollary 2 resolves Conjecture 1 for all product graphs of fixed side-length, in a regime where Theorem 1 always fails at doing so.
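The logarithmic growth of the single-particle mixing time can be observed exactly on the hypercube. The sketch below is an illustration under explicit assumptions: with unit edge rates each coordinate flips at rate 1, so at time $t$ each coordinate differs from its initial value independently with probability $(1 - e^{-2t})/2$; the threshold `eps = 0.25` and the function names are choices made for this example, not taken from the paper.

```python
import math

def tv_to_uniform(n, t):
    """Exact total-variation distance to uniform for one particle on the n-cube (t > 0)."""
    p = (1.0 - math.exp(-2.0 * t)) / 2.0           # chance a given coordinate is flipped
    total = 0.0
    for k in range(n + 1):                         # k = number of flipped coordinates
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        law_t = math.exp(log_binom + k * math.log(p) + (n - k) * math.log1p(-p))
        uniform = math.exp(log_binom - n * math.log(2.0))
        total += abs(law_t - uniform)
    return total / 2.0

def mixing_time(n, eps=0.25):
    """First time at which the distance drops below eps, found by bisection."""
    lo, hi = 1e-6, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if tv_to_uniform(n, mid) > eps:
            lo = mid
        else:
            hi = mid
    return hi

t_small = mixing_time(64)
t_big = mixing_time(4096)
```

Multiplying the dimension by 64 only increases the computed mixing time by an additive amount, in line with growth of order $\log n$.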
Remark 1 (Pre-cutoff).
Let us comment on the constants hidden in our results, at least for the Hamming graph $K_\ell^n$. Wilson’s eigenvector method [32] produces the precise lower bound
[TABLE]
with the constant being completely explicit (for example, we have ). On the other hand, our Corollary 2 guarantees that
[TABLE]
for some constant that can certainly be made explicit as well, by a careful examination of our proof. However, we did not try to optimize the value of this constant, nor even to extract its rough dependence on $\ell$, because we believe that our comparison-based approach is inherently too rough to produce sharp constants anyway. Nevertheless, we note that neither Wilson’s lower bound nor our upper bound changes if we replace the threshold in the definition (2) by any other precision $\varepsilon \in (0,1)$, thereby establishing what is known as a pre-cutoff. Improving this to a true cutoff (i.e. ) remains a fascinating open problem.
We would like to close this section with a plausible extension of Theorem 2, inspired by an analogous result that we recently obtained for the Zero-Range Process [19, Corollary 3].
Conjecture 2 (General comparison).
All finite connected graphs $G$ satisfy
[TABLE]
Note that a proof of this would immediately imply Conjecture 1.
2.2 Emergence of macroscopic cycles
One statistic of particular interest is the cycle structure of the random permutation produced by the ip, as a function of the time $t$. On the infinite $d$-dimensional lattice $\mathbb{Z}^d$ with $d \ge 3$, a long-standing conjecture of Tóth [31] predicts a phase transition, indicated by the sudden emergence of infinite cycles at some critical time. This is related to a major open problem about the so-called quantum Heisenberg ferromagnet in statistical mechanics. To the best of our knowledge, the phase transition has only been proved on infinite regular trees [5, 15].
In the case of a large finite graph , the relative lengths of cycles in a uniform random permutation asymptotically follow the Poisson-Dirichlet distribution (see, e.g., [30]). In particular, is likely to contain a macroscopic cycle at time . By analogy with Tóth’s conjecture, one should however expect macroscopic cycles to emerge much before the mixing time. This was established in a precise sense by Schramm [30] in the mean-field case where , see also [6, 7]. Alas, results on other finite graphs are quite limited. In [3], Alon and Kozma obtained intriguing identities – involving the irreducible representations of the symmetric group – for the expected number of cycles of a given size in on any finite graph. Using these identities, they obtained a comparison-based estimate on the quantity
[TABLE]
Theorem 3 (Alon and Kozma [4]).
All finite graphs $G$ satisfy
[TABLE]
Thus, our main result implies that on high-dimensional graphs, macroscopic cycles do indeed emerge much before a single particle even mixes (recall (20)).
Corollary 3 (Giant cycles).
All connected product graphs $G$ of fixed side-length $\ell$ satisfy
[TABLE]
We note that Corollary 3 may not be sharp: it is actually quite possible that the macroscopic cycles already emerge at time (where is the number of terms in the product), although proving this would require new ideas beyond the Alon-Kozma estimate (25). When specialized to the hypercube , Corollary 3 complements a result of Koteckỳ, Miłoś and Ueltschi [22] regarding the appearance of mesoscopic cycles. It also complements a recent result by Adamczak, Kotowski and Miłoś [1], who established a phase transition for the emergence of macroscopic cycles on the dimensional Hamming graph . Finally, we note that, by virtue of [4, Theorem 13], our main result also has direct implications on the magnetisation of the quantum Heisenberg ferromagnet.
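To make the notion of a macroscopic cycle concrete, the following sketch computes the cycle lengths of a permutation and estimates, by simulation, the expected fraction of points lying in the largest cycle of a uniform random permutation. The helper names and the sample sizes are assumptions made for this illustration; it concerns the uniform (equilibrium) permutation only and does not reproduce the finite-time estimates discussed above.

```python
import random

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list (perm[i] = image of i), largest first."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:                # follow the cycle through `start`
            seen[x] = True
            x = perm[x]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)

def mean_largest_cycle_fraction(n, samples, rng):
    """Monte Carlo estimate of E[largest cycle length] / n for a uniform permutation."""
    total = 0.0
    for _ in range(samples):
        perm = list(range(n))
        rng.shuffle(perm)                 # uniform random permutation
        total += cycle_lengths(perm)[0] / n
    return total / samples

estimate = mean_largest_cycle_fraction(400, 300, random.Random(0))
```

The estimate should land near 0.62, the known limiting value (the Golomb-Dickman constant) for uniform permutations, so the largest cycle of a uniform permutation typically occupies a positive fraction of the points.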
2.3 Exclusion process
Another widely-studied interacting particle system is the exclusion process [14, 27, 26, 21]. For a finite graph $G$ and an integer $k$, the $k$-particle exclusion process (ex-k) on $G$ is a Markov chain on the set of $k$-element subsets of $V_G$, with generator given by
[TABLE]
where denotes the symmetric difference and the edge-boundary of in . This process describes the set occupied by fixed particles under the ip. More precisely, the ex-k with initial condition can be constructed from the ip by setting . This observation, together with (8), easily implies that
[TABLE]
where denotes the complete graph on , and the optimal constant in the functional inequality . Recalling (19) and the fact that , we obtain the following corollary.
Corollary 4 (Comparison constant for ex-k).
For all connected product graphs of side-length , and all , we have
As a consequence, one can transfer many quantitative estimates from to . This includes the inverse log-Sobolev constant , defined as the smallest number such that
[TABLE]
for all , where is expectation under the uniform law. This constant provides powerful controls on the underlying Markov semi-group [12]. It is easy to see that
[TABLE]
On the other hand, the log-Sobolev constant of the exclusion process on the complete graph (Bernoulli-Laplace model) was determined by Lee and Yau [24, Theorem 5]:
[TABLE]
In particular, this allows us to pinpoint the exact order of in the dense-particle regime.
Corollary 5 (Log-Sobolev constant of ex-k).
Fix , . Then, for all connected product graphs of side-length and all , we have
Finally, we note that our main result also implies an upper bound of order $n$ (uniformly in $k$) on the mixing time of ex-k on the hypercube $\mathbb{Z}_2^n$, complementing a total-variation estimate recently obtained by Hermon and Pymar [18] (as part of a much more general result).
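The construction of the exclusion process as a projection of the ip can be checked mechanically. In the sketch below (an illustration with hypothetical names, tracking only the sequence of clock rings and not their times), the occupied set is updated in two ways: by following the tagged particles through the ip, and by the direct rule "flip the two endpoints whenever the ringing edge lies in the edge-boundary of the occupied set". The assertion inside the loop verifies that both descriptions agree at every step.

```python
import random

def run_coupled(edges, vertices, initial_set, n_rings, rng):
    """Run ip and ex-k on the same clock rings and check that they stay consistent."""
    position = {v: v for v in vertices}   # particle -> current vertex
    occupant = {v: v for v in vertices}   # vertex -> particle
    occupied = set(initial_set)           # state of the exclusion process
    for _ in range(n_rings):
        x, y = rng.choice(edges)
        # Interchange update: swap the two particles on the edge.
        px, py = occupant[x], occupant[y]
        occupant[x], occupant[y] = py, px
        position[px], position[py] = y, x
        # Exclusion update: only boundary edges change the occupied set.
        if (x in occupied) != (y in occupied):
            occupied ^= {x, y}            # symmetric difference with the edge
        assert occupied == {position[a] for a in initial_set}
    return occupied

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = run_coupled(square, [0, 1, 2, 3], {0, 1}, 200, random.Random(0))
```

The number of occupied sites is conserved, as expected for $k$ particles that never coalesce.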
3 Proof of the main result
3.1 Proof outline
The lower bound in Theorem 2 is easy. Indeed, if is any finite graph and if denotes the complete graph on , then the very definition of implies
[TABLE]
where the first equality uses (8). For a graph product of side-length , we deduce
[TABLE]
The remainder of the paper is devoted to proving a matching upper bound. To do so, we combine four simple ideas, each one corresponding to a step of the proof.
1. Our first step consists in reducing the analysis of ip on a general $n$-dimensional graph-product of side-length $\ell$ to the special case of the Hamming graph $K_\ell^n$. This reduction relies on the classical method of canonical paths. An important simplification is that, by a standard path-lifting procedure, it is actually enough to just compare the single-particle dynamics on the product graph to that on $K_\ell^n$. See Section 3.2 for details.
2. Our second step consists in re-interpreting the single-particle dynamics on $K_\ell^n$ as a random walk on the additive group $\mathbb{Z}_\ell^n$, with the increment law being uniform over vectors with a single non-zero coordinate. This algebraic reformulation is performed in Section 3.3. It will allow us to use group-theoretical methods.
3. The third step consists in exploiting the celebrated octopus inequality [8, Theorem 2.3] to compare the ip with a given increment law to the ip whose increment law is its $t$-fold convolution, at a cost of order $t$. This is directly inspired by what Alon and Kozma did in [4]. However, instead of taking $t$ large enough to ensure that the convolution is close to uniform (all coordinates being refreshed with high probability), we crucially take $t$ of order $n$ only, with the prefactor being carefully adjusted so that only roughly half of the coordinates get refreshed under the convolution. This important point is made rigorous by an application of the de Moivre-Laplace Local Limit Theorem, see Section 3.4.
4. Finally, the last step consists in showing that, although the increment law is still very far from uniform (because of our choice of $t$), the associated Dirichlet form is actually comparable to the one with uniform increments. This is achieved by constructing canonical paths of length $2$, the underlying intuition being that randomizing all coordinates of a vector can be achieved by randomizing half the coordinates in one step, and the other half in a second step. This is described in Section 3.5.
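The intuition behind the last step, that two "half refreshes" compose into a full refresh, can be verified by exhaustive enumeration on a small example. The sketch below is a simplified illustration: it fixes one particular split of the coordinates and lets refreshed coordinates take any value (including zero), which is an assumption of this example and not the exact construction used in Section 3.5.

```python
import itertools
from collections import Counter

def vectors_supported_in(coords, n, ell):
    """All vectors of Z_ell^n that vanish outside the coordinate set `coords`."""
    out = []
    for values in itertools.product(range(ell), repeat=len(coords)):
        z = [0] * n
        for c, v in zip(coords, values):
            z[c] = v
        out.append(tuple(z))
    return out

n, ell = 4, 3
first_half, second_half = [0, 1], [2, 3]

# Add one increment supported on each half and count how often each sum appears.
counts = Counter(
    tuple((a + b) % ell for a, b in zip(z1, z2))
    for z1 in vectors_supported_in(first_half, n, ell)
    for z2 in vectors_supported_in(second_half, n, ell)
)
```

Every one of the $\ell^n = 81$ vectors is produced exactly once, so the sum of two independent uniform half-refreshes is uniform on the whole group.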
3.2 Canonical paths
Our starting point is a powerful tool for comparing Dirichlet forms known as canonical paths, see e.g., [11]. As a warm-up, consider the single-particle dynamics (7) with Dirichlet form
[TABLE]
As usual, a path in $G$ will be a finite sequence of vertices $\gamma = (x_0, \ldots, x_k)$ such that $\{x_{i-1}, x_i\} \in E_G$ for each $1 \le i \le k$. We call $k$ the length of the path and denote it by $|\gamma|$. Also, we refer to $x_0, x_k$ as the endpoints of the path $\gamma$, and to $\{x_{i-1}, x_i\}$, $1 \le i \le k$, as the traversed edges. By a random path in $G$, we simply mean a random variable taking values in the (countable) set of all paths in $G$. We write $\mathbb{E}$ for the corresponding expectation.
Theorem 4 (Canonical paths, see e.g. [11]).
Let , be connected graphs on the same vertex set. For each edge , let be any random path in with the same endpoints as . Then, where is the congestion, defined as follows:
[TABLE]
We now make three elementary but important remarks.
Remark 2 (Trivial choice).
Even in the worst-case situation where is the complete graph on , we can always achieve the poor bound
[TABLE]
by considering a spanning tree of and letting be the unique simple path in connecting the endpoints of . Note that this path is actually non-random. Exploiting randomness and the particular structure of to design paths with a low congestion is a matter of art.
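The spanning-tree choice of Remark 2 can be tested numerically in the single-particle setting. The sketch below assumes the standard form of the congestion for unit edge rates, namely the maximum over edges $e$ of the base graph of the sum of $|\gamma_{e'}|$ over all comparison edges $e'$ whose path traverses $e$; since the displayed formula is not reproduced here, this form, the helper names and the example (a path on 6 vertices compared with the complete graph) should all be read as assumptions.

```python
import itertools
import numpy as np

def dirichlet_matrix(n, edges):
    """Matrix L with f^T L f = sum over edges {x,y} of (f(x) - f(y))^2."""
    L = np.zeros((n, n))
    for x, y in edges:
        L[x, x] += 1.0
        L[y, y] += 1.0
        L[x, y] -= 1.0
        L[y, x] -= 1.0
    return L

n = 6
path_edges = [(i, i + 1) for i in range(n - 1)]
complete_edges = list(itertools.combinations(range(n), 2))

# Route each pair {x, y} of the complete graph along the unique path from x to y.
load = {e: 0 for e in path_edges}
for x, y in complete_edges:               # combinations guarantees x < y
    for i in range(x, y):
        load[(i, i + 1)] += y - x         # path length times one traversal
congestion = max(load.values())

# Optimal constant c in E_complete(f) <= c * E_path(f), via a generalized eigenvalue.
L_path = dirichlet_matrix(n, path_edges)
L_complete = dirichlet_matrix(n, complete_edges)
w, U = np.linalg.eigh(L_path)
inv_sqrt = U[:, 1:] @ np.diag(w[1:] ** -0.5) @ U[:, 1:].T   # skip the constant mode
optimal = np.linalg.eigvalsh(inv_sqrt @ L_complete @ inv_sqrt)[-1]
```

On this example the congestion of the deterministic paths is 27, while the optimal comparison constant is about 22.4, so the canonical-path bound is valid but not tight.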
Remark 3 (Congestion behaves well under products).
If for , we can compare to with congestion , then we can compare to with congestion
[TABLE]
by considering paths that only evolve along a single coordinate, in the obvious way.
Remark 4 (Cayley graphs).
Theorem 4 simplifies when and are Cayley graphs generated by subsets of a finite group . Indeed, any word can be used to define a path
[TABLE]
in from to , where is the evaluation of in . Consequently, we only have to specify, for each , a random word over whose evaluation is . Moreover, a straightforward computation shows that the resulting congestion is simply
[TABLE]
where denotes the length of a word , and the number of occurrences of in it.
Remark 4 applies in particular to the ip on any graph . Indeed, one has
[TABLE]
with and . Moreover, any path in with endpoints and traversed edges can be lifted to a word over that evaluates to , namely:
[TABLE]
Since the congestion is multiplied by at most $4$ (a factor $2$ for the length of the word, and a factor $2$ for the number of occurrences of a letter in it), we obtain the following classical result.
Corollary 6 (From canonical paths for rw to canonical paths for ip).
Under the exact same assumptions (and notation) as in Theorem 4, we also have
[TABLE]
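The path-lifting step can be made explicit. The sketch below assumes the usual palindromic lift of a simple path $x_0, \ldots, x_k$: swap along the traversed edges in order, then undo all but the last swap in reverse order. The function names are hypothetical, and the path is assumed to be simple (no repeated vertex).

```python
from collections import Counter

def lift_path(path):
    """Palindromic word of edge transpositions lifting a simple path."""
    steps = list(zip(path[:-1], path[1:]))    # traversed edges, in order
    return steps + steps[-2::-1]              # forward, then back without the last edge

def apply_word(word, n):
    """Apply the transpositions in `word` successively to the identity arrangement."""
    arrangement = list(range(n))
    for x, y in word:
        arrangement[x], arrangement[y] = arrangement[y], arrangement[x]
    return arrangement

path = [0, 2, 5, 3]
word = lift_path(path)
result = apply_word(word, 6)
```

Here `result` is the identity arrangement with the entries at the endpoints 0 and 3 exchanged. The word has length $2k - 1$ and uses each traversed edge at most twice, which accounts for the two factors of $2$ mentioned above.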
Combining this with Remarks 2 and 3, we obtain the following inequality, which reduces the upper bound of Theorem 2 to the extremal case where is the Hamming graph:
[TABLE]
Corollary 7.
For any $n$-dimensional connected product graph $G$ of side-length $\ell$,
[TABLE]
In light of this result, the upper bound in Theorem 2 now boils down to the claim
[TABLE]
for each , to which the remainder of the paper is devoted.
3.3 The octopus inequality
From now on, we fix the side-length and the dimension . Writing for the complete graph on , our goal is to establish the comparison
[TABLE]
where does not depend on . We start by observing that the random walks on and on can both be conveniently viewed as random walks on the group
[TABLE]
equipped with coordinate-wise addition mod (which we will simply denote by ). Given a probability measure on , we recall that the random walk with increment law has Dirichlet form
[TABLE]
for all . In particular, we have the representation
[TABLE]
where and respectively denote the uniform distributions on and on
[TABLE]
Here naturally denotes the support of . Similarly, the ip on with increment law has Dirichlet form
[TABLE]
for , with the interpretation when . In view of (49) (with ip instead of rw), our claim (46) rewrites as
[TABLE]
for some (possibly different) constant that is only allowed to depend on . The proof will crucially rely on the following elegant application of the octopus inequality [8, Theorem 2.3], which we borrow from Alon and Kozma [4]. We include a short proof as our setting is here slightly different. The convolution of two probability measures on is defined by
[TABLE]
Also, we say that a measure on is symmetric if for all .
Lemma 3 (Comparison for convolutions).
For any symmetric probability measure on ,
[TABLE]
Proof.
If , the octopus inequality [8, Theorem 2.3] asserts that
[TABLE]
for all and (the factor on the right-hand side compensates for the fact that we are here summing over all ordered pairs ). Averaging over all , and applying the (bijective) change of variables on the right-hand side, we obtain
[TABLE]
which is precisely by symmetry of . This proves the claim when . In the general case, we write with , and we observe that
[TABLE]
Thus, the claim is equivalent to . ∎
For reasons that will become clear later, we henceforth set
[TABLE]
Let us introduce the measure defined by
[TABLE]
Since this measure is symmetric and $t$ is a power of $2$, we may iterate Lemma 3 to get
[TABLE]
where the measure on the left-hand side is the $t$-fold convolution of the measure introduced above. Thus, our goal (52) now boils down to showing that
[TABLE]
for some constant that only depends on . To this end, we analyze the convolution accurately using the de Moivre - Laplace Local Limit Theorem.
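The iteration of Lemma 3 along powers of two has a direct computational analogue: a $2^j$-fold convolution is obtained by squaring $j$ times. The sketch below illustrates this on the group $\mathbb{Z}_\ell^n$ with measures stored as dictionaries. The names are hypothetical, and the starting measure (uniform on vectors with exactly one non-zero coordinate, as in the second step of the proof outline) is a stand-in: it is not claimed to be the exact measure introduced above.

```python
from collections import defaultdict

def convolve(mu, nu, ell):
    """Convolution of two measures on Z_ell^n, stored as {vector: mass}."""
    out = defaultdict(float)
    for x, px in mu.items():
        for y, py in nu.items():
            z = tuple((a + b) % ell for a, b in zip(x, y))
            out[z] += px * py
    return dict(out)

def is_symmetric(mu, ell, tol=1e-12):
    """Check mu(-z) = mu(z) for every z in the support."""
    return all(
        abs(p - mu.get(tuple((-a) % ell for a in x), 0.0)) <= tol
        for x, p in mu.items()
    )

def single_site_law(n, ell):
    """Uniform law on vectors with exactly one non-zero coordinate."""
    support = [
        tuple(v if j == i else 0 for j in range(n))
        for i in range(n)
        for v in range(1, ell)
    ]
    return {z: 1.0 / len(support) for z in support}

def convolution_power_of_two(mu, ell, doublings):
    """Return the (2 ** doublings)-fold convolution of mu by repeated squaring."""
    for _ in range(doublings):
        mu = convolve(mu, mu, ell)
    return mu

mu = single_site_law(3, 3)
mu_8 = convolution_power_of_two(mu, 3, 3)     # 8-fold convolution
```

Symmetry is preserved by each squaring, which is what allows a doubling lemma to be applied repeatedly.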
3.4 Local Limit Theorem
As a warm-up, consider the binomial expansion of the uniform law: , where
[TABLE]
The classical de Moivre - Laplace Local Limit Theorem provides uniform estimates on the coefficients . Although a specific value of was chosen at (58), the statement is of course valid for any .
Theorem 5 (de Moivre - Laplace).
There is depending only on such that
[TABLE]
for all , with .
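The classical local limit theorem can be observed numerically. The sketch below compares the Binomial$(n, p)$ probability mass function with its Gaussian proxy and reports the largest discrepancy, rescaled by $\sqrt{n}$. The choice $p = 1 - 1/\ell$ with $\ell = 3$ (the probability that a uniform coordinate of $\mathbb{Z}_\ell$ is non-zero) and the function names are assumptions of this illustration; the exact form of the estimate in Theorem 5 is not reproduced.

```python
import math

def binomial_pmf(n, k, p):
    """Binomial(n, p) mass at k, computed in log-space for numerical stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

def gaussian_proxy(n, k, p):
    """Gaussian density with the binomial's mean and variance, evaluated at k."""
    var = n * p * (1.0 - p)
    return math.exp(-((k - n * p) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def scaled_error(n, p):
    """sqrt(n) times the largest gap between the pmf and its Gaussian proxy."""
    return math.sqrt(n) * max(
        abs(binomial_pmf(n, k, p) - gaussian_proxy(n, k, p)) for k in range(n + 1)
    )

p = 1.0 - 1.0 / 3.0
err_100 = scaled_error(100, p)
err_10000 = scaled_error(10000, p)
```

The rescaled error shrinks as $n$ grows, which is the uniform control over the binomial coefficients that the local limit theorem provides.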
We can use this to approximate with , where is defined as follows:
[TABLE]
Lemma 4 (Plateau proxy for ).
There is depending on only, such that
[TABLE]
Proof.
If is a symmetric distribution on , the Cauchy-Schwarz inequality yields
[TABLE]
for all , where denotes the total-variation distance. In particular, when , we obtain for all and hence
[TABLE]
Let us now apply this general observation to the restriction of to :
[TABLE]
Note that , and that thanks to our definition of and Chebyshev’s inequality for the Binomial. Thus, (67) applies and yields
[TABLE]
where the second inequality uses Lemma 3, and the third . Finally, Theorem 5 ensures that is bounded by a quantity which only depends on . ∎
In order to establish (61), we will now approximate by the distribution
[TABLE]
Note that the center of is twice smaller than that of .
Lemma 5 (Plateau proxy for ).
There is depending on only, such that
[TABLE]
Proof.
The convolution with describes the following transformation on valued random variables: pick one of the coordinates uniformly at random and, with probability , replace it with a fresh uniform sample from . Consequently, we may construct a random variable with law by setting
[TABLE]
where are independent random variables with the following laws:
- is binomial with parameters and ;
- are uniform on ;
- are uniform on .
In particular, setting , we have
[TABLE]
and our proof boils down to establishing that
[TABLE]
for some constant that only depends on . Now, conditionally on , the variable
[TABLE]
counts the number of distinct coupons collected by time in the standard coupon-collector problem of size . Thus,
[TABLE]
Recalling our definitions (57), and since is a Binomial variable, we easily deduce
[TABLE]
Consequently, Chebyshev’s inequality yields
[TABLE]
Now, conditionally on , the random variable is just a Binomial with parameters . In particular, Theorem 5 with instead of ensures that
[TABLE]
where only depends on . Combining (80) and (81) establishes the claim. ∎
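The coupon-collector step used in the proof above rests on a simple computation: after $m$ uniform draws among $n$ coupons, the expected number of distinct coupons is $n\,(1 - (1 - 1/n)^m)$. The sketch below checks this closed form against the exact distribution obtained by dynamic programming. The function names and the numerical values are assumptions of this illustration, and the paper's exact parameter choice (57) is not reproduced.

```python
import math

def expected_distinct(n, m):
    """Expected number of distinct coupons after m uniform draws among n."""
    return n * (1.0 - (1.0 - 1.0 / n) ** m)

def distinct_distribution(n, m):
    """Exact law of the number of distinct coupons after m draws (dynamic programming)."""
    dist = [1.0] + [0.0] * n              # dist[j] = P(j distinct coupons so far)
    for _ in range(m):
        new = [0.0] * (n + 1)
        for j, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[j] += prob * j / n        # the draw repeats a known coupon
            if j < n:
                new[j + 1] += prob * (n - j) / n   # the draw reveals a new coupon
        dist = new
    return dist

n, m = 20, 14
dist = distinct_distribution(n, m)
mean_from_dp = sum(j * prob for j, prob in enumerate(dist))

# With about n * log(2) draws, roughly half of the coupons have been collected.
half_check = expected_distinct(1000, round(1000 * math.log(2)))
```

This is the arithmetic behind the heuristic that a number of refreshes of order $n$, with a suitable prefactor, touches about half of the coordinates.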
3.5 Final comparison
In view of Lemmas 4 and 5, our objective (61) now reduces to establishing the following.
Proposition 1 (Final comparison).
There exists depending only on , such that
[TABLE]
The crucial ingredient of the proof is the following lemma.
Lemma 6.
For any with , we have
[TABLE]
Proof.
Let denote a random element from the set
[TABLE]
Then is a random word of length over , whose evaluation is uniform over . By Corollary 6 and Remark 4, we deduce that
[TABLE]
where the congestion is given by
[TABLE]
The second line follows from the observation that and are uniformly distributed on and , respectively. On the other hand, the definitions of imply
[TABLE]
The claim readily follows. ∎
Proof of Proposition 1.
Our definitions of ensure that and that
[TABLE]
In particular, we have
[TABLE]
Now, since for all , we have
[TABLE]
As varies across , this ratio remains bounded away from $0$ and $\infty$, uniformly in $n$. Consequently, Lemma 6 ensures that for all ,
[TABLE]
where depends only on . Inserting this above, we obtain
[TABLE]
This concludes the proof. ∎
References
[1] Radosław Adamczak, Michał Kotowski, and Piotr Miłoś. Phase transition for the interchange and quantum Heisenberg models on the Hamming graph. arXiv preprint arXiv:1808.08902, 2018.
[2] David Aldous and Jim Fill. Reversible Markov chains and random walks on graphs, 2002. Unfinished manuscript. Available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.
[3] Gil Alon and Gady Kozma. The probability of long cycles in interchange processes. Duke Math. J., 162(9):1567–1585, 2013. MR 3079255.
[4] Gil Alon and Gady Kozma. Comparing with octopi. Ann. Inst. Henri Poincaré Probab. Stat., 56(4):2672–2685, 2020. MR 4164852.
[5] Omer Angel. Random infinite permutations and the cyclic time random walk. In Discrete random walks (Paris, 2003), Discrete Math. Theor. Comput. Sci. Proc., AC, pages 9–16. Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2003. MR 2042369.
[6] Nathanaël Berestycki. Emergence of giant cycles and slowdown transition in random transpositions and $k$-cycles. Electron. J. Probab., 16: no. 5, 152–173, 2011. MR 2754801.
[7] Nathanaël Berestycki and Gady Kozma. Cycle structure of the interchange process and representation theory. Bull. Soc. Math. France, 143(2):265–280, 2015. MR 3351179.
[8] Pietro Caputo, Thomas M. Liggett, and Thomas Richthammer. Proof of Aldous’ spectral gap conjecture. J. Amer. Math. Soc., 23(3):831–851, 2010. MR 2629990.
