Limit profile for random transpositions
Lucas Teyssier (ENS Paris)

Lucas Teyssier (footnote: pronunciation: [lyka tesje]. Student at ENS and Sorbonne Université. Current email address: [email protected])
(May 2019)
Abstract, translated from the French (footnote: this article also has a French version)
We present an improvement of Diaconis' upper bound lemma, which makes it possible to compute the limiting value of the distance to stationarity. We then apply it to the random transpositions studied by Diaconis and Shahshahani.
Abstract, translated from the Esperanto (footnote: this article also has an Esperanto version)
We present an improvement of Diaconis' upper bound lemma, which enables us to compute the limiting value of the distance to stationarity. We then apply it to the random 2-cycles studied by Diaconis and Shahshahani.
Abstract
We present an improved version of Diaconis’ upper bound lemma, which is used to compute the limiting value of the distance to stationarity. We then apply it to random transpositions studied by Diaconis and Shahshahani.
1 Introduction
1.1 Main results
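Let $\mathfrak{S}_n$ be the symmetric group of index $n$ and $P_n$ the probability on $\mathfrak{S}_n$ defined by
$$P_n(\sigma) = \begin{cases} 1/n & \text{if } \sigma = \mathrm{Id}, \\ 2/n^2 & \text{if } \sigma \text{ is a transposition}, \\ 0 & \text{otherwise.} \end{cases}$$
This is the random transposition shuffle on $\mathfrak{S}_n$, as studied in a landmark paper of Diaconis and Shahshahani [8].
Let also $U_n$ be the uniform probability on $\mathfrak{S}_n$. If $E$ is a set and $\mu$, $\nu$ are probabilities on $E$, we define the total variation distance (footnote: in the proofs we will use the $L^1$ distance, in order not to carry the factor $1/2$) between $\mu$ and $\nu$ by the formula
$$d_{TV}(\mu, \nu) = \frac{1}{2} \sum_{x \in E} \lvert \mu(x) - \nu(x) \rvert.$$
In [8], Diaconis and Shahshahani showed that this random walk undergoes a cutoff phenomenon at $\frac{1}{2} n \log n$, i.e., letting $d_n(t) = d_{TV}(P_n^{*t}, U_n)$, that for all $\varepsilon > 0$,
$$d_n\left(\left\lfloor (1-\varepsilon) \tfrac{1}{2} n \log n \right\rfloor\right) \to 1 \quad \text{and} \quad d_n\left(\left\lfloor (1+\varepsilon) \tfrac{1}{2} n \log n \right\rfloor\right) \to 0 \quad \text{as } n \to \infty.$$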
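For very small decks the distance to stationarity can be computed exactly by brute force, which makes the definitions above concrete. The following is a minimal sketch, not the method of the paper: it assumes the step law gives mass 1/n to the identity and 2/n^2 to each transposition (two positions chosen independently and uniformly), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def step(law, n):
    """One step of the walk: swap positions i and j, chosen independently
    and uniformly (i == j leaves the permutation unchanged)."""
    weight = Fraction(1, n * n)
    new_law = {}
    for perm, p in law.items():
        for i in range(n):
            for j in range(n):
                q = list(perm)
                q[i], q[j] = q[j], q[i]
                key = tuple(q)
                new_law[key] = new_law.get(key, 0) + p * weight
    return new_law

def tv_to_uniform(law, n):
    """Total variation distance between `law` and the uniform law on S_n."""
    u = Fraction(1, factorial(n))
    return sum(abs(law.get(s, 0) - u) for s in permutations(range(n))) / 2

def tv_curve(n, t_max):
    """Exact distances to uniformity at times 0, 1, ..., t_max, started at the identity."""
    law = {tuple(range(n)): Fraction(1)}
    curve = [tv_to_uniform(law, n)]
    for _ in range(t_max):
        law = step(law, n)
        curve.append(tv_to_uniform(law, n))
    return curve
```

The cost grows like n! times n^2 per step, so this is only usable for n up to about 7; it is meant as a sanity check of the definitions, not as a way to see the cutoff.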
Despite a lot of work on mixing times in general and on random transpositions in particular (see references below), obtaining a precise description of the way this transition occurs has remained an open problem, formally asked by Nathanaël Berestycki at an AIM workshop on Markov chains mixing times in 2016 (http://aimpl.org/markovmixing/5/).
Our main result is the following:
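Theorem 1.1.
Let $c \in \mathbb{R}$. Then, as $n \to \infty$, we have:
$$d_{TV}\left(P_n^{*\lfloor \frac{1}{2} n \log n + cn \rfloor}, U_n\right) \longrightarrow d_{TV}\left(\mathrm{Poiss}(1 + e^{-2c}), \mathrm{Poiss}(1)\right),$$
where $P_n^{*t}$ is the law of the random transposition walk after $t$ steps, $U_n$ is the uniform probability on the symmetric group, and $\mathrm{Poiss}(\alpha)$ stands for the Poisson law of parameter $\alpha$.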
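The limit in Theorem 1.1 is a distance between two Poisson laws, so it can be evaluated numerically. The sketch below assumes the two parameters are 1 + e^{-2c} and 1; the function names and the truncation rule are illustrative choices, and probabilities are computed in log space so that very negative c does not overflow.

```python
import math

def log_poisson_pmf(lam, k):
    """Logarithm of the Poisson(lam) probability of the value k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def tv_poisson(lam1, lam2):
    """Total variation distance between Poisson(lam1) and Poisson(lam2),
    truncating the sum far in the tail of the larger parameter."""
    top = max(lam1, lam2)
    k_max = int(top + 12 * math.sqrt(top) + 60)
    total = 0.0
    for k in range(k_max + 1):
        p = math.exp(log_poisson_pmf(lam1, k))
        q = math.exp(log_poisson_pmf(lam2, k))
        total += abs(p - q)
    return 0.5 * total

def limit_profile(c):
    """Assumed limit profile: d_TV(Poiss(1 + exp(-2c)), Poiss(1))."""
    return tv_poisson(1.0 + math.exp(-2.0 * c), 1.0)
```

Evaluating `limit_profile` on a grid of values of c traces the shape of the transition from 1 to 0 inside the cutoff window.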
Limiting profile conjectures
We anticipate the limiting profile , which we obtain in our problem if we replace the time by a slightly more natural time, , to arise for many other mixing time problems on , namely the problems where the last things to be mixed are the fixed points. This often seems to be the case when the probability is constant over conjugacy classes. For example, using the formulas in [10], one can adapt the present proof for random -cycles ( fixed) at time , and we conjecture that the same limiting profile still holds for random conjugacy classes of size , as studied in [3], but that it would be technically much harder to adapt the present proof in that case. For this general case, a beautiful formula (Proposition 10.15 in [16]) used in the proof of the Stanley-Féray formula, which allows one to compute any reduced character as an expectation, , might be very useful.
We conjecture that this profile also holds for the random involution walk studied by Megan Bernstein in [4], at time . For other problems where the limiting profile is known, see [1] and [12].
1.2 Links with previous results and idea of the proof
Links with previous results
In 1981, Diaconis and Shahshahani showed in [8], using representations of the symmetric group, a cutoff (footnote: in fact their lower bound is so it is not exactly a cutoff) at for the random transposition shuffle, giving asymptotic inequalities at time , fixed. In 1987, Matthews, in [15], refined these results thanks to a probabilistic proof. In 2011, Berestycki, Schramm and Zeitouni generalized in [2] the previous result to the shuffle by random -cycles, for fixed as , proving a cutoff at , conjectured by Diaconis. Finally, in 2014, Berestycki and Şengül generalized this result again, in [3], to any conjugacy class whose support is , and without representation theory.
The proof in [8] relies on the so-called Diaconis’ upper bound lemma, which leads to a sum over irreducible representations which they delicately bound with representation theory and analysis. Actually, we can observe that the only place where a lot of information on the limit profile is lost (we lose a factor in the limit) is at the very beginning, when the Cauchy-Schwarz inequality is used in the proof of the upper bound lemma. Section 2 presents a remedy to this information loss, improving the upper bound lemma to an approximation lemma (Lemma 2.1) which is asymptotically much more precise. Subsection 4.1, quite technical, generalizes the asymptotic bounds of Diaconis and Shahshahani to any .
Another crucial point of our proof is to pack together, in the sums over the irreducible representations of , all the partitions with the same . More precisely, Subsection 4.2 shows that when is fixed, we can study the sum over the partitions with equal to as a sum over the partitions of the integer , resulting in explicit manipulable formulas.
To understand where the limiting profile comes from, observe that, thanks to the lower bound of Matthews, the key observable is the number of fixed points. The limit profile is the distance between the asymptotic distribution of the number of fixed points of our walk at time $\lfloor \frac{1}{2} n \log n + cn \rfloor$, which is a $\mathrm{Poiss}(1 + e^{-2c})$ distribution, and that of a permutation taken uniformly at random, i.e. $\mathrm{Poiss}(1)$.
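The role of fixed points can be explored by simulation. The sketch below is illustrative only: it runs the walk up to time ⌊(1/2) n log n + cn⌋ (an assumption about the time scale) and averages the number of fixed points. For comparison, a short computation on the position of a single card gives the exact mean 1 + (n-1)(1-2/n)^t after t steps, which is close to 1 + e^{-2c} at that time. All function names are hypothetical.

```python
import math
import random

def fixed_points_after(n, t, rng):
    """Run t steps of the random transposition walk and count fixed points."""
    perm = list(range(n))
    for _ in range(t):
        i = rng.randrange(n)
        j = rng.randrange(n)  # i == j is allowed and changes nothing
        perm[i], perm[j] = perm[j], perm[i]
    return sum(1 for k, v in enumerate(perm) if k == v)

def mean_fixed_points(n, c, trials, seed=0):
    """Empirical mean number of fixed points at time floor(n log(n)/2 + c n)."""
    rng = random.Random(seed)
    t = max(0, math.floor(0.5 * n * math.log(n) + c * n))
    total = sum(fixed_points_after(n, t, rng) for _ in range(trials))
    return t, total / trials

def exact_mean(n, t):
    """Exact expected number of fixed points after t steps."""
    return 1 + (n - 1) * (1 - 2 / n) ** t
```

Comparing a histogram of `fixed_points_after` with a Poisson law would be the natural next step; the mean alone is only a first check.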
Theorem 1.1 stated above gives support to the following conjecture of Nathanaël Berestycki:
Conjecture 1.2.
Let be the first time that all cards have been touched, and let be the state of the deck of cards at this (random) time. Then as .
In other words, the conjecture says that is a stopping time at which the random permutation is well mixed for all practical purposes. Note that at time the permutation contains at least one fixed point, so that cannot converge to zero. Hence, the conjecture implies that is in some strong sense optimal for mixing the deck of cards.
Let us now explain in what way Theorem 1.1 above is related to this conjecture. For any time , let be the random graph which contains an edge if and only if the corresponding transposition has been applied at least once prior to time . Then is essentially a realisation of the Erdős–Rényi random graph with parameters and . It is easy to check that any cycle of the random permutation at time , considered as a set, is a subset of a connected component of . Hence it makes sense to consider the cycle structure of the permutation restricted to any particular connected component of . Let be the largest component of (which is macroscopic if for some , and actually contains all vertices with high probability after time ). is called the giant component of . By a famous result of Schramm [18], the distribution of the lengths of the largest cycles of within , normalised by the total size of the giant component, converges to a Poisson–Dirichlet distribution (in the sense of finite dimensional distributions). Hence these largest cycles can be seen to coincide in the limit with the distribution of a uniform permutation on the giant component (see e.g. [2]). A stronger version of Schramm’s theorem would be the following conjecture (also by N. Berestycki):
Conjecture 1.3.
Suppose for some . Given , the distribution of , is approximately uniform, in the sense that in probability as , where is a uniform permutation on the giant component .
It is not hard to see that Conjecture 1.3 implies Conjecture 1.2. Indeed, Conjecture 1.3 implies a very precise description of the structure of close to the mixing time: if , then according to this conjecture would consist, if of a permutation that is approximately uniform on points, plus an extra fixed point; and would otherwise be indistinguishable from a uniform permutation if . Such a description would imply that
[TABLE]
where is the number of fixed points of . It is furthermore relatively easy to check that and hence, still assuming Conjecture 1.3, we would deduce
[TABLE]
where the extra term in the right hand side accounts precisely for the probability that . Of course, this last display is precisely the content of our Theorem 1.1.
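The containment used in the discussion above, namely that every cycle of the permutation lies inside a connected component of the graph of applied transpositions, can be checked mechanically. The sketch below is an illustration with hypothetical names: it tracks the components with a union-find structure while applying random transpositions.

```python
import random

def cycles_within_components(n, t, seed=0):
    """Apply t random transpositions and report whether every cycle of the
    resulting permutation lies in one connected component of the graph whose
    edges are the transpositions applied so far."""
    rng = random.Random(seed)
    perm = list(range(n))
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(t):
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            perm[i], perm[j] = perm[j], perm[i]
            parent[find(i)] = find(j)  # add the edge {i, j}

    # k and perm[k] belong to the same cycle, so it suffices to compare them.
    return all(find(k) == find(perm[k]) for k in range(n))
```

The check returns True for every seed, since multiplying by a transposition only merges or splits the cycles of its two points, and those two points have just been joined by an edge.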
Organisation of the article
In Section 2, we present the improvement of Diaconis’ upper bound lemma, using the non-commutative Fourier transform, which brings us back to group representations. In Section 3, we will recall some results on the representations of the symmetric group, get precise estimations of the hook-length and Murnaghan-Nakayama combinatorial formulas when the size of our partitions tends to infinity with constant, and we will prove some upper bounds useful in the sequel. In Section 4, we will prove the announced theorem, decomposing approximation by approximation. From now on, will denote without ambiguity the integer
[TABLE]
Idea of the proof
The algebraic objects and will be defined at the beginning of Section 2. For all , will denote the number of fixed points of the permutation . For , let us also define the polynomial by the formula . The idea is to first fix , and then to define for all an integer such that when tends to infinity, all the following approximations are true up to .
Rewriting the sum using the Fourier transform and the improvement of Diaconis’ lemma,
[TABLE]
Then, thanks to the polynomial convergence lemma and letting , we will get
[TABLE]
Finally, letting ,
[TABLE]
2 Improvement of Diaconis’ upper bound lemma
In this section we present the improvement of Diaconis’ upper bound lemma. We will stay in the framework of finite groups, but this lemma can be used in a wider framework, of compact groups for example. Our aim is to get a better approximation than in [8] by not using Cauchy-Schwarz before Fourier.
Let be a finite group, the group algebra of and the set of the irreducible representations of . We denote by triv the trivial representation of and . For , we also name the matrix of the representation , its character and its dimension. Let us first recall the inversion formula for the non-commutative Fourier transform, well-explained in [16]. For and , we have
[TABLE]
We deduce that for all ,
[TABLE]
Besides, as is a function which is constant on every conjugacy class, we know that for each , by Schur’s lemma, is a homothety, of ratio . We hence obtain:
[TABLE]
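The displays of this section are not available in this copy, so the following sketch checks a standard consequence of Fourier inversion for a class function rather than the paper's exact formula: the law after t steps at g equals (1/|G|) times the sum over irreducible representations of d s^t χ(g), where s is the ratio of the homothety. It is worked out on the symmetric group of 3 elements, with the step law assumed to give mass 1/3 to the identity and 2/9 to each transposition; all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

N = 3
GROUP = list(permutations(range(N)))
IDENTITY = tuple(range(N))

def compose(a, b):
    """Composition of permutations: (a o b)(x) = a[b[x]]."""
    return tuple(a[b[x]] for x in range(N))

def fixed_points(g):
    return sum(1 for x in range(N) if g[x] == x)

def sign(g):
    inversions = sum(1 for i in range(N) for j in range(i + 1, N) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def step_law(g):
    """Assumed step law: identity 1/N, each transposition 2/N^2."""
    if g == IDENTITY:
        return Fraction(1, N)
    if fixed_points(g) == N - 2:
        return Fraction(2, N * N)
    return Fraction(0)

# (dimension, character) for the three irreducible representations of S_3:
# trivial, sign, and the 2-dimensional standard representation.
CHARACTERS = [(1, lambda g: 1), (1, sign), (2, lambda g: fixed_points(g) - 1)]

def law_by_convolution(t):
    law = {g: Fraction(int(g == IDENTITY)) for g in GROUP}
    for _ in range(t):
        new_law = {g: Fraction(0) for g in GROUP}
        for g, p in law.items():
            for h in GROUP:
                new_law[compose(g, h)] += p * step_law(h)
        law = new_law
    return law

def law_by_characters(t, g):
    total = Fraction(0)
    for dim, chi in CHARACTERS:
        ratio = sum(step_law(h) * chi(h) for h in GROUP) / dim
        total += dim * ratio ** t * chi(g)
    return total / len(GROUP)
```

Both functions use exact rational arithmetic, so they can be compared with `==`.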
Now, if instead of having a single group we have an increasing sequence of groups , and if is a well-chosen time depending on (and possibly on another parameter), we will wish to make tend to infinity inside our sums, and thus obtain a convergence to an explicit formula which will prove a cutoff or give a limiting profile. The idea of the following lemma is to spot a finite set of irreducible representations which will (asymptotically) have most of the mass, in order to approximate the sum over all irreducible representations by a sum over only finitely many terms, uniformly in , and then be allowed to make tend to infinity inside the finite sum.
Lemma 2.1.
(Approximation lemma) Let be a finite group and . Then:
[TABLE]
Proof
Using the fact that $\bigl\lvert \lvert a \rvert - \lvert b \rvert \bigr\rvert \leq \lvert a - b \rvert$ and triangle inequalities,
[TABLE]
Now, for every irreducible character , by Cauchy-Schwarz inequality and orthonormality of the characters,
[TABLE]
Plugging into , this concludes the proof.
3 The symmetric group and its representations
3.1 Hook-length formula
We recall a few facts from the representation theory of the symmetric group, that we will naturally index by integer partitions . In a diagram associated to a partition, the hook of a box is the number of boxes which are above or on the right of our box (the box itself included). We call the product of the hooks of the partition . For example, consider the partition $(7,3,2,1,1)$ of the integer $14$ filled with its hooks (rows listed from the longest to the shortest, each row read from its first box):
11 8 6 4 3 2 1
6 3 1
4 1
2
1
In this case, we have:
[TABLE]
We now recall the hook length formula, a proof of which can be found in Chapter 3 of [16].
Proposition 3.1.
(Hook-length formula) If is a partition of some integer , then . In particular, .
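The hook-length formula is easy to check on the example above. The sketch below is illustrative: it represents a partition as a tuple of row lengths and assumes the usual statement that the dimension equals n! divided by the product of the hooks; the function names are hypothetical.

```python
from math import factorial, prod

def hook_lengths(shape):
    """Hook length of every box of the diagram, row by row.
    `shape` is a non-empty tuple of weakly decreasing positive row lengths."""
    # conj[j] is the number of rows with more than j boxes (column length).
    conj = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    return [
        [(shape[i] - j - 1) + (conj[j] - i - 1) + 1 for j in range(shape[i])]
        for i in range(len(shape))
    ]

def hook_product(shape):
    return prod(h for row in hook_lengths(shape) for h in row)

def dimension(shape):
    """Dimension of the irreducible representation, via the hook-length formula."""
    return factorial(sum(shape)) // hook_product(shape)
```

For the partition (7,3,2,1,1) of 14 this gives a hook product of 1824768 and a dimension of 47775.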
If is an integer partition, we will denote by the truncated partition , where the largest row has been removed. For example if , and in this case we have when ,
[TABLE]
This can be easily generalized and gives the following asymptotic formula:
Proposition 3.2.
(Asymptotic hook-length formula) Let and be fixed integers such that . Then when ,
[TABLE]
Proof
Let and . Then when , denoting by the conjugated partition of the partition ,
[TABLE]
Remark 3.3.
Actually we will only need the equivalent, but the term in allows us, in the next subsection, to have a better intuition of the modified character ratios.
3.2 Character ratios
Let be a transposition. We define as in [8] the character ratio . We can give different explicit formulas for this object, among which the following symmetric one, which follows from Lemma in [16].
If is a partition of the integer , then we have:
[TABLE]
The modified character ratio, as defined in Section 2, writes as and takes into account that we pick the identity with probability . The following upper bounds are given in [7].
Proposition 3.4.
If is a partition of the integer , then
[TABLE]
Moreover, if , then
[TABLE]
We will also need an asymptotic expansion of , easily obtainable from the explicit formula for : If and are non-negative integers such that , then when ,
[TABLE]
and so
[TABLE]
Remark 3.5.
In the general case, to guess a cutoff, we want to find a for which as , for the representations which have the most mass. In the case of the symmetric group, as , we want to find such that . For instance, for random transpositions, it is very natural to expect a cutoff at from the formula of , as .
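The character ratios of this subsection can be computed exactly for any partition. Since the explicit formula is not visible in this copy, the sketch below uses the classical Frobenius expression (sum of binomials of the row lengths minus sum of binomials of the column lengths, divided by the binomial of n), which may differ in form from the formula of the paper; the modified ratio is assumed to be 1/n + ((n-1)/n) times the character ratio. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def conjugate(shape):
    """Column lengths of the diagram with row lengths `shape`."""
    return tuple(sum(1 for r in shape if r > j) for j in range(shape[0]))

def character_ratio(shape):
    """chi(transposition) / dimension, by the classical Frobenius formula."""
    n = sum(shape)
    numerator = sum(comb(r, 2) for r in shape) - sum(comb(col, 2) for col in conjugate(shape))
    return Fraction(numerator, comb(n, 2))

def modified_ratio(shape):
    """Assumed modified ratio: the identity is picked with probability 1/n."""
    n = sum(shape)
    return Fraction(1, n) + Fraction(n - 1, n) * character_ratio(shape)
```

For the partition (n-1, 1), of dimension n-1, the modified ratio is 1 - 2/n, so the product of the dimension with the t-th power of the ratio is of order 1 when t is about (1/2) n log n, which is the heuristic behind the cutoff time mentioned in Remark 3.5.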
3.3 Mass transfer in the Young graph
It will be convenient to use the formalism of the Young graph for some calculations. Here we are going to study, in the Young graph, a measure transfer from a row to the next one, which can be extended by induction to several lines. We will write for some to indicate that is a partition of the integer . We will also write if and to say that the diagram of can be obtained from the diagram of by adding a box. Let us fix an integer . We recall the transition formula for the dimensions of diagrams, which we can find in [11] or [16]: if we fix , then we have the following transfer, which may be of independent interest:
[TABLE]
Let be an integer and a sequence of real numbers. We extend this line to the next line, , as follows, following the edges of the graph: if , we set . Then we have the transfer:
Proposition 3.6.
[TABLE]
Proof
[TABLE]
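The transfer formulas of this subsection are not visible in this copy. As a hedged illustration of the kind of identity involved, the sketch below checks a classical fact about the Young graph: for a partition of n, the dimensions of the diagrams obtained by adding one box sum to (n+1) times the dimension of the original diagram. Whether this is exactly the transition formula recalled above is an assumption; the names are hypothetical.

```python
from math import factorial, prod

def dimension(shape):
    """Dimension via the hook-length formula; `shape` is a tuple of row lengths."""
    conj = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = [
        (shape[i] - j) + (conj[j] - i) - 1
        for i in range(len(shape))
        for j in range(shape[i])
    ]
    return factorial(sum(shape)) // prod(hooks)

def add_one_box(shape):
    """All diagrams obtained from `shape` by adding a single box."""
    out = []
    for i in range(len(shape)):
        # A box can be added to row i if the row above is strictly longer.
        if i == 0 or shape[i - 1] > shape[i]:
            out.append(shape[:i] + (shape[i] + 1,) + shape[i + 1:])
    out.append(shape + (1,))  # start a new row
    return out

def transfer_holds(shape):
    n = sum(shape)
    return sum(dimension(s) for s in add_one_box(shape)) == (n + 1) * dimension(shape)
```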
3.4 Permutations usually do not have only little cycles
We set, for and ,
[TABLE]
Let us show that when is fixed, is asymptotically much smaller than .
Proposition 3.7.
Let be a fixed integer. Then for large enough,
[TABLE]
where .
Proof
We can see that in , there are at most conjugacy classes, because such a conjugacy class is determined by the number of fixed points, -cycles,…, -cycles of a representative, each one necessarily between $0$ and . Let us give an upper bound on the cardinality of such a class. Let be a large integer, a partition of the integer such that and , and the associated conjugacy class. Then if denotes the number of equal to , we have for large enough:
[TABLE]
Moreover this last product will be greater if the increase, so we can assume without loss of generality that . One of the is therefore necessarily of cardinal greater than . Furthermore, as , we obtain:
[TABLE]
Thus for large enough,
[TABLE]
i.e.
[TABLE]
As , this leads to the desired asymptotic upper bound.
Remark 3.8.
This upper bound proves in particular that the ratio tends to $0$, even multiplied by any power function, or polynomial. It is this fact that we will use. The case that we did not process is trivial because in this case .
Besides, if we had proceeded more carefully, we could have shown that maximizes the heavy terms of the cardinality of the conjugacy class, and therefore that .
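The smallness claimed in this subsection can be observed numerically. The sketch below assumes the set under study is the set of permutations of n points all of whose cycles have length at most k (the defining display is not visible in this copy) and counts it exactly with a standard recurrence on the length of the cycle containing a marked point. Names are hypothetical.

```python
from math import factorial

def count_small_cycles(n, k):
    """Number of permutations of n points whose cycles all have length <= k."""
    counts = [1] + [0] * n
    for m in range(1, n + 1):
        # The cycle containing point m has length j: choose and order the
        # j - 1 other points in (m-1)!/(m-j)! ways, then permute the rest.
        counts[m] = sum(
            factorial(m - 1) // factorial(m - j) * counts[m - j]
            for j in range(1, min(k, m) + 1)
        )
    return counts[n]

def proportion(n, k):
    """Proportion of such permutations among all n! permutations."""
    return count_small_cycles(n, k) / factorial(n)
```

For k = 3, the proportion drops quickly as n goes from 10 to 20 to 40, in line with the statement that it beats any polynomial.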
3.5 Upper bound on the number of -cycles
For every permutation and , let denote the number of -cycles in the cycle decomposition of . We recall the well-known law for the number of fixed points of a random permutation (footnote: for , we apply the inclusion-exclusion principle to , where , and then generalize for any ):
[TABLE]
In particular, we deduce that for all , . Now we generalize this upper bound to the number of -cycles.
Proposition 3.9.
Let , then
[TABLE]
Proof
As in the previous paragraph, if is a partition of the integer , we denote by the number of equal to .
[TABLE]
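The law of the number of fixed points recalled at the start of this subsection can be verified by enumeration. The sketch below assumes the classical inclusion-exclusion formula, P(k fixed points) = (1/k!) times the sum of (-1)^j / j! for j from 0 to n-k, and the resulting bound by 1/k!; the displays themselves are not visible in this copy, and the names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def prob_fixed_points(n, k):
    """Probability that a uniform permutation of n points has exactly k fixed points."""
    alternating = sum(Fraction((-1) ** j, factorial(j)) for j in range(n - k + 1))
    return Fraction(1, factorial(k)) * alternating

def brute_force_law(n):
    """Same law, obtained by listing all n! permutations."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[sum(1 for i, v in enumerate(perm) if i == v)] += 1
    return [Fraction(c, factorial(n)) for c in counts]
```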
4 Proof of Theorem 1.1
For this whole section, we fix . We recall that .
4.1 Bounding the error
The upper bound is similar to the upper bound of the sum appearing in [7] after applying Diaconis’ upper bound lemma. However, as we want a more precise result, there will be some additional technical difficulties as may be negative.
We can observe that the representations of the symmetric group which contribute the most in the sum
[TABLE]
correspond to partitions with a large first row. We will therefore naturally split according to . We set for all , and integer large enough,
[TABLE]
From Lemma 2.1, we get that for all ,
[TABLE]
It remains to prove that the right hand side of this inequality tends to $0$ uniformly in when , and to estimate the second term in the left hand side. Our first task is to bound the error in the approximation.
Lemma 4.1.
(Upper bound on the remainder)
For all there exist and such that if , then
[TABLE]
Proof
We recall that . Observe that if is a partition of such that , then and so . Let us first bound splitting the sum into pieces. Note that corresponds to , i.e. to , the trivial representation, which disappeared when we used the Fourier transform. Likewise, corresponds to .
[TABLE]
Let us bound these different pieces separately. The first one is the easiest:
[TABLE]
[TABLE]
[TABLE]
where we used in the upper bound for that . If we succeed in proving that is bounded (in ), then we will be able to conclude that is bounded (in ). We will bound a sum a little larger than , namely . Let us begin by a crude bound which will prove useful in the sequel. If , we have
[TABLE]
where the first two inequalities come from Proposition 3.1 and Cauchy-Schwarz, and the penultimate inequality comes from the fact that each partition of the integer can be seen as one of the subsets of the set with elements. Therefore we have, using Proposition 3.4 (note that implies that )
[TABLE]
Let us bound . We have, using and ,
[TABLE]
Let be the summand in the right hand side, and note that
[TABLE]
As a function of when is fixed, this is decreasing until and then increasing. If the first and the last ratios are (strictly) less than , then we will have a subgeometric sum, which will hence be bounded. The last ratio, at , is equal to
[TABLE]
For the first ratio, we need to be a little more careful. At , we can have a ratio much larger than , all the more when is small (i.e. negative and far from $0$). So we will need to split once more and consider the sum starting at a suitably chosen , depending on but not on . Thus, though the convergence is fast in the case of a positive , already treated by Diaconis and Shahshahani, if is very negative, we will have to consider a very large number of terms, and the convergence will be much slower. Let be such that
[TABLE]
and large enough such that
[TABLE]
and that the ratio at be less than . Then as all the ratios from are less than , we have:
[TABLE]
Thus, as is fixed, is bounded uniformly in . Let us now treat , which will be slightly easier.
We observe that for all , , hence by ,
[TABLE]
Let be an integer between and . Then
[TABLE]
where is a real constant and is a positive constant. Thus,
[TABLE]
Now we are able to conclude, using the bounds in the proof for . Let , and let such that and . Then for large enough,
[TABLE]
4.2 Polynomial convergence lemma
We now start to estimate the main term.
Lemma 4.2.
Let . Then when ,
[TABLE]
where we recall that
[TABLE]
Let us first show how the polynomials , a key element of the proof, arise naturally.
Lemma 4.3.
Let be a fixed integer, and a permutation with at least one cycle of length greater than (i.e. ) (footnote: it still works for ). Then
[TABLE]
Proof of Lemma 4.3
This proof is combinatorial and strongly relies on the Murnaghan-Nakayama rule. We first consider as an indeterminate in and recall that, for any permutation and , is the number of -cycles in the cycle decomposition of . For example, if and has a cycle of length greater than , we have, using the Murnaghan-Nakayama formula and writing for ,
[TABLE]
We can observe that is a polynomial in . The key observation is that we will be able to compute everything when we take the sum at constant, and that our polynomial, which seemingly has indeterminates, will in reality be a polynomial in only one variable, , the number of fixed points of . This comes from the orthogonality of some characters and the mass transfer (Proposition 3.6), which will make all the other terms cancel. Let us give a few more details.
For the polynomial algebra , we will not use the canonical basis generated by the , but rather the one generated by the , better suited here.
Let . If is a partition of such that , then the coefficient of in is naturally the number of ways we can fill the Young diagram of with all the numbers from to with line and column growth, i.e. the number of standard tableaux of , which is .
More generally, if are such that , then the coefficient of
[TABLE]
in is
[TABLE]
Thus, by orthogonality of the characters, the coefficient of in the sum
[TABLE]
is
[TABLE]
By mass transfer, we can also observe that for , if has at least fixed points (if it has fewer, the coefficient is zero), the coefficient of
[TABLE]
in the sum
[TABLE]
is times the coefficient of
[TABLE]
in the sum
[TABLE]
where has fewer fixed points than , but as many -cycles for each ; this coefficient is zero except when , where it is equal to . To summarize, we have shown that
[TABLE]
Proof of Lemma 4.2
Using the fact that $\bigl\lvert \lvert a \rvert - \lvert b \rvert \bigr\rvert \leq \lvert a - b \rvert$ and the triangle inequality,
[TABLE]
Let us now split the sum on into two parts, along and , and let us bound each of these two sums separately. We begin with the sum on . As in our sum and ,
[TABLE]
where is a constant depending only on and . Let us treat the second sum, which we rewrite using Lemma 4.3:
[TABLE]
Let us observe that
[TABLE]
for all and such that . (Note that there are only a finite number of such terms.) We split the right hand side according to whether is larger or smaller than . On the one hand,
[TABLE]
On the other hand,
[TABLE]
4.3 Neglecting polynomials of high degree
Lemma 4.4.
Let . There exist such that for all and ,
[TABLE]
Proof
Let . Then we have, using again $\bigl\lvert \lvert a \rvert - \lvert b \rvert \bigr\rvert \leq \lvert a - b \rvert$,
[TABLE]
Now we observe that if ,
[TABLE]
and if ,
[TABLE]
We therefore conclude that
[TABLE]
when .
Before proving the last approximation, let us rewrite the infinite sum inside the absolute values. Let us define
[TABLE]
Proposition 4.5.
Let . Then
[TABLE]
Proof
We just need to make a change of variables and swap the two sums:
[TABLE]
4.4 Conclusion of the proof
Lemma 4.6.
When , we have:
[TABLE]
where denotes the Poisson law of parameter .
Proof
As factorials grow much faster than exponentials, and hence than , we have as ,
[TABLE]
We are now ready to combine all our estimates.
Proof of Theorem 1.1
Let and such that for , all the approximations are true up to . Let .
[TABLE]
From Lemma 4.2,
[TABLE]
From Lemma 4.4,
[TABLE]
From Lemma 4.6,
[TABLE]
Consequently, by triangle inequalities,
[TABLE]
Thus, we proved that for all ,
[TABLE]
To conclude, let us rewrite this expectation in the form in which the theorem is stated:
[TABLE]
This concludes the proof of Theorem 1.1.
Acknowledgements
I am very thankful to my former professor and advisor, Justin Salez, who introduced me to mixing times and then took great care of me during my master's thesis. I would also like to thank Nathanaël Berestycki, for his hospitality when he invited me to the University of Vienna, and for numerous helpful suggestions.
References
- [1] Dave Bayer, Persi Diaconis. Trailing the Dovetail Shuffle to its Lair. Ann. Appl. Prob., 2(2):294-313, 1992.
- [2] Nathanaël Berestycki, Oded Schramm, Ofer Zeitouni. Mixing times for random k-cycles and coalescence-fragmentation chains. Ann. Probab., 39(5):1815-1843, 2011.
- [3] Nathanaël Berestycki, Bati Şengül. Cutoff for conjugacy-invariant random walks on the permutation group. Probab. Theor. Rel. Fields, to appear.
- [4] Megan Bernstein. A random walk on the symmetric group generated by random involutions. Electronic Journal of Probability, 2018.
- [5] Megan Bernstein, Evita Nestoridi. Cutoff for random to random card shuffle. Submitted.
- [6] Philippe Biane. Combien de fois faut-il battre un jeu de cartes? Gaz. Math., No. 91, 4-10, 2002.
- [7] Persi Diaconis. Group representations in probability and statistics. Institute of Mathematical Statistics Lecture Notes - Monograph Series, 11. Institute of Mathematical Statistics, Hayward, CA, 1988.
- [8] Persi Diaconis, Mehrdad Shahshahani. Generating a random permutation with random transpositions. Z. Wahrsch. Verw. Gebiete, 57(2):159-179, 1981.
