Limit theorems for statistics of non-crossing partitions
Vladislav Kargin

TL;DR
This paper investigates the statistical properties of large non-crossing partitions, establishing limit theorems and distributional behaviors for various statistics, revealing differences from ordinary set partitions.
Contribution
It provides new limit theorems and distributional results for statistics of large non-crossing partitions, including Gaussian, geometric, double exponential, and Theta distributions.
Findings
Number of blocks of fixed size follows a Gaussian limit.
Block sizes are negatively correlated and follow a geometric distribution.
Largest block size concentrates around log2(n) and follows a double exponential distribution.
Abstract
We study the distribution of several statistics of large non-crossing partitions. First, we prove the Gaussian limit theorem for the number of blocks of a given fixed size. In contrast to the properties of usual set partitions, we show that the numbers of blocks of different sizes are negatively correlated, even for large partitions. In addition, we show that the sizes of blocks in a given large non-crossing partition are distributed according to a geometric distribution and not Poisson, as in the case of usual set partitions. Next, we show that the size of the largest block concentrates at $\log_2 n$, and that after an appropriate rescaling, it can be described by the double exponential distribution. Finally, we show that the width of a large non-crossing partition converges to the Theta-distribution which arises in the theory of Brownian excursions.
Vladislav Kargin (current address: Department of Mathematics, Binghamton University, 4400 Vestal Pkwy East, Binghamton, 13902-6000, USA)
1. Introduction
1.1. Definition of NC partitions and bijections to other combinatorial structures
Consider a partition of the ordered set $[n] = \{1, 2, \ldots, n\}$ into subsets (blocks) $V_1, \ldots, V_s$. This partition has a crossing if we can find elements $a < b < c < d$ such that $a$ and $c$ are in one block and $b$ and $d$ are in a different block. The partitions without crossings are called non-crossing (“NC”) partitions. Their study was initiated in [13] by Kreweras, who described many of their properties.
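As a sanity check of the definition, here is a brute-force sketch (the helper names are illustrative, not from the paper): it enumerates all set partitions of $[n]$, discards those with a crossing $a < b < c < d$, and counts the rest. The counts should reproduce the Catalan numbers $1, 2, 5, 14, \ldots$; the approach is only practical for small $n$.

```python
from itertools import combinations


def set_partitions(elements):
    """Yield every set partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def is_non_crossing(partition):
    """False iff some a < b < c < d has a, c in one block and b, d in another."""
    block_of = {x: i for i, block in enumerate(partition) for x in block}
    for a, b, c, d in combinations(sorted(block_of), 4):
        if block_of[a] == block_of[c] != block_of[b] == block_of[d]:
            return False
    return True


def count_nc_partitions(n):
    """Brute-force count of NC partitions of [n] (feasible only for small n)."""
    return sum(1 for p in set_partitions(list(range(1, n + 1))) if is_non_crossing(p))


print([count_nc_partitions(n) for n in range(1, 8)])  # [1, 2, 5, 14, 42, 132, 429]
```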
Later, non-crossing partitions and non-crossing pairings (NC partitions with all blocks of size 2) have found many applications to problems in random matrix theory, free probability, representation theory, and the theories of meanders and of Temperley-Lieb algebras (see, for example, [2], [18], [22]).
The class of non-crossing partitions belongs to a broad family of Catalan discrete structures in the sense that the number of NC partitions of the set $[n]$ is given by the Catalan number $C_n = \frac{1}{n+1}\binom{2n}{n}$, and that NC partitions are connected by interesting bijections to other structures in this large family (see Stanley’s book [23] for a very complete description).
Figure 1 illustrates some of these bijections. For example, NC partitions of $[n]$ are in bijection with the Dyck paths with $2n$ steps. These are paths on the lattice $\mathbb{Z}^2$ which start at $(0,0)$, go either one step up or one step down at each step, are not allowed to go below the horizontal axis, and finally end up at $(2n,0)$. Obviously, each Dyck path has $n$ “up” steps and $n$ “down” steps. Let the “up” steps be labeled $1, 2, \ldots, n$ in order of their appearance. Then, blocks of an NC partition correspond to uninterrupted stretches of “down” steps, and the elements of each block can be read off as the labels of the closest preceding “up” steps at the same level as the “down” steps in the stretch.
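The reading rule can be sketched in a few lines, under the convention (assumed here) that a path is a string over `U` and `D`. A stack matches each “down” step with the closest preceding “up” step at the same level, and each maximal run of “down” steps closes one block. Enumerating all Dyck paths with $2n$ steps should then produce every NC partition of $[n]$ exactly once.

```python
def dyck_paths(n):
    """Yield all Dyck paths with 2n steps as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == n and downs == n:
            yield path
            return
        if ups < n:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:  # never go below the horizontal axis
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def path_to_partition(path):
    """Up steps are labeled 1, 2, ... in order; each maximal run of down steps is a
    block made of the labels of the up steps matched by those down steps."""
    open_labels, blocks, run = [], [], []
    label = 0
    for step in path:
        if step == "U":
            if run:  # a stretch of down steps has just ended
                blocks.append(sorted(run))
                run = []
            label += 1
            open_labels.append(label)
        else:
            run.append(open_labels.pop())  # closest preceding up step at this level
    if run:
        blocks.append(sorted(run))
    return sorted(blocks)


print(path_to_partition("UUDUDD"))  # [[1, 3], [2]]
```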
The Dyck paths with $2n$ steps are in a well-known bijection with rooted ordered trees with $n+1$ vertices. This bijection can be described by a depth-first walk that explores the tree. The “up” and “down” steps of the Dyck path correspond to the steps of the walk that, respectively, increase or decrease the distance from the root. The superposition of these two bijections gives a bijection between NC partitions and rooted planar trees, and if we label every edge by the label of the corresponding “up” step in the Dyck path, then the blocks of an NC partition correspond to the leaves of the tree. More precisely, each block corresponds to the set of tree edges which are visited on the way back from a given leaf either to the node where the walker goes into a new, unexplored branch, or to the final node of the walk, the root. The size of the block equals the length of this return trip.
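A minimal sketch of the depth-first walk, assuming that an “up” step creates and enters a new child and a “down” step returns to the parent. For every Dyck path with at most 14 steps it checks that the tree has $n+1$ vertices and that its leaves are exactly as numerous as the maximal runs of “down” steps, that is, as the blocks of the partition. Edge labels are not tracked in this sketch.

```python
import re


def dyck_paths(n):
    """Yield all Dyck paths with 2n steps as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == n and downs == n:
            yield path
            return
        if ups < n:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def path_to_tree(path):
    """Depth-first walk: 'U' creates and enters a new child, 'D' goes back to the
    parent. Returns the list of children of each vertex; vertex 0 is the root."""
    parent, children = [None], [[]]
    current = 0
    for step in path:
        if step == "U":
            parent.append(current)
            children.append([])
            children[current].append(len(children) - 1)
            current = len(children) - 1
        else:
            current = parent[current]
    return children


def leaf_count(children):
    """Leaves are the non-root vertices without children."""
    return sum(1 for v in range(1, len(children)) if not children[v])


def block_count(path):
    """Number of maximal stretches of down steps, i.e. of blocks."""
    return len(re.findall("D+", path))


for n in range(1, 8):
    for p in dyck_paths(n):
        tree = path_to_tree(p)
        assert len(tree) == n + 1                  # n + 1 vertices
        assert leaf_count(tree) == block_count(p)  # leaves <-> blocks
print("leaves match blocks for all Dyck paths with up to 14 steps")
```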
Another bijection is between Dyck paths with $2n$ steps and ordered rooted binary trees on $2n+1$ vertices. Its details are described in the Appendix. Here we only note that in this bijection, the partition blocks correspond to the leaves at the end of right-directed edges. For a given leaf, we can construct a path that goes back over the right edges only. Then the partition block is given by the labels of the left edges from the vertices in this path, and the block length is the length of this path. See Figure 1 for an illustration.
It is worth noting that while the blocks of NC partitions can be interpreted in terms of trees, they correspond to properties of the trees which have not been much investigated, in contrast to such popular properties as the height and the profile of trees. (The height of a tree is the maximal distance from the root to a leaf, and the profile describes how many vertices the tree has at each given distance from the root.)
1.2. Statistics of usual set partitions
Before exploring the properties of NC partitions, it is useful to recall results about the usual set partitions, where the non-crossing condition is not imposed. These results can then be used as a benchmark.
We rely here on the book [20] by V. N. Sachkov.
Let $\xi_n$ denote the number of blocks in a random partition of $[n]$. (When we call an object random, we mean that it is selected from the uniform distribution on the complete set of these objects.) Then, for large $n$,
[TABLE]
and the distribution of the normalized random variable
[TABLE]
converges to the standard normal distribution as $n \to \infty$ (Theorem 4.1.1 in [20]).
Now, let the random variables $\xi_{n,k}$, $k = 1, 2, \ldots$, denote the number of blocks that have size $k$ in a random partition. Then, the distribution of $\xi_{n,k}$ has the expectation and the variance both asymptotically equal to $r^k/k!$, where $r$ is the solution of the equation $re^r = n$.
If $k$ is fixed and $n$ is growing, then the variances of the random variables $\xi_{n,k}$ are also growing. We can define the normalized random variables
[TABLE]
For a fixed $s$-tuple $(k_1, \ldots, k_s)$, the joint distribution of the normalized random variables converges to the standard multivariate normal distribution (Theorem 4.2.1 in [20]).
V. N. Sachkov discusses the distribution of the size of the maximum block, and shows that it is concentrated within a neighborhood of the point
[TABLE]
and that in this domain it is close to the double exponential distribution (without any additional normalization). (For a more precise statement, see Theorem 4.5.2 in [20].) Note that $r$ is asymptotically close to $\log n$; hence, to a first approximation, the size of the largest block is $e\log n$.
1.3. Statistics of Catalan structures
In this section we very briefly describe what is already known about statistics of non-crossing partitions and related Catalan structures.
While direct studies about statistics of NC partitions are not numerous and are essentially limited to a study by Simion in [21], there are many studies about statistics of Dyck paths and rooted ordered trees. Some results can be translated into the language of non-crossing partitions by using the bijection we described above. The problem with this approach is that these results are often not very natural in the setting of NC partitions.
Here is a short list of such results. The study by Simion in [21] investigated statistics of NC partitions arising from restricted growth functions. In [7], Denise and Simion derived generating functions for two statistics on Dyck paths, which they called the pyramid weight and the number of exterior pairs. Generating functions for many other statistics of Dyck paths were derived by Deutsch in [8]. Blanco and Petersen in [4] studied the joint distribution of the area under a Dyck path and the rank of this path.
None of these studies addresses the asymptotic behavior of these statistics. In contrast, the asymptotic behavior has been studied for statistics of rooted ordered trees, due to the importance of tree structures in the analysis of algorithms. In particular, for various families of trees, researchers investigated the height, the number of leaves, and the distribution of node degrees.
In particular, in [6] and [10], it was shown that in many families the tree height has expectation asymptotically equal to $c\sqrt{n}$, where $c$ is a constant specific to the family and $n$ is the size of the tree. The distribution of the height is a so-called theta distribution, named after the Jacobi theta function. It has connections to other areas of mathematics; see [3].
It appears that the statistics studied for Dyck paths do not correspond to such natural statistics of NC partitions as the total number of blocks and the number of blocks of a fixed size. However, the number of leaves of rooted ordered trees can be interpreted as the number of blocks in an NC partition, and we elaborate on the known results about the number of tree leaves by investigating its limit distribution and the distribution of the blocks of a fixed size. In addition, we will use a somewhat non-standard bijection to show that the height of trees is related to another characteristic of NC partitions, which we call width. As a result, we will establish that the width of a non-crossing partition has the theta distribution.
2. Limit theorems for random NC partitions
2.1. Number of parts
Let us randomly select an NC partition $\pi$ from the uniform distribution on the set of all NC partitions of $[n]$ (denoted $NC(n)$). Then we can define several random variables associated with this random partition. We use $X_n$ to denote the number of blocks in $\pi$ and $X_{n,k}$ to denote the number of blocks of length $k$ in $\pi$. Obviously, $X_n = \sum_{k=1}^{n} X_{n,k}$. It is natural to ask about the distribution of these random quantities for large $n$.
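Theorem 2.1.
Let $X_n$ be the number of blocks in a random NC partition of $[n]$. Then
$$\mathbb{E}X_n = \frac{n+1}{2}, \qquad \operatorname{Var}X_n = \frac{n^2-1}{4(2n-1)} \sim \frac{n}{8}.$$
Let
$$\hat X_n = \frac{X_n - \mathbb{E}X_n}{\sqrt{\operatorname{Var}X_n}}.$$
Then, as $n \to \infty$, the cumulative distribution function of $\hat X_n$ converges to the standard Gaussian distribution function $\Phi(x)$.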
Note that the expectation and variance are considerably larger than the corresponding quantities for usual partitions, which are of order $n/\log n$ and $n/\log^2 n$, respectively.
Proof.
This result is easy because after a bijection it follows from analogous results for other Catalan structures. Namely, by the bijections above, the number of blocks in an NC partition corresponds to the number of leaves in a rooted planar tree and to the number of peaks in a Dyck path. The expectation and variance of these quantities are known. See, for example, section 6.1 in [8].
It is also known that the distribution of the number of leaves in rooted planar trees is asymptotically Gaussian. See Examples IX.24 and IX.25 on pp. 678 - 680 in [11]. ∎
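The exact moments can be checked with rational arithmetic. The sketch below relies on the standard fact (background, not taken from the text) that the number of NC partitions of $[n]$ with $j$ blocks is the Narayana number $N(n,j) = \frac{1}{n}\binom{n}{j}\binom{n}{j-1}$, and compares the resulting mean and variance with $(n+1)/2$ and $(n^2-1)/(4(2n-1))$.

```python
from fractions import Fraction
from math import comb


def narayana(n, j):
    """Number of NC partitions of [n] with exactly j blocks."""
    return comb(n, j) * comb(n, j - 1) // n


def block_count_moments(n):
    """Exact mean and variance of the number of blocks of a uniform NC partition of [n]."""
    weights = {j: narayana(n, j) for j in range(1, n + 1)}
    total = sum(weights.values())  # the Catalan number C_n
    mean = Fraction(sum(j * w for j, w in weights.items()), total)
    second = Fraction(sum(j * j * w for j, w in weights.items()), total)
    return mean, second - mean ** 2


mean, var = block_count_moments(10)
print(mean, var)  # 11/2 99/76, i.e. (n+1)/2 and (n^2-1)/(4(2n-1)) for n = 10
```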
2.2. Distribution of the number of blocks of size $k$
Consider a random NC partition of $[n]$. In the previous section, we have shown that on average this partition has approximately $n/2$ blocks. How many of them have size $k$?
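First, let us define the relevant generating function. Let $p_{n,j}$ be the number of non-crossing partitions of $[n]$ that have $j$ blocks of size $k$ (with $p_{0,0} = 1$ for the empty partition). Then we define
$$G(z,u) = \sum_{n \ge 0} \sum_{j \ge 0} p_{n,j}\, z^n u^j. \qquad (1)$$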
Theorem 2.2.
The generating function $G(z,u)$ satisfies the following equation:
$$G(z,u) = 1 + \frac{zG(z,u)}{1 - zG(z,u)} + (u-1)\,z^k G(z,u)^k. \qquad (2)$$
Proof.
The equation is obtained by applying the symbolic transfer method of Flajolet and Sedgewick to the construction of an appropriate combinatorial class. The class here consists of all non-crossing partitions where the blocks of size $k$ are marked by the marker $u$. The construction is given by the equation
[TABLE]
Here $\mathcal{E}$ denotes the empty partition, $\mathcal{Z}$ is an atom (that is, an element of a partition), and the remaining term corresponds to a block of size $s$ containing a marked element (“root”), together with a sequence of NC partitions which are nested between the elements of this block. See Figure 2 for an illustration.
Then by the symbolic method (Theorem I.1 and description of markers on p. 167 in [11]), this expression translates to the desired formula for the bivariate generating function:
$$G(z,u) = 1 + \frac{zG(z,u)}{1 - zG(z,u)} + (u-1)\,z^k G(z,u)^k.$$
∎
For $k = 1$, this equation can be solved explicitly, and we get an explicit formula for the generating function of the number of singletons.
Corollary 2.3.
The generating function for the number of singletons in non-crossing partitions is
$$G(z,u) = \frac{1}{2z}\left(1 - \sqrt{1 - \frac{4z}{1 + z - uz}}\right).$$
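The functional equation of Theorem 2.2 can also be solved numerically as a truncated power series. The sketch below (`nc_series` and `mul_trunc` are hypothetical helper names) iterates $G \mapsto 1 + \sum_{s \ge 1} t_s (zG)^s$ with $t_s = 1$ for $s \ne k$ and $t_k = u$, which is the same equation written term by term. With $k = 1$ and $u = 0$ it counts NC partitions without singletons (the Riordan numbers), and with $u = 1$ it recovers the Catalan numbers.

```python
def mul_trunc(a, b, N):
    """Product of two power series (coefficient lists), truncated at degree N."""
    c = [0] * (N + 1)
    for i, x in enumerate(a[: N + 1]):
        if x:
            for j, y in enumerate(b[: N + 1 - i]):
                c[i + j] += x * y
    return c


def nc_series(N, k, u):
    """Coefficients [z^0], ..., [z^N] of G(z, u) from Theorem 2.2, by fixed-point
    iteration; each pass fixes at least one more coefficient."""
    G = [1] + [0] * N
    for _ in range(N + 1):
        zG = [0] + G[:N]  # multiply by z and truncate
        new, power = [1] + [0] * N, [1] + [0] * N
        for s in range(1, N + 1):
            power = mul_trunc(power, zG, N)  # (zG)^s
            weight = u if s == k else 1
            new = [x + weight * y for x, y in zip(new, power)]
        G = new
    return G


print(nc_series(8, 1, 0))  # [1, 0, 1, 1, 3, 6, 15, 36, 91]: no singletons
print(nc_series(8, 1, 1))  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]: Catalan numbers
```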
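Theorem 2.4.
Let $X_{n,k}$ denote the number of blocks of length $k$ in a random NC partition of $[n]$. Then,
$$\mathbb{E}X_{n,k} = \frac{1}{C_n}\binom{2n-k-1}{n-1} = \frac{n}{2^{k+1}} + O(1).$$
For the variance, we have the asymptotic expression
$$\operatorname{Var}X_{n,k} = n\left(\frac{1}{2^{k+1}} - \frac{k^2-2k+3}{2^{2k+3}}\right) + O(1),$$
where the $O$-term is for $n \to \infty$ and the constant implied in this term may depend on $k$.
The covariance of $X_{n,k}$ and $X_{n,l}$, for $k \ne l$, is given by
$$\operatorname{Cov}(X_{n,k}, X_{n,l}) = -n\,\frac{kl-k-l+3}{2^{k+l+3}} + O(1).$$
Note that here we have two differences from the similar result for usual set partitions. First, for large $n$ the distribution of partition blocks over sizes is not Poisson with mean $r$ (where $re^r = n$), but rather geometric, with the expected number of blocks of size $k$ approximately $n/2^{k+1}$. Second, the covariance between the numbers of blocks of two different sizes is not negligible even for large $n$: since $kl - k - l + 3 = (k-1)(l-1) + 2 > 0$, it is negative and of order $n$. In particular, after rescaling we cannot expect that the random variables $X_{n,k}$ will form a Gaussian process with elements independent for different $k$.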
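These formulas can be checked by brute force for small $n$. The sketch below enumerates Dyck paths (block sizes are the lengths of maximal runs of “down” steps, by the bijection of Section 1.1) and compares the empirical moments with the closed forms $\mathbb{E}X_{n,k} = \binom{2n-k-1}{n-1}/C_n$, $\mathbb{E}[X_{n,k}(X_{n,k}-1)] = n\binom{2n-2k-2}{n-2}/C_n$ for $n \ge 2k$, and $\mathbb{E}[X_{n,k}X_{n,l}] = n\binom{2n-k-l-2}{n-2}/C_n$ for $n \ge k+l$; the last two closed forms come from Lagrange inversion and are treated as assumptions of the sketch. It then evaluates the exact variance at $n = 2000$ next to the leading term $n\,(2^{-k-1} - (k^2-2k+3)\,2^{-2k-3})$.

```python
from fractions import Fraction
from itertools import groupby
from math import comb


def dyck_paths(n):
    """Yield all Dyck paths with 2n steps as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == n and downs == n:
            yield path
            return
        if ups < n:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def block_sizes(path):
    """Block sizes of the NC partition = lengths of maximal runs of 'D'."""
    return [len(list(run)) for step, run in groupby(path) if step == "D"]


def exact_moments(n, k, l):
    """Closed forms for E X_k, E X_k(X_k - 1) and E X_k X_l (k != l)."""
    c = comb(2 * n, n) // (n + 1)  # Catalan number C_n
    first = Fraction(comb(2 * n - k - 1, n - 1), c) if n >= k else Fraction(0)
    second = Fraction(n * comb(2 * n - 2 * k - 2, n - 2), c) if n >= 2 * k else Fraction(0)
    mixed = Fraction(n * comb(2 * n - k - l - 2, n - 2), c) if n >= k + l else Fraction(0)
    return first, second, mixed


def brute_moments(n, k, l):
    """The same three moments, averaged over all C_n Dyck paths."""
    counts = [(s.count(k), s.count(l)) for s in map(block_sizes, dyck_paths(n))]
    total = len(counts)
    first = Fraction(sum(a for a, _ in counts), total)
    second = Fraction(sum(a * (a - 1) for a, _ in counts), total)
    mixed = Fraction(sum(a * b for a, b in counts), total)
    return first, second, mixed


def variance(n, k):
    """Var X_{n,k} = E X(X-1) + E X - (E X)^2, from the closed forms."""
    first, second, _ = exact_moments(n, k, k + 1)
    return second + first - first ** 2


for n in range(1, 9):
    for k in range(1, 4):
        for l in range(1, 4):
            if k != l:
                assert exact_moments(n, k, l) == brute_moments(n, k, l)

for k in range(1, 5):
    leading = 2 ** -(k + 1) - (k * k - 2 * k + 3) / 2 ** (2 * k + 3)
    print(k, float(variance(2000, k)) / 2000, leading)
```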
Proof of Theorem 2.4.
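Let us define $H(z,u) = zG(z,u)$, where $G(z,u)$ is the generating function of Theorem 2.2. Multiplying the equation of Theorem 2.2 by $z$, we see that this function satisfies the equation
$$H = z\left(1 + \frac{H}{1-H} + (u-1)\,H^k\right),$$
which we can re-write as
$$H = z\,\varphi_u(H),$$
where
$$\varphi_u(w) = \frac{1}{1-w} + (u-1)\,w^k,$$
and $u$ is a parameter. This expression is suitable for the Lagrange inversion formula for the coefficients in the series $H(z,u) = \sum_{n \ge 1} h_n(u)\,z^n$, which gives
$$h_{n+1}(u) = \frac{1}{n+1}\,[w^n]\,\varphi_u(w)^{n+1},$$
where $[w^n]f(w)$ is notation for the coefficient before $w^n$ in the power series expansion of $f(w)$ around $w = 0$.
Since $G(z,u) = H(z,u)/z$, for the coefficients in the series $G(z,u) = \sum_{n \ge 0} g_n(u)\,z^n$ we get
$$g_n(u) = h_{n+1}(u) = \frac{1}{n+1}\,[w^n]\left(\frac{1}{1-w} + (u-1)\,w^k\right)^{n+1}.$$
Then, by basic properties of generating functions, the expectation of the random variable $X_{n,k}$ is
$$\mathbb{E}X_{n,k} = \frac{g_n'(1)}{g_n(1)} = \frac{1}{C_n}\,[w^n]\,\frac{w^k}{(1-w)^{n}} = \frac{1}{C_n}\,[w^{n-k}]\,(1-w)^{-n},$$
where $C_n = g_n(1) = \frac{1}{n+1}\binom{2n}{n}$ is the $n$-th Catalan number. Hence, this expectation is
$$\mathbb{E}X_{n,k} = \frac{1}{C_n}\binom{2n-k-1}{n-1},$$
which is in agreement with the expression in the statement of this theorem.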
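For the variance, we first compute the second factorial moment as
$$\mathbb{E}\big[X_{n,k}(X_{n,k}-1)\big] = \frac{g_n''(1)}{C_n} = \frac{n}{C_n}\,[w^{n-2k}]\,(1-w)^{-(n-1)} = \frac{n}{C_n}\binom{2n-2k-2}{n-2} \qquad (n \ge 2k).$$
Then, using $C_n = \frac{1}{n+1}\binom{2n}{n}$, we can get the following expansions for the expectation and the 2nd factorial moment:
$$\mathbb{E}X_{n,k} = \frac{1}{2^{k+1}}\left(n + 1 + \frac{3k-k^2}{4}\right) + O(n^{-1}),$$
$$\mathbb{E}\big[X_{n,k}(X_{n,k}-1)\big] = \frac{1}{2^{2k+2}}\left(n^2 + \frac{1 + 5k - 2k^2}{2}\,n\right) + O(1),$$
which implies that
$$\operatorname{Var}X_{n,k} = \mathbb{E}\big[X_{n,k}(X_{n,k}-1)\big] + \mathbb{E}X_{n,k} - \big(\mathbb{E}X_{n,k}\big)^2 = n\left(\frac{1}{2^{k+1}} - \frac{k^2-2k+3}{2^{2k+3}}\right) + O(1),$$
and gives the second statement of the theorem.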
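Finally, in order to calculate the covariance of $X_{n,k}$ and $X_{n,l}$, we define a trivariate generating function,
$$G(z,u,v) = \sum_{n,i,j \ge 0} q_{n,i,j}\, z^n u^i v^j,$$
where $q_{n,i,j}$ is the number of NC partitions of $[n]$ that have $i$ blocks of size $k$ and $j$ blocks of size $l$.
Then, by an argument similar to the argument in Theorem 2.2, we find that $G(z,u,v)$ satisfies the equation
$$G = 1 + \frac{zG}{1-zG} + (u-1)\,z^kG^k + (v-1)\,z^lG^l.$$
Then, by the same Lagrange inversion argument, the coefficients in the expansion
$$G(z,u,v) = \sum_{n \ge 0} g_n(u,v)\, z^n$$
can be calculated as
$$g_n(u,v) = \frac{1}{n+1}\,[w^n]\left(\frac{1}{1-w} + (u-1)\,w^k + (v-1)\,w^l\right)^{n+1}$$
and
$$\mathbb{E}\big[X_{n,k}X_{n,l}\big] = \frac{1}{C_n}\,\frac{\partial^2 g_n}{\partial u\,\partial v}(1,1) = \frac{n}{C_n}\binom{2n-k-l-2}{n-2} \qquad (n \ge k+l).$$
After some calculations, this leads to
$$\operatorname{Cov}(X_{n,k}, X_{n,l}) = \mathbb{E}\big[X_{n,k}X_{n,l}\big] - \mathbb{E}X_{n,k}\,\mathbb{E}X_{n,l} = -n\,\frac{kl-k-l+3}{2^{k+l+3}} + O(1),$$
which completes the proof of the third statement of the theorem. ∎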
Now we come to the question about the asymptotic distribution of the number of blocks of a given size.
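Theorem 2.5.
Let $X_{n,k}$ denote the number of blocks of length $k$ in a random NC partition of $[n]$. Define
$$\hat X_{n,k} = \frac{X_{n,k} - \mathbb{E}X_{n,k}}{\sqrt{\operatorname{Var}X_{n,k}}}.$$
Then, for every $k \ge 1$, as $n \to \infty$, the cumulative distribution function of $\hat X_{n,k}$ converges to the standard Gaussian distribution function $\Phi(x)$.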
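To see this Gaussian behavior in simulation one needs uniform random NC partitions. A convenient route, assumed here rather than taken from the paper, is the cycle lemma: shuffle $n$ up steps and $n+1$ down steps, rotate the sequence to start just after the first position where the running sum attains its minimum, and drop the final down step; the result is a uniform Dyck path with $2n$ steps. Block sizes are then the lengths of the runs of down steps. The centering and scaling below use the leading terms $n/2^{k+1}$ and $n\,(2^{-k-1} - (k^2-2k+3)\,2^{-2k-3})$ from Theorem 2.4.

```python
import random
from statistics import mean, pstdev


def random_dyck_path(n, rng):
    """Uniform Dyck path with 2n steps (+1 up, -1 down), via the cycle lemma."""
    steps = [1] * n + [-1] * (n + 1)
    rng.shuffle(steps)
    running, lowest, cut = 0, 0, -1
    for i, x in enumerate(steps):
        running += x
        if running < lowest:  # strict inequality keeps the FIRST minimum
            lowest, cut = running, i
    rotated = steps[cut + 1:] + steps[:cut + 1]
    return rotated[:-1]  # drop the extra final down step


def count_blocks_of_size(path, k):
    """Number of maximal runs of down steps of length exactly k."""
    count, run = 0, 0
    for x in path + [1]:  # a sentinel up step closes the last run
        if x == -1:
            run += 1
        else:
            count += run == k
            run = 0
    return count


def standardized_counts(n, k, samples, seed=1):
    """(X_{n,k} - n/2^(k+1)) / sqrt(n * c_k) over independent uniform samples."""
    rng = random.Random(seed)
    centre = n / 2 ** (k + 1)
    c_k = 2 ** -(k + 1) - (k * k - 2 * k + 3) / 2 ** (2 * k + 3)
    return [(count_blocks_of_size(random_dyck_path(n, rng), k) - centre) / (n * c_k) ** 0.5
            for _ in range(samples)]


zs = standardized_counts(n=2000, k=2, samples=300)
print(round(mean(zs), 2), round(pstdev(zs), 2))  # should be near 0 and 1
```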
Before proving this theorem, we summarize some tools from the book by Flajolet and Sedgewick [11]. They are collected here for the convenience of the reader.
We say that a function $y(z)$ of a complex argument is an analytic generating function (analytic GF) if it is analytic at zero and its expansion,
$$y(z) = \sum_{n \ge 0} y_n z^n, \qquad (5)$$
has real non-negative coefficients $y_n$, $n \ge 0$.
Definition 2.6.
The analytic GF $y(z)$ is said to have a stable dominant singularity at $z = r$ (Flajolet and Sedgewick say that $y$ belongs to the smooth implicit-function schema) if there exists a bivariate function $G(z,w)$ such that
$$y(z) = G\big(z, y(z)\big),$$
and $G(z,w)$ satisfies the following conditions:
- (A) $G(z,w)$ is analytic in a domain $|z| < R$, $|w| < S$ for some $R, S > 0$.
- (B) The coefficients $g_{m,j}$ in $G(z,w) = \sum_{m,j} g_{m,j} z^m w^j$ are non-negative reals, $g_{0,0} = 0$, $g_{0,1} \ne 1$, and $g_{m,j} > 0$ for some $m$ and for some $j \ge 2$.
- (C) The number $r$ satisfies $0 < r < R$, and there exists $s$ with $0 < s < S$ such that
$$G(r,s) = s, \qquad G_w(r,s) = 1.$$
We say that $G$ is the characteristic function of $y$.
The condition in (C) is aimed to ensure that $r$ is a singularity of $y$ with $y(r) = s$. Then the conditions in (A) and (B), especially the non-negativity of the coefficients, ensure that this singularity is a quadratic singularity with the smallest absolute value among all singularities of $y$ (which is why we call it “the stable dominant singularity”). This statement is made precise in the following theorem. For the case of $G$ polynomial or entire with non-negative coefficients, the fact that $r$ is a dominant singularity can be found in the classic book by Hille ([12], Theorem 9.4.6 on p. 274 of volume I), without mention that the singularity is quadratic. In a more general form it was first formulated in [1] with an error in the set of conditions (see the counterexample in [5]) and proved in correct form in [16].
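Theorem 2.7.
Let $y(z)$ be an analytic GF that has a stable dominant singularity at $r$ with the characteristic function $G$. Then the series in (5) converges at $z = r$ and
$$y(z) = s - \gamma\sqrt{1 - z/r} + O(1 - z/r)$$
in a neighborhood of $z = r$ (slit along $[r, \infty)$), where $s = y(r)$ and
$$\gamma = \sqrt{\frac{2r\,G_z(r,s)}{G_{ww}(r,s)}}.$$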
This theorem allows us to extract information about the coefficients in the expansion of $y$. One additional condition is needed. An analytic generating function is called aperiodic if, for some $n_0$, the coefficients $y_n$ are all non-zero for $n \ge n_0$.
Corollary 2.8.
If an analytic GF $y(z)$ satisfies the conditions of the previous theorem and is aperiodic, then
$$y_n = [z^n]\,y(z) \sim \frac{\gamma}{2\sqrt{\pi}}\, r^{-n}\, n^{-3/2}, \qquad n \to \infty.$$
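As a numerical illustration, assuming the standard form of these results in [11], take $G(z,w) = z/(1-w)$, for which $y = G(z,y)$ is solved by $y(z) = zC(z)$ with $[z^n]\,y = C_{n-1}$. The sketch solves the characteristic system by bisection, computes $\gamma$, and compares $C_{n-1}$ with $\frac{\gamma}{2\sqrt{\pi}} r^{-n} n^{-3/2}$ on a logarithmic scale to avoid floating-point overflow. The helper names are illustrative.

```python
from math import comb, exp, log, pi, sqrt


def phi(w):
    return 1 / (1 - w)  # G(z, w) = z * phi(w); then y = z C(z)


def dphi(w):
    return 1 / (1 - w) ** 2


def d2phi(w):
    return 2 / (1 - w) ** 3


def characteristic_point(lo=1e-12, hi=1 - 1e-9, iterations=200):
    """Bisection for s * phi'(s) = phi(s); for G(z, w) = z * phi(w) the system
    G(r, s) = s, G_w(r, s) = 1 reduces to this equation and r = s / phi(s)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * dphi(mid) - phi(mid) < 0:
            lo = mid
        else:
            hi = mid
    s = (lo + hi) / 2
    return s / phi(s), s


r, s = characteristic_point()
gamma = sqrt(2 * phi(s) / d2phi(s))  # equals sqrt(2 r G_z / G_ww) for this G


def coefficient_ratio(n):
    """[z^n] y = C_{n-1} divided by gamma / (2 sqrt(pi)) * r^(-n) * n^(-3/2)."""
    log_exact = log(comb(2 * n - 2, n - 1)) - log(n)
    log_approx = log(gamma / (2 * sqrt(pi))) - n * log(r) - 1.5 * log(n)
    return exp(log_exact - log_approx)


print(r, s, gamma)  # approximately 0.25, 0.5, 0.5
print([round(coefficient_ratio(n), 4) for n in (10, 100, 1000)])  # tends to 1
```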
Now, let us consider a bivariate generating function $F(z,u) = \sum_{n,j} f_{n,j} z^n u^j$ with non-negative coefficients, and the probability distribution with the following probability generating function:
$$p_n(u) = \frac{[z^n]\,F(z,u)}{[z^n]\,F(z,1)}.$$
We are interested in sufficient conditions on the generating function that ensure that this probability distribution converges (after normalization) to the standard Gaussian law. These conditions are given by Proposition IX.17 in Flajolet-Sedgewick, which we repeat below.
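Recall that the variability operator is defined as
$$\mathfrak{v}(f) = \frac{f''(1)}{f(1)} + \frac{f'(1)}{f(1)} - \left(\frac{f'(1)}{f(1)}\right)^2,$$
provided that $f(1) \ne 0$.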
Let $F(z,u)$ be a bivariate generating function, analytic at $(0,0)$, and suppose that it solves the equation $P(F(z,u), z, u) = 0$, where $P(y,z,u)$ is a polynomial of degree at least $2$ in $y$. Let us define the following conditions that we can impose on $F$.
- (I) The function $F(z,1)$ has a stable dominant singularity at $z = \rho$ with some characteristic function.
- (II) There is a function $\rho(u)$ (“singularity movement function”) that solves the equation obtained from the polynomial equations
$$P(y,z,u) = 0, \qquad \frac{\partial P}{\partial y}(y,z,u) = 0$$
by elimination of the variable $y$. This function is analytic in a neighborhood of $u = 1$ and $\rho(1) = \rho$, where $\rho$ is as in condition I above.
- (III) The function $\rho(u)$ satisfies the variability condition:
$$\mathfrak{v}\left(\frac{\rho(1)}{\rho(u)}\right) \neq 0.$$
Proposition 2.1.
Let $F(z,u)$ be a bivariate generating function, analytic at $(0,0)$, and suppose that it solves the equation $P(F(z,u), z, u) = 0$, where $P(y,z,u)$ is a polynomial of degree at least $2$ in $y$. Assume that Conditions I, II, III above are satisfied.
Then the probability distribution with the probability generating function
$$p_n(u) = \frac{[z^n]\,F(z,u)}{[z^n]\,F(z,1)}$$
has an asymptotically Gaussian distribution.
Now we are able to proceed to the proof of our result about the asymptotic Gaussian distribution of the random variables $X_{n,k}$.
Proof of Theorem 2.5.
In our case the bivariate generating function is defined in equation (1), and it satisfies the equation (2). It is convenient to define , which satisfies the equation
[TABLE]
Thus, in terms of Proposition 2.1, we can use
[TABLE]
For the solution has a stable dominant singularity at . (Condition I is satisfied.)
If , then the equation leads to
[TABLE]
After substituting this expression into equation and simplifying, we are led to the following equation for (the value of at the branching point).
[TABLE]
We are interested in the expansion
[TABLE]
and we calculate
[TABLE]
Then we get the expansion for by using (17),
[TABLE]
In other words, we found the function which is analytic in the neighborhood of and satisfies Condition II of the proposition. It is a routine calculation to check that Condition III is also satisfied. Hence, the normalized coefficients in have an asymptotic Gaussian limit. ∎
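A numerical sketch of the singularity-movement idea, under an assumption made only for this sketch (it is not the function used in the proof above): work with $H = zG$, which by Theorem 2.2 solves $H = z\,\varphi_u(H)$ with $\varphi_u(w) = \frac{1}{1-w} + (u-1)w^k$. For each $u$ near $1$ the code solves $\tau\varphi_u'(\tau) = \varphi_u(\tau)$, sets $\rho(u) = \tau/\varphi_u(\tau)$, and estimates the mean and variance constants of $\rho(1)/\rho(u)$ by finite differences of $s \mapsto \log(\rho(1)/\rho(e^s))$. These should match the leading terms $n/2^{k+1}$ and $n\,(2^{-k-1} - (k^2-2k+3)\,2^{-2k-3})$ of Theorem 2.4.

```python
from math import exp, log


def rho(u, k, iterations=200):
    """Dominant singularity of H = z * phi_u(H), phi_u(w) = 1/(1-w) + (u-1) w^k:
    solve tau * phi_u'(tau) = phi_u(tau) by bisection, return tau / phi_u(tau)."""
    def phi(w):
        return 1 / (1 - w) + (u - 1) * w ** k

    def dphi(w):
        return 1 / (1 - w) ** 2 + k * (u - 1) * w ** (k - 1)

    lo, hi = 1e-12, 1 - 1e-9
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * dphi(mid) - phi(mid) < 0:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2
    return tau / phi(tau)


def mean_variance_constants(k, h=1e-3):
    """With g(s) = log(rho(1) / rho(e^s)): mean constant g'(0), variance constant g''(0)."""
    base = rho(1.0, k)

    def g(s):
        return log(base / rho(exp(s), k))

    return (g(h) - g(-h)) / (2 * h), (g(h) - 2 * g(0.0) + g(-h)) / h ** 2


for k in range(1, 5):
    m, v = mean_variance_constants(k)
    predicted_v = 2 ** -(k + 1) - (k * k - 2 * k + 3) / 2 ** (2 * k + 3)
    print(k, round(m, 6), 2 ** -(k + 1), round(v, 6), round(predicted_v, 6))
```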
2.3. The size of the largest block
First, we show that the largest block in a typical NC partition has size approximately $\log_2 n$.
Theorem 2.9.
Let $L_n$ denote the size of the largest block in a random NC partition of $[n]$. Then, as $n \to \infty$,
$$\frac{L_n}{\log_2 n} \to 1,$$
where convergence is in probability.
Note that $\log_2 n = \log n/\log 2 \approx 1.44\log n$, and therefore the largest block in an NC partition is typically shorter than in a usual partition, where it is around $e\log n \approx 2.72\log n$.
Proof.
[TABLE]
where . Note that by the Markov inequality,
[TABLE]
for . Then, the inequality above becomes
[TABLE]
as $n \to \infty$.
In the opposite direction, we can write,
[TABLE]
where . We can estimate the probability on the right hand side of the inequality by the Chebyshev inequality,
[TABLE]
From the asymptotic formulas for the expectation and the variance we obtain,
[TABLE]
which implies that
[TABLE]
as $n \to \infty$. This concludes the proof. ∎
The next step is to determine the distribution of the largest block size as it deviates from $\log_2 n$. As it turns out, for large $n$, the largest block size distribution depends on how $n$ places itself with respect to powers of $2$. We use the notation $\lfloor x \rfloor$ for the largest integer $\le x$, and $\{x\}$ for the fractional part of $x$.
Theorem 2.10.
Let $L_n$ denote the size of the largest block in a random NC partition of $[n]$. Let an integer , and define . Then, as $n \to \infty$,
[TABLE]
Proof.
First, we find the generating function for NC partitions with blocks whose length is . The symbolic formula for the class of these partitions is
[TABLE]
This leads to the following equation for the generating function:
[TABLE]
or
[TABLE]
Let us use the notation
[TABLE]
Then the equation for is , where .
This leads us to the situation described in Definition 2.6 and Theorem 2.7, where is the characteristic function for . Theorem 2.7 is useful for us because it will allow us to determine the expansion of the generating function near the dominant singularity, and this expansion and the transfer theorems of the symbolic method will give us the asymptotic expression for the number of NC partitions of with all block sizes .
The singularity solves the characteristic system:
[TABLE]
The first equation of the system gives
[TABLE]
and after plugging this expression into the second equation we obtain:
[TABLE]
The solution for this equation is
[TABLE]
And then,
[TABLE]
In order to apply Theorem 2.7 we also compute
[TABLE]
and conclude that
[TABLE]
Using the power expansion for the square root and Theorem VI.4 in [11] to justify that the error term in the formula for can be neglected, we find that
[TABLE]
For Catalan numbers $C_n$ (the total number of NC partitions of $[n]$) we have the asymptotic approximation
$$C_n \sim \frac{4^n}{\sqrt{\pi}\, n^{3/2}}, \qquad n \to \infty.$$
Hence we have the following estimate:
[TABLE]
We have assumed that and defined , hence . Plugging this into the previous expression, we find that
[TABLE]
∎
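The generating-function approach of this proof also gives exact values for moderate $n$. Writing $H = zG$, partitions with all blocks of size at most $m$ satisfy $H = z(1 + H + \cdots + H^m)$, so Lagrange inversion gives $\#\{\pi \in NC(n) : L_n \le m\} = \frac{1}{n+1}[w^n](1 + w + \cdots + w^m)^{n+1}$; this derivation is supplied as an assumption of the sketch. The coefficient is extracted by inclusion-exclusion, and the exact probabilities $\mathbb{P}(L_n \le m)$ can be compared with the concentration at $\log_2 n$ in Theorem 2.9.

```python
from fractions import Fraction
from math import comb


def count_max_block_at_most(n, m):
    """Number of NC partitions of [n] with all blocks of size <= m:
    (1/(n+1)) [w^n] ((1 - w^(m+1)) / (1 - w))^(n+1), by inclusion-exclusion."""
    big_n = n + 1
    total, j = 0, 0
    while j * (m + 1) <= n:
        total += (-1) ** j * comb(big_n, j) * comb(n - j * (m + 1) + big_n - 1, big_n - 1)
        j += 1
    return total // big_n


def largest_block_cdf(n, m):
    """Exact P(L_n <= m) for a uniform random NC partition of [n]."""
    catalan = comb(2 * n, n) // (n + 1)
    return Fraction(count_max_block_at_most(n, m), catalan)


n = 1024  # log2(n) = 10
for m in range(5, 17):
    print(m, round(float(largest_block_cdf(n, m)), 4))
```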
3. Width of non-crossing partitions
Let us think about the elements of the set $[n]$ as points on the real line. If $a_1 < a_2 < \cdots < a_s$ are the points in a block $V$, then we can represent the block by semicircles $(a_1,a_2)$, $(a_2,a_3)$, …, $(a_{s-1},a_s)$, where $(a,b)$ denotes a semicircle in the upper half-plane with the diameter $[a,b]$. If the block has only one element $a$, then it is represented by a small vertical interval of length 1/2 that sticks out into the upper half-plane at abscissa $a$.
Note that if we draw semicircles and intervals for all blocks of an NC partition $\pi$, they will be non-intersecting (except for semicircles of the same block, which meet at their common endpoints on the real line). We say that this system of semicircles and intervals represents the partition $\pi$.
Then for every half-integer point $t$, we can calculate the width of the partition $\pi$ at $t$ as the number of intersections of the vertical line with abscissa $t$ and the semicircles in the graphical representation of $\pi$. (This vertical line never intersects the vertical intervals of singleton blocks.) Let us denote it $w_\pi(t)$. Then, the width of an NC partition $\pi$ of the set $[n]$ is the maximum of $w_\pi(t)$ over all possible $t$,
$$w(\pi) = \max_{t} w_\pi(t).$$
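Computing the width directly from this definition is straightforward. In the sketch below (illustrative names), a partition is a list of blocks; each pair of consecutive elements $a_i < a_{i+1}$ of a block is one semicircle, and $w_\pi(t)$ counts the semicircles with $a_i < t < a_{i+1}$ for the half-integers $t = 3/2, 5/2, \ldots, n - 1/2$.

```python
def width_profile(blocks, n):
    """[w(t) for t = 1.5, 2.5, ..., n - 0.5]: semicircles crossing the line at t."""
    arcs = [(b[i], b[i + 1]) for b in map(sorted, blocks) for i in range(len(b) - 1)]
    return [sum(1 for a, c in arcs if a < j + 0.5 < c) for j in range(1, n)]


def width(blocks, n):
    """Maximum of the width profile (0 when there are no semicircles)."""
    return max(width_profile(blocks, n), default=0)


print(width_profile([[1, 4], [2, 3]], 4))  # [1, 2, 1]
print(width([[1, 2, 3, 4]], 4))            # 1
print(width([[1], [2], [3]], 3))           # 0
```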
For the asymptotic distribution of the width we have the following interesting theorem.
Theorem 3.1.
Let be the width of a random NC partition of the set , and let . Then, uniformly in ,
[TABLE]
where
[TABLE]
In particular, this implies that the expected width is . More generally, the -th moment of the distribution is given by the expression:
[TABLE]
where is the Riemann zeta-function. For these formulas, see Example V.8 and Proposition V.4 on pp. 326 - 329 in [11].
In the case when the non-crossing condition is not imposed, the width of pairings of points on the line was analyzed. In this case, it was found that width converges to (see [14]). Thus, the typical width of pairings without the non-crossing condition is significantly larger. In addition, in the case of general pairings there is some research on the width as a random process in $t$ (see Example V.10 on p. 333 in [11] and references therein). This question is open in the non-crossing case.
We start the proof of Theorem 3.1 by noting that a pairing on $[2n]$ is a particular case of a partition and, therefore, its width is well defined. In addition, there is a bijection from NC partitions of $[n]$ to NC pairings of $[2n]$ by means of a so-called doubling construction. It is defined in [17], and it is equivalent to a bijection between Dyck paths and NC partitions described in [24]. (This path-partition bijection is different from the bijection that we described at the beginning of this paper.)
The illustration of the doubling construction in Figure 3 should be sufficient for understanding how it works. In this construction, every point is doubled, so instead of a point $i$, we have two points $i'$ and $i''$. If a block is not a singleton, then each line of a block in an NC partition $\pi$ corresponds to two lines in the corresponding NC pairing $\tilde\pi$. A vertical line interval of a singleton block $\{i\}$ in $\pi$ corresponds to a single line between $i'$ and $i''$ in the NC pairing $\tilde\pi$. For a formal definition, see formula (2.4) in [17].
We have the following lemma.
Lemma 3.2.
If NC partition and NC pairing are related by the doubling construction then .
Proof.
Let be a NC partition of . Consider and let , that is, the vertical line with abscissa intersects partition lines . Then, , since the vertical line with abscissa intersects exactly those lines that correspond to the lines under the doubling construction.
This implies, in particular, that . Suppose that the maximum of is reached at some . If then , hence . Alternatively, if , then must be odd by properties of pairings and since no more than one crossing with a vertical line can be eliminated when the line’s abscissa changes from to . Hence , which together with the inequality above implies that .
∎
Proof of Theorem 3.1.
By preceding arguments, it is enough to prove a corresponding result for the width of non-crossing pairings.
For NC pairings, the number of intersections of the vertical line at $t$ and the semicircles of a pairing equals the number of pairs $(a,b)$ such that $a < t < b$, that is, the number of pairing arcs that have already started but have not yet been closed. Note that in the standard bijection of NC pairings and Dyck paths, the start of a pair corresponds to a step up and the end of a pair corresponds to a step down. Hence, the width of a pairing at $t = i + 1/2$ equals the height of the corresponding Dyck path after its $i$-th step.
Then the conclusion of the theorem follows from the known results about the height distribution of Dyck paths and rooted planar trees. (See [9] and Example V.8 on p. 326 - 330 in [11]).
∎
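The key identity used here, that the width of a pairing at $t = i + 1/2$ equals the height of the Dyck path after $i$ steps, is easy to test exhaustively for small sizes. The sketch below (hypothetical helper names) builds every NC pairing of $[2m]$ from a Dyck path by matching each “down” step with the most recent unmatched “up” step, and compares the two profiles.

```python
def dyck_paths(m):
    """Yield all Dyck paths with 2m steps as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == m and downs == m:
            yield path
            return
        if ups < m:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def path_to_pairing(path):
    """NC pairing of [2m]: each 'D' position is paired with the latest open 'U'."""
    stack, pairs = [], []
    for pos, step in enumerate(path, start=1):
        if step == "U":
            stack.append(pos)
        else:
            pairs.append((stack.pop(), pos))
    return pairs


def pairing_width_profile(pairs, size):
    """Number of arcs (a, b) with a < t < b, for t = 1.5, ..., size - 0.5."""
    return [sum(1 for a, b in pairs if a < j + 0.5 < b) for j in range(1, size)]


def height_profile(path):
    """Height of the Dyck path after steps 1, ..., len(path) - 1."""
    heights, h = [], 0
    for step in path[:-1]:
        h += 1 if step == "U" else -1
        heights.append(h)
    return heights


for m in range(1, 7):
    for p in dyck_paths(m):
        assert pairing_width_profile(path_to_pairing(p), 2 * m) == height_profile(p)
print("width profile equals height profile for all NC pairings of up to 12 points")
```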
Remark: The results about the height of a random planar tree/Dyck path were obtained in [9] by a difficult analysis of generating functions of trees that have limited height. An alternative approach to prove these results is to note that a random Dyck path, after rescaling, converges uniformly almost surely to a Brownian excursion as $n \to \infty$ (see [15]), and then to use known results about Brownian excursions, as described in [19].
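A Monte Carlo sketch (not from the paper) of the scale involved: it samples uniform Dyck paths with $2n$ steps via the cycle lemma and compares their mean height, which is the width of the corresponding NC pairing, with $\sqrt{\pi n}$, the leading term of the expected height of a random plane tree with $n+1$ vertices (see [6]). Agreement is only up to lower-order corrections.

```python
import random
from math import pi, sqrt


def random_dyck_path(n, rng):
    """Uniform Dyck path with 2n steps (+1 up, -1 down), via the cycle lemma."""
    steps = [1] * n + [-1] * (n + 1)
    rng.shuffle(steps)
    running, lowest, cut = 0, 0, -1
    for i, x in enumerate(steps):
        running += x
        if running < lowest:  # first position of the minimum
            lowest, cut = running, i
    rotated = steps[cut + 1:] + steps[:cut + 1]
    return rotated[:-1]


def height(path):
    """Maximal height of the path, i.e. the width of the corresponding NC pairing."""
    level = best = 0
    for x in path:
        level += x
        best = max(best, level)
    return best


rng = random.Random(2024)
n, samples = 2000, 300
average = sum(height(random_dyck_path(n, rng)) for _ in range(samples)) / samples
print(round(average, 1), round(sqrt(pi * n), 1))  # close; the ratio tends to 1 as n grows
```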
Figure 4 illustrates that the width of non-crossing partitions at converges as a process in to a Brownian excursion process.
Appendix A. A bijection between Dyck paths with $2n$ steps and binary planar trees with $2n+1$ vertices
The map from paths to the set of these trees is defined recursively. Consider the point of the first return of the path to the horizontal axis. There are two possibilities: either it happens at the last step, or it happens before the last step.
In the first case, the path has the form $UPD$, where $P$ is a Dyck path with $2n-2$ steps. It follows that there is a binary tree $T_P$ corresponding to $P$. We create a new binary tree by defining a new root vertex with left and right edges and connecting the tree $T_P$ to the right edge. The left edge is marked by the label of the first “up” step in the Dyck path $UPD$.
In the second case, the Dyck path can be written as $P_1P_2$, where $P_1$ and $P_2$ are two Dyck paths of lengths $2n_1$ and $2n_2$ with $n_1 + n_2 = n$ and $n_1, n_2 \ge 1$, and the path $P_1$ has the property that it never returns to zero except at the last step. By recursion we can build binary trees $T_1$ and $T_2$ on $2n_1+1$ and $2n_2+1$ vertices. The tree $T_1$ has an additional property that it has a leaf immediately on the left of its root. We create a new tree on $2n+1$ vertices by gluing the root of $T_2$ to this left leaf of $T_1$.
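The recursive description translates directly into code. In this sketch a leaf is `None` and an internal vertex is a pair `(left, right)`; the empty path is assumed to map to a single vertex (the base case is implicit in the text). Both cases collapse into one rule, since in the first case the left subtree is just a leaf. Edge labels are omitted for brevity.

```python
def dyck_paths(n):
    """Yield all Dyck paths with 2n steps as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == n and downs == n:
            yield path
            return
        if ups < n:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def path_to_binary_tree(path):
    """Binary tree of a Dyck path: None is a leaf, (left, right) an internal vertex."""
    if not path:
        return None  # assumed base case: empty path -> single vertex
    level = 0
    for i, step in enumerate(path):
        level += 1 if step == "U" else -1
        if level == 0:  # first return to the horizontal axis
            break
    inner, rest = path[1:i], path[i + 1:]
    # Case 1 (rest empty): new root, a leaf on the left, T(inner) on the right.
    # Case 2: T(rest) is glued onto the leaf left of the root of T(U inner D).
    return (path_to_binary_tree(rest), path_to_binary_tree(inner))


def vertex_count(tree):
    return 1 if tree is None else 1 + vertex_count(tree[0]) + vertex_count(tree[1])


for n in range(1, 8):
    trees = [path_to_binary_tree(p) for p in dyck_paths(n)]
    assert all(vertex_count(t) == 2 * n + 1 for t in trees)
    assert len(set(trees)) == len(trees)  # injective on C_n paths, so a bijection
print(path_to_binary_tree("UUDD"))  # (None, (None, None))
```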
- [1] Edward A. Bender. Asymptotic methods in enumeration. SIAM Review, 16(4):485–515, October 1974.
- [2] Philippe Biane. Representations of symmetric groups and free probability. Advances in Mathematics, 138:126–181, 1998.
- [3] Philippe Biane, Jim Pitman, and Marc Yor. Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions. Bulletin of the American Mathematical Society, 38:435–465, 2001.
- [4] Saul A. Blanco and T. Kyle Petersen. Counting Dyck paths by area and rank. Annals of Combinatorics, 18(2):171–197, 2014.
- [5] E. Rodney Canfield. Remarks on an asymptotic method in combinatorics. Journal of Combinatorial Theory, Series A, 37:348–352, 1984.
- [6] N. G. de Bruijn, D. E. Knuth, and S. O. Rice. The average height of planted plane trees. In Graph Theory and Computing. Academic Press, New York and London, 1972.
- [7] Alain Denise and Rodica Simion. Two combinatorial statistics on Dyck paths. Discrete Mathematics, 137:155–176, 1995.
- [8] Emeric Deutsch. Dyck path enumeration. Discrete Mathematics, 204:167–202, 1999.
