Stein's method via induction
Louis H. Y. Chen, Larry Goldstein, Adrian Röllin

TL;DR
This paper introduces an inductive approach to Stein's method that effectively handles non-bounded variables with complex dependencies, providing optimal rate bounds for normal approximation in new applications.
Contribution
It develops a novel inductive technique for Stein's method that applies to non-bounded couplings and demonstrates its effectiveness on complex dependent structures.
Findings
Achieved Berry-Esseen bounds for the Erdős-Rényi random graph with a fixed number of edges.
Applied the method to Jack measure on tableaux, showing its versatility.
Produced bounds in Kolmogorov metric with optimal rate.
Abstract
Applying an inductive technique for Stein and zero bias couplings yields Berry-Esseen theorems for normal approximation for two new examples. The conditions of the main results do not require that the couplings be bounded. Our two applications, one to the Erdős-Rényi random graph with a fixed number of edges, and one to Jack measure on tableaux, demonstrate that the method can handle non-bounded variables with non-trivial global dependence, and can produce bounds in the Kolmogorov metric with the optimal rate.
STEIN’S METHOD VIA INDUCTION
Louis H. Y. Chen∗, Larry Goldstein‡ and Adrian Röllin∗
∗National University of Singapore and ‡University of Southern California
AMS 2000 subject classifications: Primary 60F05; secondary 05C07, 05C80, 05E10.
Keywords: Kolmogorov distance, optimal rates, Erdős-Rényi random graph, Jack measure.
1 Introduction
We present new Berry-Esseen theorems for sums of possibly dependent variables by combining both the Stein and zero bias couplings of Stein’s method with the inductive technique of Bolthausen (1984) originally developed for the combinatorial central limit theorem. We apply these results to obtain normal approximations in the Kolmogorov metric for two new examples.
Stein’s method (Stein (1972), Stein (1986)) typically proceeds by coupling a random variable $W$ of interest to a related variable $W'$; for an overview see Chen, Goldstein and Shao (2011) and Ross (2011). Here we develop results that can be applied to the Stein couplings of Chen and Röllin (2010) and to the zero bias couplings of Goldstein and Reinert (1997), thus encompassing most of the known couplings that have appeared in the literature, including settings not typically framed in terms of couplings, such as local dependence. The innovation here is the widened scope of the couplings that can be handled, which permits applications when the difference between $W$ and the coupled $W'$ is not almost surely bounded by a constant, or where the bound on this difference increases in the problem size. This work is a broad extension and continuation of Ghosh (2009), which applies induction and the zero bias coupling to the combinatorial central limit theorem where the random permutations are involutions, and of Goldstein (2013), which uses the size bias coupling to study degree counts in the Erdős-Rényi random graph; the inductive method considered here is inspired by Bolthausen (1984), but goes ultimately back to Bergström (1944).
At the center of Stein’s method is the characterization that $Z$ is a standard normal random variable if and only if
$$\mathbb{E}\{Z f(Z)\} = \mathbb{E}\{f'(Z)\}$$
for all locally absolutely continuous functions $f$ for which the above expectations exist. Given a standardized variable $W$ whose distribution is to be compared to $Z$, and a test function $h$ on which to evaluate the difference $\mathbb{E}h(W) - \mathbb{E}h(Z)$, one solves the Stein equation
$$f'(w) - w f(w) = h(w) - \mathbb{E}h(Z) \tag{1.1}$$
for $f$. The difference may then be evaluated by substituting $W$ for $w$ and taking expectation on the left hand side of (1.1), rather than the right. One explanation of why the expectation of the left hand side may be simpler to compute, or bound, than that of the right is that it depends only on the distribution of $W$, whereas the right also depends on that of $Z$. In particular, on the left hand side one may apply couplings of $W$ to auxiliary random variables having properties that allow for convenient manipulations.
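The Stein equation can be checked numerically for the indicator test functions that underlie the Kolmogorov metric. The following is a minimal sketch, assuming the test function $h(w) = \mathbf{1}\{w \le z\}$ and the standard closed-form solution for that case; the function names `Phi`, `stein_solution` and `stein_residual` are illustrative and do not come from the paper.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stein_solution(w, z):
    """Closed-form solution of f'(w) - w f(w) = 1{w <= z} - Phi(z)."""
    c = math.sqrt(2.0 * math.pi) * math.exp(0.5 * w * w)
    if w <= z:
        return c * Phi(w) * (1.0 - Phi(z))
    return c * Phi(z) * (1.0 - Phi(w))

def stein_residual(w, z, eps=1e-6):
    """Left side minus right side of the Stein equation at w (w away from z).

    The derivative is approximated by a central finite difference, so the
    residual is only expected to be small, not exactly zero.
    """
    deriv = (stein_solution(w + eps, z) - stein_solution(w - eps, z)) / (2 * eps)
    lhs = deriv - w * stein_solution(w, z)
    rhs = (1.0 if w <= z else 0.0) - Phi(z)
    return lhs - rhs

if __name__ == "__main__":
    for w in (-2.0, -0.5, 0.0, 1.5, 2.5):
        print(w, stein_residual(w, 0.5))
```

The residual is evaluated away from the point $w = z$, where the solution has a kink and the finite difference would straddle the discontinuity of $h$.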
In Theorem 1.1 we present results for situations in which one can form a Stein coupling as defined by Chen and Röllin (2010). Following the treatment there, we say that the triple of random variables $(W, W', G)$ is a Stein coupling when
$$\mathbb{E}\{G f(W') - G f(W)\} = \mathbb{E}\{W f(W)\} \tag{1.2}$$
for all functions $f$ for which the expectations above exist. It is not difficult to see that the canonical exchangeable pair coupling of Stein (1986), and the size bias coupling of Goldstein and Rinott (1996) are both special cases of Stein couplings. Indeed, recall that for $\lambda \in (0,1)$ we say $(W, W')$ is a $\lambda$-Stein pair if $(W, W')$ is exchangeable and
$$\mathbb{E}(W' \mid W) = (1-\lambda) W.$$
In this case, it is easily verified that (1.2) is satisfied with
$$G = \frac{W' - W}{2\lambda}.$$
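The verification for the exchangeable pair case can be written out in three lines. This is a sketch under the standard conventions, assuming the notation $(W, W')$ for the exchangeable pair, $\lambda$ for the linearity constant in $\mathbb{E}(W' \mid W) = (1-\lambda)W$, and $G = (W'-W)/(2\lambda)$; the first line uses only that $(w, w') \mapsto (w'-w)(f(w')+f(w))$ is antisymmetric.

```latex
\begin{align*}
0 &= \mathbb{E}\bigl\{(W'-W)\bigl(f(W')+f(W)\bigr)\bigr\}
   && \text{(exchangeability and antisymmetry)}\\
\mathbb{E}\bigl\{(W'-W) f(W)\bigr\}
  &= \mathbb{E}\bigl\{f(W)\,\mathbb{E}(W'-W \mid W)\bigr\}
   = -\lambda\,\mathbb{E}\{W f(W)\}
   && \text{(linearity condition)}\\
\mathbb{E}\bigl\{G f(W') - G f(W)\bigr\}
  &= \frac{1}{2\lambda}\,\mathbb{E}\bigl\{(W'-W)\bigl(f(W')-f(W)\bigr)\bigr\}
   = -\frac{1}{\lambda}\,\mathbb{E}\bigl\{(W'-W) f(W)\bigr\}
   = \mathbb{E}\{W f(W)\}.
\end{align*}
```

The middle equality in the last line substitutes $\mathbb{E}\{(W'-W)f(W')\} = -\mathbb{E}\{(W'-W)f(W)\}$ from the first line.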
Likewise, for a non-negative random variable $Y$ with finite mean $\mu$, we say that $(Y, Y^s)$ is a size bias coupling of $Y$ when $Y^s$ has the $Y$-size bias distribution, that is, when
$$\mathbb{E}\{Y f(Y)\} = \mu\, \mathbb{E}\{f(Y^s)\}$$
for all functions $f$ for which these expectations exist. Again, it is easy to verify that for such couplings (1.2) is satisfied with
$$(W, W', G) = (Y - \mu,\; Y^s - \mu,\; \mu).$$
In particular, Theorem 1.1 extends results in Goldstein (2013) for the size bias coupling.
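A concrete instance may help fix the size bias identity. The sketch below assumes the textbook example $Y \sim \mathrm{Binomial}(n, p)$, for which $1 + \mathrm{Binomial}(n-1, p)$ has the $Y$-size bias distribution, and checks $\mathbb{E}\{Y f(Y)\} = \mu\,\mathbb{E}\{f(Y^s)\}$ by exact summation over the probability mass functions. The example and the names `binom_pmf` and `size_bias_gap` are illustrative choices, not taken from the paper.

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def size_bias_gap(n, p, f):
    """E[Y f(Y)] - mu * E[f(Y^s)] for Y ~ Binomial(n, p), Y^s = 1 + Binomial(n-1, p)."""
    mu = n * p
    lhs = sum(k * q * f(k) for k, q in enumerate(binom_pmf(n, p)))
    rhs = mu * sum(q * f(k + 1) for k, q in enumerate(binom_pmf(n - 1, p)))
    return lhs - rhs

if __name__ == "__main__":
    # The gap should vanish up to floating point rounding.
    print(size_bias_gap(12, 0.3, lambda y: y**3 - 2 * y))
```

In this example the triple $(Y - \mu,\, Y^s - \mu,\, \mu)$ plays the role of the Stein coupling, with $\mu = np$.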
Theorem 1.2 provides a parallel result for the zero bias coupling of Goldstein and Reinert (1997). Recall that for a non-trivial mean zero, variance $\sigma^2$ random variable $W$, we say that $W^*$ has the $W$-zero biased distribution if
$$\mathbb{E}\{W f(W)\} = \sigma^2\, \mathbb{E}\{f'(W^*)\} \tag{1.4}$$
for all functions $f$ for which the quantities above exist.
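The zero bias identity can be illustrated with the simplest non-trivial case. The sketch below assumes $W$ takes the values $\pm 1$ with probability $1/2$ each, so that $\sigma^2 = 1$ and, by a well-known computation, $W^*$ is uniform on $[-1, 1]$; it compares $\mathbb{E}\{W f(W)\}$ with a midpoint-rule approximation of $\mathbb{E}\{f'(W^*)\}$. The choice of example and the name `zero_bias_gap` are assumptions made for illustration.

```python
import math

def zero_bias_gap(f, fprime, grid=20000):
    """E[W f(W)] - sigma^2 E[f'(W*)] for W = +/-1 equiprobable, W* ~ Uniform[-1, 1]."""
    # Exact left side: average of w * f(w) over w in {-1, +1}; sigma^2 = 1.
    lhs = 0.5 * f(1.0) - 0.5 * f(-1.0)
    # Right side: midpoint rule for (1/2) * integral of f' over [-1, 1].
    h = 2.0 / grid
    rhs = sum(fprime(-1.0 + (i + 0.5) * h) for i in range(grid)) * h / 2.0
    return lhs - rhs

if __name__ == "__main__":
    f = lambda w: math.sin(w) + w**3
    fprime = lambda w: math.cos(w) + 3 * w**2
    print(zero_bias_gap(f, fprime))
```

The agreement here reduces to the fundamental theorem of calculus, which is exactly why the uniform distribution arises as the zero bias distribution of a symmetric two-point variable.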
In Stein’s method in general, simplification occurs when one can achieve couplings of $W$ to an appropriate $W'$ such that the difference $W' - W$ is almost surely bounded, or bounded uniformly in the size of the problem. However, in many situations appropriately bounded couplings may be difficult to construct, whereas unbounded couplings seem to appear naturally. Hence Theorems 1.1 and 1.2, which do not impose restrictive boundedness conditions, may be applied to produce new results in a variety of examples.
General Framework.
Let and be two measurable spaces, the parameter space and the sample space, respectively. All random variables are understood to be real valued measurable functions from the product space . The distribution of a random variable is determined by a parameter through a given transition kernel from to . That is, for each , is a probability measure on , and for each , the map is -measurable. Depending on context and emphasis, we may also write as or , so that, for instance, .
These measurability conditions are needed to assure the measurability of mappings that appear later, such as of the mean , the variance of , and of , which represents the value of at the parameter used in the inductive step. These conditions will not always be invoked explicitly below; we illustrate their use by showing in the Appendix, Section 5, that this latter variable in particular is measurable.
Our goal is to obtain bounds on the Kolmogorov distance between the standardized version of a random variable and the normal distribution in terms of the parameter . Theorems 1.1 and 1.2 below yield a bound of the form for a positive ‘rate’ function of and a constant not depending on .
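For a discrete random variable the Kolmogorov distance to the normal can be computed exactly, since the supremum is attained at a jump of the distribution function or at its left limit. The following sketch assumes a finite support with distinct values; the function names `Phi` and `kolmogorov_to_normal`, and the binomial example, are illustrative and not part of the paper.

```python
import math
from math import comb

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_to_normal(values, probs):
    """sup_x |P((Y - mean)/sd <= x) - Phi(x)| for a finitely supported Y.

    `values` are assumed distinct; `probs` are the matching probabilities.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    sd = math.sqrt(var)
    dist, cdf = 0.0, 0.0
    for v, p in sorted(zip(values, probs)):
        target = Phi((v - mean) / sd)
        dist = max(dist, abs(cdf - target))  # left limit at the jump
        cdf += p
        dist = max(dist, abs(cdf - target))  # value at the jump
    return dist

def binomial_distance(n, p):
    """Kolmogorov distance between standardized Binomial(n, p) and the normal."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return kolmogorov_to_normal(list(range(n + 1)), probs)

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, binomial_distance(n, 0.5))
```

Printing the distance for increasing $n$ gives a quick empirical feel for the rate against which a bound of Berry-Esseen type is to be compared.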
As noted, one main step our method requires is to couple to a random variable , which satisfies either the Stein coupling relation (1.2) or the zero bias coupling relation (1.4). In order to apply induction, we identify a subset in Condition (G1), consisting of the ‘nicely behaved’ parameters; its complement plays the role of the base case, on which the bound may be trivial. For our bound to be informative, it is necessary that the rate function be unbounded on \Smiley.
For the induction step, we also introduce a sub -algebra that, roughly speaking, captures the information about the changes that were necessary to construct from (or equivalently, from ); the coarser is, the better the normal approximation will be. A certain tension is created here, as must be large enough to contain the variables describing the changes from to , but small enough so that the conditional distribution of on , is sufficiently close to its original one.
Conditional on , the variable may no longer have its original distribution, but induction is viable when one can identify within another variable that has a distribution similar to the original ; when the parameter space is ordered, typically has a smaller parameter. For a successful induction, the parameter of the smaller problem should not stray too far from that of . There is some leeway here, as it suffices to have control over an event , as specified in Condition (G4). Intuitively, the event should contain the bulk of the support of the variables that generate , and not their extremes. For instance, for the Erdős-Rényi graph problem considered, contains the label and degree of a chosen vertex on which the coupling is based, and is an event on which its degree is ‘not too large’.
Relaxing the condition that the difference be bounded, we control the magnitude of this difference by its moments. Moreover, we upper bound by , and in the case of a Stein coupling, also by , where these majorizing variables are required to be measurable; we are able to handle exceptional or boundary cases as these upper bounds are only required to hold on . We will also require the existence of a random variable that bounds the absolute difference , and which is not ‘too large.’ See Conditions (G3), (G4) and (G6) for the case of Stein couplings.
There is also some leeway in that the distribution of , conditionally on , only needs to be close to that of on an event . Precisely, for the Stein coupling case, with similar remarks also applying to zero bias couplings, we impose in Condition (G5) that
[TABLE]
where is the (typically random) parameter capturing the conditional distribution of the embedded variable . For clarification, by (1.5) we mean
[TABLE]
With the help of , a recursive inequality for a bound on the distance between and the normal can be produced.
Before attempting to apply the methods presented in this article, it is advisable that a user first ‘test the waters’ by constructing a Stein or zero-bias coupling and proving a normal approximation for a smooth metric such as the Wasserstein distance; see Chen and Röllin (2010), or Goldstein (2007), respectively. Once this goal has been achieved, the sigma-algebra will typically arise naturally from the coupling construction, and one may then proceed to identify a suitable variable whose conditional distribution given is within the same class of distributions determined by and close to that of . For instance, in occupancy problems, a Stein coupling or zero-bias coupling typically involves moving around a small number of balls among a small number of urns, and will typically again represent an occupancy problem, but on fewer balls and fewer urns.
1.1 Abstract approximation theorems
We now state the conditions required for our main results. The inverse rate function is assumed to be a positive function, measurable in , a condition satisfied for all natural examples, including the ones considered here. The mean and variance are measurable by the conditions in our General Framework. To avoid repetition, the distribution of random variables indicated after has been fixed is with respect to . The random variable will always denote the standard normal.
The variable denotes the unstandardized random variable of interest. Theorem 1.1 shows that the following set of conditions are sufficient for the Kolmogorov distance between the standardized version of and the normal to be bounded by for some universal constant .
- (G1) Let be a positive measurable function, let be a positive number, and let
[TABLE]
Assume that is chosen such that for all .
- (G2) For all , let and , and define
[TABLE]
whenever , and set otherwise. Let and be two random variables such that, for each , is a Stein coupling, in the sense of (1.2), with respect to .
- (G3) With assume that
[TABLE]
- (G4) For each , let be a sub-$\sigma$-algebra. Let and be random variables such that, for each , the mappings and are -measurable and such that, on some event which need not be in , we have , , and
[TABLE]
- (G5) Let be a -valued random element such that, for each , is -measurable. Let be a random variable, and for each , let be such that
[TABLE]
and
[TABLE]
- (G6) Let be a random variable such that, for each , is -measurable,
[TABLE]
- (G7) Assume
[TABLE]
where the essential suprema are taken with respect to .
Theorem 1.1.
If Conditions (G1)–(G7) are satisfied, then there exists a constant , independent of , such that
[TABLE]
Theorem 1.1 extends Theorem 1.1 in Goldstein (2013), which produces a Kolmogorov bound equivalent up to constants to the bound in Chen and Röllin (2010) for the Wasserstein distance to the normal for bounded size bias couplings. In addition, the bound produced by Bartroff and Goldstein (2013) by an application of Theorem 1.1 of Goldstein (2013) to counts in a multinomial occupancy model was shown there to be of optimal order by the lower bound (1.6) of Englund (1981), see also (1.7) of Bartroff and Goldstein (2013); the bound of Theorem 1.2 of Goldstein (2013), using also Theorem 1.1 of that same work, for degree counts in the Erdős-Rényi random graph can also be shown to be optimal up to constant factors in the same manner.
When higher moments exist a number of the conditions of the theorem may be verified using simpler expressions, obtained via standard inequalities. For instance, using and that in (1.2) shows that , hence applying the Cauchy-Schwarz inequality to the first expression in (1.7) in Condition (G3) above, followed by a consequence of the conditional variance formula, we obtain
[TABLE]
where is any -algebra with respect to which is measurable.
We now state a parallel result for zero bias couplings.
- (Z1) Let be a positive measurable function, let be a positive number, and let
[TABLE]
Assume that is chosen such that for all .
- (Z2) Let and , and define
[TABLE]
whenever and otherwise. Let be defined on , such that for each the variable has the -zero bias distribution as in (1.4) with respect to .
- (Z3) For each let be a sub-sigma algebra of , let , and let be a random variable such that is -measurable, and let be an event, which need not be in , on which and such that
[TABLE]
- (Z4) Let be a random variable, and let be a -valued random element such that, for each , is -measurable. For each , let be an event in such that
[TABLE]
and
[TABLE]
- (Z5) Let be a random variable such that, for each , is -measurable, and
[TABLE]
Theorem 1.2.
If Conditions (Z1)–(Z5) and (G7) are satisfied, then there exists a constant , independent of , such that
[TABLE]
Many of the conditions of Theorem 1.2, as for Theorem 1.1, can be shown to be satisfied using inequalities on moments. The proofs of Theorems 1.1 and 1.2 appear in Section 4.
1.2 Applications
We apply Theorems 1.1 and 1.2 to obtain new results in two examples; the proofs are deferred to Sections 2 and 3.
The first example invokes Theorem 1.1 for Stein couplings for the normal approximation of the number of isolated vertices in the Erdős-Rényi graph on vertices, having exactly edges, distributed uniformly at random. This model is related to the one where edges between each pair of vertices are chosen independently with some fixed probability, but in the model we consider the indicators that vertices are isolated exhibit a non-trivial global dependence since the total number of edges is fixed. In fact, while in the model with independent edges these indicators are positively correlated, the effect of the global dependence in is stronger, resulting in a negative correlation; see proof of Lemma 2.5.
Related work was done by Kordecki (1987) on the number of isolated vertices in the Erdős-Rényi graph model, although his general framework is not applicable here. The boundedness of the second derivative of the solution to the Stein equation on page 132 is shown only for the points where the second derivative exists, whereas, in order to perform the Taylor expansion on page 135, it is needed to hold everywhere; we were thus not able to reproduce his final results. In addition, the fixed number of edges model does not appear to satisfy the condition on page 134 of his work. We also mention the work by Goldstein (2013), who considered vertex degrees in general, though it only addressed the independent edge model.
Theorem 1.3 provides the following bound on the Kolmogorov distance between the standardized variable and the normal.
Theorem 1.3.
Let count the number of isolated vertices in the Erdős-Rényi graph on vertices, having exactly edges, distributed uniformly at random. Then, with and the mean and variance of , letting when and zero otherwise, with
[TABLE]
there exists a universal constant such that, for all ,
[TABLE]
where
[TABLE]
Remark 1.4.
In order to better understand the bounds obtained in Theorem 1.3, we now discuss in more detail the different regimes at which and can tend to infinity. To this end, denote by that , and by that and . By Lemma 2.7, if and tend to infinity so that , then
[TABLE]
Hence, we have
[TABLE]
so that
[TABLE]
For , the central domain, it follows that , and moreover, in the special case where ,
[TABLE]
Regarding lower bounds, Englund (1981, Section 6) shows that for the standardized number of occupied cells in a uniform occupancy model with balls and boxes,
[TABLE]
Englund’s argument holds without changes for any random variable with finite variance supported on the integers, and so also for the number of isolated vertices in our model. Hence, since in the central domain , the rate function is of optimal order.
If and , the left domain, say, then
[TABLE]
since as for the first relation, and for the second. In this case, Englund’s lower bound is not achieved since . Nonetheless, the bound is informative as long as , which is the case as long as , such as when for and .
If , the right domain, using for the second relation we have
[TABLE]
so Englund’s lower bound is not attained. However, goes to infinity when for .
In the second example, we use the zero bias coupling constructed in Fulman and Goldstein (2011, Theorem 3.1) in Theorem 1.2 to give a bound on the normal approximation of the content of a Young tableau under Jack$_\alpha$ measure over a range of large . In more detail, we recall that a partition of a positive integer can be represented as a vector of non-increasing, positive integers summing to , where is the number of parts of the partition. For instance, corresponds to a partition of with . In turn, the partition can be represented by a tableau with rows of equal sized boxes, whose row is of length , such as in (1.23).
The Jack$_\alpha$ measure on tableaux, defined for , recovers the Plancherel measure when specializing to the case . Under Jack$_\alpha$, see Fulman (2004) for instance, the probability of a partition of is given by
[TABLE]
where the product is over all boxes in the partition, denotes the number of boxes in the same row of and to the right of (the “arm” of ), and denotes the number of boxes in the same column of and below (the “leg” of ). For each tableau representing a partition of we may define the -content of any individual box by
[TABLE]
as depicted in the following tableau for the partition of 7:
[TABLE]
Here we study the distribution of the standardized sum of the -contents over all boxes in the tableau, that is,
[TABLE]
and where the partition of is sampled from the Jack$_\alpha$ measure in (1.22).
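The arm, leg and content definitions above can be made concrete for small $n$ by enumerating all partitions. The sketch below is written under assumptions, since the displayed formulas are not reproduced here: it takes the Jack$_\alpha$ probability of a partition $\lambda$ of $n$ to be $\alpha^n n! / \prod_{x \in \lambda} (\alpha a(x) + l(x) + 1)(\alpha a(x) + l(x) + \alpha)$, and the $\alpha$-content of the box in row $i$ and column $j$ to be $\alpha(j-1) - (i-1)$, as in Fulman (2004). The function names are hypothetical. Exact rational arithmetic is used so that the normalization can be checked without rounding.

```python
from fractions import Fraction
from math import factorial

def partitions(n, largest=None):
    """Yield all partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def jack_probability(lam, alpha):
    """Assumed Jack_alpha probability of the partition lam (see lead-in)."""
    n = sum(lam)
    prod = Fraction(1)
    for i, row in enumerate(lam):            # i, j are 0-based row and column
        for j in range(row):
            arm = row - j - 1                # boxes to the right in the same row
            leg = sum(1 for r in lam if r > j) - i - 1  # boxes below in the column
            prod *= (alpha * arm + leg + 1) * (alpha * arm + leg + alpha)
    return alpha**n * factorial(n) / prod

def content_sum(lam, alpha):
    """Sum of the alpha-contents alpha*(column-1) - (row-1) over all boxes."""
    return sum(alpha * j - i for i, row in enumerate(lam) for j in range(row))

if __name__ == "__main__":
    alpha = Fraction(2)
    for lam in partitions(4):
        print(lam, jack_probability(lam, alpha), content_sum(lam, alpha))
```

For $n = 3$ a hand computation gives total mass one, mean zero and variance $3\alpha = \alpha\binom{3}{2}$ for the content sum, which is consistent with standardizing by $\sqrt{\alpha\binom{n}{2}}$.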
Fulman (2004) proved an bound for the error in the Kolmogorov metric for the normal approximation of , improved by Fulman (2006) using martingales to for any , and by Fulman (2006) to using Bolthausen’s inductive approach and Stein’s method, but without an explicit constant. Hora and Obata (2007) prove a central limit theorem, with no error bound, for using quantum probability.
Fulman and Goldstein (2011) prove the bound
[TABLE]
in the Wasserstein metric , where is a standard normal variable. In addition to providing explicit constants, this bound also highlights the role of . A natural question it raises is whether a bound in the Kolmogorov metric can be shown that has this same dependence on . A few weeks before the current work was posted, Chen and Thánh (2019, Theorem 1.1) proved the bound
[TABLE]
which achieves this goal with an explicit constant to within a logarithmic factor.
Here, given any , we show that, in the ‘large ’ region , this log factor may be removed, resulting in the bound having the same dependence as (1.25). That is, as over the region we consider, the ratio between the right hand sides of (1.25) and (1.26) is bounded away from zero and infinity. This same result, with an explicit constant, was also achieved by Chen and Thánh (2019, Proposition 4.1) by applying a different approach. We do not consider , as Theorem 3.1 below shows that this case is degenerate.
Theorem 1.5.
For as given in (1.24) with sampled according to Jack$_\alpha$ measure for some , for every there exists a constant depending only on such that
[TABLE]
We remark that by applying the reasoning at the end of the proof of Theorem 4.1 of Fulman and Goldstein (2011) the result holds also for when replacing the on the right hand side by . In the computations that follow, without subscript will denote a universal constant whose value may change from line to line, and for a non-negative integer, will denote the set .
2 Isolated vertices in the Erdős-Rényi random graph
In this section we prove Theorem 1.3. We begin by reviewing Construction 2A of Chen and Röllin (2010) for Stein couplings. Let be a collection of mean zero random variables, and let be a random index uniformly distributed over , independent of . Let and suppose that for each there exists such that
[TABLE]
Then, with , the triple is a Stein coupling. To verify the claim, first note that
[TABLE]
On the other hand,
[TABLE]
so (1.2) holds.
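The two steps of this verification can be sketched as follows. The notation is an assumption, since the symbols are not reproduced above: $X_1, \dots, X_n$ are the mean zero summands, $W = \sum_{i=1}^n X_i$, $I$ is uniform on $\{1, \dots, n\}$ and independent of all other variables, each $W'_i$ satisfies $\mathbb{E}(X_i \mid W'_i) = 0$ (which holds in particular when $X_i$ and $W'_i$ are independent), and the coupling is taken to be $W' = W'_I$ with $G = -n X_I$.

```latex
\begin{align*}
-\mathbb{E}\{G f(W)\}
  &= n\,\mathbb{E}\{X_I f(W)\}
   = \sum_{i=1}^{n} \mathbb{E}\{X_i f(W)\}
   = \mathbb{E}\{W f(W)\},\\
\mathbb{E}\{G f(W')\}
  &= -n\,\mathbb{E}\{X_I f(W'_I)\}
   = -\sum_{i=1}^{n} \mathbb{E}\bigl\{f(W'_i)\,\mathbb{E}(X_i \mid W'_i)\bigr\}
   = 0.
\end{align*}
```

Adding the two lines gives $\mathbb{E}\{G f(W') - G f(W)\} = \mathbb{E}\{W f(W)\}$, which is the Stein coupling relation.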
2.1 Isolated vertices in
Consider the Erdős and Rényi (1960) random graph on vertices, having exactly edges, distributed uniformly at random. Let be the degree of vertex , and consider the number of isolated vertices
[TABLE]
With , the mean and variance of are given by, respectively,
[TABLE]
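The mean and variance of the number of isolated vertices follow from elementary counting, and can be checked against brute-force enumeration for small graphs. The sketch below assumes the notation $N = \binom{n}{2}$ for the number of vertex pairs; it uses that one fixed vertex is isolated with probability $\binom{N-(n-1)}{m} / \binom{N}{m}$ and two fixed vertices are both isolated with probability $\binom{N-(2n-3)}{m} / \binom{N}{m}$. These expressions are derived here directly and may be written differently from the display above; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def isolated_moments(n, m):
    """Exact mean and variance of the isolated-vertex count, for n >= 2."""
    N = comb(n, 2)
    p1 = Fraction(comb(N - (n - 1), m), comb(N, m))      # one vertex isolated
    p2 = Fraction(comb(N - (2 * n - 3), m), comb(N, m))  # two vertices isolated
    mean = n * p1
    var = n * p1 + n * (n - 1) * p2 - mean**2
    return mean, var

def isolated_moments_bruteforce(n, m):
    """Same quantities by enumerating every graph with n vertices and m edges."""
    pairs = list(combinations(range(n), 2))
    counts = []
    for edges in combinations(pairs, m):
        touched = {v for e in edges for v in e}
        counts.append(n - len(touched))
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean**2
    return mean, var

if __name__ == "__main__":
    print(isolated_moments(5, 3), isolated_moments_bruteforce(5, 3))
```

The enumeration is only feasible for very small $n$, but it confirms the counting argument before the formulas are used at scale.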
We remark that though there may be a choice of couplings for a given situation, the coupling we have chosen will work for the more general problem where is a sum
[TABLE]
of functions of the degree of vertex . For instance, the size bias coupling will work, as in Goldstein (2013), for counting the number of vertices having specified degrees, but not in this greater generality.
Proof of Theorem 1.3.
The proof consists of setting up the framework, and then checking that Conditions (G1)–(G7) hold, with Condition (G2) requiring the construction of a Stein coupling. First, let be the enumeration of all unordered pairs with , given by
[TABLE]
Let be a uniformly chosen random permutation of . We will describe the construction of a graph , determined by and , that has distribution . As is determined by , and hence by , may be omitted in the notation for the graph; the same principle will be applied without comment for like quantities that appear later.
We construct as follows. For each with , connect vertices and with an edge if and only if
[TABLE]
where is the index in the enumeration (2.2) corresponding to the pair . Clearly this construction results in a graph with edges, precisely, those with labels . Since is uniform it is immediate that . Let be the degree of vertex in , let
[TABLE]
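The construction of the graph from a uniform permutation can be sketched in a few lines. The sketch assumes that a pair of vertices becomes an edge exactly when the permutation assigns its index one of the $m$ smallest values, which is one natural reading of the rule above; the enumeration order of the pairs and the names `sample_graph` and `count_isolated` are illustrative choices.

```python
import random
from itertools import combinations

def sample_graph(n, m, rng):
    """Graph on vertices 0..n-1 with exactly m edges, built from a uniform permutation."""
    pairs = list(combinations(range(n), 2))   # enumeration of all unordered pairs
    ranks = list(range(len(pairs)))
    rng.shuffle(ranks)                        # ranks[k] is the value assigned to pair k
    # Pair k is an edge exactly when its assigned value is among the m smallest.
    return {pairs[k] for k in range(len(pairs)) if ranks[k] < m}

def count_isolated(n, edges):
    """Number of vertices that are not an endpoint of any edge."""
    touched = {v for e in edges for v in e}
    return n - len(touched)

if __name__ == "__main__":
    rng = random.Random(0)
    g = sample_graph(7, 5, rng)
    print(sorted(g), count_isolated(7, g))
```

Because the permutation is uniform, every set of $m$ pairs is equally likely to be selected, which is the defining property of the fixed edge count model.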
We now verify the conditions of Theorem 1.1 with and as given in (1.20) and (1.21), respectively.
Condition (G1).
Let , , and be as in Lemma 2.7. Now obtain in the definition (1.6) of \Smiley through Lemma 2.8 and the choices
[TABLE]
Since our definition of in (1.21) implies that whenever , the condition that on \Smiley is satisfied. Note that by Lemma 2.8
[TABLE]
Condition (G2).
For , let
[TABLE]
and set otherwise. Assume . Let be a collection of uniform random permutations of , with mutually independent. The purpose of the following algorithm is to take the graph as input and to construct, for each vertex , a graph on the vertices , having distribution , independent of , and which can be closely coupled to .
We first describe the algorithm in words: Initialise counters and that respectively record the number of edges successfully relocated, and the index of a candidate edge for possible addition to the new graph; for each given vertex , begin with and relocate the edges incident to uniformly, incrementing when needed and adding as a new edge when it connects two vertices, neither of which is incident to (Step 6), and which are not already connected (Step 7). The counter records the number of edges successfully relocated, and the set holds their locations (that is, indices) in . At termination, the set will have size .
Algorithm 1. Fix .
1. Let
2. Let be equal to , but with vertex and all edges incident to removed.
3. Let and .
4. If , then denote the resulting graph by , and stop.
5. Let .
6. If , then return to Step 5.
7. If , that is, if is an edge in , then return to Step 5.
8. In connect the vertices in by an edge, and let .
9. Let .
10. Return to Step 4.
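The steps above can be sketched as a short routine. This is a reading of the verbal description rather than a transcription of the algorithm, since its symbols are not reproduced here: the candidate pairs are assumed to be supplied in a uniformly random order, a candidate is skipped when it involves the removed vertex or is already an edge, and the scan stops once as many edges have been added as were removed. The names `relocate_edges` and `order` are hypothetical.

```python
import random
from itertools import combinations

def relocate_edges(edges, v, order):
    """Remove vertex v and re-add its edges at free pairs taken in the given order.

    edges: set of pairs (a, b) with a < b.
    order: list of all pairs, in the (random) order in which they are tried.
    Returns the new edge set and the list of pairs that received a relocated edge.
    """
    kept = {e for e in edges if v not in e}   # graph with v and its edges removed
    target = len(edges) - len(kept)           # degree of v = edges to relocate
    relocated = []
    for pair in order:
        if len(relocated) == target:          # all edges relocated: stop
            break
        if v in pair:                         # candidate involves the removed vertex
            continue
        if pair in kept:                      # candidate is already an edge
            continue
        kept.add(pair)
        relocated.append(pair)
    if len(relocated) < target:
        raise ValueError("not enough free pairs to relocate all edges")
    return kept, relocated

if __name__ == "__main__":
    rng = random.Random(1)
    pairs = list(combinations(range(6), 2))
    edges = set(rng.sample(pairs, 7))
    order = pairs[:]
    rng.shuffle(order)
    print(relocate_edges(edges, 0, order))
```

The routine raises an error when the remaining vertices cannot accommodate all edges, which corresponds to the feasibility requirement discussed after the algorithm.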
It is not difficult to see that the algorithm will succeed in redistributing the edges incident on if and only if , which is guaranteed by our choice of \Smiley. Note that, given , and , the construction of from is deterministic and hence, for given , and , will always result in the same graph . Note also that, although has only vertices, we keep the labeling from the original graph . Since the potential locations where the edges are added are sampled uniformly at random without replacement (via ), it is clear that , up to vertex labeling.
Now, let as in (2.7). With a uniformly chosen vertex from , independent of , and recalling the notation in (2.4), let
[TABLE]
For , let be the degree of vertex in the graph , let
[TABLE]
and
[TABLE]
Since the distribution of is the same regardless of the value of , we conclude that and are independent, so (2.1) holds, implying is a Stein coupling.
Condition (G3).
In what follows, consider a fixed , and drop the subscript in the expectations that follow. As is a function of , using (1.15) we have
[TABLE]
Now, from (2.8) and (2.9), we have
[TABLE]
Splitting the sum into two and using , we have
[TABLE]
where
[TABLE]
with . Note that and are deterministic functions of , and . Applying Lemma 2.1 and using the notation as there, we obtain
[TABLE]
where
[TABLE]
Bounding .
Note that
[TABLE]
since all differences arising from the first sum in (Condition (G3).) cancel except the one with index . Applying the simple bound
[TABLE]
we obtain
[TABLE]
Let count the number of white balls among draws from an urn with balls, of which are white and black. Note that the marginal distribution of the degree of any vertex in is $\mathop{\mathrm{Hyp}}(N, m, n-1)$, and hence has mean , since the graph’s edges are uniformly sampled among all possibilities, and exactly of them are associated with a specific vertex. Hence, applying Lemma 2.2, (2.12) and (2.13), we obtain
[TABLE]
where we recall denotes a universal constant, whose value may change from line to line. Thus, as ,
[TABLE]
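The hypergeometric law of a vertex degree used in this bound can be confirmed by enumeration on a small graph. The sketch assumes the convention that $\mathrm{Hyp}(N, m, K)$ counts special items among $m$ draws without replacement from $N$ items of which $K$ are special, with $N = \binom{n}{2}$ pairs and $K = n - 1$ pairs containing the given vertex; under this convention the mean is $m(n-1)/N = 2m/n$. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def hyp_pmf(N, m, K, k):
    """P(k special items among m draws without replacement; N items, K special)."""
    return Fraction(comb(K, k) * comb(N - K, m - k), comb(N, m))

def degree_pmf_bruteforce(n, m, v=0):
    """Exact law of the degree of vertex v over all graphs with n vertices, m edges."""
    pairs = list(combinations(range(n), 2))
    tally = Counter()
    total = 0
    for edges in combinations(pairs, m):
        tally[sum(1 for e in edges if v in e)] += 1
        total += 1
    return {k: Fraction(c, total) for k, c in tally.items()}

if __name__ == "__main__":
    n, m = 5, 4
    print(degree_pmf_bruteforce(n, m))
    print([hyp_pmf(comb(n, 2), m, n - 1, k) for k in range(min(m, n - 1) + 1)])
```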
Bounding .
As for , we likewise have
[TABLE]
Noting that, if , we have and hence , it is immediate that
[TABLE]
Bounding .
In order to bound , with the transposition of and , note first that
[TABLE]
since is a function of the graph and , and by (2.3), the graph obtained from does not change when swapping edge with edge or non-edge with non-edge. Hence, averaging over , a transposition of and a uniformly chosen index in , yields
[TABLE]
By exchangeability the expectation on the right hand side is constant for and ; hence, for such and ,
[TABLE]
so that
[TABLE]
Now,
[TABLE]
here, we have first applied the inequality , followed by (2.16) with replaced by to the first expectation in the expression that results to yield that , and to the second expectation, where denotes equality in distribution. Hence,
[TABLE]
Now, recalling that is the set of indices of edges to which those edges adjacent to vertex were relocated, let
[TABLE]
the set of vertices that received at least one additional edge when redistributing those edges. Also, let
[TABLE]
the neighbours of that did not receive a new edge when redistributing the edges incident on .
Note that the chosen vertex will increase the difference by one if it is isolated in . A vertex , will have this same effect if is isolated in but then has an edge attached to it in the redistribution of the removed edges of . On the other hand, a vertex will decrease this difference by one when is connected to , and has degree 1 in , and does not have such an edge reattached. Hence, this difference is given by
[TABLE]
where . Letting denote set difference, we obtain
[TABLE]
For the first term in (LABEL:65), we have used that for any vertex we can only have when is an endpoint of the additional edge determined by , that is, when . For the second term in (LABEL:65) we have used similarly that
[TABLE]
Moving now to the third term in (LABEL:65), if and , then ; indeed, if , vertex has the same degree in both and , and if also , then Algorithm 1 will redistribute the edges adjacent to to the same available pairs of vertices when has degree or ; indeed, note that between the two cases and , Step 7 changes only if for any of the tested there, which is equivalent to ). Therefore, if , we must either have or . Now, if , then the degree of in is one more than its degree in , so will contain one more edge than . And if , then since will be found blocked when forming and a new non-edge has to be found. Hence,
[TABLE]
For the fourth term in (LABEL:65) we apply the bound
[TABLE]
Finally, for the last term, similarly as for the third, if both and , it is easy to see that ; indeed, under these conditions, the set of vertices adjacent to does not change with the addition of edge , and moreover, , which implies , so that . Hence, if , we must either have or .
If , then has one more neighbour in than in , and so will contain one more edge than . In this case, and can differ by at most three elements. Indeed, they may only differ by the additional neighbour in , and by at most two existing neighbours of in which were not assigned an edge in , but were so assigned in .
If , then , so that and can differ by at most four elements; hence
[TABLE]
Now recalling (2.17), summing (LABEL:65) over and noting that
[TABLE]
we obtain
[TABLE]
where
[TABLE]
For the first term,
[TABLE]
Note that each vertex has at most potential edges available where the new edge can be placed. Hence, since , there are at most potential edges with one end in , and so
[TABLE]
Noting that is bounded by , recalling that and using Lemma 2.2, and also (2.6) of Condition (G1), which gives that as by (2.5), we therefore have
[TABLE]
Moreover, with ,
[TABLE]
since there are at most potential edges with one end in and the other end in . Hence, again using and Lemma 2.2, and also Cauchy-Schwarz, we obtain
[TABLE]
so that (2.23) results in the bound
[TABLE]
Next, we have
[TABLE]
To calculate the first probability, we condition on and average over . If , then the conditional probability vanishes, as no edge incident on the (removed) vertex gets redistributed. Hence, take such that . To compute , note that there are non-edges of , out of which involve vertex and can therefore not be used during the redistribution of the edges incident to vertex , which is to be removed. This leaves potential edges from which to draw our sample of non-edges. By uniformity, the probability that is in this sample is given by
[TABLE]
as we only ask for the probability that one special object is included in a simple random sample of objects from a population of size , and where in the final inequality we have used (2.6) of Condition (G1). Averaging over , for the first term in (2.25) we obtain the bound
[TABLE]
Next, as the events and are conditionally independent given , we may handle the second, off diagonal term of (2.25) by using Lemma 2.2 to give that
[TABLE]
which, recalling (2.26), results in the bound
[TABLE]
Thus, using (2.25), (2.27) and the inequality directly above, we obtain
[TABLE]
Finally, in order to bound , note that the double sum is simply twice the sum over all the vertices of edges in . Note also that, as must have degree at least one to be included in the sum, only if has degree 1 in and it receives the additional edge . Thus, since the additional edge has two endpoints, it is immediate that can be no more than , so that
[TABLE]
Recalling (2.22) and applying (2.24), (2.29) and (2.30) yields
[TABLE]
Now, by Lemma 2.6, we have , and since remains bounded on the positive real numbers, it follows that is bounded; hence,
[TABLE]
Bounding .
Using the same arguments as those used for to reach (2.17), we can show that
[TABLE]
Adding and subtracting , and splitting the sum, we obtain
[TABLE]
where
[TABLE]
In order to bound , note first that is non-zero, and in that case equals one, exactly when vertex is isolated in and the added edge is incident on ; that is,
[TABLE]
And since implies , we have
[TABLE]
Squaring, taking expectation and using exchangeability, we obtain
[TABLE]
For the first term, we have
[TABLE]
while for the second term
[TABLE]
where we have used that , the probability that a hypergeometric variable with the given parameters takes the value 0, is a decreasing function of the number of special items . Hence,
[TABLE]
In order to handle , note that if , we necessarily have , so that whenever ; it follows that
[TABLE]
Therefore,
[TABLE]
Combining the bounds (2.14), (2.15), (2.31) and (2.32) as in (2.11), and then recalling (2.10), we obtain
[TABLE]
Recalling (1.21) and noting that by Lemma 2.5, the first condition in (1.7) holds, as
[TABLE]
Next, it clearly suffices to verify the second condition in (1.7) of (G3) with replaced by its absolute upper bound
[TABLE]
obtained in (2.13), and splitting the resulting expression to be bounded into two terms, we have
[TABLE]
Now, let . Using the given form (2.8) of , we obtain
[TABLE]
where, for the final inequality, we used that when and that on the first summand, and Lemma 2.2 on the second summand. Setting we obtain the bound
[TABLE]
on the first term of (2.34).
The second term in (2.34) likewise leads to two terms, corresponding to the two in the second line of (2.35), but with an additional factor of . Now setting , for the first we have, by applying Cauchy-Schwarz,
[TABLE]
Conditional on vertex being isolated, the distribution of the number of isolated vertices in the model is one more than the number of isolated vertices in the model. Hence, writing
[TABLE]
and using twice, we obtain
[TABLE]
Lemma 2.9 yields that the first term is bounded by a constant. For the second term, by removing all edges from the vertex and relocating them among the remaining vertices, we have a coupling of and which yields , so that
[TABLE]
Using that in (1.21) is lower bounded by , which is at least 1 by Lemma 2.8, and that by Lemma 2.5 yields , and using also (2.37), we conclude that
[TABLE]
For the corresponding second term of (2.35), with and the additional factor of , using Cauchy-Schwarz and ,
[TABLE]
applying Lemma 2.2. Combining with (2.36) and (2.38) we see the sum is of the order of (2.39) and it follows that
[TABLE]
Condition (G4).
Let , and define
[TABLE]
the -algebra generated by the identity of the vertex chosen to be removed in the coupling and its degree. Letting , and be as in (2.33), we see that both are clearly -measurable.
For the first condition in (1.8), let
[TABLE]
Recall (2.5) and (2.6); in particular, on \Smiley, we have and . It is straightforward to check that under these conditions,
[TABLE]
Indeed, if for , the bound follows using that for , while for one verifies, for , that .
Now, bounding by as given in (2.33), writing as short for and using that in the final inequality, we obtain
[TABLE]
Since cannot be both isolated and have positive degree, we have almost surely, and so the first term is zero. Applying Cauchy-Schwarz to the second term and then invoking Lemma 2.2,
[TABLE]
By Lemma 2.2 with being the mean of , we have for any that
[TABLE]
trivially, the final expression upper bounds the left hand side for as well and hence holds for all . Hence, with as in (2.42), by (2.43) and recalling in (1.21), we obtain
[TABLE]
where we have used that via Lemma 2.5, and trivially , for the second inequality, thus showing the first condition in (1.8) is satisfied.
From (2.35) with it follows that
[TABLE]
thus showing that the second condition in (1.8) is also satisfied.
Condition (G5).
Denote by the “embedded” graph obtained by removing vertex and all its incident edges; we keep the original vertex labeling. As the remaining edges are uniformly distributed over the remaining vertices, conditional on in (2.40), the resulting graph has conditional distribution
[TABLE]
almost surely; this identity is again to be understood up to labeling. In particular, letting be the degree of vertex in graph ,
[TABLE]
is the number of isolated vertices of , and (2.45) implies
[TABLE]
Clearly is -measurable. Now set as in (2.41), which is also clearly measurable. Condition (1.10) is clearly equivalent to the first condition in (1.8), which was verified in (LABEL:87).
Condition (G6).
Let
[TABLE]
which is clearly -measurable. Moreover, since removing any edge connected to vertex can make at most one vertex, other than , isolated; the additional term of one accounts for the case when vertex is isolated. Since , as given in (2.33), by setting in (2.35) we obtain
[TABLE]
As via Lemma 2.5, the second bound in (1.11) holds.
Condition (G7).
We verify the stronger conditions that (1.12) and the second bound of (1.13) hold when taking the larger supremum obtained when removing the intersection with . This stronger version of (1.12) is an immediate consequence of Lemma 2.9. As this same lemma shows that the ratios in (1.13) involving means and variances are bounded by a constant, it is only required to bound the ratios of the remaining factor. For , we have
[TABLE]
and for the reciprocal, using that for ,
[TABLE]
Conditions (G1)–(G7) have been verified, and Theorem 1.3 now follows from Theorem 1.1. ∎
2.2 Technical results
Lemma 2.1 (Efron-Stein-type variance bound).
Let and the components of be independent uniform random permutations of , and let be a real-valued function. Let be random transpositions independent of each other and of , where transposes and a uniformly chosen integer in the set . Let be an independent copy of and let . Then
[TABLE]
Proof.
Without loss of generality assume . Let and , and let
[TABLE]
and
[TABLE]
Let be uniform on , let be uniform on , let be uniform on , and assume , and are mutually independent and independent of all else. Let , let , and let , and . Let $G_{1,i}=n\bigl(h(\pi_{N-1},\Sigma_{i})-h(\pi_{N-1},\Sigma_{i-1})\bigr)$, let $G_{2,j}=(N-1)\bigl(h(\pi_{j},\Sigma_{0})-h(\pi_{j-1},\Sigma_{0})\bigr)$, and let . Let be any bounded measurable function. Then, on the one hand,
[TABLE]
where we used that is equal in distribution to and independent of ; this follows, e.g., from Algorithm P of Knuth (1969, p. 147), since the distribution of is uniform conditionally on , and therefore independent of .
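The fact borrowed from Algorithm P can be checked by brute force on small cases. The following Python sketch is only an illustration, under the assumption that the transposition at position i swaps i with an index drawn uniformly from {0, ..., i} (0-based indexing); the function names are hypothetical and do not come from the paper. It enumerates every possible sequence of choices and confirms that each permutation arises exactly once, so that the composed transpositions yield a uniform permutation.

```python
from itertools import product
from math import factorial


def shuffle_with_choices(n, choices):
    """Apply the transpositions (i, choices[i]) for i = n-1, ..., 1 to the identity."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = choices[i]  # assumed to lie in {0, ..., i}
        perm[i], perm[j] = perm[j], perm[i]
    return tuple(perm)


def all_outcomes(n):
    """Permutations produced by every admissible sequence of choices."""
    ranges = [range(i + 1) for i in range(n)]
    return [shuffle_with_choices(n, c) for c in product(*ranges)]


def is_uniform(n):
    """True if each of the n! permutations is produced by exactly one choice sequence."""
    outcomes = all_outcomes(n)
    return len(outcomes) == factorial(n) and len(set(outcomes)) == factorial(n)
```

Since the choice sequences are equally likely and are in bijection with the permutations, the resulting permutation is uniform.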
On the other hand, for all we have since , and for all that , by recalling the definition of and observing that and have the same distribution, and that both are independent of , so
[TABLE]
Therefore, is a Stein coupling and, specializing (1.2) to the case and applying the Cauchy-Schwarz inequality and noting that , we have
[TABLE]
from which the claim follows. ∎
Lemma 2.2 (Tail and moment bounds for the hypergeometric distribution).
Let have the hypergeometric distribution counting the number of white balls among draws from an urn with balls, of which are white and black. Let . Then, for any ,
[TABLE]
Moreover, for any , there is a constant independent of such that
[TABLE]
Proof.
To construct a bounded size bias coupling, index the white balls by , and write where is the indicator that the white ball is sampled. Construct with the -size biased distribution by uniformly sampling a random index from to independently of ; if , set , otherwise independently and uniformly select a ball from the sample and swap it with the white ball. It is easy to see that has the size-bias distribution; see, for instance, Lemma 2.1 of Goldstein and Rinott (1996). Moreover, if a sampled black ball was swapped with the white ball, and otherwise. Hence, , and the tail bound (2.46) follows readily from Theorem 1.1 of Ghosh and Goldstein (2011).
Now, it is straightforward to check that whenever and , so that
[TABLE]
Hence, is stochastically dominated by an exponential random variable with mean , and in particular
[TABLE]
from which the second claim easily follows. ∎
A bound similar to (2.46) can be obtained from (Greene and Wellner, 2017, Corollary 1) with better constants, but under additional conditions on the parameters of the hypergeometric distribution.
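The swap construction in the proof of Lemma 2.2 can be checked exactly on small urns. The sketch below is an illustration only; the parameter names `N` (urn size), `K` (white balls) and `n` (draws) and all function names are assumptions introduced here. It computes the law of the coupled variable produced by the swap, using that, given the count k, the chosen white ball is already in the sample with probability k/K and the swapped-out ball is white with probability k/n, and compares it with the size-biased law k P(X = k) / E X.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, K, n):
    """Exact pmf of the number of white balls among n draws (K white out of N)."""
    total = comb(N, n)
    lo, hi = max(0, n - (N - K)), min(n, K)
    return {k: Fraction(comb(K, k) * comb(N - K, n - k), total)
            for k in range(lo, hi + 1)}


def size_bias_pmf(pmf):
    """Size-biased law: probability of k proportional to k * P(X = k)."""
    mean = sum(k * p for k, p in pmf.items())
    return {k: k * p / mean for k, p in pmf.items() if k > 0}


def coupled_pmf(N, K, n):
    """Law of the count after the swap step described in the proof."""
    out = {}
    for k, p in hypergeom_pmf(N, K, n).items():
        in_sample = Fraction(k, K)   # chosen white ball is already sampled
        swap_white = Fraction(k, n)  # ball removed from the sample is white
        stay = in_sample + (1 - in_sample) * swap_white
        up = (1 - in_sample) * (1 - swap_white)
        if stay:
            out[k] = out.get(k, 0) + p * stay
        if up:
            out[k + 1] = out.get(k + 1, 0) + p * up
    return out
```

In this construction the coupled count never differs from the original count by more than one, which is the boundedness that the tail bound relies on.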
Lemma 2.3.
If , then
[TABLE]
and
[TABLE]
where the lower bound on is valid whenever .
Proof.
Since , the upper bound on immediately follows. Using the usual exponential upper bound for the final inequality,
[TABLE]
from which the upper bound on and first lower bound on follow. The second lower bound on follows from the first lower bound and the inequality when . The lower bound on follows from the inequality for and the lower bound in (2.47), which together yield
[TABLE]
Lemma 2.4.
For any
[TABLE]
Proof.
The upper and lower bounds hold trivially at . With , by Taylor’s expansion around zero, for all there exists such that
[TABLE]
For we have , thus proving the upper bound over this interval. As for all , the function is non-decreasing for . As , and , we have for all , thus proving the upper bound on . As for all , the function is non-decreasing on , and as as , we have for all .
For the lower bound, for letting
[TABLE]
With we have , so is decreasing for . In particular, for . As , the function is non-decreasing, and hence for we have , completing the proof of the lower bound. ∎
Lemma 2.5.
For all and distinct vertices and , the indicators and that and are isolated are negatively correlated, that is,
[TABLE]
Proof.
Vertex is isolated if and only if none of the edges that connect to another vertex is included in the set of edges selected. Likewise, distinct vertices and are both isolated if and only if none of a particular set of edges is selected. Hence, the first claim is equivalent to
[TABLE]
Expanding the binomial coefficients and canceling common factors yields the equivalent form
[TABLE]
where , and pairing up the factors of the falling factorials we obtain
[TABLE]
It suffices to show the inequality holds termwise. Expanding both sides of the term of each side and simplifying yields
[TABLE]
The case implies all others, and reduces to and so holds for all , thus proving the first claim.
Since the indicators of vertices being isolated are negatively correlated, we have
[TABLE]
from which is immediate. As for , using Lemma 2.3 we have
[TABLE]
as claimed. ∎
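The first claim of Lemma 2.5 can be verified exactly for small graphs. The sketch below is only an illustration with hypothetical function names; it uses that a given vertex of a graph on n vertices is an endpoint of n - 1 potential edges, and that two given vertices together are endpoints of 2n - 3 potential edges, so that both isolation probabilities are ratios of binomial coefficients when m edges are chosen uniformly from the n(n-1)/2 pairs.

```python
from fractions import Fraction
from math import comb


def isolation_probabilities(n, m):
    """Return (P(one given vertex isolated), P(two given vertices both isolated))
    for m edges placed uniformly among the pairs of n >= 2 vertices."""
    total_pairs = comb(n, 2)
    denom = comb(total_pairs, m)
    # None of the n - 1 pairs containing the vertex may be selected.
    p_one = Fraction(comb(total_pairs - (n - 1), m), denom)
    # None of the 2n - 3 pairs containing either vertex may be selected.
    p_two = Fraction(comb(total_pairs - (2 * n - 3), m), denom)
    return p_one, p_two


def negatively_correlated(n, m):
    """True if P(both isolated) <= P(one isolated)^2."""
    p_one, p_two = isolation_probabilities(n, m)
    return p_two <= p_one ** 2
```

Exact rational arithmetic is used so that the comparison is not affected by rounding when the two sides are close.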
Lemma 2.6.
For and , we have
[TABLE]
and
[TABLE]
Proof.
Since the distribution of each individual degree is , and as the hypothesis of Lemma 2.3 holds due to the restriction assumed on , it follows from that lemma that
[TABLE]
yielding the upper bound in (2.48). Since under the assertions on and we have
[TABLE]
it follows that
[TABLE]
from which we obtain the lower bound in (2.48).
In order to prove the upper and lower bounds on the variance, we use the fact that when is a Stein coupling for a mean zero random variable ; this identity follows immediately upon setting in (1.2). Now recall (2.8), (2.9) and (2.20), and that in (2.18) is the set of vertices that receive at least one edge when forming , and that in (2.19) is the set of all vertices such that is an edge in , and does not receive a redistributed edge. As when the sets and are empty, and recalling that , we have
[TABLE]
Now consider the first sum in (LABEL:94). Note that when , of the potential edges, have vertex 1 as an endpoint, and an additional edges remain in and are not redistributed. Hence,
[TABLE]
To arrive at the hypergeometric expression in the sum in the last equality from the conditional probability that vertex is incident on any of the redistributed edges that were removed from vertex when making the new graph, note that the total number of edges available is reduced from first by , as vertex has been removed, and also due to the edges that were part of the original graph that are not changed. Of these remaining edges, are incident on vertex , which is one fewer than their original number of , due to the removal of vertex .
Using Lemma 2.3,
[TABLE]
from which we obtain the upper bound
[TABLE]
Given , we have , hence
[TABLE]
and so,
[TABLE]
Similarly, using the second moment expression from (2.28)
[TABLE]
and so from (LABEL:96) we obtain the lower bound
[TABLE]
Now, for the first term in the brackets we have
[TABLE]
where we have used (2.50) for the last inequality. For the second term in the brackets,
[TABLE]
where again we have used (2.50) for the last inequality. Hence, together with the upper bound (2.54), we arrive at
[TABLE]
Now considering the second sum in (LABEL:94), we can write
[TABLE]
where . Taking expectation of the first sum on the right hand side of (2.56) and noting that the distributions of the degrees in the graph are hypergeometric, we obtain that
[TABLE]
From this equality and using the assertions on and , we obtain
[TABLE]
Now taking expectation of the second sum of (2.56),
[TABLE]
We arrive at the first hypergeometric expression in the sum in the last equality by the same reasoning as that given following (LABEL:95); the remaining two expressions in the sum follow by similar, and simpler, means.
Now, for the first and last terms, using Lemma 2.3 for the upper bound, we have
[TABLE]
and thus, using in the final inequality that , which holds via the assumption that , and that , which holds as , true by assumption, we obtain
[TABLE]
Using the estimates from (2.57) and (LABEL:101) in the difference (2.56), and then applying that result and (LABEL:98) in (LABEL:94) yields the claim. ∎
Lemma 2.7.
There exist universal integers and , and positive constants and such that, whenever
[TABLE]
we have
[TABLE]
and
[TABLE]
where
[TABLE]
Proof.
It is easy to verify that
[TABLE]
Hence, with the first inequality in (2.59) holding with replaced by 27, and taking , Lemma 2.6 can be invoked to yield
[TABLE]
from which (2.60) now follows for any .
Turning to (2.61), we first show that the lower bound in (LABEL:92) is positive whenever and . Indeed, that lower bound is positive whenever
[TABLE]
which, recalling the upper bound (2.48), is implied whenever
[TABLE]
with
[TABLE]
Since (2.63) is equivalent to the inequality , which in turn is satisfied if , since , we arrive at the sufficient condition
[TABLE]
which is equivalent to . This inequality holds whenever both and .
We now proceed to bound the ratio between the upper and lower bounds, say and , respectively, of (LABEL:92). Using the identity , we have
[TABLE]
We proceed to lower bound the denominator in (2.65). Letting and be as in (2.64), and applying the upper bound in (2.48), we may write
[TABLE]
If , we have and thus from Lemma 2.4, so that
[TABLE]
when . If and so , we simply use the lower bound
[TABLE]
and for any positive we can take large enough so that
[TABLE]
Hence, writing with the understanding that the implied bound holds with universal constants, recalling (2.65), and using Lemma 2.6 to bound in its numerator, we have
[TABLE]
where both the terms are non-negative.
Next, with , we show that
[TABLE]
Using (2.60) for the second equality, (2.48) for the third, then (2.60) again and the lower bound of Lemma 2.4 for the fourth, we obtain
[TABLE]
as . In the case , we have
[TABLE]
showing the first bound in (2.67). In the case ,
[TABLE]
and using that is bounded over ,
[TABLE]
Applying (2.62), the second bound in (2.67) is shown. Now, using that , and writing
[TABLE]
and observing that, because the implicit constants in the bounds (2.66) and (2.67) are universal, and using that the terms in (2.66) are non-negative, we can choose small enough and large enough to guarantee that and , and hence obtain the upper and lower bounds
[TABLE]
from which the estimate (2.61) follows.
∎
Lemma 2.8.
Let be defined as in (1.21). For any integers and and any positive constant , there exists such that implies
[TABLE]
Proof.
We will show that for $\overline{r}=\max\bigl\{\overline{n}^{1/2},(2\overline{m})^{3/2},1/\overline{c}^{2},1\bigr\}$ if (2.68) is violated. Indeed, if , we have by Lemma 2.5, and that , then
[TABLE]
Finally, if , then similarly
[TABLE]
Lemma 2.9.
Letting \Smiley be as in Condition (G1), it holds that
[TABLE]
and
[TABLE]
Proof.
First, note that if , then from (2.5) and (2.6) the conclusion of Lemma 2.7 holds. For the ratio of means, from Lemma 2.6, to upper bound it suffices to upper bound the ratio
[TABLE]
which is bounded by a constant via , as in (2.59). Similarly, to upper bound it suffices to upper bound the ratio
[TABLE]
which, here using that , we see is also so bounded.
For the ratios of variances, for let
[TABLE]
let , and write
[TABLE]
We show that these four terms, and their reciprocals, can be uniformly bounded over the range of the supremum in (2.69). Since (2.59) holds for and , we can apply Lemma 2.7, and also (2.5) for the first and final bounds, and obtain
[TABLE]
Next, since by (2.5) and , we have that . Since , again by (2.5), we have that , and since by (2.5) and for , we have that . It follows that also satisfies the hypotheses of Lemma 2.7. Using the lower bound on from (2.5), we have , and also from (2.6) that , so also using for the second and second to last inequality,
[TABLE]
Hence, (2.70) and (2.71) imply that
[TABLE]
Clearly, for . Lastly,
[TABLE]
Note that by (2.6), and by (2.5) that gives that , and also using ,
[TABLE]
It follows that remains bounded on \Smiley, and therefore, to show is bounded it suffices to show that
[TABLE]
remains bounded. Using Lemma 2.4,
[TABLE]
But this ratio remains bounded from above, away from 1, as implies
[TABLE]
The reciprocal is bounded similarly, using that (2.72) shows that is bounded. ∎
3 Jack Measure on Tableaux
We now turn to the study of the distribution of the standardized sum of the -contents over all boxes in a tableaux whose shape is determined by the partition of , that is, to
[TABLE]
where
[TABLE]
and where the partition is sampled from the Jackα measure in (1.22), as described in detail in the introduction; see (1.23) for an illustration of , where .
Our bound is based on the zero bias construction in Fulman and Goldstein (2011), which itself depends on an exchangeable pair constructed using Kerov’s growth process, a sequential procedure for growing a random partition distributed according to Jackα measure.
The state of Kerov’s growth process at times is a partition of , starting at time 1 with the unique partition (1) of 1. To describe its transition rule from time to for , given a box in the diagram of a partition of , let denote the number of boxes in the same row of and to the right of (the “arm” of ), and let denote the number of boxes in the same column of and below (the “leg” of ), as in (1.22). Now set
[TABLE]
and, for a partition of obtained from by removing a single corner box, let
[TABLE]
where is the union of columns of that intersect and is the union of rows of that intersect . If at stage the state of the process is the partition , a transition to the partition occurs with probability
[TABLE]
It is shown in Kerov (1994), see also Fulman (2006), that if is distributed according to Jackα measure on partitions of , then the partition obtained by this process at time has the Jackα distribution.
In the proof of Theorem 3.1 of Fulman and Goldstein (2011), a variable having the zero bias distribution of was constructed as follows. Fix and and let be the state of Kerov’s growth process at time , and set
[TABLE]
Denoting by the content of the box added at time to form , we can now write
[TABLE]
With the conditional distribution of given , constructing the pair
[TABLE]
on the same space as , and letting be independent of and , the variable
[TABLE]
has the -zero bias distribution. In fact, the joint distribution on the right hand side of (3.4) can be achieved by running Kerov’s growth process twice, conditionally independently given . As shown in Fulman and Goldstein (2011), the resulting variables, say and , yield the crucial exchangeable Stein pair in (1.3) via (3.3). Again by Fulman and Goldstein (2011), both the conditional mean and variance of given do not depend on ; specifically,
[TABLE]
It is essentially for this reason that we may construct as in (3.5), using ; for details, see Fulman and Goldstein (2011).
Proof of Theorem 1.5.
We verify the conditions of Theorem 1.2.
Condition (Z1).
Fix an , suppressed in the notation, and let
[TABLE]
where
[TABLE]
which is positive and measurable. Note that
[TABLE]
which implies in particular that if .
From Fulman (2004), the mean and variance of the content of a tableau of a partition of under Jackα measure are given, respectively, by
[TABLE]
In particular we have that for all .
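The moments quoted from Fulman (2004) can be checked by exact enumeration for small sizes. The Python sketch below is an illustration resting on stated assumptions: it takes the Jack measure of a partition λ of n to be α^n n! divided by the product over boxes s of (α a(s) + l(s) + 1)(α a(s) + l(s) + α), with a and l the arm and leg lengths, takes the α-content of the box in row i and column j to be α(j - 1) - (i - 1), and compares the resulting mean and variance of the unscaled content sum with 0 and α n(n-1)/2. These forms are recalled from the standard definitions and are assumed to agree with (1.22) and with the display above; all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def jack_weight(lam, alpha):
    """Assumed form of the Jack measure of the non-empty partition lam."""
    n = sum(lam)
    col_len = [sum(1 for row in lam if row > j) for j in range(lam[0])]
    denom = Fraction(1)
    for i, row in enumerate(lam):
        for j in range(row):
            arm = row - j - 1         # boxes to the right in the same row
            leg = col_len[j] - i - 1  # boxes below in the same column
            denom *= (alpha * arm + leg + 1) * (alpha * arm + leg + alpha)
    return alpha ** n * factorial(n) / denom


def content_sum(lam, alpha):
    """Sum of the alpha-contents over all boxes (0-based row i, column j)."""
    return sum(alpha * j - i for i, row in enumerate(lam) for j in range(row))


def jack_moments(n, alpha):
    """Return (total mass, mean, variance) of the content sum, for n >= 1."""
    alpha = Fraction(alpha)
    total = mean = second = Fraction(0)
    for lam in partitions(n):
        p = jack_weight(lam, alpha)
        c = content_sum(lam, alpha)
        total += p
        mean += p * c
        second += p * c * c
    return total, mean, second - mean ** 2
```

For example, `jack_moments(2, alpha)` puts mass 1/(α + 1) on the partition (2) and α/(α + 1) on (1, 1), giving mean 0 and variance α.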
Condition (Z2).
The variable , given in (3.1) is easily seen to satisfy the needed conditions, and the construction of the zero bias variable is outlined above in (3.3), (3.4) and (3.5).
Condition (Z3).
From (3.3) and (3.5) we see that
[TABLE]
For each let be the trivial -algebra , let
[TABLE]
where and respectively denote the length of the first row and first column of the tableaux produced by Kerov’s growth process at time . Clearly is measurable.
We next argue that on as follows. With and the contents of the boxes added to by Kerov’s growth process, all conditionally independent given , with probability one,
[TABLE]
as the extreme values and are achieved, respectively, by adding a box at the end of the first row, and at the bottom of the first column. Scaling by in (3.10) to obtain , and , respectively, with probability one
[TABLE]
Now note that by (3.4) the distribution of is absolutely continuous with respect to that of , and hence with probability one
[TABLE]
As is the convex combination of , it too must lie in this same interval, and hence, as the length of the first column of can be no more than , we obtain
[TABLE]
In what follows, we think of as fixed and suppress the subscript in . Turning to the moment conditions, we claim that
[TABLE]
Now,
[TABLE]
To bound the second moment of , by the zero bias formula (1.4) with , and the proof of Theorem 4.1 in Fulman and Goldstein (2011), we obtain
[TABLE]
Hence, by (3.6),
[TABLE]
For the second term of (3.15), by (3.6), we obtain , thus showing the first inequality in (3.14). The final inequality in (3.14) holds as implies .
To verify the first condition in (1.16), apply the Cauchy-Schwarz inequality, (3.8) and (3.14) to obtain
[TABLE]
To control , with , we apply the inequality
[TABLE]
from the proof of Lemma 6.6 in Fulman (2004). Using that in the third inequality below we obtain
[TABLE]
Substitution into (3.16) now verifies the first condition in (1.16).
For the second condition in (1.16), using (3.8), the Cauchy-Schwarz inequality, that , (3.14) and (3.11) we obtain
[TABLE]
Condition (Z4).
For , let
[TABLE]
which is measurable, let , and let be as in (3.2). The conditional distribution condition (1.17) is satisfied for with by the properties of Kerov’s growth process. Clearly the set is measurable with respect to . The moment condition (1.18) is trivially satisfied, as almost surely.
Condition (Z5).
By (3.1) and (3.2) we have that as in (3.3), the scaled content of the box added at time in Kerov’s growth process. Hence, the first part of Condition (1.19) holds with in (3.11), as by (3.12), and arguing as in (3.13), we have
[TABLE]
The second part of this condition holds easily, as
[TABLE]
Condition (G7).
To verify the variance ratio condition (1.12), recalling from (3.10) and from (3.17), we have
[TABLE]
as for all by the comment after (3.9). For this same reason condition (1.13) holds, as
[TABLE]
Conditions (Z1)–(Z5) and (G7) have been verified, and Theorem 1.5 now follows from Theorem 1.2. ∎
The next result shows that the case when is taken larger than that in Theorem 1.5 is degenerate; the boundary case is left unresolved.
Theorem 3.1.
For all , along any sequence for which ,
[TABLE]
Proof.
Note that for all boxes in the tableau with we have and takes all values between [math] and . Hence, from the Jackα measure distribution as given in (1.22),
[TABLE]
Substituting the lower bound on into this inequality yields
[TABLE]
Remark 3.2.
The Wasserstein bound in (1.25) suggests that a bound in the Kolmogorov metric should hold with rate function
[TABLE]
This rate function is equivalent to the one we take in (3.8) for the ‘large ’ parameter set (3.7), as there and is dominated by . Directly extending the arguments used here to cover the ‘small’ alpha regime requires that (3.16) hold for some choice of . In particular, (3.14) shows that , with as in (3.18). Hence, taking this route, one needs to specify as an appropriate restriction on that satisfies , and which gives rise to a bounding of the right order. If in this case may be taken to be as in (Z5) above, then needs to be of order .
4 Proof of Theorems 1.1 and 1.2
The proofs of Theorems 1.1 and 1.2 ultimately rely on obtaining information about the solution to a certain recursive inequality. In its simplest form, and closely related to the argument in Bolthausen (1984), this inequality becomes
[TABLE]
for some and . In this simple case, it is not difficult to solve the corresponding equality explicitly to yield
[TABLE]
What is important here is not the exact form of the solution but rather that is uniformly bounded over . We show below that this property holds in greater generality when we replace on the left hand side of (4.1) by a generic parameter , and average the right hand side over a randomly chosen parameter , rather than evaluate at . Although, in the general case, there may exist additional solutions to the inequality that are unbounded, it turns out that these solutions must grow exponentially fast along some sequence, which is a behavior that can be excluded in our applications.
Lemma 4.1.
Let and be measurable spaces. For each , let be a probability measure on . Let and be such that, for each , both and are measurable functions. Assume there are constants and , measurable functions and , and a measurable set such that
[TABLE]
Then
[TABLE]
Proof.
Note that, for , the variable must be zero -almost surely by (A2), and so (A3) yields that
[TABLE]
We may therefore assume that \Smiley is non-empty, else the claim is trivial. We argue by contradiction; so assume Conditions (A1)–(A4) are satisfied and that the opposite of the conclusion is true. For every , we can use (A1) and consider the probability measure specified by its Radon-Nikodym derivative
[TABLE]
where denotes expectation with respect to . We argue by contradiction, assuming that when
[TABLE]
and Conditions (A1)–(A4) hold, there exists a sequence and a constant such that, for all ,
[TABLE]
which is clearly impossible.
We proceed by induction. For the base case , we note that since is bounded by on by (4.2), from (4.3) that there is such that , for some ; taking also , (4.4) is satisfied.
For the induction step, assume that the lower bound in (4.4) is true for . As , Condition (A3) yields that , and so the integrand must be at least this lower bound on a set of positive –measure; that is,
[TABLE]
Moreover, by the definition of essential supremum,
[TABLE]
Hence , and we can find satisfying
[TABLE]
Since on we conclude that in view of the first inequality of (4.5), which also completes the induction for the lower bound in (4.3). Applying (4.5) and (A4) yields
[TABLE]
yielding the upper bound in (4.4), and concluding the induction. ∎
Proof of Theorem 1.1.
Throughout the proof, denotes a constant that does not depend on and can change from formula to formula. Note first that by Condition (G1) the bound (1.14) trivially holds for every by taking . Therefore we need only show that (1.14) holds for all . Let
[TABLE]
Fix , whose exact value is to be chosen later, and for define
[TABLE]
Let be the unique bounded solution to the Stein equation
[TABLE]
Using a standard smoothing inequality, see e.g. the proof of Theorem 5.1 in Chen, Goldstein and Shao (2011), we have
[TABLE]
For ease of notation, we drop the indices and from .
Bound on .
Taking an arbitrary and using the definition (1.2) of a Stein coupling in the second line below, we have
[TABLE]
From (4.6) and (4.7) of Chen and Shao (2004) we have, respectively, that and
[TABLE]
implying, by the first condition in (1.7), that
[TABLE]
and that
[TABLE]
Using the second condition in (1.7), and that in the integral, we have
[TABLE]
Let . To handle the indicator in , write
[TABLE]
Using (LABEL:143), and again that , we have
[TABLE]
where . Now, by (1.10) and the first condition of (1.8)
[TABLE]
Since is contained in and for , on this intersection we may define
[TABLE]
and thus write
[TABLE]
where , and are to be understood as random variables on . By the first condition in (1.11), we have on . Hence,
[TABLE]
where
[TABLE]
is measurable by Condition (G5).
Note that since \Smiley, given in Condition (G1), is in and is -measurable by Condition (G5) for . Now using Condition (G4) to bound by on , and applying the measurability of and with respect to by Conditions (G4) and (G5), we obtain
[TABLE]
Using (4.6) and (1.9) we obtain
[TABLE]
and as the normal density is bounded by , using (1.12) we see that the integrand in (LABEL:146) can be no more than
[TABLE]
Therefore, using the second condition in (1.8) and the second inequality in (1.11) for the fourth inequality below, and then the first condition in (1.13) for the last, we obtain
[TABLE]
where in the case $\mathbb{E}\bigl\{\overline{G}\,\overline{D}^{2}I_{F_{\theta,2}}\bigr\}=0$, by the first line of the display above.
In order to bound , using that for by (4.6) for the second equality, that for the first inequality, the first condition in (1.13) for the second inequality, and the second condition in (1.8) for the last, we have
[TABLE]
where when $\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}I_{F_{\theta,2}}\bigr\}=0$, by the first line of the display.
Collecting the bounds (4.9), (4.10), (4.13), (4.16) and (4.18) and using (4.7) we arrive at
[TABLE]
Since Condition (G1) implies that is an upper bound on for , and a lower bound on for , we conclude that
[TABLE]
Hence, by the second condition in (1.13),
[TABLE]
Choosing with as in (4.19) and multiplying that inequality by on both sides and then setting we obtain, for some possibly different constant , which does not depend on but may depend on ,
[TABLE]
We now verify the hypotheses of Lemma 4.1, with the additional identification
[TABLE]
Conditions (A1) and (A2) follow directly from the definition of , while (A3) on \Smiley is (4.21), and is satisfied on as , and we may replace by . Condition (A4) follows from (4.20). The conclusion of Lemma 4.1 now implies that for all . ∎
Proof of Theorem 1.2.
The proof for zero biasing is quite similar to, but simpler than, the proof of Theorem 1.1; we only highlight the important differences.
Recalling , applying the bound (4.8), and the zero bias characterization (1.4), we obtain
[TABLE]
Using (Z3), noting in particular that on , and the fact that for yields , for the first two terms in (4.22), we have
[TABLE]
Following the reasoning in (4.12) and labeling the corresponding terms that arise here in the same manner, for , the only remaining term, by the first condition in (1.16), and (1.18), we obtain the bound
[TABLE]
For , as in (4.14) is replaced by , separating the term that arises from out of as defined there, here we obtain
[TABLE]
where is measurable. Now arguing as in (4.16) we obtain
[TABLE]
using the second condition of (1.16) and the first one of (1.13) for the first term, and the second conditions of (1.19) and (1.16), respectively, to obtain the last two terms in the bound.
As in (4.18), using the first condition of (1.13) and the second condition of (1.16), we obtain
[TABLE]
Combining terms as in (4.19) yields
[TABLE]
The proof can now be concluded as for Theorem 1.1. ∎
5 Appendix
We illustrate two instances where the conditions in the General Framework of the Introduction are implicitly invoked. First we show that the random version of the random variable at the (random) ‘smaller’ parameter value is a random variable. The maps
[TABLE]
are measurable, the first as each component is measurable, and the second being a composition of measurable maps.
Next, we show that if is measurable and -integrable for all , then
[TABLE]
is a measurable function of . Indeed, the collection of subsets of for which the integral of is measurable with respect to is a monotone class. The class contains the rectangles which are products of measurable sets and , as their indicator
[TABLE]
which is a product of measurable functions of . Hence contains the algebra of all finite disjoint unions of such rectangles, and hence, by the Monotone Class Theorem, the sigma-algebra these rectangles generate, that is, the product sigma-algebra. Given a non-negative integrable function , standard arguments using an approximating sequence of simple functions from below, in concert with the Monotone Convergence Theorem, yield the measurability of the integral of , and then for real-valued functions by breaking up any given integrable function into positive and negative parts.
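The three steps of this argument can be sketched explicitly. All symbols below are assumptions introduced for illustration, since the original notation is not recoverable here: $\theta$ ranges over a measurable parameter space, $\{P_{\theta}\}$ is a family of probability measures on a measurable space $(\Omega, \mathcal{F})$ such that $\theta \mapsto P_{\theta}(B)$ is measurable for every $B \in \mathcal{F}$, and $A$ is a measurable set of parameters.

```latex
% Step 1 (rectangles): the integral of the indicator factorizes.
\int \mathbf{1}_{A \times B}(\theta, \omega)\, P_{\theta}(d\omega)
  = \mathbf{1}_{A}(\theta)\, P_{\theta}(B).

% Step 2 (non-negative f): take simple functions f_n increasing to f;
% by monotone convergence the integral is a pointwise limit of
% measurable functions of theta, hence measurable.
\int f(\theta, \omega)\, P_{\theta}(d\omega)
  = \lim_{n \to \infty} \int f_{n}(\theta, \omega)\, P_{\theta}(d\omega).

% Step 3 (integrable real-valued f): split into positive and negative parts.
\int f(\theta, \omega)\, P_{\theta}(d\omega)
  = \int f^{+}(\theta, \omega)\, P_{\theta}(d\omega)
  - \int f^{-}(\theta, \omega)\, P_{\theta}(d\omega).
```

In Step 3 both integrals on the right are finite for every $\theta$ by the integrability assumption, so the difference is well defined and measurable.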
Acknowledgements
We are grateful to the referees for their detailed comments and references. This work was partially supported by the Singapore Ministry of Education AcRF Tier 1 Grants R-146-000-230-114 and R-155-000-167-112 through the National University of Singapore. The second author thanks the Department of Statistics and Applied Probability, National University of Singapore, for their kind hospitality.
References
- [1] Bartroff, J. and Goldstein, L. (2013). A Berry-Esseen bound for the uniform multinomial occupancy model. Electron. J. Probab. 18, article 27, 1–29.
- [2] Bergström, H. (1944). On the central limit theorem. Skand. Aktuarietidskr. 27, 139–153.
- [3] Bolthausen, E. (1984). An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. Verw. Gebiete 66, 379–386.
- [4] Chen, L. H. Y., Goldstein, L. and Shao, Q.-M. (2010). Normal Approximation by Stein’s Method. Springer Verlag.
- [5] Chen, L. H. Y. and Shao, Q.-M. (2004). Normal approximation under local dependence. Ann. Probab. 32, 1985–2028.
- [6] Chen, L. H. Y. and Röllin, A. (2010). Stein couplings for normal approximation. Preprint, arxiv.org/abs/1003.6039
- [7] Chen, L. H. Y. and Thánh, L. V. (2019). On the error bound in the normal approximation for Jack measures. Preprint, arxiv.org/abs/1902.03476
- [8] Englund, G. (1981). A remainder term estimate for the normal approximation in classical occupancy. Ann. Probab. 9, 684–692.
