Stein's method via induction

Louis H. Y. Chen; Larry Goldstein; Adrian Röllin

arXiv:1903.09319·math.PR·May 12, 2020

TL;DR

This paper introduces an inductive approach to Stein's method that effectively handles non-bounded variables with complex dependencies, providing optimal rate bounds for normal approximation in new applications.

Contribution

It develops a novel inductive technique for Stein's method that applies to non-bounded couplings and demonstrates its effectiveness on complex dependent structures.

Findings

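01. Berry-Esseen bounds for the number of isolated vertices in the Erdős-Rényi random graph with a fixed number of edges.

02. A normal approximation for the content of a tableau under Jack measure, obtained with a zero bias coupling.

03. Bounds in the Kolmogorov metric that can attain the optimal rate.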

Abstract

Applying an inductive technique for Stein and zero bias couplings yields Berry-Esseen theorems for normal approximation for two new examples. The conditions of the main results do not require that the couplings be bounded. Our two applications, one to the Erdős-Rényi random graph with a fixed number of edges, and one to Jack measure on tableaux, demonstrate that the method can handle non-bounded variables with non-trivial global dependence, and can produce bounds in the Kolmogorov metric with the optimal rate.

Equations

$$\mathbb{E}\{Zf(Z)\}=\mathbb{E}\{f^{\prime}(Z)\}$$

$$f^{\prime}(w)-wf(w)=h(w)-\mathbb{E}h(Z)$$

$$\mathbb{E}\{Gf(W^{\prime})-Gf(W)\}=\mathbb{E}\{Wf(W)\}$$

$$\mathbb{E}\{W^{\prime}\,|\,W\}=(1-\lambda)W.$$

$$G=\frac{1}{2\lambda}(W^{\prime}-W).$$

$$\mathbb{E}\{Yf(Y)\}=\mu\,\mathbb{E}\{f(Y^{\prime})\}$$

$$W=Y-\mu,\quad W^{\prime}=Y^{\prime}-\mu\quad\text{and}\quad G=\mu.$$

$$\mathbb{E}\{Wf(W)\}=\sigma^{2}\,\mathbb{E}\{f^{\prime}(W^{*})\}$$

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi_{\theta}}(Y)\quad\text{on $F_{\theta,2}$,}$$

$$\mathbb{P}_{\theta}[V\in\cdot\,|\,\mathcal{F}_{\theta}](\omega)=\mathbb{P}_{\Psi_{\theta}(\omega)}[Y\in\cdot]\quad\text{for all $\omega\in F_{\theta,2}$.}$$

$$\Smiley=\{\theta\in\Theta:r_{\theta}>\overline{r}\}.$$

$$W=\frac{Y-\mu_{\theta}}{\sigma_{\theta}}$$

$$\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl|\mathbb{E}_{\theta}(1-GD\,|\,W)\bigr|<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{(1+|W|)|G|D^{2}\bigr\}<\infty.$$

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|G|D^{2}(1-I_{F_{\theta,1}})\bigr\}<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}\bigr\}<\infty.$$

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi}(Y)\quad\text{on $F_{\theta,2}$,}$$

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|G|D^{2}(1-I_{F_{\theta,2}})\bigr\}<\infty.$$

$$\sigma_{\theta}^{-1}|Y-V|\leq\overline{B}\quad\text{on $F_{\theta,1}$,}\quad\text{and}\quad\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}\,\overline{B}\,I_{F_{\theta,2}}\bigr\}<\infty.$$

$$\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}\cap\{\Psi\in\Smiley\}}\frac{\sigma_{\theta}^{2}}{\sigma_{\Psi(\theta,\omega)}^{2}}<\infty,$$

$$\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}}\frac{r_{\theta}}{r_{\Psi(\theta,\omega)}}<\infty,\qquad\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}\cap\{\Psi\in\Smiley\}}\frac{r_{\Psi(\theta,\omega)}}{r_{\theta}}<\infty,$$

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{\theta}[W\leq z]-\mathbb{P}[Z\leq z]\bigr|\leq\frac{C}{r_{\theta}}\qquad\text{for all $\theta\in\Theta$.}$$

$$\mathbb{E}_{\theta}\bigl|\mathbb{E}_{\theta}(1-GD\,|\,W)\bigr|\leq\sqrt{\mathrm{Var}_{\theta}\bigl(\mathbb{E}_{\theta}(GD\,|\,W)\bigr)}\leq\sqrt{\mathrm{Var}_{\theta}\bigl(\mathbb{E}_{\theta}(GD\,|\,\mathcal{H})\bigr)},$$

$$\Smiley=\{\theta\in\Theta:r_{\theta}>\overline{r}\}.$$

$$W=\frac{Y-\mu_{\theta}}{\sigma_{\theta}}$$

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|D|(1-I_{F_{\theta,1}})\bigr\}<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{|DW|+\overline{D}\bigr\}<\infty.$$

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi}(Y)\quad\text{on $F_{\theta,2}$,}$$

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|D|(1-I_{F_{\theta,2}})\bigr\}<\infty.$$

$$\sigma_{\theta}^{-1}|Y-V|\leq\overline{B}\quad\text{on $F_{\theta,1}$,}\quad\text{and}\quad\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{D}\bigl(\overline{B}+\overline{D}\bigr)I_{F_{\theta,2}}\bigr\}<\infty.$$

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{\theta}[W\leq z]-\mathbb{P}[Z\leq z]\bigr|\leq\frac{C}{r_{\theta}},\qquad\text{for all $\theta\in\Theta$.}$$

$$\Theta=\Bigl\{(n,m)\,:\,n\geq 3,\ 0<m<{n\choose 2}\Bigr\},$$

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{n,m}[W\leq z]-\Phi(z)\bigr|\leq\frac{C}{r_{n,m}}$$

$$r_{n,m}=\frac{\sigma_{n,m}^{3}}{\mu_{n,m}\bigl(1+\frac{m^{2}}{n^{2}}\bigr)}.$$

Full text

STEIN’S METHOD VIA INDUCTION

Louis H. Y. Chen∗, Larry Goldstein‡ and Adrian Röllin∗

(National University of Singapore∗ and University of Southern California‡)

AMS 2000 subject classifications: Primary 60F05; secondary 05C07, 05C80, 05E10.

Keywords: Kolmogorov distance, optimal rates, Erdős-Rényi random graph, Jack measure.

1 Introduction

We present new Berry-Esseen theorems for sums $Y$ of possibly dependent variables by combining both the Stein and zero bias couplings of Stein’s method with the inductive technique of Bolthausen (1984) originally developed for the combinatorial central limit theorem. We apply these results to obtain normal approximations in the Kolmogorov metric for two new examples.

Stein’s method (Stein, 1972, 1986) typically proceeds by coupling a random variable $Y$ of interest to a related variable $Y^{\prime}$; for an overview see Chen, Goldstein and Shao (2011) and Ross (2011). Here we develop results that can be applied to the Stein couplings of Chen and Röllin (2010) and to the zero bias couplings of Goldstein and Reinert (1997), thus encompassing most of the known couplings that have appeared in the literature, including settings not typically framed in terms of couplings, such as local dependence. The innovation here is the widened scope of the couplings that can be handled, which permits applications when the difference $|Y-Y^{\prime}|$ between $Y$ and the coupled $Y^{\prime}$ is not almost surely bounded by a constant, or where the bound on this difference increases in the problem size. This work is a broad extension and continuation of Ghosh (2009), which applies induction and the zero bias coupling to the combinatorial central limit theorem where the random permutations are involutions, and of Goldstein (2013), which uses the size bias coupling to study degree counts in the Erdős-Rényi random graph; the inductive method considered here is inspired by Bolthausen (1984), but ultimately goes back to Bergström (1944).

At the center of Stein’s method is the characterization that $Z$ is a standard normal random variable if and only if

$$\mathbb{E}\{Zf(Z)\}=\mathbb{E}\{f^{\prime}(Z)\}$$

for all locally absolutely continuous functions $f$ for which the above expectations exist. Given a standardized variable $W$ whose distribution is to be compared to $Z$, and a test function $h$ on which to evaluate the difference $\mathbb{E}h(W)-\mathbb{E}h(Z)$, one solves the Stein equation

$$f^{\prime}(w)-wf(w)=h(w)-\mathbb{E}h(Z)\tag{1.1}$$

for $f$. The difference $\mathbb{E}h(W)-\mathbb{E}h(Z)$ may then be evaluated by substituting $W$ for $w$ and taking expectation on the left hand side of (1.1), rather than the right. One explanation of why the expectation of the left hand side may be simpler to compute, or bound, than that of the right is that it depends only on the distribution of $W$, whereas the right also depends on that of $Z$. In particular, on the left hand side one may apply couplings of $W$ to auxiliary random variables having properties that allow for convenient manipulations.
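For the Kolmogorov metric the test functions are indicators $h(w)=\mathbf{1}\{w\leq z\}$, and the bounded solution of the Stein equation is known in closed form. The following minimal sketch, which is an illustration and not part of the paper, evaluates that standard solution and checks by finite differences that $f^{\prime}(w)-wf(w)=h(w)-\Phi(z)$ away from the kink at $w=z$. The function names `Phi`, `stein_solution` and `residual` are chosen here for illustration.

```python
import math


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stein_solution(w: float, z: float) -> float:
    """Bounded solution f_z of f'(w) - w f(w) = 1{w <= z} - Phi(z)."""
    c = math.sqrt(2.0 * math.pi) * math.exp(0.5 * w * w)
    if w <= z:
        return c * Phi(w) * (1.0 - Phi(z))
    return c * Phi(z) * (1.0 - Phi(w))


def residual(w: float, z: float, eps: float = 1e-6) -> float:
    """Absolute error in the Stein equation at w, with f' taken by central differences."""
    fprime = (stein_solution(w + eps, z) - stein_solution(w - eps, z)) / (2.0 * eps)
    h = 1.0 if w <= z else 0.0
    return abs(fprime - w * stein_solution(w, z) - (h - Phi(z)))


z = 0.7
# Evaluation points are kept away from w = z, where f_z is not differentiable.
worst = max(residual(w, z) for w in (-3.0, -1.2, 0.0, 0.5, 1.5, 3.0))
print("largest residual:", worst)
```

The residual is zero up to discretization and rounding error, which is what makes the left hand side of the Stein equation a usable stand-in for $h(W)-\mathbb{E}h(Z)$.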

In Theorem 1.1 we present results for situations in which one can form a Stein coupling as defined by Chen and Röllin (2010). Following the treatment there, we say that the triple $(W,W^{\prime},G)$ of random variables is a Stein coupling when

$$\mathbb{E}\{Gf(W^{\prime})-Gf(W)\}=\mathbb{E}\{Wf(W)\}\tag{1.2}$$

for all functions $f$ for which the expectations above exist. It is not difficult to see that the canonical exchangeable pair coupling of Stein (1986) and the size bias coupling of Goldstein and Rinott (1996) are both special cases of Stein couplings. Indeed, recall that for $\lambda\in(0,1]$ we say $(W,W^{\prime})$ is a $\lambda$-Stein pair if $(W,W^{\prime})$ is exchangeable and

$$\mathbb{E}\{W^{\prime}\,|\,W\}=(1-\lambda)W.$$

In this case, it is easily verified that (1.2) is satisfied with

$$G=\frac{1}{2\lambda}(W^{\prime}-W).$$
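The verification for a $\lambda$-Stein pair can be written out as follows. This is a sketch added for illustration, assuming all expectations exist, the linearity condition $\mathbb{E}\{W^{\prime}\,|\,W\}=(1-\lambda)W$, and the choice $G=(W^{\prime}-W)/(2\lambda)$.

```latex
\begin{align*}
\mathbb{E}\{G f(W') - G f(W)\}
  &= \frac{1}{2\lambda}\,\mathbb{E}\{(W'-W)(f(W')-f(W))\} \\
  &= \frac{1}{2\lambda}\bigl(\mathbb{E}\{W'f(W')\} + \mathbb{E}\{Wf(W)\}
       - \mathbb{E}\{W'f(W)\} - \mathbb{E}\{Wf(W')\}\bigr) \\
  &= \frac{1}{\lambda}\bigl(\mathbb{E}\{Wf(W)\} - \mathbb{E}\{W'f(W)\}\bigr)
       && \text{(exchangeability)} \\
  &= \frac{1}{\lambda}\bigl(\mathbb{E}\{Wf(W)\} - (1-\lambda)\,\mathbb{E}\{Wf(W)\}\bigr)
       && \text{(conditioning on $W$)} \\
  &= \mathbb{E}\{Wf(W)\}.
\end{align*}
```

Exchangeability is used twice in the third line: $\mathbb{E}\{W^{\prime}f(W^{\prime})\}=\mathbb{E}\{Wf(W)\}$ and $\mathbb{E}\{Wf(W^{\prime})\}=\mathbb{E}\{W^{\prime}f(W)\}$.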

Likewise, for a non-negative random variable $Y$ with finite mean $\mu$, we say that $(Y,Y^{\prime})$ is a size bias coupling of $Y$ when $Y^{\prime}$ has the $Y$-size bias distribution, that is, when

$$\mathbb{E}\{Yf(Y)\}=\mu\,\mathbb{E}\{f(Y^{\prime})\}$$

for all functions $f$ for which these expectations exist. Again, it is easy to verify that for such couplings (1.2) is satisfied with

$$W=Y-\mu,\quad W^{\prime}=Y^{\prime}-\mu\quad\text{and}\quad G=\mu.$$

In particular, Theorem 1.1 extends results in Goldstein (2013) for the size bias coupling.
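As a concrete check of the size bias case, the sketch below takes $Y\sim\mathrm{Binomial}(n,p)$, whose size bias distribution is that of $1+\mathrm{Binomial}(n-1,p)$, and verifies the Stein coupling identity with $W=Y-\mu$, $W^{\prime}=Y^{\prime}-\mu$ and $G=\mu$ by exact summation over the two probability mass functions. The binomial example, the test function `f` and all names are assumptions made for illustration; since $G$ is constant here, only the marginal laws of $Y$ and $Y^{\prime}$ enter.

```python
import math
from math import comb


def binom_pmf(n: int, p: float) -> list:
    """Probability mass function of Binomial(n, p) on {0, ..., n}."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def f(w: float) -> float:
    """An arbitrary smooth test function (illustrative choice)."""
    return math.sin(w) + w * w


n, p = 12, 0.3
mu = n * p
pmf_Y = binom_pmf(n, p)                    # law of Y
pmf_Ys = [0.0] + binom_pmf(n - 1, p)       # law of Y' = 1 + Binomial(n - 1, p)

# Left side: E{G f(W') - G f(W)} with G = mu, W = Y - mu, W' = Y' - mu.
lhs = mu * sum(q * f(k - mu) for k, q in enumerate(pmf_Ys)) \
    - mu * sum(q * f(k - mu) for k, q in enumerate(pmf_Y))
# Right side: E{W f(W)}.
rhs = sum(q * (k - mu) * f(k - mu) for k, q in enumerate(pmf_Y))

print(lhs, rhs)
```

The two sides agree up to floating point error for any test function for which the sums are finite.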

Theorem 1.2 provides a parallel result for the zero bias coupling $(W,W^{*})$ of Goldstein and Reinert (1997). Recall that for a non-trivial mean zero, variance $\sigma^{2}$ random variable $W$, we say that $W^{*}$ has the $W$-zero biased distribution if

$$\mathbb{E}\{Wf(W)\}=\sigma^{2}\,\mathbb{E}\{f^{\prime}(W^{*})\}\tag{1.4}$$

for all functions $f$ for which the quantities above exist.
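A simple instance helps fix ideas: if $W$ takes the values $\pm 1$ with probability $1/2$ each, so that $\sigma^{2}=1$, then the uniform distribution on $[-1,1]$ is its zero bias distribution. The sketch below checks the defining identity $\mathbb{E}\{Wf(W)\}=\sigma^{2}\mathbb{E}\{f^{\prime}(W^{*})\}$ numerically with a midpoint rule; the test function and the names used are illustrative assumptions.

```python
import math


def f(w: float) -> float:
    """An arbitrary smooth test function (illustrative choice)."""
    return math.sin(w) + w**3


def fprime(w: float) -> float:
    """Derivative of f."""
    return math.cos(w) + 3.0 * w * w


sigma2 = 1.0  # variance of the symmetric +-1 variable W

# E{W f(W)} for W = +1 or -1 with probability 1/2 each.
lhs = 0.5 * (1.0 * f(1.0)) + 0.5 * (-1.0 * f(-1.0))

# sigma^2 E{f'(W*)} with W* uniform on [-1, 1] (density 1/2), by the midpoint rule.
N = 20000
h = 2.0 / N
rhs = sigma2 * 0.5 * h * sum(fprime(-1.0 + (i + 0.5) * h) for i in range(N))

print(lhs, rhs)
```

In this example the identity reduces to the fundamental theorem of calculus, since $\tfrac12\int_{-1}^{1}f^{\prime}(u)\,du=\tfrac12\bigl(f(1)-f(-1)\bigr)$.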

In Stein’s method in general, simplification occurs when one can achieve couplings of $W$ to an appropriate $W^{\prime}$ such that the difference is almost surely bounded, or bounded uniformly in the size of the problem. However, in many situations appropriately bounded couplings may be difficult to construct, whereas unbounded couplings seem to appear naturally. Hence Theorems 1.1 and 1.2, which do not impose restrictive boundedness conditions, may be applied to produce new results in a variety of examples.

General Framework.

Let $(\Theta,\mathcal{T})$ and $(\Omega,\mathcal{F})$ be two measurable spaces, the parameter space and the sample space, respectively. All random variables are understood to be real valued measurable functions from the product space $(\Theta\times\Omega,\mathcal{T}\otimes\mathcal{F})$. The distribution of a random variable $X$ is determined by a parameter $\theta\in\Theta$ through a given transition kernel $\mathbb{P}_{\theta}$ from $\Theta$ to $\Omega$. That is, for each $\theta\in\Theta$, $\mathbb{P}_{\theta}[\cdot]$ is a probability measure on $(\Omega,\mathcal{F})$, and for each $A\in\mathcal{F}$, the map $\mathbb{P}_{\cdot}[A]$ is $\mathcal{T}$-measurable. Depending on context and emphasis, we may also write $X$ as $X(\theta,\omega)$ or $X_{\theta}(\omega)$, so that, for instance, $\mathbb{E}_{\theta}X=\int_{\Omega}X(\theta,\omega)\,\mathbb{P}_{\theta}[d\omega]$.

These measurability conditions are needed to assure the measurability of mappings that appear later, such as of the mean $\mu_{\theta}$ , the variance $\sigma_{\theta}^{2}$ of $Y$ , and of $Y_{\Psi(\theta,\omega)}(\omega)$ , which represents the value of $Y$ at the parameter used in the inductive step. These conditions will not always be invoked explicitly below; we illustrate their use by showing in the Appendix, Section 5, that this latter variable in particular is measurable.

Our goal is to obtain bounds on the Kolmogorov distance between the standardized version $W$ of a random variable $Y$ and the normal distribution in terms of the parameter $\theta$ . Theorems 1.1 and 1.2 below yield a bound of the form $C/r_{\theta}$ for $r_{\theta}$ a positive ‘rate’ function of $\theta$ and $C$ a constant not depending on $\theta$ .
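For a discrete variable the Kolmogorov distance to the normal can be computed exactly, because the supremum over $z$ is attained at, or just below, an atom. The following sketch is an illustration and not taken from the paper: it implements this computation and applies it to a standardized symmetric binomial variable, an example chosen here only because its distance is known to decay at rate $1/\sqrt{n}$. The function names are hypothetical.

```python
import math
from math import comb


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_to_normal(atoms, probs) -> float:
    """sup_z |P[W <= z] - Phi(z)| for a discrete W with increasing atoms."""
    dist, cdf = 0.0, 0.0
    for x, p in zip(atoms, probs):
        # Just below the atom the distribution function still has its old value.
        dist = max(dist, abs(cdf - Phi(x)))
        cdf += p
        # At the atom the distribution function has jumped by p.
        dist = max(dist, abs(cdf - Phi(x)))
    return dist


def standardized_binomial(n: int, p: float):
    """Atoms and probabilities of (Y - mean) / sd for Y ~ Binomial(n, p)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    atoms = [(k - mu) / sigma for k in range(n + 1)]
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return atoms, probs


for n in (25, 100, 400):
    d = kolmogorov_to_normal(*standardized_binomial(n, 0.5))
    print(n, d, d * math.sqrt(n))
```

In the notation of this section, the printed product plays the role of the Kolmogorov distance multiplied by $r_{\theta}$ with $r_{\theta}=\sqrt{n}$; a bound of the form $C/r_{\theta}$ says that this product stays bounded.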

As noted, one main step our method requires is to couple $W$ to a random variable $W^{\prime}$, which satisfies either the Stein coupling relation (1.2) or the zero bias coupling relation (1.4). In order to apply induction, we identify a subset $\Smiley\subset\Theta$ in Condition (G1), consisting of the ‘nicely behaved’ parameters; its complement plays the role of the base case, on which the bound $C/r_{\theta}$ may be trivial. For our bound to be informative, it is necessary that the rate function $r_{\theta}$ be unbounded on $\Smiley$.

For the induction step, we also introduce a sub $\sigma$-algebra $\mathcal{F}_{\theta}$ that, roughly speaking, captures the information about the changes that were necessary to construct $W^{\prime}$ from $W$ (or equivalently, $Y^{\prime}$ from $Y$); the coarser $\mathcal{F}_{\theta}$ is, the better the normal approximation will be. A certain tension is created here, as $\mathcal{F}_{\theta}$ must be large enough to contain the variables describing the changes from $Y$ to $Y^{\prime}$, but small enough so that the conditional distribution of $Y$ given $\mathcal{F}_{\theta}$ is sufficiently close to its original one.

Conditional on $\mathcal{F}_{\theta}$, the variable $Y$ may no longer have its original distribution, but induction is viable when one can identify within $Y$ another variable $V$ that has a distribution similar to the original $Y$; when the parameter space $\Theta$ is ordered, $V$ typically has a smaller parameter. For a successful induction, the parameter of the smaller problem should not stray too far from that of $Y$. There is some leeway here, as it suffices to have control over an event $F_{\theta,1}$, as specified in Condition (G4). Intuitively, the event $F_{\theta,1}$ should contain the bulk of the support of the variables that generate $\mathcal{F}_{\theta}$, and not their extremes. For instance, for the Erdős-Rényi graph problem considered, $\mathcal{F}_{\theta}$ contains the label and degree of a chosen vertex on which the coupling is based, and $F_{\theta,1}$ is an event on which its degree is ‘not too large’.

Relaxing the condition that the difference $D=W^{\prime}-W$ be bounded, we control the magnitude of this difference by its moments. Moreover, we upper bound $D$ by $\overline{D}$ , and in the case of a Stein coupling, also $G$ by $\overline{G}$ , where these majorizing variables are required to be ${\cal F}_{\theta}$ measurable; we are able to handle exceptional or boundary cases as these upper bounds are only required to hold on $F_{\theta,1}$ . We will also require the existence of a random variable $B$ that bounds the absolute difference $|Y-V|$ , and which is not ‘too large.’ See Conditions (G3), (G4) and (G6) for the case of Stein couplings.

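There is also some leeway in that the distribution of $V$, conditionally on $\mathcal{F}_{\theta}$, only needs to be close to that of $Y$ on an event $F_{\theta,2}\in\mathcal{F}_{\theta}$. Precisely, for the Stein coupling case, with similar remarks also applying to zero bias couplings, we impose in Condition (G5) that

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi_{\theta}}(Y)\quad\text{on $F_{\theta,2}$,}\tag{1.5}$$

where $\Psi_{\theta}$ is the (typically random) parameter capturing the conditional distribution of the embedded variable $V$. For clarification, by (1.5) we mean

$$\mathbb{P}_{\theta}[V\in\cdot\,|\,\mathcal{F}_{\theta}](\omega)=\mathbb{P}_{\Psi_{\theta}(\omega)}[Y\in\cdot]\quad\text{for all $\omega\in F_{\theta,2}$.}$$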

With the help of $V$ , a recursive inequality for a bound on the distance between $W$ and the normal can be produced.

Before attempting to apply the methods presented in this article, it is advisable that a user first ‘test the waters’ by constructing a Stein or zero-bias coupling and proving a normal approximation for a smooth metric such as the Wasserstein distance; see Chen and Röllin (2010) or Goldstein (2007), respectively. Once this goal has been achieved, the sigma-algebra $\mathcal{F}_{\theta}$ will typically arise naturally from the coupling construction, and one may then proceed to identify a suitable variable $V$ whose conditional distribution given $\mathcal{F}_{\theta}$ is within the same class of distributions determined by $\Theta$ and close to that of $Y$. For instance, in occupancy problems, a Stein coupling or zero-bias coupling typically involves moving around a small number of balls among a small number of urns, and $V$ will typically again represent an occupancy problem, but on fewer balls and fewer urns.

1.1 Abstract approximation theorems

We now state the conditions required for our main results. The inverse rate function $r_{\theta}$ is assumed to be a positive function, measurable in $\theta$, a condition satisfied for all natural examples, including the ones considered here. The mean $\mu_{\theta}=\mathbb{E}_{\theta}Y$ and variance $\sigma_{\theta}^{2}=\mathrm{Var}_{\theta}(Y)$ are measurable by the conditions in our General Framework. To avoid repetition, once $\theta\in\Theta$ has been fixed, distributions of random variables are understood with respect to $\mathscr{L}_{\theta}(\cdot)$. The random variable $Z$ will always denote the standard normal.

The variable $Y$ denotes the unstandardized random variable of interest. Theorem 1.1 shows that the following set of conditions is sufficient for the Kolmogorov distance between the standardized version $W$ of $Y$ and the normal to be bounded by $C/r_{\theta}$ for some universal constant $C$.

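(G1)

Let $r_{\theta}$ be a positive measurable function, let $\overline{r}$ be a positive number, and let

$$\Smiley=\{\theta\in\Theta:r_{\theta}>\overline{r}\}.$$

Assume that $\overline{r}$ is chosen such that $\mathrm{Var}_{\theta}Y>0$ for all $\theta\in\Smiley$.

(G2)

For all $\theta\in\Theta$, let $\mu_{\theta}=\mathbb{E}_{\theta}Y$ and $\sigma^{2}_{\theta}=\mathrm{Var}_{\theta}Y$, and define

$$W=\frac{Y-\mu_{\theta}}{\sigma_{\theta}}$$

whenever $\sigma_{\theta}>0$, and set $W=0$ otherwise. Let $W^{\prime}$ and $G$ be two random variables such that, for each $\theta\in\Smiley$, $(W,W^{\prime},G)$ is a Stein coupling, in the sense of (1.2), with respect to $\mathbb{P}_{\theta}$.

(G3)

With $D=W^{\prime}-W$ assume that

$$\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl|\mathbb{E}_{\theta}(1-GD\,|\,W)\bigr|<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{(1+|W|)|G|D^{2}\bigr\}<\infty.\tag{1.7}$$

(G4)

For each $\theta\in\Smiley$, let $\mathcal{F}_{\theta}\subset\mathcal{F}$ be a sub-$\sigma$-algebra. Let $\overline{G}$ and $\overline{D}$ be random variables such that, for each $\theta\in\Smiley$, the mappings $\overline{G}(\theta,\cdot)$ and $\overline{D}(\theta,\cdot)$ are $\mathcal{F}_{\theta}$-measurable and such that, on some event $F_{\theta,1}$ which need not be in $\mathcal{F}_{\theta}$, we have $|G|\leq\overline{G}$, $|D|\leq\overline{D}$, and

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|G|D^{2}(1-I_{F_{\theta,1}})\bigr\}<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}\bigr\}<\infty.$$

(G5)

Let $\Psi$ be a $\Theta$-valued random element such that, for each $\theta\in\Smiley$, $\Psi(\theta,\cdot)$ is $\mathcal{F}_{\theta}$-measurable. Let $V$ be a random variable, and for each $\theta\in\Smiley$, let $F_{\theta,2}\in\mathcal{F}_{\theta}$ be such that

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi}(Y)\quad\text{on $F_{\theta,2}$,}$$

and

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|G|D^{2}(1-I_{F_{\theta,2}})\bigr\}<\infty.$$

(G6)

Let $\overline{B}$ be a random variable such that, for each $\theta\in\Smiley$, $\overline{B}(\theta,\cdot)$ is $\mathcal{F}_{\theta}$-measurable,

$$\sigma_{\theta}^{-1}|Y-V|\leq\overline{B}\quad\text{on $F_{\theta,1}$,}\quad\text{and}\quad\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}\,\overline{B}\,I_{F_{\theta,2}}\bigr\}<\infty.$$

(G7)

Assume

$$\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}\cap\{\Psi\in\Smiley\}}\frac{\sigma_{\theta}^{2}}{\sigma_{\Psi(\theta,\omega)}^{2}}<\infty,$$

$$\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}}\frac{r_{\theta}}{r_{\Psi(\theta,\omega)}}<\infty,\qquad\sup_{\theta\in\Smiley}\operatorname*{ess\,sup}_{\omega\in F_{\theta,2}\cap\{\Psi\in\Smiley\}}\frac{r_{\Psi(\theta,\omega)}}{r_{\theta}}<\infty,$$

where the essential suprema are taken with respect to $\mathbb{P}_{\theta}$.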

Theorem 1.1.

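If Conditions (G1)–(G7) are satisfied, then there exists a constant $C$, independent of $\theta$, such that

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{\theta}[W\leq z]-\mathbb{P}[Z\leq z]\bigr|\leq\frac{C}{r_{\theta}}\qquad\text{for all $\theta\in\Theta$.}$$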

Theorem 1.1 extends Theorem 1.1 in Goldstein (2013), which produces a Kolmogorov bound equivalent up to constants to the bound in Chen and Röllin (2010) for the Wasserstein distance to the normal for bounded size bias couplings. In addition, the bound produced by Bartroff and Goldstein (2013) by an application of Theorem 1.1 of Goldstein (2013) to counts in a multinomial occupancy model was shown there to be of optimal order by the lower bound (1.6) of Englund (1981); see also (1.7) of Bartroff and Goldstein (2013). The bound of Theorem 1.2 of Goldstein (2013), using also Theorem 1.1 of that same work, for degree counts in the Erdős-Rényi random graph can also be shown to be optimal up to constant factors in the same manner.

When higher moments exist, a number of the conditions of the theorem may be verified using simpler expressions, obtained via standard inequalities. For instance, using $f(w)=w$ and that $\mathrm{Var}_{\theta}(W)=1$ in (1.2) shows that $\mathbb{E}_{\theta}(GD)=1$, hence applying the Cauchy-Schwarz inequality to the first expression in (1.7) in Condition (G3) above, followed by a consequence of the conditional variance formula, we obtain

$$\mathbb{E}_{\theta}\bigl|\mathbb{E}_{\theta}(1-GD\,|\,W)\bigr|\leq\sqrt{\mathrm{Var}_{\theta}\bigl(\mathbb{E}_{\theta}(GD\,|\,W)\bigr)}\leq\sqrt{\mathrm{Var}_{\theta}\bigl(\mathbb{E}_{\theta}(GD\,|\,\mathcal{H})\bigr)},$$

where $\mathcal{H}$ is any $\sigma$-algebra with respect to which $W$ is measurable.
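The steps behind this chain of inequalities can be spelled out as follows. The derivation is a sketch added for illustration, assuming $GD$ is square integrable and that $W$ is $\mathcal{H}$-measurable.

```latex
\begin{align*}
\mathbb{E}_\theta(GD)
  &= \mathbb{E}_\theta\{G W' - G W\} = \mathbb{E}_\theta\{W \cdot W\}
   = \operatorname{Var}_\theta(W) = 1
   && \text{(Stein coupling identity with $f(w)=w$)} \\[4pt]
\mathbb{E}_\theta\bigl|\mathbb{E}_\theta(1-GD \mid W)\bigr|
  &\le \sqrt{\mathbb{E}_\theta\bigl\{\bigl(1-\mathbb{E}_\theta(GD \mid W)\bigr)^2\bigr\}}
   && \text{(Cauchy--Schwarz)} \\
  &= \sqrt{\operatorname{Var}_\theta\bigl(\mathbb{E}_\theta(GD \mid W)\bigr)}
   && \text{(since $\mathbb{E}_\theta\{\mathbb{E}_\theta(GD \mid W)\}=1$)} \\[4pt]
\operatorname{Var}_\theta\bigl(\mathbb{E}_\theta(GD \mid W)\bigr)
  &= \operatorname{Var}_\theta\bigl(\mathbb{E}_\theta\{\mathbb{E}_\theta(GD \mid \mathcal{H}) \mid W\}\bigr)
   && \text{(tower property, $\sigma(W)\subset\mathcal{H}$)} \\
  &\le \operatorname{Var}_\theta\bigl(\mathbb{E}_\theta(GD \mid \mathcal{H})\bigr)
   && \text{(conditional variance formula)}.
\end{align*}
```

The last step uses $\operatorname{Var}(X)=\operatorname{Var}(\mathbb{E}(X\mid W))+\mathbb{E}\operatorname{Var}(X\mid W)\geq\operatorname{Var}(\mathbb{E}(X\mid W))$ with $X=\mathbb{E}_{\theta}(GD\mid\mathcal{H})$.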

We now state a parallel result for zero bias couplings.

(Z1)

Let $r_{\theta}$ be a positive measurable function, let $\overline{r}$ be a positive number, and let

$$\Smiley=\{\theta\in\Theta:r_{\theta}>\overline{r}\}.$$

Assume that $\overline{r}$ is chosen such that $\mathrm{Var}_{\theta}Y>0$ for all $\theta\in\Smiley$.

(Z2)

Let $\mu_{\theta}=\mathbb{E}_{\theta}Y$ and $\sigma^{2}_{\theta}=\mathrm{Var}_{\theta}Y$, and define

$$W=\frac{Y-\mu_{\theta}}{\sigma_{\theta}}$$

whenever $\sigma_{\theta}>0$ and $W=0$ otherwise. Let $W^{*}$ be defined on $\Omega$, such that for each $\theta\in\Smiley$ the variable $W^{*}$ has the $W$-zero bias distribution as in (1.4) with respect to $\mathbb{P}_{\theta}$.

(Z3)

For each $\theta\in\Smiley$ let $\mathcal{F}_{\theta}$ be a sub-$\sigma$-algebra of $\mathcal{F}$, let $D=W^{*}-W$, let $\overline{D}$ be a random variable such that $\overline{D}(\theta,\cdot)$ is $\mathcal{F}_{\theta}$-measurable, and let $F_{\theta,1}$ be an event, which need not be in $\mathcal{F}_{\theta}$, on which $|D|\leq\overline{D}$ and such that

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|D|(1-I_{F_{\theta,1}})\bigr\}<\infty\quad\text{and}\quad\sup_{\theta\in\Smiley}r_{\theta}\,\mathbb{E}_{\theta}\bigl\{|DW|+\overline{D}\bigr\}<\infty.$$

(Z4)

Let $V$ be a random variable, and let $\Psi$ be a $\Theta$-valued random element such that, for each $\theta\in\Smiley$, $\Psi(\theta,\cdot)$ is $\mathcal{F}_{\theta}$-measurable. For each $\theta\in\Smiley$, let $F_{\theta,2}$ be an event in $\mathcal{F}_{\theta}$ such that

$$\mathscr{L}_{\theta}(V\,|\,\mathcal{F}_{\theta})=\mathscr{L}_{\Psi}(Y)\quad\text{on $F_{\theta,2}$,}$$

and

$$\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{|D|(1-I_{F_{\theta,2}})\bigr\}<\infty.$$

(Z5)

Let $\overline{B}$ be a random variable such that, for each $\theta\in\Smiley$, $\overline{B}(\theta,\cdot)$ is $\mathcal{F}_{\theta}$-measurable, and

$$\sigma_{\theta}^{-1}|Y-V|\leq\overline{B}\quad\text{on $F_{\theta,1}$,}\quad\text{and}\quad\sup_{\theta\in\Smiley}r^{2}_{\theta}\,\mathbb{E}_{\theta}\bigl\{\overline{D}\bigl(\overline{B}+\overline{D}\bigr)I_{F_{\theta,2}}\bigr\}<\infty.$$

Theorem 1.2.

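If Conditions (Z1)–(Z5) and (G7) are satisfied, then there exists a constant $C$, independent of $\theta$, such that

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{\theta}[W\leq z]-\mathbb{P}[Z\leq z]\bigr|\leq\frac{C}{r_{\theta}},\qquad\text{for all $\theta\in\Theta$.}$$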

Many of the conditions of Theorem 1.2, as for Theorem 1.1, can be shown to be satisfied using inequalities on moments. The proofs of Theorems 1.1 and 1.2 appear in Section 4.

1.2 Applications

We apply Theorems 1.1 and 1.2 to obtain new results in two examples; the proofs are deferred to Sections 2 and 3.

The first example invokes Theorem 1.1 for Stein couplings for the normal approximation of the number $Y$ of isolated vertices in the Erdős-Rényi graph $\mathcal{G}\sim\mathrm{ER}(n,m)$ on $n$ vertices, having exactly $m$ edges, distributed uniformly at random. This model is related to the one where edges between each pair of vertices are chosen independently with some fixed probability, but in the model we consider, the indicators that vertices are isolated exhibit a non-trivial global dependence since the total number of edges is fixed. In fact, while in the model with independent edges these indicators are positively correlated, the effect of the global dependence in $\mathrm{ER}(n,m)$ is stronger, resulting in a negative correlation; see the proof of Lemma 2.5.

Related work was done by Kordecki (1987) on the number of isolated vertices in the Erdős-Rényi graph model, although his general framework is not applicable here. The boundedness of the second derivative of the solution to the Stein equation on page 132 is shown only at the points where the second derivative exists, whereas it needs to hold everywhere in order to perform the Taylor expansion on page 135; we were thus not able to reproduce his final results. In addition, the fixed number of edges model does not appear to satisfy the condition on page 134 of his work. We also mention the work by Goldstein (2013), who considered vertex degrees in general, though it only addressed the independent edge model.

Theorem 1.3 provides the following bound on the Kolmogorov distance between the standardized version $W$ of $Y$ and the normal.

Theorem 1.3.

Let $Y$ count the number of isolated vertices in the Erdős-Rényi graph $\mathcal{G}\sim\mathrm{ER}(n,m)$ on $n$ vertices, having exactly $m$ edges, distributed uniformly at random. Then, with $\mu_{n,m}$ and $\sigma_{n,m}^{2}$ the mean and variance of $Y$, letting $W=(Y-\mu_{n,m})/\sigma_{n,m}$ when $\sigma_{n,m}>0$ and zero otherwise, with

$$\Theta=\Bigl\{(n,m)\,:\,n\geq 3,\ 0<m<{n\choose 2}\Bigr\},$$

there exists a universal constant $C>0$ such that, for all $(n,m)\in\Theta$,

$$\sup_{z\in\mathbb{R}}\bigl|\mathbb{P}_{n,m}[W\leq z]-\Phi(z)\bigr|\leq\frac{C}{r_{n,m}}$$

where

$$r_{n,m}=\frac{\sigma_{n,m}^{3}}{\mu_{n,m}\bigl(1+\frac{m^{2}}{n^{2}}\bigr)}.$$
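The quantities in Theorem 1.3 can be evaluated exactly for small graphs. The sketch below is an illustration and not the paper's Lemma 2.7: it uses the elementary counting fact that a fixed set of $j$ vertices is isolated exactly when all $m$ edges fall among the ${n-j\choose 2}$ pairs avoiding that set, which gives $\mu_{n,m}$ and $\sigma^{2}_{n,m}$ in closed form, and then computes the rate $r_{n,m}=\sigma_{n,m}^{3}/\bigl(\mu_{n,m}(1+m^{2}/n^{2})\bigr)$. The function names, and the convention of returning zero when the variance vanishes, are assumptions made here.

```python
from fractions import Fraction
from math import comb


def isolated_vertex_moments(n: int, m: int):
    """Exact mean and variance of the number of isolated vertices in ER(n, m)."""
    total = comb(comb(n, 2), m)                     # number of graphs with m edges
    p1 = Fraction(comb(comb(n - 1, 2), m), total)   # P[a fixed vertex is isolated]
    p2 = Fraction(comb(comb(n - 2, 2), m), total)   # P[two fixed vertices are isolated]
    mu = n * p1
    var = n * (n - 1) * p2 + mu - mu * mu           # E[Y(Y-1)] + E[Y] - (E[Y])^2
    return mu, var


def rate_function(n: int, m: int) -> float:
    """r_{n,m} = sigma^3 / (mu (1 + m^2 / n^2)); set to 0 when the variance is 0."""
    mu, var = isolated_vertex_moments(n, m)
    if var == 0:
        return 0.0  # convention of this sketch: the bound C / r is then trivial
    return float(var) ** 1.5 / (float(mu) * (1.0 + m * m / (n * n)))


for n, m in [(4, 2), (50, 50), (200, 200), (800, 800)]:
    mu, var = isolated_vertex_moments(n, m)
    print(n, m, float(mu), float(var), rate_function(n, m))
```

Exact rational arithmetic avoids cancellation in the variance, which is a difference of large terms when $n$ is large.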

Remark 1.4.

In order to better understand the bounds obtained in Theorem 1.3, we now discuss in more detail the different regimes at which $m$ and $n$ can tend to infinity. To this end, denote by $a(n)\sim b(n)$ that $\lim a(n)/b(n)=1$ , and by $a(n)\asymp b(n)$ that $\liminf a(n)/b(n)>0$ and $\limsup a(n)/b(n)<\infty$ . By Lemma 2.7, if $n$ and $m$ tend to infinity so that $\max\{m/n^{2},m^{2}/n^{3}\}\rightarrow 0$ , then

[TABLE]

Hence, we have

[TABLE]

so that

[TABLE]

For $n\asymp m$ , the central domain, it follows that $r_{n,m}\asymp\sigma_{n,m}$ , and moreover, in the special case where $m\sim cn$ ,

[TABLE]

Regarding lower bounds, Englund (1981, Section 6) shows that for the standardized number of occupied cells in a uniform occupancy model with $n$ balls and $m$ boxes,

[TABLE]

Englund’s argument holds without changes for any random variable with finite variance supported on the integers, and so also for the number of isolated vertices in our model. Hence, since in the central domain $r_{n,m}\asymp\sigma_{n,m}$ , the rate function is of optimal order.

If $m\to\infty$ and $m/n\to 0$ , the left domain, say, then

[TABLE]

since $1-e^{-x}(1+x)\sim x^{2}/2$ as $x\to 0$ for the first relation, and $\sigma_{n,m}^{2}\asymp m^{2}/n$ for the second. In this case, Englund’s lower bound is not achieved since $r_{n,m}=\mathrm{o}(\sigma_{n,m})$. Nonetheless, the bound is informative as long as $r_{n,m}\to\infty$, which is the case as long as $m/n^{5/6}\to\infty$, such as when $m=cn^{\alpha}$ for $c>0$ and $5/6<\alpha<1$.

If $m/n\to\infty$ , the right domain, using $\sigma^{2}_{n,m}\asymp ne^{-2m/n}$ for the second relation we have

[TABLE]

so Englund’s lower bound is not attained. However, $r_{n,m}$ goes to infinity when $m\leq\alpha\,n\log n$ for $0<\alpha<1/2$ .

In the second example, we apply Theorem 1.2 with the zero bias coupling constructed in Fulman and Goldstein (2011, Theorem 3.1) to give a bound on the normal approximation of the content $Y$ of a Young tableau under Jack$_{\alpha}$ measure over a range of large $\alpha$. In more detail, we recall that a partition of a positive integer $n$ can be represented as a vector $\Lambda=(\lambda_{1},\ldots,\lambda_{p})$ of non-increasing, positive integers summing to $n$, where $p$ is the number of parts of the partition. For instance, $\Lambda=(4,2,1)$ corresponds to a partition of $n=7$ with $p=3$. In turn, the partition $\Lambda$ can be represented by a tableau with $p$ rows of equal sized boxes, whose $j^{\mathrm{th}}$ row is of length $\lambda_{j}$, such as in (1.23).

The Jack$_{\alpha}$ measure on tableaux, defined for $\alpha>0$, recovers the Plancherel measure when specializing to the case $\alpha=1$. Under Jack$_{\alpha}$, see Fulman (2004) for instance, the probability of a partition $\Lambda$ of $n$ is given by

[TABLE]

where the product is over all boxes $x$ in the partition, $a(x)$ denotes the number of boxes in the same row of $x$ and to the right of $x$ (the “arm” of $x$), and $l(x)$ denotes the number of boxes in the same column of $x$ and below $x$ (the “leg” of $x$). For each tableau representing a partition of $n$ we may define the $\alpha$-content of any individual box by

[TABLE]

as depicted in the following tableau for the partition $(4,2,1)$ of 7:

[TABLE]

Here we study the distribution of the standardized sum of the $\alpha$-contents over all boxes in the tableau, that is,

[TABLE]

and where the partition $\Lambda_{n}$ of $n$ is sampled from the Jack$_{\alpha}$ measure in (1.22).
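As an illustration of the arm, leg and $\alpha$-content definitions above, the following sketch enumerates partitions of a small $n$ and evaluates the Jack$_{\alpha}$ law exactly with rational arithmetic. The displays (1.22) and the content formula are not reproduced in this text, so the code assumes the forms commonly used in the literature: probability $\alpha^{n}n!/\prod_{x}(\alpha a(x)+l(x)+1)(\alpha a(x)+l(x)+\alpha)$, and content $\alpha(\text{column}-1)-(\text{row}-1)$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield all partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def boxes(shape):
    """Yield (row, col, arm, leg) for every box, with 1-based row and column."""
    for r, row_len in enumerate(shape, start=1):
        for c in range(1, row_len + 1):
            arm = row_len - c  # boxes to the right of x in the same row
            # boxes below x in the same column (rows r+1, r+2, ...)
            leg = sum(1 for length in shape[r:] if length >= c)
            yield r, c, arm, leg


def jack_probability(shape, alpha):
    """Assumed form of the Jack_alpha probability of a partition."""
    n = sum(shape)
    denom = Fraction(1)
    for _, _, arm, leg in boxes(shape):
        denom *= (alpha * arm + leg + 1) * (alpha * arm + leg + alpha)
    return Fraction(alpha) ** n * factorial(n) / denom


def content_sum(shape, alpha):
    """Sum of the (assumed) alpha-contents alpha*(col-1) - (row-1) over all boxes."""
    return sum(alpha * (c - 1) - (r - 1) for r, c, _, _ in boxes(shape))
```

For instance, with `alpha = Fraction(2)` one can build `{s: jack_probability(s, alpha) for s in partitions(3)}` and check that the weights sum to one, and that the content sum has mean zero under this law, which is what makes the standardization in (1.24) natural.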

Fulman (2004) proved an $\mathop{{}\mathrm{O}}\mathopen{}(n^{-1/4})$ bound for the error in the Kolmogorov metric for the normal approximation of $W$, improved by Fulman (2006) using martingales to $\mathop{{}\mathrm{O}}\mathopen{}(n^{-1/2+\varepsilon})$ for any $\varepsilon>0$, and by Fulman (2006) to $\mathop{{}\mathrm{O}}\mathopen{}(n^{-1/2})$ using Bolthausen’s inductive approach and Stein’s method, but without an explicit constant. Hora and Obata (2007) prove a central limit theorem, with no error bound, for $W_{n,\alpha}$ using quantum probability.

Fulman and Goldstein (2011) prove the bound

[TABLE]

in the Wasserstein metric $d_{1}$, where $Z$ is a standard normal variable. In addition to providing explicit constants, this bound also highlights the role of $\alpha$. A natural question it raises is whether a bound in the Kolmogorov metric can be shown that has this same dependence on $\alpha$. A few weeks before the current work was posted, Chen and Thánh (2019, Theorem 1.1) proved the bound

[TABLE]

which achieves this goal with an explicit constant to within a logarithmic factor.

Here, given any $\varepsilon\in(0,1)$, we show that, in the ‘large $\alpha$’ region $\alpha\geq n^{1+\varepsilon}$, this log factor may be removed, resulting in the bound having the same $\alpha$ dependence as (1.25). That is, as $\alpha\geq n$ over the region we consider, the ratio between the right hand sides of (1.25) and (1.26) is bounded away from zero and infinity. This same result, with an explicit constant, was also achieved by Chen and Thánh (2019, Proposition 4.1) by applying a different approach. We do not consider $\varepsilon>1$, as Theorem 3.1 below shows that this case is degenerate.

Theorem 1.5.

For $W$ as given in (1.24) with $\Lambda_{n}$ sampled according to Jack$_{\alpha}$ measure for some $n\geq 2$, for every $\varepsilon\in(0,1)$ there exists a constant $C$ depending only on $\varepsilon$ such that

[TABLE]

We remark that by applying the reasoning at the end of the proof of Theorem 4.1 of Fulman and Goldstein (2011) the result holds also for $\alpha\leq n^{-1-\varepsilon}$ when replacing the $\alpha$ on the right hand side by $1/\alpha$. In the computations that follow, $C$ without subscript will denote a universal constant whose value may change from line to line, and for $n$ a non-negative integer, $[n]$ will denote the set $\{1,\ldots,n\}$.

2 Isolated vertices in the Erdős-Rényi random graph

In this section we prove Theorem 1.3. We begin by reviewing Construction 2A of Chen and Röllin (2010) for Stein couplings. Let ${\bf X}=(X_{1},\ldots,X_{n})$ be a collection of mean zero random variables, and let $I$ be a random index uniformly distributed over $[n]$, independent of ${\bf X}$. Let $W=\sum_{i\in[n]}X_{i}$ and suppose that for each $i=1,\ldots,n$ there exists $W_{i}^{\prime}$ such that

[TABLE]

Then, with $G=-nX_{I}$ , the triple $(W,W_{I}^{\prime},G)$ is a Stein coupling. To verify the claim, first note that

[TABLE]

On the other hand,

[TABLE]

so (1.2) holds.

2.1 Isolated vertices in $\mathop{\mathrm{ER}}(n,m)$

Consider the Erdős and Rényi (1960) random graph ${\mathcal{G}}\sim\mathop{\mathrm{ER}}(n,m)$ on $n$ vertices, having exactly $m$ edges, distributed uniformly at random. Let $d_{v}$ be the degree of vertex $v\in[n]$, and consider the number of isolated vertices

[TABLE]

With $N={n\choose 2}$ , the mean and variance of $Y$ are given by, respectively,

[TABLE]
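The displayed formulas are not reproduced in this text, but the mean and variance of $Y$ can be recovered from a standard counting argument, sketched below as an assumption rather than as the paper's exact display. A fixed vertex is isolated exactly when none of its $n-1$ potential edges is among the $m$ chosen, and two fixed vertices are both isolated when none of their $2n-3$ potential edges is chosen. The helper names are hypothetical, and the brute-force routine is only meant for very small $n$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def isolated_mean_var(n, m):
    """Exact mean and variance of the number of isolated vertices in ER(n, m)."""
    N = comb(n, 2)
    # P[one fixed vertex is isolated]: all m edges avoid its n-1 potential edges.
    p1 = Fraction(comb(N - (n - 1), m), comb(N, m))
    # P[two fixed vertices are both isolated]: all m edges avoid 2n-3 potential edges.
    p2 = Fraction(comb(N - (2 * n - 3), m), comb(N, m))
    mu = n * p1
    var = mu + n * (n - 1) * p2 - mu ** 2  # E[Y(Y-1)] + E[Y] - (E[Y])^2
    return mu, var


def isolated_mean_var_brute(n, m):
    """Same quantities by enumerating every graph with m edges (tiny n only)."""
    pairs = list(combinations(range(n), 2))
    counts = []
    for edges in combinations(pairs, m):
        touched = {v for e in edges for v in e}
        counts.append(n - len(touched))
    total = len(counts)
    mu = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mu ** 2
    return mu, var
```

Comparing `isolated_mean_var(5, 3)` with `isolated_mean_var_brute(5, 3)` gives a quick sanity check of the counting argument.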

We remark that though there may be a choice of couplings for a given situation, the coupling we have chosen will work for the more general problem where $Y$ is a sum

[TABLE]

of functions $h_{v}$ of the degree $d_{v}$ of vertex $v$ . For instance, the size bias coupling will work, as in Goldstein (2013), for counting the number of vertices having specified degrees, but not in this greater generality.

Proof of Theorem 1.3.

The proof consists of setting up the framework, and then checking that Conditions (G1)–(G7) hold, with Condition (G2) requiring the construction of a Stein coupling. First, let ${\mathcal{E}}_{n}$ be the enumeration of all $N$ unordered pairs $\{v,w\}\subset[n]$ with $v\neq w$, given by

[TABLE]

Let $\pi$ be a uniformly chosen random permutation of $[N]$ . We will describe the construction of a graph ${\mathcal{G}}(m,\pi)$ , determined by $m$ and $\pi$ , that has distribution $\mathop{\mathrm{ER}}(n,m)$ . As $n$ is determined by $N$ , and hence by $\pi$ , $n$ may be omitted in the notation for the graph; the same principle will be applied without comment for like quantities that appear later.

We construct ${\mathcal{G}}(m,\pi)$ as follows. For each $\{v,w\}\subset[n]$ with $v<w$ , connect vertices $v$ and $w$ with an edge if and only if

[TABLE]

where $i$ is the index in the enumeration (2.2) corresponding to the pair $\{v,w\}$ . Clearly this construction results in a graph with $m$ edges, precisely, those with labels $\{\pi(1),\ldots,\pi(m)\}$ . Since $\pi$ is uniform it is immediate that ${\mathcal{G}}(m,\pi)\sim\mathop{\mathrm{ER}}(n,m)$ . Let $d_{v}(m,\pi)$ be the degree of vertex $v\in[n]$ in ${\mathcal{G}}(m,\pi)$ , let

[TABLE]
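The permutation construction of ${\mathcal{G}}(m,\pi)$ can be sketched in a few lines. The enumeration (2.2) is not reproduced in this text, so the code assumes a lexicographic ordering of the pairs, which is a hypothetical stand-in; any fixed enumeration would serve. The edge rule used, namely that pair $i$ is an edge exactly when $\pi^{-1}(i)\leq m$, is inferred from the description that the edges are those with labels $\pi(1),\ldots,\pi(m)$.

```python
import random
from itertools import combinations
from math import comb


def pair_enumeration(n):
    """Assumed enumeration E_n: index i in 1..N -> pair (v, w) with v < w."""
    return dict(enumerate(combinations(range(1, n + 1), 2), start=1))


def random_perm(N, rng):
    """Uniform permutation of [N], stored as a dict position -> label (1-based)."""
    labels = list(range(1, N + 1))
    rng.shuffle(labels)
    return {pos: label for pos, label in enumerate(labels, start=1)}


def build_graph(n, m, pi):
    """Edge set of G(m, pi): the pairs with labels pi(1), ..., pi(m)."""
    E = pair_enumeration(n)
    return {E[pi[j]] for j in range(1, m + 1)}


def degrees(n, edges):
    """Degree of every vertex in [n]."""
    deg = {v: 0 for v in range(1, n + 1)}
    for u, w in edges:
        deg[u] += 1
        deg[w] += 1
    return deg


def isolated_vertices(n, edges):
    """Number of vertices of degree zero."""
    return sum(1 for d in degrees(n, edges).values() if d == 0)
```

Since the same $\pi$ is used for every $m$, the graphs ${\mathcal{G}}(m,\pi)$ and ${\mathcal{G}}(m+1,\pi)$ built this way differ by exactly one edge, which is the property exploited repeatedly in the variance bounds.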

We now verify the conditions of Theorem 1.1 with $\Theta$ and $r_{n,m}$ as given in (1.20) and (1.21), respectively.

Condition (G1).

Let $n_{0}$ , $m_{0}$ , $c_{0}$ and $C_{0}$ be as in Lemma 2.7. Now obtain $\overline{r}$ in the definition (1.6) of \Smiley through Lemma 2.8 and the choices

[TABLE]

Since our definition of $r_{n,m}$ in (1.21) implies that $r_{n,m}=0$ whenever $\sigma^{2}_{n,m}=0$, the condition that $\sigma^{2}_{n,m}>0$ on \Smiley is satisfied. Note that by Lemma 2.8

[TABLE]

Condition (G2).

For $(n,m)\in\raisebox{-0.6458pt}{\Smiley}$ , let

[TABLE]

and set $W=0$ otherwise. Assume $(n,m)\in\raisebox{-0.6458pt}{\Smiley}$ . Let $\Sigma=(\sigma_{1},\dots,\sigma_{n})$ be a collection of uniform random permutations of $[N]$ , with $\pi,\sigma_{1},\ldots,\sigma_{n}$ mutually independent. The purpose of the following algorithm is to take the graph ${\cal G}(m,\pi)$ as input and to construct, for each vertex $v\in[n]$ , a graph ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ on the $n-1$ vertices $[n]\setminus\{v\}$ , having distribution $\mathop{\mathrm{ER}}(n-1,m)$ , independent of $d_{v}(m,\pi)$ , and which can be closely coupled to ${\mathcal{G}}(m,\pi)$ .

We first describe the algorithm in words. Initialise counters $k$ and $i$ that respectively record the number of edges successfully relocated, and the index of a candidate edge for possible addition to the new graph. For each given vertex $v\in[n]$, begin with ${\mathcal{G}}(m,\pi)$ and relocate the $d_{v}(m,\pi)$ edges incident to $v$ uniformly: incrementing $i$ as needed, add ${\mathcal{E}}_{n}(\sigma_{v}(i))$ as a new edge when it connects two vertices, neither of which is $v$ (Step 6), and which are not already connected (Step 7). The set $L^{v}(m,\pi,\sigma_{v})$ holds the locations (that is, indices) in ${\cal E}_{n}$ of the relocated edges. At termination, the set $L^{v}(m,\pi,\sigma_{v})$ will have size $d_{v}(m,\pi)$.

Algorithm 1. Fix $v\in[n]$ .

1. Let $L^{v}(m,\pi,\sigma_{v})\leftarrow\emptyset$.

2. Let ${\mathcal{G}}^{\prime}$ be equal to ${\mathcal{G}}(m,\pi)$, but with vertex $v$ and all $d_{v}(m,\pi)$ edges incident to $v$ removed.

3. Let $k\leftarrow 0$ and $i\leftarrow 0$.

4. If $k=d_{v}(m,\pi)$, then denote the resulting graph by ${\cal G}^{v}(m,\pi,\sigma_{v})$, and stop.

5. Let $i\leftarrow i+1$.

6. If $v\in{\mathcal{E}}_{n}(\sigma_{v}(i))$, then return to Step 5.

7. If $\pi^{-1}(\sigma_{v}(i))\leq m$, that is, if ${\mathcal{E}}_{n}(\sigma_{v}(i))$ is an edge in ${\mathcal{G}}(m,\pi)$, then return to Step 5.

8. In ${\mathcal{G}}^{\prime}$ connect the vertices in ${\mathcal{E}}_{n}(\sigma_{v}(i))$ by an edge, and let $L^{v}(m,\pi,\sigma_{v})\leftarrow L^{v}(m,\pi,\sigma_{v})\cup\{\sigma_{v}(i)\}$.

9. Let $k\leftarrow k+1$.

10. Return to Step 4.
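The steps of Algorithm 1 translate directly into a loop. The sketch below is one possible rendering, not the authors' code: it assumes a lexicographic enumeration of the pairs in place of (2.2), stores permutations as 1-based dictionaries, and uses hypothetical helper names. It returns the original edge set, the edge set of ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$, and the set $L^{v}(m,\pi,\sigma_{v})$.

```python
import random
from itertools import combinations
from math import comb


def random_perm(N, rng):
    """Uniform permutation of [N], stored as a dict position -> label (1-based)."""
    labels = list(range(1, N + 1))
    rng.shuffle(labels)
    return {pos: label for pos, label in enumerate(labels, start=1)}


def isolated_count(vertices, edges):
    """Number of vertices in `vertices` that are endpoints of no edge."""
    touched = {u for e in edges for u in e}
    return sum(1 for u in vertices if u not in touched)


def relocate(n, m, pi, sigma_v, v):
    """Sketch of Algorithm 1 for a fixed vertex v."""
    if m > comb(n - 1, 2):
        raise ValueError("need m <= C(n-1, 2) for the relocation to terminate")
    E = dict(enumerate(combinations(range(1, n + 1), 2), start=1))  # assumed E_n
    pi_inv = {label: pos for pos, label in pi.items()}
    old_edges = {E[pi[j]] for j in range(1, m + 1)}    # edges of G(m, pi)
    d_v = sum(1 for e in old_edges if v in e)
    L = set()                                          # Step 1
    new_edges = {e for e in old_edges if v not in e}   # Step 2
    k, i = 0, 0                                        # Step 3
    while k < d_v:                                     # Step 4
        i += 1                                         # Step 5
        label = sigma_v[i]
        if v in E[label]:                              # Step 6: pair contains v
            continue
        if pi_inv[label] <= m:                         # Step 7: already an edge
            continue
        new_edges.add(E[label])                        # Step 8
        L.add(label)
        k += 1                                         # Step 9
    return old_edges, new_edges, L
```

A natural check on any such implementation is that the output has exactly $m$ edges, none incident to $v$, and that $|L^{v}(m,\pi,\sigma_{v})|=d_{v}(m,\pi)$. The difference between the isolated-vertex counts before and after relocation is then at most $1+2d_{v}(m,\pi)$ in absolute value, since each relocated edge touches at most two vertices.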

It is not difficult to see that the algorithm will succeed in redistributing the edges incident on $v$ if and only if $m\leq{n-1\choose 2}$, which is guaranteed by our choice of \Smiley. Note that, given $m$, $\pi$ and $\sigma_{v}$, the construction of ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ from ${\mathcal{G}}(m,\pi)$ is deterministic, and hence always results in the same graph ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$. Note also that, although ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ has only $n-1$ vertices, we keep the labeling from the original graph ${\mathcal{G}}(m,\pi)$. Since the potential locations at which the $d_{v}(m,\pi)$ edges are added are sampled uniformly at random without replacement (via $\sigma_{v}$), it is clear that ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})\sim\mathop{\mathrm{ER}}(n-1,m)$, up to vertex labeling.

Now, let $W=W(m,\pi)$ as in (2.7). With ${\mathcal{V}}$ a uniformly chosen vertex from $[n]$ , independent of $\pi,\sigma_{1},\ldots,\sigma_{n}$ , and recalling the notation in (2.4), let

[TABLE]

For $w\neq v$ , let $d^{v}_{w}(m,\pi,\sigma_{v})$ be the degree of vertex $w$ in the graph ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ , let

[TABLE]

and

[TABLE]

Since the distribution of ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ is the same regardless of the value of $d_{v}(m,\pi)$ , we conclude that $I_{v}(m,\pi)-\mu_{n,m}/n$ and $Y^{v}(m,\pi,\sigma_{v})$ are independent, so (2.1) holds, implying $(W,W^{\prime},G)$ is a Stein coupling.

Condition (G3).

In what follows, consider a fixed $(n,m)\in\raisebox{-0.6458pt}{\Smiley}$ , and drop the subscript $\theta$ in the expectations that follow. As $W$ is a function of $(\pi,\Sigma)$ , using (1.15) we have

[TABLE]

Now, from (2.8) and (2.9), we have

[TABLE]

Splitting the sum into two and using $\mathop{\mathrm{Var}}\nolimits(X+Y)\leq 2\mathop{\mathrm{Var}}\nolimits X+2\mathop{\mathrm{Var}}\nolimits Y$ , we have

[TABLE]

where

[TABLE]

with $B_{v}(m,\pi,\sigma_{v})=Y(m,\pi)-Y^{v}(m,\pi,\sigma_{v})$ . Note that $f_{m}(\pi,\Sigma)$ and $g_{m}(\pi,\Sigma)$ are deterministic functions of $m$ , $\pi$ and $\Sigma$ . Applying Lemma 2.1 and using the notation as there, we obtain

[TABLE]

where

[TABLE]

Bounding $\boldsymbol{R_{g,1}}$ .

Note that

[TABLE]

since all differences arising from the first sum in (Condition (G3).) cancel except the one with index $v=i$ . Applying the simple bound

[TABLE]

we obtain

[TABLE]

Let $\mathop{\mathrm{Hyp}}(N,m,n)$ count the number of white balls among $m$ draws from an urn with $N$ balls, $n$ of which are white and $N-n$ black. Note that the marginal distribution of the degree of any vertex in ${\cal G}(m,\pi)$ is $\mathop{\mathrm{Hyp}}\bigl{(}N,m,n-1\bigr{)}$ , and hence has mean $2m/n$ , since the graph’s $m$ edges are uniformly sampled among all $N$ possibilities, and exactly $n-1$ of them are associated with a specific vertex. Hence, applying Lemma 2.2, (2.12) and (2.13), we obtain

[TABLE]

where we recall $C$ denotes a universal constant, whose value may change from line to line. Thus, as $\mu_{n,m}\leq n$ ,

[TABLE]
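The hypergeometric law of a vertex degree used in the bound above is easy to check exactly for small parameters. The following sketch, with hypothetical helper names, evaluates the probability mass function of $\mathop{\mathrm{Hyp}}(N,m,n-1)$ with $N={n\choose 2}$ and confirms that its mean is $(n-1)m/N=2m/n$.

```python
from fractions import Fraction
from math import comb


def hyp_pmf(N, m, n_white, k):
    """P[H = k] for H ~ Hyp(N, m, n_white): white balls among m draws without replacement."""
    return Fraction(comb(n_white, k) * comb(N - n_white, m - k), comb(N, m))


def degree_mean(n, m):
    """Mean degree of a fixed vertex in ER(n, m), computed from the pmf."""
    N = comb(n, 2)
    return sum(k * hyp_pmf(N, m, n - 1, k) for k in range(0, min(m, n - 1) + 1))
```

For example, `degree_mean(6, 4)` can be compared with `Fraction(2 * 4, 6)`.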

Bounding $\boldsymbol{R_{f,1}}$ .

As for $g_{m}$ , we likewise have

[TABLE]

Noting that, if $I_{i}(m,\pi)=1$ , we have $d_{i}(m,\pi)=0$ and hence $B_{i}(m,\pi,\sigma_{i})=B_{i}(m,\pi,\sigma_{i}^{\prime})=1$ , it is immediate that

[TABLE]

Bounding $\boldsymbol{R_{g,2}}$ .

In order to bound $R_{g,2}$ , with $\tau_{ij}$ the transposition of $i$ and $j$ , note first that

[TABLE]

since $g_{m}$ is a function of the graph ${\mathcal{G}}(m,\pi)$ and $\Sigma$ , and by (2.3), the graph ${\mathcal{G}}(m,\pi)$ obtained from $\pi$ does not change when swapping edge with edge or non-edge with non-edge. Hence, averaging over $\tau_{j}$ , a transposition of $j$ and a uniformly chosen index in $\{j,\ldots,N\}$ , yields

[TABLE]

By exchangeability the expectation on the right hand side is constant for $j\leq m$ and $i\geq m+1$ ; hence, for such $i$ and $j$ ,

[TABLE]

so that

[TABLE]

Now,

[TABLE]

here, we have first applied the inequality $(x+y)^{2}\leq 2x^{2}+2y^{2}$ , followed by (2.16) with $m$ replaced by $m+1$ to the first expectation in the expression that results to yield that $g_{m+1}(\pi\tau_{1,m+1},\Sigma)=g_{m+1}(\pi,\Sigma)$ , and $\pi\tau_{1,m+1}=_{d}\pi$ to the second expectation, where $=_{d}$ denotes equality in distribution. Hence,

[TABLE]

Now, recalling that $L^{v}(m,\pi,\sigma_{v})$ is the set of indices of edges to which those edges adjacent to vertex $v$ were relocated, let

[TABLE]

the set of vertices that received at least one additional edge when redistributing those edges. Also, let

[TABLE]

the neighbours of $v$ that did not receive a new edge when redistributing the edges incident on $v$ .

Note that the chosen vertex $v$ will increase the difference $Y(m,\pi)-Y^{v}(m,\pi,\sigma_{v})$ by one if it is isolated in ${\mathcal{G}}(m,\pi)$ . A vertex $w\not=v$ , will have this same effect if $w$ is isolated in ${\mathcal{G}}(m,\pi)$ but then has an edge attached to it in the redistribution of the removed edges of $v$ . On the other hand, a vertex $w\not=v$ will decrease this difference by one when $w$ is connected to $v$ , and has degree 1 in ${\mathcal{G}}(m,\pi)$ , and does not have such an edge reattached. Hence, this difference is given by

[TABLE]

where $I_{w,1}(m,\pi)=\mathop{{}\mathrm{I}}[d_{w}(m,\pi)=1]$ . Letting $\triangle$ denote set difference, we obtain

[TABLE]

For the first term in (LABEL:65), we have used that for any vertex $w\in[n]$ we can only have $I_{w}(m,\pi)\not=I_{w}(m+1,\pi)$ when $w$ is an endpoint of the additional edge determined by $\pi(m+1)$ , that is, when $w\in{\mathcal{E}}_{n}(\pi(m+1))$ . For the second term in (LABEL:65) we have used similarly that

[TABLE]

Moving now to the third term in (LABEL:65), if $v\notin{\mathcal{E}}_{n}(\pi(m+1))$ and $\pi(m+1)\not\in L^{v}(m,\pi,\sigma_{v})$, then $L^{v}(m+1,\pi,\sigma_{v})=L^{v}(m,\pi,\sigma_{v})$; indeed, if $v\notin{\mathcal{E}}_{n}(\pi(m+1))$, vertex $v$ has the same degree in both ${\mathcal{G}}(m,\pi)$ and ${\mathcal{G}}(m+1,\pi)$, and if also $\pi(m+1)\not\in L^{v}(m,\pi,\sigma_{v})$, then Algorithm 1 will redistribute the edges adjacent to $v$ to the same available pairs of vertices whether the graph has $m$ or $m+1$ edges (indeed, note that between the two cases $m$ and $m+1$, the outcome of Step 7 changes only if $\sigma_{v}(i)=\pi(m+1)$ for one of the $i$ tested there, which is equivalent to $\pi(m+1)\in L^{v}(m,\pi,\sigma_{v})$). Therefore, if $L^{v}(m,\pi,\sigma_{v})\neq L^{v}(m+1,\pi,\sigma_{v})$, we must either have $v\in{\mathcal{E}}_{n}(\pi(m+1))$ or $\pi(m+1)\in L^{v}(m,\pi,\sigma_{v})$. Now, if $v\in{\mathcal{E}}_{n}(\pi(m+1))$, then the degree of $v$ in ${\mathcal{G}}(m+1,\pi)$ is one more than its degree in ${\mathcal{G}}(m,\pi)$, so $L^{v}(m+1,\pi,\sigma_{v})$ will contain one more edge than $L^{v}(m,\pi,\sigma_{v})$. And if $\pi(m+1)\in L^{v}(m,\pi,\sigma_{v})$, then $|L^{v}(m,\pi,\sigma_{v})\triangle L^{v}(m+1,\pi,\sigma_{v})|=2$ since $\pi(m+1)$ will be found blocked when forming ${\cal G}^{v}(m+1,\pi,\sigma_{v})$ and a new non-edge has to be found. Hence,

[TABLE]

For the fourth term in (LABEL:65) we apply the bound

[TABLE]

Finally, for the last term, similarly as for the third, if both $v\notin{\mathcal{E}}_{n}(\pi(m+1))$ and $\pi(m+1)\not\in L^{v}(m,\pi,\sigma_{v})$ , it is easy to see that $M^{v}(m+1,\pi,\sigma_{v})=M^{v}(m,\pi,\sigma_{v})$ ; indeed, under these conditions, the set of vertices adjacent to $v$ does not change with the addition of edge $m+1$ , and moreover, $L^{v}(m+1,\pi,\sigma_{v})=L^{v}(m,\pi,\sigma_{v})$ , which implies $N^{v}(m+1,\pi,\sigma_{v})=N^{v}(m,\pi,\sigma_{v})$ , so that $M^{v}(m+1,\pi,\sigma_{v})=M^{v}(m,\pi,\sigma_{v})$ . Hence, if $M^{v}(m,\pi,\sigma_{v})\neq M^{v}(m+1,\pi,\sigma_{v})$ , we must either have $v\in{\mathcal{E}}_{n}(\pi(m+1))$ or $\pi(m+1)\in L^{v}(m,\pi,\sigma_{v})$ .

If $v\in{\mathcal{E}}_{n}(\pi(m+1))$ , then $v$ has one more neighbour in ${\mathcal{G}}(m+1,\pi)$ than in ${\mathcal{G}}(m,\pi)$ , and so $L^{v}(m+1,\pi,\sigma_{v})$ will contain one more edge than $L^{v}(m,\pi,\sigma_{v})$ . In this case, $M^{v}(m,\pi,\sigma_{v})$ and $M^{v}(m+1,\pi,\sigma_{v})$ can differ by at most three elements. Indeed, they may only differ by the additional neighbour in ${\mathcal{G}}(m+1,\pi)$ , and by at most two existing neighbours of $v$ in ${\mathcal{G}}(m,\pi)$ which were not assigned an edge in $L^{v}(m,\pi,\sigma_{v})$ , but were so assigned in $L^{v}(m+1,\pi,\sigma_{v})$ .

If $\pi(m+1)\in L^{v}(m,\pi,\sigma_{v})$ , then $|L^{v}(m,\pi,\sigma_{v})\triangle L^{v}(m+1,\pi,\sigma_{v})|=2$ , so that $M^{v}(m,\pi,\sigma_{v})$ and $M^{v}(m+1,\pi,\sigma_{v})$ can differ by at most four elements; hence

[TABLE]

Now recalling (2.17), summing (LABEL:65) over $v\in[n]$ and noting that

[TABLE]

we obtain

[TABLE]

where

[TABLE]

For the first term,

[TABLE]

Note that each vertex has at most $n-1$ potential edges available where the new edge $\pi(m+1)$ can be placed. Hence, since $|N^{1}(m,\pi,\sigma_{1})|\leq 2d_{1}(m,\pi)$, there are at most $2d_{1}(m,\pi)(n-1)$ potential edges with one end in $N^{1}(m,\pi,\sigma_{1})$, and so

[TABLE]

Noting that $|N^{1}(m,\pi,\sigma_{v})\cap{\mathcal{E}}_{n}(\pi(m+1))|$ is bounded by $2\mathop{{}\mathrm{I}}[N^{1}(m,\pi,\sigma_{v})\cap{\mathcal{E}}_{n}(\pi(m+1))\neq\emptyset]$ , recalling that $d_{1}(m,\pi)\sim\mathop{\mathrm{Hyp}}(N,m,n-1)$ and using Lemma 2.2, and also (2.6) of Condition (G1), which gives that $m\leq n^{3/2}$ as $\overline{c}\leq 1$ by (2.5), we therefore have

[TABLE]

Moreover, with ${\mathbb{P}}_{12}[\cdot]={\mathbb{P}}[\,\cdot\mid d_{1}(m,\pi),d_{2}(m,\pi),N^{1}(m,\pi,\sigma_{1}),N^{2}(m,\pi,\sigma_{2})]$,

[TABLE]

since there are at most $2d_{1}(m,\pi)\times 2d_{2}(m,\pi)$ potential edges with one end in $N^{1}(m,\pi,\sigma_{1})$ and the other end in $N^{2}(m,\pi,\sigma_{2})$ . Hence, again using $m\leq n^{3/2}$ and Lemma 2.2, and also Cauchy-Schwarz, we obtain

[TABLE]

so that (2.23) results in the bound

[TABLE]

Next, we have

[TABLE]

To calculate the first probability, we condition on $\pi$ and average over $\sigma_{1}$. If $1\in{\cal E}_{n}(\pi(m+1))$, then the conditional probability vanishes, as no edge is relocated to a pair containing the removed vertex $1$. Hence, take $\pi$ such that $1\not\in{\cal E}_{n}(\pi(m+1))$. To compute ${\mathbb{P}}[\pi(m+1)\in L^{1}(m,\pi,\sigma_{1})|\pi]$, note that there are $N-m$ non-edges of ${\mathcal{G}}(m,\pi)$, out of which $n-1-d_{1}(m,\pi)$ involve vertex $1$ and can therefore not be used during the redistribution of the $d_{1}(m,\pi)$ edges incident to vertex $1$, which is to be removed. This leaves $N-m-n+1+d_{1}(m,\pi)$ potential edges from which to draw our sample of $d_{1}(m,\pi)$ non-edges. By uniformity, the probability that $\pi(m+1)$ is in this sample is given by

[TABLE]

as we only ask for the probability that one special object is included in a simple random sample of $d_{1}(m,\pi)$ objects from a population of size $N-m-n+1+d_{1}(m,\pi)$ , and where in the final inequality we have used (2.6) of Condition (G1). Averaging over $\pi$ , for the first term in (2.25) we obtain the bound

[TABLE]

Next, as the events $\pi(m+1)\in L^{1}(m,\pi,\sigma_{1})$ and $\pi(m+1)\in L^{2}(m,\pi,\sigma_{2})$ are conditionally independent given $\pi$ , we may handle the second, off diagonal term of (2.25) by using Lemma 2.2 to give that

[TABLE]

which, recalling (2.26), results in the bound

[TABLE]

Thus, using (2.25), (2.27) and the inequality directly above, we obtain

[TABLE]

Finally, in order to bound $R_{g,2,3}$, note that the double sum is simply twice the sum over all the vertices of edges in ${\mathcal{G}}(m,\pi)$. Note also that, as $w$ must have degree at least one to be included in the sum, $I_{w,1}(m,\pi)\neq I_{w,1}(m+1,\pi)$ only if $w$ has degree 1 in ${\mathcal{G}}(m,\pi)$ and it receives the additional edge $\pi(m+1)$. Thus, since the additional edge has two endpoints, it is immediate that $R_{g,2,3}$ can be no more than $4$, so that

[TABLE]

Recalling (2.22) and applying (2.24), (2.29) and (2.30) yields

[TABLE]

Now, by Lemma 2.6, we have $\mu_{n,m}/n\leq\exp(-2m/n)$ , and since $x\exp(-2x)$ remains bounded on the positive real numbers, it follows that $m\mu_{n,m}/n^{2}$ is bounded; hence,

[TABLE]

Bounding $\boldsymbol{R_{f,2}}$ .

Using the same arguments as those used for $R_{g,2}$ to reach (2.17), we can show that

[TABLE]

Adding and subtracting $I_{v}(m+1,\pi)B_{v}(m,\pi,\sigma_{v})$ , and splitting the sum, we obtain

[TABLE]

where

[TABLE]

In order to bound $R_{f,2,1}$ , note first that $I_{v}(m,\pi)-I_{v}(m+1,\pi)$ is non-zero, and in that case equals one, exactly when vertex $v$ is isolated in ${\mathcal{G}}(m,\pi)$ and the $(m+1)^{\mathrm{th}}$ added edge is incident on $v$ ; that is,

[TABLE]

And since $I_{v}(m,\pi)=1$ implies $B_{v}(m,\pi,\sigma_{v})=1$ , we have

[TABLE]

Squaring, taking expectation and using exchangeability, we obtain

[TABLE]

For the first term, we have

[TABLE]

while for the second term

[TABLE]

where we have used that $\mathop{\mathrm{Hyp}}(N,m,n)(\{0\})$ , the probability that a hypergeometric variable with the given parameters takes the value 0, is a decreasing function of the number of special items $n$ . Hence,

[TABLE]

In order to handle $R_{f,2,2}$ , note that if $I_{v}(m+1,\pi)=1$ , we necessarily have $I_{v}(m,\pi)=1$ , so that $B_{v}(m,\pi,\sigma_{v})=B_{v}(m+1,\pi,\sigma_{v})=1$ whenever $I_{v}(m+1,\pi)=1$ ; it follows that

[TABLE]

Therefore,

[TABLE]

Combining the bounds (2.14), (2.15), (2.31) and (2.32) as in (2.11), and then recalling (2.10), we obtain

[TABLE]

Recalling (1.21) and noting that $\sigma_{n,m}^{2}\leq\mu_{n,m}$ by Lemma 2.5, the first condition in (1.7) holds, as

[TABLE]

Next, it clearly suffices to verify the second condition in (1.7) of (G3) with $D$ replaced by its absolute upper bound

[TABLE]

obtained in (2.13), and splitting the resulting expression to be bounded into two terms, we have

[TABLE]

Now, let $a\geq 1$ . Using the given form (2.8) of $G$ , we obtain

[TABLE]

where, for the final inequality, we used that $\overline{D}=1/\sigma_{n,m}$ when $I_{\mathcal{V}}=1$ and that $\mathop{{}{\mathbb{E}}}\mathopen{}I_{\mathcal{V}}=\mu_{n,m}/n$ on the first summand, and Lemma 2.2 on the second summand. Setting $a=2$ we obtain the bound

[TABLE]

on the first term of (2.34).

The second term in (2.34) likewise leads to two terms, corresponding to the two in the second line of (2.35), but with an additional factor of $|W|$ . Now setting $a=2$ , for the first we have, by applying Cauchy-Schwarz,

[TABLE]

Conditional on vertex ${\mathcal{V}}$ being isolated, the distribution of the number of isolated vertices in the $\mathop{\mathrm{ER}}(n,m)$ model is one more than the number of isolated vertices in the $\mathop{\mathrm{ER}}(n-1,m)$ model. Hence, writing

[TABLE]

and using $(x+y)^{2}\leq 2(x^{2}+y^{2})$ twice, we obtain

[TABLE]

Lemma 2.9 yields that the first term is bounded by a constant. For the second term, by removing all edges from the $n^{\mathrm{th}}$ vertex and relocating them among the remaining vertices, we have a coupling of $\mathop{\mathrm{ER}}(n,m)$ and $\mathop{\mathrm{ER}}(n-1,m)$ which yields $|Y_{n-1,m}-Y_{n,m}|\leq 1+2d_{n}$ , so that

[TABLE]

Using that $r_{n,m}$ in (1.21) is lower bounded by $\overline{r}$ , which is at least 1 by Lemma 2.8, and that $\mu_{n,m}\geq\sigma_{n,m}^{2}$ by Lemma 2.5 yields $\sigma_{n,m}\geq(1+(m/n)^{2})$ , and using also (2.37), we conclude that

[TABLE]

For the corresponding second term of (2.35), with $a=2$ and the additional factor of $|W|$ , using Cauchy-Schwarz and $\mathop{{}{\mathbb{E}}}\mathopen{}W^{2}=1$ ,

[TABLE]

applying Lemma 2.2. Combining with (2.36) and (2.38) we see the sum is of the order of (2.39) and it follows that

[TABLE]

Condition (G4).

Let $(n,m)\in\raisebox{-0.6458pt}{\Smiley}$ , and define

[TABLE]

the $\sigma$ -algebra generated by the identity of the vertex chosen to be removed in the coupling and its degree. Letting $\overline{G}=|G|$ , and $\overline{D}$ be as in (2.33), we see that both are clearly ${\mathcal{F}}_{n,m}$ -measurable.

For the first condition in (1.8), let

[TABLE]

Recall (2.5) and (2.6); in particular, on \Smiley, we have $n\geq 344$ and $28\leq m\leq n^{3/2}$ . It is straightforward to check that under these conditions,

[TABLE]

Indeed, for $m\leq n$, the bound follows using that $2\log m\leq(1/4-4/344)m$ for $m\geq 28$, while for $n\leq m$ one verifies, for $n\geq 344$, that $4\sqrt{n}+2\log n\leq n/4$.
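The two elementary inequalities quoted here can be checked numerically. The sketch below, with hypothetical function names, evaluates both in floating point. Each difference between the right and left hand sides is increasing beyond the stated threshold, so a check at and just below the threshold, together with a scan over a finite range, is a reasonable sanity test rather than a proof.

```python
from math import log, sqrt


def holds_small_m(m):
    """Check 2 log m <= (1/4 - 4/344) m."""
    return 2 * log(m) <= (0.25 - 4 / 344) * m


def holds_large_m(n):
    """Check 4 sqrt(n) + 2 log n <= n / 4."""
    return 4 * sqrt(n) + 2 * log(n) <= n / 4
```

Both thresholds are essentially tight: the first inequality fails at $m=27$ and the second fails at $n=343$.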

Now, bounding $D$ by $\overline{D}$ as given in (2.33), writing $F$ as short for $F_{n,m,1}$ and using that ${\mathbb{P}}[I_{\mathcal{V}}=0]=1-\mu_{n,m}/n$ in the final inequality, we obtain

[TABLE]

Since ${\mathcal{V}}$ cannot be both isolated and have positive degree, we have $I_{\mathcal{V}}(1-I_{F})=0$ almost surely, and so the first term is zero. Applying Cauchy-Schwarz to the second term and then invoking Lemma 2.2,

[TABLE]

By Lemma 2.2 with $\gamma=2m/n$ being the mean of $d_{1}(m,\pi)$, we have for any $t>\gamma$ that

[TABLE]

trivially, the final expression upper bounds the left hand side for $t\leq\gamma$ as well and hence holds for all $t\geq 0$. Hence, with $\underline{t}(n,m)$ as in (2.42), by (2.43) and recalling $r_{n,m}$ in (1.21), we obtain

[TABLE]

where we have used that $\sigma^{2}_{n,m}\leq\min\{\mu_{n,m},2m\}$ via Lemma 2.5, and trivially $\mu_{n,m}\leq n$ , for the second inequality, thus showing the first condition in (1.8) is satisfied.

From (2.35) with $a=2$ it follows that

[TABLE]

thus showing that the second condition in (1.8) is also satisfied.

Condition (G5).

Denote by ${\mathcal{G}}^{{\rm emb},{\mathcal{V}}}$ the “embedded” graph obtained by removing vertex ${\mathcal{V}}$ and all its incident edges; we keep the original vertex labeling. As the remaining $m-d_{\mathcal{V}}(m,\pi)$ edges are uniformly distributed over the remaining $n-1$ vertices, conditional on ${\cal F}_{n,m}$ in (2.40), the resulting graph has conditional distribution

[TABLE]

almost surely; this identity is again to be understood up to labeling. In particular, letting $d^{{\rm emb},{\mathcal{V}}}_{w}$ be the degree of vertex $w$ in graph ${\mathcal{G}}^{{\rm emb},{\mathcal{V}}}$ ,

[TABLE]

is the number of isolated vertices of ${\mathcal{G}}^{{\rm emb},{\mathcal{V}}}$ , and (2.45) implies

[TABLE]

Clearly $\Psi$ is ${\mathcal{F}}_{n,m}$-measurable. Now set $F_{n,m,2}=F_{n,m,1}$ as in (2.41), which is also clearly ${\mathcal{F}}_{n,m}$-measurable. Condition (1.10) is clearly equivalent to the first condition in (1.8), which was verified in (LABEL:87).

Condition (G6).

Let

[TABLE]

which is clearly ${\mathcal{F}}_{n,m}$ -measurable. Moreover, $\sigma_{n,m}^{-1}|Y-V|\leq\overline{B}$ since removing any edge connected to vertex ${\mathcal{V}}$ can make at most one vertex, other than ${\mathcal{V}}$ , isolated; the additional term of one accounts for the case when vertex ${\mathcal{V}}$ is isolated. Since $\overline{B}\leq\overline{D}$ , as given in (2.33), by setting $a=3$ in (2.35) we obtain

[TABLE]

As $\sigma_{n,m}^{2}\leq\mu_{n,m}$ via Lemma 2.5, the second bound in (1.11) holds.

Condition (G7).

We verify the stronger conditions that (1.12) and the second bound of (1.13) hold when taking the larger supremum obtained when removing the intersection with $\{\Psi\in\raisebox{-0.6458pt}{\Smiley}\}$ . This stronger version of (1.12) is an immediate consequence of Lemma 2.9. As this same lemma shows that the ratios in (1.13) involving means and variances are bounded by a constant, it is only required to bound the ratios of the remaining factor. For $r_{n,m}/r_{n-1,m-d}$ , we have

[TABLE]

and for the reciprocal, using that $m/n\leq 2(m-d)/(n-1)$ for $d\leq m/4$ ,

[TABLE]

Conditions (G1)–(G7) have been verified, and Theorem 1.3 now follows from Theorem 1.1. ∎

2.2 Technical results

Lemma 2.1 (Efron-Stein-type variance bound).

Let $\pi$ and the components of $\Sigma=(\sigma_{1},\dots,\sigma_{n})$ be independent uniform random permutations of $[N]$ , and let $h(\pi,\Sigma)$ be a real-valued function. Let $\tau_{1},\dots,\tau_{N-1}$ be random transpositions independent of each other and of $(\pi,\Sigma)$ , where $\tau_{j}$ transposes $j$ and a uniformly chosen integer in the set $\{j,\ldots,N\}$ . Let $\Sigma^{\prime}=(\sigma_{1}^{\prime},\ldots,\sigma_{n}^{\prime})$ be an independent copy of $\Sigma$ and let $\Sigma^{\prime}_{i}=(\sigma_{1},\dots,\sigma_{i-1},\sigma_{i}^{\prime},\sigma_{i+1},\ldots,\sigma_{n})$ . Then

[TABLE]

Proof.

Without loss of generality assume $\mathop{{}{\mathbb{E}}}\mathopen{}h(\pi,\Sigma)=0$ . Let $\pi_{0}=\pi$ and $\Sigma_{0}=\Sigma$ , and let

[TABLE]

and

[TABLE]

Let $B$ be uniform on $\{0,1\}$ , let $I$ be uniform on $\{1,\dots,n\}$ , let $J$ be uniform on $\{1,\dots,N-1\}$ , and assume $B$ , $I$ and $J$ are mutually independent and independent of all else. Let $W=h(\pi_{0},\Sigma_{0})$ , let $W_{1,i}^{\prime}=h(\pi_{0},\Sigma_{i}^{\prime})$ , and let $W_{2,j}^{\prime}=h(\pi_{0}\tau_{j},\Sigma_{0})$ , and $W^{\prime}=BW_{1,I}^{\prime}+(1-B)W_{2,J}^{\prime}$ . Let $G_{1,i}=n\bigl{(}h(\pi_{N-1},\Sigma_{i})-h(\pi_{N-1},\Sigma_{i-1})\bigr{)}$ , let $G_{2,j}=(N-1)\bigl{(}h(\pi_{j},\Sigma_{0})-h(\pi_{j-1},\Sigma_{0})\bigr{)}$ , and let $G=BG_{1,I}+(1-B)G_{2,J}$ . Let $g:{\mathbb{R}}\to{\mathbb{R}}$ be any bounded measurable function. Then, on the one hand,

[TABLE]

where we used that $(\pi_{N-1},\Sigma_{n})$ is equal in distribution to and independent of $(\pi_{0},\Sigma_{0})$ ; this follows e.g. from Algorithm P of (Knuth,, 1969, p. 147) since the distribution of $\pi_{N-1}$ is uniform conditionally on $\pi_{0}$ , and therefore independent of $\pi_{0}$ .
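The uniformity claim behind this step can be illustrated by exhaustive enumeration for small $N$. Multiplying a permutation on the right by a transposition swaps two positions of its one-line notation, so composing $\tau_{1},\ldots,\tau_{N-1}$ is the familiar in-place shuffle. The sketch below, with a hypothetical function name, lists the result of every possible sequence of choices. Each choice sequence is equally likely, so uniformity follows if every permutation appears exactly once.

```python
from itertools import product


def all_shuffles(N):
    """Apply tau_1, ..., tau_{N-1} to the identity for every possible choice sequence.

    tau_j swaps position j with a position k_j chosen from {j, ..., N}.
    """
    results = []
    ranges = [range(j, N + 1) for j in range(1, N)]
    for choices in product(*ranges):
        arr = list(range(1, N + 1))
        for j, k in zip(range(1, N), choices):
            arr[j - 1], arr[k - 1] = arr[k - 1], arr[j - 1]
        results.append(tuple(arr))
    return results
```

Starting from any fixed $\pi_{0}$ instead of the identity only relabels the outputs, which is why the law of the resulting permutation does not depend on $\pi_{0}$.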

On the other hand, for all $i$ we have $(\Sigma_{i},\Sigma_{i-1},\Sigma^{\prime}_{i})=_{d}(\Sigma_{i-1},\Sigma_{i},\Sigma_{0})$ since $\Sigma=_{d}\Sigma_{i}^{\prime}$ , and for all $j$ that $(\pi_{j},\pi_{j-1},\pi_{0}\tau_{j})=_{d}(\pi_{j-1},\pi_{j},\pi_{0})$ , by recalling the definition of $\pi_{j}$ and observing that $\pi_{0}$ and $\pi_{0}\tau_{j}$ have the same distribution, and that both are independent of $\tau_{j-1}\cdots\tau_{1}$ , so

[TABLE]

Therefore, $(W,W^{\prime},G)$ is a Stein coupling. Specializing (1.2) to the case $f(x)=x$ , applying the Cauchy-Schwarz inequality, and noting that $(\Sigma_{i},\Sigma_{i-1})=_{d}(\Sigma_{i}^{\prime},\Sigma)$ , we have

[TABLE]

from which the claim follows. ∎
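The Stein coupling identity $\mathbb{E}\{Gf(W^{\prime})-Gf(W)\}=\mathbb{E}\{Wf(W)\}$ , and its special case $f(x)=x$ used above, can be verified exactly on a toy example. The sketch below is illustrative only and is not the coupling of this lemma: it assumes $W$ is an unnormalized sum of $n$ independent signs, $W^{\prime}$ resamples one uniformly chosen coordinate, $\lambda=1/n$ and $G=(W^{\prime}-W)/(2\lambda)$ . The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def stein_pair_expectations(n, f):
    """Return (E[G (f(W') - f(W))], E[W f(W)]) by exact enumeration.

    Toy model: W is the sum of n independent +/-1 signs, and W' replaces a
    uniformly chosen coordinate by an independent copy, so lambda = 1/n.
    """
    lam = Fraction(1, n)
    lhs = Fraction(0)
    rhs = Fraction(0)
    outcomes = 0
    for x in product((-1, 1), repeat=n):
        w = sum(x)
        for i in range(n):          # coordinate that is resampled
            for new in (-1, 1):     # value of the independent copy
                w_prime = w - x[i] + new
                g = (w_prime - w) / (2 * lam)
                lhs += g * (f(w_prime) - f(w))
                rhs += w * f(w)
                outcomes += 1
    return lhs / outcomes, rhs / outcomes


lhs, rhs = stein_pair_expectations(4, lambda w: w)
print(lhs, rhs)  # with f(x) = x both sides equal Var(W)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.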

Lemma 2.2 (Tail and moment bounds for the hypergeometric distribution).

Let $H$ have the hypergeometric distribution $\mathop{\mathrm{Hyp}}(N,m,n)$ counting the number of white balls among $m$ draws from an urn with $N$ balls, $n$ of which are white and $N-n$ black. Let $\gamma=\mathop{{}{\mathbb{E}}}\mathopen{}H=nm/N$ . Then, for any $t>0$ ,

[TABLE]

Moreover, for any $k\geq 1$ , there is a constant $C_{k}$ independent of $\gamma$ such that

[TABLE]

Proof.

To construct a bounded size bias coupling, index the white balls by $[n]$ , and write $H=\sum_{i=1}^{n}I_{i}$ where $I_{i}$ is the indicator that the $i^{\mathrm{th}}$ white ball is sampled. Construct $H^{s}$ with the $H$ -size biased distribution by uniformly sampling a random index $J$ from $1$ to $n$ independently of $I_{1},\ldots,I_{n}$ ; if $I_{J}=1$ , set $H^{s}=H$ , otherwise independently and uniformly select a ball from the sample and swap it with the $J^{\mathrm{th}}$ white ball. It is easy to see that $H^{s}$ has the size-bias distribution; see, for instance, Lemma 2.1 of Goldstein and Rinott (1996). Moreover, $H^{s}=H+1$ if a sampled black ball was swapped with the $J^{\mathrm{th}}$ white ball, and $H^{s}=H$ otherwise. Hence, $|H^{s}-H|\leq 1$ , and the tail bound (2.46) follows readily from Theorem 1.1 of Ghosh and Goldstein (2011).

Now, it is straightforward to check that $t^{2}/(2\gamma+t)\geq(t-1)/(\gamma+1)$ whenever $t\geq 1$ and $\gamma>0$ , so that

[TABLE]

Hence, $H-\gamma-1$ is stochastically dominated by an exponential random variable $X$ with rate $1/(\gamma+1)$ , that is, with mean $\gamma+1$ , and in particular

[TABLE]

from which the second claim easily follows. ∎
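As a numerical sanity check of the tail bound, the sketch below compares the exact upper tail of a hypergeometric variable with $\exp(-t^{2}/(2\gamma+t))$ . It assumes that (2.46) is the upper-tail bound with this exponent, as suggested by the inequality used in the proof; the function names are hypothetical and the parameter order follows $\mathrm{Hyp}(N,m,n)$ as above ( $m$ draws, $n$ white balls).

```python
from math import comb, exp


def hyp_pmf(N, m, n, k):
    """P[H = k] for H ~ Hyp(N, m, n): m draws from N balls, n of them white."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)


def upper_tail(N, m, n, t):
    """Exact value of P[H - gamma >= t], where gamma = n m / N."""
    gamma = n * m / N
    return sum(
        hyp_pmf(N, m, n, k)
        for k in range(min(m, n) + 1)
        if k - gamma >= t
    )


def tail_bound(N, m, n, t):
    """Assumed form of the bound: exp(-t^2 / (2 gamma + t))."""
    gamma = n * m / N
    return exp(-t * t / (2 * gamma + t))


for t in (0.5, 1.0, 2.0, 4.0):
    print(t, upper_tail(50, 10, 20, t), tail_bound(50, 10, 20, t))
```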

A bound similar to (2.46) can be obtained from (Greene and Wellner, 2017, Corollary 1) with better constants, but under additional conditions on the parameters of the hypergeometric distribution.

Lemma 2.3.

If $H\sim\mathop{\mathrm{Hyp}}(N,m,n)$ , then

[TABLE]

and

[TABLE]

where the lower bound on ${\mathbb{P}}[H=0]$ is valid whenever $m+n-1<N$ .

Proof.

Since ${\mathbb{P}}[H>0]\leq\mathop{{}{\mathbb{E}}}\mathopen{}H$ , the upper bound on ${\mathbb{P}}[H>0]$ immediately follows. Using the usual exponential upper bound for the final inequality,

[TABLE]

from which the upper bound on ${\mathbb{P}}[H=0]$ and first lower bound on ${\mathbb{P}}[H>0]$ follow. The second lower bound on ${\mathbb{P}}[H>0]$ follows from the first lower bound and the inequality $e^{-x}\leq 1-x+x^{2}/2$ when $x\geq 0$ . The lower bound on ${\mathbb{P}}[H=0]$ follows from the inequality $\log(1+x)\geq x/(1+x)$ for $x>-1$ and the lower bound in (2.47), which together yield

[TABLE]
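The elementary bounds appearing in this proof can be checked numerically. The sketch below assumes, reading off the proof text, that they are ${\mathbb{P}}[H>0]\leq\gamma$ , ${\mathbb{P}}[H=0]\leq e^{-\gamma}$ and ${\mathbb{P}}[H>0]\geq 1-e^{-\gamma}\geq\gamma-\gamma^{2}/2$ with $\gamma=nm/N$ ; the lower bound on ${\mathbb{P}}[H=0]$ is not checked, and the function names are hypothetical.

```python
from math import comb, exp


def prob_zero(N, m, n):
    """P[H = 0] for H ~ Hyp(N, m, n): none of the m draws is white."""
    return comb(N - n, m) / comb(N, m)


def check_lemma_bounds(N, m, n, tol=1e-12):
    """Check the three bounds assumed in the lead-in, up to rounding."""
    gamma = n * m / N
    p0 = prob_zero(N, m, n)
    positive = 1.0 - p0
    return (
        positive <= gamma + tol
        and p0 <= exp(-gamma) + tol
        and positive >= gamma - gamma * gamma / 2 - tol
    )


print(all(
    check_lemma_bounds(N, m, n)
    for N in range(1, 25)
    for m in range(N + 1)
    for n in range(N + 1)
))
```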

Lemma 2.4.

For any $x\geq 0$

[TABLE]

Proof.

The upper and lower bounds hold trivially at $x=0$ . With $\psi(x)=1-e^{-x}(1+x)$ , by Taylor’s expansion around zero, for all $x>0$ there exists $\xi_{x}\in(0,x)$ such that

[TABLE]

For $y\in[0,2]$ we have $|\psi^{\prime\prime}(y)|\leq|1-y|\leq 1$ , thus proving the upper bound $x^{2}/2$ over this interval. As $\psi^{\prime\prime\prime}(y)=e^{-y}(y-2)\geq 0$ for all $y\geq 2$ , the function $\psi^{\prime\prime}(y)$ is non-decreasing for $y\geq 2$ . As $\psi^{\prime\prime}(2)=-e^{-2}\in(-1,0)$ , and $\lim_{y\rightarrow\infty}\psi^{\prime\prime}(y)=0$ , we have $\psi^{\prime\prime}(y)\in(-1,0)$ for all $y\geq 2$ , thus proving the upper bound $x^{2}/2$ on $(2,\infty)$ . As $\psi^{\prime}(x)\geq 0$ for all $x\geq 0$ , the function is non-decreasing on $[0,\infty)$ , and as $\psi(x)\rightarrow 1$ as $x\rightarrow\infty$ , we have $\psi(x)\leq 1$ for all $x\geq 0$ .

For the lower bound, for $x>0$ let

[TABLE]

With $p(x)=e^{-x}(x^{2}+2x+2)-2$ we have $p(0)=0$ and $p^{\prime}(x)=-x^{2}e^{-x}\leq 0$ , so $p(x)\leq 0$ and $q(x)$ is decreasing for $x>0$ . In particular, $q(x)\geq q(1)=1-2e^{-1}\geq 1/4$ for $x\in(0,1]$ . As $\psi^{\prime}(x)=xe^{-x}$ , the function $\psi(x)$ is non-decreasing, and hence for $x\geq 1$ we have $\psi(x)\geq\psi(1)=1-2e^{-1}\geq 1/4$ , completing the proof of the lower bound. ∎
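The bounds established in this proof can be confirmed on a grid. The sketch below checks the inequalities as they appear in the argument, namely $\psi(x)\leq\min\{1,x^{2}/2\}$ , $\psi(x)\geq x^{2}/4$ on $(0,1]$ and $\psi(x)\geq 1/4$ for $x\geq 1$ ; combining the two lower bounds into $\min\{1,x^{2}\}/4$ is our shorthand and may differ from the displayed statement of the lemma.

```python
from math import exp


def psi(x):
    """psi(x) = 1 - exp(-x) (1 + x), as in the proof."""
    return 1.0 - exp(-x) * (1.0 + x)


def check_bounds(x):
    """Check min(1, x^2)/4 <= psi(x) <= min(1, x^2/2) at a single point."""
    upper_ok = psi(x) <= min(1.0, x * x / 2.0)
    lower_ok = psi(x) >= min(1.0, x * x) / 4.0
    return upper_ok and lower_ok


# Grid on [0.01, 20]; much smaller x would suffer from floating-point cancellation.
grid = [k / 100 for k in range(1, 2001)]
print(all(check_bounds(x) for x in grid))
```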

Lemma 2.5.

For all $(n,m)\in\Theta$ and distinct vertices $v$ and $w$ , the indicators $\mathop{{}\mathrm{I}}[d_{w}=0]$ and $\mathop{{}\mathrm{I}}[d_{v}=0]$ that $v$ and $w$ are isolated are negatively correlated, that is,

[TABLE]

Proof.

Vertex $v$ is isolated if and only if none of the $n-1$ edges that connect $v$ to another vertex is included in the set of $m$ edges selected. Likewise, distinct vertices $v$ and $w$ are both isolated if and only if none of a particular set of $(n-2)+(n-2)+1$ edges is selected. Hence, the first claim is equivalent to

[TABLE]

Expanding the binomial coefficients and canceling common factors yields the equivalent form

[TABLE]

where $(n)_{k}=n(n-1)\cdots(n-k+1)$ , and pairing up the $k^{\mathrm{th}}$ factors of the falling factorials we obtain

[TABLE]

It suffices to show that the inequality holds factor by factor. Expanding the $k^{\mathrm{th}}$ factor on each side and simplifying yields

[TABLE]

The case $k=0$ implies all others, and reduces to $0\leq n^{2}-3n+2=(n-2)(n-1),$ and so holds for all $n\geq 2$ , thus proving the first claim.

Since the indicators of vertices being isolated are negatively correlated, we have

[TABLE]

from which $\sigma_{n,m}^{2}\leq\mu_{n,m}$ is immediate. As $d_{v}\sim\mathop{\mathrm{Hyp}}(N,m,n-1)$ for $N=n(n-1)/2$ , using Lemma 2.3 we have

[TABLE]

as claimed. ∎
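The negative correlation of the isolation indicators can be verified exactly for small graphs, using the edge counts from the proof: vertex $v$ is isolated when none of its $n-1$ potential edges is chosen, and $v$ and $w$ are both isolated when none of $2n-3$ potential edges is chosen. The sketch below is illustrative; the function name is hypothetical, and it ranges over all $0\leq m\leq N$ rather than over the set $\Theta$ , whose definition is not reproduced here.

```python
from fractions import Fraction
from math import comb


def isolation_probabilities(n, m):
    """Return (P[v isolated], P[v and w both isolated]) with m of N edges chosen."""
    N = n * (n - 1) // 2
    one = Fraction(comb(N - (n - 1), m), comb(N, m))
    both = Fraction(comb(N - (2 * n - 3), m), comb(N, m))
    return one, both


def negatively_correlated(n, m):
    """True when P[both isolated] <= P[v isolated] * P[w isolated]."""
    one, both = isolation_probabilities(n, m)
    return both <= one * one


print(all(
    negatively_correlated(n, m)
    for n in range(2, 10)
    for m in range(n * (n - 1) // 2 + 1)
))
```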

Lemma 2.6.

For $n\geq 6$ and $0\leq m\leq n^{2}/4-3n/2$ , we have

[TABLE]

and

[TABLE]

Proof.

Since the distribution of each individual degree is $\mathop{\mathrm{Hyp}}(N,m,n-1)$ , and as the hypothesis of Lemma 2.3 holds due to the restriction assumed on $m$ , it follows from that lemma that

[TABLE]

yielding the upper bound in (2.48). Since under the assertions on $m$ and $n$ we have

[TABLE]

it follows that

[TABLE]

from which we obtain the lower bound in (2.48).

In order to prove the upper and lower bounds on the variance, we use the fact that $\mathop{\mathrm{Var}}\nolimits(W)=\mathop{{}{\mathbb{E}}}\mathopen{}\{G(W^{\prime}-W)\}$ when $(W,W^{\prime},G)$ is a Stein coupling for a mean zero random variable $W$ ; this identity follows immediately upon setting $f(x)=x$ in (1.2). Now recall (2.8), (2.9) and (2.20), and that $N^{v}(m,\pi,\sigma_{v})$ in (2.18) is the set of vertices that receive at least one edge when forming ${\mathcal{G}}^{v}(m,\pi,\sigma_{v})$ , and that $M^{v}(m,\pi,\sigma_{v})$ in (2.19) is the set of all vertices $w\not=v$ such that $\{v,w\}$ is an edge in ${\mathcal{G}}(m,\pi)$ and such that $w$ does not receive a redistributed edge. As the sets $N^{v}(m,\pi,\sigma_{v})$ and $M^{v}(m,\pi,\sigma_{v})$ are empty when $I_{v}(m,\pi)=1$ , and recalling that $I_{w,1}(m,\pi)=\mathop{{}\mathrm{I}}[d_{w}(m,\pi)=1]$ , we have

[TABLE]

Now consider the first sum in (LABEL:94). Note that when $d_{1}(m,\pi)=k$ , of the potential $N$ edges, $n-1$ have vertex 1 as an endpoint, and an additional $m-k$ edges remain in ${\mathcal{G}}(m,\pi)$ and are not redistributed. Hence,

[TABLE]

To arrive at the hypergeometric expression in the sum in the last equality from the conditional probability that vertex $2$ is incident on any of the $k$ redistributed edges that were removed from vertex $1$ when making the new graph, note that the total number of edges available is reduced from $N$ first by $n-1$ , as vertex $1$ has been removed, and also due to the $m-k$ edges that were part of the original graph that are not changed. Of these remaining edges, $n-2$ are incident on vertex $2$ , which is one fewer than their original number of $n-1$ , due to the removal of vertex $1$ .

Using Lemma 2.3,

[TABLE]

from which we obtain the upper bound

[TABLE]

Given $d_{2}(m,\pi)=0$ , we have $d_{1}(m,\pi)\sim\mathop{\mathrm{Hyp}}(N-(n-1),m,n-2)$ , hence

[TABLE]

and so,

[TABLE]

Similarly, using the second moment expression from (2.28)

[TABLE]

and so from (LABEL:96) we obtain the lower bound

[TABLE]

Now, for the first term in the brackets we have

[TABLE]

where we have used (2.50) for the last inequality. For the second term in the brackets,

[TABLE]

where again we have used (2.50) for the last inequality. Hence, together with the upper bound (2.54), we arrive at

[TABLE]

Now considering the second sum in (LABEL:94), we can write

[TABLE]

where $M^{c,1}(m,\pi,\sigma_{1})=\{w:\{w,1\}\in{\mathcal{G}}(m,\pi),w\in N^{1}(m,\pi,\sigma_{1})\}$ . Taking expectation of the first sum on the right hand side of (2.56) and noting that the distributions of the degrees in the graph are hypergeometric, we obtain that

[TABLE]

From this equality and using the assertions on $m$ and $n$ , we obtain

[TABLE]

Now taking expectation of the second sum of (2.56),

[TABLE]

We arrive at the first hypergeometric expression in the sum in the last equality by the same reasoning as that given following (LABEL:95); the remaining two expressions in the sum follow by similar, and simpler, means.

Now, for the first and last terms, using Lemma 2.3 for the upper bound, we have

[TABLE]

and thus, using in the final inequality that $n^{2}\leq 4(N-m-n+2)$ , which holds via the assumption that $m\leq n^{2}/4-3n/2$ , and that $n^{2}\leq 4(N-n)$ , which holds as $n\geq 6$ , true by assumption, we obtain

[TABLE]

Using the estimates from (2.57) and (LABEL:101) in the difference (2.56), and then applying that result and (LABEL:98) in (LABEL:94) yields the claim. ∎
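The two elementary inequalities invoked in the final step, $n^{2}\leq 4(N-m-n+2)$ and $n^{2}\leq 4(N-n)$ with $N=n(n-1)/2$ , can be confirmed by direct enumeration over the range of the lemma. This is a small sketch with a hypothetical function name; the range of $n$ checked is arbitrary.

```python
def edge_count_inequalities_hold(n, m):
    """Check n^2 <= 4(N - m - n + 2) and n^2 <= 4(N - n) for N = n(n-1)/2."""
    N = n * (n - 1) // 2
    return n * n <= 4 * (N - m - n + 2) and n * n <= 4 * (N - n)


def admissible_m(n):
    """Integers m with 0 <= m <= n^2/4 - 3n/2, written as 4m <= n^2 - 6n."""
    return range((n * n - 6 * n) // 4 + 1)


print(all(
    edge_count_inequalities_hold(n, m)
    for n in range(6, 80)
    for m in admissible_m(n)
))
```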

Lemma 2.7.

There exist universal integers $m_{0}$ and $n_{0}$ , and positive constants $C_{0}$ and $c_{0}$ such that, whenever

[TABLE]

we have

[TABLE]

and

[TABLE]

where

[TABLE]

Proof.

It is easy to verify that

[TABLE]

Hence, with the first inequality in (2.59) holding with $n_{0}$ replaced by 27, and taking $c_{0}\leq 1$ , Lemma 2.6 can be invoked to yield

[TABLE]

from which (2.60) now follows for any $C_{0}\geq 8$ .

Turning to (2.61), we first show that the lower bound in (LABEL:92) is positive whenever $n\geq 78$ and $m\geq 78$ . Indeed, that lower bound is positive whenever

[TABLE]

which, recalling the upper bound (2.48), is implied whenever

[TABLE]

with

[TABLE]

Since (2.63) is equivalent to the inequality $y<e^{x}-x-1$ , which in turn is satisfied if $y\leq x^{2}/2$ , since $x^{2}/2<e^{x}-x-1$ , we arrive at the sufficient condition

[TABLE]

which is equivalent to $39\leq m(2-39/n)$ . This inequality holds whenever both $n\geq 78$ and $m\geq 78$ .

We now proceed to bound the ratio between the upper and lower bounds, say $\overline{\sigma}^{2}_{n,m}$ and $\underline{\sigma}^{2}_{n,m}$ , respectively, of (LABEL:92). Using the identity $(1-a)/(1-b)=1+(b-a)/(1-b)$ , we have

[TABLE]

We proceed to lower bound the denominator in (2.65). Letting $x$ and $y$ be as in (2.64), and applying the upper bound in (2.48), we may write

[TABLE]

If $2m/n\leq 1$ , we have $0\leq x\leq 1$ and thus $1-e^{-x}(1+x)\geq x^{2}/4$ from Lemma 2.4, so that

[TABLE]

when $\min(n,m)\geq 312$ . If $2m/n>1$ and so $x>1$ , we simply use the lower bound

[TABLE]

and for any positive $c_{0}$ we can take $n_{0}$ large enough so that

[TABLE]

Hence, writing $\mathop{{}\mathrm{O}}\mathopen{}(\cdot)$ with the understanding that the implied bound holds with universal constants, recalling (2.65), and using Lemma 2.6 to bound $\mu_{n,m}/n$ in its numerator, we have

[TABLE]

where both the $\mathop{{}\mathrm{O}}\mathopen{}(\cdot)$ terms are non-negative.

Next, with $\varphi(x)=e^{-x}(1-e^{-x}(1+x))$ , we show that

[TABLE]

Using (2.60) for the second equality, (2.48) for the third, then (2.60) again and the lower bound of Lemma 2.4 for the fourth, we obtain

[TABLE]

as $R_{3}=\mathop{{}\mathrm{O}}\mathopen{}(R_{2})$ . In the case $2m/n\leq 1$ , we have

[TABLE]

showing the first bound in (2.67). In the case $2m/n>1$ ,

[TABLE]

and using that $x\exp(-x)$ is bounded over $[0,\infty)$ ,

[TABLE]

Applying (2.62), the second bound in (2.67) is shown. Now, using that $\overline{\sigma}^{2}_{n,m}\geq\underline{\sigma}^{2}_{n,m}$ , and writing

[TABLE]

and observing that, because the implicit constants in the bounds (2.66) and (2.67) are universal, and using that the $\mathop{{}\mathrm{O}}\mathopen{}(\cdot)$ terms in (2.66) are non-negative, we can choose $c_{0}$ small enough and $m_{0}$ large enough to guarantee that $0\leq a<1$ and $-1<b<1$ , and hence obtain the upper and lower bounds

[TABLE]

from which the estimate (2.61) follows.

∎

Lemma 2.8.

Let $r_{n,m}$ be defined as in (1.21). For any integers $\overline{n}$ and $\overline{m}$ and any positive constant $\overline{c}>0$ , there exists $\overline{r}\geq 1$ such that $r_{n,m}>\overline{r}$ implies

[TABLE]

Proof.

We will show that $r_{n,m}\leq\overline{r}$ for $\overline{r}=\max\bigl{\{}\overline{n}^{1/2},(2\overline{m})^{3/2},1/\overline{c}^{2},1\bigr{\}}$ if (2.68) is violated. Indeed, if $n<\overline{n}$ , then by Lemma 2.5 and the bound $\mu_{n,m}\leq n$ we have

[TABLE]

Finally, if $m>\overline{c}n^{3/2}$ , then similarly

[TABLE]

Lemma 2.9.

Letting $\raisebox{-0.6458pt}{\Smiley}$ be as in Condition (G1), it holds that

[TABLE]

and

[TABLE]

Proof.

First, note that if $(n,m)\in\raisebox{-0.6458pt}{\Smiley}$ , then from (2.5) and (2.6) the conclusion of Lemma 2.7 holds. For the ratio of means, from Lemma 2.6, to upper bound $\mu_{n,m}/\mu_{n-1,m-d}$ it suffices to upper bound the ratio

[TABLE]

which is bounded by a constant via $m\leq c_{0}n^{3/2}$ , as in (2.59). Similarly, to upper bound $\mu_{n-1,m-d}/\mu_{n,m}$ it suffices to upper bound the ratio

[TABLE]

which, here using that $d\leq n/4$ , we see is also so bounded.

For the ratios of variances, for $0\leq d\leq\min\{n,m\}/4$ let

[TABLE]

let $\varphi(x)=e^{-x}(1-e^{-x}(1+x))$ , and write

[TABLE]

We show that these four terms, and their reciprocals, can be uniformly bounded over the range of the supremum in (2.69). Since (2.59) holds for $n$ and $m$ , we can apply Lemma 2.7, and also (2.5) for the first and final bounds, and obtain

[TABLE]

Next, since $m\geq 2m_{0}$ by (2.5) and $d\leq m/4$ , we have that $m-d\geq 3m/4\geq 3m_{0}/2\geq m_{0}$ . Since $n\geq 2n_{0}$ , again by (2.5), we have that $n-1\geq n_{0}$ , and since $m\leq(c_{0}/2)n^{3/2}$ by (2.5) and $(n/(n-1))^{3/2}\leq 2$ for $n\geq 3$ , we have that $m-d\leq m\leq c_{0}(n-1)^{3/2}$ . It follows that $(m-d,n-1)$ also satisfies the hypotheses of Lemma 2.7. Using the lower bound on $\overline{n}$ from (2.5), we have $1/(n-1)^{3}\leq 2/n^{3}$ , and also from (2.6) that $C_{0}(2/m+2m^{2}/n^{3})\leq 1/2$ , so also using $d\leq m/4$ for the second and second to last inequality,

[TABLE]

Hence, (2.70) and (2.71) imply that

[TABLE]

Clearly, $1\leq R_{3}\leq 2$ for $n\geq 2$ . Lastly,

[TABLE]

Note that by (2.6), by (2.5), which gives $\overline{c}\leq 1$ , and using $d\leq n/4$ ,

[TABLE]

It follows that $1/e^{y}$ remains bounded on $\raisebox{-0.6458pt}{\Smiley}$ , and therefore, to show $R_{4}$ is bounded it suffices to show that

[TABLE]

remains bounded. Using Lemma 2.4,

[TABLE]

But this ratio remains bounded from above, away from 1, as $d\leq m/4$ implies

[TABLE]

The reciprocal $1/R_{4}$ is bounded similarly, using that (2.72) shows that $e^{y}$ is bounded. ∎

3 Jack Measure on Tableaux

We now turn to the study of the distribution of the standardized sum of the $\alpha$ -contents over all boxes in a tableau whose shape is determined by the partition $\Lambda_{n}$ of $n$ , that is, to

[TABLE]

where

[TABLE]

and where the partition $\Lambda_{n}$ is sampled from the Jack$_{\alpha}$ measure in (1.22), as described in detail in the introduction; see (1.23) for an illustration of $c_{\alpha}(x)$ , where $x\in\Lambda_{7}$ .

Our bound is based on the zero bias construction in Fulman and Goldstein (2011), which itself depends on an exchangeable pair constructed using Kerov’s growth process, a sequential procedure for growing a random partition distributed according to Jack$_{\alpha}$ measure.

The state of Kerov’s growth process at times $n=1,2,\ldots$ is a partition of $n$ , starting at time 1 with the unique partition (1) of 1. To describe its transition rule from time $n-1$ to $n$ for $n\geq 2$ , given a box $x$ in the diagram of a partition $\Lambda_{n}$ of $n$ , let $a(x)$ denote the number of boxes in the same row of $x$ and to the right of $x$ (the “arm” of $x$ ), and let $l(x)$ denote the number of boxes in the same column of $x$ and below $x$ (the “leg” of $x$ ), as in (1.22). Now set

[TABLE]

and, for $\Lambda_{n-1}$ a partition of $n-1$ obtained from $\Lambda_{n}$ by removing a single corner box, let

[TABLE]

where $C_{\Lambda_{n}/\Lambda_{n-1}}$ is the union of columns of $\Lambda_{n}$ that intersect $\Lambda_{n}-\Lambda_{n-1}$ and $R_{\Lambda_{n}/\Lambda_{n-1}}$ is the union of rows of $\Lambda_{n}$ that intersect $\Lambda_{n}-\Lambda_{n-1}$ . If at stage $n-1$ the state of the process is the partition $\Lambda_{n-1}$ , a transition to the partition $\Lambda_{n}$ occurs with probability

[TABLE]

It is shown in Kerov (1994), see also Fulman (2006), that if $\Lambda_{n-1}$ is distributed according to Jack$_{\alpha}$ measure on partitions of $n-1$ , then the partition $\Lambda_{n}$ obtained by this process at time $n$ has the Jack$_{\alpha}$ distribution.

In the proof of Theorem 3.1 of Fulman and Goldstein (2011), a variable having the zero bias distribution of $W$ was constructed as follows. Fix $n$ and $\alpha$ and let $\Lambda_{k}$ be the state of Kerov’s growth process at time $k$ , and set

[TABLE]

Denoting by $c_{\alpha}(x_{n})$ the content of the box $x_{n}$ added at time $n$ to form $\Lambda_{n}$ , we can now write

[TABLE]

With $dF(t|\Lambda_{n-1})$ the conditional distribution of $T$ given $\Lambda_{n-1}$ , constructing the pair

[TABLE]

on the same space as $\Lambda_{n-1}$ , and letting $U\sim\mathcal{U}[0,1]$ be independent of $V,T^{\dagger}$ and $T^{\ddagger}$ , the variable

[TABLE]

has the $W$ -zero bias distribution. In fact, the joint distribution on the right hand side of (3.4) can be achieved by running Kerov’s growth process twice, conditionally independently given $\Lambda_{n-1}$ . As shown in Fulman and Goldstein (2011), the resulting variables, say $T^{\prime}$ and $T^{\prime\prime}$ , yield the crucial exchangeable Stein pair in (1.3) via (3.3). Again by Fulman and Goldstein (2011), neither the conditional mean nor the conditional variance of $T$ given $\Lambda_{n-1}$ depends on $\Lambda_{n-1}$ ; specifically,

[TABLE]

It is essentially for this reason that we may construct $W^{*}$ as in (3.5), using $V$ ; for details, see Fulman and Goldstein (2011).

Proof of Theorem 1.5.

We verify the conditions of Theorem 1.2.

Condition (Z1).

Fix an $\varepsilon\in(0,1)$ , suppressed in the notation, and let

[TABLE]

where

[TABLE]

which is positive and measurable. Note that

[TABLE]

which implies in particular that $n\geq 3$ if $(n,\alpha)\in\raisebox{-0.6458pt}{\Smiley}$ .

From Fulman (2004), the mean and variance of the content $Y$ of a tableau of a partition of $n$ under Jack$_{\alpha}$ measure are given, respectively, by

[TABLE]

In particular we have that $\mathop{\mathrm{Var}}\nolimits_{n,\alpha}Y>0$ for all $(n,\alpha)\in\raisebox{-0.6458pt}{\Smiley}$ .

Condition (Z2).

The variable $Y$ , given in (3.1) is easily seen to satisfy the needed conditions, and the construction of the zero bias variable $W^{*}$ is outlined above in (3.3), (3.4) and (3.5).

Condition (Z3).

From (3.3) and (3.5) we see that

[TABLE]

For each $({n,\alpha})\in\raisebox{-0.6458pt}{\Smiley}$ let ${\cal F}_{n,\alpha}$ be the trivial $\sigma$ -algebra $\{\emptyset,\Omega\}$ , let

[TABLE]

where $\lambda_{1}$ and $\lambda_{1}^{\prime}$ respectively denote the length of the first row and first column of the tableau $\Lambda_{n-1}$ produced by Kerov’s growth process at time $n-1$ . Clearly $\overline{D}$ is $\mathcal{F}_{n,\alpha}$ measurable.

We next argue that $|D|\leq\overline{D}$ on $F_{{n,\alpha},1}$ as follows. With $c_{\alpha}(x_{n}),c_{\alpha}(x_{n}^{\prime})$ and $c_{\alpha}(x_{n}^{\prime\prime})$ the contents of the boxes added to $\Lambda_{n-1}$ by Kerov’s growth process, all conditionally independent given $\Lambda_{n-1}$ , with probability one,

[TABLE]

as the extreme values $\alpha(\lambda_{1}+1)$ and $-(\lambda_{1}^{\prime}+1)$ are achieved, respectively, by adding a box at the end of the first row, and at the bottom of the first column. Scaling by $\sigma_{n,\alpha}$ in (3.10) to obtain $T$ , $T^{\prime}$ and $T^{\prime\prime}$ , respectively, with probability one

[TABLE]

Now note that by (3.4) the distribution of $(T^{\dagger},T^{\ddagger})$ is absolutely continuous with respect to that of $(T^{\prime},T^{\prime\prime})$ , and hence with probability one

[TABLE]

As $T^{*}$ is the convex combination $UT^{\dagger}+(1-U)T^{\ddagger}$ of $T^{\dagger},T^{\ddagger}$ , it too must lie in this same interval, and hence, as the length of the first column of $\Lambda_{n-1}$ can be no more than $n$ , we obtain

[TABLE]

In what follows, we think of $(n,\alpha)\in\raisebox{-0.6458pt}{\Smiley}$ as fixed and suppress the subscript in $\mathop{{}{\mathbb{E}}}\mathopen{}_{n,\alpha}$ . Turning to the moment conditions, we claim that

[TABLE]

Now,

[TABLE]

To bound the second moment of $T^{*}$ , by the zero bias formula (1.4) with $f(x)=x^{3}/3$ , and the proof of Theorem 4.1 in Fulman and Goldstein (2011), we obtain

[TABLE]

Hence, by (3.6),

[TABLE]

For the second term of (3.15), by (3.6), we obtain $\sqrt{\mathop{{}{\mathbb{E}}}\mathopen{}T^{2}}=\sqrt{2/n}$ , thus showing the first inequality in (3.14). The final inequality in (3.14) holds as $(n,\alpha)\in\raisebox{-0.6458pt}{\Smiley}$ implies $\alpha\geq n$ .

To verify the first condition in (1.16), apply the Cauchy-Schwarz inequality, (3.8) and (3.14) to obtain

[TABLE]

To control ${\mathbb{P}}[F_{n,\alpha,1}^{c}]$ , with $m=n-1$ , we apply the inequality

[TABLE]

from the proof of Lemma 6.6 in Fulman (2004). Using that $\alpha\geq n^{1+\varepsilon}\geq m^{1+\varepsilon}$ in the third inequality below, we obtain

[TABLE]

Substitution into (3.16) now verifies the first condition in (1.16).

For the second condition in (1.16), using (3.8), the Cauchy-Schwarz inequality, that $\mathop{{}{\mathbb{E}}}\mathopen{}W^{2}=1$ , (3.14) and (3.11) we obtain

[TABLE]

Condition (Z4).

For $(n,\alpha)\in\raisebox{-0.6458pt}{\Smiley}$ , let

[TABLE]

which is ${\cal F}_{n,\alpha}$ measurable, let $F_{n,\alpha,2}=\Omega$ , and let $V$ be as in (3.2). The conditional distribution condition (1.17) is satisfied for $V$ with $\theta=(n,\alpha)$ by the properties of Kerov’s growth process. Clearly the set $F_{n,\alpha,2}$ is measurable with respect to ${\cal F}_{n,\alpha}$ . The moment condition (1.18) is trivially satisfied, as $1-1_{F_{n,\alpha,2}}=0$ almost surely.

Condition (Z5).

By (3.1) and (3.2) we have that $(Y-V)/\sigma_{n,\alpha}=T$ as in (3.3), the scaled content $c_{\alpha}(x_{n})$ of the box $x_{n}$ added at time $n$ in Kerov’s growth process. Hence, the first part of Condition (1.19) holds with $\overline{B}=\overline{D}$ in (3.11), as by (3.12), and arguing as in (3.13), we have

[TABLE]

The second part of this condition holds easily, as

[TABLE]

Condition (G7).

To verify the variance ratio condition (1.12), recalling $\sigma_{n,\alpha}^{2}$ from (3.10) and $\Psi(\alpha,n)$ from (3.17), we have

[TABLE]

as $n\geq 3$ for all $(n,\alpha)\in\ \raisebox{-0.6458pt}{\Smiley}$ by the comment after (3.9). For this same reason condition (1.13) holds, as

[TABLE]

Conditions (Z1)–(Z5) and (G7) have been verified, and Theorem 1.5 now follows from Theorem 1.2. ∎

The next result shows that the case when $\alpha$ is taken larger than that in Theorem 1.5 is degenerate; the boundary case $\varepsilon=1$ is left unresolved.

Theorem 3.1.

For all $\varepsilon>1$ , along any sequence $\{(n,\alpha_{n}),n\geq 1\}$ for which $\alpha_{n}\geq n^{1+\varepsilon}$ ,

[TABLE]

Proof.

Note that for all boxes $x$ in the tableau with $\lambda_{1}^{\prime}=n$ we have $a(x)=0$ , and $l(x)$ takes all values between $0$ and $n-1$ . Hence, from the Jack$_{\alpha}$ measure distribution as given in (1.22),

[TABLE]

Substituting the lower bound on $\alpha_{n}$ into this inequality yields

[TABLE]
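The arm and leg counts used at the start of this proof can be computed mechanically from the definitions of $a(x)$ and $l(x)$ given earlier in this section. The sketch below is illustrative: the function name and the encoding of a partition as a list of row lengths with 0-indexed boxes are our own choices, not notation from the paper.

```python
def arms_and_legs(partition):
    """Map each box (row, col), 0-indexed, to its (arm, leg) pair.

    `partition` lists the row lengths in non-increasing order. The arm counts
    boxes to the right in the same row; the leg counts boxes below in the
    same column.
    """
    result = {}
    for i, row_len in enumerate(partition):
        for j in range(row_len):
            arm = row_len - j - 1
            leg = sum(1 for below in partition[i + 1:] if below > j)
            result[(i, j)] = (arm, leg)
    return result


# Single-column shape with n = 5 boxes, the case lambda_1' = n of the proof.
column = arms_and_legs([1] * 5)
print(sorted(column.values()))
```

For the single-column shape every arm is zero and the legs run through $0,\dots,n-1$ , as stated at the start of the proof.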

Remark 3.2.

The Wasserstein bound in (1.25) suggests that a bound in the Kolmogorov metric should hold with rate function

[TABLE]

This rate function is equivalent to the one we take in (3.8) for the ‘large $\alpha$ ’ parameter set (3.7), as there $n\leq\alpha$ and $1/\sqrt{n}$ is dominated by $\sqrt{\alpha}/{n}$ . Directly extending the arguments used here to cover the ‘small $\alpha$ ’ regime requires that (3.16) hold for some choice of $F_{{n,\alpha},1}$ . In particular, (3.14) shows that $\mathop{{}{\mathbb{E}}}\mathopen{}_{n,\alpha}D^{2}\leq C/r_{n,\alpha}^{2}$ , with $r_{n,\alpha}$ as in (3.18). Hence, taking this route, one needs to specify $F_{n,\alpha,1}$ as an appropriate restriction on $\Lambda_{n-1}$ that satisfies ${\mathbb{P}}_{n,\alpha}[F_{n,\alpha,1}^{c}]<C/r_{n,\alpha}^{2}$ , and which gives rise to a bounding $\overline{D}$ of the right order. If in this case $\overline{B}$ may be taken to be $\overline{D}$ as in (Z5) above, then $\overline{D}$ needs to be of order $1/r_{n,\alpha}$ .

4 Proof of Theorems 1.1 and 1.2

The proofs of Theorems 1.1 and 1.2 ultimately rely on obtaining information about the solution to a certain recursive inequality. In its simplest form, and closely related to the argument in Bolthausen (1984), this inequality becomes

[TABLE]

for some $0<q<1$ and $c>0$ . In this simple case, it is not difficult to solve the corresponding equality explicitly to yield

[TABLE]

What is important here is not the exact form of the solution but rather that $a_{n}$ is uniformly bounded over $n\geq 1$ . We show below that this property holds in greater generality when we replace $n$ on the left hand side of (4.1) by a generic parameter $\theta\in\Theta$ , and average the right hand side over a randomly chosen parameter $Y\in\Theta$ , rather than evaluate at $n-1$ . Although, in the general case, there may exist additional solutions to the inequality that are unbounded, it turns out that these solutions must grow exponentially fast along some sequence, which is a behavior that can be excluded in our applications.
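The uniform boundedness in the simple case can be illustrated numerically. The sketch below assumes that the simple recursion has the form $a_{n}\leq qa_{n-1}+c$ and iterates the equality case; the closed form used for comparison is the standard solution of that linear recursion and may be arranged differently from the display above. The function names are hypothetical.

```python
def iterate_recursion(a1, q, c, steps):
    """Return [a_1, ..., a_steps] for the equality case a_n = q a_{n-1} + c."""
    values = [a1]
    for _ in range(steps - 1):
        values.append(q * values[-1] + c)
    return values


def closed_form(a1, q, c, n):
    """Standard solution: a_n = q^(n-1) a_1 + c (1 - q^(n-1)) / (1 - q)."""
    return q ** (n - 1) * a1 + c * (1 - q ** (n - 1)) / (1 - q)


a1, q, c = 10.0, 0.5, 1.0
values = iterate_recursion(a1, q, c, 30)
# The sequence never exceeds max(a_1, c / (1 - q)) and converges to c / (1 - q).
print(max(values), values[-1], c / (1 - q))
```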

Lemma 4.1.

Let $(\Theta,{\mathcal{T}})$ and $(\Omega,{\mathcal{F}})$ be measurable spaces. For each $\theta\in\Theta$ , let ${\mathbb{P}}_{\theta}[\cdot]$ be a probability measure on $\Omega$ . Let $X:\Theta\times\Omega\to[0,\infty)$ and $\Psi:\Theta\times\Omega\to\Theta$ be such that, for each $\theta\in\Theta$ , both $X(\theta,\cdot)$ and $\Psi(\theta,\cdot)$ are measurable functions. Assume there are constants $0<q<1$ and $c>0$ , measurable functions $a:\Theta\to[0,\infty)$ and $r:\Theta\to[0,\infty)$ , and a measurable set $\raisebox{-0.6458pt}{\Smiley}\subset\Theta$ such that

[TABLE]

Then

[TABLE]

Proof.

Note that, for $\theta\in\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ , the variable $X$ must be zero ${\mathbb{P}}_{\theta}$ -almost surely by (A2), and so (A3) yields that

[TABLE]

We may therefore assume that $\raisebox{-0.6458pt}{\Smiley}$ is non-empty, else the claim is trivial. We argue by contradiction; so assume Conditions (A1)–(A4) are satisfied and that the opposite of the conclusion is true. For every $\theta\in\raisebox{-0.6458pt}{\Smiley}$ , we can use (A1) and consider the probability measure ${\mathbb{P}}^{X}_{\theta}$ specified by its Radon-Nikodym derivative

[TABLE]

where $\mathop{{}{\mathbb{E}}}\mathopen{}^{X}_{\theta}$ denotes expectation with respect to ${\mathbb{P}}^{X}_{\theta}$ . More precisely, we show that when

[TABLE]

and Conditions (A1)–(A4) hold, there exists a sequence $\{\theta_{n}\}_{n\geq 0}\subset\raisebox{-0.6458pt}{\Smiley}$ and a constant $C$ such that, for all $n\geq 0$ ,

[TABLE]

which is clearly impossible.

We proceed by induction. For the base case $n=0$ , we note that, since $a(\cdot)$ is bounded by $c$ on $\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ by (4.2), it follows from (4.3) that there is $\theta_{0}\in\raisebox{-0.6458pt}{\Smiley}$ such that $a(\theta_{0})=c/(1-q)+c\delta$ for some $\delta>0$ ; taking also $C=r(\theta_{0})$ , (4.4) is satisfied.

For the induction step, assume that the lower bound in (4.4) is true for $n-1\geq 0$ . As $\theta_{n-1}\in\raisebox{-0.6458pt}{\Smiley}$ , Condition (A3) yields that $\mathop{{}{\mathbb{E}}}\mathopen{}_{\theta_{n-1}}^{X}a(\Psi)\geq(a(\theta_{n-1})-c)/q$ , and so the integrand must be at least this lower bound on a set of positive ${\mathbb{P}}^{X}_{\theta_{n-1}}$ –measure; that is,

[TABLE]

Moreover, by the definition of essential supremum,

[TABLE]

Hence ${\mathbb{P}}_{\theta_{n-1}}^{X}[A_{n-1}\cap B_{n-1}]={\mathbb{P}}_{\theta_{n-1}}^{X}[A_{n-1}]>0$ , and we can find $\theta_{n}\in\Theta$ satisfying

[TABLE]

Since $a(\cdot)\leq c$ on $\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ we conclude that $\theta_{n}\in\raisebox{-0.6458pt}{\Smiley}$ in view of the first inequality of (4.5), which also completes the induction for the lower bound in (4.4). Applying (4.5) and (A4) yields

[TABLE]

yielding the upper bound in (4.4), and concluding the induction. ∎
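The exponential growth forced by the induction can be made concrete. The sketch below assumes, reading off the base case and the induction step, that each step yields $a(\theta_{n})\geq(a(\theta_{n-1})-c)/q$ starting from $a(\theta_{0})=c/(1-q)+c\delta$ , and iterates the equality case; under that assumption the values equal $c/(1-q)+c\delta q^{-n}$ . The function name is hypothetical.

```python
from math import isclose


def forced_lower_bounds(q, c, delta, steps):
    """Iterate b_n = (b_{n-1} - c) / q from b_0 = c / (1 - q) + c * delta."""
    b = c / (1 - q) + c * delta
    bounds = [b]
    for _ in range(steps):
        b = (b - c) / q
        bounds.append(b)
    return bounds


q, c, delta = 0.5, 1.0, 0.1
bounds = forced_lower_bounds(q, c, delta, 20)
# Compare the last iterate with c / (1 - q) + c * delta * q**(-n) at n = 20.
print(bounds[-1], c / (1 - q) + c * delta * q ** (-20))
```

Any excess $c\delta$ over the level $c/(1-q)$ is thus amplified by a factor $1/q$ at every step, which is the growth that the upper bound in (4.4) rules out.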

Proof of Theorem 1.1.

Throughout the proof, $C$ denotes a constant that does not depend on $\theta$ and can change from formula to formula. Note first that by Condition (G1) the bound (1.14) trivially holds for every $\theta\in\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ by taking $C=\overline{r}$ . Therefore we need only show that (1.14) holds for all $\theta\in\raisebox{-0.6458pt}{\Smiley}$ . Let

[TABLE]

Fix $\varepsilon>0$ , whose exact value is to be chosen later, and for $z\in\mathbb{R}$ define

[TABLE]

Let $f_{z,\varepsilon}$ be the unique bounded solution to the Stein equation

[TABLE]

Using a standard smoothing inequality, see e.g. the proof of Theorem 5.1 in Chen, Goldstein and Shao (2011), we have

[TABLE]

For ease of notation, we drop the indices $z$ and $\varepsilon$ from $f$ .

Bound on $\boldsymbol{|\mathop{{}{\mathbb{E}}}\mathopen{}_{\theta}\{f^{\prime}(W)-Wf(W)\}|}$ .

Taking an arbitrary $\theta\in\raisebox{-0.6458pt}{\Smiley}$ and using the definition (1.2) of a Stein coupling in the second line below, we have

[TABLE]

From (4.6) and (4.7) of Chen and Shao (2004) we have, respectively, that $\|f^{\prime}\|\leq 1$ and

[TABLE]

implying, by the first condition in (1.7), that

[TABLE]

and that

[TABLE]

Using the second condition in (1.7), and that $|t|\leq|D|$ in the integral, we have

[TABLE]

Let $F_{\theta}=F_{\theta,1}\cap F_{\theta,2}$ . To handle the indicator in $R_{2,2}$ , write

[TABLE]

Using (LABEL:143), and again that $|t|\leq|D|$ , we have

[TABLE]

where $F_{\circ}=\{\Psi(\theta,\cdot)\in\raisebox{-0.6458pt}{\Smiley}\}$ . Now, by (1.10) and the first condition of (1.8)

[TABLE]

Since $F_{\theta}\cap F_{\circ}$ is contained in $F_{\circ}$ and $\sigma_{\theta}>0$ for $\theta\in\raisebox{-0.6458pt}{\Smiley}$ , on this intersection we may define

[TABLE]

and thus write

[TABLE]

where $\rho$ , $T_{1}$ and $T_{2}$ are to be understood as random variables on $\raisebox{-0.6458pt}{\Smiley}\times\Omega$ . By the first condition in (1.11), we have $|T_{1}|\leq\overline{B}$ on $F_{\theta}\cap F_{\circ}$ . Hence,

[TABLE]

where

[TABLE]

is ${\cal F}_{\theta}$ measurable by Condition (G5).

Note that $F_{\circ}\in{\cal F}_{\theta}$ since $\raisebox{-0.6458pt}{\Smiley}$ , given in Condition (G1), is in ${\cal T}$ and $\Psi(\theta,\cdot)$ is ${\cal F}_{\theta}$ -measurable by Condition (G5) for $\theta\in\raisebox{-0.6458pt}{\Smiley}$ . Now using Condition (G4) to bound $|D|$ by $\overline{D}$ on $F_{\theta,1}$ , and applying the measurability of $\overline{G},\overline{D}$ and $F_{\theta,2}$ with respect to ${\mathcal{F}}_{\theta}$ by Conditions (G4) and (G5), we obtain

[TABLE]

Using (4.6) and (1.9) we obtain

[TABLE]

and as the normal density is bounded by $1/\sqrt{2\pi}$ , using (1.12) we see that the integrand in (LABEL:146) can be no more than

[TABLE]

Therefore, using the second condition in (1.8) and the second inequality in (1.11) for the fourth inequality below, and then the first condition in (1.13) for the last, we obtain

[TABLE]

where $R_{2,2,2}=0$ in the case $\mathbb{E}\bigl\{\overline{G}\,\overline{D}^{2}I_{F_{\theta,2}}\bigr\}=0$, by the first line of the display above.

In order to bound $R_{2,2,3}$ , using that $\delta(\theta)=1$ for $\theta\in\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ by (4.6) for the second equality, that $F_{\theta}\subset F_{\theta,2}$ for the first inequality, the first condition in (1.13) for the second inequality, and the second condition in (1.8) for the last, we have

[TABLE]

where $R_{2,2,3}=0$ when $\mathbb{E}_{\theta}\bigl\{\overline{G}\,\overline{D}^{2}I_{F_{\theta,2}}\bigr\}=0$, by the first line of the display.

Collecting the bounds (4.9), (4.10), (4.13), (4.16) and (4.18) and using (4.7) we arrive at

[TABLE]

Since Condition (G1) implies that $\overline{r}$ is an upper bound on $r_{\theta}$ for $\theta\in\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ , and a lower bound on $r_{\theta}$ for $\theta\in\raisebox{-0.6458pt}{\Smiley}$ , we conclude that

[TABLE]

Hence, by the second condition in (1.13),

[TABLE]

Choosing $\varepsilon=C/r_{\theta}q$ with $C$ as in (4.19) and multiplying that inequality by $r_{\theta}$ on both sides and then setting $a(\theta)=\delta(\theta)r(\theta)$ we obtain, for some possibly different constant $c>0$ , which does not depend on $\theta$ but may depend on $q$ ,

[TABLE]

We now verify the hypotheses of Lemma 4.1, with the additional identification

[TABLE]

Conditions (A1) and (A2) follow directly from the definition of $X$, while (A3) on $\raisebox{-0.6458pt}{\Smiley}$ is (4.21), and is satisfied on $\Theta\setminus\raisebox{-0.6458pt}{\Smiley}$ as $\delta(\theta)\leq 1$, and we may replace $c$ by $\max\{\overline{r},c\}$. Condition (A4) follows from (4.20). The conclusion of Lemma 4.1 now implies that $\delta(\theta)\leq C/r_{\theta}$ for all $\theta\in\Theta$. ∎

Proof of Theorem 1.2.

The proof for zero biasing is quite similar to, but simpler than, the proof of Theorem 1.1; we only highlight the important differences.

Recalling $D=W^{*}-W$ , applying the bound (4.8), and the zero bias characterization (1.4), we obtain

[TABLE]
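The zero bias characterization $\mathbb{E}\{Wf(W)\}=\sigma^{2}\mathbb{E}\{f^{\prime}(W^{*})\}$ used in this step can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not part of the proof: $W$ is a mean-zero discrete variable, and $W^{*}$ is taken to have the standard zero bias density $p^{*}(t)=\mathbb{E}\{W I(W>t)\}/\sigma^{2}$, a known fact that is not stated in this section. The function name `zero_bias_check` is hypothetical.

```python
import math

def zero_bias_check(values, probs, f, f_prime, steps=20000):
    """Return (E{W f(W)}, sigma^2 E{f'(W*)}) for a mean-zero discrete W.

    Assumes the zero bias density p*(t) = E{W 1(W > t)} / sigma^2 and
    evaluates E{f'(W*)} with a midpoint rule on [min(values), max(values)].
    """
    mean = sum(p * v for v, p in zip(values, probs))
    assert abs(mean) < 1e-12, "W must have mean zero"
    var = sum(p * v * v for v, p in zip(values, probs))
    lhs = sum(p * v * f(v) for v, p in zip(values, probs))

    def density(t):
        return sum(p * v for v, p in zip(values, probs) if v > t) / var

    lo, hi = min(values), max(values)
    h = (hi - lo) / steps
    integral = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h          # midpoint of the k-th cell
        integral += f_prime(t) * density(t)
    return lhs, var * integral * h

# Random sign: W = +-1 with probability 1/2, so W* is uniform on [-1, 1].
lhs_sign, rhs_sign = zero_bias_check((-1.0, 1.0), (0.5, 0.5), math.sin, math.cos)
# Asymmetric two-point law with mean zero and variance 2.
lhs_two, rhs_two = zero_bias_check((-1.0, 2.0), (2.0 / 3.0, 1.0 / 3.0), math.sin, math.cos)
```

In both cases the two returned numbers should agree up to quadrature error, since $W^{*}$ is uniform on the interval spanned by the two support points.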

Using (Z3), noting in particular that $|D|\leq|\overline{D}|$ on $F_{\theta,1}$, and using the fact that $r_{\theta}>\overline{r}$ for $\theta\in\raisebox{-0.6458pt}{\Smiley}$, which yields $1/r_{\theta}^{2}\leq C/r_{\theta}$, for the first two terms in (4.22) we have

[TABLE]

Following the reasoning in (4.12) and labeling the corresponding terms that arise here in the same manner, for $R_{2,2}$ , the only remaining term, by the first condition in (1.16), and (1.18), we obtain the bound

[TABLE]

For $R_{2,2,2}$ , as $ut$ in (4.14) is replaced by $uD$ , separating the term that arises from $uD$ out of $Q_{z,y}$ as defined there, here we obtain

[TABLE]

where $Q_{z}=(z+T_{2})/\rho$ is ${\cal F}_{\theta}$ measurable. Now arguing as in (4.16) we obtain

[TABLE]

using the second condition of (1.16) and the first one of (1.13) for the first term, and the second conditions of (1.19) and (1.16), respectively, to obtain the last two terms in the bound.

As in (4.18), using the first condition of (1.13) and the second condition of (1.16), we obtain

[TABLE]

Combining terms as in (4.19) yields

[TABLE]

The proof can now be concluded as for Theorem 1.1. ∎

5 Appendix

We illustrate two instances where the conditions in the General Framework of the Introduction are implicitly invoked. First we show that the random version of the random variable $Y$ at the (random) ‘smaller’ parameter value is a random variable. The maps

[TABLE]

are measurable, the first as each component is measurable, and the second being a composition of measurable maps.

Next, we show that if $f(\theta,\omega)$ is measurable and ${\mathbb{P}}_{\theta}$ -integrable for all $\theta\in\Theta$ , then

[TABLE]

is a measurable function of $\theta$. Indeed, the collection ${\mathcal{M}}$ of subsets $E$ of $\Theta\times\Omega$ for which the integral of $f(\theta,\omega)=I_{E}(\theta,\omega)$ with respect to ${\mathbb{P}}_{\theta}$ is a measurable function of $\theta$ is a monotone class. The class ${\mathcal{M}}$ contains the rectangles which are products of measurable sets $A$ and $B$, as their indicator

[TABLE]

which is a product of measurable functions of $\theta$. Hence ${\mathcal{M}}$ contains the algebra of all finite disjoint unions of such rectangles, and hence, by the Monotone Class theorem, the sigma-algebra these rectangles generate, that is, the product sigma-algebra. Given a non-negative integrable function $f(\theta,\omega)$, standard arguments using an approximating sequence of simple functions from below, in concert with the Monotone Convergence Theorem, yield the measurability of the integral of $f(\theta,\omega)$. The result then extends to real valued functions by breaking up any given integrable function into its positive and negative parts.

Acknowledgements

We are grateful to the referees for their detailed comments and references. This work was partially supported by the Singapore Ministry of Education AcRF Tier 1 Grants R-146-000-230-114 and R-155-000-167-112 through the National University of Singapore. The second author thanks the Department of Statistics and Applied Probability, National University of Singapore, for their kind hospitality.

Bibliography

Bartroff, J. and Goldstein, L. (2013). A Berry-Esseen bound for the uniform multinomial occupancy model. Electron. J. Probab. 18, article 27, 1–29.
Bergström, H. (1944). On the central limit theorem. Skand. Aktuarietidskr. 27, 139–153.
Bolthausen, E. (1984). An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. Verw. Gebiete 66, 379–386.
Chen, L. H. Y., Goldstein, L. and Shao, Q.-M. (2010). Normal Approximation by Stein’s Method. Springer Verlag.
Chen, L. H. Y. and Shao, Q.-M. (2004). Normal approximation under local dependence. Ann. Probab. 32, 1985–2028.
Chen, L. H. Y. and Röllin, A. (2010). Stein couplings for normal approximation. Preprint, arxiv.org/abs/1003.6039
Chen, L. H. Y. and Thánh, L. V. (2019). On the error bound in the normal approximation for Jack measures. Preprint, arxiv.org/abs/1902.03476
Englund, G. (1981). A remainder term estimate for the normal approximation in classical occupancy. Ann. Probab. 9, 684–692.
