Inference For High-Dimensional Split-Plot-Designs: A Unified Approach for Small to Large Numbers of Factor Levels

Paavo Sattler; Markus Pauly

arXiv:1706.02592·math.ST·June 9, 2017


TL;DR

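This paper develops inference procedures for high-dimensional, heteroscedastic split-plot designs that remain valid when the number of repeated measurements $d$ and/or the number of groups $a$ grows, a setting common in the life sciences, where many observations per subject are easier to obtain than many subjects.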

Contribution

It introduces a unified approach for inference in heteroscedastic split-plot designs with high-dimensional data, extending classical methods to handle increasing dimensions and groups.

Findings

01

Procedures are robust against increasing dimensions and groups.

02

Limit distributions are characterized in a general asymptotic framework.

03

Small sample approximations improve inference accuracy.

Abstract

Statisticians increasingly face the need to reconsider the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, revealing the limits of conventional multivariate data analysis. Such situations occur frequently, e.g., in the life sciences whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$ , subjects. In this paper we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These will, e.g., be able to answer questions about the occurrence of certain time, group and interaction effects or about particular profiles. The test procedures are based on standardized quadratic forms involving suitably symmetrized…

Tables

Table 1: Asymptotic levels of the tests $\psi_{z}$ and $\psi_{\chi}$ with fixed critical values under the null hypothesis and all asymptotic frameworks (3)-(5). All entries except the first column are true asymptotic levels.

Chosen level $\alpha$	$\psi_{z}$ ( $\beta_{1} \to 0$ )	$\psi_{z}$ ( $\beta_{1} \to 1$ )	$\psi_{\chi}$ ( $\beta_{1} \to 0$ )	$\psi_{\chi}$ ( $\beta_{1} \to 1$ )
0.10	0.10	0.09354	0.11391	0.10
0.05	0.05	0.06819	0.02226	0.05
0.01	0.01	0.03834	0.00003	0.01

Table 2: Analysis of the sleep lab trial from Figures 5-6: Shown are the values of the test statistic $W_{N}$ and the estimator $\hat{f}_{p}^{\star}$ as well as the $p$-values of the test $\varphi_{N}^{\star}=\mathbf{1}\{W_{N}>K_{\hat{f}^{\star}_{P};1-\alpha}\}$ for different null hypotheses of interest.

Hypothesis	$W_{N}^{A}$	${\hat{f}}_{p}^{⋆}$	p-value
$H_{0}^{a}$	-0.45671	1.19030	0.55832
$H_{0}^{b}$	6.24114	7.07832	0.00008
$H_{0}^{a b}$	0.74578	7.21217	0.20120
$H_{0}^{t 1}$	-0.795083	461.874	0.784463
$H_{0}^{t 2}$	-0.591851	360.048	0.71764
$H_{0}^{t 3}$	-0.43381	223.24000	0.65845
$H_{0}^{t 4}$	-1.18382	426.083	0.88385
$H_{0}^{1 \times 2}$	2.37921	155.89025	0.01285
$H_{0}^{1 \times 3}$	0.23757	156.64141	0.39240
$H_{0}^{1 \times 4}$	-0.49984	143.57718	0.68099
$H_{0}^{2 \times 3}$	-0.72716	91.83337	0.75968
$H_{0}^{2 \times 4}$	-0.56510	79.78169	0.70183
$H_{0}^{3 \times 4}$	-0.66704	130.56430	0.74046

Table 3: $\tau_{P}$ for $\boldsymbol{T}=\left(\boldsymbol{P}_{2}\otimes\frac{1}{d}\boldsymbol{J}_{d}\right)\boldsymbol{\mu}$

d	5	10	20	40	70	100	150	200	300	450	600	800
$τ_{P}$	1	1	1	1	1	1	1	1	1	1	1	1

Table 4: $\tau_{P}$ for $\boldsymbol{T}=\left(\frac{1}{2}\boldsymbol{J}_{2}\otimes\boldsymbol{P}_{d}\right)$

d	5	10	20	40	70	100	150	200	300	450	600	800
$τ_{P}$	0.50	0.36	0.21	0.11	0.064	0.045	0.03	0.022	0.015	0.010	0.0074	0.0056

Equations

$a\in\mathbb{N}$ fixed

$d\in\mathbb{N}$ fixed

$\boldsymbol{X}_{i,j}=(X_{i,j,1},\dots,X_{i,j,d})^{\top}\stackrel{ind}{\sim}\mathcal{N}_{d}(\boldsymbol{\mu}_{i},\boldsymbol{\Sigma}_{i}),\quad j=1,\dots,n_{i},\ i=1,\dots,a$

$H_{0}(\boldsymbol{H}):\boldsymbol{H}\boldsymbol{\mu}=\boldsymbol{0}$

$\mu_{i,t}=\mu+\alpha_{i}+\beta_{t}+(\alpha\beta)_{it},\quad i=1,\dots,a;\ t=1,\dots,d,$

$\min(a,d,n_{1},\dots,n_{a})\to\infty,$

$\frac{n_{i}}{N}\to\rho_{i}\in(0,1),\quad i=1,\dots,a.$

$0<\liminf n_{i}/N\leq\limsup n_{i}/N<1,\quad(i=1,\dots,a).$

$Q_{N}=N\cdot\overline{\boldsymbol{X}}^{\top}\boldsymbol{T}\overline{\boldsymbol{X}},$

$\sqrt{N}\cdot\boldsymbol{T}\overline{\boldsymbol{X}}\stackrel{H_{0}}{\sim}\mathcal{N}_{ad}\left(\boldsymbol{0}_{ad},\boldsymbol{T}\left[\bigoplus\limits_{i=1}^{a}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right]\boldsymbol{T}\right),$

$E_{H_{0}}(Q_{N})$

$Var_{H_{0}}(Q_{N})$

$W_{N}$

$\beta_{1}=\max_{s\leq ad}\beta_{s}\to 0$ as $N\to\infty,$

$\beta_{1}\to 1$ as $N\to\infty,$

$\beta_{s}\to b_{s}$ as $N\to\infty$ for all $s\in\mathbb{N},$

$A_{i,1}$

$A_{i,r,2}$

$A_{i,3}$

$A_{4}$

$\widehat{Var_{H_{0}}}(Q_{N}):=2\sum_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}(\boldsymbol{T}_{W})_{ii}^{2}A_{i,3}+4\sum_{i=1}^{a}\sum_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}(\boldsymbol{T}_{W})_{ir}^{2}A_{i,r,2}=2A_{4}$

$W_{N}=\frac{Q_{N}-\widehat{E_{H_{0}}}(Q_{N})}{\widehat{Var_{H_{0}}}(Q_{N})^{1/2}}$

$K_{f_{P}}=\frac{\chi_{f_{P}}^{2}-f_{P}}{\sqrt{2f_{P}}}$ such that $f_{P}=\frac{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}.$

$\boldsymbol{Z}_{(\ell_{1},\ell_{2},\dots,\ell_{2a})}:=\left(\sqrt{\frac{N}{n_{1}}}\left(\boldsymbol{X}_{1,\ell_{1}}-\boldsymbol{X}_{1,\ell_{2}}\right)^{\top},\dots,\sqrt{\frac{N}{n_{a}}}\left(\boldsymbol{X}_{a,\ell_{2a-1}}-\boldsymbol{X}_{a,\ell_{2a}}\right)^{\top}\right)^{\top}$

${\mathbb{E}}\left(\boldsymbol{Z}_{(1,2)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\boldsymbol{Z}_{(3,4)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\boldsymbol{Z}_{(5,6)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)=8\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).$

$C_{5}=\sum_{\substack{\ell_{1,1},\dots,\ell_{6,1}=1\\ \ell_{1,1}\neq\dots\neq\ell_{6,1}}}^{n_{1}}\dots\sum_{\substack{\ell_{1,a},\dots,\ell_{6,a}=1\\ \ell_{1,a}\neq\dots\neq\ell_{6,a}}}^{n_{a}}\frac{\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})}{8\cdot\prod_{i=1}^{a}\frac{n_{i}!}{(n_{i}-6)!}},$

$\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})},$

$\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{5,1},\ell_{6,1},\dots,\ell_{5,a},\ell_{6,a})},$

$\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{5,1},\ell_{6,1},\dots,\ell_{5,a},\ell_{6,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}.$

$\widehat{\tau}_{P}-\tau_{P}=\frac{C_{5}^{2}}{A_{4}^{3}}-\frac{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\stackrel{\mathcal{P}}{\longrightarrow}0.$

$C_{5}^{\star}=C_{5}^{\star}(B)=\frac{1}{8\cdot B}\sum_{b=1}^{B}\Lambda_{1}(\sigma(b))\cdot\Lambda_{2}(\sigma(b))\cdot\Lambda_{3}(\sigma(b)).$

Full text

Inference For High-Dimensional Split-Plot-Designs:

A Unified Approach for Small to Large Numbers of Factor Levels

Paavo Sattler$^{1}$ and Markus Pauly$^{1}$

$^{1}$University of Ulm, Institute of Statistics

Abstract: Statisticians increasingly face the need to reconsider the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, revealing the limits of conventional multivariate data analysis. Such situations occur frequently, e.g., in the life sciences whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$ , subjects. In this paper we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These will, e.g., be able to answer questions about the occurrence of certain time, group and interaction effects or about particular profiles.

The test procedures are based on standardized quadratic forms involving suitably symmetrized U-statistics-type estimators which are robust against an increasing number of dimensions $d$ and/or groups $a$ . We then discuss their limit distributions in a general asymptotic framework and additionally propose improved small sample approximations. Finally, their small sample performance is investigated in simulations and the applicability is illustrated by a real data analysis.

**Keywords:** Approximations, High-dimensional Data, Quadratic Forms, Repeated Measures, Split-plot designs

1 Introduction

In our current century of data, statisticians increasingly face the need to reconsider the adaptability of classical inferential techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, revealing the limits of conventional multivariate data analysis. Here, the curse of high dimensionality or the large $d$ small $N$ problem is especially encountered in the life sciences whenever it is easier (or cheaper) to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$ , subjects. Similar observations can be made in industrial sciences with subjects replaced by units. Such designs, where experimental units are repeatedly observed under different conditions or at different time points, are called repeated measures designs or (if two or more groups are observed) split-plot designs. In these trials, one would like to answer questions about the occurrence of certain group or time effects or about particular profiles. Conventionally, for $d<N$ , corresponding null hypotheses are tested with Hotelling’s $T^{2}$ (one or two sample case) or Wilks’s $\Lambda$ , see e.g. [13, Section 4.3] or [21, Section 6.8]. Besides normality, these procedures rely heavily on the assumption of equal covariance matrices and, in particular, break down in high-dimensional settings with $N<d$ . While there exist several promising approaches to adequately deal with the problem of covariance heterogeneity in the classical case with $d<N$ (see e.g. [6, 16, 17, 20, 27, 37, 1, 24, 9, 32, 35, 26, 18, 15]), most procedures for high-dimensional repeated measures designs rely on certain sparsity conditions (see e.g. [2, 11, 23, 30, 34, 10, 19] and the references cited therein). In particular, in an asymptotic $(d,N)\to\infty$ framework, typical assumptions restrict the way the sample size $N$ and/or various powers of traces of the underlying covariances increase with respect to $d$ .

These types of sparsity conditions guarantee central limit theorems that lead to approximations of underlying test statistics by a fixed limit distribution. However, as illustrated in [31] for one-sample repeated measures, these conditions can in general not be regarded as regularity assumptions. In particular, they may even fail for classical covariance structures. To this end, the authors proposed a novel approximation technique that showed considerably accurate results and investigated its asymptotic behavior in a flexible and non-restrictive $(d,N)\to\infty$ framework. Here, no assumptions regarding the dependence between $d$ and $N$ or the covariance matrix were made. In the current paper, we follow this approach and extend the results of [31] to general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. To even allow for a large number of groups as in [3, 4] or [39], we do not only consider the case with a fixed number $a\in{\mathbb{N}}$ of samples but additionally allow for situations with $a\to\infty$ . The latter case is of particular interest if most groups are rather small (as in screening trials) such that a classical test would essentially possess no power for fixed $a$ . Here, increasing the number of groups implies increasing the total sample size, from which a power increase might be expected as well. This leads to one of the following asymptotic frameworks

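$a\in\mathbb{N}$ fixed and $\min(d,n_{1},\dots,n_{a})\to\infty,$ (3)

$d\in\mathbb{N}$ fixed and $\min(a,n_{1},\dots,n_{a})\to\infty,$ (4)

$\min(a,d,n_{1},\dots,n_{a})\to\infty,$ (5)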

which we handle simultaneously in the sequel. For all considerations, the adequate and dimension-stable estimation of traces of certain powers of combined covariances turned out to be a major problem. It is tackled by introducing novel symmetrized estimates of $U$ -statistics-type which possess nice asymptotic properties under all asymptotic frameworks given above.

The paper is organized as follows. The statistical model together with the considered hypotheses of interest are introduced in Section 2. The test statistic and its asymptotic behavior are investigated in Section 3, where novel dimension-stable trace estimators are also introduced. Additional approximations for small sample sizes are theoretically discussed in Section 4 and their performance is studied in simulations in Section 5. Afterwards, the new methods are applied to analyze a high-dimensional data set from a sleep-laboratory trial in Section 6. The paper closes with a discussion and an outlook. All proofs are deferred to the supplementary material.

2 Statistical Model and Hypotheses

We consider a split-plot design given by $a$ independent groups of $d$ -dimensional random vectors

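$\boldsymbol{X}_{i,j}=(X_{i,j,1},\dots,X_{i,j,d})^{\top}\stackrel{ind}{\sim}\mathcal{N}_{d}(\boldsymbol{\mu}_{i},\boldsymbol{\Sigma}_{i}),\quad j=1,\dots,n_{i},\ i=1,\dots,a,$ (1)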

with mean vectors $E(\boldsymbol{X}_{i,1})=\boldsymbol{\mu}_{i}=(\mu_{i,t})_{t=1}^{d}\in{\mathbb{R}}^{d}$ and positive definite covariance matrices $Cov(\boldsymbol{X}_{i,1})=\boldsymbol{\Sigma}_{i}$ . Here $j=1,\dots,n_{i}$ denotes the individual subjects or units in group $i=1,\dots,a$ , $a,n_{i}\in{\mathbb{N}}$ , where no specific structure of the group-specific covariance matrices $\boldsymbol{\Sigma}_{i}$ is assumed. In particular, they are even allowed to differ completely. Altogether we have a total number of $N=\sum_{i=1}^{a}n_{i}$ random vectors representing observations from independent subjects. Within this framework, a factorial structure on the factors group or time can be incorporated by splitting up indices. Also, a group-specific random subject effect can be incorporated as outlined in [31, Equation (2.2)].

Writing $\boldsymbol{\mu}=(\boldsymbol{\mu}_{1}^{\top},\ldots,\boldsymbol{\mu}_{a}^{\top})^{\top}$ , linear hypotheses of interest in this general split-plot model are formulated as

$H_{0}(\boldsymbol{H}):\boldsymbol{H}\boldsymbol{\mu}={\bf 0}$

for a proper hypothesis matrix $\boldsymbol{H}$ . It is of the form $\boldsymbol{H}=\boldsymbol{H}_{W}\otimes\boldsymbol{H}_{S}$ , where $\boldsymbol{H}_{W}$ and $\boldsymbol{H}_{S}$ refer to whole-plot (group) and/or subplot (time) effects. For theoretical considerations it is often more convenient to reformulate $H_{0}(\boldsymbol{H})$ by means of the corresponding projection matrix $\boldsymbol{T}=\boldsymbol{H}^{\top}[\boldsymbol{H}\boldsymbol{H}^{\top}]^{-}\boldsymbol{H}$ , see e.g. [31]. Here $(\cdot)^{-}$ denotes some generalized inverse of the matrix and $H_{0}(\boldsymbol{H})$ can equivalently be written as $H_{0}(\boldsymbol{T}):\boldsymbol{T}\boldsymbol{\mu}={\bf 0}$ . It is a simple exercise to prove that the matrix $\boldsymbol{T}$ is of the form $\boldsymbol{T}=\boldsymbol{T}_{W}\otimes\boldsymbol{T}_{S}$ for projection matrices $\boldsymbol{T}_{W}$ and $\boldsymbol{T}_{S}$ , see A.1 in the supplement. Typical examples are given by

(a) No group effect: $H_{0}^{a}:\left(\boldsymbol{P}_{a}\otimes\frac{1}{d}\boldsymbol{J}_{d}\right)\boldsymbol{\mu}={\bf 0}$ ,

(b) No time effect: $H_{0}^{b}:\left(\frac{1}{a}\boldsymbol{J}_{a}\otimes\boldsymbol{P}_{d}\right)\boldsymbol{\mu}={\bf 0}$ ,

(c) No interaction effect between time and group: $H_{0}^{ab}:\left(\boldsymbol{P}_{a}\otimes\boldsymbol{P}_{d}\right)\boldsymbol{\mu}={\bf 0}$ ,

where $\boldsymbol{J}_{d}$ is the $d\times d$ matrix containing only 1s and $\boldsymbol{P}_{d}:=\boldsymbol{I}_{d}-1/d\cdot\boldsymbol{J}_{d}$ is the centring matrix. For interpretational purposes it is sometimes helpful to decompose the component-wise means as

$\mu_{i,t}=\mu+\alpha_{i}+\beta_{t}+(\alpha\beta)_{it},\quad i=1,\dots,a;\ t=1,\dots,d,$

where $\alpha_{i}\in{\mathbb{R}}$ represents the $i$ -th group effect, $\beta_{t}\in{\mathbb{R}}$ the time effect at time point $t$ and $(\alpha\beta)_{it}\in{\mathbb{R}}$ the $(i,t)$ -interaction effect between group and time with the usual side conditions $\sum_{i}\alpha_{i}=\sum_{t}\beta_{t}=\sum_{i,t}(\alpha\beta)_{it}=0$ . With this notation the above null hypotheses can be rewritten as (a) $H_{0}^{a}:\alpha_{i}\equiv 0\text{ for all }i$ , (b) $H_{0}^{b}:\beta_{t}\equiv 0\text{ for all }t$ and (c) $H_{0}^{ab}:(\alpha\beta)_{it}\equiv 0\text{ for all }i,t$ , respectively.
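The three projection matrices in (a)-(c) can be built directly from Kronecker products. The following is a minimal sketch, assuming NumPy is available; the helper names `centering`, `averaging` and `hypothesis_matrices` are illustrative and not part of the paper.

```python
import numpy as np

def centering(k: int) -> np.ndarray:
    """P_k = I_k - J_k / k, the k x k centring matrix."""
    return np.eye(k) - np.ones((k, k)) / k

def averaging(k: int) -> np.ndarray:
    """J_k / k, which replaces every entry by the average over all k levels."""
    return np.ones((k, k)) / k

def hypothesis_matrices(a: int, d: int) -> dict:
    """Projection matrices T for H_0^a, H_0^b and H_0^{ab} (group factor first)."""
    return {
        "group": np.kron(centering(a), averaging(d)),
        "time": np.kron(averaging(a), centering(d)),
        "interaction": np.kron(centering(a), centering(d)),
    }

a, d = 3, 4
T = hypothesis_matrices(a, d)
for name, mat in T.items():
    # Each T is symmetric and idempotent, so its trace equals its rank.
    assert np.allclose(mat, mat.T) and np.allclose(mat @ mat, mat)
    print(name, "rank:", round(float(np.trace(mat))))

# Hypothetical mean vector with a pure time effect: the same profile in every group.
mu = np.tile(np.array([0.0, 1.0, -1.0, 0.0]), a)
for name, mat in T.items():
    print(name, "hypothesis holds:", bool(np.allclose(mat @ mu, 0.0)))
```

For this `mu` only the time hypothesis is violated, since the group averages coincide and the profiles are parallel.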

These and other hypotheses will be utilized in the data analysis in Section 6.

3 The Test Statistic and its Asymptotics

We derive appropriate inference procedures for $H_{0}(\boldsymbol{T})$ and analyze their asymptotic properties under the following asymptotic frameworks

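$a\in\mathbb{N}$ fixed and $\min(d,n_{1},\dots,n_{a})\to\infty,$ (3)

$d\in\mathbb{N}$ fixed and $\min(a,n_{1},\dots,n_{a})\to\infty,$ (4)

$\min(a,d,n_{1},\dots,n_{a})\to\infty,$ (5)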

as $N\to\infty$ . Here, no dependency on how the dimension $d=d(N)$ in (3) and (5) or the number of groups $a=a(N)$ in (4)-(5) converges to infinity with respect to the sample sizes $n_{i}$ and $N$ is postulated. In particular, we cover high-dimensional ( $d>n_{i}$ or even $d>N$ ) as well as low-dimensional settings. For a lucid presentation of subsequent results and proofs we additionally assume throughout that

$\frac{n_{i}}{N}\to\rho_{i}\in(0,1),\quad i=1,\dots,a.$

However, by turning to convergent subsequences, all results can be shown to hold under the more general condition

$0<\liminf n_{i}/N\leq\limsup n_{i}/N<1,\quad(i=1,\dots,a).$

It is convenient to measure deviations from the null hypothesis $H_{0}(\boldsymbol{T}):\boldsymbol{T}\boldsymbol{\mu}={\bf 0}$ by means of the quadratic form

$Q_{N}=N\cdot\overline{\boldsymbol{X}}^{\top}\boldsymbol{T}\overline{\boldsymbol{X}},$

where ${\overline{\boldsymbol{X}}^{\top}=(\overline{\boldsymbol{X}}_{1}^{\top},\dots,\overline{\boldsymbol{X}}_{a}^{\top})}$ with $\overline{\boldsymbol{X}}_{i}=n_{i}^{-1}\sum_{j=1}^{n_{i}}\boldsymbol{X}_{i,j},i=1,\dots,a,$ denotes the vector of pooled group means.

Since $Q_{N}$ is in general asymptotically degenerate under (3)-(5), we study its standardized version. To this end, note that under the null hypothesis it holds that

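$\sqrt{N}\cdot\boldsymbol{T}\overline{\boldsymbol{X}}\stackrel{H_{0}}{\sim}\mathcal{N}_{ad}\left(\boldsymbol{0}_{ad},\boldsymbol{T}\left[\bigoplus\limits_{i=1}^{a}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right]\boldsymbol{T}\right),$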

due to assumption (1). Thus, it follows from classical theorems about moments of quadratic forms, see e.g. [29] or A.4 in the supplement, that its mean and variance under the null hypothesis can be expressed as

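$E_{H_{0}}(Q_{N})=\sum_{i=1}^{a}\frac{N}{n_{i}}(\boldsymbol{T}_{W})_{ii}\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right),$ (8)

$Var_{H_{0}}(Q_{N})=2\sum_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}(\boldsymbol{T}_{W})_{ii}^{2}\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)+4\sum_{i=1}^{a}\sum_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}(\boldsymbol{T}_{W})_{ir}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right).$ (9)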

Henceforth we investigate the asymptotic behaviour (under $H_{0}(\boldsymbol{T})$ ) of the standardized quadratic form $\widetilde{W}_{N}=\{Q_{N}-E_{H_{0}}(Q_{N})\}/\operatorname{{\it Var}}_{H_{0}}\left(Q_{N}\right)^{1/2}$ . Denoting by $\boldsymbol{V}_{N}:=\bigoplus_{i=1}^{a}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}$ the inversely weighted combined covariance matrix, the representation theorem for quadratic forms [29, p. 90] implies that

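$\widetilde{W}_{N}=\frac{Q_{N}-E_{H_{0}}(Q_{N})}{Var_{H_{0}}(Q_{N})^{1/2}}\stackrel{\mathcal{D}}{=}\sum_{s=1}^{ad}\frac{\lambda_{s}}{\sqrt{\sum_{\ell=1}^{ad}\lambda_{\ell}^{2}}}\cdot\frac{C_{s}-1}{\sqrt{2}}.$ (10)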

Here ’ $\stackrel{\mathcal{D}}{=}$ ’ denotes equality in distribution, $\lambda_{s}$ are the eigenvalues of $\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}$ in decreasing order, and $(C_{s})_{s}$ is a sequence of independent $\chi_{1}^{2}$ -distributed random variables. Note that the eigenvalues $\lambda_{s}$ also depend on the dimension $d$ and the sample sizes $n_{i}$ . Transferring the results of [31] for the one-group design with $a=1$ to our general setting, we obtain the subsequent asymptotic null distributions of the standardized quadratic form for all asymptotic settings (3)-(5).
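To make the standardization concrete, the following sketch simulates data under the null hypothesis and computes $Q_{N}$ together with its null mean $\operatorname{tr}(\boldsymbol{T}\boldsymbol{V}_{N})=\sum_{s}\lambda_{s}$ and null variance $2\operatorname{tr}((\boldsymbol{T}\boldsymbol{V}_{N})^{2})=2\sum_{s}\lambda_{s}^{2}$. It uses the true covariance matrices, so it only illustrates the oracle quantity $\widetilde{W}_{N}$, not a usable test. NumPy, the function names and the chosen covariances are assumptions made for illustration.

```python
import numpy as np

def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks on its diagonal."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

def oracle_standardization(samples, sigmas, T):
    """Return Q_N, its null mean and variance, and the standardized value."""
    n = [x.shape[0] for x in samples]
    N = sum(n)
    xbar = np.concatenate([x.mean(axis=0) for x in samples])
    Q = N * (xbar @ T @ xbar)
    V = direct_sum([(N / ni) * S for ni, S in zip(n, sigmas)])
    lam = np.linalg.eigvalsh(T @ V @ T)      # eigenvalues lambda_s
    mean_h0 = lam.sum()                      # equals tr(T V_N)
    var_h0 = 2.0 * np.sum(lam ** 2)          # equals 2 tr((T V_N)^2)
    return Q, mean_h0, var_h0, (Q - mean_h0) / np.sqrt(var_h0)

rng = np.random.default_rng(1)
a, d, n = 2, 30, (8, 12)                     # high-dimensional: d > N = 20
sigmas = [np.eye(d), 2.0 * np.eye(d) + 0.5 * np.ones((d, d))]
# Zero mean vectors in both groups, so all hypotheses (a)-(c) hold.
samples = [rng.multivariate_normal(np.zeros(d), S, size=ni)
           for ni, S in zip(n, sigmas)]
T_time = np.kron(np.ones((a, a)) / a, np.eye(d) - np.ones((d, d)) / d)
Q, mean_h0, var_h0, W_tilde = oracle_standardization(samples, sigmas, T_time)
print(Q, mean_h0, var_h0, W_tilde)
```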

Theorem 3.1:

Let $\beta_{s}=\lambda_{s}\Big/\sqrt{\sum_{\ell=1}^{ad}\lambda_{\ell}^{2}}$ for $s=1,\dots,ad$ . Then, under $H_{0}(\boldsymbol{T})$ and one of the frameworks (3)-(5), $\widetilde{W}_{N}$ has asymptotically

a) a standard normal distribution if $\beta_{1}=\max_{s\leq ad}\beta_{s}\to 0$ as $N\to\infty$ ,

b) a standardized $\left(\chi_{1}^{2}-1\right)/\sqrt{2}$ distribution if $\beta_{1}\to 1$ as $N\to\infty$ ,

c) the same distribution as the random variable $\sum_{s=1}^{\infty}b_{s}\left(C_{s}-1\right)/\sqrt{2}$ if $\beta_{s}\to b_{s}$ as $N\to\infty$ for all $s\in\mathbb{N}$ , for a decreasing sequence $(b_{s})_{s}$ in $[0,1]$ with $\sum_{s=1}^{\infty}b_{s}^{2}=1$ .

It is worth noting that the influence of the different asymptotic frameworks is hidden in the corresponding conditions on the sequence of standardized eigenvalues $(\beta_{s})_{s}$ , which depend on both $a$ and $d$ .
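As an illustration of how the regimes of Theorem 3.1 arise, the sketch below computes $\beta_{1}$ in a deliberately simple setting that is assumed here and not taken from the paper: $a=2$ balanced groups with identity covariance matrices, so that $\boldsymbol{V}_{N}=2\boldsymbol{I}_{2d}$. The group-effect matrix then has rank one and gives $\beta_{1}=1$ for every $d$, while the time-effect matrix has $d-1$ equal eigenvalues and gives $\beta_{1}=1/\sqrt{d-1}\to 0$.

```python
import numpy as np

def standardized_eigenvalues(T, V):
    """beta_s = lambda_s / sqrt(sum_l lambda_l^2), in decreasing order."""
    lam = np.linalg.eigvalsh(T @ V @ T)[::-1]
    lam = np.clip(lam, 0.0, None)            # remove tiny negative rounding errors
    return lam / np.sqrt(np.sum(lam ** 2))

def centering(k):
    return np.eye(k) - np.ones((k, k)) / k

def averaging(k):
    return np.ones((k, k)) / k

def beta_1(kind, d, a=2):
    """Largest standardized eigenvalue for balanced groups, identity covariances."""
    V = a * np.eye(a * d)                    # N / n_i = a when all n_i are equal
    if kind == "group":
        T = np.kron(centering(a), averaging(d))
    else:
        T = np.kron(averaging(a), centering(d))
    return float(standardized_eigenvalues(T, V)[0])

for d in (5, 20, 80):
    print(d, round(beta_1("group", d), 4), round(beta_1("time", d), 4))
```

With other covariance structures the time-effect case need not tend to zero, which is exactly why the choice of limit distribution cannot be read off from the hypothesis alone.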

Since these quantities are unknown in general we cannot apply the result directly. In particular, we are not even able to calculate the test statistic ${\widetilde{W}}_{N}$ , let alone choose its correct limit distribution. To this end, we first introduce novel unbiased estimates of the unknown traces involved in (8)-(9) and discuss their mathematical properties. Plugging them into (8)-(9) leads to the calculation of adequately standardized test statistics. Finally, the choice of proper critical values is discussed in Section 4.

3.1 Symmetrized Trace Estimators

Here we derive unbiased and ratio-consistent estimates for the unknown traces $\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right),\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)$ and $\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right),i\neq r,$ given in (8)-(9). Since it is not obvious that the usual plug-in estimates based on empirical covariance matrices are useful in high-dimensional settings, we follow the approach of [8, 31] and estimate the traces directly. In contrast to the one-sample design studied therein, we face the problem of additional nuisance parameters: the mean vectors $\boldsymbol{\mu}_{i}$ . To avoid their estimation we adapt Tyler’s symmetrization trick from $M$ -estimates of scatter (see e.g. [12], [14] or [36]) to the present situation, see also [7]. In particular, we consider differences of observation pairs $(\ell_{1},\ell_{2}),\ell_{1}\neq\ell_{2},$ from the same group, which fulfill $\left({\boldsymbol{X}}_{i,\ell_{1}}-{\boldsymbol{X}}_{i,\ell_{2}}\right)\sim\mathcal{N}_{d}\left(\boldsymbol{0}_{d},2\boldsymbol{\Sigma}_{i}\right)$ , and introduce the following novel estimators for $i=1,\dots,a:$

[TABLE]

Here and throughout the paper expressions of the kind $a\neq b\neq c$ mean that the indices are pairwise different. In this sense all estimators (11)-(14) are symmetrized U-statistics, where the kernel is given by a specific quadratic or bilinear form. Their properties are analyzed below.
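The symmetrization idea can be illustrated for the simplest trace, $\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i})$. Since a pair difference has covariance $2\boldsymbol{\Sigma}_{i}$ and mean zero, the quadratic form $(\boldsymbol{X}_{i,\ell_{1}}-\boldsymbol{X}_{i,\ell_{2}})^{\top}\boldsymbol{T}_{S}(\boldsymbol{X}_{i,\ell_{1}}-\boldsymbol{X}_{i,\ell_{2}})$ has expectation $2\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i})$ whatever $\boldsymbol{\mu}_{i}$ is. The sketch below averages this kernel over all pairs; it is an estimator of U-statistic type in this spirit and is not claimed to coincide with the paper's definitions (11)-(14), which are not reproduced in this text. NumPy and the function name are assumptions.

```python
import numpy as np
from itertools import combinations

def trace_estimate_pairs(x, T_S):
    """Average of (X_l1 - X_l2)' T_S (X_l1 - X_l2) / 2 over all pairs l1 < l2.

    x has shape (n_i, d); the result is unbiased for tr(T_S Sigma_i) and does
    not depend on the unknown mean vector.
    """
    n = x.shape[0]
    total = 0.0
    for l1, l2 in combinations(range(n), 2):
        diff = x[l1] - x[l2]
        total += diff @ T_S @ diff
    # n (n - 1) / 2 pairs, each kernel divided by 2
    return total / (n * (n - 1))

rng = np.random.default_rng(0)
n_i, d = 10, 200                                  # d much larger than n_i
mu_i = np.linspace(-3.0, 3.0, d)                  # hypothetical nuisance mean
x = mu_i + rng.normal(size=(n_i, d))              # Sigma_i = I_d
P_d = np.eye(d) - np.ones((d, d)) / d
print(trace_estimate_pairs(x, P_d), "target:", d - 1)
```

For this particular kernel the pairwise average coincides with $\operatorname{tr}(\boldsymbol{T}_{S}\widehat{\boldsymbol{\Sigma}}_{i})$ for the usual unbiased sample covariance $\widehat{\boldsymbol{\Sigma}}_{i}$; the direct pairwise form is what extends to the squared and mixed traces, where the plug-in estimate is no longer unbiased.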

Lemma 3.1:

For any $\boldsymbol{\mu}\in\mathbb{R}^{ad}$ and $i=1,\dots,a$ it holds that

1. $\widehat{E_{H_{0}}}(Q_{N}):=\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}$ is an unbiased and ratio-consistent estimator for ${\mathbb{E}}_{H_{0}}(Q_{N})$ .

2. $A_{4}$ is an unbiased and ratio-consistent estimator for $\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).$

3. $A_{i,1},A_{i,r,2}$ and $A_{i,3}$ are unbiased and ratio-consistent estimators for $\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right),\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)$ and $\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right),$ respectively.

Remark 3.1:

(a) Recall that an $\mathbb{R}$ -valued estimator $\widehat{\theta}_{N}$ is ratio-consistent for a sequence of real parameters $\theta_{N}$ iff $\widehat{\theta}_{N}/\theta_{N}\to 1$ in probability as $N\to\infty$ . Here the estimators and parameters may depend on $a=a(N)$ and/or $d=d(N)$ .

(b) Studying the proof of Lemma 3.1 given in the supplementary material in detail, we see that all estimators are even (dimension-)stable in the sense of [8], i.e. they fulfill $|{\mathbb{E}}(\widehat{\theta}_{N}/\theta_{N}-1)|\leq b_{N}$ and $\operatorname{{\it Var}}(\widehat{\theta}_{N}/\theta_{N})\leq c_{N}$ for sequences $b_{N},c_{N}\downarrow 0$ not depending on $a$ and $d$ .

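It follows from Lemma 3.1 that

$\widehat{Var_{H_{0}}}(Q_{N}):=2\sum_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}(\boldsymbol{T}_{W})_{ii}^{2}A_{i,3}+4\sum_{i=1}^{a}\sum_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}(\boldsymbol{T}_{W})_{ir}^{2}A_{i,r,2}=2A_{4}$

is an unbiased estimator of $Var_{H_{0}}(Q_{N})$ . This motivates studying the standardized quadratic form

$W_{N}=\frac{Q_{N}-\widehat{E_{H_{0}}}(Q_{N})}{\widehat{Var_{H_{0}}}(Q_{N})^{1/2}}$

for testing $H_{0}(\boldsymbol{T})$ . Its asymptotic behaviour under $H_{0}(\boldsymbol{T}):\boldsymbol{T}\boldsymbol{\mu}={\bf 0}_{ad}$ is summarized below.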

Theorem 3.2:

Under $H_{0}(T):\boldsymbol{T}\boldsymbol{\mu}={\bf 0}_{ad}$ and one of the frameworks (3)-(5) the statistic $W_{N}$ has the same asymptotic limit distributions as $\widetilde{W}_{N}$ , if the respective conditions (a)-(c) from Theorem 3.1 are fulfilled.

The result shows that it is not reasonable to approximate the unknown distribution of the test statistic with a fixed distribution to obtain a valid test procedure. For example, choosing $z_{1-\alpha}$ , the $(1-\alpha)$ -quantile of the standard-normal distribution ( $\alpha\in(0,1)$ ), as critical value would lead to a valid asymptotic level $\alpha$ test $\psi_{z}=\mathbf{1}\{W_{N}>z_{1-\alpha}\}$ in case of $\beta_{1}\to 0$ , i.e. ${\mathbb{E}}_{H_{0}}(\psi_{z})\to\alpha$ . However, for $\beta_{1}\to 1$ we would obtain ${\mathbb{E}}_{H_{0}}(\psi_{z})\to P(\chi_{1}^{2}>\sqrt{2}z_{1-\alpha}+1)$ which may lead to an asymptotically liberal ( $\alpha=0.01$ or $0.05$ ) or conservative ( $\alpha=0.1$ ) test decision, see Table 1. In contrast, choosing $c_{1-\alpha}=(\chi_{1;1-\alpha}^{2}-1)/\sqrt{2}$ as critical value (where $\chi_{1;1-\alpha}^{2}$ denotes the $(1-\alpha)$ -quantile of the $\chi_{1}^{2}$ -distribution) for the test $\psi_{\chi}=\mathbf{1}\{W_{N}>c_{1-\alpha}\}$ , it follows that ${\mathbb{E}}_{H_{0}}(\psi_{\chi})\to\alpha$ if $\beta_{1}\to 1$ but ${\mathbb{E}}_{H_{0}}(\psi_{\chi})\to 1-\Phi(c_{1-\alpha})$ for $\beta_{1}\to 0$ , where $\Phi$ denotes the cumulative distribution function of $\mathcal{N}(0,1)$ . Again we obtain an asymptotically liberal ( $\alpha=0.1$ ) or extremely conservative ( $\alpha=0.05$ or $0.01$ ) test decision, see the $\psi_{\chi}$ ( $\beta_{1}\to 0$ ) column of Table 1.
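The two mismatched limits above are simple enough to evaluate numerically. The following sketch uses only the Python standard library and the identities $P(\chi_{1}^{2}>x)=P(|Z|>\sqrt{x})$ and $\chi_{1;p}^{2}=z_{(1+p)/2}^{2}$ for a standard normal $Z$; the function names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def chi2_1_sf(x: float) -> float:
    """P(chi^2_1 > x) for x >= 0, computed as P(|Z| > sqrt(x))."""
    return erfc(sqrt(x / 2.0))

def chi2_1_quantile(p: float) -> float:
    """p-quantile of chi^2_1, the squared (1 + p) / 2 quantile of N(0, 1)."""
    return STD_NORMAL.inv_cdf((1.0 + p) / 2.0) ** 2

def level_psi_z_when_beta1_to_1(alpha: float) -> float:
    """Asymptotic level of psi_z if the limit is (chi^2_1 - 1) / sqrt(2)."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return chi2_1_sf(sqrt(2.0) * z + 1.0)

def level_psi_chi_when_beta1_to_0(alpha: float) -> float:
    """Asymptotic level of psi_chi if the limit is standard normal."""
    c = (chi2_1_quantile(1.0 - alpha) - 1.0) / sqrt(2.0)
    return 1.0 - STD_NORMAL.cdf(c)

for alpha in (0.10, 0.05, 0.01):
    print(alpha,
          round(level_psi_z_when_beta1_to_1(alpha), 5),
          round(level_psi_chi_when_beta1_to_0(alpha), 5))
```

The printed values can be compared with the $\psi_{z}$ ( $\beta_{1}\to 1$ ) and $\psi_{\chi}$ ( $\beta_{1}\to 0$ ) columns of Table 1.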

Hence, an indicator (i.e. estimator) for whether $\beta_{1}\to 0$ , $\beta_{1}\to 1$ or something in between holds would be desirable. Nevertheless, even if the tests with fixed critical values are asymptotically correct ( $\psi_{z}$ in case of $\beta_{1}\to 0$ or $\psi_{\chi}$ in case of $\beta_{1}\to 1$ ), their true type-I error control may be poor for small sample sizes, see the simulations in Section 5.1.

Thus, in any case it seems more appropriate to approximate $W_{N}$ by a sequence of standardized distributions as already advocated in [31] for the case of $a=1$ . We will propose such approximations in the next section, where a check criterion for $\beta_{1}\to 0$ or $\beta_{1}\to 1$ is also presented.

4 Better Approximations

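To motivate the subsequent approximation, recall from (10) that $\widetilde{W}_{N}$ is of weighted $\chi^{2}$ -form. Following [40] it is reasonable to approximate statistics of this form by a standardized $(\chi^{2}_{f}-f)/\sqrt{2f}$ -distribution such that the first three moments coincide. Straightforward calculations show that this is achieved by approximating with

$K_{f_{P}}=\frac{\chi_{f_{P}}^{2}-f_{P}}{\sqrt{2f_{P}}}\quad\text{such that}\quad f_{P}=\frac{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}.$ (15)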

In case of $a=1$ this simplifies to the method presented in [31]. There it has already been seen that the approximation (15) performs much better for smaller sample sizes and/or dimensions than the above approaches with a fixed distribution. We will later rediscover this observation in Section 5 for our present design with general $a$ . The next theorem gives a mathematical reason for this approximation.

Theorem 4.1:

Under the conditions of Lemma 3.1 and one of the frameworks (3)-(5) we have that $K_{{f_{P}}}$ given in (15) has, under $H_{0}:\boldsymbol{T}\boldsymbol{\mu}=\boldsymbol{0}_{ad}$ , asymptotically

a) a standard normal distribution if $\beta_{1}\to 0$ as $N\to\infty$ ,

b) a standardized $\left(\chi_{1}^{2}-1\right)/\sqrt{2}$ distribution if $\beta_{1}\to 1$ as $N\to\infty$ .
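The moment matching behind $f_{P}$ can be checked numerically. Both $\sum_{s}\beta_{s}(C_{s}-1)/\sqrt{2}$ and $(\chi_{f}^{2}-f)/\sqrt{2f}$ have mean 0 and variance 1, so only the third moment has to be matched: the former has skewness $2\sqrt{2}\sum_{s}\beta_{s}^{3}$ and the latter $\sqrt{8/f}$. The sketch below, with covariance matrices and sample sizes chosen purely for illustration, confirms that both agree for $f=f_{P}$.

```python
import numpy as np

def pearson_df(T, V):
    """f_P = tr^3((T V)^2) / tr^2((T V)^3)."""
    TV = T @ V
    TV2 = TV @ TV
    return np.trace(TV2) ** 3 / np.trace(TV2 @ TV) ** 2

def skewness_of_weighted_chi2(T, V):
    """Skewness of sum_s beta_s (C_s - 1) / sqrt(2) with C_s iid chi^2_1."""
    lam = np.clip(np.linalg.eigvalsh(T @ V @ T), 0.0, None)
    beta = lam / np.sqrt(np.sum(lam ** 2))
    return 2.0 * np.sqrt(2.0) * np.sum(beta ** 3)

def skewness_of_K(f):
    """Skewness of (chi^2_f - f) / sqrt(2 f)."""
    return np.sqrt(8.0 / f)

d = 6
P_d = np.eye(d) - np.ones((d, d)) / d
T = np.kron(np.ones((2, 2)) / 2, P_d)             # time effect with a = 2
Sigma_1 = np.diag(np.arange(1.0, d + 1.0))        # hypothetical covariances
Sigma_2 = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))
V = np.zeros((2 * d, 2 * d))
V[:d, :d] = (10 / 4) * Sigma_1                    # N = 10, n_1 = 4
V[d:, d:] = (10 / 6) * Sigma_2                    # n_2 = 6
f_P = pearson_df(T, V)
print(f_P, skewness_of_weighted_chi2(T, V), skewness_of_K(f_P))
```

Note that $f_{P}$ always lies between 1 and the rank of $\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}$, which is why the approximation can interpolate between the two limits of Theorem 4.1.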

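Thus, compared to the approximation with a fixed limit distribution, the $K_{f_{P}}$ -approach would at least be asymptotically correct whenever $\beta_{1}\to\gamma\in\{0,1\}$ while always providing a three-moment approximation to the test statistic. To apply this result, an estimator for $f_{P}$ in (15) is needed. Since we have already found $A_{4}$ as an unbiased and ratio-consistent estimator for $\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2})$ , it remains to find an adequate one for $\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3})$ . A combination of both will then lead to a proper estimator for ${f_{P}}$ and $\tau_{P}={f_{P}}^{-1}$ , respectively. Again we prefer a direct estimation of the involved traces. To this end, we introduce normal random vectors

$\boldsymbol{Z}_{(\ell_{1},\ell_{2},\dots,\ell_{2a})}:=\left(\sqrt{\frac{N}{n_{1}}}\left(\boldsymbol{X}_{1,\ell_{1}}-\boldsymbol{X}_{1,\ell_{2}}\right)^{\top},\dots,\sqrt{\frac{N}{n_{a}}}\left(\boldsymbol{X}_{a,\ell_{2a-1}}-\boldsymbol{X}_{a,\ell_{2a}}\right)^{\top}\right)^{\top}$

with $1\leq\ell_{2i-1}\neq\ell_{2i}\leq n_{i}$ for all $i=1,\dots,a$ . Note that these vectors are multivariate normally distributed with ${\mathbb{E}}(\boldsymbol{Z}_{\left(\ell_{1},\ell_{2},\dots,\ell_{2a-1},\ell_{2a}\right)})={\bf 0}_{ad}$ and $\operatorname{{\it Cov}}\left(\boldsymbol{Z}_{\left(\ell_{1},\ell_{2},\dots,\ell_{2a-1},\ell_{2a}\right)}\right)=2\bigoplus_{i=1}^{a}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}=2\boldsymbol{V}_{N}$ . Utilizing their particular form, it is shown in the supplement that a cyclic combination of these random vectors yields an unbiased estimator for $\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3})$ . In particular, writing $\boldsymbol{Z}_{(\ell_{1},\ell_{2})}$ for $\boldsymbol{Z}_{(\ell_{1},\ell_{2},\ell_{1},\ell_{2},\dots,\ell_{1},\ell_{2})}$ we have

${\mathbb{E}}\left(\boldsymbol{Z}_{(1,2)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\boldsymbol{Z}_{(3,4)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\boldsymbol{Z}_{(5,6)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)=8\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).$

This motivates the definition of (for $n_{i}\geq 6$ )

$C_{5}=\sum_{\substack{\ell_{1,1},\dots,\ell_{6,1}=1\\ \ell_{1,1}\neq\dots\neq\ell_{6,1}}}^{n_{1}}\dots\sum_{\substack{\ell_{1,a},\dots,\ell_{6,a}=1\\ \ell_{1,a}\neq\dots\neq\ell_{6,a}}}^{n_{a}}\frac{\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})}{8\cdot\prod_{i=1}^{a}\frac{n_{i}!}{(n_{i}-6)!}},$ (17)

where

$\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})},$

$\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{5,1},\ell_{6,1},\dots,\ell_{5,a},\ell_{6,a})},$

$\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})=\boldsymbol{Z}_{(\ell_{5,1},\ell_{6,1},\dots,\ell_{5,a},\ell_{6,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}.$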

Its properties together with a consistent estimator for $f_{P}$ are summarized below.

Lemma 4.1:

(a) The estimator $C_{5}$ given in (17) is unbiased for $\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3})$.

(b) Suppose that $a\in{\mathbb{N}}$ is fixed. Then $\widehat{\tau}_{P}:=C_{5}^{2}/A_{4}^{3}$ is a consistent estimator for $\tau_{P}=1/f_{P}$ as ${\min(d,n_{1},\dots,n_{a})\to\infty}$, i.e. we have convergence in probability

[TABLE]

(c) Now suppose that $a\to\infty$ and that there exists some $p>1$ such that $\min(n_{1},\dots,n_{a})=\mathcal{O}\left(a^{p}\right)$ . Then (18) even holds under the asymptotic frameworks (4) - (5).

Theorem 4.2:

Suppose (18). Then, Theorem 4.1 remains valid if we replace $f_{P}$ by its estimator $\widehat{f}_{P}=1/\widehat{\tau}_{P}$ .

Remark 4.2:

(a) Using similar arguments as in the proof of Lemma 8.1 of [31] we obtain the equivalences $\beta_{1}\to 0\Leftrightarrow\tau_{P}\to 0$ and $\beta_{1}\to 1\Leftrightarrow\tau_{P}\to 1$. Thus, $\widehat{\tau}_{P}$ can also be used as a check criterion for these two cases.

(b) It is also possible to derive a consistent estimator for $\tau_{CQ}={\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4})}/{\operatorname{tr}^{2}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2})}$ ${=1/f_{CQ}}$, a key quantity in [11]; see the supplement for details concerning the estimator. The corresponding approximation by the sequence $K_{f_{CQ}}$ even shares the asymptotic properties of the Pearson approximation (15) stated in Theorem 4.1 and Theorem 4.2. However, it only provides a two moment approximation, which turned out to perform worse in simulations (results not shown).

(c) In the supplement, we additionally present an unbiased estimator $C_{7}$ for $\operatorname{tr}(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3})$ such that $C_{7}^{2}/A_{4}^{3}$ is consistent for $\tau_{P}$ in all asymptotic frameworks (3) - (5). In particular, the extra condition $\min(n_{1},\dots,n_{a})=\mathcal{O}\left(a^{p}\right)$ is not needed. However, it is computationally more expensive than $C_{5}$ and thus omitted here.

In practical applications, the computational costs for $C_{5}$ are nevertheless rather high. This leads to disproportionately long waiting times for $p$-values of the corresponding approximate test $\varphi_{N}=\mathbf{1}\{W_{N}>K_{\widehat{f}_{P};1-\alpha}\}$, where the critical value is given as the $(1-\alpha)$-quantile of $K_{\widehat{f}_{P}}$. Therefore, we propose a subsampling-type method. Since the unbiasedness of $C_{5}$ clearly stems from (16), it seems reasonable to proceed as follows: For each $i=1,\dots,a$ and $b=1,\dots,B$ we independently draw random subsamples $\{\sigma_{1i}(b),\dots,\sigma_{6i}(b)\}$ of length $6$ from $\{1,\dots,n_{i}\}$ and store them in a joint random vector $\boldsymbol{\sigma}(b)=(\sigma_{11}(b),\dots,\sigma_{6a}(b))$. Then, a subsampling version of the estimator $C_{5}$ is given by

[TABLE]

Letting $B=B(N)\to\infty$ as $N\to\infty$, it is easy to see (see the supplement for details) that ${C_{5}^{\star}}$ has the same asymptotic properties as ${C_{5}}$. In particular, it is stated in the supplement that $\widehat{\tau}_{P}^{\star}:=1/\hat{f}_{P}^{\star}:=C_{5}^{\star 2}/A_{4}^{3}$ is a consistent estimator for $\tau_{P}$ and that the approximation $K_{\hat{f}_{P}^{\star}}$ has the same weak limits as $K_{\hat{f}_{P}}$ stated in Theorem 4.2. This leads to $\varphi_{N}^{\star}=\mathbf{1}\{W_{N}>K_{\hat{f}^{\star}_{P};1-\alpha}\}$, which is an asymptotically exact test whenever $\beta_{1}\to\gamma\in\{0,1\}$. The performance of this approximation for finite sample sizes, dimensions and group sizes is investigated in the subsequent section.
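The index-drawing step of this subsampling scheme can be sketched as follows. This is only a minimal Python illustration and not the authors' R implementation: the function name `draw_subsample_indices` and its arguments are hypothetical, and the sketch assumes that the six indices within a group are drawn without replacement. It produces the index vectors $\boldsymbol{\sigma}(b)$ only; evaluating $C_{5}^{\star}$ on them additionally requires the summand from the display above.

```python
import random

def draw_subsample_indices(group_sizes, B, m=6, seed=None):
    """Draw, for b = 1..B and each group i, m distinct indices from {1, ..., n_i}.

    Returns a list of length B. Entry b is a list with one index list per
    group, i.e. the blocks of the joint vector sigma(b).
    Assumption: indices within a group are drawn without replacement.
    """
    if any(n < m for n in group_sizes):
        raise ValueError("every group needs at least m observations")
    rng = random.Random(seed)
    draws = []
    for _ in range(B):
        sigma_b = [rng.sample(range(1, n + 1), m) for n in group_sizes]
        draws.append(sigma_b)
    return draws

# Two groups with n_1 = 10, n_2 = 15 and B = 500 * N subsamples, N = 25.
sigma = draw_subsample_indices(group_sizes=[10, 15], B=500 * 25, seed=1)
```

Each draw costs only a handful of index selections, so the number $B$ of subsamples, rather than the group sizes, governs the run time of this step.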

5 Simulations

In the previous sections we considered the asymptotic properties of the proposed inference methods which are valid for large sample and fixed or possibly large dimension and/or group sizes. Here we investigate the small sample properties of our proposed approximation procedure ${\varphi_{N}^{\star}=\mathbf{1}\{W_{N}>K_{\hat{f}^{\star}_{P};1-\alpha}\}}$ in comparison to the statistical tests $\psi_{z}$ and $\psi_{\chi}$ based on fixed critical values. In particular, we compare these procedures in simulation studies with respect to

(a) their type I error rate control under the null hypothesis (Section 5.1) and

(b) their power behaviour under various alternatives (Section 5.2).

All simulations were performed with the help of the R computing environment (R Development Core Team, 2013), each with $n_{sim}=10^{4}$ simulation runs.

5.1 Asymptotic distribution and Type I error control

First we study the speed of convergence, i.e. type I error control, of the three different tests under the null hypothesis. To be in line with the simulation results presented in [31] for the case $a=1$ we also multiplied the statistic $W_{N}$ by $\sqrt{{N}/{(N-1)}}$ to avoid a slightly liberal behaviour.

Due to the abundance of different split-plot designs and the more methodological focus of the paper, we restrict our simulation study to two specific null hypotheses and a high dimensional and heteroscedastic two-sample setting. In particular, we investigate the type-I-error behaviour of all three tests for the null hypotheses

• ${H_{0}^{a}:\left(\boldsymbol{P}_{2}\otimes\frac{1}{d}\boldsymbol{J}_{d}\right)\boldsymbol{\mu}={\bf 0}}$ and

• ${H_{0}^{b}:\left(\frac{1}{2}\boldsymbol{J}_{2}\otimes\boldsymbol{P}_{d}\right)\boldsymbol{\mu}={\bf 0}}$.
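To make the two hypothesis matrices concrete, the following sketch builds them for a small dimension in plain Python. It assumes the usual conventions $\boldsymbol{J}_{d}=\boldsymbol{1}_{d}\boldsymbol{1}_{d}^{\top}$ and $\boldsymbol{P}_{d}=\boldsymbol{I}_{d}-\frac{1}{d}\boldsymbol{J}_{d}$, which are not restated in this section; all function names are hypothetical helpers introduced for illustration.

```python
def identity(n):
    """I_n as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def averaging(n):
    """(1/n) J_n, where J_n is the n x n matrix of ones."""
    return [[1.0 / n] * n for _ in range(n)]

def centering(n):
    """P_n = I_n - (1/n) J_n (assumed definition of the centering matrix)."""
    I, A = identity(n), averaging(n)
    return [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def max_abs_diff(A, B):
    """Largest absolute entrywise difference of two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

d = 5
T_a = kron(centering(2), averaging(d))  # hypothesis matrix of H_0^a
T_b = kron(averaging(2), centering(d))  # hypothesis matrix of H_0^b

# Flat time profiles with different group levels: mu = (mu_1', mu_2')'.
mu = [1.0] * d + [2.0] * d
effect_a = matvec(T_a, mu)  # non-zero: the group means differ
effect_b = matvec(T_b, mu)  # zero: the pooled profile is flat
```

For this mean vector only $H_{0}^{a}$ is violated, since the two groups differ in level while neither profile varies over time. Both matrices are symmetric and idempotent, so they are already projectors.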

In both cases sample sizes were chosen from $n_{1}\in\{10,20,50\}$ and $n_{2}\in\{15,30,75\}$ combined with various choices of dimensions $d\in\{5,10,20,40,70,100,150,200,300,450,600,800\}$. For the covariance matrices a heteroscedastic setting with autoregressive structures $\left(\boldsymbol{\Sigma}_{1}\right)_{i,j}=0.6^{|i-j|}$ and $\left(\boldsymbol{\Sigma}_{2}\right)_{i,j}=0.65^{|i-j|}$ was chosen, and for each simulation run $B(N)=500\cdot N$ subsamples, with $N=n_{1}+n_{2}$, were drawn.

Note that these settings imply $\beta_{1}\to 1$ for $H_{0}^{a}$ and $\beta_{1}\to 0$ for $H_{0}^{b}$ , see the supplement for details.

Thus, $\varphi_{N}^{\star}$ is asymptotically exact in both cases while $\psi_{\chi}$ and $\psi_{z}$ possess the asymptotic behaviour given in Table 1. In particular, the $z$-test $\psi_{z}$ should be rather liberal for testing $H_{0}^{a}$ and $\psi_{\chi}$ strongly conservative for $H_{0}^{b}$. All these theoretical findings can be recovered in our simulations: The results for $H_{0}^{a}$, displayed in Figure 1, show an inflated type I error level of $\psi_{z}$ of around $8\%$ for smaller sample sizes ($N=25$). For larger sample sizes ($N=125$) it stabilizes in the region of its asymptotic level of $6.8\%\pm 0.2\%$. Moreover, the error control is only slightly affected by the varying dimensions under investigation. In comparison, the two asymptotically correct tests $\varphi_{N}^{\star}$ and $\psi_{\chi}$ are slightly liberal for smaller sample sizes and more or less asymptotically correct for moderate ($N=50$) to larger sample sizes. Here, it is remarkable that the results of both procedures nearly coincide, suggesting a fast convergence of the degrees of freedom estimator $\widehat{f_{P}}$.
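The asymptotic levels of the two tests with fixed critical values can be checked by a short calculation. The sketch below assumes that $\psi_{z}$ rejects when $W_{N}$ exceeds the standard normal quantile $z_{1-\alpha}$ and that $\psi_{\chi}$ rejects when $W_{N}$ exceeds $(\chi^{2}_{1;1-\alpha}-1)/\sqrt{2}$; these definitions are not restated in this section and are therefore assumptions. It uses only the identity $\chi_{1}^{2}=Z^{2}$ for a standard normal $Z$.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

# Fixed critical values (assumed definitions of psi_z and psi_chi).
z_crit = std_normal.inv_cdf(1 - alpha)
chi1_quantile = std_normal.inv_cdf(1 - alpha / 2) ** 2  # chi^2_1 = Z^2
chi_crit = (chi1_quantile - 1) / sqrt(2)

def upper_tail_std_chi1(c):
    """P((chi^2_1 - 1)/sqrt(2) > c), valid for c > -1/sqrt(2)."""
    return 2 * (1 - std_normal.cdf(sqrt(1 + sqrt(2) * c)))

# Level of psi_z when the true limit is the standardized chi^2_1 law.
level_z_under_chi = upper_tail_std_chi1(z_crit)
# Level of psi_chi when the true limit is standard normal.
level_chi_under_normal = 1 - std_normal.cdf(chi_crit)

print(round(level_z_under_chi, 4), round(level_chi_under_normal, 4))
```

The first value is close to the asymptotic level of $6.8\%$ quoted above for $\psi_{z}$ under $H_{0}^{a}$, while the second lies well below $5\%$, in line with the conservative behaviour of $\psi_{\chi}$ expected under $H_{0}^{b}$.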

The results for $H_{0}^{b}$, presented in Figure 2, are slightly different. In particular, both tests $\psi_{\chi}$ and $\psi_{z}$, which depend on fixed critical values, are more strongly affected by the underlying dimension: For smaller dimensions ($d<100$) the true level is considerably larger than their asymptotic level given in Table 1, resulting in a rather liberal behaviour of $\psi_{z}$ and close to exact type I error control for $\psi_{\chi}$. This effect decreases with increasing sample sizes. Moreover, for larger dimensions ($d\geq 200$) both tests approach their asymptotic level. In comparison, the procedure $\varphi_{N}^{\star}$ based on the $K_{\hat{f}^{\star}_{P}}$ approximation shows a fairly good $\alpha$ level control across all dimension and sample size settings, making it the method of choice.

5.2 Power Performance

We examined the power of the three procedures. Again a heteroscedastic two group split-plot design with autoregressive covariance structures ( $\left(\boldsymbol{\Sigma}_{1}\right)_{i,j}=0.6^{|i-j|}$ and $\left(\boldsymbol{\Sigma}_{2}\right)_{i,j}=0.65^{|i-j|}$ ) was selected. The alpha level ( $5\%$ ) and the null hypotheses were chosen as above ( $H_{0}^{a}:\left(\boldsymbol{P}_{2}\otimes\frac{1}{d}\boldsymbol{J}_{d}\right)\boldsymbol{\mu}={\bf 0}$ and $H_{0}^{b}:\left(\frac{1}{2}\boldsymbol{J}_{2}\otimes\boldsymbol{P}_{d}\right)\boldsymbol{\mu}={\bf 0}$ ). The investigated alternatives were

• a trend alternative for both hypotheses with $\boldsymbol{\mu}_{2}={\bf 0}_{d}$ and $\mu_{1,t}=t\cdot\delta/d,t\in{\mathbb{N}}_{d}$ for $\delta\in[0,3]$ and additionally

• a shift alternative for $H_{0}^{a}$ with $\boldsymbol{\mu}_{2}={\bf 0}_{d}$ and $\boldsymbol{\mu}_{1}=\textbf{1}\cdot\delta$ and

• a one-point alternative for $H_{0}^{b}$, with $\boldsymbol{\mu}_{2}={\bf 0}_{d}$ and $\boldsymbol{\mu}_{1}=\boldsymbol{e}_{1}\cdot\delta$,

each with increasing $\delta\in[0,3]$. We only considered the moderate sample size setting with $n_{1}=20$ and $n_{2}=30$ together with three choices of dimensions $d\in\{10,40,100\}$. The results can be found in Figures 3 and 4.
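The simulation ingredients described above are simple enough to write down directly. The following Python sketch constructs the three mean vectors $\boldsymbol{\mu}_{1}$ and the autoregressive covariance matrices; the function names are hypothetical, and the sketch covers the parameter construction only, not the data generation or the tests themselves.

```python
def trend_alternative(d, delta):
    """mu_{1,t} = t * delta / d for t = 1, ..., d."""
    return [t * delta / d for t in range(1, d + 1)]

def shift_alternative(d, delta):
    """mu_1 = delta * 1_d."""
    return [delta] * d

def one_point_alternative(d, delta):
    """mu_1 = delta * e_1, i.e. only the first component is shifted."""
    return [delta] + [0.0] * (d - 1)

def ar1_covariance(d, rho):
    """Autoregressive structure (Sigma)_{i,j} = rho^{|i-j|}."""
    return [[rho ** abs(i - j) for j in range(d)] for i in range(d)]

d, delta = 10, 1.5
mu_trend = trend_alternative(d, delta)
mu_shift = shift_alternative(d, delta)
mu_point = one_point_alternative(d, delta)
sigma_1 = ar1_covariance(d, 0.6)   # covariance of group 1
sigma_2 = ar1_covariance(d, 0.65)  # covariance of group 2
```

In all three cases $\boldsymbol{\mu}_{2}={\bf 0}_{d}$, so the size of the deviation from the null hypothesis is controlled by $\delta$ alone.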

It can be readily seen that the power depends on the type of alternative: For the trend (Figure 3) and the shift alternative (left panel of Figure 4) the power gets larger with increasing dimension. This is especially apparent for the shift alternative, where the power increases considerably from $d=10$ to $d=40$. In contrast, for the one-point alternative the power becomes smaller for higher dimensions $d$ (right panel of Figure 4). This is as expected, however, since a difference in one single component can be detected more easily for smaller $d$.

6 Analysis of a sleep laboratory data set

Finally, the new methods are exemplified on the sleep laboratory trial reported in [22]. In this two-armed repeated measures trial, the activity of prostaglandin-D-synthase ($\beta$-trace) was measured every 4 hours over a period of 4 days. The grouping factor was gender and the above $d=24$ repeated measures were observed on $n_{i}=10$ young healthy men (group $i=2$) and women (group $i=1$). Since each day corresponded to a certain sleep condition, the repeated measures are structured by two crossed fixed factors:

• intervention (with $4$ levels: normal sleep, sleep deprivation, recovery sleep and REM sleep deprivation) and

• time (with the $6$ levels/time points $24h,4h,8h,12h,16h$ and $20h$).

Due to $d>n_{i}$ we are thus dealing with a high-dimensional split-plot design with $a=2$ groups and $d=24$ repeated measures. The time profiles of each subject are displayed in Figure 5 (for the female group $1$) and Figure 6 (for the male group $2$). We note that group-specific profile analysis could already be performed by the methods given in [31]. In particular, a significant intervention effect and a borderline time effect were found there for the male group. For the current two-sample design additional questions concern (1) whether there is a gender effect, i.e. whether the time profiles of the groups differ, and if so (2) whether they differ with respect to certain interventions.

Moreover, investigations regarding (3) a general effect of time and (4) interactions between the different factors are of equal interest. Utilizing the notation from Section 2, the corresponding null hypotheses can be formalized via adequate contrast matrices. In particular, we are interested in testing the null hypotheses

(a) No gender effect: $H_{0}^{a}:\left(\boldsymbol{P}_{2}\otimes\frac{1}{24}\boldsymbol{J}_{24}\right)\boldsymbol{\mu}={\bf 0}$,

(b) No time effect: $H_{0}^{b}:\left(\frac{1}{2}\boldsymbol{J}_{2}\otimes\boldsymbol{P}_{24}\right)\boldsymbol{\mu}={\bf 0}$,

(c) No interaction effect between time and group: $H_{0}^{ab}:\left(\boldsymbol{P}_{2}\otimes\boldsymbol{P}_{24}\right)\boldsymbol{\mu}={\bf 0}$,

(d) No time effect for intervention $\ell$, $\ell\in\{1,\dots,4\}$: $H_{0}^{t\ell}:\left(\boldsymbol{P}_{2}\otimes\left(\left(\boldsymbol{e}_{\ell}\cdot\boldsymbol{e}_{\ell}^{\top}\right)\otimes\boldsymbol{P}_{6}\right)\right)\boldsymbol{\mu}={\bf 0}$,

(e) No effect between interventions $\ell$ and $k$, $\ell,k\in\{1,\dots,4\}$: $H_{0}^{\ell\times k}:\left(\boldsymbol{P}_{2}\otimes\left(\left(\boldsymbol{e}_{\ell}\cdot\boldsymbol{e}_{\ell}^{\top}-\boldsymbol{e}_{\ell}\cdot\boldsymbol{e}_{k}^{\top}\right)\otimes\frac{1}{6}\boldsymbol{J}_{6}\right)\right)\boldsymbol{\mu}={\bf 0}$,

where $\boldsymbol{e}_{\ell}=(\delta_{\ell j})_{j}$ denotes the $\ell$-th unit vector, with $\delta_{\ell j}$ the Kronecker delta. Applying the test $\varphi_{N}^{\star}$ based on the standardized quadratic form $W_{N}$ as test statistic and the proposed $K_{\hat{f}_{P}^{\star}}$-approximation with $B=50000\cdot N=100,000$ subsamples we obtain the results summarized in Table 2.

There it can be readily seen that most hypotheses cannot be rejected at level $\alpha=5\%$. In particular, there is no evidence for an overall gender effect, so that we have not performed post-hoc analyses on the interventions. Only a highly significant time effect, as well as a significant effect between the first two interventions (normal sleep and sleep deprivation), could be detected. However, after applying a multiplicity adjustment (Bonferroni or Holm), only the time effect remained significant.

7 Conclusion & Outlook

In this paper we have investigated inference procedures for general split-plot models, allowing for unbalanced and/or heteroscedastic covariance settings as well as a factorial structure on the whole- and sub-plot factors. Inspired by the work of [31] for one group repeated measures designs, the test statistics were based on standardized quadratic forms. In contrast to their work, however, novel symmetrized $U$-statistics were introduced to adequately handle the problem of additional nuisance parameters in the multiple sample case.

To jointly cover low- and high-dimensional models as well as situations with a small or large number of groups we conducted an in-depth study of their asymptotic behaviour under a unified asymptotic framework. In particular, the number of groups $a$ and dimensions $d$ may be fixed as in classical asymptotic settings, or even converge to infinity. Here we postulate neither assumptions on how $d$ and/or $a$ and the underlying sample sizes converge to infinity nor any sparsity conditions on the covariance structures, since such assumptions are usually hard to check for a practical data set at hand. As a consequence, it turned out that the test statistic possesses a whole continuum of asymptotic limits that depend on the eigenvalues of the underlying covariances. We thus argued that an approximation by a fixed critical value is not adequate and proposed an approximation by a sequence of standardized $\chi^{2}$-distributions with estimated degrees of freedom. For computational efficiency we additionally provided a subsampling-type version of the degrees of freedom estimator. Our approach provides a reasonably good three moment approximation of the test statistic and is even asymptotically exact if the influence of the largest eigenvalue is negligible (leading to a standard normal limit) or decisive (leading to a standardized $\chi_{1}^{2}$ limit).

Apart from these asymptotic considerations we evaluated the finite sample and dimension performance of our approximation technique. In particular, for varying combinations of sample sizes and dimensions, we compared its power and type I error control with test procedures based on fixed critical values. In all designs it showed a quite accurate error control over all low- ($d\leq 10$) to high-dimensional situations (with up to $d=800$). In comparison, its performance was considerably better than that of the other two tests, which partly exhibited a rather liberal or conservative behaviour.

In future research we would like to extend the current results to general high-dimensional MANOVA designs, where we would also like to relax the involved assumption of multivariate normality and/or even test simultaneously for mean and covariance effects as recently proposed in [28]. These investigations, however, require completely different (e.g., martingale) techniques and estimators of the involved traces. Moreover, we also plan to conduct more detailed simulations (especially for larger group sizes $a$ and other covariance matrices) in a more applied paper.

Acknowledgement

The authors would like to thank Edgar Brunner for helpful discussions. This work was supported by the German Research Foundation project DFG-PA 2409/4-1.

Supplementary Material to

**'Inference For High-Dimensional Split-Plot-Designs: A Unified Approach for Small to Large Numbers of Factor Levels'**

Paavo Sattler and Markus Pauly

University of Ulm, Institute of Statistics

Abstract. In this supplement we present all theoretical derivations and computations that were omitted in the paper for lucidity.

Appendix A Appendix

We start with some preliminary results and lemmas.

A.1 Basics

In Section 2 of the main paper we claimed that the unique projection matrix $\boldsymbol{T}$ to the hypothesis matrix $\boldsymbol{H}=\boldsymbol{H}_{W}\otimes\boldsymbol{H}_{S}$ that equivalently describes the null is given by the Kronecker product of two projection matrices $\boldsymbol{T}_{W}\otimes\boldsymbol{T}_{S}$. We start with the proof of this claim:

Lemma A.1:

Let $\boldsymbol{H}=\boldsymbol{H}_{W}\otimes\boldsymbol{H}_{S}$ with $\boldsymbol{H}\in{\mathbb{R}}^{ad\times ad},\boldsymbol{H}_{W}\in{\mathbb{R}}^{a\times a},\boldsymbol{H}_{S}\in{\mathbb{R}}^{d\times d}$. For each hypothesis $\boldsymbol{H}\boldsymbol{\mu}=\textbf{0}$ with such a matrix $\boldsymbol{H}$ there exist projectors $\boldsymbol{T}\in{\mathbb{R}}^{ad\times ad},\boldsymbol{T}_{W}\in{\mathbb{R}}^{a\times a},\boldsymbol{T}_{S}\in{\mathbb{R}}^{d\times d}$ which can be used to formulate the same null hypothesis $\boldsymbol{T}\boldsymbol{\mu}=\textbf{0}$ with $\boldsymbol{T}=\boldsymbol{T}_{W}\otimes\boldsymbol{T}_{S}$.

Proof:

It is known that the projector $\boldsymbol{T}=\boldsymbol{H}^{\top}[\boldsymbol{H}\boldsymbol{H}^{\top}]^{-}\boldsymbol{H}$ fulfills $\boldsymbol{T}\boldsymbol{\mu}=\textbf{0}\Leftrightarrow\boldsymbol{H}\boldsymbol{\mu}=\textbf{0}$. For this reason, and utilizing well-known rules for generalized inverses (see for example [33]), we obtain

[TABLE]

Thus, $\boldsymbol{T}_{W}:=\boldsymbol{H}_{W}^{\top}[\boldsymbol{H}_{W}\boldsymbol{H}_{W}^{\top}]^{-}\boldsymbol{H}_{W}$ and $\boldsymbol{T}_{S}:=\boldsymbol{H}_{S}^{\top}[\boldsymbol{H}_{S}\boldsymbol{H}_{S}^{\top}]^{-}\boldsymbol{H}_{S}$ are projectors, i.e. idempotent and symmetric. ∎

To prove our main results we have to compare various traces of powers of combinations of the underlying covariance matrices. To this end, we will in particular apply the following inequalities:

Lemma A.2:

For positive real numbers $a,b$ and a symmetric matrix $\boldsymbol{A}\in{\mathbb{R}}^{d\times d}$ it holds

[TABLE]

For $\boldsymbol{A}\in{\mathbb{R}}^{d\times d}$ symmetric with eigenvalues $\lambda_{1},\dots,\lambda_{d}\geq 0$ it holds that

[TABLE]

If $\boldsymbol{\Sigma}_{i}\in{\mathbb{R}}^{d\times d}$ is positive definite and symmetric and $\boldsymbol{T}\in{\mathbb{R}}^{d\times d}$ is idempotent and symmetric it holds for every $k\in{\mathbb{N}}$ that

[TABLE]

Proof:

The first part is an application of the Cauchy–Bunyakovsky–Schwarz inequality, with the Frobenius inner product. Therefore

[TABLE]

The second part just uses the binomial theorem together with the condition $\lambda_{t}\geq 0$ for $t=1,\dots,d$ :

[TABLE]

Finally, the last inequality follows from the second one, if we show that all conditions are fulfilled. With idempotence of $\boldsymbol{T}$ and invariance of the trace under cyclic permutations, it follows for all $k\in{\mathbb{N}}$ that

[TABLE]

Thus, it is sufficient to consider this term. Since $\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}$ is symmetric all powers are symmetric too and it follows with $k^{\prime}=\lfloor k/2\rfloor$ that

[TABLE]

since $\boldsymbol{\Sigma}_{i}$ and $\boldsymbol{I}_{d}$ are positive definite and ${k-2k^{\prime}}\in\{0,1\}$. So both conditions of the second inequality are shown and

[TABLE]

∎

Furthermore, an inequality for traces which contain $\boldsymbol{\Sigma}_{i}$ and $\boldsymbol{\Sigma}_{r}$ is needed.

Lemma A.3:

Let $\boldsymbol{\Sigma}_{i},\boldsymbol{\Sigma}_{r}\in{\mathbb{R}}^{d\times d}$ be positive definite and symmetric matrices and suppose that $\boldsymbol{T}\in{\mathbb{R}}^{d\times d}$ is idempotent and symmetric. Then it holds for $i\neq r$ that

[TABLE]

Proof:

As shown before, $\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}$ and $\boldsymbol{T}\boldsymbol{\Sigma}_{r}\boldsymbol{T}$ are symmetric and positive semidefinite. For this reason, a symmetric matrix $\boldsymbol{W}$ exists with $\boldsymbol{W}\boldsymbol{W}=\boldsymbol{T}\boldsymbol{\Sigma}_{r}\boldsymbol{T}$. Due to the fact that all matrices are symmetric it holds

[TABLE]

and because $\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}$ is positive semidefinite also

[TABLE]

This allows us to use the inequalities from above for this matrix, and again utilizing the invariance of the trace under cyclic permutations we obtain

$\begin{array}[]{ll}\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{\Sigma}_{r}\right)^{2}\right)&=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{T}\boldsymbol{\Sigma}_{r}\boldsymbol{T}\cdot\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{T}\boldsymbol{\Sigma}_{r}\boldsymbol{T}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\boldsymbol{W}\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\boldsymbol{W}\right)\\ &=\operatorname{tr}\left(\boldsymbol{W}\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\boldsymbol{W}\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\right)=\operatorname{tr}\left(\left(\boldsymbol{W}\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\right)^{2}\right)\\ &\leq\operatorname{tr}^{2}\left(\boldsymbol{W}\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\right)=\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{W}\boldsymbol{W}\right)=\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{T}\boldsymbol{\Sigma}_{r}\boldsymbol{T}\right)\\ &=\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{i}\boldsymbol{T}\boldsymbol{\Sigma}_{r}\right).\end{array}$

∎

To standardize the quadratic form we also have to calculate its moments. Here, the following theorem helps:

Theorem A.4:

Let $\boldsymbol{T}\in{\mathbb{R}}^{d\times d}$ be a symmetric matrix and ${\boldsymbol{X}}\sim\mathcal{N}_{d}\left(\boldsymbol{\mu}_{X},\boldsymbol{\Sigma}_{X}\right),$ where $\boldsymbol{\Sigma}_{X}$ is positive definite. Then with $r\in{\mathbb{N}}$ it holds,

[TABLE]

with $g^{\left(k\right)}=2^{k}k!\left[\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)^{k+1}\right)+\left(k+1\right){\boldsymbol{\mu}_{X}}^{\top}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)^{k}\boldsymbol{T}\boldsymbol{\mu}_{X}\right]$ for $k\in{\mathbb{N}}$ and $g^{\left(0\right)}=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)+{\boldsymbol{\mu}_{X}}^{\top}\boldsymbol{T}\boldsymbol{\mu}_{X}$.

Proof:

The proof can be found on page 53 in [29]. ∎

Corollary A.5:

Let $\boldsymbol{T}\in{\mathbb{R}}^{d\times d}$ be a symmetric matrix and let $\boldsymbol{X}\sim\mathcal{N}_{d}\left(\boldsymbol{0}_{d},\boldsymbol{\Sigma}_{X}\right)$ and ${\boldsymbol{Y}}\sim\mathcal{N}_{d}\left(\boldsymbol{0}_{d},\boldsymbol{\Sigma}_{Y}\right)$ be independent, where $\boldsymbol{\Sigma}_{X},\boldsymbol{\Sigma}_{Y}\in{\mathbb{R}}^{d\times d}$ are positive definite. Then we have for all $n_{i},n_{r},N\in{\mathbb{N}}$ that

$\begin{array}{l}{\mathbb{E}}\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{X}}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right),\\ {\mathbb{E}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{X}}\right)^{2}\right)=2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)^{2}\right)+\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)=\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)\right),\\ \operatorname{{\it Var}}\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{X}}\right)=\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)\right),\end{array}$

$\begin{array}{l}{\mathbb{E}}\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)=0,\\ {\mathbb{E}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{2}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right),\\ {\mathbb{E}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{3}\right)=0,\\ {\mathbb{E}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{4}\right)=6\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right)^{2}\right)+3\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right),\end{array}$

$\begin{array}{l}\operatorname{{\it Var}}\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right),\\ \operatorname{{\it Var}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{2}\right)=6\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right)^{2}\right)+2\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right),\\ \frac{4N}{n_{i}^{2}n_{r}^{2}}\operatorname{{\it Var}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{2}\right)=\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\frac{N}{n_{i}}\boldsymbol{T}\boldsymbol{\Sigma}_{X}\cdot\frac{N}{n_{r}}\boldsymbol{T}\boldsymbol{\Sigma}_{Y}\right)^{2}\right)\right).\end{array}$

Moreover, for $\boldsymbol{\Sigma}_{X}=\boldsymbol{\Sigma}_{Y}$

$\begin{array}{l}\operatorname{{\it Var}}\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)=\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)\right),\\ \operatorname{{\it Var}}\left(\left({\boldsymbol{X}}^{\top}\boldsymbol{T}{\boldsymbol{Y}}\right)^{2}\right)=\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{\Sigma}_{X}\boldsymbol{T}\boldsymbol{\Sigma}_{X}\right)\right).\end{array}$

Proof:

Using the inequalities for traces and with the bilinear form written as

[TABLE]

all equations follow from the previous theorem. ∎

Lemma A.6:

Let $X_{n}\in\mathcal{L}^{2}$ be a real random variable with ${\mathbb{E}}(X_{n})=\mu$, $b_{n,d}$ a sequence with ${\lim_{n,d\to\infty}b_{n,d}=0}$, and $c_{a,d,n_{\min}}$ a sequence with $\lim_{a,d,n_{\min}\to\infty}c_{a,d,n_{\min}}=0$. Then it holds

• $\operatorname{{\it Var}}\left({X_{n}}\right)\leq b_{n,d}\ \ \Rightarrow\ {X}_{n}\text{ is a consistent estimator for }\mu,{\text{ if }n,d\to\infty,}$

• $\operatorname{{\it Var}}\left({X_{n}}\right)\leq c_{a,d,n_{\min}}\ \Rightarrow{X}_{n}\text{ is a consistent estimator for }\mu,{\text{ if }a,d,n_{\min}\to\infty.}$

In particular, for $\mu\neq 0$ they are ratio-consistent.

Proof:

For arbitrary $\epsilon>0$ the Chebyshev inequality leads to

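$\mathbb{P}\left(|X_{n}-\mu|\geq\epsilon\right)\leq\frac{\operatorname{{\it Var}}\left(X_{n}\right)}{\epsilon^{2}}\leq\frac{b_{n,d}}{\epsilon^{2}}.$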

Considering the limit for $n,d\to\infty$ yields consistency, and applying this to $X_{n}/\mu$ yields ratio-consistency. The second part follows identically. ∎

This result holds in particular if $b_{n,d}$ or $c_{a,d,n_{\min}}$ depends only on $n$ or $n_{\min}$, respectively.

For completeness we state a straightforward application of the Cauchy–Bunyakovsky–Schwarz inequality:

Lemma A.7:

For real random variables $X,Y\in\mathcal{L}^{2}$ it holds

[TABLE]

and so for $X,Y$ identically distributed

[TABLE]

The next result gives equivalent conditions for $\beta_{1}\to\gamma\in\{0,1\}$:

Lemma A.8:

Let $\lambda_{\ell}$ again denote the eigenvalues of $\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}$, sorted so that $\lambda_{1}$ is the largest one. Then it follows

[TABLE]

Moreover, we know $0\leq\frac{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=\tau_{P}\leq 1.$ This lemma also holds if $\lim_{N,d\to\infty}$ is replaced by $\lim_{a,N\to\infty}$ or $\lim_{a,d,N\to\infty}$.

Proof:

This follows from Lemma 8.1 given in the supplement of [31, p. 21], since their result does not depend on the concrete matrix, i.e. it can be directly applied to $\boldsymbol{V}_{N}$. Moreover, the different asymptotic frameworks do not influence the proof since they are hidden within the above convergences. ∎
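The relation between $\beta_{1}$ and $\tau_{P}$ can be illustrated numerically from a list of eigenvalues. The sketch below assumes $\beta_{1}=\lambda_{1}/\sqrt{\sum_{\ell}\lambda_{\ell}^{2}}$ (the notation of the main paper, not restated here) and uses $\operatorname{tr}((\boldsymbol{T}\boldsymbol{V}_{N})^{k})=\sum_{\ell}\lambda_{\ell}^{k}$; the function names and the example spectra are hypothetical.

```python
from math import sqrt

def beta_1(eigenvalues):
    """Largest eigenvalue divided by the root of the sum of squares."""
    return max(eigenvalues) / sqrt(sum(x * x for x in eigenvalues))

def tau_P(eigenvalues):
    """tau_P = tr^2((T V_N)^3) / tr^3((T V_N)^2), computed from the eigenvalues."""
    s2 = sum(x ** 2 for x in eigenvalues)
    s3 = sum(x ** 3 for x in eigenvalues)
    return s3 ** 2 / s2 ** 3

# Many equal eigenvalues: no single eigenvalue dominates.
flat = [1.0] * 100
# One eigenvalue that dominates all others.
spiked = [1000.0] + [1.0] * 100

for name, lam in (("flat", flat), ("spiked", spiked)):
    print(name, round(beta_1(lam), 4), round(tau_P(lam), 4), round(1 / tau_P(lam), 2))
```

For the flat spectrum both quantities are small and the implied degrees of freedom $f_{P}=1/\tau_{P}$ are large, whereas for the spiked spectrum both are close to one, matching the two boundary cases of the lemma.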

To prove the properties of the subsampling-type estimators some auxiliaries are needed. In particular, the following lemma allows us to decompose the variances and to use conditional terms for the calculation.

Lemma A.9:

Let $X$ be a real random variable and denote by $\mathcal{F}$ a $\sigma$ -field. Then it holds that

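$\operatorname{{\it Var}}\left(X\right)={\mathbb{E}}\left(\operatorname{{\it Var}}\left(X|\mathcal{F}\right)\right)+\operatorname{{\it Var}}\left({\mathbb{E}}\left(X|\mathcal{F}\right)\right).$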

Proof:

With the rules for conditional expectations we calculate

[TABLE]

The result follows by summing both parts. ∎

We will apply the result to the numbers of certain pairs below. There, for each $i=1,\dots,a$ and $b=1,\dots,B$ we independently draw random subsamples $\{\sigma_{1i}(b),\dots,\sigma_{mi}(b)\}$ of length $m$ from $\{1,\dots,n_{i}\}$ and store them in a joint random vector $\boldsymbol{\sigma}(b,m)=(\boldsymbol{\sigma}_{1}(b,m),\dots,\boldsymbol{\sigma}_{a}(b,m))=(\sigma_{11}(b),\dots,\sigma_{ma}(b))$. In addition, we define ${\mathbb{N}}_{k}=\{1,\dots,k\}$.

Lemma A.10:

Let $M(B,\boldsymbol{\sigma}(b,m))$ be the number of pairs $(k,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}$ for which $\boldsymbol{\sigma}(k,m)$ and $\boldsymbol{\sigma}(\ell,m)$ have totally different elements, and define $M(B,\boldsymbol{\sigma}_{i}(b,m))$ analogously. As long as $m\leq n_{i}$ for all $i\in{\mathbb{N}}_{a}$, it holds

[TABLE]

and for $m\leq n_{i}$

[TABLE]

where $|\cdot|$ denotes the number of elements.

Let $M(B,(\boldsymbol{\sigma}_{i}(b,m),\boldsymbol{\sigma}_{r}(b,m)))$ be the number of pairs $(k,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}$ for which $\boldsymbol{\sigma}_{i}(k,m)$ and $\boldsymbol{\sigma}_{i}(\ell,m)$ as well as $\boldsymbol{\sigma}_{r}(k,m)$ and $\boldsymbol{\sigma}_{r}(\ell,m)$ have totally different elements. If $m\leq n_{i}$ it holds

[TABLE]

Proof:

Because $M(B,\boldsymbol{\sigma}(b,m))$ never counts pairs of the kind $(k,k)$, the maximal number of elements is $B^{2}-B$. The fact that two vectors $\boldsymbol{a},\boldsymbol{b}\in{\mathbb{R}}^{n}$ have no element in common, even at different components, is denoted as $\boldsymbol{a}{\neq}!\boldsymbol{b}$.

The number of totally different pairs can be seen as following a binomial distribution with $B^{2}-B$ trials, and independence is used to calculate the necessary probability. With the fact that all combinations in this situation have the same probability it follows that

[TABLE]

If $m$ elements are drawn twice from ${\mathbb{N}}_{n_{i}}$ , there are $\binom{n_{i}}{m}^{2}$ possibilities, and in $\binom{n_{i}}{m}\cdot\binom{n_{i}-m}{m}$ of them the two $m$ -tuples are totally different. This yields the stated probability, and with the mean of the binomial distribution we get

[TABLE]

All in all we calculate

[TABLE]

For $M(B,(\boldsymbol{\sigma}_{i}(b,m),\boldsymbol{\sigma}_{r}(b,m)))$ and $M(B,\boldsymbol{\sigma}_{i}(b,m))$ fewer factors have to be multiplied, so the results follow in the same way. ∎

If $B(N)\to\infty$ (for example, $B$ could be chosen proportional to $N$ ), these terms converge to zero, regardless of the number of groups or of $m$ .
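The counting argument in the proof of Lemma A.10 can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes, as in the proof, that every subsample is a uniformly drawn $m$ -subset and that subsamples are independent across draws and across groups, so that the per-group probabilities multiply.

```python
from itertools import combinations
from math import comb, prod


def prob_totally_different(n, m):
    """P(two independent uniform m-subsets of {1,...,n} share no element)."""
    return comb(n - m, m) / comb(n, m)


def expected_pairs(B, group_sizes, m):
    """Expected number of ordered pairs (k, l), k != l, among B draws whose
    subsamples are totally different in every listed group.
    Assumption: independence across groups."""
    p = prod(prob_totally_different(n, m) for n in group_sizes)
    return (B * B - B) * p


def brute_force_prob(n, m):
    """Exhaustive enumeration of all pairs of m-subsets, for small n only."""
    subsets = [set(c) for c in combinations(range(n), m)]
    hits = sum(1 for s in subsets for t in subsets if s.isdisjoint(t))
    return hits / len(subsets) ** 2


print(prob_totally_different(6, 2))   # ratio C(4,2) / C(6,2)
print(brute_force_prob(6, 2))         # same value by enumeration
print(expected_pairs(10, [6], 2))     # one group
print(expected_pairs(10, [6, 8], 2))  # two groups
```

The enumeration is only feasible for small $n_{i}$ ; it serves to confirm the closed-form ratio $\binom{n_{i}-m}{m}/\binom{n_{i}}{m}$ used in the proof.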

A.2 Proofs of Section 3

Proof of Theorem 3.1:

The proof of this theorem is very similar to the one of [31, Theorem 2.1]. Since a finite sum of multivariate normally distributed random variables is again multivariate normally distributed, the representation theorem can be used to express the quadratic form (in a distributionally equivalent way) as $W_{N}=\sum_{s=1}^{ad}\frac{\lambda_{s}}{\sqrt{\sum_{\ell=1}^{ad}\lambda_{\ell}^{2}}}\left(\frac{C_{s}-1}{\sqrt{2}}\right)$ .

The only differences from [31, Theorem 2.1] are that, with more groups, the eigenvalues depend not only on $d$ but also on the $n_{i}$ and $a$ , and that there are more terms to sum. The first point only influences the limit of the $\beta_{s}$ . The larger number of summands does not matter because we consider the asymptotics under the frameworks (4)-(5), in which at least one of $a$ or $d$ tends to infinity. The proof of [31, Theorem 2.1] only needs the representation above, a number of summands that tends to infinity, and the conditions on the limits of the $\beta_{s}$ . Since these are fulfilled, the proof can be conducted in the same way. ∎

Proof of 3.1 (p.3.1):

Recall that, with ${\boldsymbol{Y}}_{i,\ell,k}:=\boldsymbol{X}_{i,\ell}-\boldsymbol{X}_{i,k}$ and $i\neq r\in{\mathbb{N}}_{a}$ , $a>1$ , the trace estimators were defined by

$\begin{array}[]{ll}A_{i,1}&=\frac{1}{2\cdot\binom{n_{i}}{2}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\left({\boldsymbol{X}}_{i,\ell_{1}}-{\boldsymbol{X}}_{i,\ell_{2}}\right)^{\top}\boldsymbol{T}_{S}\left({\boldsymbol{X}}_{i,\ell_{1}}-{\boldsymbol{X}}_{i,\ell_{2}}\right),\\[8.61108pt] A_{i,r,2}&=\frac{1}{4\cdot\binom{n_{i}}{2}\binom{n_{r}}{2}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ k_{1}>k_{2}\end{subarray}}^{n_{r}}\left[\left({\boldsymbol{X}}_{i,\ell_{1}}-{\boldsymbol{X}}_{i,\ell_{2}}\right)^{\top}\boldsymbol{T}_{S}\left({\boldsymbol{X}}_{r,k_{1}}-{\boldsymbol{X}}_{r,k_{2}}\right)\right]^{2},\\[8.61108pt] A_{i,3}&=\frac{1}{4\cdot 6\binom{n_{i}}{4}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ \ell_{2}\neq\ell_{1}\neq k_{1}>k_{2}\neq\ell_{1}\neq\ell_{2}\end{subarray}}^{n_{i}}\left[\left({\boldsymbol{X}}_{i,\ell_{1}}-{\boldsymbol{X}}_{i,\ell_{2}}\right)^{\top}\boldsymbol{T}_{S}\left({\boldsymbol{X}}_{i,k_{1}}-{\boldsymbol{X}}_{i,k_{2}}\right)\right]^{2},\\[8.61108pt] A_{4}&=\sum\limits_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}+2\sum\limits_{i=1}^{a}\sum\limits_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}.\end{array}$

For $\ell\neq k$ we know ${\boldsymbol{Y}}_{i,\ell,k}\sim\mathcal{N}\left(\boldsymbol{0}_{d},2\boldsymbol{\Sigma}_{i}\right)$ and for totally different indices the ${\boldsymbol{Y}}_{i,\ell,k}$ are statistically independent. So the previous lemmata can be used to calculate the moments. The unbiasedness can be shown by calculating the expectation values for each estimator

[TABLE]

The following argument will be used several times in this work with small modifications, so it is presented in more detail here.

To check the variance, we first note that $\operatorname{{\it Cov}}\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}\large{\textbf{;}}\normalsize{{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}\right]$ is 0 if all indices are totally different, so only $\binom{n_{i}}{2}\left(\binom{n_{i}}{2}-\binom{n_{i}-2}{2}\right)$ combinations remain. Instead of calculating the covariances of the remaining quadratic forms, it is easier to bound them by variances using the lemmata from above. Since all quadratic forms are identically distributed, these variances coincide, and the bound is simply the number of remaining combinations multiplied by this common variance. This leads to:

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(A_{i,1}\right)&=\frac{1}{4\cdot\binom{n_{i}}{2}^{2}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1}^{\prime},\ell_{2}^{\prime}=1\\ \ell_{1}^{\prime}>\ell_{2}^{\prime}\end{subarray}}^{n_{i}}\operatorname{{\it Cov}}\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}\hskip 1.42271pt\large{\textbf{;}}\normalsize\hskip 1.42271pt{{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}\right]\\ &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{\binom{n_{i}}{2}-\binom{n_{i}-2}{2}}{4\binom{n_{i}}{2}}\operatorname{{\it Var}}\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,1,2}\right]+\frac{\binom{n_{i}-2}{2}}{4\binom{n_{i}}{2}}\cdot 0\\[6.45831pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\frac{\binom{n_{i}}{2}-\binom{n_{i}-2}{2}}{4\binom{n_{i}}{2}}{\mathcal{O}\left(\operatorname{tr}^{2}\left(2\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}\\[8.61108pt] &=\mathcal{O}\left(n_{i}^{-1}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right).\end{array}$
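The order $\mathcal{O}\left(n_{i}^{-1}\right)$ of the combinatorial factor in the last step can be made explicit. The following short computation uses only the definition of the binomial coefficient and holds for $n_{i}\geq 2$ :

$\begin{array}[]{ll}\binom{n_{i}}{2}-\binom{n_{i}-2}{2}&=\frac{n_{i}(n_{i}-1)-(n_{i}-2)(n_{i}-3)}{2}=2n_{i}-3,\\[8.61108pt] \frac{\binom{n_{i}}{2}-\binom{n_{i}-2}{2}}{4\binom{n_{i}}{2}}&=\frac{2n_{i}-3}{2n_{i}(n_{i}-1)}\leq\frac{1}{n_{i}}.\end{array}$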

With these values we know for $\boldsymbol{V}_{N}=\bigoplus_{i=1}^{a}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}$ that

$\begin{array}[]{ll}{\mathbb{E}}\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}\right)=\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}{\mathbb{E}}\left(A_{i,1}\right)=\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)\end{array}$

and

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}}{{\mathbb{E}}\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}\right)}\right)&=\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\operatorname{{\it Var}}(A_{i,1})}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[8.61108pt] &\leq\frac{\sum\limits_{i=1}^{a}\mathcal{O}\left(n_{i}^{-1}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[8.61108pt] &\leq\frac{\mathcal{O}\left(\frac{1}{n_{\min}}\right)\cdot\mathcal{O}\left(\sum\limits_{i=1}^{a}\operatorname{tr}^{2}\left(\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[8.61108pt] &\leq\frac{\mathcal{O}\left(\frac{1}{n_{\min}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}=\mathcal{O}\left(\frac{1}{n_{\min}}\right).\end{array}$

So the conditions for an unbiased and ratio-consistent estimator are fulfilled.
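To make the definition of $A_{i,1}$ concrete, the following pure-Python sketch evaluates the pairwise sum directly and compares it with $\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{S}_{i})$ , where $\boldsymbol{S}_{i}$ denotes the usual sample covariance with denominator $n_{i}-1$ . Both quantities agree algebraically, because the sum of the outer products of all pairwise differences equals $n_{i}(n_{i}-1)\boldsymbol{S}_{i}$ . The helper names, the toy data and the choice of $\boldsymbol{T}_{S}$ as a centring matrix are assumptions made for illustration only.

```python
from itertools import combinations
from math import comb


def bilinear(u, T, v):
    """Return u^T T v for plain Python lists."""
    return sum(u[a] * T[a][b] * v[b]
               for a in range(len(u)) for b in range(len(v)))


def diff(x, y):
    """Componentwise difference x - y."""
    return [xa - ya for xa, ya in zip(x, y)]


def a_i1(X, T_S):
    """A_{i,1}: sum of Y^T T_S Y over all index pairs, divided by 2*C(n,2)."""
    n = len(X)
    total = 0.0
    for l1, l2 in combinations(range(n), 2):
        y = diff(X[l1], X[l2])
        total += bilinear(y, T_S, y)
    return total / (2 * comb(n, 2))


def trace_T_sample_cov(X, T_S):
    """tr(T_S S), with S the sample covariance (denominator n - 1)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / n for a in range(d)]
    centred = [diff(x, mean) for x in X]
    return sum(bilinear(c, T_S, c) for c in centred) / (n - 1)


# Toy data: n = 4 observations in dimension d = 3 (illustrative values).
X = [[1.0, 2.0, 0.5], [0.0, -1.0, 2.0], [3.0, 1.0, 1.0], [2.0, 2.0, -1.0]]
# Illustrative T_S: the centring matrix I_3 - J_3 / 3.
T_S = [[2 / 3, -1 / 3, -1 / 3], [-1 / 3, 2 / 3, -1 / 3], [-1 / 3, -1 / 3, 2 / 3]]

print(a_i1(X, T_S), trace_T_sample_cov(X, T_S))
```

The second function needs only one pass over the data, whereas the defining sum visits all $\binom{n_{i}}{2}$ pairs.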

The same steps with a different number of remaining combinations lead to

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{i,3}\right)&={\frac{1}{4\cdot 6\binom{n_{i}}{4}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ \ell_{2}\neq\ell_{1}\neq k_{1}>k_{2}\neq\ell_{1}\neq\ell_{2}\end{subarray}}^{n_{i}}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,k_{1},k_{2}}\right]^{2}\right)}\\ &\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\frac{1}{4\cdot 6\binom{n_{i}}{4}}\cdot{6\binom{n_{i}}{4}}\cdot\operatorname{tr}\left(4\cdot\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right),\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left({A_{i,3}}\right)&=\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ \ell_{2}\neq\ell_{1}\neq k_{1}>k_{2}\neq\ell_{1}\neq\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1}^{\prime},\ell_{2}^{\prime}=1\\ \ell_{1}^{\prime}>\ell_{2}^{\prime}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1}^{\prime},k_{2}^{\prime}=1\\ \ell_{2}^{\prime}\neq\ell_{1}^{\prime}\neq k_{1}^{\prime}>k_{2}^{\prime}\neq\ell_{1}^{\prime}\neq\ell_{2}^{\prime}\end{subarray}}^{n_{i}}\frac{\operatorname{{\it Cov}}\left(\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,k_{1},k_{2}}\right]^{2}\hskip 1.42271pt\large{\textbf{;}}\hskip 1.42271pt\left[{{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,k_{1}^{\prime},k_{2}^{\prime}}\right]^{2}\right)}{4^{2}\cdot 6^{2}\cdot\binom{n_{i}}{4}^{2}}\\[17.22217pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{6\binom{n_{i}}{4}-6\binom{n_{i}-4}{4}}{4^{2}\cdot 6\cdot\binom{n_{i}}{4}}\operatorname{{\it Var}}\left(\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,3,4}\right]^{2}\right)\\[8.61108pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\frac{\binom{n_{i}}{4}-\binom{n_{i}-4}{4}}{16\binom{n_{i}}{4}}\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)\\[8.61108pt] &=\mathcal{O}\left(n_{i}^{-1}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right),\end{array}$

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{i,r,2}\right)&=\frac{1}{4\cdot\binom{n_{i}}{2}\binom{n_{r}}{2}}{\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ k_{1}>k_{2}\end{subarray}}^{n_{r}}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{r,k_{1},k_{2}}\right]^{2}\right)}\\ &\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\frac{1}{4\cdot\binom{n_{i}}{2}\binom{n_{r}}{2}}\cdot\binom{n_{i}}{2}\cdot\binom{n_{r}}{2}\cdot\operatorname{tr}\left(4\cdot\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)=\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right),\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{2N^{2}}{n_{i}n_{r}}A_{i,r,2}\right)&=\frac{4N^{4}}{n_{i}^{2}n_{r}^{2}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1},\ell_{2}=1\\ \ell_{1}>\ell_{2}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1},k_{2}=1\\ k_{1}>k_{2}\end{subarray}}^{n_{r}}\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1}^{\prime},\ell_{2}^{\prime}=1\\ \ell_{1}^{\prime}>\ell_{2}^{\prime}\end{subarray}}^{n_{i}}\sum\limits_{\footnotesize\begin{subarray}{c}k_{1}^{\prime},k_{2}^{\prime}=1\\ k_{1}^{\prime}>k_{2}^{\prime}\end{subarray}}^{n_{r}}\frac{\operatorname{{\it Cov}}\left(\left[{{\boldsymbol{Y}}_{i,\ell_{1},\ell_{2}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{r,k_{1},k_{2}}\right]^{2}\hskip 1.42271pt\large{\textbf{;}}\hskip 1.42271pt\left[{{\boldsymbol{Y}}_{i,\ell_{1}^{\prime},\ell_{2}^{\prime}}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{r,k_{1}^{\prime},k_{2}^{\prime}}\right]^{2}\right)}{16\cdot\binom{n_{i}}{2}^{2}\binom{n_{r}}{2}^{2}}\\[12.91663pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{4N^{4}}{n_{i}^{2}n_{r}^{2}}\frac{\binom{n_{i}}{2}\binom{n_{r}}{2}-\binom{n_{i}-2}{2}\binom{n_{r}-2}{2}}{16\cdot\binom{n_{i}}{2}\binom{n_{r}}{2}}\operatorname{{\it Var}}\left(\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{r,1,2}\right]^{2}\right)\\[8.61108pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{\leq}}\frac{\binom{n_{i}}{2}\binom{n_{r}}{2}-\binom{n_{i}-2}{2}\binom{n_{r}-2}{2}}{\binom{n_{i}}{2}\binom{n_{r}}{2}}\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)\right)\\[6.45831pt] &\leq\mathcal{O}\left(\frac{1}{n_{\min}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)\right).\end{array}$
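The index bookkeeping for $A_{i,3}$ is the most error-prone part of the definitions above, so a direct evaluation may help. The sketch below, a naive pure-Python reference with hypothetical function names, enumerates all ordered pairs of index pairs with four distinct indices, confirms that there are $6\binom{n_{i}}{4}$ of them, and applies the normalisation $1/(4\cdot 6\binom{n_{i}}{4})$ . It is intended for small $n_{i}$ only.

```python
from itertools import combinations
from math import comb


def bilinear(u, T, v):
    """Return u^T T v for plain Python lists."""
    return sum(u[a] * T[a][b] * v[b]
               for a in range(len(u)) for b in range(len(v)))


def diff(x, y):
    """Componentwise difference x - y."""
    return [xa - ya for xa, ya in zip(x, y)]


def a_i3(X, T_S):
    """Naive A_{i,3}: squared bilinear forms of two difference vectors
    built from four distinct observations (requires n >= 4)."""
    n = len(X)
    pairs = list(combinations(range(n), 2))
    total, count = 0.0, 0
    for l1, l2 in pairs:
        for k1, k2 in pairs:
            if len({l1, l2, k1, k2}) == 4:
                # The order inside a pair only flips a sign, which the
                # square removes, so l1 < l2 is equivalent to l1 > l2.
                total += bilinear(diff(X[l1], X[l2]), T_S,
                                  diff(X[k1], X[k2])) ** 2
                count += 1
    assert count == 6 * comb(n, 4)  # number of admissible index choices
    return total / (4 * 6 * comb(n, 4))


# Toy check in dimension 1 with T_S = (1): observations 0, 1, 2, 3.
print(a_i3([[0.0], [1.0], [2.0], [3.0]], [[1.0]]))
```

For the toy data the six admissible terms are $1,16,9,9,16,1$ , so the function returns $52/24$ .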

Finally, the conditions for $A_{4}$ have to be checked. With the expectation values from above we calculate

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{4}\right)&=\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}{\mathbb{E}}(A_{i,3})+2\sum\limits_{i=1}^{a}\sum\limits_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}{\mathbb{E}}\left(A_{i,r,2}\right)\\[6.45831pt] &=\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)+2\sum\limits_{i=1}^{a}\sum\limits_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

To calculate the variances the following additional inequalities are needed:

$\begin{array}[]{ll}\frac{\operatorname{{\it Var}}\left(\sum\limits_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}&=\frac{\sum\limits_{i=1}^{a}\operatorname{{\it Var}}\left(\left(\frac{N}{n_{i}}\right)^{2}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\ &\leq\sum\limits_{i=1}^{a}\mathcal{O}\left(n_{i}^{-1}\right)\cdot\frac{\mathcal{O}\left({(\boldsymbol{T}_{W})_{ii}}^{4}\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\ &\leq\mathcal{O}\left(\frac{1}{n_{\min}}\right)\frac{\mathcal{O}\left(\operatorname{tr}^{2}\left(\sum\limits_{i=1}^{a}{(\boldsymbol{T}_{W})_{ii}}^{2}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=\mathcal{O}\left(\frac{1}{n_{\min}}\right)\end{array}$

and

$\begin{array}[]{ll}&\frac{\operatorname{{\it Var}}\left(2\sum\limits_{r<i\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[6.45831pt] \stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}&4\sum\limits_{i<r\in{\mathbb{N}}_{a}}\sum\limits_{h<g\in{\mathbb{N}}_{a}}\frac{\sqrt{\operatorname{{\it Var}}\left(\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}\right)}\sqrt{\operatorname{{\it Var}}\left(\frac{N^{2}}{n_{h}n_{g}}{(\boldsymbol{T}_{W})_{hg}}^{2}A_{h,g,2}\right)}}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[8.61108pt] \leq&\left(\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{\sqrt{\mathcal{O}\left(\frac{1}{n_{\min}}\right)}{(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\frac{N}{n_{r}}\boldsymbol{\Sigma}_{r}\right)}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}\\[8.61108pt] \leq&\mathcal{O}\left(\frac{1}{n_{\min}}\right)\left(\frac{\mathcal{O}\left(\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}{(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\frac{N}{n_{r}}\boldsymbol{\Sigma}_{r}\right)\right)}{\sum\limits_{i,r\in{\mathbb{N}}_{a}}{(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right)^{2}\leq\mathcal{O}\left(\frac{1}{n_{\min}}\right).\end{array}$

Together this leads to

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{A_{4}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\left[\sqrt{\frac{\operatorname{{\it Var}}\left(2\sum\limits_{r<i\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}+\sqrt{\frac{\operatorname{{\it Var}}\left(\sum\limits_{i=1}^{a}\left(\frac{N}{n_{i}}\right)^{2}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\right]^{2}\\ &\leq\left[\sqrt{\mathcal{O}\left(\frac{1}{n_{\min}}\right)}+\sqrt{\mathcal{O}\left(\frac{1}{n_{\min}}\right)}\right]^{2}=\mathcal{O}\left(\frac{1}{n_{\min}}\right)\end{array}$

and therefore $A_{4}$ is an unbiased and ratio-consistent estimator of $\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)$ .

Moreover, we stress that the null sequences used as upper bounds for $\widehat{E_{H_{0}}}(Q_{N})$ and $A_{4}$ do not depend on the number of groups or on the dimension, so these estimators can also be used for an increasing number of groups.

With the expectation values and variances from the beginning it follows directly that $A_{i,1},A_{i,r,2},A_{i,3}$ are unbiased, ratio-consistent estimators of $\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}),\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)$ and $\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)$ .

It is worth noting that all of these estimators are also consistent and even dimension-stable in the sense of [8]. ∎

For $A_{i,r,2}$ there exists an alternative form, considered in [9], which can be implemented substantially more efficiently. It is based on matrices of the form $\widehat{\boldsymbol{M}}_{i,r}=\boldsymbol{P}_{n_{i}}\left(\boldsymbol{T}_{S}{\boldsymbol{X}}_{i,1},\dots,\boldsymbol{T}_{S}{\boldsymbol{X}}_{i,n_{i}}\right)^{\top}\cdot\left(\boldsymbol{T}_{S}{\boldsymbol{X}}_{r,1},\dots,\boldsymbol{T}_{S}{\boldsymbol{X}}_{r,n_{r}}\right)\boldsymbol{P}_{n_{r}}^{\top}$ . Recalling that $\textbf{1}_{n}$ is the vector of ones and that $\#$ denotes the Hadamard-Schur product, it can be seen that

[TABLE]
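The gain from working with centred data can be illustrated in a few lines. The pure-Python sketch below compares the defining double sum of $A_{i,r,2}$ with a shortcut based on the centred cross-Gram matrix $\boldsymbol{G}$ with entries $(\boldsymbol{X}_{i,\ell}-\overline{\boldsymbol{X}}_{i})^{\top}\boldsymbol{T}_{S}(\boldsymbol{X}_{r,k}-\overline{\boldsymbol{X}}_{r})$ , namely $\sum_{\ell,k}G_{\ell k}^{2}/((n_{i}-1)(n_{r}-1))$ . This identity is obtained here by expanding the squared double differences after centring, where all cross terms vanish because the rows and columns of $\boldsymbol{G}$ sum to zero. It is this sketch's own reformulation, assuming that $\boldsymbol{P}_{n}$ is the centring matrix, and it is not claimed to reproduce the exact expression of [9]. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def bilinear(u, T, v):
    """Return u^T T v for plain Python lists."""
    return sum(u[a] * T[a][b] * v[b]
               for a in range(len(u)) for b in range(len(v)))


def diff(x, y):
    """Componentwise difference x - y."""
    return [xa - ya for xa, ya in zip(x, y)]


def centre(X):
    """Subtract the sample mean vector from every observation."""
    n, d = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / n for a in range(d)]
    return [diff(x, mean) for x in X]


def a_ir2_naive(Xi, Xr, T_S):
    """Defining double sum over all pairs in group i and all pairs in group r."""
    total = 0.0
    for l1, l2 in combinations(range(len(Xi)), 2):
        yi = diff(Xi[l1], Xi[l2])
        for k1, k2 in combinations(range(len(Xr)), 2):
            total += bilinear(yi, T_S, diff(Xr[k1], Xr[k2])) ** 2
    return total / (4 * comb(len(Xi), 2) * comb(len(Xr), 2))


def a_ir2_gram(Xi, Xr, T_S):
    """Shortcut: squared Frobenius norm of the centred cross-Gram matrix."""
    Ci, Cr = centre(Xi), centre(Xr)
    frob2 = sum(bilinear(ci, T_S, cr) ** 2 for ci in Ci for cr in Cr)
    return frob2 / ((len(Xi) - 1) * (len(Xr) - 1))


# Toy data with n_i = 3, n_r = 4, d = 2 and T_S = I_2 - J_2 / 2.
Xi = [[1.0, 0.0], [2.0, 3.0], [0.5, -1.0]]
Xr = [[0.0, 1.0], [1.0, 1.0], [4.0, -2.0], [2.0, 0.5]]
T_S = [[0.5, -0.5], [-0.5, 0.5]]
print(a_ir2_naive(Xi, Xr, T_S), a_ir2_gram(Xi, Xr, T_S))
```

The naive version evaluates $\binom{n_{i}}{2}\binom{n_{r}}{2}$ bilinear forms, the shortcut only $n_{i}n_{r}$ .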

For $A_{i,3}$ there also exists an alternative formula, which is considerably longer but more efficient to evaluate:

[TABLE]

To prove Theorem 3.2, we need one further lemma.

Lemma A.11:

For the previously defined estimators it holds for $n_{\min}\to\infty$ that

[TABLE]

Proof:

We know that

$\begin{array}[]{ll}&{\mathbb{E}}\left(\frac{\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}-\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\right)=\frac{\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\left({\mathbb{E}}\left(A_{i,1}\right)-\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}=0.\end{array}$

Thus,

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\left(A_{i,1}-\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\right)&=\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\operatorname{{\it Var}}\left(A_{i,1}\right)}{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[12.91663pt] &\stackrel{{\scriptstyle\text{Proof of \ref{Schae1}}}}{{\leq}}\mathcal{O}\left(\frac{1}{n_{\min}}\right)\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\operatorname{tr}\left(\left(2\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)}{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=\mathcal{O}\left(\frac{1}{n_{\min}}\right).\end{array}$

In the last step we used that all terms are non-negative and applied the binomial theorem. The bound is a null sequence that depends only on $n_{\min}$ , so the result again follows with A.6. ∎

Proof of Theorem 3.2:

From A.6 it follows for $n_{\min}\to\infty$ and independent of $a$ or $d$ that $A_{4}\left/{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right.\stackrel{{\scriptstyle P}}{{\longrightarrow}}1$ and therefore ${{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\left/A_{4}\right.\stackrel{{\scriptstyle P}}{{\longrightarrow}}1}$ . Moreover, it also follows that $\sqrt{{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\left/A_{4}\right.}\stackrel{{\scriptstyle P}}{{\longrightarrow}}1$ and with A.11 we deduce $\frac{\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}-\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}0$ .

Thus, we can finally rewrite the standardized quadratic form as

$\begin{array}[]{ll}W_{N}&=\frac{Q_{N}-\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}}{\sqrt{2A_{4}}}\\[4.30554pt] &=\left(\frac{Q_{N}-\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}-\frac{\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}-\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\right)\cdot\sqrt{\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}{A_{4}}}\\[10.76385pt] &=\left(\frac{Q_{N}-\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}{\sqrt{2\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}-o_{p}(1)\right)\cdot(1+o_{p}(1))\\[10.76385pt] &=\widetilde{W}_{N}+\widetilde{W}_{N}\cdot o_{p}(1)-o_{p}(1)-o_{p}(1)\cdot o_{p}(1).\end{array}$

The last two terms converge to zero in probability and hence also in distribution. By Slutsky's theorem, $\widetilde{W}_{N}\cdot o_{p}(1)$ also converges to zero in distribution if one of the conditions of Theorem 3.1 is fulfilled. Hence $W_{N}$ has asymptotically the same distribution as $\widetilde{W}_{N}$ . ∎

For a large number of groups, many estimators $A_{i,1},A_{i,r,2}$ and $A_{i,3}$ have to be calculated, which leads to long computation times. In this case it is better to again use subsampling-type estimators, which leads to $A_{i,1}^{\star},{A^{\star}_{i,r,2}},{A^{\star}_{i,3}}$ and therefore to $A^{\star}_{4}$ .

Lemma A.12:

With the definitions from above, let

$\begin{array}[]{l}A_{i,1}^{\star}(B)\hskip 5.69046pt=\frac{1}{2\cdot B}\sum\limits_{b=1}^{B}{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}},\end{array}$

$\begin{array}[]{l}A_{i,r,2}^{\star}(B)\hskip 0.28436pt=\frac{1}{4\cdot B}{\sum\limits_{b=1}^{B}\left[{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{r,\sigma_{r1}(b),\sigma_{r2}(b)}}\right]^{2}},\end{array}$

$\begin{array}[]{l}A_{i,3}^{\star}(B)\hskip 5.69046pt=\frac{1}{4\cdot B}{\sum\limits_{b=1}^{B}\left[{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i3}(b),\sigma_{i4}(b)}}\right]^{2}},\end{array}$

$\begin{array}[]{l}A_{4}^{\star}(B)\hskip 11.38092pt=\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\cdot A_{i,3}^{\star}(B)+2\sum_{i=1}^{a}\sum_{r=1,r<i}^{a}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}(B).\end{array}$

If $B(N)\to\infty$ , these estimators and $\sum_{i=1}^{a}A_{i,1}^{\star}$ have the same properties as $A_{i,1},A_{i,r,2},A_{i,3},A_{4}$ and $\sum_{i=1}^{a}{A_{i,1}}$ , which were defined in 3.1.

Proof:

For $A_{i,1}^{\star}(B)$ , this lemma will be proved in detail. For all other terms only the major steps are shown.

The unbiasedness is clear because the random variables $\sigma_{i1}(b),\sigma_{i2}(b)$ have no influence on the number of terms in the sum and the terms are identically distributed. Hence,

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{i,1}^{\star}(B)\right)&=\frac{1}{2\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left({{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}\right)\\[8.61108pt] &=\frac{1}{2\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left({{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,1,2}\right)\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}).\end{array}$

The second part is more complicated. Let $\mathcal{F}(\boldsymbol{\sigma}_{i}(B,m))$ be the smallest $\sigma$ -field with respect to which all $\boldsymbol{\sigma}_{i}(b,m)$ , $b\in{\mathbb{N}}_{B}$ , are measurable, so that $M(B,\boldsymbol{\sigma}_{i}(b,m))$ is obviously $\mathcal{F}(\boldsymbol{\sigma}_{i}(B,m))$ -measurable. The $\sigma$ -fields $\mathcal{F}(\boldsymbol{\sigma}_{i}(B,m),\boldsymbol{\sigma}_{r}(B,m))$ and $\mathcal{F}(\boldsymbol{\sigma}(B,m))$ are defined analogously. As in the previous part, the distribution of the bilinear form does not depend on the index combination. Together with the independence of the normally distributed vectors and $\sigma_{i1}(b),\sigma_{i2}(b)$ this leads to

$\begin{array}[]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left(A_{i,1}^{\star}(B)\big{|}\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2))\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)=0.\end{array}$

With A.9 we thus obtain

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(A_{i,1}^{\star}(B)\right)&=0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left(A_{i,1}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2))\right)\right).\end{array}$

For the calculation of the conditional variance of the sum, it is useful to find an upper bound that is based on the variance instead of calculating the covariances. To achieve this, we determine the set of index combinations that lead to a covariance of zero. This set is random, and we recognize that it contains the set $M(B,\boldsymbol{\sigma}_{i}(b,2))$ considered before.

Again, it is not the set itself that matters but the number of elements contained in $M(B,\boldsymbol{\sigma}_{i}(b,2))$ , since the bilinear forms are identically distributed. Therefore the conditioning in the variance of the bilinear form can be dropped, since the random indices have no influence on the variance. With the $\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2))$ -measurability of $M(B,\boldsymbol{\sigma}_{i}(b,2))$ it thus follows that

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(A_{i,1}^{\star}(B)\right)&=0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left(A_{i,1}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2))\right)\right)\\[4.30554pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{1}{4B^{2}}{\mathbb{E}}\left(\sum\limits_{(j,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,(\boldsymbol{\sigma}_{i}(b,2)))}\operatorname{{\it Var}}\left({{\boldsymbol{Y}}_{i,\sigma_{i1}(j),\sigma_{i2}(j)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i1}(j),\sigma_{i2}(j)}}\big{|}\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2))\right)\right)\\[10.76385pt] &=\frac{1}{4B^{2}}{\mathbb{E}}\left(\sum\limits_{(j,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,(\boldsymbol{\sigma}_{i}(b,2)))}\operatorname{{\it Var}}\left({{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,1,2}}\right)\right)\\[10.76385pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,(\boldsymbol{\sigma}_{i}(b,2)))|\right)}{B^{2}}\cdot\frac{\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{4}\\[2.15277pt] &\stackrel{{\scriptstyle\ref{Menge}}}{{=}}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{i}-2}{2}}{\binom{n_{i}}{2}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right).\end{array}$
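A minimal sketch of the subsampling estimator $A_{i,1}^{\star}(B)$ and of the combinatorial factor in the last display may help to see how $B$ and $n_{i}$ enter. It is written in pure Python with hypothetical function names and assumes that each of the $B$ index pairs is drawn uniformly without replacement within the pair, independently over $b$ . The random generator is passed in explicitly so that runs are reproducible.

```python
import random
from math import comb


def bilinear(u, T, v):
    """Return u^T T v for plain Python lists."""
    return sum(u[a] * T[a][b] * v[b]
               for a in range(len(u)) for b in range(len(v)))


def diff(x, y):
    """Componentwise difference x - y."""
    return [xa - ya for xa, ya in zip(x, y)]


def a_i1_star(X, T_S, B, rng):
    """Subsampling version of A_{i,1}: average over B random index pairs."""
    n = len(X)
    total = 0.0
    for _ in range(B):
        s1, s2 = rng.sample(range(n), 2)  # two distinct indices
        y = diff(X[s1], X[s2])
        total += bilinear(y, T_S, y)
    return total / (2 * B)


def variance_factor(B, n):
    """Factor 1 - (1 - 1/B) * C(n-2, 2) / C(n, 2) from the variance bound."""
    return 1.0 - (1.0 - 1.0 / B) * comb(n - 2, 2) / comb(n, 2)


rng = random.Random(2017)  # fixed seed, illustrative only
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
T_S = [[0.5, -0.5], [-0.5, 0.5]]
print(a_i1_star(X, T_S, B=200, rng=rng))
for B, n in [(10, 10), (100, 100), (1000, 1000)]:
    print(B, n, variance_factor(B, n))
```

The printed factors decrease as both $B$ and $n_{i}$ grow. For fixed $n_{i}$ the factor does not fall below $1-\binom{n_{i}-2}{2}/\binom{n_{i}}{2}$ , however large $B$ is chosen.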

The other values are calculated in a similar way.

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{i,r,2}^{\star}(B)\right)&=\frac{1}{4\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{r,\sigma_{r1}(b),\sigma_{r2}(b)}}\right]^{2}\right)\\[8.61108pt] &=\frac{1}{4\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{r,1,2}\right]^{2}\right)\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\operatorname{tr}(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}).\end{array}$

$\begin{array}[]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left(A_{i,r,2}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2),\boldsymbol{\sigma}_{r}(B,2))\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)\right)=0.\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(A_{i,r,2}^{\star}(B)\right)&=0+{{\mathbb{E}}\left(\operatorname{{\it Var}}\left(A_{i,r,2}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,2),\boldsymbol{\sigma}_{r}(B,2))\right)\right)}\\[4.30554pt] &\leq\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}_{i}(b,2),\boldsymbol{\sigma}_{r}(b,2))|\right)}{B^{2}}\cdot\operatorname{{\it Var}}\left(\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{r,1,2}}\right]^{2}\right)\\[8.61108pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{\leq}}\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}_{i}(b,2),\boldsymbol{\sigma}_{r}(b,2))|\right)}{B^{2}}\cdot\mathcal{O}\left({\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right)\\[6.45831pt] &\stackrel{{\scriptstyle\ref{Menge}}}{{=}}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{i}-2}{2}}{\binom{n_{i}}{2}}\cdot\frac{\binom{n_{r}-2}{2}}{\binom{n_{r}}{2}}\right)\cdot\mathcal{O}\left({\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right)\\[8.61108pt] &\leq\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}^{2}}{\binom{n_{\min}}{2}^{2}}\right)\cdot\mathcal{O}\left({\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right).\end{array}$

$\begin{array}[]{ll}{\mathbb{E}}\left(A_{i,3}^{\star}(B)\right)&=\frac{1}{4\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,\sigma_{i1}(b),\sigma_{i2}(b)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i3}(b),\sigma_{i4}(b)}}\right]^{2}\right)\\[8.61108pt] &=\frac{1}{4\cdot B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\left[{{\boldsymbol{Y}}_{i,1,2}}^{\top}\boldsymbol{T}_{S}{\boldsymbol{Y}}_{i,3,4}\right]^{2}\right)\stackrel{{\scriptstyle\ref{QF4}}}{{=}}\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right).\end{array}$

$\begin{array}[t]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left(A_{i,3}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,4))\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)=0.\end{array}$

$\begin{array}[t]{ll}\operatorname{{\it Var}}\left(A_{i,3}^{\star}(B)\right)&=0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left(A_{i,3}^{\star}(B)|\mathcal{F}(\boldsymbol{\sigma}_{i}(B,4))\right)\right)\\[2.15277pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{1}{16B^{2}}{\mathbb{E}}\left(\sum\limits_{(j,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}_{i}(b,4))}\operatorname{{\it Var}}\left(\left[{{\boldsymbol{Y}}_{i,\sigma_{i1}(j),\sigma_{i2}(j)}}^{\top}\boldsymbol{T}_{S}{{\boldsymbol{Y}}_{i,\sigma_{i3}(j),\sigma_{i4}(j)}}\right]^{2}\Big{|}\mathcal{F}(\boldsymbol{\sigma}_{i}(B,4))\right)\right)\\[8.61108pt] &\stackrel{{\scriptstyle\ref{QF4}}}{{\leq}}\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}_{i}(b,4))|\right)}{B^{2}}\cdot\frac{\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{16}\\[4.30554pt] &\stackrel{{\scriptstyle\ref{Menge}}}{{=}}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{i}-4}{4}}{\binom{n_{i}}{4}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right).\end{array}$

$\begin{array}[]{l}{\mathbb{E}}\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}^{\star}\right)=\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}{\mathbb{E}}\left(A_{i,1}^{\star}\right)=\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\operatorname{tr}\left(\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right).\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}^{\star}}{\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\right)&=\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}\operatorname{{\it Var}}\left(A_{i,1}^{\star}\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[-4.73611pt] &=\frac{\sum\limits_{i=1}^{a}{(\boldsymbol{T}_{W})_{ii}}^{2}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{i}-2}{2}}{\binom{n_{i}}{2}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[5.59721pt] &\leq\frac{\sum\limits_{i=1}^{a}{(\boldsymbol{T}_{W})_{ii}}^{2}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}}{\binom{n_{\min}}{2}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[4.30554pt] &\leq\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}}{\binom{n_{\min}}{2}}\right)\cdot\frac{\mathcal{O}\left(\operatorname{tr}^{2}\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)\right)}{\operatorname{tr}^{2}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)}\\[4.30554pt] &=\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}}{\binom{n_{\min}}{2}}\right)\cdot\mathcal{O}\left(1\right).\end{array}$

For $B(N)\to\infty$ the first factor is a zero sequence, and therefore $\sum_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}A_{i,1}^{\star}$ is a ratio-consistent, unbiased estimator of $\operatorname{tr}\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)$.
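The behaviour of the leading factor $1-\left(1-\frac{1}{B}\right)\binom{n_{i}-2}{2}/\binom{n_{i}}{2}$ can be inspected numerically. The short Python sketch below evaluates it for a few values of $B$ and $n_{i}$; the function name `overlap_factor` and the generalisation to subsets of size `m` are assumptions made for illustration only.

```python
from math import comb

def overlap_factor(n_draws, n_i, m=2):
    """Evaluate 1 - (1 - 1/B) * C(n_i - m, m) / C(n_i, m).

    n_draws : number B of random subsamples.
    n_i     : group size.
    m       : size of each drawn index subset (m = 2 for A*_{i,1}).
    """
    if n_draws < 1 or n_i < m:
        raise ValueError("need B >= 1 and n_i >= m")
    ratio = comb(n_i - m, m) / comb(n_i, m)
    return 1.0 - (1.0 - 1.0 / n_draws) * ratio

for size in (10, 100, 1000, 10000):
    print(size, overlap_factor(n_draws=size, n_i=size))
```

In this sketch the factor only becomes small when both $B$ and the group size grow; for a fixed group size it approaches $1-\binom{n_{i}-2}{2}/\binom{n_{i}}{2}$ as $B\to\infty$.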

$\begin{array}[]{ll}&{\mathbb{E}}\left(\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}+\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}\right)\\[10.76385pt] =&\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}{\mathbb{E}}\left(A_{i,3}^{\star}\right)+\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}{\mathbb{E}}\left(A_{i,r,2}^{\star}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&=\frac{\sum\limits_{i=1}^{a}\operatorname{{\it Var}}\left(\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[-4.30554pt] &\leq\frac{\sum\limits_{i=1}^{a}{(\boldsymbol{T}_{W})_{ii}}^{4}\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{i}-4}{4}}{\binom{n_{i}}{4}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\end{array}$

$\begin{array}[]{ll}{\color[rgb]{1,1,1}\operatorname{{\it Var}}\left(\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)}&\leq\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-4}{4}}{\binom{n_{\min}}{4}}\right)\cdot\frac{\sum\limits_{i=1}^{a}{(\boldsymbol{T}_{W})_{ii}}^{4}\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[-3.44444pt] &\leq\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-4}{4}}{\binom{n_{\min}}{4}}\right)\cdot\frac{\mathcal{O}\left(\operatorname{tr}^{2}\left(\left(\sum\limits_{i=1}^{a}\frac{N}{n_{i}}{(\boldsymbol{T}_{W})_{ii}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\right)^{2}\right)\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\\[8.61108pt] &\leq\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-4}{4}}{\binom{n_{\min}}{4}}\right)\cdot\mathcal{O}\left(1\right).\end{array}$

$\begin{array}[]{ll}&\operatorname{{\it Var}}\left(\frac{\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)\\[10.76385pt] \leq&\left(\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{\sqrt{\operatorname{{\it Var}}\left(\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}\right)}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}\end{array}$

$\begin{array}[]{ll}\leq&\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}^{2}}{\binom{n_{\min}}{2}^{2}}\right)\cdot\left(\frac{\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}{(\boldsymbol{T}_{W})_{ir}}^{2}\sqrt{\mathcal{O}\left({\operatorname{tr}^{2}\left(\frac{N}{n_{i}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right)}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}\\[10.76385pt] \leq&\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}^{2}}{\binom{n_{\min}}{2}^{2}}\right)\cdot\left(\frac{\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\mathcal{O}\left({(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\boldsymbol{T}_{S}\frac{N}{n_{r}}\boldsymbol{\Sigma}_{r}\right)\right)}{\sum\limits_{i,r\in{\mathbb{N}}_{a}}{(\boldsymbol{T}_{W})_{ir}}^{2}\operatorname{tr}\left(\boldsymbol{T}_{S}\frac{N}{n_{i}}\boldsymbol{\Sigma}_{i}\frac{N}{n_{r}}\boldsymbol{T}_{S}\boldsymbol{\Sigma}_{r}\right)}\right)^{2}\\[8.61108pt] \leq&\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}^{2}}{\binom{n_{\min}}{2}^{2}}\right)\cdot\mathcal{O}(1).\end{array}$

$\begin{array}[]{ll}&\operatorname{{\it Var}}\left(\frac{\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}+\sum\limits_{i\neq r\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}}{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)\\[12.91663pt] \stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}&\left[\sqrt{\frac{\operatorname{{\it Var}}\left(2\sum\limits_{r<i\in{\mathbb{N}}_{a}}\frac{N^{2}}{n_{i}n_{r}}{(\boldsymbol{T}_{W})_{ir}}^{2}A_{i,r,2}^{\star}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}+\sqrt{\frac{\operatorname{{\it Var}}\left(\sum\limits_{i=1}^{a}\frac{N^{2}}{n_{i}^{2}}{(\boldsymbol{T}_{W})_{ii}}^{2}A_{i,3}^{\star}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}}\right]^{2}\end{array}$

$\begin{array}[]{ll}\leq&\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-2}{2}^{2}}{\binom{n_{\min}}{2}^{2}}\right)\cdot\mathcal{O}(1).\end{array}$

So again this is a zero sequence, and $A_{4}^{\star}$ is an unbiased and dimension-stable (i.e., also ratio-consistent) estimator of $\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)$. ∎

A.3 Proofs of Section 4

Lemma A.13:

For

[TABLE]

we define

[TABLE]

With this notation it follows that

${\mathbb{E}}\left(C_{5}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right),\qquad\operatorname{{\it Var}}\left(C_{5}\right)\leq\frac{\left(\prod\limits_{i=1}^{a}\binom{n_{i}}{6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)}{\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).$

Proof:

Set

[TABLE]

It then follows that

$\begin{array}[]{ll}&{\mathbb{E}}\left(\boldsymbol{T}\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}\cdot{\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}}^{\top}\boldsymbol{T}^{\top}\right)\\[3.44444pt] =&{\mathbb{E}}\left(\left(\sqrt{2}\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}\widetilde{\boldsymbol{Z}}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}\right)\left(\sqrt{2}\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}{\widetilde{\boldsymbol{Z}}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}}\right)^{\top}\right)\\[6.45831pt] =&2\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}{\mathbb{E}}\left(\widetilde{\boldsymbol{Z}}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}{{\widetilde{\boldsymbol{Z}}}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}}^{\top}\right){\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}\\[4.30554pt] =&2\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}\boldsymbol{I}_{ad}{\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}=2\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}.\end{array}$

Using the rules for conditional expectation and the independence of the vectors involved, it follows that

$\begin{array}[]{ll}{\mathbb{E}}\left(C_{5}\right)&=\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1,1},\dots,\ell_{6,1}=1\\ \ell_{1,1}\neq\dots\neq\ell_{6,1}\end{subarray}}^{n_{1}}\dots\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1,a},\dots,\ell_{6,a}=1\\ \ell_{1,a}\neq\dots\neq\ell_{6,a}\end{subarray}}^{n_{a}}\frac{{\mathbb{E}}\left(\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})\right)}{8\cdot\prod\limits_{i=1}^{a}\frac{n_{i}!}{\left(n_{i}-6\right)!}}\\[12.91663pt] &=\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1,1},\dots,\ell_{6,1}=1\\ \ell_{1,1}\neq\dots\neq\ell_{6,1}\end{subarray}}^{n_{1}}\dots\sum\limits_{\footnotesize\begin{subarray}{c}\ell_{1,a},\dots,\ell_{6,a}=1\\ \ell_{1,a}\neq\dots\neq\ell_{6,a}\end{subarray}}^{n_{a}}\frac{{\mathbb{E}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\cdot{\boldsymbol{Z}_{(3,4)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\cdot{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)}{8\cdot\prod\limits_{i=1}^{a}\frac{n_{i}!}{\left(n_{i}-6\right)!}}\\[17.22217pt] \vspace{0.15cm}&=\frac{1}{8}{\mathbb{E}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\cdot{\boldsymbol{Z}_{(3,4)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\cdot{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)\end{array}$

$\begin{array}[]{ll}\phantom{{\mathbb{E}}\left(C_{5}\right)}&=\frac{1}{8}{\mathbb{E}}\left({\mathbb{E}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\cdot{\boldsymbol{Z}_{(3,4)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\cdot{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\bigm{|}\boldsymbol{Z}_{(1,2)}\right)\right)\\[4.30554pt] &=\frac{1}{8}{\mathbb{E}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}{\mathbb{E}}\left(\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\cdot{\boldsymbol{Z}_{(3,4)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\cdot{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\right)\boldsymbol{Z}_{(1,2)}\right)\\[3.44444pt] &=\frac{4}{8}{\mathbb{E}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)=\frac{1}{2}\operatorname{tr}((\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T})2\boldsymbol{V}_{N})=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).\end{array}$

Since all ${\boldsymbol{X}}_{i,j}$ are identically distributed, we can ignore the concrete indices as long as we maintain the dependence structure of the bilinear forms. The last term fulfills the requirements of A.5 with $\boldsymbol{Z}_{(1,2)}\sim\mathcal{N}\left(\boldsymbol{0}_{ad},2\boldsymbol{V}_{N}\right)$ and the matrix $\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}\boldsymbol{T}\boldsymbol{V}_{N}\boldsymbol{T}$.
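The final trace identity in this computation can be checked numerically. The Python sketch below does so under the assumption that $\boldsymbol{T}$ is a symmetric, idempotent matrix; the centering matrix used here and the randomly generated stand-in for $\boldsymbol{V}_{N}$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def trace_identity_gap(t, v):
    """Return (1/2) * tr((T V T T V T) * 2V) - tr((T V)^3)."""
    inner = t @ v @ t @ t @ v @ t            # matrix of the quadratic form
    lhs = 0.5 * np.trace(inner @ (2.0 * v))  # (1/2) E(Z^T M Z) with Cov(Z) = 2V
    rhs = np.trace(np.linalg.matrix_power(t @ v, 3))
    return lhs - rhs

rng = np.random.default_rng(1)
m = 6                                    # stands in for the dimension a*d
g = rng.standard_normal((m, m))
v = g @ g.T                              # arbitrary symmetric PSD matrix in the role of V_N
t = np.eye(m) - np.ones((m, m)) / m      # centering matrix: symmetric and idempotent
print(abs(trace_identity_gap(t, v)))     # numerically negligible
```

The identity relies on $\boldsymbol{T}\boldsymbol{T}=\boldsymbol{T}$ and on the cyclic invariance of the trace; for a non-idempotent matrix the gap is generally nonzero.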

For the calculation of the variance it is useful to diagonalize the matrix ${\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}$: there exists an orthogonal matrix $\boldsymbol{P}$ with $\boldsymbol{P}{\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}\boldsymbol{P}^{\top}=\boldsymbol{D}=\operatorname{\it diag}\left(\lambda_{1},\dots,\lambda_{ad}\right)$, where the $\lambda_{i}$ are the eigenvalues of ${\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}{\boldsymbol{V}_{N}^{1/2}}$. We define $\boldsymbol{E}_{i}:=\boldsymbol{P}\widetilde{\boldsymbol{Z}}_{(i,j)}$ for $(i,j)\in\{(1,2),(3,4),(5,6)\}$, so that, by the properties of the standard normal distribution, $\boldsymbol{E}_{i}\sim\mathcal{N}_{ad}(\boldsymbol{0}_{ad},\boldsymbol{I}_{ad})$, and the $\boldsymbol{E}_{i}$ are independent for different indices. Thus, we can rewrite

$\begin{array}[]{l}{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}={\widetilde{\boldsymbol{Z}}_{(1,2)}}^{\top}2{\boldsymbol{V}_{N}^{1/2}}^{\top}\boldsymbol{T}\boldsymbol{V}_{N}^{1/2}\widetilde{\boldsymbol{Z}}_{(3,4)}=2{\widetilde{\boldsymbol{Z}}_{(1,2)}}^{\top}\boldsymbol{P}^{\top}\boldsymbol{D}\boldsymbol{P}\widetilde{\boldsymbol{Z}}_{(3,4)}=2\boldsymbol{E}_{1}^{\top}\boldsymbol{D}\boldsymbol{E}_{3}.\end{array}$
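The rewriting of the bilinear form in terms of the rotated vectors can be reproduced numerically. The following Python sketch is a minimal check under illustrative assumptions: a randomly generated positive semidefinite matrix stands in for $\boldsymbol{V}_{N}$, the centering matrix stands in for $\boldsymbol{T}$, and the helper name `bilinear_form_two_ways` is hypothetical.

```python
import numpy as np

def bilinear_form_two_ways(seed=2, m=5):
    """Return (Z12^T T Z34, 2 * E1^T D E3) for one random draw."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((m, m))
    v = g @ g.T                                   # stand-in for V_N
    t = np.eye(m) - np.ones((m, m)) / m           # symmetric projection in the role of T

    # symmetric square root of V via its eigendecomposition
    w_v, q_v = np.linalg.eigh(v)
    v_half = q_v @ np.diag(np.sqrt(np.clip(w_v, 0.0, None))) @ q_v.T

    # eigh returns A = q diag(lam) q^T, hence P = q^T gives P A P^T = D
    lam, q = np.linalg.eigh(v_half.T @ t @ v_half)
    p = q.T

    z12_tilde = rng.standard_normal(m)
    z34_tilde = rng.standard_normal(m)
    z12 = np.sqrt(2.0) * v_half @ z12_tilde       # Z ~ N(0, 2 V)
    z34 = np.sqrt(2.0) * v_half @ z34_tilde
    e1, e3 = p @ z12_tilde, p @ z34_tilde

    return z12 @ t @ z34, 2.0 * e1 @ np.diag(lam) @ e3

lhs, rhs = bilinear_form_two_ways()
print(np.isclose(lhs, rhs))
```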

Applying this argument to all three bilinear forms, it follows for the second moment that

$\begin{array}[]{ll}&{\mathbb{E}}\left(\left[\boldsymbol{E}_{1}^{\top}\boldsymbol{D}\boldsymbol{E}_{3}\boldsymbol{E}_{3}^{\top}\boldsymbol{D}\boldsymbol{E}_{5}\boldsymbol{E}_{5}^{\top}\boldsymbol{D}\boldsymbol{E}_{1}\right]^{2}\right)\\[5.16663pt] =&{\mathbb{E}}\left(\left[\sum_{i=1}^{ad}\lambda_{i}E_{1}^{(i)}E_{3}^{(i)}\right]^{2}\left[\sum_{j=1}^{ad}\lambda_{j}E_{3}^{(j)}E_{5}^{(j)}\right]^{2}\left[\sum_{h=1}^{ad}\lambda_{h}E_{5}^{(h)}E_{1}^{(h)}\right]^{2}\right)\\[6.45831pt] =&\sum\limits_{i_{1},i_{2},j_{1},j_{2},h_{1},h_{2}=1}^{ad}\lambda_{i_{1}}\lambda_{i_{2}}\lambda_{j_{1}}\lambda_{j_{2}}\lambda_{h_{1}}\lambda_{h_{2}}{\mathbb{E}}\left(E_{1}^{(i_{1})}E_{3}^{(i_{1})}E_{1}^{(i_{2})}E_{3}^{(i_{2})}E_{3}^{(j_{1})}E_{5}^{(j_{1})}E_{3}^{(j_{2})}E_{5}^{(j_{2})}E_{5}^{(h_{1})}E_{1}^{(h_{1})}E_{5}^{(h_{2})}E_{1}^{(h_{2})}\right).\end{array}$

Now we consider the expectation for the different index combinations. If all indices are equal, it is given by

[TABLE]

Moreover, for $i_{1}=i_{2}\neq h_{1}=h_{2}$ and $h_{2}\neq j_{1}=j_{2}\neq i_{1}$ it holds that

[TABLE]

Next, the case $i_{1}=i_{2}=j_{1}=j_{2}\neq h_{1}=h_{2}$ is considered (this result can also be used for the two analogous combinations):

[TABLE]

Finally, we consider the combination $i_{1}=j_{1}=h_{1}\neq i_{2}=j_{2}=h_{2}$ and obtain

[TABLE]

This is also true for $i_{1}=j_{2}=h_{1}\neq i_{2}=j_{1}=h_{2}$ and the analogous combinations, so, all in all, we have 4 combinations of this kind. All other index combinations lead to expectation zero: in these combinations at least one index appears only once in the product, and the components of the $\boldsymbol{E}_{i}$ are independent and centered. Therefore it holds that

$\begin{array}[]{ll}&{\mathbb{E}}\left(\left[\boldsymbol{E}_{1}^{\top}\boldsymbol{D}\boldsymbol{E}_{3}\boldsymbol{E}_{3}^{\top}\boldsymbol{D}\boldsymbol{E}_{5}\boldsymbol{E}_{5}^{\top}\boldsymbol{D}\boldsymbol{E}_{1}\right]^{2}\right)\\[8.61108pt] =&\sum\limits_{i=1}^{ad}\lambda_{i}^{6}\cdot 27+\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{3}\lambda_{j}^{3}\cdot 1\cdot 4+\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}\cdot 9+\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}\\[12.91663pt] =&23\sum\limits_{i=1}^{ad}\lambda_{i}^{6}+4\left(\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{3}\lambda_{j}^{3}+\sum\limits_{i=j=1}^{ad}\lambda_{i}^{3}\lambda_{j}^{3}\right)+9\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}\end{array}$

$\begin{array}[]{ll}=&17\sum\limits_{i=1}^{ad}\lambda_{i}^{6}+4\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)+3\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+6\left(\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+\sum\limits_{i=j=1}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}\right)+\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}\\[12.91663pt] =&17\sum\limits_{i=1}^{ad}\lambda_{i}^{6}+4\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)+3\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+6\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)+\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}\\[12.91663pt] \stackrel{{\scriptstyle\ref{Spur1}}}{{\leq}}&17\sum\limits_{i=1}^{ad}\lambda_{i}^{6}+4\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)+3\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+6\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)+\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}\end{array}$

$\begin{array}[]{ll}\stackrel{{\scriptstyle\ref{Spur1}}}{{\leq}}&20\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)+6\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)+\left(\sum\limits_{\footnotesize\begin{subarray}{c}i,j,h=1\\ i\neq j\neq h\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{2}\lambda_{h}^{2}+3\sum\limits_{\footnotesize\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{ad}\lambda_{i}^{2}\lambda_{j}^{4}+\sum\limits_{i=1}^{ad}\lambda_{i}^{6}\right)\\[10.76385pt] =&20\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)+7\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\\ \stackrel{{\scriptstyle\ref{Spur1}}}{{\leq}}&20\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)+7\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\\ \stackrel{{\scriptstyle\ref{Spur1}}}{{\leq}}&27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$
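The case-by-case evaluation of the second moment and the resulting bound can be cross-checked by brute force for small dimensions. The Python sketch below sums the Gaussian fourth moments (Isserlis' formula) over all six indices and compares the result with a power-sum rearrangement of the expression above, $16p_{6}+4p_{3}^{2}+6p_{2}p_{4}+p_{2}^{3}$ with $p_{k}=\sum_{i}\lambda_{i}^{k}$. This rearrangement and the function names are ours and are introduced only for this check; nonnegative eigenvalues are assumed.

```python
import numpy as np

def sixth_order_moment(lam):
    """Exact E[(E1'DE3 * E3'DE5 * E5'DE1)^2] for independent N(0, I) vectors."""
    lam = np.asarray(lam, dtype=float)
    eye = np.eye(lam.size)
    # m4[a, b, c, d] = E[x_a x_b x_c x_d] for x ~ N(0, I) (Isserlis' formula)
    m4 = (np.einsum("ab,cd->abcd", eye, eye)
          + np.einsum("ac,bd->abcd", eye, eye)
          + np.einsum("ad,bc->abcd", eye, eye))
    # index roles: (a, b) = (i1, i2), (c, d) = (j1, j2), (e, f) = (h1, h2);
    # E1 carries (a, b, e, f), E3 carries (a, b, c, d), E5 carries (c, d, e, f)
    return float(np.einsum("a,b,c,d,e,f,abef,abcd,cdef->",
                           lam, lam, lam, lam, lam, lam, m4, m4, m4))

def closed_form(lam):
    """Power-sum form 16 p6 + 4 p3^2 + 6 p2 p4 + p2^3 of the same moment."""
    lam = np.asarray(lam, dtype=float)
    p2, p3, p4, p6 = (np.sum(lam ** k) for k in (2, 3, 4, 6))
    return 16 * p6 + 4 * p3 ** 2 + 6 * p2 * p4 + p2 ** 3

lam = np.array([0.5, 1.0, 2.0])          # nonnegative eigenvalues
exact = sixth_order_moment(lam)
bound = 27 * np.sum(lam ** 2) ** 3       # 27 tr^3((T V_N)^2)
print(exact, closed_form(lam), bound)
```

For a single eigenvalue $\lambda_{1}=1$ both functions return 27, the product of three Gaussian fourth moments, which is the case in which the bound is attained.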

So we can control the variance by

$\begin{array}[]{ll}\operatorname{{\it Var}}(C_{5})&\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{\operatorname{{\it Var}}\left(\Lambda_{1}(1,2,3,4,5,6,\dots,5,6)\cdot\Lambda_{2}(1,2,3,4,5,6,\dots,5,6)\cdot\Lambda_{3}(1,2,3,4,5,6,\dots,5,6)\right)}{{64\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}\cdot{\left(\prod\limits_{i=1}^{a}\binom{n_{i}}{6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)^{-1}}}\\[12.91663pt] &{\leq}\frac{{\mathbb{E}}\left(\left[\Lambda_{1}(1,2,3,4,5,6,\dots,5,6)\cdot\Lambda_{2}(1,2,3,4,5,6,\dots,5,6)\cdot\Lambda_{3}(1,2,3,4,5,6,\dots,5,6)\right]^{2}\right)}{{64\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}\cdot{\left(\prod\limits_{i=1}^{a}\binom{n_{i}}{6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)^{-1}}}\end{array}$

$\begin{array}[]{ll}{\color[rgb]{1,1,1}\operatorname{{\it Var}}(C_{5})}&=\frac{{\mathbb{E}}\left(\left[2^{3}\cdot\boldsymbol{E}_{1}^{\top}\boldsymbol{D}\boldsymbol{E}_{3}\boldsymbol{E}_{3}^{\top}\boldsymbol{D}\boldsymbol{E}_{5}\boldsymbol{E}_{5}^{\top}\boldsymbol{D}\boldsymbol{E}_{1}\right]^{2}\right)}{{64\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}\cdot{\left(\prod\limits_{i=1}^{a}\binom{n_{i}}{6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)^{-1}}}\\[15.0694pt] &\leq\frac{\left(\prod\limits_{i=1}^{a}{n_{i}\choose 6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)}{\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

∎

With this result, we can construct an estimator for $\tau_{P}$ step by step:

Lemma A.14:

For $C_{5}$ as previously defined, it holds for fixed $a$ that

[TABLE]

The convergence also holds in the asymptotic frameworks (4)-(5), provided there exists $p>1$ with $n_{\min}=\mathcal{O}(a^{p})$.

Proof:

From the previous lemma, we know that

$\begin{array}[]{ll}{\mathbb{E}}\left(\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&={\mathbb{E}}\left(\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=0,\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&=\frac{\operatorname{{\it Var}}(C_{5})}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\stackrel{{\scriptstyle\ref{MSchae2}}}{{\leq}}27\cdot\frac{\left(\prod\limits_{i=1}^{a}{n_{i}\choose 6}-\prod\limits_{i=1}^{a}\binom{n_{i}-6}{6}\right)}{\prod\limits_{i=1}^{a}\binom{n_{i}}{6}}.\end{array}$

For fixed $a$ this is a zero sequence. If we consider $a\to\infty$, we need the existence of $p>1$ with $n_{\min}=\mathcal{O}(a^{p})$ to guarantee that the upper bound is a zero sequence.

So in both cases A.6 can be used. ∎
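The role of the growth condition can be illustrated by evaluating the factor $1-\prod_{i=1}^{a}\binom{n_{i}-6}{6}/\binom{n_{i}}{6}$ from the variance bound directly. The Python sketch below does this for balanced designs; taking all group sizes equal to $a^{p}$ is an assumption made purely for illustration, and the function name `c5_variance_factor` is hypothetical.

```python
from math import comb

def c5_variance_factor(group_sizes, m=6):
    """Evaluate 1 - prod_i C(n_i - m, m) / C(n_i, m)."""
    prod = 1.0
    for n in group_sizes:
        if n < m:
            raise ValueError("every group needs at least m observations")
        prod *= comb(n - m, m) / comb(n, m)
    return 1.0 - prod

# fixed number of groups, growing group sizes
for n in (20, 200, 2000, 20000):
    print("a=3, n=%d:" % n, c5_variance_factor([n] * 3))

# growing number of groups with n_i = a**p
for a in (10, 50, 250):
    print("a=%d:" % a,
          "p=1 ->", c5_variance_factor([a] * a),
          "p=2 ->", c5_variance_factor([a ** 2] * a))
```

In this balanced setting the factor behaves roughly like $1-\exp(-36a/n)$ for large $n$, so it vanishes when the group sizes grow faster than the number of groups and stays close to one when both grow at the same rate.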

Lemma A.15:

Moreover, for fixed $a$, $C_{5}$ satisfies

[TABLE]

If $p>1$ exists with $n_{\min}=\mathcal{O}(a^{p})$ , the convergence even holds in the asymptotic frameworks (4)-(5).

Proof:

With the last lemma it follows for both cases that

$\begin{array}[]{ll}\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\tau_{P}&=\left(\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}-\left(\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}\\[8.61108pt] &\vspace{.25cm}=\left[\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right]\left[\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}+\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right]\end{array}$

$\begin{array}[]{ll}\vspace{.25cm}{\color[rgb]{1,1,1}\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\tau_{P}}&=o_{P}(1)\cdot\left[\frac{C_{5}}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}+2\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right]\\[5.16663pt] &=o_{P}(1)\cdot\left[o_{P}(1)+2\cdot\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right]=o_{P}(1).\end{array}$

For the last step we used that $\tau_{P}\in[0,1]$, which is known from A.8, and hence ${\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}\left/{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right.\in[-1,1]$. As the product of a bounded term and a term that converges to zero in probability, the expression also converges to zero in probability, and the result follows with Slutsky's lemma. ∎

Proof of 4.1:

From 3.1 together with A.6 it follows

[TABLE]

independently of $d$ or $a$. With A.15 it follows

[TABLE]

or, under the additional condition, also in the asymptotic frameworks (4)-(5).

With these limits we can calculate in both cases

$\begin{array}[]{ll}\frac{C_{5}^{2}}{A_{4}^{3}}-\tau_{P}&=\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\cdot\frac{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}{A_{4}^{3}}-\tau_{P}\\[5.16663pt] &=\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\cdot(1+o_{P}(1))-\tau_{P}\\[5.16663pt] &=\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\tau_{P}+\left(\frac{C_{5}^{2}}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\tau_{P}+\tau_{P}\right)\cdot o_{P}(1)\\[7.74998pt] &=o_{P}(1)+o_{P}(1)\cdot o_{P}(1)+\tau_{P}\cdot o_{P}(1)=o_{P}(1).\end{array}$

As in the previous lemma, we used $\tau_{P}\in[0,1]$ and Slutsky's lemma. ∎

For ${C_{5}^{\star}}$ the properties are shown in a similar way as in A.12.

Lemma A.16:

For

[TABLE]

define

[TABLE]

Then it holds

[TABLE]

Proof:

With the same steps as in the previous lemma, and using the fact that expectation and variance do not depend on the concrete indices but only on the independence structure, we get

$\begin{array}[]{ll}{\mathbb{E}}\left({C_{5}^{\star}}(B)\right)&=\frac{1}{8B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\Lambda_{1}(\boldsymbol{\sigma}(b,6))\cdot\Lambda_{2}(\boldsymbol{\sigma}(b,6))\cdot\Lambda_{3}(\boldsymbol{\sigma}(b,6))\right)\\[7.74998pt] &=\frac{1}{8B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\Lambda_{1}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{2}(\ell_{1,1},\dots,\ell_{6,a})\cdot\Lambda_{3}(\ell_{1,1},\dots,\ell_{6,a})\right)\\[7.74998pt] &\stackrel{{\scriptstyle\ref{MSchae2}}}{{=}}\frac{1}{8B}\sum\limits_{b=1}^{B}\operatorname{tr}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).\end{array}$

$\begin{array}[]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left({C_{5}^{\star}}(B)|\mathcal{F}(\boldsymbol{\sigma}(B,6))\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)\right)=0.\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left({C_{5}^{\star}}(B)\right)&=0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left({C_{5}^{\star}}(B)|\mathcal{F}(\boldsymbol{\sigma}(B,6))\right)\right)\\[4.30554pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{1}{64B^{2}}{\mathbb{E}}\left(\sum\limits_{(j,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}(b,6))}\operatorname{{\it Var}}\left(\Lambda_{1}(\boldsymbol{\sigma}(j,6))\Lambda_{2}(\boldsymbol{\sigma}(j,6))\Lambda_{3}(\boldsymbol{\sigma}(j,6))|\mathcal{F}(\boldsymbol{\sigma}(B,6))\right)\right)\\[10.33327pt] &=\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}(b,6))|\right)}{B^{2}}\cdot\frac{\operatorname{{\it Var}}\left({\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\cdot{\boldsymbol{Z}_{(3,4)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(5,6)}\cdot{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(1,2)}\right)}{64}\\[4.30554pt] &\stackrel{{\scriptstyle\ref{MSchae2}}}{{\leq}}\left(1-\left(1-\frac{1}{B}\right)\cdot\prod\limits_{i=1}^{a}\frac{\binom{n_{i}-6}{6}}{\binom{n_{i}}{6}}\right)\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

∎
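The factor in front of $\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)$ in the last bound can be evaluated numerically to see how it reacts to the number of groups. The following is a minimal sketch and not part of the original derivation; the function name `variance_factor` and the parameter `k` (number of indices drawn per group, 6 here) are illustrative assumptions.

```python
from math import comb


def variance_factor(group_sizes, B, k=6):
    """Evaluate 1 - (1 - 1/B) * prod_i C(n_i - k, k) / C(n_i, k).

    group_sizes: list of the group sample sizes n_1, ..., n_a
    B:           number of subsamples
    k:           indices used per group (hypothetical parameter; k = 6 above)
    """
    prod = 1.0
    for n in group_sizes:
        if n < k:
            raise ValueError("each group needs at least k observations")
        # comb(n - k, k) is 0 whenever n < 2k, so the product vanishes then
        prod *= comb(n - k, k) / comb(n, k)
    return 1.0 - (1.0 - 1.0 / B) * prod


if __name__ == "__main__":
    # Same group size, growing number of groups: the factor moves towards 1.
    for a in (2, 5, 20):
        print(a, variance_factor([50] * a, B=1000))
```

For fixed group sizes the product shrinks geometrically in the number of groups, so the bound only becomes small when the sample sizes grow fast enough relative to $a$.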

Proof of Theorem 4.2:

With A.16 we recognize that ${\tau_{P}\to 1\Leftrightarrow\widehat{\tau_{P}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}1}$ and $\tau_{P}\to 0\Leftrightarrow\widehat{\tau_{P}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}0$ . Therefore $f_{P}\to 1\Leftrightarrow\widehat{f_{P}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}1$ and $f_{P}\to\infty\Leftrightarrow\widehat{f_{P}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}\infty$ . This is the only condition needed for the proof of [31, Theorem 3.1], so the result follows. ∎

Although the condition $n_{\min}=\mathcal{O}(a^{p})$ with $p>0$ is not too critical in most settings, we additionally developed an estimator which can be used without any such restriction.

For this estimator, another random vector has to be introduced: $\pi_{j,i}$ denotes a random permutation of the numbers $1,\dots,n_{i},$ where the $\pi_{j,i}$ are independent for different $i$ or $j$, and $\pi_{j,i}(l)$ denotes its $l$ -th element. Then we define

[TABLE]

with

[TABLE]

and

[TABLE]

This estimator again uses $\boldsymbol{Z}$, but in contrast to $C_{5}$ the indices are the same for all groups. However, the highest index is then $n_{\min}$, so some index combinations are unattainable. For this reason, the random permutations above are used: first, the observations in each group are rearranged randomly, and the sum of the relevant terms is calculated from the rearranged samples. Thereafter, the observations are rearranged again and the same terms are calculated as before. Summing these values and dividing by the number of rearrangements yields an alternative to $C_{5}$, whose properties are given in the following lemma.

Lemma A.17:

For $C_{7}$ as defined before it holds

[TABLE]

Proof:

Again we calculate

$\begin{array}[]{ll}{\mathbb{E}}\left(C_{7}\left(w\right)\right)&=\frac{1}{w}\sum\limits_{j=1}^{w}\sum\limits_{\ell_{1}\neq\dots\neq\ell_{6}=1}^{n_{\min}}\frac{{\mathbb{E}}\left(\Lambda_{4}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{5}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{6}\left(j;\ell_{1},\dots,\ell_{6}\right)\right)}{8\cdot\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\\[10.76385pt] &=\frac{1}{w}\sum\limits_{j=1}^{w}\sum\limits_{\ell_{1}\neq\dots\neq\ell_{6}=1}^{n_{\min}}\frac{{\mathbb{E}}\left(\Lambda_{4}\left(j;1,\dots,6\right)\cdot\Lambda_{5}\left(j;1,\dots,6\right)\cdot\Lambda_{6}\left(j;1,\dots,6\right)\right)}{8\cdot\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).\end{array}$

Because all groups use the same indices, the number of remaining index combinations simplifies and we obtain

[TABLE]

For the sum this leads to

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(C_{7}\left(w\right)\right)&=\operatorname{{\it Var}}\left(\frac{1}{w}\sum\limits_{j=1}^{w}\sum\limits_{\ell_{1}\neq\dots\neq\ell_{6}=1}^{n_{\min}}\frac{\Lambda_{4}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{5}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{6}\left(j;\ell_{1},\dots,\ell_{6}\right)}{8\cdot\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\right)\\[10.76385pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{1}{w^{2}}\sum\limits_{j_{1},j_{2}=1}^{w}\operatorname{{\it Var}}\left(\sum\limits_{\ell_{1}\neq\dots\neq\ell_{6}=1}^{n_{\min}}\frac{\Lambda_{4}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{5}\left(j;\ell_{1},\dots,\ell_{6}\right)\cdot\Lambda_{6}\left(j;\ell_{1},\dots,\ell_{6}\right)}{8\cdot\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\right)\end{array}$

$\begin{array}[]{ll}\phantom{\operatorname{{\it Var}}\left(C_{7}\left(w\right)\right)}&\leq\frac{1}{w^{2}}\sum\limits_{j_{1},j_{2}=1}^{w}\left(\frac{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}-\frac{\left(n_{\min}-6\right)!}{\left(n_{\min}-12\right)!}}{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)\\[10.76385pt] &=\left(\frac{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}-\frac{\left(n_{\min}-6\right)!}{\left(n_{\min}-12\right)!}}{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right).\end{array}$

∎

Simulations (not shown here) indicate that larger values of $w$ lead to better estimates.
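The rearrangement scheme behind $C_{7}$ can be sketched in a few lines. This is only an illustration under stated assumptions: `statistic` is a hypothetical placeholder for the normalized inner sum over the products $\Lambda_{4}\Lambda_{5}\Lambda_{6}$ (whose definition is not reproduced here), and the names `rearrangement_average`, `groups` and `seed` are not taken from the paper.

```python
import random


def rearrangement_average(groups, statistic, w, seed=None):
    """Average a statistic over w independent random rearrangements.

    groups:    list of lists, one list of observations per group
    statistic: hypothetical callable evaluated on the rearranged groups,
               each cut to the common length n_min
    w:         number of rearrangements
    """
    rng = random.Random(seed)
    n_min = min(len(g) for g in groups)
    total = 0.0
    for _ in range(w):
        # independent random permutation of the observations within each group
        shuffled = [rng.sample(g, len(g)) for g in groups]
        # all groups share the indices 1, ..., n_min, so only these are used
        common = [g[:n_min] for g in shuffled]
        total += statistic(common)
    return total / w
```

Because every rearrangement is drawn independently, observations beyond position $n_{\min}$ in the larger groups still enter the average through later rearrangements.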

Lemma A.18:

For $C_{7}$ as previously defined, it holds

[TABLE]

independent of $a$ or $d$. Therefore this holds for the asymptotic frameworks (3)-(5).

Proof:

With the previous lemma we know

$\begin{array}[]{ll}{\mathbb{E}}\left(\frac{C_{7}(w)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&={\mathbb{E}}\left(\frac{C_{7}(w)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=0,\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{C_{7}(w)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}{\operatorname{tr}^{3/2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&=\frac{\operatorname{{\it Var}}\left(C_{7}(w)\right)}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}{\leq}\left(\frac{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}-\frac{\left(n_{\min}-6\right)!}{\left(n_{\min}-12\right)!}}{\frac{n_{\min}!}{\left(n_{\min}-6\right)!}}\right)\cdot\mathcal{O}\left(1\right).\end{array}$

Hence exactly the same steps as in the proof of Theorem 4.1, which in this case use that the null sequence does not depend on $a$ or $d$ , lead to the result. ∎

However, the calculation of this estimator requires $w\cdot{n_{\min}!}/{\left(n_{\min}-6\right)!}$ summations. Thus, a subsampling-type version of $C_{7}$ is necessary, which is defined next.
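To get a feeling for this count, the number of summands can be evaluated directly. The helper below is a small sketch; its name `c7_num_summands` is an assumption made for illustration.

```python
from math import perm


def c7_num_summands(w, n_min):
    """Return w * n_min! / (n_min - 6)!, the number of ordered index
    tuples of six distinct indices, summed over w rearrangements."""
    return w * perm(n_min, 6)


if __name__ == "__main__":
    for n_min in (10, 20, 50):
        print(n_min, c7_num_summands(w=5, n_min=n_min))
```

Since the count grows like $n_{\min}^{6}$, even moderate sample sizes make the full sum expensive.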

Lemma A.19:

For each $b=1,\dots,B$ we independently draw random subsamples $\boldsymbol{\sigma}_{0}(b,6)$ of length $6$ from $\{1,\dots,n_{\min}\}$ and define

${C_{7}^{\star}}\left(w,B\right)=\sum\limits_{j=1}^{w}\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8wB}.$

Then it holds

[TABLE]

Proof:

The proof for this subsampling-type estimator follows the same steps as before, with a different set $M(B,\boldsymbol{\sigma}_{0}(b,6))$ . We first calculate the expectation and an upper bound for the variance of the inner sum. We get

$\begin{array}[]{ll}&{\mathbb{E}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8B}\right)\\[10.76385pt] =&\sum\limits_{b=1}^{B}\frac{{\mathbb{E}}\left(\Lambda_{4}\left(j;1,\dots,6\right)\cdot\Lambda_{5}\left(j;1,\dots,6\right)\cdot\Lambda_{6}\left(j;1,\dots,6\right)\right)}{8B}=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right).\end{array}$

$\begin{array}[]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8B}\Big{\lvert}\mathcal{F}\left(\boldsymbol{\sigma}_{0}(B)\right)\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)\right)=0.\end{array}$

$\begin{array}[]{ll}&\operatorname{{\it Var}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8B}\right)\\[8.61108pt] =&0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8B}\Big{\lvert}\mathcal{F}\left(\boldsymbol{\sigma}_{0}(B)\right)\right)\right)\end{array}$

$\begin{array}[]{ll}=&\frac{{\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M\left(B,\boldsymbol{\sigma}_{0}(b,6)\right)|\right)}{B^{2}}\cdot\frac{\operatorname{{\it Var}}\left(\Lambda_{4}\left(j;{1},\dots,{6}\right)\cdot\Lambda_{5}\left(j;{1},\dots,{6}\right)\cdot\Lambda_{6}\left(j;{1},\dots,{6}\right)\right)}{64}\\[4.30554pt] \stackrel{{\scriptstyle\ref{MSchae2}}}{{\leq}}&\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-6}{6}}{\binom{n_{\min}}{6}}\right)\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

With these values we can consider the whole estimator

$\begin{array}[]{ll}{\mathbb{E}}\left({C_{7}^{\star}}\left(w,B\right)\right)&=\frac{1}{w}\sum\limits_{j=1}^{w}{\mathbb{E}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8\cdot B}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right),\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left({C_{7}^{\star}}(w,B)\right)&\leq\frac{1}{w^{2}}\left(\sum\limits_{j=1}^{w}\sqrt{\operatorname{{\it Var}}\left(\sum\limits_{b=1}^{B}\frac{\Lambda_{4}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{5}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)\cdot\Lambda_{6}\left(j;\boldsymbol{\sigma}_{0}(b,6)\right)}{8B}\right)}\right)^{2}\\[10.76385pt] &\leq\frac{1}{w^{2}}\left(\sum\limits_{j=1}^{w}\sqrt{\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-6}{6}}{\binom{n_{\min}}{6}}\right)\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)^{2}\\[10.76385pt] &=\left(1-\left(1-\frac{1}{B}\right)\cdot\frac{\binom{n_{\min}-6}{6}}{\binom{n_{\min}}{6}}\right)\cdot 27\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right).\end{array}$

∎
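The combinatorial factor in this bound can be checked by simulation. The sketch below rests on two assumptions that are not stated in this part of the text: each subsample consists of six distinct indices drawn without replacement, and $M(B,\boldsymbol{\sigma}_{0}(b,6))$ collects the pairs of draws whose index sets are disjoint. All function names are illustrative.

```python
import random
from math import comb


def overlap_fraction_exact(n_min, B, k=6):
    """Closed form 1 - (1 - 1/B) * C(n_min - k, k) / C(n_min, k)."""
    p_disjoint = comb(n_min - k, k) / comb(n_min, k)
    return 1.0 - (1.0 - 1.0 / B) * p_disjoint


def overlap_fraction_simulated(n_min, B, reps, k=6, seed=0):
    """Monte Carlo estimate of the expected share of ordered pairs (j, l)
    of draws whose index sets share at least one element."""
    rng = random.Random(seed)
    population = range(1, n_min + 1)
    overlapping = 0
    for _ in range(reps):
        # B independent subsamples of k distinct indices (assumed scheme)
        draws = [frozenset(rng.sample(population, k)) for _ in range(B)]
        for s in draws:
            for t in draws:
                if s & t:  # includes the diagonal pairs j == l
                    overlapping += 1
    return overlapping / (reps * B * B)


if __name__ == "__main__":
    print(overlap_fraction_exact(20, 30))
    print(overlap_fraction_simulated(20, 30, reps=200))
```

Under these assumptions the two values should agree up to simulation error, and the closed form tends to zero only if both $B$ and $n_{\min}$ grow.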

The next lemma shows that the versions of the estimators with random indices have all the properties of the classical ones.

Lemma A.20:

The statements of A.11, A.14, A.15, Theorem 4.1 and A.18 remain true if all or only some of the estimators are replaced by the subsampling-type estimators.

Moreover, Theorem 3.1, Theorem 3.2 and Theorem 4.2 hold if all or only some of the estimators are replaced by the subsampling-type estimators.

Proof:

For the proofs for the classical estimators from the first paragraph, only the expectations are used together with upper bounds for the variances which are null sequences. With random indices, the expectation is the same, and for the variance all traces are the same; only the null sequence changes. So the proofs for the subsampling-type estimators work identically.

For the second paragraph, only some convergences are necessary, which the subsampling-type estimators also fulfill. ∎

A.4 On the asymptotic distribution in our simulation designs

To choose the appropriate test for our simulations, the limit of $\beta_{1}$ has to be considered. Instead, we calculate the value of $\tau_{P}={\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{3}\right)}\Big{/}{\operatorname{tr}^{3}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}$ ; because $\boldsymbol{V}_{N}$ is known, no estimation is needed. The ratios $n_{1}/N$ and $n_{2}/N$ are the same for all our sample sizes, so the different values of $n_{1},n_{2}$ have no influence on the values of $\tau_{P}$ . The results can be seen in Table 3 and Table 4, which lead to the assumption $\tau_{P}\to 1$ for $H_{0}^{a}$ and $\tau_{P}\to 0$ for $H_{0}^{b}$ . With A.8 this is equivalent to $\beta_{1}\to 1$ under $H_{0}^{a}$ and $\beta_{1}\to 0$ under $H_{0}^{b}$ , respectively.
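For a known covariance matrix, $\tau_{P}$ can be computed directly from its definition. The following sketch uses plain Python lists as matrices; the two example matrices (a centering matrix and an averaging matrix, each combined with the identity as covariance) are illustrative assumptions and not the simulation designs of the paper.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def tau_p(T, V):
    """tr^2((TV)^3) / tr^3((TV)^2) for known T and V."""
    M = matmul(T, V)
    M2 = matmul(M, M)
    M3 = matmul(M2, M)
    return trace(M3) ** 2 / trace(M2) ** 3


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def averaging(d):
    """J_d / d, a projection of rank one."""
    return [[1.0 / d for _ in range(d)] for _ in range(d)]


def centering(d):
    """I_d - J_d / d, a projection of rank d - 1."""
    return [[(1.0 if i == j else 0.0) - 1.0 / d for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    for d in (4, 16, 64):
        print(d, tau_p(averaging(d), identity(d)), tau_p(centering(d), identity(d)))
```

In these toy cases a rank-one product $\boldsymbol{T}\boldsymbol{V}_{N}$ gives $\tau_{P}=1$ for every $d$, whereas a projection of rank $d-1$ gives $\tau_{P}=1/(d-1)$, which tends to zero as $d$ grows.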

A.5 On the Chen-Qin-Condition

We can also develop an estimator for $\tau_{CQ}=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)/\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)=1/f_{CQ}$ in an analogous way as before. This leads to:

Lemma A.21:

Let

[TABLE]

with

$\begin{array}[]{ll}\Lambda_{7}(\ell_{1,1},\dots,\ell_{8,a})&=\left[\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}\right]^{4},\\ \Lambda_{8}(\ell_{1,1},\dots,\ell_{8,a})&=\left[\boldsymbol{Z}_{(\ell_{1,1},\ell_{2,1},\dots,\ell_{1,a},\ell_{2,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{3,1},\ell_{4,1},\dots,\ell_{3,a},\ell_{4,a})}\right]^{2}\cdot\left[\boldsymbol{Z}_{(\ell_{5,1},\ell_{6,1},\dots,\ell_{5,a},\ell_{6,a})}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(\ell_{7,1},\ell_{8,1},\dots,\ell_{7,a},\ell_{8,a})}\right]^{2}.\end{array}$

Then it holds

[TABLE]

Proof:

$\begin{array}[]{ll}{\mathbb{E}}(C_{6})&=\frac{{\mathbb{E}}\left(\left[\boldsymbol{Z}_{(1,2)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{4}\right)}{6\cdot 16}-\frac{{\mathbb{E}}\left(\left[\boldsymbol{Z}_{(1,2)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{2}\left[\boldsymbol{Z}_{(5,6)}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(7,8)}\right]^{2}\right)}{2\cdot 16}\\[4.30554pt] &\stackrel{{\scriptstyle\ref{QF3}}}{{=}}\frac{1}{6\cdot 16}\left(6\operatorname{tr}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)+3\operatorname{tr}^{2}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)-\frac{1}{2\cdot 16}\operatorname{tr}^{2}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)\end{array}$
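The cancellation in the last step can be verified with exact rational arithmetic. The sketch below only tracks the coefficients of $\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)$ and $\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)$ that appear in the display above; the variable names are illustrative.

```python
from fractions import Fraction

# Pulling the factor 2 out of the traces:
#   tr((2TV)^4)   = 2^4     * tr((TV)^4)
#   tr^2((2TV)^2) = (2^2)^2 * tr^2((TV)^2)
scale_tr4 = 2 ** 4
scale_tr2_squared = (2 ** 2) ** 2

# Coefficient of tr((TV)^4): only the fourth-moment term contributes.
coef_tr4 = Fraction(6 * scale_tr4, 6 * 16)

# Coefficient of tr^2((TV)^2): fourth-moment term minus the product term.
coef_tr2_squared = (Fraction(3 * scale_tr2_squared, 6 * 16)
                    - Fraction(scale_tr2_squared, 2 * 16))

print(coef_tr4, coef_tr2_squared)
```

The first coefficient equals one and the second vanishes, which is exactly why the product term is subtracted with weight $1/2$.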

For the variance bound, the variances of the individual parts are considered first. As before, with A.2 and A.4 we calculate

$\operatorname{{\it Var}}\left(\frac{1}{6}\left[{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{4}\right)=\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)$

and

$\begin{array}[]{ll}&\operatorname{{\it Var}}\left(\frac{1}{2}\left[{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{2}\left[{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(7,8)}\right]^{2}\right)\\[6.45831pt] \leq&\frac{1}{4}\cdot{\mathbb{E}}\left(\left[{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{4}\left[{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(7,8)}\right]^{4}\right)\\[6.45831pt] =&\frac{1}{4}\left(6\operatorname{tr}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)+3\operatorname{tr}^{2}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)^{2}=\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right).\end{array}$

With A.7 it is known that

$\begin{array}[]{ll}\operatorname{{\it Var}}(A-B)&\leq\operatorname{{\it Var}}(A)+\operatorname{{\it Var}}(B)+2|\operatorname{{\it Cov}}(A,B)|\leq\left(\sqrt{\operatorname{{\it Var}}(A)}+\sqrt{\operatorname{{\it Var}}(B)}\right)^{2}\end{array}$

and therefore

$\begin{array}[]{ll}\operatorname{{\it Var}}(C_{6})&\leq\frac{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}-\prod\limits_{i=1}^{a}\binom{n_{i}-8}{8}}{16^{2}\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{8}}\operatorname{{\it Var}}\left(\frac{1}{6}\Lambda_{7}(1,\dots,8)-\frac{1}{2}\Lambda_{8}(1,\dots,8)\right)\end{array}$

$\begin{array}[]{ll}\phantom{\operatorname{{\it Var}}(C_{6})}&\leq\frac{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}-\prod\limits_{i=1}^{a}\binom{n_{i}-8}{8}}{16^{2}\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{8}}\left(\sqrt{\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)}+\sqrt{\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)}\right)^{2}\end{array}$

$\begin{array}[]{ll}\phantom{\operatorname{{\it Var}}(C_{6})}&=\frac{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}-\prod\limits_{i=1}^{a}\binom{n_{i}-8}{8}}{16^{2}\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{8}}{\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)}.\end{array}$

∎

Lemma A.22:

With the estimators introduced in the previous lemmata it holds for fixed $a$

[TABLE]

If there exists $p>1$ with $n_{\min}=\mathcal{O}(a^{p})$ , the convergence also holds in the asymptotic frameworks (4)-(5).

Proof:

Again we first consider the parts:

$\begin{array}[]{ll}{\mathbb{E}}\left(\frac{C_{6}}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)=\frac{{\mathbb{E}}\left(C_{6}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}=0.\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left(\frac{C_{6}}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}-\frac{\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)}{\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\right)&\leq\frac{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}-\prod\limits_{i=1}^{a}\binom{n_{i}-8}{8}}{16^{2}\cdot\prod\limits_{i=1}^{a}\binom{n_{i}}{8}}\frac{\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right)}{\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\leq\frac{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}-\prod\limits_{i=1}^{a}\binom{n_{i}-8}{8}}{\prod\limits_{i=1}^{a}\binom{n_{i}}{8}}\cdot\mathcal{O}(1).\end{array}$

So with A.6, for fixed $a$ and $d,n_{\min}\to\infty$ , and, if the additional condition is fulfilled, also for the asymptotic frameworks (4)-(5), it follows

[TABLE]

Analogously to the proof of Theorem 4.1, it follows that ${\operatorname{tr}^{2}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)}\Big{/}{A_{4}^{2}}\stackrel{{\scriptstyle P}}{{\longrightarrow}}1.$

Together this leads to

[TABLE]

∎

Again, in most cases the subsampling-type version of this estimator should be used.

Lemma A.23:

Let

[TABLE]

Then it holds

[TABLE]

Proof:

By using the same steps as before it holds

$\begin{array}[]{ll}{\mathbb{E}}\left({C_{6}^{\star}}(B)\right)&=\frac{1}{16B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\frac{\Lambda_{7}(\ell_{1,1},\dots,\ell_{8,a})}{6}-\frac{\Lambda_{8}(\ell_{1,1},\dots,\ell_{8,a})}{2}\right)\\[6.88889pt] &=\frac{1}{16B}\sum\limits_{b=1}^{B}{\mathbb{E}}\left(\left[{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{2}\cdot\left(\frac{\left[{\boldsymbol{Z}_{(1,2)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(3,4)}\right]^{2}}{6}-\frac{\left[{\boldsymbol{Z}_{(5,6)}}^{\top}\boldsymbol{T}\boldsymbol{Z}_{(7,8)}\right]^{2}}{2}\right)\right)\\[6.88889pt] &\stackrel{{\scriptstyle\ref{MSchae3}}}{{=}}\frac{1}{16B}\sum\limits_{b=1}^{B}\operatorname{tr}\left(\left(2\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)=\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right).\end{array}$

$\begin{array}[]{l}\operatorname{{\it Var}}\left({\mathbb{E}}\left({C_{6}^{\star}}(B)|\mathcal{F}(\boldsymbol{\sigma}(B,8))\right)\right)=\operatorname{{\it Var}}\left(\operatorname{tr}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{4}\right)\right)=0.\end{array}$

$\begin{array}[]{ll}\operatorname{{\it Var}}\left({C_{6}^{\star}}(B)\right)&=0+{\mathbb{E}}\left(\operatorname{{\it Var}}\left({C_{6}^{\star}}(B)|\mathcal{F}(\boldsymbol{\sigma}(B,8))\right)\right)\\[2.15277pt] &\stackrel{{\scriptstyle\ref{Var1}}}{{\leq}}\frac{1}{16^{2}B^{2}}{\mathbb{E}}\left(\sum\limits_{(j,\ell)\in{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}(b,8))}\operatorname{{\it Var}}\left(\frac{\Lambda_{7}(\boldsymbol{\sigma}(j,8))}{6}-\frac{\Lambda_{8}(\boldsymbol{\sigma}(j,8))}{2}\Big{\lvert}\mathcal{F}(\boldsymbol{\sigma}(B,8))\right)\right)\\[3.44444pt] \end{array}$

$\begin{array}[]{ll}\phantom{\operatorname{{\it Var}}\left({C_{6}^{\star}}(B)\right)}&=\frac{\operatorname{{\it Var}}\left(\frac{\Lambda_{7}(\ell_{1,1},\dots,\ell_{8,a})}{6}-\frac{\Lambda_{8}(\ell_{1,1},\dots,\ell_{8,a})}{2}\right)}{16^{2}B^{2}\cdot\left({\mathbb{E}}\left(|{\mathbb{N}}_{B}\times{\mathbb{N}}_{B}\setminus M(B,\boldsymbol{\sigma}(b,8))|\right)\right)^{-1}}\\[6.45831pt] &\stackrel{{\scriptstyle\ref{MSchae3}}}{{\leq}}\left(1-\left(1-\frac{1}{B}\right)\cdot\prod\limits_{i=1}^{a}\frac{\binom{n_{i}-8}{8}}{\binom{n_{i}}{8}}\right)\cdot\mathcal{O}\left(\operatorname{tr}^{4}\left(\left(\boldsymbol{T}\boldsymbol{V}_{N}\right)^{2}\right)\right).\end{array}$

∎

With A.19 we get an estimator for $\tau_{CQ}$ , namely $\widehat{\tau_{CQ}}({C_{6}^{\star}},A_{4})={{C_{6}^{\star}}}/{A_{4}^{2}}$ ; once more, for a large number of groups, ${A_{4}^{\star}}$ should be used.

Lemma A.24:

Theorem 4.1 is also valid if $f_{P}$ is replaced by $f_{CQ}$ or by $(\widehat{\tau_{CQ}}({C_{6}},A_{4}))^{-1}$ . Using ${C_{6}^{\star}}$ or ${A_{4}^{\star}}$ does not change the result either. Likewise, the result of A.22 remains true if one or all estimators are replaced by their subsampling versions.

Proof:

With A.8 we know $f_{P}\to 1\Leftrightarrow f_{CQ}\to 1$ and $f_{P}\to\infty\Leftrightarrow f_{CQ}\to\infty$ , so in both cases $K_{f_{P}}$ is asymptotically identical to $K_{f_{CQ}}$ .

From A.22 we know that $\widehat{\tau_{CQ}}-\tau_{CQ}$ converges in probability to zero, so this result follows identically to Theorem 4.1. Finally, the subsampling versions have the same properties as the standard estimators. ∎

Therefore this is a second way to test the hypotheses and, moreover, because of A.8 it provides an indicator for the choice of the limit distribution. For situation c) from Theorem 3.1 there is no proof that this approach can be used, but in the case of just one group it leads to good results.

Bibliography


[1] Ahmad, M. R., Werner, C. and Brunner, E. (2008). Analysis of High Dimensional Repeated Measures Designs: The One Sample Case. Computational Statistics and Data Analysis, 53, 416–427.
[2] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6, 311–329.
[3] Bathke, A. C. and Harrar, S. W. (2008). Nonparametric methods in multivariate factorial designs for large number of factor levels. Journal of Statistical Planning and Inference, 138, 588–610.
[4] Bathke, A. C., Harrar, S. W. and Madden, L. V. (2008). How to compare small multivariate samples using nonparametric tests. Computational Statistics and Data Analysis, 52, 4951–4965.
[5] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, New York.
[6] Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. The Annals of Mathematical Statistics, 25, 290–302.
[7] Brunner, E. (2009). Repeated measures under non-sphericity. Proceedings of the 6th St. Petersburg Workshop on Simulation.
[8] Brunner, E., Becker, B. and Werner, C. (2010). Approximate distributions of quadratic forms in high-dimensional repeated-measures designs. Technical Report, Department Medizinische Statistik, Georg-August-Universität Göttingen.