Causal Inference from Possibly Unbalanced Split-Plot Designs: A   Randomization-based Perspective

Rahul Mukerjee; Tirthankar Dasgupta

arXiv:1906.08420 · stat.ME · June 21, 2019


TL;DR

This paper develops new methods for causal inference in unbalanced split-plot designs, providing a conservative variance estimator that becomes unbiased under between-whole-plot additivity, together with a construction procedure that controls its bias.

Contribution

It extends randomization-based causal inference methods to unbalanced split-plot designs, introducing a new variance estimator that is unbiased under between-whole-plot additivity and a construction procedure that yields such an estimator with minimax bias.

Findings

1. Derived an expression for the sampling variance of treatment contrast estimators.
2. Proposed a new variance estimator that becomes unbiased under milder conditions.
3. Introduced a construction procedure that yields such an estimator with minimax bias.

Abstract

Split-plot designs find wide applicability in multifactor experiments with randomization restrictions. Practical considerations often warrant the use of unbalanced designs. This paper investigates randomization based causal inference in split-plot designs that are possibly unbalanced. Extension of ideas from the recently studied balanced case yields an expression for the sampling variance of a treatment contrast estimator as well as a conservative estimator of the sampling variance. However, the bias of this variance estimator does not vanish even when the treatment effects are strictly additive. A careful and involved matrix analysis is employed to overcome this difficulty, resulting in a new variance estimator, which becomes unbiased under milder conditions. A construction procedure that generates such an estimator with minimax bias is proposed.

Tables

Table 1: Simulation settings

Population	$θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4}$	$σ_{1}^{2}$	$σ_{2}^{2}$	$σ_{3}^{2}$	$σ_{4}^{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{3}$	$ρ_{4}$
I	(10,5,9,8)	(10,5,9,8)	(10,5,9,8)	(10,5,9,8)	2	2	2	2	1	1	1	1
II	(10,5,9,8)	(9,7,4,6)	(11,8,7,8)	(8,7,6,9)	2.5	2	2	3	.5	.5	.5	.5
III	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	1	1	1	1
IV	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	.5	.5	.5	.5
V	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	.2	.4	.6	.8
VI	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	0	0	0	0
VII	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	-.3	-.3	-.3	-.3
VIII	(10,5,9,8)	(5,9,10,8)	(10,9,8,5)	(10,5,8,9)	2.5	2	2	3	-.3	.3	-.3	.3

Equations

\tau_{i} = \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} g(z_{1}z_{2})\, Y_{i}(z_{1}z_{2}),

\overline{Y}(z_{1}z_{2}) = N^{-1} \sum_{i=1}^{N} Y_{i}(z_{1}z_{2}),

\overline{\tau} = N^{-1} \sum_{i=1}^{N} \tau_{i} = \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} g(z_{1}z_{2})\, \overline{Y}(z_{1}z_{2}),

M_{1} = \cdots = M_{W}, \quad r_{12}(z_{2}) = \cdots = r_{W2}(z_{2}) \ \mbox{for all} \ z_{2} \in Z_{2}.

\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2}) = \{r_{w2}(z_{2})\}^{-1} \sum_{i \in T_{w2}(z_{2})} Y_{i}(z_{1}z_{2}),

\overline{Y}^{\textnormal{obs}}(z_{1}z_{2}) = \frac{W}{N r_{1}(z_{1})} \sum_{w \in T_{1}(z_{1})} M_{w} \overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2}) = \frac{1}{r_{1}(z_{1})} \sum_{w \in T_{1}(z_{1})} \frac{M_{w}}{\overline{M}} \overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2}),

\widehat{\overline{\tau}} = \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} g(z_{1}z_{2})\, \overline{Y}^{\textnormal{obs}}(z_{1}z_{2}),

U_{i}(z_{1}z_{2}) = (M_{w}/\overline{M})\, Y_{i}(z_{1}z_{2}),

S_{\textnormal{bt}}(z_{1}z_{2}, z_{1}^{*}z_{2}^{*}) = \frac{\overline{M}}{W-1} \sum_{w=1}^{W} \{\overline{U}_{w}(z_{1}z_{2}) - \overline{U}(z_{1}z_{2})\}\{\overline{U}_{w}(z_{1}^{*}z_{2}^{*}) - \overline{U}(z_{1}^{*}z_{2}^{*})\},

S_{\textnormal{in},w}(z_{1}z_{2}, z_{1}^{*}z_{2}^{*}) = \frac{1}{M_{w}-1} \sum_{i \in \Omega_{w}} \{U_{i}(z_{1}z_{2}) - \overline{U}_{w}(z_{1}z_{2})\}\{U_{i}(z_{1}^{*}z_{2}^{*}) - \overline{U}_{w}(z_{1}^{*}z_{2}^{*})\}.

\overline{\tau}_{w} = (1/M_{w}) \sum_{i \in \Omega_{w}} \tau_{i} = \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} g(z_{1}z_{2})\, \overline{Y}_{w}(z_{1}z_{2}), \quad w = 1, \ldots, W,

\overline{\tau} = (1/W) \sum_{w=1}^{W} (M_{w}/\overline{M})\, \overline{\tau}_{w}.

\Delta = \frac{1}{W(W-1)} \sum_{w=1}^{W} \{(M_{w}/\overline{M})\, \overline{\tau}_{w} - \overline{\tau}\}^{2},

\textnormal{var}(\widehat{\overline{\tau}})

\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2})

\overline{U}^{\textnormal{obs}}(z_{1}z_{2})

\overline{Y}^{\textnormal{obs}}(z_{1}z_{2}) = \overline{U}^{\textnormal{obs}}(z_{1}z_{2}).

\widehat{V}(\widehat{\overline{\tau}}) = \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} \sum_{z_{2}^{*} \in Z_{2}} \frac{g(z_{1}z_{2})\, g(z_{1}z_{2}^{*})}{r_{1}(z_{1})} S(z_{1}z_{2}, z_{1}z_{2}^{*}),

S(z_{1}z_{2}, z_{1}z_{2}^{*}) = \frac{1}{r_{1}(z_{1})-1} \sum_{w \in T_{1}(z_{1})} \{\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2}) - \overline{U}^{\textnormal{obs}}(z_{1}z_{2})\}\{\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2}^{*}) - \overline{U}^{\textnormal{obs}}(z_{1}z_{2}^{*})\}.

\overline{Y}_{1}(z_{1}z_{2}) - \overline{Y}_{1}(z_{1}^{*}z_{2}^{*}) = \cdots = \overline{Y}_{W}(z_{1}z_{2}) - \overline{Y}_{W}(z_{1}^{*}z_{2}^{*}),

\Delta = \frac{\overline{\tau}^{2}}{W(W-1)\overline{M}^{2}} \sum_{w=1}^{W} (M_{w} - \overline{M})^{2},

\Delta = (1/N)^{2} \Big[ \sum_{w=1}^{W} M_{w}^{2}\, \overline{\tau}_{w}^{2} - \sum_{w=1}^{W} \sum_{w^{*}=1,\, w^{*} \neq w}^{W} \{M_{w} M_{w^{*}}/(W-1)\}\, \overline{\tau}_{w} \overline{\tau}_{w^{*}} \Big].

\overline{\tau}_{w}^{2} = (1/M_{w})^{2} \sum_{z_{1} \in Z_{1}} \sum_{z_{2} \in Z_{2}} \sum_{z_{1}^{*} \in Z_{1}} \sum_{z_{2}^{*} \in Z_{2}} \sum_{i \in \Omega_{w}} \sum_{i^{*} \in \Omega_{w}} g(z_{1}z_{2})\, g(z_{1}^{*}z_{2}^{*})\, Y_{i}(z_{1}z_{2})\, Y_{i^{*}}(z_{1}^{*}z_{2}^{*}).

G_{w}^{\textnormal{obs}} = \sum_{z_{2} \in Z_{2}} g(z_{1w}z_{2})\, \overline{Y}_{w}^{\textnormal{obs}}(z_{1w}z_{2}).

H_{ww^{*}} = \frac{W(W-1)\, G_{w}^{\textnormal{obs}} G_{w^{*}}^{\textnormal{obs}}}{r_{1}(z_{1w})\{r_{1}(z_{1w^{*}}) - \delta(z_{1w}, z_{1w^{*}})\}},

\widetilde{V}(\widehat{\overline{\tau}}) = \widehat{V}(\widehat{\overline{\tau}}) + (1/N^{2}) \sum_{w=1}^{W} \sum_{w^{*}=1,\, w^{*} \neq w}^{W} [b_{ww^{*}} + \{M_{w} M_{w^{*}}/(W-1)\}]\, H_{ww^{*}},

E\{\widetilde{V}(\widehat{\overline{\tau}})\}

\widetilde{\Delta} = (1/N^{2}) \sum_{w=1}^{W} \sum_{w^{*}=1}^{W} b_{ww^{*}}\, \overline{\tau}_{w} \overline{\tau}_{w^{*}}.

M_{1} \leq M_{2} \leq \cdots \leq M_{W}.

B=\left[\begin{array}[]{ccc}M_{1}^{2}&(M_{3}^{2}-M_{1}^{2}-M_{2}^{2})/2&(M_{2}^{2}-M_{1}^{2}-M_{3}^{2})/2\\ (M_{3}^{2}-M_{2}^{2}-M_{1}^{2})/2&M_{2}^{2}&(M_{1}^{2}-M_{2}^{2}-M_{3}^{2})/2\\ (M_{2}^{2}-M_{3}^{2}-M_{1}^{2})/2&(M_{1}^{2}-M_{3}^{2}-M_{2}^{2})/2&M_{3}^{2}\end{array}\right].


M_{W} < M_{1} + \cdots + M_{W-1},


Taxonomy

Topics: Advanced Causal Inference Techniques · Optimal Experimental Design Methods · Statistical Methods and Bayesian Inference

Full text


Rahul Mukerjee

Indian Institute of Management Calcutta, Joka, Diamond Harbour Road, Kolkata 700104, India

Tirthankar Dasgupta


Keywords: Bias; Factorial experiment; Finite population; Minimaxity; Treatment-effect additivity.

Introduction

Factorial experiments were originally developed in the context of agricultural experiments (Fisher, 1925, 1935; Yates, 1935) and later extensively used in industrial and engineering applications (Wu and Hamada, 2009). Such experiments are currently undergoing a third surge in popularity in the social, behavioral, and biomedical sciences. However, a key challenge in applying standard principles of designing and analyzing factorial experiments in these fields arises from randomization restrictions. Consider a simplified version of the education experiment described in Dasgupta et al., (2015). Suppose the goal is to assess the causal effects of two interventions (referred to as factors in the experimental design literature), a mid-year quality review by a team of experts ($F_{1}$) and a bonus scheme for teachers ($F_{2}$), on the performance of 40 schools in the state of New York. Each factor has two levels, denoted by 1 (application) and 0 (non-application). A completely randomized assignment of the 40 schools to the four treatment combinations $00,01,10,11$ is likely to disperse the schools assigned to level 1 of factor $F_{1}$ (i.e., schools to undergo review) all over the state. Such a design may be prohibitive in terms of travel cost and time. A more practical alternative would be to divide these 40 schools by geographic proximity into four groups called whole-plots. Two of these whole-plots would then be randomly assigned to level 0 and the other two to level 1 of factor $F_{1}$. The teacher bonus scheme can then be applied to half of the schools, chosen randomly, within each whole-plot. Such a randomization scheme is an example of a classic split-plot design. See Kirk, (1982), Cochran and Cox, (1957), Box et al., (2005), and Wu and Hamada, (2009) for formal definitions.

Randomization-based inference is the most natural methodology for drawing inference on causal effects of treatments from split-plot experiments in a finite-population setting, as observed by Freedman, (2006, 2008). Recently, Zhao et al., (2018) developed a randomization-based framework for estimating finite-population causal effects in balanced split-plot designs, in which each whole-plot consists of the same number of units or sub-plots, and every treatment combination of the sub-plot factors occurs equally often in all whole-plots; see (4) below. However, unbalanced split-plot designs are quite common in the social sciences. Consider the school experiment described earlier. Suppose the 40 schools are spread over four counties with 8, 8, 12 and 12 schools, respectively. In this case, each county can be considered a natural whole-plot. The design is then unbalanced, and the estimation methodology proposed by Zhao et al., (2018) is no longer applicable.

In this paper we investigate randomization-based causal inference in split-plot designs that are possibly unbalanced, using the potential outcomes framework (Neyman, 1923; Rubin, 1974, 1978, 2005). We start with a natural unbiased estimator of a typical treatment contrast and first examine how far the approach of Zhao et al., (2018) for the balanced case can be adapted to our more general setup. This approach, aided by a variable transformation, yields an expression for the sampling variance of the treatment contrast estimator but runs into difficulty in variance estimation. Specifically, as in the balanced case and other situations in causal inference, the resulting variance estimator is conservative in the sense of having a nonnegative bias. However, unlike in most standard situations, the bias does not vanish even under strict additivity, or homogeneity, of treatment effects. To overcome this problem, we employ a careful matrix analysis that leads, under wide generality, to a new variance estimator. This estimator is also conservative, but it has the attractive property of becoming unbiased under between-whole-plot additivity, a condition even milder than strict additivity. We also discuss minimaxity, with a view to controlling the bias in variance estimation, and explore the bias of the estimator under treatment effect heterogeneity via simulations.

Treatment contrast and its unbiased estimation

Consider a factorial experiment conducted to assess the causal effects of $m_{1}$ whole-plot factors $F_{11},\ldots,F_{1m_{1}}$ and $m_{2}$ sub-plot factors $F_{21},\ldots,F_{2m_{2}}$ on a finite population of $N$ units. Each factor has two or more levels. The treatment combinations are denoted by $z=z_{1}z_{2}$, where $z_{k}\in Z_{k}$ and $Z_{k}$ is the set of level combinations of $F_{k1},\ldots,F_{km_{k}}\ (k=1,2)$. For $i=1,\ldots,N$, let $Y_{i}(z_{1}z_{2})$ denote the potential outcome of unit $i$ when exposed to treatment combination $z_{1}z_{2}$. A typical treatment contrast for unit $i$ is of the form

$$\tau_{i}=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}g(z_{1}z_{2})Y_{i}(z_{1}z_{2}), \tag{1}$$

where the coefficients $g(z_{1}z_{2})$, $z_{1}\in Z_{1},z_{2}\in Z_{2}$, are known, not all zero, and sum to zero. Let

$$\overline{Y}(z_{1}z_{2})=N^{-1}\sum_{i=1}^{N}Y_{i}(z_{1}z_{2}) \tag{2}$$

denote the average potential outcome for treatment combination $z_{1}z_{2}$ , and let

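$$\overline{\tau}=N^{-1}\sum_{i=1}^{N}\tau_{i}=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}g(z_{1}z_{2})\overline{Y}(z_{1}z_{2}) \tag{3}$$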

denote a treatment contrast for the finite population of $N$ units. We define $\overline{\tau}$ as the finite-population causal estimand of interest and consider the problem of drawing inference on $\overline{\tau}$ using the outcomes observed from the experiment.
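As a small numerical illustration of (1) to (3), the sketch below computes the unit-level contrasts $\tau_{i}$ and the population contrast $\overline{\tau}$ for a toy population. It assumes one two-level whole-plot factor and one two-level sub-plot factor; the potential outcomes, the choice of $g$ (the whole-plot main effect), and all function names are hypothetical and only serve to show that averaging $\tau_{i}$ over units and contrasting the averages $\overline{Y}(z_{1}z_{2})$ give the same $\overline{\tau}$.

```python
from itertools import product

Z1 = (0, 1)  # assumed: one whole-plot factor with levels 0 and 1
Z2 = (0, 1)  # assumed: one sub-plot factor with levels 0 and 1

# Hypothetical contrast g: main effect of the whole-plot factor (coefficients sum to zero).
g = {(z1, z2): (0.5 if z1 == 1 else -0.5) for z1, z2 in product(Z1, Z2)}

# Hypothetical potential outcomes Y[i][(z1, z2)] for N = 4 units.
Y = [
    {(0, 0): 3.0, (0, 1): 4.0, (1, 0): 6.0, (1, 1): 5.0},
    {(0, 0): 2.0, (0, 1): 2.5, (1, 0): 4.0, (1, 1): 6.0},
    {(0, 0): 5.0, (0, 1): 7.0, (1, 0): 5.5, (1, 1): 8.0},
    {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 4.5},
]


def unit_contrast(Yi, g):
    """tau_i in (1): sum of g(z1 z2) * Y_i(z1 z2) over all treatment combinations."""
    return sum(coef * Yi[z] for z, coef in g.items())


def mean_outcome(Y, z):
    """Ybar(z1 z2) in (2): average potential outcome over the N units."""
    return sum(Yi[z] for Yi in Y) / len(Y)


def population_contrast(Y, g):
    """tau_bar in (3), computed from the averages Ybar(z1 z2)."""
    return sum(coef * mean_outcome(Y, z) for z, coef in g.items())


tau_i = [unit_contrast(Yi, g) for Yi in Y]
tau_bar = population_contrast(Y, g)
print(tau_i, tau_bar)  # the average of tau_i equals tau_bar, as (3) states
```

Only $\overline{\tau}$ is the estimand; the individual $\tau_{i}$ are never observable because each unit receives a single treatment combination.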

The observed outcomes are generated through an assignment mechanism, which is the process of allocating treatment combinations to the $N$ units. Here we consider a split-plot assignment mechanism, which can be described as follows. Suppose the $N$ experimental units are partitioned into $W(\geq 2)$ disjoint sets $\Omega_{1},\ldots,\Omega_{W}$, called whole-plots, such that $\Omega_{w}$ consists of $M_{w}(\geq 2)$ units, called sub-plots, $w=1,\ldots,W$, and $M_{1}+\cdots+M_{W}=N$. Consider now a two-stage randomization, which assigns $r_{1}(z_{1})$ whole-plots to level combination $z_{1}$ of $F_{11},\ldots,F_{1m_{1}}$ and then, for each $w=1,\ldots,W$, assigns $r_{w2}(z_{2})$ sub-plots within whole-plot $\Omega_{w}$ to level combination $z_{2}$ of $F_{21},\ldots,F_{2m_{2}}$. Here at each stage all assignments are equiprobable, the $r_{1}(z_{1})$ and $r_{w2}(z_{2})$ are fixed positive integers, and $\sum_{z_{1}\in Z_{1}}r_{1}(z_{1})=W$, $\sum_{z_{2}\in Z_{2}}r_{w2}(z_{2})=M_{w}$ for $w=1,\ldots,W$.

Note that the above assignment mechanism yields a balanced split-plot design if

$$M_{1}=\cdots=M_{W},\quad r_{12}(z_{2})=\cdots=r_{W2}(z_{2})\ \mbox{for all}\ z_{2}\in Z_{2}. \tag{4}$$

In the school example described in Section 1, the whole-plots represent sets of schools within a county and we have $N=40$ , $W=4$ , $M_{1}=M_{2}=8$ , $M_{3}=M_{4}=12$ , $Z_{1}=Z_{2}=\{0,1\}$ . Finally, for all $z_{2}\in Z_{2}$ , $r_{w2}(z_{2})=4$ for $w=1,2$ and $r_{w2}(z_{2})=6$ for $w=3,4$ . Thus, the design is unbalanced.
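The balance conditions (4) are easy to check mechanically. The sketch below encodes the school example's design parameters and tests (4); the function name `is_balanced` and the list-of-dicts layout for $r_{w2}(z_{2})$ are illustrative choices, not notation from the paper.

```python
def is_balanced(M, r2):
    """Return True when conditions (4) hold.

    M  -- whole-plot sizes M_1, ..., M_W
    r2 -- list of dicts, r2[w][z2] = r_{w2}(z2), sub-plots per level in whole-plot w
    """
    same_sizes = len(set(M)) == 1
    same_replication = all(r == r2[0] for r in r2)
    return same_sizes and same_replication


# School example: counties of 8, 8, 12 and 12 schools; half of each county gets the bonus.
M = [8, 8, 12, 12]
r1 = {0: 2, 1: 2}                         # r_1(z1): two counties per review level
r2 = [{0: m // 2, 1: m // 2} for m in M]  # r_{w2}(z2) = 4 for w = 1, 2 and 6 for w = 3, 4

assert sum(M) == 40 and sum(r1.values()) == len(M)
assert all(sum(r.values()) == m for r, m in zip(r2, M))
print(is_balanced(M, r2))  # False
```

Replacing the county sizes by four whole-plots of 10 schools each, with 5 schools per bonus level, would make `is_balanced` return `True`.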

To define the observed outcomes of the experiment, we introduce two sets of random treatment assignment indices at the whole-plot and the sub-plot levels. Let $T_{1}(z_{1})$ denote the set of indices $w$ such that whole-plot $\Omega_{w}$ is randomly assigned to level combination $z_{1}$ of $F_{11},\ldots,F_{1m_{1}}$ . Similarly, for $z_{2}\in Z_{2}$ and $w=1,\ldots,W$ , let $T_{w2}(z_{2})$ be the set of sub-plots in $\Omega_{w}$ randomly assigned to level combination $z_{2}$ of $F_{21},\ldots,F_{2m_{2}}$ . For any treatment combination $z_{1}z_{2}$ , the observed outcomes from the whole-plot $\Omega_{w}$ , $w\in T_{1}(z_{1})$ , are then $Y_{i}(z_{1}z_{2})$ , $i\in T_{w2}(z_{2})$ . Let

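$$\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2})=\{r_{w2}(z_{2})\}^{-1}\sum_{i\in T_{w2}(z_{2})}Y_{i}(z_{1}z_{2}) \tag{5}$$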

denote the average observed outcome for treatment combination $z_{1}z_{2}$ within whole-plot $\Omega_{w}$ for $w\in T_{1}(z_{1})$ . In the spirit of the usual unbiased estimator of the population mean in two-stage sampling (Cochran, 1977), define

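$$\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})=\frac{W}{Nr_{1}(z_{1})}\sum_{w\in T_{1}(z_{1})}M_{w}\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2})=\frac{1}{r_{1}(z_{1})}\sum_{w\in T_{1}(z_{1})}\frac{M_{w}}{\overline{M}}\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2}), \tag{6}$$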

where $\overline{M}=(M_{1}+\ldots+M_{W})/W=N/W$ is the average whole-plot size. From (5) and (6), it is straightforward to verify by conditioning on the randomization at the whole-plot level that $E\left\{\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})\right\}=\overline{Y}(z_{1}z_{2})$ , where $\overline{Y}(z_{1}z_{2})$ is given by (2). Using (3), an immediate consequence of this fact is Proposition 1.

Proposition 1.

An unbiased estimator of the finite population treatment contrast $\overline{\tau}$ is given by

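$$\widehat{\overline{\tau}}=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}g(z_{1}z_{2})\overline{Y}^{\textnormal{obs}}(z_{1}z_{2}), \tag{7}$$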

where $\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})$ is given by (6).
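Because every split-plot assignment is equiprobable, Proposition 1 can be checked exactly on a tiny population by enumerating all assignments. The sketch below is a minimal check under stated assumptions: two-level factors, a hypothetical population of $N=8$ units in whole-plots of sizes 2, 2 and 4, $r_{1}(0)=1$, $r_{1}(1)=2$, half of each whole-plot at each sub-plot level, and randomly generated potential outcomes. All names are illustrative.

```python
import random
from itertools import combinations, product

Z1 = Z2 = (0, 1)
g = {(z1, z2): (0.5 if z1 == 1 else -0.5) for z1, z2 in product(Z1, Z2)}

# Hypothetical unbalanced population: W = 3 whole-plots of sizes 2, 2, 4.
plots = [[0, 1], [2, 3], [4, 5, 6, 7]]
N, W = 8, 3
Mbar = N / W
r1 = {0: 1, 1: 2}  # r_1(z1): number of whole-plots at each whole-plot level
rng = random.Random(2019)
Y = [{z: rng.uniform(0.0, 10.0) for z in product(Z1, Z2)} for _ in range(N)]


def designs():
    """Yield every equiprobable split-plot assignment as (z1 per plot, z2 per unit)."""
    for level_one in combinations(range(W), r1[1]):
        z1_of = [1 if w in level_one else 0 for w in range(W)]
        # Within each whole-plot, half of the sub-plots receive z2 = 1.
        halves = [combinations(p, len(p) // 2) for p in plots]
        for treated in product(*halves):
            z2_of = {i: 0 for p in plots for i in p}
            for chosen in treated:
                for i in chosen:
                    z2_of[i] = 1
            yield z1_of, z2_of


def tau_hat(z1_of, z2_of):
    """Estimator (7), built from the whole-plot means (5) and the weights in (6)."""
    est = 0.0
    for (z1, z2), coef in g.items():
        T1 = [w for w in range(W) if z1_of[w] == z1]
        acc = 0.0
        for w in T1:
            T2 = [i for i in plots[w] if z2_of[i] == z2]
            ybar_w = sum(Y[i][(z1, z2)] for i in T2) / len(T2)  # (5)
            acc += len(plots[w]) / Mbar * ybar_w
        est += coef * acc / len(T1)  # (6), then (7)
    return est


estimates = [tau_hat(a, b) for a, b in designs()]
mean_estimate = sum(estimates) / len(estimates)  # exact expectation over all designs
tau_bar = sum(coef * sum(Yi[z] for Yi in Y) / N for z, coef in g.items())
print(len(estimates), mean_estimate, tau_bar)
```

The average of $\widehat{\overline{\tau}}$ over the 72 equiprobable assignments coincides with $\overline{\tau}$ up to rounding, even though the whole-plot sizes differ.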

Sampling variance and its estimation generalizing the balanced case

Proposition 1 yields a point estimator of $\overline{\tau}$. However, to quantify the uncertainty associated with this point estimator and draw inference on $\overline{\tau}$, one needs to derive and estimate the sampling variance of $\widehat{\overline{\tau}}$ with respect to its distribution induced by the randomization in the split-plot design. Zhao et al., (2018) derived an expression for the sampling variance of $\widehat{\overline{\tau}}$ for a balanced split-plot design, that is, when conditions (4) are satisfied. They also obtained an estimator of the sampling variance that, like most variance estimators in finite-population causal inference (Mukerjee et al., 2018), has a nonnegative bias. Further, they noted that this bias vanishes under between-whole-plot additivity, that is, homogeneity of average treatment effects at the whole-plot level. In this section, we generalize the arguments of Zhao et al., (2018) to the unbalanced case to derive an expression for the sampling variance and a variance estimator, and we examine the properties of this estimator. To that end, we first convert the “raw” potential outcomes $Y_{i}(z_{1}z_{2})$ to “adjusted” potential outcomes

$$U_{i}(z_{1}z_{2})=(M_{w}/\overline{M})Y_{i}(z_{1}z_{2}), \tag{8}$$

for each $z_{1}\in Z_{1}$ , $z_{2}\in Z_{2}$ , $i\in\Omega_{w}$ and $w=1,\ldots,W$ . An intuition behind this adjustment will be provided shortly, after we introduce its observed version.

For each $z_{1}z_{2}$ , define $\overline{U}_{w}(z_{1}z_{2})=M_{w}^{-1}\sum_{i\in\Omega_{w}}U_{i}(z_{1}z_{2}),\ w=1,\ldots,W$ , and $\overline{U}(z_{1}z_{2})=W^{-1}\sum_{w=1}^{W}\overline{U}_{w}(z_{1}z_{2})$ . By (8), $\overline{U}(z_{1}z_{2})=\overline{Y}(z_{1}z_{2})$ . Next, for $z_{1},z_{1}^{*}\in Z_{1}$ and $z_{2},z_{2}^{*}\in Z_{2}$ , define

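$$S_{\textnormal{bt}}(z_{1}z_{2},z_{1}^{*}z_{2}^{*})=\frac{\overline{M}}{W-1}\sum_{w=1}^{W}\{\overline{U}_{w}(z_{1}z_{2})-\overline{U}(z_{1}z_{2})\}\{\overline{U}_{w}(z_{1}^{*}z_{2}^{*})-\overline{U}(z_{1}^{*}z_{2}^{*})\},$$
$$S_{\textnormal{in},w}(z_{1}z_{2},z_{1}^{*}z_{2}^{*})=\frac{1}{M_{w}-1}\sum_{i\in\Omega_{w}}\{U_{i}(z_{1}z_{2})-\overline{U}_{w}(z_{1}z_{2})\}\{U_{i}(z_{1}^{*}z_{2}^{*})-\overline{U}_{w}(z_{1}^{*}z_{2}^{*})\}.$$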

In the balanced case, $S_{\textnormal{bt}}(z_{1}z_{2},z_{1}^{*}z_{2}^{*})$ and $W^{-1}\sum_{w=1}^{W}S_{\textnormal{in},w}(z_{1}z_{2},z_{1}^{*}z_{2}^{*})$ represent, respectively, the between and within whole-plot mean squares or products in an analysis of variance/covariance decomposition of the potential outcomes.
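The sketch below illustrates the adjustment (8) on a hypothetical unbalanced population: it forms $U_{i}(z_{1}z_{2})$, checks that the simple average of the whole-plot means $\overline{U}_{w}(z_{1}z_{2})$ recovers $\overline{Y}(z_{1}z_{2})$, and computes the within-whole-plot quantity $S_{\textnormal{in},w}$. It assumes a $2\times 2$ treatment structure and randomly generated outcomes; the helper names are illustrative.

```python
import random
from itertools import product

Z = list(product((0, 1), (0, 1)))       # treatment combinations z1 z2 (assumed 2 x 2)
plots = [[0, 1], [2, 3], [4, 5, 6, 7]]  # hypothetical whole-plots of sizes 2, 2, 4
N, W = 8, len(plots)
rng = random.Random(7)
Y = [{z: rng.uniform(0.0, 10.0) for z in Z} for _ in range(N)]


def adjust(Y, plots):
    """Adjusted outcomes (8): U_i(z1 z2) = (M_w / Mbar) * Y_i(z1 z2) for i in Omega_w."""
    m_bar = sum(len(p) for p in plots) / len(plots)
    U = {}
    for p in plots:
        for i in p:
            U[i] = {z: len(p) / m_bar * y for z, y in Y[i].items()}
    return U


def plot_mean(V, p, z):
    """Whole-plot average of V_i(z) over the units i in whole-plot p."""
    return sum(V[i][z] for i in p) / len(p)


def S_in(U, p, z, zs):
    """Within-whole-plot covariance S_in,w(z, zs) of the adjusted outcomes."""
    mz, mzs = plot_mean(U, p, z), plot_mean(U, p, zs)
    return sum((U[i][z] - mz) * (U[i][zs] - mzs) for i in p) / (len(p) - 1)


U = adjust(Y, plots)
Ubar = {z: sum(plot_mean(U, p, z) for p in plots) / W for z in Z}  # simple average over plots
Ybar = {z: sum(Yi[z] for Yi in Y) / N for z in Z}                 # Ybar(z1 z2) in (2)
print(max(abs(Ubar[z] - Ybar[z]) for z in Z))  # zero up to rounding
```

When all whole-plots have the same size, the factor $M_{w}/\overline{M}$ equals one and the adjusted outcomes coincide with the raw ones, which is why the adjustment is invisible in the balanced case.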

It is also important to define a measure of heterogeneity of treatment contrasts across the whole-plots. First, let

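$$\overline{\tau}_{w}=M_{w}^{-1}\sum_{i\in\Omega_{w}}\tau_{i}=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}g(z_{1}z_{2})\overline{Y}_{w}(z_{1}z_{2}),\quad w=1,\ldots,W, \tag{9}$$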

denote the whole-plot level treatment contrasts, where $\overline{Y}_{w}(z_{1}z_{2})=M_{w}^{-1}\sum_{i\in\Omega_{w}}Y_{i}(z_{1}z_{2})$ is the average potential outcome of all units in whole-plot $\Omega_{w}$ for treatment combination $z_{1}z_{2}$ . The second equality in (9) follows from (1). Also, from (3) and (9), it follows that

$$\overline{\tau}=W^{-1}\sum_{w=1}^{W}(M_{w}/\overline{M})\overline{\tau}_{w}. \tag{10}$$
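Identity (10) says that $\overline{\tau}$ is a size-weighted average of the whole-plot contrasts $\overline{\tau}_{w}$, not their simple average. The sketch below checks this numerically for a hypothetical unbalanced population and a hypothetical interaction contrast; names and data are illustrative only.

```python
import random
from itertools import product

Z = list(product((0, 1), (0, 1)))
# Hypothetical interaction contrast between the whole-plot and sub-plot factors.
g = {(0, 0): 0.5, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 0.5}
plots = [[0, 1], [2, 3], [4, 5, 6, 7]]  # hypothetical unbalanced whole-plots
N, W = 8, len(plots)
Mbar = N / W
rng = random.Random(11)
Y = [{z: rng.uniform(0.0, 10.0) for z in Z} for _ in range(N)]


def plot_contrast(p):
    """tau_bar_w in (9): the contrast g applied to the whole-plot averages Ybar_w."""
    return sum(c * sum(Y[i][z] for i in p) / len(p) for z, c in g.items())


tau_w = [plot_contrast(p) for p in plots]
tau_bar = sum(c * sum(Yi[z] for Yi in Y) / N for z, c in g.items())    # (3)
tau_bar_10 = sum(len(p) / Mbar * t for p, t in zip(plots, tau_w)) / W  # (10)
simple_mean = sum(tau_w) / W  # ignores the weights M_w / Mbar; differs in general
print(tau_bar, tau_bar_10, simple_mean)
```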

Now define the following measure of heterogeneity of treatment contrasts across the whole-plots:

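$$\Delta=\frac{1}{W(W-1)}\sum_{w=1}^{W}\{(M_{w}/\overline{M})\overline{\tau}_{w}-\overline{\tau}\}^{2}, \tag{11}$$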

where $\overline{\tau}_{w}$ is given by (9). Then, extending the ideas of Zhao et al., (2018), after considerable algebra, we obtain the following result on the sampling variance of $\widehat{\overline{\tau}}$ , the unbiased estimator of $\overline{\tau}$ .

Theorem 1.

The sampling variance of $\widehat{\overline{\tau}}$ is

[TABLE]

Next, to obtain an estimator of the sampling variance, we first define the counterparts of $\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2})$ and $\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})$ in (5) and (6) in terms of the adjusted potential outcomes:

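$$\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2})=\{r_{w2}(z_{2})\}^{-1}\sum_{i\in T_{w2}(z_{2})}U_{i}(z_{1}z_{2}),\qquad \overline{U}^{\textnormal{obs}}(z_{1}z_{2})=\{r_{1}(z_{1})\}^{-1}\sum_{w\in T_{1}(z_{1})}\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2}).$$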

Then it is easy to see from (5), (6) and (8) that

$$\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})=\overline{U}^{\textnormal{obs}}(z_{1}z_{2}). \tag{12}$$

Note that $\overline{U}^{\textnormal{obs}}(z_{1}z_{2})$ is the simple average of $\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2})$, $w\in T_{1}(z_{1})$, irrespective of whether $M_{1},\ldots,M_{W}$ are equal or not. This is precisely what the relationship between $\overline{Y}^{\textnormal{obs}}(z_{1}z_{2})$ and $\overline{Y}_{w}^{\textnormal{obs}}(z_{1}z_{2})$ in (6) reduces to when $M_{1}=\cdots=M_{W}$. In view of (12), this suggests generalizing the results of Zhao et al., (2018) by replacing the potential outcomes with their adjusted versions. We now define the following estimator of the sampling variance in Theorem 1:

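$$\widehat{V}(\widehat{\overline{\tau}})=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}\sum_{z_{2}^{*}\in Z_{2}}\frac{g(z_{1}z_{2})g(z_{1}z_{2}^{*})}{r_{1}(z_{1})}S(z_{1}z_{2},z_{1}z_{2}^{*}), \tag{13}$$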

where

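$$S(z_{1}z_{2},z_{1}z_{2}^{*})=\frac{1}{r_{1}(z_{1})-1}\sum_{w\in T_{1}(z_{1})}\{\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2})-\overline{U}^{\textnormal{obs}}(z_{1}z_{2})\}\{\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2}^{*})-\overline{U}^{\textnormal{obs}}(z_{1}z_{2}^{*})\}.$$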

These expressions now allow us to work along the lines of Zhao et al., (2018) by substituting (12) in (7). Again, considerable algebra yields the following result:

Theorem 2.

The variance estimator $\widehat{V}(\widehat{\overline{\tau}})$ given by (13) estimates the sampling variance of $\widehat{\overline{\tau}}$ with a nonnegative bias $\Delta$ defined by (11), that is, $E\left\{\widehat{V}(\widehat{\overline{\tau}})\right\}=\textnormal{var}(\widehat{\overline{\tau}})+\Delta$ .
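Theorem 2 can be checked exactly by brute force on a small design. The sketch below assumes two-level factors, a hypothetical population of $N=10$ units in whole-plots of sizes 2, 2, 2 and 4, $r_{1}(0)=r_{1}(1)=2$ (so that $r_{1}(z_{1})-1>0$ in $S$), and half of each whole-plot at each sub-plot level. It computes $\textnormal{var}(\widehat{\overline{\tau}})$ directly by enumerating all assignments rather than from the expression in Theorem 1. It uses the fact that the double sum over $z_{2},z_{2}^{*}$ in (13) equals the sample variance, over $w\in T_{1}(z_{1})$, of $\sum_{z_{2}}g(z_{1}z_{2})\overline{U}_{w}^{\textnormal{obs}}(z_{1}z_{2})$. All names are illustrative.

```python
import random
from itertools import combinations, product

Z1 = Z2 = (0, 1)
g = {(z1, z2): (0.5 if z2 == 1 else -0.5) for z1, z2 in product(Z1, Z2)}  # sub-plot main effect
plots = [[0, 1], [2, 3], [4, 5], [6, 7, 8, 9]]  # hypothetical sizes 2, 2, 2, 4
N, W = 10, 4
Mbar = N / W
R1 = 2  # r_1(1) = 2, hence r_1(0) = W - 2 = 2
rng = random.Random(3)
Y = [{z: rng.uniform(0.0, 10.0) for z in product(Z1, Z2)} for _ in range(N)]


def designs():
    """Yield every equiprobable assignment as (z1 per whole-plot, z2 per unit)."""
    for level_one in combinations(range(W), R1):
        z1_of = [int(w in level_one) for w in range(W)]
        for treated in product(*[combinations(p, len(p) // 2) for p in plots]):
            z2_of = {i: 0 for i in range(N)}
            for chosen in treated:
                for i in chosen:
                    z2_of[i] = 1
            yield z1_of, z2_of


def X_obs(w, z1, z2_of):
    """sum over z2 of g(z1 z2) * Ubar_w^obs(z1 z2), with Ubar_w^obs = (M_w / Mbar) Ybar_w^obs."""
    total = 0.0
    for z2 in Z2:
        T2 = [i for i in plots[w] if z2_of[i] == z2]
        ybar = sum(Y[i][(z1, z2)] for i in T2) / len(T2)
        total += g[(z1, z2)] * len(plots[w]) / Mbar * ybar
    return total


def estimate(z1_of, z2_of):
    """Return (tau_hat, V_hat): tau_hat via (7) and (12), V_hat via (13)."""
    tau_hat, v_hat = 0.0, 0.0
    for z1 in Z1:
        X = [X_obs(w, z1, z2_of) for w in range(W) if z1_of[w] == z1]
        r1 = len(X)
        xbar = sum(X) / r1
        tau_hat += xbar
        v_hat += sum((x - xbar) ** 2 for x in X) / (r1 - 1) / r1
    return tau_hat, v_hat


results = [estimate(a, b) for a, b in designs()]
K = len(results)
mean_tau = sum(t for t, _ in results) / K
var_tau = sum((t - mean_tau) ** 2 for t, _ in results) / K  # exact randomization variance
mean_vhat = sum(v for _, v in results) / K
tau_bar = sum(c * sum(Yi[z] for Yi in Y) / N for z, c in g.items())

# Bias Delta in (11), from the whole-plot contrasts (9).
tau_w = [sum(c * sum(Y[i][z] for i in p) / len(p) for z, c in g.items()) for p in plots]
a = [len(p) / Mbar * t for p, t in zip(plots, tau_w)]
a_bar = sum(a) / W
Delta = sum((x - a_bar) ** 2 for x in a) / (W * (W - 1))
print(mean_vhat, var_tau + Delta)  # equal up to rounding
```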

Remark 1.

Theorem 2 shows that $\widehat{V}(\widehat{\overline{\tau}})$ is a conservative estimator of $\textnormal{var}(\widehat{\overline{\tau}})$ with a nonnegative bias $\Delta$. This property is in line with variance estimators in other situations of randomization-based causal inference. Moreover, in the balanced case, by (11), the bias $\Delta$ vanishes when $\overline{\tau}_{1}=\cdots=\overline{\tau}_{W}=\overline{\tau}$. As observed by Zhao et al., (2018), this happens for every treatment contrast $\overline{\tau}$ if and only if between-whole-plot additivity holds, which means

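$$\overline{Y}_{1}(z_{1}z_{2})-\overline{Y}_{1}(z_{1}^{*}z_{2}^{*})=\cdots=\overline{Y}_{W}(z_{1}z_{2})-\overline{Y}_{W}(z_{1}^{*}z_{2}^{*}) \tag{14}$$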

for every pair of treatment combinations $z_{1}z_{2}$ and $z_{1}^{*}z_{2}^{*}$. A disturbing feature of the variance estimator $\widehat{V}(\widehat{\overline{\tau}})$, however, emerges in the unbalanced case, which is the main focus of this paper. There, $\widehat{V}(\widehat{\overline{\tau}})$ remains biased even if between-whole-plot additivity holds, because by (9) and (10), condition (14) implies $\overline{\tau}_{1}=\cdots=\overline{\tau}_{W}=\overline{\tau}$ and hence

$$\Delta=\frac{\overline{\tau}^{2}}{W(W-1)\overline{M}^{2}}\sum_{w=1}^{W}(M_{w}-\overline{M})^{2},$$

which is positive whenever $M_{1},\ldots,M_{W}$ are not all equal and $\overline{\tau}\neq 0$. The situation remains unchanged even under the stronger assumption of strict additivity, or homogeneity of treatment effects (Neyman, 1923), which requires $Y_{i}(z_{1}z_{2})-Y_{i}(z_{1}^{*}z_{2}^{*})$ to be constant over $i=1,\ldots,N$ for every pair of treatment combinations $z_{1}z_{2}$ and $z_{1}^{*}z_{2}^{*}$.
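To see the size of this residual bias, the sketch below evaluates $\Delta$ from (11) and from its closed form under additivity, using exact rational arithmetic and the school example's sizes $(8,8,12,12)$. The common whole-plot contrast $\overline{\tau}=3$ is a hypothetical value chosen only for illustration.

```python
from fractions import Fraction


def delta_11(M, tau_w):
    """Delta in (11) from whole-plot sizes M_w and whole-plot contrasts tau_bar_w."""
    W = len(M)
    Mbar = Fraction(sum(M), W)
    a = [m / Mbar * t for m, t in zip(M, tau_w)]
    a_bar = sum(a) / W  # equals tau_bar by (10)
    return sum((x - a_bar) ** 2 for x in a) / (W * (W - 1))


def delta_additive(M, tau_bar):
    """Closed form of Delta when tau_bar_1 = ... = tau_bar_W = tau_bar."""
    W = len(M)
    Mbar = Fraction(sum(M), W)
    return tau_bar ** 2 / (W * (W - 1) * Mbar ** 2) * sum((m - Mbar) ** 2 for m in M)


M_school = [8, 8, 12, 12]
tau = Fraction(3)  # hypothetical common whole-plot contrast
print(delta_11(M_school, [tau] * 4))          # 3/25
print(delta_additive(M_school, tau))          # 3/25
print(delta_11([10, 10, 10, 10], [tau] * 4))  # 0
```

The bias stays at $3/25$ here even though every whole-plot has the same contrast; it disappears only when the sizes are equal or $\overline{\tau}=0$.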

The property of $\widehat{V}(\widehat{\overline{\tau}})$ described in Remark 1 is a matter of concern because a requirement typically imposed on a variance estimator in causal inference is that it should become unbiased at least under Neymannian strict additivity, if not under milder versions thereof such as between-whole-plot additivity in the present context. The estimator $\widehat{V}(\widehat{\overline{\tau}})$, obtained by generalizing the arguments in the balanced case, fails to meet this requirement when $M_{1},\ldots,M_{W}$ are not all equal. In the rest of the paper, we investigate the existence of a variance estimator that overcomes this difficulty and show how, under wide generality, such an estimator can be obtained by appropriately modifying $\widehat{V}(\widehat{\overline{\tau}})$ as given by (13).

A new variance estimator

We begin our search for an improved variance estimator by expanding the bias term $\Delta$ defined in (11) as follows:

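$$\Delta=(1/N)^{2}\Big[\sum_{w=1}^{W}M_{w}^{2}\overline{\tau}_{w}^{2}-\sum_{w=1}^{W}\sum_{w^{*}=1,\,w^{*}\neq w}^{W}\{M_{w}M_{w^{*}}/(W-1)\}\overline{\tau}_{w}\overline{\tau}_{w^{*}}\Big]. \tag{15}$$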
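The expansion (15) follows from (11) and (10) by writing $a_{w}=(M_{w}/\overline{M})\overline{\tau}_{w}$, so that $\sum_{w}(a_{w}-\overline{a})^{2}=(1-W^{-1})\sum_{w}a_{w}^{2}-W^{-1}\sum_{w\neq w^{*}}a_{w}a_{w^{*}}$ and $a_{w}=WM_{w}\overline{\tau}_{w}/N$. The sketch below confirms the identity with exact rational arithmetic for the school sizes and hypothetical whole-plot contrasts.

```python
from fractions import Fraction


def delta_11(M, tau_w):
    """Delta in (11)."""
    W = len(M)
    Mbar = Fraction(sum(M), W)
    a = [m / Mbar * t for m, t in zip(M, tau_w)]
    a_bar = sum(a) / W
    return sum((x - a_bar) ** 2 for x in a) / (W * (W - 1))


def delta_15(M, tau_w):
    """Expansion (15): squared terms minus cross-products over w != w*."""
    W, N = len(M), sum(M)
    diag = sum(m ** 2 * t ** 2 for m, t in zip(M, tau_w))
    cross = sum(
        Fraction(M[w] * M[v], W - 1) * tau_w[w] * tau_w[v]
        for w in range(W) for v in range(W) if v != w
    )
    return (diag - cross) / N ** 2


M = [8, 8, 12, 12]
tau_w = [Fraction(5, 2), Fraction(3), Fraction(-1, 4), Fraction(2)]  # hypothetical values
print(delta_11(M, tau_w) == delta_15(M, tau_w))  # True
```

Only the cross-products with $w\neq w^{*}$ in (15) turn out to be unbiasedly estimable, which is what the construction that follows exploits.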

Note that in (15), the term $\overline{\tau}_{w}^{2}$ is not unbiasedly estimable, but for $w\neq w^{*}$ , $\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ allows unbiased estimation. This is because, by (9),

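$$\overline{\tau}_{w}^{2}=(1/M_{w})^{2}\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}\sum_{z_{1}^{*}\in Z_{1}}\sum_{z_{2}^{*}\in Z_{2}}\sum_{i\in\Omega_{w}}\sum_{i^{*}\in\Omega_{w}}g(z_{1}z_{2})g(z_{1}^{*}z_{2}^{*})Y_{i}(z_{1}z_{2})Y_{i^{*}}(z_{1}^{*}z_{2}^{*}). \tag{16}$$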

The sums over $i$ and $i^{*}$ in (16) include the case $i=i^{*}$ . There is at least one pair of distinct treatment combinations $z_{1}z_{2}$ and $z_{1}^{*}z_{2}^{*}$ such that $g(z_{1}z_{2})g(z_{1}^{*}z_{2}^{*})\neq 0$ and $Y_{i}(z_{1}z_{2})Y_{i}(z_{1}^{*}z_{2}^{*})$ is never observable as unit $i$ cannot be assigned simultaneously to both $z_{1}z_{2}$ and $z_{1}^{*}z_{2}^{*}$ . Hence, $\overline{\tau}_{w}^{2}$ does not allow unbiased estimation. On the other hand, for $w\neq w^{*}$ , $\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ does not involve terms like $Y_{i}(z_{1}z_{2})Y_{i}(z_{1}^{*}z_{2}^{*})$ , and is unbiasedly estimable. For each $w$ , let $z_{1w}$ denote the level combination of the whole-plot factors assigned to whole-plot $\Omega_{w}$ . Now define

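$$G_{w}^{\textnormal{obs}}=\sum_{z_{2}\in Z_{2}}g(z_{1w}z_{2})\overline{Y}_{w}^{\textnormal{obs}}(z_{1w}z_{2}).$$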

The following proposition now gives an unbiased estimator of $\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ :

Proposition 2.

For $w,w^{*}=1,\ldots,W$ , $w\neq w^{*}$ , an unbiased estimator of $\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ is given by

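$$H_{ww^{*}}=\frac{W(W-1)G_{w}^{\textnormal{obs}}G_{w^{*}}^{\textnormal{obs}}}{r_{1}(z_{1w})\{r_{1}(z_{1w^{*}})-\delta(z_{1w},z_{1w^{*}})\}},$$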

where $\delta(z_{1w},z_{1w^{*}})$ is an indicator that equals one if $z_{1w}=z_{1w^{*}}$ and zero otherwise.
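Proposition 2 can likewise be verified by exhaustive enumeration. The sketch below assumes two-level factors, a hypothetical population with whole-plot sizes 2, 2, 2 and 4, $r_{1}(0)=r_{1}(1)=2$, and half of each whole-plot at each sub-plot level; it averages $H_{ww^{*}}$ over all equiprobable assignments and compares the result with $\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ for every pair $w\neq w^{*}$. The denominator in $H_{ww^{*}}$ compensates for the probability that $\Omega_{w}$ and $\Omega_{w^{*}}$ receive the levels they did.

```python
import random
from itertools import combinations, product

Z1 = Z2 = (0, 1)
g = {(z1, z2): (0.5 if z1 == 1 else -0.5) for z1, z2 in product(Z1, Z2)}
plots = [[0, 1], [2, 3], [4, 5], [6, 7, 8, 9]]  # hypothetical sizes 2, 2, 2, 4
N, W = 10, 4
r1 = {0: 2, 1: 2}
rng = random.Random(5)
Y = [{z: rng.uniform(0.0, 10.0) for z in product(Z1, Z2)} for _ in range(N)]


def designs():
    """Yield every equiprobable assignment as (z1 per whole-plot, z2 per unit)."""
    for level_one in combinations(range(W), r1[1]):
        z1_of = [int(w in level_one) for w in range(W)]
        for treated in product(*[combinations(p, len(p) // 2) for p in plots]):
            z2_of = {i: 0 for i in range(N)}
            for chosen in treated:
                for i in chosen:
                    z2_of[i] = 1
            yield z1_of, z2_of


def G_obs(w, z1, z2_of):
    """G_w^obs: contrast over z2 of the observed whole-plot means (5) at z1 = z_{1w}."""
    total = 0.0
    for z2 in Z2:
        T2 = [i for i in plots[w] if z2_of[i] == z2]
        total += g[(z1, z2)] * sum(Y[i][(z1, z2)] for i in T2) / len(T2)
    return total


def H(w, v, z1_of, z2_of):
    """H_{w w*} of Proposition 2, with v in the role of w*."""
    zw, zv = z1_of[w], z1_of[v]
    same = 1 if zw == zv else 0  # delta(z_{1w}, z_{1w*})
    denom = r1[zw] * (r1[zv] - same)
    return W * (W - 1) * G_obs(w, zw, z2_of) * G_obs(v, zv, z2_of) / denom


all_designs = list(designs())
tau_w = [sum(c * sum(Y[i][z] for i in p) / len(p) for z, c in g.items()) for p in plots]
max_err = max(
    abs(sum(H(w, v, a, b) for a, b in all_designs) / len(all_designs) - tau_w[w] * tau_w[v])
    for w in range(W) for v in range(W) if v != w
)
print(max_err)  # zero up to floating-point rounding
```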

We can now use Proposition 2 to construct a new estimator of $\textnormal{var}(\widehat{\overline{\tau}})$ . Consider any symmetric matrix $B=((b_{ww^{*}}))$ of order $W$ such that $b_{ww}=M_{w}^{2}$ for $w=1,\ldots,W$ . Now define the variance estimator

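$$\widetilde{V}(\widehat{\overline{\tau}})=\widehat{V}(\widehat{\overline{\tau}})+(1/N^{2})\sum_{w=1}^{W}\sum_{w^{*}=1,\,w^{*}\neq w}^{W}[b_{ww^{*}}+\{M_{w}M_{w^{*}}/(W-1)\}]H_{ww^{*}}, \tag{17}$$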

where $\widehat{V}(\widehat{\overline{\tau}})$ is the variance estimator defined in Section 3, and $H_{ww^{*}}$ is as defined in Proposition 2. Then, from (15), (17), Theorem 2 and Proposition 2 it is easy to see that

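$$E\{\widetilde{V}(\widehat{\overline{\tau}})\}=\textnormal{var}(\widehat{\overline{\tau}})+\widetilde{\Delta},$$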

where

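$$\widetilde{\Delta}=(1/N^{2})\sum_{w=1}^{W}\sum_{w^{*}=1}^{W}b_{ww^{*}}\overline{\tau}_{w}\overline{\tau}_{w^{*}}. \tag{18}$$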

Clearly, if the matrix $B$ is nonnegative definite, then the bias $\widetilde{\Delta}$ is nonnegative, making $\widetilde{V}(\widehat{\overline{\tau}})$ a conservative estimator of $\textnormal{var}(\widehat{\overline{\tau}})$. Furthermore, if $B$ is positive semidefinite with each row sum zero and rank $W-1$, then its null space is spanned by the vector of ones, so by (18) this bias vanishes if and only if $\overline{\tau}_{1}=\cdots=\overline{\tau}_{W}$. These facts are summarized in Theorem 3, which is the main result of this section.

Theorem 3.

Let there exist a positive semidefinite matrix $B=((b_{ww^{*}}))$ of order $W$ and satisfying the conditions: (c1) $b_{ww}=M_{w}^{2},\ w=1,\ldots,W$ , (c2) $\sum_{w^{*}=1}^{W}b_{ww^{*}}=0,\ w=1,\ldots,W$ , and (c3) rank $(B)=W-1$ . Then the variance estimator $\widetilde{V}(\widehat{\overline{\tau}})$ defined in (17) estimates $\textnormal{var}(\widehat{\overline{\tau}})$ with a nonnegative bias $\widetilde{\Delta}$ given by (18), which vanishes if and only if $\overline{\tau}_{1}=\cdots=\overline{\tau}_{W}$ .

Remark 2.

Recall that the between-whole-plot additivity condition (14) is equivalent to $\overline{\tau}_{1}=\cdots=\overline{\tau}_{W}$ for every treatment contrast. Thus, even when the whole-plot sizes $M_{1},\ldots,M_{W}$ are not all equal, by Theorem 3, the bias $\widetilde{\Delta}$ vanishes for every treatment contrast if and only if between-whole-plot additivity holds. Hence, if a positive semidefinite matrix $B$ satisfying conditions (c1)-(c3) is available, then Theorem 3 provides us with a variance estimator that possesses properties similar to the one derived by Zhao et al., (2018) for the balanced case. However, the existence of such a matrix turns out to be quite a challenging question, which is explored in the next section.

Existence and construction

We will now study the existence of a positive semidefinite matrix $B$ satisfying conditions (c1)-(c3) stated in Theorem 3 as a purely mathematical problem. Without loss of generality, we assume hereafter that

$$M_{1}\leq M_{2}\leq\cdots\leq M_{W}. \tag{19}$$

To motivate the ideas, consider first the case $W=3$ , where conditions (c1) and (c2) determine $B$ uniquely as

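$$B=\left[\begin{array}{ccc}M_{1}^{2}&(M_{3}^{2}-M_{1}^{2}-M_{2}^{2})/2&(M_{2}^{2}-M_{1}^{2}-M_{3}^{2})/2\\ (M_{3}^{2}-M_{2}^{2}-M_{1}^{2})/2&M_{2}^{2}&(M_{1}^{2}-M_{2}^{2}-M_{3}^{2})/2\\ (M_{2}^{2}-M_{3}^{2}-M_{1}^{2})/2&(M_{1}^{2}-M_{3}^{2}-M_{2}^{2})/2&M_{3}^{2}\end{array}\right]. \tag{20}$$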

This matrix is positive semidefinite and satisfies (c3) if and only if its leading principal minor of order two, formed by the first two rows and columns, is positive. Simplifying this condition and applying (19) yields $M_{3}<M_{1}+M_{2}$ as the necessary and sufficient condition for $B$ to satisfy (c1)-(c3). This construction of $B$ for $W=3$ raises the following questions about the general case $W\geq 3$:

(a)

Is the condition

$$M_{W}<M_{1}+\cdots+M_{W-1} \tag{21}$$

necessary and sufficient for existence of a positive semidefinite matrix $B$ satisfying (c1)-(c3)?

(b)

If so, then under (21), can one construct such a matrix $B$ by an extension of the form in (20) to the general case?
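The $W=3$ case can be explored numerically. The sketch below builds the matrix (20), checks (c1), (c2), positive semidefiniteness and (c3) through the eigenvalues of $B$ (it assumes NumPy is available), and confirms over a grid of sorted sizes that these conditions hold exactly when $M_{3}<M_{1}+M_{2}$. It also evaluates the bias (18), which is zero when the whole-plot contrasts are equal and positive otherwise. The sizes $(2,3,4)$, $N=9$, and the contrast values are hypothetical.

```python
import numpy as np


def B_three(M):
    """Matrix (20) for W = 3, built from whole-plot sizes (M1, M2, M3)."""
    s1, s2, s3 = (float(m) ** 2 for m in M)  # squared sizes M_w^2
    return np.array([
        [s1, (s3 - s1 - s2) / 2, (s2 - s1 - s3) / 2],
        [(s3 - s1 - s2) / 2, s2, (s1 - s2 - s3) / 2],
        [(s2 - s1 - s3) / 2, (s1 - s2 - s3) / 2, s3],
    ])


def meets_c1_c3(B, M, tol=1e-9):
    """(c1) diagonal M_w^2, (c2) zero row sums, positive semidefinite, (c3) rank W - 1."""
    eig = np.linalg.eigvalsh(B)  # B is symmetric, so eigvalsh applies
    c1 = np.allclose(np.diag(B), np.asarray(M, dtype=float) ** 2)
    c2 = np.allclose(B.sum(axis=1), 0.0)
    psd = eig.min() > -tol
    c3 = int(np.sum(eig > tol)) == len(M) - 1
    return bool(c1 and c2 and psd and c3)


def delta_tilde(B, tau_w, N):
    """Bias (18): N^{-2} times the quadratic form tau_w' B tau_w."""
    t = np.asarray(tau_w, dtype=float)
    return float(t @ B @ t) / N ** 2


# For sizes sorted as in (19), (c1)-(c3) should hold exactly when M3 < M1 + M2.
all_match = all(
    meets_c1_c3(B_three((a, b, c)), (a, b, c)) == (c < a + b)
    for a in range(2, 8) for b in range(a, 8) for c in range(b, 16)
)
B = B_three((2, 3, 4))  # hypothetical sizes with 4 < 2 + 3
print(all_match)                                 # True
print(delta_tilde(B, [1.5, 1.5, 1.5], N=9))      # close to 0: equal whole-plot contrasts
print(delta_tilde(B, [1.0, 2.0, 3.0], N=9) > 0)  # True
```

For sizes such as $(2,3,6)$, which violate $M_{3}<M_{1}+M_{2}$, the matrix (20) has a negative eigenvalue, so no conservative estimator of this form is available from it.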

Later in this section, Theorem 4 answers (a) in the affirmative. On the other hand, the question in (b) does not allow a conclusive answer. To see why, observe that the most obvious extension of (20) to general $W\geq 3$ is given by $B=((b_{ww^{*}}))$ , with

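$$b_{ww}=M_{w}^{2},\qquad b_{ww^{*}}=\frac{1}{W-1}\Big(\frac{1}{W-2}\sum_{u\neq w,w^{*}}M_{u}^{2}-M_{w}^{2}-M_{w^{*}}^{2}\Big)\quad(w\neq w^{*}).\tag{22}$$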

The divisors in (22) ensure the zero-row-sum condition (c2) and make (22) consistent with (20) when $W=3$. The form (22) is also natural because, keeping $M_{1}^{2},\ldots,M_{W}^{2}$ as the diagonal elements of $B$, it takes the off-diagonal elements as linear combinations of $M_{1}^{2},\ldots,M_{W}^{2}$ in a systematic manner. However, unlike in the case $W=3$, the matrix $B$ given by (22) need not be positive semidefinite for $W\geq 4$, even when condition (21) holds. For instance, if $W=4$, then this condition holds for both configurations $(M_{1},\ldots,M_{4})=(8,8,12,12)$ and $(6,6,14,14)$. The matrix $B$ in (22) is positive semidefinite of rank 3 ($=W-1$) for the first configuration but has a negative eigenvalue for the second.
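The two $W=4$ configurations can be examined concretely with a NumPy sketch. It assumes that the off-diagonal entries of (22) are $b_{ww^{*}}=\{\sum_{u\neq w,w^{*}}M_{u}^{2}/(W-2)-M_{w}^{2}-M_{w^{*}}^{2}\}/(W-1)$. This is our reading of (22), supported by two facts:

- Within the family $b_{ww^{*}}=\gamma\sum_{u}M_{u}^{2}+\delta(M_{w}^{2}+M_{w^{*}}^{2})$, it is the only choice that satisfies (c2) for all whole-plot sizes.
- It reproduces (20) when $W=3$.

The helper `b_matrix_22` is hypothetical.

```python
import numpy as np

def b_matrix_22(m):
    """Assumed form of (22): b_ww = M_w^2 and, for w != w*,
    b_ww* = (sum_{u not in {w, w*}} M_u^2 / (W - 2) - M_w^2 - M_w*^2) / (W - 1)."""
    sq = np.asarray(m, dtype=float) ** 2
    w = len(sq)
    total = sq.sum()
    b = np.empty((w, w))
    for i in range(w):
        for j in range(w):
            if i == j:
                b[i, j] = sq[i]
            else:
                rest = total - sq[i] - sq[j]        # sum over u not in {w, w*}
                b[i, j] = (rest / (w - 2) - sq[i] - sq[j]) / (w - 1)
    return b

for m in [(8, 8, 12, 12), (6, 6, 14, 14)]:
    b = b_matrix_22(m)
    print(m, "zero row sums:", np.allclose(b.sum(axis=1), 0.0),
          "eigenvalues:", np.round(np.linalg.eigvalsh(b), 3))
```

Under this assumed form, the first configuration has eigenvalues $0$, $176/3$, $416/3$ and $656/3$. The second has the negative eigenvalue $-16/3$, in line with the statement above.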

The above discussion makes it clear that, in general, obtaining a positive semidefinite matrix $B$ satisfying (c1)-(c3) under condition (21) can be far more complex than the form (20) arising for $W=3$ suggests. Theorem 4 establishes condition (21) as necessary and sufficient for the existence of such a matrix.

Theorem 4.

Let $W\geq 3$ . Then condition (21), that is, $M_{W}<M_{1}+\ldots+M_{W-1}$ , is necessary and sufficient for the existence of a positive semidefinite matrix $B=((b_{ww^{*}}))$ of order $W$ and satisfying the conditions (c1) $b_{ww}=M_{w}^{2},\ w=1,\ldots,W$ , (c2) $\sum_{w^{*}=1}^{W}b_{ww^{*}}=0,\ w=1,\ldots,W$ , and (c3) rank $(B)=W-1$ .

The sufficiency part of the proof of Theorem 4 leads to a procedure for constructing a matrix $B$ satisfying conditions (c1)-(c3). If $M_{1}=\ldots=M_{W}(=M,\ \mbox{say})$, then one can simply take $M^{2}$ at each diagonal position of $B$ and $-M^{2}/(W-1)$ at each off-diagonal position. Turning next to the case where $M_{1}\leq\ldots\leq M_{W}$ are not all equal, suppose condition (21) holds. Let ${\mu}=(M_{1},\ldots,M_{W-1})^{\prime}$, where the prime denotes transposition, and let $e$ denote the $(W-1)\times 1$ vector of ones. The steps involved in the construction of the matrix $B$ are then:

Step 1: Find a vector $x$ with elements $\pm 1$ satisfying the condition

$$|\mu^{\prime}x|<M_{W}.\tag{23}$$

Step 2: Find nonnegative constants $a_{1}$ and $a_{2}$ , satisfying $a_{1}+a_{2}<1$ and the following condition:

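$$a_{1}\{(\mu^{\prime}x)^{2}-\mu^{\prime}\mu\}+a_{2}\{(\mu^{\prime}e)^{2}-\mu^{\prime}\mu\}=M_{W}^{2}-\mu^{\prime}\mu.\tag{24}$$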

Step 3: Construct the following matrix:

$$A=D\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}D,\tag{25}$$

where $x$ , $a_{1}$ and $a_{2}$ are obtained from steps 1 and 2 above, $I$ is the identity matrix of order $W-1$ and $D=\text{diag}(M_{1},\ldots,M_{W-1})$ .

Step 4: Construct matrix $B$ as follows:

$$B=\begin{pmatrix}A & -Ae\\ -e^{\prime}A & e^{\prime}Ae\end{pmatrix}.\tag{26}$$

Then $B$ is positive semidefinite of order $W$ and satisfies (c1)-(c3), by the proof of the sufficiency part of Theorem 4. A lemma crucial to this proof, stated as Lemma 1 in the Appendix, guarantees the existence of the vector $x$ in step 1 and of the constants $a_{1}$ and $a_{2}$ in step 2 under condition (21).
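A minimal NumPy sketch of steps 1 through 4 follows. It assumes that step 3 is $A=D\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}D$. It also assumes that step 4 borders $A$ as $B=\begin{pmatrix}A&-Ae\\-e^{\prime}A&e^{\prime}Ae\end{pmatrix}$. Both readings are consistent with the sufficiency proof of Theorem 4, which requires diagonal elements $M_{1}^{2},\ldots,M_{W-1}^{2}$ for $A$ and zero row sums for $B$.

The sketch proceeds as follows:

- Step 1 searches over sign vectors by brute force.
- Step 2 picks one admissible pair $(a_{1},a_{2})$ by following the proof of Lemma 1(b) with a common value for $\tilde{a}_{1}$ and $\tilde{a}_{2}$.

The function names `construct_b` and `meets_c1_c3` are hypothetical.

```python
import itertools
import numpy as np

def construct_b(m):
    """Sketch of steps 1-4 for sorted whole-plot sizes m satisfying (21).

    Assumes A = D{(1 - a1 - a2) I + a1 xx' + a2 ee'}D and the bordered
    B = [[A, -Ae], [-e'A, e'Ae]]; returns (B, x, a1, a2)."""
    m = np.asarray(m, dtype=float)
    w = len(m)
    if np.any(np.diff(m) < 0) or m[-1] >= m[:-1].sum():
        raise ValueError("need M_1 <= ... <= M_W and condition (21)")
    if np.all(m == m[0]):   # balanced case: M^2 on the diagonal, -M^2/(W-1) elsewhere
        return m[0] ** 2 * (w * np.eye(w) - np.ones((w, w))) / (w - 1), None, None, None
    mu, m_w = m[:-1], m[-1]
    e = np.ones(w - 1)
    # Step 1: brute-force search over the 2^(W-1) sign vectors for (23).
    x = next(np.array(s) for s in itertools.product((1.0, -1.0), repeat=w - 1)
             if abs(mu @ np.array(s)) < m_w)
    # Step 2: one admissible (a1, a2) solving (24), as in the proof of Lemma 1(b).
    mm = mu @ mu
    phi1, phi, phi2 = (mu @ x) ** 2 - mm, m_w ** 2 - mm, (mu @ e) ** 2 - mm
    r = phi / phi2 if phi > 0 else (phi / phi1 if phi < 0 else 0.0)
    s = (1.0 + r) / 2.0                    # r < s < 1, so s*phi1 < phi < s*phi2
    xi = (s * phi2 - phi) / (s * (phi2 - phi1))
    a1, a2 = s * xi, s * (1.0 - xi)        # a1 + a2 = s < 1
    # Step 3: A is positive definite because 1 - a1 - a2 > 0 and D is nonsingular.
    d = np.diag(mu)
    a = d @ ((1 - a1 - a2) * np.eye(w - 1) + a1 * np.outer(x, x)
             + a2 * np.outer(e, e)) @ d
    # Step 4: border A so that every row of B sums to zero.
    ae = a @ e
    b = np.block([[a, -ae[:, None]], [-ae[None, :], np.array([[e @ ae]])]])
    return b, x, a1, a2

def meets_c1_c3(b, m, tol=1e-8):
    """Check (c1) diagonal, (c2) zero row sums, positive semidefiniteness, (c3) rank W - 1."""
    eig = np.linalg.eigvalsh(b)
    return (np.allclose(np.diag(b), np.asarray(m, dtype=float) ** 2)
            and np.allclose(b.sum(axis=1), 0.0)
            and bool(eig.min() > -tol)
            and int(np.sum(eig > tol)) == len(m) - 1)

b, x, a1, a2 = construct_b((8, 8, 12, 12))
print(x, round(a1, 4), round(a2, 4), meets_c1_c3(b, (8, 8, 12, 12)))
```

For the sizes of Example 1, this picks $x=(1,1,-1)^{\prime}$, $a_{1}=2/3$ and $a_{2}=1/12$. That pair is admissible but is not the minimax choice discussed in Section 6. The brute-force search in step 1 is exponential in $W$ and is meant only for small designs.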

Remark 3.

Reassuringly, condition (21) holds in considerable generality. It only requires the largest whole-plot not to be too large compared with the others, and it holds, in particular, whenever the largest whole-plot size is tied, that is, when $M_{W-1}=M_{W}$.

Remark 4.

For $W=3$, one can check that the construction stated above yields the unique $B$ in (20). For $W\geq 4$, however, a positive semidefinite matrix $B$ meeting (c1)-(c3) is non-unique. Indeed, the above construction alone can yield a wide class of such matrices $B$, by considering all vectors $x$ that satisfy (23) and, for each such $x$, all nonnegative $a_{1},a_{2}$ satisfying $a_{1}+a_{2}<1$ and (24). Thus, the issue of discriminating among rival choices of $B$ becomes important. Such a discriminating strategy is discussed in Section 6.

Minimax estimators unbiased under between-whole-plot additivity

As seen in Section 5, condition (21) guarantees the existence of a matrix $B$, and consequently of a variance estimator that is unbiased under between-whole-plot additivity, but such a matrix is non-unique. It is therefore important to define a criterion that discriminates among the possible choices of $B$. A good choice should control the bias $\tilde{\Delta}=(1/N^{2})\sum_{w=1}^{W}\sum_{w^{*}=1}^{W}b_{ww^{*}}\overline{\tau}_{w}\overline{\tau}_{w^{*}}$, given by (18), that is associated with the estimation of $\textnormal{var}(\widehat{\overline{\tau}})$. The hurdle is that $\overline{\tau}_{1},\ldots,\overline{\tau}_{W}$ are unknown. Even minimaxity does not work without further refinement: because $B$ is a nonzero positive semidefinite matrix, $\tilde{\Delta}$ is unbounded above as $\overline{\tau}_{1},\ldots,\overline{\tau}_{W}$ vary over the $W$-dimensional real space. On the other hand, by (10), multiplying $\overline{\tau}_{1},\ldots,\overline{\tau}_{W}$ by any nonzero constant only rescales the treatment contrast $\overline{\tau}$, without essentially altering it. We therefore consider the maximum of $\tilde{\Delta}$ subject to $\sum_{w=1}^{W}\overline{\tau}_{w}^{2}=1$, and seek $B$ that minimizes this maximum. This formulation is motivated by Mukerjee et al., (2018), who touched upon split-plot designs only in the balanced case. Because the maximum of $\sum_{w=1}^{W}\sum_{w^{*}=1}^{W}b_{ww^{*}}\overline{\tau}_{w}\overline{\tau}_{w^{*}}$ over unit vectors $(\overline{\tau}_{1},\ldots,\overline{\tau}_{W})^{\prime}$ equals $\lambda_{\max}(B)$, the largest eigenvalue of $B$, the above formulation calls for obtaining $B$, subject to (c1)-(c3), so as to minimize $\lambda_{\max}(B)$. The following proposition provides a lower bound for $\lambda_{\max}(B)$.

Proposition 3.

For any positive semidefinite matrix $B$ satisfying (c1)-(c3), a lower bound for $\lambda_{\max}(B)$ is given by $\lambda_{0}=\sum_{w=1}^{W}M_{w}^{2}/(W-1)$ , but this bound is unattainable whenever $M_{1},\ldots,M_{W}$ are not all equal.
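The bound can be seen from a short argument, given here as a sketch rather than as the authors' proof, that uses only (c1)-(c3). By (c2) and (c3), the null space of $B$ is spanned by the $W\times 1$ vector of ones, written $1_{W}$ here. So $B$ has exactly $W-1$ positive eigenvalues $\lambda_{1},\ldots,\lambda_{W-1}$, and by (c1) they sum to the trace of $B$:

$$\sum_{w=1}^{W}M_{w}^{2}=\textnormal{tr}(B)=\sum_{j=1}^{W-1}\lambda_{j}\leq(W-1)\,\lambda_{\max}(B),\qquad\text{hence}\qquad\lambda_{\max}(B)\geq\frac{1}{W-1}\sum_{w=1}^{W}M_{w}^{2}=\lambda_{0}.$$

Equality forces $\lambda_{1}=\cdots=\lambda_{W-1}=\lambda_{0}$, that is,

$$B=\lambda_{0}\Big(I_{W}-\frac{1}{W}1_{W}1_{W}^{\prime}\Big),$$

whose diagonal elements all equal $\lambda_{0}(W-1)/W$; by (c1), this is possible only if $M_{1}=\cdots=M_{W}$. Conversely, when $M_{1}=\cdots=M_{W}=M$, the matrix of Section 5 with $M^{2}$ on the diagonal and $-M^{2}/(W-1)$ elsewhere has exactly this form with $\lambda_{0}=WM^{2}/(W-1)$. The bound is therefore attained in the balanced case.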

In view of Proposition 3, an analytical solution to the above minimax problem appears intractable in the unbalanced case. This is not surprising, because a complete characterization of the matrices $B$ satisfying (c1)-(c3) is hard, even though Section 5 outlines a general method for constructing such matrices when condition (21) holds. As a practical strategy, therefore, it makes sense to concentrate on matrices $B$ obtainable via this method and to minimize $\lambda_{\max}(B)$ among them. Reassuringly, even this class of competing matrices $B$ is quite large, as noted in Remark 4.

Example 1.

Returning to the school example in Sections 1 and 2, where we have $N=40$ , $W=4$ and $(M_{1},M_{2},M_{3},M_{4})=(8,8,12,12)$ , the smallest $\lambda_{\max}(B)$ obtainable via steps 1 through 4 described in Section 5 is 192, which corresponds to

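$$B=\begin{pmatrix}64 & 32 & -48 & -48\\ 32 & 64 & -48 & -48\\ -48 & -48 & 144 & -48\\ -48 & -48 & -48 & 144\end{pmatrix},$$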

as given by $x=(1,1,-1)^{\prime}$ , $a_{1}=0.5$ and $a_{2}=0$ .
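Example 1 can be reproduced numerically under assumed forms of steps 3 and 4, namely $A=D\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}D$ and $B=\begin{pmatrix}A&-Ae\\-e^{\prime}A&e^{\prime}Ae\end{pmatrix}$. The sketch below does the following:

- It enumerates the sign vectors $x$ satisfying (23).
- It scans $a_{2}$ on a grid and solves (24) for $a_{1}$.
- It keeps the pair with the smallest largest eigenvalue.

It is a grid search, not an exact optimizer. The helper names are hypothetical.

```python
import itertools
import numpy as np

def b_from(mu, x, a1, a2):
    """Steps 3-4, assuming A = D{(1-a1-a2)I + a1 xx' + a2 ee'}D and B = [[A, -Ae], [-e'A, e'Ae]]."""
    k = len(mu)
    e = np.ones(k)
    d = np.diag(mu)
    a = d @ ((1 - a1 - a2) * np.eye(k) + a1 * np.outer(x, x)
             + a2 * np.outer(e, e)) @ d
    ae = a @ e
    return np.block([[a, -ae[:, None]], [-ae[None, :], np.array([[e @ ae]])]])

def smallest_lambda_max(m, grid=3001):
    """Grid search over x satisfying (23) and a2 in [0, 1), with a1 solved from (24)."""
    m = np.asarray(m, dtype=float)
    mu, m_w = m[:-1], m[-1]
    mm = mu @ mu
    phi, phi2 = m_w ** 2 - mm, mu.sum() ** 2 - mm
    best = (np.inf, None, None, None)
    for signs in itertools.product((1.0, -1.0), repeat=len(mu)):
        x = np.array(signs)
        if abs(mu @ x) >= m_w:             # condition (23) fails
            continue
        phi1 = (mu @ x) ** 2 - mm
        if phi1 == 0:                       # this sketch skips the degenerate case
            continue
        for a2 in np.linspace(0.0, 1.0, grid, endpoint=False):
            a1 = (phi - a2 * phi2) / phi1   # (24) solved for a1
            if a1 < 0 or a1 + a2 >= 1:
                continue
            lam = np.linalg.eigvalsh(b_from(mu, x, a1, a2)).max()
            if lam < best[0]:
                best = (lam, x, a1, a2)
    return best

lam, x, a1, a2 = smallest_lambda_max((8, 8, 12, 12))
print(round(lam, 6), x, round(a1, 6), round(a2, 6))
```

For $(8,8,12,12)$, the only admissible $x$ up to sign is $(1,1,-1)^{\prime}$, and (24) reduces to $a_{1}=0.5+2a_{2}$ with $0\leq a_{2}<1/6$. Under the assumed forms, the nonzero eigenvalues of $B$ are then $32-192a_{2}$, $192-192a_{2}$ and $192+384a_{2}$. The minimum of $\lambda_{\max}(B)$ over this family is therefore $192$, attained at $a_{2}=0$, in agreement with Example 1.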

Simulation Results

Theorem 3 establishes the unbiasedness of $\widetilde{V}(\widehat{\overline{\tau}})$ under (21) and between-whole-plot additivity, and the minimax criterion is expected to provide protection against extreme departures from additivity. It is also important, however, to understand how the bias of $\widetilde{V}(\widehat{\overline{\tau}})$ compares with that of $\widehat{V}(\widehat{\overline{\tau}})$ under different levels of treatment effect heterogeneity, and we now study this aspect through simulations. We consider estimation of the interaction effect between factors $F_{1}$ and $F_{2}$ in the setting of Example 1. The unit-level treatment contrast $\tau_{i}$ equals $\{Y_{i}(00)-Y_{i}(01)-Y_{i}(10)+Y_{i}(11)\}/4$ for $i=1,\ldots,40$ (Dasgupta et al., 2015), and the finite-population contrast of interest is $\overline{\tau}=\sum_{i=1}^{40}\tau_{i}/40$. The vector of potential outcomes for unit $i$, denoted by $Y_{i}=\left(Y_{i}(00),Y_{i}(01),Y_{i}(10),Y_{i}(11)\right)$, is generated from the multivariate normal model:

[TABLE]

where

[TABLE]

is the covariance matrix for whole-plot $\Omega_{w}$; it depends on two parameters, the variance $\sigma_{w}^{2}$ and the correlation $\rho_{w}$. Here $I_{n}$ and $J_{n}$ denote, respectively, the identity matrix and the matrix of ones of order $n$. Eight scenarios for generating the potential outcomes are considered; they are listed in Table 1.

Strict additivity holds for population I. For population II, the potential outcomes are constrained, via an appropriate command in R, so that the whole-plot means $\overline{\tau}_{1},\ldots,\overline{\tau}_{4}$ all equal one, and hence between-whole-plot additivity holds. Population III generates different $\overline{\tau}_{1},\ldots,\overline{\tau}_{4}$ but guarantees a common $\tau_{i}$ within each whole-plot. Populations IV through VIII differ only in the correlation parameters, which lead to different types of treatment effect heterogeneity. These include all zero correlations in population VI, all negative correlations in population VII, and a mix of positive and negative correlations in population VIII.

From each population, 200 sets of potential outcomes are generated, and the biases of the variance estimators $\widehat{V}(\widehat{\overline{\tau}})$ and $\widetilde{V}(\widehat{\overline{\tau}})$ are compared; these biases are $\Delta$, given by (11), and $\tilde{\Delta}$, given by (18). We also calculate the bias ratio $\widetilde{\Delta}/\Delta$ for each population. The results for populations I and II agree with the theory: in both cases, $\widetilde{\Delta}$ is always zero, whereas $\Delta$ is always 0.0133. Boxplots of the distributions of $\Delta$ and $\tilde{\Delta}$ for populations III-VIII are shown in Figure 1. The median bias ratios for these populations are 0.804, 0.811, 0.811, 0.810, 0.822 and 0.817, respectively. The plots and the median bias ratios indicate that the new estimator $\widetilde{V}(\widehat{\overline{\tau}})$ is robust in controlling bias under various forms of treatment effect heterogeneity.
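The data-generating step can be sketched in NumPy. The covariance matrix is described only in words above. The sketch therefore assumes the compound-symmetry form $\sigma_{w}^{2}\{(1-\rho_{w})I_{4}+\rho_{w}J_{4}\}$, which is our reading of "variance $\sigma_{w}^{2}$ and correlation $\rho_{w}$" together with the mention of $I_{n}$ and $J_{n}$.

The sketch computes only $\tilde{\Delta}$; $\Delta$ from (11) is not restated here and is omitted. It hard-codes the $4\times4$ matrix $B$ with $\lambda_{\max}(B)=192$ obtained from $x=(1,1,-1)^{\prime}$, $a_{1}=0.5$, $a_{2}=0$. That matrix rests on our reading of steps 3 and 4, namely $A=D\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}D$ bordered so that rows sum to zero. All names are hypothetical.

```python
import numpy as np

SIZES = (8, 8, 12, 12)   # whole-plot sizes M_1, ..., M_4 from Example 1 (N = 40)
# B from steps 3-4 with x = (1, 1, -1)', a1 = 0.5, a2 = 0 (assumed forms of A and B).
B_EX1 = np.array([[64, 32, -48, -48],
                  [32, 64, -48, -48],
                  [-48, -48, 144, -48],
                  [-48, -48, -48, 144]], dtype=float)

def cov_matrix(sigma2, rho, n=4):
    """Assumed compound-symmetry covariance sigma^2 {(1 - rho) I_n + rho J_n}."""
    return sigma2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

def draw_outcomes(thetas, sigma2s, rhos, rng):
    """One row (Y_i(00), Y_i(01), Y_i(10), Y_i(11)) per unit, whole-plot by whole-plot."""
    blocks = [rng.multivariate_normal(np.asarray(th, dtype=float),
                                      cov_matrix(s2, r), size=n)
              for th, s2, r, n in zip(thetas, sigma2s, rhos, SIZES)]
    return np.vstack(blocks)

def delta_tilde(y, b=B_EX1):
    """(1/N^2) sum_w sum_w* b_ww* taubar_w taubar_w*, the bias of the new estimator."""
    tau = (y[:, 0] - y[:, 1] - y[:, 2] + y[:, 3]) / 4.0   # unit-level interaction
    cuts = np.cumsum((0,) + SIZES)                        # whole-plot boundaries
    taubar = np.array([tau[lo:hi].mean() for lo, hi in zip(cuts[:-1], cuts[1:])])
    return taubar @ b @ taubar / len(tau) ** 2

rng = np.random.default_rng(2019)
# Population I of Table 1: theta_w = (10, 5, 9, 8), sigma_w^2 = 2, rho_w = 1 for all w.
y = draw_outcomes([(10, 5, 9, 8)] * 4, [2.0] * 4, [1.0] * 4, rng)
print(y.shape, delta_tilde(y))
```

With population I ($\rho_{w}=1$), every $\tau_{i}$ equals $(10-5-9+8)/4=1$ up to rounding, so $\tilde{\Delta}$ is numerically zero, matching the simulation finding for population I.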

Acknowledgement

This work was supported by the J.C. Bose National Fellowship, Government of India, and grants from Indian Institute of Management Calcutta and National Science Foundation, USA.

Appendix: Proofs of results

In what follows, $E_{1}$ and $\textnormal{cov}_{1}$ denote unconditional expectation and covariance with respect to the randomization at the whole-plot stage, while $E_{2}$ and $\textnormal{cov}_{2}$ denote expectation and covariance with respect to the randomization at the sub-plot stage, conditional on the whole-plot stage assignment.

Proof of Proposition 1.

Follows from straightforward conditioning arguments.

∎

Proof of Theorem 1.

Recall that

[TABLE]

Consequently,

[TABLE]

Defining $\delta(z_{1},z_{1}^{*})$ as an indicator that equals one if $z_{1}=z_{1}^{*}$ and zero otherwise, we have

[TABLE]

Next,

[TABLE]

so that

[TABLE]

Hence,

[TABLE]

Since $\widehat{\overline{\tau}}=\sum_{z_{1}\in Z_{1}}\sum_{z_{2}\in Z_{2}}g(z_{1}z_{2})\overline{U}^{\textnormal{obs}}(z_{1}z_{2})$ , we have that

[TABLE]

Substituting the expression for $\textnormal{cov}\left\{\overline{U}^{\textnormal{obs}}(z_{1}z_{2}),\overline{U}^{\textnormal{obs}}(z_{1}^{*}z_{2}^{*})\right\}$ from (27) into the above, the first two terms in the expression for $\textnormal{var}(\widehat{\overline{\tau}})$ in Theorem 1 follow immediately. The last term is obtained as

[TABLE]

∎

Proof of Theorem 2.

[TABLE]

where $\widetilde{\overline{U}}(z_{1}z_{2})=\sum_{w\in T_{1}(z_{1})}\overline{U}_{w}(z_{1}z_{2})/r_{1}(z_{1})$ , and $\widetilde{\overline{U}}(z_{1}z_{2}^{*})$ is similarly defined. For any $w\in T_{1}(z_{1})$ ,

[TABLE]

Thus,

[TABLE]

The result stated in Theorem 2 is evident from the above. ∎

Proof of Proposition 2.

Because $w\neq w^{*}$ , by (5) and the definition of $G_{w}^{\text{obs}}$ , conditionally on the assignment of the whole-plots to the level combinations of the whole-plot factors, $G_{w}^{\textnormal{obs}}$ and $G_{w^{*}}^{\textnormal{obs}}$ are independent and the conditional expectation of their product equals

[TABLE]

The result now follows from (9), noting that the pair $(z_{1w},z_{1w^{*}})$ equals any $(z_{1},z_{1}^{*})$ with probability $\frac{r_{1}(z_{1})\left\{r_{1}(z_{1}^{*})-\delta(z_{1},z_{1}^{*})\right\}}{W(W-1)}$ . ∎

Proof of the necessity part of Theorem 4.

Suppose a positive semidefinite matrix $B=(b_{ww^{*}})$ of order $W$ and satisfying (c1)-(c3) exists. Then by (c1),

$$|b_{ww^{*}}|\leq(b_{ww}b_{w^{*}w^{*}})^{1/2}=M_{w}M_{w^{*}},\qquad w,w^{*}=1,\ldots,W.\tag{28}$$

Hence using (c2), (28), and (c1) in succession,

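$$M_{W}^{2}=b_{WW}=-\sum_{w=1}^{W-1}b_{wW}\leq\sum_{w=1}^{W-1}|b_{wW}|\leq M_{W}\sum_{w=1}^{W-1}M_{w},\tag{29}$$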

which implies $M_{W}\leq M_{1}+\ldots+M_{W-1}$ . If possible, let equality hold here. Then equality holds throughout in (29), and invoking (28), this yields

$$b_{wW}=-M_{w}M_{W},\qquad w=1,\ldots,W-1.\tag{30}$$

For any $w,w^{*}$ such that $w<w^{*}<W$, by (c1) and (30), the principal minor of $B$ formed by its $w$th, $w^{*}$th and $W$th rows and columns turns out to be $-M_{W}^{2}(b_{ww^{*}}-M_{w}M_{w^{*}})^{2}$. Because this principal minor is nonnegative by the positive semidefiniteness of $B$, it follows that $b_{ww^{*}}=M_{w}M_{w^{*}}$. This, in conjunction with (c1) and (30), implies that $B=bb^{\prime}$, where $b=(M_{1},\ldots,M_{W-1},-M_{W})^{\prime}$. But then $\textnormal{rank}(B)=1<W-1$, and (c3) is violated. This contradiction proves the necessity of the condition $M_{W}<M_{1}+\ldots+M_{W-1}$. ∎
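The determinant identity used in this proof is easy to confirm numerically. The sketch below evaluates the $3\times3$ principal minor implied by (c1) and (30), with $b_{ww^{*}}$ left free. It compares the result with $-M_{W}^{2}(b_{ww^{*}}-M_{w}M_{w^{*}})^{2}$ at random values. The function name is hypothetical.

```python
import numpy as np

def principal_minor(m_w, m_ws, m_big, b):
    """3x3 principal minor of B on rows/columns w, w*, W under (c1) and
    b_wW = -M_w M_W, b_w*W = -M_w* M_W; b stands for the free entry b_ww*."""
    sub = np.array([[m_w ** 2, b, -m_w * m_big],
                    [b, m_ws ** 2, -m_ws * m_big],
                    [-m_w * m_big, -m_ws * m_big, m_big ** 2]])
    return np.linalg.det(sub)

rng = np.random.default_rng(0)
for m_w, m_ws, m_big, b in rng.uniform(1.0, 10.0, size=(5, 4)):
    closed_form = -m_big ** 2 * (b - m_w * m_ws) ** 2
    print(np.isclose(principal_minor(m_w, m_ws, m_big, b), closed_form))
```

Since the closed form is never positive, nonnegativity of the minor forces $b_{ww^{*}}=M_{w}M_{w^{*}}$, which is the step used above.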

To prove the sufficiency part of Theorem 4, we first state a lemma that is crucial in this proof and also leads to the algorithm for construction of the symmetric positive semidefinite matrix $B$ of order $W$ that satisfies conditions (c1)-(c3).

Lemma 1.

Let $W\geq 3$ . Suppose $M_{1},\ldots,M_{W}$ are not all equal and $M_{1}\leq\ldots\leq M_{W}$ , as per (19). Let $e$ denote the $(W-1)\times 1$ vector of ones and $\mu=(M_{1},\ldots,M_{W-1})^{\prime}$ .

(a)

Then there exists a $(W-1)\times 1$ vector $x$ with elements $\pm 1$ such that $|\mu^{\prime}x|<M_{W}$ .

(b)

If, in addition, condition (21) holds, i.e., $M_{W}<M_{1}+\ldots+M_{W-1}$ , then, with the vector $x$ as in (a) above, there exist nonnegative constants $a_{1},a_{2}$ satisfying $a_{1}+a_{2}<1$ , such that equation (24) holds, i.e.,

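$$a_{1}\{(\mu^{\prime}x)^{2}-\mu^{\prime}\mu\}+a_{2}\{(\mu^{\prime}e)^{2}-\mu^{\prime}\mu\}=M_{W}^{2}-\mu^{\prime}\mu.$$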

Proof of Lemma 1.

Part (a). It will suffice to show that there exist $x_{1},\ldots,x_{W-1}$, each $+1$ or $-1$, such that $|\sum_{w=1}^{W-1}M_{w}x_{w}|<M_{W}$; one can then simply take $x=(x_{1},\ldots,x_{W-1})^{\prime}$. Recall that $M_{1}\leq M_{2}\leq\ldots\leq M_{W}$, as per (19). Because $M_{1},\ldots,M_{W}$ are not all equal, this yields

$$M_{1}<M_{W}.\tag{31}$$

Let $h$ be the largest nonnegative integer such that

$$M_{W-2h}=M_{W}.\tag{32}$$

By (31), $W-2h\geq 2$ . If $h\geq 1$ , define

$$x_{W-2h+2j-2}=1,\qquad x_{W-2h+2j-1}=-1,\qquad j=1,\ldots,h,\tag{33}$$

and note that

$$\sum_{w=W-2h}^{W-1}M_{w}x_{w}=0,\tag{34}$$

because by (19) and (32), $M_{w}=M_{W}$ for $w=W-2h,\ldots,W-1$ . Now, if $W-2h=2$ , then with $x_{1}=1$ and $x_{2},\ldots,x_{W-1}$ as in (33), $|\sum_{w=1}^{W-1}M_{w}x_{w}|=M_{1}<M_{W}$ , by (31) and (34).

Next, let $W-2h\geq 3$ . Then, by (19),

$$M_{1}\leq M_{2}\leq\sum_{w=2}^{W-2h-1}M_{w}.$$

Let $w_{1}$ be the largest integer in $\{1,\ldots,W-2h-2\}$ such that $\sum_{w=1}^{w_{1}}M_{w}\leq\sum_{w=w_{1}+1}^{W-2h-1}M_{w}$ . If $w_{1}=W-2h-2$ , then $\sum_{w=1}^{W-2h-2}M_{w}\leq M_{W-2h-1}$ . So, with $x_{1}=\ldots=x_{W-2h-2}=-1$ , $x_{W-2h-1}=1$ and $x_{W-2h},\ldots,x_{W-1}$ as in (33) when $h\geq 1$ ,

$$\Big|\sum_{w=1}^{W-1}M_{w}x_{w}\Big|=M_{W-2h-1}-\sum_{w=1}^{W-2h-2}M_{w}<M_{W-2h-1}\leq M_{W},$$

by (34).

Now, suppose $1\leq w_{1}\leq W-2h-3$ , in which case $W-2h\geq 4$ . Then,

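$$\sum_{w=1}^{w_{1}}M_{w}\leq\sum_{w=w_{1}+1}^{W-2h-1}M_{w}\qquad\text{and}\qquad\sum_{w=1}^{w_{1}+1}M_{w}>\sum_{w=w_{1}+2}^{W-2h-1}M_{w}.$$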

As a result, either

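$$\text{(i)}\ \ \sum_{w=w_{1}+1}^{W-2h-1}M_{w}-\sum_{w=1}^{w_{1}}M_{w}<M_{W},\qquad\text{or}\qquad\text{(ii)}\ \ \sum_{w=1}^{w_{1}+1}M_{w}-\sum_{w=w_{1}+2}^{W-2h-1}M_{w}<M_{W}.$$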

Else,

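$$\sum_{w=w_{1}+1}^{W-2h-1}M_{w}-\sum_{w=1}^{w_{1}}M_{w}\geq M_{W}\qquad\text{and}\qquad\sum_{w=1}^{w_{1}+1}M_{w}-\sum_{w=w_{1}+2}^{W-2h-1}M_{w}\geq M_{W}.$$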

Adding these two inequalities, we have $M_{w_{1}+1}\geq M_{W}$ , which is impossible by the definition of $h$ , because $w_{1}+1\leq W-2h-2$ .

If (i) holds, then the choice $x_{1}=\ldots=x_{w_{1}}=-1$ , $x_{w_{1}+1}=\ldots=x_{W-2h-1}=1$ , coupled with $x_{W-2h},\ldots,x_{W-1}$ as in (33) when $h\geq 1$ , entails $\left|\sum_{w=1}^{W-1}M_{w}x_{w}\right|<M_{W}$ , by (34). Similarly, if (ii) holds, then the choice $x_{1}=\ldots=x_{w_{1}+1}=-1$ , $x_{w_{1}+2}=\ldots=x_{W-2h-1}=1$ , coupled with $x_{W-2h},\ldots,x_{W-1}$ as in (33) when $h\geq 1$ , entails $\left|\sum_{w=1}^{W-1}M_{w}x_{w}\right|<M_{W}$ .
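The constructive argument of part (a) translates directly into code. The following Python sketch uses 0-based indexing and a hypothetical function name. It finds $h$ as in (32) and fills the tied positions with alternating signs; any pattern with equally many $+1$ and $-1$ entries would serve. It then locates $w_{1}$ and chooses among the cases $w_{1}=W-2h-2$, (i) and (ii) as in the proof.

```python
def lemma1a_signs(m):
    """Signs x_1, ..., x_{W-1} with |sum_{w<W} M_w x_w| < M_W, built as in the proof.

    m lists M_1 <= ... <= M_W, not all equal; M_w is m[w - 1] (0-based lists)."""
    big_w, m_big = len(m), m[-1]
    h = 0                                  # largest h with M_{W-2h} = M_W, as in (32)
    while big_w - 2 * (h + 1) >= 1 and m[big_w - 2 * (h + 1) - 1] == m_big:
        h += 1
    x = [0] * (big_w - 1)
    for j in range(2 * h):                 # tied positions W-2h, ..., W-1: +1, -1, ...
        x[big_w - 2 * h - 1 + j] = 1 if j % 2 == 0 else -1
    n = big_w - 2 * h - 1                  # remaining positions 1, ..., W-2h-1
    head = m[:n]
    if n == 1:                             # case W - 2h = 2: take x_1 = 1
        x[0] = 1
        return x
    # w1: largest in {1, ..., W-2h-2} with M_1+...+M_w1 <= M_{w1+1}+...+M_{W-2h-1}
    w1 = max(k for k in range(1, n) if sum(head[:k]) <= sum(head[k:]))
    if w1 == n - 1 or sum(head[w1:]) - sum(head[:w1]) < m_big:
        cut = w1                           # case w1 = W-2h-2, or case (i)
    else:
        cut = w1 + 1                       # case (ii)
    for k in range(n):
        x[k] = -1 if k < cut else 1
    return x

m = [1, 2, 2, 3]
x = lemma1a_signs(m)
print(x, sum(mw * xw for mw, xw in zip(m[:-1], x)))   # prints [-1, -1, 1] -1
```

The example exercises case (ii): with $w_{1}=1$, the quantity in (i) equals $M_{W}=3$, so the second choice of signs is used.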

Part (b): Let $M_{W}<M_{1}+\ldots+M_{W-1}=\mu^{\prime}e$ , and let the vector $x$ be as in part (a) above, so that $|\mu^{\prime}x|<M_{W}$ . Let $\phi_{1}=\left(\mu^{\prime}x\right)^{2}-\mu^{\prime}\mu$ , $\phi=M_{W}^{2}-\mu^{\prime}\mu$ and $\phi_{2}=\left(\mu^{\prime}e\right)^{2}-\mu^{\prime}\mu$ . Then $\phi_{1}<\phi<\phi_{2}$ , as $|\mu^{\prime}x|<M_{W}<\mu^{\prime}e$ . As a result, there exist constants $\tilde{a}_{1}$ and $\tilde{a}_{2}$ such that $0\leq\tilde{a}_{1},\tilde{a}_{2}<1$ and $\tilde{a}_{1}\phi_{1}<\phi<\tilde{a}_{2}\phi_{2}$ . Let $\xi=\left(\tilde{a}_{2}\phi_{2}-\phi\right)/\left(\tilde{a}_{2}\phi_{2}-\tilde{a}_{1}\phi_{1}\right)$ . Then $0<\xi<1$ . Hence, if we take $a_{1}=\tilde{a}_{1}\xi$ , $a_{2}=\tilde{a}_{2}(1-\xi)$ , then $a_{1},a_{2}\geq 0$ and $a_{1}+a_{2}<1$ , because $a_{1}+a_{2}$ is a weighted average of $\tilde{a}_{1}$ and $\tilde{a}_{2}$ , both of which are less than one. Moreover, $a_{1}\phi_{1}+a_{2}\phi_{2}=\phi$ by the definition of $\xi$ , i.e., $a_{1}$ and $a_{2}$ satisfy (24). ∎

Proof of the sufficiency part of Theorem 4.

In view of Lemma 1, this follows from steps 1-4 in Section 5, noting that (i) the matrix $A$ there is positive definite, and hence the matrix $B$ there is positive semidefinite of rank $W-1$ with each row sum zero, (ii) $A$ has diagonal elements $M_{1}^{2},\ldots,M_{W-1}^{2}$ , and (iii) by (24),

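$$e^{\prime}Ae=\mu^{\prime}\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}\mu=\mu^{\prime}\mu+a_{1}\{(\mu^{\prime}x)^{2}-\mu^{\prime}\mu\}+a_{2}\{(\mu^{\prime}e)^{2}-\mu^{\prime}\mu\}=M_{W}^{2},$$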

because $De=\mu$ .

∎
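Claims (i) and (ii) can be read off from a factorization. The following sketch assumes that step 4 borders $A$ as $B=\begin{pmatrix}A&-Ae\\-e^{\prime}A&e^{\prime}Ae\end{pmatrix}$ and that step 3 sets $A=D\{(1-a_{1}-a_{2})I+a_{1}xx^{\prime}+a_{2}ee^{\prime}\}D$:

$$B=\begin{pmatrix}I\\-e^{\prime}\end{pmatrix}A\begin{pmatrix}I&-e\end{pmatrix},\qquad\begin{pmatrix}I&-e\end{pmatrix}\begin{pmatrix}e\\1\end{pmatrix}=0.$$

Because $(I\ \ -e)$ has full row rank $W-1$, $B$ is positive semidefinite with $\textnormal{rank}(B)=\textnormal{rank}(A)$, and the second identity gives zero row sums. $A$ is positive definite because $(1-a_{1}-a_{2})I$ is positive definite, $a_{1}xx^{\prime}+a_{2}ee^{\prime}$ is positive semidefinite, and $D$ is nonsingular; hence $\textnormal{rank}(B)=W-1$. Finally, since each $x_{w}^{2}=1$, the $w$th diagonal element of $A$ is $M_{w}^{2}\{(1-a_{1}-a_{2})+a_{1}+a_{2}\}=M_{w}^{2}$.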

Bibliography

Box, G. E. P., Hunter, J. S., and Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery. John Wiley & Sons, Hoboken, New Jersey, 2nd edition.
Cochran, W. G. (1977). Sampling Techniques. John Wiley & Sons, New York.
Cochran, W. G. and Cox, G. M. (1957). Experimental Designs. John Wiley & Sons, Hoboken, New Jersey, 2nd edition.
Dasgupta, T., Pillai, N. S., and Rubin, D. B. (2015). Causal inference for $2^{K}$ factorial designs by using potential outcomes. Journal of the Royal Statistical Society, Series B, 77(4):727–753.
Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh, Scotland.
Fisher, R. A. (1935). The Design of Experiments. Oliver & Boyd, Oxford, England, 1st edition.
Freedman, D. A. (2006). Statistical models for causation: What inferential leverage do they provide? Evaluation Review, 30:691–713.
Freedman, D. A. (2008). On regression adjustments to experimental data. Advances in Applied Mathematics, 40:180–193.
