The P-LOOP Estimator: Covariate Adjustment for Paired Experiments

Edward Wu; Johann A. Gagnon-Bartsch

arXiv:1905.08450·stat.AP·May 22, 2019


TL;DR

The paper introduces the P-LOOP estimator, a flexible covariate adjustment method for paired experiments that improves treatment effect estimation by automatically deciding whether to incorporate pairing information during imputation.

Contribution

It proposes a novel covariate adjustment estimator for paired experiments that adaptively accounts for pairing, enhancing precision over existing methods.

Findings

01

P-LOOP can improve estimation precision in paired experiments.

02

The method automatically decides whether to account for the pairing during covariate adjustment.

03

Imputation can use flexible prediction algorithms such as lasso or random forests.


Tables

Table 1: Simulation Results

Method	Simpson’s Paradox: True SE	Simpson’s Paradox: Nominal SE	Uninformative Pairs: True SE	Uninformative Pairs: Nominal SE
Simple Difference	0.582	0.585	0.606	0.604
P-LOOP (differences)	0.389	0.406	0.392	0.407
P-LOOP (outcomes)	0.674	0.676	0.385	0.390
P-LOOP (interpolated)	0.387	0.408	0.389	0.393

Table 2: Comparison of Methods

Method	Pretest: Point Est.	Pretest: Nominal Var.	Pretest and School Type: Point Est.	Pretest and School Type: Nominal Var.
Simple Difference	-6.82	9.82	-6.82	9.82
Regression 1	-2.61	6.18	-2.61	6.18
Regression 2	-2.60	6.56	-2.27	4.57
P-LOOP (differences)	-2.79	6.54	-2.17	4.35
P-LOOP (outcomes)	-2.04	5.55	-1.81	4.02
P-LOOP (interpolated)	-2.00	5.72	-2.06	3.91

Equations

$$Y_{i}=T_{i}t_{i}+(1-T_{i})c_{i}.$$

$$\bar{\tau}=\frac{1}{2N}\sum_{i=1}^{2N}(t_{i}-c_{i}).$$

$$\frac{1}{2N}\sum_{i=1}^{2N}\left[2(Y_{i}-\hat{m}_{i})T_{i}-2(Y_{i}-\hat{m}_{i})(1-T_{i})\right],$$

$$W_{i}=T_{i}a_{i}+(1-T_{i})b_{i}.$$

$$\tau_{i}=\frac{(t_{i1}-c_{i1})+(t_{i2}-c_{i2})}{2}=\frac{1}{2}(a_{i}+b_{i})$$

$$\bar{\tau}=\frac{1}{N}\sum_{i=1}^{N}\tau_{i}$$

$$d_{i}=\frac{1}{2}(a_{i}-b_{i}),$$

$$\hat{\tau}_{i}=(W_{i}-\hat{d}_{i})T_{i}+(W_{i}+\hat{d}_{i})(1-T_{i})$$

$$\hat{\tau}=\frac{1}{N}\sum_{i=1}^{N}\left[(W_{i}-\hat{d}_{i})T_{i}+(W_{i}+\hat{d}_{i})(1-T_{i})\right]$$

$$\text{Var}(\hat{\tau}_{i})=\text{MSE}(\hat{d}_{i})$$

$$\text{Var}(\hat{\tau})=\frac{1}{N^{2}}\left[\sum_{i=1}^{N}\text{MSE}(\hat{d}_{i})+\sum_{i\neq j}\gamma_{ij}\right]$$

$M_{a}$, $M_{b}$, $\hat{M}_{a}$, $\hat{M}_{b}$

$$\widehat{\text{Var}}(\hat{\tau})=\frac{1}{N}\left(\frac{1}{4}\hat{M}_{a}+\frac{1}{4}\hat{M}_{b}+\frac{1}{2}\sqrt{\hat{M}_{a}\hat{M}_{b}}\right).$$

$$Y=\alpha+T\tau+P\beta+Z\gamma+\epsilon$$

$$\left(\frac{Z_{i1}+Z_{i2}}{2},\;Z_{i1}-Z_{i2}\right).$$

$$\hat{d}_{i}=\frac{1}{2}(\hat{a}_{i}-\hat{b}_{i}).$$

$$\alpha_{i}=\underset{x\in[0,1]}{\operatorname{argmin}}\sum_{k\in\mathcal{A}\setminus i}\left[a_{k}-\left(x\hat{a}_{k}^{(1)}+(1-x)\hat{a}_{k}^{(2)}\right)\right]^{2}.$$

$$Y_{ij}=80-10T_{ij}-5Z_{ij}+10E_{i}+\epsilon_{ij}$$

$$Y_{ij}=80-10T_{ij}+5Z_{ij}+\epsilon_{ij}$$

$$\begin{aligned}
\text{Var}(\hat{\tau}_{i}) &= E\left[\text{Var}(\hat{\tau}_{i}\mid\hat{d}_{i})\right]+\text{Var}(\tau_{i})\\
&= E\left[\text{Var}\left((W_{i}-\hat{d}_{i})T_{i}+(W_{i}+\hat{d}_{i})(1-T_{i})\mid\hat{d}_{i}\right)\right]\\
&= E\left[\text{Var}\left((a_{i}-\hat{d}_{i})T_{i}+(b_{i}+\hat{d}_{i})(1-T_{i})\mid\hat{d}_{i}\right)\right]\\
&= E\left[\text{Var}\left((a_{i}-b_{i}-2\hat{d}_{i})T_{i}+b_{i}+\hat{d}_{i}\mid\hat{d}_{i}\right)\right]\\
&= E\left[(2d_{i}-2\hat{d}_{i})^{2}\,\text{Var}(T_{i}\mid\hat{d}_{i})\right]\\
&= E\left[4(d_{i}-\hat{d}_{i})^{2}\times 1/4\right]\\
&= E\left[(d_{i}-\hat{d}_{i})^{2}\right]=\text{MSE}(\hat{d}_{i}).
\end{aligned}$$


Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference · Statistical Methods and Inference

Full text

The P-LOOP Estimator: Covariate Adjustment for Paired Experiments

Edward Wu and Johann A. Gagnon-Bartsch

Department of Statistics, University of Michigan, Ann Arbor, MI.

Abstract

In paired experiments, participants are grouped into pairs with similar characteristics, and one observation from each pair is randomly assigned to treatment. Because of both the pairing and the randomization, the treatment and control groups should be well balanced; however, there may still be small chance imbalances. It may be possible to improve the precision of the treatment effect estimate by adjusting for these imbalances. Building on related work for completely randomized experiments, we propose the P-LOOP (paired leave-one-out potential outcomes) estimator for paired experiments. We leave out each pair and then impute its potential outcomes using any prediction algorithm. The imputation method is flexible; for example, we could use lasso or random forests. While similar methods exist for completely randomized experiments, covariate adjustment methods in paired experiments are relatively understudied. A unique trade-off exists for paired experiments, where it can be unclear whether to factor in pair assignments when making adjustments. We address this issue in the P-LOOP estimator by automatically deciding whether to account for the pairing when imputing the potential outcomes. By addressing this trade-off, the method has the potential to improve precision over existing methods.

1 Introduction

In randomized controlled trials, we expect the characteristics of the treatment and control groups to be similar except for the treatment itself. However, there will often be small imbalances in baseline covariates due to chance variation in treatment assignment, which can be addressed in multiple ways. One way to improve the precision of the treatment effect estimate would be to adjust for these imbalances during the analysis. Alternatively, it might be possible to balance covariates through the design of the experiment. For example, in paired experiments, participants are organized into pairs prior to treatment assignment, and then one participant in each pair is randomly assigned to treatment. Ideally, the two participants in each pair would be as similar as possible. While a paired design is often effective, it may still be helpful to make adjustments for remaining covariate imbalances. However, perhaps in part because covariate balance is addressed through experimental design, covariate adjustment methods in paired experiments are relatively understudied.

Covariate adjustment methods can be model-based or design-based (for a discussion, see [8] and [9]). Model-based estimators have the potential to improve efficiency; however, incorrect modeling assumptions can result in bias and increased mean squared error. Design-based estimators rely only on randomization as the basis for inference, diminishing the concern of model misspecification. Hierarchical linear models (see [14] and [21]) are an example of a model-based approach for blocked experiments, including paired experiments. [13] and [5] note that hierarchical linear models are a common way to analyze blocked experiments. However, the use of such models requires one to make various modeling decisions, potentially raising concerns about model misspecification. For example, [5] notes that there is some debate as to whether block effects should be modeled as fixed or random.

As noted above, covariate adjustments in paired experiments are relatively understudied, and design-based methods are even more so. One recent approach is presented by [6]. Fogarty examines the use of regression adjustments in paired experiments under a design-based framework, building on the work of [7] and [10], who discuss regression adjustments in completely randomized experiments. More recently, covariate adjustment methods have been proposed for completely randomized and Bernoulli randomized experiments that involve the use of sample splitting and machine learning methods to impute potential outcomes. These include [1], [20], [4], [22], [18], and [15]. Unlike the case of regression adjustments, there is not currently an analogue to these methods for paired experiments.

In this paper, we present an analogue of these machine learning methods for paired experiments, the P-LOOP (“paired leave-one-out potential outcomes”) estimator. This method is design-based; however, it also allows for the use of models to improve performance. We leave out each pair and impute its potential outcomes using information from the remaining observations. This imputation can be done with any prediction method, such as linear regression or random forests. Regardless of the imputation method, the P-LOOP estimator is unbiased and randomization is the basis for inference. A further issue when making covariate adjustments is choosing which and how many covariates to use. We can address this issue in the P-LOOP estimator by choosing an imputation method that allows for automatic variable selection. Relatedly, [2] and [3] propose the use of targeted maximum likelihood estimation in paired experiments. As noted in [11], targeted maximum likelihood estimation allows for automatic variable selection when making covariate adjustments.

The P-LOOP estimator also addresses an issue that is specific to paired experiments, which we will call the pair inclusion trade-off. In paired experiments, the performance of the estimator can suffer if we fail to properly account for the pair assignments. If the relationship between the covariates and outcome within pairs is the opposite of the relationship overall, i.e., a Simpson’s paradox occurs, then omitting the pair assignments can hurt precision relative to the unadjusted estimator. However, in cases where the pair assignments are not predictive of the outcome, it is better to ignore the pairing. We discuss the pair inclusion trade-off further in Section 4. To address the trade-off, we impute two sets of potential outcomes: one that accounts for the pair assignments and one that ignores them. We then interpolate between the two sets by minimizing the cross-validated mean squared error. By addressing this trade-off, we protect against Simpson’s paradox while retaining the potential for improvements in precision if the pairing is not informative.

Covariate adjustment methods have also been proposed for matched-pair cluster randomized trials. For example, [17] propose a design-based estimator, while [23] propose a method that assumes a superpopulation.

This paper is organized as follows. In Section 2, we discuss the model and introduce notation. In Section 3, we present the P-LOOP estimator and derive a variance estimate. We discuss the pair inclusion trade-off further and present an imputation method to address it in Section 4. In Section 5, we apply the P-LOOP estimator to simulated and actual experimental data. Section 6 concludes.

2 Background and Notation

2.1 Estimating the Average Treatment Effect

In this paper, we work under the Neyman-Rubin model (see [19] and [16]), a non-parametric model that is often used to analyze randomized experiments. Consider a paired randomized experiment in which there are $2N$ individuals, indexed by $i=1,2,...,2N$ . We let $T_{i}=1$ if the participant is assigned to treatment and $T_{i}=0$ if control. Each of the $2N$ participants has two fixed (non-random) potential outcomes, $t_{i}$ and $c_{i}$ . We observe $t_{i}$ if participant $i$ is assigned to treatment and $c_{i}$ otherwise. That is, the observed outcome $Y_{i}$ for participant $i$ is

$$Y_{i}=T_{i}t_{i}+(1-T_{i})c_{i}.$$

We define the individual treatment effect for each participant as $t_{i}-c_{i}$, and the average treatment effect as

$$\bar{\tau}=\frac{1}{2N}\sum_{i=1}^{2N}(t_{i}-c_{i}).$$

Consider the case where the $T_{i}$ are independent Bernoulli random variables with probability $p=0.5$ , and suppose we wish to estimate the average treatment effect. One unbiased estimate is obtained by taking the average observed outcome of the treatment group and subtracting the average observed outcome of the control group (the “simple difference estimator”). However, for each participant, suppose we observe a $q$ -dimensional vector of baseline covariates $Z_{i}$ prior to treatment assignment. It may be possible to use these covariates to improve the precision of the estimate over the simple difference estimator. For example, we could estimate the average treatment effect as

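$$\frac{1}{2N}\sum_{i=1}^{2N}\left[2(Y_{i}-\hat{m}_{i})T_{i}-2(Y_{i}-\hat{m}_{i})(1-T_{i})\right], \tag{1}$$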

where $\hat{m}_{i}$ is a function of $Z_{i}$ . Several authors have noted an estimator of this form can be used to incorporate covariate information. For example, [1] note that if $\hat{m}_{i}$ is predictive of the observed outcome $Y_{i}$ , then the resulting estimate will improve over the unadjusted estimator, while [22] suggest estimating the quantity $m_{i}=(t_{i}+c_{i})/2$ . In addition, [1] and [22] note that this estimate is unbiased if $T_{i}$ and $\hat{m}_{i}$ are independent. One way to ensure this independence is by obtaining $\hat{m}_{i}$ through a sample splitting procedure. For example, one could leave out the $i$ -th observation and calculate $\hat{m}_{i}$ using the remaining observations. See [20], [4], [18], and [15] for similar estimators.
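As a rough illustration of equation (1), the sketch below computes the adjusted estimate with $\hat{m}_{i}$ obtained from a leave-one-out fit. The function names `loo_m_hat` and `adjusted_estimate`, the use of NumPy, and the choice of ordinary least squares as the prediction method are assumptions made for this example; any prediction method could be substituted.

```python
import numpy as np

def loo_m_hat(Y, T, Z):
    """Leave-one-out OLS estimates of m_i = (t_i + c_i) / 2 (illustrative model choice)."""
    Y = np.asarray(Y, dtype=float)
    T = np.asarray(T, dtype=float)
    n = len(Y)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    X = np.column_stack([np.ones(n), T, Z])       # intercept, treatment, covariates
    m_hat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop observation i before fitting
        coef, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
        t_hat = coef[0] + coef[1] + Z[i] @ coef[2:]   # predicted outcome under treatment
        c_hat = coef[0] + Z[i] @ coef[2:]             # predicted outcome under control
        m_hat[i] = 0.5 * (t_hat + c_hat)
    return m_hat

def adjusted_estimate(Y, T, m_hat):
    """Equation (1) for Bernoulli(0.5) treatment assignment."""
    Y, T, m_hat = (np.asarray(x, dtype=float) for x in (Y, T, m_hat))
    resid = Y - m_hat
    return float(np.mean(2 * resid * T - 2 * resid * (1 - T)))
```

Because each $\hat{m}_{i}$ is fitted without observation $i$, it does not depend on $T_{i}$, which is the independence condition the text requires for unbiasedness.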

2.2 Notation for Paired Experiments

We now consider the case where the participants are pair randomized. Suppose that the $2N$ participants are organized into $N$ pairs. We index the pairs by $i=1,2,...,N$ , each with two participants indexed by $j=1,2$ , and the quantities defined in Section 2.1 are re-indexed by $i$ and $j$ . For example, for participant $j$ in pair $i$ , we denote the potential outcomes as $t_{ij}$ and $c_{ij}$ , and define the observed outcome, treatment indicator, and covariates as $Y_{ij}$ , $T_{ij}$ , and $Z_{ij}$ , respectively.

For each pair, one of the two participants is randomly chosen to be assigned to treatment and the other is assigned to control. That is, $T_{i1}\sim\text{Bern}(0.5)$ , and $T_{i2}=1-T_{i1}$ . Note that the $T_{ij}$ ’s are not mutually independent because exactly one participant in each pair must be assigned to treatment. However, we assume the $T_{i1}$ ’s are mutually independent. We can therefore essentially convert our paired experiment to a Bernoulli randomized experiment by treating each pair as an experimental unit, as we describe next.

When treating each pair as a unit, we can draw direct analogues between the notation of paired and Bernoulli randomized experiments. We denote each pair’s treatment assignment by $T_{i}$ , where $T_{i}=T_{i1}$ . For each pair, we also observe a response variable $W_{i}$ and a $2q$ -dimensional vector of baseline covariates $(Z_{i1},Z_{i2})$ . As with a Bernoulli randomized experiment, each pair has two potential outcomes: we observe $a_{i}=t_{i1}-c_{i2}$ if $T_{i}=1$ and $b_{i}=t_{i2}-c_{i1}$ if $T_{i}=0$ . To differentiate these outcomes from those of the individual participants, we will refer to $a_{i}$ and $b_{i}$ as potential differences. We define the observed difference $W_{i}$ as:

$$W_{i}=T_{i}a_{i}+(1-T_{i})b_{i}.$$

We define the pair-level treatment effect $\tau_{i}$ as

$$\tau_{i}=\frac{(t_{i1}-c_{i1})+(t_{i2}-c_{i2})}{2}=\frac{1}{2}(a_{i}+b_{i})$$

and the average treatment effect $\bar{\tau}$ as

$$\bar{\tau}=\frac{1}{N}\sum_{i=1}^{N}\tau_{i},$$

which is our primary parameter of interest.

3 The P-LOOP Estimator

We now present the P-LOOP estimator, which is analogous to equation (1), but for paired experiments. Define the quantity

$$d_{i}=m_{i1}-m_{i2}=\frac{1}{2}(a_{i}-b_{i}),$$

where $m_{ij}=(t_{ij}+c_{ij})/2$, and let

$$\hat{\tau}_{i}=(W_{i}-\hat{d}_{i})T_{i}+(W_{i}+\hat{d}_{i})(1-T_{i}),$$

where $\hat{d}_{i}$ is an estimate for $d_{i}$ . Recall that for Bernoulli randomized experiments, equation (1) is an unbiased estimate of the average treatment effect if $\hat{m}_{i}$ and $T_{i}$ are independent. An identical argument can be used for paired experiments to show that $\hat{\tau}_{i}$ will be unbiased if $\hat{d}_{i}$ and $T_{i}$ are independent.

We define the P-LOOP estimator as:

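$$\hat{\tau}=\frac{1}{N}\sum_{i=1}^{N}\hat{\tau}_{i}=\frac{1}{N}\sum_{i=1}^{N}\left[(W_{i}-\hat{d}_{i})T_{i}+(W_{i}+\hat{d}_{i})(1-T_{i})\right],$$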

in which we estimate $d_{i}$ by using a leave-one-out procedure. For each pair $i$ , we drop both observations and use the remaining $N-1$ pairs to impute $a_{i}$ and $b_{i}$ using any method (such as a random forest or linear regression). We then set $\hat{d}_{i}=\frac{1}{2}(\hat{a}_{i}-\hat{b}_{i})$ and repeat this procedure for all $N$ pairs to obtain $\hat{\tau}$ . This leave-one-out procedure ensures that the P-LOOP estimator will be unbiased, as $\hat{d}_{i}$ and $T_{i}$ are independent. Because the P-LOOP estimator is unbiased, the mean squared error of the estimator depends only on the variance.
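The point estimate itself is a short computation once the $\hat{d}_{i}$ are available. The following sketch assumes outcomes stored as an $N\times 2$ array and a vector `T` with `T[i] = 1` when the first unit of pair $i$ is treated; the function names are hypothetical, and `d_hat` is assumed to have been produced by some leave-one-pair-out procedure.

```python
import numpy as np

def observed_differences(Y, T):
    """W_i: treated-minus-control outcome in pair i. Y has shape (N, 2)."""
    Y = np.asarray(Y, dtype=float)
    T = np.asarray(T)
    return np.where(T == 1, Y[:, 0] - Y[:, 1], Y[:, 1] - Y[:, 0])

def p_loop(W, T, d_hat):
    """P-LOOP point estimate; d_hat[i] must be computed without using pair i."""
    W, T, d_hat = (np.asarray(x, dtype=float) for x in (W, T, d_hat))
    tau_i = (W - d_hat) * T + (W + d_hat) * (1 - T)
    return float(tau_i.mean())
```

Passing `d_hat` as all zeros returns the mean of the observed differences, that is, the simple difference estimator.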

3.1 Variance of the P-LOOP Estimator

In Appendix A, we show

$$\text{Var}(\hat{\tau}_{i})=\text{MSE}(\hat{d}_{i}),$$

and thus that the variance of the P-LOOP estimator is

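$$\text{Var}(\hat{\tau})=\frac{1}{N^{2}}\left[\sum_{i=1}^{N}\text{MSE}(\hat{d}_{i})+\sum_{i\neq j}\gamma_{ij}\right],$$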

where $\gamma_{ij}=\text{Cov}(\hat{\tau}_{i},\hat{\tau}_{j})$ . We provide an unbiased estimator for $\sum_{i\neq j}\gamma_{ij}$ in Appendix B. However, in practice we suggest that the variance be estimated without this term for computational efficiency, as $\sum_{i\neq j}\gamma_{ij}$ is generally negligible (see Appendix B). For this reason, we focus on estimating $\text{MSE}(\hat{d}_{i})$ .
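The identity $\text{Var}(\hat{\tau}_{i})=\text{MSE}(\hat{d}_{i})$ can be checked by hand for a single pair when $\hat{d}_{i}$ is held fixed, since $\hat{\tau}_{i}$ then takes only two equally likely values. The sketch below does this enumeration; the numbers and the function name are made up for illustration.

```python
def tau_hat_i_moments(a, b, d_hat):
    """Exact mean and variance of tau_hat_i over the two equally likely assignments,
    treating d_hat as fixed."""
    values = (a - d_hat, b + d_hat)   # T_i = 1 gives W_i = a_i; T_i = 0 gives W_i = b_i
    mean = sum(values) / 2
    var = sum((v - mean) ** 2 for v in values) / 2
    return mean, var

a, b, d_hat = 4.0, 1.0, 0.5           # hypothetical potential differences and estimate
mean, var = tau_hat_i_moments(a, b, d_hat)
tau_i = 0.5 * (a + b)                 # pair-level treatment effect
d_i = 0.5 * (a - b)                   # quantity that d_hat targets
```

Here the mean equals $\tau_{i}$ and the variance equals $(d_{i}-\hat{d}_{i})^{2}$, the conditional version of the result shown in Appendix A.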

To estimate the mean squared error of $\hat{d}_{i}$ , we express $\text{MSE}(\hat{d}_{i})$ in terms of the mean squared errors of $\hat{a}_{i}$ and $\hat{b}_{i}$ . In Appendix C, we show that

[TABLE]

where

[TABLE]

To estimate these quantities, let $\mathcal{A}=\{k:T_{k}=1\}$ , $\mathcal{B}=\{k:T_{k}=0\}$ , and $n$ be the number of elements in $\mathcal{A}$ . Define the following estimates for $M_{a}$ and $M_{b}$ :

[TABLE]

Having obtained estimates for $M_{a}$ and $M_{b}$ , we have the following plug-in estimator for the variance of the P-LOOP estimator:

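$$\widehat{\text{Var}}(\hat{\tau})=\frac{1}{N}\left(\frac{1}{4}\hat{M}_{a}+\frac{1}{4}\hat{M}_{b}+\frac{1}{2}\sqrt{\hat{M}_{a}\hat{M}_{b}}\right).$$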
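A minimal sketch of the plug-in computation follows. It takes $\hat{M}_{a}$ and $\hat{M}_{b}$ as given inputs and assumes the cross term has the form $\frac{1}{2}\sqrt{\hat{M}_{a}\hat{M}_{b}}$, which is what a Cauchy-Schwarz bound on the covariance between the errors of $\hat{a}_{i}$ and $\hat{b}_{i}$ would produce; the function name is hypothetical.

```python
import math

def plug_in_variance(M_a_hat, M_b_hat, N):
    """Plug-in variance estimate for the P-LOOP estimator.

    Assumes the cross term is (1/2) * sqrt(M_a_hat * M_b_hat).
    """
    return (0.25 * M_a_hat + 0.25 * M_b_hat
            + 0.5 * math.sqrt(M_a_hat * M_b_hat)) / N
```

When the two inputs are equal, say both $M$, the expression simplifies to $M/N$.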

Finally, note that because $\text{Var}(\hat{\tau}_{i})=\text{MSE}(\hat{d}_{i})$ , the performance of the estimator depends directly on how well we estimate $d_{i}$ . One baseline approach is to set $\hat{d}_{i}=0$ for all $i$ , in which case $\hat{\tau}$ will exactly equal the simple difference estimator. The fact that the P-LOOP estimator reduces to the simple difference estimator in this case provides some reassurance that the leave-one-out procedure does not inherently introduce additional noise. Moreover, if we improve the estimate of $d_{i}$ over setting $\hat{d}_{i}=0$ , we will be able to improve precision beyond this baseline. Note that improving the estimate of $d_{i}$ is not necessarily trivial. Because we are interested in estimating the difference between $m_{i1}$ and $m_{i2}$ , it does not suffice to reduce the mean squared error for the imputed potential outcomes as in the estimator of [22]. For example, it is possible to obtain estimates of the potential outcomes that are close to the true values while having $\hat{d}_{i}$ of the incorrect sign. On the other hand, we could have estimates for the potential outcomes that are far from the true values that result in $\hat{d}_{i}$ being close to the true $d_{i}$ .

4 Imputation Methods of Potential Differences in Paired Experiments

We next present an imputation method to address the pair inclusion trade-off discussed in Section 1. We first further discuss this trade-off and then propose a method for addressing the trade-off within the P-LOOP estimator.

Note that for the P-LOOP estimator, we always drop both observations in each pair when estimating $d_{i}$ . Thus, when we discuss the inclusion or exclusion of the paired structure when imputing potential outcomes, we refer specifically to how we treat the remaining pairs when building a prediction model. If we ignore the paired structure when imputing potential outcomes, this means we fit a model to the remaining observations as individual units. If we include the paired structure when imputing potential outcomes, this means we fit a model to the remaining observations as paired units.

4.1 The Pair Inclusion Trade-Off

We first discuss the pair inclusion trade-off in the context of a linear model, rather than the Neyman-Rubin model, as it is perhaps easiest to understand the pair inclusion trade-off in this context. Consider the following standard linear regression model

$$Y=\alpha+T\tau+P\beta+Z\gamma+\epsilon,$$

where $Y$ is the observed outcome, $T$ is the treatment assignment vector, $Z$ is a covariate, and $P$ is a $2N\times N$ matrix of indicator variables that encodes the pair assignments. Suppose that there are pair effects (that is, $\beta\neq 0$ ), and that $Z$ is correlated with both $P$ and $T$ . If we were to omit $P$ and regress $Y$ onto $T$ and $Z$ , then we would bias the estimate of $\tau$ . On the other hand, suppose that the pairing is not informative ( $\beta=0$ ). In this case, including $P$ in the regression would inflate the variance for $\hat{\tau}$ , and it would be preferable to omit $P$ from the regression.

This trade-off could occur with other covariate adjustment methods in paired experiments, including the P-LOOP estimator. Suppose we ignore the paired structure of the data when we train our imputation model for the potential outcomes. In this case, we model the relationship between the covariates and the outcome overall, rather than the relationship within pairs. However, if the relationship between the covariates and outcome within pairs is sufficiently different from the relationship overall, we could obtain a $\hat{d}_{i}$ that is far from the truth. One situation where this could happen is when a Simpson’s paradox occurs, and the relationship within pairs between the covariates and outcome is the opposite of the overall relationship. For example, the covariates may be positively correlated with the outcome overall, but negatively correlated with the outcome within pairs. If we ignore pair assignments when estimating $d_{i}$, we would infer that higher values of $Z$ are associated with higher values of $Y$. However, for a given pair, we would want higher values of $Z$ to predict lower values of $Y$. In this case, the predicted difference $\hat{d}_{i}$, our estimate of $d_{i}=m_{i1}-m_{i2}$, would be of the wrong sign, resulting in poorer performance relative to the simple difference estimator. On the other hand, if the paired structure is not predictive of the outcome, then it would be better to omit the pair assignments when imputing the potential differences.

It can be unclear whether we should account for the pair assignments when imputing the potential differences. To avoid data snooping, we propose an imputation method in this section that automatically addresses the trade-off. We first propose methods for calculating $\hat{a}_{i}$ and $\hat{b}_{i}$ that do and do not account for the pair assignments in the prediction model, producing two sets of potential differences. Having produced two estimates for each $a_{i}$ and $b_{i}$ , we propose a method to automatically interpolate between them.

4.2 Estimating $d_{i}$ when Pairs are not Predictive: Impute Potential Outcomes Separately

We first estimate $d_{i}$ without accounting for the pair assignments for the observations outside of pair $i$ . To do this, we fit a model on the individual observations and then separately impute all four potential outcomes (i.e., $t_{i1},c_{i1},t_{i2},$ and $c_{i2}$ ) for a given pair.

More specifically, for each pair $i$ , we drop both observations in the pair. We then fit a prediction algorithm on the remaining observations, ignoring the pair assignments and treating each individual as a unit. For example, we could regress $Y_{kj}$ onto $T_{kj}$ and $Z_{kj}$ for $k\neq i$ . We then use this model to impute $t_{i1},c_{i1},t_{i2},$ and $c_{i2}$ . To obtain $\hat{t}_{i1}$ , we would plug in the covariates for the first observation in pair $i$ and a treatment indicator of 1. We would obtain estimates for the remaining potential outcomes similarly and set

$$\hat{d}_{i}=\frac{1}{2}\left[(\hat{t}_{i1}+\hat{c}_{i1})-(\hat{t}_{i2}+\hat{c}_{i2})\right].$$
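The procedure of this subsection can be sketched as follows, using the linear regression of $Y_{kj}$ on $T_{kj}$ and $Z_{kj}$ mentioned above as the prediction algorithm. The array layout (`Y` of shape $N\times 2$, `Z` of shape $N\times 2\times q$, `T[i] = 1` when unit 1 of pair $i$ is treated) and the function name are assumptions for the example.

```python
import numpy as np

def d_hat_separate(Y, T, Z):
    """Leave-one-pair-out d_hat from separately imputed potential outcomes (OLS)."""
    Y = np.asarray(Y, dtype=float)
    T = np.asarray(T, dtype=float)
    Z = np.asarray(Z, dtype=float)
    N, _, q = Z.shape
    T_unit = np.column_stack([T, 1 - T])          # unit-level treatment indicators
    d_hat = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i                  # drop both units of pair i
        X = np.column_stack([np.ones(2 * (N - 1)),
                             T_unit[keep].ravel(),
                             Z[keep].reshape(-1, q)])
        coef, *_ = np.linalg.lstsq(X, Y[keep].ravel(), rcond=None)
        # Impute all four potential outcomes for pair i.
        t1 = coef[0] + coef[1] + Z[i, 0] @ coef[2:]
        c1 = coef[0] + Z[i, 0] @ coef[2:]
        t2 = coef[0] + coef[1] + Z[i, 1] @ coef[2:]
        c2 = coef[0] + Z[i, 1] @ coef[2:]
        d_hat[i] = 0.5 * ((t1 + c1) - (t2 + c2))
    return d_hat
```

With this additive linear model the treatment coefficient cancels, so $\hat{d}_{i}$ depends only on the covariate difference within the pair; a more flexible learner such as a random forest would not necessarily have this property.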

4.3 Estimating $d_{i}$ when Pairs are Predictive: Impute Potential Differences Directly

Next, we propose a method that accounts for pair assignments when estimating $d_{i}$. Rather than imputing the potential outcomes ($t_{i1},c_{i1},t_{i2},$ and $c_{i2}$), we impute $a_{i}$ and $b_{i}$ directly, treating each pair as an observational unit. Recall from Section 2.2 that $a_{i}$ and $b_{i}$ are analogous to the potential outcomes in an experiment with Bernoulli randomization. We can therefore apply a procedure to the paired units that is similar to the leave-one-out procedure described earlier for estimating $m_{i}$ in equation (1). For Bernoulli experiments, we would only use the control units when imputing $c_{i}$ and the treatment units when imputing $t_{i}$. However, for paired experiments $a_{i}$ and $b_{i}$ are determined by which unit is arbitrarily labeled $j=1$ and are therefore effectively interchangeable. As an example, for the $i$-th pair, we have $a_{i}=t_{i1}-c_{i2}$. However, if we had instead recorded the second unit in the pair first, then $a_{i}$ would be $t_{i2}-c_{i1}$. We can take advantage of this fact to use all observations (except those in pair $i$) when imputing each potential difference.

In order to build a prediction model, we need to combine the covariates for each pair in some way. One way to do this would be to simply concatenate the covariate vectors for the two observations in each pair. In this case, we define $Z_{i}^{a}$ as the vector of covariates where the covariates for the treated units come first. That is, $Z_{i}^{a}=(Z_{i1},Z_{i2})$ if $T_{i}=1$ , and $Z_{i}^{a}=(Z_{i2},Z_{i1})$ if $T_{i}=0$ . Similarly, define $Z_{i}^{b}$ as the vector of covariates where the covariates for the control units come first. For example, suppose $Z_{i1}=(1,2)$ , $Z_{i2}=(2,3)$ , and $T_{i}=0$ . Then $Z_{i}^{a}=(2,3,1,2)$ and $Z_{i}^{b}=(1,2,2,3)$ . In other words, $Z_{i}$ is the concatenated vector of covariates as it is ordered in the original data, $Z_{i}^{a}$ is the concatenated vector where the covariates for the treated unit come first, and $Z_{i}^{b}$ is the concatenated vector where the covariates for the control unit come first.
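The two orderings can be built with a few lines of array code. This sketch covers only the concatenation approach, and the function name `ordered_covariates` is hypothetical.

```python
import numpy as np

def ordered_covariates(Z1, Z2, T):
    """Return (Z_a, Z_b): concatenated covariates with the treated unit first (Z_a)
    and with the control unit first (Z_b). T[i] = 1 means unit 1 of pair i is treated."""
    Z1, Z2 = np.atleast_2d(Z1), np.atleast_2d(Z2)
    T = np.asarray(T).reshape(-1, 1)
    original = np.hstack([Z1, Z2])    # unit 1 first, as ordered in the data
    swapped = np.hstack([Z2, Z1])     # unit 2 first
    Z_a = np.where(T == 1, original, swapped)
    Z_b = np.where(T == 1, swapped, original)
    return Z_a, Z_b

# The example from the text: Z_i1 = (1, 2), Z_i2 = (2, 3), T_i = 0.
Z_a, Z_b = ordered_covariates([[1, 2]], [[2, 3]], [0])
```

For this input `Z_a` is `(2, 3, 1, 2)` and `Z_b` is `(1, 2, 2, 3)`, matching the values stated above.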

Alternatively, we may wish to transform the covariates in some way; for example, we could take the means and differences of the covariates. In this case, define $Z_{i}$ as

$$\left(\frac{Z_{i1}+Z_{i2}}{2},\;Z_{i1}-Z_{i2}\right).$$

That is, $Z_{i}$ is the vector where the first $q$ entries are the averages of each covariate for the pair, and the second $q$ entries are the differences (observation 1 minus observation 2). In analogy to the concatenation example, we define $Z_{i}^{a}$ to be the means and the treatment minus control differences and $Z_{i}^{b}$ to be the means and the control minus treatment differences.

We can now estimate $d_{i}$ using these combined covariates and the observed differences. We start by leaving out pair $i$. To impute $a_{i}$, we create a model using the observed differences $W_{k}$ (for $k\neq i$) as our response variable and the covariates $Z_{k}^{a}$ as our predictors. We then plug the covariates $Z_{i}$ into this model to obtain $\hat{a}_{i}$. To impute $b_{i}$, we use a similar procedure, replacing $Z_{k}^{a}$ with $Z_{k}^{b}$. Having obtained estimates $\hat{a}_{i}$ and $\hat{b}_{i}$, we set

$$\hat{d}_{i}=\frac{1}{2}(\hat{a}_{i}-\hat{b}_{i}).$$
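The steps of this subsection can be sketched with concatenated covariates and ordinary least squares standing in for the prediction model. Both choices, along with the function name and array layout (`Z1` and `Z2` of shape $N\times q$ for units 1 and 2), are assumptions for illustration.

```python
import numpy as np

def d_hat_differences(W, T, Z1, Z2):
    """Leave-one-pair-out d_hat from directly imputed potential differences (OLS)."""
    W = np.asarray(W, dtype=float)
    N = len(W)
    T = np.asarray(T).reshape(-1, 1)
    Z1 = np.asarray(Z1, dtype=float).reshape(N, -1)
    Z2 = np.asarray(Z2, dtype=float).reshape(N, -1)
    Z_orig = np.hstack([Z1, Z2])                   # unit 1 first, as in the data
    Z_swap = np.hstack([Z2, Z1])
    ones = np.ones((N, 1))
    X_a = np.hstack([ones, np.where(T == 1, Z_orig, Z_swap)])   # treated unit first
    X_b = np.hstack([ones, np.where(T == 1, Z_swap, Z_orig)])   # control unit first
    X_i = np.hstack([ones, Z_orig])                # covariates plugged in for pair i
    d_hat = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i                   # leave out pair i
        coef_a, *_ = np.linalg.lstsq(X_a[keep], W[keep], rcond=None)
        coef_b, *_ = np.linalg.lstsq(X_b[keep], W[keep], rcond=None)
        a_hat = X_i[i] @ coef_a                    # imputed a_i (unit 1 treated)
        b_hat = X_i[i] @ coef_b                    # imputed b_i (unit 2 treated)
        d_hat[i] = 0.5 * (a_hat - b_hat)
    return d_hat
```

Both fits use all $N-1$ remaining pairs, which reflects the interchangeability of $a_{k}$ and $b_{k}$ noted earlier in this subsection.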

4.4 Interpolating between Imputation Methods

We have proposed two methods for imputing potential outcomes. However, we often do not know ahead of time which method will perform better. We therefore interpolate between the two methods.

For each pair $i$ , we have two estimates of $a_{i}$ : $\hat{a}_{i}^{(1)}$ and $\hat{a}_{i}^{(2)}$ . We wish to obtain the value $\alpha_{i}$ that minimizes the distance between $a_{i}$ and the interpolation $\hat{a}_{i}=\alpha_{i}\hat{a}_{i}^{(1)}+(1-\alpha_{i})\hat{a}_{i}^{(2)}$ . However, we want $\hat{a}_{i}$ to be independent of $T_{i}$ . We therefore use a leave-one-out procedure to calculate $\alpha_{i}$ . For each $i$ , we leave out pair $i$ and set $\alpha_{i}$ to the value that minimizes the mean squared error for the remaining observations. In other words, we have

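$$\alpha_{i}=\underset{x\in[0,1]}{\operatorname{argmin}}\sum_{k\in\mathcal{A}\setminus i}\left[a_{k}-\left(x\hat{a}_{k}^{(1)}+(1-x)\hat{a}_{k}^{(2)}\right)\right]^{2}.$$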

Taking the derivative with respect to $x$ and setting equal to 0, we have

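$$\alpha_{i}=\frac{\sum_{k\in\mathcal{A}\setminus i}(a_{k}-\hat{a}_{k}^{(2)})(\hat{a}_{k}^{(1)}-\hat{a}_{k}^{(2)})}{\sum_{k\in\mathcal{A}\setminus i}(\hat{a}_{k}^{(1)}-\hat{a}_{k}^{(2)})^{2}},$$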

which we then restrict to be in the interval $[0,1]$ . We then set our final estimate of $a_{i}$ to be $\hat{a}_{i}=\alpha_{i}\hat{a}_{i}^{(1)}+(1-\alpha_{i})\hat{a}_{i}^{(2)}$ . We use a similar procedure for $\hat{b}_{i}$ .
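A sketch of the interpolation step for $\hat{a}_{i}$ is given below. It uses the unconstrained least-squares minimizer clipped to $[0,1]$, which coincides with the constrained minimizer because the objective is a convex quadratic in $x$. The function names are hypothetical, the inputs `a1_hat` and `a2_hat` are assumed to be previously computed leave-one-out imputations from the two methods, and returning 0.5 when the two imputations coincide is an arbitrary convention chosen for the example.

```python
import numpy as np

def interpolation_weight(a_obs, a1_hat, a2_hat):
    """Weight in [0, 1] minimizing the squared error of x*a1_hat + (1-x)*a2_hat."""
    a_obs, a1_hat, a2_hat = (np.asarray(x, dtype=float)
                             for x in (a_obs, a1_hat, a2_hat))
    diff = a1_hat - a2_hat
    denom = np.sum(diff ** 2)
    if denom == 0:
        return 0.5                    # both imputations agree; any weight is equivalent
    x = np.sum((a_obs - a2_hat) * diff) / denom
    return float(np.clip(x, 0.0, 1.0))

def loo_interpolation(W, T, a1_hat, a2_hat):
    """Interpolated a_hat for every pair, choosing alpha_i without pair i."""
    W, T, a1_hat, a2_hat = (np.asarray(x, dtype=float)
                            for x in (W, T, a1_hat, a2_hat))
    N = len(W)
    a_hat = np.empty(N)
    for i in range(N):
        # Pairs in A (T_k = 1) other than i; for these, a_k = W_k is observed.
        use = (T == 1) & (np.arange(N) != i)
        alpha = interpolation_weight(W[use], a1_hat[use], a2_hat[use])
        a_hat[i] = alpha * a1_hat[i] + (1 - alpha) * a2_hat[i]
    return a_hat
```

The same functions could be applied to $\hat{b}_{i}$ by selecting the pairs with $T_{k}=0$, for which $b_{k}=W_{k}$ is observed.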

5 Results

We compare the performance of the P-LOOP estimator to that of other estimators. We start with a simulation to illustrate the pair inclusion trade-off. We then apply the P-LOOP estimator to data on a paired experiment involving schools in Texas.

5.1 Simulation Results

We compare the simple difference estimator to the P-LOOP estimator using random forests as the imputation method. Recall from earlier that we are excluding the pair assignments in our imputation method if we impute the potential outcomes ( $t_{i1},c_{i1},t_{i2},$ and $c_{i2}$ ) separately, while we are including the pair assignments if we impute the potential differences ( $a_{i}$ and $b_{i}$ ) directly. We show results using each of these imputation methods as well as the interpolation method.

Consider a hypothetical experiment where a blood pressure medication is being tested. We generate $N=50$ pairs of twins, half of which are of ethnicity $E_{i}=0$ and the other half $E_{i}=1$ . Next, suppose there exists a genetic mutation $Z_{ij}$ . For each participant, we set $Z_{ij}\sim\text{Bernoulli}(p_{k})$ for $E_{i}=k$ . We set $p_{1}=0.9$ and $p_{0}=0.5$ . That is, participants of ethnicity $E_{i}=1$ are more likely to have the mutation. We assume that only the observed outcome $Y_{ij}$ , as well as $T_{i}$ and $Z_{ij}$ , are recorded. Suppose that ethnicity 1 has a higher baseline blood pressure than ethnicity 0 (for reasons unrelated to the mutation), but that the presence of the mutation is causally associated with lower blood pressure. We generate the outcome as:

$$Y_{ij}=80-10T_{ij}-5Z_{ij}+10E_{i}+\epsilon_{ij},$$

where $\epsilon_{ij}\overset{iid}{\sim}\text{N}(0,4)$ . Because participants for ethnicity $E_{i}=1$ have higher baseline blood pressure, $Z_{ij}$ is positively correlated with blood pressure across all participants. Thus a Simpson’s paradox occurs: overall, $Z_{ij}$ has a positive association with blood pressure, while within pairs, $Z_{ij}$ has a negative association with blood pressure. We summarize the results of this simulation in Table 1 under the column Simpson’s Paradox.
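For readers who want to reproduce the setup, the following sketch draws one data set from this design and returns the simple difference estimate. Two assumptions are made: $\text{N}(0,4)$ is read as variance 4 (standard deviation 2), and covariates, errors, and assignments are all redrawn on every call, which may differ from a simulation that holds the potential outcomes fixed and only re-randomizes treatment. The function name and defaults are hypothetical.

```python
import numpy as np

def simulate_simple_difference(rng, N=50, p0=0.5, p1=0.9, sigma=2.0):
    """One draw of the simple difference estimate under the Simpson's paradox design."""
    E = np.repeat([0, 1], N // 2)                   # half the pairs in each group
    p = np.where(E == 1, p1, p0)
    Z = rng.binomial(1, p[:, None], size=(N, 2))    # mutation indicator per twin
    T1 = rng.integers(0, 2, size=N)                 # 1 if the first twin is treated
    T = np.column_stack([T1, 1 - T1])
    eps = rng.normal(0.0, sigma, size=(N, 2))       # sigma = 2 assumes variance 4
    Y = 80 - 10 * T - 5 * Z + 10 * E[:, None] + eps
    W = np.where(T1 == 1, Y[:, 0] - Y[:, 1], Y[:, 1] - Y[:, 0])
    return float(W.mean())

rng = np.random.default_rng(0)
estimates = np.array([simulate_simple_difference(rng) for _ in range(2000)])
```

The mean of `estimates` should lie close to the true effect of $-10$, and its standard deviation gives a Monte Carlo analogue of the True SE reported for the simple difference estimator.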

We also generate a set of potential outcomes in which the pairs contain no additional information (beyond their association with the covariate $Z_{ij}$ ). We generate the observed outcome as:

[TABLE]

where $\epsilon_{ij}\overset{iid}{\sim}\text{N}(0,4)$ . In this case, $E_{i}$ is associated with the outcome only because it is associated with $Z_{ij}$ ; it otherwise has no effect on the outcome. We summarize the results of this simulation in Table 1 under the column Uninformative Pairs.

We see that in the Simpson’s paradox case, imputing the potential outcomes separately (not accounting for pairs when estimating $a_{i}$ and $b_{i}$ ) causes inflated variance relative to the simple difference estimator, while imputing potential differences directly (accounting for pairs) results in improved performance. However, in the case where the pair assignments are uninformative, it is better to impute the potential outcomes separately. The gains in this example are relatively minor; however, we show in the next section that the improvements can be more substantial.

5.2 Texas Schools Data

We next apply the P-LOOP estimator to data from a randomized trial involving schools in Texas, which is discussed in [12]. This trial tested the effectiveness of a computer program, the Cognitive Tutor Algebra 1 curriculum, and included 22 pairs of schools. As the outcome, we consider the passing rate of the schools on the math section of the Texas Assessment of Knowledge and Skills (TAKS) in 2008. As covariates, we have the school type (middle or high school) and a pretest score, the passing rate from 2007. We estimate the average treatment effect using either just the pretest score or both the pretest score and school type as covariates.

In Table 2, we compare the performance of P-LOOP with the simple difference estimator and the estimators of [6], which we will refer to as Regression 1 and Regression 2. Regression 1 regresses the treatment minus control outcomes onto the treatment minus control covariates, while Regression 2 is the same regression with the addition of the mean of the covariates in each pair. For the sake of comparison, we use linear regression as the imputation method in the P-LOOP estimator. As in the simulations, we show the results from imputing potential differences (accounting for pairs), imputing potential outcomes separately (ignoring the pair assignments), and interpolating between the two. Note that P-LOOP imputing potential differences most closely matches the Regression 2 method, as both methods account for pairing and use the differences and averages of the covariates for making adjustments.
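To make the two regression comparators concrete, the following sketch fits both to synthetic data. It is an illustration under stated assumptions rather than the procedure of [6]: it assumes the intercept of each regression is taken as the effect estimate and that the pair means are centered before entering Regression 2. The function name `paired_regression_estimates` and the synthetic data are hypothetical, and the Texas schools data are not used here.

```python
import numpy as np

def paired_regression_estimates(y_t, y_c, x_t, x_c):
    """Intercepts of two difference regressions for paired data.

    y_t, y_c: outcomes of the treated and control unit in each pair, shape (n,).
    x_t, x_c: covariates of the treated and control unit, shape (n, k).
    """
    y_diff = y_t - y_c                      # treatment minus control outcomes
    x_diff = x_t - x_c                      # treatment minus control covariates
    x_mean = (x_t + x_c) / 2                # pair-level covariate means
    x_mean = x_mean - x_mean.mean(axis=0)   # centered (an assumption)
    ones = np.ones((len(y_diff), 1))
    X1 = np.hstack([ones, x_diff])              # "Regression 1" design
    X2 = np.hstack([ones, x_diff, x_mean])      # "Regression 2" design
    b1 = np.linalg.lstsq(X1, y_diff, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y_diff, rcond=None)[0]
    return b1[0], b2[0]

# Synthetic, noise-free example with 22 pairs and one covariate.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(22, 1))
x_c = rng.normal(size=(22, 1))
y_c = 3.0 * x_c[:, 0]
y_t = 2.0 + 3.0 * x_t[:, 0]   # true effect of 2.0 by construction
reg1, reg2 = paired_regression_estimates(y_t, y_c, x_t, x_c)
print(reg1, reg2)
```

Because the synthetic outcomes are exactly linear in the covariate, both intercepts recover the constructed effect; with real data the two regressions generally differ, which is the comparison reported in Table 2.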

Both P-LOOP and the methods of [6] outperform the unadjusted estimator in terms of nominal variance. It is not clear ahead of time which regression method will perform better. Regression 1 outperforms Regression 2 when the pretest score is the only covariate, but Regression 2 outperforms Regression 1 when the school type is included. Note that the regression methods always account for the pair assignments. For the P-LOOP estimator, we see that it is better to impute the potential outcomes separately, and that the interpolation method imputes values closer to the potential outcomes imputation. With the interpolation method, we do not lose out on the precision gains from ignoring the pairs in our imputation, but we are still protected against a potential Simpson’s paradox.

6 Discussion

In paired experiments, the design of the experiment helps to enforce covariate balance between the treatment and control groups. While this design is often effective, it can be useful to make covariate adjustments to further improve precision. Covariate adjustments in paired experiments share many of the issues that arise in completely randomized experiments; for example, it can be unclear ahead of time which covariates to use. An issue unique to paired experiments is the pair inclusion trade-off, so we must take particular care when making adjustments in paired experiments. Failing to account for the pair assignments can harm performance (for example, when a Simpson’s paradox occurs), while including the paired structure when the pair assignments are not predictive can needlessly inflate variance.

We present a design-based method for paired experiments, the P-LOOP estimator. To the best of our knowledge, this method is the first to directly address the pair inclusion trade-off. Generally, other methods account for the pairing, which protects against Simpson’s paradox and other situations where the within-pair trends differ from the overall trend. In contrast, our method imputes two sets of potential outcomes, one excluding and one including the pair assignments, and automatically interpolates between the two. As we see in the Texas Schools data, this allows for improved precision. The P-LOOP estimator is also the first method for paired experiments that involves sample splitting and the use of machine learning methods to impute potential outcomes, building on the flexible approaches used in completely randomized experiments ([1], [20], [4], [22], [18], and [15]). This flexibility can be beneficial in several ways, such as allowing for automatic variable selection or high-dimensional covariates.

Finally, logical extensions to the P-LOOP estimator include block randomized experiments and experiments with multiple treatments. As with paired experiments, it can be unclear whether to include the block assignments when making covariate adjustments. However, while paired experiments can be treated essentially as Bernoulli randomized experiments, this is not the case for blocked experiments, and the variance estimation procedure outlined in this paper would need to be modified.

7 Acknowledgments

We would like to thank Zhenke Wu for helpful comments and suggestions. We would also like to thank John Pane and Adam Sales for providing the data set used in Section 5.2.

Appendix A True Variance of the P-LOOP Estimator

First, we calculate the variance of a single $\hat{\tau}_{i}$ :

[TABLE]

Let $\gamma_{ij}=\text{Cov}(\hat{\tau}_{i},\hat{\tau}_{j})$ . Then we have the following expression for the variance of the P-LOOP estimator:

[TABLE]

Appendix B Handling the Covariance Terms $\gamma_{ij}$

In this section, we further discuss the covariance terms $\gamma_{ij}$ . First define $U_{i}=2T_{i}-1$ ; that is, $U_{i}=1$ if $T_{i}=1$ , and $U_{i}=-1$ if $T_{i}=0$ . Note that $U_{i}$ has expectation 0. Then we can rewrite $\hat{\tau}_{i}$ as $W_{i}-\hat{d}_{i}U_{i}$ . We have the following expression for $\gamma_{ij}$ :

[TABLE]

The first term is zero, as $W_{i}$ and $W_{j}$ are independent due to the independence of $T_{i}$ and $T_{j}$ . The second and third terms are also zero. Note that $U_{j}$ is independent of $W_{i}$ due to the independence of $T_{i}$ and $T_{j}$ , and recall that $\hat{d}_{j}$ is independent of $T_{j}$ (and therefore $U_{j}$ ). Then we have for the second term

[TABLE]

where the last line follows because $\text{E}(U_{j})=0$ . We therefore have

[TABLE]
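The argument above can be checked numerically on a toy example by enumerating every assignment. The sketch below is only an illustration under explicit assumptions: it takes $W_{i}$ to be the observed treated minus control difference (equal to $a_{i}$ when $T_{i}=1$ and $b_{i}$ when $T_{i}=0$), takes the pair-level effect to be $(a_{i}+b_{i})/2$, and uses a leave-one-pair-out mean of $W_{j}U_{j}$ as a stand-in for $\hat{d}_{i}$. That imputation is a hypothetical placeholder, not the paper's method; any $\hat{d}_{i}$ that does not depend on $T_{i}$ would serve. The potential differences `a` and `b` are made-up numbers.

```python
import itertools
import numpy as np

def enumerate_ploop(a, b):
    """Evaluate tau_hat_i = W_i - d_hat_i * U_i over all 2^N assignments."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = len(a)
    tau_hats, d_hat_u = [], []
    for assignment in itertools.product([0, 1], repeat=n):
        T = np.array(assignment)
        U = 2 * T - 1
        W = np.where(T == 1, a, b)   # observed treated minus control difference
        S = W * U                    # a_i if T_i = 1, -b_i if T_i = 0
        # Leave-one-pair-out mean: uses only pairs j != i, so it ignores T_i.
        d_hat = (S.sum() - S) / (n - 1)
        tau_hats.append(W - d_hat * U)
        d_hat_u.append(d_hat * U)
    return np.array(tau_hats), np.array(d_hat_u)

# Hypothetical potential differences for five pairs.
a = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
b = np.array([2.0, 0.5, 1.0, 2.5, 3.0])
tau_hats, d_hat_u = enumerate_ploop(a, b)

# Each assignment has probability 2^{-N}, so plain averages are exact expectations.
unbiased = np.allclose(tau_hats.mean(axis=0), (a + b) / 2)
cov_tau = np.cov(tau_hats, rowvar=False, bias=True)
cov_du = np.cov(d_hat_u, rowvar=False, bias=True)
off_diag = ~np.eye(len(a), dtype=bool)
cov_match = np.allclose(cov_tau[off_diag], cov_du[off_diag])
print(unbiased, cov_match)
```

Exhaustive enumeration is feasible only for very small $N$, but it gives exact expectations and so separates the algebraic identity from Monte Carlo error.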

Using a procedure outlined by [22], we can obtain an unbiased estimate of $\gamma_{ij}$ . This procedure involves leaving out two pairs (rather than a single pair) at a time and is therefore computationally expensive. However, estimating these terms is often unnecessary, as they are generally negligible by an argument identical to one presented in [22]. Under the setup and notation of Section 2.1 for Bernoulli randomized experiments, define an estimate of the individual treatment effect for observation $i$ as:

[TABLE]

Note that we use $\hat{\delta}_{i}$ rather than $\hat{\tau}_{i}$ to avoid confusion with the notation for paired experiments. [22] show that for any observations $i$ and $j$ , the covariance of these individual treatment effect estimates is

[TABLE]

and note that these terms are generally negligible in the sense that for many estimators of practical interest, $\text{Cov}(\hat{m}_{i}U_{i},\hat{m}_{j}U_{j})$ goes to zero faster than $1/N$ .

Appendix C Bound on the Mean Squared Error of $\hat{d}_{i}$

We bound the term $\frac{1}{N^{2}}\sum_{i=1}^{N}\text{MSE}(\hat{d}_{i})$ . We can express the mean squared error of $\hat{d}_{i}$ in terms of the mean squared errors of $\hat{a}_{i}$ and $\hat{b}_{i}$ :

[TABLE]

This last inequality follows from the fact that $-\text{Bias}(\hat{a}_{i})\text{Bias}(\hat{b}_{i})-\text{Cov}(\hat{a}_{i},\hat{b}_{i})\leq\sqrt{\text{MSE}(\hat{a}_{i})\text{MSE}(\hat{b}_{i})}$ . We then have the following bound:

[TABLE]

where we define

[TABLE]
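The inequality invoked above follows from the Cauchy–Schwarz inequality. A brief sketch, treating $a_{i}$ and $b_{i}$ as fixed quantities as in the design-based setup:

```latex
\begin{align*}
\text{Bias}(\hat{a}_{i})\,\text{Bias}(\hat{b}_{i})+\text{Cov}(\hat{a}_{i},\hat{b}_{i})
  &= \text{E}\big[(\hat{a}_{i}-a_{i})(\hat{b}_{i}-b_{i})\big], \\
\big|\text{E}\big[(\hat{a}_{i}-a_{i})(\hat{b}_{i}-b_{i})\big]\big|
  &\leq \sqrt{\text{E}\big[(\hat{a}_{i}-a_{i})^{2}\big]\,\text{E}\big[(\hat{b}_{i}-b_{i})^{2}\big]}
   = \sqrt{\text{MSE}(\hat{a}_{i})\,\text{MSE}(\hat{b}_{i})}.
\end{align*}
```

Since $-x\leq|x|$ for any real $x$ , the negative of the left-hand side of the first line is bounded by the same square root.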

Bibliography

[1] Peter M Aronow and Joel A Middleton “A class of unbiased estimators of the average treatment effect in randomized experiments” In Journal of Causal Inference 1.1, 2013, pp. 135–154
[2] Laura B Balzer, Mark J van der Laan, Maya L Petersen and the SEARCH Collaboration “Adaptive pre-specification in randomized trials with and without pair-matching” In Statistics in Medicine 35.25 Wiley Online Library, 2016, pp. 4528–4545
[3] Laura B Balzer, Maya L Petersen, Mark J van der Laan and the SEARCH Collaboration “Targeted estimation and inference for the sample average treatment effect in trials with and without pair-matching” In Statistics in Medicine 35.21 Wiley Online Library, 2016, pp. 3717–3732
[4] Victor Chernozhukov et al. “Double/debiased machine learning for treatment and structural parameters” In The Econometrics Journal 21.1 Wiley Online Library, 2018, pp. C1–C68
[5] Philip Dixon “Should blocks be fixed or random?” In Conference on Applied Statistics in Agriculture, 2016
[6] Colin B. Fogarty “Regression-assisted inference for the average treatment effect in paired experiments” In Biometrika 105.4 Oxford University Press, 2018, pp. 994–1000
[7] David A Freedman “On regression adjustments to experimental data” In Advances in Applied Mathematics 40.2 Elsevier, 2008, pp. 180–193
[8] Kosuke Imai, Gary King and Clayton Nall “Rejoinder: Matched pairs and the future of cluster-randomized experiments” In Statistical Science 24.1 JSTOR, 2009, pp. 65–72
