On randomization-based causal inference for matched-pair factorial designs
Jiannan Lu, Alex Deng

TL;DR
This paper introduces matched-pair factorial designs within the potential outcomes framework, proposing estimators for factorial effects and their covariance, enhancing causal inference methods for complex experimental setups.
Contribution
It develops a new matched-pair design framework and provides estimators for factorial effects and their covariance matrices under randomization-based causal inference.
Findings
Derived the matched-pair estimator for factorial effects
Calculated the covariance matrix of the estimator
Provided a Neymanian estimator for the covariance matrix
Abstract
Under the potential outcomes framework, we introduce matched-pair factorial designs, and propose the matched-pair estimator of the factorial effects. We also calculate the randomization-based covariance matrix of the matched-pair estimator, and provide the "Neymanian" estimator of the covariance matrix.
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
Jiannan Lu (address for correspondence: One Microsoft Way, Redmond, Washington 98052-6399, U.S.A.; email: [email protected]) and Alex Deng
Analysis and Experimentation, Microsoft Corporation
Keywords: Experimental design; factorial effect; precision; potential outcome.
Introduction
Randomization is widely regarded as the gold standard of causal inference (Rubin, 2008). Under the potential outcomes framework (Neyman, 1923; Rubin, 1974), for a two-level factor, we define the causal effect as the linear contrast of the potential outcomes under treatment and control. To investigate multiple factors simultaneously, factorial designs (Fisher, 1935; Yates, 1937) can be employed. Randomization-based causal inference for factorial designs has deep roots in the experimental design literature (e.g., Kempthorne, 1952), and was recently presented using the language of potential outcomes (Dasgupta et al., 2015; Mukerjee et al., 2016).
Pair-matching (Cochran, 1953), as a special form of stratification, has been widely adopted by researchers and practitioners (e.g., Grossarth-Maticek and Ziegler, 2008). For treatment-control studies (i.e., $2^1$ factorial designs), pair-matching has been extensively investigated by the causal inference community (Rosenbaum, 2002; Imai, 2008; Imai et al., 2009; Ding, 2016; Fogarty, 2016a; Fogarty, 2016b). Unfortunately, a similar discussion appears to be missing for general factorial designs. In this paper, we fill this theoretical gap by extending the analysis of Imai (2008) to matched-pair factorial designs. We restrict the experimental units to be a fixed finite population, for two reasons. First, as shown in Imai (2008), it is straightforward to generalize the finite-population analyses to infinite populations. Second, for some practical examples, it might be unreasonable to view the experimental units as a random sample from an infinite population.
The paper proceeds as follows. Section 2 reviews the randomization-based causal inference framework for completely randomized factorial designs. Section 3 introduces matched-pair factorial designs, proposes the matched-pair estimator for the factorial effects, calculates its covariance matrix and the corresponding estimator. Section 4 briefly discusses the precision gains by pair-matching in factorial designs, and concludes.
Causal inference for completely randomized factorial designs
To ensure self-containment, we first review the randomization-based causal inference framework for completely randomized factorial designs. Although most of the material is adapted from Dasgupta et al. (2015) and Lu (2016a, 2016b), some of it is refined for better clarity. For more detailed discussions on factorial designs, see, e.g., Wu and Hamada (2009).
2.1 Factorial designs
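A $2^K$ factorial design consists of $K$ two-level (coded $-1$ and $+1$) factors. We represent it by the corresponding model matrix (Wu and Hamada, 2009), a $J \times J$ matrix $H = (h_0, h_1, \ldots, h_{J-1})$ with $J = 2^K$, that can be constructed as follows:
1. Let $h_0 = 1_J$, the $J$-vector of ones;
2. For $k = 1, \ldots, K$, construct $h_k$ by letting its first $2^{K-k}$ entries be $-1$, the next $2^{K-k}$ entries be $+1$, and repeating $2^{k-1}$ times;
3. If $K \ge 2$, order all subsets of $\{1, \ldots, K\}$ with at least two elements, first by cardinality and then by lexicography. For $k = 1, \ldots, J - 1 - K$, let $\sigma_k$ be the $k$th subset and $h_{K+k} = \prod_{l \in \sigma_k} h_l$, where “$\prod$” stands for entry-wise product.
The use of the constructed $H$ is two-fold:
1. $h_0$ corresponds to the null effect; $h_1$ to $h_K$ correspond to the main effects of the $K$ factors; $h_{K+1}$ to $h_{K + K(K-1)/2}$ correspond to the two-way interactions; $\ldots$; $h_{J-1}$ corresponds to the $K$-way interaction;
2. The $l$th row of $(h_1, \ldots, h_K)$ corresponds to the $l$th treatment combination $z_l$.
For $l = 1, \ldots, J$, let $\lambda_l$ denote the $l$th row of $H$.
Example 1.
For $2^2$ factorial designs, the model matrix is:
$$
H = \begin{pmatrix}
+1 & -1 & -1 & +1 \\
+1 & -1 & +1 & -1 \\
+1 & +1 & -1 & -1 \\
+1 & +1 & +1 & +1
\end{pmatrix},
$$
with columns $h_0, h_1, h_2, h_3$ and rows $\lambda_1, \ldots, \lambda_4$.
The four treatment combinations are $z_1 = (-1, -1)$, $z_2 = (-1, +1)$, $z_3 = (+1, -1)$ and $z_4 = (+1, +1)$. We represent the main effects of factors 1 and 2 by $h_1$ and $h_2$ respectively, and the two-way interaction by $h_3$.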
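The three construction steps can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `model_matrix` is our own, and it assumes the levels are coded $-1$ and $+1$ and that interaction columns are ordered by subset cardinality and then lexicographically.

```python
from itertools import combinations


def model_matrix(K):
    """Return the 2^K x 2^K model matrix as a list of rows with entries -1/+1."""
    J = 2 ** K
    cols = [[1] * J]                          # column 0: all ones (null effect)
    for k in range(1, K + 1):                 # columns 1..K: main effects
        block = 2 ** (K - k)
        pattern = [-1] * block + [1] * block
        cols.append(pattern * (2 ** (k - 1)))
    # Interaction columns: subsets of {1..K} with at least two elements,
    # ordered by cardinality and then lexicographically.
    for size in range(2, K + 1):
        for subset in combinations(range(1, K + 1), size):
            col = [1] * J
            for k in subset:                  # entry-wise product of main-effect columns
                col = [a * b for a, b in zip(col, cols[k])]
            cols.append(col)
    # Transpose the list of columns into a list of rows.
    return [[cols[c][row] for c in range(J)] for row in range(J)]


H2 = model_matrix(2)
for row in H2:
    print(row)
```

For `K = 2` the rows printed match the $2^2$ model matrix of Example 1, and columns 1 and 2 of each row give the corresponding treatment combination.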
2.2 Randomization-based causal inference
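We consider a $2^K$ factorial design with $N = 2^K r$ units. By invoking the Stable Unit Treatment Value Assumption (Rubin, 1980), for $i = 1, \ldots, N$ and $l = 1, \ldots, J$, let the potential outcome of unit $i$ under $z_l$ be $Y_i(z_l)$, the average potential outcome for $z_l$ be $\bar{Y}(z_l) = N^{-1} \sum_{i=1}^N Y_i(z_l)$, and $Y_i = \{Y_i(z_1), \ldots, Y_i(z_J)\}'$, $\bar{Y} = \{\bar{Y}(z_1), \ldots, \bar{Y}(z_J)\}'$. Define the individual and population-level factorial effect vectors as
$$
\tau_i = \frac{1}{2^{K-1}} H' Y_i \quad (i = 1, \ldots, N), \qquad \tau = \frac{1}{2^{K-1}} H' \bar{Y},
$$
respectively. Our interest lies in $\tau$.
We denote the treatment assignment mechanism by
$$
W_i(z_l) = \begin{cases} 1, & \text{if unit } i \text{ is assigned to treatment } z_l, \\ 0, & \text{otherwise.} \end{cases}
$$
We impose the following restrictions on the treatment assignment mechanism:
$$
\sum_{l=1}^J W_i(z_l) = 1 \quad (i = 1, \ldots, N), \qquad \sum_{i=1}^N W_i(z_l) = r \quad (l = 1, \ldots, J).
$$
In other words, we assign $r$ units to each treatment, and one treatment to each unit. Therefore, the observed outcome of unit $i$ is $Y_i^{\mathrm{obs}} = \sum_{l=1}^J W_i(z_l) Y_i(z_l)$, and the average observed outcome for treatment $z_l$ is $\bar{Y}^{\mathrm{obs}}(z_l) = r^{-1} \sum_{i=1}^N W_i(z_l) Y_i(z_l)$. Under complete randomization, Dasgupta et al. (2015) estimated $\tau$ by
$$
\hat{\tau}_C = \frac{1}{2^{K-1}} H' \bar{Y}^{\mathrm{obs}}, \qquad \bar{Y}^{\mathrm{obs}} = \{\bar{Y}^{\mathrm{obs}}(z_1), \ldots, \bar{Y}^{\mathrm{obs}}(z_J)\}'.
$$
The sole source of randomness of $\hat{\tau}_C$ is the treatment assignment. Dasgupta et al. (2015) and Lu (2016b) derived the covariance matrix of this estimator, and the “Neymanian” estimator of the covariance matrix. We summarize their main results in the following lemma.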
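Lemma 1.
The complete-randomization estimator $\hat{\tau}_C$ is unbiased, and its covariance matrix is
$$
\mathrm{Cov}(\hat{\tau}_C) = \frac{1}{2^{2(K-1)} r} \sum_{l=1}^J S^2(z_l) \lambda_l' \lambda_l - \frac{1}{N(N-1)} \sum_{i=1}^N (\tau_i - \tau)(\tau_i - \tau)', \tag{2}
$$
where $S^2(z_l) = (N-1)^{-1} \sum_{i=1}^N \{Y_i(z_l) - \bar{Y}(z_l)\}^2$ is the variance of the potential outcomes under treatment $z_l$ and $\lambda_l$ is the $l$th row of the model matrix.
Moreover, the “Neymanian” estimator of the covariance matrix is
$$
\widehat{\mathrm{Cov}}(\hat{\tau}_C) = \frac{1}{2^{2(K-1)} r} \sum_{l=1}^J s^2(z_l) \lambda_l' \lambda_l,
$$
where $s^2(z_l)$ is the sample variance (with divisor $r - 1$) of the observed outcomes of the $r$ units assigned to $z_l$,
whose bias is $\{N(N-1)\}^{-1} \sum_{i=1}^N (\tau_i - \tau)(\tau_i - \tau)'$.
The covariance matrix estimator is “conservative,” because its diagonal entries, i.e., the variance estimators of the components of $\hat{\tau}_C$, have non-negative biases.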
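Lemma 1 can be checked numerically on a tiny example by enumerating every possible assignment. The sketch below does this for a $2^1$ design with $N = 4$ units and $r = 2$, using exact rational arithmetic. The potential outcomes are made up for illustration, and the covariance formula coded here is the standard Neyman-type expression (arm variances weighted by the outer products of the model-matrix rows, minus the scaled covariance of the unit-level effects), which we assume is the form intended by the lemma.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical potential outcomes, one row per unit: (Y_i(-1), Y_i(+1)).
Y = [(F(1), F(4)), (F(2), F(3)), (F(5), F(9)), (F(6), F(7))]
N, r = 4, 2
LAMBDA = [(1, -1), (1, 1)]        # rows of the 2^1 model matrix


def effects(y):
    # 2^{-(K-1)} H' y with K = 1: (sum, difference) of the two arm values.
    return (y[0] + y[1], y[1] - y[0])


def outer(u, v):
    # Flattened 2x2 outer product u v'.
    return tuple(a * b for a in u for b in v)


def add(*mats):
    return tuple(sum(entries) for entries in zip(*mats))


def scale(c, mat):
    return tuple(c * x for x in mat)


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def var(xs):
    xs = list(xs)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def centered_outer(x, center):
    d = tuple(a - b for a, b in zip(x, center))
    return outer(d, d)


def arm_weighted(variances):
    # (1 / r) * sum_l variances[l] * lambda_l' lambda_l   (2^{2(K-1)} = 1 here)
    return scale(F(1, r), add(*(scale(v, outer(l, l)) for v, l in zip(variances, LAMBDA))))


tau = effects([mean(y[l] for y in Y) for l in range(2)])
tau_i = [effects(y) for y in Y]
bias_formula = scale(F(1, N * (N - 1)), add(*(centered_outer(t, tau) for t in tau_i)))
cov_formula = add(arm_weighted([var(y[l] for y in Y) for l in range(2)]),
                  scale(-1, bias_formula))

estimates, cov_hats = [], []
for plus in combinations(range(N), r):            # units assigned to level +1
    minus = [i for i in range(N) if i not in plus]
    obs = [[Y[i][0] for i in minus], [Y[i][1] for i in plus]]
    estimates.append(effects([mean(o) for o in obs]))
    cov_hats.append(arm_weighted([var(o) for o in obs]))

n_assign = len(estimates)
exact_mean = tuple(mean(e[c] for e in estimates) for c in range(2))
exact_cov = scale(F(1, n_assign), add(*(centered_outer(e, tau) for e in estimates)))
mean_cov_hat = scale(F(1, n_assign), add(*cov_hats))

print(exact_mean == tau)                                          # unbiasedness
print(exact_cov == cov_formula)                                   # covariance formula
print(add(mean_cov_hat, scale(-1, exact_cov)) == bias_formula)    # bias of the estimator
```

Because all six assignments are equally likely, averaging over them gives exact expectations, so the three comparisons are exact identities rather than Monte Carlo approximations.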
Causal inference for matched-pair randomized factorial designs
3.1 Matched-pair designs and causal parameters
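As pointed out by Imai (2008), the key idea behind matched-pair designs is that “experimental units are paired based on their pre-treatment characteristics and the randomization of treatment is subsequently conducted within each matched pair.” To apply this idea to $2^K$ factorial designs, we group the $N = 2^K r$ experimental units into $r$ “pairs” of $2^K$ units, and within each pair randomly assign one unit to each treatment. Let $\psi_j$ be the set of indices of the units in pair $j$, such that
$$
|\psi_1| = \cdots = |\psi_r| = 2^K, \qquad \psi_j \cap \psi_{j'} = \emptyset \quad (j \ne j'), \qquad \bigcup_{j=1}^r \psi_j = \{1, \ldots, N\}.
$$
For pair $j$, denote the average outcomes for treatment $z_l$ as $\bar{Y}_{j\cdot}(z_l) = 2^{-K} \sum_{i \in \psi_j} Y_i(z_l)$ and $\bar{Y}_{j\cdot} = \{\bar{Y}_{j\cdot}(z_1), \ldots, \bar{Y}_{j\cdot}(z_J)\}'$, and the factorial effect vector as $\tau_{j\cdot} = 2^{-(K-1)} H' \bar{Y}_{j\cdot}$. It is apparent that
$$
\tau = \frac{1}{r} \sum_{j=1}^r \tau_{j\cdot}. \tag{3}
$$
Within each pair, we randomly assign one unit to each treatment. Let the observed outcome of treatment $z_l$ in pair $j$ be $Y_j^{\mathrm{obs}}(z_l)$ and $Y_j^{\mathrm{obs}} = \{Y_j^{\mathrm{obs}}(z_1), \ldots, Y_j^{\mathrm{obs}}(z_J)\}'$. We estimate $\tau_{j\cdot}$ by $\hat{\tau}_{j\cdot} = 2^{-(K-1)} H' Y_j^{\mathrm{obs}}$. The matched-pair estimator for $\tau$ is
$$
\hat{\tau}_M = \frac{1}{r} \sum_{j=1}^r \hat{\tau}_{j\cdot}.
$$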
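As a concrete illustration of the matched-pair estimator, the sketch below computes per-pair effect estimates and their average for a $2^2$ design. It assumes each pair's estimate is the model matrix transposed, applied to that pair's observed outcomes and divided by $2^{K-1}$; the function names and the observed outcomes are hypothetical.

```python
from fractions import Fraction

# Model matrix of a 2^2 design; rows follow the treatment order
# (-1,-1), (-1,+1), (+1,-1), (+1,+1).
H = [[1, -1, -1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, 1, 1, 1]]
K = 2


def pair_effects(y_obs):
    """Effect estimates for one pair: H' y_obs / 2^(K-1)."""
    J = len(H)
    return [Fraction(sum(H[l][f] * y_obs[l] for l in range(J))) / 2 ** (K - 1)
            for f in range(J)]


def matched_pair_estimate(pairs_obs):
    """Average the per-pair effect estimates over all pairs."""
    per_pair = [pair_effects(y) for y in pairs_obs]
    r = len(per_pair)
    return [sum(col) / r for col in zip(*per_pair)]


# Hypothetical data: 3 pairs, one observed outcome per treatment in each pair.
pairs_obs = [[10, 12, 15, 19], [8, 9, 13, 14], [11, 14, 15, 20]]
tau_hat_M = matched_pair_estimate(pairs_obs)
print(tau_hat_M)
```

The first component corresponds to the null-effect column (twice the grand mean), the next two to the main effects, and the last to the two-way interaction.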
3.2 Randomization-based inference
We now present the main results of this paper.
Proposition 1.
The matched-pair estimator $\hat{\tau}_M$ is an unbiased estimator of $\tau$, and its covariance matrix is
[TABLE]
where
[TABLE]
and
[TABLE]
Proof.
To prove the first part, note that the pair-level estimator $\hat{\tau}_{j\cdot}$ is an unbiased estimator of the pair-level effect $\tau_{j\cdot}$ for $j = 1, \ldots, r$. This fact combined with (3) completes the proof.
To prove the second part, consider the treatment assignment within each pair $j$. By definition, these assignments are independently and identically distributed across pairs, implying the (joint) independence of the $\hat{\tau}_{j\cdot}$’s. Consequently, we can treat each pair as a completely randomized $2^K$ factorial design with $2^K$ units. Therefore by Lemma 1,
[TABLE]
This implies that
[TABLE]
To prove the equivalence between (4) and (5), simply note that
[TABLE]
and
[TABLE]
The proof is complete. ∎
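For readers who want the intermediate step written out, the following is a sketch under our own notation, and it need not match the exact form of (4) or (5). We assume $r$ pairs of $2^K$ units with index sets $\psi_j$, write $\lambda_l$ for the $l$th row of the model matrix, $\tau_i$ and $\tau_{j\cdot}$ for the unit-level and pair-level effect vectors, and $S_j^2(z_l)$ for the within-pair variance of the potential outcomes under treatment $z_l$ (divisor $2^K - 1$). Applying the complete-randomization covariance formula to a single pair, with one unit per treatment, and then using independence across pairs gives:

```latex
\begin{align*}
\mathrm{Cov}(\hat{\tau}_{j\cdot})
  &= \frac{1}{2^{2(K-1)}} \sum_{l=1}^{J} S_j^2(z_l)\, \lambda_l' \lambda_l
   - \frac{1}{2^K (2^K - 1)} \sum_{i \in \psi_j}
     (\tau_i - \tau_{j\cdot})(\tau_i - \tau_{j\cdot})', \\
\mathrm{Cov}(\hat{\tau}_M)
  &= \frac{1}{r^2} \sum_{j=1}^{r} \mathrm{Cov}(\hat{\tau}_{j\cdot}).
\end{align*}
```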
We discuss a special case before moving forward. When $K = 1$, we have the classic treatment-control studies, in which the two levels of the single factor are the treatment and the control. We are interested in the difference-in-means estimator
[TABLE]
Denote Imai, (2008) (p. 4861, Eq. (8)) derived the variance of as
[TABLE]
As a validity check, Proposition 1 reduces to (6) when $K = 1$. We leave the proof to the readers.
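A short worked calculation shows what the treatment-control case looks like for a single pair. It is a sketch in our own notation, not necessarily the form in which Imai (2008) states the result. Suppose pair $j$ consists of units $a$ and $b$, with potential outcomes $Y_a(\pm 1)$ and $Y_b(\pm 1)$, and write $\hat{\delta}_j$ for the within-pair treatment-minus-control difference. Each unit is treated with probability $1/2$, so $\hat{\delta}_j$ takes two equally likely values, and the variance of such a variable is the squared half-distance between them:

```latex
\begin{align*}
\hat{\delta}_j &=
  \begin{cases}
    Y_a(+1) - Y_b(-1), & \text{with probability } 1/2, \\
    Y_b(+1) - Y_a(-1), & \text{with probability } 1/2,
  \end{cases} \\
\mathrm{Var}(\hat{\delta}_j)
  &= \frac{1}{4} \Bigl[ \{Y_a(+1) + Y_a(-1)\} - \{Y_b(+1) + Y_b(-1)\} \Bigr]^2, \\
\mathrm{Var}\Bigl( \frac{1}{r} \sum_{j=1}^{r} \hat{\delta}_j \Bigr)
  &= \frac{1}{r^2} \sum_{j=1}^{r} \mathrm{Var}(\hat{\delta}_j).
\end{align*}
```

The variance within a pair vanishes exactly when the two units have the same sum of potential outcomes, which is one way to see why matching on good predictors of the outcome improves precision.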
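We discuss the estimation of $\mathrm{Cov}(\hat{\tau}_M)$ separately, because Lemma 1 does not apply for matched-pair factorial designs: each pair has only one unit per treatment, so the within-treatment sample variances are undefined. Inspired by Imai (2008), we propose the following estimator:
$$
\widehat{\mathrm{Cov}}(\hat{\tau}_M) = \frac{1}{r(r-1)} \sum_{j=1}^r (\hat{\tau}_{j\cdot} - \hat{\tau}_M)(\hat{\tau}_{j\cdot} - \hat{\tau}_M)', \tag{7}
$$
where $\hat{\tau}_{j\cdot}$ is the effect estimate from pair $j$ and $r$ is the number of pairs.
Proposition 2.
The bias of the covariance estimator in (7) is
$$
E\{\widehat{\mathrm{Cov}}(\hat{\tau}_M)\} - \mathrm{Cov}(\hat{\tau}_M) = \frac{1}{r(r-1)} \sum_{j=1}^r (\tau_{j\cdot} - \tau)(\tau_{j\cdot} - \tau)'.
$$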
Proof.
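The proof is a basic maneuver of the expectation and covariance operators. First, by (3) and the joint independence of the pair-level estimators $\hat{\tau}_{j\cdot}$,
$$
E\left\{ \sum_{j=1}^r (\hat{\tau}_{j\cdot} - \hat{\tau}_M)(\hat{\tau}_{j\cdot} - \hat{\tau}_M)' \right\} = r(r-1)\, \mathrm{Cov}(\hat{\tau}_M) + \sum_{j=1}^r (\tau_{j\cdot} - \tau)(\tau_{j\cdot} - \tau)'.
$$
Therefore by (7),
$$
E\{\widehat{\mathrm{Cov}}(\hat{\tau}_M)\} - \mathrm{Cov}(\hat{\tau}_M) = \frac{1}{r(r-1)} \sum_{j=1}^r (\tau_{j\cdot} - \tau)(\tau_{j\cdot} - \tau)'.
$$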
∎
Proposition 2 implies that the estimator of $\mathrm{Cov}(\hat{\tau}_M)$ is also “conservative.” We leave it to the readers to prove that for treatment-control studies, Proposition 2 reduces to the corresponding results in Imai (2008, p. 4862, Prop. 2, Part 1).
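A covariance estimator of this kind is simple to compute from the per-pair effect estimates. The sketch below assumes the estimator is the sample covariance of the per-pair estimates divided by the number of pairs, which is the natural multivariate analogue of the matched-pair variance estimator of Imai (2008); the function name `neymanian_cov` and the input values are hypothetical.

```python
from fractions import Fraction


def neymanian_cov(per_pair):
    """Return sum_j (t_j - t_bar)(t_j - t_bar)' / (r (r - 1)) as a nested list."""
    r = len(per_pair)
    if r < 2:
        raise ValueError("at least two pairs are needed")
    dim = len(per_pair[0])
    t_bar = [sum(Fraction(t[f]) for t in per_pair) / r for f in range(dim)]
    return [[sum((Fraction(t[a]) - t_bar[a]) * (Fraction(t[b]) - t_bar[b])
                 for t in per_pair) / (r * (r - 1))
             for b in range(dim)]
            for a in range(dim)]


# Hypothetical per-pair estimates of two effect components from three pairs.
per_pair = [[28, 6], [22, 5], [30, 5]]
cov_hat = neymanian_cov(per_pair)
print(cov_hat)
```

The diagonal entries are the variance estimates of the individual components; since the bias is a sum of outer products, those diagonal entries overestimate the true variances on average unless every pair has the same effect vector.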
Discussions and concluding remarks
For treatment-control studies, Imai (2008) compared the variance formulas for the complete-randomization and matched-pair estimators, and derived the condition under which pair-matching leads to precision gains. For general factorial designs, analogous comparisons can be made between (2) and (4). However, to the best of our knowledge, intuitive closed-form expressions might not be available without additional assumptions on the potential outcomes.
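To make the comparison concrete in the simplest setting, the sketch below enumerates all assignments for a treatment-control study with four units and computes the exact variance of the difference-in-means estimator under complete randomization and under two different pairings. The potential outcomes and function names are hypothetical, chosen only to show that the gain from pair-matching depends on how the pairs are formed.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical potential outcomes, one row per unit: (Y_i(-1), Y_i(+1)).
Y = [(1, 4), (2, 3), (5, 9), (6, 7)]


def exact_variance(estimates):
    """Variance of an estimator whose listed values are equally likely."""
    estimates = [Fraction(e) for e in estimates]
    m = sum(estimates) / len(estimates)
    return sum((e - m) ** 2 for e in estimates) / len(estimates)


def complete_randomization_variance(Y):
    n = len(Y)
    half = n // 2
    ests = []
    for plus in combinations(range(n), half):       # units receiving +1
        minus = [i for i in range(n) if i not in plus]
        ests.append(Fraction(sum(Y[i][1] for i in plus), half)
                    - Fraction(sum(Y[i][0] for i in minus), half))
    return exact_variance(ests)


def matched_pair_variance(Y, pairs):
    ests = []
    # choice[j] selects which member of pair j receives +1.
    for choice in product((0, 1), repeat=len(pairs)):
        diffs = [Y[p[c]][1] - Y[p[1 - c]][0] for p, c in zip(pairs, choice)]
        ests.append(Fraction(sum(diffs), len(pairs)))
    return exact_variance(ests)


var_cr = complete_randomization_variance(Y)
var_good = matched_pair_variance(Y, [(0, 1), (2, 3)])   # similar units paired
var_bad = matched_pair_variance(Y, [(0, 2), (1, 3)])    # dissimilar units paired
print(var_cr, var_good, var_bad)
```

With these made-up outcomes, pairing similar units gives a much smaller variance than complete randomization, while pairing dissimilar units gives a larger one.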
There are multiple future directions based on our current work. First, we may compare the precision of the complete-randomization and matched-pair estimators under certain mild restrictions on the potential outcomes. Second, it is possible to unify the randomization-based and regression-based inference frameworks, as pointed out by Samii and Aronow (2012) and Lu (2016b). Third, additional pre-treatment covariates may shed light on the pair-matching mechanism, and help sharpen our current analysis.
Acknowledgements
The first author thanks Professor Tirthankar Dasgupta at Rutgers University and Professor Peng Ding at the University of California at Berkeley for his early education in causal inference and experimental design. We thank the Co-Editor-in-Chief and an anonymous reviewer for their thoughtful comments, which have substantially improved the presentation of this paper.
References
- Cochran, W. G. (1953). Matching in analytical studies. American Journal of Public Health, 43:684–691.
- Dasgupta, T., Pillai, N., and Rubin, D. B. (2015). Causal inference from $2^k$ factorial designs using the potential outcomes model. Journal of the Royal Statistical Society: Series B, 77:727–753.
- Ding, P. (2016). A paradox from randomization-based causal inference (with discussion). Statistical Science, in press.
- Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver and Boyd.
- Fogarty, C. B. (2016a). Regression assisted inference for the average treatment effect in paired experiments. arXiv:1612.05179.
- Fogarty, C. B. (2016b). Sensitivity analysis for the average treatment effect in paired observational studies. arXiv:1609.02112.
- Grossarth-Maticek, R. and Ziegler, R. (2008). Randomized and non-randomized prospective controlled cohort studies in matched pair design for the long-term therapy of corpus uteri cancer patients with a mistletoe preparation. European Journal of Medical Research, 13:107–120.
- Imai, K. (2008). Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Statistics in Medicine, 27:4857–4873.
