Instrumental variables as bias amplifiers with general outcome and confounding

Peng Ding; Tyler VanderWeele; James Robins

arXiv:1701.04177·math.ST·January 17, 2017

TL;DR

This paper develops a general theory showing that adjusting for an instrumental variable can amplify bias in causal effect estimates under a wide class of models, challenging the common practice of adjusting for all pretreatment covariates.

Contribution

It extends previous linear model results to a broad class of models with monotonicity assumptions, providing new insights into bias amplification with instrumental variables.

Findings

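1. Bias amplification occurs under a wide class of models satisfying monotonicity assumptions.

2. Instrumental variables can increase bias when adjusted for as covariates.

3. Under additive or multiplicative treatment models, the monotonicity assumptions can be read as the signs of the arrows of the causal diagram.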

Abstract

Drawing causal inference with observational studies is the central pillar of many disciplines. One sufficient condition for identifying the causal effect is that the treatment-outcome relationship is unconfounded conditional on the observed covariates. It is often believed that the more covariates we condition on, the more plausible this unconfoundedness assumption is. This belief has had a huge impact on practical causal inference, suggesting that we should adjust for all pretreatment covariates. However, when there is unmeasured confounding between the treatment and outcome, estimators adjusting for some pretreatment covariate might have greater bias than estimators without adjusting for this covariate. This kind of covariate is called a bias amplifier, and includes instrumental variables that are independent of the confounder, and affect the outcome only through the treatment.…

Tables

Table 1. Examples for the presence and absence of Z-Bias, in which $Z\sim$ Bernoulli$(0.5)$, $U\sim$ Bernoulli$(0.5)$, the conditional probability of the treatment $A$ is $p_{zu}=\textup{pr}(A=1\mid Z=z,U=u)$, and the conditional probability of the outcome $Y$ is $r_{au}=\textup{pr}(Y=1\mid A=a,U=u)$.

Case	$p_{11}$	$p_{10}$	$p_{01}$	$p_{00}$	$r_{11}$	$r_{10}$	$r_{01}$	$r_{00}$	${ACE}^{true}$	${ACE}^{unadj}$	${ACE}^{adj}$	Z-Bias
1	0.8	0.6	0.2	0.1	0.08	0.06	0.02	0.01	0.0550	0.0574	0.0584	YES
2	0.3	0.2	0.3	0.1	0.03	0.02	0.03	0.01	0.0050	0.0076	0.0077	YES
3	0.5	0.4	0.4	0.1	0.04	0.04	0.04	0.01	0.0150	0.0173	0.0172	NO

Table 2. The example from Wooldridge (2010).

	point estimate	standard error	lower confidence limit	upper confidence limit
${ACE}^{true}$	2.47	0.59	1.31	3.62
${ACE}^{unadj}$	1.77	0.07	1.64	1.90
${ACE}^{adj}$	1.76	0.07	1.64	1.89

Equations

$$\textsc{ACE}_{1}^{\textnormal{true}}=E\{Y(1)\mid A=1\}-E\{Y(0)\mid A=1\},$$

$$\textsc{ACE}_{0}^{\textnormal{true}}=E\{Y(1)\mid A=0\}-E\{Y(0)\mid A=0\},$$

$$\textsc{ACE}^{\textnormal{true}}=E\{Y(1)\}-E\{Y(0)\}.$$

$$\textsc{ACE}^{\textnormal{unadj}}=E(Y\mid A=1)-E(Y\mid A=0).$$

$$\textsc{ACE}_{1}^{\textnormal{adj}}=E(Y\mid A=1)-\int\mu_{0}(z)F(dz\mid A=1),$$

$$\textsc{ACE}_{0}^{\textnormal{adj}}=\int\mu_{1}(z)F(dz\mid A=0)-E(Y\mid A=0),$$

$$\textsc{ACE}^{\textnormal{adj}}=\int\mu_{1}(z)F(dz)-\int\mu_{0}(z)F(dz).$$

$$\begin{pmatrix}\textsc{ACE}_{1}^{\textnormal{adj}}\\ \textsc{ACE}_{0}^{\textnormal{adj}}\\ \textsc{ACE}^{\textnormal{adj}}\end{pmatrix}\geq\begin{pmatrix}\textsc{ACE}^{\textnormal{unadj}}\\ \textsc{ACE}^{\textnormal{unadj}}\\ \textsc{ACE}^{\textnormal{unadj}}\end{pmatrix}\geq\begin{pmatrix}\textsc{ACE}_{1}^{\textnormal{true}}\\ \textsc{ACE}_{0}^{\textnormal{true}}\\ \textsc{ACE}^{\textnormal{true}}\end{pmatrix}.$$

$$\frac{p_{11}p_{00}}{p_{10}p_{01}}\leq 1,\quad\frac{(1-p_{11})(1-p_{00})}{(1-p_{10})(1-p_{01})}\leq 1,$$

$$\textsc{ACE}_{1}^{\textnormal{adj}}=E(Y\mid A=1)-\int\nu_{0}(\pi)F(d\pi\mid A=1),$$

$$\textsc{ACE}_{0}^{\textnormal{adj}}=\int\nu_{1}(\pi)F(d\pi\mid A=0)-E(Y\mid A=0),$$

$$\textsc{ACE}^{\textnormal{adj}}=\int\nu_{1}(\pi)F(d\pi)-\int\nu_{0}(\pi)F(d\pi).$$

$$U=\{Y(1),Y(0)\}$$

$$\textsc{OR}_{Y}=\frac{\textup{pr}\{Y(1)=1,Y(0)=1\}\,\textup{pr}\{Y(1)=0,Y(0)=0\}}{\textup{pr}\{Y(1)=1,Y(0)=0\}\,\textup{pr}\{Y(1)=0,Y(0)=1\}}\geq 1.$$

$$\textup{pr}(A=1\mid\Pi,U)=\Pi+\delta[Y(1)-E\{Y(1)\}]+\eta[Y(0)-E\{Y(0)\}]+\theta[Y(1)Y(0)-E\{Y(1)Y(0)\}].$$

$$\textup{pr}(A=1\mid U,Z)=\alpha_{0}+\alpha_{1}U+\alpha_{2}Z,\quad\textup{pr}(Y=1\mid U,A)=\beta_{0}+\beta_{1}U+\beta_{2}A,$$

$$\textup{pr}(A=1\mid U,Z)=\alpha_{0}\alpha_{1}^{U}\alpha_{2}^{Z},\quad\textup{pr}(Y=1\mid U,A)=\beta_{0}\beta_{1}^{U}\beta_{2}^{A},$$

$$\frac{(1-p_{11})(1-p_{00})}{(1-p_{10})(1-p_{01})}=1+\frac{(1-p_{11})(1-p_{00})-(1-p_{10})(1-p_{01})}{(1-p_{10})(1-p_{01})}$$

$$\frac{\partial F(u\mid A=a,Z=z)}{\partial z}\geq 0,$$

$$p_{11}\geq\beta(z_{1})+\gamma(u),\quad p_{10}\geq\beta(z_{0})+\gamma(u),\quad p_{01}\leq\beta(z_{1})+\gamma(u),\quad p_{00}\leq\beta(z_{0})+\gamma(u),$$

$$p_{11}-p_{10}-p_{01}+p_{00}$$

Full text

Instrumental variables as bias amplifiers with general outcome and confounding

P. Ding

Department of Statistics, University of California, Berkeley, California, USA.

T. J. VanderWeele

J. M. Robins

Departments of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA.

Abstract

Drawing causal inference with observational studies is the central pillar of many disciplines. One sufficient condition for identifying the causal effect is that the treatment-outcome relationship is unconfounded conditional on the observed covariates. It is often believed that the more covariates we condition on, the more plausible this unconfoundedness assumption is. This belief has had a huge impact on practical causal inference, suggesting that we should adjust for all pretreatment covariates. However, when there is unmeasured confounding between the treatment and outcome, estimators adjusting for some pretreatment covariate might have greater bias than estimators without adjusting for this covariate. This kind of covariate is called a bias amplifier, and includes instrumental variables that are independent of the confounder, and affect the outcome only through the treatment. Previously, theoretical results for this phenomenon have been established only for linear models. We fill in this gap in the literature by providing a general theory, showing that this phenomenon happens under a wide class of models satisfying certain monotonicity assumptions. We further show that when the treatment follows an additive or multiplicative model conditional on the instrumental variable and the confounder, these monotonicity assumptions can be interpreted as the signs of the arrows of the causal diagrams.

keywords:

Causal inference; Directed acyclic graph; Interaction; Monotonicity; Potential outcome

1 Introduction

Causal inference from observational data is an important but challenging problem for empirical studies in many disciplines. Under the potential outcomes framework (Neyman, 1923[1990]; Rubin, 1974), the causal effects are defined as comparisons between the potential outcomes under treatment and control, averaged over a certain population of interest. One sufficient condition for nonparametric identification of the causal effects is the ignorability condition (Rosenbaum & Rubin, 1983), that the treatment is conditionally independent of the potential outcomes given those pretreatment covariates that confound the relationship between the treatment and outcome. To make this fundamental assumption as plausible as possible, many researchers suggest that the set of collected pretreatment covariates should be as rich as possible. It is often believed that “typically, the more conditional an assumption, the more generally acceptable it is” (Rubin, 2009), and therefore “in principle, there is little or no reason to avoid adjustment for a true covariate, a variable describing subjects before treatment” (Rosenbaum, 2002, p. 76).

Simply adjusting for all pretreatment covariates (d’Agostino, 1998; Rosenbaum, 2002; Hirano & Imbens, 2001), or the pretreatment criterion (VanderWeele & Shpitser, 2011), has a sound justification from the viewpoint of design and analysis of randomized experiments. Cochran (1965), citing Dorn (1953), suggested that the planner of an observational study should always ask himself the question, “How would the study be conducted if it were possible to do it by controlled experimentation?” Following this classical wisdom, Rubin (2007, 2008a, 2008b, 2009) argued that the design of observational studies should be in parallel with the design of randomized experiments, i.e., because we balance all pretreatment covariates in randomized experiments, we should also follow this pretreatment criterion and balance or adjust for all pretreatment covariates when designing observational studies.

However, this pretreatment criterion can result in increased bias under certain data generating processes. We highlight two important classes of such data generating processes for which the pretreatment criterion may be problematic. The first class is captured by an example of Greenland & Robins (1986), in which conditioning on a pretreatment covariate invalidates the ignorability assumption and thus a conditional analysis is biased; yet the ignorability assumption holds unconditionally, so an analysis that ignores the covariate is unbiased. Several researchers have shown that this phenomenon is generic when the data are generated under the causal diagram in Figure 1(a). In this diagram, the ignorability assumption holds unconditionally but not conditionally (Pearl, 2000; Spirtes et al., 2000; Greenland, 2003; Pearl, 2009; Shrier, 2008, 2009; Sjölander, 2009; Ding & Miratrix, 2015). In Figure 1(a), a pretreatment covariate $M$ is associated with two independent unmeasured covariates $U$ and $U^{\prime}$ , but $M$ does not itself affect either the treatment $A$ or outcome $Y$ . Because the corresponding causal diagram looks like the English letter M, this phenomenon is called M-Bias.

The second class of processes, which constitutes the subject of this paper, is represented by the causal diagram in Figure 1(b). Owing to confounding by the unmeasured common cause $U$ of the treatment $A$ and the outcome $Y$, both the analysis that adjusts and the analysis that fails to adjust for pretreatment measured covariates are biased. If the magnitude of the bias is larger when we adjust for a particular pretreatment covariate than when we do not, we refer to the covariate as a bias amplifier. Of particular interest is to determine the conditions under which an instrumental variable is a bias amplifier. An instrumental variable is a pretreatment covariate that is independent of the confounder $U$ and has no direct effect on the outcome except through its effect on the treatment. The variable $Z$ in Figure 1(b) is an example. Heckman & Navarro-Lozano (2004) and Bhattacharya & Vogt (2012) showed numerically that when the treatment and outcome are confounded, adjusting for an instrumental variable can result in greater bias than the unadjusted estimator. Wooldridge theoretically demonstrated this in linear models in a technical report in 2006, which was finally published as Wooldridge (2016). Because instrumental variables are often denoted by $Z$ as in Figure 1(b), this phenomenon is called Z-Bias.

The treatment assignment is a function of the instrumental variable, the unmeasured confounder and some other independent random error, which are the three sources of variation of the treatment. If we adjust for the instrumental variable, the treatment variation is driven more by the unmeasured confounder, which could result in increased bias due to this confounder. Seemingly paradoxically, without adjusting for the instrumental variable, the observational study is more like a randomized experiment, and the bias due to confounding is smaller. Although applied researchers (Myers et al., 2011; Walker, 2013; Brooks & Ohsfeldt, 2013; Ali et al., 2014) have confirmed through extensive simulation studies that this bias amplification phenomenon exists in a wide range of reasonable models, definite theoretical results have been established only for linear models. We fill in this gap in the literature by showing that adjusting for an instrumental variable amplifies bias for estimating causal effects under a wide class of models satisfying certain monotonicity assumptions. When the instrumental variable and the confounder have either no additive or no multiplicative interaction on the treatment, these assumptions can be interpreted as the signs of the arrows of the causal diagram (VanderWeele & Robins, 2010). However, we also show that there exist data generating processes under which an instrumental variable is not a bias amplifier.

2 Framework and Notation

We consider a binary treatment $A$, an instrumental variable $Z$, an unobserved confounder $U$, and an outcome $Y$, with the joint distribution depicted by the causal diagram in Figure 1(b). Let $\perp\!\!\!\perp$ denote conditional independence between random variables. Then the instrumental variable $Z$ in Figure 1(b) satisfies $Z\perp\!\!\!\perp U$, $Z\perp\!\!\!\perp Y\mid(A,U)$ and $Z\not\!\perp\!\!\!\perp A.$ We first discuss analysis conditional on observed pretreatment covariates $X$, and comment on averaging over $X$ in §6 and the Supplementary Material. We define the potential outcomes of $Y$ under treatment $a$ as $Y(a)$, $(a=1,0).$ The true average causal effect of $A$ on $Y$ for the population actually treated is

$$\textsc{ACE}_{1}^{\textnormal{true}}=E\{Y(1)\mid A=1\}-E\{Y(0)\mid A=1\},$$

for the population who are actually in the control condition it is

$$\textsc{ACE}_{0}^{\textnormal{true}}=E\{Y(1)\mid A=0\}-E\{Y(0)\mid A=0\},$$

and for the whole population it is

$$\textsc{ACE}^{\textnormal{true}}=E\{Y(1)\}-E\{Y(0)\}.$$

Define $m_{a}(u)=E(Y\mid A=a,U=u)$ to be the conditional mean of the outcome given the treatment and confounder. As illustrated by Figure 1(b), because $U$ suffices to control confounding between $A$ and $Y$, the ignorability assumption $A\perp\!\!\!\perp Y(a)\mid U$ holds for $a=0$ and $1$. Therefore, according to $Y=AY(1)+(1-A)Y(0)$, we have

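$$\textsc{ACE}_{1}^{\textnormal{true}}=E(Y\mid A=1)-\int m_{0}(u)F(du\mid A=1),$$

$$\textsc{ACE}_{0}^{\textnormal{true}}=\int m_{1}(u)F(du\mid A=0)-E(Y\mid A=0),$$

$$\textsc{ACE}^{\textnormal{true}}=\int m_{1}(u)F(du)-\int m_{0}(u)F(du).$$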

The unadjusted estimator is the naive comparison between the treatment and control means

$$\textsc{ACE}^{\textnormal{unadj}}=E(Y\mid A=1)-E(Y\mid A=0).$$

Define $\mu_{a}(z)=E(Y\mid A=a,Z=z)$ as the conditional mean of the outcome given the treatment and instrumental variable. Because the instrumental variable $Z$ is also a pretreatment covariate unaffected by the treatment, the usual strategy to adjust for all pretreatment covariates suggests using the adjusted estimator for the population under treatment

$$\textsc{ACE}_{1}^{\textnormal{adj}}=E(Y\mid A=1)-\int\mu_{0}(z)F(dz\mid A=1),$$

for the population under control

$$\textsc{ACE}_{0}^{\textnormal{adj}}=\int\mu_{1}(z)F(dz\mid A=0)-E(Y\mid A=0),$$

and for the whole population

$$\textsc{ACE}^{\textnormal{adj}}=\int\mu_{1}(z)F(dz)-\int\mu_{0}(z)F(dz).$$

Surprisingly, for linear structural equation models on $(Z,U,A,Y)$ , previous theory demonstrated that the magnitudes of the biases of the adjusted estimators are no smaller than the unadjusted ones (Pearl, 2010, 2011, 2013; Wooldridge, 2016). The goal of the rest of our paper is to show that this phenomenon exists in more general scenarios.
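The linear-model result can be illustrated with a short worked derivation. This is a sketch under an assumed linear structural model; the coefficients $\alpha,\gamma,\beta,\delta$ and the error terms $\varepsilon_A,\varepsilon_Y$ are illustrative notation, not taken from the paper, and the treatment is treated as continuous for the purpose of the linear algebra.

```latex
% Assumed (hypothetical) linear structural model:
%   A = \alpha Z + \gamma U + \varepsilon_A,
%   Y = \beta A + \delta U + \varepsilon_Y,
% with Z, U, \varepsilon_A, \varepsilon_Y mutually uncorrelated.
% Because Z is uncorrelated with U and \varepsilon_A, the residual of A
% after linear projection on Z is A - \alpha Z = \gamma U + \varepsilon_A.
\begin{align*}
\text{unadjusted slope:}\quad
  \frac{\mathrm{cov}(A,Y)}{\mathrm{var}(A)}
  &= \beta + \frac{\delta\gamma\sigma_U^2}
       {\alpha^2\sigma_Z^2+\gamma^2\sigma_U^2+\sigma_{\varepsilon_A}^2},\\[4pt]
\text{slope adjusting for } Z:\quad
  \frac{\mathrm{cov}(A-\alpha Z,\,Y)}{\mathrm{var}(A-\alpha Z)}
  &= \beta + \frac{\delta\gamma\sigma_U^2}
       {\gamma^2\sigma_U^2+\sigma_{\varepsilon_A}^2}.
\end{align*}
```

Both biases share the numerator $\delta\gamma\sigma_U^2$, but adjusting for $Z$ removes $\alpha^2\sigma_Z^2$ from the denominator, so the adjusted bias is at least as large in magnitude whenever $\alpha\neq 0$.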

3 Scalar Instrumental Variable and Scalar Confounder

We first give a theorem for a scalar instrumental variable $Z$ and a scalar confounder $U.$

Theorem 3.1.

In the causal diagram of Figure 1(b) with scalar $Z$ and $U$ , if

(a) $\textup{pr}(A=1\mid Z=z)$ is non-decreasing in $z$, $\textup{pr}(A=1\mid U=u)$ is non-decreasing in $u$, and $E(Y\mid A=a,U=u)$ is non-decreasing in $u$ for both $a=0$ and $1$;

(b) $E(Y\mid A=a,Z=z)$ is non-increasing in $z$ for both $a=0$ and $1$,

then

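$$\begin{pmatrix}\textsc{ACE}_{1}^{\textnormal{adj}}\\ \textsc{ACE}_{0}^{\textnormal{adj}}\\ \textsc{ACE}^{\textnormal{adj}}\end{pmatrix}\geq\begin{pmatrix}\textsc{ACE}^{\textnormal{unadj}}\\ \textsc{ACE}^{\textnormal{unadj}}\\ \textsc{ACE}^{\textnormal{unadj}}\end{pmatrix}\geq\begin{pmatrix}\textsc{ACE}_{1}^{\textnormal{true}}\\ \textsc{ACE}_{0}^{\textnormal{true}}\\ \textsc{ACE}^{\textnormal{true}}\end{pmatrix}.\qquad(1)$$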

Inequalities among vectors as in (1) should be interpreted as component-wise relationships. Intuitively, the monotonicity in Condition (a) of Theorem 3.1 requires non-negative dependence structures on arrows $Z\rightarrow A$ , $U\rightarrow A$ and $U\rightarrow Y$ in the causal diagram of Figure 1(b). Because the dependence is in expectation, Condition (a) of Theorem 3.1 is weaker than the requirement of signed directed acyclic graphs (VanderWeele & Robins, 2010).

The monotonicity in Condition (b) of Theorem 3.1 reflects the collider bias caused by conditioning on $A$ . As noted by Greenland (2003), in many cases, if $Z$ and $U$ affect $A$ in the same direction, then the collider bias caused by conditioning on $A$ is often in the opposite direction. Lemmas S6–S12 in the Supplementary Material show that, if $Z$ and $U$ are independent and have non-negative additive or multiplicative effects on $A$ , then conditioning on $A$ results in negative association between $Z$ and $U$ . This negative collider bias, coupled with the positive association between $U$ and $Y$ , further implies negative association between $Z$ and $Y$ conditional on $A$ as stated in Condition (b) of Theorem 3.1.
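The collider bias described above can be checked numerically in the binary case. The following is a minimal sketch, assuming independent $Z$ and $U$ with $U\sim$ Bernoulli$(0.5)$ and a hypothetical additive treatment model; the coefficients 0.1, 0.4 and 0.3 and the helper name `pr_u_given_a_z` are illustrative choices, not values from the paper.

```python
def pr_u_given_a_z(p, a, z, pu=0.5):
    """pr(U=1 | A=a, Z=z) when Z and U are independent and U ~ Bernoulli(pu).

    p[z][u] = pr(A=1 | Z=z, U=u).
    """
    def lik(u):
        # pr(A=a | Z=z, U=u)
        return p[z][u] if a == 1 else 1.0 - p[z][u]

    num = lik(1) * pu
    return num / (num + lik(0) * (1.0 - pu))


# Hypothetical additive model: pr(A=1 | Z=z, U=u) = 0.1 + 0.4 z + 0.3 u
p_add = [[0.1 + 0.4 * z + 0.3 * u for u in (0, 1)] for z in (0, 1)]

for a in (1, 0):
    at_z0 = pr_u_given_a_z(p_add, a, 0)
    at_z1 = pr_u_given_a_z(p_add, a, 1)
    print(f"A={a}: pr(U=1|A,Z=0)={at_z0:.4f}, pr(U=1|A,Z=1)={at_z1:.4f}")
```

In both treatment arms the conditional probability of $U=1$ drops as $z$ goes from 0 to 1, which is the negative association between $Z$ and $U$ induced by conditioning on $A$.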

For easy interpretation, we will give sufficient conditions for Z-Bias which require no interaction of $Z$ and $U$ on $A.$ When $A$ given $Z$ and $U$ follows an additive model, we have the following theorem.

Theorem 3.2.

In the causal diagram of Figure 1(b) with scalar $Z$ and $U$ , (1) holds if

(a) $\textup{pr}(A=1\mid Z=z,U=u)=\beta(z)+\gamma(u)$;

(b) $\beta(z)$ is non-decreasing in $z$, $\gamma(u)$ is non-decreasing in $u$, and $E(Y\mid A=a,U=u)$ is non-decreasing in $u$ for both $a=1$ and $0$;

(c) the essential supremum of $U$ given $(A=a,Z=z)$ depends only on $a$.

In summary, when $A$ given $Z$ and $U$ follows an additive model and monotonicity of Theorem 3.2 holds, both unadjusted and adjusted estimators have non-negative biases for the true average causal effects for the treatment, control and the whole populations. Furthermore, the adjusted estimators, either for the treatment, control or the whole populations, have larger biases than the unadjusted estimator, i.e., Z-Bias arises.

When both the instrumental variable $Z$ and the confounder $U$ are binary, Theorem 3.2 has an even more interpretable form. Define $p_{zu}=\textup{pr}(A=1\mid Z=z,U=u)$ for $z,u=0$ and $1.$

Corollary 3.3.

In the causal diagram of Figure 1(b) with binary $Z$ and $U$ , (1) holds if

(a) there is no additive interaction of $Z$ and $U$ on $A$, i.e., $p_{11}-p_{10}-p_{01}+p_{00}=0;$

(b) $Z$ and $U$ have monotonic effects on $A$, i.e., $p_{11}\geq\max(p_{10},p_{01})$ and $\min(p_{10},p_{01})\geq p_{00}$, and $E(Y\mid A=a,U=1)\geq E(Y\mid A=a,U=0)$ for both $a=1$ and $0.$

When $A$ given $Z$ and $U$ follows a multiplicative model, we have the following theorem.

Theorem 3.4.

In the causal diagram of Figure 1(b) with scalar $Z$ and $U$ , (1) holds if we replace Condition (a) of Theorem 3.2 by

(a’) $\textup{pr}(A=1\mid Z=z,U=u)=\beta(z)\gamma(u)$.

When both the instrument $Z$ and the confounder $U$ are binary, Theorem 3.4 can be simplified.

Corollary 3.5.

In the causal diagram of Figure 1(b) with binary $Z$ and $U$ , (1) holds if we replace Condition (a) of Corollary 3.3 by

(a’) there is no multiplicative interaction of $Z$ and $U$ on $A$, i.e., $p_{11}p_{00}=p_{10}p_{01}.$

We invoke the assumptions of no additive and multiplicative interaction of $Z$ and $U$ on $A$ in Theorems 3.2 and 3.4 for easy interpretation. They are sufficient but not necessary conditions for Z-Bias. In fact, we show in the proofs that Conditions (a) and (a’) in Theorems 3.2 and 3.4 and Corollaries 3.3 and 3.5 can be replaced by weaker conditions. For the case with binary $Z$ and $U$ , these conditions are particularly easy to interpret:

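$$\frac{p_{11}p_{00}}{p_{10}p_{01}}\leq 1,\quad\frac{(1-p_{11})(1-p_{00})}{(1-p_{10})(1-p_{01})}\leq 1,\qquad(2)$$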

i.e., $Z$ and $U$ have non-positive multiplicative interaction on both the presence and absence of $A.$ Even if Condition (a) or (a’) does not hold, one can show that half of the parameter space of $(p_{11},p_{10},p_{01},p_{00})$ satisfies the weaker condition (2), which is only sufficient, not necessary. Therefore, even in the presence of additive or multiplicative interaction, Z-Bias arises in more than half of the parameter space for binary $(Z,U,A,Y)$ .
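The weaker condition (2) is easy to check for any given $(p_{11},p_{10},p_{01},p_{00})$. The sketch below applies it to the three cases listed in Table 1 and to one hypothetical additive example; the function name `weaker_condition` is illustrative, and the ratios are cross-multiplied to avoid dividing by zero, which assumes the denominators in (2) are positive.

```python
def weaker_condition(p11, p10, p01, p00):
    """Condition (2): non-positive multiplicative interaction of Z and U
    on both the presence (A=1) and the absence (A=0) of the treatment.

    Arguments are p_zu = pr(A=1 | Z=z, U=u).
    """
    on_presence = p11 * p00 <= p10 * p01
    on_absence = (1 - p11) * (1 - p00) <= (1 - p10) * (1 - p01)
    return on_presence and on_absence


# (p11, p10, p01, p00) for the three cases of Table 1
table1 = {
    1: (0.8, 0.6, 0.2, 0.1),
    2: (0.3, 0.2, 0.3, 0.1),
    3: (0.5, 0.4, 0.4, 0.1),
}
for case, ps in table1.items():
    print(f"Case {case}: condition (2) holds = {weaker_condition(*ps)}")

# Hypothetical additive, monotone example: p_zu = 0.1 + 0.4 z + 0.3 u
print(weaker_condition(0.8, 0.5, 0.4, 0.1))
```

Only Case 1 satisfies (2), in line with the discussion of Table 1 in Section 5.1; in Cases 2 and 3 the second ratio exceeds one.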

4 General Instrumental Variable and General Confounder

When the instrumental variable $Z$ and the confounder $U$ are vectors, Theorems 3.1–3.4 still hold if the monotonicity assumptions hold for each component of $Z$ and $U$ , and $Z$ and $U$ are multivariate totally positive of order two (Karlin & Rinott, 1980), including the case that the components of $Z$ and $U$ are mutually independent (Esary et al., 1967). A random vector $W$ is multivariate totally positive of order two, if its density $f(\cdot)$ satisfies $f\{\max(w_{1},w_{2})\}f\{\min(w_{1},w_{2})\}\geq f(w_{1})f(w_{2})$ , where $\max(w_{1},w_{2})$ and $\min(w_{1},w_{2})$ are component-wise maximum and minimum of the vectors $w_{1}$ and $w_{2}$ . In the following, we will develop general theory for Z-Bias without the total positivity assumption about the components of $Z$ and $U.$
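For discrete vectors the total positivity inequality can be verified by brute force. The following is a minimal sketch, assuming a probability mass function on a finite integer grid stands in for the density $f(\cdot)$; the function name `is_mtp2`, the tolerance and the example tables are all illustrative.

```python
from itertools import product


def is_mtp2(pmf, tol=1e-12):
    """Check f(max(w1, w2)) * f(min(w1, w2)) >= f(w1) * f(w2) for all pairs.

    pmf maps integer tuples (grid points) to probabilities; points that are
    not listed are treated as having mass zero.
    """
    support = list(pmf)
    for w1, w2 in product(support, repeat=2):
        hi = tuple(max(a, b) for a, b in zip(w1, w2))  # component-wise maximum
        lo = tuple(min(a, b) for a, b in zip(w1, w2))  # component-wise minimum
        if pmf.get(hi, 0.0) * pmf.get(lo, 0.0) < pmf[w1] * pmf[w2] - tol:
            return False
    return True


# Independent components: the inequality holds with equality.
independent = {(i, j): pi * pj
               for i, pi in enumerate((0.3, 0.7))
               for j, pj in enumerate((0.6, 0.4))}
# Negatively dependent components: the inequality fails.
negative = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.1}

print(is_mtp2(independent), is_mtp2(negative))
```

The independent table passes because both sides of the inequality coincide, which matches the remark that mutually independent components are a special case.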

It is relatively straightforward to summarize a general instrumental variable $Z$ by a scalar propensity score $\Pi=\Pi(Z)=\textup{pr}(A=1\mid Z)$, because $Z\perp\!\!\!\perp A\mid\Pi(Z)$ as shown in Rosenbaum & Rubin (1983). We define $\nu_{a}(\pi)=E(Y\mid A=a,\Pi=\pi)$. The adjusted estimator for the population under treatment is

$$\textsc{ACE}_{1}^{\textnormal{adj}}=E(Y\mid A=1)-\int\nu_{0}(\pi)F(d\pi\mid A=1),$$

the adjusted estimator for the population under control is

$$\textsc{ACE}_{0}^{\textnormal{adj}}=\int\nu_{1}(\pi)F(d\pi\mid A=0)-E(Y\mid A=0),$$

and the adjusted estimator for the whole population is

$$\textsc{ACE}^{\textnormal{adj}}=\int\nu_{1}(\pi)F(d\pi)-\int\nu_{0}(\pi)F(d\pi).$$

When $Z$ is scalar, then the above three formulas reduce to the ones in Section 3.

Greenland & Robins (1986) showed that for the causal effect on the treated population, $Y(0)$ alone suffices to control for confounding; likewise, for the causal effect on the control population, $Y(1)$ alone suffices to control for confounding. If interest lies in all three of our average causal effects, then we need to take $U=\{Y(1),Y(0)\}$ as the ultimate confounder for the relationship of $A$ on $Y.$ This is not an assumption about $U$ . Because $Y=AY(1)+(1-A)Y(0)$ is a deterministic function of $A$ and $\{Y(1),Y(0)\}$ , this implies that $U=\{Y(1),Y(0)\}$ satisfies the ignorability assumption (Rosenbaum & Rubin, 1983), or blocks all the back-door paths from $A$ to $Y$ (Pearl, 1995, 2000). We represent the causal structure in Figure 2.

We first state a theorem without assuming the structure of the causal diagram in Figure 2.

Theorem 4.1.

If for both $a=1$ and $0$, $\textup{pr}\{A=1\mid Y(a)\}$ is non-decreasing in $Y(a)$, and $\textnormal{cov}\{\Pi,\nu_{a}(\Pi)\}\leq 0$, then (1) holds.

In a randomized experiment $A\perp\!\!\!\perp Y(a)$, so the dependence of $\textup{pr}\{A=1\mid Y(a)\}$ on $Y(a)$ characterizes the self-selection process of an observational study. The condition $\textnormal{cov}\{\Pi,\nu_{a}(\Pi)\}\leq 0$ in Theorem 4.1 is another measure of the collider bias caused by conditioning on $A$, as $\nu_{a}(\pi)=E\{Y(a)\mid A=a,\Pi=\pi\}$ and $Y(a)$ is a component of $U$ in Figure 2. This measure of collider bias is more general than the one in Theorem 3.1. Analogous to Section 3, we will present more transparent sufficient conditions for Z-Bias to aid interpretation.

In the following, we use the distributional association measure (Cox & Wermuth, 2003; Ma et al., 2006; Xie et al., 2008), i.e., random variable $V$ has a non-negative distributional association on random variable $W$ , if the conditional distribution satisfies $\partial F(w\mid v)/\partial v\leq 0$ for all $v$ and $w.$ If the random variables are discrete, then partial differentiation is replaced by differencing between adjacent levels (Cox & Wermuth, 2003).

If there is no additive interaction between $\Pi$ and $\{Y(1),Y(0)\}$ on $A$ , then we have the following results.

Theorem 4.2.

In the causal diagram of Figure 2, (1) holds if

(a) $\textup{pr}(A=1\mid\Pi,U)=\Pi+\delta\{Y(1)\}+\eta\{Y(0)\}$ with $\delta(\cdot)$ and $\eta(\cdot)$ being non-decreasing;

(b) $\{Y(1),Y(0)\}$ have non-negative distributional associations on each other, i.e., $\partial F(y_{1}\mid y_{0})/\partial y_{0}\leq 0$ and $\partial F(y_{0}\mid y_{1})/\partial y_{1}\leq 0$ for all $y_{1}$ and $y_{0}$;

(c) the essential supremum of $Y(1)$ given $Y(0)$ does not depend on $Y(0)$, and the essential supremum of $Y(0)$ given $Y(1)$ does not depend on $Y(1)$.

Remark 4.3.

If we impose an additive model $\textup{pr}(A=1\mid\Pi,U)=h(\Pi)+\delta\{Y(1)\}+\eta\{Y(0)\},$ then independence of $\Pi$ and $U$ implies that $\textup{pr}(A=1\mid\Pi)=h(\Pi)+E[\delta\{Y(1)\}]+E[\eta\{Y(0)\}]=\Pi.$ Therefore, we must have $h(\Pi)=\Pi$ and $E[\delta\{Y(1)\}]+E[\eta\{Y(0)\}]=0.$

When the outcome is binary, the distributional association between $Y(1)$ and $Y(0)$ becomes their odds ratio (Xie et al., 2008), and non-negative distributional association between $Y(1)$ and $Y(0)$ is equivalent to

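$$\textsc{OR}_{Y}=\frac{\textup{pr}\{Y(1)=1,Y(0)=1\}\,\textup{pr}\{Y(1)=0,Y(0)=0\}}{\textup{pr}\{Y(1)=1,Y(0)=0\}\,\textup{pr}\{Y(1)=0,Y(0)=1\}}\geq 1.$$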

We can further relax the model assumption of $A$ given $\Pi$ and $U$ by allowing for non-negative interaction between $Y(1)$ and $Y(0)$ on $A.$

Corollary 4.4.

In the causal diagram of Figure 2 with a binary outcome $Y$ , (1) holds if

(a) $\textup{pr}(A=1\mid\Pi,U)=\alpha+\Pi+\delta Y(1)+\eta Y(0)+\theta Y(1)Y(0)$ with $\delta,\eta,\theta\geq 0$;

(b) $\textsc{OR}_{Y}\geq 1.$

Remark 4.5.

If we have an additive model of $A$ given $\Pi$ and $U$ , $\textup{pr}(A=1\mid\Pi,U)=h(\Pi)+g(U),$ then the functional form $g(U)=\alpha+\delta Y(1)+\eta Y(0)+\theta Y(1)Y(0)$ imposes no restriction for binary outcome. Furthermore, $\textup{pr}(A=1\mid\Pi)=\Pi$ implies that $h(\Pi)=\Pi$ and $E\{g(U)\}=0$ , i.e., $\alpha=-\delta E\{Y(1)\}-\eta E\{Y(0)\}-\theta E\{Y(1)Y(0)\}.$ Therefore, the additive model in Condition (a) of Corollary 4.4 is

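$$\textup{pr}(A=1\mid\Pi,U)=\Pi+\delta[Y(1)-E\{Y(1)\}]+\eta[Y(0)-E\{Y(0)\}]+\theta[Y(1)Y(0)-E\{Y(1)Y(0)\}].$$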

If there is no multiplicative interaction of $\Pi$ and $\{Y(1),Y(0)\}$ on $A$, then we have the following results.

Theorem 4.6.

In the causal diagram of Figure 2, (1) holds if we replace Condition (a) of Theorem 4.2 by

(a’) $\textup{pr}(A=1\mid\Pi,U)=\Pi\delta\{Y(1)\}\eta\{Y(0)\}$ with $\delta(\cdot)$ and $\eta(\cdot)$ being non-decreasing.

Corollary 4.7.

In the causal diagram of Figure 2 with a binary outcome $Y$ , (1) holds if we replace Condition (a) of Corollary 4.4 by

(a’) $\textup{pr}(A=1\mid\Pi,U)=\alpha\Pi\delta^{Y(1)}\eta^{Y(0)}\theta^{Y(1)Y(0)}$ with $\delta,\eta,\theta\geq 1$.

5 Illustrations

5.1 Numerical Examples

Myers et al. (2011) simulated binary $(Z,U,A,Y)$ to investigate Z-Bias. They generated $(Z,U)$ according to $\textup{pr}(Z=1)=0.5$ and $\textup{pr}(U=1)=\gamma_{0}$ . The first set of their generative models is additive,

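$$\textup{pr}(A=1\mid U,Z)=\alpha_{0}+\alpha_{1}U+\alpha_{2}Z,\quad\textup{pr}(Y=1\mid U,A)=\beta_{0}+\beta_{1}U+\beta_{2}A,\qquad(3)$$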

where the coefficients are all positive. The second set of their generative models is multiplicative,

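$$\textup{pr}(A=1\mid U,Z)=\alpha_{0}\alpha_{1}^{U}\alpha_{2}^{Z},\quad\textup{pr}(Y=1\mid U,A)=\beta_{0}\beta_{1}^{U}\beta_{2}^{A},\qquad(4)$$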

where the coefficients in (3) and (4) are all positive. They use simulation to show that Z-Bias arises under these models. In fact, in the above models, $Z$ and $U$ have monotonic effects on $A$ without additive or multiplicative interactions, and $U$ acts monotonically on $Y$ , given $A$ . Therefore, Corollaries 3.3 and 3.5 imply that Z-Bias must occur. The qualitative conclusion follows immediately from our theory. However, our theory does not make statements about the magnitude of the bias, and for more details about the magnitude and finite sample properties, see Myers et al. (2011).
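The no-interaction claim for the two sets of generative models can be verified directly. The short check below restates the treatment models of Myers et al. (2011) as given in the equation list of this page and plugs them into the interaction contrasts of Corollaries 3.3 and 3.5; the remark on $\alpha_1,\alpha_2\geq 1$ is an observation about this parametrization rather than a statement from the paper.

```latex
% Additive treatment model: pr(A=1 | U, Z) = \alpha_0 + \alpha_1 U + \alpha_2 Z,
% so p_{zu} = \alpha_0 + \alpha_1 u + \alpha_2 z.
\begin{align*}
p_{11}-p_{10}-p_{01}+p_{00}
  &= (\alpha_0+\alpha_1+\alpha_2)-(\alpha_0+\alpha_2)-(\alpha_0+\alpha_1)+\alpha_0 = 0 .
\end{align*}
% Multiplicative treatment model: pr(A=1 | U, Z) = \alpha_0 \alpha_1^{U} \alpha_2^{Z},
% so p_{zu} = \alpha_0 \alpha_1^{u} \alpha_2^{z}.
\begin{align*}
p_{11}\,p_{00} &= (\alpha_0\alpha_1\alpha_2)\,\alpha_0
  = (\alpha_0\alpha_2)(\alpha_0\alpha_1) = p_{10}\,p_{01} .
\end{align*}
% Monotonicity p_{11} >= max(p_{10}, p_{01}) and min(p_{10}, p_{01}) >= p_{00}
% holds in the additive case when \alpha_1, \alpha_2 >= 0, and in the
% multiplicative case when \alpha_1, \alpha_2 >= 1.
```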

We further use three numerical examples to illustrate the role of the no-interaction assumptions required by Theorems 3.2 and 3.4 and Corollaries 3.3 and 3.5. Recall the conditional probability of the treatment $A$, $p_{zu}=\textup{pr}(A=1\mid Z=z,U=u)$, and define the conditional probabilities of the outcome $Y$ as $r_{au}=\textup{pr}(Y=1\mid A=a,U=u)$, for $z,a,u=0,1.$ Table 1 gives three examples in which monotonicity of the conditional distributions of $A$ and $Y$ holds and there are both additive and multiplicative interactions. In all cases, the instrumental variable $Z$ is Bernoulli $(p=0.5)$, and the confounder $U$ is another independent Bernoulli $(\pi=0.5)$. In Case 1, the weaker condition (2) holds, and our theory implies that Z-Bias arises. In Case 2, neither the condition in Theorem 3.1 nor (2) holds, but Z-Bias still arises, which illustrates that our conditions are sufficient but not necessary. In Case 3, neither the condition in Theorem 3.1 nor (2) holds, and Z-Bias does not arise.
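The entries of Table 1 can be recomputed from the definitions of the true, unadjusted and adjusted quantities. The following is a minimal sketch for binary $(Z,U,A,Y)$ with independent $Z$ and $U$; the function names `ace_binary` and `from_row` and the nested-list layout are illustrative choices, not the authors' code.

```python
def ace_binary(p, r, pz=0.5, pu=0.5):
    """Return (ACE_true, ACE_unadj, ACE_adj) for binary (Z, U, A, Y).

    p[z][u] = pr(A=1 | Z=z, U=u); r[a][u] = pr(Y=1 | A=a, U=u);
    Z ~ Bernoulli(pz) and U ~ Bernoulli(pu) are independent.
    """
    wz = (1 - pz, pz)
    wu = (1 - pu, pu)

    def pr_a(a, z, u):
        return p[z][u] if a == 1 else 1 - p[z][u]

    def mean_y(a, zs):
        # E(Y | A=a, Z in zs): average r[a][u] over the law of U given (A=a, Z in zs)
        w = [sum(wz[z] * wu[u] * pr_a(a, z, u) for z in zs) for u in (0, 1)]
        return sum(r[a][u] * w[u] for u in (0, 1)) / sum(w)

    true = sum(wu[u] * (r[1][u] - r[0][u]) for u in (0, 1))
    unadj = mean_y(1, (0, 1)) - mean_y(0, (0, 1))
    adj = sum(wz[z] * (mean_y(1, (z,)) - mean_y(0, (z,))) for z in (0, 1))
    return true, unadj, adj


def from_row(p11, p10, p01, p00, r11, r10, r01, r00):
    """Convert one row of Table 1 into the nested lists p[z][u] and r[a][u]."""
    return [[p00, p01], [p10, p11]], [[r00, r01], [r10, r11]]


rows = {
    1: (0.8, 0.6, 0.2, 0.1, 0.08, 0.06, 0.02, 0.01),
    2: (0.3, 0.2, 0.3, 0.1, 0.03, 0.02, 0.03, 0.01),
    3: (0.5, 0.4, 0.4, 0.1, 0.04, 0.04, 0.04, 0.01),
}
for case, row in rows.items():
    true, unadj, adj = ace_binary(*from_row(*row))
    z_bias = abs(adj - true) > abs(unadj - true)
    print(f"Case {case}: true={true:.4f} unadj={unadj:.4f} adj={adj:.4f} Z-Bias={z_bias}")
```

Rounded to four decimals, the three quantities agree with the corresponding columns of Table 1, and the Z-Bias flag is true for Cases 1 and 2 and false for Case 3.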

Finally, for binary $(Z,U,A,Y)$ we use Monte Carlo to compute the volume of the Z-Bias space, i.e., the parameter space of $p$ , $\pi$ , $p_{zu}$ ’s and $r_{au}$ ’s in which the adjusted estimator has higher bias than the unadjusted estimator. We randomly draw these ten probabilities from independent Uniform $(0,1)$ random variables, and for each draw of these probabilities we compute the average causal effect $\textsc{ACE}^{\textnormal{true}}$ , the unadjusted estimator $\textsc{ACE}^{\textnormal{unadj}}$ and the adjusted estimator $\textsc{ACE}^{\textnormal{adj}}$ . We plot the joint values of the biases $(\textsc{ACE}^{\textnormal{adj}}-\textsc{ACE}^{\textnormal{true}},\textsc{ACE}^{\textnormal{unadj}}-\textsc{ACE}^{\textnormal{true}})$ in Figure 3. The volume of the Z-Bias space can be approximated by the frequency that $\textsc{ACE}^{\textnormal{adj}}$ deviates more from $\textsc{ACE}^{\textnormal{true}}$ than $\textsc{ACE}^{\textnormal{unadj}}$ . With $10^{6}$ random draws, our Monte Carlo gives an unbiased estimate for this volume as $0.6805$ with estimated standard error $0.0005$ . Therefore, in about $68\%$ of the parameter space, the adjusted estimator is more biased than the unadjusted estimator.
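The Monte Carlo computation described above can be sketched as follows. This is an illustrative reimplementation rather than the authors' code. It assumes the usual population definitions: $\textsc{ACE}^{\textnormal{true}}$ averages $r_{1u}-r_{0u}$ over the marginal distribution of $U$, $\textsc{ACE}^{\textnormal{unadj}}$ is the crude difference $E(Y\mid A=1)-E(Y\mid A=0)$, and $\textsc{ACE}^{\textnormal{adj}}$ averages the within-$Z$ differences over the marginal distribution of $Z$. The names `biases` and `zbias_volume`, the seed, and the default number of draws are hypothetical choices.

```python
import random


def biases(p, pi, p_zu, r_au):
    """Return (adjusted bias, unadjusted bias) for binary (Z, U, A, Y).

    p = pr(Z=1), pi = pr(U=1), p_zu[z][u] = pr(A=1 | Z=z, U=u),
    r_au[a][u] = pr(Y=1 | A=a, U=u). Z and U are independent, and
    Y depends on Z only through A (Z is an instrumental variable).
    """
    pz = [1 - p, p]
    pu = [1 - pi, pi]
    # True effect: average of r_1u - r_0u over the marginal law of U.
    ace_true = sum(pu[u] * (r_au[1][u] - r_au[0][u]) for u in (0, 1))

    def pr_a(a, z, u):
        return p_zu[z][u] if a == 1 else 1 - p_zu[z][u]

    def mean_y(a, zs):
        # E(Y | A=a, Z in zs), obtained by summing over the joint law of (Z, U).
        num = sum(pz[z] * pu[u] * pr_a(a, z, u) * r_au[a][u]
                  for z in zs for u in (0, 1))
        den = sum(pz[z] * pu[u] * pr_a(a, z, u)
                  for z in zs for u in (0, 1))
        return num / den

    ace_unadj = mean_y(1, (0, 1)) - mean_y(0, (0, 1))
    ace_adj = sum(pz[z] * (mean_y(1, (z,)) - mean_y(0, (z,))) for z in (0, 1))
    return ace_adj - ace_true, ace_unadj - ace_true


def zbias_volume(n_draws=20000, seed=0):
    """Fraction of uniform parameter draws in which |adjusted bias| exceeds |unadjusted bias|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        p, pi = rng.random(), rng.random()
        p_zu = [[rng.random(), rng.random()], [rng.random(), rng.random()]]
        r_au = [[rng.random(), rng.random()], [rng.random(), rng.random()]]
        b_adj, b_unadj = biases(p, pi, p_zu, r_au)
        hits += abs(b_adj) > abs(b_unadj)
    return hits / n_draws


if __name__ == "__main__":
    print(zbias_volume())
```

With far fewer draws than the $10^{6}$ used in the text, the estimate from this sketch carries a correspondingly larger Monte Carlo standard error.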

5.2 Real Data Examples

Bhattacharya & Vogt (2012) presented an example about the treatment effect of small classroom in the third grade on test scores for reading. Their instrumental variable analysis gave point estimate $8.73$ with standard error $2.01$ . Without adjusting for the instrumental variable in the propensity score model, the point estimate was $6.00$ with estimated standard error $1.34$ ; adjusting for the instrumental variable, the point estimate was $2.97$ with estimated standard error $1.84$ . The difference between the adjusted estimator and the instrumental variable estimator is larger than that between the unadjusted estimator and the instrumental variable estimator.

Wooldridge (2010, Example 21.3) discusses estimating the effect of attaining at least seven years of education on fertility, with treatment $A$ being a binary indicator for at least seven years of education, outcome $Y$ being the number of living children, and instrumental variable $Z$ being a binary indicator if the woman was born in the first half of the year. Although the original data set of Wooldridge (2010) contains other variables, most of them are posttreatment variables, so we do not adjust for them in our analysis. The instrumental variable analysis gives point estimate $2.47$ with estimated standard error $0.59$ . The unadjusted analysis gives point estimate $1.77$ with estimated standard error $0.07$ . The adjusted analysis gives point estimate $1.76$ with estimated standard error $0.07$ . Table 2 summarizes the results. In this example, the adjusted and unadjusted estimators give similar results.
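As a worked check of the comparisons in this subsection, the distances of the adjusted and unadjusted point estimates from the instrumental variable point estimate can be computed directly from the reported numbers. The instrumental variable estimate is used here only as a reference point; it has its own standard error, so these distances are descriptive rather than formal bias estimates.

```latex
\begin{align*}
\text{Bhattacharya \& Vogt (2012):}\quad
  & |2.97-8.73| = 5.76 \ \text{(adjusted)}, &
  & |6.00-8.73| = 2.73 \ \text{(unadjusted)};\\
\text{Wooldridge (2010):}\quad
  & |1.76-2.47| = 0.71 \ \text{(adjusted)}, &
  & |1.77-2.47| = 0.70 \ \text{(unadjusted)}.
\end{align*}
```

In the first example the adjusted estimate is roughly twice as far from the instrumental variable estimate as the unadjusted one, whereas in the second the two distances are nearly identical.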

6 Discussion

6.1 Allowing for an Arrow from $Z$ to $Y$

When the variable $Z$ has an arrow to the outcome $Y$ as illustrated by Figure 4, the following generalization of Theorem 3.1 holds.

Theorem 6.1.

Consider the causal diagram of Figure 4 with scalar $Z$ and $U$ , where $Z\perp\!\!\!\perp U$ and $A\perp\!\!\!\perp Y(a)\mid(Z,U)$ for $a=0$ and $1$ . The result in (1) holds if we replace Condition (a) of Theorem 3.1 by

(a’)

$\textup{pr}(A=1\mid Z=z,U=u)$ and $E(Y\mid A=a,Z=z,U=u)$ are non-decreasing in $z$ and $u$ for $a=0$ and $1$ .

However, when there is an arrow from $Z$ to $Y$ , Theorem 6.1 is of little use in practice without strong substantive knowledge about the size of the direct effect of $Z$ on $Y$ . In particular, neither Theorem 3.2 nor Theorem 3.4 is true when an arrow from $Z$ to $Y$ is present. This reflects the fact that neither the absence of an additive nor the absence of a multiplicative interaction of $Z$ and $U$ on $A$ is sufficient to conclude that $E(Y\mid A=a,Z=z)$ is non-increasing in $z$ when $E(Y\mid A=a,U=u,Z=z)$ is non-decreasing in $z$ and $u$ .

With a general instrumental variable and a general confounder, Theorem 4.1 holds without any assumptions on the underlying causal diagram, and therefore it holds even if the variable $Z$ affects the outcome directly. However, Theorems 4.2 and 4.6 no longer hold if an arrow from $Z$ to $Y$ is present as in Figure 4. This reflects the fact that the absence of an additive or multiplicative interaction of $U$ and $\Pi$ on $A$ no longer implies $\textnormal{cov}\{\Pi,\nu_{a}(\Pi)\}\leq 0$ when $Z$ has a direct effect on $Y$ , even if the remaining conditions of Theorems 4.2 and 4.6 hold. Analogously, Theorems 4.2 and 4.6 no longer hold if there exists an unmeasured common cause of $Z$ and $Y$ on the causal diagram in Figure 1(b), even if $Z$ has no direct effect on $Y$ .

6.2 Extensions

In §§2–4, we discussed Z-Bias for the average causal effects. We can extend the results to distributional causal effects for general outcomes (Ju & Geng, 2010) and causal risk ratios for binary or positive outcomes. Moreover, the results in §§2–4 are conditional on or within the strata of observed covariates. Similar results hold for causal effects averaged over observed covariates. We give more details in the Supplementary Material. In this paper we have given sufficient conditions for the presence of Z-Bias; future work could consider sufficient conditions for the absence of Z-Bias.

6.3 Conclusion

It is often suggested that we should adjust for all pretreatment covariates in observational studies. However, we show that in a wide class of models satisfying certain monotonicity, adjusting for an instrumental variable actually amplifies the impact of the unmeasured treatment-outcome confounding, which results in more bias than the unadjusted estimator. In practice, we may not be sure about whether a covariate is a confounder, for which one needs to control, or perhaps instead an instrumental variable, for which control would only increase any existing bias due to unmeasured confounding. Therefore, a more practical approach, as suggested by Rosenbaum (2010, Chapter 18.2) and Brookhart et al. (2010), may be to conduct analysis both with and without adjusting for the covariate. If two analyses give similar results, as in the example in Table 2, then we need not worry about Z-Bias; otherwise, we need additional information and analysis before making decisions.
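The recommendation to run the analysis both with and without the covariate can be sketched for a binary treatment and a discrete covariate as follows. This is a minimal illustration using simple stratification, not the propensity score analyses used in the cited examples; the function name `unadjusted_and_adjusted` and the toy rows are hypothetical, and the sketch assumes every stratum of $Z$ contains both treated and control units.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def unadjusted_and_adjusted(rows):
    """Return (unadjusted, Z-adjusted) effect estimates from rows of (z, a, y).

    The unadjusted estimate is the crude difference in mean outcomes between
    a=1 and a=0. The adjusted estimate averages the within-stratum differences
    over the empirical distribution of z.
    """
    rows = list(rows)
    by_a = defaultdict(list)
    by_za = defaultdict(list)
    n_z = defaultdict(int)
    for z, a, y in rows:
        by_a[a].append(y)
        by_za[(z, a)].append(y)
        n_z[z] += 1
    unadjusted = mean(by_a[1]) - mean(by_a[0])
    adjusted = sum(
        n_z[z] / len(rows) * (mean(by_za[(z, 1)]) - mean(by_za[(z, 0)]))
        for z in n_z
    )
    return unadjusted, adjusted


# Toy data: (z, a, y) triples, invented purely for illustration.
toy = [(0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 0, 0),
       (1, 1, 1), (1, 0, 1), (1, 0, 1), (1, 0, 0)]
print(unadjusted_and_adjusted(toy))
```

If the two numbers are close, as in Table 2, Z-Bias is unlikely to matter for the conclusion; a large gap signals that further information about the role of the covariate is needed.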

Acknowledgments

Peng Ding is partially supported by the U.S. Institute of Education Sciences, and Tyler J. VanderWeele by the U.S. National Institutes of Health. The authors thank the Associate Editor and two reviewers for detailed and helpful comments.

Supplementary material

Supplementary Material available at Biometrika online includes all the proofs and extensions.

Appendix 1. Lemmas and Their Proofs

In order to prove the main results, we need to invoke the following lemmas. Some of them are from the literature, and some of them are new and of independent interest.

Lemma S2 is from Esary et al. (1967, Theorem 2.1).

Lemma S2.

Let $f(\cdot)$ and $g(\cdot)$ be functions with $K$ real-valued arguments, which are both non-decreasing in each of their arguments. If $U=(U_{1},\ldots,U_{K})$ is a multivariate random variable with $K$ mutually independent components, then $\textnormal{cov}\{f(U),g(U)\}\geq 0.$
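Lemma S2 can be illustrated numerically in the simplest setting of independent Bernoulli components, where the covariance is available by exact enumeration. This is only a sanity check under that assumption, not a proof; the helper name `cov_of_functions` and the example probabilities are hypothetical.

```python
from itertools import product


def cov_of_functions(f, g, probs):
    """Exact cov{f(U), g(U)} when U has independent Bernoulli components.

    probs[k] = pr(U_k = 1); f and g take a tuple of 0/1 values.
    """
    ef = eg = efg = 0.0
    for u in product((0, 1), repeat=len(probs)):
        # Probability of this configuration under independence.
        w = 1.0
        for uk, pk in zip(u, probs):
            w *= pk if uk == 1 else 1 - pk
        ef += w * f(u)
        eg += w * g(u)
        efg += w * f(u) * g(u)
    return efg - ef * eg


# sum and max are both non-decreasing in each component, so the
# covariance should be non-negative.
print(cov_of_functions(sum, max, [0.3, 0.6]))
```

Replacing one of the two functions by a non-increasing function flips the sign of the covariance, which is how the lemma is used in the proof of Theorem 3.1.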

Lemma S3 is from VanderWeele (2008), and Lemmas S4 and S5 are from Chiba (2009).

Lemma S3.

For a univariate $U$ or a multivariate $U$ with mutually independent components, if for $a=1$ and $0$ , $Y(a)\perp\!\!\!\perp A\mid U$ , $E(Y\mid A=a,U=u)$ is non-decreasing in each component of $u$ , and $\textup{pr}(A=1\mid U=u)$ is non-decreasing in each component of $u,$ then $E(Y\mid A=1)\geq E\{Y(1)\}$ and $E(Y\mid A=0)\leq E\{Y(0)\}.$

Lemma S4.

For a univariate $U$ or a multivariate $U$ with mutually independent components, if $Y(0)\perp\!\!\!\perp A\mid U$ , $E(Y\mid A=0,U=u)$ is non-decreasing in each component of $u$ , and $\textup{pr}(A=1\mid U=u)$ is non-decreasing in each component of $u,$ then $E(Y\mid A=0)\leq E\{Y(0)\mid A=1\}.$

Lemma S5.

For a univariate $U$ or a multivariate $U$ with mutually independent components, if $Y(1)\perp\!\!\!\perp A\mid U$ , $E(Y\mid A=1,U=u)$ is non-decreasing in each component of $u$ , and $\textup{pr}(A=1\mid U=u)$ is non-decreasing in each component of $u,$ then $E(Y\mid A=1)\geq E\{Y(1)\mid A=0\}.$

Lemma S6, extending Rothman et al. (2008), states that under monotonicity, no additive interaction implies non-positive multiplicative interactions for both presence and absence of the outcome.

Lemma S6.

If $p_{11}\geq\max(p_{10},p_{01})$ , $\min(p_{10},p_{01})\geq p_{00}>0$ , and $p_{11}-p_{10}-p_{01}+p_{00}=0$ , then

[TABLE]

Proof S7 (of Lemma S6).

Define $\textsc{RR}_{11}=p_{11}/p_{00}\geq 1$ , $\textsc{RR}_{10}=p_{10}/p_{00}\geq 1$ and $\textsc{RR}_{01}=p_{01}/p_{00}\geq 1$ . Then $p_{11}-p_{10}-p_{01}+p_{00}=0$ implies $\textsc{RR}_{11}=\textsc{RR}_{10}+\textsc{RR}_{01}-1,$ which further implies

[TABLE]

The second inequality of (S5) follows from

[TABLE]
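For readers following the argument without the displayed equations, the two inequalities can be derived in a few lines. The sketch below assumes that (S5) consists of the two statements described before Lemma S6, namely $p_{11}p_{00}\leq p_{10}p_{01}$ and $(1-p_{11})(1-p_{00})\leq(1-p_{10})(1-p_{01})$; it uses only $\textsc{RR}_{11}=\textsc{RR}_{10}+\textsc{RR}_{01}-1$ and the no-additive-interaction identity $p_{11}-p_{10}-p_{01}+p_{00}=0$.

```latex
\begin{align*}
\textsc{RR}_{10}\textsc{RR}_{01}-\textsc{RR}_{11}
  &= \textsc{RR}_{10}\textsc{RR}_{01}-\textsc{RR}_{10}-\textsc{RR}_{01}+1
   = (\textsc{RR}_{10}-1)(\textsc{RR}_{01}-1)\geq 0,\\
(1-p_{10})(1-p_{01})-(1-p_{11})(1-p_{00})
  &= (p_{11}-p_{10}-p_{01}+p_{00})+(p_{10}p_{01}-p_{11}p_{00})
   = p_{10}p_{01}-p_{11}p_{00}\geq 0.
\end{align*}
```

Multiplying the first line by $p_{00}^{2}$ gives $p_{10}p_{01}\geq p_{11}p_{00}$, which is then used in the last step of the second line.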

Lemma S6 is about interaction between two binary causes, and for our discussion we need to extend it to interaction between two general causes. Lemma S8 extends Piegorsch et al. (1994) and Yang et al. (1999) by relating the conditional association between two independent causes given the outcome to the interaction between the two causes on the outcome.

Lemma S8.

If $Z\perp\!\!\!\perp U$ , and $\textup{pr}(A=1\mid Z=z,U=u)=\beta(z)+\gamma(u)$ with $\beta(z)$ and $\gamma(u)$ non-decreasing in $z$ and $u$ , then for both $a=1$ and $0$ and for all values of $u$ and $z$ ,

[TABLE]

i.e., $U$ has non-positive distributional dependence on $Z$ , given $A$ .

Proof S9 (of Lemma S8).

For a fixed $u$ and $z_{1}>z_{0}$ , we define

[TABLE]

following from the additive model of $A$ and $Z\perp\!\!\!\perp U.$

Because $\beta(z_{1})\geq\beta(z_{0}),$ it is straightforward to show that $p_{11}\geq p_{10}$ and $p_{01}\geq p_{00}$ . Because $\gamma(u)$ is non-decreasing in $u$ , we have

[TABLE]

which imply $p_{11}\geq p_{01}$ and $p_{10}\geq p_{00}.$ We further have

[TABLE]

The four probabilities $(p_{11},p_{10},p_{01},p_{00})$ satisfy the conditions in Lemma S6. Therefore, (2) holds. Replacing the probabilities in (2) by their definitions above, we have

[TABLE]

and

[TABLE]

Therefore, for both $a=1$ and $0$ and for all values of $u$ ,

[TABLE]

is non-increasing in $z$ . Because of the independence of $Z$ and $U$ , we have

[TABLE]

Therefore, $F(u\mid A=a,Z=z)$ is a non-increasing function of (S6), and the conclusion holds.

Lemmas S6 and S8 above hold under the assumption of no additive interaction, and the following two lemmas state similar results under the assumption of no multiplicative interaction.

Lemma S10.

If $p_{11}\geq\max(p_{10},p_{01}),\min(p_{10},p_{01})\geq p_{00}$ , and $p_{11}p_{00}=p_{10}p_{01}$ , then

[TABLE]

Proof S11 (of Lemma S10).

Using the same notation as in the proof of Lemma S6, $p_{11}p_{00}=p_{10}p_{01}$ implies $\textsc{RR}_{11}=\textsc{RR}_{10}\textsc{RR}_{01}$ , with $\textsc{RR}_{10}\geq 1,\textsc{RR}_{01}\geq 1$ , and $\textsc{RR}_{11}\geq 1.$ Therefore,

[TABLE]

which further implies that

[TABLE]
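The conclusion of Lemma S10 can be checked numerically. The sketch below assumes the conclusion is the inequality quoted in the proof of Lemma S12, $\{(1-p_{11})(1-p_{00})\}/\{(1-p_{10})(1-p_{01})\}\leq 1$, and verifies the equivalent statement that the gap $(1-p_{10})(1-p_{01})-(1-p_{11})(1-p_{00})$ is non-negative. The function names and sampling ranges are hypothetical; the ranges are chosen so that all four probabilities stay in $(0,1)$.

```python
import random


def lemma_s10_gap(p00, rr10, rr01):
    """Gap (1-p10)(1-p01) - (1-p11)(1-p00) under no multiplicative interaction.

    p10 = p00*rr10, p01 = p00*rr01 and p11 = p00*rr10*rr01, so that
    p11*p00 = p10*p01. With rr10, rr01 >= 1 the table is monotone.
    """
    p10, p01 = p00 * rr10, p00 * rr01
    p11 = p00 * rr10 * rr01
    return (1 - p10) * (1 - p01) - (1 - p11) * (1 - p00)


def smallest_gap(n=10000, seed=1):
    """Smallest gap over n random monotone tables with no multiplicative interaction."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n):
        p00 = rng.uniform(0.01, 0.2)
        rr10 = rng.uniform(1.0, 2.0)
        rr01 = rng.uniform(1.0, 2.0)  # keeps p11 <= 0.8
        worst = min(worst, lemma_s10_gap(p00, rr10, rr01))
    return worst


if __name__ == "__main__":
    print(smallest_gap())
```

Algebraically the gap equals $p_{00}(\textsc{RR}_{10}-1)(\textsc{RR}_{01}-1)$, so the smallest value found should be non-negative up to floating-point rounding.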

Lemma S12.

If $Z\perp\!\!\!\perp U$ , and $\textup{pr}(A=1\mid Z=z,U=u)=\beta(z)\gamma(u)$ with $\beta(z)>0$ and $\gamma(u)>0$ non-decreasing in $z$ and $u$ , then $Z\perp\!\!\!\perp U\mid A=1$ , and for all values of $u$ and $z$ ,

[TABLE]

i.e., $U$ has non-positive distributional dependence on $Z$ , given $A=0$ .

Proof S13 (of Lemma S12).

For a fixed $u$ and $z_{1}>z_{0}$ , we define

[TABLE]

following from the multiplicative model of $A$ and $Z\perp\!\!\!\perp U.$ Because $\beta(z_{1})\geq\beta(z_{0}),$ we have $p_{11}\geq p_{10}$ and $p_{01}\geq p_{00}$ . Because $\gamma(u)$ is non-decreasing in $u$ , we have

[TABLE]

which imply $p_{11}\geq p_{01}$ and $p_{10}\geq p_{00}.$ We can further verify $(p_{11}p_{00})/(p_{10}p_{01})=1.$ Because the four probabilities $(p_{11},p_{10},p_{01},p_{00})$ satisfy the conditions in Lemma S10, we have $\{(1-p_{11})(1-p_{00})\}/\{(1-p_{10})(1-p_{01})\}\leq 1.$ Replacing the probabilities by their definitions, we have

[TABLE]

Following the same logic as the proof of Lemma S8, we can prove that $Z\perp\!\!\!\perp U\mid A=1$ , and $Z$ has non-positive distributional association with $U$ , given $A=0$ .

Define $f=\textup{pr}(A=1)$ to be the proportion of the population under treatment. The average causal effect for the whole population can be written as a convex combination of the average causal effects for the treated and control populations:

[TABLE]

Analogously, with a scalar instrumental variable, the adjusted estimator for the whole population can be written as

[TABLE]

and with a general instrumental variable,

[TABLE]

Lemma S14.

With a scalar instrumental variable $Z$ , the differences between the adjusted and unadjusted estimators are

[TABLE]

With a general instrumental variable $Z$ , the above formulas hold if we replace $\Pi(Z)$ by $\Pi$ and $\mu_{a}(Z)=E(Y\mid A=a,Z)$ by $\nu_{a}(\Pi)=E(Y\mid A=a,\Pi).$

Proof S15 (of Lemma S14).

The difference $\textsc{ACE}_{1}^{\textnormal{adj}}-\textsc{ACE}^{\textnormal{unadj}}$ is equal to

[TABLE]

Similarly, the difference $\textsc{ACE}_{0}^{\textnormal{adj}}-\textsc{ACE}^{\textnormal{unadj}}$ is equal to

[TABLE]

Therefore, the difference $\textsc{ACE}^{\textnormal{adj}}-\textsc{ACE}^{\textnormal{unadj}}$ is equal to

[TABLE]

Analogously, we can prove the results for general instrumental variables.

Appendix 2. Proofs of Theorems and Corollaries in the Main Text

Proof S16 (of Theorem 3.1).

Because $\Pi(z)=\textup{pr}(A=1\mid Z=z)$ and $\textup{pr}(A=1\mid U=u)$ are non-decreasing in $z$ and $u$ , and $E(Y\mid A=a,U=u)$ is non-decreasing in $u$ for both $a=0$ and $1$ , the unadjusted estimator, $\textsc{ACE}^{\textnormal{unadj}}$ , is larger than or equal to $\textsc{ACE}^{\textnormal{true}},\textsc{ACE}_{1}^{\textnormal{true}}$ and $\textsc{ACE}_{0}^{\textnormal{true}}$ , according to Lemmas S3–S5.

Because $\Pi(Z)$ is non-decreasing and $\mu_{a}(Z)$ is non-increasing in $Z$ for both $a=0$ and $1$ , their covariance is non-positive according to Lemma S2, i.e., $\textnormal{cov}\{\Pi(Z),\mu_{a}(Z)\}\leq 0.$

Because the differences between all the adjusted estimators, $\textsc{ACE}_{1}^{\textnormal{adj}}$ , $\textsc{ACE}_{0}^{\textnormal{adj}}$ and $\textsc{ACE}^{\textnormal{adj}}$ , and the unadjusted estimator, $\textsc{ACE}^{\textnormal{unadj}}$ , are negative constants multiplied by $\textnormal{cov}\{\Pi(Z),\mu_{a}(Z)\}$ , according to Lemma S14 all of $\textsc{ACE}_{1}^{\textnormal{adj}},$ $\textsc{ACE}_{0}^{\textnormal{adj}},$ and $\textsc{ACE}^{\textnormal{adj}}$ are larger than or equal to $\textsc{ACE}^{\textnormal{unadj}}.$

Proof S17 (of Theorem 3.2).

The independence of $Z$ and $U$ implies that

[TABLE]

are non-decreasing in $z$ and $u.$ Therefore, according to Theorem 3.1 we need only to verify that $E(Y\mid A=a,Z=z)$ is non-increasing in $z$ for both $a=0$ and $1.$

Because $Z\perp\!\!\!\perp U$ and $\textup{pr}(A=1\mid Z=z,U=u)=\beta(z)+\gamma(u)$ with non-decreasing $\beta(z)$ and $\gamma(u)$ , we can apply Lemma S8, and conclude that $\partial F(u\mid A=a,Z=z)/\partial z\geq 0.$

Write the essential infimum and supremum of $U$ given $(A=a,Z=z)$ as $\underline{u}(a,z)$ and $\overline{u}(a)$ , with the latter depending only on $a$ according to Condition (c) of Theorem 3.2. Because $Y\perp\!\!\!\perp Z\mid(A,U)$ , integration or summation by parts gives

[TABLE]

Therefore, its derivative with respect to $z$ ,

[TABLE]

is smaller than or equal to zero, because $\partial m_{a}(u)/\partial u\geq 0$ for both $a=0$ and $1$ and for all $u$ .

Proof S18 (of Corollary 3.3).

According to Theorem 3.1 we need only to verify that $\mu_{a}(z)=E(Y\mid A=a,Z=z)$ is non-increasing in $z$ for both $a=0$ and $1.$ Following Lemma S6, for binary and independent $Z$ and $U,$ monotonicity and no additive interaction imply (S5), which, according to Bayes’ Theorem, is equivalent to

[TABLE]

The above inequalities (S7) and (S8) state that $Z$ and $U$ have negative association given each level of $A$ , and therefore $\textup{pr}(U=1\mid A=a,Z=z)$ is non-increasing in $z$ for both $a=1$ and $0.$

Because $m_{a}(1)\geq m_{a}(0)$ and

[TABLE]

we know that $\mu_{a}(z)$ is non-decreasing in $\textup{pr}(U=1\mid A=a,Z=z)$ . Therefore, $\mu_{a}(z)$ is non-increasing in $z$ for both $a=1$ and $0.$

Proof S19 (of Theorem 3.4).

Because of the independence of $Z$ and $U$ , we have $\textup{pr}(A=1\mid Z=z)=\beta(z)E\{\gamma(U)\}$ and $\textup{pr}(A=1\mid U=u)=E\{\beta(Z)\}\gamma(u)$ are non-decreasing in $z$ and $u.$ According to Lemma S12, the multiplicative model of $A$ also implies that for both $a=1$ and $0$ and for all $z$ and $u$ , $\partial F(u\mid A=a,Z=z)/\partial z\geq 0.$ Following exactly the same steps of the proof of Theorem 3.2, we can prove Theorem 3.4.

Proof S20 (of Corollary 3.5).

For binary and independent $Z$ and $U$ , monotonicity, no multiplicative interaction, and Lemma S10 imply

[TABLE]

With the above results in (S9), the rest of the proof is the same as the proof of Corollary 3.3.

Proof S21 (of Theorem 4.1).

First, we consider the treatment effect on the population under treatment. Taking $U=Y(0)$ in Lemma S4, we have $\textsc{ACE}^{\textnormal{unadj}}\geq\textsc{ACE}_{1}^{\textnormal{true}}$ , because $A\perp\!\!\!\perp Y(0)\mid Y(0)$ , $\textup{pr}\{A=1\mid Y(0)\}$ is non-decreasing in $Y(0)$ , and $E\{Y\mid A=0,Y(0)\}=Y(0)$ is non-decreasing in $Y(0)$ . The condition $\textnormal{cov}\{\Pi,E(Y\mid A=0,\Pi)\}\leq 0$ implies that $\textsc{ACE}_{1}^{\textnormal{adj}}\geq\textsc{ACE}^{\textnormal{unadj}}$ according to Lemma S14. Therefore, $\textsc{ACE}_{1}^{\textnormal{adj}}\geq\textsc{ACE}^{\textnormal{unadj}}\geq\textsc{ACE}_{1}^{\textnormal{true}}.$

Second, we take $U=Y(1)$ in Lemma S5, and by a similar argument as above we have $\textsc{ACE}_{0}^{\textnormal{adj}}\geq\textsc{ACE}^{\textnormal{unadj}}\geq\textsc{ACE}_{0}^{\textnormal{true}}.$

The conclusion holds because $\textsc{ACE}^{\textnormal{true}}=f\textsc{ACE}_{1}^{\textnormal{true}}+(1-f)\textsc{ACE}_{0}^{\textnormal{true}}$ and $\textsc{ACE}^{\textnormal{adj}}=f\textsc{ACE}_{1}^{\textnormal{adj}}+(1-f)\textsc{ACE}_{0}^{\textnormal{adj}}.$

Proof S22 (of Theorem 4.2).

Under the additive model of $A$ given $\Pi$ and $U=\{Y(1),Y(0)\}$ , we have the following results. First, $\textup{pr}(A=1\mid\Pi)=\Pi$ is increasing in $\Pi.$ Second, $\Pi\perp\!\!\!\perp\{Y(1),Y(0)\}$ implies

[TABLE]

Denote the infimum and supremum of $Y(0)$ given $Y(1)=y_{1}$ by $\underline{y}_{0}(y_{1})$ and $\overline{y}_{0}$ , with the latter not depending on $y_{1}$ according to Condition (c) of Theorem 4.2. Applying integration or summation by parts, we have

[TABLE]

The function $\widetilde{\delta}(y_{1})$ is non-decreasing in $y_{1}$ , because

[TABLE]

Third, following the same reasoning as the second argument, we have $\textup{pr}\{A=1\mid\Pi,Y(0)=y_{0}\}=\Pi+\widetilde{\eta}(y_{0}),$ with $\widetilde{\eta}(y_{0})$ being a non-decreasing function of $y_{0}.$ Fourth, $\Pi\perp\!\!\!\perp Y(1)$ implies $\textup{pr}\{A=1\mid Y(1)=y_{1}\}=f+\widetilde{\delta}(y_{1}),$ which is non-decreasing in $y_{1}.$ Fifth, $\Pi\perp\!\!\!\perp Y(0)$ implies $\textup{pr}\{A=1\mid Y(0)=y_{0}\}=f+\widetilde{\eta}(y_{0}),$ which is non-decreasing in $y_{0}.$

According to the fourth and fifth arguments above, Condition (a) in Theorem 4.1 holds. Therefore, we need only to verify Condition (b) in Theorem 4.1 to complete the proof.

We have shown that $\textup{pr}\{A=1\mid\Pi,Y(1)\}=\Pi+\widetilde{\delta}\{Y(1)\}$ , which is additive and non-decreasing in $\Pi$ and $Y(1)$ . According to Lemma S8, we know that

[TABLE]

for all $y_{1}$ and $\pi.$ We have also shown that $\textup{pr}\{A=1\mid\Pi,Y(0)\}=\Pi+\widetilde{\eta}\{Y(0)\}$ , which is additive and non-decreasing in $\Pi$ and $Y(0)$ . Again according to Lemma S8, we know that

[TABLE]

for all $y_{0}$ and $\pi.$ According to Xie et al. (2008), the above negative distributional associations in (S10) and (S11) imply the negative associations in expectation between $Y(0)$ and $\Pi$ given $A$ , as required by condition (b) of Theorem 4.1.

Proof S23 (of Corollary 4.4).

As shown in the proof of Theorem 4.2, the conclusion follows immediately from the five ingredients. We will show that they hold even if there is non-negative interaction between binary $Y(1)$ and $Y(0)$ . The following proof is in parallel with the proof of Theorem 4.2.

First, $\textup{pr}(A=1\mid\Pi)=\Pi$ is increasing in $\Pi.$ Second,

[TABLE]

The last equation in (S13) follows from the fact that $Y(1)$ is binary and the functional form must be linear in $y_{1}$ , where the coefficient is

[TABLE]

where (S14) follows from (S12), and (S15) follows from $\delta\geq 0$ and $\theta\geq 0.$ Because $\textsc{OR}_{Y}\geq 1$ , the potential outcomes have non-negative association, implying that their risk difference $\textsc{RD}_{Y}=\textup{pr}\{Y(0)=1\mid Y(1)=1\}-\textup{pr}\{Y(0)=1\mid Y(1)=0\}\geq 0$ . Therefore, $\widetilde{\delta}\geq 0$ , and $\textup{pr}\{A=1\mid\Pi,Y(1)\}$ is additive and non-decreasing in $\Pi$ and $Y(1)$ .

Third, similar to the second argument, we have $\textup{pr}\{A=1\mid\Pi,Y(0)=y_{0}\}=\Pi+\widetilde{\eta}[y_{0}-E\{Y(0)\}]$ with $\widetilde{\eta}\geq 0.$ Therefore, $\textup{pr}\{A=1\mid\Pi,Y(0)\}$ is additive and non-decreasing in $\Pi$ and $Y(0)$ . Fourth, $\Pi\perp\!\!\!\perp Y(1)$ implies that $\textup{pr}\{A=1\mid Y(1)\}=f+\widetilde{\delta}Y(1)$ is increasing in $Y(1).$ Fifth, $\Pi\perp\!\!\!\perp Y(0)$ implies that $\textup{pr}\{A=1\mid Y(0)\}=f+\widetilde{\eta}Y(0)$ is increasing in $Y(0).$

With these five ingredients, the rest of the proof is exactly the same as the proof of Theorem 4.2.

Proof S24 (of Theorem 4.6).

First, $\textup{pr}(A=1\mid\Pi)=\Pi$ is non-decreasing in $\Pi$ . Second,

[TABLE]

is multiplicative and non-decreasing in $\Pi$ and $y_{1}$ , following the same argument as the proof of Theorem 4.2. Third, $\textup{pr}\{A=1\mid\Pi,Y(0)=y_{0}\}=\Pi\widetilde{\eta}(y_{0})$ is multiplicative and non-decreasing in $\Pi$ and $y_{0}$ . Fourth, $\textup{pr}\{A=1\mid Y(1)=y_{1}\}=f\widetilde{\delta}(y_{1})$ is non-decreasing in $y_{1}$ . Fifth, $\textup{pr}\{A=1\mid Y(0)=y_{0}\}=f\widetilde{\eta}(y_{0})$ is non-decreasing in $y_{0}$ .

The multiplicative models and Lemma S12 imply that for all $\pi,y_{1}$ and $y_{0}$ ,

[TABLE]

The rest of the proof is the same as the proof of Theorem 4.2.

Proof S25 (of Corollary 4.7).

First, $\textup{pr}(A=1\mid\Pi)=\Pi$ is non-decreasing in $\Pi$ . Second,

[TABLE]

where the functional form must be multiplicative because of binary $Y(0)$ , and the parameter $\widetilde{\delta}$ is

[TABLE]

Because $\textsc{OR}_{Y}\geq 1$ , we have $\textup{pr}\{Y(0)=1\mid Y(1)=1\}\geq\textup{pr}\{Y(0)=1\mid Y(1)=0\}$ , which implies that $\widetilde{\delta}\geq 1.$ Therefore, $\textup{pr}\{A=1\mid\Pi,Y(1)\}$ is multiplicative and non-decreasing in $\Pi$ and $Y(1)$ . Third, we can similarly show that $\textup{pr}\{A=1\mid\Pi,Y(0)\}$ is multiplicative and non-decreasing in $\Pi$ and $Y(0).$ Fourth, $\textup{pr}\{A=1\mid Y(1)=y_{1}\}=\alpha f\widetilde{\delta}^{y_{1}}$ is non-decreasing in $y_{1}$ . Fifth, $\textup{pr}\{A=1\mid Y(0)=y_{0}\}=\alpha f\widetilde{\eta}^{y_{0}}$ is non-decreasing in $y_{0}$ .

The rest of the proof is the same as the proof of Theorem 4.6.

Proof S26 (of Theorem 6.1).

In Figure 4, $Z$ and $U$ are two independent confounders for the relationship between $A$ and $Y$ . Because $\textup{pr}(A=1\mid Z=z,U=u)$ and $E(Y\mid A=a,Z=z,U=u)$ are non-decreasing in $z$ and $u$ for both $a=0$ and $1$ , Lemmas S3–S5 imply that the unadjusted estimator, $\textsc{ACE}^{\textnormal{unadj}}$ , is larger than or equal to $\textsc{ACE}^{\textnormal{true}},\textsc{ACE}_{1}^{\textnormal{true}}$ and $\textsc{ACE}_{0}^{\textnormal{true}}$ .

The independence between $Z$ and $U$ implies $\textup{pr}(A=1\mid Z=z)=\int\textup{pr}(A=1\mid Z=z,U=u)F(\operatorname{d}\!{u})$ , and the monotonicity of $\textup{pr}(A=1\mid Z=z,U=u)$ in $z$ implies that $\textup{pr}(A=1\mid Z=z)$ is non-decreasing in $z$ . The rest of the proof is identical to the proof of Theorem 3.1.

Appendix 3. Extensions to Other Causal Measures

Appendix 3 $\cdot$ 1. Distributional Causal Effects

Sometimes we are also interested in estimating the distributional causal effects (Ju & Geng, 2010) for the treatment, control and whole populations:

[TABLE]

The unadjusted estimator is

[TABLE]

The adjusted estimators for the treatment, control and whole populations are

[TABLE]

If the outcome is binary, then the distributional causal effects at $y<1$ are the average causal effects, and zero at $y\geq 1$ . All results about distributional causal effects reduce to average causal effects for binary outcome. For a general outcome, the distributional causal effects are the average causal effects on the dichotomized outcome $I_{y}=I(Y>y).$ Therefore, if we replace the outcome $Y$ by $I_{y}$ in Theorems 3.1–3.4, the results about Z-Bias hold for distributional effects. For instance, the condition that $\textup{pr}(Y>y\mid A=a,U=u)$ is non-decreasing in $u$ for all $a$ is the same as requiring a non-negative sign on the arrow $U\rightarrow Y$ , according to the theory of signed directed acyclic graphs (VanderWeele & Robins, 2010). The following corollary states the results analogous to Theorems 4.1–4.6.

Corollary S27.

In the causal diagram of Figure 2, if for all $y$ and for both $a=1$ and $0$ ,

(a)

$\textup{pr}\{Y(a)>y\mid A=1\}\geq\textup{pr}\{Y(a)>y\mid A=0\}$ ;

(b)

$\textnormal{cov}\{\Pi,\textup{pr}(Y>y\mid A=a,\Pi)\}\leq 0$ ;

then

[TABLE]

Under the conditions of Theorems 4.2 and 4.6, (S17) holds.

Proof S28 (of Corollary S27).

Condition (a) of Corollary S27 is equivalent to $\textup{pr}\{A=1\mid I_{y}(a)=1\}\geq\textup{pr}\{A=1\mid I_{y}(a)=0\}$ , and Condition (b) of Corollary S27 is equivalent to $\textnormal{cov}\{\Pi,E(I_{y}\mid A=a,\Pi)\}\leq 0$ . Therefore, the conclusion follows from Theorem 4.1.

According to the proofs of Theorems 4.2 and 4.6, we have

[TABLE]

because of monotonicity of $\textup{pr}\{A=1\mid Y(a)\}$ in $Y(a)$ . Therefore, Condition (a) of Corollary S27 holds. Under the conditions of Theorems 4.2 and 4.6, we have also shown in (S10)–(S16) that for all $a,y$ and $\pi$ , $\partial\textup{pr}(Y\leq y\mid A=a,\Pi=\pi)/\partial\pi\geq 0,$ which implies that $E(I_{y}\mid A=a,\Pi=\pi)$ is non-increasing in $\pi$ . Therefore, Condition (b) of Corollary S27 holds. The proof is complete.

Appendix 3 $\cdot$ 2. Ratio Measures

In many applications with binary or positive outcomes, we are also interested in assessing causal effects on the ratio scale for the treatment, control and whole populations, defined as

[TABLE]

The unadjusted estimator on the ratio scale is

[TABLE]

The adjusted estimators on the ratio scale for the treatment, control and whole populations are

[TABLE]

With a general instrumental variable $Z$ , we can replace $Z$ by $\Pi$ in the definitions of the adjusted estimators.

Corollary S29.

All the theorems and corollaries in §§3 and 4 hold on the ratio scale, i.e., under their conditions,

[TABLE]

Proof S30 (of Corollary S29).

First, $\textsc{RR}^{\textnormal{true}}$ is a convex combination of $\textsc{RR}^{\textnormal{true}}_{1}$ and $\textsc{RR}^{\textnormal{true}}_{0}$ , and $\textsc{RR}^{\textnormal{adj}}$ is a convex combination of $\textsc{RR}^{\textnormal{adj}}_{1}$ and $\textsc{RR}^{\textnormal{adj}}_{0}$ , which are formally stated in Ding & VanderWeele (2016, eAppendix). Then the conclusion follows from the proofs of the theorems above.

Appendix 3 $\cdot$ 3. Average Over Observed Covariates

In practice, we need to adjust for the observed covariates $X$ that are confounders affecting both the treatment and outcome. The discussion in previous sections is conditional on or within strata of observed covariates $X$ , and the causal effects and their estimators are given $X$ . For example,

[TABLE]

and other conditional quantities can be analogously defined. If the conditions in the theorems and corollaries in §§3 and 4 hold within each level of $X$ , then the conclusions in (1) and (S17) hold not only within each level of $X$ but also averaged over $X$ . For example, for the average causal effects, we have

[TABLE]

Bibliography


Ali, M. S., Groenwold, R. H. & Klungel, O. H. (2014). Propensity score methods and unobserved covariate imbalance: Comments on “Squeezing the balloon”. Health Services Research 49, 1074–1082.
Bhattacharya, J. & Vogt, W. B. (2012). Do instrumental variables belong in propensity scores? Int. J. Stat. Econ. 9, 107–127.
Brookhart, M. A., Stürmer, T., Glynn, R. J., Rassen, J. & Schneeweiss, S. (2010). Confounding control in healthcare database research: challenges and potential approaches. Medical Care 48, S114–S120.
Brooks, J. M. & Ohsfeldt, R. L. (2013). Squeezing the balloon: Propensity scores and unmeasured covariate balance. Health Services Research 48, 1487–1507.
Chiba, Y. (2009). The sign of the unmeasured confounding bias under various standard populations. Biometrical Journal 51, 670–676.
Cochran, W. G. (1965). The planning of observational studies of human populations (with discussion). Journal of the Royal Statistical Society: Series A (General) 128, 234–266.
Cox, D. & Wermuth, N. (2003). A general condition for avoiding effect reversal after marginalization. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, 937–941.
d’Agostino, R. B. (1998). Tutorial in biostatistics: Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 17, 2265–2281.
