Which practical interventions does the do-operator refer to in causal inference? Illustration on the example of obesity and cancer

Lola Etievant; Vivian Viallon

arXiv:1901.00772·stat.ME·January 4, 2019

TL;DR

This paper examines the interpretation of the do-operator in causal inference, especially for exposures like obesity, by analyzing how interventions on causes of the exposure relate to the hypothetical intervention effect.

Contribution

It clarifies the conditions under which the effect of do(X=x) aligns with interventions on causes of X within structural causal models.

Findings

01

Effect of do(X=x) equals intervention on causes of X affecting outcome only through X

02

Interventions on causes affecting outcome through other pathways only partly captured by do(X=x)

03

In simple models, do(X=x) represents an indirect effect of interventions on causes W

Equations

$$\left\{\begin{array}{lcl}X&=&f_{X}(U)\\ Y&=&f_{Y}(X,\xi)\end{array}\right.$$

$$\left\{\begin{array}{lcl}X&=&f_{X}(V,\vartheta)\\ Y&=&f_{Y}(X,\xi)\end{array}\right.$$

$${\rm ACE}={\rm I}\kern-1.4pt{\rm P}(Y=1|X=1)-{\rm I}\kern-1.4pt{\rm P}(Y=1|X=0),$$

$$\{X=x_{0}\}$$

$$\left\{\begin{array}{lcl}X&=&f_{X}(U,W)\\ Y&=&f_{Y}(X,W,\xi)\end{array}\right.$$

$$\left\{\begin{array}{lcl}X&=&f_{X}(V,\vartheta,W,Z)\\ Y&=&f_{Y}(X,W,Z,\xi)\end{array}\right.$$

$$\begin{aligned}
{\rm I}\kern-1.4pt{\rm P}(Y=1|do(U=u_{x_{0}})) &= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(X^{(U=u_{x_{0}})},\xi)=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(x_{0},\xi)=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(Y^{(x_{0})}=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0})).
\end{aligned}$$

$$\begin{aligned}
{\rm I}\kern-1.4pt{\rm P}(Y=1|do(U=u_{x_{0}}(w_{0})),W=w_{0}) &= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(X^{(U=u_{x_{0}}(w_{0}))},W,\xi)=1|W=w_{0})\\
&= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(x_{0},w_{0},\xi)=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(Y^{(X=x_{0},W=w_{0})}=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0},W=w_{0}))\\
&= {\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}),W=w_{0}),
\end{aligned}$$

$$\begin{aligned}
{\rm I}\kern-1.4pt{\rm P}(Y=1|do(U=u_{x_{0}}(W))) &= \sum_{w_{0}}{\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}),W=w_{0})\,{\rm I}\kern-1.4pt{\rm P}(W=w_{0})\\
&= {\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0})).
\end{aligned}$$

$$\begin{aligned}
{\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=w_{x_{0}}(u_{0})),U=u_{0}) &= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(X^{(W=w_{x_{0}}(u_{0}))},w_{x_{0}}(u_{0}),\xi)=1|U=u_{0})\\
&= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(x_{0},w_{x_{0}}(u_{0}),\xi)=1|U=u_{0})\\
&= {\rm I}\kern-1.4pt{\rm P}(f_{Y}(x_{0},w_{x_{0}}(u_{0}),\xi)=1)\\
&= {\rm I}\kern-1.4pt{\rm P}(Y^{(X=x_{0},W=w_{x_{0}}(u_{0}))}=1).
\end{aligned}$$

$$\begin{aligned}
{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(U))}-Y^{(w_{0}(U))}) &= \sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),X^{(w_{1}(u))})}-Y^{(w_{0}(u),X^{(w_{0}(u))})}|U=u)\,{\rm I}\kern-1.4pt{\rm P}(U=u)\\
&= \sum_{u}\Big\{{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),X^{(w_{1}(u))})}-Y^{(w_{1}(u),X^{(w_{0}(u))})}|U=u)\\
&\qquad\quad+{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),X^{(w_{0}(u))})}-Y^{(w_{0}(u),X^{(w_{0}(u))})}|U=u)\Big\}\,{\rm I}\kern-1.4pt{\rm P}(U=u)\\
&= \sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),x_{1})}-Y^{(w_{1}(u),x_{0})}+Y^{(w_{1}(u),x_{0})}-Y^{(w_{0}(u),x_{0})})\,{\rm I}\kern-1.4pt{\rm P}(U=u).
\end{aligned}$$

$$\begin{aligned}
&\sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),x_{1})}-Y^{(w_{1}(u),x_{0})})\,{\rm I}\kern-1.4pt{\rm P}(U=u)\\
&\qquad=\sum_{u}\{{\rm I}\kern-1.4pt{\rm E}(Y|W=w_{1}(u),X=x_{1})-{\rm I}\kern-1.4pt{\rm E}(Y|W=w_{1}(u),X=x_{0})\}\,{\rm I}\kern-1.4pt{\rm P}(U=u).
\end{aligned}$$

$$\begin{aligned}
&{\rm I}\kern-1.4pt{\rm E}(Y|do(X=x_{1}))-{\rm I}\kern-1.4pt{\rm E}(Y|do(X=x_{0}))\\
&\qquad=\sum_{w}\{{\rm I}\kern-1.4pt{\rm E}(Y|W=w,X=x_{1})-{\rm I}\kern-1.4pt{\rm E}(Y|W=w,X=x_{0})\}\,{\rm I}\kern-1.4pt{\rm P}(W=w).
\end{aligned}$$

Taxonomy

Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life · Statistical Methods in Clinical Trials

Full text

Which practical interventions does the $do$-operator refer to in causal inference? Illustration on the example of obesity and cancer.

Lola Etievant (1) and Vivian Viallon (2)

(1) Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 43 boulevard du 11 novembre 1918, F-69622 Villeurbanne, France.

(2) International Agency for Research on Cancer, Nutritional Methodology and Biostatistics Group, Lyon, France.

Abstract

For exposures $X$ like obesity, no precise and unambiguous definition exists for the hypothetical intervention $do(X=x_{0})$. This has raised concerns about the relevance of causal effects estimated from observational studies for such exposures. Under the framework of structural causal models, we study how the effect of $do(X=x_{0})$ relates to the effect of interventions on causes of $X$. We show that for interventions focusing on causes of $X$ that affect the outcome through $X$ only, the effect of $do(X=x_{0})$ equals the effect of the considered intervention. On the other hand, for interventions on causes $W$ of $X$ that affect the outcome not only through $X$, we show that the effect of $do(X=x_{0})$ only partly captures the effect of the intervention. In particular, under simple causal models (e.g., linear models with no interaction), the effect of $do(X=x_{0})$ can be seen as an indirect effect of the intervention on $W$.

1 Introduction

Because most epidemiological results are derived from observational data, their causal interpretation has always been at the center of concern 1. Causal inference theory, which has attracted a lot of interest in the last few decades, has proved useful to formally describe conditions ensuring the causal validity of results derived from observational data 2, 3, 4, 5, 6, 7. For example, several sets of sufficient conditions have been established for the identifiability of causal effects in the presence of confounding or non-random selection. Under the so-called Structural Causal Models 3, 6 (SCMs), and further assuming that the structure of the underlying Directed Acyclic Graph (DAG) is known, a key condition for the identifiability of the causal effect is exchangeability, or ignorability 3, 6, 7. In particular, exchangeability has been shown to hold conditionally on any set of variables satisfying the back-door criterion 3, 6. A variety of statistical approaches have then been proposed for the estimation of causal effects under increasingly complex settings, including time-varying confounding, failure time data, etc. Among other approaches, we shall mention the parametric g-formula, inverse probability weighting approaches, g-estimation and doubly robust procedures 3, 7, 8.

Even if their use has been controversial 9, counterfactual variables, or potential outcomes, are key to most causal inference theories commonly considered nowadays in epidemiology, social science, statistics and computer science. The $do$-calculus that accompanies SCMs allows precise definitions of these variables and their joint distribution 3, 6. Here, we will use the notation $Y^{(X=x_{0})}$ to denote the counterfactual variable representing the outcome that would have been observed in the counterfactual world $\Omega^{(X=x_{0})}$ that would have followed the hypothetical intervention $do(X=x_{0})$, where $X$ is the exposure of interest and $x_{0}$ is any potential value for this exposure 3. For simplicity, we will focus on binary outcomes, and we let ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))={\rm I}\kern-1.4pt{\rm P}(Y^{(X=x_{0})}=1)$ denote the probability of observing the outcome in this counterfactual world.

For some exposures, the lack of a precise and unambiguous definition for the intervention $do(X=x_{0})$ has raised some concerns in the literature 10, 11, 12, 13, 14, 15, 16, 17, 18, 19. For example, consider the case where $X$ stands for a binary variable indicating obesity status at 20 years of age. In a population of lean teenagers, or even newborns, the hypothetical intervention $do(X=x_{0})$ , for $x_{0}=0$ (or $x_{0}=1$ ), could then correspond to a typically adaptive and dynamic intervention that would ensure that individuals stay lean (or get obese) by the age of 20. However, these interventions are not well-defined, in the sense that different “versions” may lead to the same obesity value $x_{0}$ at 20 years-old. For instance, in the “stay lean” arm ( $do(X=0)$ ), individuals may be asked to do 45 minutes of physical exercise a day, or 72 minutes of physical exercise a day. They could also be asked to adhere to a healthy diet, etc. In addition, some of the versions ensuring that $X=0$ at 20 years old may be impossible to apply in practice, such as those involving genetic factors.

More generally, this situation of a treatment with different versions, or compound treatment, violates the “no-multiple-versions-of-treatment assumption”, which is part of the “Stable Unit Treatment Value Assumption” (SUTVA) 20, 16. This has led to some debate around the relevance, for public health matters, of the causal effects estimated from observational studies in such cases. Interestingly, most arguments have considered the situation where “treatment precedes versions of that treatment”, while situations where “versions precede treatment” were only briefly mentioned, if at all 11, 12, 16. Here, we consider the situations where versions precede treatment, in which case these versions can be seen as particular levels for the causes of $X$. Then, focusing on situations where direct interventions on $X$ are impractical, we inspect how the effect of the hypothetical intervention $do(X=x_{0})$ relates to the effects of interventions on causes of $X$. We show that the effect of the hypothetical intervention $do(X=x_{0})$ equals the effect of particular interventions on causes of $X$ that are causes of $Y$ through $X$ only, as expected. However, for causes $W$ that influence $Y$ not only through $X$, the causal effect of $X$ differs from the causal effect of interventions on $W$. For example, in the particular case of obesity and cancer occurrence, the effect of $do(X=x_{0})$ is different from the effects of interventions on diet or physical activity, except for cancers whose risk is not directly associated with diet and/or physical activity.

To make our illustrative example even more concrete, we assume throughout that we intend to estimate the causal effect of obesity at 20 years of age on the occurrence of cancer by the age of 50. A typical prospective cohort study would sample individuals who are cancer-free at the age of 20, record information regarding their obesity status and other variables (potential confounders, etc.) at inclusion, follow these individuals over the age interval 20-50 and finally record cancer occurrence by the age of 50. Denote by $X\in\{0,1\}$ and $Y\in\{0,1\}$ the binary variables representing obesity at 20 and cancer occurrence between 20 and 50. For simplicity, we further assume the absence of competing events and censoring.

The rest of the article is organized as follows. Even if this is highly unlikely in our illustrative example, we start by considering the unconfounded setting where all causes of $X$ are causes of $Y$ through $X$ only. Then, in Section 3, we consider a more realistic setting where confounders are present. We shall stress that this second setting is still an over-simplified version of the causal model in our illustrative example (see the Discussion). Yet, we believe it is instructive to describe the relationship between the intervention $do(X=x_{0})$ and its multiple versions. Under both settings, we consider the situation where some causes are modifiable, while others are not. Section 4 presents some concluding remarks and discussion. Proofs of our main results are presented in the Appendix.

2 The unconfounded case

Because exposure $X$ is not randomized in our prospective cohort study, identifiability of the causal effect of $X$ on $Y$ is generally not guaranteed. A particular situation when this causal effect is identifiable is when all causes of $X$ , denoted by $U$ in this simple case, are causes of $Y$ through $X$ only. Even if this absence of confounders is highly unlikely in our illustrative example, it is instructive to consider this simple situation as a starting point. The more general situation where confounding is present is deferred to Section 3.

2.1 Preliminary derivations

Consider that the data available in our cohort study are generated by a causal model with associated DAG and structural equations as presented in Figure 1a. Variables $\xi$ and $U$ represent all causes of $Y$ other than $X$ and all causes of $X$, respectively, and are assumed to be independent of each other. Both $\xi$ and $U$ may include purely random components. Given the structural equations attached to this simple causal model, we have $\{X=x\}\Rightarrow\{Y=Y^{(x)}\}$, so that consistency holds. Moreover, under this causal model, the ignorability condition $Y^{(x)}\perp\!\!\!\perp X$ holds. Then, whenever the positivity condition further holds ($0<{\rm I}\kern-1.4pt{\rm P}(X=1)<1$), we have

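$${\rm ACE}={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=1))-{\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=0))={\rm I}\kern-1.4pt{\rm P}(Y=1|X=1)-{\rm I}\kern-1.4pt{\rm P}(Y=1|X=0),$$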

and the causal effect of $X$ on $Y$ is identifiable. But, when direct interventions on $X$ are impractical, and only interventions on the causes of $X$ are practical, a natural question is the meaning of the hypothetical intervention $do(X=x)$ . Consider the structural equation pertaining to exposure, $X=f_{X}(U)$ , and set $f_{X}^{-1}(x_{0})=\{u:f_{X}(u)=x_{0}\}$ . Of course, we have $X=x_{0}\Leftrightarrow U\in f_{X}^{-1}(x_{0})$ . As a result, for any $u_{x_{0}}\in f_{X}^{-1}(x_{0})$ , ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(U=u_{x_{0}}))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$ ; see Appendix A. In this simple case, all interventions $do(U=u_{x_{0}})$ on the causes of $X$ which would yield $X=x_{0}$ share the same effect on $Y$ : versions are irrelevant 11, 16, and the causal effect ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$ estimated on the cohort is an estimate of this shared effect.
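The equality between the effects of $do(U=u_{x_{0}})$ and $do(X=x_{0})$ can be checked by exact enumeration on a small discrete model. The sketch below is only an illustration: the distributions `P_U` and `P_XI` and the functions `f_X` and `f_Y` are hypothetical choices, not taken from the paper, and were picked so that $f_{X}^{-1}(1)$ contains two distinct values of $U$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model, for illustration only (not taken from the paper).
P_U = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}  # causes of X
P_XI = {k: Fraction(1, 4) for k in range(4)}                     # other causes of Y

def f_X(u):
    # X = 1 iff u >= 1, so f_X^{-1}(1) = {1, 2} and f_X^{-1}(0) = {0}
    return int(u >= 1)

def f_Y(x, xi):
    # Y depends on U only through X
    return int(xi < 1 + 2 * x)

def p_y_do_x(x0):
    """P(Y=1 | do(X=x0)): replace the equation for X by the constant x0."""
    return sum(p for xi, p in P_XI.items() if f_Y(x0, xi) == 1)

def p_y_do_u(u0):
    """P(Y=1 | do(U=u0)): fix U, then let X follow its structural equation."""
    return sum(p for xi, p in P_XI.items() if f_Y(f_X(u0), xi) == 1)

def p_y_given_x(x0):
    """Observational P(Y=1 | X=x0), by enumerating (U, xi)."""
    num = den = Fraction(0)
    for (u, pu), (xi, pxi) in product(P_U.items(), P_XI.items()):
        if f_X(u) == x0:
            den += pu * pxi
            num += pu * pxi * f_Y(x0, xi)
    return num / den

for u0 in (1, 2):  # the two "versions" of do(X=1) in this toy model
    print(u0, p_y_do_u(u0), p_y_do_x(1), p_y_given_x(1))
```

In this toy model both versions $u_{0}=1$ and $u_{0}=2$ lead to the same probability as $do(X=1)$, which also coincides with the observational quantity ${\rm I}\kern-1.4pt{\rm P}(Y=1|X=1)$. Exact fractions are used instead of simulation so that the equalities hold without sampling noise.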

2.2 Distinguishing modifiable and non-modifiable causes

To gain insight from a practical standpoint, the previous analysis can be slightly refined by decomposing causes of $X$ as $U=(V,\vartheta)$ where $V$ and $\vartheta$ correspond to sets of modifiable and non-modifiable causes of $X$, respectively. See Figure 1b. Because non-modifiable causes may affect modifiable ones, while the former are unlikely to be affected by the latter, we do not consider the possibility of an arrow pointing from $V$ to $\vartheta$ in Figure 1b. Causes $\vartheta$ are non-modifiable and the only interventions that could be practically set up are those on $V$. Denote the set of possible values of $\vartheta$ by ${\cal V}$. Then, for any $x\in\{0,1\}$ and $\nu\in{\cal V}$, set $f_{X|\vartheta}^{-1}(x;\nu)=\{v:f_{X}(v,\nu)=x\}$. First assume that this set is non-empty for any $x\in\{0,1\}$ and $\nu\in{\cal V}$: in other words, first assume that, for any $x\in\{0,1\}$, and for any value $\nu$ for the non-modifiable factors $\vartheta$, there exists some value $v$ of the modifiable factors $V$ such that $f_{X}(v,\nu)=x$. Now, for individuals such that $\vartheta=\nu_{0}$, for any $\nu_{0}\in{\cal V}$, we have $X=x_{0}\Leftrightarrow V\in f_{X|\vartheta}^{-1}(x_{0};\nu_{0})$. Therefore ${\rm I}\kern-1.4pt{\rm P}(Y^{(V=v_{x_{0}}(\nu_{0}))}=1|\vartheta=\nu_{0})={\rm I}\kern-1.4pt{\rm P}(Y=1|do(V=v_{x_{0}}(\nu_{0})),\vartheta=\nu_{0})={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$ for any $v_{x_{0}}(\nu_{0})\in f_{X|\vartheta}^{-1}(x_{0};\nu_{0})$. Denote by $do(V=v_{x_{0}}(\vartheta))$ any intervention which sets, for all individuals in the population, the value of $V$ according to the value $\nu_{0}$ of $\vartheta$, in such a way that for any individual with $\vartheta=\nu_{0}$, the intervention $do(V=v_{x_{0}}(\vartheta))$ sets $V$ to $v_{x_{0}}(\nu_{0})\in f_{X|\vartheta}^{-1}(x_{0};\nu_{0})$. Then, we have ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(V=v_{x_{0}}(\vartheta)))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$. In other words, versions are again irrelevant and any such intervention has the same effect on $Y$, which is ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(V=v_{x_{0}}(\vartheta)))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$.
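The construction of a dynamic intervention $do(V=v_{x_{0}}(\vartheta))$ can be sketched on a small discrete model. Everything below is a hypothetical illustration rather than the paper's model: `P_THETA` plays the role of a genetic predisposition, `V_VALUES` of an activity level, and `f_X` declares an individual lean only if the activity level is high enough given the predisposition.

```python
from fractions import Fraction

# Hypothetical toy model, for illustration only (not taken from the paper).
P_THETA = {0: Fraction(3, 4), 1: Fraction(1, 4)}  # non-modifiable cause of X
V_VALUES = (0, 1, 2)                              # modifiable cause of X
P_XI = {k: Fraction(1, 4) for k in range(4)}      # other causes of Y

def f_X(v, nu):
    # X = 1 (obese) unless the activity level v is at least 1 + nu
    return int(v < 1 + nu)

def f_Y(x, xi):
    return int(xi < 1 + 2 * x)

def inverse_image(x, nu):
    """f_{X|theta}^{-1}(x; nu) = {v : f_X(v, nu) = x}."""
    return {v for v in V_VALUES if f_X(v, nu) == x}

def p_y_do_x(x0):
    """P(Y=1 | do(X=x0))."""
    return sum(p for xi, p in P_XI.items() if f_Y(x0, xi) == 1)

def p_y_do_dynamic_v(rule):
    """P(Y=1 | do(V=rule(theta))), where rule maps each nu to a value of V."""
    total = Fraction(0)
    for nu, p_nu in P_THETA.items():
        x = f_X(rule[nu], nu)  # X follows its structural equation
        total += p_nu * p_y_do_x(x)
    return total

# Two versions of the "stay lean" intervention (x0 = 0):
least_effort = {nu: min(inverse_image(0, nu)) for nu in P_THETA}
most_effort = {nu: max(inverse_image(0, nu)) for nu in P_THETA}

print(least_effort, p_y_do_dynamic_v(least_effort))
print(most_effort, p_y_do_dynamic_v(most_effort))
print(p_y_do_x(0))
```

Here the least-effort rule assigns a different value of $V$ to each stratum of $\vartheta$, while the most-effort rule assigns the same value to everyone. Both yield the same probability as $do(X=0)$, because $V$ affects $Y$ through $X$ only in this model.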

Of course, unless there exists at least one value $v_{1}\in\cap_{\nu\in{\cal V}}f_{X|\vartheta}^{-1}(x_{0};\nu)$, only a dynamic, i.e. individual-specific, treatment can be adopted to attain this effect. For instance, consider the “stay lean” arm of the clinical trial mentioned in the Introduction. Because individuals may be more or less genetically predisposed to obesity, some individuals will have to make little effort to stay lean by the age of 20, while others will have to adopt a drastic diet and/or have intense physical activity, etc. We may stress that this heterogeneity among individuals is at the core of personalized (preventive) medicine and needs to be acknowledged, rather than discarded, in causal inference. Similarly, our cohort reflects this heterogeneity: individuals sharing the same obesity status $\{X=x_{0}\}$, for $x_{0}\in\{0,1\}$, can differ regarding $V$ and $\vartheta$. More precisely, for $x_{0}\in\{0,1\}$, set ${\cal V}(x_{0})=\{\nu\in{\cal V}:f_{X|\vartheta}^{-1}(x_{0};\nu)\neq\emptyset\}$. The lean and obese groups in our cohort are sampled from

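$$\{X=x_{0}\}=\bigcup_{\nu\in{\cal V}(x_{0})}\{\vartheta=\nu\}\cap\{V\in f_{X|\vartheta}^{-1}(x_{0};\nu)\}$$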

for $x_{0}=0$ and $x_{0}=1$ , respectively. Again, if the model of Figure 1b is correct, versions of the compound treatment obesity are not relevant 11, 16. Therefore, how the levels of the causes of “obesity at 20 years of age” are mixed up in the group of obese, or lean, individuals in our cohort is not relevant either: our cohort would return unbiased estimates for the quantity ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))={\rm I}\kern-1.4pt{\rm P}(Y=1|X=x_{0})$ , just as the clinical trial would. Then, the effect of the intervention $do(X=x_{0})$ can again be interpreted as the effect of any intervention on the causes of $X$ ensuring $X=x_{0}$ .

If, for some $x$, there exist some values $\nu_{1}\in{\cal V}$ of the non-modifiable variables $\vartheta$ such that the set $f_{X|\vartheta}^{-1}(x;\nu_{1})$ is empty, the intervention $do(X=x)$ is purely theoretical for individuals such that $\vartheta=\nu_{1}$ since no practical intervention could yield $X=x$ for them. However, under the assumptions of SCMs, and if the DAG of Figure 1b is correct, the effect of the hypothetical intervention $do(X=x_{0})$ can still be estimated from our cohort study even if no practical intervention ensuring $X=x_{0}$ exists for individuals with $\vartheta=\nu_{1}$. Indeed, we have ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}),\vartheta=\nu_{1})={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))={\rm I}\kern-1.4pt{\rm P}(Y=1|X=x_{0})$.

3 The more standard case with confounders

3.1 Preliminary analyses

We now turn our attention to the more common situation where confounding is present. Without loss of generality, assume that causes of $X$ are grouped in two sets, $W$ and $U$. Here, and as above, causes in $U$ are assumed to have an effect on $Y$ through $X$ only, while $W$ is the set of common causes of $X$ and $Y$, that is the set of confounders in the $X$-$Y$ relationship. In our illustrative example, $W$ could include gender, physical activity and dietary habit, while $U$ might include genetic predisposition to obesity. Figure 2a depicts the corresponding causal model. Assume for ease of notation that the set ${\cal W}$ of possible values for $W$ is discrete. Further recall that consistency still holds, and assume that $0<{\rm I}\kern-1.4pt{\rm P}(X=1|W=w)<1$ for all $w$ such that ${\rm I}\kern-1.4pt{\rm P}(W=w)>0$. Then, because $Y^{(x)}\perp\!\!\!\perp X|W$ under the model depicted in Figure 2a, the causal effect of $X$ on $Y$ is identifiable. More precisely, we have

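$${\rm ACE}={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=1))-{\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=0))=\sum_{w\in{\cal W}}\{{\rm I}\kern-1.4pt{\rm P}(Y=1|W=w,X=1)-{\rm I}\kern-1.4pt{\rm P}(Y=1|W=w,X=0)\}\,{\rm I}\kern-1.4pt{\rm P}(W=w).$$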
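Identification by adjustment for $W$ can be illustrated by exact enumeration. The sketch below assumes a hypothetical discrete model (the distributions and the functions `f_X` and `f_Y` are illustrative choices, not the paper's) in which $W$ affects both $X$ and $Y$. It compares the unadjusted contrast, the contrast standardized over $W$, and the contrast computed directly from the structural equations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model, for illustration only (not taken from the paper).
P_U = {0: Fraction(3, 4), 1: Fraction(1, 4)}   # cause of Y through X only
P_W = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # confounder
P_XI = {k: Fraction(1, 8) for k in range(8)}   # other causes of Y

def f_X(u, w):
    return (u + w) % 2

def f_Y(x, w, xi):
    return int(xi < 1 + 2 * x + 3 * w)

# Observational joint distribution of (X, W, Y), by enumerating (U, W, xi).
joint = {}
for (u, pu), (w, pw), (xi, pxi) in product(P_U.items(), P_W.items(), P_XI.items()):
    x = f_X(u, w)
    key = (x, w, f_Y(x, w, xi))
    joint[key] = joint.get(key, Fraction(0)) + pu * pw * pxi

def prob(pred):
    return sum(p for (x, w, y), p in joint.items() if pred(x, w, y))

def p_y_given_x(x0):
    """Unadjusted P(Y=1 | X=x0)."""
    return prob(lambda x, w, y: x == x0 and y == 1) / prob(lambda x, w, y: x == x0)

def p_y_adjusted(x0):
    """Standardization over W: sum_w P(Y=1 | W=w, X=x0) P(W=w)."""
    total = Fraction(0)
    for w0 in P_W:
        p_xw = prob(lambda x, w, y: x == x0 and w == w0)
        p_yxw = prob(lambda x, w, y: x == x0 and w == w0 and y == 1)
        total += p_yxw / p_xw * prob(lambda x, w, y: w == w0)
    return total

def p_y_do_x(x0):
    """P(Y=1 | do(X=x0)) from the structural equations."""
    return sum(pw * pxi * f_Y(x0, w, xi)
               for (w, pw), (xi, pxi) in product(P_W.items(), P_XI.items()))

naive_ace = p_y_given_x(1) - p_y_given_x(0)
adjusted_ace = p_y_adjusted(1) - p_y_adjusted(0)
true_ace = p_y_do_x(1) - p_y_do_x(0)
print(naive_ace, adjusted_ace, true_ace)
```

In this toy model the standardized contrast matches the contrast obtained by intervening on $X$ in the structural equations, while the unadjusted contrast is larger because $W$ raises both the probability of $X=1$ and the probability of $Y=1$.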

But, again, a natural question is how the hypothetical intervention $do(X=x)$ relates to interventions on causes of $X$. Neglecting for now issues related to the possibility of applying these interventions in practice, these interventions can concern either $(i)$ $U$ only, $(ii)$ $W$ only, or $(iii)$ both $U$ and $W$.

First consider interventions on $U$ only and set, for any $x\in\{0,1\}$ and $w\in{\cal W}$ , $f_{X|W}^{-1}(x;w)=\{u:f_{X}(u,w)=x\}$ . For any $w_{0}\in{\cal W}$ , we have $X=x_{0}\Leftrightarrow U\in f_{X|W}^{-1}(x_{0};w_{0})$ for individuals belonging to stratum $W=w_{0}$ . Then, assume that $f_{X|W}^{-1}(x_{0};w_{0})$ is non-empty for all $(x_{0},w_{0})\in\{0,1\}\times{\cal W}$ and denote by $do(U=u_{x_{0}}(W))$ any intervention setting $U$ to any value $u_{x_{0}}(w_{0})\in f_{X|W}^{-1}(x_{0};w_{0})$ for individuals in stratum $W=w_{0}$ , for all $w_{0}\in{\cal W}$ . Arguing as in Section 2.2, we get ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(U=u_{x_{0}}(W)))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$ ; see Section B.1 in the Appendix. Again, versions are irrelevant, and any such intervention has the same effect on $Y$ , which is ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$ .
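The claim that interventions on $U$ only reproduce the effect of $do(X=x_{0})$, whatever the version chosen within each stratum of $W$, can be checked on a hypothetical discrete model. The values `U_VALUES`, the distributions and the functions `f_X` and `f_Y` below are illustrative assumptions, not the paper's model.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model, for illustration only (not taken from the paper).
U_VALUES = (0, 1, 2)                           # cause of Y through X only
P_W = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # confounder
P_XI = {k: Fraction(1, 8) for k in range(8)}   # other causes of Y

def f_X(u, w):
    return (u + w) % 2

def f_Y(x, w, xi):
    return int(xi < 1 + 2 * x + 3 * w)

def inverse_image_u(x, w):
    """f_{X|W}^{-1}(x; w) = {u : f_X(u, w) = x}, returned as a sorted list."""
    return sorted(u for u in U_VALUES if f_X(u, w) == x)

def p_y_do_x(x0):
    """P(Y=1 | do(X=x0)); W keeps its natural distribution."""
    return sum(pw * pxi * f_Y(x0, w, xi)
               for (w, pw), (xi, pxi) in product(P_W.items(), P_XI.items()))

def p_y_do_u_rule(rule):
    """P(Y=1 | do(U=rule(W))): U is set from W, X follows f_X, Y follows f_Y."""
    return sum(pw * pxi * f_Y(f_X(rule[w], w), w, xi)
               for (w, pw), (xi, pxi) in product(P_W.items(), P_XI.items()))

x0 = 1
rule_a = {w: inverse_image_u(x0, w)[0] for w in P_W}   # smallest admissible u
rule_b = {w: inverse_image_u(x0, w)[-1] for w in P_W}  # largest admissible u
print(rule_a, p_y_do_u_rule(rule_a))
print(rule_b, p_y_do_u_rule(rule_b))
print(p_y_do_x(x0))
```

The two rules differ in stratum $W=1$, where two values of $U$ lead to $X=1$, yet they give the same probability as $do(X=1)$. This is expected in the toy model since $U$ enters `f_Y` only through $X$.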

Now consider interventions on $W$ only. Denote by ${\cal U}$ the set of possible values of $U$ and set, for any $x\in\{0,1\}$ and $u\in{\cal U}$, $f_{X|U}^{-1}(x;u)=\{w:f_{X}(u,w)=x\}$. Then, assume that $f_{X|U}^{-1}(x;u)$ is non-empty for every $(x,u)\in\{0,1\}\times{\cal U}$, and for any $u_{0}\in{\cal U}$, denote by $w_{x_{0}}(u_{0})$ one given element of $f_{X|U}^{-1}(x_{0};u_{0})$. Given this particular collection of values $(w_{x_{0}}(u))_{u\in{\cal U}}$, denote by $do(W=w_{x_{0}}(U))$ the intervention which sets $W$ to $w_{x_{0}}(u_{0})$ for individuals in stratum $U=u_{0}$, for all $u_{0}\in{\cal U}$. Arguing as before, it follows that ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=w_{x_{0}}(U)))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0},W=w_{x_{0}}(U)))$, which generally differs from ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$. The intervention $do(W=w_{x_{0}}(U))$ does entail $X=x_{0}$ for all individuals, but because $W$ has an effect on $Y$ not only through $X$, the effect of $do(W=w_{x_{0}}(U))$ is not entirely captured by that of $do(X=x_{0})$. Actually, $X$ can be seen as a mediator in the $W$-$Y$ relationship, and, under simple models, in particular in the absence of interaction between $X$ and $W$, the effect of $do(X=x_{0})$ is related to the indirect effect of the intervention $do(W=w_{x_{0}}(U))$, through $X$; see Section B.3 in the Appendix. It is also important to note that ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=w_{x_{0}}(U)))$ depends on the collection of values $(w_{x_{0}}(u))_{u\in{\cal U}}$. If $w_{0}$ and $\tilde{w}_{0}$ are two distinct elements of $f_{X|U}^{-1}(x_{0};u_{0})$ for some $u_{0}\in{\cal U}$, then ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=w_{0}),U=u_{0})={\rm I}\kern-1.4pt{\rm P}(Y^{(W=w_{0},X=x_{0})}=1)$, while ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=\tilde{w}_{0}),U=u_{0})={\rm I}\kern-1.4pt{\rm P}(Y^{(W=\tilde{w}_{0},X=x_{0})}=1)$. The difference between these two quantities is related to the direct effect of $W$, and reflects the fact that two interventions on $W$ sharing the same effect on $X$ do not necessarily have the same effects on $Y$ when $W$ has a direct effect on $Y$: in this case, versions of the compound treatment are relevant.
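The relevance of versions for interventions on $W$ can be made concrete with a hypothetical discrete model in which one stratum of $U$ admits two values of $W$ leading to $X=1$. The distributions, `f_X`, `f_Y` and the two rules `rule_a` and `rule_b` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model, for illustration only (not taken from the paper).
P_U = {0: Fraction(3, 4), 1: Fraction(1, 4)}     # cause of Y through X only
P_W = {w: Fraction(1, 3) for w in range(3)}      # confounder with 3 levels
P_XI = {k: Fraction(1, 8) for k in range(8)}     # other causes of Y

def f_X(u, w):
    return (u + w) % 2

def f_Y(x, w, xi):
    # W has a direct effect on Y, in addition to its effect through X
    return int(xi < 1 + 2 * x + 2 * w)

def p_y_do_x(x0):
    """P(Y=1 | do(X=x0)); W keeps its natural distribution."""
    return sum(pw * pxi * f_Y(x0, w, xi)
               for (w, pw), (xi, pxi) in product(P_W.items(), P_XI.items()))

def p_y_do_w_rule(rule):
    """P(Y=1 | do(W=rule(U))): W is set from U, X follows f_X, Y follows f_Y."""
    total = Fraction(0)
    for (u, pu), (xi, pxi) in product(P_U.items(), P_XI.items()):
        w = rule[u]
        total += pu * pxi * f_Y(f_X(u, w), w, xi)
    return total

# Two versions of an intervention on W ensuring X = 1 for everyone:
# f_{X|U}^{-1}(1; 0) = {1} and f_{X|U}^{-1}(1; 1) = {0, 2}.
rule_a = {0: 1, 1: 0}
rule_b = {0: 1, 1: 2}
ensures_x1 = all(f_X(u, rule[u]) == 1 for rule in (rule_a, rule_b) for u in P_U)

print(ensures_x1, p_y_do_w_rule(rule_a), p_y_do_w_rule(rule_b), p_y_do_x(1))
```

Both rules force $X=1$ in every stratum of $U$, but in this toy model they lead to different probabilities of $Y=1$, and neither equals the probability under $do(X=1)$. The gap between the two rules comes entirely from the direct term in `f_Y`.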

Now, if $f_{X|U}^{-1}(x;u)$ is empty for some $(x,u)\in\{0,1\}\times{\cal U}$ , then no intervention on $W$ only can ensure $X=x$ for individuals in stratum $U=u$ . Similarly, if $f_{X|W}^{-1}(x;w)$ is empty for some pair $(x,w)$ , then no intervention on $U$ only can ensure $X=x$ for individuals in stratum $W=w$ . Then, consider interventions on both $W$ and $U$ , and set $f_{X}^{-1}(x)=\{(w,u):f_{X}(u,w)=x\}$ . For any $(w_{0},u_{0})\in f_{X}^{-1}(x_{0})$ , it is easy to show that ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(W=w_{0},U=u_{0}))={\rm I}\kern-1.4pt{\rm P}(Y^{(W=w_{0},X=x_{0})}=1)$ . Therefore, interventions on both $W$ and $U$ that ensure $X=x_{0}$ are similar to interventions on $W$ only: their effects are generally not uniquely defined (they depend on the particular pair of values $(w_{0},u_{0})\in f_{X}^{-1}(x_{0})$ ) and only partly capture the effect of interventions on $X$ .

3.2 Distinguishing modifiable and non-modifiable causes

All the analyses above can be refined by acknowledging that some causes in $U$ and $W$ are modifiable, while others are not, and by considering interventions on modifiable causes only. See Figure 2b. Compared to Section 3.1, notations become a little more complex, but conclusions remain mostly similar. For instance, consider interventions on both $V$ and $W$, where $V$ is a modifiable cause of $X$ with no direct effect on $Y$, while $W$ is a modifiable confounder in the $X$-$Y$ relationship. For any $x_{0}\in\{0,1\}$ and any potential values $\nu$ and $z$ for non-modifiable causes $\vartheta$ and $Z$, assume that the set $f_{X|\vartheta,Z}^{-1}(x_{0};\nu,z)=\{(v,w):f_{X}(v,\nu,w,z)=x_{0}\}$ is non-empty, and denote by $(v_{x_{0}}(\nu,z),w_{x_{0}}(\nu,z))$ one given element in this set. Then denote by $do(V=v_{x_{0}}(\vartheta,Z),W=w_{x_{0}}(\vartheta,Z))$ the intervention setting $V$ to $v_{x_{0}}(\nu_{0},z_{0})$ and $W$ to $w_{x_{0}}(\nu_{0},z_{0})$ for all individuals in stratum $\{\vartheta=\nu_{0}\}\cap\{Z=z_{0}\}$, for all $\nu_{0},z_{0}$. Arguing as before, it can be shown that ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(V=v_{x_{0}}(\vartheta,Z),W=w_{x_{0}}(\vartheta,Z)))={\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0},W=w_{x_{0}}(\vartheta,Z)))$. This quantity generally differs from ${\rm I}\kern-1.4pt{\rm P}(Y=1|do(X=x_{0}))$, and the reason again is that the intervention $do(V=v_{x_{0}}(\vartheta,Z),W=w_{x_{0}}(\vartheta,Z))$ not only ensures that $X=x_{0}$, but also has a direct effect on $Y$ through the intervention on $W$.

4 Conclusion-Discussion

In this article, we showed how the hypothetical intervention $do(X=x_{0})$ , when impossible to apply in practice, relates to interventions on causes of $X$ . Basing our arguments on structural causal models, our conclusions are in line with those of Petersen 12: the DAG which represents our assumptions on the causal model under study is basically sufficient (and necessary) to precisely understand how $do(X=x_{0})$ can be interpreted. When interventions on causes of $X$ that are causes of $Y$ through $X$ only exist, the effect of $do(X=x_{0})$ captures the effect of such interventions. However, for causes of $X$ , say $W$ , that cause $Y$ not only through $X$ , the effect of $do(X=x_{0})$ only partly captures the effect of interventions on $W$ . Under simple causal models, the effect of $do(X=x_{0})$ is related to the indirect effect of interventions on $W$ .
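As a worked illustration of the last statement, consider a hypothetical linear outcome model with no interaction. The coefficients $\alpha$, $\beta$, $\gamma$, the scalar numeric coding of $W$ and the continuous outcome are assumptions made for this sketch only; $w_{1}(u)$ and $w_{0}(u)$ denote values of $W$ ensuring $X=x_{1}$ and $X=x_{0}$, respectively, in stratum $U=u$.

```latex
% Assumed (hypothetical) outcome equation: no X-W interaction,
% xi independent of (U, W) with mean zero.
\begin{aligned}
Y &= \alpha + \beta X + \gamma W + \xi, \qquad
Y^{(w,x)} = \alpha + \beta x + \gamma w + \xi, \\[4pt]
% Contrast between the two interventions on W, split into two terms
{\rm I\!E}\big(Y^{(w_{1}(U))} - Y^{(w_{0}(U))}\big)
  &= \sum_{u} {\rm I\!E}\big(Y^{(w_{1}(u),x_{1})} - Y^{(w_{1}(u),x_{0})}\big)\,{\rm I\!P}(U=u)
   + \sum_{u} {\rm I\!E}\big(Y^{(w_{1}(u),x_{0})} - Y^{(w_{0}(u),x_{0})}\big)\,{\rm I\!P}(U=u) \\
  &= \underbrace{\beta\,(x_{1}-x_{0})}_{\text{indirect, through } X}
   + \underbrace{\gamma\,{\rm I\!E}\big(w_{1}(U)-w_{0}(U)\big)}_{\text{direct}}, \\[4pt]
% Contrast for the hypothetical intervention on X
{\rm I\!E}\big(Y \mid do(X=x_{1})\big) - {\rm I\!E}\big(Y \mid do(X=x_{0})\big)
  &= \beta\,(x_{1}-x_{0}).
\end{aligned}
```

Under these assumptions the contrast for $do(X=x_{1})$ versus $do(X=x_{0})$ coincides with the indirect component of the contrast between the two interventions on $W$, and the two contrasts agree only when $\gamma=0$ or when $w_{1}(U)$ and $w_{0}(U)$ have the same mean.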

Taking the example of obesity (at 20 years old) and the risk of cancer (by the age of 50), our results confirm concerns raised by several authors 16, 19, 11: because most modifiable causes of obesity can be regarded as confounders in the obesity-cancer relationship, the effect of obesity estimated from observational data likely differs from the effect of interventions on these causes, which could be estimated through clinical trials. At this point, however, we may insist on the fact that, if all modifiable causes of obesity are confounders in the obesity-cancer relationship, then clinical trials would not yield an estimate of the effect of obesity on cancer. Instead, a clinical trial would return an estimate of the causal effect of the considered intervention on cancer, and this effect would only partly capture the effect of obesity. Consider again the clinical trial sketched in the Introduction. More precisely, consider a randomized clinical trial where the study population, corresponding, e.g. to lean teenagers, is randomly assigned to two arms. Denote by $U$ and $Z$ the other, possibly non-modifiable, causes of $X$ , with $Z$ corresponding to common causes of $Y$ and $X$ , and $U$ corresponding to causes of $Y$ through $X$ only. 
In this setting, observe that $Y^{X=x}\not\perp\!\!\!\perp W$ while $Y^{X=x}\perp\!\!\!\perp(W,Z)$ in general. Denote by ${\cal U}$ and ${\cal Z}$ the sets of possible values for $U$ and $Z$, respectively. Then, an “ideal” clinical trial would consist in randomly assigning individuals to one of the following two groups: those for whom $W$ would be set to $w_{1}(U,Z)$ and those for whom $W$ would be set to $w_{0}(U,Z)$, for two given collections of values $(w_{0}(u,z))_{u\in{\cal U},z\in{\cal Z}}$ and $(w_{1}(u,z))_{u\in{\cal U},z\in{\cal Z}}$, where $w_{0}(u,z)$ and $w_{1}(u,z)$ ensure that $X=0$ and $X=1$, respectively, for individuals with $U=u$ and $Z=z$. Assuming complete compliance, and arguing as in Section 3, it is easy to show that the comparison of these two groups would return an estimate of the effect of this particular intervention on $W$, not that of $X$.
Comparisons should be made between groups of individuals sharing the same value for $W$ and $Z$ to obtain a valid estimate of the effect of obesity, within strata defined by $W$ and $Z$. In other words, under this ideal clinical trial setting, non-modifiable confounders in the $X-Y$ relationship would still have to be measured and controlled for to obtain an unbiased estimate of the causal effect of obesity, within strata defined by $W$ and $Z$. When a sufficient set of confounders is controlled for, analyses based on observational studies can be used to derive unbiased estimates of these same effects.
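As a rough numerical illustration of the argument above, the following sketch enumerates a toy model exactly. Everything in it is an assumption made for illustration and is not taken from the paper: binary $U$ and $Z$ that are independent and uniform, the threshold rule `f_x`, the linear outcome mean `mean_y` with coefficients `A_W`, `B_X` and `C_Z`, and the assignment rules `w0` and `w1`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical coefficients of a linear outcome model without interaction:
# E[Y^{(w,x)} | Z=z] = A_W*w + B_X*x + C_Z*z
A_W, B_X, C_Z = 1, 3, 5

# Hypothetical joint distribution of (U, Z): independent and uniform on {0,1}^2.
STRATA = {(u, z): Fraction(1, 4) for u, z in product((0, 1), repeat=2)}


def f_x(w, u, z):
    """Toy structural function for X (assumption): X = 1 iff w + u + z >= 3."""
    return 1 if w + u + z >= 3 else 0


def mean_y(w, x, z):
    """Mean outcome when W=w and X=x in stratum Z=z (toy linear model)."""
    return A_W * w + B_X * x + C_Z * z


def w1(u, z):
    """Value of W that forces X = 1 in stratum (u, z)."""
    return 3 - u - z


def w0(u, z):
    """Value of W that forces X = 0 in stratum (u, z)."""
    return 0


def arm_mean(w_rule):
    """Mean outcome in a trial arm where W is set to w_rule(U, Z)."""
    total = Fraction(0)
    for (u, z), p in STRATA.items():
        w = w_rule(u, z)
        total += mean_y(w, f_x(w, u, z), z) * p
    return total


def trial_contrast():
    """Difference in mean outcome between the two arms of the 'ideal' trial."""
    return arm_mean(w1) - arm_mean(w0)


def stratum_effect_of_x(w, z):
    """Effect of X on the mean outcome with W=w and Z=z held fixed."""
    return mean_y(w, 1, z) - mean_y(w, 0, z)


if __name__ == "__main__":
    print("trial contrast:", trial_contrast())
    print("effect of X within a (W, Z) stratum:", stratum_effect_of_x(2, 1))
```

In this toy model the arm comparison mixes the effect of $X$ with the direct effect of the change in $W$, whereas holding $W$ and $Z$ fixed isolates the coefficient of $X$. With other choices of `w0` and `w1` the trial contrast would change, while the within-stratum effect would not.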

There are a number of subtleties that we neglected for the sake of simplicity. First, a clinical trial whose objective is to prevent obesity by the age of 20 would typically be not only dynamic but also adaptive, i.e., the intervention is not only subject-specific but also time-dependent. A good example is the Feeding Dynamic Intervention to prevent childhood obesity (https://clinicaltrials.gov/ct2/show/NCT01515254). Similarly, although we focused on time-fixed exposure and confounders, they are all time-varying in the population. For instance, physical activity and food intake vary over the age interval $[0,20)$, and the corresponding variables are all potential confounders in the relationship between obesity at 20 years old and cancer occurrence before 50 years old. Another important time-varying cause of obesity at 20 years old is obesity over the age interval $[0,19)$. Consequently, individuals in the two groups of our cohort, obese and lean at 20 years old, differ not only in their obesity status at 20 years of age but typically also in their histories of obesity, physical activity and dietary habits. This can lead to biases if these histories are not appropriately accounted for in the analysis 21. Second, selection bias may also be at play in our cohort study since only individuals who are cancer-free at 20 can be included. This selection bias will be more severe if cancer risk before 20 years old is associated with levels of obesity, physical activity and dietary habits over the age interval [0, 19]. This selection bias due to prevalent exposure and depletion of susceptibles has been put forward as one of the reasons explaining the discrepancies between results obtained through observational and interventional data when studying, for instance, the association between hormone replacement therapy and coronary heart disease 22.

Appendix A Proof in the unconfounded case

Under the model depicted in Figure 1a, we have

[TABLE]

Appendix B Proof in the confounded case

B.1 Interventions of type $(i)$

Assume that $f_{X|W}^{-1}(x_{0};w_{0})$ is non-empty for any $x_{0},w_{0}$. Then, under the model depicted in Figure 2a, we have, for any $u_{x_{0}}(w_{0})\in f_{X|W}^{-1}(x_{0};w_{0})$

[TABLE]

where the last equality follows from rule 2 of the do-calculus 3.

Moreover,

[TABLE]

B.2 Interventions of type $(ii)$

Assume that $f_{X|U}^{-1}(x_{0};u_{0})$ is non-empty for any $x_{0},u_{0}$. Then, under the model depicted in Figure 2a, we have, for any $w_{x_{0}}(u_{0})\in f_{X|U}^{-1}(x_{0};u_{0})$

[TABLE]

B.3 Relationship with indirect effects

Denote by $(w_{1}(u_{0}),w_{0}(u_{0}))_{u_{0}\in{\cal U}}$ two given collections of values such that $w_{1}(u_{0})\in f_{X|U}^{-1}(1;u_{0})$ and $w_{0}(u_{0})\in f_{X|U}^{-1}(0;u_{0})$. Further, let $do(W=w_{1}(U))$ and $do(W=w_{0}(U))$ denote the two interventions setting $W$ to $w_{1}(u_{0})$ and $w_{0}(u_{0})$, respectively, for individuals in stratum $U=u_{0}$, for all $u_{0}\in{\cal U}$. We have

[TABLE]

The term $\sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),x_{1})}-Y^{(w_{1}(u),x_{0})}){\rm I}\kern-1.4pt{\rm P}(U=u)$ can be regarded as an indirect effect since the level of $W$ is held fixed and only the value of $X$ changes from $x_{0}$ to $x_{1}$, which, for individuals in stratum $U=u$, equal $X^{(W=w_{0}(u))}$ and $X^{(W=w_{1}(u))}$, respectively. More precisely, we have

[TABLE]

Under the model depicted in Figure 2a, recall we have

[TABLE]

Under simple causal models, for instance when $f_{Y}(W,X,\xi)=\alpha^{T}W+\beta X+\xi$, the two quantities $\sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),x_{1})}-Y^{(w_{1}(u),x_{0})}){\rm I}\kern-1.4pt{\rm P}(U=u)$ and ${\rm I}\kern-1.4pt{\rm E}(Y|do(X=x_{1}))-{\rm I}\kern-1.4pt{\rm E}(Y|do(X=x_{0}))$ coincide and equal $\beta$. However, under more complex models, these two quantities are typically different. Even under linear models, if interaction terms of the form $\gamma^{T}WX$ are present in function $f_{Y}$, these two terms are typically different and $\sum_{u}{\rm I}\kern-1.4pt{\rm E}(Y^{(w_{1}(u),x_{1})}-Y^{(w_{1}(u),x_{0})}){\rm I}\kern-1.4pt{\rm P}(U=u)$ would actually depend on the collection of values $\{w_{1}(u),u\in{\cal U}\}$.
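The comparison above can be checked on a small example. The sketch below is only illustrative and rests on assumptions not stated in the paper: a scalar $W$, binary $U$, $x_{0}=0$ and $x_{1}=1$, a noise term $\xi$ with mean zero that is independent of $W$, the hypothetical distributions `P_U` and `P_W`, and two hypothetical collections `W1_A` and `W1_B` assumed to satisfy $w_{1}(u)\in f_{X|U}^{-1}(1;u)$. It also assumes that ${\rm I}\kern-1.4pt{\rm E}(Y|do(X=x))$ is obtained by averaging ${\rm I}\kern-1.4pt{\rm E}(Y^{(w,x)})$ over the distribution of $W$.

```python
from fractions import Fraction


def mean_y(w, x, alpha, beta, gamma):
    """E[Y^{(w,x)}] for f_Y = alpha*W + beta*X + gamma*W*X + xi, with E[xi] = 0."""
    return alpha * w + beta * x + gamma * w * x


def indirect_term(w1, p_u, alpha, beta, gamma, x0=0, x1=1):
    """sum_u E(Y^{(w1(u),x1)} - Y^{(w1(u),x0)}) P(U=u)."""
    return sum(
        (mean_y(w1[u], x1, alpha, beta, gamma)
         - mean_y(w1[u], x0, alpha, beta, gamma)) * p
        for u, p in p_u.items()
    )


def do_x_contrast(p_w, alpha, beta, gamma, x0=0, x1=1):
    """E(Y | do(X=x1)) - E(Y | do(X=x0)), averaging over the distribution of W."""
    return sum(
        (mean_y(w, x1, alpha, beta, gamma)
         - mean_y(w, x0, alpha, beta, gamma)) * p
        for w, p in p_w.items()
    )


# Hypothetical distributions and intervention rules (illustration only).
P_U = {0: Fraction(1, 4), 1: Fraction(3, 4)}
P_W = {1: Fraction(1, 2), 3: Fraction(1, 2)}
W1_A = {0: 1, 1: 3}  # one collection (w1(u)) assumed to force X = 1
W1_B = {0: 3, 1: 3}  # another such collection

if __name__ == "__main__":
    alpha, beta = 2, 3
    for gamma in (0, 1):
        print("gamma =", gamma)
        print("  indirect term, rule A:", indirect_term(W1_A, P_U, alpha, beta, gamma))
        print("  indirect term, rule B:", indirect_term(W1_B, P_U, alpha, beta, gamma))
        print("  do(X) contrast:       ", do_x_contrast(P_W, alpha, beta, gamma))
```

With `gamma = 0` all three quantities equal `beta`. With `gamma = 1` the indirect term changes with the chosen collection and no longer matches the $do(X)$ contrast, in line with the remark on interaction terms.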

Bibliography


1. K. J. Rothman, S. Greenland, and T. L. Lash, Modern Epidemiology. Lippincott Williams & Wilkins, 2008.
2. D. B. Rubin, “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, vol. 66, no. 5, pp. 688–701, 1974.
3. J. Pearl, Causality: Models, Reasoning, and Inference. Cambridge, U.K.; New York: Cambridge University Press, 2000.
4. K. J. Rothman and S. Greenland, “Causation and causal inference in epidemiology,” American Journal of Public Health, vol. 95, no. S1, pp. S144–S150, 2005.
5. M. Glymour and S. Greenland, “Causal diagrams,” in Modern Epidemiology, 3rd ed., pp. 183–209, Lippincott Williams & Wilkins, 2008.
6. J. Pearl, “Causal inference in statistics: An overview,” Statistics Surveys, vol. 3, no. 0, pp. 96–146, 2009.
7. M. A. Hernan and J. M. Robins, Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming.
8. J. K. Lunceford and M. Davidian, “Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study,” Statistics in Medicine, vol. 23, no. 19, pp. 2937–2960, 2004.
