Separable Effects for Causal Inference in the Presence of Competing Events

Mats J. Stensrud; Jessica G. Young; Vanessa Didelez; James M. Robins; Miguel A. Hernán

arXiv:1901.09472·stat.ME·February 14, 2020

TL;DR

This paper introduces separable effects to clarify causal relationships in time-to-event studies with competing risks, enabling more precise effect estimation without cross-world assumptions.

Contribution

It proposes a framework for defining and identifying causal effects in the presence of competing events that avoids cross-world contrasts and hypothetical interventions to eliminate the competing event.

Findings

01. Separable effects can be identified under the assumption of treatment decomposition.

02. Application to prostate cancer trial demonstrates practical utility.

03. Distinct causal pathways for treatment effects are elucidated.

Abstract

In time-to-event settings, the presence of competing events complicates the definition of causal effects. Here we propose the new separable effects to study the causal effect of a treatment on an event of interest. The separable direct effect is the treatment effect on the event of interest not mediated by its effect on the competing event. The separable indirect effect is the treatment effect on the event of interest only through its effect on the competing event. Similar to Robins and Richardson's extended graphical approach for mediation analysis, the separable effects can only be identified under the assumption that the treatment can be decomposed into two distinct components that exert their effects through distinct causal pathways. Unlike existing definitions of causal effects in the presence of competing events, our estimands do not require cross-world contrasts or hypothetical…

Tables

Table 1. Estimates of cumulative incidence after 3 years of follow-up.

Estimand	G-formula estimate (95% CI)	IP weighted estimate (95% CI)
$\Pr (Y_{36}^{a = 1} = 1)$	0.14 (0.08, 0.20)	0.17 (0.10, 0.24)
$\Pr (Y_{36}^{a_{Y} = 1, a_{D} = 0} = 1)$	0.15 (0.09, 0.21)	0.18 (0.10, 0.26)
$\Pr (Y_{36}^{a = 0} = 1)$	0.21 (0.15, 0.28)	0.23 (0.17, 0.35)

Table 2. Coefficients for the data generating mechanism of the examples in Appendix A.

Scenario	$α_{Y}$	$ω_{1}$	$ω_{2}$	$ω_{3}$	$α_{D}$	$ξ_{1}$	$ξ_{2}$	$ξ_{3}$
1	0.01	0	10	5	0.03	0	0	5
2	0.01	0	0	5	0.03	0	5	5
3	0.01	0	10	5	0.03	0	5	5
4	0.01	0	10	5	0.03	0	-5	5

Table 3. Outcomes at time $k$ in subgroups $Q_{k}$ and $R_{k}$.

Treatment	Outcomes at $k$ in $Q_{k}$	Outcomes at $k$ in $R_{k}$
$A_{Y} = 1, A_{D} = 1$	$(Y_{k} = 0, D_{k} = 1)$	$(Y_{k} = 0, D_{k} = 1)$
$A_{Y} = 1, A_{D} = 0$	$(Y_{k} = 1, D_{k} = 0)$	$(Y_{k} = 0, D_{k} = 1)$ or $(Y_{k} = 0, D_{k} = 0)$

Table 4. Data generating mechanism for the 7 simulation scenarios.

Scenario	$α_{Y}$	$ξ_{2}$	$ξ_{3}$	$ξ_{4}$	$ξ_{5}$	$ξ_{6}$	$α_{D}$	$ω_{3}$	$ω_{4}$	$ω_{5}$
1	0.01	10	0	5	0	0	0.03	-2	5	0
2	0.01	10	0	-2	5	0	0.03	-2	5	-2
3	0.01	10	0	5	-10	5	0.03	-2	5	-10
4	0.01	10	5	5	0	0	0.03	-2	5	0
5	0.01	10	0	-10	0	0	0.03	-2	0	0

Table 5. Scenario 1.

		$n = 400$
Parameter	Estimator	$k = 100$	$k = 75$	$k = 25$
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 1} = 1)$	g-formula	0.95	0.94	0.93
	non-parametric	0.95	0.94	0.95
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 0} = 1)$	g-formula	0.94	0.93	0.92
	non-parametric	0.94	0.95	0.95
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 0} = 1)$	g-formula	0.95	0.96	0.94
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.94	0.95	0.95
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.96	0.95	0.95
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 1} = 1)$	g-formula	0.93	0.93	0.94
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.92	0.90	0.95
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.94	0.94	0.92

Table 6. Scenario 2.

		$n = 400$
Parameter	Estimator	$k = 100$	$k = 75$	$k = 25$
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 1} = 1)$	g-formula	0.91	0.92	0.91
	non-parametric	0.95	0.96	0.93
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 0} = 1)$	g-formula	0.94	0.94	0.93
	non-parametric	0.93	0.93	0.93
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 0} = 1)$	g-formula	0.96	0.94	0.91
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.93	0.95	0.93
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.91	0.92	0.88
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 1} = 1)$	g-formula	0.94	0.93	0.93
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.90	0.91	0.93
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.93	0.94	0.94

Table 7. Scenario 3.

		$n = 400$
Parameter	Estimator	$k = 100$	$k = 75$	$k = 25$
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 1} = 1)$	g-formula	0.93	0.95	0.91
	non-parametric	0.93	0.93	0.94
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 0} = 1)$	g-formula	0.93	0.86	0.48
	non-parametric	0.94	0.93	0.94
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 0} = 1)$	g-formula	0.93	0.94	0.93
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.94	0.94	0.93
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.91	0.72	0.56
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 1} = 1)$	g-formula	0.82	0.74	0.45
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.95	0.95	0.94
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.84	0.72	0.33

Table 8. Scenario 4.

		$n = 400$
Parameter	Estimator	$k = 100$	$k = 75$	$k = 25$
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 1} = 1)$	g-formula	0.96	0.94	0.93
	non-parametric	0.95	0.94	0.94
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 0} = 1)$	g-formula	0.93	0.93	0.92
	non-parametric	0.93	0.93	0.95
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 0} = 1)$	g-formula	0.96	0.97	0.94
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.94	0.96	0.94
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.97	0.96	0.96
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 1} = 1)$	g-formula	0.05	0.05	0.07
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.31	0.26	0.34
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.05	0.04	0.12

Table 9. Scenario 5.

		$n = 400$
Parameter	Estimator	$k = 100$	$k = 75$	$k = 25$
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 1} = 1)$	g-formula	0.95	0.94	0.94
	non-parametric	0.95	0.95	0.95
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 0} = 1)$	g-formula	0.94	0.94	0.93
	non-parametric	0.95	0.94	0.94
$\Pr (Y_{k}^{a_{Y} = 1, a_{D} = 0} = 1)$	g-formula	0.96	0.95	0.94
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.97	0.96	0.95
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.95	0.95	0.94
$\Pr (Y_{k}^{a_{Y} = 0, a_{D} = 1} = 1)$	g-formula	0.93	0.94	0.94
	${\hat{ν}}_{1, a_{Y}, a_{D}, k}$	0.94	0.93	0.94
	${\hat{ν}}_{2, a_{Y}, a_{D}, k}$	0.94	0.94	0.94

Equations

$$A \equiv A_{D} \equiv A_{Y}$$

$$Y_{k+1}^{a_{Y}=a, a_{D}=a} = Y_{k+1}^{a}, \quad D_{k+1}^{a_{Y}=a, a_{D}=a} = D_{k+1}^{a}$$

$$Y_{k}^{a_{Y}, a_{D}=1} = D_{k+1}^{a_{Y}, a_{D}=1} = Y_{k}^{a_{Y}, a_{D}=0} = D_{k+1}^{a_{Y}, a_{D}=0} = 0 \implies Y_{k+1}^{a_{Y}, a_{D}=0} = Y_{k+1}^{a_{Y}, a_{D}=1}, \text{ for } a_{Y} \in \{0, 1\}$$

$$Y_{k}^{a_{Y}=1, a_{D}} = D_{k}^{a_{Y}=1, a_{D}} = Y_{k}^{a_{Y}=0, a_{D}} = D_{k}^{a_{Y}=0, a_{D}} = 0 \implies D_{k+1}^{a_{Y}=1, a_{D}} = D_{k+1}^{a_{Y}=0, a_{D}}, \text{ for } a_{D} \in \{0, 1\}$$

$$\Pr(Y_{k+1}^{a_{Y}=1, a_{D}} = 1) \text{ vs. } \Pr(Y_{k+1}^{a_{Y}=0, a_{D}} = 1)$$

$$\Pr(Y_{k+1}^{a_{Y}, a_{D}=1} = 1) \text{ vs. } \Pr(Y_{k+1}^{a_{Y}, a_{D}=0} = 1)$$

$$[\Pr(Y_{k+1}^{a_{Y}=1, a_{D}=1} = 1) - \Pr(Y_{k+1}^{a_{Y}=0, a_{D}=1} = 1)] + [\Pr(Y_{k+1}^{a_{Y}=0, a_{D}=1} = 1) - \Pr(Y_{k+1}^{a_{Y}=0, a_{D}=0} = 1)] = \Pr(Y_{k+1}^{a=1} = 1) - \Pr(Y_{k+1}^{a=0} = 1)$$

$$\Pr(Y_{k+1}^{a_{Y}, a_{D}} = 1)$$

$$\bar{Y}_{K+1}^{a}, \bar{D}_{K+1}^{a} \perp\!\!\!\perp A \mid L \text{ for all } a$$

$$Y_{k+1}^{a} = Y_{k+1}, \quad D_{k+1}^{a} = D_{k+1}$$

$$\Pr(L=l) > 0 \implies \Pr(A=a \mid L=l) > 0 \text{ for } a \in \{0, 1\}$$

$$\Pr(D_{k+1}=Y_{k}=0, L=l) > 0 \implies \Pr(A=a \mid D_{k+1}=Y_{k}=0, L=l) > 0 \text{ for } a \in \{0, 1\} \text{ and } k \in \{0, \dots, K\}$$

$$\Pr(L=l) > 0 \implies \Pr(A_{Y}=a_{Y}, A_{D}=a_{D} \mid L=l) > 0 \text{ for } a_{Y}, a_{D} \in \{0, 1\}$$

$$\Delta 1: \quad \Pr(Y_{k+1}^{a_{Y}, a_{D}=1} = 1 \mid Y_{k}^{a_{Y}, a_{D}=1} = 0, D_{k+1}^{a_{Y}, a_{D}=1} = 0, L=l) = \Pr(Y_{k+1}^{a_{Y}, a_{D}=0} = 1 \mid Y_{k}^{a_{Y}, a_{D}=0} = 0, D_{k+1}^{a_{Y}, a_{D}=0} = 0, L=l)$$

$$\Delta 2: \quad \Pr(D_{k+1}^{a_{Y}=1, a_{D}} = 1 \mid Y_{k}^{a_{Y}=1, a_{D}} = 0, D_{k}^{a_{Y}=1, a_{D}} = 0, L=l) = \Pr(D_{k+1}^{a_{Y}=0, a_{D}} = 1 \mid Y_{k}^{a_{Y}=0, a_{D}} = 0, D_{k}^{a_{Y}=0, a_{D}} = 0, L=l)$$

$$\sum_{l} \Big[ \sum_{s=0}^{k} \Pr(Y_{s+1}=1 \mid D_{s+1}=Y_{s}=0, A=a_{Y}, L=l) \prod_{j=0}^{s} \big[ \Pr(D_{j+1}=0 \mid D_{j}=Y_{j}=0, A=a_{D}, L=l) \times \Pr(Y_{j}=0 \mid D_{j}=Y_{j-1}=0, A=a_{Y}, L=l) \big] \Big] \Pr(L=l)$$

$$\sum_{l} \Big[ \sum_{s=0}^{k} \Pr(Y_{s+1}=1 \mid D_{s+1}=Y_{s}=0, A_{Y}=a_{Y}, A_{D}=a_{D}, L=l) \prod_{j=0}^{s} \big[ \Pr(D_{j+1}=0 \mid D_{j}=Y_{j}=0, A_{Y}=a_{Y}, A_{D}=a_{D}, L=l) \times \Pr(Y_{j}=0 \mid D_{j}=Y_{j-1}=0, A_{Y}=a_{Y}, A_{D}=a_{D}, L=l) \big] \Big] \Pr(L=l)$$

$$\sum_{l} \Big[ \sum_{s=0}^{k} \Pr(Y_{s+1}=1 \mid D_{s+1}=Y_{s}=\bar{C}_{s+1}=0, A=a_{Y}, L=l) \prod_{j=0}^{s} \big[ \Pr(D_{j+1}=0 \mid D_{j}=Y_{j}=\bar{C}_{j+1}=0, A=a_{D}, L=l) \times \Pr(Y_{j}=0 \mid D_{j}=Y_{j-1}=\bar{C}_{j}=0, A=a_{Y}, L=l) \big] \Big] \Pr(L=l)$$

$$\sum_{s=0}^{k} E$$

$$W_{D, s}(a_{Y}, a_{D})$$

$$W_{C, s}(a_{D})$$

$$\sum_{s=0}^{k} E$$

$$W_{Y, s}(a_{D}, a_{Y})$$

Full text

Separable Effects for Causal Inference in the Presence of Competing Events

Mats J. Stensrud$^{1,2}$, Jessica G. Young$^{3}$, Vanessa Didelez$^{4,5}$, James M. Robins$^{1,6}$, Miguel A. Hernán$^{1,6,7}$

1 Department of Epidemiology, Harvard T. H. Chan School of Public Health, USA

2 Department of Biostatistics, University of Oslo, Norway

3 Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, USA

4 Leibniz Institute for Prevention Research and Epidemiology – BIPS, Germany

5 Faculty of Mathematics / Computer Science, University of Bremen, Germany

6 Department of Biostatistics, Harvard T. H. Chan School of Public Health, USA

7 Harvard-MIT Division of Health Sciences and Technology, USA

Abstract.

In time-to-event settings, the presence of competing events complicates the definition of causal effects. Here we propose the new separable effects to study the causal effect of a treatment on an event of interest. The separable direct effect is the treatment effect on the event of interest not mediated by its effect on the competing event. The separable indirect effect is the treatment effect on the event of interest only through its effect on the competing event. Similar to Robins and Richardson’s extended graphical approach for mediation analysis, the separable effects can only be identified under the assumption that the treatment can be decomposed into two distinct components that exert their effects through distinct causal pathways. Unlike existing definitions of causal effects in the presence of competing events, our estimands do not require cross-world contrasts or hypothetical interventions to prevent death. As an illustration, we apply our approach to a randomized clinical trial on estrogen therapy in individuals with prostate cancer.

1. Introduction

A competing event is any event that makes it impossible for the event of interest to occur. For example, consider a randomized trial to estimate the effect of a new treatment on the 3-year risk of death from prostate cancer, in which 1000 individuals with prostate cancer were assigned to the treatment and 1000 to placebo. All participants adhered to the protocol and remained under follow-up. After 3 years, 100 individuals in the treatment arm and 200 in the placebo arm died of prostate cancer. Also, 150 individuals in the treatment arm and 50 in the placebo arm died of other causes (e.g., cardiovascular disease). Death from cardiovascular disease is a competing event for death from prostate cancer: individuals who die of cardiovascular disease cannot subsequently die of prostate cancer. When competing events are present, several causal estimands may be considered to define the causal effect of treatment on a time-to-event outcome [1].

Consider first the total treatment effect [1] defined by the contrast of the cumulative incidence (risk) [2, 3] of the event of interest under different treatment values. In our example, the total treatment effect on death from prostate cancer is the contrast of the cumulative incidence of death from prostate cancer under treatment, consistently estimated by $\frac{100}{1000}$ , and under placebo, consistently estimated by $\frac{200}{1000}$ . Therefore, the estimate of the total treatment effect on the additive scale is $\frac{100}{1000}-\frac{200}{1000}=-0.1$ , which indicates that treatment reduced the risk of death from prostate cancer.

However, in our trial, the interpretation of the total treatment effect on the event of interest is difficult because the treatment also increased the risk of the competing event. The estimate of the total effect of treatment on the competing event is $\frac{150}{1000}-\frac{50}{1000}=0.1$ on the additive scale. Thus, it is possible that the beneficial effect of treatment on death from prostate cancer is simply a consequence of the harmful effect of treatment on death from other causes: when more people die from other causes, fewer people can die from prostate cancer. Note that this problem of interpretation cannot be solved by considering contrasts of hazard functions, such as cause-specific and subdistribution hazards, because these estimands are defined conditional on a post-treatment event (survival) and therefore do not generally have a causal interpretation [1, 4].
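As a quick check of the arithmetic in this example, the following Python sketch recomputes the two total effects on the additive scale from the counts given above. The variable and function names are illustrative and do not come from the paper.

```python
# Counts from the hypothetical trial in the Introduction (1000 individuals per arm).
n_per_arm = 1000
deaths_prostate = {"treatment": 100, "placebo": 200}
deaths_other = {"treatment": 150, "placebo": 50}


def cumulative_incidence(events: int, n: int) -> float:
    """Proportion of an arm that experienced the event by 3 years."""
    return events / n


def total_effect(events: dict, n: int) -> float:
    """Additive total effect: risk under treatment minus risk under placebo."""
    return (cumulative_incidence(events["treatment"], n)
            - cumulative_incidence(events["placebo"], n))


te_prostate = total_effect(deaths_prostate, n_per_arm)
te_other = total_effect(deaths_other, n_per_arm)

print(f"Total effect on death from prostate cancer: {te_prostate:+.2f}")
print(f"Total effect on death from other causes:    {te_other:+.2f}")
```

The two estimates have equal magnitude and opposite sign, which is what makes the total effect on death from prostate cancer hard to interpret in this example.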

One way to deal with this problem is to consider a second causal estimand on the risk scale: the (controlled) direct effect of treatment on the event of interest had competing events been eliminated. This estimand corresponds to defining the competing events as censoring events [1], and is sometimes denoted the marginal (net) distribution function. Unlike the total effect, identification of the controlled direct effect requires untestable assumptions even in an ideal randomized trial with perfect adherence and no loss to follow-up [1]. Also, this causal estimand often introduces a new conceptual challenge: the direct effect is not sufficiently well-defined because there is no scientific agreement as to which hypothetical intervention, if any, would eliminate the competing events [5]. For example, in our prostate cancer trial, no intervention has ever been proposed that can prevent all deaths from causes other than prostate cancer. As a byproduct of the ill-defined intervention to prevent competing events, effect estimates cannot be empirically verified – not even in principle – in a randomized experiment.

A third causal estimand is the survivor average causal effect (SACE) [6], which is the total treatment effect (on the risk scale) in the principal stratum of patients who would never experience the competing event under either level of treatment [1, 6, 7]. Unlike the total effect, the presence of competing events is not a problem when interpreting the SACE, because the SACE is restricted to subjects who do not experience competing events. However, identification of the SACE requires strong untestable assumptions, e.g. about cross-world counterfactuals, even in a perfectly executed trial. Also, the SACE could never, even in principle, be confirmed in a real-world experiment as it will never be possible to observe the status of the competing event for the same individual under two different levels of treatment.

The problems of the previous estimands can be overcome in settings in which the treatment exerts its effect on the event of interest and its effect on the competing event through different causal pathways. Here, we define the separable direct and indirect effects for settings with competing events. Like the controlled direct effect and the SACE, identification of separable effects relies on untestable assumptions even when the treatment is randomized. However, unlike the controlled direct effect and the SACE, separable effects do not require conceptual interventions on competing events or knowledge of cross-world counterfactuals; the separable effects are well-defined if we can articulate a hypothetical decomposition of the treatment into two components. Therefore, in principle, they may be verified in a future experiment. Our definitions of separable effects and conditions for identifiability follow from the work of Robins and Richardson [8] and Didelez [9] on mediation: the pure (natural) direct effects [10] are extensively used in mediation analyses, but they require untestable cross-world independence assumptions and are often difficult to interpret, for example, in survival settings. Robins and Richardson [8] proposed an alternative causal estimand that overcomes these problems by considering a decomposed treatment: unlike the pure direct effects, the decomposed treatment effects can be identified under assumptions that are in principle empirically testable. Moreover, it was shown by Didelez [9] that the decomposed treatment effects are sensible estimands in survival settings.

We have organized the paper as follows. In Section 2, we describe the observed data structure. In Section 3, we present a conceptual treatment decomposition and provide explicit examples to fix ideas. In Section 4, we formulate the causal estimand and define the new separable effects. In Section 5, we present conditions that allow for identifiability of the separable effects. In Section 6, we give 3 different estimators for the separable effects that can be implemented with standard statistical models, and we use data from a randomized clinical trial to estimate a direct effect of estrogen therapy on prostate cancer mortality. In Section 7, we provide a final discussion of the new estimands.

2. Observed data structure

We consider a study in which individuals are randomly assigned to a binary treatment $A\in\{0,1\}$ at baseline (e.g., $A=1$ if assigned to treatment and $A=0$ if assigned to placebo). Let $L\in\mathcal{L}$ denote a vector of individual pretreatment characteristics. For each of the equally spaced discrete time intervals $k\in\{0,1,...,K+1\}$, let $Y_{k}$ and $D_{k}$ denote indicators of an event of interest and a competing event by interval $k$, respectively. In our example, $Y_{k}$ denotes death due to prostate cancer and $D_{k}$ death from other causes by interval $k$. We adopt the convention that $D_{k}$ is measured just before $Y_{k}$. If an individual experiences the competing event at time $k$ without a history of the event of interest $(D_{k}=1,Y_{k-1}=0)$, then all future values of the event of interest are zero. We can approximate a continuous-time setting by choosing time intervals that are arbitrarily small.

By definition, $D_{0}\equiv Y_{0}\equiv 0$ , that is, no individual experiences any event during the initial interval. We use overbars to denote the history of a random variable, such that $\bar{Y}_{k}=(Y_{1},Y_{2},...,Y_{k})$ is the history of the event of interest through interval $k$ . Similarly, we use underbars to denote future values of a random variable, such that $\underline{Y}_{k}=(Y_{k},Y_{k+1},...,Y_{K+1})$ . We assume full adherence to the assigned treatment without loss of generality, and until Section 5.4, no loss to follow-up.
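To make the notation concrete, the sketch below encodes one individual's follow-up as the indicator vectors $(Y_{0},...,Y_{K+1})$ and $(D_{0},...,D_{K+1})$. The function name `encode_history` and its arguments are hypothetical, and the sketch assumes that only the first event per individual is recorded.

```python
import numpy as np


def encode_history(event_interval, event_type, K):
    """Return (Y, D), integer arrays of length K + 2 indexed by k = 0, ..., K + 1.

    event_interval: interval in 1..K+1 at which the first event occurred,
                    or None if the individual stayed event-free.
    event_type:     "Y" (event of interest) or "D" (competing event);
                    ignored when event_interval is None.
    """
    Y = np.zeros(K + 2, dtype=int)  # Y_0 = 0 by definition
    D = np.zeros(K + 2, dtype=int)  # D_0 = 0 by definition
    if event_interval is not None:
        if not 1 <= event_interval <= K + 1:
            raise ValueError("event_interval must lie in 1..K+1")
        if event_type == "Y":
            Y[event_interval:] = 1  # indicator of the event *by* interval k
        elif event_type == "D":
            D[event_interval:] = 1  # Y stays 0 at all later intervals
        else:
            raise ValueError("event_type must be 'Y' or 'D'")
    return Y, D


# An individual who dies of other causes in interval 2 of a study with K = 3.
Y, D = encode_history(2, "D", K=3)
print("Y:", Y.tolist())
print("D:", D.tolist())
```

Because the competing event is terminating, the vector for $Y$ remains zero once $D$ switches to one.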

3. Decomposition of treatment effects

Suppose that treatment $A$ can be conceptualized as having two binary components that act through different causal pathways: one component $A_{Y}$ that affects the event of interest $Y_{k}$ and one component $A_{D}$ that affects the competing event $D_{k}$ . This hypothetical decomposition of $A$ can be formally described by the following conditions.

Suppose that $A$ and the two components $A_{Y}$ and $A_{D}$ are deterministically related in the observed data,

$$A \equiv A_{D} \equiv A_{Y}, \tag{1}$$

but we can conceive hypothetical interventions that set $A_{D}$ and $A_{Y}$ to different values. For $k\in\{0,...,K\}$ , let $Y_{k+1}^{a}$ be an individual’s indicator of the event of interest at time $k+1$ when, possibly contrary to fact, $A$ is set to the value $a\in\{0,1\}$ . Similarly, let $Y_{k+1}^{a_{Y},a_{D}}$ be this outcome when, possibly contrary to fact, $A_{Y}$ is set to $a_{Y}$ and $A_{D}$ is set to $a_{D}$ , where $a_{Y},a_{D}\in\{0,1\}$ . We require that setting $A=a$ is equivalent to setting both $A_{Y}$ and $A_{D}$ to $a$ , that is,

$$Y_{k+1}^{a_{Y}=a, a_{D}=a} = Y_{k+1}^{a}, \quad D_{k+1}^{a_{Y}=a, a_{D}=a} = D_{k+1}^{a}. \tag{2}$$

The assumption that $A_{D}$ only exerts effects on $Y_{k+1}$ through its effect on $\overline{D}_{k+1}$ can be stated as

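$$Y_{k}^{a_{Y}, a_{D}=1} = D_{k+1}^{a_{Y}, a_{D}=1} = Y_{k}^{a_{Y}, a_{D}=0} = D_{k+1}^{a_{Y}, a_{D}=0} = 0 \implies Y_{k+1}^{a_{Y}, a_{D}=0} = Y_{k+1}^{a_{Y}, a_{D}=1}, \text{ for } a_{Y} \in \{0, 1\},$$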

and, similarly, the assumption that $A_{Y}$ only exerts effects on $D_{k+1}$ through its effect on $\overline{Y}_{k}$ can be stated as

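$$Y_{k}^{a_{Y}=1, a_{D}} = D_{k}^{a_{Y}=1, a_{D}} = Y_{k}^{a_{Y}=0, a_{D}} = D_{k}^{a_{Y}=0, a_{D}} = 0 \implies D_{k+1}^{a_{Y}=1, a_{D}} = D_{k+1}^{a_{Y}=0, a_{D}}, \text{ for } a_{D} \in \{0, 1\}.$$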

The causal diagram in Figure 1 represents this decomposition in a setting with a single time point. The bold arrows represent the deterministic relation (1). Our decomposition conditions do not preclude the existence of multiple forms of decompositions of $A$ . However, every decomposition of $A$ into two distinct components must be justified by subject-matter knowledge. Let us consider two examples.

3.1. Diethylstilbestrol and prostate cancer mortality

In our prostate cancer example, we assume that $A$ can be decomposed into a component $A_{Y}$ that directly affects death from prostate cancer and a component $A_{D}$ that directly affects death from other causes. Suppose that treatment $A=0$ is placebo and $A=1$ is diethylstilbestrol (DES), an estrogen which is thought to reduce mortality due to prostate cancer by suppressing testosterone production and to increase cardiovascular mortality through estrogen-induced synthesis of coagulation factors [11].

We could then consider a hypothetical treatment that has the same direct effect as DES on prostate cancer mortality, but lacks any effect on mortality from other causes; that is, the same effect as the $A_{Y}$ component of DES when the $A_{D}$ component is removed. Real-life treatments similar to such a hypothetical treatment are luteinizing hormone releasing hormone (LHRH) antagonists or orchidectomy (castration), which can stop testosterone production but, unlike estrogen, do not increase cardiovascular risk.

Also, we could consider a hypothetical treatment that has the same direct effect as DES on mortality from other causes, but that lacks any effect on prostate cancer mortality; that is, the same effect as the $A_{D}$ component of DES when the $A_{Y}$ component is removed. In practice, a drug that contains not only DES but also testosterone may resemble this hypothetical treatment, as the additional testosterone component can nullify the testosterone suppression that is induced by DES.

3.2. Statins and dementia

Consider a study to quantify the effect of statins on dementia. Statins reduce cardiovascular mortality by lowering the cholesterol production in the liver. As dementia may develop due to microvascular events in the small cerebral arteries, lowering cholesterol may also reduce the risk of dementia. When studying the effect of statins on dementia, death will be a competing event.

Because statins appear to reduce mortality and dementia through the same mechanism, i.e., lowering the cholesterol levels in the blood, decomposing $A$ into the distinct components $A_{Y}$ and $A_{D}$ would be difficult. One possibility might be to leverage the distinct localization of the microvessels in the brain: we could bioengineer a cholesterol transporter, which is surgically implanted to shuttle cholesterol particles from the distal cerebral arteries directly to the large cerebral veins, circumventing the cerebral microvessels. That is, if $Y_{k}$ and $D_{k}$ denote dementia and death, respectively, then carriers of the transporter will have the $A_{Y}$ component of statins on dementia, but they will lack the $A_{D}$ component of statins on mortality. Robins and Richardson discussed the construction of plausible interventions in a mediation context, using nicotine in cigarettes as an example [8, Section 5.2].

3.3. Practical considerations

Whenever the decomposition of treatment $A$ into $A_{Y}$ and $A_{D}$ is possible in principle, regardless of whether it is possible in practice at this time in history, the effects of $A_{Y}$ and $A_{D}$ are well-defined. Therefore, in both examples above, we described well-defined effects even though the decomposition of treatment may be practically possible in the prostate cancer example but not in the statin example.

However, caution is required when considering treatment decompositions that, as in the statins example, are possible in principle but not in practice. The problem is that practically impossible decompositions make it hard to evaluate the identifiability conditions for the effects of each component. As described in Section 5, the identification of the separable effects is based on the unverifiable condition that $A_{Y}$ and $A_{D}$ are treatment components actually operating in the data [5], such that $A_{Y}$ has no direct effect on $D_{k}$ and that $A_{D}$ has no direct effect on $Y_{k}$ . When relying on convoluted treatment decompositions, as in our statins example, we may be less confident that these conditions hold in the data. Of course, if these conditions are violated, our effect estimates may differ from those that would be obtained in a future experiment in which both components $A_{Y}$ and $A_{D}$ are randomly assigned.

On the other hand, a careful definition of treatment decomposition may help ground scientific conversations even if the decomposition is not yet possible. For example, it is debated whether statins have a protective effect on dementia [12]. To clarify the notion of a ’protective effect’ it would be helpful to consider a hypothetical trial in which subjects were randomly assigned to the cholesterol transporter or placebo.

4. Definition of separable effects

We can now define the separable direct effects of treatment on the event of interest as the contrasts

$$\Pr(Y_{k+1}^{a_{Y}=1, a_{D}} = 1) \text{ vs. } \Pr(Y_{k+1}^{a_{Y}=0, a_{D}} = 1)$$

for $a_{D}=1$ or $a_{D}=0$ ; that is, the effect of the component of treatment that affects the event of interest $A_{Y}$ when the component of treatment that affects the competing event $A_{D}$ is set at a constant value $a_{D}$ .

Analogously, we can define the separable indirect effects of treatment on the event of interest as the contrasts

$$\Pr(Y_{k+1}^{a_{Y}, a_{D}=1} = 1) \text{ vs. } \Pr(Y_{k+1}^{a_{Y}, a_{D}=0} = 1),$$

for $a_{Y}=1$ and $a_{Y}=0$ ; that is, the effect of the component of treatment that affects the competing event $A_{D}$ when the component of treatment that affects the event of interest $A_{Y}$ is set at a constant value $a_{Y}$ . In other words, the separable indirect effects are functions of the treatment component $A_{D}$ that affects the competing event $D_{k+1}$ , and the separable indirect effects arise because the competing event makes it impossible for the event of interest to occur.

From (2) we find that the sum of separable direct and indirect effects (on the additive scale) equals the total effect,

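$$[\Pr(Y_{k+1}^{a_{Y}=1, a_{D}=1} = 1) - \Pr(Y_{k+1}^{a_{Y}=0, a_{D}=1} = 1)] + [\Pr(Y_{k+1}^{a_{Y}=0, a_{D}=1} = 1) - \Pr(Y_{k+1}^{a_{Y}=0, a_{D}=0} = 1)] = \Pr(Y_{k+1}^{a=1} = 1) - \Pr(Y_{k+1}^{a=0} = 1).$$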
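The additive decomposition of the total effect into a separable direct effect (evaluated at $a_{D}=1$) and a separable indirect effect (evaluated at $a_{Y}=0$) is a telescoping sum, which the following Python sketch checks numerically. The four risks are made-up numbers chosen only for illustration; they are not estimates from the paper.

```python
# Hypothetical values of Pr(Y_{k+1}^{a_Y, a_D} = 1), keyed by (a_Y, a_D).
# These numbers are invented for illustration only.
risk = {(1, 1): 0.10, (0, 1): 0.16, (1, 0): 0.12, (0, 0): 0.20}

# Separable direct effect with the A_D component held at 1.
direct_at_aD1 = risk[(1, 1)] - risk[(0, 1)]
# Separable indirect effect with the A_Y component held at 0.
indirect_at_aY0 = risk[(0, 1)] - risk[(0, 0)]
# Total effect: by condition (2), setting A = a sets both components to a.
total = risk[(1, 1)] - risk[(0, 0)]

print(f"direct (a_D = 1):   {direct_at_aD1:+.2f}")
print(f"indirect (a_Y = 0): {indirect_at_aY0:+.2f}")
print(f"total:              {total:+.2f}")

# The same total is recovered when the roles are swapped:
# direct effect at a_D = 0 plus indirect effect at a_Y = 1.
direct_at_aD0 = risk[(1, 0)] - risk[(0, 0)]
indirect_at_aY1 = risk[(1, 1)] - risk[(1, 0)]
```

The two decompositions generally assign different magnitudes to the direct and indirect parts, so the reference level of the component held fixed should be stated when reporting either effect.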

To provide intuition about the magnitude of the separable effects, we describe 4 illustrative scenarios in Appendix A.

5. Identification of separable effects

The identification of the separable effects requires the identification of the quantities

$$\Pr(Y_{k+1}^{a_{Y}, a_{D}} = 1), \tag{5}$$

where $a_{Y},a_{D}\in\{0,1\}$ . Identifying these quantities would be straightforward if each of the treatment components could be separately intervened upon, that is, if we could conduct a randomized experiment with 4 possible treatment arms defined by the 4 combinations of values of $A_{Y}$ and $A_{D}$ . However, when using data from a study like that of Section 2, in which only the treatment $A$ is randomized, we only observe 2 out of the 4 treatment arms in a hypothetical trial in which $A_{Y}$ and $A_{D}$ were randomized. As a result, we need additional untestable conditions to identify (5). This conceptualization of the treatment decomposition in terms of a 4-arm randomized experiment was originally proposed by James Robins during a presentation at the UK Causal Inference Conference in London, April 2016. Since then, Robins and others have often publicly discussed this conceptualization in the context of mediation analysis, which is isomorphic to the context with competing events discussed here.
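The hypothetical 4-arm experiment can be mimicked in a small simulation. The Python sketch below assumes, purely for illustration, constant discrete-time hazards in which the hazard of the event of interest depends only on $a_{Y}$ and the hazard of the competing event depends only on $a_{D}$; the hazard values, the sample size, and the function name `simulate_arm` are all hypothetical, and covariates $L$ are ignored.

```python
import numpy as np


def simulate_arm(a_y, a_d, n, K, rng):
    """Proportion with the event of interest by interval K + 1 in one arm.

    Hypothetical data-generating mechanism: the hazard of Y depends only
    on a_y and the hazard of D depends only on a_d.
    """
    h_y = 0.05 if a_y == 1 else 0.10  # assumed hazards of the event of interest
    h_d = 0.06 if a_d == 1 else 0.02  # assumed hazards of the competing event
    at_risk = np.ones(n, dtype=bool)
    had_y = np.zeros(n, dtype=bool)
    for _ in range(K + 1):  # intervals 1, ..., K + 1
        # Convention from Section 2: D is measured just before Y.
        d_now = at_risk & (rng.random(n) < h_d)
        at_risk &= ~d_now
        y_now = at_risk & (rng.random(n) < h_y)
        had_y |= y_now
        at_risk &= ~y_now
    return had_y.mean()


rng = np.random.default_rng(0)
K = 11
risks = {
    (a_y, a_d): simulate_arm(a_y, a_d, n=200_000, K=K, rng=rng)
    for a_y in (0, 1)
    for a_d in (0, 1)
}
for arm, r in sorted(risks.items()):
    print(f"(a_Y, a_D) = {arm}: simulated risk = {r:.3f}")
```

In a real study in which only $A$ is randomized, the arms with $a_{Y}\neq a_{D}$ are never observed, which is why additional conditions are needed to identify (5).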

5.1. Identifiability conditions

First, we need exchangeability conditional on the measured covariates $L$ ,

$$\bar{Y}_{K+1}^{a}, \bar{D}_{K+1}^{a} \perp\!\!\!\perp A \mid L \text{ for all } a,$$

where time $K+1$ is the end of the study. This exchangeability condition is expected to hold when $A\equiv A_{Y}\equiv A_{D}$ is randomized.

Second, consistency, such that if $A=a$ , then

$$Y_{k+1}^{a} = Y_{k+1}, \quad D_{k+1}^{a} = D_{k+1},$$

for $a\in\{0,1\}$ at all times $k\in\{0,\ldots,K\}$ . If any subject has data history consistent with the intervention under a counterfactual scenario, then the consistency assumption ensures that the observed outcome is equal to the counterfactual outcome.

Third, positivity such that

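$$\Pr(L=l) > 0 \implies \Pr(A=a \mid L=l) > 0 \text{ for } a \in \{0, 1\}, \tag{6}$$

$$\Pr(D_{k+1}=Y_{k}=0, L=l) > 0 \implies \Pr(A=a \mid D_{k+1}=Y_{k}=0, L=l) > 0 \text{ for } a \in \{0, 1\} \text{ and } k \in \{0, \dots, K\}, \tag{7}$$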

where (6) is the usual positivity condition under interventions on $A$ and (7) ensures that among those event-free through each follow-up time, there exist individuals with $A=1$ and individuals with $A=0$. However, our estimand is based on hypothetical interventions on both $A_{Y}$ and $A_{D}$, and our positivity conditions do not ensure the stricter condition that

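$$\Pr(L=l) > 0 \implies \Pr(A_{Y}=a_{Y}, A_{D}=a_{D} \mid L=l) > 0 \text{ for } a_{Y}, a_{D} \in \{0, 1\},$$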

which, indeed, will be violated when $a_{Y}\neq a_{D}$ in our setting where $A\equiv A_{Y}\equiv A_{D}$ .
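The failure of the stricter positivity condition can be seen by tabulating the component values implied by the deterministic relation $A\equiv A_{Y}\equiv A_{D}$. The Python sketch below uses a made-up list of treatment assignments; it is only meant to show that the two arms with $a_{Y}\neq a_{D}$ are empty by construction.

```python
from collections import Counter
from itertools import product

# Made-up treatment assignments for six individuals.
A = [0, 1, 1, 0, 1, 0]

# Under the deterministic relation, each individual has (A_Y, A_D) = (A, A).
arms = Counter((a, a) for a in A)

for a_y, a_d in product((0, 1), repeat=2):
    count = arms[(a_y, a_d)]
    status = "observed" if count > 0 else "never observed"
    print(f"(a_Y, a_D) = ({a_y}, {a_d}): n = {count} ({status})")
```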

To allow for identifiability under our positivity condition in (6), we introduce two conditions that are related to conditions described by Didelez in a mediation setting [9].

Dismissible component condition 1

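$$\Delta 1: \quad \Pr(Y_{k+1}^{a_{Y}, a_{D}=1} = 1 \mid Y_{k}^{a_{Y}, a_{D}=1} = 0, D_{k+1}^{a_{Y}, a_{D}=1} = 0, L=l) = \Pr(Y_{k+1}^{a_{Y}, a_{D}=0} = 1 \mid Y_{k}^{a_{Y}, a_{D}=0} = 0, D_{k+1}^{a_{Y}, a_{D}=0} = 0, L=l),$$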

at all times $k\in\{0,...,K\}$ . That is, the counterfactual (discrete-time) hazards of the event of interest are equal under all values of $A_{D}$ .

Dismissible component condition 2

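$$\Delta 2: \quad \Pr(D_{k+1}^{a_{Y}=1, a_{D}} = 1 \mid Y_{k}^{a_{Y}=1, a_{D}} = 0, D_{k}^{a_{Y}=1, a_{D}} = 0, L=l) = \Pr(D_{k+1}^{a_{Y}=0, a_{D}} = 1 \mid Y_{k}^{a_{Y}=0, a_{D}} = 0, D_{k}^{a_{Y}=0, a_{D}} = 0, L=l),$$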

at all times $k\in\{0,...,K\}$ . That is, the counterfactual (discrete-time) hazard functions of the competing event are equal under all values of $A_{Y}$ . The dismissible component conditions are analogous to identification conditions from Shpitser [13] on path specific effects.

By considering a hypothetical trial in which both $A_{Y}$ and $A_{D}$ are randomized, we can define conditional independencies that imply the dismissible component conditions. These conditional independencies can be read directly off causal DAGs; see Appendix B for details.

The dismissible component conditions ensure that we can adjust for common causes of $D_{k}$ and $Y_{k^{\prime}}$ for all $k,k^{\prime}\in\{1,...,K+1\}$ . In particular, an unmeasured common cause of $D_{1}$ and $Y_{1}$ , such as $U_{YD}$ in Figure 2, violates $\Delta$ 1 and $\Delta$ 2. In our prostate cancer example, suppose that smoking is a common cause of death from prostate cancer ( $Y_{k}$ ) and death from other causes ( $D_{k}$ ). Then, if smoking is an unmeasured variable (such as $U_{YD}$ in Figure 2), the dismissible component conditions will be violated.

However, the presence of unmeasured causes $U_{Y}$ of $Y_{k}$ and unmeasured causes $U_{D}$ of $D_{k}$ , as shown in Figure 3, does not violate $\Delta$ 1 and $\Delta$ 2 (see Appendix E for details); it just implies that contrasts of the hazard terms in the identification formula of Section 5.2 cannot be causally interpreted [1, 4, 14], which is analogous to the mediation setting in Didelez [9, Figure 6]. For this reason, we have defined our causal estimands as contrasts of risks rather than as contrasts of hazards. Furthermore, adjusting for a measured common cause of $Y_{k}$ and $D_{k}$ , such as $L$ in Figure 4, allows identification under $\Delta$ 1 and $\Delta$ 2. In subsequent figures we have omitted the variables $U_{Y}$ and $U_{D}$ to avoid clutter, but our results are valid in the presence of $U_{Y}$ and $U_{D}$ . We have also omitted an arrow from $L$ to $A$ , but this arrow would not invalidate our results. Furthermore, we have intentionally omitted arrows from $D_{k}$ to $Y_{s}$ for $k<s$ , as these arrows are redundant in our setting where the competing event is a terminating event that precludes the event of interest at all subsequent times. Finally, note that if the dismissible component conditions hold on a coarser scale (say, daily), then they will in general also hold on a finer scale (say, hourly), but the reverse is not true. This is analogous to any setting where measurements of time-varying covariates are needed to identify causal effects.

The dismissible component conditions are not empirically verifiable in a trial in which the entire treatment $A$ , but neither of its components $A_{Y}$ and $A_{D}$ , is intervened upon. However, both conditions could be tested in a trial in which $A_{Y}$ and $A_{D}$ were randomly assigned.

5.2. Identification formula

Under the identifiability conditions in Section 5.1, we identify $\Pr(Y_{k+1}^{a_{Y},a_{D}}=1)$ from the following g-functional [6] of the observed data described in Section 2,

[TABLE]

see Appendix C for proof.
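As a minimal sketch of how such a g-functional can be evaluated, the Python function below sums, over the levels of $L$, the probability of first experiencing the event of interest in each interval. It assumes the standard discrete-time form in which the hazard of the event of interest is taken from the arm $A=a_{Y}$ and the hazard of the competing event from the arm $A=a_{D}$, with $D_{s+1}$ ordered before $Y_{s+1}$. The names `separable_risk`, `hazard_y`, `hazard_d` and `p_l` are hypothetical, and in practice the hazards would be replaced by fitted models.

```python
from typing import Callable, Dict


def separable_risk(
    hazard_y: Callable[[int, int, int], float],
    hazard_d: Callable[[int, int, int], float],
    p_l: Dict[int, float],
    a_y: int,
    a_d: int,
    k: int,
) -> float:
    """Risk of the event of interest by time k+1 under (a_y, a_d).

    hazard_y(s, a, l): Pr(Y_{s+1}=1 | D_{s+1}=Y_s=0, A=a, L=l)
    hazard_d(s, a, l): Pr(D_{s+1}=1 | D_s=Y_s=0, A=a, L=l)
    p_l: marginal distribution of the baseline covariate L (assumed discrete)
    """
    total = 0.0
    for l, prob_l in p_l.items():
        event_free = 1.0  # probability of no event of either type before interval s
        risk = 0.0
        for s in range(k + 1):
            surv_d = 1.0 - hazard_d(s, a_d, l)  # competing-event hazard from arm a_d
            h_y = hazard_y(s, a_y, l)           # event-of-interest hazard from arm a_y
            risk += event_free * surv_d * h_y
            event_free *= surv_d * (1.0 - h_y)
        total += prob_l * risk
    return total
```

Calling the function with `a_y == a_d` gives the usual g-formula for the risk under $A=a$, so the same routine can be used for the total effect and for the separable effects.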

5.3. Intuition on the identification formula and falsifiability of the separable effects.

The identification formula of Section 5.2 can be intuitively motivated as follows: consider an experiment $G$ in which both $A_{Y}$ and $A_{D}$ are randomly assigned such that $\Pr(A_{Y}=a_{Y},A_{D}=a_{D})>0$ for all $a_{D},a_{Y}\in\{0,1\}$ . In the experiment $G$ , $\Pr(Y^{a_{Y},a_{D}}_{k+1}=1)=\Pr(Y_{k+1}=1\mid A_{Y}=a_{Y},A_{D}=a_{D})$ by randomization. By the laws of probability $\Pr(Y_{k+1}=1\mid A_{Y}=a_{Y},A_{D}=a_{D})$ can in turn be re-expressed as

[TABLE]

The identification formula of Section 5.2 can be obtained by applying the dismissible component conditions to the terms in the expression above. These additional conditions are needed for identification in our current study because, unlike in $G$ , only $A$ was randomized in our current study and not the separate components $A_{Y}$ and $A_{D}$ . If the experiment $G$ is actually conducted in the future, then the separable effect estimates obtained from the identification formula in our current study can be confirmed by comparing them to estimates of $\Pr(Y_{k+1}=1\mid A_{Y}=a_{Y},A_{D}=a_{D})$ from $G$ [8].

Note that the identification formula of Section 5.2 can also be read off of a Single World Intervention Graph (SWIG) [15] that satisfies the dismissible component conditions, as suggested in Figure 5, illustrating that the separable effects are single-world quantities that are empirically testable in principle. This is in contrast to alternative approaches from mediation analysis that require additional, untestable cross-world independence assumptions [8].

5.4. Separable effects in the presence of censoring

We consider a subject to be censored at time $k+1$ if the subject remained under follow-up and was event-free until $k$ , but we have no information about the subject’s events at $k+1$ or later [1]. That is, censoring is a type of event that does not make it impossible for the event of interest to occur and we assume that censoring can in principle be prevented [1]. When the censoring is independent of future counterfactual events given $L$ , as illustrated in Figure 6, we can identify the separable effects from

[TABLE]

where $C_{k}$ is an indicator of being censored at $k$ ; see Appendix C for details. Alternatively, the identification formula can be derived by drawing a SWIG for the scenario of interest, as suggested in Figure 7. Hereafter we will use $\nu_{a_{Y},a_{D},k}$ to denote this g-formula, which coincides with the identification formula of Section 5.2 when there is no censoring.

5.5. Alternative representations of the identification formula

The g-formula with censoring in Section 5.4 can also be expressed as

[TABLE]

where

[TABLE]

see Appendix D for details. Furthermore, another representation of the g-formula with censoring in Section 5.4 is

[TABLE]

where $W_{C,s}(a_{D})$ is defined as in (11) and

[TABLE]

as formally shown in Appendix D. Note that in settings without censoring, $W_{C,s}(a)\equiv 1,a=0,1$ . Representations (11) and (12) motivate inverse probability (IP) weighted estimators of the separable effects, as described in Section 6.

6. Estimation of separable effects

To estimate the separable effects, we emphasize that the identification formulas of Sections 5.2 and 5.4 are functionals of (discrete-time) hazard functions and the density of $L$ . Indeed, $\Pr(Y_{k+1}=1\mid D_{k+1}=Y_{k}=\bar{C}_{k+1}=0,A=a,L=l)$ and $\Pr(D_{k+1}=1\mid D_{k}=Y_{k}=\bar{C}_{k+1}=0,A=a,L=l)$ are often denoted ‘cause-specific hazard functions’ in the statistical literature. Though the term ‘cause-specific’ is confusing because the causal interpretation of these hazard functions is ambiguous [1], we can nevertheless estimate these functions using classical statistical models, such as multiplicative or additive hazard models. Provided that these hazard models are correctly specified, along with $\widehat{\Pr}(L=l)$ [16], we can consistently estimate the g-formula with censoring in Section 5.4 using a parametric g-formula estimator [6]. However, we can also derive weighted estimators that rely on fewer model assumptions.

6.1. Inverse probability weighted estimators

Motivated by the alternative g-formula representation (11), define

[TABLE]

where $\Pr(D_{j+1}=0\mid\bar{C}_{j+1}=D_{j}=Y_{j}=0,L,A=a_{D};\eta_{D})$ is a parametric model for the numerator (and denominator) of $W_{D,k}(a_{Y},a_{D})$ indexed by parameter $\eta_{D}$ , and $\hat{\eta}_{D}$ is a consistent estimator of $\eta_{D}$ (e.g. the MLE), and the terms in $\hat{W}_{C,k,i}(a_{D};\hat{\eta}_{C})$ are defined similarly, where $\hat{\eta}_{C}$ is a consistent estimator of $\eta_{C}$ .

Let $\eta_{1}=(\eta_{D},\eta_{C})$ , and define the estimator $\hat{\nu}_{1,a_{Y},a_{D},k}$ of $\nu_{a_{Y},a_{D},k}$ as the solution to the estimating equation $\sum_{i=1}^{n}U_{1,k,i}(\nu_{a_{Y},a_{D},k},\hat{\eta}_{1})=0$ with respect to $\nu_{a_{Y},a_{D},k}$ with

[TABLE]

and $\hat{W}_{1,s,i}(a_{Y},a_{D};\hat{\eta}_{1})=\hat{W}_{D,s,i}(a_{Y},a_{D};\hat{\eta}_{D})\hat{W}_{C,s,i}(a_{Y};\hat{\eta}_{C})$ .

Then, $\hat{\nu}_{1,a_{Y},a_{D},k}$ is a consistent estimator for $\nu_{a_{Y},a_{D},k}$ if the models indexed by elements in $\eta_{1}$ are correctly specified and $\hat{\eta}_{1}$ is a consistent estimator for $\eta_{1}$ , which follows because the g-formula with censoring in Section 5.4 and representation (11) are equal. For example, we can use conventional statistical models for binary outcomes, such as pooled logistic regression models, to estimate the weights $W_{D,k}(a_{Y},a_{D})$ and $W_{C,k}(a_{Y})$ .
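To make the weighting idea concrete, the following Python sketch computes cumulative competing-event weights for one subject and a weighted risk estimate within the arm $A=a_{Y}$. It rests on assumptions that are not spelled out in the text above: that $W_{D,s}(a_{Y},a_{D})$ is a cumulative product over $j\leq s$ of the ratio of $\Pr(D_{j+1}=0\mid\cdot,A=a_{D})$ to $\Pr(D_{j+1}=0\mid\cdot,A=a_{Y})$, that there is no censoring (so $W_{C,s}\equiv 1$), and that the estimating equation reduces to a weighted sample mean. The function names and the record layout are hypothetical.

```python
def competing_event_weights(surv_d_num, surv_d_den):
    """Cumulative weights W_{D,s}, s = 0, ..., len - 1, for one subject.

    surv_d_num[j]: fitted Pr(D_{j+1}=0 | event-free, L, A=a_D)
    surv_d_den[j]: fitted Pr(D_{j+1}=0 | event-free, L, A=a_Y)
    """
    weights, running = [], 1.0
    for num, den in zip(surv_d_num, surv_d_den):
        running *= num / den
        weights.append(running)
    return weights


def ipw_risk(records, k):
    """Weighted risk by time k+1 among subjects observed in arm A = a_Y.

    Each record is a dict with lists indexed by s = 0, ..., K:
      'y'[s] = Y_{s+1}, 'd'[s] = D_{s+1}, 'w'[s] = W_{D,s}.
    """
    total = 0.0
    for r in records:
        for s in range(k + 1):
            y_prev = r['y'][s - 1] if s > 0 else 0  # Y_s, with Y_0 = 0
            total += r['w'][s] * (1 - y_prev) * (1 - r['d'][s]) * r['y'][s]
    return total / len(records)
```

When all weights equal one, `ipw_risk` is simply the empirical cumulative incidence in the arm, which is a useful sanity check before plugging in fitted weights.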

Analogous to $\hat{\nu}_{1,a_{Y},a_{D},k}$ , we can derive an estimator based on (12). Suppose

[TABLE]

where the terms in $\hat{W}_{Y,k,i}(a_{D},a_{Y};\hat{\eta}_{Y})$ are statistical models for binary outcomes and $\hat{\eta}_{Y}$ is a consistent estimator for $\eta_{Y}$ .

Let $\eta_{2}=(\eta_{Y},\eta_{C})$ , and define the estimator $\hat{\nu}_{2,a_{Y},a_{D},k}$ of $\nu_{a_{Y},a_{D},k}$ as the solution to the estimating equation $\sum_{i=1}^{n}U_{2,k,i}(\nu_{a_{Y},a_{D},k},\hat{\eta}_{2})=0$ with respect to $\nu_{a_{Y},a_{D},k}$ , where

[TABLE]

and $\hat{W}_{2,s,i}(a_{Y},a_{D};\hat{\eta}_{2})=\hat{W}_{C,s,i}(a_{D};\hat{\eta}_{C})\hat{W}_{Y,s,i}(a_{D},a_{Y};\hat{\eta}_{Y})$ . Analogous to the estimator based on (11), provided that the models indexed by elements in $\eta_{2}$ are correctly specified and $\hat{\eta}_{2}$ is a consistent estimator for $\eta_{2}$ , consistency of $\hat{\nu}_{2,a_{Y},a_{D},k}$ for $\nu_{a_{Y},a_{D},k}$ follows because the g-formula with censoring in Section 5.4 and representation (12) are equal.

In the next section, we use this approach to analyze a randomized trial on prostate cancer therapy. In Appendix F, we present simulations suggesting that the estimators perform satisfactorily in finite samples. The simulations also illustrate that the separable effect can be substantially different from the total effect, and that the estimators may be biased if the dismissible component conditions are violated.

6.2. Example: A randomized trial of prostate cancer

Consider, as described in Section 3.1, a hypothetical drug that has the same direct effect as DES on prostate cancer mortality (same $A_{Y}$ component), but lacks any effect on mortality due to other causes (opposite $A_{D}$ component). Then we can define separable direct effects of treatment DES on prostate cancer mortality $Y_{k}$ in the presence of competing mortality $D_{k}$ from other causes. We estimated these separable effects using a parametric g-formula estimator and, for simplicity, one of the inverse probability (IP) weighted estimators ( $\hat{\nu}_{1,a_{Y},a_{D},k}$ ). We used publicly available data from a randomized trial (http://biostat.mc.vanderbilt.edu/DataSets) [17] that has been used in several methodological articles on competing events [18, 19, 20, 21]. In total, 502 patients were assigned to 4 different treatment arms. We restrict our analysis to the placebo arm (127 patients) and the high-dose DES arm (125 patients).

To implement the parametric g-formula estimator, we used pooled logistic regression models to estimate the terms in the g-formula with censoring in Section 5.4, with daily activity function, age group, hemoglobin level and previous cardiovascular disease included as covariates ( $L$ in Figure 6), that is,

[TABLE]

where $\theta_{0,k}$ and $\beta_{0,k}$ are time-varying intercepts modeled as cubic polynomials. To allow time-varying treatment effects, we included $\theta_{2},\theta_{3},\beta_{2}$ and $\beta_{3}$ .
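Fitting a pooled logistic regression first requires the data in person-time format, with one row per subject and interval at risk. The sketch below shows one way to build such rows in Python. The record layout, the function name `person_time_rows`, and in particular the form of the treatment-by-time terms (`A_k`, `A_k2`) are assumptions made for illustration, and the baseline covariates $L$ are omitted for brevity.

```python
def person_time_rows(subject, outcome):
    """Expand one subject into discrete-time rows for a pooled logistic model.

    subject: dict with keys
        'A'     treatment arm (0 or 1),
        'time'  last interval under observation (1-based),
        'event' 'Y', 'D', or None (event-free when follow-up ends)
    outcome: 'D' builds rows for the competing-event model,
             'Y' builds rows for the event-of-interest model.
    """
    rows = []
    for k in range(subject['time']):
        last = (k == subject['time'] - 1)
        d_next = int(last and subject['event'] == 'D')
        y_next = int(last and subject['event'] == 'Y')
        if outcome == 'Y' and d_next == 1:
            continue  # Y_{k+1} is modelled only among those with D_{k+1} = 0
        rows.append({
            'k': k, 'k2': k ** 2, 'k3': k ** 3,  # cubic polynomial in time
            'A': subject['A'],
            'A_k': subject['A'] * k,             # assumed treatment-by-time terms
            'A_k2': subject['A'] * k ** 2,
            'event': d_next if outcome == 'D' else y_next,
        })
    return rows
```

The resulting rows can be passed to any routine for logistic regression; the fitted probabilities per row are then the discrete-time hazards used by the g-formula and the weights.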

To implement the IP weighted estimator $\hat{\nu}_{1,a_{Y},a_{D},k}$ , we only require the model for the competing event $D_{k+1}$ above (similarly, we would only require the model for the event of interest $Y_{k+1}$ to implement $\hat{\nu}_{2,a_{Y},a_{D},k}$ ).

Both the parametric g-formula and IP weighted estimator gave cumulative incidence estimates under the hypothetical drug that were similar, but not identical, to those under DES treatment. Table 1 displays estimates of the 3-year cumulative incidence and 95% bootstrap confidence intervals based on both estimators and Figure 8B shows cumulative incidence curves from the IP weighted estimator (R code is provided in the supplementary material).

Our analysis suggests that DES mostly reduces prostate cancer mortality via testosterone suppression because the estimate of the separable indirect effect on 3-year mortality is close to zero. Using either the parametric g-formula or the IP weighted estimator, the estimate of the additive indirect effect after 3 years of follow-up is $0.01$ ( $0.15-0.14=0.01$ and $0.18-0.17=0.01$ ), which can be interpreted as the reduction in prostate cancer mortality under DES compared with placebo that is due to the DES effect on mortality from other causes. That is, the total effect of DES on prostate cancer mortality is not simply a consequence of a harmful effect on death from other causes.
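The additive decomposition underlying this interpretation can be written as a telescoping sum. The identity below holds by construction; the choice of $(a_{Y}=1,a_{D}=0)$ as the intermediate arm is one of two possible orderings and is made here only for illustration.

```latex
\begin{align*}
\underbrace{\Pr(Y_{k+1}^{a_Y=1,a_D=1}=1)-\Pr(Y_{k+1}^{a_Y=0,a_D=0}=1)}_{\text{total effect}}
 ={}& \underbrace{\Pr(Y_{k+1}^{a_Y=1,a_D=1}=1)-\Pr(Y_{k+1}^{a_Y=1,a_D=0}=1)}_{\text{separable indirect effect at } a_Y=1} \\
 &+ \underbrace{\Pr(Y_{k+1}^{a_Y=1,a_D=0}=1)-\Pr(Y_{k+1}^{a_Y=0,a_D=0}=1)}_{\text{separable direct effect at } a_D=0}
\end{align*}
```

A separable indirect effect close to zero therefore means that the total effect is almost entirely attributable to the $A_{Y}$ component.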

The validity of our estimates relies on the assumption that $L$ is sufficient to adjust for the common causes of $Y_{k}$ and $D_{k}$ . This assumption would be violated if other factors, such as unmeasured comorbidities, exert effects on both $Y_{k}$ and $D_{k}$ . Also, our approach relies on the absence of time-varying common causes of the event of interest and the competing event in many settings. In future work, we will generalize our approach to allow for time-varying covariates.

7. Discussion

We have defined separable effects as new estimands to promote causal reasoning in competing event settings. The separable effects are motivated by hypothetical interventions, in which a time-fixed treatment is decomposed into distinct components, and each component can be assigned different values.

Therefore, to define and interpret the separable effects, investigators must use their subject-matter knowledge to explicitly articulate a hypothetical decomposition of the treatment. An explicit consideration of this decomposition helps assess the plausibility of the assumptions and guides the design of future experiments to empirically verify the effects [8].

Classical statistical estimands fail to provide the same information as the separable effects (see Young et al [1] for a detailed discussion of interpretation and identification of counterfactual contrasts in classical estimands for competing event settings). In particular, the cumulative incidence functions of the event of interest and the competing event do not clarify the mechanism by which treatment exerts effects on the event of interest, even if these outcomes are considered jointly in an ideal randomized trial. Furthermore, estimands on the hazard scale, e.g. subdistribution hazards and cause-specific hazards, do not have a straightforward causal interpretation and thus cannot solve the problem [1, 4].

Identification of separable effects requires, even in a perfectly executed randomized trial, adjustment for pretreatment variables that are common causes of the event of interest and the competing event. However, this strong condition is also needed for the causal interpretation of analysis of trials targeting conventional estimands such as controlled direct effects or counterfactual contrasts of hazard functions [1].

For simplicity, we have considered settings in which the treatment $A$ is randomly assigned. For example, we illustrated the application of standard time-to-event methods to estimate the separable effects in a prostate cancer randomized trial. However, our approach can be easily extended to analyses of observational studies under the additional assumption of no unmeasured confounding for the effect of treatment on both the competing event and the event of interest.

Finally, the idea of separable effects is not only relevant to settings in which the outcome of interest is a time-to-event. Many practical settings involve intermediate outcomes that are ill-defined after the occurrence of a terminating event. For example, we may be interested in treatment effects on outcomes such as quality of life or cognitive function, and these outcomes are meaningless after death. We aim to study separable effects in such settings in future research.

Acknowledgements

This work was funded by NIH grant R37 AI102634. M.J.S. was also supported by an ASISA Fellowship and the Research Council of Norway, grant NFR239956/F20.

Appendix A Some intuition about the magnitude of the separable direct effects.

Consider the following scenarios:

• Scenario 1: $A$ has a null direct effect on the competing event ( $A\nrightarrow D_{k}$ ), and the separable direct effect is equal to the total effect.

• Scenario 2: $A$ has a null direct effect on the event of interest ( $A\nrightarrow Y_{k}$ ), and the indirect effect is equal to the total effect.

• Scenario 3: $A$ has an average harmful (positive) total effect on both $Y_{k}$ and $D_{k}$ . The separable direct effects $\Pr(Y_{k+1}^{a_{Y}=1,a_{D}}=1)\text{ vs. }\Pr(Y_{k+1}^{a_{Y}=0,a_{D}}=1)$ are harmful (positive), and the separable indirect effects $\Pr(Y_{k+1}^{a_{Y},a_{D}=1}=1)\text{ vs. }\Pr(Y_{k+1}^{a_{Y},a_{D}=0}=1)$ are protective (negative).

• Scenario 4: $A$ has an average harmful (positive) total effect on $Y_{k}$ and a protective (negative) total effect on $D_{k}$ , and the separable direct effects $\Pr(Y_{k+1}^{a_{Y}=1,a_{D}}=1)\text{ vs. }\Pr(Y_{k+1}^{a_{Y}=0,a_{D}}=1)$ are harmful (positive), and the separable indirect effects $\Pr(Y_{k+1}^{a_{Y},a_{D}=1}=1)\text{ vs. }\Pr(Y_{k+1}^{a_{Y},a_{D}=0}=1)$ are harmful (positive).

To provide some intuition about the magnitude of the separable effects across these scenarios, we conducted simulations under the following data generating process:

(1) Draw $L_{1}\sim\text{Bernoulli}[p=0.25]$ .

(2) Draw $A_{Y}\sim\text{Bernoulli}[p=0.5]$ .

(3) Draw $A_{D}\sim\text{Bernoulli}[p=0.5]$ .

(4) Define $A=a$ if $A_{Y}=a$ and $A_{D}=a$ .

(5) Set $D_{0}=Y_{0}=0$ .

(6) For each $k\in\{0,\ldots,K\}$ ,

• if $D_{k}=Y_{k}=0$ , draw $D_{k+1}\sim\text{Bernoulli}[p=\psi_{k}(A_{Y},A_{D},L_{1},L_{2})]$ , where

[TABLE]

if $D_{k+1}=0$ , draw $Y_{k+1}\sim\text{Bernoulli}[p=\lambda_{k}(A_{D},L_{1})]$ , where

[TABLE]

if $D_{k+1}=1$ , set $Y_{k+1}=0$ .

• else, define $D_{k+1}=D_{k}$ , $Y_{k+1}=Y_{k}$ .
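The data generating process above can be sketched in Python as follows. The hazard functions `psi` and `lam` below are hypothetical logistic placeholders with made-up coefficients, not the functional forms or the Table 2 coefficients used in our simulations, and `simulate_subject` is an illustrative name. For generality both hazard callables receive both treatment components.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(K, rng, psi, lam):
    """Simulate one subject; returns D_0..D_{K+1} and Y_0..Y_{K+1}."""
    l1 = int(rng.random() < 0.25)
    a_y = int(rng.random() < 0.5)
    a_d = int(rng.random() < 0.5)
    d, y = [0], [0]
    for k in range(K + 1):
        if d[k] == 0 and y[k] == 0:
            # The competing event is drawn first; it precludes Y in the same interval.
            d_next = int(rng.random() < psi(k, a_y, a_d, l1))
            y_next = int(rng.random() < lam(k, a_y, a_d, l1)) if d_next == 0 else 0
        else:
            # After either event, the indicators are carried forward.
            d_next, y_next = d[k], y[k]
        d.append(d_next)
        y.append(y_next)
    return {'L1': l1, 'A_Y': a_y, 'A_D': a_d, 'D': d, 'Y': y}


# Hypothetical hazards: D depends on a_d only, Y depends on a_y only.
psi = lambda k, a_y, a_d, l1: expit(-3.0 + 0.5 * a_d + 0.5 * l1)
lam = lambda k, a_y, a_d, l1: expit(-3.0 + 0.5 * a_y + 0.5 * l1)

rng = random.Random(2020)
cohort = [simulate_subject(11, rng, psi, lam) for _ in range(1000)]
```

Because $A_{Y}$ and $A_{D}$ are drawn independently, all four arms are populated, so the risks under $a_{Y}\neq a_{D}$ can be read off directly as arm-specific proportions.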

Scenario 1 is illustrated in Figure 9a, which was generated using the coefficients from the first row of Table 2.

Scenario 2 is illustrated in Figure 9b, which was generated using the coefficients from the second row of Table 2.

Scenario 3 is illustrated in Figure 9c, which was generated using the coefficients from the third row of Table 2.

Scenario 4 is illustrated in Figure 9d, which was generated using the coefficients from the fourth row of Table 2.

To provide additional intuition about the magnitude of the separable effects, it may be helpful to consider two hypothetical sets of individuals (Table 3).

First, define the set $Q_{k}$ of individuals such that $i\in Q_{k}$ if $i$ would experience the competing event at time $t_{i}<k$ under full treatment (that is, $A_{Y}=1,A_{D}=1$ ), and would experience the event of interest at a time $s_{i}$ , where $t_{i}<s_{i}<k$ , under the hypothetical treatment $A_{Y}=1,A_{D}=0$ , see Table 3. Heuristically, this happens if the hypothetical treatment delays the competing event such that the event of interest is allowed to occur. If $Q_{k}$ comprises a large fraction of the population, we would expect the total effect and the separable direct effect to be different, because competing events would make it impossible for the event of interest to occur under full treatment, but not under the hypothetical treatment.

Second, define the set of individuals $R_{k}$ such that all individuals $j\in R_{k}$ experience the competing event at time $t_{j}<k$ under full treatment, but would either experience the competing event at $s_{j}$ , where $s_{j}<k$ , or not experience any event before $k$ under the hypothetical treatment. That is, the subjects in $R_{k}$ will not experience the event of interest before $k$ under the hypothetical treatment, regardless of the time at which the competing event occurs. If $R_{k}$ comprises a large fraction of the population, the total effect and the separable direct effect on the event of interest will be close.

Appendix B Conditional Independencies that imply the dismissible component conditions.

We expressed the dismissible component conditions $\Delta$ 1 and $\Delta$ 2 in terms of equalities of hazard functions. We now show that these equalities are implied by certain counterfactual independencies that can be read directly off of successive single world transformation of a causal DAG.

Hypothetical trial

Suppose that each component of $A$ is randomly assigned in a hypothetical 4-arm trial $G$ . To indicate that the random variables are defined with respect to $G$ , let $A_{Y}(G)$ and $A_{D}(G)$ be the values of $A_{Y}$ and $A_{D}$ observed under $G$ , respectively. We assume that $A_{Y}(G)$ and $A_{D}(G)$ are randomized independently of each other to values in $\{0,1\}$ , that is $A_{Y}(G)\perp\!\!\!\perp A_{D}(G)$ . Assume no losses to follow-up. Define the independencies

[TABLE]

B.1. Conditions that ensure $\Delta$ 1 and $\Delta$ 2

Since $A_{Y}(G)$ and $A_{D}(G)$ are randomly assigned, conditional exchangeability is satisfied in the trial $G$ , such that

[TABLE]

where $a_{Y},a_{D}\in\{0,1\}$ . In the special case where $a_{Y}=a_{D}$ , this conditional exchangeability condition is the same as the conditional exchangeability condition in the main text.

Furthermore, we assume consistency in $G$ , that is, if $A_{Y}=a_{Y}$ and $A_{D}=a_{D}$ then

[TABLE]

where $a_{Y},a_{D}\in\{0,1\}$ . This consistency condition is identical to the consistency condition in the main text when $a_{Y}=a_{D}$ .

We assume positivity in $G$ , that is, for all $l\in\mathcal{L}$ ,

[TABLE]

which holds by design in $G$ .

Let $a_{Y}=0$ , $a_{D}=1$ (an analogous argument holds when $a_{Y}=1$ , $a_{D}=0$ ). Using exchangeability and consistency we find that, for all $l\in\mathcal{L}$ ,

[TABLE]

Similarly, using (15), exchangeability and consistency we find

[TABLE]

The two derivations above show that $\Delta$ 1 is satisfied if condition (15) holds, assuming conditional exchangeability, positivity and consistency. We can use exactly the same argument to show that condition $\Delta$ 2 holds under conditional exchangeability, positivity, consistency and condition (16). Conditions (15) and (16) are helpful in practice because these independences can be evaluated in causal graphs. In particular, these conditions hold in Figure 10, where we have described a trial in which $A_{Y}$ and $A_{D}$ are randomly assigned such that $\Pr(A_{Y}=a_{Y},A_{D}=a_{D})>0$ for all $a_{D},a_{Y}\in\{0,1\}$ .

Note that conditions (3) and (4) in the main text, which are part of the decomposition assumption, are required for the independencies (15) and (16) to hold.

Appendix C Proof of identifiability

We assume a Finest Fully Randomized Causally Interpretable Structured Tree Graph (FFRCISTG) model [6]. The aim is to identify $\Pr\left(Y^{a_{Y},a_{D},\bar{c}=0}_{k}=1\right)$ as a function of the factual data, in which $A$ is randomized. To do this, we will initially consider a scenario in which both $A_{Y}$ and $A_{D}$ are randomized, that is, we consider a 4-arm trial $G$ , as described in Appendix B. Hereafter we omit the string ‘ $(G)$ ’ after the random variables, e.g. $A_{Y}(G)=A_{Y}$ , to avoid clutter. We will provide a proof for the scenario with a measured pretreatment covariate $L$ and censoring $C_{k}$ . The results will immediately hold in simpler scenarios, e.g. by defining $L$ or $C_{k}$ to be the empty set.

C.1. Identifiability conditions in the presence of censoring

First, we generalize the identifiability conditions to allow for censoring. Assume that subjects may be lost to follow-up, and that the losses to follow-up can depend on $A_{Y}$ , $A_{D}$ and $L$ , as suggested in Figure 6. Further, assume that the losses to follow-up are independent of future counterfactual events (‘independent censoring’). To be more precise, we consider a setting in which we intervened such that no subject was lost to follow-up. Let $C_{k}\in\{0,1\}$ be an indicator of loss to follow-up by $k$ . Let $D^{a_{Y},a_{D},\bar{c}=0}_{k}$ and $Y^{a_{Y},a_{D},\bar{c}=0}_{k}$ be the counterfactual values of $D_{k}$ and $Y_{k}$ when $A_{Y}$ is set to $a_{Y}$ , $A_{D}$ is set to $a_{D}$ , and follow-up is ensured at all times.

In a continuous time setting, it is usually assumed that two events cannot occur at the same point in time. In our discrete time setting with pretreatment covariates $L$ and censoring $C_{k}$ , we define a temporal order

[TABLE]

For all $k\in\{0,\ldots,K\}$ we consider the following conditions. First, we extend the exchangeability conditions from Section 5.1,

[TABLE]

Here, as in Section 5.1, E1 holds when $A\equiv A_{Y}\equiv A_{D}$ is randomized. E2 requires that losses to follow-up are independent of future counterfactual events, given the measured past. This condition is similar to the ‘independent censoring’ condition that is assumed to hold in classical randomized trials [1].

Furthermore, we require a consistency condition such that if $A_{Y}=a_{Y}$ , $A_{D}=a_{D}$ and $\bar{C}_{k}=0$ , then $Y_{k}={Y}^{a_{Y},a_{D},\bar{c}=0}_{k}$ and $D_{k}={D}^{a_{Y},a_{D},\bar{c}=0}_{k}$ , even though we only observe scenarios where $a_{Y}=a_{D}$ . The consistency condition ensures that if an individual has a data history consistent with the intervention under a counterfactual scenario, then the observed outcome is equal to the counterfactual outcome.

Similar to Section 5.1, the exchangeability and consistency conditions are conventional in the causal inference literature. We also require an extra positivity condition in the presence of censoring, that is,

[TABLE]

for $a\in\{0,1\}$ , which ensures that for any possible history of treatment assignments and covariates among those who are event-free and uncensored at $k$ , some subjects will remain uncensored at $k+1$ .

Finally, we rely on two dismissible component conditions which generalize the conditions in Section 5.1, by allowing for a hypothetical intervention to eliminate censoring at all times.

Dismissible component conditions: For all $l\in\mathcal{L}$ ,

[TABLE]

Under these conditions, $\Pr(Y^{a_{Y},a_{D},\bar{c}=0}_{K+1}=1)$ is identified from the g-formula with censoring in Section 5.4.

C.2. Proof of identifiability

We consider the counterfactual outcomes in a setting where $a_{Y}=0$ and $a_{D}=1$ (analogous arguments hold for the setting where $a_{Y}=1$ and $a_{D}=0$ ), and we use laws of probability as well as $\Delta$ 1c and $\Delta$ 2c to express

[TABLE]

where $Y^{a_{Y},a_{D},\bar{c}=0}_{-1}$ and $Y^{a_{Y},\bar{c}=0}_{-1}$ are empty sets.

For $s\geq 0$ and all $l$ such that $\Pr(D^{a,\bar{c}=0}_{s+1}=Y^{a,\bar{c}=0}_{s}=0,L=l)>0$ , let us consider the term

[TABLE]

where we use the fact that all subjects are event-free and uncensored at $k=0$ in the 2nd line, and laws of probability and E1 in the 3rd line. Then, we use positivity and E2 to find

[TABLE]

Similarly, if $s=1$ we use consistency, a further application of positivity and E2 as in the previous step, and consistency again to find that

[TABLE]

If $s>1$ , we use consistency to find

[TABLE]

Then, we repeat the positivity-and-exchangeability step and the consistency step above to find that for all $s\in\{1,2,\ldots,K+1\}$ ,

[TABLE]

Similarly, for $D^{a,\bar{c}=0}_{s+1}$ we could follow the same steps as for $Y^{a,\bar{c}=0}_{s+1}$ to express

[TABLE]

Combining the results derived above, namely the decomposition of the counterfactual risk and the expressions for the hazards of $Y$ and $D$ , we find that

[TABLE]

In words, we have derived that $\Pr(Y^{a_{Y},a_{D},\bar{c}=0}_{K+1}=1)$ is identified from a trial in which only subjects with $(A_{Y}=A_{D}=A)$ are observed, i.e. in a trial in which $A$ is randomized. Hence, in practice we only need data from the treatment arms in which $A\equiv A_{Y}\equiv A_{D}\in\{0,1\}$ .

Appendix D Proof of weighted representations

For ease of exposition, define

[TABLE]

Consider the expression

[TABLE]

where we use the definition of expected value, the fact that $Y_{k}$ and $D_{k}$ are binary, and laws of probability.

We use laws of probability to express $\Pr(\bar{Y}_{k}=\bar{D}_{k}=\bar{C}_{k}=0,l\mid A=a_{Y})$ as

[TABLE]

where any variable indexed with a number $m<0$ is defined to be the empty set.

Arguing iteratively for $k-1,k-2,...,0$ we find that

[TABLE]

We plug in the expression for $W^{\prime}_{C,k}(a_{Y})$ to get

[TABLE]

We plug in the expression for the weights $W_{D,k}(a_{Y},a_{D})$ to get

[TABLE]

and the final expression is equal to the g-formula with censoring in Section 5.4.

Appendix E Exploring the dismissible component conditions

By considering causal graphs, we provide some insight into the interpretation of assumptions $\Delta$ 1 and $\Delta$ 2.

E.1. Scenario in which the dismissible component conditions are satisfied.

Consider the study from Appendix B in which $A_{Y}$ and $A_{D}$ were randomized without loss to follow-up, which ensures positivity and exchangeability. Furthermore, we assume that the usual consistency assumption is satisfied: if $A_{Y}=a_{Y}$ , $A_{D}=a_{D}$ , then $Y_{k}={Y}^{a_{Y},a_{D}}_{k}$ .

Assume that the causal structure in the single world intervention template (SWIT) of Figure 5 holds. Here, $A_{Y}$ is d-separated from both $Y^{a_{Y},a_{D}}_{k}$ and $D_{k}^{a_{Y},a_{D}}$ for $k\in\{1,2\}$ . Similarly $A_{D}$ is d-separated from both $Y^{a_{Y},a_{D}}_{k}$ and $D^{a_{Y},a_{D}}_{k}$ . Hence, under the assumptions about positivity and consistency, we can identify the following joint law from the g-formula,

[TABLE]

where the last equality follows due to conditional independences that we read off the causal graph. Similarly, we can identify

[TABLE]

Using the law of total probability,

[TABLE]

Hence,

[TABLE]

that is $\Delta$ 1 is satisfied at $k=2$ . Using the same argument, we can derive that $\Delta$ 2 is satisfied for $k=2$ , and both $\Delta$ 1 and $\Delta$ 2 will be satisfied for $k=1$ . That is, Figure 5 implies that $\Delta$ 1 and $\Delta$ 2 hold. Furthermore, we could use exactly the same derivations to find that $\Delta$ 1 and $\Delta$ 2 hold in Figure 11, even if $U_{Y}$ and $U_{D}$ are unmeasured.

E.2. Scenario in which the dismissible component conditions are not necessarily satisfied

Consider the SWIT in Figure 12, which only differs from Figure 5 in the variable $U_{YD}$ that is an unmeasured common cause of $Y_{1}$ and $D_{1}$ . Here we read off Figure 12 to find that

[TABLE]

However, we cannot conclude from the graph that

[TABLE]

because there is an open collider path $a_{D}\rightarrow D_{1}\leftarrow U_{YD}\rightarrow Y_{1}$ . Hence, we cannot conclude that the graph in Figure 12 implies $\Delta$ 1, and our results do not allow us to identify $\Pr(Y^{a_{Y},a_{D}}_{1}=1)$ in this scenario. The unmeasured common cause $U_{YD}$ of $Y_{k}$ and $D_{k^{\prime}}$ for $k,k^{\prime}\in\{0,1,\ldots,K+1\}$ leads to violation of $\Delta$ 1 and $\Delta$ 2.

Appendix F Simulations

Here we present simulations from 5 scenarios to illustrate the finite sample performance of the estimators of the separable effects. We consider settings where the dismissible component conditions are satisfied, but also settings where these conditions are violated. Furthermore, we consider coverage under violation of the parametric model assumptions.

In each scenario, we simulated two randomized experiments in which 400 and 2000 subjects were randomly assigned to treatment $A\in\{0,1\}$ , respectively. To assess finite sample behavior, we calculated confidence intervals for 3 time points by simulating each experiment 500 times, and for each of these experiments we created non-parametric percentile bootstrap confidence intervals from 500 bootstrap samples.
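The non-parametric percentile bootstrap used for these intervals can be sketched in Python as follows. This is a generic illustration under simple assumptions: subjects are resampled with replacement, the estimator is re-applied to each resample, and the interval endpoints are taken as nearest-rank empirical quantiles. The function name `percentile_bootstrap_ci` and its arguments are hypothetical, and `estimator` stands in for any of the g-formula or IP weighted estimators.

```python
import random


def percentile_bootstrap_ci(data, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(data).

    data: list with one entry per subject
    estimator: function mapping a list of subjects to a number
    """
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # Resample n subjects with replacement and re-estimate.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lower, upper
```

Coverage in a simulation study is then the proportion of simulated experiments whose interval contains the true value of the estimand.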

The true cumulative incidences from the simulation scenarios are shown in Figure 13. Generally, our simulations confirm that the g-formula and IPW estimators perform satisfactorily when the identifiability conditions are satisfied.

F.1. Data generating mechanism

For each individual, the data were generated from the following algorithm, where we have omitted the subscripts $i$ that index individuals:

(1) Draw $L_{1}\sim\text{Bernoulli}[p=0.25]$ .

(2) Draw $L_{2}\sim\text{Bernoulli}[p=0.2L_{1}+0.8(1-L_{1})]$ .

(3) Draw $A\sim\text{Bernoulli}[p=0.5]$ , and define $A_{Y}\equiv A_{D}\equiv A$ .

(4) Set $D_{0}=Y_{0}=0$ .

(5) For each $k\in\{0,\dots,K\}$ ,

• if $D_{k}=Y_{k}=0$ , draw $D_{k+1}\sim\text{Bernoulli}[p=\alpha_{D}\psi_{k}(A_{Y},A_{D},L_{1},L_{2})]$ , where

[TABLE]

if $D_{k+1}=0$ , draw $Y_{k+1}\sim\text{Bernoulli}[p=\alpha_{Y}\lambda_{k}(A_{Y},A_{D},L_{1},L_{2})]$ , where

[TABLE]

if $D_{k+1}=1$ , set $Y_{k+1}=0$ .

• else, define $D_{k+1}=D_{k}$ , $Y_{k+1}=Y_{k}$ .
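The steps above can be sketched in code as follows. This is a hedged illustration only: the functional forms of $\psi_{k}$ and $\lambda_{k}$ are given in the paper's displayed equations and Table 4, which are not reproduced here, so `psi_placeholder` and `lam_placeholder` are hypothetical logistic stand-ins with made-up coefficients, and `simulate_individual` is an assumed name.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical stand-ins for psi_k and lambda_k (NOT the paper's functions).
def psi_placeholder(k, a_y, a_d, l1, l2):
    return expit(-4.0 + 0.5 * a_d + 0.5 * l1)


def lam_placeholder(k, a_y, a_d, l1, l2):
    return expit(-4.0 + 0.5 * a_y + 0.5 * l1)


def simulate_individual(psi, lam, alpha_d, alpha_y, K, rng):
    """Generate (L1, L2, A, D_0..D_{K+1}, Y_0..Y_{K+1}) for one individual."""
    l1 = int(rng.random() < 0.25)                        # step (1)
    l2 = int(rng.random() < 0.2 * l1 + 0.8 * (1 - l1))   # step (2)
    a = int(rng.random() < 0.5)                          # step (3)
    a_y = a_d = a
    d, y = [0], [0]                                      # step (4)
    for k in range(K + 1):                               # step (5)
        if d[k] == 0 and y[k] == 0:
            # The competing event is drawn first within each interval.
            d_next = int(rng.random() < alpha_d * psi(k, a_y, a_d, l1, l2))
            y_next = 0
            if d_next == 0:
                y_next = int(rng.random() < alpha_y * lam(k, a_y, a_d, l1, l2))
        else:
            # Once either event has occurred, the indicators are carried forward.
            d_next, y_next = d[k], y[k]
        d.append(d_next)
        y.append(y_next)
    return {"L1": l1, "L2": l2, "A": a, "D": d, "Y": y}


rng = random.Random(1)
trial = [simulate_individual(psi_placeholder, lam_placeholder, 1.0, 1.0, 99, rng)
         for _ in range(400)]
```

Passing the hazard functions as arguments keeps the event-ordering logic separate from the scenario-specific coefficients, so each row of Table 4 would correspond to a different pair of functions.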

The coefficients in each of the scenarios are given in Table 4, and the true cumulative incidence curves of $Y_{k+1},k\in\{0,\dots,99\}$ are shown in Figure 13.
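Because the algorithm in F.1 draws $D_{k+1}$ before $Y_{k+1}$ in each interval, the true cumulative incidence under a given $(a_{Y},a_{D})$ can be computed exactly by a sequential product rather than by Monte Carlo. The sketch below assumes this event ordering and the distribution of $(L_{1},L_{2})$ from steps (1) and (2); the hazard arguments `h_d` and `h_y` stand for $\alpha_{D}\psi_{k}$ and $\alpha_{Y}\lambda_{k}$, and the constant hazards in the usage lines are hypothetical values, not those of Table 4.

```python
def cumulative_incidence(h_d, h_y, a_y, a_d, K):
    """Return [Pr(Y_{k+1} = 1) for k = 0..K] under (a_Y, a_D), averaged over (L1, L2)."""
    curve = [0.0] * (K + 1)
    for l1 in (0, 1):
        p_l1 = 0.25 if l1 else 0.75
        p_l2_is_1 = 0.2 * l1 + 0.8 * (1 - l1)
        for l2 in (0, 1):
            weight = p_l1 * (p_l2_is_1 if l2 else 1 - p_l2_is_1)
            event_free, cum = 1.0, 0.0
            for k in range(K + 1):
                hd = h_d(k, a_y, a_d, l1, l2)
                hy = h_y(k, a_y, a_d, l1, l2)
                # Event of interest in interval k+1: event-free so far,
                # no competing event in this interval, then Y occurs.
                cum += event_free * (1 - hd) * hy
                event_free *= (1 - hd) * (1 - hy)
                curve[k] += weight * cum
    return curve


# Usage with hypothetical constant hazards.
curve = cumulative_incidence(lambda *args: 0.1, lambda *args: 0.2, 1, 0, K=1)
print(curve)
```

Evaluating the function at $(a_{Y},a_{D})=(1,0)$ and $(0,0)$, for example, gives the two curves whose contrast isolates the effect of the $A_{Y}$ component in a setting where the hazards are known by construction.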

F.2. Scenario 1: Dismissible component conditions hold and no model mis-specification.

Data were generated from the simple setting described by the first row in Table 4; that is, there is a causal effect of (i) $A_{Y}$ on $Y_{k}$ , (ii) $A_{D}$ on $D_{k}$ , and (iii) $L_{1}$ on both $Y_{k}$ and $D_{k}$ . Here, both the dismissible component conditions hold conditional on $L_{1}$ .

To estimate the separable effects, we fitted the following models

[TABLE]

which are correctly specified, even though model (32) includes a term $\theta_{3}$ that is redundant. Thus, we would expect all our estimators to have nominal coverage, and this is confirmed in Table 5; here, coverage is derived from estimated 95% confidence intervals based on the parametric g-formula estimator (g-formula) and the weighted estimators ( $\hat{\nu}_{1,a_{Y},a_{D},k}$ and $\hat{\nu}_{2,a_{Y},a_{D},k}$ ) for the trial with $n=400$ subjects.

Scenario 2: Dismissible component conditions hold and minor model mis-specification.

In this scenario, there are causal effects of both $L_{1}$ and $L_{2}$ on $Y_{k}$ and $D_{k}$ (second row in Table 4). Both the dismissible component conditions hold conditional on $L_{1}$ and $L_{2}$ . We used regression models (32) and (33) for model fitting.

Note that in this setting (32) is correctly specified, but (33) is mis-specified because it does not include a term for $L_{2}$ . Thus, we would expect that the IPW estimator that uses the correctly specified regression model ( $\hat{\nu}_{2,a_{Y},a_{D},k}$ ) is unbiased, but the parametric g-formula estimator and the other IPW estimator ( $\hat{\nu}_{1,a_{Y},a_{D},k}$ ) are biased because (33) is mis-specified. The results in Table 6, however, suggest that all estimators have close to nominal coverage. This may be explained by the fact that the model mis-specification is minor, and the magnitude of the separable effects is small (see Figure 13).

Scenario 3: Dismissible component conditions hold and model mis-specification

In this scenario, both the dismissible component conditions hold conditional on $L_{1}$ and $L_{2}$ . Unlike Scenarios 1 and 2, we fitted the following regression models to the simulated data,

[TABLE]

Here, (34) is mis-specified because it does not include a term for $L_{2}$ , but (35) is correctly specified; thus the pattern of correct and incorrect model specification is the reverse of that in Scenario 2. Also, $L_{2}$ exerts larger effects on $Y_{k}$ and $D_{k}$ in this setting compared to Scenario 2.

The results in Table 7 illustrate that the IPW estimator $\hat{\nu}_{1,a_{Y},a_{D},k}$ is unbiased because it relies on a correctly specified model, but the parametric g-formula estimator and the other IPW estimator ( $\hat{\nu}_{2,a_{Y},a_{D},k}$ ) are biased – in particular, for shorter follow-up times – because they rely on mis-specified regression models.

Scenario 4: Dismissible component conditions fail and model misspecification.

The dismissible component condition $\Delta 2$ fails in this scenario due to the non-zero coefficient $\omega_{3}=5$ ; there is a direct effect $A_{Y}\rightarrow D_{k}$ for $k\in\{0,\dots,100\}$ . Yet we fitted regression models (32) and (33) to the simulated data.

The simulations suggest that none of the estimators has nominal coverage for $\Pr(Y^{a_{Y}=0,a_{D}=1}_{k+1}=1)$ . However, since dismissible component condition $\Delta 1$ holds we can identify $\Pr(Y^{a_{Y}=1,a_{D}=0}_{k+1}=1)$ , as suggested by the nominal coverage for this quantity in Table 8. Yet we cannot interpret a contrast $\Pr(Y^{a_{Y}=0,a_{D}=1}_{k+1}=1)\text{ vs }\Pr(Y^{a_{Y}=1,a_{D}=1}_{k+1}=1)$ as the separable direct effect of $A$ , due to the violation of the dismissible component condition.

Scenario 5: Dismissible component conditions hold and no model misspecification.

In this scenario, $L_{1}$ exerts (strong) causal effects on $Y_{k}$ but not on $D_{k}$ . Thus, all the dismissible component conditions hold marginally. To illustrate that we obtain unbiased estimates even if $L_{1}$ is not included in any of the regression models, we fitted the parsimonious models,

[TABLE]

and the results in Table 9 show that all estimators have nominal coverage, even if $L_{1}$ is not included in the models.
