Bounding Causes of Effects with Mediators
Philip Dawid, Macartan Humphreys, Monica Musio

Philip Dawid, University of Cambridge
Macartan Humphreys, Columbia University & WZB Berlin
Monica Musio, Università degli Studi di Cagliari
Abstract
Suppose X and Y are binary exposure and outcome variables, and we have full knowledge of the distribution of Y, given application of X. From this we know the average causal effect of X on Y. We are now interested in assessing, for a case that was exposed and exhibited a positive outcome, whether it was the exposure that caused the outcome. The relevant "probability of causation", PC, typically is not identified by the distribution of Y given X, but bounds can be placed on it, and these bounds can be improved if we have further information about the causal process. Here we consider cases where we know the probabilistic structure for a sequence of complete mediators between X and Y. We derive a general formula for calculating bounds on PC for any pattern of data on the mediators (including the case with no data). We show that the largest and smallest upper and lower bounds that can result from any complete mediation process can be obtained in processes with at most two steps. We also consider homogeneous processes with many mediators. PC can sometimes be identified as 0 with negative data, but it cannot be identified at 1 even with positive data on an infinite set of mediators. The results have implications for learning about causation from knowledge of general processes and of data on cases.
1 Introduction
Even the best possible evidence regarding the effects of a treatment on an outcome is generally not enough to identify the probability that the outcome was caused by the treatment.
For instance, researchers conducting randomised controlled trials may determine that providing a medicine to school children increases the overall probability of good health from one third to two thirds. This information, no matter how precise, is not enough to answer the following question: Is Ann healthy because she took the medicine? It is not even enough to answer the question probabilistically. The reason is that, consistent with these results, it may be that the medicine makes a positive change for 2 out of 3 students, but an adverse change for the remainder: in that case the medicine certainly helped Ann. But it might alternatively be that the medicine makes a positive change for 1 in 3 children but no change for the others. In that case the chances it helped Ann are just 1 in 2. Of the children taking the medicine, two thirds are healthy. Half of these are healthy because of the medicine, whereas the other half would have been healthy anyway.
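The arithmetic in this example can be checked with a short sketch. It assumes the standard interval bounds on the probability of causation for an exposed case with a positive outcome (as in Tian and Pearl, 2000), written in terms of the two interventional probabilities; the function and variable names are illustrative only.

```python
from fractions import Fraction

def simple_pc_bounds(p1, p0):
    """Bounds on PC for an exposed case with a positive outcome.

    p1 = P(healthy | medicine given), p0 = P(healthy | medicine withheld).
    Assumes no confounding and p1 > 0.
    """
    lower = max(Fraction(0), 1 - p0 / p1)
    upper = min(Fraction(1), (1 - p0) / p1)
    return lower, upper

# Trial result from the example: health rises from one third to two thirds.
ann_lower, ann_upper = simple_pc_bounds(Fraction(2, 3), Fraction(1, 3))
print(ann_lower, ann_upper)  # 1/2 1
```

The two endpoints correspond to the two stories in the text: the lower endpoint arises when the medicine never harms anyone, and the upper endpoint when it helps two thirds of children and harms the rest.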
Put differently, the experimental data identifies the "effects of causes" (EoC), but we are interested in the reverse problem of quantifying "causes of effects" (CoE). The CoE task of defining and assessing the probability of causation (Robins and Greenland, 1989) in an individual case has been considered by Tian and Pearl (2000); Dawid (2011); Yamamoto (2012); Pearl (2015); Dawid, Musio and Fienberg (2016); Dawid, Murtas and Musio (2016); Dawid, Musio and Murtas (2017); Murtas, Dawid and Musio (2017). Note that this is distinct from the "reverse causal question" of Gelman and Imbens (2013), which is an EoC task aimed at ascertaining which causes have an effect on an outcome.
To understand causes of effects better, we might seek additional evidence along causal pathways. For example, researchers evaluating development programs specify "theories of change" and seek evidence for intermediate outcomes along a pathway linking treatment to outcomes. Most simply: Was the treatment received? Was the medicine ingested? Van Evera (1997) describes various tests that might be implemented using such ancillary evidence. A "smoking gun test" searches for evidence that, though unlikely to be found, would give great confidence in a claim if it were to be found; a "hoop test" is a search for evidence that we expect to find, but which, if found to be absent, would provide compelling evidence against a proposition (as if the proposition were asked to jump through a hoop).
Sometimes many points along a causal pathway are investigated. An intervention might be to provide citizens with information on political corruption, in the hope that this will lead to ultimate changes in politicians' behavior. Researchers might then check many points along a chain of intermediate outcomes. Was the political message delivered? Was it understood? Was it believed? Did it induce a change in behavior by citizens? Did this in turn produce a change in behavior by politicians?
Seeing positive evidence at many points along such a causal chain would appear to give confidence that the final outcome is indeed due to the conjectured cause. This is the core premise of "process tracing," as deployed by qualitative political scientists (Collier, 2011), as well as of mixed methods research as used in development evaluation (White, 2009). In the most optimistic accounts it is assumed that, as one gets close enough to a process, by observing more and more links in a chain, the link between any two steps becomes less questionable and eventually the causal process reveals itself (Mahoney, 2012, 581).
We here provide a comprehensive treatment of the scope for inferences of this form from knowledge of causal chains. We obtain a general formula for calculating bounds on the probability of causation, for an arbitrary pattern of data along chains of binary variables. We derive implications of this formula, and calculate the largest and smallest upper and lower bounds achievable from any causal chain consistent with the known relation between X and Y. We give special attention to what might appear to be the best possible conditions: those in which causal processes really do follow a simple causal chain, in which researchers have complete experimental evidence about the probabilistic relationship between any two consecutive nodes in the chain, in which the chain is arbitrarily long, in which the causal effect of each intermediate variable on its successor climbs to 1, and in which researchers observe outcomes consistent with positive effects at every point on the chain. We show that such information does indeed increase confidence that an outcome can be attributed to a cause and, for homogeneous chains at least, that the longer the chain the better. However, we find that even under these ideal conditions our ability to narrow the bounds for the probability of causation can be modest. In the example of attributing Ann's health to good medicine, a homogeneous process with arbitrarily many positive intermediate steps observed might only tighten the bounds from to .
In contrast, we show that non-homogeneous processes can tighten the bounds considerably. For example, suppose Ann was prescribed the medicine and recovered. If we know that being prescribed the medicine is the only way in which Ann could have obtained and taken the medicine, and that taking the medicine helps anyone who would otherwise be sick, then with positive evidence on a single intermediate point on the causal chain (that Ann did indeed take the medicine) we can identify the probability that prescribing the medicine caused Ann's recovery at . (We are still short of 1, because it is possible that Ann would have recovered even without the medicine.) A process like this, in which we observe a "necessary condition for a sufficient condition", provides the largest possible lower bound on the probability of causation available from any observations on any chain. At this point we have done the best possible and more data along the chain will not help.
Although achieving identification of the probability of causation at 1 is generally elusive, negative data can yield identification at 0, either in two steps from a heterogeneous process, or from alternating data along an infinite homogeneous chain. In this sense, information on mediators can support "hoop" tests but not "smoking gun" tests.
1.1 Plan of paper
Existing results (Dawid, Murtas and Musio, 2016) have considered the case of a single unobserved mediator. We generalize this in two ways. First, we consider situations with chains of arbitrary length. Secondly, we calculate bounds for general data, that is, for situations in which the values of none, some or all of the mediators are observed.
We proceed as follows. Section 2 introduces the set-up, and provides general formulae for bounding the probability of causation for a simple one-step process. In § 3 we extend these results to cases in which we know the structure of a complete mediation process. We consider various degrees of knowledge of the values of the mediators for the individual case at hand: all unobserved, all observed, or just some observed. Our main result is Theorem 4, which provides a general formula applicable to all cases.
Section 4 draws out the detailed implications of this result in a variety of contexts. In § 4.1 we investigate the largest achievable lower and upper bounds from any sequence, and find that these can be achieved by heterogeneous two-step processes. Section 4.2 examines the case of homogeneous processes of arbitrary length. We show that an alternating pattern for the values at all intermediate points can lead to a limiting value of 0 for the probability of causation. However, it is not generally possible for even the most positive evidence to identify the probability of causation (and a fortiori not possible to identify it at 1), even in the limit of infinitely many steps. Section 4.3 considers implications of our results for gathering data on mediators. In § 5 we compare the bounds based on knowledge of mediator processes with those achievable from knowledge of covariates, which can be much tighter. We summarise our findings in § 6. Various technical details for the proofs in the paper are elaborated in three appendices.
2 Preliminaries
We consider a binary treatment variable and binary outcome variable . We suppose we have access to experimental (or unconfounded observational) data supplying values for , where we use the notation to denote a regime in which is set to value by external intervention.
Define
[TABLE]
Then is the average causal effect of on , while is a measure of how common is.
The transition matrix from X to Y (where the row and column labels of any such matrix are implicitly 0 and 1 in that order) can be written:
[TABLE]
All entries of must be non-negative: this holds if and only if
[TABLE]
We have equality in (2) if and only if one of the entries of (1) is 1, in which case we term degenerate. For , this will happen if either , in which case and can be thought of as a sufficient condition for ; or , in which case , and can be thought of as a necessary condition for . Defining, for ,
[TABLE]
we might thus regard as measuring the relative sufficiency of for . (Footnote 1: Although we do not focus on it, for the analogous quantity can be interpreted as the relative sufficiency of for .)
2.1 Potential outcomes and causes of effects
While knowledge of the transition matrix , and in particular the "average causal effect" , is directly relevant for EoC ("effects of causes") analysis, it is not enough to support CoE ("causes of effects") analysis. For this we need to introduce the pair of potential outcomes, $\mathbf{Y} = (Y_0, Y_1)$, where we conceive of as the value Y would take, if . We regard both $Y_0$ and $Y_1$ as existing simultaneously, even prior to setting the value of X, and as having a bivariate probability distribution.
We can now define the following events in terms of (where denotes , the value distinct from , etc.):
General causation
:= "$Y_0 \neq Y_1$".
That is, changing the value of X will result in a change to the value of Y. We can also describe this as "X affects Y."
When the relevant variables X and Y are clear from the context we will simplify the notation to .
Specific causation
:= " " (for or ).
That is, changing the value of X from to would change the value of Y from to . We can also describe this as " causes ." When the relevant variables X and Y are clear from the context we will simplify the notation to .
We note that .
Probability of Causation.
In cases of interest we will have observed , and want to know the probability that caused , given this information. We denote this quantity by , or when the relevant variables and are clear from the context. Thus
[TABLE]
The joint distribution for , while constrained by knowledge of the transition matrix , is in general not fully determined by it. Rather, we can only deduce that it has the form of Table 1, where the marginal probabilities agree with (1) according to .
However, the internal entries of Table 1 are not determined by , but have one degree of freedom, expressed by the "slack" quantity = . We see that
[TABLE]
the probability of general causation.
The only constraints on are that all internal entries of Table 1 must be non-negative, which holds if and only if
[TABLE]
In particular , and thus the bivariate distribution of in Table 1, is uniquely determined by if and only if is degenerate.
We further note
[TABLE]
whence, by (6),
[TABLE]
Throughout this article we shall assume no confounding, expressed mathematically as $X \perp\!\!\!\perp \mathbf{Y}$. Then
[TABLE]
which is thus subject to the interval bounds, given by (9) or (10), as appropriate, divided by the known entry of the transition matrix .
This analysis delivers the following lower and upper bounds (prefix "s" for "simple"):
[TABLE]
In the absence of additional information, the above bounds constitute the best available inference regarding the probability of causation.
Specifically, when , on defining
[TABLE]
we have the following upper bounds:
For :
[TABLE]
For :
[TABLE]
2.2 Special case
Of particular interest are cases where (so the overall effect of X on Y is positive) and we observe positive outcomes, , . In this case we omit the subscript . We have
[TABLE]
and interval bounds given by
[TABLE]
This result agrees with (Tian and Pearl, 2000; Dawid, 2011; Dawid, Musio and Murtas, 2017).
PC is identified (i.e., the interval in (26) reduces to a single point) if and only if , which holds when is degenerate with either the lower left or upper right element of being 0. In the former case , while in the latter case .
More generally, we have , so .
3 Bounds from mediation
We now suppose that, in addition to and , we can gather data on one or more binary mediator variables . We also define and . We are interested in assessing the probability that caused for a new case where we have information on the values of some or all of the mediators .
We assume that the data are based on experiments, or in any case are such as to allow us to determine the one-step interventional probabilities , . We shall here confine attention to the case of a complete mediation sequence, where
[TABLE]
We shall further suppose that, for any new case considered, there is no confounding at every step, so that
[TABLE]
In this case the sequence of observations on a new case will form a (generally non-stationary) Markov chain. This is an empirically testable consequence of our assumptions, assumptions which would therefore be falsified if the Markov property is found to fail (although those assumptions are not guaranteed to be valid when it is found to hold.)
Let the transition matrix from to be , and the overall transition matrix from to be . We shall write
[TABLE]
to indicate that we are assuming the above mediation sequence, and refer to (27) as a decomposition of the matrix . In particular we then have .
We can readily show by induction that
[TABLE]
In particular, for the case , (29) becomes
[TABLE]
On account of (28) we have the following result:
Theorem 1
The average causal effect of on is the product of the successive average causal effects of each variable in the sequence on the following one.
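Theorem 1 can be checked numerically. The sketch below uses hypothetical one-step transition matrices (not taken from the paper) and exact rational arithmetic; each matrix is indexed so that entry `[i][j]` is the probability that the next variable equals `j` when the current one is set to `i`.

```python
from fractions import Fraction as F

def compose(a, b):
    """Product of two 2x2 transition matrices; [i][j] = P(next = j | current set to i)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ace(p):
    """Average causal effect: P(next = 1 | set to 1) - P(next = 1 | set to 0)."""
    return p[1][1] - p[0][1]

# Hypothetical one-step matrices for X -> M1 -> M2 -> Y (each row sums to 1).
steps = [
    [[F(8, 10), F(2, 10)], [F(1, 10), F(9, 10)]],
    [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
    [[F(9, 10), F(1, 10)], [F(1, 2), F(1, 2)]],
]

# End-to-end matrix from X to Y under complete mediation.
overall = steps[0]
for p in steps[1:]:
    overall = compose(overall, p)

product_of_effects = F(1)
for p in steps:
    product_of_effects *= ace(p)

print(ace(overall), product_of_effects)  # 7/50 7/50
```

The agreement is no accident of the numbers chosen: for a 2x2 matrix whose rows sum to 1, the average causal effect equals the determinant, and determinants multiply under matrix products.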
Again, to conduct CoE rather than EoC analysis, we introduce, for , bivariate variables
[TABLE]
where denotes the potential value of under , supposed unaffected by values of previous 's. We further assume that the variable $\mathbf{M}_i$ is common to all the various worlds, whether actual or counterfactual, under consideration. The actually realised values satisfy .
As the expression of our "no confounding" assumptions, we impose mutual independence between , $\mathbf{M}_1, \ldots, \mathbf{M}_n$.
Theorem 2
. That is to say, affects if and only if each affects the next.
**Proof.** Suppose first that each variable affects the next. Then changing the value of will change that of , which in turn will change that of , and so on until the value of is changed, so showing that affects . Conversely, if, for some , does not affect , then, whether or not has been changed, the value of will be unchanged, whence so too will that of , and so on until the value of is unchanged, whence does not affect .
Corollary 1
(i).
(ii).
(iii).
Given the detailed information on the decomposition (27), the constraints on are now:
[TABLE]
**Proof.**
(i) By the assumed mutual independence of the $(\mathbf{M}_i)$.
(ii) By (5).
(iii) By (ii), (6) for each , and (28).
On account of (i) we have:
Corollary 2
For any decomposition, the probability that affects is the product of the probabilities that each variable in the sequence from to affects the next in the sequence.
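Theorem 2 and Corollary 2 can be illustrated by brute-force enumeration. The sketch below represents each link by one of the four possible response types of a binary variable to its binary parent, draws the types independently across links (the "no confounding" assumption), and compares the probability that the whole chain transmits a change with the product of the per-link probabilities. The type distributions are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Response types, written as (value when parent = 0, value when parent = 1).
TYPES = [(0, 0), (1, 1), (0, 1), (1, 0)]

def affects(f):
    """A link transmits a change iff its response is not constant."""
    return f[0] != f[1]

def compose_types(fs):
    """Response of the last variable to the first, following the chain."""
    out = []
    for x in (0, 1):
        v = x
        for f in fs:
            v = f[v]
        out.append(v)
    return tuple(out)

# Hypothetical type distributions for the two links of X -> M -> Y.
dists = [
    dict(zip(TYPES, [F(1, 10), F(2, 10), F(6, 10), F(1, 10)])),
    dict(zip(TYPES, [F(1, 4), F(1, 4), F(1, 2), F(0)])),
]

p_overall = F(0)
for fs in product(TYPES, repeat=len(dists)):
    weight = F(1)
    for f, d in zip(fs, dists):
        weight *= d[f]
    # Theorem 2: the chain transmits a change iff every link does.
    assert affects(compose_types(fs)) == all(affects(f) for f in fs)
    if affects(compose_types(fs)):
        p_overall += weight

p_product = F(1)
for d in dists:
    p_product *= sum(w for f, w in d.items() if affects(f))

print(p_overall, p_product)  # 7/20 7/20
```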
On comparing (31) with (6), we see that detailed knowledge of the mediation process has not changed the lower bound for . However, the upper bound is typically reduced:
Theorem 3
The upper bound of (31), which takes into account the decomposition (27), does not exceed the upper bound of (6), which ignores the decomposition. It will be strictly less if all the are non-degenerate with .
**Proof.** Consider first the case . Then
[TABLE]
It follows that
[TABLE]
Moreover, we shall have strict inequality in (33), and hence also in (34), if is non-degenerate and .
The result for general follows easily by induction.
We note that the above condition for strict inequality in (34), while sufficient, is not necessary. For example, in the case it will also hold if and have different signs, since then we would have strict inequality in (32).
It follows from (31) and (34) that collapsing two mediators into a single one can only increase the upper bound for :
Corollary 3
Consider two decompositions and , where . Then the upper bound for for the former does not exceed that for the latter.
3.1 Bounds when mediators are unobserved
Suppose first that, for the new case, we have observed , but the values of the mediators are not observed. Even in this case, as shown for the two-term decomposition in Dawid, Murtas and Musio (2016), knowledge of the decomposition (27) of can alter the bounds for PC.
Indeed, in this case (4) still applies, where is given by (7) or (8) as appropriate, but now with subject to the revised bounds of (31). In each case the lower bound is unaffected, but, by Theorem 3, the upper bound is reduced.
This analysis delivers the following revised bounds (prefix "u" for "unobserved mediators"):
[TABLE]
3.2 Special case
In particular, for the case , where we observe , (but the values of mediators are not observed), we have revised bounds
[TABLE]
For this agrees with the analysis of Dawid, Murtas and Musio (2016).
3.3 Bounds when some or all mediators are observed
Now suppose that, in addition to , , we also observe data on mediators () for the new case. In particular we observe , for . For notational simplicity we write for , for . We also identify and (so , ).
The relevant probability of causation is now
[TABLE]
Note that in contrast to the difference between (35)–(38) on the one hand and (11)–(14) on the other hand, which relate to the same quantity but express different conclusions about it, $\widetilde{\mathrm{PC}}_{xy}$ is a genuinely different quantity from , since it conditions on different information about the new case.
Theorem 4
Given observations on , the probability that caused is given by the product of the probabilities that each observed term in the sequence caused the next observed term:
[TABLE]
**Proof.** From Theorem 2 we have
[TABLE]
whence, using the "no-confounding" independence properties,
[TABLE]
Now since we have the decomposition information about the mediators (if any) occurring between and , but not their values for the new case, the bounds on any factor in (40) will, mutatis mutandis, have the form of the relevant expressions for and , as displayed in (35)–(38). Then the overall lower [resp., upper] bound on $\widetilde{\mathrm{PC}}_{xy}$ will be the product of these lower [resp., upper] bounds, across all terms. This procedure supplies a complete recipe for determining the appropriate bounds on $\widetilde{\mathrm{PC}}_{xy}$ in the knowledge of the full decomposition of and the values of the observed mediators for the new case.
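The product recipe can be sketched for the simplest situation, in which every mediator is observed, so that each factor is a one-step probability of causation. The per-step interval used here is the standard Fréchet-type bound on the joint distribution of the two potential values; it is an assumption of this sketch rather than a transcription of displays (35)–(38). The chain is hypothetical, chosen to echo the medicine example: X = 1 is necessary for M = 1, and M = 1 is sufficient for Y = 1.

```python
from fractions import Fraction as F

def step_bounds(p, x, y):
    """Bounds on P(parent = x caused child = y | parent = x, child = y).

    p[a][b] = P(child = b | parent set to a). Requires p[x][y] > 0, i.e. the
    observed pair must have positive probability.
    """
    pxy = p[x][y]
    other = p[1 - x][1 - y]  # P(child = 1 - y | parent set to 1 - x)
    lower = max(F(0), pxy + other - 1) / pxy
    upper = min(pxy, other) / pxy
    return lower, upper

def chain_bounds(steps, values):
    """Multiply one-step bounds along a fully observed chain of values."""
    lower, upper = F(1), F(1)
    for p, x, y in zip(steps, values, values[1:]):
        lo, hi = step_bounds(p, x, y)
        lower *= lo
        upper *= hi
    return lower, upper

# Hypothetical chain: prescribed (X) -> taken (M) -> healthy (Y).
steps = [
    [[F(1), F(0)], [F(1, 2), F(1, 2)]],   # M = 1 is impossible without X = 1
    [[F(2, 3), F(1, 3)], [F(0), F(1)]],   # M = 1 guarantees Y = 1
]

print(chain_bounds(steps, (1, 1, 1)))  # both bounds equal 2/3
print(chain_bounds(steps, (1, 0, 1)))  # both bounds equal 0
```

With these numbers the end-to-end probabilities are 2/3 under exposure and 1/3 without it, as in the introduction. Observing M = 1 identifies the probability of causation, while observing M = 0 identifies it at 0, in line with the "hoop test" reading of negative evidence.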
3.4 Special cases
Again consider the case , . On account of (28) we can, after possibly switching the labels 0 and 1 for some of the 's, take , all . We assume henceforth that this is the case. The above procedure then delivers lower bound 0 unless , all , so that , all . In that case we obtain lower bound (with prefix "o" for "observed mediators"):
[TABLE]
It is easy to see that this lower bound can only increase if we introduce further observed mediators. It follows that the smallest lower bound occurs when there are no observed mediators, when it reduces to as in (39) and (26); while the largest lower bound occurs when all mediators are observed (all taking value 1); that is to say, there is positive evidence for every link in the mediation chain.
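The monotonicity of the lower bound in the set of observed mediators can be illustrated numerically. The sketch below assumes that every one-step effect is non-negative and that every observed mediator takes value 1. It also relies on the fact, stated in § 3.1, that unobserved mediators leave the lower bound unchanged, so that each unobserved stretch can be collapsed into a single matrix. The matrices are hypothetical.

```python
from fractions import Fraction as F

def compose(a, b):
    """Product of two 2x2 transition matrices; [i][j] = P(next = j | current set to i)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def simple_lower_bound(p):
    """Lower bound on PC for parent = 1 and child = 1, from one matrix."""
    return max(F(0), 1 - p[0][1] / p[1][1])

def observed_lower_bound(steps, observed):
    """Lower bound when the mediators listed in `observed` are all seen at 1.

    Mediator i sits between steps[i - 1] and steps[i].
    """
    lb, start = F(1), 0
    for cut in sorted(observed) + [len(steps)]:
        segment = steps[start]
        for p in steps[start + 1:cut]:
            segment = compose(segment, p)
        lb *= simple_lower_bound(segment)
        start = cut
    return lb

# Hypothetical chain X -> M1 -> M2 -> Y with positive effects at each step.
steps = [
    [[F(8, 10), F(2, 10)], [F(1, 10), F(9, 10)]],
    [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],
    [[F(9, 10), F(1, 10)], [F(1, 2), F(1, 2)]],
]

none = observed_lower_bound(steps, [])
only_m1 = observed_lower_bound(steps, [1])
only_m2 = observed_lower_bound(steps, [2])
both = observed_lower_bound(steps, [1, 2])
print(none, only_m1, only_m2, both)
```

In this example each additional positive observation raises the lower bound, and observing both mediators gives the largest value.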
In the remainder of this paper we shall give special attention to this case, and write simply for $\widetilde{\mathrm{PC}}_{11}$, etc. The bounds for are then:
[TABLE]
The following result follows directly from the above considerations:
Lemma 1
*The lower bound oLB of (42) is at least as large as the lower bound sLB of (26).*
It is not, however, always the case that : see (45) below.
4 Implications
Equation (40) provides a general formula for calculating bounds on the probability of causation for any pattern of data observed on mediating variables (including no data).
We now derive implications from this analysis.
4.1 Largest and smallest upper and lower bounds
Consider an arbitrary decomposition of :
[TABLE]
with , . We restrict attention to the case and assume that variables are labeled so that each .
We investigate the smallest and largest achievable values for , mUB (prefix "m" for mixed evidence), and show that in each case these are achievable by decompositions involving at most one mediator.
Theorem 5
Let the (known, fixed) transition matrix from to be , with and . The largest and smallest upper and lower bounds from any complete mediation process for the case with mediators unobserved, for the case with positive outcomes on all mediators observed, and for mixed cases that include some negative evidence on the mediators, are as given in Table 2.
These can all be achieved by decompositions of length 1 or 2.
**Proof.** See Appendix A.
The largest upper bound with mediators unobserved, , can be achieved without any mediators. Since unobserved mediators do not alter the lower bound we have . In addition we have , which is achievable, for example, from the following decomposition:
[TABLE]
Note that with this decomposition PC is identified via two degenerate transition matrices: is a sufficient condition for , while is a necessary condition for .
The smallest upper and lower bounds available when mediators are observed agree with the simple lower bound. Positive evidence cannot reduce the lower bound, but it can reduce the upper bound to the lower bound, at which point is identified. This can be achieved by the same decomposition given in (44).
The largest upper bound with positive evidence on mediators, , can exceed the simple upper bound when . It is achieved by the following two-term decomposition, involving a single mediator:
[TABLE]
The lower bound can be raised with positive information on mediators, and takes its largest value with the following degenerate two-term decomposition , involving a single mediator:
[TABLE]
With this decomposition is identified via two degenerate transition matrices: in this case is a necessary condition for , while is a sufficient condition for . The largest lower bound with positive evidence from this decomposition is which can fall far short of 1, implying that in general mediators cannot provide "smoking gun" evidence that caused .
For the case with mixed evidence on the mediators the lower bound is always 0. The smallest upper bound is also 0, which can be achieved by the decomposition (46) above, with the single mediator observed at 0 (the key feature of this decomposition is that cannot be caused by ). In this case is identified at 0, showing that it is possible for negative data on mediators to provide "hoop" evidence that did not cause . The highest upper bound, , can be achieved by a two-step decomposition , with the mediator taking value 0. For this occurs with the decomposition with parameters
[TABLE]
For it occurs with decomposition parameterized by
[TABLE]
4.2 Homogeneous transitions
Throughout this section we confine attention to the special case , . We specialize further to the case of a constant one-step transition matrix, for all . We define , , in terms of and in parallel to (3), (15) and (16).
In this case, by (28) and (29), we have
[TABLE]
In particular, we note that the relative sufficiency of for is preserved at each intermediate step: . It follows that .
We have
[TABLE]
Note that, for large , must be close to 1 and close to 0, with the same sign as .
Using (51) and (52) in (39) and (42) yields the following bounds for a homogeneous process:
[TABLE]
In particular, for the degenerate cases , so that , we see that, for all , PC and are both identified, at when , and at 1 when ; the existence of the mediators is irrelevant in these cases.
Mixed evidence
Here we assume the process is non-degenerate.
For the case with some negative evidence the lower bound, say, is always 0, as noted in § 3.4. The upper bound, however, depends on the particular pattern of positive and negative evidence. For any sequence of observations on consecutive mediators (allowing and , both required to take value 1), denote the associated upper bound by . Let denote a full sequence of observations (i.e., on all mediators). We search for a full sequence $\mathbf{s}_0$ yielding the maximum value, say, of $\mathrm{UB}(\mathbf{s})$.
Theorem 6
For large enough , we have
[TABLE]
The optimal sequence $\mathbf{s}_0$ alternates , except, if is odd, for the final 2 symbols.
**Proof.** See Appendix B.
For the smallest possible upper bound is for all . Otherwise, as . Then with alternating evidence on many mediators the associated probability of causation, say, is effectively identified as 0.
Figure 1 plots the intervals , and for a range of cases. It highlights how modest are the gains from repeated observation of homogeneous mediators and how alternating evidence can tighten bounds as long as .
4.2.1 Unboundedly many mediators
We now consider the behaviour of the bounds when we have a potentially unlimited sequence of variables directly mediating between X and Y, still assuming identical one-step transition matrices. Our results are given in Theorem 7.
Theorem 7
[TABLE]
**Proof.** See Appendix C.
In particular, for we have
[TABLE]
and
[TABLE]
Proposition 1
For , is a concave strictly increasing function of , and and (for ) are both convex strictly decreasing functions of .
We do not have a full proof of Proposition 1. Supporting evidence is given by numerous plots of and against for various pairs, and the following two results, which are proved in Appendix C.
Lemma 2
If , then is a concave increasing function of , and and (for ) are convex strictly decreasing functions of , for sufficiently large.
Lemma 3
For the non-degenerate case , , , and (for ) .
4.3 Implications for data gathering
Our results have focused on improving the bounds on PC by learning about general mediating processes together with values for prespecified mediators for the case at hand. Our results can also be used to suggest which mediators researchers might most fruitfully seek to observe for the case at hand.
Thus consider a homogeneous process with steps ( even) and suppose that researchers can observe the value of just one mediator . In this case we can show that the lower bound on , if we were to observe , is maximized if the central mediator in the sequence is observed. To see this, note that from (28), (29) and TheoremĀ 4, the lower bound from observation of mediator is given by the product of the lower bound for the probability that caused and the lower bound for the probability that caused :
[TABLE]
where and are given by (51) and (52). This expression has the form , where is decreasing and convex in : this holds since , and . Hence the denominator is minimised, and so is maximised, when .
As an illustration, suppose 121 dominoes stand in a row. The fall of any domino increases the chance that its neighbor will fall from 0.005 to 0.995. You know that the first domino was knocked and fell, that the last is also down, and want the probability that the fall of the first one caused the fall of the last one. A lower bound above 50% would secure a conviction of dominoĀ 1.
With no further information, the lower bound is 0.461ānot enough to convict. But now suppose you can seek information on the status of just one other domino in the sequence: which should you choose? It is better to choose in the middle than at the edges.
If, for example, you were to seek information on the status of domino 2 and found that it had fallen, the lower bound would rise only slightly: a modest gain, reflecting the fact that you fully expected domino 2 to have fallen, given that domino 1 was knocked. However, you are less sure you will find domino 61 down. If you do, the lower bound rises above 50%, enough to convict domino 1.
Note that in all cases the lower bound would be 0 if the intermediate domino were found to be standing. Taking both possible outcomes into account, the expected lower bound is always . But the second strategy does better than the first, in allowing the possibility to obtain a larger lower bound (albeit with a smaller probability), and so secure a conviction.
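The domino illustration can be checked numerically. The sketch below is not the authors' code. It assumes that each link's lower bound is the simple bound max(0, 1 - P(Y=1|X=0)/P(Y=1|X=1)), and that the bound after observing a fallen intermediate domino is the product of the bounds for the two legs, as described above. All function names are hypothetical.

```python
"""Lower bounds on the probability of causation for the 121-domino illustration.

Assumptions (labelled, not taken verbatim from the source): the single-link
lower bound is max(0, 1 - P(Y=1|X=0) / P(Y=1|X=1)), and observing a fallen
intermediate domino yields the product of the two legs' lower bounds.
"""

P_FALL_IF_NEIGHBOR_FELL = 0.995    # from the text
P_FALL_IF_NEIGHBOR_STOOD = 0.005   # from the text
N_DOMINOES = 121                   # from the text
N_STEPS = N_DOMINOES - 1           # links between the first and last domino


def k_step_probs(k):
    """Return (P(down after k links | start down), P(down after k links | start standing))."""
    down_if_down, down_if_standing = 1.0, 0.0
    for _ in range(k):
        down_if_down = (down_if_down * P_FALL_IF_NEIGHBOR_FELL
                        + (1.0 - down_if_down) * P_FALL_IF_NEIGHBOR_STOOD)
        down_if_standing = (down_if_standing * P_FALL_IF_NEIGHBOR_FELL
                            + (1.0 - down_if_standing) * P_FALL_IF_NEIGHBOR_STOOD)
    return down_if_down, down_if_standing


def simple_lower_bound(p_effect_if_cause, p_effect_if_no_cause):
    """Simple lower bound on the probability of causation for one link."""
    return max(0.0, 1.0 - p_effect_if_no_cause / p_effect_if_cause)


def lower_bound_no_mediator():
    """Lower bound using only the first and last domino."""
    return simple_lower_bound(*k_step_probs(N_STEPS))


def lower_bound_given_fallen(domino):
    """Lower bound when intermediate domino number `domino` (2..120) is seen down."""
    k = domino - 1  # links between domino 1 and the observed domino
    return (simple_lower_bound(*k_step_probs(k))
            * simple_lower_bound(*k_step_probs(N_STEPS - k)))


def prob_fallen_given_ends(domino):
    """P(observed domino down | first knocked, last down), using the Markov property."""
    k = domino - 1
    first_leg, _ = k_step_probs(k)
    second_leg, _ = k_step_probs(N_STEPS - k)
    whole, _ = k_step_probs(N_STEPS)
    return first_leg * second_leg / whole


def expected_lower_bound(domino):
    """Expected bound before looking: the bound is 0 if the domino is found standing."""
    return prob_fallen_given_ends(domino) * lower_bound_given_fallen(domino)


if __name__ == "__main__":
    print("no mediator:", round(lower_bound_no_mediator(), 3))
    for d in (2, 61):
        print("domino", d, "down:", round(lower_bound_given_fallen(d), 3),
              "expected:", round(expected_lower_bound(d), 3))
```

Under these assumptions the no-mediator value reproduces the 0.461 quoted above, the bound for domino 2 stays below 50%, and the bound for domino 61 exceeds it. In this sketch the expected lower bound is the same for every choice of domino and equals the no-mediator bound, so the gain from looking in the middle lies entirely in the chance of a larger realised bound.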
5 Comparisons with other bounds
Although knowledge of mediators can narrow bounds, we have seen that this narrowing can be modest, even with access to an infinite sequence of positive evidence along a causal path. To put our results in context, we compare them with bounds that can be achieved from monotonicity, and from covariate information. Knowledge of the bounds achievable by different strategies provides some guidance as to whether a strategy would be worth pursuing.
Monotonicity
Suppose that we somehow knew that there are no cases for which the exposure would prevent the outcome, i.e., such that . From Table 1 this is equivalent to , its lower limit, which in turn implies that PC, given by (25), is identified at its lower limit,
However, since monotonicity is an attribute of the typically unidentifiable joint distribution of , it is not easy to justify without additional knowledge. One case where this works is when we know the existence of a mediation process with decomposition (44).
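To make the effect of monotonicity concrete, the sketch below computes the simple bounds on PC from the two conditional outcome probabilities and returns the point value implied when monotonicity is assumed. It uses the standard simple bounds, max(0, 1 - P(Y=1|X=0)/P(Y=1|X=1)) and min(1, P(Y=0|X=0)/P(Y=1|X=1)), which are assumed here to coincide with the simple bounds referred to in the text. The function names and the example probabilities are hypothetical.

```python
def simple_pc_bounds(p_y1_given_x1, p_y1_given_x0):
    """Return (lower, upper) simple bounds on the probability of causation.

    Assumed forms (standard in the literature, not copied from the source):
      lower = max(0, 1 - P(Y=1|X=0) / P(Y=1|X=1))
      upper = min(1, P(Y=0|X=0) / P(Y=1|X=1))
    """
    if not 0.0 < p_y1_given_x1 <= 1.0:
        raise ValueError("P(Y=1|X=1) must lie in (0, 1]")
    if not 0.0 <= p_y1_given_x0 <= 1.0:
        raise ValueError("P(Y=1|X=0) must lie in [0, 1]")
    lower = max(0.0, 1.0 - p_y1_given_x0 / p_y1_given_x1)
    upper = min(1.0, (1.0 - p_y1_given_x0) / p_y1_given_x1)
    return lower, upper


def pc_under_monotonicity(p_y1_given_x1, p_y1_given_x0):
    """Point value of PC when exposure is assumed never to prevent the outcome."""
    lower, _ = simple_pc_bounds(p_y1_given_x1, p_y1_given_x0)
    return lower


if __name__ == "__main__":
    # Hypothetical illustrative probabilities, not taken from the paper.
    print(simple_pc_bounds(0.9, 0.3))
    print(pc_under_monotonicity(0.9, 0.3))
```

Without monotonicity only the interval is available. With it, the interval collapses to its left endpoint, which is why the assumption is so powerful and also why it needs independent justification.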
Observed covariate
Suppose that, in addition to and , we can observe a binary covariate , which can affect the dependence of on . Let , and let be the transition matrix from to , conditional on ; for consistency with the known we must have .
In particular, it could be the case that , and
[TABLE]
In this case knowledge that an individual with also has is enough to identify PC at 1.
Unobserved covariate
As shown in Dawid (2011), knowledge of covariates can improve bounds, even if their values are not observed for the case at hand. In particular, this can let us identify PC at the upper bound, . For this to be possible, however, the average treatment effect must be negative for some value of .
Thus suppose , and the conditional transition matrices are:
For ,
[TABLE]
For ,
[TABLE]
In either case, knowledge that is sufficient to infer that . This identifies the probability of causation: for , for . In both cases we hit the upper bound.
Comparisons
Figure 2 compares the bounds obtained, for a range of values of and . It illustrates how, in general, lower bounds rise with and fall with . For homogeneous processes the lower bounds improve on the simple bounds, although the gain from unlimited steps is not a striking improvement on that for just two steps. The best gains from non-homogeneous decompositions are substantial, as are the gains from knowledge of covariates, especially when is small.
6 Conclusion
We close with some comments, which may help to guide the collection of ancillary evidence to improve the bounds on the probability of causation. These are based on our general results, as exemplified in Figure 2.
1. Knowledge of mediation processes, and of positive values for some mediators in a particular case, can raise the lower bound on the probability of causation, thus providing some evidence against a sceptic who doubts that the outcome in the case can be attributed to the putative cause. However, it may well not raise the bound enough to convince her. In contrast, for some processes, observing negative evidence on mediators can effectively convince the sceptic that the outcome is not the result of the exposure.
2. Observing positive data on homogeneous mediation processes can improve the bounds, but there are diminishing returns, and full identification is not achieved, even with infinite data.
3. For a homogeneous process, observation in the middle of the process is more informative than nearer the edges.
4. Heterogeneous mediation processes can sometimes yield identification with minimal auxiliary data gathering:
- A process where is a necessary condition for a sufficient condition for yields the largest possible upper bound, and identifies the probability of causation. For example, if it is known that the effect of delivering a deworming medicine passes uniquely through ingestion, and ingestion is sufficient for effective deworming, then evidence of ingestion raises the lower bound and identifies the probability of causation.
- A process in which is a sufficient condition for a necessary condition for yields identification, and there is no gain from gathering data on the mediator. For instance, if ingesting medicine is a sufficient condition for good health, and good health is a necessary condition for good school performance, then observing ingestion and good school performance is sufficient to achieve identification. There are no additional gains from measuring health, since good health is already implied by good performance.
5. Potential gains from knowledge of mediation processes are typically weaker than potential gains from knowledge of conditions under which interventions are more or less effective. Even when covariates are unobserved for the case at hand, knowledge of the general effect of covariates can tighten the bounds when some subgroups exhibit adverse effects. On this basis researchers might be able to assess whether a search for a suitable covariate could lead to improved bounds, and perhaps even identification of the probability of causation.
Appendix A Proof of Theorem 5
A.1 Mediators unobserved
Lower bounds:
uLB is unchanged by knowledge of the mediation process alone and so the largest and smallest values of uLB are .
Smallest upper bound:
From (39) we can see that for a degenerate two-term decomposition with and , . In this case PC is identified.
Largest upper bound:
It follows from Corollary 3 and (25) that this is achieved when there are no mediators, and is thus sUB.
A.2 Positive data observed at every step
We now consider the case where mediators are observed. Then, for the decomposition (43),
[TABLE]
Smallest lower bound
It follows from Lemma 1 that the smallest achievable lower bound is
[TABLE]
which does not require any mediators.
Smallest upper bound
Trivially we must have .
Note now that the decomposition (44) identifies $\widetilde{\mathrm{PC}} = s\mathrm{LB}$, whence in particular , the smallest possible value.
Largest lower bound
Lemma 4
[TABLE]
**Proof.** This holds since
[TABLE]
Lemma 5
Let . Then
[TABLE]
**Proof.** Follows from matrix multiplication, on noting that each term is the leading entry of its associated transition matrix.
Corollary 4
Let . Then
[TABLE]
From (42), Lemma 4 and Corollary 4 we deduce:
Corollary 5
Let . Then .
However, the value can be achieved, specifically for the degenerate two-term decomposition (46), so this is indeed the largest lower bound. And in this case we have identification: $\widetilde{\mathrm{PC}} = (1+\tau-\rho)/2$.
We note that, since , the largest lower bound, , cannot exceed the simple upper bound . Thus any lower bound must lie in the simple interval .
Largest upper bound
Lemma 6
[TABLE]
**Proof.** Trivial if . Otherwise follows from .
Lemma 7
Let . Then
[TABLE]
**Proof.** Trivial if both and (and hence, by (30) and the fact that , also ) are negative.
If , , we have to show . This follows from (30). Similarly if , (using ).
Finally, if , (and so also ), the result follows from (34).
Corollary 6
Let . Then
[TABLE]
From Lemma 6 and (42) we deduce:
Corollary 7
For decomposition , .
However, the value can be achieved. If , no mediators are required. If , the value is achieved by the two-term decomposition (45). By Lemma 6, this largest upper bound is at least as large as the simple upper bound sUB of (26).
Since we know that , we cannot have identification of in this case unless these inequalities become equalities, which only holds when . In fact for the decomposition (45) we have .
A.3 Negative data observed at some steps
The lower bounds at 0 are immediate from Equation (40). It is easy to verify that the lowest upper bound at 0 is achievable by the decomposition (45), and the highest upper bound at 1 is achievable from the decompositions in (47) and (48). Since these bounds are at 0 and 1 they are the extreme values obtainable from any process involving some negative data.
Appendix B Proof of Theorem 6
Using (17)–(20) and (21)–(24), we have the following upper bounds for a single step.
For :
[TABLE]
For :
[TABLE]
When , all these upper bounds are 1, and the upper bound for any evidence sequence is 1.
Otherwise, does not depend on , while as . So there exists such that . Henceforth we suppose . Then we have:
Lemma 8
[TABLE]
**Proof.** We have if , or 1 if , while if , or if . In all cases , while and .
Corollary 8
The optimal sequence $\mathbf{s}_0$ cannot contain any subsequence of repeated values of length greater than 2.
We now consider separately the cases of positive and negative .
1. .
Suppose $\mathbf{s}_0$ contains a subsequence . It must then be followed by a , so $\mathbf{s}_0$ contains a subsequence . Since , while , on replacing this subsequence by we would achieve a smaller upper bound. This contradiction shows that $\mathbf{s}_0$ cannot contain any successive repeated [math]s.
Now suppose $\mathbf{s}_0$ contains a subsequence . Consider the first appearance of this. If not at the very end, it must be followed by . Now replace this subsequence by . Since , we have not changed $\mathrm{UB}(\mathbf{s}_0)$, but have postponed the first occurrence of . We can thus assume that the first such occurrence (if any) is at the very end.
If now is even, the first values must be the alternating sequence . But this cannot be followed by , since that would produce a subsequence . We deduce that $\mathbf{s}_0$ must be the full alternating sequence . The smallest possible upper bound with mixed evidence is thus $m\mathrm{UB}_n = \mathrm{UB}(\mathbf{s}_0) = \gamma^{n/2}$.
If is odd, there must be at least one appearance of . The above argument now delivers as $\mathbf{s}_0$ the alternating sequence of length , followed by the final . We now have $m\mathrm{UB}_n = \mathrm{UB}(\mathbf{s}_0) = \gamma^{(n-1)/2}\delta'$.
2. .
The argument here is almost, but not quite, a mirror image of that above.
Suppose that $\mathbf{s}_0$ contains a subsequence . If not at the very end, it must be followed by a [math] so that contains a subsequence . Now, since and , $\mathbf{s}_0$ cannot contain any internal successive repeated 1's. Also, $\mathbf{s}_0$ cannot end with , and hence with , since . So there can be no repeated 1's.
Now suppose $\mathbf{s}_0$ contains a subsequence . Consider the first appearance of this. If not just before the final , it must be followed by . Now replace this subsequence by . Since , we have not changed $\mathrm{UB}(\mathbf{s}_0)$, but have postponed the first occurrence of . So the only possibility for two successive [math]s is if $\mathbf{s}_0$ ends with .
Before the end, we must have an alternating sequence .
If is even, we cannot then conclude with , so in this case we must have the full alternating sequence . The smallest possible upper bound with mixed evidence is, again, $m\mathrm{UB}_n = \mathrm{UB}(\mathbf{s}_0) = \gamma^{n/2}$.
If is odd we must have the alternating sequence of length , followed by . We again find $m\mathrm{UB}_n = \mathrm{UB}(\mathbf{s}_0) = \gamma^{(n-1)/2}\delta'$.
Appendix C Proofs for § 4.2.1
**Proof of Theorem 7.** Using Mathematica (Wolfram Research, Inc., 2018), we obtain expansions
[TABLE]
with
[TABLE]
The expression for is obtained similarly.
Finally, since for all , we trivially have .
**Proof of Lemma 2.** In (A.1) and (A.2), Mathematica gives
[TABLE]
In particular , . Thus
[TABLE]
where the leading term is positive, so is eventually increasing. Similarly
[TABLE]
with negative leading term, so is eventually concave in . A similar argument shows that, for , is eventually decreasing and convex in . We note that the convergence of to its limit is at a faster rate than for .
The behaviour of is obtained similarly (the limit being approached at rate ).
**Proof of Lemma 3.** Consider the -part homogeneous decomposition . Now replace each by its homogeneous 2-part decomposition , so creating the -part homogeneous decomposition .
By Corollary 3 and (25) we see that uUB decreases on making these replacements.
The argument of § 3.4 shows that oLB is increased by these replacements.
To show the result for oUB it is enough to show that for a two-term homogeneous decomposition with . That is to say,
[TABLE]
or equivalently
[TABLE]
Noting that and , this becomes
[TABLE]
equivalent to , which holds since, by (51) and (52), by assumption.
References
- Collier, David. 2011. "Understanding Process Tracing." PS: Political Science & Politics 44(4):823–830.
- Dawid, A. Philip, Monica Musio and Stephen E. Fienberg. 2016. "From Statistical Evidence to Evidence of Causality." Bayesian Analysis 11:725–752.
- Dawid, Alexander Philip. 2011. The Rôle of Scientific and Statistical Evidence in Assessing Causality. In Perspectives on Causation, ed. Richard Goldberg. Oxford: Hart Publishing pp. 133–147.
- Dawid, Alexander Philip, Monica Musio and Rossella Murtas. 2017. "The Probability of Causation." Law, Probability and Risk 16:163–179.
- Dawid, Alexander Philip, Rossella Murtas and Monica Musio. 2016. Bounding the Probability of Causation in Mediation Analysis. In Topics on Methodological and Applied Statistical Inference. Springer pp. 75–84.
- Gelman, Andrew and Guido Imbens. 2013. Why Ask Why? Forward Causal Inference and Reverse Causal Questions. Working Paper 19614, National Bureau of Economic Research. https://www.nber.org/papers/w19614
- Mahoney, James. 2012. "The Logic of Process Tracing Tests in the Social Sciences." Sociological Methods & Research 41(4):570–597.
