Abstracting Causal Models
Sander Beckers, Joseph Y. Halpern

TL;DR
This paper explores a hierarchy of abstraction concepts in causal models, from exact transformations to strong abstractions, providing a unified framework and showing how micro-variable combinations fit into this hierarchy.
Contribution
It introduces a unified hierarchy of causal model abstractions and demonstrates that micro-variable aggregation is an instance of strong abstraction.
Findings
Procedures for micro-variable aggregation are instances of strong abstraction.
The hierarchy clarifies relationships among different causal abstraction notions.
All examples by Rubenstein et al. fit into the strong abstraction framework.
Abstract
We consider a sequence of successively more restrictive definitions of abstraction for causal models, starting with a notion introduced by Rubenstein et al. (2017) called exact transformation that applies to probabilistic causal models, moving to a notion of uniform transformation that applies to deterministic causal models and does not allow differences to be hidden by the "right" choice of distribution, and then to abstraction, where the interventions of interest are determined by the map from low-level states to high-level states, and strong abstraction, which takes more seriously all potential interventions in a model, not just the allowed interventions. We show that procedures for combining micro-variables into macro-variables are instances of our notion of strong abstraction, as are all the examples considered by Rubenstein et al.
Abstracting Causal Models
Sander Beckers
Dept. of Philosophy and Religious Studies
Utrecht University
Utrecht, Netherlands
Joseph Y. Halpern
Dept. of Computer Science
Cornell University
Ithaca, NY 14853
Abstract
We consider a sequence of successively more restrictive definitions of abstraction for causal models, starting with a notion introduced by Rubenstein et al. (2017) called exact transformation that applies to probabilistic causal models, moving to a notion of uniform transformation that applies to deterministic causal models and does not allow differences to be hidden by the “right” choice of distribution, and then to abstraction, where the interventions of interest are determined by the map from low-level states to high-level states, and strong abstraction, which takes more seriously all potential interventions in a model, not just the allowed interventions. We show that procedures for combining micro-variables into macro-variables are instances of our notion of strong abstraction, as are all the examples considered by Rubenstein et al.
1 Introduction
We can and typically do analyze problems at different levels of abstraction. For example, we can try to understand human behavior by thinking at the level of neurons firing in the brain or at the level of beliefs, desires, and intentions. A political scientist might try to understand an election in terms of individual voters or in terms of the behavior of groups such as midwestern blue-collar workers. Since, in these analyses, we are typically interested in the causal connections between variables, it seems reasonable to model the various levels of abstraction using causal models (?; ?). The question then arises whether a high-level “macro” causal model (e.g., one that considers beliefs, desires, and intentions) is a faithful abstraction of a low-level “micro” model (e.g., one that describes things at the neuronal level). What should this even mean?
Perhaps the most common way to approach the question of abstraction is to cluster “micro-variables” in the low-level model into a single “macro-variable” in the high-level model (?; ?; ?). Of course, one has to be careful to do this in a way that preserves the causal relationships in the low-level model. For example, we do not want to cluster variables $X$, $Y$, and $Z$ into a single variable $X+Y+Z$ if different settings $(x,y,z)$ and $(x',y',z')$ such that $x+y+z = x'+y'+z'$ lead to different outcomes. Rubenstein et al. (2017) (RW+ from now on) provided an arguably more general approach to abstraction. They defined a notion of an exact transformation between two causal models. They suggest that if there is an exact transformation from causal model $M_1$ to $M_2$, then we should think of $M_2$ as an abstraction of $M_1$, so that $M_2$ is the high-level model and $M_1$ is the low-level model.
Abstraction almost by definition involves ignoring inessential differences. So it seems that RW+ would want to claim that if there exists an exact transformation from $M_1$ to $M_2$, then $M_1$ and $M_2$ are the same, except for “inessential differences”. This leads to the obvious question: what counts as an inessential difference? Of course, this is to some extent in the eye of the beholder, and may well depend on the application. Nevertheless, we claim that the notion of “inessential difference” implicitly encoded in the definition of exact transformation is far too broad. As we show by example, there are models that we would view as significantly different that are related by exact transformations. There are two reasons for this. The first is that, because RW+ consider probabilistic causal models, some differences that are intuitively significant are overlooked by considering just the right distributions. The second is that, besides a function that maps low-level states to high-level states, RW+ define a separate mapping of interventions that can mask what we view as essential differences between the interventions allowed at the low level and the high level.
In this paper, we consider a sequence of successively more restrictive definitions of abstraction, starting with the RW+ notion of exact transformation, moving to a notion of uniform transformation that applies to deterministic causal models and does not allow differences to be hidden by the “right” choice of distribution, and then to abstraction, where the mapping between the interventions is determined by the mapping from low-level states to high-level states, and strong abstraction, which takes more seriously all potential interventions in a model, not just the allowed interventions. Finally, we define constructive abstraction, which is the special case of strong abstraction where the mapping from low-level states to high-level states partitions the low-level variables and maps each cell to a unique high-level variable. As we show, procedures for combining micro-variables into macro-variables are instances of constructive abstraction, as are all the other examples considered by RW+. While we view constructive abstraction as the notion that is likely to be the most useful in practice, as we show by example, the weaker notions of strong abstraction and abstraction are of interest as well.
Not surprisingly, the idea of abstracting complicated low-level models to simpler high-level models that in some sense act the same way has also been considered in other settings; see, for example, (?). While we are trying to capture these intuitions as well, considering a setting that involves causality adds new subtleties.
2 Probabilistic causal models: a review
In this section we review the definition of causal models. Much of the discussion is taken from (?).
Definition 2.1
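A signature $\mathcal{S}$ is a tuple $(\mathcal{U},\mathcal{V},\mathcal{R})$, where $\mathcal{U}$ is a set of exogenous variables (intuitively, variables that represent factors outside the control of the model), $\mathcal{V}$ is a set of endogenous variables (intuitively, variables whose values are ultimately determined by the values of the exogenous variables), and $\mathcal{R}$ is a function that associates with every variable $Y \in \mathcal{U} \cup \mathcal{V}$ a nonempty set $\mathcal{R}(Y)$ of possible values for $Y$ (i.e., the set of values over which $Y$ ranges). If $\vec{X} = (X_1, \ldots, X_n)$, $\mathcal{R}(\vec{X})$ denotes the crossproduct $\mathcal{R}(X_1) \times \cdots \times \mathcal{R}(X_n)$.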
Definition 2.2
A basic causal model $M$ is a pair $(\mathcal{S},\mathcal{F})$, where $\mathcal{S}$ is a signature and $\mathcal{F}$ defines a function that associates with each endogenous variable $X$ a structural equation $F_X$ giving the value of $X$ in terms of the values of other endogenous and exogenous variables (discussed in more detail below). A causal model $M$ is a tuple $(\mathcal{S},\mathcal{F},\mathcal{I})$, where $(\mathcal{S},\mathcal{F})$ is a basic causal model and $\mathcal{I}$ is a set of allowed interventions (also discussed in more detail below).
Formally, the equation $F_X$ maps $\mathcal{R}(\mathcal{U} \cup \mathcal{V} - \{X\})$ to $\mathcal{R}(X)$, so $F_X$ determines the value of $X$, given the values of all the other variables in $\mathcal{U} \cup \mathcal{V}$. Note that there are no functions associated with exogenous variables; their values are determined outside the model. We call a setting $\vec{u}$ of values of exogenous variables a context.
The value of $X$ may depend on the values of only a few other variables. $X$ depends on $Y$ in context $\vec{u}$ if there is some setting of the endogenous variables other than $X$ and $Y$ such that if the exogenous variables have value $\vec{u}$, then varying the value of $Y$ in that context results in a variation in the value of $X$; that is, there is a setting $\vec{z}$ of the endogenous variables other than $X$ and $Y$ and values $y$ and $y'$ of $Y$ such that $F_X(y,\vec{z},\vec{u}) \ne F_X(y',\vec{z},\vec{u})$.
In this paper we restrict attention to recursive (or acyclic) models, that is, models where, for each context $\vec{u}$, there is a partial order $\preceq_{\vec{u}}$ on variables such that if $X$ depends on $Y$ in context $\vec{u}$, then $Y \preceq_{\vec{u}} X$. In a recursive model, given a context $\vec{u}$, the values of all the remaining variables are determined (we can just solve for the value of the variables in the order given by $\preceq_{\vec{u}}$). A model is strongly recursive if the partial order $\preceq_{\vec{u}}$ is independent of $\vec{u}$; that is, there is a partial order $\preceq$ such that $\preceq = \preceq_{\vec{u}}$ for all contexts $\vec{u}$. In a strongly recursive model, we often write the equation for an endogenous variable as $X = f(\vec{Y})$; this denotes that the value of $X$ depends only on the values of the variables in $\vec{Y}$, and the connection is given by $f$. For example, we might have $X = Y + 5$.
Footnote 1: RW+ do not restrict to acyclic models. Rather, they make the weaker restriction that, for every setting of the causal variables, with probability 1, there is a unique solution to the equations. In the deterministic setting, the analogous restriction would be to consider causal models where there is a unique solution to all equations. None of our definitions or results changes if we allow this more general class of models. We have restricted to recursive models only to simplify the exposition.
An intervention has the form $\vec{X} \gets \vec{x}$, where $\vec{X}$ is a set of endogenous variables. Intuitively, this means that the values of the variables in $\vec{X}$ are set to $\vec{x}$. The structural equations define what happens in the presence of external interventions. Setting the value of some variables $\vec{X}$ to $\vec{x}$ in a causal model $M = (\mathcal{S},\mathcal{F})$ results in a new causal model, denoted $M_{\vec{X} \gets \vec{x}}$, which is identical to $M$, except that $\mathcal{F}$ is replaced by $\mathcal{F}^{\vec{X} \gets \vec{x}}$: for each variable $Y \notin \vec{X}$, $F^{\vec{X} \gets \vec{x}}_Y = F_Y$ (i.e., the equation for $Y$ is unchanged), while for each $X'$ in $\vec{X}$, the equation for $X'$ is replaced by $X' = x'$ (where $x'$ is the value in $\vec{x}$ corresponding to $X'$).
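As a rough illustration of how an intervention replaces structural equations while leaving the others untouched, here is a minimal Python sketch. The helper names `solve` and `intervene`, the dictionary representation of equations, and the two-variable toy model are assumptions made for this illustration; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Mapping

# Hypothetical representation: each endogenous variable is mapped to a function
# that reads the values computed so far (exogenous and endogenous).
Equation = Callable[[Mapping[str, int]], int]


def solve(equations: Dict[str, Equation], order: List[str],
          context: Mapping[str, int]) -> Dict[str, int]:
    """Solve a recursive model by evaluating variables in a causal order."""
    values = dict(context)
    for var in order:
        values[var] = equations[var](values)
    return {var: values[var] for var in order}


def intervene(equations: Dict[str, Equation],
              assignment: Mapping[str, int]) -> Dict[str, Equation]:
    """Equations of the intervened model: constants for the variables that are
    set, the original equations for every other variable."""
    new_equations = dict(equations)
    for var, value in assignment.items():
        new_equations[var] = lambda values, value=value: value
    return new_equations


# Toy model (assumed for illustration): X = U_X and Y = X + 5.
equations = {
    "X": lambda v: v["U_X"],
    "Y": lambda v: v["X"] + 5,
}
order = ["X", "Y"]
context = {"U_X": 1}

original = solve(equations, order, context)
after = solve(intervene(equations, {"X": 10}), order, context)
```

In this sketch the intervention on `X` changes the value of `Y` only because the equation for `Y` is kept as it was and re-evaluated in the new model.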
The set $\mathcal{I}$ of interventions can be viewed as the set of interventions that we care about for some reason or other. For example, it might consist of the interventions that involve variables and values that are under our control. In (?; ?), only basic causal models are considered (and are called causal models). RW+ added the set of allowed interventions to the model. We consider allowed interventions as well, since it seems useful when considering abstractions to describe the set of interventions of interest. We sometimes write a causal model $M = (\mathcal{S},\mathcal{F},\mathcal{I})$ as $(M',\mathcal{I})$, where $M'$ is the basic causal model $(\mathcal{S},\mathcal{F})$, if we want to emphasize the role of the set of interventions.
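Given a signature $\mathcal{S} = (\mathcal{U},\mathcal{V},\mathcal{R})$, a primitive event is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \gets y_1, \ldots, Y_k \gets y_k]\varphi$, where
- $\varphi$ is a Boolean combination of primitive events,
- $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and
- $y_i \in \mathcal{R}(Y_i)$.
Such a formula is abbreviated as $[\vec{Y} \gets \vec{y}]\varphi$. The special case where $k = 0$ is abbreviated as $\varphi$. Intuitively, $[Y_1 \gets y_1, \ldots, Y_k \gets y_k]\varphi$ says that $\varphi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.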
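A causal formula $\psi$ is true or false in a causal model, given a context. As usual, we write $(M,\vec{u}) \models \psi$ if the causal formula $\psi$ is true in causal model $M$ given context $\vec{u}$. The $\models$ relation is defined inductively. $(M,\vec{u}) \models X = x$ if the variable $X$ has value $x$ in the unique (since we are dealing with recursive models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$). The truth of conjunctions and negations is defined in the standard way. Finally, $(M,\vec{u}) \models [\vec{Y} \gets \vec{y}]\varphi$ if $(M_{\vec{Y} \gets \vec{y}},\vec{u}) \models \varphi$.
To simplify notation, we sometimes write $M(\vec{u})$ to denote the unique element $\vec{v}$ of $\mathcal{R}(\mathcal{V})$ such that $(M,\vec{u}) \models \mathcal{V} = \vec{v}$. Similarly, given an intervention $\vec{Y} \gets \vec{y}$, $M(\vec{u}, \vec{Y} \gets \vec{y})$ denotes the unique element $\vec{v}$ of $\mathcal{R}(\mathcal{V})$ such that $(M,\vec{u}) \models [\vec{Y} \gets \vec{y}](\mathcal{V} = \vec{v})$.
A probabilistic causal model $M = (\mathcal{S},\mathcal{F},\mathcal{I},\Pr)$ is just a causal model together with a probability $\Pr$ on contexts. We often abuse notation slightly and denote the probabilistic causal model $(\mathcal{S},\mathcal{F},\mathcal{I},\Pr)$ as $(M,\Pr)$, where $M$ is the underlying deterministic causal model $(\mathcal{S},\mathcal{F},\mathcal{I})$.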
RW+ worked with probabilistic causal models, but added one more feature and made a restrictive assumption. They consider models that place a partial order $\preceq$ on interventions. However, they (and we) consider only what they call the natural partial order, where $(\vec{X} \gets \vec{x}) \preceq (\vec{X}' \gets \vec{x}')$ if $\vec{X}$ is a subset of $\vec{X}'$ and $\vec{x}$ is the corresponding subset of $\vec{x}'$, so we do not explicitly introduce the partial order as a component of the model here. In addition, RW+ assume that for each endogenous variable $X$, there is a unique exogenous variable $U_X$ such that $U_X$ is the only exogenous variable on whose value $X$ depends, and $U_X \ne U_Y$ if $X \ne Y$. We say that a causal model has unique exogenous variables (uev) if this is the case.
Assuming that a causal model has uev makes sense if we think of $U_X$ as the noise variable corresponding to $X$. However, this assumption is not always appropriate (e.g., if we take the temperature to be exogenous, and temperature can affect a number of endogenous variables). Not surprisingly, in (non-probabilistic) causal models, assuming uev entails a significant loss of generality. In particular, we cannot express the correlation in values between two endogenous variables due to being affected by a common exogenous variable. However, the uev assumption can be made essentially without loss of generality in probabilistic causal models, as the lemma below shows.
Definition 2.3
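Two probabilistic causal models $M = (\mathcal{S},\mathcal{F},\mathcal{I},\Pr)$ and $M' = (\mathcal{S}',\mathcal{F}',\mathcal{I}',\Pr')$ are equivalent, written $M \sim M'$, if $\mathcal{V} = \mathcal{V}'$, $\mathcal{R}(Y) = \mathcal{R}'(Y)$ for all $Y \in \mathcal{V}$, $\mathcal{I} = \mathcal{I}'$, and all causal formulas have the same probability of being true in both $M$ and $M'$; that is, for all causal formulas $\varphi$, we have $\Pr(\{\vec{u} \in \mathcal{R}(\mathcal{U}) : (M,\vec{u}) \models \varphi\}) = \Pr'(\{\vec{u}' \in \mathcal{R}'(\mathcal{U}') : (M',\vec{u}') \models \varphi\})$.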
Lemma 2.4
Given a probabilistic causal model $M$, there is a probabilistic causal model $M'$ with uev such that $M \sim M'$. (Footnote 2: Proofs can be found in the appendix.)
All the models that we consider in our examples have uev. Whatever problems there are with the RW+ notions, they do not arise from the assumption that models have uev.
3 From exact transformations to abstractions
In this section, we review the RW+ definition, point out some problems with it, and then consider a sequence of strengthenings of the definition.
3.1 Exact transformations
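We need some preliminary definitions. First observe that, given a probabilistic model $M = (\mathcal{S},\mathcal{F},\mathcal{I},\Pr)$, the probability $\Pr$ on $\mathcal{R}(\mathcal{U})$ can also be viewed as a probability on $\mathcal{R}(\mathcal{V})$ (since each context in $\mathcal{R}(\mathcal{U})$ determines a unique setting of the variables in $\mathcal{V}$); more precisely,
$$\Pr(\vec{v}) = \Pr(\{\vec{u} : M(\vec{u}) = \vec{v}\}).$$
In the sequel, we freely view $\Pr$ as a distribution on both $\mathcal{R}(\mathcal{U})$ and $\mathcal{R}(\mathcal{V})$; the context should make clear which we intend. Each intervention $\vec{X} \gets \vec{x}$ also induces a probability $\Pr^{\vec{X} \gets \vec{x}}$ on $\mathcal{R}(\mathcal{V})$ in the obvious way:
$$\Pr^{\vec{X} \gets \vec{x}}(\vec{v}) = \Pr(\{\vec{u} : M(\vec{u}, \vec{X} \gets \vec{x}) = \vec{v}\}).$$
One last piece of notation: We are interested in when a high-level model is an abstraction of a low-level model. In the sequel, we always use $M_L$ and $M_H$ to denote deterministic causal models (where the $L$ and $H$ stand for low level and high level, respectively). We write $(M,\Pr)$ to denote a probabilistic causal model that extends $M$.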
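To make the two induced distributions concrete, the following Python sketch computes the distribution on outcomes with and without an intervention for a small finite model. The model (`X = U mod 2`, `Y = X + 1`), the prior over contexts, and the function names are assumptions chosen for illustration only.

```python
from collections import defaultdict
from fractions import Fraction


def outcome(u, intervention=None):
    """Toy recursive model (assumed): X = U mod 2, Y = X + 1.
    An intervention is a dict such as {"X": 1}; set variables ignore their equations."""
    intervention = intervention or {}
    x = intervention.get("X", u % 2)
    y = intervention.get("Y", x + 1)
    return (x, y)


def induced_distribution(prior, intervention=None):
    """Sum the prior mass of every context that leads to each outcome."""
    dist = defaultdict(Fraction)
    for u, p in prior.items():
        dist[outcome(u, intervention)] += p
    return dict(dist)


# Assumed prior over the three contexts U = 0, 1, 2.
prior = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

observational = induced_distribution(prior)
interventional = induced_distribution(prior, {"X": 1})
```

Exact fractions are used so that distributions can be compared with `==` without floating-point tolerance.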
With this background, we can give the RW+ definition of exact transformation. Although the definition was given for probabilistic causal models that satisfy uev, it makes sense for arbitrary probabilistic causal models.
Definition 3.1
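If $(M_L,\Pr_L)$ and $(M_H,\Pr_H)$ are probabilistic causal models, $\omega: \mathcal{I}_L \to \mathcal{I}_H$ is an order-preserving, surjective mapping (where $\omega$ is order-preserving if, for all interventions $i_1, i_2$ in $\mathcal{I}_L$ such that $i_1 \preceq_L i_2$ according to the natural order, we have $\omega(i_1) \preceq_H \omega(i_2)$), and $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, then $(M_H,\Pr_H)$ is an exact ($\tau$-$\omega$)-transformation of $(M_L,\Pr_L)$ if, for every intervention $\vec{Y} \gets \vec{y} \in \mathcal{I}_L$, we have
$$\Pr_H^{\omega(\vec{Y} \gets \vec{y})} = \tau(\Pr_L^{\vec{Y} \gets \vec{y}}), \qquad (1)$$
where $\tau(\Pr_L)$ is the “pushforward” distribution on $\mathcal{R}_H(\mathcal{V}_H)$ determined by $\tau$ and $\Pr_L$:
$$\tau(\Pr_L)(\vec{v}_H) = \Pr_L(\{\vec{v}_L : \tau(\vec{v}_L) = \vec{v}_H\}).$$
The key point here is the requirement that $\Pr_H^{\omega(\vec{Y} \gets \vec{y})} = \tau(\Pr_L^{\vec{Y} \gets \vec{y}})$. Roughly speaking, it says that if you start from the low-level intervention $\vec{Y} \gets \vec{y}$ and move up to the high-level model following two distinct routes, you end up at the same place.
The first route goes as follows. The intervention $\vec{Y} \gets \vec{y}$ changes the probability distribution on low-level outcomes, giving rise to $\Pr_L^{\vec{Y} \gets \vec{y}}$ (where an “outcome” is a setting of the endogenous variables). This distribution can be moved up to the high level by applying $\tau$, giving $\tau(\Pr_L^{\vec{Y} \gets \vec{y}})$, which is a distribution on high-level outcomes.
The second route goes as follows. From the low-level intervention $\vec{Y} \gets \vec{y}$ we move up to a high-level intervention by applying $\omega$, giving $\omega(\vec{Y} \gets \vec{y})$. This intervention changes the probability distribution on high-level outcomes, giving rise to $\Pr_H^{\omega(\vec{Y} \gets \vec{y})}$. To be an exact transformation means that this distribution and the previous one are identical, for all interventions $\vec{Y} \gets \vec{y} \in \mathcal{I}_L$.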
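The two routes can be compared mechanically on a small finite example. The Python sketch below is only an illustration under assumed ingredients: a low-level model with two binary variables (`X1 = U1`, `X2 = U2`), a high-level model with one variable `S = U_S`, the map `tau` that sums the two low-level values, and an intervention map `omega` written out by hand. None of these names or models come from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def low_outcome(u, intervention):
    """Assumed low-level model: (X1, X2) = (U1, U2) unless both are set."""
    return u if intervention is None else intervention


def high_outcome(u, intervention):
    """Assumed high-level model: S = U_S unless S is set."""
    return u if intervention is None else intervention


def tau(v_low):
    return v_low[0] + v_low[1]


def omega(intervention):
    # None stands for the empty intervention; (a, b) sets (X1, X2) and maps to S <- a + b.
    return None if intervention is None else intervention[0] + intervention[1]


def outcome_distribution(prior, model, intervention):
    dist = defaultdict(Fraction)
    for u, p in prior.items():
        dist[model(u, intervention)] += p
    return dict(dist)


def pushforward(dist, f):
    out = defaultdict(Fraction)
    for v, p in dist.items():
        out[f(v)] += p
    return dict(out)


ALLOWED_LOW = [None] + list(product([0, 1], repeat=2))
PR_LOW = {u: Fraction(1, 4) for u in product([0, 1], repeat=2)}


def is_exact_transformation(pr_high):
    """Check condition (1) for every allowed low-level intervention."""
    for i in ALLOWED_LOW:
        route_one = pushforward(outcome_distribution(PR_LOW, low_outcome, i), tau)
        route_two = outcome_distribution(pr_high, high_outcome, omega(i))
        if route_one != route_two:
            return False
    return True


matching_prior = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mismatched_prior = {0: Fraction(1)}
```

With the uniform low-level prior, `is_exact_transformation(matching_prior)` holds, while a high-level prior concentrated on a single context fails on the empty intervention.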
Despite all the notation, we hope that the intuition is clear: the intervention $\vec{Y} \gets \vec{y}$ acts the same way in the low-level model as the intervention $\omega(\vec{Y} \gets \vec{y})$ does in the high-level model. (See RW+ for more discussion and intuition.) The following example illustrates Definition 3.1.
Example 3.2
Consider a simple voting scenario where we have 99 voters who can either vote for or against a proposition. The campaign for the proposition can air some subset of two advertisements to try to influence how the voters vote. The low-level model is characterized by endogenous variables $X_i$, $i = 1, \ldots, 99$, $A_1$, $A_2$, and $T$, and exogenous variables $U_i$, $i = 1, \ldots, 101$. $X_i$ denotes voter $i$’s vote, so $X_i = 1$ if voter $i$ votes for the proposition, and $X_i = 0$ if voter $i$ votes against. $A_i$ denotes whether ad $i$ is run, and $T$ denotes the total number of votes for the proposition. $U_i$ determines how voter $i$ votes as a function of which ads are run for $i = 1, \ldots, 99$, while $U_{100}$ and $U_{101}$ determine $A_1$ and $A_2$, respectively.
We can cluster the voters into three groups: –, –, –. For example, the first group might represent older, wealthy voters; the second group might represent soccer moms; and the third group might represent young singles. Members of the same group are affected by the ads in the same way, meaning that for all , and all , that belong to the same group. The high-level model replaces the variables by variables , , and , representing the sum of the votes of each group, it replaces by , and replaces by a binary variable that just indicates who won. The only interventions allowed in the low-level model are interventions to the variables and .
We now have an obvious map from to that maps a low-level state to a high-level state by taking , , and to be the total vote of the corresponding groups; the map is just the identity. Given a probability on , there is an obvious probability on such that is an exact transformation of . Note that it is critical here that we don’t allow interventions on the individual variables at the low level. For example, it is not clear to what high-level intervention should map the low-level intervention .
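The state map in the voting example can be sketched in Python as follows. This is a hedged illustration: the group boundaries (three consecutive blocks of 33 voters), the majority threshold of 50 votes, and the dictionary field names are all assumptions made here, since the text only says that the voters are clustered into three groups and that the high-level outcome variable indicates who won.

```python
# Assumed group boundaries: voters 1-33, 34-66, 67-99 (0-based slices below).
GROUP_SLICES = [slice(0, 33), slice(33, 66), slice(66, 99)]
MAJORITY = 50  # assumed threshold: the proposition wins with at least 50 of 99 votes


def tau(low_state):
    """Map a low-level state to a high-level state.
    low_state has "ads" (a pair of 0/1 flags) and "votes" (99 values in {0, 1})."""
    votes = low_state["votes"]
    group_totals = tuple(sum(votes[s]) for s in GROUP_SLICES)
    winner = 1 if sum(votes) >= MAJORITY else 0
    return {"ads": low_state["ads"], "groups": group_totals, "winner": winner}


def omega(low_intervention):
    """Interventions are only allowed on the ad variables, where the map is the identity."""
    return dict(low_intervention)


state_a = {"ads": (1, 0), "votes": [1] * 33 + [0] * 33 + [1] * 20 + [0] * 13}
state_b = {"ads": (1, 0), "votes": [1] * 33 + [0] * 33 + [0] * 13 + [1] * 20}
```

The two low-level states differ in which members of the third group vote for the proposition, yet they map to the same high-level state, which is the sense in which the individual votes are an inessential detail here.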
RW+ discuss three applications of exact transformations:
- •
a model from which some variables are marginalized;
- •
moving from the micro-level to the macro-level by aggregating groups of variables;
- •
and moving from a time-evolving dynamical process to a stationary equilibrium state.
We review the details of their second application here, just to show how it plays out in our framework.
Example 3.3
:* * Let be a causal model with endogenous variables and , exogenous variables and , and equations for and for , where is an matrix, and there exists an such that each column of the matrix sums to . Finally, the intervention set is .
Let be a model with endogenous variables and , exogenous variables and , equations and , and intervention set . Consider the following transformation that averages and :
[TABLE]
Given a probability on (the contexts in the low-level model ), if we take and , then is an exact (-)-transformation of for the obvious choice of .
3.2 Uniform transformations
As the following example shows, much of the work to ensure that a transformation is an exact transformation can be done by choosing appropriate distributions $\Pr_L$ and $\Pr_H$. This leads to cases where $(M_H,\Pr_H)$ is an exact transformation of $(M_L,\Pr_L)$ although it is hard to think of $M_H$ as a high-level abstraction of $M_L$.
Example 3.4
:* * For , let be a deterministic causal model with signature ; let be a fixed context in ; let be such that ; let consist only of the empty intervention; let put probability 1 on ; let map all elements of to ; and let be the identity map from to . Clearly is an exact --transformation of .
The fact that each of and is an exact transformation of the other, despite the fact that the models are completely unrelated, suggests to us that exact transformations are not capturing the essence of abstraction. Roughly speaking, what is happening here is that a high-level model can be arbitrary in contexts that do not lead to settings that have positive probability for some allowed low-level intervention. This means that if there are few allowed low-level interventions or few contexts with positive probability, then there are very few constraints on . We end up with high-level models that should not (in our view) count as abstractions of . We can address this concern by strengthening the notion of exact transformation to require it to hold for all distributions .
Definition 3.5
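If $M_L$ and $M_H$ are deterministic causal models, $\omega: \mathcal{I}_L \to \mathcal{I}_H$ is an order-preserving, surjective mapping, and $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, then $M_H$ is a uniform ($\tau$-$\omega$)-transformation of $M_L$ if, for all $\Pr_L$, there exists $\Pr_H$ such that $(M_H,\Pr_H)$ is an exact ($\tau$-$\omega$)-transformation of $(M_L,\Pr_L)$.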
As we pointed out earlier, since RW+ assume uev, the probability distribution in general might do a lot of work to capture correlations between values of endogenous variables. It makes sense to consider arbitrary distributions if we drop the uev assumption (as in fact we do).
In Example 3.4 it is easy to see that neither nor is a uniform transformation of the other. On the other hand, in Example 3.2, we do have a uniform transformation.
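Considering uniform transformations has other nice features. For one thing, it allows us to derive from $\tau$ a mapping $\tau_{\mathcal{U}}$ from $\mathcal{R}_L(\mathcal{U}_L)$ to $\mathcal{R}_H(\mathcal{U}_H)$ that “explains” how $\Pr_L$ and $\Pr_H$ are related. More precisely, not only do we know that, for the appropriate $\omega$, for all distributions $\Pr_L$ there exists $\Pr_H$ such that $(M_H,\Pr_H)$ is an exact ($\tau$-$\omega$)-transformation of $(M_L,\Pr_L)$, we can take $\Pr_H$ to be $\tau_{\mathcal{U}}(\Pr_L)$ (i.e., the pushforward of $\Pr_L$ under $\tau_{\mathcal{U}}$).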
Proposition 3.6
:* If is a uniform (-)-transformation of and is countable, then there exists a function such that, for all distributions on , is an exact (-)-transformation of .*
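The next result provides a characterization of when $M_H$ is a uniform ($\tau$-$\omega$)-transformation of $M_L$.
Definition 3.7
$\tau_{\mathcal{U}}: \mathcal{R}_L(\mathcal{U}_L) \to \mathcal{R}_H(\mathcal{U}_H)$ is compatible with $\tau$ if, for all $\vec{Y} \gets \vec{y} \in \mathcal{I}_L$ and $\vec{u}_L \in \mathcal{R}_L(\mathcal{U}_L)$,
$$\tau(M_L(\vec{u}_L, \vec{Y} \gets \vec{y})) = M_H(\tau_{\mathcal{U}}(\vec{u}_L), \omega(\vec{Y} \gets \vec{y})).$$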
Theorem 3.8
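Given causal models $M_L$ and $M_H$, $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, and an order-preserving surjective function $\omega: \mathcal{I}_L \to \mathcal{I}_H$, the following are equivalent:
- (a) $M_H$ is a uniform ($\tau$-$\omega$)-transformation of $M_L$;
- (b) there exists a function $\tau_{\mathcal{U}}: \mathcal{R}_L(\mathcal{U}_L) \to \mathcal{R}_H(\mathcal{U}_H)$ compatible with $\tau$.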
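For finite models, the compatibility condition in part (b) can be checked by brute force. The Python sketch below does this for an assumed toy pair of models (two binary low-level variables copied from their exogenous variables, one high-level variable holding their sum); the models, the hand-written `omega`, and the candidate context maps are illustrative assumptions rather than anything defined in the paper.

```python
from itertools import product

LOW_CONTEXTS = list(product([0, 1], repeat=2))  # values of (U1, U2)
# None is the empty intervention; a pair (a, b) sets (X1, X2) to (a, b).
ALLOWED_LOW = [None] + list(product([0, 1], repeat=2))


def m_low(u_low, intervention):
    """Assumed low-level model: (X1, X2) = (U1, U2) unless both are set."""
    return u_low if intervention is None else intervention


def m_high(u_high, intervention):
    """Assumed high-level model: S = U_S unless S is set."""
    return u_high if intervention is None else intervention


def tau(v_low):
    return v_low[0] + v_low[1]


def omega(intervention):
    return None if intervention is None else intervention[0] + intervention[1]


def is_compatible(tau_u):
    """tau_u is a dict from low-level contexts to high-level contexts.
    Compatibility: tau(M_L(u, i)) == M_H(tau_u(u), omega(i)) for all u and allowed i."""
    return all(
        tau(m_low(u_low, i)) == m_high(tau_u[u_low], omega(i))
        for u_low in LOW_CONTEXTS
        for i in ALLOWED_LOW
    )


sum_map = {u: u[0] + u[1] for u in LOW_CONTEXTS}
constant_map = {u: 0 for u in LOW_CONTEXTS}
```

Here `sum_map` satisfies the condition, while `constant_map` already fails on the empty intervention for any context with a nonzero sum.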
It is easy to check that uniform transformations are closed under composition.
Theorem 3.9
:* If is a uniform (-)-transformation of and is a uniform (-)-transformation of , then is a uniform --transformation of .*
3.3 Abstraction
Although the notion of a uniform transformation deals with some of the problems we see with the RW+ notion of exact transformation, it does not deal with all of them, as the following two examples show.
Example 3.10
:* * Let and be deterministic causal models, both with endogenous binary variables and and corresponding binary exogenous variables and .333A variable is binary if its range is . In , the equations are and . ( plays no role in the equations in . We added it just to make a model that has uev and thus show that having uev is not an issue here.) In , the equations are and . The only allowed interventions in are , for ; the only allowed interventions in are , for . It is easy to see that is a uniform transformation of and that is a uniform transformation of . If and are the maps showing that is a uniform transformation of , then we can take both and to be the identity, maps to , while maps to . But this does not match our intuition that if is an abstraction of , then is a higher-level description of the situation than . Whatever “higher-level description” means, we would expect that if and are different, then we should not have and being abstractions of each other.
What is the problem here? If we just focus on these sets of allowed interventions, then there is in fact no problem. and do, in a sense, work the same way as far as these allowed interventions go. However, the mappings and seem to be in conflict with taking and to be the identity. Given that is the identity mapping, we would expect to also be the identity mapping. Why should map to something other than here? It is easy to see that if we take to also be the identity mapping then the problem disappears, as we no longer have uniform transformations between these two models. More generally, we define below a natural way in which a mapping on states induces a mapping on allowed interventions. But even when is well-behaved there exist counterintuitive examples of uniform transformations.
Example 3.11
:* * Given a model , let be a model that is like except that has a new exogenous binary variable and a new binary endogenous variable . Modify the equations in so that is the only parent of , but is the parent of every other endogenous variable in (and thus of every endogenous variable in ). Take . If , then all equations in are identical to those in . However, if , then all equations behave in some arbitrary way (the exact way they behave is irrelevant). Define by taking . We claim that is a uniform (-)-transformation of , where is the identity. Given a distribution on , define so that its marginal on the variables in is and . It is easy to see that is an exact (-)-transformation of , regardless of how the equations in are defined if .
What goes wrong in this example is that the high level is more detailed than the low level, contrary to what one expects of an abstraction. Concretely, introducing the extra variable allows to capture a whole range of possibilities that have no counterpart whatsoever in . That doesn’t sound right (at least to us). We can fix this by simply demanding that our abstraction function be surjective.
Combining both observations, we define a natural way in which an abstraction function determines which sets of interventions should be allowed at the low level and the high level, and the mapping between them.
Definition 3.12
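Given a set $\mathcal{V}$ of endogenous variables, $\vec{X} \subseteq \mathcal{V}$, and $\vec{x} \in \mathcal{R}(\vec{X})$, let
$$Rst(\mathcal{V}, \vec{x}) = \{\vec{v} \in \mathcal{R}(\mathcal{V}) : \vec{x} \text{ is the restriction of } \vec{v} \text{ to } \vec{X}\}.$$
Given $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, define $\omega_\tau(\vec{X} \gets \vec{x}) = \vec{Y} \gets \vec{y}$ if $\vec{Y} \subseteq \mathcal{V}_H$, $\vec{y} \in \mathcal{R}_H(\vec{Y})$, and $\tau(Rst(\mathcal{V}_L, \vec{x})) = Rst(\mathcal{V}_H, \vec{y})$ (as usual, given $T \subseteq \mathcal{R}_L(\mathcal{V}_L)$, we define $\tau(T) = \{\tau(\vec{v}_L) : \vec{v}_L \in T\}$). It is easy to see that, given $\vec{X}$ and $\vec{x}$, there can be at most one such $\vec{Y}$ and $\vec{y}$. If there does not exist such a $\vec{Y}$ and $\vec{y}$, we take $\omega_\tau(\vec{X} \gets \vec{x})$ to be undefined. Let $\mathcal{I}_L^\tau$ be the set of interventions for which $\omega_\tau$ is defined, and let $\mathcal{I}_H^\tau = \omega_\tau(\mathcal{I}_L^\tau)$.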
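For small finite ranges, the intervention map induced by a state map can be computed by enumeration. The following Python sketch does so under assumed ingredients: two binary low-level variables `X1` and `X2`, one high-level variable `S` with range {0, 1, 2}, and a state map that sums the low-level values. The function names are hypothetical, and `None` is used here to mean "undefined", while the empty dictionary stands for the empty intervention.

```python
from itertools import combinations, product

LOW_RANGES = {"X1": [0, 1], "X2": [0, 1]}   # assumed low-level variables
HIGH_RANGES = {"S": [0, 1, 2]}              # assumed high-level variable


def tau(v_low):
    return {"S": v_low["X1"] + v_low["X2"]}


def all_states(ranges):
    names = sorted(ranges)
    return [dict(zip(names, vals)) for vals in product(*(ranges[n] for n in names))]


def restriction_set(ranges, assignment):
    """All full states that agree with the partial assignment."""
    return [s for s in all_states(ranges)
            if all(s[k] == v for k, v in assignment.items())]


def all_interventions(ranges):
    names = sorted(ranges)
    result = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            for vals in product(*(ranges[n] for n in subset)):
                result.append(dict(zip(subset, vals)))
    return result


def freeze(states):
    """Turn a collection of state dicts into a comparable set."""
    return {tuple(sorted(s.items())) for s in states}


def omega_tau(low_intervention):
    """Return the high-level intervention whose restriction set equals the
    image under tau of the low-level restriction set, or None if there is none."""
    image = freeze(tau(s) for s in restriction_set(LOW_RANGES, low_intervention))
    for high_intervention in all_interventions(HIGH_RANGES):
        if freeze(restriction_set(HIGH_RANGES, high_intervention)) == image:
            return high_intervention
    return None


defined_on = [i for i in all_interventions(LOW_RANGES) if omega_tau(i) is not None]
```

In this toy setting the map is defined on the empty intervention and on interventions that set both low-level variables, but not on interventions that set only one of them, since the image of such a restriction set is neither a single value of `S` nor its whole range.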
It is straightforward to check that in Example 3.2, is defined on interventions to , , and on these interventions it is the identity (and thus agrees with as defined in that example), but it is also defined on simultaneous interventions on , , and , and on (as well as combinations of these interventions). In Example 3.3, the interventions on which is defined are precisely those in the set of that example; on these interventions, .
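Note that if $\tau$ is surjective, then it follows that $\omega_\tau(\emptyset) = \emptyset$, and for all $\vec{v}_L \in \mathcal{R}_L(\mathcal{V}_L)$, $\omega_\tau(\mathcal{V}_L \gets \vec{v}_L) = \mathcal{V}_H \gets \tau(\vec{v}_L)$.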
Definition 3.13
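$(M_H,\mathcal{I}_H)$ is a $\tau$-abstraction of $(M_L,\mathcal{I}_L)$ if the following conditions hold:
- $\tau$ is surjective;
- there is a surjective function $\tau_{\mathcal{U}}: \mathcal{R}_L(\mathcal{U}_L) \to \mathcal{R}_H(\mathcal{U}_H)$ compatible with $\tau$;
- $\mathcal{I}_H = \omega_\tau(\mathcal{I}_L)$.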
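The three conditions of Definition 3.13 can be checked by enumeration when all ranges are finite. The Python sketch below is a minimal illustration under assumptions: the toy models (two binary low-level variables, one high-level sum variable), the chosen sets of allowed interventions, and the intervention map `omega_tau`, which is written out by hand for this toy case rather than computed from the definition.

```python
from itertools import product

LOW_CONTEXTS = list(product([0, 1], repeat=2))   # values of (U1, U2)
HIGH_CONTEXTS = [0, 1, 2]                        # values of U_S
LOW_STATES = list(product([0, 1], repeat=2))     # values of (X1, X2)
HIGH_STATES = [0, 1, 2]                          # values of S

# None is the empty intervention; other entries set all variables at that level.
ALLOWED_LOW = [None] + list(product([0, 1], repeat=2))
ALLOWED_HIGH = [None, 0, 1, 2]


def m_low(u_low, intervention):
    return u_low if intervention is None else intervention


def m_high(u_high, intervention):
    return u_high if intervention is None else intervention


def tau(v_low):
    return v_low[0] + v_low[1]


def omega_tau(intervention):
    # Hand-derived for this toy case: setting (X1, X2) to (a, b) corresponds to S <- a + b.
    return None if intervention is None else tau(intervention)


def tau_is_surjective():
    return {tau(v) for v in LOW_STATES} == set(HIGH_STATES)


def surjective_compatible_tau_u():
    """Search for a surjective context map satisfying the compatibility condition."""
    candidates = [
        [u_high for u_high in HIGH_CONTEXTS
         if all(tau(m_low(u_low, i)) == m_high(u_high, omega_tau(i))
                for i in ALLOWED_LOW)]
        for u_low in LOW_CONTEXTS
    ]
    for choice in product(*candidates):
        if set(choice) == set(HIGH_CONTEXTS):
            return dict(zip(LOW_CONTEXTS, choice))
    return None


def interventions_match():
    return {omega_tau(i) for i in ALLOWED_LOW} == set(ALLOWED_HIGH)


def is_tau_abstraction():
    return (tau_is_surjective()
            and surjective_compatible_tau_u() is not None
            and interventions_match())
```

The exhaustive search over context maps is exponential in general, so this is only meant to show what each condition asks for on a model small enough to enumerate.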
As intended, Examples 3.10 and 3.11 are not $\tau$-abstractions; on the other hand, in Examples 3.2 and 3.3, $M_H$ is a $\tau$-abstraction of $M_L$.
Unlike exact transformations, $\tau$-abstraction is a relation between causal models: the mapping $\omega$ is determined by $\tau$, and there is no need to specify a probability distribution.
Proposition 3.14
If $M_H$ is a $\tau$-abstraction of $M_L$, then $M_H$ is a uniform ($\tau$-$\omega_\tau$)-transformation of $M_L$.
We can strengthen the notion of $\tau$-abstraction to define a relation on basic causal models, by considering the largest possible sets of allowed interventions.
Definition 3.15
If $M_H$ and $M_L$ are basic causal models, then $M_H$ is a strong $\tau$-abstraction of $M_L$ if $\mathcal{I}_H^\tau = \mathcal{I}_H^*$, the set of all high-level interventions, and $(M_H,\mathcal{I}_H^\tau)$ is a $\tau$-abstraction of $(M_L,\mathcal{I}_L^\tau)$.
The notion of strong $\tau$-abstraction provides a clean, powerful relation between basic causal models. However, there are applications where the two additional requirements that make an abstraction strong are too much to ask. In the following example, neither requirement is satisfied.
Example 3.16
:* * Consider an object in the earth’s gravitational field. On the low level (), there are three endogenous variables: (velocity), (height), and (mass), and three corresponding exogenous variables, , , and . The equations in are , , and . The high level captures the object’s current energy. contains endogenous variables (kinetic energy) and (potential energy), and two corresponding exogenous variables, and . The equations in are and . We define using the standard equations for kinetic energy and gravitational potential energy, so . It is easy to see that is a surjection onto . We claim that is not a strong -abstraction of . To see why, consider interventions of the form for . Applying Definition 3.12, we get that , since ; by choosing and appropriately, we can still get all values in , as long as . We also clearly have that maps the empty intervention in to the empty intervention in . With this, we can already show that is not a uniform - transformation of . Suppose that is a probability on that puts probability 1 on . For condition (1) in Definition 3.1 to hold for the intervention , the probability on must put probability 1 on . But (1) must hold for all choices of . This is clearly impossible.
Although is not a strong -abstraction of , we can easily construct a sensible and useful -abstraction between these models by simply not allowing interventions of the form in the low-level model. Concretely, if we define as containing the empty intervention and all interventions of the form , then maps this to the set that contains the empty intervention and all interventions of the form .
As the following example shows, there also exist interesting cases where only the first requirement of Definition 3.15 is not satisfied. Roughly speaking, this is because some high-level variables are not logically independent, so not all high-level interventions are meaningful.
Example 3.17
:* * Suppose that we have a grid of pixels, each of which can be black or white. In the low-level model, we have 10,000 endogenous variables , for , and 10,000 corresponding exogenous variables for , with the obvious equations . We would expect there to be other variables that are affected by the s (e.g., what a viewer perceives), but for ease of exposition, we ignore these other variables in this example and focus only on the variables. Suppose that all we care about is how many of the pixels in the upper half of the grid are black and how many pixels in the left half of the grid are black. Thus, in the high-level model, we have variables and whose range is . Because of the dependencies between and , there is a single exogenous variable that determines their values, which are pairs such that and . Now we have an obvious map from low-level states to high-level states. We claim that is a -abstraction of , where consists of the empty intervention and interventions that simultaneously set all the variables in the upper half and left half (i.e., all variables with or ) and an arbitrary subset of the variables in the bottom right. Given a nonempty intervention of this form, , where is the number of variables set to 1 with and is the number of variables set to 1 with ; how the variables in the bottom right are set in is irrelevant. Thus, consists of interventions of the form , where and . It is straightforward to check that there is no low-level intervention such that . For suppose that . Then . This means that for some such that , which is a contradiction. A similar argument shows that no intervention of the form can be in . It is straightforward to check that is a uniform -transformation of , so is a -abstraction of , however it is clearly not a strong -abstraction of .
The problem here is that although has variables and , we can only intervene on them simultaneously. It may make sense to consider such interventions if we want a visual effect that depends on both the number of black pixels in the upper half and the number of black pixels in the left half. But it is worth noting that if we consider a high-level model with only a single variable that counts the number of pixels that are black in the upper half and the left half altogether, then is a strong -abstraction of with the obvious map .
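The logical dependence between the two high-level counts in Example 3.17 can be checked by brute force on a smaller grid. The following is a minimal sketch under stated assumptions, not the construction used in the paper: the grid is shrunk to a hypothetical 4 × 4 size so that all low-level states can be enumerated, black is encoded as 1, and the names `tau`, `reachable_pairs`, `N`, and `HALF` are introduced here purely for illustration. It enumerates every low-level state, applies the map to the pair (upper-half count, left-half count), and compares the reachable pairs with those whose difference is at most the size of the shared upper-left quadrant.

```python
from itertools import product

N = 4          # hypothetical small grid side; the example in the text has 10,000 pixels
HALF = N // 2  # number of rows in the upper half, and columns in the left half


def tau(grid):
    """Map a low-level state (N rows of 0/1 values) to (upper-half count, left-half count)."""
    upper = sum(grid[i][j] for i in range(HALF) for j in range(N))
    left = sum(grid[i][j] for i in range(N) for j in range(HALF))
    return upper, left


def reachable_pairs():
    """Collect the high-level states reached by tau over all 2**(N*N) low-level states."""
    pairs = set()
    for bits in product((0, 1), repeat=N * N):
        grid = tuple(bits[r * N:(r + 1) * N] for r in range(N))
        pairs.add(tau(grid))
    return pairs


pairs = reachable_pairs()
pixels_per_half = HALF * N  # pixels in the upper half (and in the left half)
overlap = HALF * HALF       # pixels in the upper-left quadrant, shared by both halves

# Pairs of counts whose difference does not exceed the size of the shared quadrant.
expected = {
    (m, k)
    for m in range(pixels_per_half + 1)
    for k in range(pixels_per_half + 1)
    if abs(m - k) <= overlap
}
```

In this toy setting `pairs` coincides with `expected`, so a pair such as `(pixels_per_half, 0)` is never produced by any low-level state. This mirrors the point of the example that the two high-level variables cannot be set independently of each other.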
Lastly, we present an example where the second requirement of Definition 3.15 is not satisfied.
Example 3.18
Let be the model with the following equations, , , and , where all variables are binary. Let be a model with two equations and , where again all variables are binary. Define by taking . We leave it to the reader to verify that and .
We claim that is not a -abstraction of , and thus that is not a strong -abstraction of . To see why, note that both the low-level intervention and the empty intervention are mapped to the empty intervention by . Now suppose that we have a prior such that , , where , and , , are independent. Thus, . Applying condition (1) in Definition 3.1 to the empty intervention, we must have . On the other hand, . Since is the empty intervention, we must also have . This can’t happen unless , so is not a uniform -transformation of , and thus cannot be a -abstraction.
Now define so that the only allowable low-level interventions are ones where (i.e., we allow all interventions of the form where is one of the components of ; in particular, we do not allow the empty low-level intervention). Then clearly is a -abstraction of , where we have that .
3.4 From micro-variables to macro-variables
Roughly speaking, the intuition for clustering micro-variables into macro-variables is that in the high-level model, one variable captures the effect of a number of variables in the low-level model. This makes sense only if the low-level variables that are being clustered together “work the same way” as far as the allowable interventions go. The following definition makes this precise.
Definition 3.19
is a constructive -abstraction of if is a strong -abstraction of and, if , then there exists a partition of , where are nonempty, and mappings for such that ; that is, , where is the projection of onto the variables in , and is the concatenation operator on sequences. is a constructive abstraction of if it is a constructive -abstraction of for some .
In this definition, we can think of each as describing a set of microvariables that are mapped to a single macrovariable . The variables in (which might be empty) are ones that are marginalized away.
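The structural condition on the map in Definition 3.19 can be illustrated with a short sketch. It rests on assumptions made only for this illustration: low-level states are represented as dictionaries from variable names to values, and the helper `make_constructive_tau`, the variable names `X1` to `X4`, and the two cluster maps are hypothetical. The sketch builds the overall map by projecting the low-level state onto each cluster, applying that cluster's own map, and concatenating the results; variables that belong to no cluster are simply ignored, which corresponds to marginalizing them away. It does not check that the resulting map yields a strong abstraction, which the definition also requires.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

LowState = Dict[str, Hashable]
ClusterMap = Callable[[Tuple[Hashable, ...]], Hashable]


def make_constructive_tau(
    clusters: Sequence[Sequence[str]],
    cluster_maps: Sequence[ClusterMap],
) -> Callable[[LowState], Tuple[Hashable, ...]]:
    """Build a map that sends each cluster of micro-variables to one macro-variable value."""
    if len(clusters) != len(cluster_maps):
        raise ValueError("need exactly one map per cluster")
    seen = set()
    for cluster in clusters:
        if not cluster:
            raise ValueError("each cluster must be nonempty")
        if seen & set(cluster):
            raise ValueError("clusters must be pairwise disjoint")
        seen |= set(cluster)

    def tau(low_state: LowState) -> Tuple[Hashable, ...]:
        # Project onto each cluster, apply its map, and concatenate the results.
        return tuple(
            tau_i(tuple(low_state[name] for name in cluster))
            for cluster, tau_i in zip(clusters, cluster_maps)
        )

    return tau


# Hypothetical example: X1 and X2 are summed into one macro-variable,
# X3 is copied, and X4 belongs to no cluster, so it is marginalized away.
tau = make_constructive_tau(
    clusters=[("X1", "X2"), ("X3",)],
    cluster_maps=[sum, lambda z: z[0]],
)
high_state = tau({"X1": 1, "X2": 1, "X3": 0, "X4": 1})
```

Because `X4` is in no cluster, changing its value leaves `high_state` unchanged, while each macro-variable depends only on the micro-variables in its own cluster.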
By definition, every constructive -abstraction is a strong -abstraction. We conjecture that a converse to this also holds: that is, if is a strong -abstraction of , that perhaps satisfies a few minor technical conditions, then it will in fact be a constructive -abstraction of . However, we have not proved this result yet.
We suspect that constructive -abstractions are the notion of abstraction that will arise most often in practice. All three of the examples discussed by RW+ (one of which is Example 3.3) are constructive abstractions. We can easily extend Example 3.2 by adding low-level and high-level interventions to make it a constructive abstraction as well.
4 Discussion and Conclusions
We believe that getting a good notion of abstraction will be critical in allowing modelers to think at a high level while still being faithful to a more detailed model. As the analysis of this paper shows, there are different notions of abstraction that relate causal models at different levels of detail. For example, -abstraction is a relation between basic causal models, while a uniform -transformation relates causal models, and RW+’s notion of exact transformation relates probabilistic causal models. Although our final notion of constructive abstraction is the cleanest and arguably easiest to use, we believe that there exist applications for which the weaker abstraction relations are more appropriate. More work needs to be done to understand which abstraction relation is most suitable for a given application. We hope that the definitions proposed here will help clarify the relevant issues. They should also shed light on some of the recent discussions of higher-level causation in communities ranging from physics to philosophy (see, e.g., (?; ?)).
In fact, we see the current paper as laying the formal groundwork for several interesting topics that we intend to explore in future work. First, we hope to generalize the abstraction relation to a notion of approximate abstraction, given that in most real-life settings the mappings between different levels are only approximately correct. Second, our framework makes it possible to explore whether the notion of actual causation could be applied across causal models, rather than merely within a single causal model. For example, it seems to be useful to think of an event in a low-level model as causing an event in a high-level model. Third, abstracting causal models of large complexity into simpler causal models with only a few variables is of direct relevance to the increasing demand for explainable AI, for in many situations the problem lies not in the fact that no causal model is available, but in the fact that the only available model is too complicated for humans to understand.
Acknowledgments:
Halpern was supported in part by NSF grants IIS-1703846 and IIS-1718108, ARO grant W911NF-17-1-0592, and a grant from the Open Philanthropy project. Beckers was supported by the grant ERC-2013-CoG project REINS, nr. 616512. Some of this work was done while Beckers was a postdoc at Cornell University, supported by the Belgian American Educational Foundation. We thank Frederick Eberhardt, Marc Denecker, and the reviewers of the paper for many useful comments.
Appendix A Proofs
Proof of Lemma 2.4: Let and define as follows. has one exogenous variable for each endogenous variable in . Taking , we take to be the exogenous variable corresponding to . Let . We take for (so the set of possible values for each variable is the set of all contexts in ). If , we define . (Note that here .) Thus, it is clear that the only exogenous variable that the value of in depends on is , so has uev, as desired. places probability 0 on a context unless , and . It is almost immediate that, with these choices, .
Proof of Proposition 3.6: Suppose that is a uniform (-)-transformation of . Say that and correspond if for all interventions .
We claim that for all , there exists at least one that corresponds to . To see this, fix . Let give probability 1. Then for each intervention , the distribution gives probability 1 to . Let be a probability distribution such that is an exact (-)-transformation of . Since , it follows that gives probability 1 to , and hence also to the set . Since there are only countably many interventions in , also has probability 1, and thus must be nonempty. Choose . By construction, corresponds to .
Define by taking , where corresponds to . (If more than one tuple corresponds to , then one is chosen arbitrarily.) It is now straightforward to check that is an exact (-)-transformation of . We leave details to the reader.
Proof of Theorem 3.8: To show that (a) implies (b), suppose that is a uniform (-)-transformation of . Let be the function guaranteed to exist by Proposition 3.6. We must show that for all , ,
[TABLE]
Fix . From the construction of in the proof of Proposition 3.6, it follows that and correspond, which, by definition, means that for all interventions .
To show that (b) implies (a), suppose that (b) holds. Given a distribution on , let . It suffices to show that is an exact (-)-transformation of . Thus, we must show that for every intervention , we have Straightforward computations now show that
[TABLE]
as desired.
Proof of Theorem 3.14: This follows immediately from Theorem 3.8 once we show that is order-preserving (it is surjective by definition). Suppose that and . Thus is a subset of and is the corresponding subset of . Suppose that . We must show that .
By definition of , . So . But and ; therefore . It immediately follows that .
- [Banihashemi, de Giacomo, and Lespérance 2017] Banihashemi, B.; de Giacomo, G.; and Lespérance, Y. 2017. Abstraction in situation calculus action theories. In Proc. Thirty-First National Conference on Artificial Intelligence (AAAI ’17), 1048–1055.
- [Chalupka, Eberhardt, and Perona 2015] Chalupka, K.; Eberhardt, F.; and Perona, P. 2015. Visual causal feature learning. In Proc. 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2015), 181–190.
- [Chalupka, Eberhardt, and Perona 2016] Chalupka, K.; Eberhardt, F.; and Perona, P. 2016. Multi-level cause-effect systems. In Proc. 19th Int. Conf. on Artificial Intelligence and Statistics (AISTATS 2016), 361–369.
- [Fenton-Glynn 2017] Fenton-Glynn, L. 2017. Is there high-level causation? Ergo 4(30):845–898.
- [Halpern and Pearl 2005] Halpern, J. Y., and Pearl, J. 2005. Causes and explanations: a structural-model approach. Part I: Causes. British Journal for Philosophy of Science 56(4):843–887.
- [Halpern 2016] Halpern, J. Y. 2016. Actual Causality. Cambridge, MA: MIT Press.
- [Hoel, Albantakis, and Tononi 2013] Hoel, E. P.; Albantakis, L.; and Tononi, G. 2013. Quantifying causal emergence shows that macro can beat micro. Proc. National Academy of Science 110(49):19790–19795.
- [Iwasaki and Simon 1994] Iwasaki, Y., and Simon, H. A. 1994. Causality and model abstraction. Artificial Intelligence 67(1):143–194.
