Weak equivalence of local independence graphs
Søren Wengel Mogensen

TL;DR
This paper investigates the complexity of determining Markov equivalence in local independence graphs of multivariate stochastic processes, introduces weaker equivalence relations, and develops feasible algorithms for their analysis.
Contribution
It proves coNP-completeness of Markov equivalence decision and introduces a hierarchy of weak equivalence relations with practical algorithms and concise representations.
Findings
Deciding Markov equivalence of DMGs is coNP-complete.
Introduces weaker equivalence relations with feasible algorithms.
Provides hierarchical structure linking different equivalence levels.
Weak equivalence of local independence graphs
Søren Wengel Mogensen
(Department of Automatic Control, Lund University)
Abstract
Classical graphical modeling of multivariate random vectors uses graphs to encode conditional independence. In graphical modeling of multivariate stochastic processes, graphs may encode so-called local independence analogously. If some coordinate processes of the multivariate stochastic process are unobserved, the local independence graph of the observed coordinate processes is a directed mixed graph (DMG). Two DMGs may encode the same local independences in which case we say that they are Markov equivalent.
Markov equivalence is a central notion in graphical modeling. We show that deciding Markov equivalence of DMGs is coNP-complete, even under a sparsity assumption. As a remedy, we introduce a collection of equivalence relations on DMGs that are all less granular than Markov equivalence and we say that they are weak equivalence relations. This leads to feasible algorithms for naturally occurring computational problems related to weak equivalence of DMGs. The equivalence classes of a weak equivalence relation have attractive properties. In particular, each equivalence class has a greatest element which leads to a concise representation of an equivalence class. Moreover, these equivalence relations define a hierarchy of granularity in the graphical modeling which leads to simple and interpretable connections between equivalence relations corresponding to different levels of granularity.
1 Introduction
The distribution of a multivariate random vector, , induces an independence model, , which is simply the collection of triples, , such that and are conditionally independent given . Graphs are often used as convenient representations of such independence models (Lauritzen, 1996; Maathuis et al., 2019). The graphical theory reflects the fact that conditional independence is symmetric in and , i.e., if and only if . In graphical modeling of multivariate stochastic processes, it is useful to apply a notion of independence that distinguishes between past and present and for this purpose several authors have used local independence, analogously to how conditional independence is used in classical graphical modeling. However, local independence is not symmetric in the above sense and its graphical representation therefore requires a specialized framework. Local independence was first introduced by Schweder (1970) in composable Markov processes and later studied by Aalen (1987) in a broader class of stochastic processes. Didelez (2000, 2008) described graphical modeling of marked point processes based on local independence and Mogensen et al. (2018) extended this theory to Itô processes.
Graphs are said to be Markov equivalent if they represent the same independences, i.e., if they are indistinguishable when observing only the induced independences. Several characterizations of Markov equivalence are available in different classes of graphs representing classical conditional independence (Frydenberg, 1990; Verma and Pearl, 1990b; Spirtes and Verma, 1992; Andersson et al., 1997a, b; Richardson, 1997; Andersson et al., 2001; Zhao et al., 2005; Zhang, 2007; Ali et al., 2009). Mogensen and Hansen (2020) used directed mixed graphs as representations of local independences in partially observed stochastic processes and they characterized Markov equivalence in this class of graphs by proving that each equivalence class contains a greatest element. Their equivalence result also provided a simple approach to visualizing and understanding an entire equivalence class. Mogensen and Hansen (2022) characterized Markov equivalence of directed correlation graphs representing local independence in the presence of correlated noise processes. Recent work studied local independence testing in point processes (Thams and Hansen, 2021) and Christgau et al. (2022) described nonparametric tests of local independence. It is worth noting that local independence is a continuous-time version of discrete-time Granger causality which has been used in graphical models of time series (Eichler and Didelez, 2007, 2010; Eichler, 2012, 2013). The graphical theory of directed mixed graphs and the results in this paper may be applied in both continuous-time and discrete-time stochastic processes (Mogensen and Hansen, 2020, supplementary material).
In graphs representing classical conditional independence, several characterizations of Markov equivalence lead to polynomial-time algorithms for deciding Markov equivalence (e.g., Richardson, 1997; Ali et al., 2009). In the local independence framework, Mogensen and Hansen (2022) proved that deciding Markov equivalence of two directed correlation graphs is coNP-complete which means that we should not expect to find a polynomial-time algorithm in this case. In this paper, we show that deciding Markov equivalence of directed mixed graphs is also coNP-complete. We further show that assuming sparsity of the directed mixed graphs does not generally remedy this. Our results imply that several computational problems that occur naturally when using directed mixed graphs are also computationally hard. For this reason, Markov equivalence in partially observed local independence graphs may not always be a practical notion. Instead, we introduce a class of weak equivalence relations between local independence graphs. We characterize the corresponding equivalence classes and show that they too contain a greatest element. Mogensen and Hansen (2020) argued that the existence of a greatest element leads to a straightforward Markov equivalence theory. We extend this theory to the more general weak equivalences studied in this paper. This allows a simple representation of weak equivalence classes. A subset of the weak equivalence relations may be understood as creating a hierarchy of equivalence relations in which a parameter, , creates a trade-off between the size of the equivalence classes and the computational complexity, leading to a graphical theory which is both useful and practical. This hierarchy also illustrates interpretable connections between equivalence classes across different values of .
The paper is structured in the following way. In Section 2, we introduce necessary terminology and notation. We also describe global Markov properties that connect so-called $\mu$-separation in graphs to local independence and provide justification for using graphs as representations of local independence. Moreover, we give an example to illustrate the framework and purpose of the paper. In Section 3, we prove that deciding Markov equivalence of directed mixed graphs is computationally hard, even under sparsity restrictions, and we discuss the implications of this result. In Section 4, we introduce the notion of weak equivalence of graphs. We describe its properties and compare it with Markov equivalence. Section 5 proves that, under a regularity condition, every weak equivalence class has a greatest element. Using the main result from the previous section, Section 6 first describes a graph which concisely represents an entire equivalence class. It then describes a hierarchy of certain weak equivalence classes and how they represent different levels of granularity in their description of the underlying graphs. Section 7 discusses algorithmic aspects of weak equivalence, and in Section 8 we briefly outline how results from the previous sections relate to graphical structure learning. Section 9 provides a discussion of the results.
2 Local independence and graphs
The interest in $\mu$-separation arises from its connection to local independence as formalized through various global Markov properties. We start by defining local independence following the exposition in Christgau et al. (2022). We give the definition for counting processes, though it can be extended to other classes of stochastic processes (Didelez, 2008; Mogensen et al., 2018; Mogensen and Hansen, 2022).
We consider a multivariate counting process, , on a probability space, , and we assume that is observed over some interval . We let denote the set . We use to denote the right-continuous and complete filtration generated by . One can think of as consisting of the information in the coordinate processes in up until time point . For and , we assume that has a -intensity, . The stochastic process is -predictable and is a local -martingale.
Definition 2.1 (Local independence).
Let and let . We say that is locally independent of given (or simply, that is locally independent of given ) if the local -martingale as defined above is also a local -martingale. For , we say that is locally independent of given if is locally independent of given for all and , and we denote this by .
Christgau et al. (2022) use the term conditional local independence instead of local independence which highlights the fact that Definition 2.1 is analogous to classical conditional independence of random variables. Intuitively, when is locally independent of given , observation of the -process over the interval does not provide additional information other than that contained in when trying to predict if there will be an event in process in the interval .
Local independence was first used by Schweder (1970) in composable Markov processes and later studied by Aalen (1987). Didelez (2000, 2008) described graphical modeling based on local independence. Other work on local independence Markov properties goes into more detail (Didelez, 2000, 2008; Mogensen et al., 2018; Mogensen and Hansen, 2022).
Definition 2.2 (Local independence graph).
We consider a multivariate counting process, , as above, . Its local independence graph is the directed graph, , on nodes such that
[TABLE]
for where indicates the absence of the directed edge from to .
The statement denotes that is locally independent of given , and above we have simply written the singletons and as and , respectively. The implication from left to right in Definition 2.2 is known as the pairwise Markov property. When this property holds, we see that the absence of an edge implies a local independence. The global Markov property allows one to read off more general local independences from a local independence graph using $\delta$- or $\mu$-separation (Definition 2.5). This is similar to other classes of graphical models (Maathuis et al., 2019). Several results state conditions for the equivalence of pairwise and global Markov properties (Didelez, 2008; Mogensen et al., 2018).
Local independence is a continuous-time analogue of Granger causality in discrete-time stochastic processes. The results of this paper also apply to Granger-causal graphs; see, e.g., the supplementary material of Mogensen and Hansen (2020) and Eichler (2007).
2.1 Alarm network
We describe an example application based on modeling how alarms propagate through a complex industrial system. Example data is in Figure 2. In this industrial system, a number of process variables (e.g., temperatures and pressures) are measured repeatedly. Each process variable corresponds to an alarm process, and if a measured process is outside the normal range of operations an event occurs in the corresponding alarm process. The stochastic system is described by a -dimensional counting process, ,
[TABLE]
observed over the interval . The coordinate processes in are alarm processes. Process represents exogenous events that feed into the system, e.g., changes in operating conditions, and this process is unobserved. Process is an alarm process, but unavailable for some reason, and the observed processes are those in . We assume that is a local independence graph in the sense of Definition 2.2. Under some regularity conditions, this implies that the global Markov property is satisfied in this graph (Didelez, 2008) and therefore $\mu$-separation (Definition 2.5) in the graph implies local independence.
The graph in Figure 1 (the latent projection of , see Section C) represents the observable local independences in the sense that for it holds that is $\mu$-separated from given in if and only if is $\mu$-separated from given in . The underlying graph of the full system, , is a directed graph while the latent projection is a directed mixed graph. In general, this larger class of graphs is needed to represent the local independences of partially observed multivariate stochastic processes.
Local independence asks the following question: if we are to predict whether processes will have an event in the immediate future and we have the information in the past of processes will the information in the past of processes add anything? This is illustrated visually in Figure 2 with , , and . In this specific example, is $\mu$-separated from given in and under the global Markov property this implies that the corresponding local independence holds. Therefore, the information in the past of process is superfluous when already accounting for the information in the past of processes .
Several directed mixed graphs may induce the same $\mu$-separations which means that they represent the same local independences. In this case, we say that they are Markov equivalent. The graph on the right in Figure 1 is the directed mixed equivalence graph of . It represents the entire Markov equivalence class by indicating if an edge is in every Markov equivalent graph (solid), in no Markov equivalent graph (absent), or in only some Markov equivalent graphs (dashed). This is a useful representation, but it may not be a practical one for all applications as it leads to computationally hard problems. In this paper, we trade away some of the expressive power of Markov equivalence to obtain a more feasible notion of equivalence and we show that weaker notions of equivalence remain easily interpretable.
2.2 Graphs
A graph is a pair where is a finite node set and is an edge set. The edge set is a disjoint union, , where is a set of ordered pairs, corresponding to directed edges, , and is a set of unordered pairs, corresponding to bidirected edges, . We use to denote that there is a bidirected edge between and in the graph , or just when it is clear from the context to which graph the statement refers, and we use and analogously. The definition of the node set implies that we allow multiple edges between a pair of nodes; however, the set of edges between two nodes and is always a subset of . Moreover, and are equivalent while and are different edges. We emphasize that the edge is not shorthand for the two edges and , and the meaning of the bidirected edge is different from that of the two directed edges. This will be clear from subsequent definitions.
We use to denote a generic edge of either type between and , and we say that and are adjacent in when there exists an edge between them, . When there are multiple nodes on each side of the edge, , this means that for all and . We separate such statements by semicolons, ; . We use to mean that or . We say that edges and have a head at , and that has a tail at . If an edge is between and and , we say that is a loop.
We use as a generic node set and let denote the cardinality of , . The graphs described above are directed mixed graphs as formalized in the next definition.
Definition 2.3 (Directed mixed graph (DMG)).
We say that is a directed mixed graph if its edge set, , consists of directed and bidirected edges.
We say that a DMG is a directed graph (DG) if it has no bidirected edges. A walk between and is an alternating sequence of nodes, and edges
[TABLE]
such that for each , is between and . Let denote the edge above. We will sometimes write a walk as . A walk also specifies an orientation for each edge as one can otherwise not distinguish between and . We say that , , is a collider if and both have head at . Otherwise, we say that it is a noncollider. A node may be repeated on a walk, , , and may therefore occur both as a collider and as a noncollider on the same walk. Thus, the property of being a collider/noncollider pertains to the specific instance of the node on the walk. We say that and are endpoints of the walk. Note that endpoints of a walk are neither colliders nor noncolliders. We say that a walk is nontrivial if it has at least one edge. A walk on which no node is repeated is a path.
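As a minimal sketch of the collider/noncollider distinction, the following Python function labels each non-endpoint node instance on a walk. The encoding is an assumption made for illustration only: a walk is a list of nodes plus one mark per edge (`"->"`, `"<-"`, or `"<->"`, read along the walk), and the name `classify_walk` is hypothetical.

```python
def classify_walk(nodes, steps):
    """Label every non-endpoint node instance on a walk.

    nodes: the node sequence of the walk, e.g. ["a", "b", "b", "c"].
    steps: one mark per edge, read left to right along the walk:
           "->"  head at the later node, "<-" head at the earlier node,
           "<->" bidirected edge (heads at both nodes).
    This encoding is an illustrative assumption, not notation from the paper.
    """
    if len(nodes) != len(steps) + 1:
        raise ValueError("a walk with m edges has m + 1 node instances")
    labels = []
    for i in range(1, len(nodes) - 1):  # endpoints are neither colliders nor noncolliders
        head_from_left = steps[i - 1] in ("->", "<->")
        head_from_right = steps[i] in ("<-", "<->")
        kind = "collider" if head_from_left and head_from_right else "noncollider"
        labels.append((i, nodes[i], kind))
    return labels


# The walk a -> b <-> b -> c uses a bidirected loop at b.
walk_labels = classify_walk(["a", "b", "b", "c"], ["->", "<->", "->"])
```

In this walk the node `b` is a collider in its first instance and a noncollider in its second, which is why the label is attached to a position on the walk rather than to the node itself.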
Let . When is an edge we use to denote the graph , and we use to denote the graph . We say that is complete if it contains ; , and for all , and we say that is empty if . We say that a walk between and is directed from to if every edge on the walk is directed and points towards (the last) , . We say that is an ancestor of in if there exists a directed walk from to , and we allow this walk to be trivial (no edges) meaning that a node is always an ancestor of itself. We define , or simply , to be the set of ancestors of , and for we define . Note that .
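The ancestor set defined above can be computed by a backward search along directed edges. The sketch below assumes, for illustration, that directed edges are stored as pairs `(a, b)` meaning a directed edge from `a` to `b`; bidirected edges play no role in ancestry and are therefore omitted. The function name `ancestors` is hypothetical.

```python
def ancestors(directed, targets):
    """Return all nodes with a (possibly trivial) directed walk to a node in targets.

    directed: iterable of pairs (a, b), each encoding a directed edge a -> b
              (an assumed encoding for this sketch).
    targets:  iterable of nodes.
    """
    parents = {}
    for a, b in directed:
        parents.setdefault(b, set()).add(a)
    found = set(targets)   # trivial walks: every node is an ancestor of itself
    stack = list(found)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


example_edges = {("a", "b"), ("b", "c"), ("d", "c")}
an_b = ancestors(example_edges, {"b"})
```

Because trivial walks are allowed, the returned set always contains the target nodes themselves.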
Definition 2.4 ($\mu$-connecting walk).
We say that a nontrivial walk in a DMG, ,
[TABLE]
is $\mu$-connecting from to given if , the edge has a head at , every collider is in and no noncollider is in .
The $\mu$-connecting walks are used in the definition of $\mu$-separation below which will help us connect DMGs to local independence. Mogensen et al. (2018) and Mogensen and Hansen (2020) defined $\mu$-separation as an extension of $\delta$-separation (Didelez, 2000, 2008). One can think of $\delta$- and $\mu$-separation as analogous to $d$- and $m$-separation in DAG-based graphical models (Pearl, 2009; Richardson and Spirtes, 2002; Richardson, 2003).
Definition 2.5 ($\mu$-separation).
Let and let . We say that is $\mu$-separated from given in if there is no $\mu$-connecting walk from any to any given . We write this as , or simply . We say that is a conditioning set.
By definition, is $\mu$-separated from given if . One should also note that $\mu$-separation is not symmetric in and in that does not imply , and neither is local independence. This lack of symmetry sets the graphical modeling of local independence apart from the classical graphical modeling of conditional independence (Lauritzen, 1996). In contrast to $m$-separation, $\mu$-separation cannot be characterized using only paths (Mogensen and Hansen, 2020). It is, however, possible to obtain a characterization using only routes which are a finite subset of all possible walks (see Definition D.1 in Appendix D or Mogensen and Hansen (2020)). The next example illustrates the concept of $\mu$-connecting walks and $\mu$-separation in a DMG.
Example 2.6.
We consider the DMG, , in Figure 3. The walk is $\mu$-connecting from to given . It is not $\mu$-connecting from to given as is a noncollider. On the walk the node is a collider in its first instance and a noncollider in its second. The walk is $\mu$-connecting from to given , however, the reverse walk, is not $\mu$-connecting from to given .
We see that is $\mu$-separated from given in . On the other hand, is not $\mu$-separated from given as the walk is $\mu$-connecting.
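Definitions 2.4 and 2.5 can be checked mechanically on small graphs. The sketch below is one possible approach and is not taken from the paper: it searches over states (current node, whether the last edge has a head at that node), which suffices because the collider and noncollider conditions only depend on the two edges meeting at a node instance. The edge encoding (pairs `(a, b)` for a directed edge from `a` to `b` and for a bidirected edge between `a` and `b`) and all function names are assumptions made for illustration.

```python
from collections import deque


def ancestors(directed, targets):
    """Nodes with a (possibly trivial) directed walk to some node in targets."""
    parents = {}
    for a, b in directed:                       # (a, b) encodes a -> b
        parents.setdefault(b, set()).add(a)
    found, stack = set(targets), list(targets)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def traversals(directed, bidirected):
    """Map u to triples (v, head_at_u, head_at_v), one per way of stepping from u to v."""
    out = {}
    for a, b in directed:
        out.setdefault(a, []).append((b, False, True))   # along a -> b
        out.setdefault(b, []).append((a, True, False))   # against a -> b
    for a, b in bidirected:                              # (a, b) encodes a <-> b
        out.setdefault(a, []).append((b, True, True))
        if a != b:
            out.setdefault(b, []).append((a, True, True))
    return out


def mu_separated(directed, bidirected, A, B, C):
    """True if B is mu-separated from A given C, i.e. no mu-connecting walk exists."""
    B, C = set(B), set(C)
    an_C = ancestors(directed, C)
    steps = traversals(directed, bidirected)
    for alpha in set(A) - C:                    # walks starting inside C never connect
        seen = {(v, head_v) for v, _, head_v in steps.get(alpha, [])}
        queue = deque(seen)
        while queue:
            v, head_in = queue.popleft()
            if head_in and v in B:              # last edge has a head at an endpoint in B
                return False
            for w, head_out, head_w in steps.get(v, []):
                collider = head_in and head_out
                blocked = (v not in an_C) if collider else (v in C)
                if not blocked and (w, head_w) not in seen:
                    seen.add((w, head_w))
                    queue.append((w, head_w))
    return True


# Toy graph (not the graph of Figure 3): a -> b and b <-> c.
toy_directed = {("a", "b")}
toy_bidirected = {("b", "c")}
sep_given_empty = mu_separated(toy_directed, toy_bidirected, {"a"}, {"c"}, set())
sep_given_b = mu_separated(toy_directed, toy_bidirected, {"a"}, {"c"}, {"b"})
```

On this toy graph, `c` is separated from `a` given the empty set because `b` is a collider outside the ancestors of the conditioning set, and conditioning on `b` opens the walk. The same function also exhibits the asymmetry noted above: with the single edge from `a` to `b`, `b` is not separated from `a` given the empty set, while `a` is separated from `b`.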
2.3 Independence models and Markov equivalence
For a fixed stochastic process, , and a DMG, , both local independence and $\mu$-separation can be thought of as ternary relations on a finite set where and denotes power set. We use to denote and we define an abstract independence model, , to be a subset of . Thus, is a collection of triples such that . We say that is an independence model over . When , or are singletons, we will often omit the set notation and write, e.g., instead of .
We use to denote the independence model induced by , that is, the set of $\mu$-separations that are true in , . Similarly, an independence model can be defined as the set of local independences that hold in the distribution of a multivariate stochastic process. We say that an independence model, , is graphical, if there exists a DMG, , such that .
Definition 2.7 (Markov equivalence).
Let and be DMGs. We say that and are Markov equivalent if for all it holds that is $\mu$-separated from given in if and only if is $\mu$-separated from given in . Equivalently, and are Markov equivalent if . We use to denote the Markov equivalence class of .
Example 2.8.
We return to the graph, , in Figure 3. By definition, its independence model, , consists of all triples such that is $\mu$-separated from given in . It is enough to consider such that and are singletons and as these characterize (Proposition 4.11). We see that is $\mu$-separated from given , and this is the only $\mu$-separation of this type in the graph.
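For very small graphs, Markov equivalence in the sense of Definition 2.7 can be checked by brute force. The sketch below enumerates every conditioning set and every ordered pair of single nodes, which suffices because Definition 2.5 defines separation of sets through walks between single nodes. It is exponential in the number of nodes and is meant only as an illustration; the edge encoding (pairs `(a, b)`) and all names are hypothetical choices, not the paper's algorithm.

```python
from itertools import combinations


def ancestors(directed, targets):
    """Nodes with a (possibly trivial) directed walk to some node in targets."""
    parents = {}
    for a, b in directed:                       # (a, b) encodes a -> b
        parents.setdefault(b, set()).add(a)
    found, stack = set(targets), list(targets)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def traversals(directed, bidirected):
    """Map u to triples (v, head_at_u, head_at_v), one per way of stepping from u to v."""
    out = {}
    for a, b in directed:
        out.setdefault(a, []).append((b, False, True))
        out.setdefault(b, []).append((a, True, False))
    for a, b in bidirected:                     # (a, b) encodes a <-> b
        out.setdefault(a, []).append((b, True, True))
        if a != b:
            out.setdefault(b, []).append((a, True, True))
    return out


def mu_reachable(directed, bidirected, alpha, C):
    """Nodes beta such that a mu-connecting walk from alpha to beta given C exists."""
    if alpha in C:
        return set()
    an_C = ancestors(directed, C)
    steps = traversals(directed, bidirected)
    seen = {(v, head_v) for v, _, head_v in steps.get(alpha, [])}
    stack = list(seen)
    while stack:
        v, head_in = stack.pop()
        for w, head_out, head_w in steps.get(v, []):
            collider = head_in and head_out
            blocked = (v not in an_C) if collider else (v in C)
            if not blocked and (w, head_w) not in seen:
                seen.add((w, head_w))
                stack.append((w, head_w))
    return {v for v, head_in in seen if head_in}


def independence_model(nodes, directed, bidirected):
    """All triples (alpha, beta, C) with beta mu-separated from alpha given C."""
    nodes = sorted(nodes)
    model = set()
    for size in range(len(nodes) + 1):
        for C in combinations(nodes, size):     # exponentially many conditioning sets
            for alpha in nodes:
                reach = mu_reachable(directed, bidirected, alpha, set(C))
                model |= {(alpha, beta, C) for beta in nodes if beta not in reach}
    return model


def markov_equivalent(nodes, g1, g2):
    """Each graph is a pair (directed edges, bidirected edges) on the same node set."""
    return independence_model(nodes, *g1) == independence_model(nodes, *g2)


toy_nodes = {"a", "b"}
g1 = ({("a", "b"), ("b", "b")}, set())          # a -> b and a directed loop at b
g2 = ({("a", "b"), ("b", "b")}, {("b", "b")})   # g1 plus a bidirected loop at b
g3 = ({("a", "b"), ("b", "b")}, {("a", "b")})   # g1 plus a <-> b
same_12 = markov_equivalent(toy_nodes, g1, g2)
same_13 = markov_equivalent(toy_nodes, g1, g3)
```

In this toy case, adding the bidirected loop at `b` changes no separation, while adding the bidirected edge between `a` and `b` creates a walk with a head at `a` and therefore changes the independence model.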
2.3.1 Extremal elements of sets of DMGs
Let be a set of DMGs on a common node set, . If , we write , and we say that is a subgraph of , and that is a supergraph of . We write when and . The following definitions are common set-theoretic notions when considering the set with the partial order, .
Definition 2.9 (Maximal element, DMG).
We say that is a maximal element of if there is no , , such that .
Definition 2.10 (Greatest element, DMG).
We say that is a greatest element of if for all .
When a greatest element exists, it is unique. It is also maximal, and it is the only maximal element. In this paper, we are mostly concerned with maximal and greatest elements, however, we also define minimal and least elements of sets of DMGs. We say that is a minimal element of if there is no , , such that . We say that is a least element of if for all . The set will most often be an equivalence class in our usage of the above terms, and we sometimes simply say that is a maximal/minimal/greatest/least element when the equivalence class is understood from the context.
Example 2.11.
If we consider the set of graphs in Figure 4, we see that graph is the greatest element of as every graph in is a subgraph of , and therefore is also the unique maximal element of . The smaller set does not have a greatest element and graphs and are maximal elements of .
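The difference between maximal and greatest elements can be made concrete with graphs on a common node set encoded as sets of edges, so that the subgraph order is set inclusion. The sketch below uses hypothetical edge labels and hypothetical function names; it does not reproduce the graphs of Figure 4.

```python
def maximal_elements(graphs):
    """Graphs that are not a proper subgraph of any other graph in the collection."""
    return [g for g in graphs if not any(g < h for h in graphs)]


def greatest_element(graphs):
    """The graph containing every other graph as a subgraph, or None if there is none."""
    for g in graphs:
        if all(h <= g for h in graphs):
            return g
    return None


# Each graph is a frozenset of edges; an edge is an illustrative triple (node, mark, node).
# For a bidirected edge, a fixed node order is assumed so that equal edges compare equal.
e1 = ("a", "->", "b")
e2 = ("a", "<->", "b")
full = frozenset({e1, e2})
collection = [frozenset(), frozenset({e1}), frozenset({e2}), full]
smaller = collection[:3]                 # the same collection without its greatest element

top = greatest_element(collection)
top_of_smaller = greatest_element(smaller)
maxima_of_smaller = maximal_elements(smaller)
```

The full collection has a greatest element, which is then also its unique maximal element. After removing it, two incomparable maximal elements remain and no greatest element exists.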
2.3.2 Representation of Markov equivalence classes
We introduce a central result from Mogensen and Hansen (2020). They show that every Markov equivalence class has a greatest element. Section 5 extends this theorem to weak equivalence relations.
Theorem 2.12 (Greatest element of a Markov equivalence class; Mogensen and Hansen, 2020).
Let be a DMG, and let be its Markov equivalence class. There exists such that for all the edge set of is a subset of the edge set of .
The next example illustrates the utility of this theorem.
Example 2.13.
Graphs - in Figure 4 constitute a Markov equivalence class, (for simplicity, we assume that all loops are present, and do not consider Markov equivalent graphs obtained by removing loops). Graph is the greatest element of in the sense that all Markov equivalent graphs are subgraphs of graph . In other words, if a graph in the Markov equivalence class contains the edge , then is also in the graph . This means that we can represent the entire Markov equivalence class using graph E. The edges are the same as in the greatest element. Edges are solid in graph if they are in every Markov equivalent graph and they are dashed if they are in some Markov equivalent graphs, but not in others. Absent edges are not in any graph in the Markov equivalence class. Therefore, graph represents a summary of the information the Markov equivalence class provides on each edge. Moreover, Theorem 2.12 implies that every Markov equivalence class contains a greatest element, and therefore this is a general approach to representing and understanding Markov equivalence classes (Mogensen and Hansen, 2020).
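The solid/dashed/absent summary described in this example can be computed directly once the members of an equivalence class are known. The sketch below assumes the class is given explicitly as a list of edge sets, which is only feasible for small examples, and checks that the union of all edge sets is itself a member, i.e., that the class has a greatest element. The edge labels and the function name `summarize_class` are hypothetical.

```python
def summarize_class(graphs):
    """Classify edges of an explicitly listed equivalence class.

    graphs: non-empty list of frozensets of edges on a common node set.
    Returns the edges present in every graph ("solid") and the edges present
    in some but not all graphs ("dashed"); all other edges are absent.
    """
    union = frozenset().union(*graphs)
    common = frozenset.intersection(*graphs)
    if union not in set(graphs):
        raise ValueError("the collection has no greatest element")
    return {"solid": common, "dashed": union - common}


# Illustrative edges only; these are not the graphs of Figure 4.
e1, e2, e3 = ("a", "->", "b"), ("a", "<->", "b"), ("b", "->", "a")
members = [frozenset({e1, e2, e3}), frozenset({e1, e2}), frozenset({e1, e3})]
summary = summarize_class(members)
```

Here `e1` is in every member and would be drawn solid, while `e2` and `e3` would be drawn dashed.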
3 Hardness of marginalized local independence graphs
In this section, we argue that certain computational problems in relation to DMGs and Markov equivalence are hard. For this purpose, we give a very short introduction to the concepts from complexity theory that we will need. A decision problem is in coNP if no-instances have certificates which can be evaluated in polynomial time. For instance, if and are not Markov equivalent (they are a no-instance when deciding Markov equivalence) a triple such that is $\mu$-separated from given in , but not in , may function as a certificate as one can check this specific separation in both graphs and conclude that they are not Markov equivalent. A decision problem is in P if it can be solved by a deterministic Turing machine in polynomial time. A decision problem is coNP-hard if it is at least as hard as any problem in coNP, and it is coNP-complete if it is coNP-hard and in coNP. It is generally believed that P $\neq$ coNP, in which case there is no polynomial-time algorithm which can solve a coNP-hard problem. The complement of a decision problem arises from interchanging yes and no. A decision problem is in coNP if and only if its complement is in NP. We now introduce some decision problems relating to DMGs.
Decision problem 3.1 (Markov equivalence in DMGs).
Let and be DMGs. Are and Markov equivalent?
The development in this paper is partly motivated by the fact that the above decision problem is hard (Corollary 3.3). We can formulate a restricted version of the problem in which the pair of graphs for which to decide Markov equivalence only differ by a single (bidirected or directed) edge, as formalized in Decision problems A.1 (bidirected) and A.2 (directed). These problems are also hard and we prove this in Theorem 3.2. Corollary 3.3 follows immediately from this theorem.
Theorem 3.2.
Let be a DMG and let denote an edge. Deciding Markov equivalence of and is coNP-complete (Decision problems A.1 and A.2).
Corollary 3.3.
Deciding Markov equivalence of DMGs is coNP-complete (Decision problem 3.1).
Decision problem A.1 has been proven to be coNP-complete (PhD thesis, Mogensen (2020b)) and this was used to obtain the result in Corollary 3.3. We will give a slightly different proof to make the generalization to the proof in the sparse setting more transparent and to also prove that Decision problem A.2 is coNP-complete. The graphs , , and used in the proof of Theorem 3.2 are clearly not sparse, that is, for the size of the node set going to infinity there are nodes with unbounded connectivity (formal definitions of node connectivity are in Subsection 3.1 and Section B). In the next section, we will show that the hardness results remain true under certain sparsity assumptions. We include the proof of the non-sparse result in Theorem 3.2 to illustrate the technique as the more general result can be proved using a similar approach, even if some additional ideas are needed.
Mogensen and Hansen (2022) showed that deciding $\mu$-separation Markov equivalence of so-called directed correlation graphs (cDGs) is coNP-complete, though only in the non-sparse case. Their proof of coNP-hardness uses a reduction from 3DNF tautology as does the proof of Theorem 3.2. However, their proof is specific to cDGs as it uses a characterization of Markov equivalence which holds in cDGs, but not in DMGs (Mogensen and Hansen, 2022). While a DMG represents the local independences of a partially observed multivariate stochastic process, i.e., some coordinate processes are unobserved, a cDG represents a multivariate stochastic process driven by correlated noise. Mogensen and Hansen (2022) compared DMGs and cDGs further and showed that a Markov equivalence class of cDGs need not have a greatest element.
Proof.
We consider Boolean variables, , and a Boolean formula, ,
[TABLE]
such that is a literal of a variable, that is, either (a positive literal) or (a negative literal). We assume to be in 3DNF (each conjunction has at most three literals). is the number of conjunctions in the formula and is the number of variables. We define to be the number of factors in the ’th conjunction. Deciding whether is a tautology (evaluates to true for all inputs) is known to be coNP-complete (Garey and Johnson, 1979) and we will use a reduction from this problem to show coNP-hardness of Decision problems A.1 and A.2.
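To make the source problem of the reduction concrete, the following sketch decides 3DNF tautology by exhaustive search over truth assignments. It is exponential in the number of variables, which is consistent with the hardness of the problem, and it is not part of the proof. The encoding is an assumption for illustration: a formula is a list of conjunctions, and each literal is a pair `(variable_index, is_positive)`; all function names are hypothetical.

```python
from itertools import product


def evaluate_dnf(formula, assignment):
    """Evaluate a DNF formula: a disjunction of conjunctions of literals.

    formula:    list of conjunctions; each conjunction is a list of literals
                (variable_index, is_positive).
    assignment: sequence of booleans indexed by variable.
    """
    return any(
        all(assignment[var] == positive for var, positive in conjunction)
        for conjunction in formula
    )


def find_falsifying_assignment(formula, n_vars):
    """Return an assignment under which the formula is false, or None if it is a tautology."""
    for values in product([False, True], repeat=n_vars):
        if not evaluate_dnf(formula, values):
            return values
    return None


def is_tautology(formula, n_vars):
    return find_falsifying_assignment(formula, n_vars) is None


# x0 OR (NOT x0) is a tautology.
excluded_middle = [[(0, True)], [(0, False)]]
# (x0 AND x1) OR (NOT x0) is false when x0 is true and x1 is false.
not_tautology = [[(0, True), (1, True)], [(0, False)]]

counterexample = find_falsifying_assignment(not_tautology, 2)
```

A falsifying assignment plays the role of a certificate for a no-instance: it can be verified in polynomial time by a single call to `evaluate_dnf`, which is what membership in coNP requires.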
We construct three graphs, , , and from such that and where is a bidirected edge and is a directed edge. We then show that and are Markov equivalent if and only if is a tautology and that and are Markov equivalent if and only if is a tautology.
First, we define the set .
[TABLE]
We define the node set and is the node set of all three graphs , , and . Note that each literal, , corresponds to two nodes, and .
We now define the edge set . We add ; ; ; . For each node , we add edges and . We also add edges ; . We add edges and for each . We also add all directed and bidirected loops, , for all . We add edges ; ; , and ; as well as . For each , we add and . We add and . For each , we add . Finally, we add for each a directed cycle containing as well as every and corresponding to a positive literal of the variable , and we add a directed cycle containing as well as every and corresponding to a negative literal of the variable . This defines the edge set , . We obtain from by adding the edge , that is, . Note that is an ancestor of in if and only if is an ancestor of in . We obtain from by adding the edge , .
We will first argue that and are Markov equivalent if and only if is a tautology. Assume first that is a tautology and consider a $\mu$-connecting walk in ,
[TABLE]
Using the fact that all loops are included, we can always find a $\mu$-connecting walk such that the edge occurs at most once and we assume that this is the case. We can assume that only occurs once on the walk. If , there is a $\mu$-connecting walk from to with a head at : If , or for some , either or is connecting and can be composed with the subwalk from to to obtain a connecting walk in . If or for some , then is in . Assume instead that ,
[TABLE]
and consider the subwalk from to , . If there is a noncollider on , say , then and . We use this to argue that we can always find a walk from to such that when concatenated with the subwalk from to we obtain a $\mu$-connecting walk from to . If , we can find a connecting walk from to with a head at by concatenating the subwalk from to with if and if . If for some , we can concatenate with or . If for some , we can concatenate with . If , then we can replace with to obtain a connecting walk in . If , we can concatenate with . If , we can concatenate with . Finally, is not possible as only occurs once on the original walk.
Assume now that is a collider walk. If it goes through a -segment, then the corresponding -segment is open (note that and are in a directed cycle and so are and ). If it goes through the --segment, then for each either or . Let if and otherwise. The formula is a tautology and therefore it evaluates to true under this assignment of truth values. Thus, there exists such that for . Assume first that is a positive literal corresponding to the variable . In this case, and , and therefore . Assume instead that is a negative literal corresponding to the variable . In this case, and which means that and . This means that the walk is open for some and this gives us a $\mu$-connecting walk from to in also in this case.
If instead
[TABLE]
then the same arguments hold.
On the other hand, say that is not a tautology, and consider an assignment, , of truth values such that evaluates to false. Define the set
[TABLE]
In , there is an open, bidirected walk from to through the - segment, and we see that is not -separated from given . On the other hand, consider a walk between and in . The first and last edges on a connecting walk from to given must be bidirected and as , this means that the walk must be a collider walk to be -connecting from to given , and it must go through . If corresponds to a positive literal and it is open (i.e., in ), then the corresponding variable is in and . If it corresponds to a negative literal and it is open, then the corresponding variable is [math] in and . This means that each segment must be closed in at least one node as the assignment evaluates to [math]. Therefore, is -separated from given in , and we conclude that and are Markov equivalent if and only if is a tautology.
We now show that and are Markov equivalent. Take any -connecting walk in . Any occurrence of can be replaced by either or , depending on whether . The resulting walk is present and connecting in . On the other hand, consider a -connecting walk from to given in . We start by removing all non-endpoint occurrences of . Say
[TABLE]
If , then can be replaced by . If or if , we can remove the cycle (we may need to concatenate with to obtain a -connecting walk after removing a cycle). If instead
[TABLE]
we do the same depending on (if then we concatenate the subwalk from to with ). This gives us a -connecting walk in such that is not a non-endpoint node. Finally, if is still on the walk , we must have and this edge can be substituted by . The resulting walk is present in . Every collider is different from and this means that it is in as well. Therefore, this walk is -connecting in . It follows that and are Markov equivalent (regardless of whether is a tautology). Therefore, is a tautology if and only if are Markov equivalent.
The reduction from 3DNF tautology to Markov equivalence of and (or of and ) is done in polynomial time in the number of conjunctions, and it follows that Decision problems A.1 and A.2 are coNP-hard. Given a triple , one can decide -separation in polynomial time. If two graphs are not Markov equivalent, then there exists a triple such that -separation holds in one and not in the other. This is a polynomially-sized certificate of non-equivalence, which means that these problems are in coNP and thus coNP-complete. ∎
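To make the source problem of the reduction concrete, the following is a minimal sketch of a brute-force tautology check for a formula in disjunctive normal form. The encoding (a literal as a pair of a variable name and a sign, a conjunction as a tuple of literals) and all function names are assumptions made for illustration and do not appear in the paper. The search enumerates every truth assignment, so it is exponential in the number of variables, and it returns a falsifying assignment when the formula is not a tautology.

```python
from itertools import product

# Hypothetical encoding: a literal is (variable_name, is_positive),
# a conjunction is a tuple of literals, and a DNF formula is a list
# of conjunctions (a disjunction of conjunctions).


def evaluate_dnf(formula, assignment):
    """Evaluate the DNF formula under a dict mapping variables to bools."""
    return any(
        all(assignment[var] == positive for var, positive in conjunction)
        for conjunction in formula
    )


def falsifying_assignment(formula):
    """Return an assignment under which the formula is false, or None."""
    variables = sorted({var for conjunction in formula for var, _ in conjunction})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not evaluate_dnf(formula, assignment):
            return assignment
    return None


def is_tautology(formula):
    """A formula is a tautology if no assignment falsifies it."""
    return falsifying_assignment(formula) is None


# (x and x and x) or (not x and not x and not x) is always true.
always_true = [(("x", True),) * 3, (("x", False),) * 3]
# (x and y and z) alone can be falsified.
single_term = [(("x", True), ("y", True), ("z", True))]

print(is_tautology(always_true))
print(is_tautology(single_term))
```

A falsifying assignment plays the same role here as the separating triple does in the proof: it is a short witness that the answer is "no".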
Theorem 3.2 shows that deciding Markov equivalence is not computationally feasible for large graphs, which limits the practical applicability of -separation DMGs. We discuss the implications further in Subsection 3.2. We now consider the analogous decision problems in a sparse setting.
3.1 Sparse DMGs
We may ask if the hardness results still apply if we fix the maximal connectivity of each node and let the size of the node set grow. As a formalization of this, we first define a notion of node connectivity based on inseparability. We say that is inseparable from in if there is no such that is -separated from given in (Mogensen et al., 2018). We let denote the set of nodes such that is inseparable from in , and we let denote the set of nodes such that is inseparable from .
Definition 3.4** (Node connectivity in DMG).**
We define as the cardinality of the set and we define as the cardinality of the set . We define as the maximum of and .
We see that the above definitions are invariant under Markov equivalence, i.e., , , and when and are Markov equivalent. One can define other notions of node connectivity in a DMG, in particular based on the edges directly, instead of using separability. However, a DMG in which every node is adjacent with only a small number of nodes may be Markov equivalent with the complete DMG (see Figure 7). Even in a maximal DMG, the lack of an edge between a pair of nodes does not generally imply separability (Appendix B), and therefore connectivity based on separability appears to be a more useful notion of connectivity. Moreover, the graphs are intended as representations of stochastic systems, thus functional sparsity (i.e., sparsity in the implied dependence structure) seems more useful than representational sparsity (sparsity in node adjacency). Appendix B provides more details and examples.
Definition 3.5** (-sparsity).**
Let be a DMG. The maximal connectivity of is defined as . We say that is -sparse if .
We now state a sparse version of Decision problem 3.1.
Decision problem 3.6** (Markov equivalence in -sparse DMGs).**
Let be a nonnegative integer and let and be -sparse DMGs. Are and Markov equivalent?
The following are sparse versions of Theorem 3.2 and Corollary 3.3.
Theorem 3.7**.**
Let , let be an -sparse graph, and let denote an edge. Deciding Markov equivalence of and is coNP-complete (Decision problems A.3 and A.4).
Theorem 3.7 is a stronger version of Theorem 3.2 as it shows that the problem of deciding Markov equivalence of DMGs remains coNP-complete when restricting to sparse DMGs. We discuss the implications in Subsection 3.2.
Corollary 3.8**.**
Let . Deciding Markov equivalence of -sparse DMGs is coNP-complete.
The value may not be what we expect from ‘sparse’ graphical models and two comments are in order. First, the adjacency sparsity (see Section B) of the graphs in the proof is only , also in the maximal Markov equivalent graphs of the graphs used in the proof. Second, the upshot of the corollary is that there exists a finite number such that deciding Markov equivalence of -sparse DMGs is coNP-complete. This means that fixing the value of does not generally lead to computational problems that scale as polynomials in the size of the graph. On the other hand, the so-called -weak equivalences that are introduced in this paper provide polynomial-time algorithms for each fixed (Section 7). Note that results analogous to those of Theorems 3.2 and 3.7 do not hold for ADMGs with -separation. For those, polynomial-time algorithms for Markov equivalence are known, without making sparsity assumptions (Hu and Evans, 2020).
Proof.
We consider a Boolean formula in 3DNF form as in the proof of Theorem 3.2 (see that proof for related notation and terminology). We will define three -sparse graphs , , and and show that and are Markov equivalent if and only if is a tautology while and are always Markov equivalent.
We define to be the smallest integer such that . We first define a number of sets that will be subsets of the node set . Note that these sets are all pairwise disjoint.
[TABLE]
The node corresponds to the Boolean variable and the node corresponds to the negation of . Nodes and both correspond to the literal (see also the proof of Theorem 3.2 for additional explanation). We define
[TABLE]
We now define the node set as a disjoint union,
[TABLE]
We add some intuition on the construction of the graph. The - and -nodes (and their barred versions) are ‘triangular’ in shape and help connect a single node to many more in a sparse manner (see Figure 8). The - and -nodes correspond to literals in the conjunctions of the Boolean formula, . The elements of correspond to variables in , and the elements of to their negation. The - and -components will help connect every node to and to and are copies of the and sets in the sense that is a bijection from to , is a bijection from to , is a bijection from to , and is a bijection from to , though the edges are not exact copies as explained below.
We now define the edge set of . We add bidirected edges for , and analogously for , , and (see Figure 8). Moreover, we add ; ; ; for . We also add ; . We add . We also add and . We add and as well as . We add for each , and for .
For such that or , we add and if and only if was added above. For each , we also add and for each . We also add ; ; and . Note that , , , are not adjacent with . For such that or , we add and if and only if was added above. For each , we also add and for each .
In this proof, we will say that sets , and are line segments. We define
[TABLE]
and we say that is a vertical segment for . ‘Vertical’ refers to the specific visualization of used in Figure 8. The sets, , defined above are disjoint and .
We now add a number of directed edges. For every node , we add . For every node , we add . For each , we connect the nodes in the vertical segment by a directed cycle (any will work). We add directed cycles containing and all and such that is a positive literal of the variable . We add directed cycles containing and all and such that is a negative literal of the variable .
Finally, we add all directed and bidirected loops. The above defines the edge set and we let . Note that the nodes in a vertical segment are connected by a directed cyclic walk for . We also define where and where . Note that in all three graphs, if and and are in different vertical segments, and , respectively, then is bidirected and .
We will first show that and are Markov equivalent if and only if is a tautology. Assume first that is a tautology and consider a -connecting walk from to in ,
[TABLE]
Every node has a self-loop, so it suffices to consider walks where (the edge ) only occurs once. If it does not occur at all the walk is present in as well and connecting (ancestry is the same in and ). Say
[TABLE]
If , then we say that is the order of .
Lemma 3.9**.**
Let be of order . If there is an open walk from to given in or in then the ’th vertical segment , , contains at least one node in .
Proof.
If this is vacuously true as no vertical segment satisfies the condition, and we can assume that . Note that this walk must necessarily pass through a collider in each vertical segment such that which gives the result. To see this, note that removing any vertical segment such that gives us a disconnected graph with in one component and in the other as a vertical segment, , is only adjacent to vertical segments and . When a walk contains a subwalk such that is in and is in , then the connecting edge must be bidirected. If is a collider, we must have and is only an ancestor of nodes in . Otherwise, is an ancestor of a collider in and the same argument applies. ∎
Lemma 3.10**.**
Let be a node in . If there exists an open walk from to in with a head at , then there exists an open walk in with a head at such that every nonendpoint node equals for or for .
Proof.
If , this is immediate. Assume instead that . Choose first the edge if , and otherwise . We concatenate this with the open bidirected path to . Such a path exists as and . This is open since all vertical segments between and must contain at least one node which is in by Lemma 3.9.
If instead we can do as above as and are in the graph. If , then there is an open bidirected path with a head at between and . If or it follows directly. ∎
We split into cases depending on whether .
:
There is an open walk (given ) from with a head at (Lemma 3.10) that we can concatenate with to obtain a connecting walk in .
If instead
[TABLE]
the same argument holds.
:
If we have a subwalk between and with a noncollider, then we can find a connecting path in the following way. Say we have
[TABLE]
such that is a noncollider (note that, ignoring , only has bidirected edges at it, so if we remove -loops). There is necessarily a tail at on one of the adjacent edges, , and . We concatenate the subwalk from to with the open walk from to that has a head at . Lemma 3.10 gives the existence of this walk. This also holds if , , or .
On the other hand, if the subwalk between and has no noncolliders, then either it stays within a line segment or one of , , or occurs on the subwalk as a nonendpoint. We can assume that is only an endpoint. If occurs as a nonendpoint, then this is a collider and this means that there is an open subwalk from to with a head at which we can concatenate with . If is a collider (other than right before the final ), then we can remove the cycle from to from the walk. In any case, we can find a connecting collider walk in (no noncolliders) such that , , and will each occur once. This means that the subwalk only contains nodes from a single line segment. This segment cannot be , , , or as is not adjacent with any node in these line segments. If the walk only intersects the -line segment, then it must either go through -nodes or the -nodes, not both, as it has no noncolliders (or such a walk can be found). If it does not visit any - or -nodes, then there is an open walk in the -segment (the analogous walk through the barred versions). Finally, assume it does not visit any -nodes. As is a tautology, there is also a conjunction segment in which is open and connecting from to with a head at . If instead the bidirected walk is in , the result follows, and if and occur in the opposite order on the original -connecting walk, we can use similar arguments.
If the formula is not a tautology, let be an assignment of values such that the formula evaluates to false. We then consider the set
[TABLE]
We also define . We see immediately that is not -separated from given in as the -segment contains an open path from to with a head at and furthermore is in the graph. On the other hand, consider a potential -connecting walk from to in . If is on the walk, it can only return to . It cannot go between bidirected components because the directed cycles are either completely contained in or in its complement. It cannot go through a -component because of the choice of , and we conclude that it cannot be -connecting. In conclusion, and are Markov equivalent if and only if is a tautology.
The arguments in the proof of Theorem 3.2 show that and are Markov equivalent. Arguments similar to those in the proof of Theorem 3.2 furthermore show that Decision problems A.3 and A.4 are coNP-complete.
Careful examination of the graphs reveals that all three are -sparse. ∎
One should note that the graphs in the proof of Theorem 3.7 could also be interpreted as -separation graphs (Didelez, 2008). In this case, the result also holds, i.e., determining -separation Markov equivalence of sparse DMGs is also coNP-complete. To see this one should simply note that -separation Markov equivalence implies -separation Markov equivalence and that the conditioning set used in the proof when is not a tautology contains . The hardness result in the -separation case then follows from the (A.1) property of the supplementary material of Mogensen and Hansen (2020) and from noting that the latent projection technique can also be used for -separation.
Richardson (1997) studied DGs under -separation and gave an example of ‘nonlocality’ in this setting. The example consisted of a sequence of pairs of graphs, and , such that and are not Markov equivalent, but the only separation on which the graphs disagree involves nodes that are arbitrarily far apart (for increasing values of ). Our setting is quite different; however, DMGs under -separation do exhibit the same ‘nonlocality’, as seen from the proof of Theorem 3.7. Say that is not a tautology, in which case and in the proof of Theorem 3.7 are not Markov equivalent. From the proof, it follows that the graphs only disagree on triples such that and , and this means that the proof (for non-tautological of increasing size) gives a sequence of pairs of graphs that only disagree on -separation of a pair of nodes, and , that are arbitrarily far from each other as measured by the shortest path between and . Note that this also holds in the maximal Markov equivalent graphs of and , and it is therefore not due to non-maximality.
3.2 Implications of hardness results
The hardness results have several implications that we will outline in this section, in particular, we argue that several other computational problems are also hard in -separation DMGs.
Every Markov equivalence class has a greatest element (Mogensen and Hansen, 2020), and one can decide if two DMGs are Markov equivalent by computing the greatest Markov equivalent graph for each of them and comparing the results. This means that finding such a greatest element is also hard. There are similar implications for oracle learning algorithms. A (local independence) oracle is an abstract function which a learning algorithm may query and which, when provided with a triple , outputs whether the corresponding local independence holds or not. The oracle gives the correct answer, but when using real data, the oracle has to be replaced by hypothesis tests of local independence, and the purpose of the oracle formalism is simply to separate the algorithmic aspects from the hypothesis testing. If we assume that there exists a constraint-based learning algorithm which can recover a unique representative of the Markov equivalence class (say the greatest element, or some other uniquely defined representative) of the true graph from when given access to a local independence oracle, then using this algorithm, one can decide Markov equivalence by querying the -separation models of the graphs. This is done by testing -separation in the graph and each test is done in polynomial time (Mogensen, 2020b). If only a polynomial number of queries were required, we could also solve Markov equivalence in polynomial time by comparing the output for two graphs. Again, this means that such a learning algorithm would need a superpolynomial number of tests in the worst case, unless P = NP.
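The polynomial-time separation test that serves as the oracle can be sketched as a breadth-first search over pairs of a node and the edge mark with which a walk arrives at it. The sketch below assumes that the separation criterion is μ-separation in the sense of Mogensen and Hansen (2020): a nontrivial walk from a to b is connecting given a set if a is not in the set, the last edge has a head at b, every collider is an ancestor of the set, and no noncollider is in the set. The graph encoding and all function names are hypothetical and chosen for illustration only; this is not the algorithm of Mogensen (2020b).

```python
from collections import deque


def ancestors(directed, targets):
    """Nodes with a directed path into `targets` (targets included)."""
    parents = {}
    for tail, head in directed:
        parents.setdefault(head, set()).add(tail)
    seen, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def edge_moves(directed, bidirected):
    """Map each node to moves (mark at node, neighbour, mark at neighbour)."""
    moves = {}
    for tail, head in directed:  # tail -> head
        moves.setdefault(tail, []).append(("tail", head, "head"))
        moves.setdefault(head, []).append(("head", tail, "tail"))
    for u, v in bidirected:  # u <-> v
        moves.setdefault(u, []).append(("head", v, "head"))
        moves.setdefault(v, []).append(("head", u, "head"))
    return moves


def is_connected(graph, a, b, cond):
    """True if some walk from a to b is connecting given cond (assumed criterion)."""
    directed, bidirected = graph
    cond = set(cond)
    if a in cond:
        return False
    an_cond = ancestors(directed, cond)
    moves = edge_moves(directed, bidirected)
    start = (a, None)  # None marks the starting endpoint
    seen, queue = {start}, deque([start])
    while queue:
        node, arrived = queue.popleft()
        for mark_here, nxt, mark_there in moves.get(node, ()):
            if arrived is not None:  # node is a nonendpoint on this walk
                collider = arrived == "head" and mark_here == "head"
                if collider and node not in an_cond:
                    continue
                if not collider and node in cond:
                    continue
            if nxt == b and mark_there == "head":
                return True
            state = (nxt, mark_there)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def is_certificate(graph1, graph2, a, b, cond):
    """True if the two graphs disagree on separation of b from a given cond."""
    return is_connected(graph1, a, b, cond) != is_connected(graph2, a, b, cond)


# Graphs are (directed edges, bidirected edges); loops are omitted here.
chain = ([("a", "b"), ("b", "c")], [])
chain_plus = ([("a", "b"), ("b", "c"), ("a", "c")], [])
print(is_certificate(chain, chain_plus, "a", "c", {"b"}))  # True
```

The search visits at most two states per node, so a single query is cheap. The hardness lies in the number of triples that may need to be queried, not in any individual test.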
3.2.1 Sparse DMGs
All of the above holds even if we are willing to assume that all graphs are somewhat sparse (-sparse, ). This means that a restriction to sparse graphs will not remedy this. This is also different from DAG-based models in the following sense. In partially observed DAGs, we may learn a graphical representation of the equivalence class using tests of conditional independence. If we fix such that the node degree is less than , this can be done in polynomial time (Claassen et al., 2013).
These hardness results motivate the second part of this paper. Instead of requiring sparsity of the DMGs, we will reinterpret them to obtain a weaker type of equivalence. Essentially, the DMGs are too expressive, leading to the above infeasibility results in connection with their Markov equivalence classes. We can avoid this by considering a weaker type of equivalence. This leads to a simple and useful theory and to practical graph learning algorithms as we will see in subsequent sections.
4 Weak equivalence
In this section, we introduce a notion of weak equivalence and argue that it provides a computationally feasible notion of equivalence of DMGs. Under a regularity condition, the associated equivalence classes each have a greatest element and this leads to a simple graphical theory.
4.1 Classes of weak equivalence
We define three types of equivalence in this section and present them in decreasing order of generality. They each limit the set of triples, , that are used to distinguish between independence models represented by DMGs.
4.1.1 General weak equivalence
If and are Markov equivalent, then if and only if for all . This means that Markov equivalence requires the independence models of and to agree on all triples in the set . A very general approach to defining weaker notions of equivalence is to only compare the independence models on a subset of .
Definition 4.1** (General weak equivalence).**
Let . We say that and are -weakly equivalent if
[TABLE]
We use to denote the -weak independence model induced by , . We use to denote the -weak equivalence class of , that is, the set of graphs, , such that .
Proposition 4.2**.**
Let and let be a finite set. Definition 4.1 defines an equivalence relation on the set of DMGs with node set .
Proof.
Let be a DMG. We see that is -weakly equivalent with itself such that the relation is reflexive. The relation is also symmetric and transitive. ∎
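As a small illustration of comparing independence models on a subset of triples only, the sketch below represents an independence model as a plain set of triples of frozensets and restricts it before comparing. The representation, the function names, and the toy models are assumptions made for illustration; in practice the models would be computed from the graphs by a separation test.

```python
def triple(a, b, c):
    """Hypothetical encoding of a triple (A, B, C) as frozensets of node names."""
    return (frozenset(a), frozenset(b), frozenset(c))


def restrict(model, subset):
    """Keep only the triples of the independence model that lie in `subset`."""
    return set(model) & set(subset)


def weakly_equivalent(model1, model2, subset):
    """Two models are weakly equivalent on `subset` if their restrictions agree."""
    return restrict(model1, subset) == restrict(model2, subset)


t1 = triple("a", "b", "")   # empty conditioning set
t2 = triple("a", "b", "c")

model1 = {t1, t2}
model2 = {t1}

small = {t1}
large = {t1, t2}

print(weakly_equivalent(model1, model2, small))  # True
print(weakly_equivalent(model1, model2, large))  # False
```

Because the comparison is equality of restricted sets, reflexivity, symmetry, and transitivity are inherited directly from set equality. The example also shows that agreement on a larger set of triples implies agreement on any subset of it, while the converse fails.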
The next statement follows directly from the definition of weak equivalence.
Proposition 4.3**.**
Let and let be a DMG. It holds that .
A Markov equivalence class has a greatest element. However, a -weak equivalence class does not necessarily have a greatest element as illustrated by the following example.
Example 4.4**.**
We consider the graph, , in Figure 9 with all loops included as well. We define the set ,
[TABLE]
We also define three other graphs from , , where , and
[TABLE]
Graphs and are both -weakly equivalent with which can be seen from simply listing their -weak independence models.
We see that , but which means that and are not -weakly equivalent. We have that , and a greatest element of must be a supergraph of both and , and therefore of . If is a supergraph of , then , and we conclude that the -weak equivalence class of does not contain a greatest element.
Let . If two graphs are Markov equivalent, they are of course also equivalent when restricting to comparisons on the set . Therefore, every graph is also weakly equivalent with the unique, maximal graph of its Markov equivalence class. However, the above example shows that when considering a general -weak equivalence, an equivalence class need not have a greatest element as the maximal Markov equivalent graph need not be a greatest element of the larger weak equivalence class. This leads us to introducing the notion of a homogeneous weak equivalence by imposing a regularity condition on the set . The equivalence classes of a homogeneous weak equivalence relation do indeed contain a greatest element (Section 5).
4.1.2 Homogeneous weak equivalence
We define homogeneous equivalence relations to obtain well-behaved equivalence classes.
Definition 4.5** (Homogeneous equivalence).**
Consider some weak equivalence induced by . We say that this equivalence is homogeneous if there exists a set , , such that
[TABLE]
In this case, we will also say that the set is homogeneous and we will say that is the collection of conditioning sets of .
In other words, a homogeneous equivalence relation is one that restricts only the set of conditioning sets, . That is, if is homogeneous, then -weak equivalence of and means that for all and we have if and only if where is some collection of subsets of . Therefore, the restriction of the independence model imposed by a homogeneous only applies to the conditioning sets.
4.1.3 -weak equivalence
We will now introduce a certain type of homogeneous equivalence which simply restricts the size of the conditioning sets.
Definition 4.6** (-weak equivalence).**
Let . We say that and are -weakly equivalent if for all such that , it holds that if and only if .
The above is formulated slightly differently than Definitions 4.1 and 4.5; however, -weak equivalence is a homogeneous weak equivalence relation, as seen by using the set in Definition 4.5. On the other hand, not all homogeneous equivalences correspond to a -weak equivalence. We see that -weak equivalence only compares graphs using ‘small’ conditioning sets of size less than and that Markov equivalence is the same as -weak equivalence.
For , we use to denote the -weak independence model of , . We let denote the set of graphs on nodes that are -weakly equivalent with , and we say that is the -weak equivalence class of . When , we also use , that is, .
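The collection of conditioning sets behind a size-restricted weak equivalence can be enumerated directly. The sketch below assumes that the restriction is that the conditioning set has at most k elements, which is an inference from the surrounding text rather than a quoted definition, and the function name is hypothetical. For fixed k, the number of such sets is a sum of binomial coefficients and therefore polynomial in the number of nodes, whereas allowing all sizes gives every subset.

```python
from itertools import combinations
from math import comb


def conditioning_sets(nodes, k):
    """All subsets of `nodes` with at most k elements (assumed size bound)."""
    nodes = sorted(nodes)
    largest = min(k, len(nodes))
    return [
        frozenset(subset)
        for size in range(largest + 1)
        for subset in combinations(nodes, size)
    ]


nodes = "abcd"
for k in range(len(nodes) + 1):
    sets_k = conditioning_sets(nodes, k)
    # The count matches the closed form sum_{i <= k} C(n, i).
    assert len(sets_k) == sum(comb(len(nodes), i) for i in range(k + 1))
    print(k, len(sets_k))
```

With four nodes, the counts grow from 1 (only the empty set) to 16 (all subsets), and any larger k adds nothing further.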
4.2 Properties of weak equivalence
This section describes some properties of weak equivalence and weak equivalence classes. Throughout the section is a subset of . For Markov equivalence, it holds that implies which follows from the definition of -separation. This is quite natural as a larger graph has more edges, therefore fewer independences. The same holds for weak equivalence classes as shown by the next proposition.
Proposition 4.7**.**
If , then .
Proof.
If then and , and therefore . This means that . ∎
Proposition 4.8** (Well-ordered -classes).**
Let . If and are -weakly equivalent, then they are also -weakly equivalent.
Proof.
Let , then and . Therefore and . It follows that and . Interchanging the roles of and and repeating the argument gives the result. ∎
From the above, we also see that implies . The next corollary follows directly from the above proposition.
Corollary 4.9** (Well-ordered -classes).**
Let . If and are -weakly equivalent, then they are also -weakly equivalent.
Definition 4.10**.**
We say that is singleton stable if for all , implies that for all and .
Note that the requirement is only on the - and -sets, not the -set. If is homogeneous and , then for all , thus a homogeneous is also singleton stable. The following proposition shows that, for a singleton stable , the independence model is characterized by the independences where and are singletons and and are disjoint. The proof uses the fact that -separation models satisfy so-called left and right composition as well as left and right decomposition, which are asymmetric graphoid properties (Didelez, 2006; Mogensen et al., 2018). These are similar to classical graphoid properties (Lauritzen, 1996), but left and right versions are needed due to the lack of symmetry.
Proposition 4.11**.**
Let be singleton stable, let be a finite set and let . If , then .
Without the assumption of singleton stability, the above statement is not true. For instance, if , then is trivially true for any pair of graphs.
Proof.
Let . If or is empty, then it follows immediately that . Assume that and are both nonempty. We can write and . From the definition of -separation and using singleton stability of it follows that for all and . Therefore for all and (if , then it holds trivially). From the definition of -separation, and therefore also . ∎
Proposition 4.12** (Maximality).**
The graph is maximal in if and only if it is complete or if for all edges such that .
When is maximal in , then we also say that is -maximal (the equivalence class is implicit as a graph can only be maximal in its own equivalence class). A graph is -maximal if the addition of any edge will change the -weak independence model.
Proof.
If is complete, then it is clearly maximal. If , then for some . We have and and therefore .
On the other hand, assume that is maximal, and that is not complete. It follows from the definition of maximality that for all . ∎
If then (Proposition 4.7). One may ask if implies . The next example shows that this is not the case, also not for maximal graphs.
Example 4.13**.**
We consider two graphs, and as shown in Figure 10 (both graphs also have all directed and bidirected loops). Let . Then equals
[TABLE]
* equals*
[TABLE]
and therefore it is a subset of . Markov equivalence corresponds to -weak equivalence with , and by Proposition 4.11, . Both graphs are maximal which means that cannot be added to Markov equivalently. This illustrates that does not imply , not even if is maximal.
Proposition 4.14**.**
Let . If is -maximal, then it is also -maximal.
Proof.
If is complete, then it is also -maximal. Assume instead that is not complete and . is -maximal, so (Proposition 4.12). Using Proposition 4.7, there exists a triple such that and and therefore . We see that and . It follows that is -maximal (Proposition 4.12). ∎
We say that a graph, , is -maximal if is -maximal for which means that induces a -weak equivalence relation.
Corollary 4.15**.**
Let . If is -maximal, then it is also -maximal.
In particular, if a graph is -maximal for some , then it is also the unique maximal element in its Markov equivalence class.
Proposition 4.16** (Minimality).**
The graph is minimal in if and only if it is empty or if for all edges such that .
Proof.
If it is empty, then it is clearly also minimal. Otherwise, let . We have for (Proposition 4.7). Therefore, .
If is minimal in , then it is either the empty graph, or for all , by definition of minimality. ∎
Proposition 4.17**.**
Let . If is -minimal, then it is also -minimal.
The proposition states that the property of being minimal is preserved when considering a larger set of independences. An equivalence class is finite and nonempty, hence, it always contains a maximal element and a minimal element. We will show later that it also contains a greatest element. However, a least element need not exist and Example 6.2 provides an example of this.
Proof.
If is empty, then it is also -minimal. Assume instead that . There exists a triple such that and . Then and therefore . It follows that and . As this holds for all , we see that is -minimal (Proposition 4.16). ∎
4.2.1 Marginalization
We say that a class of graphs, , is closed under marginalization if for every and every there exists such that for every ,
[TABLE]
where is the independence model induced by . When is the class of DMGs, could for instance be a -weak independence model. Appendix C shows that DMGs with weak equivalence are closed under marginalization. This follows directly from the analogous result in the case of Markov equivalence (Mogensen and Hansen, 2020) using a so-called latent projection (see also Verma and Pearl, 1990a; Richardson et al., 2017).
4.3 -weak equivalence
In this subsection, we restrict our attention to -weak equivalence relations. The following result shows that if , then -weak and -weak equivalence are the same. By convention, is always -separated from given when . If , then , and leads to a trivial separation.
Proposition 4.18**.**
Let and such that . Graphs and are -weakly equivalent if and only if they are -weakly equivalent.
Proof.
If and are -weakly equivalent, then they are also -weakly equivalent.
On the other hand, assume that and are -weakly equivalent, and let such that , , and . We must then have , and therefore by -weak equivalence of and . By Proposition 4.11, this implies . Changing the roles of and completes the argument. ∎
Example 4.19** (Weak equivalence class).**
In this example, we restrict our attention to graphs with all loops included in which case graphs , , and in Figure 4.19 constitute a -weak equivalence class and a -weak equivalence class. Graph is the greatest element in both cases. We have that (Corollary 4.9) and . We see that and are not -weakly equivalent as is -separated from given in while this is not the case in .
Example 4.20**.**
We give an example of how ‘strong connectivity’, that is, many similar paths, may lead to more edges in a -weak graph than in the corresponding -weak graph, . For this purpose, we consider graphs and as shown in Figure 12. The graph is -maximal and therefore it is -maximal for all , including (Corollary 4.15). We construct a smaller graph, , by removing . The smaller graph is not Markov equivalent, but it is -equivalent.
In terms of interpretation, we see that in this class of graphs there are many directed paths from to and if there are more than , then the edge can be added -weakly equivalently. In a graphical sense, nodes and are ‘strongly’ connected as there are more than disjoint, directed paths from to and they cannot all be blocked by conditioning on at most nodes.
We now define treks and directed treks (see also Foygel et al., 2012; Mogensen, 2020a). Foygel et al. (2012) and Mogensen (2020a) used paths in their definitions of treks; however, we use walks such that treks between and are also allowed.
Definition 4.21** (Trek, directed trek).**
Let be a nontrivial walk between and ,
[TABLE]
We say that is a trek if it has no colliders. We say that a trek is directed from to if has a head at .
We let denote the set of nodes, , such that there exists a directed trek from to in .
Definition 4.22.
Let and be DMGs. We say that and are trek equivalent if for all , it holds that
[TABLE]
A walk is -connecting from to given if and only if it is a directed trek from to which is reflected in the next corollary.
Corollary 4.23.
Graphs and are 0-weakly equivalent if and only if they are trek equivalent.
Proof.
This follows from Corollary E.3. ∎
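The characterization in Corollary 4.23 lends itself to a direct computation. The following Python sketch is illustrative only. It assumes that the condition elided in Definition 4.22 is equality of the directed-trek sets for every node, and it represents a DMG by two sets of node pairs, `directed` (a pair `(u, v)` means u → v) and `bidirected`; this representation and all function names are our own and hypothetical. The sketch uses the observation that a walk without colliders consists of edges pointing backwards, then at most one bidirected edge, then edges pointing forwards, and that it is directed if its final edge has a head at the end node.

```python
from itertools import product


def ancestors_star(nodes, directed):
    """Map each node x to the nodes with a directed walk of length >= 0 to x."""
    anc = {x: {x} for x in nodes}
    changed = True
    while changed:
        changed = False
        for (u, v) in directed:  # edge u -> v: every ancestor of u is one of v
            new = anc[u] - anc[v]
            if new:
                anc[v] |= new
                changed = True
    return anc


def directed_trek_sets(nodes, directed, bidirected):
    """Return dt with dt[b] = set of nodes a having a directed trek from a to b."""
    anc = ancestors_star(nodes, directed)
    # Nodes with a directed walk of length >= 1 to b.
    anc_plus = {
        b: set().union(*(anc[u] for (u, v) in directed if v == b)) for b in nodes
    }
    sib = {x: set() for x in nodes}
    for (u, v) in bidirected:
        sib[u].add(v)
        sib[v].add(u)
    dt = {b: set() for b in nodes}
    for a, b in product(nodes, repeat=2):
        # Shape a <- ... <- g -> ... -> b with a nontrivial right arm.
        common_source = bool(anc[a] & anc_plus[b])
        # Shape a <- ... <- g1 <-> g2 -> ... -> b; either arm may be trivial.
        linked_sources = any(sib[g1] & anc[b] for g1 in anc[a])
        if common_source or linked_sources:
            dt[b].add(a)
    return dt


def trek_equivalent(nodes, g1, g2):
    """g1 and g2 are (directed, bidirected) pairs on the same node set."""
    return directed_trek_sets(nodes, *g1) == directed_trek_sets(nodes, *g2)
```

For the single edge a → b, the sketch gives a directed trek from b to b (the walk b ← a → b), which is a walk but not a path.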
In Corollary 4.23, it is important to define treks using walks, not paths. For instance, the graph in Figure 14 is 0-weakly equivalent with the complete graph, but the only directed treks from to are not paths. Therefore, the result in Corollary 4.23 does not hold if directed treks are required to be paths. We say that a DMG , , contains a directed cycle if there is some permutation of , , such that in (see an example in Figure 14).
Proposition 4.24.
Let be a DMG, , which contains a directed cycle. If every node has a loop, then the complete DMG on is the greatest element of both and .
Proof.
For , this follows from Corollary 4.23 as there is a directed trek between any ordered pair of nodes in . Let and consider nodes and . We show that there is no separating set, , such that . If , this is clear. If , , then either is open, or and is open. ∎
5 Greatest elements under homogeneous weak equivalences
In the rest of the paper, we assume every weak equivalence relation to be homogeneous (Definition 4.5) as this leads to the existence of a greatest element in each equivalence class which we will prove in Subsection 5.2. Mogensen and Hansen (2020) showed the analogous result in the case of Markov equivalence classes. The notions of -potential siblings and -potential parents are central to this proof and are introduced in the next subsection.
5.1 -potential siblings and -potential parents
The existence of a greatest element in each -weak equivalence class can be proven using -potential siblings and -potential parents as introduced in Definitions 5.1 and 5.2. We say that two graphs, and , are -equivalent, , if for all ,
[TABLE]
Let and let be the edge . The conditions (cs1)-(cs3) in Definition 5.1 are sufficient and necessary for and to be -equivalent. When is directed, the conditions (cp1)-(cp4) in Definition 5.2 are analogously necessary and sufficient for and to be -equivalent. The sufficiency is proven in Lemmas D.2 and D.3 and the necessity follows from applying Propositions 5.5 and 5.6 to .
Definitions 5.1 and 5.2 use an abstract independence model, , while Propositions 5.3 and 5.4 describe the content of those definitions in the case of a graphical independence model, .
Definition 5.1 (-potential sibling).
Let be an independence model over , let , and let . We say that and are -potential siblings in if (cs1)-(cs3) hold.
- (cs1)
  - if : , and
  - if :
- (cs2)
if : for all ,
[TABLE]
- (cs3)
if : for all ,
[TABLE]
Definition 5.2 (-potential parent).
Let be an independence model over , let , and let . We say that is a -potential parent of in if (cp1)-(cp4) hold.
- (cp1)
if :
- (cp2)
if : for all ,
[TABLE]
- (cp3)
if : for all ,
[TABLE]
- (cp4)
if : for all ,
[TABLE]
If is graphical, , and and are -potential siblings in , we will say that is a -potential sibling edge between and . Similarly, we will say that is a -potential parent edge from to if is a -potential parent of in . The following two propositions simply rewrite Definitions 5.1 and 5.2 to explicitly use -connecting walks in the case of a graphical independence model. Their proofs follow directly from the definitions of -separation and the independence model .
Proposition 5.3 (Graphical version of -potential siblings).
Let be the weak independence model induced by . Let and let be the collection of conditioning sets of . Nodes and are -potential siblings if and only if or (gcs1)-(gcs3) holds.
- (gcs1)
  - If , there exists a -connecting walk from to given , and
  - if , there exists a -connecting walk from to given .
- (gcs2)
If , then for all such that there exists a -connecting walk from to given , there also exists a -connecting walk from to given .
- (gcs3)
If , then for all such that there exists a -connecting walk from to given , there also exists a -connecting walk from to given .
Proposition 5.4 (Graphical version of -potential parents).
Let be the weak independence model induced by . Let and let be the collection of conditioning sets of . The node is a -potential parent of if and only if or (gcp1)-(gcp4) holds.
- (gcp1)
If , there exists a -connecting walk from to given .
- (gcp2)
If , then for all such that there exists a -connecting walk from to given , there also exists a -connecting walk from to given .
- (gcp3)
If and , then for all such that there exists a -connecting walk from to given and a -connecting walk from to given , there also exists a -connecting walk from to given .
- (gcp4)
If then for all such that there exists a -connecting walk from to given , there also exists a -connecting walk from to given .
The next two propositions show that if () is in a graph, then and are -potential siblings ( is a -potential parent of ) in the independence model of the graph for all . The edge is therefore a -potential sibling edge (-potential parent edge) in , and if and are -equivalent, then is also a -potential sibling edge (-potential parent edge) in . This means that satisfying the conditions in Definitions 5.1 and 5.2 is necessary for -equivalence of and .
Proposition 5.5.
Let be homogeneous. If is in , then and are -potential siblings in for all .
Proof.
If , then it follows immediately. We assume and prove (gcs1)-(gcs3). (gcs1) If , then is a -connecting walk in given . The proof of the other statement is analogous. (gcs2) Assume that and let such that there exists a -connecting walk from to given . Composing this with gives a -connecting walk from to given as . (gcs3) This is shown similarly to (gcs2). ∎
Proposition 5.6.
Let be homogeneous. If is in , then is a -potential parent of in for all .
Proof.
If , then this again follows immediately. We instead assume and prove (gcp1)-(gcp4). (gcp1) If , then is a -connecting walk given . (gcp2) Assume that and let , and assume there is a -connecting walk from to given . Concatenating this with the edge gives a -connecting walk from to given as . (gcp3) Assume that and let such that there exist a -connecting walk from to given and a -connecting walk from to given . Concatenating them with the edge gives a -connecting walk from to given as and . (gcp4) Assume and let such that there exists a -connecting walk from to given . Concatenating the edge with this walk gives a -connecting walk from to given as . ∎
5.2 Existence of greatest elements
Markov equivalence classes of DMGs are known to contain a greatest element (Mogensen and Hansen, 2020). This means that for an equivalence class , there exists a graph such that is a supergraph of all graphs . This is a very convenient result as it allows a succinct representation of the entire Markov equivalence class as illustrated in Example 4. The main result of this section, Theorem 5.8, shows that -weak equivalence classes enjoy the same property when is homogeneous. This means that we can represent weak equivalence classes in a similar way. Section 6 discusses this further and introduces a hierarchy of -weak equivalence classes for different values of .
Lemma 5.7.
Let be a DMG. Let be homogeneous and let be the collection of conditioning sets of . If and are -potential siblings for all and e denotes the edge , then . If is a -potential parent of for all and e denotes the edge , then .
Proof.
The inclusion follows from Proposition 4.7. We show the other inclusion by contraposition. Proposition 4.11 implies that it is enough to consider triples of the form , , , . Assume . If , then . If instead , then and . In this case, there exists a -connecting walk from to given in . Nodes and are -potential siblings (or is a -potential parent of ) for all , and therefore also for . Lemma D.2 (Lemma D.3) gives the result. ∎
Lemmas D.2 and D.3 that are used in the above proof are adaptations of lemmas in Mogensen and Hansen (2020). Appendix D describes how to make this generalization.
From an independence model such that is homogeneous we now define a graph on nodes , . As is homogeneous, we know that for some . For all , we include the directed edge if and only if is a -potential parent of for all . We include the bidirected edge if and only if and are -potential siblings for all . We denote the resulting graph by . We see that is uniquely defined from the -independence model of , , and is therefore the same for all elements of the equivalence class . The following shows that is a unique maximal element, that is, a greatest element, in .
Theorem 5.8.
Let be a DMG and let be homogeneous. The graph defined above is -weakly equivalent with and it is the unique maximal element in .
Proof.
Let . If a directed edge, , is in , then is a -potential parent of in for all (Proposition 5.6). This means that the directed edge is in . Similarly, for bidirected edges (Proposition 5.5), and is a supergraph of all graphs in .
Every edge in is a -potential edge in for all . We can construct a finite sequence of graphs starting from and adding the edges that are in , but not in , sequentially. Lemma 5.7 shows that all graphs in this sequence are -weakly equivalent with , and therefore so is .
In conclusion, is a greatest element of the equivalence class. ∎
Theorem 5.8 is central in our development of graphical modeling based on weak equivalence as it provides a unique and interpretable representative of each equivalence class. We give examples of applications of this theorem in Section 6.
5.2.1 Comparison with Markov equivalence case
The above definitions and results are related to results in the case of Markov equivalence (Mogensen and Hansen, 2020). Definitions 5.1 and 5.2 can be thought of as -specific versions of Definitions 5.1 and 5.5 in Mogensen and Hansen (2020). This leads to -specific versions of Propositions 5.5 and 5.6 that are analogous to propositions in Mogensen and Hansen (2020).
Importantly, the potential parent conditions of Mogensen and Hansen (2020) use multiple conditioning sets and are therefore not amenable as a foundation for the proof of Theorem 5.8. The conditions in this paper use a single which facilitates the generalization from Markov equivalence classes to weak equivalence classes. The reformulation of the definitions also entails an important change of perspective. Instead of describing conditions such that the addition of an edge does not change the independence model for any conditioning set (Markov equivalence), the above conditions describe conditions such that the addition of an edge does not change the independence model when restricted to a specific conditioning set. This allows us to aggregate these conditions for any set of conditioning sets as defined by a homogeneous , and from this we can prove the existence of a greatest element in this more general setting.
6 Representation of weak equivalence classes
The previous section proved the existence of a greatest element in each weak equivalence class when is homogeneous. In Subsection 6.1, we first describe how this leads to a simple and concise representation of an entire equivalence class, and Subsection 6.3 illustrates this representation using the alarm example. In Subsection 6.2, we restrict our attention to -weak equivalence and describe a hierarchy of -weak equivalence classes. Choosing a leads to different notions of equivalence with different levels of granularity. The hierarchy in Subsection 6.2 provides a graphical representation of -weak equivalence classes across different values of which is meant to illuminate how equivalence classes change across different values of .
6.1 Directed mixed equivalence graph
The following definition provides a graphical object representing an entire weak equivalence class. Mogensen and Hansen (2020) gave the same definition in the context of Markov equivalence as illustrated in Example 4.
Definition 6.1 (Directed mixed equivalence graph (DMEG)).
Let be homogeneous and assume that is -maximal and . We define such that if and only if and there exists such that . We define the directed mixed weak equivalence graph (DMEG) of as the triple .
We visualize a directed mixed weak equivalence graph by drawing the corresponding maximal graph and making all edges in dashed (see the example in Figure 15). A DMEG summarizes the equivalence class in the following sense. Let be a -maximal element such that , that is, is the greatest element of , and let be the corresponding DMEG. If an edge is solid in , then this edge is in every . If an edge is absent in , then no contains this edge. If an edge, , is dashed in , then there exists a such that . Clearly is in and therefore is in some elements of , but not in others. One should note that removing multiple dashed edges from does not necessarily lead to a -weakly equivalent graph as removing an edge may impose restrictions on which other edges can be removed while maintaining -weak equivalence. This is related to the fact that a weak equivalence class need not contain a least element (see Figure 15).
Example 6.2 (Directed mixed equivalence graph).
Graphs , , and in Figure 15 constitute a 2-weak and a 3-weak equivalence class when restricting to DMGs that have all loops present (for simplicity we make this assumption). The graph is the greatest element. The corresponding DMEG is also shown in Figure 15, see Definition 6.1. The 3-weak equivalence class (2-weak equivalence class) does not contain a least element as removing both and does not lead to a 3-weakly equivalent graph (2-weakly equivalent graph).
Example 6.3.
This example describes a setting which leads to a weak equivalence with a homogeneous which is not a -weak equivalence. We consider a setting where a -dimensional process is observed, , but not every coordinate process is observed simultaneously. This is essentially a setting with overlapping variable sets, see, e.g., Danks (2002); Danks et al. (2008); Triantafillou et al. (2010); Huang et al. (2020). We assume that data contains observations of over an interval for ,
[TABLE]
The intervals are disjoint, for . We will approach this problem by restricting the local independences that can be tested using this data and require that there exists such that for us to be able to test the local independence .
We see that all local independences, , such that and can be tested from this data as every triple, , , is observed simultaneously (that is, for some ). We can also test for all , but not . This means that we can model this using -weak equivalence, but only for or . We can obtain further information by defining
[TABLE]
This leads to a homogeneous weak equivalence relation which is not a -weak equivalence.
6.2 Hierarchy of -weak equivalence
The previous section describes a graph, the directed mixed equivalence graph, which can help us understand a single weak equivalence class for a fixed, homogeneous . In this section, we restrict our attention to -weak equivalence relations and study a description of -weak equivalence classes for varying values of . We consider a fixed node set, . For each value of , the -weak equivalence classes form a partition of the DMGs on node set , with smaller corresponding to more coarse partitions. Each weak equivalence class can be represented by its maximal element and there is an interpretable structure between -weak equivalence classes for different values of which can help us understand the connection between these different notions of equivalence. This section describes this hierarchy of -weak equivalences.
6.2.1 Levels of granularity
Let be a DMG, and let . Let denote the greatest element of and let denote the greatest element of . We know that and it follows that . The graphs and are both representatives of , but at different levels of granularity. The -equivalence class of is smaller, thus -weak equivalence is more expressive than -weak equivalence. We may ask what ‘approximation error’ we make by using -weak equivalence instead of -weak equivalence. Let be an edge in which is not in . We know that and are -weakly equivalent, so they can only differ on -separations with such that . The approximation error induced by including is therefore restricted to ‘large’ conditioning sets. From a practical point of view, local independence tests with large conditioning sets are expected to perform poorly. This means that the loss of information when testing local independences from finite samples may be small.
6.2.2 Forest representation
We can provide a convenient representation of the -weak equivalence hierarchy using trees and forests. A tree, , is an undirected graph in which each pair of distinct nodes is connected by exactly one path. A forest is the disjoint union of a set of trees. We can construct a forest in the following way. For a fixed , , and , we consider the set of -weak equivalence classes of DMGs on node set . We let denote the number of such equivalence classes. The ’th -weak equivalence class, , contains a unique maximal element and we denote this graph by . We do this for every and define a node set
[TABLE]
Note that we write this as a disjoint union as the same graph may be a maximal element for different . Therefore, the set contains pairs such that is -maximal. For instance, if is a maximal element of a -weak equivalence class and of a -weak equivalence class, then and and these are different nodes.
We now construct a forest with node set in the following way. For each such that , there exists a unique -maximal graph, , such that , and we join to by an undirected edge. We call the resulting graph the weak equivalence hierarchy over and denote it by . For , we will use to denote the (nonempty) set of graphs such that and are adjacent in and such that . For , we will use to denote the unique graph such that and are adjacent in and such that . Example 6.4 and Figure 16 describe (parts of) the weak hierarchy over .
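The construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `graphs` is an iterable of hashable graph representations, `max_k` is the largest level considered, and `greatest(g, k)` is a hypothetical user-supplied function returning the greatest element of the k-weak equivalence class of `g`, with levels indexed from 0. None of these names come from the paper.

```python
def weak_hierarchy(graphs, max_k, greatest):
    """Build the node and edge sets of the hierarchy as a forest.

    A node is a pair (maximal graph, k). Every node with k > 0 is joined to
    the node (greatest(maximal graph, k - 1), k - 1) one level further down.
    """
    nodes = {(greatest(g, k), k) for g in graphs for k in range(max_k + 1)}
    edges = {
        ((n, k), (greatest(n, k - 1), k - 1)) for (n, k) in nodes if k > 0
    }
    return nodes, edges
```

Since every node above the lowest level has exactly one downward edge, the number of edges equals the number of nodes minus the number of nodes at the lowest level, as expected for a forest with one tree per root.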
Properties of
We first argue that is a forest. The nodes , , must be in different connected components as for each node there is at most a single edge down in the hierarchy. Using induction on and Corollary 4.9, we see that if , then there is a path between and , and is therefore a connected subset of . It contains exactly edges and is thus a tree. This means that consists of disjoint trees, each tree rooted at for some . Corollary 4.23 characterizes 0-weak equivalence.
When , and are disjoint when , but need not be when . For and , there exist such that which is due to the fact that if a graph is -maximal, then it is also -maximal (Corollary 4.15). The leaves of the trees are the greatest elements of the Markov equivalence classes (Proposition 4.18).
The graph represents the entire system of -weak equivalence classes and can be conveniently drawn in levels such that the vertical placement is determined by (see Figure 16). Let be a -weak equivalence class represented by its greatest element . If we move along the unique edge towards a -maximal graph, we obtain the maximal element of the -weak equivalence class containing the graph by definition of . If we move to the -level, one of the -equivalence classes will be represented by itself. Naturally, moving towards larger in the hierarchy yields smaller equivalence classes since, if is -maximal, then .
Dashed edges in the hierarchy
In , one may use DMEGs instead of the corresponding maximal DMGs, and in this paragraph we think of a node in as a pair consisting of a DMEG and an integer. In this case, there is also a certain structure in the dashed/solid status of edges across levels of . If an edge is solid in , then it is also solid in all graphs . This is seen from the fact that if then and every graph in this equivalence class contains which is why all graphs in also contain it. If the edge is dashed in , then it is also dashed in . This is because there exists a graph without this edge, and . On the other hand, the edge is in the maximal element of , thus the edge must be present in and dashed.
On the other hand, moving up (towards larger values of ) in the hierarchy a dashed edge may be removed, become solid, or remain dashed. Moving down (towards smaller values of ) in the hierarchy a solid edge may become dashed.
Example 6.4 (-weak hierarchy over ).
Figure 16 shows a subgraph of for . A node in , , is shown as (or rather, the corresponding DMEG), and determines the vertical placement of the node. All loops are present in the maximal graphs, but omitted from the visualization for simplicity. We use the edge $\alpha \mathrel{\filleddiamond\!-\!\filleddiamond} \beta$ to indicate that all three possible edges between a pair of nodes, and , are present in the graph, that is, . The letters , to the right of a graph index the graphs shown in the figure.
Figure 16 shows two subtrees of trees in the hierarchy. We see that the two graphs shown on level , and , are not 0-weakly equivalent as there is no directed trek from to in (see also Corollary 4.23).
In the figure, a red undirected edge indicates graph equality, for example, the edge between and . As noted above, if is -maximal, then and when is drawn both in levels and , we indicate this by making the undirected edge connecting them red.
6.3 Alarm network
We return to the alarm example from Subsection 2.1. This is a network of moderate size with observable coordinate processes. If we consider graphical modeling of this network using a -weak equivalence relation, different values of lead to different levels of granularity as larger values of will give us smaller equivalence classes. Let denote the latent projection of the system (see Figure 1), and let denote the greatest element of . Figure 17 shows the DMEGs of and of . We know that . In this example, we see that the only difference between the two DMEGs in Figure 17 is the bidirected edge between and . This edge is necessarily dashed as . The added complexity of using does therefore not provide much additional information in this example.
7 Algorithms for weak equivalence
The results in Section 3 imply that several computational tasks that occur naturally when using -separation and local independence for graphical modeling of stochastic processes are not feasible, even for a moderate number of coordinate processes. Section 4 introduces a more flexible notion of equivalence to circumvent these issues and Section 5 shows that the convenient theory of Markov equivalence classes translates seamlessly to the more general notion of weak equivalence. As a last component of this paper, we argue that this more general theory leads to algorithms that are in fact feasible from a computational point of view.
7.1 A parametrized hierarchy of graphical equivalence
We start this subsection by providing a formal definition of the weak equivalence decision problem.
Decision problem 7.1 (Weak Markov equivalence in DMGs).
Let and be DMGs. Are and -weakly equivalent?
Decision problem 7.1 is coNP-complete as it is a more general problem than Decision problem 3.1. We restrict this to -weak equivalence and obtain a parametrized decision problem.
Decision problem 7.2 (Weak Markov equivalence in DMGs).
Let be a nonnegative integer, and let and be DMGs. Are and -weakly equivalent?
A decision problem is said to be slicewise polynomial if there exists an algorithm which solves the problem in steps for a computable function , input length , and parameter . For fixed , we can decide -weak equivalence of two DMGs by simply checking every possible triple , . This can be done in time bounded by as the number of conditioning sets is bounded by . This shows that parametrized -weak equivalence is a slicewise polynomial problem, in that for a fixed it is solvable by an algorithm which is polynomial in . One should note that this is different from the -sparse decision problems (e.g., Decision problem 3.6) as they remain hard for a fixed whenever .
Intuitively, the unrestricted Markov equivalence problem is computationally hard as the maximal size of the conditioning sets also grows with . On the other hand, if we consider -weak equivalence for a fixed then the maximal size of the conditioning sets is fixed, and the problem can be solved in time which scales polynomially in .
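The brute-force check described above can be sketched as follows. The sketch is hedged in two ways. First, it assumes the definition of a μ-connecting walk from Mogensen and Hansen (2020): a nontrivial walk from a to b given C such that a is not in C, every collider is an ancestor of C (with C included among its own ancestors), no noncollider is in C, and the final edge has a head at b. Second, it assumes that k-weak equivalence compares separation statements for all conditioning sets of size at most k. The graph representation, a pair `(directed, bidirected)` of sets of node pairs, and all function names are our own.

```python
from itertools import combinations


def _steps(graph):
    """Edge traversals (u, head_at_u, v, head_at_v), in both orientations."""
    directed, bidirected = graph
    steps = []
    for (u, v) in directed:  # u -> v
        steps.append((u, False, v, True))   # traverse along the arrow
        steps.append((v, True, u, False))   # traverse against the arrow
    for (u, v) in bidirected:  # u <-> v
        steps.append((u, True, v, True))
        steps.append((v, True, u, True))
    return steps


def _ancestors_of(graph, C):
    """Nodes with a directed walk of length >= 0 into the set C."""
    directed, _ = graph
    an = set(C)
    changed = True
    while changed:
        changed = False
        for (u, v) in directed:
            if v in an and u not in an:
                an.add(u)
                changed = True
    return an


def mu_connected(graph, a, b, C):
    """True if there is a mu-connecting walk from a to b given C."""
    if a in C:
        return False
    steps, an_c = _steps(graph), _ancestors_of(graph, C)
    # A state (v, head) records the current end node of the walk and whether
    # the last edge has a head at v; walk conditions are local to such states.
    frontier = [(v, hv) for (u, _, v, hv) in steps if u == a]
    seen = set(frontier)
    while frontier:
        v, head_in = frontier.pop()
        if v == b and head_in:
            return True
        for (u, head_out, w, hw) in steps:
            if u != v:
                continue
            collider = head_in and head_out
            allowed = (v in an_c) if collider else (v not in C)
            if allowed and (w, hw) not in seen:
                seen.add((w, hw))
                frontier.append((w, hw))
    return False


def k_weakly_equivalent(nodes, g1, g2, k):
    """Compare all triples (a, b, C) with |C| <= k (assumed definition)."""
    for size in range(k + 1):
        for C in map(set, combinations(nodes, size)):
            for a in nodes:
                for b in nodes:
                    if mu_connected(g1, a, b, C) != mu_connected(g2, a, b, C):
                        return False
    return True
```

Each call to `mu_connected` explores at most two states per node, so for fixed k the total running time is polynomial in the number of nodes; the exponential dependence sits entirely in the number of conditioning sets of size at most k.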
7.2 Computing greatest elements and directed mixed equivalence graphs
As explained above, for a fixed one can decide -weak equivalence in polynomial time. The same applies to the related computational problems.
Assume we have a graph and want to find the maximal element of . A simple algorithm checks for each edge if its addition violates any of the independences in and adds the edge if and only if this is not the case. For a fixed , this is done in polynomial time.
When considering a weak equivalence class as represented by its greatest element, we are interested in computing the associated directed mixed equivalence graph (DMEG) as this graph represents the entire equivalence class concisely. We may remove a single edge at a time from the greatest element and decide weak equivalence with the greatest element; the edges whose removal preserves equivalence are the dashed edges of the corresponding DMEG.
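Both computations in this subsection reduce to repeated equivalence checks, which the following Python sketch makes explicit. It is a sketch under assumptions, not a reference implementation: edges are encoded as hypothetical triples `(kind, u, v)` with `kind` equal to `"->"` or `"<->"`, bidirected edges are stored once with their endpoints in the order given by `nodes`, and `equivalent(g1, g2)` is a user-supplied decision procedure for the weak equivalence relation of interest.

```python
from itertools import combinations_with_replacement, product


def candidate_edges(nodes):
    """All possible DMG edges on `nodes`, loops included."""
    directed = {("->", u, v) for u, v in product(nodes, repeat=2)}
    bidirected = {
        ("<->", u, v) for u, v in combinations_with_replacement(nodes, 2)
    }
    return directed | bidirected


def greatest_element(nodes, edges, equivalent):
    """Add every edge whose single addition keeps the graph equivalent."""
    edges = frozenset(edges)
    addable = {
        e for e in candidate_edges(nodes) - edges
        if equivalent(edges, edges | {e})
    }
    return edges | addable


def dashed_edges(maximal, equivalent):
    """Edges of the greatest element that are absent from some class member."""
    maximal = frozenset(maximal)
    return {e for e in maximal if equivalent(maximal, maximal - {e})}
```

Each candidate edge is tested against the original graph only, so the number of calls to `equivalent` is quadratic in the number of nodes; by Theorem 5.8, adding all individually addable edges yields the greatest element.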
8 Learning
There is a large literature on methods for recovering a graph from observational data (Spirtes and Zhang, 2018). In the case of DAG-based models, many methods use tests of conditional independence. Similarly, it is possible to learn local independence graphs using tests of local independence. In this section, we briefly discuss graphical structure learning based on tests of local independence as described by Meek (2014) and its connection to weak equivalence of DMGs. Mogensen et al. (2018) described a learning algorithm outputting the Markov equivalence DMEG from tests of local independence. Absar and Zhang (2021) implemented a PC-like algorithm based on -separation. Bhattacharjya et al. (2022) studied independence tests in proximal graphical event models and graphical structure learning based on tests of local independences. Other work described tests of local independence (Thams and Hansen, 2021; Christgau et al., 2022) and good tests are of course a prerequisite for constraint-based structure learning. The learning problem has also been studied for discrete-time processes (Eichler, 2013).
As argued in previous sections, constraint-based algorithms that learn the Markov equivalence class of a partially observed local independence graph and are correct in the oracle setting scale poorly with the size of the graph. Therefore, -weak equivalence classes may constitute more reasonable targets for graphical structure learning. The oracle learning algorithm in Mogensen et al. (2018) leveraged the potential sibling and potential parent criteria to ensure correctness, though the number of these conditions also scales poorly with graph size, . This naturally leads to the idea of using -potential sibling and -potential parent criteria directly for learning. In the oracle case this leads to a straightforward learning algorithm by starting from the complete DMG. For each pair of nodes, , one may test the -potential parent criteria for all . If one of these criteria is violated, one simply removes , and similarly for the bidirected edges. For fixed , this leads to a polynomial-time oracle learning algorithm which outputs the maximal -weakly equivalent graph of the true graph. This is similar to early stopping in FCI (Spirtes, 2001) as it only uses tests with small conditioning sets . While smaller values of lead to less informative output (larger equivalence classes), the interpretation of a learned DMEG remains the same as when using as shown by the theory in previous sections.
Outside of the oracle setting, actual tests of local independence output a -value. When learning local independence graphs, one may compute -values from the local independence tests that comprise the -potential parent/sibling criteria, , and use these -values to output a maximal graph which is in minimum violation with the data, see e.g. Hyttinen et al. (2014) for a similar idea in DAG-based graphical structure learning.
9 Discussion
The results in Section 3 show that deciding Markov equivalence is computationally hard, even under a sparsity constraint. This also implies that finding the unique maximal element of a Markov equivalence class is hard and that constraint-based learning algorithms that are correct in oracle versions need exponentially many tests in the worst case.
The theory developed in this paper provides a new interpretation of -separation in directed mixed graphs as representations of local independence in partially observed stochastic processes. This leads to equivalence relations on directed mixed graphs that are weaker than Markov equivalence. Under a weak equivalence relation, each equivalence class of directed mixed graphs has a simple representation and interpretation using the existence of a greatest element. Importantly, they retain a clear interpretation and a convenient graphical representation of an entire -weak equivalence class is available, just as in the case of Markov equivalence classes. The greatest element of an equivalence class also provides a feasible learning target, and one can give a constructive characterization of this element (the collection of -potential sibling and -potential parent conditions). The Markov equivalence class is often the learning target when trying to recover a graph from observational data; however, the complexity results in this paper imply that this target may be too expressive. The previous sections give the theoretical underpinning for feasible learning algorithms that output graphs that are less expressive than the Markov equivalence class.
A subset of the weak equivalence relations, -weak equivalence relations, are naturally parametrized by a natural number . Varying , one obtains more or less granular graphical modeling, and a simple hierarchy of equivalence classes can be described across . The parameter specifies both the granularity of the equivalence class and the complexity of, e.g., finding a maximal graph. The work in this paper mostly focused on -weak equivalence; however, the central results hold for more general weak equivalences, and one may find applications of other types of equivalence relations, e.g., with inspiration from specific applications.
10 Acknowledgments
This work was funded by a DFF-International Postdoctoral Grant (0164-00023B) from Independent Research Fund Denmark. The author is a member of the ELLIIT Strategic Research Area at Lund University. The author thanks Karin Rathsman for discussing alarm handling at the European Spallation Source.
Appendix A Decision problems
We list the formal decision problems used in Section 3.
Decision problem A.1 (Add-1 bidirected Markov equivalence in DMGs).
Let and be DMGs such that and is a bidirected edge. Are and Markov equivalent?
Decision problem A.2 (Add-1 directed Markov equivalence in DMGs).
Let and be DMGs such that and is a directed edge. Are and Markov equivalent?
The next decision problems are sparse versions of Decision problems A.1 and A.2.
Decision problem A.3 (Add-1 bidirected Markov equivalence in sparse DMGs).
Let be a nonnegative integer and let and be -sparse DMGs such that and is a bidirected edge. Are and Markov equivalent?
Decision problem A.4 (Add-1 directed Markov equivalence in sparse DMGs).
Let be a nonnegative integer and let and be -sparse DMGs such that and is a directed edge. Are and Markov equivalent?
Appendix B Node connectivity in DMGs
In this section, we elaborate on the discussion in Subsection 3.1 on different notions of node connectivity in a DMG. For a DMG, and a node , we define ’s indegree, , to be the number of nodes, , such that . Similarly, we define ’s outdegree, , as the number of nodes, , such that . This is an adaptation of the common definitions of in- and outdegree in DAGs. If in , then , and it follows that the indegree of is less than or equal to . Similarly, the outdegree of is less than or equal to . It holds that . However, as illustrated in Figure 18 it is possible for for some to be large while is small for all .
The indegree and outdegree of a node need not equal the and , respectively (see the example in Figure 19). Moreover, the indegree and outdegree need not be the same for Markov equivalent graphs (Figure 19).
The example in Figure 7 exploits non-maximality of the graph. In each Markov equivalence class, , there is a greatest element, and one could define sparsity of the nodes in by counting adjacencies in the which is invariant under Markov equivalence. However, the in- and outdegree of in may still be strictly less than and , respectively (Figure 19). In fact, one can find a family of graphs, , and a node for all such that is unbounded while the indegree and outdegree are fixed (see the example in Figure 21).
If is inseparable into and is inseparable into in a maximal DMG, they need not be adjacent (see the example in Figure 20).
Appendix C Marginalization
This section argues that the representation of weak equivalence is closed under marginalization in the sense that we can marginalize any graph, , onto a smaller node set, , which represents the same independence model as the original graph when restricting independence statements to triples such that . This is formalized in Equation 1. A so-called latent projection of satisfies this requirement. The latent projection was also used in Mogensen and Hansen (2020), and earlier in Verma and Pearl (1990a) and Richardson et al. (2017).
Definition C.1 (Latent projection).
We denote the latent projection of on by .
The latent projection of a graph on a node set represents a marginalized version of the independence model, as formalized by the following corollary. Mogensen and Hansen (2020) proved this result in the case of , that is, in the case of Markov equivalence (Mogensen and Hansen, 2020, Theorem 3.12). The general case follows directly from the Markov equivalence result.
Corollary C.2.
Let , , and let . For , it holds that
[TABLE]
Proof.
Theorem 3.12 of Mogensen and Hansen (2020) shows that
[TABLE]
and the result follows immediately. ∎
Mogensen and Hansen (2020) stated an algorithm to output the latent projection of a DMG (Algorithm 1). This was similar to earlier algorithms for other classes of graphs (Koster, 1999; Sadeghi, 2013). The following proposition was proved by Mogensen and Hansen (2020).
Proposition C.3 (Mogensen and Hansen (2020)).
Let be a DMG and . Algorithm 1 outputs its latent projection, .
One should note that the marginalization of a (weakly) maximal graph need not be (weakly) maximal, as illustrated in Figure 22.
Appendix D Proofs and lemmas
The proofs of the following lemmas are adaptations of the proofs of Lemmas 5.4 and 5.8 in Mogensen and Hansen (2020). We include them for completeness to show how the appropriate changes are made. Lemmas 5.4 and 5.8 in Mogensen and Hansen (2020) did not use the -specific conditions that are essential in obtaining the stronger results that we present in this paper.
Definition D.1 (Route).
We say that a walk, , is a route if the node occurs at most twice on and no other node occurs more than once on .
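The occurrence condition in this definition can be checked mechanically. The following is a minimal sketch, not part of the paper: it assumes a walk is given only by its sequence of visited nodes, and that the one node allowed to occur twice is passed explicitly as the hypothetical parameter `distinguished`. It does not verify that the sequence is a valid walk in any particular graph.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_route(nodes: Sequence[Hashable], distinguished: Hashable) -> bool:
    """Check the route condition on the node sequence of a walk.

    `distinguished` (a name chosen for this sketch) may occur at most
    twice; every other node may occur at most once.
    """
    counts = Counter(nodes)
    return all(
        count <= (2 if node == distinguished else 1)
        for node, count in counts.items()
    )
```

For instance, `is_route(["a", "b", "c", "b"], "b")` holds because only the distinguished node repeats, whereas `is_route(["a", "b", "a", "b"], "b")` fails because `"a"` also occurs twice.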
Routes characterize -connections in DMGs (Mogensen and Hansen, 2020), and we use them in the next proofs. Note that the lemmas below are formulated using , not the restricted version .
Lemma D.2.
Let and let be a -potential sibling edge between and in . Let . If there is a -connecting walk from to given in , then there is a -connecting walk from to given in .
Proof.
Consider any -connecting walk from to given in . We can also find a -connecting route from to given in (Mogensen and Hansen, 2020), and we denote this route by . If , then there exists a -connecting walk from to given in using (cs1) of Definition 5.1. If , then there exists a -connecting walk from to given in , also using (cs1). We denote these walks by and , respectively, if they exist.
If does not occur on , then is -connecting given in . If occurs twice, then either contains a subroute and or contains a subroute and . Assume first the former. There is either a -connecting subroute from to , or . If , then consider the subroute between and . This subroute is either trivial or has a tail at . In either case, composing it with gives a -connecting walk from to given in , and using (cs2) there is also a -connecting walk from to given in . If , then we can compose the subroute from to with and . The resulting walk will be -connecting as . The argument is the same when and .
We now assume that occurs only once on and assume first that
[TABLE]
If , then we can compose , , and to obtain a -connecting walk given . Note that this also holds if is trivial. If , then is not trivial and it has a head at . Using (cs3), there exists a -connecting walk from to and composing it with gives the result. If instead
[TABLE]
the same arguments work, now using (cs2). ∎
Lemma D.3.
Let and let be a -potential parent edge from to in . Let . If there is a -connecting walk from to given in , then there is a -connecting walk from to given in .
Proof.
We consider a -connecting walk from to given in . If , then by (cp1) there exists a -connecting walk from to given , and we denote this walk by when it exists. We can find a -connecting route from to given in , and we denote this route by .
In this proof, we will say that a collider on a walk is newly closed if the collider is in , but not in . If there exists a newly closed collider, then and . We assume first that occurs at most once on . If there are newly closed colliders on , the proof of Lemma 5.8 in Mogensen and Hansen (2020) shows that we can find a -connecting walk in with no newly closed colliders such that occurs at most once, and we denote this walk by .
If does not contain , then the result follows. If it does contain , we split into two cases. Assume first that
[TABLE]
We see that . If is trivial or if it has a tail at , then composing , , and gives a -connecting walk. If has a head at , then (cp2) gives a -connecting walk from to that we can compose with . Assume instead that
[TABLE]
If has a head at and , then (cp3) gives the result. If , we can find a walk in with no newly closed colliders and only one occurrence of of the type
[TABLE]
where can be trivial, using the same argument as in the proof of Lemma 5.8 in Mogensen and Hansen (2020). We have and there is a -connecting walk from to . Using (cp4) there is also one from to . Composing this with gives the result since is either trivial or has a tail at .
Finally, if occurs twice on , we must have . We can use the same arguments as in the proof of Lemma 5.8 in Mogensen and Hansen (2020) using the walk and condition (cp2). ∎
Appendix E Additional results
When we count the number of colliders on a walk, we count them with multiplicity, that is, if
[TABLE]
is a walk, , then the number of colliders on this walk equals the number of , , such that and both have heads at on . Note that the endpoints, and , are not colliders by definition. The next lemma is useful for giving a characterization of -weak equivalence in terms of -connecting walks.
Lemma E.1.
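Counting colliders with multiplicity can be illustrated with a short sketch. It is not taken from the paper and rests on an assumed encoding: a walk is represented by its list of edges in walk order, and each edge is a pair of marks (`"head"` or `"tail"`) at its left and right node along the walk. The names `HEAD`, `TAIL`, `RIGHT`, `LEFT`, `BIDIR` and `count_colliders` are hypothetical.

```python
from typing import List, Tuple

HEAD, TAIL = "head", "tail"
# (mark at the earlier node, mark at the later node) along the walk
Edge = Tuple[str, str]

RIGHT: Edge = (TAIL, HEAD)  # directed edge pointing forward along the walk
LEFT: Edge = (HEAD, TAIL)   # directed edge pointing backward along the walk
BIDIR: Edge = (HEAD, HEAD)  # bidirected edge


def count_colliders(edges: List[Edge]) -> int:
    """Count interior positions where both adjacent edges have a head.

    Positions are counted, not distinct nodes, so a node that is a
    collider at two positions contributes two. The endpoints of the
    walk are never counted because they have only one adjacent edge.
    """
    return sum(
        1
        for before, after in zip(edges, edges[1:])
        if before[1] == HEAD and after[0] == HEAD
    )
```

Under this encoding, a walk whose edges are `[RIGHT, LEFT, RIGHT, LEFT]` has two colliders even if both collider positions are occupied by the same node, which is the sense in which multiplicity is counted.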
If there is a -connecting walk from to given in , then there is a -connecting walk from to given in with at most colliders, all of which are in .
Proof.
Let denote the colliders on the -connecting walk. We know that and therefore there exists a directed path such that and such that is the only node in on this directed path. If , then the path is trivial, that is, contains no edges and just a single node, . Adding for each creates a walk which is -connecting from to given such that every collider is in . If a node occurs as a collider more than once, we can remove the loop. The resulting walk is also -connecting, also if is a collider, and it has strictly fewer colliders. We can repeat this to find a -connecting walk with at most colliders. ∎
Proposition E.2.
Let be a DMG. Let and such that . We have if and only if there is no -connecting walk from to given in with at most colliders.
Proof.
If there is a -connecting walk given , then clearly . On the other hand, if then there is a -connecting walk from to given and Lemma E.1 gives the result. ∎
This means that the restriction of the independence models to -weak equivalence ignores -connecting walks with more than colliders.
Corollary E.3.
Graphs and are -weak equivalent if and only if it holds for all and such that that there is a -connecting walk from to given in with at most colliders if and only if there is a -connecting walk from to given in with at most colliders.
Proof.
Assume first that , and that is a -connecting walk from to given , , in with at most colliders. Proposition E.2 gives that and therefore . Using Proposition E.2 again gives the result.
Assume instead that for all such that it holds that there is a -connecting walk from to given with at most colliders in if and only if there is one in . If , , then there is no -connecting walk from to given in and therefore also no -connecting walk with at most colliders in , and Propositions 4.11 and E.2 give the result. ∎
References
- [1] Aalen (1987). Odd O. Aalen. Dynamic modelling and causality. Scandinavian Actuarial Journal, 1987(3-4):177–190, 1987.
- [2] Absar and Zhang (2021). Saima Absar and Lu Zhang. Discovering time-invariant causal structure from temporal data. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2807–2811, 2021.
- [3] Ali et al. (2009). R. Ayesha Ali, Thomas S. Richardson, and Peter Spirtes. Markov equivalence for ancestral graphs. The Annals of Statistics, 37(5B):2808–2837, 2009.
- [4] Andersson et al. (1997a). Steen A Andersson, David Madigan, and Michael D Perlman. A characterization of Markov equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2):505–541, 1997a.
- [5] Andersson et al. (1997b). Steen A Andersson, David Madigan, and Michael D Perlman. On the Markov equivalence of chain graphs, undirected graphs, and acyclic digraphs. Scandinavian Journal of Statistics, 24(1):81–102, 1997b.
- [6] Andersson et al. (2001). Steen A Andersson, David Madigan, and Michael D Perlman. Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28(1):33–85, 2001.
- [7] Bhattacharjya et al. (2022). Debarun Bhattacharjya, Karthikeyan Shanmugam, Tian Gao, and Dharmashankar Subramanian. Process independence testing in proximal graphical event models. In Conference on Causal Learning and Reasoning, pages 144–161. PMLR, 2022.
- [8] Christgau et al. (2022). Alexander Mangulad Christgau, Lasse Petersen, and Niels Richard Hansen. Nonparametric conditional local independence testing. arXiv preprint arXiv:2203.13559, 2022.
