Weak equivalence of local independence graphs

Søren Wengel Mogensen

arXiv:2302.12541·math.ST·February 27, 2023

TL;DR

This paper investigates the complexity of determining Markov equivalence in local independence graphs of multivariate stochastic processes, introduces weaker equivalence relations, and develops feasible algorithms for their analysis.

Contribution

It proves coNP-completeness of Markov equivalence decision and introduces a hierarchy of weak equivalence relations with practical algorithms and concise representations.

Findings

01

Deciding Markov equivalence of DMGs is coNP-complete.

02

Introduces weaker equivalence relations with feasible algorithms.

03

Provides hierarchical structure linking different equivalence levels.

Full text

Weak equivalence of local independence graphs

Søren Wengel Mogensen

(Department of Automatic Control, Lund University)

Abstract

Classical graphical modeling of multivariate random vectors uses graphs to encode conditional independence. In graphical modeling of multivariate stochastic processes, graphs may encode so-called local independence analogously. If some coordinate processes of the multivariate stochastic process are unobserved, the local independence graph of the observed coordinate processes is a directed mixed graph (DMG). Two DMGs may encode the same local independences in which case we say that they are Markov equivalent.

Markov equivalence is a central notion in graphical modeling. We show that deciding Markov equivalence of DMGs is coNP-complete, even under a sparsity assumption. As a remedy, we introduce a collection of equivalence relations on DMGs that are all less granular than Markov equivalence and we say that they are weak equivalence relations. This leads to feasible algorithms for naturally occurring computational problems related to weak equivalence of DMGs. The equivalence classes of a weak equivalence relation have attractive properties. In particular, each equivalence class has a greatest element which leads to a concise representation of an equivalence class. Moreover, these equivalence relations define a hierarchy of granularity in the graphical modeling which leads to simple and interpretable connections between equivalence relations corresponding to different levels of granularity.

1 Introduction

The distribution of a multivariate random vector, $(X^{\alpha})_{\alpha\in V}$ , induces an independence model, $\mathcal{I}$ , which is simply the collection of triples, $(A,B,C)$ , such that $X^{A}$ and $X^{B}$ are conditionally independent given $X^{C}$ . Graphs are often used as convenient representations of such independence models (Lauritzen, 1996; Maathuis et al., 2019). The graphical theory reflects the fact that conditional independence is symmetric in $A$ and $B$ , i.e., $(A,B,C)\in\mathcal{I}$ if and only if $(B,A,C)\in\mathcal{I}$ . In graphical modeling of multivariate stochastic processes, it is useful to apply a notion of independence that distinguishes between past and present and for this purpose several authors have used local independence, analogously to how conditional independence is used in classical graphical modeling. However, local independence is not symmetric in the above sense and its graphical representation therefore requires a specialized framework. Local independence was first introduced by Schweder (1970) in composable Markov processes and later studied by Aalen (1987) in a broader class of stochastic processes. Didelez (2000, 2008) described graphical modeling of marked point processes based on local independence and Mogensen et al. (2018) extended this theory to Itô processes.

Graphs are said to be Markov equivalent if they represent the same independences, i.e., if they are indistinguishable when observing only the induced independences. Several characterizations of Markov equivalence are available in different classes of graphs representing classical conditional independence (Frydenberg, 1990; Verma and Pearl, 1990b; Spirtes and Verma, 1992; Andersson et al., 1997a, b; Richardson, 1997; Andersson et al., 2001; Zhao et al., 2005; Zhang, 2007; Ali et al., 2009). Mogensen and Hansen (2020) used directed mixed graphs as representations of local independences in partially observed stochastic processes and they characterized Markov equivalence in this class of graphs by proving that each equivalence class contains a greatest element. Their equivalence result also provided a simple approach to visualizing and understanding an entire equivalence class. Mogensen and Hansen (2022) characterized Markov equivalence of directed correlation graphs representing local independence in the presence of correlated noise processes. Recent work studied local independence testing in point processes (Thams and Hansen, 2021) and Christgau et al. (2022) described nonparametric tests of local independence. It is worth noting that local independence is a continuous-time version of discrete-time Granger causality which has been used in graphical models of time series (Eichler and Didelez, 2007, 2010; Eichler, 2012, 2013). The graphical theory of directed mixed graphs and the results in this paper may be applied in both continuous-time and discrete-time stochastic processes (Mogensen and Hansen, 2020, supplementary material).

In graphs representing classical conditional independence, several characterizations of Markov equivalence lead to polynomial-time algorithms for deciding Markov equivalence (e.g., Richardson, 1997; Ali et al., 2009). In the local independence framework, Mogensen and Hansen (2022) proved that deciding Markov equivalence of two directed correlation graphs is coNP-complete which means that we should not expect to find a polynomial-time algorithm in this case. In this paper, we show that deciding Markov equivalence of directed mixed graphs is also coNP-complete. We further show that assuming sparsity of the directed mixed graphs does not generally remedy this. Our results imply that several computational problems that occur naturally when using directed mixed graphs are also computationally hard. For this reason, Markov equivalence in partially observed local independence graphs may not always be a practical notion. Instead, we introduce a class of weak equivalence relations between local independence graphs. We characterize the corresponding equivalence classes and show that they too contain a greatest element. Mogensen and Hansen (2020) argued that the existence of a greatest element leads to a straightforward Markov equivalence theory. We extend this theory to the more general weak equivalences studied in this paper. This allows a simple representation of weak equivalence classes. A subset of the weak equivalence relations may be understood as creating a hierarchy of equivalence relations in which a parameter, $k$ , creates a trade-off between the size of the equivalence classes and the computational complexity, leading to a graphical theory which is both useful and practical. This hierarchy also illustrates interpretable connections between equivalence classes across different values of $k$ .

The paper is structured in the following way. In Section 2, we introduce necessary terminology and notation. We also describe global Markov properties that connect so-called $\mu$ -separation in graphs to local independence and provide justification for using graphs as representations of local independence. Moreover, we give an example to illustrate the framework and purpose of the paper. In Section 3, we prove that deciding Markov equivalence of directed mixed graphs is computationally hard, even under sparsity restrictions, and we discuss the implications of this result. In Section 4, we introduce the notion of weak equivalence of graphs. We describe its properties and compare it with Markov equivalence. Section 5 proves that, under a regularity condition, every weak equivalence class has a greatest element. Using the main result from the previous section, Section 6 first describes a graph which concisely represents an entire equivalence class. It then describes a hierarchy of certain weak equivalence classes and how they represent different levels of granularity in their description of the underlying graphs. Section 7 discusses algorithmic aspects of weak equivalence, and in Section 8 we briefly outline how results from the previous sections relate to graphical structure learning. Section 9 provides a discussion of the results.

2 Local independence and graphs

The interest in $\mu$ -separation arises from its connection to local independence as formalized through various global Markov properties. We start by defining local independence following the exposition in Christgau et al. (2022). We give the definition for counting processes, though it can be extended to other classes of stochastic processes (Didelez, 2008; Mogensen et al., 2018; Mogensen and Hansen, 2022).

We consider a multivariate counting process, $N_{t}=(N_{t}^{1},\ldots,N_{t}^{n})$ , on a probability space, $(\Omega,\mathbb{F},P)$ , and we assume that $N_{t}$ is observed over some interval $[0,T]$ . We let $V$ denote the set $\{1,2,\ldots,n\}$ . We use $\mathcal{F}_{t}^{D}$ to denote the right-continuous and complete filtration generated by $N_{t}^{D}=(N_{t}^{\alpha}:\alpha\in D)$ . One can think of $\mathcal{F}_{t}^{D}$ as consisting of the information in the coordinate processes in $D\subseteq V$ up until time point $t$ . For $\beta\in V$ and $C\subseteq V$ , we assume that $N_{t}^{\beta}$ has an $\mathcal{F}_{t}^{C}$ -intensity, $\lambda_{t}^{\beta,C}$ . The stochastic process $\lambda_{t}^{\beta,C}$ is $\mathcal{F}_{t}^{C}$ -predictable and $N_{t}^{\beta}-\int_{0}^{t}\lambda_{s}^{\beta,C}\mathrm{d}s$ is a local $\mathcal{F}_{t}^{C}$ -martingale.

Definition 2.1 (Local independence).

Let $\alpha,\beta\in V$ and let $C\subseteq V$ . We say that $N_{t}^{\beta}$ is locally independent of $N_{t}^{\alpha}$ given $N_{t}^{C}$ (or simply, that $\beta$ is locally independent of $\alpha$ given $C$ ) if the local $\mathcal{F}_{t}^{C}$ -martingale as defined above is also a local $\mathcal{F}_{t}^{C\cup\{\alpha\}}$ -martingale. For $A,B,C\subseteq V$ , we say that $B$ is locally independent of $A$ given $C$ if $\beta$ is locally independent of $\alpha$ given $C$ for all $\alpha\in A$ and $\beta\in B$ , and we denote this by $A\not\rightarrow B\mid C$ .

Christgau et al. (2022) use the term conditional local independence instead of local independence which highlights the fact that Definition 2.1 is analogous to classical conditional independence of random variables. Intuitively, when $\beta$ is locally independent of $\alpha$ given $C$ , observation of the $\alpha$ -process over the interval $[0,t]$ does not provide additional information other than that contained in $\mathcal{F}_{t-}^{C}$ when trying to predict if there will be an event in process $\beta$ in the interval $[t,t+\mathrm{d}t)$ .

Local independence was first used by Schweder (1970) in composable Markov processes and later studied by Aalen (1987). Didelez (2000, 2008) described graphical modeling based on local independence. Other work on local independence Markov properties goes into more detail (Didelez, 2000, 2008; Mogensen et al., 2018; Mogensen and Hansen, 2022).

Definition 2.2 (Local independence graph).

We consider a multivariate counting process, $N_{t}=(N_{t}^{1},\ldots,N_{t}^{n})$ , as above, with $V=\{1,\ldots,n\}$ . Its local independence graph is the directed graph, $\mathcal{D}$ , on nodes $V$ such that

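$$\alpha\not\rightarrow\beta\ \text{in}\ \mathcal{D}\ \Leftrightarrow\ \alpha\not\rightarrow\beta\mid V\setminus\{\alpha\}$$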

for $\alpha,\beta\in V$ where $\alpha\not\rightarrow\beta$ indicates the absence of the directed edge from $\alpha$ to $\beta$ .

The statement $\{\alpha\}\not\rightarrow\{\beta\}\mid V\setminus\{\alpha\}$ denotes that $\beta$ is locally independent of $\alpha$ given $V\setminus\{\alpha\}$ , and above we have simply written the singletons $\{\alpha\}$ and $\{\beta\}$ as $\alpha$ and $\beta$ , respectively. The implication from left to right in Definition 2.2 is known as the pairwise Markov property. When this property holds, we see that the absence of an edge implies a local independence. The global Markov property allows one to read off more general local independences from a local independence graph using $\delta$ - or $\mu$ -separation (Definition 2.5). This is similar to other classes of graphical models (Maathuis et al., 2019). Several results state conditions for the equivalence of pairwise and global Markov properties (Didelez, 2008; Mogensen et al., 2018).

Local independence is a continuous-time analogue of Granger causality in discrete-time stochastic processes. The results of this paper also apply to Granger-causal graphs, see, e.g., the supplementary material of Mogensen and Hansen (2020) and Eichler (2007).

2.1 Alarm network

We describe an example application based on modeling how alarms propagate through a complex industrial system. Example data is in Figure 2. In this industrial system, a number of process variables (e.g., temperatures and pressures) are measured repeatedly. Each process variable corresponds to an alarm process, and if a measured process is outside the normal range of operations an event occurs in the corresponding alarm process. The stochastic system is described by a $12$ -dimensional counting process, $N_{t}^{V}$ ,

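$$V=\{\mathrm{A1},\mathrm{A2},\mathrm{A3},\mathrm{A4},\mathrm{A5},\mathrm{A6},\mathrm{A7},\mathrm{A8},\mathrm{A9},\mathrm{A10},\mathrm{H},\mathrm{E}\},$$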

observed over the interval $[0,1]$ . The coordinate processes in $V\setminus\{E\}$ are alarm processes. Process $\mathrm{E}$ represents exogenous events that feed into the system, e.g., changes in operating conditions, and this process is unobserved. Process $\mathrm{H}$ is an alarm process, but unavailable for some reason, and the observed processes are those in $V\setminus\{E,H\}$ . We assume that $\mathcal{D}$ is a local independence graph in the sense of Definition 2.2. Under some regularity conditions, this implies that the global Markov property is satisfied in this graph (Didelez, 2008) and therefore $\mu$ -separation (Definition 2.5) in the graph implies local independence.

The graph $\mathcal{G}$ in Figure 1 (the latent projection of $\mathcal{D}$ , see Section C) represents the observable local independences in the sense that for $A,B,C\subseteq V\setminus\{E,H\}$ it holds that $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{D}$ if and only if $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}$ . The underlying graph of the full system, $\mathcal{D}$ , is a directed graph while the latent projection is a directed mixed graph. In general, this larger class of graphs is needed to represent the local independences of partially observed multivariate stochastic processes.

Local independence asks the following question: if we are to predict whether processes $B$ will have an event in the immediate future and we already have the information in the past of processes $C$ , will the information in the past of processes $A$ add anything? This is illustrated visually in Figure 2 with $A=\{\mathrm{A1}\}$ , $B=\{\mathrm{A3}\}$ , and $C=\{\mathrm{A2},\mathrm{A6}\}$ . In this specific example, $\{\mathrm{A3}\}$ is $\mu$ -separated from $\{\mathrm{A1}\}$ given $\{\mathrm{A2},\mathrm{A6}\}$ in $\mathcal{G}$ and under the global Markov property this implies that the corresponding local independence holds. Therefore, the information in the past of process $\{\mathrm{A1}\}$ is superfluous when already accounting for the information in the past of processes $\{\mathrm{A2},\mathrm{A6}\}$ .

Several directed mixed graphs may induce the same $\mu$ -separations which means that they represent the same local independences. In this case, we say that they are Markov equivalent. The graph on the right in Figure 1 is the directed mixed equivalence graph of $\mathcal{G}$ . It represents the entire Markov equivalence class by indicating if an edge is in every Markov equivalent graph (solid), in no Markov equivalent graph (absent), or in only some Markov equivalent graphs (dashed). This is a useful representation, but it may not be a practical one for all applications as it leads to computationally hard problems. In this paper, we trade away some of the expressive power of Markov equivalence to obtain a more feasible notion of equivalence and we show that weaker notions of equivalence remain easily interpretable.

2.2 Graphs

A graph is a pair $(V,E)$ where $V$ is a finite node set and $E$ is an edge set. The edge set $E$ is a disjoint union, $E=E_{d}\mathbin{\dot{\cup}}E_{b}$ , where $E_{d}$ is a set of ordered pairs, corresponding to directed edges, $\rightarrow$ , and $E_{b}$ is a set of unordered pairs, corresponding to bidirected edges, $\leftrightarrow$ . We use $\alpha\leftrightarrow_{\mathcal{G}}\beta$ to denote that there is a bidirected edge between $\alpha$ and $\beta$ in the graph $\mathcal{G}$ , or just $\alpha\leftrightarrow\beta$ when it is clear from the context to which graph the statement refers, and we use $\alpha\rightarrow_{\mathcal{G}}\beta$ and $\alpha\rightarrow\beta$ analogously. The definition of the edge set implies that we allow multiple edges between a pair of nodes; however, the set of edges between two nodes $\alpha$ and $\beta$ is always a subset of $\{\alpha\rightarrow\beta,\alpha\leftarrow\beta,\alpha\leftrightarrow\beta\}$ . Moreover, $\alpha\leftrightarrow\beta$ and $\beta\leftrightarrow\alpha$ are equivalent while $\alpha\rightarrow\beta$ and $\alpha\leftarrow\beta$ are different edges. We emphasize that the edge $\alpha\leftrightarrow\beta$ is not shorthand for the two edges $\alpha\rightarrow\beta$ and $\alpha\leftarrow\beta$ , and the meaning of the bidirected edge is different from that of the two directed edges. This will be clear from subsequent definitions.

We use $\alpha\sim\beta$ to denote a generic edge of either type between $\alpha$ and $\beta$ , and we say that $\alpha$ and $\beta$ are adjacent in $\mathcal{G}$ when there exists an edge between them, $\alpha\sim\beta$ . When there are multiple nodes on each side of the edge, $\alpha_{1},\ldots,\alpha_{k}\sim\beta_{1},\ldots,\beta_{l}$ , this means that $\alpha_{i}\sim\beta_{j}$ for all $i=1,\ldots,k$ and $j=1,\ldots,l$ . We separate such statements by semicolons, $\alpha_{1},\ldots,\alpha_{k}\sim\beta_{1},\ldots,\beta_{l}$ ; $\gamma_{1},\ldots,\gamma_{r}\sim\delta_{1},\ldots,\delta_{s}$ . We use $\alpha\ *\!\!\rightarrow\beta$ to mean that $\alpha\rightarrow\beta$ or $\alpha\leftrightarrow\beta$ . We say that edges $\alpha\rightarrow\beta$ and $\alpha\leftrightarrow\beta$ have a head at $\beta$ , and that $\alpha\rightarrow\beta$ has a tail at $\alpha$ . If an edge $e$ is between $\alpha$ and $\beta$ and $\alpha=\beta$ , we say that $e$ is a loop.

We use $V$ as a generic node set and let $n$ denote the cardinality of $V$ , $n=|V|$ . The graphs described above are directed mixed graphs as formalized in the next definition.

Definition 2.3 (Directed mixed graph (DMG)).

We say that $\mathcal{G}=(V,E)$ is a directed mixed graph if its edge set, $E$ , consists of directed and bidirected edges.
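As a minimal sketch of how such a graph might be stored in code, directed edges can be kept as ordered pairs and bidirected edges as unordered pairs. The class name `DMG`, its fields, and its methods are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DMG:
    """Illustrative container for a directed mixed graph (V, E_d, E_b)."""
    nodes: set
    directed: set = field(default_factory=set)    # ordered pairs (a, b) meaning a -> b
    bidirected: set = field(default_factory=set)  # frozensets {a, b} meaning a <-> b

    def add_directed(self, a, b):
        self.directed.add((a, b))

    def add_bidirected(self, a, b):
        # frozenset((a, a)) collapses to {a}, which represents a bidirected loop
        self.bidirected.add(frozenset((a, b)))

    def edges_between(self, a, b):
        """Edges between a and b, as a subset of {a -> b, a <- b, a <-> b}."""
        found = set()
        if (a, b) in self.directed:
            found.add("->")
        if (b, a) in self.directed:
            found.add("<-")
        if frozenset((a, b)) in self.bidirected:
            found.add("<->")
        return found

    def is_directed_graph(self):
        """A DMG without bidirected edges is a directed graph (DG)."""
        return not self.bidirected


g = DMG(nodes={1, 2})
g.add_directed(1, 2)
g.add_bidirected(1, 2)
g.add_bidirected(2, 1)  # same unordered pair, so no new edge is added
```

Storing bidirected edges as unordered pairs mirrors the fact that $\alpha\leftrightarrow\beta$ and $\beta\leftrightarrow\alpha$ denote the same edge, while the two directed edges between a pair of nodes stay distinct.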

We say that a DMG is a directed graph (DG) if it has no bidirected edges. A walk between $\gamma_{1}$ and $\gamma_{l+1}$ is an alternating sequence of nodes, $\gamma_{1},\ldots,\gamma_{l+1}$ and edges $\sim_{1},\ldots,\sim_{l}$

$$\gamma_{1}\sim_{1}\gamma_{2}\sim_{2}\cdots\sim_{l}\gamma_{l+1}$$

such that for each $i=1,\ldots,l$ , $\sim_{i}$ is between $\gamma_{i}$ and $\gamma_{i+1}$ . Let $e_{i}$ denote the edge $\sim_{i}$ above. We will sometimes write a walk as $(\gamma_{1},e_{1},\gamma_{2},\ldots,e_{l},\gamma_{l+1})$ . A walk also specifies an orientation for each edge as one can otherwise not distinguish between $\alpha\leftarrow\alpha$ and $\alpha\rightarrow\alpha$ . We say that $\gamma_{i}$ , $1<i<l+1$ , is a collider if $\sim_{i-1}$ and $\sim_{i}$ both have head at $\gamma_{i}$ . Otherwise, we say that it is a noncollider. A node may be repeated on a walk, $\gamma_{i}=\gamma_{j}$ , $i\neq j$ , and may therefore occur both as a collider and as a noncollider on the same walk. Thus, the property of being a collider/noncollider pertains to the specific instance of the node on the walk. We say that $\gamma_{1}$ and $\gamma_{l+1}$ are endpoints of the walk. Note that endpoints of a walk are neither colliders nor noncolliders. We say that a walk is nontrivial if it has at least one edge. A walk on which no node is repeated is a path.

Let $\mathcal{G}=(V,E)$ . When $e$ is an edge we use $\mathcal{G}+e$ to denote the graph $(V,E\cup\{e\})$ , and we use $\mathcal{G}-e$ to denote the graph $(V,E\setminus\{e\})$ . We say that $\mathcal{G}$ is complete if it contains $\alpha\rightarrow\beta$ ; $\alpha\leftarrow\beta$ , and $\alpha\leftrightarrow\beta$ for all $\alpha,\beta\in V$ , and we say that $\mathcal{G}$ is empty if $E=\emptyset$ . We say that a walk between $\alpha$ and $\beta$ is directed from $\alpha$ to $\beta$ if every edge on the walk is directed and points towards (the last) $\beta$ , $\alpha\rightarrow\ldots\rightarrow\beta$ . We say that $\alpha$ is an ancestor of $\beta$ in $\mathcal{G}$ if there exists a directed walk from $\alpha$ to $\beta$ , and we allow this walk to be trivial (no edges) meaning that a node is always an ancestor of itself. We define $\mathrm{an}_{\mathcal{G}}(\alpha)$ , or simply $\mathrm{an}(\alpha)$ , to be the set of ancestors of $\alpha$ , and for $C\subseteq V$ we define $\mathrm{an}_{\mathcal{G}}(C)=\cup_{\alpha\in C}\mathrm{an}_{\mathcal{G}}(\alpha)$ . Note that $C\subseteq\mathrm{an}_{\mathcal{G}}(C)$ .

Definition 2.4 ( $\mu$ -connecting walk).

We say that a nontrivial walk in a DMG, $\mathcal{G}$ ,

$$\alpha\sim_{1}\gamma_{1}\sim_{2}\cdots\sim_{l}\beta$$

is $\mu$ -connecting from $\alpha$ to $\beta$ given $C$ if $\alpha\notin C$ , the edge $\sim_{l}$ has a head at $\beta$ , every collider is in $\mathrm{an}(C)$ and no noncollider is in $C$ .
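The conditions of Definition 2.4 can be checked mechanically for a single given walk. The following sketch does so under some assumptions: the function names, the string encoding of edge orientations, and the toy graph are illustrative choices, and the code assumes that the walk passed in actually exists in the graph.

```python
def ancestors(directed, C):
    """an(C): all nodes with a (possibly trivial) directed walk into C."""
    an = set(C)
    changed = True
    while changed:
        changed = False
        for a, b in directed:
            if b in an and a not in an:
                an.add(a)
                changed = True
    return an


def is_mu_connecting(nodes, edges, C, directed):
    """Check Definition 2.4 for one walk.

    nodes: [gamma_1, ..., gamma_{l+1}]; edges: l strings from
    {"->", "<-", "<->"}, read left to right. `directed` is the graph's set of
    pairs (a, b), meaning a -> b, and is used only to compute an(C).
    """
    C = set(C)
    if not edges or len(nodes) != len(edges) + 1:
        return False                      # walk must be nontrivial and well formed
    if nodes[0] in C:
        return False                      # alpha must not be in C
    if edges[-1] not in ("->", "<->"):
        return False                      # last edge needs a head at beta
    an_C = ancestors(directed, C)
    for i in range(1, len(nodes) - 1):
        head_from_left = edges[i - 1] in ("->", "<->")
        head_from_right = edges[i] in ("<-", "<->")
        if head_from_left and head_from_right:
            if nodes[i] not in an_C:      # collider outside an(C) blocks the walk
                return False
        elif nodes[i] in C:               # noncollider inside C blocks the walk
            return False
    return True


# Hypothetical toy graph: 1 <-> 2, 2 -> 3, 3 -> 2.
toy_directed = {(2, 3), (3, 2)}
print(is_mu_connecting([1, 2, 3], ["<->", "->"], set(), toy_directed))  # True
print(is_mu_connecting([1, 2, 3], ["<->", "->"], {2}, toy_directed))    # False
```

In the second call the middle node is a noncollider that lies in the conditioning set, so the same walk is no longer $\mu$ -connecting.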

The $\mu$ -connecting walks are used in the definition of $\mu$ -separation below which will help us connect DMGs to local independence. Mogensen et al. (2018) and Mogensen and Hansen (2020) defined $\mu$ -separation as an extension to $\delta$ -separation (Didelez, 2000, 2008). One can think of $\delta$ - and $\mu$ -separation as analogous to $d$ - and $m$ -separation in DAG-based graphical models (Pearl, 2009; Richardson and Spirtes, 2002; Richardson, 2003).

Definition 2.5 ( $\mu$ -separation).

Let $\mathcal{G}=(V,E)$ and let $A,B,C\subseteq V$ . We say that $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}$ if there is no $\mu$ -connecting walk from any $\alpha\in A$ to any $\beta\in B$ given $C$ . We write this as $A\perp_{\mu}B\mid C\ [\mathcal{G}]$ , or simply $A\perp_{\mu}B\mid C$ . We say that $C$ is a conditioning set.

By definition, $B$ is $\mu$ -separated from $A$ given $C$ if $A\subseteq C$ . One should also note that $\mu$ -separation is not symmetric in $A$ and $B$ in that $A\perp_{\mu}B\mid C\ [\mathcal{G}]$ does not imply $B\perp_{\mu}A\mid C\ [\mathcal{G}]$ , and neither is local independence. This lack of symmetry sets the graphical modeling of local independence apart from the classical graphical modeling of conditional independence (Lauritzen, 1996). In contrast to $m$ -separation, $\mu$ -separation cannot be characterized using only paths (Mogensen and Hansen, 2020). It is, however, possible to obtain a characterization using only routes which are a finite subset of all possible walks (see Definition D.1 in Appendix D or Mogensen and Hansen (2020)). The next example illustrates the concept of $\mu$ -connecting walks and $\mu$ -separation in a DMG.

Example 2.6.

We consider the DMG, $\mathcal{G}$ , in Figure 3. The walk $1\leftrightarrow 2\rightarrow 3$ is $\mu$ -connecting from $1$ to $3$ given $\emptyset$ . It is not $\mu$ -connecting from $1$ to $3$ given $\{2\}$ as $2$ is a noncollider. On the walk $1\leftrightarrow 2\leftarrow 2\rightarrow 3$ the node $2$ is a collider in its first instance and a noncollider in its second. The walk $3\rightarrow 2\leftrightarrow 1$ is $\mu$ -connecting from $3$ to $1$ given $\{2\}$ , however, the reverse walk, $1\leftrightarrow 2\leftarrow 3$ is not $\mu$ -connecting from $1$ to $3$ given $\{2\}$ .

We see that $3$ is $\mu$ -separated from $1$ given $\{2,3\}$ in $\mathcal{G}$ . On the other hand, $3$ is not $\mu$ -separated from $1$ given $\{2\}$ as the walk $1\leftrightarrow 2\leftarrow 3\rightarrow 3$ is $\mu$ -connecting.
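The statements in this example can be checked with a small search procedure. The sketch below is one possible way to decide $\mu$ -separation for singletons, not an algorithm taken from the paper: since the conditions in Definition 2.4 only concern a node together with the marks of its two neighbouring edges, it suffices to search over states of the form (node, mark of the edge used to arrive). The function names are assumptions, and the edge set is inferred from the walks named in Example 2.6, so Figure 3 may contain further edges.

```python
from collections import deque


def ancestors(directed, C):
    """an(C): nodes with a (possibly trivial) directed walk into C."""
    an, stack = set(C), list(C)
    while stack:
        v = stack.pop()
        for a, b in directed:
            if b == v and a not in an:
                an.add(a)
                stack.append(a)
    return an


def steps(directed, bidirected, u):
    """Yield (mark at u, neighbour v, mark at v) for each way of leaving u."""
    for a, b in directed:
        if a == u:
            yield "tail", b, "head"      # u -> b
        if b == u:
            yield "head", a, "tail"      # u <- a
    for a, b in bidirected:
        if a == u:
            yield "head", b, "head"      # u <-> b
        if b == u and a != b:
            yield "head", a, "head"      # u <-> a


def mu_separated(directed, bidirected, alpha, beta, C):
    """True if beta is mu-separated from alpha given C (alpha, beta single nodes)."""
    C = set(C)
    if alpha in C:
        return True                      # no walk from alpha can be mu-connecting
    an_C = ancestors(directed, C)
    frontier = deque((v, m) for _, v, m in steps(directed, bidirected, alpha))
    seen = set(frontier)
    while frontier:
        u, mark_in = frontier.popleft()
        if u == beta and mark_in == "head":
            return False                 # found a mu-connecting walk
        for mark_out, v, mark_next in steps(directed, bidirected, u):
            collider = mark_in == "head" and mark_out == "head"
            if (collider and u not in an_C) or (not collider and u in C):
                continue                 # u blocks the walk at this instance
            if (v, mark_next) not in seen:
                seen.add((v, mark_next))
                frontier.append((v, mark_next))
    return True


# Edges inferred from the walks in Example 2.6 (assumption; Figure 3 may differ).
directed = {(2, 2), (2, 3), (3, 2), (3, 3)}
bidirected = {(1, 2)}
print(mu_separated(directed, bidirected, 1, 3, {2, 3}))  # True
print(mu_separated(directed, bidirected, 1, 3, {2}))     # False
```

There are at most $2n$ states, so the search terminates even though walks may repeat nodes.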

2.3 Independence models and Markov equivalence

For a fixed stochastic process, $X_{t}=(X_{t}^{1},\ldots,X_{t}^{n})^{T}$ , and a DMG, $\mathcal{G}=(V,E)$ , both local independence and $\mu$ -separation can be thought of as ternary relations on a finite set $\mathbb{P}(V)\times\mathbb{P}(V)\times\mathbb{P}(V)$ where $V=\{1,2,\ldots,n\}$ and $\mathbb{P}(\cdot)$ denotes power set. We use $\mathcal{P}$ to denote $\mathbb{P}(V)\times\mathbb{P}(V)\times\mathbb{P}(V)=\{(A,B,C):A,B,C\subseteq V\}$ and we define an abstract independence model, $\mathcal{I}$ , to be a subset of $\mathcal{P}$ . Thus, $\mathcal{I}$ is a collection of triples $(A,B,C)$ such that $A,B,C\subseteq V$ . We say that $\mathcal{I}$ is an independence model over $V$ . When $A,B$ , or $C$ are singletons, we will often omit the set notation and write, e.g., $(\alpha,\beta,C)$ instead of $(\{\alpha\},\{\beta\},C)$ .

We use $\mathcal{I}(\mathcal{G})$ to denote the independence model induced by $\mathcal{G}$ , that is, the set of $\mu$ -separations that are true in $\mathcal{G}$ , $\mathcal{I}(\mathcal{G})=\{(A,B,C)\in\mathcal{P}:A\perp_{\mu}B\mid C\ [\mathcal{G}]\}$ . Similarly, an independence model can be defined as the set of local independences that hold in the distribution of a multivariate stochastic process. We say that an independence model, $\mathcal{I}$ , is graphical, if there exists a DMG, $\mathcal{G}$ , such that $\mathcal{I}=\mathcal{I}(\mathcal{G})$ .

Definition 2.7 (Markov equivalence).

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs. We say that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent if for all $A,B,C\subseteq V$ it holds that $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}_{1}$ if and only if $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}_{2}$ . Equivalently, $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent if $\mathcal{I}(\mathcal{G}_{1})=\mathcal{I}(\mathcal{G}_{2})$ . We use $[\mathcal{G}_{1}]$ to denote the Markov equivalence class of $\mathcal{G}_{1}$ .
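On very small graphs, Definition 2.7 can be checked by brute force. The sketch below is an illustration only and is not the paper's method: it compares $\mu$ -separation statements for every pair of single nodes and every conditioning set, which suffices because Definition 2.5 reduces set-valued statements to single nodes $\alpha\in A$ and $\beta\in B$ . The running time is exponential in $|V|$ . All function names and the example graphs are assumptions.

```python
from itertools import combinations


def mu_sep(directed, bidirected, alpha, beta, C):
    """Decide mu-separation of beta from alpha given C by searching over
    (node, does the arriving edge have a head here?) states."""
    if alpha in C:
        return True
    an = set(C)
    grew = True
    while grew:                               # an(C) by fixed-point iteration
        grew = False
        for a, b in directed:
            if b in an and a not in an:
                an.add(a)
                grew = True

    def steps(u):                             # (head at u?, next node, head at next?)
        out = []
        for a, b in directed:
            if a == u:
                out.append((False, b, True))
            if b == u:
                out.append((True, a, False))
        for a, b in bidirected:
            if a == u:
                out.append((True, b, True))
            if b == u and a != b:
                out.append((True, a, True))
        return out

    todo = [(v, h) for _, v, h in steps(alpha)]
    seen = set(todo)
    while todo:
        u, head_in = todo.pop()
        if u == beta and head_in:
            return False                      # mu-connecting walk found
        for head_out, v, head_next in steps(u):
            collider = head_in and head_out
            if (collider and u not in an) or (not collider and u in C):
                continue
            if (v, head_next) not in seen:
                seen.add((v, head_next))
                todo.append((v, head_next))
    return True


def markov_equivalent(nodes, g1, g2):
    """Compare all triples (alpha, beta, C); each graph is (directed, bidirected)."""
    nodes = sorted(nodes)
    for r in range(len(nodes) + 1):
        for C in map(set, combinations(nodes, r)):
            for alpha in nodes:
                for beta in nodes:
                    if mu_sep(*g1, alpha, beta, C) != mu_sep(*g2, alpha, beta, C):
                        return False
    return True


# Hypothetical two-node graphs that differ by the bidirected edge 1 <-> 2.
g_a = ({(1, 1), (2, 2), (1, 2)}, set())
g_b = ({(1, 1), (2, 2), (1, 2)}, {(1, 2)})
print(markov_equivalent({1, 2}, g_a, g_b))  # False
```

In this toy pair, node $1$ is $\mu$ -separated from node $2$ given $\{1\}$ in the first graph but not in the second, where the added bidirected edge gives a $\mu$ -connecting walk.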

Example 2.8.

We return to the graph, $\mathcal{G}$ , in Figure 3. By definition, its independence model, $\mathcal{I}(\mathcal{G})$ , consists of all triples $(A,B,C)$ such that $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}$ . It is enough to consider $(A,B,C)$ such that $A$ and $B$ are singletons and $A\not\subseteq C$ as these characterize $\mathcal{I}(\mathcal{G})$ (Proposition 4.11). We see that $3$ is $\mu$ -separated from $1$ given $\{2,3\}$ , and this is the only $\mu$ -separation of this type in the graph.

2.3.1 Extremal elements of sets of DMGs

Let $\mathbb{G}=\{\mathcal{G}_{1}=(V,E_{1}),\ldots,\mathcal{G}_{l}=(V,E_{l})\}$ be a set of DMGs on a common node set, $V$ . If $E_{i}\subseteq E_{j}$ , we write $\mathcal{G}_{i}\subseteq\mathcal{G}_{j}$ , and we say that $\mathcal{G}_{i}$ is a subgraph of $\mathcal{G}_{j}$ , and that $\mathcal{G}_{j}$ is a supergraph of $\mathcal{G}_{i}$ . We write $\mathcal{G}_{i}\subsetneq\mathcal{G}_{j}$ when $E_{i}\subseteq E_{j}$ and $E_{i}\neq E_{j}$ . The following definitions are common set-theoretic notions when considering the set $\mathbb{G}$ with the partial order, $\subseteq$ .

Definition 2.9 (Maximal element, DMG).

We say that $\mathcal{G}\in\mathbb{G}$ is a maximal element of $\mathbb{G}$ if there is no $\bar{\mathcal{G}}\in\mathbb{G}$ , $\bar{\mathcal{G}}\neq{\mathcal{G}}$ , such that $\mathcal{G}\subseteq\bar{\mathcal{G}}$ .

Definition 2.10 (Greatest element, DMG).

We say that $\mathcal{G}\in\mathbb{G}$ is a greatest element of $\mathbb{G}$ if $\bar{\mathcal{G}}\subseteq{\mathcal{G}}$ for all $\bar{\mathcal{G}}\in\mathbb{G}$ .

When a greatest element exists, it is unique. It is also maximal, and it is the only maximal element. In this paper, we are mostly concerned with maximal and greatest elements; however, we also define minimal and least elements of sets of DMGs. We say that $\mathcal{G}\in\mathbb{G}$ is a minimal element of $\mathbb{G}$ if there is no $\bar{\mathcal{G}}\in\mathbb{G}$ , $\bar{\mathcal{G}}\neq{\mathcal{G}}$ , such that $\bar{\mathcal{G}}\subseteq\mathcal{G}$ . We say that $\mathcal{G}\in\mathbb{G}$ is a least element of $\mathbb{G}$ if ${\mathcal{G}}\subseteq\bar{\mathcal{G}}$ for all $\bar{\mathcal{G}}\in\mathbb{G}$ . The set $\mathbb{G}$ will most often be an equivalence class in our usage of the above terms, and we sometimes simply say that $\mathcal{G}$ is a maximal/minimal/greatest/least element when the equivalence class is understood from the context.

Example 2.11.

If we consider the set of graphs $\mathbb{G}=\{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}\}$ in Figure 4, we see that graph $\mathbf{D}$ is the greatest element of $\mathbb{G}$ as every graph in $\mathbb{G}$ is a subgraph of $\mathbf{D}$ , and therefore $\mathbf{D}$ is also the unique maximal element of $\mathbb{G}$ . The smaller set $\bar{\mathbb{G}}=\{\mathbf{A},\mathbf{B},\mathbf{C}\}$ does not have a greatest element and graphs $\mathbf{B}$ and $\mathbf{C}$ are maximal elements of $\bar{\mathbb{G}}$ .
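The order-theoretic notions in Definitions 2.9 and 2.10 can be illustrated with a minimal Python sketch. The edge sets below are hypothetical stand-ins (they are not the graphs of Figure 4), chosen only so that the four-element set has a greatest element while the three-element set has two maximal elements. The helper names `maximal_elements` and `greatest_element` are illustrative as well.

```python
# Each DMG on a fixed node set is represented by its edge set.
# These edge sets are hypothetical; they only mimic the pattern of Example 2.11.
A = frozenset({"1->2"})
B = frozenset({"1->2", "2->3"})
C = frozenset({"1->2", "1<->3"})
D = frozenset({"1->2", "2->3", "1<->3"})


def maximal_elements(graphs):
    """Graphs that are not a proper subgraph of any other graph in the collection."""
    return [g for g in graphs if not any(g < h for h in graphs)]


def greatest_element(graphs):
    """The graph that contains every graph in the collection, or None if none does."""
    for g in graphs:
        if all(h <= g for h in graphs):
            return g
    return None
```

With these stand-ins, `greatest_element([A, B, C, D])` returns `D`, whereas `greatest_element([A, B, C])` returns `None` and `maximal_elements([A, B, C])` returns `B` and `C`.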

2.3.2 Representation of Markov equivalence classes

We introduce a central result from Mogensen and Hansen (2020). They show that every Markov equivalence class has a greatest element. Section 5 extends this theorem to weak equivalence relations.

Theorem 2.12 (Greatest element of a Markov equivalence class; Mogensen and Hansen, 2020).

Let $\mathcal{G}$ be a DMG, and let $[\mathcal{G}]$ be its Markov equivalence class. There exists $\mathcal{N}\in[\mathcal{G}]$ such that for all $\bar{\mathcal{G}}\in[\mathcal{G}]$ the edge set of $\bar{\mathcal{G}}$ is a subset of the edge set of $\mathcal{N}$ .

The next example illustrates the utility of this theorem.

Example 2.13.

Graphs $\mathbf{A}$ - $\mathbf{D}$ in Figure 4 constitute a Markov equivalence class, $[\mathcal{G}]$ (for simplicity, we assume that all loops are present, and do not consider Markov equivalent graphs obtained by removing loops). Graph $\mathbf{D}$ is the greatest element of $[\mathcal{G}]$ in the sense that all Markov equivalent graphs are subgraphs of graph $\mathbf{D}$ . In other words, if a graph in the Markov equivalence class contains the edge $e$ , then $e$ is also in the graph $\mathbf{D}$ . This means that we can represent the entire Markov equivalence class using graph $\mathbf{E}$ . Its edges are the same as in the greatest element. Edges are solid in graph $\mathbf{E}$ if they are in every Markov equivalent graph and they are dashed if they are in some Markov equivalent graphs, but not in others. Absent edges are not in any graph in the Markov equivalence class. Therefore, graph $\mathbf{E}$ represents a summary of the information the Markov equivalence class provides on each edge. Moreover, Theorem 2.12 implies that every Markov equivalence class contains a greatest element, and therefore this is a general approach to representing and understanding Markov equivalence classes (Mogensen and Hansen, 2020).
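The solid/dashed summary described in this example can be sketched in a few lines of Python when an equivalence class is given explicitly as a list of edge sets. This is only an illustration: the function name `summarize_class` and the edge labels are assumptions, and the input below is a hypothetical collection that is not claimed to be an actual Markov equivalence class.

```python
def summarize_class(equivalence_class):
    """Split edges into 'solid' (in every graph) and 'dashed' (in some, not all)."""
    graphs = [frozenset(g) for g in equivalence_class]
    union = frozenset().union(*graphs)
    solid = frozenset.intersection(*graphs)
    dashed = union - solid
    # For a Markov equivalence class, Theorem 2.12 guarantees that the union
    # of all edge sets is itself a member; for arbitrary input we check this.
    greatest = union if union in graphs else None
    return {"greatest": greatest, "solid": solid, "dashed": dashed}


# Hypothetical edge sets, used only to exercise the function.
example_class = [
    {"1->2"},
    {"1->2", "2->3"},
    {"1->2", "1<->3"},
    {"1->2", "2->3", "1<->3"},
]
summary = summarize_class(example_class)
```

Here `summary["solid"]` contains the single edge shared by all four graphs, and `summary["dashed"]` contains the two edges that appear in some graphs only.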

3 Hardness of marginalized local independence graphs

In this section, we argue that certain computational problems in relation to DMGs and Markov equivalence are hard. For this purpose, we give a very short introduction to the concepts from complexity theory that we will need. A decision problem is in coNP if no-instances have certificates which can be evaluated in polynomial time. For instance, if $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are not Markov equivalent (they are a no-instance when deciding Markov equivalence) a triple $(A,B,C)$ such that $B$ is $\mu$ -separated from $A$ given $C$ in $\mathcal{G}_{1}$ , but not in $\mathcal{G}_{2}$ , may function as a certificate as one can check this specific separation in both graphs and conclude that they are not Markov equivalent. A decision problem is in P if it can be solved by a deterministic Turing machine in polynomial time. A decision problem is coNP-hard if it is at least as hard as any problem in coNP, and it is coNP-complete if it is coNP-hard and in coNP. It is generally believed that P $\neq$ coNP in which case there is no polynomial-time algorithm that can solve a coNP-hard problem. The complement of a decision problem arises from interchanging yes and no. A decision problem is in coNP if and only if its complement is in NP. We now introduce some decision problems relating to DMGs.

Decision problem 3.1 (Markov equivalence in DMGs).

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

The development in this paper is partly motivated by the fact that the above decision problem is hard (Corollary 3.3). We can formulate a restricted version of the problem in which the pair of graphs for which to decide Markov equivalence only differ by a single (bidirected or directed) edge, as formalized in Decision problems A.1 (bidirected) and A.2 (directed). These problems are also hard and we prove this in Theorem 3.2. Corollary 3.3 follows immediately from this theorem.

Theorem 3.2.

Let $\mathcal{G}$ be a DMG and let $e$ denote an edge. Deciding Markov equivalence of $\mathcal{G}$ and $\mathcal{G}+e$ is coNP-complete (Decision problems A.1 and A.2).

Corollary 3.3.

Deciding Markov equivalence of DMGs is coNP-complete (Decision problem 3.1).

Decision problem A.1 has been proven to be coNP-complete (PhD thesis, Mogensen (2020b)) and this was used to obtain the result in Corollary 3.3. We will give a slightly different proof to make the generalization to the proof in the sparse setting more transparent and to also prove that Decision problem A.2 is coNP-complete. The graphs $\mathcal{G}$ , $\mathcal{G}_{1}$ , and $\mathcal{G}_{2}$ used in the proof of Theorem 3.2 are clearly not sparse, that is, for the size of the node set going to infinity there are nodes with unbounded connectivity (formal definitions of node connectivity are in Subsection 3.1 and Section B). In the next section, we will show that the hardness results remain true under certain sparsity assumptions. We include the proof of the non-sparse result in Theorem 3.2 to illustrate the technique as the more general result can be proved using a similar approach, even if some additional ideas are needed.

Mogensen and Hansen (2022) showed that deciding $\mu$ -separation Markov equivalence of so-called directed correlation graphs (cDGs) is coNP-complete, though only in the non-sparse case. Their proof of coNP-hardness uses a reduction from 3DNF tautology as does the proof of Theorem 3.2. However, their proof is specific to cDGs as it uses a characterization of Markov equivalence which holds in cDGs, but not in DMGs (Mogensen and Hansen, 2022). While a DMG represents the local independences of a partially observed multivariate stochastic process, i.e., some coordinate processes are unobserved, a cDG represents a multivariate stochastic process driven by correlated noise. Mogensen and Hansen (2022) compared DMGs and cDGs further and showed that a Markov equivalence class of cDGs need not have a greatest element.

Proof.

We consider $n$ Boolean variables, $x_{1},\ldots,x_{n}$ , and a Boolean formula, $H$ ,

[TABLE]

such that $z_{i}^{k}$ is a literal of a variable, that is, either $x_{l}$ (a positive literal) or $\neg x_{l}$ (a negative literal). We assume $H$ to be in 3DNF form (each conjunction has at most three literals). $N$ is the number of conjunctions in the formula and $n$ is the number of variables. We define $n_{j}$ to be the number of factors in the $j$ ’th conjunction. Deciding whether $H$ is a tautology (evaluates to true for all inputs) is known to be coNP-complete (Garey and Johnson, 1979), and we will use a reduction from this problem to show coNP-hardness of Decision problems A.1 and A.2.
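To make the source problem of the reduction concrete, the following Python sketch checks the tautology property by brute force over all $2^{n}$ truth assignments. It is not part of the proof and it runs in exponential time; the encoding of a literal as a pair `(variable_index, is_positive)` and the names `evaluates_true` and `is_tautology` are assumptions made for this illustration.

```python
from itertools import product


def evaluates_true(formula, assignment):
    """Evaluate a DNF formula; a literal is (variable_index, is_positive)."""
    return any(
        all(assignment[var] == positive for var, positive in conjunction)
        for conjunction in formula
    )


def is_tautology(formula, n):
    """Check all 2**n truth assignments; exponential in n, for illustration only."""
    return all(
        evaluates_true(formula, assignment)
        for assignment in product([False, True], repeat=n)
    )


# H1 = x_1 or (not x_1): a tautology with N = 2 conjunctions and n = 1 variable.
H1 = [[(0, True)], [(0, False)]]
# H2 = (x_1 and x_2) or (not x_1 and x_3): false when x_1 = 1 and x_2 = 0.
H2 = [[(0, True), (1, True)], [(0, False), (2, True)]]
```

A falsifying assignment such as $x_{1}=1$, $x_{2}=0$ for `H2` is exactly the kind of polynomially checkable certificate of a no-instance that places the tautology problem in coNP.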

We construct three graphs, $\mathcal{G}=(V,E)$ , $\mathcal{G}_{1}=(V,E_{1})$ , and $\mathcal{G}_{2}=(V,E_{2})$ from $H$ such that $\mathcal{G}_{1}=\mathcal{G}+e_{b}$ and $\mathcal{G}_{2}=\mathcal{G}+e_{d}$ where $e_{b}$ is a bidirected edge and $e_{d}$ is a directed edge. We then show that $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology and that $\mathcal{G}$ and $\mathcal{G}_{2}$ are Markov equivalent if and only if $H$ is a tautology. The second statement is obtained by showing that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are always Markov equivalent.

First, we define the set $V^{-}$ .

[TABLE]

We define the node set $V=\{\alpha,\beta,\varepsilon,\phi\}\cup V^{-}\cup\{\nu_{\beta}^{\rho},\nu_{\varepsilon}^{\rho}\}_{\rho\in V^{-}}$ and $V$ is the node set of all three graphs $\mathcal{G}=(V,E)$ , $\mathcal{G}_{1}=(V,E_{1})$ , and $\mathcal{G}_{2}=(V,E_{2})$ . Note that each literal, $z_{i}^{k}$ , corresponds to two nodes, $\phi_{i}^{k}$ and $\bar{\phi}_{i}^{k}$ .

We now define the edge set $E$ . We add $\gamma\rightarrow\bar{\gamma}$ ; $\gamma\leftarrow\bar{\gamma}$ ; $\delta\rightarrow\bar{\delta}$ ; $\delta\leftarrow\bar{\delta}$ . For each node $\rho\in V^{-}$ , we add edges $\rho\rightarrow\nu_{\varepsilon}^{\rho},\nu_{\beta}^{\rho}$ and $\rho\leftarrow\nu_{\varepsilon}^{\rho},\nu_{\beta}^{\rho}$ . We also add edges $\varepsilon\leftrightarrow\nu_{\varepsilon}^{\rho}$ ; $\beta\leftrightarrow\nu_{\beta}^{\rho}$ . We add edges $\nu_{\varepsilon}^{\rho}\rightarrow\nu_{\beta}^{\rho}$ and $\nu_{\varepsilon}^{\rho}\leftarrow\nu_{\beta}^{\rho}$ for each $\rho\in V^{-}$ . We also add all directed and bidirected loops, $\rho\sim\rho$ , for all $\rho\in V$ . We add edges $\alpha\leftrightarrow\gamma,\bar{\gamma}$ ; $\varepsilon\leftrightarrow\bar{\delta}$ ; $\beta\leftrightarrow\delta$ , and $\varepsilon\rightarrow\beta$ ; $\varepsilon\leftarrow\beta$ as well as $\phi\leftrightarrow\varepsilon,\beta$ . For each $k=1,\ldots,N$ , we add $\gamma\leftrightarrow\phi_{1}^{k}\leftrightarrow\ldots\leftrightarrow\phi_{n_{k}}^{k}\leftrightarrow\delta$ and $\bar{\gamma}\leftrightarrow\bar{\phi}_{1}^{k}\leftrightarrow\ldots\leftrightarrow\bar{\phi}_{n_{k}}^{k}\leftrightarrow\bar{\delta}$ . We add $\bar{\gamma}\leftrightarrow\chi_{1},\lambda_{1}$ and $\bar{\delta}\leftrightarrow\chi_{n},\lambda_{n}$ . For each $i=1,\ldots,n-1$ , we add $\chi_{i},\lambda_{i}\leftrightarrow\chi_{i+1},\lambda_{i+1}$ . Finally, we add for each $l=1,\ldots,n$ a directed cycle containing $\chi_{l}$ as well as every $\phi_{i}^{k}$ and $\bar{\phi}_{i}^{k}$ corresponding to a positive literal of the variable $x_{l}$ , and we add a directed cycle containing $\lambda_{l}$ as well as every $\phi_{i}^{k}$ and $\bar{\phi}_{i}^{k}$ corresponding to a negative literal of the variable $x_{l}$ . This defines the edge set $E$ , $\mathcal{G}=(V,E)$ .
We obtain $\mathcal{G}_{1}=(V,E_{1})$ from $\mathcal{G}$ by adding the edge $\varepsilon\leftrightarrow\beta$ , that is, $E_{1}=E\cup\{\varepsilon\leftrightarrow\beta\}$ . Note that $\rho_{1}$ is an ancestor of $\rho_{2}$ in $\mathcal{G}$ if and only if $\rho_{1}$ is an ancestor of $\rho_{2}$ in $\mathcal{G}_{1}$ . We obtain $\mathcal{G}_{2}=(V,E_{2})$ from $\mathcal{G}$ by adding the edge $\phi\rightarrow\varepsilon$ , $E_{2}=E\cup\{\phi\rightarrow\varepsilon\}$ .

We will first argue that $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology. Assume first that $H$ is a tautology and consider a $\mu$ -connecting walk in $\mathcal{G}_{1}$ ,

[TABLE]

Using the fact that all loops are included, we can always find a $\mu$ -connecting walk such that the edge $\varepsilon\leftrightarrow\beta$ occurs at most once and we assume that this is the case. We can assume that $\rho_{1}$ only occurs once on the walk. If $\rho_{1}\neq\alpha$ , there is a $\mu$ -connecting walk from $\rho_{1}$ to $\beta$ with a head at $\beta$ : If $\rho_{1}\in V^{-}$ , or $\rho_{1}=\nu_{\varepsilon}^{\rho}$ for some $\rho\in V^{-}$ , either $\rho_{1}\rightarrow\nu_{\beta}^{\rho}\leftrightarrow\beta$ or $\rho_{1}\leftarrow\nu_{\beta}^{\rho}\leftrightarrow\beta$ is connecting and can be composed with the subwalk from $\beta$ to $\rho_{m}$ to obtain a connecting walk in $\mathcal{G}$ . If $\rho_{1}=\varepsilon,\beta,\phi$ or $\rho_{1}=\nu_{\beta}^{\rho}$ for some $\rho\in V^{-}$ , then $\rho_{1}\ *\!\!\rightarrow\beta$ is in $\mathcal{G}$ . Assume instead that $\rho_{1}=\alpha$ ,

[TABLE]

and consider the subwalk from $\alpha$ to $\varepsilon$ , $\omega_{1}$ . If there is a noncollider on $\omega_{1}$ , say $\psi$ , then $\psi\notin C$ and $\psi\in\mathrm{an}(C)$ . We use this to argue that we can always find a walk from $\psi$ to $\beta$ such that when concatenated with the subwalk from $\alpha$ to $\psi$ we obtain a $\mu$ -connecting walk from $\alpha$ to $\beta$ . If $\psi\in V^{-}$ , we can find a connecting walk from $\alpha$ to $\beta$ with a head at $\beta$ by concatenating the subwalk from $\alpha$ to $\psi$ with $\psi\rightarrow\nu_{\beta}^{\psi}\leftrightarrow\beta$ if $\nu_{\beta}^{\psi}\in C$ and $\psi\leftarrow\nu_{\beta}^{\psi}\leftrightarrow\beta$ if $\nu_{\beta}^{\psi}\notin C$ . If $\psi=\nu_{\varepsilon}^{\rho}$ for some $\rho$ , we can concatenate with $\psi\rightarrow\nu_{\beta}^{\rho}\leftrightarrow\beta$ or $\psi\leftarrow\nu_{\beta}^{\rho}\leftrightarrow\beta$ . If $\psi=\nu_{\beta}^{\rho}$ for some $\rho$ , we can concatenate with $\psi\leftrightarrow\beta$ . If $\psi=\varepsilon$ , then we can replace $\varepsilon\leftrightarrow\beta$ with $\varepsilon\rightarrow\beta$ to obtain a connecting walk in $\mathcal{G}$ . If $\psi=\beta$ , we can concatenate with $\psi\rightarrow\beta$ . If $\psi=\phi$ , we can concatenate with $\psi\leftrightarrow\beta$ . Finally, $\psi=\alpha$ is not possible as $\rho_{1}=\alpha$ only occurs once on the original walk.

Assume now that $\omega_{1}$ is a collider walk. If it goes through a $\bar{\phi}$ -segment, then the corresponding $\phi$ -segment is open (note that $\gamma$ and $\bar{\gamma}$ are in a directed cycle and so are $\delta$ and $\bar{\delta}$ ). If it goes through the $\chi$ - $\lambda$ -segment, then for each $l=1,\ldots,n$ either $\chi_{l}\in\mathrm{an}(C)$ or $\lambda_{l}\in\mathrm{an}(C)$ . Let $x_{l}=1$ if $\chi_{l}\in\mathrm{an}(C)$ and $x_{l}=0$ otherwise. The formula $H$ is a tautology and therefore it evaluates to $1$ under this assignment of truth values. Thus, there exists $k$ such that $z_{i}^{k}=1$ for $i=1,\ldots,n_{k}$ . Assume first that $z_{i}^{k}$ is a positive literal corresponding to the variable $x_{l}$ . In this case, $x_{l}=1$ and $\chi_{l}\in\mathrm{an}(C)$ , and therefore $\phi_{i}^{k}\in\mathrm{an}(C)$ . Assume instead that $z_{i}^{k}$ is a negative literal corresponding to the variable $x_{l}$ . In this case, $x_{l}=0$ and $\chi_{l}\notin\mathrm{an}(C)$ which means that $\lambda_{l}\in\mathrm{an}(C)$ and $\phi_{i}^{k}\in\mathrm{an}(C)$ . This means that the walk $\alpha\leftrightarrow\gamma\leftrightarrow\phi_{1}^{k}\leftrightarrow\ldots\leftrightarrow\phi_{n_{k}}^{k}\leftrightarrow\delta\leftrightarrow\beta$ is open for some $k=1,\ldots,N$ and this gives us a $\mu$ -connecting walk from $\alpha$ to $\rho_{m}$ in $\mathcal{G}$ also in this case.

If instead

[TABLE]

then the same arguments hold.

On the other hand, say that $H$ is not a tautology, and consider an assignment, $A$ , of truth values such that $H$ evaluates to false. Define the set

[TABLE]

In $\mathcal{G}_{1}$ , there is an open, bidirected walk from $\alpha$ to $\beta$ through the $\chi$ - $\lambda$ segment, and we see that $\beta$ is not $\mu$ -separated from $\alpha$ given $C$ . On the other hand, consider a walk between $\alpha$ and $\beta$ in $\mathcal{G}$ . The first and last edges on a connecting walk from $\alpha$ to $\beta$ given $C$ must be bidirected and as $C=\mathrm{an}(C)$ , this means that the walk must be a collider walk to be $\mu$ -connecting from $\alpha$ to $\beta$ given $C$ , and it must go through $\delta$ . If $\phi_{i}^{k}$ corresponds to a positive literal and it is open (i.e., in $\mathrm{an}(C)$ ) then the corresponding variable is $1$ in $A$ and $z_{i}^{k}=1$ . If it corresponds to a negative literal and it is open, then the corresponding variable is $0$ in $A$ and $z_{i}^{k}=1$ . This means that each $\phi_{i}^{k}$ segment must be closed in at least one node as the assignment $A$ evaluates to $0$ . Therefore, $\beta$ is $\mu$ -separated from $\alpha$ given $C$ in $\mathcal{G}$ , and we conclude that $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology.

We now show that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent. Take any $\mu$ -connecting walk in $\mathcal{G}_{1}$ . Any occurrence of $\varepsilon\leftrightarrow\beta$ can be replaced by either $\beta\leftrightarrow\phi\rightarrow\varepsilon$ or $\beta\leftrightarrow\phi\leftrightarrow\varepsilon$ , depending on whether $\phi\in C$ . The resulting walk is present and connecting in $\mathcal{G}_{2}$ . On the other hand, consider a $\mu$ -connecting walk from $\rho_{1}$ to $\rho_{m}$ given $C$ in $\mathcal{G}_{2}$ . We start by removing all non-endpoint occurrences of $\phi$ . Say

[TABLE]

If $\rho_{i}=\beta$ , then $\rho_{i}\leftrightarrow\phi\rightarrow\varepsilon$ can be replaced by $\rho_{i}\leftrightarrow\varepsilon$ . If $\rho_{i}=\phi$ or if $\rho_{i}=\varepsilon$ , we can remove the cycle (if $\varepsilon=\rho_{m}$ , we may need to concatenate with $\varepsilon\rightarrow\varepsilon$ to obtain a $\mu$ -connecting walk after removing a cycle). If instead

[TABLE]

we do the same depending on $\rho_{j}$ (if $\phi=\rho_{m}$ then we concatenate the subwalk from $\rho_{1}$ to $\varepsilon$ with $\varepsilon\leftrightarrow\phi$ ). This gives us a $\mu$ -connecting walk in $\mathcal{G}_{2}$ such that $\phi$ is not a non-endpoint node. Finally, if $\phi\rightarrow\varepsilon$ is still on the walk, we must have $\rho_{1}=\phi$ and this edge can be substituted by $\phi\leftrightarrow\varepsilon$ . The resulting walk is present in $\mathcal{G}_{1}$ . Every collider is different from $\phi$ and this means that it is in $\mathrm{an}_{\mathcal{G}_{1}}(C)$ as well. Therefore, this walk is $\mu$ -connecting in $\mathcal{G}_{1}$ . It follows that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent (regardless of whether $H$ is a tautology). Therefore, $H$ is a tautology if and only if $\mathcal{G}$ and ${\mathcal{G}}_{2}$ are Markov equivalent.

The reduction from 3DNF tautology to Markov equivalence of $\mathcal{G}$ and $\mathcal{G}_{1}$ (or of $\mathcal{G}$ and $\mathcal{G}_{2}$ ) is done in polynomial time in the number of conjunctions and it follows that Decision problems A.1 and A.2 are coNP-hard. Given a triple $(A,B,C)$ , one can decide $\mu$ -separation in polynomial time. If two graphs are not Markov equivalent, then there exists a triple $(A,B,C)$ such that $\mu$ -separation holds in one and not in the other. This is a polynomially-sized certificate, and this means that these problems are in coNP, thus, coNP-complete. ∎
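The certificate check in the last step can be sketched in Python. The sketch assumes the definition of a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ that the paper introduces before this section: $\alpha\notin C$ , the final edge has a head at $\beta$ , every collider is in $\mathrm{an}(C)$ , and no noncollider is in $C$ . The graph encoding (lists of directed and bidirected node pairs) and all function names are assumptions of this illustration, not the paper's algorithm. The search visits at most $2|V|+1$ states, so it runs in polynomial time.

```python
from collections import deque


def ancestors(nodes, directed):
    """Nodes with a directed path into `nodes`, including `nodes` themselves."""
    result, stack = set(nodes), list(nodes)
    while stack:
        child = stack.pop()
        for a, b in directed:
            if b == child and a not in result:
                result.add(a)
                stack.append(a)
    return result


def mu_connected(alpha, beta, C, directed, bidirected):
    """True if some walk from alpha to beta is mu-connecting given C."""
    if alpha in C:
        return False
    an_C = ancestors(C, directed)
    # A move (v, w, head_at_v, head_at_w) traverses one edge from v to w.
    moves = []
    for a, b in directed:  # a -> b: tail at a, head at b
        moves += [(a, b, False, True), (b, a, True, False)]
    for a, b in bidirected:  # a <-> b: heads at both ends
        moves += [(a, b, True, True), (b, a, True, True)]
    # State: (node, whether the last edge has a head at it); None marks the start.
    start = (alpha, None)
    seen, queue = {start}, deque([start])
    while queue:
        v, arrived_head = queue.popleft()
        for u, w, head_at_v, head_at_w in moves:
            if u != v:
                continue
            if arrived_head is not None:  # v is a non-endpoint node here
                collider = arrived_head and head_at_v
                if collider and v not in an_C:
                    continue
                if not collider and v in C:
                    continue
            if w == beta and head_at_w:
                return True
            if (w, head_at_w) not in seen:
                seen.add((w, head_at_w))
                queue.append((w, head_at_w))
    return False


def mu_separated(A, B, C, directed, bidirected):
    """True if B is mu-separated from A given C."""
    return not any(mu_connected(a, b, C, directed, bidirected) for a in A for b in B)


def is_certificate(A, B, C, graph1, graph2):
    """True if the triple (A, B, C) shows that the two graphs are not Markov equivalent."""
    return mu_separated(A, B, C, *graph1) != mu_separated(A, B, C, *graph2)


# Toy graphs (hypothetical): 1 -> 2 -> 3 with all directed loops, and the same plus 1 <-> 3.
D_EDGES = [(1, 2), (2, 3), (1, 1), (2, 2), (3, 3)]
G_A = (D_EDGES, [])
G_B = (D_EDGES, [(1, 3)])
```

In this toy pair, the triple $(\{1\},\{3\},\{2\})$ is a certificate: node $3$ is $\mu$ -separated from $1$ given $\{2\}$ in the first graph but not in the second.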

Theorem 3.2 shows that deciding Markov equivalence is not computationally feasible for large graphs which hurts the practical applicability of $\mu$ -separation DMGs. We discuss the implications further in Subsection 3.2. We now consider the analogous decision problems in a sparse setting.

3.1 Sparse DMGs

We may ask if the hardness results still apply if we fix the maximal connectivity of each node and let the size of the node set grow. As a formalization of this, we first define a notion of node connectivity based on inseparability. We say that $\beta$ is inseparable from $\alpha$ in $\mathcal{I}(\mathcal{G})$ if there is no $C\subseteq V\setminus\{\alpha\}$ such that $\beta$ is $\mu$ -separated from $\alpha$ given $C$ in $\mathcal{G}$ (Mogensen et al., 2018). We let $\overset{{\scaleto{\rightarrow}{1.5pt}}}{u}(\beta,\mathcal{I}(\mathcal{G}))$ denote the set of nodes $\alpha$ such that $\beta$ is inseparable from $\alpha$ in $\mathcal{G}$ , and we let $\overset{{\scaleto{\leftarrow}{1.5pt}}}{u}(\beta,\mathcal{I}(\mathcal{G}))$ denote the set of nodes $\alpha$ such that $\alpha$ is inseparable from $\beta$ .

Definition 3.4 (Node connectivity in DMG).

We define $\mathrm{con}_{\mathcal{G}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ as the cardinality of the set $\overset{{\scaleto{\rightarrow}{1.5pt}}}{u}(\beta,\mathcal{I}(\mathcal{G}))$ and we define $\mathrm{con}_{\mathcal{G}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ as the cardinality of the set $\overset{{\scaleto{\leftarrow}{1.5pt}}}{u}(\beta,\mathcal{I}(\mathcal{G}))$ . We define $\mathrm{con}_{\mathcal{G}}(\beta)$ as the maximum of $\mathrm{con}_{\mathcal{G}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ and $\mathrm{con}_{\mathcal{G}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ .

We see that the above definitions are invariant under Markov equivalence, i.e., $\mathrm{con}_{\mathcal{G}_{1}}(\beta)=\mathrm{con}_{\mathcal{G}_{2}}(\beta)$ , $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)=\mathrm{con}_{\mathcal{G}_{2}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ , and $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)=\mathrm{con}_{\mathcal{G}_{2}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ when $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent. One can define other notions of node connectivity in a DMG, in particular based on the edges directly, instead of using separability. However, a DMG in which every node is adjacent with only a small number of nodes may be Markov equivalent with the complete DMG (see Figure 7). Even in a maximal DMG, the lack of an edge between a pair of nodes does not generally imply separability (Appendix B), and therefore connectivity based on separability appears to be a more useful notion of connectivity. Moreover, the graphs are intended as representations of stochastic systems, thus functional sparsity (i.e., sparsity in the implied dependence structure) seems more useful than representational sparsity (sparsity in node adjacency). Appendix B provides more details and examples.

Definition 3.5 ( $m$ -sparsity).

Let $\mathcal{G}$ be a DMG. The maximal connectivity of $\mathcal{G}$ is defined as $\max_{\alpha\in V}(\mathrm{con}_{\mathcal{G}}(\alpha))$ . We say that $\mathcal{G}=(V,E)$ is $m$ -sparse if $\max_{\alpha\in V}(\mathrm{con}_{\mathcal{G}}(\alpha))\leq m$ .

We now state a sparse version of Decision problem 3.1.

Decision problem 3.6 (Markov equivalence in $m$ -sparse DMGs).

Let $m$ be a nonnegative integer and let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be $m$ -sparse DMGs. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

The following are sparse versions of Theorem 3.2 and Corollary 3.3.

Theorem 3.7.

Let $m\geq 16$ , let $\mathcal{G}=(V,E)$ be an $m$ -sparse graph, and let $e$ denote an edge. Deciding Markov equivalence of $\mathcal{G}$ and $\mathcal{G}+e$ is coNP-complete (Decision problems A.3 and A.4).

Theorem 3.7 is a stronger version of Theorem 3.2 as it shows that the problem of deciding Markov equivalence of DMGs remains coNP-complete when restricting to sparse DMGs. We discuss the implications in Subsection 3.2.

Corollary 3.8.

Let $m\geq 16$ . Deciding Markov equivalence of $m$ -sparse DMGs is coNP-complete.

The value $m=16$ may not be what we expect from ‘sparse’ graphical models and two comments are in order. First, the adjacency sparsity (see Section B) of the graphs in the proof is only $8$ , also in the maximal Markov equivalent graphs of the graphs used in the proof. Second, the upshot of the corollary is that there exists a finite number such that deciding Markov equivalence of $m$ -sparse DMGs is coNP-complete. This means that fixing the value of $m$ does not generally lead to computational problems that scale as polynomials in the size of the graph. On the other hand, the so-called $k$ -weak equivalences that are introduced in this paper provide polynomial-time algorithms for each fixed $k$ (Section 7). Note that results analogous to those of Theorems 3.2 and 3.7 do not hold for ADMGs with $m$ -separation. For those, polynomial-time algorithms for Markov equivalence are known, without making sparsity assumptions (Hu and Evans, 2020).

Proof.

We consider a Boolean formula in 3DNF form as in the proof of Theorem 3.2 (see that proof for related notation and terminology). We will define three $m$ -sparse graphs $\mathcal{G}=(V,E)$ , $\mathcal{G}_{1}=(V,E_{1})$ , and $\mathcal{G}_{2}=(V,E_{2})$ and show that $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology while $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are always Markov equivalent.

We define $M$ to be the smallest integer such that $2^{M-1}\geq N+1$ . We first define a number of sets that will be subsets of the node set $V$ . Note that these sets are all pairwise disjoint.

[TABLE]

The node $\chi_{l}$ corresponds to the Boolean variable $x_{l}$ and the node $\lambda_{l}$ corresponds to the negation of $x_{l}$ . Nodes $\phi_{i}^{j}$ and $\bar{\phi}_{i}^{j}$ both correspond to the literal $z_{i}^{j}$ (see also the proof of Theorem 3.2 for additional explanation). We define

[TABLE]

We now define the node set $V$ as a disjoint union,

[TABLE]

We add some intuition on the construction of the graph. The $\Gamma$ - and $\Delta$ -nodes (and their barred versions) are ‘triangular’ in shape and help connect a single node to many more in a sparse manner (see Figure 8). The $\Phi$ - and $\bar{\Phi}$ -nodes correspond to literals in the conjunctions of the Boolean formula, $H$ . The elements of $\mathrm{X}$ correspond to variables in $H$ , and the elements of $\Lambda$ to their negation. The $\nu_{\varepsilon}$ - and $\nu_{\beta}$ -components will help connect every node to $\varepsilon$ and to $\beta$ and are copies of the $V^{-}$ and $\bar{V}^{-}$ sets in the sense that $\rho\mapsto\nu_{\varepsilon}^{\rho}$ is a bijection from $V^{-}$ to $\mathrm{N}_{\varepsilon}$ , $\rho\mapsto\nu_{\beta}^{\rho}$ is a bijection from $V^{-}$ to $\mathrm{N}_{\beta}$ , ${\rho}\mapsto\bar{\nu}_{\varepsilon}^{{\rho}}$ is a bijection from $\bar{V}^{-}$ to $\bar{\mathrm{N}}_{\varepsilon}$ , and ${\rho}\mapsto\bar{\nu}_{\beta}^{{\rho}}$ is a bijection from $\bar{V}^{-}$ to $\bar{\mathrm{N}}_{\beta}$ , though the edges are not exact copies as explained below.

We now define the edge set of $\mathcal{G}$ . We add bidirected edges $\gamma_{ij}\leftrightarrow\gamma_{(i+1)(2j)},\gamma_{(i+1)(2j-1)}$ for $i=1,\ldots,M-1$ , and analogously for $\bar{\Gamma}$ , $\Delta$ , and $\bar{\Delta}$ (see Figure 8). Moreover, we add $\gamma_{Mj}\leftrightarrow\phi_{1}^{j}$ ; $\bar{\gamma}_{Mj}\leftrightarrow\bar{\phi}_{1}^{j}$ ; $\delta_{Mj}\leftrightarrow\phi_{n_{j}}^{j}$ ; $\bar{\delta}_{Mj}\leftrightarrow\bar{\phi}_{n_{j}}^{j}$ for $j\leq N$ . We also add $\bar{\gamma}_{M2^{M-1}}\leftrightarrow{\chi}_{1},{\lambda}_{1}$ ; $\bar{\delta}_{M2^{M-1}}\leftrightarrow\chi_{n},\lambda_{n}$ . We add $\alpha\leftrightarrow\gamma_{11},\bar{\gamma}_{11}$ . We also add $\varepsilon\leftrightarrow\bar{\delta}_{11}$ and $\beta\leftrightarrow{\delta}_{11}$ . We add $\varepsilon\rightarrow\beta$ and $\beta\rightarrow\varepsilon$ as well as $\phi\leftrightarrow\varepsilon,\beta$ . We add for each $j=1,\ldots,N$ , $\phi_{i}^{j}\leftrightarrow\phi_{i+1}^{j}$ and $\bar{\phi}_{i}^{j}\leftrightarrow\bar{\phi}_{i+1}^{j}$ for $1\leq i\leq n_{j}-1$ .
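The role of $M$ and of the ‘triangular’ $\Gamma$ -nodes can be illustrated with a short Python sketch. It computes the smallest $M$ with $2^{M-1}\geq N+1$ and lists the bidirected edges $\gamma_{ij}\leftrightarrow\gamma_{(i+1)(2j)},\gamma_{(i+1)(2j-1)}$ . The sketch assumes that row $i$ of $\Gamma$ is indexed by $j=1,\ldots,2^{i-1}$ , which is read off from the edge pattern rather than from the omitted definition of $\Gamma$ ; the function names are illustrative.

```python
def smallest_M(N):
    """Smallest integer M with 2**(M - 1) >= N + 1."""
    M = 1
    while 2 ** (M - 1) < N + 1:
        M += 1
    return M


def gamma_tree_edges(M):
    """Bidirected edges gamma_{ij} <-> gamma_{(i+1)(2j)}, gamma_{(i+1)(2j-1)}.

    Assumes (for this sketch) that row i holds the indices j = 1, ..., 2**(i - 1).
    Nodes are encoded as pairs (i, j).
    """
    edges = []
    for i in range(1, M):
        for j in range(1, 2 ** (i - 1) + 1):
            edges.append(((i, j), (i + 1, 2 * j)))
            edges.append(((i, j), (i + 1, 2 * j - 1)))
    return edges


# Example with N = 3 conjunctions: M = 3, and the bottom row has 2**(M - 1) = 4 nodes.
M_example = smallest_M(3)
edges_example = gamma_tree_edges(M_example)
bottom_row = {w for _, w in edges_example if w[0] == M_example}
```

Under this indexing, every $\gamma$ -node is adjacent to at most three other $\gamma$ -nodes, while the bottom row has at least $N+1$ nodes: one for each conjunction and one more at index $2^{M-1}$ , which is the index used for the $\chi$ - $\lambda$ segment in the barred copy.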

For $\phi_{1},\phi_{2}\in V^{-}$ such that $\phi_{1}\notin\Phi$ or $\phi_{2}\notin\Phi$ , we add $\nu_{\varepsilon}^{\phi_{1}}\leftrightarrow\nu_{\varepsilon}^{\phi_{2}}$ and $\nu_{\beta}^{\phi_{1}}\leftrightarrow\nu_{\beta}^{\phi_{2}}$ if and only if $\phi_{1}\leftrightarrow\phi_{2}$ was added above. For each $j$ , we also add ${\nu}_{\beta}^{\gamma_{Mj}}\leftrightarrow{\nu}_{\beta}^{\phi_{i}^{j}}\leftrightarrow{\nu}_{\beta}^{\delta_{Mj}}$ and $\bar{\nu}_{\varepsilon}^{\gamma_{Mj}}\leftrightarrow\bar{\nu}_{\varepsilon}^{\phi_{i}^{j}}\leftrightarrow\bar{\nu}_{\varepsilon}^{\delta_{Mj}}$ for each $i=1,\ldots,n_{j}$ . We also add $\nu_{\varepsilon}^{\delta_{11}}\leftrightarrow\varepsilon$ ; $\nu_{\beta}^{\delta_{11}}\leftrightarrow\beta$ ; $\bar{\nu}_{\varepsilon}^{\bar{\delta}_{11}}\leftrightarrow\varepsilon$ and $\bar{\nu}_{\beta}^{\bar{\delta}_{11}}\leftrightarrow\beta$ . Note that $\nu_{\varepsilon}^{\gamma_{11}}$ , $\nu_{\beta}^{\gamma_{11}}$ , $\bar{\nu}_{\varepsilon}^{\bar{\gamma}_{11}}$ , $\bar{\nu}_{\beta}^{\bar{\gamma}_{11}}$ are not adjacent with $\alpha$ . For $\phi_{1},\phi_{2}\in\bar{V}^{-}$ such that $\phi_{1}\notin\bar{\Phi}$ or $\phi_{2}\notin\bar{\Phi}$ , we add $\bar{\nu}_{\varepsilon}^{\phi_{1}}\leftrightarrow\bar{\nu}_{\varepsilon}^{\phi_{2}}$ and $\bar{\nu}_{\beta}^{\phi_{1}}\leftrightarrow\bar{\nu}_{\beta}^{\phi_{2}}$ if and only if $\phi_{1}\leftrightarrow\phi_{2}$ was added above. For each $j$ , we also add $\bar{\nu}_{\beta}^{\bar{\gamma}_{Mj}}\leftrightarrow\bar{\nu}_{\beta}^{\phi_{i}^{j}}\leftrightarrow\bar{\nu}_{\beta}^{\bar{\delta}_{Mj}}$ and $\bar{\nu}_{\varepsilon}^{\bar{\gamma}_{Mj}}\leftrightarrow\bar{\nu}_{\varepsilon}^{\phi_{i}^{j}}\leftrightarrow\bar{\nu}_{\varepsilon}^{\bar{\delta}_{Mj}}$ for each $i=1,\ldots,n_{j}$ .

In this proof, we will say that sets $V^{-},\bar{V}^{-},N_{\varepsilon},N_{\beta},\bar{N}_{{\varepsilon}}$ , and $\bar{N}_{{\beta}}$ are line segments. We define

[TABLE]

and we say that $V^{i}$ is a vertical segment for $i=-(M+1),-M,\ldots,-1,0,1,\ldots,M,M+1$ . ‘Vertical’ refers to the specific visualization of $\mathcal{G}$ used in Figure 8. The sets, $V_{j}^{i}$ , defined above are disjoint and $\bigcup_{i=-(M+1)}^{M+1}V^{i}=V$ .

We now add a number of directed edges. For every node $\phi\in V^{-}$ , we add $\phi,\nu_{\varepsilon}^{\phi},\nu_{\beta}^{\phi}\rightarrow\phi,\nu_{\varepsilon}^{\phi},\nu_{\beta}^{\phi}$ . For every node $\phi\in\bar{V}^{-}$ , we add $\phi,\bar{\nu}_{\varepsilon}^{\phi},\bar{\nu}_{\beta}^{\phi}\rightarrow\phi,\bar{\nu}_{\varepsilon}^{\phi},\bar{\nu}_{\beta}^{\phi}$ . For each $i=\pm 1,\ldots,\pm M$ , we connect the nodes in the vertical segment $V^{i}$ by a directed cycle (any will work). We add directed cycles containing $\chi_{k}$ and all $\phi_{i}^{j}$ and $\bar{\phi}_{i}^{j}$ such that $z_{i}^{j}$ is a positive literal of the variable $x_{k}$ . We add directed cycles containing $\lambda_{k}$ and all $\phi_{i}^{j}$ and $\bar{\phi}_{i}^{j}$ such that $z_{i}^{j}$ is a negative literal of the variable $x_{k}$ .

Finally, we add all directed and bidirected loops. The above defines the edge set $E$ and we let $\mathcal{G}=(V,E)$ . Note that the nodes in a vertical segment are connected by a directed cyclic walk for $i\neq-(M+1),0,M+1$ . We also define $\mathcal{G}_{1}=(V,E_{1})$ where $E_{1}=E\cup\{\beta\leftrightarrow\varepsilon\}$ and $\mathcal{G}_{2}=(V,E_{2})$ where $E_{2}=E\cup\{\phi\rightarrow\varepsilon\}$ . Note that in all three graphs, if $\rho_{1}\sim_{e}\rho_{2}$ and $\rho_{1}$ and $\rho_{2}$ are in different vertical segments, $V^{i_{1}}$ and $V^{i_{2}}$ , respectively, then $e$ is bidirected and $i_{1}-i_{2}=\pm 1$ .

We will first show that $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology. Assume first that $H$ is a tautology and consider a $\mu$ -connecting walk from $\rho_{1}$ to $\rho_{m}$ in $\mathcal{G}_{1}$ ,

[TABLE]

Every node has a self-loop, so it suffices to consider walks where $e_{1}$ (the edge $\varepsilon\leftrightarrow\beta$ ) only occurs once. If it does not occur at all the walk is present in $\mathcal{G}$ as well and connecting (ancestry is the same in $\mathcal{G}$ and $\mathcal{G}_{1}$ ). Say

[TABLE]

If $\rho\in V^{i}$ , then we say that $i$ is the order of $\rho$ .

Lemma 3.9.

Let $\rho\in V$ be of order $j$ . If there is an open walk from $\rho$ to $\beta$ given $C$ in $\mathcal{G}$ or in $\mathcal{G}_{1}$ , then for every $k$ such that $j<k<M+1$ , the $k$ th vertical segment contains at least one node in $C$ .

Proof.

If $j=M,M+1$ this is vacuously true as no vertical segment satisfies the condition, and we can assume that $\rho\neq\varepsilon,\beta,\phi$ . Note that this walk must necessarily pass through a collider in each vertical segment $V^{k}$ such that $k>j$ which gives the result. To see this, note that removing any vertical segment such that $k>j$ gives us a disconnected graph with $\rho$ in one component and $\beta$ in the other as a vertical segment, $k$ , is only adjacent to vertical segments $k-1$ and $k+1$ . When a walk contains a subwalk $\rho_{1}\sim\rho_{2}$ such that $\rho_{1}$ is in $V^{k-1}$ and $\rho_{2}$ is in $V^{k}$ , then the connecting edge must be bidirected. If $\rho_{2}$ is a collider, we must have $\rho_{2}\in\mathrm{an}_{\mathcal{G}}(C)$ and $\rho_{2}$ is only an ancestor of nodes in $V^{k}$ . Otherwise, $\rho_{2}$ is an ancestor of a collider in $V^{k}$ and the same argument applies. ∎

Lemma 3.10.

Let $\rho\neq\alpha$ be a node in $\mathcal{G}$ . If there exists an open walk from $\rho$ to $\beta$ in $\mathcal{G}_{1}$ with a head at $\beta$ , then there exists an open walk $\rho\sim\nu_{\beta}^{\rho}\sim\ldots\sim\beta$ in $\mathcal{G}$ with a head at $\beta$ such that every nonendpoint node equals $\nu_{\beta}^{\rho}$ for $\rho\in V^{-}$ or $\bar{\nu}_{\beta}^{\rho}$ for $\rho\in\bar{V}^{-}$ .

Proof.

If $\rho=\beta,\varepsilon,\phi$ , this is immediate. Assume instead that $\rho\in V^{-}\cup\bar{V}^{-}$ . Choose first the edge $\rho\leftarrow\nu_{\beta}^{\rho}$ if $\nu_{\beta}^{\rho}\in C$ , and otherwise $\rho\rightarrow\nu_{\beta}^{\rho}$ . We concatenate this with the open bidirected path to $\beta$ . Such a path exists as $\nu_{\beta}^{\gamma_{Mj}}\leftrightarrow\nu_{\beta}^{\delta_{Mj}}$ and $\bar{\nu}_{\beta}^{\bar{\gamma}_{Mj}}\leftrightarrow\bar{\nu}_{\beta}^{\bar{\delta}_{Mj}}$ . This is open since all vertical segments between $\rho$ and $\beta$ must contain at least one node which is in $C$ by Lemma 3.9.

If instead $\rho\in\mathrm{N}_{\varepsilon}\cup\bar{\mathrm{N}}_{\varepsilon}$ we can do as above as $\rho\leftarrow\nu_{\beta}^{\rho}$ and $\rho\rightarrow\nu_{\beta}^{\rho}$ are in the graph. If $\rho\in\mathrm{N}_{\beta}\cup\bar{\mathrm{N}}_{\beta}$ , then there is an open bidirected path with a head at $\beta$ between $\rho$ and $\beta$ . If $\rho=\varepsilon$ or $\rho=\beta$ it follows directly. ∎

We split into cases depending on whether $\rho_{1}=\alpha$ .

$\rho_{1}\neq\alpha$ :

There is an open walk (given $C$ ) from $\rho_{1}$ with a head at $\beta$ (Lemma 3.10) that we can concatenate with $\omega_{2}$ to obtain a connecting walk in $\mathcal{G}$ .

If instead

[TABLE]

the same argument holds.

$\rho_{1}=\alpha$ :

If we have a subwalk between $\alpha$ and $\beta$ with a noncollider, then we can find a connecting path in the following way. Say we have

[TABLE]

such that $\psi_{1}$ is a noncollider (note that, ignoring $\alpha\rightarrow\alpha$ , $\alpha$ only has bidirected edges at it, so $\psi_{1}\neq\alpha$ if we remove $\alpha$ -loops). There is necessarily a tail at $\psi_{1}$ on one of the adjacent edges, $\psi_{1}\notin C$ , and $\psi_{1}\in\mathrm{an}(C)$ . We concatenate the subwalk from $\alpha$ to $\psi_{1}$ with the open walk from $\psi_{1}$ to $\beta$ that has a head at $\beta$ . Lemma 3.10 gives the existence of this walk. This also holds if $\rho_{1}=\psi_{0}$ , $\psi_{2}=\varepsilon$ , or $\psi_{1}=\varepsilon$ .

On the other hand, if the subwalk between $\alpha$ and $\beta$ has no noncolliders, then either it stays within a line segment, or one of $\alpha$ , $\beta$ , and $\varepsilon$ occurs on the subwalk as a nonendpoint. We can assume that $\alpha$ is only an endpoint. If $\beta$ occurs as a nonendpoint, then this $\beta$ is a collider and this means that there is an open subwalk from $\alpha$ to $\beta$ with a head at $\beta$ which we can concatenate with $\omega_{2}$ . If $\varepsilon$ is a collider (other than right before the final $\beta$ ), then we can remove the cycle from $\varepsilon$ to $\varepsilon$ from the walk. In any case, we can find a connecting collider walk in $\mathcal{G}_{1}$ (no noncolliders) such that $\alpha$ , $\beta$ , and $\varepsilon$ will each occur once. This means that the subwalk only contains nodes from a single line segment. This segment cannot be $N_{\varepsilon}$ , $N_{\beta}$ , $\bar{N}_{\varepsilon}$ , or $\bar{N}_{\beta}$ as $\alpha$ is not adjacent to any node in these line segments. If the walk only intersects the $V^{-}$ -line segment, then it must either go through $\Phi$ -nodes or the $\mathrm{X}\cup\Lambda$ -nodes, not both, as it has no noncolliders (or such a walk can be found). If it does not visit any $\chi$ - or $\lambda$ -nodes, then there is an open walk in the $\bar{\Gamma}\cup\bar{\Phi}\cup\bar{\Delta}$ -segment (the analogous walk through the barred versions). Finally, assume it does not visit any $\Phi$ -nodes. As $H$ is a tautology, there is also a conjunction segment in $\bar{\Phi}$ which is open and connecting from $\alpha$ to $\beta$ with a head at $\beta$ . If instead the bidirected walk is in $\bar{V}^{-}$ , the result follows, and if $\varepsilon$ and $\beta$ occur in the opposite order on the original $\mu$ -connecting walk, we can use similar arguments.

If the formula is not a tautology, let $A$ be an assignment of values such that the formula evaluates to false. We then consider the set

[TABLE]

We also define $C=\mathrm{an}(C^{-})\cup\{\beta,\delta\}$ . We see immediately that $\beta$ is not $\mu$ -separated from $\alpha$ given $C$ in $\mathcal{G}_{1}$ as the $\chi-\lambda$ -segment contains an open path from $\alpha$ to $\varepsilon$ with a head at $\varepsilon$ and furthermore $\varepsilon\leftrightarrow\beta$ is in the graph. On the other hand, consider a potential $\mu$ -connecting walk from $\alpha$ to $\beta$ in $\mathcal{G}$ . If $\varepsilon$ is on the walk, it can only return to $\alpha$ . It cannot go between bidirected components because the directed cycles are either completely contained in $C$ or in its complement. It cannot go through a $\phi$ -component because of the choice of $A$ , and we conclude that it cannot be $\mu$ -connecting. In conclusion, $\mathcal{G}$ and $\mathcal{G}_{1}$ are Markov equivalent if and only if $H$ is a tautology.

The arguments in the proof of Theorem 3.2 show that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are Markov equivalent. Arguments similar to those in the proof of Theorem 3.2 furthermore show that Decision problems A.3 and A.4 are coNP-complete.

Careful examination of the graphs reveals that all three are $16$ -sparse. ∎

One should note that the graphs in the proof of Theorem 3.7 could also be interpreted as $\delta$ -separation graphs (Didelez, 2008). In this case, the result also holds, i.e., determining $\delta$ -separation Markov equivalence of sparse DMGs is also coNP-complete. To see this one should simply note that $\mu$ -separation Markov equivalence implies $\delta$ -separation Markov equivalence and that the conditioning set used in the proof when $H$ is not a tautology contains $\beta$ . The hardness result in the $\delta$ -separation case then follows from the (A.1) property of the supplementary material of Mogensen and Hansen (2020) and from noting that the latent projection technique can also be used for $\delta$ -separation.

Richardson (1997) studied DGs under $d$ -separation and gave an example of ‘nonlocality’ in this setting. The example consisted of a sequence of pairs of graphs, $\mathcal{D}_{n}^{1}$ and $\mathcal{D}_{n}^{2}$ , such that $\mathcal{D}_{n}^{1}$ and $\mathcal{D}_{n}^{2}$ are not Markov equivalent, but the only separation on which the graphs disagree involves nodes that are arbitrarily far apart (for increasing values of $n$ ). Our setting is quite different. However, DMGs under $\mu$ -separation do exhibit the same ‘nonlocality’, as seen from the proof of Theorem 3.7. Say that $H$ is not a tautology, in which case $\mathcal{G}$ and $\mathcal{G}_{1}$ in the proof of Theorem 3.7 are not Markov equivalent. From the proof, it follows that the graphs only disagree on triples $(A,B,C)$ such that $\alpha\in A$ and $\beta\in B$ . This means that the proof (for non-tautological $H$ of increasing size) gives a sequence of pairs of graphs that only disagree on $\mu$ -separation of a pair of nodes, $\alpha$ and $\beta$ , that are arbitrarily far from each other as measured by the shortest path between $\alpha$ and $\beta$ . Note that this also holds in the maximal Markov equivalent graphs of $\mathcal{G}$ and $\mathcal{G}_{1}$ , and it is therefore not due to non-maximality.

3.2 Implications of hardness results

The hardness results have several implications that we will outline in this section. In particular, we argue that several other computational problems are also hard in $\mu$ -separation DMGs.

Every Markov equivalence class has a greatest element (Mogensen and Hansen, 2020), and one can decide if two DMGs are Markov equivalent by computing the greatest Markov equivalent graph for each of them and comparing the two. This means that finding such a greatest element is also hard. There are similar implications for oracle learning algorithms. A (local independence) oracle is an abstract function which a learning algorithm may query and which, when provided with a triple $(A,B,C)$ , outputs whether the corresponding local independence holds or not. The oracle gives the correct answer, but when using real data, the oracle has to be replaced by hypothesis tests of local independence, and the purpose of the oracle formalism is simply to separate the algorithmic aspects from the hypothesis testing. If we assume that there exists a constraint-based learning algorithm which can recover a unique representative of the Markov equivalence class (say the greatest element, or some other uniquely defined representative) of the true graph when given access to a local independence oracle, then using this algorithm, one can decide Markov equivalence by querying the $\mu$ -separation models of the graphs. This is done by testing $\mu$ -separation in the graph and each test is done in polynomial time (Mogensen, 2020b). If only a polynomial number of queries were required, we could also solve Markov equivalence in polynomial time by comparing the output for two graphs. Again, this means that, unless $\mathrm{P}=\mathrm{NP}$ , such a learning algorithm would need a superpolynomial number of tests in the worst case.
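To make the remark about polynomial-time tests concrete, the following is a minimal sketch of a single $\mu$-separation query. It assumes the usual definition of a $\mu$-connecting walk from $\alpha$ to $\beta$ given $C$ (Mogensen and Hansen, 2020): the walk is nontrivial, $\alpha\notin C$, every collider is in $\mathrm{an}(C)$ (with $C\subseteq\mathrm{an}(C)$), no noncollider is in $C$, and the final edge has a head at $\beta$. The function names and the encoding of a DMG as two sets of node pairs are illustrative choices, not the algorithm of Mogensen (2020b).

```python
from collections import deque


def ancestors(directed, targets):
    """Nodes with a directed path (possibly of length zero) to a node in `targets`."""
    found = set(targets)
    stack = list(targets)
    while stack:
        node = stack.pop()
        for tail, head in directed:
            if head == node and tail not in found:
                found.add(tail)
                stack.append(tail)
    return found


def steps(directed, bidirected):
    """For each node u, list (head_at_u, v, head_at_v) for every way of leaving u."""
    out = {}
    for a, b in directed:                                # edge a -> b
        out.setdefault(a, []).append((False, b, True))   # traversed from a to b
        out.setdefault(b, []).append((True, a, False))   # traversed from b to a
    for a, b in bidirected:                              # edge a <-> b
        out.setdefault(a, []).append((True, b, True))
        out.setdefault(b, []).append((True, a, True))
    return out


def mu_separated(directed, bidirected, alpha, beta, cond):
    """True if beta is mu-separated from alpha given cond."""
    cond = set(cond)
    if alpha in cond:
        return True                                      # trivial separation by convention
    an_cond = ancestors(directed, cond)
    out = steps(directed, bidirected)
    # A state (v, head_in) records that some admissible walk from alpha reaches v
    # and that its last edge has a head at v exactly when head_in is True.
    seen = {(v, head_v) for _, v, head_v in out.get(alpha, [])}
    queue = deque(seen)
    while queue:
        v, head_in = queue.popleft()
        if v == beta and head_in:
            return False                                 # a mu-connecting walk exists
        for head_out, w, head_w in out.get(v, []):
            collider = head_in and head_out
            if collider and v not in an_cond:
                continue                                 # blocked collider
            if not collider and v in cond:
                continue                                 # blocked noncollider
            if (w, head_w) not in seen:
                seen.add((w, head_w))
                queue.append((w, head_w))
    return True


# Hypothetical three-node chain 1 -> 2 -> 3 without loops.
chain_edges = {(1, 2), (2, 3)}
print(mu_separated(chain_edges, set(), 1, 3, set()))   # False: 1 -> 2 -> 3 is connecting
print(mu_separated(chain_edges, set(), 1, 3, {2}))     # True: conditioning on 2 blocks it
print(mu_separated(chain_edges, set(), 3, 1, set()))   # True: no walk ends with a head at 1
```

The search visits at most two states per node, so one query is polynomial in the size of the graph. The hardness discussed above comes from the number of triples $(A,B,C)$ that may have to be queried, not from the cost of a single query.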

3.2.1 Sparse DMGs

All of the above holds even if we are willing to assume that all graphs are somewhat sparse ( $m$ -sparse, $m\geq 16$ ). This means that a restriction to sparse graphs will not remedy this. This is also different from DAG-based models in the following sense. In partially observed DAGs, we may learn a graphical representation of the equivalence class using tests of conditional independence. If we fix $m$ such that the node degree is less than $m$ , this can be done in polynomial time (Claassen et al., 2013).

These hardness results motivate the second part of this paper. Instead of requiring sparsity of the DMGs, we will reinterpret them to obtain a weaker type of equivalence. Essentially, the DMGs are too expressive leading to the above infeasibility results in connection to their Markov equivalence classes. We can avoid this by considering a weaker type of equivalence. This leads to a simple and useful theory and to practical graph learning algorithms as we will see in subsequent sections.

4 Weak equivalence

In this section, we introduce a notion of weak equivalence and argue that it provides a computationally feasible notion of equivalence of DMGs. Under a regularity condition, the associated equivalence classes each have a greatest element and this leads to a simple graphical theory.

4.1 Classes of weak equivalence

We define three types of equivalence in this section and present them in decreasing order of generality. They each limit the set of triples, $(A,B,C)$ , that are used to distinguish between independence models represented by DMGs.

4.1.1 General weak equivalence

If $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ are Markov equivalent, then $(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})$ if and only if $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ for all $A,B,C\subseteq V$ . This means that Markov equivalence requires the independence models of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ to agree on all triplets in the set $\mathcal{P}=\{(A,B,C):A,B,C\subseteq V\}$ . A very general approach to defining weaker notions of equivalence is to only compare the independence models on a subset of $\mathcal{P}$ .

Definition 4.1 (General weak equivalence).

Let $\mathcal{J}\subseteq\{(A,B,C):A,B,C\subseteq V\}$ . We say that $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ are $\mathcal{J}$ -weakly equivalent if

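$$(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})\Leftrightarrow(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})\quad\text{for all }(A,B,C)\in\mathcal{J}.$$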

We use $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ to denote the $\mathcal{J}$ -weak independence model induced by $\mathcal{G}_{1}$ , $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})=\mathcal{I}(\mathcal{G}_{1})\cap\mathcal{J}$ . We use $[\mathcal{G}_{1}]_{\mathcal{J}}$ to denote the $\mathcal{J}$ -weak equivalence class of $\mathcal{G}_{1}$ , that is, the set of graphs, $\mathcal{G}=(V,E)$ , such that $\mathcal{I}_{\mathcal{J}}(\mathcal{G})=\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ .
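As a minimal sketch of this notation, an independence model can be represented as a finite set of triples, and the $\mathcal{J}$-weak model is then an intersection with $\mathcal{J}$. The helper names and the two hand-picked models below are assumptions made for illustration; the models are not computed from any graph in this paper.

```python
from itertools import chain, combinations, product


def powerset(nodes):
    """All subsets of `nodes` as frozensets."""
    nodes = sorted(nodes)
    subsets = chain.from_iterable(combinations(nodes, r) for r in range(len(nodes) + 1))
    return [frozenset(s) for s in subsets]


def weak_model(model, J):
    """The J-weak model: the triples of `model` that lie in J."""
    return model & J


def weakly_equivalent(model_1, model_2, J):
    """True if the two models agree on every triple in J."""
    return weak_model(model_1, J) == weak_model(model_2, J)


V = {1, 2}
P = set(product(powerset(V), repeat=3))        # all triples (A, B, C) over V

# Hypothetical independence models, chosen by hand for illustration only.
t_empty = (frozenset({1}), frozenset({2}), frozenset())
t_cond = (frozenset({1}), frozenset({2}), frozenset({2}))
model_1 = {t_empty, t_cond}
model_2 = {t_empty}

J = {t for t in P if len(t[2]) == 0}           # compare on empty conditioning sets only
print(weakly_equivalent(model_1, model_2, P))  # False: the models differ on t_cond
print(weakly_equivalent(model_1, model_2, J))  # True: t_cond is not in J
```

Taking $\mathcal{J}=\mathcal{P}$ recovers the comparison used for Markov equivalence, while a smaller $\mathcal{J}$ can only merge equivalence classes.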

Proposition 4.2.

Let $\mathcal{J}\subseteq\mathcal{P}$ and let $V$ be a finite set. Definition 4.1 defines an equivalence relation on the set of DMGs with node set $V$ .

Proof.

Let $\mathcal{G}$ be a DMG. We see that $\mathcal{G}$ is $\mathcal{J}$ -weakly equivalent with itself such that the relation is reflexive. The relation is also symmetric and transitive. ∎

The next statement follows directly from the definition of weak equivalence.

Proposition 4.3.

Let $\mathcal{J}_{1}\subseteq\mathcal{J}_{2}\subseteq\mathcal{P}$ and let $\mathcal{G}$ be a DMG. It holds that $\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G})\subseteq\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G})$ .

A Markov equivalence class has a greatest element. However, a $\mathcal{J}$ -weak equivalence class does not necessarily have a greatest element as illustrated by the following example.

Example 4.4.

We consider the graph, $\mathcal{G}$ , in Figure 9 with all loops included as well. We define the set $\mathcal{J}\subseteq\mathcal{P}$ ,

[TABLE]

We also define three other graphs from $\mathcal{G}=(V,E)$ , $\mathcal{G}_{i}=(V,E_{i})$ , where $i=1,2,3$ , and

[TABLE]

Graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are both $\mathcal{J}$ -weakly equivalent with $\mathcal{G}$ which can be seen from simply listing their $\mathcal{J}$ -weak independence models.

We see that $(1,5,\{2,3,4,5\})\in\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ , but $(1,5,\{2,3,4,5\})\notin\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{3})$ which means that $\mathcal{G}$ and $\mathcal{G}_{3}$ are not $\mathcal{J}$ -weakly equivalent. We have that $\mathcal{G}_{1},\mathcal{G}_{2}\in[\mathcal{G}]_{\mathcal{J}}$ , and a greatest element of $[\mathcal{G}]_{\mathcal{J}}$ must be a supergraph of both $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ , and therefore of $\mathcal{G}_{3}$ . If $\mathcal{N}$ is a supergraph of $\mathcal{G}_{3}$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{N})\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{3})\subsetneq\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ , and we conclude that the $\mathcal{J}$ -weak equivalence class of $\mathcal{G}$ does not contain a greatest element.

Let $\mathcal{J}\subseteq\mathcal{P}$ . If two graphs are Markov equivalent, they are of course also equivalent when restricting to comparisons on the set $\mathcal{J}$ . Therefore, every graph is also weakly equivalent with the unique, maximal graph of its Markov equivalence class. However, the above example shows that when considering a general $\mathcal{J}$ -weak equivalence, an equivalence class need not have a greatest element as the maximal Markov equivalent graph need not be a greatest element of the larger weak equivalence class. This leads us to introducing the notion of a homogeneous weak equivalence by imposing a regularity condition on the set $\mathcal{J}$ . The equivalence classes of a homogeneous weak equivalence relation do indeed contain a greatest element (Section 5).

4.1.2 Homogeneous weak equivalence

We define homogeneous equivalence relations to obtain well-behaved equivalence classes.

Definition 4.5 (Homogeneous equivalence).

Consider some weak equivalence induced by $\mathcal{J}\subseteq\mathcal{P}$ . We say that this equivalence is homogeneous if there exists a set $\mathcal{C}$ , $\mathcal{C}\subseteq\{C:C\subseteq V\}$ , such that

$$\mathcal{J}=\{(A,B,C):A,B\subseteq V,\ C\in\mathcal{C}\}.$$

In this case, we will also say that the set $\mathcal{J}$ is homogeneous and we will say that $\mathcal{C}$ is the collection of conditioning sets of $\mathcal{J}$ .

In other words, a homogeneous equivalence relation is one that restricts only the set of conditioning sets, $C$ . That is, if $\mathcal{J}$ is homogeneous, then $\mathcal{J}$ -weak equivalence of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ means that for all $A,B\subseteq V$ and $C\in\mathcal{C}$ we have $(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})$ if and only if $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ where $\mathcal{C}$ is some collection of subsets of $V$ . Therefore, the restriction of the independence model imposed by a homogeneous $\mathcal{J}$ only applies to the conditioning sets.
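The following sketch builds a homogeneous set from a collection of conditioning sets and checks whether a given set of triples has this product form. The function names are hypothetical, and the check simply compares $\mathcal{J}$ with the product generated by the conditioning sets that occur in it.

```python
from itertools import chain, combinations, product


def powerset(nodes):
    """All subsets of `nodes` as frozensets."""
    nodes = sorted(nodes)
    subsets = chain.from_iterable(combinations(nodes, r) for r in range(len(nodes) + 1))
    return [frozenset(s) for s in subsets]


def homogeneous_set(V, cond_sets):
    """All triples (A, B, C) with A, B arbitrary subsets of V and C in cond_sets."""
    subsets = powerset(V)
    return {(A, B, frozenset(C)) for A, B in product(subsets, repeat=2) for C in cond_sets}


def is_homogeneous(J, V):
    """True if J contains every (A, B, C) whose C occurs somewhere in J."""
    cond_sets = {C for _, _, C in J}
    return set(J) == homogeneous_set(V, cond_sets)


V = {1, 2, 3}
J = homogeneous_set(V, [set(), {3}])
print(len(J), is_homogeneous(J, V))      # 128 True  (8 * 8 choices of A, B and 2 of C)

# Removing a single triple destroys the product structure.
J_broken = J - {(frozenset({1}), frozenset({2}), frozenset({3}))}
print(is_homogeneous(J_broken, V))       # False
```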

4.1.3 $k$ -weak equivalence

We will now introduce a certain type of homogeneous equivalence which simply restricts the size of the conditioning sets.

Definition 4.6 ( $k$ -weak equivalence).

Let $n=|V|$ and let $0\leq k\leq n$ . We say that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $k$ -weakly equivalent if for all $A,B,C\subseteq V$ such that $|C|\leq k$ , it holds that $(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})$ if and only if $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ .

The above is formulated slightly differently than Definitions 4.1 and 4.5. However, $k$ -weak equivalence is a homogeneous weak equivalence relation, which is seen by using the set $\mathcal{C}=\{C\subseteq V:|C|\leq k\}$ in Definition 4.5. On the other hand, not all homogeneous equivalences correspond to a $k$ -weak equivalence. We see that $k$ -weak equivalence only compares graphs using ‘small’ conditioning sets of size at most $k$ and that Markov equivalence is the same as $n$ -weak equivalence.

For $\mathcal{G}_{1}=(V,E_{1})$ , we use $\mathcal{I}_{k}(\mathcal{G}_{1})$ to denote the $k$ -weak independence model of $\mathcal{G}_{1}$ , $\mathcal{I}_{k}(\mathcal{G}_{1})=\{(A,B,C)\in\mathcal{I}(\mathcal{G}_{1}):|C|\leq k\}$ . We let $[\mathcal{G}]_{k}$ denote the set of graphs on nodes $V$ that are $k$ -weakly equivalent with $\mathcal{G}$ , and we say that $[\mathcal{G}]_{k}$ is the $k$ -weak equivalence class of $\mathcal{G}$ . When $k=n$ , we also use $\mathcal{I}(\mathcal{G})$ , that is, $\mathcal{I}(\mathcal{G})=\mathcal{I}_{n}(\mathcal{G})$ .

4.2 Properties of weak equivalence

This section describes some properties of weak equivalence and weak equivalence classes. Throughout the section $\mathcal{J}$ is a subset of $\mathcal{P}=\{(A,B,C):A,B,C\subseteq V\}$ . For Markov equivalence, it holds that $\mathcal{G}_{1}\subseteq\mathcal{G}_{2}$ implies $\mathcal{I}(\mathcal{G}_{2})\subseteq\mathcal{I}(\mathcal{G}_{1})$ which follows from the definition of $\mu$ -separation. This is quite natural as a larger graph has more edges, therefore fewer independences. The same holds for weak equivalence classes as shown by the next proposition.

Proposition 4.7.

If $\mathcal{G}_{1}\subseteq\mathcal{G}_{2}$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ .

Proof.

If $(A,B,C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})$ then $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ and $(A,B,C)\in\mathcal{J}$ , and therefore $(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})$ . This means that $(A,B,C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ . ∎

Proposition 4.8 (Well-ordered $\mathcal{J}$ -classes).

Let $\mathcal{J}_{1}\subseteq\mathcal{J}_{2}\subseteq\mathcal{P}$ . If $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $\mathcal{J}_{2}$ -weakly equivalent, then they are also $\mathcal{J}_{1}$ -weakly equivalent.

Proof.

Let $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G}_{1})$ , then $(A,B,C)\in\mathcal{I}(\mathcal{G}_{1})$ and $(A,B,C)\in\mathcal{J}_{1}$ . Therefore $(A,B,C)\in\mathcal{J}_{2}$ and $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G}_{1})=\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G}_{2})$ . It follows that $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ and $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G}_{2})$ . Interchanging the roles of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ and repeating the argument gives the result. ∎

From the above, we also see that $\mathcal{J}_{1}\subseteq\mathcal{J}_{2}$ implies $[\mathcal{G}]_{\mathcal{J}_{2}}\subseteq[\mathcal{G}]_{\mathcal{J}_{1}}$ . The next corollary follows directly from the above proposition.

Corollary 4.9 (Well-ordered $k$ -classes).

Let $0\leq k_{1}\leq k_{2}\leq n$ . If $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $k_{2}$ -weakly equivalent, then they are also $k_{1}$ -weakly equivalent.
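The nesting in the corollary can be illustrated on abstract models. The sketch below filters a model by the size of the conditioning set and finds the largest $k$ at which two models still agree. The models are hypothetical sets of triples chosen by hand, and `largest_equivalent_k` is an illustrative helper that is not part of the paper.

```python
def k_weak_model(model, k):
    """Triples of `model` whose conditioning set has at most k elements."""
    return {(A, B, C) for A, B, C in model if len(C) <= k}


def k_weakly_equivalent(model_1, model_2, k):
    """True if the two models agree on all triples with |C| <= k."""
    return k_weak_model(model_1, k) == k_weak_model(model_2, k)


def largest_equivalent_k(model_1, model_2, n):
    """Largest k in 0..n with k-weak agreement, or None if the models differ at k = 0."""
    best = None
    for k in range(n + 1):
        if not k_weakly_equivalent(model_1, model_2, k):
            break          # by the corollary, agreement cannot return for larger k
        best = k
    return best


f = frozenset
shared = {(f({1}), f({2}), f()), (f({1}), f({2}), f({3}))}
model_1 = shared | {(f({1}), f({3}), f({2, 3}))}   # extra triple with |C| = 2
model_2 = set(shared)

print([k_weakly_equivalent(model_1, model_2, k) for k in range(4)])
# [True, True, False, False]
print(largest_equivalent_k(model_1, model_2, n=3))  # 1
```

The early exit in the loop is justified by the corollary: disagreement at level $k$ is witnessed by a triple that is also compared at every level above $k$.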

Definition 4.10.

We say that $\mathcal{J}$ is singleton stable if for all $A,B,C\subseteq V$ , $(A,B,C)\in\mathcal{J}$ implies that $(\alpha,\beta,C)\in\mathcal{J}$ for all $\alpha\in A$ and $\beta\in B$ .
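Singleton stability is easy to check mechanically when $\mathcal{J}$ is given as an explicit set of triples. The following is a small sketch with a hypothetical helper name and two toy sets.

```python
def is_singleton_stable(J):
    """True if (A, B, C) in J implies ({a}, {b}, C) in J for all a in A and b in B."""
    J = set(J)
    return all(
        (frozenset({a}), frozenset({b}), C) in J
        for A, B, C in J
        for a in A
        for b in B
    )


f = frozenset
J_bad = {(f({1, 2}), f({3}), f())}                               # singleton triples missing
J_good = J_bad | {(f({1}), f({3}), f()), (f({2}), f({3}), f())}  # singleton triples added

print(is_singleton_stable(J_bad), is_singleton_stable(J_good))   # False True
```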

Note that the requirement is only on the $A$ - and $B$ -sets, not the $C$ -set. If $\mathcal{J}$ is homogeneous and $(A,B,C)\in\mathcal{J}$ , then $(\bar{A},\bar{B},C)\in\mathcal{J}$ for all $\bar{A},\bar{B}\subseteq V$ , thus a homogeneous $\mathcal{J}$ is also singleton stable. The following proposition shows that, for a singleton stable $\mathcal{J}$ , the independence model $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ is characterized by the independences $(A,B,C)$ where $A$ and $B$ are singletons and $A$ and $C$ are disjoint. The proof uses the fact that $\mu$ -separation models satisfy so-called left and right composition as well as left and right decomposition which are asymmetric graphoid properties (Didelez, 2006; Mogensen et al., 2018). These are similar to classical graphoid properties (Lauritzen, 1996), but left and right versions are needed due to the lack of symmetry.

Proposition 4.11.

Let $\mathcal{J}$ be singleton stable, let $V$ be a finite set and let $\mathcal{S}=\{(A,B,C)\in\mathcal{P}:|A|=|B|=1,A\cap C=\emptyset\}$ . If $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})\cap\mathcal{S}\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})\cap\mathcal{S}$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})$ .

Without the assumption of singleton stability, the above statement is not true. For instance, if $\mathcal{J}\cap\mathcal{S}=\emptyset$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})\cap\mathcal{S}\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})\cap\mathcal{S}$ is trivially true for any pair of graphs.

Proof.

Let $(A,B,C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ . If $A$ or $B$ is empty, then it follows immediately that $(A,B,C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})$ . Assume that $A$ and $B$ are both nonempty. We can write $A=\{\alpha_{1},\ldots,\alpha_{n_{a}}\}$ and $B=\{\beta_{1},\ldots,\beta_{n_{b}}\}$ . From the definition of $\mu$ -separation and using singleton stability of $\mathcal{J}$ it follows that $(\alpha_{i},\beta_{j},C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{1})$ for all $i=1,\ldots,n_{a}$ and $j=1,\ldots,n_{b}$ . Therefore $(\alpha_{i},\beta_{j},C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})$ for all $i=1,\ldots,n_{a}$ and $j=1,\ldots,n_{b}$ (if $\alpha_{i}\in C$ , then it holds trivially). From the definition of $\mu$ -separation, $(A,B,C)\in\mathcal{I}(\mathcal{G}_{2})$ and therefore also $(A,B,C)\in\mathcal{I}_{\mathcal{J}}(\mathcal{G}_{2})$ . ∎

Proposition 4.12 (Maximality).

The graph $\mathcal{G}=(V,E)\in[{\mathcal{G}}_{1}]_{\mathcal{J}}$ is maximal in $[{\mathcal{G}}_{1}]_{\mathcal{J}}$ if and only if it is complete or if $\mathcal{G}+e\notin[{\mathcal{G}}_{1}]_{\mathcal{J}}$ for all edges $e$ such that $e\notin E$ .

When $\mathcal{G}=(V,E)\in[{\mathcal{G}}_{1}]_{\mathcal{J}}$ is maximal in $[{\mathcal{G}}_{1}]_{\mathcal{J}}$ , then we also say that $\mathcal{G}$ is $\mathcal{J}$ -maximal (the equivalence class is implicit as a graph can only be maximal in its own equivalence class). A graph is $\mathcal{J}$ -maximal if the addition of any edge will change the $\mathcal{J}$ -weak independence model.

Proof.

If $\mathcal{G}$ is complete, then it is clearly maximal. If $\mathcal{G}\subsetneq{\mathcal{G}}_{2}$ , then $\mathcal{G}\subsetneq\mathcal{G}+e\subseteq{\mathcal{G}}_{2}$ for some $e\notin E$ . We have $\mathcal{I}_{\mathcal{J}}({\mathcal{G}}_{2})\subseteq\mathcal{I}_{\mathcal{J}}({\mathcal{G}}+e)$ and $\mathcal{I}_{\mathcal{J}}({\mathcal{G}}+e)\subsetneq\mathcal{I}_{\mathcal{J}}({\mathcal{G}})=\mathcal{I}_{\mathcal{J}}({\mathcal{G}}_{1})$ and therefore ${\mathcal{G}}_{2}\notin[{\mathcal{G}}_{1}]_{\mathcal{J}}$ .

On the other hand, assume that $\mathcal{G}$ is maximal, and that $\mathcal{G}$ is not complete. It follows from the definition of maximality that $\mathcal{G}+e\notin[\mathcal{G}]_{\mathcal{J}}$ for all $e\notin E$ . ∎

If $\mathcal{G}_{1}\subseteq\mathcal{G}_{2}$ , then $\mathcal{I}(\mathcal{G}_{2})\subseteq\mathcal{I}(\mathcal{G}_{1})$ (Proposition 4.7). One may ask if $\mathcal{I}(\mathcal{G}_{2})\subseteq\mathcal{I}(\mathcal{G}_{1})$ implies $\mathcal{G}_{1}\subseteq\mathcal{G}_{2}$ . The next example shows that this is not the case, not even for maximal graphs.

Example 4.13.

We consider two graphs, $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ as shown in Figure 10 (both graphs also have all directed and bidirected loops). Let $\mathcal{S}=\{(A,B,C):|A|=|B|=1,A\cap C=\emptyset\}$ . Then $\mathcal{I}(\mathcal{G}_{1})\cap\mathcal{S}$ equals

[TABLE]

$\mathcal{I}(\mathcal{G}_{2})\cap\mathcal{S}$ equals

[TABLE]

and therefore it is a subset of $\mathcal{I}(\mathcal{G}_{1})\cap\mathcal{S}$ . Markov equivalence corresponds to $\mathcal{J}$ -weak equivalence with $\mathcal{J}=\mathcal{P}$ , and by Proposition 4.11, $\mathcal{I}(\mathcal{G}_{2})\subseteq\mathcal{I}(\mathcal{G}_{1})$ . Both graphs are maximal which means that $2\rightarrow 1$ cannot be added to $\mathcal{G}_{2}$ Markov equivalently. This illustrates that $\mathcal{I}(\mathcal{G}_{2})\subseteq\mathcal{I}(\mathcal{G}_{1})$ does not imply $\mathcal{G}_{1}\subseteq\mathcal{G}_{2}$ , not even if $\mathcal{G}_{2}$ is maximal.

Proposition 4.14.

Let $\mathcal{J}_{1}\subseteq\mathcal{J}_{2}$ . If $\mathcal{G}$ is $\mathcal{J}_{1}$ -maximal, then it is also $\mathcal{J}_{2}$ -maximal.

Proof.

If $\mathcal{G}$ is complete, then it is also $\mathcal{J}_{2}$ -maximal. Assume instead that $\mathcal{G}=(V,E)$ is not complete and $e\notin E$ . The graph $\mathcal{G}$ is $\mathcal{J}_{1}$ -maximal, so $\mathcal{G}+e\notin[\mathcal{G}]_{\mathcal{J}_{1}}$ (Proposition 4.12). Using Proposition 4.7, there exists a triple $(A,B,C)$ such that $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G})$ and $(A,B,C)\notin\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G}+e)$ and therefore $(A,B,C)\notin\mathcal{I}(\mathcal{G}+e)$ . As $(A,B,C)\in\mathcal{J}_{1}\subseteq\mathcal{J}_{2}$ , we see that $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G})$ and $(A,B,C)\notin\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G}+e)$ . It follows that $\mathcal{G}$ is $\mathcal{J}_{2}$ -maximal (Proposition 4.12). ∎

We say that a graph, $\mathcal{G}$ , is $k$ -maximal if it is $\mathcal{J}$ -maximal for $\mathcal{J}=\{(A,B,C)\in\mathcal{P}:|C|\leq k\}$ , that is, for the set $\mathcal{J}$ that induces the $k$ -weak equivalence relation.

Corollary 4.15.

Let $0\leq k_{1}\leq k_{2}\leq n$ . If $\mathcal{G}$ is $k_{1}$ -maximal, then it is also $k_{2}$ -maximal.

In particular, if a graph is $k$ -maximal for some $k\leq n$ , then it is also the unique maximal element in its Markov equivalence class.

Proposition 4.16 (Minimality).

The graph $\mathcal{G}=(V,E)\in[{\mathcal{G}}_{1}]_{\mathcal{J}}$ is minimal in $[{\mathcal{G}}_{1}]_{\mathcal{J}}$ if and only if it is empty or if $\mathcal{G}-e\notin[{\mathcal{G}}_{1}]_{\mathcal{J}}$ for all edges $e$ such that $e\in E$ .

Proof.

If it is empty, then it is clearly also minimal. Otherwise, let ${\mathcal{G}_{2}}\subsetneq\mathcal{G}$ . We have $\mathcal{I}_{\mathcal{J}}(\mathcal{G})\subsetneq\mathcal{I}_{\mathcal{J}}(\mathcal{G}-e)\subseteq\mathcal{I}_{\mathcal{J}}({\mathcal{G}}_{2})$ for $e\in E$ (Proposition 4.7). Therefore, ${\mathcal{G}}_{2}\notin[\mathcal{G}]_{\mathcal{J}}$ .

If $\mathcal{G}$ is minimal in $[{\mathcal{G}}_{1}]_{\mathcal{J}}$ , then it is either the empty graph, or for all $e\in E$ , $\mathcal{G}-e\notin[{\mathcal{G}}_{1}]_{\mathcal{J}}$ by definition of minimality. ∎

Proposition 4.17.

Let $\mathcal{J}_{1}\subseteq\mathcal{J}_{2}$ . If $\mathcal{G}$ is $\mathcal{J}_{1}$ -minimal, then it is also $\mathcal{J}_{2}$ -minimal.

The proposition states that the property of being minimal is preserved when considering a larger set of independences. An equivalence class is finite and nonempty, hence, it always contains a maximal element and a minimal element. We will show later that, when $\mathcal{J}$ is homogeneous, it also contains a greatest element. However, a least element need not exist and Example 6.2 provides an example of this.

Proof.

If $\mathcal{G}=(V,E)$ is empty, then it is also $\mathcal{J}_{2}$ -minimal. Assume instead that $e\in E$ . There exists a triple $(A,B,C)$ such that $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G}-e)$ and $(A,B,C)\notin\mathcal{I}_{\mathcal{J}_{1}}(\mathcal{G})$ . Then $(A,B,C)\in\mathcal{J}_{1}$ and therefore $(A,B,C)\in\mathcal{J}_{2}$ . It follows that $(A,B,C)\in\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G}-e)$ and $(A,B,C)\notin\mathcal{I}_{\mathcal{J}_{2}}(\mathcal{G})$ . As this holds for all $e\in E$ , we see that $\mathcal{G}$ is $\mathcal{J}_{2}$ -minimal (Proposition 4.16). ∎

4.2.1 Marginalization

We say that a class of graphs, $\mathbb{G}$ , is closed under marginalization if for every $\mathcal{G}=(V,E)\in\mathbb{G}$ and every $O\subseteq V$ there exists $\mathcal{M}=(O,E_{O})\in\mathbb{G}$ such that for every $A,B,C\subseteq O$ ,

$$(A,B,C)\in\mathcal{I}_{\star}(\mathcal{G})\Leftrightarrow(A,B,C)\in\mathcal{I}_{\star}(\mathcal{M}),$$

where $\mathcal{I}_{\star}(\mathcal{G})$ is the independence model induced by $\mathcal{G}$ . When $\mathbb{G}$ is the class of DMGs, $\mathcal{I}_{\star}(\cdot)$ could for instance be a $\mathcal{J}$ -weak independence model. Appendix C shows that DMGs with weak equivalence are closed under marginalization. This follows directly from the analogous result in the case of Markov equivalence (Mogensen and Hansen, 2020) using a so-called latent projection (see also Verma and Pearl, 1990a; Richardson et al., 2017).

4.3 $k$ -weak equivalence

In this subsection, we restrict our attention to $k$ -weak equivalence relations. The following result shows that if $|V|=n$ , then $n$ -weak and $(n-1)$ -weak equivalence are the same. By convention, $\beta$ is always $\mu$ -separated from $\alpha$ given $C$ when $\alpha\in C$ . If $|C|=n$ , then $C=V$ , and this leads to a trivial separation.

Proposition 4.18.

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ such that $|V|=n$ . Graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $(n-1)$ -weakly equivalent if and only if they are $n$ -weakly equivalent.

Proof.

If $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $n$ -weakly equivalent, then they are also $(n-1)$ -weakly equivalent.

On the other hand, assume that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $(n-1)$ -weakly equivalent, and let $(\alpha,\beta,C)\in\mathcal{I}_{n}(\mathcal{G}_{1})$ such that $\alpha,\beta\in V$ , $C\subseteq V$ , and $\alpha\notin C$ . We must then have $|C|\leq n-1$ , and therefore $(\alpha,\beta,C)\in\mathcal{I}_{n}(\mathcal{G}_{2})$ by $(n-1)$ -weak equivalence of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ . By Proposition 4.11, this implies $\mathcal{I}_{n}(\mathcal{G}_{1})\subseteq\mathcal{I}_{n}(\mathcal{G}_{2})$ . Changing the roles of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ completes the argument. ∎

Example 4.19 (Weak equivalence class).

In this example, we restrict our attention to graphs with all loops included, in which case graphs $\mathcal{G}_{\mathrm{\textbf{A}}}$ , $\mathcal{G}_{\mathrm{\textbf{B}}}$ , and $\mathcal{G}_{\mathrm{\textbf{C}}}$ in Figure 4.19 constitute a $2$ -weak equivalence class and a $3$ -weak equivalence class. Graph $\mathcal{G}_{\mathrm{\textbf{C}}}$ is the greatest element in both cases. We have that $[\mathcal{G}_{\mathrm{\textbf{C}}}]_{2}\subseteq[\mathcal{G}_{\mathrm{\textbf{C}}}]_{1}$ (Corollary 4.9) and $[\mathcal{G}_{\mathrm{\textbf{C}}}]_{1}=\{\mathcal{G}_{\mathrm{\textbf{A}}},\mathcal{G}_{\mathrm{\textbf{B}}},\mathcal{G}_{\mathrm{\textbf{C}}},\mathcal{G}_{\mathrm{\textbf{D}}}\}$ . We see that $\mathcal{G}_{\mathrm{\textbf{C}}}$ and $\mathcal{G}_{\mathrm{\textbf{D}}}$ are not $2$ -weakly equivalent as $2$ is $\mu$ -separated from $3$ given $\{2,4\}$ in $\mathcal{G}_{\mathrm{\textbf{D}}}$ while this is not the case in $\mathcal{G}_{\mathrm{\textbf{C}}}$ .

Example 4.20.

We give an example of how ‘strong connectivity’, that is, many similar paths, may lead to more edges in a $k$ -weak graph than in the corresponding $n$ -weak graph, $k\leq n$ . For this purpose, we consider graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ as shown in Figure 12. The graph $\mathcal{G}_{2}$ is $2$ -maximal and therefore it is $k$ -maximal for all $k\geq 2$ , including $k=n$ (Corollary 4.15). We construct a smaller graph, $\mathcal{G}_{1}$ , by removing $1\rightarrow 2$ . The smaller graph is not Markov equivalent with $\mathcal{G}_{2}$ , but it is $(n-3)$ -weakly equivalent with $\mathcal{G}_{2}$ .

In terms of interpretation, we see that in this class of graphs there are many directed paths from $\alpha$ to $\beta$ and if there are more than $k$ , then the edge $\alpha\rightarrow\beta$ can be added $k$ -weakly equivalently. In a graphical sense, nodes $\alpha$ and $\beta$ are ‘strongly’ connected as there are more than $k$ disjoint, directed paths from $\alpha$ to $\beta$ and they cannot all be blocked by conditioning on at most $k$ nodes.

We now define treks and directed treks (see also Foygel et al., 2012; Mogensen, 2020a). Foygel et al. (2012) and Mogensen (2020a) used paths in their definitions of treks. We instead use walks so that treks between $\alpha$ and $\alpha$ are also allowed.

Definition 4.21 (Trek, directed trek).

Let $\omega$ be a nontrivial walk between $\alpha$ and $\beta$ ,

[TABLE]

We say that $\omega$ is a trek if it has no colliders. We say that a trek is directed from $\alpha$ to $\beta$ if $\sim_{e}$ has a head at $\beta$ .

We let $\mathrm{dtr}_{\mathcal{G}}(\beta)\subseteq V$ denote the set of nodes, $\alpha$ , such that there exists a directed trek from $\alpha$ to $\beta$ in $\mathcal{G}$ .

Definition 4.22.

Let $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ be DMGs. We say that $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are trek equivalent if for all $\beta\in V$ , it holds that

$$\mathrm{dtr}_{\mathcal{G}_{1}}(\beta)=\mathrm{dtr}_{\mathcal{G}_{2}}(\beta).$$

A walk is $\mu$ -connecting from $\alpha$ to $\beta$ given $\emptyset$ if and only if it is a directed trek from $\alpha$ to $\beta$ , which is reflected in the next corollary.

Corollary 4.23.

Graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $0$ -weakly equivalent if and only if they are trek equivalent.

Proof.

This follows from Corollary E.3. ∎
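Because treks are defined using walks, the sets $\mathrm{dtr}_{\mathcal{G}}(\beta)$ can be computed by plain reachability. The sketch below is an illustration under stated assumptions rather than an algorithm from the paper. It uses the observation that a collider-free walk ending with a head at $\beta$ descends along directed edges from some 'top' node, where a top node is either a proper ancestor of $\beta$ or a sibling of $\beta$ or of one of its proper ancestors. The edge-set representation and the names `dtr` and `trek_equivalent` are hypothetical.

```python
def _closure(start, step):
    """All nodes reachable from `start` by repeatedly following `step`."""
    seen = set(start)
    stack = list(start)
    while stack:
        u = stack.pop()
        for v in step[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def dtr(nodes, directed, bidirected, beta):
    """Nodes alpha with a directed trek (defined via walks) from alpha to beta."""
    parents = {v: set() for v in nodes}
    children = {v: set() for v in nodes}
    siblings = {v: set() for v in nodes}
    for a, b in directed:
        parents[b].add(a)
        children[a].add(b)
    for a, b in bidirected:
        siblings[a].add(b)
        siblings[b].add(a)
    # Proper ancestors: nodes with a directed walk of length >= 1 into beta.
    proper_an = _closure(parents[beta], parents)
    # Top nodes of a trek: proper ancestors, or siblings of beta / its proper ancestors.
    tops = set(proper_an)
    for u in proper_an | {beta}:
        tops |= siblings[u]
    # Every node reached by descending from a top node starts a directed trek to beta.
    return _closure(tops, children)


def trek_equivalent(nodes, g1, g2):
    """g1 and g2 are (directed, bidirected) pairs on the same node set."""
    return all(dtr(nodes, *g1, b) == dtr(nodes, *g2, b) for b in nodes)


# Hypothetical example: a directed 3-cycle versus the complete directed graph.
V = {1, 2, 3}
cycle = ({(1, 2), (2, 3), (3, 1)}, set())
complete = ({(a, b) for a in V for b in V}, set())
cycle_matches_complete = trek_equivalent(V, cycle, complete)
```

For the directed cycle every node is a proper ancestor of every node, so each $\mathrm{dtr}_{\mathcal{G}}(\beta)$ equals $V$ and the cycle is trek equivalent with the complete directed graph.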

In Corollary 4.23, it is important to define treks using walks, not paths. For instance, the graph in Figure 14 is $0$ -weakly equivalent with the complete graph, but the only directed treks from $1$ to $2$ are not paths. Therefore, the result in Corollary 4.23 does not hold if directed treks are required to be paths. We say that a DMG $\mathcal{G}=(V,E)$ , $V=\{1,2,\ldots,n\}$ , contains a directed cycle if there is some permutation of $V$ , $\sigma$ , such that $\sigma(1)\rightarrow\sigma(2)\rightarrow\ldots\rightarrow\sigma(n-1)\rightarrow\sigma(n)\rightarrow\sigma(1)$ in $\mathcal{G}$ (see an example in Figure 14).

Proposition 4.24.

Let $\mathcal{G}=(V,E)$ be a DMG, $V=\{1,2,\ldots,n\}$ , which contains a directed cycle. If every node has a loop, then the complete DMG on $V$ is the greatest element of both $[\mathcal{G}]_{0}$ and $[\mathcal{G}]_{1}$ .

Proof.

For $k=0$ , this follows from Corollary 4.23 as there is a directed trek between any ordered pair of nodes in $\mathcal{G}$ . Let $k=1$ and consider nodes $\alpha$ and $\beta$ . We show that there is no separating set, $C$ , such that $|C|\leq 1$ . If $C=\emptyset$ , this is clear. If $C=\{\gamma\}$ , $\gamma\neq\alpha$ , then either $\alpha\ *\!\!\rightarrow\ldots\rightarrow\beta$ is open, or $\gamma\neq\beta$ and $\alpha\leftarrow\ldots\leftarrow\beta\ *\!\!\rightarrow\beta$ is open. ∎

5 Greatest elements under homogeneous weak equivalences

In the rest of the paper, we assume every weak equivalence relation to be homogeneous (Definition 4.5) as this leads to the existence of a greatest element in each equivalence class which we will prove in Subsection 5.2. Mogensen and Hansen (2020) showed the analogous result in the case of Markov equivalence classes. The notions of $C$ -potential siblings and $C$ -potential parents are central to this proof and are introduced in the next subsection.

5.1 $C$ -potential siblings and $C$ -potential parents

The existence of a greatest element in each $\mathcal{J}$ -weak equivalence class can be proven using $C$ -potential siblings and $C$ -potential parents as introduced in Definitions 5.1 and 5.2. We say that two graphs, $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ , are $C$ -equivalent, $C\subseteq V$ , if for all $\gamma,\delta\in V$ ,

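$$(\gamma,\delta,C)\in\mathcal{I}(\mathcal{G}_{1})\ \Leftrightarrow\ (\gamma,\delta,C)\in\mathcal{I}(\mathcal{G}_{2}).$$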

Let $\alpha,\beta\in V$ and let $e$ be the edge $\alpha\leftrightarrow\beta$ . The conditions (cs1)-(cs3) in Definition 5.1 are sufficient and necessary for $\mathcal{G}$ and $\mathcal{G}+e$ to be $C$ -equivalent. When $e$ is directed, the conditions (cp1)-(cp4) in Definition 5.2 are analogously necessary and sufficient for $\mathcal{G}$ and $\mathcal{G}+e$ to be $C$ -equivalent. The sufficiency is proven in Lemmas D.2 and D.3 and the necessity follows from applying Propositions 5.5 and 5.6 to $\mathcal{G}+e$ .

Definitions 5.1 and 5.2 use an abstract independence model, $\mathcal{I}$ , while Propositions 5.3 and 5.4 describe the content of those definitions in the case of a graphical independence model, $\mathcal{I}=\mathcal{I}(\mathcal{G})$ .

Definition 5.1 ( $C$ -potential sibling).

Let $\mathcal{I}$ be an independence model over $V$ , let $\alpha,\beta\in V$ , and let $C\subseteq V$ . We say that $\alpha$ and $\beta$ are $C$ -potential siblings in $\mathcal{I}$ if (cs1)-(cs3) hold.

(cs1)
if $\alpha\notin C$ : $(\alpha,\beta,C)\notin\mathcal{I}$ , and
if $\beta\notin C$ : $(\beta,\alpha,C)\notin\mathcal{I}$
(cs2)

if $\beta\in C$ : for all $\gamma\in V$ ,

$$(\gamma,\alpha,C)\in\mathcal{I}\ \Rightarrow\ (\gamma,\beta,C)\in\mathcal{I}$$

(cs3)

if $\alpha\in C$ : for all $\gamma\in V$ ,

$$(\gamma,\beta,C)\in\mathcal{I}\ \Rightarrow\ (\gamma,\alpha,C)\in\mathcal{I}$$

Definition 5.2 ( $C$ -potential parent).

Let $\mathcal{I}$ be an independence model over $V$ , let $\alpha,\beta\in V$ , and let $C\subseteq V$ . We say that $\alpha$ is a $C$ -potential parent of $\beta$ in $\mathcal{I}$ if (cp1)-(cp4) hold.

(cp1)

if $\alpha\notin C$ : $(\alpha,\beta,C)\notin\mathcal{I}$

(cp2)

if $\alpha\notin C$ : for all $\gamma\in V$ ,

$$(\gamma,\beta,C)\in\mathcal{I}\ \Rightarrow\ (\gamma,\alpha,C)\in\mathcal{I}$$

(cp3)

if $\alpha\notin C,\beta\in C$ : for all $\gamma,\delta\in V$ ,

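$$(\gamma,\delta,C)\in\mathcal{I}\ \Rightarrow\ (\gamma,\beta,C)\in\mathcal{I}\ \vee\ (\alpha,\delta,C)\in\mathcal{I}$$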

(cp4)

if $\alpha,\beta\notin C$ : for all $\gamma\in V$ ,

$$(\beta,\gamma,C)\in\mathcal{I}\ \Rightarrow\ (\alpha,\gamma,C)\in\mathcal{I}$$

If $\mathcal{I}$ is graphical, $\mathcal{I}=\mathcal{I}(\mathcal{G})$ , and $\alpha$ and $\beta$ are $C$ -potential siblings in $\mathcal{I}(\mathcal{G})$ , we will say that $\alpha\leftrightarrow\beta$ is a $C$ -potential sibling edge between $\alpha$ and $\beta$ . Similarly, we will say that $\alpha\rightarrow\beta$ is a $C$ -potential parent edge from $\alpha$ to $\beta$ if $\alpha$ is a $C$ -potential parent of $\beta$ in $\mathcal{I}(\mathcal{G})$ . The following two propositions simply rewrite Definitions 5.1 and 5.2 to explicitly use $\mu$ -connecting walks in the case of a graphical independence model. Their proofs follow directly from the definitions of $\mu$ -separation and the independence model $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ .

Proposition 5.3 (Graphical version of $C$ -potential siblings).

Let $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ be the weak independence model induced by $\mathcal{G}=(V,E)$ . Let $C\subseteq V$ and let $\mathcal{C}$ be the collection of conditioning sets of $\mathcal{J}$ . Nodes $\alpha$ and $\beta$ are $C$ -potential siblings if and only if $C\notin\mathcal{C}$ or (gcs1)-(gcs3) hold.

(gcs1)
If $\alpha\notin C$ , there exists a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ , and
if $\beta\notin C$ , there exists a $\mu$ -connecting walk from $\beta$ to $\alpha$ given $C$ .
(gcs2)

If $\beta\in C$ , then for all $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ , there also exists a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ .

(gcs3)

If $\alpha\in C$ , then for all $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ , there also exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ .

Proposition 5.4 (Graphical version of $C$ -potential parents).

Let $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ be the weak independence model induced by $\mathcal{G}=(V,E)$ . Let $C\subseteq V$ and let $\mathcal{C}$ be the collection of conditioning sets of $\mathcal{J}$ . The node $\alpha$ is a $C$ -potential parent of $\beta$ if and only if $C\notin\mathcal{C}$ or (gcp1)-(gcp4) hold.

(gcp1)

If $\alpha\notin C$ , there exists a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ .

(gcp2)

If $\alpha\notin C$ , then for all $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ , there also exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ .

(gcp3)

If $\alpha\notin C$ and $\beta\in C$ , then for all $\gamma,\delta\in V$ such that there exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ and a $\mu$ -connecting walk from $\alpha$ to $\delta$ given $C$ , there also exists a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ .

(gcp4)

If $\alpha,\beta\notin C$ then for all $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\alpha$ to $\gamma$ given $C$ , there also exists a $\mu$ -connecting walk from $\beta$ to $\gamma$ given $C$ .

The next two propositions show that if $\alpha\leftrightarrow\beta$ ( $\alpha\rightarrow\beta$ ) is in a graph, then $\alpha$ and $\beta$ are $C$ -potential siblings ( $\alpha$ is a $C$ -potential parent of $\beta$ ) in the independence model of the graph for all $C\subseteq V$ . The edge $e$ is therefore a $C$ -potential sibling edge ( $C$ -potential parent edge) in $\mathcal{I}(\mathcal{G}+e)$ , and if $\mathcal{G}$ and $\mathcal{G}+e$ are $C$ -equivalent, then $e$ is also a $C$ -potential sibling edge ( $C$ -potential parent edge) in $\mathcal{I}(\mathcal{G})$ . This means that $\mathcal{I}(\mathcal{G})$ satisfying the conditions in Definitions 5.1 and 5.2 is necessary for $C$ -equivalence of $\mathcal{G}$ and $\mathcal{G}+e$ .

Proposition 5.5.

Let $\mathcal{J}$ be homogeneous. If $\alpha\leftrightarrow\beta$ is in $\mathcal{G}$ , then $\alpha$ and $\beta$ are $C$ -potential siblings in $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ for all $C\subseteq V$ .

Proof.

If $C\notin\mathcal{C}$ , then it follows immediately. We assume $C\in\mathcal{C}$ and prove (gcs1)-(gcs3). (gcs1) If $\alpha\notin C$ , then $\alpha\leftrightarrow\beta$ is a $\mu$ -connecting walk in $\mathcal{G}$ given $C$ . The proof of the other statement is analogous. (gcs2) Assume that $\beta\in C$ and let $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ . Composing this with $\beta\leftrightarrow\alpha$ gives a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ as $\beta\in C$ . (gcs3) This is shown similarly to (gcs2). ∎

Proposition 5.6.

Let $\mathcal{J}$ be homogeneous. If $\alpha\rightarrow\beta$ is in $\mathcal{G}$ , then $\alpha$ is a $C$ -potential parent of $\beta$ in $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ for all $C\subseteq V$ .

Proof.

If $C\notin\mathcal{C}$ , then this again follows immediately. We instead assume $C\in\mathcal{C}$ and prove (gcp1)-(gcp4). (gcp1) If $\alpha\notin C$ , then $\alpha\rightarrow\beta$ is a $\mu$ -connecting walk given $C$ . (gcp2) Assume that $\alpha\notin C$ and let $\gamma\in V$ , and assume there is a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ . Concatenating this with the edge $\alpha\rightarrow\beta$ gives a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ as $\alpha\notin C$ . (gcp3) Assume that $\alpha\notin C,\beta\in C$ and let $\gamma,\delta\in V$ such that there exist a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ and a $\mu$ -connecting walk from $\alpha$ to $\delta$ given $C$ . Concatenating them with the edge $\alpha\rightarrow\beta$ gives a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ as $\beta\in C$ and $\alpha\notin C$ . (gcp4) Assume $\alpha,\beta\notin C$ and let $\gamma\in V$ such that there exists a $\mu$ -connecting walk from $\alpha$ to $\gamma$ given $C$ . Concatenating the edge $\alpha\rightarrow\beta$ with this walk gives a $\mu$ -connecting walk from $\beta$ to $\gamma$ given $C$ as $\alpha,\beta\notin C$ . ∎

5.2 Existence of greatest elements

Markov equivalence classes of DMGs are known to contain a greatest element (Mogensen and Hansen, 2020). This means that for an equivalence class $[\mathcal{G}]$ , there exists a graph $\bar{\mathcal{G}}\in[\mathcal{G}]$ such that $\bar{\mathcal{G}}$ is a supergraph of all graphs $\tilde{\mathcal{G}}\in[\mathcal{G}]$ . This is a very convenient result as it allows a succinct representation of the entire Markov equivalence class as illustrated in Example 4. The main result of this section, Theorem 5.8, shows that $\mathcal{J}$ -weak equivalence classes enjoy the same property when $\mathcal{J}$ is homogeneous. This means that we can represent weak equivalence classes in a similar way. Section 6 discusses this further and introduces a hierarchy of $k$ -weak equivalence classes for different values of $k$ .

Lemma 5.7.

Let $\mathcal{G}$ be a DMG. Let $\mathcal{J}$ be homogeneous and let $\mathcal{C}$ be the collection of conditioning sets of $\mathcal{J}$ . If $\alpha$ and $\beta$ are $C$ -potential siblings for all $C\in\mathcal{C}$ and $e$ denotes the edge $\alpha\leftrightarrow\beta$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{G})=\mathcal{I}_{\mathcal{J}}(\mathcal{G}+e)$ . If $\alpha$ is a $C$ -potential parent of $\beta$ for all $C\in\mathcal{C}$ and $e$ denotes the edge $\alpha\rightarrow\beta$ , then $\mathcal{I}_{\mathcal{J}}(\mathcal{G})=\mathcal{I}_{\mathcal{J}}(\mathcal{G}+e)$ .

Proof.

The inclusion $\mathcal{I}_{\mathcal{J}}(\mathcal{G}+e)\subseteq\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ follows from Proposition 4.7. We show the other inclusion by contraposition. Proposition 4.11 implies that it is enough to consider triples of the form $(\gamma,\delta,D)$ , $\gamma,\delta\in V$ , $D\subseteq V$ , $\gamma\notin D$ . Assume $(\gamma,\delta,D)\notin\mathcal{I}_{\mathcal{J}}(\mathcal{G}+e)$ . If $(\gamma,\delta,D)\notin\mathcal{J}$ , then $(\gamma,\delta,D)\notin\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ . If instead $(\gamma,\delta,D)\in\mathcal{J}$ , then $(\gamma,\delta,D)\notin\mathcal{I}(\mathcal{G}+e)$ and $D\in\mathcal{C}$ . In this case, there exists a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $D$ in $\mathcal{G}+e$ . Nodes $\alpha$ and $\beta$ are $C$ -potential siblings (or $\alpha$ is a $C$ -potential parent of $\beta$ ) for all $C\in\mathcal{C}$ , and therefore also for $D\in\mathcal{C}$ . Lemma D.2 (Lemma D.3) gives the result. ∎

Lemmas D.2 and D.3 that are used in the above proof are adaptations of lemmas in Mogensen and Hansen (2020). Appendix D describes how to make this generalization.

Let $\mathcal{G}=(V,E)$ be a DMG and let $\mathcal{J}$ be homogeneous. From the independence model $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ , we now define a graph on node set $V$ . As $\mathcal{J}$ is homogeneous, we know that $\mathcal{J}=\{(A,B,C)\in\mathcal{P}:C\in\mathcal{C}\}$ for some $\mathcal{C}\subseteq\{C:C\subseteq V\}$ . For all $\alpha,\beta\in V$ , we include the directed edge $\alpha\rightarrow\beta$ if and only if $\alpha$ is a $C$ -potential parent of $\beta$ for all $C\in\mathcal{C}$ . We include the bidirected edge $\alpha\leftrightarrow\beta$ if and only if $\alpha$ and $\beta$ are $C$ -potential siblings for all $C\in\mathcal{C}$ . We denote the resulting graph by $\mathcal{N}$ . We see that ${\mathcal{N}}$ is uniquely defined from the $\mathcal{J}$ -independence model of $\mathcal{G}$ , $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ , and is therefore the same for all elements of the equivalence class $[\mathcal{G}]_{\mathcal{J}}$ . The following shows that $\mathcal{N}$ is a unique maximal element, that is, a greatest element, in $[\mathcal{G}]_{\mathcal{J}}$ .

Theorem 5.8.

Let $\mathcal{G}$ be a DMG and let $\mathcal{J}$ be homogeneous. The graph $\mathcal{N}$ defined above is $\mathcal{J}$ -weakly equivalent with $\mathcal{G}$ and it is the unique maximal element in $[\mathcal{G}]_{\mathcal{J}}$ .

Proof.

Let $\bar{\mathcal{G}}\in[\mathcal{G}]_{\mathcal{J}}$ . If a directed edge, $\alpha\rightarrow\beta$ , is in $\bar{\mathcal{G}}$ , then $\alpha$ is a $C$ -potential parent of $\beta$ in $\mathcal{I}_{\mathcal{J}}(\bar{\mathcal{G}})=\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ for all $C$ (Proposition 5.6). This means that the directed edge is in ${\mathcal{N}}$ . The same argument applies to bidirected edges (Proposition 5.5), and $\mathcal{N}$ is therefore a supergraph of all graphs in $[\mathcal{G}]_{\mathcal{J}}$ .

Every edge in $\mathcal{N}$ is a $C$ -potential edge in $[\mathcal{G}]_{\mathcal{J}}$ for all $C\in\mathcal{C}$ . We can construct a finite sequence of graphs starting from $\mathcal{G}$ and adding the edges that are in $\mathcal{N}$ , but not in $\mathcal{G}$ , sequentially. Lemma 5.7 shows that all graphs in this sequence are $\mathcal{J}$ -weakly equivalent with $\mathcal{G}$ , and therefore so is $\mathcal{N}$ .

In conclusion, ${\mathcal{N}}$ is a greatest element of the equivalence class. ∎
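For $k$ -weak equivalence on a small node set, the construction of $\mathcal{N}$ can be carried out by brute force: test (gcp1)-(gcp4) and (gcs1)-(gcs3) for every conditioning set with $|C|\leq k$ . The sketch below is illustrative and makes several assumptions that are not taken from the paper: the edge-set representation, the function names, the standard definition of a $\mu$ -connecting walk (with $C\subseteq\mathrm{an}(C)$ ), and the choice to treat directed and bidirected loops like any other candidate edge. Its cost grows exponentially in $k$ , so it is meant for checking small examples only.

```python
from itertools import combinations


def ancestors(directed, targets):
    """Nodes with a directed path (possibly of length zero) into `targets`."""
    an = set(targets)
    changed = True
    while changed:
        changed = False
        for a, b in directed:
            if b in an and a not in an:
                an.add(a)
                changed = True
    return an


def mu_connected(nodes, directed, bidirected, alpha, beta, C):
    """True if some walk from alpha to beta is mu-connecting given C."""
    if alpha in C:
        return False
    an_C = ancestors(directed, C)
    steps = {u: [] for u in nodes}  # (v, head_at_u, head_at_v)
    for a, b in directed:
        steps[a].append((b, False, True))
        steps[b].append((a, True, False))
    for a, b in bidirected:
        steps[a].append((b, True, True))
        steps[b].append((a, True, True))
    frontier = [(v, hv) for v, _, hv in steps[alpha]]
    seen = set(frontier)
    while frontier:
        u, head_in = frontier.pop()
        if u == beta and head_in:
            return True
        for v, head_out, hv in steps[u]:
            collider = head_in and head_out
            passable = (u in an_C) if collider else (u not in C)
            if passable and (v, hv) not in seen:
                seen.add((v, hv))
                frontier.append((v, hv))
    return False


def greatest_element(nodes, directed, bidirected, k):
    """Edges that are C-potential for every C with |C| <= k."""
    nodes = sorted(nodes)
    cond_sets = [set(C) for r in range(k + 1) for C in combinations(nodes, r)]

    def conn(a, b, C):
        return mu_connected(nodes, directed, bidirected, a, b, C)

    def parent_ok(a, b, C):
        if a in C:  # (gcp1)-(gcp4) all require a not in C
            return True
        if not conn(a, b, C):  # (gcp1)
            return False
        if any(conn(g, a, C) and not conn(g, b, C) for g in nodes):  # (gcp2)
            return False
        if b in C:  # (gcp3)
            return not any(conn(g, b, C) and conn(a, d, C) and not conn(g, d, C)
                           for g in nodes for d in nodes)
        return not any(conn(a, g, C) and not conn(b, g, C) for g in nodes)  # (gcp4)

    def sibling_ok(a, b, C):
        if a not in C and not conn(a, b, C):  # (gcs1)
            return False
        if b not in C and not conn(b, a, C):  # (gcs1)
            return False
        if b in C and any(conn(g, b, C) and not conn(g, a, C) for g in nodes):  # (gcs2)
            return False
        if a in C and any(conn(g, a, C) and not conn(g, b, C) for g in nodes):  # (gcs3)
            return False
        return True

    new_directed = {(a, b) for a in nodes for b in nodes
                    if all(parent_ok(a, b, C) for C in cond_sets)}
    new_bidirected = {(a, b) for a in nodes for b in nodes if a <= b
                      and all(sibling_ok(a, b, C) for C in cond_sets)}
    return new_directed, new_bidirected


# Hypothetical example: directed 3-cycle with all loops (compare Proposition 4.24).
V = {1, 2, 3}
cycle = {(v, v) for v in V} | {(1, 2), (2, 3), (3, 1)}
N_directed, N_bidirected = greatest_element(V, cycle, set(), 1)
```

On this example the computed graph contains every directed and every bidirected edge on $V$ , in agreement with Proposition 4.24. Under the same assumptions, calling the function with $k-1$ on a $k$ -maximal graph gives the greatest element of the $(k-1)$ -weak equivalence class that contains it.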

Theorem 5.8 is central in our development of graphical modeling based on weak equivalence as it provides a unique and interpretable representative of each equivalence class. We give examples of applications of this theorem in Section 6.

5.2.1 Comparison with Markov equivalence case

The above definitions and results are related to results in the case of Markov equivalence (Mogensen and Hansen, 2020). Definitions 5.1 and 5.2 can be thought of as $C$ -specific versions of Definitions 5.1 and 5.5 in Mogensen and Hansen (2020). This leads to $C$ -specific versions of Propositions 5.5 and 5.6 that are analogous to propositions in Mogensen and Hansen (2020).

Importantly, the potential parent conditions of Mogensen and Hansen (2020) use multiple conditioning sets and are therefore not amenable as a foundation for the proof of Theorem 5.8. The conditions in this paper use a single $C$ which facilitates the generalization from Markov equivalence classes to weak equivalence classes. The reformulation of the definitions also entails an important change of perspective. Instead of describing conditions such that the addition of an edge does not change the independence model for any conditioning set (Markov equivalence), the above conditions describe conditions such that the addition of an edge does not change the independence model when restricted to a specific conditioning set. This allows us to aggregate these conditions for any set of conditioning sets as defined by a homogeneous $\mathcal{J}$ , and from this we can prove the existence of a greatest element in this more general setting.

6 Representation of weak equivalence classes

The previous section proved the existence of a greatest element in each weak equivalence class when $\mathcal{J}$ is homogeneous. In Subsection 6.1, we first describe how this leads to a simple and concise representation of an entire equivalence class, and Subsection 6.3 illustrates this representation using the alarm example. In Subsection 6.2, we restrict our attention to $k$ -weak equivalence and describe a hierarchy of $k$ -weak equivalence classes. Choosing a $k=0,1,\ldots,n-1$ leads to different notions of equivalence with different levels of granularity. The hierarchy in Subsection 6.2 provides a graphical representation of $k$ -weak equivalence classes across different values of $k$ which is meant to illuminate how equivalence classes change across different values of $k$ .

6.1 Directed mixed equivalence graph

The following definition provides a graphical object representing an entire weak equivalence class. Mogensen and Hansen (2020) gave the same definition in the context of Markov equivalence as illustrated in Example 4.

Definition 6.1 (Directed mixed equivalence graph (DMEG)).

Let $\mathcal{J}$ be homogeneous and assume that $\mathcal{N}=(V,F)$ is $\mathcal{J}$ -maximal and $\mathcal{N}\in[\mathcal{G}]_{\mathcal{J}}$ . We define $\bar{F}\subseteq F$ such that $e\in\bar{F}$ if and only if $e\in F$ and there exists $\tilde{\mathcal{G}}=(V,E)\in[\mathcal{G}]_{\mathcal{J}}$ such that $e\notin E$ . We define the directed mixed weak equivalence graph (DMEG) of $[\mathcal{G}]_{\mathcal{J}}$ as the triple $(V,F,\bar{F})$ .

We visualize a directed mixed weak equivalence graph by drawing the corresponding maximal graph and making all edges in $\bar{F}$ dashed (see the example in Figure 15). A DMEG summarizes the equivalence class in the following sense. Let $\mathcal{N}$ be a $\mathcal{J}$ -maximal element such that $\mathcal{N}\in[\mathcal{G}]_{\mathcal{J}}$ , that is, $\mathcal{N}$ is the greatest element of $[\mathcal{G}]_{\mathcal{J}}$ , and let $\mathcal{N}^{\prime}$ be the corresponding DMEG. If an edge is solid in $\mathcal{N}^{\prime}$ , then this edge is in every ${\mathcal{G}}_{1}\in[\mathcal{G}]_{\mathcal{J}}$ . If an edge is absent in $\mathcal{N}^{\prime}$ , then no ${\mathcal{G}}_{1}\in[\mathcal{G}]_{\mathcal{J}}$ contains this edge. If an edge, $e$ , is dashed in $\mathcal{N}^{\prime}$ , then there exists a ${\mathcal{G}}_{1}=(V,E)\in[\mathcal{G}]_{\mathcal{J}}$ such that $e\notin E$ . Clearly $e$ is in $\mathcal{N}\in[\mathcal{G}]_{\mathcal{J}}$ and therefore $e$ is in some elements of $[\mathcal{G}]_{\mathcal{J}}$ , but not in others. One should note that removing multiple dashed edges from $\mathcal{N}^{\prime}$ does not necessarily lead to a $\mathcal{J}$ -weakly equivalent graph as removing an edge may impose restrictions on which other edges can be removed while maintaining $\mathcal{J}$ -weak equivalence. This is related to the fact that a weak equivalence class need not contain a least element (see Figure 15).
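When the members of an equivalence class are listed explicitly, the triple $(V,F,\bar{F})$ can be read off directly. The sketch below is a minimal illustration under assumptions of our own: the class is given as a hypothetical list of edge sets, edges are encoded as tuples such as `(2, "->", 3)`, and the toy class is made up for the example rather than taken from a figure. Computing the class itself is not shown.

```python
def dmeg(edge_sets):
    """Return (F, F_bar) for an equivalence class given as a list of edge sets."""
    edge_sets = [frozenset(E) for E in edge_sets]
    # The greatest element must contain every edge of every member.
    F = frozenset().union(*edge_sets)
    if F not in edge_sets:
        raise ValueError("the listed graphs have no greatest element")
    # Solid edges are those shared by all members; the rest of F is dashed.
    solid = frozenset.intersection(*edge_sets)
    return F, F - solid


# Hypothetical toy class with three members on V = {1, 2, 3, 4}.
base = {(1, "->", 2), (3, "->", 3)}
toy_class = [
    base | {(4, "->", 3)},
    base | {(2, "->", 3)},
    base | {(4, "->", 3), (2, "->", 3)},
]
F, F_bar = dmeg(toy_class)
```

In this toy class both $4\rightarrow 3$ and $2\rightarrow 3$ are dashed, yet the graph obtained by removing both is not a member, so the class has no least element even though it has a greatest one.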

Example 6.2 (Directed mixed equivalence graph).

Graphs $\mathcal{G}_{\mathbf{A}}$ , $\mathcal{G}_{\mathbf{B}}$ , and $\mathcal{G}_{\mathbf{C}}$ in Figure 15 constitute a 2-weak and a 3-weak equivalence class when restricting to DMGs that have all loops present (for simplicity we make this assumption). The graph $\mathcal{G}_{\mathbf{C}}$ is the greatest element. The corresponding DMEG is also shown in Figure 15, see Definition 6.1. The 3-weak equivalence class (2-weak equivalence class) does not contain a least element as removing both $4\rightarrow 3$ and $2\rightarrow 3$ does not lead to a 3-weakly equivalent graph (2-weakly equivalent graph).

Example 6.3.

This example describes a setting which leads to a weak equivalence with a homogeneous $\mathcal{J}$ which is not a $k$ -weak equivalence. We consider a setting where a $5$ -dimensional process is observed, $V=\{1,2,3,4,5\}$ , but not every coordinate process is observed simultaneously. This is essentially a setting with overlapping variable sets, see, e.g., Danks (2002); Danks et al. (2008); Triantafillou et al. (2010); Huang et al. (2020). We assume that data contains observations of $X_{t}^{R}$ over an interval $T_{R}$ for $R\in\mathcal{R}$ ,

[TABLE]

The intervals are disjoint, $T_{R_{1}}\cap T_{R_{2}}=\emptyset$ for $R_{1}\neq R_{2}$ . We will approach this problem by restricting the local independences that can be tested using this data and require that there exists $R\in\mathcal{R}$ such that $A,B,C\subseteq R$ for us to be able to test the local independence $(A,B,C)$ .

We see that all local independences, $(\alpha,\beta,C)$ , such that $\alpha,\beta\in V$ and $|C|\leq 1$ can be tested from this data as every triple, $\{\alpha,\beta,\gamma\}$ , $\alpha,\beta,\gamma\in V$ , is observed simultaneously (that is, $\alpha,\beta,\gamma\in R$ for some $R\in\mathcal{R}$ ). We can also test $(\alpha,\beta,\{1,2\})$ for all $\alpha,\beta\in\{1,2,3,4,5\}$ , but not $(4,5,\{1,3\})$ . This means that we can model this using $k$ -weak equivalence, but only for $k=0$ or $k=1$ . We can obtain further information by defining

[TABLE]

This leads to a homogeneous weak equivalence relation which is not a $k$ -weak equivalence.

6.2 Hierarchy of $k$ -weak equivalence

The previous section describes a graph, the directed mixed equivalence graph, which can help us understand a single weak equivalence class for a fixed, homogeneous $\mathcal{J}$ . In this section, we restrict our attention to $k$ -weak equivalence relations and study a description of $k$ -weak equivalence classes for varying values of $k$ . We consider a fixed node set, $V$ . For each value of $k$ , the $k$ -weak equivalence classes form a partition of the DMGs on node set $V$ , with smaller $k$ corresponding to coarser partitions. Each weak equivalence class can be represented by its maximal element and there is an interpretable structure between $k$ -weak equivalence classes for different values of $k$ which can help us understand the connection between these different notions of equivalence. This section describes this hierarchy of $k$ -weak equivalences.

6.2.1 Levels of granularity

Let $\mathcal{G}$ be a DMG, and let $k_{1}<k_{2}$ . Let $\mathcal{N}_{1}$ denote the greatest element of $[\mathcal{G}]_{k_{1}}$ and let $\mathcal{N}_{2}$ denote the greatest element of $[\mathcal{G}]_{k_{2}}$ . We know that $[\mathcal{G}]_{k_{2}}\subseteq[\mathcal{G}]_{k_{1}}$ and it follows that $\mathcal{N}_{2}\subseteq\mathcal{N}_{1}$ . The graphs $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ are both representatives of $\mathcal{G}$ , but at different levels of granularity. The $k_{2}$ -weak equivalence class of $\mathcal{G}$ is smaller, thus $k_{2}$ -weak equivalence is more expressive than $k_{1}$ -weak equivalence. We may ask what ‘approximation error’ we make by using $k_{1}$ -weak equivalence instead of $k_{2}$ -weak equivalence. Let $e$ be an edge in $\mathcal{N}_{1}$ which is not in $\mathcal{N}_{2}$ . We know that $\mathcal{G}$ and $\mathcal{G}+e$ are $k_{1}$ -weakly equivalent, so they can only differ on $\mu$ -separations with $C$ such that $|C|>k_{1}$ . The approximation error induced by including $e$ is therefore restricted to ‘large’ conditioning sets. From a practical point of view, local independence tests with large conditioning sets are expected to perform poorly. This means that the loss of information when testing local independences from finite samples may be small.

6.2.2 Forest representation

We can provide a convenient representation of the $k$ -weak equivalence hierarchy using trees and forests. A tree, $\mathcal{T}=(V_{\mathcal{T}},E_{\mathcal{T}})$ , is an undirected graph in which each pair of distinct nodes is connected by exactly one path. A forest is the disjoint union of a set of trees. We can construct a forest in the following way. For a fixed $V$ , $|V|=n$ , and $k\in K=\{0,1,\ldots,n-1\}$ , we consider the set of $k$ -weak equivalence classes of DMGs on node set $V$ . We let $n_{k}$ denote the number of such equivalence classes. The $i$ ’th $k$ -weak equivalence class, $i=1,\ldots,n_{k}$ , contains a unique maximal element and we denote this graph by $\mathcal{G}_{k,i}$ . We do this for every $k\in\{0,1,\ldots,n-1\}$ and define a node set

$$V_{\mathrm{g}}=\bigsqcup_{k=0}^{n-1}\{\mathcal{G}_{k,i}:i=1,\ldots,n_{k}\}.$$

Note that we write this as a disjoint union as the same graph may be a maximal element for different $k$ . Therefore, the set $V_{\mathrm{g}}$ contains pairs $(\mathcal{G},k)$ such that $\mathcal{G}$ is $k$ -maximal. For instance, if $\mathcal{G}$ is a maximal element of a $k_{1}$ -weak equivalence class and of a $k_{2}$ -weak equivalence class, then $(\mathcal{G},k_{1})\in V_{\mathrm{g}}$ and $(\mathcal{G},k_{2})\in V_{\mathrm{g}}$ and these are different nodes.

We now construct a forest with node set $V_{\mathrm{g}}$ in the following way. For each $(\mathcal{G},k)$ such that $k>0$ , there exists a unique $(k-1)$ -maximal graph, $\bar{\mathcal{G}}$ , such that $\mathcal{G}\in[\bar{\mathcal{G}}]_{k-1}$ , and we join $({\mathcal{G}},k)$ to $(\bar{\mathcal{G}},k-1)$ by an undirected edge. We call the resulting graph the weak equivalence hierarchy over $V$ and denote it by $\mathcal{H}_{V}$ . For $k<n-1$ , we will use $\mathrm{up}(\mathcal{G},k)$ to denote the (nonempty) set of graphs $\bar{\mathcal{G}}$ such that $(\mathcal{G},k)$ and $(\bar{\mathcal{G}},\bar{k})$ are adjacent in $\mathcal{H}_{V}$ and such that $\bar{k}=k+1$ . For $k>0$ , we will use $\mathrm{down}(\mathcal{G},k)$ to denote the unique graph $\bar{\mathcal{G}}$ such that $(\mathcal{G},k)$ and $(\bar{\mathcal{G}},\bar{k})$ are adjacent in $\mathcal{H}_{V}$ and such that $\bar{k}=k-1$ . Example 6.4 and Figure 16 describe (parts of) the weak hierarchy over $V=\{1,2,3,4\}$ .

Properties of $\mathcal{H}_{V}$

We first argue that $\mathcal{H}_{V}$ is a forest. The nodes $(\mathcal{G}_{0,i},0)$ , $i=1,\ldots,n_{0}$ , must be in different connected components as for each node there is at most a single edge down in the hierarchy. Using induction on $k$ and Corollary 4.9, we see that if $\mathcal{G}_{k,i}\in[\mathcal{G}_{0,j}]_{0}$ , then there is a path between $(\mathcal{G}_{k,i},k)$ and $(\mathcal{G}_{0,j},0)$ , and $V_{j}=\{(\mathcal{G}_{k,i},k):\mathcal{G}_{k,i}\in[\mathcal{G}_{0,j}]_{0}\}$ is therefore a connected subset of $V_{\mathrm{g}}$ . It contains exactly $|V_{j}|-1$ edges and is thus a tree. This means that $\mathcal{H}_{V}$ consists of $n_{0}$ disjoint trees, each tree rooted at $\mathcal{G}_{0,j}$ for some $j=1,2,\ldots,n_{0}$ . Corollary 4.23 characterizes $0$ -weak equivalence.

When $i_{1}\neq i_{2}$ , the classes $[\mathcal{G}_{k_{1},i_{1}}]_{k_{1}}$ and $[\mathcal{G}_{k_{2},i_{2}}]_{k_{2}}$ are disjoint when $k_{1}=k_{2}$ , but need not be when $k_{1}\neq k_{2}$ . For $k_{2}\geq k_{1}$ and $i_{1}=1,\ldots,n_{k_{1}}$ , there exists $i_{2}$ such that $\mathcal{G}_{k_{1},i_{1}}=\mathcal{G}_{k_{2},i_{2}}$ which is due to the fact that if a graph is $k_{1}$ -maximal, then it is also $k_{2}$ -maximal (Corollary 4.15). The leaves of the trees are the greatest elements of the Markov equivalence classes (Proposition 4.18).

The graph $\mathcal{H}_{V}$ represents the entire system of $k$ -weak equivalence classes and can be conveniently drawn in levels such that the vertical placement is determined by $k$ (see Figure 16). Let $[\mathcal{G}_{k,i}]_{k}$ be a $k$ -weak equivalence class represented by its greatest element $\mathcal{G}_{k,i}$ . If we move along the unique edge towards a $(k-1)$ -maximal graph, we obtain the maximal element of the $(k-1)$ -weak equivalence class containing the graph $\mathcal{G}_{k,i}$ by definition of $\mathcal{H}_{V}$ . If we move to the $(k+1)$ -level, one of the $(k+1)$ -weak equivalence classes will be represented by $\mathcal{G}_{k,i}$ itself. Naturally, moving towards larger $k$ in the hierarchy achieves smaller equivalence classes as if $\mathcal{G}$ is $(k-1)$ -maximal, then $[\mathcal{G}]_{k-1}=\bigcup_{\bar{\mathcal{G}}\in\mathrm{up}(\mathcal{G},k-1)}[\bar{\mathcal{G}}]_{k}$ .

Dashed edges in the hierarchy

In $\mathcal{H}_{V}$ , one may use DMEGs instead of the corresponding maximal DMGs, and in this paragraph we think of a node $(\mathcal{G},k)$ in $\mathcal{H}_{V}$ as a pair consisting of a DMEG and an integer. In this case, there is also a certain structure in the dashed/solid status of edges across levels of $k$ . If an edge $\alpha\sim\beta$ is solid in $(\mathcal{G},k)$ , then it is also solid in all graphs $\bar{\mathcal{G}}\in\mathrm{up}(\mathcal{G},k)$ . This is seen from the fact that if $\tilde{\mathcal{G}}\in[\bar{\mathcal{G}}]_{k+1}$ then $\tilde{\mathcal{G}}\in[\mathcal{G}]_{k}$ and every graph in this equivalence class contains $\alpha\sim\beta$ which is why all graphs in $[\bar{\mathcal{G}}]_{k+1}$ also contain it. If the edge $\alpha\sim\beta$ is dashed in $(\mathcal{G},k)$ , then it is also dashed in $\bar{\mathcal{G}}=\mathrm{down}(\mathcal{G},k-1)$ . This is because there exists a graph $\tilde{\mathcal{G}}\in[{\mathcal{G}}]_{k}$ without this edge, and $\tilde{\mathcal{G}}\in[\bar{\mathcal{G}}]_{k-1}$ . On the other hand, the edge is in the maximal element of $[{\mathcal{G}}]_{k}$ , thus the edge must be present in $\bar{\mathcal{G}}$ and dashed.

On the other hand, moving up (towards larger values of $k$ ) in the hierarchy a dashed edge may be removed, become solid, or remain dashed. Moving down (towards smaller values of $k$ ) in the hierarchy a solid edge may become dashed.

Example 6.4 ( $k$ -weak hierarchy over $V=\{1,2,3,4\}$ ).

Figure 16 shows a subgraph of $\mathcal{H}_{V}$ for $V=\{1,2,3,4\}$ . A node in $\mathcal{H}_{V}$ , $(\mathcal{G},k)$ , is shown as $\mathcal{G}$ (or rather, the corresponding DMEG), and $k$ determines the vertical placement of the node. All loops are present in the maximal graphs, but omitted from the visualization for simplicity. We use the edge $\alpha\mathrel{\text{\ooalign{$\filleddiamond\!\!\!\!\;-\!\!\!\!\;\filleddiamond$}}}\beta$ to indicate that all three possible edges between a pair of nodes, $\alpha$ and $\beta$ , are present in the graph, that is, $\alpha\rightarrow\beta,\alpha\leftarrow\beta,\alpha\leftrightarrow\beta$ . The letters $(x,y)$ to the right of a graph index the graphs shown in the figure.

Figure 16 shows two subtrees of trees in the hierarchy. We see that the two graphs shown on level $k=0$ , $(a,a)$ and $(e,a)$ , are not $0$ -weak equivalent as there is no directed trek from $1$ to $2$ in $(e,a)$ (see also Corollary 4.23).

In the figure, a red undirected edge indicates graph equality, for example, the edge between $(a,c)$ and $(a,d)$ . As noted above, if $\mathcal{G}$ is $k$ -maximal, then $\mathcal{G}\in\mathrm{up}(\mathcal{G},k)$ and when $\mathcal{G}$ is drawn both in levels $k$ and $k+1$ , we indicate this by making the undirected edge connecting them red.

6.3 Alarm network

We return to the alarm example from Subsection 2.1. This is a network of moderate size with $10$ observable coordinate processes. If we consider graphical modeling of this network using a $k$ -weak equivalence relation, different values of $k\in\{0,1,\ldots,10\}$ lead to different levels of granularity as larger values of $k$ will give us smaller equivalence classes. Let $\mathcal{G}$ denote the latent projection of the system (see Figure 1), and let $\mathcal{N}_{k}$ denote the greatest element of $[\mathcal{G}]_{k}$ . Figure 17 shows the DMEGs of $\mathcal{N}_{10}$ and of $\mathcal{N}_{3}$ . We know that $\mathcal{N}_{10}\subseteq\mathcal{N}_{3}$ . In this example, we see that the only difference between the two DMEGs in Figure 17 is the bidirected edge between $3$ and $10$ . This edge is necessarily dashed as $\mathcal{N}_{10}\in[\mathcal{G}]_{3}=[\mathcal{N}_{3}]_{3}$ . The added complexity of using $k=10$ therefore does not provide much additional information in this example.

7 Algorithms for weak equivalence

The results in Section 3 imply that several computational tasks that occur naturally when using $\mu$ -separation and local independence for graphical modeling of stochastic processes are not feasible, even for a moderate number of coordinate processes. Section 4 introduces a more flexible notion of equivalence to circumvent these issues and Section 5 shows that the convenient theory of Markov equivalence classes translates seamlessly to the more general notion of weak equivalence. As a last component of this paper, we argue that this more general theory leads to algorithms that are in fact feasible from a computational point of view.

7.1 A parametrized hierarchy of graphical equivalence

We start this subsection by providing a formal definition of the weak equivalence decision problem.

Decision problem 7.1 (Weak Markov equivalence in DMGs).

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ $\mathcal{J}$ -weakly equivalent?

Decision problem 7.1 is coNP-complete as it is a more general problem than Decision problem 3.1. We restrict this to $k$ -weak equivalence and obtain a parametrized decision problem.

Decision problem 7.2 (Weak Markov equivalence in DMGs).

Let $k$ be a nonnegative integer, and let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ $k$ -weakly equivalent?

A decision problem is said to be slicewise polynomial if there exists an algorithm which solves the problem in $\mathcal{O}(n^{g(k)})$ steps for a computable function $g$ , input length $n$ , and parameter $k$ . For fixed $k$ , we can decide $k$ -weak equivalence of two DMGs by simply checking every possible triple $(\alpha,\beta,C)$ , $\alpha,\beta\in V$ , $C\subseteq V$ , $|C|\leq k$ . This can be done in time bounded by $n^{g(k)}$ as the number of conditioning sets is bounded by $n^{k}$ . This shows that parametrized $k$ -weak equivalence is a slicewise polynomial problem, in that for a fixed $k$ it is solvable by an algorithm which is polynomial in $n$ . One should note that this is different from the $m$ -sparse decision problems (e.g., Decision problem 3.6) as they remain hard for a fixed $m$ whenever $m\geq 16$ .
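As a worked count of the triples involved, written in terms of the number of nodes $|V|$ rather than the input length $n$ (an assumption made here for concreteness), the conditioning sets of size at most $k$ and the triples $(\alpha,\beta,C)$ satisfy

$$\#\{C\subseteq V:|C|\leq k\}=\sum_{j=0}^{k}\binom{|V|}{j}\leq(|V|+1)^{k},\qquad\#\{(\alpha,\beta,C):|C|\leq k\}\leq|V|^{2}(|V|+1)^{k}.$$

For instance, with $|V|=10$ and $k=3$ there are $1+10+45+120=176$ conditioning sets and $100\cdot 176=17{,}600$ triples, whereas $k=10$ gives $2^{10}=1024$ conditioning sets and $102{,}400$ triples. The gap widens quickly as $|V|$ grows, since the unrestricted count is $2^{|V|}$.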

Intuitively, the unrestricted Markov equivalence problem is computationally hard as the maximal size of the conditioning sets also grows with $n$ . On the other hand, if we consider $k$ -weak equivalence for a fixed $k$ then the maximal size of the conditioning sets is fixed, and the problem can be solved in time which scales polynomially in $n$ .
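The brute-force check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the edge encoding (`("d", a, b)` for $a\rightarrow b$, `("b", a, b)` for $a\leftrightarrow b$) and all function names are hypothetical, and $\mu$-connecting walks are taken in the sense of Definition 2.4 (the walk starts outside $C$, ends with a head at $\beta$, every collider is in $\mathrm{an}(C)$, and no noncollider is in $C$). The search runs over pairs consisting of a node and whether the last edge has a head at that node.

```python
from itertools import combinations

# Hypothetical encoding: ("d", a, b) is a -> b, ("b", a, b) is a <-> b.


def ancestors(edges, C):
    """Return an(C): the nodes in C plus all nodes with a directed path into C."""
    an = set(C)
    grew = True
    while grew:
        grew = False
        for kind, a, b in edges:
            if kind == "d" and b in an and a not in an:
                an.add(a)
                grew = True
    return an


def mu_connected(edges, alpha, beta, C):
    """True if some mu-connecting walk from alpha to beta given C exists."""
    if alpha in C:
        return False
    an = ancestors(edges, C)
    # Each step is (from, head at from, to, head at to); edges can be walked both ways.
    steps = []
    for kind, a, b in edges:
        steps.append((a, kind == "b", b, True))
        steps.append((b, True, a, kind == "b"))
    frontier = [(w, head_w) for v, _, w, head_w in steps if v == alpha]
    seen = set(frontier)
    while frontier:
        v, head_in = frontier.pop()
        if v == beta and head_in:
            return True  # nontrivial walk ending with a head at beta
        for u, head_out, w, head_w in steps:
            if u != v:
                continue
            collider = head_in and head_out
            allowed = (v in an) if collider else (v not in C)
            if allowed and (w, head_w) not in seen:
                seen.add((w, head_w))
                frontier.append((w, head_w))
    return False


def independence_model(nodes, edges, k):
    """Triples (alpha, beta, C), |C| <= k, with beta mu-separated from alpha given C."""
    return {
        (a, b, C)
        for size in range(k + 1)
        for C in combinations(nodes, size)
        for a in nodes
        for b in nodes
        if not mu_connected(edges, a, b, set(C))
    }


def k_weakly_equivalent(nodes, edges1, edges2, k):
    """Compare the two graphs on every triple with a conditioning set of size <= k."""
    return independence_model(nodes, edges1, k) == independence_model(nodes, edges2, k)


nodes = [1, 2, 3]
chain = {("d", 1, 2), ("d", 2, 3)}
shortcut = chain | {("d", 1, 3)}
print(k_weakly_equivalent(nodes, chain, shortcut, 0))
print(k_weakly_equivalent(nodes, chain, shortcut, 1))
```

In this toy input, the chain $1\rightarrow 2\rightarrow 3$ and the same graph with $1\rightarrow 3$ added agree on all triples with $C=\emptyset$ but differ for $k=1$, because only the second graph connects $1$ to $3$ given $\{2\}$.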

7.2 Computing greatest elements and directed mixed equivalence graphs

As explained above, for a fixed $k$ one can decide $k$ -weak equivalence in polynomial time. The same applies to the related computational problems.

Assume we have a graph $\mathcal{G}$ and want to find the maximal element of $[\mathcal{G}]_{k}$ . A simple algorithm checks for each edge if its addition violates any of the independences in $[\mathcal{G}]_{k}$ and adds the edge if and only if this is not the case. For a fixed $k$ , this is done in polynomial time.

When considering a weak equivalence class as represented by its greatest element, we are interested in computing the associated directed mixed equivalence graph (DMEG) as this graph represents the entire equivalence class concisely. We may remove a single edge at a time and decide Markov equivalence to obtain the corresponding DMEG from a greatest element.
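The two procedures in this subsection can be sketched together. This is a hedged illustration only: the edge encoding and every function name are assumptions made here, the $\mu$-connection test follows Definition 2.4, and the dashed/solid status is decided by comparing $k$-weak independence models after removing one edge, which is our reading of how the edge-removal step carries over to a $k$-weak equivalence class. The sketch relies on the existence of a greatest element, as established earlier in the paper.

```python
from itertools import combinations

# Hypothetical encoding: ("d", a, b) is a -> b; ("b", a, b) is a <-> b, stored
# with a listed no later than b in `nodes` so each bidirected edge has one form.


def mu_connected(edges, alpha, beta, C):
    """True if some mu-connecting walk from alpha to beta given C exists."""
    if alpha in C:
        return False
    an, grew = set(C), True
    while grew:  # close C under directed parents to obtain an(C)
        parents = {a for kind, a, b in edges if kind == "d" and b in an}
        grew = not (parents <= an)
        an |= parents
    # (from, head at from, to, head at to) for both traversal directions
    steps = [s for kind, a, b in edges
             for s in ((a, kind == "b", b, True), (b, True, a, kind == "b"))]
    frontier = [(w, hw) for v, _, w, hw in steps if v == alpha]
    seen = set(frontier)
    while frontier:
        v, head_in = frontier.pop()
        if v == beta and head_in:
            return True
        for u, head_out, w, hw in steps:
            collider = head_in and head_out
            ok = (v in an) if collider else (v not in C)
            if u == v and ok and (w, hw) not in seen:
                seen.add((w, hw))
                frontier.append((w, hw))
    return False


def model(nodes, edges, k):
    """Separation triples (alpha, beta, C) with |C| <= k."""
    return frozenset(
        (a, b, C)
        for r in range(k + 1) for C in combinations(nodes, r)
        for a in nodes for b in nodes
        if not mu_connected(edges, a, b, set(C)))


def candidate_edges(nodes):
    """Every directed and bidirected edge (loops included) on the node list."""
    directed = {("d", a, b) for a in nodes for b in nodes}
    bidirected = {("b", a, b) for i, a in enumerate(nodes) for b in nodes[i:]}
    return directed | bidirected


def greatest_element(nodes, edges, k):
    """Add each edge whose addition leaves the k-weak independence model unchanged."""
    base = model(nodes, edges, k)
    addable = {e for e in candidate_edges(nodes) - edges
               if model(nodes, edges | {e}, k) == base}
    return edges | addable


def dmeg_status(nodes, greatest, k):
    """Mark an edge dashed if removing it alone stays in the class, else solid."""
    base = model(nodes, greatest, k)
    return {e: "dashed" if model(nodes, greatest - {e}, k) == base else "solid"
            for e in greatest}


nodes = [1, 2]
g = {("d", 1, 2)}  # loop-free toy graph, chosen only to keep the example small
n0 = greatest_element(nodes, g, 0)
print(sorted(n0))
print(dmeg_status(nodes, n0, 0))
```

For a fixed $k$ each call to `model` enumerates polynomially many triples, and it is called once per candidate edge, so the whole computation stays polynomial in the number of nodes. In the toy input, the greatest element for $k=0$ gains both loops at node $2$ as dashed edges, while for $k=2$ no edge can be added.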

8 Learning

There is a large literature on methods for recovering a graph from observational data (Spirtes and Zhang, 2018). In the case of DAG-based models, many methods use tests of conditional independence. Similarly, it is possible to learn local independence graphs using tests of local independence. In this section, we briefly discuss graphical structure learning based on tests of local independence as described by Meek (2014) and its connection to weak equivalence of DMGs. Mogensen et al. (2018) described a learning algorithm outputting the Markov equivalence DMEG from tests of local independence. Absar and Zhang (2021) implemented a PC-like algorithm based on $\mu$ -separation. Bhattacharjya et al. (2022) studied independence tests in proximal graphical event models and graphical structure learning based on tests of local independence. Other work described tests of local independence (Thams and Hansen, 2021; Christgau et al., 2022), and good tests are of course a prerequisite for constraint-based structure learning. The learning problem has also been studied for discrete-time processes (Eichler, 2013).

As argued in previous sections, constraint-based algorithms that learn the Markov equivalence class of a partially observed local independence graph and are correct in the oracle setting scale poorly with the size of the graph. Therefore, $k$ -weak equivalence classes may constitute more reasonable targets for graphical structure learning. The oracle learning algorithm in Mogensen et al. (2018) leveraged the potential sibling and potential parent criteria to ensure correctness, though the number of these conditions also scales poorly with graph size, $n$ . This naturally leads to the idea of using $C$ -potential sibling and $C$ -potential parent criteria directly for learning. In the oracle case this leads to a straightforward learning algorithm that starts from the complete DMG. For each pair of nodes, $(\alpha,\beta)$ , one may test the $C$ -potential parent criteria for all $C$ such that $|C|\leq k$ . If one of these criteria is violated, one simply removes $\alpha\rightarrow\beta$ , and similarly for the bidirected edges. For fixed $k$ , this leads to a polynomial-time oracle learning algorithm which outputs the maximal $k$ -weakly equivalent graph of the true graph. This is similar to early stopping in FCI (Spirtes, 2001) as it only uses tests with small conditioning sets $C$ . While smaller values of $k$ lead to less informative output (larger equivalence classes), the interpretation of a learned DMEG remains the same as when using $k=n$ , as shown by the theory in previous sections.

Outside of the oracle setting, actual tests of local independence output a $p$ -value. When learning local independence graphs, one may compute $p$ -values from the local independence tests that comprise the $C$ -potential parent/sibling criteria, $|C|\leq k$ , and use these $p$ -values to output a maximal graph which is in minimum violation with the data; see, e.g., Hyttinen et al. (2014) for a similar idea in DAG-based graphical structure learning.

9 Discussion

The results in Section 3 show that deciding Markov equivalence is computationally hard, even under a sparsity constraint. This also implies that finding the unique maximal element of a Markov equivalence class is hard and that constraint-based learning algorithms that are correct in oracle versions need exponentially many tests in the worst case.

The theory developed in this paper provides a new interpretation of $\mu$ -separation in directed mixed graphs as representations of local independence in partially observed stochastic processes. This leads to equivalence relations on directed mixed graphs that are weaker than Markov equivalence. Under a weak equivalence relation, each equivalence class of directed mixed graphs has a simple representation and interpretation using the existence of a greatest element. Importantly, the classes retain a clear interpretation, and a convenient graphical representation of an entire $k$ -weak equivalence class is available, just as in the case of Markov equivalence classes. The greatest element of an equivalence class also provides a feasible learning target, and one can give a constructive characterization of this element (the collection of $C$ -potential sibling and $C$ -potential parent conditions). The Markov equivalence class is often the learning target when trying to recover a graph from observational data; however, the complexity results in this paper imply that this target may be too expressive. The previous sections give the theoretical underpinning for feasible learning algorithms that output graphs that are less expressive than the Markov equivalence class.

A subset of the weak equivalence relations, the $k$ -weak equivalence relations, is naturally parametrized by a natural number $k$ . Varying $k$ , one obtains more or less granular graphical modeling, and a simple hierarchy of equivalence classes can be described across $k$ . The parameter $k$ specifies both the granularity of the equivalence class and the complexity of, e.g., finding a maximal graph. The work in this paper mostly focused on $k$ -weak equivalence; however, the central results hold for more general weak equivalences, and one may find uses for other types of equivalence relations, e.g., with inspiration from specific applications.

10 Acknowledgments

This work was funded by a DFF-International Postdoctoral Grant (0164-00023B) from Independent Research Fund Denmark. The author is a member of the ELLIIT Strategic Research Area at Lund University. The author thanks Karin Rathsman for discussing alarm handling at the European Spallation Source.

Appendix A Decision problems

We list the formal decision problems used in Section 3.

Decision problem A.1 (Add-1 bidirected Markov equivalence in DMGs).

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs such that $E_{2}=E_{1}\cup\{e\}$ and $e$ is a bidirected edge. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

Decision problem A.2 (Add-1 directed Markov equivalence in DMGs).

Let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be DMGs such that $E_{2}=E_{1}\cup\{e\}$ and $e$ is a directed edge. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

The next decision problems are sparse versions of Decision problems A.1 and A.2.

Decision problem A.3 (Add-1 bidirected Markov equivalence in sparse DMGs).

Let $m$ be a nonnegative integer and let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be $m$ -sparse DMGs such that $E_{2}=E_{1}\cup\{e\}$ and $e$ is a bidirected edge. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

Decision problem A.4 (Add-1 directed Markov equivalence in sparse DMGs).

Let $m$ be a nonnegative integer and let $\mathcal{G}_{1}=(V,E_{1})$ and $\mathcal{G}_{2}=(V,E_{2})$ be $m$ -sparse DMGs such that $E_{2}=E_{1}\cup\{e\}$ and $e$ is a directed edge. Are $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ Markov equivalent?

Appendix B Node connectivity in DMGs

In this section, we elaborate on the discussion in Subsection 3.1 on different notions of node connectivity in a DMG. For a DMG, $\mathcal{G}=(V,E)$ , and a node $\beta\in V$ , we define $\beta$ ’s indegree, $\mathrm{in}_{\mathcal{G}}(\beta)$ , to be the number of nodes, $\alpha\in V$ , such that $\alpha\ *\!\!\rightarrow\beta$ . Similarly, we define $\beta$ ’s outdegree, $\mathrm{out}_{\mathcal{G}}(\beta)$ , as the number of nodes, $\alpha\in V$ , such that $\beta\ *\!\!\rightarrow\alpha$ . This is an adaptation of the common definitions of in- and outdegree in DAGs. If $\alpha\ *\!\!\rightarrow\beta$ in $\mathcal{G}$ , then $\alpha\in u(\beta,\mathcal{I}(\mathcal{G}))$ , and it follows that the indegree of $\beta$ is less than or equal to $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ . Similarly, the outdegree of $\beta$ is less than or equal to $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ . It holds that $\sum_{\beta\in V}\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)=\sum_{\beta\in V}\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ . However, as illustrated in Figure 18, it is possible for $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ to be large for some $\beta$ while $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\alpha)$ is small for all $\alpha\in V$ .
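The degree definitions above can be made concrete with a short sketch. It assumes that $\alpha\ *\!\!\rightarrow\beta$ stands for an edge with a head at $\beta$ , that is, either $\alpha\rightarrow\beta$ or $\alpha\leftrightarrow\beta$ ; the edge encoding and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical encoding: ("d", a, b) is a -> b, ("b", a, b) is a <-> b.


def indegree(edges, beta):
    """Number of nodes alpha with an edge alpha *-> beta (a head at beta)."""
    sources = set()
    for kind, a, b in edges:
        if b == beta:
            sources.add(a)  # a -> beta or a <-> beta
        if kind == "b" and a == beta:
            sources.add(b)  # beta <-> b also has a head at beta
    return len(sources)


def outdegree(edges, beta):
    """Number of nodes alpha with an edge beta *-> alpha (a head at alpha)."""
    targets = set()
    for kind, a, b in edges:
        if a == beta:
            targets.add(b)  # beta -> b or beta <-> b
        if kind == "b" and b == beta:
            targets.add(a)  # a <-> beta also has a head at a
    return len(targets)


edges = {("d", 1, 3), ("b", 2, 3), ("d", 3, 4)}
print(indegree(edges, 3), outdegree(edges, 3))
```

Because nodes are collected in a set, a pair joined by both a directed and a bidirected edge is counted once, in line with the definition counting nodes rather than edges. A bidirected edge contributes to both the indegree and the outdegree of each endpoint.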

The indegree and outdegree of a node $\beta$ need not equal $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ and $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ , respectively (see the example in Figure 19). Moreover, the indegree and outdegree need not be the same for Markov equivalent graphs (Figure 19).

The example in Figure 7 exploits non-maximality of the graph. In each Markov equivalence class, $[\mathcal{G}]$ , there is a greatest element, $\mathcal{N}$ , and one could define sparsity of the nodes in $\mathcal{G}$ by counting adjacencies in $\mathcal{N}$ , which is invariant under Markov equivalence. However, the in- and outdegree of $\beta$ in $\mathcal{N}$ may still be strictly less than $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ and $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\leftarrow}{1.8pt}}(\beta)$ , respectively (Figure 19). In fact, one can find a family of graphs, $\{\mathcal{G}_{n}=(V_{n},E_{n})\}$ , and a node $\beta\in V_{n}$ for all $n$ such that $\mathrm{con}_{\mathcal{G}_{1}}^{\scaleto{\rightarrow}{1.8pt}}(\beta)$ is unbounded while the indegree and outdegree are fixed (see the example in Figure 21).

If $\alpha$ is inseparable into $\beta$ and $\beta$ is inseparable into $\alpha$ in a maximal DMG, they need not be adjacent (see the example in Figure 20).

Appendix C Marginalization

This section argues that the representation of weak equivalence is closed under marginalization in the sense that we can marginalize any graph, $\mathcal{G}$ , onto a smaller node set, $O$ , to obtain a graph which represents the same independence model as the original graph when restricting independence statements to triples $(A,B,C)$ such that $A,B,C\subseteq O$ . This is formalized in Equation 1. A so-called latent projection of $\mathcal{G}$ satisfies this requirement. The latent projection was also used in Mogensen and Hansen (2020), and earlier in Verma and Pearl (1990a) and Richardson et al. (2017).

Definition C.1 (Latent projection).

We denote the latent projection of $\mathcal{G}$ on $O$ by $m(\mathcal{G},O)$ .

The latent projection of a graph on a node set represents a marginalized version of the independence model, as formalized by the following corollary. Mogensen and Hansen (2020) proved this result in the case of $\mathcal{J}=\mathcal{P}$ , that is, in the case of Markov equivalence (Mogensen and Hansen, 2020, Theorem 3.12). The general case follows directly from the Markov equivalence result.

Corollary C.2.

Let $\mathcal{G}=(V,E)$ be a DMG, let $O\subseteq V$ , and let $\mathcal{M}=m(\mathcal{G},O)$ . For $A,B,C\subseteq O$ , it holds that

[TABLE]

Proof.

Theorem 3.12 of Mogensen and Hansen (2020) shows that

[TABLE]

and the result follows immediately. ∎

Mogensen and Hansen (2020) stated an algorithm to output the latent projection of a DMG (Algorithm 1). This was similar to earlier algorithms for other classes of graphs (Koster, 1999; Sadeghi, 2013). The following proposition was proved by Mogensen and Hansen (2020).

Proposition C.3 (Mogensen and Hansen (2020)).

Let $\mathcal{G}=(V,E)$ be a DMG and $O\subseteq V$ . Algorithm 1 outputs its latent projection, $m(\mathcal{G},O)$ .

One should note that the marginalization of a (weakly) maximal graph need not be (weakly) maximal as illustrated in Figure 22.

Appendix D Proofs and lemmas

The proofs of the following lemmas are adaptations of the proofs of Lemmas 5.4 and 5.8 in Mogensen and Hansen (2020). We include them for completeness to show how the appropriate changes are made. Lemmas 5.4 and 5.8 in Mogensen and Hansen (2020) did not use the $C$ -specific conditions that are essential in obtaining the stronger results that we present in this paper.

Definition D.1 (Route).

We say that a walk, $\omega=(\gamma_{1},e_{1},\gamma_{2},\ldots,e_{l},\gamma_{l+1})$ , is a route if the node $\gamma_{l+1}$ occurs at most twice on $\omega$ and no other node occurs more than once on $\omega$ .

Routes characterize $\mu$ -connections in DMGs (Mogensen and Hansen, 2020), and we use them in the next proofs. Note that the below lemmas are formulated using $\mathcal{I}(\mathcal{G})$ , not the restricted version $\mathcal{I}_{\mathcal{J}}(\mathcal{G})$ .

Lemma D.2.

Let $C\subseteq V$ and let $e$ be a $C$ -potential sibling edge between $\alpha$ and $\beta$ in $\mathcal{I}(\mathcal{G})$ . Let $\gamma,\delta\in V$ . If there is a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ , then there is a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}$ .

Proof.

Consider any $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ . We can also find a $\mu$ -connecting route from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ (Mogensen and Hansen, 2020), and we denote this route by $\rho$ . If $\alpha\notin C$ , then there exists a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}$ using (cs1) of Definition 5.1. If $\beta\notin C$ , then there exists a $\mu$ -connecting walk from $\beta$ to $\alpha$ given $C$ in $\mathcal{G}$ , also using (cs1). We denote these walks by $\nu_{1}$ and $\nu_{2}$ , respectively, if they exist.

If $e$ does not occur on $\rho$ , then $\rho$ is $\mu$ -connecting given $C$ in $\mathcal{G}$ . If $e$ occurs twice, then either $\rho$ contains a subroute $\alpha\leftrightarrow\beta\leftrightarrow\alpha$ and $\delta=\alpha$ or $\rho$ contains a subroute $\beta\leftrightarrow\alpha\leftrightarrow\beta$ and $\delta=\beta$ . Assume first the former. There is either a $\mu$ -connecting subroute from $\gamma$ to $\alpha$ , or $\alpha\notin C$ . If $\beta\in C$ , then consider the subroute between $\gamma$ and $\alpha$ . This subroute is either trivial or has a tail at $\alpha$ . In either case, composing it with $\nu_{1}$ gives a $\mu$ -connecting walk from $\gamma$ to $\beta$ given $C$ in $\mathcal{G}$ , and using (cs2) there is also a $\mu$ -connecting walk from $\gamma$ to $\alpha$ given $C$ in $\mathcal{G}$ . If $\beta\notin C$ , then we can compose the subroute from $\gamma$ to $\alpha$ with $\nu_{1}$ and $\nu_{2}$ . The resulting walk will be $\mu$ -connecting as $\beta\in\mathrm{an}(C)\setminus C$ . The argument is the same when $\beta\leftrightarrow\alpha\leftrightarrow\beta$ and $\delta=\beta$ .

We now assume that $e$ occurs only once on $\rho$ and assume first that

[TABLE]

If $\alpha\notin C$ , then we can compose $\rho_{1}$ , $\nu_{1}$ , and $\rho_{2}$ to obtain a $\mu$ -connecting walk given $C$ . Note that this also holds if $\rho_{1}$ is trivial. If $\alpha\in C$ , then $\rho_{1}$ is not trivial and it has a head at $\alpha$ . Using (cs3), there exists a $\mu$ -connecting walk from $\gamma$ to $\beta$ and composing it with $\rho_{2}$ gives the result. If instead

[TABLE]

the same arguments work, now using (cs2). ∎

Lemma D.3.

Let $C\subseteq V$ and let $e$ be a $C$ -potential parent edge from $\alpha$ to $\beta$ in $\mathcal{I}(\mathcal{G})$ . Let $\gamma,\delta\in V$ . If there is a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ , then there is a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}$ .

Proof.

We consider a $\mu$ -connecting walk from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ . If $\alpha\notin C$ , then by (cp1) there exists a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ , and we denote this walk by $\nu$ when it exists. We can find a $\mu$ -connecting route from $\gamma$ to $\delta$ given $C$ in $\mathcal{G}+e$ , and we denote this route by $\rho$ .

In this proof, we will say that a collider on a walk is newly closed if the collider is in $\mathrm{an}_{\mathcal{G}+e}(C)$ , but not in $\mathrm{an}_{\mathcal{G}}(C)$ . If there exists a newly closed collider, then $\alpha\notin C$ and $\beta\in\mathrm{an}_{\mathcal{G}}(C)$ . We assume first that $e$ occurs at most once on $\rho$ . If there are newly closed colliders on $\rho$ , the proof of Lemma 5.8 in Mogensen and Hansen (2020) shows that we can find a $\mu$ -connecting walk in $\mathcal{G}+e$ with no newly closed colliders such that $e$ occurs at most once, and we denote this walk by $\tilde{\omega}$ .

If $\tilde{\omega}$ does not contain $e$ , then the result follows. If it does contain $e$ , we split into two cases. Assume first that

[TABLE]

We see that $\alpha\notin C$ . If $\rho_{1}$ is trivial or if it has a tail at $\alpha$ , then composing $\rho_{1}$ , $\nu$ , and $\rho_{2}$ gives a $\mu$ -connecting walk. If $\rho_{1}$ has a head at $\alpha$ , then (cp2) gives a $\mu$ -connecting walk from $\gamma$ to $\beta$ that we can compose with $\rho_{2}$ . Assume instead that

[TABLE]

If $\rho_{1}$ has a head at $\beta$ and $\beta\in C$ , then (cp3) gives the result. If $\beta\notin C$ , we can find a walk in $\mathcal{G}+e$ with no newly closed colliders and only one occurrence of $e$ of the type

[TABLE]

where $\rho_{1}$ can be trivial, using the same argument as in the proof of Lemma 5.8 in Mogensen and Hansen (2020). We have $\alpha,\beta\notin C$ and there is a $\mu$ -connecting walk from $\alpha$ to $\delta$ . Using (cp4) there is also one from $\beta$ to $\delta$ . Composing this with $\rho_{1}$ gives the result since $\rho_{1}$ is either trivial or has a tail at $\beta$ .

Finally, if $e$ occurs twice on $\rho$ , we must have $\alpha\notin C$ . We can use the same arguments as in the proof of Lemma 5.8 in Mogensen and Hansen (2020) using the walk $\nu$ and condition (cp2). ∎

Appendix E Additional results

When we count the number of colliders on a walk, we count them with multiplicity, that is, if

[TABLE]

is a walk, $\omega$ , then the number of colliders on this walk equals the number of $i$ , $2\leq i\leq l$ , such that $e_{i-1}$ and $e_{i}$ both have heads at $\gamma_{i}$ on $\omega$ . Note that the endpoints, $\gamma_{1}$ and $\gamma_{l+1}$ , are not colliders by definition. The next lemma is useful for giving a characterization of $k$ -weak equivalence in terms of $\mu$ -connecting walks.

Lemma E.1.

If there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}$ , then there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}$ with at most $|C|$ colliders, all of which are in $C$ .

Proof.

Let $\gamma_{1},\gamma_{2},\ldots,\gamma_{l}$ denote the colliders on the $\mu$ -connecting walk. We know that $\gamma_{i}\in\mathrm{an}(C)$ and therefore there exists a directed path $\gamma_{i}\rightarrow\delta_{1}\rightarrow\ldots\rightarrow\delta_{l_{i}}$ such that $\delta_{l_{i}}\in C$ and such that $\delta_{l_{i}}$ is the only node in $C$ on this directed path. If $\gamma_{i}\in C$ , then the path is trivial, that is, it contains no edges and just a single node, $\gamma_{i}$ . Adding $\gamma_{i}\rightarrow\delta_{1}\rightarrow\ldots\rightarrow\delta_{l_{i}}\leftarrow\ldots\leftarrow\delta_{1}\leftarrow\gamma_{i}$ for each $i$ creates a walk which is $\mu$ -connecting from $\alpha$ to $\beta$ given $C$ such that every collider is in $C$ . If a node occurs as a collider more than once, we can remove the loop. The resulting walk is also $\mu$ -connecting, even if $\beta$ is a collider, and it has strictly fewer colliders. We can repeat this to find a $\mu$ -connecting walk with at most $|C|$ colliders. ∎

Proposition E.2.

Let $\mathcal{G}$ be a DMG. Let $\alpha,\beta\in V$ and $C\subseteq V$ such that $|C|\leq k$ . We have $(\alpha,\beta,C)\in\mathcal{I}_{k}(\mathcal{G})$ if and only if there is no $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}$ with at most $k$ colliders.

Proof.

If there is a $\mu$ -connecting walk given $C$ , then clearly $(\alpha,\beta,C)\notin\mathcal{I}_{k}(\mathcal{G})$ . On the other hand, if $(\alpha,\beta,C)\notin\mathcal{I}_{k}(\mathcal{G})$ then there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ and Lemma E.1 gives the result. ∎

This means that the restriction of the independence models to $k$ -weak equivalence ignores $\mu$ -connecting walks with more than $k$ colliders.

Corollary E.3.

Graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ are $k$ -weak equivalent if and only if it holds for all $\alpha,\beta\in V$ and $C\subseteq V$ such that $|C|\leq k$ that there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}_{1}$ with at most $k$ colliders if and only if there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}_{2}$ with at most $k$ colliders.

Proof.

Assume first that $\mathcal{G}_{1}\in[\mathcal{G}_{2}]_{k}$ , and that $\omega$ is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ , $|C|\leq k$ , in $\mathcal{G}_{1}$ with at most $k$ colliders. Proposition E.2 gives that $(\alpha,\beta,C)\notin\mathcal{I}_{k}(\mathcal{G}_{1})$ and therefore $(\alpha,\beta,C)\notin\mathcal{I}_{k}(\mathcal{G}_{2})$ . Using Proposition E.2 again gives the result.

Assume instead that for all $\alpha,\beta,C$ such that $|C|\leq k$ it holds that there is a $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ with at most $k$ colliders in $\mathcal{G}_{1}$ if and only if there is one in $\mathcal{G}_{2}$ . If $(\alpha,\beta,C)\in\mathcal{I}_{k}(\mathcal{G}_{1})$ , $\alpha\notin C$ , then there is no $\mu$ -connecting walk from $\alpha$ to $\beta$ given $C$ in $\mathcal{G}_{1}$ and therefore also no $\mu$ -connecting walk with at most $k$ colliders in $\mathcal{G}_{2}$ , and Propositions 4.11 and E.2 give the result. ∎

Bibliography


Aalen (1987) Odd O. Aalen. Dynamic modelling and causality. Scandinavian Actuarial Journal, 1987(3-4):177–190, 1987.
Absar and Zhang (2021) Saima Absar and Lu Zhang. Discovering time-invariant causal structure from temporal data. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2807–2811, 2021.
Ali et al. (2009) R. Ayesha Ali, Thomas S. Richardson, and Peter Spirtes. Markov equivalence for ancestral graphs. The Annals of Statistics, 37(5B):2808–2837, 2009.
Andersson et al. (1997a) Steen A. Andersson, David Madigan, and Michael D. Perlman. A characterization of Markov equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2):505–541, 1997a.
Andersson et al. (1997b) Steen A. Andersson, David Madigan, and Michael D. Perlman. On the Markov equivalence of chain graphs, undirected graphs, and acyclic digraphs. Scandinavian Journal of Statistics, 24(1):81–102, 1997b.
Andersson et al. (2001) Steen A. Andersson, David Madigan, and Michael D. Perlman. Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28(1):33–85, 2001.
Bhattacharjya et al. (2022) Debarun Bhattacharjya, Karthikeyan Shanmugam, Tian Gao, and Dharmashankar Subramanian. Process independence testing in proximal graphical event models. In Conference on Causal Learning and Reasoning, pages 144–161. PMLR, 2022.
Christgau et al. (2022) Alexander Mangulad Christgau, Lasse Petersen, and Niels Richard Hansen. Nonparametric conditional local independence testing. arXiv preprint arXiv:2203.13559, 2022.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Weak equivalence of local independence graphs

Abstract

1 Introduction

2 Local independence and graphs

Definition 2.1** (Local independence).**

Definition 2.2** (Local independence graph).**

2.1 Alarm network

2.2 Graphs

Definition 2.3** (Directed mixed graph (DMG)).**

Definition 2.4** (μ\muμ-connecting walk).**

Definition 2.5** (μ\muμ-separation).**

Example 2.6**.**

2.3 Independence models and Markov equivalence

Definition 2.7** (Markov equivalence).**

Example 2.8**.**

2.3.1 Extremal elements of sets of DMGs

Definition 2.9** (Maximal element, DMG).**

Definition 2.10** (Greatest element, DMG).**

Example 2.11**.**

2.3.2 Representation of Markov equivalence classes

Theorem 2.12** **(Greatest element of a Markov equivalence class,

Example 2.13**.**

3 Hardness of marginalized local independence graphs

Decision problem 3.1 (Markov equivalence in DMGs).

Theorem 3.2.

Corollary 3.3.

Proof.

3.1 Sparse DMGs

Definition 3.4 (Node connectivity in DMG).

Definition 3.5 ($m$-sparsity).

Decision problem 3.6 (Markov equivalence in $m$-sparse DMGs).

Theorem 3.7.

Corollary 3.8.

Proof.

Lemma 3.9.

Proof.

Lemma 3.10.

Proof.

$\rho_{1}\neq\alpha$:

$\rho_{1}=\alpha$:

3.2 Implications of hardness results

3.2.1 Sparse DMGs

4 Weak equivalence

4.1 Classes of weak equivalence

4.1.1 General weak equivalence

Definition 4.1 (General weak equivalence).

Proposition 4.2.

Proof.

Proposition 4.3.

Example 4.4.

4.1.2 Homogeneous weak equivalence

Definition 4.5 (Homogeneous equivalence).

4.1.3 $k$-weak equivalence

Definition 4.6 ($k$-weak equivalence).

4.2 Properties of weak equivalence

Proposition 4.7.

Proof.

Proposition 4.8 (Well-ordered $\mathcal{J}$-classes).

Proof.

Corollary 4.9 (Well-ordered $k$-classes).

Definition 4.10.

Proposition 4.11.

Proof.

Proposition 4.12 (Maximality).

Proof.

Example 4.13.

Proposition 4.14.

Proof.

Corollary 4.15.

Proposition 4.16 (Minimality).

Proof.

Proposition 4.17.

Proof.

4.3 $k$-weak equivalence

Proposition 4.18.

Example 4.19 (Weak equivalence class).

Example 4.20.

Definition 4.21 (Trek, directed trek).

Definition 4.22.

Corollary 4.23.

Proposition 4.24.

5.1 $C$-potential siblings and $C$-potential parents

Definition 5.1 ($C$-potential sibling).

Definition 5.2 ($C$-potential parent).

Proposition 5.3 (Graphical version of $C$-potential siblings).

Proposition 5.4 (Graphical version of $C$-potential parents).

Proposition 5.5.

Proposition 5.6.

Lemma 5.7.

Theorem 5.8.

Definition 6.1 (Directed mixed equivalence graph (DMEG)).

Example 6.2 (Directed mixed equivalence graph).

Example 6.3.

6.2 Hierarchy of $k$-weak equivalence

Properties of $\mathcal{H}_{V}$

Example 6.4 ($k$-weak hierarchy over $V=\{1,2,3,4\}$).

Decision problem 7.1 (Weak Markov equivalence in DMGs).

Decision problem 7.2 (Weak Markov equivalence in DMGs).

Decision problem A.1 (Add-1 bidirected Markov equivalence in DMGs).

Decision problem A.2 (Add-1 directed Markov equivalence in DMGs).

Decision problem A.3 (Add-1 bidirected Markov equivalence in sparse DMGs).

Decision problem A.4 (Add-1 directed Markov equivalence in sparse DMGs).

Definition C.1 (Latent projection).

Corollary C.2.

Proposition C.3 (Mogensen and Hansen (2020)).

Definition D.1 (Route).

Lemma D.2.

Lemma D.3.

Lemma E.1.

Proposition E.2.