Markov Conditions and Factorization in Logical Credal Networks
Fabio Gagliardi Cozman

TL;DR
This paper explores the properties of Logical Credal Networks, focusing on how different Markov conditions affect their structure, factorization, and specification, especially in networks with and without cycles.
Contribution
It introduces the concept of structure in Logical Credal Networks and analyzes the impact of Markov conditions on factorization and network properties.
Findings
Acyclic structures lead to known factorization results.
Differences between Markov conditions in cyclic networks are analyzed.
The paper clarifies specification requirements for various network structures.
Abstract
We examine the recently proposed language of Logical Credal Networks, in particular investigating the consequences of various Markov conditions. We introduce the notion of structure for a Logical Credal Network and show that a structure without directed cycles leads to a well-known factorization result. For networks with directed cycles, we analyze the differences between Markov conditions, factorization results, and specification requirements.
ISIPTA 2023
Fabio Gagliardi Cozman (Universidade de São Paulo, Brazil)
Keywords: Logical credal networks, probabilistic logic, Markov condition, factorization.
1 Introduction
This paper examines Logical Credal Networks, a formalism recently introduced by Qian et al. [16] to combine logical sentences, probabilities and independence relations. They have proposed interesting ideas and evaluated the formalism in practical scenarios with positive results.
The central element of a Logical Credal Network (LCN) is a collection of constraints over probabilities. Independence relations are then extracted mostly from the logical content of those inequalities. This scheme differs from previous proposals that extract independence relations from explicitly specified graphs [1, 7, 8]. Several probabilistic logics have also adopted explicit syntax for independence relations even when graphs are not employed [2, 9, 10].
While Logical Credal Networks have points in common with existing formalisms, they do have novel features that deserve attention. For one thing, they resort to directed graphs that may contain directed cycles. Also they are endowed with a sensible Markov condition that is distinct from previous ones. Little is known about the consequences of these features, and how they interact with the syntactic conventions that turn logical formulas into edges in graphs. In particular, it seems that no study has focused on the consequences of Markov conditions on factorization results; that is, how such conditions affect the factors that constitute probability distributions.
In this paper we present a first investigation towards a deeper understanding of Logical Credal Networks, looking at their specification, their Markov conditions, and their factorization properties. We introduce the notion of “structure” for a LCN. We then show that the local Markov condition proposed by Qian et al. [16] collapses to the usual local Markov condition applied to chain graphs when the structure has no directed cycles. We analyze the behavior of the former Markov condition in the presence of directed cycles, in particular examining factorization properties and discussing the semantics of the resulting language. To conclude, we propose a novel semantics for LCNs and examine factorization results for their associated probability distributions.
2 Graphs and Markov Conditions
In this section we present the necessary concepts related to graphs and graph-theoretical probabilistic models (Bayesian networks, Markov networks, and chain graphs). Definitions and notation vary across the huge literature on these topics; we rely here on three sources. We use definitions by Qian et al. [16] and by Spirtes [18] in their work on LCNs and on directed graphs respectively; we also use standard results from the textbook by Cowell et al. [5].
A graph is a triple $(\mathcal{N}, \mathcal{E}_D, \mathcal{E}_U)$, where $\mathcal{N}$ is a set of nodes, and both $\mathcal{E}_D$ and $\mathcal{E}_U$ are sets of edges. A node is always labeled with the name of a random variable; in fact, we do not distinguish between a node and the corresponding random variable. The elements of $\mathcal{E}_D$ are directed edges. A directed edge is an ordered pair of distinct nodes, and is denoted by $A \rightarrow B$. The elements of $\mathcal{E}_U$ are undirected edges. An undirected edge is a pair of distinct nodes, and is denoted by $A \sim B$; note that nodes are not ordered in an undirected edge, so there is no difference between $A \sim B$ and $B \sim A$. Note that $\mathcal{E}_D$ and $\mathcal{E}_U$ are sets, so there are no multiple copies of elements in them (for instance, there are no multiple undirected edges between two nodes). Note also that there is no loop from a node to itself.
If there is a directed edge from $A$ to $B$, then $A$ is a parent of $B$ and $B$ is a child of $A$. The parents of $A$ are denoted by $\mathrm{pa}(A)$. If there are directed edges $A \rightarrow B$ and $B \rightarrow A$ between $A$ and $B$, we say there is a bi-directed edge between $A$ and $B$ and write $A \rightleftarrows B$. If $A \sim B$, then both nodes are said to be neighbors. The neighbors of $A$ are denoted by $\mathrm{ne}(A)$. The boundary of a node $A$, denoted by $\mathrm{bd}(A)$, is the set $\mathrm{pa}(A) \cup \mathrm{ne}(A)$. The boundary of a set $\mathcal{B}$ of nodes is $\mathrm{bd}(\mathcal{B}) = \bigcup_{A \in \mathcal{B}} \mathrm{bd}(A) \setminus \mathcal{B}$. If we have a set $\mathcal{B}$ of nodes such that, for all $A \in \mathcal{B}$, the boundary of $A$ is contained in $\mathcal{B}$, then $\mathcal{B}$ is an ancestral set.
A path from $A$ to $B$ is a sequence of edges, the first one between $A$ and some node $C_1$, then from $C_1$ to $C_2$ and so on, until an edge from $C_k$ to $B$, where all nodes other than $A$ and $B$ are distinct, and such that for each pair of consecutive nodes $D_1$, $D_2$ in the path we have either $D_1 \rightarrow D_2$ or $D_1 \sim D_2$ but never $D_2 \rightarrow D_1$. If $A$ and $B$ are identical, the path is a cycle. If there is at least one directed edge in a path, the path is a directed path; if that path is a cycle, then it is a directed cycle. If a path is not directed, then it is undirected (hence all edges in the path are undirected ones). A directed/undirected graph is a graph that only contains directed/undirected edges. A graph without directed cycles is a chain graph. Figure 1 depicts a number of graphs.
If there is a directed path from $A$ to $B$, then $A$ is an ancestor of $B$ and $B$ is a descendant of $A$. For instance, in Figure 1.a, is the ancestor of and is the descendant of ; in Figure 1.c, there are no ancestors nor descendants of ; in Figure 1.e, is the ancestor of , and there are no descendants of . As a digression, note that Cowell et al. [5] define “ancestor” and “descendant” somewhat differently, by asking that there is a path from $A$ to $B$ but not from $B$ to $A$; this definition is equivalent to the previous one for graphs without directed cycles, but it is different otherwise (for instance, in Figure 1.b the node has descendants in the previous definition but no descendants in the sense of Cowell et al. [5]). We stick to our former definition, a popular one [13] that seems appropriate in the presence of directed cycles [18].
We will need the following concepts:
- Suppose we take graph $\mathcal{G}$ and remove its directed edges to obtain an auxiliary undirected graph $\mathcal{G}'$. A set of nodes $\mathcal{B}$ is a chain multi-component of $\mathcal{G}$ iff every pair of nodes in $\mathcal{B}$ is connected by a path in $\mathcal{G}'$. And $\mathcal{B}$ is a chain component iff it is either a chain multi-component or a single node that does not belong to any chain multi-component.
- Suppose we take graph $\mathcal{G}$ and add undirected edges between all pairs of nodes that have children in a common chain component of $\mathcal{G}$ and that are not already joined in $\mathcal{G}$. Suppose we then take the resulting graph and transform every directed edge into an undirected edge (if $A \rightleftarrows B$, then both transformed undirected edges collapse into $A \sim B$). The final result is the moral graph of $\mathcal{G}$, denoted by $\mathcal{G}^{m}$.
- Suppose we take a graph $\mathcal{G}$ and a triple $(\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3)$ of disjoint subsets of nodes, and we build the moral graph of the smallest ancestral set containing the nodes in $\mathcal{N}_1 \cup \mathcal{N}_2 \cup \mathcal{N}_3$. The resulting graph is denoted by $\mathcal{G}^{ma}(\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3)$.
Figure 2 depicts the moral graphs that correspond respectively to the five graphs in Figure 1.
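The three constructions above (chain components, moral graph, and moral graph of a smallest ancestral set) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the edge-list representation, and the four-node test graph are choices made here, not taken from the paper.

```python
from itertools import combinations

def chain_components(nodes, undirected):
    """Map each node to its chain component (connected block once directed edges are removed)."""
    adj = {n: set() for n in nodes}
    for a, b in undirected:
        adj[a].add(b)
        adj[b].add(a)
    comp = {}
    for n in nodes:
        if n in comp:
            continue
        block, stack = set(), [n]
        while stack:
            v = stack.pop()
            if v not in block:
                block.add(v)
                stack.extend(adj[v] - block)
        for v in block:
            comp[v] = frozenset(block)
    return comp

def ancestral_set(seed, directed, undirected):
    """Smallest set containing `seed` that is closed under taking parents and neighbors."""
    result, stack = set(), list(seed)
    while stack:
        v = stack.pop()
        if v in result:
            continue
        result.add(v)
        stack += [a for a, b in directed if b == v]      # parents of v
        stack += [a for a, b in undirected if b == v]    # neighbors of v
        stack += [b for a, b in undirected if a == v]
    return result

def moral_graph(nodes, directed, undirected):
    """Undirected edge set of the moral graph of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    directed = [(a, b) for a, b in directed if a in nodes and b in nodes]
    undirected = [(a, b) for a, b in undirected if a in nodes and b in nodes]
    comp = chain_components(nodes, undirected)
    edges = {frozenset(e) for e in directed + undirected}  # drop directions
    parents = {}
    for a, b in directed:
        parents.setdefault(comp[b], set()).add(a)
    for ps in parents.values():                            # join nodes with children in a common component
        edges |= {frozenset(p) for p in combinations(ps, 2)}
    return edges

# Hypothetical chain graph: A -> B, C -> D, B ~ D.
directed = [("A", "B"), ("C", "D")]
undirected = [("B", "D")]
full = moral_graph("ABCD", directed, undirected)
small = moral_graph(ancestral_set({"A", "C"}, directed, undirected), directed, undirected)
```

In this toy graph the full moral graph gains the edge between A and C, because both have a child in the chain component {B, D}, whereas the moral graph of the ancestral set of {A, C} has no edges at all.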
There are several formalisms that employ graphs to represent stochastic independence (and dependence) relations among the random variables associated with nodes. In this paper we focus only on discrete random variables, so the concept of stochastic independence is quite simple: random variables $X$ and $Y$ are (conditionally) independent given random variables $Z$ iff $\mathbb{P}(X=x, Y=y \mid Z=z) = \mathbb{P}(X=x \mid Z=z)\,\mathbb{P}(Y=y \mid Z=z)$ for every possible $x$ and $y$ and every $z$ such that $\mathbb{P}(Z=z) > 0$. In case $Z$ is absent, we have independence of $X$ and $Y$ iff $\mathbb{P}(X=x, Y=y) = \mathbb{P}(X=x)\,\mathbb{P}(Y=y)$ for every possible $x$ and $y$.
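The definition of conditional independence for discrete variables can be checked numerically on a small joint table. The sketch below is illustrative only; the function name, the dictionary layout, and all probability values are assumptions invented for the example.

```python
from itertools import product

def cond_independent(joint, tol=1e-9):
    """joint[(x, y, z)] = P(X=x, Y=y, Z=z); test whether X is independent of Y given Z."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        pz = sum(joint.get((x, y, z), 0.0) for x, y in product(xs, ys))
        if pz == 0.0:
            continue  # the definition only constrains values z with P(Z=z) > 0
        for x, y in product(xs, ys):
            pxy = joint.get((x, y, z), 0.0) / pz
            px = sum(joint.get((x, v, z), 0.0) for v in ys) / pz
            py = sum(joint.get((u, y, z), 0.0) for u in xs) / pz
            if abs(pxy - px * py) > tol:
                return False
    return True

# Hypothetical numbers: X and Y are generated independently once Z is fixed.
pz = {0: 0.3, 1: 0.7}
px1 = {0: 0.2, 1: 0.9}  # P(X=1 | Z=z)
py1 = {0: 0.5, 1: 0.1}  # P(Y=1 | Z=z)
factored = {
    (x, y, z): pz[z] * (px1[z] if x else 1 - px1[z]) * (py1[z] if y else 1 - py1[z])
    for x, y, z in product((0, 1), repeat=3)
}
# A joint in which X always equals Y, so X and Y are dependent given Z.
copied = {(x, x, z): 0.25 for x, z in product((0, 1), repeat=2)}
```

Calling `cond_independent(factored)` succeeds because the table was built as a product of conditionals, while `cond_independent(copied)` fails.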
A Markov condition explains how to extract independence relations from a graph; there are many such conditions in the literature [5].
Consider first an undirected graph $\mathcal{G}$ with set of nodes $\mathcal{N}$. The local Markov condition states that a node $A$ is independent of all nodes in $\mathcal{N}$ other than $A$ itself and $A$’s neighbors, $\mathrm{ne}(A)$, given $\mathrm{ne}(A)$. The global Markov condition states that, given any triple $(\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3)$ of disjoint subsets of $\mathcal{N}$, such that $\mathcal{N}_2$ separates $\mathcal{N}_1$ and $\mathcal{N}_3$, then nodes $\mathcal{N}_1$ and $\mathcal{N}_3$ are independent given nodes $\mathcal{N}_2$. (Footnote 1: In an undirected graph, a set of nodes separates two other sets iff, by deleting the separating nodes, we have no connecting path between a node in one set and a node in the other set.) If a probability distribution over all random variables in $\mathcal{G}$ is everywhere larger than zero, then both conditions are equivalent and they are equivalent to a factorization property: for each configuration $x$ of variables $X$, where $X$ denotes the random variables in $\mathcal{N}$, we have $\mathbb{P}(X = x) = \prod_{c \in \mathcal{C}} \phi_c(x_c)$, where $\mathcal{C}$ is the set of cliques of $\mathcal{G}$, each $\phi_c$ is a function over the random variables in clique $c$, and $x_c$ is the projection of $x$ over the random variables in clique $c$. (Footnote 2: A clique is a maximal set of nodes such that each pair of nodes in the set is joined.)
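Now consider an acyclic directed graph $\mathcal{G}$ with set of nodes $\mathcal{N}$. The local Markov condition states that a node $A$ is independent, given $A$’s parents $\mathrm{pa}(A)$, of all its non-descendants non-parents except $A$ itself. The factorization produced by the local Markov condition is
$$\mathbb{P}(X = x) = \prod_{N \in \mathcal{N}} \mathbb{P}\left(N = x_N \mid \mathrm{pa}(N) = x_{\mathrm{pa}(N)}\right), \qquad (1)$$
where $x_N$ is the value of $N$ in $x$ and $x_{\mathrm{pa}(N)}$ is the projection of $x$ over the parents of $N$.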
Finally, consider a chain graph with set of nodes . The local Markov condition for chain graphs is:
Definition 2.1 (LMC(C)).
A node $A$ is independent, given its boundary, of all nodes that are not $A$ itself nor descendants nor boundary nodes of $A$.
The global Markov condition for chain graphs is significantly more complicated:
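Definition 2.2 (GMC(C)).
Given any triple $(\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3)$ of disjoint subsets of $\mathcal{N}$, if $\mathcal{N}_2$ separates $\mathcal{N}_1$ and $\mathcal{N}_3$ in the graph $\mathcal{G}^{ma}(\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3)$, then nodes $\mathcal{N}_1$ and $\mathcal{N}_3$ are independent given nodes $\mathcal{N}_2$.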
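Again, if a probability distribution over all random variables in $\mathcal{G}$ is everywhere larger than zero, then both Markov conditions are equivalent and they are equivalent to a factorization property as follows. Take the chain components $\mathcal{T}_1, \dots, \mathcal{T}_n$ ordered so that nodes in $\mathcal{T}_i$ can only be at the end of directed edges starting from chain components before $\mathcal{T}_i$; this is always possible in a chain graph. Then the factorization has the form $\mathbb{P}(X = x) = \prod_{i=1}^{n} \mathbb{P}(\mathcal{N}_i = x_{\mathcal{N}_i} \mid \mathrm{bd}(\mathcal{N}_i) = x_{\mathrm{bd}(\mathcal{N}_i)})$, where $\mathcal{N}_i$ is the set of nodes in the $i$th chain component; $x_{\mathcal{N}_i}$ and $x_{\mathrm{bd}(\mathcal{N}_i)}$ are respectively the projection of $x$ over $\mathcal{N}_i$ and $\mathrm{bd}(\mathcal{N}_i)$. Moreover, each factor in the product itself factorizes according to an undirected graph that depends on the corresponding chain component [5]. More precisely, for each chain component $\mathcal{T}_i$, build an undirected graph consisting of the nodes in $\mathcal{N}_i$ and $\mathrm{bd}(\mathcal{N}_i)$ with all edges between these nodes in $\mathcal{G}$ turned into undirected edges in this new graph, and with new undirected edges connecting each pair of nodes in $\mathrm{bd}(\mathcal{N}_i)$ that were not joined already; then each $\mathbb{P}(\mathcal{N}_i = x_{\mathcal{N}_i} \mid \mathrm{bd}(\mathcal{N}_i) = x_{\mathrm{bd}(\mathcal{N}_i)})$ equals the ratio $\phi_i(x_{\mathcal{N}_i}, x_{\mathrm{bd}(\mathcal{N}_i)}) / Z_i(x_{\mathrm{bd}(\mathcal{N}_i)})$ for positive function $\phi_i$, where $Z_i(x_{\mathrm{bd}(\mathcal{N}_i)}) = \sum \phi_i(x_{\mathcal{N}_i}, x_{\mathrm{bd}(\mathcal{N}_i)})$ with the sum extending over all configurations of $\mathcal{N}_i$.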
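As a small worked instance of this factorization, consider a hypothetical four-node chain graph, chosen here only for illustration, with directed edges $A \rightarrow B$ and $C \rightarrow D$ and undirected edge $B \sim D$. Its chain components can be ordered as $\{A\}$, $\{C\}$, $\{B, D\}$, the first two have empty boundaries, and the boundary of $\{B, D\}$ is $\{A, C\}$. Assuming a positive distribution, the factorization then reads:

```latex
\mathbb{P}(A=a, B=b, C=c, D=d)
  = \mathbb{P}(A=a)\,\mathbb{P}(C=c)\,\mathbb{P}(B=b, D=d \mid A=a, C=c),
\qquad
\mathbb{P}(B=b, D=d \mid A=a, C=c)
  = \frac{\phi(b, d, a, c)}{\sum_{b', d'} \phi(b', d', a, c)} .
```

The symbol $\phi$ stands for the positive function attached to the last chain component; the normalizing sum runs over the configurations of $B$ and $D$ only, so that the conditional distribution sums to one for each $(a, c)$.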
3 Logical Credal Networks
A Logical Credal Network (LCN) consists of a set of propositions $\mathcal{N}$ and two sets of constraints $\mathcal{T}_U$ and $\mathcal{T}_B$. The set $\mathcal{N}$ is finite with propositions $A_1, \dots, A_n$. Each proposition $A_i$ is associated with a random variable $X_i$ that is an indicator variable: if $A_i$ holds in an interpretation of the propositions then $X_i = 1$; otherwise, $X_i = 0$. From now on we simply use the same symbol for a proposition and its corresponding indicator random variable. Each constraint in $\mathcal{T}_U$ and in $\mathcal{T}_B$ is of the form
$$\alpha \leq \mathbb{P}(\phi \mid \varphi) \leq \beta,$$
where each $\phi$ and each $\varphi$ is a formula. In this paper, formulas are propositional (with propositions in $\mathcal{N}$ and connectives such as negation, disjunction, conjunction). The definition of LCNs by Qian et al. [16] allows for relational structures and first-order formulas; however, their semantics is obtained by grounding on finite domains, so we can focus on a propositional language for our purposes here.
Note that $\varphi$ can be a tautology $\top$, in which case we can just write the “unconditional” probability $\mathbb{P}(\phi)$. One can obviously use simple variants of constraints, such as $\mathbb{P}(\phi \mid \varphi) = \alpha$ or $\mathbb{P}(\phi \mid \varphi) \geq \alpha$ or $\mathbb{P}(\phi \mid \varphi) \leq \beta$, whenever needed.
The semantics of a LCN is given by a translation from the LCN to a directed graph where each proposition/random variable is a node (we often refer to them as proposition-nodes). Each constraint is then processed as follows. First, a node labeled with formula $\phi$ is added and, in case $\varphi$ is not $\top$, another node labeled with $\varphi$ is added (we often refer to them as formula-nodes), with a directed edge from $\varphi$ to $\phi$. Then an edge is added from each proposition in $\varphi$ to node $\varphi$ in case the latter is in the graph, and an edge is added from node $\phi$ to each proposition in $\phi$. (Footnote 3: We note that the original presentation of LCNs is a bit different from what we just described, as there are no edges added for a constraint in $\mathcal{T}_B$ for which $\varphi$ is $\top$. But this does not make any difference in the results and simplifies a bit the discussion.) Finally, in case the constraint is in $\mathcal{T}_U$, an edge is added from each proposition in $\phi$ to node $\phi$. We do not distinguish between two logically equivalent formulas (the original proposal by Qian et al. [16] focused only on syntactic operations).
The graph just described is referred to as the dependency graph of the LCN. Note that, when a formula is just a single proposition, we do not need to present it explicitly in the dependency graph; we can just connect edges from and to the corresponding proposition-node. As shown in the next example, in our drawings formulas appear inside dashed rectangles.
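The translation from constraints to a dependency graph can be sketched as follows. This is a minimal illustration, not the implementation of Qian et al. [16]: the tuple layout, the function name, the flag `back_edges` (true for constraints in the set that receives the final edges from propositions to the conditioned formula), and the two example constraints are all assumptions made here.

```python
def dependency_graph(constraints):
    """Directed edges of the dependency graph.

    Each constraint is a tuple (phi, phi_props, psi, psi_props, back_edges):
    phi and psi are labels of the conditioned and conditioning formulas
    (psi is None for a tautology), phi_props and psi_props are the
    propositions they mention, and back_edges marks constraints that also
    get edges from each proposition in phi to the node phi.
    """
    edges = set()
    for phi, phi_props, psi, psi_props, back_edges in constraints:
        if psi is not None:
            edges.add((psi, phi))                      # conditioning node -> conditioned node
            edges |= {(p, psi) for p in psi_props}     # propositions -> conditioning node
        edges |= {(phi, p) for p in phi_props}         # conditioned node -> its propositions
        if back_edges:
            edges |= {(p, phi) for p in phi_props}     # propositions -> conditioned node
    # A formula that is a single proposition is identified with its
    # proposition-node; the self-loops this would create are dropped.
    return {(a, b) for a, b in edges if a != b}

# Hypothetical constraints over propositions C, S, T, F.
constraints = [
    ("C", {"C"}, "S", {"S"}, False),             # bounds on P(C | S), no back edges
    ("S or T", {"S", "T"}, "F", {"F"}, True),    # bounds on P(S or T | F), with back edges
]
graph = dependency_graph(constraints)
```

The second constraint introduces the formula-node labeled `"S or T"`, and the back edges create directed cycles between that node and the propositions S and T.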
Example 3.1.
Consider the following LCN, based on the Smokers and Friends example by Qian et al. [16]. We have propositions , , for . All constraints belong to (that is, is empty), with :
[TABLE]
The dependency graph of the LCN is depicted in Figure 3. Note that there are several directed cycles in this dependency graph.
Qian et al. [16] then define:
Definition 3.2.
The lcn-parents of a proposition $A$, denoted by $\mathrm{lcnpa}(A)$, are the propositions such that there exists a directed path in the dependency graph from each of them to $A$ in which all intermediate nodes are formulas.
Definition 3.3.
The lcn-descendants of a proposition $A$, denoted by $\mathrm{lcnde}(A)$, are the propositions such that there exists a directed path in the dependency graph from $A$ to each of them in which no intermediate node is a parent of $A$.
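The two definitions above can be read as reachability queries on the dependency graph. The sketch below is one possible reading, offered under explicit assumptions: "parent" in the definition of lcn-descendants is taken to mean lcn-parent, a proposition is never counted as its own lcn-parent or lcn-descendant, and the small dependency graph is hypothetical.

```python
def lcn_parents(target, edges, propositions):
    """Propositions with a directed path to `target` whose intermediate nodes are all formulas."""
    found, stack, seen = set(), [target], set()
    while stack:
        node = stack.pop()
        for a, b in edges:
            if b != node or a in seen:
                continue
            seen.add(a)
            if a in propositions:
                if a != target:
                    found.add(a)        # the path stops at the first proposition
            else:
                stack.append(a)         # keep walking backwards through formula-nodes
    return found

def lcn_descendants(source, edges, propositions):
    """Propositions reachable from `source` with no lcn-parent of `source` as intermediate node."""
    blocked = lcn_parents(source, edges, propositions)
    found, stack, seen = set(), [source], {source}
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a != node or b in seen:
                continue
            seen.add(b)
            if b in propositions:
                found.add(b)            # an endpoint may itself be an lcn-parent
            if b not in blocked:
                stack.append(b)         # but paths may not continue through one
    return found

# Hypothetical dependency graph with one formula-node, "S or T".
propositions = {"C", "S", "T", "F"}
edges = [
    ("S", "C"),
    ("F", "S or T"),
    ("S or T", "S"), ("S or T", "T"),
    ("S", "S or T"), ("T", "S or T"),
]
```

With these edges, `lcn_parents("S", edges, propositions)` returns F and T, and `lcn_descendants("S", edges, propositions)` returns C and T; T appears in both sets because the path from S to T passes only through the formula-node.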
The connections between these concepts and the definitions of parent and descendant in Section 2 will be clear in the next section.
In any case, using these definitions Qian et al. [16] proposed a Markov condition:
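Definition 3.4 (LMC(LCN)).
A node $A$ is independent, given its lcn-parents, of all nodes that are not $A$ itself nor lcn-descendants of $A$ nor lcn-parents of $A$.
The Markov condition in Definition 3.4 is:
$$A \perp\!\!\!\perp \mathcal{N} \setminus \left(\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A)\right) \mid \mathrm{lcnpa}(A), \qquad (2)$$
where we use $\perp\!\!\!\perp$ here, and in the remainder of the paper, to mean “is independent of”.
We will often use the superscript $c$ to mean complement, hence $\mathcal{B}^c = \mathcal{N} \setminus \mathcal{B}$ for a set of nodes $\mathcal{B}$.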
Qian et al. [16] have derived inference algorithms (that is, they consider the computation of conditional probabilities) that exploit such independence relations, and they examine applications that demonstrate the practical value of LCNs.
It seems that a bit more discussion about the meaning of this Markov condition, as well as its properties and consequences, would be welcome. To do so, we find it useful to introduce a novel concept, namely, the structure of a LCN.
4 The Structure of a LCN
The dependency graph of a LCN is rather similar in spirit to the factor graph of a Bayesian network [13], where both random variables and conditional probabilities are explicitly represented. This is a convenient device when it comes to message-passing inference algorithms, but perhaps it contains too much information when one wishes to examine independence relations.
We introduce another graph to be extracted from the dependency graph of a given LCN, that we call the structure of the LCN, as follows:
1. For each formula-node $\phi$ that appears as a conditioned formula in a constraint in $\mathcal{T}_U$, place an undirected edge between any two propositions that appear in $\phi$.
2. For each pair of formula-nodes $\varphi$ and $\phi$ that appear in a constraint $\alpha \leq \mathbb{P}(\phi \mid \varphi) \leq \beta$, add a directed edge from each proposition in $\varphi$ to each proposition in $\phi$.
3. If, for some pair of proposition-nodes $A$ and $B$, there is now a pair of edges $A \rightleftarrows B$, then replace both edges by an undirected edge $A \sim B$.
4. Finally, remove multiple edges and remove the formula-nodes and all edges in and out of them.
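The four steps above can be sketched directly over the constraints, without materializing formula-nodes. This is an illustrative reading only: the tuple layout, the flag `joins_conditioned` (true for constraints whose conditioned propositions get joined in step 1), and the example constraints are assumptions made here, and a proposition that occurs on both sides of a constraint is simply not given an edge to itself.

```python
from itertools import combinations

def structure(constraints):
    """Return (directed, undirected) edge sets over propositions.

    Each constraint is (phi_props, psi_props, joins_conditioned), where
    phi_props are the propositions of the conditioned formula and psi_props
    those of the conditioning formula (empty for a tautology).
    """
    directed, undirected = set(), set()
    for phi_props, psi_props, joins_conditioned in constraints:
        if joins_conditioned:                                    # step 1
            undirected |= {frozenset(p) for p in combinations(phi_props, 2)}
        directed |= {(a, b) for a in psi_props                   # step 2
                     for b in phi_props if a != b}
    bidirected = {(a, b) for a, b in directed if (b, a) in directed}
    undirected |= {frozenset(e) for e in bidirected}             # step 3
    directed -= bidirected
    # Step 4: sets already hold a single copy of each edge, and formula-nodes
    # were never created in this representation.
    return directed, undirected

# Hypothetical constraints over propositions C, S, T, F.
constraints = [
    ({"C"}, {"S"}, False),        # bounds on P(C | S)
    ({"S", "T"}, {"F"}, True),    # bounds on P(S or T | F), joining S and T
    ({"F"}, {"T"}, False),        # bounds on P(F | T), which makes F and T bi-directed
]
directed, undirected = structure(constraints)
```

Here the pair of edges between F and T collapses into a single undirected edge, so the resulting structure has directed edges from S to C and from F to S, and undirected edges between S and T and between F and T.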
Example 4.1.
Figures 4.a and 4.b depict the structure of the LCN in Example 3.1.
We have:
Lemma 4.2.
The set of lcn-parents of a proposition $A$ in a LCN is identical to the boundary of $A$ with respect to the structure of the LCN.
Proof 4.3.
Consider a LCN with a dependency graph . If is a lcn-parent of with respect to , then or or or or or for formula and ; hence appears either as a parent of or a neighbor in the structure of the LCN. Conversely, if is a parent or a neighbor of in the structure of the LCN, then one of the sequences of edges already mentioned must be in , so is a lcn-parent of in .
The natural candidate for the concept of “descendant” in a structure, so as to mirror the concept of lcn-descendant, is as follows:
Definition 4.4.
If there is a directed path from $A$ to $B$ such that no intermediate node is a boundary node of $A$, then $B$ is a strict descendant of $A$.
Using the previous definitions, we can state a local Markov condition that works for any graph but that, when applied to the structure of a LCN, has the same effect as the original Markov condition LMC(LCN) (Definition 3.4) applied to the LCN:
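Definition 4.5 (LMC(C-STR)).
A node $A$ is independent, given its boundary, of all nodes that are not $A$ itself nor strict descendants of $A$ nor boundary nodes of $A$.
In symbols,
$$A \perp\!\!\!\perp \mathcal{N} \setminus \left(\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A)\right) \mid \mathrm{bd}(A), \qquad (3)$$
where $\mathrm{sde}(A)$ denotes the set of strict descendants of $A$.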
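The set of strict descendants, and the independence statement that the condition attaches to a node, can be computed by enumerating paths. The sketch below is illustrative and rests on assumptions stated here: the graph has no bi-directed edges (as in a structure), paths do not revisit nodes, a node is not counted as its own strict descendant, and the five-node graph is hypothetical.

```python
def boundary(node, directed, undirected):
    """Parents and neighbors of `node`."""
    parents = {a for a, b in directed if b == node}
    neighbors = {v for e in undirected if node in e for v in e if v != node}
    return parents | neighbors

def strict_descendants(node, directed, undirected):
    """Endpoints of directed paths from `node` whose intermediate nodes avoid bd(node)."""
    bd = boundary(node, directed, undirected)
    steps = {}
    for a, b in directed:
        steps.setdefault(a, []).append((b, True))     # True marks a directed edge
    for e in undirected:
        a, b = tuple(e)
        steps.setdefault(a, []).append((b, False))
        steps.setdefault(b, []).append((a, False))
    found = set()

    def walk(current, visited, has_arrow):
        for nxt, arrow in steps.get(current, []):
            if nxt in visited:
                continue
            if has_arrow or arrow:
                found.add(nxt)          # the path so far contains a directed edge
            if nxt not in bd:           # boundary nodes may end a path, not continue it
                walk(nxt, visited | {nxt}, has_arrow or arrow)

    walk(node, {node}, False)
    return found

def lmc_cstr(node, nodes, directed, undirected):
    """Return (set made independent of `node`, conditioning set) under the condition above."""
    bd = boundary(node, directed, undirected)
    sde = strict_descendants(node, directed, undirected)
    return set(nodes) - {node} - sde - bd, bd

# Hypothetical chain graph: A -> B, C -> D, D -> E, B ~ D.
directed = [("A", "B"), ("C", "D"), ("D", "E")]
undirected = [("B", "D")]
```

In this graph node B has the descendant E (through its neighbor D) but no strict descendants, so `lmc_cstr("B", "ABCDE", directed, undirected)` states that B is independent of C and E given A and D.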
We then have:
Theorem 4.6.
Given a LCN, the Markov condition LMC(LCN) in Definition 3.4 is identical, with respect to the independence relations it imposes, to the local Markov condition LMC(C-STR) in Definition 4.5 applied to the structure of the LCN.
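Proof 4.7.
To prove that Expressions (2) and (3) are equivalent, we use the fact that $\mathrm{lcnpa}(A)$ and $\mathrm{bd}(A)$ are identical by Lemma 4.2 and we prove (next) that $\mathcal{N} \setminus (\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A))$ is equal to $\mathcal{N} \setminus (\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A))$.
Suppose then that $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A))$ and assume, to obtain a contradiction, that $B \notin \mathcal{N} \setminus (\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A))$. So, our assumption is that $B \in \{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A)$, and the latter union can be written as $\{A\} \cup \mathrm{bd}(A) \cup (\mathrm{sde}(A) \setminus \mathrm{bd}(A))$, a union of disjoint sets. So it may be either that
- We have $B = A$, a contradiction.
- We have $B \in \mathrm{bd}(A)$. Then $B$ is either a parent or a neighbor, and in both cases there must be an edge from $B$ to $A$ in the dependency graph and then $B \in \mathrm{lcnpa}(A)$, a contradiction.
- We have $B \in \mathrm{sde}(A) \setminus \mathrm{bd}(A)$. Then there must be a directed path from $A$ to $B$ (with intermediate nodes that are not boundary nodes) in the structure, and so there must be a corresponding directed path from $A$ to $B$ (with intermediate nodes that are not parents) in the dependency graph; so $B \in \mathrm{lcnde}(A)$, a contradiction.
So, we always get a contradiction; hence if $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A))$ then $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A))$.
Suppose now that we have a node $B$, distinct from $A$, such that $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A))$ and assume, to obtain a contradiction, that $B \notin \mathcal{N} \setminus (\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A))$. The reasoning that follows is parallel to the one in the last paragraph, but this case has a few additional twists to take care of. So, our assumption is that $B \in \{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A)$, and the latter union can be written as $\{A\} \cup \mathrm{lcnpa}(A) \cup (\mathrm{lcnde}(A) \setminus \mathrm{lcnpa}(A))$, again a union of disjoint sets. So it may be either that
- We have $B = A$, a contradiction.
- We have $B \in \mathrm{lcnpa}(A)$. Then there is a directed edge from $B$ to $A$, or a bi-directed edge between them, and $B \in \mathrm{bd}(A)$, a contradiction.
- We have $B \in \mathrm{lcnde}(A) \setminus \mathrm{lcnpa}(A)$. Then there must be a path from $A$ to $B$ in the dependency graph (with intermediate nodes that are not parents), and by construction of the structure there must be a path from $A$ to $B$ in the structure (with intermediate nodes that are not boundary nodes). The latter path cannot be an undirected path; otherwise, it would have passed through a parent of $A$ in the dependency graph, contradicting the assumption that $B$ is a lcn-descendant. As there is a directed path that cannot go through a parent in the structure, $B \in \mathrm{sde}(A)$, a contradiction.
So, we always get a contradiction; hence if $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{sde}(A) \cup \mathrm{bd}(A))$ then $B \in \mathcal{N} \setminus (\{A\} \cup \mathrm{lcnde}(A) \cup \mathrm{lcnpa}(A))$. Thus the latter two sets are identical and the proof is finished.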
5 Chain Graphs and Factorization
If the structure of a LCN is a directed acyclic graph, the LMC(LCN) is actually the usual local Markov condition for directed acyclic graphs as applied to Bayesian or credal networks [6, 14]. If instead all constraints in a LCN belong to $\mathcal{T}_U$, all of them only referring to “unconditional” probabilities (that is, $\varphi = \top$ in every constraint), then the structure of the LCN is an undirected graph endowed with the usual local Markov condition for undirected graphs.
These previous points can be generalized in a satisfying way whenever the structure contains no directed cycle:
Theorem 5.1.
If the structure of a LCN is a chain graph, and probabilities are positive, then the Markov condition LMC(LCN) in Definition 3.4 is identical, with respect to the independence relations it imposes, to the LMC(C) applied to the structure.
Before we prove this theorem, it should be noted that sets of descendants and strict descendants are not identical. This is easy to see in graphs with directed cycles: in Figure 1.b, node has descendants and strict descendants . But even in chain graphs we may have differences: for instance, suppose that in Figure 1.e we add a single directed edge from to a new node ; then is the only descendant of , but has no strict descendants.
In fact, the descendants of a node $A$ can be divided into two sets. First, a node $B$ is in the first set if and only if there is at least one directed path from $A$ to $B$ that starts with a directed edge. All of those nodes are strict descendants of $A$ when the structure is a chain graph. Second, a node $B$ is in the second set if and only if all directed paths from $A$ to $B$ start with an undirected edge. Then all directed paths from $A$ to $B$ first reach a node that is in the boundary of $A$ and consequently $B$ is not a strict descendant of $A$. When the structure is a chain graph, the first set is thus exactly $\mathrm{sde}(A)$, and we might refer to the second set as $\mathrm{wde}(A)$, the set of “weak” descendants of $A$. By construction we have $\mathrm{de}(A) = \mathrm{sde}(A) \cup \mathrm{wde}(A)$. Moreover, we conclude (see Figure 5) that
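$$(\mathrm{sde}(A))^{c} = (\mathrm{de}(A))^{c} \cup \mathrm{wde}(A),$$
from which we have that $\{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c} = (\{A\}^{c} \cap (\mathrm{de}(A))^{c} \cap (\mathrm{bd}(A))^{c}) \cup (\{A\}^{c} \cap \mathrm{wde}(A) \cap (\mathrm{bd}(A))^{c})$. Note that $\{A\}^{c} \cap \mathrm{wde}(A) \cap (\mathrm{bd}(A))^{c} = \mathrm{wde}(A)$ by construction, hence $\{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c}$ is the union of two disjoint sets:
$$\{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c} = \left(\{A\}^{c} \cap (\mathrm{de}(A))^{c} \cap (\mathrm{bd}(A))^{c}\right) \cup \mathrm{wde}(A). \qquad (4)$$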
Proof 5.2.
Suppose we have the LMC(C-STR); so, for any node $A$, we have Expression (3). Using Expression (4), we have $A \perp\!\!\!\perp ((\{A\}^{c} \cap (\mathrm{de}(A))^{c} \cap (\mathrm{bd}(A))^{c}) \cup \mathrm{wde}(A)) \mid \mathrm{bd}(A)$; using the Decomposition property of probabilities, we obtain $A \perp\!\!\!\perp \{A\}^{c} \cap (\mathrm{de}(A))^{c} \cap (\mathrm{bd}(A))^{c} \mid \mathrm{bd}(A)$. (Footnote 4: The Decomposition property states that $X \perp\!\!\!\perp Y \cup W \mid Z$ implies $X \perp\!\!\!\perp Y \mid Z$ for sets of random variables [13].) Thus the LMC(C) holds.
Suppose the LMC(C) holds; given the positivity condition, the GMC(C) holds [5]. Take a node $A$ and suppose there is a node $B \in \{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c}$. We now prove that $B$ is separated from $A$ by $\mathrm{bd}(A)$ in
$$\mathcal{G}^{ma}\left(A,\ \{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c},\ \mathrm{bd}(A)\right),$$
and consequently the GMC(C) leads to $A \perp\!\!\!\perp (\{A\}^{c} \cap (\mathrm{sde}(A))^{c} \cap (\mathrm{bd}(A))^{c}) \mid \mathrm{bd}(A)$ as desired. Note that, by construction, nodes in $\mathrm{sde}(A)$ cannot be in the ancestral set of any node in $(\mathrm{sde}(A))^{c}$, so $\mathrm{sde}(A)$ is not in $\mathcal{G}^{ma}$ and consequently $\mathcal{G}^{ma}$ is the moral graph of the graph consisting of nodes in $(\mathrm{sde}(A))^{c}$ and edges among them in the structure. Now reason as follows. Paths that go from $A$ to a parent or neighbor are obviously blocked by $\mathrm{bd}(A)$. The only possible “connecting” paths from $A$ to $B$ in $\mathcal{G}^{ma}$ would have to start with an undirected edge from $A$ to say $C$, an edge added to $\mathcal{G}^{ma}$ because $A$ and $C$ have directed edges to a common chain component. Now suppose $A$ points to node $D$ in this latter chain component. Then $D$ is not in the boundary of $A$ (no directed cycles) and the only possible way for $D$ to be in $\mathcal{G}^{ma}$ is if there is a directed path from $D$ to a node $E$ in $\mathrm{wde}(A)$, but the resulting directed path from $A$ to $E$ would create a contradiction (as $E$ would then not be in $\mathrm{wde}(A)$). Such “connecting” paths from $A$ to $B$ are thus impossible in $\mathcal{G}^{ma}$. Hence we must have separation of $A$ and $B$ by $\mathrm{bd}(A)$ in $\mathcal{G}^{ma}$.
The significance of the previous theorem is that, assuming that all probabilities are positive, the local Markov condition for a chain graph is equivalent both to the global Markov condition and to the factorization property of chain graphs. This allows us to break down the probability distribution over all random variables in a LCN in hopefully much smaller pieces that require less specification effort.
Example 5.3.
Figure 4.b depicts a structure that is in fact a chain graph. We can group variables to obtain chain components and and draw a directed acyclic graph with the chain components, as in Figure 4.c. The joint probability distribution factorizes as Expression (1):
[TABLE]
where is a configuration of the random variables in , while is a configuration of the random variables in . Because there are no independence relations “inside” the chain components, this factorization is guaranteed even if some probability values are equal to zero [5].
Suppose that the three constraints were replaced so that we had a similar structure but instead of a single chain component with , , , suppose we had two chain components, one with and , the other with and . The chain components might be organized as in Figure 4.d. If all probabilities are positive, that chain graph leads to a factorization of the joint probability distribution similar to the previous one, but now , where and are positive functions, and the values of , and are indicated by , , respectively.
In the previous paragraph the assumption that probabilities are positive is important: when some probabilities are zero, there is no guarantee that a factorization actually exists [15]. This is unfortunate as a factorization leads to valuable computational simplifications. One strategy then is to guarantee that all configurations do have positive probability, possibly by adding language directives that bound probabilities from below. A language command might consist of explicit bounds, say , or even a direct guarantee of positivity without an explicit bounding value. This solution may be inconvenient if we do have some hard constraints in the domain. For instance, we may impose that (in which case ). However, it is still possible to obtain a factorization if hard constraints are imposed. Say we have a formula, for instance , that must be satisfied. We treat it as a constraint in , thus guaranteeing that there is a clique containing its propositions/random variables. Then we remove the impossible configurations of these random variables (in our running example, we remove ), thus reducing the number of possible configurations for the corresponding clique. A factorization is obtained again in the reduced space of configurations, provided the remaining configurations do have positive probabilities. Finally, an entirely different strategy may be pursued: adopt a stronger Markov condition that guarantees factorization (and hence global independence relations) in all circumstances. Moussouris [15] has identified one such condition, where a system is strongly Markovian in case a Markov condition holds for the system and suitable sub-systems. That (very!) strong condition forces zero probabilities to be, in a sense, localized, so that probabilities satisfy a nice factorization property; alas, the condition cannot be guaranteed for all graphs, and its consequences have not been explored in depth so far.
6 Directed Cycles
As noted already, existence of a factorization is a very desirable property for any probabilistic formalism: not only does it simplify calculations, but it also emphasizes modularity in modeling and ease of understanding. In the previous section we have shown that LCNs whose structure is a chain graph do have, under a positivity assumption, a well-known factorization property. We now examine how that result might be extended when structures have directed cycles.
The LMC(LCN) is, of course, a local condition that can be applied even in the presence of directed cycles. However, local Markov conditions may not be very satisfactory in the presence of directed cycles, as a simple yet key example suggests:
Example 6.1.
Take a LCN whose dependency graph is a long cycle $A_1 \rightarrow A_2 \rightarrow \dots \rightarrow A_k \rightarrow A_1$, for some large $k$. No $A_i$ has any non-descendant non-parent. And no $A_i$ has any non-strict-descendant non-parent either. The local Markov conditions we have contemplated do not impose any independence relation.
Local conditions seem too weak when there are long cycles. On the other hand, a global condition may work fine in those settings. For instance, apply the GMC(C) to the graph in Example 6.1; the condition does impose non-trivial independence relations such as $A_1 \perp\!\!\!\perp A_3, \dots, A_{k-1} \mid A_2, A_k$ and $A_2 \perp\!\!\!\perp A_4, \dots, A_k \mid A_1, A_3$ (and more generally, for any $A_i$ with $2 < i < k-1$, we have $A_i \perp\!\!\!\perp A_1, \dots, A_{i-2}, A_{i+2}, \dots, A_k \mid A_{i-1}, A_{i+1}$).
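The contrast between local and global conditions on a directed cycle can be checked mechanically. The following sketch is illustrative only: the cycle length `k = 6`, the integer node labels, and the helper names are assumptions made here. Because every node of a directed cycle is an ancestor of every other node, the smallest ancestral set of any triple is the whole cycle, so the moral graph of the full graph is the one to inspect.

```python
from itertools import combinations

def moral_edges(directed):
    """Moral graph of a directed graph: join co-parents, then drop directions."""
    edges = {frozenset(e) for e in directed}
    for child in {b for _, b in directed}:
        parents = [a for a, b in directed if b == child]
        edges |= {frozenset(p) for p in combinations(parents, 2)}
    return edges

def separated(edges, left, right, given):
    """True iff every path between `left` and `right` passes through `given`."""
    reach, stack = set(), list(left)
    while stack:
        v = stack.pop()
        if v in reach or v in given:
            continue
        reach.add(v)
        stack += [w for e in edges if v in e for w in e if w != v]
    return not (reach & set(right))

k = 6
cycle = [(i, i % k + 1) for i in range(1, k + 1)]  # A_1 -> A_2 -> ... -> A_k -> A_1
nodes = set(range(1, k + 1))

def descendants(node):
    """Nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        v = stack.pop()
        for a, b in cycle:
            if a == v and b not in out:
                out.add(b)
                stack.append(b)
    return out

# Local condition: nodes that are neither the node itself, a descendant, nor a parent.
local_sets = {
    n: nodes - {n} - descendants(n) - {a for a, b in cycle if b == n}
    for n in nodes
}
moral = moral_edges(cycle)
```

Every entry of `local_sets` is empty, so the local condition says nothing, whereas `separated(moral, {1}, {3, 4, 5}, {2, 6})` holds and yields the first independence relation listed above for this value of `k`.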
At this point it is mandatory to examine results by Spirtes [18], as he has studied local and global conditions for directed graphs, obtaining factorization results even in the presence of directed cycles. The local Markov condition adopted by Spirtes [18] is just the one adopted for directed graphs in Section 2:
Definition 6.2 (LMC(D)).
A node $A$ is independent, given its parents, of all nodes that are not $A$ itself nor descendants nor parents of $A$.
The global Markov condition adopted by Spirtes [18] is just the GMC(C) (Definition 2.2). Spirtes shows that the LMC(D) is not equivalent to the GMC(C) for directed graphs with directed cycles. This observation can be adapted to our setting as follows:
Example 6.3.
Suppose we have a LCN whose dependency graph is depicted in Figure 1.d. For instance, we might have whenever is an edge in that figure. Assume all configurations have positive probability.
The LMC(D) applied to this dependency graph yields only $A \perp\!\!\!\perp C$, $B \perp\!\!\!\perp C \mid A, D$ and $A \perp\!\!\!\perp D \mid B, C$.
However, if we apply the GMC(C) directly to the dependency graph, we do not get the same independence relations: then we only obtain $A \perp\!\!\!\perp C$ and $A \perp\!\!\!\perp C \mid B, D$, perhaps a surprising result (in this case, the moral graph is depicted in Figure 2.d).
Spirtes [18, Lemma 3] has shown that, for a directed graph that may have directed cycles, a positive probability distribution over the random variables is a product of factors, one per random variable, iff the distribution satisfies the GMC(C) for the graph. Note that the GMC(C) is equivalent, for graphs without directed cycles, under a positivity assumption, to the LMC(C).
However, there is a difficulty in applying Spirtes’ result to our setting.
Example 6.4.
Consider Example 6.3. The structure of the LCN is the chain graph in Figure 1.e, and we know that the LMC(LCN) is equivalent to the LMC(C) and GMC(C) for chain graphs. In fact, the LMC(LCN), the LMC(C), the LMC(C-STR), and the GMC(C) also yield only $A \perp\!\!\!\perp C$, $B \perp\!\!\!\perp C \mid A, D$ and $A \perp\!\!\!\perp D \mid B, C$ when applied to the structure. Clearly this is not the same set of independence relations imposed by the GMC(C) applied to the dependency graph (as listed in Example 6.3). There is a difference between undirected and bi-directed edges when it comes to the GMC(C).
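The difference between the two graphs can be reproduced with a small separation test on moral graphs of ancestral sets. The sketch below is illustrative and rests on an assumption that should be stressed: the figures are not reproduced here, so the edge lists are inferred from the independence relations stated in Examples 6.3 and 6.4, namely edges from A to B and from C to D in both graphs, with B and D joined by a bi-directed edge in the dependency graph and by an undirected edge in the structure. Function names are also choices made here.

```python
from itertools import combinations

def components(nodes, undirected):
    """Chain components: blocks of nodes connected through undirected edges."""
    comp = {n: {n} for n in nodes}
    for a, b in undirected:
        merged = comp[a] | comp[b]
        for v in merged:
            comp[v] = merged
    return {n: frozenset(c) for n, c in comp.items()}

def gmc_separated(directed, undirected, left, right, given):
    """Separation of `left` and `right` by `given` in the moral graph of their smallest ancestral set."""
    anc, stack = set(), list(left | right | given)
    while stack:                                   # close under parents and neighbors
        v = stack.pop()
        if v in anc:
            continue
        anc.add(v)
        stack += [a for a, b in directed if b == v]
        stack += [w for e in undirected if v in e for w in e if w != v]
    d = [(a, b) for a, b in directed if a in anc and b in anc]
    u = [e for e in undirected if set(e) <= anc]
    comp = components(anc, u)
    edges = {frozenset(e) for e in d + u}          # bi-directed pairs collapse here
    for block in set(comp.values()):               # join nodes with children in a common component
        parents = {a for a, b in d if b in block}
        edges |= {frozenset(p) for p in combinations(parents, 2)}
    reach, stack = set(), list(left)
    while stack:                                   # search that never enters `given`
        v = stack.pop()
        if v in reach or v in given:
            continue
        reach.add(v)
        stack += [w for e in edges if v in e for w in e if w != v]
    return not (reach & right)

# Assumed edge lists (inferred from the text, not copied from the figures).
dep_directed = [("A", "B"), ("C", "D"), ("B", "D"), ("D", "B")]
dep_undirected = []
str_directed = [("A", "B"), ("C", "D")]
str_undirected = [("B", "D")]

queries = {
    "A_C": ({"A"}, {"C"}, set()),
    "A_C|BD": ({"A"}, {"C"}, {"B", "D"}),
    "B_C|AD": ({"B"}, {"C"}, {"A", "D"}),
    "A_D|BC": ({"A"}, {"D"}, {"B", "C"}),
}
on_dependency = {k: gmc_separated(dep_directed, dep_undirected, *q) for k, q in queries.items()}
on_structure = {k: gmc_separated(str_directed, str_undirected, *q) for k, q in queries.items()}
```

Under these assumed edges, the bi-directed version separates A from C given B and D but not B from C given A and D, while the undirected version does the opposite, which matches the two lists of relations in the examples.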
The message of this example is that we cannot impose the GMC(C) on (a suitable version of) directed dependency graphs and hope to keep the LMC(LCN) by Qian et al. [16]. If we want the factorization induced by the GMC(C) on (a version of) dependency graphs, we must somehow modify the original semantics for LCNs by Qian et al. [16].
It is worth summarizing the discussion so far. First, it is well-known that the LMC(C) and the GMC(C) are equivalent, under a positivity assumption, for chain graphs (both conditions may differ in the presence of directed cycles). Second, we know that the LMC(LCN) for dependency graphs is equivalent to the LMC(C-STR) with respect to the corresponding structures. And if the structure is a chain graph, then the LMC(C-STR) and the LMC(C) are equivalent when applied to the structure. But for general dependency graphs any local condition seems quite weak. We might move to general dependency graphs by adapting the GMC(C) to them, so as to look for a factorization result; however, we saw that the result is not equivalent to what we obtained by applying the GMC(C) to structures.
In the next section we examine alternative semantics that are based on applying the GMC(C) to structures (possibly with directed cycles). Before we jump into that, it is worth noticing that there are many other relevant results in the literature besides the ones by Spirtes. For instance, dependency networks [11] allow for directed cycles and do have a modular specification scheme; they have only an approximate factorization, but that may be enough in applications. Another proposal has been advanced by Schmidt and Murphy [17], where directed cycles are allowed and the adopted Markov condition looks only at the Markov blanket of nodes; it does not seem that a factorization has been proven for that proposal, but it is attractive in its simplicity. There are also many kinds of graphs that have been contemplated to handle causal loops and dynamic feedback systems [3, 12, 4]. This is indeed a huge literature, filled with independence conditions and factorization properties, to which we cannot do justice in the available space. It is necessary to examine whether we can bring elements of those previous efforts into LCNs. We leave a more detailed study for the future.
7 New Semantics for LCNs
In this section we explore new semantics for LCNs by applying the GMC(C) to structures. This is motivated by the weakness of local conditions as discussed in the previous section, and also by the fact that a condition based on moralized graphs is the most obvious route to factorization properties (as the Hammersley-Clifford theorem can then be invoked under a positivity assumption [15]).
Here is a (new) semantics: a LCN represents the set of probability distributions over its nodes such that all constraints in the LCN are satisfied, and each distribution satisfies the GMC(C) with respect to the structure. Note that the GMC(C) is equivalent to the LMC(LCN) when a structure is a chain graph, but these conditions may differ in the presence of directed cycles.
The path to a factorization result is then as follows. Take the mixed-structure and, for each node , build a set with all nodes that belong to directed cycles starting at . If there is a directed cycle in a set such that is in , then merge and into a set ; repeat this until there are no more sets to merge (this must stop, in the worst case with a single set containing all nodes). For each set, replace all nodes in the set by a single “super”-node, redirecting all edges into and out of nodes in the set to this super-node. The resulting graph has no directed cycles, so the GMC(C) applied to it results in the usual factorization over chain components of the resulting graph. Each super-node is in fact a set of nodes that can be subject to further factorization, although it is an open question whether a decomposition can be obtained with factors that are directly related to graph properties.
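The merging step can be sketched in Python as follows. This is a minimal illustration under two assumptions that are not stated in the text: only directed edges are followed when looking for directed cycles, and nodes are grouped by mutual reachability, which should yield the same groups as repeatedly merging overlapping cycle sets. The function name and the edge representation are hypothetical.

```python
def condense_directed_cycles(nodes, directed, undirected=()):
    """Collapse nodes that lie on a common directed cycle into super-nodes.

    Returns (super_of, directed_edges, undirected_edges), where super_of maps
    each node to a frozenset (its super-node) and the edge sets connect
    super-nodes. Assumption: cycles are detected along directed edges only.
    """
    succ = {n: set() for n in nodes}
    for u, v in directed:
        succ[u].add(v)

    def reachable(start):
        # Nodes reachable from `start` by at least one directed edge.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {n: reachable(n) for n in nodes}

    # Two nodes share a super-node iff each one reaches the other.
    super_of = {
        n: frozenset({n} | {m for m in reach[n] if n in reach[m]})
        for n in nodes
    }

    # Redirect edges to super-nodes, dropping edges internal to a super-node.
    directed_edges = {
        (super_of[u], super_of[v])
        for u, v in directed
        if super_of[u] != super_of[v]
    }
    undirected_edges = {
        frozenset({super_of[u], super_of[v]})
        for u, v in undirected
        if super_of[u] != super_of[v]
    }
    return super_of, directed_edges, undirected_edges
```

For a hypothetical graph with directed edges A→B, B→D, D→B and C→D, the nodes B and D end up in one super-node, and the condensed graph has edges from {A} and from {C} into {B, D}.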
To continue, we suggest that, instead of using structures as mere secondary objects that help us clarify the meaning of dependency graphs, structures should be the primary tools in dealing with LCNs. That is, we should translate every LCN to its structure (without going through the dependency graph) and then apply appropriate Markov conditions there. Given a LCN, we can build its structure by taking every proposition as a node and then:
1. For each constraint in , add an undirected edge between each pair of proposition-nodes in .
2. For each constraint add a directed edge from each proposition-node in to each proposition-node in (if is , there is no such edge to add).
3. Remove multiple identical edges.
4. For each pair of nodes and , if there is a bi-directed edge between them, replace the two edges by a single undirected edge .
For instance, the procedure above goes directly from the LCN in Example 3.1 to the structure in Figure 4.b.
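The four steps can be sketched in Python as follows. Because the constraint symbols are missing from the text here, the sketch relies on an assumed reading: each constraint is represented by a hypothetical pair `(head, body)`, where `head` holds the propositions of the conditioned formula and `body` those of the conditioning formula (empty when there is no conditioning). Step 1 is assumed to link the propositions in `head` pairwise, and Step 2 to direct edges from `body` to `head`.

```python
from itertools import combinations


def build_structure(constraints, merge_bidirected=True):
    """Build (nodes, directed, undirected) from (head, body) proposition pairs.

    Assumed representation: `head` and `body` are iterables of proposition
    names. With merge_bidirected=False the procedure stops before Step 4 and
    keeps pairs of opposite directed edges as they are.
    """
    nodes, directed, undirected = set(), set(), set()
    for head, body in constraints:
        head, body = set(head), set(body)
        nodes |= head | body
        # Step 1: undirected edge between each pair of propositions in head.
        for p, q in combinations(sorted(head), 2):
            undirected.add(frozenset((p, q)))
        # Step 2: directed edge from each body proposition to each head one.
        for src in body:
            for dst in head:
                if src != dst:
                    directed.add((src, dst))
    # Step 3: duplicate edges vanish automatically because sets are used.
    if merge_bidirected:
        # Step 4: replace each pair of opposite directed edges by one
        # undirected edge.
        for u, v in list(directed):
            if (u, v) in directed and (v, u) in directed:
                directed.discard((u, v))
                directed.discard((v, u))
                undirected.add(frozenset((u, v)))
    return nodes, directed, undirected
```

As a usage example with made-up constraints, `build_structure([(("A", "B"), ()), (("C",), ("A",)), (("D",), ("C",)), (("C",), ("D",))])` produces an undirected edge between A and B, a directed edge from A to C, and a single undirected edge between C and D in place of the two opposite directed edges.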
When we think of structures this way, we might wish to distinguish the symmetric connections that arise when a pair of propositions appears in the same formula from the mutual influences that arise when one proposition is conditioned on the other and vice versa.
An alternative semantics would be as follows. Take a LCN and build a mixed-structure by going through the first two steps above. That is, create a node per proposition that appears in the LCN; then take each constraint in and add undirected edges between any two propositions in , and finally take each constraint and add a directed edge from each proposition that appears in to each proposition that appears in the corresponding . Figure 6 depicts the mixed-structure for Example 3.1.
Now adopt: a LCN represents the set of probability distributions over its nodes such that all constraints in the LCN are satisfied, and each distribution satisfies the GMC(C) with respect to the mixed-structure.
The next example emphasizes the differences between semantics.
Example 7.1.
Suppose we have a LCN with constraints , , , . Both the structure and the mixed-structure of this LCN are depicted in Figure 7.a. Consider another LCN with constraints , , and . This second LCN has the same structure as the first one, but the mixed-structure is depicted in Figure 7.b. The GMC(C) produces quite different sets of independence relations when applied to these distinct mixed-structures; for instance, $A,B \perp\!\!\!\perp D \mid C,E$ in the first LCN, but not necessarily in the second; $A,B \perp\!\!\!\perp E \mid C,D$ in the second LCN, but not necessarily in the first. This seems appropriate as the LCNs convey quite distinct scenarios, one related to the symmetry of logical constraints, the other related to the links induced by directed influences.
We hope to pursue a comparison between the theoretical and pragmatic aspects of these semantics in future work.
8 Conclusion
In this paper we visited many Markov conditions that can be applied, if properly adapted, to Logical Credal Networks [16]. We reviewed existing concepts and introduced the notion of structure of a LCN, showing that the original local condition LMC(LCN) can be viewed as a local condition on structures. We then showed that the LMC(LCN) is equivalent to a usual local condition when the structure is a chain graph, and this leads to a factorization result. Moreover, we introduced a new semantics based on structures and a global Markov condition — a semantics that agrees with the original one when the structure is a chain graph but that offers a possible path to factorization properties.
There are many issues left for future work. LCNs stress the connection between the syntactic form of constraints and the semantic consequences of independence assumptions, a theme that surfaces in many probabilistic logics. We must investigate more carefully the alternatives when extracting independence relations from constraints, in particular to differentiate ways in which bi-directed edges are created.
We must also examine positivity assumptions. What is the best way to guarantee a factorization? Should we require the user to explicitly express positivity assumptions? Should we allow for logical constraints that assign probability zero to some configurations; if so, which kinds of configurations, and how to make those constraints compatible with factorization properties?
It is also important to study the large number of Markov conditions in the literature that we did not deal with in this paper, both the ones connected with chain graphs and the ones connected with causal and feedback models. We must verify which conditions lead to factorization results, and which conditions are best suited to capture the content of logical formulas, causal influences, and feedback loops.
In a more applied perspective, we must investigate whether the ideas behind LCNs can be used with practical specification languages such as Probabilistic Answer Set Programming, and we must test how various semantics for LCNs fare in realistic settings.
Acknowledgements
This work was carried out at the Center for Artificial Intelligence (C4AI - USP/IBM/FAPESP), with support by the São Paulo Research Foundation (FAPESP grant 2019/07665-4) and by the IBM Corporation. The author was partially supported by CNPq grant 312180/2018-7. We acknowledge support by CAPES - Finance Code 001.
References
- [1] Kim A. Andersen and John N. Hooker. A linear programming framework for logics of uncertainty. Decision Support Systems, 16:39–53, 1996.
- [2] Fahiem Bacchus. Representing and Reasoning with Probabilistic Knowledge: A Logical Approach. MIT Press, Cambridge, 1990.
- [3] Christel Baier, Clemens Dubslaff, Holger Hermanns, and Nikolai Kafer. On the foundations of cycles in Bayesian networks. In J. F. Raskin, K. Chatterjee, L. Doyen, and R. Majumdar, editors, Principles of Systems Design, volume 13660 of Lecture Notes in Computer Science. Springer, 2022.
- [4] Stephan Bongers, Patrick Forré, Jonas Peters, and Joris M. Mooij. Foundations of structural causal models with cycles and latent variables. Annals of Statistics, 49(5):2885–2915, 2021.
- [5] Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen, and David J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag, New York, 1999.
- [6] Fabio G. Cozman. Credal networks. Artificial Intelligence, 120:199–233, 2000.
- [7] Fabio Gagliardi Cozman and Rodrigo Bellizia Polastro. Complexity analysis and variational inference for interpretation-based probabilistic description logics. In Proceedings of the Twenty-Fifth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-09), pages 117–125, Corvallis, Oregon, 2009. AUAI Press.
- [8] José Carlos Ferreira da Rocha and Fabio Gagliardi Cozman. Inference in credal networks: branch-and-bound methods and the A/R+ algorithm. International Journal of Approximate Reasoning, 39(2-3):279–296, 2005.
