Algebraic Aspects of Conditional Independence and Graphical Models
Thomas Kahle, Johannes Rauh, Seth Sullivant

Thomas Kahle
Fakultät für Mathematik, Otto-von-Guericke University Magdeburg, Germany. http://www.thomas-kahle.de
Johannes Rauh
Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany. http://www.yorku.ca/jarauh/
Seth Sullivant
Department of Mathematics, North Carolina State University, Raleigh, USA. http://www4.ncsu.edu/~smsulli2/
(Date: May 2017)
Abstract.
This chapter of the forthcoming Handbook of Graphical Models contains an overview of basic theorems and techniques from algebraic geometry and how they can be applied to the study of conditional independence and graphical models. It also introduces binomial ideals and some ideas from real algebraic geometry. When random variables are discrete or Gaussian, tools from computational algebraic geometry can be used to understand implications between conditional independence statements. This is accomplished by computing primary decompositions of conditional independence ideals. As examples the chapter presents in detail the graphical model of a four cycle and the intersection axiom, a certain implication of conditional independence statements. Another important problem in the area is to determine all constraints on a graphical model, for example, equations determined by trek separation. The full set of equality constraints can be determined by computing the model’s vanishing ideal. The chapter illustrates these techniques and ideas with examples from the literature and provides references for further reading.
2010 Mathematics Subject Classification:
62-00, 13P25
1. Introduction
Consider a finite set of random variables $X_v$, $v \in V$. Section 1.6 of Part I describes how to use a simple undirected graph $G = (V, E)$ to encode conditional independence (CI) statements among the random variables. (All references to other sections refer to the forthcoming Handbook of Graphical Models, edited by Mathias Drton, Steffen Lauritzen, Marloes Maathuis, and Martin Wainwright. It will contain this document as Section 3 of Part I.) One can also naturally associate a parametrized family of joint probability distributions of the $X_v$ to a graph. For undirected graphs, the Hammersley–Clifford theorem (see Section 1.6.3 of Part I) shows that both the implicit method and the parametric method lead to the same families of probability distributions (called graphical models), as long as all distributions are assumed strictly positive.
When probabilities are allowed to go to zero, the models defined by the collections of CI statements contain probability distributions that do not lie in the parametric graphical model, which, by definition, consists of strictly positive probability distributions. In fact, these additional distributions do not even lie in the closure of the parametric graphical model, so they cannot be approximated by distributions from the parametric graphical model. Moreover, the models defined by the different collections of CI statements (pairwise Markov properties, local Markov properties, global Markov properties) differ from one another. As an example, consider the four-cycle $C_4$.
Proposition 1.1.
The binary random variables $X_1, X_2, X_3, X_4$ satisfy the global Markov statements of $C_4$, $1 \perp\!\!\!\perp 3 \mid \{2,4\}$ and $2 \perp\!\!\!\perp 4 \mid \{1,3\}$, if and only if one (or more) of the following statements is true:
- (1) The joint distribution lies in the closure of the graphical model.
- (2) There is a pair $(X_i, X_j)$ of neighboring nodes such that $X_i = X_j$ a.s.
- (3) There is a pair $(X_i, X_j)$ of neighboring nodes such that $X_i \neq X_j$ a.s.
This chapter shows how to prove results such as Proposition 1.1 using algebraic tools. The algebraic method can also be used to study implications between conditional independence statements. Here is an example:
Proposition 1.2.
Suppose that $X, Y, Z$ are binary random variables or jointly normal random variables. If $X \perp\!\!\!\perp Y$ and $X \perp\!\!\!\perp Y \mid Z$, then either $X \perp\!\!\!\perp (Y,Z)$ or $(X,Z) \perp\!\!\!\perp Y$.
The CI implication in Proposition 1.2 is a special case of the gaussoid axiom [26]. One may wonder what is special about jointly normal or binary random variables. For instance, is there a variant of this implication when $X, Y, Z$ are discrete but not binary? How can one systematically find and study implications like this?
CI implications can also be interpreted as intersections of graphical models. For example, the two CI statements $X \perp\!\!\!\perp Y$ and $X \perp\!\!\!\perp Y \mid Z$ in Proposition 1.2 correspond to the two graphical models of the collider $X \to Z \leftarrow Y$ and of the path $X - Z - Y$, respectively. Thus, Proposition 1.2 says that the intersection of these two graphical models equals the union of the two graphical models defined by $X \perp\!\!\!\perp (Y,Z)$ (the graph with the single edge $Y - Z$) and by $(X,Z) \perp\!\!\!\perp Y$ (the graph with the single edge $X - Z$), provided the random variables are either binary or jointly normal. As the example shows, the intersection of two graphical models need not be a graphical model. How can one compute this intersection?
The goal of this chapter is to explore these questions and introduce tools from computational algebra for studying them. Our perspective is that, for a fixed type of random variable, the set of distributions that satisfy a collection of independence constraints is the zero set of a collection of polynomial equations. Solutions of systems of polynomial equations are the objects of study of algebraic geometry, and so tools from algebra can be brought to bear on the problem. The next section contains an overview of basic ideas in algebraic geometry which are useful for the study of conditional independence structures and graphical models. In particular, it introduces algebraic varieties, polynomial ideals, and primary decomposition. Section 3 introduces the ideals associated to families of conditional independence statements, and explains how to apply the basic techniques to deduce conditional independence implications. Section 4 illustrates the main ideas with some deeper examples coming from the literature. Section 5 concerns the vanishing ideal of a graphical model, which is a complete set of implicit restrictions for that model. This set of restrictions is usually much larger than the set of conditional independence constraints that come from the graph, but it can illuminate the structure of the model especially with more complex families of models involving mixed graphs or hidden random variables. Section 6 highlights some key references in this area.
2. Notions of Algebraic Geometry and Commutative Algebra
Commutative algebra is the study of systems of polynomial equations, and algebraic geometry is the study of geometric properties of their solutions. Both are rich fields with many deep results. This section only gives a very coarse introduction to the basic facts that hopefully makes it possible for the reader to understand the phenomena and algorithms discussed in later parts of this chapter. For a more detailed introduction, the reader is referred to the standard textbook [7].
2.1. Polynomials, ideals and varieties
Let $\mathbb{K}$ be a field, for example the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, or the complex numbers $\mathbb{C}$. Let $\mathbb{N}$ denote the natural numbers. Let $p_1, \dots, p_r$ be a collection of indeterminates or variables. A monomial in the indeterminates is an expression of the form $p_1^{u_1} p_2^{u_2} \cdots p_r^{u_r}$ where $u_1, \dots, u_r$ are nonnegative integers. Writing $u = (u_1, \dots, u_r) \in \mathbb{N}^r$, let
$$p^u := p_1^{u_1} p_2^{u_2} \cdots p_r^{u_r}.$$
A polynomial is a finite linear combination of monomials, i.e.
$$f = \sum_{u \in B} c_u p^u,$$
where $B \subset \mathbb{N}^r$ is a finite set and $c_u \in \mathbb{K}$. Of course, one is used to thinking of a polynomial as a function, $f : \mathbb{K}^r \to \mathbb{K}$, which can be evaluated at a point $a \in \mathbb{K}^r$. In the following, this function will usually have the role of a constraint; i.e., the object of interest is the zero set $\{a \in \mathbb{K}^r : f(a) = 0\}$. In algebra, it is also useful to think of a polynomial as a formal object, i.e. the indeterminates are simply symbols that are used in manipulations, with no need for them to be evaluated.
The set of all polynomials in the indeterminates $p_1, \dots, p_r$ with coefficients in $\mathbb{K}$ is called the polynomial ring and denoted $\mathbb{K}[p] := \mathbb{K}[p_1, \dots, p_r]$. The word ring means that $\mathbb{K}[p]$ has two operations, namely addition of polynomials and multiplication of polynomials, and that these operations satisfy all the usual properties of addition and multiplication (associativity, commutativity, distributivity). However, multiplicative inverses need not exist: the result of dividing one polynomial by another non-constant polynomial is in general not a polynomial, but a rational function.
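Definition 2.1.
Let $\mathcal{F} \subseteq \mathbb{K}[p]$. The variety defined by $\mathcal{F}$ is the vanishing set of the polynomials in $\mathcal{F}$, that is,
$$V(\mathcal{F}) := \{a \in \mathbb{K}^r : f(a) = 0 \text{ for all } f \in \mathcal{F}\}.$$
Example 2.2.
Let $\mathbb{K}[p] = \mathbb{R}[p_1, p_2]$ and consider $\mathcal{F} = \{p_2 - p_1^2\}$. The variety $V(\mathcal{F})$ is the familiar parabola “$p_2 = p_1^2$” in the plane. For $\mathbb{K}[p] = \mathbb{K}[p_{11}, p_{12}, p_{21}, p_{22}]$ and $\mathcal{F} = \{p_{11} p_{22} - p_{12} p_{21}\}$, the variety $V(\mathcal{F})$ is the set of all singular $2 \times 2$ matrices with entries in $\mathbb{K}$.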
Example 2.3.
Let $\mathcal{F} = \{p_2 - p_1^2,\ p_1^2 + p_2^2 - 1\}$. The variety $V(\mathcal{F})$ is the set of points
$$V(\mathcal{F}) = \{(a_1, a_2) \in \mathbb{K}^2 : a_2 = a_1^2,\ a_1^2 + a_2^2 = 1\},$$
in other words, the intersection of a parabola and a circle. The number of points in this intersection varies depending on whether the underlying field is $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$ (or some other field). If $\mathbb{K} = \mathbb{Q}$, the variety is empty; if $\mathbb{K} = \mathbb{R}$, the variety has two points; and if $\mathbb{K} = \mathbb{C}$, the variety has four points. In statistical applications one is usually interested in solutions over $\mathbb{R}$. However, it is often easier to first perform computations in algebraically closed fields like $\mathbb{C}$ before restricting to the real numbers, at least in theory. On the other hand, when using a computer algebra system, it may be advantageous to work with $\mathbb{Q}$, if possible, because rational numbers can be represented exactly on a computer.
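The dependence of the solution count on the field can be checked numerically. The following is a minimal sketch, assuming the two curves are the parabola $p_2 = p_1^2$ and the unit circle $p_1^2 + p_2^2 = 1$ (an assumption chosen because it reproduces the counts stated in the example); the function name is illustrative only.

```python
import cmath

def parabola_circle_points():
    """Return all complex points (p1, p2) with p2 = p1^2 and p1^2 + p2^2 = 1."""
    # Substituting p1^2 = p2 into the circle equation gives p2^2 + p2 - 1 = 0.
    disc = cmath.sqrt(5)
    p2_roots = [(-1 + disc) / 2, (-1 - disc) / 2]
    points = []
    for p2 in p2_roots:
        p1 = cmath.sqrt(p2)  # one square root of p2; the other one is -p1
        points.extend([(p1, p2), (-p1, p2)])
    return points

points = parabola_circle_points()
# A point is real if both coordinates have (numerically) zero imaginary part.
real_points = [pt for pt in points if all(abs(z.imag) < 1e-12 for z in pt)]
print(len(points), len(real_points))
```

Under these assumptions there are four complex solutions, two of which are real. The real solutions have $p_2 = (\sqrt{5} - 1)/2$, which is irrational, so none of them is a rational point.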
The examples so far have always used finite sets $\mathcal{F}$. This is not necessary for the definition of a variety, and it is often worthwhile to consider the variety $V(\mathcal{F})$ where $\mathcal{F}$ is an infinite set of polynomials. In fact, it is often convenient to replace the original set of polynomials $\mathcal{F}$ by an infinite set, the ideal generated by $\mathcal{F}$, which is equivalent to $\mathcal{F}$ in some sense but has more structure.
One reason is that different families of polynomials may have the same varieties. For example, if $f \in \mathcal{F}$, then the variety of $\mathcal{F} \cup \{f^2\}$ equals $V(\mathcal{F})$. Similarly, for $f, g \in \mathcal{F}$ and $h \in \mathbb{K}[p]$, the variety of $\mathcal{F} \cup \{f + hg\}$ equals $V(\mathcal{F})$.
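Definition 2.4.
A set $I \subseteq \mathbb{K}[p]$ is an ideal if $f + g \in I$ for all $f, g \in I$, and $hf \in I$ for all $f \in I$ and $h \in \mathbb{K}[p]$.
Definition 2.5.
Let $\mathcal{F} \subseteq \mathbb{K}[p]$ be a set of polynomials. The ideal generated by $\mathcal{F}$ is the smallest ideal in $\mathbb{K}[p]$ that contains $\mathcal{F}$. Equivalently, the ideal generated by $\mathcal{F}$ consists of all polynomials $h_1 f_1 + \dots + h_k f_k$ for $f_1, \dots, f_k \in \mathcal{F}$, $h_1, \dots, h_k \in \mathbb{K}[p]$, and any $k$. The ideal generated by $\mathcal{F}$ is denoted $\langle \mathcal{F} \rangle$.
Proposition 2.6.
Let $\mathcal{F} \subseteq \mathbb{K}[p]$ be a set of polynomials. Then $V(\mathcal{F}) = V(\langle \mathcal{F} \rangle)$.
Example 2.7.
The ideal $\langle \mathcal{F} \rangle$ generated by the set $\mathcal{F}$ from Example 2.3 has many different possible generating sets. For example, an alternate generating set is $\{p_1^4 + p_1^2 - 1,\ p_2 - p_1^2\}$. This makes it easy to find all the solutions of the polynomial system, because all roots of the univariate polynomial $p_1^4 + p_1^2 - 1$ can be plugged into the second polynomial $p_2 - p_1^2$, which can then be solved for $p_2$.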
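That two generating sets define the same ideal can be verified by writing each generator of one set as a polynomial combination of the generators of the other. The following worked identity is a sketch under the assumption that the generators are $f_1 = p_2 - p_1^2$ and $f_2 = p_1^2 + p_2^2 - 1$, with the univariate polynomial $g = p_1^4 + p_1^2 - 1$ (the names $f_1$, $f_2$, $g$ are introduced here for illustration):

```latex
\begin{align*}
(p_2 + p_1^2)\, f_1 &= (p_2 + p_1^2)(p_2 - p_1^2) = p_2^2 - p_1^4, \\
g + (p_2 + p_1^2)\, f_1 &= p_1^4 + p_1^2 - 1 + p_2^2 - p_1^4 = p_1^2 + p_2^2 - 1 = f_2, \\
g &= f_2 - (p_2 + p_1^2)\, f_1 .
\end{align*}
```

The second line shows $f_2 \in \langle g, f_1 \rangle$ and the third shows $g \in \langle f_1, f_2 \rangle$, so $\langle f_1, f_2 \rangle = \langle g, f_1 \rangle$.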
Hilbert’s basis theorem implies that for any ideal $I \subseteq \mathbb{K}[p]$ there exists a finite set of polynomials $\mathcal{F}$ such that $I = \langle \mathcal{F} \rangle$.
Even though it is, for theoretical considerations, often easier to think about systems of polynomial equations in terms of ideals, in practice (i.e. when working with computer algebra systems), the ideal is almost always specified in terms of a finite set of generators (or such a finite set of generators has to be computed on the way). On the other hand, during a computation it is often necessary to replace this set of generators by a more convenient set of generators (e.g. a Gröbner basis), so the generators may change even though the ideal stays the same along a computation.
Definition 2.8.
Let $S \subseteq \mathbb{K}^r$. The vanishing ideal of $S$ is the set
$$I(S) := \{f \in \mathbb{K}[p] : f(a) = 0 \text{ for all } a \in S\}.$$
It is easy to check that a vanishing ideal is indeed an ideal. Clearly, any ideal $I$ satisfies $I \subseteq I(V(I))$. However, the converse inclusion does not hold in general. For instance, $p_1 \in I(V(\langle p_1^2 \rangle))$, but $p_1 \notin \langle p_1^2 \rangle$ (over any field $\mathbb{K}$).
Definition 2.9.
The ideal $I(V(I))$ is the $\mathbb{K}$-radical of $I$. An ideal $I$ such that $I = I(V(I))$ is a $\mathbb{K}$-radical ideal. If $\mathbb{K}$ is algebraically closed (e.g. if $\mathbb{K} = \mathbb{C}$), such an ideal is simply called a radical ideal.
Radical ideals can also be characterized algebraically, and there are algorithms to compute radicals. The radical is usually a simpler ideal, and if the radical of an ideal can be computed, it is advantageous to do this as a first step in each calculation (as long as one is only interested in properties of $V(I)$, and not in algebraic properties of $I$).
The following proposition illustrates the close relation between ideals and varieties.
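Proposition 2.10.
- Let $I, J \subseteq \mathbb{K}[p]$ be ideals. Then $V(I \cap J) = V(I) \cup V(J)$ and $V(I + J) = V(I) \cap V(J)$.
- Let $V, W \subseteq \mathbb{K}^r$ be varieties. Then $I(V \cup W) = I(V) \cap I(W)$ and $I(V \cap W) = I(V(I(V) + I(W)))$, the $\mathbb{K}$-radical of $I(V) + I(W)$.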
2.2. Irreducible and primary decomposition
Proposition 2.10 shows that the union of two varieties is again a variety. Interestingly, not every variety can be written as a non-trivial finite union.
Definition 2.11.
A variety $V$ is reducible if there are two varieties $V_1, V_2 \neq V$ such that $V = V_1 \cup V_2$. Otherwise $V$ is irreducible.
Theorem 2.12.
Any variety $V$ has a unique decomposition $V = V_1 \cup \dots \cup V_k$ into finitely many irreducible varieties (with $V_i \not\subseteq V_j$ for $i \neq j$).
The varieties $V_1, \dots, V_k$ are called the irreducible components of $V$.
Theorem 2.13.
- (1) Let $V \subseteq \mathbb{K}^r$ be an irreducible variety, and let $\phi : \mathbb{K}^r \to \mathbb{K}^s$ be a rational map. Then $V(I(\phi(V)))$ is irreducible.
- (2) Let $V$ be a variety that has a rational parametrization $\phi : \mathbb{K}^d \to V$ such that the image of $\phi$ is dense in $V$. Then $V$ is irreducible.
According to Proposition 2.10, the corresponding decomposition operation for ideals is to write ideals as the intersection of other ideals. However, for general ideals, the situation is much more complicated than for varieties. The situation simplifies for radical ideals (which are in a one-to-one correspondence with varieties). This case is discussed next. The general case is summarized afterwards.
Definition 2.14.
An ideal $I$ is prime if for all $f, g \in \mathbb{K}[p]$ with $fg \in I$, one of the factors belongs to $I$.
For example, $I = \langle p_1 p_2 \rangle$ is not prime, because $p_1 p_2 \in I$, but neither $p_1 \in I$ nor $p_2 \in I$.
Theorem 2.15.
A variety $V$ is irreducible if and only if $I(V)$ is prime.
Definition 2.16.
A prime ideal $P$ is a minimal prime of an ideal $I$ if and only if $V(P)$ is an irreducible component of $V(I)$.
There is also an algebraic definition of the minimal primes, and there are algorithms to compute the minimal primes. By definition, the minimal primes of an ideal encode the irreducible decomposition of the corresponding variety:
Theorem 2.17.
- (1) Any ideal $I$ has finitely many minimal primes $P_1, \dots, P_k$.
- (2) The ideal $P_1 \cap \dots \cap P_k$ equals the radical of $I$.
- (3) The irreducible components of $V(I)$ are $V(P_1), \dots, V(P_k)$.
If $I$ is not radical, then $I \neq P_1 \cap \dots \cap P_k$. In this case, it is still possible to write $I$ as an intersection of special ideals (called primary ideals) in a way that is algebraically and geometrically meaningful. This intersection is called a primary decomposition. The precise definitions are omitted, since a primary decomposition often adds little to the statistical understanding. However, some works in algebraic statistics written by algebraists who do care about the differences between ideals and their radicals use this notion. The following result explains how a primary decomposition is related to the minimal primes.
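Theorem 2.18.
Let $I = Q_1 \cap \dots \cap Q_l$ be a primary decomposition of $I$, and let $P_i$ be the radical of $Q_i$.
- (1) $V(I) = V(P_1) \cup \dots \cup V(P_l)$.
- (2) Each $P_i$ is prime.
- (3) Each minimal prime of $I$ is among the $P_i$.
- (4) If $P_i$ is not a minimal prime of $I$, then there is a minimal prime $P_j$ of $I$ with $P_j \subset P_i$ (and so $V(P_i) \subset V(P_j)$).
Example 2.19.
Let $I = \langle p_1 p_2,\ p_1 p_3 \rangle$. The variety $V(I)$ consists of the union of the plane where $p_1 = 0$, and the line where $p_2 = p_3 = 0$. Hence $V(I) = V(\langle p_1 \rangle) \cup V(\langle p_2, p_3 \rangle)$ is a decomposition into irreducibles. This corresponds to the ideal decomposition $I = \langle p_1 \rangle \cap \langle p_2, p_3 \rangle$.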
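A decomposition of a variety into a plane and a line can be sanity-checked by brute force on a finite grid of integer points. The sketch below assumes the ideal is generated by $p_1 p_2$ and $p_1 p_3$ (an assumption matching the plane $p_1 = 0$ and the line $p_2 = p_3 = 0$ described in the example); it tests only finitely many points and is not a proof.

```python
from itertools import product

def in_variety(point, polys):
    """True if every polynomial in polys vanishes at the given point."""
    return all(f(*point) == 0 for f in polys)

# Assumed generators of the ideal, and of its two components.
ideal_gens = [lambda p1, p2, p3: p1 * p2, lambda p1, p2, p3: p1 * p3]
plane = [lambda p1, p2, p3: p1]
line = [lambda p1, p2, p3: p2, lambda p1, p2, p3: p3]

grid = list(product(range(-2, 3), repeat=3))
on_variety = {pt for pt in grid if in_variety(pt, ideal_gens)}
on_union = {pt for pt in grid if in_variety(pt, plane) or in_variety(pt, line)}
print(on_variety == on_union, len(on_variety))
```

On this grid the two point sets agree: 25 points on the plane plus 5 on the line, with the origin counted once, gives 29 points.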
The primary decomposition need not be unique.
Example 2.20.
The ideal $I = \langle x^2, xy \rangle$ has several different primary decompositions, e.g.
$$I = \langle x \rangle \cap \langle x^2, y \rangle = \langle x \rangle \cap \langle x^2, xy, y^2 \rangle.$$
The variety $V(I)$ equals the line where $x = 0$, corresponding to the unique minimal prime $\langle x \rangle$. The non-uniqueness of the primary decomposition is related to the fact that the variety of the “extra” component is a subset of one of the other components. This variety (which is superfluous in the irreducible decomposition) is called an embedded component.
This example can be analyzed as follows using the computer algebra system Macaulay2 [19]. First set up a polynomial ring in the indeterminates $x, y$ with the rational numbers as the coefficient field. In Macaulay2 it is advisable to work with $\mathbb{Q}$ rather than $\mathbb{R}$ or $\mathbb{C}$ since the arithmetic in $\mathbb{Q}$ can be carried out exactly on a computer.

    i1 : R = QQ[x,y]

    o1 = R

    o1 : PolynomialRing

The system reports that it understands R as a polynomial ring. The following input makes Macaulay2 decompose the ideal. The decomposition is computed over $\mathbb{Q}$, but in this case it happens to be valid over any field $\mathbb{K}$.

    i2 : primaryDecomposition ideal (x^2, x*y)

                            2
    o2 = {ideal x, ideal (x , y)}

If one is only interested in the irreducible decomposition, the command decompose returns the minimal primes corresponding to the irreducible components, discarding all embedded components:

    i3 : decompose ideal (x^2, x*y)

    o3 = {ideal x}
2.3. Binomial ideals
This section ends with a short discussion of binomial ideals and toric ideals, which appear frequently in applications.
Definition 2.21.
A binomial is a polynomial $p^u - \lambda p^v$, $\lambda \in \mathbb{K}$, with at most two terms. An ideal is a binomial ideal if it has a generating set of binomials. A binomial ideal that is prime and does not contain any variable is a toric ideal.
The main reason why it is important whether an ideal is binomial is that there are dedicated algorithms for binomial ideals that are much faster than the generic algorithms that work for any ideal [13, 10, 22, 23]. Note that there are some instances of ideals that arise in algebraic statistics that are not binomial in their natural coordinate systems but become binomial ideals after a linear change of coordinates [31].
Let $A \in \mathbb{Z}^{d \times r}$ be an integer matrix, and consider the ideal
$$I_A := \langle p^{u^+} - p^{u^-} : u \in \mathbb{Z}^r,\ Au = 0 \rangle$$
in the polynomial ring $\mathbb{K}[p_1, \dots, p_r]$, where $u = u^+ - u^-$ is the decomposition of $u$ into its positive and negative part $u^+, u^- \in \mathbb{N}^r$. Clearly, $I_A$ is binomial and does not contain any of the $p_i$. One can also show that $I_A$ is prime, and thus it is an example of a toric ideal. In fact, any toric ideal is of this form up to a scaling of coordinates [13, Corollary 2.6]. The generating set above is infinite, but Theorem 3.1 in [9] shows that finite generating sets of toric ideals are related to Markov bases, which can be computed using the software 4ti2 [1].
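The passage from an integer kernel vector to a binomial can be sketched in a few lines. The example below is illustrative only: the matrix `A` (recording row and column sums of a $2 \times 2$ table), the variable names, and both helper functions are assumptions introduced here, not taken from the chapter or from 4ti2.

```python
def binomial_from_kernel_vector(A, u):
    """Return exponent vectors (u_plus, u_minus) of the binomial p^{u+} - p^{u-}."""
    # The vector u must lie in the integer kernel of A.
    if any(sum(a * x for a, x in zip(row, u)) != 0 for row in A):
        raise ValueError("u is not in the kernel of A")
    u_plus = [max(x, 0) for x in u]    # positive part of u
    u_minus = [max(-x, 0) for x in u]  # negative part of u
    return u_plus, u_minus

def monomial(exponents, names):
    """Render a monomial as text, skipping variables with exponent zero."""
    factors = [n if e == 1 else f"{n}^{e}" for n, e in zip(names, exponents) if e > 0]
    return "*".join(factors) if factors else "1"

# Rows of A compute the two row sums and two column sums of (p11, p12, p21, p22).
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
names = ["p11", "p12", "p21", "p22"]
u_plus, u_minus = binomial_from_kernel_vector(A, [1, -1, -1, 1])
binomial = f"{monomial(u_plus, names)} - {monomial(u_minus, names)}"
print(binomial)
```

For this choice of `A` and kernel vector, the resulting binomial is the $2 \times 2$ determinant `p11*p22 - p12*p21`.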
2.4. Real algebraic geometry
In addition to polynomial equations, in many situations in statistics it is useful to consider solutions to polynomial inequalities as well. This is the subject of the field of real algebraic geometry. Inequalities only make sense over an ordered field like $\mathbb{R}$ (but not over $\mathbb{C}$). For simplicity, the following definitions and results are formulated with $\mathbb{K} = \mathbb{R}$. Again, this text only contains the basic definitions. For more details the reader is referred to [4, 5].
Definition 2.22.
Let $\mathcal{F}, \mathcal{G} \subseteq \mathbb{R}[p]$ be sets of polynomials with $\mathcal{G}$ finite. The basic semialgebraic set defined by $\mathcal{F}$ and $\mathcal{G}$ is
$$\{a \in \mathbb{R}^r : f(a) = 0 \text{ for all } f \in \mathcal{F} \text{ and } g(a) > 0 \text{ for all } g \in \mathcal{G}\}.$$
A semialgebraic set is a finite union of basic semialgebraic sets.
Here are some common examples of semialgebraic sets arising in statistics.
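Example 2.23.
The open probability simplex
$$\operatorname{int}(\Delta_{r-1}) := \Big\{p \in \mathbb{R}^r : \sum_{i=1}^{r} p_i = 1,\ p_i > 0 \text{ for } i = 1, \dots, r\Big\}$$
consists of all strictly positive probability distributions for a categorical random variable with $r$ states. It is a basic semialgebraic set: in the above definition, one may take $\mathcal{F} = \big\{\sum_{i=1}^{r} p_i - 1\big\}$ and $\mathcal{G} = \{p_1, \ldots, p_r\}$. The probability simplex
$$\Delta_{r-1} := \Big\{p \in \mathbb{R}^r : \sum_{i=1}^{r} p_i = 1,\ p_i \geq 0 \text{ for } i = 1, \dots, r\Big\}$$
is a semialgebraic set. It can be written as the union of $2^r - 1$ basic semialgebraic sets.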
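One way to write the closed simplex as a finite union of basic semialgebraic sets is to use one piece per nonempty support: on the support the coordinates are strictly positive, off the support they vanish. The following sketch (with hypothetical function names) implements this description using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def in_open_simplex(p):
    # F = {sum(p) - 1} must vanish and every g in G = {p_1, ..., p_r} must be > 0.
    return sum(p) == 1 and all(x > 0 for x in p)

def in_face(p, support):
    # Basic semialgebraic piece: p_i > 0 on the support, p_i = 0 off the support.
    return (sum(p) == 1
            and all(p[i] > 0 for i in support)
            and all(p[i] == 0 for i in range(len(p)) if i not in support))

def nonempty_supports(r):
    return [s for k in range(1, r + 1) for s in combinations(range(r), k)]

def in_closed_simplex(p):
    # Membership in the union of the pieces, one for each nonempty support.
    return any(in_face(p, s) for s in nonempty_supports(len(p)))

q = [Fraction(1, 2), Fraction(0), Fraction(1, 2)]
print(len(nonempty_supports(3)), in_open_simplex(q), in_closed_simplex(q))
```

For $r = 3$ there are $2^3 - 1 = 7$ pieces; the point `q` lies on the boundary, so it belongs to the closed simplex but not to the open one.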
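Example 2.24.
The cone $PD_m$ of positive definite symmetric $m \times m$ matrices is an example of a basic semialgebraic set in $\mathbb{R}^{m(m+1)/2}$, where $\mathcal{F} = \emptyset$ and where $\mathcal{G}$ consists of the principal subdeterminants of an $m \times m$ symmetric matrix of indeterminates. For instance, if $m = 3$, consider the polynomial ring $\mathbb{R}[\sigma_{11}, \sigma_{12}, \sigma_{13}, \sigma_{22}, \sigma_{23}, \sigma_{33}]$ and the symmetric matrix of indeterminates
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{pmatrix}.$$
The symmetry has been enforced by making certain entries in the matrix equal. The set of polynomials $\mathcal{G}$ defining $PD_3$ can be chosen to be
$$\mathcal{G} = \{\sigma_{11},\ \sigma_{11}\sigma_{22} - \sigma_{12}^2,\ \det \Sigma\},$$
the set of leading principal minors of $\Sigma$. The cone $PSD_m$ of positive semidefinite symmetric matrices is a semialgebraic set, which can be realized by using non-strict inequalities with the much larger set of all principal minors of $\Sigma$.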
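The difference between the two descriptions (strict inequalities on the leading principal minors versus non-strict inequalities on all principal minors) can be illustrated with exact arithmetic. This is a minimal sketch with hypothetical helper names; the Laplace expansion is only suitable for small matrices.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 0:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def principal_minor(M, idx):
    """Determinant of the submatrix with rows and columns indexed by idx."""
    return det([[M[i][j] for j in idx] for i in idx])

def is_positive_definite(M):
    # Strict inequalities on the leading principal minors only.
    return all(principal_minor(M, range(k)) > 0 for k in range(1, len(M) + 1))

def is_positive_semidefinite(M):
    # Non-strict inequalities on all principal minors.
    n = len(M)
    return all(principal_minor(M, idx) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))

sigma = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
n_mat = [[0, 0], [0, -1]]
print(is_positive_definite(sigma), is_positive_semidefinite(n_mat))
```

The matrix `n_mat` shows why the semidefinite case needs all principal minors: both of its leading principal minors are zero, yet it is not positive semidefinite because the principal minor given by its lower-right entry is negative.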
3. Conditional Independence Ideals
This section shows how the algebraic tools introduced in Section 2 can be used to analyze conditional independence structures. The tools can be applied in the settings of discrete random variables and jointly normal variables, but in different ways.
3.1. Discrete random variables
Let $X_1, \dots, X_n$ be finite discrete random variables. Suppose that the state space of $X_i$ is $[r_i] := \{1, \dots, r_i\}$. There is an algebraic description of the set of all distributions that satisfy a given conditional independence statement. The first example comes from the simplest CI statement: $1 \perp\!\!\!\perp 2$.
Proposition 3.1.
Let $X_1, X_2$ be discrete random variables where the state space of $X_i$ is $[r_i]$. Let $p_{i_1 i_2} = P(X_1 = i_1, X_2 = i_2)$ and let $p = (p_{i_1 i_2})_{i_1 \in [r_1], i_2 \in [r_2]}$ be the joint probability mass function of $X_1$ and $X_2$. Then $1 \perp\!\!\!\perp 2$ if and only if $p$ is a rank one matrix.
Proof.
If $1 \perp\!\!\!\perp 2$ then $P(X_1 = i_1, X_2 = i_2) = P(X_1 = i_1)\, P(X_2 = i_2)$ for all $i_1, i_2$. This expresses the joint probability mass function as an outer product of two nonzero vectors, hence $p$ has rank one.
Conversely, if $p$ has rank one, it is expressed as the outer product $p = \alpha \beta^T$ of two vectors $\alpha, \beta$. Since $p$ is a matrix of nonnegative real numbers, one can assume that $\alpha$ and $\beta$ are also nonnegative. Let $\|\cdot\|_1$ denote the $\ell_1$-norm. Replacing $\alpha$ by $\alpha / \|\alpha\|_1$ and $\beta$ by $\beta / \|\beta\|_1$ yields a rank one factorization for $p$ where the two factors are necessarily the marginal distributions of $X_1$ and $X_2$ respectively. Hence $1 \perp\!\!\!\perp 2$. ∎
A nonzero matrix having rank one is characterized by the vanishing of all its $2 \times 2$ subdeterminants. Hence, one can associate an ideal to the independence statement $1 \perp\!\!\!\perp 2$.
Definition 3.2.
The conditional independence ideal for the statement $1 \perp\!\!\!\perp 2$ is
$$I_{1 \perp\!\!\!\perp 2} := \langle p_{i_1 i_2} p_{j_1 j_2} - p_{i_1 j_2} p_{j_1 i_2} : i_1, j_1 \in [r_1],\ i_2, j_2 \in [r_2] \rangle.$$
Example 3.3.
Let $r_1 = 2$ and $r_2 = 3$. Then
$$I_{1 \perp\!\!\!\perp 2} = \langle p_{11} p_{22} - p_{12} p_{21},\ p_{11} p_{23} - p_{13} p_{21},\ p_{12} p_{23} - p_{13} p_{22} \rangle.$$
The conditional independence ideal $I_{1 \perp\!\!\!\perp 2}$ captures the algebraic structure of the independence condition. Although all probability distributions would satisfy the additional constraint that $\sum_{i_1, i_2} p_{i_1 i_2} = 1$, this trivial constraint is not included in the conditional independence ideal because leaving it out tends to simplify certain algebraic calculations. For example, without this constraint $I_{1 \perp\!\!\!\perp 2}$ is a binomial ideal.
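The characterization of independence by vanishing $2 \times 2$ subdeterminants is easy to test on concrete tables. The sketch below uses exact rational arithmetic; the function names and the two example tables are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def two_by_two_minors(p):
    """All 2x2 subdeterminants of the matrix p (given as a list of rows)."""
    rows, cols = range(len(p)), range(len(p[0]))
    return [p[i][k] * p[j][l] - p[i][l] * p[j][k]
            for i, j in combinations(rows, 2)
            for k, l in combinations(cols, 2)]

def is_independent(p):
    # For a nonzero table, rank one means that all 2x2 minors vanish.
    return all(m == 0 for m in two_by_two_minors(p))

alpha = [Fraction(1, 4), Fraction(3, 4)]                  # marginal of X1
beta = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # marginal of X2
p_indep = [[a * b for b in beta] for a in alpha]          # outer product, rank one
p_dep = [[Fraction(1, 2), Fraction(0), Fraction(0)],
         [Fraction(0), Fraction(1, 4), Fraction(1, 4)]]   # not rank one
print(len(two_by_two_minors(p_indep)), is_independent(p_indep), is_independent(p_dep))
```

A $2 \times 3$ table has three such minors. They all vanish for the outer product `p_indep`, while `p_dep` has a nonzero minor and therefore does not satisfy the independence statement.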
More generally, any conditional independence condition for discrete random variables can be expressed by similar determinantal constraints. This requires a bit of notation. The determinantal constraints are written in terms of the entries of the joint distribution of $X_1, \dots, X_n$. This is a tensor $p = (p_{i_1 \cdots i_n}) \in \mathbb{R}^{r_1 \times \dots \times r_n}$.
Let $A, B, C \subseteq [n]$ be disjoint subsets of indices of the random variables $X_1, \dots, X_n$, and $D = [n] \setminus (A \cup B \cup C)$ the set of indices appearing in none of $A, B, C$. Any such assignment yields a grouping of indices and random variables. The random vector $X_A = (X_a)_{a \in A}$ takes values in $\mathcal{R}_A := \prod_{a \in A} [r_a]$. Let $\mathcal{R}_B$, $\mathcal{R}_C$ and $\mathcal{R}_D$ be defined analogously. The grouping allows one to write $p_{i_1 \cdots i_n} = p_{i_A i_B i_C i_D}$ where now $i_A \in \mathcal{R}_A$ and similarly for $B$, $C$, and $D$. The final notational gadget is the marginalization of $p$ over $D$. The entries of this marginal distribution are indexed by $\mathcal{R}_A \times \mathcal{R}_B \times \mathcal{R}_C$ and have entries
$$p_{i_A i_B i_C +} := \sum_{i_D \in \mathcal{R}_D} p_{i_A i_B i_C i_D}.$$
The $+$ indicates the summation.
Definition 3.4.
The conditional independence ideal for the conditional independence statement $A \perp\!\!\!\perp B \mid C$ is
$$I_{A \perp\!\!\!\perp B \mid C} := \langle p_{i_A i_B i_C +}\, p_{j_A j_B i_C +} - p_{i_A j_B i_C +}\, p_{j_A i_B i_C +} : i_A, j_A \in \mathcal{R}_A,\ i_B, j_B \in \mathcal{R}_B,\ i_C \in \mathcal{R}_C \rangle.$$
The notation simplifies for saturated conditional independence statements, for which $A \cup B \cup C = [n]$. With this condition there is no marginalization, and the defining polynomials of $I_{A \perp\!\!\!\perp B \mid C}$ are binomials.
Example 3.5.
Consider three binary random variables $X_1, X_2, X_3$. Let $p_{111}, \dots, p_{222}$ denote the indeterminates standing for the elementary probabilities in the joint distribution. The conditional independence ideal of the statement $1 \perp\!\!\!\perp 3 \mid 2$ is
$$I_{1 \perp\!\!\!\perp 3 \mid 2} = \langle p_{111} p_{212} - p_{112} p_{211},\ p_{121} p_{222} - p_{122} p_{221} \rangle.$$
The conditional independence ideal of the statement $1 \perp\!\!\!\perp 3$ is
$$I_{1 \perp\!\!\!\perp 3} = \langle (p_{111} + p_{121})(p_{212} + p_{222}) - (p_{112} + p_{122})(p_{211} + p_{221}) \rangle.$$
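The generators of a conditional independence ideal can be evaluated at a concrete joint distribution to test whether the CI statement holds. The sketch below follows the determinantal description with marginalization over the remaining variables; the function name `ci_generators`, the 0-based state and variable labels, and the example factors `alpha` and `beta` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def ci_generators(p, sizes, A, B, C):
    """Evaluate the generators of the CI ideal for A _||_ B | C at the tensor p.

    p maps index tuples (0-based states) to numbers, sizes[v] is the number of
    states of variable v, and A, B, C are disjoint lists of variable positions.
    """
    def states(S):
        return list(product(*(range(sizes[v]) for v in S)))

    def marginal(iA, iB, iC):
        # Sum over all variables outside A, B and C (the "+" subscript).
        fixed = dict(zip(A + B + C, iA + iB + iC))
        return sum(val for idx, val in p.items()
                   if all(idx[v] == s for v, s in fixed.items()))

    values = []
    for iC in states(C):
        for iA, jA in product(states(A), repeat=2):
            for iB, jB in product(states(B), repeat=2):
                values.append(marginal(iA, iB, iC) * marginal(jA, jB, iC)
                              - marginal(iA, jB, iC) * marginal(jA, iB, iC))
    return values

# Three binary variables; positions 0, 1, 2 stand for variables 1, 2, 3.
alpha = [[1, 2], [3, 1]]
beta = [[1, 1], [2, 5]]
total = sum(alpha[i][j] * beta[j][k] for i, j, k in product(range(2), repeat=3))
p = {(i, j, k): Fraction(alpha[i][j] * beta[j][k], total)
     for i, j, k in product(range(2), repeat=3)}
sizes = [2, 2, 2]
holds_given_2 = all(v == 0 for v in ci_generators(p, sizes, [0], [2], [1]))
holds_marginally = all(v == 0 for v in ci_generators(p, sizes, [0], [2], []))
print(holds_given_2, holds_marginally)
```

For this particular distribution the generators for variable 1 independent of variable 3 given variable 2 all vanish, while the marginal statement without conditioning fails.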
Proposition 3.6.
For any conditional independence statement $A \perp\!\!\!\perp B \mid C$, the conditional independence ideal $I_{A \perp\!\!\!\perp B \mid C}$ is a prime ideal and hence $V(I_{A \perp\!\!\!\perp B \mid C})$ is an irreducible variety.
Proposition 3.6 is a consequence of the fact that general determinantal ideals are prime (see [6]). Irreducibility of the variety $V(I_{A \perp\!\!\!\perp B \mid C})$ can also be deduced from the fact that this variety can be parametrized: for instance, the set of all probability distributions in $V(I_{A \perp\!\!\!\perp B \mid C})$ can be realized as the set of probability distributions in a graphical model.
Example 3.7.
A strictly positive joint distribution of binary random variables satisfies $1 \perp\!\!\!\perp 3 \mid 2$ if and only if
[TABLE]
for some vectors of parameters (see Section 1.3 of Part I). That is, it lies in the undirected graphical model
[TABLE]
Since $V(I_{A \perp\!\!\!\perp B \mid C})$ is irreducible, any joint distribution (possibly with zeros) that satisfies $1 \perp\!\!\!\perp 3 \mid 2$ lies in the closure of the undirected graphical model. In fact, any such joint distribution has a parametrization of the form (1), where the parameter vectors may also have zeros.
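The claim of Example 3.7 can be checked numerically. The following is a minimal sketch, assuming that the parametrization (1) has the product form $p_{ijk} = a_{ij} b_{jk}$; the names `a`, `b`, and the chosen parameter values are hypothetical and serve only as an illustration. It evaluates the two generators of $I_{1 \perp\!\!\!\perp 3 \mid 2}$ at a distribution whose parameters contain a zero.

```python
from fractions import Fraction
from itertools import product


def chain_distribution(a, b):
    """Return p[i, j, k] = a[i, j] * b[j, k], normalised to sum to one."""
    raw = {(i, j, k): a[i, j] * b[j, k] for i, j, k in product((1, 2), repeat=3)}
    total = sum(raw.values())
    return {idx: value / total for idx, value in raw.items()}


def ci_binomials_1_3_given_2(p):
    """Evaluate the two generators of the ideal of 1 _||_ 3 | 2 at p."""
    return [p[1, j, 1] * p[2, j, 2] - p[1, j, 2] * p[2, j, 1] for j in (1, 2)]


# Hypothetical parameter values; the zero entry puts p on the boundary of the simplex.
a = {(1, 1): Fraction(1), (1, 2): Fraction(2), (2, 1): Fraction(3), (2, 2): Fraction(0)}
b = {(1, 1): Fraction(1), (1, 2): Fraction(4), (2, 1): Fraction(2), (2, 2): Fraction(5)}

p = chain_distribution(a, b)
print(ci_binomials_1_3_given_2(p))  # both generators vanish at p
```

Exact rational arithmetic is used so that the vanishing of the binomials is not obscured by rounding.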
More interesting than single statements are combinations of two or more conditional independence statements. Determining the class of distributions that satisfy a collection of independence statements leads to interesting problems in computational algebra. Such sets are typically not irreducible varieties and cannot be described by a single parametrization. The first task is to break such a set into components, and to see whether those components have natural interpretations in terms of conditional independence and can be parametrized.
Definition 3.8.
Let $\mathcal{C} = \{A_1 \perp\!\!\!\perp B_1 \mid C_1,\ A_2 \perp\!\!\!\perp B_2 \mid C_2,\ \ldots\}$ be a set of conditional independence statements for a given collection of random variables. The conditional independence ideal of $\mathcal{C}$ is the sum of the conditional independence ideals of the elements of $\mathcal{C}$:
$$I_{\mathcal{C}} = I_{A_1 \perp\!\!\!\perp B_1 \mid C_1} + I_{A_2 \perp\!\!\!\perp B_2 \mid C_2} + \cdots.$$
Understanding the probability distributions that satisfy $\mathcal{C}$ can be accomplished by analyzing an irreducible decomposition of $V(I_{\mathcal{C}})$, which can be obtained from a primary decomposition of $I_{\mathcal{C}}$.
Example 3.9.
Let $X_1$, $X_2$, $X_3$ be binary random variables, and consider $\mathcal{C} = \{1 \perp\!\!\!\perp 3 \mid 2,\ 1 \perp\!\!\!\perp 3\}$. The conditional independence ideal is generated by three polynomials of degree $2$:
$$I_{\mathcal{C}} = \langle p_{111}p_{212} - p_{112}p_{211},\; p_{121}p_{222} - p_{122}p_{221},\; (p_{111}+p_{121})(p_{212}+p_{222}) - (p_{112}+p_{122})(p_{211}+p_{221}) \rangle.$$
The following Macaulay2 code asks for the primary decomposition of this ideal over $\mathbb{Q}$. It can be shown that the decomposition is the same over $\mathbb{R}$ and $\mathbb{C}$.
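    loadPackage "GraphicalModels"
    -- polynomial ring in the eight indeterminates p_(i,j,k) of three binary variables
    S = markovRing (2,2,2)
    -- each statement {A,B,C} encodes "A independent of B given C"
    L = {{{1},{3},{2}}, {{1},{3},{}}}
    I = conditionalIndependenceIdeal(S,L)
    primaryDecomposition I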
This code uses the GraphicalModels package of Macaulay2, which implements many convenient functions for working with graphical and other conditional independence models. In particular, it makes it easy to set up the polynomial ring with eight variables with markovRing and to write out the equations for $I_{\mathcal{C}}$ with conditionalIndependenceIdeal. The command primaryDecomposition is a generic Macaulay2 command. The output of this code consists of two ideals, which upon inspection can be recognized as binomial conditional independence ideals themselves. The result is
$$I_{\mathcal{C}} = I_{1 \perp\!\!\!\perp \{2,3\}} \cap I_{\{1,2\} \perp\!\!\!\perp 3}.$$
According to Section 2 this implies a decomposition of varieties
$$V(I_{\mathcal{C}}) = V(I_{1 \perp\!\!\!\perp \{2,3\}}) \cup V(I_{\{1,2\} \perp\!\!\!\perp 3}).$$
On the level of probability distributions, this proves the binary case of Proposition 1.2.
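One direction of the decomposition in Example 3.9 is easy to confirm by direct evaluation. The sketch below assumes that the two components are the ideals of $1 \perp\!\!\!\perp \{2,3\}$ and $\{1,2\} \perp\!\!\!\perp 3$; it builds one distribution of each kind from hypothetical parameter values and checks that the three generators of $I_{\mathcal{C}}$ vanish on both. It does not prove the reverse inclusion, which is what the primary decomposition provides.

```python
from fractions import Fraction
from itertools import product

F = Fraction
LEVELS = (1, 2)


def generators(p):
    """Evaluate the three quadrics generating the ideal of {1 _||_ 3 | 2, 1 _||_ 3}."""
    conditional = [p[1, j, 1] * p[2, j, 2] - p[1, j, 2] * p[2, j, 1] for j in LEVELS]
    # Marginal table of (X1, X3): m[i, k] = sum over j of p[i, j, k].
    m = {(i, k): sum(p[i, j, k] for j in LEVELS) for i in LEVELS for k in LEVELS}
    marginal = m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]
    return conditional + [marginal]


def independent_of_pair(u, v):
    """Distribution with X1 independent of (X2, X3): p[i, j, k] = u[i] * v[j, k]."""
    return {(i, j, k): u[i] * v[j, k] for i, j, k in product(LEVELS, repeat=3)}


def pair_independent_of(w, c):
    """Distribution with (X1, X2) independent of X3: p[i, j, k] = w[i, j] * c[k]."""
    return {(i, j, k): w[i, j] * c[k] for i, j, k in product(LEVELS, repeat=3)}


# Hypothetical marginal tables, chosen only for illustration.
u = {1: F(1, 3), 2: F(2, 3)}
v = {(1, 1): F(1, 10), (1, 2): F(2, 10), (2, 1): F(3, 10), (2, 2): F(4, 10)}
w = {(1, 1): F(1, 10), (1, 2): F(4, 10), (2, 1): F(2, 10), (2, 2): F(3, 10)}
c = {1: F(1, 4), 2: F(3, 4)}

first = independent_of_pair(u, v)
second = pair_independent_of(w, c)
print(generators(first), generators(second))
```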
The general situation may be less favorable than that in Example 3.9. In particular, the components that appear need not have interpretations in terms of conditional independence. The ideals that appear also need not be prime (in general they are only primary), and it is unclear what this extra algebraic information may reveal about conditional independence. For examples of how to extract information from primary decompositions see [20, 24].
3.2. Gaussian random variables
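Algebraic approaches to conditional independence can also be applied to Gaussian random variables. Let $X = (X_1, \ldots, X_m)$ be a nonsingular multivariate Gaussian random vector with mean $\mu \in \mathbb{R}^m$ and covariance matrix $\Sigma \in PD_m$, the cone of symmetric positive definite $m \times m$ matrices. One writes $X \sim \mathcal{N}(\mu, \Sigma)$. For subsets $A, B \subseteq [m]$ let $\Sigma_{A,B}$ be the submatrix of $\Sigma$ obtained by extracting rows indexed by $A$ and columns indexed by $B$, that is, $\Sigma_{A,B} = (\sigma_{ab})_{a \in A,\, b \in B}$.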
Proposition 3.10.
Let $X \sim \mathcal{N}(\mu, \Sigma)$ with $\Sigma \in PD_m$. Let $A, B, C \subseteq [m]$ be disjoint subsets. Then the conditional independence statement $A \perp\!\!\!\perp B \mid C$ holds if and only if the matrix $\Sigma_{A \cup C,\, B \cup C}$ has rank $\#C$.
A proof of this proposition recognizes the conditional covariance $\Sigma_{A,B} - \Sigma_{A,C}\Sigma_{C,C}^{-1}\Sigma_{C,B}$ as a Schur complement of a submatrix of $\Sigma$. The details can be found in [11, Proposition 3.1.13].
Just as the rank one condition on a matrix was characterized by the vanishing of subdeterminants, higher rank conditions on matrices can also be characterized by the vanishing of subdeterminants. Indeed, a basic fact of linear algebra is that a matrix has rank $\le r$ if and only if the determinant of every $(r+1) \times (r+1)$ submatrix is zero. This leads to the conditional independence ideals for multivariate Gaussian random variables.
Let $\mathbb{R}[\Sigma]$ be the polynomial ring with real coefficients in the entries of the symmetric matrix $\Sigma$.
Definition 3.11.
The Gaussian conditional independence ideal for the conditional independence statement $A \perp\!\!\!\perp B \mid C$ is the ideal
$$J_{A \perp\!\!\!\perp B \mid C} = \langle (\#C + 1)\text{-minors of } \Sigma_{A \cup C,\, B \cup C} \rangle.$$
If $\mathcal{C} = \{A_1 \perp\!\!\!\perp B_1 \mid C_1,\ A_2 \perp\!\!\!\perp B_2 \mid C_2,\ \ldots\}$ is a collection of conditional independence statements, the Gaussian conditional independence ideal is
$$J_{\mathcal{C}} = J_{A_1 \perp\!\!\!\perp B_1 \mid C_1} + J_{A_2 \perp\!\!\!\perp B_2 \mid C_2} + \cdots.$$
Remark 3.12.
A common criterion in statistics says that, in fact, $A \perp\!\!\!\perp B \mid C$ holds if and only if $\det \Sigma_{\{a\} \cup C,\, \{b\} \cup C}$ vanishes for all $a \in A$, $b \in B$. Since, by assumption, $\Sigma_{C,C}$ is non-singular, it is easy to see that this condition is, in fact, equivalent to the vanishing of all $(\#C+1)$-minors of $\Sigma_{A \cup C,\, B \cup C}$.
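The minor criterion can be tried out on a small covariance matrix. The following sketch assumes the reading of Proposition 3.10 in which $A \perp\!\!\!\perp B \mid C$ is tested through the $(\#C+1)$-minors of $\Sigma_{A \cup C, B \cup C}$; the helper names and the example matrix are hypothetical. The matrix is chosen so that $\sigma_{13}\sigma_{22} = \sigma_{12}\sigma_{23}$ while $\sigma_{13} \neq 0$.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if not matrix:
        return Fraction(1)
    total = Fraction(0)
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def submatrix(sigma, rows, cols):
    """Extract the given rows and columns (1-based labels) from sigma."""
    return [[sigma[r - 1][c - 1] for c in cols] for r in rows]


def gaussian_ci_holds(sigma, A, B, C):
    """Check A _||_ B | C by testing all (#C + 1)-minors of Sigma_{A u C, B u C}."""
    rows, cols = list(A) + list(C), list(B) + list(C)
    size = len(C) + 1
    return all(
        det(submatrix(sigma, r, c)) == 0
        for r in combinations(rows, size)
        for c in combinations(cols, size)
    )


# Hypothetical positive definite covariance matrix with s13 * s22 == s12 * s23.
sigma = [
    [Fraction(2), Fraction(1), Fraction(1, 2)],
    [Fraction(1), Fraction(2), Fraction(1)],
    [Fraction(1, 2), Fraction(1), Fraction(2)],
]

print(gaussian_ci_holds(sigma, [1], [3], [2]))  # uses the minor s13*s22 - s12*s23
print(gaussian_ci_holds(sigma, [1], [3], []))   # uses the single entry s13

# Schur complement s13 - s12 * s22^{-1} * s23, the conditional covariance of X1, X3 given X2.
schur = sigma[0][2] - sigma[0][1] * sigma[2][1] / sigma[1][1]
print(schur)
```

For this matrix the conditional statement holds and the marginal one fails, and the Schur complement vanishes exactly when the $2 \times 2$ minor does, since $\sigma_{22} \neq 0$.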
Example 3.13.
Consider the conditional independence statement $2 \perp\!\!\!\perp \{1,3\} \mid 4$. The ideal $J_{2 \perp\!\!\!\perp \{1,3\} \mid 4}$ is generated by the $2 \times 2$ minors of the matrix
$$\Sigma_{\{2,4\},\{1,3,4\}} = \begin{pmatrix} \sigma_{21} & \sigma_{23} & \sigma_{24} \\ \sigma_{41} & \sigma_{43} & \sigma_{44} \end{pmatrix}.$$
Since $\Sigma$ is a symmetric matrix, $\sigma_{ij} = \sigma_{ji}$ and one always writes $\sigma_{ij}$ with $i \le j$. Then
$$J_{2 \perp\!\!\!\perp \{1,3\} \mid 4} = \langle \sigma_{12}\sigma_{34} - \sigma_{14}\sigma_{23},\; \sigma_{12}\sigma_{44} - \sigma_{14}\sigma_{24},\; \sigma_{23}\sigma_{44} - \sigma_{24}\sigma_{34} \rangle.$$
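The generators in Example 3.13 can be listed mechanically. The sketch below is limited to the case $\#C = 1$, where the generators are $2 \times 2$ minors; the function names and the textual encoding `sij` for $\sigma_{ij}$ are hypothetical conventions chosen for this illustration.

```python
from itertools import combinations


def sigma(i, j):
    """Name of the covariance entry, written with the smaller index first."""
    lo, hi = sorted((i, j))
    return f"s{lo}{hi}"


def two_by_two_minors(rows, cols):
    """List the 2x2 minors of the submatrix of Sigma with the given rows and columns."""
    minors = []
    for r1, r2 in combinations(rows, 2):
        for c1, c2 in combinations(cols, 2):
            minors.append(
                f"{sigma(r1, c1)}*{sigma(r2, c2)} - {sigma(r1, c2)}*{sigma(r2, c1)}"
            )
    return minors


# Statement 2 _||_ {1,3} | 4: rows A u C = {2, 4}, columns B u C = {1, 3, 4}.
for minor in two_by_two_minors([2, 4], [1, 3, 4]):
    print(minor)
```

The symmetry $\sigma_{ij} = \sigma_{ji}$ is handled by sorting the index pair, which matches the convention of writing $\sigma_{ij}$ with $i \le j$.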
Example 3.14.
Let $X_1$, $X_2$, $X_3$ be jointly Gaussian random variables. The conditional independence ideal of $\mathcal{C} = \{1 \perp\!\!\!\perp 3 \mid 2,\ 1 \perp\!\!\!\perp 3\}$ is
$$J_{\mathcal{C}} = \langle \sigma_{13},\; \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23} \rangle.$$
Straightforward manipulations of these ideals show
$$J_{\mathcal{C}} = \langle \sigma_{13},\; \sigma_{12}\sigma_{23} \rangle = \langle \sigma_{12}, \sigma_{13} \rangle \cap \langle \sigma_{13}, \sigma_{23} \rangle$$
$$\phantom{J_{\mathcal{C}}} = J_{1 \perp\!\!\!\perp \{2,3\}} \cap J_{\{1,2\} \perp\!\!\!\perp 3}.$$
This last primary decomposition proves the Gaussian case of Proposition 1.2.
3.3. The contraction axiom
When computing the decompositions of conditional independence ideals, there might be components that are "uninteresting" from the statistical standpoint. These components might not intersect the region of interest in probabilistic applications (e.g. they might miss the probability simplex or the cone of positive definite matrices), or they might intersect it non-trivially, but in a set that is contained in some other component.
Example 3.15.
Let $X = (X_1, X_2, X_3)$ be a multivariate Gaussian random vector. The conditional independence ideal of $\mathcal{C} = \{1 \perp\!\!\!\perp 2 \mid 3,\ 2 \perp\!\!\!\perp 3\}$ is
$$J_{\mathcal{C}} = \langle \sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23},\; \sigma_{23} \rangle,$$
which has primary decomposition
$$J_{\mathcal{C}} = \langle \sigma_{12}, \sigma_{23} \rangle \cap \langle \sigma_{23}, \sigma_{33} \rangle.$$
This decomposition shows that
$$V(J_{\mathcal{C}}) = V(\langle \sigma_{12}, \sigma_{23} \rangle) \cup V(\langle \sigma_{23}, \sigma_{33} \rangle).$$
However, the second component does not intersect the positive definite cone, because $\sigma_{33} > 0$ for all $\Sigma \in PD_3$. The first component is the conditional independence ideal $J_{\{1,3\} \perp\!\!\!\perp 2}$. From this decomposition it is visible that $1 \perp\!\!\!\perp 2 \mid 3$ and $2 \perp\!\!\!\perp 3$ imply that $\{1,3\} \perp\!\!\!\perp 2$. This implication is called the contraction axiom. See Section 1.5 of Part I for other CI axioms.
The contraction axiom also holds for non-Gaussian random variables. For discrete random variables, it can again be checked algebraically. The primary decomposition associated to the discrete contraction axiom is worked out in detail in [17]. The next example discusses the binary case as an illustration:
Example 3.16.
Let $X_1, X_2, X_3$ be binary random variables. The conditional independence ideal of $\mathcal{C} = \{1 \perp\!\!\!\perp 2 \mid 3,\ 2 \perp\!\!\!\perp 3\}$ is
$$I_{\mathcal{C}} = \langle p_{111}p_{221} - p_{121}p_{211},\; p_{112}p_{222} - p_{122}p_{212},\; (p_{111}+p_{211})(p_{122}+p_{222}) - (p_{112}+p_{212})(p_{121}+p_{221}) \rangle,$$
which has primary decomposition
[TABLE]
The intersection of the second component with the probability simplex forces that . This in turn implies that
[TABLE]
A similar argument holds for the third component. So although the variety has three components, only one of them is statistically meaningful:
[TABLE]
4. Examples of Decompositions of Conditional Independence Ideals
This section studies some examples of families of conditional independence statements and how algebraic tools can be used to understand them. The first example is a detailed study of the intersection axiom, and the second example concerns the conditional independence statements associated to the $4$-cycle graph.
4.1. The intersection axiom
The intersection axiom (see Section 1.5) is the following implication of conditional independence statements:
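$$A \perp\!\!\!\perp B \mid C \cup D \quad\text{and}\quad A \perp\!\!\!\perp C \mid B \cup D \quad\Longrightarrow\quad A \perp\!\!\!\perp B \cup C \mid D.$$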
This implication is valid for strictly positive probability distributions. Algebraic techniques can be used to study how its validity extends beyond this.
The question about the primary decomposition of the ideal(s) $I_{\{A \perp\!\!\!\perp B \mid C \cup D,\ A \perp\!\!\!\perp C \mid B \cup D\}}$ was first asked in [11, Chapter 6.6], and the answer is due to [15]. Grouping variables if necessary, one can assume $A = \{1\}$, $B = \{2\}$, $C = \{3\}$. Moreover, let $D = \emptyset$. From this one can always recover the general case by adding conditioning constraints.
Proposition 4.1 (Proposition 1 in [15]).
The ideal $I_{\{X_1 \perp\!\!\!\perp X_2 \mid X_3,\ X_1 \perp\!\!\!\perp X_3 \mid X_2\}}$ is radical, that is, its irredundant primary decomposition consists only of prime ideals. These minimal primes correspond to pairs of partitions of the same size. The minimal prime corresponding to two partitions is
[TABLE]
The paper [15] uses a different formulation in terms of complete bipartite graphs. It can be seen that our formulation is equivalent.
To give a statistical interpretation to Proposition 4.1, whenever the joint distribution of $X_1, X_2, X_3$ lies in the prime corresponding to the two partitions , , construct a random variable $B$ as follows: put whenever and . Thus, $B$ is uniquely defined except on a set of measure zero, since for , which follows from the containment of monomials in . The variable $B$ specifies in which blocks of the two partitions the random variables $X_2$ and $X_3$ lie. Now the binomials in imply that $X_1 \perp\!\!\!\perp \{X_2, X_3\} \mid B$.
Corollary 4.2.
Suppose that $X_1, X_2, X_3$ satisfy $X_1 \perp\!\!\!\perp X_2 \mid X_3$ and $X_1 \perp\!\!\!\perp X_3 \mid X_2$. Then there is a random variable $B$ that satisfies:
(1) $B$ is a (deterministic) function of $X_2$;
(2) $B$ is a (deterministic) function of $X_3$;
(3) $X_1 \perp\!\!\!\perp \{X_2, X_3\} \mid B$.
Conversely, whenever there exists a random variable $B$ with properties (1) to (3), the random variables $X_1, X_2, X_3$ satisfy $X_1 \perp\!\!\!\perp X_2 \mid X_3$ and $X_1 \perp\!\!\!\perp X_3 \mid X_2$.
The case where $B$ is a constant corresponds to the CI statement $X_1 \perp\!\!\!\perp \{X_2, X_3\}$. The intersection axiom can be recovered by noting that a function that is a function of $X_2$ only as well as a function of $X_3$ only is necessarily constant, if the joint distribution of $X_2$ and $X_3$ is strictly positive.
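A small example shows how the intersection axiom can fail without strict positivity and how Corollary 4.2 still applies. The sketch below uses a hypothetical binary distribution in which $X_2 = X_3$ with probability one, so that $B = X_2 = X_3$ is a non-constant function of $X_2$ alone and of $X_3$ alone. Both premises of the intersection axiom hold, but $X_1 \perp\!\!\!\perp \{X_2, X_3\}$ does not.

```python
from fractions import Fraction
from itertools import product

F = Fraction
LEVELS = (1, 2)

# Hypothetical joint distribution: X2 = X3 with probability one, and X1 depends on X2.
p = {idx: F(0) for idx in product(LEVELS, repeat=3)}
p[1, 1, 1] = F(2, 5)
p[2, 1, 1] = F(1, 10)
p[1, 2, 2] = F(1, 10)
p[2, 2, 2] = F(2, 5)


def cond_indep(p, a, b, c):
    """Check X_a _||_ X_b | X_c for three binary variables via cross-product binomials."""
    def entry(xa, xb, xc):
        idx = [None, None, None]
        idx[a - 1], idx[b - 1], idx[c - 1] = xa, xb, xc
        return p[tuple(idx)]

    return all(
        entry(1, 1, xc) * entry(2, 2, xc) == entry(1, 2, xc) * entry(2, 1, xc)
        for xc in LEVELS
    )


def indep_of_pair(p):
    """Check X1 _||_ {X2, X3}: the 2x4 table with rows i and columns (j, k) has rank one."""
    cols = list(product(LEVELS, repeat=2))
    return all(
        p[(1,) + u] * p[(2,) + v] == p[(1,) + v] * p[(2,) + u]
        for u, v in product(cols, repeat=2)
    )


print(cond_indep(p, 1, 2, 3), cond_indep(p, 1, 3, 2), indep_of_pair(p))
```

Conditioning on $B$ fixes both $X_2$ and $X_3$ in this example, so property (3) of Corollary 4.2 holds trivially.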
Similar results hold for all families of CI statements of the form $A \perp\!\!\!\perp B_i \mid C_i$, where and where is fixed for all statements, see [28, 29] (the case where is binary was already described in [20]). The corresponding CI ideal is still radical, and the minimal primes have a similar interpretation. However, finding the minimal primes is more difficult and involves solving a combinatorial problem. Finally, Corollary 4.2 can be generalized to continuous random variables [27].
4.2. The four-cycle
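Consider four discrete random variables $X_1, X_2, X_3, X_4$ and the (undirected) graphical model of the four cycle with edge set $\{\{1,2\}, \{2,3\}, \{3,4\}, \{1,4\}\}$. The global Markov CI statements of this graph are
$$\mathcal{C} = \{\, 1 \perp\!\!\!\perp 3 \mid \{2,4\},\ \ 2 \perp\!\!\!\perp 4 \mid \{1,3\} \,\}.$$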
The primary decomposition of the corresponding CI ideal was studied in [24] in the case where and are binary. The case that all variables are binary is as follows:
Proposition 4.3 (Theorem 5.6 in [24]).
The minimal primes of the CI ideal of the binary four cycle are the toric ideal and the monomial ideals
[TABLE]
The ideal equals the vanishing ideal of the graphical model, to be discussed in Section 5. Interestingly, in this primary decomposition, all ideals except are monomial ideals; that is, they only give support restrictions on the probability distribution.
The primary decomposition of the CI ideal gives an irreducible decomposition of the corresponding set of probability distributions. This leads to the statement of Proposition 1.1 in the introduction.
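That the graphical model of the four cycle lies in the variety of its global Markov statements can be checked on a sample point. The following sketch assumes the usual clique factorization $p_{ijkl} = a_{ij} b_{jk} c_{kl} d_{il}$ along the four edges; the table names and values are hypothetical. Since the generators are homogeneous, the table is left unnormalised.

```python
from itertools import product

LEVELS = (1, 2)


def four_cycle_table(a, b, c, d):
    """Unnormalised table p[i, j, k, l] = a[i, j] * b[j, k] * c[k, l] * d[i, l]."""
    return {
        (i, j, k, l): a[i, j] * b[j, k] * c[k, l] * d[i, l]
        for i, j, k, l in product(LEVELS, repeat=4)
    }


def markov_binomials(p):
    """Evaluate the eight quadrics of 1 _||_ 3 | {2,4} and 2 _||_ 4 | {1,3} at p."""
    values = []
    for j, l in product(LEVELS, repeat=2):  # 1 _||_ 3 given (X2, X4) = (j, l)
        values.append(p[1, j, 1, l] * p[2, j, 2, l] - p[1, j, 2, l] * p[2, j, 1, l])
    for i, k in product(LEVELS, repeat=2):  # 2 _||_ 4 given (X1, X3) = (i, k)
        values.append(p[i, 1, k, 1] * p[i, 2, k, 2] - p[i, 1, k, 2] * p[i, 2, k, 1])
    return values


# Hypothetical positive edge potentials.
a = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 5}
b = {(1, 1): 2, (1, 2): 1, (2, 1): 7, (2, 2): 3}
c = {(1, 1): 4, (1, 2): 1, (2, 1): 1, (2, 2): 6}
d = {(1, 1): 1, (1, 2): 3, (2, 1): 2, (2, 2): 1}

p = four_cycle_table(a, b, c, d)
print(markov_binomials(p))
```

The check only covers the quadrics coming from the two conditional independence statements; it says nothing about further generators of the vanishing ideal.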
When and are not binary, the decomposition of involves prime ideals parametrized by and two sets . For such a choice of , let denote the element of , and let
[TABLE]
the result is the following:
Proposition 4.4 (Theorem 6.5 in [24]).
Let and be binary random variables. The minimal primes of the CI ideal of the four cycle are the toric ideal and the ideals for and . Furthermore the ideal is radical and thus equals the intersection of its minimal primes.
In this case, the non-toric primes are not monomial, but consist of monomials and binomials. This fact is independent of the field . The following is one example of the kind of information that can be extracted from knowing the minimal primes of a conditional independence ideal.
Corollary 4.5.
Let be finite random variables that satisfy , and suppose that and are binary. Then one (or more) of the following statements is true:
(1) The joint distribution lies in the closure of the graphical model.
(2) There is and there are sets such that the following holds:
[TABLE]
Conversely, any probability distribution that satisfies one of these statements and that satisfies $2 \perp\!\!\!\perp 4 \mid \{1,3\}$ also satisfies .
5. The Vanishing Ideal of a Graphical Model
Graphical models can be represented via either parametric descriptions (e.g. factorizations of the density function) or implicit descriptions (e.g. Markov properties and conditional independence constraints). One use of the algebraic perspective on graphical models is to find the complete implicit description of the model, in particular, to find the vanishing ideal of the model. As described in Definition 2.8, the vanishing ideal of a set $S$ is the set of all polynomial functions that evaluate to zero at every point in $S$. Although some graphical models have complete descriptions only in terms of conditional independence constraints, understanding the vanishing ideal can be useful for more complex models or hidden variable models where conditional independence is not sufficient to describe the model, for instance, the mixed graph models studied in Section 2 of Part I.
Example 5.1.
Consider the four cycle and let $X_1, X_2, X_3, X_4$ be binary random variables. The vanishing ideal is generated by $16$ binomials, $8$ of which have degree $2$ and $8$ of which have degree $4$. The degree $2$ binomials are all implied by the two conditional independence statements $1 \perp\!\!\!\perp 3 \mid \{2,4\}$ and $2 \perp\!\!\!\perp 4 \mid \{1,3\}$. On the other hand, the degree $4$ binomials are not implied by the conditional independence constraints, even when we restrict to probability distributions. One example degree four polynomial is
[TABLE]
and the others are obtained by applying the symmetry group of the four cycle and permuting levels of the random variables.
Even in the simple example of the four cycle, there are generators of the vanishing ideal that do not correspond to conditional independence statements. It seems an important problem to try to understand what other types of equations can arise. Theorem 3.2 in [18] shows that the vanishing ideal of a graphical model of discrete random variables is the toric ideal , where is the design matrix of the graphical model, defined in the end of Section 2. A classification for discrete random variables and undirected graphs of when no further polynomials are needed beyond conditional independence constraints is obtained in the following theorem:
Theorem 5.2 (Theorem 4.4 in [18]).
Let $G$ be an undirected graph and consider its graphical model for discrete random variables. Then the vanishing ideal of the model is equal to the conditional independence ideal of the global Markov statements of $G$ if and only if $G$ is a chordal graph.
It is unknown what the appropriate analogue of Theorem 5.2 is for other families of graphical models, either with different classes of graphs (e.g. DAGs) or with other types of random variables (e.g. Gaussian). Computational studies of the vanishing ideals appear in many different papers: for Bayesian networks with discrete random variables [17], for Bayesian networks with Gaussian random variables [33], and for undirected graphical models with Gaussian random variables [32]. A characterization of the graph families for which the vanishing ideal is equal to the conditional independence ideal of global Markov statements is lacking in all these cases.
One natural question is to determine the other generators of the vanishing ideal that do not come from conditional independence, and to give combinatorial structures in the underlying graphs that imply that these more general constraints hold. For instance, for mixed Gaussian graphical models, conditional independence constraints are determinantal, but not every determinantal constraint comes from a conditional independence statement, and there is a characterization of which determinantal constraints come from conditional independence:
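A mixed graph is a triple $G = (V, B, D)$ where $V$ is the vertex set, $B$ is a set of unordered pairs of $V$ representing bidirected edges in $G$, and $D$ is a set of ordered pairs of $V$ representing directed edges in $G$. There might also be both directed and bidirected edges between a pair of vertices. To the set $B$ of bidirected edges one associates the set of symmetric positive definite matrices
$$PD(B) = \{\Omega \in PD_m : \omega_{ij} = 0 \text{ if } i \neq j \text{ and } i \leftrightarrow j \notin B\},$$
where $m$ is the number of vertices.
To the set $D$ of directed edges one associates the set of matrices
$$\mathbb{R}^{D} = \{\Lambda \in \mathbb{R}^{m \times m} : \lambda_{ij} = 0 \text{ if } i \to j \notin D\}.$$
Let $\epsilon \sim \mathcal{N}(0, \Omega)$ with $\Omega \in PD(B)$ and let $X$ be a jointly normal random vector satisfying the structural equation system
$$X = \Lambda^{T} X + \epsilon, \qquad \Lambda \in \mathbb{R}^{D}.$$
This is an example of a linear structural equation model, and contains as special cases various families of graphical models. Let $I$ denote the $m \times m$ identity matrix. With these assumptions, if $I - \Lambda$ is invertible,
$$X \sim \mathcal{N}(0, \Sigma), \qquad \Sigma = (I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}.$$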
Example 5.3.
Consider the mixed graph from Figure 1. In this case, is the set of positive definite matrices of the form
[TABLE]
and is the set of real matrices of the form
[TABLE]
A positive definite matrix belongs to the graphical model associated to this mixed graph, if and only if there are and such that .
Definition 5.4.
Let $G = (V, B, D)$ be a mixed graph. A trek between vertices $i$ and $j$ in $G$ consists of either
- a pair $(P_L, P_R)$ where $P_L$ is a directed path ending in $i$ and $P_R$ is a directed path ending in $j$, where both $P_L$ and $P_R$ have the same source, or
- a pair $(P_L, P_R)$ where $P_L$ is a directed path ending in $i$ and $P_R$ is a directed path ending in $j$, such that the source of $P_L$ and the source of $P_R$ are connected by a bidirected edge.
Let $\mathcal{T}(i, j)$ denote the set of all treks in $G$ between $i$ and $j$.
To each trek $\tau = (P_L, P_R)$ one associates the trek monomial $m_\tau$, which is the product with multiplicities of all $\lambda_{kl}$ over all directed edges $k \to l$ appearing in $P_L$ and $P_R$, times $\omega_{st}$, where $s$ and $t$ are the sources of $P_L$ and $P_R$. One reason for the interest in treks is the trek rule, which says that for the Gaussian graphical model associated to $G$,
$$\sigma_{ij} = \sum_{\tau \in \mathcal{T}(i, j)} m_\tau.$$
For instance, for the mixed graph in Figure 1, the pair is a trek from to . The corresponding trek monomial is .
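The trek rule can be checked numerically on a small example. The sketch below uses a hypothetical mixed graph (it is not the graph of Figure 1) with directed edges 0→2, 1→2, 2→3 and one bidirected edge 0↔1, and arbitrarily chosen rational parameter values. It assumes the convention $\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$ with $\lambda_{ij}$ attached to the edge $i \to j$, and it relies on the directed part being acyclic so that $(I - \Lambda)^{-1}$ is a finite sum of powers of $\Lambda$.

```python
from fractions import Fraction

# Hypothetical mixed graph on vertices 0..3 (illustration only).
n = 4
lam = {(0, 2): Fraction(2), (1, 2): Fraction(3), (2, 3): Fraction(5)}  # i -> j
omega = [[Fraction(0)] * n for _ in range(n)]
for v in range(n):
    omega[v][v] = Fraction(1)
omega[0][1] = omega[1][0] = Fraction(1, 2)  # bidirected edge 0 <-> 1


def matmul(a, b):
    """Product of two n x n matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def covariance():
    """Sigma = (I - Lam)^{-T} Omega (I - Lam)^{-1}, via the finite series
    I + Lam + Lam^2 + ... (valid because the directed part is acyclic)."""
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    lam_matrix = [[lam.get((i, j), Fraction(0)) for j in range(n)]
                  for i in range(n)]
    inverse, power = identity, identity
    for _ in range(n - 1):
        power = matmul(power, lam_matrix)
        inverse = [[inverse[i][j] + power[i][j] for j in range(n)]
                   for i in range(n)]
    inverse_t = [list(col) for col in zip(*inverse)]
    return matmul(matmul(inverse_t, omega), inverse)


def paths_to(v):
    """All directed paths ending in v, as (source, product of edge weights)."""
    found = [(v, Fraction(1))]  # the trivial path at v
    for (u, w), weight in lam.items():
        if w == v:
            found += [(s, m * weight) for s, m in paths_to(u)]
    return found


def trek_sum(i, j):
    """Sum of trek monomials between i and j. Pairs of paths whose sources
    are neither equal nor joined by a bidirected edge have omega[s][t] == 0,
    so they are not treks and contribute nothing."""
    return sum(omega[s][t] * m_left * m_right
               for s, m_left in paths_to(i)
               for t, m_right in paths_to(j))


sigma = covariance()
trek_rule_holds = all(sigma[i][j] == trek_sum(i, j)
                      for i in range(n) for j in range(n))
print(trek_rule_holds)
```

Exact rational arithmetic is used so that the comparison between the matrix formula and the trek sums is an equality test rather than a floating-point tolerance check.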
Definition 5.5.
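Let $A, B, C_A, C_B$ be four sets of vertices of $G$, not necessarily disjoint. The pair of sets $(C_A, C_B)$ t-separates (short for trek separates) $A$ from $B$ if for every $a \in A$ and $b \in B$ and every trek $(P_L, P_R) \in \mathcal{T}(a, b)$, either $P_L$ has a vertex in $C_A$ or $P_R$ has a vertex in $C_B$, or both.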
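Theorem 5.6 [35].
Let $G$ be a mixed graph with vertex set $V$, and let $A$ and $B$ be two subsets of $V$ with $\#A = \#B$. Then the minor $\det \Sigma_{A,B}$ belongs to the vanishing ideal of the Gaussian model associated to $G$ if and only if there is a pair of sets $(C_A, C_B)$ such that $(C_A, C_B)$ t-separates $A$ and $B$ and such that $\#C_A + \#C_B < \#A$.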
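The criterion can be illustrated on a small example by enumerating treks and evaluating the corresponding minor. The sketch below uses a hypothetical mixed graph (not the graph of Figure 1) with directed edges 0→2, 1→2, 2→3, 2→4 and a bidirected edge 0↔1. The parameter values are arbitrary, and the helper names `treks`, `t_separates` and `covariance` are introduced only for this illustration. It assumes the parametrization $\Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}$ and an acyclic directed part.

```python
from fractions import Fraction

# Hypothetical mixed graph on vertices 0..4 (illustration only).
N = 5
DIRECTED = {(0, 2): Fraction(2), (1, 2): Fraction(3),
            (2, 3): Fraction(5), (2, 4): Fraction(7)}   # i -> j with lambda_ij
BIDIRECTED = {(0, 1): Fraction(1, 2)}                    # i <-> j with omega_ij
OMEGA_DIAG = [Fraction(d) for d in (1, 2, 3, 4, 5)]


def directed_paths_to(v):
    """All directed paths ending in v, as (source, vertices on the path)."""
    paths = [(v, frozenset([v]))]
    for (u, w) in DIRECTED:
        if w == v:
            paths += [(s, verts | {v}) for s, verts in directed_paths_to(u)]
    return paths


def treks(i, j):
    """All treks (P_L, P_R) between i and j, each path as its vertex set."""
    linked = {(s, s) for s in range(N)}
    for s, t in BIDIRECTED:
        linked |= {(s, t), (t, s)}
    return [(left, right)
            for s, left in directed_paths_to(i)
            for t, right in directed_paths_to(j)
            if (s, t) in linked]


def t_separates(c_a, c_b, a, b):
    """True if every trek from a to b meets c_a on its left path
    or c_b on its right path."""
    return all(bool(left & c_a) or bool(right & c_b)
               for i in a for j in b for left, right in treks(i, j))


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def covariance():
    """Sigma = (I - Lam)^{-T} Omega (I - Lam)^{-1} for the acyclic case."""
    lam = [[DIRECTED.get((i, j), Fraction(0)) for j in range(N)]
           for i in range(N)]
    omega = [[Fraction(0)] * N for _ in range(N)]
    for v in range(N):
        omega[v][v] = OMEGA_DIAG[v]
    for (s, t), w in BIDIRECTED.items():
        omega[s][t] = omega[t][s] = w
    inverse = power = [[Fraction(int(i == j)) for j in range(N)]
                       for i in range(N)]
    for _ in range(N - 1):  # finite geometric series, Lam is nilpotent
        power = matmul(power, lam)
        inverse = [[inverse[i][j] + power[i][j] for j in range(N)]
                   for i in range(N)]
    inverse_t = [list(col) for col in zip(*inverse)]
    return matmul(matmul(inverse_t, omega), inverse)


sigma = covariance()
A, B = (0, 1), (3, 4)
# 2 x 2 minor det Sigma_{A,B}
minor = (sigma[A[0]][B[0]] * sigma[A[1]][B[1]]
         - sigma[A[0]][B[1]] * sigma[A[1]][B[0]])
separated = t_separates(set(), {2}, A, B)
print(separated, minor)
```

Here the pair $(\emptyset, \{2\})$ t-separates $\{0, 1\}$ from $\{3, 4\}$ and has total size $1 < 2$, which matches the vanishing of the minor. The pair $(\{2\}, \emptyset)$ does not t-separate the same sets, since vertex 2 only ever appears on the right-hand path of these treks. This shows that the two sides of a t-separating pair play different roles.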
The t-separation criterion can produce implicit constraints for structural equation models in situations where there are no conditional independence constraints.
Example 5.7.
Consider the mixed graph from Figure 1. The vanishing ideal of the model is . This determinantal constraint is not a conditional independence constraint. It is implied by the t-separation criterion because the pair t-separates and .
Remark 5.8.
In the case where there are hidden random variables, the vanishing ideal is typically not sufficient to completely describe the set of probability distributions that come from the graphical model. Usually one also needs to consider inequality constraints and other semialgebraic conditions. This problem is discussed in more detail in [3, 2, 37], among others.
6. Further Reading
Diaconis, Eisenbud and Sturmfels [8] were the first to consider primary decompositions for statistical applications, in particular for the analysis of the connectivity of certain random walks. This perspective was also picked up in [24] using conditional independence ideals. Primary decompositions of conditional independence ideals also appear in the following papers, which have not been mentioned so far: [12, 16, 25, 34, 36].
The algebraic view on undirected graphical models was presented in [18], which began the extensive study of the vanishing ideals of undirected graphical models for discrete random variables. The focus has been on developing techniques for constructing generating sets of the vanishing ideals; [14, 21, 30] are representative papers in this area. Vanishing ideals of undirected models with Gaussian random variables and of models for DAGs have not been studied much. Papers that initiated their study include [17, 32, 33].
References
- [1] 4ti2 team, 4ti2 - a software package for algebraic, geometric and combinatorial problems on linear spaces, available at http://www.4ti2.de.
- [2] Elizabeth S. Allman, John A. Rhodes, Bernd Sturmfels, and Piotr Zwiernik, Tensors of nonnegative rank two, Linear Algebra Appl. 473 (2015), 37–53. MR 3338324
- [3] Elizabeth S. Allman, John A. Rhodes, and Amelia Taylor, A semialgebraic description of the general Markov model on phylogenetic trees, SIAM J. Discrete Math. 28 (2014), no. 2, 736–755. MR 3206983
- [4] Saugata Basu, Richard Pollack, and Marie-Françoise Roy, Algorithms in real algebraic geometry, second ed., Algorithms and Computation in Mathematics, vol. 10, Springer-Verlag, Berlin, 2006. MR 2248869
- [5] Jacek Bochnak, Michel Coste, and Marie-Françoise Roy, Real algebraic geometry, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 36, Springer-Verlag, Berlin, 1998, Translated from the 1987 French original, Revised by the authors. MR 1659509
- [6] Winfried Bruns and Udo Vetter, Determinantal rings, Monografías de Matemática [Mathematical Monographs], vol. 45, Instituto de Matemática Pura e Aplicada (IMPA), Rio de Janeiro, 1988. MR 986492
- [7] David A. Cox, John Little, and Donal O’Shea, Ideals, varieties, and algorithms, fourth ed., Undergraduate Texts in Mathematics, Springer, Cham, 2015, An introduction to computational algebraic geometry and commutative algebra. MR 3330490
- [8] Persi Diaconis, David Eisenbud, and Bernd Sturmfels, Lattice walks and primary decomposition, Mathematical essays in honor of Gian-Carlo Rota (Cambridge, MA, 1996), Progr. Math., vol. 161, Birkhäuser Boston, Boston, MA, 1998, pp. 173–193. MR 1627343
