Algebraic Aspects of Conditional Independence and Graphical Models

Thomas Kahle; Johannes Rauh; Seth Sullivant

arXiv:1705.07411·math.ST·May 23, 2017


TL;DR

This chapter explores algebraic geometry techniques, such as binomial ideals and primary decomposition, to analyze conditional independence and graphical models, providing computational tools and examples for understanding model constraints.

Contribution

It introduces algebraic geometry methods to study conditional independence in graphical models, including primary decomposition and vanishing ideals, with practical examples.

Findings

01

Algebraic geometry tools can analyze implications between conditional independences.

02

Computing primary decompositions helps understand model constraints.

03

Examples include four-cycle graphical models and trek separation constraints.


Equations

p^{u} := p_{1}^{u_{1}} p_{2}^{u_{2}} \cdots p_{r}^{u_{r}}.

f = \sum_{u \in U} c_{u} p^{u}

V(\mathcal{F}) = \{a \in \Bbbk^{r} : f(a) = 0 \text{ for all } f \in \mathcal{F}\}.

\{(x, y) \in \Bbbk^{2} : y = x^{2} \text{ and } x^{2} + y^{2} = 1\},

I(S) := \{f \in \Bbbk[p_{1}, \dots, p_{r}] : f(a) = 0 \text{ for all } a \in S\}.

\langle x^{2}, xy \rangle = \langle x \rangle \cap \langle x^{2}, y \rangle = \langle x \rangle \cap \langle x^{2}, x + y \rangle

I_{A} := \big\langle p^{u_{+}} - p^{u_{-}} : u = u_{+} - u_{-} \in \ker_{\mathbb{Z}} A \big\rangle

\big\{a \in \mathbb{R}^{r} : f(a) = 0 \text{ for all } f \in \mathcal{F} \text{ and } g(a) > 0 \text{ for all } g \in \mathcal{G}\big\}.

\operatorname{int}(\Delta_{r-1}) := \big\{p \in \mathbb{R}^{r} : \sum_{i=1}^{r} p_{i} = 1,\ p_{i} > 0,\ i = 1, \ldots, r\big\}

\Delta_{r-1} := \big\{p \in \mathbb{R}^{r} : \sum_{i=1}^{r} p_{i} = 1,\ p_{i} \geq 0,\ i = 1, \ldots, r\big\}

\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{pmatrix}.

\mathcal{G} = \big\{\sigma_{11},\ \sigma_{11}\sigma_{22} - \sigma_{12}^{2},\ \det\Sigma\big\},

I_{1 \perp\!\!\!\perp 2} = \langle 2 \times 2 \text{ subdeterminants of } p \rangle \subseteq \mathbb{R}[p_{i_{1}, i_{2}} : i_{1} \in [r_{1}],\ i_{2} \in [r_{2}]].

I_{1 \perp\!\!\!\perp 2} = \langle p_{11}p_{22} - p_{12}p_{21},\ p_{11}p_{23} - p_{13}p_{21},\ p_{12}p_{23} - p_{13}p_{22} \rangle.

p_{i_{A}, i_{B}, i_{C}, +} = \sum_{i_{D} \in \mathcal{R}_{D}} p_{i_{A}, i_{B}, i_{C}, i_{D}}

I_{A \perp\!\!\!\perp B \mid C} = \Big\langle p_{i_{A},i_{B},i_{C},+} \cdot p_{j_{A},j_{B},i_{C},+} - p_{i_{A},j_{B},i_{C},+} \cdot p_{j_{A},i_{B},i_{C},+} \text{ for all } i_{A}, j_{A} \in \mathcal{R}_{A},\ i_{B}, j_{B} \in \mathcal{R}_{B},\ i_{C} \in \mathcal{R}_{C} \Big\rangle

I_{1 \perp\!\!\!\perp 3 \mid 2} = \langle p_{111}p_{212} - p_{211}p_{112},\ p_{121}p_{222} - p_{221}p_{122} \rangle.

I_{1 \perp\!\!\!\perp 3} = \langle (p_{111} + p_{121})(p_{212} + p_{222}) - (p_{112} + p_{122})(p_{211} + p_{221}) \rangle.

p_{i_{1}, i_{2}, i_{3}} = s_{i_{1}, i_{2}} t_{i_{2}, i_{3}}

X_{1} - X_{2} - X_{3}.

I_{\mathcal{C}} = I_{A_{1} \perp\!\!\!\perp B_{1} \mid C_{1}} + I_{A_{2} \perp\!\!\!\perp B_{2} \mid C_{2}} + \cdots.

I_{\mathcal{C}} = I_{1 \perp\!\!\!\perp 3 \mid 2} + I_{1 \perp\!\!\!\perp 3} = \langle p_{111}p_{212} - p_{112}p_{211},\ p_{121}p_{222} - p_{122}p_{221},\ (p_{111} + p_{121})(p_{212} + p_{222}) - (p_{112} + p_{122})(p_{211} + p_{221}) \rangle.

I_{\mathcal{C}} = I_{\{1,2\} \perp\!\!\!\perp 3} \cap I_{1 \perp\!\!\!\perp \{2,3\}}.

V(I_{\mathcal{C}}) = V(I_{\{1,2\} \perp\!\!\!\perp 3}) \cup V(I_{1 \perp\!\!\!\perp \{2,3\}}).

J_{A \perp\!\!\!\perp B \mid C} := \langle (\#C + 1)\text{-minors of } \Sigma_{A \cup C, B \cup C} \rangle.

J_{\mathcal{C}} = J_{A_{1} \perp\!\!\!\perp B_{1} \mid C_{1}} + J_{A_{2} \perp\!\!\!\perp B_{2} \mid C_{2}} + \cdots.

\Sigma_{\{2,4\},\{1,3,4\}} = \begin{pmatrix} \sigma_{12} & \sigma_{23} & \sigma_{24} \\ \sigma_{14} & \sigma_{34} & \sigma_{44} \end{pmatrix}.

J_{2 \perp\!\!\!\perp \{1,3\} \mid 4} = \langle \sigma_{12}\sigma_{34} - \sigma_{14}\sigma_{23},\ \sigma_{12}\sigma_{44} - \sigma_{14}\sigma_{24},\ \sigma_{23}\sigma_{44} - \sigma_{34}\sigma_{24} \rangle.

J_{\mathcal{C}} = J_{1 \perp\!\!\!\perp 3 \mid 2} + J_{1 \perp\!\!\!\perp 3} = \langle \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23},\ \sigma_{13} \rangle.

J_{\mathcal{C}} = \langle \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23},\ \sigma_{13} \rangle = \langle \sigma_{12}\sigma_{23},\ \sigma_{13} \rangle = \langle \sigma_{12}, \sigma_{13} \rangle \cap \langle \sigma_{23}, \sigma_{13} \rangle


Taxonomy

Topics: Bayesian Modeling and Causal Inference

Full text

Algebraic Aspects of Conditional Independence and Graphical Models

Thomas Kahle

Fakultät für Mathematik, Otto-von-Guericke University Magdeburg, Germany. http://www.thomas-kahle.de

Johannes Rauh

Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany. http://www.yorku.ca/jarauh/

Seth Sullivant

Department of Mathematics, North Carolina State University, Raleigh, USA. http://www4.ncsu.edu/~smsulli2/

(Date: May 2017)

Abstract.

This chapter of the forthcoming Handbook of Graphical Models contains an overview of basic theorems and techniques from algebraic geometry and how they can be applied to the study of conditional independence and graphical models. It also introduces binomial ideals and some ideas from real algebraic geometry. When random variables are discrete or Gaussian, tools from computational algebraic geometry can be used to understand implications between conditional independence statements. This is accomplished by computing primary decompositions of conditional independence ideals. As examples the chapter presents in detail the graphical model of a four cycle and the intersection axiom, a certain implication of conditional independence statements. Another important problem in the area is to determine all constraints on a graphical model, for example, equations determined by trek separation. The full set of equality constraints can be determined by computing the model’s vanishing ideal. The chapter illustrates these techniques and ideas with examples from the literature and provides references for further reading.

2010 Mathematics Subject Classification:

62-00, 13P25

1. Introduction

Consider a finite set of random variables $X_{v}$, $v\in V$. Section 1.6 of Part I describes how to use a simple undirected graph $G=(V,E)$ to encode conditional independence (CI) statements among the random variables. (All references to other sections refer to the forthcoming Handbook of Graphical Models, edited by Mathias Drton, Steffen Lauritzen, Marloes Maathuis, and Martin Wainwright, which will contain this document as Section 3 of Part I.) One can also naturally associate a parametrized family of joint probability distributions of the $X_{v}$ to a graph. For undirected graphs, the Hammersley–Clifford theorem (see Section 1.6.3 of Part I) shows that both the implicit method and the parametric method lead to the same families of probability distributions (called graphical models), as long as all distributions are assumed strictly positive.

When probabilities are allowed to go to zero, the models defined by the collections of CI statements contain probability distributions that do not lie in the parametric graphical model, which, by definition, consists of strictly positive probability distributions. In fact, these additional distributions do not even lie in the closure of the parametric graphical model, so they cannot be approximated by distributions from the parametric graphical model. Moreover, the models defined by the different collections of CI statements (pairwise Markov properties, local Markov properties, global Markov properties) differ from one another. As an example, consider the four-cycle $C_{4}$.

Proposition 1.1.

The binary random variables $X=(X_{1},X_{2},X_{3},X_{4})$ satisfy the global Markov statements of $C_{4}$, $1 \perp\!\!\!\perp 3 \mid \{2,4\}$ and $2 \perp\!\!\!\perp 4 \mid \{1,3\}$, if and only if one (or more) of the following statements is true:

(1) The joint distribution lies in the closure of the graphical model.

(2) There is a pair $(X_{i},X_{i+1})$ of neighboring nodes such that $X_{i}=X_{i+1}$ a.s.

(3) There is a pair $(X_{i},X_{i+1})$ of neighboring nodes such that $X_{i}\neq X_{i+1}$ a.s.

This chapter shows how to prove results such as Proposition 1.1 using algebraic tools. The algebraic method can also be used to study implications between conditional independence statements. Here is an example:

Proposition 1.2.

Suppose that $X,Y,Z$ are binary random variables or jointly normal random variables. If $X \perp\!\!\!\perp Y$ and $X \perp\!\!\!\perp Y \mid Z$, then either $X \perp\!\!\!\perp (Y,Z)$ or $(X,Z) \perp\!\!\!\perp Y$.
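For jointly normal variables, the implication can be checked by a short computation. The following sketch assumes a positive definite covariance matrix $\Sigma=(\sigma_{ij})$ whose indices $1,2,3$ stand for $X,Y,Z$ (a labeling chosen here for illustration). It uses two standard facts about Gaussians: marginal independence means a vanishing covariance, and conditional independence means a vanishing minor of $\Sigma$. The binary case is not covered by this computation.

```latex
\begin{align*}
X \perp\!\!\!\perp Y &\iff \sigma_{12} = 0, \\
X \perp\!\!\!\perp Y \mid Z &\iff \sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} = 0.
\end{align*}
Together these give $\sigma_{13}\sigma_{23} = 0$, so $\sigma_{13} = 0$ or $\sigma_{23} = 0$.
If $\sigma_{12} = \sigma_{13} = 0$, then $X \perp\!\!\!\perp (Y,Z)$;
if $\sigma_{12} = \sigma_{23} = 0$, then $(X,Z) \perp\!\!\!\perp Y$.
```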

The CI implication in Proposition 1.2 is a special case of the gaussoid axiom [26]. One may wonder what is special about jointly normal or binary random variables. For instance, is there a variant of this implication when $X,Y,Z$ are discrete but not binary? How can one systematically find and study implications like this?

CI implications can also be interpreted as intersections of graphical models. For example, the two CI statements $X \perp\!\!\!\perp Y$ and $X \perp\!\!\!\perp Y \mid Z$ in Proposition 1.2 correspond to the two graphical models $X\to Z\leftarrow Y$ and $X - Z - Y$, respectively. Thus, Proposition 1.2 says that the intersection of these two graphical models equals the union of the two graphical models $X \quad Z - Y$ and $X - Z \quad Y$, provided the random variables are either binary or jointly normal. As the example shows, the intersection of two graphical models need not be a graphical model. How can one compute this intersection?

The goal of this chapter is to explore these questions and introduce tools from computational algebra for studying them. Our perspective is that, for a fixed type of random variable, the set of distributions that satisfy a collection of independence constraints is the zero set of a collection of polynomial equations. Solutions of systems of polynomial equations are the objects of study of algebraic geometry, and so tools from algebra can be brought to bear on the problem. The next section contains an overview of basic ideas in algebraic geometry which are useful for the study of conditional independence structures and graphical models. In particular, it introduces algebraic varieties, polynomial ideals, and primary decomposition. Section 3 introduces the ideals associated to families of conditional independence statements, and explains how to apply the basic techniques to deduce conditional independence implications. Section 4 illustrates the main ideas with some deeper examples coming from the literature. Section 5 concerns the vanishing ideal of a graphical model, which is a complete set of implicit restrictions for that model. This set of restrictions is usually much larger than the set of conditional independence constraints that come from the graph, but it can illuminate the structure of the model especially with more complex families of models involving mixed graphs or hidden random variables. Section 6 highlights some key references in this area.

2. Notions of Algebraic Geometry and Commutative Algebra

Commutative algebra is the study of systems of polynomial equations, and algebraic geometry is the study of geometric properties of their solutions. Both are rich fields with many deep results. This section only gives a very coarse introduction to the basic facts that hopefully makes it possible for the reader to understand the phenomena and algorithms discussed in later parts of this chapter. For a more detailed introduction, the reader is referred to the standard textbook [7].

2.1. Polynomials, ideals and varieties

Let $\Bbbk$ be a field, for example the rational numbers $\mathbb{Q}$ , the real numbers $\mathbb{R}$ , or the complex numbers $\mathbb{C}$ . Let $\mathbb{N}=\{0,1,\ldots\}$ denote the natural numbers. Let $p_{1},p_{2},\dots,p_{r}$ be a collection of indeterminates or variables. A monomial in the indeterminates $p_{1},p_{2},\dots,p_{r}$ is an expression of the form $p_{1}^{u_{1}}p_{2}^{u_{2}}\cdots p_{r}^{u_{r}}$ where $u_{1},\ldots,u_{r}$ are nonnegative integers. Writing $u=(u_{1},\ldots,u_{r})$ , let

$$p^{u} := p_{1}^{u_{1}} p_{2}^{u_{2}} \cdots p_{r}^{u_{r}}.$$

A polynomial is a finite linear combination of monomials, i.e.

$$f = \sum_{u \in U} c_{u} p^{u},$$

where $U\subset\mathbb{N}^{r}$ is a finite set and $c_{u}\in\Bbbk$. Of course, one is used to thinking of a polynomial as a function, $f:\Bbbk^{r}\rightarrow\Bbbk$, which can be evaluated at a point $a\in\Bbbk^{r}$ by substituting $a$ for $p=(p_{1},\ldots,p_{r})$. In the following, this function will usually have the role of a constraint; i.e., the object of interest is the zero set $\{a\in\Bbbk^{r}:f(a)=0\}$. In algebra, it is also useful to think of a polynomial as a formal object, i.e. the indeterminates are simply symbols that are used in manipulations, with no need for them to be evaluated.
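The monomial notation $p^{u}$ and the sum $f=\sum_{u\in U}c_{u}p^{u}$ translate directly into a data structure. The following is a minimal sketch in Python, assuming a polynomial is stored as a dictionary from exponent vectors $u$ to coefficients $c_{u}$. The names `monomial` and `evaluate` and the sample polynomial are illustrative and not taken from the chapter.

```python
from fractions import Fraction
from math import prod


def monomial(point, u):
    """Evaluate the monomial p^u = p_1^{u_1} * ... * p_r^{u_r} at `point`."""
    return prod(a ** e for a, e in zip(point, u))


def evaluate(poly, point):
    """Evaluate f = sum_{u in U} c_u * p^u, with `poly` a dict {u: c_u}."""
    return sum(c * monomial(point, u) for u, c in poly.items())


# f = p_1 * p_4 - p_2 * p_3 in four indeterminates (hypothetical example)
f = {(1, 0, 0, 1): 1, (0, 1, 1, 0): -1}

# Exact rational arithmetic avoids floating-point round-off.
a = tuple(Fraction(n, 10) for n in (1, 2, 3, 6))
value = evaluate(f, a)  # 1/10 * 6/10 - 2/10 * 3/10 = 0
```

Storing only the finitely many nonzero coefficients mirrors the definition, in which $U$ is a finite subset of $\mathbb{N}^{r}$.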

The set of all polynomials in indeterminates $p_{1},\ldots,p_{r}$ with coefficients in $\Bbbk$ is called the polynomial ring and denoted $\Bbbk[p_{1},\ldots,p_{r}]$ . The word ring means that $\Bbbk[p_{1},\ldots,p_{r}]$ has two operations, namely addition of polynomials and multiplication of polynomials, and that these operations satisfy all the usual properties of addition and multiplication (associativity, commutativity, distributivity). However, multiplicative inverses need not exist: The result of dividing one polynomial by another non-constant polynomial is in general not a polynomial, but a rational function.

Definition 2.1.

Let $\mathcal{F}\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ . The variety defined by $\mathcal{F}$ is the vanishing set of the polynomials in $\mathcal{F}$ , that is,

$$V(\mathcal{F}) := \{a\in\Bbbk^{r} : f(a)=0 \text{ for all } f\in\mathcal{F}\}.$$

Example 2.2.

Let $r=2$ and consider $\mathcal{F}=\{x^{2}-y\}\subseteq\Bbbk[x,y]$ . The variety $V(\{x^{2}-y\})\subseteq\Bbbk^{2}$ is the familiar parabola “ $y=x^{2}$ ” in the plane. For $r=4$ and $\mathcal{F}=\{p_{11}p_{22}-p_{12}p_{21}\}\subseteq\Bbbk[p_{11},p_{12},p_{21},p_{22}]$ , the variety $V(\{p_{11}p_{22}-p_{12}p_{21}\})\subseteq\Bbbk^{4}$ is the set of all singular $2\times 2$ matrices $\left(\begin{smallmatrix}p_{11}&p_{12}\\ p_{21}&p_{22}\end{smallmatrix}\right)$ with entries in $\Bbbk$ .

Example 2.3.

Let $\mathcal{F}=\{x^{2}-y,x^{2}+y^{2}-1\}$ . The variety $V(\mathcal{F})$ is the set of points

$$\{(x,y)\in\Bbbk^{2} : x^{2}-y=0 \text{ and } x^{2}+y^{2}-1=0\},$$

in other words, the intersection of a parabola and a circle. The number of points in this intersection varies depending on whether the underlying field is $\mathbb{Q}$ , $\mathbb{R}$ , or $\mathbb{C}$ (or some other field). If $\Bbbk=\mathbb{Q}$ the variety is empty, if $\Bbbk=\mathbb{R}$ the variety has two points, and if $\Bbbk=\mathbb{C}$ , the variety has four points. In statistical applications one is usually interested in solutions over $\mathbb{R}$ . However, it is often easier to first perform computations in algebraically closed fields like $\mathbb{C}$ before restricting to the real numbers, at least in theory. On the other hand, when using a computer algebra system, it may be advantageous to work with $\mathbb{Q}$ , if possible, because rational numbers can be represented exactly on a computer.
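The point counts over $\mathbb{R}$ and $\mathbb{C}$ can be checked numerically. The following Python sketch is an illustration added here, using floating point arithmetic rather than exact algebra. It substitutes $y=x^{2}$ into the circle equation, which gives $y^{2}+y-1=0$, and then takes both square roots of each root $y$. Since both real solutions involve $\sqrt{5}$, none of them is rational, in line with the empty variety over $\mathbb{Q}$.

```python
import cmath

# Substituting y = x^2 into x^2 + y^2 - 1 = 0 gives y^2 + y - 1 = 0.
y_values = [(-1 + 5 ** 0.5) / 2, (-1 - 5 ** 0.5) / 2]

# Each value of y contributes the two square roots x = +/- sqrt(y) over C.
points = []
for y in y_values:
    x = cmath.sqrt(y)
    points.extend([(x, complex(y)), (-x, complex(y))])

def residuals(x, y):
    """Absolute values of the two defining polynomials at (x, y)."""
    return abs(x * x - y), abs(x * x + y * y - 1)

tolerance = 1e-12
all_on_variety = all(max(residuals(x, y)) < tolerance for x, y in points)

# A solution is real exactly when x has no imaginary part (y is real by construction).
real_points = [(x, y) for x, y in points if abs(x.imag) < tolerance]
```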

The examples so far have always used finite sets $\mathcal{F}$ . This is not necessary for the definition of a variety, and it is often worthwhile to consider the variety $V(\mathcal{F})$ where $\mathcal{F}$ is an infinite set of polynomials. In fact, it is often convenient to replace the original set of polynomials $\mathcal{F}$ by an infinite set, the ideal generated by $\mathcal{F}$ , which is equivalent to $\mathcal{F}$ in some sense but has more structure.

One reason is that different families of polynomials may have the same varieties. For example, if $f,g\in\mathcal{F}$, then the variety of $\mathcal{F}\cup\{f+g\}$ equals $V(\mathcal{F})$. Similarly, for $f\in\mathcal{F}$ and $\lambda\in\Bbbk$, the variety of $\mathcal{F}\cup\{\lambda f\}$ equals $V(\mathcal{F})$.

Definition 2.4.

A set $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ is an ideal if for all $f,g\in I$ , $f+g\in I$ and for all $f\in I$ and $h\in\Bbbk[p_{1},\ldots,p_{r}]$ , $hf\in I$ .

Definition 2.5.

Let $\mathcal{F}\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ be a set of polynomials. The ideal generated by $\mathcal{F}$ is the smallest ideal in $\Bbbk[p_{1},\ldots,p_{r}]$ that contains $\mathcal{F}$ . Equivalently, the ideal generated by $\mathcal{F}$ consists of all polynomials $h_{1}f_{1}+\cdots+h_{k}f_{k}$ for $h_{1},\ldots,h_{k}\in\Bbbk[p_{1},\ldots,p_{r}]$ , $f_{1},\ldots,f_{k}\in\mathcal{F}$ , and any $k$ . The ideal generated by $\mathcal{F}$ is denoted $\langle\mathcal{F}\rangle$ .

Proposition 2.6.

Let $\mathcal{F}\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ be a set of polynomials. Then $V(\mathcal{F})=V(\langle\mathcal{F}\rangle)$ .

Example 2.7.

The ideal $\langle x^{2}-y,x^{2}+y^{2}-1\rangle$ generated by the set $\mathcal{F}$ from Example 2.3 has many different possible generating sets. For example, an alternate generating set is $\{x^{2}-y,x^{4}+x^{2}-1\}$. This makes it easy to find all solutions of the polynomial system: each root of the univariate polynomial $x^{4}+x^{2}-1$ can be substituted into the equation $x^{2}-y=0$, which can then be solved for $y$.
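That the two generating sets define the same ideal can be verified by hand. The following identities, a worked check added here rather than part of the original example, express the new generator through the old ones and vice versa, with $x^{2}-y$ common to both sets:

```latex
\begin{align*}
x^{4}+x^{2}-1 &= (x^{2}+y)\,(x^{2}-y) + (x^{2}+y^{2}-1),\\
x^{2}+y^{2}-1 &= (x^{4}+x^{2}-1) - (x^{2}+y)\,(x^{2}-y).
\end{align*}
```

Both lines follow from $(x^{2}+y)(x^{2}-y)=x^{4}-y^{2}$.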

Hilbert’s basis theorem implies that for any ideal $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ there exists a finite set of polynomials $\mathcal{F}\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ such that $I=\langle\mathcal{F}\rangle$ .

Even though it is, for theoretical considerations, often easier to think about systems of polynomial equations in terms of ideals, in practice (i.e. when working with computer algebra systems), the ideal is almost always specified in terms of a finite set of generators (or such a finite set of generators has to be computed on the way). On the other hand, during a computation it is often necessary to replace this set of generators by a more convenient set of generators (e.g. a Gröbner basis), so the generators may change even though the ideal stays the same along a computation.

Definition 2.8.

Let $S\subseteq\Bbbk^{r}$ . The vanishing ideal of $S$ is the set

$$I(S) := \{f\in\Bbbk[p_{1},\ldots,p_{r}] : f(a)=0 \text{ for all } a\in S\}.$$

It is easy to check that a vanishing ideal is indeed an ideal. Clearly, any ideal $J$ satisfies $J\subseteq I(V(J))$. However, the reverse inclusion does not hold in general. For instance, $I(V(\langle x^{2}\rangle))=\langle x\rangle$, and $I(V(\langle x^{2}y,xy^{2}\rangle))=\langle xy\rangle$ (over any infinite field $\Bbbk$).

Definition 2.9.

The ideal $I(V(J))$ is the $\Bbbk$ -radical of $J$ . An ideal $J$ such that $I(V(J))=J$ is a $\Bbbk$ -radical ideal. If $\Bbbk$ is algebraically closed (e.g. if $\Bbbk=\mathbb{C}$ ), such an ideal $J$ is simply called a radical ideal.

Radical ideals can also be characterized algebraically, and there are algorithms to compute radicals. The radical is usually a simpler ideal, and if the radical of an ideal can be computed, it is advantageous to do this in a first step in each calculation (as long as one is only interested in properties of $V(J)$ , and not in algebraic properties of $J$ ).

The following proposition illustrates the close relation between ideals and varieties.

Proposition 2.10.

• Let $S_{1},S_{2}\subseteq\Bbbk^{r}$. Then $I(S_{1}\cup S_{2})=I(S_{1})\cap I(S_{2})$ and $I(S_{1}\cap S_{2})\supseteq I(S_{1})+I(S_{2}):=\{f+g:f\in I(S_{1}),g\in I(S_{2})\}$.

• Let $I,J\subseteq\Bbbk[p_{1},\dots,p_{r}]$ be ideals. Then $V(I\cup J)=V(I+J)=V(I)\cap V(J)$ and $V(I\cap J)=V(I)\cup V(J)$.

2.2. Irreducible and primary decomposition

Proposition 2.10 shows that the union of two varieties is again a variety. Interestingly, not every variety can be written as a non-trivial finite union.

Definition 2.11.

A variety $V$ is reducible if there are two varieties $V_{1},V_{2}\neq V$ such that $V_{1}\cup V_{2}=V$ . Otherwise $V$ is irreducible.

Theorem 2.12.

Any variety $V\subseteq\Bbbk^{r}$ has a unique decomposition into finitely many irreducible varieties $V=V_{1}\cup\cdots\cup V_{k}$ (with $V_{i}\not\subseteq V_{j}$ for $i\neq j$ ).

The varieties $V_{1},\dots,V_{k}$ are called the irreducible components of $V$ .

Theorem 2.13.

(1) Let $V$ be an irreducible variety, and let $\phi:\Bbbk^{r}\to\Bbbk^{s}$ be a rational map. Then $V(I(\phi(V)))$ is irreducible.

(2) Let $V$ be a variety that has a rational parametrization $\phi:\Bbbk^{r}\to V$ such that the image of $\phi$ is dense in $V$. Then $V$ is irreducible.

According to Proposition 2.10, the corresponding decomposition operation for ideals is to write ideals as the intersection of other ideals. However, for general ideals, the situation is much more complicated than for varieties. The situation simplifies for radical ideals (which are in a one-to-one correspondence with varieties). This case is discussed next. The general case is summarized afterwards.

Definition 2.14.

An ideal $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ is prime if for all $f,g\in\Bbbk[p_{1},\ldots,p_{r}]$ with $f\cdot g\in I$ , one of the factors $f,g$ belongs to $I$ .

For example, $I:=\langle xy\rangle$ is not prime, because $xy\in I$ , but neither $x\in I$ nor $y\in I$ .

Theorem 2.15.

A variety $V$ is irreducible if and only if $I(V)$ is prime.

Definition 2.16.

A prime ideal $P\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ is a minimal prime of an ideal $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ if and only if $V(P)$ is an irreducible component of $V(I)$ .

There is also an algebraic definition of the minimal primes, and there are algorithms to compute the minimal primes. By definition, the minimal primes of an ideal encode the irreducible decomposition of the corresponding variety:

Theorem 2.17.

(1) Any ideal $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ has finitely many minimal primes $P_{1},\dots,P_{k}$.

(2) The ideal $P_{1}\cap\dots\cap P_{k}$ equals the radical of $I$.

(3) The irreducible components of $V(I)$ are $V(P_{1}),\dots,V(P_{k})$.

If $I$ is not radical, then $I\subsetneq P_{1}\cap\dots\cap P_{k}$. In this case, it is still possible to write $I$ as an intersection of special ideals (called primary ideals) in a way that is algebraically and geometrically meaningful. This intersection is called a primary decomposition. The precise definitions are omitted, since a primary decomposition often adds little to the statistical understanding. However, some works in algebraic statistics written by algebraists who do care about the differences between ideals and their radicals use this notation. The following result explains how a primary decomposition is related to the minimal primes.

Theorem 2.18.

Let $I=I_{1}\cap\dots\cap I_{l}$ be a primary decomposition of $I\subseteq\Bbbk[p_{1},\ldots,p_{r}]$ , and let $P_{i}$ be the radical of $I_{i}$ .

(1) $V(I)=V(I_{1})\cup V(I_{2})\cup\dots\cup V(I_{l})=V(P_{1})\cup V(P_{2})\cup\dots\cup V(P_{l})$.

(2) Each $P_{i}$ is prime.

(3) Each minimal prime of $I$ is among the $P_{i}$.

(4) If $P_{i}$ is not a minimal prime of $I$, then there is a minimal prime $P_{j}$ of $I$ with $P_{j}\subset P_{i}$ (and so $V(P_{i})\subset V(P_{j})$).

Example 2.19.

Let $I=\langle xy,xz\rangle\subseteq\Bbbk[x,y,z]$. The variety $V(I)$ consists of the union of the plane where $x=0$, and the line where $y=0,z=0$. Hence $V(\langle xy,xz\rangle)=V(\langle x\rangle)\cup V(\langle y,z\rangle)$ is a decomposition into irreducibles. This corresponds to the ideal decomposition $\langle xy,xz\rangle=\langle x\rangle\cap\langle y,z\rangle$.
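The set-theoretic statement in this example can be spot-checked on sample points. The Python sketch below is an illustration added here: it compares membership in $V(\langle xy,xz\rangle)$ with membership in the union of the plane and the line on a small grid of rational points. The function names are hypothetical, and a finite check of this kind illustrates the decomposition but does not prove it.

```python
from fractions import Fraction
from itertools import product

def in_variety_of_I(x, y, z):
    """Membership in V(<xy, xz>): both generators vanish."""
    return x * y == 0 and x * z == 0

def in_union_of_components(x, y, z):
    """Membership in V(<x>) union V(<y, z>): the plane x = 0 or the line y = z = 0."""
    return x == 0 or (y == 0 and z == 0)

# Sample grid of rational points with coordinates -2, -3/2, ..., 2.
grid = [Fraction(n, 2) for n in range(-4, 5)]
agree_everywhere = all(
    in_variety_of_I(x, y, z) == in_union_of_components(x, y, z)
    for x, y, z in product(grid, repeat=3)
)
```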

The primary decomposition need not be unique.

Example 2.20.

The ideal $\langle x^{2},xy\rangle$ has several different primary decompositions, e.g.

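$$\langle x^{2},xy\rangle=\langle x\rangle\cap\langle x^{2},y\rangle=\langle x\rangle\cap\langle x^{2},xy,y^{2}\rangle.$$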

The variety $V(\langle x^{2},xy\rangle)$ equals the line where $x=0$, corresponding to the unique minimal prime $\langle x\rangle$. The non-uniqueness of the primary decomposition is related to the fact that the variety of the “extra” component is a subset of one of the other components. This variety (which is superfluous in the irreducible decomposition) is called an embedded component. This example can be analyzed as follows using the computer algebra system Macaulay2 [19]. First set up a polynomial ring in the indeterminates $x,y$ with the rational numbers $\mathbb{Q}$ as the coefficient field. In Macaulay2 it is advisable to work with $\mathbb{Q}$ rather than $\mathbb{R}$ or $\mathbb{C}$ since the arithmetic in $\mathbb{Q}$ can be carried out exactly on a computer.

    i1 : R = QQ[x,y]

    o1 = R

    o1 : PolynomialRing

The system reports that it understands R as a polynomial ring. The following input makes Macaulay2 decompose the ideal. The decomposition is computed over $\mathbb{Q}$ , but in this case it happens to be valid over any field $\Bbbk$ .

    i2 : primaryDecomposition ideal (x^2, x*y)

                           2
    o2 = {ideal x, ideal (x , y)}

If one is only interested in the irreducible decomposition, the command decompose returns the minimal primes corresponding to the irreducible components, discarding all embedded components:

    i3 : decompose ideal (x^2, x*y)

    o3 = {ideal x}

2.3. Binomial ideals

This section ends with a short discussion of binomial ideals and toric ideals, which appear frequently in applications.

Definition 2.21.

A binomial is a polynomial $p^{u}-\lambda p^{v}$ , $\lambda\in\Bbbk$ with at most two terms. An ideal $I$ is a binomial ideal if it has a generating set of binomials. A binomial ideal that is prime and does not contain any variable is a toric ideal.

The main reason why it is important whether an ideal is binomial is that there are dedicated algorithms for binomial ideals that are much faster than the generic algorithms that work for any ideal [13, 10, 22, 23]. Note that there are some instances of ideals that arise in algebraic statistics that are not binomial in their natural coordinate systems but become binomial ideals after a linear change of coordinates [31].

Let $A\in\mathbb{Z}^{h\times r}$ be an integer matrix, and consider the ideal

$$I_{A} := \big\langle p^{u_{+}}-p^{u_{-}} : u\in\mathbb{Z}^{r},\ Au=0\big\rangle$$

in the polynomial ring $\Bbbk[p_{1},\dots,p_{r}]$ , where $u=u_{+}-u_{-}$ is the decomposition of $u$ into its positive and negative part $u_{+},u_{-}\in\mathbb{N}^{r}$ . Clearly, $I_{A}$ is binomial and does not contain any of the $p_{i}$ . One can also show that $I_{A}$ is prime, and thus it is an example of a toric ideal. In fact, any toric ideal is of this form up to a scaling of coordinates [13, Corollary 2.6]. The generating set above is infinite, but Theorem 3.1 in [9] shows that finite generating sets of toric ideals are related to Markov bases, which can be computed using the software 4ti2 [1].
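As a small illustration of how a kernel vector gives a binomial, the following Python sketch splits an integer vector $u$ with $Au=0$ into $u_{+}$ and $u_{-}$ and evaluates the binomial $p^{u_{+}}-p^{u_{-}}$ at a point of the form $p_{j}=t^{a_{j}}$, where $a_{j}$ is the $j$-th column of $A$. At such points $p^{u_{+}}=t^{Au_{+}}$ and $p^{u_{-}}=t^{Au_{-}}$, so the binomial vanishes because $Au_{+}=Au_{-}$. The matrix $A$, the vector $u$, the point $t$ and all function names are hypothetical choices made for this example.

```python
def matvec(A, u):
    """Integer matrix-vector product A u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def binomial_exponents(u):
    """Split u = u_plus - u_minus into its positive and negative parts."""
    u_plus = [max(x, 0) for x in u]
    u_minus = [max(-x, 0) for x in u]
    return u_plus, u_minus

def monomial(exponents, point):
    """Value of p^exponents at the given point."""
    value = 1
    for e, p in zip(exponents, point):
        value *= p ** e
    return value

# Hypothetical example matrix (not from the text) with columns a_1, a_2, a_3.
A = [[1, 1, 1],
     [0, 1, 2]]
u = [1, -2, 1]                           # integer vector with A u = 0
u_plus, u_minus = binomial_exponents(u)  # gives the binomial p_1 p_3 - p_2^2

# A point p_j = t^{a_j} = t_1^{A[0][j]} * t_2^{A[1][j]} for the sample value t = (3, 5).
t = (3, 5)
point = [t[0] ** A[0][j] * t[1] ** A[1][j] for j in range(3)]
binomial_value = monomial(u_plus, point) - monomial(u_minus, point)
```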

2.4. Real algebraic geometry

In addition to polynomial equations, in many situations in statistics it is useful to consider solutions to polynomial inequalities as well. This is the subject of the field of real algebraic geometry. Inequalities only make sense over an ordered field like $\mathbb{R}$ (but not over $\mathbb{C}$). For simplicity, the following definitions and results are formulated with $\mathbb{R}$. Again, this text only contains the basic definitions. For more details the reader is referred to [4, 5].

Definition 2.22.

Let $\mathcal{F},\mathcal{G}\subseteq\mathbb{R}[p_{1},\ldots,p_{r}]$ be sets of polynomials with $\mathcal{G}$ finite. The basic semialgebraic set defined by $\mathcal{F}$ and $\mathcal{G}$ is

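$$\{a\in\mathbb{R}^{r} : f(a)=0 \text{ for all } f\in\mathcal{F} \text{ and } g(a)>0 \text{ for all } g\in\mathcal{G}\}.$$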

A semialgebraic set is a finite union of basic semialgebraic sets.

Here are some common examples of semialgebraic sets arising in statistics.

Example 2.23.

The open probability simplex

$$\Big\{p\in\mathbb{R}^{r} : \sum_{i=1}^{r}p_{i}=1,\ p_{i}>0 \text{ for } i=1,\ldots,r\Big\}$$

consists of all strictly positive probability distributions of a categorical random variable with $r$ states. It is a basic semialgebraic set: in the above definition, one may take $\mathcal{F}=\big\{\sum_{i=1}^{r}p_{i}-1\big\}$ and $\mathcal{G}=\big\{p_{1},\ldots,p_{r}\big\}$. The probability simplex

$$\Big\{p\in\mathbb{R}^{r} : \sum_{i=1}^{r}p_{i}=1,\ p_{i}\geq 0 \text{ for } i=1,\ldots,r\Big\}$$

is a semialgebraic set. It can be written as the union of $2^{r}-1$ basic semialgebraic sets.
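The count $2^{r}-1$ comes from the possible supports: for each nonempty set of states, the distributions that are positive exactly on that set form one basic semialgebraic set (equations $p_{i}=0$ off the support and $\sum_{i}p_{i}=1$, strict inequalities $p_{i}>0$ on the support). The following Python sketch, with hypothetical function names, spells out the membership tests and enumerates the supports.

```python
from fractions import Fraction
from itertools import product

def in_open_simplex(p):
    """sum(p) - 1 vanishes and every coordinate p_i is strictly positive."""
    return sum(p) == 1 and all(x > 0 for x in p)

def in_closed_simplex(p):
    """sum(p) - 1 vanishes and every coordinate p_i is nonnegative."""
    return sum(p) == 1 and all(x >= 0 for x in p)

def support_patterns(r):
    """Nonempty supports; each labels one basic semialgebraic piece of the simplex."""
    return [s for s in product([False, True], repeat=r) if any(s)]

def support_of(p):
    """Support pattern of a point, i.e. the piece of the closed simplex containing it."""
    return tuple(x > 0 for x in p)

half = Fraction(1, 2)
boundary_point = (half, half, Fraction(0))  # lies in the closed but not the open simplex
```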

Example 2.24.

The cone $PD_{m}$ of $m\times m$ positive definite symmetric matrices is an example of a basic semialgebraic set in $\mathbb{R}^{\binom{m+1}{2}}$ , where $\mathcal{F}=\emptyset$ and where $\mathcal{G}$ consists of the principal subdeterminants of an $m\times m$ symmetric matrix of indeterminates. For instance, if $m=3$ consider the polynomial ring $\mathbb{R}[\sigma_{11},\sigma_{12},\sigma_{13},\sigma_{22},\sigma_{23},\sigma_{33}]$ and the symmetric matrix of indeterminates

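$$\Sigma=\begin{pmatrix}\sigma_{11}&\sigma_{12}&\sigma_{13}\\ \sigma_{12}&\sigma_{22}&\sigma_{23}\\ \sigma_{13}&\sigma_{23}&\sigma_{33}\end{pmatrix}.$$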

The symmetry has been enforced by making certain entries in the matrix equal. The set of polynomials defining $PD_{3}$ can be chosen to be

$$\mathcal{G}=\big\{\sigma_{11},\ \sigma_{11}\sigma_{22}-\sigma_{12}^{2},\ \det\Sigma\big\},$$

the set of leading principal minors of $\Sigma$ . The cone of positive semidefinite symmetric matrices is a semialgebraic set, which can be realized by using non-strict inequalities with the much larger set of all principal minors of $\Sigma$ .
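The difference between the two tests can be seen on small matrices. The Python sketch below, an illustration with hypothetical function names and example matrices, checks positive definiteness via the leading principal minors (Sylvester's criterion) and positive semidefiniteness via all principal minors. The matrix `N` shows why the leading minors alone do not suffice in the semidefinite case: its leading principal minors are all zero, yet it has a negative diagonal entry.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 0:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def submatrix(M, idx):
    """Principal submatrix of M with rows and columns in idx (0-based)."""
    return [[M[i][j] for j in idx] for i in idx]

def is_positive_definite(M):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det(submatrix(M, range(k))) > 0 for k in range(1, len(M) + 1))

def is_positive_semidefinite(M):
    """All principal minors, not only the leading ones, are nonnegative."""
    n = len(M)
    return all(det(submatrix(M, idx)) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))

S = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]   # leading principal minors 2, 3, 4
N = [[0, 0], [0, -1]]                   # leading principal minors 0, 0
leading_minors_N = [det(submatrix(N, range(k))) for k in (1, 2)]
```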

3. Conditional Independence Ideals

This section shows how the algebraic tools introduced in Section 2 can be used to analyze conditional independence structures. The tools can be applied in the settings of discrete random variables and jointly normal variables, but in different ways.

3.1. Discrete random variables

Let $X_{1},X_{2},\ldots,X_{m}$ be finite discrete random variables. Suppose that the state space of $X_{i}$ is $[r_{i}]:=\{1,2,\ldots,r_{i}\}$. There is an algebraic description of the set of all distributions that satisfy a given conditional independence statement. The first example comes from the simplest CI statement: $1\perp\!\!\!\perp 2$.

Proposition 3.1.

Let $X_{1},X_{2}$ be discrete random variables where the state space of $X_{i}$ is $[r_{i}]$. Let $p_{i_{1}i_{2}}=P(X_{1}=i_{1},X_{2}=i_{2})$ and let $p=(p_{i_{1}i_{2}})_{i_{1}\in[r_{1}],i_{2}\in[r_{2}]}$ be the joint probability mass function of $X_{1}$ and $X_{2}$. Then $1\perp\!\!\!\perp 2$ if and only if $p$ is a rank one matrix.

Proof.

If $1\perp\!\!\!\perp 2$ then $P(X_{1}=i_{1},X_{2}=i_{2})=P(X_{1}=i_{1})P(X_{2}=i_{2})$. This expresses the joint probability mass function as an outer product of two nonzero vectors, hence $p$ has rank one.

Conversely, if $p$ has rank one, it is expressed as the outer product of two vectors $p=\alpha^{T}\beta$. Since $p$ is a matrix of nonnegative real numbers, one can assume that $\alpha$ and $\beta$ are also nonnegative. Let $\|\cdot\|_{1}$ denote the $\ell_{1}$-norm. Replacing $\alpha$ by $\alpha/\|\alpha\|_{1}$ and $\beta$ by $\beta/\|\beta\|_{1}$ yields a rank one factorization for $p$ (the scalar $\|\alpha\|_{1}\|\beta\|_{1}$ equals $1$ because the entries of $p$ sum to one) where the two factors are necessarily the marginal distributions of $X_{1}$ and $X_{2}$ respectively. Hence $1\perp\!\!\!\perp 2$. ∎

A nonzero matrix having rank one is characterized by the vanishing of all its $2\times 2$ subdeterminants. Hence, one can associate an ideal to the independence statement $1\perp\!\!\!\perp 2$.

Definition 3.2.

The conditional independence ideal for the statement $1\perp\!\!\!\perp 2$ is

$$I_{1\perp\!\!\!\perp 2} := \big\langle p_{i_{1}i_{2}}p_{j_{1}j_{2}}-p_{i_{1}j_{2}}p_{j_{1}i_{2}} : i_{1},j_{1}\in[r_{1}],\ i_{2},j_{2}\in[r_{2}]\big\rangle.$$

Example 3.3.

Let $r_{1}=2$ and $r_{2}=3$. Then

$$I_{1\perp\!\!\!\perp 2}=\langle p_{11}p_{22}-p_{12}p_{21},\ p_{11}p_{23}-p_{13}p_{21},\ p_{12}p_{23}-p_{13}p_{22}\rangle.$$

The conditional independence ideal $I_{1\perp\!\!\!\perp 2}$ captures the algebraic structure of the independence condition. Although all probability distributions would satisfy the additional constraint that $\sum_{i_{1}\in[r_{1}],i_{2}\in[r_{2}]}p_{i_{1}i_{2}}-1=0$, this trivial constraint is not included in the conditional independence ideal because leaving it out tends to simplify certain algebraic calculations. For example, without this constraint $I_{1\perp\!\!\!\perp 2}$ is a binomial ideal.
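The determinantal description can be tried out on concrete tables. The following Python sketch, an illustration with hypothetical names and example distributions, evaluates all $2\times 2$ subdeterminants of a joint probability table for $r_{1}=2$ and $r_{2}=3$. They vanish for an outer product of two marginal distributions and not for a table of higher rank.

```python
from fractions import Fraction
from itertools import combinations

def two_by_two_minors(p):
    """All 2x2 subdeterminants p[i1][i2]*p[j1][j2] - p[i1][j2]*p[j1][i2]."""
    rows, cols = range(len(p)), range(len(p[0]))
    return [p[i1][i2] * p[j1][j2] - p[i1][j2] * p[j1][i2]
            for i1, j1 in combinations(rows, 2)
            for i2, j2 in combinations(cols, 2)]

# Independent case: p is the outer product of the two marginal distributions.
alpha = [Fraction(1, 3), Fraction(2, 3)]
beta = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
p_independent = [[a * b for b in beta] for a in alpha]

# Dependent case: a valid joint distribution that is not of rank one.
p_dependent = [[Fraction(1, 2), Fraction(0), Fraction(0)],
               [Fraction(0), Fraction(1, 4), Fraction(1, 4)]]

minors_independent = two_by_two_minors(p_independent)
minors_dependent = two_by_two_minors(p_dependent)
```

For a $2\times 3$ table there are three such subdeterminants, one for each pair of columns.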

More generally, any conditional independence condition for discrete random variables can be expressed by similar determinantal constraints. This requires a bit of notation. The determinantal constraints are written in terms of the entries of the joint distribution of $X_{1},\dots,X_{m}$ . This is a tensor $p=(p_{i_{1},\dots,i_{m}})_{i_{j}\in[r_{j}]}$ .

Let $A,B,C\subset[m]$ be disjoint subsets of indices of the random variables $X_{1},\dots,X_{m}$, and $D=[m]\setminus(A\cup B\cup C)$ the set of indices appearing in none of $A,B,C$. Any such assignment yields a grouping of indices and random variables. The random vector $X_{A}=(X_{j})_{j\in A}$ takes values in $\mathcal{R}_{A}=\prod_{j\in A}[r_{j}]$. Let $\mathcal{R}_{B},\mathcal{R}_{C}$ and $\mathcal{R}_{D}$ be defined analogously. The grouping allows one to write $p=(p_{i_{A},i_{B},i_{C},i_{D}})$ where now $i_{A}\in\mathcal{R}_{A}$ and similarly for $i_{B},i_{C}$, and $i_{D}$. The final notational ingredient is the marginalization of $p$ over $D$. This marginal distribution is indexed by $\mathcal{R}_{A}\times\mathcal{R}_{B}\times\mathcal{R}_{C}$ and has entries

$$p_{i_{A},i_{B},i_{C},+} := \sum_{i_{D}\in\mathcal{R}_{D}}p_{i_{A},i_{B},i_{C},i_{D}}.$$

The $+$ indicates the summation.

Definition 3.4.

The conditional independence ideal for the conditional independence statement $A\perp\!\!\!\perp B\mid C$ is

$$I_{A\perp\!\!\!\perp B\mid C} := \big\langle p_{i_{A},i_{B},i_{C},+}\,p_{j_{A},j_{B},i_{C},+}-p_{i_{A},j_{B},i_{C},+}\,p_{j_{A},i_{B},i_{C},+} : i_{A},j_{A}\in\mathcal{R}_{A},\ i_{B},j_{B}\in\mathcal{R}_{B},\ i_{C}\in\mathcal{R}_{C}\big\rangle.$$

The notation simplifies for saturated conditional independence statements, for which $A\cup B\cup C=[m]$. With this condition there is no marginalization, and the defining polynomials of $I_{A\perp\!\!\!\perp B\mid C}$ are binomials.

Example 3.5.

Consider three binary random variables $X_{1},X_{2},X_{3}$. Let $p_{111},\dots,p_{222}$ denote the indeterminates standing for the elementary probabilities in the joint distribution. The conditional independence ideal of the statement $1\perp\!\!\!\perp 3\mid 2$ is

$$I_{1\perp\!\!\!\perp 3\mid 2}=\langle p_{111}p_{212}-p_{112}p_{211},\ p_{121}p_{222}-p_{122}p_{221}\rangle.$$

The conditional independence ideal of the statement $1\perp\!\!\!\perp 3$ is

$$I_{1\perp\!\!\!\perp 3}=\langle(p_{111}+p_{121})(p_{212}+p_{222})-(p_{112}+p_{122})(p_{211}+p_{221})\rangle.$$
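To see the two ideals of this example at work, the following Python sketch evaluates their generators at one concrete distribution. The distribution is a hypothetical Markov chain $X_{1}\to X_{2}\to X_{3}$ chosen for illustration; it satisfies the conditional statement by construction, while $X_{1}$ and $X_{3}$ remain marginally dependent. Probabilities are stored in a dictionary keyed by $(i_{1},i_{2},i_{3})$.

```python
from fractions import Fraction
from itertools import product

F = Fraction
# Hypothetical Markov chain X1 -> X2 -> X3 on the states {1, 2}.
p1 = {1: F(1, 2), 2: F(1, 2)}
p2_given_1 = {1: {1: F(3, 4), 2: F(1, 4)}, 2: {1: F(1, 4), 2: F(3, 4)}}
p3_given_2 = {1: {1: F(2, 3), 2: F(1, 3)}, 2: {1: F(1, 3), 2: F(2, 3)}}
p = {(i, j, k): p1[i] * p2_given_1[i][j] * p3_given_2[j][k]
     for i, j, k in product((1, 2), repeat=3)}

# Generators for "1 independent of 3 given 2": one 2x2 determinant per value of X2.
conditional = [p[1, j, 1] * p[2, j, 2] - p[1, j, 2] * p[2, j, 1] for j in (1, 2)]

# Generator for "1 independent of 3": the determinant of the (X1, X3)-marginal.
m = {(i, k): p[i, 1, k] + p[i, 2, k] for i, k in product((1, 2), repeat=2)}
marginal = m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]
```

Here both entries of `conditional` are zero, whereas `marginal` is nonzero, so this distribution lies in the variety of the first ideal but not of the second.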

Proposition 3.6.

For any conditional independence statement $A\perp\!\!\!\perp B\mid C$, the conditional independence ideal $I_{A\perp\!\!\!\perp B\mid C}$ is a prime ideal and hence $V(I_{A\perp\!\!\!\perp B\mid C})$ is an irreducible variety.

Proposition 3.6 is a consequence of the fact that general determinantal ideals are prime (see [6]). Irreducibility of the variety $V(I_{A\perp\!\!\!\perp B\mid C})$ can also be deduced from the fact that this variety can be parametrized; for instance, the set of all probability distributions in $V(I_{A\perp\!\!\!\perp B\mid C})$ can be realized as the set of probability distributions in a graphical model.

Example 3.7.

A strictly positive joint distribution $p$ of binary random variables $X_{1},X_{2},X_{3}$ satisfies $1\perp\!\!\!\perp 3\mid 2$ if and only if

$$p_{i_{1}i_{2}i_{3}}=s_{i_{1},i_{2}}\,t_{i_{2},i_{3}}\tag{1}$$

for some vectors $(s_{i_{1},i_{2}})_{i_{1}\in[r_{1}],i_{2}\in[r_{2}]}$, $(t_{i_{2},i_{3}})_{i_{2}\in[r_{2}],i_{3}\in[r_{3}]}$ (see Section 1.3 of Part I). That is, it lies in the undirected graphical model

[Figure: the undirected path graph on the vertices $1$, $2$, $3$ with edges $\{1,2\}$ and $\{2,3\}$.]

Since $V(I_{A\perp\!\!\!\perp B\mid C})$ is irreducible, any joint distribution (possibly with zeros) that satisfies $1\perp\!\!\!\perp 3\mid 2$ lies in the closure of the undirected graphical model. In fact, any such joint distribution has a parametrization of the form (1), where $s$ or $t$ also may have zeros.

More interesting than just single statements are combinations of two or more conditional independence statements. To determine the classes of distributions satisfying a collection of independence statements leads to interesting problems in computational algebra. Such sets are typically not irreducible varieties and cannot be parametrized with a single parametrization. The first task is to break such a set into components, and to see if those components have natural interpretations in terms of conditional independence and can be parametrized.

Definition 3.8.

Let $\mathcal{C}=\{A_{1}\perp\!\!\!\perp B_{1}\mid C_{1},\ A_{2}\perp\!\!\!\perp B_{2}\mid C_{2},\ldots\}$ be a set of conditional independence statements for the random variables $X_{1},X_{2},\ldots,X_{m}$. The conditional independence ideal of $\mathcal{C}$ is the sum of the conditional independence ideals of the elements of $\mathcal{C}$:

$$I_{\mathcal{C}} := I_{A_{1}\perp\!\!\!\perp B_{1}\mid C_{1}}+I_{A_{2}\perp\!\!\!\perp B_{2}\mid C_{2}}+\cdots.$$

Understanding the probability distributions that satisfy $\mathcal{C}$ can be accomplished by analyzing an irreducible decomposition of $V(I_{\mathcal{C}})$ , which can be obtained from a primary decomposition of $I_{\mathcal{C}}$ .

Example 3.9.

Let $X_{1}$, $X_{2}$, $X_{3}$ be binary random variables, and consider $\mathcal{C}=\{1\perp\!\!\!\perp 3\mid 2,\ 1\perp\!\!\!\perp 3\}$. The conditional independence ideal $I_{\mathcal{C}}$ is generated by three polynomials of degree $2$:

$$I_{\mathcal{C}}=\langle p_{111}p_{212}-p_{112}p_{211},\ p_{121}p_{222}-p_{122}p_{221},\ (p_{111}+p_{121})(p_{212}+p_{222})-(p_{112}+p_{122})(p_{211}+p_{221})\rangle.$$

The following Macaulay2 code asks for the primary decomposition of this ideal over $\mathbb{Q}$ . It can be shown that the decomposition is the same over $\mathbb{R}$ and $\mathbb{C}$ .

    loadPackage "GraphicalModels"
    S = markovRing (2,2,2)
    L = {{{1},{3},{2}}, {{1},{3},{}}}
    I = conditionalIndependenceIdeal(S,L)
    primaryDecomposition I

This code uses the GraphicalModels package of Macaulay2, which implements many convenient functions for working with graphical and other conditional independence models. In particular, markovRing sets up the polynomial ring with eight variables $p_{111},\dots,p_{222}$, and conditionalIndependenceIdeal writes out the equations for $I_{\mathcal{C}}$. The command primaryDecomposition is a generic Macaulay2 command. The output of this code consists of two ideals which, upon inspection, can be recognized as binomial conditional independence ideals themselves. The result is

[TABLE]

According to Section 2 this implies a decomposition of varieties

[TABLE]

On the level of probability distributions, this proves the binary case of Proposition 1.2.
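One direction of this decomposition is easy to check numerically. The sketch below evaluates the three quadrics of Example 3.9, written out here from the definitions (one $2\times 2$ determinant for each value of $X_{2}$, and one determinant of the marginal table of $(X_{1},X_{3})$), at three hypothetical points: a distribution with $X_{1}\perp\!\!\!\perp\{X_{2},X_{3}\}$, a distribution with $\{X_{1},X_{2}\}\perp\!\!\!\perp X_{3}$, and a distribution in which $X_{3}$ copies $X_{1}$. The function name `generators` and all numerical values are illustrative choices, not part of the chapter.

```python
from fractions import Fraction as F
from itertools import product

STATES = list(product((1, 2), repeat=3))  # (i, j, k) = values of (X1, X2, X3)

def generators(p):
    """Evaluate the three quadrics for {1 _||_ 3 | 2, 1 _||_ 3} at the point p."""
    # One 2x2 determinant for each fixed value j of X2 (statement 1 _||_ 3 | 2).
    cond = [p[1, j, 1] * p[2, j, 2] - p[1, j, 2] * p[2, j, 1] for j in (1, 2)]
    # Marginal table of (X1, X3), obtained by summing out X2 (statement 1 _||_ 3).
    m = {(i, k): p[i, 1, k] + p[i, 2, k] for i in (1, 2) for k in (1, 2)}
    marg = m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]
    return cond + [marg]

# Hypothetical point with X1 independent of (X2, X3): p_ijk = a_i * b_jk.
a = {1: F(1, 3), 2: F(2, 3)}
b = {(1, 1): F(1, 10), (1, 2): F(2, 10), (2, 1): F(3, 10), (2, 2): F(4, 10)}
p_first = {(i, j, k): a[i] * b[j, k] for i, j, k in STATES}

# Hypothetical point with (X1, X2) independent of X3: p_ijk = c_ij * d_k.
d = {1: F(1, 4), 2: F(3, 4)}
p_second = {(i, j, k): b[i, j] * d[k] for i, j, k in STATES}

# A point off the variety: X3 is a copy of X1, X2 is an independent fair coin.
p_copy = {(i, j, k): (F(1, 4) if i == k else F(0)) for i, j, k in STATES}

print(generators(p_first))   # all three values are zero
print(generators(p_second))  # all three values are zero
print(generators(p_copy))    # no value is zero
```

Exact rational arithmetic is used so that "vanishes" means exactly zero rather than zero up to rounding.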

The general situation may be less favorable than that in Example 3.9. In particular, the components that appear need not have interpretations in terms of conditional independence. The ideals that appear also need not be prime (in general they are only primary), and it is unclear what this extra algebraic information may reveal about conditional independence. For examples of how to extract information from primary decompositions see [20, 24].

3.2. Gaussian random variables

Algebraic approaches to conditional independence can also be applied to Gaussian random variables. Let $X\in\mathbb{R}^{m}$ be a nonsingular multivariate Gaussian random vector with mean $\mu\in\mathbb{R}^{m}$ and covariance matrix $\Sigma\in PD_{m}$ , the cone of $m\times m$ symmetric positive definite matrices. One writes $X\sim\mathcal{N}(\mu,\Sigma)$ . For subsets $A,B\subseteq[m]$ let $\Sigma_{A,B}$ be the submatrix of $\Sigma$ obtained by extracting rows indexed by $A$ and columns indexed by $B$ , that is $\Sigma_{A,B}=(\sigma_{a,b})_{a\in A,b\in B}$ .

Proposition 3.10.

Let $X\sim\mathcal{N}(\mu,\Sigma)$ with $\Sigma\in PD_{m}$. Let $A,B,C\subseteq[m]$ be disjoint subsets. Then the conditional independence statement $A\perp\!\!\!\perp B\mid C$ holds if and only if the matrix $\Sigma_{A\cup C,B\cup C}$ has rank $\leq\#C$.

A proof of this proposition recognizes $\Sigma_{A\cup C,B\cup C}$ as a Schur complement of a submatrix of $\Sigma$ . The details can be found in [11, Proposition 3.1.13].

Just as the rank one condition on a matrix was characterized by the vanishing of $2\times 2$ subdeterminants, higher rank conditions on matrices can also be characterized by the vanishing of subdeterminants. Indeed, a basic fact of linear algebra is that a matrix has rank $\leq r$ if and only if the determinant of every $(r+1)\times(r+1)$ submatrix is zero. This leads to the conditional independence ideals for multivariate Gaussian random variables.
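The rank criterion of Proposition 3.10 can be tested directly through minors. The following is a minimal sketch in exact arithmetic; the helper names (`det`, `submatrix`, `rank_at_most`, `gaussian_ci`) and the sample covariance matrix are assumptions made for illustration, and the Laplace expansion is only sensible for very small matrices.

```python
from fractions import Fraction as F
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return F(1)
    return sum(
        (-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
        for c in range(len(M))
    )

def submatrix(S, rows, cols):
    """Extract S_{rows, cols}; indices are 1-based as in the text."""
    return [[S[i - 1][j - 1] for j in cols] for i in rows]

def rank_at_most(M, r):
    """True iff every (r+1) x (r+1) minor of M vanishes."""
    k = r + 1
    return all(
        det([[M[i][j] for j in cs] for i in rs]) == 0
        for rs in combinations(range(len(M)), k)
        for cs in combinations(range(len(M[0])), k)
    )

def gaussian_ci(S, A, B, C):
    """Check A _||_ B | C through rank(S_{A u C, B u C}) <= #C."""
    return rank_at_most(submatrix(S, A + C, B + C), len(C))

# Hypothetical positive definite covariance matrix (its leading minors are 1, 1, 1).
Sigma = [[F(1), F(2), F(6)],
         [F(2), F(5), F(15)],
         [F(6), F(15), F(46)]]

print(gaussian_ci(Sigma, [1], [3], [2]))  # 1 _||_ 3 | 2  -> True
print(gaussian_ci(Sigma, [1], [3], []))   # 1 _||_ 3      -> False
```

For this matrix the conditional statement holds because $\sigma_{13}\sigma_{22}-\sigma_{12}\sigma_{23}=6\cdot 5-2\cdot 15=0$, while the marginal statement fails because $\sigma_{13}\neq 0$.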

Let $\mathbb{R}[\Sigma]:=\mathbb{R}[\sigma_{ij}:1\leq i\leq j\leq m]$ be the polynomial ring with real coefficients in the entries of the symmetric matrix $\Sigma$ .

Definition 3.11.

The Gaussian conditional independence ideal for the conditional independence statement $A\perp\!\!\!\perp B\mid C$ is the ideal

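$$J_{A\perp\!\!\!\perp B\mid C}:=\langle\,(\#C+1)\text{-minors of }\Sigma_{A\cup C,B\cup C}\,\rangle\subseteq\mathbb{R}[\Sigma].$$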

If $\mathcal{C}=\{A_{1}\perp\!\!\!\perp B_{1}\mid C_{1},\;A_{2}\perp\!\!\!\perp B_{2}\mid C_{2},\ldots\}$ is a collection of conditional independence statements, the Gaussian conditional independence ideal is

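$$J_{\mathcal{C}}=J_{A_{1}\perp\!\!\!\perp B_{1}\mid C_{1}}+J_{A_{2}\perp\!\!\!\perp B_{2}\mid C_{2}}+\cdots$$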

*Remark 3.12**.*

A common criterion in statistics says that, in fact, $A\perp\!\!\!\perp B\mid C$ holds if and only if $\det(\Sigma_{\{a\}\cup C,\{b\}\cup C})$ vanishes for all $a\in A$, $b\in B$. Since $\Sigma$ is positive definite by assumption, the submatrix $\Sigma_{C,C}$ is non-singular, and it is easy to see that this condition is equivalent to the vanishing of all $(\#C+1)$-minors of $\Sigma_{A\cup C,B\cup C}$.
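The role of non-singularity in this remark can be made explicit with the standard block determinant identity. The display below is a sketch that orders the rows and columns so that $a$ and $b$ come first (a convention chosen here, which affects the determinant only up to sign):

```latex
\det \Sigma_{\{a\}\cup C,\{b\}\cup C}
  = \det\begin{pmatrix} \sigma_{ab} & \Sigma_{a,C} \\ \Sigma_{C,b} & \Sigma_{C,C} \end{pmatrix}
  = \det(\Sigma_{C,C}) \cdot \bigl( \sigma_{ab} - \Sigma_{a,C}\,\Sigma_{C,C}^{-1}\,\Sigma_{C,b} \bigr).
```

Because $\det(\Sigma_{C,C})>0$ for $\Sigma\in PD_{m}$, the determinant on the left vanishes exactly when the Schur complement $\sigma_{ab}-\Sigma_{a,C}\Sigma_{C,C}^{-1}\Sigma_{C,b}$ is zero. This quantity is the conditional covariance of $X_{a}$ and $X_{b}$ given $X_{C}$.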

*Example 3.13**.*

Consider the conditional independence statement $2\perp\!\!\!\perp\{1,3\}\mid 4$. The ideal $J_{2\perp\!\!\!\perp\{1,3\}\mid 4}$ is generated by the $2\times 2$ minors of the matrix

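$$\Sigma_{\{2,4\},\{1,3,4\}}=\begin{pmatrix}\sigma_{21}&\sigma_{23}&\sigma_{24}\\ \sigma_{41}&\sigma_{43}&\sigma_{44}\end{pmatrix}.$$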

Since $\Sigma$ is a symmetric matrix, $\sigma_{ij}=\sigma_{ji}$ and one always writes $\sigma_{ij}$ with $i\leq j$ . Then

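$$J_{2\perp\!\!\!\perp\{1,3\}\mid 4}=\langle\,\sigma_{12}\sigma_{34}-\sigma_{14}\sigma_{23},\;\;\sigma_{12}\sigma_{44}-\sigma_{14}\sigma_{24},\;\;\sigma_{23}\sigma_{44}-\sigma_{24}\sigma_{34}\,\rangle.$$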
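The generators in Example 3.13 can be listed mechanically from the row set $A\cup C=\{2,4\}$ and the column set $B\cup C=\{1,3,4\}$. The sketch below does this with plain strings; the naming scheme `s12` for $\sigma_{12}$ and the function names are hypothetical conveniences, not notation from the chapter.

```python
from itertools import combinations

def sigma(i, j):
    """Name of the entry sigma_ij, written with i <= j because Sigma is symmetric."""
    i, j = min(i, j), max(i, j)
    return f"s{i}{j}"

def two_by_two_minors(rows, cols):
    """List the 2x2 minors of Sigma_{rows, cols} as strings."""
    minors = []
    for r1, r2 in combinations(rows, 2):
        for c1, c2 in combinations(cols, 2):
            minors.append(
                f"{sigma(r1, c1)}*{sigma(r2, c2)} - {sigma(r1, c2)}*{sigma(r2, c1)}"
            )
    return minors

# Statement 2 _||_ {1,3} | 4: rows A u C = {2,4}, columns B u C = {1,3,4}.
for m in two_by_two_minors([2, 4], [1, 3, 4]):
    print(m)
# s12*s34 - s23*s14
# s12*s44 - s24*s14
# s23*s44 - s24*s34
```

The single-digit naming only works for $m\leq 9$; a larger example would need a separator between the two indices.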

*Example 3.14**.*

Let $X_{1}$, $X_{2}$, $X_{3}$ be jointly Gaussian random variables. The conditional independence ideal of $\mathcal{C}=\{1\perp\!\!\!\perp 3\mid 2,\;1\perp\!\!\!\perp 3\}$ is

$$J_{\mathcal{C}}=\langle\,\sigma_{13},\;\sigma_{13}\sigma_{22}-\sigma_{12}\sigma_{23}\,\rangle.$$

Straightforward manipulations of these ideals show

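$$J_{\mathcal{C}}=\langle\sigma_{13},\,\sigma_{12}\sigma_{23}\rangle=\langle\sigma_{12},\sigma_{13}\rangle\cap\langle\sigma_{13},\sigma_{23}\rangle=J_{1\perp\!\!\!\perp\{2,3\}}\cap J_{\{1,2\}\perp\!\!\!\perp 3}.$$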

This last primary decomposition proves the Gaussian case of Proposition 1.2.

3.3. The contraction axiom

When computing the decompositions of conditional independence ideals, there might be components that are "uninteresting" from the statistical standpoint. These components might not intersect the region of interest in probabilistic applications (e.g. they might miss the probability simplex or the cone of positive definite matrices), or they might have a non-trivial intersection with it that is contained in some other component.

*Example 3.15**.*

Let $X=(X_{1},X_{2},X_{3})$ be a multivariate Gaussian random vector. The conditional independence ideal of $\mathcal{C}=\{1\perp\!\!\!\perp 2\mid 3,\;2\perp\!\!\!\perp 3\}$ is

$$J_{\mathcal{C}}=\langle\,\sigma_{12}\sigma_{33}-\sigma_{13}\sigma_{23},\;\sigma_{23}\,\rangle,$$

which has primary decomposition

$$J_{\mathcal{C}}=\langle\sigma_{12},\sigma_{23}\rangle\cap\langle\sigma_{23},\sigma_{33}\rangle.$$

This decomposition shows that

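$$V(J_{\mathcal{C}})=V(\langle\sigma_{12},\sigma_{23}\rangle)\cup V(\langle\sigma_{23},\sigma_{33}\rangle).$$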

However, the second component does not intersect the positive definite cone, because $\sigma_{33}>0$ for all $\Sigma\in PD_{3}$. The first component is the conditional independence ideal $J_{\{1,3\}\perp\!\!\!\perp 2}$. From this decomposition it is visible that $1\perp\!\!\!\perp 2\mid 3$ and $2\perp\!\!\!\perp 3$ imply that $\{1,3\}\perp\!\!\!\perp 2$. This implication is called the contraction axiom. See Section 1.5 of Part I for other CI axioms.

The contraction axiom also holds for non-Gaussian random variables. For discrete random variables, it can again be checked algebraically. The primary decomposition associated to the discrete contraction axiom is worked out in detail in [17]. The next example discusses the binary case as an illustration:

*Example 3.16**.*

Let $X_{1},X_{2},X_{3}$ be binary random variables. The conditional independence ideal of $\mathcal{C}=\{1\perp\!\!\!\perp 2\mid 3,\;2\perp\!\!\!\perp 3\}$ is

[TABLE]

which has primary decomposition

[TABLE]

The intersection of the second component with the probability simplex forces that $p_{122}=p_{222}=p_{112}=p_{212}=0$ . This in turn implies that

[TABLE]

A similar argument holds for the third component. So although the variety $V(I_{\mathcal{C}})$ has three components, only one of them is statistically meaningful:

[TABLE]

4. Examples of Decompositions of Conditional Independence Ideals

This section studies some examples of families of conditional independence statements and how algebraic tools can be used to understand them. The first example is a detailed study of the intersection axiom, and the second example concerns the conditional independence statements associated to the $4$ -cycle graph.

4.1. The intersection axiom

The intersection axiom (see Section 1.5) is the following implication of conditional independence statements:

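$$A\perp\!\!\!\perp B\mid C\cup D\quad\text{and}\quad A\perp\!\!\!\perp C\mid B\cup D\quad\Longrightarrow\quad A\perp\!\!\!\perp B\cup C\mid D.$$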

This implication is valid for strictly positive probability distributions. Algebraic techniques can be used to study how its validity extends beyond this.

The question about the primary decomposition of the ideal(s) $I_{\{A\perp\!\!\!\perp B\mid C\cup D,\;A\perp\!\!\!\perp C\mid B\cup D\}}$ was first asked in [11, Chapter 6.6], and the answer is due to [15]. Grouping variables if necessary, one can assume $A=\{1\}$, $B=\{2\}$, $C=\{3\}$. Moreover, let $D=\emptyset$. From this one can always recover the general case by adding conditioning constraints.

Proposition 4.1 (Proposition 1 in [15]).

The ideal $I_{\{X_{1}\perp\!\!\!\perp X_{2}\mid X_{3},\;X_{1}\perp\!\!\!\perp X_{3}\mid X_{2}\}}$ is radical, that is, its irredundant primary decomposition consists only of prime ideals. These minimal primes correspond to pairs of partitions $[r_{2}]=A_{1}\cup\dots\cup A_{s}$, $[r_{3}]=B_{1}\cup\dots\cup B_{s}$ of the same size. The minimal prime $P$ corresponding to two partitions is

[TABLE]

The paper [15] uses a different formulation in terms of complete bipartite graphs. It can be seen that our formulation is equivalent.

To give a statistical interpretation to Proposition 4.1, whenever the joint distribution of $X_{1},X_{2},X_{3}$ lies in the prime $P$ corresponding to the two partitions $[r_{2}]=A_{1}\cup\dots\cup A_{s}$, $[r_{3}]=B_{1}\cup\dots\cup B_{s}$, construct a random variable $B$ as follows: put $B:=j$ whenever $X_{2}\in A_{j}$ and $X_{3}\in B_{j}$. Thus, $B$ is uniquely defined except on a set of measure zero, since $P(X_{2}\in A_{j},X_{3}\in B_{k})=0$ for $j\neq k$, which follows from the containment of monomials in $P$. The variable $B$ specifies in which blocks of the two partitions the random variables $X_{2}$ and $X_{3}$ lie. Now the binomials in $P$ imply that $X_{1}\perp\!\!\!\perp\{X_{2},X_{3}\}\mid B$.

Corollary 4.2.

Suppose that $X_{1},X_{2},X_{3}$ satisfy $X_{1}\perp\!\!\!\perp X_{2}\mid X_{3}$ and $X_{1}\perp\!\!\!\perp X_{3}\mid X_{2}$. Then there is a random variable $B$ that satisfies:

(1) $B$ is a (deterministic) function of $X_{2}$;

(2) $B$ is a (deterministic) function of $X_{3}$;

(3) $X_{1}\perp\!\!\!\perp\{X_{2},X_{3}\}\mid B$.

Conversely, whenever there exists a random variable $B$ with properties (1) to (3), the random variables $X_{1},X_{2},X_{3}$ satisfy $X_{1}\perp\!\!\!\perp X_{2}\mid X_{3}$ and $X_{1}\perp\!\!\!\perp X_{3}\mid X_{2}$.

The case where $B$ is a constant corresponds to the CI statement $X_{1}\perp\!\!\!\perp\{X_{2},X_{3}\}$. The intersection axiom can be recovered by noting that a function $B$ that is a function of only $X_{2}$ as well as a function of only $X_{3}$ is necessarily constant, if the joint distribution of $X_{2}$ and $X_{3}$ is strictly positive.
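A small example shows why strict positivity matters. In the sketch below, $X_{1}=X_{2}=X_{3}$ is a single fair coin, so $B:=X_{2}=X_{3}$ is a non-constant function of $X_{2}$ and of $X_{3}$. Both hypotheses of the intersection axiom hold, but its conclusion fails. The functions `marginal` and `ci_holds` are hypothetical helpers; they test $A\perp\!\!\!\perp B\mid C$ through the identity $p(a,b,c)\,p(c)=p(a,c)\,p(b,c)$, with variables addressed by 0-based position.

```python
from fractions import Fraction as F
from itertools import product

def marginal(p, keep):
    """Marginal pmf of the coordinates listed in keep (0-based positions)."""
    out = {}
    for x, v in p.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, F(0)) + v
    return out

def ci_holds(p, A, B, C):
    """Check A _||_ B | C via p(a,b,c) p(c) = p(a,c) p(b,c) for all states."""
    pabc, pac = marginal(p, A + B + C), marginal(p, A + C)
    pbc, pc = marginal(p, B + C), marginal(p, C)
    for key, v in pabc.items():
        a = key[:len(A)]
        b = key[len(A):len(A) + len(B)]
        c = key[len(A) + len(B):]
        if v * pc[c] != pac[a + c] * pbc[b + c]:
            return False
    return True

# X1 = X2 = X3 is a fair coin; every state is listed, including those of mass zero.
p = {x: (F(1, 2) if x[0] == x[1] == x[2] else F(0))
     for x in product((0, 1), repeat=3)}

print(ci_holds(p, [0], [1], [2]))    # X1 _||_ X2 | X3   -> True
print(ci_holds(p, [0], [2], [1]))    # X1 _||_ X3 | X2   -> True
print(ci_holds(p, [0], [1, 2], []))  # X1 _||_ {X2, X3}  -> False
```

The input dictionary must list every joint state, since the check iterates over the keys of the marginal tables.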

Similar results hold for all families of CI statements of the form $A\perp\!\!\!\perp B_{i}\mid C_{i}$, where $A\cup B_{i}\cup C_{i}=V$ and where $A$ is fixed for all statements; see [28, 29] (the case where $X_{A}$ is binary was already described in [20]). The corresponding CI ideal is still radical, and the minimal primes have a similar interpretation. However, finding the minimal primes is more difficult and involves solving a combinatorial problem. Finally, Corollary 4.2 can be generalized to continuous random variables [27].

4.2. The four-cycle

Consider four discrete random variables $X_{1},X_{2},X_{3},X_{4}$ and the (undirected) graphical model of the four cycle $C_{4}=(V,E)$ with edge set $E=\{(1,2),(2,3),(3,4),(1,4)\}$ . The global Markov CI statements of this graph are

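$$\operatorname{global}(C_{4})=\{\,1\perp\!\!\!\perp 3\mid\{2,4\},\;\;2\perp\!\!\!\perp 4\mid\{1,3\}\,\}.$$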

The primary decomposition of the corresponding CI ideal was studied in [24] in the case where $X_{1}$ and $X_{3}$ are binary. The case that all variables are binary is as follows:

Proposition 4.3 (Theorem 5.6 in [24]).

The minimal primes of the CI ideal $I_{\operatorname{global}(C_{4})}$ of the binary four cycle are the toric ideal $I_{C_{4}}$ and the monomial ideals

[TABLE]

The ideal $I_{C_{4}}$ equals the vanishing ideal of the graphical model, to be discussed in Section 5. Interestingly, in this primary decomposition, all ideals except $I_{C_{4}}$ are monomial ideals; that is, they only give support restrictions on the probability distribution.

The primary decomposition of the CI ideal gives an irreducible decomposition of the corresponding set of probability distributions. This leads to the statement of Proposition 1.1 in the introduction.

When $X_{2}$ and $X_{4}$ are not binary, the decomposition of $I_{\operatorname{global}(C_{4})}$ involves prime ideals parametrized by $i\in\{2,4\}$ and two sets $\emptyset\neq C,D\subsetneq[r_{i}]$ . For such a choice of $i,C,D$ , let $j$ denote the element of $\{2,4\}\setminus\{i\}$ , and let

[TABLE]

The result is the following:

Proposition 4.4 (Theorem 6.5 in [24]).

Let $X_{1}$ and $X_{3}$ be binary random variables. The minimal primes of the CI ideal $I_{\operatorname{global}(C_{4})}$ of the four cycle are the toric ideal $I_{C_{4}}$ and the ideals $P_{i,C,D}$ for $i\in\{2,4\}$ and $\emptyset\neq C,D\subsetneq[r_{i}]$ . Furthermore the ideal is radical and thus equals the intersection of its minimal primes.

In this case, the non-toric primes are not monomial, but consist of monomials and binomials. This fact is independent of the field $\Bbbk$ . The following is one example of the kind of information that can be extracted from knowing the minimal primes of a conditional independence ideal.

Corollary 4.5.

Let $X_{1},X_{2},X_{3},X_{4}$ be finite random variables that satisfy $\operatorname{global}(C_{4})$ , and suppose that $X_{1}$ and $X_{3}$ are binary. Then one (or more) of the following statements is true:

(1) The joint distribution lies in the closure of the graphical model.

(2) There is $i\in\{2,4\}$ and there are sets $E,F\subseteq[r_{i}]$ such that the following holds:

[TABLE]

Conversely, any probability distribution that satisfies one of these statements and that satisfies $2\perp\!\!\!\perp 4\mid\{1,3\}$ also satisfies $\operatorname{global}(C_{4})$.

5. The Vanishing Ideal of a Graphical Model

Graphical models can be represented via either parametric descriptions (e.g. factorizations of the density function) or implicit descriptions (e.g. Markov properties and conditional independence constraints). One use of the algebraic perspective on graphical models is to find the complete implicit description of the model, in particular, to find the vanishing ideal of the model. As described in Definition 2.8, the vanishing ideal of a set $S$ is the set of all polynomial functions that evaluate to zero at every point in $S$ . Although some graphical models have complete descriptions only in terms of conditional independence constraints, understanding the vanishing ideal can be useful for more complex models or hidden variable models where conditional independence is not sufficient to describe the model, for instance, the mixed graph models studied in Section 2 of Part I.

*Example 5.1**.*

Consider the four cycle $C_{4}$ and let $X_{1},X_{2},X_{3},X_{4}$ be binary random variables. The vanishing ideal $I_{C_{4}}\subseteq\mathbb{R}[p_{i_{1}i_{2}i_{3}i_{4}}:i_{1},i_{2},i_{3},i_{4}\in\{1,2\}]$ is generated by $16$ binomials, $8$ of which have degree $2$ and $8$ of which have degree $4$. The degree $2$ binomials are all implied by the two conditional independence statements $1\perp\!\!\!\perp 3\mid\{2,4\}$ and $2\perp\!\!\!\perp 4\mid\{1,3\}$. On the other hand, the degree $4$ binomials are not implied by the conditional independence constraints, even when we restrict to probability distributions. One example of a degree $4$ binomial is

[TABLE]

and the others are obtained by applying the symmetry group of the four cycle and permuting levels of the random variables.

Even in the simple example of the four cycle, there are generators of the vanishing ideal that do not correspond to conditional independence statements. It seems an important problem to try to understand what other types of equations can arise. Theorem 3.2 in [18] shows that the vanishing ideal of a graphical model of discrete random variables is the toric ideal $I_{A}$, where $A$ is the design matrix of the graphical model, defined at the end of Section 2. A classification for discrete random variables and undirected graphs of when no further polynomials are needed beyond conditional independence constraints is obtained in the following theorem:

Theorem 5.2 (Theorem 4.4 in [18]).

Let $G$ be an undirected graph and let $\mathcal{M}_{G}$ be its graphical model for discrete random variables. Then the vanishing ideal $I(\mathcal{M}_{G})$ is equal to the conditional independence ideal $I_{\operatorname{global}(G)}$ if and only if $G$ is a chordal graph.
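The chordality condition in Theorem 5.2 is easy to test for small graphs. The sketch below uses the standard fact that a graph is chordal if and only if its vertices can be removed one at a time, always removing a simplicial vertex (one whose remaining neighbours are pairwise adjacent). The function name `is_chordal` and the second test graph are illustrative assumptions; for large graphs a linear-time method such as maximum cardinality search would be preferable.

```python
def is_chordal(vertices, edges):
    """Greedy simplicial-vertex elimination; True iff the graph is chordal."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        for v in sorted(remaining):
            nbrs = adj[v] & remaining
            # v is simplicial if its remaining neighbours form a clique.
            if all(b in adj[a] for a in nbrs for b in nbrs if a != b):
                remaining.remove(v)
                break
        else:
            # No simplicial vertex is left, so there is a chordless cycle.
            return False
    return True

four_cycle = [(1, 2), (2, 3), (3, 4), (1, 4)]
print(is_chordal([1, 2, 3, 4], four_cycle))             # False
print(is_chordal([1, 2, 3, 4], four_cycle + [(1, 3)]))  # True
```

The first result agrees with Example 5.1: the four cycle is not chordal, and its vanishing ideal needs generators beyond those of $I_{\operatorname{global}(C_{4})}$.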

It is unknown what the appropriate analogue of Theorem 5.2 is for other families of graphical models, either with different classes of graphs (e.g. DAGs) or with other types of random variables (e.g. Gaussian). Computational studies of the vanishing ideals appear in many different papers: for Bayesian networks with discrete random variables [17], for Bayesian networks with Gaussian random variables [33], for undirected graphical models with Gaussian random variables [32]. A characterization for which graph families the vanishing ideal is equal to the conditional independence ideal of global Markov statements is lacking in all these cases.

One natural question is to determine the other generators of the vanishing ideal that do not come from conditional independence, and to give combinatorial structures in the underlying graphs that imply that these more general constraints hold. For instance, for mixed Gaussian graphical models, conditional independence constraints are determinantal, but not every determinantal constraint comes from a conditional independence statement, and there is a characterization of which determinantal constraints come from conditional independence:

A mixed graph $G=([m],B,D)$ is a triple where $[m]=\{1,2,\ldots,m\}$ is the vertex set, $B$ is a set of unordered pairs of $[m]$ representing bidirected edges in $G$, and $D$ is a set of ordered pairs of $[m]$ representing directed edges in $G$. There might also be both directed and bidirected edges between a pair of vertices. To the set $B$ of bidirected edges one associates the set of symmetric positive definite matrices

$$PD(B):=\{\Omega\in PD_{m}:\omega_{ij}=0\text{ if }i\neq j\text{ and }i\leftrightarrow j\notin B\}.$$

To the set of directed edges one associates the set of $m\times m$ matrices

$$\mathbb{R}^{D}:=\{\Lambda\in\mathbb{R}^{m\times m}:\lambda_{ij}=0\text{ if }i\to j\notin D\}.$$

Let $\epsilon\sim\mathcal{N}(0,\Omega)$ and let $X$ be a jointly normal random vector satisfying the structural equation system

$$X=\Lambda^{T}X+\epsilon.$$

This is an example of a linear structural equation model, and contains as special cases various families of graphical models. Let $\mathrm{Id}$ denote the $m\times m$ identity matrix. With these assumptions, if $(\mathrm{Id}-\Lambda)$ is invertible,

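$$X\sim\mathcal{N}(0,\Sigma)\quad\text{with}\quad\Sigma=(\mathrm{Id}-\Lambda)^{-T}\,\Omega\,(\mathrm{Id}-\Lambda)^{-1}.$$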

*Example 5.3**.*

Consider the mixed graph $G$ from Figure 1. In this case, $PD(B)$ is the set of positive definite matrices of the form

[TABLE]

and $\mathbb{R}^{D}$ is the set of real matrices of the form

[TABLE]

A positive definite matrix $\Sigma$ belongs to the graphical model associated to this mixed graph, if and only if there are $\Omega\in PD(B)$ and $\Lambda\in\mathbb{R}^{D}$ such that $\Sigma=(\mathrm{Id}-\Lambda)^{-T}\Omega(\mathrm{Id}-\Lambda)^{-1}$ .

Definition 5.4.

Let $G=([m],B,D)$ be a mixed graph. A trek between vertices $i$ and $j$ in $G$ consists of either

•

a pair $(P_{L},P_{R})$ where $P_{L}$ is a directed path ending in $i$ and $P_{R}$ is a directed path ending in $j$ where both $P_{L}$ and $P_{R}$ have the same source, or

•

a pair $(P_{L},P_{R})$ where $P_{L}$ is a directed path ending in $i$ and $P_{R}$ is a directed path ending in $j$ such that the source of $P_{L}$ and the source of $P_{R}$ are connected by a bidirected edge.

Let $\mathcal{T}(i,j)$ denote the set of all treks in $G$ between $i$ and $j$ .

To each trek $T=(P_{L},P_{R})$ one associates the trek monomial $m_{T}$, which is the product, with multiplicities, of all $\lambda_{st}$ over all directed edges appearing in $T$, times $\omega_{st}$, where $s$ and $t$ are the sources of $P_{L}$ and $P_{R}$. One reason for the interest in treks is the trek rule, which says that for the Gaussian graphical model associated to $G$

$$\sigma_{ij}=\sum_{T\in\mathcal{T}(i,j)}m_{T}.$$

For instance, for the mixed graph in Figure 1, the pair $(\{1\to 2,2\to 3\},\{1\to 3\})$ is a trek from $3$ to $3$ . The corresponding trek monomial $m_{T}$ is $\omega_{11}\lambda_{12}\lambda_{23}\lambda_{13}$ .
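The trek rule can be checked numerically on a small example. The sketch below uses a hypothetical chain $1\to 2\to 3$ with no bidirected edges (this is not the graph of Figure 1) and compares entries of $(\mathrm{Id}-\Lambda)^{-T}\Omega(\mathrm{Id}-\Lambda)^{-1}$ with sums of trek monomials enumerated by hand. All parameter values and helper names are assumptions for illustration; the inverse is computed as a finite Neumann series, which is valid because $\Lambda$ is nilpotent when the directed part of the graph is acyclic.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse_of_id_minus(L):
    """(Id - L)^{-1} = Id + L + ... + L^(n-1), valid when L is nilpotent."""
    n = len(L)
    total = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    power = total
    for _ in range(n - 1):
        power = matmul(power, L)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical parameters for the chain 1 -> 2 -> 3 with diagonal Omega.
l12, l23 = F(2), F(3)
w11, w22, w33 = F(1), F(2), F(3)
Lam = [[0, l12, 0], [0, 0, l23], [0, 0, 0]]
Omega = [[w11, 0, 0], [0, w22, 0], [0, 0, w33]]

M = inverse_of_id_minus(Lam)
Sigma = matmul(matmul(transpose(M), Omega), M)

# Treks between 1 and 3: only ((1), (1 -> 2 -> 3)), with source 1.
sigma13_treks = w11 * l12 * l23
# Treks between 3 and 3: sources 3, 2 and 1, each path used on both sides.
sigma33_treks = w33 + w22 * l23**2 + w11 * l12**2 * l23**2

print(Sigma[0][2], sigma13_treks)  # 6 6
print(Sigma[2][2], sigma33_treks)  # 57 57
```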

Definition 5.5.

Let $A,B,C_{A},C_{B}$ be four sets of vertices of $G$ , not necessarily disjoint. The pair of sets $(C_{A},C_{B})$ t-separates (short for trek separates) $A$ from $B$ if for every $a\in A$ and $b\in B$ and every trek $(P_{L},P_{R})\in\mathcal{T}(a,b)$ , either $P_{L}$ has a vertex in $C_{A}$ or $P_{R}$ has a vertex in $C_{B}$ , or both.

Theorem 5.6.

([35]) Let $G=([m],B,D)$ be a mixed graph and $A$ and $B$ two subsets of $[m]$ with $|A|=|B|=k$. Then the minor $\det\Sigma_{A,B}$ belongs to the vanishing ideal $I_{G}$ if and only if there is a pair of sets $C_{A},C_{B}$ such that $(C_{A},C_{B})$ t-separates $A$ and $B$ and such that $|C_{A}|+|C_{B}|<k$.

The t-separation criterion can produce implicit constraints for structural equation models in situations where there are no conditional independence constraints.

*Example 5.7**.*

Consider the mixed graph $G$ from Figure 1. The vanishing ideal of the model is $I_{G}=\langle\det\Sigma_{\{1,2\},\{3,4\}}\rangle$ . This determinantal constraint is not a conditional independence constraint. It is implied by the t-separation criterion because the pair $(\emptyset,\{3\})$ t-separates $\{1,2\}$ and $\{3,4\}$ .
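For small acyclic examples, t-separation in the sense of Definition 5.5 can be checked by brute-force enumeration of treks. The sketch below does this for a hypothetical mixed graph with directed edges $1\to 3$, $2\to 3$, $3\to 4$ and one bidirected edge $1\leftrightarrow 2$; this graph is an assumption made for illustration and is not the graph of Figure 1. The function names are likewise hypothetical, and the path enumeration only terminates when the directed part is acyclic.

```python
from itertools import product

def directed_paths(vertices, directed):
    """All directed paths as vertex tuples, trivial paths included (DAG only)."""
    paths = [(v,) for v in vertices]
    frontier = list(paths)
    while frontier:
        frontier = [p + (t,) for p in frontier for (s, t) in directed if s == p[-1]]
        paths.extend(frontier)
    return paths

def treks(vertices, directed, bidirected, i, j):
    """All treks (P_L, P_R) with P_L ending in i and P_R ending in j."""
    paths = directed_paths(vertices, directed)
    linked = {frozenset(e) for e in bidirected}
    return [
        (pl, pr)
        for pl, pr in product(paths, repeat=2)
        if pl[-1] == i and pr[-1] == j
        # The sources coincide or are joined by a bidirected edge.
        and (pl[0] == pr[0] or frozenset((pl[0], pr[0])) in linked)
    ]

def t_separates(vertices, directed, bidirected, c_a, c_b, set_a, set_b):
    """True iff every trek from set_a to set_b meets c_a on the left or c_b on the right."""
    return all(
        bool(set(pl) & set(c_a)) or bool(set(pr) & set(c_b))
        for a in set_a for b in set_b
        for pl, pr in treks(vertices, directed, bidirected, a, b)
    )

V = [1, 2, 3, 4]
directed = [(1, 3), (2, 3), (3, 4)]
bidirected = [(1, 2)]

print(t_separates(V, directed, bidirected, [], [3], [1, 2], [3, 4]))  # True
print(t_separates(V, directed, bidirected, [], [4], [1, 2], [3, 4]))  # False
```

In this hypothetical graph the pair $(\emptyset,\{3\})$ t-separates $\{1,2\}$ from $\{3,4\}$ with $|C_{A}|+|C_{B}|=1<2$, so Theorem 5.6 predicts that $\det\Sigma_{\{1,2\},\{3,4\}}$ lies in the vanishing ideal of that model.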

Remark 5.8.

In the case where there are hidden random variables, the vanishing ideal is typically not sufficient to completely describe the set of probability distributions that come from the graphical model. Usually one also needs to consider inequality constraints and other semialgebraic conditions. This problem is discussed in more detail in [3, 2, 37], among others.

6. Further Reading

Diaconis, Eisenbud and Sturmfels [8] were the first to consider primary decompositions for statistical applications, in particular for analyzing the connectivity of certain random walks. This perspective was also picked up in [24] using conditional independence ideals. Primary decompositions of conditional independence ideals also appear in the following papers not already mentioned: [12, 16, 25, 34, 36].

The algebraic view on undirected graphical models was presented in [18], which began the extensive study of the vanishing ideals of undirected graphical models for discrete random variables. The focus has been on developing techniques for constructing generating sets of the vanishing ideals, with [14, 21, 30] being representative papers in this area. Vanishing ideals of undirected models with Gaussian random variables and of models for DAGs have not been studied much. Some papers that initiated their study include [17, 32, 33].

Bibliography

[1] 4ti2 team, 4ti2—a software package for algebraic, geometric and combinatorial problems on linear spaces, available at http://www.4ti2.de.
[2] Elizabeth S. Allman, John A. Rhodes, Bernd Sturmfels, and Piotr Zwiernik, Tensors of nonnegative rank two, Linear Algebra Appl. 473 (2015), 37–53. MR 3338324
[3] Elizabeth S. Allman, John A. Rhodes, and Amelia Taylor, A semialgebraic description of the general Markov model on phylogenetic trees, SIAM J. Discrete Math. 28 (2014), no. 2, 736–755. MR 3206983
[4] Saugata Basu, Richard Pollack, and Marie-Françoise Roy, Algorithms in real algebraic geometry, second ed., Algorithms and Computation in Mathematics, vol. 10, Springer-Verlag, Berlin, 2006. MR 2248869
[5] Jacek Bochnak, Michel Coste, and Marie-Françoise Roy, Real algebraic geometry, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 36, Springer-Verlag, Berlin, 1998, Translated from the 1987 French original, Revised by the authors. MR 1659509
[6] Winfried Bruns and Udo Vetter, Determinantal rings, Monografías de Matemática [Mathematical Monographs], vol. 45, Instituto de Matemática Pura e Aplicada (IMPA), Rio de Janeiro, 1988. MR 986492
[7] David A. Cox, John Little, and Donal O’Shea, Ideals, varieties, and algorithms, fourth ed., Undergraduate Texts in Mathematics, Springer, Cham, 2015, An introduction to computational algebraic geometry and commutative algebra. MR 3330490
[8] Persi Diaconis, David Eisenbud, and Bernd Sturmfels, Lattice walks and primary decomposition, Mathematical essays in honor of Gian-Carlo Rota (Cambridge, MA, 1996), Progr. Math., vol. 161, Birkhäuser Boston, Boston, MA, 1998, pp. 173–193. MR 1627343
