Integer programming on the junction tree polytope for influence diagrams

Axel Parmentier; Victor Cohen; Vincent Leclère; Guillaume Obozinski; Joseph Salmon

arXiv:1902.07039·math.OC·July 8, 2019·INFORMS J. Optim.

TL;DR

This paper introduces an integer programming approach on the junction tree polytope for solving influence diagrams, providing a mixed integer linear formulation and valid inequalities that improve computational efficiency and solution optimality.

Contribution

It develops a novel mixed integer linear programming formulation for influence diagrams using junction trees, enhancing solution efficiency and theoretical understanding.

Findings

01

Linear relaxation often yields optimal solutions for certain instances.

02

The approach improves computational efficiency over existing methods.

03

Valid inequalities strengthen the formulation and solution process.

Abstract

Influence Diagrams (ID) are a flexible tool to represent discrete stochastic optimization problems, including Markov Decision Process (MDP) and Partially Observable MDP as standard examples. More precisely, given random variables considered as vertices of an acyclic digraph, a probabilistic graphical model defines a joint distribution via the conditional distributions of vertices given their parents. In ID, the random variables are represented by a probabilistic graphical model whose vertices are partitioned into three types : chance, decision and utility vertices. The user chooses the distribution of the decision vertices conditionally to their parents in order to maximize the expected utility. Leveraging the notion of rooted junction tree, we present a mixed integer linear formulation for solving an ID, as well as valid inequalities, which lead to a computationally efficient…

Tables

Table 1. (a) Results on chess game example

$(ω_{s}, ω_{a}, T)$	$\| Δ \|$	Polytope	Int. Gap	Final Gap	SPU Gap	Time (s)
$(3, 4, 20)$	$10^{48}$	${\bar{𝒬}}^{1}$	$5.18$	$0.35$	0.02	299.1
		${\bar{𝒬}}^{b}$	$4.62$	$0.28$	0.02	264.1
		$𝒬^{⊥ ⊥, 1}$	$1.00$	$0.20$	0.02	76.8
		$𝒬^{⊥ ⊥, b}$	$0.90$	$0.24$	0.02	57.1
$(3, 5, 20)$	$10^{69}$	${\bar{𝒬}}^{1}$	$5.49$	$0.46$	0.13	498.1
		${\bar{𝒬}}^{b}$	$5.05$	$0.46$	0.13	562.3
		$𝒬^{⊥ ⊥, 1}$	$1.38$	$0.27$	0.13	200.5
		$𝒬^{⊥ ⊥, b}$	$1.27$	$0.23$	0.013	181.7
$(3, 6, 20)$	$10^{93}$	${\bar{𝒬}}^{1}$	$4.17$	$0.48$	0.05	1948.9
		${\bar{𝒬}}^{b}$	$3.84$	$0.36$	0.05	1563.1
		$𝒬^{⊥ ⊥, 1}$	$0.73$	$0.20$	0.05	594.9
		$𝒬^{⊥ ⊥, b}$	$0.69$	$0.20$	0.05	1109.5
$(3, 9, 20)$	$10^{171}$	${\bar{𝒬}}^{1}$	$6.57$	$1.50$	0.16	2752.2
		${\bar{𝒬}}^{b}$	$5.97$	$1.90$	0.13	3067.8
		$𝒬^{⊥ ⊥, 1}$	$1.05$	$0.37$	0.16	843.6
		$𝒬^{⊥ ⊥, b}$	$1.00$	$0.37$	0.13	868.2
$(3, 10, 20)$	$10^{200}$	${\bar{𝒬}}^{1}$	$6.99$	$2.28$	0.04	TL
		${\bar{𝒬}}^{b}$	$6.45$	$2.39$	0.04	TL
		$𝒬^{⊥ ⊥, 1}$	$1.33$	$0.82$	0.04	1759.3
		$𝒬^{⊥ ⊥, b}$	$1.25$	$0.81$	0.04	1758.3
$(4, 10, 20)$	$10^{200}$	${\bar{𝒬}}^{1}$	$8.49$	$4.59$	0.14	TL
		${\bar{𝒬}}^{b}$	$8.10$	$4.97$	0.03	TL
		$𝒬^{⊥ ⊥, 1}$	$2.40$	$1.77$	0.11	TL
		$𝒬^{⊥ ⊥, b}$	$2.26$	$1.74$	0.13	TL

Table 3. (b) Results on POMDP example

$(ω_{s}, ω_{a}, T)$	$\| Δ \|$	Polytope	Int. Gap	Final Gap	SPU Gap	Time (s)
$(3, 4, 20)$	$10^{48}$	${\bar{𝒬}}^{1}$	$5.34$	$0.58$	$4.15$	563.1
		${\bar{𝒬}}^{b}$	$4.99$	$0.41$	$4.15$	384.6
		$𝒬^{⊥ ⊥, 1}$	$1.36$	Opt	$4.15$	77.8
		$𝒬^{⊥ ⊥, b}$	$1.11$	Opt	$4.15$	71.2
$(3, 5, 20)$	$10^{69}$	${\bar{𝒬}}^{1}$	$7.74$	$3.90$	$0.73$	1090.9
		${\bar{𝒬}}^{b}$	$7.23$	$3.60$	$0.69$	985.8
		$𝒬^{⊥ ⊥, 1}$	$1.85$	$0.78$	$0.73$	282.5
		$𝒬^{⊥ ⊥, b}$	$1.46$	$0.79$	$0.73$	245.6
$(3, 6, 20)$	$10^{93}$	${\bar{𝒬}}^{1}$	$9.01$	$5.68$	$0.74$	TL
		${\bar{𝒬}}^{b}$	$8.47$	$5.42$	$0.74$	TL
		$𝒬^{⊥ ⊥, 1}$	$1.67$	$1.02$	$0.74$	1935.0
		$𝒬^{⊥ ⊥, b}$	$1.37$	$1.00$	$0.74$	1533.8
$(3, 9, 20)$	$10^{171}$	${\bar{𝒬}}^{1}$	$8.09$	$5.94$	$1.67$	TL
		${\bar{𝒬}}^{b}$	$7.60$	$5.47$	$1.71$	TL
		$𝒬^{⊥ ⊥, 1}$	$2.45$	$1.86$	$1.59$	2729.6
		$𝒬^{⊥ ⊥, b}$	$2.07$	$1.87$	$1.60$	2894.9
$(3, 10, 20)$	$10^{200}$	${\bar{𝒬}}^{1}$	$12.40$	$10.0$	$1.24$	TL
		${\bar{𝒬}}^{b}$	$11.76$	$9.95$	$1.23$	TL
		$𝒬^{⊥ ⊥, 1}$	$4.45$	$3.86$	$1.05$	TL
		$𝒬^{⊥ ⊥, b}$	$3.87$	$3.77$	$1.11$	TL
$(4, 8, 20)$	$10^{144}$	${\bar{𝒬}}^{1}$	$12.90$	$9.89$	$1.20$	TL
		${\bar{𝒬}}^{b}$	$12.00$	$9.70$	$1.23$	TL
		$𝒬^{⊥ ⊥, 1}$	$3.14$	$2.27$	$1.20$	TL
		$𝒬^{⊥ ⊥, b}$	$2.43$	$2.22$	$1.23$	TL

Equations

\mathbb{P}(X_{V}=x_{V})=\prod_{v\in V}p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)}),

\mathbb{P}_{\delta}(X_{V}=x_{V})=\prod_{v\in V^{\mathrm{s}}}p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})\prod_{v\in V^{\mathrm{a}}}\delta_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)}).

\max_{\delta\in\Delta}\quad\mathbb{E}_{\delta}\Bigg{(}\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\Bigg{)}.

\max_{\mu}

\text{s.t.}

\mu_{s_{0}}^{0}=1,

\sum_{s,a}\mu_{sas^{\prime}}^{t}=\mu_{s^{\prime}}^{t+1},

\sum_{s}\mu_{s}^{t}=1,

\mu_{s}^{t},\mu_{sa}^{t},\mu_{sas^{\prime}}^{t}\in\{0,1\},

\mathbb{P}_{\mu}(X_{V}=x_{V})=\prod_{v\in V}\mathbb{P}_{\mu}(X_{v}=x_{v}|X_{\mathrm{prt}(v)}=x_{\mathrm{prt}(v)}),

\big{(}X_{v}\,\bot\!\!\!\bot\,X_{V\backslash\overline{\mathrm{dsc}}_{G}(v)}|X_{\mathrm{prt}(v)}\big{)}_{\mu}\quad\text{for all }v\in V.

\sum_{x_{C_{1}\backslash C_{2}}}\tau_{C_{1}}=\sum_{x_{C_{2}\backslash C_{1}}}\tau_{C_{2}},

\mathcal{L}^{0}_{\mathcal{G}}=\left\{(\tau_{C})_{C\in\mathcal{V}}\colon\left|\begin{array}[]{l}\displaystyle\tau_{C}\geq 0\quad\text{and}\quad\sum_{x_{C}}\tau_{C}(x_{C})=1\quad\forall x_{C}\in\mathcal{X}_{C},\>\forall C\in\mathcal{V},\\ \text{ and }\quad\displaystyle\sum_{x_{C_{1}\backslash C_{2}}}\tau_{C_{1}}=\sum_{x_{C_{2}\backslash C_{1}}}\tau_{C_{2}},\quad\forall\{C_{1},C_{2}\}\in\mathcal{A},\end{array}\right.\right\}

\big{(}X_{v}\,\bot\!\!\!\bot\,X_{C\backslash\overline{\mathrm{dsc}}(v)}|X_{\mathrm{prt}(v)}\big{)}_{\tau_{C}}\quad\text{for all }C\in\mathcal{V},\text{ for all }v\in V\colon\mathrm{fa}(v)\subseteq C.

\mathcal{L}_{\mathcal{G}}=\bigg{\{}(\mu_{C_{v}},\mu_{\check{C}_{v}})_{v\in V}\colon(\mu_{C_{v}})_{v\in V}\in\mathcal{L}_{\mathcal{G}}^{0}\text{ and }\mu_{\check{C}_{v}}=\sum_{x_{v}}\mu_{C_{v}}\bigg{\}},

\overline{\mathcal{P}}(G,\mathcal{X},\mathfrak{p},\mathcal{G})=\big{\{}\mu\in\mathcal{L}_{\mathcal{G}}\colon\mu_{C_{v}}=\mu_{\check{C}_{v}}\,p_{v|\mathrm{prt}(v)}\text{ for all }v\in V^{\mathrm{s}}\big{\}}.

\max_{\mu,\delta}

\text{s.t.}

\delta\in\Delta

\mu_{C_{v}}=\delta_{v|\mathrm{prt}(v)}\,\mu_{\check{C}_{v}},

\mu_{A\cup P\cup D}=\mu_{A\cup P}\,p_{D|P}\implies X_{D}\,\bot\!\!\!\bot\,X_{A}|X_{P},

\mathcal{S}(G)=\big{\{}\mu\in\overline{\mathcal{P}}\colon\exists\delta\in\Delta,\mu_{C_{v}}=\mu_{\check{C}_{v}}\delta_{v|\mathrm{prt}_{G}(v)}\text{ for all $v$ in $V^{\mathrm{a}}$}\big{\}},

\max_{\mu\in\mathcal{S}(G)}\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle.

\delta_{v|\mathrm{prt}(v)}(x_{\mathrm{fa}(v)})\in\{0,1\},\quad\forall x_{\mathrm{fa}(v)}\in\mathcal{X}_{\mathrm{fa}(v)},\quad\forall v\in V^{\mathrm{a}}.

\mathbb{P}_{\delta^{\prime}}\big{(}X_{\check{C}_{v}}=x_{\check{C}_{v}}\big{)}\leq b_{\check{C}_{v}}(x_{\check{C}_{v}})\qquad\forall\delta^{\prime}\in\Delta,\quad\forall v\in V^{\mathrm{a}},\quad\forall x_{\check{C}_{v}}\in\mathcal{X}_{\check{C}_{v}}.

\left\{\begin{array}[]{l}\displaystyle\mu_{C_{v}}\geq\mu_{\check{C}_{v}}+(\delta_{v|\mathrm{prt}(v)}-1)\,b_{\check{C}_{v}},\\ \displaystyle\mu_{C_{v}}\leq\delta_{v|\mathrm{prt}(v)}\,b_{\check{C}_{v}},\\ \mu_{C_{v}}\leq\mu_{\check{C}_{v}}.\end{array}\right.

\mathcal{Q}^{b}(G,\mathcal{X},\mathfrak{p},\mathcal{G})=\Big{\{}(\mu,\delta)\in\mathcal{L}_{\mathcal{G}}\times\Delta\colon{\rm McCormick}(v,b)\text{ is satisfied for all }v\in V^{\mathrm{a}}\Big{\}}.

\max_{\mu,\delta}

\mu\in\overline{\mathcal{P}}(G,\mathcal{X},\mathfrak{p},\mathcal{G})

\delta\in\Delta^{\mathrm{d}}

(\mu,\delta)\in\mathcal{Q}^{b}

\mathbb{P}_{\mu}(X_{v}|X_{V\backslash\mathrm{dsc}(v)})=\mathbb{P}_{\mu}(X_{v}|X_{\check{C}_{v}})=p_{v|\mathrm{prt}(v)}\quad\text{for all }v\in V^{\mathrm{s}},

\mu_{C}=\mu_{C\backslash D}\,p_{D|C\backslash D}.

\mu_{C}=\mu_{C^{\not\bot\!\!\!\bot}}\,p_{C^{\bot\!\!\!\bot}|C^{\not\bot\!\!\!\bot}},\quad\forall C\in\mathcal{V},

Full text

Integer programming on the junction tree polytope for influence diagrams

Axel Parmentier¹, Victor Cohen¹, Vincent Leclère¹, Guillaume Obozinski², Joseph Salmon³

¹ Université Paris-Est, CERMICS (ENPC), F-77455 Marne-la-Vallée, France

² Swiss Data Science Center, EPFL & ETH Zürich, Switzerland

³ IMAG, Univ Montpellier, CNRS, Montpellier, France

Abstract

Keywords: Influence diagrams, Partially Observed Markov Decision Processes, Probabilistic graphical models, Linear Programming.

Influence Diagrams (ID) are a flexible tool to represent discrete stochastic optimization problems, including Markov Decision Processes (MDP) and Partially Observable MDPs as standard examples. More precisely, given random variables considered as vertices of an acyclic digraph, a probabilistic graphical model defines a joint distribution via the conditional distributions of vertices given their parents. In an ID, the random variables are represented by a probabilistic graphical model whose vertices are partitioned into three types: chance, decision and utility vertices. The user chooses the distribution of the decision vertices conditionally on their parents in order to maximize the expected utility. Leveraging the notion of rooted junction tree, we present a mixed integer linear formulation for solving an ID, as well as valid inequalities, which lead to a computationally efficient algorithm. We also show that the linear relaxation yields an optimal integer solution for instances that can be solved by the “single policy update”, the default algorithm for addressing IDs.

1 Introduction

In this paper we want to address stochastic optimization problems with structured information and discrete decision variables, via mixed integer linear reformulations. We start by recalling the framework of influence diagrams (more details can be found in (Koller and Friedman, 2009, Chapter 23)), and present the classical linear formulation for some special cases.

1.1 The framework of parametrized influence diagram

Let $G=(V,E)$ be a directed graph, and, for each vertex $v$ in $V$, let $X_{v}$ be a random variable taking values in a finite state space $\mathcal{X}_{v}$. For any $C\subset V,$ let $X_{C}$ denote $(X_{v})_{v\in C}$ and $\mathcal{X}_{C}$ be the Cartesian product $\mathcal{X}_{C}=\prod_{v\in C}\mathcal{X}_{v}.$ We say that the distribution of the random vector $X_{V}$ factorizes as a directed graphical model on $G$ if, for all $x_{V}\in\mathcal{X}_{V}$, we have

$$\mathbb{P}(X_{V}=x_{V})=\prod_{v\in V}p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)}),\tag{1}$$

where $\mathrm{prt}(v)$ is the set of parents of $v$ , that is, the set of vertices $u$ such that $(u,v)$ belongs to $E$ , and $p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})=\mathbb{P}(X_{v}=x_{v}|X_{\mathrm{prt}(v)}=x_{\mathrm{prt}(v)})$ . Further, given an arbitrary collection of conditional distributions $\left\{p_{v|\mathrm{prt}(v)}\right\}_{v\in V}$ , Equation (1) uniquely defines a probability distribution on $\mathcal{X}_{V}$ .
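To make the factorization in Equation (1) concrete, here is a minimal Python sketch that builds the joint distribution of a three-vertex chain a → b → c from conditional tables. The graph, the binary state spaces, and all probability values are invented for illustration and do not come from the paper.

```python
from itertools import product

# Hypothetical chain a -> b -> c with binary states (illustrative values only).
parents = {"a": (), "b": ("a",), "c": ("b",)}
states = {"a": (0, 1), "b": (0, 1), "c": (0, 1)}

# cond[v][(x_prt, x_v)] plays the role of p_{v|prt(v)}(x_v | x_prt(v)).
cond = {
    "a": {((), 0): 0.6, ((), 1): 0.4},
    "b": {((0,), 0): 0.7, ((0,), 1): 0.3, ((1,), 0): 0.2, ((1,), 1): 0.8},
    "c": {((0,), 0): 0.9, ((0,), 1): 0.1, ((1,), 0): 0.5, ((1,), 1): 0.5},
}


def joint_probability(x):
    """Product over all vertices v of p_{v|prt(v)}(x_v | x_prt(v))."""
    prob = 1.0
    for v, prt in parents.items():
        x_prt = tuple(x[u] for u in prt)
        prob *= cond[v][(x_prt, x[v])]
    return prob


order = list(parents)  # fixed vertex order: a, b, c
joint = {
    xs: joint_probability(dict(zip(order, xs)))
    for xs in product(*(states[v] for v in order))
}
total = sum(joint.values())  # should equal 1 up to rounding
```

Because each conditional table sums to one over its child variable, the product defines a valid probability distribution, which is the uniqueness statement made above.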

Let $(V^{\mathrm{a}},V^{\mathrm{c}},V^{\mathrm{r}})$ be a partition of $V$ where $V^{\mathrm{c}}$ is the set of chance vertices, $V^{\mathrm{a}}$ is the set of decision vertices, and $V^{\mathrm{r}}$ is the set of utility vertices (the ones with no descendants). For ease of notation we denote $V^{\mathrm{s}}=V^{\mathrm{c}}\cup V^{\mathrm{r}}$. The letters $\mathrm{a}$, $\mathrm{r}$, and $\mathrm{s}$ respectively stand for action, reward, and state in $V^{\mathrm{a}}$, $V^{\mathrm{r}}$, and $V^{\mathrm{s}}$. We say that $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ is an Influence Diagram (ID). Consider a set of conditional distributions $\mathfrak{p}=\left\{p_{v|\mathrm{prt}(v)}\right\}_{v\in V^{\mathrm{c}}\cup V^{\mathrm{r}}}$ and a collection of reward functions $r=\{r_{v}\}_{v\in V^{\mathrm{r}}}$, with $r_{v}:\mathcal{X}_{v}\rightarrow\mathbb{R}$. Then we call $(G,\mathcal{X}_{V},\mathfrak{p},r)$ a Parametrized Influence Diagram (PID). We will sometimes refer to the parameters $(\mathcal{X}_{V},\mathfrak{p},r)$ by $\rho$ for conciseness.

Let $\Delta_{v}$ denote the set of conditional distributions $\delta_{v|\mathrm{prt}(v)}$ on $\mathcal{X}_{v}$ given $\mathcal{X}_{\mathrm{prt}(v)}$ . Given the set of conditional distributions $\mathfrak{p}$ , a policy $\delta$ in $\Delta=\prod_{v\in V^{\mathrm{a}}}\Delta_{v}$ , uniquely defines a distribution $\mathbb{P}_{\delta}$ on $\mathcal{X}_{V}$ through

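$$\mathbb{P}_{\delta}(X_{V}=x_{V})=\prod_{v\in V^{\mathrm{s}}}p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})\prod_{v\in V^{\mathrm{a}}}\delta_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)}).\tag{2}$$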

Let $\mathbb{E}_{\delta}$ denote the corresponding expectation. The Maximum Expected Utility (MEU) problem associated to the PID $(G,\mathcal{X}_{V},\mathfrak{p},r)$ is the maximization problem

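$$\max_{\delta\in\Delta}\quad\mathbb{E}_{\delta}\Bigg(\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\Bigg).\tag{3}$$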

A deterministic policy $\delta\in\Delta^{\mathrm{d}}\subset\Delta$ , is such that for every $v\in V^{\mathrm{a}}$ , and any $x_{v},x_{\mathrm{prt}(v)}\in\mathcal{X}_{v}\times\mathcal{X}_{\mathrm{prt}(v)}$ , $\delta_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})$ is a Dirac measure. It is well known that there always exists an optimal solution to MEU (3) that is deterministic (see e.g., (Liu, 2014, Lemma C.1) for a proof).
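Since a deterministic optimal policy always exists, a very small MEU problem can be solved by enumerating deterministic policies. The sketch below does this for a hypothetical four-vertex ID: a hidden state `s`, a noisy observation `o`, a decision `a` whose only parent is `o`, and a utility vertex `u` with parents `s` and `a`. All names and numbers are assumptions made for illustration. This brute-force search is not the method proposed in the paper and scales exponentially.

```python
from itertools import product

# Hypothetical toy ID: s -> o -> a, and (s, a) -> u. Values are illustrative.
S, O, A, U = (0, 1), (0, 1), (0, 1), (0, 1)
p_s = {0: 0.5, 1: 0.5}
p_o_given_s = {(s, o): 0.8 if o == s else 0.2 for s in S for o in O}


def p_u_given_sa(u, s, a):
    # Utility vertex takes value 1 exactly when the action matches the state.
    return 1.0 if u == int(a == s) else 0.0


def reward(u):
    # r_u : X_u -> R, here simply the value of the utility vertex.
    return float(u)


def expected_utility(policy):
    """Expected reward under the deterministic policy a = policy[o]."""
    total = 0.0
    for s, o, u in product(S, O, U):
        a = policy[o]
        total += p_s[s] * p_o_given_s[(s, o)] * p_u_given_sa(u, s, a) * reward(u)
    return total


# All deterministic policies: one action per observation value.
policies = [dict(zip(O, acts)) for acts in product(A, repeat=len(O))]
best_policy = max(policies, key=expected_utility)
best_value = expected_utility(best_policy)
```

In this toy instance the best policy copies the observation, which matches the intuition that the observation is the only information available to the decision vertex.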

We conclude this section with some classical examples of IDs, shown in Figure 1.

*Example 1.*

Consider a maintenance problem where at time $t$ a machine is in state $s_{t}$ . The action $a_{t}$ taken by the decision maker according to the current state is typically maintaining it (which is costly) or not (which increases the probability of failure). State and decision together lead to a new (random) state $s_{t+1}$ , and the triple $(s_{t},a_{t},s_{t+1})$ induces a reward $r_{t}$ . This is an example of a Markov decision process (MDP) which is probably the simplest ID, represented in Figure 1(a).

In practice, the actual state $s_{t}$ of the machine is often not known; we only have an observation $o_{t}$ carrying partial information about the state, which leads to a more complex ID known as a partially observed Markov decision process (POMDP). In theory, an optimal decision should be taken knowing all past observations and decisions (the perfect recall case). However, this would lead to policies living in spaces of exponentially large dimension and to intractable MEU problems. It is thus common to restrict the decision $a_{t}$ to depend only on the observation $o_{t}$, as illustrated in Figure 1(b). ∎
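The growth of the policy space under perfect recall can be checked with simple arithmetic. The sketch below counts the entries of the conditional table of the decision at time $t$, assuming (as a simplification not stated in the paper) that under perfect recall the decision may depend on observations $o_{1},\ldots,o_{t}$ and on past actions $a_{1},\ldots,a_{t-1}$. The function names are hypothetical.

```python
def perfect_recall_table_size(n_obs, n_act, t):
    """Entries of the decision table at time t under perfect recall.

    Assumes the decision depends on t observations and t - 1 past actions.
    """
    histories = n_obs ** t * n_act ** (t - 1)
    return histories * n_act


def memoryless_table_size(n_obs, n_act):
    """Entries of the decision table when a_t depends only on o_t."""
    return n_obs * n_act


# Binary observations and actions, at a few time steps.
sizes = [
    (t, perfect_recall_table_size(2, 2, t), memoryless_table_size(2, 2))
    for t in (1, 3, 10)
]
```

The perfect-recall table grows geometrically with $t$, while the memoryless table has constant size, which is the trade-off behind the restriction described above.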

*Example 2.*

Consider two chess players, Bob and Alice. They play chess regularly, and for each game they bet a symbolic coin. However, they can refuse to play. Suppose that Alice wants to play chess every day. On day $t$, she has a current confidence level $s_{t}$. On the day of the game, her current mental fitness is denoted $o_{t}$. When Bob meets Alice, he decides whether to play depending on her demeanor, denoted $u_{t}$. Bob can accept or decline the challenge, and his decision is denoted $a_{t}$. Let $v_{t}$ denote the winner (who gets a reward $r_{t}$). If Bob declines the challenge, there is no winner and no reward. Alice’s next confidence level is then affected by the result of the game and her previous confidence level. This stochastic decision problem can be modeled by an influence diagram, as shown in Figure 2.

∎

1.2 Solving MDP through linear programs

We recall here a well-known linear programming formulation for MDPs (see e.g., Puterman (2014)), which is a special case of the Mixed Integer Linear Program (MILP) formulation introduced in this paper. We denote by $p(s^{\prime}|s,a)$ the probability of transitioning from state $s$ to state $s^{\prime}$ if action $a$ is taken, and by $r(s,a,s^{\prime})$ the reward associated with this transition. For $t\in[T]:=\{1,\ldots,T\}$, let $\mu_{s}^{t}$ represent the probability of being in state $s$ at time $t$, let $\mu_{sa}^{t}$ represent the probability of being in state $s$ and taking action $a$ at time $t$, and let $\mu_{sas^{\prime}}^{t}$ represent the probability of being in state $s$ and taking action $a$ at time $t$, while transitioning to state $s^{\prime}$ at time $t+1$. This leads to the following mixed integer linear program

[TABLE]

where the objective (4a) is simply the expected reward, Constraints (4b) represent the state dynamics, Constraints (4c) set the initial state of the system to $s_{0}$, and Constraints (4d)-(4f) ensure that $\mu$ represents the marginal laws of a joint distribution. Integrality constraints (4f) ensure that the chosen policy is deterministic. In the MDP case, we can drop these integrality constraints and still obtain an optimal solution. In Section 4, the integrality constraints will turn out to be useful in the general case.
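The role of the variables $\mu_{s}^{t}$, $\mu_{sa}^{t}$ and $\mu_{sas^{\prime}}^{t}$ can be illustrated without an LP solver. The sketch below fixes a deterministic policy for a hypothetical two-state maintenance MDP and propagates the marginals forward in time, which gives one feasible point of a program of this type. The transition probabilities, rewards, policy, and 0-based time indexing are assumptions made for illustration, and the sketch does not reproduce the exact constraint set (4).

```python
# Hypothetical maintenance MDP: state 0 = working, 1 = broken;
# action 0 = do nothing, 1 = maintain. All numbers are illustrative.
states, actions, horizon = (0, 1), (0, 1), 3

# transition[(s, a)][s_next] plays the role of p(s' | s, a).
transition = {
    (0, 0): {0: 0.7, 1: 0.3},
    (0, 1): {0: 0.95, 1: 0.05},
    (1, 0): {0: 0.0, 1: 1.0},
    (1, 1): {0: 0.9, 1: 0.1},
}


def reward(s, a, s_next):
    # Reward 1 for ending in the working state, minus a maintenance cost.
    return (1.0 if s_next == 0 else 0.0) - 0.4 * a


policy = {0: 0, 1: 1}  # maintain only when the machine is broken

mu_s = [{0: 1.0, 1: 0.0}]  # initial state s_0 = 0 with probability one
mu_sa, mu_sas = [], []
for t in range(horizon):
    # mu_sa puts all the mass of state s on the action chosen by the policy.
    sa = {(s, a): mu_s[t][s] if policy[s] == a else 0.0
          for s in states for a in actions}
    # State dynamics: mu_sas = mu_sa * p(s' | s, a).
    sas = {(s, a, s2): sa[(s, a)] * transition[(s, a)][s2]
           for (s, a) in sa for s2 in states}
    # Consistency: summing mu_sas over (s, a) gives the next state marginal.
    nxt = {s2: sum(sas[(s, a, s2)] for s in states for a in actions)
           for s2 in states}
    mu_sa.append(sa)
    mu_sas.append(sas)
    mu_s.append(nxt)

# Expected reward is linear in mu_sas.
expected_reward = sum(m * reward(*key) for sas in mu_sas for key, m in sas.items())
```

The expected reward is a linear function of the marginals, which is why the objective can be written as a linear program once the marginals are taken as decision variables.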

1.3 Literature

Influence diagrams were introduced by Howard and Matheson (1984) (see also Howard and Matheson, 2005) to model stochastic optimization problems using a probabilistic graphical model framework. Originally, the decision makers were assumed to have perfect recall of past actions (Shenoy, 1992; Shachter, 1986; Jensen et al., 1994).

Lauritzen and Nilsson (2001) relaxed this assumption¹ and provided a simple (coordinate descent) algorithm to find a good policy: the Single Policy Update (SPU) algorithm. The same authors also introduced the notion of soluble ID as a sufficient condition for SPU to converge to an optimal solution. This notion has been generalized by Koller and Milch (2003) to obtain a necessary and sufficient condition. In general, SPU only finds a locally optimal policy and requires exact inference, so it is limited by the treewidth (Chandrasekaran et al., 2008). More recently, Mauá and Campos (2011) and Mauá and Cozman (2016) have introduced a new algorithm, Multiple Policy Update, which has both an exact and a heuristic version and relies on dominance to discard partial solutions. It can be interpreted as a generalization of SPU where several decisions are considered simultaneously. Later on, Khaled et al. (2013) proposed a similar approach, with a Branch-and-Bound flavor, while Liu (2014) introduced heuristics based on approximate variational inference. Finally, Mauá (2016) has recently shown that the problem of solving an ID can be polynomially transformed into a maximum a posteriori (MAP) problem, and hence can be solved using popular MAP solvers such as toulbar2 (Hurley et al., 2016).

¹ These authors used the name limited memory influence diagrams when relaxing the perfect recall assumption, but we follow the convention of Koller and Friedman (2009), who still call them influence diagrams (ID).

Finding an optimal policy for an ID has been shown to be NP-hard even when restricted to IDs of treewidth at most two, or to trees with binary variables (Mauá et al., 2012a, 2013). Note that even obtaining an approximate solution is NP-hard (Mauá et al., 2012a).

Credal networks are generalizations of probabilistic graphical models where the parameters of the model are not known exactly. MILP formulations for credal networks that could be applied to IDs have been introduced by de Campos and Cozman (2007) and de Campos and Ji (2012). However, the number of variables they require is exponential in the pathwidth, which is no smaller, and can be arbitrarily larger, than the width of the tree we are using (this follows from (Scheffler, 1990, Theorem 4)). Moreover, the linear relaxation of their MILP is not as good as that of the MILP we propose, and does not yield an integer solution on soluble IDs. Our approach can naturally be extended to credal networks.

1.4 Contributions

The contributions of the paper are as follows.

• We introduce a non-linear program and a mixed integer linear program for the MEU problem on influence diagrams.

• These mathematical programs rely on a variant of the concept of a strong junction tree which we introduce and call a rooted junction tree. We provide algorithms to build rooted junction trees that lead to “good” mathematical programs for influence diagrams.

• We introduce a particular form of valid cuts for the obtained mixed integer linear program. These valid cuts leverage conditional independence properties in the influence diagram. We show that our cuts are the strongest ones in a certain sense. We believe that this idea of leveraging conditional independence to obtain valid cuts is fairly general and could be extended to other contexts.

• We establish a link between the linear relaxation of our MILP and the concept of soluble relaxation previously introduced in the literature on influence diagrams. In fact, our relaxation provides a better bound than those relaxations.

• We provide two new characterizations of soluble influence diagrams. First, as the only influence diagrams that can be solved to optimality using the linear relaxation of our mixed integer linear program. Second, and more importantly, as the influence diagrams for which there exists a rooted junction tree such that the set of collections of moments of distributions that are induced by the different policies is convex.

• We illustrate our mathematical programs and their properties on some simple numerical examples.

1.5 Organization of the paper

In Section 2, we recall some definitions for graphical models, which are used to extend the notion of junction tree to that of rooted junction tree in Section 3. With these tools, Section 4 introduces a bilinear formulation that can be rewritten as a mixed integer linear programming (MILP) formulation of the MEU Problem (3). In Section 5 we give efficient valid cuts for the MILP formulation, and interpret them in terms of graph relaxations. Section 6 studies the polynomial case of soluble IDs, showing that the IDs that can be solved to optimality by SPU can be solved by (continuous) linear programming using our formulation. Finally, Section 7 summarizes our numerical experiments.

2 Tools from Probabilistic graphical model theory

In this section we present the notation and tools used in the following sections to reformulate the MEU Problem (3).

2.1 Graph notation

This section introduces our notations for graphs, which are for the most part the ones commonly used in the combinatorial optimization community (Schrijver, 2003). A directed graph $G$ is a pair $(V,E)$ where $V$ is the set of vertices and $E\subseteq V^{2}$ the set of arcs. We write $u\rightarrow v$ when $(u,v)\in E$. Let $[k]:=\{1,\ldots,k\}$. A path is a sequence of vertices $v_{1},\ldots,v_{k}$ such that $v_{i}\rightarrow v_{i+1}$ for any $i\in[k-1]$. A path between two vertices $u$ and $v$ is called a $u$-$v$ path. We write $u\overset{G}{\dashrightarrow}v$ to denote the existence of a $u$-$v$ path in $G$, or simply $u\dashrightarrow v$ when $G$ is clear from context. We write $u\rightleftharpoons v$ if there is an arc $u\rightarrow v$ or $v\rightarrow u$. A trail is a sequence of vertices $v_{1},\ldots,v_{k}$ such that $v_{i}\rightleftharpoons v_{i+1}$ for all $i\in[k-1]$.

A parent (resp. child) of a vertex $v$ is a vertex $u$ such that $(u,v)$ (resp. $(v,u)$) belongs to $E$; we denote by $\mathrm{prt}(v)$ the set of parents of $v$ (resp. by $\mathrm{cld}(v)$ the set of its children).

The family of $v$, denoted by $\mathrm{fa}(v)$, is the set $\{v\}\cup\mathrm{prt}(v)$. A vertex $u$ is an ascendant (resp. a descendant) of $v$ if there exists a $u$-$v$ path (resp. a $v$-$u$ path). We denote by $\mathrm{asc}(v)$ and $\mathrm{dsc}(v)$ the sets of ascendants and descendants of $v$, respectively. Finally, let $\overline{\mathrm{asc}}(v)=\{v\}\cup\mathrm{asc}(v)$ and $\overline{\mathrm{dsc}}(v)=\{v\}\cup\mathrm{dsc}(v)$. For a set of vertices $C$, the parent set of $C$, again denoted by $\mathrm{prt}(C)$, is the set of vertices $u$ that are parents of a vertex $v\in C$. We define similarly $\mathrm{fa}(C)$, $\mathrm{cld}(C)$, $\mathrm{asc}(C)$, and $\mathrm{dsc}(C)$. Note that we sometimes indicate in subscript the graph according to which the parents, children, etc., are taken. For instance, $\mathrm{prt}_{G}(v)$ denotes the parents of $v$ in $G$. We drop the subscript when the graph is clear from the context.
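To make this notation concrete, the following minimal sketch computes parents, children, family, descendants and ascendants on a small digraph. The vertex names and the arc set are hypothetical and chosen only for illustration.

```python
# Toy digraph G = (V, E); vertex names are hypothetical.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def prt(v):
    """Parents of v: vertices u with an arc (u, v)."""
    return {u for (u, w) in E if w == v}

def cld(v):
    """Children of v: vertices w with an arc (v, w)."""
    return {w for (u, w) in E if u == v}

def fa(v):
    """Family of v: v together with its parents."""
    return {v} | prt(v)

def dsc(v):
    """Descendants of v: vertices reachable from v by a non-trivial path."""
    seen, stack = set(), list(cld(v))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(cld(w))
    return seen

def asc(v):
    """Ascendants of v: vertices u such that v is a descendant of u."""
    return {u for u in V if v in dsc(u)}

def prt_set(C):
    """Parent set of a vertex set C: union of the parents of its members."""
    return set().union(*(prt(v) for v in C))
```

Note that, as in the text, `prt_set(C)` may contain vertices of `C` itself.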

A cycle is a path $v_{1},\ldots,v_{k}$ such that $v_{1}=v_{k}$. A graph is connected if there exists a path between any pair of vertices. An undirected graph is a tree if it is connected and has no cycles. A directed graph is a directed tree if its underlying undirected graph is a tree. A rooted tree is a directed tree such that all vertices have a common ascendant referred to as the root of the tree. (Footnote 2: the probabilistic graphical model community sometimes calls a directed tree what we call here a rooted tree, and a polytree what we call here a directed tree.) In a rooted tree, all vertices but the root have exactly one parent.

2.2 Directed graphical model

In this paper, we manipulate several distributions on the same random variables. Given three random variables $X$, $Y$, $Z$, the notation $\big{(}X\,\bot\!\!\!\bot\,Y|Z\big{)}_{\mu}$ stands for “$X$ is independent of $Y$ given $Z$ according to $\mu$”. The parentheses $(\cdot)_{\mu}$ are dropped when $\mu$ is clear from context. The same notation is used for independence of events.

A well-known sufficient condition for a distribution to factorize as a probabilistic graphical model is that each vertex is independent from its non-descendants given its parents.

Proposition 1.

(Koller and Friedman, 2009, Theorem 3.1, p. 62) Let $\mathbb{P}_{\mu}$ be a distribution on $\mathcal{X}_{V}$. Then $\mathbb{P}_{\mu}$ factorizes as a directed graphical model on $G$, that is,

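$$\mathbb{P}_{\mu}(X_{V}=x_{V})=\prod_{v\in V}\mathbb{P}_{\mu}\big(X_{v}=x_{v}\mid X_{\mathrm{prt}(v)}=x_{\mathrm{prt}(v)}\big)\quad\text{for all }x_{V}\in\mathcal{X}_{V},$$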

if and only if

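$$\big(X_{v}\,\bot\!\!\!\bot\,X_{V\backslash\mathrm{dsc}(v)}\mid X_{\mathrm{prt}(v)}\big)_{\mu}\quad\text{for all }v\in V.\tag{5}$$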
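As an illustration of Proposition 1, the sketch below builds a joint distribution on the chain $a\rightarrow b\rightarrow c$ as a product of conditionals and checks numerically that $X_{c}$ is independent of $X_{a}$ given its parent $X_{b}$. The chain and all the numerical tables are hypothetical.

```python
from itertools import product

# Hypothetical conditional tables for the chain a -> b -> c (binary variables).
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # key (b, a)
p_c_given_b = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.5}  # key (c, b)

# Joint distribution obtained as the product of the conditionals.
joint = {
    (a, b, c): p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
    for a, b, c in product((0, 1), repeat=3)
}

def prob(**fixed):
    """Probability of the event fixing some of the variables a, b, c."""
    names = ("a", "b", "c")
    return sum(
        p for x, p in joint.items()
        if all(x[names.index(k)] == val for k, val in fixed.items())
    )

def c_independent_of_a_given_b(tol=1e-12):
    """Check P(c | a, b) == P(c | b) for every assignment."""
    for a, b, c in product((0, 1), repeat=3):
        lhs = prob(a=a, b=b, c=c) / prob(a=a, b=b)
        rhs = prob(b=b, c=c) / prob(b=b)
        if abs(lhs - rhs) > tol:
            return False
    return True
```

Calling `c_independent_of_a_given_b()` tests the independence for the vertex $c$ only; the same check could be repeated for each vertex.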

2.3 Junction trees

When dealing with the MEU Problem (3), one needs to deal with distributions $\mu_{V}$ on $\mathcal{X}_{V}$ that factorize as in (2) for some policy $\delta$. In theory, it suffices to consider distributions $\mu_{V}$ satisfying the conditional independences given by Equation (5) and such that $\mathbb{P}_{\mu}(X_{v}|X_{\mathrm{prt}(v)})=p_{v|\mathrm{prt}(v)}$ for each vertex $v$ that is not a decision. However, the joint distribution $\mu_{V}$ on all the variables is too large to be manipulated in practice as soon as $V$ is moderately large. In that case, it is handy to work with a vector of moments $\tau=(\tau_{C})_{C\in\mathcal{V}}$, where $\mathcal{V}\subseteq 2^{V}$, that is, a vector of distributions $\tau_{C}$ on subsets of variables $C$ of tractable size. A vector of moments $(\tau_{C})_{C\in\mathcal{V}}$ derives from a distribution $\mu_{V}$ on $\mathcal{X}_{V}$ if each moment $\tau_{C}\in[0,1]^{\mathcal{X}_{C}}$ is the marginal of $\mu_{V}$, i.e., $\tau_{C}(x_{C})=\sum_{x_{V\backslash C}\in\mathcal{X}_{V\backslash C}}\mu_{V}(x_{C},x_{V\backslash C})$ for all $C$ in $\mathcal{V}$ and $x_{C}$ in $\mathcal{X}_{C}$. To keep notation light, we will write this type of equality more compactly as $\tau_{C}=\sum_{x_{V\backslash C}}\mu_{V}$. We use the notation $\mu=(\mu_{C})_{C\in\mathcal{V}}$ for the vector of moments deriving from a distribution, and $\mathbb{P}_{\mu}$ or $\mu_{V}$ for the corresponding distribution on $\mathcal{X}_{V}$.

A necessary condition for a vector of moments $(\tau_{C})_{C\in\mathcal{V}}$ to derive from a distribution is to be locally consistent, that is, to induce the same marginals on the intersections of pairs of elements of $\mathcal{V}$, i.e., that for all $C_{1},C_{2}\in\mathcal{V}$ we have

$$\sum_{x_{C_{1}\backslash C_{2}}}\tau_{C_{1}}=\sum_{x_{C_{2}\backslash C_{1}}}\tau_{C_{2}},$$

where, as before, $\sum_{x_{C_{1}\backslash C_{2}}}\tau_{C_{1}}$ is the vector $\big{(}\sum_{x_{C_{1}\backslash C_{2}}\in\mathcal{X}_{C_{1}\backslash C_{2}}}\tau_{C_{1}}(x_{C_{1}\backslash C_{2}},x_{C_{1}\cap C_{2}})\big{)}_{x_{C_{1}\cap C_{2}}\in\mathcal{X}_{C_{1}\cap C_{2}}}$. It turns out that graphical model theory provides a condition on the choice of $\mathcal{V}$, together with a choice of local consistency constraints, that is sufficient for $(\tau_{C})_{C\in\mathcal{V}}$ to derive from a distribution on $\mathcal{X}_{V}$. This is done via the definition of a junction tree. Let $\mathcal{G}=(\mathcal{V},\mathcal{A})$ be an undirected graph associated with $G=(V,E)$ with $\mathcal{V}\subseteq 2^{V}$, and such that there is a mapping $v\mapsto C_{v}$ from $V$ to $\mathcal{V}$ satisfying $\mathrm{fa}(v)\subseteq C_{v}$. If $\mathcal{G}$ is a tree and satisfies the running intersection property, i.e., given two vertices $C_{1}$ and $C_{2}$ in $\mathcal{V}$, any vertex $C$ on the unique undirected path from $C_{1}$ to $C_{2}$ in $\mathcal{G}$ satisfies $C_{1}\cap C_{2}\subset C$, then $\mathcal{G}$ is called a junction tree of $G$. See Figure 3 for an illustration of this notion. Given a junction tree $\mathcal{G}$, its associated local polytope $\mathcal{L}^{0}_{\mathcal{G}}$ is defined as follows:

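$$\mathcal{L}^{0}_{\mathcal{G}}=\Big\{(\tau_{C})_{C\in\mathcal{V}}\colon\ \tau_{C}\geq 0\ \text{and}\ \sum_{x_{C}}\tau_{C}(x_{C})=1\ \text{for all }C\in\mathcal{V},\quad\sum_{x_{C_{1}\backslash C_{2}}}\tau_{C_{1}}=\sum_{x_{C_{2}\backslash C_{1}}}\tau_{C_{2}}\ \text{for all }\{C_{1},C_{2}\}\in\mathcal{A}\Big\}.\tag{6}$$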

Then $\tau=(\tau_{C})_{C\in\mathcal{V}}$ is a vector of moments deriving from a distribution $\mu_{V}$ on $\mathcal{X}_{V}$ if and only if $\tau\in\mathcal{L}^{0}_{\mathcal{G}}$ (Wainwright and Jordan, 2008, Proposition 2.1).
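The running intersection property can be checked mechanically on a candidate cluster tree. The sketch below does so by walking the unique path between every pair of nodes; the cluster nodes and the two edge sets are hypothetical examples, and the quadratic pairwise check is chosen for readability rather than efficiency.

```python
from itertools import combinations

def tree_path(nodes, edges, start, goal):
    """Unique path between two nodes of an undirected tree (depth-first search)."""
    neighbors = {C: set() for C in nodes}
    for C1, C2 in edges:
        neighbors[C1].add(C2)
        neighbors[C2].add(C1)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in neighbors[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

def has_running_intersection(nodes, edges):
    """Check that C1 & C2 is contained in every node on the C1-C2 path."""
    for C1, C2 in combinations(nodes, 2):
        path = tree_path(nodes, edges, C1, C2)
        if path is None or any(not (C1 & C2) <= C for C in path):
            return False
    return True

# Hypothetical cluster nodes for the chain a -> b -> c -> d.
Cab, Cbc, Ccd = frozenset("ab"), frozenset("bc"), frozenset("cd")
good = has_running_intersection([Cab, Cbc, Ccd], [(Cab, Cbc), (Cbc, Ccd)])
# Same nodes, but {a,b} and {b,c} are now separated by {c,d}, which lacks b.
bad = has_running_intersection([Cab, Cbc, Ccd], [(Cab, Ccd), (Ccd, Cbc)])
```

In the second tree the vertex $b$ belongs to two nodes that are not adjacent, so the property fails.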

3 Rooted junction trees

To solve the MEU Problem (3), we work on vectors of moments $(\mu_{C})_{C\in\mathcal{V}}$ that correspond to the moments of distributions $\mu$ induced by policies $\delta\in\Delta$ . Hence, we are interested in vectors $\mu$ of moments such that $\mu_{V}$ factorizes as a directed graphical model on $G$ . Such vectors of moments necessarily satisfy a “local” version of the sufficient condition (5), which is that for $\tau_{C}=\mu_{C},$

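$$\big(X_{v}\,\bot\!\!\!\bot\,X_{C_{v}\backslash\mathrm{dsc}(v)}\mid X_{\mathrm{prt}(v)}\big)_{\tau_{C_{v}}}\quad\text{for all }v\in V.\tag{7}$$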

Given a vector of moments $\tau$ in the local polytope of a junction tree $(\mathcal{V},\mathcal{A})$, satisfying (7) is not a sufficient condition for $\tau$ to be the moments of a distribution $\mu_{V}$ that factorizes on $G$. But it becomes a sufficient condition under the additional assumption that $(\mathcal{V},\mathcal{A})$ is a “rooted junction tree”, a notion that we introduce in this section, and develop in more detail in Appendix A.

3.1 Definition and main properties

Let $\mathcal{G}=(\mathcal{V},\mathcal{A})$ be a junction tree on $G=(V,E)$ and $v\in V$ a vertex of $G$. Then, thanks to the running intersection property, the subgraph $\mathcal{G}_{v}$ of $\mathcal{G}$ made of all nodes $C\in\mathcal{V}$ containing $v$ is a tree. Moreover, any orientation of the edges of $\mathcal{G}$ that makes it a rooted tree also makes $\mathcal{G}_{v}$ a rooted tree, and we denote by $C_{v}$ its root node.

Definition 1.

A rooted junction tree (RJT) on $G=(V,E)$ is a rooted tree with nodes in $2^{V}$ , such that

(i)

its underlying undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{A})$ is a junction tree,

(ii)

for all $v\in V$ , we have $\mathrm{fa}(v)\subseteq C_{v}$ ,

where $C_{v}$ is the root clique of $v$ defined as the root node of the subgraph $\mathcal{G}_{v}$ of $\mathcal{G}$ induced by the nodes $C\in\mathcal{V}$ containing $v$ .

Let $\mathcal{G}=(\mathcal{V},\mathcal{A})$ be an RJT on $G$. Given a node $C\in\mathcal{V}$, let $\mathring{C}=\{v\in V:C_{v}=C\}$, which we call the offspring of $C$, and let $\check{C}$ denote $C\backslash\mathring{C}$.

See Figure 3 for a graphical example of this notion. Note that an RJT always exists: indeed, the cluster graph composed of a single vertex $C=V$ is an RJT. Algorithms to build interesting RJTs are provided in Section 3.2.
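The notions of root clique and offspring can be sketched in a few lines. The example below assumes a rooted tree stored as a hypothetical `parent` map on cluster nodes, and a toy digraph $a\rightarrow b\rightarrow c$; the root clique of $v$ is found as the node containing $v$ that is closest to the root, which is unique because $\mathcal{G}_{v}$ is a subtree.

```python
def depth(C, parent):
    """Number of arcs between node C and the root of the rooted tree."""
    d = 0
    while parent[C] is not None:
        C, d = parent[C], d + 1
    return d

def root_clique(v, parent):
    """Root clique C_v: the node containing v that is closest to the root."""
    return min((C for C in parent if v in C), key=lambda C: depth(C, parent))

def offspring(C, parent, V):
    """Offspring of C: the vertices whose root clique is C."""
    return {v for v in V if root_clique(v, parent) == C}

# Hypothetical example: digraph a -> b -> c with RJT {a} -> {a,b} -> {b,c}.
V = {"a", "b", "c"}
fa = {"a": {"a"}, "b": {"a", "b"}, "c": {"b", "c"}}
Ca, Cab, Cbc = frozenset("a"), frozenset("ab"), frozenset("bc")
parent = {Ca: None, Cab: Ca, Cbc: Cab}

# Condition (ii) of Definition 1: fa(v) is contained in the root clique of v.
satisfies_ii = all(fa[v] <= root_clique(v, parent) for v in V)
```

Here the offspring of the node $\{a,b\}$ is $\{b\}$, so $\check{C}=\{a\}$ for that node.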

Theorem 2, which is a natural generalization of the well-known Proposition 1, ensures that given a vector of moments on an RJT that satisfies local independences, we can construct a distribution on the initial directed graphical model which admits these moments as marginals.

Theorem 2.

Let $\mu$ be a vector of moments in the local polytope of an RJT $\mathcal{G}$ on $G=(V,E)$ . Suppose that for each vertex $v$ , according to $\mu_{C_{v}}$ , the variable $X_{v}$ is independent from its non-descendants in $G$ that are in $C_{v}$ , conditionally to its parents. Then there exists a distribution $\mathbb{P}_{\mu}$ on $\mathcal{X}_{V}$ factorizing on $G$ with moments $\mu$ .

Remark 1.

By adding nodes to an RJT, we can always turn it into an RJT satisfying $\mathring{C_{v}}=\{v\}$ for each vertex $v$ in $V^{\mathrm{a}}$ . Indeed, suppose that $\mathring{C}=\{v_{1},\ldots,v_{k}\}$ , where $v_{1},\ldots,v_{k}$ are given along a topological order. It suffices to replace the node $C$ by $C_{1}\rightarrow C_{2}\rightarrow\dots\rightarrow C_{k}$ , where $C_{i}=C\backslash\{v_{i+1},\ldots,v_{k}\}$ . Note that for such RJTs we have $\check{C}_{v}=C_{v}\backslash\{v\}.$ ∎

Remark 2.

Jensen et al. (1994, beginning of Section 4) introduce a similar notion of strong junction tree. It relies on the notion of elimination ordering for a given influence diagram with perfect recall. The main difference is that a strong junction tree is a notion on an influence diagram, where the set of decision vertices and their order play a role, whereas RJTs rely only on the underlying digraph. The notion of strong junction tree is obtained by replacing (ii) in the definition of an RJT by: “given an elimination ordering, if $(C_{u},C_{v})$ is an arc, there exists an ordering of $C_{v}$ that respects the elimination ordering such that $C_{u}\cap C_{v}$ is before $C_{v}\backslash C_{u}$ in that ordering”. An RJT is a strong junction tree. Conversely, a strong junction tree is not necessarily an RJT. Indeed, Jensen et al. (1994, Figure 4) show an example of a strong junction tree where there is $v\in V$ such that $\mathrm{fa}(v)\not\subseteq C_{v}$. As the strong junction tree is a notion on influence diagrams and not on graphs, Theorem 2 has no natural generalization to strong junction trees. ∎

3.2 Building an RJT

Although $(\{V\},\emptyset)$ is a rooted junction tree, the concept is of practical interest only if it is possible to construct RJTs with small cluster nodes. In that respect, note that any RJT must satisfy, for all $u,v\in V$, the implication

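$$\big(\exists\,w\in V\ \text{such that}\ C_{u}\dashrightarrow C_{v}\dashrightarrow C_{w}\ \text{and}\ u\in\mathrm{fa}(w)\big)\ \Longrightarrow\ \big(u\in C_{v}\big),\tag{8}$$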

where $C\dashrightarrow C^{\prime}$ denotes the existence of a $C$-$C^{\prime}$ path in the RJT $\mathcal{G}$ considered. This notation will be used throughout this section. Implication (8) indeed holds: since $u\in C_{u}$ and $\mathrm{fa}(w)\subset C_{w}$ by definition, and since $C_{u}\dashrightarrow C_{v}\dashrightarrow C_{w}$, the running intersection property implies $u\in C_{v}$. This motivates Algorithm 1, a simple RJT construction algorithm which iteratively propagates the elements present in each cluster node to its parent cluster node. Let $\preceq$ be an arbitrary topological order on $G$, and let $\max_{\preceq}C$ denote the maximum of $C$ for the topological order $\preceq$. The algorithm maintains a set $C^{\prime}_{v}$ for each vertex $v$, which coincides at the end of the algorithm with the node $C_{v}$ in the RJT produced. We denote by $\check{C}_{v}^{\prime}$ the set $C^{\prime}_{v}\backslash\{v\}$. As we will show, Algorithm 1 produces an RJT $\mathcal{G}=(\mathcal{V},\mathcal{A})$ which is minimal for $\preceq$, in the sense that it satisfies a converse of (8).
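The pseudocode of Algorithm 1 is not reproduced in this section, so the following is only a sketch of one possible reading of the description above: vertices are processed in reverse topological order, each set $C^{\prime}_{v}$ starts from $\mathrm{fa}(v)$ and absorbs $\check{C}^{\prime}_{w}$ from the clusters already attached below it, and $C^{\prime}_{v}$ is then attached under the cluster of $\max_{\preceq}\check{C}^{\prime}_{v}$. The function name `build_rjt` and the data layout are assumptions made for illustration.

```python
def build_rjt(V_sorted, E):
    """Sketch of an RJT construction by backward propagation.

    V_sorted lists the vertices in a topological order; E is a set of arcs.
    Returns (clusters, arcs): clusters[v] plays the role of C'_v and arcs
    contains pairs (u, v) standing for the RJT arc C'_u -> C'_v.
    """
    rank = {v: i for i, v in enumerate(V_sorted)}
    fa = {v: {v} | {u for (u, w) in E if w == v} for v in V_sorted}
    clusters = {v: set() for v in V_sorted}
    arcs = set()
    for v in reversed(V_sorted):
        # Start from the family of v, then pull in what the child clusters
        # already built contain besides their own vertex.
        clusters[v] = set(fa[v])
        for (u, w) in arcs:
            if u == v:
                clusters[v] |= clusters[w] - {w}
        rest = clusters[v] - {v}
        if rest:
            # The parent cluster is that of the latest remaining vertex.
            u = max(rest, key=rank.get)
            arcs.add((u, v))
    return clusters, arcs
```

For instance, on the hypothetical chain `build_rjt(["a", "b", "c"], {("a", "b"), ("b", "c")})`, the clusters obtained are $\{a\}$, $\{a,b\}$ and $\{b,c\}$, chained in that order.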

Remark 3.

Algorithm 1 takes as input a topological order on $G$. For practical use, we recommend Algorithm 3 in Appendix C, which simultaneously builds the RJT and a “good” topological order. ∎

For instance, for any topological order on the graph of the chess example of Figure 2, Algorithm 1 produces the RJT illustrated in Figure 4.

The following proposition, whose proof can be found in Appendix A, shows that Algorithm 1 builds a minimal RJT.

Proposition 3.

Algorithm 1 produces an RJT such that the root node $C_{v}$ of $v$ is $C^{\prime}_{v}$ , satisfying $\mathring{C_{v}}=\{v\}$ , that admits $\preceq$ as a topological order, and such that $(u\in C_{v})\Rightarrow(u\preceq v)$ . Moreover, its cluster nodes are minimal in the sense that

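$$\big(u\in C_{v}\big)\ \Longrightarrow\ \big(\exists\,w\in V\ \text{such that}\ C_{u}\dashrightarrow C_{v}\dashrightarrow C_{w}\ \text{and}\ u\in\mathrm{fa}(w)\big).$$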

4 MILP formulation for influence diagrams

Given that Algorithm 1 produces an RJT such that $\mathring{C_{v}}=\{v\}$ for all $v\in V,$ we will assume in the rest of the paper that all the RJTs considered satisfy this property. As noted in Remark 1, any RJT can be turned into an RJT satisfying this property by adding more nodes. In the rest of the paper, we work with the following variant of the local polytope $\mathcal{L}_{\mathcal{G}}^{0}$ defined in Equation (6)

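$$\mathcal{L}_{\mathcal{G}}=\Big\{(\mu_{C_{v}},\mu_{\check{C}_{v}})_{v\in V}\colon\ (\mu_{C_{v}})_{v\in V}\in\mathcal{L}^{0}_{\mathcal{G}}\ \text{and}\ \mu_{\check{C}_{v}}=\sum_{x_{v}}\mu_{C_{v}}\ \text{for all }v\in V\Big\},$$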

where moments $\mu_{\check{C}_{v}}$ have been introduced. This is for convenience, and all the results could have been written using $\mathcal{L}_{\mathcal{G}}^{0}$ .

On graphical models, the inference problem, which is hard in general, becomes easy on junction trees. Since Problem (3) is $NP$-hard even when restricted to graphs of treewidth 2 (Mauá et al., 2012b), the situation is strictly worse for the MEU problem associated with influence diagrams unless $P=NP$. However, we will see in this section that, given a rooted junction tree, we can obtain mathematical programs to solve the MEU Problem (3) with a tractable number of variables and constraints, provided that cliques are of reasonable size. We first obtain an NLP formulation in Section 4.1, and then linearize it into an exact mixed integer linear program (MILP) in Section 4.2.

4.1 An exact Non Linear Program formulation

Consider a Parameterized Influence Diagram (PID) encoded as the quadruple $(G,\mathcal{X},\mathfrak{p},r)$, where $G=(V,E)$ is a graph with set of vertices $V$ partitioned into $(V^{\mathrm{a}},V^{\mathrm{s}})$, $\mathcal{X}=\prod_{v\in V}\mathcal{X}_{v}$ is the support of the vector of random variables attached to all vertices of $G$, $\mathfrak{p}=\{p_{v|\mathrm{prt}(v)}\}_{v\in V^{\mathrm{s}}}$ is the collection of fixed and assumed known conditional probabilities, and $r=\{r_{v}\}_{v\in V^{\mathrm{r}}}$ is the collection of reward functions $r_{v}:\mathcal{X}_{v}\rightarrow\mathbb{R}$, which we will also view as vectors $r_{v}\in\mathbb{R}^{|\mathcal{X}_{v}|}$. (Footnote 3: we remind the reader that $V^{\mathrm{r}}$ is the set of utility vertices as introduced in Section 1.1.)

For $(G,\mathcal{X},\mathfrak{p},r)$ a given PID, and $\mathcal{G}$ a given RJT, we introduce the following polytope

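$$\overline{\mathcal{P}}(G,\mathcal{X},\mathfrak{p},\mathcal{G})=\Big\{\mu\in\mathcal{L}_{\mathcal{G}}\colon\ \mu_{C_{v}}=\mu_{\check{C}_{v}}\,p_{v|\mathrm{prt}(v)}\ \text{for all }v\in V^{\mathrm{s}}\Big\},\tag{10}$$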

where the equality $\mu_{C_{v}}=\mu_{\check{C}_{v}}\,p_{v|\mathrm{prt}(v)}$ should be understood functionally, i.e., meaning that $\mu_{C_{v}}(x_{C_{v}})=\mu_{\check{C}_{v}}(x_{\check{C}_{v}})\,p_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})$ for all $x_{C_{v}}\in\mathcal{X}_{C_{v}}$; we will use such functional (in)equalities throughout the paper. We omit the dependence of $\overline{\mathcal{P}}$ on $(G,\mathcal{X},\mathfrak{p},\mathcal{G})$ when the context is clear. Consider the following Non Linear Program (NLP):

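$$\begin{aligned}\max_{\mu,\delta}\quad&\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle&&\text{(11a)}\\ \text{s.t.}\quad&\mu\in\overline{\mathcal{P}}(G,\mathcal{X},\mathfrak{p},\mathcal{G}),&&\text{(11b)}\\ &\delta\in\Delta,&&\text{(11c)}\\ &\mu_{C_{v}}=\mu_{\check{C}_{v}}\,\delta_{v|\mathrm{prt}(v)}\quad\text{for all }v\in V^{\mathrm{a}},&&\text{(11d)}\end{aligned}$$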

where the inner product notation $\langle r_{v},\mu_{v}\rangle$ stands for $\sum_{x_{v}}\mu_{v}(x_{v})r_{v}(x_{v})$ . Note that the constraints $\delta\in\Delta$ are implied by the other ones.
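The functional equality and the inner product used above can be illustrated on a single cluster. The sketch below assumes a hypothetical node $C_{v}=\{a,b,v\}$ with $\mathrm{prt}(v)=\{b\}$ and binary variables; all tables are made up for the example.

```python
from itertools import product

# Hypothetical moment on check(C_v) = (a, b) and conditional table p(v | b).
mu_check = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # key (a, b)
p_v_given_b = {(0, 0): 0.25, (1, 0): 0.75, (0, 1): 0.5, (1, 1): 0.5}  # key (v, b)

# Functional equality mu_C(a, b, v) = mu_check(a, b) * p(v | b).
mu_C = {
    (a, b, v): mu_check[(a, b)] * p_v_given_b[(v, b)]
    for a, b, v in product((0, 1), repeat=3)
}

# Marginalization constraint: summing v out of mu_C recovers mu_check.
marginal = {
    (a, b): sum(mu_C[(a, b, v)] for v in (0, 1))
    for a, b in product((0, 1), repeat=2)
}

# Marginal of v alone and inner product <r_v, mu_v> for a hypothetical reward.
mu_v = {
    v: sum(mu_C[(a, b, v)] for a, b in product((0, 1), repeat=2))
    for v in (0, 1)
}
r_v = {0: 0.0, 1: 10.0}
expected_reward = sum(r_v[v] * mu_v[v] for v in (0, 1))
```

With these numbers, $\mu_{v}(1)=0.4\times 0.75+0.6\times 0.5=0.6$, so the inner product equals $6$ up to rounding.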

Theorem 4.

The (NLP) Problems (11) and (14) are equivalent to the MEU Problem (3), in the sense that they have the same value and that, if $(\mu,\delta)$ is a feasible solution for Problem (11), then $\delta$ defines an admissible policy for Problem (3), and $\mu$ characterizes the moments of the distribution induced by $\delta$ .

Proof.

If $(\mu,\delta)$ is a solution of (11), then $\mu$ is a solution of (14), and conversely, if $\mu$ is a solution of (14), by definition of $\mathcal{S}(G)$, there exists $\delta$ such that $(\mu,\delta)$ is a solution of (11), which gives the equivalence between (11) and (14).

Let now $(\mu,\delta)$ be an admissible solution of Problem (11). Then $\delta$ is an admissible solution of the MEU problem. We now prove that $\mu$ corresponds to the moments of the distribution $\mathbb{P}_{\delta}$ induced by $\delta$ , from which we can deduce that $\mathbb{E}_{\delta}\Big{(}\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\Big{)}=\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle$ . Note that, if $A$ , $P$ , and $D$ are disjoint subsets of $V$ , $\mu$ is a distribution on $\mathcal{X}_{V}$ , $\mu_{A\cup P\cup D}$ is the distribution induced by $\mu$ on $\mathcal{X}_{A\cup P\cup D}$ , and $p_{D|P}$ is a conditional distribution of $D$ given $P$ , then

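$$\mu_{A\cup P\cup D}=\mu_{A\cup P}\,p_{D|P}\quad\Longleftrightarrow\quad\Big(X_{D}\,\bot\!\!\!\bot\,X_{A}\mid X_{P}\ \text{ and }\ \mathbb{P}_{\mu}(X_{D}|X_{P})=p_{D|P}\Big),\tag{12}$$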

where the independence is according to $\mu$ . By (12), we have that the vector $\mu$ satisfies the conditions of Theorem 2, and hence corresponds to a distribution $\mathbb{P}_{\mu}$ that factorizes on $G$ . Furthermore, constraint (10) ensures that $\mathbb{P}_{\mu}(X_{v}|X_{\mathrm{prt}(v)})=p_{v|\mathrm{prt}(v)}$ for all $v\in V^{\mathrm{s}}$ , which yields the result. Conversely, let $\delta$ be an admissible solution of the MEU Problem (3), and $\mu$ be the vector of moments induced by $\mathbb{P}_{\delta}$ . We have $\mu_{C_{v}}=\mu_{\check{C}_{v}}p_{v|\mathrm{prt}(v)}$ for $v$ in $V^{\mathrm{s}}$ and $\mu_{C_{v}}=\mu_{\check{C}_{v}}\delta_{v|\mathrm{prt}(v)}$ for $v$ in $V^{\mathrm{a}}$ , and $(\mu,\delta)$ is a solution of (11). Furthermore, $\mathbb{E}_{\delta}\Big{(}\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\Big{)}=\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle$ , and (11) is equivalent to the MEU Problem (3). ∎

By introducing the following set of moments

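$$\mathcal{S}(G)=\Big\{\mu\in\overline{\mathcal{P}}\colon\ \exists\,\delta\in\Delta\ \text{such that}\ \mu_{C_{v}}=\mu_{\check{C}_{v}}\,\delta_{v|\mathrm{prt}(v)}\ \text{for all }v\in V^{\mathrm{a}}\Big\},$$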

we can reformulate the Problem (11) more concisely as

$$\max_{\mu\in\mathcal{S}(G)}\ \sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle.\tag{14}$$

$\mathcal{S}(G)$ is the set of moments corresponding to distributions induced by feasible policies: $\mu$ is in $\mathcal{S}(G)$ if there exists $\delta$ in $\Delta$ such that $\mu_{C_{v}}(x_{C_{v}})=\mathbb{P}_{\delta}(X_{C_{v}}=x_{C_{v}})$ for all $v$ and $x_{C_{v}}$ . It is non-convex in general as shown by the examples in the proof of Theorem 12. However, we show in Section 6 that $\mathcal{S}(G)$ is a polytope if $G$ is soluble, a property identifying “easy” IDs.

4.2 MILP formulation

The NLP (11) is hard to solve due to the non-linear constraints (11d). But by Theorem 4, Problems (3) and (11) are equivalent, and in particular admit the same optimal solutions in terms of $\delta$ .

We recall that there always exists at least one optimal policy which is deterministic (and therefore integral) for Problem (3), that is a policy $\delta$ such that

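$$\delta_{v|\mathrm{prt}(v)}(x_{v}|x_{\mathrm{prt}(v)})\in\{0,1\}\quad\text{for all }x_{\mathrm{fa}(v)}\in\mathcal{X}_{\mathrm{fa}(v)}\ \text{and}\ v\in V^{\mathrm{a}}.\tag{15}$$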
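The existence of an optimal deterministic policy can be observed on a toy example. The sketch below uses a hypothetical influence diagram with a chance vertex $s$, a noisy observation $o$ of $s$, and a decision $a$ whose only parent is $o$; for brevity the reward is written directly as a function of $(s,a)$. It enumerates the four deterministic policies and compares them with randomized policies on a coarse grid, which is a numerical illustration and not a proof.

```python
from itertools import product

# Hypothetical toy influence diagram: chance s, observation o of s,
# decision a taken after seeing o, reward depending on (s, a).
p_s = {0: 0.5, 1: 0.5}
p_o_given_s = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key (o, s)
reward = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}  # key (s, a)

def expected_utility(delta):
    """Expected reward under a policy delta[(a, o)] = P(a | o)."""
    return sum(
        p_s[s] * p_o_given_s[(o, s)] * delta[(a, o)] * reward[(s, a)]
        for s, o, a in product((0, 1), repeat=3)
    )

def deterministic(choice):
    """Policy playing a = choice[o] with probability one."""
    return {
        (a, o): 1.0 if choice[o] == a else 0.0
        for a, o in product((0, 1), repeat=2)
    }

# Enumerate the four deterministic policies o -> a.
values = {
    choice: expected_utility(deterministic(choice))
    for choice in product((0, 1), repeat=2)
}
best_choice = max(values, key=values.get)

# Randomized policies on a coarse grid never beat the best deterministic one.
grid = [i / 10 for i in range(11)]
best_randomized = max(
    expected_utility({(1, 0): q0, (0, 0): 1 - q0, (1, 1): q1, (0, 1): 1 - q1})
    for q0, q1 in product(grid, repeat=2)
)
```

The expected utility is linear in each conditional distribution of the policy taken separately, which is why its maximum over the grid is attained at a vertex, that is, at a deterministic policy.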

We can therefore add integrality constraint (15) to (11). With this integrality constraint, Equation (11d) becomes a logical constraint, i.e., a constraint of the form $\lambda y=z$ with $\lambda$ binary and $y$ and $z$ continuous. Such constraints can be handled by modern MILP solvers such as CPLEX or Gurobi, which can therefore directly solve Problem (11). Alternatively, by a classical result in integer programming, we can turn Problem (11) into an equivalent MILP by replacing constraint (11d) by its McCormick relaxation (McCormick, 1976). For a given $\mathfrak{p}$, let $b$ be a vector of upper bounds $b_{\check{C}_{v}}(x_{\check{C}_{v}})$ satisfying

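$$\mathbb{P}_{\delta}(X_{\check{C}_{v}}=x_{\check{C}_{v}})\leq b_{\check{C}_{v}}(x_{\check{C}_{v}})\quad\text{for all }\delta\in\Delta,\ v\in V^{\mathrm{a}},\ x_{\check{C}_{v}}\in\mathcal{X}_{\check{C}_{v}}.$$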

For such a vector $b$, we say that, for a given node $v$, $(\mu_{C_{v}},\delta_{v|\mathrm{prt}(v)})$ satisfies McCormick’s inequalities (see Appendix D) if

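$$\left\{\begin{array}{l}\mu_{C_{v}}\geq\mu_{\check{C}_{v}}+(\delta_{v|\mathrm{prt}(v)}-1)\,b_{\check{C}_{v}},\\ \mu_{C_{v}}\leq\delta_{v|\mathrm{prt}(v)}\,b_{\check{C}_{v}},\\ \mu_{C_{v}}\leq\mu_{\check{C}_{v}}.\end{array}\right.\qquad(\mathrm{McCormick}(v,b))$$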

Note that the last inequality $\mu_{C_{v}}\leq\mu_{\check{C}_{v}}$ can be omitted in our case as it is implied by the marginalization constraint $\mu_{\check{C}_{v}}=\sum_{x_{v}}\mu_{C_{v}}$ in the definition of $\mathcal{L}_{\mathcal{G}}$ . Given the upper bounds provided by $b$ , we introduce the polytope of valid moments and decisions satisfying all McCormick constraints:

[TABLE]

With the previously introduced notation the MEU Problem (3) is equivalent to the following MILP:

[TABLE]

where $\Delta^{\mathrm{d}}$ is the set of deterministic policies and contains the integrality constraints (15).

Remark 4.

The strength of the McCormick constraints ($\mathrm{McCormick}(v,b)$) depends on the quality of the bounds $b_{\check{C}_{v}}$ on $\mu_{\check{C}_{v}}$. Since, for a solution $\mu$ of Problem (18), $\mu_{\check{C}_{v}}$ corresponds to a probability distribution, the simplest admissible bound on $\mu_{\check{C}_{v}}$ is $b=1$. Unfortunately, McCormick’s constraints are loose in this case: we show in Appendix D.2.1 that, for any $\mu$ in $\overline{\mathcal{P}}$, there exists $\delta$ in $\Delta$ such that $(\mu,\delta)$ satisfies the McCormick constraints. Hence, when $b=1$, McCormick constraints fail to retain any information about the conditional independence statements encoded in the associated nonlinear constraints. Since $\delta$ does not appear outside of the McCormick constraints, their sole interest in that case is to enable the branching decisions on $\delta$ to have an impact on $\mu$. Appendix D.2.2 gives an example showing that McCormick constraints do retain information about the conditional independence if bounds $b_{\check{C}_{v}}$ smaller than $1$ are used. Finally, Appendix D.3 provides a dynamic programming algorithm that efficiently computes such a $b$. ∎
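The effect of the bound $b$ can be seen on a single entry of the moments. The sketch below uses the standard McCormick envelope of a product $z=xy$ with $x\in[0,b]$ and $y\in[0,1]$, together with $z\geq 0$, and returns the interval of values of $\mu_{C_{v}}$ it allows; the function name and the numerical values are hypothetical.

```python
def mccormick_interval(mu_check, delta, b):
    """Values of mu_C allowed by the McCormick inequalities for one entry.

    mu_check lies in [0, b] and delta in [0, 1]; the product being relaxed
    is mu_C = mu_check * delta.
    """
    lower = max(0.0, mu_check + (delta - 1.0) * b)
    upper = min(delta * b, mu_check)
    return lower, upper

# Binary delta: the interval collapses onto the product, whatever b is.
exact_0 = mccormick_interval(0.4, 0.0, 1.0)
exact_1 = mccormick_interval(0.4, 1.0, 1.0)

# Fractional delta with the trivial bound b = 1: the interval is wide.
loose = mccormick_interval(0.4, 0.5, 1.0)

# Fractional delta with a tight bound b = mu_check: the product is recovered.
tight = mccormick_interval(0.4, 0.5, 0.4)
```

In this example the loose interval runs from $0$ to $0.4$ although the product is $0.2$, while the tight bound leaves only the value $0.2$.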

5 Valid cuts

Classical techniques in integer programming such as branch-and-bound algorithms rely on solving the linear relaxation of the MILP to obtain a bound on the optimal value of the objective (an upper bound for a maximization problem such as (18)). For Problem (18) the relaxation is likely to be poor, and so the MILP is not well solved by off-the-shelf solvers: indeed, as explained above, when $b=1$, the McCormick inequalities fail completely to enforce in the linear relaxation the conditional independences that are encoded in the nonlinear constraints, and using a better bound $b$ does not completely address the issue. In this section, we introduce valid cuts to strengthen the relaxation and ease the MILP resolution. A valid cut for a MILP is an (in)equality that is satisfied by any solution of the MILP, but not necessarily by solutions of its linear relaxation. A family of valid cuts is stronger than another when the former yields a polytope strictly included in the latter.

5.1 Constructing valid cuts

By restricting ourselves to vectors of moments $\mu\in\overline{\mathcal{P}}$ , we have imposed

[TABLE]

because $\mu\in\overline{\mathcal{P}}$ must satisfy $\mu_{C_{v}}=\mu_{\check{C}_{v}}p_{v|\mathrm{prt}(v)}$. If we could impose as well the nonlinear constraints $\mu_{C_{v}}=\mu_{\check{C}_{v}}\delta_{v|\mathrm{prt}(v)}$ for $v$ in $V^{\mathrm{a}}$, we would be able to impose that decisions encoded in $\mu$ at the nodes $a\in V^{\mathrm{a}}$ satisfy $\mathbb{P}_{\mu}(X_{a}|X_{C_{a}\backslash\{a\}})=\mathbb{P}_{\mu}(X_{a}|X_{\mathrm{prt}(a)})$. Unfortunately, in general, these constraints cannot be imposed linearly, since $\delta$ is itself a variable of the problem. The constraint $\mu_{C_{v}}=\mu_{\check{C}_{v}}p_{v|\mathrm{prt}(v)}$ for $v$ in $V^{\mathrm{s}}$ is linear only because $p_{v|\mathrm{prt}(v)}$ is a constant that does not depend on $\delta$. But, as an indirect consequence of setting the conditional distributions $p_{v|\mathrm{prt}(v)}$ for $v\in V^{\mathrm{s}}$, there are other conditional distributions that do not depend on $\delta$. Indeed, for some pairs of sets of vertices $C,D$ with $D\subseteq C$, the conditional probabilities $\mathbb{P}_{\delta}(X_{D}=x_{D}|X_{C\backslash D}=x_{C\backslash D})$ are identical for any policy $\delta$. We can therefore introduce valid cuts of the form

$$\mu_{C}=\mu_{C\backslash D}\,p_{D|C\backslash D}.\tag{19}$$

While these additional constraints are not needed to set the value of the conditionals on $v\in V^{\mathrm{s}}$ and the conditional independences of the form $X_{v}\,\bot\!\!\!\bot\,X_{V\backslash\mathrm{dsc}(v)}\mid X_{\mathrm{prt}(v)}$ for $v\in V^{\mathrm{s}}$, they can be useful to enforce some of the conditional independences that should be satisfied by $\mu$ at decision nodes. In particular, if there exists a subset $M$ of $C\backslash D$ such that $p_{D|C\backslash D}=p_{D|M}$, then (19) enforces that, for any $a\in V^{\mathrm{a}}\cap(C\backslash(D\cup M))$, we have $\mathbb{P}_{\mu}(X_{a}|X_{D\cup M})=\mathbb{P}_{\mu}(X_{a}|X_{M})$. Clearly, the larger $D$, the stronger the valid cut. This motivates the following definition.

Definition 2.

Given a set of vertices $C$, we define $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ to be the largest subset $D$ of $C$ such that, for any parametrization of $G$, there exists $p_{D|C\backslash D}$ such that $\mathbb{P}_{\delta}(X_{D}|X_{C\backslash D})=p_{D|C\backslash D}$ holds for any policy $\delta$. We define $C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}$ as $C\backslash C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$.

It is not obvious that a largest such set exists and is unique, and therefore that $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ is well defined. We prove that it is the case later in this section. As for now, if we accept that $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ is well defined, then the equalities

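$$\mu_{C}=\mu_{C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}\;p_{C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}\tag{20}$$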

are the strongest valid cuts of the form (19) that we can obtain for Problem (18). We can then define $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ as the polytope we obtain when we strengthen $\overline{\mathcal{P}}$ with our valid cuts:

[TABLE]

In the definition of $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ , we decided to introduce valid cuts of the form (20) only for sets of vertices $C$ of the form $C_{v}$ with $v\in V^{\mathrm{a}}$ . This is to strike a balance between the number of constraints added and the number of independences enforced. Our choice is however heuristic, and it could notably be relevant to introduce constraints of the form (20) for well chosen $C\subsetneq C_{v}$ .

Figure 5 provides an example of ID where valid cuts (20) reduce the size of the initial polytope. To compute $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ , we have used the characterization in the next section.

5.2 Characterization of $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$

In order to characterize $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ , we need some concepts from graphical model theory. The first notions make it possible to identify conditional independence from properties of the graph. Let $D\subset V$ be a set of vertices. A trail ${v_{1}}\rightleftharpoons\dots\rightleftharpoons{v_{n}}$ is active given $D$ if, whenever there is a v-structure ${v_{i-1}}\rightarrow{v_{i}}\leftarrow{v_{i+1}}$ , then $v_{i}$ or one of its descendants is in $D$ , and no other vertex of the trail is in $D$ . Two sets of vertices $B_{1}$ and $B_{2}$ are said to be d-separated by $D$ in $G$ , and we will denote this property by $B_{1}\,\bot\,B_{2}\mid D$ , if there is no active trail between $B_{1}$ and $B_{2}$ given $D$ . We have $X_{B_{1}}\,\bot\!\!\!\bot\,X_{B_{2}}\mid X_{D}$ for any distribution that factorizes on $G$ if and only if $B_{1}$ and $B_{2}$ are d-separated by $D$ (Koller and Friedman, 2009, Theorem 3.4).

The other notion we need is the one of augmented model (Koller and Friedman, 2009, Chapter 21). Consider $(G,\rho)$ , a PID with $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ , and let $V=V^{\mathrm{a}}\cup V^{\mathrm{s}}$ . For each $v\in V^{\mathrm{a}}$ , we introduce a vertex $\vartheta_{v}$ and a corresponding random variable $\theta_{v}$ . The variable $\theta_{v}$ takes its value in the space $\Delta_{v}$ of conditional distributions on $X_{v}$ given $X_{\mathrm{prt}(v)}$ . Let ${G^{\dagger}}$ be the digraph with vertex set $V_{G^{\dagger}}=V\cup\vartheta_{V^{\mathrm{a}}}$ , where $\vartheta_{V^{\mathrm{a}}}=\{\vartheta_{v}\}_{v\in V^{\mathrm{a}}}$ , and arc set $E_{G^{\dagger}}=E\cup\{(\vartheta_{v},v),\forall v\in V^{\mathrm{a}}\}$ . Such a graph ${G^{\dagger}}$ is illustrated on Figure 6, where vertices in ${G^{\dagger}}\backslash G$ are represented as rectangles with rounded corners. The augmented model of $(G,\rho)$ is the collection of distributions factorizing on ${G^{\dagger}}$ such that $\mathcal{X}_{v}$ is defined as in $\rho$ for each $v$ in $V$ , $\mathcal{X}_{\theta_{v}}=\Delta_{v}$ , and

[TABLE]

where $x_{\mathrm{prt}_{{G^{\dagger}}}(v)}=(x_{\mathrm{prt}_{G}(v)},\theta^{o}_{v})$ for $v\in V^{\mathrm{a}}$ , and $x_{\mathrm{prt}_{{G^{\dagger}}}(v)}=x_{\mathrm{prt}_{G}(v)}$ for $v\in V^{\mathrm{s}}$ .

A distribution of the augmented model is specified by choosing the distributions of the $\theta_{v}$ . In the rest of the paper, we denote by $\mathbb{P}_{{G^{\dagger}}}$ the distribution of the augmented model with uniformly distributed $\theta_{v}$ for each $v$ in $V^{\mathrm{a}}$ .

With these definitions, a policy $\delta$ can now be interpreted as a value taken by $\theta_{V^{\mathrm{a}}}$ , and we have

[TABLE]

Note that in general $\mathbb{P}_{{G^{\dagger}}}(X_{D}=x_{D}|X_{M}=x_{M})$ is the expected value over $\theta_{V^{\mathrm{a}}}$ of $\mathbb{P}_{\theta_{V^{\mathrm{a}}}}(X_{D}=x_{D}|X_{M}=x_{M}).$ The following result, which is an immediate consequence of (23), characterizes the pairs $(D,M)$ such that the conditional distribution $\mathbb{P}_{\delta}(X_{D}|X_{M})$ is the same regardless of the choice of policy $\delta$ .

Proposition 5.

We have $\mathbb{P}_{\delta}(X_{D}|X_{M})=\mathbb{P}_{{G^{\dagger}}}(X_{D}|X_{M})$ for any PID on $G$ , any policy $\delta$ , and any $M$ such that $\mathbb{P}_{\delta}(X_{M})>0$ if and only if $D$ is d-separated from $\vartheta_{V^{\mathrm{a}}}$ given $M$ in ${G^{\dagger}}.$

Note that this is a particular case of a result known in causality theory for graphical models (see, e.g., Koller and Friedman, 2009, Proposition 21.3). We now have all the tools to characterize $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ .

Theorem 6.

$C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ exists, is unique, and is equal to $\Big{\{}v\in C\colon v\perp\vartheta_{V^{\mathrm{a}}}\,|\,C\backslash\{v\}\Big{\}}$ .

With this characterization, the reader can check the value of $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ on the example of Figure 5.

If we want to use the valid cuts in (21) in practice, we must compute $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ and $p_{C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}$ efficiently. Theorem 6 ensures that $C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ is easy to compute using any d-separation algorithm (and more efficient algorithms are presumably possible), and Proposition 5 ensures that, if we solve the inference problem on the RJT for an arbitrary policy, e.g., one where decisions are taken with uniform probability, we can deduce $p_{C^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}$ from the distribution $\mu_{C}$ obtained.
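To make this computational claim concrete, here is a minimal Python sketch of the characterization in Theorem 6. It assumes the ID is given as a dictionary mapping each vertex to its list of parents. The function names (`d_separated`, `independent_part`), the encoding of $\vartheta_{v}$ as a tuple, and the toy graph are illustrative assumptions, and d-separation is tested with the classical moralized ancestral graph criterion rather than with any particular algorithm used by the authors.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs (moral ancestral graph test)."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral graph: parent-child edges plus co-parent edges.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete zs and look for an undirected path from xs to ys.
    seen, stack = set(xs - zs), list(xs - zs)
    while stack:
        u = stack.pop()
        if u in ys:
            return False
        for w in adj[u]:
            if w not in zs and w not in seen:
                seen.add(w)
                stack.append(w)
    return True

def independent_part(parents, decisions, C):
    """Vertices v of C that are d-separated from every theta node given C minus {v},
    where the test is run in the augmented graph G-dagger."""
    aug = {v: list(ps) for v, ps in parents.items()}
    thetas = []
    for a in decisions:
        theta = ("theta", a)          # hypothetical encoding of the vertex theta_a
        aug[a] = list(aug.get(a, ())) + [theta]
        thetas.append(theta)
    C = set(C)
    return {v for v in C if d_separated(aug, {v}, thetas, C - {v})}

# Toy ID (an assumption for illustration): s1 -> o -> a -> s2 and s1 -> s2,
# where a is the only decision vertex and observes o.
parents = {"s1": [], "o": ["s1"], "a": ["o"], "s2": ["s1", "a"]}
print(independent_part(parents, ["a"], {"s1", "o", "a"}))  # {'s1'}
```

On this toy graph only `s1` is returned: the conditional distribution of `s1` given `(o, a)` does not depend on the policy, whereas `o` and `a` remain connected to the policy parameter of `a`.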

Theorem 6 is an immediate corollary of the following Lemma, recently proved by two of the authors (Cohen and Parmentier, 2019, Theorem 1).

Lemma 7.

Let $B$ and $C$ be two sets of vertices. Then $M^{*}:=\Big{\{}v\in C\backslash B\colon v\,\not\perp\,B\,|\,C\backslash(B\cup\{v\})\Big{\}}$ is a subset $M$ of $C$ such that

[TABLE]

Furthermore, if $M$ satisfies (24), then $M^{*}\subseteq M,$ so that $M^{*}$ is a minimum for the inclusion.

Cohen and Parmentier (2019) call $M^{*}$ the Markov Blanket of $B$ in $C$ . Note that if $C=V$ this is the usual Markov Blanket.

Proof of Theorem 6.

Let $M$ be a subset of $C$ . Proposition 5 ensures that $\mathbb{P}_{\delta}(X_{C\backslash M}|X_{M})$ does not depend on $\delta$ for any parametrization if and only if $C\backslash M\perp\vartheta_{V^{\mathrm{a}}}\mid M$ . Theorem 6 then follows by letting $B=\vartheta_{V^{\mathrm{a}}}$ in Lemma 7. ∎

Using the terminology of Cohen and Parmentier (2019), $C^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}$ is the Markov blanket of $\vartheta_{V^{\mathrm{a}}}$ in $C$ .

5.3 Strength of the relaxations and their interpretation in terms of graph

Consider $(G,\rho)$ , a PID with $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ and $\rho=(\mathcal{X},\mathfrak{p},r)$ . Let $\mathcal{G}$ be an RJT on $G$ , and $b$ an admissible bound satisfying (16). The valid cuts of Section 5.1 make it possible to introduce the following strengthened version of the MILP (18).

[TABLE]

The following proposition summarizes the results of Section 5.1.

Proposition 8.

Any feasible solution $(\mu,\delta)$ of the MILP (25) is such that $\mu$ is the vector of moments of the distribution $\mathbb{P}_{\delta}$ . Hence, $(\mu,\delta)$ is an optimal solution of (25) if and only if $\delta$ is an optimal solution of the MEU problem (3) on $(G,\rho)$ .

In this section we give interpretations of the linear relaxations of (18) and (25) in terms of graphs. We introduce the sets of edges and IDs

[TABLE]

Figure 7 illustrates $\overline{G}$ and $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ on the ID of Figure 2. Note that $E\subseteq E^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!\subseteq\overline{E}$ , and observe the following three facts about $\overline{G}$ and $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ . First, the definition of both IDs depends on $G$ and $\mathcal{G}$ . Second, $\mathcal{G}$ is still an RJT on $\overline{G}$ and $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ . Third, any parametrization $(\mathcal{X}_{V},\mathfrak{p},r)$ of $G$ is also a parametrization of $\overline{G}$ and of $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ . The second and third facts hold for any ID $G^{\prime}=(V^{\mathrm{s}},V^{\mathrm{a}},E\cup E^{\prime})$ , where $E^{\prime}$ contains only arcs of the form $(u,v)$ with $v\in V^{\mathrm{a}}$ and $u\in C_{v}$ . Hence, if we denote by $\Delta_{G^{\prime}}$ the set of feasible policies for $(G^{\prime},\mathcal{X}_{V},\mathfrak{p},r)$ , we can extend the definition of $\mathcal{S}(G)$ in Equation (13) to such $G^{\prime}$ :

[TABLE]

Theorem 9.

We have

[TABLE]

and

[TABLE]

Hence, if $(\mu,\delta)$ is a solution of the linear relaxation of (18), then $\delta$ is a policy on $\overline{G}$ , while if $(\mu,\delta)$ is a solution of the linear relaxation of (25), then $\delta$ is a policy on $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ .

Remark furthermore that $\mathcal{S}(G^{\prime})$ is generally not a polytope. Indeed, when $G^{\prime}=G$ , this is the reason why $\eqref{pb:NLP}$ is not a linear program. An important result of the theorem is that $\mathcal{S}(\overline{G})$ and $\mathcal{S}(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!)$ are polytopes, and $MEU(\overline{G},\rho)$ and $MEU(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!,\rho)$ can therefore be solved using the linear programs $\max_{\mu\in\overline{\mathcal{P}}}\displaystyle\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle$ and $\max_{\mu\in\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}\displaystyle\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle$ respectively.

The proof of the theorem uses the following lemma.

Lemma 10.

Let $v$ be a vertex in $V^{\mathrm{a}}$ . Then $x_{C_{v}}\mapsto p_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}(x_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}}|x_{C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash v},x_{v})$ is a function of $(x_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}},x_{C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash v})$ only.
Hence, if a distribution $\mu_{C_{v}}$ satisfies $\mu_{C_{v}}=\mu_{C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}p_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}$ , then $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}\perp v\mid C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash\{v\}$ according to $\mu_{C_{v}}$ .

Proof.

Consider the augmented model $\mathbb{P}_{{G^{\dagger}}}$ . Let $P$ be a $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ - $v$ trail. Let $Q$ be the trail $P$ followed by the arc $(v,\vartheta_{v})$ . Given that $v$ has no descendants in $C_{v}$ (because of the hypothesis $\mathring{C}_{v}=\{v\}$ ), the vertex $v$ is a v-structure of $Q$ . As $v\in C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}$ , if $P$ is active given $C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash\{v\}$ , then $Q$ is active given $C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}$ , which contradicts the definition of $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ . Hence, $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}\perp v\mid C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash\{v\}$ according to $\mathbb{P}_{{G^{\dagger}}}$ , and $x_{C_{v}}\mapsto p_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}|C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}}(x_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}}|x_{C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash v},x_{v})$ is a function of $(x_{C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}},x_{C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash v})$ only. The second part of the lemma is an immediate corollary. ∎

Proof of Theorem 9.

First, remark that, once we have proved $\overline{\mathcal{P}}=\mathcal{S}(\overline{G})$ and $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}=\mathcal{S}(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!)$ , the result follows from Theorem 4.

We now prove $\overline{\mathcal{P}}=\mathcal{S}(\overline{G})$ . Let $\mu$ be in $\overline{\mathcal{P}}$ . Then $\mu$ is a vector of moments in the local polytope of the RJT $\mathcal{G}$ on $\overline{G}$ . Furthermore, since, first, for $v\in V^{\mathrm{a}},$ $\mathrm{fa}_{\overline{G}}(v)=C_{v}$ , and second, for $v\in V^{\mathrm{s}},\mu_{C_{v}}=\mu_{\check{C}_{v}}p_{v|\mathrm{prt}_{G}(v)}$ together with $\mathrm{prt}_{\overline{G}}(v)=\mathrm{prt}_{G}(v)$ imply that, according to $\mu_{C_{v}}$ , $X_{v}$ is independent from its non-descendants in $\overline{G}$ restricted to $C_{v}$ given $\mathrm{prt}_{\overline{G}}(v)$ , Theorem 2 ensures that $\mu$ is a vector of moments of a distribution that factorizes on $\overline{G}$ , which yields $\overline{\mathcal{P}}\subseteq\mathcal{S}(\overline{G})$ . Inclusion $\mathcal{S}(\overline{G})\subseteq\overline{\mathcal{P}}$ is immediate.

Consider now a vector of moments $\mu$ in $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ . Given $v\in V^{\mathrm{a}}$ , Lemma 10 and the definition of $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ ensure that, according to $\mu_{C_{v}}$ , variable $X_{v}$ is independent from its non-descendants in $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ in $C_{v}$ , i.e., $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}\backslash\{v\}$ , given its parents in $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ , i.e., $C_{v}^{{\!\scriptscriptstyle\,\not\bot\!\!\!\bot\,\!}}\backslash v$ . If $v\in V^{\mathrm{s}}$ , the constraint $\mu_{C_{v}}=\mu_{\check{C}_{v}}p_{v|\mathrm{prt}(v)}$ still implies that $X_{v}$ is independent from its non-descendants in $C_{v}$ given its parents according to $\mu_{C_{v}}$ , because by definition of $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ , for $v\in V^{\mathrm{s}}$ , we have $\mathrm{prt}_{G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!}(v)=\mathrm{prt}_{G}(v)$ . Theorem 2 again allows us to conclude that $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}\subseteq\mathcal{S}(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!)$ . The inclusion $\mathcal{S}(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!)\subseteq\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ is immediate. ∎

6 Soluble influence diagrams

In this section, we make the assumption that IDs are such that any vertex $v\in V$ has a descendant in the set of utility vertices $V^{\mathrm{r}}$ , i.e., $V^{\mathrm{s}}\cup V^{\mathrm{a}}=\overline{\mathrm{asc}}(V^{\mathrm{r}})$ . The following remark explains why we can make this assumption without loss of generality.

Remark 5.

Consider a parametrized ID $(G,\rho)$ where $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ and $V^{\mathrm{s}}$ is the union of chance vertices $V^{\mathrm{c}}$ and utility vertices $V^{\mathrm{r}}$ . Let $(G^{\prime},\rho^{\prime})$ be the ID obtained by removing any vertex that is not in $V^{\mathrm{r}}$ and has no descendant in $V^{\mathrm{r}}$ , and restricting $\rho$ accordingly. If a random vector $X_{V}$ factorizes as a directed graphical model on $(V,E)$ and $V^{\prime}\subseteq V$ is such that $\overline{\mathrm{asc}}(V^{\prime})=V^{\prime}$ , then $X_{V^{\prime}}$ factorizes as a directed graphical model on the subgraph induced by $V^{\prime}$ with the same conditional probabilities $p_{v|\mathrm{prt}(v)}$ . Hence, given a policy $\delta$ on $(G,\rho)$ and its restriction $\delta^{\prime}$ to $(G^{\prime},\rho^{\prime})$ , we have $\mathbb{E}_{\delta}\big{(}\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\big{)}=\mathbb{E}_{\delta^{\prime}}\big{(}\sum_{v\in V^{\mathrm{r}}}r_{v}(X_{v})\big{)}$ where the first expectation is taken in $(G,\rho)$ and the second in $(G^{\prime},\rho^{\prime})$ , and the two IDs model the same MEU problem. ∎
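The reduction described in this remark amounts to keeping the utility vertices and their ancestors. A minimal Python sketch follows, assuming the graph is stored as a dictionary from each vertex to its list of parents; the function name `prune_to_utility_ancestors` and the toy graph are hypothetical.

```python
def prune_to_utility_ancestors(parents, utility):
    """Keep utility vertices and their ancestors; return the restricted parent map."""
    keep, stack = set(utility), list(utility)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    # `keep` is closed under taking parents, so parent lists need no filtering.
    return {v: list(parents.get(v, ())) for v in keep}

# Toy graph (an assumption for illustration): s -> a -> r, plus a chance vertex c
# that is a child of s and has no utility descendant.
parents = {"s": [], "a": ["s"], "r": ["a"], "c": ["s"]}
pruned = prune_to_utility_ancestors(parents, ["r"])
print(sorted(pruned))  # ['a', 'r', 's']
```

Because the retained vertex set is closed under ancestors, the conditional distributions of the remaining vertices can be reused unchanged, which is the fact the remark relies on.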

The proofs of this section are quite technical and can be found in Appendix B.

6.1 Linear program for soluble influence diagrams

Consider an ID $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ with $V^{\mathrm{s}}$ being the union of chance vertices $V^{\mathrm{c}}$ and utility vertices $V^{\mathrm{r}}$ . Given a policy $(\delta_{u})_{u\in V^{\mathrm{a}}}$ and a decision vertex $v$ , we denote by $\delta_{-v}$ the partial policy $(\delta_{u})_{u\in V^{\mathrm{a}}\backslash v}$ . A policy $(\delta_{v})_{v\in V^{\mathrm{a}}}$ is called a local optimum if

[TABLE]

It is a global optimum if it is an optimal solution of (3). Two concepts, s-reachability and the relevance graph, have been introduced in the literature to characterize when a local optimum is also global (see, e.g., Koller and Friedman, 2009, Chapter 23.5). A decision vertex $u$ is s-reachable from a decision vertex $v$ if $\vartheta_{u}$ is not d-separated from $\mathrm{dsc}(v)$ given $\mathrm{fa}(v)$ :

[TABLE]

The usual definition is $\vartheta_{u}\,\not\perp_{G^{\dagger}}\,\mathrm{dsc}(v)\cap V^{\mathrm{r}}\mid\mathrm{fa}(v)$ , but these definitions coincide in our setting, since we have assumed that $\mathrm{dsc}(v)\cap V^{\mathrm{r}}\neq\emptyset$ for any $v\in V^{\mathrm{a}}$ . Intuitively, the definition of this concept is motivated by the fact that the choice of policy $\delta_{v}$ given $(\delta_{w})_{w\neq v}$ depends on $\delta_{u}$ only if $u$ is s-reachable from $v$ . Note that, for example, if $u\in\mathrm{dsc}(v),$ then $u$ is s-reachable from $v$ . The relevance graph of $G$ is the digraph $H$ with vertex set $V^{\mathrm{a}}$ , and whose arcs are the pairs $(v,u)$ of decision vertices such that $u$ is s-reachable from $v$ . Finally, the single policy update algorithm (SPU) (Lauritzen and Nilsson, 2001) is the standard coordinate ascent heuristic for IDs. It iteratively improves a policy $\delta$ as follows: at each step, a vertex $v$ is picked, and $\delta_{v}$ is replaced by an element in $\displaystyle\operatorname*{\arg\!\max}_{\delta^{\prime}_{v}\in\Delta_{v}}\mathbb{E}_{\delta^{\prime}_{v},\delta_{-v}}\Big{(}\sum_{u\in V^{\mathrm{r}}}r_{u}(X_{u})\Big{)}$ .
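The coordinate ascent step of SPU can be illustrated on very small instances with the following Python sketch. It is not the implementation of Lauritzen and Nilsson (2001): expectations are computed by brute-force enumeration of all joint assignments instead of inference on a junction tree, policies are restricted to deterministic ones, and, as a simplifying assumption, the utility is passed as a function `reward` of the full assignment rather than through utility vertices. All names are hypothetical.

```python
from itertools import product

def expected_utility(order, card, parents, cpt, reward, policy):
    """Brute-force expected utility of a deterministic policy (tiny IDs only)."""
    total = 0.0
    for values in product(*(range(card[v]) for v in order)):
        x = dict(zip(order, values))
        prob = 1.0
        for v in order:
            pa = tuple(x[u] for u in parents[v])
            if v in policy:   # decision vertex: deterministic decision rule
                prob *= 1.0 if policy[v][pa] == x[v] else 0.0
            else:             # chance vertex: conditional probability table
                prob *= cpt[v][pa][x[v]]
        total += prob * reward(x)
    return total

def spu(order, card, parents, cpt, reward, policy, max_passes=20):
    """Single policy update: improve one decision rule at a time until no gain."""
    policy = {v: dict(rule) for v, rule in policy.items()}
    best = expected_utility(order, card, parents, cpt, reward, policy)
    for _ in range(max_passes):
        improved = False
        for v in policy:
            # With the other rules fixed, the expected utility is linear in
            # delta_v and separable over parent configurations, so optimizing
            # each configuration in turn yields an argmax over delta_v.
            for pa in policy[v]:
                for action in range(card[v]):
                    old = policy[v][pa]
                    policy[v][pa] = action
                    val = expected_utility(order, card, parents, cpt, reward, policy)
                    if val > best + 1e-12:
                        best, improved = val, True
                    else:
                        policy[v][pa] = old
        if not improved:
            break
    return policy, best

# Toy ID (an assumption): a uniform binary state s, a decision a observing s,
# and a reward of 1 when the decision matches the state.
order = ["s", "a"]
card = {"s": 2, "a": 2}
parents = {"s": [], "a": ["s"]}
cpt = {"s": {(): [0.5, 0.5]}}
reward = lambda x: 1.0 if x["a"] == x["s"] else 0.0
start = {"a": {(0,): 0, (1,): 0}}
print(spu(order, card, parents, cpt, reward, start))
```

On this single-decision example the procedure reaches the rule that copies the state. With several decisions it may stop at a local optimum, which is the situation the notion of soluble ID addresses.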

The following proposition characterizes a subset of IDs, called soluble IDs, which are easily solved, and provides several equivalent criteria to identify them.

Proposition 11.

(Koller and Friedman, 2009, Theorem 23.5) Given an influence diagram $G$ , the following statements are equivalent and define a soluble influence diagram.

1. For any parametrization $\rho$ of $G$ , any local optimum is a global optimum.

2. For any parametrization $\rho$ of $G$ , SPU converges to a global optimum in a finite number of steps. (In fact, if the graph is soluble, and if the decision nodes are ordered in reverse topological order for the relevance graph, then SPU converges after exactly one pass over the nodes.)

3. The relevance graph is acyclic.
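The acyclicity criterion can be checked mechanically. The Python sketch below builds the relevance graph from the s-reachability definition and tests whether it is acyclic. It assumes a parent-list encoding of the ID, strict descendants for $\mathrm{dsc}(v)$ , no self-loops in the relevance graph, and a moralized ancestral graph test for d-separation; the function names and the two toy graphs are hypothetical.

```python
from itertools import combinations

def _closure(succ, start):
    """Vertices reachable from `start` (inclusive) following `succ`."""
    seen, stack = set(start), list(start)
    while stack:
        for w in succ.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def d_separated(parents, xs, ys, zs):
    """d-separation test via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = _closure(parents, xs | ys | zs)       # ancestral set
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):          # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    reach = _closure({v: adj[v] - zs for v in keep}, xs - zs)
    return not (reach & ys)

def relevance_graph(parents, decisions):
    """Arcs (v, u) such that decision u is s-reachable from decision v."""
    children = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(v)
    aug = {v: list(ps) for v, ps in parents.items()}
    for a in decisions:                           # augmented graph G-dagger
        aug[a] = aug.get(a, []) + [("theta", a)]
    arcs = set()
    for v in decisions:
        dsc = _closure(children, [v]) - {v}
        fa = set(parents.get(v, ())) | {v}
        for u in decisions:
            if u != v and not d_separated(aug, [("theta", u)], dsc, fa):
                arcs.add((v, u))
    return arcs

def is_soluble(parents, decisions):
    """True if the relevance graph is acyclic."""
    succ = {}
    for v, u in relevance_graph(parents, decisions):
        succ.setdefault(v, []).append(u)
    return all(v not in _closure(succ, succ.get(v, [])) for v in decisions)

# Perfect recall: a2 observes s and a1. Team problem: a1 and a2 observe nothing.
recall = {"s": [], "a1": ["s"], "a2": ["s", "a1"], "r": ["s", "a1", "a2"]}
team = {"a1": [], "a2": [], "r": ["a1", "a2"]}
print(is_soluble(recall, ["a1", "a2"]), is_soluble(team, ["a1", "a2"]))  # True False
```

In the perfect recall example the only arc is `("a1", "a2")`, so the relevance graph is acyclic. In the team example each decision is s-reachable from the other, which creates a cycle.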

Given a parametrized influence diagram $G$ and an RJT $\mathcal{G}$ , we introduced in Equation (13) the notation $\mathcal{S}(G)$ for the subset of the local polytope $\mathcal{L}_{G}$ corresponding to moments of policies.

The following theorem introduces a new characterization of soluble IDs in terms of convexity.

Theorem 12.

If $G$ is not soluble then there exists a parametrization $\rho$ such that, for any junction tree $\mathcal{G}$ , the set of achievable moments $\mathcal{S}(\mathcal{G})$ is not convex.

If $G$ is soluble, Algorithm 2 returns an RJT such that $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}=\mathcal{S}(\mathcal{G})$ for any parametrization $\rho$ .

The property of being soluble characterizes “easy” IDs that can be solved by SPU. Theorems 4 and 12 imply that, if $G$ is soluble, our MILP formulation (25) reduces to the linear program

[TABLE]

and is therefore “easy” to solve. Of course, this property of being “easy” refers only to the decision part of the ID. If a soluble ID is such that, given a policy, the inference problem is not tractable, neither SPU nor our MILP formulation will be tractable in practice. Theorem 12 is a corollary of Theorem 9 and the following lemma, and both results are proved in Appendix B.

Lemma 13.

There exists an RJT $\mathcal{G}$ such that $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!=G$ if and only if $G$ is soluble. Such an RJT can be computed using Algorithm 2.

Note that, based on a topological order of the relevance graph, Algorithm 2 proceeds by computing a maximal perfect recall graph that contains graph $G$ and that assigns the same parent sets to elements of $V^{\mathrm{s}},$ then uses a topological order of this graph to order the nodes of $G$ for the computation of a rooted junction tree.

6.2 Comparison of soluble and linear relaxations

MILP solvers are based on (much improved) branch-and-bound algorithms that use the linear relaxation to obtain bounds. Their ability to solve formulation (25) therefore depends on the quality of the bound provided by the linear relaxation. As SPU efficiently solves soluble IDs, we could imagine alternative branch-and-bound schemes that use bounds computed using SPU on “soluble graph relaxations” of influence diagrams. We now formalize this notion and compare the two approaches.

A soluble graph relaxation of an ID $G=(V^{\mathrm{s}},V^{\mathrm{a}},E)$ is a soluble ID $G^{\prime}=(V^{\mathrm{s}},V^{\mathrm{a}},E^{\prime})$ where $E^{\prime}$ is the union of $E$ and a set of arcs with head in $V^{\mathrm{a}}$ . Remark that Theorem 9 can be reinterpreted as the link between soluble graph relaxation and linear relaxations. And since $\mathcal{S}(\overline{G})=\overline{\mathcal{P}}$ and $\mathcal{S}(G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!)=\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ , by Theorem 12, $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!$ and $\overline{G}$ are soluble, and therefore soluble graph relaxations of $G$ .

Since any feasible policy for the ID $G$ is a feasible policy for a soluble graph relaxation $G^{\prime}$ , for any parametrization $\rho$ , the value of $\mathrm{MEU}(G^{\prime},\rho)$ , which can be computed by SPU, provides a tractable bound on $\mathrm{MEU}(G,\rho)$ . Soluble relaxations can therefore be used in branch-and-bound schemes for IDs, as proposed in Khaled et al. (2013). To compare the interest of such a scheme to our MILP approach, we need to compare the quality of the soluble graph relaxation and linear relaxation bounds. Let $G^{\prime}$ be a soluble graph relaxation of $G$ . Applying Algorithm 2 to $G^{\prime}$ provides an RJT such that $E^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!\subseteq E^{\prime}$ . Indeed, by Lemma 13, $v$ is d-separated from $C_{v}\backslash\mathrm{fa}_{G^{\prime}}(v)$ given $\mathrm{prt}_{G^{\prime}}(v)$ in $G^{\prime}$ , and therefore also in $G$ , which implies $E^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!\subseteq E^{\prime}$ . Thus, by Theorem 9, the bound provided by the linear relaxation of the MILP (25) is at least as good as the soluble graph relaxation bound, and sometimes strictly better thanks to constraints $(\mu,\delta)\in\mathcal{Q}^{b}$ .

7 Numerical experiments

In this section, we provide numerical experiments showcasing the results of the paper. In particular, on two examples of varying size, we study the impact of the valid inequalities. On such examples, we solve the MILP formulation (18) with improved McCormick bounds relying on Section D.3, and valid inequalities from Section 5 obtained from the RJT of Algorithm 1. More precisely, we solve $\max\left\{ \sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle\mid(\mu,\delta)\in\mathcal{Q},\delta\in\Delta^{\mathrm{d}}\right\}$ where $\mathcal{Q}$ is one of the following four polytopes: ${\overline{\mathcal{Q}}}^{1}=\left(\overline{\mathcal{P}}\times\Delta\right)\cap\mathcal{Q}^{1}$ (no cuts), $\overline{\mathcal{Q}}^{b}=\left(\overline{\mathcal{P}}\times\Delta\right)\cap\mathcal{Q}^{b}$ (McCormick only), $\mathcal{Q}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!},1}=\left(\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}\times\Delta\right)\cap\mathcal{Q}^{1}$ (independence cuts only), $\mathcal{Q}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!},b}=\left(\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}\times\Delta\right)\cap\mathcal{Q}^{b}$ (McCormick and independence cuts).

The difficulty of an instance can be roughly measured by the number of feasible deterministic policies (Mauá et al., 2012b), i.e., $\big{|}\Delta^{\mathrm{d}}\big{|}$ . We have $\big{|}\Delta^{\mathrm{d}}\big{|}=\prod_{v\in V^{\mathrm{a}}}\big{|}\mathcal{X}_{v}\big{|}^{\prod_{u\in\mathrm{prt}(v)}\big{|}\mathcal{X}_{u}\big{|}}$ . Therefore, the difficulty depends exponentially on $\big{|}\mathcal{X}_{v}\big{|}$ for $v\in\mathrm{fa}(V^{\mathrm{a}})$ . In our examples, we assume that $\omega_{a}=\big{|}\mathcal{X}_{v}\big{|}$ for all $v\in\mathrm{fa}(V^{\mathrm{a}})$ and $\omega_{s}=\big{|}\mathcal{X}_{v}\big{|}$ for all $v\in V\backslash\mathrm{fa}(V^{\mathrm{a}})$ . Each instance is generated by first choosing $\omega_{a}$ and $\omega_{s}$ . We then draw uniformly on $[0,1]$ the conditional probabilities $p_{v|\mathrm{prt}(v)}$ for all $v\in V\backslash V^{\mathrm{a}}$ and on $[0,10]$ the rewards $r_{v}$ for all $v\in V^{\mathrm{r}}$ . We repeat the process $10$ times, and obtain therefore $10$ instances of the same size.
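The count of deterministic policies is easy to evaluate directly from the formula above. The short Python sketch below does so, assuming the cardinalities and decision parents are given as dictionaries; the function name and the example sizes are hypothetical.

```python
from math import prod

def num_deterministic_policies(card, parents, decisions):
    """|Delta^d| = product over decisions v of |X_v| ** (product of |X_u|, u in prt(v))."""
    return prod(card[v] ** prod(card[u] for u in parents[v]) for v in decisions)

# Two decisions, each observing one variable, all with 3 states (illustrative sizes).
card = {"s1": 3, "a1": 3, "s2": 3, "a2": 3}
parents = {"a1": ["s1"], "a2": ["s2"]}
print(num_deterministic_policies(card, parents, ["a1", "a2"]))  # 729
```

Each decision contributes $3^{3}=27$ decision rules here, so the two decisions together give $27^{2}=729$ deterministic policies.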

The results are reported in Table 1. The first column specifies the size of the problem, the second the approximate number of admissible strategies, and the third the cuts used. In the last four columns, we report the integrality gap (i.e., the relative difference between the linear relaxation and the best integer solution), the final gap (relative difference between the best integer solution and the best bound), the improvement obtained over the solution given by SPU, and the computation time. All gaps are given in percentage. Computing times are given in seconds and correspond to the shifted geometric mean of the time over the $10$ instances. All other values are averaged over the $10$ instances. In the last column, we write TL when the time limit is reached for all $10$ instances of the same size. Sometimes, the time limit is reached only for some of the $10$ instances, and we end up with a non-zero average final gap together with an average computing time that is smaller than the time limit.
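As a reminder of how such summary statistics are commonly computed, the sketch below gives one standard definition of the shifted geometric mean and of a relative gap. The shift value and the choice of denominator in the gap are assumptions made for this example; the text does not specify them.

```python
import math


def shifted_geometric_mean(values, shift=1.0):
    """exp(mean(log(v + shift))) - shift; the shift of 1.0 is an assumed default."""
    logs = [math.log(v + shift) for v in values]
    return math.exp(sum(logs) / len(logs)) - shift


def relative_gap_percent(bound, incumbent):
    """Relative difference in percent, normalised by the incumbent (assumed convention)."""
    return 100.0 * abs(bound - incumbent) / abs(incumbent)


# Hypothetical running times in seconds: one fast run and one slow run.
times = [0.0, 3.0]
sgm = shifted_geometric_mean(times, shift=1.0)   # sqrt(1 * 4) - 1
arithmetic = sum(times) / len(times)

# Hypothetical relaxation value 110 against a best integer value of 100.
gap = relative_gap_percent(110.0, 100.0)
```

Compared with the arithmetic mean, the shifted geometric mean is less dominated by a few very long runs, which is why it is often preferred for solver timings.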

All mixed-integer linear programs have been written in Julia (Bezanson et al., 2017) with the JuMP interface (Dunning et al., 2017) and solved using Gurobi 7.5.2. Experiments have been run on a server with 192 GB of RAM and 32 cores at 3.30 GHz. For each program, we use a warm start solution obtained by running the SPU algorithm of Lauritzen and Nilsson (2001) on the instances.

For notational simplicity, and since it is unambiguous, in the rest of this section we use the same notation to refer to a given node of the graph and to refer to the random variable associated with this node.

7.1 Bob and Alice daily chess game

We consider the chess game example represented in Figure 2. The beginning of the RJT built by Algorithm 1 for this example is represented in Figure 8. Since $\vartheta_{a_{t-1}}\not\perp_{G^{\dagger}}\mathrm{dsc}(a_{t})\mid\mathrm{fa}(a_{t})$ for all $t\in[T]$ , the chess game example is not a soluble ID, and thus cannot be solved to optimality by SPU. Table 1(a) reports results on the generated instances.

In this problem we see that we can tackle large problems: we can reach optimality in less than one hour for a strategy set of size $10^{144}$ , and find a small provable gap on even bigger instances. Moreover, we see that the independence cuts reduce the computation time by a factor of 100, whereas the improved McCormick bounds yield less impactful improvements.

However, on this problem the SPU heuristic yields good results that are only marginally improved by our MILP formulation. Here the main interest of our formulation lies in the bounds obtained. The next problem shows a larger improvement.

7.2 Partially Observed Markov Decision Process with limited memory

Another classical example of ID is the Partially Observed Markov Decision Process (POMDP) introduced in Section 1. Figure 9 provides the graph representation of the POMDP with limited information. Since $\vartheta_{a_{t-1}}\not\perp_{G^{\dagger}}\mathrm{dsc}(a_{t})\mid\mathrm{prt}(a_{t})$ for all $t\in[T]$ , this ID is not soluble. Figure 10 represents the RJT built by Algorithm 1.

However, for $v\in V^{\mathrm{a}}$ , $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}=\emptyset$ and the linear relaxation of Problem (18) does not enforce all the conditional independences that are entailed by the graph structure. Indeed, Theorem 9 ensures that the linear relaxation of the MILP (18) corresponds to solving Problem (3) on the graph $\overline{G}$ . For this example $\overline{G}$ corresponds to the MDP relaxation, in which the decision maker knows the state $s_{t}$ when making the decision $a_{t}$ . Therefore, the conditional independence $s_{t}\perp a_{t}\mid o_{t}$ is no longer satisfied. Although we cannot enforce these independences with linear constraints, we propose slightly weaker independences: in particular, we propose an extended formulation corresponding to the bigger RJT represented in Figure 10, which enforces that $s_{t}$ is conditionally independent from $a_{t}$ given $(s_{t-1},a_{t-1},o_{t})$ for $t>1$ . In such an RJT, we have $C_{a_{t}}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}=\big{\{}s_{t}\big{\}}$ . Therefore, we can derive valid equalities (20) in $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}$ . We demonstrate the efficiency of these valid cuts by solving the different formulations on a set of instances; the results are summed up in Table 1(b).

This example is harder to solve to optimality, as we only reach a strategy set of size $10^{72}$ . Further, we can see that there are some instances where SPU seems to reach a local maximum that is improved by our MILP formulation. Once again the valid cuts significantly reduce the root linear relaxation gap and the solving time, even on large instances.

Conclusion

This paper introduced linear and mixed integer linear programming approaches for the MEU problem on influence diagrams. The variables of the programs correspond, for the distributions induced by feasible policies, to the collection of vectors of moments of the distribution on subsets of the variables that are associated with nodes of a new kind of junction tree, which we call a rooted junction tree. We have also introduced algorithms to build rooted junction trees tailored to our linear and integer programs.

For soluble IDs, which are IDs whose MEU problem is easy, in the sense that it can be solved by the single policy update algorithm, we showed that it can also be solved exactly via our linear programs. Furthermore, we characterized soluble IDs as the IDs for which there exists a rooted junction tree such that the set of possible vectors of moments on the nodes of the tree is convex for any parametrization of the influence diagram.

Finally, we proposed a mixed integer linear programming approach to solve the MEU problem on non-soluble IDs, together with valid cuts. The bound provided by the linear relaxation is better than the bound that could be obtained using SPU on a soluble relaxation. Numerical experiments show that the bound is indeed better in practice.

Appendix A Rooted junction tree properties

In this section we present further technical results on RJTs that are useful in the analysis of our approach. We start with generic properties of RJTs.

Proposition 14.

Let $\mathcal{G}$ be an RJT on $G$ .

1. If there is a path from $u$ to $v$ in $G$ , then there is a path from $C_{u}$ to $C_{v}$ in $\mathcal{G}$ .

2. If $\overline{\mathrm{dsc}}_{G}(u)\cap\overline{\mathrm{dsc}}_{G}(v)\neq\emptyset$ , then either there is a unique path from $C_{u}$ to $C_{v}$ or from $C_{v}$ to $C_{u}$ in $\mathcal{G}$ .

Proof.

Let $\mathcal{G}$ be an RJT on $G$ . Consider a vertex $v$ of $G$ and a node $C$ of $\mathcal{G}$ containing $v$ . Since $C$ is a node of $\mathcal{G}_{v}$ , and by definition of $C_{v}$ , there exists a $C_{v}$ - $C$ path in $\mathcal{G}$ . Now consider $u\in\mathrm{prt}(v)$ . Since $\mathrm{fa}(v)\subseteq C_{v}$ , we have $u\in C_{v}$ . Thus there exists a $C_{u}$ - $C_{v}$ path in $\mathcal{G}$ . The first statement follows by induction along a $u$ - $v$ path in $G$ .

We now show the second statement. Let $w$ be a vertex in $\overline{\mathrm{dsc}}_{G}(u)\cap\overline{\mathrm{dsc}}_{G}(v)$ , then by the first statement there exists both a $C_{u}$ - $C_{w}$ and a $C_{v}$ - $C_{w}$ path in $\mathcal{G}$ . As $\mathcal{G}$ is a rooted tree, this implies the existence of either a $C_{u}$ - $C_{v}$ path or of a $C_{v}$ - $C_{u}$ path in $\mathcal{G}$ . ∎

The following lemma is key in proving Theorem 2.

Lemma 15.

Let $C,D$ be subsets of $V$ such that $\mathrm{fa}(D)\subseteq C$ and $\overline{\mathrm{dsc}}(D)\cap C=D$ . Any distribution $\mu_{C}$ on $C$ such that each $v$ in $D$ is independent from its non-descendants given its parents factorizes as $\mu_{C}=\displaystyle\mu_{C\backslash D}\prod_{v\in D}q_{v|\mathrm{prt}(v)}$ where $\displaystyle\mu_{C\backslash D}=\sum_{x_{D}}\mu_{C}$ and $q_{v|\mathrm{prt}(v)}$ is defined as $\frac{\sum_{x_{C\backslash\mathrm{fa}(v)}}\mu_{C}}{\sum_{x_{C\backslash\mathrm{prt}(v)}}\mu_{C}}$ when the denominator is non-zero, and as [math] otherwise.

Proof.

Let $\preceq$ be a topological order on $C$ such that $u\in C\backslash D$ and $v\in D$ implies $u\preceq v$ . Such a topological order exists since $\overline{\mathrm{dsc}}(D)\cap C=D$ . We have

[TABLE]

where the first equality is the chain rule and the second follows from the hypothesis of the lemma. ∎
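The factorization of Lemma 15 can be checked numerically on a toy case. The sketch below is an illustration under assumptions: it takes $C=\{a,b,c\}$ with binary variables, $D=\{c\}$ and $\mathrm{prt}(c)=\{b\}$, and all probability values and helper names are hypothetical. It builds $\mu_{C}$ so that $c$ is independent of $a$ given $b$, recovers $q_{c|b}$ from the ratio of marginals given in the lemma, and compares $\mu_{C}$ with $\mu_{C\backslash D}\,q_{c|b}$.

```python
from itertools import product

# Joint on (a, b) and conditional of c given its only parent b (illustrative numbers).
p_ab = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
q_c_given_b = {(0, 0): 0.25, (1, 0): 0.75, (0, 1): 0.6, (1, 1): 0.4}  # key: (c, b)

# mu_C over (a, b, c), built so that c is independent of a given b.
mu_C = {
    (a, b, c): p_ab[a, b] * q_c_given_b[c, b]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(mu, keep):
    """Sum out every coordinate whose position is not listed in `keep`."""
    out = {}
    for x, p in mu.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


mu_ab = marginal(mu_C, keep=(0, 1))  # mu_{C \ D}
mu_bc = marginal(mu_C, keep=(1, 2))  # sum over x_{C \ fa(c)}
mu_b = marginal(mu_C, keep=(1,))     # sum over x_{C \ prt(c)}

# Ratio of marginals from the lemma (denominators are non-zero in this example).
q_hat = {(c, b): mu_bc[b, c] / mu_b[(b,)] for b, c in product((0, 1), repeat=2)}

# Largest deviation between mu_C and the claimed factorization.
max_error = max(
    abs(mu_C[a, b, c] - mu_ab[a, b] * q_hat[c, b])
    for a, b, c in product((0, 1), repeat=3)
)
```

In this example the hypotheses of the lemma hold since $\mathrm{fa}(c)=\{b,c\}\subseteq C$ and $c$ has no descendant in $C$ other than itself.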

Proof of Theorem 2.

Let $\mathcal{G}$ be an RJT on $G$ . Let $C_{1},\ldots,C_{n}$ be a topological ordering on $\mathcal{G}$ , let $C_{\leq i}=\bigcup_{j\leq i}C_{j}$ , and $C_{<i}=C_{\leq i}\backslash C_{i}$ . Let $\tau$ be a vector of moments satisfying the hypothesis of the theorem, and for each $v$ in $V$ , let $q_{v|\mathrm{prt}(v)}$ be equal to $\frac{\sum_{x_{C\backslash\mathrm{fa}(v)}}\tau_{C_{v}}}{\sum_{x_{C\backslash\mathrm{prt}(v)}}\tau_{C_{v}}}$ if the denominator is non-zero, and to [math] otherwise. We show by induction on $i$ that

[TABLE]

Suppose the result is true for all $j<i$ , with the convention that $\mu_{0}=1$ . We immediately deduce from the induction hypothesis that $\tau_{C_{i^{\prime}}}=\sum_{x_{C_{\leq i}\backslash C_{i^{\prime}}}}\mu_{C_{\leq i}}$ for all ${i^{\prime}}<i$ . It only remains to prove $\tau_{C_{i}}=\sum_{x_{C_{<i}}}\mu_{C_{\leq i}}$ . By definition of an RJT, we have $\mathrm{fa}(\mathring{C_{i}})\subseteq C_{i}$ . Proposition 14 implies that $\mathrm{dsc}(\mathring{C_{i}})\cap C_{i}\subseteq\mathring{C_{i}}$ . Indeed, let $u$ be in $\mathrm{dsc}(\mathring{C_{i}})\cap C_{i}$ . Then there is a $C_{i}$ - $C_{u}$ path as $u\in\mathrm{dsc}(\mathring{C_{i}})$ , and a $C_{u}$ - $C_{i}$ path as $u\in C_{i}$ . Hence $C_{u}=C_{i}$ and $u\in\mathring{C_{i}}$ . By Lemma 15, we have $\tau_{C_{i}}=\tau_{C_{i}\backslash\mathring{C_{i}}}\prod_{v\in\mathring{C_{i}}}q_{v|\mathrm{prt}(v)}$ . Let $C_{j}$ be the parent of $C_{i}$ in $\mathcal{G}$ . We have $\tau_{C_{i}\backslash\mathring{C_{i}}}=\sum_{x_{C_{j}\backslash C_{i}}}\tau_{C_{j}}=\sum_{x_{C_{<i}\backslash C_{i}}}\mu_{C_{<i}}$ , the first equality coming from the fact that $(\tau_{C})_{C\in\mathcal{V}}$ belongs to the local polytope, and the second from the induction hypothesis. Thus,

[TABLE]

which gives the induction hypothesis, and the theorem.

∎

Proof of Proposition 3.

Algorithm 1 terminates since it performs only a finite number of iterations. If $G$ is not connected, the algorithm is equivalent to its separate application on each of the connected components, each of which yields a tree. W.l.o.g., we prove properties of the algorithm under the assumption that $G$ is connected. To simplify notation, we denote $C^{\prime}_{v}$ by $C_{v}$ , and check that it indeed corresponds to the root node of $v$ .

We first prove that $\preceq$ is a topological order on $\mathcal{G}$ . First, remark that $(u\in C_{v})\Rightarrow(u\preceq v)$ . Indeed, if $u\in C_{v},$ then either Step 4 of the algorithm ensures that $u\in\mathrm{fa}(v)$ and $u\preceq v$ or $u\notin\mathrm{fa}(v)$ and there exists $x$ such that $u\in C_{x}$ and $C_{v}\rightarrow C_{x}$ . But by Step 6 of Algorithm 1, the fact that $C_{v}\rightarrow C_{x}$ entails that $v$ is the maximal element of $C_{x}\backslash\{x\}$ for the topological order, so that $u\prec v$ . Furthermore, Step 6 ensures that $(C_{u},C_{v})\in\mathcal{A}$ implies $u\in C_{v}$ . We deduce from the previous result that $(C_{u},C_{v})\in\mathcal{A}$ implies $u\preceq v$ , and $\preceq$ is a topological order on $\mathcal{G}$ .

Then we show that (9) holds. We first show that $(u\in C_{v})\Rightarrow C_{u}\dashrightarrow C_{v}$ and $u\in C^{\prime}$ for any $C^{\prime}$ on the path $C_{u}\dashrightarrow C_{v}$ . Either $u=v$ and this is obvious, or $u\in\mathrm{prt}_{\mathcal{G}}(C_{v})$ ; and by recursion either $C_{u}\dashrightarrow C_{v}$ or $u\in C_{r}$ with $C_{r}$ the root of the tree, which is also the first element in the topological order. But, unless $u=r$ , this is excluded given that $u\in C_{r}$ implies $u\preceq r$ . Note that this shows that $C_{u}$ is indeed the unique minimal element of the set $\{C\colon u\in C\}$ for the partial order defined by the arcs of the tree. To show the first part of (9), we just need to note that either $u\in\mathrm{fa}(v)$ and the result holds, or there must exist $x$ such that $C_{v}\rightarrow C_{x}$ and $u\in C_{x}$ and by recursion, there exists $w$ such that $C_{v}\dashrightarrow C_{w}$ and $u\in\mathrm{fa}(w)$ .

Finally, we prove that we have constructed an RJT. Indeed, if two vertices $C_{v}$ and $C_{v^{\prime}}$ contain $u$ then since $\mathcal{G}$ is singly connected, the trail connecting $C_{v}$ and $C_{v^{\prime}}$ must be composed of vertices on the paths $C_{v}\dashleftarrow C_{u}$ and $C_{u}\dashrightarrow C_{v^{\prime}}$ , and we have shown in the previous paragraph that $u$ belongs to any $C^{\prime}$ on $C_{v}\dashleftarrow C_{u}$ and $C_{u}\dashrightarrow C_{v^{\prime}}$ , and so the running intersection property holds. Finally, property (ii) of Definition 1 must hold because the fact that $C_{u}$ is minimal among all cluster vertices containing $u$ together with the running intersection property entails that the cluster vertices containing $u$ are indeed a subtree of $\mathcal{G}$ with root $C_{u}$ . ∎
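The running intersection property invoked at the end of this proof can be checked mechanically on small examples. The sketch below is only an illustration: the function names and the three-cluster example (built for a chain $a\rightarrow b\rightarrow c$) are hypothetical and are not taken from the paper, and the code does not implement Algorithm 1. It tests whether, for every vertex $u$, the clusters containing $u$ induce a connected part of the tree.

```python
from collections import deque


def clusters_with_vertex_are_connected(clusters, arcs, u):
    """True if the clusters containing u induce a connected subgraph of the tree."""
    holders = {name for name, content in clusters.items() if u in content}
    if not holders:
        return True
    # Undirected adjacency restricted to the clusters that contain u.
    adjacency = {name: set() for name in holders}
    for parent, child in arcs:
        if parent in holders and child in holders:
            adjacency[parent].add(child)
            adjacency[child].add(parent)
    # Breadth-first search from an arbitrary cluster containing u.
    start = next(iter(holders))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == holders


def has_running_intersection(clusters, arcs):
    """Check the property for every vertex appearing in some cluster."""
    vertices = set().union(*clusters.values())
    return all(clusters_with_vertex_are_connected(clusters, arcs, u) for u in vertices)


# Hypothetical cluster tree for the chain a -> b -> c, with fa(v) included in C_v.
arcs = [("Ca", "Cb"), ("Cb", "Cc")]
good = {"Ca": {"a"}, "Cb": {"a", "b"}, "Cc": {"b", "c"}}
# Here a appears in Ca and Cc but not in Cb, which breaks the property.
bad = {"Ca": {"a"}, "Cb": {"b"}, "Cc": {"a", "c"}}

good_ok = has_running_intersection(good, arcs)
bad_ok = has_running_intersection(bad, arcs)
```

Because the underlying graph is a tree, connectedness of the induced subgraph is equivalent to the clusters containing $u$ forming a subtree.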

Proposition 3 provides a justification for Algorithm 1, but it characterizes the content of the cluster vertices based on the topology of the obtained RJT, which is itself produced by the algorithm (note that the composition of cluster vertices depends only on $\preceq$ via the partial order of the tree). The cluster nodes of any RJT and those produced by Algorithm 1 admit however more technical characterizations using only $\preceq$ and the information in $G$ , which we present next. These characterizations will be useful in Appendix B. For each vertex $v$ in $V$ , let

[TABLE]

Proposition 16.

Let $\mathcal{G}=(\mathcal{V},\mathcal{A})$ be an RJT satisfying $\mathring{C_{v}}=\{v\}$ , and $\preceq$ be a topological order on $\mathcal{G}$ . Then $\preceq$ induces a topological order on $G$ and

[TABLE]

Proof.

Let $\mathcal{G}=(\mathcal{V},\mathcal{A})$ be an RJT satisfying $\mathring{C_{v}}=\{v\}$ , and $\preceq$ be a topological order on $\mathcal{G}$ . Property 1 in Proposition 14 ensures that $\preceq$ induces a topological order on $G$ .

We start by proving (29a). Let $v$ and $w$ be vertices such that $w\succ v$ and that there is a $v$ - $w$ trail $Q$ in $V_{\succeq v}$. Let $s_{0},\ldots,s_{k}$ be the nodes where $Q$ has a v-structure and $t_{1},\ldots,t_{k}$ the nodes with diverging arcs in $Q$. Note that, since the trail is included in $V_{\succeq v}$, the first nodes of the trail have to be immediate descendants of $v$ in $G$ so that the trail takes the form $v\dashrightarrow s_{0}\dashleftarrow t_{1}\dashrightarrow s_{1}\ldots t_{k}\dashrightarrow s_{k}\dashleftarrow w,$ where possibly $s_{k}=w$ and the last arc is not present. Then, given that $v\prec s_{0}$, and that $\preceq$ is topological for $\mathcal{G}$, Proposition 14.2 implies that ${C_{v}}\dashrightarrow{C_{s_{0}}}$. But by the same argument, Property 2 in Proposition 14 implies ${C_{t_{1}}}\dashrightarrow{C_{s_{0}}}$, but since $\mathcal{G}$ is a tree and $v\prec t_{1}$, we must have ${C_{v}}\dashrightarrow{C_{t_{1}}}\dashrightarrow{C_{s_{1}}}.$ By induction on $i$, we have ${C_{v}}\dashrightarrow{C_{s_{i}}}$ and thus ${C_{v}}\dashrightarrow{C_{w}},$ which shows Equation 29a.

We now prove (29d). Let $u$ and $v$ be two vertices such that $u\preceq v$ and there is a $u$ - $v$ trail $P$ with $P\backslash\{u\}\subseteq V_{\succeq v}$. Let $w$ be the vertex right after $u$ on $P$. We have $u\in\mathrm{fa}(w),$ $w\succeq v$ and there is a $v$ - $w$ trail in $V_{\succeq v}$, which implies $C_{v}\dashrightarrow C_{w}$ by (29a). But, since $u\preceq v$, the $u$ - $v$ trail is also in $V_{\succeq u}$, which similarly shows that $C_{u}\dashrightarrow C_{v}$. So by (8) we have proved (29d). ∎
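The trail condition used in (29d) can be checked mechanically on a small example. The following is a minimal sketch, not taken from the paper: the names `trail_set`, `parents` and `order` are hypothetical, the DAG is an illustrative diamond, and the set computed is the one described by the hypothesis of (29d), namely the vertices $u\preceq v$ for which some $u$ - $v$ trail $P$ satisfies $P\backslash\{u\}\subseteq V_{\succeq v}$. It relies on the observation made in the proof that the vertex right after $u$ on such a trail lies in $V_{\succeq v}$.

```python
from collections import deque

def trail_set(parents, order, v):
    """Vertices u <= v joined to v by a trail P with (P minus {u}) inside V_{>= v}.

    parents: dict mapping each vertex to the list of its parents in the DAG G.
    order:   list of all vertices in a topological order of G.
    """
    pos = {x: i for i, x in enumerate(order)}
    # A trail ignores arc directions, so build undirected adjacency.
    adj = {x: set() for x in order}
    for child, pas in parents.items():
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
    # Breadth-first search from v restricted to V_{>= v}.
    reached = {v}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if pos[y] >= pos[v] and y not in reached:
                reached.add(y)
                queue.append(y)
    # u qualifies if u = v, or if u precedes v and is adjacent to a reached vertex
    # (that neighbour plays the role of w, the vertex right after u on the trail).
    result = {v}
    for u in order:
        if pos[u] < pos[v] and adj[u] & reached:
            result.add(u)
    return result

# Illustrative diamond DAG (an assumption, not an instance from the paper):
# a -> b, a -> c, b -> d, c -> d, with topological order a, b, c, d.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = ["a", "b", "c", "d"]
sets = {v: trail_set(parents, order, v) for v in order}
```

On this example, `sets["c"]` contains `b` even though `b` is not a parent of `c`: the trail `b`, `d`, `c` stays in $V_{\succeq c}$ after leaving `b`.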

Proposition 17.

*The graph $\mathcal{G}=(\mathcal{V},\mathcal{A})$ produced by Algorithm 1 is the unique RJT satisfying $\mathring{C_{v}}=\{v\}$ such that the topological order $\preceq$ on $G$ taken as input of Algorithm 1 induces a topological order on $\mathcal{G}$ and the implications in (29) are equivalences.*
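As an illustration of the construction this proposition refers to, here is a minimal sketch that builds the clusters in reverse topological order. Algorithm 1 is not reproduced in this section, so the code is a reconstruction under stated assumptions drawn from the proof: a leaf cluster equals $\mathrm{fa}(v)$, the parent of $C_{x}$ is $C_{v}$ with $v=\max_{\preceq}(C_{x}\backslash\{x\})$, and $C_{v}$ collects $\mathrm{fa}(v)$ together with $C_{x}\backslash\{x\}$ for each child $C_{x}$. The names `build_rjt`, `parents` and `order` are hypothetical.

```python
def build_rjt(parents, order):
    """Sketch of a cluster-tree construction in reverse topological order.

    parents: dict mapping each vertex to the list of its parents in the DAG G.
    order:   list of all vertices in a topological order of G.
    Returns (clusters, arcs), where clusters[v] is the set C_v and arcs holds
    pairs (v, x) meaning that C_v is the parent of C_x.
    """
    pos = {x: i for i, x in enumerate(order)}
    # Start every cluster as fa(v) = {v} together with the parents of v.
    clusters = {v: {v} | set(parents.get(v, ())) for v in order}
    arcs = []
    # Children of C_x come later in the order, so C_x is final when x is visited.
    for x in reversed(order):
        rest = clusters[x] - {x}
        if rest:
            # Parent cluster is indexed by the largest remaining vertex (assumption
            # taken from the proof: v = max(C_x minus {x})).
            v = max(rest, key=pos.__getitem__)
            clusters[v] |= rest
            arcs.append((v, x))
    return clusters, arcs

# Illustrative diamond DAG (an assumption, not an instance from the paper).
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = ["a", "b", "c", "d"]
clusters, arcs = build_rjt(parents, order)
```

On this diamond the resulting tree is a chain $C_{a}\rightarrow C_{b}\rightarrow C_{c}\rightarrow C_{d}$, and the cluster of `c` picks up `b` from its child cluster rather than from $\mathrm{fa}(c)$.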

Proof.

Note that some visual elements of the proof are given in Figure 11. It is sufficient to prove the following inclusions

[TABLE]

Indeed, by Proposition 3, the obtained tree is an RJT, so that, by Proposition 16, the reverse inclusions hold.

We prove the result by backward induction on (30b) and (30a). For a leaf $C_{v}$ of $\mathcal{G}$, $\overline{\mathrm{dsc}}_{\mathcal{G}}(C_{v})=\{C_{v}\}$ so that (30a) holds trivially, and $C_{v}=\mathrm{fa}(v)$ so that (30b) holds because $u\in\mathrm{fa}(v)$ implies $u\preceq v$. Then, assume the induction hypothesis holds for all children of a node $C_{v}$.

We first show (30a) for $C_{v}$, i.e. that $(C_{v}\dashrightarrow C_{w})\Rightarrow(w\in T_{\succeq v})$ (see Figure 11). Let $C_{x}$ be the child of $C_{v}$ on the path $C_{v}\dashrightarrow C_{w}$. By Proposition 3, we have $v\prec x$, so that $V_{\succeq x}\subset V_{\succeq v}$. Then, using the induction hypothesis, by (30b), $(v\in C_{x})$ implies that there is a $v$ - $x$ trail in $V_{\succeq v}$, and by (30a), $C_{x}\dashrightarrow C_{w}$ implies there is an $x$ - $w$ trail in $V_{\succeq x}$. Since $v\prec x$, concatenating the two gives a $v$ - $w$ trail in $V_{\succeq v}$, which shows the result.

We then show (30b) for $C_{v}$ (see Figure 11). Indeed, if $u\in C_{v}$, either $u\in\mathrm{fa}(v)$ and $u$ is in the RHS of (30b), or there exists a child of $C_{v}$, say $C_{x}$, such that $u\in C_{x}$ and $u\prec v$, because the algorithm imposes $v=\max_{\preceq}(C_{x}\backslash\{x\})$. Since $C_{v}\dashrightarrow C_{x}$, there exists a path $v$-$x$ in $V_{\succeq v}$, and using induction, by (30b), $(u\in C_{x})$ implies that $\exists w$ such that $u\in\mathrm{fa}(w)$ and there exists a trail $w$-$x$ in $T_{\succeq x}$. But we have shown in Proposition 3 that $(v\in C_{x})\Rightarrow(v\preceq x)$, so $T_{\succeq x}\subset T_{\succeq v}$ and we have shown that there exists a $v$-$w$ trail in $T_{\succeq v}$ with $u\preceq v$ and $u\in\mathrm{fa}(w)$, which shows the result. ∎

Appendix B Proofs of Section 6

For any set $C$ and binary relation $R$, we denote by $C_{\mathbin{\mathrm{R}}v}$ the set of vertices $u$ in $C$ such that $u\mathbin{\mathrm{R}}v$. The following lemma will be useful in the proof of Lemma 13. Let $H$ denote the relevance graph of $G$.

Lemma 18.

In general, $u\in\mathrm{dsc}(v)\Rightarrow(v,u)\in H$. Moreover, when $G$ is soluble, $u\in\mathrm{dsc}(v)\Leftrightarrow(v,u)\in H.$

Proof.

Assume that $u$ is s-reachable from $v$, that is, $(v,u)$ is an arc in $H$. We first show that this implies that $u$ and $v$ have descendants in common. Indeed, by definition of s-reachability, there exist $w\in\mathrm{dsc}(v)$ and an active trail $T$ from $\vartheta_{u}$ to $w$. Either $T$ is a directed path, and $w$ is also a descendant of $u$, or $T$ has a v-structure. In the latter case, let $x$ be the v-structure of $T$ closest to $\vartheta_{u}$; since the trail is active, we must have $x\in\mathrm{fa}(v)$, and since $x$ is a descendant of $u$, $v$ must be a descendant of $u$. In both cases, $u$ and $v$ have descendants in common. Now, if $u$ is not a descendant of $v$, then $v$ is s-reachable from $u$, which is not possible as $H$ is acyclic. Hence $u\in\mathrm{dsc}(v)$. ∎

As an immediate consequence, we obtain the following corollary.

Corollary 19.

If $G$ is soluble and $\preceq$ is a topological order on $G$ , then its restriction $\preceq_{H}$ to $V^{\mathrm{a}}$ is a topological order on the relevance graph $H$ .
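As a hedged illustration of Corollary 19, the following Python sketch uses a toy diagram whose vertex names and arcs are hypothetical (they are not taken from the paper). It assumes that the diagram is soluble, so that, by Lemma 18, the arcs of $H$ can be read off as the descendant pairs among decision vertices, and then checks that restricting a topological order on $G$ to $V^{\mathrm{a}}$ gives a topological order on $H$.

```python
def descendants(children, v):
    """Return the set of strict descendants of v in the DAG `children`."""
    seen, stack = set(), list(children[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(children[u])
    return seen


def relevance_arcs_if_soluble(children, decisions):
    """Arcs (v, u) of H read off via Lemma 18 (assumption: G is soluble)."""
    return {(v, u) for v in decisions
            for u in descendants(children, v) if u in decisions}


def is_topological(order, arcs):
    """Check that every arc (v, u) goes forward in `order`."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[v] < position[u] for v, u in arcs)


# Hypothetical toy diagram: a -> s -> b -> t and s -> t, decisions a and b.
children = {"a": ["s"], "s": ["b", "t"], "b": ["t"], "t": []}
decisions = {"a", "b"}
order_on_G = ["a", "s", "b", "t"]  # a topological order on G

arcs_G = {(v, u) for v, cs in children.items() for u in cs}
arcs_H = relevance_arcs_if_soluble(children, decisions)
order_on_H = [v for v in order_on_G if v in decisions]  # restriction to V^a
```

Here `is_topological(order_on_H, arcs_H)` holds because any arc $(v,u)$ of $H$ corresponds to a directed path from $v$ to $u$ in $G$, along which a topological order on $G$ can only increase.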

Proof of Lemma 13.

Let $G$ be a soluble influence diagram. We start by proving that Algorithm 2 with $G$ as input returns an RJT $\mathcal{G}$. It suffices to show that it is possible to compute topological orderings in Step 9, that is, to prove that $H$, $G^{\prime}$ and $G^{\prime\prime}$, defined in Algorithm 2, are acyclic. $H$ is acyclic because the ID is soluble.

We now prove that $G^{\prime}$ is acyclic. As $G$ is acyclic and by definition of $G^{\prime}$, a cycle in $G^{\prime}$ necessarily contains two vertices of $V^{\mathrm{a}}$. Let $u$ and $v$ thus be two distinct elements of $V^{\mathrm{a}}$. Remark that, if there exists a path from $u$ to $v$ in $G$, then $v$ is strategically reachable from $u$, and $u\preceq_{H}v$. Hence, by definition of $G^{\prime}$, the indices of vertices in $V^{\mathrm{a}}$ for $\preceq_{H}$ can only increase along a path in $G^{\prime}$. There is therefore no cycle in $G^{\prime}$ containing two vertices in $V^{\mathrm{a}}$, and thus no cycle in $G^{\prime}$.

We now prove that $G^{\prime\prime}$ is acyclic. Suppose that there is a cycle in $G^{\prime\prime}$. Let $\preceq_{G^{\prime}}$ be a topological order on $G^{\prime}$, and let $v_{h}$ be the smallest vertex $v$ for $\preceq_{G^{\prime}}$ in the cycle such that there is an arc $(u,v)$ in $E^{\prime\prime}\backslash E^{\prime}$ in the cycle, and let $(u_{h},v_{h})$ be the corresponding arc in the cycle. Let $(u_{l},v_{l})$ be the arc of $E^{\prime}$ right before $(u_{h},v_{h})$ in the cycle such that $v_{l}\in V^{\mathrm{a}}$. Arc $(u_{l},v_{l})$ is possibly identical to $(u_{h},v_{h})$. By definition of $G^{\prime}$, given two distinct vertices $u$ and $v$ in $V^{\mathrm{a}}$, either $(u,v)\in E^{\prime}$ or $(v,u)\in E^{\prime}$. Since $v_{h}\preceq_{G^{\prime}}v_{l}$ by definition of $v_{h}$, we have either $v_{h}=v_{l}$ or $(v_{h},v_{l})\in E^{\prime}$. And since all the arcs in the $v_{l}$-$u_{h}$ subpath of the cycle are in $E^{\prime}$, we have $u_{h}\in\overline{\mathrm{dsc}}_{G^{\prime}}(v_{l})$. Hence $u_{h}\in\mathrm{dsc}_{G^{\prime}}(v_{h})$, which contradicts the definition of $E^{\prime\prime}$ in Step 7. Hence, Algorithm 2 always returns an RJT, which we denote by $\mathcal{G}$.

It remains to prove that $\mathcal{G}$ is such that $C_{v}^{\not\perp\!\!\!\perp}\subseteq\mathrm{fa}(v)$ for each decision vertex $v\in V^{\mathrm{a}}$. We start with two preliminary results. Remark that $E\subseteq E^{\prime}$ implies that $\preceq$ is a topological order on $G$. Let $\preceq_{H}$ denote its restriction to $V^{\mathrm{a}}$. Corollary 19 ensures that $\preceq_{H}$ is a topological order on $H$. Hence, we have

[TABLE]

Furthermore, if $u\in V^{\mathrm{a}}$ and $v\in V^{\mathrm{s}}_{\succeq u}$ , the definition of $G^{\prime}$ implies the existence of a path from $u$ to $v$ in $G$ , and hence the existence of a path from $V^{\mathrm{a}}_{\succeq u}$ to $v$ in $G$ .

We now prove $C_{v}^{\not\perp\!\!\!\perp}\subseteq\mathrm{fa}(v)$ for each $v\in V^{\mathrm{a}}$. This part of the proof is illustrated in Figure 12.a. Let $v$ be a vertex in $V^{\mathrm{a}}$, let $u\in C_{v}\backslash\mathrm{fa}(v)$, and let $b\in V^{\mathrm{a}}_{\prec u}$. We only have to prove that $u$ is d-separated from $\vartheta_{b}$ given $\mathrm{fa}(v)$. We start by proving that $u$ and $v$ have common descendants. Proposition 17 guarantees that (29d) is an equivalence. Hence, there exists a $u$-$v$ trail in $V_{\succ v}$. Consider such a $u$-$v$ trail $Q$ with a minimum number of v-structures. Suppose for a contradiction that $Q$ has more than one v-structure. Starting from $v$, let $w_{1}$ be the first v-structure of $Q$ and $u_{1}$ be its first vertex with diverging arcs. Using the result at the end of the previous paragraph, we have $u_{1}\in\overline{\mathrm{dsc}}(V^{\mathrm{a}}_{\succeq v})$. Since $Q$ has been chosen with a minimal number of v-structures, we obtain $u_{1}\in\overline{\mathrm{dsc}}(V^{\mathrm{a}}_{\succ v})$. Let $a_{1}$ denote an ascendant of $u_{1}$ in $V^{\mathrm{a}}_{\succ v}$. Since $w_{1}\in\mathrm{dsc}(v)$, Equation (31) ensures that $w_{1}\perp\vartheta_{v}\mid\mathrm{fa}(a_{1})$. Hence, the $v$-$w_{1}$ path is not active given $\mathrm{fa}(a_{1})$, and it therefore necessarily intersects $\mathrm{prt}(a_{1})$. Hence, $u_{1}\in\mathrm{dsc}(v)$, and there exists a $u$-$v$ trail $Q^{\prime}$ with fewer v-structures than $Q$, which gives a contradiction. Trail $Q$ therefore has a unique v-structure, and $u$ and $v$ have a common descendant $w$. Hence, if a $\vartheta_{b}$-$u$ trail $P$ is active given $\mathrm{fa}(v)$, then $P$ followed by a $u$-$w$ path is active given $\mathrm{fa}(v)$. The fact that $\mathrm{dsc}(v)\perp\vartheta_{b}\mid\mathrm{fa}(v)$ ensures that there is no such trail $P$, and we have proved that $u$ is d-separated from $\vartheta_{b}$ given $\mathrm{fa}(v)$.

Conversely, let $G$ be a non-soluble influence diagram, and $\mathcal{G}$ an RJT on $G$. Let $u$ and $v$ be two vertices in $V^{\mathrm{a}}$ such that $\mathrm{dsc}(v)\not\perp\vartheta_{u}\mid\mathrm{prt}(v)$ and $\mathrm{dsc}(u)\not\perp\vartheta_{v}\mid\mathrm{prt}(u)$. Without loss of generality, we assume that if there is a path between $C_{u}$ and $C_{v}$, it is from $C_{v}$ to $C_{u}$. To prove the converse, we prove that $C_{u}^{\not\perp\!\!\!\perp}\neq\mathrm{fa}(u)$. This part of the proof is illustrated in Figure 12.b. There exists an active trail $Q$ from $w\in\mathrm{dsc}(u)$ to $\vartheta_{v}$ given $\mathrm{prt}(u)$. Starting from $w$, let $x$ be the first vertex with diverging arcs of $Q$ if $Q$ contains such a structure, and be equal to $v$ otherwise. Let $P$ be the $w$-$x$ subtrail of $Q$. Remark that $P$ must be an $x$-$w$ path in $G$, because any passing v-structure on $P$ cannot be at a descendant of $w$, for it would then be a descendant of $u$ which could not have any descendant in $\mathrm{fa}(u)$ as $G$ is acyclic. The path $P$ contains no v-structure, and is active given $\mathrm{fa}(u)$. Hence, it does not intersect $\mathrm{fa}(u)$. Since $x$ and $u$ have $w$ as common descendant, Proposition 14 ensures that $C_{x}$ and $C_{u}$ are on the same branch of $\mathcal{G}$. If $v=x$, then $x\in\mathrm{asc}(w)$ and there is a path in $\mathcal{G}$ from $C_{x}$ to $C_{w}$. Moreover, since we assumed that $C_{u}$ is a descendant of $C_{v}$ in $\mathcal{G}$, and since $u\in\mathrm{asc}(w)$, the path from $C_{x}$ to $C_{w}$ contains $C_{u}$ and all the vertices of $P$. Now, if $x\neq v$, then $x$ is the first vertex with diverging arcs, and in that case it belongs to $\mathrm{asc}(u)$, because $Q\setminus P$ must contain at least one v-structure and any such v-structure can only be at a node in $\mathrm{asc}(u)$. So, again, there is a path in $\mathcal{G}$ from $C_{x}$ to $C_{w}$ which contains $C_{u}$ and all the vertices of $P$. Starting from $x$, let $y$ be the last vertex of $P$ such that $C_{y}$ is above $C_{u}$ in $\mathcal{G}$, and $z$ be the child of $y$ in $P$. Since $Q$ is active, the $y$-$\vartheta_{v}$ subtrail of $Q$ is active given $\mathrm{fa}(u)$, and we therefore have $C_{u}^{\not\perp\!\!\!\perp}\neq\mathrm{fa}(u)$. ∎

Proof of Theorem 12.

If $G$ is soluble, Lemma 13 ensures that Algorithm 2 builds an RJT $\mathcal{G}$ such that $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!=G$ , and Theorem 9 ensures that $\mathcal{P}^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}=\mathcal{S}(P)$ .

Consider now the result for non-soluble IDs. Let $G$ be a non-soluble influence diagram. Let $a$ and $b$ be two decision vertices, each of which is strategically dependent on the other.

First, we suppose that $a\notin\mathrm{asc}(b)$ and $b\notin\mathrm{asc}(a)$. Let $P$ be a path from $a$ to $w\in\mathrm{dsc}(b)$ with a minimum number of arcs, and $Q$ be a $b$-$w$ path with a minimum number of arcs. Then $w$ is the unique vertex in the intersection of $P$ and $Q$. Let $u$ and $v$ be the parents of $w$ in $P$ and $Q$ respectively. Consider a parametrization where all the variables that are not in $P$ or $Q$ are unary, all the variables in $P$ and $Q$ are binary, all the variables in the $a$-$u$ subpath of $P$ are equal to $X_{a}$, all the variables in the $b$-$v$ subpath of $Q$ are equal to $X_{b}$, and $p_{w|\mathrm{prt}(w)}$ is defined arbitrarily. Let $\mathcal{G}$ be an arbitrary junction tree, and $C$ be its cluster containing $\mathrm{fa}(w)$. Then choosing a distribution $\mu_{a}$ as policy $\delta_{a}$ and a distribution $\mu_{b}$ as policy $\delta_{b}$ implies that the restriction of $\mu_{C}$ to $X_{uv}$ is $\mu_{uv}=\mu_{a}\mu_{b}$. Hence, the marginalization on $X_{uv}$ of the set of distributions $\mu_{C}$ that can be reached by varying the policies is the set of independent (product) distributions, which is not convex. Hence, $\mathcal{S}(\mathcal{G})$ is not convex.
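The non-convexity of the set of product distributions used in this argument can be checked on a two-variable example. The Python sketch below is only an illustration under our own choice of distributions (two Dirac policies for each of $a$ and $b$); the helper names are hypothetical. It mixes two product distributions on $\{0,1\}^{2}$ and tests whether the mixture is still a product of its marginals.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)


def dirac(i):
    """Distribution on {0, 1} putting all its mass on i."""
    return {0: Fraction(1 - i), 1: Fraction(i)}


def product_distribution(mu_a, mu_b):
    """Joint distribution mu_uv = mu_a * mu_b of two independent variables."""
    return {(xu, xv): mu_a[xu] * mu_b[xv] for xu, xv in product(STATES, repeat=2)}


def marginals(mu):
    """Marginal distributions of the first and second coordinate."""
    mu_u = {x: sum(p for (xu, _), p in mu.items() if xu == x) for x in STATES}
    mu_v = {x: sum(p for (_, xv), p in mu.items() if xv == x) for x in STATES}
    return mu_u, mu_v


def is_product(mu):
    """True if mu equals the product of its own marginals."""
    mu_u, mu_v = marginals(mu)
    return all(mu[(xu, xv)] == mu_u[xu] * mu_v[xv] for (xu, xv) in mu)


mu_00 = product_distribution(dirac(0), dirac(0))  # both players choose 0
mu_11 = product_distribution(dirac(1), dirac(1))  # both players choose 1
midpoint = {key: (mu_00[key] + mu_11[key]) / 2 for key in mu_00}
```

The midpoint puts mass $\tfrac12$ on $(0,0)$ and on $(1,1)$, while the product of its marginals would put mass $\tfrac14$ on each of the four outcomes, so `is_product(midpoint)` is false even though both endpoints are product distributions.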

We now consider the case where $a\in\mathrm{asc}(b)$ or $b\in\mathrm{asc}(a)$. W.l.o.g., we suppose $a\in\mathrm{asc}(b)$. There exists a trail from $\vartheta_{a}$ to $w$ in $\mathrm{dsc}(b)$ that is active given $\mathrm{prt}(b)$. Let $Q$ be such a trail with a minimum number of v-structures, and let $P$ be a $b$-$w$ path. W.l.o.g., we suppose that $w$ is the only vertex in both $P$ and $Q$. Let $w_{b}$ be the parent of $w$ on $P$ and $w_{s_{0}}$ its parent in $Q$. Starting from $w$, let $s_{0},\ldots,s_{k-1}$ denote the vertices with divergent arcs in $Q$, let $t_{1},\ldots,t_{k}$ denote the v-structures, and let $p_{\ell}$ denote the parent of $b$ that is below $t_{\ell}$. Finally, let $s^{\prime}_{\ell}$ (resp. $s^{\prime\prime}_{\ell}$) denote the parent of $t_{\ell}$ (resp. $t_{\ell+1}$) on the $s_{\ell}$-$t_{\ell}$ subpath (resp. $s_{\ell}$-$t_{\ell+1}$ subpath) of $Q$. The structures that we have just exhibited entail that $G$ contains a subgraph of the form represented in Figure 13. Each dashed arrow corresponds to a path whose length may be equal to $0$, in which case the vertices connected by the path are the same.

We now introduce a game that we will be able to encode on the graph of Figure 13 and hence on $G$. This game is a dice game with two players $a$ and $b$. Before a uniform three-faced die is rolled, player $a$ chooses to play $1$ or $2$, where “playing $i$” means observing whether the die is equal to $i$ and passing this information to player $b$. The die $s_{0}$ is then rolled. If $a$ has played $1$ (resp. $2$), he passes the information true to $b$ if the die $s_{0}$ is equal to $1$ (resp. $2$), and false if it is equal to $2$ (resp. $1$) or to something else, denoted $\mathsf{e}$. Player $b$ does not know what $a$ has played. Based on the information he receives, player $b$ decides to play $1$, $2$, or joker, which we denote $\mathsf{j}$. If he plays $\mathsf{j}$, then neither player earns or loses money. If he plays $i$ in $\{1,2\}$, then both players earn $i$ euros if die $s_{0}$ is equal to $i$, and lose 10 euros otherwise. The goal is to maximize the expected payoff. This game has two locally optimal strategies $\delta^{1}$ and $\delta^{2}$. In strategy $\delta^{i}$, player $a$ plays $i$, and $b$ plays $i$ if he receives true and $\mathsf{j}$ otherwise. Both strategies are locally optimal: each player's decision is the best possible given the other's. But only strategy $\delta^{2}$ is globally optimal.
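The claims about $\delta^{1}$ and $\delta^{2}$ can be checked by brute force on the basic game (before it is encoded on $G$). The Python sketch below is an illustration only: it restricts attention to deterministic strategies, and all function names are our own. It computes expected payoffs exactly and tests local optimality by letting one player deviate at a time.

```python
from fractions import Fraction
from itertools import product

DIE = (1, 2, "e")        # three equally likely faces
ACTIONS_A = (1, 2)       # which face player a tests
ACTIONS_B = (1, 2, "j")  # player b bets on 1, on 2, or plays the joker


def payoff(b_action, die):
    """Common payoff of both players for b's action and the die outcome."""
    if b_action == "j":
        return 0
    return b_action if die == b_action else -10


def expected_payoff(a_action, b_policy):
    """Expected payoff when b maps the signal (True/False) to an action."""
    total = Fraction(0)
    for die in DIE:
        signal = (die == a_action)  # information passed from a to b
        total += Fraction(1, 3) * payoff(b_policy[signal], die)
    return total


def all_b_policies():
    """All deterministic maps from the signal to an action of b."""
    for on_true, on_false in product(ACTIONS_B, repeat=2):
        yield {True: on_true, False: on_false}


def is_locally_optimal(a_action, b_policy):
    """No unilateral deviation of a or of b improves the expected payoff."""
    value = expected_payoff(a_action, b_policy)
    best_a = max(expected_payoff(a, b_policy) for a in ACTIONS_A)
    best_b = max(expected_payoff(a_action, p) for p in all_b_policies())
    return value == best_a and value == best_b


def delta(i):
    """Strategy delta^i: a plays i, b plays i on true and joker on false."""
    return i, {True: i, False: "j"}


best_value = max(expected_payoff(a, p)
                 for a in ACTIONS_A for p in all_b_policies())
```

Under this encoding, $\delta^{1}$ has expected payoff $\tfrac13$ and $\delta^{2}$ has expected payoff $\tfrac23$; both pass `is_locally_optimal`, and only $\delta^{2}$ attains `best_value`.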

The game is unchanged if we add $k-1$ coin tosses $X_{s_{1}},\ldots,X_{s_{k-1}}$, and player $b$ observes the $k$ equality tests $X_{t_{1}},\ldots,X_{t_{k-1}}$, where $X_{t_{\ell}}=\mathbbm{1}(X_{s_{\ell-1}}=X_{s_{\ell}})$. Indeed, player $b$ can compute $\sum_{\ell=1}^{k}x_{p_{\ell}}$ and knows that $X_{a}=X_{s_{0}}$ if and only if this sum is even. The parameterization of the influence diagram that makes it possible to encode this game is specified in the right part of Figure 13. For any $x$, the mapping $\mathbbm{1}_{x}(\cdot)$ is the indicator function of $x$. All the variables that are not in Figure 13 or on the paths of Figure 13 are unary. All the variables along paths represented by dashed arrows are equal. Policies $\delta^{i}$ can therefore be defined as

[TABLE]

where $\mathbbm{1}_{i}$ is the Dirac in $i$ . A technical case to handle is the one where $a=a^{\prime}=t_{k}$ . In that case, we define $\mathcal{X}_{a}=\{0,1\}$ and $\delta_{a}^{i}=\mathbbm{1}_{i}(X_{s_{k-1}}^{\prime\prime})$ .

Consider now a junction tree $\mathcal{G}$ on $G$ . Let $C$ be a node of $\mathcal{G}$ that contains $\mathrm{fa}(w)$ . Then $C$ contains both $w_{b}$ and $w_{s_{0}}$ . Let $\mu_{C}^{1}$ and $\mu_{C}^{2}$ be the distributions induced by $\delta^{1}$ and $\delta^{2}$ on $X_{C}$ , and $\mu_{bs_{0}}^{1}$ and $\mu_{bs_{0}}^{2}$ their marginalizations on $X_{w_{b}w_{s_{0}}}$ . Since $X_{w_{b}}=X_{b}$ and $X_{w_{s_{0}}}=X_{s_{0}}$ , $\mu_{bs_{0}}^{1}$ and $\mu_{bs_{0}}^{2}$ are the distributions induced by $\delta^{1}$ and $\delta^{2}$ on $X_{bs_{0}}$ . Let $\mu_{bs_{0}}=\frac{\mu_{bs_{0}}^{1}+\mu_{bs_{0}}^{2}}{2}$ . Denoting again $\mathbbm{1}_{x}$ the Dirac at $x$ , we have

[TABLE]

We claim that there is no policy $\delta$ that induces distribution $\mu_{bs_{0}}$ on $X_{bs_{0}}$ . Indeed, in a distribution induced by a policy $\delta$ , it follows from the parametrization that if $\mathbb{P}(X_{a}=1)<1$ and $\mathbb{P}(X_{b}=1)>0$ , then $\mathbb{P}(X_{b}=1,X_{u}=\mathsf{e})>0$ . And, if $\mathbb{P}(X_{a}=2)<1$ and $\mathbb{P}(X_{b}=2)>0$ , then $\mathbb{P}(X_{b}=2,X_{u}=\mathsf{e})>0$ . (In both claims, “if $\mathbb{P}(X_{a}=1)<1$ ” must be replaced by “if $\delta_{a}(x_{s_{k-1}})\neq\mathbbm{1}_{i}(x_{s_{k-1}})$ ” when $a=t_{k}$ ). As $\mu_{ub}$ is such that $\mathbb{P}(X_{b}=1)>0$ , $\mathbb{P}(X_{b}=2)>0$ , and $\mathbb{P}(X_{b}=1,X_{u}=\mathsf{e})=0$ , it cannot be induced by a policy. Hence, $\mathcal{S}(\mathcal{G})$ is not convex. ∎

Appendix C Algorithm to build a small RJT

In this section, we present an algorithm to build an RJT without considering a topological ordering on the initial graph $G=(V,A)$.

The only difference between Algorithms 1 and 3 is that the for loop along a reverse topological ordering of Algorithm 1 is replaced in Algorithm 3 by a breadth-first search that computes this reverse topological ordering online. Hence, if we denote this ordering by $\preceq$, Algorithm 3 builds the same RJT as the one obtained when Algorithm 1 is run with $\preceq$ as input. Therefore, the RJT built by Algorithm 3 satisfies 9, and is such that the implications in (29) are equivalences.

Furthermore, Steps 5 and 6 ensure that, when there is no path between a vertex $u\in V^{\mathrm{a}}$ and a vertex $v\in V^{\mathrm{s}}$, $u$ is placed before $v$ in the reverse topological ordering computed by the breadth-first search. Therefore, $\preceq$ is a topological ordering on the graph $G^{\prime\prime}$ used in Step 9 of Algorithm 2. Hence, if $G$ is soluble, Algorithm 3 builds an RJT such that $G^{\!\scriptscriptstyle\,\bot\!\!\!\bot\,}\!=G$.

Remark that on non-soluble IDs, Steps 5 and 6 are a heuristic aimed at minimizing the size of $C_{v}^{{\!\scriptscriptstyle\,\bot\!\!\!\bot\,\!}}$ for each $v$ in $V^{\mathrm{s}}$. Such a heuristic is not relevant if valid cuts (20) are not used. In that case, an alternative strategy could be to add as few variables as possible to $C_{v}$ for $v$ in $V^{\mathrm{a}}$ to improve the quality of the soluble relaxation $\overline{G}$. This could be done by placing vertices $u$ in $V^{\mathrm{s}}$ unrelated to $v\in V^{\mathrm{a}}$ after $v$ in this topological order, i.e., by replacing $V^{\mathrm{a}}$ by $V^{\mathrm{s}}$ in Steps 5 and 6.
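The idea of computing a reverse topological ordering online with a breadth-first traversal can be sketched as follows. This is a generic Kahn-style traversal started from the sinks, written as an assumption about what such a search may look like; it is not Algorithm 3 itself, and in particular it does not implement the tie-breaking of Steps 5 and 6. The graph encoding (a dictionary mapping each vertex to its children) and the function name are hypothetical.

```python
from collections import deque


def reverse_topological_order(children):
    """Order the vertices of a DAG so that each vertex follows its children.

    `children` maps every vertex to the list of its children; every vertex
    must appear as a key, sinks with an empty list.
    """
    parents = {v: [] for v in children}
    remaining = {v: len(cs) for v, cs in children.items()}
    for v, cs in children.items():
        for c in cs:
            parents[c].append(v)

    # Start the breadth-first traversal from the sinks of the graph.
    queue = deque(v for v, n in remaining.items() if n == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for p in parents[v]:
            remaining[p] -= 1
            if remaining[p] == 0:  # all children of p are already ordered
                queue.append(p)

    if len(order) != len(children):
        raise ValueError("the graph contains a cycle")
    return order


# Hypothetical toy graph: a -> s -> b -> t and s -> t.
toy = {"a": ["s"], "s": ["b", "t"], "b": ["t"], "t": []}
toy_order = reverse_topological_order(toy)
```

A vertex enters the queue only once all its children have been ordered, which is the property a loop along a reverse topological ordering relies on.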

Appendix D McCormick Relaxation

McCormick inequalities allow us to turn the NLP formulation (11) into the MILP formulation (18). Furthermore, good bounds ease the solution of Problem (18). In this section, we first discuss this relaxation and show that, in the NLP formulation, loose bounds are useless while tight bounds improve the formulation. Finally, we give an algorithm to compute good quality bounds.

D.1 Review of McCormick’s relaxation

For the sake of completeness, we briefly recall McCormick’s relaxation and its condition for exactness when all the variables but one are binary.

Proposition 20.

Consider the variables $(x,y,z)\in[0,1]^{3}$ and the following constraint

$$z=xy. \tag{32}$$

Further, assume that we have an upper bound $y\leq b$. We call ${\rm McCormick}(32)$ the following set of constraints

$$\begin{aligned}
z &\geq y+xb-b, && \text{(33a)}\\
z &\leq y, && \text{(33b)}\\
z &\leq bx. && \text{(33c)}
\end{aligned}$$

If $x$, $y$ and $z$ satisfy (32), then they also satisfy (33). If $x$ is a binary variable (that is, $x\in\{0,1\}$) and (33) is satisfied, then so is (32).

Proof.

Consider $x\in[0,1]$, $y\in[0,b]$ and $z\in[0,1]$ such that $z=xy$. Noting that $(1-x)(b-y)\geq 0$, that is, $b-y-xb+xy\geq 0$, we obtain Constraint (33a). Constraints (33b) and (33c) are obtained by upper bounding one of the two factors of $xy$, using $x\leq 1$ and $y\leq b$ respectively. Now assume that $x\in\{0,1\}$, $y\in[0,b]$ and $z\in[0,1]$ satisfy (33). Then, if $x=1$, constraints (33a) and (33b) yield $z=y$. Otherwise, as $z\geq 0$, we have $z=0$ by (33c). ∎
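The proposition can be checked numerically. The Python sketch below takes (33a)-(33c) to be the standard McCormick inequalities $z\geq y+xb-b$, $z\leq y$ and $z\leq bx$, which is our reading of the proof above, and returns the interval of values of $z$ they allow together with $z\geq 0$. The function name is hypothetical.

```python
from fractions import Fraction


def mccormick_interval(x, y, b):
    """Interval [lower, upper] of z allowed by (33a)-(33c) and z >= 0.

    Assumes 0 <= x <= 1 and 0 <= y <= b.
    """
    lower = max(Fraction(0), y + x * b - b)  # (33a) together with z >= 0
    upper = min(y, b * x)                    # (33b) and (33c)
    return lower, upper


half = Fraction(1, 2)

# Binary x: the interval collapses to the single point z = x * y.
binary_one = mccormick_interval(Fraction(1), Fraction(3, 10), half)
binary_zero = mccormick_interval(Fraction(0), Fraction(3, 10), half)

# Fractional x with the loose bound b = 1: z is only known to lie in [0, 1/2].
loose = mccormick_interval(half, half, Fraction(1))

# Same point with the tight bound b = y = 1/2: z is pinned to x * y = 1/4.
tight = mccormick_interval(half, half, half)
```

With binary $x$ the interval reduces to $\{xy\}$, as the proposition states. For fractional $x$, the comparison between `loose` and `tight` shows on one point how a smaller bound $b$ shrinks the relaxation.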

D.2 Choice of the bounds in McCormick inequalities

D.2.1 Using $b_{\check{C}_{v}}=1$ leads to loose constraints

As $\mu_{\check{C}_{v}}$ is a probability distribution, $1$ is an immediate upper bound on $\mu_{\check{C}_{v}}$ . Let $\mathcal{Q}^{1}$ be the polytope $\mathcal{Q}^{b}$ obtained using bounds vector $b$ defined by $b_{\check{C}_{v}}=1$ for all $v$ in $V^{\mathrm{a}}$ .

Proposition 21.

Let $\mu$ be in $\overline{\mathcal{P}}$. Then there exists $\delta$ in $\Delta$ such that $(\mu,\delta)$ belongs to $\mathcal{Q}^{1}$, and the optimal value of the linear relaxation of (18) is equal to $\displaystyle\max_{\mu\in\overline{\mathcal{P}}}\sum_{v\in V^{\mathrm{r}}}\langle r_{v},\mu_{v}\rangle$.

Proof.

Let $v$ be a vertex in $V^{\mathrm{a}}$ , and let

[TABLE]

where $e_{v}$ is an arbitrary element of $\mathcal{X}_{v}$ . To prove the result, we show that ( ${\rm McCormick}(v,b)$ ) is satisfied for this well-chosen $\delta_{v|\mathrm{prt}(v)}$ and $b_{\check{C}_{v}}=1$ .

We have

[TABLE]

which yields $\mu_{C_{v}}\geq\mu_{\check{C}_{v}}+(\delta_{v|\mathrm{prt}(v)}-1)b_{\check{C}_{v}}$ .

Besides, if $\mu_{C_{v}}(x_{C_{v}})\geq 0$ , following the definition of $\delta$ and given that $\mu_{\mathrm{prt}(v)}(x_{\mathrm{prt}(v)})\leq 1$ , we have

[TABLE]

and the constraint $\mu_{C_{v}}\leq\delta_{v|\mathrm{prt}(v)}b_{\check{C}_{v}}$ is satisfied.

Finally, $\mu_{C_{v}}\leq\mu_{\check{C}_{v}}$ follows from the marginalization constraint $\mu_{\check{C}_{v}}=\sum_{x_{v}}\mu_{C_{v}}$ in the definition of the local polytope. ∎

D.2.2 McCormick inequalities with well-chosen bounds are useful

This section provides examples of IDs where McCormick inequalities ( ${\rm McCormick}(v,b)$ ) improve the linear relaxation of MILPs (18) and (25).

Consider the ID on Figure 14.a, and assume that we have a bound $\mu_{st}\leq b_{st}$ . Then, the McCormick relaxation of $\mu_{sta}=\mu_{st}\delta_{a|t}$ reads

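$$\begin{aligned}
\mu_{sta} &\geq \mu_{st}+(\delta_{a|t}-1)\,b_{st},\\
\mu_{sta} &\leq \mu_{st},\\
\mu_{sta} &\leq \delta_{a|t}\,b_{st}.
\end{aligned}$$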

Suppose that all variables are binary, that $s$ is Bernoulli with parameter $\frac{1}{2}$, that $\mathbb{P}(X_{t}=1|X_{s})=\tfrac{1}{2}+\varepsilon X_{s}-\varepsilon(1-X_{s})$, that $X_{w}$ indicates if $X_{s}=X_{a}$, and that the objective is to maximize $\mathbb{E}_{\delta}(X_{w})$. An optimal policy consists in choosing $X_{a}=X_{t}$, and the optimal value is $\tfrac{1}{2}+\varepsilon$. An optimal solution of the linear relaxation of (18) on $\overline{\mathcal{P}}$ without McCormick inequalities has value 1, whereas an optimal solution with McCormick inequalities and $b_{st}(x_{s},x_{t})=\tfrac{1}{2}+\varepsilon\mathbbm{1}_{x_{s}=x_{t}}$ has value $\tfrac{1}{2}+\varepsilon$. However, on this simple example, the McCormick inequalities are implied by the valid inequalities of Section 5.
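The optimal value $\tfrac{1}{2}+\varepsilon$ of this small ID can be recovered by enumeration. The Python sketch below is an illustration under two assumptions of ours: $X_{t}$ agrees with $X_{s}$ with probability $\tfrac{1}{2}+\varepsilon$ (our reading of the conditional distribution of $X_{t}$), and only deterministic policies $x_{t}\mapsto x_{a}$ are enumerated, which is enough here because the expected reward is linear in $\delta_{a|t}$. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)


def expected_reward(policy, eps):
    """E[X_w] with X_w = 1(X_s == X_a) and X_a = policy[X_t]."""
    total = Fraction(0)
    for x_s, x_t in product(STATES, repeat=2):
        # Assumed model: P(X_t = 1 | X_s) = 1/2 + eps if X_s = 1, 1/2 - eps otherwise.
        p_t_is_one = Fraction(1, 2) + (eps if x_s == 1 else -eps)
        p_t = p_t_is_one if x_t == 1 else 1 - p_t_is_one
        mu_st = Fraction(1, 2) * p_t  # joint probability of (x_s, x_t)
        if policy[x_t] == x_s:
            total += mu_st
    return total


def best_deterministic_policy(eps):
    """Enumerate the four maps x_t -> x_a and return the best one."""
    policies = [dict(zip(STATES, acts)) for acts in product(STATES, repeat=2)]
    return max(policies, key=lambda pol: expected_reward(pol, eps))


eps = Fraction(1, 10)
best = best_deterministic_policy(eps)
best_value = expected_reward(best, eps)
```

For $\varepsilon=\tfrac{1}{10}$, the best policy is the identity $X_{a}=X_{t}$ with value $\tfrac{3}{5}=\tfrac{1}{2}+\varepsilon$, well below the value 1 reported for the linear relaxation without McCormick inequalities.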

This is no longer the case on the ID of Figure 14.b, where $r$ is a Bernoulli of parameter $0.5$ and $X_{s}=X_{r}X_{b}+(1-X_{r})(1-X_{b})$, and the remaining parameters are defined as previously. Using the same bounds, this new example leads to exactly the same results as before.

D.3 Algorithm to compute good quality bounds

This section provides a dynamic programming equation to compute bounds $b_{\check{C}_{v}}$ on $\mu_{\check{C}_{v}}$ that are smaller than $1$. Let $\mathcal{G}$ be an RJT, and $C_{1},\ldots,C_{n}$ be a topological order on $\mathcal{G}$. Let $C_{k}$ be a vertex in $\mathcal{G}$, $C_{j}$ be the parent of $C_{k}$ and $C_{i}$ the parent of $C_{j}$ ( $i<j<k$ ). If $k=1$, then $C_{i}=C_{j}=C_{k}=C_{1}$. We introduce the notation $C_{j}^{a}=(C_{j}\backslash(C_{i}\cup C_{k}))\cap V^{\mathrm{a}}$. We define inductively on $k$ the functions $\tilde{b}_{k}:\mathcal{X}_{C_{k}}\rightarrow[0,1]$ as follows.

[TABLE]

Proposition 22.

Let $\mu$ be in $\mathcal{S}(G)$. We have $\mu_{C_{k}}(x_{C_{k}})\leq\tilde{b}_{k}(x_{C_{k}})$ for all $k$ and $x_{C_{k}}$ in $\mathcal{X}_{C_{k}}$.

As a consequence, $b_{\check{C}_{v}}$ defined as $\sum_{x_{v}}\tilde{b}_{C_{v}}$ provides an upper bound on $\mu_{\check{C}_{v}}$ that can be used in McCormick constraints.

Proof.

We prove the result by induction. Let $(\mu,\delta)$ be a feasible solution of Problem (11), let $C_{j}$ be the parent of $C_{k}$ in $\mathcal{G}$, and let $C_{i}$ be the parent of $C_{j}$ ( $i<j<k$ ).

If $k=1$ , then the result is obtained by using $\delta_{v}\leq 1$ for all $v\in V^{\mathrm{a}}$ .

We now assume that the result holds for all indices smaller than $k$, where $k>1$. We have

[TABLE]

From (36) to (37), we maximize over the policies in $(C_{i}\cup C_{j})\backslash C_{k}$ . From (37) to (38), we bound all policies in $C_{k}\cap V^{\mathrm{a}}$ by $1$ . Then (39) is obtained by using the induction assumption. Let $\alpha:\mathcal{X}_{\mathrm{fa}(C_{j}^{a})}\rightarrow\mathbb{R}$ be such that for all $x_{C_{j}^{a}}\in\mathcal{X}_{C_{j}^{a}}$ ,

[TABLE]

Then, (39) becomes

[TABLE]

Now, we can suppose that $\mathring{C_{v}}=\{v\}$ . Therefore, $|C_{j}^{a}|\leq 1$ and the maximum above can be decomposed into the sum.

[TABLE]

where from (41) to (42) we use a local maximization. Therefore, we obtain the result

[TABLE]

∎

Note that $\tilde{b}_{k}(x_{C_{k}})$ is computed via an order-two recursion from $\tilde{b}_{i}(x_{C_{i}})$, where $i$ indexes the grandparent of $C_{k}$; this can be generalized to higher orders if tighter bounds are needed.

