A Note on the Relationship Between Conditional and Unconditional Independence, and its Extensions for Markov Kernels

A.G. Nogales; P. Pérez

arXiv:1706.03955·math.ST·October 28, 2021


TL;DR

This paper explores the relationship between conditional and unconditional independence, extending classical results to Markov kernels and providing new theorems, counterexamples, and representation results to clarify these concepts.

Contribution

It introduces a main theorem linking independence of Markov kernels to conditional independence, extending existing results and providing new insights and counterexamples.

Findings

01

Main theorem establishes minimal conditions for independence from conditional independence.

02

Counterexamples clarify the boundaries of the theoretical results.

03

Extensions to Markov kernels broaden the applicability of independence concepts.

Abstract

Two known results on the relationship between conditional and unconditional independence are obtained as a consequence of the main result of this paper, a theorem that uses independence of Markov kernels to obtain a minimal condition which added to conditional independence implies independence. Some counterexamples and representation results are provided to clarify the concepts introduced and the propositions of the statement of the main theorem. Moreover, conditional independence and the mentioned results are extended to the framework of Markov kernels.

Equations

$$M_{1}\times M_{2}:(\Omega,\mathcal{A})\mbox{$\succ$$\longrightarrow$}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2})$$

$$(M_{1}\times M_{2})(\omega,A_{1}\times A_{2})=M_{1}(\omega,A_{1})\cdot M_{2}(\omega,A_{2}),\quad A_{i}\in\mathcal{A}_{i},\ i=1,2.$$

$$\int_{\Omega}M_{1}(\omega,A_{1})M_{2}(\omega,A_{2})\,dP(\omega)=\int_{A_{2}}L(\omega_{2},A_{1})\,dP^{M_{2}}(\omega_{2}).$$

$$P^{(X_{1},X_{2})|X_{3}}=P^{X_{1}|X_{3}}\times P^{X_{2}|X_{3}},\quad P^{X_{3}}\mbox{-a.s.}$$

$$\int_{A_{3}}P^{(X_{1},X_{2})|X_{3}=\omega_{3}}(A_{1}\times A_{2})\,dP^{X_{3}}(\omega_{3})=\int_{A_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\cdot P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3}).$$

$$E[(f_{1}\circ X_{1})\cdot(f_{2}\circ X_{2})|X_{3}]=E[f_{1}\circ X_{1}|X_{3}]\cdot E[f_{2}\circ X_{2}|X_{3}],\quad P^{X_{3}}\mbox{-a.s.},$$

$$E[(f_{1}\circ X_{1})\cdot(f_{2}\circ X_{2})]=E[f_{1}\circ X_{1}]\cdot E[f_{2}\circ X_{2}].$$

$$\int_{\Omega_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\cdot P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3})=\int_{\Omega_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\,dP^{X_{3}}(\omega_{3})\cdot\int_{\Omega_{3}}P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3}),$$

$$\int_{\Omega_{3}}E(f_{1}\circ X_{1}|X_{3})\cdot E(f_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(f_{1}\circ X_{1})\cdot E(f_{2}\circ X_{2}).$$

$$C(n_{000},n_{001},n_{010},n_{011},n_{100},n_{101},n_{110},n_{111}).$$

$$X_{1}(\omega)=i,\ \mbox{if }\omega\in A_{i++},\quad i=0,1,$$

$$X_{2}(\omega)=j,\ \mbox{if }\omega\in A_{+j+},\quad j=0,1,$$

$$X_{3}(\omega)=k,\ \mbox{if }\omega\in A_{++k},\quad k=0,1.$$

$$\mbox{prevalence of the disease}=\frac{n_{++1}}{n_{+++}},$$

$$\mbox{specificity of }X_{1}=\frac{n_{0+0}}{n_{++0}},\quad\mbox{specificity of }X_{2}=\frac{n_{+00}}{n_{++0}},$$

$$\mbox{sensitivity of }X_{1}=\frac{n_{1+1}}{n_{++1}},\quad\mbox{sensitivity of }X_{2}=\frac{n_{+11}}{n_{++1}}.\quad\Box$$

$$n_{ij+}\cdot n_{+++}=n_{i++}\cdot n_{+j+}$$

$$\sum_{k=0}^{1}P(X_{1}=i|X_{3}=k)\cdot P(X_{2}=j|X_{3}=k)\cdot P(X_{3}=k)=\left(\sum_{k=0}^{1}P(X_{1}=i|X_{3}=k)\cdot P(X_{3}=k)\right)\cdot\left(\sum_{k=0}^{1}P(X_{2}=j|X_{3}=k)\cdot P(X_{3}=k)\right),$$

$$\frac{n_{0+0}\,n_{+00}}{n_{++0}}+\frac{n_{0+1}\,n_{+01}}{n_{++1}}=\frac{n_{0++}\,n_{+0+}}{n_{+++}}$$

$$\frac{n_{0+0}\,n_{+10}}{n_{++0}}+\frac{n_{0+1}\,n_{+11}}{n_{++1}}=\frac{n_{0++}\,n_{+1+}}{n_{+++}}$$

$$\frac{n_{1+0}\,n_{+00}}{n_{++0}}+\frac{n_{1+1}\,n_{+01}}{n_{++1}}=\frac{n_{1++}\,n_{+0+}}{n_{+++}}$$

$$\frac{n_{1+0}\,n_{+10}}{n_{++0}}+\frac{n_{1+1}\,n_{+11}}{n_{++1}}=\frac{n_{1++}\,n_{+1+}}{n_{+++}}$$

$$P(X_{1}=i,X_{2}=j|X_{3}=k)=P(X_{1}=i|X_{3}=k)\cdot P(X_{2}=j|X_{3}=k),$$

$$P(X_{1}=i,X_{2}=j,X_{3}=k)\cdot P(X_{3}=k)=P(X_{1}=i,X_{3}=k)\cdot P(X_{2}=j,X_{3}=k),$$

$$n_{000}\cdot n_{++0}=n_{0+0}\cdot n_{+00},\quad n_{001}\cdot n_{++1}=n_{0+1}\cdot n_{+01}$$

$$n_{010}\cdot n_{++0}=n_{0+0}\cdot n_{+10},\quad n_{011}\cdot n_{++1}=n_{0+1}\cdot n_{+11}$$

$$n_{100}\cdot n_{++0}=n_{1+0}\cdot n_{+00},\quad n_{101}\cdot n_{++1}=n_{1+1}\cdot n_{+01}$$

$$n_{110}\cdot n_{++0}=n_{1+0}\cdot n_{+10},\quad n_{111}\cdot n_{++1}=n_{1+1}\cdot n_{+11}$$

$$\int_{\Omega_{3}}E(f_{1}\circ X_{1}|X_{3})E(f_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(f_{1}\circ X_{1})E(f_{2}\circ X_{2}),$$

$$f_{1}\circ Z_{1}\circ(Y_{1},Y_{2})\perp\perp f_{2}\circ Z_{2}\circ(Y_{1},Y_{2})$$

$$E(E(I_{F_{1}}|Y_{1})I_{F_{2}})=E(I_{F_{1}})\cdot E(I_{F_{2}})\ \mbox{ and }\ E(E(I_{F_{2}}|Y_{1})I_{F_{1}})=E(I_{F_{1}})\cdot E(I_{F_{2}}),$$

$$\int_{\Omega_{3}}E(g_{1}\circ X_{1}|X_{3})\cdot E(g_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(g_{1}\circ X_{1})\cdot E(g_{2}\circ X_{2}).$$

$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E(I_{F_{1}})\cdot E(I_{F_{2}}),$$

$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E[E(I_{F_{1}}E(I_{F_{2}}|X_{3})|X_{3})]=E[I_{F_{1}}E(I_{F_{2}}|X_{3})]$$

$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E[E(I_{F_{2}}E(I_{F_{1}}|X_{3})|X_{3})]=E[I_{F_{2}}E(I_{F_{1}}|X_{3})].$$


Full text

A Note on the Relationship Between Conditional and Unconditional Independence, and its Extensions for Markov Kernels

A.G. Nogales and P. Pérez

Dpto. de Matemáticas, Universidad de Extremadura

Avda. de Elvas, s/n, 06006–Badajoz, SPAIN.

e-mail: [email protected]

Abstract. Two known results on the relationship between conditional and unconditional independence are obtained as a consequence of the main result of this paper, a theorem that uses independence of Markov kernels to obtain a minimal condition which added to conditional independence implies independence. Some examples, counterexamples and representation results are provided to clarify the concepts introduced and the propositions of the statement of the main theorem. Moreover, conditional independence and the mentioned results are extended to the framework of Markov kernels.

AMS Subject Class. (2010): Primary 60Exx Secondary 60J35
Key words and phrases: conditional independence, Markov kernel.

1 Introduction and basic definitions

Conditional independence is a classical and basic tool of both probability theory (think of Markov chain theory, for example) and mathematical statistics. See, for instance, Dawid (1979) and Florens et al. (1990), where extensive use of conditional independence is made in order to unify many seemingly unrelated concepts of statistical inference, from both the Bayesian and the frequentist points of view. The introductory sections of Dawid (1979), Phillips (1988) and van Putten et al. (1985) list some of the main fields of application of the conditional independence relation: e.g. econometric distribution theory, asymptotic studies of regression with non-ergodic processes, the definition of a stochastic dynamic system and the stochastic realization problem, or, in a statistical framework, the areas of sufficiency, ancillarity, identification or invariance, among others. van Putten et al. (1985) also includes, among other interesting results, a systematic study of invariance properties of conditional independence under enlargement or reduction of the $\sigma$-fields involved, which bears some connection with the main problem raised in this paper.

It is well known that conditional independence neither implies nor is implied by independence. We shall write $X\perp\perp Y$ and $X\perp\perp Y|Z$ for the independence of the random variables $X$ and $Y$ and for their conditional independence given a third random variable $Z$, respectively.

Section 2 contains the main result of this paper, Theorem 1, which uses independence of Markov kernels, a concept introduced by Nogales (2013a), to obtain a minimal condition which, added to conditional independence, implies independence.

In this way the result improves on two known results on the relationship between conditional and unconditional independence: one that constitutes the main goal of Phillips (1988), and another that is an immediate consequence of Theorem 2.2.10 of Florens et al. (1990) (or Lemma 4.3 of Dawid (1979)), as remarked in Section 3. That section also gives some examples and counterexamples to delimit the relations between the three propositions of Theorem 1.

In Section 4 we also address the problem of constructing a rigorous general theory of conditional independence in terms of Markov kernels; notice that Markov kernels extend the concepts of both random variable and $\sigma$-field, and Theorem 1 is extended there to this new framework. Dawid (1980) constructs a theory of conditional independence for “statistical operations”, presented as a slight generalization of Markovian operators, which are themselves a generalization of Markov kernels. Although this article also belongs to the field of specialized mathematics, we hope the reader will find the development of conditional independence in the less abstract framework of Markov kernels (or transition probabilities) useful.

A more general result than Theorem 1 in terms of random variables is finally presented in Section 5. The definition of conditional independence between Markov kernels introduced here is used to obtain a minimal condition which, added to the conditional independence of $X_{1}$ and $X_{2}$ given $X_{3}$, implies the conditional independence of $X_{1}$ and $X_{2}$ given $X_{4}$, provided $X_{4}$ is a function of $X_{3}$.

The paper is completed with some more accessible reformulations of several of the propositions considered. With the same purpose, some representation results for the introduced definitions for Markov kernels in terms of random variables are also provided.

For ease of reading, the proofs appear in a final section.

In what follows $(\Omega,\mathcal{A})$, $(\Omega_{1},\mathcal{A}_{1})$, and so on, will denote measurable spaces. A random variable is a map $X:(\Omega,\mathcal{A})\rightarrow(\Omega_{1},\mathcal{A}_{1})$ such that $X^{-1}(A_{1})\in\mathcal{A}$ for all $A_{1}\in\mathcal{A}_{1}$. Its probability distribution (or, simply, distribution) $P^{X}$ with respect to a probability measure $P$ on $\mathcal{A}$ is the image measure of $P$ by $X$, i.e., the probability measure on $\mathcal{A}_{1}$ defined by $P^{X}(A_{1}):=P(X^{-1}(A_{1}))$. We write $\times$ instead of $\otimes$ for the product of $\sigma$-fields or measures. The next definition is well known and can be found, for instance, in Heyer (1982).

Definition 1.

(i) (Markov kernel) A Markov kernel $M_{1}:(\Omega,\mathcal{A})\mbox{$ \succ$$\longrightarrow $}(\Omega_{1},\mathcal{A}_{1})$ is a map $M_{1}:\Omega\times\mathcal{A}_{1}\rightarrow[0,1]$ such that: a) $\forall\omega\in\Omega$ , $M_{1}(\omega,\cdot)$ is a probability measure on $\mathcal{A}_{1}$ ; b) $\forall A_{1}\in\mathcal{A}_{1}$ , $M_{1}(\cdot,A_{1})$ is an $\mathcal{A}$ -measurable map.

(ii) (Diagonal product of Markov kernels) The diagonal product

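$$M_{1}\times M_{2}:(\Omega,\mathcal{A})\mbox{$\succ$$\longrightarrow$}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2})$$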

of two Markov kernels $M_{1}:(\Omega,\mathcal{A})\mbox{$\succ$$\longrightarrow$}(\Omega_{1},\mathcal{A}_{1})$ and $M_{2}:(\Omega,\mathcal{A})\mbox{$\succ$$\longrightarrow$}(\Omega_{2},\mathcal{A}_{2})$ is defined as the unique Markov kernel such that

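$$(M_{1}\times M_{2})(\omega,A_{1}\times A_{2})=M_{1}(\omega,A_{1})\cdot M_{2}(\omega,A_{2}),\quad A_{i}\in\mathcal{A}_{i},\ i=1,2.$$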

(iii) (Image of a Markov kernel) The image (let us also call it probability distribution) of a Markov kernel $M_{1}:(\Omega,\mathcal{A},P)\mbox{$ \succ$$\longrightarrow $}(\Omega_{1},\mathcal{A}_{1})$ on a probability space is the probability measure $P^{M_{1}}$ on $\mathcal{A}_{1}$ defined by $P^{M_{1}}(A_{1}):=\int_{\Omega}M_{1}(\omega,A_{1})\,dP(\omega)$ .
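Definition 1 becomes very concrete on finite spaces. The following is a minimal sketch, assuming $\Omega$, $\Omega_{1}$ and $\Omega_{2}$ are finite, so that a Markov kernel is a row-stochastic table and $P$ is a list of point masses; the names `diagonal_product` and `image` and the toy data are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction as F

# Assumption: finite spaces. M[w][a] stands for M(w, {a}); P[w] for P({w}).

def diagonal_product(M1, M2):
    """(M1 x M2)(w, {(a1, a2)}) = M1(w, {a1}) * M2(w, {a2})."""
    return [{(a1, a2): p1 * p2
             for a1, p1 in enumerate(row1)
             for a2, p2 in enumerate(row2)}
            for row1, row2 in zip(M1, M2)]

def image(P, M):
    """P^M({a}) = sum over w of M(w, {a}) * P({w}); rows may be lists or dicts."""
    out = {}
    for pw, row in zip(P, M):
        items = row.items() if isinstance(row, dict) else enumerate(row)
        for a, p in items:
            out[a] = out.get(a, 0) + pw * p
    return out

# Hypothetical toy data: two points in Omega, binary target spaces.
P = [F(1, 2), F(1, 2)]
M1 = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
M2 = [[F(1, 3), F(2, 3)], [F(1, 3), F(2, 3)]]

joint = image(P, diagonal_product(M1, M2))  # the distribution P^{M1 x M2}
```

Exact rational arithmetic (`Fraction`) is used so that equalities between distributions can be tested without rounding error.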

Definition 2.

(Independence of Markov kernels, Nogales (2013a)) Let $(\Omega,\mathcal{A},P)$ be a probability space. Two Markov kernels $M_{1}:(\Omega,\mathcal{A},P)\mbox{$\succ$$\longrightarrow$}(\Omega_{1},\mathcal{A}_{1})$ and $M_{2}:(\Omega,\mathcal{A},P)\mbox{$\succ$$\longrightarrow$}(\Omega_{2},\mathcal{A}_{2})$ are said to be independent if $P^{M_{1}\times M_{2}}=P^{M_{1}}\times P^{M_{2}}$. We write $M_{1}\perp\perp M_{2}$ (or $M_{1}\perp\perp_{P}M_{2}$).
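As an illustration of Definition 2, the sketch below tests $P^{M_{1}\times M_{2}}=P^{M_{1}}\times P^{M_{2}}$ cell by cell, again assuming finite spaces and row-stochastic tables. The function name `kernels_independent` and the two toy kernels are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def kernels_independent(P, M1, M2):
    """Check P^{M1 x M2} = P^{M1} x P^{M2} on finite spaces (one row per point of Omega)."""
    n1, n2 = len(M1[0]), len(M2[0])
    img1 = [sum(p * row[a] for p, row in zip(P, M1)) for a in range(n1)]
    img2 = [sum(p * row[a] for p, row in zip(P, M2)) for a in range(n2)]
    return all(
        sum(p * r1[a1] * r2[a2] for p, r1, r2 in zip(P, M1, M2)) == img1[a1] * img2[a2]
        for a1, a2 in product(range(n1), range(n2))
    )

# Hypothetical data: one kernel that depends on the point of Omega, one that does not.
P = [F(1, 2), F(1, 2)]
varying = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
constant = [[F(1, 3), F(2, 3)], [F(1, 3), F(2, 3)]]

with_constant = kernels_independent(P, varying, constant)
with_itself = kernels_independent(P, varying, varying)
```

A kernel that does not depend on $\omega$ is independent of any other kernel, whereas a kernel that genuinely varies with $\omega$ need not be independent of itself, which is what the two calls compare.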

Given two random variables $X_{i}:(\Omega,\mathcal{A},P)\rightarrow(\Omega_{i},\mathcal{A}_{i})$, $i=1,2$, the conditional distribution of $X_{2}$ given $X_{1}$, when it exists, is a Markov kernel $M_{1}:(\Omega_{1},\mathcal{A}_{1})\mbox{$\succ$$\longrightarrow$}(\Omega_{2},\mathcal{A}_{2})$ such that $P(X_{1}\in A_{1},X_{2}\in A_{2})=\int_{A_{1}}M_{1}(\omega_{1},A_{2})\,dP^{X_{1}}(\omega_{1})$, for all $A_{1}\in\mathcal{A}_{1}$ and $A_{2}\in\mathcal{A}_{2}$. We write $P^{X_{2}|X_{1}=\omega_{1}}(A_{2}):=M_{1}(\omega_{1},A_{2})$. Conversely, every Markov kernel is also a conditional distribution, as noted in Nogales (2013b). That paper also introduces the next definition.

Definition 3.

(Conditional distribution of a Markov kernel given another) Let $M_{1}:(\Omega,\mathcal{A},P)\mbox{$\succ$$\longrightarrow$}(\Omega_{1},\mathcal{A}_{1})$ and $M_{2}:(\Omega,\mathcal{A},P)\mbox{$\succ$$\longrightarrow$}(\Omega_{2},\mathcal{A}_{2})$ be two Markov kernels over the same probability space. The conditional distribution $P^{M_{1}|M_{2}}$ of $M_{1}$ given $M_{2}$ is defined as a Markov kernel $L:(\Omega_{2},\mathcal{A}_{2})\mbox{$\succ$$\longrightarrow$}(\Omega_{1},\mathcal{A}_{1})$ such that, for every pair of events $A_{1}\in\mathcal{A}_{1}$ and $A_{2}\in\mathcal{A}_{2}$,

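$$\int_{\Omega}M_{1}(\omega,A_{1})M_{2}(\omega,A_{2})\,dP(\omega)=\int_{A_{2}}L(\omega_{2},A_{1})\,dP^{M_{2}}(\omega_{2}).$$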

Remark.

An interesting problem in this context is the existence of such conditional distributions, which is guaranteed under well-known regularity conditions on the measurable spaces involved, e.g. when $(\Omega,\mathcal{A})$, or the corresponding measurable space $(\Omega_{i},\mathcal{A}_{i})$, is a standard Borel space. This is the same for both random variables and Markov kernels (see Nogales (2013b)). In the rest of the paper we will assume this when necessary. $\Box$
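On finite spaces existence is not an issue, and the kernel $L=P^{M_{1}|M_{2}}$ of Definition 3 can be computed directly by taking $A_{2}=\{a_{2}\}$ in the defining identity and dividing by $P^{M_{2}}(\{a_{2}\})$. The sketch below does this under the finiteness assumption; `conditional_kernel` and the toy kernels are illustrative names and data only.

```python
from fractions import Fraction as F

def conditional_kernel(P, M1, M2):
    """L[a2][a1] = L(a2, {a1}); the row is None where P^{M2}({a2}) = 0 (L is arbitrary there)."""
    n1, n2 = len(M1[0]), len(M2[0])
    L = []
    for a2 in range(n2):
        mass = sum(p * r2[a2] for p, r2 in zip(P, M2))  # P^{M2}({a2})
        if mass == 0:
            L.append(None)
            continue
        L.append([sum(p * r1[a1] * r2[a2] for p, r1, r2 in zip(P, M1, M2)) / mass
                  for a1 in range(n1)])
    return L

# Hypothetical toy data: the same kernel used twice.
P = [F(1, 2), F(1, 2)]
M1 = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
M2 = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
L = conditional_kernel(P, M1, M2)
```

Each non-null row of `L` is a probability vector, as required of a Markov kernel from $(\Omega_{2},\mathcal{A}_{2})$ to $(\Omega_{1},\mathcal{A}_{1})$.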

2 Conditional Independence

Let us recall the definition of conditional independence for random variables; we refer to Dawid (1979), for instance, where some basic properties are also given.

Definition 4.

Let $X_{i}:(\Omega,\mathcal{A},P)\rightarrow(\Omega_{i},\mathcal{A}_{i})$, $i=1,2,3$, be arbitrary random variables. $X_{1}$ and $X_{2}$ are said to be conditionally independent given $X_{3}$, and we write $X_{1}\perp\perp X_{2}|X_{3}$ (or $X_{1}\perp\perp_{P}X_{2}|X_{3}$ to be more precise), if

$$P^{(X_{1},X_{2})|X_{3}}=P^{X_{1}|X_{3}}\times P^{X_{2}|X_{3}},\quad P^{X_{3}}\mbox{-a.s.}$$

We are now ready for the main result of the paper.

Theorem 1.

If $X_{1}$ and $X_{2}$ are conditionally independent given $X_{3}$, then $X_{1}$ and $X_{2}$ are independent if, and only if, the Markov kernels $P^{X_{1}|X_{3}}$ and $P^{X_{2}|X_{3}}$ are $P^{X_{3}}$-independent.

Remark.

(Some reformulations of the three propositions involved in the previous theorem) By definition, $X_{1}\perp\perp X_{2}\mid X_{3}$ means that, for every $A_{i}\in\mathcal{A}_{i}$ , $1\leq i\leq 3$ ,

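$$\int_{A_{3}}P^{(X_{1},X_{2})|X_{3}=\omega_{3}}(A_{1}\times A_{2})\,dP^{X_{3}}(\omega_{3})=\int_{A_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\cdot P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3}).$$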

This is equivalent to

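$$E[(f_{1}\circ X_{1})\cdot(f_{2}\circ X_{2})|X_{3}]=E[f_{1}\circ X_{1}|X_{3}]\cdot E[f_{2}\circ X_{2}|X_{3}],\quad P^{X_{3}}\mbox{-a.s.},$$

for all bounded real random variables $f_{i}:(\Omega_{i},\mathcal{A}_{i})\rightarrow\mathbb{R}$, $i=1,2$.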

In particular, $X_{1}\perp\perp X_{2}$ is equivalent to

$$E[(f_{1}\circ X_{1})\cdot(f_{2}\circ X_{2})]=E[f_{1}\circ X_{1}]\cdot E[f_{2}\circ X_{2}].$$

Finally, $P^{X_{1}|X_{3}}\perp\perp_{P^{X_{3}}}P^{X_{2}|X_{3}}$ means that, for every $A_{i}\in\mathcal{A}_{i}$ , $1\leq i\leq 2$ ,

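$$\int_{\Omega_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\cdot P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3})=\int_{\Omega_{3}}P^{X_{1}|X_{3}=\omega_{3}}(A_{1})\,dP^{X_{3}}(\omega_{3})\cdot\int_{\Omega_{3}}P^{X_{2}|X_{3}=\omega_{3}}(A_{2})\,dP^{X_{3}}(\omega_{3}),$$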

which is equivalent to

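$$\int_{\Omega_{3}}E(f_{1}\circ X_{1}|X_{3})\cdot E(f_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(f_{1}\circ X_{1})\cdot E(f_{2}\circ X_{2})$$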

for every pair of functions $f_{1},f_{2}$ as above. As $E(E(f_{i}\circ X_{i}|X_{3}))=E(f_{i}\circ X_{i})$ , this is equivalent to the uncorrelatedness of the conditional expectations given $X_{3}$ of every pair of real bounded measurable functions of $X_{1}$ and $X_{2}$ . Seen in this way, the independence of these two conditional distributions has a degree of difficulty comparable to other conditions that appear in the literature cited in the bibliography; for instance, see Proposition 2.4.g of van Putten et al. (1985), or others appearing in the results following it.

3 Counterexamples

Let $X_{i}:(\Omega,\mathcal{A},P)\rightarrow(\Omega_{i},\mathcal{A}_{i})$ , $i=1,2,3,$ be random variables. Consider the propositions:

(i)

$X_{1}\perp\perp X_{2}\mid X_{3}$ .

(ii)

$X_{1}\perp\perp X_{2}$ .

(iii)

$P^{X_{1}|X_{3}}\perp\perp_{P^{X_{3}}}P^{X_{2}|X_{3}}$ .

We have shown that $\mbox{(i)}+\mbox{(ii)}\Longrightarrow\mbox{(iii)}$ and $\mbox{(i)}+\mbox{(iii)}\Longrightarrow\mbox{(ii)}$, i.e., in the presence of (i), the statements (ii) and (iii) are equivalent. In particular, (iii) is exactly what we need to reach independence from conditional independence.

We can ask ourselves whether any two of these propositions imply the third. In particular, we wonder whether (i) and (ii) are equivalent when (iii) is satisfied. All the answers are negative, as the next counterexamples show. We also include two examples in which the theorem applies.

First, let us describe a common framework for them.

Let $\Omega$ be a population with $n$ individuals and consider a partition $(A_{ijk})_{i,j,k=0,1}$ of $\Omega$ . We write $n_{ijk}$ for the number of individuals of $A_{ijk}$ . One or more of the indices $i,j,k$ can be replaced by a $+$ sign to denote the union of the corresponding sets of the partition: for instance, $A_{+01}=A_{001}\cup A_{101}$ . In particular, $\Omega=A_{+++}$ . Similar notations should be used for the numbers $n_{ijk}$ (e.g. $n_{+0+}=n_{000}+n_{001}+n_{100}+n_{101}$ ). Such a situation will be referred to as

$$C(n_{000},n_{001},n_{010},n_{011},n_{100},n_{101},n_{110},n_{111}).$$

We introduce three dichotomic random variables $X_{1},X_{2},X_{3}$ as follows:

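$$X_{1}(\omega)=i,\ \mbox{if }\omega\in A_{i++},\quad i=0,1,$$

$$X_{2}(\omega)=j,\ \mbox{if }\omega\in A_{+j+},\quad j=0,1,$$

$$X_{3}(\omega)=k,\ \mbox{if }\omega\in A_{++k},\quad k=0,1.$$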
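The $+$ notation for marginal counts is easy to mechanize. The sketch below is one possible encoding, assuming the eight counts are listed in the order $n_{000},n_{001},\dots,n_{111}$; the helper names `table` and `n` and the sample counts are hypothetical.

```python
from itertools import product

def table(*counts):
    """Map C(n000, n001, ..., n111) to a dict {(i, j, k): n_ijk}."""
    return dict(zip(product((0, 1), repeat=3), counts))

def n(tab, i, j, k):
    """Count n_ijk, where any index may be '+' to sum over that coordinate."""
    return sum(c for (a, b, d), c in tab.items()
               if i in ('+', a) and j in ('+', b) and k in ('+', d))

# Hypothetical counts, chosen only so that every cell is distinguishable.
tab = table(1, 2, 3, 4, 5, 6, 7, 8)
n_plus01 = n(tab, '+', 0, 1)      # n_{+01} = n_{001} + n_{101}
n_plus0plus = n(tab, '+', 0, '+')  # n_{+0+} = n_{000} + n_{001} + n_{100} + n_{101}
total = n(tab, '+', '+', '+')      # n_{+++}
```

With this encoding, $X_{1}$, $X_{2}$ and $X_{3}$ are simply the three coordinates of the cell containing an individual.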

Example 1.

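A scheme like this could be obtained when we want to study the relationship between two diagnostic procedures, represented by the dichotomous variables $X_{1}$ and $X_{2}$ ($X_{i}=1$ or $0$ when the $i^{th}$ diagnostic test is positive or negative, respectively), for a disease represented by the dichotomous variable $X_{3}$, which takes the values 1 or 0 depending on whether the disease is actually present or absent. In this case, we have the following expressions for some well-known related concepts:

$$\mbox{prevalence of the disease}=\frac{n_{++1}}{n_{+++}},$$

$$\mbox{specificity of }X_{1}=\frac{n_{0+0}}{n_{++0}},\quad\mbox{specificity of }X_{2}=\frac{n_{+00}}{n_{++0}},$$

$$\mbox{sensitivity of }X_{1}=\frac{n_{1+1}}{n_{++1}},\quad\mbox{sensitivity of }X_{2}=\frac{n_{+11}}{n_{++1}}.\quad\Box$$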

The independence of $X_{1}$ and $X_{2}$ means that, for every $i,j=0,1$ ,

$$n_{ij+}\cdot n_{+++}=n_{i++}\cdot n_{+j+}.$$

The independence of $M_{1}:=P^{X_{1}|X_{3}}$ and $M_{2}:=P^{X_{2}|X_{3}}$ with respect to $P^{X_{3}}$ means that, for every $i,j=0,1$ ,

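$$\sum_{k=0}^{1}P(X_{1}=i|X_{3}=k)\cdot P(X_{2}=j|X_{3}=k)\cdot P(X_{3}=k)=\left(\sum_{k=0}^{1}P(X_{1}=i|X_{3}=k)\cdot P(X_{3}=k)\right)\cdot\left(\sum_{k=0}^{1}P(X_{2}=j|X_{3}=k)\cdot P(X_{3}=k)\right),$$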

that is to say,

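$$\frac{n_{0+0}\,n_{+00}}{n_{++0}}+\frac{n_{0+1}\,n_{+01}}{n_{++1}}=\frac{n_{0++}\,n_{+0+}}{n_{+++}},$$

$$\frac{n_{0+0}\,n_{+10}}{n_{++0}}+\frac{n_{0+1}\,n_{+11}}{n_{++1}}=\frac{n_{0++}\,n_{+1+}}{n_{+++}},$$

$$\frac{n_{1+0}\,n_{+00}}{n_{++0}}+\frac{n_{1+1}\,n_{+01}}{n_{++1}}=\frac{n_{1++}\,n_{+0+}}{n_{+++}},$$

$$\frac{n_{1+0}\,n_{+10}}{n_{++0}}+\frac{n_{1+1}\,n_{+11}}{n_{++1}}=\frac{n_{1++}\,n_{+1+}}{n_{+++}}.$$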

The conditional independence of $X_{1}$ and $X_{2}$ given $X_{3}$ , i.e. $P^{(X_{1},X_{2})|X_{3}}=P^{X_{1}|X_{3}}\times P^{X_{2}|X_{3}}$ , means that, for every $i,j,k=0,1$ ,

$$P(X_{1}=i,X_{2}=j|X_{3}=k)=P(X_{1}=i|X_{3}=k)\cdot P(X_{2}=j|X_{3}=k),$$

or

$$P(X_{1}=i,X_{2}=j,X_{3}=k)\cdot P(X_{3}=k)=P(X_{1}=i,X_{3}=k)\cdot P(X_{2}=j,X_{3}=k),$$

which is the same as

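$$n_{000}\cdot n_{++0}=n_{0+0}\cdot n_{+00},\quad n_{001}\cdot n_{++1}=n_{0+1}\cdot n_{+01},$$

$$n_{010}\cdot n_{++0}=n_{0+0}\cdot n_{+10},\quad n_{011}\cdot n_{++1}=n_{0+1}\cdot n_{+11},$$

$$n_{100}\cdot n_{++0}=n_{1+0}\cdot n_{+00},\quad n_{101}\cdot n_{++1}=n_{1+1}\cdot n_{+01},$$

$$n_{110}\cdot n_{++0}=n_{1+0}\cdot n_{+10},\quad n_{111}\cdot n_{++1}=n_{1+1}\cdot n_{+11}.$$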
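The three count conditions of this example translate directly into predicates on a table of counts. The following sketch assumes $n_{++0}>0$ and $n_{++1}>0$ and uses exact rational arithmetic; the function names and the XOR-type table at the end are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as F
from itertools import product

def table(*counts):
    """Map C(n000, ..., n111) to a dict {(i, j, k): n_ijk}."""
    return dict(zip(product((0, 1), repeat=3), counts))

def n(tab, i, j, k):
    """Count n_ijk, with '+' meaning 'sum over this index'."""
    return sum(c for (a, b, d), c in tab.items()
               if i in ('+', a) and j in ('+', b) and k in ('+', d))

def cond_indep(tab):
    """(i): n_ijk * n_{++k} = n_{i+k} * n_{+jk} for all i, j, k."""
    return all(n(tab, i, j, k) * n(tab, '+', '+', k) == n(tab, i, '+', k) * n(tab, '+', j, k)
               for i, j, k in product((0, 1), repeat=3))

def indep(tab):
    """(ii): n_{ij+} * n_{+++} = n_{i++} * n_{+j+} for all i, j."""
    return all(n(tab, i, j, '+') * n(tab, '+', '+', '+') == n(tab, i, '+', '+') * n(tab, '+', j, '+')
               for i, j in product((0, 1), repeat=2))

def kernels_indep(tab):
    """(iii): sum_k n_{i+k} n_{+jk} / n_{++k} = n_{i++} n_{+j+} / n_{+++} for all i, j."""
    total = n(tab, '+', '+', '+')
    return all(
        sum(F(n(tab, i, '+', k) * n(tab, '+', j, k), n(tab, '+', '+', k)) for k in (0, 1))
        == F(n(tab, i, '+', '+') * n(tab, '+', j, '+'), total)
        for i, j in product((0, 1), repeat=2))

# Hypothetical tables: a uniform one, and one where X3 = 0 exactly when X1 = X2.
uniform = table(1, 1, 1, 1, 1, 1, 1, 1)
xor_like = table(1, 0, 0, 1, 0, 1, 1, 0)
```

In the second table $X_{3}$ is determined by the pair $(X_{1},X_{2})$, so conditional independence fails even though the other two conditions hold.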

The following counterexamples delimit Theorem 1.

Counterexample 1.

For $C(3000,200,1500,300,1500,200,3000,300)$ it is easy to see that $M_{1}=P^{X_{1}|X_{3}}$ and $M_{2}=P^{X_{2}|X_{3}}$ are $P^{X_{3}}$-independent, but $X_{1}$ and $X_{2}$ are not $P$-independent. So, in the absence of (i), (ii) is not implied by (iii). $\Box$

Counterexample 2.

For $C(4200,400,2000,300,2000,200,1000,100)$, $M_{1}=P^{X_{1}|X_{3}}$ and $M_{2}=P^{X_{2}|X_{3}}$ are not $P^{X_{3}}$-independent. Nevertheless, $X_{1}$ and $X_{2}$ are independent. By Theorem 1, $X_{1}$ and $X_{2}$ cannot be conditionally independent given $X_{3}$. So, in the absence of (i), (iii) is not implied by (ii). $\Box$

Counterexample 3.

For $C(1000,1000,0,2000,0,2000,1000,1000)$ , $M_{1}=P^{X_{1}|X_{3}}$ and $M_{2}=P^{X_{2}|X_{3}}$ are $P^{X_{3}}$ -independent, and $X_{1}$ and $X_{2}$ are independent, but $X_{1}$ and $X_{2}$ are not conditionally independent given $X_{3}$ . So (i) is not implied by (ii)+(iii). $\Box$

In the next two examples, condition (i) holds. Hence propositions (ii) and (iii) either both hold or both fail. See also the remark below to see how Theorem 1 improves on two previously known results on the relationship between unconditional and conditional independence.

Example 2.

For $C(1200,3000,1200,3000,2000,3200,2000,3200)$ the three propositions (i), (ii) and (iii) are satisfied. $\Box$

Example 3.

For $C(1200,3000,1200,3200,2000,3000,2000,3200)$ , (i) holds, but not (ii) or (iii). $\Box$
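The claims made for the three counterexamples and the two examples can be checked mechanically. The sketch below works with the probabilities $P(X_{1}=i,X_{2}=j,X_{3}=k)=n_{ijk}/n_{+++}$ and returns the truth values of (i), (ii) and (iii); the function name `status` is an illustrative choice, and the code assumes $n_{++0},n_{++1}>0$.

```python
from fractions import Fraction as F
from itertools import product

def status(*counts):
    """Return ((i), (ii), (iii)) as booleans for C(n000, ..., n111)."""
    cells = list(product((0, 1), repeat=3))
    total = sum(counts)
    p = {c: F(m, total) for c, m in zip(cells, counts)}   # P(X1=i, X2=j, X3=k)
    p3 = {k: sum(p[i, j, k] for i in (0, 1) for j in (0, 1)) for k in (0, 1)}
    # Conditional kernels M1 = P^{X1|X3} and M2 = P^{X2|X3}
    m1 = {(i, k): sum(p[i, j, k] for j in (0, 1)) / p3[k] for i in (0, 1) for k in (0, 1)}
    m2 = {(j, k): sum(p[i, j, k] for i in (0, 1)) / p3[k] for j in (0, 1) for k in (0, 1)}
    p12 = {(i, j): p[i, j, 0] + p[i, j, 1] for i in (0, 1) for j in (0, 1)}
    p1 = {i: p12[i, 0] + p12[i, 1] for i in (0, 1)}
    p2 = {j: p12[0, j] + p12[1, j] for j in (0, 1)}
    ci = all(p[i, j, k] / p3[k] == m1[i, k] * m2[j, k] for i, j, k in cells)
    ind = all(p12[i, j] == p1[i] * p2[j] for i, j in p12)
    ker = all(sum(m1[i, k] * m2[j, k] * p3[k] for k in (0, 1)) == p1[i] * p2[j]
              for i, j in p12)
    return ci, ind, ker

results = {
    "Counterexample 1": status(3000, 200, 1500, 300, 1500, 200, 3000, 300),
    "Counterexample 2": status(4200, 400, 2000, 300, 2000, 200, 1000, 100),
    "Counterexample 3": status(1000, 1000, 0, 2000, 0, 2000, 1000, 1000),
    "Example 2": status(1200, 3000, 1200, 3000, 2000, 3200, 2000, 3200),
    "Example 3": status(1200, 3000, 1200, 3200, 2000, 3000, 2000, 3200),
}
```

Exact fractions matter here: in Counterexample 2 the two sides of (iii) differ only slightly, so floating-point comparisons with a loose tolerance could report a false equality.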

Remark.

Keeping the previous notation, it is known that (i) + $X_{1}\perp\perp X_{3}$ implies (ii); see, for instance, Florens et al. (1990, Theorem 2.2.10) or Lemma 4.3 of Dawid (1979) when the conditioning on $Z$ is absent. Theorem 1 is an improvement of this result, since $X_{1}\perp\perp X_{3}$ implies, but is not implied by, (iii), as we prove in what follows. It is easy to see that the independence of $X_{1}$ and $X_{3}$ implies (iii). Indeed, given bounded real random variables $f_{i}$, $i=1,2$, the independence of $X_{1}$ and $X_{3}$ yields $E(f_{1}\circ X_{1}|X_{3})=E(f_{1}\circ X_{1})$ and hence

$$\int_{\Omega_{3}}E(f_{1}\circ X_{1}|X_{3})E(f_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(f_{1}\circ X_{1})E(f_{2}\circ X_{2}),$$

which is equivalent to (iii). Let us show that the converse is not true: it is proved in Nogales (2013b) that, for a trivariate normal random variable $(X_{1},X_{2},X_{3})$ with null mean and covariance matrix $(\sigma_{ij})$, the $P^{X_{3}}$-conditional distribution $L(x_{1},\cdot)$ of $P^{X_{2}|X_{3}}$ given that $P^{X_{1}|X_{3}}$ has taken the value $x_{1}$ follows a normal distribution with mean $\sigma_{1}^{-1}\sigma_{2}\rho_{23}\rho_{13}x_{1}$ and variance $\sigma_{2}^{2}(1-\rho_{23}^{2}\rho_{13}^{2})$, where $\rho_{ij}$ denotes the correlation coefficient of $X_{i}$ and $X_{j}$. So, the Markov kernels $P^{X_{2}|X_{3}}$ and $P^{X_{1}|X_{3}}$ are $P^{X_{3}}$-independent if, and only if, $L(x_{1},\cdot)$ coincides with $(P^{X_{3}})^{P^{X_{2}|X_{3}}}$ (which coincides with $P^{X_{2}}$), and this happens if $\rho_{23}=0$ or $\rho_{13}=0$. So, for $\rho_{23}=0$ and $\rho_{13}\neq 0$, we have that $P^{X_{2}|X_{3}}$ and $P^{X_{1}|X_{3}}$ are $P^{X_{3}}$-independent, but $X_{1}$ and $X_{3}$ are not independent. $\Box$
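The mean and variance quoted above can be cross-checked with a small computation. The sketch rests on an assumption that is not spelled out in the text: $(P^{X_{3}})^{M_{1}\times M_{2}}$ is the law of a pair $(Y_{1},Y_{2})$ with $Y_{i}=b_{i}X_{3}+e_{i}$, where $b_{i}=\sigma_{i3}/\sigma_{3}^{2}$ and the Gaussian errors $e_{1}$, $e_{2}$ are independent of each other and of $X_{3}$. The function name and the numerical values are hypothetical.

```python
import math

def kernel_conditional_params(s1, s2, r13, r23):
    """Slope and variance of L(x1, .) for standard deviations s1, s2 and correlations r13, r23.

    Under the stated assumption, Var(Y1) = s1**2, Var(Y2) = s2**2 and
    Cov(Y1, Y2) = b1 * b2 * Var(X3) = s1 * s2 * r13 * r23 (Var(X3) cancels out).
    """
    cov = s1 * s2 * r13 * r23
    slope = cov / s1**2                  # regression coefficient of Y2 on Y1
    var = s2**2 - cov**2 / s1**2         # residual variance of Y2 given Y1
    return slope, var

# Hypothetical parameter values.
s1, s2, r13, r23 = 2.0, 3.0, 0.5, 0.4
slope, var = kernel_conditional_params(s1, s2, r13, r23)

# Closed forms quoted in the remark.
slope_formula = s2 / s1 * r23 * r13
var_formula = s2**2 * (1 - r23**2 * r13**2)

# With r23 = 0 the slope vanishes and the variance is s2**2, so L(x1, .) does not depend on x1.
slope0, var0 = kernel_conditional_params(s1, s2, r13, 0.0)
```

The case `r23 = 0.0` with `r13 = 0.5` is the situation used in the remark: the kernels are independent although $X_{1}$ and $X_{3}$ are correlated.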

Remark.

Phillips (1988) shows the next result: “For $i=1,2$, consider random variables $Y_{i}:(\Omega,\mathcal{A},P)\rightarrow(\Omega_{i},\mathcal{A}_{i})$, $Z_{i}:(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2})\rightarrow(\Omega^{\prime}_{i},\mathcal{A}^{\prime}_{i})$, $f_{i}:(\Omega^{\prime}_{i},\mathcal{A}^{\prime}_{i})\rightarrow(\Omega^{\prime\prime}_{i},\mathcal{A}^{\prime\prime}_{i})$. If $f_{1}\circ Z_{1}\circ(Y_{1},Y_{2})\perp\perp f_{2}\circ Z_{2}\circ(Y_{1},Y_{2})|Y_{1}$, then

$$f_{1}\circ Z_{1}\circ(Y_{1},Y_{2})\perp\perp f_{2}\circ Z_{2}\circ(Y_{1},Y_{2})$$

is equivalent to

$$E(E(I_{F_{1}}|Y_{1})I_{F_{2}})=E(I_{F_{1}})\cdot E(I_{F_{2}})\ \mbox{ and }\ E(E(I_{F_{2}}|Y_{1})I_{F_{1}})=E(I_{F_{1}})\cdot E(I_{F_{2}}),$$

for all events $F_{i}\in(f_{i}\circ Z_{i}\circ(Y_{1},Y_{2}))^{-1}(\mathcal{A}^{\prime\prime}_{i})$, $i=1,2$.”

This is a particular case of Theorem 1: it suffices to take $X_{1}=f_{1}\circ Z_{1}\circ(Y_{1},Y_{2})$, $X_{2}=f_{2}\circ Z_{2}\circ(Y_{1},Y_{2})$ and $X_{3}=Y_{1}$. Indeed, according to Theorem 1, if $X_{1}\perp\perp X_{2}|X_{3}$, then $X_{1}\perp\perp X_{2}$ is equivalent to $P^{X_{1}|X_{3}}\perp\perp_{P^{X_{3}}}P^{X_{2}|X_{3}}$, which in turn means that, for every bounded real random variable $g_{i}$ on $(\Omega^{\prime\prime}_{i},\mathcal{A}^{\prime\prime}_{i})$, $i=1,2$,

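$$\int_{\Omega_{3}}E(g_{1}\circ X_{1}|X_{3})\cdot E(g_{2}\circ X_{2}|X_{3})\,dP^{X_{3}}=E(g_{1}\circ X_{1})\cdot E(g_{2}\circ X_{2}).$$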

If $F_{i}=X_{i}^{-1}(A^{\prime\prime}_{i})$ , making $g_{i}:=I_{A^{\prime\prime}_{i}}$ , $i=1,2$ , it follows that

$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E(I_{F_{1}})\cdot E(I_{F_{2}}),$$

and, on the other hand,

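$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E[E(I_{F_{1}}E(I_{F_{2}}|X_{3})|X_{3})]=E[I_{F_{1}}E(I_{F_{2}}|X_{3})]$$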

and

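$$E[E(I_{F_{1}}|X_{3})E(I_{F_{2}}|X_{3})]=E[E(I_{F_{2}}E(I_{F_{1}}|X_{3})|X_{3})]=E[I_{F_{2}}E(I_{F_{1}}|X_{3})].$$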

Hence

[TABLE]

It is readily shown that, from these two equalities, we obtain

[TABLE]

for all bounded real random variables $g_{1},g_{2}$ on $(\Omega^{\prime\prime}_{i},\mathcal{A}^{\prime\prime}_{i})$. $\Box$

4 Extension to Markov kernels

In this section we extend to Markov kernels the concept of conditional independence. Theorem 1 is also extended to this framework.

Definition 5.

(Conditional independence of Markov kernels) Given three Markov kernels $M_{i}:(\Omega,\mathcal{A},P)\mbox{$\succ$$\longrightarrow$}(\Omega_{i},\mathcal{A}_{i})$, $1\leq i\leq 3$, we shall say that $M_{1}$ and $M_{2}$ are conditionally independent given $M_{3}$, and we write $M_{1}\perp\perp_{P}M_{2}|M_{3}$ (or $M_{1}\perp\perp M_{2}|M_{3}$ if there is no ambiguity), when

[TABLE]

Remark.

(A representation in terms of random variables) Keeping the assumptions of the previous definition, let us write $q_{i}$ for the natural $i^{\mbox{th}}$ projection on $\Omega_{1}\times\Omega_{2}\times\Omega_{3}$, $1\leq i\leq 3$. It is readily shown that

[TABLE]

So,

[TABLE]

Moreover, when $\Omega_{2}=\mathbb{R}^{k}$ and $M_{2}$ is integrable, from

[TABLE]

we obtain that

[TABLE]

Remark.

(Characterization in terms of densities) Suppose that, for $i=1,2,3$ , $\mu_{i}$ is a $\sigma$ -finite measure on $\mathcal{A}_{i}$ such that $dM_{i}(\omega,\cdot)=\phi_{i}(\omega,\cdot)d\mu_{i}$ , where $\phi_{i}$ is a nonnegative real $\mathcal{A}\times\mathcal{A}_{i}$ -measurable function on $\Omega\times\Omega_{i}$ . Usually, the dominating measure $\mu_{i}$ is the counting measure in the discrete (respectively, the Lebesgue measure in the continuous) case, both in the univariate and multivariate framework. It is shown in Nogales (2013b) that the map $\omega_{3}\mapsto\int_{\Omega}\phi_{3}(\omega,\omega_{3})dP(\omega)$ is a $\mu_{3}$ -density of $P^{M_{3}}$ and, besides, for $i=1,2$ , the conditional distribution $L_{i}:=P^{M_{i}|M_{3}}$ exists and, for $P^{M_{3}}$ -almost every $\omega_{3}$ , the map

[TABLE]

is a $\mu_{i}$ -density of $L_{i}(\omega_{3},\cdot)$ .

A similar reasoning shows that the map

[TABLE]

is a $\mu_{1}\times\mu_{2}$ -density of $P^{M_{1}\times M_{2}}$ , and the conditional distribution $L:=P^{M_{1}\times M_{2}|M_{3}}$ exists and, for $P^{M_{3}}$ -almost every $\omega_{3}$ , the map

[TABLE]

is a $\mu_{1}\times\mu_{2}$ -density of $L(\omega_{3},\cdot)$ .

Hence, the conditional independence of $M_{1}$ and $M_{2}$ given $M_{3}$ means that, for $P^{M_{3}}$ -almost every $\omega_{3}$ and $\mu_{1}\times\mu_{2}$ -almost every $(\omega_{1},\omega_{2})$ ,

[TABLE]

The next theorem extends Theorem 1 to Markov kernels.

Theorem 2.

Let $M_{i}:(\Omega,\mathcal{A},P)\mbox{$ \succ$$\longrightarrow $}(\Omega_{i},\mathcal{A}_{i})$ , $i=1,2,3,$ be Markov kernels. Consider the propositions:

(i)

$M_{1}\perp\perp M_{2}\mid M_{3}$ .

(ii)

$M_{1}\perp\perp M_{2}$ .

(iii)

$P^{M_{1}|M_{3}}\perp\perp_{P^{M_{3}}}P^{M_{2}|M_{3}}$ .

Then, under (i), the propositions (ii) and (iii) are equivalent.

5 Another extension of the main result

A more general result than Theorem 1 in terms of random variables is presented in this section, where the definition of conditional independence between Markov kernels introduced above is used to obtain a minimal condition which, added to the conditional independence of $X_{1}$ and $X_{2}$ given $X_{3}$, implies the conditional independence of $X_{1}$ and $X_{2}$ given $X_{4}$ when $X_{4}$ is a function of $X_{3}$. In fact, Theorem 1 appears as the particular case in which $X_{4}$ is a constant function.

Theorem 3.

Let $X_{i}:(\Omega,\mathcal{A},P)\rightarrow(\Omega_{i},\mathcal{A}_{i})$, $i=1,2,3,4$, be random variables. Suppose that $X_{4}=f\circ X_{3}$, where $f:(\Omega_{3},\mathcal{A}_{3})\rightarrow(\Omega_{4},\mathcal{A}_{4})$. Consider the propositions:

(i)

$X_{1}\perp\perp X_{2}\mid X_{3}$ .

(ii)

$X_{1}\perp\perp X_{2}\mid X_{4}$ .

(iii)

$P^{X_{1}|X_{3}}\perp\perp_{P^{X_{3}}}P^{X_{2}|X_{3}}\mid P^{X_{4}|X_{3}}$ .

Then, if (i) holds, the propositions (ii) and (iii) are equivalent.

Remark.

To obtain a characterization of the statement (iii), note first that, for $i=1,2$ ,

[TABLE]

where $\sigma(X_{3})$ denotes the $\sigma$ -field $X_{3}^{-1}(\mathcal{A}_{3})$ induced by $X_{3}$ . Indeed, we have that, by definition, $(P^{X_{3}})^{P^{X_{i}|X_{3}}|P^{X_{4}|X_{3}}}$ ( $:=Q^{M_{i}|M_{4}}$ ) is a Markov kernel $M_{i4}:(\Omega_{4},\mathcal{A}_{4})\mbox{$ \succ$$\longrightarrow $}(\Omega_{i},\mathcal{A}_{i})$ such that

[TABLE]

for every $A_{i}\in\mathcal{A}_{i}$ and $A_{4}\in\mathcal{A}_{4}$. But,

[TABLE]

Since $Q^{M_{4}}=P^{X_{4}}$ , it readily follows that

[TABLE]

Analogously, by definition, $(P^{X_{3}})^{P^{X_{1}|X_{3}}\times P^{X_{2}|X_{3}}|P^{X_{4}|X_{3}}}$ ($:=Q^{M_{1}\times M_{2}|M_{4}}$) is a Markov kernel $M_{(12)4}:(\Omega_{4},\mathcal{A}_{4})\mbox{$\succ$$\longrightarrow$}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2})$ such that

[TABLE]

for every $A_{i}\in\mathcal{A}_{i}$ , $i=1,2,4$ . But

[TABLE]

So, the statement (iii) $P^{X_{1}|X_{3}}\perp\perp_{P^{X_{3}}}P^{X_{2}|X_{3}}\mid P^{X_{4}|X_{3}}$ can be expressed in the form

[TABLE]

for every bounded real random variable $f_{i}$ on $(\Omega_{i},\mathcal{A}_{i})$ , $i=1,2$ . $\Box$

In the previous result the $\sigma$-field $\sigma(X_{4})$ is contained in $\sigma(X_{3})$. The reader is referred to van Putten and van Schuppen (1985), where invariance properties of conditional independence under enlargement or reduction of the $\sigma$-fields involved are systematically investigated.

6 Proofs

Proof of Theorem 1. Let us write $Q=P^{X_{3}}$ and $M_{i}=P^{X_{i}|X_{3}}$ , $i=1,2$ . In the following, we suppose $X_{1}\perp\perp X_{2}|X_{3}$ .

We show first that if $X_{1}\perp\perp X_{2}$ , then

[TABLE]

Note that

[TABLE]

i.e.,

[TABLE]

By definition,

[TABLE]

Hence, by conditional independence,

[TABLE]

which coincides with

[TABLE]

since $X_{1}$ and $X_{2}$ are independent.

Now suppose that the Markov kernels $P^{X_{1}|X_{3}}$ and $P^{X_{2}|X_{3}}$ are $P^{X_{3}}$-independent (in addition to $X_{1}\perp\perp X_{2}\mid X_{3}$). Then, given $A_{i}\in\mathcal{A}_{i}$, $i=1,2$, we have that

[TABLE]

which shows that $X_{1}\perp\perp X_{2}$ . $\Box$
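The equivalence just proved can be illustrated numerically on finite spaces. The following is a minimal sketch, not part of the original paper: it assumes that $Q$-independence of two kernels means $Q^{M_{1}\times M_{2}}=Q^{M_{1}}\times Q^{M_{2}}$ (our reading of the definition given earlier in the paper), and the function names, the arrays `g`, `h`, `u`, `v` and the numerical values are all hypothetical choices made for illustration.

```python
import numpy as np

def joint_from_kernels(q, m1, m2):
    """joint[a, b, c] = M1(c, {a}) * M2(c, {b}) * Q({c}).

    Building the joint law this way enforces X1 _||_ X2 | X3 by construction.
    """
    return np.einsum("ac,bc,c->abc", m1, m2, q)

def kernels_q_independent(q, m1, m2, atol=1e-12):
    """Check Q^{M1 x M2} = Q^{M1} x Q^{M2} (assumed definition of Q-independence)."""
    mixed = np.einsum("ac,bc,c->ab", m1, m2, q)   # Q^{M1 x M2}
    product = np.outer(m1 @ q, m2 @ q)            # Q^{M1} x Q^{M2}
    return bool(np.allclose(mixed, product, rtol=0, atol=atol))

def variables_independent(joint, atol=1e-12):
    """Check P^{(X1, X2)} = P^{X1} x P^{X2} directly from the joint law."""
    p12 = joint.sum(axis=2)
    product = np.outer(p12.sum(axis=1), p12.sum(axis=0))
    return bool(np.allclose(p12, product, rtol=0, atol=atol))

# Hypothetical data: X3 is uniform on {0, 1, 2, 3}, encoding a pair (u, v)
# of independent fair bits; g and h are column-stochastic 2 x 2 matrices.
g = np.array([[0.9, 0.2], [0.1, 0.8]])
h = np.array([[0.3, 0.6], [0.7, 0.4]])
u = np.array([0, 0, 1, 1])
v = np.array([0, 1, 0, 1])
q = np.full(4, 0.25)

cases = {
    "X1 driven by u, X2 driven by v": (g[:, u], h[:, v]),
    "X1 and X2 both driven by v": (g[:, v], h[:, v]),
}
results = {}
for name, (m1, m2) in cases.items():
    joint = joint_from_kernels(q, m1, m2)
    results[name] = (kernels_q_independent(q, m1, m2),
                     variables_independent(joint))
    print(name, results[name])
```

In both cases the two flags agree, as Theorem 1 predicts: the first case gives `(True, True)` and the second `(False, False)`. The first case also shows that the kernels can be $Q$-independent even though neither of them is constant in $x_{3}$.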

Proof of Theorem 2. Let $q_{i}:\Omega_{1}\times\Omega_{2}\times\Omega_{3}\rightarrow\Omega_{i}$ be the natural $i$-th projection, $1\leq i\leq 3$. Writing $Q=P^{M_{1}\times M_{2}\times M_{3}}$, we have that

[TABLE]

It follows that

[TABLE]

and the result becomes a consequence of this and Theorem 1. $\Box$

Proof of Theorem 3. Consider the Markov kernels $M_{i}=P^{X_{i}|X_{3}}:(\Omega_{3},\mathcal{A}_{3})\mbox{$ \succ$$\longrightarrow $}(\Omega_{i},\mathcal{A}_{i})$, $i=1,2,4$. Write $Q=P^{X_{3}}$. Note that proposition (iii), i.e. $M_{1}\perp\perp_{Q}M_{2}\mid M_{4}$, means that

[TABLE]

We assume that (i) holds, that is, $P^{(X_{1},X_{2})|X_{3}}=P^{X_{1}|X_{3}}\times P^{X_{2}|X_{3}}$. Under this assumption, it suffices to prove that

[TABLE]

Let us show the first equality, the second being similar.

By definition, $Q^{M_{1}\times M_{2}|M_{4}}$ is a Markov kernel $M:(\Omega_{4},\mathcal{A}_{4})\mbox{$ \succ$$\longrightarrow $}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2})$ such that, for every $C\in\mathcal{A}_{1}\times\mathcal{A}_{2}$ and $A_{4}\in\mathcal{A}_{4},$

[TABLE]

Note that, as can be easily verified, $Q^{M_{4}}=P^{X_{4}}$. Note also that, since $X_{4}=f\circ X_{3}$,

[TABLE]

So,

[TABLE]

It follows that

[TABLE]

Moreover, using (i),

[TABLE]

which shows that $Q^{M_{1}\times M_{2}|M_{4}}=P^{(X_{1},X_{2})|X_{4}}$.

An analogous argument (which does not need (i)) shows that $Q^{M_{i}|M_{4}}=P^{X_{i}|X_{4}}$, $i=1,2$, and this finishes the proof. $\Box$
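The equivalence of (ii) and (iii) under (i) can also be checked on finite spaces. The sketch below is an illustration under stated assumptions, not the authors' construction: it takes $M_{4}(x_{3},\cdot)$ to be the point mass at $f(x_{3})$, so that $Q^{M|M_{4}}(x_{4},\cdot)$ is the $Q$-weighted average of $M(x_{3},\cdot)$ over $\{x_{3}:f(x_{3})=x_{4}\}$, and all function names and numerical values are hypothetical.

```python
import numpy as np

def conditional_on_f(q, kernel, f, n4):
    """Q^{M|M4} when M4(x3, .) is the point mass at f(x3) (assumed form).

    kernel[..., c] is M(c, .); the result[..., d] averages kernel[..., c]
    over {c : f[c] == d} with weights q[c]. Requires Q^{M4}({d}) > 0.
    """
    out = np.zeros(kernel.shape[:-1] + (n4,))
    q4 = np.zeros(n4)
    for c, d in enumerate(f):
        out[..., d] += kernel[..., c] * q[c]
        q4[d] += q[c]
    return out / q4

def proposition_iii(q, m1, m2, f, n4, atol=1e-12):
    """Kernel statement: Q^{M1 x M2 | M4} = Q^{M1|M4} x Q^{M2|M4}."""
    m12 = np.einsum("ac,bc->abc", m1, m2)        # product kernel M1 x M2
    lhs = conditional_on_f(q, m12, f, n4)
    rhs = np.einsum("ad,bd->abd",
                    conditional_on_f(q, m1, f, n4),
                    conditional_on_f(q, m2, f, n4))
    return bool(np.allclose(lhs, rhs, rtol=0, atol=atol))

def proposition_ii(joint, f, n4, atol=1e-12):
    """Random-variable statement X1 _||_ X2 | X4, computed from the joint law."""
    joint4 = np.zeros(joint.shape[:2] + (n4,))
    for c, d in enumerate(f):
        joint4[:, :, d] += joint[:, :, c]        # law of (X1, X2, X4)
    p4 = joint4.sum(axis=(0, 1))
    p12 = joint4 / p4
    p1 = joint4.sum(axis=1) / p4
    p2 = joint4.sum(axis=0) / p4
    return bool(np.allclose(p12, np.einsum("ad,bd->abd", p1, p2),
                            rtol=0, atol=atol))

# Hypothetical data: X3 uniform on {0, 1, 2, 3} encodes independent fair
# bits (u, v), and X4 = f(X3) = u.
g = np.array([[0.9, 0.2], [0.1, 0.8]])
h = np.array([[0.3, 0.6], [0.7, 0.4]])
u = np.array([0, 0, 1, 1])
v = np.array([0, 1, 0, 1])
q = np.full(4, 0.25)
f, n4 = [0, 0, 1, 1], 2

checks = {}
for name, (m1, m2) in {"u_and_v": (g[:, u], h[:, v]),
                       "v_and_v": (g[:, v], h[:, v])}.items():
    joint = np.einsum("ac,bc,c->abc", m1, m2, q)  # enforces (i)
    checks[name] = (proposition_ii(joint, f, n4),
                    proposition_iii(q, m1, m2, f, n4))
    print(name, checks[name])
```

In the first case both propositions hold; in the second both fail, because $X_{1}$ and $X_{2}$ remain linked through the bit $v$, which $X_{4}$ does not reveal. Computing (ii) from the joint law and (iii) from the kernels separately mirrors the two sides identified in the proof.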

7 Acknowledgments

This work was supported by the Junta de Extremadura (Autonomous Government of Extremadura, Spain) under the project GR15013.

References:

Dawid, A.P. (1979) Conditional Independence in Statistical Theory, Journal of the Royal Statistical Society B 41, 1-31.
Dawid, A.P. (1980) Conditional Independence for Statistical Operations, Annals of Statistics 8, 598-617.
Florens, J.P., Mouchart, M., and Rolin, J.M. (1990) Elements of Bayesian Statistics, Marcel Dekker, New York.
Heyer, H. (1982) Theory of Statistical Experiments, Springer, Berlin.
Nogales, A.G. (2013a) On Independence of Markov Kernels and a Generalization of Two Theorems of Basu, Journal of Statistical Planning and Inference 143, 603-610.
Nogales, A.G. (2013b) Existence of Regular Conditional Probabilities for Markov Kernels, Statistics and Probability Letters 83, 891-897.
Phillips, P.C.B. (1988) Conditional and Unconditional Statistical Independence, Journal of Econometrics 38, 341-348.
van Putten, C., and van Schuppen, J.H. (1985) Invariance Properties of the Conditional Independence Relation, Annals of Probability 13, 934-945.
