A Note on the Relationship Between Conditional and Unconditional Independence, and its Extensions for Markov Kernels
A.G. Nogales and P. Pérez

Dpto. de Matemáticas, Universidad de Extremadura
Avda. de Elvas, s/n, 06006–Badajoz, SPAIN.
Abstract. Two known results on the relationship between conditional and unconditional independence are obtained as a consequence of the main result of this paper, a theorem that uses independence of Markov kernels to obtain a minimal condition which added to conditional independence implies independence. Some examples, counterexamples and representation results are provided to clarify the concepts introduced and the propositions of the statement of the main theorem. Moreover, conditional independence and the mentioned results are extended to the framework of Markov kernels.
- AMS Subject Class. (2010): Primary 60Exx Secondary 60J35
- Key words and phrases: conditional independence, Markov kernel.
1 Introduction and basic definitions
Conditional independence is a classical and familiar basic tool of both probability theory (think of Markov chain theory, for example) and mathematical statistics. See, for instance, Dawid (1979) and Florens et al. (1990), where extensive use of conditional independence is made in order to unify many seemingly unrelated concepts of statistical inference, from both the Bayesian and the frequentist points of view. The introductory sections of Dawid (1979), Phillips (1988) and van Putten and van Schuppen (1985) list some of the main fields of application of the conditional independence relation: e.g. econometric distribution theory, asymptotic studies of regression with non-ergodic processes, the definition of a stochastic dynamic system and the stochastic realization problem, or, in a statistical framework, the areas of sufficiency, ancillarity, identification or invariance, among others. van Putten and van Schuppen (1985) also includes, among other interesting results, a systematic study of invariance properties of conditional independence under enlargement or reduction of the $\sigma$-fields involved, which bears some connection to the main problem raised in this paper.
It is well known that conditional independence neither implies nor is implied by independence. We shall write $X_1\perp\!\!\!\perp X_2$ and $X_1\perp\!\!\!\perp X_2\mid X_3$ for the independence of the random variables $X_1$ and $X_2$ and their conditional independence given a third random variable $X_3$, respectively.
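The first sentence above can be checked mechanically on finite examples. The following is a minimal Python sketch; the two toy distributions (a fair-coin XOR and a duplicated coin) and all function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def marginal(joint, keep):
    """Marginalise a pmf {(x1, x2, x3): prob} onto the coordinates listed in `keep`."""
    out = {}
    for point, p in joint.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

def independent(joint):
    """X1 and X2 independent: P(x1, x2) = P(x1) P(x2) for all values."""
    p12, p1, p2 = (marginal(joint, k) for k in ((0, 1), (0,), (1,)))
    return all(p12.get(a + b, 0) == p1[a] * p2[b] for a, b in product(p1, p2))

def cond_independent(joint):
    """X1 and X2 conditionally independent given X3:
    P(x1, x2, x3) P(x3) = P(x1, x3) P(x2, x3) for all values."""
    p13, p23 = marginal(joint, (0, 2)), marginal(joint, (1, 2))
    p1, p2, p3 = (marginal(joint, (i,)) for i in range(3))
    return all(joint.get(a + b + c, 0) * p3[c] == p13.get(a + c, 0) * p23.get(b + c, 0)
               for a, b, c in product(p1, p2, p3))

quarter, half = Fraction(1, 4), Fraction(1, 2)
# Hypothetical example 1: X1, X2 independent fair coins, X3 = X1 xor X2.
xor_law = {(a, b, a ^ b): quarter for a in (0, 1) for b in (0, 1)}
# Hypothetical example 2: X1 = X2 = X3, a single fair coin read three times.
copy_law = {(0, 0, 0): half, (1, 1, 1): half}

print(independent(xor_law), cond_independent(xor_law))    # True False
print(independent(copy_law), cond_independent(copy_law))  # False True
```

Exact rational arithmetic is used so that the equalities are tested without floating-point tolerance.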
Section 2 contains the main result of this paper, Theorem 1, that uses independence of Markov kernels, a concept introduced by Nogales (2013a), to obtain a minimal condition which added to conditional independence implies independence.
In this way the result improves on two known results on the relationship between conditional and unconditional independence: one that constitutes the main goal of Phillips (1988), and another that is obtained as an immediate consequence of Theorem 2.2.10 of Florens et al. (1990) (or Lemma 4.3 of Dawid (1979)), as remarked in Section 3. In that section some examples and counterexamples are also given to delimit the relations between the three propositions of Theorem 1.
In this paper (Section 4) we also address the problem of constructing a rigorous general theory of conditional independence in terms of Markov kernels; notice that Markov kernels are extensions of the concepts of both random variable and $\sigma$-field, and Theorem 1 is extended here to this new framework. Dawid (1980) constructs a theory of conditional independence for “statistical operations”, presented as a slight generalization of Markovian operators, which are themselves a generalization of Markov kernels. Although this article also belongs to the field of specialized mathematics, we hope the reader will find the development of conditional independence in the less abstract framework of Markov kernels (or transition probabilities) useful.
A more general result than Theorem 1 in terms of random variables is finally presented in Section 5. The definition of conditional independence between Markov kernels introduced in Section 4 is used to obtain a minimal condition which, added to the conditional independence of $X_1$ and $X_2$ given $X_3$, implies the conditional independence of $X_1$ and $X_2$ given $X_4$, provided $X_4$ is a function of $X_3$.
The paper is completed with some more accessible reformulations of several of the propositions considered. With the same purpose, some representation results for the introduced definitions for Markov kernels in terms of random variables are also provided.
For ease of reading, the proofs appear in a final section.
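In what follows $(\Omega,\mathcal{A})$, $(\Omega_1,\mathcal{A}_1)$, and so on, will denote measurable spaces. A random variable is a map $X:(\Omega,\mathcal{A})\longrightarrow(\Omega_1,\mathcal{A}_1)$ such that $X^{-1}(A_1)\in\mathcal{A}$, for all $A_1\in\mathcal{A}_1$. Its probability distribution (or, simply, distribution) with respect to a probability measure $P$ on $\mathcal{A}$ is the image measure $P^X$ of $P$ by $X$, i.e., the probability measure on $\mathcal{A}_1$ defined by $P^X(A_1):=P(X^{-1}(A_1))$. Let us write $\times$ instead of $\otimes$ for the product of $\sigma$-fields or measures. The next definition is well known and can be found, for instance, in Heyer (1982).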
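Definition 1.
(i) (Markov kernel) A Markov kernel $M_1:(\Omega,\mathcal{A})\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ is a map $M_1:\Omega\times\mathcal{A}_1\to[0,1]$ such that: a) $\forall\omega\in\Omega$, $M_1(\omega,\cdot)$ is a probability measure on $\mathcal{A}_1$; b) $\forall A_1\in\mathcal{A}_1$, $M_1(\cdot,A_1)$ is an $\mathcal{A}$-measurable map.
(ii) (Diagonal product of Markov kernels) The diagonal product
$$M_1\times M_2:(\Omega,\mathcal{A})\succ\!\!\longrightarrow(\Omega_1\times\Omega_2,\mathcal{A}_1\times\mathcal{A}_2)$$
of two Markov kernels $M_1:(\Omega,\mathcal{A})\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ and $M_2:(\Omega,\mathcal{A})\succ\!\!\longrightarrow(\Omega_2,\mathcal{A}_2)$ is defined as the only Markov kernel such that
$$(M_1\times M_2)(\omega,A_1\times A_2)=M_1(\omega,A_1)\cdot M_2(\omega,A_2),\quad \omega\in\Omega,\ A_i\in\mathcal{A}_i,\ i=1,2.$$
(iii) (Image of a Markov kernel) The image (let us also call it probability distribution) of a Markov kernel $M_1:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ on a probability space $(\Omega,\mathcal{A},P)$ is the probability measure $P^{M_1}$ on $\mathcal{A}_1$ defined by $P^{M_1}(A_1):=\int_\Omega M_1(\omega,A_1)\,dP(\omega)$.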
Definition 2.
(Independence of Markov kernels, Nogales (2013a)) Let $(\Omega,\mathcal{A},P)$ be a probability space. Two Markov kernels $M_1:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ and $M_2:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_2,\mathcal{A}_2)$ are said to be independent if $P^{M_1\times M_2}=P^{M_1}\times P^{M_2}$. We write $M_1\perp\!\!\!\perp M_2$ (or $M_1\perp\!\!\!\perp_P M_2$).
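On finite spaces, the diagonal product, the image of a kernel and the independence of two kernels reduce to sums over points. The sketch below is a minimal illustration under that finiteness assumption; the dictionary encoding of kernels and the names `diagonal_product`, `image` and `independent_kernels` are hypothetical choices, not notation from the paper.

```python
from fractions import Fraction as F

# A finite Markov kernel from Omega to Omega_1 is stored as
# {omega: {omega_1: probability}}; each inner dict is a probability measure.

def diagonal_product(M1, M2):
    """(M1 x M2)(omega, {(a, b)}) = M1(omega, {a}) * M2(omega, {b})."""
    return {w: {(a, b): pa * pb for a, pa in M1[w].items() for b, pb in M2[w].items()}
            for w in M1}

def image(P, M):
    """P^M({a}) = sum over omega of M(omega, {a}) * P({omega})."""
    out = {}
    for w, pw in P.items():
        for a, pa in M[w].items():
            out[a] = out.get(a, 0) + pw * pa
    return out

def independent_kernels(P, M1, M2):
    """Independence of kernels: P^{M1 x M2} equals the product of P^{M1} and P^{M2}."""
    joint, m1, m2 = image(P, diagonal_product(M1, M2)), image(P, M1), image(P, M2)
    return all(joint.get((a, b), 0) == m1[a] * m2[b] for a in m1 for b in m2)

half = F(1, 2)
P = {0: half, 1: half}                      # uniform law on Omega = {0, 1}
identity = {0: {0: F(1)}, 1: {1: F(1)}}     # kernel of the identity random variable
coin = {0: {0: half, 1: half}, 1: {0: half, 1: half}}  # constant kernel

print(independent_kernels(P, identity, coin))      # True
print(independent_kernels(P, identity, identity))  # False
```

A deterministic kernel such as `identity` corresponds to a random variable, so the second check is just the statement that a non-degenerate random variable is not independent of itself.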
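Given two random variables $X_i:(\Omega,\mathcal{A},P)\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2$, the conditional distribution of $X_2$ given $X_1$, when it exists, is a Markov kernel $M_1:(\Omega_1,\mathcal{A}_1)\succ\!\!\longrightarrow(\Omega_2,\mathcal{A}_2)$ such that $P(X_1\in A_1,X_2\in A_2)=\int_{A_1}M_1(\omega_1,A_2)\,dP^{X_1}(\omega_1)$, for all $A_1\in\mathcal{A}_1$ and $A_2\in\mathcal{A}_2$. We write $P^{X_2|X_1=\omega_1}(A_2):=M_1(\omega_1,A_2)$. Reciprocally, every Markov kernel is also a conditional distribution, as noted in Nogales (2013b). That paper also introduces the next definition.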
Definition 3.
(Conditional distribution of a Markov kernel given another) Let $M_1:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ and $M_2:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_2,\mathcal{A}_2)$ be two Markov kernels over the same probability space. The conditional distribution of $M_1$ given $M_2$ is defined as a Markov kernel $L:(\Omega_2,\mathcal{A}_2)\succ\!\!\longrightarrow(\Omega_1,\mathcal{A}_1)$ such that, for every pair of events $A_1\in\mathcal{A}_1$ and $A_2\in\mathcal{A}_2$,
$$\int_\Omega M_1(\omega,A_1)\,M_2(\omega,A_2)\,dP(\omega)=\int_{A_2}L(\omega_2,A_1)\,dP^{M_2}(\omega_2).$$
Remark.
An interesting problem in this context is the existence of such conditional distributions, which is guaranteed under well-known regularity conditions on the measurable spaces involved, e.g. when the measurable space on which the conditional distribution takes its values is a standard Borel space. This is the same for both random variables and Markov kernels (see Nogales (2013b)). In the rest of the paper we will assume this when necessary.
2 Conditional Independence
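Let us recall the definition of conditional independence for random variables; we refer to Dawid (1979), for instance, where some basic properties are also given.
Definition 4.
Let $X_i:(\Omega,\mathcal{A},P)\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2,3$, be arbitrary random variables. $X_1$ and $X_2$ are said to be conditionally independent given $X_3$, and we write $X_1\perp\!\!\!\perp X_2\mid X_3$ (or $X_1\perp\!\!\!\perp_P X_2\mid X_3$ to be more precise), if
$$P^{(X_1,X_2)|X_3}=P^{X_1|X_3}\times P^{X_2|X_3}.$$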
We are now ready for the main result of the paper.
Theorem 1.
If $X_1$ and $X_2$ are conditionally independent given $X_3$, then $X_1$ and $X_2$ are independent if, and only if, the Markov kernels $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are $P^{X_3}$-independent.
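The theorem can be illustrated on a finite example by building a joint law in which conditional independence given $X_3$ holds by construction and then comparing the two remaining conditions. This is a sketch under that finiteness assumption; the kernels `K1`, `K2_orth` and the helper names are hypothetical, chosen so that neither kernel is constant in $x_3$.

```python
from fractions import Fraction as F

def bern(q):
    """Distribution on {0, 1} with P(1) = q."""
    return {1: q, 0: 1 - q}

def joint_from_kernels(p3, K1, K2):
    """Joint pmf of (X1, X2, X3) built so that X1 and X2 are
    conditionally independent given X3."""
    return {(a, b, c): p3[c] * K1[c][a] * K2[c][b]
            for c in p3 for a in K1[c] for b in K2[c]}

def x1_x2_independent(joint):
    """P(X1=a, X2=b) = P(X1=a) P(X2=b) for all a, b, read off the joint pmf."""
    p12, p1, p2 = {}, {}, {}
    for (a, b, _), p in joint.items():
        p12[(a, b)] = p12.get((a, b), 0) + p
        p1[a] = p1.get(a, 0) + p
        p2[b] = p2.get(b, 0) + p
    return all(p12.get((a, b), 0) == p1[a] * p2[b] for a in p1 for b in p2)

def kernels_independent(p3, K1, K2):
    """The image of the diagonal product K1 x K2 under P^{X3} equals
    the product of the images of K1 and K2."""
    m1, m2, m12 = {}, {}, {}
    for c, pc in p3.items():
        for a, pa in K1[c].items():
            m1[a] = m1.get(a, 0) + pc * pa
            for b, pb in K2[c].items():
                m12[(a, b)] = m12.get((a, b), 0) + pc * pa * pb
        for b, pb in K2[c].items():
            m2[b] = m2.get(b, 0) + pc * pb
    return all(m12[(a, b)] == m1[a] * m2[b] for a in m1 for b in m2)

p3 = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
K1 = {0: bern(F(1, 4)), 1: bern(F(1, 2)), 2: bern(F(3, 4))}
K2_orth = {0: bern(F(1, 4)), 1: bern(F(3, 4)), 2: bern(F(1, 4))}

for K2 in (K2_orth, K1):
    joint = joint_from_kernels(p3, K1, K2)
    print(x1_x2_independent(joint), kernels_independent(p3, K1, K2))
# prints "True True" and then "False False"
```

In the first case both kernels genuinely depend on $x_3$, so neither $X_1$ nor $X_2$ is independent of $X_3$, yet the kernels are $P^{X_3}$-independent and $X_1$, $X_2$ are independent.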
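Remark.
(Some reformulations of the three propositions involved in the previous theorem) By definition, $X_1\perp\!\!\!\perp X_2\mid X_3$ means that, for every $A_1\in\mathcal{A}_1$, $A_2\in\mathcal{A}_2$,
$$P(X_1\in A_1,X_2\in A_2\mid X_3)=P(X_1\in A_1\mid X_3)\cdot P(X_2\in A_2\mid X_3).$$
This is equivalent to
$$E(f_1(X_1)f_2(X_2)\mid X_3)=E(f_1(X_1)\mid X_3)\cdot E(f_2(X_2)\mid X_3)$$
for every bounded real random variables $f_i:(\Omega_i,\mathcal{A}_i)\longrightarrow\mathbb{R}$, $i=1,2$.
In particular, $X_1\perp\!\!\!\perp X_2$ is equivalent to
$$E(f_1(X_1)f_2(X_2))=E(f_1(X_1))\cdot E(f_2(X_2)).$$
Finally, $P^{X_1|X_3}\perp\!\!\!\perp_{P^{X_3}}P^{X_2|X_3}$ means that, for every $A_1\in\mathcal{A}_1$, $A_2\in\mathcal{A}_2$,
$$\int_{\Omega_3}P^{X_1|X_3=x_3}(A_1)\cdot P^{X_2|X_3=x_3}(A_2)\,dP^{X_3}(x_3)=P^{X_1}(A_1)\cdot P^{X_2}(A_2),$$
which is equivalent to
$$E\big[E(f_1(X_1)\mid X_3)\cdot E(f_2(X_2)\mid X_3)\big]=E(f_1(X_1))\cdot E(f_2(X_2))$$
for every pair of functions as above. As $E[E(f_i(X_i)\mid X_3)]=E(f_i(X_i))$, this is equivalent to the uncorrelatedness of the conditional expectations given $X_3$ of every pair of real bounded measurable functions of $X_1$ and $X_2$. Seen in this way, the independence of these two conditional distributions has a degree of difficulty comparable to other conditions that appear in the literature cited in the bibliography; for instance, see Proposition 2.4.g of van Putten and van Schuppen (1985), or others appearing in the results following it.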
3 Counterexamples
Let $X_i:(\Omega,\mathcal{A},P)\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2,3$, be random variables. Consider the propositions:
- (i) $X_1\perp\!\!\!\perp X_2\mid X_3$.
- (ii) $X_1\perp\!\!\!\perp X_2$.
- (iii) $P^{X_1|X_3}\perp\!\!\!\perp_{P^{X_3}}P^{X_2|X_3}$.
We have shown that (i)+(ii) $\Rightarrow$ (iii) and (i)+(iii) $\Rightarrow$ (ii), i.e., in the presence of (i), statements (ii) and (iii) are equivalent. In particular, (iii) is just what we need to reach independence from conditional independence.
We can ask ourselves whether every two of these propositions imply the third. In particular, we wonder whether (i) and (ii) are equivalent when (iii) is satisfied. All the answers are negative, as the next counterexamples show. We also include two examples in which the theorem applies.
First, let us describe a common framework for them.
Let be a population with individuals and consider a partition of . We write for the number of individuals of . One or more of the indices can be replaced by a sign to denote the union of the corresponding sets of the partition: for instance, . In particular, . Similar notations should be used for the numbers (e.g. ). Such a situation will be referred to as
[TABLE]
We introduce three dichotomous random variables as follows:
[TABLE]
Example 1.
A scheme like this could be obtained when we want to study the relationship between two diagnostic procedures, represented by the dichotomous variables $X_1$ and $X_2$ ($=1$ or $0$ when the diagnostic test is positive or negative, respectively), for a disease represented by the dichotomous variable $X_3$, which takes the values 1 or 0 depending on whether the disease is actually present or absent. In this case, we have the following equivalence for some known related concepts:
[TABLE]
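The independence of $X_1$ and $X_2$ means that, for every $i,j\in\{0,1\}$,
[TABLE]
The independence of $P^{X_1|X_3}$ and $P^{X_2|X_3}$ with respect to $P^{X_3}$ means that, for every $i,j\in\{0,1\}$,
[TABLE]
that is to say,
[TABLE]
The conditional independence of $X_1$ and $X_2$ given $X_3$, i.e. $X_1\perp\!\!\!\perp X_2\mid X_3$, means that, for every $i,j,k\in\{0,1\}$,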
[TABLE]
or
[TABLE]
which is the same as
[TABLE]
The following counterexamples delimit Theorem 1.
Counterexample 1.
For it is easy to see that $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are $P^{X_3}$-independent, but $X_1$ and $X_2$ are not $P$-independent. So, in absence of (i), (ii) is not implied by (iii).
Counterexample 2.
For , $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are not $P^{X_3}$-independent. Nevertheless $X_1$ and $X_2$ are independent. Obviously, $X_1$ and $X_2$ are not conditionally independent given $X_3$. So, in absence of (i), (iii) is not implied by (ii).
Counterexample 3.
For , $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are $P^{X_3}$-independent, and $X_1$ and $X_2$ are independent, but $X_1$ and $X_2$ are not conditionally independent given $X_3$. So (i) is not implied by (ii)+(iii).
In the next two examples, condition (i) holds. Hence propositions (ii) and (iii) either both hold or both fail. See also the remarks below for how Theorem 1 improves on two previously known results on the relationship between unconditional and conditional independence.
Example 2.
For the three propositions (i), (ii) and (iii) are satisfied.
Example 3.
For , (i) holds, but not (ii) or (iii).
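The five patterns described in the counterexamples and examples above can be checked mechanically on a $2\times2\times2$ table of counts. The sketch below uses hypothetical counts chosen for illustration; they are not the authors' tables. It also assumes the convention that `n[(i, j, k)]` counts the individuals with $X_1=i$, $X_2=j$, $X_3=k$, with `None` playing the role of the $+$ sign.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)

def total(n, i=None, j=None, k=None):
    """Count with some indices fixed; None plays the role of the '+' sign."""
    return sum(c for (a, b, d), c in n.items()
               if i in (None, a) and j in (None, b) and k in (None, d))

def prop_i(n):
    """(i): n_ijk * n_++k == n_i+k * n_+jk for all i, j, k."""
    return all(total(n, i, j, k) * total(n, k=k) == total(n, i=i, k=k) * total(n, j=j, k=k)
               for i, j, k in product(BITS, repeat=3))

def prop_ii(n):
    """(ii): N * n_ij+ == n_i++ * n_+j+ for all i, j."""
    return all(total(n) * total(n, i, j) == total(n, i=i) * total(n, j=j)
               for i, j in product(BITS, repeat=2))

def prop_iii(n):
    """(iii): N * sum_k (n_i+k * n_+jk / n_++k) == n_i++ * n_+j+ for all i, j."""
    N = total(n)
    def mixed(i, j):
        return sum(F(total(n, i=i, k=k) * total(n, j=j, k=k), total(n, k=k))
                   for k in BITS if total(n, k=k) > 0)
    return all(mixed(i, j) * N == total(n, i=i) * total(n, j=j)
               for i, j in product(BITS, repeat=2))

def table(*cells):
    """Hypothetical counts: one individual in each listed cell (i, j, k), none elsewhere."""
    return {c: int(c in cells) for c in product(BITS, repeat=3)}

examples = {
    "(iii) without (ii)": table((0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)),  # X1 = X2
    "(ii) without (iii)": table((0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)),  # X3 = X1 and X2
    "(ii)+(iii) without (i)": table((0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)),  # X3 = X1 xor X2
    "all three": table(*product(BITS, repeat=3)),
    "(i) only": table((0, 0, 0), (1, 1, 1)),  # X1 = X2 = X3
}
for name, n in examples.items():
    print(name, prop_i(n), prop_ii(n), prop_iii(n))
```

In each table where `prop_i` is true, `prop_ii` and `prop_iii` agree, in line with Theorem 1; where `prop_i` is false, they are free to differ.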
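Remark.
Keeping the previous notations, it is known that (i) + $X_1\perp\!\!\!\perp X_3$ implies (ii); see, for instance, Florens et al. (1990, Theorem 2.2.10) or Lemma 4.3 of Dawid (1979) when the conditioning on is absent. Theorem 1 is an improvement of this result as $X_1\perp\!\!\!\perp X_3$ implies, and is not implied by, (iii), as we prove in what follows. It is easy to see that the independence of $X_1$ and $X_3$ implies (iii). Indeed, given bounded real random variables $f_i:(\Omega_i,\mathcal{A}_i)\longrightarrow\mathbb{R}$, $i=1,2$, the independence of $X_1$ and $X_3$ yields $E(f_1(X_1)\mid X_3)=E(f_1(X_1))$ and hence
$$E\big[E(f_1(X_1)\mid X_3)\cdot E(f_2(X_2)\mid X_3)\big]=E(f_1(X_1))\cdot E\big[E(f_2(X_2)\mid X_3)\big]=E(f_1(X_1))\cdot E(f_2(X_2)),$$
which is equivalent to (iii). Let us show that the converse is not true: it is proved in Nogales (2013b) that, for a trivariate normal random variable $(X_1,X_2,X_3)$ with null mean and covariance matrix $\Sigma=(\sigma_{ij})$, the $P$-conditional distribution of $X_i$ given that $X_3$ has taken the value $x_3$ follows a normal distribution with mean $\rho_{i3}\,(\sigma_i/\sigma_3)\,x_3$ and variance $\sigma_i^2(1-\rho_{i3}^2)$, where $\sigma_i^2=\sigma_{ii}$ and $\rho_{i3}$ denotes the correlation coefficient of $X_i$ and $X_3$. So, the Markov kernels $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are $P^{X_3}$-independent if, and only if, $(P^{X_3})^{P^{X_1|X_3}\times P^{X_2|X_3}}$ coincides with $P^{X_1}\times P^{X_2}$ (which coincides with $N(0,\sigma_1^2)\times N(0,\sigma_2^2)$), and this happens if $\rho_{13}=0$ or $\rho_{23}=0$. So, for $\rho_{13}\neq0$ and $\rho_{23}=0$, we have that $P^{X_1|X_3}$ and $P^{X_2|X_3}$ are $P^{X_3}$-independent, but $X_1$ and $X_3$ are not independent.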
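The Gaussian criterion can be probed numerically. The sketch below assumes unit variances, so that the kernel at $x_3$ is $N(\rho_{i3}x_3,\,1-\rho_{i3}^2)$, and estimates by Monte Carlo the covariance of the image of the diagonal product of the two kernels under $P^{X_3}$; in this setting that covariance equals $\rho_{13}\rho_{23}$, which vanishes exactly when one of the two correlations is zero. The function name, sample size and seed are arbitrary illustrative choices.

```python
import random

def kernel_product_cov(rho13, rho23, n=200_000, seed=0):
    """Monte Carlo estimate of Cov(Y1, Y2), where x3 ~ N(0, 1) and, given x3,
    Y1 and Y2 are drawn independently from N(rho_i3 * x3, 1 - rho_i3**2).
    Unit variances are an assumption of this sketch."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0.0
    for _ in range(n):
        x3 = rng.gauss(0, 1)
        y1 = rng.gauss(rho13 * x3, (1 - rho13 ** 2) ** 0.5)
        y2 = rng.gauss(rho23 * x3, (1 - rho23 ** 2) ** 0.5)
        s1 += y1
        s2 += y2
        s12 += y1 * y2
    return s12 / n - (s1 / n) * (s2 / n)

# rho23 = 0: the image is close to a product measure even though rho13 != 0.
print(round(kernel_product_cov(0.8, 0.0), 2))
# Both correlations non-zero: the covariance is close to 0.8 * 0.5 = 0.4.
print(round(kernel_product_cov(0.8, 0.5), 2))
```

Since the image is bivariate normal, zero covariance is equivalent to it being the product of its marginals, which is why a covariance check suffices here.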
Remark.
Phillips (1988) shows the next result: “For , consider random variables , , If , then
[TABLE]
is equivalent to
[TABLE]
whatever be the events , ”
This is a particular case of Theorem 1, obtained simply by taking , and . Indeed, according to Theorem 1, if , then is equivalent to , which in turn means that, for every bounded real random variable on ,
[TABLE]
If , making , , it follows that
[TABLE]
and, on the other hand,
[TABLE]
and
[TABLE]
Hence
[TABLE]
It is readily shown that, from these two equalities, we obtain
[TABLE]
for every bounded real random variables on .
4 Extension to Markov kernels
In this section we extend to Markov kernels the concept of conditional independence. Theorem 1 is also extended to this framework.
Definition 5.
(Conditional independence of Markov kernels) Given three Markov kernels $M_i:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2,3$, we shall say that $M_1$ and $M_2$ are conditionally independent given $M_3$, and we write $M_1\perp\!\!\!\perp_P M_2\mid M_3$ (or $M_1\perp\!\!\!\perp M_2\mid M_3$ if there is no ambiguity), when
$$P^{M_1\times M_2|M_3}=P^{M_1|M_3}\times P^{M_2|M_3}.$$
Remark.
(A representation in terms of random variables) Keeping the suppositions of the previous definition, let us write for the natural projection on , It is readily shown that
[TABLE]
So,
[TABLE]
Moreover, when and is integrable, from
[TABLE]
we obtain that
[TABLE]
Remark.
(Characterization in terms of densities) Suppose that, for , is a -finite measure on such that , where is a nonnegative real -measurable function on . Usually, the dominating measure is the counting measure in the discrete (respectively, the Lebesgue measure in the continuous) case, both in the univariate and multivariate framework. It is shown in Nogales (2013b) that the map is a -density of and, besides, for , the conditional distribution exists and, for -almost every , the map
[TABLE]
is a -density of .
A similar reasoning shows that the map
[TABLE]
is a -density of , and the conditional distribution exists and, for -almost every , the map
[TABLE]
is a -density of .
Hence, the conditional independence of and given means that, for -almost every and -almost every ,
[TABLE]
The next theorem extends Theorem 1 to Markov kernels.
Theorem 2.
Let $M_i:(\Omega,\mathcal{A},P)\succ\!\!\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2,3$, be Markov kernels. Consider the propositions:
- (i) $M_1\perp\!\!\!\perp M_2\mid M_3$.
- (ii) $M_1\perp\!\!\!\perp M_2$.
- (iii) $P^{M_1|M_3}\perp\!\!\!\perp_{P^{M_3}}P^{M_2|M_3}$.
Then, under (i), the propositions (ii) and (iii) are equivalent.
5 Another extension of the main result
A more general result than Theorem 1 in terms of random variables is presented in this section, where the definition of conditional independence between Markov kernels introduced in Section 4 is used to obtain a minimal condition which, added to the conditional independence of $X_1$ and $X_2$ given $X_3$, implies the conditional independence of $X_1$ and $X_2$ given $X_4$ when $X_4$ is a function of $X_3$. In fact, Theorem 1 appears as the particular case in which $X_4$ is a constant function.
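Theorem 3.
Let $X_i:(\Omega,\mathcal{A},P)\longrightarrow(\Omega_i,\mathcal{A}_i)$, $i=1,2,3$, be random variables. Suppose that $X_4=f\circ X_3$, where $f:(\Omega_3,\mathcal{A}_3)\longrightarrow(\Omega_4,\mathcal{A}_4)$. Consider the propositions:
- (i) $X_1\perp\!\!\!\perp X_2\mid X_3$.
- (ii) $X_1\perp\!\!\!\perp X_2\mid X_4$.
- (iii) $P^{X_1|X_3}\perp\!\!\!\perp_{P^{X_3}}P^{X_2|X_3}\mid f$.
Then, if (i) holds, the propositions (ii) and (iii) are equivalent.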
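A finite illustration of this statement: with $X_4=f(X_3)$, conditional independence of the kernels given $f$ amounts to ordinary independence of the kernels on each fibre $f^{-1}(x_4)$, with $P^{X_3}$ renormalised on that fibre. The sketch below assumes finite spaces; the kernels, the map `f` and the helper names are hypothetical, and the joint law is built so that (i) holds by construction.

```python
from fractions import Fraction as F

def bern(q):
    """Distribution on {0, 1} with P(1) = q."""
    return {1: q, 0: 1 - q}

def is_product(p12):
    """True if a pmf on pairs equals the product of its two marginals."""
    p1, p2 = {}, {}
    for (a, b), p in p12.items():
        p1[a] = p1.get(a, 0) + p
        p2[b] = p2.get(b, 0) + p
    return all(p12.get((a, b), 0) == p1[a] * p2[b] for a in p1 for b in p2)

def prop_ii(joint, f):
    """(ii): X1 and X2 conditionally independent given X4 = f(X3), from the joint pmf."""
    ok = True
    for x4 in {f(c) for (_, _, c) in joint}:
        mass = sum(p for (_, _, c), p in joint.items() if f(c) == x4)
        cond = {}
        for (a, b, c), p in joint.items():
            if f(c) == x4:
                cond[(a, b)] = cond.get((a, b), 0) + p / mass
        ok = ok and is_product(cond)
    return ok

def prop_iii(p3, K1, K2, f):
    """(iii): K1 and K2 conditionally independent given f with respect to P^{X3}."""
    ok = True
    for x4 in {f(c) for c in p3}:
        fibre = {c: p for c, p in p3.items() if f(c) == x4}
        mass = sum(fibre.values())
        m12 = {}
        for c, pc in fibre.items():
            for a, pa in K1[c].items():
                for b, pb in K2[c].items():
                    m12[(a, b)] = m12.get((a, b), 0) + pc * pa * pb / mass
        ok = ok and is_product(m12)
    return ok

def f(c):
    return c // 3  # X4 takes the value 0 on {0, 1, 2} and 1 on {3, 4, 5}

p3 = {c: F(1, 6) for c in range(6)}
K1 = {0: bern(F(1, 4)), 1: bern(F(1, 2)), 2: bern(F(3, 4)),
      3: bern(F(1, 10)), 4: bern(F(1, 10)), 5: bern(F(1, 10))}
K2 = {0: bern(F(1, 4)), 1: bern(F(3, 4)), 2: bern(F(1, 4)),
      3: bern(F(1, 5)), 4: bern(F(1, 2)), 5: bern(F(4, 5))}

def joint_law(K1, K2):
    """Joint pmf of (X1, X2, X3) satisfying (i) by construction."""
    return {(a, b, c): p3[c] * K1[c][a] * K2[c][b]
            for c in p3 for a in K1[c] for b in K2[c]}

print(prop_ii(joint_law(K1, K2), f), prop_iii(p3, K1, K2, f))  # True True
print(prop_ii(joint_law(K1, K1), f), prop_iii(p3, K1, K1, f))  # False False
```

In the first case $X_1$ and $X_2$ are conditionally independent given $X_4$ without being unconditionally independent, so the example is not covered by Theorem 1 alone.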
Remark.
To obtain a characterization of the statement (iii), note first that, for ,
[TABLE]
where denotes the -field induced by . Indeed, we have that, by definition, () is a Markov kernel M_{i4}:(\Omega_{4},\mathcal{A}_{4})\mbox{\succ$$\longrightarrow}(\Omega_{i},\mathcal{A}_{i}) such that
[TABLE]
for every and But,
[TABLE]
Since , it readily follows that
[TABLE]
Analogously, by definition, () is a Markov kernel M_{(12)4}:(\Omega_{4},\mathcal{A}_{4})\mbox{\succ$$\longrightarrow}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{1}) such that
[TABLE]
for every , . But
[TABLE]
So, the statement (iii) can be expressed in the form
[TABLE]
for every bounded real random variable on , .
In the previous result the $\sigma$-field $X_4^{-1}(\mathcal{A}_4)$ is contained in $X_3^{-1}(\mathcal{A}_3)$. The reader is referred to van Putten and van Schuppen (1985), where invariance properties of conditional independence under enlargement or reduction are systematically investigated.
6 Proofs
Proof of Theorem 1. Let us write and , . In the following, we suppose .
- We show first that if , then
[TABLE]
Note that
[TABLE]
i.e.,
[TABLE]
By definition,
[TABLE]
Hence, by conditional independence,
[TABLE]
which coincides with
[TABLE]
since and are independent.
- Now suppose that the Markov kernels and are -independent (in addition that ). Then, given , , we have that
[TABLE]
which shows that .
Proof of Theorem 2. Let the natural projection, . Writing , we have that
[TABLE]
It follows that
[TABLE]
and the result becomes a consequence of this and Theorem 1.
Proof of Theorem 3. Consider the Markov kernels M_{i}=P^{X_{i}|X_{3}}:(\Omega_{3},\mathcal{A}_{3})\mbox{\succ$$\longrightarrow}(\Omega_{i},\mathcal{A}_{i}), Write . Note that proposition (iii), i.e. , means that
[TABLE]
We assume that (i) holds, that is, . Under such assumption, it will be enough to prove that
[TABLE]
Let us show the first equality, the second being similar.
By definition, is a Markov kernel M:(\Omega_{4},\mathcal{A}_{4})\mbox{\succ$$\longrightarrow}(\Omega_{1}\times\Omega_{2},\mathcal{A}_{1}\times\mathcal{A}_{2}) such that, for every and
[TABLE]
Note that, as it can be easily verified, . Note also that, being ,
[TABLE]
So,
[TABLE]
It follows that
[TABLE]
Moreover, using (i),
[TABLE]
which shows that,
An analogous reasoning ((i) is not needed in this case) shows that and this finishes the proof.
7 Acknowledgments
This work was supported by the Junta de Extremadura (Autonomous Government of Extremadura, Spain) under the project GR15013.
References:
- Dawid, A.P. (1979) Conditional Independence in Statistical Theory, Journal of the Royal Statistical Society B 41, 1-31.
- Dawid, A.P. (1980) Conditional Independence for Statistical Operations, Annals of Statistics 8, 598-617.
- Florens, J.P., Mouchart, M., and Rolin, J.M. (1990) Elements of Bayesian Statistics, Marcel Dekker, New York.
- Heyer, H. (1982) Theory of Statistical Experiments, Springer, Berlin.
- Nogales, A.G. (2013a) On Independence of Markov Kernels and a Generalization of Two Theorems of Basu, Journal of Statistical Planning and Inference 143, 603-610.
- Nogales, A.G. (2013b) Existence of Regular Conditional Probabilities for Markov Kernels, Statistics and Probability Letters 83, 891-897.
- Phillips, P.C.B. (1988) Conditional and Unconditional Statistical Independence, Journal of Econometrics 38, 341-348.
- van Putten, C., and van Schuppen, J.H. (1985) Invariance Properties of the Conditional Independence Relation, Annals of Probability 13, 934-945.
