Bayesian definition of random sequences with respect to conditional probabilities
Hayato Takahashi

TL;DR
This paper explores the concept of ML-randomness in Bayesian models, analyzing conditional randomness variants, their well-definedness, and the relationship between posterior convergence and ML-random parameters.
Contribution
It introduces a new perspective on conditional randomness in Bayesian models and provides an algorithmic solution to posterior convergence issues related to ML-random parameters.
Findings
Variants of conditional blind randomness are ill-defined from the Bayesian point of view.
A consistent estimator exists when the sets of random sequences of uniformly computable parametric models are pairwise disjoint.
The posterior distributions converge weakly to almost all parameters if and only if they converge weakly to all ML-random parameters.
Bayesian definition of random sequences with respect to conditional probabilities
Footnote 1: Parts of the paper were presented at the Ergodic Theory Seminar (2016, Tsukuba Univ.), Probability Seminar (2017, Kyoto Univ.), ISIT2017 Aachen, CCR2017 Mysore, SITA2017 Niigata, MSJ2017 Tokyo Metropolitan Univ., MSJ2017 Yamagata Univ., MSJ2020 (online presentation) Nihon Univ., and MSJ2023 Chuo Univ.
Hayato Takahashi
Abstract
We study Martin-Löf random (ML-random) points on computable probability measures on sample and parameter spaces (Bayes models). We consider variants of conditional randomness defined by ML-randomness on Bayes models and those of conditional blind randomness. We show that variants of conditional blind randomness are ill-defined from the Bayes statistical point of view. We prove that if the sets of random sequences of uniformly computable parametric models are pairwise disjoint then there is a consistent estimator for the model. Finally, we present an algorithmic solution to a classical problem in Bayes statistics, i.e. the posterior distributions converge weakly to almost all parameters if and only if the posterior distributions converge weakly to all ML-random parameters.
keywords:
Martin-Löf randomness, generalized van Lambalgen’s theorem, conditional probability, collective, Bayes consistency theorem, uniform randomness
MSC[2020] 03D32, 68Q30
Journal: Information and Computation
URL: http://h-takahashi.sakura.ne.jp
Affiliation: Random Data Lab. Inc., 3-8-18 Minami-Hanahata, Adachi-ku, Tokyo 1210062, Japan
Highlights
Algorithmic randomness for conditional probabilities is studied.
Blind randomness is ill-defined for conditional probabilities.
Effective orthogonality and existence of consistent estimator are equivalent.
An algorithmic solution to a classical problem in Bayes statistics.
1 Introduction
We study Martin-Löf random [17, 18, 19, 23, 22, 28] (ML-random) points on computable probability measures on sample and parameter spaces (computable Bayes models). We assume that samples and parameters are infinite binary sequences except for Section 4.2. A conditional distribution [16, 36] is defined for a Bayes model. A conditional distribution given a finite prefix of sample sequence is called a posterior distribution. A marginal distribution on parameter space is called a prior.
In Bayes statistics, we study relations between samples and parameters. Loosely speaking, we say that a probability model is a true model of a sequence, or that a sequence is generated by the model, if the sequence is random with respect to (w.r.t.) the model. In Bayes statistics, we assume that a sample sequence is generated by the marginal distribution on the sample space. Then we estimate the parameter by the posterior distributions given finite prefixes of such that is random w.r.t. the conditional distribution given . The Bayes consistency theorem [11, 8, 14] says that the posterior distributions given finite prefixes of weakly converge to a parameter for almost all w.r.t. the conditional distribution given for almost all w.r.t. the prior for an appropriate class of Bayes models (consistent Bayes models). The parameter of the true model is not determined a priori but is estimated by the posterior distributions (cf. von Mises [25]). If we consider only a fixed parameter, we do not know whether the posterior distributions given finite prefixes of a random sequence weakly converge to some parameter.
To state the Bayes consistency theorem for individual random sequences and parameters, we need to consider the sets of random sequences and parameters w.r.t. marginal distributions and the family of sets of random sequences w.r.t. conditional distributions given random parameters simultaneously. For computable Bayes models, we assume that (i) a sequence is random w.r.t. the marginal distribution on sample space if and only if there is a random parameter such that the sequence is random w.r.t. the conditional distribution given the parameter. For computable and consistent Bayes models, we further assume that (ii) if a sequence is random w.r.t. the marginal distribution on sample space then the posterior distributions given finite prefixes of the sequence weakly converge to the true model. Assumption (i) is equivalent to the statement (i’) that the set of random sequences w.r.t. the marginal distribution on sample space equals the union of the sets of random sequences w.r.t. the conditional distributions given random parameters. The family of the sets of random sequences w.r.t. the conditional distributions given random parameters is determined by those distributions (and hence by the Bayes model). Assumption (i’) requires that the union of those sets is determined only by the marginal distribution on sample space. Assumptions (i), (i’), and (ii) are natural requirements for any notion of randomness in Bayes models; see Section 4.3.
Theorem 2.1 [30, 32, 31] shows that for a computable Bayes model there exists a version of the conditional distribution (the standard conditional distribution) that is defined for ML-random parameters w.r.t. the prior. We study four families of the sets of random sequences w.r.t. the standard conditional distribution: two of them are defined by ML-randomness on Bayes models (variants of conditional Bayes randomness) and the others are defined by the blind (Hippocratic) tests [6, 15] for conditional distributions with given parameters (variants of conditional blind randomness). Theorem 2.4 parts 1 and 3 show that variants of conditional Bayes randomness satisfy the assumptions (i’) and (ii) for all computable Bayes models. On the other hand, there are different computable and consistent Bayes models that have the same marginal distribution on sample space but whose unions of conditional blind random sequences for all random parameters differ. Theorem 2.4 part 4 shows that there is no family of sets of random sequences w.r.t. the marginal distribution on sample space that satisfies the assumption (i’) for variants of conditional blind randomness for the class of computable and consistent Bayes models. We therefore regard variants of conditional blind randomness as ill-defined as conditional randomness for Bayes models.
The rest of the paper is structured as follows. In Section 2, we state our main theorem. In Section 3, we study ML-random points in Bayes models. In Section 3.1, we show the relationship between variants of random sequences w.r.t. the standard conditional distributions. In this paper, we assume the computability of joint probability measures; however, we do not demand uniform computability of standard conditional distributions. In Section 4.1, we compare ML-randomness in Bayes models with randomness for uniformly computable parametric models [21, 6, 13, 15, 35]. Theorem 4.6 shows that standard conditional distributions are equal to uniformly computable parametric models on ML-random parameters when the Bayes models are constructed from uniformly computable parametric models and computable priors. We show that if the sets of uniform random sequences are pairwise disjoint for different parameters (effectively orthogonal [6]), then there is a consistent estimator, i.e. a measurable function that equals with probability one for the probability model of the parameter for all . In Section 4.2, we discuss ML-randomness of Bayes models on complete separable metric spaces and present an algorithmic solution to a classical problem in Bayes statistics [9], i.e. for the Bayes models on complete separable metric spaces, the posterior distribution is consistent at almost all parameters if and only if the posterior distribution is consistent at all ML-random parameters. Finally, in Section 4.3, we briefly revisit randomness in statistical models. We compare our notion of randomness in Bayes models with that in non-Bayes models (parametric models without prior), which may help to understand our assumptions (i), (i’), and (ii).
2 Main theorem
Let be the set of infinite binary sequences, and the set of finite binary strings. We also call an element of a point. Let be the set of infinite binary sequences that start with , and the length of . To clarify the difference between finite strings and infinite sequences, we use symbols such as for finite strings and for infinite sequences. Except for the clearly stated cases, do not confuse symbols such as with the repetition of a string. Let be the empty word. We write if is a prefix of for including the case and if and . Let be the complement of . We write for . For , is defined similarly.
Let open sets in be those generated by , i.e. every open set in is a union of elements in . Let be a measurable space where is the smallest $\sigma$-algebra that includes . Similarly, let open sets in be those generated by and a measurable space where is the smallest $\sigma$-algebra that includes . Let for all where and . Then is a metric on and open sets induced by are equivalent to those generated by . Similarly for , let . Then is a metric on and open sets induced by are equivalent to those generated by . In the following, we consider the metrics on and on .
Except for Section 4.2, set . Let be a probability measure on , marginal distribution on , and marginal distribution on . For all , let , , and . For all , let if and (posterior distribution) if .
Let . is called a test (effective null set) or ML-test w.r.t. on if is recursively enumerable (r.e.), , and , where , for all . The ML-random sequences w.r.t. are defined as the complement of the effective null sets w.r.t. . We denote it by , i.e. . Let , where is a test with oracle , i.e. is r.e. with oracle , , and for all . Similarly, and are defined w.r.t. on . and are forms of blind randomness, i.e. we neither assume that is computable nor computable with oracle . In the following, for simplicity, we say that and are computable if on and are computable, respectively.
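As a concrete illustration of the definition of a test, the following minimal sketch takes the uniform measure on infinite binary sequences (an assumption made here only for illustration; the definition above allows any measure) and a hypothetical test whose n-th level is the single cylinder generated by n+1 zeros. The names `cylinder_measure`, `level`, `is_prefix_free`, and `level_measure` are introduced for this sketch and do not appear in the paper. The sketch checks the measure bound that each level must satisfy; the levels are nested and their intersection contains only the all-zero sequence, which is therefore not ML-random w.r.t. the uniform measure.

```python
from fractions import Fraction

def cylinder_measure(s: str) -> Fraction:
    """Uniform measure of the set of infinite binary sequences extending s."""
    return Fraction(1, 2 ** len(s))

def level(n: int) -> list[str]:
    """Hypothetical n-th level of a test: one cylinder, generated by n+1 zeros."""
    return ["0" * (n + 1)]

def is_prefix_free(strings: list[str]) -> bool:
    """True if the strings are distinct and none is a prefix of another."""
    if len(set(strings)) != len(strings):
        return False
    return not any(a != b and b.startswith(a) for a in strings for b in strings)

def level_measure(strings: list[str]) -> Fraction:
    """Measure of a union of cylinders generated by a prefix-free set of strings."""
    assert is_prefix_free(strings)
    return sum((cylinder_measure(s) for s in strings), Fraction(0))

for n in range(1, 6):
    m = level_measure(level(n))
    # Each level has measure 2^-(n+1), strictly below the bound 2^-n.
    print(n, m, m < Fraction(1, 2 ** n))
```

The strings are required to be prefix-free so that the cylinders are disjoint and their measures can simply be added.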
We obtain the following theorem from the martingale convergence theorem for individual ML-random sequences [30, 31].
Theorem 2.1 (Takahashi [30, 32, 31])
Assume that is computable. For all and , set
[TABLE]
if the right-hand side exists. Then for each , is a probability measure on .
Definition 2.2
The family of probability measures is called the standard conditional distribution.
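To illustrate Theorem 2.1 on a toy model, the following sketch assumes that the limit in the theorem is taken over conditional probabilities given longer and longer finite prefixes of the parameter. The model is an assumption made only for this illustration and is not taken from the paper: the parameter sequence is read as the binary expansion of a real number theta in [0, 1], the prior is uniform, and the sample is i.i.d. Bernoulli(theta). The helper names `interval` and `cond_prob` are hypothetical. Under these assumptions the conditional probability of a finite sample string given a prefix of the parameter is an exact rational number, and one can watch it stabilize as the prefix grows.

```python
from fractions import Fraction
from math import comb

def interval(y: str) -> tuple[Fraction, Fraction]:
    """Interval [a, b] of reals in [0, 1] whose binary expansion starts with y."""
    a = Fraction(int(y, 2), 2 ** len(y)) if y else Fraction(0)
    return a, a + Fraction(1, 2 ** len(y))

def cond_prob(x: str, y: str) -> Fraction:
    """P(x | y): average over theta in interval(y) of theta^k (1 - theta)^m,
    where k and m count the ones and zeros in x (uniform prior assumed)."""
    k, m = x.count("1"), x.count("0")
    a, b = interval(y)
    # Expand (1 - theta)^m by the binomial theorem and integrate term by term.
    integral = sum(
        Fraction((-1) ** j * comb(m, j), k + j + 1)
        * (b ** (k + j + 1) - a ** (k + j + 1))
        for j in range(m + 1)
    )
    return integral / (b - a)

theta_bits = "01" * 10  # prefix of the binary expansion 0.010101... of 1/3
for n in (0, 2, 6, 12, 20):
    # Values approach theta^2 * (1 - theta) at theta = 1/3 as n grows.
    print(n, float(cond_prob("110", theta_bits[:n])))
```

Exact rational arithmetic is used so that the convergence seen in the output is not an artifact of floating-point rounding.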
For each , conditional probability is a Borel-measurable random variable on that satisfies
[TABLE]
see Theorem 33.1 on p. 434 in [7]. Conditional probability is not unique. Two versions of conditional probability are called equivalent if they coincide for almost all . By the Radon-Nikodým theorem, the standard conditional distribution is a version of conditional probability [36]. A version of conditional probability is called regular if the conditional probability given the parameter is a probability measure for almost all parameters. By Theorem 2.1, the standard conditional distribution is a probability measure for each ML-random parameter and hence regular, but it may not be computable with oracle access to the ML-random parameter; see Remark 3.3. To state the theorems in the Bayes models for individual ML-random points, e.g. the consistency theorem for the posterior distributions (Theorem 3.4), we fix the standard conditional distribution as a version of conditional probability. A computable Bayes model defines a standard conditional distribution and prior, and vice versa.
We define consistent Bayes models.
Definition 2.3
Let be a probability measure on . Let be the probability measure on such that . The posterior distribution weakly converges to as if and only if for any neighborhood of , . The posterior distribution is called consistent at if it weakly converges to as for almost all w.r.t. . is called consistent if the posterior distribution is consistent at almost all parameters.
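The notion of a posterior distribution concentrating on the true parameter can be illustrated numerically. The sketch below is a simplification and not the setting of Definition 2.3: it assumes a prior supported on finitely many Bernoulli parameters (the values in `thetas` and `prior` are hypothetical), so that weak convergence to a point mass reduces to the posterior weight of the true parameter tending to one. The function name `posterior` is introduced for this sketch only.

```python
import math
import random

def posterior(sample: list[int], thetas: list[float], prior: list[float]) -> list[float]:
    """Posterior over finitely many Bernoulli parameters, computed in log space."""
    k = sum(sample)            # number of ones
    m = len(sample) - k        # number of zeros
    logs = [math.log(p) + k * math.log(t) + m * math.log(1 - t)
            for t, p in zip(thetas, prior)]
    top = max(logs)            # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

thetas = [0.2, 0.5, 0.8]        # hypothetical parameter set
prior = [1 / 3, 1 / 3, 1 / 3]   # hypothetical prior
rng = random.Random(0)          # fixed seed so that the run is reproducible
true_theta = 0.5
sample = [1 if rng.random() < true_theta else 0 for _ in range(2000)]

for n in (10, 100, 1000, 2000):
    print(n, [round(p, 4) for p in posterior(sample[:n], thetas, prior)])
```

Working in log space keeps the computation stable for long samples, where the raw likelihoods would underflow.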
For a set , let . Consider the following sets for ,
(i) , the intersection of the sets of ML-random sequences w.r.t. for all finite prefixes of ,
(ii) , the section of ML-random points at ,
(iii) , ML-random sequences w.r.t. , and
(iv) , ML-random sequences w.r.t. with oracle .
Figure 1 shows relations between these sets.
Theorem 2.4 (Main theorem)
1. Assume that is computable. Then
[TABLE]
2. Assume that is computable and is computable with oracle for each fixed . Then
[TABLE]
3. Assume that is computable and consistent. Then
[TABLE]
4. There is no such that
[TABLE]
Similarly, there is no such that
[TABLE]
5. Assume that is computable. Then
the sets (i)–(iv) are Borel sets in and the probabilities of these sets w.r.t. are one for all , and
and are Borel sets in and the probabilities of these sets w.r.t. are one.
The proof of Theorem 2.4 is given after the proof of Theorem 3.10.
Roughly speaking, the sets (i) and (ii) are defined from Bayes models; those (iii) and (iv) are defined from conditional probabilities with given parameters. For simplicity, we call the families of sets and variants of conditional Bayes randomness. We call the families of sets and variants of conditional blind randomness. By Theorem 2.4 parts 1 and 3, both of the variants of conditional Bayes randomness satisfy our assumptions (i’) and (ii). On the other hand, by Theorem 2.4 part 4, neither variant of conditional blind randomness satisfies the assumption (i’). We consider that variants of conditional blind randomness are ill-defined as conditional randomness for Bayes models.
Takahashi [30, 32, 31, 33] defined the family of sets to be the sets of conditional random sequences w.r.t. the standard conditional distribution .
Remark 2.5
The author does not know if the sets and are always measurable for all computable .
3 ML-randomness and Bayes models
The van Lambalgen theorem [20] states that a pair of sequences is ML-random w.r.t. the product of uniform measures if and only if is ML-random and is ML-random with oracle .
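In symbols, the classical theorem can be written as follows. The notation is assumed here for illustration and may differ from the paper's: $\mu$ denotes the uniform measure, $\mathcal{R}^{\mu}$ the set of ML-random sequences w.r.t. $\mu$, and $\mathcal{R}^{\mu,\,y^\infty}$ the set of ML-random sequences w.r.t. $\mu$ with oracle $y^\infty$.

```latex
\[
(x^\infty, y^\infty) \in \mathcal{R}^{\mu \times \mu}
\iff
y^\infty \in \mathcal{R}^{\mu}
\ \text{and}\
x^\infty \in \mathcal{R}^{\mu,\, y^\infty}.
\]
```

Since the product of uniform measures is symmetric in its two coordinates, the same equivalence holds with the roles of $x^\infty$ and $y^\infty$ exchanged.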
Theorem 3.1 (Generalized van Lambalgen theorem, Takahashi [30, 32, 31, 33])
Assume that is computable. Then
[TABLE]
Fix and assume that is computable with oracle . Then
[TABLE]
Theorem 3.2 (Takahashi [30, 32, 31])
Assume that is computable. Then
[TABLE]
[TABLE]
Remark 3.3
1. Vovk and V’yugin [35] proved (2) and (3) for Bayes models that are constructed from uniformly computable parametric models and computable priors, see Section 4.1.
2. Theorem 5.2 in [31] demonstrates equation (1) when is computable with oracle . However, the same proof for (1) holds true when is not computable with oracle . In [33], (2) is proved without assuming uniform computability of the conditional distributions.
3. Non-computable conditional distributions are presented in work by Ackerman et al. [1]. Bauwens [4] showed an example that violates the equality in (2) and is a proper subset of in (1) when the conditional distribution is not computable with oracle . Takahashi [34] showed an example that the conditional distributions are not computable with oracle for all , but (2) holds true. For more details on the generalized van Lambalgen theorem, see the survey Bauwens et al. [5]. ∎
Two probability measures and on are called orthogonal and denoted by if there is such that and . The following theorem shows equivalent statements for the consistency of Bayes models including statements described with ML-randomness.
Theorem 3.4
Assume that is computable. The following statements are equivalent:
(i) for all .
(ii) for all .
(iii) weakly converges to as for all .
(iv) .
(v) is consistent.
(vi) There is a measurable such that for almost all w.r.t. .
(vii) for all .
(viii) There is a measurable onto such that for all .
Before we prove the theorem, we show a lemma.
Lemma 3.5
Assume that is consistent. Then there is a measurable such that for almost all w.r.t. .
Proof) Let
[TABLE]
where is the sequence consisting of 0s.
Then is well-defined, i.e. for all . We show that is measurable. Let
[TABLE]
We have that , if , and . By the definition of weak convergence of the posterior distributions, we have
[TABLE]
Since is open for all , we have that for all and is measurable. Since is consistent, we have for almost all . ∎
Proof of Theorem 3.4) For the proof of (i) $\Leftrightarrow$ (ii) $\Leftrightarrow$ (iii), see the proof of Theorem 6.1 in [33]. The implication (iii) $\Rightarrow$ (iv) is immediate. The implication (iv) $\Rightarrow$ (v) follows from the fact that . The implication (v) $\Rightarrow$ (vi) is due to Lemma 3.5. We show (vi) $\Rightarrow$ (i). By (vi), we have and for all . By the Fubini theorem and (vi), we have and for all . We have the equivalence (i)–(vi).
The implication (iii) $\Rightarrow$ (vii) is immediate. We show (vii) $\Rightarrow$ (viii). Let for all . By (vii) and Theorem 3.2, is well-defined, an onto function, and for all . By Theorem 3.2, and similarly, for any , and we see that in (viii) is measurable. The proof of (viii) $\Rightarrow$ (i) is similar to that of (vi) $\Rightarrow$ (i). ∎
Remark 3.6
Takahashi [30, 32, 33] showed the equivalence of the statements in Theorem 3.4 except for statements (iv) and (v). The equivalence of (i) and (vi) is due to Corollary 2 in Breiman et al. [8]. The equivalence of (i) and (ii) is due to Martin-Löf (p. 103, second paragraph, in [24]); see also Theorem 4.1 in [33] and Theorem 8.6 in [6]. Doob [11] showed that if there is a measurable function such that for all then is consistent; see also [14, 27].
For further study on ML-random points on Bayes models, see Dębowski [10], Takahashi [31, 33] and Vovk and V’yugin [35]. When the prior is discrete, Theorem 3.4 reduces to the consistency of the MDL model selection [3, 2] for individual random sequences [33]. Li and Vitányi [22] study MDL model selection in terms of Kolmogorov complexity.
3.1 Relations between conditional randomness
We prove all inclusions shown in Figure 1.
Lemma 3.7
Assume that is computable. Then
[TABLE]
Proof) First we prove the lemma for . Let then . Let be a test w.r.t. and . is a test w.r.t. . Since , from Corollary 4.1 in [31], for all there is an integer such that
[TABLE]
where is the characteristic function, i.e. if and $0$ otherwise. We have a test () w.r.t. and the lemma is proved for . Similarly, we can show the lemma for every finite prefix . ∎
Theorem 3.8
Assume that is computable.
[TABLE]
If is consistent, we have
[TABLE]
Fix and assume that is computable with oracle . If is consistent, we have
[TABLE]
Proof) As in Theorem 3.2, we can show that
[TABLE]
From Lemma 3.7, we obtain (4).
Assume that is consistent. By Theorem 3.4, we have if . From (7), we have
[TABLE]
From (4) and (8) we have (5). Eq. (6) follows from Theorem 3.1 and (5). ∎
The following lemma gives an example showing that the inclusions (i) and (iv) in Figure 1 can be strict simultaneously.
Lemma 3.9
Let , where is the uniform probability measure on , then and for all .
Proof) The diagonal set is covered by a test, i.e. . Let , from van Lambalgen’s theorem [20], we have . ∎
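The covering of the diagonal set used in this proof can be checked by a direct computation. The sketch below assumes that the measure in Lemma 3.9 is the product of two uniform measures; the function names `diagonal_level` and `product_measure` are hypothetical. For each n, the rectangles formed by pairing every binary string of length n with itself cover all pairs of identical sequences, and their total measure is 2^n times 2^(-2n), that is, 2^(-n).

```python
from fractions import Fraction
from itertools import product

def diagonal_level(n: int) -> list[tuple[str, str]]:
    """Pairs (s, s) for all binary strings s of length n; the rectangles
    cylinder(s) x cylinder(s) together cover the diagonal {(x, x)}."""
    strings = ("".join(bits) for bits in product("01", repeat=n))
    return [(s, s) for s in strings]

def product_measure(level: list[tuple[str, str]]) -> Fraction:
    """Measure of a disjoint union of rectangles cylinder(s) x cylinder(t)
    under the product of two uniform measures."""
    return sum((Fraction(1, 2 ** (len(s) + len(t))) for s, t in level), Fraction(0))

for n in range(1, 6):
    # The n-th covering has measure exactly 2^-n.
    print(n, product_measure(diagonal_level(n)))
```

If the definition of a test requires a strict bound at each level, the covering built from strings of length n+1 can be used as the n-th level instead.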
To show an example in which in (5), we construct a consistent Bayes model with the properties (i)–(iv) listed in Theorem 3.10 by modifying the examples of Bauwens [4] (Example 2 in [5]). Note that the examples in [4, 5] do not imply (iv) in Theorem 3.10.
Theorem 3.10
There is such that
(i) is computable,
(ii) and is not computable,
(iii) for all , and
(iv) and .
Here, is the sequence consisting of 1s.
Proof of Theorem 3.10) First, we prove (i). Let be the uniform distribution on . Let , where is the prefix complexity. Let be the binary expansion of then [22]. Let where and for . Consider a computable sequence of strings such that is increasing and .
For all , set , for , and . We define the measure on as follows. For all and , define
[TABLE]
where is the string consisting of 1s. For the construction of the measure, see Fig. 2. By construction, the total measure of is one. Let . For all , we have
[TABLE]
Since the right-hand sides are computable, is computable.
Next, we show (ii). Since , we have and . We have if and if . Since is not computable, is not computable.
By construction, we have for all and . Since , the set is covered by a test, and we have . By construction, is orthogonal for different ; we obtain statement (iii) of the theorem.
Similar to [5, 4], set . Then is r.e. and for all . is a test that covers , i.e. .
Let be an increasing sequence of prefixes of such that as . By construction, . We have , , and . Since , from (1), we have and . Since , from Theorem 3.8, we have the first part of statement (iv) of the theorem.
Since , , are disjoint for different (Theorem 3.4), and for all (Theorem 3.8), we have the latter part of statement (iv) of the theorem. ∎
Proof of Theorem 2.4. Part 1 follows from
[TABLE]
where (9) follows from (4), and (10) follows from (7). By Lemma 3.7, we have .
Part 2 follows from part 1 and Theorem 3.1.
Proof of part 3. The if part follows from part 1. The only if part follows from Theorem 3.4 (iii) and (5).
Proof of part 4. Let and , where is the uniform measure on . Then is computable and consistent. We have and . Let be the computable and consistent model defined in Theorem 3.10. Then and . We have the proof.
Proof of part 5. By definition of ML-random sequences, the sets (i), (iii), and (iv) are Borel sets. By definition of ML-random points, . By the Fubini theorem, . By Theorems 3.1 and 3.8, the set (iv) is the smallest one among the sets (i)–(iv), and we have the first part of part 5. The latter part follows from part 1. ∎
Remark 3.11
In Theorem 3.10, is not computable with oracle , however . The author does not know if the statement is always true for the consistent Bayes models.
4 Discussions
We discuss three topics: Bayes models and uniformly computable parametric models, an algorithmic solution to a classical statistical problem, and randomness in statistical models.
4.1 Bayes models and uniformly computable parametric models
Before we show our results, we summarize several known results for uniform tests [21, 6, 13, 15, 35]. Let be a parameterized family of distributions (parametric model for short). We assume that is measurable for any fixed [8] and uniformly computable [21, 6, 13, 15, 35]. If is uniformly computable, then is defined for all and the function is continuous for each fixed .
A function is called lower semicomputable if the set is effectively open uniformly in rational . A lower semicomputable function is defined in a similar manner [6]. We call a function a blind test w.r.t. and oracle if is lower semicomputable with oracle and . Let be the set of blind tests w.r.t. and oracle . The set of blind random sequences w.r.t. and oracle is defined by . The set of blind random sequences w.r.t. , is defined similarly. A lower semicomputable function is called a uniform test w.r.t. a parametric model if for all .
The following proposition shows the relationship between (), blind randomness, and randomness for uniformly computable parametric models.
Proposition 4.1 (Bienvenu et al. [6])
1. and for every probability measure on and oracle . Assume that are computable and computable with oracle ; then and coincide with the Martin-Löf random sequences and those with oracle [23], respectively.
2. Assume that is uniformly computable. Then, there is a universal uniform test , i.e. for each uniform test w.r.t. for there is such that , and for all .
A Bayes model is defined from a parametric model and a prior by
[TABLE]
Theorem 4.2 (Vovk and V’yugin [35])
Assume that a parametric model is uniformly computable and is computable. Let be the Bayes model defined by (11). Then
[TABLE]
Definition 4.3 (Bienvenu et al. [6])
A uniformly computable parametric model is called effectively orthogonal if for all .
Theorem 4.4 (Theorem 5.41 in Bienvenu et al. [6])
Assume that a parametric model is uniformly computable and effectively orthogonal. Then
[TABLE]
We state our results for uniform tests.
Definition 4.5
is called continuous at if for all and for any sequence such that .
Theorem 4.6
Assume that a parametric model is uniformly computable and is computable. Let be the Bayes model defined by (11). Then,
1. is computable. For all , , is continuous and computable with oracle , , and .
2. In addition, if is effectively orthogonal, then is consistent.
Proof) Since is uniformly computable and is computable, is computable. From (11), for all
[TABLE]
Since is uniformly computable, is continuous for all . From (13), for all and ,
[TABLE]
By Theorem 2.1, for all and
[TABLE]
Since is arbitrary, we have that for all and the latter statements in part 1. Continuity of follows from that of .
Proof of part 2. Assume that the parametric model is effectively orthogonal. By Theorem 4.2, for all and . If , we have , see [20] or [31]. By Theorem 3.4, is consistent. ∎
The following proposition shows that neither of the converses of Theorem 4.6 is true.
Proposition 4.7
There is a computable and consistent such that
(i) is computable with oracle for all and
(ii) and is not continuous at .
Proof) Let for and . Let for all if for . Let for all if for . Let and . Then is a computable probability measure and . if and . Let be a Bayes model defined by (11). Then, is computable and for all . By Theorem 3.4, is consistent. Since and , is not continuous at . ∎
By Theorem 4.6 and Proposition 4.7, the assumptions of Theorems 3.1 and 3.2 are weaker than those of Theorem 4.2. In [35], a quantitative version of Theorem 4.2 is shown; for more details, see Section 7 in [5] and [26]. Equation (6) in Theorem 3.8 holds for computable consistent Bayes models. By Theorem 4.6 and Proposition 4.7, those models constitute a larger family of models than those defined by (11) with uniformly computable and effectively orthogonal parametric models and computable priors. Note that the equality in (12) is true for all while that in (6) is true for all . In [6], Theorem 4.4 is proved for the families of measures that are parameterized by themselves. Kjos-Hanssen [15] gave a different proof of (12) for the Bernoulli model.
Definition 4.8
A measurable function (estimator) is called consistent for the parametric model if for all . A parametric model is called consistent if there exists a consistent estimator.
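A standard example of a consistent estimator, given here only as an illustration and not as the estimator constructed in this paper, is the limiting relative frequency of ones for the Bernoulli model: by the strong law of large numbers it equals the parameter with probability one. An estimator in the sense of Definition 4.8 is a function of the whole infinite sequence, so the sketch below can only approximate it on finite prefixes. The names `frequency_estimate` and `bernoulli_stream`, the parameter value 0.3, and the seed are assumptions of the sketch.

```python
import random
from fractions import Fraction
from itertools import islice
from typing import Iterable, Iterator

def frequency_estimate(bits: Iterable[int], n: int) -> Fraction:
    """Relative frequency of ones among the first n bits (n >= 1)."""
    prefix = list(islice(bits, n))
    return Fraction(sum(prefix), len(prefix))

def bernoulli_stream(theta: float, seed: int) -> Iterator[int]:
    """Endless stream of pseudo-random Bernoulli(theta) bits."""
    rng = random.Random(seed)
    while True:
        yield 1 if rng.random() < theta else 0

for n in (10, 1000, 100000):
    # The estimates settle near the hypothetical parameter 0.3 as n grows.
    print(n, float(frequency_estimate(bernoulli_stream(0.3, seed=1), n)))
```

Pseudo-random bits are of course not ML-random; they serve only to make the convergence visible on a finite run.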
In Definition 8.1 [6], uniformly computable consistent models are called orthogonal. We use the term consistent for the orthogonal models to distinguish pairwise orthogonality from orthogonality.
Theorem 4.9
Assume that is uniformly computable. Let
[TABLE]
The following statements are equivalent.
(i) is effectively orthogonal.
(ii) is well-defined and a consistent estimator for .
We give the proof after Definition 4.10, Proposition 4.11, and Lemma 4.12.
Definition 4.10
Let be a subset of and be a subset of . Set
[TABLE]
A countable union of closed sets is called an $F_\sigma$ set.
Proposition 4.11 (a special case of Exercise 2.3.24 in [29], p. 61)
Assume that is closed. Then is closed.
Proof) Let be a sequence of points in . Suppose that as . We show that . By (14), there is a sequence . Since is compact, is compact and there is a subsequence such that and as . Thus . ∎
Lemma 4.12
Assume that is in . Then is . Further assume that for all . Then for all .
Proof) Since is , there are closed sets in for all and . First we show that . We have
[TABLE]
By Proposition 4.11, is closed and is . Similarly,
1. is for all .
2. Assume that for all . Then .
3. Assume that . By assumption, .
Let . From 1–3, and is a $\sigma$-algebra. By definition, we have . ∎
Proof of Theorem 4.9) First we prove (i) $\Rightarrow$ (ii). Let be a universal uniform test and for all . Then is open, for all , and is a set. Assume (i). Then is well-defined, i.e. for all . Since the set satisfies the assumptions of Lemma 4.12, is measurable. Thus is consistent and we have (ii). The implication (ii) $\Rightarrow$ (i) is obvious. ∎
In [8], necessary and sufficient conditions for consistency of parametric models are shown for general parametric models.
4.2 Algorithmic solution to a classical statistical problem
In parametric models, one is concerned with estimators that are consistent at all parameters, while in Bayes models, one is concerned with those that are consistent at almost all parameters. The classical problem is to identify the points at which the posterior distributions weakly converge (see p. 4 in Diaconis and Freedman [9] and p. 24 in Ghosh and Ramamoorthi [14]).
Let be computable, where . By Theorem 3.4, we have an algorithmic solution to the problem, i.e. the Bayes model is consistent if and only if
[TABLE]
Remark 4.13
1. Assume that the Bayes model is computable and consistent. Then, by (15), we know that the posterior distribution is always consistent at the ML-random parameters without explicitly computing the posterior distribution. In other words, if the Bayes model is consistent and computable and the posterior distribution is not consistent at some parameters, then these parameters are not ML-random. Freedman [12] and Schwartz [27] identified the sets of the consistent parameters for smooth finite-dimensional i.i.d. (independent and identically distributed) models; see p. 4 in [9] or the examples below. Though (15) identifies a subset of consistent parameters, Theorem 3.4 and (15) give a new solution to the problem since they even hold true for any consistent on by relativizing with oracle without any further assumption.
2. It is straightforward to extend the equivalence of the statements except for (viii) in Theorem 3.4 to joint probabilities on complete separable metric spaces such as ; see Doob [11] or Remark 1.3.3 on p. 24 in [14]. For example, let be a measurable space where is the Borel $\sigma$-algebra. Then, Theorem 3.4 holds for computable probabilities on by replacing open intervals in statements (iii) and (iv) in Theorem 3.4 with half-open intervals . Similarly, we can extend Theorem 3.4 to probability measures on .
We examine (15) with examples of consistent Bayes models on complete separable metric spaces.
Example 4.14 (Example 2, p. 17, in Schwartz [27])
Let be the parameter space and the prior the uniform measure on . Let be i.i.d. random variables and where obeys the uniform measure on if and on if . Let be the probability density function for . Then if else . Let and . Then if else . is consistent for all parameters. The Bayes model is computable. By Theorem 3.4, the Bayes model is consistent, and we have (15). The expectation of the parameter w.r.t. the posterior distribution (Bayes estimate) is
[TABLE]
and the posterior distribution is consistent except for . Since is the uniform measure, , and (15) is true. Let , where and . Then, the posterior distribution with prior is consistent for all . Since , , and (15) is true.
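The notion of posterior consistency used in this example can be illustrated numerically on a simpler stand-in model. The sketch below does not reproduce Schwartz's example: it assumes i.i.d. Bernoulli observations with a uniform prior on (0, 1), approximates the posterior on a finite grid, and reports the posterior mass of a small neighbourhood of the true parameter as the sample grows. All function names, the grid size, and the default parameter values are hypothetical choices made for illustration.

```python
import math
import random


def posterior_on_grid(successes, trials, grid_size=1001):
    """Grid approximation of the posterior for Bernoulli data under a uniform prior."""
    # Midpoints of a uniform partition of (0, 1); this avoids log(0) at the endpoints.
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_w = [
        successes * math.log(t) + (trials - successes) * math.log(1.0 - t)
        for t in thetas
    ]
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def mass_near(thetas, probs, center, radius):
    """Posterior mass of the open interval (center - radius, center + radius)."""
    return sum(p for t, p in zip(thetas, probs) if abs(t - center) < radius)


def concentration_profile(theta_true=0.3, radius=0.05,
                          sizes=(10, 100, 1000, 10000), seed=0):
    """Posterior mass near theta_true for growing prefixes of one simulated sample."""
    rng = random.Random(seed)
    draws = [1 if rng.random() < theta_true else 0 for _ in range(max(sizes))]
    profile = []
    for n in sizes:
        k = sum(draws[:n])
        thetas, probs = posterior_on_grid(k, n)
        profile.append((n, mass_near(thetas, probs, theta_true, radius)))
    return profile


if __name__ == "__main__":
    for n, mass in concentration_profile():
        print(n, round(mass, 4))
```

In this toy model the reported mass approaches 1 as the sample size grows, which is the finite-sample picture behind weak convergence of the posterior to the point mass at the true parameter. A simulation of this kind only probes typical behaviour; it says nothing about individual ML-random parameters.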
Before we show Example 4.19, we prove Proposition 4.17 for constructive topological spaces.
Definition 4.15 (Constructive topological space [6, 13, 24])
Let be a topological space with countable base , i.e. every open set is a union of elements of . Let be a subset of and an onto function (a naming system, p. 132 of [13]). We call a constructive topological space if is r.e. The countable base is called r.e. if is a constructive topological space.
Definition 4.16 (ML-randomness on constructive topological space)
Let be a constructive topological space and a probability space where is the σ-algebra generated by . Then an ML-test and w.r.t. are defined in a manner similar to Section 2, i.e. is called an ML-test or a test w.r.t. if is r.e. and for all and where and for . Then .
Proposition 4.17
Assume that is a constructive topological space. Let be a probability measure where is the σ-algebra generated by . Then
[TABLE]
where the support of is the complement of the largest open null set w.r.t. .
Proof) The complement of the support is a finite or countable union of null sets of . For each null set , there is such that and , i.e. is covered by a test. Since a countable union of tests covers the complement of the support, we have the proposition. ∎
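The covering argument in the proof can be written out as follows. The displayed statement of the proposition is not reproduced in this text, so the notation here is assumed for illustration: $\mathcal{R}^{P}$ for the set of ML-random points w.r.t. $P$, $\operatorname{supp} P$ for the support, $B_i$ for the null elements of the base, and $U^{(i)}_n$ for the levels of a test. The bound $2^{-n}$ is the usual one for ML-tests and is likewise an assumption about Definition 4.16. Read it as a sketch of the inclusion that the proof establishes, not as the paper's exact formula.

```latex
\begin{align*}
(\operatorname{supp} P)^{c}
  &= \bigcup_{i \in I} B_i,
  \qquad P(B_i) = 0,\quad I \text{ finite or countable},\\
U^{(i)}_n
  &:= B_i \quad \text{for all } n,
  \qquad P\bigl(U^{(i)}_n\bigr) = 0 \le 2^{-n},\\
(\operatorname{supp} P)^{c}
  &= \bigcup_{i \in I} \bigcap_{n} U^{(i)}_n
  \;\subseteq\; \bigl(\mathcal{R}^{P}\bigr)^{c}
  \quad\Longrightarrow\quad
  \mathcal{R}^{P} \subseteq \operatorname{supp} P .
\end{align*}
```

Each test $U^{(i)}$ is constant in $n$ and is described by the single name of $B_i$, so it is r.e. without any computability assumption on $P$.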
For simplicity, we write for a constructive topological space .
Remark 4.18
1. in (16) is blind randomness since we do not assume that is computable in Definition 4.16.
2. is an r.e. base for with the usual topology. Similarly, for a positive integer , and with the usual topologies have r.e. bases and are constructive topological spaces. For other examples of r.e. bases and constructive topological spaces, see [13].
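To make the idea of an r.e. base concrete, the following sketch enumerates a countable base of the real line by a program. It assumes the base consists of the open intervals with rational endpoints, which is the standard choice but an assumption here, since the formula in the remark is not reproduced in this text. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice


def rationals():
    """Enumerate every rational number exactly once."""
    yield Fraction(0)
    for s in count(2):  # s = numerator + denominator
        for num in range(1, s):
            q = Fraction(num, s - num)
            if q.numerator == num:  # keep only fractions already in lowest terms
                yield q
                yield -q


def rational_intervals():
    """Enumerate every open interval (a, b) with rational endpoints a < b exactly once."""
    seen = []
    for q in rationals():
        for r in seen:
            yield (min(q, r), max(q, r))
        seen.append(q)


def first_neighbourhood(x, eps):
    """Search the enumeration for a base element containing x with length below eps."""
    for a, b in rational_intervals():
        if a < x < b and b - a < eps:
            return a, b


if __name__ == "__main__":
    print(list(islice(rational_intervals(), 5)))
    print(first_neighbourhood(Fraction(1, 3), Fraction(1, 2)))
```

The search in `first_neighbourhood` always terminates because the rationals are dense, which is the computational content of the statement that every open set is a union of base elements.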
Example 4.19 (Freedman [12], Schwartz [27])
Freedman [12] and Schwartz [27] showed that for smooth i.i.d. parametric models with finite-dimensional parameter space , posterior distributions are consistent on the support and inconsistent outside the support. Since has an r.e. base, by (16), we have (15) for these models.
4.3 Randomness in statistical models
We compare randomness in Bayes models with that in non-Bayes models.
Example 4.20 (Uniformly computable parametric models)
Assume that is uniformly computable and effectively orthogonal. Let for all . Then is well-defined and surjective. By Lemma 4.12, is measurable. We have and for all .
In the example above, first we consider the set of random sequences , the union of the sets of random sequences w.r.t. the parameterized family of models. Then for uniformly computable and effectively orthogonal models, for all , the true parameter is estimated by () and is random w.r.t. (). For a similar argument on collectives, see von Mises [25].
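As a toy illustration of estimating the true parameter from finite prefixes of a sequence, the sketch below assumes a finite family of Bernoulli models as a stand-in for a uniformly computable, effectively orthogonal family. It does not reproduce the estimator constructed in the paper: it simply selects, for each prefix, the candidate parameter with the largest likelihood. The function names, the candidate set, and the seed are hypothetical.

```python
import math
import random


def log_likelihood(bits, theta):
    """Log-likelihood of a 0/1 sequence under an i.i.d. Bernoulli(theta) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return ones * math.log(theta) + zeros * math.log(1.0 - theta)


def estimate(bits, candidates):
    """Candidate parameter (strictly between 0 and 1) that best explains the prefix."""
    return max(candidates, key=lambda theta: log_likelihood(bits, theta))


def estimates_along_prefixes(theta_true, candidates, sizes, seed=1):
    """Estimates computed from growing prefixes of one simulated sequence."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < theta_true else 0 for _ in range(max(sizes))]
    return [(n, estimate(bits[:n], candidates)) for n in sizes]


if __name__ == "__main__":
    for n, theta_hat in estimates_along_prefixes(0.7, (0.2, 0.5, 0.7), (5, 50, 500, 5000)):
        print(n, theta_hat)
```

On long prefixes the selected candidate settles on the parameter that generated the data, because distinct Bernoulli parameters produce sequences with different limiting frequencies. This is a simple instance of the disjointness of the sets of random sequences that the example relies on.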
In Bayes models, we assume that a sequence is random w.r.t. the marginal distribution on the sample space. Assumption (i’) requires that is also a member of the union of the sets of random sequences w.r.t. conditional distributions given random parameters. Then for consistent models, for every random sequence w.r.t. the marginal distribution on the sample space, the true parameter is estimated by the posterior distributions given finite prefixes of , is random w.r.t. the conditional distribution given , and is random w.r.t. the prior. In other words, we may say that our notion of randomness is a Bayes version of that in uniformly computable parametric models.
Acknowledgment
The author wishes to thank the anonymous referees for their insightful comments, which significantly improved the paper and helped in highlighting the relevant results and classical statistical problems. A part of this work was supported by KAKENHI (24540153).
References
- [1] Ackerman, N.L., Freer, C.E., Roy, D.M., 2011. On the computability of conditional probability, in: IEEE 26th Annual Symposium on Logic in Computer Science, pp. 107–116. arXiv:1005.3014.
- [2] Barron, A., Rissanen, J., Yu, B., 1998. The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory 44, 2743–2760.
- [3] Barron, A.R., 1985. Logically smooth density estimation. Ph.D. thesis. Stanford Univ.
- [4] Bauwens, B., 2017. Conditional measure and the violation of van Lambalgen’s theorem for Martin-Löf randomness. Theory Comput. Syst. 60, 314–323. arXiv:1103.1529.
- [5] Bauwens, B., Shen, A., Takahashi, H., 2017. Conditional probabilities and van Lambalgen theorem revisited. Theory Comput. Syst. 61, 1315–1336.
- [6] Bienvenu, L., Gács, P., Hoyrup, M., Rojas, C., Shen, A., 2011. Algorithmic tests and randomness with respect to a class of measures. Proc. of the Steklov Institute of Mathematics 274, 41–102. arXiv:1103.1529v2.
- [7] Billingsley, P., 1995. Probability and Measure. 3rd ed., Wiley.
- [8] Breiman, L., Le Cam, L., Schwartz, L., 1964. Consistent estimates and zero-one sets. Ann. Math. Statist. 35, 157–161.
