
TL;DR
The paper presents a theorem relating the universal probability of string sets to the maximum probability of individual strings, bounded by the information about the halting sequence.
Contribution
It introduces the EL Theorem, connecting universal probability sums over sets with maximum individual probabilities and halting sequence information.
Findings
Universal probability of sets approximates maximum string probability
Difference bounded by halting sequence information
Provides insights into algorithmic probability and complexity
The EL Theorem
Samuel Epstein
Abstract
The combined universal probability $\mathbf{m}(D)$ of strings $x$ in sets $D$ is close to $\max \mathbf{m}(x)$ over $x$ in $D$: their logs differ by at most $D$’s information $\mathbf{I}(D;\mathcal{H})$ about the halting sequence $\mathcal{H}$.
1 Introduction
One common goal in computer science is to find the hidden part of the environment. This task has been called Inductive Inference, Extrapolation, Passive Learning, etc. The complete environment can be represented as a huge string $x$. The known observations restrict it to a set $D$. For example, in thermodynamics, the environment can be seen as a record of every particle’s position and velocity in a closed box. An observation of some macro parameters, such as pressure and temperature, restricts the possible environments to a set of hypotheses consistent with the observation.
One method used to select a hypothesis (i.e. environment) is to leverage an *apriori* distribution over the environment space. This distribution encodes any knowledge about the environment known before the observation is made. Then the selection of the hypothesis is
[TABLE]
Note in AIT, for enumerable distributions (i.e. generatable as outputs of randomized algorithms), there is a universal apriori distribution . This is because , for all enumerable . Furthermore, for all , , where is deficiency of randomness; so there is no lower computable refutation to the statement: “ is generated from ”. Thus when the universal prior is used, inductive inference becomes an exercise of Occam’s razor:
[TABLE]
However, there exists a potential complication. It could be that there is a collection of hypotheses representing a concept (such as a more detailed description of particles) whose combined apriori measure is greater than that of the simplest element , with . Or, making the endeavor murkier, it could be that is just the set of all complicated hypotheses and has greater combined apriori measure than the simplest element. In this case, which explanation does one choose?
The EL Theorem shows that this dilemma is purely a mathematical construction. All the universal apriori measure of an observation is concentrated on its simplest member. This is true for all non-exotic sets $D$ with low mutual information with the halting sequence, $\mathbf{I}(D;\mathcal{H})$. There are no (randomized) algorithmic means of creating $D$ with arbitrarily high $\mathbf{I}(D;\mathcal{H})$.
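As a toy illustration of the two quantities being compared, the sketch below uses a small computable prior in place of the universal one, which cannot be computed. The prior `toy_prior`, the helper names, and the example set are assumptions made for this illustration and do not come from the paper. It prints the negative log of the combined measure of a set next to the negative log of the measure of its heaviest member.

```python
import math
from typing import Iterable

def toy_prior(x: str) -> float:
    """Hypothetical computable prior on binary strings: 2^-(2*len(x)+1).

    Summing over the 2^n strings of each length n gives sum_n 2^-(n+1) = 1,
    so this is a probability distribution. It only stands in for the
    universal prior, which is not computable.
    """
    return 2.0 ** -(2 * len(x) + 1)

def combined_bits(hypotheses: Iterable[str]) -> float:
    """-log2 of the total prior mass of the set."""
    return -math.log2(sum(toy_prior(x) for x in hypotheses))

def simplest_bits(hypotheses: Iterable[str]) -> float:
    """-log2 of the prior of the heaviest (simplest) member."""
    return -math.log2(max(toy_prior(x) for x in hypotheses))

# A set with one short hypothesis and several long ones.
D = {"01", "0110100", "1110001", "1010101", "0001110"}
gap = simplest_bits(D) - combined_bits(D)
print(f"combined: {combined_bits(D):.4f} bits")
print(f"simplest: {simplest_bits(D):.4f} bits")
print(f"gap:      {gap:.4f} bits")
```

For any prior the gap lies between 0 and $\log_2 |D|$ bits, since the sum is at most $|D|$ times the maximum. The theorem concerns the universal prior, where the abstract states that the gap is bounded by the set's information about the halting sequence rather than by its size.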
2 Related Work
For information relating to the history of Algorithmic Information Theory and Kolmogorov complexity, we refer the readers to the textbooks [LV08] and [DH10]. A survey about the shared information between strings and the halting sequence is in the work [VV04]. Work on the deficiency of randomness can be found in [She83, KU87, V’Y87, She99]. Work on the stochasticity of objects can be found in [She83, She99, V’Y87, V’Y99]. More information on stochasticity and algorithmic statistics is in the works [GTV01, VS17, VS15]. The EL Theorem is joint work between the author and L. A. Levin, who published this result in [Lev16].
3 Conventions
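As noted in the introduction, $\mathbf{K}(x|y)$ is the conditional prefix-free Kolmogorov complexity. $\mathbf{m}(x)$ is the algorithmic probability. $\mathbf{I}(x;\mathcal{H})$ is the amount of information that the halting sequence $\mathcal{H}$ has about $x$. A probability is elementary if it has finite support and rational values. The deficiency of randomness of relative to an elementary probability measure is . We recall for a set $D$, $\mathbf{m}(D) = \sum_{x \in D} \mathbf{m}(x)$. For the nonnegative real function $f$, we use $<^{+} f$, $>^{+} f$, and $=^{+} f$ to denote $< f + O(1)$, $> f - O(1)$, and $= f \pm O(1)$. We also use $<^{\log} f$ and $>^{\log} f$ to denote $< f + O(\log(f+1))$ and $> f - O(\log(f+1))$, respectively.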
4 The EL Theorem
Definition 1 (Stochasticity)
A string is -stochastic if there exists an elementary probability measure such that
[TABLE]
Theorem 1 (Epstein, Levin)
Let be a lower-semicomputable semimeasure and be a large constant. Every -stochastic set with contains an element with
[TABLE]
The theorem is directly implied by the following lemma.
Lemma 1
Let be a lower-semicomputable semimeasure and be a large constant. If a set is -stochastic relative to an integer , then contains an element with
[TABLE]
Note that if is -stochastic relative to , then it is -stochastic. Hence the lemma implies the theorem.
Lemma 2
Let be a discrete measure and be a measure on sets. There exists a set of size such that
[TABLE]
Proof.
We use the probabilistic method, and show that if we draw elements according to the distribution , then the obtained set satisfies the inequality with positive probability. The probability that a fixed set with is disjoint from is
[TABLE]
Hence the expected -measure of such a is at most and the required set exists.
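The sampling argument in the proof can be checked numerically on a toy instance. The sketch below is only an illustration under assumed parameters: the universe, the threshold `eps`, the sample count `n`, and all function names are hypothetical and are not the quantities of the lemma. It draws `n` elements from a distribution and measures how much weight falls on sets of mass at least `eps` that the sample misses, next to the bound $(1-\varepsilon)^n \le e^{-\varepsilon n}$ on the probability of missing a single such set.

```python
import math
import random

def draw_hitting_set(p: dict, n: int, rng: random.Random) -> set:
    """Draw n independent samples from the distribution p and return them as a set."""
    elements = list(p)
    weights = [p[e] for e in elements]
    return set(rng.choices(elements, weights=weights, k=n))

def missed_measure(sets, q, p, hitting, eps) -> float:
    """Q-measure of the sets with P-mass >= eps that are disjoint from `hitting`."""
    total = 0.0
    for candidate, weight in zip(sets, q):
        mass = sum(p[e] for e in candidate)
        if mass >= eps and candidate.isdisjoint(hitting):
            total += weight
    return total

rng = random.Random(0)

# Hypothetical toy instance: P is uniform on 64 elements,
# Q is uniform on 200 random sets of 16 elements (P-mass 0.25 each).
universe = [f"x{i}" for i in range(64)]
p = {e: 1 / 64 for e in universe}
sets = [frozenset(rng.sample(universe, 16)) for _ in range(200)]
q = [1 / 200] * 200
eps, n = 0.25, 20

# A fixed set of mass >= eps is missed by all n draws with
# probability at most (1 - eps)^n <= exp(-eps * n).
bound = math.exp(-eps * n)

trials = [
    missed_measure(sets, q, p, draw_hitting_set(p, n, rng), eps)
    for _ in range(500)
]
average = sum(trials) / len(trials)
print(f"average missed Q-measure: {average:.5f}")
print(f"bound exp(-eps * n):      {bound:.5f}")
```

Because the average missed measure stays below the bound, at least one of the sampled sets must miss no more than that much measure, which is the existence step of the probabilistic method.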
Proof of Lemma 1 for computable . Let be an elementary probability measure with and . Without loss of generality, we assume that is a large positive power of 2. Fix a search procedure that on input , , and finds a set satisfying the conditions of Lemma 2.
For large , the set must intersect the obtained set . Indeed, consider the -test that is equal to if is disjoint from , and is zero otherwise. This is indeed a test, because the above lemma implies that its expected value for is bounded by 1. Since the test is also computable, it is a lower bound to the optimal test , up to a constant factor. By stochasticity of the set , , because is an optimal test relative to . Thus for large enough , intersects .
It remains to construct a description of each element in of the size given in the proposition. We construct a special decompressor that assigns a short description to each element in . On input of a string, the decompressor interprets the string as a concatenation of 4 parts:
1. A prefix-free description of of size at most .
2. A prefix-free description of of size .
3. A prefix-free description of of size .
4. An integer of bitsize .
It interprets the last integer as the index of an element in the set of size that is computed by the search procedure on input , , and . The element is the output of the decompressor. The proposition is proven for computable .
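The parsing step of such a decompressor can be sketched as follows. This is a minimal illustration under assumptions, not the construction of the paper: the Elias gamma code stands in for the prefix-free descriptions, small integers stand in for the three described objects, and `toy_candidates` is a hypothetical replacement for the search procedure. It shows why self-delimiting fields can be concatenated and still be split apart unambiguously before the trailing index is read.

```python
from typing import Callable, List, Tuple

def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer: a self-delimiting bit string."""
    if n < 1:
        raise ValueError("gamma code is defined for positive integers")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode(bits: str, pos: int) -> Tuple[int, int]:
    """Decode one gamma codeword starting at `pos`; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end

def compress(fields: Tuple[int, int, int], index: int, index_bits: int) -> str:
    """Concatenate three self-delimiting fields and a fixed-width index."""
    head = "".join(gamma_encode(f) for f in fields)
    return head + format(index, f"0{index_bits}b")

def decompress(bits: str, candidates_for: Callable[[int, int, int], List[str]]) -> str:
    """Parse the three fields, rebuild the candidate list, then read the index."""
    pos = 0
    fields = []
    for _ in range(3):
        value, pos = gamma_decode(bits, pos)
        fields.append(value)
    candidates = candidates_for(*fields)  # stand-in for the search procedure
    index = int(bits[pos:], 2)            # the remaining bits are the index
    return candidates[index]

def toy_candidates(a: int, b: int, c: int) -> List[str]:
    """Hypothetical deterministic procedure returning 8 candidates."""
    return [f"{a}-{b}-{c}-item{i}" for i in range(8)]

code = compress((5, 2, 9), index=6, index_bits=3)
print(code, "->", decompress(code, toy_candidates))
```

The total description length is the sum of the three field lengths plus the index width, which mirrors how the four parts listed above add up to the length of the description.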
Remark 1
If is computable, a set satisfying the conditions of the lemma can easily be found by search. But if is not computable, then the collection of sets with grows over time. Thus after constructing a good S, it can happen that a large -measure of sets appears that does not contain an element from , and that new elements need to be added to . This type of interactive construction leads to an equivalent characterization of the problem in terms of a game, which is shown in [She12]. Below, another proof is presented.
Proof of Lemma 1 for lower-semicomputable . We still assume that is a large power of 2. Let . We can rewrite , with , such that are probability measures with finite support obtained by a lower semi-computable approximation of , and is a lower-semicomputable semimeasure.
Construction of a lower-semicomputable test over sets. We first construct tests together with a list of strings . Let . Assume we already constructed and for some . Choose such that the test
[TABLE]
satisfies where the expectations are taken for . Let be equal to if there exists an such that , otherwise let . End of construction.
We first show that each required string in the construction exists. Suppose and have already been constructed. We show the existence of using the probabilistic method. If we draw according to , then for each set for which the second condition of is satisfied, we have
[TABLE]
because of the inequality for all reals . If satisfies the first or third condition, then is trivially true. So
[TABLE]
and the required exists.
We have , where is the optimal test because the construction implies and is effective, thus is lower semicomputable. Every set with satisfies by choice of . Any such that is disjoint from the set satisfies
[TABLE]
This implies for large , because up to constants, we have
[TABLE]
By the assumption on -stochasticity of , we have and hence must contain some . The theorem follows by constructing a description for each string of bitsize in a similar way as above.
4.1 Non-Stochastic Objects
It is well known in the literature that non-stochastic objects have high mutual information with the halting sequence [VS17]. In the following lemma, we reprove this fact without using left-total machines, which were used in the original proof.
Lemma 3
.
Proof.
We dovetail all programs to the universal Turing machine . For , is the position in which the program terminates. Let and be Chaitin’s Omega. Let be restricted to the first digits. Let , with with minimum . Let and . We define the elementary probability measure , .
[TABLE]
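The dovetailing used at the start of this proof can be sketched as follows. This is a hedged illustration only: Python generators stand in for programs of the universal Turing machine, and `make_program` and `dovetail` are hypothetical names introduced here. The position of a program in the returned list plays the role of the position in which that program terminates.

```python
from typing import Callable, Dict, Iterator, List

def make_program(steps: int) -> Callable[[], Iterator[None]]:
    """Toy 'program' that halts after `steps` steps, or never if steps < 0."""
    def run() -> Iterator[None]:
        count = 0
        while steps < 0 or count < steps:
            count += 1
            yield
    return run

def dovetail(programs: List[Callable[[], Iterator[None]]], max_rounds: int) -> List[int]:
    """Interleave the programs and return their indices in order of halting."""
    running: Dict[int, Iterator[None]] = {}
    halted: List[int] = []
    for round_number in range(max_rounds):
        # Start one more program each round ...
        if round_number < len(programs):
            running[round_number] = programs[round_number]()
        # ... then advance every running program by one step.
        for index in list(running):
            try:
                next(running[index])
            except StopIteration:
                halted.append(index)
                del running[index]
    return halted

# Program 1 never halts; the others halt after 7, 2, 0 and 3 steps.
programs = [make_program(s) for s in (7, -1, 2, 0, 3)]
order = dovetail(programs, max_rounds=50)
print(order)
```

No single non-halting program blocks the others, so every halting program eventually receives a termination position, while the non-halting one never appears in the list.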
Corollary 1 (EL Theorem)
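For finite $D \subset \{0,1\}^*$, $\min_{x \in D} \mathbf{K}(x) <^{\log} -\log \mathbf{m}(D) + \mathbf{I}(D;\mathcal{H})$.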
Proof.
References
- [DH10] R. G. Downey and D. R. Hirschfeldt. Algorithmic Randomness and Complexity. Theory and Applications of Computability. Springer New York, 2010.
- [GTV01] P. Gács, J. Tromp, and P. Vitányi. Algorithmic Statistics. IEEE Transactions on Information Theory, 47(6):2443–2463, 2001.
- [KU87] A. N. Kolmogorov and V. A. Uspensky. Algorithms and Randomness. SIAM Theory of Probability and Its Applications, 32(3):389–412, 1987.
- [Lev16] L. A. Levin. Occam bound on lowest complexity of elements. Annals of Pure and Applied Logic, 167(10):897–900, 2016. And also: S. Epstein and L. A. Levin, Sets have simple members, arXiv preprint arXiv:1107.1458, 2011.
- [LV08] M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer Publishing Company, Incorporated, 3rd edition, 2008.
- [She83] A. Shen. The concept of (alpha,beta)-stochasticity in the Kolmogorov sense, and its properties. Soviet Mathematics Doklady, 28(1):295–299, 1983.
- [She99] A. Shen. Discussion on Kolmogorov Complexity and Statistical Analysis. The Computer Journal, 42(4):340–342, 1999.
- [She12] A. Shen. Game arguments in computability theory and algorithmic information theory. ArXiv e-prints, 2012. http://arxiv.org/abs/1204.0198.
