Sets and Probability
Hazel Brickhill, Leon Horsten

TL;DR
This paper investigates the concept of random variables within set theory, focusing on what it means for a random set to have a certain probability of belonging to a predefined class of sets.
Contribution
It introduces a novel perspective on random variables in set theory and explores their probabilistic properties within this framework.
Findings
Defined the notion of random variables over set-theoretic universe
Analyzed probabilities of random sets belonging to specific classes
Provided foundational insights for probabilistic set theory
Abstract
In this article the idea of random variables over the set theoretic universe is investigated. We explore what it can mean for a random set to have a specific probability of belonging to an antecedently given class of sets.
Versions of this paper have been presented at a Bristol–Leuven workshop on Logic and Philosophy of Science (2015), at the Philosophy Department of the Universidade Federal do Rio Grande do Norte (2015), at the Fourth Reasoning Conference in Manchester (2015), at the Philosophy of Mathematics Seminar in Oxford (2014), and at the Philosophy Departmental Research Seminar in Aberdeen (2014). We are grateful to the audiences for helpful comments, questions, and suggestions. In this respect we are especially indebted to Philip Welch, George Wilmers, and Sylvia Wenmackers.
1 Introduction
Probabilistic notions have been applied to mathematical objects and notions. For instance, probabilistic concepts have been applied in the theory of random graphs [Alon et al 2000]. The aim of this article is to apply a notion of probability to the mathematical universe as a whole. More particularly, we wish to explicate what it could mean for a property $A$ of sets to have a probability of being true of a set in the set theoretic universe $V$. Properties are identified with their extensions, so that $A$ ranges over all proper and improper classes in $V$.
The aim is to develop a theory of the probability of events of the form $X \in A$, where $A$ is a class and the variable $X$ is a random variable. The state space of the random variables is of course $V$. The outcome space of the random variables has to be at least as large as $V$ because there must be enough states for a random variable to take each set as a possible value. On the other hand, there is no need for it to be larger than $V$. Therefore the outcome space is simply identified with $V$.
Without invoking a fixed set of postulates, intuitions about probability have occasionally been used in set theory, for instance to motivate new basic principles [Freiling 1986]. However, such attempts are mostly regarded as unsuccessful [Hamkins 2015]. In the light of this it is natural to wonder what we should require from probability functions associated with random variables on $V$.
Surely it would be unreasonable to insist on there being one unique correct probability function that yields the probability of a random variable taking a value in a given class of sets. On the other hand, for our functions to have any hope of meriting the label probability function, they have to satisfy Kolmogorov’s conditions for being a finitely additive probability function.
From the outset we impose additional constraints on the class of probability functions that we are interested in (for a discussion of these constraints in the context of non-Archimedean probability theory, see [Benci et al 2018]):
1. Totality. The probability functions are defined on all classes.
2. Uniformity. All singleton events are given the same probability.
3. Regularity. All singleton events are given non-zero probability.
All this means, for familiar reasons, that the sought-for probability functions cannot be Kolmogorov probability functions. Given our insistence on finite additivity, this means that the probability functions will be non-Archimedean. They will not satisfy $\sigma$-additivity, but they will instead satisfy a generalised infinite additivity rule.
In mathematics today, the term ‘probability’ has become virtually synonymous with ‘function that satisfies the Kolmogorov axioms (including $\sigma$-additivity)’. If you see matters this way, then you will be loath to dignify the functions constructed in this paper by the term ‘probability function’. Nonetheless, you may ask the question whether a fine-grained quantitative theory of possibility, with which the degree of possibility of properties can quantitatively be compared, can be constructed. This is what is investigated in the present article. So, if you prefer, you can call the theory constructed in this paper a quantitative theory of possibility. You are then advised to replace all occurrences of ‘(non-standard) probability function’ by ‘quantitative possibility function’.
The project in which we are engaging in this article is related to the work in [Benci et al 2007]. The aim of the latter article is to construct a theory of sizes for mathematical universes inspired by the Euclidean principle that the size of the whole is larger than the sizes of its proper parts. Now there is of course a familiar theory of size, Cantor’s theory of cardinality, which does not satisfy this Euclidean principle. So Benci and his co-authors propose their Euclidean theory of size as a rival to Cantor’s theory.
We, on the other hand, fully accept Cantor’s theory of cardinality. Nonetheless, the probability functions that will be constructed satisfy the Euclidean principle that the probability of an event is strictly greater than the probability of each of its sub-events. Moreover, the mathematical techniques for generating them are closely related to the techniques that are used in [Benci et al 2007].
What we shall mean by ‘mathematical universe’ is not the same as what is meant by the term in [Benci et al 2007]. The authors of [Benci et al 2007] impose mainly algebraic constraints on what counts as a mathematical universe [Benci et al 2007, Introduction]. We, in contrast, take the term ‘mathematical universe’ in the set theoretical sense. Naively, you may take there to be one preferred set theoretic universe: $V$. But if you are uncomfortable with taking $V$ as given, then you might want to take a mathematical universe to be a rank of the cumulative hierarchy that constitutes a model of most or perhaps even all of the standard principles of set theory. Indeed, we will see that for random variables defined on any large set, the general idea of equipping them with a probability function will be the same as that for random variables on $V$.
We will discuss two ways of generating non-Archimedean probability functions for random variables on $V$. In section 2 a simple way of generating such probability functions (the finite snapshot approach) will be described. In section 3 we go on to discuss how global properties of these probability functions can be made to hold by imposing constraints on the process of generating such functions. In section 4, a theoretically more satisfying but also more complicated way of generating non-Archimedean probability functions for random variables on $V$ is discussed (the bootstrapping method).
2 The finite snapshot approach
A random variable on $V$ is a function from states to the outcome space, i.e., an element of $V^V$. So there are many random variables on $V$. The aim is to associate with elements of $V^V$ a notion of probability that meets the minimal constraints (totality, uniformity and regularity) that were described in section 1.
In fact, we want to give precise meaning to conditional probability statements of the form
$$\Pr(X \in A \mid X \in B),$$
where $X \in V^V$ and $A, B \subseteq V$. But we will see that it will be sufficient for our purposes to give meaning to unconditional probability statements. So our fundamental problem amounts to giving meaning to expressions of the form $\Pr(X \in A)$. Such probability measures will be determined by a choice of a fine ultrafilter on the collection of finite subsets of the state space. (What follows is an adaptation of the approach of [Brickhill et al 2018, section 2].)
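The starting point is a fine ultrafilter $\mathcal{U}$ on the collection $[V]^{<\omega}$ of finite subsets of $V$. This fine ultrafilter defines a non-Archimedean field in the following way.
For any two functions $f, g : [V]^{<\omega} \to \mathbb{Q}$ we define:
Definition 1
$$f \approx_{\mathcal{U}} g \iff \{ D \in [V]^{<\omega} : f(D) = g(D) \} \in \mathcal{U}.$$
In words: two functions are identified if they coincide on ultrafilter-many finite sets of states.
The relation $\approx_{\mathcal{U}}$ is an equivalence relation, so we can take equivalence classes $[f]_{\mathcal{U}}$, for which we then have
[TABLE]
Moreover, it is again a routine exercise to verify that the $[f]_{\mathcal{U}}$’s form a hyper-rational field.
Now suppose $X \in V^V$ and $A \subseteq V$. Then we define the function $f_{X \in A} : [V]^{<\omega} \to \mathbb{Q}$ as follows:
Definition 2
For every $D \in [V]^{<\omega}$:
$$f_{X \in A}(D) = \frac{|\{ s \in D : X(s) \in A \}|}{|D|}.$$
In words: for every finite set of states $D$, $f_{X \in A}(D)$ is the ratio between the number of states in $D$ for which $X \in A$ and the number of states in $D$. In this sense, $f_{X \in A}(D)$ is the probability of $X \in A$ on a finite snapshot $D$ of states.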
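The ratio on a single finite snapshot can be computed directly. The following is a minimal sketch, not the authors' construction: it replaces the universe by the natural numbers, and the names `snapshot_ratio`, `identity` and `is_even` are hypothetical helpers introduced only for illustration. It computes, for a finite set of states, the fraction of states at which the random variable lands in a given class. The ultrafilter step that turns these ratios into a hyper-rational value is not computable and is not modelled here.

```python
from fractions import Fraction


def snapshot_ratio(snapshot, random_variable, in_class):
    """Fraction of states s in `snapshot` with random_variable(s) in the class.

    `snapshot` is a finite, non-empty collection of states (assumption: states
    are plain Python values); `in_class` is a membership test for the class.
    """
    states = set(snapshot)
    if not states:
        raise ValueError("a snapshot must contain at least one state")
    hits = sum(1 for s in states if in_class(random_variable(s)))
    return Fraction(hits, len(states))


def identity(state):
    """A diagonal random variable on the toy universe: each value taken once."""
    return state


def is_even(value):
    """Membership test for the toy class of even numbers."""
    return value % 2 == 0


if __name__ == "__main__":
    # Ratio for "X is even" on the snapshot {0, 1, 2, 3, 4}.
    print(snapshot_ratio(range(5), identity, is_even))  # 3/5
```

Exact rationals are used rather than floats because the values of the snapshot functions are meant to be rational numbers.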
Similarly, we define the function as follows:
Definition 3
For every
[TABLE]
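Now we are ready to define the probability of $X \in A$, relative to a fine (and therefore free) ultrafilter $\mathcal{U}$ on $[V]^{<\omega}$:
Definition 4
$$\Pr(X \in A) = [f_{X \in A}]_{\mathcal{U}},$$
that is, the equivalence class modulo $\mathcal{U}$ of the snapshot function $f_{X \in A}$ of Definition 2.
Similarly, the conditional probability is defined as the equivalence class of the function of Definition 3. Thus we have constructed a probability function that takes its values in the hyper-rational field. Such probability functions are sometimes called NAP functions.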
Conditional probability can then be expressed in terms of unconditional probability:
Definition 5
[TABLE]
3 Constraints
From section 1 we know that the aim is not to arrive at a unique (correct) probability function on $V$. But we did insist from the outset on our probability functions satisfying three global constraints: totality, uniformity, and regularity. It will be shown that these properties are always guaranteed to hold.
There are further global conditions on probability functions on $V$ that seem reasonable to require, and that are not guaranteed to hold without further work. These global constraints will be explored. We will show that many of them can be forced to hold by imposing constraints on the ultrafilters from which the probability functions are generated.
3.1 Elementary properties
The definition of $\Pr$ is relative to an initial choice of the fine ultrafilter $\mathcal{U}$. The properties of $\Pr$ depend on $\mathcal{U}$. Nonetheless, certain basic properties of $\Pr$ can be easily seen to hold regardless of which fine ultrafilter is chosen:
Proposition 1
1. $\Pr$ is a finitely additive probability function;
2. $\Pr$ is Euclidean.
Proof. Easy.
Now we define the notion of a diagonal random variable:
Definition 6
A random variable $X$ is said to be a diagonal random variable if for any set $a$, there is exactly one element $s$ of the state space such that $X(s) = a$.
In words: a diagonal random variable is a random variable that takes every value exactly once.
Using this notion, we define the notions of regularity and uniformity:
Definition 7 (regularity)
A probability function $\Pr$ is regular if $\Pr(X \in \{a\}) > 0$ for every diagonal random variable $X$ and for every $a \in V$.
Definition 8 (uniformity)
A probability function $\Pr$ is uniform if for every diagonal random variable $X$ and for all $a, b \in V$:
$$\Pr(X \in \{a\}) = \Pr(X \in \{b\}).$$
Proposition 2
For every fine ultrafilter $\mathcal{U}$:
1. $\Pr$ is regular;
2. $\Pr$ is uniform.
Proof. These properties are proved as propositions 2.5 and 2.6 in [Brickhill et al 2018, p. 525–526].
The Euclidean property is formally defined as follows:
Definition 9 (Euclidean)
A probability function $\Pr$ is Euclidean if for every diagonal random variable $X$ and all $A, B \subseteq V$:
$$A \subsetneq B \;\Rightarrow\; \Pr(X \in A) < \Pr(X \in B).$$
Then we have:
Proposition 3
For every fine ultrafilter $\mathcal{U}$, the probability function $\Pr$ is Euclidean.
Proof. By finite additivity and regularity.
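The one-line proof can be unpacked as follows. This is a sketch in assumed notation: $\Pr$ for the probability function, $X$ for a diagonal random variable, and $A \subsetneq B$ for the two classes; none of these symbols is taken verbatim from the source. It also uses the fact that a finitely additive function with non-negative values is monotone.

```latex
\begin{align*}
\Pr(X \in B) &= \Pr(X \in A) + \Pr(X \in B \setminus A)
  && \text{(finite additivity, since } A \cap (B \setminus A) = \emptyset\text{)} \\
\Pr(X \in B \setminus A) &\ge \Pr(X \in \{b\}) > 0
  && \text{(monotonicity and regularity, for any } b \in B \setminus A\text{)} \\
\Pr(X \in B) &> \Pr(X \in A).
\end{align*}
```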
Now we turn to infinite additivity. Countable additivity means that the probability of the union of a countable family of disjoint sets is the infinite sum of the probabilities of the elements of the family, where the notion of infinite sum is spelled out in terms of the classical notion of limit. In the present setting, the probability of the union of any family of disjoint sets is also the infinite sum of the probabilities of the elements of the family [Benci et al 2013, section 3.4]. But now the notion of infinite sum is spelled out in terms of the generalised notion of limit based on the ultrafilter . More precisely, the new notion of infinite sum is defined as follows. Suppose we are given a family of rational numbers, and . Then consider the function given by
[TABLE]
This function can be seen as giving the value of the infinite sum on all finite parts (“snapshots”) of the index set. So we identify the infinite sum of the family of rational numbers with the generalised limit of according to the ultrafilter :
Definition 10
[TABLE]
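The snapshot view of an infinite sum can be illustrated on a countable index set. The sketch below is a toy under stated assumptions: the index set is the natural numbers, the family is $q_i = 2^{-(i+1)}$, and `snapshot_sum` and `geometric` are hypothetical helper names. It evaluates the sum on a finite snapshot of the index set only; the generalised limit along the ultrafilter is not computed.

```python
from fractions import Fraction


def snapshot_sum(family, snapshot):
    """Sum the members of `family` whose index lies in the finite `snapshot`.

    `family` maps an index to a rational number (assumption: a plain function).
    """
    return sum((family(i) for i in set(snapshot)), Fraction(0))


def geometric(i):
    """Toy family q_i = 1 / 2**(i + 1)."""
    return Fraction(1, 2 ** (i + 1))


if __name__ == "__main__":
    for n in (1, 2, 5):
        # On the snapshot {0, ..., n-1} the value is 1 - 2**(-n).
        print(n, snapshot_sum(geometric, range(n)))
```

Different snapshots give different partial sums; in the text it is the ultrafilter that decides which of these values count towards the generalised limit.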
Using this notion of infinite sum, we can express the probability of the union of a disjoint family of sets as the sum of the probabilities of the members of that family:
Proposition 4
If $A = \bigcup_{i \in I} A_i$, with $A_i \cap A_j = \emptyset$ for all $i \neq j$, then for every random variable $X$:
$$\Pr(X \in A) = \sum_{i \in I} \Pr(X \in A_i).$$
In sum, $\Pr$ has a natural infinite additivity property that is sometimes called perfect additivity.
Proposition 5
For every fine ultrafilter $\mathcal{U}$, the probability function $\Pr$ is perfectly additive.
Proof. This proposition is proved as proposition 8 in [Benci et al 2013, p. 132–133].
3.2 Symmetry principles
From now on, the symbol will be used to refer to some arbitrary diagonal random variable. When it is not assumed that the random variable in question is diagonal, we will write .
The Euclidean-ness of $\Pr$ has implications for symmetry principles. As a rule of thumb, one can say that symmetry principles fail (see [Benci et al 2007], [Benci et al 2013], [Benci et al 2018]).
Proposition 6
For every fine ultrafilter $\mathcal{U}$, the probability function $\Pr$ is not invariant under all permutations of $V$.
Proof.* We concentrate on as it is canonically represented in (by means of the Zermelo ordinals, for instance). Define a permutation of as follows:*
- •
* for ; Otherwise:*
- •
* for even;*
- •
;
- •
* for odd and .*
*Let , and let be a diagonal random variable. Then . Therefore, by the Euclidean principle, * **
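The formulas defining the permutation are not recoverable from the text above, so the following sketch uses one concrete permutation of the natural numbers that has the shape the proof needs. It is an assumption, not necessarily the authors' choice. The hypothetical function `pi` shifts even numbers up by two, sends 1 to 0, and shifts the remaining odd numbers down by two; outside the natural numbers it would be the identity. It maps the even numbers onto a proper subset of themselves, which is what clashes with the Euclidean property.

```python
def pi(n):
    """A permutation of the natural numbers (illustrative assumption)."""
    if n % 2 == 0:
        return n + 2      # even numbers move up by two
    if n == 1:
        return 0          # the only argument that is sent to 0
    return n - 2          # odd n > 1 moves down by two


def pi_inverse(m):
    """Explicit inverse of `pi`, showing that `pi` is a bijection."""
    if m == 0:
        return 1
    if m % 2 == 0:
        return m - 2
    return m + 2


if __name__ == "__main__":
    evens = [n for n in range(20) if n % 2 == 0]
    image = sorted(pi(n) for n in evens)
    # Every image is even and at least 2, so the even number 0 is missed.
    print(image[:5])
```

Since the image of the even numbers is a proper subset of the even numbers, a Euclidean probability function must give it a strictly smaller value, so it cannot be invariant under this permutation.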
This of course entails that there are diagonal random variables such that for some ,
[TABLE]
One popular global constraint on probability measures is translation-invariance. The Lebesgue measure has this property, and Banach limits seem to occupy a privileged position in the class of generalised limits at least in part because they are translation-invariant. In our context, translation-invariance does not make obvious sense. For a random class $A$, it is not clear what ‘$A + \alpha$’ (where $\alpha$ is a number) means. But a clear interpretation of ‘adding an ordinal number’ can of course be given if $A$ is a collection of ordinals:
Definition 11
For any collection of ordinals:
[TABLE]
Then for to be translation-invariant means that for all ordinals and for every ,
[TABLE]
However, even if we consider non-Archimedean measures (of the kind that we have been describing) on ordinals, translation-invariance conflicts with the Euclidean Property of our generalised probability functions. In particular, there is no probability function on any infinite cardinal such that there is even one ordinal with and
[TABLE]
The reason is simple. We have so if we had then we would contradict the Euclidean principle.
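The conflict can be seen already on finite snapshots of the finite ordinals. The sketch below is a toy under stated assumptions: ordinals are modelled by natural numbers, translation by 1 is taken to be $n \mapsto n + 1$, and `ratio`, `shift` and `everything` are hypothetical helper names. Shifting the whole space by 1 drops the element 0, so on every snapshot that contains 0 the shifted class has a strictly smaller ratio.

```python
from fractions import Fraction


def shift(predicate, k):
    """Membership test for the class {n + k : n satisfies `predicate`}."""
    return lambda m: m >= k and predicate(m - k)


def ratio(snapshot, predicate):
    """Fraction of the finite, non-empty `snapshot` that satisfies `predicate`."""
    states = set(snapshot)
    return Fraction(sum(1 for s in states if predicate(s)), len(states))


def everything(n):
    """The whole toy space of finite ordinals."""
    return True


if __name__ == "__main__":
    snapshot = range(10)
    print(ratio(snapshot, everything))            # 1
    print(ratio(snapshot, shift(everything, 1)))  # 9/10
```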
As this example shows, such translations are not necessarily one-to-one, so we may not want full invariance in general. In [Benci et al 2007, section 1.3], Benci, Forti, and Di Nasso explore a restricted notion of translation-invariance of measures of this kind on ordinals. We do not pursue this theme further here, but only pause to note that there are other reasonable-looking principles that are hard to satisfy. In the context of their theory of numerosities, Benci, Forti, and Di Nasso consider a principle that in the present context would take the following form:
Definition 12 (Difference Principle)
[TABLE]
On countable sample spaces, the difference principle can be made to hold by building the probability function from a selective ultrafilter [Benci et al 2003]. But the existence of selective ultrafilters is independent of ZFC. As far as we know, it is an open question whether the difference principle can be consistently made to hold for NAP functions on uncountable sample spaces.
3.3 Probability and cardinality
In this (sub-)section we investigate the relation between our notion of generalised probability on the one hand, and the familiar notion of cardinality on the other hand.
3.3.1 Hume’s principle for probability
One might naively wonder whether the following probabilistic analogue of Hume’s Principle for cardinality can hold:
Definition 13 (Hume’s principle for probability)
For all $A, B \subseteq V$:
$$|A| = |B| \;\Rightarrow\; \Pr(X \in A) = \Pr(X \in B).$$
But the probability functions that we have been considering cannot satisfy Hume’s principle for probability, as its failure is an immediate consequence of Proposition 6: invariance under permutations and Hume’s principle for probability are mathematically equivalent. However, this was only to be expected. After all, we do not expect Kolmogorov probability (on infinite spaces) to satisfy any such principle.
3.3.2 Superregularity
The hyper-rational field in which the probability functions take their values contain infinitesimal numbers—this is what makes it non-Archimedean. We will write if for each . And we will write if
[TABLE]
We have seen that cannot satisfy Hume’s principle for probability. But, at least at first sight, it seems that it would be reasonable to demand:
[TABLE]
Indeed, if in addition , then we might even expect
[TABLE]
Further, this may be expected to hold if is a proper class but is a set . The result is a size constraint which is a strengthening of the requirement of regularity:
Definition 14 (Superregularity)
[TABLE]
Note that if is finite and is infinite then the consequent holds automatically.
By a suitable restriction on admissible ultrafilters , superregularity can indeed be made to hold:
Theorem 1
There are fine ultrafilters such that is superregular.
Proof.
If such that are given, then we have if and only if for each ,
[TABLE]
The aim is to build an ultrafilter for which this holds.
For any , define
[TABLE]
Moreover, let
[TABLE]
Define also
[TABLE]
We want to prove that has the finite intersection property. Therefore take any , and any such that and for Assume for the construction that . For every finite , if then So setting we will extend to a set in , and hence , for each . Set and . As is infinite and of larger cardinality than we add elements of to , yielding a finite set . Now set , and add elements of to to give . Note we can find these elements of as . Continuing in this manner, set . Then we have ensured that for all
[TABLE]
and so we have and since we also have
*So indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter . By design, then, the resulting probability function is super-regular. *
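The padding step of this construction can be mimicked on a toy example. The sketch rests on assumptions that are not in the source: states are identified with values (as for a diagonal random variable), the larger set is represented by an iterator of fresh elements, the target inequality is read as "more than `n` times as many elements of the larger set as of the smaller set", and `pad_snapshot` is a hypothetical name. Given a finite snapshot and a bound `n`, it adds elements of the larger set until that inequality holds on the snapshot.

```python
from itertools import count


def pad_snapshot(snapshot, in_small, in_large, fresh_large, n):
    """Extend `snapshot` until it has more than n times as many large-set
    elements as small-set elements.

    `fresh_large` is an iterator over elements of the large set; elements that
    also belong to the small set are skipped, so the small-set count is fixed.
    """
    padded = set(snapshot)
    small_hits = sum(1 for s in padded if in_small(s))
    large_hits = sum(1 for s in padded if in_large(s))
    for element in fresh_large:
        if large_hits > n * small_hits:
            break
        if element in padded or in_small(element):
            continue
        padded.add(element)
        large_hits += 1
    return padded


if __name__ == "__main__":
    small = {1, 2, 3}
    padded = pad_snapshot({1, 2, 3, 100}, small.__contains__,
                          lambda x: x >= 100, count(100), n=5)
    print(len(padded))
```

The loop only terminates with the inequality satisfied when the iterator supplies enough elements outside the smaller set, which is where the cardinality assumption of the theorem enters.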
Once again, Hume’s Principle for probability cannot hold for the notion of probability that we are investigating. But this leaves open the question whether the converse of Hume’s Principle for probability can be made to hold. This is called Cantor’s Principle in [Benci et al 2007], where the authors investigate it in the context of their Euclidean theory of size:
Definition 15 (Cantor’s Principle)
[TABLE]
Benci, Forti, and Di Nasso prove that ‘Cantor’s Principle’ can be made to hold [Benci et al 2007, section 3.2]. It is also clear that Cantor’s Principle follows from super-regularity.
3.3.3 The power set principle
The question whether
[TABLE]
is true, is independent of the axioms of set theory. (Of course the principle is true if the Generalised Continuum Hypothesis holds.) Like the cardinality operator, our NAP probability functions are measures of some kind. One might wonder what should follow from In particular, given that is intended to be a fine-grained quantitative possibility measure, perhaps probability should be expected to co-vary with the power set operation in some fairly direct manner. In other words, it is natural to ask if the following principle can be made to hold:
Definition 16 (Power Set Condition)
[TABLE]
It turns out that the power set condition can indeed be satisfied:
Theorem 2
There are fine ultrafilters such that satisfies the power set condition.
The argument for this is somewhat more involved.
We aim to prove Theorem 2 by building the probability function up from an ultrafilter which is based on a pre-filter that has the finite intersection property.
The class is built up in stages, and in such a way that it eventually witnesses the truth of the power set condition for all .
Stage 0
The class consists of all
[TABLE]
for . This is to ensure that the ultrafilter that will be built from is fine. We know that has the finite intersection property.
Limit stages
For limit stages , we simply set .
Successor stages
Given fine-ness, we may, and will, ignore the elements of . At stage , where is a successor ordinal, we consider the sets of and ensure that the power set condition eventually holds for all these sets and their power sets, by adding families of finite sets to in such a way that the finite intersection property is preserved.
As an illustrative and indeed representative example we do the case where .
Let there be given an enumeration of the pairs of elements of .
For the induction, we assume that, by having added appropriate sets of finite sets to , the power set condition holds for and their power sets, and that in the process the finite intersection property has been preserved. The aim is now to extend this so that it also holds for . In other words, we have constructed , and we want to obtain , where .
Definition 17
[TABLE]
Definition 18
[TABLE]
Claim
Either has the finite intersection property, or has the finite intersection property (or both).
Proof
Suppose not. Then there is a finite intersection of elements of such that , and there is a finite intersection of elements of such that . But then and . But So then . But this contradicts the inductive assumption that has the finite intersection property.
Thus define to be if this has the finite intersection property, or otherwise, and by the claim, has the finite intersection property. Now setting we may conclude that has the finite intersection property.
At this point we must extend by adding to :
- •
every set of the form such that ;
- •
every set of the form such that .
Call the resulting set . Our aim is to prove that has the finite intersection property.
Consider an arbitrary non-empty finite family . Without loss of generality we may assume that the ‘judgements’ in of the form or , taken together, describe a finite total pre-ordering relation on some set . Further, we may also assume that for and sets and from , if and only if , and iff . Thus contains witnesses for all the relevant judgements we may be interested in.
Let , so consists only of judgements about sets in . Then we know from the foregoing that . So take some . Our plan is inductively to extend , using the pre-order , to a finite set .
We will add to elements that ensure that the constraints of are satisfied. Moreover, by choosing the elements to be added to from ,444For later stages we will take these sets from , i.e. sets of rank . we ensure that the constraints imposed by remain satisfied. As a result, will satisfy all constraints from , so and hence has the finite intersection property.
As an example, suppose that
[TABLE]
(1) We start by ensuring that is satisfied.
Suppose that already contains elements of . Since , there must be an element . This implies that there are infinitely many infinite sets in such that : we add such elements to , and call the resulting finite set .
(2) We proceed in similar fashion to ensure that is satisfied:
Suppose that already contains elements from , observing that it may be the case that , for there may already be a finite number of elements of in . Since , there must be an element , and since , there must be an element . So there are infinitely many infinite sets in such that : add such elements to , and call the resulting set .
(3) Now suppose that there are elements of in , and elements of in . Moreover, suppose that . (The case where is similar.) Since , but also , there must be some and some . Moreover, since , there are elements . So contains infinitely many infinite sets such that . Similarly, contains infinitely many infinite sets that are outside . So we add a sufficient number of such elements to so that there are an equal number of “witnesses” for as for but where is larger than the number of witnesses for . Call the resulting set .
(4) To conclude, we set . It is clear that .
This procedure of extending easily generalises to any finite total pre-ordering on . Thus we have shown that has the finite intersection property.
This procedure for extending to while preserving the finite intersection property also works for larger successor ordinals: at level (stage with ) we can extend the corresponding using subsets of rank . As we have said above, at limit stages we can simply take unions. Ultimately we set .
The class will then have the finite intersection property, so it can be extended to a filter and then to an ultrafilter . The probability function based on will make the power set condition true for all , and this concludes the proof of theorem 2.
Our proof actually shows something slightly stronger: for all with , we have
[TABLE]
The reason is that in enlarging the set we always have infinitely many elements to choose from.
For any probability measure that satisfies power set condition we also have that :
[TABLE]
where . An easy argument shows this cannot extend to infinite applications of the power set operation.
One might wonder whether the motivations behind the power set condition should not also support imposing the following restricted power set condition. (Thanks to Philip Welch for this question.)
Question 1
Are there probability measures such that
[TABLE]
3.4 The ordinals
In each level of the iterative hierarchy one finds only one new ordinal, but infinitely many new sets that are not ordinals. This might lead one to believe that a probability function on $V$ should satisfy
[TABLE]
where ‘On’ is the class of ordinals.
Just as it seems reasonable to require that the probability of choosing an even natural number from the set of natural numbers must be equal to or infinitesimally close to $1/2$ (see [Wenmackers et al 2013, section 6.2]), it seems reasonable to require that
[TABLE]
where ‘Even’ is the class of even ordinals, which is defined in the obvious way.
Moreover, between any two limit ordinals there are infinitely many successor ordinals, so one might expect
[TABLE]
where ‘Lim’ is the class of limit ordinals.
We will sketch how probability functions can be constructed that meet these expectations. Indeed, we will see that there are probability functions that meet these ‘ordinal expectations’ and in addition meet the size constraint of super-regularity.
Theorem 3
There are super-regular probability functions such that:
** 2. 2.
** 3. 3.
**
Proof. As before, the aim is wisely to choose the ultrafilter on which is based. We want to be such that for all :
- •
* if *
- •
* and *
- •
**
Now we define:
- •
**
- •
**
- •
**
- •
**
And now we set:
[TABLE]
Claim: has the finite intersection property.
Let some be given. Now where , and similarly for , so as before in theorem 1, it suffices to concentrate on the highest values of .
(1) So we start with the finite set and will extend it.
(2) Again we concentrate on one pair such that ; we leave out further cases as they are similar. There are arbitrarily large finite subsets that are -isolated from elements of , meaning that each ordinal in is more than ordinals removed from any ordinal in . We choose any such that is of size at least , and we set .
(3) Now we extend to ensure that all ordinal intervals are of length : for each , we add . Call the resulting finite collection . Note that by our choice of -isolated elements in (2), none of are elements of .
(4) Let . Then we add elements of to and call the resulting set .
*It is now routine to verify that . The case including further sets is similar, thus the claim is verified. So indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter . By design, the resulting probability function has the required properties. *
4 The bootstrapping approach
The probability is obtained by ‘summing up’ the probabilities for all ‘small’ parts of ; such are seen as approximations of .
In the finite snapshot approach, ‘small’ in this context means ‘finite’. But from a conceptual point of view, ‘finite’ might be taken to be too small as far as the test sets (or snapshots) are concerned. Compared to , all sets —and not just the finite sets— are small. So to determine , we should take the ‘limit’ of the values , where is a set of any size. Then if is infinite, cannot just be taken to be given by the ratio formula but needs to be defined.
In the approach to which we now turn (the bootstrapping approach), a probability is determined by the probabilities , where , for a large set, is then in turn determined by probabilities for being smaller ‘snapshots’ than , and so on, until we reach the finite snapshots and can appeal to the probability functions that were discussed in the previous sections. Thus the bootstrapping account can be seen as a generalisation of the finite snapshot approach.
4.1 The rough idea
In general terms, this is how we will proceed:
(1) By the construction from the previous section, a fine ultrafilter on yields a notion of probability on all sets with . In other words, this yields a suitable notion of probability, call it , for every countable set .
(2) The notion of for all with is determined using the notion of probability on countable sets: the probability of on such an is determined by the class of probabilities of on the countable ‘snapshots’ of . Using these countable probability functions, a fine ultrafilter on gives us a notion of probability on sets with .
Again the resulting functions are essentially NAP-functions as defined in [Benci et al 2013]. They are total, regular, etc.
…
() A fine ultrafilter on , together with probability functions for all such that , yields a notion of probability on all sets with .
…
Limit stages of course do not present a problem. So by transfinite recursion on cardinality this yields for every set a notion of probability on .
Then a fine ultrafilter on yields, using the general notion for , a notion that is a total (class) function from properties and random variables to values in a non-Archimedean class field. This probability function again satisfies the principles of the theory NAP in [Benci et al 2013].
For this construction, what we need is suitable (fine) ultrafilters on small, and somewhat larger, and large, sets, and a fine ultrafilter on . But we will see that all the set ultrafilters used in the construction can be uniformly obtained as restrictions to sets of the given fine ultrafilter on . So is determined by one initial choice of , whereby can be seen as the ‘limit’ of its set-restrictions , where the functions can in turn be seen as ‘limits’ of restrictions to their small subsets. This uniform construction has the advantage that the resulting probability functions are all coherent, in the sense that for a set , is the same for all and hence also for .
Now it is time to look at details of the construction.
4.2 Details 1: Restrictions of fine ultrafilters
Since our construction involves ultrafilters on sets with , we make the following definition, which accords with the usual definition of fineness on .
Definition 19
For any infinite cardinal , an ultrafilter on is fine iff for every
[TABLE]
The notion of ‘set-fine’ ultrafilter on is defined in the obvious way.
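For orientation, the standard textbook formulation of fineness can be written as follows. The symbols $\kappa$, $S$, $[S]^{<\kappa}$ (the subsets of $S$ of cardinality below $\kappa$) and $\mathcal{U}$ are assumptions about notation made for this sketch; the condition itself is the usual one, namely that every point of the underlying set lies in ultrafilter-many index sets.

```latex
\[
  \mathcal{U} \text{ is fine on } [S]^{<\kappa}
  \quad\Longleftrightarrow\quad
  \text{for every } x \in S:\;
  \{\, T \in [S]^{<\kappa} : x \in T \,\} \in \mathcal{U}.
\]
```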
We first show that appropriate restrictions of ultrafilters to smaller sets can be obtained in a uniform fashion.
Definition 20
Suppose , , and a fine ultrafilter on , and with . Then we define the restriction of to as follows.
For any , let
[TABLE]
Then
Proposition 7
For any with , there are fine ultrafilters on that restrict to a fine ultrafilter on every with , and .
Further, such ultrafilters are coherent in that if with , then .
Proof. We build the ultrafilter from a pre-filter (i.e., a set with the finite intersection property), which can then be extended to a filter and then to an ultrafilter.
For each , let
[TABLE]
And let for each with and :
[TABLE]
Now set
[TABLE]
It is easy to see that has the finite intersection property and so can be extended to an ultrafilter . And by design, is fine.
Clearly We must check the fine ultrafilter properties:
(1) Fine. This follows from the fact that is fine: for this is witnessed by .
(2) Finite intersection. Let . Then there are such that and . By the finite intersection property of , we know that But So .
(3) Ultra. Take any , and let Let and let Then By the ultra property for , we have or . But and . So or
(4) Non-principality. This is implied by fineness.
(5) Empty set property: We have to show that . It suffices to show that for each , . Since , . But for any set in this intersection, . So
For coherence, take with and let . As it is enough to show that . Now , but by definition, for any we have . Thus .
But this means that this property must also hold for fine ultrafilters on
Consequence 1
There are fine ultrafilters on , such that for every set with , is a fine ultrafilter on and the coherence property holds.
Proof. By the same reasoning as in the previous proposition.
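The finite intersection property that drives these proofs can be checked by brute force in a finite toy model. The sketch below is an illustration only, not the construction of the paper: the names `finite_subsets`, `cone` and `has_finite_intersection_property` are hypothetical, and a four-element set stands in for the underlying set. It shows that the family of 'cones' (for each point, the collection of index sets containing it) has the finite intersection property when the index sets are closed under finite unions, and that the property fails when the index sets are capped at too small a size.

```python
from itertools import combinations

def finite_subsets(universe, max_size):
    """All subsets of `universe` with at most `max_size` elements, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for k in range(max_size + 1) for c in combinations(items, k)]

def cone(x, index_sets):
    """The 'cone' of x: every index set that contains x."""
    return {t for t in index_sets if x in t}

def has_finite_intersection_property(family):
    """Check that every nonempty finite subfamily has a nonempty intersection."""
    family = list(family)
    for k in range(1, len(family) + 1):
        for sub in combinations(family, k):
            if not set.intersection(*sub):
                return False
    return True

S = {0, 1, 2, 3}  # toy stand-in for the underlying set (an assumption)

# Index sets of every size: any finitely many points share a common index set.
all_index_sets = finite_subsets(S, len(S))
cones_full = [cone(x, all_index_sets) for x in S]
fip_full = has_finite_intersection_property(cones_full)

# Index sets capped at two elements: three distinct points share no index set.
capped_index_sets = finite_subsets(S, 2)
cones_capped = [cone(x, capped_index_sets) for x in S]
fip_capped = has_finite_intersection_property(cones_capped)

print(fip_full, fip_capped)
```

In the first case the set `frozenset(S)` belongs to every cone, which is the finite analogue of the observation that finitely many points always lie together in some index set.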
4.3 Details 2: defining probability functions
Now we show how for every set, a probability function on that set can be defined. The same procedure can then be used to define a probability function on , and these probability functions are coherent.
The key is to spell out what is involved in the -th step of the recursive procedure for defining probabilities on sets:
() A fine ultrafilter on (with ), together with probability functions for all such that , yields a notion of probability on .
As in section 2, we define a function such that for all :
[TABLE]
Similarly, we define a function such that for all :
[TABLE]
Then is defined as , and is defined as
[TABLE]
This function will then be an NAP probability function in the sense of [Benci et al 2013].
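The ratios whose ultrafilter equivalence class gives the probability can be made concrete on finite snapshots. The sketch below is a toy under stated assumptions: `count_in` is a hypothetical stand-in for the function written $f_A$ in the text, taken here to count the elements of a snapshot that have a property (as in the finite snapshot case), and initial segments of the natural numbers play the role of snapshots. At later stages of the recursion the values on snapshots would instead come from previously defined probability functions, and the equivalence class modulo the ultrafilter is not itself computable this way.

```python
from fractions import Fraction

def count_in(prop, snapshot):
    """Toy stand-in for f_A(T): number of elements of snapshot T with property A."""
    return sum(1 for x in snapshot if prop(x))

def conditional_ratio(prop_a, prop_b, snapshot):
    """Ratio f_{A and B}(T) / f_B(T) on one snapshot, or None if f_B(T) is zero."""
    denom = count_in(prop_b, snapshot)
    if denom == 0:
        return None
    numer = count_in(lambda x: prop_a(x) and prop_b(x), snapshot)
    return Fraction(numer, denom)

def is_even(n):
    return n % 2 == 0

def is_multiple_of_3(n):
    return n % 3 == 0

# Three hypothetical snapshots: the initial segments {0..5}, {0..11}, {0..12}.
snapshots = [range(n) for n in (6, 12, 13)]
ratios = [conditional_ratio(is_even, is_multiple_of_3, t) for t in snapshots]

print(ratios)  # [Fraction(1, 2), Fraction(1, 2), Fraction(3, 5)]
```

The ratio varies from snapshot to snapshot, which is why the definition passes to an equivalence class modulo an ultrafilter rather than reading off any single value.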
In exactly the same way, we define a class probability function on , using the probability functions on ‘small’ classes (i.e., sets) and ultrafilters on ‘small’ classes, which (given Proposition 7) we can now assume to have been defined on the basis of the ultrafilter on with which we start. The function is total, regular, and uniform for the same reasons that its ‘smaller cousin’ has these properties.
We now check coherence. We do this only for straight probabilities rather than for random variables in general: coherence holds for random variables as well, but it is much more technical to state. Below we use to denote where is the identity random variable.
Proposition 8
For any class and sets with we have
[TABLE]
Proof. We show by induction on that the above holds for all with . Strictly speaking, the range of may be a different non-Archimedean field from the range of , but there is a natural embedding of the former into the latter defined by where for , . This is well-defined as .
Using this embedding we have . Now for we have:
[TABLE]
As we have so by our inductive hypothesis
[TABLE]
But by definition, $\big[\frac{f_{A\cap T}}{f_{T}}\big]_{\mathcal{U}_{S}}=\mathsf{Pr}^{S}(A|T)$, so and we’re done.
4.4 Comparison of the finite snapshot approach and the bootstrapping approach
In our definition of the probability of a set theoretic property, the probability of a property is determined by the probabilities of on large ‘snapshots’ , where a probability (for a large set) is then in turn determined by the probabilities for being smaller ‘snapshots’ than , and so on. Conceptually, the definition in section 4.3 is superior to the simpler definition suggested in section 2: we want to take the behaviour of the property on as many and as large ‘snapshots’ as possible into account.
It is not straightforward to compare the simple and the more involved definition: the simple method is based on an ultrafilter on whereas the more involved method is based on an ultrafilter on .
The obvious suggestion is to base the comparison on the relation between a probability function determined by an ultrafilter on and its restriction to defined as . (This is a different notion of restriction from that defined in the previous section: here we are only restricting the index, while the underlying class remains the same ().) But:
Proposition 9
Not all ultrafilters on restrict to ultrafilters on to .
Proof. Consider , where is the set of atoms (guaranteeing fineness) and is the relative complement of in . Then has the finite intersection property and so can be extended to a fine ultrafilter on . But . So does not restrict to an ultrafilter on .
On the other hand, every fine ultrafilter on restricting to an ultrafilter on is essentially an ultrafilter on :
Proposition 10
Suppose is a fine ultrafilter on restricting to an ultrafilter on . Then .
Proof. Since is ultra, we have or . But if , then , so that does not restrict, contradicting the assumption. So .
This means that the essentially involved probability functions on cannot be reduced to ‘simple’ probability functions on .
5 Conclusion
In this article we have explored two methods for modelling, by means of non-Archimedean probability functions, the properties of random variables ranging over the set theoretic universe: the finite snapshot method and the bootstrapping method. Concerning the finite snapshot method, we found that many of the probabilistic properties that seem intuitively plausible can be satisfied. The bootstrapping method is more satisfying from a conceptual point of view, but we have only been able to show that the resulting probability functions satisfy minimal requirements. So much work remains to be done.
- [Alon et al 2000] Alon, N. & Spencer, J. The Probabilistic Method. Second edition, Wiley, 2000.
- [Benci et al 2003] Benci, V. & Di Nasso, M. Numerosities of labelled sets. A new way of counting. Adv. Math. 173 (2003), p. 50–67.
- [Benci et al 2007] Benci, V., Di Nasso, M. & Forti, M. An Euclidean measure of size for mathematical universes. Logique et Analyse 50 (2007), p. 43–62.
- [Benci et al 2013] Benci, V., Horsten, L. & Wenmackers, S. Non-Archimedean probability. Milan Journal of Mathematics 81 (2013), p. 121–151.
- [Benci et al 2018] Benci, V., Horsten, L. & Wenmackers, S. Infinitesimal probabilities. British Journal for the Philosophy of Science 69 (2018), p. 509–552.
- [Brickhill et al 2018] Brickhill, H. & Horsten, L. Triangulating non-Archimedean probability. Review of Symbolic Logic 11 (2018), p. 519–546.
- [Freiling 1986] Freiling, C. Axioms of symmetry. Throwing darts at the real number line. Journal of Symbolic Logic 51 (1986), p. 190–200.
- [Hamkins 2015] Hamkins, J. Is the dream solution to the continuum hypothesis attainable? Notre Dame Journal of Formal Logic 56.1 (2015), p. 135–145.
