Sets and Probability

Hazel Brickhill; Leon Horsten

arXiv:1903.08361·math.LO·March 21, 2019

TL;DR

This paper investigates the concept of random variables within set theory, focusing on what it means for a random set to have a certain probability of belonging to a predefined class of sets.

Contribution

It introduces a novel perspective on random variables in set theory and explores their probabilistic properties within this framework.

Findings

01 Defined the notion of random variables over the set-theoretic universe

02 Analyzed probabilities of random sets belonging to specific classes

03 Provided foundational insights for probabilistic set theory

Abstract

In this article the idea of random variables over the set theoretic universe is investigated. We explore what it can mean for a random set to have a specific probability of belonging to an antecedently given class of sets.

Equations

$$\mathsf{Pr}(\sigma\in A\mid\tau\in B),$$

$$f\approx_{\mathcal{U}}g\;\equiv\;\{T\in[V]^{<\omega}:f(T)=g(T)\}\in\mathcal{U}.$$

$$[f]_{\mathcal{U}}=[g]_{\mathcal{U}}\Leftrightarrow f\approx_{\mathcal{U}}g.$$

$$f_{\theta\in A}(T)\equiv\frac{\left|\{s\in T:\theta(s)\in A\}\right|}{\left|T\right|}.$$

$$f_{\theta\in A\wedge\nu\in B}(T)\equiv\frac{\left|\{s\in T:\theta(s)\in A\text{ and }\nu(s)\in B\}\right|}{\left|T\right|}.$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\equiv[f_{\theta\in A}]_{\mathcal{U}}.$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A\mid\nu\in B)\equiv\frac{\mathsf{Pr}_{\mathcal{U}}(\theta\in A\wedge\nu\in B)}{\mathsf{Pr}_{\mathcal{U}}(\nu\in B)}.$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta=x)=\mathsf{Pr}_{\mathcal{U}}(\theta=y).$$

$$A\subsetneq B\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B).$$

$$f(T)=\sum_{i\in I\cap T}q_{i}.$$

$$\sum^{*}_{i\in I}q_{i}\equiv[f]_{\mathcal{U}}.$$

$$\mathsf{Pr}_{\mathcal{U}}(\tau\in A)=\sum^{*}_{i\in I}\mathsf{Pr}_{\mathcal{U}}(\tau\in A_{i}).$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\neq\mathsf{Pr}_{\mathcal{U}}(\theta^{\prime}\in A).$$

$$A\oplus\alpha\equiv\{\beta:\exists\gamma\in A\text{ such that }\beta=\gamma+\alpha\}.$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)=\mathsf{Pr}_{\mathcal{U}}(\theta\in A\oplus\alpha).$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa)=\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa\oplus\alpha).$$

$$\forall A,B\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Rightarrow\exists C\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in B)=\mathsf{Pr}_{\mathcal{U}}(\theta\in A)+\mathsf{Pr}_{\mathcal{U}}(\theta\in C).$$

$$\left|A\right|=\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\tau\in A)=\mathsf{Pr}_{\mathcal{U}}(\tau\in B).$$

$$\frac{\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)}{\mathsf{Pr}_{\mathcal{U}}(\tau\in B)}\approx 0.$$

$$\left|A\right|<\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\delta\in A)<\mathsf{Pr}_{\mathcal{U}}(\delta\in B).$$

$$\left|A\right|<\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\sigma\in B).$$

$$\omega\leq\left|A\right|<\left|B\right|\leq\left|V\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\theta\in B).$$

$$\left\{D\in[V]^{<\omega}:\frac{\mathsf{Pr}(\theta\in A\mid\theta\in D)}{\mathsf{Pr}(\theta\in B\mid\theta\in D)}\leq n^{-1}\right\}\in\mathcal{U}.$$

$$C^{n}_{AB}\equiv\left\{D\in[V]^{<\omega}:\frac{\mathsf{Pr}(\theta\in A\mid\theta\in D)}{\mathsf{Pr}(\theta\in B\mid\theta\in D)}\leq n^{-1}\right\}.$$

$$A_{x}\equiv\{D\in[V]^{<\omega}:x\in D\}.$$

$$\mathcal{F}\equiv\{C^{n}_{AB}:n\in\mathbb{N},\left|A\right|<\left|B\right|\}\cup\{A_{x}:x\in V\}.$$

$$\frac{\mathsf{Pr}(\theta\in A_{j}\mid\theta\in F)}{\mathsf{Pr}(\theta\in B_{j}\mid\theta\in F)}\leq n^{-1},$$

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)=\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Rightarrow\left|A\right|=\left|B\right|.$$

$$\forall A,B\in V:\left|A\right|<\left|B\right|\Rightarrow\left|\mathcal{P}(A)\right|<\left|\mathcal{P}(B)\right|$$

$$\forall A,B\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Leftrightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathcal{P}(A))<\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathcal{P}(B)).$$

Taxonomy

Topics: Computability, Logic, AI Algorithms · Mathematical and Theoretical Analysis · Advanced Topology and Set Theory

Full text

Sets and Probability

Versions of this paper have been presented at a Bristol–Leuven workshop on Logic and Philosophy of Science (2015), at the Philosophy Department of the Universidade Federal do Rio Grande do Norte (2015), the Fourth Reasoning Conference in Manchester (2015), the Philosophy of Mathematics Seminar in Oxford (2014), and at the Philosophy Departmental Research Seminar in Aberdeen (2014). We are grateful to the audiences for helpful comments, questions, and suggestions. In this respect we are especially indebted to Philip Welch, George Wilmers, and Sylvia Wenmackers.

Hazel Brickhill and Leon Horsten

1 Introduction

Probabilistic notions have been applied to mathematical objects and notions. For instance, probabilistic concepts have been applied in the theory of random graphs [Alon et al 2000]. The aim of this article is to apply a notion of probability to the mathematical universe as a whole. More specifically, we wish to explicate what it could mean for a property $A$ of sets to have a probability of being true of a set $y$ in the set theoretic universe $V$. Properties are identified with their extensions, so that $A$ ranges over all proper and improper classes in $V$.

The aim is to develop a theory of the probability of events of the form $A(\tau)$, where $A$ is a class and the variable $\tau$ is a random variable. The outcome space of the random variables is of course $V$. The state space of the random variables has to be at least as large as $V$ because there must be enough states for a random variable to take each set as a possible value. On the other hand, there is no need for it to be larger than $V$. Therefore the state space is simply identified with $V$.

Without invoking a fixed set of postulates, intuitions about probability have occasionally been used in set theory, for instance to motivate new basic principles [Freiling 1986]. However, such attempts are mostly regarded as unsuccessful [Hamkins 2015]. In the light of this it is natural to wonder what we should require from probability functions associated with random variables on $V$.

Surely it would be unreasonable to insist on there being one unique correct probability function that yields the probability of a random variable taking a value in a given class of sets. On the other hand, for our functions to have any hope of meriting the label probability function, they have to satisfy Kolmogorov’s conditions for being a finitely additive probability function.

From the outset we impose additional constraints on the class of probability functions that we are interested in. (For a discussion of these constraints in the context of non-Archimedean probability theory, see [Benci et al 2018].)

1. Totality. The probability functions are defined on all classes.

2. Uniformity. All singleton events are given the same probability.

3. Regularity. All singleton events are given non-zero probability.

All this means, for familiar reasons, that the sought-for probability functions cannot be Kolmogorov probability functions. Given our insistence on finite additivity, this means that the probability functions will be non-Archimedean. They will not satisfy $\sigma$ -additivity, but they will instead satisfy a generalised infinite additivity rule.

In mathematics today, the term ‘probability’ has become virtually synonymous with ‘function that satisfies the Kolmogorov axioms (including $\sigma$-additivity)’. If you see matters this way, then you will be loath to dignify the functions constructed in this paper by the term ‘probability function’. Nonetheless, you may ask whether a fine-grained quantitative theory of possibility, with which the degree of possibility of properties can quantitatively be compared, can be constructed. This is what is investigated in the present article. So, if you prefer, you can call the theory constructed in this paper a quantitative theory of possibility. You are then advised to replace all occurrences of ‘(non-standard) probability function’ by ‘quantitative possibility function’.

The project in which we are engaging in this article is related to the work in [Benci et al 2007]. The aim of the latter article is to construct a theory of sizes for mathematical universes inspired by the Euclidean principle that the size of the whole is larger than the sizes of its proper parts. Now there is of course a familiar theory of size, Cantor’s theory of cardinality, which does not satisfy this Euclidean principle. So Benci and his co-authors propose their Euclidean theory of size as a rival to Cantor’s theory.

We, on the other hand, fully accept Cantor’s theory of cardinality. Nonetheless, the probability functions that will be constructed satisfy the Euclidean principle that the probability of an event is strictly greater than the probability of each of its sub-events. Moreover, the mathematical techniques for generating them are closely related to the techniques that are used in [Benci et al 2007].

What we shall mean by ‘mathematical universe’ is not the same as what is meant in [Benci et al 2007] by the term. The authors of [Benci et al 2007] impose mainly algebraic constraints on what counts as a mathematical universe [Benci et al 2007, Introduction]. We, in contrast, take the term ‘mathematical universe’ in the set theoretical sense. Naively, you may take there to be one preferred set theoretic universe: $V$. But if you are uncomfortable with taking $V$ as given, then you might want to take a mathematical universe to be a rank $V_{\alpha}$ that constitutes a model of most or perhaps even all of the standard principles of set theory. Indeed, we will see that for random variables defined on any large set $S$, the general idea of equipping them with a probability function will be the same as that for random variables on $V$.

We will discuss two ways of generating non-Archimedean probability functions for random variables on $V$ . In section 2 a simple way of generating such probability functions (the finite snapshot approach) will be described. In section 3 we go on to discuss how global properties of these probability functions can be made to hold by imposing constraints on the process of generating such functions. In section 4, a theoretically more satisfying but also more complicated way of generating non-Archimedean probability functions for random variables on $V$ is discussed (the bootstrapping method).

2 The finite snapshot approach

A random variable $\tau$ on $V$ is a function from states to the outcome space, i.e., an element of ${{}^{V}}V$ . So there are many random variables on $V$ . The aim is to associate a notion of probability with elements of ${{}^{V}}V$ that meet the minimal constraints (totality, uniformity and regularity) that were described in section 1.

In fact, we want to give precise meaning to conditional probability statements of the form

$$\mathsf{Pr}(\sigma\in A\mid\tau\in B),$$

where $\sigma,\tau\in{{}^{V}}V$ and $A,B\subseteq V$. But we will see that it will be sufficient for our purposes to give meaning to unconditional probability statements of the form $\mathsf{Pr}(\sigma\in A)$. So our fundamental problem amounts to giving meaning to expressions of this form. Such probability measures will be determined by a choice of a fine ultrafilter on the collection $[V]^{<\omega}$ of finite subsets of the state space. (What follows is an adaptation of the approach of [Brickhill et al 2018, section 2].)

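The starting point is a fine ultrafilter $\mathcal{U}$ on $[V]^{<\omega}$. Here fine means that for every $x\in V$, the set $\{T\in[V]^{<\omega}:x\in T\}$ belongs to $\mathcal{U}$. This fine ultrafilter $\mathcal{U}$ defines a non-Archimedean field $\mathcal{F}_{\mathcal{U}}$ in the following way.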

For any two functions $f,g:[V]^{<\omega}\rightarrow\mathbb{Q}$ we define:

Definition 1

$$f\approx_{\mathcal{U}}g\;\equiv\;\{T\in[V]^{<\omega}:f(T)=g(T)\}\in\mathcal{U}.$$

In words: two functions are identified if they coincide on ultrafilter-many finite sets of states.

The relation $\approx_{\mathcal{U}}$ is an equivalence relation, so we can take equivalence classes for which we then have

$$[f]_{\mathcal{U}}=[g]_{\mathcal{U}}\Leftrightarrow f\approx_{\mathcal{U}}g.$$

Moreover, it is again a routine exercise to verify that the $[f]_{\mathcal{U}}$ ’s form a hyper-rational field $\mathcal{F}_{\mathcal{U}}$ .

Now suppose $A\subseteq V$ and $\theta\in{{}^{V}}V$ . Then we define the function $f_{\theta\in A}:[V]^{<\omega}\rightarrow\mathbb{Q}$ as follows:

Definition 2

For every $T\in[V]^{<\omega}:$

$$f_{\theta\in A}(T)\equiv\frac{\left|\{s\in T:\theta(s)\in A\}\right|}{\left|T\right|}.$$

In words: for every finite set of states $T$ , $f_{\theta\in A}(T)$ is the ratio between the number of states $s$ in $T$ for which $\theta(s)\in A$ and the number of states in $T$ . In this sense, $f_{\theta\in A}(T)$ is the probability of $\theta\in A$ on a finite snapshot of states.
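The snapshot ratio of Definition 2 can be computed directly on toy data. The following is a minimal sketch, assuming a small state space of natural numbers; the names `snapshot_frequency`, `theta`, and `is_even` are illustrative and not from the paper, and rejecting the empty snapshot is an assumption made here to avoid division by zero.

```python
from fractions import Fraction

def snapshot_frequency(theta, in_A, T):
    """f_{theta in A}(T): fraction of states s in T with theta(s) in A."""
    T = list(T)
    if not T:
        # Assumption of this sketch: the empty snapshot is excluded.
        raise ValueError("the snapshot T must be non-empty")
    hits = sum(1 for s in T if in_A(theta(s)))
    return Fraction(hits, len(T))

# Hypothetical toy data: states and values are small natural numbers.
theta = lambda s: s               # takes each value exactly once on this toy space
is_even = lambda x: x % 2 == 0    # membership test standing in for the class A

T = {0, 1, 2, 3, 4}
print(snapshot_frequency(theta, is_even, T))   # 3/5
```

The value $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$ itself is not any single such ratio: it is the equivalence class of the whole function $T\mapsto f_{\theta\in A}(T)$ modulo the ultrafilter, which a program cannot enumerate.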

Similarly, we define the function $f_{\theta\in A\wedge\nu\in B}$ as follows:

Definition 3

For every $T\in[V]^{<\omega}:$

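$$f_{\theta\in A\wedge\nu\in B}(T)\equiv\frac{\left|\{s\in T:\theta(s)\in A\text{ and }\nu(s)\in B\}\right|}{\left|T\right|}.$$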

Now we are ready to define the probability of $\theta\in A$ , relative to a fine (and therefore free) ultrafilter $\mathcal{U}$ on $[V]^{<\omega}$ :

Definition 4

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\equiv[f_{\theta\in A}]_{\mathcal{U}}.$$

Similarly, we define $\mathsf{Pr}_{\mathcal{U}}(\theta\in A\wedge\nu\in B)$ as $[f_{\theta\in A\wedge\nu\in B}]_{\mathcal{U}}$ . Thus we have constructed a probability function $\mathsf{Pr}_{\mathcal{U}}$ that takes its values in the hyper-rational field $\mathcal{F}_{\mathcal{U}}$ . Such probability functions are sometimes called NAP functions.

Conditional probability can then be expressed in terms of unconditional probability:

Definition 5

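$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A\mid\nu\in B)\equiv\frac{\mathsf{Pr}_{\mathcal{U}}(\theta\in A\wedge\nu\in B)}{\mathsf{Pr}_{\mathcal{U}}(\nu\in B)}.$$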
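On a single finite snapshot, the quotient in Definition 5 reduces to an ordinary conditional relative frequency. The sketch below illustrates this on hypothetical toy data; the function names are illustrative, and the sketch only covers snapshots on which the conditioning event has a non-zero count.

```python
from fractions import Fraction

def joint_frequency(theta, in_A, nu, in_B, T):
    """f_{theta in A and nu in B}(T) on a non-empty finite snapshot T."""
    T = list(T)
    hits = sum(1 for s in T if in_A(theta(s)) and in_B(nu(s)))
    return Fraction(hits, len(T))

def conditional_frequency(theta, in_A, nu, in_B, T):
    """Snapshot analogue of Pr(theta in A | nu in B): joint / marginal."""
    joint = joint_frequency(theta, in_A, nu, in_B, T)
    # The marginal of 'nu in B' is the joint of the event with itself.
    marginal = joint_frequency(nu, in_B, nu, in_B, T)
    if marginal == 0:
        raise ZeroDivisionError("the conditioning event has no states in T")
    return joint / marginal

# Hypothetical toy data: both variables are the identity on small naturals.
identity = lambda s: s
is_even = lambda x: x % 2 == 0
below_four = lambda x: x < 4

print(conditional_frequency(identity, is_even, identity, below_four, range(6)))  # 1/2
```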

3 Constraints

From section 1 we know that the aim is not to arrive at a unique (correct) probability function on $V$ . But we did insist from the outset on our probability functions satisfying three global constraints: totality, uniformity, and regularity. It will be shown that these properties are always guaranteed to hold.

There are further global conditions on probability functions on $V$ that seem reasonable to require, and that are not guaranteed to hold without further work. These global constraints will be explored. We will show that many of them can be forced to hold by imposing constraints on the ultrafilters from which the probability functions are generated.

3.1 Elementary properties

The definition of $\mathsf{Pr}_{\mathcal{U}}$ is relative to an initial choice of the fine ultrafilter $\mathcal{U}$ . The properties of $\mathsf{Pr}_{\mathcal{U}}$ depend on $\mathcal{U}$ . Nonetheless, certain basic properties of $\mathsf{Pr}_{\mathcal{U}}$ can be easily seen to hold regardless of which fine ultrafilter $\mathcal{U}$ is chosen:

Proposition 1

1. $\mathsf{Pr}_{\mathcal{U}}$ is a finitely additive probability function;

2. $\mathsf{Pr}_{\mathcal{U}}$ is Euclidean.

Proof. Easy.

Now we define the notion of a diagonal random variable:

Definition 6

A random variable $\theta$ is said to be a diagonal random variable if for any set $x$ , there is exactly one element $u$ of the state space such that $\theta(u)=x$ .

In words: a diagonal random variable is a random variable that takes every value exactly once.

Using this notion, we define the notions of regularity and uniformity:

Definition 7 (regularity)

A probability function $\mathsf{Pr}_{\mathcal{U}}$ is regular if for every diagonal random variable $\theta$ and for every $x\in V$, $\mathsf{Pr}_{\mathcal{U}}(\theta=x)>0$.

Definition 8 (uniformity)

A probability function $\mathsf{Pr}_{\mathcal{U}}$ is uniform if for every diagonal random variable $\theta$ and for all $x,y\in V:$

$$\mathsf{Pr}_{\mathcal{U}}(\theta=x)=\mathsf{Pr}_{\mathcal{U}}(\theta=y).$$

Proposition 2

For every fine ultrafilter $\mathcal{U}$ :

1. $\mathsf{Pr}_{\mathcal{U}}$ is regular;

2. $\mathsf{Pr}_{\mathcal{U}}$ is uniform.

Proof. These properties are proved as propositions 2.5 and 2.6 in [Brickhill et al 2018, p. 525–526].
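The intuition behind regularity and uniformity can be checked on snapshots: for a diagonal variable, every snapshot $T$ that contains the states mapped to $x$ and to $y$ gives both singleton events the same positive ratio $1/|T|$, and fineness places the family of such snapshots in $\mathcal{U}$. The following sketch illustrates only the snapshot-level computation, on hypothetical toy data with the identity as diagonal variable; it is not the cited proof.

```python
from fractions import Fraction

def singleton_frequency(theta, x, T):
    """f_{theta = x}(T): fraction of states in T that theta maps to x."""
    T = list(T)
    return Fraction(sum(1 for s in T if theta(s) == x), len(T))

theta = lambda s: s   # identity: takes each value exactly once (toy assumption)
x, y = 3, 7

# Snapshots that contain the states mapped to x and to y.
for T in [{3, 7}, {0, 3, 7, 9}, set(range(10))]:
    fx = singleton_frequency(theta, x, T)
    fy = singleton_frequency(theta, y, T)
    # Uniform (equal) and regular (strictly positive) on each such snapshot.
    assert fx == fy == Fraction(1, len(T)) and fx > 0
```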

The Euclidean property is formally defined as follows:

Definition 9 (Euclidean)

A probability function $\mathsf{Pr}_{\mathcal{U}}$ is Euclidean if for every diagonal random variable $\theta$ and all $A,B\subseteq V$ :

$$A\subsetneq B\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B).$$

Then we have:

Proposition 3

For every fine ultrafilter $\mathcal{U}$ , the probability function $\mathsf{Pr}_{\mathcal{U}}$ is Euclidean.

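Proof. By finite additivity and regularity: if $A\subsetneq B$, pick $x\in B\setminus A$; then $\mathsf{Pr}_{\mathcal{U}}(\theta\in B)=\mathsf{Pr}_{\mathcal{U}}(\theta\in A)+\mathsf{Pr}_{\mathcal{U}}(\theta\in B\setminus A)\geq\mathsf{Pr}_{\mathcal{U}}(\theta\in A)+\mathsf{Pr}_{\mathcal{U}}(\theta=x)>\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$.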

Now we turn to infinite additivity. Countable additivity means that the probability of the union of a countable family of disjoint sets is the infinite sum of the probabilities of the elements of the family, where the notion of infinite sum is spelled out in terms of the classical notion of limit. In the present setting, the probability $Pr_{\mathcal{U}}$ of the union of any family of disjoint sets is also the infinite sum of the probabilities of the elements of the family [Benci et al 2013, section 3.4]. But now the notion of infinite sum is spelled out in terms of the generalised notion of limit based on the ultrafilter $\mathcal{U}$ . More precisely, the new notion of infinite sum is defined as follows. Suppose we are given a family $\{q_{i}:i\in\mathbb{N}\}$ of rational numbers, and $I\subseteq\mathbb{N}$ . Then consider the function $f:[\mathbb{N}]^{<\omega}\rightarrow\mathbb{Q}$ given by

$$f(T)=\sum_{i\in I\cap T}q_{i}.$$

This function can be seen as giving the value of the infinite sum on all finite parts (“snapshots”) of the index set. So we identify the infinite sum of the family $\{q_{i}:i\in I\}$ of rational numbers with the generalised limit of $f$ according to the ultrafilter $\mathcal{U}$ :

Definition 10

$$\sum^{*}_{i\in I}q_{i}\equiv[f]_{\mathcal{U}}.$$
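The function $f$ that underlies this generalised sum is easy to evaluate on individual snapshots. The sketch below is an illustration under assumptions: the family $q_{i}=2^{-(i+1)}$ and the index set of even numbers are hypothetical choices, and the names `snapshot_sum`, `q`, and `in_I` do not come from the paper.

```python
from fractions import Fraction

def snapshot_sum(q, in_I, T):
    """f(T): the sum of q_i over the indices i that lie in both I and T."""
    return sum((q(i) for i in T if in_I(i)), Fraction(0))

# Hypothetical family q_i = 2^-(i+1) and index set I = even numbers.
q = lambda i: Fraction(1, 2 ** (i + 1))
in_I = lambda i: i % 2 == 0

for T in [{0}, {0, 1, 2}, {0, 1, 2, 3, 4}]:
    print(sorted(T), snapshot_sum(q, in_I, T))
```

The generalised sum is the class $[f]_{\mathcal{U}}$ of this function, not the classical limit of its values along any one chain of snapshots.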

Using this notion of infinite sum, we can express the probability of the union of a disjoint family of sets as the sum of the probabilities of the members of that family:

Proposition 4

If $A=\bigcup_{i\in I}A_{i}$, with $A_{i}\cap A_{j}=\emptyset$ for all distinct $i,j\in I$, then for every random variable $\tau$:

$$\mathsf{Pr}_{\mathcal{U}}(\tau\in A)=\sum^{*}_{i\in I}\mathsf{Pr}_{\mathcal{U}}(\tau\in A_{i}).$$

In sum, $\mathsf{Pr}_{\mathcal{U}}$ has a natural infinite additivity property that is sometimes called perfect additivity.

Proposition 5

For every fine ultrafilter $\mathcal{U}$ , the probability function $\mathsf{Pr}_{\mathcal{U}}$ is perfectly additive.

Proof. This proposition is proved as proposition 8 in [Benci et al 2013, p. 132–133].

3.2 Symmetry principles

From now on, the symbol $\theta$ will be used to refer to some arbitrary diagonal random variable. When it is not assumed that the random variable in question is diagonal, we will write $\tau$ .

The Euclidean-ness of $\mathsf{Pr}_{\mathcal{U}}$ has implications for symmetry principles. As a rule of thumb, one can say that symmetry principles fail (see [Benci et al 2007], [Benci et al 2013], [Benci et al 2018]).

Proposition 6

For every fine ultrafilter $\mathcal{U}$, the probability function $\mathsf{Pr}_{\mathcal{U}}$ is not invariant under all permutations of $V$.

Proof. We concentrate on $\mathbb{N}$ as it is canonically represented in $V$ (by means of the Zermelo ordinals, for instance). Define a permutation $\pi$ of $V$ as follows:

• $\pi(x)=x$ for $x\in V\setminus\mathbb{N}$; otherwise:

• $\pi(x)=x+2$ for $x$ even;

• $\pi(1)=0$;

• $\pi(x)=x-2$ for $x$ odd and $>1$.

Let $A\equiv\{0,2,4,\ldots\}$, and let $\theta$ be a diagonal random variable. Then $\pi(A)=\{2,4,6,\ldots\}\subsetneq A$. Therefore, by the Euclidean principle, $\mathsf{Pr}_{\mathcal{U}}(\theta\in\pi(A))<\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$.
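The permutation used in this proof can be sanity-checked on an initial segment of $\mathbb{N}$. The sketch below encodes natural numbers as Python integers (an assumption of the illustration; the proof works with a set-theoretic representation) and confirms that $\pi$ is injective there, that it maps the even numbers into the even numbers while missing $0$, and that the snapshot ratio of $\pi(A)$ is strictly smaller than that of $A$ on one sample snapshot.

```python
from fractions import Fraction

def pi(x):
    """The permutation from the proof, with naturals encoded as Python ints."""
    if not (isinstance(x, int) and x >= 0):
        return x              # pi fixes everything outside N
    if x % 2 == 0:
        return x + 2          # even numbers move up by two
    if x == 1:
        return 0              # 1 fills the gap left at 0
    return x - 2              # odd numbers above 1 move down by two

N = 20
image = {pi(x) for x in range(N)}
assert len(image) == N                 # injective on this initial segment
assert set(range(N - 2)) <= image      # every value below N - 2 is hit

evens = set(range(0, N, 2))
image_of_evens = {pi(x) for x in evens}
assert 0 not in image_of_evens and all(v % 2 == 0 for v in image_of_evens)

# Snapshot ratios for A = evens and pi(A) = evens >= 2 on T = {0, ..., 9},
# taking the identity as the diagonal variable (toy assumption).
T = set(range(10))
freq_A = Fraction(len({t for t in T if t % 2 == 0}), len(T))
freq_piA = Fraction(len({t for t in T if t % 2 == 0 and t >= 2}), len(T))
print(freq_A, freq_piA)   # 1/2 2/5
```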

This of course entails that there are diagonal random variables $\theta,\theta^{\prime}$ such that for some $A\subseteq V$ ,

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\neq\mathsf{Pr}_{\mathcal{U}}(\theta^{\prime}\in A).$$

One popular global constraint on probability measures is translation-invariance. The Lebesgue measure has this property, and Banach limits seem to occupy a privileged position in the class of generalised limits at least in part because they are translation-invariant. In our context, translation-invariance does not make obvious sense. For a random class $A$ , it is not clear what ‘ $A+\alpha$ ’ (where $\alpha$ is a number) means. But a clear interpretation of ‘adding an ordinal number’ can of course be given if $A$ is a collection of ordinals:

Definition 11

For $A$ any collection of ordinals:

$$A\oplus\alpha\equiv\{\beta:\exists\gamma\in A\text{ such that }\beta=\gamma+\alpha\}.$$

Then for $\mathsf{Pr}_{\mathcal{U}}$ to be translation-invariant with respect to $A$ means that for all ordinals $\alpha$ and for every $\theta$,

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)=\mathsf{Pr}_{\mathcal{U}}(\theta\in A\oplus\alpha).$$

However, even if we consider non-Archimedean measures (of the kind that we have been describing) on ordinals, translation-invariance conflicts with the Euclidean Property of our generalised probability functions. In particular, there is no $\mathrm{NAP}$ probability function $\mathsf{Pr}_{\mathcal{U}}$ on any infinite cardinal $\kappa$ such that there is even one ordinal $\alpha$ with $0<\alpha<\kappa$ and

$$\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa)=\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa\oplus\alpha).$$

The reason is simple. We have $\kappa\oplus\alpha=\kappa\backslash\alpha\subsetneq\kappa,$ so if we had $\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa)=\mathsf{Pr}_{\mathcal{U}}(\theta\in\kappa\oplus\alpha),$ then we would contradict the Euclidean principle.
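For the special case $\kappa=\omega$ with a finite shift $\alpha$, the effect is visible on snapshots. The sketch below is a toy illustration only: it encodes natural numbers as Python integers, takes the identity as the diagonal variable, and uses the hypothetical helper names `translate` and `frequency`.

```python
from fractions import Fraction

def translate(in_A, alpha):
    """Membership test for A (+) alpha when A is a set of naturals:
    beta belongs to it iff beta = gamma + alpha for some gamma in A."""
    return lambda beta: beta >= alpha and in_A(beta - alpha)

def frequency(pred, T):
    """Fraction of the elements of the non-empty snapshot T satisfying pred."""
    T = list(T)
    return Fraction(sum(1 for s in T if pred(s)), len(T))

in_omega = lambda n: True            # every natural number belongs to omega
shifted = translate(in_omega, 3)     # omega (+) 3 = {3, 4, 5, ...}

T = range(10)
print(frequency(in_omega, T), frequency(shifted, T))   # 1 7/10
```

On every snapshot that contains one of $0,1,2$, the translated set has a strictly smaller ratio, which matches the strict inequality demanded by the Euclidean principle.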

As this example shows, such translations are not necessarily one-to-one, so we may not want full invariance in general. In [Benci et al 2007, section 1.3], Benci, Forti, and Di Nasso explore a restricted notion of translation-invariance of $\mathrm{NAP}$-like measures on ordinals. We do not pursue this theme further here, but only pause to note that there are other reasonable-looking principles that are hard to satisfy. In the context of their theory of numerosities, Benci, Forti, and Di Nasso consider a principle that in the present context would take the following form:

Definition 12 (Difference Principle)

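$$\forall A,B\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Rightarrow\exists C\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in B)=\mathsf{Pr}_{\mathcal{U}}(\theta\in A)+\mathsf{Pr}_{\mathcal{U}}(\theta\in C).$$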

On countable sample spaces, the difference principle can be made to hold by building $\mathsf{Pr}_{\mathcal{U}}$ from a selective ultrafilter [Benci et al 2003]. But the existence of selective ultrafilters is independent of ZFC. As far as we know, it is an open question whether the difference principle can be consistently made to hold for NAP probability functions on uncountable sample spaces.

3.3 Probability and cardinality

In this (sub-)section we investigate the relation between our notion of generalised probability on the one hand, and the familiar notion of cardinality on the other hand.

3.3.1 Hume’s principle for probability

One might naively wonder whether the following probabilistic analogue of Hume’s Principle for cardinality can hold:

Definition 13 (Hume’s principle for probability)

For all $A,B\in V$ :

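$$\left|A\right|=\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\tau\in A)=\mathsf{Pr}_{\mathcal{U}}(\tau\in B).$$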

But the probability functions $\mathsf{Pr}_{\mathcal{U}}$ that we have been considering cannot satisfy Hume’s principle for probability, as its failure is an immediate consequence of Proposition 6: invariance under permutations and Hume’s principle for probability are mathematically equivalent. However, this was only to be expected. After all, we do not expect Kolmogorov probability (on infinite spaces) to satisfy any such principle.

3.3.2 Superregularity

The hyper-rational field $\mathcal{F}_{\mathcal{U}}$ in which the probability functions $\mathsf{Pr}_{\mathcal{U}}$ take their values contains infinitesimal numbers; this is what makes it non-Archimedean. We will write $\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)\approx 0$ if $\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)<n^{-1}$ for each $n\in\mathbb{N}$. And we will write $\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\tau\in B)$ if

$$\frac{\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)}{\mathsf{Pr}_{\mathcal{U}}(\tau\in B)}\approx 0.$$

We have seen that $\mathsf{Pr}_{\mathcal{U}}$ cannot satisfy Hume’s principle for probability. But, at least at first sight, it seems that it would be reasonable to demand:

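$$\left|A\right|<\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\delta\in A)<\mathsf{Pr}_{\mathcal{U}}(\delta\in B).$$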

Indeed, if in addition $\left|B\right|\geq\omega$ , then we might even expect

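$$\left|A\right|<\left|B\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\sigma\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\sigma\in B).$$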

Further, this may be expected to hold if $B$ is a proper class but $A$ is a set. The result is a size constraint which is a strengthening of the requirement of regularity:

Definition 14 (Superregularity)

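$$\omega\leq\left|A\right|<\left|B\right|\leq\left|V\right|\Rightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\theta\in B).$$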

Note that if $A$ is finite and $B$ is infinite then the consequent holds automatically.

By a suitable restriction on admissible ultrafilters $\mathcal{U}$ , superregularity can indeed be made to hold:

Theorem 1

There are fine ultrafilters $\mathcal{U}$ such that $\mathsf{Pr}_{\mathcal{U}}$ is superregular.

Proof.

If $A,B\in V$ such that $\omega\leq\left|A\right|<\left|B\right|$ are given, then we have $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)\ll\mathsf{Pr}_{\mathcal{U}}(\theta\in B)$ if and only if for each $n\in\mathbb{N}$ ,

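$$\left\{D\in[V]^{<\omega}:\frac{\mathsf{Pr}(\theta\in A\mid\theta\in D)}{\mathsf{Pr}(\theta\in B\mid\theta\in D)}\leq n^{-1}\right\}\in\mathcal{U}.$$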

The aim is to build an ultrafilter $\mathcal{U}$ for which this holds.

For any $n\in\mathbb{N}$ , define

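$$C^{n}_{AB}\equiv\left\{D\in[V]^{<\omega}:\frac{\mathsf{Pr}(\theta\in A\mid\theta\in D)}{\mathsf{Pr}(\theta\in B\mid\theta\in D)}\leq n^{-1}\right\}.$$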

Moreover, let

$$A_{x}\equiv\{D\in[V]^{<\omega}:x\in D\}.$$

Define also

$$\mathcal{F}\equiv\{C^{n}_{AB}:n\in\mathbb{N},\left|A\right|<\left|B\right|\}\cup\{A_{x}:x\in V\}.$$

We want to prove that $\mathcal{F}$ has the finite intersection property. Therefore take any $x_{1},\ldots,x_{k}\in V$, and any $\langle A_{1},B_{1},n_{1}\rangle,\ldots,\langle A_{l},B_{l},n_{l}\rangle$ such that $\left|A_{j}\right|<\left|B_{j}\right|$ and $n_{j}\in\mathbb{N}$ for $j\leq l$. Assume for the construction that $|A_{1}|\leq|A_{2}|\leq\dots\leq|A_{l}|$. For every finite $D$, if $\{x_{1},\ldots,x_{k}\}\subseteq D$, then $D\in\bigcap_{i\leq k}A_{x_{i}}$. So, setting $n=\max\{n_{j}:j\leq l\}$, we will extend $\{x_{1},\ldots,x_{k}\}$ to a set in $C^{n}_{A_{j}B_{j}}$, and hence in $C^{n_{j}}_{A_{j}B_{j}}$, for each $j\leq l$. Set $F_{0}=\{x_{1},\ldots,x_{k}\}$ and $a_{0}=|F_{0}\cap A_{1}|$. As $B_{1}$ is infinite and of larger cardinality than $A_{1}$, we can add $n\cdot a_{0}$ elements of $B_{1}\setminus A_{1}$ to $F_{0}$, yielding a finite set $F_{1}$. Now set $a_{1}=|F_{1}\cap A_{2}|$, and add $n\cdot a_{1}$ elements of $B_{2}\setminus(A_{1}\cup A_{2})$ to $F_{1}$ to give $F_{2}$. Note that we can find these elements of $B_{2}$ as $|B_{2}|>|A_{2}|\geq|A_{1}|$. Continuing in this manner, set $F=F_{l}$. Since the elements added at later steps lie outside $A_{1}\cup\dots\cup A_{j}$, they do not increase $|F\cap A_{j}|$. Then we have ensured that for all $j\leq l$

$$\frac{\mathsf{Pr}(\theta\in A_{j}\mid\theta\in F)}{\mathsf{Pr}(\theta\in B_{j}\mid\theta\in F)}\leq n^{-1},$$

and so we have $F\in C^{n}_{A_{j}B_{j}}$, and since $F_{0}\subseteq F$, we also have $F\in\bigcap_{i\leq k}A_{x_{i}}$.

So $\mathcal{F}$ indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter $\mathcal{U}$. By design, then, the resulting probability function $\mathsf{Pr}_{\mathcal{U}}$ is superregular.
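The extension step in this proof can be mimicked on toy data. The sketch below is a finite and countable analogue, not the construction itself: each $A_{j}$ is a finite Python set, each $B_{j}$ is supplied as a hypothetical callable returning an iterator of distinct elements, and the helper name `extend_snapshot` is illustrative. It grows $F_{0}$ by adding $n\cdot a_{j-1}$ fresh elements of $B_{j}\setminus(A_{1}\cup\dots\cup A_{j})$ at step $j$ and then checks the ratio bound by counting.

```python
from itertools import count, islice

def extend_snapshot(F0, pairs, n):
    """Grow the finite set F0 so that n * |F ∩ A_j| <= |F ∩ B_j| for each pair.

    pairs is a list of (A, B_elements), ordered by the size of A, where A is
    a finite set and B_elements() returns an iterator over distinct elements
    of an infinite set B (toy stand-in for a set of larger cardinality).
    """
    F = set(F0)
    seen_A = set()
    for A, B_elements in pairs:
        seen_A |= A                      # A_1 ∪ ... ∪ A_j so far
        a = len(F & A)                   # a_{j-1} = |F_{j-1} ∩ A_j|
        fresh = (b for b in B_elements() if b not in seen_A and b not in F)
        F |= set(islice(fresh, n * a))   # add n * a new elements of B_j
    return F

# Hypothetical toy instance: B_1 = even numbers, B_2 = multiples of 3.
A1, A2 = {1, 2}, {1, 2, 3, 5}
F = extend_snapshot({1, 2, 3}, [(A1, lambda: count(0, 2)),
                                (A2, lambda: count(0, 3))], n=2)

assert {1, 2, 3} <= F
assert 2 * len(F & A1) <= len({x for x in F if x % 2 == 0})
assert 2 * len(F & A2) <= len({x for x in F if x % 3 == 0})
```

Because elements added at later steps avoid all earlier $A_{j}$, the count $|F\cap A_{j}|$ is frozen once step $j$ is done, which is why the bound survives the remaining steps.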

Once again, Hume’s Principle for probability cannot hold for the notion of probability that we are investigating. But this leaves open the question whether the converse of Hume’s Principle for probability can be made to hold. This is called Cantor’s Principle in [Benci et al 2007], where the authors investigate it in the context of their Euclidean theory of size:

Definition 15 (Cantor’s Principle)

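$$\mathsf{Pr}_{\mathcal{U}}(\theta\in A)=\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Rightarrow\left|A\right|=\left|B\right|.$$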

Benci, Forti, and Di Nasso prove that ‘Cantor’s Principle’ can be made to hold [Benci et al 2007, section 3.2]. It is also clear that Cantor’s Principle follows from superregularity.

3.3.3 The power set principle

The question whether

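$$\forall A,B\in V:\left|A\right|<\left|B\right|\Rightarrow\left|\mathcal{P}(A)\right|<\left|\mathcal{P}(B)\right|$$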

is true, is independent of the axioms of set theory. (Of course the principle is true if the Generalised Continuum Hypothesis holds.) Like the cardinality operator, our NAP probability functions are measures of some kind. One might wonder what should follow from $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B).$ In particular, given that $\mathsf{Pr}_{\mathcal{U}}$ is intended to be a fine-grained quantitative possibility measure, perhaps probability should be expected to co-vary with the power set operation in some fairly direct manner. In other words, it is natural to ask if the following principle can be made to hold:

Definition 16 (Power Set Condition)

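$$\forall A,B\in V:\mathsf{Pr}_{\mathcal{U}}(\theta\in A)<\mathsf{Pr}_{\mathcal{U}}(\theta\in B)\Leftrightarrow\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathcal{P}(A))<\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathcal{P}(B)).$$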

It turns out that the power set condition can indeed be satisfied:

Theorem 2

There are fine ultrafilters $\mathcal{U}$ such that $\mathsf{Pr}_{\mathcal{U}}$ satisfies the power set condition.

The argument for this is somewhat more involved.

We aim to prove Theorem 2 by building the probability function up from an ultrafilter $\mathcal{U}$ which is based on a pre-filter $\mathcal{C}\subseteq\mathcal{P}([V]^{<\omega})$ that has the finite intersection property.

The class $\mathcal{C}$ is built up in stages, and in such a way that it eventually witnesses the truth of the power set condition for all $A,B\in V$ .

Stage 0

The class $\mathcal{C}_{0}$ consists of all

$$A_{x}\equiv\{D\in[V]^{<\omega}:x\in D\}$$

for $x\in V$ . This is to ensure that the ultrafilter that will be built from $\mathcal{C}$ is fine. We know that $\mathcal{C}_{0}$ has the finite intersection property.

Limit stages

For limit stages $\lambda$ , we simply set $\mathcal{C}_{\lambda}\equiv\bigcup_{\beta<\lambda}\mathcal{C}_{\beta}$ .

Successor stages

Given fine-ness, we may, and will, ignore the elements of $V_{\omega}$ . At stage $\alpha>\omega$ , where $\alpha$ is a successor ordinal, we consider the sets of $V_{\alpha}\backslash V_{\alpha-1}$ and ensure that the power set condition eventually holds for all these sets and their power sets, by adding families of finite sets to $\mathcal{C}_{\alpha-1}$ in such a way that the finite intersection property is preserved.

As an illustrative and indeed representative example we do the case where $\alpha=\omega+1$ .

Let there be given an enumeration $\{A_{1},B_{1}\},\ldots,\{A_{\beta},B_{\beta}\},\ldots$ of the pairs of elements of $V_{\omega+1}\backslash V_{\omega}$ .

For the induction, we assume that, by having added appropriate sets of finite sets to $\mathcal{C}_{0}$ , the power set condition holds for $\{A_{1},B_{1}\},\ldots,\{A_{\beta},B_{\beta}\}$ and their power sets, and that in the process the finite intersection property has been preserved. The aim is now to extend this so that it also holds for $\{A_{\beta+1},B_{\beta+1}\}$ . In other words, we have constructed $\mathcal{C}_{1}^{\beta}$ , and we want to obtain $\mathcal{C}_{1}^{\beta+1}$ , where $\mathcal{C}_{1}^{0}\equiv\mathcal{C}_{0}$ .

Definition 17

[TABLE]

Definition 18

[TABLE]

Claim

Either $\mathcal{C}_{1}^{\beta}\cup\{C_{A_{\beta}<B_{\beta}}\}$ has the finite intersection property, or $\mathcal{C}_{1}^{\beta}\cup\{C_{A_{\beta}\geq B_{\beta}}\}$ has the finite intersection property (or both).

Proof

Suppose not. Then there is a finite intersection $F$ of elements of $\mathcal{C}_{1}^{\beta}$ such that $F\cap C_{A_{\beta}<B_{\beta}}=\emptyset$, and there is a finite intersection $F^{\prime}$ of elements of $\mathcal{C}_{1}^{\beta}$ such that $F^{\prime}\cap C_{A_{\beta}\geq B_{\beta}}=\emptyset$. But then $(F\cap F^{\prime})\cap C_{A_{\beta}<B_{\beta}}=\emptyset$ and $(F\cap F^{\prime})\cap C_{A_{\beta}\geq B_{\beta}}=\emptyset$. But $C_{A_{\beta}<B_{\beta}}\cup C_{A_{\beta}\geq B_{\beta}}=[V]^{<\omega}$. So then $(F\cap F^{\prime})=\emptyset$. But this contradicts the inductive assumption that $\mathcal{C}_{1}^{\beta}$ has the finite intersection property.

Thus define $\mathcal{C}^{\beta+1}_{1}$ to be $\mathcal{C}_{1}^{\beta}\cup\{C_{A_{\beta}<B_{\beta}}\}$ if this has the finite intersection property, or $\mathcal{C}_{1}^{\beta}\cup\{C_{A_{\beta}\geq B_{\beta}}\}$ otherwise, and by the claim, $\mathcal{C}^{\beta+1}_{1}$ has the finite intersection property. Now setting $\mathcal{C}^{-}_{1}\equiv\bigcup_{\beta}\mathcal{C}^{\beta}_{1},$ we may conclude that $\mathcal{C}^{-}_{1}$ has the finite intersection property.
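The dichotomy used in this step is a purely combinatorial fact, so it can be checked by brute force on a small finite stand-in. The following Python sketch is only an illustration under stated assumptions: a universe of eight points replaces $[V]^{<\omega}$ , and the names `has_fip`, `extend_by_dichotomy`, `UNIVERSE`, `family` and `c_less` are hypothetical and do not come from the paper.

```python
from itertools import combinations

def has_fip(family):
    """True if every non-empty subfamily of the (finite) family has a common element."""
    members = list(family)
    for size in range(1, len(members) + 1):
        for sub in combinations(members, size):
            if not frozenset.intersection(*sub):
                return False
    return True

def extend_by_dichotomy(family, c_less, universe):
    """Add c_less if that keeps the property; otherwise add its complement."""
    if has_fip(family | {c_less}):
        return family | {c_less}
    return family | {universe - c_less}

# Hypothetical toy data: 8 points standing in for [V]^{<omega}.
UNIVERSE = frozenset(range(8))
family = {frozenset({0, 1, 2, 3, 4}), frozenset({2, 3, 4, 5})}
c_less = frozenset({0, 1, 5, 6, 7})   # misses the common part {2, 3, 4}
extended = extend_by_dichotomy(family, c_less, UNIVERSE)
```

In this toy run adding `c_less` would destroy the finite intersection property, so its complement is added instead, exactly as in the definition of $\mathcal{C}^{\beta+1}_{1}$ .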

At this point we must extend $\mathcal{C}^{-}_{1}$ by adding to $\mathcal{C}^{-}_{1}$ :

• every set of the form $C_{\mathcal{P}(A)<\mathcal{P}(B)}$ such that $C_{A<B}\in\mathcal{C}^{-}_{1}$ ;

• every set of the form $C_{\mathcal{P}(A)\geq\mathcal{P}(B)}$ such that $C_{A\geq B}\in\mathcal{C}^{-}_{1}$ .

Call the resulting set $\mathcal{C}_{1}$ . Our aim is to prove that $\mathcal{C}_{1}$ has the finite intersection property.

Consider an arbitrary non-empty finite family $\mathcal{F}\subseteq\mathcal{C}_{1}$ . Without loss of generality we may assume that the ‘judgements’ in $\mathcal{F}$ of the form $C_{\mathcal{P}(A)<\mathcal{P}(B)}$ or $C_{\mathcal{P}(A)\geq\mathcal{P}(B)}$ , taken together, describe a finite total pre-ordering relation $R$ on some set $\{\mathcal{P}(A_{1}),\ldots,\mathcal{P}(A_{k})\}$ . Further, we may also assume that for any sets $A$ and $B$ from $V_{\omega+1}\backslash V_{\omega}$ , $C_{\mathcal{P}(A)<\mathcal{P}(B)}\in\mathcal{F}$ if and only if $C_{A<B}\in\mathcal{F}$ , and $C_{\mathcal{P}(A)\geq\mathcal{P}(B)}\in\mathcal{F}$ if and only if $C_{A\geq B}\in\mathcal{F}$ . Thus $\mathcal{F}$ contains witnesses for all the relevant judgements we may be interested in.

Let $\mathcal{F}^{-}=\mathcal{F}\cap\mathcal{C}^{-}_{1}$ , so $\mathcal{F}^{-}$ consists only of judgements about sets in $V_{\omega+1}\backslash V_{\omega}$ . Then we know from the foregoing that $\bigcap\mathcal{F}^{-}\neq\emptyset$ . So take some $F^{-}\in\bigcap\mathcal{F}^{-}$ . Our plan is inductively to extend $F^{-}$ , using the pre-order $R$ , to a finite set $F\in\bigcap\mathcal{F}$ .

We will add to $F^{-}$ elements that ensure that the constraints of $R$ are satisfied. Moreover, by choosing the elements to be added to $F^{-}$ from $V_{\omega+1}\backslash V_{\omega}$ (for later stages we will take these sets from $V_{\alpha+1}\backslash V_{\alpha}$ , i.e. sets of rank $\alpha$ ), we ensure that the constraints imposed by $\mathcal{F}^{-}$ remain satisfied. As a result, $F$ will satisfy all constraints from $\mathcal{F}$ , so $\bigcap\mathcal{F}\neq\emptyset$ and hence $\mathcal{C}_{1}$ has the finite intersection property.

As an example, suppose that

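$\mathcal{F}$ describes the pre-order $\mathcal{P}(A_{1})<\mathcal{P}(A_{2})<\mathcal{P}(A_{3})$ , with $\mathcal{P}(A_{3})\geq\mathcal{P}(A_{4})$ and $\mathcal{P}(A_{4})\geq\mathcal{P}(A_{3})$ .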

(1) We start by ensuring that $\mathcal{P}(A_{1})<\mathcal{P}(A_{2})$ is satisfied.

Suppose that $F^{-}$ already contains $n$ elements of $\mathcal{P}(A_{1})$ . Since $C_{A_{1}<A_{2}}\in\mathcal{F}$ , there must be an element $x^{-}\in A_{2}\backslash A_{1}$ . This implies that there are infinitely many infinite sets $x$ in $\mathcal{P}(A_{2})\backslash\mathcal{P}(A_{1})$ such that $x^{-}\in x$ : we add $n+1$ such elements to $F^{-}$ , and call the resulting finite set $F^{-}_{1}$ .

(2) We proceed in similar fashion to ensure that $\mathcal{P}(A_{2})<\mathcal{P}(A_{3})$ is satisfied:

Suppose that $F^{-}_{1}$ already contains $m$ elements from $\mathcal{P}(A_{2})$ , observing that it may be the case that $m>n+1$ , for there may already be a finite number of elements of $\mathcal{P}(A_{2})$ in $F^{-}$ . Since $C_{A_{2}<A_{3}}\in\mathcal{F}$ , there must be an element $y^{-}_{1}\in A_{3}\backslash A_{2}$ , and since $C_{A_{1}<A_{3}}\in\mathcal{F}$ , there must be an element $y^{-}_{2}\in A_{3}\backslash A_{1}$ . So there are infinitely many infinite sets $y$ in $\mathcal{P}(A_{3})$ such that $y^{-}_{1},y^{-}_{2}\in y$ : add $m+1$ such elements to $F^{-}_{1}$ , and call the resulting set $F^{-}_{2}$ .

(3) Now suppose that there are $m_{1}$ elements of $\mathcal{P}(A_{3})$ in $F^{-}_{2}$ , and $m_{2}$ elements of $\mathcal{P}(A_{4})$ in $F^{-}_{2}$ . Moreover, suppose that $m_{2}<m_{1}$ . (The case where $m_{1}<m_{2}$ is similar.) Since $C_{A_{3}\geq A_{4}},C_{A_{4}\geq A_{3}}\in\mathcal{F}$ , but also $A_{3}\neq A_{4}$ , there must be some $x_{1}\in A_{3}\backslash A_{4}$ and some $x_{2}\in A_{4}\backslash A_{3}$ . Moreover, since $C_{A_{1}<A_{4}},C_{A_{2}<A_{4}}\in\mathcal{F}$ , there are elements $x_{3}\in A_{4}\backslash A_{1},x_{4}\in A_{4}\backslash A_{2}$ . So $\mathcal{P}(A_{4})$ contains infinitely many infinite sets $x$ such that $\{x_{2},x_{3},x_{4}\}\subset x$ . Similarly, $\mathcal{P}(A_{3})$ contains infinitely many infinite sets $x$ that are outside $\mathcal{P}(A_{1}),\mathcal{P}(A_{2}),\mathcal{P}(A_{4})$ . So we add a sufficient number of such elements to $F^{-}_{2}$ so that there are an equal number $p$ of “witnesses” for $\mathcal{P}(A_{3})$ as for $\mathcal{P}(A_{4})$ but where $p$ is larger than the number of witnesses for $\mathcal{P}(A_{2})$ . Call the resulting set $F^{-}_{3}$ .

(4) To conclude, we set $F\equiv F_{3}^{-}$ . It is clear that $F\in\bigcap\mathcal{F}$ .

This procedure of extending $F^{-}$ easily generalises to any finite total pre-ordering on $\{\mathcal{P}(A_{1}),\ldots,\mathcal{P}(A_{k})\}$ . Thus we have shown that $\mathcal{C}_{1}$ has the finite intersection property.

This procedure for extending $\mathcal{C}_{0}$ to $\mathcal{C}_{1}$ while preserving the finite intersection property also works for larger successor ordinals: at level $V_{\alpha+1}$ (stage $\beta+1$ with $\alpha=\omega+\beta$ ) we can extend the corresponding $F^{-}$ using subsets of rank $\alpha$ . As we have said above, at limit stages we can simply take unions. Ultimately we set $\mathcal{C}\equiv\bigcup_{\alpha\in On}\mathcal{C}_{\alpha}$ .

The class $\mathcal{C}$ will then have the finite intersection property, so it can be extended to a filter and then to an ultrafilter $\mathcal{U}$ . The probability function based on $\mathcal{U}$ will make the power set condition true for all $A,B\in V$ , and this concludes the proof of theorem 2.

Our proof actually shows something slightly stronger: for all $A,B$ with $\left|A\right|,\left|B\right|\geq\omega$ , we have

[TABLE]

The reason is that in enlarging the set $F^{-}$ we always have infinitely many elements to choose from.

For any probability measure $\mathsf{Pr}_{\mathcal{U}}$ that satisfies the power set condition we also have that $\forall A,B\in V,\forall n\in\omega$ :

[TABLE]

where $\mathcal{P}^{n}(A)=\mathcal{P}(\mathcal{P}(\dots\mathcal{P}(A)\dots))$ . An easy argument shows this cannot extend to infinite applications of the power set operation.
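For readers who want to see the operator $\mathcal{P}^{n}$ at work, here is a minimal Python sketch for finite sets. It is an illustration only: finite sets and their sizes are used as a crude stand-in for the probabilistic comparison, and the function names `power_set` and `iterated_power_set` are hypothetical.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def iterated_power_set(s, n):
    """Apply the power set operation n times, starting from s."""
    result = frozenset(s)
    for _ in range(n):
        result = power_set(result)
    return result

A = {0}
B = {0, 1}
# Sizes of P^0, P^1, P^2, P^3 for each starting set.
sizes_A = [len(iterated_power_set(A, n)) for n in range(4)]
sizes_B = [len(iterated_power_set(B, n)) for n in range(4)]
```

In this finite toy case a strict size comparison between `A` and `B` is preserved by every finite iteration of the power set operation. The sizes grow as iterated exponentials, so only very small values of `n` are feasible.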

One might wonder whether the motivations behind the power set condition should not also support imposing the following restricted power set condition on $\mathsf{Pr}_{\mathcal{U}}$ (thanks to Philip Welch for this question):

Question 1

Are there probability measures such that

[TABLE]

3.4 The ordinals

For $\alpha\geq\omega,$ in each level $V_{\alpha+1}\setminus V_{\alpha}$ of the iterative hierarchy one finds only one ordinal, but infinitely many sets that are not ordinals. This might lead one to believe that a probability function on $V$ should satisfy

$\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{On})\approx 0,$

where ‘On’ is the class of ordinals.

Just as it seems reasonable to require that the probability of choosing an even natural number from the set of natural numbers must be equal to or infinitesimally close to $\frac{1}{2}$ (see [Wenmackers et al 2013, section 6.2]), it seems reasonable to require that

$\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Even}\mid\theta\in\mathrm{On})\approx\frac{1}{2},$

where ‘Even’ is the class of even ordinals, which is defined in the obvious way.

Moreover, between any two limit ordinals there are infinitely many successor ordinals, so one might expect

$\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Lim}\mid\theta\in\mathrm{On})\approx 0,$

where ‘Lim’ is the class of limit ordinals.

We will sketch how probability functions can be constructed that meet these expectations. Indeed, we will see that there are probability functions that meet these ‘ordinal expectations’ and in addition meet the size constraint of super-regularity.
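To make the three ordinal expectations concrete, the following Python sketch computes the corresponding ratios on a single finite snapshot, taking $\theta$ to be the identity random variable. Everything in it is an assumption made for illustration: the ordinal $\omega\cdot a+b$ is encoded as the tuple `("ord", a, b)`, only $\omega\cdot a$ with $a\geq 1$ is counted as a limit ordinal, and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: ("ord", a, b) stands for the ordinal omega*a + b;
# every other object counts as a non-ordinal.
def is_ordinal(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "ord"

def is_even(x):
    return is_ordinal(x) and x[2] % 2 == 0

def is_limit(x):
    return is_ordinal(x) and x[1] >= 1 and x[2] == 0

def ratio(snapshot, prop):
    """Fraction of the snapshot that has the property."""
    return Fraction(sum(1 for s in snapshot if prop(s)), len(snapshot))

def conditional_ratio(snapshot, prop, given):
    """Fraction of the elements satisfying `given` that also satisfy `prop`."""
    cond = [s for s in snapshot if given(s)]
    return Fraction(sum(1 for s in cond if prop(s)), len(cond))

def make_snapshot(blocks, length, non_ordinals):
    """Intervals [omega*a, omega*a + length) for a = 1..blocks, plus non-ordinals."""
    ords = {("ord", a, b) for a in range(1, blocks + 1) for b in range(length)}
    others = {("set", i) for i in range(non_ordinals)}
    return ords | others

snapshot = make_snapshot(blocks=3, length=10, non_ordinals=270)
p_on = ratio(snapshot, is_ordinal)
p_even = conditional_ratio(snapshot, is_even, is_ordinal)
p_lim = conditional_ratio(snapshot, is_limit, is_ordinal)
```

On this snapshot the ordinals make up one tenth of the elements, half of the ordinals are even, and one tenth of them are limits. Lengthening the intervals and adding more non-ordinals pushes the first and third ratios towards $0$ , which is the behaviour the ultrafilter is meant to enforce on ‘most’ snapshots.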

Theorem 3

There are super-regular probability functions $\mathsf{Pr}_{\mathcal{U}}$ such that:

1. $\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{On})\approx 0;$

2. $\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Even}\mid\theta\in\mathrm{On})\approx 2^{-1};$

3. $\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Lim}\mid\theta\in\mathrm{On})\approx 0.$

Proof. As before, the aim is wisely to choose the ultrafilter $\mathcal{U}$ on which $\mathsf{Pr}_{\mathcal{U}}$ is based. We want $\mathcal{U}$ to be such that for all $k,l,m\in\mathbb{N}$ :

• $\frac{\mathsf{Pr}_{\mathcal{U}}(\theta\in A)}{\mathsf{Pr}_{\mathcal{U}}(\theta\in B)}\leq k^{-1}$ if $\omega\leq\left|A\right|<\left|B\right|;$

• $\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Even}\mid\theta\in\mathrm{On})-\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Odd}\mid\theta\in\mathrm{On})\leq l^{-1}$ and $\mathsf{Pr}_{\mathcal{U}}(\theta\in\mathrm{Lim}\mid\theta\in\mathrm{On})\leq l^{-1};$

• $\mathsf{Pr}_{\mathcal{U}}(\theta\in On)\leq m^{-1}.$

Now we define:

• $A_{x}\equiv\{D\in[V]^{<\omega}:x\in D\};$

• $C^{k}_{AB}\equiv\{D\in[V]^{<\omega}:\frac{\mathsf{Pr}[A\mid D]}{\mathsf{Pr}[B\mid D]}\leq k^{-1}\};$

• $I^{l}\equiv\{D\in[V]^{<\omega}:\forall\alpha\in D\exists\beta\exists n\geq l(\alpha\in[\beta,\beta+n]\subseteq D)\};$

• $W^{m}\equiv\{D\in[V]^{<\omega}:\mathsf{Pr}[On\mid D]\leq m^{-1}\}.$

And now we set:

[TABLE]

Claim: $\mathcal{F}_{0}$ has the finite intersection property.

Let some $x_{1},\ldots,x_{n}$ be given. Now $\bigcap_{i\leq n}I^{l_{i}}=I^{l}$ where $l=\max\{l_{i}:i\leq n\}$ , and similarly for $\bigcap_{i\leq n}W^{m_{i}}$ , so as before in theorem 1, it suffices to concentrate on the highest values of $k,l,m$ .

(1) $A\in\bigcap_{i\leq n}A_{x_{i}}\Leftrightarrow\{x_{1},\ldots,x_{n}\}\subseteq A.$ So we start with the finite set $A_{0}\equiv\{x_{1},\ldots,x_{n}\},$ and will extend it.

(2) Again we concentrate on one pair $\langle A,B\rangle$ such that $\omega\leq\left|A\right|<\left|B\right|$ ; we leave out further cases as they are similar. There are arbitrarily large finite subsets $C\subseteq B$ that are $l$ -isolated from elements of $A_{0}$ , meaning that each ordinal in $C$ is more than $l$ ordinals removed from any ordinal in $A_{0}$ . We choose any such $C\subseteq B$ that is of size at least $k\cdot n$ , and we set $A_{1}\equiv A_{0}\cup C$ .

(3) Now we extend $A_{1}$ to ensure that all ordinal intervals are of length $\geq l$ : for each $\alpha\in A_{1}$ , we add $\alpha+1,\ldots,\alpha+l$ . Call the resulting finite collection $A_{2}$ . Note that by our choice of $l$ -isolated elements in (2), none of $\alpha+1,\ldots,\alpha+l$ are elements of $A$ .

(4) Let $\left|A_{2}\right|=j$ . Then we add $j\cdot m$ elements of $V\setminus(A\cup B\cup On)$ to $A_{2}$ and call the resulting set $A_{3}$ .

It is now routine to verify that $A_{3}\in\bigcap_{i\leq n}A_{x_{i}}\cap C^{k}_{AB}\cap I^{l}\cap W^{m}$ . The case including further sets $C^{k}_{A^{\prime}B^{\prime}}$ is similar, thus the claim is verified. So $\mathcal{F}_{0}$ indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter $\mathcal{U}$ . By design, the resulting probability function $\mathsf{Pr}_{\mathcal{U}}$ has the required properties.
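The ‘routine’ verification can be traced on a toy instance. The Python sketch below follows steps (1) to (4) and then tests membership in $C^{k}_{AB}$ , $I^{l}$ and $W^{m}$ . It rests on simplifying assumptions that are not in the paper: ordinals are modelled as Python integers, all other objects are non-ordinals, the finite lists `A`, `B` and `fresh` stand in for infinite sets, and `A` and `B` contain no ordinals (so that $l$ -isolation holds trivially). All function and variable names are hypothetical.

```python
from fractions import Fraction

def is_ord(x):
    # Toy convention: ordinals are plain ints, everything else is a non-ordinal.
    return isinstance(x, int)

def build_snapshot(xs, B, fresh, k, l, m):
    """Follow steps (1)-(4); B and fresh are lists of non-ordinals (toy assumption)."""
    a0 = set(xs)                                        # (1) start from x_1..x_n
    n = max(len(a0), 1)
    extra = [b for b in B if b not in a0][: k * n]      # (2) k*n new elements of B
    if len(extra) < k * n:
        raise ValueError("B is too small for this toy run")
    a1 = a0 | set(extra)
    a2 = set(a1)
    for alpha in a1:                                    # (3) pad every ordinal
        if is_ord(alpha):
            a2.update(range(alpha + 1, alpha + l + 1))
    j = len(a2)
    pad = [z for z in fresh if z not in a2][: j * m]    # (4) dilute the ordinals
    if len(pad) < j * m:
        raise ValueError("not enough fresh non-ordinals")
    return a2 | set(pad)

def in_C(D, A, B, k):
    """Is |A n D| / |B n D| at most 1/k?"""
    a, b = len(D & set(A)), len(D & set(B))
    return b > 0 and Fraction(a, b) <= Fraction(1, k)

def in_I(D, l):
    """Does every ordinal of D lie in an interval [beta, beta+n] inside D with n >= l?"""
    ords = {x for x in D if is_ord(x)}
    def run_length(alpha):
        lo = hi = alpha
        while lo - 1 in ords:
            lo -= 1
        while hi + 1 in ords:
            hi += 1
        return hi - lo
    return all(run_length(alpha) >= l for alpha in ords)

def in_W(D, m):
    """Is the proportion of ordinals in D at most 1/m?"""
    return Fraction(sum(1 for x in D if is_ord(x)), len(D)) <= Fraction(1, m)

A = [("a", i) for i in range(50)]
B = [("b", i) for i in range(200)]
fresh = [("z", i) for i in range(5000)]
xs = [0, 7, ("a", 0), ("b", 0)]
D = build_snapshot(xs, B, fresh, k=3, l=4, m=5)
```

The bounds behind the checks are the ones implicit in the proof: at most $n$ elements of $A$ occur in the snapshot against at least $k\cdot n$ elements of $B$ , and at most $j$ ordinals occur among $j+j\cdot m$ elements in total, which gives a proportion of at most $(m+1)^{-1}\leq m^{-1}$ .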

4 The bootstrapping approach

The probability $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$ is obtained by ‘summing up’ the probabilities $\mathsf{Pr}(\theta\in A\mid\theta\in S)$ for all ‘small’ parts $S$ of $V$ ; such $\mathsf{Pr}(\theta\in A\mid\theta\in S)$ are seen as approximations of $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$ .

In the finite snapshot approach, ‘small’ in this context means ‘finite’. But from a conceptual point of view, ‘finite’ might be taken to be too small as far as the test sets (or snapshots) are concerned. Compared to $V$ , all sets —and not just the finite sets— are small. So to determine $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$ , we should take the ‘limit’ of the values $\mathsf{Pr}(\theta\in A\mid\theta\in S)$ , where $S$ is a set of any size. Then if $S$ is infinite, $\mathsf{Pr}(\theta\in A\mid\theta\in S)$ cannot just be taken to be given by the ratio formula but needs to be defined.

In the approach to which we now turn (the bootstrapping approach), a probability $\mathsf{Pr}_{\mathcal{U}}(\theta\in A)$ is determined by the probabilities $\mathsf{Pr}_{\mathcal{U}}(\theta\in A\mid\theta\in S)$ , where $\mathsf{Pr}_{\mathcal{U}}(\theta\in A\mid\theta\in S)$ , for $S$ a large set, is then in turn determined by probabilities $\mathsf{Pr}_{\mathcal{U}}(\theta\in A\mid\theta\in S^{\prime})$ for $S^{\prime}$ being smaller ‘snapshots’ than $S$ , and so on, until we reach the finite snapshots and can appeal to the probability functions that were discussed in the previous sections. Thus the bootstrapping account can be seen as a generalisation of the finite snapshot approach.

4.1 The rough idea

In general terms, this is how we will proceed:

(1) By the construction from the previous section, a fine ultrafilter on $[S]^{<\omega}$ yields a notion of probability on all sets $S\in V$ with $\left|S\right|<\omega_{1}$ . In other words, this yields a suitable notion of probability, call it $\mathsf{Pr}^{S}$ , for every countable set $S$ .

(2) The notion of $\mathsf{Pr}^{S}$ for all $S\in V$ with $\left|S\right|<\omega_{2}$ is determined using the notion of probability on countable sets: the probability of $A$ on such an $S$ is determined by the class of probabilities of $A$ on the countable ‘snapshots’ of $S$ . Using these countable probability functions, a fine ultrafilter on $[S]^{<\omega_{1}}$ gives us a notion of probability on sets $S$ with $\left|S\right|<\omega_{2}$ .

Again the resulting functions $\mathsf{Pr}^{S}$ are essentially NAP-functions as defined in [Benci et al 2013]. They are total, regular, etc.

…

( $\beta$ ) A fine ultrafilter on $[S]^{<\omega_{\alpha}}$ , together with probability functions $\mathsf{Pr}^{S}$ for all $S$ such that $\left|S\right|<\omega_{\alpha}$ , yields a notion of probability on all sets $S$ with $\left|S\right|<\omega_{\alpha+1}$ .

…

Limit stages of course do not present a problem. So by transfinite recursion on cardinality this yields for every set $S$ a notion $\mathsf{Pr}^{S}$ of probability on $S$ .

Then a fine ultrafilter $\mathcal{U}$ on $V=[V]^{<{Card}}$ yields, using the general notion $\mathsf{Pr}^{S}$ for $S\in V$ , a notion $\mathsf{Pr}^{V}$ that is a total (class) function from properties $A$ and random variables $\theta$ to values $\mathsf{Pr}^{V}(\theta\in A)$ in a non-Archimedean class field. This probability function again satisfies the principles of the theory NAP in [Benci et al 2013].

For this construction, what we need is suitable (fine) ultrafilters on small, and somewhat larger, and large, $\ldots$ sets, and a fine ultrafilter $\mathcal{U}$ on $[V]^{<Card}$ . But we will see that all the set ultrafilters used in the construction can be uniformly obtained as restrictions to sets $S$ of the given fine ultrafilter on $[V]^{<Card}$ . So $\mathsf{Pr}^{V}$ is determined by one initial choice of $\mathcal{U}$ , whereby $\mathsf{Pr}^{V}$ can be seen as the ‘limit’ of its set-restrictions $\mathsf{Pr}^{S}$ , where the functions $\mathsf{Pr}^{S}$ can in turn be seen as ‘limits’ of restrictions to their small subsets. This uniform construction has the advantage that the resulting probability functions are all coherent, in the sense that for a set $T$ , $\mathsf{Pr}^{S}(A|T)$ is the same for all $S\supseteq T$ and hence also for $V$ .

Now it is time to look at details of the construction.

4.2 Details 1: Restrictions of fine ultrafilters

Since our construction involves ultrafilters on sets $[S]^{<\kappa}$ with $\kappa>\omega$ , we make the following definition, which accords with the usual definition of fineness on $[S]^{<\omega}$ .

Definition 19

For any infinite cardinal $\kappa$ , an ultrafilter on $[S]^{<\kappa}$ is fine iff for every $x\in S:$

$\{T\in[S]^{<\kappa}:x\in T\}$ belongs to the ultrafilter.

The notion of ‘set-fine’ ultrafilter on $V$ is defined in the obvious way.

We first show that appropriate restrictions of ultrafilters to smaller sets can be obtained in a uniform fashion.

Definition 20

Suppose $S\in V$ , $\lvert S\rvert=\kappa$ , and $\mathcal{U}$ a fine ultrafilter on $[S]^{<\kappa}$ , and $S^{\prime}\subseteq S$ with $\lvert S^{\prime}\rvert=\alpha<\kappa$ . Then we define the restriction $\mathcal{U}_{S^{\prime}}$ of $\mathcal{U}$ to $S^{\prime}$ as follows.

For any $X\in\mathcal{P}([S]^{<\kappa})$ , let

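$X_{S^{\prime}}\equiv\{x\cap S^{\prime}\mid x\in X\textrm{ and }\lvert x\cap S^{\prime}\rvert<\lvert S^{\prime}\rvert\}.$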

Then $\mathcal{U}_{S^{\prime}}\equiv\{X_{S^{\prime}}\mid X\in\mathcal{U}\}.$

Proposition 7

For any $S\in V$ with $\left|S\right|=\kappa$ , there are fine ultrafilters $\mathcal{U}$ on $[S]^{<\kappa}$ that restrict to a fine ultrafilter on every $S^{\prime}\subseteq S$ with $\left|S^{\prime}\right|=\alpha$ , and $\omega\leq\alpha<\kappa$ .

Further, such ultrafilters are coherent in that if $T\subset S^{\prime}$ with $\omega\leq|T|<|S^{\prime}|$ , then $(\mathcal{U}_{S^{\prime}})_{T}=\mathcal{U}_{T}$ .

Proof. We build the ultrafilter from a pre-filter $\mathcal{F}_{0}$ (i.e., a set with the finite intersection property), which can then be extended to a filter and then to an ultrafilter.

For each $x\in S$ , let

$A_{x}\equiv\{T\in[S]^{<\kappa}:x\in T\}.$

And let for each $S^{\prime}$ with $\left|S^{\prime}\right|=\alpha<\kappa$ and $S^{\prime}\subseteq S$ :

$R^{S^{\prime}}\equiv\{T\in[S]^{<\kappa}:\lvert T\cap S^{\prime}\rvert<\lvert S^{\prime}\rvert\}.$

Now set

[TABLE]

It is easy to see that $\mathcal{F}_{0}$ has the finite intersection property and so can be extended to an ultrafilter $\mathcal{U}$ . And by design, $\mathcal{U}$ is fine.

Clearly $\mathcal{U}_{S^{\prime}}\subseteq\mathcal{P}([S^{\prime}]^{<\alpha}).$ We must check the fine ultrafilter properties:

(1) Fine. This follows from the fact that $\mathcal{U}$ is fine: for $x\in S^{\prime}$ this is witnessed by $(A_{x})_{S^{\prime}}$ .

(2) Finite intersection. Let $X,Y\in\mathcal{U}_{S^{\prime}}$ . Then there are $\overline{X},\overline{Y}\in\mathcal{U}$ such that $X=\overline{X}_{S^{\prime}}$ and $Y=\overline{Y}_{S^{\prime}}$ . By the finite intersection property of $\mathcal{U}$ , we know that $\overline{X}\cap\overline{Y}\in\mathcal{U}.$ But $X\cap Y\supseteq(\overline{X}\cap\overline{Y})_{S^{\prime}}.$ So $X\cap Y\in\mathcal{U}_{S^{\prime}}$ .

(3) Ultra. Take any $X\subseteq[S^{\prime}]^{<\alpha}$ , and let $X^{c}\equiv[S^{\prime}]^{<\alpha}\backslash X.$ Let $\overline{X}\equiv\{x\in[S]^{<\kappa}\mid x\cap S^{\prime}\in X\}$ and let $\overline{X^{c}}\equiv\{x\in[S]^{<\kappa}\mid x\cap S^{\prime}\not\in X\}.$ Then $\overline{X^{c}}=[S]^{<\kappa}\backslash\overline{X}.$ By the ultra property for $\mathcal{U}$ , we have $\overline{X}\in\mathcal{U}$ or $\overline{X^{c}}\in\mathcal{U}$ . But $X=\overline{X}_{S^{\prime}}$ and $X^{c}=\overline{X^{c}}_{S^{\prime}}$ . So $X\in\mathcal{U}_{S^{\prime}}$ or $X^{c}\in\mathcal{U}_{S^{\prime}}.$

(4) Non-principality. This is implied by fineness.

(5) Empty set property: We have to show that $\emptyset\not\in\mathcal{U}_{S^{\prime}}$ . It suffices to show that for each $X\in\mathcal{U}$ , $X_{S^{\prime}}\neq\emptyset$ . Since $R^{S^{\prime}}\in\mathcal{U}$ , $X\cap R^{S^{\prime}}\neq\emptyset$ . But for any set $x$ in this intersection, $x\cap S^{\prime}\in[S^{\prime}]^{<\alpha}$ . So $x\cap S^{\prime}\in X_{S^{\prime}}\neq\emptyset.$

For coherence, take $T\subset S^{\prime}\subset S$ with $|T|<|S^{\prime}|<|S|$ and let $X\in\mathcal{U}$ . As $R^{S^{\prime}}\in\mathcal{U}$ it is enough to show that $((X\cap R^{S^{\prime}})_{S^{\prime}})_{T}=(X\cap R^{S^{\prime}})_{T}$ . Now $((X\cap R^{S^{\prime}})_{S^{\prime}})_{T}=\{y\mid\exists z\in X\cap R^{S^{\prime}}:y=z\cap T,|y|<|T|\textrm{ and }\lvert z\cap S^{\prime}\rvert<|S^{\prime}|\}$ , but by definition, for any $z\in R^{S^{\prime}}$ we have $\lvert z\cap S^{\prime}\rvert<|S^{\prime}|$ . Thus $((X\cap R^{S^{\prime}})_{S^{\prime}})_{T}=\{y\mid\exists z\in X\cap R^{S^{\prime}}:y=z\cap T\textrm{ and }|y|<|T|\}=(X\cap R^{S^{\prime}})_{T}$ .
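The set identity at the heart of the coherence argument can be checked mechanically on a finite stand-in. The Python sketch below is an illustration under explicit assumptions: the restriction is read off from the set-builder expression in the proof, namely $X_{T}=\{z\cap T\mid z\in X,\ \lvert z\cap T\rvert<\lvert T\rvert\}$ ; the index set $[S]^{<\kappa}$ is replaced by all subsets of a five-element set; and the names `restrict`, `R`, `all_subsets` and `samples` are hypothetical. In the finite case ‘fewer than $\lvert T\rvert$ elements’ simply means ‘not all of $T$ ’, so the sketch says nothing about the role of infinite cardinalities.

```python
from itertools import combinations

def all_subsets(s):
    """All subsets of s as frozensets (toy replacement for the index set)."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def restrict(X, target):
    """X_target = { z & target : z in X and |z & target| < |target| }."""
    target = frozenset(target)
    return {z & target for z in X if len(z & target) < len(target)}

def R(S, S_prime):
    """R^{S'}: members of the index set meeting S' in fewer than |S'| elements."""
    S_prime = frozenset(S_prime)
    return {z for z in all_subsets(S) if len(z & S_prime) < len(S_prime)}

S, S_PRIME, T = range(5), {0, 1, 2}, {0, 1}
index_set = all_subsets(S)
samples = {
    "even size": {z for z in index_set if len(z) % 2 == 0},
    "contains 4": {z for z in index_set if 4 in z},
    "everything": set(index_set),
}
# For each sample X, compare ((X n R)_{S'})_T with (X n R)_T.
coherent = {
    name: restrict(restrict(X & R(S, S_PRIME), S_PRIME), T)
    == restrict(X & R(S, S_PRIME), T)
    for name, X in samples.items()
}
```

Each entry of `coherent` records whether restricting first to $S^{\prime}$ and then to $T$ gives the same family as restricting to $T$ directly.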

But this means that this property must also hold for fine ultrafilters on $[V]^{<Card}:$

Consequence 1

There are fine ultrafilters $\mathcal{U}$ on $[V]^{<Card}$ , such that for every set $S$ with $\left|S\right|=\alpha$ , $\mathcal{U}_{S}$ is a fine ultrafilter on $[S]^{<\alpha}$ and the coherence property holds.

Proof. By the same reasoning as in the previous proposition.

4.3 Details 2: defining probability functions

Now we show how for every set, a probability function on that set can be defined. The same procedure can then be used to define a probability function on $V$ , and these probability functions are coherent.

The key is to spell out what is involved in the $\beta$ -th step of the recursive procedure for defining probabilities on sets:

( $\beta$ ) A fine ultrafilter $\mathcal{U}$ on $[S]^{<\omega_{\beta}}$ (with $\omega_{\beta}=\left|S\right|$ ), together with probability functions $\mathsf{Pr}^{T}$ for all $T$ such that $\left|T\right|<\omega_{\beta}$ , yields a notion of probability $\mathsf{Pr}^{S}$ on $S$ .

As in section 2, we define a function $f_{\theta\in A}$ such that for all $T\in[S]^{<\omega_{\beta}}$ :

[TABLE]

Similarly, we define a function $f_{\theta\in A\wedge\nu\in B}$ such that for all $T\in[S]^{<\omega_{\beta}}$ :

[TABLE]

Then $\mathsf{Pr}^{S}(\theta\in A)$ is defined as $[f_{\theta\in A}]_{\mathcal{U}}$ , and $\mathsf{Pr}^{S}(\theta\in A\mid\nu\in B)$ is defined as

[TABLE]

This function $\mathsf{Pr}^{S}$ will then be an NAP probability function in the sense of [Benci et al 2013].

Now in an exactly similar way, we define a class probability function $\mathsf{Pr}^{+}_{\mathcal{U}}$ on $V$ , using the probability functions on ‘small’ classes (i.e., sets) and ultrafilters on ‘small’ classes which (given proposition 7) we can now assume to have been defined on the basis of an ultrafilter $\mathcal{U}$ on $[V]^{<Card}$ with which we start. The function $\mathsf{Pr}^{+}_{\mathcal{U}}$ is total, regular, and uniform for the same reasons as why its ‘smaller cousin’ $\mathsf{Pr}_{\mathcal{U}}$ has these properties.

We now check coherence. We will do this only for straight probabilities rather than random variables in general, as although coherence holds for random variables also, it is much more technical to state. Below we use $\mathsf{Pr}(A)$ to denote $\mathsf{Pr}(\iota\in A)$ where $\iota$ is the identity random variable.

Proposition 8

For any class $A$ and sets $T\subset S$ with $|T|<|S|$ we have

$\mathsf{Pr}^{T}(A)=\mathsf{Pr}^{S}(A\mid T).$

Proof. We show by induction on $|T|$ that the above holds for all $S\supset T$ with $|S|>|T|$ . Strictly speaking, the range of $\mathsf{Pr}^{T}$ may be a different non-archimedean field to the range of $\mathsf{Pr}^{S}$ , but there is a natural embedding of the former into the latter defined by $i([f]_{\mathcal{U}_{T}})=[\bar{f}]_{\mathcal{U}_{S}}$ where for $X\in[S]^{<|S|}$ , $\bar{f}(X)=f(X\cap T)$ . This is well-defined as $\{X\in[S]^{<|S|}:|X\cap T|<|T|\}=(R^{T})_{S}\in\mathcal{U}_{S}$ .

Using this embedding we have $i(\mathsf{Pr}^{T}(A))=i([f_{A}]_{\mathcal{U}_{T}})=[\bar{f_{A}}]_{\mathcal{U}_{S}}$ . Now for $X\in(R^{T})_{S}(\in{\mathcal{U}_{S}})$ we have:

[TABLE]

As $X\in(R^{T})_{S}$ we have $|X\cap T|<|T|$ so by our inductive hypothesis

[TABLE]

But by definition, $\big{[}\frac{f_{A\cap T}}{f_{T}}\big{]}_{\mathcal{U}_{S}}=\mathsf{Pr}^{S}(A|T)$ , so $[\bar{f_{A}}]_{\mathcal{U}_{S}}=\mathsf{Pr}^{S}(A|T)$ and we’re done.
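At the finite level, where probabilities are plain ratios, the coherence property reduces to elementary arithmetic: conditioning on $T$ inside any larger finite $S$ returns the ratio computed on $T$ itself. The Python sketch below checks this base case only. It assumes uniform ratio probabilities on finite sets, and the names `pr`, `pr_given`, `larger` and `values` are hypothetical.

```python
from fractions import Fraction

def pr(S, A):
    """Ratio probability of A on the finite snapshot S."""
    return Fraction(len(S & A), len(S))

def pr_given(S, A, T):
    """Conditional probability of A given T, computed on the snapshot S."""
    return pr(S, A & T) / pr(S, T)

A = {n for n in range(100) if n % 3 == 0}   # the property: multiples of 3
T = set(range(10))                          # the small snapshot
larger = [set(range(20)), set(range(50)), set(range(100))]
values = [pr_given(S, A, T) for S in larger]
```

All entries of `values` agree with `pr(T, A)`, because the factor $\lvert S\rvert$ cancels in the quotient. The proposition says that this independence of the ambient set survives the passage to infinite snapshots.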

4.4 Comparison of the finite snapshot approach and the bootstrapping approach

In our definition of the probability of a set theoretic property, the probability $\mathsf{Pr}^{+}_{\mathcal{U}}(\theta\in A)$ of a property $A$ is determined by the probabilities $\mathsf{Pr}^{S}(\theta\in A)$ of $A$ on large ‘snapshots’ $S$ , where a probability $\mathsf{Pr}^{S}(\theta\in A)$ (for $S$ a large set) is then in turn determined by the probabilities $\mathsf{Pr}^{S^{\prime}}(\theta\in A)$ for $S^{\prime}$ being smaller ‘snapshots’ than $S$ , and so on. Conceptually, the definition in section 4.3 is superior to the simpler definition suggested in section 2: we want to take the behaviour of the property on as many and as large ‘snapshots’ as possible into account.

It is not straightforward to compare the simple and the more involved definition: the simple method is based on an ultrafilter on $[V]^{<\omega}$ whereas the more involved method is based on an ultrafilter on $V=[V]^{<Card}$ .

The obvious suggestion is to base the comparison on the relation between a probability function determined by an ultrafilter $\mathcal{U}$ on $[V]^{<Card}$ and its restriction to $[V]^{<\omega}$ defined as $\mathcal{U}\upharpoonright\omega=\{X\cap[V]^{<\omega}|X\in\mathcal{U}\}$ . (This is a different notion of restriction to that defined in the previous section, as here we are only restricting the index, while the underlying class remains the same, namely $V$ .) But:

Proposition 9

Not all ultrafilters on $[V]^{<Card}$ restrict to ultrafilters on $[V]^{<\omega}$ .

Proof. Consider $\mathcal{A}\cup\overline{[V]^{<\omega}}$ , where $\mathcal{A}$ is the set of atoms (guaranteeing fine-ness) and $\overline{[V]^{<\omega}}$ is the relative complement of ${[V]^{<\omega}}$ in ${[V]^{<Card}}$ . Then $\mathcal{A}\cup\overline{[V]^{<\omega}}$ has the finite intersection property and so can be extended to a fine ultrafilter $\mathcal{U}$ on ${[V]^{<Card}}$ . But $\emptyset\in\mathcal{U}\upharpoonright\omega$ , since $\overline{[V]^{<\omega}}\in\mathcal{U}$ and $\overline{[V]^{<\omega}}\cap[V]^{<\omega}=\emptyset$ . So $\mathcal{U}$ does not restrict to an ultrafilter on $[V]^{<\omega}$ .

On the other hand, every fine ultrafilter on $[V]^{<Card}$ restricting to an ultrafilter on $[V]^{<\omega}$ essentially is an ultrafilter on $[V]^{<\omega}$ :

Proposition 10

Suppose $\mathcal{U}$ is a fine ultrafilter on $[V]^{<Card}$ restricting to an ultrafilter $\mathcal{U}\upharpoonright\omega$ on $[V]^{<\omega}$ . Then $[V]^{<\omega}\in\mathcal{U}$ .

Proof. Since $\mathcal{U}$ is ultra, we have $[V]^{<\omega}\in\mathcal{U}$ or $\overline{[V]^{<\omega}}\in\mathcal{U}$ . But if $\overline{[V]^{<\omega}}\in\mathcal{U}$ , then $\emptyset\in\mathcal{U}\upharpoonright\omega$ , so that $\mathcal{U}$ does not restrict, contradicting the assumption. So $[V]^{<\omega}\in\mathcal{U}$ .

This means that the genuinely more involved probability functions on $V$ cannot be reduced to ‘simple’ probability functions on $V$ .

5 Conclusion

In this article we have explored two methods for modelling, by means of non-Archimedean probability functions, the properties of random variables ranging over the set theoretic universe: the finite snapshot method and the bootstrapping method. Concerning the finite snapshot method, we found that many of the probabilistic properties that seem intuitively plausible can be satisfied. The bootstrapping method is more satisfying from a conceptual point of view, but we have only been able to show that the resulting probability functions satisfy minimal requirements. So much work remains to be done.

Bibliography

[Alon et al 2000] Alon, N. & Spencer, J. The Probabilistic Method. Second edition, Wiley, 2000.
[Benci et al 2003] Benci, V. & Di Nasso, M. Numerosities of labelled sets. A new way of counting. Adv. Math. 173 (2003), p. 50–67.
[Benci et al 2007] Benci, V., Di Nasso, M. & Forti, M. An Euclidean measure of size for mathematical universes. Logique et Analyse 50 (2007), p. 43–62.
[Benci et al 2013] Benci, V., Horsten, L. & Wenmackers, S. Non-Archimedean probability. Milan Journal of Mathematics 81 (2013), p. 121–151.
[Benci et al 2018] Benci, V., Horsten, L. & Wenmackers, S. Infinitesimal probabilities. British Journal for the Philosophy of Science 69 (2018), p. 509–552.
[Brickhill et al 2018] Brickhill, H. & Horsten, L. Triangulating non-Archimedean probability. Review of Symbolic Logic 11 (2018), p. 519–546.
[Freiling 1986] Freiling, C. Axioms of symmetry. Throwing darts at the real number line. Journal of Symbolic Logic 51 (1986), p. 190–200.
[Hamkins 2015] Hamkins, J. Is the dream solution to the continuum hypothesis attainable? Notre Dame Journal of Formal Logic 56.1 (2015), p. 135–145.


Versions of this paper have been presented at a Bristol–Leuven workshop on Logic and Philosophy of Science (2015), at the Philosophy Department of the Universidade Federal do Rio
