Aiming Low Is Harder -- Induction for Lower Bounds in Probabilistic Program Verification

Marcel Hark; Benjamin Lucien Kaminski; Jürgen Giesl; Joost-Pieter Katoen

arXiv:1904.01117·cs.LO·August 12, 2021


TL;DR

This paper introduces a new inductive rule for verifying lower bounds on expected values and runtimes in probabilistic loops, simplifying the process by avoiding the need to compute limits of sequences.

Contribution

The paper presents a novel inductive rule that simplifies lower bound verification in probabilistic program analysis by eliminating the need for limit calculations.

Findings

01. The new rule verifies lower bounds without computing the limit of a sequence.

02. Loop body semantics need to be applied only finitely often to check that a candidate is a lower bound.

03. The finitely checkable side conditions are beneficial for potential automation.

Abstract

We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequence as in many previous rules.

Tables

Table 1. Rules for the wp–transformer.

$C$	$\textsf{wp}\llbracket C\rrbracket(f)$
skip	$f$
$b := e$	$f[b/e]$
$\textnormal{if}\,(\varphi)\,\{C_{1}\}\,\textnormal{else}\,\{C_{2}\}$	$[\varphi]\cdot\textsf{wp}\llbracket C_{1}\rrbracket(f) + [\neg\varphi]\cdot\textsf{wp}\llbracket C_{2}\rrbracket(f)$
$\{C_{1}\}\,[p]\,\{C_{2}\}$	$p\cdot\textsf{wp}\llbracket C_{1}\rrbracket(f) + (1-p)\cdot\textsf{wp}\llbracket C_{2}\rrbracket(f)$
$C_{1}\fatsemi C_{2}$	$\textsf{wp}\llbracket C_{1}\rrbracket(\textsf{wp}\llbracket C_{2}\rrbracket(f))$
$\textnormal{while}\,(\varphi)\,\{C'\}$	$\textnormal{lfp}~{}^{\textsf{wp}}_{\langle\varphi,C'\rangle}\Phi_{f}$

Characteristic function: ${}^{\textsf{wp}}_{\langle\varphi,C'\rangle}\Phi_{f}(X) = [\neg\varphi]\cdot f + [\varphi]\cdot\textsf{wp}\llbracket C'\rrbracket(X)$

Table 2. Rules for the ert–transformer.

$C$	$\textsf{ert}\llbracket C\rrbracket(t)$
skip	$1 + t$
$b := e$	$1 + t[b/e]$
$\textnormal{if}\,(\varphi)\,\{C_{1}\}\,\textnormal{else}\,\{C_{2}\}$	$1 + [\varphi]\cdot\textsf{ert}\llbracket C_{1}\rrbracket(t) + [\neg\varphi]\cdot\textsf{ert}\llbracket C_{2}\rrbracket(t)$
$\{C_{1}\}\,[p]\,\{C_{2}\}$	$1 + p\cdot\textsf{ert}\llbracket C_{1}\rrbracket(t) + (1-p)\cdot\textsf{ert}\llbracket C_{2}\rrbracket(t)$
$C_{1}\fatsemi C_{2}$	$\textsf{ert}\llbracket C_{1}\rrbracket(\textsf{ert}\llbracket C_{2}\rrbracket(t))$
$\textnormal{while}\,(\varphi)\,\{C'\}$	$\textnormal{lfp}~{}^{\textsf{ert}}_{\langle\varphi,C'\rangle}\Phi_{t}$

Characteristic function: ${}^{\textsf{ert}}_{\langle\varphi,C'\rangle}\Phi_{t}(X) = 1 + [\neg\varphi]\cdot t + [\varphi]\cdot\textsf{ert}\llbracket C'\rrbracket(X)$
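The loop-free rules of Table 2 can be read as a recursive function. The following is a minimal sketch, assuming an illustrative encoding that is not part of the paper: programs are nested tuples, states are dictionaries, and runtimes are functions from states to exact fractions. Loops are deliberately left out, since their runtime is a least fixed point.

```python
from fractions import Fraction

def ert(prog, t):
    """Expected runtime of loop-free `prog` followed by continuation runtime `t`."""
    kind = prog[0]
    if kind == "skip":
        return lambda s: 1 + t(s)
    if kind == "assign":                      # ("assign", var, expr)
        _, var, expr = prog
        return lambda s: 1 + t({**s, var: expr(s)})
    if kind == "if":                          # ("if", guard, then, else)
        _, guard, c1, c2 = prog
        t1, t2 = ert(c1, t), ert(c2, t)
        return lambda s: 1 + (t1(s) if guard(s) else t2(s))
    if kind == "choice":                      # ("choice", p, left, right)
        _, p, c1, c2 = prog
        t1, t2 = ert(c1, t), ert(c2, t)
        return lambda s: 1 + p * t1(s) + (1 - p) * t2(s)
    if kind == "seq":                         # ("seq", first, second)
        _, c1, c2 = prog
        return ert(c1, ert(c2, t))
    raise ValueError(f"unsupported construct: {kind}")

zero = lambda s: Fraction(0)                  # nothing runs afterwards

# {b := b + 5} [4/5] {b := 10}: one step for the choice, one for either assignment
choice = ("choice", Fraction(4, 5),
          ("assign", "b", lambda s: s["b"] + 5),
          ("assign", "b", lambda s: 10))

runtime_choice = ert(choice, zero)
runtime_seq = ert(("seq", choice, ("skip",)), zero)
```

Under this encoding, `runtime_choice` evaluates to 2 in every state and appending `skip` adds exactly one more step.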

Equations

$\Phi(I)\sqsubseteq I \quad\textnormal{implies}\quad \textnormal{lfp}~\Phi\sqsubseteq I~,$

$I\sqsubseteq\Phi(I) \quad\textnormal{implies}\quad I\sqsubseteq\textnormal{lfp}~\Phi~,$

Coupon collector program:

x := N ⨟
while (0 < x) {
    i := N + 1 ⨟
    while (x < i) { i := Unif[1..N] } ⨟
    x := x - 1
},

$\mathbb{F} = \bigl\{\, f \mid f\colon\Sigma\rightarrow\overline{\mathbb{R}}_{\geq 0} \,\bigr\}~,$

$f_{1}\preceq f_{2} \quad\textnormal{iff}\quad \forall s\in\Sigma\colon~ f_{1}(s)\leq f_{2}(s)~.$

$g(s_{0}) = \textsf{wp}\llbracket C\rrbracket(f)(s_{0}) = \int_{\Sigma} f \, d\,({}^{s_{0}}\mu_{C})~.$

$\textsf{wp}\colon \textnormal{pGCL}\rightarrow\mathbb{F}\rightarrow\mathbb{F}$

$\{b := b+5\}\,[\nicefrac{4}{5}]\,\{b := 10\}~.$

Fig. 2(a), annotation style for loop-free programs: the annotations above $C'$ state $g'\bowtie g$ and $g=\textsf{wp}\llbracket C'\rrbracket(f)$; the annotation below $C'$ is the postexpectation $f$.

Fig. 2(b), annotated program:

=  // $\tfrac{4b}{5}+6$
wp // $\tfrac{4}{5}\cdot(b+5)+\tfrac{1}{5}\cdot 10$
{b := b + 5} [4/5] {b := 10}
// $f$

$\Phi(I)\sqsubseteq I \quad\textnormal{implies}\quad \textnormal{lfp}~\Phi\sqsubseteq I~.$

$\Phi_{f}(I)\preceq I \quad\textnormal{implies}\quad \textsf{wp}\llbracket\textnormal{while}\,(\varphi)\,\{C\}\rrbracket(f)\preceq I~.$

$\textnormal{while}\,(a\neq 0)\,\{\{a := 0\}\,[\nicefrac{1}{2}]\,\{b := b+1\}\}~,$

Fig. 3(a), annotation style for loops:

⋈  // $I$
$g = [\neg\varphi]\cdot f + [\varphi]\cdot I''$
while ($\varphi$) {
    ⋈  // $I''$
    wp // $I'$
    Body
    // $I$ }
// $f$

Fig. 3(b), annotated loop:

⪯  // $b+[a\neq 0]$
Φ  // $[a=0]\cdot b + [a\neq 0]\cdot\bigl(b+\tfrac{1}{2}(1+[a\neq 0])\bigr)$
while (a ≠ 0) {
    =  // $b+\tfrac{1}{2}(1+[a\neq 0])$
    wp // $\tfrac{1}{2}\bigl(b+[0\neq 0]+b+1+[a\neq 0]\bigr)$
    {a := 0} [1/2] {b := b + 1}
    // $b+[a\neq 0]$ }
// $b$

$\Phi_{f}(I) = [\neg\varphi]\cdot f+[\varphi]\cdot\textsf{wp}\llbracket\mathit{Body}\rrbracket(I) \preceq [\neg\varphi]\cdot f+[\varphi]\cdot I'' = g \preceq I$

$\Phi^{\omega}(I) = \lim_{n\to\omega}\Phi^{n}(I) = \sup\{\Phi^{n}(I)\mid n\in\mathbb{N}\} \in D~,$


Full text

Aiming Low Is Harder

Induction for Lower Bounds in Probabilistic Program Verification

Marcel Hark, Benjamin Kaminski, Jürgen Giesl, and Joost-Pieter Katoen

RWTH Aachen University, Germany

(2020)

Abstract.

We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequence as in many previous rules.

Keywords: probabilistic programs, verification, weakest precondition, weakest preexpectation, lower bounds, optional stopping theorem, uniform integrability

DOI: 10.1145/3371105. Journal: PACMPL, volume 4, number POPL, January 2020. CCS concepts: Mathematics of computing, Probabilistic algorithms; Mathematics of computing, Markov processes; Theory of computation, Denotational semantics.

1. Introduction and Overview

We study probabilistic programs featuring discrete probabilistic choices as well as unbounded loops. Randomized algorithms are the classical application of such programs. Recently, applications in biology, quantum computing, cyber security, machine learning, and artificial intelligence led to rapidly growing interest in probabilistic programming (Gordon et al., 2014).

Formal verification of probabilistic programs is strictly harder than for nonprobabilistic programs (Kaminski et al., 2019). Given a random variable $f$ , a key verification task is to reason about the expected value of $f$ after termination of a program $C$ on input $s$ . If $f$ is the indicator function of an event $A$ , then this expected value is the probability that $A$ has occurred on termination of $C$ .

For verifying probabilistic loops, most approaches share a common, conceptually very simple, technique: an induction rule for verifying upper bounds on expected values, which are characterized as least fixed points (lfp) of a suitable function $\Phi$ . This rule, called “Park induction”, reads

$\Phi(I)\sqsubseteq I \quad\textnormal{implies}\quad \textnormal{lfp}~\Phi\sqsubseteq I~,$

i.e., for a candidate upper bound $I$ we check $\Phi(I)\sqsubseteq I$ (for a suitable partial order $\sqsubseteq$ ) to prove that $I$ is indeed an upper bound on the least fixed point, and hence on the sought–after expected value.

For lower bounds, a simple proof principle analogous to Park induction, namely

$I\sqsubseteq\Phi(I) \quad\textnormal{implies}\quad I\sqsubseteq\textnormal{lfp}~\Phi~,$

is unsound in general. Sound rules (see Sect. 9), on the other hand, often suffer from the fact that either $f$ needs to be bounded, or that one has to find the limit of some sequence, as well as the sequence itself, rendering those rules conceptually much more involved than Park induction.

Our main contribution (Sect. 5, Thm. 9) is to provide relatively simple side conditions that can be added to the (unsound) implication above, such that the implication becomes true, i.e.,

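$I\sqsubseteq\Phi(I) ~\wedge~ \textnormal{side conditions} \quad\textnormal{implies}\quad I\sqsubseteq\textnormal{lfp}~\Phi~.$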

In particular, our side conditions will be simple in the sense that (a variation of) $\Phi$ needs to be applied to a candidate $I$ only a finite number of times, which is beneficial for potential automation.

The need for verifying lower bounds on expected values is quite natural: First of all, they help to assess the quality and tightness of upper bounds. Moreover, giving total correctness guarantees for probabilistic programs amounts to lower–bounding the correctness probability, e.g., in order to establish membership in complexity classes like RP and PP.

In addition to expected values of random variables at program termination, lower bounds on expected runtimes are also of significant interest: Lower bounds on expected runtimes which depend on secret program variables may compromise the secret, thus allowing for timing side–channel attacks; “very large” lower bounds could indicate potential denial–of–service attacks.

In order to enable practicable reasoning about lower bounds on expected runtimes, we will show how our inductive lower bound rule carries over to expected runtimes (Sect. 8, Thm. 3). As an example to show the applicability of our rule, we will verify that the well–known and notoriously difficult coupon collector’s problem (Motwani and Raghavan, 1995) (Sect. 8, Ex. 4), modeled by the probabilistic program (footnote 1)

x := N ⨟
while (0 < x) {
    i := N + 1 ⨟
    while (x < i) { i := Unif[1..N] } ⨟
    x := x - 1
},

has an expected runtime of at least $N\mathcal{H}_{N}$, where $\mathcal{H}_{N}$ is the $N$-th harmonic number.

Footnote 1: The random assignment $i := \mathrm{Unif}[1..N]$ does not, strictly speaking, adhere to our syntax of binary probabilistic choices, but it can be modeled in our syntax. For the sake of readability, we opted for $i := \mathrm{Unif}[1..N]$.

Our new inductive rules will be stated in terms of so–called expectation transformers (McIver and Morgan, 2005) (Sect. 2) and rely on the notions of uniform integrability (Sect. 3, in particular 3.4, and Sect. 4), martingales, conditional difference boundedness, and the Optional Stopping Theorem (Sect. 5) from the theory of stochastic processes. However, we do not only make use of these notions in order to prove soundness of our induction rule, but instead establish tight connections in terms of these notions between expectation transformers and certain canonical stochastic processes (Sect. 4, Thm. 14 and Sect. 5, Thm. 8). In particular, we will build upon the key result of this connection (Thm. 14) to study exactly how inductive proof rules for both upper and lower bounds can be understood in the realm of these stochastic processes and vice versa (Sect. 5, Thm. 9 and Sect. 7). We see those connections between the theories of expectation transformers and stochastic processes as a stepping stone for applying further results from stochastic process theory to probabilistic program analysis and possibly also vice versa.

As a final contribution, we revisit one of the few existing rules for lower bounds due to (McIver and Morgan, 2005), which gives sufficient criteria for a candidate being a lower bound on the expected value of a bounded function $f$ . We show that their rule is also a consequence of uniform integrability and we are moreover able to generalize their rule to a necessary and sufficient criterion (Sect. 6, Thm. 2). We demonstrate the usability of our generalization by an example (Sect. 6, Ex. 3).

The appendix contains more case studies illustrating the effectiveness of our lower bound proof rule, a more detailed introduction to probability theory, and more detailed proofs of our results.

2. Weakest Preexpectation Reasoning

Weakest preexpectations for probabilistic programs are a generalization of Dijkstra’s weakest preconditions for nonprobabilistic programs. Dijkstra employs predicate transformers, which push a postcondition $F$ (a predicate) backward through a nonprobabilistic program $C$ and yield the weakest precondition $G$ (another predicate) describing the largest set of states such that whenever $C$ is started in a state satisfying $G$, $C$ terminates in a state satisfying $F$. (Footnote 2: We consider total correctness, i.e., from any state satisfying the weakest precondition $G$, $C$ definitely terminates.)

The weakest preexpectation calculus, on the other hand, employs expectation transformers which act on real–valued functions called expectations, mapping program states to non–negative reals. (Footnote 3: For simplicity of the presentation, we study the standard case of positive expectations. Mixed–sign expectations mapping to the full extended reals require much more technical machinery, see (Kaminski and Katoen, 2017).) These transformers push a postexpectation $f$ backward through a probabilistic program $C$ and yield a preexpectation $g$, such that $g$ represents the expected value of $f$ after executing $C$. The term expectation, coined by (McIver and Morgan, 2005), may appear somewhat misleading at first. We clearly distinguish between expectations and expected values: An expectation is not an expected value per se. Instead, we can think of an expectation as a random variable. In Bayesian network jargon, expectations are also called factors.

Definition 1 (Expectations (Kaminski, 2019; McIver and Morgan, 2005)).

Let $\mathsf{Vars}$ denote the finite set of program variables and let $\Sigma=\{s\mid s\colon\mathsf{Vars}\rightarrow\mathbb{Q}\}$ denote the set of program states. (Footnote 4: We choose rationals to have some range of values at hand which are conveniently represented in a computer.)

The set of expectations, denoted by $\mathbb{F}$, is defined as

$\mathbb{F} = \bigl\{\, f \mid f\colon\Sigma\rightarrow\overline{\mathbb{R}}_{\geq 0} \,\bigr\}~,$

where $\overline{\mathbb{R}}_{\geq 0}=\{\,r\in\mathbb{R}\mid r\geq 0\,\}\cup\{\infty\}$. We say that $f\in\mathbb{F}$ is finite and write $f\mathrel{{\prec}{\prec}}\infty$, if $f(s)<\infty$ for all $s\in\Sigma$. A partial order $\preceq$ on $\mathbb{F}$ is obtained by point–wise lifting the usual order $\leq$ on $\overline{\mathbb{R}}_{\geq 0}$, i.e.,

$f_{1}\preceq f_{2} \quad\textnormal{iff}\quad \forall s\in\Sigma\colon~ f_{1}(s)\leq f_{2}(s)~.$

$(\mathbb{F},\,{\preceq})$ is a complete lattice where suprema and infima are constructed point–wise.

We note that our notion of expectations is more general than the one of McIver and Morgan: Their work builds almost exclusively on bounded expectations, i.e., non–negative real–valued functions which are bounded from above by some constant, whereas we allow unbounded expectations. As a result, we have that $(\mathbb{F},\,{\preceq})$ forms a complete lattice, whereas McIver and Morgan’s space of bounded expectations does not.

2.1. Weakest Preexpectations

Given program $C$ and postexpectation $f\in\mathbb{F}$, we are interested in the expected value of $f$ evaluated in the final states reached after termination of $C$. More specifically, we are interested in a function $g\colon\Sigma\rightarrow\overline{\mathbb{R}}_{\geq 0}$ mapping each initial state $s_{0}$ of $C$ to the respective expected value of $f$ evaluated in the final states reached after termination of $C$ on input $s_{0}$. This function $g$ is called the weakest preexpectation of $C$ with respect to $f$, denoted $\textsf{wp}\llbracket C\rrbracket(f)$. Put as an equation, if ${}^{s_{0}}\mu_{C}$ is the probability (sub)measure (footnote 5) over final states reached after termination of $C$ on initial state $s_{0}$, then (footnote 6)

$g(s_{0}) = \textsf{wp}\llbracket C\rrbracket(f)(s_{0}) = \int_{\Sigma} f \, d\,({}^{s_{0}}\mu_{C})~.$

Footnote 5: ${}^{s_{0}}\mu_{C}(s)\in[0,1]$ is the probability that $s$ is the final state reached after termination of $C$ on input $s_{0}$. We have $\sum_{s\in\Sigma}{}^{s_{0}}\mu_{C}(s)\leq 1$, where the “missing” probability mass is the probability of nontermination of $C$ on $s_{0}$.

Footnote 6: As $\Sigma$ is countable, the integral can be expressed as $\sum_{s\in\Sigma}{}^{s_{0}}\mu_{C}(s)\cdot f(s)$.

While $\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right)$ in fact represents an expected value, $f$ itself does not. In an analogy to Dijkstra’s pre– and postconditions, as $f$ is evaluated in the final states after termination of $C$ it is called the postexpectation, and as $\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right)$ is evaluated in the initial states of $C$ it is called the preexpectation.

2.2. The Weakest Preexpectation Calculus

We now show how to determine weakest preexpectations in a systematic and compositional manner by recapitulating the weakest preexpectation calculus à la McIver and Morgan. This calculus employs expectation transformers which move backward through the program in a continuation–passing style, see Fig. 1.

If we are interested in the expected value of some postexpectation $f$ after executing the sequential composition ${C_{1}}{\,\fatsemi}~{}{C_{2}}$ , then we can first determine the weakest preexpectation of $C_{2}$ with respect to $f$ , i.e., $\textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right)$ . Thereafter, we can use the intermediate result $\textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right)$ as postexpectation to determine the weakest preexpectation of $C_{1}$ with respect to $\textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right)$ . Overall, this gives the weakest preexpectation of ${C_{1}}{\,\fatsemi}~{}{C_{2}}$ with respect to the postexpectation $f$ . The above explanation illustrates the compositional nature of the weakest preexpectation calculus. wp–transformers for all language constructs can be defined by induction on the program structure:
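The backward, continuation-passing reading of sequential composition can be spelled out on a tiny instance. The program below is a hypothetical one chosen only for illustration (it does not appear in the paper); each step uses the assignment rule $f[b/e]$ from Table 1.

```latex
% Hypothetical program: C_1 = (b := b + 5), C_2 = (b := 2 * b), postexpectation f = b
\begin{align*}
\textsf{wp}\llbracket b := b + 5 \fatsemi b := 2 \cdot b \rrbracket (b)
  &= \textsf{wp}\llbracket b := b + 5 \rrbracket
     \bigl( \textsf{wp}\llbracket b := 2 \cdot b \rrbracket (b) \bigr) \\
  &= \textsf{wp}\llbracket b := b + 5 \rrbracket (2 \cdot b) \\
  &= 2 \cdot (b + 5) \\
  &= 2b + 10
\end{align*}
```

The inner preexpectation $2 \cdot b$ is computed first and then serves as the postexpectation of the first assignment.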

Definition 2 (The wp–Transformer (McIver and Morgan, 2005)).

Let pGCL be the set of programs in the probabilistic guarded command language. Then the weakest preexpectation transformer

$\textsf{wp}\colon \textnormal{pGCL}\rightarrow\mathbb{F}\rightarrow\mathbb{F}$

is defined according to the rules given in Table 1, where $\left[{\varphi}\right]$ denotes the Iverson–bracket of $\varphi$, i.e., $\left[{\varphi}\right](s)$ evaluates to $1$ if $s\models\varphi$ and to $0$ otherwise. Moreover, for any variable $b\in\mathsf{Vars}$ and any expression $e$, let $f\left[{b}\middle/{e}\right]$ be the expectation with $f\left[{b}\middle/{e}\right](s)=f(s\left[{b}\middle/{e}\right])$ for any $s\in\Sigma$, where $s\left[{b}\middle/{e}\right](b)=s(e)$ and $s\left[{b}\middle/{e}\right](x)=s(x)$ for all $x\in\mathsf{Vars}\setminus\{b\}$.

We call the function ${}^{\textsf{wp}}_{\langle\varphi,C\rangle}\Phi_{f}$ the characteristic function of $\textnormal{while}\,(\varphi)\,\{C\}$ with respect to $f$. Its least fixed point is understood in terms of $\preceq$. To increase readability, we omit wp, $\varphi$, $C$, or $f$ from $\Phi$ whenever they are clear from the context.

Example 3 (Applying the wp Calculus).

Consider the probabilistic program $C$ given by

$\{b := b+5\}\,[\nicefrac{4}{5}]\,\{b := 10\}~.$

Suppose we want to know the expected value of $b$ after execution of $C$ . For this, we determine $\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({b}\right)$ . Using the annotation style shown in Fig. 2(a), we can annotate the program $C$ as shown in Fig. 2(b), using the rules from Table 1. At the top, we read off the weakest preexpectation of $C$ with respect to $b$ , namely $\smash{\tfrac{4b}{5}+6}$ . This tells us that the expected value of $b$ after termination of $C$ on $s_{0}$ is equal to $\smash{\tfrac{4\cdot s_{0}(b)}{5}+6}$ .
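The computation in this example can be replayed mechanically. Below is a minimal sketch of the loop-free wp rules from Table 1, assuming an illustrative encoding that is ours and not the paper's: programs are nested tuples, states are dictionaries, and expectations are functions from states to exact fractions.

```python
from fractions import Fraction

def wp(prog, f):
    """Weakest preexpectation of loop-free `prog` w.r.t. postexpectation `f`."""
    kind = prog[0]
    if kind == "skip":
        return f
    if kind == "assign":                      # ("assign", var, expr): f[var / expr]
        _, var, expr = prog
        return lambda s: f({**s, var: expr(s)})
    if kind == "if":                          # ("if", guard, then, else)
        _, guard, c1, c2 = prog
        g1, g2 = wp(c1, f), wp(c2, f)
        return lambda s: g1(s) if guard(s) else g2(s)
    if kind == "choice":                      # ("choice", p, left, right)
        _, p, c1, c2 = prog
        g1, g2 = wp(c1, f), wp(c2, f)
        return lambda s: p * g1(s) + (1 - p) * g2(s)
    if kind == "seq":                         # ("seq", first, second)
        _, c1, c2 = prog
        return wp(c1, wp(c2, f))
    raise ValueError(f"unsupported construct: {kind}")

# The program C of this example: {b := b + 5} [4/5] {b := 10}
C = ("choice", Fraction(4, 5),
     ("assign", "b", lambda s: s["b"] + 5),
     ("assign", "b", lambda s: 10))

post_b = lambda s: Fraction(s["b"])           # postexpectation f = b
pre = wp(C, post_b)                           # should agree with 4b/5 + 6

matches_closed_form = all(
    pre({"b": b0}) == Fraction(4 * b0, 5) + 6 for b0 in range(20)
)
```

For instance, starting with $b = 5$ the sketch yields $\tfrac{4}{5}\cdot 10 + \tfrac{1}{5}\cdot 10 = 10$, which agrees with $\tfrac{4\cdot 5}{5}+6$.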

The wp–transformer satisfies what are sometimes called healthiness conditions (Hino et al., 2016; Keimel, 2015; McIver and Morgan, 2005) or homomorphism properties (Back and von Wright, 1998):

Theorem 4 (Healthiness Conditions (Kaminski, 2019; McIver and Morgan, 2005)).

Let $C\in\textnormal{pGCL}$, $S=\{s_{0}\preceq s_{1}\preceq s_{2}\preceq\cdots\}\subseteq\mathbb{F}$ (that is, $S$ is a chain), $f,g\in\mathbb{F}$, and $r\in\mathbb{R}_{\geq 0}$. Then:

(1) Continuity: $\textsf{wp}\llbracket C\rrbracket(\sup S) = \sup \textsf{wp}\llbracket C\rrbracket(S)$.

(2) Strictness: $\textsf{wp}\llbracket C\rrbracket$ is strict, i.e., $\textsf{wp}\llbracket C\rrbracket(0)=0$. (Footnote 8: Here, we overload notation and denote by $0$ the constant expectation that maps every $s\in\Sigma$ to $0$.)

(3) Monotonicity: $f\preceq g \quad\textnormal{implies}\quad \textsf{wp}\llbracket C\rrbracket(f)\preceq\textsf{wp}\llbracket C\rrbracket(g)$.

(4) Linearity: $\textsf{wp}\llbracket C\rrbracket(r\cdot f+g) = r\cdot\textsf{wp}\llbracket C\rrbracket(f)+\textsf{wp}\llbracket C\rrbracket(g)$.
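As a sanity check of linearity, one can evaluate both sides for the program $C = \{b := b+5\}\,[\nicefrac{4}{5}]\,\{b := 10\}$ of Ex. 3. The choice $r = 2$, $f = b$, $g = 1$ is ours and serves only as an illustration.

```latex
% Left-hand side: apply the probabilistic-choice and assignment rules to 2b + 1
\begin{align*}
\textsf{wp}\llbracket C \rrbracket (2 \cdot b + 1)
  &= \tfrac{4}{5} \cdot \bigl( 2 \cdot (b + 5) + 1 \bigr)
   + \tfrac{1}{5} \cdot \bigl( 2 \cdot 10 + 1 \bigr)
   = \tfrac{8b}{5} + \tfrac{44}{5} + \tfrac{21}{5}
   = \tfrac{8b}{5} + 13 \\
% Right-hand side: reuse wp[C](b) = 4b/5 + 6 from Ex. 3 and wp[C](1) = 4/5 + 1/5 = 1
2 \cdot \textsf{wp}\llbracket C \rrbracket (b) + \textsf{wp}\llbracket C \rrbracket (1)
  &= 2 \cdot \bigl( \tfrac{4b}{5} + 6 \bigr) + 1
   = \tfrac{8b}{5} + 13
\end{align*}
```

Both sides coincide, as linearity requires.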

3. Bounds on Weakest Preexpectations

For loop–free programs, it is generally straightforward to determine weakest preexpectations, simply by applying the rules in Table 1, which guide us along the syntax of $C$ , see Ex. 3. Weakest preexpectations of loops, on the other hand, are generally non–computable least fixed points and we often have to content ourselves with some approximation of those fixed points.

For us, a sound approximation is either a lower or an upper bound on the least fixed point. There are in principle two challenges: (1) finding a candidate bound and (2) verifying that the candidate is indeed an upper or lower bound. In this paper, we study the latter problem.

3.1. Upper Bounds

The Park induction principle provides us with a very convenient proof rule for verifying upper bounds. In general, this principle reads as follows:

Theorem 1 (Park Induction (Park, 1969)).

Let $(D,\,{\sqsubseteq})$ be a complete lattice and let $\Phi\colon D\rightarrow D$ be continuous. (Footnote 9: It would even suffice for $\Phi$ to be monotonic, but we consider continuous functions throughout this paper.) Then $\Phi$ has a least fixed point in $D$ and for any $I\in D$,

$\Phi(I)\sqsubseteq I \quad\textnormal{implies}\quad \textnormal{lfp}~\Phi\sqsubseteq I~.$

In the realm of weakest preconditions, Park induction gives rise to the following induction principle:

Corollary 2 (Park Induction for wp (Kozen, 1985; Kaminski, 2019)).

Let $\Phi_{f}$ be the characteristic function of the while loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to postexpectation $f$ and let $I\in\mathbb{F}$ . Then

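$\Phi_{f}(I)\preceq I \quad\textnormal{implies}\quad \textsf{wp}\llbracket\textnormal{while}\,(\varphi)\,\{C\}\rrbracket(f)\preceq I~.$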

We call an $I$ that satisfies $\Phi_{f}(I)\preceq I$ a superinvariant. The striking power of Park induction is its simplicity: Once an appropriate candidate $I$ is found (even though this is usually not an easy task), all we have to do is push it through the characteristic function $\Phi_{f}$ once and check whether we went down in our underlying partial order. If this is the case, we have verified that $I$ is indeed an upper bound on the least fixed point and thus on the sought–after weakest preexpectation.

Example 3 (Induction for Upper Bounds).

Consider the program $C_{\mathit{geo}}$ , given by

$\textnormal{while}\,(a\neq 0)\,\{\{a := 0\}\,[\nicefrac{1}{2}]\,\{b := b+1\}\}~,$

where we assume $b\in\mathbb{N}$. Suppose we aim at an upper bound on the expected value of $b$ after executing $C_{\mathit{geo}}$. Using the annotation style of Fig. 3(a), we can annotate the loop $C_{\mathit{geo}}$ as shown in Fig. 3(b), using superinvariant $I=b+\left[{a\neq 0}\right]$, establishing $\Phi_{b}(I)\preceq I$, and by Cor. 2 establishing $\textsf{wp}\llbracket C_{\mathit{geo}}\rrbracket(b)\preceq b+\left[{a\neq 0}\right]$. So the expected value of $b$ after termination of $C_{\mathit{geo}}$ on $s_{0}$ is at most $s_{0}(b)+\left[{s_{0}(a)\neq 0}\right]$.
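The superinvariant check $\Phi_{b}(I)\preceq I$ from this example can be replayed numerically. The sketch below hard-codes the loop $C_{\mathit{geo}}$ and tests the inequality on a finite sample of integer states; the sample range is an arbitrary assumption, and such a check illustrates the rule but does not replace the symbolic argument.

```python
from fractions import Fraction
from itertools import product

def f(a, b):
    """Postexpectation f = b."""
    return Fraction(b)

def I(a, b):
    """Candidate superinvariant I = b + [a != 0]."""
    return Fraction(b) + (1 if a != 0 else 0)

def phi(X):
    """Characteristic function of while (a != 0) {{a := 0} [1/2] {b := b + 1}} w.r.t. f."""
    def result(a, b):
        if a == 0:                            # guard false: [not phi] * f
            return f(a, b)
        # guard true: wp of the loop body applied to X
        return Fraction(1, 2) * X(0, b) + Fraction(1, 2) * X(a, b + 1)
    return result

phi_I = phi(I)
sample_states = list(product(range(-3, 4), range(0, 50)))   # (a, b) pairs
is_superinvariant_on_sample = all(phi_I(a, b) <= I(a, b) for a, b in sample_states)
```

On every sampled state with $a\neq 0$ the two sides are in fact equal, since $\tfrac{1}{2}b+\tfrac{1}{2}(b+2)=b+1$.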

Let us explain why Park induction is sound using the so-called Tarski-Knaster Fixed Point Theorem. (Footnote 10: In (Hark et al., 2020), we explain Park induction via an extended version of Thm. 5. However, this explanation was incorrect and is fixed here. This does not have any consequences regarding the results of (Hark et al., 2020).)

Theorem 4 (Tarski-Knaster Fixed Point Theorem (Tarski, 1955; Knaster, 1928)).

Let $(D,\,{\sqsubseteq})$ be a complete lattice and $\Phi\colon D\to D$ be continuous. Then the set of fixed points of $\Phi$ is a complete lattice. Here, $\inf\{U\in D\mid\Phi(U)\sqsubseteq U\}$ is the least fixed point of $\Phi$ and $\sup\{I\in D\mid I\sqsubseteq\Phi(I)\}$ is the greatest fixed point of $\Phi$ .

Again, for this theorem it would even suffice for $\Phi$ to be only monotonic.

In our setting, the characterization of the least fixed point in the Tarski-Knaster Fixed Point Theorem proves soundness of Park induction (recall that $\Phi$ is continuous, which follows from the continuity of wp stated in the healthiness conditions). For making a comparison to the lower bound case which we consider later, let us introduce the so-called Tarski-Kantorovich Principle:

Theorem 5 (Tarski-Kantorovich Principle, see (Jachymski et al., 2000)).

Let $(D,\,{\sqsubseteq})$ be a complete lattice, $\Phi\colon D\to D$ be continuous, and let $I\in D$ such that $I\sqsubseteq\Phi(I)$. Then the sequence $I\sqsubseteq\Phi(I)\sqsubseteq\Phi^{2}(I)\sqsubseteq\Phi^{3}(I)\sqsubseteq{\cdots}$ is an ascending chain that converges to an element

$\Phi^{\omega}(I) = \lim_{n\to\omega}\Phi^{n}(I) = \sup\{\Phi^{n}(I)\mid n\in\mathbb{N}\} \in D~,$

which is a fixed point of $\Phi$ . In particular, $\Phi^{\omega}(I)$ is the least fixed point of $\Phi$ that is ${}\sqsupseteq{}I$ .

The well–known Kleene Fixed Point Theorem (cf. (Lassez et al., 1982)), which states that $\textnormal{{{lfp}}}~{}\Phi=\Phi^{\omega}(\bot)$ , where $\bot$ is the least element of $D$ , is a special case of the Tarski–Kantorovich Principle.
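Kleene iteration can be made concrete for the loop $C_{\mathit{geo}}$ of the upper bound example with postexpectation $b$. The sketch below computes $\Phi_{b}^{n}(0)$ pointwise by recursion; the function name `kleene` and the chosen sample state are ours. A short calculation (also ours, not from the paper) gives $\Phi_{b}^{n}(0) = b+1-\tfrac{b+n}{2^{n-1}}$ on states with $a\neq 0$ and $n\geq 1$, so the iterates ascend toward $b+1$.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def kleene(n, a, b):
    """Value of the n-th Kleene iterate Phi_b^n(0) at state (a, b) for C_geo."""
    if n == 0:
        return Fraction(0)                    # start from the least element 0
    if a == 0:
        return Fraction(b)                    # guard false: postexpectation b
    # guard true: wp of {a := 0} [1/2] {b := b + 1} applied to the previous iterate
    return (Fraction(1, 2) * kleene(n - 1, 0, b)
            + Fraction(1, 2) * kleene(n - 1, a, b + 1))

a0, b0 = 1, 3
iterates = [kleene(n, a0, b0) for n in range(31)]
is_ascending = all(x <= y for x, y in zip(iterates, iterates[1:]))
gap_to_limit = (b0 + 1) - iterates[30]        # distance to b + [a != 0] after 30 steps
```

The remaining gap after 30 iterations is $\tfrac{33}{2^{29}}$, consistent with the upper bound $b+\left[{a\neq 0}\right]$ obtained by Park induction.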

3.2. Lower Bounds

For verifying lower bounds, we do not have a rule as simple as Park induction available. In particular, for a given complete lattice $(D,\,{\sqsubseteq})$ and monotonic function $\Phi\colon D\rightarrow D$ , the rule

$$I\sqsubseteq\Phi(I)\quad\implies\quad I\sqsubseteq\textnormal{lfp}~\Phi$$

is unsound in general. We call an $I$ satisfying $I\sqsubseteq\Phi(I)$ a subinvariant and the above rule simple lower induction. Generally, we will call an $I$ that is a sub– or a superinvariant an invariant. $I$ being an invariant thus expresses mainly its inductive nature, namely that $I$ is comparable with $\Phi(I)$ with respect to the partial order $\sqsubseteq$ .

An explanation why simple lower induction is unsound is as follows: By Thm. 5, we know from $I\sqsubseteq\Phi(I)$ that $\Phi^{\omega}(I)$ is the least fixed point of $\Phi$ that is greater than or equal to $I$ . Since $\Phi^{\omega}(I)$ is a fixed point, $\Phi^{\omega}(I)\sqsubseteq\textnormal{{{gfp}}}~{}\Phi$ holds, but we do not know how $I$ compares to $\textnormal{{{lfp}}}~{}\Phi$ . We only know that if indeed $I\sqsubseteq\textnormal{{{lfp}}}~{}\Phi$ and $I\sqsubseteq\Phi(I)$ , then iterating $\Phi$ on $I$ also converges to $\textnormal{{{lfp}}}~{}\Phi$ , i.e.,

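$$I\sqsubseteq\textnormal{lfp}~\Phi\quad\text{and}\quad I\sqsubseteq\Phi(I)\quad\text{implies}\quad\Phi^{\omega}(I)=\textnormal{lfp}~\Phi~.$$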

If, however, $I\sqsubseteq\Phi(I)$ and $I$ is strictly greater than $\textnormal{{{lfp}}}~{}\Phi$ , then iterating $\Phi$ on $I$ will yield a fixed point strictly greater than $\textnormal{{{lfp}}}~{}\Phi$ , contradicting soundness of simple lower induction.

While we just illustrated by means of the Tarski–Kantorovich principle why the simple lower induction rule is not sound in general, we should note that the rule is not per se absurd: so-called metering functions (Frohn et al., 2016) basically employ simple lower induction to verify lower bounds on runtimes of nonprobabilistic programs (Kaminski, 2019, Thm. 7.18). For weakest preexpectations, however, simple lower induction is unsound:

Counterexample 6 (Simple Induction for Lower Bounds).

Consider the following loop $C_{\mathit{cex}}$ , where $b,k\in\mathbb{N}$

[TABLE]

As in Ex. 3, $\textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)=b+\left[{a\neq 0}\right]$ . In particular, this weakest preexpectation is independent of $k$ . The corresponding characteristic function is

[TABLE]

Let us consider ${I}^{\prime}=b+\left[{a\neq 0}\right]\cdot(1+2^{k})$ , which does depend on $k$ . Indeed, one can check that ${I}^{\prime}\preceq\Phi_{b}({I}^{\prime})$ , i.e., $I^{\prime}$ is a subinvariant. If the simple lower induction rule were sound, we would immediately conclude that ${I}^{\prime}$ is a lower bound on $\textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)$ , but this is obviously false since

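$${I}^{\prime}~=~b+\left[{a\neq 0}\right]\cdot(1+2^{k})~\not\preceq~b+\left[{a\neq 0}\right]~=~\textsf{wp}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)~.$$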
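The counterexample can be checked mechanically on a grid of states. The listing of $C_{\mathit{cex}}$ is not reproduced in this section, so the loop used below is an assumption chosen to be consistent with the stated weakest preexpectation and with ${I}^{\prime}$: `while (a != 0) { {a := 0} [1/2] {b := b + 1}; k := k + 1 }`. The function names are hypothetical as well.

```python
from fractions import Fraction

def phi(X):
    """Characteristic function w.r.t. postexpectation b of the ASSUMED loop
    while (a != 0) { {a := 0} [1/2] {b := b + 1}; k := k + 1 }."""
    def result(a, b, k):
        if a == 0:
            # Guard violated: the value is the postexpectation b.
            return Fraction(b)
        return (Fraction(1, 2) * X(0, b, k + 1)
                + Fraction(1, 2) * X(a, b + 1, k + 1))
    return result

def candidate(a, b, k):
    """I' = b + [a != 0] * (1 + 2^k), the candidate from the text."""
    return Fraction(b) + (1 + 2 ** k if a != 0 else 0)

def wp_b(a, b, k):
    """b + [a != 0], the weakest preexpectation stated in the text."""
    return Fraction(b) + (1 if a != 0 else 0)

states = [(a, b, k) for a in (0, 1) for b in range(4) for k in range(4)]

# I' passes the subinvariant check I' <= Phi(I') on every sampled state ...
is_subinvariant = all(candidate(*s) <= phi(candidate)(*s) for s in states)
# ... but it is not below the weakest preexpectation.
is_lower_bound = all(candidate(*s) <= wp_b(*s) for s in states)

print(is_subinvariant, is_lower_bound)  # True False
```

For the assumed loop, ${I}^{\prime}$ is even a fixed point of the characteristic function, which is exactly the situation in which iterating from a subinvariant gets stuck above the least fixed point.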

3.3. Problem Statement

The purpose of this paper is to present a sound lower induction rule of the following form: Let $\Phi_{f}$ be the characteristic function of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $f$ and let $I\in\mathbb{F}$ . Then

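$$I\preceq\Phi_{f}(I)~\wedge~\text{some side conditions}\quad\implies\quad I\preceq\textnormal{lfp}~\Phi_{f}~.$$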

We still want our lower induction rule to be simple in the sense that checking the side conditions should be conceptually as simple as checking $I\preceq\Phi_{f}(I)$ . Intuitively, we want to apply the semantics of the loop body only finitely often, not $\omega$ times, to avoid reasoning about limits of sequences or anything alike. We provide such side conditions in our main contribution, Thm. 9, which transfers the Optional Stopping Theorem of probability theory to weakest preexpectation reasoning.

3.4. Uniform Integrability

We now present a sufficient and necessary criterion to under–approximate the least fixed points that we seek. Again, let $\Phi_{f}$ be the characteristic function of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $f$ . Thm. 4 implies that $\Phi_{f}$ is continuous and monotonic.

Let us consider a subinvariant $I$ , i.e., $I \preceq\Phi_{f}(I)$ . If we iterate $\Phi_{f}$ on $I$ ad infinitum, then the Tarski–Kantorovich principle (Thm. 5) guarantees that we will converge to some fixed point $\Phi_{f}^{\omega}(I)$ that is ${\succeq}\,I$ . From continuity of $\Phi_{f}$ and Thm. 5, one can easily show that $\Phi_{f}^{\omega}(I)$ coincides with $\textnormal{{{lfp}}}~{}\Phi_{f}$ if and only if $I$ itself was already ${\preceq}\,\textnormal{{{lfp}}}~{}\Phi_{f}$ , i.e.:

Theorem 7 (Subinvariance and Lower Bounds).

For any subinvariant $I$ , we have

$$\Phi_{f}^{\omega}(I)=\textnormal{lfp}~\Phi_{f}\quad\text{iff}\quad I\preceq\textnormal{lfp}~\Phi_{f}~.$$

More generally, for any expectation $X$ (not necessarily a sub- or superinvariant), if iterating $\Phi_{f}$ on $X$ converges to the least fixed point of $\Phi_{f}$ , then we call $X$ uniformly integrable for $f$ :

Definition 8 (Uniform Integrability of Expectations).

Given a loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ , an expectation $X\in\mathbb{F}$ is called uniformly integrable (u.i.) for $f\in\mathbb{F}$ if $\lim\limits_{n\to\omega}\Phi_{f}^{n}(X)$ exists and

$$\lim_{n\to\omega}\Phi_{f}^{n}(X)~=~\textnormal{lfp}~\Phi_{f}~.$$
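As a worked illustration of this definition, consider expectations of the form $X_{c}=b+\left[{a\neq 0}\right]\cdot c$ for a constant $c\geq 0$. The derivation below is a sketch that assumes the geometric loop is `while (a != 0) { {a := 0} [1/2] {b := b + 1} }`; the family $X_{c}$ is introduced here purely for illustration.

```latex
% Assumption: C_geo = while (a != 0) { {a := 0} [1/2] {b := b + 1} }
% X_c := b + [a \neq 0] \cdot c   for a constant c \geq 0
\begin{align*}
\Phi_b(X_c)
  &= [a = 0]\cdot b + [a \neq 0]\cdot\tfrac{1}{2}\bigl(b + (b + 1 + c)\bigr)
   = b + [a \neq 0]\cdot\tfrac{1+c}{2}
   = X_{(1+c)/2}, \\
\Phi_b^{n}(X_c)
  &= b + [a \neq 0]\cdot\Bigl(1 + \tfrac{c-1}{2^{n}}\Bigr)
   \;\xrightarrow{~n\to\omega~}\; b + [a \neq 0].
\end{align*}
```

Under this assumption $X_{c}$ is a subinvariant for $c\leq 1$ and a superinvariant for $c\geq 1$, and in both cases the iterates converge to $b+\left[{a\neq 0}\right]$. So every $X_{c}$ would be uniformly integrable for $b$ in this loop.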

So far, we have thus established the following diagram which we will gradually extend over the next two sections:

$$I~\textnormal{u.i. for}~f\quad\overset{\text{Def. 8}}{\iff}\quad\Phi_{f}^{n}(I)\xrightarrow{~n\to\omega~}\textnormal{lfp}~\Phi_{f}\quad\overset{\text{Thm. 7 and Def. 8}}{\implies}\quad\bigl(I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{lfp}~\Phi_{f}\bigr)$$

Uniform integrability (Grimmett and Stirzaker, 2001) — a notion originally from probability theory — will be essential for the Optional Stopping Theorem in Sect. 5. While, so far, we have studied the function $\Phi_{f}$ solely from an expectation transformer point of view and defined a purely expectation–theoretical notion of uniform integrability without any reference to probability theory, we will study in Sect. 4 the function $\Phi_{f}$ from a stochastic process point of view. Stochastic processes are not inductive per se, whereas expectation transformers make heavy use of induction. We will, however, rediscover the inductiveness also in the realm of stochastic processes. We will also see how our notion of uniform integrability corresponds to uniform integrability in its original sense.

4. From Expectations to Stochastic Processes

In this section, we connect concepts from expectation transformers with notions from probability theory. In Sect. 4.1, we recapitulate standard constructions of probability spaces for probabilistic programs, instantiate them in our setting, and present our new results on connecting expectation transformers with stochastic processes (Sect. 4.2) and uniform integrability (Sect. 4.3). Proofs can be found in App. C. For further background on probability theory, we refer to App. B and (Bauer, 1971; Grimmett and Stirzaker, 2001).

We fix for this section an arbitrary loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ . The loop body $C$ may contain loops but we require $C$ to be universally almost–surely terminating (AST), i.e., $C$ terminates on any input with probability 1. The set of program states can be uniquely partitioned into $\Sigma=\Sigma_{\varphi}\uplus\Sigma_{\neg\varphi}$ , with $s\in\Sigma_{\varphi}$ iff $s\models\varphi$ . The set $\Sigma_{\neg\varphi}$ thus contains the terminal states from which the loop is not executed further.

4.1. Canonical Probability Space

We begin with constructing a canonical probability measure and space corresponding to the execution of our loop. As every pGCL program is, operationally, a countable Markov chain, our construction is similar to the standard construction for Markov chains (cf. (Vardi, 1985)).

In general, a measurable space is a pair $(\Omega,\mathfrak{F})$ consisting of a sample space $\Omega$ and a $\sigma$ –field $\mathfrak{F}$ of $\Omega$ , which is a collection of subsets of $\Omega$ , closed under complement and countable union, such that $\Omega\in\mathfrak{F}$ . In our setting, a loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ induces the following canonical measurable space:

Definition 1 (Loop Space).

The loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ induces a unique measurable space $(\Omega^{\mbox{\tiny\rm loop}},\,\mathfrak{F}^{\mbox{\tiny\rm loop}})$ with sample space $\Omega^{\mbox{\tiny\rm loop}}~{}{}\coloneqq{}~{}\Sigma^{\omega}~{}{}={}~{}\{\vartheta\colon\mathbb{N}\to\Sigma\},$ i.e., it is the set of all infinite sequences of program states (so–called runs). For $\vartheta\in\Omega^{\mbox{\tiny\rm loop}}$ , we denote by $\vartheta[n]$ the $n$ –th state in the sequence $\vartheta$ (starting to count at 0). The $\sigma$ –field $\mathfrak{F}^{\mbox{\tiny\rm loop}}$ is the smallest $\sigma$ –field that contains all cylinder sets $Cyl(\pi)=\left\{\,{\pi\vartheta}~{}\middle|~{}{\vartheta\in\Sigma^{\omega}}\,\right\}$ , for all finite prefixes $\pi\in\Sigma^{+}$ , denoted as

[TABLE]

Intuitively, a run $\vartheta\in\Omega$ is an infinite sequence of states $\vartheta~{}{}={}~{}s_{0}\,s_{1}\,s_{2}\,s_{3}\,{{}\cdots{}}~{},$ where $s_{0}$ represents the initial state on which the loop is started and $s_{i}$ is a state that could be reached after $i$ iterations of the loop. Obviously, some sequences in $\Omega^{\mbox{\tiny\rm loop}}$ may not actually be admissible by our loop.

We next develop a canonical probability measure corresponding to the execution of the loop, which will assign the measure $0$ to inadmissible runs. We start with considering a single loop iteration. The loop body $C$ induces a family of distributions (since the loop body $C$ is AST, these are distributions and not subdistributions)

[TABLE]

where $\phantom{}{}^{s}\mu_{C}({s}^{\prime})$ is the probability that after one iteration of $C$ on $s$ , the program is in state ${s}^{\prime}$ .

The loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ induces a family of probability measures on $(\Omega^{\mbox{\tiny\rm loop}},\,\mathfrak{F}^{\mbox{\tiny\rm loop}})$ . This family is parameterized by the initial state of the loop. Using the distributions $\phantom{}{}^{\bullet}\mu_{C}$ above, we first define the probability of a finite non–empty prefix of a run, i.e., for $\pi\in\Sigma^{+}$ . Here, ${}^{s}p(\pi)$ is the probability that $\pi$ is the sequence of states reached after the first loop iterations, when starting the loop in state $s$ . Hence, the family

[TABLE]

of distributions on $\Sigma^{+}$ is defined by

(1) ${}^{s}p({s}^{\prime})=\left[{s={s}^{\prime}}\right]$

(2) ${}^{s}p(\pi{s}^{\prime}{s}^{\prime\prime})=\begin{cases}\phantom{}{}^{s}p(\pi{s}^{\prime})\cdot\left[{{s}^{\prime\prime}={s}^{\prime}}\right],&\text{if }{s}^{\prime}\in\Sigma_{\neg\varphi}\\ \phantom{}{}^{s}p(\pi{s}^{\prime})\cdot\phantom{}^{{s}^{\prime}}\mu_{C}({s}^{\prime\prime}),&\text{if }{s}^{\prime}\in\Sigma_{\varphi}\end{cases}$ .
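The two rules translate directly into a loop over consecutive pairs of states. The sketch below is illustrative only: `prefix_probability`, `guard`, and `mu` are hypothetical names, states are modelled as pairs `(a, b)`, and the example loop body (a fair coin between `a := 0` and `b := b + 1`) is an assumption.

```python
from fractions import Fraction

def prefix_probability(s, prefix, guard, mu):
    """Probability of the non-empty state sequence `prefix` when starting in s.

    guard(state) -> bool decides whether the loop body is executed again;
    mu(state) -> dict maps successor states to their one-iteration probability.
    """
    # Rule (1): a prefix has positive probability only if it starts in s.
    prob = Fraction(1) if prefix[0] == s else Fraction(0)
    # Rule (2): extend the prefix one state at a time.
    for cur, nxt in zip(prefix, prefix[1:]):
        if guard(cur):
            prob *= mu(cur).get(nxt, Fraction(0))
        else:
            # Terminal states are repeated forever.
            prob *= 1 if nxt == cur else 0
    return prob

# Assumed example: geometric loop over states (a, b).
guard = lambda s: s[0] != 0
mu = lambda s: {(0, s[1]): Fraction(1, 2), (s[0], s[1] + 1): Fraction(1, 2)}

print(prefix_probability((1, 0), [(1, 0), (1, 1), (0, 1), (0, 1)], guard, mu))  # 1/4
print(prefix_probability((1, 0), [(1, 0), (0, 0), (1, 0)], guard, mu))          # 0
```

The second call shows an inadmissible prefix: once a terminal state is reached, any sequence that leaves it gets probability 0.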

Using the family ${}^{\bullet}{}p$ , we now obtain a canonical probability measure on the loop space.

Lemma 2 (Loop Measure (Feller, 1971, Kolmogorov’s Extension Theorem)).

There exists a unique family of probability measures ${}^{\bullet}{}\mathbb{P}\colon\Sigma\to\mathfrak{F}\to[0,1]$ with

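$${}^{s}\mathbb{P}\left(Cyl(\pi)\right)~=~{}^{s}p(\pi)\quad\text{for all }s\in\Sigma\text{ and }\pi\in\Sigma^{+}.$$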

We now turn to random variables and their expected values. A mapping $X\colon\Omega\to\smash{\overline{\mathbb{R}}_{\geq 0}}$ on a probability space $\smash{(\Omega,\mathfrak{F},\mathbb{P})}$ is called ( $\mathfrak{F}$ –)measurable or a random variable if for any open set $U\subseteq\overline{\mathbb{R}}_{\geq 0}$ its preimage lies in $\mathfrak{F}$ , i.e., $X^{-1}(U)\in\mathfrak{F}$ . If $X(\Omega)\subseteq\overline{\mathbb{N}}=\mathbb{N}\cup\{\omega\}$ , then this is equivalent to checking $X^{-1}(\{n\})\in\mathfrak{F}$ for any $n\in\mathbb{N}$ . The expected value $\phantom{}\mathbb{E}\left(X\right)$ of a random variable $X$ is defined as $\phantom{}\mathbb{E}\left(X\right)\coloneqq\int_{\Omega}Xd\mathbb{P}$ (details on integrals for arbitrary measures can be found in App. B). If $X$ takes only countably many values we have

$$\mathbb{E}\left(X\right)~=~\sum_{r\in X(\Omega)}\mathbb{P}\left(X=r\right)\cdot r~.$$

We saw that $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ gives rise to a unique canonical measurable space $(\Omega^{\mbox{\tiny\rm loop}},\,\mathfrak{F}^{\mbox{\tiny\rm loop}})$ and to a family of probability measures ${}^{s}\mathbb{P}$ parameterized by the initial state $s$ on which our loop is started. We now define a corresponding parameterized expected value operator ${}^{\bullet}{}\mathbb{E}$ .

Definition 3 (Expected Value for Loops $\boldsymbol{{}^{\bullet}{}\mathbb{E}}$ ).

Let $s\in\Sigma$ and $X\colon\Omega^{\mbox{\tiny\rm loop}}\to\overline{\mathbb{R}}_{\geq 0}$ be a random variable. The expected value of $X$ with respect to the loop measure ${}^{s}\mathbb{P}$ , parameterized by state $s$ , is defined by $\phantom{}{}^{s}\mathbb{E}\left(X\right)\coloneqq\int_{\Omega}X\,d\left({}^{s}\mathbb{P}\right)$ .

Next, we define a random variable that corresponds to the number of iterations that our loop makes until it terminates.

Definition 4 (Looping Time).

The mapping

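$$T^{\neg\varphi}\colon\Omega^{\mbox{\tiny\rm loop}}\to\overline{\mathbb{N}},\quad\vartheta\mapsto\inf\{n\in\mathbb{N}\mid\vartheta[n]\in\Sigma_{\neg\varphi}\}$$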

is a random variable and called the looping time of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ . Here, $\overline{\mathbb{N}}=\mathbb{N}\cup\{\omega\}$ and $\inf\emptyset=\omega$ .

The canonical $\sigma$ –field $\mathfrak{F}^{\mbox{\tiny\rm loop}}$ contains infinite runs. But after $n$ iterations of the loop we only know the first $n+1$ states $s_{0}\cdots s_{n}$ of a run. Gaining knowledge in this successive fashion can be captured by a so–called filtration of the $\sigma$ –field $\mathfrak{F}^{\mbox{\tiny\rm loop}}$ . In general, a filtration is a sequence $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ of subsets of $\mathfrak{F}$ , such that $\mathfrak{F}_{n}\subseteq\mathfrak{F}_{n+1}$ and $\mathfrak{F}_{n}$ is a $\sigma$ –field for any $n\in\mathbb{N}$ , i.e., $\mathfrak{F}$ is approximated from below.

Definition 5 (Loop Filtration).

The sequence $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ is a filtration of $\mathfrak{F}^{\mbox{\tiny\rm loop}}$ , where

[TABLE]

i.e., $\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}}$ is the smallest $\sigma$ –field containing $\{Cyl(\pi)\mid\pi\in\Sigma^{+},~{}|\pi|=n+1\}$ . (Footnote 13: Note that here $\mathfrak{F}^{\mbox{\tiny\rm loop}}=\bigcup\limits_{n\in\mathbb{N}}\mathfrak{F}^{\mbox{\tiny\rm loop}}_{n}$ , which is not the case for general filtrations.)

Next, we recall the notion of stopping times from probability theory.

Definition 6 (Stopping Time).

For a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with filtration $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ , a random variable $T\colon\Omega\to\overline{\mathbb{N}}$ is called a stopping time with respect to $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ if for every $n\in\mathbb{N}$ we have $T^{-1}(\{n\})=\{\vartheta\in\Omega\mid T(\vartheta)=n\}\in\mathfrak{F}_{n}$ .

Let us reconsider the looping time $T^{\neg\varphi}$ and the loop filtration $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ . In order to decide for a run $\vartheta=s_{0}s_{1}\cdots\in\Omega^{\mbox{\tiny\rm loop}}$ whether its looping time is $n$ , we only need to consider the states $s_{0}\cdots s_{n}$ . Hence, $(T^{\neg\varphi})^{-1}(\{n\})\in\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}}$ for any $n\in\mathbb{N}$ and thus $T^{\neg\varphi}$ is a stopping time with respect to $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ .
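The stopping-time argument can be made concrete on finite prefixes of runs. In the sketch below, `looping_time` and `guard` are hypothetical names, states are pairs `(a, b)`, and the guard `a != 0` is an assumed example. On a finite prefix, `math.inf` only means that no terminal state has been observed yet; it coincides with $\omega$ only for the full infinite run.

```python
import math

def looping_time(run_prefix, guard):
    """Index of the first state violating the guard, or math.inf if none is seen."""
    for n, state in enumerate(run_prefix):
        if not guard(state):
            return n
    return math.inf

guard = lambda s: s[0] != 0  # assumed example guard: a != 0

run1 = [(1, 0), (1, 1), (0, 1), (0, 1)]
run2 = [(1, 0), (1, 1), (0, 1), (5, 9)]  # agrees with run1 on the first 3 states

# Whether the looping time equals 2 is decided by the states s_0 s_1 s_2 alone.
print(looping_time(run1, guard), looping_time(run2, guard), looping_time(run1[:3], guard))  # 2 2 2
```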

Note that $T^{\neg\varphi}$ does not reflect the actual runtime of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ , as it does not take the runtime of the loop body $C$ into account. Instead, $T^{\neg\varphi}$ only counts the number of loop iterations of the “outer loop” $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ . This enriches the class of probabilistic programs our technique will be able to analyze, as we will not need to require that the whole program has finite expected runtime, but only that the outer loop is expected to be executed finitely often.

4.2. Canonical Stochastic Process

Now we can present our novel results on the connection of weakest preexpectations and stochastic processes. Henceforth, let $f,I\in\mathbb{F}$ . Intuitively, $f$ will play the role of the postexpectation and $I$ the role of an invariant (i.e., $I$ is a sub– or superinvariant). We now present a canonical stochastic process, i.e., a sequence of random variables that captures approximating $\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ using the invariant $I$ .

Definition 7 (Induced Stochastic Process).

The stochastic process induced by $I$ , denoted $\mathbf{X}^{f,I}=(X_{n}^{f,I})_{n\in\mathbb{N}}$ , is given by

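$$X_{n}^{f,I}\colon\Omega^{\mbox{\tiny\rm loop}}\to\overline{\mathbb{R}}_{\geq 0},\quad\vartheta\mapsto\begin{cases}f\left(\vartheta[T^{\neg\varphi}(\vartheta)]\right),&\text{if }T^{\neg\varphi}(\vartheta)\leq n\\ I\left(\vartheta[n{+}1]\right),&\text{otherwise.}\end{cases}$$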

Now, in what sense does the stochastic process $\mathbf{X}^{f,I}$ capture approximating the weakest preexpectation of our loop with respect to $f$ by invariant $I$ ? $X_{n}^{f,I}$ takes as argument a run $\vartheta$ of the loop and assigns to $\vartheta$ a value as follows: If the loop has reached a terminal state within $n$ iterations, it returns the value of the postexpectation $f$ evaluated in that terminal state. If no such terminal state is reached within $n$ steps, it simply approximates the remainder of the run, i.e.,

$$\vartheta[n{+}1]\,\vartheta[n{+}2]\,\vartheta[n{+}3]\,\cdots~,$$

by returning the value of the invariant $I$ evaluated in $\vartheta[n{+}1]$ . We see that $X_{n}^{f,I}$ needs at most the first $n+2$ states of a run to determine its value. Thus, $X_{n}^{f,I}$ is not $\mathfrak{F}_{n}$ –measurable but $\mathfrak{F}_{n+1}$ –measurable, as there exist runs that agree on the first $n+1$ states but yield different images under $X_{n}^{f,I}$ . Hence, we shift the loop filtration $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ by one.
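The verbal description of $X_{n}^{f,I}$ can be sketched as a function on finite run prefixes. All names below are hypothetical, and the guard, postexpectation, and invariant are assumed examples (`a != 0`, `b`, and `b + [a != 0]`).

```python
def induced_process(n, run, guard, f, I):
    """Value of X_n^{f,I} on `run`; `run` must list at least n + 2 states."""
    for state in run[: n + 1]:      # states run[0], ..., run[n]
        if not guard(state):        # a terminal state was reached within n iterations
            return f(state)
    return I(run[n + 1])            # otherwise approximate the remainder by I

guard = lambda s: s[0] != 0
f = lambda s: s[1]
I = lambda s: s[1] + (1 if s[0] != 0 else 0)

run1 = [(1, 0), (1, 1), (0, 1), (0, 1)]
run2 = [(1, 0), (0, 0), (0, 0), (0, 0)]

# run1 and run2 agree on their first state but X_0 differs, so the value of X_0
# is not determined by the first n + 1 = 1 states.
print(induced_process(0, run1, guard, f, I), induced_process(0, run2, guard, f, I))  # 2 0
```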

Definition 8 (Shifted Loop Filtration).

The filtration $(\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ of $\mathfrak{F}^{\mbox{\tiny\rm loop}}$ is defined by

$$\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}~\coloneqq~\mathfrak{F}_{n+1}^{\mbox{\tiny\rm loop}}~.$$

Note that $(T^{\neg\varphi})^{-1}(\{n\})\in\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}}\subseteq\mathfrak{F}_{n+1}^{\mbox{\tiny\rm loop}}=\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ , so $T^{\neg\varphi}$ is a stopping time w.r.t. $(\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ as well.

Lemma 9 (Adaptedness of Induced Stochastic Process).

$\mathbf{X}^{f,I}$ is adapted to $(\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ , i.e., $X_{n}^{f,I}$ is $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ –measurable.

The loop space, the loop measure, and the induced stochastic process $\mathbf{X}^{f,I}$ are not defined by induction on the number of steps performed in the program. The loop space, for instance, contains all infinite sequences of states, whether they are admissible by the loop or not. The loop measure filters out the inadmissible runs and gives them probability 0.

Reasoning by invariants and characteristic functions, on the other hand, is inductive. We will thus relate iterating a characteristic function on $I$ to the stochastic process $\mathbf{X}^{f,I}$ . For this, let $\Phi_{f}$ again be the characteristic function of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $f$ , i.e.,

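$$\Phi_{f}(X)~=~\left[{\neg\varphi}\right]\cdot f+\left[{\varphi}\right]\cdot\textsf{wp}\left\llbracket{C}\right\rrbracket\left({X}\right)~.$$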

We now develop a first connection between the stochastic process $\mathbf{X}^{f,I}$ and $\Phi_{f}$ , which involves the notion of conditional expected values with respect to a $\sigma$ –field, for which we provide some preliminaries here. In general, for $M\subseteq\Omega$ , by slight abuse of notation, the Iverson bracket $\left[{M}\right]:\Omega\to\overline{\mathbb{R}}_{\geq 0}$ maps $\vartheta\in\Omega$ to $1$ if $\vartheta\in M$ and to $0$ otherwise. $\left[{M}\right]$ is $\mathfrak{F}$ –measurable iff $M\in\mathfrak{F}$ . If $X$ is a random variable on $(\Omega,\mathfrak{F},\mathbb{P})$ and $\mathfrak{G}\subseteq\mathfrak{F}$ is a $\sigma$ –field with respect to $\Omega$ , then the conditional expected value $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is a $\mathfrak{G}$ –measurable mapping such that for every $G\in\mathfrak{G}$ the equality $\phantom{}\mathbb{E}\left(X\cdot\left[{G}\right]\right)=\phantom{}\mathbb{E}\left(\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\cdot\left[{G}\right]\right)$ holds, i.e., restricted to the set $G$ the conditional expected value $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)$ and $X$ have the same expected value. Hence, $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)$ is a random variable that is like $X$ , except that it “distributes the value of $X$ equally” over elements that are indistinguishable in the subfield $\mathfrak{G}$ , i.e., elements that are either all contained in a $\mathfrak{G}$ –measurable set or none of them is.

Theorem 10 (Relating $\boldsymbol{\mathbf{X}^{f,I}}$ and $\boldsymbol{\Phi_{f}}$ ).

For any $n\in\mathbb{N}$ and any $s\in\Sigma$ , we have

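$${}^{s}\mathbb{E}\left(X_{n+1}^{f,I}\mid\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}\right)~=~X_{n}^{f,\Phi_{f}(I)}~.$$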

Note that both sides in Thm. 10 are mappings of type $\Omega^{\mbox{\tiny\rm loop}}\to\overline{\mathbb{R}}_{\geq 0}$ . Intuitively, Thm. 10 expresses the following: Consider some cylinder $Cyl(\pi)\in\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ , i.e., $\pi=s_{0}\cdots s_{n+1}\in\Sigma^{n+2}$ is a sequence of states of length $n+2$ . Then, $X_{n}^{f,\Phi_{f}(I)}$ and $X_{n+1}^{f,I}$ have the same expected value under ${}^{s}\mathbb{P}$ on the cylinder set $Cyl(\pi)$ independent of the initial state $s$ of the loop.

Using Thm. 10, one can now explain in which way iterating $\Phi_{f}$ on $I$ represents an expected value, thus revealing the inductive structure inside the induced stochastic process:

Corollary 11 (Relating Expected Values of $\boldsymbol{\mathbf{X}^{f,I}}$ and Iterations of $\boldsymbol{\Phi_{f}}$ ).

For any $n\in\mathbb{N}$ and any $s\in\Sigma$ , we have

$${}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)~=~\Phi_{f}^{n+1}(I)(s)~.$$

Intuitively, $\Phi_{f}^{n+1}$ represents allowing for at most $n+1$ evaluations of the loop guard. For any state $s\in\Sigma$ , the number $\Phi_{f}^{n+1}(I)(s)$ is composed of

(a) $f$ ’s average value on the final states of those runs starting in $s$ that terminate within $n+1$ guard evaluations, and

(b) $I$ ’s average value on the $(n+2)$ –nd states of those runs starting in $s$ that do not terminate within $n+1$ guard evaluations.
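The decomposition into (a) and (b) can be checked by exact enumeration on a small example. The sketch below compares the expected value of the induced process, computed over all state prefixes of length $n+2$, with $\Phi_{f}^{n+1}(I)(s)$. The loop (a fair coin between `a := 0` and `b := b + 1` under guard `a != 0`), the expectation `I`, and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def guard(s):
    return s[0] != 0

def step(s):
    """One-iteration distribution; terminal states are simply repeated."""
    a, b = s
    if not guard(s):
        return {s: Fraction(1)}
    return {(0, b): Fraction(1, 2), (a, b + 1): Fraction(1, 2)}

def f(s):
    return Fraction(s[1])                           # postexpectation b

def I(s):
    return Fraction(s[1]) + (2 if guard(s) else 0)  # some arbitrary expectation

def phi(X):
    """Characteristic function of the assumed loop with respect to f."""
    def result(s):
        if not guard(s):
            return f(s)
        return sum(p * X(t) for t, p in step(s).items())
    return result

def phi_power(n, X):
    for _ in range(n):
        X = phi(X)
    return X

def expected_X(n, s):
    """Expected value of the induced process at time n, started in s."""
    prefixes = {(s,): Fraction(1)}
    for _ in range(n + 1):                          # build all prefixes of length n + 2
        extended = {}
        for pi, p in prefixes.items():
            for t, q in step(pi[-1]).items():
                extended[pi + (t,)] = extended.get(pi + (t,), 0) + p * q
        prefixes = extended

    def X(pi):
        for state in pi[: n + 1]:
            if not guard(state):
                return f(state)                     # part (a): terminated runs
        return I(pi[n + 1])                         # part (b): still running
    return sum(p * X(pi) for pi, p in prefixes.items())

print(expected_X(1, (1, 0)), phi_power(2, I)((1, 0)))  # 5/4 5/4
```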

We now want to take $n$ to the limit by considering all possible numbers of iterations of the loop body. We will see that this corresponds to evaluating the stochastic process $\mathbf{X}^{f,I}$ at the time when our loop terminates, i.e., the looping time $T^{\neg\varphi}$ :

Definition 12 (Canonical Stopped Process).

The mapping

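$$X_{T^{\neg\varphi}}^{f,I}\colon\Omega^{\mbox{\tiny\rm loop}}\to\overline{\mathbb{R}}_{\geq 0},\quad\vartheta\mapsto\begin{cases}f\left(\vartheta[T^{\neg\varphi}(\vartheta)]\right),&\text{if }T^{\neg\varphi}(\vartheta)<\omega\\ 0,&\text{otherwise}\end{cases}$$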

is the stopped process, corresponding to $\mathbf{X}^{f,I}$ stopped at stopping time $T^{\neg\varphi}$ . As this mapping is independent of $I$ , we write $X_{T^{\neg\varphi}}^{f}$ instead of $X_{T^{\neg\varphi}}^{f,I}$ .

The stopped process now corresponds exactly to the quantity we want to reason about, namely the value of $f$ evaluated in the final state after termination of our loop. For nonterminating runs we get $0$ , as there exists no state in which to evaluate $f$ .

We now show that the limit of the induced stochastic process $\mathbf{X}^{f,I}$ corresponds to the stopped process $X_{T^{\neg\varphi}}^{f}$ . For the following lemma, note that a statement over runs $\alpha$ holds almost–surely in the probability space $(\Omega^{{\mbox{\tiny\rm loop}}},\mathfrak{F}^{{\mbox{\tiny\rm loop}}},{}^{s}\mathbb{P})$ , if $\phantom{}{}^{s}\mathbb{P}\left(\left\{\,{\vartheta\in\Omega}~{}\middle|~{}{\vartheta\text{ satisfies }\alpha}\,\right\}\right)=1$ , i.e., the set of all elements of the sample space satisfying $\alpha$ has probability $1$ .

Lemma 13 (Convergence of $\boldsymbol{\mathbf{X}^{f,I}}$ to $\boldsymbol{X_{T^{\neg\varphi}}^{f}}$ ).

The stochastic process $\mathbf{X}^{f,I}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ converges point–wise to $X_{T^{\neg\varphi}}^{f}$ , i.e., for all $\vartheta\in\Omega^{\mbox{\tiny\rm loop}}$ ,

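$$\lim_{n\to\omega}\Bigl(X_{n}^{f,I}(\vartheta)\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right](\vartheta)\Bigr)~=~X_{T^{\neg\varphi}}^{f}(\vartheta)~.$$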

So if $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is universally almost–surely terminating, then $\mathbf{X}^{f,I}$ converges to $X_{T^{\neg\varphi}}^{f}$ almost–surely with respect to the measure $\phantom{}{}^{s}\mathbb{P}$ for any $s\in\Sigma$ .

Intuitively, the factor $\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right](\vartheta)$ selects those runs $\vartheta$ where the looping time $T^{\neg\varphi}$ is finite. If the loop is AST, then this factor can be neglected, because then $\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ is the constant function $1$ for the probability measures $\phantom{}{}^{s}\mathbb{P}$ . In any case (i.e., whether the looping time is almost–surely finite or not), the expected value of the stopped process captures precisely the weakest preexpectation of our loop with respect to the postexpectation $f$ , since only the terminating runs are taken into account by $X_{T^{\neg\varphi}}^{f}$ and by $\textnormal{{{lfp}}}~{}\Phi_{f}$ when computing the expected value of $f$ after termination of the loop. So from Cor. 11 and Lem. 13 we get our first main result:

Theorem 14 (Weakest Preexpectation is Expected Value of Stopped Process).

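$$\textsf{wp}\left\llbracket{\textnormal{while}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)~=~\textnormal{lfp}~\Phi_{f}~=~\lambda s.~{}^{s}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)$$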

Thm. 14 captures our sought–after least fixed point as an expected value of a canonical stopped process. This is what will allow us to later apply the Optional Stopping Theorem. Moreover, it is crucial for deriving our generalization of an existing rule for lower bounds (cf. Sect. 6) and the connection of upper bounds to the Lemma of Fatou (cf. Sect. 7).
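Reading the weakest preexpectation as the expected value of $f$ in the final state suggests a simple sanity check by simulation. The following is a rough Monte Carlo sketch, not a verification technique from the paper: the loop (a fair coin between `a := 0` and `b := b + 1` under guard `a != 0`), the function names, the sample size, and the seed are all assumptions.

```python
import random

def run_loop(a, b, rng):
    """Simulate the assumed geometric loop once and return the final value of b."""
    while a != 0:
        if rng.random() < 0.5:
            a = 0
        else:
            b += 1
    return b

def estimate_wp(a, b, samples=20000, seed=0):
    """Average of the postexpectation b over `samples` simulated runs."""
    rng = random.Random(seed)
    return sum(run_loop(a, b, rng) for _ in range(samples)) / samples

# For the assumed loop, the estimate should be close to b + [a != 0].
print(estimate_wp(1, 5), estimate_wp(0, 5))
```

Such an estimate can only suggest a value; it neither proves a lower bound nor an upper bound.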

4.3. Uniform Integrability

As we will see in Sect. 5, uniform integrability of a certain stochastic process is the central aspect of the Optional Stopping Theorem (Thm. 3). In probability theory, uniform integrability means that taking the expected value and taking the limit of a stochastic process commute.

Definition 15 (Uniform Integrability of Stochastic Processes, (Grimmett and Stirzaker, 2001, Lemma 7.10.(3))).

Let $\mathbf{X}=(X_{n})_{n\in\mathbb{N}}$ be a stochastic process on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with almost–surely existing limit $\lim_{n\to\omega}X_{n}$ . The process $\mathbf{X}$ is uniformly integrable if

$$\mathbb{E}\left(\lim_{n\to\omega}X_{n}\right)~=~\lim_{n\to\omega}\mathbb{E}\left(X_{n}\right)~.$$

Counterexample 16 ((Grimmett and Stirzaker, 2001, Sect. 7.10)).

Consider the stochastic process $\mathbf{X}=(X_{n})_{n\in\mathbb{N}}$ on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with $\phantom{}\mathbb{P}\left(X_{n}=n\right)=\tfrac{1}{n}=1-\phantom{}\mathbb{P}\left(X_{n}=0\right)$ . Then $\phantom{}\mathbb{E}\left(X_{n}\right)=1$ . Moreover, $\mathbf{X}$ converges almost surely to $Y\equiv 0$ , i.e., the constant function $0$ . So, $\phantom{}\mathbb{E}\left(Y\right)=0$ . But

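$$\lim_{n\to\omega}\mathbb{E}\left(X_{n}\right)~=~1~\neq~0~=~\mathbb{E}\left(Y\right)~=~\mathbb{E}\left(\lim_{n\to\omega}X_{n}\right)~,$$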

so $\mathbf{X}$ is not u.i.
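The arithmetic behind this counterexample is easy to reproduce exactly. The sketch below only encodes the marginal distribution of each single variable (value $n$ with probability $\tfrac{1}{n}$, value $0$ otherwise, for $n\geq 1$); the function names are hypothetical.

```python
from fractions import Fraction

def distribution(n):
    """Marginal distribution for n >= 1: value n with probability 1/n, else 0."""
    return [(n, Fraction(1, n)), (0, 1 - Fraction(1, n))]

def expected_value(n):
    return sum(value * prob for value, prob in distribution(n))

def prob_nonzero(n):
    return sum(prob for value, prob in distribution(n) if value != 0)

# The expected value stays at 1 while the mass away from 0 vanishes.
print([expected_value(n) for n in (1, 10, 1000)])  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
print([prob_nonzero(n) for n in (1, 10, 1000)])    # [Fraction(1, 1), Fraction(1, 10), Fraction(1, 1000)]
```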

Note that our notion of uniform integrability of expectations from Def. 8 coincides with uniform integrability of the corresponding induced stochastic process.

Corollary 17 (Uniform Integrability of Expectations and Stochastic Processes).

Let the loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ be AST. (Footnote 14: It suffices that $\phantom{}{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\infty\right)=1$ for any $s$ . But this is equivalent to AST as we required the body of the loop to be AST.) Then $I$ is uniformly integrable for $f$ (in the sense of Def. 8) iff the induced stochastic process $\mathbf{X}^{f,I}$ is uniformly integrable (in the sense of Def. 15), i.e.,

[TABLE]

Cor. 17 justifies the naming in Def. 8: an expectation $I$ is uniformly integrable for $f$ iff its induced process $\mathbf{X}^{f,I}$ is uniformly integrable. So, we can now extend the diagram from Sect. 3.4 as follows:

$$\begin{array}{ccccc}\mathbf{X}^{f,I}~\textnormal{u.i.}&\overset{\text{Lem. 13 and Def. 15}}{\iff}&{}^{\bullet}\mathbb{E}\left(X_{n}^{f,I}\right)\xrightarrow{~n\to\omega~}{}^{\bullet}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)&&\\ \Updownarrow~\text{Cor. 17}&&\Updownarrow~\text{Cor. 11 and Thm. 14}&&\\ I~\textnormal{u.i. for}~f&\overset{\text{Def. 8}}{\iff}&\Phi_{f}^{n}(I)\xrightarrow{~n\to\omega~}\textnormal{lfp}~\Phi_{f}&\overset{\text{Thm. 7 and Def. 8}}{\implies}&\bigl(I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{lfp}~\Phi_{f}\bigr)\end{array}$$

Uniform integrability is very hard to verify in general, both in the realm of stochastic processes as well as in the realm of expectation transformers. Thus, one usually tries to find sufficient criteria for uniform integrability that are easier to verify. The very idea of the Optional Stopping Theorem is to provide such sufficient criteria for uniform integrability which then allow deriving a lower bound as we will discuss in the next section.

5. The Optional Stopping Theorem of Weakest Preexpectations

In this section, we develop an inductive proof rule for lower bounds on preexpectations by using the results of Sect. 4 and the Optional Stopping Theorem (Thm. 3). The proofs of our results in this section can be found in App. D. Recall that we have fixed a loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ , a finite postexpectation $f$ , a corresponding characteristic function $\Phi_{f}$ , and another finite expectation $I$ which plays the role of an invariant.

We first introduce the Optional Stopping Theorem from probability theory. It builds upon the concept of submartingales. A submartingale is a stochastic process that induces a monotonically increasing sequence of its expected values.

Definition 1 (Submartingale).

Let $(X_{n})_{n\in\mathbb{N}}$ be a stochastic process on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ adapted to a filtration $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ of $\mathfrak{F}$ , i.e., a sequence of random variables $X_{n}\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ such that $X_{n}$ is $\mathfrak{F}_{n}$ –measurable. Then $(X_{n})_{n\in\mathbb{N}}$ is called a submartingale with respect to $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ if

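$$\mathbb{E}\left(X_{n}\right)<\infty\quad\text{and}\quad\mathbb{E}\left(X_{n+1}\mid\mathfrak{F}_{n}\right)\geq X_{n}\quad\text{for all }n\in\mathbb{N}.$$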

It turns out that submartingales are closely related to subinvariants. In fact, $I$ being a subinvariant (plus some side conditions) gives us that the stochastic process induced by $I$ is a submartingale.

Lemma 2 (Subinvariant Induces Submartingale).

Let $I$ be a subinvariant, i.e., $I\preceq\Phi_{f}(I)$ , such that $\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ for every $n\in\mathbb{N}$ , that is, $\Phi_{f}^{n}(I)$ only takes finite values. Then the induced stochastic process $\mathbf{X}^{f,I}$ is a submartingale with respect to $(\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ .

Given a submartingale $(X_{n})_{n\in\mathbb{N}}$ and a stopping time $T$ , the goal of the Optional Stopping Theorem is to prove a lower bound for the expected value of $X_{n}$ at the stopping time $T$ . To this end, we define a stochastic process $(X_{n\wedge T})_{n\in\mathbb{N}}$ where for any $\vartheta\in\Omega$ , $X_{n\wedge T}(\vartheta)=X_{n}(\vartheta)$ if $n$ is smaller than the stopping time $T(\vartheta)$ and otherwise, $X_{n\wedge T}(\vartheta)=X_{T(\vartheta)}(\vartheta)$ . Hence, $\phantom{}\mathbb{E}\left(\lim_{n\to\omega}X_{n\wedge T}\right)$ is the expected value of $X_{n}$ at the stopping time $T$ . The Optional Stopping Theorem asserts that the first component $X_{0}$ of the stochastic process $(X_{n})_{n\in\mathbb{N}}$ is a lower bound for $\phantom{}\mathbb{E}\left(\lim_{n\to\omega}X_{n\wedge T}\right)$ provided that $(X_{n\wedge T})_{n\in\mathbb{N}}$ is uniformly integrable. Moreover, the Optional Stopping Theorem provides a collection of criteria that are sufficient for uniform integrability of $(X_{n\wedge T})_{n\in\mathbb{N}}$ .
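The construction of $X_{n\wedge T}$ amounts to freezing the process once the stopping time is reached. A minimal sketch on finite runs, with hypothetical names and a made-up example process that simply reads off the $n$-th entry of a list:

```python
import math

def stopped_value(process, stopping_time, n, run):
    """X_n before the stopping time, and the value at the stopping time from then on."""
    t = stopping_time(run)
    return process(n, run) if n < t else process(t, run)

# Hypothetical example: the process reads off the n-th entry of a finite run,
# and T is the first index at which the value 3 is seen (math.inf if never).
process = lambda n, run: run[n]
first_hit_3 = lambda run: next((i for i, x in enumerate(run) if x == 3), math.inf)

run = [0, 1, 2, 3, 2, 1]
stopped = [stopped_value(process, first_hit_3, n, run) for n in range(len(run))]
print(stopped)  # [0, 1, 2, 3, 3, 3]
```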

Theorem 3 (Optional Stopping Theorem (Grimmett and Stirzaker, 2001, Theorems 12.3.(1), 12.4.(11), 12.5.(1), 12.5.(2), 12.5.(9))).

Let $(X_{n})_{n\in\mathbb{N}}$ be a submartingale and $T$ be a stopping time on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with respect to a filtration $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ . Then $\mathbf{X}_{\wedge T}=(X_{n\wedge T})_{n\in\mathbb{N}}$ defined by

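$$X_{n\wedge T}(\vartheta)~=~\begin{cases}X_{n}(\vartheta),&\text{if }n<T(\vartheta)\\ X_{T(\vartheta)}(\vartheta),&\text{otherwise}\end{cases}$$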

is also a submartingale w.r.t. $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ . If $\mathbf{X}_{\wedge T}$ converges almost–surely and is uniformly integrable,

then $\mathbb{E}\left(X_{0}\right)\leq\mathbb{E}\left(\lim_{n\to\omega}X_{n\wedge T}\right)$ .

If one of the following conditions holds, then $\mathbf{X}_{\wedge T}$ converges almost–surely and is uniformly integrable:

(a) $T$ is almost–surely bounded, i.e., there is a constant $N\in\mathbb{N}$ such that $\mathbb{P}\left(T\leq N\right)=1$ .

(b) $\mathbb{E}\left(T\right)<\infty$ and there is a constant $c\in\mathbb{R}_{\geq 0}$ such that for each $n\in\mathbb{N}$ , $\mathbb{E}\left(\left\lvert X_{n+1}-X_{n}\right\rvert\mid\mathfrak{F}_{n}\right)\leq c$ .

(c) There exists a constant $c\in\mathbb{R}_{\geq 0}$ such that $X_{n\wedge T}\leq c$ holds almost–surely for every $n\in\mathbb{N}$ .
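As a small sanity check of condition (b), consider a fair $\pm 1$ random walk on $\{0,\dots,N\}$ that starts at $k$ and stops when it hits $0$ or $N$. This walk is a martingale (hence a submartingale), its increments are bounded by $1$, and its expected stopping time is finite, so Thm. 3 (b) applies. The following Python sketch is our own illustration (the function names and parameters are hypothetical and do not come from the paper). It computes the exact distribution of the stopped process $X_{n\wedge T}$ and lets one confirm that $\mathbb{E}(X_{n\wedge T})$ never drops below $X_{0}$.

```python
from fractions import Fraction


def stopped_walk_distribution(start, upper, steps):
    """Exact distribution of X_{n ∧ T} for a fair ±1 walk stopped at 0 or `upper`."""
    dist = {start: Fraction(1)}
    for _ in range(steps):
        nxt = {}
        for pos, prob in dist.items():
            if pos == 0 or pos == upper:
                # Already stopped: the stopped process stays at X_T.
                nxt[pos] = nxt.get(pos, Fraction(0)) + prob
            else:
                # Not yet stopped: move down or up with probability 1/2 each.
                for new_pos in (pos - 1, pos + 1):
                    nxt[new_pos] = nxt.get(new_pos, Fraction(0)) + prob / 2
        dist = nxt
    return dist


def expectation(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(pos * prob for pos, prob in dist.items())


if __name__ == "__main__":
    for n in (0, 1, 10, 200):
        print(n, expectation(stopped_walk_distribution(3, 10, n)))
```

Exact rational arithmetic is used so that the comparison with $X_{0}$ is not blurred by floating-point rounding.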

Our goal now is to transfer the Optional Stopping Theorem from probability theory to the realm of weakest preexpectations in order to obtain inductive proof rules for lower bounds on weakest preexpectations. So far, we have introduced the looping time $T^{\neg\varphi}$ (which is a stopping time w.r.t. $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ ), presented the connection of subinvariants and submartingales, and defined the concept of uniform integrability also for expectations. Hence, the only missing ingredient is a proper connection of expectations to the condition “ $\phantom{}\mathbb{E}\left(\left\lvert X_{n+1}-X_{n}\right\rvert\mid\mathfrak{F}_{n}\right)\leq c$ ” in Thm. 3 (b). To translate this concept to expectations, we require that the expectation $I$ has a certain shape depending on the postexpectation $f$ .

Definition 4 (Harmonization).

An expectation $I$ harmonizes with $f\in\mathbb{F}$ if it is of the form

$I=\left[{\neg\varphi}\right]\cdot f+\left[{\varphi}\right]\cdot{I}^{\prime}$

for some expectation ${I}^{\prime}\in\mathbb{F}$ .

Def. 4 reflects that in terminal states $t$ of the loop the invariant $I$ evaluates to $f(t)$ . For an invariant $I$ to harmonize with postexpectation $f$ is a minor restriction on the shape of $I$ . It is usually easy to choose an $I$ that takes the value of $f$ for states in which the loop is not executed at all. Moreover, performing one iteration of $\Phi_{f}$ obviously brings any expectation “into shape”:

Corollary 5 (Harmonizing Expectations).

For any $f,J\in\mathbb{F}$ , $\Phi_{f}(J)$ harmonizes with $f$ .
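To see why, it suffices to unfold the characteristic function from Table 1. For any $J\in\mathbb{F}$,

$$\Phi_{f}(J)=\left[{\neg\varphi}\right]\cdot f+\left[{\varphi}\right]\cdot\textsf{wp}\left\llbracket{C}\right\rrbracket\left({J}\right),$$

which is of the shape $\left[{\neg\varphi}\right]\cdot f+\left[{\varphi}\right]\cdot I^{\prime}$ with the choice $I^{\prime}=\textsf{wp}\left\llbracket{C}\right\rrbracket\left({J}\right)$.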

The actual criterion that connects “ $\phantom{}\mathbb{E}\left(\left\lvert X_{n+1}-X_{n}\right\rvert\mid\mathfrak{F}_{n}\right)\leq c$ ” with the invariant $I$ is called conditional difference boundedness (see also (Fu and Chatterjee, 2019; Fioriti and Hermanns, 2015)):

Definition 6 (Conditional Difference Boundedness).

Let $I\in\mathbb{F}$ . We define the function $H\colon\mathbb{F}\to\mathbb{F}$ and the expectation $\Delta I\in\mathbb{F}$ as follows (Footnote 15: Recall that we have fixed a loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ .):

[TABLE]

The expectation $I$ is called conditionally difference bounded (c.d.b.) if for some constant $c\in\mathbb{R}_{\geq 0}$

$\Delta I\preceq c$ .

The expectation $\Delta I$ expresses the expected change of $I$ within one loop iteration. So, if $I$ is c.d.b. it is expected to change at most by a constant in one loop iteration.

Example 7.

Reconsider the program $C_{\mathit{cex}}$ from Counterex. 6 and expectation $I=b+\left[{a\neq 0}\right]$ . We will check conditional difference boundedness of $I$ , using the function $H$ given by

[TABLE]

We then check the following:

[TABLE]

Thus, $I$ is c.d.b. by the constant $1$ . In contrast, the subinvariant ${I}^{\prime}=b+\left[{a\neq 0}\right]\cdot(1+2^{k})$ from Counterex. 6 is not conditionally difference bounded. Indeed, we would get (cf. App. D for details)

[TABLE]

which cannot be bounded by a constant.
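The two checks can be replayed mechanically. The sketch below is a hedged illustration, not the paper's calculation: the loop in the comment is an assumed stand-in for $C_{\mathit{cex}}$ (the actual program is defined in Counterex. 6, outside this section), and we assume that $\Delta I$ has the form $\lambda s.\ \textsf{wp}\left\llbracket{C}\right\rrbracket\left({|I-I(s)|}\right)(s)$, which matches the expression that is bounded in the coupon collector example later on. All function names are ours.

```python
from fractions import Fraction

# ASSUMPTION: stand-in for C_cex, not copied from the paper:
#   while (a != 0) { {a := 0} [1/2] {b := b + 1}; k := k + 1 }
# A state is a tuple (a, b, k).


def body(state):
    """Loop body as a list of (probability, successor state) pairs."""
    a, b, k = state
    half = Fraction(1, 2)
    return [(half, (0, b, k + 1)), (half, (a, b + 1, k + 1))]


def wp_body(post):
    """wp of the loop body: expected value of `post` after one iteration."""
    return lambda s: sum(p * post(t) for p, t in body(s))


def delta(inv):
    """Assumed form of Delta I: s -> wp[body](|I - I(s)|)(s)."""
    return lambda s: wp_body(lambda t: abs(inv(t) - inv(s)))(s)


def I(s):
    a, b, k = s
    return b + (1 if a != 0 else 0)


def I_prime(s):
    a, b, k = s
    return b + (1 + 2 ** k if a != 0 else 0)


if __name__ == "__main__":
    for k in range(5):
        s = (1, 0, k)
        print(k, delta(I)(s), delta(I_prime)(s))
```

Under these assumptions, the expected change of $I$ stays at most $1$ on every sampled state, whereas the expected change of $I^{\prime}$ grows with $2^{k}$. A finite sample of states is of course only evidence, not a proof.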

Finally, we can connect the expected change of $I$ to a property of the stochastic process $\mathbf{X}^{f,I}$ . This is our second major result.

Theorem 8 (Expected Change of $\boldsymbol{I}$ ).

Let $I\mathrel{{\prec}{\prec}}\infty$ harmonize with $f$ . Then

[TABLE]

The stochastic process $\mathbf{X}^{f,I}$ induced by $I$ exhibits an interesting correspondence: If $\Delta I$ is bounded by a constant $c$ (i.e., if $I$ is c.d.b.), then so is $X_{n}^{0,\Delta I}$ and thus Thm. 8 ensures that precondition (b) of the Optional Stopping Theorem (Thm. 3) is fulfilled. Note that Thm. 8 depends crucially on the fact that $I\mathrel{{\prec}{\prec}}\infty$ as otherwise the well–definedness of the expectation $\Delta I$ cannot be ensured.

Now Lem. 2 allows us to use the Optional Stopping Theorem from probability theory (Thm. 3) to prove a novel Optional Stopping Theorem for weakest preexpectations, which collects sufficient conditions for uniform integrability. In particular, due to Thm. 8, our Optional Stopping Theorem shows that our notion of conditional difference boundedness is an (easy–to–check) sufficient criterion for uniform integrability and hence, for ensuring that a subinvariant is indeed a lower bound for the weakest preexpectation under consideration. After stating the theorem, we will discuss the intuition of its parts in more detail.

Theorem 9 (Optional Stopping Theorem for Weakest Preexpectation Reasoning).

Consider a loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ where $C$ is AST. Let $I\mathrel{{\prec}{\prec}}\infty$ be a subinvariant w.r.t. the postexpectation $f\mathrel{{\prec}{\prec}}\infty$ (i.e., $I\preceq\Phi_{f}(I)$ ). $I$ is uniformly integrable for $f$ iff $I$ is a lower bound, i.e.,

$I\preceq\textnormal{{{lfp}}}~\Phi_{f}$ .

$I$ is uniformly integrable for $f$ if one of the following three conditions holds:

(a) The looping time $T^{\neg\varphi}$ of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is almost–surely bounded, i.e., for every state $s\in\Sigma$ there exists a constant $N(s)\in\mathbb{N}$ with ${}^{s}\mathbb{P}\left(T^{\neg\varphi}\leq N(s)\right)=1$ and $\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ for every $n\in\mathbb{N}$ .

(b) The expected looping time of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is finite for every initial state $s\in\Sigma$ , $I$ harmonizes with $f$ , $\Phi_{f}(I)\mathrel{{\prec}{\prec}}\infty$ , and $I$ is conditionally difference bounded.

(c) Both $f$ and $I$ are bounded and $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is AST.

We can now extend the diagram from Sect. 4 connecting the realm of stochastic processes (on the right) and the realm of expectation transformers (on the left) for a universally almost–surely terminating program. The respective Optional Stopping Theorems provide the sufficient criteria for uniform integrability, which is marked by the dashed implications.

[Diagram: implications linking the expectation–transformer notions “ $I$ c.d.b. by $c$ ”, “ $I$ u.i. for $f$ ”, “ $\Phi_{f}^{n}(I)\xrightarrow{n\to\omega}\textnormal{{{lfp}}}~\Phi_{f}$ ”, and “ $I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{{{lfp}}}~\Phi_{f}$ ” with the stochastic–process notions “ ${}^{\bullet}\mathbb{E}\left(\big|X_{n+1}^{f,I}-X_{n}^{f,I}\big|\,\Big|\,\mathfrak{G}_{n}\right)\leq c$ ”, “ $\mathbf{X}^{f,I}$ u.i.”, and “ ${}^{\bullet}\mathbb{E}\left(X_{n}^{f,I}\right)\xrightarrow{n\to\omega}{}^{\bullet}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)$ ”. Edge labels: Thm. 8; Thm. 9 (b); Thm. 3; Cor. 17; Lem. 13 and 15; Cor. 11 and Thm. 14; Def. 8; Thm. 7 and Def. 8.]

Let us elaborate on the different cases of our Optional Stopping Theorem (Thm. 9): Case (a) yields an alternative proof for the technique of so–called metering functions by (Frohn et al., 2016) for deterministic terminating loops. As for the severity of the finiteness condition “ $\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ for every $n\in\mathbb{N}$ ”, note that if the body $C$ is loop–free, this condition is trivially satisfied, as $I$ itself is finite and cannot become infinite by finitely many iterations of $\Phi_{f}$ . If $C$ contains loops, then we can establish the finiteness condition by finding a finite superinvariant $U$ with $I\preceq U\mathrel{{\prec}{\prec}}\infty$ . In this case, we can also guarantee $\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ . (Footnote 16: The reason is that we have $U\succeq{}^{\textsf{{wp}}}\Phi^{n}_{f}(U)$ for all $n\in\mathbb{N}$ by monotonicity of ${}^{\textsf{{wp}}}\Phi_{f}$ (Thm. 4). Then, $U\succeq I$ implies ${}^{\textsf{{wp}}}\Phi^{n}_{f}(U)\succeq{}^{\textsf{{wp}}}\Phi^{n}_{f}(I)$ also by monotonicity of ${}^{\textsf{{wp}}}\Phi_{f}$ , which gives us $\infty\mathrel{{\succ}{\succ}}U\succeq{}^{\textsf{{wp}}}\Phi^{n}_{f}(U)\succeq{}^{\textsf{{wp}}}\Phi^{n}_{f}(I)$ .)

Case (b) applies whenever the outer loop is expected to be executed finitely often. In particular, this holds if the entire loop terminates positively almost–surely (i.e., within finite expected runtime).

To the best of our knowledge, Cases (a) and (b) are the first sufficiently simple induction rules for lower bounds that do not require restricting to bounded postexpectations $f$ . While the requirements on the loop’s termination behavior gradually weaken along $\textnormal{(a)}\rightarrow\textnormal{(b)}\rightarrow\textnormal{(c)}$ , the requirements on the subinvariant $I$ become stricter.

Finally, Case (c) yields an alternative proof of the result of (McIver and Morgan, 2005) on inductive lower bounds for bounded expectations in case of AST, which we will generalize in Sect. 6.

When comparing the cases (c) of Thm. 3 and Thm. 9, we notice that Thm. 3 (c) has no restrictions on the stopping time, whereas Thm. 9 (c) requires almost–sure termination. This might spark some hope that AST is not needed in Thm. 9 (c), but the following counterexample shows that this is not the case:

Counterexample 10.

Consider the program

[TABLE]

together with the bounded postexpectation $f=1$ , i.e., we are interested in the termination probability, which is obviously $0$ . The corresponding characteristic function is given by

[TABLE]

i.e., $\Phi_{1}$ is the identity map. Trivially, the bounded expectation $I=1$ is a fixed point of $\Phi_{1}$ , thus in particular $I$ is a subinvariant. Clearly, $I$ is not a lower bound on the actual termination probability, i.e., on $\textnormal{{{lfp}}}~\Phi_{1}$ . If the condition of almost–sure termination in Thm. 9 (c) could be weakened, the weakened condition would still have to ensure that for any program $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with universally almost–surely terminating body $C$ (Footnote 17: Note that in this case $1$ is always a subinvariant.) and postexpectation $f=1$ , $1$ is a lower bound only if the program terminates universally almost–surely. But this means that the weakened condition has to be at least as strong as almost–sure termination.
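The failure of $I=1$ can be made explicit through the Kleene iteration of $\Phi_{1}$ starting from the least element $0$, using only the fact that $\Phi_{1}$ is the identity map:

$$\Phi_{1}^{n}(0)=0\ \text{ for all } n\in\mathbb{N},\qquad\text{hence}\qquad\textnormal{lfp}~\Phi_{1}=\sup_{n\in\mathbb{N}}\Phi_{1}^{n}(0)=0,\qquad\text{and so}\qquad I=1\not\preceq 0=\textnormal{lfp}~\Phi_{1}.$$

Every expectation is a fixed point of the identity map, but only $0$ is the least one, which is why subinvariance alone cannot certify a lower bound here.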

We reconsider Counterex. 6 illustrating unsoundness of simple lower induction and do sound lower induction instead.

Example 11.

Let us continue Ex. 7, where we have checked that for the program $C_{\mathit{cex}}$ the expectation $I=b+\left[{a\neq 0}\right]$ is conditionally difference bounded by $1$ . It is easy to check that $I$ is a fixed point of the characteristic function $\Phi_{b}$ with respect to the postexpectation $b$ , which by Park induction gives us a finite upper bound on the least fixed point of $\Phi_{b}$ . But up to now we could not prove that $I$ is indeed equal to the least fixed point. Using Thm. 9, we can now do this.

First of all, we already have $\Phi_{b}(I)=I\mathrel{{\prec}{\prec}}\infty$ and since $I$ is a fixed point, it is also a subinvariant. Secondly, the loop is expected to be executed twice. (Footnote 18: Positive almost–sure termination itself can also be verified by Park induction, see (Kaminski et al., 2016, 2018).) Finally, $I=b+\left[{a\neq 0}\right]=\left[{\neg(a\neq 0)}\right]\cdot b+\left[{a\neq 0}\right]\cdot(b+1)$ harmonizes with $b$ and is conditionally difference bounded. Hence, the preconditions of Thm. 9 (b) are satisfied and $I$ is indeed a lower bound on $\textnormal{{{lfp}}}~\Phi_{b}$ . Since $I$ is a fixed point, it is the least fixed point, i.e., we have proved $\textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)=I$ .
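For contrast, the limit-based route that Thm. 9 avoids can be sketched numerically. The Python code below is a hedged illustration under the assumption that $C_{\mathit{cex}}$ behaves like the stand-in loop given in the comment (the actual program is defined in Counterex. 6, outside this section), and all names are ours. It evaluates the Kleene iterates $\Phi_{b}^{n}(0)$ and shows them approaching $I=b+\left[{a\neq 0}\right]$ from below, whereas the inductive argument needs only a single application of $\Phi_{b}$.

```python
from fractions import Fraction

# ASSUMPTION: stand-in for C_cex, not copied from the paper:
#   while (a != 0) { {a := 0} [1/2] {b := b + 1}; k := k + 1 }
# A state is a tuple (a, b, k).


def guard(s):
    return s[0] != 0


def body(s):
    """Loop body as a list of (probability, successor state) pairs."""
    a, b, k = s
    half = Fraction(1, 2)
    return [(half, (0, b, k + 1)), (half, (a, b + 1, k + 1))]


def f(s):
    return Fraction(s[1])  # postexpectation b


def I(s):
    return Fraction(s[1] + (1 if guard(s) else 0))


def phi(X):
    """Characteristic function: [not guard] * f + [guard] * wp[body](X)."""
    return lambda s: sum(p * X(t) for p, t in body(s)) if guard(s) else f(s)


def kleene(n, s):
    """Phi_f^n(0)(s): the n-th Kleene iterate evaluated at state s."""
    if n == 0:
        return Fraction(0)
    if not guard(s):
        return f(s)
    return sum(p * kleene(n - 1, t) for p, t in body(s))


if __name__ == "__main__":
    s = (1, 3, 0)
    print("one-step check, Phi(I)(s) == I(s):", phi(I)(s) == I(s))
    for n in (1, 2, 5, 10, 40):
        print(n, float(I(s) - kleene(n, s)))
```

The gap between $I$ and the iterates shrinks geometrically but never closes after finitely many steps, so the iteration alone only suggests the value of the least fixed point.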

Further case studies demonstrating the effectiveness of our proof rule, as well as an example that cannot be treated by Thm. 9, are provided in App. A.

6. Lower Bound Rules by McIver and Morgan

In Sect. 5, we briefly mentioned the rules for lower bounds for bounded expectations by (McIver and Morgan, 2005) which are restated in Thm. 1 below. To the best of our knowledge, before our new Thm. 9 these were the only existing inductive proof rules for lower bounds on weakest preexpectations.

Theorem 1 ((McIver and Morgan, 2005)).

Let $f\in\mathbb{F}$ be a bounded postexpectation. Furthermore, let $I^{\prime}\in\mathbb{F}$ be a bounded expectation such that the harmonized expectation $I\in\mathbb{F}$ given by $I~{}{}={}~{}\left[{\neg\varphi}\right]\cdot f+\left[{\varphi}\right]\cdot I^{\prime}$ is a subinvariant of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $f$ . Finally, let $T=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({1}\right)$ be the termination probability of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ . Then:

(1) If $I=\left[{G}\right]$ for some predicate $G$ , then $T\cdot I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ .

(2) If $\left[{G}\right]\preceq T$ for some predicate $G$ , then $\left[{G}\right]\cdot I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ .

(3) If $\varepsilon\cdot I\preceq T$ for some $\varepsilon>0$ , then $I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ .

Thm. 1 does not make any assumptions on the termination behavior of the loop, so it is also possible to analyze programs with termination probability $<1$ . It turns out that Thm. 1 (1) – (3) can be proved easily from our results from Sect. 4 in the case where $C$ is AST, and in this case we do not need the restriction that $I$ harmonizes with $f$ . In particular, we can show that in Thm. 1 (3) the fact that $T$ is the probability of termination is insignificant: it suffices if $T$ is the weakest preexpectation for some arbitrary bounded postexpectation, i.e., a least fixed point (see App. E for details and proofs). So, we obtain the following generalized version of Thm. 1 (3) in the case where $C$ is AST, which is substantially more powerful: it states not only a sufficient condition for a subinvariant to be a lower bound but also a necessary one. This is the main new contribution of this section.

Theorem 2 (Generalization of Thm. 1 (3)).

Let $f\in\mathbb{F}$ be a bounded postexpectation. Furthermore, let $I\in\mathbb{F}$ be a bounded expectation such that $I$ is a subinvariant of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $f$ where $C$ is AST. There exist $\varepsilon>0$ and $g\in\mathbb{F}$ bounded s.t.

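$\varepsilon\cdot I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({g}\right)\quad\textnormal{if and only if}\quad I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ .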

Example 3.

Let us consider the program $C_{\mathit{rdw}}$ for an asymmetric random walk

[TABLE]

with $x,y\in\mathbb{N}$ and $y\leq 100$ . This program is not AST but the body of the loop is indeed AST. Furthermore, the postexpectation $y$ is bounded. If $y\leq x$ initially then $y$ is [math] after termination of the program. So, $\textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({y}\right)\geq\left[{y>x}\right]\cdot\left(\tfrac{1}{3}\right)^{x}\cdot(y-x)\coloneqq I$ .

Now consider $f=\left[{y\text{ even}}\right]\cdot 200\cdot y^{2}+\left[{y\text{ odd}}\right]\cdot(y+5)^{4}$ . We have $I^{\prime}\preceq\Phi_{f}(I^{\prime})$ , where $I^{\prime}=400\cdot I$ (see App. E). As we have $\tfrac{1}{400}\cdot I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({y}\right)$ we can conclude from Thm. 2 that $I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({f}\right)$ . Note that this is easier than relating $I^{\prime}$ and the termination probability as required by Thm. 1 since the probability of termination of the loop is independent of $y$ .

Of course, Ex. 3 is an artificial example. Nevertheless, it shows a strength of our generalization: it makes it easier to reason about bounded expectations which are independent of the probability of termination. However, a drawback of Thm. 1 remains: one already needs a lower bound, i.e., one has to be able to read off a lower bound directly from the program.

7. Upper Bounds and Fatou’s Lemma

We saw that Park induction for proving upper bounds does not require additional conditions such as conditional difference boundedness or even boundedness of $f$ or $I$ , respectively. The question arises whether this fact is also explainable using our canonical stochastic process. Indeed, the well–known Lemma of Fatou provides such an explanation. We will present a specialized variant of it which is sufficient for our purpose.

Lemma 1 (Fatou’s Lemma (cf. (Bauer, 1971, Lemma 2.7.1))).

Let $(X_{n})_{n\in\mathbb{N}}$ be a stochastic process on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ . Then

[TABLE]

where the $\lim$ on the left–hand–side is point–wise.

We can now reprove Park induction for wp using Fatou’s Lemma: Let $I$ be a superinvariant, i.e., $\Phi_{f}(I)\preceq I$ . By Thm. 10, the canonical stochastic process $\mathbf{X}^{f,I}$ satisfies

[TABLE]

By applying $\phantom{}{}^{s}\mathbb{E}$ on both sides, we obtain $\phantom{}{}^{s}\mathbb{E}\left(X_{0}^{f,I}\right)\geq\phantom{}^{s}\mathbb{E}\left(X_{1}^{f,I}\right)\geq\dots$ . This implies

[TABLE]

as $X^{f,I}_{n}\geq X^{f,I}_{n}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ . We conclude

[TABLE]

so $I$ is indeed an upper bound on the least fixed point.
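The conclusion of this argument, namely that a superinvariant dominates the least fixed point, can be observed on a small instance. The Python sketch below uses a hypothetical geometric loop of our own choosing (it does not appear in the paper) with postexpectation $x$ and the candidate $U=x+2\cdot\left[{c=1}\right]$. It checks $\Phi_{x}(U)\preceq U$ on a finite sample of states and compares the Kleene iterates against $U$. Checking finitely many states is only an illustration, not a proof.

```python
from fractions import Fraction

# Hypothetical loop (our own illustration, not taken from the paper):
#   while (c == 1) { {c := 0} [1/2] {x := x + 1} }
# A state is a tuple (c, x).

HALF = Fraction(1, 2)


def guard(s):
    return s[0] == 1


def body(s):
    """Loop body as a list of (probability, successor state) pairs."""
    c, x = s
    return [(HALF, (0, x)), (HALF, (c, x + 1))]


def f(s):
    return Fraction(s[1])  # postexpectation x


def phi(X):
    """Characteristic function: [not guard] * f + [guard] * wp[body](X)."""
    return lambda s: sum(p * X(t) for p, t in body(s)) if guard(s) else f(s)


def U(s):
    return Fraction(s[1] + (2 if guard(s) else 0))


def is_superinvariant(X, states):
    """Check Phi_f(X) <= X pointwise on the given finite set of states."""
    return all(phi(X)(s) <= X(s) for s in states)


def kleene(n, s):
    """Phi_f^n(0)(s): the n-th Kleene iterate evaluated at state s."""
    if n == 0:
        return Fraction(0)
    if not guard(s):
        return f(s)
    return sum(p * kleene(n - 1, t) for p, t in body(s))


if __name__ == "__main__":
    grid = [(c, x) for c in (0, 1) for x in range(6)]
    print("U superinvariant on grid:", is_superinvariant(U, grid))
    print("iterates below U:", all(kleene(25, s) <= U(s) for s in grid))
```

Note that no side condition such as boundedness or conditional difference boundedness is consulted anywhere in this check, in line with the discussion above.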

Note that here we handle arbitrary loops, i.e., they are not necessarily AST. While $I$ being a superinvariant (plus some side conditions) still implies that $\mathbf{X}^{f,I}$ is a supermartingale, the second part of Lem. 13 is not applicable, i.e., in general we have $X_{T^{\neg\varphi}}^{f}\neq\lim_{n\to\omega}X_{n}^{f,I}$ if the loop is not AST. So in this case we cannot use classic results from martingale theory. Nevertheless, Fatou’s Lemma combined with Thm. 14 and the first part of Lem. 13 provide a connection of Park induction for upper bounds to stochastic processes.

8. Lower Bounds on the Expected Runtime

So far, we have developed techniques for verifying lower bounds on weakest preexpectations, i.e., expected values of random variables upon program termination. In this section, we transfer those techniques to verify lower bounds on expected runtimes of probabilistic programs. For this, we employ the ert–transformer (Kaminski et al., 2018, 2016), which is very similar to the wp-transformer: Given program $C$ and postruntime $t\in\mathbb{F}$ , we are interested in the expected time it takes to first execute $C$ and then let time $t$ pass (where $t$ is evaluated in the final states reached after termination of $C$ ). Again, the behavior (and the runtime) of $C$ depends on its input, so we are actually interested in a function $g\in\mathbb{F}$ mapping initial states $s_{0}$ to the respective expected time. For more details, see also (Kaminski, 2019, Chapter 7). Similarly to weakest preexpectations, expected runtimes can be determined in a systematic and compositional manner by means of the ert calculus:

Definition 1 (The ert–Transformer (Kaminski et al., 2018, 2016)).

Let pGCL be again the set of programs in the probabilistic guarded command language. Then the expected runtime transformer

[TABLE]

is defined according to the rules given in Table 2. We call the function $\tensor*[^{\smash{\textsf{{ert}}}}_{\smash{\langle\varphi,C\rangle}}]{\Phi}{{}_{{t}}}$ the ert–characteristic function of the loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $t$ . Its least fixed point is understood in terms of the partial order $\preceq$ . To increase readability, we will again usually omit ert, $\varphi$ , $C$ , or $t$ from $\Phi$ whenever they are clear from the context.

Example 2 (Applying the ert Calculus).

Consider the probabilistic program $C$ given by

[TABLE]

Suppose we want to know the expected runtime of $C$ . Then we need to determine $\textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({0}\right)$ . Reusing the annotation styles of Fig. 2(a) for wp, we make the following ert annotations:

[TABLE]

At the top, we read off the expected runtime of $C$ , namely $4+\left[{b\neq 5}\right]\cdot\tfrac{4}{5}$ . This tells us that the expected runtime of $C$ is $4$ if started in an initial state where $b$ is $5$ , and $4+\tfrac{4}{5}=\tfrac{24}{5}$ otherwise.

The ert– and the wp–transformers are not only similar in definition, but they are closely connected by the following equality (Olmedo et al., 2016):

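$\textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({t}\right)=\textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({0}\right)+\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({t}\right)$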
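The connection between the two transformers, namely that the expected runtime with postruntime $t$ splits into the expected runtime with postruntime $0$ plus the weakest preexpectation of $t$, can be tested on loop–free programs. The Python sketch below is our own illustration: the tuple-based program encoding and the example program are hypothetical, and the ert rules used (one time unit per skip, assignment, conditional, and probabilistic choice) are assumed to follow the standard ert calculus, since Table 2 is not reproduced in this section.

```python
from fractions import Fraction


def transform(prog, post, tick):
    """Apply wp (tick=0) or ert (tick=1) to a loop-free program.

    Programs are nested tuples; expectations map a state (dict) to a number.
    """
    kind = prog[0]
    if kind == "skip":
        return lambda s: tick + post(s)
    if kind == "assign":
        _, var, expr = prog
        return lambda s: tick + post({**s, var: expr(s)})
    if kind == "seq":
        _, c1, c2 = prog
        return transform(c1, transform(c2, post, tick), tick)
    if kind == "if":
        _, cond, c1, c2 = prog
        t1, t2 = transform(c1, post, tick), transform(c2, post, tick)
        return lambda s: tick + (t1(s) if cond(s) else t2(s))
    if kind == "choice":
        _, p, c1, c2 = prog
        t1, t2 = transform(c1, post, tick), transform(c2, post, tick)
        return lambda s: tick + p * t1(s) + (1 - p) * t2(s)
    raise ValueError(f"unknown construct: {kind}")


def wp(prog, post):
    return transform(prog, post, 0)


def ert(prog, post):
    return transform(prog, post, 1)


# Hypothetical example program:
#   x := 0; { x := x + 2 } [1/3] { if (y > 0) { skip } else { x := y } }
PROG = ("seq",
        ("assign", "x", lambda s: 0),
        ("choice", Fraction(1, 3),
         ("assign", "x", lambda s: s["x"] + 2),
         ("if", lambda s: s["y"] > 0,
          ("skip",),
          ("assign", "x", lambda s: s["y"]))))


def zero(s):
    return Fraction(0)


def post(s):
    return Fraction(s["x"] ** 2 + 1)


if __name__ == "__main__":
    for y in range(-2, 3):
        s = {"x": 7, "y": y}
        lhs = ert(PROG, post)(s)
        rhs = ert(PROG, zero)(s) + wp(PROG, post)(s)
        print(y, lhs, rhs, lhs == rhs)
```

Sharing one function for both transformers makes the design point visible: under the assumed rules, ert differs from wp only by the constant added at each construct.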

In addition, reasoning about upper bounds by Park induction works exactly the same way. For reasoning about lower bounds using subinvariants, notice above that $\textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({0}\right)$ is independent of $t$ . So, we can combine our derivation of Thm. 9 for lower bounds on wp in Sect. 4 and 5 with the equation above to establish the first inductive rule for verifying lower bounds on expected runtimes:

Theorem 3 (Inductive Lower Bounds on Expected Runtimes).

Let $t,I\in\mathbb{F}$ with $t,I\mathrel{{\prec}{\prec}}\infty$ and let $I$ harmonize with $t$ . Furthermore, let ${}^{\textsf{{ert}}}\Phi_{t}$ be the ert–characteristic function of the loop $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ with respect to $t$ . If $I$ is conditionally difference bounded and ${}^{\textsf{{wp}}}\Phi_{t}(I)\mathrel{{\prec}{\prec}}\infty$ , then

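$I\preceq{}^{\textsf{{ert}}}\Phi_{t}(I)\quad\textnormal{implies}\quad I\preceq\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)$ .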

We call an $I$ that satisfies $I\preceq{}^{\textsf{{ert}}}\Phi_{t}(I)$ a runtime subinvariant.

The proof of Thm. 3 can be found in Sect. F.1. We now illustrate the applicability of Thm. 3:

Example 4 (Coupon Collector (Pólya, 1930)).

Consider the well–known coupon collector’s problem: There are $N$ different types of coupons. A collector wants to collect at least one of each type. Each time she buys a new coupon, its type is drawn uniformly at random. How many coupons does she need to buy in expectation in order to have collected at least one coupon of each type?

We can model this problem by the program $C_{\mathit{cc}}$ for some non–zero natural number $N\in\mathbb{N}$ (Footnote 19: In (Hark et al., 2020), the guard of the inner loop is just “ $x<i$ ” and thus, the body of the outer loop (as a standalone program) is not AST. Here, we fix this by changing the guard to “ $0<x<i$ ”. While we adapted the calculations accordingly, this does not affect our overall result since the outer loop is only entered if $x$ is positive.):

[TABLE]

Variable $x$ represents the number of uncollected coupon types. The inner loop models the buying of new coupons until an uncollected type is drawn. (Footnote 20: The random assignment $i\mathrel{\textnormal{{:=}}}\mathrm{Unif}[1..N]$ does not, strictly speaking, adhere to our pGCL syntax, but it can be modeled in pGCL. For the sake of readability, we opted for $i\mathrel{\textnormal{{:=}}}\mathrm{Unif}[1..N]$ .)

The expected runtime of $C_{\mathit{cc}}$ is proportional to the expected number of coupons the collector needs to buy. We want to prove that $N\cdot\mathcal{H}_{N}$ is a lower bound on that expected runtime, where $\mathcal{H}_{m}$ is the $m$ -th harmonic number, i.e., $\mathcal{H}_{0}=0$ and $\mathcal{H}_{m}=\sum_{k=1}^{m}\tfrac{1}{k}$ . For this, we make the following annotations, reusing the annotation style of Fig. 3(a) (for more detailed annotations, see Sect. F.2):

[TABLE]

By our above annotations, we have shown that (Footnote 21: In (Hark et al., 2020), there was a typo in the invariant $I$ . In the second case, it said “ $N-x$ ”, which leads to negative values. Here, we correct this mistake and adapt the calculations accordingly. Again, this does not change the overall result.)

[TABLE]

is indeed a runtime subinvariant of the outer loop. Before we finish proving that $I$ is indeed a lower bound on the expected runtime of the outer loop, let us take a closer look at the meaning of $I$ : If $I$ is a lower bound, the outer loop takes at least expected runtime $N\cdot\mathcal{H}_{x}$ if $x$ is between $1$ and $N$ , and expected runtime $N\cdot\mathcal{H}_{N}+x-N$ if $x$ is larger than $N$ . In the second case, the too–large $x$ value suggests that we have to collect more coupons than there are different coupons. So we first collect $x-N$ arbitrary “excess coupons” before we enter the “normal coupon collector mode” and collect the remaining $N$ coupons in expected time $N\cdot\mathcal{H}_{N}$ . Indeed, $N\cdot\mathcal{H}_{x}$ (without case analysis) is not a lower bound on the expected runtime and we would in fact fail to prove its subinvariance.

For the inner loop, we have used the fact that this loop is a so–called independent and identically distributed loop, for which exact expected runtimes can be determined (Batz et al., 2018, Theorem 4). For more details, see Sect. F.2, Lem. 1. We stress that while in this case we had an exact expected runtime for the inner loop available by external techniques, a suitable underapproximation of the expected runtime of the inner loop using the technique presented in this paper (Thm. 3) would have worked as well. Hence, our technique is generally applicable to nested loops.

At the very top of the above annotations, we push $I$ over the initial assignment, thus verifying $1+N\cdot\mathcal{H}_{N}$ (and hence also $N\cdot\mathcal{H}_{N}$ ) as lower bound for the entire expected runtime of $C_{\mathit{cc}}$ .

In order to establish that the subinvariant $I$ is in fact a lower bound, we are still left to prove conditional difference boundedness of $I$ . For this, we first make the following annotations:

[TABLE]

Now that we have determined $\smash{\textsf{{wp}}\left\llbracket{\mathit{outer~{}loop~{}body}}\right\rrbracket\left({\bigl{|}I-I(s)\bigr{|}}\right)}$ , we finally bound $\Delta I$ :

[TABLE]

Hence, $\Delta I$ is bounded by a constant, as $N$ is constant within the program $C_{\mathit{cc}}$ . Finally, we would still have to show ${}^{\textsf{{wp}}}\Phi_{t}(I)\mathrel{{\prec}{\prec}}\infty$ , which is easily checked and thus omitted here. This concludes our lower bound proof for the coupon collector’s problem.

In the example above, we have verified that $N\cdot\mathcal{H}_{N}$ is a lower bound on the expected runtime of the coupon collector program. This lower bound enjoys several nice properties: For one, our lower bound is an exact asymptotic lower bound. Another fact is that our lower bound is a strict lower bound. The actual runtime is a bit higher, as we have omitted some constants. This is, however, a desirable fact, as often we are only interested in the asymptotic runtime and do not wish to bother with the constants. Notice further that we never had to find the limit of any sequence. Loop semantics (be it wp or ert) were all applied only finitely many times in order to verify a tight asymptotic lower bound. (Footnote 22: This is also true for the technique we used for the inner loop.) All in all, the above example demonstrates the effectiveness of our inductive lower bound rule.
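The quantity $N\cdot\mathcal{H}_{N}$ can be cross-checked independently of the ert calculus. The Python sketch below is our own illustration with hypothetical function names. It counts only coupon purchases, not the individual program steps that ert accounts for, so it relates to the asymptotic statement above rather than to the exact runtime of $C_{\mathit{cc}}$. It solves the recurrence for the expected number of purchases exactly and compares the result with a seeded simulation.

```python
import random
from fractions import Fraction


def harmonic(m):
    """m-th harmonic number H_m, with H_0 = 0."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def expected_draws(n):
    """Expected purchases until all n types are collected.

    With x uncollected types, a purchase succeeds with probability x/n,
    so E[x] = n/x + E[x-1] with E[0] = 0.
    """
    e = Fraction(0)
    for x in range(1, n + 1):
        e += Fraction(n, x)
    return e


def simulate(n, rng):
    """Simulate one run and return the number of purchases."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randint(1, n))
        draws += 1
    return draws


def simulated_mean(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(simulate(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 10
    print("N * H_N         :", float(n * harmonic(n)))
    print("exact recurrence:", float(expected_draws(n)))
    print("simulated mean  :", simulated_mean(n, 20000))
```

The simulation is only a plausibility check with sampling error, whereas the recurrence reproduces $N\cdot\mathcal{H}_{N}$ exactly.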

9. Related Work

Weakest preexpectation reasoning.

The weakest preexpectation calculus goes back to the predicate transformer calculus by (Dijkstra, 1975, 1976), which provides an important tool for qualitative formal reasoning about nonprobabilistic programs. The probabilistic and quantitative analog to predicate transformers for nonprobabilistic programs are expectation transformers for probabilistic programs. Weakest–preexpectation–style reasoning was first studied in seminal work on probabilistic propositional dynamic logic (PPDL) by (Kozen, 1983, 1985). Its box– and diamond–modalities provide probabilistic versions of Dijkstra’s weakest (liberal) preconditions. Amongst others, (Jones, 1990), (Morgan et al., 1996), (McIver and Morgan, 2005), and (Hehner, 2011) have furthered this line of research, e.g., by considering nondeterminism and proof rules for bounding preexpectations in the presence of loops. Work towards automation of weakest preexpectation reasoning was carried out, amongst others, by (Chen et al., 2015), (Cock, 2014), (Katoen et al., 2010), and (Feng et al., 2017). Abstract interpretation of probabilistic programs was studied in this setting by (Monniaux, 2005).

Bounds on weakest preexpectations.

Rules for bounding weakest preexpectations were considered from very early on. Already (Kozen, 1983) provides an induction rule for verifying upper bounds. Pioneering work on lower bounds by means of limits of sequences was carried out by (Jones, 1990) and later reconsidered by (Audebaud and Paulin-Mohring, 2009). Proof rules that do not make use of limits were studied by (Morgan, 1996) and later more extensively in (McIver and Morgan, 2005). An orthogonal approach to lower bounds by means of bounded model checking was explored by (Jansen et al., 2016).

Advanced weakest preexpectation calculi.

Apart from reasoning about expected values of random variables at termination of simple pGCL programs, more advanced expectation–based calculi were invented. For instance, (Morgan and McIver, 1999) use expectation transformers to reason about temporal logic. More recently, (Olmedo et al., 2018) studies expectation transformers for probabilistic programs with conditioning. (Kaminski et al., 2016; Olmedo et al., 2016; Kaminski et al., 2018) introduce expectation based calculi to reason about expected runtimes of probabilistic programs. (Batz et al., 2019) present a quantitative separation logic together with a weakest preexpectation calculus for verifying probabilistic programs with pointer–access to dynamic memory.

In all of the above works, the rules for lower bounds rely throughout on finding limits of sequences as well as the sequences themselves. In particular, the proof of the (exact) expected runtime of the coupon collector by (Kaminski et al., 2016) requires a fairly complicated sequence, whereas our invariant in Ex. 4 was conceptually fairly easy and thus more informative for a human.

Martingale–based reasoning.

Probabilistic program analysis using martingales was pioneered by (Chakarov and Sankaranarayanan, 2013). Our rules rely on the notions of uniform integrability and conditional difference boundedness as well as the Optional Stopping Theorem. Previous works have also used these notions. (Barthe et al., 2016) focus on synthesizing exact martingale expressions. (Fioriti and Hermanns, 2015) develop a type system for uniform integrability in order to prove (positive) almost–sure termination (Footnote 23: Termination with probability 1 (within finite expected time).) of probabilistic programs and give upper bounds on the expected runtime. (Fu and Chatterjee, 2019) give lower bounds on expected runtimes. (Kobayashi et al., 2020) provide a semi–decision procedure for lower bounding termination probabilities of probabilistic higher–order recursive programs. (Ngo et al., 2018) perform automated template–driven resource analysis, but infer upper bounds only.

The latter four works analyze the termination behavior of a probabilistic program, whereas we focus on general expected values, e.g., of program variables. Furthermore, we do not only make use of uniform integrability and/or conditional difference boundedness of some auxiliary stochastic process in order to prove soundness of our proof rules but establish tight connections between expectation–based reasoning via induction and martingale–based reasoning.

Other work on probabilistic program analysis by specialized kinds of martingales includes (Chakarov and Sankaranarayanan, 2014), (Chatterjee et al., 2016), (Chatterjee et al., 2017), (Agrawal et al., 2018), (Huang et al., 2018), (Fu and Chatterjee, 2019), and (Wang et al., 2019). For instance, regarding expected runtimes of probabilistic (and possibly nondeterministic) programs, (Fu and Chatterjee, 2019) construct difference bounded (as opposed to conditionally difference bounded, which is a strictly weaker requirement) supermartingales which have to correspond to the exact asymptotic expected runtime. In contrast, our rule allows for reasoning about strict lower bounds.

10. Conclusion

In this paper, we have studied proof rules for lower bounds in probabilistic program verification. Our rules are simple in the sense that the invariants need to be “pushed through the loop semantics” only a finite number of times, much like invariants in Hoare logic. In contrast, existing rules for lower bounds of unbounded weakest preexpectations required coming up with an infinite sequence of invariants, performing induction to prove relative inductiveness of two subsequent invariants, and then, most unpleasantly, finding the limit of this sequence. The main results of this paper are the following:

(1) We have presented the first inductive proof rules (Thm. 9 (a) and (b)) for verifying lower bounds on (possibly unbounded) weakest preexpectations of probabilistic while loops using quantitative invariants. Our inductive rules are given as an Optional Stopping Theorem (OST) for weakest preexpectations. They provide sufficient conditions for the requirement of uniform integrability which are much easier to check than uniform integrability in general. Case studies demonstrating the effectiveness but also the limitations of these rules are found in App. A.

(2) For proving our OST, we resort to the classical OST from probability theory. However, for most notions that appear in the classical OST, like uniform integrability and conditional difference boundedness, we were able to find purely expectation–transformer–based counterparts (see Sect. 4 and 5). We thus conjecture that our OST can be proven in purely expectation–theoretic terms, which would most likely simplify the proof of our OST significantly as no probability theory would be required anymore.

(3) We studied the inductive proof rules for lower bounds on bounded weakest preexpectations from (McIver and Morgan, 2005). Our results gave rise to a generalization of their proof rule to a sufficient and necessary criterion for lower bounds (Thm. 2).

(4) We have investigated a measure theoretical explanation for why verifying upper bounds using domain theoretical Park induction is conceptually simpler (Sect. 7). The underlying reason is the well–known Lemma of Fatou. This leads us to speculate that Fatou’s Lemma could be proved in purely domain theoretical terms, perhaps as an instance of Park induction. A successful attempt at a similar idea is due to (Baranga, 1991) who proved that the well–known Banach Contraction Principle is a particular instance of the Kleene Fixed Point Theorem.

(5) We used the close connection between wp and ert to present the first inductive proof rule for lower bounding expected runtimes (Thm. 3). As an example to demonstrate the power of this rule, we inferred a nontrivial lower bound on the expected runtime of the famous coupon collector’s problem (Ex. 4).

Future work includes extending our proof rules for weakest preexpectation reasoning to recursive programs (Olmedo et al., 2016), to probabilistic programs with nondeterminism (McIver and Morgan, 2001, 2005), and to mixed–sign postexpectations. For the latter, this will likely yield more appealing proof rules for loops than those provided in (Kaminski and Katoen, 2017) which currently involve reasoning about sequences. Moreover, we are interested in (partially) automating the synthesis of the quantitative invariants needed in our proof rules.

Acknowledgements.

The authors gratefully acknowledge the support of the German Research Council (DFG) Research Training Group 2236 UnRAVeL and ERC Advanced Grant 787914 FRAPPANT. Furthermore, we would like to thank Florian Frohn and Christoph Matheja for many fruitful discussions on examples and counterexamples.

Appendix

This appendix contains additional material for our paper. App. A presents a collection of case studies to demonstrate the strengths and the limitations of our rule. In App. B, we give a more detailed introduction into the required preliminaries from probability theory. Afterwards, in App. C, we present the proofs for Sect. 4. App. D then contains the proofs of our main results in Sect. 5. In App. E we give the proofs for Sect. 6. Finally, App. F contains the proofs of our result for lower bounding the expected runtime from Sect. 8.

Appendix A Case Studies

Example 1 (Negative Binomial Loop (cf. (McIver et al., 2018))).

Let us consider the program $C_{neg}$

[TABLE]

with $x,k\in\mathbb{N}$ . The characteristic function for $f=k$ of the program is given by

[TABLE]

The loop is expected to be executed $2\cdot\left[{x>0}\right]\cdot x$ times, so its expected looping time is finite. Intuitively, the value of $k$ after termination of the program is $I=\left[{x=0}\right]\cdot k+\left[{x>0}\right]\cdot(k+x)$ , i.e., the initial value of $k$ increases by the initial value of $x$ if the loop can be executed at all. Note that $I$ harmonizes with $f$ . We will prove that our intuition for $I$ is correct, i.e., that $I$ is indeed the least fixed point of $\Phi_{f}$ .

First of all, it is a fixed point of $\Phi_{f}$ :

[TABLE]

Since $I$ is indeed a fixed point of $\Phi_{f}$ , it is also a subinvariant and furthermore finite. To apply Thm. 9 (b) we need to check that $I$ is conditionally difference bounded. We derive

[TABLE]

So $I$ is indeed conditionally difference bounded by $1$ . Hence, we can apply our new Optional Stopping Theorem (Thm. 9 (b)) to obtain $I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}$ . As $I$ is also a fixed point itself, it is the least fixed point of $\Phi_{f}$ , i.e.,

[TABLE]

Example 2 (Fair in the Limit Negative Binomial Loop).

Consider the slight adaptation of $C_{neg}$ from Ex. 1 to $C_{filneg}$ :

[TABLE]

with $x,k\in\mathbb{N}$ . Note that for any $x>0$ we have $\tfrac{1}{3}\leq\frac{1}{2+\nicefrac{{1}}{{x}}}\leq\tfrac{1}{2}$ and $\frac{1}{2+\nicefrac{{1}}{{x}}}$ is monotonically increasing in the value of $x$ . Therefore one can show using the ert–transformer (cf. (Kaminski et al., 2018)) that the expected runtime of $C_{filneg}$ is at most $3\cdot\left[{x>0}\right]\cdot x$ . So we have positive almost–sure termination and therefore finite expected looping time. Again, we would like to reason about the expected value of $k$ after termination of $C_{filneg}$ . The characteristic function for $f=k$ of the program is given by

[TABLE]

Intuitively, the value of $k$ after termination of the program should again be at least $I=\left[{x=0}\right]\cdot k+\left[{x>0}\right]\cdot(k+x)$ , i.e., the initial value of $k$ again increases at least by the initial value of $x$ if the loop can be executed at all. We will prove that this intuition is correct, so that $I$ is indeed a lower bound on the least fixed point of $\Phi_{f}$ . Again, $I$ harmonizes with $f$ .

We first show that $I$ is a subinvariant:

[TABLE]

So $I$ is indeed a subinvariant of $\Phi_{f}$ and $\Phi_{f}(I)\mathrel{{\prec}{\prec}}\infty$ . To apply Thm. 9 (b) we need to check that $I$ is conditionally difference bounded.

[TABLE]

So $I$ is indeed conditionally difference bounded by $1$ . Hence, by Thm. 9 (b), $I$ is a lower bound on the least fixed point of $\Phi_{f}$ , i.e.,

[TABLE]

Example 3 (Negative Binomial Loop with Non–Constant Updates).

Let us consider another adaptation of $C_{neg}$ from Ex. 1 to the program $C_{negncu}$ :

[TABLE]

with $x,y\in\mathbb{N}$ . This program is positively almost–sure terminating and the expected number of loop iterations is $\left[{x>0}\right]\cdot 2\cdot x$ . The characteristic function for $f=y$ of the program is given by $\Phi_{f}(X)=\left[{x=0}\right]\cdot y+\left[{x>0}\right]\cdot\frac{1}{2}\cdot\Bigl{(}X\left[{y}\middle/{y+x}\right]\left[{x}\middle/{x-1}\right]+X\Bigr{)}$ .

Intuitively, the value of $y$ after termination of the program should be $I=\left[{x=0}\right]\cdot y+\left[{x>0}\right]\cdot(y+\frac{x\cdot(x-1)}{2})$ , i.e., the initial value of $y$ increases by the sum $\sum\limits_{j=0}^{x-1}j=\frac{x\cdot(x-1)}{2}$ if the loop can be executed at all. We will prove that this intuition is correct, so that $I$ is indeed the least fixed point of $\Phi_{f}$ . Again, $I$ harmonizes with $f$ .

First of all, we show that $I$ is a fixed point of $\Phi_{f}$ :

[TABLE]

So $I$ is indeed a fixed point of $\Phi_{f}$ and $\Phi_{f}(I)\mathrel{{\prec}{\prec}}\infty$ . To apply Thm. 9 (b) we need to check that $I$ is conditionally difference bounded.

[TABLE]

So $I$ is indeed conditionally difference bounded. Hence, by Thm. 9 (b), $I$ is a lower bound on the least fixed point of $\Phi_{f}$ . Since it is also a fixed point itself, we obtain

[TABLE]
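The fixed point property claimed in Ex. 3 can be sanity–checked numerically. The following is only a sketch on a finite grid of states, not a proof: it evaluates the characteristic function $\Phi_{f}$ exactly as stated above, and it assumes that the expected one–step change of $I$ has the form $wp⟦body⟧(|I-I(s)|)(s)$ (the helper names `phi`, `delta`, and `grid` are hypothetical).

```python
from fractions import Fraction

def I(x, y):
    # Candidate I = [x = 0] * y + [x > 0] * (y + x * (x - 1) / 2)
    return Fraction(y) if x == 0 else Fraction(y) + Fraction(x * (x - 1), 2)

def phi(X, x, y):
    # Phi_f(X) = [x = 0] * y + [x > 0] * 1/2 * (X[y / y+x][x / x-1] + X)
    if x == 0:
        return Fraction(y)
    # X[y / y+x][x / x-1] evaluates X at x' = x - 1 and y' = y + (x - 1)
    return Fraction(1, 2) * (X(x - 1, y + (x - 1)) + X(x, y))

def delta(X, x, y):
    # Assumed form of the expected one-step change: wp[body](|X - X(s)|)(s)
    here = X(x, y)
    changed = abs(X(x - 1, y + (x - 1)) - here)  # branch that updates x and y
    unchanged = abs(X(x, y) - here)              # branch that leaves the state as is
    return Fraction(1, 2) * changed + Fraction(1, 2) * unchanged

# Finite sample of initial states (a sanity check, not a proof)
grid = [(x, y) for x in range(0, 30) for y in range(0, 30)]

is_fixed_point = all(phi(I, x, y) == I(x, y) for x, y in grid)
max_delta = max(delta(I, x, y) for x, y in grid if x > 0)

print(is_fixed_point, max_delta)
```

Under these assumptions, the check confirms $\Phi_{f}(I)=I$ on every sampled state, and the expected one–step change of $I$ stays bounded on the grid.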

Example 4 (Probabilistic Doubling with Bounded Looping Time).

Let us consider the program $C_{double}$

[TABLE]

with $x,y\in\mathbb{N}$ . The characteristic function for $f=y$ of the program is given by

[TABLE]

The looping time of this program is $\left[{x>0}\right]\cdot x$ which is bounded by $\max(s(x),0)\in\mathbb{N}$ for any initial state $s\in\Sigma$ . We will prove that $I=\left[{x\leq 0}\right]\cdot y+\left[{x>0}\right]\cdot 2^{\frac{x}{2}}\cdot y$ is a lower bound on the expected value of $y$ after termination of the program.

[TABLE]

So $I$ is indeed a subinvariant. Furthermore, we have $I\preceq\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ for any $n\in\mathbb{N}$ , as the loop body is loop–free. So by using Thm. 9 (a) we can deduce that $I$ is indeed a lower bound on the least fixed point of $\Phi_{f}$ , i.e.,

[TABLE]

Note that in this case Thm. 9 (b) is not applicable. $I$ harmonizes with $f$ , but $I$ is not conditionally difference bounded:

[TABLE]

which is unbounded.

Nevertheless, even in the case of finite expected looping time, Thm. 9 just provides sufficient conditions for lower bounds. This is not surprising as conditional difference boundedness is a sufficient condition for uniform integrability but far from being necessary. The following example presents a limitation of our proof rule.

Example 5 (Probabilistic Doubling with Unbounded Looping Time).

Let us consider the program

[TABLE]

with $b\in\mathbb{N}$ and $b>0$ . The characteristic function for $f=b$ of the program is given by $\Phi_{f}(X)=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}X\left[{a}\middle/{0}\right]+X\left[{b}\middle/{2\cdot b}\right]\Bigr{)}$ .

Now consider the sequence of invariants $I_{n}\coloneqq\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\frac{n\cdot b}{2}$ . Then $I_{n}=\Phi_{f}^{n+1}(0)$ . Indeed, $I_{0}=\left[{a\neq 1}\right]\cdot b=\Phi_{f}(0)$ . Furthermore, $\Phi_{f}(I_{n})=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}I_{n}\left[{a}\middle/{0}\right]+I_{n}\left[{b}\middle/{2\cdot b}\right]\Bigr{)}=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}b+n\cdot b\Bigr{)}=I_{n+1}$ . Hence, $I_{0}\preceq I_{1}\preceq\cdots$ and $\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\infty=\lim_{n\to\omega}I_{n}=\textnormal{{{lfp}}}~{}\Phi_{f}$ by the Tarski–Kantorovich Principle (Thm. 5). Moreover, each of the $I_{n}$ is a lower bound on the least fixed point, i.e., $I_{n}\preceq\textnormal{{{lfp}}}~{}\Phi_{f}$ .
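The identity $I_{n}=\Phi_{f}^{n+1}(0)$ can be replayed mechanically. The following sketch iterates the characteristic function from Ex. 5 starting at the constant expectation $0$ and compares each iterate with the closed form of $I_{n}$ on a small, arbitrarily chosen set of states (the names `phi`, `invariant`, and `states` are hypothetical; expectations are modeled as Python functions of the pair $(a,b)$ ).

```python
from fractions import Fraction

def phi(X):
    # Phi_f(X) = [a != 1] * b + [a = 1] * 1/2 * (X[a / 0] + X[b / 2*b])
    def result(a, b):
        if a != 1:
            return Fraction(b)
        return Fraction(1, 2) * (X(0, b) + X(a, 2 * b))
    return result

def invariant(n):
    # I_n = [a != 1] * b + [a = 1] * n * b / 2
    return lambda a, b: Fraction(b) if a != 1 else Fraction(n * b, 2)

zero = lambda a, b: Fraction(0)

# Arbitrary sample of states with b > 0
states = [(a, b) for a in (0, 1, 2) for b in range(1, 8)]

iterate = phi(zero)  # Phi_f^1(0)
matches = []
for n in range(10):
    # Compare Phi_f^{n+1}(0) with the closed form I_n on all sampled states
    matches.append(all(iterate(a, b) == invariant(n)(a, b) for a, b in states))
    iterate = phi(iterate)

print(all(matches))
```

The comparison only covers finitely many iterates and states, so it illustrates the induction step above rather than replacing it.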

Furthermore, the loop is expected to be executed twice. Clearly, Thm. 9 (a) and (c) are not applicable. Let $n\geq 2$ . Then Thm. 9 (b) is not applicable either, because $\Delta I_{n}$ is unbounded, i.e., $I_{n}$ is not conditionally difference bounded. To see this, let $s\in\Sigma$ .

[TABLE]

which can take an arbitrarily large value as there is no bound on the value of $b>0$ . Hence Thm. 9 (b) cannot be applied although $I_{n}$ is a lower bound. However, in (Kaminski et al., 2016) it is proved that the $I_{n}$ form an $\omega$ –subinvariant; hence, they are all lower bounds.
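To illustrate why $I_{n}$ fails to be conditionally difference bounded, the following sketch evaluates the expected one–step change of $I_{n}$ for growing values of $b$ . It rests on two assumptions that are not spelled out in the text above: the loop body is read off the characteristic function as $\{a:=0\}\,[\nicefrac{1}{2}]\,\{b:=2\cdot b\}$ , and the expected one–step change is taken to be $wp⟦body⟧(|I_{n}-I_{n}(s)|)(s)$ . The helper names are hypothetical.

```python
from fractions import Fraction

def invariant(n):
    # I_n = [a != 1] * b + [a = 1] * n * b / 2
    return lambda a, b: Fraction(b) if a != 1 else Fraction(n * b, 2)

def delta(X, a, b):
    # Assumed: wp[body](|X - X(s)|)(s) with body = {a := 0} [1/2] {b := 2 * b}
    here = X(a, b)
    after_reset = abs(X(0, b) - here)       # branch a := 0
    after_double = abs(X(a, 2 * b) - here)  # branch b := 2 * b
    return Fraction(1, 2) * after_reset + Fraction(1, 2) * after_double

n = 3
# Evaluate in states with a = 1 (loop guard holds) and increasing b
deltas = [delta(invariant(n), 1, b) for b in (1, 10, 100, 1000)]

print(deltas)
```

In this sketch the computed values grow linearly in $b$ , so no constant bounds them over all initial states.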

Appendix B Details on Probability Theory

This section is devoted to a more detailed introduction of the concepts from probability theory that we use in our work.

B.1. $\sigma$ –Fields

When setting up a probability space over some sample space $\Omega$ , which can be any set, we have to distinguish the sets whose probabilities we want to be able to measure. The collection of these measurable sets is called a $\sigma$ –field.

Definition 1 ( $\sigma$ –Field).

Let $\Omega$ be an arbitrary set and $\mathfrak{F}\subseteq Pot(\Omega)$ . $\mathfrak{F}$ is called a $\sigma$ –field over $\Omega$ if the following three conditions are satisfied.

(1) $\Omega\in\mathfrak{F}$ ,

(2) $A\in\mathfrak{F}\Rightarrow\Omega\setminus A\in\mathfrak{F}$ , i.e., $\mathfrak{F}$ is closed under taking the complement,

(3) $A_{i}\in\mathfrak{F}\Rightarrow\bigcup\limits_{i\in\mathbb{N}}A_{i}\in\mathfrak{F}$ , i.e., $\mathfrak{F}$ is closed under countable union.

The pair $(\Omega,\mathfrak{F})$ is called a measurable space. The elements of $\mathfrak{F}$ are called measurable sets.

In the setting of program verification, we have seen that $\Omega$ is the set of all program runs. A program run is an infinite sequence of states, i.e., variable assignments. We regard $\sigma$ –fields $\mathfrak{F}\subseteq Pot(\Omega)$ of the form $\left\langle\mathfrak{E}\right\rangle_{\sigma}$ , where $\mathfrak{E}$ is a collection of cylinder sets. (More precisely, we regard fields $\mathfrak{F}_{n}\subseteq Pot(\Omega)$ , where $\mathfrak{F}_{n}$ is the smallest $\sigma$ –field containing all cylinder sets of order $n$ or smaller.) In our setting, a set of runs $\mathfrak{E}$ is a cylinder set of order $n$ if all runs in $\mathfrak{E}$ have the same $n+1$ first configurations, and all the following configurations can be arbitrary.

Let $\mathfrak{F}$ and $\mathfrak{B}$ be $\sigma$ –fields over $\Omega$ . Then $\mathfrak{F}\cap\mathfrak{B}$ is a $\sigma$ –field over $\Omega$ . Furthermore, if $(\mathfrak{F}_{i})_{i\in I}$ is a family of $\sigma$ –fields over $\Omega$ then so is $\bigcap\limits_{i\in I}\mathfrak{F}_{i}$ .

For any set $\mathfrak{E}$ of subsets of $\Omega$ , let $\left\langle\mathfrak{E}\right\rangle_{\sigma}\subseteq Pot(\Omega)$ consist of all elements that are contained in all $\sigma$ –fields that are supersets of $\mathfrak{E}$ . The mapping from $\mathfrak{E}$ to $\left\langle\mathfrak{E}\right\rangle_{\sigma}$ is also called the $\sigma$ –operator.

Definition 2 (Generating $\sigma$ –Fields).

Let $\mathfrak{E}\subseteq Pot(\Omega)$ . Then the smallest $\sigma$ –field over $\Omega$ containing $\mathfrak{E}$ is

[TABLE]

It turns out that there is a special case in which the generated $\sigma$ –field is easy to describe, namely in the case where a countable covering of the space $\Omega$ is given.

Lemma 3 (Generating $\sigma$ –Fields for Covering of $\Omega$ ).

If $\Omega=\biguplus\limits_{i=1}^{\infty}A_{i}$ for a sequence $A_{i}\in Pot(\Omega)$ and $\mathfrak{H}\coloneqq\left\langle\{A_{i}\mid i\in\mathbb{N}\}\right\rangle_{\sigma}$ then

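$\mathfrak{H}=\left\{\biguplus\limits_{i\in J}A_{i}\mid J\subseteq\mathbb{N}\right\}.$

Proof.

It suffices to show that the family of all unions $\biguplus\limits_{i\in J}A_{i}$ with $J\subseteq\mathbb{N}$ is a $\sigma$ –field: it contains all the sets $A_{i}$ , and every $\sigma$ –field containing all the $A_{i}$ has to contain all their countable unions. In the following three steps, we identify $\mathfrak{H}$ with this family: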

$\Omega\in\mathfrak{H}$ by choosing $J=\mathbb{N}$ .

Let $J\subseteq\mathbb{N}$ . Then $\Omega\setminus\left(\biguplus\limits_{i\in J}A_{i}\right)=\biguplus\limits_{i\in\mathbb{N}\setminus J}A_{i}\in\mathfrak{H}.$

Let $J_{n}\subseteq\mathbb{N}$ . Then $\bigcup\limits_{n\in\mathbb{N}}\biguplus\limits_{i\in J_{n}}A_{i}=\biguplus\limits_{i\in\bigcup\limits_{n\in\mathbb{N}}J_{n}}A_{i}\in\mathfrak{H}.$

∎
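For a finite sample space, the statement of Lem. 3 can be checked by brute force. The sketch below uses a hypothetical partition of a six–element set, builds the family of all unions of blocks, and tests the three conditions of Def. 1 (in the finite case, countable unions reduce to finite ones).

```python
from itertools import combinations

omega = frozenset(range(6))
# Hypothetical partition of omega into pairwise disjoint blocks A_i
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]

def unions_of_blocks(blocks):
    # All sets of the form "union of A_i for i in J", J ranging over index subsets
    family = set()
    for size in range(len(blocks) + 1):
        for chosen in combinations(blocks, size):
            family.add(frozenset().union(*chosen))
    return family

H = unions_of_blocks(blocks)

contains_omega = omega in H
closed_under_complement = all(omega - A in H for A in H)
closed_under_union = all(A | B in H for A in H for B in H)

print(len(H), contains_omega, closed_under_complement, closed_under_union)
```

With three blocks the family has $2^{3}=8$ elements, one per index set $J$ , which mirrors the description of the generated $\sigma$ –field by index sets.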

This special type of a $\sigma$ –field can be used to describe the elements of the $\sigma$ –fields $\mathfrak{F}_{n}$ (cf. Def. 5) and $\mathfrak{G}_{n}$ (cf. Def. 8). We will discuss it in more detail in Appendix C where we use it in the proofs.

Definition 4 (Borel–Field).

If $\Omega=\overline{\mathbb{R}}_{\geq 0}$ we use its $\sigma$ –field $\mathfrak{B}=\mathfrak{B}\left(\overline{\mathbb{R}}_{\geq 0}\right)$ , the Borel–field with

[TABLE]

In this work, we use the concept of measurable maps (or “measurable mappings”). Measurable maps are the structure–preserving maps between measurable spaces. They are defined as follows.

Definition 5 (Measurable Map).

Let $(\Omega_{1},\mathfrak{F}_{1})$ and $(\Omega_{2},\mathfrak{F}_{2})$ be measurable spaces. A function $f\colon\Omega_{1}\to\Omega_{2}$ is called an $\mathfrak{F}_{1}$ – $\mathfrak{F}_{2}$ measurable map or just measurable if for all $A\in\mathfrak{F}_{2}$

$f^{-1}(A)\in\mathfrak{F}_{1}.$

$\left\langle f\right\rangle_{\sigma}\coloneqq f^{-1}(\mathfrak{F}_{2})\coloneqq\{f^{-1}(A)\mid A\in\mathfrak{F}_{2}\}$ is the smallest $\sigma$ –field $\mathfrak{F}$ over $\Omega_{1}$ such that $f$ is $\mathfrak{F}$ – $\mathfrak{F}_{2}$ measurable. Similarly, $\left\langle f_{0},\ldots,f_{n}\right\rangle_{\sigma}=\left\langle f_{0}^{-1}(\mathfrak{F}_{2})\cup\ldots\cup f_{n}^{-1}(\mathfrak{F}_{2})\right\rangle_{\sigma}$ . This will become important when talking about random variables and conditional expected value.

B.2. Probability Spaces

So far we have only introduced the concept of measurable spaces. Intuitively, a measurable space provides the structure for defining a measure. Probability spaces are measurable spaces equipped with a measure that assigns the value $1$ to the sample space.

Definition 6 (Probability Measure, Probability Space).

Let $(\Omega,\mathfrak{F})$ be a measurable space. A map $\mu\colon\mathfrak{F}\to\mathbb{R}_{\geq 0}$ is called a measure if

(1) $\mu(\emptyset)=0$ ,

(2) $\mu(\biguplus\limits_{i\geq 0}A_{i})=\sum\limits_{i\geq 0}\mu(A_{i})$ .

A probability measure is a measure $\mathbb{P}\colon\mathfrak{F}\to\mathbb{R}_{\geq 0}$ with $\mathbb{P}(\Omega)=1$ . This implies that $\mathbb{P}(A)\in[0,1]$ for every $A\in\mathfrak{F}$ . If $\mathbb{P}$ is a probability measure, then $(\Omega,\mathfrak{F},\mathbb{P})$ is called a probability space. In this setting a set in $\mathfrak{F}$ is called an event.

Here, the intuition for a probability measure $\mathbb{P}$ is that for any set $A\in\mathfrak{F}$ , $\mathbb{P}(A)$ is the probability that an element chosen from $\Omega$ is contained in $A$ .

In this work we will consider properties that hold almost–surely, for example almost–sure termination or almost–sure convergence.

Definition 7 (Almost–Sure Properties).

Let $(\Omega,\mathfrak{F},\mathbb{P})$ be a probability space and $\alpha$ some property, e.g., a logical formula. If $A_{\alpha}\coloneqq\{\vartheta\in\Omega\mid\vartheta\vDash\alpha\}\in\mathfrak{F}$ (i.e., it is measurable) and $\phantom{}\mathbb{P}\left(A_{\alpha}\right)=1$ then $\alpha$ is said to hold almost–surely.

If a property $\alpha$ holds almost–surely it does not need to hold for all $\vartheta\in\Omega$ . However, the measure $\mathbb{P}$ cannot distinguish $A_{\alpha}=\Omega$ and $A_{\alpha}\neq\Omega$ if $\phantom{}\mathbb{P}\left(A_{\alpha}\right)=1$ , so in the sense of $\mathbb{P}$ , almost–surely holding properties can be considered as holding globally.

B.3. Integrals of Arbitrary Measures

We now introduce a notion of an integral with respect to an arbitrary measure. To this end, we fix a measurable space $(\Omega,\mathfrak{F})$ and a measure $\mu$ . The objective is to define a “mean” of a measurable function $f$ . The basic idea is to partition $\Omega$ into sets $A_{i}$ on which $f$ has a constant value $\alpha_{i}$ . Then we compute the weighted sum of the $\alpha_{i}$ , where the weights are the measures of the $A_{i}$ . This definition is fine if $f$ takes only finitely many values (these functions are called elementary). If $f$ takes infinitely (countably or even uncountably) many values, we have to approximate $f$ step by step by such functions with finite image. This yields a limit process. In this work we will only consider the cases where $\mu=\mathbb{P}$ is a probability measure or a probability submeasure (i.e., $\mathbb{P}(\Omega)\leq 1$ ).

Definition 8 (Elementary Function, (Bauer, 1971, Def. 2.2.1)).

An elementary function is a nonnegative measurable function $f\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ that takes only finitely many finite values, i.e., there exist $A_{1},\dots,A_{n}\in\mathfrak{F}$ and $\alpha_{1},\dots,\alpha_{n}\geq 0$ such that

[TABLE]

Lemma 9 (Decomposition of Elementary Functions (Bauer, 1971, Lemma 2.2.2)).

Let $f=\sum\limits_{i=1}^{n}\alpha_{i}\cdot\left[{A_{i}}\right]=\sum\limits_{j=1}^{n}\beta_{j}\cdot\left[{B_{j}}\right]$ be an elementary function, where $\alpha_{i},\beta_{j}\geq 0$ , $A_{i},B_{j}\in\mathfrak{F}$ for all $i$ and $j$ . Then

[TABLE]

Given an elementary function, the decomposition into a linear combination of indicator functions is not unique. But to define an integral we have to guarantee that its value does not depend on the chosen decomposition. Fortunately, this can be proved:

Definition 10 (Integral of Elementary Function, (Bauer, 1971, Def. 2.2.3)).

Let $f=\sum\limits_{i=1}^{n}\alpha_{i}\cdot\left[{A_{i}}\right]$ be an elementary function. Then we define its integral w.r.t. $\mu$ as

[TABLE]

The well–definedness is justified by Lem. 9 as it shows the independence of the chosen decomposition of the elementary function.
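The independence of the chosen decomposition can be illustrated on a small example. The sketch below assumes the usual definition of the integral of an elementary function as the weighted sum $\sum_{i}\alpha_{i}\cdot\mu(A_{i})$ and uses a hypothetical measure on a four–element set, given by point masses; all names are illustrative only.

```python
from fractions import Fraction

# Hypothetical finite measure on Omega = {0, 1, 2, 3}, given by point masses
mass = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def mu(A):
    # Measure of a set is the sum of the point masses of its elements
    return sum((mass[w] for w in A), Fraction(0))

def integral(decomposition):
    # decomposition is a list of pairs (alpha_i, A_i); returns sum alpha_i * mu(A_i)
    return sum((alpha * mu(A) for alpha, A in decomposition), Fraction(0))

def value(decomposition, w):
    # Value of the represented function sum alpha_i * [A_i] at the point w
    return sum((alpha for alpha, A in decomposition if w in A), Fraction(0))

# Two decompositions of the same function f with f(0) = f(1) = 3, f(2) = 5, f(3) = 0
first = [(Fraction(3), {0, 1}), (Fraction(5), {2})]
second = [(Fraction(3), {0, 1, 2}), (Fraction(2), {2})]

same_function = all(value(first, w) == value(second, w) for w in mass)

print(same_function, integral(first), integral(second))
```

Both decompositions describe the same function and yield the same weighted sum, which is what Lem. 9 guarantees in general.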

However, the measurable functions we use are not elementary. They may take infinitely (countably or even uncountably) many values. So we have to generalize Def. 10. It can be shown that any nonnegative measurable function is the limit of a monotonically increasing sequence of elementary functions.

Theorem 11 (Representation by Elementary Functions).

Let $f\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a nonnegative measurable function. Then there exists a monotonic sequence $f_{0}\leq f_{1}\leq\dots$ of elementary functions such that

$f=\sup\limits_{n\in\mathbb{N}}f_{n}.$

Furthermore, for any two such sequences $(f_{n})_{n\in\mathbb{N}}$ and $(g_{n})_{n\in\mathbb{N}}$ we have $\sup\limits_{n\in\mathbb{N}}\int f_{n}\,d\mu=\sup\limits_{n\in\mathbb{N}}\int g_{n}\,d\mu$ .

Proof.

See (Bauer, 1971, Cor. 2.3.2., Thm. 2.3.6). ∎

This theorem justifies the following definition of an integral for an arbitrary nonnegative function.

Definition 12 (Integral of Arbitrary Functions).

Let $f\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a nonnegative measurable function and $(f_{n})_{n\in\mathbb{N}}$ a monotonic sequence of elementary functions such that $f=\sup_{n\in\mathbb{N}}f_{n}$ . Then we define the integral of $f$ w.r.t. $\mu$ by

$\int f\,d\mu\coloneqq\sup\limits_{n\in\mathbb{N}}\int f_{n}\,d\mu.$

Before we state the properties of the integral used in this work we will define the integral on a measurable subset of $\Omega$ .

Definition 13 (Integral on Measurable Subset).

Let $f\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a nonnegative measurable function and $A\in\mathfrak{F}$ . Then $f\cdot\left[{A}\right]$ is nonnegative and measurable, and we define

$\int_{A}f\,d\mu\coloneqq\int f\cdot\left[{A}\right]\,d\mu.$

With this definition of an integral, a very special property holds for monotonically increasing sequences of nonnegative functions: taking the limit (it always exists due to monotonicity) and taking the integral can be interchanged. We will focus on this when discussing uniform integrability.

Theorem 14 (Monotone Convergence Theorem, (Bauer, 1971, Thm. 2.3.4.)).

Let $(f_{n})_{n\in\mathbb{N}}$ be a monotonic sequence of nonnegative measurable functions, i.e., $f_{n}\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is measurable and $f_{0}\leq f_{1}\leq\dots$ . Then $\sup_{n\in\mathbb{N}}f_{n}=\lim\limits_{n\to\omega}f_{n}\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is measurable and

[TABLE]

Lemma 15 (Properties of the Integral).

Let $a,b\geq 0$ , $f,g\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be measurable functions, and $A,A_{i}\in\mathfrak{F}$ . Then

[TABLE]

Proof.

See (Bauer, 1971, (2.2.4.), (2.3.6.), (2.3.7.), Cor. 2.3.5.). ∎

B.4. Random Variables

A random variable $X$ maps elements of one set $\Omega$ to another set $\Omega^{\prime}$ . If $\mathbb{P}$ is a probability measure for $\Omega$ (i.e., for $A\subseteq\Omega$ , $\mathbb{P}(A)$ is the probability that an element chosen from $\Omega$ is contained in $A$ ), then one obtains a corresponding probability measure $\mathbb{P}^{X}$ for $\Omega^{\prime}$ . For $A^{\prime}\subseteq\Omega^{\prime}$ , $\mathbb{P}^{X}(A^{\prime})$ is the probability that an element chosen from $\Omega$ is mapped by $X$ to an element contained in $A^{\prime}$ . In other words, instead of regarding the probabilities for choosing elements from $A$ , one now regards the probabilities for the values of the random variable $X$ .

Definition 16 (Random Variable).

Let $(\Omega,\mathfrak{F},\mathbb{P})$ be a probability space. An $\mathfrak{F}$ – $\mathfrak{B}(\overline{\mathbb{R}}_{\geq 0})$ measurable map $X\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is a random variable. Instead of saying “ $\mathfrak{F}$ – $\mathfrak{B}\left(\overline{\mathbb{R}}_{\geq 0}\right)$ measurable” we simply use the notion “ $\mathfrak{F}$ –measurable”. $X$ is called a discrete random variable if its image is a countable set.

$\mathbb{P}^{X}\colon\mathfrak{B}(\overline{\mathbb{R}}_{\geq 0})\to[0,1],A^{\prime}\mapsto\mathbb{P}(X^{-1}(A^{\prime}))$ is the probability measure induced by $X$ on $(\overline{\mathbb{R}}_{\geq 0},\mathfrak{B}(\overline{\mathbb{R}}_{\geq 0}))$ . Instead of $\mathbb{P}^{X}(A^{\prime})$ the notation $\mathbb{P}(X\in A^{\prime})$ is common. If $A^{\prime}=\{i\}$ is a singleton set, we also write $\mathbb{P}(X=i)$ instead of $\mathbb{P}^{X}(\{i\})$ .

Definition 17 (Expected Value).

Let $X\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a random variable. Then $\phantom{}\mathbb{E}\left(X\right)\coloneqq\int Xd\mathbb{P}$ .

Lemma 18 (Expected Value as Sum).

If $X$ is a discrete random variable we have $\phantom{}\mathbb{E}\left(X\right)=\sum\limits_{r\in\overline{\mathbb{R}}_{\geq 0}}r\cdot\phantom{}\mathbb{P}\left(X=r\right)$ . Note that this series has only countably many nonzero nonnegative summands. Hence, it either converges or it diverges to infinity.

Proof.

Let $X(\Omega)=\{r_{1},r_{2},\dots\}$ . Then $\Omega=\biguplus\limits_{i\in\mathbb{N}}X^{-1}(\{r_{i}\})$ . Hence

[TABLE]

∎
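Lem. 18 can be illustrated on a finite probability space, where the integral over $\Omega$ is itself a finite sum. The sketch below uses a hypothetical example (two fair coin flips, $X$ counting heads), computes the induced measure $\mathbb{P}^{X}$ , and compares both ways of computing $\mathbb{E}\left(X\right)$ .

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: outcomes of two fair coin flips
prob = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
        "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def X(outcome):
    # Random variable: number of heads in the outcome
    return outcome.count("H")

# E(X) as an integral over Omega: sum of X(w) * P({w})
expectation_integral = sum((X(w) * p for w, p in prob.items()), Fraction(0))

# Induced measure P^X: collect the probability of each preimage X^{-1}({r})
distribution = defaultdict(Fraction)
for w, p in prob.items():
    distribution[X(w)] += p

# E(X) as a sum over the values of X: sum of r * P(X = r)
expectation_sum = sum((r * p for r, p in distribution.items()), Fraction(0))

print(expectation_integral, expectation_sum)
```

The second computation groups the outcomes by their value under $X$ , which is the decomposition $\Omega=\biguplus_{i}X^{-1}(\{r_{i}\})$ used in the proof.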

B.5. Uniform Integrability

Now given any stochastic process (i.e., a sequence of random variables $X_{n}:\Omega\to\overline{\mathbb{R}}_{\geq 0}$ on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ ) that has an almost–surely existing limit, the question arises whether we can obtain the expectation of the limit as the limit of the expectations of the $X_{n}$ . However, this is false in general; a counterexample is given in (Grimmett and Stirzaker, 2001, Introduction of 7.10). Therefore, we distinguish stochastic processes with this special property.

Definition 19 (Uniform Integrability, (Grimmett and Stirzaker, 2001, Thm. 7.10.(3))).

Let $(X_{n})_{n\in\mathbb{N}}$ be a sequence of random variables converging almost–surely to a random variable $X$ . Then $(X_{n})_{n\in\mathbb{N}}$ is uniformly integrable if and only if $\lim\limits_{n\to\omega}\phantom{}\mathbb{E}\left(X_{n}\right)=\phantom{}\mathbb{E}\left(X\right)$ .
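A standard textbook–style sequence that is not uniformly integrable (not necessarily the counterexample from the cited reference) is $X_{n}=n\cdot\left[{\omega\leq\nicefrac{1}{n}}\right]$ on $\Omega=(0,1]$ with the uniform measure. The sketch below, with hypothetical helper names, evaluates the sequence at one sample point and computes the expectations in closed form: pointwise the values are eventually $0$ , while every expectation equals $1$ .

```python
from fractions import Fraction

def X(n, omega):
    # X_n = n * [omega <= 1/n] on Omega = (0, 1] with the uniform measure
    return n if omega <= Fraction(1, n) else 0

def expectation(n):
    # E(X_n) = n * P((0, 1/n]) = n * (1/n), using the uniform measure on (0, 1]
    return Fraction(n) * Fraction(1, n)

omega = Fraction(1, 7)  # arbitrary sample point in (0, 1]

# For n > 7 the indicator is 0 at this point, so X_n(omega) = 0 from then on
tail_values = [X(n, omega) for n in range(8, 20)]
expectations = [expectation(n) for n in range(1, 20)]

print(tail_values, expectations)
```

Since the same argument works for every $\omega\in(0,1]$ , the pointwise limit is $0$ with expectation $0$ , which differs from the limit $1$ of the expectations.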

Checking uniform integrability is one of the main purposes of this work. There are two sufficient criteria which we will list below. The first one is the Monotone Convergence Theorem for random variables, a corollary of Thm. 14. It states that a monotonically increasing sequence of random variables is always uniformly integrable.

Corollary 20 (Monotone Convergence Theorem for Random Variables, (Bauer, 1971, Thm. 2.3.4.)).

Let $(X_{n})_{n\in\mathbb{N}}$ be a monotonic sequence of nonnegative random variables, i.e., $X_{n}\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is measurable and $X_{0}\leq X_{1}\leq\dots$ . Then $\lim_{n\to\omega}X_{n}\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is measurable and

[TABLE]

The second sufficient criterion states that if the sequence is bounded by an integrable random variable $M$ , then uniform integrability is given as well.

Lemma 21 (Bounded Stochastic Processes are Uniformly Integrable (Grimmett and Stirzaker, 2001, Thm. 7.10.(4))).

Let $(X_{n})_{n\in\mathbb{N}}$ be a sequence of random variables and $M$ a nonnegative random variable on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with $X_{n}\leq M$ for all $n\in\mathbb{N}$ . If $\phantom{}\mathbb{E}\left(M\right)<\infty$ then $(X_{n})_{n\in\mathbb{N}}$ is uniformly integrable.

B.6. Conditional Expected Values

We introduce the notion of conditional expected value w.r.t. a sub– $\sigma$ –field on a fixed probability space $(\Omega,\mathfrak{F},\mathbb{P})$ . The idea is that given a random variable $X$ and a subfield $\mathfrak{G}\subseteq\mathfrak{F}$ we would like to approximate $X$ by another $\mathfrak{G}$ –measurable random variable w.r.t. expectation. Intuitively, this means that we want to construct a (possibly infinite) nonnegative linear combination of the functions $\left[{G}\right],G\in\mathfrak{G}$ , in such a way that restricted to a set $G\in\mathfrak{G}$ , the random variable $X$ and this linear combination have the same average value w.r.t. $\mathbb{P}$ .

Definition 22 (Conditional Expected Value w.r.t. a $\sigma$ –Field, (Bauer, 1971, Def. 10.1.2)).

Let $X\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a random variable and $\mathfrak{G}$ be a sub– $\sigma$ –field of $\mathfrak{F}$ . A random variable $Y\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ is called a conditional expected value of $X$ w.r.t. $\mathfrak{G}$ if

(1) $Y$ is $\mathfrak{G}$ –measurable,

(2) $\int_{G}Y\,d\mathbb{P}=\phantom{}\mathbb{E}\left(Y\cdot\left[{G}\right]\right)=\phantom{}\mathbb{E}\left(X\cdot\left[{G}\right]\right)=\int_{G}X\,d\mathbb{P}$ for every $G\in\mathfrak{G}$ .

If two such random variables $Y$ and ${Y}^{\prime}$ exist, the just stated properties already ensure that $\mathbb{P}(Y={Y}^{\prime})=1$ . Therefore a conditional expected value is almost–surely unique which justifies the notation $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\coloneqq Y$ .

If the sub– $\sigma$ –field has the special structure described in Lem. 3 then the just stated property just needs to be checked on the generators.

Lemma 23 (Conditional Expected Value).

Let $X$ be a random variable. If $\Omega=\biguplus\limits_{i=1}^{\infty}A_{i}$ for a sequence $A_{i}\in\mathfrak{F}$ and $\mathfrak{G}\coloneqq\left\langle\{A_{i}\mid i\in\mathbb{N}\}\right\rangle_{\sigma}$ then a $\mathfrak{G}$ –measurable function $Y$ is a conditional expected value of $X$ iff

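$\phantom{}\mathbb{E}\left(X\cdot\left[{A_{i}}\right]\right)=\phantom{}\mathbb{E}\left(Y\cdot\left[{A_{i}}\right]\right)$ for all $i\in\mathbb{N}$ .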

Proof.

By definition it is left to show that $\phantom{}\mathbb{E}\left(X\cdot\left[{A_{i}}\right]\right)=\phantom{}\mathbb{E}\left(Y\cdot\left[{A_{i}}\right]\right)$ for all $i\in\mathbb{N}$ implies $\phantom{}\mathbb{E}\left(X\cdot\left[{G}\right]\right)=\phantom{}\mathbb{E}\left(Y\cdot\left[{G}\right]\right)$ for any $G\in\mathfrak{G}$ . But due to Lem. 3 it is enough to show this for any disjoint union $\biguplus\limits_{i\in J}A_{i}$ . We have $\left[{\biguplus\limits_{i\in J}A_{i}}\right]=\sum\limits_{i\in J}\left[{A_{i}}\right]$ and this series always converges point–wise as at most one of the summands is nonzero for any $\alpha\in\Omega$ . Hence we have

[TABLE]

∎
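On a finite probability space, Lem. 23 can be checked exhaustively. The sketch below uses a hypothetical example (a fair die with a partition into three blocks), defines $Y$ as the block–wise average of $X$ , and verifies that agreement of $\mathbb{E}\left(X\cdot\left[{A_{i}}\right]\right)$ and $\mathbb{E}\left(Y\cdot\left[{A_{i}}\right]\right)$ on the generators carries over to all unions of blocks. The block–wise average is one natural candidate here and assumes that every block has positive probability.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probability space: a fair six-sided die
prob = {w: Fraction(1, 6) for w in range(1, 7)}
# Hypothetical partition of the sample space into blocks A_i
blocks = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]

def X(w):
    return Fraction(w)

def expect(g, A):
    # E(g * [A]): integrate g over the set A only
    return sum((g(w) * prob[w] for w in A), Fraction(0))

def P(A):
    return sum((prob[w] for w in A), Fraction(0))

# Y is constant on each block, with the average value of X on that block
block_average = {A: expect(X, A) / P(A) for A in blocks}

def Y(w):
    for A in blocks:
        if w in A:
            return block_average[A]
    raise ValueError("outcome not covered by the partition")

agrees_on_generators = all(expect(X, A) == expect(Y, A) for A in blocks)

# All elements of the generated sigma-field are unions of blocks (cf. Lem. 3)
unions = [frozenset().union(*chosen)
          for size in range(len(blocks) + 1)
          for chosen in combinations(blocks, size)]
agrees_everywhere = all(expect(X, G) == expect(Y, G) for G in unions)

print(agrees_on_generators, agrees_everywhere)
```

Because $Y$ is constant on each block, it is measurable w.r.t. the generated $\sigma$ –field, so both conditions of Def. 22 hold in this example.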

It turns out that in our setting a conditional expectation always exists, as the random variables we consider are almost–surely finite.

Theorem 24 (Existence of Conditional Expected Values (Agrawal et al., 2018, Prop. 3.1.)).

Let $X\colon\Omega\to\overline{\mathbb{R}}_{\geq 0}$ be a random variable such that $\phantom{}\mathbb{P}\left(X=\infty\right)=0$ and let $\mathfrak{G}$ be a sub– $\sigma$ –field of $\mathfrak{F}$ . Then $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)$ exists.

This theorem helps us to find bounds on the conditional expected value if we are unable to determine it exactly.

Lemma 25.

Let $X$ be a random variable with $\phantom{}\mathbb{P}\left(X=\infty\right)=0$ and $\Omega=\biguplus\limits_{i=1}^{\infty}A_{i}$ for a sequence $A_{i}\in\mathfrak{F}$ , $\mathfrak{G}\coloneqq\left\langle\{A_{i}\mid i\in\mathbb{N}\}\right\rangle_{\sigma}$ and $Y$ a $\mathfrak{G}$ –measurable function. We have $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\leq Y$ iff

[TABLE]

Proof.

We prove the two directions separately.

“ $\Rightarrow$ ”

Let $i\in\mathbb{N}$ . Then we have by definition of the conditional expectation

[TABLE]

by monotonicity of the integral.

“ $\Leftarrow$ ”

By Thm. 24 the conditional expected value of $X$ exists and is itself a nonnegative $\mathfrak{G}$ –measurable random variable. Consider the random variable $Z=\max(\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)-Y,0):\Omega\to\overline{\mathbb{R}}_{\geq 0}$ . It is a result of measure theory (cf. (Bauer, 1971)) that $Z$ is a $\mathfrak{G}$ –measurable random variable. So consider the set $M=Z^{-1}((0,\infty))\in\mathfrak{G}$ . Then we know that $M$ is a disjoint union of some of the $A_{i}$ , so w.l.o.g. let us assume that $A_{i_{0}}\subseteq M$ . Again, a deep result from measure theory shows that $\phantom{}\mathbb{E}\left(Z\cdot\left[{A_{i_{0}}}\right]\right)>0$ (cf. (Bauer, 1971)) if $\phantom{}\mathbb{P}\left(A_{i_{0}}\right)>0$ . Then we have

[TABLE]

i.e., $\phantom{}\mathbb{E}\left(Y\cdot\left[{A_{i_{0}}}\right]\right)<\phantom{}\mathbb{E}\left(\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\cdot\left[{A_{i_{0}}}\right]\right)$ , a contradiction. So, we must have $\phantom{}\mathbb{P}\left(A_{i_{0}}\right)=0$ . As $i_{0}$ was chosen arbitrarily we have $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\leq Y$ a.s.

∎

The same proof can also be done for the inverse inequality, i.e., we have the following corollary.

Corollary 26.

Let $X$ be a random variable with $\phantom{}\mathbb{P}\left(X=\infty\right)=0$ and $\Omega=\biguplus\limits_{i=1}^{\infty}A_{i}$ for a sequence $A_{i}\in\mathfrak{F}$ , $\mathfrak{G}\coloneqq\left\langle\{A_{i}\mid i\in\mathbb{N}\}\right\rangle_{\sigma}$ and $Y$ a $\mathfrak{G}$ –measurable function. We have $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)\geq Y$ iff

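$\mathbb{E}\left(X\cdot\left[{A_{i}}\right]\right)\geq\mathbb{E}\left(Y\cdot\left[{A_{i}}\right]\right)\quad\text{for all }i\in\mathbb{N}.$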

Recall that $\phantom{}\mathbb{E}\left(X\mid\mathfrak{G}\right)$ is a random variable that is like $X$ , but for those elements that are not distinguishable in the sub– $\sigma$ –field $\mathfrak{G}$ , it “distributes the value of $X$ equally”. This statement is formulated by the following lemma.

Lemma 27 (Expected Value Does Not Change When Regarding Conditional Expected Values).

Let $X$ be a random variable on $(\Omega,\mathfrak{F},\mathbb{P})$ and let $\mathfrak{G}$ be a sub– $\sigma$ –field of $\mathfrak{F}$ . Then

$\mathbb{E}\left(\mathbb{E}\left(X\mid\mathfrak{G}\right)\right)=\mathbb{E}\left(X\right).$

Proof.

[TABLE]

∎

The following theorem shows (a) that linear operations carry over to conditional expected values w.r.t. sub– $\sigma$ –fields, (b) that every random variable approximates itself if it is already measurable w.r.t. the sub– $\sigma$ –field $\mathfrak{G}$ , and (c) that multiplications with $\mathfrak{G}$ –measurable random variables can be pulled out of the conditional expected value. Moreover, (d) shows how to simplify expected values with several conditions.

Theorem 28 (Properties of Conditional Expected Value (Grimmett and Stirzaker, 2001, p. 443)).

Let $X,Y$ be random variables on $(\Omega,\mathfrak{F},\mathbb{P})$ and let $\mathfrak{G}$ be a sub– $\sigma$ –field of $\mathfrak{F}$ . Then the following properties hold.

(a) $\mathbb{E}\left(a\cdot X+b\cdot Y\mid\mathfrak{G}\right)=a\cdot\mathbb{E}\left(X\mid\mathfrak{G}\right)+b\cdot\mathbb{E}\left(Y\mid\mathfrak{G}\right)$ .

(b) If $X$ is itself $\mathfrak{G}$ –measurable then $\mathbb{E}\left(X\mid\mathfrak{G}\right)=X$ .

(c) If $X$ is $\mathfrak{G}$ –measurable then $\mathbb{E}\left(X\cdot Y\mid\mathfrak{G}\right)=X\cdot\mathbb{E}\left(Y\mid\mathfrak{G}\right)$ .

(d) If $\mathfrak{G}^{\prime}\subseteq\mathfrak{G}$ is a sub– $\sigma$ –field of $\mathfrak{G}$ then $\mathbb{E}\left(\mathbb{E}\left(X\mid\mathfrak{G}\right)\mid\mathfrak{G}^{\prime}\right)=\mathbb{E}\left(X\mid\mathfrak{G}^{\prime}\right)$ .
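The four properties can be spot-checked on a small finite space. This is a sketch under stated assumptions: the eight-point uniform space, the partitions `FINE` and `COARSE`, and the random variables are hypothetical. Property (d) is read here as the tower property for a coarser sub– $\sigma$ –field inside $\mathfrak{G}$, modelled by `COARSE`, which is an assumption about the intended statement.

```python
from fractions import Fraction

OMEGA = range(8)
PROB = {w: Fraction(1, 8) for w in OMEGA}

FINE = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]   # generators of G
COARSE = [{0, 1, 2, 3}, {4, 5, 6, 7}]     # generators of a coarser field inside G

def cond_exp(g, blocks):
    # Conditional expected value w.r.t. the field generated by a partition:
    # the probability-weighted mean of g on each block.
    table = {}
    for block in blocks:
        mass = sum(PROB[w] for w in block)
        mean = sum(PROB[w] * g(w) for w in block) / mass
        for w in block:
            table[w] = mean
    return lambda w: table[w]

def same(g, h):
    # Pointwise equality of two random variables on OMEGA.
    return all(g(w) == h(w) for w in OMEGA)

def X(w): return Fraction(w * w)
def Y(w): return Fraction(3 * w + 1)
def Z(w): return Fraction(w // 2)   # constant on every block of FINE

a, b = Fraction(2), Fraction(5)
EX, EY = cond_exp(X, FINE), cond_exp(Y, FINE)

# (a) linearity
prop_a = same(cond_exp(lambda w: a * X(w) + b * Y(w), FINE),
              lambda w: a * EX(w) + b * EY(w))
# (b) a G-measurable random variable is its own conditional expected value
prop_b = same(cond_exp(Z, FINE), Z)
# (c) G-measurable factors can be pulled out
prop_c = same(cond_exp(lambda w: Z(w) * Y(w), FINE),
              lambda w: Z(w) * EY(w))
# (d) conditioning first on G and then on the coarser field
prop_d = same(cond_exp(EX, COARSE), cond_exp(X, COARSE))
print(prop_a, prop_b, prop_c, prop_d)
```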

Appendix C Proofs for Section 4

We start this section with a crucial observation which will ease the proofs we conduct here. Reconsider the filtration $(\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ as presented in Def. 5. For every $n\in\mathbb{N}$ and every two distinct prefixes $\pi\neq{\pi}^{\prime}$ of length $n+1$ , their generated cylinder sets are disjoint, i.e., $Cyl(\pi)\cap Cyl({\pi}^{\prime})=\emptyset$ and $\biguplus_{\pi\in\Sigma^{+},\,|\pi|=n+1}Cyl(\pi)=\Omega$ . Therefore,

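$\mathfrak{F}_{n}^{\mbox{\tiny\rm loop}}=\left\{\biguplus_{\pi\in J}Cyl(\pi)\mid J\subseteq\Sigma^{n+1}\right\}.$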

As $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}=\mathfrak{F}_{n+1}^{\mbox{\tiny\rm loop}}$ (cf. Def. 8) we directly get

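$\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}=\left\{\biguplus_{\pi\in J}Cyl(\pi)\mid J\subseteq\Sigma^{n+2}\right\}.$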
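The partition argument can be made concrete on a toy state space. The sketch below is only an illustration: the two-element `SIGMA` is hypothetical, and runs, which are infinite in the paper, are truncated to length `RUN_LEN`. It checks that cylinders of equal-length prefixes are disjoint and cover all runs, and that every cylinder of a prefix of length $n+2$ lies inside exactly one cylinder of a prefix of length $n+1$.

```python
from itertools import product

SIGMA = ("s0", "s1")   # hypothetical two-element state space
RUN_LEN = 5            # runs are infinite in the paper; truncated here

RUNS = list(product(SIGMA, repeat=RUN_LEN))

def cyl(prefix):
    # Truncated cylinder set: all runs that start with the given prefix.
    return frozenset(r for r in RUNS if r[:len(prefix)] == prefix)

def generators(length):
    # One cylinder per prefix of the given length.
    return [cyl(p) for p in product(SIGMA, repeat=length)]

n = 2
gens_F = generators(n + 1)   # cylinders of prefixes of length n+1
gens_G = generators(n + 2)   # cylinders of prefixes of length n+2

# Distinct prefixes of equal length give disjoint cylinders that cover all runs.
covers = frozenset().union(*gens_F) == frozenset(RUNS)
disjoint = sum(len(c) for c in gens_F) == len(RUNS)

# Each longer-prefix cylinder lies inside exactly one shorter-prefix cylinder.
refines = all(sum(g <= c for c in gens_F) == 1 for g in gens_G)
print(covers, disjoint, refines)
```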

See Lem. 9.

Proof.

We have to prove that $X_{n}^{f,I}$ is $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ –measurable, i.e., $\left(X_{n}^{f,I}\right)^{-1}(B)\in\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ for any $B\in\mathfrak{B}$ .

Consider any run $\vartheta\in\Omega$ . $X_{n}^{f,I}(\vartheta)$ just depends on $\vartheta[0],\cdots,\vartheta[n+1]$ , i.e., $X_{n}^{f,I}$ is constant on the cylinder set $Cyl(\vartheta[0]\cdots\vartheta[n+1])$ . Therefore it is constant on the generators of $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ , i.e., on every $Cyl(\pi)$ with $|\pi|=n+2$ . Since there are only countably many of these generators, $\left(X_{n}^{f,I}\right)^{-1}(B)$ for any $B\in\mathfrak{B}$ is a countable union of generators of $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ , i.e., $\left(X_{n}^{f,I}\right)^{-1}(B)\in\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ . ∎

See Thm. 10.

Proof.

Due to Lem. 9, $X_{n}^{f,\Phi_{f}(I)}$ is $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ –measurable. So it is left to show that for any $G\in\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$

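$\phantom{}{}^{s}\mathbb{E}\left(X_{n+1}^{f,I}\cdot\left[{G}\right]\right)=\phantom{}^{s}\mathbb{E}\left(X_{n}^{f,\Phi_{f}(I)}\cdot\left[{G}\right]\right).\qquad(2)$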

Due to Lem. 23 it is enough to show (2) for the generators of $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ , as any other set in $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ is just a disjoint union of these generators. Hence, we prove (2) for $G=Cyl(\pi)$ for a prefix run $\pi\in\Sigma^{n+2}$ . Furthermore, if $\pi\not\in s\Sigma^{n+1}$ the set $Cyl(\pi)$ is a nullset and hence, (2) holds trivially. So assume that $s_{0}\cdots s_{n+1}=\pi\in s\Sigma^{n+1}$ and that $Cyl(\pi)$ is not a nullset, i.e., ${}^{s}p(\pi)>0$ . Note that if $s_{i}\in\Sigma_{\neg\varphi}$ for some $i\leq n$ , then $X_{n+1}^{f,I}$ and $X_{n}^{f,\Phi_{f}(I)}$ are identical on $Cyl(\pi)$ , as then $T^{\neg\varphi}(\vartheta)\leq n\leq n+1$ for all $\vartheta\in Cyl(\pi)$ . So in this case (2) holds trivially, too. Hence, we assume $\pi=s_{0}\cdots s_{n+1}\in\Sigma_{\varphi}^{n+1}\Sigma$ . We will use a case analysis to prove the desired result.

(1) $s_{n+1}\in\Sigma_{\neg\varphi}$ , i.e., $T^{\neg\varphi}(\vartheta)=n+1$ for all $\vartheta\in Cyl(\pi)$

[TABLE]

(2) $s_{n+1}\in\Sigma_{\varphi}$ , i.e., $T^{\neg\varphi}(\vartheta)>n+1$ for all $\vartheta\in Cyl(\pi)$

[TABLE]

∎

See Cor. 11.

Proof.

Let us fix an arbitrary state $s\in\Sigma$ . We will prove the result by induction.

• Induction base: $n=0$

[TABLE]

• Induction step: assume the result holds for a fixed $n\in\mathbb{N}$ .

[TABLE]

∎
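The statement proved by this induction, ${}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)=\Phi_{f}^{n+1}(I)(s)$ , can be checked exhaustively on a small loop. The sketch below is not from the paper: the bounded random walk, its guard, and the expectation `I` are hypothetical, and the definition of $X_{n}^{f,I}$ is an assumed reading of Def. 7 as it is used in the proofs (the value $f$ at the looping time if the loop has stopped by step $n$, and $I$ at position $n+1$ otherwise).

```python
from fractions import Fraction

STATES = range(4)        # hypothetical state space {0, 1, 2, 3}
HALF = Fraction(1, 2)

def guard(x):
    # Loop guard phi: 0 < x < 3.
    return 0 < x < 3

def step(x):
    # Loop body: x := x + 1 [1/2] x := x - 1, as (successor, probability) pairs.
    return [(x + 1, HALF), (x - 1, HALF)]

def f(x):
    # Postexpectation.
    return Fraction(x)

def phi(I):
    # Phi_f(I) = [not phi] * f + [phi] * wp[body](I), tabulated over STATES.
    return {x: sum(p * I[y] for y, p in step(x)) if guard(x) else f(x)
            for x in STATES}

def expected_X(n, s, I):
    # E^s(X_n), computed exactly by enumerating prefixes s_0 ... s_{n+1}.
    # Assumed reading of Def. 7: X_n = f(run[T]) if T <= n, else I(run[n+1]).
    def go(x, idx):
        if idx == n + 1:          # guard held at s_0 ... s_n, so T > n
            return I[x]
        if not guard(x):          # T = idx <= n
            return f(x)
        return sum(p * go(y, idx + 1) for y, p in step(x))
    return go(s, 0)

I = {x: Fraction(x * x) for x in STATES}   # arbitrary expectation

agree = True
iterate = I
for n in range(6):
    iterate = phi(iterate)                 # now Phi_f^{n+1}(I)
    agree = agree and all(expected_X(n, s, I) == iterate[s] for s in STATES)
print(agree)
```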

See Lem. 13.

Proof.

Let $\vartheta\in\Omega$ .

[TABLE]

If the program is universally almost–surely terminating (i.e., $\phantom{}{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\infty\right)=\phantom{}^{s}\mathbb{P}\left((T^{\neg\varphi})^{-1}(\mathbb{N})\right)=1$ for any $s\in\Sigma$ ), then $\phantom{}{}^{s}\mathbb{P}\left(X_{n}^{f,I}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]=X_{n}^{f,I}\right)=1$ for any $s\in\Sigma$ . Furthermore, $\phantom{}{}^{s}\mathbb{P}\left((T^{\neg\varphi})^{-1}(\mathbb{N})\right)=1$ so by the previous result $\mathbf{X}^{f,I}$ converges point–wise to $X_{T^{\neg\varphi}}^{f}$ on a set with probability $1$ . By definition $\mathbf{X}^{f,I}$ converges almost–surely to $X_{T^{\neg\varphi}}^{f}$ . ∎

See Thm. 14.

Proof.

Consider the expectation $I\coloneqq 0$ and the indicator function $\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ . Then by definition $X_{n}^{f,0}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]\leq X_{n+1}^{f,0}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ for all $n\in\mathbb{N}$ . To apply the Monotone Convergence Theorem we will calculate the expectation of $X_{n}^{f,0}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ . Note that $X_{n}^{f,0}$ is zero on the set $(T^{\neg\varphi})^{-1}(\{\omega\})$ . Hence,

[TABLE]

Hence by the Monotone Convergence Theorem (Cor. 20)

[TABLE]

∎
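The proof starts from $I=0$ and passes to the limit by monotone convergence. The following sketch shows the same pattern numerically on a hypothetical bounded random walk (guard $0<x<3$, fair step up or down, postexpectation $f=x$): the iterates $\Phi_{f}^{n}(0)$ increase monotonically and approach the expected value of $f$ at the looping time. For this walk that value is the initial state itself, by the classical gambler's-ruin argument (from state $x$ the walk is absorbed at 3 with probability $x/3$).

```python
from fractions import Fraction

STATES = range(4)        # hypothetical state space {0, 1, 2, 3}
HALF = Fraction(1, 2)

def guard(x):
    # Loop guard phi: 0 < x < 3.
    return 0 < x < 3

def f(x):
    # Postexpectation.
    return Fraction(x)

def phi(I):
    # Phi_f(I) for the body x := x + 1 [1/2] x := x - 1.
    return {x: HALF * I[x + 1] + HALF * I[x - 1] if guard(x) else f(x)
            for x in STATES}

# Expected value of f at the looping time: 3 * (x / 3) = x (gambler's ruin).
LIMIT = {x: Fraction(x) for x in STATES}

iterates = [{x: Fraction(0) for x in STATES}]   # Phi_f^0(0) = 0
for _ in range(80):
    iterates.append(phi(iterates[-1]))

# The Kleene iterates form an increasing chain below the limit.
monotone = all(prev[x] <= nxt[x]
               for prev, nxt in zip(iterates, iterates[1:]) for x in STATES)
gap = max(LIMIT[x] - iterates[-1][x] for x in STATES)
print(monotone, float(gap))
```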

See Cor. 17.

Proof.

By Def. 15, $\mathbf{X}^{f,I}$ is uniformly integrable for every $s\in\Sigma$ if and only if for every $s\in\Sigma$ the equation $\lim\limits_{n\to\omega}\phantom{}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)=\phantom{}^{s}\mathbb{E}\left(\lim\limits_{n\to\omega}X_{n}^{f,I}\right)\overset{\textnormal{Lem. 13}}{=}\phantom{}^{s}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)$ holds. But due to Thm. 14 and AST, we have $\phantom{}{}^{s}\mathbb{E}\left(X^{f}_{T^{\neg\varphi}}\right)=(\textnormal{{{lfp}}}~{}\Phi_{f})(s)$ . So $\mathbf{X}^{f,I}$ is uniformly integrable iff $\phantom{}{}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)$ converges to $(\textnormal{{{lfp}}}~{}\Phi_{f})(s)$ . By Cor. 11, this is equivalent to the requirement that $\Phi_{f}^{n}(I)$ converges to $\textnormal{{{lfp}}}~{}\Phi_{f}$ . This is the definition of uniform integrability of $I$ for $f$ , cf. Def. 8. ∎

Appendix D Proofs for Section 5

See Lem. 2.

Proof.

Let $n\in\mathbb{N}$ . We have already proved in Lem. 9 that $X_{n}^{f,I}$ is $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ –measurable. First of all, by Cor. 11 we have $\phantom{}{}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)=\Phi_{f}^{n+1}(I)(s)<\infty$ . Secondly, by Thm. 10 and $\Phi_{f}(I)\succeq I$ we have $\phantom{}{}^{s}\mathbb{E}\left(X_{n+1}^{f,I}\mid\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}\right)=X_{n}^{f,\Phi_{f}(I)}\geq X_{n}^{f,I}$ . By Def. 1 this proves that $\mathbf{X}^{f,I}$ is a submartingale with respect to $(\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}})_{n\in\mathbb{N}}$ . ∎

See Thm. 8.

Proof.

First of all, $X_{n}^{0,\Delta I}$ is $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ –measurable as seen in Lem. 9. For any $\vartheta\in\Omega$ we have

[TABLE]

as $I$ harmonizes with $f$ . To show the result we will prove

[TABLE]

for any $\pi\in\Sigma^{n+2}$ and use Lem. 23 to obtain the desired result. Note that both sides of this equality are $0$ if $\phantom{}{}^{s}\mathbb{P}\left(Cyl(\pi)\right)=0$ , so in this case the equality holds trivially.

Take any $\pi\in\Sigma^{n+2}$ such that $\phantom{}{}^{s}\mathbb{P}\left(Cyl(\pi)\right)\neq 0$ . Furthermore, as both random variables $\big{|}X_{n+1}^{f,I}-X_{n}^{f,I}\big{|}$ and $X_{n}^{0,\Delta I}$ are constant zero if all runs in $Cyl(\pi)$ have a looping time $\leq n+1$ , (4) holds trivially in this case as well. Note that for any $X\in\mathbb{F}$ , $H(X)(s)=\left[{\varphi}\right](s)\cdot\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({X}\right)(s)$ is zero if $s\not\vDash\varphi$ . So, assume $\pi=s_{0}\cdots s_{n+1}\in s\Sigma^{n+1}_{\varphi}$ .

[TABLE]

As we have already seen in Appendix C that $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}=\left\{\biguplus_{\pi\in J}Cyl(\pi)\mid J\subseteq\Sigma^{n+2}\right\}$ , we use Lem. 23 to conclude our desired result $\phantom{}{}^{s}\mathbb{E}\left(\big{|}X_{n+1}^{f,I}-X_{n}^{f,I}\big{|}~{}\Big{|}~{}\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}\right)~{}{}={}~{}X_{n}^{0,\Delta I}$ . ∎

The following auxiliary lemma is needed for the proof of Thm. 9.

Lemma 1 (Sufficient Condition for Uniform Integrability for a Fixed State).

Let $I\mathrel{{\prec}{\prec}}\infty$ be a conditionally difference bounded expectation that harmonizes with $f\mathrel{{\prec}{\prec}}\infty$ , $\Phi_{f}(I)\mathrel{{\prec}{\prec}}\infty$ and $s\in\Sigma$ . Let the expected looping time of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ be finite for $s\in\Sigma$ , where $C$ is AST, i.e., $\phantom{}{}^{s}\mathbb{E}\left(T^{\neg\varphi}\right)<\infty$ and $\Phi_{f}(I)(s)<\infty$ . Then $\Phi_{f}^{n}(I)(s)<\infty$ for all $n\in\mathbb{N}$ and

[TABLE]

Proof.

We present a proof based on the proof of the Optional Stopping Theorem given in (Grimmett and Stirzaker, 2001, Thm 12.5.(9)). Consider the process $\mathbf{X}^{f,I}$ as studied in Sect. 4. As $I$ harmonizes with $f$ , we have seen in Thm. 8 that if $I$ is conditionally difference bounded by the constant $c\geq 0$ , then $\phantom{}{}^{s}\mathbb{E}\left(\left\lvert X_{n+1}^{f,I}-X_{n}^{f,I}\right\rvert\mid\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}\right)\leq c$ , where $\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}$ belongs to the filtration defined in Def. 8. Now we will show that $\mathbf{X}^{f,I}$ is uniformly integrable w.r.t. $\phantom{}{}^{s}\mathbb{P}$ .

Note that by definition of $\mathbf{X}^{f,I}$ (Def. 7), we have $\mathbf{X}^{f,I}_{\wedge T^{\neg\varphi}}=\mathbf{X}^{f,I}$ : Let $n\in\mathbb{N}$ and $\vartheta\in\Omega$ . Then if $T^{\neg\varphi}(\vartheta)\leq n$ , we have $X^{f,I}_{n\wedge T^{\neg\varphi}(\vartheta)}(\vartheta)=X^{f,I}_{T^{\neg\varphi}(\vartheta)}(\vartheta)=f(\vartheta[T^{\neg\varphi}(\vartheta)])=X^{f,I}_{n}(\vartheta)$ . If on the other hand $T^{\neg\varphi}(\vartheta)>n$ , we have $X^{f,I}_{n\wedge T^{\neg\varphi}(\vartheta)}(\vartheta)=X^{f,I}_{n}(\vartheta)$ .

We have for any $n\in\mathbb{N}$ , and any run $\vartheta\in\Omega^{\mbox{\tiny\rm loop}}$ :

[TABLE]

We will show that the expectation of $W^{f,I}$ is finite.

[TABLE]

By Lem. 21 the uniform integrability of $\mathbf{X}^{f,I}$ w.r.t. $\phantom{}{}^{s}\mathbb{P}$ follows. Therefore, we have that $\lim\limits_{n\to\omega}\Phi_{f}^{n}(I)(s)=\textnormal{{{lfp}}}~{}\Phi_{f}(s)$ by Cor. 17. Furthermore we can extract the following for any $n\in\mathbb{N}$ :

[TABLE]

So we have just shown that $\Phi_{f}^{n}(I)(s)<\infty$ for any $n\in\mathbb{N}$ . ∎

Corollary 2 (Sufficient Condition for Uniform Integrability).

Let $I\mathrel{{\prec}{\prec}}\infty$ be a conditionally difference bounded expectation that harmonizes with $f\mathrel{{\prec}{\prec}}\infty$ , $\Phi_{f}(I)\mathrel{{\prec}{\prec}}\infty$ , and let the expected looping time of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ be finite for every initial state $s\in\Sigma$ , where $C$ is AST. Then $I$ is uniformly integrable for $f$ and $\Phi^{n}_{f}(I)\mathrel{{\prec}{\prec}}\infty$ for any $n\in\mathbb{N}$ .

Proof.

The result follows immediately by applying Lem. 1 to every state $s\in\Sigma$ . ∎
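To see what conditional difference boundedness demands of a candidate $I$, the expected one-step change can be computed directly. The sketch below rests on assumptions: $\Delta I$ is read as $\lambda s.\left[{\varphi}\right](s)\cdot\textsf{wp}\llbracket C\rrbracket(|I-I(s)|)(s)$, and the biased random walk on the natural numbers is a hypothetical loop chosen for illustration. It contrasts a linear expectation, whose expected change is constant, with an exponential one, whose expected change grows with the state.

```python
from fractions import Fraction

P_DOWN = Fraction(2, 3)   # hypothetical bias of the walk

def guard(x):
    # Loop guard: x > 0.
    return x > 0

def step(x):
    # Loop body: x := x - 1 [2/3] x := x + 1.
    return [(x - 1, P_DOWN), (x + 1, 1 - P_DOWN)]

def delta(I, x):
    # Assumed reading of Delta I: [phi](s) * wp[body](|I - I(s)|)(s).
    if not guard(x):
        return Fraction(0)
    return sum(p * abs(I(y) - I(x)) for y, p in step(x))

def linear(x):
    return Fraction(x)

def exponential(x):
    return Fraction(2) ** x

linear_deltas = [delta(linear, x) for x in range(1, 20)]
exp_deltas = [delta(exponential, x) for x in range(1, 20)]

# The linear candidate changes by exactly 1 in expectation in every guard state;
# the exponential candidate's expected change keeps growing.
print(set(linear_deltas), exp_deltas[:4])
```

On the sampled states the linear candidate is bounded by the constant 1, whereas no single constant bounds the exponential candidate once the state space is unbounded.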

See Thm. 9.

Proof.

That the subinvariant $I$ is a lower bound iff it is uniformly integrable for $f$ is exactly Thm. 7. Nevertheless, we present a proof for the whole theorem in analogy to the Optional Stopping Theorem (Thm. 3).

First of all, recall that $\mathbf{X}^{f,I}_{\wedge T^{\neg\varphi}}=\mathbf{X}^{f,I}$ holds (cf. the proof of Lem. 1).

Secondly, in any of the three cases (a) to (c), we have $\Phi_{f}^{n}(I)\mathrel{{\prec}{\prec}}\infty$ for any $n\in\mathbb{N}$ : in (a) it is a precondition, in (b) it holds due to Cor. 2, and in (c) the boundedness of $f$ and $I$ implies that $\Phi_{f}^{n}(I)$ is bounded as well. So in particular, it is finite (cf. (McIver and Morgan, 2005)). Therefore in any of the three cases, $\mathbf{X}^{f,I}$ is a submartingale by Lem. 2 as $I$ is a subinvariant.

Furthermore, in any of the three cases (a) to (c), $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is universally almost surely terminating. Hence by Lem. 13 we have

[TABLE]

almost–surely for every $s\in\Sigma$ . So if we can prove for all of the three cases (a) to (c) that $\mathbf{X}^{f,I}=\mathbf{X}^{f,I}_{\wedge T^{\neg\varphi}}$ is uniformly integrable for any $s\in\Sigma$ , then we have independent of $s\in\Sigma$ :

[TABLE]

i.e., $I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ as desired.

We will now use the Optional Stopping Theorem (Thm. 3) and Cor. 2 to prove the uniform integrability.

(a) Let $s\in\Sigma$ . Then there is an $N(s)\in\mathbb{N}$ with $\phantom{}{}^{s}\mathbb{P}\left(T^{\neg\varphi}\leq N(s)\right)=1$ , i.e., the looping time of $\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}$ is almost–surely bounded for any $s\in\Sigma$ . So by Thm. 3 (a), $\mathbf{X}^{f,I}$ is uniformly integrable for any $s\in\Sigma$ .

(b) Due to Cor. 2, $I$ is uniformly integrable for $f$ . Hence $\mathbf{X}^{f,I}$ is uniformly integrable by Cor. 17. As $\mathbf{X}^{f,I}$ is a submartingale, the result follows from Thm. 3.

(c) If $f$ and $I$ are bounded, then so is the process $\mathbf{X}^{f,I}=\mathbf{X}^{f,I}_{\wedge T^{\neg\varphi}}$ . By Thm. 3 (c) $\mathbf{X}^{f,I}$ is uniformly integrable.

∎

Example 3 (Details on Ex. 7 and 11).

Reconsider the program $C_{cex}$ , given by

[TABLE]

The characteristic function of the while loop with respect to postexpectation $b$ is given by

[TABLE]

We have seen that

[TABLE]

and

[TABLE]

are fixed points of $\Phi_{b}$ and in Ex. 11 we proved that $I$ is indeed the least fixed point. So, ${I}^{\prime}$ cannot be the least fixed point. Hence, ${I}^{\prime}$ cannot be uniformly integrable and in particular, it cannot be conditionally difference bounded. Indeed, $\Delta{I}^{\prime}$ is unbounded: Let $s\in\Sigma$ . For any $x\in\mathsf{Vars}$ and any arithmetic expression $e$ , let $s[x/e]$ denote the state with $s[x/e](y)=s(y)$ for $y\in\mathsf{Vars}\setminus\{x\}$ and $s[x/e](x)=s(e)$ , where $s(e)$ is obtained by extending states from variables to arithmetic expressions in the straightforward way.

[TABLE]

So we have $\Delta{I}^{\prime}=\lambda s.\left[{a\neq 0}\right](s)\cdot(2^{s(k)}+1)$ , which is unbounded. So ${I}^{\prime}$ does not satisfy the preconditions of Thm. 9 (b), and hence our proof rule rejects this invariant. Note that neither (a) (as the looping time is unbounded) nor (c) (as neither $b$ nor ${I}^{\prime}$ is bounded) is applicable.

Appendix E Proofs for Section 6

We will show that Thm. 1 can be easily inferred from our results in Sect. 4 and that we can generalize Thm. 1 (3) to a complete proof rule. To do so, we will make use of the Martingale Convergence Theorem, of which we present a specialized version suitable for our purposes:

Theorem 1 (Martingale Convergence Theorem (Grimmett and Stirzaker, 2001, Thm. 12.3.(1))).

Let $(X_{n})_{n\in\mathbb{N}}$ be a submartingale on a probability space $(\Omega,\mathfrak{F},\mathbb{P})$ with respect to a filtration $(\mathfrak{F}_{n})_{n\in\mathbb{N}}$ . If there is a constant $c\geq 0$ such that $X_{n}\leq c$ for every $n\in\mathbb{N}$ then there exists a random variable $X_{\omega}$ such that $\phantom{}\mathbb{P}\left(\{\vartheta\in\Omega\mid\lim\limits_{n\to\omega}X_{n}(\vartheta)=X_{\omega}(\vartheta)\}\right)=1,$ i.e., $(X_{n})_{n\in\mathbb{N}}$ converges almost surely to $X_{\omega}$ . Furthermore, $(X_{n})_{n\in\mathbb{N}}$ is uniformly integrable, i.e., $\lim\limits_{n\to\omega}\phantom{}\mathbb{E}\left(X_{n}\right)=\phantom{}\mathbb{E}\left(X_{\omega}\right).$ Moreover,

[TABLE]

If $(X_{n})_{n\in\mathbb{N}}$ is a martingale, i.e., for all $n\in\mathbb{N}$ we have $\phantom{}\mathbb{E}\left(X_{n+1}\mid\mathfrak{F}_{n}\right)=X_{n}$ , then we even have

[TABLE]

Now let $f,I\in\mathbb{F}$ be bounded such that $I\preceq\Phi_{f}(I)$ , i.e., $I$ is a subinvariant and assume there is some $c\geq 0$ with $f,I\preceq c$ . Then the process $\mathbf{X}^{f,I}$ satisfies $X^{f,I}_{n}\leq c$ for every $n\in\mathbb{N}$ . By Thm. 1 there exists a random variable $X^{f,I}_{\omega}$ such that $X^{f,I}_{n}$ converges to $X^{f,I}_{\omega}$ almost surely. By Lem. 13 we get that for any run $\vartheta\in\Omega^{{\mbox{\tiny\rm loop}}}$ with $T^{\neg\varphi}(\vartheta)<\omega$ we must have $X^{f,I}_{\omega}(\vartheta)=X^{f}_{T^{\neg\varphi}}(\vartheta)$ , i.e., w.l.o.g. we can assume $X^{f,I}_{\omega}\cdot\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]=X^{f}_{T^{\neg\varphi}}$ . As $I$ is a subinvariant we have by Lem. 2 that $\mathbf{X}^{f,I}$ is a submartingale. We conclude for an arbitrary initial state $s\in\Sigma$

[TABLE]

Consequently,

[TABLE]

If $I$ is a fixed point of $\Phi_{f}$ , then the process $\mathbf{X}^{f,I}$ is a martingale. By Thm. 10 we have for an arbitrary initial state $s$

[TABLE]

Hence, in this case $=$ instead of $\leq$ holds in (5). We will now discuss the results of Thm. 1. First of all, $T(s)=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({1}\right)(s)=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\omega\right)$ by using $X^{1,T}=\left[{(T^{\neg\varphi})^{-1}(\mathbb{N})}\right]$ and Thm. 14.

We will now prove Thm. 1.

Proof.

(1) Assume $I=\left[{G}\right]$ for some predicate $G$ , i.e., $I(s)\in\{0,1\}$ . W.l.o.g. let $I(s)=1$ as the claim holds trivially if $I(s)=0$ . Then $\phantom{}{}^{s}\mathbb{E}\Bigl{(}\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\cdot\underbrace{X^{f,I}_{\omega}}_{\leq 1}\Bigr{)}\leq\phantom{}^{s}\mathbb{E}\Bigl{(}\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\Bigr{)}=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}=\omega\right).$ By (5) we get $I(s)\cdot T(s)=T(s)=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\omega\right)=1-\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}=\omega\right)=I(s)-\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}=\omega\right)\leq\textnormal{{{lfp}}}~{}\Phi_{f}(s),$ so we have

[TABLE]

(2) Assume that for some predicate $G$ we have $\left[{G}\right]\preceq T$ . Again, w.l.o.g. let $\left[{G}\right](s)=1$ . But then we must have $1\leq T(s)=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\omega\right)\leq 1$ , i.e., $\phantom{}{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\omega\right)=1$ . So, $\phantom{}{}^{s}\mathbb{E}\left(\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\cdot X^{f,I}_{\omega}\right)=0$ and by (5) we have $\left[{G}\right](s)\cdot I(s)=I(s)\leq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)(s)$ , i.e.,

[TABLE]

(3) Assume there is some $\varepsilon>0$ with $\varepsilon\cdot I\preceq T$ . By definition, $T=\textnormal{{{lfp}}}~{}\Phi_{1}$ . By (5), we have

[TABLE]

i.e., $\phantom{}{}^{s}\mathbb{E}\left(\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\cdot X^{1,T}_{\omega}\right)=0$ . By definition, $\frac{T}{\varepsilon}$ is a fixed point of $\Phi_{\frac{1}{\varepsilon}}$ . Thus,

[TABLE]

So by (5) we can conclude that $I(s)\leq\textnormal{{{lfp}}}~{}\Phi_{f}(s)$ for any state $s$ , i.e.,

[TABLE]

∎

Notice that we have not used the fact that $T$ is the termination probability but only that $T=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ for some bounded postexpectation $f$ . Furthermore, if $I$ is a lower bound, by definition $I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)$ . Hence, we have generalized Thm. 1 (3) in case of a loop with universally almost–surely terminating body to a complete characterization of lower bounds. So we have proved the following theorem.

See Thm. 2.

Example 2 (Details on Ex. 3).

Let us consider the program $C_{rdw}$

[TABLE]

with $x,y\in\mathbb{N}$ and $y\leq 100$ . Note that this program is not AST. Furthermore, the postexpectation $y$ is bounded. If $y\leq x$ initially then $y$ is $0$ after termination of the program. So, $\textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({y}\right)\geq\left[{y>x}\right]\cdot\left(\tfrac{1}{3}\right)^{x}\cdot(y-x)\eqqcolon I$ .

Now consider $f=\left[{y\text{ even}}\right]\cdot 200\cdot y^{2}+\left[{y\text{ odd}}\right]\cdot(y+5)^{4}$ . We have $I^{\prime}\leq\Phi_{f}(I^{\prime})$ , where $I^{\prime}=400\cdot I$ .

[TABLE]

If $s(x)>0$ , then obviously $I(s)\leq\Phi_{f}(I^{\prime})(s)$ by the calculation above. If $s(x)=0$ , then we have

[TABLE]

as for every even $y$ we have $200\cdot y^{2}\geq 400\cdot y$ and for every odd $y$ we have $200\cdot(y+5)^{4}\geq 400\cdot y$ .
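The comparison of $f$ with $400\cdot y$ can be confirmed by brute force over the stated range $y\in\mathbb{N}$ , $y\leq 100$ . The snippet is a plain enumeration and uses the odd branch $(y+5)^{4}$ exactly as it appears in the definition of $f$ above; the names `violations` and `tight` are introduced here for illustration only.

```python
def f(y):
    # Postexpectation from the example: 200*y^2 for even y, (y+5)^4 for odd y.
    return 200 * y**2 if y % 2 == 0 else (y + 5) ** 4

# The example restricts y to natural numbers with y <= 100.
violations = [y for y in range(101) if f(y) < 400 * y]
tight = [y for y in range(101) if f(y) == 400 * y]
print(violations, tight)
```

The list `tight` records where the bound holds with equality, which shows that the constant 400 cannot be increased for the even branch.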

We have $\tfrac{1}{400}\cdot I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({y}\right)$ . Thus, we can conclude from Thm. 2 that $I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({f}\right)$ . Note that this is easier than relating $I^{\prime}$ and the termination probability as required in Thm. 1 as $y$ does not influence the termination behavior of the loop.

Appendix F Details for Section 8

F.1. Proofs

See 3

Proof.

Remember the connection between wp and ert (cf. (Olmedo et al., 2016, Thm. 5.2)): For any probabilistic program $P$ we have

[TABLE]

Our goal is to show ${}^{\textsf{{ert}}}\Phi^{n}_{t}(I)={}^{\textsf{{wp}}}\Phi^{n}_{t}(I)+{}^{\textsf{{ert}}}\Phi^{n}_{0}(0)$ for all $n\geq 1$ . We use induction on $n$ to prove this result. In the base case we have $n=1$ . Here, we obtain

[TABLE]

In the induction step we use the induction hypothesis ${}^{\textsf{{ert}}}\Phi^{n}_{t}(I)={}^{\textsf{{wp}}}\Phi^{n}_{t}(I)+{}^{\textsf{{ert}}}\Phi^{n}_{0}(0)$ . Then we have

[TABLE]

So ${}^{\textsf{{ert}}}\Phi^{n}_{t}(I)={}^{\textsf{{wp}}}\Phi^{n}_{t}(I)+{}^{\textsf{{ert}}}\Phi^{n}_{0}(0)$ holds for an arbitrary $n\in\mathbb{N}$ with $n\geq 1$ .
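The identity established by this induction can be spot-checked with exact arithmetic on a small loop. The sketch below makes several assumptions that are not taken from the paper: the bounded random walk and the expectation `I` are hypothetical, the ert–characteristic function is taken to have the shape $1+\left[{\neg\varphi}\right]\cdot t+\left[{\varphi}\right]\cdot\textsf{ert}\llbracket C\rrbracket(X)$, and the body's own runtime $\textsf{ert}\llbracket C\rrbracket(0)$ is fixed to the constant `BODY_COST`.

```python
from fractions import Fraction

STATES = range(4)          # hypothetical state space {0, 1, 2, 3}
HALF = Fraction(1, 2)
BODY_COST = Fraction(2)    # assumed constant value of ert[C](0)

def guard(x):
    # Loop guard phi: 0 < x < 3.
    return 0 < x < 3

def t(x):
    # Postruntime.
    return Fraction(x)

def zero(x):
    return Fraction(0)

def wp_body(X, x):
    # wp[C](X)(x) for the body x := x + 1 [1/2] x := x - 1.
    return HALF * X[x + 1] + HALF * X[x - 1]

def wp_phi(X, post):
    # wp-characteristic function: [not phi] * post + [phi] * wp[C](X).
    return {x: wp_body(X, x) if guard(x) else post(x) for x in STATES}

def ert_phi(X, post):
    # Assumed shape: 1 + [not phi] * post + [phi] * ert[C](X),
    # using ert[C](X) = ert[C](0) + wp[C](X) for the AST body C.
    return {x: 1 + (BODY_COST + wp_body(X, x) if guard(x) else post(x))
            for x in STATES}

I = {x: Fraction(x) for x in STATES}
first_ert = ert_phi(I, t)

ert_t, wp_t = I, I
ert_0 = {x: Fraction(0) for x in STATES}

holds = True
for n in range(1, 11):
    ert_t, wp_t, ert_0 = ert_phi(ert_t, t), wp_phi(wp_t, t), ert_phi(ert_0, zero)
    # After n rounds: ert-Phi_t^n(I) = wp-Phi_t^n(I) + ert-Phi_0^n(0).
    holds = holds and all(ert_t[x] == wp_t[x] + ert_0[x] for x in STATES)
print(holds)
```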

Now let $s\in\Sigma$ . Then one of the following two cases occurs.

(1) $\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)=\infty$

In this case we have by (6) $\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)(s)=\infty\geq I(s)$ .

(2) $\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)<\infty$

In this case we have $\phantom{}{}^{s}\mathbb{E}\left(T^{\neg\varphi}\right)<\infty$ . First of all, in Thm. 9 (b) we have seen, that if $I$ is conditionally difference bounded, ${}^{\textsf{{wp}}}\Phi_{t}(I)\mathrel{{\prec}{\prec}}\infty$ and $\phantom{}{}^{s^{\prime}}\mathbb{E}\left(T^{\neg\varphi}\right)<\infty$ for every $s^{\prime}\in\Sigma$ then we have $\lim\limits_{n\to\omega}{}^{\textsf{{wp}}}\Phi^{n}_{t}(I)=\textnormal{{{lfp}}}~{}{}^{\textsf{{wp}}}\Phi_{t}$ . However, in Thm. 9 (b) we need that the expected looping time is finite for every initial state $s^{\prime}\in\Sigma$ . As we cannot ensure this condition, we use Lem. 1, a specialized result used in the proof of Thm. 9 which is indeed dependent on the initial state $s\in\Sigma$ .

Furthermore, the expected runtime of the program with initial state $s\in\Sigma$ is finite, so the expected looping time of the program has to be finite as well, i.e., $\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)<\infty$ implies $\phantom{}{}^{s}\mathbb{E}\left(T^{\neg\varphi}\right)<\infty$ .

Hence, as $I$ harmonizes with $t$ , $I$ is conditionally difference bounded, ${}^{\textsf{{wp}}}\Phi_{t}(I)\mathrel{{\prec}{\prec}}\infty$ , and $\phantom{}{}^{s}\mathbb{E}\left(T^{\neg\varphi}\right)<\infty$ . Thus, we can apply Lem. 1 and get

[TABLE]

Hence we have

[TABLE]

But the sequence $\left({}^{\textsf{{ert}}}\Phi^{n}_{t}(I)\right)_{n\in\mathbb{N}}$ is monotonically increasing as $I$ is an ert–subinvariant. Hence, $I(s)\leq\lim\limits_{n\to\omega}{}^{\textsf{{ert}}}\Phi^{n}_{t}(I)(s)=\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)(s)$ .

Combining these results, we get $I\leq\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)$ . ∎

F.2. Details for Example 4

More detailed annotations for the outer loop of the coupon collector are as follows:

[TABLE]

For the inner loop, we use the following lemma, which we prove in detail below:

Lemma 1.

Let $t\in\mathbb{F}$ be a runtime (i.e. an expectation) such that $t$ does not depend on program variable $i$ . Then the following expected runtime annotation is valid:

[TABLE]

Proof.

We employ (Batz et al., 2018, Theorem 4) for so-called $t$ –independent and identically distributed loops ( $t$ -i.i.d. loops for short) (see (Batz et al., 2018, Definition 5)). In order to verify the $t$ -i.i.d.-ness of the loop $\textnormal{{while}}\left(\,{0<x<i}\,\right)\left\{\,{i\mathrel{\textnormal{{:=}}}\mathrm{Unif}[1..N]}\,\right\}$ , we have to establish that neither

[TABLE]

nor

[TABLE]

depend on program variable $i$ , which is indeed the case by the assumption that $t$ does not depend on $i$ . In addition to $t$ -i.i.d.-ness, (Batz et al., 2018, Theorem 4) requires us to establish that

[TABLE]

does not depend on variable $i$ and that the loop body terminates almost-surely, i.e.

[TABLE]

Both conditions are obviously true.

Having established all preconditions of (Batz et al., 2018, Theorem 4), we can now make the following ert-annotations (recall that such annotations are best read from bottom to top):

[TABLE]

It is important to note that, again, any loop semantics needed to be applied only a finite number of times. In particular, it was not necessary to find the limit of a sequence or anything alike. ∎
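For intuition on why the inner loop is easy to handle, note that every execution of the body of $\textnormal{while}\,(0<x<i)\,\{i:=\mathrm{Unif}[1..N]\}$ redraws $i$ independently, so the number of body executions is geometrically distributed. The sketch below is an illustration under assumptions: it counts loop iterations only, not the expected runtime in the sense of ert, and the helper names are hypothetical. It solves the one-variable fixed-point equation $E=1+(1-p)\cdot E$ and compares the solution with a finite number of Kleene steps.

```python
from fractions import Fraction

def exit_probability(x, N):
    # Probability that a fresh i ~ Unif[1..N] falsifies the guard 0 < x < i.
    return Fraction(sum(1 for i in range(1, N + 1) if not (0 < x < i)), N)

def expected_iterations(x, N):
    # Solves E = 1 + (1 - p) * E, the fixed-point equation for the number of
    # body executions when the guard holds initially; requires p > 0.
    p = exit_probability(x, N)
    return 1 / p

def kleene_approximation(x, N, rounds):
    # Iterates E -> 1 + (1 - p) * E from 0, for comparison with the closed form.
    p = exit_probability(x, N)
    e = Fraction(0)
    for _ in range(rounds):
        e = 1 + (1 - p) * e
    return e

print(expected_iterations(2, 5), float(kleene_approximation(2, 5, 200)))
```

For $0<x\leq N$ the exit probability is $x/N$, so the closed form is $N/x$; the Kleene approximations increase towards it from below.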

Bibliography

Agrawal et al. (2018) Sheshansh Agrawal, Krishnendu Chatterjee, and Petr Novotný. 2018. Lexicographic Ranking Supermartingales: An Efficient Approach to Termination of Probabilistic Programs. PACMPL 2, POPL (2018), 34:1–34:32.
Audebaud and Paulin-Mohring (2009) Philippe Audebaud and Christine Paulin-Mohring. 2009. Proofs of Randomized Algorithms in Coq. Science of Computer Programming 74, 8 (2009), 568–589.
Back and von Wright (1998) Ralph-Johan Back and Joakim von Wright. 1998. Refinement Calculus - A Systematic Introduction. Springer.
Baranga (1991) Andrei Baranga. 1991. The Contraction Principle as a Particular Case of Kleene’s Fixed Point Theorem. Discrete Mathematics 98, 1 (1991), 75–79.
Barthe et al. (2016) Gilles Barthe, Thomas Espitau, Luis María Ferrer Fioriti, and Justin Hsu. 2016. Synthesizing Probabilistic Invariants via Doob’s Decomposition. In Proc. of the International Conference on Computer–Aided Verification (CAV) (Lecture Notes in Computer Science), Vol. 9779. Springer, 43–61.
Batz et al. (2019) Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Thomas Noll. 2019. Quantitative Separation Logic: a Logic for Reasoning about Probabilistic Pointer Programs. PACMPL 3, POPL (2019), 34:1–34:29.
Batz et al. (2018) Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2018. How Long, O Bayesian Network, will I Sample Thee? - A Program Analysis Perspective on Expected Sampling Times. In Proc. of the European Symposium on Programming Languages and Systems (ESOP) (Lecture Notes in Computer Science), Vol. 10801. Springer, 186–213.
