Aiming Low Is Harder -- Induction for Lower Bounds in Probabilistic Program Verification
Marcel Hark, Benjamin Lucien Kaminski, J\"urgen Giesl, Joost-Pieter, Katoen

TL;DR
This paper introduces a new inductive rule for verifying lower bounds on expected values and runtimes in probabilistic loops, simplifying the process by avoiding the need to compute limits of sequences.
Contribution
The paper presents a novel inductive rule that simplifies lower bound verification in probabilistic program analysis by eliminating the need for limit calculations.
Findings
The new rule effectively verifies lower bounds without sequence limits.
Simplifies probabilistic loop analysis by applying loop body semantics finitely.
Enhances the efficiency of probabilistic program verification methods.
Abstract
We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequence as in many previous rules.
| skip | |
|---|---|
| skip | |
|---|---|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Aiming Low Is Harder
Induction for Lower Bounds in Probabilistic Program Verification
Marcel Hark
RWTH Aachen UniversityGermany
,
Benjamin Kaminski
RWTH Aachen UniversityGermany
,
Jürgen Giesl
RWTH Aachen UniversityGermany
and
Joost-Pieter Katoen
RWTH Aachen UniversityGermany
(2020)
Abstract.
We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequence as in many previous rules.
probabilistic programs, verification, weakest precondition, weakest preexpectation, lower bounds, optional stopping theorem, uniform integrability
††copyright: none††doi: 10.1145/3371105††journalyear: 2020††journal: PACMPL††journalvolume: 4††journalnumber: POPL††publicationmonth: 1††ccs: Mathematics of computing Probabilistic algorithms††ccs: Mathematics of computing Markov processes††ccs: Theory of computation Denotational semantics
1. Introduction and Overview
We study probabilistic programs featuring discrete probabilistic choices as well as unbounded loops. Randomized algorithms are the classical application of such programs. Recently, applications in biology, quantum computing, cyber security, machine learning, and artificial intelligence led to rapidly growing interest in probabilistic programming (Gordon et al., 2014).
Formal verification of probabilistic programs is strictly harder than for nonprobabilistic programs (Kaminski et al., 2019). Given a random variable , a key verification task is to reason about the expected value of after termination of a program on input . If is the indicator function of an event , then this expected value is the probability that has occurred on termination of .
For verifying probabilistic loops, most approaches share a common, conceptually very simple, technique: an induction rule for verifying upper bounds on expected values, which are characterized as least fixed points (lfp) of a suitable function . This rule, called “Park induction”, reads
[TABLE]
i.e., for a candidate upper bound we check (for a suitable partial order ) to prove that is indeed an upper bound on the least fixed point, and hence on the sought–after expected value.
For lower bounds, a simple proof principle analogous to Park induction, namely
[TABLE]
is unsound in general. Sound rules (see Sect. 9), on the other hand, often suffer from the fact that either needs to be bounded, or that one has to find the limit of some sequence, as well as the sequence itself, rendering those rules conceptually much more involved than Park induction.
Our main contribution (Sect. 5, Thm. 9) is to provide relatively simple side conditions that can be added to the (unsound) implication above, such that the implication becomes true, i.e.,
[TABLE]
In particular, our side conditions will be simple in the sense that (a variation of) needs to be applied to a candidate only a finite number of times, which is beneficial for potential automation.
The need for verifying lower bounds on expected values is quite natural: First of all, they help to assess the quality and tightness of upper bounds. Moreover, giving total correctness guarantees for probabilistic programs amounts to lower–bounding the correctness probability, e.g., in order to establish membership in complexity classes like RP and PP.
In addition to expected values of random variables at program termination, lower bounds on expected runtimes are also of significant interest: Lower bounds on expected runtimes which depend on secret program variables may compromise the secret, thus allowing for timing side–channel attacks; “very large” lower bounds could indicate potential denial–of–service attacks.
In order to enable practicable reasoning about lower bounds on expected runtimes, we will show how our inductive lower bound rule carries over to expected runtimes (Sect. 8, Thm. 3). As an example to show the applicability of our rule, we will verify that the well–known and notoriously difficult coupon collector’s problem (Motwani and Raghavan, 1995) (Sect. 8, Ex. 4), modeled by the probabilistic program111The random assignment does not — strictly speaking — adhere to our syntax of binary probabilistic choices, but it can be modeled in our syntax. For the sake of readability, we opted for .
[TABLE]
has an expected runtime of at least , where is the -th harmonic number.
Our new inductive rules will be stated in terms of so–called expectation transformers (McIver and Morgan, 2005) (Sect. 2) and rely on the notions of uniform integrability (Sect. 3, in particular 3.4, and Sect. 4), martingales, conditional difference boundedness, and the Optional Stopping Theorem (Sect. 5) from the theory of stochastic processes. However, we do not only make use of these notions in order to prove soundness of our induction rule, but instead establish tight connections in terms of these notions between expectation transformers and certain canonical stochastic processes (Sect. 4, Thm. 14 and Sect. 5, Thm. 8). In particular, we will build upon the key result of this connection (Thm. 14) to study exactly how inductive proof rules for both upper and lower bounds can be understood in the realm of these stochastic processes and vice versa (Sect. 5, Thm. 9 and Sect. 7). We see those connections between the theories of expectation transformers and stochastic processes as a stepping stone for applying further results from stochastic process theory to probabilistic program analysis and possibly also vice versa.
As a final contribution, we revisit one of the few existing rules for lower bounds due to (McIver and Morgan, 2005), which gives sufficient criteria for a candidate being a lower bound on the expected value of a bounded function . We show that their rule is also a consequence of uniform integrability and we are moreover able to generalize their rule to a necessary and sufficient criterion (Sect. 6, Thm. 2). We demonstrate the usability of our generalization by an example (Sect. 6, Ex. 3).
The appendix contains more case studies illustrating the effectiveness of our lower bound proof rule, a more detailed introduction to probability theory, and more detailed proofs of our results.
2. Weakest Preexpectation Reasoning
Weakest preexpectations for probabilistic programs are a generalization of Dijkstra’s weakest preconditions for nonprobabilistic programs. Dijkstra employs predicate transformers, which push a postcondition (a predicate) backward through a nonprobabilistic program and yield the weakest precondition (another predicate) describing the largest set of states such that whenever is started in a state satisfying , terminates in a state satisfying .222We consider total correctness, i.e., from any state satisfying the weakest precondition , definitely terminates.
The weakest preexpecation calculus on the other hand employs expectation transformers which act on real–valued functions called expectations, mapping program states to non–negative reals.333For simplicity of the presentation, we study the standard case of positive expectations. Mixed–sign expectations mapping to the full extended reals require much more technical machinery, see (Kaminski and Katoen, 2017). These transformers push a postexpectation backward through a probabilistic program and yield a preexpectation , such that represents the expected value of after executing . The term expectation coined by (McIver and Morgan, 2005) may appear somewhat misleading at first. We clearly distinguish between expectations and expected values: An expectation is hence not an expected value, per se. Instead, we can think of an expectation as a random variable. In Bayesian network jargon, expectations are also called factors.
Definition 1 (Expectations (Kaminski, 2019; McIver and Morgan, 2005)).
Let denote the finite set of program variables and let denote the set of program states.444We choose rationals to have some range of values at hand which are conveniently represented in a computer.
The set of expectations, denoted by , is defined as
[TABLE]
where . We say that is finite and write , if for all . A partial order on is obtained by point–wise lifting the usual order on , i.e.,
[TABLE]
* is a complete lattice where suprema and infima are constructed point–wise.*
We note that our notion of expectations is more general than the one of McIver and Morgan: Their work builds almost exclusively on bounded expectations, i.e., non–negative real–valued functions which are bounded from above by some constant, whereas we allow unbounded expectations. As a result, we have that forms a complete lattice, whereas McIver and Morgan’s space of bounded expectations does not.
2.1. Weakest Preexpectations
Given program and postexpectation , we are interested in the expected value of evaluated in the final states reached after termination of . More specifically, we are interested in a function mapping each initial state of to the respective expected value of evaluated in the final states reached after termination of on input . This function is called the weakest preexpectation of with respect to , denoted \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right). Put as an equation, if is the probability (sub)measure555 is the probability that is the final state reached after termination of on input . We have , where the “missing” probability mass is the probability of nontermination of on . over final states reached after termination of on initial state , then666As is countable, the integral can be expressed as .
[TABLE]
While \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right) in fact represents an expected value, itself does not. In an analogy to Dijkstra’s pre– and postconditions, as is evaluated in the final states after termination of it is called the postexpectation, and as \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right) is evaluated in the initial states of it is called the preexpectation.
2.2. The Weakest Preexpectation Calculus
We now show how to determine weakest preexpectations in a systematic and compositional manner by recapitulating the weakest preexpectation calculus à la McIver and Morgan. This calculus employs expectation transformers which move backward through the program in a continuation–passing style, see Fig. 1.
If we are interested in the expected value of some postexpectation after executing the sequential composition , then we can first determine the weakest preexpectation of with respect to , i.e., \textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right). Thereafter, we can use the intermediate result \textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right) as postexpectation to determine the weakest preexpectation of with respect to \textsf{{wp}}\left\llbracket{C_{2}}\right\rrbracket\left({f}\right). Overall, this gives the weakest preexpectation of with respect to the postexpectation . The above explanation illustrates the compositional nature of the weakest preexpectation calculus. wp–transformers for all language constructs can be defined by induction on the program structure:
Definition 2 (The wp–Transformer (McIver and Morgan, 2005)).
Let pGCL be the set of programs in the probabilistic guarded command language. Then the weakest preexpectation transformer
[TABLE]
is defined according to the rules given in Table 1, where denotes the Iverson–bracket of , i.e., evaluates to if and to [math] otherwise. Moreover, for any variable and any expression , let be the expectation with for any , where and for all .
We call the function the characteristic function of with respect to . Its least fixed point is understood in terms of . To increase readability, we omit wp, , , or from whenever they are clear from the context.
Example 3 (Applying the wp Calculus).
Consider the probabilistic program given by
[TABLE]
Suppose we want to know the expected value of after execution of . For this, we determine \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({b}\right). Using the annotation style shown in Fig. 2(a), we can annotate the program as shown in Fig. 2(b), using the rules from Table 1. At the top, we read off the weakest preexpectation of with respect to , namely . This tells us that the expected value of after termination of on is equal to .
The wp–transformer satisfies what is sometimes called healthiness conditions (Hino et al., 2016; Keimel, 2015; McIver and Morgan, 2005) or homomorphism properties (Back and von Wright, 1998):
Theorem 4 (Healthiness Conditions (Kaminski, 2019; McIver and Morgan, 2005)).
Let , ,777That is, is a chain. , and . Then:
- (1)
Continuity:* \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({\sup~{}S}\right)~{}{}={}~{}\sup~{}\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({S}\right) .* 2. (2)
Strictness:888Here, we overload notation and denote by [math] the constant expectation that maps every to [math].* \textsf{{wp}}\left\llbracket{C}\right\rrbracket is strict, i.e., \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({0}\right)=0.* 3. (3)
Monotonicity:* f~{}{}\preceq{}~{}g\quad\textnormal{implies}\quad\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right)~{}{}\preceq{}~{}\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({g}\right).* 4. (4)
Linearity:* \textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({r\cdot f+g}\right)~{}{}={}~{}r\cdot\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({f}\right)+\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({g}\right).*
3. Bounds on Weakest Preexpectations
For loop–free programs, it is generally straightforward to determine weakest preexpectations, simply by applying the rules in Table 1, which guide us along the syntax of , see Ex. 3. Weakest preexpectations of loops, on the other hand, are generally non–computable least fixed points and we often have to content ourselves with some approximation of those fixed points.
For us, a sound approximation is either a lower or an upper bound on the least fixed point. There are in principle two challenges: (1) finding a candidate bound and (2) verifying that the candidate is indeed an upper or lower bound. In this paper, we study the latter problem.
3.1. Upper Bounds
The Park induction principle provides us with a very convenient proof rule for verifying upper bounds. In general, this principle reads as follows:
Theorem 1 (Park Induction (Park, 1969)).
Let be a complete lattice and let be continuous.999It would even suffice for to be monotonic, but we consider continuous functions throughout this paper. Then has a least fixed point in and for any ,
[TABLE]
In the realm of weakest preconditions, Park induction gives rise to the following induction principle:
Corollary 2 (Park Induction for wp (Kozen, 1985; Kaminski, 2019)).
Let be the characteristic function of the while loop with respect to postexpectation and let . Then
[TABLE]
We call an that satisfies a superinvariant. The striking power of Park induction is its simplicity: Once an appropriate candidate is found (even though this is usually not an easy task), all we have to do is push it through the characteristic function once and check whether we went down in our underlying partial order. If this is the case, we have verified that is indeed an upper bound on the least fixed point and thus on the sought–after weakest preexpectation.
Example 3 (Induction for Upper Bounds).
Consider the program , given by
[TABLE]
where we assume . Suppose we aim at an upper bound on the expected value of executing . Using the annotation style of Fig. 3(a), we can annotate the loop as shown in Fig. 3(b), using superinvariant , establishing , and by Cor. 2 establishing \textsf{{wp}}\left\llbracket{C_{\mathit{geo}}}\right\rrbracket\left({b}\right)\preceq b+\left[{a\neq 0}\right]. So the expected value of after termination of on is at most .
Let us explain why Park induction is sound using the so-called Tarski-Knaster Fixed Point Theorem:101010 In (Hark et al., 2020), we explain Park induction via an extended version of Thm. 5. However, this explanation was incorrect and is fixed here. This does not have any consequences regarding the results of (Hark et al., 2020).
Theorem 4 (Tarski-Knaster Fixed Point Theorem (Tarski, 1955; Knaster, 1928)).
Let be a complete lattice and be continuous. Then the set of fixed points of is a complete lattice. Here, is the least fixed point of and is the greatest fixed point of .
Again, for this theorem it would even suffice for to be only monotonic.
In our setting, the characterization of the least fixed point in Thm. 4 proves soundness of Park induction (recall that is continuous, see Thm. 4). For making a comparison to the lower bound case which we consider later, let us introduce the so-called Tarski-Kantorovich Principle:
Theorem 5 (Tarski-Kantorovich Principle, see (Jachymski
et al., 2000)).
Let be a complete lattice, be continuous, and let such that . Then the sequence is an ascending chain that converges to an element
[TABLE]
which is a fixed point of . In particular, is the least fixed point of that is .
The well–known Kleene Fixed Point Theorem (cf. (Lassez et al., 1982)), which states that , where is the least element of , is a special case of the Tarski–Kantorovich Principle.
3.2. Lower Bounds
For verifying lower bounds, we do not have a rule as simple as Park induction available. In particular, for a given complete lattice and monotonic function , the rule
[TABLE]
is unsound in general. We call an satisfying a subinvariant and the above rule simple lower induction. Generally, we will call an that is a sub– or a superinvariant an invariant. being an invariant thus expresses mainly its inductive nature, namely that is comparable with with respect to the partial order .
An explanation why simple lower induction is unsound is as follows: By Thm. 5, we know from that is the least fixed point of that is greater than or equal to . Since is a fixed point, holds, but we do not know how compares to . We only know that if indeed and , then iterating on also converges to , i.e.,
[TABLE]
If, however, and is strictly greater than , then iterating on will yield a fixed point strictly greater than , contradicting soundness of simple lower induction.
While we just illustrated by means of the Tarski–Kantorovich principle why the simple lower induction rule is not sound in general, we should note that the rule is not per se absurd: So called metering functions (Frohn et al., 2016) basically employ simple lower induction to verify lower bounds on runtimes of nonprobabilistic programs (Kaminski, 2019, Thm. 7.18). For weakest preexpectations, however, simple lower induction is unsound:
Counterexample 6 (Simple Induction for Lower Bounds).
Consider the following loop , where
[TABLE]
As in Ex. 3, \textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)=b+\left[{a\neq 0}\right]. In particular, this weakest preexpectation is independent of . The corresponding characteristic function is
[TABLE]
Let us consider , which does depend on . Indeed, one can check that , i.e., is a subinvariant. If the simple lower induction rule were sound, we would immediately conclude that is a lower bound on \textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right), but this is obviously false since
[TABLE]
3.3. Problem Statement
The purpose of this paper is to present a sound lower induction rule of the following form: Let be the characteristic function of with respect to and let . Then
[TABLE]
We still want our lower induction rule to be simple in the sense that checking the side conditions should be conceptually as simple as checking . Intuitively, we want to apply the semantics of the loop body only finitely often, not times, to avoid reasoning about limits of sequences or anything alike. We provide such side conditions in our main contribution, Thm. 9, which transfers the Optional Stopping Theorem of probability theory to weakest preexpectation reasoning.
3.4. Uniform Integrability
We now present a sufficient and necessary criterion to under–approximate the least fixed points that we seek for. Let again be the characteristic function of with respect to . Thm. 4 implies that is continuous and monotonic.
Let us consider a subinvariant , i.e., . If we iterate on ad infinitum, then the Tarski–Kantorovich principle (Thm. 5) guarantees that we will converge to some fixed point that is . From continuity of and Thm. 5, one can easily show that coincides with if and only if itself was already , i.e.:
Theorem 7 (Subinvariance and Lower Bounds).
For any subinvariant , we have
[TABLE]
More generally, for any expectation (not necessarily a sub- or superinvariant), if iterating on converges to the least fixed point of , then we call uniformly integrable for :
Definition 8 (Uniform Integrability of Expectations).
Given a loop , an expectation is called uniformly integrable (u.i.) for if exists and
[TABLE]
So far, we have thus established the following diagram which we will gradually extend over the next two sections:
{I~{}\textnormal{u.i.\ for}~{}f\quad}$${\quad\Phi_{f}^{n}(I)\xrightarrow{~{}n\to\omega~{}}\textnormal{{{lfp}}}~{}\Phi_{f}}$${I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}}Def. 8
and Def. 8
Uniform integrability (Grimmett and Stirzaker, 2001) — a notion originally from probability theory — will be essential for the Optional Stopping Theorem in Sect. 5. While, so far, we have studied the function solely from an expectation transformer point of view and defined a purely expectation–theoretical notion of uniform integrability without any reference to probability theory, we will study in Sect. 4 the function from a stochastic process point of view. Stochastic processes are not inductive per se, whereas expectation transformers make heavy use of induction. We will, however, rediscover the inductiveness also in the realm of stochastic processes. We will also see how our notion of uniform integrability corresponds to uniform integrability in its original sense.
4. From Expectations to Stochastic Processes
In this section, we connect concepts from expectation transformers with notions from probability theory. In Sect. 4.1, we recapitulate standard constructions of probability spaces for probabilistic programs, instantiate them in our setting, and present our new results on connecting expectation transformers with stochastic processes (Sect. 4.2) and uniform integrability (Sect. 4.3). Proofs can be found in App. C. For further background on probability theory, we refer to App. B and (Bauer, 1971; Grimmett and Stirzaker, 2001).
We fix for this section an arbitrary loop . The loop body may contain loops but we require to be universally almost–surely terminating (AST), i.e., terminates on any input with probability 1. The set of program states can be uniquely partitioned into , with iff . The set thus contains the terminal states from which the loop is not executed further.
4.1. Canonical Probability Space
We begin with constructing a canonical probability measure and space corresponding to the execution of our loop. As every pGCL program is, operationally, a countable Markov chain, our construction is similar to the standard construction for Markov chains (cf. (Vardi, 1985)).
In general, a measurable space is a pair consisting of a sample space and a –field of , which is a collection of subsets of , closed under complement and countable union, such that . In our setting, a loop induces the following canonical measurable space:
Definition 1 (Loop Space).
The loop induces a unique measurable space with sample space i.e., it is the set of all infinite sequences of program states (so–called runs). For , we denote by the –th state in the sequence (starting to count at 0). The –field is the smallest –field that contains all cylinder sets , for all finite prefixes , denoted as
[TABLE]
Intuitively, a run is an infinite sequence of states where represents the initial state on which the loop is started and is a state that could be reached after iterations of the loop. Obviously, some sequences in may not actually be admissible by our loop.
We next develop a canonical probability measure corresponding to the execution of the loop, which will assign the measure [math] to inadmissible runs. We start with considering a single loop iteration. The loop body induces a family of distributions111111Since the loop body is AST, these are distributions and not subdistributions.
[TABLE]
where is the probability that after one iteration of on , the program is in state .
The loop induces a family of probability measures on . This family is parameterized by the initial state of the loop. Using the distributions above, we first define the probability of a finite non–empty prefix of a run, i.e., for . Here, is the probability that is the sequence of states reached after the first loop iterations, when starting the loop in state . Hence, the family
[TABLE]
of distributions on is defined by
- (1)
2. (2)
.
Using the family , we now obtain a canonical probability measure on the loop space.
Lemma 2 (Loop Measure (Feller, 1971, Kolmogorov’s Extension Theorem)).
There exists a unique family of probability measures with
[TABLE]
We now turn to random variables and their expected values. A mapping on a probability space is called (–)measurable or random variable if for any open set its preimage lies in , i.e., . If , then this is equivalent to checking for any . The expected value of a random variable is defined as .121212Details on integrals for arbitrary measures can be found in App. B. If takes only countably many values we have
[TABLE]
We saw that gives rise to a unique canonical measurable space and to a family of probability measures parameterized by the initial state on which our loop is started. We now define a corresponding parameterized expected value operator .
Definition 3 (Expected Value for Loops ).
Let and be a random variable. The expected value of with respect to the loop measure , parameterized by state , is defined by .
Next, we define a random variable that corresponds to the number of iterations that our loop makes until it terminates.
Definition 4 (Looping Time).
The mapping
[TABLE]
is a random variable and called the looping time of . Here, and .
The canonical –field contains infinite runs. But after iterations of the loop we only know the first states of a run. Gaining knowledge in this successive fashion can be captured by a so–called filtration of the –field . In general, a filtration is a sequence of subsets of , such that and is a –field for any , i.e., is approximated from below.
Definition 5 (Loop Filtration).
The sequence is a filtration of , where
[TABLE]
i.e., is the smallest –field containing .131313Note that here which is not the case for general filtrations.
Next, we recall the notion of stopping times from probability theory.
Definition 6 (Stopping Time).
For a probability space with filtration , a random variable is called a stopping time with respect to if for every we have .
Let us reconsider the looping time and the loop filtration . In order to decide for a run whether its looping time is , we only need to consider the states . Hence, for any and thus is a stopping time with respect to .
Note that does not reflect the actual runtime of , as it does not take the runtime of the loop body into account. Instead, only counts the number of loop iterations of the “outer loop” . This enriches the class of probabilistic programs our technique will be able to analyze, as we will not need to require that the whole program has finite expected runtime, but only that the outer loop is expected to be executed finitely often.
4.2. Canonical Stochastic Process
Now we can present our novel results on the connection of weakest preexpectations and stochastic processes. Henceforth, let . Intuitively, will play the role of the postexpectation and the role of an invariant (i.e., is a sub– or superinvariant). We now present a canonical stochastic process, i.e., a sequence of random variables that captures approximating \textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right) using the invariant .
Definition 7 (Induced Stochastic Process).
The stochastic process induced by , denoted , is given by
[TABLE]
Now, in what sense does the stochastic process capture approximating the weakest preexpectation of our loop with respect to by invariant ? takes as argument a run of the loop and assigns to a value as follows: If the loop has reached a terminal state within iterations, it returns the value of the postexpectation evaluated in that terminal state. If no such terminal state is reached within steps, it simply approximates the remainder of the run, i.e.,
[TABLE]
by returning the value of the invariant evaluated in . We see that needs at most the first states of a run to determine its value. Thus, is not –measurable but –measurable, as there exist runs that agree on the first states but yield different images under . Hence, we shift the loop filtration by one.
Definition 8 (Shifted Loop Filtration).
The filtration of is defined by
[TABLE]
Note that , so is a stopping time w.r.t. as well.
Lemma 9 (Adaptedness of Induced Stochastic Process).
* is adapted to , i.e., is –measurable.*
The loop space, the loop measure, and the induced stochastic process are not defined by induction on the number of steps performed in the program. The loop space, for instance, contains all infinite sequences of states, whether they are admissible by the loop or not. The loop measure filters out the inadmissible runs and gives them probability 0.
Reasoning by invariants and characteristic functions, on the other hand, is inductive. We will thus relate iterating a characteristic function on to the stochastic process . For this, let again be the characteristic function of with respect to , i.e.,
[TABLE]
We now develop a first connection between the stochastic process and , which involves the notion of conditional expected values with respect to a –field, for which we provide some preliminaries here. In general, for , by slight abuse of notation, the Iverson bracket maps to if and to [math] otherwise. is –measurable iff . If is a random variable on and is a –field with respect to , then the conditional expected value is a –measurable mapping such that for every the equality holds, i.e., restricted to the set the conditional expected value and have the same expected value. Hence, is a random variable that is like , but for elements that are indistinguishable in the subfield , i.e., they either are both contained or none of them is contained in a –measurable set, it “distributes the value of equally”.
Theorem 10 (Relating and ).
For any and any , we have
[TABLE]
Note that both sides in Thm. 10 are mappings of type . Intuitively, Thm. 10 expresses the following: Consider some cylinder , i.e., is a sequence of states of length . Then, and have the same expected value under on the cylinder set independent of the initial state of the loop.
Using Thm. 10, one can now explain in which way iterating on represents an expected value, thus revealing the inductive structure inside the induced stochastic process:
Corollary 11 (Relating Expected Values of and Iterations of ).
For any and any , we have
[TABLE]
Intuitively, represents allowing for at most evaluations of the loop guard. For any state , the number is composed of
- (a)
’s average value on the final states of those runs starting in that terminate within guard evaluations, and
- (b)
’s average value on the –nd states of those runs starting in that do not terminate within guard evaluations.
We now want to take to the limit by considering all possible numbers of iterations of the loop body. We will see that this corresponds to evaluating the stochastic process at the time when our loop terminates, i.e., the looping time :
Definition 12 (Canonical Stopped Process).
The mapping
[TABLE]
is the stopped process, corresponding to stopped at stopping time . As this mapping is independent of , we write instead of .
The stopped process now corresponds exactly to the quantity we want to reason about — the value of evaluated in the final state after termination of our loop. For nonterminating runs we get [math], as there exists no state in which to evaluate .
We now show that the limit of the induced stochastic process corresponds to the stopped process . For the following lemma, note that a statement over runs holds almost–surely in the probability space , if , i.e., the set of all elements of the sample space satisfying has probability .
Lemma 13 (Convergence of to ).
The stochastic process converges point–wise to , i.e., for all ,
[TABLE]
So if is universally almost–surely terminating, then converges to almost–surely with respect to the measure for any .
Intuitively, the factor selects those runs where the looping time is finite. If the loop is AST, then this factor can be neglected, because then is the constant function for the probability measures . In any case, (i.e., whether the looping time is almost–surely finite or not) the expected value of the stopped process captures precisely the weakest preexpectation of our loop with respect to the postexpectation , since only the terminating runs are taken into account by and by when computing the expected value of after termination of the loop. So from Cor. 11 and Lem. 13 we get our first main result:
Theorem 14 (Weakest Preexpectation is Expected Value of Stopped Process).
[TABLE]
Thm. 14 captures our sought–after least fixed point as an expected value of a canonical stopped process. This is what will allow us to later apply the Optional Stopping Theorem. Moreover, it is crucial for deriving our generalization of an existing rule for lower bounds (cf. Sect. 6) and the connection of upper bounds to the Lemma of Fatou (cf. Sect. 7).
4.3. Uniform Integrability
As we will see in Sect. 5, uniform integrability of a certain stochastic process is the central aspect of the Optional Stopping Theorem (Thm. 3). In probability theory, uniform integrability means that taking the expected value and taking the limit of a stochastic process commutes.
Definition 15 (Uniform Integrability of Stochastic Processes, (Grimmett and
Stirzaker, 2001, Lemma 7.10.(3))).
Let be a stochastic process on a probability space with almost–surely existing limit . The process is uniformly integrable if
[TABLE]
Counterexample 16 ((Grimmett and
Stirzaker, 2001, Sect. 7.10)).
Consider the stochastic process on a probability space with . Then . Moreover, converges almost surely to , i.e., the constant function [math]. So, . But
[TABLE]
so is not u.i.
Note that our notion of uniform integrability of expectations from Def. 8 coincides with uniform integrability of the corresponding induced stochastic process.
Corollary 17 (Uniform Integrability of Expectations and Stochastic Processes).
Let the loop be AST.141414It suffices that for any . But this is equivalent to AST as we required the body of the loop to be AST. Then is uniformly integrable for (in the sense of Def. 8) iff the induced stochastic process is uniformly integrable (in the sense of Def. 15), i.e.,
[TABLE]
Cor. 17 justifies the naming in Def. 8: an expectation is uniformly integrable for iff its induced process is uniformly integrable. So, we can now extend the diagram from Sect. 3.4 as follows:
{\mathbf{X}^{f,I}~{}\textnormal{u.i.}\quad}$${\quad\phantom{}{}^{\bullet}\mathbb{E}\left(X_{n}^{f,I}\right)\xrightarrow{~{}n\to\omega~{}}\phantom{}^{\bullet}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)}$${I~{}\textnormal{u.i.\ for}~{}f\quad}$${\quad\Phi_{f}^{n}(I)\xrightarrow{~{}n\to\omega~{}}\textnormal{{{lfp}}}~{}\Phi_{f}}$${I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}} Cor. 17Lem. 13 and 15
and Thm. 14
and Def. 8
Uniform integrability is very hard to verify in general, both in the realm of stochastic processes as well as in the realm of expectation transformers. Thus, one usually tries to find sufficient criteria for uniform integrability that are easier to verify. The very idea of the Optional Stopping Theorem is to provide such sufficient criteria for uniform integrability which then allow deriving a lower bound as we will discuss in the next section.
5. The Optional Stopping Theorem of Weakest Preexpectations
In this section, we develop an inductive proof rule for lower bounds on preexpectations by using the results of Sect. 4 and the Optional Stopping Theorem (Thm. 3). The proofs of our results in this section can be found in App. D. Recall that we have fixed a loop , a finite postexpectation , a corresponding characteristic function , and another finite expectation which plays the role of an invariant.
We first introduce the Optional Stopping Theorem from probability theory. It builds upon the concept of submartingales. A submartingale is a stochastic process that induces a monotonically increasing sequence of its expected values.
Definition 1 (Submartingale).
Let be a stochastic process on a probability space adapted to a filtration of , i.e., a sequence of random variables such that is –measurable. Then is called a submartingale with respect to if
[TABLE]
It turns out that submartingales are closely related to subinvariants. In fact, being a subinvariant (plus some side conditions) gives us that the stochastic process induced by is a submartingale.
Lemma 2 (Subinvariant Induces Submartingale).
Let be a subinvariant, i.e., , such that for every , that is, only takes finite values. Then the induced stochastic process is a submartingale with respect to .
Given a submartingale and a stopping time , the goal of the Optional Stopping Theorem is to prove a lower bound for the expected value of at the stopping time . To this end, we define a stochastic process where for any , if is smaller than the stopping time and otherwise, . Hence, is the expected value of at the stopping time . The Optional Stopping Theorem asserts that the first component of the stochastic process is a lower bound for provided that is uniformly integrable. Moreover, the Optional Stopping Theorem provides a collection of criteria that are sufficient for uniform integrability of .
Theorem 3 (Optional Stopping Theorem (Grimmett and
Stirzaker, 2001, Theorems 12.3.(1), 12.4.(11), 12.5.(1), 12.5.(2), 12.5.(9))).
Let be a submartingale and be a stopping time on a probability space with respect to a filtration . Then defined by
[TABLE]
is also a submartingale w.r.t. . If converges almost–surely and is uniformly integrable,
[TABLE]
If one of the following conditions holds, then converges almost–surely and is uniformly integrable:
- (a)
* is almost–surely bounded, i.e., there is a constant such that .* 2. (b)
* and there is a constant , such that for each *
[TABLE] 3. (c)
There exists a constant such that holds almost–surely for every .
Our goal now is to transfer the Optional Stopping Theorem from probability theory to the realm of weakest preexpectations in order to obtain inductive proof rules for lower bounds on weakest preexpectations. So far, we have introduced the looping time (which is a stopping time w.r.t. ), presented the connection of subinvariants and submartingales, and defined the concept of uniform integrability also for expectations. Hence, the only missing ingredient is a proper connection of expectations to the condition “” in Thm. 3 (b). To translate this concept to expectations, we require that the expectation has a certain shape depending on the postexpectation .
Definition 4 (Harmonization).
An expectation harmonizes with if it is of the form
[TABLE]
for some expectation .
Def. 4 reflects that in terminal states of the loop the invariant evaluates to . For an invariant to harmonize with postexpectation is a minor restriction on the shape of . It is usually easy to choose an that takes the value of for states in which the loop is not executed at all. Moreover, performing one iteration of obviously brings any expectation “into shape”:
Corollary 5 (Harmonizing Expectations).
For any , harmonizes with .
The actual criterion that connects “” with the invariant is called conditional difference boundedness (see also (Fu and Chatterjee, 2019; Fioriti and Hermanns, 2015)):
Definition 6 (Conditional Difference Boundedness).
Let . We define the function and the expectation as151515Recall that we have fixed a loop .
[TABLE]
The expectation is called conditionally difference bounded (c.d.b.) if for some constant
[TABLE]
The expectation expresses the expected change of within one loop iteration. So, if is c.d.b. it is expected to change at most by a constant in one loop iteration.
Example 7.
Reconsider the program from Counterex. 6 and expectation . We will check conditional difference boundedness of , using the function given by
[TABLE]
We then check the following:
[TABLE]
Thus, is c.d.b. by the constant . In contrast, the subinvariant from Counterex. 6 is not conditionally difference bounded. Indeed, we would get (cf. App. D for details)
[TABLE]
which cannot be bounded by a constant.
Finally, we can connect the expected change of to a property of the stochastic process . This is our second major result.
Theorem 8 (Expected Change of ).
Let harmonize with . Then
[TABLE]
The stochastic process induced by exhibits an interesting correspondence: If is bounded by a constant (i.e., if is c.d.b.), then so is and thus Thm. 8 ensures that precondition (b) of the Optional Stopping Theorem (Thm. 3) is fulfilled. Note that Thm. 8 depends crucially on the fact that as otherwise the well–definedness of the expectation cannot be ensured.
Now Lem. 2 allows us to use the Optional Stopping Theorem from probability theory (Thm. 3) to prove a novel Optional Stopping Theorem for weakest preexpectations, which collects sufficient conditions for uniform integrability. In particular, due to Thm. 8, our Optional Stopping Theorem shows that our notion of conditional difference boundedness is an (easy–to–check) sufficient criterion for uniform integrability and hence, for ensuring that a subinvariant is indeed a lower bound for the weakest preexpectation under consideration. After stating the theorem, we will discuss the intuition of its parts in more detail.
Theorem 9 (Optional Stopping Theorem for Weakest Preexpectation Reasoning).
Consider a loop where is AST. Let be a subinvariant w.r.t. the postexpectation (i.e., ). is uniformly integrable for iff is a lower bound, i.e.,
[TABLE]
* is uniformly integrable for if one of the following three conditions holds:*
- (a)
The looping time of is almost–surely bounded, i.e., for every state there exists a constant with and for every . 2. (b)
The expected looping time of is finite for every initial state , harmonizes with , , and is conditionally difference bounded. 3. (c)
Both and are bounded and is AST.
We can now extend the diagram from Sect. 4 connecting the realm of stochastic processes (on the right) and the realm of expectation transformers (on the left) for a universally almost–surely terminating program. The respective Optional Stopping Theorems provide the sufficient criteria for uniform integrability, which is marked by the dashed implications.
{I~{}\textnormal{c.d.b.~{}by}~{}c\quad}$${\quad\phantom{}{}^{\bullet}\mathbb{E}\left(\big{|}X_{n+1}^{f,I}-X_{n}^{f,I}\big{|}~{}\Big{|}~{}\mathfrak{G}_{n}\right)\leq c}$${\mathbf{X}^{f,I}~{}\textnormal{u.i.}\quad}$${\quad\phantom{}{}^{\bullet}\mathbb{E}\left(X_{n}^{f,I}\right)\xrightarrow{~{}n\to\omega~{}}\phantom{}^{\bullet}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right)}$${I~{}\textnormal{u.i.\ for}~{}f\quad}$${\quad\Phi_{f}^{n}(I)\xrightarrow{~{}n\to\omega~{}}\textnormal{{{lfp}}}~{}\Phi_{f}}$${I\preceq\Phi_{f}(I)\Rightarrow I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}}Thm. 8 Thm. 9 (b) Thm. 3 Cor. 17Lem. 13 and 15
and Thm. 14
and Def. 8
Let us elaborate on the different cases of our Optional Stopping Theorem (Thm. 9): Case (a) yields an alternative proof for the technique of so–called metering functions by (Frohn et al., 2016) for deterministic terminating loops. As for the severity of the finiteness condition “ for every ”, note that if the body is loop–free, this condition is vacuously satisfied as itself is finite and cannot become infinite by finitely iterations of . If contains loops, then we can establish the finiteness condition by finding a finite superinvariant with . In this case, we can also guarantee .161616The reason is that we have for all by monotonicity of (Thm. 4). Then, implies also by monotonicity of , which gives us .
Case (b) applies whenever the outer loop is expected to be executed finitely often. In particular, this holds if the entire loop terminates positively almost–surely (i.e., within finite expected runtime).
To the best of our knowledge, Cases (a) and (b) are the first sufficiently simple induction rules for lower bounds that do not require restricting to bounded postexpectations . While the requirements on the loop’s termination behavior gradually weaken along , the requirements on the subinvariant become stricter.
Finally, Case (c) yields an alternative proof of the result of (McIver and Morgan, 2005) on inductive lower bounds for bounded expectations in case of AST, which we will generalize in Sect. 6.
When comparing the cases (c) of Thm. 3 and Thm. 9, we notice that Thm. 3 (c) has no restrictions on the stopping time, whereas Thm. 9 (c) requires almost–sure termination. This might spark some hope that AST is not needed in Thm. 9 (c), but the following counterexample shows that this is not the case:
Counterexample 10.
Consider the program
[TABLE]
together with the bounded postexpectation , i.e., we are interested in the termination probability which is obviously [math]. The corresponding characteristic function is given by
[TABLE]
i.e., is the identity map. Trivially, the bounded expectation is a fixed point of , thus in particular is a subinvariant. Clearly, is not a lower bound on the actual termination probability, i.e., on . If the condition of almost–sure termination in Thm. 9 (c)* could be weakened, it has to be ensured that for any program with universally almost–surely terminating body 171717Note that in this case is always a subinvariant. and postexpectation , is a lower bound only if the program terminates universally almost–surely. But this means that this property has to be at least as strong as almost–sure termination.*
We reconsider Counterex. 6 illustrating unsoundness of simple lower induction and do sound lower induction instead.
Example 11.
Let us continue Ex. 7, where we have checked that for the program the expectation is conditionally difference bounded by . It is easy to check that is a fixed point of the characteristic function with respect to the postexpectation , which by Park induction gives us a finite upper bound on the least fixed point of . But up to now we could not prove that is indeed equal to the least fixed point. Using Thm. 9, we can now do this.
First of all, we already have and since is a fixed point, it is also a subinvariant. Secondly, the loop is expected to be executed twice.181818Positive almost–sure termination itself can also be verified by Park induction, see (Kaminski et al., 2016, 2018). Finally, harmonizes with and is conditionally difference bounded. Hence, the preconditions of Thm. 9 (b) are satisfied and is indeed a lower bound on . Since is a fixed point, it is the least fixed point, i.e., we have proved \textsf{{wp}}\left\llbracket{C_{\mathit{cex}}}\right\rrbracket\left({b}\right)=I.
Further case studies demonstrating the effectiveness of our proof rule, as well as an example that cannot be treated by Thm. 9, are provided in App. A.
6. Lower Bound Rules by McIver and Morgan
In Sect. 5, we briefly mentioned the rules for lower bounds for bounded expectations by (McIver and Morgan, 2005) which are restated in Thm. 1 below. To the best of our knowledge, before our new Thm. 9 these were the only existing inductive proof rules for weakest preexpectations.
Theorem 1 ((McIver and Morgan, 2005)).
Let be a bounded postexpectation. Furthermore, let be a bounded expectation such that the harmonized expectation given by is a subinvariant of with respect to . Finally, let T=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({1}\right) be the termination probability of . Then:
- (1)
If for some predicate , then T\cdot I~{}{}\preceq{}~{}\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)~{}. 2. (2)
If for some predicate , then \left[{G}\right]\cdot I~{}{}\preceq{}~{}\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)~{}. 3. (3)
If for some , then I~{}{}\preceq{}~{}\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)~{}.
Thm. 1 does not make any assumptions on the termination behavior of the loop, so, it is also possible to analyze programs with termination probability . It turns out that Thm. 1 (1) – (3) can be proved easily from our results from Sect. 4 in the case where is AST where we do not need the restriction that harmonizes with . In particular, we can show that in Thm. 1 (3) the fact that is the probability of termination is insignificant (see App. E). In fact, it suffices if is the weakest preexpectation for some arbitrary bounded postexpectation, i.e., a least fixed point (see App. E for details and proofs). So, we obtain the following generalized version of Thm. 1 (3) in the case where is AST which is substantially more powerful: it states a sufficient condition for a subinvariant to be a lower bound but also a necessary condition. This is the main new contribution of this section.
Theorem 2 (Generalization of Thm. 1 (3)).
Let be a bounded postexpectation. Furthermore, let be a bounded expectation such that is a subinvariant of with respect to where is AST. There exist and bounded s.t.
[TABLE]
Example 3.
Let us consider the program for an asymmetric random walk
[TABLE]
with and . This program is not AST but the body of the loop is indeed AST. Furthermore, the postexpectation is bounded. If initially then is [math] after termination of the program. So, \textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({y}\right)\geq\left[{y>x}\right]\cdot\left(\tfrac{1}{3}\right)^{x}\cdot(y-x)\coloneqq I.
Now consider . We have , where (see App. E). As we have \tfrac{1}{400}\cdot I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({y}\right) we can conclude from Thm. 2 that I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{\mathit{rdw}}}\right\rrbracket\left({f}\right). Note that this is easier than relating and the termination probability as required by Thm. 1 since the probability of termination of the loop is independent of .
Of course, Ex. 3 is an artificial example. Nevertheless, it shows a strength of our generalization: it makes it easier to reason about bounded expectations which are independent of the probability of termination. However, a drawback of Thm. 1 remains: one already needs a lower bound, i.e., one has to be able to read off a lower bound directly from the program.
7. Upper Bounds and Fatou’s Lemma
We saw that Park induction for proving upper bounds does not require additional conditions such as conditional difference boundedness or even boundedness of or , respectively. The question arises whether this fact is also explainable using our canonical stochastic process. Indeed, the well–known Lemma of Fatou provides such an explanation. We will present a specialized variant of it which is sufficient for our purpose.
Lemma 1 (Fatou’s Lemma (cf. (Bauer, 1971, Lemma 2.7.1))).
Let be a stochastic process on a probability space . Then
[TABLE]
where the on the left–hand–side is point–wise.
We can now reprove Park induction for wp using Fatou’s Lemma: Let be a superinvariant, i.e., . By Thm. 10, the canonical stochastic process satisfies
[TABLE]
By applying on both sides, we obtain . This implies
[TABLE]
as . We conclude
[TABLE]
so is indeed an upper bound on the least fixed point.
Note that here we handle arbitrary loops, i.e., they are not necessarily AST. While being a superinvariant (plus some side conditions) still implies that is a supermartingale, the second part of Lem. 13 is not applicable, i.e., in general we have if the loop is not AST. So in this case we cannot use classic results from martingale theory. Nevertheless, Fatou’s Lemma combined with Thm. 14 and the first part of Lem. 13 provide a connection of Park induction for upper bounds to stochastic processes.
8. Lower bounds on the expected runtime
So far, we have developed techniques for verifying lower bounds on weakest preexpectations, i.e., expected values of random variables upon program termination. In this section, we transfer those techniques to verify lower bounds on expected runtimes of probabilistic programs. For this, we employ the ert–transformer (Kaminski et al., 2018, 2016), which is very similar to the wp-transformer: Given program and postruntime , we are interested in the expected time it takes to first execute and then let time pass (where is evaluated in the final states reached after termination of ). Again, the behavior (and the runtime) of depends on its input, so we are actually interested in a function mapping initial states to the respective expected time. For more details, see also (Kaminski, 2019, Chapter 7). Similarly to weakest preexpectations, expected runtimes can be determined in a systematic and compositional manner by means of the ert calculus:
Definition 1 (The ert–Transformer (Kaminski et al., 2018, 2016)).
Let pGCL be again the set of programs in the probabilistic guarded command language. Then the expected runtime transformer
[TABLE]
is defined according to the rules given in Table 2. We call the function the ert–characteristic function of the loop with respect to . Its least fixed point is understood in terms of the partial order . To increase readability, we will again usually omit ert, , , or from whenever they are clear from the context.
Example 2 (Applying the ert Calculus).
Consider the probabilistic program given by
[TABLE]
Suppose we want to know the expected runtime of . Then we need to determine \textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({0}\right). Reusing the annotation styles of Fig. 2(a) for wp, we make the following ert annotations:
[TABLE]
At the top, we read off the expected runtime of , namely . This tells us that the expected runtime of is if started in an initial state where is , and otherwise.
The ert– and the wp–transformers are not only similar in definition, but they are closely connected by the following equality (Olmedo et al., 2016):
[TABLE]
In addition, reasoning about upper bounds by Park induction works exactly the same way. For reasoning about lower bounds using subinvariants, notice above that \textsf{{ert}}\,\left\llbracket{C}\right\rrbracket\,\left({0}\right) is independent of . So, we can combine our derivation of Thm. 9 for lower bounds on wp in Sect. 4 and 5 with the equation above to establish the first inductive rule for verifying lower bounds on expected runtimes:
Theorem 3 (Inductive Lower Bounds on Expected Runtimes).
Let with and let harmonize with . Furthermore, let be the ert–characteristic function of the loop with respect to . If is conditionally difference bounded and , then
[TABLE]
We call an that satisfies a runtime subinvariant.
The proof of Thm. 3 can be found in Sect. F.1. We now illustrate the applicability of Thm. 3:
Example 4 (Coupon Collector (Pólya, 1930)).
Consider the well–known coupon collector’s problem: There are different types coupons. A collector wants to collect at least one of each type. Each time she buys a new coupon, its type is drawn uniformly at random. How many coupons does she (expectedly) need to buy in order to have collected at least one coupon of each type?
We can model this problem by the program for some non–zero natural number :191919In (Hark et al., 2020), the guard of the inner loop is just “” and thus, the body of the outer loop (as a standalone program) is not AST. Here, we fix this by changing the guard to “”. While we adapted the calculations accordingly, this does not affect our overall result since the outer loop is only entered if is positive.
[TABLE]
Variable represents the number of uncollected coupon types. The inner loop models the buying of new coupons until an uncollected type is drawn.202020The random assignment does — strictly speaking — not adhere to our pGCL syntax, but it can be modeled in pGCL. For the sake of readability, we opted for .
The expected runtime of is proportional to the expected number of coupons the collector needs to buy. We want to prove that is a lower bound on that expected runtime, where is the -th harmonic number, i.e., and . For this, we make the following annotations, reusing the annotation style of Fig. 3(a) (for more detailed annotations, see Sect. F.2):
[TABLE]
By our above annotations, we have shown that212121In (Hark et al., 2020), there was a typo in the invariant . In the second case, it said “” which leads to negative values. Here, we correct this mistake and adapt the calculations accordingly. Again, this does not change the overall result.
[TABLE]
is indeed a runtime subinvariant of the outer loop. Before we finish proving that is indeed a lower bound on the expected runtime of the outer loop, let us take a closer look at the meaning of : If is a lower bound, the outer loop takes at least expected runtime if is between and , and expected runtime if is larger than . In the second case, the too–large value suggests that we have to collect more coupons than there are different coupons. So we first collect arbitrary “excess coupons” before we enter the “normal coupon collector mode” and collect the remaining coupons in expected time . Indeed, (without case analysis) is not a lower bound on the expected runtime and we would in fact fail to prove its subinvariance.
For the inner loop, we have used the fact that this loop is a so–called independent and identically distributed loop, for which exact expected runtimes can be determined (Batz et al., 2018, Theorem 4). For more details, see Sect. F.2, Lem. 1. We stress that while in this case we had an exact expected runtime for the inner loop available by external techniques, a suitable underapproximation of the expected runtime of the inner loop using the technique presented in this paper (Thm. 3) would have worked as well. Hence, our technique is generally applicable to nested loops.
At the very top of the above annotations, we push over the initial assignment, thus verifying (and hence also ) as lower bound for the entire expected runtime of .
In order to establish that the subinvariant is in fact a lower bound, we are still left to prove conditional difference boundedness of . For this, we first make the following annotations:
[TABLE]
Now that we have determined \smash{\textsf{{wp}}\left\llbracket{\mathit{outer~{}loop~{}body}}\right\rrbracket\left({\bigl{|}I-I(s)\bigr{|}}\right)}, we finally bound :
[TABLE]
Hence, is bounded by a constant, as is constant within the program . Finally, we would still have to show , which is easily checked and thus omitted here. This concludes our lower bound proof for the coupon collector’s problem.
In the example above, we have verified that is a lower bound on the expected runtime of the coupon collector program. This lower bound enjoys several nice properties: For one, our lower bound is an exact asymptotic lower bound. Another fact is that our lower bound is a strict lower bound. The actual runtime is a bit higher, as we have omitted some constants. This is, however, a desirable fact, as often we are only interested in the asymptotic runtime and do not wish to bother with the constants. Notice further, that we never had to find the limit of any sequence. Loop semantics (be it wp or ert) were all applied only finitely many times in order to verify a tight asymptotic lower bound.222222This is also true for the technique we used for the inner loop. All in all, the above example demonstrates the effectiveness of our inductive lower bound rule.
9. Related Work
Weakest preexpectation reasoning.
The weakest preexpectation calculus goes back to the predicate transformer calculus by (Dijkstra, 1975, 1976), which provides an important tool for qualitative formal reasoning about nonprobabilistic programs. The probabilistic and quantitative analog to predicate transformers for nonprobabilistic programs are expectation transformers for probabilistic programs. Weakest–preexpectation–style reasoning was first studied in seminal work on probabilistic propositional dynamic logic (PPDL) by (Kozen, 1983, 1985). Its box– and diamond–modalities provide probabilistic versions of Dijkstra’s weakest (liberal) preconditions. Amongst others, (Jones, 1990), (Morgan et al., 1996), (McIver and Morgan, 2005), and (Hehner, 2011) have furthered this line of research, e.g., by considering nondeterminism and proof rules for bounding preexpectations in the presence of loops. Work towards automation of weakest preexpectation reasoning was carried out, amongst others, by (Chen et al., 2015), (Cock, 2014), (Katoen et al., 2010), and (Feng et al., 2017). Abstract interpretation of probabilistic programs was studied in this setting by (Monniaux, 2005).
Bounds on weakest preexpectations.
Rules for bounding weakest preexpectations were considered from very early on. Already (Kozen, 1983) provides an induction rule for verifying upper bounds. Pioneering work on lower bounds by means of limits of sequences was carried out by (Jones, 1990) and later reconsidered by (Audebaud and Paulin-Mohring, 2009). Proof rules that do not make use of limits were studied by (Morgan, 1996) and later more extensively in (McIver and Morgan, 2005). An orthogonal approach to lower bounds by means of bounded model checking was explored by (Jansen et al., 2016).
Advanced weakest preexpectation calculi.
Apart from reasoning about expected values of random variables at termination of simple pGCL programs, more advanced expectation–based calculi were invented. For instance, (Morgan and McIver, 1999) use expectation transformers to reason about temporal logic. More recently, (Olmedo et al., 2018) studies expectation transformers for probabilistic programs with conditioning. (Kaminski et al., 2016; Olmedo et al., 2016; Kaminski et al., 2018) introduce expectation based calculi to reason about expected runtimes of probabilistic programs. (Batz et al., 2019) present a quantitative separation logic together with a weakest preexpectation calculus for verifying probabilistic programs with pointer–access to dynamic memory.
In all of the above works, the rules for lower bounds rely throughout on finding limits of sequences as well as the sequences themselves. In particular, the proof of the (exact) expected runtime of the coupon collector by (Kaminski et al., 2016) requires a fairly complicated sequence, whereas our invariant in Ex. 4 was conceptually fairly easy and thus more informative for a human.
Martingale–based reasoning.
Probabilistic program analysis using martingales was pioneered by (Chakarov and Sankaranarayanan, 2013). Our rules rely on the notions of uniform integrability and conditional difference boundedness as well as the Optional Stopping Theorem. Previous works have also used these notions. (Barthe et al., 2016) focus on synthesizing exact martingale expressions. (Fioriti and Hermanns, 2015) develop a type system for uniform integrability in order to prove (positive) almost–sure termination232323Termination with probability 1 (within finite expected time). of probabilistic programs and give upper bounds on the expected runtime. (Fu and Chatterjee, 2019) give lower bounds on expected runtimes. (Kobayashi et al., 2020) provide a semi–decision procedure for lower bounding termination probabilities of probabilistic higher–order recursive programs. (Ngo et al., 2018) perform automated template–driven resource analysis, but infer upper bounds only.
The latter four works analyze the termination behavior of a probabilistic program, whereas we focus on general expected values, e.g., of program variables. Furthermore, we do not only make use of uniform integrability and/or conditional difference boundedness of some auxiliary stochastic process in order to prove soundness of our proof rules but establish tight connections between expectation–based reasoning via induction and martingale–based reasoning.
Other work on probabilistic program analysis by specialized kinds of martingales includes (Chakarov and Sankaranarayanan, 2014), (Chatterjee et al., 2016), (Chatterjee et al., 2017), (Agrawal et al., 2018), (Huang et al., 2018), (Fu and Chatterjee, 2019), and (Wang et al., 2019). For instance, regarding expected runtimes of probabilistic (and possibly nondeterministic) programs, (Fu and Chatterjee, 2019) construct difference bounded (as opposed to conditionally difference bounded, which is a strictly weaker requirement) supermartingales which have to correspond to the exact asymptotic expected runtime. In contrast, our rule allows for reasoning about strict lower bounds.
10. Conclusion
In this paper, we have studied proof rules for lower bounds in probabilistic program verification. Our rules are simple in the sense that the invariants need to be “pushed through the loop semantics” only a finite number of times, much like invariants in Hoare logic. In contrast, existing rules for lower bounds of unbounded weakest preexpectations required coming up with an infinite sequence of invariants, performing induction to prove relative inductiveness of two subsequent invariants, and then — most unpleasantly — finding the limit of this sequence. The main results of this paper are the following:
- (1)
We have presented the first inductive proof rules (Thm. 9 (a) and (b)) for verifying lower bounds on (possibly unbounded) weakest preexpectations of probabilistic while loops using quantitative invariants. Our inductive rules are given as an Optional Stopping Theorem (OST) for weakest preexpectations. They provide sufficient conditions for the requirement of uniform integrability which are much easier to check than uniform integrability in general. Case studies demonstrating the effectiveness but also the limitations of these rules are found in App. A. 2. (2)
For proving our OST, we resort to the classical OST from probability theory. However, for most notions that appear in the classical OST, like uniform integrability and conditional difference boundedness, we were able to find purely expectation–transformer–based counterparts (see Sect. 4 and 5). We thus conjecture that our OST can be proven in purely expectation–theoretic terms, which would most likely simplify the proof of our OST significantly as no probability theory would be required anymore. 3. (3)
We studied the inductive proof rules for lower bounds on bounded weakest preexpectations from (McIver and Morgan, 2005). Our results gave rise to a generalization of their proof rule to a sufficient and necessary criterion for lower bounds. (Thm. 2). 4. (4)
We have investigated a measure theoretical explanation for why verifying upper bounds using domain theoretical Park induction is conceptually simpler (Sect. 7). The underlying reason is the well–known Lemma of Fatou. This leads us to speculate that Fatou’s Lemma could be proved in purely domain theoretical terms, perhaps as an instance of Park induction. A successful attempt at a similar idea is due to (Baranga, 1991) who proved that the well–known Banach Contraction Principle is a particular instance of the Kleene Fixed Point Theorem. 5. (5)
We used the close connection between wp and ert to present the first inductive proof rule for lower bounding expected runtimes (Thm. 3). As an example to demonstrate the power of this rule, we inferred a nontrivial lower bound on the expected runtime of the famous coupon collector’s problem (Ex. 4).
Future work includes extending our proof rules for weakest preexpectation reasoning to recursive programs (Olmedo et al., 2016), to probabilistic programs with nondeterminism (McIver and Morgan, 2001, 2005), and to mixed–sign postexpectations. For the latter, this will likely yield more appealing proof rules for loops than those provided in (Kaminski and Katoen, 2017) which currently involve reasoning about sequences. Moreover, we are interested in (partially) automating the synthesis of the quantitative invariants needed in our proof rules.
Acknowledgements.
The authors gratefully acknowledge the support of the German Research Council (DFG) Research Training Group 2236 UnRAVeL and ERC Advanced Grant 787914 FRAPPANT. Furthermore, we would like to thank Florian Frohn and Christoph Matheja for many fruitful discussions on examples and counterexamples.
Appendix
This appendix contains additional material for our paper. App. A presents a collection of case studies to demonstrate the strengths and the limitations of our rule. In App. B, we give a more detailed introduction into the required preliminaries from probability theory. Afterwards, in App. C, we present the proofs for Sect. 4. App. D then contains the proofs of our main results in Sect. 5. In App. E we give the proofs for Sect. 6. Finally, App. F contains the proofs of our result for lower bounding the expected runtime from Sect. 8.
Appendix A Case Studies
Example 1 (Negative Binomial Loop (cf.
(McIver et al., 2018))).
Let us consider the program
[TABLE]
with . The characteristic function for of the program is given by
[TABLE]
The loop is expected to be executed times, so its expected looping time is finite. Intuitively, the value of after termination of the program is , i.e., the initial value of increases by the initial value of if the loop can be executed at all. Note that harmonizes with . We will prove that our intuition for is correct, i.e., that is indeed the least fixed point of .
First of all, it is a fixed point of :
[TABLE]
Since is indeed a fixed point of , it is also a subinvariant and furthermore finite. To apply Thm. 9 (b)* we need to check that is conditionally difference bounded. We derive*
[TABLE]
So is indeed conditionally difference bounded by . Hence, we can apply our new Optional Stopping Theorem (Thm. 9 (b)) to obtain . As is also fixed point itself, it is the least fixed point of , i.e.,
[TABLE]
Example 2 (Fair in the Limit Negative Binomial Loop).
Consider the slight adaption of from Ex. 1 to :
[TABLE]
with . Note that for any we have and is monotonically increasing in the value of . Therefore one can show using the ert–transformer (cf. (Kaminski et al., 2018)) that the expected runtime of is at most . So we have positive almost–sure termination and therefore finite expected looping time. Again, we would like to reason about the expected value of after termination of . The characteristic function for of the program is given by
[TABLE]
Intuitively, the value of after termination of the program should again be at least , i.e., the initial value of again increases at least by the initial value of if the loop can be executed at all. We will prove that this intuition is correct, so that is indeed a lower bound on the least fixed point of . Again, harmonizes with .
We first show that is a subinvariant:
[TABLE]
So is indeed a subinvariant of and . To apply Thm. 9 (b)* we need to check that is conditionally difference bounded.*
[TABLE]
So is indeed conditionally difference bounded by . Hence, by Thm. 9 (b)* is a lower bound on the least fixed point of , i.e.,*
[TABLE]
Example 3 (Negative Binomial Loop with Non–Constant Updates).
Let us consider another adaption of from Ex. 1 to the program :
[TABLE]
with . This program is positively almost–sure terminating and the expected number of loop iterations is . The characteristic function for of the program is given by \Phi_{f}(X)=\left[{x=0}\right]\cdot y+\left[{x>0}\right]\cdot\frac{1}{2}\cdot\Bigl{(}X\left[{y}\middle/{y+x}\right]\left[{x}\middle/{x-1}\right]+X\Bigr{)}.
Intuitively, the value of after termination of the program should be , i.e., the initial value of increases by the sum if the loop can be executed at all. We will prove that this intuition is correct, so that is indeed the least fixed point of . Again, harmonizes with .
First of all, we show that is a fixed point of :
[TABLE]
So is indeed a fixed point of and . To apply Thm. 9 (b)* we need to check that is conditionally difference bounded.*
[TABLE]
So is indeed conditionally difference bounded. Hence, by Thm. 9 (b)* is a lower bound on the least fixed point of . Since it is also a fixed point itself, we obtain*
[TABLE]
Example 4 (Probabilistic Doubling with Bounded Looping Time).
Let us consider the program
[TABLE]
with . The characteristic function for of the program is given by
[TABLE]
The looping time of this program is which is bounded by for any initial state . We will prove that is a lower bound on the expected value of after termination of the program.
[TABLE]
So is indeed a subinvariant. Furthermore, we have for any , as the loop body is loop–free. So by using Thm. 9 (a) we can deduce that is indeed a lower bound on the least fixed point of , i.e.,
[TABLE]
Note that in this case Thm. 9 (b) is not applicable. harmonizes with , but is not conditionally difference bounded:
[TABLE]
which is unbounded.
Nevertheless, even in the case of finite expected looping time, Thm. 9 just provides sufficient conditions for lower bounds. This is not surprising as conditional difference boundedness is a sufficient condition for uniform integrability but far from being necessary. The following example presents a limitation of our proof rule.
Example 5 (Probabilistic Doubling with Unbounded Looping Time).
Let us consider the program
[TABLE]
with and . The characteristic function for of the program is given by \Phi_{f}(X)=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}X\left[{a}\middle/{0}\right]+X\left[{b}\middle/{2\cdot b}\right]\Bigr{)}.
Now consider the sequence of invariants . Then : . Furthermore, \Phi_{f}(I_{n})=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}I_{n}\left[{a}\middle/{0}\right]+I_{n}\left[{b}\middle/{2\cdot b}\right]\Bigr{)}=\left[{a\neq 1}\right]\cdot b+\left[{a=1}\right]\cdot\tfrac{1}{2}\cdot\Bigl{(}b+n\cdot b\Bigr{)}=I_{n+1}. Hence, and by the Tarski–Kantorovich Principle Thm. 5. Moreover, each of the is a lower bound on the least fixed point, i.e., .
Furthermore, the loop is expected to be executed twice. Clearly, Thm. 9 (a)* and (c) are not applicable. Let . Then Thm. 9* (b)* is not applicable either, because is unbounded, i.e., is not conditionally difference bounded. To see this, let .*
[TABLE]
which can take an arbitrary large value as there is no bound on the value of . Hence Thm. 9 (b)* cannot be applied although is a lower bound. However, in (Kaminski et al., 2016) it is proved that the form an –subinvariant, hence, they are all lower bounds.*
Appendix B Details on Probability Theory
This section is devoted to a more detailed introduction of the concepts from probability theory that we use in our work.
B.1. –Fields
When setting up a probability space over some sample space , which can be any set, we have to distinguish the sets whose probabilities we want to be able to measure. The collection of these measurable sets is called a –field.
Definition 1 (–Field).
Let be an arbitrary set and . is called a –field over if the following three conditions are satisfied.
- (1)
, 2. (2)
* i.e., is closed under taking the complement,* 3. (3)
* i.e., is closed under countable union.*
The pair is called a measurable space. The elements of are called measurable sets.
In the setting of program verification, we have seen that is the set of all program runs. A program run is an infinite sequence of states, i.e., variable assignments. We regard –fields of the form , where is a collection of cylinder sets. (More precisely, we regard fields , where is the smallest –field containing all cylinder sets of order or smaller.) In our setting, a set of runs is a cylinder set of order if all runs in have the same first configurations, and all the following configurations can be arbitrary.
Let and be –fields over . Then is a –field over . Furthermore, if is a family of –fields over then so is .
For any set of subsets of , let consist of all elements that are contained in all –fields that are supersets of . The mapping from to is also called –operator.
Definition 2 (Generating –Fields).
Let . Then the smallest –field over containing is
[TABLE]
It turns out that there is a special case in which the generated –field is easy to describe, namely in the case where a countable covering of the space is given.
Lemma 3 (Generating –Fields for Covering of ).
If for a sequence and then
[TABLE]
Proof.
Showing that is a –field is enough to prove the desired result: it contains all the sets and every –algebra containing all the has to contain all their countable unions, i.e., :
by choosing .
Let . Then
Let . Then
∎
This special type of a –field can be used to describe the elements of the –fields (cf. Def. 5) and (cf. Def. 8). We will discuss it in more detail in Appendix C where we use it in the proofs.
Definition 4 (Borel–Field).
If we use its –field , the Borel–field with
[TABLE]
In this work, we use the concept of measurable maps (or “measurable mappings”). Measurable maps are the structure–preserving maps between measurable spaces. They are defined as follows.
Definition 5 (Measurable Map).
Let and be measurable spaces. A function is called an – measurable map or just measurable if for all
[TABLE]
* is the smallest –field such that is – measurable. Similarly, . This will become important when talking about random variables and conditional expected value.*
B.2. Probability Spaces
So far we have only introduced the concept of measurable spaces. Intuitively, a measurable space provides the structure for defining a measure. Probability spaces are measurable space attached with a certain measure where the measure of the sample space is .
Definition 6 (Probability Measure, Probability Space).
Let be a measurable space. A map is called a measure if
- (1)
** 2. (2)
* .*
A probability measure is a measure with . This implies that for every . If is a probability measure, then is called a probability space. In this setting a set in is called an event.
Here, the intuition for a probability measure is that for any set , is the probability that an element chosen from is contained in .
In this work we will consider properties that hold almost–surely, for example almost–sure termination or almost–sure convergence.
Definition 7 (Almost–Sure Properties).
Let be a probability space and some property, e.g., a logical formula. If (i.e., it is measurable) and then is said to hold almost–surely.
If a property holds almost–surely it does not need to hold for all . However, the measure cannot distinguish and if , so in the sense of , almost–surely holding properties can be considered as holding globally.
B.3. Integrals of Arbitrary Measures
We now introduce a notion of an integral with respect to an arbitrary measure. Therefore, we fix a measurable space and a measure . The objective is to define a “mean” of a measurable function . The basic idea is to partition the image of into sets on which has a constant value . Then we compute the weighted average of the , where the weights are the measures of the . This definition is fine if takes only finitely many values (these functions are called elementary). If takes infinitely (countable or even uncountable) many values, we have to approximate step by step by such functions with finite image. This yields a limit process. In this work we will only consider the cases where is a probability measure or a probability submeasure (i.e., ).
Definition 8 (Elementary Function, (Bauer, 1971, Def. 2.2.1)).
An elementary function is a nonnegative measurable function that takes only finitely many finite values, i.e., there exist and such that
[TABLE]
Lemma 9 (Decomposition of Elementary Functions (Bauer, 1971, Lemma 2.2.2)).
Let be an elementary function, where , for all and . Then
[TABLE]
Given an elementary function, the decomposition into a linear combination of indicator functions is not unique. But to define an integral we have to guarantee that its value does not depend on the chosen decomposition. Fortunately, this can be proved:
Definition 10 (Integral of Elementary Function, (Bauer, 1971, Def. 2.2.3)).
Let be an elementary function. Then we define its integral w.r.t. as
[TABLE]
The well–definedness is justified by Lem. 9 as it shows the independence of the chosen decomposition of the elementary function.
However, the measurable functions we use are not elementary. They take arbitrary (countably or even uncountably) many values. So we have to generalize Def. 10. It can be shown that any nonnegative measurable function is the limit of a monotonic increasing sequence of elementary functions.
Theorem 11 (Representation by Elementary Functions).
Let be a nonnegative measurable function. Then there exists a monotonic sequence of elementary functions such that
[TABLE]
Furthermore, for any two such sequences and we have .
Proof.
See (Bauer, 1971, Cor. 2.3.2., Thm. 2.3.6). ∎
This theorem justifies the following definition of an integral for an arbitrary nonnegative function.
Definition 12 (Integral of Arbitrary Functions).
Let be a nonnegative measurable function and a monotonic sequence of elementary functions such that . Then we define the integral of w.r.t. by
[TABLE]
Before we state the properties of the integral used in this work we will define the integral on a measurable subset of .
Definition 13 (Integral on Measurable Subset).
Let be a nonnegative measurable function and . Then is nonnegative and measurable, and we define
[TABLE]
With this definition of an integral, a very special property holds for monotonically increasing sequences of nonnegative functions: taking the limit (it always exists due to monotonicity) and the integral can be intertwined. We will focus on this when discussing uniform integrability.
Theorem 14 (Monotone Convergence Theorem, (Bauer, 1971, Thm. 2.3.4.)).
Let be a monotonic sequence of nonnegative measurable functions, i.e., is measurable and . Then is measurable and
[TABLE]
Lemma 15 (Properties of the Integral).
Let , be measurable functions, and . Then
[TABLE]
Proof.
See (Bauer, 1971, (2.2.4.), (2.3.6.), (2.3.7.), Cor. 2.3.5.). ∎
B.4. Random Variables
A random variable maps elements of one set to another set . If is a probability measure for (i.e., for , is the probability that an element chosen from is contained in ), then one obtains a corresponding probability measure for . For , is the probability that an element chosen from is mapped by to an element contained in . In other words, instead of regarding the probabilities for choosing elements from , one now regards the probabilities for the values of the random variable .
Definition 16 (Random Variable).
Let be a probability space. An – measurable map is a random variable. Instead of saying “– measurable” we simply use the notion “–measurable”. It is called discrete random variable, if its image is a countable set.
* is the induced probability measure by on . Instead of the notation is common. If is a singleton set, we also write instead of .*
Definition 17 (Expected Value).
Let be a random variable. Then .
Lemma 18 (Expected Value as Sum).
If is a discrete random variable we have . Note that this series has only countably many nonzero nonnegative summands. Hence, it either converges or it diverges to infinity.
Proof.
Let . Then . Hence
[TABLE]
∎
B.5. Uniform Integrability
Now given any stochastic process (i.e., a sequence of random variables, on a probability space ) that has an almost–surely existing limit the question arises whether we can construct the expectation of the limit as the limit of the expectations of the . However, this is false in general, a counterexample is given in (Grimmett and Stirzaker, 2001, Introduction of 7.10). Therefore, we distinguish stochastic processes with this special property.
Definition 19 (Uniform Integrability, (Grimmett and
Stirzaker, 2001, Thm. 7.10.(3))).
Let be a sequence of random variables converging almost–surely to a random variable . Then is uniformly integrable if and only if .
To check for uniform integrability is one of the main purposes of this work. There are two sufficient criteria which we will list below. The first one is the monotonic convergence theorem for random variables, a corollary of Thm. 14. It states that a sequence of monotonically increasing variables is always uniformly integrable.
Corollary 20 (Monotone Convergence Theorem for Random Variables, (Bauer, 1971, Thm. 2.3.4.)).
Let be a monotonic sequence of nonnegative random variables, i.e., is measurable and . Then is measurable and
[TABLE]
The second sufficient criterion states that if the sequence is bounded by an integrable random variable , then uniform integrability is given as well.
Lemma 21 (Bounded Stochastic Processes are Uniformly Integrable (Grimmett and
Stirzaker, 2001, Thm. 7.10.(4))).
Let be a sequence of random variables and a nonnegative random variable on a probability space with for all . If then is uniformly integrable.
B.6. Conditional Expected Values
We introduce the notion of conditional expected value w.r.t. a sub––field on a fixed probability space . The idea is that given a random variable and a subfield we would like to approximate by another –measurable random variable w.r.t. expectation. Intuitively, this means that we want to construct a (possibly infinite) nonnegative linear combination of the functions , in such a way that restricted to a set , the random variable and this linear combination have the same average value w.r.t. .
Definition 22 (Conditional Expected Value w.r.t. a –Field, (Bauer, 1971, Def. 10.1.2)).
Let be a random variable and be a sub––field of . A random variable is called a conditional expected value of w.r.t. if
- (1)
* is –measurable* 2. (2)
* for every .*
If two such random variables and exist, the just stated properties already ensure that . Therefore a conditional expected value is almost–surely unique which justifies the notation .
If the sub––field has the special structure described in Lem. 3 then the just stated property just needs to be checked on the generators.
Lemma 23 (Conditional Expected Value).
Let be a random variable. If for a sequence and then a –measurable function is a conditional expected value of iff
[TABLE]
Proof.
By definition it is left to show that for all implies for any . But due to Lem. 3 it is enough to show this for any disjoint union . We have and this series always converges point–wise as at most one of the summands is nonzero for any . Hence we have
[TABLE]
∎
It turns out that in our setting a conditional expectation always exists as it is almost surely finite.
Theorem 24 (Existence of Conditional Expected Values (Agrawal et al., 2018, Prop.
3.1.)).
Let be a random variable such that and let be a sub––field of . Then exists.
This theorem helps us to find bounds on the conditional expected value if we are unable to determine it exactly.
Lemma 25.
Let be a random variable with and for a sequence , and a –measurable function. We have iff
[TABLE]
Proof.
We prove the two directions separately.
- “”
Let . Then we have by definition of the conditional expectation
[TABLE]
by monotonicity of the integral.
- “”
By Thm. 24 the conditional expected value of exists and is itself a nonnegative –measurable random variable. Consider the random variable . It is a result of measure theory (cf. (Bauer, 1971)) that is a –measurable random variable. So consider the set . Then we know that is a disjoint union of some of the , so w.l.o.g. let us assume that . Again, a deep result from measure theory shows that (cf. (Bauer, 1971)) if . Then we have
[TABLE]
i.e., , a contradiction. So, we must have . As was chosen arbitrarily we have a.s.
∎
The same proof can also be done for the inverse inequality, i.e., we have the following corollary.
Corollary 26.
Let be a random variable with and for a sequence , and a –measurable function. We have iff
[TABLE]
Recall that is a random variable that is like , but for those elements that are not distinguishable in the sub––field , it “distributes the value of equally”. This statement is formulated by the following lemma.
Lemma 27 (Expected Value Does Not Change When Regarding Conditional Expected Values).
Let be a random variable on and let be a sub––field of . Then
[TABLE]
Proof.
[TABLE]
∎
The following theorem shows (a) that linear operations carry over to conditional expected values w.r.t. sub––fields, (b) that every random variable approximates itself if it is already measurable w.r.t. the sub––field , and (c) it allows to simplify multiplications with –measurable random variables. Moreover, (d) shows how to simplify expected values with several conditions.
Theorem 28 (Properties of Conditional Expected Value (Grimmett and
Stirzaker, 2001, p. 443)).
Let be random variables on and let be a sub––field of . Then the following properties hold.
- (a)
. 2. (b)
If is itself –measurable then . 3. (c)
If is –measurable then . 4. (d)
If is a sub––field of then .
Appendix C Proofs for Section 4
We start this section with a crucial observation which will ease the proofs we conduct here. Reconsider the filtration as presented in Def. 5. For every and every two distinct prefixes of length , their generated cylinder sets are disjoint, i.e., and . Therefore,
[TABLE]
As (cf. Def. 8) we directly get
[TABLE]
See 9
Proof.
We have to prove that is –measurable, i.e., for any .
Consider any run . just depends on , i.e., is constant on the cylinder set . Therefore it is constant on the generators of , i.e., on every with . Since there are only countably many of these generators, for any is a countable union of generators of , i.e., . ∎
See 10
Proof.
Due to Lem. 9, is –measurable. So it is left to show that for any
[TABLE]
Due to Lem. 23 it is enough to show 2 for the generators of , as any other set in is just a disjoint union of these generators. Hence, we prove 2 for for a prefix run . Furthermore, if the set is a nullset and hence, 2 holds trivially. So assume that and that is not a nullset, i.e., . Note that if for some , then and are identical on , as then for all . So in this case 2 holds trivially, too. Hence, we assume . We will use a case analysis to prove the desired result.
- (1)
, i.e., for all
[TABLE] 2. (2)
, i.e., for all
[TABLE]
∎
See 11
Proof.
Let us fix an arbitrary state . We will prove the result by induction.
- •
Induction base:
[TABLE]
- •
Assume the result holds for a fixed .
[TABLE]
∎
See 13
Proof.
Let in .
[TABLE]
If the program is universally almost–surely terminating (i.e., for any ), then for any . Furthermore, so by the previous result converges point–wise to on a set with probability . By definition converges almost–surely to . ∎
See 14
Proof.
Consider the expectation and the indicator function . Then by definition for all . To apply the Monotone Convergence Theorem we will calculate the expectation of . Note that is zero on the set . Hence,
[TABLE]
Hence by the Monotone Convergence Theorem (Cor. 20)
[TABLE]
∎
See 17
Proof.
By Def. 15, is uniformly integrable for every if and only if for every the equation \lim\limits_{n\to\omega}\phantom{}^{s}\mathbb{E}\left(X_{n}^{f,I}\right)=\phantom{}^{s}\mathbb{E}\left(\lim\limits_{n\to\omega}X_{n}^{f,I}\right)\overset{\textnormal{\lx@cref{creftype~refnum}{lemma:as_limit}}}{=}\phantom{}^{s}\mathbb{E}\left(X_{T^{\neg\varphi}}^{f}\right) holds. But due to Thm. 14 and AST, we have . So is uniformly integrable iff converges to . By Cor. 11, this is equivalent to the requirement that converges to . This is the definition of uniform integrability of for , cf. Def. 8. ∎
Appendix D Proofs for Section 5
See 2
Proof.
Let . We have already proved in Lem. 9 that is –measurable. First of all, by Cor. 11 we have . Secondly, by Thm. 10 and we have . By Def. 1 this proves that is a submartingale with respect to . ∎
See 8
Proof.
First of all, is –measurable as seen in Lem. 9. For any we have
[TABLE]
as harmonizes with . To show the result we will prove
[TABLE]
for any and use Lem. 23 to obtain the desired result. Note that both sides of this equality are [math] if , so in this case the equality holds trivially.
Take any such that . Furthermore, as both random variables \big{|}X_{n+1}^{f,I}-X_{n}^{f,I}\big{|} and are constant zero if all runs in have a looping time , 4 holds trivially in this case as well. Note that for any , H(X)(s)=\left[{\varphi}\right](s)\cdot\textsf{{wp}}\left\llbracket{C}\right\rrbracket\left({X}\right)(s) is zero if . So, assume .
[TABLE]
As we have already seen in Appendix C that , we use Lem. 23 to conclude our desired result \phantom{}{}^{s}\mathbb{E}\left(\big{|}X_{n+1}^{f,I}-X_{n}^{f,I}\big{|}~{}\Big{|}~{}\mathfrak{G}_{n}^{\mbox{\tiny\rm loop}}\right)~{}{}={}~{}X_{n}^{0,\Delta I}. ∎
The following auxiliary lemma is needed for the proof of Thm. 9.
Lemma 1 (Sufficient Condition for Uniform Integrability for a Fixed State).
Let be a conditionally difference bounded expectation that harmonizes with , and . Let the expected looping time of be finite for , where is AST, i.e., and . Then for all and
[TABLE]
Proof.
We present a proof based on the proof of the Optional Stopping Theorem given in (Grimmett and Stirzaker, 2001, Thm 12.5.(9)). Consider the process as studied in Sect. 4. As harmonizes with , we have seen in Thm. 8 that if is conditionally difference bounded by the constant , then , where belongs to the filtration defined in Def. 8. Now we will show that is uniformly integrable w.r.t. .
Note that by definition of (Def. 7), we have : Let and . Then if , we have . If on the other hand , we have .
We have for any , and any run :
[TABLE]
We will show that the expectation of is finite.
[TABLE]
By Lem. 21 the uniform integrability of w.r.t. follows. Therefore, we have that by Cor. 17. Furthermore we can extract the following for any :
[TABLE]
So we have just shown that for any . ∎
Corollary 2 (Sufficient Condition for Uniform Integrability).
Let be a conditionally difference bounded expectation that harmonizes with , , and let the expected looping time of be finite for every initial state , where is AST. Then I is uniformly integrable for and for any .
Proof.
The result follows immediately by applying Lem. 1 to every state . ∎
See 9
Proof.
That the subinvariant is a lower bound iff it is uniformly integrable for is exactly Thm. 7. Nevertheless, we present a proof for the whole theorem in analogy to the Optional Stopping Theorem (Thm. 3).
First of all, recall that holds (cf. the proof of Cor. 2).
Secondly, in any of the three cases (a) to (c), we have for any : in (a) it is a precondition, in (b) it holds due to Cor. 2, and in (c) the boundedness of and implies that is bounded as well. So in particular, it is finite (cf. (McIver and Morgan, 2005)). Therefore in any of the three cases, is a submartingale by Lem. 2 as is a subinvariant.
Furthermore, in any of the three cases (a) to (c), is universally almost surely terminating. Hence by Lem. 13 we have
[TABLE]
almost–surely for every . So if we can prove for all of the three cases (a) to (c) that is uniformly integrable for any , then we have independent of :
[TABLE]
i.e., I\preceq\textnormal{{{lfp}}}~{}\Phi_{f}=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right) as desired.
We will now use the Optional Stopping Theorem (Thm. 3) and Cor. 2 to prove the uniform integrability.
- (a)
Let . Then there is an with , i.e., the looping time of is almost–surely bounded for any . So by Thm. 3 (a), is uniformly integrable for any . 2. (b)
Due to Cor. 2, is uniformly integrable for . Hence is uniformly integrable by Cor. 17. As is a submartingale, the result follows from Thm. 3. 3. (c)
If and are bounded, then so is the process . By Thm. 3 (c) is uniformly integrable.
∎
Example 3 (Details on Ex. 7 and 11).
Reconsider the program , given by ex:calculations_running_example
[TABLE]
The characteristic function of the while loop with respect to postexpectation is given by
[TABLE]
We have seen that
[TABLE]
and
[TABLE]
are fixed points of and in Ex. 11 we proved that is indeed the least fixed point. So, cannot be the least fixed point. Hence, cannot be uniformly integrable and in particular, it cannot be conditionally difference bounded. Indeed, is unbounded: Let . For any and any arithmetic expression , let denote the state with for and , where is obtained by extending states from variables to arithmetic expressions in the straightforward way.
[TABLE]
So we have which is unbounded. So does not satisfy the preconditions of Thm. 9 (b), hence our proof rule sorts out this invariant. Note that neither (a) (as the looping time is unbounded) nor (c) (as neither nor are bounded) are applicable.
Appendix E Proofs for Section 6
We will show that Thm. 1 can be easily inferred from our results in Sect. 4 and we can generalize (3) to a complete proof rule. To do so, we will make use of the Martingale Convergence Theorem of which we present a specialized version suitable for our purposes:
Theorem 1 (Martingale Convergence Theorem (Grimmett and
Stirzaker, 2001, Thm.12.3.(1))).
Let be a submartingale on a probability space with respect to a filtration . If there is a constant such that for every then there exists a random variable such that i.e., converges almost surely to . Furthermore, is uniformly integrable, i.e., Moreover,
[TABLE]
If is a martingale, i.e., for all we have , then we even have
[TABLE]
Now let be bounded such that , i.e., is a subinvariant and assume there is some with . Then the process satisfies for every . By Thm. 1 there exists a random variable such that converges to almost surely. By Lem. 13 we get that for any run with we must have , i.e., w.l.o.g. we can assume . As is a subinvariant we have by Lem. 2 that is a submartingale. We conclude for an arbitrary initial state
[TABLE]
Consequently,
[TABLE]
If is a fixed point of , then the process is a martingale. By Thm. 10 we have for an arbitrary initial state
[TABLE]
Hence, in this case instead of holds in 5. We will now discuss the results of Thm. 1. First of all, T(s)=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({1}\right)(s)=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}<\omega\right) by using and Thm. 14.
We will now prove See 1
Proof.
- (1)
Assume for some predicate , i.e., . W.l.o.g. let as the claim holds trivially if . Then \phantom{}{}^{s}\mathbb{E}\Bigl{(}\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\cdot\underbrace{X^{f,I}_{\omega}}_{\leq 1}\Bigr{)}\leq\phantom{}^{s}\mathbb{E}\Bigl{(}\left[{(T^{\neg\varphi})^{-1}(\{\omega\})}\right]\Bigr{)}=\phantom{}^{s}\mathbb{P}\left(T^{\neg\varphi}=\omega\right). By 5 we get so we have
[TABLE] 2. (2)
Assume that for some predicate we have . Again, w.l.o.g. let . But then we must have , i.e., . So, and by 5 we have \left[{G}\right](s)\cdot I(s)=I(s)\leq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right)(s), i.e.,
[TABLE] 3. (3)
Assume there is some with . By definition, . By 5, we have
[TABLE]
i.e., . By definition, is a fixed point of . Thus,
[TABLE]
So by 5 we can conclude that for any state , i.e.,
[TABLE]
∎
Notice that we have not used the fact that is the termination probability but only that T=\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right) for some bounded postexpectation . Furthermore, if is a lower bound, by definition I\preceq\textsf{{wp}}\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\left({f}\right). Hence, we have generalized Thm. 1 (3) in case of a loop with universally almost–surely terminating body to a complete characterization of lower bounds. So we have proved the following theorem.
See 2
Example 2 (Details on Ex. 3).
Let us consider the program
[TABLE]
with and . Note that this program is not AST. Furthermore, the postexpectation is bounded. If initially then is [math] after termination of the program. So, \textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({y}\right)\geq\left[{y>x}\right]\cdot\left(\tfrac{1}{3}\right)^{x}\cdot(y-x)\coloneqq I.
Now consider . We have , where .
[TABLE]
If , then obviously by the calculation above. If , then we have
[TABLE]
as for every even we have and for every odd we have .
We have \tfrac{1}{400}\cdot I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({y}\right). Thus, we can conclude from Thm. 2 that I^{\prime}\preceq\textsf{{wp}}\left\llbracket{C_{rdw}}\right\rrbracket\left({f}\right). Note that this is easier than relating and the termination probability as required in Thm. 1 as does not influence the termination behavior of the loop.
Appendix F Details for Section 8
F.1. Proofs
See 3
Proof.
Remember the connection between wp and ert (cf. (Olmedo et al., 2016, Thm. 5.2)): For any probabilistic program we have
[TABLE]
Our goal is to show for all . We use induction on to prove this result. In the base case we have . Here, we obtain
[TABLE]
In the induction step we use the induction hypothesis . Then we have
[TABLE]
So holds for an arbitrary with .
Now let . Then one of the following two cases occurs.
- (1)
\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)=\infty
In this case we have by 6 \textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)(s)=\infty\geq I(s). 2. (2)
\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)<\infty
In this case we have . First of all, in Thm. 9 (b) we have seen, that if is conditionally difference bounded, and for every then we have . However, in Thm. 9 (b) we need that the expected looping time is finite for every initial state . As we cannot ensure this condition, we use Lem. 1, a specialized result used in the proof of Thm. 9 which is indeed dependent on the initial state .
Furthermore, the expected runtime of the program with initial state is finite so, the expected looping time of the program has to be finite as well, i.e., \textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({0}\right)(s)<\infty implies .
Hence, as harmonizes with , is conditionally difference bounded,, and . Thus, we can apply Lem. 1 and get
[TABLE]
Hence we have
[TABLE]
But the sequence is monotonically increasing as is an ert–subinvariant. Hence, I(s)\leq\lim\limits_{n\to\omega}{}^{\textsf{{ert}}}\Phi^{n}_{t}(I)(s)=\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right)(s).
Combining these results, we get I\leq\textsf{{ert}}\,\left\llbracket{\textnormal{{while}}\left(\,{\varphi}\,\right)\left\{\,{C}\,\right\}}\right\rrbracket\,\left({t}\right). ∎
F.2. Details for Example 4
More detailed annotations for the outer loop of the coupon collector are as follows:
[TABLE]
For the inner loop, we make use of the following Lemma, for which we also give a detailed proof in the following:
Lemma 1.
Let be a runtime (i.e. an expectation) such that does not depend on program variable . Then the following expected runtime annotation is valid:
[TABLE]
Proof.
We employ (Batz et al., 2018, Theorem 4) for so-called –independent and identically distributed loops (-i.i.d. loops for short) (see (Batz et al., 2018, Definition 5)). In order to verify the -i.i.d.-ness of the loop , we have to establish that neither
[TABLE]
nor
[TABLE]
depend on program variable which is indeed the case by the assumption that does not depend on . Additionally to -i.i.d.-ness, (Batz et al., 2018, Theorem 4) requires us to establish that
[TABLE]
does not depend on variable and that the loop body terminates almost-surely, i.e.
[TABLE]
Both conditions are obviously true.
Having established all preconditions of (Batz et al., 2018, Theorem 4), we can now make the following ert-annotations (recall that such annotations are best read from bottom to top):
[TABLE]
It is important to note that, again, any loop semantics needed to be applied only a finite number of times. In particular, it was not necessary to find the limit of a sequence or anything alike. ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Agrawal et al . (2018) Sheshansh Agrawal, Krishnendu Chatterjee, and Petr Novotný. 2018. Lexicographic Ranking Supermartingales: An Efficient Approach to Termination of Probabilistic Programs. PACMPL 2, POPL (2018), 34:1–34:32.
- 3Audebaud and Paulin-Mohring (2009) Philippe Audebaud and Christine Paulin-Mohring. 2009. Proofs of Randomized Algorithms in Coq. Science of Computer Programming 74, 8 (2009), 568–589.
- 4Back and von Wright (1998) Ralph-Johan Back and Joakim von Wright. 1998. Refinement Calculus - A Systematic Introduction . Springer.
- 5Baranga (1991) Andrei Baranga. 1991. The Contraction Principle as a Particular Case of Kleene’s Fixed Point Theorem. Discrete Mathematics 98, 1 (1991), 75–79.
- 6Barthe et al . (2016) Gilles Barthe, Thomas Espitau, Luis María Ferrer Fioriti, and Justin Hsu. 2016. Synthesizing Probabilistic Invariants via Doob’s Decomposition. In Proc. of the International Conference on Computer–Aided Verification (CAV) (Lecture Notes in Computer Science) , Vol. 9779. Springer, 43–61.
- 7Batz et al . (2019) Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Thomas Noll. 2019. Quantitative Separation Logic: a Logic for Reasoning about Probabilistic Pointer Programs. PACMPL 3, POPL (2019), 34:1–34:29.
- 8Batz et al . (2018) Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2018. How Long, O Bayesian Network, will I Sample Thee? - A Program Analysis Perspective on Expected Sampling Times. In Proc. of the European Symposium on Programming Languages and Systems (ESOP) (Lecture Notes in Computer Science) , Vol. 10801. Springer, 186–213.
