This paper introduces probabilistic extensions of the lambda calculus for call-by-value and call-by-name evaluation strategies, establishing confluence and standardization properties, and unifies them through a linear logic-based calculus for better control of probabilistic choice and copying.
Contribution
It presents novel probabilistic lambda calculi with confluence and standardization proofs, and unifies them via a linear logic framework for enhanced control over probabilistic computation.
Findings
01
Both calculi enjoy confluence and standardization.
02
A unified calculus based on Linear Logic is developed.
03
The approach allows fine control of choice and copying interactions.
Abstract
We introduce two extensions of the λ-calculus with a probabilistic choice operator, Λ⊕cbv and Λ⊕cbn, modeling respectively call-by-value and call-by-name probabilistic computation. We prove that both enjoys confluence and standardization, in an extended way: we revisit these two fundamental notions to take into account the asymptotic behaviour of terms. The common root of the two calculi is a further calculus based on Linear Logic, Λ⊕!, which allows for a fine control of the interaction between choice and copying, and which allows us to develop a unified, modular approach.
Equations56
\mu(M)=\left\{\begin{array}[]{ll}p&\mbox{ if }p=\sum_{i\in I}p_{i}\mbox{ s.t. }M_{i}=M\\
0&\mbox{otherwise;}\end{array}\right.
\mu(M)=\left\{\begin{array}[]{ll}p&\mbox{ if }p=\sum_{i\in I}p_{i}\mbox{ s.t. }M_{i}=M\\
0&\mbox{otherwise;}\end{array}\right.
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Full text
Lambda Calculus and Probabilistic Computation
(Extended Version)
Claudia Faggian
Université de Paris, IRIF, CNRS, France
Simona Ronchi della Rocca
Dip. di Informatica, Università di Torino, Italy
Abstract
We introduce two extensions of the λ-calculus with a probabilistic choice operator, Λ⊕cbv and Λ⊕cbn, modeling respectively call-by-value and call-by-name probabilistic computation. We prove that both enjoys confluence and standardization, in an extended way: we revisit these two fundamental notions to take into account the asymptotic behaviour of terms. The common root of the two calculi is a further calculus based on Linear Logic, Λ⊕!, which allows us to develop a unified, modular approach.
I Introduction
The pervasive role of stochastic models in a variety of domains (such as machine learning, natural language, verification) has prompted a vast body of research on probabilistic programming languages; such a language supports at least discrete distributions by providing an operator which models sampling.
In particular, the functional style of probabilistic programming, pioneered by [28], attracts increasing interest because it allows for higher-order computation, and offers a level of abstraction well-suited to deal with mathematical objects. Early work [18, 24, 22, 26, 23]
has evolved in a growing body of software development and theoretical research.
In this context, the λ-calculus has often been used as a core language.
In order to model higher-order probabilistic computation, it is a natural approach to take the λ-calculus as general paradigm, and to enrich it with a probabilistic construct. The most simple and concrete way to do so ([10, 8, 13]) is to equip the untyped λ-calculus with an operator ⊕,
which models flipping a fair coin. This suffices to have universality, as proved in [8], in the sense that the calculus is
sound and complete with respect to computable probability distributions.
The resulting calculus is however non-confluent, as it has been observed early (see [8] for an analysis).
We revise the issue in Example 1. The problem with confluence is handled in the literature by fixing a deterministic reduction strategy, typically the leftmost-outermost strategy.
This is not satisfactory both for theoretical and practical reasons, as we discuss later.
In this paper, we propose a more general point of view. Our goal is a foundational calculus, which plays the same role as the λ-calculus does for deterministic computation. More precisely, taking the point of view propounded by Plotkin in [25], we discriminate between a calculus and a programming language. The former defines the reduction rules, independently from any reduction strategy, and enjoys confluence and standardization, the latter is specified by a deterministic strategy (an abstract machine). Standardization is what relates the two: the programming language implements the standard strategy associated to the calculus. Indeed, standardization implies the existence of a strategy (the standard strategy) which is guaranteed to reach the result, if it exists.
In this spirit, we consider a probabilistic calculus to be characterized by a specific calling mechanism;
the reduction is otherwise only constrained by the need of discriminating between duplicating a function which samples from a distribution, and duplicating the result of sampling. Think of tossing a coin and
duplicating the result, versus tossing the coin twice, which is indeed the issue at the core of confluence failure, as the following examples (adapted from [9, 8]) show.
Example 1** (Confluence).**
Let us consider the untyped λ-calculus extended with a binary operator ⊕ which models fair, binary probabilistic choice: M⊕N reduces to either M or N with equal probability 1/2; we write this as M⊕N→{M21,N21}. Intuitively, the result of evaluating a probabilistic term is a distribution on its possible values.
*Consider the term PQ, where P=(λz.z\leavevmodeXOR\leavevmodez), and Q=(T⊕F); \leavevmodeXOR\leavevmode is the standard construct for exclusive OR,
T=λxy.x and F=λxy.y code the boolean values. *
– If we first reduce Q,
we obtain (λz.z\leavevmodeXOR\leavevmodez)T or (λz.z\leavevmodeXOR\leavevmodez)F, with equal probability 1/2. This way, PQ evaluates to {F1}, i.e.F with probability 1.
*– If we reduce the outermost redex first, PQ reduces to
(T⊕F)\leavevmodeXOR\leavevmode(T⊕F), and the term evaluates to the distribution {T21,F21}. *
The two resulting distributions are not even comparable.
2. 2.
The same phenomenon appears even if we restrict ourselves to
call-by-value. Consider for example the reductions of PN with P as in 1), and N=(λxy.x⊕y). We obtain the same two different distributions as above.
In this paper, we define two probabilistic λ-calculi, respectively based on the call-by-value (CbV) and call-by-name (CbN) calling mechanism. Both enjoy confluence and standardization, in an extended way: indeed we revisit these two fundamental notions to take into account the asymptotic behaviour of terms. The common root of the two calculi is a further calculus based on Linear Logic, which is an extension of Simpson’s linear λ-calculus [30], and which allows us to develop a unified, modular approach.
Content and Contributions
In Section IV, we introduce a call-by-value calculus, denoted Λ⊕cbv, as a probabilistic extension of the call-by-value λ-calculus of Plotkin (where the β-reduction fires only in case the argument is a value, i.e. either a variable or a λ-abstraction). We choose to study in detail call-by-value for two main reasons.
First, it is the most relevant mechanism to probabilistic programming (most of the abstract languages we cited are call-by-value, but also real-world stochastic programs such as Church [16]). Second, call-by-value is a mechanism in which dealing with functions, and duplication of functions, is clean and intuitive, which allows us to address the issue at the core of confluence failure.
The definition of value (in particular, a probabilistic choice is not a value) together with a suitable restriction of the evaluation context for the probabilistic choice, allow us to recover key results: confluence and a form of standardization (Section V). Let us recall that, in the classical λ-calculus, standardization means that there is a strategy which is complete for all reduction sequences, i.e., for every reduction sequence M→∗N there is a standard reduction sequence from M to N.
A standard reduction sequence with the same property exists also here. An unexpected result is that strategies which are complete in the classical case, are not so here, notably the leftmost strategy.
In Section VI we study the asymptotic behavior of terms.
Our leading question is how the asymptotic behaviour of different sequences starting from the same term compare.
We first analyze if and in which sense confluence implies that the result of a probabilistically terminating computation is unique. We formalize the notion of asymptotic result via limit distributions, and establish that there is a unique maximal one.
In Section VII we address the question of how to find
such greatest limit distribution, a question which arises from the fact that evaluation in Λ⊕cbv is non-deterministic, and different sequences may terminate with different probability.
With this aim, we extend the notion of standardization to limits; this extension is non-trivial, and demands the development of new sophisticated proof methods.
We prove that the new notion of standardization supplies a family of complete reduction strategies which are guaranteed to reach the unique maximal result.
Remarkably, we are able to show that, when evaluating programs, i.e., closed terms, this family does include the leftmost strategy. As we have already observed, this is the deterministic strategy which is typically adopted in the literature, in either its call-by-value ([18, 7]) or its call-by-name version ([10, 13]), but without any completeness result with respect to probabilistic computation.
Our result offers an “a posteriori” justification for its use!
The study of Λ⊕cbv allows us to develop a crisp approach, which we are then able to use in the study of different probabilistic calculi.
Because the issue of duplication is central, it is natural to expect a benefit from the fine control over copies which is provided by Linear Logic. In Section IX we use our tools to introduce and study a probabilistic linearλ-calculus, Λ⊕!. The linear calculus provides not only a finer control on duplication, but also a modular approach to confluence and standardization, which allow us to formalize a call-by-name version of our calculus, namely Λ⊕cbn, in Section X. We prove that Λ⊕cbn enjoys properties analogous to those of Λ⊕cbv, in particular confluence and standardization.
In Section II we provide the reader with some background and motivational observations.
Basic notions of discrete probability and rewriting are reviewed in Section III.
Related Work
The idea of extending the λ-calculus with a probabilistic construct is not new;
without any ambition to be exhaustive, let us cite [22, 26], [10, 13, 8, 5]. In all these cases, a specific reduction strategy is fixed; they are indeed languages, not calculi, according to Plotkin’s distinction.
The issue about confluence appears every time the λ-calculus is extended with a choice effect: quantum, algebraic, non-deterministic. The ways of framing the same problem in different settings are naturally related, and we were inspired by them. Confluence for an algebric calculus is dealt with in [1] for the call-by-value, and in [31] for the call-by-name. In the quantum case we would like to cite [7, 6], which are based on Simpson’s calculus [30]. A probabilistic extension of Simpson’s calculus was first proposed in [11].
The language is similar to that of Λ⊕!; however in [11] (as also in [7, 6]) no reduction (not even β) is allowed in the scope of a !-operator. The reduction there hence corresponds to surface reduction, which in Sec. IX we show to be the standard strategy for Λ⊕!.
To our knowledge, the only proposal of a probabilistic λ-calculus in which the reduction is independent from a specific strategy is for call-by-name, namely the calculus of [19], in the line of work of differential [14] and algebric [31] λ-calculus.
The focus in [19] is essentially semantical, as the author want to study an equational theory for the λ-calculus, based on an extension of Böhm trees.
[19] develops results which in their essence are similar to those we obtain for call-by-name in Sec. X, in particular confluence and standardization, even if
his calculus –which internalizes the probabilistic behavior– is quite different from ours, and so are the proof techniques.
Finally,
we wish to mention that proposals of a probabilistic λ-calculus could also be extracted from semantical models, such as
the one in
[3], which develops an idea earlier presented in [29], and in which the notion of graph models for λ-calculus has been extended with a probabilistic construct.
II Background and Motivational observations
In this section, we first review -in a non-technical way- the specific features of probabilistic programs, and how they differ from classical ones.
We then focus on some motivational observations which are relevant to our work. First, we give an example of features which are lost if a programming language is characterized by a strategy which is not rooted in a more general calculus. Then, we illustrate some of the issues which appear when we study a general calculus, instead of a specific reduction strategy. Addressing these issues will lead us to develop new notions and tools.
II-A Classical vs. Probabilistic Programs
A classical program
defines a deterministic input-output relation; it
terminates (on a given input), or does not; if it terminates, the program only runs for a finite number of steps.
Instead, a probabilistic program generates a probability distribution over possible outputs; it
terminates (on a given input) with a certain probability;
it may have runs which take infinitely many steps even when termination has probability 1.
A probabilistic program is a stochastic model.
The intuition is that the probabilistic program P is executed, and random choices are made by sampling;
this process defines a distribution over all the possible outputs of P.
Even if the termination probability is 1 (almost sure termination), that degree of certitude
is typically not reached in any finite number of steps, but it appears as a limit. A standard example is a term M which reduces to either the normal form T or M itself, with equal probability 1/2. After n steps, M reduces to T with probability
21+221+⋯+2n1. Only at the limit this computation terminates with probability 1 .
Probabilistic vs. Quantitative
The notion of probabilistic termination is what sets apart probabilistic λ-calculus from other quantitative calculi such as those in [1, 14, 31], and from the non-deterministic λ-calculus [9].
For this reason, the asymptotic behaviour of terms will be the focus of this paper.
II-B Confluence of the calculus is relevant to programming
Functional languages have their foundation in the λ-calculus and its properties, and such properties (notably, confluence and standardization) have theoretical and practical implications.
A strength of classical functional languages -which is assuming growing importance-
is that they are inherently parallel (we refer e.g. to [21] for discussion on deterministic parallel programming): every sub-expression can be evaluated in parallel,
because of referential transparency; still, we can perform reasoning, testing and debugging on a program using a sequential model,
because the result of the calculus is independent from the evaluation order.
Not to force a sequential strategy impacts the implementation of the language, but also the conception of programs. As advocated by Harper, the parallelism of functional languages exposes the
“dependency structure of the computation by not introducing any dependencies that are not forced on us by the nature of the computation itself."
This feature of functional languages is rooted in the confluence of the λ-calculus, and is an example of what is lost in the probabilistic setting, if we give-up either confluence, or the possibility of non-deterministic evaluation.
II-C The result of probabilistic computation
A ground for our approach is the distinction between calculus and language.
Some of the issues which we will need to address do not appear when working with probabilistic languages, because they are based on a simplification of the λ-calculus.
Programming languages only evaluate programs, i.e., closed terms (without free variables). A striking simplification appears from another crucial restriction, weak evaluation, which does not evaluate function bodies (the scope of λ-abstractions). In weak call-by-value (base of the ML/CAML family of probabilistic languages) values are normal forms.
What is the result of a probabilistic computation is well understood only in the case of programming languages:
the result of a program is a distribution on its possible outcomes, which are normal forms w.r.t. a chosen strategy.
In the literature of probabilistic λ-calculus, two main deterministic strategies have been studied: weak left strategy in CbV [8] and head strategy in CbN [13], whose normal forms are respectively the closed values and the head normal forms.
When considering a calculus instead of a language, the identity between normal forms and results does not hold anymore, with important consequences in the definition of limit distributions.
We investigate this issue in Sec. VI.
The approach we develop is general and uniform to all our calculi.
III Technical Preliminaries
We review basic notions on discrete probability and rewriting which we use through the paper.
We assume that the reader has some familiarity with the λ-calculus.
III-A Basics on Discrete Probability
A discrete probability space is given by a pair (Ω,μ),
where Ω is a countable set, and μ is a discrete probability distribution
on Ω, i.e. is a function from Ω to [0,1]⊂R such that ∥μ∥:=∑ω∈Ωμ(ω)=1. In this case,
a probability measure is assigned to any subset A⊆Ω as μ(A)=∑ω∈Aμ(ω).
In the language of probability theory, a subset of Ω is called an event.
Let (Ω,μ) be as above. Any functionF:Ω→Δ, where Δ is another countable set,
induces a probability distributionμF on Δ
by composition: μF(d∈Δ):=μ(F−1(d))i.e.μ{ω∈Ω:F(ω)=d}.
In the language of probability theory, F is called a discrete random variable on (Ω,μ).
Example 2** (Die).**
Consider tossing a die once. The space of possible outcomes is the
set Ω={1,2,3,4,5,6}. The probability measure μ of each outcome is 1/6. The event “result is odd" is the subset O={1,3,5}, whose probability measure is μ(O)=1/2.
2. 2.
Let Δ be a set with two elements {Even,Odd}, and F the obvious function from Ω to
Δ. F induces a distribution on Δ, with μF(Even)=1/2 and μF(Odd)=1/2.
III-B Subdistributions and DST(Ω)
Given a countable set Ω, a function μ:Ω→[0,1] is a probability subdistribution if
∥μ∥≤1.
We write DST(Ω) for the set
of subdistributions on Ω. With a slight abuse of language, we will use the term distribution also for subdistribution.
Subdistributions allow us to deal with partial results and non-successful computations.
Order:DST(Ω) is equipped with the standard order relation of functions : μ≤ρ if
μ(ω)≤ρ(ω) for each ω∈Ω.
Support: The support of μ is Supp(μ)={ω:μ(ω)>0}.
Representation:
We represent a distribution by explicitly indicating the support, and (as superscript) the probability assigned to each element by μ. We write μ={a0p0,…,anpn} if μ(a0)=p0,…,μ(an)=pn and μ(aj)=0 otherwise.
III-C Multidistributions
To syntactically represent the global evolution of a probabilistic system, we rely on the notion of multidistribution [2].
A multiset is a (finite) list of elements, modulo reordering, i.e.[a,b,a]=[a,a,b]=[a,b]; the multiset [a,a,b] has three elements. Let
X be a countable set
and m a multiset of pairs of the form pM, with p∈]0,1], and M∈X.
We call m=[piMi∣i∈I] (where the index set I ranges over the elements of m) a multidistribution on X if
∑i∈Ipi≤1. We denote by MDST(X) the set of all multidistributions on X.
We write the multidistribution [1M] simply as [M].
The sum of multidistributions is denoted by +, and it is the concatenation of lists.
The product q⋅m of a scalar q and a multidistribution m is defined pointwise: q⋅[p1M1,...,pnMn]=[(qp1)M1,...,(qpn)Mn].
Intuitively, a multidistribution m∈MDST(X) is a syntactical representation of a discrete probability space where at each element of the space is associated a probability and a term of X.
To the multidistribution m=[piMi∣i∈I] we associate a probability distribution μ∈DST(X) as follows:
[TABLE]
and we call μ the probability distribution associated to m.
Example 3** (Distribution vs. multidistribution).**
If m=[21a,21a], then μ={a1}. Please observe the difference between distribution and multidistribution: if m′=[1a], then m=m′, but μ=μ′.
III-D Binary relations (notations and basic definitions)
Let →r be a binary relation on a set X. We denote
→r∗ its reflexive and transitive closure.
We denote =r the reflexive, symmetric and transitive closure of →r.
If u∈X, we write u→r
if there is no t∈X such that
u→rt; in this case, u is in →r-normal form.
Figures convention: as is standard, in the figures we depict →∗ as ↠; solid arrows are universally quantified, dashed arrows are existentially quantified.
Confluence and Commutation
Let
r,s,t,u∈X. The relations →1 and →2 on Xcommute if (r→1∗s and r→2∗t) imply there is u such that
(s→2∗u and r3→1∗u); they
diamond-commute (⋄-commute) if (r→1s and r→2t) imply there is u such that (s→2u and t→1u). The relation
→ is confluent (resp. diamond) if it commutes (resp. ⋄-commutes) with itself. It is well known that
⋄-commutation implies commutation, and diamond implies confluence.
IV Call-by-Value calculus Λ⊕cbv
We define Λ⊕cbv, a CbV probabilistic λ-calculus.
IV-A Syntax of Λ⊕cbv
IV-A1 The language
Terms and values are generated respectively by the grammars:
where x ranges over a countable set of variables (denoted by x,y,…).
Λ⊕ and V denote respectively the set of terms and of values.
Free variables are defined as usual. M[N/x] denotes the term obtained by capture-avoiding substitution of N for each free occurrence of x in M.
Contexts (C) and surface contexts (S) are generated by the grammars:
where □ denotes the hole of the term context.
Given a term context C, we denote by C(M) the term obtained from C by filling the hole with M, allowing the capture of free variables.
All surface contexts are contexts. Since the hole will be filled with a redex, surface contexts formalize the fact that the redex (the hole) is not in the scope of a λ-abstraction, nor of a ⊕.
MDST(Λ⊕) denotes the set of multi-distributions on Λ⊕.
IV-A2 Reductions
We first define reduction rules on terms (Fig. 1), and one-step reduction from terms to multidistributions (Fig. 2). We then lift the definition of reduction to a binary relation on MDST(Λ⊕).
Observe that, usually, a reduction step is given by the closure under context of the reduction rules.
However, to define a reduction from term to term is not informative enough, because we still have to account for the probability.
The meaning of M⊕N is that this term reduces to
either M or N, with equal probability21. There are various way to formalize this fact; here, we use multidistributions.
Reduction Rules and Steps
The reduction rules on the terms of Λ⊕ are defined in Fig. 1.
The (one-step) reduction relations→βv,→⊕⊆Λ⊕×MDST(Λ⊕) are defined in Fig. 2. Observe that the probabilistic rules ↦r⊕,l⊕ are closed only under surface contexts, while the reduction rule ↦βv is closed under general context C (hence Λ⊕cbv is a conservative extension of Plotkin’s CbV λ-calculus, see IV-B).
We denote by → the union →βv∪→⊕.
Lifting
We lift the reduction relation →⊆Λ⊕×MDST(Λ⊕) to a relation ⇒⊆MDST(Λ⊕)×MDST(Λ⊕), as defined in Fig. 3. Observe that ⇒ is a reflexive relation.
We define in the same way the lifting of any relation →r⊆Λ⊕×MDST(Λ⊕) to a binary relation ⇒r on MDST(Λ⊕). In particular, we lift →βv,→⊕ to ⇒βv,⇒⊕.
Reduction sequences
A ⇒-sequence (reduction sequence) from m is a
sequence m=m0,…,mi,mi+1,… such that mi⇒mi+1 (∀i).
We write m⇒∗n to indicate that there is a finite sequence from m to n, and ⟨mn⟩n∈N
for an infinite sequence.
βv equivalence
We write
=βv for the transitive, reflexive and symmetric closure of ⇒βv; abusing the notation, we will
write M=βvN for [M]=βv[N].
Normal Forms
N denotes the set of →-normal forms.
Given →r∈Λ⊕×MDST(Λ⊕),
a term M is in →r-normal form if M→r, i.e. there is no m such that M→rm.
It is easy to check that all closed →-normal forms are values, however a value is not necessarily a →-normal form.
IV-A3 Full Lifting
The definition of lifting allows us to apply a reduction step → to any number of Mi in the multidistribution m=[piMi∣i∈I].
If no Mi is reduced, then m⇒m (the relation ⇒ is reflexive).
Another important case is when allMi for which a reduction step is possible are indeed reduced.
This notion of full reduction, denoted by ⇉, is defined as follows.
Obviously, ⇉⊂⇒.
Similarly to lifting, also the notion of full lifting can be extended to any reduction. For any →r⊆Λ⊕×MDST(Λ⊕), its full lifting is denoted by ⇉r⊆MDST(Λ⊕)×MDST(Λ⊕).
The relation ⇉ plays an important role in VII.
IV-B Λ⊕cbv* and the λ-calculus*
A comparison between Λ⊕cbv and the λ-calculus is in order.
Let Λ be the set of λ-terms; we denote
by Λcbn the CbN λ-calculus, equipped with the reduction →β [4], and by Λcbv the CbV λ-calculus, equipped with the reduction →βv [25].
Λ⊕cbv is a conservative extension of Λcbv.
A translation
(⋅)λ:Λ⊕→Λ can be defined as follows, where z is a fresh variable which is used by no term:
[TABLE]
The translation is injective (if (M)λ=(N)λ then M=N) and preserves values.
Proposition 4** (Simulation).**
The translation is sound and complete.
Let M,N∈Λ⊕.
M→βvN* implies (M)λ→βv(N)λ;*
2. 2.
(M)λ→βvQ* implies there is a (unique) N, with Q=(N)λ and M→βvN.*
IV-C Discussion (Surface Contexts)
The notion of surface context which we defined is familiar in the setting of λ-calculus: it corresponds to weak evaluation, which we discussed in II-C.
In Λ⊕cbv, the →βv-reduction is unrestricted.
Closing the ⊕-rules under surface context S expresses the fact that the ⊕-redex is not reduced under
λ-abstraction, nor in the scope of another ⊕. The former is fundamental to confluence: it means that a function which samples from a distribution can be duplicated, but we cannot pre-evaluate the sampling. The latter is a technical simplification, which we adopt to avoid unessential burdens with associativity. To require no reduction in the scope of ⊕ is very similar to allow no reduction in the branches of an if-then-else.
V Confluence and Standardization
V-A Confluence
We prove that Λ⊕cbv is confluent.
We modularize the proof using the Hindley-Rosen lemma. The notions of commutation and ⋄-commutation which we use are reviewed in Sec. III-D.
Lemma** (Hindley-Rosen).**
Let →1 and →2 be binary relations on the same set R. Their union →1∪→2 is confluent if both →1 and →2 are confluent, and
→1 and →2 commute.
The following criterion allows us to work pointwise in proving commutation and confluence of binary relations on multidistributions, namely ⇒βv and ⇒⊕.
Lemma 5** (Pointwise Criterion).**
Let →o,→b⊆Λ⊕×MDST(Λ⊕) and ⇒o,⇒b their lifting (as defined in IV-A2).
Property () below implies that ⇒o,⇒b⋄-commute.*
() If M→bn and M→os, then ∃r s.t. n⇒or and s⇒br.*
Proof.
We prove that () m⇒bn and m⇒os imply exists r s.t. n⇒or and s⇒br.
Let m=[piMi∣i∈I]. By definition of lifting, for each Mi, we have [Mi]⇒bni and [Mi]⇒osi,
with n=∑pi⋅ni and s=∑pi⋅si.
It is easily checked,
that for each Mi, it exists ri s.t. ni⇒ori and si⇒bri.
If either [Mi]⇒bni or [Mi]⇒osi uses reflexivity (rule L1), it is immediate to obtain ri. Otherwise, ri is given by
property (*).
Hence r=∑ipi⋅ri satisfies ().
∎
We derive confluence of ⇒βv from the same property in the CbV λ-calculus [25, 27], using the simulation of Prop. 4.
Lemma 6**.**
The reduction ⇒βv is confluent.
Proof.
Assume m⇒βv∗n and m⇒βv∗s.
We first observe that if m=[piMi∣i∈I], then n and s are respectively of the shape [piNi∣i∈I], [piSi∣i∈I], with Mi→βv∗Ni and Mi→βv∗Si. By Prop. 4, we can project such reduction sequences on Λcbv, obtaining that for each i∈I, (Mi)λ→βv∗(Ni)λ and (Mi)λ→βv∗(Si)λ.
Since →βv in CbV λ-calculus is confluent, there are Ri∈Λ such that (Ni)λ→βv∗Ri and (Si)λ→βv∗Ri. By Prop. 4.2, for each i∈I there is a unique Ti∈Λ⊕ such that (Ti)λ=Ri, and the proof is given. ∎
We prove that the reduction ⇒⊕ is diamond, i.e., the reduction diagram closes in one step.
Lemma 7**.**
The reduction ⇒⊕ is diamond.
Proof.
We prove that if M→⊕n and M→⊕s , then ∃r such that n⇒⊕r and
s⇒⊕r. The claim then follows by Lemma 5, by taking →o\leavevmode=\leavevmode→b\leavevmode=\leavevmode→⊕.
Let M=S(P⊕Q)=S′(P′⊕Q′), n=[21S(P),21S(Q)] and
s=[21S′(P′),21S′(Q′)]. Because of definition of surface context, the two ⊕-redexes do not overlap: P′⊕Q′ is a subterm of S and P⊕Q is a subterm of S′. Hence we can reduce those redexes in S and S′, to obtain r.
∎
We prove commutation of ⇒⊕ and ⇒βv by proving a stronger property: they ⋄-commute.
Lemma 8**.**
The reductions
⇒βv and ⇒⊕⋄-commute.
Proof.
By using Lemma 5, we only need to prove that
if M→βvn and M→⊕s, then ∃r such that n⇒⊕r and s⇒βvr.
The proof is by induction on M.
Cases M=x and M=λx.P are not possible given the hypothesis.
Case M=P⊕Q. M is the only possible ⊕-redex. Assume the βv-redex is inside P (the other case is similar), and that P⊕Q→βv[P′⊕Q], P⊕Q→⊕[21P,21Q]. It is immediate that r=[21P′,21Q] satisfies the claim.
2. 2.
Case M=PQ. M cannot have the form (λx.P′)V because neither P nor Q could contain a ⊕-redex.
(a)
Assume that the βv-redex is inside P, and the ⊕-redex inside Q.
We have PQ→βv[P′Q] (with P→βvP′), PQ→⊕[21PQ′,21PQ′′] (with Q→⊕[21Q′,21Q′′]). It is immediate that
r=[21P′Q′,21P′Q′′] satisfies the claim. The symmetric case is similar.
2. (b)
Assume that both redexes are inside Q. Let us write M as S(Q). Assume
Q→βv[N], Q→⊕[21Q′,21Q′′], therefore S(Q)→βv[S(N)]=n and S(Q)→⊕[21S(Q′),21S(Q′′)]=s. We
use the inductive hypothesis on Q to obtain r′=[21R′,21R′′] such that [N]⇒⊕[21R′,21R′′], [Q′]⇒βv[R′], [Q′′]⇒βv[R′′].
We conclude that for r=[21S(R′),21S(R′′)], it holds that n⇒⊕r and s⇒βvr.
Let us call n an N-multidistribution if n∈MDST(N)i.e.n=[piMi] and all Mi are →-normal forms.
The following fact is an immediate consequence of confluence:
Fact**.**
The N-multidistribution to which m reduces, if any, is unique.
V-A1 Discussion
While immediate, the above fact is hardly useful, for two reasons.
First, we know that probabilistic termination is not necessarily reached in a finite number of steps; the relevant notion is not that m⇒∗n∈MDST(N), but rather that of a distribution which is defined as limit by the sequence ⟨mn⟩n∈N.
Secondly, in Plotkin’s CbV calculus the result of computation is formalized by the notion of value, and considering normal forms as values is unsound ([25], page 135).
In Section VI-B we introduce a suitable notion of limit distribution, and study the implications of confluence on it.
V-B A Standardization Property
In this section, we first introduce surface and left reduction as strategies for ⇒. In the setting of the CbV λ-calculus, the former corresponds to weak reduction,
the latter to the standard strategy originally defined in [25].
We then establish a standardization result, namely that every finite⇒-sequence can be partially ordered as a sequence in which all surface reductions are performed first. A counterexample shows that in Λ⊕cbv, a standardization result using left reduction fails.
V-B1 Surface and Left Reduction
We remind the reader that in the λ-calculus, a deterministic strategy defines a function from terms to redexes, associating to every term the next redex to be reduced. More generally, we call reduction strategy for → a reduction relation→a such that
→a⊆→. The notion of strategy can be easily formalized through the notion of context. With this in mind, let us consider surface and left contexts.
Left contextsL are defined by the following grammar:
[TABLE]
Note that in particular a left contexts is a surface context.
•
We call surface reduction, denoted by →s (with lifting ⇒s) and left reduction, denoted by →l (with lifting ⇒l), the closure of
the reduction rules in Fig. 1 under surface contexts and left contexts, respectively. It is clear that →s\leavevmode=\leavevmode→sβv∪→⊕. Observe that →l⊊→s.
•
A reduction step M→m is deep, written M→dm, if it is not a surface step. A reduction step is internal (written M→intm) if it is not a left step. Observe that →d⊂→int.
Example 10**.**
•
(→l⊊→s)
Let M=x(II)(II), where I=λx.x. Then M→s[xI(II)] and M→s[x(II)I]; instead,
M→l[xI(II)], M→l[x(II)I].
•
(→d⊊→int) Let M=(λx.II)(II). Then M→int(λx.I)(II) and M→int(λx.II)I, while
M→d(λx.I)(II) and M→d(λx.II)I
Intuitively, left reduction chooses the leftmost of the surface redexes. More precisely, this is the case for closed terms (for example, the term (xx)(II) has a →s-step, but no →l-step).
Surface Normal Forms:
We denote by Scbv the set of →s-normal forms. We observe that all values are surface normal forms (but the converse does not hold):
V⊊Scbv (and N⊊Scbv).
The situation is different if we restrict ourselves to close term, in fact the following result holds, which is easy to check.
Lemma 11**.**
If M is a closed term, the following three are equivalent:
M* is a →s-normal form;*
2. 2.
M* is a →l-normal form;*
3. 3.
M* is a value.*
V-B2 Finitary Surface Standardization
The next theorem proves a standardization result, in the sense that every finite reduction sequence can be (partially) ordered in a sequence of surface steps followed by a sequence of deep steps.
In Λ⊕cbv, if m⇒∗n then exists r such that m⇒s∗r and r⇒d∗n.
Proof.
We build on an analogous result for CbV λ-calculus, which is folklore and is proved explicitly in
Appendix V-B. We then only need to check that deep steps commute with ⊕-steps, which is straightforward technology (the full proof is in Appendix V-B).
∎
Finitary Left Standardization does not hold
The following statement is false for Λ⊕cbv.
“If m⇒∗n then there exists r such that m⇒l∗r and r⇒int∗n."
Example 13** (Counter-example).**
Let us consider the following sequence, where I=λx.x and
M=(II)((λx.y⊕z)I).
\boldsymbol{[}M\boldsymbol{]}\overset{{}_{\textsf{int}}}{\Rightarrow}\boldsymbol{[}(II)(y\oplus z)\boldsymbol{]}$$\Rightarrow_{\oplus}\boldsymbol{[}\frac{1}{2}(II)y,\frac{1}{2}(II)z\boldsymbol{]}\Rightarrow_{\beta_{v}}\boldsymbol{[}\frac{1}{2}Iy,\frac{1}{2}(II)z\boldsymbol{]}.
If we anticipate the reduction of (II), we have M→lβv[I((λx.y⊕z)I)], from where we cannot reach [21Iy,21(II)z]. Observe that the sequence is already surface-standard!
VI Asymptotic Evaluation
The specificity of probabilistic computation is to be concerned with asymptotic behavior; the focus is not what happens
after a finite number n of steps, but when n tends to infinity.
In this section, we study the asymptotic behavior of ⇒-sequences with respect to evaluation.
The intuition is that a reduction sequence defines a distribution on the possible outcomes of the program.
We first clarify what is the outcome of evaluating a probabilistic term, and then
we formalize the idea of result “at the limit" with the notion of limit distribution (Def. 18). In Sec. VI-B we investigate
how the asymptotic result of different sequences starting from the same m compare.
We recall that to each multidistribution m on Λ⊕ is associated a probability distribution μ∈DST(Λ⊕) (see Sec.III-C). We use the following letter convention: given a multidistribution
m,n,r,... we denote the associated distribution by the corresponding Greek letter μ,ν,ρ,...
If ⟨mn⟩n∈N is a ⇒-sequence, then ⟨μn⟩n∈N
is the sequence of associated distributions.
VI-A Probabilistic Evaluation
We start by studying the property of being valuable (VI-A1) and by analyzing some examples (VI-A2). This motivates the more general approach we introduce in VI-A3.
VI-A1 To be valuable
In the CbV λ-calculus, the key property of a term M is *to be valuable, i.e., M can reduce to a value.
To be valuable is a yes/no property, whose probabilistic analogous is the probability to reduce to a value. *
If m describes the result of a computation step, the probability that such a result is a value is simply
μ(V):=∑V∈Vμ(V), i.e. the probability of the event V⊂Λ⊕.
Since the set of values is closed under reduction, the following property holds:
Fact 14**.**
If V∈V and V→m, then m=[W], with W∈V, and V→βv[W].
Let ⟨mn⟩n∈N be a ⇒-sequence, and ⟨μn⟩n∈N the sequence of associated distributions.
The sequence of reals ⟨μn(V)⟩n∈N is nondecreasing and bounded,
because of Fact 14.
Therefore the limit exists, and is the supremum: limn→∞μn(V)=supn{μn(V)}.
This fact allows us the following definition.
•
The sequence ⟨mn⟩n∈Nevaluates with probability p
if p=supnμn(V), written
⟨mn⟩n∈N⇒\leavevmode\leavevmode\leavevmode∞p.
•
m is * p-valuable* if p is the greatest probability to
which a sequence from m can evaluate.
Example 15**.**
Let T=λxy.x and F=λxy.y.
Consider the term PP where P=(λx.(xx⊕T)). Then PP→[(PP)⊕T]⇒[21PP,21T]⇒2n[2n1PP,21T,…,2n1T] .
Since limn→∞∑1n2n1=1, PP is 1-valuable.
2. 2.
Consider the term QQ, where Q=λx.(xx⊕(T⊕F)).
Then QQ→βv[(QQ)⊕(T⊕F)]⇒∗[21QQ,41T,41F]⇒∗…
It is immediate that QQ is 1-valuable.
3. 3.
Let Δ=λx.xx, so that ΔΔ is a divergent term, and let
N=λx.(xx)⊕(T⊕(ΔΔ)).
Then
NN→βv[(NN)⊕(T⊕(ΔΔ))]⇒∗[21NN,41T,41(ΔΔ)]⇒∗…NN is 21-valuable.
VI-A2 Result of a CbV computation
The notion of being p-valuable allows for a simple definition, but it is too coarse.
Consider Example 15; both 1) and 2) give examples of 1-valuable term. However, in 1) the probability is concentrated in the value T, while in 2) T and F have equal probability 21. Observe that T and F are different normal forms, and are not βv-equivalent. To discriminate between T and F, we need a finer notion of evaluation.
Since the calculus is CbV, the result “at the limit" is intuitively a distribution on the possible values that the term can reach.
Some care is needed though, as the following example shows.
Example 16**.**
Consider Plotkin’s CbV λ-calculus.
Let
ω3=λx.xxx; the term M=(λx.x)λx.ω3ω3 has the following →βv-reduction: M=(λx.x)(λx.ω3ω3)→βvM1=λx.ω3ω3→βvM2=λx.ω3ω3ω3→βv⋯. We obtain a reduction sequence where ∀n≥1, Mn=λx.ω3nω3...ω3.
Each Mi is a value, but there is not a "final" one in which the reduction ends.
Transposing this to Λ⊕cbv, let m0=[M], mi=[Mi]. The ⇒-sequence ⟨mn⟩n∈N is 1-valuable, but the distribution on values is different at every step. In other words, ∀V∈V, the sequence ⟨μn(V)⟩ has limit [math].
Observe that however all the values Mi are βv-equivalent.
VI-A3 Observations and Limit Distribution
Example 16 motivates the approach that we develop now: the result of probabilistic evaluation is not a distribution on values, but* a distribution on some events of interest*. In the case of Λ⊕cbv, the most informative events are equivalence classes of values.
We first introduce the notion of observation, and then that of limit distribution.
Definition 17**.**
A set of observations for (Λ⊕,⇒) is a set Obs⊆P(Λ⊕) such that ∀U,Z∈Obs, if U=Z then U∩Z=∅, and if m⇒m′ then μ(U)≤μ′(U).
Note that, given μ∈DST(Λ⊕), U∈Obs has probability μ(U) (similarly to the event "the result is Odd" in Example 2).
It follows immediately from the definition that, given a sequence ⟨mn⟩n∈N, then for each U∈Obs the sequence ⟨μn(U)⟩n∈N is nondecreasing and bounded, and therefore has a limit, the sup.
Moreover, monotony implies the following
[TABLE]
which guarantees that the distribution ρ in Def. 18 is well defined, because supn∥μn∥≤1 and (1) gives supn∥μn∥=∥ρ∥.
Definition 18**.**
Let Obs be a set of observations. The sequence ⟨mn⟩n∈N defines a distribution ρ∈DST(Obs), where ∀U∈Obs,
ρ(U):=supn{μn(U)}.
•
We call such a ρ the* limit distribution* of ⟨mn⟩n∈N. Letter convention: greek bold letters denote limit distributions.
•
The sequence ⟨mn⟩n∈Nconverges to (or evaluates to) the limit distribution
ρ, written
⟨mn⟩n∈N⇓Obsρ.
•
If m has a sequence which converges to ρ, we write
m⇒Obs∞ρ.
•
Given m, we denote by LimObs(m) the set {ρ∣m⇒Obs∞ρ} of all limit distributions of m.
If LimObs(m) has a greatest element, we indicate it by [[m]]Obs.
If Obs is clear from the context, we omit the index which specifies it, and simply write ⟨mn⟩n∈N⇓ρ, m⇒\leavevmode\leavevmode\leavevmode∞ρ,
Lim(m).
The notion of limit distribution formalizes what is the result of evaluating a probabilistic term, once we choose the set Obs of observations which interest us. In VI-B we prove that confluence implies that Lim(m) has a unique maximal element.
Sets of Observations for Λ⊕cbv
Let us consider two partitions of the set V⊂Λ⊕, the trivial one {V}, and the set V∼ of
values up to the equivalence =βv, i.e. the collection of all events {W∈V∣W=βvV}. For the set N of →-normal forms (see IV-A2), interesting partitions are {N} and
the set of singletons N{}:={{M},M∈N}.
Proposition 19**.**
{V}, V∼, {N} and N{} are each a set of observations for (Λ⊕,⇒).
Proof.
Clearly, any partition of N satisfies the conditions in Def. 17.
For {V} and V∼, the result follows from Fact 14.
∎
Notice that convergence w.r.t. {V} corresponds to the notion of being p-valuable. Instead {N} and N{} correspond to normalization and reaching a specific normal form, respectively; however these are events which are not significant in a CbV perspective, as we already discussed in V-A1. For this reason, in Sec. VII
we will focus on the study of Obs:=V∼ (Sec. VII).
Example 20**.**
•
Let Obs be either V∼ or N{}.
Let ⟨mn⟩n∈N be the sequence in Example 15.1, starting from [PP].
Then ⟨mn⟩n∈N⇓Obs{T1}.
2. 2.
Let ⟨mn⟩n∈N be the computation in Example 15.2, starting from [QQ].
Then ⟨mn⟩n∈N⇓Obs{T21,F21}.
3. 3.
Let ⟨mn⟩n∈N be the computation in Example 15.3, starting from [NN].
Then ⟨mn⟩n∈N⇓Obs{T21}.
•
Let ⟨mn⟩n∈N be the reduction sequence in Example 16, starting with [(λx.x)λx.ω3ω3]. By taking as set of observations V∼, the sequence converges to {λx.ω3ω31}.
Discussion
Each observation expresses a result of interest for the evaluation of the term M. To better understand this,
let us examine what become our notions of observation in the case of usual (non-probabilistic) CbV λ-calculus.
Let M→∗N∈U and U∈Obs; if U∈{V} then M is valuable, if U∈V∼, then M reduces to the value N up to βv-equivalence, if
U∈{N}, then M normalizes, finally U={N}∈N{} means that M has normal form N.
We say that U∈Obs is a result of evaluating M, if M→∗N∈U.
Clearly, fixed Obs, confluence implies that the result of evaluating M, if any, is unique.
Sets of observations for Surface Reduction
It is interesting to examine the set of observations for surface reduction⇒s.
When considering →s, values are →s-normal forms (the converse does not hold!). Therefore {{V}∣V∈V} (where {V} is a singleton) is a set of observations for (Λ⊕,⇒s). In other words, when restricting oneself to surface reduction, the result of a probabilistic computation (i.e. the limit distribution) is a distribution on the possible values of the term.
Observe that all set of observations for ⇒ (Prop. 19) are also set of observations for ⇒s.
VI-B Uniqueness and Adequacy of the Evaluation
In this section, we adapt similar results from [15], to which we refer for
details. We assume a set Obs to be fixed, hence we omit the index. For concreteness, think of V∼, but the results only depend on the properties in Def. 17, and on confluence.
How do different reduction sequences from the same initial m compare? More precisely, assume m⇒\leavevmode\leavevmode\leavevmode∞ρ and m⇒\leavevmode\leavevmode\leavevmode∞μ, how do ρ and μ compare?
Intuitively, the limit distributions of m (which are the result of a probabilistically terminating sequence) play the role of normal forms in finitary termination. As confluence implies uniqueness of normal forms, a similar property holds when considering probabilistic termination and limits,
in the sense that each m has a unique maximal limit distribution (Thm. 22). While the property is similar, the proof is not as immediate as in the finitary case.
The key result is Lemma 21 which implies both that Lim(m) has a greatest element (Thm. 22), and adequacy of the evaluation (Thm. 23).
Recall that the order ≤ on distributions is defined pointwise (Sec. III-A).
Lemma 21** (Main Lemma).**
Λ⊕cbv* has the following property:
∀m,s, if μ∈Lim(m), and m⇒∗s, then s⇒\leavevmode\leavevmode\leavevmode∞σ with μ≤σ. Moreover, if μ is maximal in Lim(m) then σ=μ.*
Proof.
Let μ∈Lim(m), and ⟨mn⟩n∈N be a sequence from m=m0 which converges to μ. Assume m⇒∗s. As illustrated in Fig. 5,
from s we build a sequence s=sm0⇒∗sm1⇒∗sm2…,
where each segment smi⇒∗smi+1 (i≥0) is given by confluence from mi⇒∗smi and mi⇒mi+1. Let ⟨sn⟩n∈N be the concatenation of all such segments and let σ be its limit distribution. Clearly, σ∈Lim(m). Since by construction mi⇒∗smi,
then for each V∈Obs, μi(V)≤σ(V) (because μi(V)≤σmi(V) by definition of observation).
Therefore
supn{μn(V)}=μ(V)≤σ(V). If μ is maximal, then σ=μ.
∎
Theorem 22** (Greatest Limit Distribution).**
Lim(m)* has a greatest element, which we indicate by [[m]].*
Proof.
The proof of both existence and uniqueness of maximal elements relies on Lemma 21.
Let us explicitly show uniqueness.
Let μ∈Lim(m) be maximal. Given any ρ∈Lim(m), we prove that ρ≤μ. Let ⟨rn⟩n∈N be a sequence from m such that ⟨rn⟩n∈N⇓ρ. By Lemma 21, ∀rn there is a ⇒-sequence from rn which has limit μ. Therefore ∀V∈V, ∀n, ρn(V)≤μ(V), hence
ρ(V)≤μ(V). If ρ is maximal, ρ=μ.
∎
Theorem 23** (Adequacy of evaluation).**
If m⇒∗s, then [[m]]=[[s]].
Proof.
Observe first that [[s]]∈Lim(m), hence [[s]]≤[[m]].
Indeed, if ⟨sn⟩n∈N⇓[[s]],
by concatenanting m⇒∗s with ⟨sn⟩n∈N,
we have m⇒\leavevmode\leavevmode\leavevmode∞[[s]]. By Lemma 21, it holds that [[m]]∈Lim(s), hence [[m]]≤[[s]]. Therefore [[m]]=[[s]].
∎
VII Asymptotic Standardization
In this section, we focus on V∼ as set of observations, which is the most natural choice in a CbV setting, in particular if we want to evaluate programs, i.e., closed terms.
We proved, in Thm. 22, that each m has a unique maximal limit distribution [[m]]. Now we address the question: is there a reduction strategy which is guaranteed to converge to [[m]]?
We show that surface evaluation provides such a strategy; indeed, any limit distribution in Lim(m) can be reached by surface evaluation (Thm. 26). This result of asymptotic completeness is the main technical contribution of the section.
Following the notation introduced in VI-A3, we denote by V the set {W∈V∣W=βvV}.
We observe that:
Fact 24**.**
Let M→dm, then m has form [P] and M=βvP;
M is a value if and only if P is a value.
As a consequence of the previous fact, we have
Lemma 25**.**
If m⇒ds then μ(V)=σ(V), and μ(V)=σ(V), for each V∈V∼.
We write m⇒\leavevmode\leavevmodes\leavevmode∞μ (resp. m⇒\leavevmode\leavevmodel\leavevmode∞μ) if there is a sequence ⟨mn⟩n∈N such that all steps mi⇒mi+1 are surface (resp. left) reductions
and ⟨mn⟩n∈N⇓μ.
Remember that given m, we write [[m]] for the unique maximal element of Lim(m), and m⇒\leavevmode\leavevmode\leavevmode∞μ if there exists a ⇒-sequence from m which converges to μ.
We now prove asymptotic completeness for surface evaluation.
We exploit finitary standardization (Thm. 12) and extend it to the limit. In the proof, it is essential the fact that ⇒d-steps preserve the distributions (Lemma 25).
Theorem 26** (Asymptotic Completeness of Surface Reduction).**
m⇒\leavevmode\leavevmode\leavevmode∞μ* if and only if m⇒\leavevmode\leavevmodes\leavevmode∞μ.*
Proof.
We prove that m⇒\leavevmode\leavevmode\leavevmode∞μ implies m⇒\leavevmode\leavevmodes\leavevmode∞μ (the other direction holds by definition).
Assume ⟨mn⟩n∈N⇓μ, with m=m0. As illustrated in Fig. 5,
we build a sequence ⟨smn⟩ such that m0=sm0 and ∀i (smi⇒s∗smi+1 and smi+1⇒d∗mi+1). If i=0, by Thm. 12 it exists sm1 such that m0=sm0⇒s∗sm1⇒d∗m1.
We then procede by induction: for each i>0, we apply Thm. 12 to the sequence smi⇒d∗mi⇒mi+1, and obtain
the multidistribution smi+1 such that smi⇒s∗smi+1 and smi+1⇒d∗mi+1, as wanted. The concatenation of all segments sm0⇒s∗sm1,...,smi⇒s∗smi+1,... is a ⇒s-sequence. Let σ be its limit.
By Lemma 25 and the fact that smi⇒d∗mi, we have σmi(V)=μi(V), for each V∈V∼. We conclude σ=μ because ∀i:
σmi(V)=μi(V)≤μ(V), therefore σ(V)≤μ(V).
2. 2.
μi(V)=σmi(V)≤σ(V), therefore μ(V)≤σ(V).
∎
Remark 27**.**
We observe that completeness of surface evaluation (Thm. 26) is specific to convergence w.r.t. V∼ and {V} (the most natural set of observations in CbV).
Surface evaluation is not necessarily complete if we evaluate w.r.t. other sets of observations, such as normal forms, where deep steps may be needed. Consider, for example, the term
λz.II→dλz.I.
To define a complete strategy w.r.t. N{} demands to refine the approach.
VII-A Surface and Left Evaluation
We are now equipped to tackle the goal of this section, namely the existence of a strategy to find the greatest limit distribution of a program.
Since our aim is to reach the greatest limit, it makes sense to reduce "whenever is possible", and use the full lifting ⇉ (Def. IV-A3).
The reason is easy to see. Consider for example m=[21ΔΔ,21II], which has greatest limit [[m]]={I21}. We observe that a ⇒-sequence from m may very well keep reducing only the diverging term ΔΔ and never reach [[m]]. The reduction ⇉, instead, forces the reduction of each term which is not in normal form for →.
Lemma 28**.**
Let ρ be maximal among the limit distribution of all ⇉-sequences from m. Then ρ=[[m]].
Proof.
Obviously, ρ∈Lim(m). It is straightforward to check that if μ is the limit of a ⇒-sequence, then there is a ⇉-sequence, whose limit is greater or equal to μ.
∎
We write \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}} (resp. \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{l}}\vss}}}) for the full lifting of →s (resp. →l). Observe that given m, there is only one \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{l}}\vss}}}-sequence.
We use the letters l=⟨ln⟩n∈N,\leavevmodes=⟨sn⟩n∈N,\leavevmodet=⟨tn⟩n∈N to indicate (infinite) reduction sequences.
We say that m is closed if it is a multidistribution on closed terms i.e.m=[piMi∣i∈I] with Mi closed ∀i∈I.
Proposition 29** (Left Evaluation).**
Let m be closed.
Let s,t be \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequences from m; s⇓μ if and only if t⇓μ.
2. 2.
Let s be any \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequence from m, and l the \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{l}}\vss}}}-sequences from m. Then s⇓μ if and only if l⇓μ.
Proof.
[15], Sec.6, studies a CbV probabilistic λ-calculus with surface reduction (Λ⊕weak) and proves using a diamond property that if
\mathtt{m}{\mathrel{\mathop{\rightrightarrows^{k}}\limits^{\vbox to-1.50694pt{\kern 0.0pt\hbox{\tiny\textsf{s}\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ }\vss}}}}\mathtt{m}_{k} and \mathtt{m}{\mathrel{\mathop{\rightrightarrows^{k}}\limits^{\vbox to-1.50694pt{\kern 0.0pt\hbox{\tiny\textsf{s}\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ }\vss}}}}\mathtt{r}_{k} (both sequence have k steps)
then ∀V∈V, μk(V)=ρk(V). Hence claim (1.) follows.
Claim (2.) follows from (1.) and from Lemma 11, which implies that if M→sn is closed, we can always choose a surface step which is a →l-step.
∎
Putting all elements together, we have proved that the limit distribution of any\mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequence from m is [[m]].
In particular,
[[m]] is also the limit distribution of the \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{l}}\vss}}}-sequence from m.
Theorem 30**.**
For m closed, the following hold.
Let s be any \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequence from m. Then s⇓[[m]].
2. 2.
Let l be the \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{l}}\vss}}}-sequence from m. Then l⇓[[m]].
3. 3.
The sets {ρ∣m⇒\leavevmode\leavevmode\leavevmode∞ρ},{ρ∣m⇒\leavevmode\leavevmodes\leavevmode∞ρ}, and
{ρ∣m⇒\leavevmode\leavevmodel\leavevmode∞ρ} have the same greatest element, which is [[m]].
While left reduction is not standard for finite sequences (as Example 13 shows), still is able to reach [[m]], if we only evaluate programs, i.e., closed terms.
Thm. 30 justifies (a posteriori!) the use of the leftmost-outermost strategy in the literature of probabilistic λ-calculus: left evaluation actually produces the best asymptotic result. However, it is not the only strategy to achieve this: any \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequence will.
VIII Summing-up and Overview
The definition of reduction in Λ⊕cbv is based on two components: the βv-rule and the ⊕-rule. We stress that only the ⊕-step is constrained, while βv is inherited "as is" from the λ-calculus. The βv-rule is allowed in all contexts, while the ⊕-rule is disabled in a function body.
This avoids confusion between duplicating a function which performs a choice, and duplicating the choice, that is the core of confluence failure.
It is then natural to expect that the fine control on duplication which is offered by linear logic could be beneficial.
In Sec. IX we apply the methods and tools which we have developed to study Λ⊕cbv to
define a probabilistic linear calculus Λ⊕! which extends with a probabilistic choice Simpson’s linear λ-calculus [30].
This is a result of interest in its own, but also evidence that our approach is robust, as it transfers well to other probabilistic calculi.
In Sec. X we then define a call-by-name probabilistic calculus, Λ⊕cbn, and we show that similar results to the ones we have established for Λ⊕cbv hold.
As we will see, the three calculi follow the same pattern: the ⊕-reduction (and only this reduction) is restricted to surface contexts. In Sec. XI we
discuss how the three calculi relate.
IX Probabilistic Linear Lambda Calculus
Λ! [30] is an untyped linear λ-calculus which is closely based on linear logic.
Abstraction is refined into linear abstraction λx.M and non-linear abstraction λ!x.M, which allows duplication of the argument. The argument of λ!x.M is required to be suspended as thunk !N, that corresponds to the !-box of linear logic.
In this section, we define a probabilistic linear λ-calculus Λ⊕! by extending Λ! with an operator ⊕.
We demand that probabilistic choice is not reduced under the scope of a ! operator, while the β-reduction is unrestricted.
We show that this suffices to preserve confluence; we then study the properties of the calculus.
IX-A Syntax of Λ⊕!
IX-A1 The language
Raw termsM,N,… are built up from a countable set of variables x,y,… according to the grammar:
We say that x is affine (resp. linear) in M if x occurs free at most (resp. exactly) once in M, and moreover, the free occurrence of x does not lie within the scope of a ! operator. A term M is affine (resp. linear) if for
every subterm λx.P of M, x is so in P.
Henceforth, we consider affine terms only.
It is immediate to observe that if M is affine (linear) and M→N, then N is affine (linear).
Contexts (C) and surface contexts (S) are generated by the grammars:
where □ denotes the hole of the term context.
Observe that a surface context is defined in a different way than in IV-A. Here it expresses the fact that a
surface redex cannot occur in the scope of a ! operator (nor in the scope of a ⊕).
IX-A2 Reductions
We follow the same pattern as for Λ⊕cbv.
The beta rules ↦β are given in Fig. 7.
The probabilistic rules ↦l⊕,↦r⊕
are as in Fig. 1. The reduction steps are in Fig. 6: the β-rule is closed under general context, while the ⊕-rules are closed under surface contexts. The β-rules also can be restricted to the closure under surface contexts, as shown in Fig. 6. A →-step is deep (written →d) if it is not surface.
The lifting of the relation →:Λ⊕!×MDST(Λ⊕!) to a binary relation on ⇒MDST(Λ⊕!) is defined as in Fig. 3.
Remark 31**.**
To limit notations for reductions and contexts, we use the same as for Λ⊕cbv, clearly the meaning is different.
IX-B Λ⊕!* is a conservative extension of Λ!*
As in IV-B,
we denote by →β both the reduction in Λ! and the β reduction in Λ⊕!; we prove that (Λ⊕!,⇒β) is a conservative extension of (Λ!,→β).
Definition 32** (Translation).**
(⋅)!:Λ⊕!→Λ! is defined in the following way, where z is a fixed fresh variable
[TABLE]
Note that the translation of terms of the form M⊕N is designed so to preserves surface reduction.
Proposition 33** (Simulation).**
Let M∈Λ⊕!.
M→β[N]* implies (M)!→β(N)!.*
2. 2.
(M)!→βP* implies that exists (unique) N∈Λ⊕!, with N=(P)! and M→β[N].*
3. 3.
M→sβ[N]* implies (M)!→sβ(N)!.*
4. 4.
(M)!→sβP* implies exists (unique) N∈Λ⊕!, s.t. N=(P)! and M→sβ[N].*
The translation tells us that the reduction [M]⇒β[N] on Λ⊕! behaves as the reduction (M)!→β(N)! on Λ!.
IX-C Confluence and Finitary Standardization for Λ⊕!
Surface Standardization. If M→β∗N then exists R such that M→sβ∗R and R→d∗N.
We show, using the methods developed for Λ⊕cbv and the translation in Def. 32, that the same properties hold for Λ⊕!.
IX-C1 Confluence
We
follow the same approach as in Sec. V-A.
In fact, we already have most of the building blocks for the proof. Observe that Lemma 5 is general enough to apply also to binary relations on MDST(Λ⊕!).
Lemma 34**.**
The reduction ⇒⊕ is diamond.
2. 2.
The reduction ⇒β is confluent.
3. 3.
The reductions ⇒β and ⇒⊕ commute.
Proof.
The details of the proof are in Appendix -E1. The proof of 1) and 2) is as for Lemmas 7 and 6; 3) is proved using Lemma 5, by induction on the term.
∎
Normal forms are defined as in IV-A2; we denote by N! the set of →-normal forms, and by S!
the set of the surface normal forms (i.e. the →s-normal forms). Clearly N!⫋S!.
We define N{}!:={{M},M∈N!}, and S∼! as the set of all events R:={S∈S!∣S=βR}.
Observations
A set of observations for (Λ⊕!,⇒) is defined in the same way as that for (Λ⊕,⇒) (Def. 17 ).
Proposition 37**.**
Each of the following sets {N!}, {S!}, N{}!,
S∼!, is a set of observations for (Λ⊕!,⇒).
Limit distributions and evaluation
Once we fix a set of observations Obs for (Λ⊕!,⇒),
the definition of evaluation and limit distribution, and the notations ⟨mn⟩n∈N⇓ρ, m⇒\leavevmode\leavevmode\leavevmode∞ρ and Lim(m)
are as in Def. 18.
We already observed that Thm. 22 and 23 only depends on confluence, and on the definition of observations;
therefore both hold.
Theorem 38**.**
For any choice of Obs,
Λ⊕! has the properties:
•
Lim(m)* has a greatest element, which we indicate as [[m]].*
•
If m⇒∗s, then [[m]]=[[s]].
Asymptotic Standardization
For the rest of the section we focus on Obs:=S∼!.
Notice that if ρ is a limit distribution, ρ∈MDST(S∼!).
We have established that for each m∈Λ⊕!, Lim(m) has a unique maximal element [[m]]. We now want to
have a strategy to find [[m]]. Surface reduction plays that role.
We use the following fact, which is easy to verify.
Fact 39**.**
Let M→dn. Then
n* is of the form [N], and M=βN;*
2. 2.
M∈S!* if and only if N∈S!.*
Theorem 40** (Asymptotic Completeness).**
In Λ⊕! it holds that m⇒\leavevmode\leavevmode\leavevmode∞μ if and only if m⇒\leavevmode\leavevmodes\leavevmode∞μ.
Similarly to Sec. VII, we can establish that any (infinitary) \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequences from m converges precisely to [[m]], where \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}} indicate the full lifting of the relation →s⊆Λ⊕!×MDST(Λ⊕!).
Theorem 41** (Surface Evaluation).**
Let s=⟨sn⟩n∈N be any \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequences from m. It holds that s⇓[[m]].
X Call-by-Name calculus Λ⊕cbn
We show that results similar to those for Λ⊕cbv hold for a CbN calculus, denoted Λ⊕cbn.
We could adapt all the proofs, but now we prefer to follow a different way.
Once we take the point of view of linear logic, we have a roadmap to CbN via Girard’s translation of intuitionistic into linear logic. More precisely, we rely on recent work [12, 17] which expresses those translations in untypedλ-calculus. We exploit the faithful nature of the translation to transfer both confluence and standardization from Λ⊕! to Λ⊕cbn, essentially for free.
X-A Syntax of Λ⊕cbn
We write Λ⊕cbn for the set of terms Λ⊕ equipped with the reduction relation ⇒ defined below.
X-A1 The language
Terms and contexts(C) are the same as in Λ⊕cbv.
Surface contexts (S) are generated by the grammar:
The β-rule ↦β is as in the CbN λ-calculus (Fig. 8). The probabilistic rules ↦l⊕,↦l⊕
are as in Fig. 1.
Reduction steps→,→β,→⊕⊆Λ⊕×MDST(Λ⊕) and surface reduction steps→s,→s⊕,→sβ⊆Λ⊕×MDST(Λ⊕) are defined
in Fig. 6, following the usual pattern.
By definition of surface context, a reduction step is surface if it does not occur in argument position (nor in the scope of ⊕).
The lifting of →⊆Λ⊕×MDST(Λ⊕) to a binary relation ⇒ on MDST(Λ⊕cbn) is defined as in Fig. 3. The full lifting ⇉ is defined as in IV-A3.
X-A3 Normal Forms
We denote by Ncbn the set of →-normal forms, and by Scbn
the set of the surface normal forms (i.e. the →s-normal forms). Clearly Ncbn⫋Scbn.
Let us extend to Λ⊕cbn the notion of head normal form.
Head reduction →h is
the closure of both the β and the probabilistic rules under head context H,
which is defined by the following grammar
H::=λx.H∣KK::=□∣KM( head contexts )
Remark**.**
A common way to write head context H is as follows:
[TABLE]
Observe that →h\leavevmode⫋\leavevmode→s (for example, the reduction (λx.(λy.y)P)Q→s(λx.P)Q is not a head reduction). However, the two relations have the same normal forms. Let us write H for the set of head normal forms .
If M is in surface normal form, it is also in head normal form.
It is easy to verify that a head normal form has no →s-redex, and conclude:
Scbn=H
X-A4 Λ⊕cbn to Λ⊕!.
In [17], the translation from Λcbn into a linear λ-calculus is proved sound and complete.
We follow their work to define a similar translation (⋅)N:Λ⊕cbn→Λ⊕!:
[TABLE]
[TABLE]
The following extend to the probabilistic setting an analogous result proved in [17].
Observe that, with a slight abuse of notation, reductions in the two calculi are denoted in the same way, the meaning being clear from the context.
Proposition 42** (Simulation).**
The translation (.)N is sound and complete; it preserves surface reduction and surface normal forms.
Let M∈Λ⊕cbn; the following hold:
if M→n then (M)N→(n)N;
2. 2.
if M→sn then (M)N→s(n)N;
3. 3.
if (M)N→s then ∃!n such that s=(n)N and M→n;
4. 4.
if (M)N→ss then ∃!n such that s=(n)N and M→sn;
5. 5.
X-B *Confluence and Finitary Standardization for Λ⊕cbn *
The fact that surface reduction is preserved by (.)N is crucial to transfer the standardization result from Λ⊕! to Λ⊕cbn. We show that via translation,
Λ⊕cbn inherits both the confluence and the surface standardization property from Λ⊕!.
If m⇒∗n then exists r such that m⇒s∗r and r⇒d∗n.
Proof.
From Thm. 36, by using back-and-forth Thm 42, and the fact that the translation preserves surface reduction. ∎
In the classical λ-calculus, the standardization property (Barendregt, Th. 11.4.7) says that every reduction sequence can be ordered in such a way to perform first only left β-redexes, reading the term from left to right, and then internal ones (a redex is internal if it is not the leftmost one).
In Λ⊕cbn this notion of standardization fails, as the following example (which we take from [19]) shows.
Example 45**.**
In each step, we underline the redex. Consider
[(λx.I(y⊕z))I]⇒[(λx.y⊕z)I]⇒[21(λx.y)I,21(λx.z)I]⇒[21y,21(λx.z)I],
where only the last step reduces a left redex.
If we perform the left redex first, we have [(λx.I(y⊕z))I]⇒[I(y⊕z)], from which [21y,21(λx.z)I] cannot be reached.
A consequence of standardization
is that M has a head normal form iff the →h-sequence from M terminates.
In the following section we retrieve an analogue of this result.
X-C Asymptotic behaviour
We denote by H∼ the set of head normal forms up to the equivalence =β, and we define N{}cbn={{M},M∈Ncbn}.
Each of the following is a set of observations for Λ⊕cbn: {H}, {Ncbn}, H∼, N{}cbn.
Convergence and Limit distributions
Once we fix a set of observations Obs for Λ⊕cbn,
the definition of convergence and limit distribution
are as in Def. 18.
We observe that Theorems 22 and 23 both hold. Hence in particular
Theorem 47**.**
For any choice of Obs, the following holds in
Λ⊕cbn : given m,
Lim(m) has a greatest element [[m]].
We now study the notion of convergence induced by* choosing head normal forms as outcome*, i.e.Obs:=H∼.
Therefore, if ρ∈Lim(m), it holds ρ∈MDST(H∼).
The following results match the analogous results in Λ⊕! (Thm. 40 and 41).
Theorem 48**.**
Let Obs:=H∼. For every multidistribution m:
•
m⇒\leavevmode\leavevmode\leavevmode∞μ* if and only if m⇒\leavevmode\leavevmodes\leavevmode∞μ.*
•
If ⟨sn⟩n∈N is a \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequences of full surface reductions from m, then ⟨sn⟩n∈N⇓[[m]].
Similarly to Prop. 29, it is not hard to
prove that in Λ⊕cbn, \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}} satisfies a diamond property
in the sense of [15], and hence
all \mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{s}}\vss}}}-sequences from m
converge to the same limit distribution.
Since →h⊂→s and since head reduction and surface reduction have the same normal forms, we can always choose a →l step whenever a →s-step is possible. This allows us to retrieve a result of completeness for head reduction:
**
Let ⟨sn⟩n∈N be the\mathrel{\mathop{\rightrightarrows}\limits^{\vbox to0.0pt{\kern 0.0pt\hbox{\tiny\textsf{h}}\vss}}}-sequences of full head reductions from m. It holds that ⟨sn⟩n∈N⇓[[m]].
Once again, this justifies a posteriori the choice of head reduction in probabilistic CbN (such as [13]).
Observe that we follow the same reasoning as in the case of Λ⊕cbv (with V∼ as set of observations). First we proved that surface reduction is sufficient to reach the greatest limit distribution, then we observed that in particular left reduction can be chosen. There is a close parallelism between Λ⊕cbv and Λ⊕cbn: similar results hold if we consider as set of observations V∼ and H∼ respectively.
XI Conclusion and discussion
XI-A Summary
In this paper we design two probabilistic extensions of respectively the CbV and CbN λ-calculus, Λ⊕cbv and Λ⊕cbn, which we propose as foundational calculi for probabilistic computation. Both calculi enjoy confluence and standardization, in an extended way. Namely, first we prove both properties for the finite sequences, exploiting classical methods, then we extend these properties to the limit, developing new sophisticated proof methods.
In particular, we prove the uniqueness of the (maximal) result, parametrized by the notion of set of observations,
and that the asymptotic extension of surface standardization supplies a family of complete reduction strategies which are guaranteed to reach the best result.
The two calculi have a common root in the linear λ-calculus Λ⊕!, which is both a technical tool
and a calculus of interest in its own, in which a fine control of the interaction between copying and choice is possible.
In all three calculi, β-reduction is unconstrained; hence for each calculus, its restriction to only β-reduction exactly gives the usual corresponding (CbN, CbV, or linear) λ-calculus; this is not the case for extensions in which a strategy is fixed.
New proof methods include the asymptotic extension of surface standardization (Thm. 26), and the use of a translation to transfer standardization properties, namely from Λ⊕! to Λ⊕cbn.
It is worth stressing a crucial element: the fact that the translation is sound, complete and preserves surface contexts is what allows us to transfer the results.
XI-B Discussion
Relating the calculi (Girard’s Translations)
The key to understand how Λ⊕cbv, Λ⊕cbn, and Λ⊕! relate are
the two Girard’s translations which embed intuitionistic logic into linear logic, and which are well known to respectively correspond to CbN and CbV computations. Let us clarify this.
Let us start from Λ⊕!: the natural constraint to avoid copying the result of a choice is "no ⊕-reduction in the scope of !" (i.e., inside a !-box).
Using the intuition provided by Girard’s translations as a guide, the constraint above becomes respectively
"no ⊕-reduction in the scope of a λ-abstraction" (in CbV) and "no ⊕-reduction in argument position" (in CbN).
Our three notions of surface context express these three constraints.
The intuitive reasoning above can be formalized thanks to a recent line of work [12, 17], which internalizes the insights coming from linear logic and proof nets into a λ-syntax. The resulting calculus subsumes both CbN and CbV λ-calculi via Girard’s translation.
The idea of a system which subsumes both CbV and CbN had been already advocated and developed by Levy, via the Call-By-Push-Value paradigm [20].
And indeed, [12] can be seen as an untyped version of Levy’s calculus. We leave to the future a comprehensive approach, where a probabilistic linear calculus is the metalanguage in which all the results are developed.
On non-deterministic λ-calculi
The finitary results we presented (namely, confluence and finitary surface standardization) also hold if the probabilistic choice is replaced by non-deterministic choice (just forget the coefficients). Asymptotic results, instead, are specific to probabilistic computation.
Λ⊕! and quantum λ-calculi
The fine control of duplication which Λ! inherits from linear logic has made it an ideal base for quantum λ-calculi (such as [7, 6]).
In those calculi,* surface reduction* is the key ingredient to
allow for the coexistence of quantum bits with duplication and erasing.
No reduction (not even β) is allowed in the scope of a ! operator.
Our results show that β-reduction can be unrestricted, only measurement (the quantum analogue of ⊕) needs to be surface.
We prove Thm. 12, i.e. finitary Surface Standardization for Λ⊕cbv. We start by establishing Surface Standardization for the (non probabilistic) call-by-value λ-calculus, Λcbv, in -C2. This result is folklore, but we could not find it in the literature. In -C3 we extend the result to Λ⊕cbv.
-C1 Preliminary definitions
Surface and left reduction
Surface and left reduction have been defined in Sec. V-B1;
Fig. 9 and Fig. 10 give explicitly the inference rules for surface and left steps; we use the notation defined below:
Notation**.**
If m=[piMi∣i∈I], we write m@Q for [pi(MiQ)∣i∈I], and Q@m for [pi(QMi)∣i∈I].
We recall that a reduction step → is deep, written →d, (resp. internal, written →int) if it is not a surface step (a left step). We have already observed that →d⊂→int, and
that since a ⊕-redex is always surface, a →d step is always a →βv step.
Parallel βv-reduction
•
Parallelβv-reduction is a standard definition, and is given in Fig. 12. We define its lifting \leavevmode∥⇒βv as usual (see Section IV-A).
•
Deep parallel reduction (\leavevmode∥→d, with lifting \leavevmode∥⇒d)
indicates that M\leavevmode∥⇒[S] and M⇒d∗[S]. We make the rules explicit in Fig. 12.
Fact 49**.**
The following holds
[TABLE]
Translation
We refine the translation given in in Sec. IV-B in order to preserves surface reduction.
Let z,w be fresh variables.
(⋅)λ:Λ⊕→Λ is defined as follows:
[TABLE]
The following is straightforward to check.
Lemma 50**.**
Assume M∈Λ⊕.
P→βv[Q]* and (Q)λ=S (in Λ⊕) ⟺(P)λ→βvS (in Λ).*
2. 2.
P→sβv[Q]* and (Q)λ=S (in Λ⊕) ⟺(P)λ→sβvS (in Λ).*
3. 3.
P\leavevmode∥→βv[Q]* and (Q)λ=S (in Λ⊕) ⟺(P)λ\leavevmode∥→βv(Q)λ (in Λ).*
-C2 Λcbv and Surface Standardization
With the standard definition of left, internal, and parallel reduction (denoted →lβv,→intβv,\leavevmode∥→βv, respectively) the following results are well known to hold (see [25, 27]).
(a)
If M→βv∗N then exists S such that
[TABLE]
(b)
If M\leavevmode∥→βvN then exists S s.t. M→lβv∗S\leavevmode∥→intβvN.
(c)
If M\leavevmode∥→intβvM′→lβvN, it exists S s.t. M→lβv∗S\leavevmode∥→intβvN.
The results of the following lemma are immediately obtained from the previous ones, by observing that a left reduction is a surface reduction, a deep reduction is always an internal reduction, and that →intβv does not modify the shape of a term (see [27]).
Lemma 51**.**
If M→βv∗N then exists S such that
[TABLE]
2. 2.
If M\leavevmode∥→βvN then exists S s.t. M→sβv∗S\leavevmode∥→dβvN.
3. 3.
If M\leavevmode∥→dβvM′→sβvN then exists S s.t. M→sβv∗S\leavevmode∥→dβvN.
Proof.
The first two are by induction on N. We recall that →l⊆→s.
By (a) M→βv∗N implies M→lβv∗S→intβv∗N. We examine N.
•
N=x. Then S=N and the result holds trivially.
•
N=λx.P. Hence S=λx.Q→intβv∗λx.P.
Then M→lβv∗λx.Q→dβv∗λx.P.
•
N=PQ. Then S=P′Q′, where P′→βv∗P and Q′→βv∗Q.
By induction P′→sβv∗P′′→dβv∗P and Q′→sβv∗Q′′→dβv∗Q, and the desired sequence is
M→lβv∗P′Q′→sβv∗P′′Q′′→dβv∗PQ.
The result follows since →lβv⊂→dβv.
2. 2.
Similar to the previous one, using (b) , i.e., the fact that M\leavevmode∥→βvN implies M→lβv∗S\leavevmode∥→intβvN.
•
N=λx.P. Then S=λx.Q and Q\leavevmode∥→βvP. By
definition, M→lβv∗λx.Q\leavevmode∥→dβvλx.P.
•
N=PQ. Then S=P′Q′, with P′\leavevmode∥→βvP and Q′\leavevmode∥→βvQ.
By induction P′→sβv∗P′′\leavevmode∥→dβvP and Q′→sβv∗Q′′\leavevmode∥→dβvQ, and the desired sequence is
M→lβv∗P′Q′→sβv∗P′′Q′′→dβv∗PQ.
3. 3.
By induction on M.
•
M=x or M=λx.P. Immediate.
•
M=(λx.P)V. Assume (λx.P)V\leavevmode∥→dβv(λx.P′)V′→sβvN. Since the deep step is an internal step, the surface step is a left step, we have
(λx.P)V\leavevmode∥→intβv(λx.P′)V′→lβvN. From (c), it exists S, M→lβv∗S\leavevmode∥→intβvN.
The \leavevmode∥→intβv step is in particular a \leavevmode∥→βv, hence from point 2 it holds that S→sβv∗S′\leavevmode∥→dβvN, hence the claim.
•
M=PQ. By hypothesis, PQ\leavevmode∥→dβvP′Q′→sβvN; the surface redex is inside either P′ or Q′, say Q′. We have N=P′R, Q\leavevmode∥→dβvQ′→sβvR and by induction
Q→sβv∗R′\leavevmode∥→dβvR. Hence PQ→sβv∗PR′\leavevmode∥→dβvP′R.
If M\leavevmode∥→d[M′] and M′→sn, then it exists s, such that [M]⇒s∗s and s\leavevmode∥⇒dn.
Proof.
If M′→sβvn, the claim holds by simulation in Λcbv and Lemma 51, point (3).
If M′→s⊕n, we procede by induction on M.
The case M=x and M=λx.P do not apply.
2. 2.
Let M=P⊕Q. Assume P⊕Q\leavevmode∥→d[R⊕S], so P\leavevmode∥→βvR, Q\leavevmode∥→βvS, and R⊕S→s⊕[21R,21S]. By Lemma 51, point 2 and simulation in Λcbv, it holds that P→s∗P′\leavevmode∥→dR and Q→s∗Q′\leavevmode∥→dS. Therefore P⊕Q→s⊕[21P,21Q]⇒s∗[21P′,21Q′]\leavevmode∥⇒d[21R,21S]
3. 3.
Let M=PQ. Assume PQ\leavevmode∥→d[RT] (with P\leavevmode∥→dR and Q\leavevmode∥→dT) and RT→⊕n with the ⊕-redex in either R or T, say is in R. Hence R→⊕r=[21Ri∣i∈{1,2}] and RT→s[21RiT∣i∈{1,2}]=n.
By induction, from P\leavevmode∥→d[R]⇒sr we have [P]⇒s∗[21Si∣i∈{1,2}] and Si\leavevmode∥→dRi .
Therefore [PQ]→s∗[21SiQ∣i∈{1,2}]\leavevmode∥⇒d[21RiT∣i∈{1,2}]=n .
∎
Corollary 53**.**
If m\leavevmode∥⇒dn and n⇒s∗r, then exists s with m⇒s∗s and s\leavevmode∥⇒dr.
Proof.
By induction on the length k of n⇒s(k)r. If k=0 the result is trivial.
Otherwise, let n⇒s∗r be n⇒sn1⇒s(k−1)r. By Lemma 52, from m\leavevmode∥→dn⇒sn1 we have that m⇒s∗s\leavevmode∥⇒dn1⇒s(k−1)r.
By inductive hypothesis, s⇒s∗r′\leavevmode∥⇒dr, hence m⇒s∗s⇒s∗r′\leavevmode∥⇒dr.
∎
Now we are able to prove the theorem:
Thm. 12: if m⇒∗n then then exists r such that m⇒s∗r and r⇒d∗n.
Proof.
By induction on the length k of the reduction m⇒∗n, using Corollary 53.
If k=0, the result is trivial (r=m). Otherwise, m⇒m1⇒∗n. By induction, we have
m1⇒s∗r⇒d∗n. We can separate the first step in two: m⇒sm′⇒dm1, by reducing first only the elements of m which have a surface reduction, and then only the elements which have a deep reduction. The step m′⇒dm1 can be regarded as a parallel step. By Corollary 53, from m′\leavevmode∥⇒dm1⇒s∗r we obtain m′⇒s∗s\leavevmode∥⇒dr, hence it holds that
m⇒sm′⇒s∗s⇒d∗r⇒d∗n.
Let X be a countable set, fn:X→[0,∞] a non-decreasing sequence of functions, such that
f(x):=limn→∞fn(x)=supnfn(x) exists for each x∈X. Then
[TABLE]
Hence, given μn:Obs→[0,1] and ρ(U)=limn→∞μn(U), the following holds:
[TABLE]
Existence of maximals.
We recall the definition of norm ∥μ∥=∑x∈Xμ(x).
Lemma 54** (Existence of maximals).**
Confluence implies that:
Norms(m)={∥μ∥\leavevmode∣\leavevmodeμ∈Lim(m)}* has a greatest element;*
2. 2.
Lim(m)* has maximal elements.*
Proof.
(1. )
Let p=sup\leavevmodeNorms(m).
We show that p∈Norms(m),
by providing a rewrite sequence ⟨mn⟩n∈N from m such that ⟨mn⟩n∈N⇒\leavevmode\leavevmode\leavevmode∞τ and ∥τ∥=p.
The following facts are all easy to check:
a.
If α<β then ∥α∥<∥β∥.
b.
If p∈Norms(m), then for each ϵ, there exists μ∈Lim(m) such that ∥μ∥≥p−ϵ.
c.
The Main Lemma implies that, fixed ϵ, if m⇒\leavevmode\leavevmode\leavevmode∞μ with ∥μ∥≥(p−ϵ), and m⇒∗s, then there exists s′, such that s⇒∗s′ and ∥σ′∥≥(p−2ϵ).
(Proof:
Main Lemma implies that there is a rewrite sequence ⟨sn⟩n∈N from s which converges to σ≥μ.
Therefore ⟨sn⟩n∈N⇒\leavevmode\leavevmode\leavevmode∞σ where ∥σ∥≥(p−ϵ). For the same ϵ, there is an index N such that s⇒∗sN and ∥σN∥≥(∥σ∥−ϵ), hence ∥σN∥≥p−2ϵ. )
d.
∀δ∈R+ there exists k such that 2kp≤δ.
For each k∈N, let ϵk=2kp. Let s(0)=m. From here, we build a sequence of reductions
m⇒∗s(1)⇒∗s(2)⇒∗… whose limit has norm p, as illustrated in Fig. 13.
For each k>0, we observe that:
•
By (b.) there exists μ(k)∈Lim(m) such that ∥μ(k)∥≥(p−212kp).
•
From m⇒∗s(k−1), we use (c.) to establish that there exists s(k) such that s(k−1)⇒∗s(k) and
∥σ(k)∥≥(p−2kp). Observe that μ(k),s(k−1),s(k) resp. instantiate μ,s,s′ of (c.).
Let ⟨sn⟩n∈N be the concatenation of all the finite sequences s(k−1)⇒∗s(k). By construction,
⟨sn⟩n∈N⇒\leavevmode\leavevmode\leavevmode∞τ such that ∥τ∥=p. Hence p∈Norms(m).
**(1. ⇒ 2.) ** We observe that if ⟨mn⟩n∈N⇒\leavevmode\leavevmode\leavevmode∞μ and ∥μ∥ is maximal in Norms(m), then μ is maximal in Lim(m), because of (a.).
Case M=S(Q), and both redexes are inside Q. Similar to Lemma 8, case (2.2b).
3. 3.
Case M=PQ, with the β-redex inside P, and the ⊕-redex inside Q. Similar to Lemma 8,
case (2.2a).
4. 4.
Case M=(λ!x.P)!Q, where M is the β-redex. The ⊕-redex needs to be inside P.
Assume P→⊕[21P1,21P2].
We have M→⊕[21(λ!x.P1)!Q,21(λ!x.P2)!Q], and
M→β[P[Q/x]]. It is immediate that the multidistribution r=[21P1[Q/x],21P2[Q/x]] satisfies the claim.
5. 5.
Case M=(λx.P)Q, where M is the β-redex.
If the ⊕-redex is inside P, we reason as above. Assume that the ⊕-redex is inside Q, and we have Q→⊕[21Q1,21Q2]. The key observation is that in P there is *at most one occurrence *of x. Let assume there is exactly one occurrence (the case of none is easy). Let C be the context such that P=C(x) (i.e., C is P, with a hole in the place of x). Observe that P[Q/x]=C(Q). We have M→β[P[Q/x]=C(Q)], and M→⊕[21(λx.P)Q1,21(λx.P)Q2].
The multidistribution r=[21C(Q1),21C(Q2)] satisfies the claim.
-F1 Λ⊕! is a conservative extension of Λ⊕cbn (X-A4)
To prove Proposition 42, we first prove that (⋅)N preserves both surface contexts and ⊕-redexes.
Lemma 58**.**
Given M∈Λ⊕cbn, (S1) holds ⟺ (S2) holds, where:
S1:
in Λ⊕cbn , there exists S surface context and a redex r=R1⊕R2 such that M=S(r);
2. S2:
in Λ⊕! there exists T surface context and a redex u=U1⊕U2 such that (M)N=T(u);
and moreover (S(Ri))N=T(Ui), for i∈{1,2}.
Proof.
⟹. By induction on the form of the surface context.
•
□. Since M=r, then (M)N=(R1)N⊕(R2)N. Hence u=(R1)N⊕(R2)N and T=□
satisfy the claim.
•
SQ. We have that M=SQ(r)=S(r)Q. Hence (S(r)Q)N=(S(r))N!(Q)N. By inductive hypothesis, there exist T′ and u such that
(S(r))N=T′(u), and (S(Ri))N=T′(Ui).
By definition of surface context in Λ⊕!, the claim hold with T=T′!(Q)N, and the same u.
•
λx.S. (λx.S(r))N=λ!x.(S′(r))N, and the claim holds by inductive hypothesis.
⟸.
We examine the possible form of T, given that (M)N=T(u); we prove that
M=S(r) and that (S(Ri))N=T(Ui).
•
□. Immediate.
•
T′Q. We have that (T′Q)(u)=T′(u)Q, and
T′(u)Q=(LN)N=(L)N!(N)N with M=LN. Therefore T′(u)=(L)N, and the claim holds by inductive hypothesis and definition of surface context.
•
λ!x.T′. We have that (λ!x.T′)(u)=λ!x.T′(u) and λ!x.T′(u)=λ!x.(M′)N, with M=λx.M′. The claim holds by inductive hypothesis.
∎
Proposition**.**
42. [Simulation] The translation (.)N is sound and complete; it preserves surface reduction and surface normal forms.
Let M∈Λ⊕cbn; the following hold:
if M→n then (M)N→(n)N;
2. 2.
if M→sn then (M)N→s(n)N;
3. 3.
if (M)N→s then ∃!n such that s=(n)N and M→n;
4. 4.
if (M)N→ss then ∃!n such that s=(n)N and M→sn;
5. 5.
M∈H if and only if (M)N∈S!.
Proof.
We prove (1.)-(4.); since
→\leavevmode=\leavevmode→β∪→⊕, we deal separately with the two reductions. Point (5.) is an immediate consequence of the other points.
→β
We deal with →β via simulation in Λcbn and Λ!, since the analogous result is proved in [17].
We have defined a translation (−)!:Λ⊕!→Λ! which is sound and complete, and preserves surface reduction. It is straightforward to define a similar translation from Λ⊕cbn into Λcbn.
Therefore, if
in Λ⊕cbn it holds M→βN,
we translate in Λcbn,
use the result in [17] and conclude (via simulation) that
(M)N→β(N)N in Λ⊕!.
Similarly for (2.)-(3.)-(4.).
→⊕
Immediate consequence of Lemma 58,
which proves that that (⋅)N preserves both surface contexts and ⊕-redexes.
∎
Bibliography31
The reference list from the paper itself. Each links out to its DOI / PubMed record.
1[1] P. Arrighi and G. Dowek. Lineal: A linear-algebraic lambda-calculus. Logical Methods in Computer Science , 13(1), 2017.
2[2] M. Avanzini, U. Dal Lago, and A. Yamada. On probabilistic term rewriting. In J. P. Gallagher and M. Sulzmann, editors, Functional and Logic Programming - 14th International Symposium, FLOPS 2018, Nagoya, Japan, May 9-11, 2018, Proceedings , volume 10818 of Lecture Notes in Computer Science , pages 132–148. Springer, 2018.
3[3] G. Bacci, R. Furber, D. Kozen, R. Mardare, P. Panangaden, and D. Scott. Boolean-valued semantics for the stochastic λ 𝜆 \lambda -calculus. In A. Dawar and E. Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018 , pages 669–678. ACM, 2018.
4[4] H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics , volume 103. North Holland, 1984.
5[5] J. Borgström, U. Dal Lago, A. D. Gordon, and M. Szymczak. A lambda-calculus foundation for universal probabilistic programming. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, Nara, Japan, September 18-22, 2016 , pages 33–46, 2016.
6[6] U. Dal Lago, C. Faggian, B. Valiron, and A. Yoshimizu. The geometry of parallelism: classical, probabilistic, and quantum effects. In G. Castagna and A. D. Gordon, editors, Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017 , pages 833–845. ACM, 2017.
7[7] U. Dal Lago, A. Masini, and M. Zorzi. Confluence results for a quantum lambda calculus with measurements. Electr. Notes Theor. Comput. Sci. , 270(2):251–261, 2011.
8[8] U. Dal Lago and M. Zorzi. Probabilistic operational semantics for the lambda calculus. Co RR , abs/1104.0195, 2011.