On the Taylor Expansion of Probabilistic $\lambda$-Terms (Long Version)
Ugo Dal Lago, Thomas Leventis

TL;DR
This paper extends the Taylor expansion framework to probabilistic lambda calculus, establishing its adequacy as a semantic tool and linking it to probabilistic B"ohm trees, thus advancing the understanding of probabilistic computation semantics.
Contribution
It generalizes the Taylor expansion to probabilistic lambda-terms and proves its adequacy and correspondence with probabilistic B"ohm trees, providing a new semantic perspective.
Findings
Taylor expansion is adequate for probabilistic lambda-terms
Established a correspondence with probabilistic B"ohm trees
Extended resource calculus to probabilistic settings
Abstract
We generalise Ehrhard and Regnier's Taylor expansion from pure to probabilistic -terms through notions of probabilistic resource terms and explicit Taylor expansion. We prove that the Taylor expansion is adequate when seen as a way to give semantics to probabilistic -terms, and that there is a precise correspondence with probabilistic B\"ohm trees, as introduced by the second author.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
On the Taylor Expansion of Probabilistic -terms
(Long Version)
Ugo Dal Lago
Thomas Leventis
Abstract
We generalise Ehrhard and Regnier’s Taylor expansion from pure to probabilistic -terms through notions of probabilistic resource terms and explicit Taylor expansion. We prove that the Taylor expansion is adequate when seen as a way to give semantics to probabilistic -terms, and that there is a precise correspondence with probabilistic Böhm trees, as introduced by the second author.
1 Introduction
Linear logic is a proof-theoretical framework which, since its inception [10], has been built around an analogy between on the one hand linearity in the sense of linear algebra, and on the other hand the absence of copying and erasing in cut elimination and higher-order rewriting. This analogy has been pushed forward by Ehrhard and Regnier, who introduced a series of logical and computational frameworks accounting, along the same analogy, for concepts like that of a differential, or the very related one of an approximation. We are implicitly referring to differential -calculus [6], to differential linear logic [8], and to the Taylor expansion of ordinary -terms [9]. The latter has given rise to an extremely interesting research line, with many deep contributions in the last ten years. Not only the Taylor expansion of pure -terms has been shown to be endowed with a well-behaved notion of reduction, but the Böhm tree and Taylor expansion operators are now known to commute [7]. This easily implies that the equational theory (on pure -terms) induced by the Taylor expansion coincides with the one induced by Böhm trees.
The Taylor expansion operator is essentially quantitative, in that its codomain is not merely the set of resource -terms [3, 6], a term syntax for promotion-free differential proofs, but the set of linear combinations of those terms, with positive real number coefficients. When enlarging the domain of the operator to account for a more quantitative language, one is naturally lead to consider algebraic -calculi, to which giving a clean computational meaning has been proved hard so far [18].
But what about probabilistic -calculi [11], which have received quite some attention recently (see, e.g. [5, 2, 16]) due to their applicability to randomised computation and bayesian programming? Can the Taylor expansion naturally be generalised to those calculi? This is an interesting question, to which we give the first definite positive answer in this paper. In particular, we show that the Taylor expansion of probabilistic -terms is a conservative extension of the well-known one on ordinary -terms. In particular, the target can be taken, as usual, as a linear combination of ordinary resource -terms, i.e., the same kind of structure which Ehrhard and Regnier considered in their work on the Taylor expansion of pure -terms. We moreover show that the Taylor expansion, as extended to probabilistic -terms, continues to enjoy the nice properties it has in the deterministic realm. In particular, it is adequate as a way to give semantics to probabilistic -terms, and the equational theory on probabilistic -terms induced by Taylor expansion coincides with the one induced by a probabilistic variation on Böhm trees [1]. The latter, noticeably, has been proved to capture observational equivalence, one quotiented modulo -equivalence [1].
Are we the first ones to embark on the challenge of generalising Taylor’s expansion to probabilistic -calculi, and in general to effectful calculi? Actually, some steps in this direction have recently been taken. First of all, we need to mention the line of works originated by Tsukada and Ong’s paper on rigid resource terms [14]. This has been claimed from the very beginning to be a way to model effects in the resource -calculus, but it has also been applied to, among others, probabilistic effects, giving rise to quantitative denotational models [15]. The obtained models are based on species, and are proved to be adequate. The construction being generic, there is no aim at providing a precise comparison between the discriminating power of the obtained theory and, say, observational equivalence: the choice of the underlying effect can in principle have a huge impact on it.
One should also mention Vaux’s work on the algebraic -calculus [18], where one can build arbitrary linear combinations of terms. He showed a correspondence between Taylor expansion and Böhm trees, but only for terms whose Böhm trees approximants at finite depths are computable in a finite number of steps. This includes all ordinary -terms but not all probabilistic ones. More recently Olimpieri and Vaux have studied a Taylor expansion for a non-deterministic -calculus [19] corresponding to our notion of explicit Taylor expansion (Section 3).
In the rest of this section, probabilistic Taylor expansion will be informally introduced by way of an example, so as to make the main concepts comprehensible to the non-specialist. In sections 2 and 3, we introduce a new form of resource term, and a notion of explicit Taylor expansion from probabilistic -terms. These constructions have an interest in themselves (again, see [19]) but in this paper they are just an intermediate step towards proving our main results. Definitionally, the crux of the paper is Section 4, in which the Taylor expansion of a probabilistic -term is made to produce ordinary resource terms. The relationship between the introduced theory and the one induced by Probabilistic Böhm trees [13] is investigated in Section 5 and Section 7.
The Probabilistic Taylor Expansion, Informally
In this section, we introduce the main ingredients of the probabilistic Taylor expansion by way of an extremely simple, although instructive, example. Let us consider the probabilistic -term , where is an operator for binary, fair, probabilistic choice, , and is a purely diverging, term. As such, is a term of a minimal, untyped, probabilistic -calculus. Evaluation of , if performed leftmost-outermost is as in Figure 1. In particular, the probability of convergence for is .
Please observe that two copies of the argument are produced, and that the “rightmost” one is evaluated only when the “leftmost” one converges, i.e. when the probabilistic choice produces as a result.
The main idea behind building the Taylor expansion of any -term is to describe the dynamics of by way of linear approximations of . In the realm of the -calculus, a linear approximation has traditionally been taken as a resource -term, which can be seen as a pure -term in which applications have the form , where is a term and is a multiset of terms, and in which the result of firing the redex is the linear combination of all the terms obtained by allocating the resources in to the occurrences of in . For instance, one such element in the Taylor expansion of is , where the occurrence of in head position is provided with only one copy of its argument. If applied to the multiset , this term would reduce into . Similarly, an element in the Taylor expansion of would be , which reduces into . Another element of the same Taylor expansion is , but this one reduces into [math]: there is no way to use its resources linearly, i.e., using them without copying and erasing. The actual Taylor expansion of a term is built by translating any application into an infinite sum . For instance, the Taylor expansion of is . Remark that any summand properly reduces only when , in which case it reduces to . In turn reduces properly only when , and the result is . All the other terms reduce to [math]. In the end the Taylor expansion of normalises to .
Extending the Taylor expansion to probabilistic terms seems straightforward, a natural candidate for the Taylor expansion of being just . When computing the Taylor expansion of we will find expressions such as , i.e. . For non-trivial reasons, the Taylor expansion of any diverging term normalises to [math], so just like in our previous example, the only element in which does not reduce to [math] is . The difference is that this time it appears with a coefficient , so normalises to . Please notice how this is precisely the “normal form” of the original term . This is a general phenomenon, whose deep consequences will be investigated in the rest of this paper, and in particular in Section 5.
Notations
We write for the set of natural numbers and for the set of nonnegative real numbers. Given a set , we write for the set of families of positive real numbers indexed by elements in . We write such families as linear combinations: an element is a sum , with . The support of a family is . We write for those families such that is finite. Given we often write for unless we want to emphasise the difference between the two expressions. We also define finite multisets over as functions such that for finitely many . We use the notation to describe the multiset such that is the number of indices such that .
2 Probabilistic Resource -Calculus
In this section, we describe the theory of resource terms with explicit choices, for the purpose of extending many of the properties of resource terms to the probabilistic case. All this has an interest in itself, but here this is mainly useful as a way to render certain proofs about the Taylor Expansion easier (see Section 3 for more details). For this reason we try to give the reader a clear understanding of this calculus and of why these definitions and properties are useful, without focusing on the actual proofs. These are straightforward generalisations of those for deterministic resource terms [9] and can be found in an extended version of this paper [4]. The same results have recently been given for a non-deterministic calculus [19] by Olimpieri and Vaux.
2.1 The Basics
Definition 2.1**.**
The sets of probabilistic simple resource terms and of probabilistic simple resource poly-terms over a set of variables are defined by mutual induction as follows:
[TABLE]
where ranges over . We call finite probabilistic resource terms the finite linear combinations of resource terms in , and finite probabilistic resource poly-terms the finite linear combinations of resource poly-terms in . We extend the constructors of simple (poly-)terms to (poly-)terms by linearity, e.g., if then is defined as the poly-term such that and if is not an abstraction.
Some consecutive abstractions will be indicated as , or even as . Similarly, to describe many successive applications , we use a single pair of brackets and we write . We write for , which is ranged over by metavariables like . Note that intuitively should stand for either or , not their union. For instance we will prove some properties for finite linear combinations in , but the only relevant linear combinations are the actual (poly-)terms in or . Yet this distinction is technically irrelevant, and all our results hold if we define as a union.
The reason why linear combinations over such elements are dubbed terms will be clear once we describe the operational semantics of the resource calculus. The main point of the resource -calculus is to allow functions to use their argument arbitrarily many times and yet remain entirely linear, which is achieved by taking multisets as arguments: if a function uses its argument times then it needs to receive resources as argument and use each of them linearly. This idea has two consequences. First, an application can fail if a function is not given exactly as many arguments as it needs, as it would need either to duplicate or to discard some of them. Second, the result of a valid application is often not unique: a function can choose how to allocate the different resources to the different calls to its argument, and different choices may lead to different results. Both these features are treated using linear combinations: a failed application results in [math] (i.e. the trivial linear combination) and a successful one yields the sum of all its possible outcomes.
Definition 2.2**.**
We define the substitution of for in by:
[TABLE]
where are the free occurrences of in and is the set of permutations over . Alternatively, we could define by induction on , as follows
[TABLE]
where is the disjoint union of sets.
Example 2.1*.*
A basic example is : there are two occurrences of in , so there are two ways to substitute for them. Remark that we also have : the two occurrences of are not as clearly distinguished as in the first example but they still count as different occurrences. Similarly and : there are two distinct occurrences of , so there are two ways to allocate them. As another example, please consider : the substitution fails if the number of resources does not match the number of free occurrences of the substituted variable.
The operational semantics of the deterministic resource -calculus [9] is usually given as a single rule of -reduction. In the probabilistic setting, we also need rules to make choices commute with head contexts.
Definition 2.3**.**
The reductions and are defined from to by:
[TABLE]
extended under arbitrary contexts. We simply write for . Reduction can be extended to finite terms in the following way: if , and then .
As the resource -calculus does not allow any duplication, and -reduction erases some constructors, it naturally decreases the size of the involved simple terms. Consequently, -reduction is strongly normalising. This result can be extended to the whole reduction , which is also confluent.
More specifically we define the size of a simple (poly-)term in a natural way. To any we associate two sizes: and . We order with a reverse lexicographical order: iff there exists such that and for all .
Proposition 2.1**.**
The reduction is confluent and strongly normalising on . Given we write for its unique normal form for , and given we write for .
Proof.
Proving weak confluence is straightforward. Strong normalisation is proven in two steps. First using an appropriate weight on terms describing how deep choices are we can prove that is strongly normalising. Second one can observe that preserves size, and that if and then , hence if then . The confluence is given by Newman’s Lemma. ∎
2.2 Complete Left Reduction
This reduction is not convenient to study (poly-)terms with particular properties such as uniformity or regularity, which we will define later. For instance given a simple poly-term we can reduce independently the different occurrences of , so not every reduct of is of the form with . Similarly given a term we can reduce independently the elements of its support, possibly losing some common properties shared by these elements. For that reason (as well as the issue of infinite terms discussed in the rest of this section) we are mostly interested in normalisation rather than reduction. To study this normalisation we still need some small-step operational semantics, but it will be more convenient to consider the complete left reduction defined as follows.
Definition 2.4**.**
We define the complete left reduct of a simple (poly-)term by induction:
[TABLE]
We extend this definition to terms: .
Proposition 2.2**.**
For all , .
Proposition 2.3**.**
For all there is such that .
Proof.
The reduction being strongly normalising we reason by induction on the bound on the length of the reductions of . We have either and is already in normal form or reduces into in a least one step and we conclude by induction hypothesis. ∎
2.3 Infinite Terms
So far we only worked with finite terms but to fully express the operational behaviour of a -term in the resource -calculus, which is the purpose of the Taylor expansion, we need infinite ones. We can extend the constructors of the calculus to by linearity and generalise the reduction relation , but Proposition 2.1 fails. Indeed let and . For , let . Then, for all the term normalises in steps and does not normalise in a finite number of reduction steps. A simple solution to this problem is to define the “normal form” of an infinite term by normalising each of its components: we can set . But then another problem arises. In our previous example, we have for all , thus we would have , which is not an element of as the coefficient of is infinite. Still we can use this pointwise normalisation if we consider terms with a particular property, called uniformity.
Definition 2.5**.**
The coherence relation on is defined by:
[TABLE]
[TABLE]
For we write when for all , . A simple (poly-)term is called uniform if , and a term is called uniform if .
Remark 2.1*.*
In the rule for we require and to ensure that whenever , the simple (poly-)terms and are necessarily uniform. This is not crucial, as we will only consider uniform (poly-)terms, whose support contains only uniform simple (poly-)terms by definition, but this simplifies inductive reasoning.
What makes coherence and uniformity interesting is that if two coherent terms and have disjoint supports, then all of their reducts, and in particular their normal forms, have disjoint supports. Then any element in the support of comes either from or from , but it cannot come from both.
Lemma 2.4**.**
If and then . Besides if then and .
Proof.
By induction on :
- •
If then for and to be both nonempty we need to have and for some , and in this case and . The hypothesis implies , and if then .
- •
If with then either one of the substitutions is [math] or we have .
- •
If , or , with in each case , then the result is immediate by induction hypothesis.
- •
If then we use the induction hypothesis on and (given by Proposition LABEL:prop:coh_ref) to prove that for we have , and similarly for , and the result follows. Notice that we will never have .
- •
If then , and similarly for . Observe that for and we have and so we can apply the induction hypothesis to and and to and to get the result.
- •
Finally if we use a similar reasoning: for any and we have , and , hence by induction hypothesis for any and we have , and . This gives the first part of the result. Now if then necessarily , and we can find sets and such that , and (up to permutation of the indices in and ). By induction hypothesis we get and , hence and .
∎
Proposition 2.5**.**
Given , if then . If moreover then .
Proof.
It is sufficient to prove the result for simple terms as the generalisation to finite terms is straightforward. We reason by induction on and the proof of .
- •
If or the result is immediate by induction hypothesis.
- •
If then and are uniform and by induction hypothesis so are and , hence .
- •
The case of head normal forms is immediate by induction hypothesis.
- •
If then we apply Lemma 2.4.
- •
The cases of head choices are immediate.
- •
The case of poly-terms is immediate by induction hypothesis.
∎
Corollary 2.6**.**
Given , if then . If moreover then .
Proof.
Using Proposition 2.3, by induction on . ∎
This immediately implies that pointwise reduction of infinite uniform terms is well defined, as both complete left reducts and normal forms of distinct but coherent simple (poly-)terms have disjoint supports.
Corollary 2.7**.**
If is uniform then and are in . We write and respectively for these sums.
Proof.
For all we have by hypothesis so the previous proposition gives . Therefore given any there is at most one such that . The same goes for normalisation. ∎
Remark 2.2*.*
Although both complete left reduction and normal forms are well defined for infinite terms, Proposition 2.3 doesn’t hold: consider , and , then is uniform and but for all , . Besides is not even the limit of the as approaches . However normal forms are indeed limits of complete left reducts restricted to normal simple terms.
Proposition 2.8**.**
Given a uniform (poly-)term and given in normal form, we have for all large enough.
Proof.
If then by Corollary 2.6 there is a unique such that , and by Proposition 2.3 for all large enough we have . ∎
2.4 Regular Terms
The deterministic Taylor expansion associates to any -term a uniform term, and explicit choices are adopted precisely for the sake of preserving this property in the probabilistic case. Taylor expansions have another important property: they are entirely defined by their support. If a simple term is in the support of the Taylor expansion of a -term , then its coefficient is the inverse of its multinomial coefficient, which does not depend on . Moreover this property is preserved by normalisation. Using explicit choices enforces this result in the probabilistic case, as well.
Definition 2.6**.**
For any we define the multinomial coefficient by:
[TABLE]
where is the multiplicity of in .
Definition 2.7**.**
A uniform term is called regular if for all , .
Multinomial coefficients correspond to the number of permutations of multisets which preserve the description of simple (poly-)terms. For instance, given variables , the coefficient is exactly the number of permutations such that . For a more precise interpretation of multinomial coefficients see [9] or [14]. Due to their relation with permutations in multisets, these coefficients appear naturally when we perform substitutions.
Theorem 2.9**.**
For any uniform, for , and , we have: .
There exist two methods to prove similar theorems in the literature, and both can be used to prove Theorem 2.9. The first one is the original proof by Ehrhard and Regnier for the pure deterministic case [9], and its generalisation is straightforward and only requires to extend the notion of uniformity (to take into account that is uniform). The second one is by Asada, Tsukada and Ong for a simply typed calculus with choices [14], and it has been extended to the untyped case by Olimpieri and Vaux in an unpublished paper [19]. We present here a direct generalisation of the proof in [9].
Definition 2.8**.**
A multilinear-free (poly)-term is a (poly)-term such that all of its variables are free and each one occurs exactly once. A multilinear-free substitution is a partial function from to multilinear-free terms such that for all in . We say that is adapted if and no element of is bound in . Then is the multilinear-free (poly)-term obtained by applying on the variables of . Similarly for any multilinear-free (poly)-term and we write for the term obtained by applying to the variables of without renaming captured variables. A pair is said to represent if .
Definition 2.9**.**
We define the following sets of bijections over variables:
[TABLE]
Lemma 2.10**.**
.
Lemma 2.11**.**
For any there exists a unique such that , and is a group homomorphism.
Lemma 2.12**.**
.
Definition 2.10**.**
We define by induction a notion of uniformity for pairs where is a multilinear-free polyterm and :
- •
is uniform if for all ;
- •
is uniform if is uniform;
- •
is uniform if and are uniform, with and the obvious restrictions of ;
- •
is uniform if and are uniform, where and are the obvious restrictions of .
If is a multilinear-free simple term we say that is uniform if is uniform.
Lemma 2.13**.**
A pair is uniform iff is uniform (i.e. ).
Lemma 2.14**.**
For a uniform pair and two multilinear-free substitutions over , if then there exists such that .
Lemma 2.15**.**
If is uniform then .
Proposition 2.16**.**
If is uniform then
Proof.
We have .
Observe that and . ∎
This is enough to conclude the proof of Theorem 2.9.
This theorem ensures that a regular -redex reduces into a regular term. More generally, the theorem is the key step towards proving that regular (poly-)terms always normalise to regular (poly-)terms.
Proposition 2.17**.**
If is uniform then for any , .
Proof.
We reason by induction on , using Theorem 2.9 when dealing with -reduction. Observe that in the case of a poly-term , according to Proposition 2.5 for all we have either or . This means that for a poly-term the number of pairwise distinct sequences with such that for all is exactly . ∎
Corollary 2.18**.**
For all finite regular term , and are regular.
Theorem 2.19**.**
If is regular then is regular.
Proof.
This follows directly from the previous result and Corollary 2.6. ∎
2.5 Regularity and the Exponential
The regularity of terms is preserved by the constructors of simple resource terms.
Proposition 2.20**.**
For all , regular and regular, the terms , , , and are regular.
One may expect a similar result for poly-terms: if ,…, in are regular then is regular. However, this is not the case: is regular and yet is not. Indeed nontrivial coefficients appear in precisely when contains simple poly-terms with multiplicities greater than , so the regular sum with the same support as has no simple description. A natural way to build regular poly-terms from regular terms is to use the following construction.
Definition 2.11**.**
The exponential of is , where stands for the poly-term with copies of .
Proposition 2.21**.**
If is regular then is regular.
Proof.
The key point is that the number of sequences which describe a given simple poly-term is exactly . ∎
With these results, we have all the ingredients we need to translate (probabilistic) -terms into regular terms: variables and abstractions of regular terms are regular, and we can define an application between regular terms following Girard’s call-by-name translation of intuitionistic logic into linear logic [10]: applied to is .
3 Explicit Probabilistic Taylor Expansion
This section is devoted to defining and studying the Taylor expansion with explicit choices, or explicit Taylor expansion, of probabilistic -terms. It is named as such because its target is the set of probabilistic resource terms, as defined in the previous section, rather than the usual ones. This is not the main contribution of this paper, but an intermediate step in the study of Taylor expansion as defined in Section 4.
3.1 The Definition
Probabilistic -terms are -terms enriched with a probabilistic choice operator.
Definition 3.1**.**
The set of probabilistic -terms is:
[TABLE]
Example 3.1*.*
Let us consider the probabilistic -term , where , , and is any diverging term, e.g. . The term converges (to ) with probability , and will be used as a running example throughout this section.
Definition 3.2**.**
The explicit Taylor expansion is defined inductively as follows:
[TABLE]
Definition 3.3**.**
The support of the Taylor expansion of is defined by:
[TABLE]
Proposition 3.1**.**
For every , it holds that
[TABLE]
Proof.
By induction on the structure of :
- •
If is a variable , then
[TABLE]
- •
If is an abstraction , then:
[TABLE]
- •
If is an application , then we can first of all give the following lemma. For every , it holds that
[TABLE]
As a consequence,
[TABLE]
- •
If is a sum , then
[TABLE]
∎
The results from the previous section immediately imply that Taylor expansions are regular resource terms and that they are normalisable.
Proposition 3.2**.**
For all , the explicit Taylor expansion is uniform and regular.
Proof.
This is a direct consequence of Proposition 2.20 and Proposition 2.21. ∎
Corollary 3.3**.**
Every explicit Taylor expansion has a normal form , which we call the explicit Taylor normal form of , and which is regular.
Proof.
This is given by Theorem 2.19. ∎
3.2 Probabilistic Reduction
In the literature, the probabilistic -calculus is usually endowed with a labelled transition relation describing a probabilistic reduction process, where a choice reduces to with probability and to with probability . Another kind of operational semantics, more common for other quantitative calculi such as the algebraic -calculus, is to have a non-labelled reduction where choices simply commute with some contexts, as we did in our probabilistic resource calculus. In this paper we use both kinds of semantics. On one hand a deterministic operational semantics will simplify the comparison between the operational semantics of -terms and that of their Taylor expansion, but on the other hand explicit Taylor expansion precisely splits choices into two different branches, just like labelled transition systems do.
Definition 3.4**.**
Head contexts are contexts of the form , and are indicated with the metavariable . Head normal forms are terms of the form . We write for the set of all head normal forms. We now define a formal system deriving judgements in the form where , and is a finite sequence of elements in :
[TABLE]
where is the empty sequence and for .
Proposition 3.4**.**
For all and there is at most one such that .
Definition 3.5**.**
For all we define the complete left reduct of by:
[TABLE]
Proposition 3.5**.**
For all , .
Proof.
By a simple induction on . ∎
Proposition 3.6**.**
If then either or . Conversely if then there is such that and either or .
Proof.
By induction on .
- •
For for both results the sequence of choices cannot be empty. Let us assume wlog we reduce to the left-hand side. If then and by induction hypothesis either or , hence either or . Similarly if we conclude by induction hypothesis.
- •
For head normal forms if then , and conversely if then .
- •
If there is a head -redex then and \lambda\vec{x}.M\left[\raisebox{1.99997pt}{N}/\raisebox{-1.99997pt}{y}\right]\,P_{1}\,\dots\,P_{m} have the same reductions. The same goes for head choices.
∎
An interesting property of explicit Taylor expansion is that the explicit Taylor normal form of a term is precisely given by the explicit Taylor normal forms of the head normal forms of , as well as the sequences of choices such that .
Definition 3.6**.**
Given a sequence of choices and we define by induction on the length of by:
[TABLE]
We extend this definition to by linearity.
Theorem 3.7**.**
Given any ,
[TABLE]
Proof.
First observe that these resource terms are regular: Corollary 3.3 states that and the are regular (so the are regular too), and if and then either and by Proposition 3.4 , or and then and are coherent and have disjoint supports. Thus we only need to prove that these terms have the same supports.
Now if then we prove by induction on the proof this relation that if then . More precisely we prove that for some , .
- •
If the result is immediate.
- •
If and then by induction hypothesis there is such that \rho\cdot s\in\mathrm{supp}(\mathrm{L}^{k}(H[(M\left[\raisebox{1.99997pt}{N}/\raisebox{-1.99997pt}{x}\right]]^{*\oplus})), ie .
- •
The same goes for head choices.
Conversely according to Proposition 2.8 for all in normal form there is such that , and according to Proposition 3.5 we have . Hence if we have . It is then easy to prove by induction on that there are and such that and . Then according to the previous proposition there is such that and (with ), hence . ∎
Lemma 3.8**.**
For all , if and then \delta_{x}s\cdot\overline{t}\in\mathrm{supp}(M\left[\raisebox{1.99997pt}{N}/\raisebox{-1.99997pt}{x}\right]^{*\oplus}).
Proof.
By induction on . ∎
Lemma 3.9**.**
For any and any head context we have:
[TABLE]
4 Generic Taylor Expansion of Probabilistic -terms
4.1 Barycentric Semantics of Choices
The explicit probabilistic Taylor expansion is satisfactory in that it is an extension of deterministic Taylor expansion which preserves its most important properties: it is regular and so are its normal forms. But while deterministic Taylor normal forms are well known to correspond to Böhm trees [7], explicit Taylor normal forms are not such a good denotational semantics for probabilistic -calculus, as they take the exact choices made during the reduction into account. For instance the terms and have distinct explicit Taylor normal forms while one could expect them to have the same semantics. More precisely we expect any model of the probabilistic -calculus to interpret probabilistic choices as a barycentric sum respecting the following equivalence.
Definition 4.1**.**
The barycentric equivalence is the least congruence on such that for all and :
[TABLE]
Saying it another way, We want a notion of Taylor expansion such that if then . This is easy to achieve, as the resource -calculus stemmed precisely from quantitative models of the -calculus, and resource terms are linear combinations.
Definition 4.2**.**
The sets of simple resource terms and of simple resource poly-terms are:
[TABLE]
The set of resource terms is and the set of resource poly-terms is .
Definition 4.3**.**
The Taylor expansion of a term is defined inductively as follows:
[TABLE]
The definition of the Taylor expansion of a probabilistic choice immediately gives the expected property.
Proposition 4.1**.**
If then .
4.2 Normalisation
Unfortunately, these Taylor expansions lack all the good properties of explicit expansions: they are not entirely defined by their support, and those supports are not uniform, so we do not even know if such Taylor expansions admit normal forms. But there is actually a close relationship between explicit and non explicit Taylor expansions which can be used to recover our most important results. Indeed, switching from the explicit Taylor expansion to the Taylor expansion simply amounts to using coefficients instead of explicit choices.
Definition 4.4**.**
Given any we define and a probability as follows:
[TABLE]
To any probabilistic resource (poly-)term one could associate the resource term . But just like with normalisation, infinite coefficients may appear. For instance, removing the choices from could give an infinite coefficient. Fortunately, we do not get any infinite coefficient if we work with regular terms.
Proposition 4.2**.**
For any such that for all , and we have .
Corollary 4.3**.**
For all regular, is in .
In particular, we can apply this process to explicit Taylor expansions and to their normal forms. It is easy to see that we associate to every explicit Taylor expansion the corresponding Taylor expansion, but more interestingly erasing choices commutes with normalisation.
Proposition 4.4**.**
For any :
[TABLE]
hence is well defined. We denote it by and we call it the Taylor normal form of .
Proof.
The key point is that and for any , . ∎
4.3 Adequacy
The behaviour of a probabilistic -term is usually described as a (sub-)probability distribution over the possible results of its evaluation. In particular, the observable behaviour of a term is its convergence probability, i.e. the probability for its computation to terminate [11, 5]. To show that the Taylor expansion gives a meaningful semantics we will prove it is adequate, i.e. it does not equate terms which are not observationally equivalent. We can actually show a more refined result, given as a Corollary of Theorem 3.7: the Taylor normal form of a term is given by the Taylor normal forms of its head normal forms.
Definition 4.5**.**
The any sequence of choices we associate a probability by:
[TABLE]
The probability for to reduce into a head normal form and its convergence probability are defined as follows:
[TABLE]
Proposition 4.5**.**
For we have:
[TABLE]
Proof.
This is given by Proposition 4.4 and Theorem 3.7. Observe that for any and we have and . ∎
The adequacy follows immediately.
Proposition 4.6**.**
If then for all context , , i.e. and are contextually equivalent.
Proof.
First the convergence probability of a term is exactly the sum of the coefficients . Second if then for all . ∎
5 On the Taylor Expansion and Böhm Trees
5.1 A Commutation Theorem
Deterministic Taylor normal forms are an adequate semantics for the probabilistic -calculus, but more precisely they are known to correspond to Böhm trees [7]. We are now able to show that this result extends to the probabilistic case.
Definition 5.1**.**
The sets of probabilistic Böhm trees and of probabilistic value trees for are defined inductively by induction on the depth :
[TABLE]
where is the set of countable-support subprobability distributions on any set , is the only subprobability distribution over the empty set, i.e. over .
Definition 5.2**.**
We define for and , and for and by induction on the depth as follows:
[TABLE]
Intuitively the Böhm tree of a term is the limit of its finite Böhm approximants . To avoid making the structure of Böhm trees of infinite depth explicit, we simply write for the sequence . In particular we say that and have the same Böhm tree iff for every .
The definition of the Taylor expansion can easily be generalised to finite-depth Böhm trees. We simply define for and for by:
[TABLE]
We extend this definition to infinite Böhm trees as follows: if contains at most layers of nested multisets then for any , for all , so can be taken as . Then the Taylor normal form of a term is exactly the Taylor expansion of its Böhm tree.
Theorem 5.1**.**
For all , .
Proof.
We prove by induction on , using to Proposition 4.5. ∎
This theorem is important but it does not actually prove the correspondence between Böhm trees and Taylor expansions: we still do not know if Taylor expansion is injective on Böhm trees. In the deterministic case this is simple to prove: to every deterministic Böhm tree of depth we can associate a simple resource term such that for all , iff (by associating to ). The situation is more complicated in the probabilistic case, as Taylor expansions are no longer defined solely by their supports. The rest of this article is devoted to proving injectivity for the probabilistic Taylor expansion.
5.2 Böhm Tests
In order to better understand coefficients in probabilistic Taylor expansions and to get our injectivity property, we use a notion of testing coming from the literature on labelled Markov decision processes [17].
Definition 5.3** (Böhm Tests).**
The classes of Böhm term tests (BTTs) and Böhm hnf tests (BHTs) are given as follows, by mutual induction:
[TABLE]
The probability of success of a BTT on a term and the probability of success of a BHT on an head-normal-form , indicated as and respectively, are defined as follows:
[TABLE]
[TABLE]
The following is the first step towards proving the main result of this paper, as it characterises Böhm tree equality as equality of families of real numbers.
Theorem 5.2**.**
Two terms and have the same Böhm trees iff for every BTT it holds that .
Theorem 5.2 is quite nontrivial to prove. Section 6 is dedicated to a proof of this result.
6 Probabilistic Tree Transition Systems and Testing Equivalence
A tree transition system is a tuple such that
- •
and are sets of linear states and of branching states, respectively.
- •
and are disjoint sets of labels.
- •
The linear transition map is a partial function from to distributions over ;
- •
The branching transition map is a partial function from to .
An example of a tree transition system is the one coming out of Böhm trees as defined in the last section. In particular:
- •
is the set of terms, while is the set of head normal forms.
- •
, while .
- •
and can be defined in the natural way.
Let us call the resulting tree transition system .
A tree bisimulation relation for a tree transition system is given by two relations and such that the following two contstraints both hold:
- •
If , then for every label it holds that is defined iff is defined, and in the latter case there is such that
[TABLE]
where for every it holds that .
- •
If , then for every label it holds that is defined iff is defined, and in the latter case there is such that
[TABLE]
where for every it holds that .
The (pointwise) largest bisimulation relation is called tree-bisimilarity, and is indicated as .
Lemma 6.1**.**
Two terms and have the same Böhm Tree iff .
Proof.
One the one hand, we can prove that equality of Böhm trees is a tree bisimulation relation for . On the other hand, we can prove that if , then their Böhm trees are equal up to any level , by induction on . ∎
The rest of this section is devoted to proving that tree-bisimilarity can be characterised by a notion of testing, which generalises the one we saw for in the previous section. The set of linear and branching tests are defined as follows
[TABLE]
The probability of success of a linear test on a linear state and the one of a branching test on a branching state , indicated as and respectively, are defined as follows:
[TABLE]
Two linear states are said to be testing equivalent iff for every linear test we have that
[TABLE]
Similarly for branching states. Testing equivalence is indicated with , where is the underlying tree transition system. It consists of a pair of equivalence relations
Theorem 6.2**.**
* and coincide.*
Proof.
The idea is to make heavy use of the results from [17], which relate bisimilarity and testing equivalence. We are however a little detour which needs to be taken, due to the fact that the results from [17] are formulated for Labelled Markov Chains (LMCs), while we need the same result we need here is for tree transition system. The way we will proceed consists in defining, for every tree transition system an equivalent LMC , then proving that both bisimilarity and testing equivalent in and coincide. Given a tree transition system , we define the LMC as the triple where and:
- •
On the states from , behaves like ;
- •
For every state in , we have that
[TABLE]
- •
In all the other cases, returns the empty distribution.
The results from [17] tell us that testing equivalence and bisimilarity coincide in , where tests now have the following form:
[TABLE]
and . The rest of the proof is thus organised as follows:
- •
We can first of all prove that and coincide. This can be proved by showing that any -test can be turned into a -test having the same probability of success, and vice versa. The two mappings we need can be given as follows, by induction on the structure of tests:
- •
We can then prove that and coincide, by proving that each of the two relations is a bisimulation in the sense of the other.
∎
Proposition 6.3**.**
For all BTT context with a hole in BHT, for all there exists a probability distribution such that for all BHT , .
Proof.
We prove this result, as well as its equivalent for head normal forms and BHT contexts, by induction on test contexts.
For BHT contexts, let . For the empty context we have . For a product we apply the induction hypothesis to to get and we have
[TABLE]
The same goes if the hole is on the right side of a conjunction. Finally for a test context of the form , either if does not have the right shape, or , the induction hypothesis applied to and gives some , and we have
[TABLE]
For BTT contexts the cases of and conjunction are similar. The interesting case is that of the evaluation. Given a BTT context and we apply the induction hypothesis to and every head normal form , or at least any such that , to get distributions . Then we have
[TABLE]
∎
7 Implementing Tests as Resource Terms
There is a very tight correspondence between simple resource terms and Böhm tests, but this correspondence does not hold for all Böhm tests. Simple resource terms can be seen as a particular class of Böhm tests.
Definition 7.1**.**
The classes of resource Böhm term tests (rBTTs) and resource Böhm hnf tests (rBHTs) are given as follows, by mutual induction:
[TABLE]
Definition 7.2**.**
For every rBTT we define a simple poly-term and for every rBHT we define a simple term in the following way:
[TABLE]
The similarity between simple resource terms and resource Böhm tests is more than structural: the probability of success of a resource Böhm test is actually given by a coefficient in the Taylor normal form.
Proposition 7.1**.**
For every rBTT and , . 2. 2.
For every rBHT and , .
Proof.
We reason by induction on tests. Observe that these can be considered modulo commutativity and associativity of the conjunction and modulo : these equivalences preserve both the results of testing and the associated simple resource (poly-)terms. Then every rBTT is equivalent either to or to a conjunction . In the first case we always have . In the second case just like in the proof of regularity of the exponential (Proposition 2.21) for any we have . To conclude we want to show that for all . We have by definition , and Proposition 4.5 gives , so we conclude by induction hypothesis on . Now given a rBHT and we have either and if is of the form , in which case we conclude by induction hypothesis, or otherwise. ∎
With this result, we completely characterise Taylor normal forms by resource Böhm tests.
Corollary 7.2**.**
Two terms and have the same Taylor normal form iff for every rBTT it holds that .
Proof.
Simply observe that every simple resource term in normal form is equal to for some resource Böhm test . ∎
Thanks to Theorem 5.2 and Corollary 7.2 both Böhm tree equality and Taylor normal form equality are characterised by tests. They still leave a gap in our reasoning, as not all Böhm tests are resource Böhm tests. This difference is not just cosmetic: is a valid Böhm test which computes the convergence probability of any -term, which cannot be done using only resource Böhm tests. More precisely this cannot be done using a single Böhm test. To fill the gap between Böhm tests and resource Böhm tests we observe that any of the former can be simulated by a family of resource Böhm tests.
Proposition 7.3**.**
For every BTT there is a family of rBTTs of arbitrary size (possibly empty, possibly infinite) such that for all -term we have .
Proof.
We prove this, as well as the corresponding result for BHTs, by induction on the size of tests. In the case of BTTs, the result is simply given by induction hypothesis. To the BTT we associate the single-element family , to we associate where and are given by induction hypothesis on and , and to we associate . The interesting part of the proof is on BHTs, where we want to remove two constructors. Modulo commutativity and associativity of the conjunction and the equivalence , every BHT is either or of the form with . In the first case to we associate the family where denotes the sequence of length . In the second case if , or for some then the result of the test is always [math], which is simulated by the empty family of rBHTs. Otherwise let , and , the test is equivalent to . We apply the induction hypothesis to the BTTs to get families and we associate the family to the original BHT. ∎
Corollary 7.4**.**
Given two terms and , for every BTT it holds that iff for every rBTT it holds that .
We can now state the main result of this paper.
Theorem 7.5**.**
Two terms have the same Böhm trees iff their Taylor expansions have the same normal forms.
Proof.
The result follows from Theorem 5.2, Corollary 7.4 and Corollary 7.2. ∎
8 Conclusion
In this paper, we attack the problem of extending the Taylor Expansion construction to the probabilistic -calculus, at the same time preserving its nice properties. What we find remarkable about the defined notion of Taylor expansion is that its codomain is the set of ordinary resource terms, and that the equivalence induced by the Taylor expansion is precisely the one induced by Böhm trees [13]. The latter, not admitting , is strictly included in contextual equivalence.
Among the many questions this work leaves open, we could cite the extension of the proposed definition to call-by-value reduction, along the lines of [12], and a formal comparison between the notion of equivalence introduced here and the the one from [15] in which, however, the target language is not the one of ordinary resource terms, but one specifically designed around probabilistic effects.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] H.P. Barendregt. The Lambda Calculus: Its Syntax and Semantics . Studies in Logic and the Foundations of Mathematics. Elsevier Science, 1984.
- 2[2] Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, and Marcin Szymczak. A lambda-calculus foundation for universal probabilistic programming. In Proc. of ICFP 2016 , pages 33–46, 2016.
- 3[3] Gérard Boudol. The lambda-calculus with multiplicities. Technical Report 2025, INRIA Sophia-Antipolis, 1993.
- 4[4] Ugo Dal Lago and Thomas Leventis. On the Taylor expansion of probabilistic lambda terms (long version). Available at http://www.cs.unibo.it/~dallago/TEPLC.pdf , 2019.
- 5[5] Thomas Ehrhard, Michele Pagani, and Christine Tasson. Full abstraction for probabilistic PCF. J. ACM , 65(4):23:1–23:44, 2018.
- 6[6] Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theor. Comput. Sci. , 309(1-3):1–41, 2003.
- 7[7] Thomas Ehrhard and Laurent Regnier. Böhm trees, Krivine’s machine and the Taylor expansion of lambda-terms. In Proc. of CIE 2006 , pages 186–197, 2006.
- 8[8] Thomas Ehrhard and Laurent Regnier. Differential interaction nets. Theor. Comput. Sci. , 364(2):166–195, 2006.
