The Geometry of Bayesian Programming
Ugo Dal Lago, Naohiko Hoshino

TL;DR
This paper introduces a geometric interaction model for a typed lambda-calculus designed for higher-order Bayesian programming, incorporating sampling and soft conditioning, based on measurable spaces.
Contribution
It provides a novel geometric interaction model for Bayesian programming languages, connecting category theory with probabilistic semantics.
Findings
Model is adequate for distribution-based semantics
Model is adequate for sampling-based semantics
Framework supports higher-order Bayesian programming
Abstract
We give a geometry of interaction model for a typed lambda-calculus endowed with operators for sampling from a continuous uniform distribution and soft conditioning, namely a paradigmatic calculus for higher-order Bayesian programming. The model is based on the category of measurable spaces and partial measurable functions, and is proved adequate with respect to both a distribution-based and a sampling based operational semantics.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLogic, Reasoning, and Knowledge · Logic, programming, and type systems · Advanced Algebra and Logic
The Geometry of Bayesian Programming
Ugo Dal Lago
Naohiko Hoshino
Abstract
We give a geometry of interaction model for a typed -calculus endowed with operators for sampling from a continuous uniform distribution and soft conditioning, namely a paradigmatic calculus for higher-order Bayesian programming. The model is based on the category of measurable spaces and partial measurable functions, and is proved adequate with respect to both a distribution-based and a sampling based operational semantics.
1 Introduction
Randomisation provides the most efficient algorithmic solutions, at least concretely, in many different contexts. A typical example is the one of primality testing, where the Miller-Rabin test [1, 2] remains the preferred choice despite polynomial time deterministic algorithms are available from many years now [3]. Probability theory can be exploited even more fundamentally in programming, by way of so-called probabilistic (or, more specifically, Bayesian) programming, as popularized by languages like, among others, ANGLICAN [4] or CHURCH [5]. This has stimulated research about probabilistic programming languages and their semantics [6, 7, 8], together with type systems [9, 10], equivalence methodologies [11, 12], and verification techniques [13].
Giving a satisfactory denotational semantics to higher-order functional languages is already problematic in presence of probabilistic choice [6, 14], and becomes even more challenging when continuous distributions and scoring are present. Recently, quasi-Borel spaces [15] have been proposed as a way to give semantics to calculi with all these features, and only very recently [16] this framework has been shown to be adaptable to a fully-fledged calculus for probabilistic programming, in which continuous distributions and soft-conditioning are present. Probabilistic coherent spaces [17] are fully abstract [8] for -calculi with discrete probabilistic choice, and can, with some effort, be adapted to calculi with sampling from continuous distributions [18], although without scoring.
A research path which has been studied only marginally, so far, consists in giving semantics to Bayesian higher-order programming languages through interactive forms of semantics, e.g. game semantics [19, 20] or the geometry of interaction [21]. One of the very first models for higher-order calculi with discrete probabilistic choice was in fact a game model, proved fully abstract for a probabilistic calculus with global ground references [7]. After more than ten years, a parallel form of Geometry of Interaction (GoI) and some game models have been introduced for -calculi with probabilistic choice [22, 23, 24], but in all these cases only discrete probabilistic choice can be handled, with the exception of a recent work on concurrent games and continuous distributions [25].
In this paper, we will report on some results about GoI models of higher-order Bayesian languages. The distinguishing features of the introduced GoI model can be summarised as follows:
- •
Simplicity. The category on which the model is defined is the one of measurable spaces and partial measurable functions, so it is completely standard from a measure-theoretic perspective.
- •
Expressivity. As is well-known, the GoI construction [26, 27] allows to give semantics to calculi featuring higher-order functions and recursion. Indeed, our GoI model can be proved adequate for , a fully-fledged calculus for probabilistic programming.
- •
Flexibility. The model we present is quite flexible, in the sense of being able to reflect the operational behaviour of programs as captured by both the distribution-based and the sampling-based semantics.
- •
Intuitiveness. GoI visualises the structure of programs in terms of graphs, from which dependencies between subprograms can be analyzed. Adequacy of our model provides diagrammatic reasoning principle about observational equivalence of .
This paper’s contributions, beside the model’s definition, are two adequacy results which precisely relate our GoI model to the operational semantics, as expressed (following [28]), in both the distribution and sampling styles. As a corollary of our adequacy results, we show that the distribution induced by sampling-based operational semantics coincides with distribution-based operational semantics.
1.1 Turning Measurable Spaces into a GoI
Model
Before entering into the details of our model, it is worthwhile to give some hints about how the proposed model is obtained, and why it differs from similar GoI models from the literature.
The thread of work the proposed model stems from is the one of so-called memoryful geometry of interaction [29, 30]. The underlying idea of this paper is precisely the same: program execution is modelled as an interaction between the program and its environment, and memoisation takes place inside the program as a result of the interaction.
In the previous work on memoryful GoI by the second author with Hasuo and Muroya, the goal consisted in modelling a -calculus with algebraic effects. Starting from a monad together with some algebraic effects, they gave an adequate GoI model for such a calculus, which is applicable to wide range of algebraic effects. In principle, then, their recipe could be applicable to , sinc sampling-based operational semantics enables us to see scoring and sampling as algebraic effects acting on global states. However, the that would not work for , since the category of measurable spaces111We need to work on because we want to give adequacy for distribution-based semantics. is not cartesian closed, and we thus cannot define a state monad by way of the exponential .
In this paper, we side step this issue by a series of translations, to be described in Section 4 below. Instead of looking for a state monad on , we embed into the category of -objects and Mealy machines (Section 5) and use a state monad on this category. This is doable because is a compact closed category given by the -construction [27]. The use of such compact closed categories (or, more generally, of traced monoidal categories) is the way GoI models capture higher-order functions.
1.2 Outline
The rest of the paper is organised as follows. After giving some necessary measure-theoretic preliminaries in Section 2 below, we introduce in Section 3 the language , together with the two kinds of operational semantics we were referring to above. In Section 4, we introduce our GoI model informally, while in Section 5 a more rigorous treatment of the involved concepts is given, together with the adequacy results. We discuss in Section 10 an alternative way of giving a GoI semantics to based on s-finite kernels, and we conclude in Section 12.
2 Measure-Theoretic Preliminaries
We recall some basic notions in measure theory that will be needed in the following. We also fix some useful notations. For more about measure theory, see standard text books such as [31].
A -algebra on a set is a family consisting of subsets of such that ; and if , then the complement is in ; and for any family , the intersection is in . A measurable space is a set equipped with a -algebra on . We often confuse a measurable space with its underlying set . For example, we simply write instead of . For measurable spaces and , we say that a partial function (in this paper, we use for both partial functions and total functions) is measurable when for all , the inverse image
[TABLE]
is in . A measurable function from to is a totally defined partial measurable function. A (partial) measurable function is invertible when there is a measurable function such that and are identities. In this case, we say that is an isomorphism from to and say that is isomorphic to .
We denote a singleton set by , and we regard the latter as a measurable space by endowing it with the trivial -algebra. We also regard the empty set as a measurable space in the obvious way. In this paper, denotes the measurable set of all non-negative integers equipped with the -algebra consisting of all subsets of , and denotes the measurable set of all real numbers equipped with the -algebra consisting of Borel sets, that is, the least -algebra that contains all open subsets of . By the definition of , a function is measurable whenever for all open subsets . Therefore, all continuous functions on are measurable.
When is a subset of the underlying set of a measurable space , we can equip with a -algebra . This way, we regard the unit interval and the set of all non-negative real numbers as measurable spaces, and indicate them as follows:
[TABLE]
For measurable spaces and , we define the product measurable space and the coproduct measurable space by
[TABLE]
where the underlying -algebras are:
[TABLE]
We assume that has higher precedence than , i.e., we write for . In this paper, we always regard finite products as the product measurable space on . It is well-known that the -algebra is the set of all Borel sets, i.e., is the least one that contains all open subsets of . Partial measurable functions are closed under compositions, products and coproducts.
Let be a measurable space. A measure on is a function from to that is the set of all non-negative real numbers extended with , such that
- •
; and
- •
for any mutually disjoint family , we have .
We say that a measure on is finite when and that it is -finite if for some family satisfying .
For a measurable space , we write for a measure on given by for all . If is a measure on a measurable space , then for any non-negative real number , the function is also a measure on . The Borel measure on is the unique measure that satisfies
[TABLE]
We define the Borel measure on by . For a measurable function and a measurable subset , we denote the integral of with respect to the Borel measure restricted to by
[TABLE]
For a measurable space and for an element , a Dirac measure on is given by
[TABLE]
The square bracket notation in the right hand side is called Iverson’s bracket. In general, for a proposition , we have when is true and when is false.
Proposition 2.1**.**
For every -finite measures on a measurable space and on a measurable space , there is a unique measure on such that for all and .
The measure is called the product measure of and . For example, the Borel measure on is the product measure of the Borel measure on .
Finally, let us recall the notion of a kernel, which is a well-known concept in the theory of stochastic processes. For measurable spaces and , a kernel from to is a function such that for any , the function is a measure on , and for any , the function is measurable. Notions of finite and -finite kernels can be naturally given, following the emponymous constraint on measures. Those kernels which can be expressed as the sum of countably many finite kernels are said to be s-finite [32]. We use kernels to give semantics for our probabilistic programming language, to be defined in the next section.
3 Syntax and Operational Semantics
3.1 Syntax and Type System
Our language for higher order Bayesian programming can be seen as Plotkin’s endowed with real numbers, measurable functions, sampling from the uniform distribution on and soft-conditioning. We first define types , values and terms as follows:
[TABLE]
Here, varies over a countably infinite set of variable symbols, and varies over the set of all real numbers. Each function identifier is associated with a measurable function from to . For terms and , we write for the capture-avoiding substitution of in by .
Terms in are restricted to be A-normal forms, in order to make some of the arguments on our semantics simpler. This restriction is harmless for the language’s expressive power, thanks to the presence of -bindings. For example, term application can be defined to be .
The term constructor and the constant enable probabilistic programming in . Evaluation of has the effect of multiplying the weight of the current probabilistic branch by , this way enabling a form of soft-conditioning. The constant generates a real number randomly drawn from the uniform distribution on . Only one sampling mechanism is sufficient because we can model sampling from other standard distributions by composing with measurable functions [33].
Terms can be typed in a natural way. A context is a finite sequence consisting of pairs of a variable and a type such that every variable appears in at most once. A type judgement is a triple consisting of a context , a term and a type . We say that a type judgement is derivable when we can derive from the typing rules in Figure 1. Here, the type of is , and the type of is because returns a real number, and the purpose of scoring is its side effect.
In the sequel, we only consider derivable type judgements and typable closed terms, that is, closed terms such that is derivable for some type .
3.2 Distribution-Based Operational Semantics
We define distribution-based operational semantics following [28] where, however, a -algebra on the set of terms is necessary so as to define evaluation results of terms to be distributions (i.e. measures) over values. In this paper, we only consider evaluation of terms of type and avoid introducing -algebras on sets of closed terms, thus greatly simplifying the overall development.
Distribution-based operational semantics is a function that sends a closed term to a measure on . Because of the presence of , the measure may not be a probabilistic measure, i.e., may be larger than , but the idea of distribution-based operational semantics is precisely that of associating each closed term of type with a measure over .
As common in call-by-value programming languages, evaluation is defined by way of evaluation contexts:
[TABLE]
The distribution-based operational semantics of is a family of binary relations between closed terms of type and measures on inductively defined by the evaluation rules in Figure 2 where the evaluation rule for is inspired from the one in [32]. The binary relation in the precondition of the third rule in Figure 2 is called deterministic reduction and is defined as follows as a relation on closed terms:
[TABLE]
The last evaluation rule in Figure 2 makes sense because in the precondition is a kernel from to :
Lemma 3.1**.**
For any and for any term
[TABLE]
there is a finite kernel from to such that for any and for any measure on ,
[TABLE]
where .
Proof.
Let be a context of the form . In this proof, for a finite sequence , and for a term , we denote
[TABLE]
by . We prove the statement by induction on . (Base case) Let be a kernel from to given by
[TABLE]
Then for any ,
[TABLE]
(Induction step) We define a redex by
[TABLE]
We note that in the above BNF can be variables. By induction on the size of type derivation, we can show that every term is either a value or of the form for some evaluation context and some redex . Given a term where , we prove the induction step by case analysis.
- •
If is a value, then is either a variable or a constant . When is a variable , we have
[TABLE]
When is a constant , we have
[TABLE]
Both given by
[TABLE]
are kernels from to .
- •
If is of the form , then by induction hypothesis, there is a kernel from to such that for any ,
[TABLE]
We define a kernel from to by
[TABLE]
This is a kernel because if is a non-negative measurable function, then
[TABLE]
is measurable. See [31, Theorem 18.3]. Then, for any ,
[TABLE]
- •
If is of the form for some , then by induction hypothesis, there is a kernel from to such that for any ,
[TABLE]
We define a kernel to by
[TABLE]
Then, for any ,
[TABLE]
- •
If is of the form for some , then by induction hypothesis, there is a kernel from to such that for any ,
[TABLE]
We define a kernel to by
[TABLE]
Then, for any ,
[TABLE]
- •
If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
[TABLE]
Hence,
[TABLE]
- •
If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
[TABLE]
Hence,
[TABLE]
- •
If is of the form , then is equal to either a variable or a constant . For simplicity, we suppose that and and . By induction hypothesis, there is a kernel from to such that for all ,
[TABLE]
We define a kernel from to by
[TABLE]
Then, for any ,
[TABLE]
- •
If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
[TABLE]
Hence,
[TABLE]
- •
If is of the form for some , then by induction hypothesis, there are kernels and from to such that for any ,
[TABLE]
We define a kernel from to by
[TABLE]
Then, for any ,
[TABLE]
- •
If is of the form , then by induction hypothesis, there is a kernel from to such that for any ,
[TABLE]
Hence,
[TABLE]
- •
If is of the form for some real number , then by induction hypothesis, there is a kernel from to such that
[TABLE]
Hence,
[TABLE]
∎
Lemma 3.1 implies that the relations can be seen as functions from the set of closed terms of type to the set of measures on .
The step-indexed distribution-based operational semantics approximates the evaluation of closed terms by restricting the number of reduction steps. Thus, the limit of the step-indexed distribution-based operational semantics represents the “true” result of evaluating the underlying term.
Definition 3.1**.**
For a closed term and a measure on , we write when there is a family of measures on such that and for all ,
[TABLE]
The binary relation is a function from the set of closed terms of type to the set of measures on . This follows from Lemma 3.1 and that the family of measures on such that forms an ascending chain with respect to the pointwise order. Moreover, it can be proved that for any , given by is an s-finite kernel.
3.3 Sampling-Based Operational Semantics
can be endowed with another form of operational semantics, closer in spirit to inference algorithms, called the sampling-based operational semantics. The way we formulate it is deeply inspired from the one in [28].
The idea behind sampling-based operational semantics is to give the evaluation result of each probabilistic branch somehow independently. We specify each probabilistic branch by two parameters: one is a sequence of random draws, which will be consumed by ; the other is a likelihood measure called weight, which will be modified by .
Definition 3.2**.**
A configuration is a triple consisting of a closed term , a real number called the configuration’s weight, and a finite sequence of real numbers in , called its trace.
Below, we write for the empty sequence. For a real number and a finite sequence consisting of real numbers, we write for the finite sequence obtained by putting on the head of . In Figure 3, we give the evaluation rules of sampling-based operational semantics where is the deterministic reduction relation introduced in the previous section. We denote the reflective transitive closure of by . Intuitively, means that by evaluating , we get the real number with weight consuming all the random draws in .
4 Towards Mealy Machine Semantics
In this section, we give some intuitions about our GoI model, which we also call Mealy machine semantics. Giving Mealy machine semantics for requires translating into the linear -calculus. This is because GoI is a semantics for linear logic, and is thus tailored for calculi in which terms are treated as resources. Schematically, Mealy machine semantics for translates terms in into Mealy machines in the following way.
[TABLE]
In Section 4.1, we explain the first three steps. The last step deserves to be explained in more detail, which we do in Section 4.2. For the sake of simplicity, we ignore the translation of conditional branching and the fixed point operator.
4.1 From to Proof Structures
4.1.1 Moggi’s Translation
In the first step, we translate into an extension of the Moggi’s meta-language by Moggi’s translation [34]. Here, in order to translate scoring and sampling in , we equip Moggi’s meta-language with base types and and the following terms:
[TABLE]
where is the monad of Moggi’s meta-language. Any type of is translated into the type defined as follows:
[TABLE]
Terms and in are translated into and in Moggi’s meta-language respectively. See [34] for more detail about Moggi’s translation.
4.1.2 Girard Translation
We next translate the extended Moggi’s meta-language into an extension of the linear -calculus, by way of the so-called Girard translation [35]. Types are given by
[TABLE]
where , and are base types, and terms are generated by the standard term constructors of the linear -calculus, plus the following rules:
[TABLE]
(as customary in linear logic, is an abbreviation of ). These typing rules are derived from the following translation of types of the extended Moggi’s meta-language into types of the extended linear -calculus:
[TABLE]
The definition of is motivated by the following categorical observation: let be the syntactic category of the extended linear -calculus, which is a symmetric monoidal closed category endowed with a comonad with certain coherence conditions (see e.g. [36]), and let be the coKleisli category of the comonad . Then, by composing the adjunction between and with a state monad on , we obtain a monad on :
[TABLE]
which sends an object to . This use of the state monad is motivated by sampling-based operational semantics: we can regard as a call-by-value -calculus with global states consisting of pairs of a non-negative real number and a finite sequence of real numbers, and we can regard and as effectful operations interacting with those states.
4.1.3 The Third Step
We translate terms in the extended linear -calculus into (an extension of proof structures) [37], which are graphical presentations of type derivation trees of linear -terms. We can also understand proof structures as string diagrams for compact closed categories [38]. Operators of the pure, linear, -calculus, can be translated as usual [37]. For example, type derivation trees
[TABLE]
are translated into proof structures
\mathtt{M}$$\mathtt{N}$$\otimes$$\mathtt{A}$$\mathtt{B}$$\mathtt{A}\otimes\mathtt{B}$$\wp$$\mathtt{A}$$\mathtt{A}^{\bot}$$\mathtt{A}\multimap\mathtt{A}$$\mathtt{A}
respectively where nodes labelled with and are proof structures associated to type derivations of and . Terms of the form , and , require new kinds of nodes:
\mathtt{r}_{a}$$\mathtt{Real}$$\mathtt{sa}$$\oc\mathtt{Real}$$\mathtt{State}^{\bot}$$\mathtt{State}$$\mathtt{sc}$$\mathtt{State}$$\mathtt{State}^{\bot}$$\oc\mathtt{Unit}$$\oc\mathtt{Real}
.
This is not a direct adaptation of typing rules for and in the linear -calculus, but the correspondence can be recovered by way of multiplicatives:
\otimes$$\wp$$\mathtt{State}$$\mathtt{State}^{\bot}$$\oc\mathtt{A}$$\mathtt{State}\otimes\oc\mathtt{A}$$\mathtt{State}\multimap\mathtt{State}\otimes\oc\mathtt{A}
.
4.2 From Proof Structures to Mealy Machines
The series of translations from to proof structures is agnostic as for the computational meaning of and in because and introduced in these translations are just constant symbols. In other words, the translation from to the extended proof structures is not sound with respect to either form of operational semantics for . In the last translation step, we assign proof structures a computational meaning, respecting the operational semantics of the underlying term.
We do this by associating proof structures with Mealy machines. A Mealy machine is an input/output-machine whose evolution may depend on its current state. In this paper, for the sake of supporting intuition and of enabling graphical reasoning, we depict a Mealy machine as a node with some input/output-ports:
\mathsf{M}$$\mathsf{M}$$x$$y$$s/t$$\mathsf{M}$$z$$w$$s^{\prime}/t^{\prime}
.
For example, the thick arrow in the middle diagram indicates that if the current state is and the given input is , then the Mealy machine outputs and changes its state to . In the GoI jargon, data traveling along edges of proof structures are often called tokens.
For the standard proof structures, we can follow [39] where Mealy machines associated with proof structures are built up from Mealy machines associated to each nodes. For example, the following nodes
\otimes$$\mathtt{A}$$\mathtt{B}$$\mathtt{A}\otimes\mathtt{B}$$\wp$$\mathtt{A}$$\mathtt{B}$$\mathtt{A}\mathbin{\wp}\mathtt{B}
are both associated with a one-state Mealy machine that behaves in the following manner:
\mathtt{A}$$\mathtt{B}$$\mathtt{A}\mathbin{\otimes}\mathtt{B}$$b$$(\circ,b)$$a$$(\bullet,a)$$\mathtt{A}$$\mathtt{B}$$\mathtt{A}\mathbin{\otimes}\mathtt{B}$$b$$(\bullet,b)$$a$$(\circ,a)
.
Namely, the Mealy machine forwards each input from the left hand side to the right hand side endowing it with a tag that tells where the token came from. The Mealy machine handles inputs from the right hand side in the reverse way.
Soundness of Mealy machine semantics states that if two (pure) linear -terms are -equivalent, then the behaviours of the Mealy machines associated to these terms are the same. As an example, let us consider a -reduction step The proof structure associated to is the graph in the left hand side, and the arrow in the right hand side illustrates a trace of a run of this Mealy machine for an input from the right edge:
\wp$$\otimes$$\mathtt{A}$$\mathtt{A}^{\bot}$$\mathtt{A}\multimap\mathtt{A}$$\mathtt{A}$$\mathtt{A}$$\wp$$\otimes$$a$$a
.
This Mealy machine forwards any input from the right hand side to the left hand side as indicated by the thick arrow, and it also forwards any input from the left hand side to the right hand side. Hence, the behaviour of this Mealy machine is equivalent to the behaviour of the following trivial Mealy machine:
\mathtt{A}$$a$$a$$a$$a
,
which is the interpretation of . This is in fact a symptom of a general phenomenon: Mealy machine semantics for the linear -calculus captures -reduction .
But how can we extend this Mealy machine semantics to and ? Here, we borrow the idea from Game semantics [40] that models computation in terms of interaction between programs and environments. For scoring and sampling, we can infer how they interact with the environment from sampling-based operational semantics. For scoring, we associate with a one-state Mealy machine that has the following transitions:
\mathtt{sc}$$\mathtt{State}$$\mathtt{State}^{\bot}$$\oc\mathtt{Unit}$$\oc\mathtt{Real}$$(a,u)$$(a,u)$$(a,b\mathbin{::}u)$$(|b|\,a,u)
where is a finite sequence of real numbers and are real numbers such that . We can read these transitions as follows: for each “configuration” , the Mealy machine sends a query to environment in order to know the value of its argument, and if environment answers that the value is , i.e., if the Mealy machine receives , then it outputs , which is the evaluation result of .
For sampling, we associate with a Mealy machine that has the following transitions:
\mathtt{sa}$$\ast/b$$\oc\mathtt{Real}$$\mathtt{State}$$\mathtt{State}^{\bot}$$(a,b\mathbin{::}u)$$(a,u)$$\mathtt{sa}$$b/b$$\oc\mathtt{Real}$$\mathtt{State}$$\mathtt{State}^{\bot}$$(a,u)$$(a,b\mathbin{::}u)
where is a finite sequence of real numbers and are real numbers such that . The first transition means that in the initial state , given a “configuration” , the Mealy machine pops the first element of and memorises the value by changing its state from to . After this transition, for any query asking the result of sampling, it answers the value memorised in the first transition.
For example, a Mealy machine
\mathsf{sa}$$\mathsf{sc}$$\oc\mathtt{Real}$$\mathtt{State}^{\bot}$$\mathtt{State}$$\oc\mathtt{Unit}$$\mathtt{State}
,
which is a denotation of the term
[TABLE]
and behaves as follows:
\mathsf{sa}$$\mathsf{sc}$$(a,u)$$(a,u)$$(a,b\mathbin{::}u)$$(a,b\mathbin{::}u)$$(|b|\,a,u)$$\ast/b
.
Our adequacy theorem says that the evaluation result of a term coincides with the execution result of the associated Mealy machine. In fact, for this case, the outcome of the above Mealy machine is equal to the evaluation result of , that is, . In this interaction process, the memoisation mechanism of the -node is necessary, otherwise the -node can not tell the -node that the result of sampling is .
Remark 4.1**.**
*Two notions of state (the one coming from the state monad and the one of the of the Mealy machine itself) are used for different purpose here: the first notion is needed to model the call-by-value evaluation strategy where we need to store intermediate effects that are invoked during the evaluation. The second notion of state is needed to model sampling. More concretely, each Mealy machine for sampling need to remember the already sampled values in the current probabilistic branch. *
5 Mealy Machines and their Compositions
After having described Mealy machine semantics briefly and informally, it is now time to get more formal. In this section, we introduce the notion of a Mealy machine and some constructions on Mealy machines. We also introduce a way of diagramatically presenting Mealy machines which is behaviourally sound.
5.1 Mealy Machines, Formally
In this paper, we call a pair of measurable spaces an -object. We use sans-serif capital letters to denote -objects, and we denote the positive/negative part of an -object by the same italic letter superscripted by . For example, denotes an -object consisting of two measurable spaces and . The name “-object” comes from the so-called -construction [26]. Definition 5.1 and the definition of monoidal products in Section 5.4 are also motivated by -construction.
Definition 5.1**.**
For -objects and , a Mealy machine from to consists of
- •
a measurable space called the state space of ;
- •
an element called the initial state of ;
- •
a partial measurable function
[TABLE]
called the transition function.
If is a Mealy machine from to , we write .
The transition function of a Mealy machine describes a mapping between inputs and outputs which can also alter the underlying state. For and , means that when the current state of is , given an input , there is an output and the next state is .
Readers may wonder why appears in the target and appears in the source of the transition function of a Mealy machine from to . In short, this is because we are interested in Mealy machines that handle bidirectional computation. The diagrammatic presentation of Mealy machines clarifies the meaning of “bidirectional.” Let be a Mealy machine. In this paper, we depict as follows:
\mathsf{M}$$\mathsf{Y}$$\mathsf{X}
.
Intuitively, each label on an edge indicates the type of data traveling along the edge. Namely, on the -edge (on the -edge), elements in (in ) go from left to right, and elements in (in ) go from right to left. For example, we depict the following transitions
[TABLE]
for some , , and as the following thick arrows
\mathsf{M}$$s_{0}/s_{1}$$\mathsf{Y}$$\mathsf{X}$$y$$x$$\mathsf{M}$$s_{0}/s_{2}$$\mathsf{Y}$$\mathsf{X}$$y^{\prime}$$y^{\prime\prime}
.
(Recall that the white/black bullet indicates the left/right part of the disjoint sum.) The expressions and on the Mealy machine stands for transitions of states. We omit states transitions when we can infer them.
We will give some Mealy machines whose state spaces are trivial, namely . We call such a Mealy machine token machine. Our usage of the term token machine is along the lines of that in other papers on GoI such as [41, 39]. Since we can identify the transition function of a token machine with the following partial measurable function
[TABLE]
giving partial measurable function of this type is enough to specify a token machine.
Convention 5.1**.**
We define a token machine by giving a partial measurable function from to , and we also call this partial measurable function transition function of . Abusing notation, we write for this transition function.
5.2 Behavioural Equivalence
We are now ready to give an equivalence relation between Mealy machines which identifies machines which behave the same way. Identifying Mealy machines in terms of their behaviour is important to reason about compositions of Mealy machines in the following part of this paper. Here, we are inspired by behavioural equivalence from coalgebraic theory of modelling transition systems [42].
Let and be Mealy machines from to . We write when there is a measurable function satisfying and
[TABLE]
The definition means that if we have , then no observer can distinguish between and from their input/output behaviour, although their internal structure can be quite different. We define an equivalence relation to be the reflective symmetric transitive closure of . Below, if we can infer the subscript from the context, we write instead of .
Definition 5.2**.**
For Mealy machines , we say that is behaviourally equivalent to when .
For a Mealy machine , we write for its equivalence class with respect to behavioural equivalence. We define a binary relation between equivalence classes of Mealy machines from to by if and only if there are and such that and , and the graph relation of is a subset of the graph relation of .
Proposition 5.1**.**
The set of equivalence classes for with is a pointed cpo.
We can characterize interpretation of the fixed point operator in in terms of least fixed points, see [43]. We give a proof of Proposition 5.1 in Section 5.3.
5.3 Proof of Proposition 5.1
For a partially defined expressions and , we write when is defined if and only if is defined, and if both expressions are defined, then they are the same. For example, we have for all . For a measurable space , we write for the measurable space of all finite sequences over endowed with the following -algebra:
[TABLE]
We write for the empty sequence. For and , we denote the list obtained by appending to by .
Let be a Mealy machine. We write for and for . Then, the transition function of is of the form
[TABLE]
We define partial measurable functions and by
[TABLE]
Below, for , we write for the first entry of , and we write for the second entry of . By the definition of and , we have
[TABLE]
Lemma 5.1**.**
If , then .
Proof.
Let be a measurable function that realizes . We show and by induction on the size of . (Base case)
[TABLE]
[TABLE]
(Induction step)
[TABLE]
[TABLE]
∎
For a Mealy machine , we define Mealy machines by
- •
,
- •
,
- •
and
- •
,
- •
,
- •
Here, the -algebra of is the one induced by via the obvious bijection between and .
Lemma 5.2**.**
.
Proof.
It is straightforward to check that the embedding is a measurable function that realizes . It remains to show . We define a measurable function by
[TABLE]
We show that for any ,
[TABLE]
by induction on . (Base case)
[TABLE]
(Induction step)
[TABLE]
∎
Proposition 5.2**.**
For all Mealy machines , we have if and only if .
Proof.
If , then we can show that by using Lemma 5.1. If , then we have by the definition of . Because we have and (Lemma 5.2), we see that is behaviourally equivalent to . ∎
Hence, each equivalence class of behavioural equivalence is represented by , and is independent of choice of . We extend this correspondence to order theoretic structure of Mealy machines.
Lemma 5.3**.**
Let be Mealy machines from to such that . If and , then .
Proof.
By induction on the size of , we can show that if is defined, then is defined and they are the same. Then follows from the definition of . ∎
Theorem 5.1**.**
For Mealy machines ,
[TABLE]
Proof.
If , then because and are representatives of and respectively, we have . If , then there are and such that
- •
and ,
- •
the graph relation of is a subset of the graph relation of .
By Lemma 5.3, we see that . ∎
Theorem 5.2**.**
The set of equivalence classes of Mealy machines from to with the partial order is an -cpo.
Proof.
Let be an upper bound of an -chain
[TABLE]
By Theorem 5.1, we have
[TABLE]
We define a Mealy machine by
- •
,
- •
,
- •
.
Because , the equivalence class is an upper bound of the -chain . We also have because . ∎
5.4 Constructions on Mealy Machines
It is now time to give some constructions which are the basic building blocks of our Mealy machine semantics. This section consists of three parts. The first part (from Section 5.4.2 to Section 5.4.5) is related to the linear -calculus and is serves to model the purely functional features of , such as -abstraction and function application. In the second part (Section 5.4.6 and Section 5.4.7), we give Mealy machines modelling real numbers and measurable functions. In the last part (from Section 5.4.9 to Section 5.4.11), we introduce a state monad and associate the monad with Mealy machines modelling and .
5.4.1 Composition
Let , and be -objects, and let , be Mealy machines. We can now define their composition . Before giving a precise definition, some intuitive explanation about is in order. The main idea is to define as a Mealy machine obtained by connecting and in the following manner:
\mathsf{M}$$\mathsf{N}$$\mathsf{X}$$\mathsf{Y}$$\mathsf{Z}
.
The following series of thick arrows
\mathsf{M}$$\mathsf{N}$$y_{0}$$y_{1}$$y_{2}$$y_{3}$$z$$z^{\prime}
illustrates an execution of the obtained Mealy machine. Given an input from an edge, and engage in some interactive communication, and at some point, some output is produced. Because performs “parallel composition plus connecting,” the state space of should be , and the initial state should be . The transition function of should be given by the collection of all possible interaction paths between and .
Let us give a precise definition. For Mealy machines and , we define the state space and the initial states of by , and we define the transition function by
[TABLE]
where the are restrictions of the following partial measurable function
[TABLE]
and the above join is with respect to the inclusion order between graph relations. The above join is measurable because measurable sets are closed under countable joins. It is tedious but doable to check that the above join always exists and that the composition is compatible with behavioural equivalence and satisfies associativity modulo behavioural equivalence. We define a Mealy machine by . This is the unit of the composition modulo behavioural equivalence.
5.4.2 Monoidal Products
Monoidal Products of Int-objects
We introduce monoidal products of -objects and their diagrammatic presentation. For -objects and , we define a -object by
[TABLE]
We define an -object to be . We write for .
Let be -object. We depict a Mealy machine from to as a node with edges labeled by on the left hand side and edges labeled by on the right hand side:
\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}
.
We do not draw any edges on the left/right hand side when the domain/codomain of is :
\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{M}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}
The diagrammatic presentation of monoidal products allows for an intuitive description of transition functions. For example, we can depict transitions
[TABLE]
for some , , , and as follows:
\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}$$y$$x^{\prime}$$s/t$$\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}$$x$$y^{\prime}$$s/t^{\prime}
We note that there are several ways to present a Mealy machine such as
\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}
, \mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\mathsf{X}_{1}\otimes\cdots\otimes\mathsf{X}_{n-1}
, \mathsf{M}$$\mathsf{Y}_{1}\otimes\cdots\otimes\mathsf{Y}_{m}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}
.
Monoidal Product of Mealy Machines
We give monoidal products of Mealy machines. For Mealy machines and , we define a Mealy machine by: , and is given by
[TABLE]
It is not difficult to check that the monoidal product is compatible with behavioural equivalence.
We depict as follows:
\mathsf{M}$$\mathsf{N}$$\mathsf{Z}$$\mathsf{X}$$\mathsf{W}$$\mathsf{Y}
As indicated by the above diagram, consists of two sub-machines and working independently. For example, if we have
\mathsf{M}$$\mathsf{N}$$\mathsf{Z}$$\mathsf{X}$$\mathsf{W}$$\mathsf{Y}$$s_{0}/s_{1}$$t_{0}/t_{1}$$z$$z^{\prime}$$w$$y
then has the following transitions:
\mathsf{M}$$\mathsf{N}$$\mathsf{Z}$$\mathsf{X}$$\mathsf{W}$$\mathsf{Y}$$s_{0}/s_{1}$$t/t$$z$$z^{\prime}$$\mathsf{M}$$\mathsf{N}$$\mathsf{Z}$$\mathsf{X}$$\mathsf{W}$$\mathsf{Y}$$t_{0}/t_{1}$$s/s$$w$$y
for all and for all .
Convention 5.2**.**
We do the following identification:
- •
We identity with by the canonical isomorphism
- •
*We identify and with by the unit laws and .
5.4.3 Axiom Link and Cut Link
For an -object , we define to be , and we define token machines
[TABLE]
by and . We depict them by single edges
\mathsf{X}$$\mathsf{X}^{\bot}$$\mathsf{X}^{\bot}$$\mathsf{X}
respectively. This is compatible with behaviour of these Mealy machines: if we give an input to an edge, then we will get the same value from the other end of the edge. For example, for any , we have
x$$x$$x$$x
.
5.4.4 Symmetry
Let and be -objects. We define a token machine by letting its transition function be the canonical isomorphism
[TABLE]
We depict by a crossing:
\mathsf{X}$$\mathsf{Y}$$\mathsf{X}$$\mathsf{Y}$$x$$x$$y$$y
As arrows in the right hand side indicate, given an input from an edge in one side, outputs the same value to the corresponding edge on other side.
5.4.5 A Modal Operator
We give a constructor on Mealy machines that corresponds to the resource modality in linear logic. For an -object , we define an -object by
[TABLE]
We can informally regard as a countable monoidal power . Following this intuition, we extend the action of to Mealy machines. Let be a Mealy machine. We define a Mealy machine by: the state space of is defined to be associated with the least -algebra such that for all ,
[TABLE]
the initial state is ; the transition function is the unique partial measurable function satisfying
[TABLE]
for all . Here, are the th injections, and sends to .
As is defined to be a countable monoidal power, behaves as a parallel composition of countably infinite copies of . For example, if we have
\mathsf{M}$$\mathsf{Y}$$\mathsf{X}$$s/s^{\prime}$$y$$x
then for all and , we have
\oc\mathsf{M}$$\oc\mathsf{Y}$$\oc\mathsf{X}$$(t_{1},\ldots,t_{n-1},s,t_{n},t_{n+1},\ldots)/(t_{1},\ldots,t_{n-1},s^{\prime},t_{n},t_{n+1},\ldots)$$(n,y)$$(n,x)
.
In other words, given an input whose first entry is , then the th copy of handles the input, and there is no side effect to the other copies of .
Proposition 5.3**.**
The operator is compatible with the behavioral equivalence and is functorial. Namely,
- •
for all Mealy machines , if , then ; and
- •
for all Mealy machines and ,
[TABLE]
- •
.
Convention 5.3**.**
For the sake of legibility and due to lack of space, we sometimes implicitly identify with by the canonical isomorphism
Under the above convention, for Mealy macines and , we can simply write . It is not difficult to see that when and for some , we have .
Dereliction
For an -object , we define a token machine by defining by
[TABLE]
The Mealy machine pops/pushes indices with probability . Namely, we have
\mathsf{d}_{\mathsf{X}}$$\mathsf{X}$$\oc\mathsf{X}$$x$$(n,x)
\mathsf{d}_{\mathsf{X}}$$\mathsf{X}$$\oc\mathsf{X}$$x$$(0,x)
for all , and . Hence, for any Mealy machine , if we have
\mathsf{M}$$\mathsf{X}$$x$$x^{\prime}$$s/s^{\prime}
for some , and , then has the following transition:
\oc\mathsf{M}$$\oc\mathsf{X}$$(0,x)$$(0,x^{\prime})$$(s,s_{1},s_{2},\ldots)/(s^{\prime},s_{1},s_{2},\ldots)$$\mathsf{d}_{\mathsf{X}}$$\mathsf{X}$$x$$x^{\prime}
for all .
Proposition 5.4**.**
For any Mealy machine ,
[TABLE]
Diagrammatically, we have
\oc\mathsf{M}$$\mathsf{d}_{\mathsf{X}}$$\oc\mathsf{X}$$\mathsf{X}$$\simeq$$\mathsf{M}$$\mathsf{X}
.
Digging and Contraction
For natural numbers , we write for the Cantor pairing , and we write and for unique natural numbers such that . For an -object , let and be stateless deterministic Mealy machines whose transition functions
[TABLE]
are given by
[TABLE]
These stateless Mealy machines and behave as follows: for all ,
\mathsf{dg}_{\mathsf{X}}$$\oc\oc\mathsf{X}$$\oc\mathsf{X}$$(n,(m,x))$$(\langle n,m\rangle,x)
\mathsf{c}_{\mathsf{X}}$$\oc\mathsf{X}$$\oc\mathsf{X}$$\oc\mathsf{X}$$(2n,x)$$(n,x)$$(2n+1,x)$$(n,x)
Proposition 5.5**.**
For any Mealy machine ,
[TABLE]
Diagrammatically, we have
\oc\mathsf{M}$$\mathsf{dg}_{\mathsf{Y}}$$\oc\mathsf{Y}$$\oc\mathsf{X}$$\oc\oc\mathsf{Y}$$\simeq$$\mathsf{dg}_{\mathsf{Y}}$$\oc\oc\mathsf{M}$$\oc\mathsf{Y}$$\oc\mathsf{X}$$\oc\oc\mathsf{Y}
\oc\mathsf{M}$$\mathsf{c}_{\mathsf{X}}$$\oc\mathsf{Y}$$\oc\mathsf{X}$$\oc\mathsf{Y}$$\oc\mathsf{Y}$$\simeq$$\mathsf{c}_{\mathsf{X}}$$\oc\mathsf{M}$$\oc\mathsf{M}$$\oc\mathsf{Y}$$\oc\mathsf{X}$$\oc\mathsf{Y}$$\oc\mathsf{Y}$$\oc\mathsf{X}
.
Weakening
We define a token machine by
[TABLE]
Because the identity is the only Mealy machine from to (up to behavioural equivalence), we see that for any Mealy machine ,
[TABLE]
This behavioural equivalence means that we can remove
\mathsf{M}$$\mathsf{w}_{\mathsf{X}}$$\mathsf{X}
from any diagram.
5.4.6 Real Numbers
We define an -object to be where is the measurable space of all finite sequences of real numbers endowed with the following -algebra
[TABLE]
For , we define a token machine by
[TABLE]
The transition means that given a query from environment, answers its value by appending to . We will use as a stack. See Section 5.4.7 and Section 5.4.10.
5.4.7 Measurable Functions
We associate a measurable function with a token machine . For simplicity, we define for and . When , the transition function is given by
[TABLE]
We explain how simulates by describing execution of for a real number . As in the following diagram, given an input from the right -edge, sends to the left -edge in order to obtain the value of its argument. The return value to from is , by which sees that its argument is . Then, outputs . As a whole, the following Mealy machine is behaviourally equivalent to .
\mathsf{fn}_{f}$$\mathsf{r}_{a}$$u$$a\mathbin{::}u$$\mathsf{R}$$\mathsf{R}$$u$$f(a)\mathbin{::}u
When , the transition function of is given by
[TABLE]
As in the following diagram, given an input from the right -edge, first sends to the lower -edge in the left hand side in order to obtain the value of its first argument. The return value to from is . Next, sends to the upper -edge in the left hand side. Then returns . Now, sees that its first argument is and its second argument is . Finally, outputs .
\mathsf{fn}_{f}$$\mathsf{r}_{a}$$\mathsf{r}_{b}$$u$$a\mathbin{::}u$$a\mathbin{::}u$$b\mathbin{::}a\mathbin{::}u$$u$$f(a,b)\mathbin{::}u
For general cases, may have more arguments, and sequentially sends queries to its arguments storing partial information about its arguments on finite sequences of real numbers.
5.4.8 Conditional Branching
For an -object such that is a measurable subspace of , we define
[TABLE]
to be a token machine whose transition function
[TABLE]
is given by
[TABLE]
For a real number and Mealy machines , we describe execution of Given an input , then tries to check whether is zero or not by sending to the -edge. There are two cases: (i) if is [math], then returns , and forwards to the middle -edge; (ii) if is not [math], say , then returns , and forwards to the upper -edge:
\mathsf{cd}$$\mathsf{r}_{0}$$\mathsf{M}$$\mathsf{N}$$\mathsf{R}$$\mathsf{X}$$\mathsf{X}$$\mathsf{X}$$u$$0\mathbin{::}u$$u$$x$$u$$x(i)
\mathsf{cd}$$\mathsf{r}_{0}$$\mathsf{M}$$\mathsf{N}$$\mathsf{R}$$\mathsf{X}$$\mathsf{X}$$\mathsf{X}$$u$$0\mathbin{::}u$$u$$x$$u$$x(ii)
Because in both cases, all outputs from and are sent to the -edge in the right hand, we see that simulates when and simulates when .
Proposition 5.6**.**
For and for Mealy machines , we have
[TABLE]
Diagrammatically, we have
\mathsf{cd}$$\mathsf{r}_{0}$$\mathsf{M}$$\mathsf{N}$$\mathsf{R}$$\mathsf{X}$$\mathsf{X}$$\mathsf{X}$$\simeq$$\mathsf{M}$$\mathsf{X}
and for any ,
\mathsf{cd}$$\mathsf{r}_{a}$$\mathsf{M}$$\mathsf{N}$$\mathsf{R}$$\mathsf{X}$$\mathsf{X}$$\mathsf{X}$$\simeq$$\mathsf{N}$$\mathsf{X}
Proof.
When , the first behavioral equivalence is realized by the first projection from to . When , the first behavioral equivalence is realized by the second projection from to . The second behavioral equivalence is realized by the obvious measurable function from to . ∎
5.4.9 A State Monad
Let be the subspace of consisting of all finite sequences of real numbers in . Recall that is “the set of states” in sampling-based operational semantics and our idea is to model and by a state monad. In this section, we give a state monad that we use in our Mealy machine semantics. We define -objects and by
[TABLE]
Then is a state monad (on ) because for any -object , we have The unit and the multiplication of this monad are:
[TABLE]
where and . Note that is equal to . We can depict the unit and the multiplication as follows:
\mathsf{X}$$\mathsf{e}$$\mathsf{S}$$\mathsf{X}$$\mathsf{m}$$\mathsf{S}$$\mathsf{S}$$\mathsf{S}
.
5.4.10 Scoring
We define to be a token machine from to whose transition function is given by
[TABLE]
The token machine simulates scoring as follows:
\mathsf{r}_{a}$$\mathsf{sc}$$\mathsf{R}$$\mathsf{S}$$b\mathbin{::}u$$a\mathbin{::}b\mathbin{::}u$$(b,u)$$(|a|b,u)
.
5.4.11 Sampling
We define a Mealy machine by: the state space is defined to be , and the initial state is , and the transition function
[TABLE]
is given by
[TABLE]
As we explained in Section 4.2, the Mealy machine simulates the evaluation rule :
\mathsf{sa}$$\oc\mathsf{R}$$\mathsf{S}$$\ast/b$$(a,b\mathbin{::}u)$$(a,u)$$\mathsf{sa}$$\oc\mathsf{R}$$\mathsf{S}$$b/b$$(n,u)$$(n,b\mathbin{::}u)
.
Namely, pops from the trace, and then answers queries that the result of sampling is .
5.5 Diagrammatic Reasoning
We now give a brief remark on diagrammatic presentation of Mealy machines. The diagrammatic presentation of a Mealy machine is not only for intuitive explanation, but also for rigorous reasoning about behavioural equivalence. This follows from some categorical observation. Let be the category of -objects and behavioural equivalence classes of Mealy machines, where composition is induced by the composition of Mealy machines. We will give a proof of the followng proposition in the next section.
Proposition 5.7**.**
*The category is a compact closed category. The dual of an -object is . The unit and the counit arrows are and . *
Therefore, as a consequence of the coherence theorem for compact closed categories [44, 38], we see that graph isomorphism preserves behavioural equivalence.
Proposition 5.8**.**
If two Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.
For example, for all Mealy machines and , we have
\mathsf{M}$$\mathsf{N}$$\mathsf{W}$$\mathsf{Y}$$\mathsf{Z}$$\mathsf{X}$$\simeq$$\mathsf{M}$$\mathsf{N}$$\mathsf{Y}$$\mathsf{W}$$\mathsf{X}$$\mathsf{Z}
.
5.6 Proof of Proposition 5.7
5.6.1 The Category of Partial Measurable Functions
For some basic categorical notions, see standard text books such as [45]. We define to be the category of measurable spaces and partial measurable functions. In , the empty space is the initial object, and the coproduct space is the coproduct of and in the categorical sense. We write
[TABLE]
for the left/right injections. For partial measurable functions and , we define to be the cotupling of and . For partial measurable functions and , we define partial measurable functions and by
[TABLE]
We note that and are symmetric monoidal categories. We also note that distributes over the coproducts, i.e., the canonical arrow
[TABLE]
is an isomorphism.
The notion of trace introduced by Joyal, Street and Verity [26] plays important role in this section.
Definition 5.3**.**
Let be a symmetric monoidal category. A trace operator on is a family satisfying the following axioms:
- •
(Dinaturality) For all -arrows , and ,
[TABLE]
- •
(Sliding) For all -arrows , ,
[TABLE]
- •
(Vanishing I) For all -arrows ,
[TABLE]
- •
(Vanishing II) For all -arrows ,
[TABLE]
- •
(Superposing) For all -arrows ,
[TABLE]
- •
(Yanking) For all ,
[TABLE]
where is the brading.
A symmetric monoidal category endowed with a trace operator is called a traced symmetric monoidal category.
We give a trace operator on . The symmetric monoidal category is enriched over , which is the cartesian category of pointed -cpos and continuous functions. The partial order on a hom-set is given by
[TABLE]
The least arrow is the empty partial measurable function. The -enrichment induces an iterator
[TABLE]
given by
[TABLE]
The operator induces another operator
[TABLE]
given by
[TABLE]
Concretely, for a partial measurable function ,
[TABLE]
if and only if either or there is a finite sequence such that
[TABLE]
Proposition 5.9**.**
The family of operators is a trace operator of the symmetric monoidal category . Furthermore, the trace operator is uniform [46] with respect to partial measurable functions : for all partial measurable functions , and , if
[TABLE]
commutes, then
[TABLE]
Proof.
It is straightforward to adapt the argument in [47, Section A]. ∎
We will use the next proposition to construct a trace operator for Mealy machines.
Proposition 5.10**.**
For any partial measurable function and a measurable space ,
[TABLE]
Proof.
For any , we show that
[TABLE]
where we identify with the arrow from to that sends to . Because
[TABLE]
it follows from uniformity that
[TABLE]
By dinaturality, we obtain
[TABLE]
Since this is true for any , we see that is equal to
[TABLE]
∎
5.6.2 The Category of Mealy Machines
Definition 5.4**.**
We define a category by:
- •
objects are -objects ; and
- •
arrows are behavioural equivalence classes of Mealy machines from to .
We denote a wide subcategory of consisting of -object such by .
Intuitively, while arrows in are bidirectional Mealy machines, arrows in are “one-way” Mealy machines. We consider the wide subcategory because categorical structure of is easier to describe than that of , and categorical structure of is induced by that of .
The identity arrow and the composition of is given by the identity Mealy machine and the composition of Mealy machine:
[TABLE]
Concrete description of the composition of Mealy machines between -objects is easy: for -objects and , and for Mealy machines and , the composition consists of:
- •
;
- •
;
- •
given by
[TABLE]
The composition of transition functions makes sense because . From this concrete description, it is easy to check that the composition of -arrows is well-defined. In fact, if realizes and realizes and , then realizes . Therefore, the symmetric transitive closure is compatible with the composition.
Proposition 5.11**.**
The category with is a symmetric monoidal category where the monoidal product of -arrows and is given by
[TABLE]
Proof.
It is easy to see that objects in are closed under the monoidal product of -objects. Thanks to simplicity of the composition of -arrows, we can easily check that the monoidal product of Mealy machines between -objects is compatible with behavioural equivalence and that is a symmetric monoidal category. ∎
Furthermore, inherits the trace operator of . For a -arrow , we define a -arrow to be the equivalence class of a Mealy machine given by
[TABLE]
and
[TABLE]
Proposition 5.12**.**
The family of operators is a trace operator on the symmetric monoidal category .
Proof.
Well-definedness of follows from uniformity of the trace operator on . Sliding, vanishing I, vanishing II, superposing and yanking for follow from that of . Dinaturality for follows from dinaturality of and Proposition 5.10. ∎
We recall the notions of -construction [26] and compact closed category.
Definition 5.5** (-construction).**
Let be a traced symmetric monoidal category. We define a category by:
- •
objects are pairs of -objects;
- •
arrows from to are -arrows from to .
The identity on is given by the identity on , and the composition of -arrows and is given by
[TABLE]
Here, we omit some coherence isomorphisms.
Definition 5.6**.**
A compact closed category is a symmetric monoidal category with a function and families of -arrows
[TABLE]
such that
[TABLE]
For , the object is called the dual object of .
Theorem 5.3** ([26]).**
The category is a compact closed category. The unit and the monoidal product are given by
[TABLE]
The dual object of is . The unit arrow and the counit arrow are given by
[TABLE]
Corollary 5.1**.**
The category is a compact closed category. The monoidal structure is given by , and the unit and the counit are given by and respectively.
Proof.
It is straightforward to check that is isomorphic to , and the compact closed structure is given by data provided in Section 5. ∎
6 Mealy Machine Semantics for
We interpret a type as the -object given by
[TABLE]
We define interpretation of contexts by
[TABLE]
When is the empty sequence, we define to be .
For interpreting conditional branching, we use the following proposition.
Proposition 6.1**.**
For any type , there is a partial measurable embedding .
Proof.
We first define an embedding from to by induction on . We note that for any type , we have . Base cases are easy. For induction step,
[TABLE]
The statement follows from . ∎
We interpret terms and values by
[TABLE]
inductively defined by diagrams in Figure 4. In these definitions, when we can infer and , we simply write and for and respectively, and we often apply Convention 5.3 to these Mealy machines. Extracting precise definitions from these diagrams would be easy.
7 Adequacy Theorems
Finally, we give our main results. In the proof of our adequacy theorems, we use logical relations, diagrammatic reasoning of Mealy machines (Proposition 5.8), the domain theoretic structure of Mealy machines (Proposition 5.1), and Fubini-Tonelli theorem.
7.1 Sampling-Based Operational Semantics
For a closed term , we define a partial measurable function as follows:
- •
for , if there are such that
[TABLE]
i.e., if we have the following transitions:
\llbracket\mathtt{M}\rrbracket$$\oc\mathsf{R}$$\mathsf{S}$$s_{\llbracket\mathtt{M}\rrbracket}/s$$(a,u)$$(a^{\prime},\varepsilon)$$\llbracket\mathtt{M}\rrbracket$$\oc\mathsf{R}$$\mathsf{S}$$s/s^{\prime}$$(0,\varepsilon)$$(0,b\mathbin{::}\varepsilon)
then we define to be ;
- •
otherwise, is undefined.
Theorem 7.1** (Adequacy).**
For any closed term and for any , we have
[TABLE]
Corollary 7.1**.**
For any closed term , partial functions and given as follows
[TABLE]
are partial measurable functions.
7.2 Distribution-Based Operational Semantics
For a closed term , we define measurable functions and by
[TABLE]
Then we define a measure on by:
[TABLE]
Theorem 7.2** (Adequacy).**
For any closed term , we have
It follows from our adequacy theorems that sampling-based operational semantics induces distribution-based operational semantics.
Corollary 7.2**.**
For any closed term ,
[TABLE]
A result analogous to Corollary 7.2 has already been proved by way of a purely operational (and quite laburious) argument in an untyped setting where score is not available in its full generality [28]. Here, it is just an easy corollary of our adequacy theorems.
8 Proof of Adequacy Theorems
Lemma 8.1**.**
For any term and for any closed value ,
[TABLE]
Proof.
By induction on . ∎
Lemma 8.2**.**
For all closed terms , if , then .
Proof.
By case analysis. For the case of recursion, see Proposition 9.2 in Section 9. ∎
We first prove soundness.
Proposition 8.1**.**
For any closed term and for any , if , then .
Proof.
By induction on the length of . (Base case) Easy. (Induction step) By case analysis on the first evaluation step of .
- •
If the first evaluation step is of the form for some , then by Lemma 8.2, we have . Because , by induction hypothesis, we obtain . Hence, .
- •
If the first evaluation step is of the form , then by induction hypothesis, we have . Therefore, by the definition of the Mealy machine , we see that is .
- •
If the first evaluation step is of the form , then by induction hypothesis, is . Therefore, by the definition of the Mealy machine , we see that is .
∎
It remains to prove that implies that . We use logical relations. We define a binary relation between closed terms of type and Mealy machines from to by
[TABLE]
where is a partial measurable function given by: for each ,
- •
if there are states such that
[TABLE]
i.e., if we have the following transitions:
\mathsf{M}$$\oc\mathsf{R}$$\mathsf{S}$$s_{\mathsf{M}}/s$$(a,u)$$(a^{\prime},\varepsilon)$$\mathsf{M}$$\oc\mathsf{R}$$\mathsf{S}$$s/s^{\prime}$$(0,\varepsilon)$$(0,b\mathbin{::}\varepsilon)
then we define to be ;
- •
otherwise, is undefined.
We then inductively define binary relations
[TABLE]
by
[TABLE]
We list some properties of the logical relations.
Lemma 8.3**.**
Let be a type.
If , then . 2. 2.
If and , then . 3. 3.
If and , then . 4. 4.
If and , then . 5. 5.
For any closed term , where is a token machine whose transition function is the empty partial measurable function. 6. 6.
For any closed value , . 7. 7.
If and , then where is the least upper bound of the -chain .
Proof.
We can check these items by unfolding the definition of and the logical relations. ∎
Lemma 8.4** (Basic Lemma).**
Let be a context.
- •
For any term and for any for , we have
[TABLE]
- •
For any value and for any for , we have
[TABLE]
Proof.
By induction on and . Most cases follow from Lemma 8.3. For and , we check the statement by unfolding the definition of and . Here, we only check for and .
- •
When , for any in , if
[TABLE]
then by the definition of , we see that for some and such that
[TABLE]
Because , we obtain . Hence,
[TABLE]
- •
When , for simplicity, we suppose that is a closed term. By induction hypothesis, we can check that
[TABLE]
by induction on . Because is the least upper bound of (Proposition 9.1), we obtain by Lemma 8.3.
∎
Theorem 8.1**.**
For any closed term and for any , we have
[TABLE]
Proof.
If , then we have by Proposition 8.1. If , then because is an element of , we obtain by Lemma 8.4. ∎
9 Approximation Lemma
Let be a Mealy machine. In this section, we give a way to calculate a Mealy machine given by
[TABLE]
Diagrammatically, consists of digging, contraction and a feed back loop:
\oc\mathsf{M}$$\mathsf{dg}$$\mathsf{c}$$\mathsf{M}$$\oc\oc\mathsf{X}$$\oc\mathsf{X}$$\oc\mathsf{X}$$\mathsf{X}$$\oc\mathsf{X}$$\oc\mathsf{X}
This construction already appeared in the interpretation of the fixed point operator. In fact, for a term , we have .
The goal of this section is to show that is a fixed point of and can be approximated by a family of Mealy machines
[TABLE]
9.0.1 Parametrized Modal Operator and Parametrized Loop Operator
We introduce parametrization of the modal operator . For a subset and for a Mealy machine , we define a Mealy machine by: the state space and the initial state of are given by
[TABLE]
and is given by
[TABLE]
where and vary over the corresponding sets. For example, if we have
\mathsf{M}$$\mathsf{Y}$$\mathsf{X}$$s/s^{\prime}$$x$$y
,
then for any and for any , we have
\oc\mathsf{M}$$\mathsf{Y}$$\mathsf{X}$$(t_{1},\ldots,t_{n-1},s,t_{n+1},\ldots)/(t_{1},\ldots,t_{n-1},s^{\prime},t_{n+1},\ldots)$$(n,x)$$(n,y)
whenever . When , there is no output from . We can think as a “restriction” of to . In fact, is equal to .
We are interested in restrictions of to subsets inductively given by
[TABLE]
The definition of and are motivated by the following lemma.
Lemma 9.1**.**
For any and for any , we have
[TABLE]
By means of , we can also parametrize the operator . For , and for , we define by
[TABLE]
It is easy to see that is equal to .
Lemma 9.2**.**
For any Mealy machine , we have
[TABLE]
Proof.
See Figure 5. ∎
Lemma 9.3**.**
For any Mealy machine and for any ,
[TABLE]
Proof.
We prove the statement by induction on . The base case follows from that the transition functions of and are equal to the empty partial function. We next check the induction step. We have
[TABLE]
∎
Proposition 9.1**.**
For a Mealy machine , we inductively define by
[TABLE]
For all , we have
[TABLE]
and
[TABLE]
Hence, we have an ascending chain
[TABLE]
and is the least upper bound of the ascending chain .
Proof.
It follows from the definition of , we have
[TABLE]
for all , and
[TABLE]
Hence, by the definition of the composition and the monoidal product, we have
[TABLE]
for all , and
[TABLE]
It remains to check . We show this by induction on . For the base case, we have because these Mealy machines and are behaviorally equivalent to . For the induction step,
[TABLE]
Because is equal to , we obtain . ∎
Proposition 9.2**.**
For any Mealy machine ,
[TABLE]
Proof.
Because and the composition of Mealy machines are continuous, we have
[TABLE]
∎
10 How About S-Finite Kernels?
The reader experienced with the semantics of probabilistic programming languages have probably already wondered whether a GoI model for could be given out of s-finite kernels instead of measurable functions, following Staton’s work on the semantics of a first-order probabilistic programming language [32].
The answer is indeed positive: the kind of construction we have presented in Section 5 can in fact be adapted to the category of measurable spaces and s-finite kernels. The latter, being traced monoidal, has all the necessary structure one needs [27]. What one obtains proceeding this way is indeed a GoI model, but adequate only for the distribution-based operational semantics.
The interpretation of any program in this alternative GoI can be seen as structurally identical to the one from Section 5 once the sample and score operators are interpreted as usual, namely as those s-finite kernels which actually perform sampling and scoring internally. Below, we first recall the definition of s-finite kernel, and then we introduce Mealy machines whose transition is described in terms of an s-finite kernel, and we give some basic Mealy machines. Finally, we give an adequate GoI model for the distribution-based operational semantics.
Being adequate for the distribution-based semantics directly (and not by way of integration as in Theorem 7.2) has the pleasant consequence of validating a number of useful program transformations, and in particular commutation of sampling and scoring effects, see [28] for a thorough discussion about this topic, and about how s-finite kernels are a particularly nice way of achieving commutativity in presence of scoring.
10.1 S-finite Kernels
Let be a kernel. We say that is finite when there is a real number such that for all and , we have . An s-finite kernel is a kernel such that there is a countable family of finite kernels such that for all and . It is easy to see that s-finite kernels are closed under the pointwise addition. We write for the pointwise addition of s-finite kernels . A (sub)probability kernel is a kernel such that is a (sub)probability measure on for all . Every (sub)probability kernel is a finite kernel.
Every measurable function gives rise to a probability kernel given by
[TABLE]
We denote the probability kernel induced by the identity measurable function by . Concretely, this is given by .
We recall two constructions of s-finite kernels.
- •
(Composition) For s-finite kernels and , we define an s-finite kernel by
[TABLE]
The composition of s-finite kernels is associative and satisfies the unit laws, namely, we have and .
- •
(Tensor product) For s-finite kernels and , we define an s-finite kernel to be the unique s-finite kernel such that for all and for all and ,
[TABLE]
The tensor product and the coproduct of s-finite kernels is functorial. This means that these constructors are compatible with the composition and preserve identities. The following proposition summarizes catagorical status of these structures.
Proposition 10.1**.**
The category of measurable spaces and s-finite kernels with forms a symmetric monoidal category where the unit object is . The object is the zero object, and with
[TABLE]
forms the coproduct of and where and are the first and the second injections. Furthermore, the monoidal product distributes over the coproducts. Namely, the canonical s-finite kernel given by
[TABLE]
is a natural isomorphism.
Proof.
For associativity of the composition, see [32, Lemma 3], and for functoriality of , see [32, Proposition 5]. It is not difficult to check that the category of measurable spaces and s-finite kernels associated with and forms a symmetric monoidal category. For s-finite kernels and , the cotupling is given by
[TABLE]
It follows from universality of coproducts that is a natural isomorphism. ∎
For s-finite kernels and , we define an s-finite kernel by
[TABLE]
This is the unique s-finite kernel satisfying
[TABLE]
10.2 Probabilistic Mealy Machine
Definition 10.1**.**
For -objects and , a probabilistic Mealy machine from to consists of
- •
a measurable space called the state space of ;
- •
an element called the initial state of ;
- •
an s-finite kernel called the transition relation.
When is a probabilistic Mealy machine from to , we write .
We can regard a Mealy machine as a probabilistic Mealy machine from to by identifying the transition function with the correspondnig s-finite kernel . In the sequel, we confuse Mealy machines (and token machines) with corresponding probabilistic Mealy machines.
Let and be -object. Just like Mealy machines, we depict a probabilistic Mealy machine from to as a box with edges labeled by on the left hand side and edges labeled by on the right hand side:
\mathsf{M}$$\mathsf{Y}_{m}$$\vdots$$\mathsf{Y}_{1}$$\mathsf{X}_{n}$$\vdots$$\mathsf{X}_{1}
,
and we depict transitions as arrows. For example, when , we depict
[TABLE]
for , and as the following arrow
\mathsf{M}$$s/s_{1}$$\mathsf{Y}_{1}$$\mathsf{X}_{1}$$y$$0.4$$x
where the positive real on the arrow indicate probabilities of the transition. Below, we may omit states and probabilities of transitions when they are not important or are easy to infer.
10.3 Behavioral Equivalence
We give an equivalence relation between probabilistic Mealy machines so as to identify probabilistic Mealy machines that behaves in the same way. Let and be -objects, and let and be probabilistic Mealy machines from to . We write when there is a measurable function such that and the following diagram commutes:
[TABLE]
We define an equivalence relation to be the symmetric transitive closure of . A probabilistic Mealy machine is behaviorally equivalent to when we have . When we can infer subscripts of , we omit them. We say that a measurable function realizes a behavioral equivalence (realizes ) when is witnessed by .
10.4 Construction of probabilistic Mealy Machines
We introduce probabilistic Mealy machines and their constructions that are building blocks of our denotational semantics. Most of them are adoptation of Mealy machines in Section 5.4, and we just give their formal definitions.
10.4.1 Composition/Cut
For probabilistic Mealy machines and , we define the state space and the initial states of by , and we define the transition relation by
[TABLE]
where are restrictions of the following s-finite kernel
[TABLE]
namely, the s-finite kernels satisfies
[TABLE]
Here, the horizontal arrows consists of the injection from into followed by distributivity and symmetry. For example, when , the upper horizontal arrow is given by
[TABLE]
Joins in the definition of the composition of probabilistic Mealy machines are the pointwise ordoer. We can check that the composition of probabilistic Mealy machines is compatible with behavioural equivalence and that -objects and the composition of probabilistic Mealy machines is a category where the identity on an -object is (regarded as a probabilistic Mealy machine).
10.4.2 Monoidal Products
We give monoidal products of probabilistic Mealy machines. For probabilistic Mealy machines and , we define a probabilistic Mealy machine by: , and is given by
[TABLE]
It is not difficult to check that the monoidal product is compatible with behavioural equivalence. We depict as follows:
\mathsf{M}$$\mathsf{N}$$\mathsf{Z}$$\mathsf{X}$$\mathsf{W}$$\mathsf{Y}
For and , we adopt the same diagrammatic presentation.
10.4.3 A Modal Operator
Let be a probabilistic Mealy machine. We define a probabilistic Mealy machine by: the state space of is defined to be associated with the least -algebra such that for all ,
[TABLE]
the initial state is ; the transition function is the unique partial measurable function satisfying
[TABLE]
for all . Here, are the th injections, and sends to .
10.4.4 Diagrammatic Reasoning on Probabilistic Mealy Machines
Diagrammatic reasoning is valid also for probabilistic Mealy machines.
Proposition 10.2**.**
The category of -object and probabilistic Mealy machines (modulo behavioural equivalence) is a compact closed category. The dual of an -object is . The unit and the counit arrows are and .
Proposition 10.3**.**
If two probabilistic Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.
We can check Proposition 10.2 by replacing the category of partial measurable functions by the category of s-finite kernels in Section 5.6.
10.4.5 A State Monad
We define an -object by and define an -object by . Then is a state monad (on ), whose unit and multiplication are given by:
[TABLE]
where and .
10.4.6 Scoring
We construct a probabilistic Mealy machine by:
[TABLE]
and
[TABLE]
The probabilistic Mealy machine simulates scoring as follows:
\mathsf{r}_{a}$$\mathsf{Sc}$$\mathsf{R}$$\mathsf{J} |a|$$\varepsilon$$a\mathbin{::}\varepsilon$$\ast$$\ast
.
10.4.7 Sampling
We define a Mealy machine by: the state space is defined to be , and the initial state is , and the transition function
[TABLE]
is given by
[TABLE]
The probabilistic Mealy machine behaves as follows:
- •
In the initial state , given from the -edge, draws a real number from the uniform distribution and stores the real number:
\mathsf{Sa}$$\oc\mathsf{R}$$\mathsf{J}$$\ast/a$$\ast$$\ast
.
For example, the probability of the state being a real number in after this transition is .
- •
After this transition, returns to each “query” :
\mathsf{Sa}$$\oc\mathsf{R}$$\mathsf{J}$$a/a$$(n,u)$$(n,b\mathbin{::}u)
.
11 Probabilistic Mealy Machine Semantics for
We interpret a type as the -object given by
[TABLE]
We define interpretation of contexts by
[TABLE]
When is the empty sequence, we define to be .
We interpret terms and values by
[TABLE]
inductively defined by diagrams in Figure 6 where Mealy machines are regarded as probabilistic Mealy machines in the obvious manner.
11.1 Soundness and Adequacy
11.1.1 Observation
Let be a probabilistic Mealy machine. We define s-finite kernels and by
[TABLE]
Then we define a measure on to be . Intuitively, is a measure that describes distribution of real numbers obtained by the following process:
- •
We first input to the -wire of .
- •
If outputs to the -wire, then we input to the -wire of .
- •
We only observe outputs of the form for some .
For example, is the uniform distribution over .
Theorem 11.1** (Soundness and Adequacy).**
For any closed term , if , then .
Below, we give a proof of Theorem 11.1.
11.1.2 Proof of Adequacy Theorem
Lemma 11.1**.**
For any term and for any closed value ,
[TABLE]
Proof.
By induction on . ∎
Lemma 11.2**.**
For all closed terms , if , then .
Proof.
By case analysis. For the case of recursion, see Corollary 11.1. ∎
We first prove soundness.
Proposition 11.1**.**
For any closed term , if , then .
Proof.
By induction on . (Base case) Easy. (Induction step) By case analysis.
- •
If , then .
- •
If and , then .
- •
If and , then by the definition of , we see that .
- •
If and for some finite kernel , then by the definition of and , we see that
[TABLE]
for some such that . Hence,
[TABLE]
∎
It remains to prove that implies . We use logical relations. We define a binary relation between closed terms of type and probabilistic Mealy machines from to by
[TABLE]
We then inductively define binary relations
[TABLE]
by
[TABLE]
We list some properties of the logical relations.
Lemma 11.3**.**
Let be a type.
If , then . 2. 2.
If and , then . 3. 3.
If and , then . 4. 4.
If and , then . 5. 5.
For any closed term , where is a token machine whose transition function is the zero kernel. 6. 6.
For any closed value , . 7. 7.
If and and and , then where is given by , , .
Proof.
We can check these items by unfolding the definition of and the logical relations. ∎
Lemma 11.4** (Basic Lemma).**
Let be a context.
- •
For any term and for any for , we have
[TABLE]
- •
For any value and for any for , we have
[TABLE]
Proof.
By induction on and . Most cases follow from Lemma 11.3. For and , we check the statement by unfolding the definition of and . Here, we only check for and .
- •
When , let be a pair in . We write for . By the definition of and , we have
[TABLE]
for some such that . Hence,
[TABLE]
- •
When , for simplicity, we suppose that is a closed term. By induction hypothesis, we can check that
[TABLE]
by induction on . By Lemma 11.3 and Proposition 11.2, we obtain .
∎
Theorem 11.2**.**
For any closed term , if , then
[TABLE]
Proof.
By soundness, we have . On the other hand, because is an element of , we obtain the other inequality by Lemma 11.4. ∎
11.1.3 Induction step on recursion
Our Goal: Approximation Lemma
Let be a Mealy machine. In this section, we show that a Mealy machine given by
[TABLE]
is a “least” fixed point of . Diagrammatically, consists of digging, contraction and a feed back loop:
\mathsf{dg}_{\mathsf{X}}$$\oc\mathsf{M}$$\mathsf{con}_{\mathsf{X}}$$\mathsf{M}$$\oc\oc\mathsf{X}$$\oc\mathsf{X}$$\oc\mathsf{X}$$\mathsf{X}$$\oc\mathsf{X}$$\oc\mathsf{X}
This construction already appeared in the interpretation of the fixed point operator. In fact, for a term , we have .
Parametrized Modal Operator and Parametrized Loop Operator
We introduce parametrization of the modal operator and the loop operator . For , we define by: the state space and the initial state of are given by
[TABLE]
and is a unique s-finite kernel such that the following diagrams commute:
- •
for any ,
[TABLE]
- •
for any ,
[TABLE]
Let be
[TABLE]
The definition of and are motivated by the following lemma.
Lemma 11.5**.**
For any and for any , we have
[TABLE]
By means of , we also parametrize the operator . For , and for , we define by
[TABLE]
Because , we have .
Lemma 11.6**.**
For any , we have
[TABLE]
Below, we write for the isomorphism obtained by applying to .
Lemma 11.7**.**
There is a family of measurable functions such that the following diagram commutes:
[TABLE]
where is a measurable isomorphism given by
[TABLE]
Proof.
In this proof, for sets , we identify elements in with functions from to . For and , We define by induction on :
[TABLE]
where is given by
[TABLE]
We note that this definition makes sense because implies . It is straightforward to check the family makes the above diagram commute. ∎
Lemma 11.8**.**
For any Mealy machine , we have
[TABLE]
which is realized by given by
[TABLE]
where the first and the last isomorphisms are obtained by applying canonical isomorphisms and .
Proof.
See Figure 5. ∎
Lemma 11.9**.**
For any Mealy machine and for any ,
[TABLE]
which is realized by given by
[TABLE]
where the first and the last isomorphisms are obtained by applying canonical isomorphisms and .
Proof.
We prove the statement by induction on . The base case follows from that the transition relations of and are the zero kernels. We next check the induction step. We have
[TABLE]
By Lemma 11.7, this behavioral equivalence is realized by . ∎
Proposition 11.2**.**
For a Mealy machine , we inductively define by
[TABLE]
For all , we have
[TABLE]
and
[TABLE]
Proof.
It follows from the definition of , we have
[TABLE]
for all , and
[TABLE]
Hence, by continuity of the composition, the coproduct and the monodal product of s-finite kernels, we have
[TABLE]
for all , and
[TABLE]
By induction on , we show that we have
[TABLE]
which is realized by . For the base case, we have because these Mealy machines and are behaviorally equiavalent to . The induction step follows from Lemma 11.8. ∎
Corollary 11.1**.**
For any Mealy machine ,
[TABLE]
Proof.
By Proposition 9.1, we have
[TABLE]
and
[TABLE]
Therefore, by Lemma 11.8 and Lemma 11.9, the following diagram commutes:
[TABLE]
It is easy to see that preserves the initial states. Hence, . ∎
11.1.4 Commutativity Modulo Observational Equivalence
Definition 11.1**.**
For terms , we say that is observationally equivalent to when for all context , if , then .
In this sectin, as an application of our GoI semantics, we show that for all , and ,
[TABLE]
is observationally equivalent to
[TABLE]
To prove this equivalence, let be a binary relation between closed terms of type and probabilistic Mealy machines from to by
[TABLE]
where (Condition 1) is: for any such that
[TABLE]
and for any , (Condition 2) is: for any such that
[TABLE]
and for any , for any , We then inductively define binary relations
[TABLE]
by replacing in the definition of , and with . Then we can prove basic lemma for this logical relation.
Lemma 11.10** (Basic Lemma).**
Let be a context.
- •
For any term and for any for , we have
[TABLE]
- •
For any value and for any for , we have
[TABLE]
Proof.
Almost equivalent to the proof of Lemma 11.4. ∎
Corollary 11.2**.**
For any ,
- •
for any such that
[TABLE]
and for any ,
- •
for any such that
[TABLE]
and for any , for any ,
By Corollary 11.2 and by the definition of composition of probabilistic Mealy machines, we see that if
[TABLE]
then
[TABLE]
where and are s-finite kernels given by restricting the domain and the codomain of and respectively. Because of commuativity for s-finite kernels [32], the equality (1) is true. Hence, (2) holds. Then by adequacy, we see that
[TABLE]
is observationally equivalent to
[TABLE]
12 Conclusion
We introduced a denotational semantics for , a higher-order functional language with sampling from a uniform continuous distribution and scoring. Following [28], we considered two operational semantics, namely a distribution-based operational semantics, which associates terms with distributions over real numbers, and a sampling-based operational semantics, which associates each term with a weight along every probabilistic branch. Our main results are adequacy theorems for both kinds of operational semantics, and it follows from these theorems that sampling-based operational semantics is essentially equivalent to distribution-based operational semantics. Another consequence of adequacy theorems is the possibility of diagrammatic reasoning for observational equivalence of programs. It follows from the observation in Section 5.5 and the adequacy theorems, that diagrammatic equivalence for denotation of terms implies observational equivalence. It would be interesting to explore possible connections between our work and other works on diagrammatic reasoning for probabilistic computation, such as [48, 49].
At this point, our language does not support normalisation mechanism as a first class operator, and we are negative about extending our semantics to capture normalisation mechanism. However, capturing sampling algorithms such as the Metropolis-Hastings algorithm [50, 51], which consists of a number of interactions between programs and their environment seems plausible. Exploring the relationships between “idealised” normalisation mechanisms and such “approximating” normalisation mechanisms from the point of view of GoI is an interesting topic for future work.
Acknowledgment
The authors are partially supported by the INRIA/JSPS project “CRECOGI”, and would like to thank Michele Pagani for many fruitful discussions about an earlier version of this work. Naohiko Hoshino is supported by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] G. L. Miller, “Riemann’s hypothesis and tests for primality,” J. Comput. Syst. Sci. , vol. 13, no. 3, pp. 300–317, 1976.
- 2[2] M. O. Rabin, “Probabilistic algorithm for testing primality,” Journal of Number Theory , vol. 12, no. 1, pp. 128 – 138, 1980.
- 3[3] M. Agrawal, N. Kayal, and N. Saxena, “PRIMES is in P,” Ann. of Math , vol. 2, pp. 781–793, 2002.
- 4[4] F. D. Wood, J. van de Meent, and V. Mansinghka, “A new approach to probabilistic programming inference,” in AISTATS 2014 , 2014, pp. 1024–1032.
- 5[5] N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum, “Church: a language for generative models,” in UAI 2008 , 2008, pp. 220–229.
- 6[6] C. Jones, “Probabilistic non-determinism,” Ph.D. dissertation, University of Edinburgh, 1990.
- 7[7] V. Danos and R. Harmer, “Probabilistic game semantics,” ACM Trans. Comput. Log. , vol. 3, no. 3, pp. 359–382, 2002.
- 8[8] T. Ehrhard, M. Pagani, and C. Tasson, “Full abstraction for probabilistic PCF,” J. ACM , vol. 65, no. 4, pp. 23:1–23:44, 2018.
