The Geometry of Bayesian Programming

Ugo Dal Lago; Naohiko Hoshino

arXiv:1904.07425·cs.PL·June 22, 2023


TL;DR

This paper introduces a geometric interaction model for a typed lambda-calculus designed for higher-order Bayesian programming, incorporating sampling and soft conditioning, based on measurable spaces.

Contribution

It provides a novel geometric interaction model for Bayesian programming languages, connecting category theory with probabilistic semantics.

Findings

01

Model is adequate for distribution-based semantics

02

Model is adequate for sampling-based semantics

03

Framework supports higher-order Bayesian programming

Abstract

We give a geometry of interaction model for a typed lambda-calculus endowed with operators for sampling from a continuous uniform distribution and soft conditioning, namely a paradigmatic calculus for higher-order Bayesian programming. The model is based on the category of measurable spaces and partial measurable functions, and is proved adequate with respect to both a distribution-based and a sampling-based operational semantics.

Equations

$\{x\in X : f(x)\text{ is defined and is equal to an element of }A\}$

$\mathbb{R}_{[0,1]}=\{a\in\mathbb{R} : 0\leq a\leq 1\},\quad\mathbb{R}_{\geq 0}=\{a\in\mathbb{R} : a\geq 0\}$

$|X\times Y|$

$|X+Y|$

$\Sigma_{X\times Y}$

for all $A\in\Sigma_{X}$ and $B\in\Sigma_{Y}$,

$\Sigma_{X+Y}$

$\mu_{\mathrm{Borel}}([a_{1},b_{1}]\times\cdots\times[a_{n},b_{n}])=\prod_{1\leq i\leq n}|a_{i}-b_{i}|.$

$\int_{X}f(u)\,\mathrm{d}u.$

$\delta_{x}(A)=[x\in A]=\begin{cases}1,&\text{if }x\in A;\\0,&\text{if }x\notin A.\end{cases}$

$\mathtt{A},\mathtt{B}$

$\mathtt{V},\mathtt{W}$

$\mathtt{M},\mathtt{N}$

$\mid\;\mathtt{F}(\mathtt{V}_{1},\dots,\mathtt{V}_{|\mathtt{F}|})\;\mid\;\mathtt{sample}\;\mid\;\mathtt{score}(\mathtt{V}).$

$\mathtt{E}[-]::=[-]\;\mid\;\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{E}[-]\;\mathtt{in}\;\mathtt{M}.$

$(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{M})\,\mathtt{V}$

$\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{V}\;\mathtt{in}\;\mathtt{M}$

$\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{M})\,\mathtt{V}$

$\mathtt{ifz}(\mathtt{r}_{a},\mathtt{M},\mathtt{N})$

$\mathtt{F}(\mathtt{r}_{a},\dots,\mathtt{r}_{b})$

$\mathtt{x}_{1}:\mathtt{Real},\dots,\mathtt{x}_{m}:\mathtt{Real}\vdash\mathtt{M}:\mathtt{Real},$

$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\dots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\Rightarrow_{n}\mu\iff\mu=k(u,-)$

$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\dots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}$

$k(u,A)=0.$

$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\dots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\Rightarrow_{0}\mu\iff\mu=\varnothing_{\mathbb{R}}\iff\mu=k(u,-).$

$\mathtt{R}::=$

$\mid\;\mathtt{F}(\mathtt{V},\ldots\mathtt{W})\;\mid\;\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{V}\;\mathtt{in}\;\mathtt{M}\;\mid\;\mathtt{ifz}(\mathtt{r}_{a},\mathtt{M},\mathtt{N}).$

$\mathtt{x}_{i}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\dots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\equiv\mathtt{r}_{a_{i}}\Rightarrow_{n+1}\mu\iff\mu=\delta_{a_{i}}.$

$\mathtt{r}_{a}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\dots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\equiv\mathtt{r}_{a}\Rightarrow_{n+1}\mu\iff\mu=\delta_{a}.$

$k((a_{1},\dots,a_{m}),A)=\delta_{a_{i}}(A),\quad h((a_{1},\dots,a_{m}),A)=\delta_{a}(A)$

$\mathtt{E}[\mathtt{y}]\{\mathtt{r}_{u}/(\mathtt{\Delta},\mathtt{y}:\mathtt{Real})\}\Rightarrow_{n}\mu\iff\mu=k(u,-).$

$h((a_{1},\dots,a_{m}),A)=\int_{\mathbb{R}_{[0,1]}}k((a_{1},\dots,a_{m},a),A)\,\mathrm{d}a.$

$(b,\dots,c)\mapsto\int_{\mathbb{R}}f(a,b,\dots,c)\,\mathrm{d}a$

$\mathtt{E}[\mathtt{sample}]\{\mathtt{r}_{u}/\mathtt{\Delta}\}\Rightarrow_{n+1}\mu$

$\iff\mu=h(u,-).$

$\mathtt{E}[\mathtt{skip}]\{\mathtt{r}_{u}/\mathtt{\Delta}\}\Rightarrow_{n}\mu\iff\mu=k(u,-).$


Taxonomy

Topics: Logic, Reasoning, and Knowledge · Logic, programming, and type systems · Advanced Algebra and Logic

Full text


1 Introduction

Randomisation provides the most efficient algorithmic solutions, at least concretely, in many different contexts. A typical example is primality testing, where the Miller-Rabin test [1, 2] remains the preferred choice even though polynomial-time deterministic algorithms have been available for many years now [3]. Probability theory can be exploited even more fundamentally in programming, by way of so-called probabilistic (or, more specifically, Bayesian) programming, as popularized by languages such as ANGLICAN [4] and CHURCH [5]. This has stimulated research about probabilistic programming languages and their semantics [6, 7, 8], together with type systems [9, 10], equivalence methodologies [11, 12], and verification techniques [13].

Giving a satisfactory denotational semantics to higher-order functional languages is already problematic in the presence of probabilistic choice [6, 14], and becomes even more challenging when continuous distributions and scoring are present. Recently, quasi-Borel spaces [15] have been proposed as a way to give semantics to calculi with all these features, and only very recently [16] has this framework been shown to be adaptable to a fully-fledged calculus for probabilistic programming, in which continuous distributions and soft-conditioning are present. Probabilistic coherent spaces [17] are fully abstract [8] for $\lambda$-calculi with discrete probabilistic choice, and can, with some effort, be adapted to calculi with sampling from continuous distributions [18], although without scoring.

A research path which has been studied only marginally, so far, consists in giving semantics to Bayesian higher-order programming languages through interactive forms of semantics, e.g. game semantics [19, 20] or the geometry of interaction [21]. One of the very first models for higher-order calculi with discrete probabilistic choice was in fact a game model, proved fully abstract for a probabilistic calculus with global ground references [7]. After more than ten years, a parallel form of Geometry of Interaction (GoI) and some game models have been introduced for $\lambda$ -calculi with probabilistic choice [22, 23, 24], but in all these cases only discrete probabilistic choice can be handled, with the exception of a recent work on concurrent games and continuous distributions [25].

In this paper, we will report on some results about GoI models of higher-order Bayesian languages. The distinguishing features of the introduced GoI model can be summarised as follows:

•

Simplicity. The category on which the model is defined is the one of measurable spaces and partial measurable functions, so it is completely standard from a measure-theoretic perspective.

•

Expressivity. As is well-known, the GoI construction [26, 27] allows one to give semantics to calculi featuring higher-order functions and recursion. Indeed, our GoI model can be proved adequate for $\mathbf{PCFSS}$, a fully-fledged calculus for probabilistic programming.

•

Flexibility. The model we present is quite flexible, in the sense of being able to reflect the operational behaviour of programs as captured by both the distribution-based and the sampling-based semantics.

•

Intuitiveness. GoI visualises the structure of programs in terms of graphs, from which dependencies between subprograms can be analyzed. Adequacy of our model provides a diagrammatic reasoning principle for observational equivalence of $\mathbf{PCFSS}$ terms.

This paper’s contributions, besides the model’s definition, are two adequacy results which precisely relate our GoI model to the operational semantics, as expressed (following [28]) in both the distribution and sampling styles. As a corollary of our adequacy results, we show that the distribution induced by sampling-based operational semantics coincides with distribution-based operational semantics.

1.1 Turning Measurable Spaces into a GoI Model

Before entering into the details of our model, it is worthwhile to give some hints about how the proposed model is obtained, and why it differs from similar GoI models from the literature.

The thread of work the proposed model stems from is the one of so-called memoryful geometry of interaction [29, 30]. The underlying idea of this paper is precisely the same: program execution is modelled as an interaction between the program and its environment, and memoisation takes place inside the program as a result of the interaction.

In the previous work on memoryful GoI by the second author with Hasuo and Muroya, the goal consisted in modelling a $\lambda$-calculus with algebraic effects. Starting from a monad together with some algebraic effects, they gave an adequate GoI model for such a calculus, which is applicable to a wide range of algebraic effects. In principle, then, their recipe could be applicable to $\mathbf{PCFSS}$, since sampling-based operational semantics enables us to see scoring and sampling as algebraic effects acting on global states. However, that would not work for $\mathbf{PCFSS}$, since the category $\mathbf{Meas}$ of measurable spaces is not cartesian closed, and we thus cannot define a state monad by way of the exponential $S\Rightarrow S\times(-)$. (Footnote 1: We need to work on $\mathbf{Meas}$ because we want to give adequacy for distribution-based semantics.)

In this paper, we sidestep this issue by a series of translations, to be described in Section 4 below. Instead of looking for a state monad on $\mathbf{Meas}$, we embed $\mathbf{Meas}$ into the category $\mathbf{Mealy}$ of $\mathbf{Int}$-objects and Mealy machines (Section 5) and use a state monad on this category. This is doable because $\mathbf{Mealy}$ is a compact closed category given by the $\mathbf{Int}$-construction [27]. The use of such compact closed categories (or, more generally, of traced monoidal categories) is the way GoI models capture higher-order functions.

1.2 Outline

The rest of the paper is organised as follows. After giving some necessary measure-theoretic preliminaries in Section 2 below, we introduce in Section 3 the language $\mathbf{PCFSS}$ , together with the two kinds of operational semantics we were referring to above. In Section 4, we introduce our GoI model informally, while in Section 5 a more rigorous treatment of the involved concepts is given, together with the adequacy results. We discuss in Section 10 an alternative way of giving a GoI semantics to $\mathbf{PCFSS}$ based on s-finite kernels, and we conclude in Section 12.

2 Measure-Theoretic Preliminaries

We recall some basic notions in measure theory that will be needed in the following. We also fix some useful notations. For more about measure theory, see standard textbooks such as [31].

A $\sigma$ -algebra on a set $X$ is a family $\Sigma$ consisting of subsets of $X$ such that $\emptyset\in\Sigma$ ; and if $A\in\Sigma$ , then the complement $X\setminus A$ is in $\Sigma$ ; and for any family $\{A_{n}\in\Sigma\}_{n\in\mathbb{N}}$ , the intersection $\bigcap_{n\in\mathbb{N}}A_{n}$ is in $\Sigma$ . A measurable space $X$ is a set $|X|$ equipped with a $\sigma$ -algebra $\Sigma_{X}$ on $|X|$ . We often confuse a measurable space $X$ with its underlying set $|X|$ . For example, we simply write $x\in X$ instead of $x\in|X|$ . For measurable spaces $X$ and $Y$ , we say that a partial function $f\colon X\to Y$ (in this paper, we use $\to$ for both partial functions and total functions) is measurable when for all $A\in\Sigma_{Y}$ , the inverse image

$$\{x\in X : f(x)\text{ is defined and is equal to an element of }A\}$$

is in $\Sigma_{X}$ . A measurable function from $X$ to $Y$ is a totally defined partial measurable function. A (partial) measurable function $f\colon X\to Y$ is invertible when there is a measurable function $g\colon Y\to X$ such that $g\circ f$ and $f\circ g$ are identities. In this case, we say that $f$ is an isomorphism from $X$ to $Y$ and say that $X$ is isomorphic to $Y$ .

We denote a singleton set $\{\ast\}$ by $1$, and we regard the latter as a measurable space by endowing it with the trivial $\sigma$-algebra. We also regard the empty set $\emptyset$ as a measurable space in the obvious way. In this paper, $\mathbb{N}$ denotes the measurable space of all non-negative integers equipped with the $\sigma$-algebra consisting of all subsets of $\mathbb{N}$, and $\mathbb{R}$ denotes the measurable space of all real numbers equipped with the $\sigma$-algebra consisting of Borel sets, that is, the least $\sigma$-algebra that contains all open subsets of $\mathbb{R}$. By the definition of $\Sigma_{\mathbb{R}}$, a function $f\colon\mathbb{R}\to\mathbb{R}$ is measurable whenever $f^{-1}(U)\in\Sigma_{\mathbb{R}}$ for all open subsets $U\subseteq\mathbb{R}$. Therefore, all continuous functions on $\mathbb{R}$ are measurable.

When $Y$ is a subset of the underlying set of a measurable space $X$ , we can equip $Y$ with a $\sigma$ -algebra $\Sigma_{Y}=\{A\cap Y:A\in\Sigma_{X}\}$ . This way, we regard the unit interval and the set of all non-negative real numbers as measurable spaces, and indicate them as follows:

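$$\mathbb{R}_{[0,1]}=\{a\in\mathbb{R} : 0\leq a\leq 1\},\qquad\mathbb{R}_{\geq 0}=\{a\in\mathbb{R} : a\geq 0\}.$$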

For measurable spaces $X$ and $Y$ , we define the product measurable space $X\times Y$ and the coproduct measurable space $X+Y$ by

$$|X\times Y|=|X|\times|Y|,\qquad|X+Y|=|X|\uplus|Y|\ \text{(disjoint union)},$$

where the underlying $\sigma$ -algebras are:

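$$\Sigma_{X\times Y}=\text{the least }\sigma\text{-algebra such that }A\times B\in\Sigma_{X\times Y}\text{ for all }A\in\Sigma_{X}\text{ and }B\in\Sigma_{Y},\qquad\Sigma_{X+Y}=\{A\uplus B : A\in\Sigma_{X}\text{ and }B\in\Sigma_{Y}\}.$$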

We assume that $\times$ has higher precedence than $+$ , i.e., we write $X+Y\times Z$ for $X+(Y\times Z)$ . In this paper, we always regard finite products $\mathbb{R}^{n}$ as the product measurable space on $\mathbb{R}$ . It is well-known that the $\sigma$ -algebra $\Sigma_{\mathbb{R}^{n}}$ is the set of all Borel sets, i.e., $\Sigma_{\mathbb{R}^{n}}$ is the least one that contains all open subsets of $\mathbb{R}^{n}$ . Partial measurable functions are closed under compositions, products and coproducts.

Let $X$ be a measurable space. A measure $\mu$ on $X$ is a function from $\Sigma_{X}$ to $[0,\infty]$, the set of all non-negative real numbers extended with $\infty$, such that

•

$\mu(\emptyset)=0$ ; and

•

for any mutually disjoint family $\{A_{n}\in\Sigma_{X}\}_{n\in\mathbb{N}}$ , we have $\sum_{n\in\mathbb{N}}\mu(A_{n})=\mu\left(\bigcup_{n\in\mathbb{N}}A_{n}\right)$ .

We say that a measure $\mu$ on $X$ is finite when $\mu(X)<\infty$ and that it is $\sigma$ -finite if $X=\bigcup_{n\in\mathbb{N}}X_{n}$ for some family $\{X_{n}\in\Sigma_{X}\}_{n\in\mathbb{N}}$ satisfying $\mu(X_{n})<\infty$ .

For a measurable space $X$ , we write $\varnothing_{X}$ for a measure on $X$ given by $\varnothing_{X}(A)=0$ for all $A\in\Sigma_{X}$ . If $\mu$ is a measure on a measurable space $X$ , then for any non-negative real number $a$ , the function $(a\,\mu)(A)=a(\mu(A))$ is also a measure on $X$ . The Borel measure $\mu_{\mathrm{Borel}}$ on $\mathbb{R}^{n}$ is the unique measure that satisfies

$$\mu_{\mathrm{Borel}}([a_{1},b_{1}]\times\cdots\times[a_{n},b_{n}])=\prod_{1\leq i\leq n}|a_{i}-b_{i}|.$$

We define the Borel measure $\mu_{\mathrm{Borel}}$ on $1$ by $\mu_{\mathrm{Borel}}(1)=1$ . For a measurable function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and a measurable subset $X\subseteq\mathbb{R}^{n}$ , we denote the integral of $f$ with respect to the Borel measure restricted to $X$ by

$$\int_{X}f(u)\,\mathrm{d}u.$$

For a measurable space $X$ and for an element $x\in X$ , a Dirac measure $\delta_{x}$ on $X$ is given by

$$\delta_{x}(A)=[x\in A]=\begin{cases}1,&\text{if }x\in A;\\0,&\text{if }x\notin A.\end{cases}$$

The square bracket notation in the right hand side is called Iverson’s bracket. In general, for a proposition $P$ , we have $[P]=1$ when $P$ is true and $[P]=0$ when $P$ is false.

Proposition 2.1.

For all $\sigma$-finite measures $\mu$ on a measurable space $X$ and $\nu$ on a measurable space $Y$, there is a unique measure $\mu\times\nu$ on $X\times Y$ such that $(\mu\times\nu)(A\times B)=\mu(A)\nu(B)$ for all $A\in\Sigma_{X}$ and $B\in\Sigma_{Y}$.

The measure $\mu\times\nu$ is called the product measure of $\mu$ and $\nu$. For example, the Borel measure on $\mathbb{R}^{2}$ is the product measure of the Borel measure on $\mathbb{R}$ with itself.
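As a quick check that Proposition 2.1 applies to this example (a worked illustration added here, using only the definitions above): the Borel measure on $\mathbb{R}$ is not finite, but it is $\sigma$-finite, as witnessed by the intervals $X_{n}=[-n,n]$:

$$\mathbb{R}=\bigcup_{n\in\mathbb{N}}[-n,n],\qquad\mu_{\mathrm{Borel}}([-n,n])=|(-n)-n|=2n<\infty.$$

The product of the Borel measure on $\mathbb{R}$ with itself therefore exists, and on rectangles it takes the value $|a_{1}-b_{1}|\,|a_{2}-b_{2}|$, which is the defining equation of the Borel measure on $\mathbb{R}^{2}$.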

Finally, let us recall the notion of a kernel, which is a well-known concept in the theory of stochastic processes. For measurable spaces $X$ and $Y$, a kernel from $X$ to $Y$ is a function $k\colon X\times\Sigma_{Y}\to[0,\infty]$ such that for any $x\in X$, the function $k(x,-)$ is a measure on $Y$, and for any $A\in\Sigma_{Y}$, the function $k(-,A)$ is measurable. Notions of finite and $\sigma$-finite kernels can be naturally given, following the eponymous constraints on measures. Those kernels which can be expressed as the sum of countably many finite kernels are said to be s-finite [32]. We use kernels to give semantics for our probabilistic programming language, to be defined in the next section.
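For instance (an illustration that is not taken from the paper; the name $k_{f}$ is introduced only for this example), every total measurable function $f\colon X\to Y$ induces a kernel from $X$ to $Y$ by taking Dirac measures:

$$k_{f}(x,A)=\delta_{f(x)}(A)=[f(x)\in A].$$

For a fixed $x\in X$, the function $k_{f}(x,-)$ is the Dirac measure at $f(x)$. For a fixed $A\in\Sigma_{Y}$, the function $k_{f}(-,A)$ is the indicator function of $f^{-1}(A)$, which is measurable because $f^{-1}(A)\in\Sigma_{X}$. Since $k_{f}(x,Y)=1$ for every $x$, each measure $k_{f}(x,-)$ is finite.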

3 Syntax and Operational Semantics

3.1 Syntax and Type System

Our language $\mathbf{PCFSS}$ for higher order Bayesian programming can be seen as Plotkin’s $\mathbf{PCF}$ endowed with real numbers, measurable functions, sampling from the uniform distribution on $\mathbb{R}_{[0,1]}$ and soft-conditioning. We first define types $\mathtt{A},\mathtt{B},\ldots$ , values $\mathtt{V},\mathtt{W},\ldots$ and terms $\mathtt{M},\mathtt{N},\ldots$ as follows:

[TABLE]

Here, $\mathtt{x}$ varies over a countably infinite set of variable symbols, and $a$ varies over the set $\mathbb{R}$ of all real numbers. Each function identifier $\mathtt{F}$ is associated with a measurable function $\mathrm{fun}_{\mathtt{F}}$ from $\mathbb{R}^{|\mathtt{F}|}$ to $\mathbb{R}$ . For terms $\mathtt{M}$ and $\mathtt{N}$ , we write $\mathtt{M}\{\mathtt{N}/\mathtt{x}\}$ for the capture-avoiding substitution of $\mathtt{x}$ in $\mathtt{M}$ by $\mathtt{N}$ .

Terms in $\mathbf{PCFSS}$ are restricted to be A-normal forms, in order to make some of the arguments on our semantics simpler. This restriction is harmless for the language’s expressive power, thanks to the presence of $\mathtt{let}$ -bindings. For example, term application $\mathtt{M}\,\mathtt{N}$ can be defined to be $\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{M}\;\mathtt{in}\;\mathtt{let}\;\mathtt{y}\;\mathtt{be}\;\mathtt{N}\;\mathtt{in}\;\mathtt{x}\,\mathtt{y}$ .

The term constructor $\mathtt{score}$ and the constant $\mathtt{sample}$ enable probabilistic programming in $\mathbf{PCFSS}$ . Evaluation of $\mathtt{score}(\mathtt{r}_{a})$ has the effect of multiplying the weight of the current probabilistic branch by $|a|$ , this way enabling a form of soft-conditioning. The constant $\mathtt{sample}$ generates a real number randomly drawn from the uniform distribution on $\mathbb{R}_{[0,1]}$ . Only one sampling mechanism is sufficient because we can model sampling from other standard distributions by composing $\mathtt{sample}$ with measurable functions [33].
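The remark about composing $\mathtt{sample}$ with measurable functions can be illustrated by inverse-transform sampling. The sketch below is an assumption-laden illustration rather than anything from the paper: `sample` is a hypothetical Python stand-in for the constant $\mathtt{sample}$ (it draws from $[0,1)$ instead of $\mathbb{R}_{[0,1]}$, which differs only on a set of measure zero), and `exponential_from_uniform` is a hypothetical name for the measurable map $u\mapsto-\ln(1-u)/\lambda$.

```python
import math
import random


def sample():
    """Stand-in for the constant `sample`: a uniform draw from [0, 1)."""
    return random.random()


def exponential_from_uniform(u, rate=1.0):
    """Inverse-CDF map sending a uniform draw u in [0, 1) to an Exp(rate) draw."""
    return -math.log1p(-u) / rate


# Composing the uniform sampler with a measurable function yields draws
# from another distribution, here the exponential distribution with rate 2.
draws = [exponential_from_uniform(sample(), rate=2.0) for _ in range(10_000)]
print(sum(draws) / len(draws))  # empirical mean; the exact mean is 1 / rate
```

In the calculus itself, the same idea would correspond to applying a function identifier $\mathtt{F}$ whose associated measurable function is the inverse CDF to the result of $\mathtt{sample}$.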

Terms can be typed in a natural way. A context $\mathtt{\Delta}$ is a finite sequence consisting of pairs of a variable and a type such that every variable appears in $\mathtt{\Delta}$ at most once. A type judgement is a triple $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ consisting of a context $\mathtt{\Delta}$ , a term $\mathtt{M}$ and a type $\mathtt{A}$ . We say that a type judgement $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ is derivable when we can derive $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ from the typing rules in Figure 1. Here, the type of $\mathtt{sample}$ is $\mathtt{Real}$ , and the type of $\mathtt{score}(\mathtt{V})$ is $\mathtt{Unit}$ because $\mathtt{sample}$ returns a real number, and the purpose of scoring is its side effect.

In the sequel, we only consider derivable type judgements and typable closed terms, that is, closed terms $\mathtt{M}$ such that $\vdash\mathtt{M}:\mathtt{A}$ is derivable for some type $\mathtt{A}$ .

3.2 Distribution-Based Operational Semantics

We define distribution-based operational semantics following [28] where, however, a $\sigma$ -algebra on the set of terms is necessary so as to define evaluation results of terms to be distributions (i.e. measures) over values. In this paper, we only consider evaluation of terms of type $\mathtt{Real}$ and avoid introducing $\sigma$ -algebras on sets of closed terms, thus greatly simplifying the overall development.

Distribution-based operational semantics is a function that sends a closed term $\mathtt{M}:\mathtt{Real}$ to a measure $\mu$ on $\mathbb{R}$. Because of the presence of $\mathtt{score}$, the measure may not be a probability measure, i.e., $\mu(\mathbb{R})$ may be larger than $1$, but the idea of distribution-based operational semantics is precisely that of associating each closed term of type $\mathtt{Real}$ with a measure over $\mathbb{R}$.
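To see how scoring can push the total mass above $1$, consider the closed term $\mathtt{let}\;\mathtt{y}\;\mathtt{be}\;\mathtt{score}(\mathtt{r}_{2})\;\mathtt{in}\;\mathtt{sample}$. This is a hypothetical example, and the computation assumes the informal reading of $\mathtt{score}$ and $\mathtt{sample}$ given in Section 3.1 rather than the exact rules of Figure 2. Scoring multiplies the weight of the single branch by $|2|=2$, and $\mathtt{sample}$ then spreads that weight uniformly over $\mathbb{R}_{[0,1]}$, so the expected measure is

$$\mu(A)=2\,\mu_{\mathrm{Borel}}(A\cap\mathbb{R}_{[0,1]}),\qquad\mu(\mathbb{R})=2>1.$$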

As is common in call-by-value programming languages, evaluation is defined by way of evaluation contexts:

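$$\mathtt{E}[-]::=[-]\;\mid\;\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{E}[-]\;\mathtt{in}\;\mathtt{M}.$$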

The distribution-based operational semantics of $\mathbf{PCFSS}$ is a family of binary relations $\{\Rightarrow_{n}\}_{n\in\mathbb{N}}$ between closed terms of type $\mathtt{Real}$ and measures on $\mathbb{R}$, inductively defined by the evaluation rules in Figure 2, where the evaluation rule for $\mathtt{score}$ is inspired by the one in [32]. The binary relation $\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}$ in the precondition of the third rule in Figure 2 is called deterministic reduction and is defined as follows as a relation on closed terms:

[TABLE]

The last evaluation rule in Figure 2 makes sense because $k$ in the precondition is a kernel from $\mathbb{R}_{[0,1]}$ to $\mathbb{R}$ :

Lemma 3.1.

For any $n\in\mathbb{N}$ and for any term

$$\mathtt{x}_{1}:\mathtt{Real},\ldots,\mathtt{x}_{m}:\mathtt{Real}\vdash\mathtt{M}:\mathtt{Real},$$

there is a finite kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m}$ and for any measure $\mu$ on $\mathbb{R}$ ,

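$$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\Rightarrow_{n}\mu\iff\mu=k(u,-)$$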

where $u=(a_{1},\ldots,a_{m})$ .

Proof.

Let $\mathtt{\Delta}$ be a context of the form $\mathtt{x}_{1}:\mathtt{Real},\ldots,\mathtt{x}_{m}:\mathtt{Real}$. In this proof, for a finite sequence $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$, and for a term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$, we denote

$$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}$$

by $\mathtt{M}\{\mathtt{r}_{u}/\mathtt{\Delta}\}$ . We prove the statement by induction on $n\in\mathbb{N}$ . (Base case) Let $k$ be a kernel from $\mathbb{R}^{m}$ to $\mathbb{R}$ given by

$$k(u,A)=0.$$

Then for any $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$ ,

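$$\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\Rightarrow_{0}\mu\iff\mu=\varnothing_{\mathbb{R}}\iff\mu=k(u,-).$$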

(Induction step) We define a redex $\mathtt{R}$ by

[TABLE]

We note that $\mathtt{V},\mathtt{W}$ in the above BNF can be variables. By induction on the size of type derivation, we can show that every term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ is either a value or of the form $\mathtt{E}[\mathtt{R}]$ for some evaluation context $\mathtt{E}[-]$ and some redex $\mathtt{R}$ . Given a term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ where $\mathtt{\Delta}=\mathtt{x}_{1}:\mathtt{Real},\ldots,\mathtt{x}_{m}:\mathtt{Real}$ , we prove the induction step by case analysis.

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is a value, then $\mathtt{M}$ is either a variable $\mathtt{x}_{i}$ or a constant $\mathtt{r}_{a}$ . When $\mathtt{M}$ is a variable $\mathtt{x}_{i}$ , we have

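$$\mathtt{x}_{i}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\equiv\mathtt{r}_{a_{i}}\Rightarrow_{n+1}\mu\iff\mu=\delta_{a_{i}}.$$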

When $\mathtt{M}$ is a constant $\mathtt{r}_{a}$ , we have

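$$\mathtt{r}_{a}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\equiv\mathtt{r}_{a}\Rightarrow_{n+1}\mu\iff\mu=\delta_{a}.$$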

Both $k,h\colon\mathbb{R}^{m}\times\Sigma_{\mathbb{R}}\to[0,\infty]$ given by

$$k((a_{1},\ldots,a_{m}),A)=\delta_{a_{i}}(A),\qquad h((a_{1},\ldots,a_{m}),A)=\delta_{a}(A)$$

are kernels from $\mathbb{R}^{m}$ to $\mathbb{R}$ .

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{E}[\mathtt{sample}]$, then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m+1}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m+1}$,

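$$\mathtt{E}[\mathtt{y}]\{\mathtt{r}_{u}/(\mathtt{\Delta},\mathtt{y}:\mathtt{Real})\}\Rightarrow_{n}\mu\iff\mu=k(u,-).$$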

We define a kernel $h$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ by

$$h((a_{1},\ldots,a_{m}),A)=\int_{\mathbb{R}_{[0,1]}}k((a_{1},\ldots,a_{m},a),A)\,\mathrm{d}a.$$

This is a kernel because if $f\colon\mathbb{R}\times\cdots\times\mathbb{R}\to\mathbb{R}$ is a non-negative measurable function, then

$$(b,\ldots,c)\mapsto\int_{\mathbb{R}}f(a,b,\ldots,c)\,\mathrm{d}a$$

is measurable. See [31, Theorem 18.3]. Then, for any $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$ ,

$$\mathtt{E}[\mathtt{sample}]\{\mathtt{r}_{u}/\mathtt{\Delta}\}\Rightarrow_{n+1}\mu\iff\mu=h(u,-).$$

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{B}$ is of the form $\mathtt{E}[\mathtt{score}(\mathtt{x}_{i})]$ for some $i\in\{1,2,\ldots,m\}$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m}$ ,

$$\mathtt{E}[\mathtt{skip}]\{\mathtt{r}_{u}/\mathtt{\Delta}\}\Rightarrow_{n}\mu\iff\mu=k(u,-).$$

We define a kernel $h$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ by

[TABLE]

Then, for any $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$ ,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{B}$ is of the form $\mathtt{E}[\mathtt{score}(\mathtt{r}_{a})]$ for some $a\in\mathbb{R}$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m}$ ,

[TABLE]

We define a kernel $h$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ by

[TABLE]

Then, for any $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$ ,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{B}$ is of the form $\mathtt{E}[(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{N})\,\mathtt{V}]$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for all $u\in\mathbb{R}^{m}$ ,

[TABLE]

Hence,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{B}$ is of the form $\mathtt{E}[\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{N})\,\mathtt{V}]$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for all $u\in\mathbb{R}^{m}$ ,

[TABLE]

Hence,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{E}[\mathtt{F}(\mathtt{V}_{1},\ldots,\mathtt{V}_{|\mathtt{F}|})]$, then each $\mathtt{V}_{i}$ is either a variable or a constant $\mathtt{r}_{a}$. For simplicity, we suppose that $|\mathtt{F}|=2$ and $\mathtt{V}_{1}=\mathtt{x}_{i}$ and $\mathtt{V}_{2}=\mathtt{r}_{a}$. By induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m+1}$ to $\mathbb{R}$ such that for all $u\in\mathbb{R}^{m+1}$,

[TABLE]

We define a kernel $h$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ by

[TABLE]

Then, for any $u=(a_{1},\ldots,a_{m})\in\mathbb{R}^{m}$ ,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{let}\;\mathtt{x}\;\mathtt{be}\;\mathtt{V}\;\mathtt{in}\;\mathtt{N}$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for all $u\in\mathbb{R}^{m}$ ,

[TABLE]

Hence,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{E}[\mathtt{ifz}(\mathtt{x}_{i},\mathtt{N},\mathtt{L})]$ for some $i\in\{1,2,\ldots,m\}$ , then by induction hypothesis, there are kernels $k$ and $k^{\prime}$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m}$ ,

[TABLE]

We define a kernel $h$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ by

[TABLE]

Then, for any $u\in\mathbb{R}^{m}$ ,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{E}[\mathtt{ifz}(\mathtt{r}_{0},\mathtt{N},\mathtt{L})]$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that for any $u\in\mathbb{R}^{m}$ ,

[TABLE]

Hence,

[TABLE]

•

If $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{Real}$ is of the form $\mathtt{E}[\mathtt{ifz}(\mathtt{r}_{a},\mathtt{N},\mathtt{L})]$ for some real number $a\neq 0$ , then by induction hypothesis, there is a kernel $k$ from $\mathbb{R}^{m}$ to $\mathbb{R}$ such that

[TABLE]

Hence,

[TABLE]

∎

Lemma 3.1 implies that the relations $\Rightarrow_{n}$ can be seen as functions from the set of closed terms of type $\mathtt{Real}$ to the set of measures on $\mathbb{R}$ .

The step-indexed distribution-based operational semantics approximates the evaluation of closed terms by restricting the number of reduction steps. Thus, the limit of the step-indexed distribution-based operational semantics represents the “true” result of evaluating the underlying term.

Definition 3.1.

For a closed term $\mathtt{M}:\mathtt{Real}$ and a measure $\mu$ on $\mathbb{R}$ , we write $\mathtt{M}\Rightarrow_{\infty}\mu$ when there is a family of measures $\{\mu_{n}\}_{n\in\mathbb{N}}$ on $\mathbb{R}$ such that $\mathtt{M}\Rightarrow_{n}\mu_{n}$ and for all $A\in\Sigma_{\mathbb{R}}$ ,

[TABLE]

The binary relation $\Rightarrow_{\infty}$ is a function from the set of closed terms of type $\mathtt{Real}$ to the set of measures on $\mathbb{R}$. This follows from Lemma 3.1 and the fact that the family of measures $\{\mu_{n}\}_{n\in\mathbb{N}}$ on $\mathbb{R}$ such that $\mathtt{M}\Rightarrow_{n}\mu_{n}$ forms an ascending chain $\mu_{0}\leq\mu_{1}\leq\cdots$ with respect to the pointwise order. Moreover, it can be proved that for any $\mathtt{x}_{1}:\mathtt{Real},\ldots,\mathtt{x}_{m}:\mathtt{Real}\vdash\mathtt{M}:\mathtt{Real}$, the function $k$ given by $\mathtt{M}\{\mathtt{r}_{a_{1}}/\mathtt{x}_{1},\ldots,\mathtt{r}_{a_{m}}/\mathtt{x}_{m}\}\Rightarrow_{\infty}k((a_{1},\ldots,a_{m}),-)$ is an s-finite kernel.

3.3 Sampling-Based Operational Semantics

$\mathbf{PCFSS}$ can be endowed with another form of operational semantics, closer in spirit to inference algorithms, called the sampling-based operational semantics. The way we formulate it is deeply inspired by the one in [28].

The idea behind sampling-based operational semantics is to give the evaluation result of each probabilistic branch somehow independently. We specify each probabilistic branch by two parameters: one is a sequence of random draws, which will be consumed by $\mathtt{sample}$ ; the other is a likelihood measure called weight, which will be modified by $\mathtt{score}$ .

Definition 3.2.

A configuration is a triple $(\mathtt{M},a,u)$ consisting of a closed term $\mathtt{M}:\mathtt{Real}$ , a real number $a\geq 0$ called the configuration’s weight, and a finite sequence $u$ of real numbers in $\mathbb{R}_{[0,1]}$ , called its trace.

Below, we write $\varepsilon$ for the empty sequence. For a real number $a$ and a finite sequence $u$ consisting of real numbers, we write $a\mathbin{::}u$ for the finite sequence obtained by putting $a$ on the head of $u$. In Figure 3, we give the evaluation rules of sampling-based operational semantics, where $\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}$ is the deterministic reduction relation introduced in the previous section. We denote the reflexive transitive closure of $\to$ by $\to^{\ast}$. Intuitively, $(\mathtt{M},1,u)\to^{\ast}(\mathtt{r}_{a},b,\varepsilon)$ means that by evaluating $\mathtt{M}$, we get the real number $a$ with weight $b$, consuming all the random draws in $u$.
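The intuition behind configurations can be sketched with a small evaluator. This is a hedged illustration, not the rules of Figure 3: it is written in big-step style rather than as a small-step relation, it covers only a first-order fragment (constants, variables, $\mathtt{let}$, $\mathtt{sample}$, $\mathtt{score}$ and function identifiers), and the tuple encoding of terms as well as the names `evaluate`, `run`, `TraceExhausted` and `EXAMPLE` are assumptions made for this example. It assumes that $\mathtt{sample}$ consumes the head of the trace and that $\mathtt{score}(\mathtt{r}_{a})$ multiplies the weight by $|a|$, as described in Section 3.1.

```python
class TraceExhausted(Exception):
    """Raised when `sample` is evaluated but no random draw is left."""


def eval_value(value, env):
    """Evaluate a value: a real constant, skip, or a variable."""
    tag = value[0]
    if tag == "real":
        return value[1]
    if tag == "skip":
        return None
    if tag == "var":
        return env[value[1]]
    raise ValueError(f"not a value: {value!r}")


def evaluate(term, env, weight, trace):
    """Evaluate `term`, returning (result, weight, unconsumed trace)."""
    tag = term[0]
    if tag in ("real", "skip", "var"):
        return eval_value(term, env), weight, trace
    if tag == "sample":
        # Consume the first random draw of the trace.
        if not trace:
            raise TraceExhausted
        return trace[0], weight, trace[1:]
    if tag == "score":
        # Multiply the weight by the absolute value of the argument.
        return None, weight * abs(eval_value(term[1], env)), trace
    if tag == "prim":
        # Apply the measurable function attached to a function identifier.
        args = [eval_value(v, env) for v in term[2]]
        return term[1](*args), weight, trace
    if tag == "let":
        _, name, bound, body = term
        result, weight, trace = evaluate(bound, env, weight, trace)
        return evaluate(body, {**env, name: result}, weight, trace)
    raise ValueError(f"unknown term: {term!r}")


def run(term, trace):
    """Return (a, b) if the whole trace is consumed and evaluation ends in
    the real number a with weight b; return None otherwise."""
    try:
        result, weight, rest = evaluate(term, {}, 1.0, tuple(trace))
    except TraceExhausted:
        return None
    return (result, weight) if not rest else None


# let x be sample in let _ be score(x) in F(x), with fun_F(a) = 2 * a
EXAMPLE = ("let", "x", ("sample",),
           ("let", "_", ("score", ("var", "x")),
            ("prim", lambda a: 2 * a, [("var", "x")])))

print(run(EXAMPLE, [0.25]))       # (0.5, 0.25)
print(run(EXAMPLE, []))           # None: the trace is too short
print(run(EXAMPLE, [0.25, 0.5]))  # None: one draw is left unconsumed
```

Requiring the leftover trace to be empty mirrors the final configuration $(\mathtt{r}_{a},b,\varepsilon)$: a trace that is too short or too long does not describe a complete run of the term.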

4 Towards Mealy Machine Semantics

In this section, we give some intuitions about our GoI model, which we also call Mealy machine semantics. Giving Mealy machine semantics for $\mathbf{PCFSS}$ requires translating $\mathbf{PCFSS}$ into the linear $\lambda$ -calculus. This is because GoI is a semantics for linear logic, and is thus tailored for calculi in which terms are treated as resources. Schematically, Mealy machine semantics for $\mathbf{PCFSS}$ translates terms in $\mathbf{PCFSS}$ into Mealy machines in the following way.

[TABLE]

In Section 4.1, we explain the first three steps. The last step deserves to be explained in more detail, which we do in Section 4.2. For the sake of simplicity, we ignore the translation of conditional branching and the fixed point operator.

4.1 From $\mathbf{PCFSS}$ to Proof Structures

4.1.1 Moggi’s Translation

In the first step, we translate $\mathbf{PCFSS}$ into an extension of Moggi’s meta-language by Moggi’s translation [34]. Here, in order to translate scoring and sampling in $\mathbf{PCFSS}$, we equip Moggi’s meta-language with base types $\mathtt{Unit}$ and $\mathtt{Real}$ and the following terms:

[TABLE]

where $\mathtt{T}$ is the monad of Moggi’s meta-language. Any type $\mathtt{A}$ of $\mathbf{PCFSS}$ is translated into the type $\mathtt{A}^{\sharp}$ defined as follows:

[TABLE]

Terms $\mathtt{sample}$ and $\mathtt{score}(-)$ in $\mathbf{PCFSS}$ are translated into $\mathtt{sample}$ and $\mathtt{score}(-)$ in Moggi’s meta-language respectively. See [34] for more detail about Moggi’s translation.

4.1.2 Girard Translation

We next translate the extended Moggi’s meta-language into an extension of the linear $\lambda$ -calculus, by way of the so-called Girard translation [35]. Types are given by

[TABLE]

where $\mathtt{Unit}$ , $\mathtt{Real}$ and $\mathtt{State}$ are base types, and terms are generated by the standard term constructors of the linear $\lambda$ -calculus, plus the following rules:

[TABLE]

(as customary in linear logic, $\mathtt{A}\multimap\mathtt{B}$ is an abbreviation of $\mathtt{A}^{\bot}\mathbin{\wp}\mathtt{B}$ ). These typing rules are derived from the following translation $(-)^{\flat}$ of types of the extended Moggi’s meta-language into types of the extended linear $\lambda$ -calculus:

[TABLE]

The definition of $(\mathtt{T}\,\mathtt{A})^{\flat}$ is motivated by the following categorical observation: let $\mathcal{L}$ be the syntactic category of the extended linear $\lambda$-calculus, which is a symmetric monoidal closed category endowed with a comonad $\oc\colon\mathcal{L}\to\mathcal{L}$ with certain coherence conditions (see e.g. [36]), and let $\mathcal{L}_{\oc}$ be the coKleisli category of the comonad $\oc$. Then, by composing the adjunction between $\mathcal{L}$ and $\mathcal{L}_{\oc}$ with a state monad $\mathsf{State}\multimap\mathsf{State}\otimes(-)$ on $\mathcal{L}$, we obtain a monad on $\mathcal{L}_{\oc}$:

[TABLE]

which sends an object $\mathtt{A}\in\mathcal{L}_{\oc}$ to $\mathtt{State}\multimap\mathtt{State}\otimes\oc\mathtt{A}$ . This use of the state monad is motivated by sampling-based operational semantics: we can regard $\mathbf{PCFSS}$ as a call-by-value $\lambda$ -calculus with global states consisting of pairs of a non-negative real number and a finite sequence of real numbers, and we can regard $\mathtt{score}$ and $\mathtt{sample}$ as effectful operations interacting with those states.
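The reading of $\mathtt{score}$ and $\mathtt{sample}$ as effectful operations on a global state can be illustrated with an ordinary state monad in Python. This is a sketch under assumptions, not the categorical construction itself: the names `unit`, `bind`, `sample` and `score` are hypothetical, a state is a pair of a weight and a trace, and a computation of type $\mathtt{T}\,\mathtt{A}$ is modelled as a function from states to pairs of a state and a value.

```python
from typing import Any, Callable, List, Tuple

State = Tuple[float, List[float]]             # (weight, remaining trace)
Comp = Callable[[State], Tuple[State, Any]]   # State -> State x A


def unit(value: Any) -> Comp:
    """Return a value without touching the state."""
    return lambda state: (state, value)


def bind(comp: Comp, cont: Callable[[Any], Comp]) -> Comp:
    """Run comp, then feed its value and its final state to cont."""
    def run(state: State) -> Tuple[State, Any]:
        state1, value = comp(state)
        return cont(value)(state1)
    return run


def sample(state: State) -> Tuple[State, Any]:
    """Pop the head of the trace (raises IndexError on an empty trace)."""
    weight, trace = state
    return (weight, trace[1:]), trace[0]


def score(value: float) -> Comp:
    """Multiply the weight by |value| and return a unit value (None)."""
    def run(state: State) -> Tuple[State, Any]:
        weight, trace = state
        return (abs(value) * weight, trace), None
    return run


# Call-by-value sequencing: first sample, then score the drawn value.
program = bind(sample, score)
```

Running `program` on the state `(1.0, [0.5, 0.1])` threads the state through both operations and yields the state `(0.5, [0.1])` together with the unit value.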

4.1.3 The Third Step

We translate terms in the extended linear $\lambda$-calculus into (an extension of) proof structures [37], which are graphical presentations of type derivation trees of linear $\lambda$-terms. We can also understand proof structures as string diagrams for compact closed categories [38]. Operators of the pure linear $\lambda$-calculus can be translated as usual [37]. For example, type derivation trees

[TABLE]

are translated into proof structures

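[Figure: two proof structures. The first contains nodes $\mathtt{M}$, $\mathtt{N}$ and $\otimes$ with edges labelled $\mathtt{A}$, $\mathtt{B}$ and $\mathtt{A}\otimes\mathtt{B}$; the second contains a $\wp$-node with edges labelled $\mathtt{A}$, $\mathtt{A}^{\bot}$, $\mathtt{A}\multimap\mathtt{A}$ and $\mathtt{A}$.]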

respectively, where nodes labelled with $\mathtt{M}$ and $\mathtt{N}$ are proof structures associated to type derivations of $\mathtt{M}$ and $\mathtt{N}$. Terms of the form $\mathtt{r}_{a}$, $\mathtt{sample}$ and $\mathtt{score}(\mathtt{M})$ require new kinds of nodes:

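[Figure: three new nodes: an $\mathtt{r}_{a}$-node with an edge labelled $\mathtt{Real}$; an $\mathtt{sa}$-node with edges labelled $\oc\mathtt{Real}$, $\mathtt{State}^{\bot}$ and $\mathtt{State}$; and an $\mathtt{sc}$-node with edges labelled $\mathtt{State}$, $\mathtt{State}^{\bot}$, $\oc\mathtt{Unit}$ and $\oc\mathtt{Real}$.]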

This is not a direct adaptation of typing rules for $\mathtt{score}$ and $\mathtt{sample}$ in the linear $\lambda$ -calculus, but the correspondence can be recovered by way of multiplicatives:

[Figure: a $\otimes$-node and a $\wp$-node combining edges labelled $\mathtt{State}$, $\mathtt{State}^{\bot}$ and $\oc\mathtt{A}$ into $\mathtt{State}\otimes\oc\mathtt{A}$ and $\mathtt{State}\multimap\mathtt{State}\otimes\oc\mathtt{A}$.]

4.2 From Proof Structures to Mealy Machines

The series of translations from $\mathbf{PCFSS}$ to proof structures is agnostic as to the computational meaning of $\mathtt{score}$ and $\mathtt{sample}$ in $\mathbf{PCFSS}$, because the $\mathtt{score}$ and $\mathtt{sample}$ introduced in these translations are just constant symbols. In other words, the translation from $\mathbf{PCFSS}$ to the extended proof structures is not sound with respect to either form of operational semantics for $\mathbf{PCFSS}$. In the last translation step, we assign proof structures a computational meaning, respecting the operational semantics of the underlying $\mathbf{PCFSS}$ term.

We do this by associating proof structures with Mealy machines. A Mealy machine is an input/output-machine whose evolution may depend on its current state. In this paper, for the sake of supporting intuition and of enabling graphical reasoning, we depict a Mealy machine $\mathsf{M}$ as a node with some input/output-ports:

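[Figure: a Mealy machine $\mathsf{M}$ drawn three times as a node with input/output ports: without tokens; with a thick arrow from an input $x$ to an output $y$, labelled $s/t$; and with a thick arrow from an input $z$ to an output $w$, labelled $s^{\prime}/t^{\prime}$.]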

For example, the thick arrow in the middle diagram indicates that if the current state is $s$ and the given input is $x$ , then the Mealy machine outputs $y$ and changes its state to $t$ . In the GoI jargon, data traveling along edges of proof structures are often called tokens.

For the standard proof structures, we can follow [39], where Mealy machines associated with proof structures are built up from the Mealy machines associated to each node. For example, the following nodes

[Figure: a $\otimes$-node with edges labelled $\mathtt{A}$, $\mathtt{B}$ and $\mathtt{A}\otimes\mathtt{B}$, and a $\wp$-node with edges labelled $\mathtt{A}$, $\mathtt{B}$ and $\mathtt{A}\mathbin{\wp}\mathtt{B}$.]

are both associated with a one-state Mealy machine that behaves in the following manner:

[Figure: two diagrams of the machine between the edges $\mathtt{A}$, $\mathtt{B}$ and $\mathtt{A}\mathbin{\otimes}\mathtt{B}$, relating the tokens $a$ and $b$ to the tagged tokens $(\circ,a)$, $(\bullet,a)$, $(\circ,b)$ and $(\bullet,b)$.]

Namely, the Mealy machine forwards each input from the left hand side to the right hand side endowing it with a tag that tells where the token came from. The Mealy machine handles inputs from the right hand side in the reverse way.
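The tagging behaviour just described can be sketched as a stateless function in Python. The port names `"A"`, `"B"`, `"AB"` and the tags `"L"`, `"R"` are hypothetical stand-ins for the edges and for the two tags of the disjoint sum; which tag the original diagram attaches to which premise is an assumption here.

```python
from typing import Any, Optional, Tuple

Token = Tuple[str, Any]  # (port name, payload)


def tensor_node(token: Token) -> Optional[Token]:
    """One-state machine for a binary node: tag on the way in, untag on the way back."""
    port, payload = token
    if port == "A":
        # Input from the first premise: forward it with the tag "L".
        return ("AB", ("L", payload))
    if port == "B":
        # Input from the second premise: forward it with the tag "R".
        return ("AB", ("R", payload))
    if port == "AB":
        # Input from the conclusion: the tag says which premise to use.
        tag, inner = payload
        if tag == "L":
            return ("A", inner)
        if tag == "R":
            return ("B", inner)
    return None  # the transition is undefined on anything else
```

Sending a token through the node and then back again returns it to the port it came from, which is the sense in which the two directions are reverse to each other.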

Soundness of Mealy machine semantics states that if two (pure) linear $\lambda$ -terms are $\beta$ -equivalent, then the behaviours of the Mealy machines associated to these terms are the same. As an example, let us consider a $\beta$ -reduction step $(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{x})\,\mathtt{y}\to\mathtt{y}.$ The proof structure associated to $(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{x})\,\mathtt{y}$ is the graph in the left hand side, and the arrow in the right hand side illustrates a trace of a run of this Mealy machine for an input $a$ from the right edge:

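[Figure: left, the proof structure of $(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{x})\,\mathtt{y}$, consisting of a $\wp$-node and a $\otimes$-node with edges labelled $\mathtt{A}$, $\mathtt{A}^{\bot}$, $\mathtt{A}\multimap\mathtt{A}$, $\mathtt{A}$ and $\mathtt{A}$; right, the same structure with a thick arrow tracing a token $a$ through the $\wp$-node and the $\otimes$-node.]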

This Mealy machine forwards any input from the right hand side to the left hand side as indicated by the thick arrow, and it also forwards any input from the left hand side to the right hand side. Hence, the behaviour of this Mealy machine is equivalent to the behaviour of the following trivial Mealy machine:

[Figure: a single edge labelled $\mathtt{A}$ that forwards a token $a$ unchanged in either direction],

which is the interpretation of $\mathtt{y}:\mathtt{A}\vdash\mathtt{y}:\mathtt{A}$ . This is in fact a symptom of a general phenomenon: Mealy machine semantics for the linear $\lambda$ -calculus captures $\beta$ -reduction $(\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{x})\,\mathtt{y}\to\mathtt{y}$ .

But how can we extend this Mealy machine semantics to $\mathtt{score}$ and $\mathtt{sample}$? Here, we borrow from game semantics [40] the idea of modelling computation in terms of interaction between programs and environments. For scoring and sampling, we can infer how they interact with the environment from sampling-based operational semantics. For scoring, we associate $\mathtt{score}$ with a one-state Mealy machine that has the following transitions:

[Figure: the $\mathtt{sc}$-node with edges labelled $\mathtt{State}$, $\mathtt{State}^{\bot}$, $\oc\mathtt{Unit}$ and $\oc\mathtt{Real}$, and its two transitions: $(a,u)\mapsto(a,u)$ and $(a,b\mathbin{::}u)\mapsto(|b|\,a,u)$.]

where $u$ is a finite sequence of real numbers and $a,b$ are real numbers such that $a\geq 0$. We can read these transitions as follows: for each “configuration” $(-,a,u)$, the Mealy machine sends a query $(a,u)$ to the environment in order to know the value of its argument, and if the environment answers that the value is $b$, i.e., if the Mealy machine receives $(a,b\mathbin{::}u)$, then it outputs $(|b|\,a,u)$, which is the evaluation result of $(\mathtt{score}(\mathtt{r}_{b}),a,u)$.

For sampling, we associate $\mathtt{sample}$ with a Mealy machine that has the following transitions:

[Figure: the $\mathtt{sa}$-node with edges labelled $\oc\mathtt{Real}$, $\mathtt{State}$ and $\mathtt{State}^{\bot}$, and its two transitions: $(a,b\mathbin{::}u)\mapsto(a,u)$ with state change $\ast/b$, and $(a,u)\mapsto(a,b\mathbin{::}u)$ with state change $b/b$.]

where $u$ is a finite sequence of real numbers and $a,b$ are real numbers such that $a\geq 0$. The first transition means that in the initial state $\ast$, given a “configuration” $(-,a,b\mathbin{::}u)$, the Mealy machine pops the first element of $b\mathbin{::}u$ and memorises the value $b$ by changing its state from $\ast$ to $b$. After this transition, for any query $(a,u)$ asking for the result of sampling, it answers with the value memorised in the first transition.

For example, a Mealy machine

[Figure: the $\mathsf{sa}$-node connected to the $\mathsf{sc}$-node, with edges labelled $\oc\mathtt{Real}$, $\mathtt{State}^{\bot}$, $\mathtt{State}$, $\oc\mathtt{Unit}$ and $\mathtt{State}$],

which is a denotation of the term

[TABLE]

and behaves as follows:

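[Figure: a run of the connected $\mathsf{sa}$- and $\mathsf{sc}$-nodes. The tokens shown on the edges are $(a,u)$, $(a,u)$, $(a,b\mathbin{::}u)$, $(a,b\mathbin{::}u)$ and $(|b|\,a,u)$, and the $\mathsf{sa}$-node is labelled with the state change $\ast/b$.]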

Our adequacy theorem says that the evaluation result of a term coincides with the execution result of the associated Mealy machine. In fact, for this case, the outcome $(|b|\,a,u)$ of the above Mealy machine is equal to the evaluation result of $(\mathtt{M},a,b\mathbin{::}u)$, that is, $(\mathtt{M},a,b\mathbin{::}u)\to^{\ast}(\mathtt{skip},|b|\,a,u)$. In this interaction process, the memoisation mechanism of the $\mathsf{sa}$-node is necessary; otherwise the $\mathsf{sa}$-node cannot tell the $\mathsf{sc}$-node that the result of sampling is $b$.
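The interaction between the two nodes can be replayed in a small Python simulation. It is a sketch under assumptions: the class and method names are hypothetical, each method stands for the transition triggered by a token arriving on one port, and the wiring order (state through $\mathsf{sa}$ first, then $\mathsf{sc}$ querying $\mathsf{sa}$) is our reading of the transitions described above.

```python
from typing import List, Optional, Tuple

Token = Tuple[float, List[float]]  # (weight, trace)


class SampleNode:
    """sa-node: state is None (the initial state) or a memorised draw b."""

    def __init__(self) -> None:
        self.memo: Optional[float] = None

    def on_state(self, weight: float, trace: List[float]) -> Optional[Token]:
        # First transition: (a, b::u) |-> (a, u), memorising b.
        if self.memo is not None or not trace:
            return None  # assumption: undefined in every other case
        self.memo = trace[0]
        return (weight, trace[1:])

    def on_query(self, weight: float, trace: List[float]) -> Optional[Token]:
        # Second transition: a query (a, u) is answered with (a, b::u).
        if self.memo is None:
            return None
        return (weight, [self.memo] + trace)


class ScoreNode:
    """sc-node: a one-state machine."""

    def on_state(self, weight: float, trace: List[float]) -> Token:
        # Forward the configuration as a query for the argument.
        return (weight, trace)

    def on_answer(self, weight: float, trace: List[float]) -> Optional[Token]:
        # (a, b::u) |-> (|b| a, u)
        if not trace:
            return None
        return (abs(trace[0]) * weight, trace[1:])


def run(weight: float, trace: List[float]) -> Optional[Token]:
    sa, sc = SampleNode(), ScoreNode()
    popped = sa.on_state(weight, trace)      # (a, b::u) enters sa
    if popped is None:
        return None
    query = sc.on_state(*popped)             # sc asks for its argument
    answer = sa.on_query(*query)             # sa answers from its memory
    return None if answer is None else sc.on_answer(*answer)
```

Without the `memo` field the second call on the $\mathsf{sa}$-node would have no way to recover $b$, which mirrors the remark on memoisation above.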

Remark 4.1.

*Two notions of state (the one coming from the state monad and the one of the Mealy machine itself) are used for different purposes here: the first notion is needed to model the call-by-value evaluation strategy, where we need to store intermediate effects that are invoked during the evaluation. The second notion of state is needed to model sampling. More concretely, each Mealy machine for sampling needs to remember the already sampled values in the current probabilistic branch.*

5 Mealy Machines and their Compositions

After having described Mealy machine semantics briefly and informally, it is now time to get more formal. In this section, we introduce the notion of a Mealy machine and some constructions on Mealy machines. We also introduce a way of diagrammatically presenting Mealy machines which is behaviourally sound.

5.1 Mealy Machines, Formally

In this paper, we call a pair of measurable spaces an $\mathbf{Int}$ -object. We use sans-serif capital letters $\mathsf{X},\mathsf{Y},\mathsf{Z},\ldots$ to denote $\mathbf{Int}$ -objects, and we denote the positive/negative part of an $\mathbf{Int}$ -object by the same italic letter superscripted by $+/-$ . For example, $\mathsf{X}$ denotes an $\mathbf{Int}$ -object $(X^{+},X^{-})$ consisting of two measurable spaces $X^{+}$ and $X^{-}$ . The name “ $\mathbf{Int}$ -object” comes from the so-called $\mathbf{Int}$ -construction [26]. Definition 5.1 and the definition of monoidal products in Section 5.4 are also motivated by $\mathbf{Int}$ -construction.

Definition 5.1.

For $\mathbf{Int}$ -objects $\mathsf{X}$ and $\mathsf{Y}$ , a Mealy machine $\mathsf{M}$ from $\mathsf{X}$ to $\mathsf{Y}$ consists of

•

a measurable space $S_{\mathsf{M}}$ called the state space of $\mathsf{M}$ ;

•

an element $s_{\mathsf{M}}\in S_{\mathsf{M}}$ called the initial state of $\mathsf{M}$ ;

•

a partial measurable function

$\tau_{\mathsf{M}}\colon(X^{+}+Y^{-})\times S_{\mathsf{M}}\to(Y^{+}+X^{-})\times S_{\mathsf{M}}$

called the transition function.

If $\mathsf{M}$ is a Mealy machine from $\mathsf{X}$ to $\mathsf{Y}$ , we write $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ .

The transition function $\tau_{\mathsf{M}}$ of a Mealy machine $\mathsf{M}$ describes a mapping between inputs and outputs which can also alter the underlying state. For $x\in X^{+}+Y^{-}$ and $s\in S_{\mathsf{M}}$ , $\tau_{\mathsf{M}}(x,s)=(y,t)$ means that when the current state of $\mathsf{M}$ is $s$ , given an input $x$ , there is an output $y$ and the next state is $t$ .
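As an informal illustration of this reading, a Mealy machine can be represented in Python by an initial state and a partial transition function. The sketch ignores measurability, models undefinedness by `None`, and tags each token with a hypothetical port name (`"X+"`, `"Y-"`, `"Y+"`, `"X-"`) to stand for the summand it belongs to; the `counter` machine is an invented example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Step = Optional[Tuple[Any, Any]]  # (output, next state), or None if undefined


@dataclass
class Mealy:
    init: Any                           # initial state s_M
    tau: Callable[[Any, Any], Step]     # partial transition function


def run(machine: Mealy, inputs: List[Any]) -> Optional[Tuple[List[Any], Any]]:
    """Feed the inputs one by one; the run is undefined if any step is."""
    state = machine.init
    outputs = []
    for x in inputs:
        result = machine.tau(x, state)
        if result is None:
            return None
        y, state = result
        outputs.append(y)
    return outputs, state


def counter_tau(x: Any, s: int) -> Step:
    port, value = x
    if port == "X+":
        # Forward direction: emit the value with the current count.
        return (("Y+", (value, s)), s + 1)
    if port == "Y-":
        # Backward direction: pass the value through, state unchanged.
        return (("X-", value), s)
    return None


counter = Mealy(init=0, tau=counter_tau)
```

Here an input on `"X+"` produces an output on `"Y+"` and advances the state, while an input on `"Y-"` produces an output on `"X-"`, so tokens travel in both directions through the same machine.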

Readers may wonder why $X^{-}$ appears in the target and $Y^{-}$ appears in the source of the transition function of a Mealy machine from $\mathsf{X}$ to $\mathsf{Y}$ . In short, this is because we are interested in Mealy machines that handle bidirectional computation. The diagrammatic presentation of Mealy machines clarifies the meaning of “bidirectional.” Let $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ be a Mealy machine. In this paper, we depict $\mathsf{M}$ as follows:

[Figure: a node $\mathsf{M}$ between an edge labelled $\mathsf{X}$ and an edge labelled $\mathsf{Y}$.]

Intuitively, each label on an edge indicates the type of data traveling along the edge. Namely, on the $\mathsf{X}$ -edge (on the $\mathsf{Y}$ -edge), elements in $X^{+}$ (in $Y^{+}$ ) go from left to right, and elements in $X^{-}$ (in $Y^{-}$ ) go from right to left. For example, we depict the following transitions

[TABLE]

for some $y,y^{\prime}\in Y^{-}$ , $x\in X^{-}$ , $y^{\prime\prime}\in Y^{+}$ and $s_{0},s_{1},s_{2}\in S_{\mathsf{M}}$ as the following thick arrows

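[Figure: two copies of the node $\mathsf{M}$ between the edges $\mathsf{X}$ and $\mathsf{Y}$. The first, labelled $s_{0}/s_{1}$, has a thick arrow relating $y$ and $x$; the second, labelled $s_{0}/s_{2}$, has a thick arrow relating $y^{\prime}$ and $y^{\prime\prime}$.]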

(Recall that the white/black bullet indicates the left/right part of the disjoint sum.) The expressions $s_{0}/s_{1}$ and $s_{0}/s_{2}$ on the Mealy machine $\mathsf{M}$ stand for transitions of states. We omit state transitions when we can infer them.

We will give some Mealy machines whose state spaces are trivial, namely $1$. We call such a Mealy machine a token machine. Our usage of the term token machine is along the lines of that in other papers on GoI such as [41, 39]. Since we can identify the transition function of a token machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ with the following partial measurable function

$X^{+}+Y^{-}\to Y^{+}+X^{-},$

giving a partial measurable function of this type is enough to specify a token machine.

Convention 5.1.

We define a token machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ by giving a partial measurable function from $X^{+}+Y^{-}$ to $Y^{+}+X^{-}$, and we also call this partial measurable function the transition function of $\mathsf{M}$. Abusing notation, we write $\tau_{\mathsf{M}}$ for this transition function.

5.2 Behavioural Equivalence

We are now ready to give an equivalence relation between Mealy machines which identifies machines which behave the same way. Identifying Mealy machines in terms of their behaviour is important to reason about compositions of Mealy machines in the following part of this paper. Here, we are inspired by behavioural equivalence from coalgebraic theory of modelling transition systems [42].

Let $\mathsf{M}$ and $\mathsf{N}$ be Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ . We write $\mathsf{M}\preceq_{\mathsf{X},\mathsf{Y}}\mathsf{N}$ when there is a measurable function $f\colon S_{\mathsf{M}}\to S_{\mathsf{N}}$ satisfying $f(s_{\mathsf{M}})=s_{\mathsf{N}}$ and

[TABLE]

The definition means that if we have $\mathsf{M}\preceq_{\mathsf{X},\mathsf{Y}}\mathsf{N}$, then no observer can distinguish between $\mathsf{M}$ and $\mathsf{N}$ from their input/output behaviour, although their internal structure can be quite different. We define an equivalence relation $\simeq_{\mathsf{X},\mathsf{Y}}$ to be the reflexive symmetric transitive closure of $\preceq_{\mathsf{X},\mathsf{Y}}$. Below, if we can infer the subscript $\mathsf{X},\mathsf{Y}$ from the context, we write $\simeq$ instead of $\simeq_{\mathsf{X},\mathsf{Y}}$.

Definition 5.2.

For Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{X}\multimap\mathsf{Y}$ , we say that $\mathsf{M}$ is behaviourally equivalent to $\mathsf{N}$ when $\mathsf{M}\simeq\mathsf{N}$ .

For a Mealy machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ , we write $[\mathsf{M}]$ for its equivalence class with respect to behavioural equivalence. We define a binary relation $\leq$ between equivalence classes of Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ by $[\mathsf{M}]\leq[\mathsf{N}]$ if and only if there are $\mathsf{M}^{\prime}\simeq\mathsf{M}$ and $\mathsf{N}^{\prime}\simeq\mathsf{N}$ such that $S_{\mathsf{M}^{\prime}}=S_{\mathsf{N}^{\prime}}$ and $s_{\mathsf{M}^{\prime}}=s_{\mathsf{N}^{\prime}}$ , and the graph relation of $\tau_{\mathsf{M}^{\prime}}$ is a subset of the graph relation of $\tau_{\mathsf{N}^{\prime}}$ .

Proposition 5.1.

The set of equivalence classes for $\simeq_{\mathsf{X},\mathsf{Y}}$ with $\leq$ is a pointed $\omega$-cpo.

We can characterize the interpretation of the fixed point operator in $\mathbf{PCFSS}$ in terms of least fixed points, see [43]. We give a proof of Proposition 5.1 in Section 5.3.

5.3 Proof of Proposition 5.1

For partially defined expressions $E$ and $E^{\prime}$, we write $E\approx E^{\prime}$ when $E$ is defined if and only if $E^{\prime}$ is defined, and if both expressions are defined, then they are the same. For example, we have $(1-x)^{-1}\approx\sum_{n=0}^{\infty}x^{n}$ for all $x\in\mathbb{R}_{[0,1]}$. For a measurable space $X$, we write $LX$ for the measurable space of all finite sequences over $X$ endowed with the following $\sigma$-algebra:

[TABLE]

We write $\varepsilon$ for the empty sequence. For $a\in X$ and $u\in LX$, we denote the list obtained by putting $a$ at the head of $u$ by $a\mathbin{::}u$.

Let $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ be a Mealy machine. We write $Z$ for $X^{+}+Y^{-}$ and $W$ for $Y^{+}+X^{-}$ . Then, the transition function of $\mathsf{M}$ is of the form

$\tau_{\mathsf{M}}\colon Z\times S_{\mathsf{M}}\to W\times S_{\mathsf{M}}.$

We define partial measurable functions $\alpha_{\mathsf{M}}\colon LZ\to S_{\mathsf{M}}$ and $\beta_{\mathsf{M}}\colon Z\times LZ\to W$ by

[TABLE]

Below, for $x\in W\times S_{\mathsf{M}}$ , we write $\mathrm{fst}(x)$ for the first entry of $x$ , and we write $\mathrm{snd}(x)$ for the second entry of $x$ . By the definition of $\alpha_{\mathsf{M}}$ and $\beta_{\mathsf{M}}$ , we have

[TABLE]

Lemma 5.1.

If $\mathsf{M}\preceq\mathsf{N}$ , then $\beta_{\mathsf{M}}=\beta_{\mathsf{N}}$ .

Proof.

Let $h\colon S_{\mathsf{M}}\to S_{\mathsf{N}}$ be a measurable function that realizes $\mathsf{M}\preceq\mathsf{N}$ . We show $h(\alpha_{\mathsf{M}}(u))\approx\alpha_{\mathsf{N}}(u)$ and $\beta_{\mathsf{M}}(z,u)\approx\beta_{\mathsf{N}}(z,u)$ by induction on the size of $u$ . (Base case)

[TABLE]

(Induction step)

[TABLE]

∎

For a Mealy machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ , we define Mealy machines $\mathsf{M}^{\#},\mathsf{M}^{@}\colon\mathsf{X}\multimap\mathsf{Y}$ by

•

$S_{\mathsf{M}^{\#}}=LZ$ ,

•

$s_{\mathsf{M}^{\#}}=\varepsilon$ ,

•

$\tau_{\mathsf{M}^{\#}}(z,u)=\begin{cases}(\beta_{\mathsf{M}}(z,u),z\mathbin{::}u),&\textnormal{if }\beta_{\mathsf{M}}(z,u)\textnormal{ is defined,}\\ \textnormal{undefined},&\textnormal{otherwise},\\ \end{cases}$

and

•

$S_{\mathsf{M}^{@}}=\{\diamond\}\cup S_{\mathsf{M}}$ ,

•

$s_{\mathsf{M}^{@}}=s_{\mathsf{M}}$ ,

•

$\tau_{\mathsf{M}^{@}}(z,s)=\begin{cases}\tau_{\mathsf{M}}(z,s),&\textnormal{if }s\in S_{\mathsf{M}}\textnormal{ and }\tau_{\mathsf{M}}(z,s)\textnormal{ is defined,}\\ \textnormal{undefined},&\textnormal{if }s=\diamond,\\ \textnormal{undefined},&\textnormal{otherwise}.\end{cases}$

Here, the $\sigma$ -algebra of $S_{\mathsf{M}^{@}}$ is the one induced by $\Sigma_{1+S_{\mathsf{M}}}$ via the obvious bijection between $1+S_{\mathsf{M}}$ and $\{\diamond\}\cup S_{\mathsf{M}}$ .
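The construction of $\mathsf{M}^{\#}$ can be made concrete with a small Python sketch. It rests on an assumption about the elided definitions: we take $\alpha_{\mathsf{M}}(u)$ to be the state reached from $s_{\mathsf{M}}$ after replaying the input history $u$, and $\beta_{\mathsf{M}}(z,u)$ to be the output of $\tau_{\mathsf{M}}$ on $z$ in that state, which is consistent with the definition of $\tau_{\mathsf{M}^{\#}}$ above. Undefinedness is modelled by `None`, so states and outputs are assumed never to be `None`; the `summing` machine is an invented example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Mealy:
    init: Any
    tau: Callable[[Any, Any], Optional[Tuple[Any, Any]]]


def outputs(machine: Mealy, inputs: List[Any]) -> Optional[List[Any]]:
    """Output sequence of a run, or None if some step is undefined."""
    state, outs = machine.init, []
    for z in inputs:
        result = machine.tau(z, state)
        if result is None:
            return None
        y, state = result
        outs.append(y)
    return outs


def alpha(machine: Mealy, history: Tuple[Any, ...]) -> Optional[Any]:
    """State after replaying history; the most recent input is at the head."""
    state = machine.init
    for z in reversed(history):  # the oldest input sits at the end
        result = machine.tau(z, state)
        if result is None:
            return None
        state = result[1]
    return state


def beta(machine: Mealy, z: Any, history: Tuple[Any, ...]) -> Optional[Any]:
    """Output produced on input z after the given history."""
    state = alpha(machine, history)
    if state is None:
        return None
    result = machine.tau(z, state)
    return None if result is None else result[0]


def sharp(machine: Mealy) -> Mealy:
    """M#: the state is the input history, the output is given by beta."""
    def tau(z: Any, history: Tuple[Any, ...]):
        out = beta(machine, z, history)
        return None if out is None else (out, (z,) + history)
    return Mealy(init=(), tau=tau)


# Invented example: running sum, undefined on negative inputs.
summing = Mealy(init=0, tau=lambda z, s: None if z < 0 else (s + z, s + z))
```

On every input sequence, `summing` and `sharp(summing)` produce the same outputs and are undefined together, although their state spaces are entirely different.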

Lemma 5.2.

$\mathsf{M}\preceq\mathsf{M}^{@}\succeq\mathsf{M}^{\#}$ .

Proof.

It is straightforward to check that the embedding $e\colon S_{\mathsf{M}}\to\{\diamond\}\cup S_{\mathsf{M}}$ is a measurable function that realizes $\mathsf{M}\preceq\mathsf{M}^{@}$ . It remains to show $\mathsf{M}^{\#}\preceq\mathsf{M}^{@}$ . We define a measurable function $h\colon LZ\to\{\diamond\}\cup S_{\mathsf{M}}$ by

[TABLE]

We show that for any $(z,u)\in Z\times LZ$ ,

[TABLE]

by induction on $u\in LZ$ . (Base case)

[TABLE]

(Induction step)

[TABLE]

∎

Proposition 5.2.

For all Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{X}\multimap\mathsf{Y}$ , we have $\mathsf{M}\simeq\mathsf{N}$ if and only if $\beta_{\mathsf{M}}=\beta_{\mathsf{N}}$ .

Proof.

If $\mathsf{M}\simeq\mathsf{N}$ , then we can show that $\beta_{\mathsf{M}}=\beta_{\mathsf{N}}$ by using Lemma 5.1. If $\beta_{\mathsf{M}}=\beta_{\mathsf{N}}$ , then we have $\mathsf{M}^{\#}=\mathsf{N}^{\#}$ by the definition of $(-)^{\#}$ . Because we have $\mathsf{M}\simeq\mathsf{M}^{\#}$ and $\mathsf{N}\simeq\mathsf{N}^{\#}$ (Lemma 5.2), we see that $\mathsf{M}$ is behaviourally equivalent to $\mathsf{N}$ . ∎

Hence, each equivalence class $[\mathsf{M}]$ of behavioural equivalence is represented by $\mathsf{M}^{\#}$, and $\mathsf{M}^{\#}$ is independent of the choice of the representative $\mathsf{M}$. We extend this correspondence to the order-theoretic structure of Mealy machines.

Lemma 5.3.

Let $\mathsf{M},\mathsf{N}$ be Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ such that $S_{\mathsf{M}}=S_{\mathsf{N}}$. If $\tau_{\mathsf{M}}\leq\tau_{\mathsf{N}}$ and $s_{\mathsf{M}}=s_{\mathsf{N}}$, then $\tau_{\mathsf{M}^{\#}}\leq\tau_{\mathsf{N}^{\#}}$.

Proof.

By induction on the size of $u\in LZ$ , we can show that if $\alpha_{\mathsf{M}}(u)$ is defined, then $\alpha_{\mathsf{N}}(u)$ is defined and they are the same. Then $\tau_{\mathsf{M}^{\#}}\leq\tau_{\mathsf{N}^{\#}}$ follows from the definition of $(-)^{\#}$ . ∎

Theorem 5.1.

For Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{X}\multimap\mathsf{Y}$ ,

$[\mathsf{M}]\leq[\mathsf{N}]\iff\tau_{\mathsf{M}^{\#}}\leq\tau_{\mathsf{N}^{\#}}.$

Proof.

If $\tau_{\mathsf{M}^{\#}}\leq\tau_{\mathsf{N}^{\#}}$ , then because $\mathsf{M}^{\#}$ and $\mathsf{N}^{\#}$ are representatives of $[\mathsf{M}]$ and $[\mathsf{N}]$ respectively, we have $[\mathsf{M}]\leq[\mathsf{N}]$ . If $[\mathsf{M}]\leq[\mathsf{N}]$ , then there are $\mathsf{M}^{\prime}\simeq\mathsf{M}$ and $\mathsf{N}^{\prime}\simeq\mathsf{N}$ such that

•

$S_{\mathsf{M}^{\prime}}=S_{\mathsf{N}^{\prime}}$ and $s_{\mathsf{M}^{\prime}}=s_{\mathsf{N}^{\prime}}$ ,

•

the graph relation of $\tau_{\mathsf{M}^{\prime}}$ is a subset of the graph relation of $\tau_{\mathsf{N}^{\prime}}$ .

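By Lemma 5.3, we see that $\tau_{{\mathsf{M}^{\prime}}^{\#}}\leq\tau_{{\mathsf{N}^{\prime}}^{\#}}$. Since $\mathsf{M}^{\prime}\simeq\mathsf{M}$ and $\mathsf{N}^{\prime}\simeq\mathsf{N}$, Proposition 5.2 and the definition of $(-)^{\#}$ give ${\mathsf{M}^{\prime}}^{\#}=\mathsf{M}^{\#}$ and ${\mathsf{N}^{\prime}}^{\#}=\mathsf{N}^{\#}$, and hence $\tau_{\mathsf{M}^{\#}}\leq\tau_{\mathsf{N}^{\#}}$. ∎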

Theorem 5.2.

The set of equivalence classes of Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ with the partial order $\leq$ is an $\omega$ -cpo.

Proof.

Let $[\mathsf{N}]$ be an upper bound of an $\omega$ -chain

$[\mathsf{M}_{1}]\leq[\mathsf{M}_{2}]\leq\cdots.$

By Theorem 5.1, we have

$\tau_{\mathsf{M}_{1}^{\#}}\leq\tau_{\mathsf{M}_{2}^{\#}}\leq\cdots\leq\tau_{\mathsf{N}^{\#}}.$

We define a Mealy machine $\mathsf{L}\colon\mathsf{X}\multimap\mathsf{Y}$ by

•

$S_{\mathsf{L}}=S_{\mathsf{M}_{1}^{\#}}$ ,

•

$s_{\mathsf{L}}=s_{\mathsf{M}_{1}^{\#}}$ ,

•

$\tau_{\mathsf{L}}=\bigvee_{n\in\mathbb{N}}\tau_{\mathsf{M}_{n}^{\#}}$ .

Because $\mathsf{M}_{n}\simeq\mathsf{M}_{n}^{\#}$, the equivalence class $[\mathsf{L}]$ is an upper bound of the $\omega$-chain $[\mathsf{M}_{1}]\leq[\mathsf{M}_{2}]\leq\cdots$. We also have $[\mathsf{L}]\leq[\mathsf{N}]$ because $\tau_{\mathsf{M}_{n}^{\#}}\leq\tau_{\mathsf{N}^{\#}}$ for all $n$, and hence $\tau_{\mathsf{L}}\leq\tau_{\mathsf{N}^{\#}}$. ∎
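The join used in this proof is a union of graphs of partial functions, which can be illustrated on finite data in Python. The sketch is only an analogy under assumptions: partial functions are modelled as dictionaries, the order $\leq$ as inclusion of graphs, and a finite list stands in for an $\omega$-chain; the helper names `leq` and `join` are hypothetical.

```python
from typing import Any, Dict, List

Partial = Dict[Any, Any]  # graph of a partial function on a finite domain


def leq(f: Partial, g: Partial) -> bool:
    """Graph inclusion: g is defined wherever f is, with the same value."""
    return all(key in g and g[key] == value for key, value in f.items())


def join(chain: List[Partial]) -> Partial:
    """Union of the graphs of an increasing chain of partial functions."""
    result: Partial = {}
    for tau in chain:
        for key, value in tau.items():
            if key in result and result[key] != value:
                raise ValueError("not a chain: conflicting values")
            result[key] = value
    return result


chain = [{}, {0: "a"}, {0: "a", 1: "b"}]
lub = join(chain)
upper = {0: "a", 1: "b", 2: "c"}
```

The union `lub` lies above every element of `chain` and below any other upper bound such as `upper`, which is the least-upper-bound property established in the proof.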

5.4 Constructions on Mealy Machines

It is now time to give some constructions which are the basic building blocks of our Mealy machine semantics. This section consists of three parts. The first part (from Section 5.4.2 to Section 5.4.5) is related to the linear $\lambda$-calculus and serves to model the purely functional features of $\mathbf{PCFSS}$, such as $\lambda$-abstraction and function application. In the second part (Section 5.4.6 and Section 5.4.7), we give Mealy machines modelling real numbers and measurable functions. In the last part (from Section 5.4.9 to Section 5.4.11), we introduce a state monad and associate the monad with Mealy machines modelling $\mathtt{score}$ and $\mathtt{sample}$.

5.4.1 Composition

Let $\mathsf{X}$ , $\mathsf{Y}$ and $\mathsf{Z}$ be $\mathbf{Int}$ -objects, and let $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ , $\mathsf{N}\colon\mathsf{Y}\multimap\mathsf{Z}$ be Mealy machines. We can now define their composition $\mathsf{N}\circ\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Z}$ . Before giving a precise definition, some intuitive explanation about $\mathsf{N}\circ\mathsf{M}$ is in order. The main idea is to define $\mathsf{N}\circ\mathsf{M}$ as a Mealy machine obtained by connecting $\mathsf{N}$ and $\mathsf{M}$ in the following manner:

[Figure: the nodes $\mathsf{M}$ and $\mathsf{N}$ connected through the edge $\mathsf{Y}$, with outer edges labelled $\mathsf{X}$ and $\mathsf{Z}$.]

The following series of thick arrows

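[Figure: the connected nodes $\mathsf{M}$ and $\mathsf{N}$ with a series of thick arrows carrying the tokens $y_{0}$, $y_{1}$, $y_{2}$, $y_{3}$, $z$ and $z^{\prime}$.]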

illustrates an execution of the obtained Mealy machine. Given an input from an edge, $\mathsf{M}$ and $\mathsf{N}$ engage in some interactive communication, and at some point, some output is produced. Because $\mathsf{N}\circ\mathsf{M}$ performs “parallel composition plus connecting,” the state space of $\mathsf{N}\circ\mathsf{M}$ should be $S_{\mathsf{M}}\times S_{\mathsf{N}}$ , and the initial state should be $(s_{\mathsf{M}},s_{\mathsf{N}})$ . The transition function of $\mathsf{N}\circ\mathsf{M}$ should be given by the collection of all possible interaction paths between $\mathsf{M}$ and $\mathsf{N}$ .

Let us give a precise definition. For Mealy machines $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ and $\mathsf{N}\colon\mathsf{Y}\multimap\mathsf{Z}$, we define the state space and the initial state of $\mathsf{N}\circ\mathsf{M}$ by $S_{\mathsf{N}\circ\mathsf{M}}=S_{\mathsf{M}}\times S_{\mathsf{N}}$, $s_{\mathsf{N}\circ\mathsf{M}}=(s_{\mathsf{M}},s_{\mathsf{N}})$ and we define the transition function $\tau_{\mathsf{N}\circ\mathsf{M}}$ by

[TABLE]

where the $f_{A,B,C,D}\colon(A+B)\times S_{\mathsf{N}\circ\mathsf{M}}\to(C+D)\times S_{\mathsf{N}\circ\mathsf{M}}$ are restrictions of the following partial measurable function

[TABLE]

and the above join is with respect to the inclusion order between graph relations. The above join is measurable because measurable sets are closed under countable unions. It is tedious but doable to check that the above join always exists and that the composition is compatible with behavioural equivalence and satisfies associativity modulo behavioural equivalence. We define a Mealy machine $\mathsf{id}_{\mathsf{X}}\colon\mathsf{X}\multimap\mathsf{X}$ by $\tau_{\mathsf{id}_{\mathsf{X}}}=\mathrm{id}_{X^{+}+X^{-}}$. This is the unit of the composition modulo behavioural equivalence.
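Operationally, the composite lets a token bounce between the two machines along the shared $\mathsf{Y}$ edge until it leaves through $\mathsf{X}$ or $\mathsf{Z}$. The Python sketch below illustrates this reading under assumptions: tokens carry hypothetical port names, measurability is ignored, and an explicit `fuel` bound replaces the join over all interaction paths, so an interaction that does not terminate within the bound is reported as undefined. The machines `server` and `doubler` are invented examples.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Token = Tuple[str, Any]  # (port, payload); ports: X+, X-, Y+, Y-, Z+, Z-


@dataclass
class Mealy:
    init: Any
    tau: Callable[[Token, Any], Optional[Tuple[Token, Any]]]


def compose(m: Mealy, n: Mealy, fuel: int = 1000) -> Mealy:
    """Sketch of N o M for M: X -o Y and N: Y -o Z."""
    def tau(token: Token, state: Tuple[Any, Any]):
        s_m, s_n = state
        if token[0] not in ("X+", "Z-"):
            return None  # only outer ports accept inputs
        for _ in range(fuel):
            port = token[0]
            if port in ("X+", "Y-"):      # inputs of M
                result = m.tau(token, s_m)
                if result is None:
                    return None
                token, s_m = result
            elif port in ("Y+", "Z-"):    # inputs of N
                result = n.tau(token, s_n)
                if result is None:
                    return None
                token, s_n = result
            else:
                return None
            if token[0] in ("X-", "Z+"):  # the token leaves the composite
                return token, (s_m, s_n)
        return None  # fuel exhausted: treated as undefined
    return Mealy(init=(m.init, n.init), tau=tau)


# M answers every query arriving on Y- with the value 21 on Y+.
server = Mealy(init=0, tau=lambda t, s: (("Y+", 21), s + 1) if t[0] == "Y-" else None)


def doubler_tau(t: Token, s: Any):
    if t[0] == "Z-":
        return (("Y-", t[1]), s)      # forward the query to M
    if t[0] == "Y+":
        return (("Z+", 2 * t[1]), s)  # double M's answer
    return None


doubler = Mealy(init=0, tau=doubler_tau)
```

A query entering `compose(server, doubler)` on `"Z-"` is forwarded to `server`, answered with 21, and leaves on `"Z+"` as 42, with the state of `server` advanced by one step.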

5.4.2 Monoidal Products

Monoidal Products of Int-objects

We introduce monoidal products of $\mathbf{Int}$-objects and their diagrammatic presentation. For $\mathbf{Int}$-objects $\mathsf{X}$ and $\mathsf{Y}$, we define an $\mathbf{Int}$-object $\mathsf{X}\otimes\mathsf{Y}$ by

[TABLE]

We define an $\mathbf{Int}$ -object $\mathsf{I}$ to be $(\emptyset,\emptyset)$ . We write $\mathsf{X}\otimes\mathsf{Y}\otimes\cdots$ for $\mathsf{X}\otimes(\mathsf{Y}\otimes\cdots)$ .

Let $\mathsf{X}_{1},\ldots,\mathsf{X}_{n},\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ be $\mathbf{Int}$-objects. We depict a Mealy machine $\mathsf{M}$ from $\mathsf{X}_{1}\otimes\cdots\otimes\mathsf{X}_{n}$ to $\mathsf{Y}_{1}\otimes\cdots\otimes\mathsf{Y}_{m}$ as a node with edges labeled by $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ on the left hand side and edges labeled by $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ on the right hand side:

[Figure: a node $\mathsf{M}$ with edges $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ on the left hand side and edges $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ on the right hand side.]

We do not draw any edges on the left/right hand side when the domain/codomain of $\mathsf{M}$ is $\mathsf{I}$ :

[Figure: a node $\mathsf{M}$ with only the edges $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$, and a node $\mathsf{M}$ with only the edges $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$.]

The diagrammatic presentation of monoidal products allows for an intuitive description of transition functions. For example, we can depict transitions

[TABLE]

for some $y\in Y_{1}^{-}$ , $x\in X_{1}^{+}$ , $x^{\prime}\in X_{1}^{-}$ , $y^{\prime}\in Y_{m}^{+}$ and $s,t,t^{\prime}\in S_{\mathsf{M}}$ as follows:

[Figure: two copies of the node $\mathsf{M}$ with edges $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ and $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$. The first, labelled $s/t$, has a thick arrow relating $y$ and $x^{\prime}$; the second, labelled $s/t^{\prime}$, has a thick arrow relating $x$ and $y^{\prime}$.]

We note that there are several ways to present a Mealy machine $\mathsf{M}\colon\mathsf{X}_{1}\otimes\cdots\otimes\mathsf{X}_{n}\multimap\mathsf{Y}_{1}\otimes\cdots\otimes\mathsf{Y}_{m}$ such as

[Figure: several presentations of the same node $\mathsf{M}$: one with all edges $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ and $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ drawn separately; one with a single edge $\mathsf{X}_{1}\otimes\cdots\otimes\mathsf{X}_{n-1}$ next to the edge $\mathsf{X}_{n}$; one with a single edge $\mathsf{Y}_{1}\otimes\cdots\otimes\mathsf{Y}_{m}$; and so on.]

Monoidal Product of Mealy Machines

We give monoidal products of Mealy machines. For Mealy machines $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Z}$ and $\mathsf{N}\colon\mathsf{Y}\multimap\mathsf{W}$ , we define a Mealy machine $\mathsf{M}\otimes\mathsf{N}\colon\mathsf{X}\otimes\mathsf{Y}\multimap\mathsf{Z}\otimes\mathsf{W}$ by: $S_{\mathsf{M}\otimes\mathsf{N}}=S_{\mathsf{M}}\times S_{\mathsf{N}}$ , $s_{\mathsf{M}\otimes\mathsf{N}}=(s_{\mathsf{M}},s_{\mathsf{N}})$ and $\tau_{\mathsf{M}\otimes\mathsf{N}}$ is given by

[TABLE]

It is not difficult to check that the monoidal product is compatible with behavioural equivalence.

We depict $\mathsf{M}\otimes\mathsf{N}\colon(\mathsf{X}\otimes\mathsf{Y})\multimap(\mathsf{Z}\otimes\mathsf{W})$ as follows:

[Figure: the node $\mathsf{M}$ between the edges $\mathsf{X}$ and $\mathsf{Z}$, drawn alongside the node $\mathsf{N}$ between the edges $\mathsf{Y}$ and $\mathsf{W}$.]

As indicated by the above diagram, $\mathsf{M}\otimes\mathsf{N}$ consists of two sub-machines $\mathsf{M}$ and $\mathsf{N}$ working independently. For example, if we have

[Figure: a transition of $\mathsf{M}$ labelled $s_{0}/s_{1}$ with tokens $z$ and $z^{\prime}$, and a transition of $\mathsf{N}$ labelled $t_{0}/t_{1}$ with tokens $w$ and $y$.]

then $\mathsf{M}\otimes\mathsf{N}$ has the following transitions:

[Figure: two transitions of $\mathsf{M}\otimes\mathsf{N}$: one with tokens $z$ and $z^{\prime}$ and state changes $s_{0}/s_{1}$ and $t/t$; the other with tokens $w$ and $y$ and state changes $t_{0}/t_{1}$ and $s/s$.]

for all $t\in S_{\mathsf{N}}$ and for all $s\in S_{\mathsf{M}}$ .
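The independence of the two sub-machines can be illustrated by the following Python sketch. It is written under assumptions: a token of the product is a pair of a hypothetical side tag (`"L"` for $\mathsf{M}$, `"R"` for $\mathsf{N}$) and a token of the corresponding machine, undefinedness is modelled by `None`, and measurability is ignored; `counter` is an invented example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class Mealy:
    init: Any
    tau: Callable[[Any, Any], Optional[Tuple[Any, Any]]]


def tensor(m: Mealy, n: Mealy) -> Mealy:
    """Sketch of M (x) N: each token is handled by exactly one component."""
    def tau(tagged: Tuple[str, Any], state: Tuple[Any, Any]):
        side, token = tagged
        s_m, s_n = state
        if side == "L":
            result = m.tau(token, s_m)
            if result is None:
                return None
            out, s_m = result   # only the state of M changes
        elif side == "R":
            result = n.tau(token, s_n)
            if result is None:
                return None
            out, s_n = result   # only the state of N changes
        else:
            return None
        return (side, out), (s_m, s_n)
    return Mealy(init=(m.init, n.init), tau=tau)


# Invented example: echo the token with the current count, then count up.
counter = Mealy(init=0, tau=lambda token, s: ((token, s), s + 1))
both = tensor(counter, counter)
```

Feeding `both` a token tagged `"L"` advances only the first component of the state, and a token tagged `"R"` advances only the second, matching the two transitions depicted above.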

Convention 5.2.

We make the following identifications:

•

We identify $\mathsf{X}\otimes(\mathsf{Y}\otimes\mathsf{Z})$ with $(\mathsf{X}\otimes\mathsf{Y})\otimes\mathsf{Z}$ by the canonical isomorphism $X+(Y+Z)\cong(X+Y)+Z.$

•

We identify $\mathsf{I}\otimes\mathsf{X}$ and $\mathsf{X}\otimes\mathsf{I}$ with $\mathsf{X}$ by the unit laws $X^{+}+\emptyset\cong X^{+}$ and $\emptyset+X^{-}\cong X^{-}$.

5.4.3 Axiom Link and Cut Link

For an $\mathbf{Int}$ -object $\mathsf{X}$ , we define $\mathsf{X}^{\bot}$ to be $(X^{-},X^{+})$ , and we define token machines

[TABLE]

by $\tau_{\mathsf{unit}_{\mathsf{X}}}=\mathrm{id}_{X^{+}+X^{-}}$ and $\tau_{\mathsf{counit}_{\mathsf{X}}}=\mathrm{id}_{{X^{-}+X^{+}}}$ . We depict them by single edges

[Figure: two single edges, one with ends labelled $\mathsf{X}$ and $\mathsf{X}^{\bot}$, the other with ends labelled $\mathsf{X}^{\bot}$ and $\mathsf{X}$]

respectively. This is compatible with the behaviour of these Mealy machines: if we give an input to one end of an edge, then we get the same value from the other end. For example, for any $x\in X^{+}$ , we have

[Diagram: a token $x$ entering one end of each edge and leaving the other end as $x$ .]

5.4.4 Symmetry

Let $\mathsf{X}$ and $\mathsf{Y}$ be $\mathbf{Int}$ -objects. We define a token machine $\mathsf{sym}_{\mathsf{X},\mathsf{Y}}\colon\mathsf{X}\otimes\mathsf{Y}\multimap\mathsf{Y}\otimes\mathsf{X}$ by letting its transition function be the canonical isomorphism

[TABLE]

We depict $\mathsf{sym}_{\mathsf{X},\mathsf{Y}}$ by a crossing:

[Diagram: two crossing edges labelled $\mathsf{X}$ and $\mathsf{Y}$ ; a token $x$ travels along the $\mathsf{X}$ -edge and a token $y$ along the $\mathsf{Y}$ -edge.]

As the arrows in the diagram indicate, given an input from an edge on one side, $\mathsf{sym}_{\mathsf{X},\mathsf{Y}}$ outputs the same value to the corresponding edge on the other side.

5.4.5 A Modal Operator

We give a constructor on Mealy machines that corresponds to the resource modality in linear logic. For an $\mathbf{Int}$ -object $\mathsf{X}$ , we define an $\mathbf{Int}$ -object $\oc\mathsf{X}$ by

[TABLE]

We can informally regard $\oc\mathsf{X}$ as a countable monoidal power $\bigotimes_{n\in\mathbb{N}}\mathsf{X}\approx\mathsf{X}\otimes\mathsf{X}\otimes\cdots$ . Following this intuition, we extend the action of $\oc(-)$ to Mealy machines. Let $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ be a Mealy machine. We define a Mealy machine $\oc\mathsf{M}\colon\oc\mathsf{X}\multimap\oc\mathsf{Y}$ by: the state space of $\oc\mathsf{M}$ is defined to be $|\mathsf{M}|^{\mathbb{N}}$ associated with the least $\sigma$ -algebra such that for all $A_{1},A_{2},\ldots\in\Sigma_{\mathsf{M}}$ ,

[TABLE]

the initial state $s_{\oc\mathsf{M}}$ is $(s_{\mathsf{M}},s_{\mathsf{M}},\ldots)$ ; the transition function $\tau_{\oc\mathsf{M}}$ is the unique partial measurable function satisfying

[TABLE]

for all $n\in\mathbb{N}$ . Here, $\mathrm{inj}_{n}\colon(-)\to\mathbb{N}\times(-)$ are the $n$ th injections, and $\mathrm{ins}_{n}\colon S_{\mathsf{M}}\times S_{\mathsf{M}}^{\mathbb{N}}\to S_{\mathsf{M}}^{\mathbb{N}}$ sends $(s,\{s_{n}\}_{n\in\mathbb{N}})$ to $(s_{0},\ldots,s_{n-1},s,s_{n},s_{n+1},\ldots)$ .

As $\oc(-)$ is defined to be a countable monoidal power, $\oc\mathsf{M}$ behaves as a parallel composition of countably infinite copies of $\mathsf{M}$ . For example, if we have

[Diagram: a transition of $\mathsf{M}$ labelled $s/s^{\prime}$ with token $x$ on the $\mathsf{X}$ -edge and token $y$ on the $\mathsf{Y}$ -edge.]

then for all $n\in\mathbb{N}$ and $t_{1},t_{2},\ldots\in S_{\mathsf{M}}$ , we have

[Diagram: a transition of $\oc\mathsf{M}$ labelled $(t_{1},\ldots,t_{n-1},s,t_{n},t_{n+1},\ldots)/(t_{1},\ldots,t_{n-1},s^{\prime},t_{n},t_{n+1},\ldots)$ with token $(n,x)$ on the $\oc\mathsf{X}$ -edge and token $(n,y)$ on the $\oc\mathsf{Y}$ -edge.]

In other words, given an input whose first entry is $n$ , then the $n$ th copy of $\mathsf{M}$ handles the input, and there is no side effect to the other copies of $\mathsf{M}$ .
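The behaviour of $\oc\mathsf{M}$ on indexed inputs can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the machine is deterministic and total, and the infinite sequence of states is stored sparsely as a dictionary whose missing entries stand for the initial state. The names `bang`, `bang_step` and `run` are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

def bang(init: Any, step: Callable[[Any, Any], Tuple[Any, Any]]):
    """Return (initial_states, step function) for a sketch of !M.

    The state sequence (s_M, s_M, ...) is represented by a dict from copy
    index to state; an index that is absent still holds the initial state.
    """
    def bang_step(inp, states: Dict[int, Any]):
        n, x = inp                     # the input (n, x) addresses copy n
        s = states.get(n, init)
        y, s2 = step(x, s)             # only copy n takes a step
        new_states = dict(states)
        new_states[n] = s2
        return (n, y), new_states      # the output keeps the same index n
    return {}, bang_step

# Toy machine M: it outputs how many inputs it has seen so far.
start, run = bang(0, lambda x, s: (s, s + 1))
out_a, st = run((3, "q"), start)   # first query to copy 3
out_b, st = run((3, "q"), st)      # second query to copy 3
out_c, st = run((7, "q"), st)      # first query to copy 7, unaffected by copy 3
```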

Proposition 5.3.

The operator $\oc(-)$ is compatible with the behavioral equivalence and is functorial. Namely,

•

for all Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{X}\multimap\mathsf{Y}$ , if $\mathsf{M}\simeq\mathsf{N}$ , then $\oc\mathsf{M}\simeq\oc\mathsf{N}$ ; and

•

for all Mealy machines $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ and $\mathsf{N}\colon\mathsf{Y}\multimap\mathsf{Z}$ ,

[TABLE]

•

$\oc\mathsf{id}_{\mathsf{X}}\simeq\mathsf{id}_{\oc\mathsf{X}}$ .

Convention 5.3.

For the sake of legibility and due to lack of space, we sometimes implicitly identify $\oc(\mathsf{X}\otimes\mathsf{Y})$ with $\oc\mathsf{X}\otimes\oc\mathsf{Y}$ by the canonical isomorphism $\mathbb{N}\times(X+Y)\cong\mathbb{N}\times X+\mathbb{N}\times Y.$

Under the above convention, for Mealy machines $\mathsf{M}\colon\oc(\mathsf{X}\otimes\mathsf{Y})\multimap\mathsf{Z}$ and $\mathsf{N}\colon\mathsf{W}\multimap\mathsf{X}$ , we can simply write $\mathsf{M}\circ(\oc\mathsf{N}\otimes\mathsf{id}_{\oc\mathsf{Y}})\colon\oc\mathsf{W}\otimes\oc\mathsf{Y}\multimap\mathsf{Z}$ . It is not difficult to see that when $\mathsf{Z}=\oc\mathsf{Z}^{\prime}$ and $\mathsf{M}=\oc\mathsf{M}^{\prime}$ for some $\mathsf{M}^{\prime}\colon\mathsf{X}\otimes\mathsf{Y}\multimap\mathsf{Z}^{\prime}$ , we have $\mathsf{M}\circ(\oc\mathsf{N}\otimes\mathsf{id}_{\oc\mathsf{Y}})\simeq\oc(\mathsf{M}^{\prime}\circ(\mathsf{N}\otimes\mathsf{id}_{\mathsf{Y}}))$ .

Dereliction

For an $\mathbf{Int}$ -object $\mathsf{X}$ , we define a token machine $\mathsf{d}_{\mathsf{X}}\colon\oc\mathsf{X}\multimap\mathsf{X}$ by defining $\tau_{\mathsf{d}_{\mathsf{X}}}\colon(\mathbb{N}\times X^{+})+X^{-}\to X^{+}+(\mathbb{N}\times X^{-})$ by

[TABLE]

The Mealy machine $\mathsf{d}_{\mathsf{X}}$ pops/pushes indices with probability $1$ . Namely, we have

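[Diagrams: $\mathsf{d}_{\mathsf{X}}$ turns the token $(n,x)$ on the $\oc\mathsf{X}$ -edge into $x$ on the $\mathsf{X}$ -edge, and turns the token $x^{\prime}$ on the $\mathsf{X}$ -edge into $(0,x^{\prime})$ on the $\oc\mathsf{X}$ -edge.]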

for all $n\in\mathbb{N}$ , $x\in X^{+}$ and $x^{\prime}\in X^{-}$ . Hence, for any Mealy machine $\mathsf{M}\colon\mathsf{I}\multimap\mathsf{X}$ , if we have

[Diagram: a transition of $\mathsf{M}$ labelled $s/s^{\prime}$ with input token $x$ and output token $x^{\prime}$ on the $\mathsf{X}$ -edge.]

for some $x\in X^{-}$ , $x^{\prime}\in X^{+}$ and $s,s^{\prime}\in S_{\mathsf{M}}$ , then $\mathsf{d}_{\mathsf{X}}\circ\oc\mathsf{M}$ has the following transition:

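[Diagram: $\oc\mathsf{M}$ connected to $\mathsf{d}_{\mathsf{X}}$ by an $\oc\mathsf{X}$ -edge. The token $x$ on the $\mathsf{X}$ -edge becomes $(0,x)$ on the $\oc\mathsf{X}$ -edge, $\oc\mathsf{M}$ moves $(s,s_{1},s_{2},\ldots)/(s^{\prime},s_{1},s_{2},\ldots)$ and answers $(0,x^{\prime})$ , which becomes $x^{\prime}$ on the $\mathsf{X}$ -edge.]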

for all $s_{1},s_{2},\ldots\in S_{\mathsf{M}}$ .

Proposition 5.4.

For any Mealy machine $\mathsf{M}\colon\mathsf{I}\multimap\mathsf{X}$ ,

[TABLE]

Diagrammatically, we have

[Diagram: $\oc\mathsf{M}$ connected to $\mathsf{d}_{\mathsf{X}}$ by an $\oc\mathsf{X}$ -edge, with outgoing $\mathsf{X}$ -edge, $\simeq$ $\mathsf{M}$ with its $\mathsf{X}$ -edge.]

Digging and Contraction

For natural numbers $n,m\in\mathbb{N}$ , we write $\langle n,m\rangle$ for the Cantor pairing $n+(n+m)(n+m+1)/2$ , and we write $n|_{0}$ and $n|_{1}$ for the unique natural numbers such that $n=\langle n|_{0},n|_{1}\rangle$ . For an $\mathbf{Int}$ -object $\mathsf{X}$ , let $\mathsf{dg}_{\mathsf{X}}\colon\oc\mathsf{X}\multimap\oc\oc\mathsf{X}$ and $\mathsf{con}_{\mathsf{X}}\colon\oc\mathsf{X}\multimap\oc\mathsf{X}\otimes\oc\mathsf{X}$ be stateless deterministic Mealy machines whose transition functions

[TABLE]

are given by

[TABLE]

These stateless Mealy machines $\mathsf{dg}_{\mathsf{X}}$ and $\mathsf{con}_{\mathsf{X}}$ behave as follows: for all $n,m\in\mathbb{N}$ ,

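[Diagrams: $\mathsf{dg}_{\mathsf{X}}$ relates the token $(n,(m,x))$ on the $\oc\oc\mathsf{X}$ -edge to $(\langle n,m\rangle,x)$ on the $\oc\mathsf{X}$ -edge; $\mathsf{con}_{\mathsf{X}}$ relates the tokens $(2n,x)$ and $(2n+1,x)$ on the single $\oc\mathsf{X}$ -edge to $(n,x)$ on one and the other of the two remaining $\oc\mathsf{X}$ -edges, respectively.]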
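The index manipulations performed by digging and contraction can be checked with a small Python sketch. The pairing formula is the one given above; the inverse `unpair` and the helper names `dig_merge`, `dig_split`, `con_merge` and `con_split` are hypothetical, and the choice of which copy receives the even indices is an assumption made for illustration.

```python
from math import isqrt

def pair(n: int, m: int) -> int:
    """Cantor pairing <n, m> = n + (n + m)(n + m + 1) / 2."""
    return n + (n + m) * (n + m + 1) // 2

def unpair(k: int):
    """Inverse of pair: recover (k|_0, k|_1) from k."""
    w = (isqrt(8 * k + 1) - 1) // 2      # w = n + m, the diagonal of k
    n = k - w * (w + 1) // 2
    return n, w - n

def dig_merge(n: int, m: int, x):
    """(n, (m, x)) on the !!X side becomes (<n, m>, x) on the !X side."""
    return pair(n, m), x

def dig_split(k: int, x):
    """(k, x) on the !X side becomes (k|_0, (k|_1, x)) on the !!X side."""
    n, m = unpair(k)
    return n, (m, x)

def con_merge(side: int, n: int, x):
    """Copy `side` (0 or 1) with index n is sent to index 2n + side."""
    return 2 * n + side, x

def con_split(k: int, x):
    """Index k is routed to copy k mod 2 with index k div 2."""
    return k % 2, (k // 2, x)
```

For instance, `pair(2, 3)` evaluates to `2 + 5 * 6 // 2 = 17`, and `unpair(17)` returns `(2, 3)`.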

Proposition 5.5.

For any Mealy machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ ,

[TABLE]

Diagrammatically, we have

[Diagrams: $\oc\mathsf{M}$ followed by digging is behaviourally equivalent to digging followed by $\oc\oc\mathsf{M}$ ; $\oc\mathsf{M}$ followed by contraction is behaviourally equivalent to contraction followed by two parallel copies of $\oc\mathsf{M}$ .]

Weakening

We define a token machine $\mathsf{w}_{\mathsf{X}}\colon\mathsf{X}\multimap\mathsf{I}$ by

[TABLE]

Because the identity is the only Mealy machine from $\mathsf{I}$ to $\mathsf{I}$ (up to behavioural equivalence), we see that for any Mealy machine $\mathsf{M}\colon\mathsf{I}\multimap\mathsf{X}$ ,

[TABLE]

This behavioural equivalence means that we can remove

[Diagram: $\mathsf{M}$ connected to $\mathsf{w}_{\mathsf{X}}$ by an $\mathsf{X}$ -edge.]

from any diagram.

5.4.6 Real Numbers

We define an $\mathbf{Int}$ -object $\mathsf{R}$ to be $(\mathbb{S},\mathbb{S})$ where $\mathbb{S}$ is the measurable space of all finite sequences of real numbers endowed with the following $\sigma$ -algebra

[TABLE]

For $a\in\mathbb{R}$ , we define a token machine $\mathsf{r}_{a}\colon\mathsf{I}\multimap\mathsf{R}$ by

[TABLE]

The transition means that, given a query $u$ from the environment, $\mathsf{r}_{a}$ answers with its value $a$ by appending $a$ to $u$ . We will use $u$ as a stack. See Section 5.4.7 and Section 5.4.10.

5.4.7 Measurable Functions

We associate a measurable function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ with a token machine $\mathsf{fn}_{f}\colon\mathsf{R}^{\otimes n}\multimap\mathsf{R}$ . For simplicity, we define $\mathsf{fn}_{f}$ for $n=1$ and $n=2$ . When $n=1$ , the transition function $\tau_{\mathsf{fn}_{f}}\colon\mathbb{S}+\mathbb{S}\to\mathbb{S}+\mathbb{S}$ is given by

[TABLE]

We explain how $\mathsf{fn}_{f}$ simulates $f$ by describing execution of $\mathsf{fn}_{f}\circ\mathsf{r}_{a}$ for a real number $a\in\mathbb{R}$ . As in the following diagram, given an input $u\in\mathbb{S}$ from the right $\mathsf{R}$ -edge, $\mathsf{fn}_{f}$ sends $u$ to the left $\mathsf{R}$ -edge in order to obtain the value of its argument. The return value to $\mathsf{fn}_{f}$ from $\mathsf{r}_{a}$ is $a\mathbin{::}u$ , by which $\mathsf{fn}_{f}$ sees that its argument is $a$ . Then, $\mathsf{fn}_{f}$ outputs $f(a)\mathbin{::}u$ . As a whole, the following Mealy machine is behaviourally equivalent to $\mathsf{r}_{f(a)}$ .

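[Diagram: $\mathsf{r}_{a}$ connected to $\mathsf{fn}_{f}$ by an $\mathsf{R}$ -edge. The query $u$ enters from the right $\mathsf{R}$ -edge and is forwarded to $\mathsf{r}_{a}$ , which returns $a\mathbin{::}u$ ; then $\mathsf{fn}_{f}$ outputs $f(a)\mathbin{::}u$ on the right $\mathsf{R}$ -edge.]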

When $n=2$ , the transition function of $\mathsf{fn}_{f}\colon\mathsf{R}\otimes\mathsf{R}\multimap\mathsf{R}$ is $\tau_{\mathsf{fn}_{f}}\colon(\mathbb{S}+\mathbb{S})+\mathbb{S}\to\mathbb{S}+(\mathbb{S}+\mathbb{S})$ given by

[TABLE]

As in the following diagram, given an input $u\in\mathbb{S}$ from the right $\mathsf{R}$ -edge, $\mathsf{fn}_{f}$ first sends $u$ to the lower $\mathsf{R}$ -edge in the left hand side in order to obtain the value of its first argument. The return value to $\mathsf{fn}_{f}$ from $\mathsf{r}_{a}$ is $a\mathbin{::}u$ . Next, $\mathsf{fn}_{f}$ sends $a\mathbin{::}u$ to the upper $\mathsf{R}$ -edge in the left hand side. Then $\mathsf{r}_{b}$ returns $b\mathbin{::}a\mathbin{::}u$ . Now, $\mathsf{fn}_{f}$ sees that its first argument is $a$ and its second argument is $b$ . Finally, $\mathsf{fn}_{f}$ outputs $f(a,b)\mathbin{::}u$ .

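[Diagram: $\mathsf{r}_{a}$ and $\mathsf{r}_{b}$ connected to $\mathsf{fn}_{f}$ . The query $u$ is sent to $\mathsf{r}_{a}$ , which returns $a\mathbin{::}u$ ; then $a\mathbin{::}u$ is sent to $\mathsf{r}_{b}$ , which returns $b\mathbin{::}a\mathbin{::}u$ ; finally $\mathsf{fn}_{f}$ outputs $f(a,b)\mathbin{::}u$ .]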

For general cases, $f$ may have more arguments, and $\mathsf{fn}_{f}$ sequentially sends queries to its arguments storing partial information about its arguments on finite sequences of real numbers.
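The query protocol described in this section can be sketched in Python. The sketch below is an illustration rather than the paper's definition: stacks are modelled as tuples with the head first (so $a\mathbin{::}u$ is `(a,) + u`), each argument is assumed to be a constant machine $\mathsf{r}_{a}$ , and the helper names `r` and `run_fn` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Stack = Tuple[float, ...]

def r(a: float) -> Callable[[Stack], Stack]:
    """Constant machine r_a: answer the query u with a :: u."""
    return lambda u: (a,) + u

def run_fn(f: Callable[..., float],
           args: Sequence[Callable[[Stack], Stack]],
           u: Stack) -> Stack:
    """Simulate fn_f connected to one machine per argument.

    The arguments are queried one at a time; each answer is pushed onto the
    token, so partial information about the arguments lives on the stack.
    """
    token = u
    for arg in args:                 # the first argument is queried first
        token = arg(token)           # r_a pushes its value onto the token
    k = len(args)
    values = token[:k][::-1]         # the stack top holds the last argument
    return (f(*values),) + token[k:] # pop the arguments, push f's result

# Unary case: u -> a :: u -> f(a) :: u.
unary = run_fn(lambda a: a * a, [r(3.0)], (9.0,))
# Binary case: u -> a :: u -> b :: a :: u -> f(a, b) :: u.
binary = run_fn(lambda a, b: a - b, [r(5.0), r(2.0)], (9.0,))
```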

5.4.8 Conditional Branching

For an $\mathbf{Int}$ -object $\mathsf{X}$ such that $X^{-}$ is a measurable subspace of $\mathbb{S}$ , we define

[TABLE]

to be a token machine whose transition function

[TABLE]

is given by

[TABLE]

For a real number $a\in\mathbb{R}$ and Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{I}\multimap\mathsf{X}$ , we describe the execution of $\mathsf{cd}_{\mathsf{X}}\circ(\mathsf{r}_{a}\otimes\mathsf{M}\otimes\mathsf{N}).$ Given an input $u\in X^{-}$ , $\mathsf{cd}_{\mathsf{X}}$ checks whether $a$ is zero by sending $u$ to the $\mathsf{R}$ -edge. There are two cases: (i) if $a$ is $0$ , then $\mathsf{r}_{a}$ returns $0\mathbin{::}u$ , and $\mathsf{cd}_{\mathsf{X}}$ forwards $u$ to the middle $\mathsf{X}$ -edge; (ii) if $a$ is not $0$ , say $1$ , then $\mathsf{r}_{a}$ returns $1\mathbin{::}u$ , and $\mathsf{cd}_{\mathsf{X}}$ forwards $u$ to the upper $\mathsf{X}$ -edge:

[Diagram (i): $\mathsf{cd}$ connected to $\mathsf{r}_{0}$ , $\mathsf{M}$ and $\mathsf{N}$ . The query $u$ is sent along the $\mathsf{R}$ -edge, the answer $0\mathbin{::}u$ comes back, $u$ is forwarded along the middle $\mathsf{X}$ -edge, and the answer $x$ is passed to the right $\mathsf{X}$ -edge.]

[Diagram (ii): the same configuration with $\mathsf{r}_{1}$ . The answer $1\mathbin{::}u$ comes back, $u$ is forwarded along the upper $\mathsf{X}$ -edge, and the answer $x$ is passed to the right $\mathsf{X}$ -edge.]

Because in both cases, all outputs from $\mathsf{M}$ and $\mathsf{N}$ are sent to the $\mathsf{X}$ -edge in the right hand, we see that $\mathsf{cd}_{\mathsf{X}}\circ(\mathsf{r}_{a}\otimes\mathsf{M}\otimes\mathsf{N})$ simulates $\mathsf{M}$ when $a=0$ and simulates $\mathsf{N}$ when $a\neq 0$ .
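A minimal Python sketch of this branching protocol is given below. It assumes that the branches $\mathsf{M}$ and $\mathsf{N}$ are stateless and can be modelled as plain functions from queries to answers, and that stacks are tuples with the head first; the names `r`, `cd`, `then_branch` and `else_branch` are hypothetical.

```python
from typing import Callable, Tuple

Stack = Tuple[float, ...]

def r(a: float) -> Callable[[Stack], Stack]:
    """Constant machine r_a: answer the query u with a :: u."""
    return lambda u: (a,) + u

def cd(cond: Callable[[Stack], Stack],
       m: Callable[[Stack], object],
       n: Callable[[Stack], object],
       u: Stack):
    """Sketch of cd connected to (r_a, M, N): query the condition, inspect
    the head of the answer, then forward the original query to one branch."""
    answer = cond(u)                  # a :: u
    a, rest = answer[0], answer[1:]   # rest is the original query u
    branch = m if a == 0 else n       # zero selects M, non-zero selects N
    return branch(rest)

# Hypothetical stateless branches standing in for M and N.
then_branch = lambda u: ("from M", u)
else_branch = lambda u: ("from N", u)

zero_case = cd(r(0.0), then_branch, else_branch, (0.5,))
nonzero_case = cd(r(1.0), then_branch, else_branch, (0.5,))
```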

Proposition 5.6.

For $a\in\mathbb{R}$ and for Mealy machines $\mathsf{M},\mathsf{N}\colon\mathsf{I}\multimap\mathsf{X}$ , we have

[TABLE]

Diagrammatically, we have

[Diagram: $\mathsf{cd}$ connected to $\mathsf{r}_{0}$ (by an $\mathsf{R}$ -edge), $\mathsf{M}$ and $\mathsf{N}$ (by $\mathsf{X}$ -edges) $\simeq$ $\mathsf{M}$ with its $\mathsf{X}$ -edge.]

and for any $a\neq 0$ ,

[Diagram: $\mathsf{cd}$ connected to $\mathsf{r}_{a}$ (by an $\mathsf{R}$ -edge), $\mathsf{M}$ and $\mathsf{N}$ (by $\mathsf{X}$ -edges) $\simeq$ $\mathsf{N}$ with its $\mathsf{X}$ -edge.]

Proof.

When $a=0$ , the first behavioral equivalence is realized by the first projection from $1\times 1\times S_{\mathsf{M}}\times S_{\mathsf{N}}\cong S_{\mathsf{M}}\times S_{\mathsf{N}}$ to $S_{\mathsf{M}}$ . When $a\neq 0$ , the first behavioral equivalence is realized by the second projection from $1\times 1\times S_{\mathsf{M}}\times S_{\mathsf{N}}\cong S_{\mathsf{M}}\times S_{\mathsf{N}}$ to $S_{\mathsf{N}}$ . The second behavioral equivalence is realized by the obvious measurable function from $1\times 1\times S_{\mathsf{M}}\times S_{\mathsf{N}}$ to $1$ . ∎

5.4.9 A State Monad

Let $\mathbb{T}$ be the subspace of $\mathbb{S}$ consisting of all finite sequences of real numbers in $\mathbb{R}_{[0,1]}$ . Recall that $\mathbb{R}_{\geq 0}\times\mathbb{T}$ is “the set of states” in sampling-based operational semantics and our idea is to model $\mathtt{score}$ and $\mathtt{sample}$ by a state monad. In this section, we give a state monad that we use in our Mealy machine semantics. We define $\mathbf{Int}$ -objects $\mathsf{S}_{0}$ and $\mathsf{S}$ by

[TABLE]

Then $\mathsf{S}\otimes(-)$ is a state monad (on $\mathbf{Mealy}$ ) because for any $\mathbf{Int}$ -object $\mathsf{X}$ , we have $\mathsf{S}\otimes\mathsf{X}=((\mathsf{S}_{0}\otimes\mathsf{X})^{\bot}\otimes\mathsf{S}_{0})^{\bot}.$ The unit and the multiplication of this monad are:

[TABLE]

where $\mathsf{e}=\mathsf{unit}_{\mathsf{S}_{0}}$ and $\mathsf{m}=\mathsf{S}_{0}\otimes\mathsf{counit}_{\mathsf{S}_{0}}\otimes\mathsf{S}_{0}^{\bot}$ . Note that $\mathsf{S}$ is equal to $\mathsf{S}_{0}\otimes\mathsf{S}_{0}^{\bot}$ . We can depict the unit and the multiplication as follows:

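[Diagrams: the unit, built from $\mathsf{e}$ with an outgoing $\mathsf{S}$ -edge next to an $\mathsf{X}$ -edge, and the multiplication, built from $\mathsf{m}$ connecting two $\mathsf{S}$ -edges to one $\mathsf{S}$ -edge next to an $\mathsf{X}$ -edge.]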

5.4.10 Scoring

We define $\mathsf{sc}$ to be a token machine from $\mathsf{R}$ to $\mathsf{S}$ whose transition function $\tau_{\mathsf{sc}}\colon\mathbb{S}+\mathbb{R}_{\geq 0}\times\mathbb{T}\to\mathbb{R}_{\geq 0}\times\mathbb{T}+\mathbb{S}$ is given by

[TABLE]

The token machine simulates scoring $(\mathtt{score}(\mathtt{r}_{a}),b,u)\to(\mathtt{skip},|a|\,b,u)$ as follows:

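[Diagram: $\mathsf{r}_{a}$ connected to $\mathsf{sc}$ by an $\mathsf{R}$ -edge. The input $(b,u)$ on the $\mathsf{S}$ -edge is sent to $\mathsf{r}_{a}$ as $b\mathbin{::}u$ , the answer $a\mathbin{::}b\mathbin{::}u$ comes back, and $\mathsf{sc}$ outputs $(|a|b,u)$ on the $\mathsf{S}$ -edge.]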
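The round trip performed by $\mathsf{sc}$ against $\mathsf{r}_{a}$ can be sketched in Python as follows. The sketch reads the two directions of the transition off the diagram above; the side tags `"S"` and `"R"`, the tuple encoding of stacks (head first) and the helper names `r`, `sc` and `score` are assumptions made for illustration.

```python
from typing import Tuple

def r(a: float):
    """Constant machine r_a: answer the query u with a :: u."""
    return lambda u: (a,) + u

def sc(side: str, token):
    """Sketch of the two directions of sc.

    From the S-edge: turn a state (b, u) into the query b :: u for the R-edge.
    From the R-edge: turn the answer a :: b :: u into the state (|a| b, u).
    """
    if side == "S":
        b, u = token
        return "R", (b,) + tuple(u)
    a, b, *u = token
    return "S", (abs(a) * b, tuple(u))

def score(a: float, weight: float, trace: Tuple[float, ...]):
    """Run sc against r_a for one round trip and return the new state."""
    _, query = sc("S", (weight, trace))   # (b, u) becomes b :: u
    _, state = sc("R", r(a)(query))       # a :: b :: u becomes (|a| b, u)
    return state

new_state = score(-2.0, 0.5, (0.3, 0.7))
```

Here the weight $0.5$ is multiplied by $|-2|$ while the trace is returned unchanged, in line with the rule $(\mathtt{score}(\mathtt{r}_{a}),b,u)\to(\mathtt{skip},|a|\,b,u)$ .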

5.4.11 Sampling

We define a Mealy machine $\mathsf{sa}\colon\mathsf{I}\multimap\mathsf{S}\otimes\oc\mathsf{R}$ by: the state space $S_{\mathsf{sa}}$ is defined to be $\{\ast\}\cup\mathbb{R}_{[0,1]}$ , and the initial state $s_{\mathsf{sa}}$ is $\ast$ , and the transition function

[TABLE]

is given by

[TABLE]

As we explained in Section 4.2, the Mealy machine $\mathsf{sa}$ simulates the evaluation rule $(\mathtt{sample},a,b\mathbin{::}u)\to(b,a,u)$ :

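[Diagrams: in state $\ast$ , $\mathsf{sa}$ receives $(a,b\mathbin{::}u)$ on the $\mathsf{S}$ -edge, outputs $(a,u)$ and moves to state $b$ ; in state $b$ , $\mathsf{sa}$ receives $(n,u)$ on the $\oc\mathsf{R}$ -edge, outputs $(n,b\mathbin{::}u)$ and stays in state $b$ .]

Namely, $\mathsf{sa}$ first pops $b$ from the trace and stores it in its state; afterwards, $\mathsf{sa}$ answers every query $(n,u)$ by reporting that the result of sampling is $b$ .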
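The two kinds of transition of $\mathsf{sa}$ can be sketched in Python. The sketch covers only the two transitions described above and treats every other combination of state and input as undefined (returning `None`), which is an assumption since the full transition table is not reproduced here. The names `STAR` and `sa_step` and the side tags are hypothetical.

```python
STAR = object()   # the initial state *, before any value has been sampled

def sa_step(inp, state):
    """One step of a sketch of sa; returns (output, next_state) or None."""
    side, payload = inp
    if state is STAR and side == "S":
        weight, trace = payload
        if not trace:
            return None                       # nothing left to pop
        b, rest = trace[0], trace[1:]
        return ("S", (weight, rest)), b       # pop b and remember it
    if state is not STAR and side == "!R":
        n, u = payload
        return ("!R", (n, (state,) + u)), state   # answer with b :: u
    return None                               # assumed undefined otherwise

# First the state edge is used: b = 0.25 is popped from the trace.
out1, st = sa_step(("S", (1.0, (0.25, 0.75))), STAR)
# Later queries from any copy n all receive the same sampled value.
out2, st = sa_step(("!R", (3, ())), st)
out3, st = sa_step(("!R", (8, (0.1,))), st)
```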

5.5 Diagrammatic Reasoning

We now give a brief remark on the diagrammatic presentation of Mealy machines. The diagrammatic presentation of a Mealy machine serves not only as an intuitive explanation, but also as a tool for rigorous reasoning about behavioural equivalence. This follows from a categorical observation. Let $\mathbf{Mealy}$ be the category of $\mathbf{Int}$ -objects and behavioural equivalence classes of Mealy machines, where composition is induced by the composition of Mealy machines. We will give a proof of the following proposition in the next section.

Proposition 5.7.

The category $\mathbf{Mealy}$ is a compact closed category. The dual of an $\mathbf{Int}$ -object $\mathsf{X}$ is $\mathsf{X}^{\bot}$ . The unit and the counit arrows are $\mathsf{unit}_{\mathsf{X}}$ and $\mathsf{counit}_{\mathsf{X}}$ .

Therefore, as a consequence of the coherence theorem for compact closed categories [44, 38], we see that graph isomorphism preserves behavioural equivalence.

Proposition 5.8.

If two Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.

For example, for all Mealy machines $\mathsf{M}\colon\mathsf{X}\otimes\mathsf{Y}\multimap\mathsf{Z}\otimes\mathsf{W}$ and $\mathsf{N}\colon\mathsf{W}\multimap\mathsf{Y}$ , we have

[Diagram: $\mathsf{M}$ with its $\mathsf{W}$ -edge fed back through $\mathsf{N}$ into its $\mathsf{Y}$ -edge, leaving the $\mathsf{X}$ -edge and the $\mathsf{Z}$ -edge open, drawn in two different layouts related by $\simeq$ .]

5.6 Proof of Proposition 5.7

5.6.1 The Category of Partial Measurable Functions

For basic categorical notions, see standard textbooks such as [45]. We define $\mathbf{pMeas}$ to be the category of measurable spaces and partial measurable functions. In $\mathbf{pMeas}$ , the empty space $\emptyset$ is the initial object, and the coproduct space $X+Y$ is the coproduct of $X$ and $Y$ in the categorical sense. We write

[TABLE]

for the left/right injections. For partial measurable functions $f\colon X\to Y$ and $g\colon Z\to Y$ , we define $[f,g]\colon X+Z\to Y$ to be the cotupling of $f$ and $g$ . For partial measurable functions $f\colon X\to Y$ and $g\colon Z\to W$ , we define partial measurable functions $f+g\colon X+Z\to Y+W$ and $f\times g\colon X\times Z\to Y\times W$ by

[TABLE]

We note that $(\mathbf{pMeas},1,\times)$ and $(\mathbf{pMeas},\emptyset,+)$ are symmetric monoidal categories. We also note that $X\times(-)$ distributes over the coproducts, i.e., the canonical arrow

[TABLE]

is an isomorphism.

The notion of trace introduced by Joyal, Street and Verity [26] plays an important role in this section.

Definition 5.3.

Let $(\mathcal{C},I,\otimes)$ be a symmetric monoidal category. A trace operator on $(\mathcal{C},I,\otimes)$ is a family $\left\{\mathbf{tr}_{X,Y}^{Z}\right\}_{X,Y,Z\in\mathcal{C}}$ satisfying the following axioms:

•

(Dinaturality) For all $\mathcal{C}$ -arrows $f\colon X\otimes Z\to Y\otimes Z$ , $g\colon X^{\prime}\to X$ and $h\colon Y\to Y^{\prime}$ ,

[TABLE]

•

(Sliding) For all $\mathcal{C}$ -arrows $f\colon X\otimes Z\to Y\otimes W$ , $g\colon W\to Z$ ,

[TABLE]

•

(Vanishing I) For all $\mathcal{C}$ -arrows $f\colon X\otimes I\to Y\otimes I$ ,

[TABLE]

•

(Vanishing II) For all $\mathcal{C}$ -arrows $f\colon X\otimes Z\otimes W\to Y\otimes Z\otimes W$ ,

[TABLE]

•

(Superposing) For all $\mathcal{C}$ -arrows $f\colon X\otimes Z\to Y\otimes Z$ ,

[TABLE]

•

(Yanking) For all $X\in\mathcal{C}$ ,

[TABLE]

where $\sigma_{X,Y}\colon X\otimes Y\to Y\otimes X$ is the braiding.

A symmetric monoidal category $(\mathcal{C},I,\otimes)$ endowed with a trace operator $\mathbf{tr}$ is called a traced symmetric monoidal category.

We give a trace operator on $(\mathbf{pMeas},\emptyset,+)$ . The symmetric monoidal category $(\mathbf{pMeas},\emptyset,+)$ is enriched over $\omega\mathbf{Cppo}$ , which is the cartesian category of pointed $\omega$ -cpos and continuous functions. The partial order on a hom-set $\mathbf{pMeas}(X,Y)$ is given by

[TABLE]

The least arrow $\bot_{X,Y}\colon X\to Y$ is the empty partial measurable function. The $\omega\mathbf{Cppo}$ -enrichment induces an iterator

[TABLE]

given by

[TABLE]

The operator $\mathbf{iter}$ induces another operator

[TABLE]

given by

[TABLE]

Concretely, for a partial measurable function $f\colon X+Z\to Y+Z$ ,

[TABLE]

if and only if either $f(\bullet,x)=(\bullet,y)$ or there is a finite sequence $z_{1},\ldots,z_{n}\in Z$ such that

[TABLE]
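The concrete description of the trace as a feedback loop can be sketched in Python. The sketch assumes that elements of a coproduct are tagged pairs, with `"L"` marking the $X$ or $Y$ summand and `"R"` marking the feedback summand $Z$ , and that an undefined result is represented by `None`. The bound `max_steps` is an artefact of the sketch: a loop that never exits corresponds to an undefined value of the trace, which the sketch only approximates by giving up after finitely many steps.

```python
def trace(f, max_steps=10_000):
    """Sketch of tr(f) for a partial function f : X + Z -> Y + Z."""
    def traced(x):
        result = f(("L", x))              # start on the X summand
        for _ in range(max_steps):
            if result is None:
                return None               # f undefined: the trace is undefined
            tag, value = result
            if tag == "L":
                return value              # f exited on the Y summand
            result = f(("R", value))      # feed the Z value back into f
        return None                       # sketch-only cut-off for divergence
    return traced

def body(inp):
    """Toy loop body: Z holds pairs (k, acc); the loop sums k + ... + 1."""
    tag, value = inp
    if tag == "L":
        return ("R", (value, 0))          # enter the loop
    k, acc = value
    if k == 0:
        return ("L", acc)                 # leave the loop with the result
    return ("R", (k - 1, acc + k))        # one more iteration

sum_to = trace(body)
```

With this encoding, `sum_to(4)` passes through the feedback values `(4, 0)`, `(3, 4)`, `(2, 7)`, `(1, 9)`, `(0, 10)` before exiting, which plays the role of the finite sequence $z_{1},\ldots,z_{n}$ above.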

Proposition 5.9.

The family of operators $\left\{\mathbf{tr}_{X,Y}^{Z}\right\}_{X,Y,Z\in\mathbf{pMeas}}$ is a trace operator of the symmetric monoidal category $(\mathbf{pMeas},\emptyset,+)$ . Furthermore, the trace operator is uniform [46] with respect to partial measurable functions: for all partial measurable functions $f\colon X+Z\to Y+Z$ , $g\colon X+W\to Y+W$ and $h\colon Z\to W$ , if

[TABLE]

commutes, then

[TABLE]

Proof.

It is straightforward to adapt the argument in [47, Section A]. ∎

We will use the next proposition to construct a trace operator for Mealy machines.

Proposition 5.10.

For any partial measurable function $f\colon X+Z\to Y+Z$ and a measurable space $W$ ,

[TABLE]

Proof.

For any $w\in W$ , we show that

[TABLE]

where we identify $w$ with the arrow from $1=\{\ast\}$ to $W$ that sends $\ast$ to $w$ . Because

[TABLE]

it follows from uniformity that

[TABLE]

By dinaturality, we obtain

[TABLE]

Since this is true for any $w\in W$ , we see that $W\times\mathbf{tr}_{X,Y}^{Z}(f)$ is equal to

[TABLE]

∎

5.6.2 The Category of Mealy Machines

Definition 5.4.

We define a category $\mathbf{Mealy}$ by:

•

objects are $\mathbf{Int}$ -objects $\mathsf{X}$ ; and

•

arrows $f\colon\mathsf{X}\multimap\mathsf{Y}$ are behavioural equivalence classes of Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ .

We write $\mathbf{Mealy}_{+}$ for the full subcategory of $\mathbf{Mealy}$ consisting of the $\mathbf{Int}$ -objects $\mathsf{X}$ such that $X^{-}=\emptyset$ .

Intuitively, while arrows in $\mathbf{Mealy}$ are bidirectional Mealy machines, arrows in $\mathbf{Mealy}_{+}$ are “one-way” Mealy machines. We consider the full subcategory $\mathbf{Mealy}_{+}$ because the categorical structure of $\mathbf{Mealy}_{+}$ is easier to describe than that of $\mathbf{Mealy}$ , and the categorical structure of $\mathbf{Mealy}$ is induced by that of $\mathbf{Mealy}_{+}$ .

The identity arrow and the composition of $\mathbf{Mealy}_{+}$ are given by the identity Mealy machine $[\mathsf{id}_{\mathsf{X}}]$ and the composition of Mealy machines:

[TABLE]

A concrete description of the composition of Mealy machines between $\mathbf{Mealy}_{+}$ -objects is easy: for $\mathbf{Mealy}_{+}$ -objects $\mathsf{X}$ , $\mathsf{Y}$ and $\mathsf{Z}$ , and for Mealy machines $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ and $\mathsf{N}\colon\mathsf{Y}\multimap\mathsf{Z}$ , the composition $\mathsf{N}\circ\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Z}$ consists of:

•

$S_{\mathsf{N}\circ\mathsf{M}}=S_{\mathsf{N}}\times S_{\mathsf{M}}$ ;

•

$s_{\mathsf{N}\circ\mathsf{M}}=(s_{\mathsf{N}},s_{\mathsf{M}})$ ;

•

$\tau_{\mathsf{N}\circ\mathsf{M}}$ given by

[TABLE]

The composition of transition functions makes sense because $X^{-}=Z^{-}=\emptyset$ . From this concrete description, it is easy to check that the composition of $\mathbf{Mealy}_{+}$ -arrows is well-defined. In fact, if $h\colon S_{\mathsf{M}}\to S_{\mathsf{M}^{\prime}}$ realizes $\mathsf{M}\preceq\mathsf{M}^{\prime}$ and $h^{\prime}\colon S_{\mathsf{N}}\to S_{\mathsf{N}^{\prime}}$ realizes $\mathsf{N}\preceq\mathsf{N}^{\prime}$ , then $h^{\prime}\times h$ realizes $\mathsf{N}\circ\mathsf{M}\preceq\mathsf{N}^{\prime}\circ\mathsf{M}^{\prime}$ . Therefore, the symmetric transitive closure $\simeq$ of $\preceq$ is compatible with the composition.
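The composition of one-way Mealy machines can be sketched in Python as follows. The sketch assumes deterministic, total step functions and represents a machine as a pair of an initial state and a step function; the name `compose` and the two toy machines are hypothetical.

```python
def compose(n, m):
    """Sequential composition N after M of one-way Mealy machines.

    Each machine is a pair (initial_state, step), where step maps
    (input, state) to (output, next_state). The composite state is a pair
    (state of N, state of M), as in the description above.
    """
    n_init, n_step = n
    m_init, m_step = m

    def step(x, state):
        t, s = state
        y, s2 = m_step(x, s)      # M consumes the input first
        z, t2 = n_step(y, t)      # N consumes the output of M
        return z, (t2, s2)

    return (n_init, m_init), step

# Toy machines: M outputs a running sum, N doubles its input and counts steps.
running_sum = (0, lambda x, s: (s + x, s + x))
doubler = (0, lambda y, t: (2 * y, t + 1))

init, step = compose(doubler, running_sum)
out1, st1 = step(3, init)
out2, st2 = step(4, st1)
```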

Proposition 5.11.

The category $\mathbf{Mealy}_{+}$ with $(\mathsf{I},\otimes)$ is a symmetric monoidal category where the monoidal product of $\mathbf{Mealy}_{+}$ -arrows $[\mathsf{M}]\colon\mathsf{X}\multimap\mathsf{Y}$ and $[\mathsf{N}]\colon\mathsf{Z}\multimap\mathsf{W}$ is given by

[TABLE]

Proof.

It is easy to see that objects in $\mathbf{Mealy}_{+}$ are closed under the monoidal product of $\mathbf{Int}$ -objects. Thanks to simplicity of the composition of $\mathbf{Mealy}_{+}$ -arrows, we can easily check that the monoidal product of Mealy machines between $\mathbf{Mealy}_{+}$ -objects is compatible with behavioural equivalence and that $(\mathbf{Mealy}_{+},I,\otimes)$ is a symmetric monoidal category. ∎

Furthermore, $\mathbf{Mealy}_{+}$ inherits the trace operator of $\mathbf{pMeas}$ . For a $\mathbf{Mealy}_{+}$ -arrow $[\mathsf{M}]\colon\mathsf{X}\otimes\mathsf{Z}\multimap\mathsf{Y}\otimes\mathsf{Z}$ , we define a $\mathbf{Mealy}_{+}$ -arrow $\mathsf{Tr}_{\mathsf{X},\mathsf{Y}}^{\mathsf{Z}}[\mathsf{M}]\colon\mathsf{X}\multimap\mathsf{Y}$ to be the equivalence class of a Mealy machine $\mathsf{N}\colon\mathsf{X}\multimap\mathsf{Y}$ given by

[TABLE]

and

[TABLE]

Proposition 5.12.

The family of operators $\{\mathsf{Tr}_{\mathsf{X},\mathsf{Y}}^{\mathsf{Z}}\}_{\mathsf{X},\mathsf{Y},\mathsf{Z}\in\mathbf{Mealy}_{+}}$ is a trace operator on the symmetric monoidal category $(\mathbf{Mealy}_{+},\mathsf{I},\otimes)$ .

Proof.

Well-definedness of $\mathsf{Tr}_{\mathsf{X},\mathsf{Y}}^{\mathsf{Z}}(-)$ follows from uniformity of the trace operator on $\mathbf{pMeas}$ . Sliding, vanishing I, vanishing II, superposing and yanking for $\mathsf{Tr}$ follow from those of $\mathbf{tr}$ . Dinaturality for $\mathsf{Tr}$ follows from dinaturality of $\mathbf{tr}$ and Proposition 5.10. ∎

We recall the notions of $\mathbf{Int}$ -construction [26] and compact closed category.

Definition 5.5 ( $\mathbf{Int}$ -construction).

Let $(\mathcal{C},I,\otimes,\mathbf{tr})$ be a traced symmetric monoidal category. We define a category $\mathbf{Int}(\mathcal{C})$ by:

•

objects are pairs $(X^{+},X^{-})$ of $\mathcal{C}$ -objects;

•

arrows from $(X^{+},X^{-})$ to $(Y^{+},Y^{-})$ are $\mathcal{C}$ -arrows from $X^{+}\otimes Y^{-}$ to $Y^{+}\otimes X^{-}$ .

The identity on $(X^{+},X^{-})$ is given by the identity on $X^{+}\otimes X^{-}$ , and the composition of $\mathbf{Int}(\mathcal{C})$ -arrows $f\colon(X^{+},X^{-})\to(Y^{+},Y^{-})$ and $g\colon(Y^{+},Y^{-})\to(Z^{+},Z^{-})$ is given by

[TABLE]

Here, we omit some coherence isomorphisms.
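When the monoidal product is a coproduct and the trace is a feedback loop, the composition in $\mathbf{Int}(\mathcal{C})$ amounts to letting a token bounce between the two arrows until it leaves through an outer edge. The following Python sketch illustrates this reading under simplifying assumptions: arrows are partial functions on tagged tokens, `None` stands for undefined, the tag strings are hypothetical names for the summands, and `max_steps` is a sketch-only cut-off for tokens that bounce forever.

```python
def int_compose(g, f, max_steps=10_000):
    """Sketch of g . f in Int(C) for f : X+ + Y- -> Y+ + X- and
    g : Y+ + Z- -> Z+ + Y-, with tokens encoded as (tag, value)."""
    def composite(token):
        for _ in range(max_steps):
            tag, _ = token
            # Tokens on X+ or Y- are handled by f; those on Y+ or Z- by g.
            token = f(token) if tag in ("X+", "Y-") else g(token)
            if token is None:
                return None               # undefined step
            if token[0] in ("Z+", "X-"):
                return token              # the token leaves the composite
        return None                       # sketch-only cut-off
    return composite

def f(token):
    tag, v = token
    if tag == "X+":
        return ("Y+", v + 1)
    # tag == "Y-": bounce small values forward again, release large ones
    return ("Y+", v + 1) if v < 5 else ("X-", v)

def g(token):
    tag, v = token
    if tag == "Y+":
        return ("Z+", v * 10) if v % 2 == 0 else ("Y-", v)
    return ("Y-", v)                      # tag == "Z-"

h = int_compose(g, f)
forward = h(("X+", 0))    # X+ 0 -> Y+ 1 -> Y- 1 -> Y+ 2 -> Z+ 20
backward = h(("Z-", 7))   # Z- 7 -> Y- 7 -> X- 7
```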

Definition 5.6.

A compact closed category is a symmetric monoidal category $(\mathcal{C},I,\otimes)$ with a function $(-)^{\bot}\colon\mathrm{obj}(\mathcal{C})\to\mathrm{obj}(\mathcal{C})$ and families of $\mathcal{C}$ -arrows

[TABLE]

such that

[TABLE]

For $X\in\mathcal{C}$ , the object $X^{\bot}$ is called the dual object of $X$ .

Theorem 5.3 ([26]).

The category $\mathbf{Int}(\mathcal{C})$ is a compact closed category. The unit and the monoidal product are given by

[TABLE]

The dual object of $(X^{+},X^{-})$ is $(X^{-},X^{+})$ . The unit arrow $\eta_{(X^{+},X^{-})}$ and the counit arrow $\epsilon_{(X^{+},X^{-})}$ are given by

[TABLE]

Corollary 5.1.

The category $\mathbf{Mealy}$ is a compact closed category. The monoidal structure is given by $(\mathsf{I},\otimes)$ , and the unit and the counit are given by $\mathsf{unit}_{\mathsf{X}}$ and $\mathsf{counit}_{\mathsf{X}}$ respectively.

Proof.

It is straightforward to check that $\mathbf{Mealy}$ is isomorphic to $\mathbf{Int}(\mathbf{Mealy}_{+})$ , and the compact closed structure is given by data provided in Section 5. ∎

6 Mealy Machine Semantics for $\mathbf{PCFSS}$

We interpret a type $\mathtt{A}$ as the $\mathbf{Int}$ -object $\llbracket\mathtt{A}\rrbracket$ given by

[TABLE]

We define interpretation of contexts by

[TABLE]

When $\mathtt{\Delta}$ is the empty sequence, we define $\llbracket\mathtt{\Delta}\rrbracket$ to be $\mathsf{I}$ .

For interpreting conditional branching, we use the following proposition.

Proposition 6.1.

For any type $\mathtt{A}$ , there is a partial measurable embedding $e\colon\mathbb{S}+\mathbb{N}\times\llbracket\mathtt{A}\rrbracket^{-}\to\mathbb{S}$ .

Proof.

We first define an embedding from $\llbracket\mathtt{A}\rrbracket^{-}$ to $\mathbb{S}$ by induction on $\mathtt{A}$ . We note that for any type $\mathtt{A}$ , we have $\llbracket\mathtt{A}\rrbracket^{+}=\llbracket\mathtt{A}\rrbracket^{-}$ . Base cases are easy. For induction step,

[TABLE]

The statement follows from $\mathbb{S}+\mathbb{N}\times\llbracket\mathtt{A}\rrbracket^{-}\subseteq\llbracket\mathtt{Unit}\to\mathtt{A}\rrbracket^{-}$ . ∎

We interpret terms $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ and values $\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}$ by

[TABLE]

inductively defined by diagrams in Figure 4. In these definitions, when we can infer $\mathtt{\Delta}$ and $\mathtt{A}$ , we simply write $\llbracket\mathtt{M}\rrbracket$ and $\llparenthesis\mathtt{V}\rrparenthesis$ for $\llbracket\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}\rrbracket$ and $\llparenthesis\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}\rrparenthesis$ respectively, and we often apply Convention 5.3 to these Mealy machines. Extracting precise definitions from these diagrams would be easy.

7 Adequacy Theorems

Finally, we give our main results. In the proofs of our adequacy theorems, we use logical relations, diagrammatic reasoning about Mealy machines (Proposition 5.8), the domain-theoretic structure of Mealy machines (Proposition 5.1), and the Fubini-Tonelli theorem.

7.1 Sampling-Based Operational Semantics

For a closed term $\mathtt{M}:\mathtt{Real}$ , we define a partial measurable function $\mathfrak{o}(\mathtt{M})\colon\mathbb{R}_{\geq 0}\times\mathbb{T}\to\mathbb{R}_{\geq 0}\times\mathbb{R}$ as follows:

•

for $(a,u)\in\mathbb{R}_{\geq 0}\times\mathbb{T}$ , if there are $s,s^{\prime}\in S_{\llbracket\mathtt{M}\rrbracket}$ such that

[TABLE]

i.e., if we have the following transitions:

[Diagrams: $\llbracket\mathtt{M}\rrbracket$ first moves $s_{\llbracket\mathtt{M}\rrbracket}/s$ on input $(a,u)$ with output $(a^{\prime},\varepsilon)$ on the $\mathsf{S}$ -edge, and then moves $s/s^{\prime}$ on input $(0,\varepsilon)$ with output $(0,b\mathbin{::}\varepsilon)$ on the $\oc\mathsf{R}$ -edge.]

then we define $\mathfrak{o}(\mathtt{M})(a,u)$ to be $(a^{\prime},b)$ ;

•

otherwise, $\mathfrak{o}(\mathtt{M})(a,u)$ is undefined.

Theorem 7.1 (Adequacy).

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ and for any $(a,u)\in\mathbb{R}_{\geq 0}\times\mathbb{T}$ , we have

[TABLE]

Corollary 7.1.

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ , partial functions $\mathtt{weight}(\mathtt{M})\colon\mathbb{R}_{\geq 0}\times\mathbb{T}\to\mathbb{R}_{\geq 0}$ and $\mathtt{val}(\mathtt{M})\colon\mathbb{R}_{\geq 0}\times\mathbb{T}\to\mathbb{R}$ given as follows

[TABLE]

are partial measurable functions.

7.2 Distribution-Based Operational Semantics

For a closed term $\mathtt{M}:\mathtt{Real}$ , we define measurable functions $\mathfrak{o}_{0}(\mathtt{M})\colon\mathbb{T}\to\mathbb{R}_{\geq 0}$ and $\mathfrak{o}_{1}(\mathtt{M})\colon\mathbb{T}\to\mathbb{R}$ by

[TABLE]

Then we define a measure $\mathfrak{O}(\mathtt{M})$ on $\mathbb{R}$ by:

[TABLE]

Theorem 7.2 (Adequacy).

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ , we have $\mathtt{M}\Rightarrow_{\infty}\mathfrak{O}(\mathtt{M}).$

It follows from our adequacy theorems that sampling-based operational semantics induces distribution-based operational semantics.

Corollary 7.2.

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ ,

[TABLE]

A result analogous to Corollary 7.2 has already been proved by way of a purely operational (and quite laborious) argument in an untyped setting where score is not available in its full generality [28]. Here, it is just an easy corollary of our adequacy theorems.

8 Proof of Adequacy Theorems

Lemma 8.1.

For any term $\mathtt{\Delta},\mathtt{x}:\mathtt{A}\vdash\mathtt{M}:\mathtt{B}$ and for any closed value $\vdash\mathtt{V}:\mathtt{A}$ ,

[TABLE]

Proof.

By induction on $\mathtt{M}$ . ∎

Lemma 8.2.

For all closed terms $\mathtt{M},\mathtt{N}:\mathtt{A}$ , if $\mathtt{M}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{N}$ , then $\llbracket\mathtt{M}\rrbracket=\llbracket\mathtt{N}\rrbracket$ .

Proof.

By case analysis. For the case of recursion, see Proposition 9.2 in Section 9. ∎

We first prove soundness.

Proposition 8.1.

For any closed term $\mathtt{M}:\mathtt{Real}$ and for any $(a,u)\in\mathbb{R}_{\geq 0}\times\mathbb{T}$ , if $(\mathtt{M},a,u)\to^{\ast}(b,a^{\prime},\varepsilon)$ , then $\mathfrak{o}(\mathtt{M})(a,u)=(a^{\prime},b)$ .

Proof.

By induction on the length of $\to^{\ast}$ . (Base case) Easy. (Induction step) By case analysis on the first evaluation step of $(\mathtt{M},a,u)\to^{\ast}(b,a^{\prime},\varepsilon)$ .

•

If the first evaluation step is of the form $(\mathtt{E}[\mathtt{N}],a,u)\to(\mathtt{E}[\mathtt{L}],a^{\prime},u^{\prime})$ for some $\mathtt{N}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{L}$ , then by Lemma 8.2, we have $\llbracket\mathtt{E}[\mathtt{N}]\rrbracket=\llbracket\mathtt{E}[\mathtt{L}]\rrbracket$ . Because $(\mathtt{E}[\mathtt{L}],a^{\prime},u^{\prime})\to^{\ast}(b,a^{\prime},\varepsilon)$ , by induction hypothesis, we obtain $\mathfrak{o}(\mathtt{E}[\mathtt{L}])(a^{\prime},u^{\prime})=(a^{\prime},b)$ . Hence, $\mathfrak{o}(\mathtt{E}[\mathtt{N}])(a,u)=\mathfrak{o}(\mathtt{E}[\mathtt{L}])(a^{\prime},u^{\prime})=(a^{\prime},b)$ .

•

If the first evaluation step is of the form $(\mathtt{E}[\mathtt{score}(\mathtt{r}_{c})],a,u)\to(\mathtt{E}[\mathtt{skip}],|c|\,a,u)$ , then by induction hypothesis, we have $\mathfrak{o}(\mathtt{E}[\mathtt{skip}])(|c|\,a,u)=(a^{\prime},b)$ . Therefore, by the definition of the Mealy machine $\mathsf{sc}$ , we see that $\mathfrak{o}(\mathtt{E}[\mathtt{score}(\mathtt{r}_{c})])(a,u)$ is $(a^{\prime},b)$ .

•

If the first evaluation step is of the form $(\mathtt{E}[\mathtt{sample}],a,c\mathbin{::}u)\to(\mathtt{E}[\mathtt{r}_{c}],a,u)$ , then by induction hypothesis, $\mathfrak{o}(\mathtt{E}[\mathtt{r}_{c}])(a,u)$ is $(a^{\prime},b)$ . Therefore, by the definition of the Mealy machine $\mathsf{sa}$ , we see that $\mathfrak{o}(\mathtt{E}[\mathtt{sample}])(a,c\mathbin{::}u)$ is $(a^{\prime},b)$ .

∎
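The two effectful evaluation steps used in this proof can be sketched as a small interpreter over configurations (program, weight, trace). This is only an illustration under simplifying assumptions: programs are modelled as Python generators that request effects, and the names `run` and `example` are hypothetical and not part of the calculus.

```python
def run(program, weight, trace):
    """Run a toy program from the configuration (program, weight, trace).

    `program` is a generator function that yields ("sample",) or
    ("score", c) requests and returns a real number. The result is
    (b, a') when the run ends with the trace fully consumed, and None
    when the machine is stuck or samples are left over.
    """
    trace = list(trace)
    gen = program()
    try:
        request = next(gen)
        while True:
            if request[0] == "score":
                # (E[score(r_c)], a, u) -> (E[skip], |c| a, u)
                weight *= abs(request[1])
                request = gen.send(None)
            elif request[0] == "sample":
                # (E[sample], a, c::u) -> (E[r_c], a, u)
                if not trace:
                    return None  # no sample left: the configuration is stuck
                request = gen.send(trace.pop(0))
            else:
                raise ValueError(f"unknown request {request!r}")
    except StopIteration as done:
        # Only configurations of the form (b, a', epsilon) count as final.
        return (done.value, weight) if not trace else None


def example():
    x = yield ("sample",)
    yield ("score", x)
    y = yield ("sample",)
    return x + y


result = run(example, 1.0, [0.5, 0.25])
```

The interpreter is a partial function of the initial weight and trace, which mirrors the fact that $\mathfrak{o}(\mathtt{M})$ is a partial measurable function.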

It remains to prove that $\mathfrak{o}(\mathtt{M})(a,u)=(a^{\prime},b)$ implies that $(\mathtt{M},a,u)\to^{\ast}(b,a^{\prime},\varepsilon)$ . We use logical relations. We define a binary relation $O$ between closed terms of type $\mathtt{Real}$ and Mealy machines from $\mathsf{I}$ to $\mathsf{S}\otimes\oc\mathsf{R}$ by

[TABLE]

where $o(\mathsf{M})\colon\mathbb{R}_{\geq 0}\times\mathbb{T}\to\mathbb{R}_{\geq 0}\times\mathbb{R}$ is a partial measurable function given by: for each $(a,u)\in\mathbb{R}_{\geq 0}\times\mathbb{T}$ ,

•

if there are states $s,s^{\prime}\in S_{\mathsf{M}}$ such that

[TABLE]

i.e., if we have the following transitions:

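[Diagram: two transitions of $\mathsf{M}$ , whose edges are $\mathsf{S}$ and $\oc\mathsf{R}$ . The first, labeled $s_{\mathsf{M}}/s$ , takes the input $(a,u)$ to the output $(a^{\prime},\varepsilon)$ ; the second, labeled $s/s^{\prime}$ , takes the input $(0,\varepsilon)$ to the output $(0,b\mathbin{::}\varepsilon)$ .]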

then we define $o(\mathsf{M})(a,u)$ to be $(a^{\prime},b)$ ;

•

otherwise, $o(\mathsf{M})(a,u)$ is undefined.

We then inductively define binary relations

[TABLE]

by

[TABLE]

We list some properties of the logical relations.

Lemma 8.3.

Let $\mathtt{A}$ be a type.

1. If $(\mathtt{V},\mathsf{M})\in R_{\mathtt{A}}$ , then $(\mathtt{V},\mathsf{e}\otimes\oc\mathsf{M})\in\overline{R}_{\mathtt{A}}$ .

2. If $(\mathtt{M},\mathsf{M})\in\overline{R}_{\mathtt{A}}$ and $\mathtt{N}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{M}$ , then $(\mathtt{N},\mathsf{M})\in\overline{R}_{\mathtt{A}}$ .

3. If $(\mathtt{M},\mathsf{M})\in\overline{R}_{\mathtt{A}}$ and $\mathtt{M}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{N}$ , then $(\mathtt{N},\mathsf{M})\in\overline{R}_{\mathtt{A}}$ .

4. If $(\mathtt{M},\mathsf{M})\in\overline{R}_{\mathtt{A}}$ and $\mathsf{M}\simeq\mathsf{N}$ , then $(\mathtt{M},\mathsf{N})\in\overline{R}_{\mathtt{A}}$ .

5. For any closed term $\mathtt{M}:\mathtt{A}$ , $(\mathtt{M},\mathsf{bot}_{\mathsf{S}\otimes\oc\llbracket\mathtt{A}\rrbracket})\in\overline{R}_{\mathtt{A}}$ where $\mathsf{bot}_{\mathsf{X}}\colon\mathsf{I}\multimap\mathsf{X}$ is a token machine whose transition function is the empty partial measurable function.

6. For any closed value $\mathtt{V}:\mathtt{A}\to\mathtt{B}$ , $(\mathtt{V},\mathsf{bot}_{\llbracket\mathtt{A}\to\mathtt{B}\rrbracket})\in R_{\mathtt{A}\to\mathtt{B}}$ .

7. If $(\mathtt{M},\mathsf{M}_{i})\in\overline{R}_{\mathtt{A}}$ and $[\mathsf{M}_{1}]\leq[\mathsf{M}_{2}]\leq\cdots$ , then $(\mathtt{M},\mathsf{N})\in\overline{R}_{\mathtt{A}}$ where $[\mathsf{N}]$ is the least upper bound of the $\omega$ -chain $[\mathsf{M}_{1}]\leq[\mathsf{M}_{2}]\leq\cdots$ .

Proof.

We can check these items by unfolding the definition of $O$ and the logical relations. ∎

Lemma 8.4 (Basic Lemma).

Let $\mathtt{\Delta}=(\mathtt{x}_{1}:\mathtt{A}_{1},\ldots,\mathtt{x}_{n}:\mathtt{A}_{n})$ be a context.

•

For any term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in R_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

•

For any value $\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in R_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

Proof.

By induction on $\mathtt{M}$ and $\mathtt{V}$ . Most cases follow from Lemma 8.3. For $\mathtt{M}=\mathtt{sample}$ and $\mathtt{M}=\mathtt{score}(\mathtt{V})$ , we check the statement by unfolding the definition of $\mathsf{sa}$ and $\mathsf{sc}$ . Here, we only check for $\mathtt{M}=\mathtt{sample}$ and $\mathtt{M}=\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{N})$ .

•

When $\mathtt{M}=\mathtt{sample}$ , for any $(\mathtt{E},\mathsf{E})$ in $R_{\mathtt{Real}}^{\top}$ , if

[TABLE]

then by the definition of $\mathsf{sa}$ , we see that $u=c\mathbin{::}u^{\prime}$ for some $c\in\mathbb{R}_{[0,1]}$ and $u^{\prime}\in\mathbb{T}$ such that

[TABLE]

Because $(\mathtt{E},\mathsf{E})\in R_{\mathtt{Real}}^{\top}$ , we obtain $(\mathtt{E}[\mathtt{r}_{c}],a,u^{\prime})\to^{\ast}(b,a^{\prime},\varepsilon)$ . Hence,

[TABLE]

•

When $\mathtt{M}=\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{N})$ , for simplicity, we suppose that $\mathtt{M}$ is a closed term. By induction hypothesis, we can check that

[TABLE]

by induction on $n$ . Because $[\llbracket\mathtt{M}\rrbracket]$ is the least upper bound of $\llbracket\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{N}\rrbracket\circ\oc\llbracket\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{N}\rrbracket\circ\cdots\circ\oc^{k}\llbracket\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{N}\rrbracket\circ\mathsf{bot}_{\mathsf{I},\oc^{k}\llbracket\mathtt{A}\to\mathtt{B}\rrbracket}$ (Proposition 9.1), we obtain $(\mathtt{M},\llbracket\mathtt{M}\rrbracket)\in R_{\mathtt{A}\to\mathtt{B}}$ by Lemma 8.3.

∎

Theorem 8.1.

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ and for any $(a,u)\in\mathbb{R}_{\geq 0}\times\mathbb{T}$ , we have

[TABLE]

Proof.

If $(\mathtt{M},a,u)\to^{\ast}(b,a^{\prime},\varepsilon)$ , then we have $\mathfrak{o}(\mathtt{M})(a,u)=(a^{\prime},b)$ by Proposition 8.1. If $\mathfrak{o}(\mathtt{M})(a,u)=(a^{\prime},b)$ , then because $([-],\mathsf{e}\otimes\mathsf{id}_{\oc\mathsf{R}})$ is an element of $R_{\mathtt{Real}}^{\top}$ , we obtain $(\mathtt{M},a,u)\to^{\ast}(b,a^{\prime},\varepsilon)$ by Lemma 8.4. ∎

9 Approximation Lemma

Let $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ be a Mealy machine. In this section, we give a way to calculate a Mealy machine $\mathsf{M}^{\dagger}\colon\mathsf{I}\to\oc\mathsf{X}$ given by

[TABLE]

Diagrammatically, $\mathsf{M}^{\dagger}$ consists of digging, contraction and a feedback loop:

[Diagram: the boxes $\oc\mathsf{M}$ , $\mathsf{dg}$ , $\mathsf{c}$ and $\mathsf{M}$ connected in a feedback loop, with edges labeled $\oc\oc\mathsf{X}$ , $\oc\mathsf{X}$ and $\mathsf{X}$ .]

This construction already appeared in the interpretation of the fixed point operator. In fact, for a term $\mathtt{f}:\mathtt{A}\to\mathtt{B},\mathtt{x}:\mathtt{A}\vdash\mathtt{M}:\mathtt{B}$ , we have $\llbracket\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{M})\rrbracket=\llbracket\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{M}\rrbracket^{\dagger}$ .

The goal of this section is to show that $\mathsf{M}^{\dagger}$ is a fixed point of $\mathsf{M}$ and can be approximated by a family of Mealy machines

[TABLE]

9.0.1 Parametrized Modal Operator and Parametrized Loop Operator

We introduce parametrization of the modal operator $\oc$ . For a subset $\alpha\subseteq\mathbb{N}$ and for a Mealy machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ , we define a Mealy machine $\oc_{\alpha}\mathsf{M}\colon\oc\mathsf{X}\multimap\oc\mathsf{Y}$ by: the state space and the initial state of $\oc_{\alpha}\mathsf{M}$ are given by

[TABLE]

and $\tau_{\oc_{\alpha}\mathsf{M}}$ is given by

[TABLE]

where $i,j\in\{0,1\}$ and $z,w$ vary over the corresponding sets. For example, if we have

[Diagram: a transition of $\mathsf{M}$ , labeled $s/s^{\prime}$ , with token $x$ on the $\mathsf{X}$ -edge and token $y$ on the $\mathsf{Y}$ -edge.],

then for any $t_{1},t_{2},\ldots\in S_{\mathsf{M}}$ and for any $n\in\mathbb{N}$ , we have

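[Diagram: a transition of the box labeled $\oc\mathsf{M}$ , from the state $(t_{1},\ldots,t_{n-1},s,t_{n+1},\ldots)$ to the state $(t_{1},\ldots,t_{n-1},s^{\prime},t_{n+1},\ldots)$ , with token $(n,x)$ on the $\mathsf{X}$ -edge and token $(n,y)$ on the $\mathsf{Y}$ -edge.]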

whenever $n\in\alpha$ . When $n\notin\alpha$ , there is no output from $\oc_{\alpha}\mathsf{M}$ . We can think of $\oc_{\alpha}\mathsf{M}$ as a “restriction” of $\oc\mathsf{M}$ to $\alpha$ . In fact, $\oc\mathsf{M}$ is equal to $\oc_{\mathbb{N}}\mathsf{M}$ .
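The behaviour just described can be sketched for deterministic machines as follows. This is a simplified illustration, not the formal definition: a machine is assumed to be a Python function from (state, input) to (new state, output) or `None`, the subset $\alpha$ is given as a predicate, and the countable family of states is stored lazily in a dictionary. The names `bang` and `counter` are hypothetical.

```python
def bang(step, initial, alpha):
    """Return (transition, initial_state) for a toy version of !_alpha M.

    `step(state, x)` returns (new_state, y), or None when undefined.
    `alpha(n)` decides whether the n-th copy of the machine is active.
    An index missing from the dictionary is still in the initial state.
    """
    def banged(states, token):
        n, x = token
        if not alpha(n):
            return None  # no output for indices outside alpha
        result = step(states.get(n, initial), x)
        if result is None:
            return None
        new_state, y = result
        updated = dict(states)
        updated[n] = new_state  # only the n-th copy changes state
        return updated, (n, y)

    return banged, {}


def counter(state, x):
    """Hypothetical machine: tags each input with the number of earlier inputs."""
    return state + 1, (x, state)


banged, s0 = bang(counter, 0, lambda n: n % 2 == 0)
s1, out1 = banged(s0, (2, "a"))
s2, out2 = banged(s1, (2, "b"))
s3, out3 = banged(s2, (4, "c"))
blocked = banged(s2, (3, "c"))
```

Taking `alpha` to be the constantly true predicate gives the unrestricted behaviour, in line with the remark that $\oc\mathsf{M}$ equals $\oc_{\mathbb{N}}\mathsf{M}$ .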

We are interested in restrictions of $\oc$ to subsets $\alpha_{n},\beta_{n}\subseteq\mathbb{N}$ inductively given by

[TABLE]

The definitions of $\alpha_{n}$ and $\beta_{n}$ are motivated by the following lemma.

Lemma 9.1.

For any $n\in\mathbb{N}$ and for any $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ , we have

[TABLE]

By means of $\oc_{\alpha}$ , we can also parametrize the operator $(-)^{\dagger}$ . For $\alpha\subseteq\mathbb{N}$ , and for $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ , we define $\mathsf{M}^{\dagger,\alpha}\colon\mathsf{I}\to\oc\mathsf{X}$ by

[TABLE]

It is easy to see that $\mathsf{M}^{\dagger}$ is equal to $\mathsf{M}^{\dagger,\mathbb{N}}$ .

Lemma 9.2.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ , we have

[TABLE]

Proof.

See Figure 5. ∎

Lemma 9.3.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ and for any $n\in\mathbb{N}$ ,

[TABLE]

Proof.

We prove the statement by induction on $n$ . The base case follows from the fact that the transition functions of $(\oc\mathsf{M})^{\dagger,\alpha_{0}}$ and $\oc(\mathsf{M}^{\dagger,\alpha_{0}})$ are equal to the empty partial function. We next check the induction step. We have

[TABLE]

∎

Proposition 9.1.

For a Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ , we inductively define $\mathsf{iter}_{n}(\mathsf{M})\colon\mathsf{I}\multimap\mathsf{X}$ by

[TABLE]

For all $n\in\mathbb{N}$ , we have

[TABLE]

and

[TABLE]

Hence, we have an ascending chain

[TABLE]

and $[\mathsf{M}^{\dagger}]$ is the least upper bound of the ascending chain $[\mathsf{iter}_{n}(\mathsf{M})]$ .

Proof.

By the definition of $\oc_{\alpha_{n}}$ , we have

[TABLE]

for all $n\in\mathbb{N}$ , and

[TABLE]

Hence, by the definition of the composition and the monoidal product, we have

[TABLE]

for all $n\in\mathbb{N}$ , and

[TABLE]

It remains to check $\mathsf{iter}_{n}(\mathsf{M})\simeq\mathsf{M}^{\dagger,\alpha_{n}}$ . We show this by induction on $n$ . For the base case, we have $\mathsf{M}^{\dagger,\emptyset}\simeq\mathsf{iter}_{0}(\mathsf{M})$ because these Mealy machines $\mathsf{M}^{\dagger,\emptyset}$ and $\mathsf{iter}_{0}(\mathsf{M})$ are behaviorally equivalent to $\mathsf{bot}_{\mathsf{I},\mathsf{X}}$ . For the induction step,

[TABLE]

Because $\mathsf{iter}_{n+1}(\mathsf{M})$ is equal to $\mathsf{M}\circ\oc(\mathsf{iter}_{n}(\mathsf{M}))$ , we obtain $\mathsf{iter}_{n+1}(\mathsf{M})\simeq\mathsf{M}^{\dagger,\alpha_{n+1}}$ . ∎
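The shape of this approximation argument is the familiar one for least fixed points: start from the everywhere-undefined behaviour and unfold the body finitely many times. The sketch below is only an analogy in the setting of partial functions on natural numbers, not a model of Mealy machines; `bot`, `body` and `iterate` are hypothetical names, and `None` stands for "undefined".

```python
def bot(_n):
    """The everywhere-undefined partial function."""
    return None


def body(rec):
    """One unfolding of a recursive definition (here: factorial),
    where `rec` is the previous approximant."""
    def unfolded(n):
        if n == 0:
            return 1
        prev = rec(n - 1)
        return None if prev is None else n * prev
    return unfolded


def iterate(k):
    """The k-th approximant: `body` applied k times to `bot`."""
    f = bot
    for _ in range(k):
        f = body(f)
    return f


third = iterate(3)  # defined exactly on the inputs 0, 1 and 2
```

Each approximant extends the previous one, so the approximants form an ascending chain whose least upper bound is the total factorial function, in the same way that $[\mathsf{M}^{\dagger}]$ is the least upper bound of the chain $[\mathsf{iter}_{n}(\mathsf{M})]$ .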

Proposition 9.2.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\multimap\mathsf{X}$ ,

[TABLE]

Proof.

Because $\oc(-)$ and the composition of Mealy machines are continuous, we have

[TABLE]

∎

10 How About S-Finite Kernels?

Readers experienced with the semantics of probabilistic programming languages have probably already wondered whether a GoI model for $\mathbf{PCFSS}$ could be given out of s-finite kernels instead of measurable functions, following Staton’s work on the semantics of a first-order probabilistic programming language [32].

The answer is indeed positive: the kind of construction we have presented in Section 5 can in fact be adapted to the category of measurable spaces and s-finite kernels. The latter, being traced monoidal, has all the necessary structure one needs [27]. What one obtains proceeding this way is indeed a GoI model, but adequate only for the distribution-based operational semantics.

The interpretation of any program in this alternative GoI can be seen as structurally identical to the one from Section 5 once the sample and score operators are interpreted as usual, namely as those s-finite kernels which actually perform sampling and scoring internally. Below, we first recall the definition of s-finite kernel, and then we introduce Mealy machines whose transition is described in terms of an s-finite kernel, and we give some basic Mealy machines. Finally, we give an adequate GoI model for the distribution-based operational semantics.

Being adequate for the distribution-based semantics directly (and not by way of integration as in Theorem 7.2) has the pleasant consequence of validating a number of useful program transformations, in particular the commutation of sampling and scoring effects. See [28] for a thorough discussion of this topic, and of how s-finite kernels are a particularly nice way of achieving commutativity in the presence of scoring.

10.1 S-finite Kernels

Let $k\colon X\leadsto Y$ be a kernel. We say that $k$ is finite when there is a real number $c>0$ such that for all $x\in X$ and $A\in\Sigma_{Y}$ , we have $k(x,A)<c$ . An s-finite kernel is a kernel $k\colon X\leadsto Y$ such that there is a countable family $\{k_{n}\colon X\leadsto Y\}$ of finite kernels such that $k(x,A)=\sum_{n\in\mathbb{N}}k_{n}(x,A)$ for all $x\in X$ and $A\in\Sigma_{Y}$ . It is easy to see that s-finite kernels are closed under pointwise addition. We write $\sum_{i\in I}k_{i}\colon X\leadsto Y$ for the pointwise addition of s-finite kernels $k_{i}\colon X\leadsto Y$ . A (sub)probability kernel is a kernel $k\colon X\leadsto Y$ such that $k(x,-)$ is a (sub)probability measure on $Y$ for all $x\in X$ . Every (sub)probability kernel is a finite kernel.

Every measurable function $f\colon X\to Y$ gives rise to a probability kernel $\hat{f}\colon X\leadsto Y$ given by

[TABLE]

We denote the probability kernel induced by the identity measurable function by $\mathrm{id}_{X}\colon X\leadsto X$ . Concretely, this is given by $\mathrm{id}_{X}(x,A)=[x\in A]$ .

We recall two constructions of s-finite kernels.

•

(Composition) For s-finite kernels $k\colon X\leadsto Y$ and $h\colon Y\leadsto Z$ , we define an s-finite kernel $h\circ k\colon X\leadsto Z$ by

[TABLE]

The composition of s-finite kernels is associative and satisfies the unit laws, namely, we have $k\circ\mathrm{id}_{X}=k$ and $\mathrm{id}_{Y}\circ k=k$ .

•

(Tensor product) For s-finite kernels $k\colon X\leadsto Y$ and $h\colon Z\leadsto W$ , we define an s-finite kernel $k\otimes h\colon X\times Z\leadsto Y\times W$ to be the unique s-finite kernel such that for all $(x,z)\in X\times Z$ and for all $A\in\Sigma_{Y}$ and $B\in\Sigma_{W}$ ,

[TABLE]
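For intuition, both constructions can be tried out on kernels with finite support, where integration reduces to a finite sum. The sketch below makes that simplifying assumption: a kernel is a Python function sending a point to a dictionary of weights, composition sums over the intermediate point, and the tensor multiplies weights componentwise. All names (`compose`, `tensor`, `identity`, `coin`, `scale`) are hypothetical.

```python
from itertools import product


def compose(h, k):
    """(h . k)(x)[z] = sum over y of k(x)[y] * h(y)[z]."""
    def hk(x):
        out = {}
        for y, w in k(x).items():
            for z, v in h(y).items():
                out[z] = out.get(z, 0.0) + w * v
        return out
    return hk


def tensor(k, h):
    """(k (x) h)((x, z))[(y, w)] = k(x)[y] * h(z)[w]."""
    def kh(pair):
        x, z = pair
        return {(y, w): p * q
                for (y, p), (w, q) in product(k(x).items(), h(z).items())}
    return kh


def identity(x):
    return {x: 1.0}  # the Dirac measure at x


def coin(_x):
    return {"H": 0.5, "T": 0.5}  # a probability kernel


def scale(x):
    return {"*": abs(x)}  # total mass |x|: not a (sub)probability kernel


left_unit = compose(coin, identity)(0)
right_unit = compose(identity, coin)(0)
joint = tensor(coin, scale)((0, 3.0))
```

The kernel `scale` has unbounded total mass as its argument varies, which is the kind of behaviour that motivates working with s-finite rather than finite kernels when scoring is present.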

The tensor product and the coproduct of s-finite kernels are functorial. This means that these constructors are compatible with the composition and preserve identities. The following proposition summarizes the categorical status of these structures.

Proposition 10.1.

The category of measurable spaces and s-finite kernels with $\otimes$ forms a symmetric monoidal category where the unit object is $1$ . The object $\emptyset$ is the zero object, and $X+Y$ with

[TABLE]

forms the coproduct of $X$ and $Y$ where $\mathrm{inl}_{X,Y}\colon X\to X+Y$ and $\mathrm{inr}_{X,Y}\colon Y\to X+Y$ are the first and the second injections. Furthermore, the monoidal product distributes over the coproducts. Namely, the canonical s-finite kernel $\widehat{\mathrm{dst}_{X,Y,Z}}\colon X\times Z+Y\times Z\leadsto(X+Y)\times Z$ given by

[TABLE]

is a natural isomorphism.

Proof.

For associativity of the composition, see [32, Lemma 3], and for functoriality of $\otimes$ , see [32, Proposition 5]. It is not difficult to check that the category of measurable spaces and s-finite kernels associated with $\otimes$ and $1$ forms a symmetric monoidal category. For s-finite kernels $f\colon X\leadsto Z$ and $g\colon Y\leadsto Z$ , the cotupling $[f,g]\colon X+Y\leadsto Z$ is given by

[TABLE]

It follows from universality of coproducts that $\widehat{\mathrm{dst}_{X,Y,Z}}$ is a natural isomorphism. ∎

For s-finite kernels $k\colon X\leadsto Y$ and $h\colon Z\leadsto W$ , we define an s-finite kernel $k\oplus h\colon X+Z\leadsto Y+W$ by

[TABLE]

This is the unique s-finite kernel satisfying

[TABLE]

10.2 Probabilistic Mealy Machine

Definition 10.1.

For $\mathbf{Int}$ -objects $\mathsf{X}$ and $\mathsf{Y}$ , a probabilistic Mealy machine $\mathsf{M}$ from $\mathsf{X}$ to $\mathsf{Y}$ consists of

•

a measurable space $S_{\mathsf{M}}$ called the state space of $\mathsf{M}$ ;

•

an element $s_{\mathsf{M}}\in S_{\mathsf{M}}$ called the initial state of $\mathsf{M}$ ;

•

an s-finite kernel $\tau_{\mathsf{M}}\colon(X^{+}+Y^{-})\times S_{\mathsf{M}}\leadsto(Y^{+}+X^{-})\times S_{\mathsf{M}}$ called the transition relation.

When $\mathsf{M}$ is a probabilistic Mealy machine from $\mathsf{X}$ to $\mathsf{Y}$ , we write $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ .

We can regard a Mealy machine $\mathsf{M}\colon\mathsf{X}\multimap\mathsf{Y}$ as a probabilistic Mealy machine from $\mathsf{X}$ to $\mathsf{Y}$ by identifying the transition function $\tau_{\mathsf{M}}\colon(X^{+}+Y^{-})\times S_{\mathsf{M}}\to(Y^{+}+X^{-})\times S_{\mathsf{M}}$ with the corresponding s-finite kernel $\widehat{\tau_{\mathsf{M}}}\colon(X^{+}+Y^{-})\times S_{\mathsf{M}}\leadsto(Y^{+}+X^{-})\times S_{\mathsf{M}}$ . In the sequel, we do not distinguish Mealy machines (and token machines) from the corresponding probabilistic Mealy machines.

Let $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ and $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ be $\mathbf{Int}$ -objects. Just like Mealy machines, we depict a probabilistic Mealy machine $\mathsf{M}$ from $\mathsf{X}_{1}\otimes\cdots\otimes\mathsf{X}_{n}$ to $\mathsf{Y}_{1}\otimes\cdots\otimes\mathsf{Y}_{m}$ as a box with edges labeled by $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ on the left hand side and edges labeled by $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ on the right hand side:

[Diagram: a box $\mathsf{M}$ with edges $\mathsf{X}_{1},\ldots,\mathsf{X}_{n}$ on the left and edges $\mathsf{Y}_{1},\ldots,\mathsf{Y}_{m}$ on the right.],

and we depict transitions as arrows. For example, when $n=m=1$ , we depict

[TABLE]

for $y\in Y_{1}^{-}$ , $x\in X_{1}^{-}$ and $s,s_{1},s_{2}\in S_{\mathsf{M}}$ as the following arrow

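[Diagram: an arrow through $\mathsf{M}$ , labeled with the state change $s/s_{1}$ and the probability $0.4$ , from the token $y$ on the $\mathsf{Y}_{1}$ -edge to the token $x$ on the $\mathsf{X}_{1}$ -edge.]

where the positive real number on the arrow indicates the probability of the transition. Below, we may omit states and probabilities of transitions when they are not important or are easy to infer.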

10.3 Behavioral Equivalence

We give an equivalence relation between probabilistic Mealy machines so as to identify probabilistic Mealy machines that behave in the same way. Let $\mathsf{X}$ and $\mathsf{Y}$ be $\mathbf{Int}$ -objects, and let $\mathsf{M}$ and $\mathsf{N}$ be probabilistic Mealy machines from $\mathsf{X}$ to $\mathsf{Y}$ . We write $\mathsf{M}\sim_{\mathsf{X},\mathsf{Y}}\mathsf{N}$ when there is a measurable function $f\colon S_{\mathsf{M}}\to S_{\mathsf{N}}$ such that $f(s_{\mathsf{M}})=s_{\mathsf{N}}$ and the following diagram commutes:

[TABLE]

We define an equivalence relation $\simeq_{\mathsf{X},\mathsf{Y}}$ to be the symmetric transitive closure of $\sim_{\mathsf{X},\mathsf{Y}}$ . A probabilistic Mealy machine $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ is behaviorally equivalent to $\mathsf{N}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ when we have $\mathsf{M}\simeq_{\mathsf{X},\mathsf{Y}}\mathsf{N}$ . When we can infer subscripts of $\simeq_{\mathsf{X},\mathsf{Y}}$ , we omit them. We say that a measurable function $f\colon S_{\mathsf{M}}\to S_{\mathsf{N}}$ realizes a behavioral equivalence $\mathsf{M}\simeq\mathsf{N}$ (realizes $\mathsf{M}\sim\mathsf{N}$ ) when $\mathsf{M}\sim\mathsf{N}$ is witnessed by $f$ .

10.4 Construction of Probabilistic Mealy Machines

We introduce probabilistic Mealy machines and their constructions that are building blocks of our denotational semantics. Most of them are adaptations of Mealy machines in Section 5.4, and we just give their formal definitions.

10.4.1 Composition/Cut

For probabilistic Mealy machines $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ and $\mathsf{N}\colon\mathsf{Y}\rightarrowtriangle\mathsf{Z}$ , we define the state space and the initial states of $\mathsf{N}\circ\mathsf{M}$ by $S_{\mathsf{N}\circ\mathsf{M}}=S_{\mathsf{M}}\times S_{\mathsf{N}}$ , $s_{\mathsf{N}\circ\mathsf{M}}=(s_{\mathsf{M}},s_{\mathsf{N}})$ and we define the transition relation $\tau_{\mathsf{N}\circ\mathsf{M}}$ by

[TABLE]

where $k_{A,B,C,D}\colon(A+B)\times S_{\mathsf{N}\circ\mathsf{M}}\leadsto(C+D)\times S_{\mathsf{N}\circ\mathsf{M}}$ are restrictions of the following s-finite kernel

[TABLE]

namely, the s-finite kernels $k_{A,B,C,D}$ satisfy

[TABLE]

Here, the horizontal arrows consist of the injection from $A$ into $X^{+}+Y^{-}+Y^{+}+Z^{-}$ followed by distributivity and symmetry. For example, when $A=X^{+}+Z^{-}$ , the upper horizontal arrow is given by

[TABLE]

Joins in the definition of the composition of probabilistic Mealy machines are taken with respect to the pointwise order. We can check that the composition of probabilistic Mealy machines is compatible with behavioural equivalence and that $\mathbf{Int}$ -objects and probabilistic Mealy machines with this composition form a category where the identity on an $\mathbf{Int}$ -object $\mathsf{X}$ is $\mathsf{id}_{\mathsf{X}}\colon\mathsf{X}\rightarrowtriangle\mathsf{X}$ (regarded as a probabilistic Mealy machine).

10.4.2 Monoidal Products

We give monoidal products of probabilistic Mealy machines. For probabilistic Mealy machines $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Z}$ and $\mathsf{N}\colon\mathsf{Y}\rightarrowtriangle\mathsf{W}$ , we define a probabilistic Mealy machine $\mathsf{M}\otimes\mathsf{N}\colon\mathsf{X}\otimes\mathsf{Y}\rightarrowtriangle\mathsf{Z}\otimes\mathsf{W}$ by: $S_{\mathsf{M}\otimes\mathsf{N}}=S_{\mathsf{M}}\times S_{\mathsf{N}}$ , $s_{\mathsf{M}\otimes\mathsf{N}}=(s_{\mathsf{M}},s_{\mathsf{N}})$ and $\tau_{\mathsf{M}\otimes\mathsf{N}}$ is given by

[TABLE]

It is not difficult to check that the monoidal product is compatible with behavioural equivalence. We depict $\mathsf{M}\otimes\mathsf{N}\colon(\mathsf{X}\otimes\mathsf{Y})\rightarrowtriangle(\mathsf{Z}\otimes\mathsf{W})$ as follows:

[Diagram: the box $\mathsf{M}$ with edges $\mathsf{X}$ and $\mathsf{Z}$ drawn in parallel with the box $\mathsf{N}$ with edges $\mathsf{Y}$ and $\mathsf{W}$ .]

For $\mathsf{unit}_{\mathsf{X}},\mathsf{counit}_{\mathsf{X}}$ and $\mathsf{sym}_{\mathsf{X},\mathsf{Y}}$ , we adopt the same diagrammatic presentation.

10.4.3 A Modal Operator

Let $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ be a probabilistic Mealy machine. We define a probabilistic Mealy machine $\oc\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\oc\mathsf{Y}$ by: the state space of $\oc\mathsf{M}$ is defined to be $|\mathsf{M}|^{\mathbb{N}}$ associated with the least $\sigma$ -algebra such that for all $A_{1},A_{2},\ldots\in\Sigma_{\mathsf{M}}$ ,

[TABLE]

the initial state $s_{\oc\mathsf{M}}$ is $(s_{\mathsf{M}},s_{\mathsf{M}},\ldots)$ ; the transition relation $\tau_{\oc\mathsf{M}}$ is the unique s-finite kernel satisfying

[TABLE]

for all $n\in\mathbb{N}$ . Here, $\mathrm{inj}_{n}\colon(-)\to\mathbb{N}\times(-)$ are the $n$ th injections, and $\mathrm{ins}_{n}\colon S_{\mathsf{M}}\times S_{\mathsf{M}}^{\mathbb{N}}\to S_{\mathsf{M}}^{\mathbb{N}}$ sends $(s,\{s_{n}\}_{n\in\mathbb{N}})$ to $(s_{0},\ldots,s_{n-1},s,s_{n},s_{n+1},\ldots)$ .

10.4.4 Diagrammatic Reasoning on Probabilistic Mealy Machines

Diagrammatic reasoning is valid also for probabilistic Mealy machines.

Proposition 10.2.

The category $\mathbf{pMealy}$ of $\mathbf{Int}$ -objects and probabilistic Mealy machines (modulo behavioural equivalence) is a compact closed category. The dual of an $\mathbf{Int}$ -object $\mathsf{X}$ is $\mathsf{X}^{\bot}$ . The unit and the counit arrows are $\mathsf{unit}_{\mathsf{X}}$ and $\mathsf{counit}_{\mathsf{X}}$ .

Proposition 10.3.

If two probabilistic Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.

We can check Proposition 10.2 by replacing the category of partial measurable functions by the category of s-finite kernels in Section 5.6.

10.4.5 A State Monad

We define an $\mathbf{Int}$ -object $\mathsf{J}$ by $(1,1)$ and define an $\mathbf{Int}$ -object $\mathsf{J}_{0}$ by $(1,\emptyset)$ . Then $\mathsf{J}\otimes(-)$ is a state monad (on $\mathbf{pMealy}$ ), whose unit and multiplication are given by:

[TABLE]

where $\mathsf{j}=\mathsf{unit}_{\mathsf{J}_{0}}$ and $\mathsf{n}=\mathsf{J}_{0}\otimes\mathsf{counit}_{\mathsf{J}_{0}}\otimes\mathsf{J}_{0}^{\bot}$ .

10.4.6 Scoring

We construct a probabilistic Mealy machine $\mathsf{Sc}\colon\mathsf{R}\rightarrowtriangle\mathsf{J}$ by:

[TABLE]

and

[TABLE]

The probabilistic Mealy machine simulates scoring $\mathtt{score}(\mathtt{r}_{a})$ as follows:

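[Diagram: $\mathsf{r}_{a}$ connected to $\mathsf{Sc}$ through the $\mathsf{R}$ -edge. The tokens shown are $\ast$ entering and $\ast$ leaving on the $\mathsf{J}$ -edge, $\varepsilon$ and $a\mathbin{::}\varepsilon$ on the $\mathsf{R}$ -edge, and the weight $|a|$ on the transition.]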

10.4.7 Sampling

We define a probabilistic Mealy machine $\mathsf{Sa}\colon\mathsf{I}\rightarrowtriangle\mathsf{J}\otimes\oc\mathsf{R}$ by: the state space $S_{\mathsf{Sa}}$ is defined to be $\{\ast\}\cup\mathbb{R}_{[0,1]}$ , and the initial state $s_{\mathsf{Sa}}$ is $\ast$ , and the transition relation

[TABLE]

is given by

[TABLE]

The probabilistic Mealy machine behaves as follows:

•

In the initial state $\ast$ , given $\ast$ from the $\mathsf{J}$ -edge, $\mathsf{Sa}$ draws a real number from the uniform distribution and stores the real number:

[Diagram: a transition of $\mathsf{Sa}$ , labeled $\ast/a$ , with input $\ast$ and output $\ast$ on the $\mathsf{J}$ -edge.]

For example, the probability of the state being a real number in $[0,0.3]$ after this transition is $0.3$ .

•

After this transition, $\mathsf{Sa}$ returns $(n,a\mathbin{::}u)$ to each “query” $(n,u)$ :

[Diagram: a transition of $\mathsf{Sa}$ , labeled $a/a$ , with input $(n,u)$ and output $(n,a\mathbin{::}u)$ on the $\oc\mathsf{R}$ -edge.]
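The two phases of this behaviour, one draw followed by repeated answers with the stored value, can be sketched as a small stateful object. This is an informal illustration rather than the kernel-level definition: the class name `Sa`, the method names `on_J` and `on_query`, and the choice to return `None` for inputs not covered by the description above are all assumptions, and `random.Random.random` draws from $[0,1)$ rather than $[0,1]$ .

```python
import random


class Sa:
    """Toy version of the sampling machine; the state is '*' until the draw."""

    def __init__(self, rng):
        self.rng = rng
        self.state = "*"

    def on_J(self, token):
        # In the initial state '*', input '*' on the J-edge triggers a single
        # uniform draw, which is stored as the new state.
        if self.state == "*" and token == "*":
            self.state = self.rng.random()
            return "*"
        return None  # assumption: no other J-transition is modelled

    def on_query(self, token):
        # After the draw, every query (n, u) is answered with (n, a :: u).
        if self.state == "*":
            return None
        n, u = token
        return (n, [self.state] + list(u))


machine = Sa(random.Random(0))
before = machine.on_query((0, []))
started = machine.on_J("*")
first = machine.on_query((0, []))
second = machine.on_query((5, [0.1]))
```

Storing the drawn number in the state is what makes all copies of the result agree: every query index $n$ sees the same value $a$ .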

11 Probabilistic Mealy Machine Semantics for $\mathbf{PCFSS}$

We interpret a type $\mathtt{A}$ as the $\mathbf{Int}$ -object $\llbracket\mathtt{A}\rrbracket_{\mathrm{d}}$ given by

[TABLE]

We define interpretation of contexts by

[TABLE]

When $\mathtt{\Delta}$ is the empty sequence, we define $\llbracket\mathtt{\Delta}\rrbracket_{\mathrm{d}}$ to be $\mathsf{I}$ .

We interpret terms $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ and values $\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}$ by

[TABLE]

inductively defined by diagrams in Figure 6 where Mealy machines are regarded as probabilistic Mealy machines in the obvious manner.

11.1 Soundness and Adequacy

11.1.1 Observation

Let $\mathsf{M}\colon\mathsf{I}\rightarrowtriangle\mathsf{J}\otimes\oc\mathsf{R}$ be a probabilistic Mealy machine. We define s-finite kernels $t_{0}^{\mathsf{M}}\colon 1\leadsto S_{\mathsf{M}}$ and $t_{1}^{\mathsf{M}}\colon S_{\mathsf{M}}\leadsto\mathbb{R}$ by

[TABLE]

Then we define a measure $\mathsf{obs}(\mathsf{M})$ on $\mathbb{R}$ to be $(t_{1}^{\mathsf{M}}\circ t_{0}^{\mathsf{M}})(\ast,-)$ . Intuitively, $\mathsf{obs}(\mathsf{M})$ is a measure that describes the distribution of real numbers obtained by the following process:

•

We first input $\ast$ to the $\mathsf{J}$ -wire of $\mathsf{M}$ .

•

If $\mathsf{M}$ outputs $\ast$ to the $\mathsf{J}$ -wire, then we input $(0,\varepsilon)$ to the $\oc\mathsf{R}$ -wire of $\mathsf{M}$ .

•

We only observe outputs of the form $(0,a\mathbin{::}\varepsilon)$ for some $a\in\mathbb{R}$ .

For example, $\mathsf{obs}(\mathsf{Sa})$ is the uniform distribution over $\mathbb{R}_{[0,1]}$ .
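The observation process can be mimicked for a machine with finitely many reachable states, where the two kernels become finite tables. The sketch below is a hypothetical example, not one taken from the semantics: `t0` gives the weight of reaching each state once $\ast$ has been answered on the $\mathsf{J}$ -wire, `t1` gives the answer to the query $(0,\varepsilon)$ as a measure on real numbers, and `obs` composes the two at the point $\ast$ .

```python
def obs(t0, t1):
    """Compose t1 after t0 at the point '*', giving a finite-support
    measure on real numbers as a dictionary from values to weights."""
    result = {}
    for state, p in t0("*").items():
        for value, q in t1(state).items():
            result[value] = result.get(value, 0.0) + p * q
    return result


def t0(_star):
    # Hypothetical machine: it scores by 0.5 and then branches with
    # probabilities 0.4 and 0.6 before answering on the J-wire.
    return {"left": 0.5 * 0.4, "right": 0.5 * 0.6}


def t1(state):
    # Answer to the query (0, epsilon) on the !R-wire in each state.
    return {"left": {1.0: 1.0}, "right": {2.0: 1.0}}[state]


measure = obs(t0, t1)
total_mass = sum(measure.values())
```

The total mass of the observed measure is the accumulated score, so it need not equal one.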

Theorem 11.1 (Soundness and Adequacy).

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ , if $\mathtt{M}\Rightarrow_{\infty}\mu$ , then $\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})=\mu$ .

Below, we give a proof of Theorem 11.1.

11.1.2 Proof of Adequacy Theorem

Lemma 11.1.

For any term $\mathtt{\Delta},\mathtt{x}:\mathtt{A}\vdash\mathtt{M}:\mathtt{B}$ and for any closed value $\vdash\mathtt{V}:\mathtt{A}$ ,

[TABLE]

Proof.

By induction on $\mathtt{M}$ . ∎

Lemma 11.2.

For all closed terms $\mathtt{M},\mathtt{N}:\mathtt{A}$ , if $\mathtt{M}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{N}$ , then $\llbracket\mathtt{M}\rrbracket_{\mathrm{d}}=\llbracket\mathtt{N}\rrbracket_{\mathrm{d}}$ .

Proof.

By case analysis. For the case of recursion, see Corollary 11.1. ∎

We first prove soundness.

Proposition 11.1.

For any closed term $\mathtt{M}:\mathtt{Real}$ , if $\mathtt{M}\Rightarrow_{n}\mu$ , then $\mu\leq\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})$ .

Proof.

By induction on $n$ . (Base case) Easy. (Induction step) By case analysis.

•

If $\mathtt{M}=\mathtt{r}_{a}$ , then $\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})=\delta_{a}$ .

•

If $\mathtt{M}=\mathtt{E}[\mathtt{N}]$ and $\mathtt{N}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{L}$ , then $\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})=\mathsf{obs}(\llbracket\mathtt{E}[\mathtt{L}]\rrbracket_{\mathrm{d}})\geq\mu$ .

•

If $\mathtt{M}=\mathtt{E}[\mathtt{score}(\mathtt{r}_{a})]$ and $\mathtt{E}[\mathtt{skip}]\Rightarrow_{n-1}\mu$ , then by the definition of $\mathsf{obs}(-)$ , we see that $\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})=|a|\,\mathsf{obs}(\llbracket\mathtt{E}[\mathtt{skip}]\rrbracket_{\mathrm{d}})\geq|a|\,\mu$ .

•

If $\mathtt{M}=\mathtt{E}[\mathtt{sample}]$ and $\mathtt{E}[\mathtt{r}_{a}]\Rightarrow_{n-1}k(a,-)$ for some finite kernel $k$ , then by the definition of $t_{0}$ and $t_{1}$ , we see that

[TABLE]

for some $h$ such that $h(a,-)=t_{0}^{\llbracket\mathtt{E}[\mathtt{r}_{a}]\rrbracket_{\mathrm{d}}}\times\delta_{a}$ . Hence,

[TABLE]

∎

It remains to prove that $\mathtt{M}\Rightarrow_{\infty}\mu$ implies $\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})\leq\mu$ . We use logical relations. We define a binary relation $O_{\mathrm{d}}$ between closed terms of type $\mathtt{Real}$ and probabilistic Mealy machines from $\mathsf{I}$ to $\mathsf{J}\otimes\oc\mathsf{R}$ by

[TABLE]

We then inductively define binary relations

[TABLE]

by

[TABLE]

We list some properties of the logical relations.

Lemma 11.3.

Let $\mathtt{A}$ be a type.

1. If $(\mathtt{V},\mathsf{M})\in S_{\mathtt{A}}$ , then $(\mathtt{V},\mathsf{j}\otimes\oc\mathsf{M})\in\overline{S}_{\mathtt{A}}$ .

2. If $(\mathtt{M},\mathsf{M})\in\overline{S}_{\mathtt{A}}$ and $\mathtt{N}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{M}$ , then $(\mathtt{N},\mathsf{M})\in\overline{S}_{\mathtt{A}}$ .

3. If $(\mathtt{M},\mathsf{M})\in\overline{S}_{\mathtt{A}}$ and $\mathtt{M}\stackrel{{\scriptstyle\mathrm{red}}}{{\longrightarrow}}\mathtt{N}$ , then $(\mathtt{N},\mathsf{M})\in\overline{S}_{\mathtt{A}}$ .

4. If $(\mathtt{M},\mathsf{M})\in\overline{S}_{\mathtt{A}}$ and $\mathsf{M}\simeq\mathsf{N}$ , then $(\mathtt{M},\mathsf{N})\in\overline{S}_{\mathtt{A}}$ .

5. For any closed term $\mathtt{M}:\mathtt{A}$ , $(\mathtt{M},\mathsf{bot}_{\mathsf{J}\otimes\oc\llbracket\mathtt{A}\rrbracket})\in\overline{S}_{\mathtt{A}}$ where $\mathsf{bot}_{\mathsf{X}}\colon\mathsf{I}\rightarrowtriangle\mathsf{X}$ is a token machine whose transition function is the zero kernel.

6. For any closed value $\mathtt{V}:\mathtt{A}\to\mathtt{B}$ , $(\mathtt{V},\mathsf{bot}_{\llbracket\mathtt{A}\to\mathtt{B}\rrbracket})\in S_{\mathtt{A}\to\mathtt{B}}$ .

7. If $(\mathtt{M},\mathsf{M}_{i})\in\overline{S}_{\mathtt{A}}$ and $S_{\mathsf{M}_{1}}=S_{\mathsf{M}_{2}}=\cdots$ and $s_{\mathsf{M}_{1}}=s_{\mathsf{M}_{2}}=\cdots$ and $\tau_{\mathsf{M}_{1}}\leq\tau_{\mathsf{M}_{2}}\leq\cdots$ , then $(\mathtt{M},\mathsf{N})\in\overline{S}_{\mathtt{A}}$ where $\mathsf{N}$ is given by $S_{\mathsf{N}}=S_{\mathsf{M}_{1}}$ , $s_{\mathsf{N}}=s_{\mathsf{M}_{1}}$ , $\tau_{\mathsf{N}}=\bigvee_{n}\tau_{\mathsf{M}_{n}}$ .

Proof.

We can check these items by unfolding the definition of $O_{\mathrm{d}}$ and the logical relations. ∎

Lemma 11.4 (Basic Lemma).

Let $\mathtt{\Delta}=(\mathtt{x}_{1}:\mathtt{A}_{1},\ldots,\mathtt{x}_{n}:\mathtt{A}_{n})$ be a context.

•

For any term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in S_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

•

For any value $\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in S_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

Proof.

By induction on $\mathtt{M}$ and $\mathtt{V}$ . Most cases follow from Lemma 11.3. For $\mathtt{M}=\mathtt{sample}$ and $\mathtt{M}=\mathtt{score}(\mathtt{V})$ , we check the statement by unfolding the definition of $\mathsf{Sa}$ and $\mathsf{sc}$ . Here, we only check for $\mathtt{M}=\mathtt{sample}$ and $\mathtt{M}=\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{N})$ .

•

When $\mathtt{M}=\mathtt{sample}$ , let $(\mathtt{E},\mathsf{E})$ be a pair in $R_{\mathtt{Real}}^{\top}$ . We write $\mathsf{N}$ for $(n\otimes\oc\mathsf{R})\circ(\mathsf{J}\otimes\mathsf{E})\circ\mathsf{Sa}$ . By the definition of $t_{0}$ and $t_{1}$ , we have

[TABLE]

for some $h$ such that $h(a,-)=t_{0}^{\mathsf{E}\circ\mathsf{r}_{a}}\times\delta_{a}$ . Hence,

[TABLE]

•

When $\mathtt{M}=\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{N})$, we suppose for simplicity that $\mathtt{M}$ is a closed term. By the induction hypothesis, we can check that

[TABLE]

by induction on $n$ . By Lemma 11.3 and Proposition 11.2, we obtain $(\mathtt{M},\llbracket\mathtt{M}\rrbracket)\in R_{\mathtt{A}\to\mathtt{B}}$ .

∎

Theorem 11.2.

For any closed term $\vdash\mathtt{M}:\mathtt{Real}$ , if $\mathtt{M}\Rightarrow_{\infty}\mu$ , then

$$\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})=\mu.$$

Proof.

By soundness, we have $\mu\leq\mathsf{obs}(\llbracket\mathtt{M}\rrbracket_{\mathrm{d}})$ . On the other hand, because $([-],\mathsf{j}\otimes\mathsf{id}_{\oc\mathsf{R}})$ is an element of $S_{\mathtt{Real}}^{\top}$ , we obtain the other inequality by Lemma 11.4. ∎

11.1.3 Induction step on recursion

Our Goal: Approximation Lemma

Let $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ be a Mealy machine. In this section, we show that a Mealy machine $\mathsf{M}^{\dagger}\colon\mathsf{I}\to\oc\mathsf{X}$ given by

[TABLE]

is a “least” fixed point of $\mathsf{M}$ . Diagrammatically, $\mathsf{M}^{\dagger}$ consists of digging, contraction and a feed back loop:

[Diagram: $\mathsf{M}^{\dagger}$ assembled from the nodes $\mathsf{dg}_{\mathsf{X}}$, $\oc\mathsf{M}$, $\mathsf{con}_{\mathsf{X}}$ and $\mathsf{M}$, with wires labelled $\oc\oc\mathsf{X}$, $\oc\mathsf{X}$ and $\mathsf{X}$ and a feedback loop.]

This construction already appeared in the interpretation of the fixed point operator. In fact, for a term $\mathtt{f}:\mathtt{A}\to\mathtt{B},\mathtt{x}:\mathtt{A}\vdash\mathtt{M}:\mathtt{B}$ , we have $\llbracket\mathtt{fix}_{\mathtt{A},\mathtt{B}}(\mathtt{f},\mathtt{x},\mathtt{M})\rrbracket=\llbracket\lambda\mathtt{x}^{\mathtt{A}}.\,\mathtt{M}\rrbracket^{\dagger}$ .

Parametrized Modal Operator and Parametrized Loop Operator

We introduce parametrization of the modal operator $\oc$ and the loop operator $(-)^{\dagger}$ . For $\alpha\subseteq\mathbb{N}$ , we define $\oc_{\alpha}\mathsf{M}$ by: the state space and the initial state of $\oc\mathsf{M}$ are given by

[TABLE]

and $\tau_{\oc_{\alpha}\mathsf{M}}$ is a unique s-finite kernel such that the following diagrams commute:

•

for any $n\in\alpha$ ,

[TABLE]

•

for any $n\not\in\alpha$ ,

[TABLE]

Let $\alpha_{n},\beta_{n}\subseteq\mathbb{N}$ be

[TABLE]

The definitions of $\alpha_{n}$ and $\beta_{n}$ are motivated by the following lemma.

Lemma 11.5.

For any $n\in\mathbb{N}$ and for any $\mathsf{M}\colon\mathsf{X}\rightarrowtriangle\mathsf{Y}$ , we have

[TABLE]

By means of $\oc_{\alpha}$ , we also parametrize the operator $(-)^{\dagger}$ . For $\alpha\subseteq\mathbb{N}$ , and for $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ , we define $\mathsf{M}^{\dagger,\alpha}\colon\mathsf{I}\to\oc\mathsf{X}$ by

[TABLE]

Because $\oc_{\mathbb{N}}\mathsf{M}=\oc\mathsf{M}$ , we have $\mathsf{M}^{\dagger}=\mathsf{M}^{\dagger,\mathbb{N}}$ .

Lemma 11.6.

For any $\alpha\subseteq\mathbb{N}$ , we have

[TABLE]

Below, we write $h(\mathsf{M})\colon S_{\mathsf{M}^{\dagger}}\to S_{\mathsf{M}}\times S_{\mathsf{M}}^{\mathbb{N}}$ for the isomorphism obtained by applying $1\times(-)\cong(-)$ to $S_{\mathsf{M}}$ .

Lemma 11.7.

There is a family of measurable functions $\phi_{X}\colon(X^{\mathbb{N}})^{\mathbb{N}}\to(X^{\mathbb{N}})^{\mathbb{N}}$ such that the following diagram commutes:

[TABLE]

where $u_{X}\colon X^{\mathbb{N}}\to X^{\mathbb{N}}\times(X^{\mathbb{N}})^{\mathbb{N}}$ is a measurable isomorphism given by

[TABLE]

Proof.

In this proof, for sets $N_{1},N_{2},\ldots,N_{k}$, we identify elements in $(((X^{N_{1}})^{N_{2}})\cdots)^{N_{k}}$ with functions from $N_{1}\times N_{2}\times\cdots\times N_{k}$ to $X$. For $x\in(X^{\mathbb{N}})^{\mathbb{N}}$ and $(a,b)\in\mathbb{N}\times\mathbb{N}$, we define $(\phi_{X}(x))(a,b)$ by induction on $a$:

[TABLE]

where $x^{\prime}\in((X^{\mathbb{N}})^{\mathbb{N}})^{\mathbb{N}}$ is given by

[TABLE]

We note that this definition makes sense because $a=2\langle a_{0},a_{1}\rangle+1$ implies $a_{1}<a$ . It is straightforward to check the family $\phi_{X}$ makes the above diagram commute. ∎
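The well-foundedness remark above can be checked mechanically. The following is a minimal sketch, assuming that $\langle-,-\rangle$ is the Cantor pairing function (the proof only needs some bijection $\mathbb{N}\times\mathbb{N}\to\mathbb{N}$, and the choice made here is an assumption, as are the helper names `pair`, `unpair` and `decode_odd`). It decodes an odd index $a=2\langle a_{0},a_{1}\rangle+1$ and confirms that the recursive call is made at a strictly smaller index $a_{1}<a$.

```python
from math import isqrt


def pair(a0: int, a1: int) -> int:
    """Cantor pairing, an assumed choice for the bijection <a0, a1>."""
    s = a0 + a1
    return s * (s + 1) // 2 + a1


def unpair(c: int) -> tuple[int, int]:
    """Inverse of `pair`."""
    s = (isqrt(8 * c + 1) - 1) // 2  # largest s with s(s+1)/2 <= c
    a1 = c - s * (s + 1) // 2
    return s - a1, a1


def decode_odd(a: int) -> tuple[int, int]:
    """For an odd index a, return (a0, a1) with a == 2 * pair(a0, a1) + 1."""
    if a % 2 != 1:
        raise ValueError("decode_odd expects an odd index")
    return unpair((a - 1) // 2)


# The recursion in the definition of phi_X descends from a to a1.
for a in range(1, 200, 2):
    a0, a1 = decode_odd(a)
    assert a1 < a
```

Since `pair(a0, a1) >= a1` for this pairing, the inequality $a_{1}<2\langle a_{0},a_{1}\rangle+1$ holds for every odd index, not only for the range tested here.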

Lemma 11.8.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ , we have

[TABLE]

which is realized by $u^{\prime}_{\mathsf{M}}$ given by

[TABLE]

where the first and the last isomorphisms are obtained by applying canonical isomorphisms $1\times(-)\cong(-)$ and $(-)\times 1\cong(-)$ .

Proof.

See Figure 5. ∎

Lemma 11.9.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ and for any $n\in\mathbb{N}$ ,

[TABLE]

which is realized by $\phi^{\prime}_{\mathsf{M}}$ given by

[TABLE]

where the first and the last isomorphisms are obtained by applying canonical isomorphisms $1\times(-)\cong(-)$ and $(-)\times 1\cong(-)$ .

Proof.

We prove the statement by induction on $n$. The base case holds because the transition relations of $(\oc\mathsf{M})^{\dagger,\alpha_{0}}$ and $\oc(\mathsf{M}^{\dagger,\alpha_{0}})$ are both zero kernels. We next check the induction step. We have

[TABLE]

By Lemma 11.7, this behavioral equivalence is realized by $\phi^{\prime}_{\mathsf{M}}$ . ∎

Proposition 11.2.

For a Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ , we inductively define $\mathsf{iter}_{n}(\mathsf{M})\colon\mathsf{I}\rightarrowtriangle\mathsf{X}$ by

[TABLE]

For all $n\in\mathbb{N}$ , we have

[TABLE]

and

[TABLE]

Proof.

By the definition of $\oc_{\alpha_{n}}$, we have

[TABLE]

for all $n\in\mathbb{N}$ , and

[TABLE]

Hence, by continuity of the composition, the coproduct and the monoidal product of s-finite kernels, we have

[TABLE]

for all $n\in\mathbb{N}$ , and

[TABLE]

By induction on $n\in\mathbb{N}$ , we show that we have

[TABLE]

which is realized by $f_{n}$. For the base case, we have $\mathsf{M}^{\dagger,\emptyset}\simeq\mathsf{iter}_{0}(\mathsf{M})$ because both $\mathsf{M}^{\dagger,\emptyset}$ and $\mathsf{iter}_{0}(\mathsf{M})$ are behaviorally equivalent to $\mathsf{emp}_{\mathsf{I},\mathsf{X}}$. The induction step follows from Lemma 11.8. ∎

Corollary 11.1.

For any Mealy machine $\mathsf{M}\colon\oc\mathsf{X}\rightarrowtriangle\mathsf{X}$ ,

$$\mathsf{M}\circ\oc(\mathsf{M}^{\dagger})\simeq\mathsf{M}^{\dagger}.$$

Proof.

By Proposition 9.1, we have

[TABLE]

and

[TABLE]

Therefore, by Lemma 11.8 and Lemma 11.9, the following diagram commutes:

[TABLE]

It is easy to see that $u_{\mathsf{M}}^{\prime}\circ(S_{\mathsf{M}}\times\phi^{\prime}_{\mathsf{M}})$ preserves the initial states. Hence, $\mathsf{M}\circ\oc(\mathsf{M}^{\dagger})\simeq\mathsf{M}^{\dagger}$ . ∎
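The pattern behind Proposition 11.2 and Corollary 11.1, namely a fixed point obtained as the supremum of an increasing chain of finite unfoldings starting from the empty machine, has a familiar numerical analogue. The sketch below is only an illustration under stated assumptions: it replaces Mealy machines by a single number in $[0,1]$, and the map `unfold` describes a hypothetical recursive program that stops with probability $1/3$ and otherwise makes two independent recursive calls. The names `unfold` and `approximants` are not from the paper.

```python
def unfold(p: float) -> float:
    """One unfolding of a hypothetical recursive program.

    If recursive calls terminate with probability p, the program itself
    terminates with probability 1/3 + (2/3) * p * p.
    """
    return 1 / 3 + (2 / 3) * p * p


def approximants(step, bottom: float, n: int) -> list[float]:
    """Return [bottom, step(bottom), ..., step^n(bottom)]."""
    chain = [bottom]
    for _ in range(n):
        chain.append(step(chain[-1]))
    return chain


# Start from 0.0, which plays the role of the zero kernel (no behaviour at all).
chain = approximants(unfold, 0.0, 100)
least_fixed_point = chain[-1]
print(least_fixed_point)  # close to 0.5, the smaller root of p = 1/3 + (2/3) p^2
```

The equation $p=1/3+(2/3)p^{2}$ has the two solutions $1/2$ and $1$. The chain of approximants is increasing and converges to the smaller one, which is the sense in which the fixed point is "least".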

11.1.4 Commutativity Modulo Observational Equivalence

Definition 11.1.

For terms $\vdash\mathtt{M},\mathtt{N}:\mathtt{A}$, we say that $\mathtt{M}$ is observationally equivalent to $\mathtt{N}$ when for every context $\mathtt{C}[-]$, if $\mathtt{C}[\mathtt{M}]\Rightarrow_{\infty}\mu$, then $\mathtt{C}[\mathtt{N}]\Rightarrow_{\infty}\mu$.

In this section, as an application of our GoI semantics, we show that for all $\vdash\mathtt{M}:\mathtt{A}$, $\vdash\mathtt{N}:\mathtt{B}$ and $\mathtt{x}:\mathtt{A},\mathtt{y}:\mathtt{B}\vdash\mathtt{L}:\mathtt{C}$,

[TABLE]

is observationally equivalent to

[TABLE]

To prove this equivalence, we define a binary relation $O_{\mathrm{d}}^{\prime}$ between closed terms of type $\mathtt{Real}$ and probabilistic Mealy machines from $\mathsf{I}$ to $\mathsf{J}\otimes\oc\mathsf{R}$ by

[TABLE]

where (Condition 1) is: for any $A\in\Sigma_{((1+\mathbb{N}\times\mathbb{S})+\emptyset)\times S_{\mathsf{M}}}$ such that

[TABLE]

and for any $s\in S_{\mathsf{M}}$ , $\tau_{\mathsf{M}}(((\circ,(\circ,\ast)),s),A)=0;$ (Condition 2) is: for any $A\in\Sigma_{((1+\mathbb{N}\times\mathbb{S})+\emptyset)\times S_{\mathsf{M}}}$ such that

[TABLE]

and for any $(n,u)\in\mathbb{N}\times\mathbb{S}$ , for any $s\in S_{\mathsf{M}}$ , $\tau_{\mathsf{M}}(((\circ,(\bullet,(n,u))),s),A)=0.$ We then inductively define binary relations

[TABLE]

by replacing $O_{\mathrm{d}}$ in the definition of $S_{\mathtt{A}}$, $S_{\mathtt{A}}^{\top}$ and $\overline{S}_{\mathtt{A}}$ with $O_{\mathrm{d}}^{\prime}$. We can then prove a basic lemma for this logical relation.

Lemma 11.10 (Basic Lemma).

Let $\mathtt{\Delta}=(\mathtt{x}_{1}:\mathtt{A}_{1},\ldots,\mathtt{x}_{n}:\mathtt{A}_{n})$ be a context.

•

For any term $\mathtt{\Delta}\vdash\mathtt{M}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in T_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

•

For any value $\mathtt{\Delta}\vdash\mathtt{V}:\mathtt{A}$ and for any $(\mathtt{V}_{i},\mathtt{N}_{i})\in T_{\mathtt{A}_{i}}$ for $i=1,2,\ldots,n$ , we have

[TABLE]

Proof.

Analogous to the proof of Lemma 11.4. ∎

Corollary 11.2.

For any $\vdash\mathtt{M}:\mathtt{A}$ ,

•

for any $A\in\Sigma_{((1+\mathbb{N}\times\llbracket\mathtt{A}\rrbracket_{\mathrm{d}}^{+})+\emptyset)\times S_{\mathsf{M}}}$ such that

[TABLE]

and for any $s\in S_{\mathsf{M}}$ , $\tau_{\mathsf{M}}(((\circ,(\circ,\ast)),s),A)=0;$

•

for any $A\in\Sigma_{((1+\mathbb{N}\times\llbracket\mathtt{A}\rrbracket_{\mathrm{d}}^{+})+\emptyset)\times S_{\mathsf{M}}}$ such that

[TABLE]

and for any $(n,a)\in\mathbb{N}\times\llbracket\mathtt{A}\rrbracket_{\mathrm{d}}^{-}$ , for any $s\in S_{\mathsf{M}}$ , $\tau_{\mathsf{M}}(((\circ,(\bullet,(n,a))),s),A)=0.$

By Corollary 11.2 and by the definition of composition of probabilistic Mealy machines, we see that if

[TABLE]

then

[TABLE]

where $k$ and $h$ are s-finite kernels given by restricting the domain and the codomain of $\tau_{\llbracket\mathtt{M}\rrbracket_{\mathrm{d}}}$ and $\tau_{\llbracket\mathtt{N}\rrbracket_{\mathrm{d}}}$ respectively. By commutativity of s-finite kernels [32], equality (1) holds. Hence, (2) holds. Then by adequacy, we see that

[TABLE]

is observationally equivalent to

[TABLE]
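The commutativity property used in this argument can be illustrated in the simplest possible setting. The following sketch assumes finite, unnormalised measures represented as dictionaries from outcomes to weights, which are a very special case of s-finite kernels. The measures `mu` and `nu`, the function `body` and the helper `integrate` are hypothetical and stand in for the denotations of $\mathtt{M}$, $\mathtt{N}$ and $\mathtt{L}$. Exact rational arithmetic makes the two orders of integration comparable with `==`.

```python
from fractions import Fraction

# Finite measures: outcome -> weight. The weights need not sum to 1,
# mirroring the fact that scoring produces unnormalised measures.
mu = {0: Fraction(1, 2), 1: Fraction(3, 2)}
nu = {"a": Fraction(2), "b": Fraction(1, 4)}


def integrate(measure, f):
    """Integrate f against a finite measure given as a dict of weights."""
    return sum(weight * f(outcome) for outcome, weight in measure.items())


def body(x, y):
    """A hypothetical non-negative function of both outcomes."""
    return Fraction(x + 1) * (3 if y == "a" else 5)


# Evaluate the first computation before the second, and the other way round.
m_then_n = integrate(mu, lambda x: integrate(nu, lambda y: body(x, y)))
n_then_m = integrate(nu, lambda y: integrate(mu, lambda x: body(x, y)))

assert m_then_n == n_then_m
```

For finite sums the equality is just a rearrangement of terms. The point of the result cited as [32] is that the same exchange remains valid for s-finite kernels, where it does not hold for arbitrary measures.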

12 Conclusion

We introduced a denotational semantics for $\mathbf{PCFSS}$, a higher-order functional language with sampling from a uniform continuous distribution and scoring. Following [28], we considered two operational semantics, namely a distribution-based operational semantics, which associates terms with distributions over real numbers, and a sampling-based operational semantics, which associates each term with a weight along every probabilistic branch. Our main results are adequacy theorems for both kinds of operational semantics, and it follows from these theorems that sampling-based operational semantics is essentially equivalent to distribution-based operational semantics. Another consequence of the adequacy theorems is the possibility of diagrammatic reasoning for observational equivalence of programs. It follows from the observation in Section 5.5 and the adequacy theorems that diagrammatic equivalence of the denotations of terms implies observational equivalence. It would be interesting to explore possible connections between our work and other works on diagrammatic reasoning for probabilistic computation, such as [48, 49].
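To make the relationship between the two operational semantics concrete, the following sketch compares them on one hypothetical program, written informally as "draw $x$ with $\mathtt{sample}$, apply $\mathtt{score}$ to $x$, return $x$". It is an illustration under assumptions, not the paper's definitions: `run` mimics a single sampling-based execution returning a result and a weight, `exact_mass` is the measure $A\mapsto\int_{A}x\,dx$ that a distribution-based reading would assign to an interval, and the Monte Carlo average in `estimate_mass` is merely a convenient way to compare the two.

```python
import random


def run(u: float) -> tuple[float, float]:
    """One sampling-based execution, driven by a uniform draw u in [0, 1].

    Hypothetical program: x = sample; score(x); return x.
    Returns the result together with the weight accumulated by scoring.
    """
    x = u
    weight = x
    return x, weight


def estimate_mass(lo: float, hi: float, n: int, seed: int = 0) -> float:
    """Average the weights of runs whose result lands in [lo, hi]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, w = run(rng.random())
        if lo <= x <= hi:
            total += w
    return total / n


def exact_mass(lo: float, hi: float) -> float:
    """Mass of [lo, hi] under the unnormalised measure with density x on [0, 1]."""
    return (hi * hi - lo * lo) / 2


print(estimate_mass(0.5, 1.0, 200_000), exact_mass(0.5, 1.0))
```

The total mass of this measure is $1/2$ rather than $1$, which reflects that the language computes unnormalised measures and has no normalisation operator.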

At this point, our language does not support a normalisation mechanism as a first-class operator, and we are pessimistic about extending our semantics to capture one. However, capturing sampling algorithms such as the Metropolis-Hastings algorithm [50, 51], which consist of a number of interactions between programs and their environment, seems plausible. Exploring the relationships between “idealised” normalisation mechanisms and such “approximating” normalisation mechanisms from the point of view of GoI is an interesting topic for future work.

Acknowledgment

The authors are partially supported by the INRIA/JSPS project “CRECOGI”, and would like to thank Michele Pagani for many fruitful discussions about an earlier version of this work. Naohiko Hoshino is supported by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603).

Bibliography

[1] G. L. Miller, “Riemann’s hypothesis and tests for primality,” J. Comput. Syst. Sci., vol. 13, no. 3, pp. 300–317, 1976.
[2] M. O. Rabin, “Probabilistic algorithm for testing primality,” Journal of Number Theory, vol. 12, no. 1, pp. 128–138, 1980.
[3] M. Agrawal, N. Kayal, and N. Saxena, “PRIMES is in P,” Ann. of Math, vol. 2, pp. 781–793, 2002.
[4] F. D. Wood, J. van de Meent, and V. Mansinghka, “A new approach to probabilistic programming inference,” in AISTATS 2014, 2014, pp. 1024–1032.
[5] N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum, “Church: a language for generative models,” in UAI 2008, 2008, pp. 220–229.
[6] C. Jones, “Probabilistic non-determinism,” Ph.D. dissertation, University of Edinburgh, 1990.
[7] V. Danos and R. Harmer, “Probabilistic game semantics,” ACM Trans. Comput. Log., vol. 3, no. 3, pp. 359–382, 2002.
[8] T. Ehrhard, M. Pagani, and C. Tasson, “Full abstraction for probabilistic PCF,” J. ACM, vol. 65, no. 4, pp. 23:1–23:44, 2018.
