1
Compositional Inference Metaprogramming with Convergence Guarantees
Shivam Handa
nnnn-nnnn-nnnn-nnnn
Massachusetts Institute of Technology
[email protected]
,
Vikash Mansinghka
Massachusetts Institute of Technology
and
Martin Rinard
Massachusetts Institute of Technology
Abstract.
Inference metaprogramming enables effective probabilistic programming by
supporting the decomposition of executions
of probabilistic programs into subproblems and the deployment
of hybrid probabilistic inference algorithms that apply different
probabilistic inference algorithms to different subproblems.
We introduce the concept of independent subproblem inference
(as opposed to entangled subproblem inference in which the
subproblem inference algorithm operates over the full program trace)
and present a mathematical framework for studying convergence
properties of hybrid inference algorithms that apply different
Markov-Chain Monte Carlo algorithms to different parts of the
inference problem. We then use this formalism to prove asymptotic convergence results
for probablistic programs with inference metaprogramming.
To the best of our knowledge this is the first asymptotic convergence
result for hybrid probabilistic inference algorithms defined by
(subproblem-based) inference metaprogramming.
Probabilistic Programming, Metaprogramming, Inference Algorithms
††journal: PACMPL††journalvolume: 1††journalnumber: Under review as a conference paper at POPL 2020††article: 1††journalyear: 2019††publicationmonth: 1††doi: ††copyright: none
1. Introduction
Probabilistic modeling and inference are now mainstream
approaches deployed in many areas of computing and data analysis (Russell and
Norvig, 2003; Thrun
et al., 2005; Liu, 2008; Murphy, 2012; Gelman et al., 2014; Forsyth and Ponce, 2002).
To better support these computations, researchers have developed
probabilistic programming languages, which include constructs that directly
support probabilistic modeling and inference within the language
itself (Milch et al., 2007; Goodman et al., 2008; Goodman and
Stuhlmueller, 2014; Mansinghka
et al., 2014; Gordon et al., 2014a; Tristan et al., 2014; Gordon
et al., 2014b; Tolpin
et al., 2015; Carpenter et al., 2016; Labs, 2017).
Probabilistic inference strategies provide the
probabilistic reasoning required to implement these constructs.
It is well known that no one probabilistic inference strategy is
appropriate for all probabilistic inference and modeling tasks (Russell and
Norvig, 2003; Mansinghka et al., 2018).
Indeed, effective inference often involves breaking an
inference problem down into subproblems, then applying different
inference strategies to different subproblems as appropriate (Russell and
Norvig, 2003; Mansinghka et al., 2018).
Applying this approach to probabilistic programs, specifically by
specifying subtask decompositions and inference strategies to apply
to each subtask, is called inference metaprogramming.
Inference metaprogramming has been shown
to dramatically improve the execution time and
accuracy of probabilistic programs (in comparison with monolithic
inference strategies that apply a single inference strategy to the
entire program) (Mansinghka et al., 2018).
1.1. Probabilistic Inference and Convergence
Executions of probabilistic programs typically use probabilistic inference to generate
samples from the underlying probability distribution that the program defines (Mansinghka et al., 2018).
Many probabilistic inference algorithms are iterative, i.e., they perform multiple steps that bring
samples closer to the specified probability distribution. A standard correctness property of such algorithms is
asymptotic convergence, i.e., a guarantee that, in the limit as the number of iterations increases,
the resulting sample will be drawn from the defined posterior distribution. Markov-Chain
Monte-Carlo (MCMC) algorithms (which include Metropolis-Hastings (Chib and
Greenberg, 1995)
and Gibbs sampling (Meyn and Tweedie, 2012)) comprise a widely-used (Milch et al., 2007; Goodman et al., 2008; Goodman and
Stuhlmueller, 2014; Mansinghka
et al., 2014) class of probabilistic
inference algorithms that often come with asymptotic convergence
guarantees (Geyer, 1998).
Using inference metaprogramming to decompose and solve
inference problems into subprograms produces new hybrid probabilistic
inference algorithms.
Whether or not these new hybrid inference algorithms (as implemented in the
inference metaprogramming language) also asymptotically converge is often a question of interest
(because it directly relates to the compositional soundness of the
inference metaprogram).
Over the last several decades the field has developed many iterative probabilistic
inference algorithms (Meyn and Tweedie, 2012) and proved convergence results for these algorithms (Berti
et al., 2008).
Many of these algorithms compose inference steps applied to different parts of the problem and would therefore
seem to be a promising candidate for proving convergence properties of probabilistic programs
with inference metaprogramming.
Unfortunately, these algorithms, and their associated convergence proofs,
have several onerous restrictions. Specifically, they model the state of the
system as a product space over a fixed set of random choices and work with
policies whose selection of random choices to resample does not depend on the
state of the system. The basic mathematical framework (and associated convergence proofs)
is therefore not applicable to probabilistic programming with inference metaprogramming
— in this new setting, the random choices in the subproblems (as defined by the random variables that
the program defines and samples) may change over time, are potentially unbounded (e.g.,
Open Universe Probabilistic Models (Milch and Russell, 2010; Wu
et al., 2016)),
and may depend on the state of the system as realized in the current values of the random choices.
For example, our framework supports programs that sample a stochastic choice, then
compute the set of stochastic choices to include in a subproblem as a function
of the sampled stochastic choice.
1.2. Our Result
We present the first asymptotic convergence result for hybrid probabilistic inference algorithms
defined by inference metaprogramming. Given a probabilistic program with posterior distribution π,
we show that the hybrid algorithms applied to that program are π-irreducible, aperiodic,
and have π as their stationary distribution. This result stands on two foundational new results:
Independent Subproblems: We consider executions of probabilistic programs
that produce program traces (Mansinghka et al., 2018; Wingate
et al., 2011).
These traces record the random choices made during the execution. With inference metaprogramming, the subproblem
inference algorithms must operate only over the random choices in the subproblem. Previous formulations
of subproblem inference, however, define subproblem inference as operating over the entire trace (Mansinghka et al., 2018).
This approach entangles the subproblem with the full program trace and complicates the analysis of the
interaction between subproblems and inference metaprogramming.
We instead formalize subproblem inference using a new technique that extracts each subproblem from the original
program trace into its own independent trace. Inference is then performed over the full extracted
trace, with the newly generated trace then stitched back into the original trace
to complete the subproblem inference. By detangling the subproblem from the full
trace, this approach delivers the clean separation of subproblems and subproblem inference
required to state and prove the new asymptotic convergence result.
Mathematical Framework: We present a new and more general mathematical framework
for studying the composition of probabilistic inference algorithms applied to subproblems.
A key aspect of this framework is that it supports state-dependent embeddings between program trace spaces
defined by different probabilistic programs. The framework therefore enables us to model
subproblem inference by embedding the original program trace into the space of program
traces defined by the detangled subproblem, moving the embedded trace within this new space of
program traces, then injecting the new trace back into the original trace space.
We note that this mathematical framework is not specifically tied to probabilistic programming.
It supports compositional asymptotic convergence results for a range of probabilistic
inference algorithms that operate over general probability spaces (even uncomputable ones)
by mapping subspaces of the original space into new isolated probability spaces,
iteratively applying MCMC inference algorithms to the isolated space, then mapping the
results back into the original space. An important property is that
the applied inference algorithms, the isolated probability spaces, and the mappings may
all be state-dependent.
Building on these results, we prove a new asymptotic convergence result for inference metaprograms
that apply asymptotically converging MCMC algorithms to appropriately defined subproblems. This
result identifies two key restrictions on the subproblem selection strategies that the
inference metaprogram uses to identify subproblems. These restrictions
guarantee asymptotic convergence for inference metaprograms that
apply a large class of asymptotically converging MCMC algorithms
to the specified subproblems:
Reversibility:
The subproblem selection strategy must be reversible, i.e.,
given n traces t1,t2…tn such that ti can be transformed
into trace ti+1 by modifying parts of the trace ti selected by the subproblem selection strategy,
then it must be possible to transform trace tn into trace t1 by modifying
parts of the trace tn selected by the subproblem selection strategy.
Intuitively, it must be possible to reverse any changes that can be made by countably applying
the same subproblem selection strategy to a starting trace.
Connectivity:
The combination of all of the subproblem selection strategies in the
inference metaprogram must connect the entire probability space. Given two
traces t and t′, we say that the subproblem selection strategies connect
t and t′ if it is possible to transform t into t′ by modifying
the parts of t selected by one of the subproblem selection strategies.
The subproblem selection strategies connect the probability space
if there do not exist sets of traces U and V such that 1) the probability of U∪V
sums to one and 2) there does not exist t∈U and t′∈V such that the
subproblem selection strategies connect t and t′.
Conceptually, these two restrictions together ensure that the hybrid inference algorithm
defined by the inference metaprogram does not become stuck in a subset of the positive
probability space and therefore unable to sample some positive probability set.
Effective probabilistic programming requires subproblem identification and
hybrid probabilistic inference algorithms applied to the
identified subproblems. The results in this paper
enable the sound and complete decomposition of
otherwise intractable probabilistic inference problems
into tractable subproblems solved by different inference algorithms.
It also characterizes properties that entail asymptotic convergence of these resulting hybrid
probabilistic inference algorithms.
2. Language and Execution Model
Our treatment of subproblem selection, extraction, and stitching works with a core
probabilistic programming language (Figure 1) based on the lambda calculus.
A program in this language is a sequence of assume and
observe statements. Expressions are derived from the untyped lambda calculus augmented
with the Dist(e) expression, which allows the program
to sample from a distribution Dist given parameter e.
The core language supports computable distributions over computable expressions (including computable reals).
We believe it is straightforward to generalize the language to include more general probability spaces
(e.g., probability spaces including uncomputable reals) at the cost of a larger formalism.
The mathematical framework we use to prove convergence (Section 4) works over
general probability spaces including probability spaces with uncomputable objects.
Dist(e) can be seen as a set of probabilistic lambda calculus expressions
{ed∣ed∈Dist(e)⊆Ev}. Based on the parameter
expression e, Dist(e) makes a stochastic choice and returns
an expression ev∈Dist(e). We define:
[TABLE]
[TABLE]
Traces:
When a program executes, it produces an execution trace (Figure 2).
This trace records the executed
sequence of assume and observe commands, including the value of
each evaluated (sub)expression. It also assigns a unique
identifier to each evaluated (sub)expression and stochastic choice. These identifiers
will be later used to construct a dependence graph used to define the subproblem given
a set of stochastic choices in the subproblem.
We define the execution, including the generation of valid traces t, with the
transition relation ⇒s⊆Σv×Σid×P→T (Figure 3). Conceptually,
the transition relation executes program p
under the environment σv,σid to obtain a trace t, where
σv:Vars→V and σid:Vars→ID.
σv is a map from variable name to its corresponding assigned value,
whereas σid gives the id of the expression which assigned this value
to that variable. Because of the nondeterminism associated with stochastic choices,
the execution strategy matters for the semantics of the language.
We use call by value as the execution strategy and forbid the execution
of expressions within a lambda.
Given a program p, we define the set of all valid traces which can be obtained
by executing p as Tp=Traces(p).
[TABLE]
Given a trace, we can drop the computed values and
assigned ids and reroll the augmented expressions to recover the underlying program.
The transition relation ⇒r⊆T→P (Figure 4)
formalizes this procedure. Given a trace t, we define
[TABLE]
Note that ∀\leavevmode t,p.\leavevmode t∈Traces(p)⟹p=Program(t). The
reverse may not be true as there are additional constraints that valid traces
must satisfy. Two traces are equivalent if and only if they differ at
most in the choice of unique identifiers selected for each augmented expression and stochastic choice.
Dependence Graphs:
Given a trace t, we define the dependence graph ⟨N,D,E⟩=Graph(t) as a 3-tuple ⟨N,D,E⟩ where N:ID→{⊥,Sample} is a map from
ID to either ⊥ (when the corresponding augmented expression
for an id∈ID is a deterministic computation) or Sample
(when the augmented expression for an id∈ID makes a stochastic choice).
D⊆ID×ID are data dependence edges.
There is a data dependence edge ⟨id1,id2⟩∈D if the value
of the augmented expression id2 directly depends on the
augmented expression id1.
E⊆ID×ID are existential edges.
There is a existential edge ⟨id1,id2⟩∈E if the value
of the augmented expression id1 controls whether or not an
augmented expression id2 executed. For example, in a lambda application
(ae1\leavevmode ae2)x=ae3, all augmented expressions in ae3 were executed only
because of the value of ae1. Changing the value of ae1 would require
dropping the augmented expression ae3 and recomputing another
expression based on the new value of ae1.
We formalize the dependence graph generation procedure as a transition
relation ⇒g⊆T→⟨N,D,E⟩ (Figure 5).
We use the shorthand
⟨N,D,E⟩=Graph(t) if ⟨N,D,E⟩ is the dependence graph
for trace t i.e. t⇒g⟨N,D,E⟩.
Valid Subproblems:
Subproblem inference must 1) change only the identified subproblem and not the
enclosing trace while 2) producing a valid trace for the full probabilistic
program. Valid subproblems must therefore include all parts of the trace
that may change if any part of the subproblem changes. We formalize this
requirement as follows.
Given a trace t with dependence graph
⟨N,D,E⟩=Graph(t), a valid subproblem S⊆dom\leavevmode N
must satisfy two properties: 1) there are no outgoing existential edges and
2) all outgoing data dependence edges must terminate at a
stochastic choice (Sample node).
The first property ensures that parts of the trace which were executed due to
values of expressions in the subproblem are also part of the subproblem. An
example of this is lambda evaluation (ae1\leavevmode ae2)x=ae3. If the value of ae1
can be changed by the subproblem inference, ae3 may or may not exist. Hence
ae3 should be within the subproblem to ensure that the inference algorithm
can change it if necessary.
The second property ensures that any change made by the subproblem inference can be
absorbed by a stochastic choice. For example, when the internal parameter of a Dist
changes, the change can be absorbed by changing the probability of the trace
to account for the change in the probability of the value generated by
the execution of the absorbing Dist node. The changes are
absorbed by the stochastic choice and do not propagate further into the remaining
parts of the trace outside the subproblem. We formalize these two properties as follows:
∀id∈S.\leavevmode ⟨id,ido⟩∈E⟹ido∈S
∀id∈S.\leavevmode ⟨id,ido⟩∈D∧ido∈dom\leavevmode N−S⟹N(ido)=Sample
The absorbing set A⊆dom\leavevmode N−S of a subproblem S
is the set of stochastic choices whose value directly depends on the
nodes in the subproblem i.e. A={ida∣ida∈dom\leavevmode N−S∧∃\leavevmode idi∈S.\leavevmode ⟨idi,ido⟩∈D}.
The input boundary B⊆dom\leavevmode N−S of a subproblem
S is the set of nodes on which the subproblem directly depends, i.e.,
B={idb∣idb∈dom\leavevmode N−S∧∀\leavevmode idi∈S.\leavevmode ⟨idb,idi⟩∈D}.
Entangled Subproblem Inference:
Following (Mansinghka et al., 2018), we define entangled subproblem inference
using the infer procedure (Mansinghka et al., 2018),
which takes as parameters a subproblem selection strategy SS,
an inference tactic IT, and an input trace t. The
subproblem inference mutates t to produce a new trace t′.
[TABLE]
This formulation works with arbitrary subproblem selection strategies SS.
The requirement is that, given a trace t, SS must
produce a valid subproblem S over t. In practice, one way to satisfy this
requirement is to allow the programmer to specify a (potentially arbitrary)
set of stochastic choices that must be in the subproblem, with the language
implementation completing these choices into a valid subproblem (Mansinghka et al., 2018).
We also work with inference algorithms IT that
take as input a full program trace t and a valid subproblem S and return
a mutated full program trace t′. We require that the output trace t′
- is from the same program as the trace t and
- t′ differs from t only in a) the stochastic choices from the subproblem S
and b) the deterministic computations that depend on these stochastic choices.
We formalize these constraints as
t′∈Traces(Program(t))
S⊢t≡t′
Figure 6 presents
the definition of ≡. Note that operating with entangled subproblems
forces the inference tactic IT to take the full program
trace t as a parameter even though it must modify at most only the subproblem.
3. Independent Subproblem Inference
The basic idea of independent subproblem inference is to extract an
independent subtrace ts from the original trace t given a subproblem S,
perform inference over the extracted subtrace ts to obtain a new trace ts′,
then stitch ts′ back into t to obtain a new trace for the full program. Here, consistent with standard inference
techniques for probabilistic programs (Wingate
et al., 2011; Mansinghka et al., 2018), ts and ts′ are
valid traces of the same program ps (the subprogram for the subtraces ts
and ts′). The key challenge is converting the entangled subproblem (which
is typically incomplete and therefore not a valid trace of any program) into
a valid trace by transforming the subproblem to include external dependences
and correctly scope both internal and external dependences in the extracted
trace without giving the inference algorithm access to any external stochastic
choices (including latent choices nested inside certain lambda expressions
which would otherwise override choices outside the subproblem) which it must not change.
Extract Trace:
We define the extraction procedure ts=ExtractTrace(t,S)
using the transition relation
⇒ex⊆P(ID)×T→T
(Figure 7).
The extraction procedure removes Dist(ae#id)=aev
augmented expressions which are not within the subproblem and converts them
into observe statements. This transformation constrains
the value of these stochastic choices to the values present in the original trace.
It leaves the stochastic choices in the subproblem in place and therefore
accessible to the inference algorithm.
For augmented expressions of the form (ae1\leavevmode ae2)y=ae3, when
ae1 is within the subproblem, its value can change and hence
existential edges place the augmented expressions in ae3 within the
subproblem. When ae1 is not within the subproblem,
then some stochastic choices may or not be within the subproblem.
If we keep the augmented expression as is, the inference algorithm may
unroll ae3 and execute it again, changing some stochastic choices
in ae3 but not in the subproblem. If we modify ae3, the
constraint of ae3 being a valid lambda application breaks.
We solve this problem by introducing assume statements and
correctly scoping the resulting dependences.
Stitch Trace:
Given a trace t, a valid subproblem S over the trace,
and a subtrace ts, the stitching procedure stitches back the trace ts
into t to get a new trace t′.
We define the stitching procedure t′=StitchTrace(t,ts,S)
using a transition relation
⇒st⊆P(ID)×T×T→T (Figure 8),
where S⊢t,ts⇒stt′⟺t′=StitchTrace(t,ts,S).
Stitching is the dual of extraction. It uses
the original trace to figure out the structure of the resultant trace, then
stitches back the expressions to get a new trace t′.
Independent Inference:
We define independent subproblem inference using the infer procedure.
infer takes as input a subproblem selection strategy
SS, a trace t and an inference tactic IT. This
differs from tangled inference in that inference tactic IT
takes only the extracted subtrace as input and not the entire program trace.
This approach enables the use of standard inference algorithms which are
designed to operate on complete traces (and not entangled subproblems).
The new inference procedure works as described below:
[TABLE]
Soundness and Completeness:
Given a trace t, a valid subproblem S, an inferred trace t′ and ts=ExtractTrace(t,S),
we prove that extraction and stitching is sound
and complete.
Soundness in this context means that for
all possible mutated subtraces ts′∈Traces(Program(ts)),
the stitched trace t′=StitchTrace(t,ts′,S) is a valid inferred trace.
Completeness means that for all possible inferred traces t′, there
exists a mutated subtrace ts′ such that ts′∈Traces(Program(ts))
and t′=StitchTrace(t,ts′,S).
We summarize the comparison between the entangled subproblem inference and independent
subproblem inference approaches in Figure 9.
We present the theorems, lemmas and proofs of soundness and completeness
in Appendix A.2 and A.3.
4. Convergence of Stochastic Alternating Class Kernels
We next introduce the concept of class functions and
class kernels, which we use to prove the convergence of hybrid inference algorithms based
on asymptotically converging MCMC-algorithms.
We present key definitions, theorems, and lemmas
required to prove these results.
4.1. Preliminaries
We begin by introducing some key measure theory definitions (Geyer, 1998; Meyn and Tweedie, 2012; Tierney, 1994). Readers familiar with measure theory may wish
to skip this subsection.
Definition 0 (Topology).
A Topology on set T is a collection T of subsets of T having the
following properties:
- (1)
∅∈T* and T∈T.*
2. (2)
T* is closed under arbitrary unions. i.e. For any collection {Ai}i∈I, if for all i∈I,Ai∈T, then
i∈I⋃Ai∈T.*
3. (3)
T* is closed under finitely many intersections. i.e. For any finite collection {Ai}i∈I, if for all i∈I,
Ai∈T, then
i∈I⋂Ai∈T.*
Given a set T and a topology T defined on T, the pair (T,T) is called a topological space.
Given a topological space (T,T), all sets A∈T are called open sets.
From this point on, when we say T is a topological space, we
assume we are talking about any topology T on T.
Definition 0 (σ-algebra).
Let T be a set. A collection
Σ of subsets of T is a σ-field* or a
σ-algebra over T if and only if
T∈τ and τ is closed under countable unions, intersections and
complements, i.e.,*
- (1)
T∈Σ* and ∅∈Σ.*
2. (2)
A∈Σ* implies Ac∈Σ.*
3. (3)
Σ* is closed under countable unions, i.e. For any countable collection
{Ai}i∈I, if for all i∈I, Ai∈Σ, then
i∈I⋃Ai∈Σ.*
*A measurable space is a pair (T,Σ) such that T is a set and
Σ is a σ-algebra over T.
*
Definition 0 (Measure).
m:Σ→R∪{−∞,∞}* is a
measure over a measurable space (T,Σ) if*
- (1)
m(A)≥m(∅)=0* for all A∈Σ,*
2. (2)
For all countable collections {Ai}i∈I of pairwise disjoint
sets in Σ,
[TABLE]
Given a measurable space (T,Σ), a measure π on (T,Σ) is called
probability measure if π(T)=1.
We call the tuple (T,Σ,π) a probability space, if π is
a probability measure over the measurable space (T,Σ).
Given a set T, a collection of subsets Aα⊆T (not necessarily countable),
we denote the smallest σ-algebra Σ such that Aα∈Σ
for all α by σ({Aα}).
Definition 0 (Borel σ-algebra).
Given a topological space T,
a Borel σ-algebra, B(T) is the smallest σ-algebra
containing all open sets of T.
(T,B(T)) is called a Borel Space when T is a topological space
and B(T) is a Borel σ-algebra over T.
Consider a topological space (T,T). For any set A∈T,
(A,TA) is also a topological space (where TA={E∩A∣E∈T}).
Given a measurable space (T,Σ), for any set A∈Σ,
(A,ΣA) is also a measurable space (where ΣA={E∩A∣E∈Σ}).
Topology and σ-algebra over Reals:
Consider the smallest topology R over the real space R which
contains all intervals (a,∞)⊆R for all −∞<a<∞.
To avoid confusion, we will refer the topological space over R with
the symbol R. We can now use this topology to define a Borel σ-algebra B(R)
over this topological space. Using the above defined topology and σ-algebra,
we can define a topological space and a σ-algebra for any open or closed intervel
in R.
Definition 0 (Measurable Function over Measurable Spaces).
Given measurable spaces (T1,Σ1) and (T2,Σ2), a function
h:T1→T2 is
a measurable function from (T1,Σ1) to (T2,Σ2)
if h−1{B}∈Σ1
for all sets B∈Σ2, where h−1{B}={x:h(x)∈B}.
The measurable function h is also known as a Random Variable
from measurable space (T1,Σ1) to (T2,Σ2).
If h:T1→T2 is a measurable function from
measurable space (T1,Σ1)
to a measurable space (T2,Σ2), and π is a
probability measure on (T1,Σ1), then πh:Σ2→[0,1]
defined as πh(A)=π(h−1(A))
is a probability measure on (T2,Σ2).
Definition 0 (Pushforward measure).
Given a probability space (T1,Σ1,π) and a measurable function
to a measurable space (T2,Σ2), then the pushforward measure of π
is defined as a probability measure f∗(π):Σ2→[0,1] given by
[TABLE]
Definition 0 (Measurable).
A function f is measurable if f is a measurable function
from a measurable space (T,Σ) to (R,B(R)).
f is measurable if and only if ∀a∈R.{x∈T∣f(x)>a}∈Σ.
Intuitively, for every real number −∞<a<∞, there exists a set A∈τ containing all
elements which f maps to real numbers greater than a.
Definition 0 (Simple Function).
Given a measurable space (T,Σ),
s:T→[0,∞) is a simple
function if s(t)=Σi=1NaiIAi(t), where
ai∈[0,∞), Ai∈Σ, IAi(t)=1 if
t∈Ai and [math] otherwise, and the Ai are disjoint.
Definition 0 (Lebesgue Integral).
Given a measurable space (T,Σ)
and a measure m over (T,Σ), for each
A∈Σ and disjoint Ai∈Σ, we define
[TABLE]
Hence
[TABLE]
Given a function f:T→[0,∞) which is measurable,
we define
[TABLE]
where s≤f if ∀t.s(t)≤f(t), as the Lebesgue Integral
of function f over a set E in measurable space (T,τ) with measure m.
Given a function f:T→R which is measurable, we define
[TABLE]
where f+(t)=max(f(t),0) and f−(t)=max(−f(t),0).
An integral of a measurable function f is the sum of the integral of
the positive part and the integral of the negative part.
From this point on, ∫f(t)m(dt) denotes ∫Tf(t)m(dt) where
T is the set over which the measurable space and measure m is defined.
Definition 0 (Markov Transition Kernel).
Let (T,Σ) be a measurable space. A Markov Transition kernel on (T,Σ) is
a map K:T×Σ→[0,1] such that :
- (1)
for any fixed A∈Σ, the function K(.,A) is measurable function from
(T,Σ) to [0,1].
2. (2)
for any fixed t∈T, the function K(t,.) is a probability
measure on (T,Σ).
Definition 0 (π-irreducible).
Given a probability space (T,τ,π), a Markov Transition Kernel
K:T×τ→[0,1] is
π-irreducible if for each t∈T and each A∈τ, such that π(A)>0,
there exists an integer n=n(t,A)≥1 such that
[TABLE]
where Kn(t,A)=∫TKn−1(t,dt′)K(t′,A)
and K1(t,A)=K(t,A).
Definition 0 (Stationary Distribution).
Given a probability space (T,τ,π), π is the stationary
distribution of a π-irreducible Markov Transition Kernel K:T×τ→[0,1] if
[TABLE]
where (πK)(A)=∫K(t,A)π(dt).
Definition 0 (Aperiodicity).
Given a probability space (T,τ,π), a π-irreducible
Markov Transition Kernel K:T×τ→[0,1]
is periodic if there exists an integer d≥2 and a sequence
{E0,E1,…Ed−1} and N of d non-empty disjoint sets in τ such
that, for all i=0,1,…d−1 and for all t∈Ei,
- (1)
(∪i=0dEi)∪N=T**
2. (2)
K(t,Ej)=1 for j=i+1(mod\leavevmode d)**
3. (3)
π(N)=0**
Otherwise K is aperiodic.
Definition 0 (Asymptotic convergence).
Given a probability space (T,τ,π) and sample t∈T,
a Markov Transition Kernel K:T×τ→[0,1]
is said to asymptotically converge to π if
[TABLE]
where ∣∣.∣∣ refers to the total variation norm of a measure λ,
defined over measurable space (T,τ), defined as
[TABLE]
Theorem 15.
Given a probability space (T,τ,π) and a Markov Transition Kernel
K:T×τ→[0,1].
If K is π-irreducible, aperiodic, and πK=π holds,
then for π-almost all t,
[TABLE]
i.e., K converges to π.
π-almost all t means that there exists a set D⊆T such
that π(D)=1 and for all t∈D, limn→∞∣∣Kn(t,.)−π∣∣=0.
Athreya, Doss, and Sethuraman proved this theorem (Athreya et al., 1996).
All popular asymptotically converging Markov Chain Algorithms, like variants of
the Metropolis Hasting and Gibbs Algorithm, when parameterized over probability space
(T,τ,π), are
π-irreducible and aperiodic with stationary distribution π.
Definition 0 (Subalgebra).
E* is a σ-subalgebra of a measurable space (T,τ) if E
is a σ-algebra of some set A⊆T and E⊆τ.*
Definition 0 (Induced Probability space).
Given a probability space (T,τ,π)
and A∈τ, define τA={B∩A∣B∈τ}.
Note that since τ is a σ-algebra, τA is a sub-algebra over
set A. (A,τA) is a measurable space. If π(A)>0, then function
πA:τA→[0,1], defined as πA(x)=π(x)/π(A),
is a probability measure over (A,τA).
(A,τA,πA) is the probability space induced by A∈τ.
Definition 0 (Regular Conditional Probability Measure over Product Space).
Given a product probability space (T1×T2,Σ1⊗Σ2,π), a regular
conditional probability measure v:T1×Σ2→[0,1] is a transition kernel
such that
For all A∈Σ2, v(.,A) is a measurable.
For all t∈T1, v(t,.) is a probability measure over (T1,Σ1).
For all B∈Σ1, π(B×A)=∫Bv(t,A)π(dt×T2).
4.2. Class Functions and Class Kernels
We next introduce class functions and class kernels, which formalize the
concept of subproblem based inference metaprograms. Gibbs sampling is a special case
of this framework. To aid the reader, we highlight
how the framework is specialized to Gibbs sampling as we introduce the definitions,
lemmas, and theorems.
Definition 0 (Two-way measurable function).
Given a measurable space (T1,Σ1)
and a measurable space (T2,Σ2),
a measurable function f from (T1,Σ1)
to (T2,Σ2) is a two-way
measurable function, if for
all sets A∈Σ1 there exists a set B∈Σ2,
such that
[TABLE]
i.e., the map of any set A∈Σ1 is a set in Σ2.
Given a two-way measurable function f, we can extend
f to a function g:Σ1→Σ2
between sets in Σ1 to sets in Σ2,
where g(A)={f(t)∣t∈A}. Since f is a measurable function,
the function f−1:Σ2→Σ1 is also
defined which maps sets in Σ2 to sets in Σ1.
Note: From this point on,
given a two-way measurable function f, we will simply use it to
represent function g defined above, mapping sets from Σ1
to Σ2. We also use f−1 as a reverse map from Σ2
to Σ1 defined above. Note that f−1 may or may not be the
inverse of the function f.
Example 0.
(Gibbs)
Given measurable spaces (X,X), (Y,Y) and
product space (X×Y,X⊗Y), the projection
functions projx and projy are two-way
measurable functions from (X×Y,X⊗Y)
to (X,X) and (Y,Y), respectively.
Definition 0 (Generalized Product Space).
Given countable sets of
disjoint measurable spaces {(Xi,Xi)∣i∈I} and
{(Yi,Yi)∣i∈I}, we
define the generalized product space (C,C) as:
[TABLE]
Given a probability measure π:C→[0,1] on a generalized product space (C,C),
we can define a conditional distribution πi over each product space
(Xi×Yi,Xi⊗Yi) such that
[TABLE]
In the following we require, for each product probability space
(Xi×Yi,Xi⊗Yi,πi), that we can construct a regular
conditional probability measure vπi:Xi×Yi→[0,1],
such that for each U×V∈Xi⊗Yi
[TABLE]
When constructing a generalized product space, we always
prove the above assumption, i.e., a regular conditional probability measure exists.
Example 0.
(Gibbs)
Given measurable spaces (X,X), (Y,Y), consider
the product probability spaces (X×Y,X⊗Y,π)
and (Y×X,Y⊗X,π′), where
[TABLE]
A Gibbs sampler, sampling over (X×Y,X⊗Y,π),
requires regular conditional probabilities
vπ:X×Y→[0,1] and
vπ′:Y×X→[0,1] for independence sampling. Therefore
both (X×Y,X⊗Y) and (Y×X,X⊗Y)
are generalized product spaces.
Definition 0 (Class Functions).
Given a measurable space (T,Σ) and a
generalized product space (C,C),
a class function is a one-to-one two-way measurable function
f from (T,Σ) to (C,C).
Given a class function f and the target product space
(C,C), the projection functions
[TABLE]
are also two-way measurable functions from
space (T,Σ) to projection spaces \big{(}\bigcup\limits_{i\in I}X_{i},\sigma(\bigcup\limits_{i\in I}{\mathcal{}X}_{i})\big{)} and \big{(}\bigcup\limits_{i\in I}Y_{i},\sigma(\bigcup\limits_{i\in I}{\mathcal{}Y}_{i})\big{)} respectively.
Section 5 uses class functions to model subproblem selection strategies — each
class function produces a tuple ⟨x,y⟩ that
identifies the parts of the trace that are outside (x) and inside (y)
the selected subproblem. Because the class function may depend on the input
trace t, our framework supports subproblem selection strategies that
depend on the input trace (and specifically on the values of stochastic choices in the trace).
Example 0.
(Gibbs)
Given measurable space (X,X), (Y,Y) and
generalized product spaces (X×Y,X⊗Y) and
(Y×X,Y⊗X), the identity function
id:X×Y→X×Y and the reverse function
re:X×Y→Y×X (i.e. re(⟨x,y⟩)=⟨y,x⟩)
are class functions from (X×Y,X⊗Y) to
(X×Y,X⊗Y) and (Y×X,Y⊗X).
Note that idx=projx and rex=projy.
Given a probability space (T,Σ,π),
a class function f to a
generalized product space (C^{f},{\mathcal{}C}^{f})=\big{(}\bigcup\limits_{i\in I}X^{f}_{i}\times Y^{f}_{i},\sigma(\bigcup\limits_{i\in I}{\mathcal{}X}^{f}_{i}\otimes{\mathcal{}Y}^{f}_{i})\big{)},
f maps each point t∈T to some point ⟨x,y⟩ in Xif×Yif for some i.
For each i∈I, we require a function Ki:Xif→(Yif×Yif)→[0,1]
such that
Ki(x):Yif×Yif→[0,1] is a
vf∗(π)i(x,.)-irreducible, aperiodic Markov Transition
Kernel with vf∗(π)i(x,.) as it’s stationary distribution.
Ki(.)(y,A) is measurable for all y∈Yif and all A∈Yif.
When we apply the framework to prove the convergence of inference metaprograms (Section 5),
each Ki represents an function that, when given a parameter x, returns a converging Markov kernel
based on x, where x corresponds loosely to the parts of the program trace that lie outside the
scope of the selected subproblem. These parameterized Ki support sophisticated subproblem inference
strategies that depend on how the subproblem decomposes the program trace. For example, the metaprogram
may apply one inference strategy to subproblems with discrete random choices and another to subproblems
with continuous choices.
Definition 0 (Class Kernels).
Given (T,Σ,π),f, and Ki as above, we define a
class kernel Kf:T×Σ→[0,1] as a Markov Transition Kernel
defined as
[TABLE]
where f(t)=⟨x,y⟩∈Xi×Yi and U×V=f(A)∩(Xi×Yi).
Example 0.
(Gibbs)
Consider a probability space (X×Y,X⊗Y,π) and
class functions id and re, Markov transition Kernels
Kid(⟨x,y⟩,U×V)=vπ(x,V)I(x,U) and
Kre(⟨x,y⟩,V×U)=vπ′(y,U)I(y,V)
representing independence samplers
are
class kernels.
4.3. Properties of Class Kernels
Consider a probability space (T,Σ,π) and
a class function f to a generalized
product space (C^{f},{\mathcal{}C}^{f})=\big{(}\bigcup\limits_{i\in I}X^{f}_{i}\times Y^{f}_{i},\sigma(\bigcup\limits_{i\in I}{\mathcal{}X}^{f}_{i}\otimes{\mathcal{}Y}^{f}_{i})\big{)}.
Below, we prove properties of a Class Kernel Kf within this context.
Lemma 0.
For all t∈T and A∈Σ,
[TABLE]
where f(t)=⟨x,y⟩∈Xi×Yi and U×V=f(A)∩Xi×Yi.
We present the proof of this lemma in Appendix A.1 (Lemma 1).
Lemma 0.
[TABLE]
We present the proof of this lemma in Appendix A.1 (Lemma 2).
Lemma 0.
Kf* is aperiodic if for at least one x∈Xi for some i∈I,
Ki(x):Yi×Yi→[0,1]
is aperiodic.*
We present the proof of this lemma in Appendix A.1 (Lemma 3).
4.4. Connecting the probability space
We use a finite set of class functions F={f1,f2…fn}
to model the inference steps of the hybrid inference metaprogram (Section 5).
A critical concept here is that, together, the F must connect the underlying probability
space — conceptually, starting at any positive probability set contained within the
space, it must be possible to reach any other positive probability set by
following a path of positive probability sets as defined by F. If F does not
connect the space, it is possible for the inference metaprogram to become stuck
within an isolated subspace, with some positive probability sets unreachable
even in the limit.
Definition 0 (Connecting the space (T,Σ,π)).
Given a probability space (T,Σ,π),
a finite set of class functions F={f1,f2…fn}
connect the probability space (T,Σ,π), if
for all sets A∈Σ and any two functions f,g∈F
[TABLE]
For standard two-component Gibbs sampling, there are only two class functions,
specifically id and re:
Example 0 (Connected product space).
(Gibbs)
Given a probability space (X×Y,X⊗Y,π)
and class functions id and re, then
id and re connect the space (X×Y,X⊗Y,π),
if for all sets U×V∈X⊗Y,
[TABLE]
as
[TABLE]
and
[TABLE]
4.5. Stochastic Alternating Class Kernels
Consider a probability space (T,Σ,π) where
Σ is countably generated.
We construct a new Markov Chain Transition Kernel
using a finite set of class functions F={f1,f2…fm}
and respective class Kernels Kf1,Kf2…Kfm.
Definition 0 (Stochastic Alternating Markov Chain Transition Kernel).
Given m positive real numbers pk∈(0,1) which sum to 1 (i.e. k=1∑mpk=1),
we define a stochastic alternating Markov Chain Transition Kernel
K:T×Σ→[0,1] as
[TABLE]
This transition kernel corresponds to randomly picking a class kernel Kfk
with probability pk and using it to transition into the next Markov Chain State.
Example 0.
(Gibbs)
The Markov Transition Kernel for a 2 component Gibbs sampler
over product probability space (X×Y,X⊗Y,π) is
[TABLE]
We now consider the question of convergence of the Stochastic Alternating Transition Kernel.
Let Rtk={A∈Σ∧∃1≤n≤k.Kn(t,A)>0} be the set of all sets in Σ which
are reachable by kernel K in k steps
starting from element t∈T.
Consider the limiting case Rt∞. Let Bt∞={t∈T∣∀\leavevmode A∈Σ.A∈/Rt∞⟹t∈/A}
which is the set of elements t′∈T such that any set A in Σ which contains t′ is reachable by kernel K.
Lemma 0.
For any f∈F, any element t′∈Bt∞ such that
f(t′)=⟨x,y⟩∈Xif×Yif, and any
set U×V∈σ(Xif⊗Yif) such that A=f−1(U×V),
the following condition holds:
[TABLE]
We present the proof of this lemma in Appendix A.1 (Lemma 4).
Lemma 0.
For any positive probability set A and any function f∈F,
if A⊆fx−1(fx(Bt∞)) then A∈Rt∞.
We present the proof of this lemma in Appendix A.1 (Lemma 5).
Lemma 0.
If F connects the space (T,Σ,π) then there does not exist a positive probability set A∈Σ,
such that A⊆⋂f∈Ffx−1((fx(Bt∞))c).
We present the proof of this lemma in Appendix A.1 (Lemma 6).
Theorem 37.
If F connects the space (T,Σ,π) then the Markov Transition Kernel
K is π-irreducible.
We present the proof of this theorem in Appendix A.1 (Theorem 7).
Theorem 38.
π* is the stationary distribution of Markov Kernel K, i.e.*
[TABLE]
We present the proof of this theorem in Appendix A.1 (Theorem 8).
Theorem 39.
The Markov Transition Kernel K is aperiodic if at least one of the
class kernels Kfj is aperiodic.
We present the proof of this theorem in Appendix A.1 (Theorem 9).
Theorem 40.
Markov Transition Kernel K converges to probability distribution π.
Proof.
Using Theorems 15, 37, 38, and 39.
∎
5. Inference Metaprograms
We next formalize the concept of the probability of a trace, introduce
inference metaprogramming, and use the results in Section 4 to prove
the convergence of inference metaprograms.
5.1. Preliminaries
Here we relate the concepts in Section 4 to concepts used in probabilistic programming.
Probability of a Trace:
Because two traces are equivalent if they differ (if at all) only in the choice of unique identifiers,
within this section, for clarity, we drop the id’s associated with augmented expressions
and stochastic choices in traces and augmented expressions wherever they are not required.
Assuming a countable set of variable names allowed in our probabilistic lambda
calculus language, a countable number of expressions and a countable number of
programs can be described in our probabilistic lambda calculus language.
Within our probabilistic lambda calculus language, we assume that all
stochastic distributions Distk:V×P(E)→[0,1]
are functions from a tuple of value and set of lambda expressions in our language to a real number between [math] and 1,
such that for any v∈V, Distk(v,.) is a probability measure over probability space (E,P(E)).
We assume that for each distribution Distk, we are given a probability density function
pdfDistk:V×E→[0,1], such that
[TABLE]
Let Tp=Traces(p) be the set of valid traces of a program p. Tp contains a countable number of traces.
We define a σ-algebra Σp over set Tp such that, for all t∈Tp, {t}∈Σp.
Given a trace t, pdf[[t]] is the unnormalized probability density of the trace t
(Figure 10). The normalized probability distribution μp:Σp→[0,1]
for a given program p is defined as
[TABLE]
Reversible Subproblem Selection Strategy:
Let p be a probabilistic program, t and t′
be valid traces from program p (i.e. t,t′∈Traces(p)), and
SS be a subproblem selection strategy that
returns a valid subproblem over t.
Definition 0 (Reversible subproblem selection strategy).
A subproblem selection strategy SS is reversible if
given any natural number n and n valid traces t1,t2,…tn∈Traces(p),
[TABLE]
i.e.,
given any n traces t1,t2…tn, if we can transform the trace ti to get trace ti+1 by
only modifying parts of the trace ti selected by the subproblem selection strategy SS, then
it is possible to transform the trace tn to get trace t1 by modifying parts of the trace
tn selected by the subproblem selection strategy SS.
In essence, reversible subproblem selection strategies always allow a subproblem based
inference algorithm to reverse a countable number of changes it has made to a trace (given that the
trace was not modified under a different subproblem selection strategy).
We use the shorthand SS⊢t≡t′ to denote
SS(t)⊢t≡t′.
Theorem 2.
A reversible subproblem selection strategy SS divides the trace space of program p
into equivalence classes.
We present the proof of this theorem in Appendix A.4 (Theorem 20).
A reversible subproblem selection strategy SS divides the trace space Tp
into a countable number of equivalence classes where each
equivalence class contains traces which can be modified into any other trace
in that class under the subproblem selection strategy.
A trace from one equivalence class cannot be modified by any subproblem based
inference algorithm to a trace from a different equivalence class under the
given subproblem selection strategy.
Given a reversible subproblem selection strategy SS, let
CSS={c1,…cn,…} be the countable set of
equivalence classes created by SS over the trace space Tp and
{Tc1,…Tcn,…} be the equivalence partitions
created by SS over Tp. Note that for all c∈CSS and t,t′∈Tc,
[TABLE]
and for all ci,cj∈CSS,t∈Tci and t′∈Tcj, where ci=cj
[TABLE]
In practice, subproblems are often specified by associating labels
with stochastic choices, then specifying the labels whose stochastic
choices should be included in the subproblem (Mansinghka et al., 2018).
A standard strategy is to have a fixed set of labels, with the labels
partitioning the choices into classes. Any strategy that always
specifies the subproblem via a fixed subset of labels is reversible.
Because of this property, all of the subproblem selection strategies
presented in (Mansinghka et al., 2018) are reversible.
Any subproblem selection strategy that always selects a fixed set of variables
is also reversible. This property ensures that the subproblem
selection strategy in Block Gibbs sampling, for example, is reversible.
Hence if two traces differ only in the choice of their id’s, all reversible
subproblem selection strategies will assign them to the same equivalence class.
Class functions given a subproblem selection strategy:
Consider a reversible subproblem selection strategy SS which
creates equivalence classes CSS={c1,…cn…}.
and equivalence partitions {Tc1,…Tcn,…}.
We create a generalized product space and class functions using the given subproblem
selection strategy.
Consider the countable set of disjoint measurable spaces
\big{\{}(C_{1},{\mathcal{}C}_{1}),\ldots(C_{n},{\mathcal{}C}_{n}),\ldots\big{\}}
where Ck={ck} and Ck={∅,Ck}. Also
consider disjoint measurable spaces \big{\{}(T_{c_{1}},\Sigma_{c_{1}}),\ldots(T_{c_{n}},\Sigma_{c_{n}}),\ldots\big{\}}.
where
Σck={A∩Tck∣A∈Σp}. We then construct the generalized product space
[TABLE]
Given a probability measure π on (C,C), we compute the conditional
distribution πi on (Ci×Tci,CSS⊗Σck), when π(Ci×Tci)>0
where
[TABLE]
We then define the regular conditional probability measure
vi:Ci×Σci→[0,1], where
vi(ci,A)=πi(Ci×A).
We create the class function fSS:Tp→C,
where
fSS(t)=⟨c,t⟩ where c is the equivalence class of trace t.
Since fSS is a one-to-one function, it is straightforward to
prove that fSS is a two-way measurable function.
Probability of the subtraces:
For all traces t,t′∈Tp such that SS⊢t≡t′,
subtraces ts=ExtractTraces(t,SS(t)) and ts′=ExtractTraces(t′,SS(t′))
are traces from the same program, i.e., ts,ts′∈Traces(ps), where
ps is the subprogram (Soundness).
Similarly, for all traces t∈Tp and subtraces ts=ExtractTrace(t,SS(t)),
for all subtraces ts′∈Traces(ps) (where ps=Program(ts)) and
t′=StitchTrace(t,ts′,SS(t)) then
SS⊢t≡t′ (Completeness).
Hence given a equivalence class ci and partitioned trace space Tci created
by subprogram selection strategy SS, there exists a subprogram ps such that
for all traces t∈Tci, subtraces ts=ExtractTrace(t,SS(t))
are valid traces of ps, i.e., ts∈Tps. We can therefore associate
traces from an equivalence class to valid subtraces of a subprogram.
Theorem 3.
Given a trace t and a valid subproblem S on t,
then for subtrace ts=ExtractTrace(t,S),
[TABLE]
i.e., the unnormalized densities of t and ts are equal.
We present the proof of this theorem in the Appendix A.4 (Theorem 22).
Consider an equivalence class ci and partitioned trace space Tci. Let ps
be the subprogram such that all subtraces of traces in Tci are valid traces of ps.
(Tps,Σps) is the measurable space over traces of subprogram ps.
The normalized probability distribution μps:Σps→[0,1] for the
subprogram ps is
[TABLE]
where A^{\prime}=\big{\{}t^{\prime}\big{|}t_{s}\in A,t^{\prime}=\mathsf{StitchTrace}(t,t_{s},\mathsf{SS}(t))\big{\}} for any trace t∈Tci.
Hence, sampling/inference over subprogram ps is equivalent to sampling/inference
over the original program p with the constraint that all traces belong to the equivalence class
ci.
Theorem 4.
Consider equivalence class ci and partitioned trace space Tci. Let ps
be the subprogram such that all subtraces of traces in Tci are valid traces of ps.
Then given a Markov Kernel K:Tps×Σps→[0,1] which is
μps-irreducible, aperiodic, and with stationary distribution μps, the
Markov kernel K(ci):Tci×Σci→[0,1] defined as
[TABLE]
where ts=ExtractTrace(t,SS(t)) and A′={ts∣t∈A,ts=ExtractTrace(t,SS(t))},
is vi(ci,.)-irreducible, aperiodic, and with stationary distribution vi(ci,.).
Proof.
ExtractTrace is one-to-one function from Tci to Tps
and the push forward measure of vi(ci,.) is μps.
∎
Definition 0 (Generalized Markov Kernel).
A Generalized Markov Kernel K is a parameterized Markov kernel which, when parameterized with a
probabilistic program p, defines the probability space (Tp,Σp,μp) (as defined above),
and returns a Markov Kernel K(p):Tp×Σp→[0,1], which is
μp-irreducible, aperiodic, and has the stationary distribution μp.
A generalized Markov Kernel formalizes the concept of Markov chain inference algorithms.
The inference algorithms used within the probabilistic programming framework are
generally coded to work with any input probabilistic program p and still
provide convergence guarantees. For example,
Venture (Mansinghka et al., 2018) allows the programmer to use a variety of inference algorithms
which in general work on all probabilistic programs which can be written in that language.
Definition 0 (Generalized Class Kernels).
Given a generalized Markov Kernel K and a subproblem selection strategy
SS, a generalized class Kernel KfSS is parameterized
with a probabilistic program p (which defines space (Tp,Σp,μp)), where
[TABLE]
and
ts=ExtractTrace(t,SS(t)), ps=Program(ts), fSS(t)=⟨ci,t⟩
and A′={ts∣t∈A,ts=ExtractTrace(t,SS(t))}.
5.2. Inference Metaprogramming
Using the concept of independent subproblem inference (Section 3) and generalized Markov Kernels, we define
an Inference Metaprogramming Language (Figure 11). An inference metaprogram
is either one of the black box Generalized Markov Kernel inference algorithms bi:Tp→Tp in our framework, which takes
a trace from an arbitrary program p as an input and returns another trace from the same program,
or a finite set S={p1\leavevmode ic1,p2\leavevmode ic2,…,pk\leavevmode ick} of inference
statements with an attached probability value pi∈(0,1), such that i=0∑kpk=1.
These probability values are used to randomly select a subproblem inference statement to execute.
Each infer statement is parameterized with a subproblem selection strategy SS,
which returns a valid subproblem over input trace t and an inference metaprogram that is executed over the subtrace.
Figure 12 presents the execution semantics of our inference metaprogramming language.
In comparison with entangled subproblem inference, one benefit of the approach is
that it is straightforward to apply independent subproblem inference recursively.
Theorem 7.
If all the subproblems used in our inference metaprograms are reversible and connect the space of their respective input probabilistic programs, then
all inference metaprograms in our inference metaprogramming language implement
a generalized Markov kernel.
Proof.
Proof by induction over structure of inference metaprograms.
Base Case:
All black box inference algorithms in our inference metaprogramming language are
generalized Markov kernels. Hence given traces of program p (which define the probability space (Tp,Σp,μp))
the black box inference algorithm is μp-irreducible, aperiodic, and has the stationary distribution μp.
Induction Case:
Consider the inference metaprogram ip={p1\leavevmode ic1,p2,ic2,…,pk\leavevmode ick},
where ici=infer(SSi,ipi) and i=1∑kpi=1.
Using the induction hypothesis, we assume, for all i∈{1,2,…k}, all ipi implement
a generalized Markov kernel Kipi. Since our subproblem SS is reversible, we
lift the generalized Markov kernel to generalized class kernel (Definition 6) KfSSi, which
for any program p (which defines space (Tp,Σp,μp)), is a class kernel.
Given an probabilistic program p, the inference metaprogram ip implements the Generalized Markov Kernel K,
where
[TABLE]
Using Theorems 37, 38, and 39,
if the class functions fSS1,fSS2,…,fSSk connect the space (Tp,Σp,μp),
K(p) is μp-irreducible, aperiodic, and has μp as its stationary distribution.
∎
Corollary 0.
Given a probabilistic program p (defining trace space (Tp,Σp,μp)), inference
metaprograms which use reversible subproblem selection strategies
which connect the space of their respective probabilistic programs asymptotically
converge to μp.
6. Related Work
Probabilistic Programming Languages: Over the last several decades
researchers have developed a range of probabilistic programming languages.
With current practice each language typically comes paired with one/a few black box inference strategies.
Example language/inference pairs include Stan (Carpenter et al., 2016) with
Hamiltonian Monte Carlo inference (Andrieu et al., 2003); Anglician (Tolpin
et al., 2015) with
particle Gibbs, etc.
Languages like LibBi (Murray, 2013), Edward (Tran et al., 2017) and Pyro (Labs, 2017) provide
inference customization mechanisms, but
without subproblems or asymptotic convergence guarantees.
Compilation Strategies for Probabilistic Programs:
Techniques for efficiently executing probabilistic programs are a prerequisite for their widespread adoption.
The Swift (Wu
et al., 2016) and Augur (Huang
et al., 2017; Tristan et al., 2014) compilers generate
efficient compiled implementations of inference algorithms that operate over probabilistic programs.
We anticipate that applying these compilation techniques to subproblems can deliver significant performance
improvements for the hybrid inference algorithms we study in this paper.
Subproblem Inference:
Both Turing (Ge
et al., 2018) and Venture (Mansinghka et al., 2018) provide inference metaprogramming
constructs with subproblems and different inference algorithms that operate on these
subproblems. In Venture subproblem inference is performed over full program traces,
with subproblems entangled with the full trace. The inference algorithms in Venture
must therefore operate over the entire trace while ensuring that the inference
effects do not escape the specified subproblem. Our extraction and stitching technique
eliminates this entanglement and enables the use of standard inference algorithms
that operate over complete traces while still supporting subproblem identification
and inference. Turing only provides mechanisms that target specific stochastic choices in the context of the
complete probabilistic computation.
There is work on extending Gibbs-like algorithms to work on Open Universe Probabilisitic Models (Arora et al., 2012; Milch and Russell, 2010).
This research studies algorithms that apply one strategy to choose a single variable at each iteration and rely on empirical
evidence of convergence. Our research, in contrast, supports a wide range of subproblem selection strategies
and provides formal proofs of asymptotic convergence properties for MCMC algorithms applied to these subproblems.
Asymptotic Convergence:
There is a vast literature on asymptotic convergence of Markov chain algorithms
in various statistics and probability settings (Meyn and Tweedie, 2012; Tierney, 1994). Our work is unique
in that it provides the first characterization of asymptotic convergence
for subproblem inference in probabilistic programs.
Complications that occur in this setting include mixtures of discrete
and continuous variables, stochastic choices with cascading effects
that may change the number of stochastic choices in the computation,
and resulting sample spaces with unbounded numbers of random variables.
Standard results from computational statistics, computational physics,
and Monte-Carlo methods focus on finite dimensional discrete state spaces,
a context in which linear algebra (i.e., spectral analysis of the transition
matrix of the underlying Markov chain (Diaconis and
Stroock, 1991) or coupling arguments (Levin and Peres, 2017))
is sufficient to prove convergence. State spaces with continuous random
variables are outside the scope of these formal analyses. Measure-theoretic
treatments are more general (Roberts
et al., 2004). Our results show how to apply
the concepts in these treatments to prove asymptotic convergence results for
probabilistic programs with interference metaprogramming.
Proving Properties of Probabilistic Inference:
Researchers have recently developed techniques for proving a variety of properties of different
inference algorithms for probabilistic programs (Scibior
et al., 2018; Ścibior
et al., 2017; Atkinson and
Carbin, 2017; Anonymous, 2020).
Our unique contribution relates to the treatment of convergence in the context of subproblems,
specifically 1) the identification of subproblem extraction and
stitching to obtain independent subproblems, 2) support for a general class of
(potentially state-dependent) subproblems, including programmable
subproblem selection strategies that may depend on the values of stochastic
choices from the current execution trace,
and 3) the mathematical formulation that enables us to state and prove asymptotic convergence results for hybrid inference metaprograms
that apply (a general class of potentially very different) MCMC algorithms to different parts of the inference problem.
We see our research and the research cited above as synergistic — one potential synergy is
that the research cited above can prove properties of the black-box MCMC algorithms
that our inference metaprograms deploy.
7. Conclusion
Inference metaprogramming, subproblem inference, and asymptotic convergence are key issues in
probabilistic programming.
Detangling the subproblem from the surrounding program trace allows
us to cleanly analyze subproblem based inference.
Our mathematical framework
introduces new concepts which enable us to
model subproblem based inference and
prove asymptotic convergence properties of the
resulting hybrid probabilistic inference algorithms.
Appendix A Appendix
A.1. Convergence
Lemma 0.
For all t∈T and A∈Σ,
[TABLE]
where f(t)=⟨x,y⟩∈Xi×Yi and U×V=f(A)∩Xi×Yi.
Proof.
Proof by induction.
Base case:
[TABLE]
using definition of Kf.
Induction Hypothesis:
For all 1≤n≤m, the following statement is true
[TABLE]
Induction Case:
[TABLE]
[TABLE]
Note that f(x′,y′)∈Xi×Yi
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Hence proved.
∎
Lemma 0.
[TABLE]
Proof.
Every set A∈Σ can be written as f−1(U×V) for some
U×V∈Cf as f is a two-way measurable function.
[TABLE]
We can split the integral into sum over the constituent product spaces Xi×Yi,
[TABLE]
We can rewrite Kf(f−1(x,y),f−1(U×V)) as
K(x)(y,V∩Yi)I(x,U∩Xi).
[TABLE]
∫x∈Xif(x)I(x,U∩Xi)m(dx)=∫x∈U∩Xif(x)m(dx) as
I(x,U∩Xi) is zero for all x∈/U∩Xi.
[TABLE]
We can rewrite f∗(π)i(dx′×dy) as vf∗(π)i)(x,dy)f∗(π)i(dx′×Yi)
using the definition of regular conditional probability distribution.
[TABLE]
[TABLE]
∫y∈YiKi(x)(y,V∩Yi)vf∗(π)i(x,dy)=vf∗(π)i(x,V∩Yi) as
vf∗(π)i(x,.) is the stationary distribution for kernel Ki(x).
[TABLE]
[TABLE]
∎
Lemma 0.
Kf* is aperiodic if for at least one x∈Xi for some i∈I,
Ki(x):Yi×Yi→[0,1]
is aperiodic.*
Proof.
Proof by contradiction. Let us assume Kf is periodic, i.e.,
there exists an integer d≥2, and a sequence
{E0,E1,…Ed−1} and N of d non-empty disjoint sets in τ such
that, for all i=0,1,…d−1 and for all t∈Ei,
- (1)
(∪i=0dEi)∪N=T
2. (2)
Kf(t,Ej)=1 for j=i+1(mod\leavevmode d)
3. (3)
π(N)=0
For all t∈Ei, Kf(t,Ej)=1 for j=i+1(mod\leavevmode d)
Consider any k∈I, any x∈Xk
and Ui×Vi=f(Ei)∩Xk×Yk.
Consider a trace t∈Ei, such that fx(t)=x.
Since Kf(t,Ei+1)=1≤I(x,Ui+1), for
all i=0,1…d−1, there exists a trace t′∈Ei,
such that f(t′)=⟨x,y⟩.
Kf(t,Ei+1)=1≤K(x)(y,Vi+1), Hence if
Kf is aperiodic, then for all x, K(x) is aperiodic.
∎
Lemma 0.
For any f∈F, any element t′∈Bt∞ such that
f(t′)=⟨x,y⟩∈Xif×Yif, and any
set U×V∈σ(Xif⊗Yif) such that A=f−1(U×V),
the following condition holds true
[TABLE]
Proof.
Consider class Kernel Kf(t′,A)=Ki(x)(y,V)I(x,U).
Since vf∗(π)i(x,V)>0 and Ki is vf∗(π)i-irreducible,
there exists an n such that
[TABLE]
Since x∈U, using Lemma 27
[TABLE]
[TABLE]
Since t′∈Bt∞,
there exists an n′ such that,
for all sets B∈Σ with t′∈B,
Kn′(t,B)>0. Hence
[TABLE]
∎
Lemma 0.
For any positive probability set A and any function f∈F,
if A⊆fx−1(fx(Bt∞)) then A∈Rt∞.
Proof.
Given a set A, we can treat f(A) as a union of sets
{Ui×Vi∣i∈If}, where Ui×Vi are
elements of set f(A) which are elements of the set Xif×Yif (i.e.
Ui×Vi=f(A)∩Xif×Yif).
Since π(A)>0, f∗(π)(f(A))>0 and for at least for one
i∈If, f∗(π)i(Ui×Vi)>0.
Since A⊆fx−1(fx(Bt∞)),
for each x∈Ui
there exists at least
one element t′∈Bt∞ such that fx(t′)=x.
If f∗(π)i(Ui×Vi)>0, there exists at least one t′∈Bt∞
such that f(t′)=⟨x,y⟩∈Ui×Yif and vf∗(π)i(x,Vi)>0.
Hence f−1(Ui×Vi)∈Rt∞.
∎
Lemma 0.
If F connects the space (T,Σ,π) then there does not exist a positive probability set A∈Σ,
such that A⊆⋂f∈Ffx−1((fx(Bt∞))c).
Proof.
Proof by Contradiction.
Let us assume such a set A exists.
If A is a positive probability set, then π(⋂f∈Ffx−1((fx(Bt∞))c))>0.
For any two functions f,g∈F,
[TABLE]
The set fx−1((fx(Bt∞))c)
only contains elements which are not in Bt∞ and gx−1(gx(Bt∞))
contains elements t′ such that there exists at least one element t′′∈Bt∞
with gx(t′)=gx(t′′).
Any positive probability set B⊆gx−1(gx(Bt∞)) is also a
subset of Bt∞ (using Lemma 35).
Hence
[TABLE]
Similarly
[TABLE]
Since π(⋂f∈Ffx−1((fx(Bt∞))c))>0,
[TABLE]
But this contradicts the fact the F connects the space (T,Σ,π).
Hence no such set A exists.
∎
Theorem 7.
If F connects the space (T,Σ,π) then the Markov Transition Kernel
K is π-irreducible.
Proof.
Proof by contradiction. Let us assume K is not π-irreducible, then there exists a
positive probability set A∈Σ such that A∈/Rt∞,
then
If
π(A∩Bt∞)>0, there exists a set B⊆Bt∞ and B⊆A
which implies A∈Rt∞. Hence π(A∩Bt∞)=0.
For any f∈F,
π(A∩fx−1(fx(Bt∞)))>0,
there exists a set B∈Rt∞ and B⊆A
which implies A∈Rt∞. Hence π(A∩fx−1(fx(Bt∞)))=0.
Since fx is a 2-way measurable function (and one-one function from sets to sets), for any set B
fx−1(fx(B))c=fx−1(fx(B)c).
Since π(A)>0 and
For any f∈F
π(A∩fx−1(fx(Bt∞)))=0
this means
[TABLE]
which means there exists a positive probability set B∈Σ
and B⊆fx−1(fx(Bt∞)c)),
which is impossible.
Hence no such set A exists. K is π-irreducible.
∎
Theorem 8.
π* is the stationary distribution of Markov Kernel K, i.e.*
[TABLE]
Proof.
[TABLE]
[TABLE]
∎
Theorem 9.
The Markov Transition Kernel K is aperiodic if at least one of the
class kernels Kfj is aperiodic.
Proof.
Proof by Contradiction.
Let us assume K is periodic.
i.e.,
there exists an integer d≥2 and a sequence
{E0,E1,…Ed−1} and N of d non-empty disjoint sets in τ such
that, for all i=0,1,…d−1 and for all t∈Ei,
- (1)
(∪i=0dEi)∪N=T
2. (2)
K(t,Ej)=1 for j=i+1(mod\leavevmode d)
3. (3)
π(N)=0
If K(t,Ej)=1 then for all f∈F, Kf(t,Ej)=1.
Therefore, for all i=0,1,…d−1 and for all t∈Ei,
Kf(t,Ej)=1 for j=i+1(mod\leavevmode d).
Hence if K is periodic, then for all f∈F, Kf is periodic.
Hence by contradiction, K is aperiodic.
∎
A.2. Soundness
Observation 1.
Note that whenever a rule in ⇒ex introduces an
assume statement in the subtrace, it creates a new variable name. Therefore
variable names in the new subtrace do not conflict with any variable names previously introduced
in another part of the trace. This fact will be used at various points within this paper.
Observation 2.
Note that whenever S⊢ae⇒exaes,ts, V(ae)=V(aes) and
whenever S⊢ae,aes,ts⇒stae′, V(ae′)=V(aes).
To prove soundness of our interface we start by proving
that for a given trace t and a valid subproblem S on trace t,
if for any subtrace ts the stitching process succeeds, the output trace
t′ differs from trace t only in parts which are within the subproblem (i.e. S⊢t≡t′).
Formally, for any trace t and subproblem S,
[TABLE]
Using the definition of StitchTrace,
the above lemma can be rewritten as
[TABLE]
One will note that the stucture of ts does not play a significant role in
proving the above condition.
To prove the above statement we require a similar condition over
augmented expressions embedded within traces. The lemma over augmented expressions is given below:
Lemma 0.
For all augmented expressions ae,ae′,
[TABLE]
Proof.
Proof using induction.
Base Case:
Case 1: ae=(x:x)#id,
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then
ae′=(x:x)#id′.
By definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x:x)#id,
[TABLE]
Case 2: ae=(x(idv):v)#id
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=(x(idv′):v′)#id′.
By definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x(idv):v)#id,
[TABLE]
Case 3: ae=(λ.x\leavevmode e:v)#id
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=(λ.x\leavevmode e:v′)#id′.
By definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=(λ.x\leavevmode e:v)#id,
[TABLE]
Induction Cases:
Case 1:
ae=((ae1\leavevmode ae2)⊥:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=((ae1\leavevmode ae2)⊥:v′)#id′, S⊢ae1,_,_⇒stae1′, and
S⊢ae2,_,_⇒stae2′.
By induction hypothesis
[TABLE]
Because S⊢ae1,_,_⇒stae1′
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2,_,_⇒stae2′
[TABLE]
Because S⊢ae1≡ae1′ and S⊢ae2≡ae2′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)⊥:v)#id,
[TABLE]
Case 2:
ae=((ae1\leavevmode ae2)aa:v)#id and ID(ae1)∈S
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=((ae1\leavevmode ae2)aa′:v′)#id′, S⊢ae1,_,_⇒stae1′, and
S⊢ae2,_,_⇒stae2′.
By induction hypothesis
[TABLE]
Because S⊢ae1,_,_⇒stae1′
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2,_,_⇒stae2′
[TABLE]
Because S⊢ae1≡ae1′ and S⊢ae2≡ae2′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)aa:v)#id,
[TABLE]
Case 3:
ae=((ae1\leavevmode ae2)x=ae3:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=((ae1\leavevmode ae2)y=ae3′:v′)#id′, S⊢ae1,_,_⇒stae1′, and
S⊢ae2,_,_⇒stae2′.
By induction hypothesis
[TABLE]
Because S⊢ae1,_,_⇒stae1′
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2,_,_⇒stae2′
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae3,_,_⇒stae3′
[TABLE]
Because S⊢ae1≡ae1′, S⊢ae2≡ae2′ and S⊢ae3≡ae3′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)x=ae3:v)#id,
[TABLE]
Case 4:
ae=(Dist(ae1#ide)=ae2):v)#id and ide∈/S.
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=(Dist(ae1′#ide)=ae2′):v′)#id′, S⊢ae1,_,_⇒stae1′, and
S⊢ae2,_,_⇒stae2′.
By induction hypothesis
[TABLE]
Because S⊢ae1,_,_⇒stae1′
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2,_,_⇒stae2′
[TABLE]
Because S⊢ae1≡ae1′ and S⊢ae2≡ae2′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=(Dist(ae1#ide)=ae2):v)#id and ide∈/S,
[TABLE]
Case 5:
ae=(Dist(ae1#ide)=ae2):v)#id and ide∈S.
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then ae′=(Dist(ae1′#ide)=ae2′):v′)#id′ and S⊢ae1,_,_⇒stae1′.
By induction hypothesis
[TABLE]
Because S⊢ae1,_,_⇒stae1′
[TABLE]
Because S⊢ae1≡ae1′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when ae=(Dist(ae1#ide)=ae2):v)#id and ide∈S,
[TABLE]
Because all cases are covered, using induction, the following statement is true for
all augmented expressions ae,ae′ and subproblems S.
[TABLE]
∎
Next we use Lemma 10 to prove the lemma below::
Lemma 0.
Given traces t and t′ and a subproblem S,
[TABLE]
Proof.
Proof by Induction
Base Case:
t=∅
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then t′=∅.
By definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when t=∅
[TABLE]
Induction Case:
Case 1:
t=ts;assume\leavevmode x=ae
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then t′=ts′;assume\leavevmode x=ae′, S⊢ae,_,_⇒stae′, and S⊢ts,_⇒stts′.
By induction hypothesis
[TABLE]
Because S⊢ts,_⇒stts′
[TABLE]
From Lemma 10
[TABLE]
Because S⊢ae,_,_⇒stae′
[TABLE]
Because S⊢ts′≡ts, and S⊢ae≡ae′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when t=ts;assume\leavevmode x=ae
[TABLE]
Case 2:
t=ts;observe(Dist(ae)=ev)
By assumption
[TABLE]
By definition of ⇒st
[TABLE]
Then t′=ts′;observe(Dist(ae′)=ev), S⊢ae,_,_⇒stae′, and S⊢ts,_⇒stts′.
By induction hypothesis
[TABLE]
Because S⊢ts,_⇒stts′
[TABLE]
From Lemma 10
[TABLE]
Because S⊢ae,_,_⇒stae′
[TABLE]
Because S⊢ts′≡ts and S⊢ae≡ae′, by definition of ≡
[TABLE]
Therefore
[TABLE]
Therefore when t=ts;observe(Dist(ae′)=ev)
[TABLE]
Because all cases have been covered, using induction,
for all traces t,t′, subproblems S and subtrace ts,
[TABLE]
∎
Corollary 0.
Given a valid trace t, a valid subproblem S,
a valid subtrace ts, for all traces t′ :
[TABLE]
Therefore, given a trace t and a valid subproblem S on t, if the
stitching process succeeds, then the output trace t′ will only differ from trace t
with parts which are within the subproblem S.
Next, we prove that given a valid trace t, a valid subproblem S on trace t,
and a subtrace ts=ExtractTrace(t,S), for any subtrace ts′∈Traces(Program(ts)),
the stitched trace t′=StitchTrace(t,ts′,S) is a valid trace from the program of trace t
(i.e. t′∈Traces(Program(t))).
Formally, given a valid trace t, a valid subproblem S on t, and
a subtrace ts=ExtractTrace(t,S),
[TABLE]
To prove the above statement, we require a few lemmas first which prove a similar condition for augmented expressions
and traces under non-empty environment
Lemma 0.
Given environements σv,σid,σv′ and σid′
such that dom\leavevmode σv=dom\leavevmode σid=dom\leavevmode σv′=dom\leavevmode σid′ an augmented expression ae within a trace t,
and a valid subproblem S over trace t
[TABLE]
[TABLE]
[TABLE]
Proof.
Proof by induction
Base Case:
Case 1:
ae=(x:x)#id
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=x and x∈/dom\leavevmode σv.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(x:x)#id and ts=∅.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=assume\leavevmode z=x.
By assumption
[TABLE]
By definition of ⇒s, dom\leavevmode σv′=dom\leavevmode σv
[TABLE]
Then ts′=∅ and aes′=(x:x)#id′.
Consider ae′=(x:x)#id′.
By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒s and x∈dom\leavevmode σv′
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x:x)#id
[TABLE]
[TABLE]
[TABLE]
Case 2:
ae=(x(idv):v)#id
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=x, v=σv(x), and idv=σid(x).
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(x(idv):v)#id and ts=∅.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=assume\leavevmode z=x.
By assumption
[TABLE]
By definition of ⇒s, dom\leavevmode σv′=dom\leavevmode σv
[TABLE]
Then ts′=∅, aes′=(x(idv′):v′)#id′, v′=σv′(x), and idv′=σid′(x).
Consider ae′=(x(idv′):v′)#id′.
By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒s,v′=σv′(x), and idv′=σid′(x)
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x(idv):v)#id
[TABLE]
[TABLE]
[TABLE]
Case 3:
ae=(λ.x\leavevmode e′:v)#id
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=λ.x\leavevmode e′.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(λ.x\leavevmode e′:v)#id and ts=∅.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=assume\leavevmode z=λ.x\leavevmode e′.
By assumption
[TABLE]
By definition of ⇒s, dom\leavevmode σv′=dom\leavevmode σv
[TABLE]
Then ts′=∅ and aes′=(λ.x\leavevmode e′:v′)#id′.
Consider ae′=(λ.x\leavevmode e′:v′)#id′.
By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒s and σv′,σid′⊢λ.x\leavevmode e′⇒s_,_,(λ.x\leavevmode e′:v′)#id′
[TABLE]
Therefore
[TABLE]
Therefore when ae=(λ.x\leavevmode e′:v)#id
[TABLE]
[TABLE]
[TABLE]
Induction Cases:
Case 1:
ae=(Dist(ae1#ide)=aev:v)#id and ide∈S
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=Dist(e1), σv,σid⊢e1⇒s_,_,ae1.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(Dist(ae1#ide)=aev:v)#id, ts=ts1,
and S⊢ae1⇒exaes1,ts1.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;assume\leavevmode z=Dist(es1) and aes1⇒res1.
By assumption
[TABLE]
By definition of ⇒s,
[TABLE]
Then ts′=ts2, aes′=(Dist(aes2#ide)=aev′:v′)#id′, and σv′,σid′⊢ps1;assume\leavevmode z=e1⇒sts2;assume\leavevmode z=aes2.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e1⇒s_,_,ae1, S⊢ae1⇒exaes1,ts1,
ts1;assume\leavevmode z=aes1⇒rps1;assume\leavevmode z=es1, and
σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2.
[TABLE]
Consider ae′=(Dist(ae1′#ide′)=aev′:v′)#id′.
By definition of ⇒st, S⊢ae1,aes2,ts2⇒stae1′
[TABLE]
Therefore
[TABLE]
By definition of ⇒s and σv′,σid′⊢e1⇒s_,_,ae1,
[TABLE]
Therefore
[TABLE]
Therefore when ae=(Dist(ae1#ide)=aev:v)#id
[TABLE]
[TABLE]
[TABLE]
Case 2:
ae=(Dist(ae1#ide)=ae2:v)#id and ide∈/S
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=Dist(e1), σv,σid⊢e1⇒s_,_,ae1, ae2⇒re2
and σv,σid⊢e2⇒s_,_,ae2.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=aes3, ts=ts1;observe(Dist(aes1)=e2);ts3,
S⊢ae1⇒exaes1,ts1, and S⊢ae2⇒exaes3,ts3.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;observe(Dist(es1)=e2);ps2;assume\leavevmode z=es2, aes1⇒res1,
ts1⇒rps1, aes3⇒res2, and ts3⇒rps2.
By assumption
[TABLE]
By definition of ⇒s,
[TABLE]
Then ts′=ts2;observe(Dist(aes2)=e2);ts4, aes′=aes4, σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2,
and σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e1⇒s_,_,ae1, S⊢ae1⇒exaes1,ts1,
ts1;assume\leavevmode z=aes1⇒rps1;assume\leavevmode z=es1, and
σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2.
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e2⇒s_,_,ae2, S⊢ae2⇒exaes3,ts3,
ts3;assume\leavevmode z=aes3⇒rps2;assume\leavevmode z=es2, and
σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
[TABLE]
Consider ae′=(Dist(ae1′#ide)=ae2′:v′)#id′.
By definition of ⇒st, S⊢ae1,aes2,ts2⇒stae1′, and
S⊢ae2,aes4,ts4⇒stae2′,
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e1⇒s_,_,ae1′, σv′,σid′⊢e2⇒s,_,_,ae2′,
[TABLE]
Therefore
[TABLE]
Therefore when ae=(Dist(ae1#ide)=ae2:v)#id
[TABLE]
[TABLE]
[TABLE]
Case 3:
ae=((ae1\leavevmode ae2)⊥:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=(e1\leavevmode e2), σv,σid⊢e1⇒s_,_,ae1,
and σv,σid⊢e2⇒s_,_,ae2.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=((aes1\leavevmode aes3)⊥:v)#id, ts=ts1;ts3,
S⊢ae1⇒exaes1,ts1, and S⊢ae2⇒exaes3,ts3.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;ps2;assume\leavevmode z=(es1\leavevmode es2), aes1⇒res1,
ts1⇒rps1, aes3⇒res2, and ts3⇒rps2.
By assumption
[TABLE]
By definition of ⇒s,
[TABLE]
Then ts′=ts2;ts4, aes′=((aes2\leavevmode aes4)⊥:v)#id, σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2,
and σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e1⇒s_,_,ae1, S⊢ae1⇒exaes1,ts1,
ts1;assume\leavevmode z=aes1⇒rps1;assume\leavevmode z=es1, and
σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2.
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e2⇒s_,_,ae2, S⊢ae2⇒exaes3,ts3,
ts3;assume\leavevmode z=aes3⇒rps2;assume\leavevmode z=es2, and
σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
[TABLE]
Consider ae′=((ae1′\leavevmode ae2′)⊥:v′)#id′.
By definition of ⇒st, S⊢ae1,aes2,ts2⇒stae1′, and
S⊢ae2,aes4,ts4⇒stae2′,
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e1⇒s_,_,ae1′, σv′,σid′⊢e2⇒s,_,_,ae2′,
and because ae1 is not in the subproblem S, therefore its value will not change
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)⊥:v)#id, ID(ae1)∈/S
[TABLE]
[TABLE]
[TABLE]
Case 4:
ae=((ae1\leavevmode ae2)aa:v)#id and ID(ae1)∈S
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=(e1\leavevmode e2), σv,σid⊢e1⇒s_,_,ae1,
and σv,σid⊢e2⇒s_,_,ae2.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=((aes1\leavevmode aes3)aa:v)#id, ts=ts1;ts3,
S⊢ae1⇒exaes1,ts1, and S⊢ae2⇒exaes3,ts3.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;ps2;assume\leavevmode z=(es1\leavevmode es2), aes1⇒res1,
ts1⇒rps1, aes3⇒res2, and ts3⇒rps2.
By assumption
[TABLE]
By definition of ⇒s,
[TABLE]
Then ts′=ts2;ts4, aes′=((aes2\leavevmode aes4)aa′:v)#id, σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2,
and σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e1⇒s_,_,ae1, S⊢ae1⇒exaes1,ts1,
ts1;assume\leavevmode z=aes1⇒rps1;assume\leavevmode z=es1, and
σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2.
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e2⇒s_,_,ae2, S⊢ae2⇒exaes3,ts3,
ts3;assume\leavevmode z=aes3⇒rps2;assume\leavevmode z=es2, and
σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
[TABLE]
Consider ae′=((ae1′\leavevmode ae2′)aa′:v′)#id′.
By definition of ⇒st, S⊢ae1,aes2,ts2⇒stae1′, and
S⊢ae2,aes4,ts4⇒stae2′,
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e1⇒s_,_,ae1′, σv′,σid′⊢e2⇒s,_,_,ae2′,
and because V(ae1′)=V(aes2), V(ae2′)=V(aes4) (Observation 2)
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)aa:v)#id, ID(ae1)∈S
[TABLE]
[TABLE]
[TABLE]
Case 5:
ae=((ae1\leavevmode ae2)y=ae3:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then e=(e1\leavevmode e2), σv,σid⊢e1⇒s_,_,ae1,
σv,σid⊢e2⇒s_,_,ae2, σv′′,σid′′⊢e3⇒s_,_,ae3, and V(ae1)=⟨λ.x\leavevmode e3,σv′′,σid′′⟩.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=aes5, ts=ts1;assume\leavevmode x=aes1;ts3;assume\leavevmode y=aes3;ts5,
S⊢ae1⇒exaes1,ts1, S⊢ae2⇒exaes3,ts3, and
S⊢ae3⇒exaes5,ts5.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;assume\leavevmode x=es1;ps2;assume\leavevmode y=es2;ps3;assume\leavevmode z=es3, aes1⇒res1,
ts1⇒rps1, aes3⇒res2, ts3⇒rps2, aes5⇒res3, and ts5⇒rps3.
By assumption
[TABLE]
By definition of ⇒s,
[TABLE]
Then ts′=ts2;assume\leavevmode x=aes2;ts4;assume\leavevmode y=aes4;ts6,
aes′=aes6, σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2,
σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4, σv′′′,σid′′′⊢ps3;assume\leavevmode z=es3⇒sts6;assume\leavevmode z=aes6.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e1⇒s_,_,ae1, S⊢ae1⇒exaes1,ts1,
ts1;assume\leavevmode z=aes1⇒rps1;assume\leavevmode z=es1, and
σv′,σid′⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2.
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e2⇒s_,_,ae2, S⊢ae2⇒exaes3,ts3,
ts3;assume\leavevmode z=aes3⇒rps2;assume\leavevmode z=es2, and
σv′,σid′⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4.
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv′′,σid′′⊢e3⇒s_,_,ae3, S⊢ae3⇒exaes5,ts5,
ts5;assume\leavevmode z=aes5⇒rps3;assume\leavevmode z=es3, and
σv′′′,σid′′′⊢ps3;assume\leavevmode z=es3⇒sts6;assume\leavevmode z=aes6.
[TABLE]
Consider ae′=((ae1′\leavevmode ae2′)y=ae3′:v′)#id′.
By definition of ⇒st, S⊢ae1,aes2,ts2⇒stae1′,
S⊢ae2,aes4,ts4⇒stae2′, and S⊢ae3,aes6,ts6⇒stae3′
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e1⇒s_,_,ae1′, σv′,σid′⊢e2⇒s,_,_,ae2′,
and because V(ae1′)=V(aes2), V(ae1′)=V(aes2) (Observation 2), σv′′′,σid′′′⊢e3⇒s_,_,ae3′
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)y=ae3:v)#id, ID(ae1)∈/S
[TABLE]
[TABLE]
[TABLE]
Because we have covered all cases, using induction, the lemma is true.
∎
Lemma 0.
Given environements σv,σid,σv′ and σid′
such that dom\leavevmode σv=dom\leavevmode σid=dom\leavevmode σv′=dom\leavevmode σid′ a partial trace t within a trace tp(t* is a suffix of trace tp),
and a valid subproblem S over trace tp*
[TABLE]
[TABLE]
Proof.
Proof by induction
Base Case:
t=∅
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then p=∅.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=∅.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=∅.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then ts′=∅.
Consider t′=∅.
By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒s
[TABLE]
Therefore
[TABLE]
Therefore when t=∅
[TABLE]
[TABLE]
Induction Case:
Case 1:
t=assume\leavevmode x=ae;t1
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then p=assume\leavevmode x=e;p1, σv,σid⊢e⇒sv,id,ae, σv′′=σv[x→v],
σid′′=σid[x→id], and
σv′′,σid′′⊢p1⇒st1.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=ts1;assume\leavevmode x=aes;ts3, S⊢ae⇒exaes,ts1, and
S⊢t1⇒exts3.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;assume\leavevmode x=es;ps2, ts1⇒rps1, aes⇒res, and ts3⇒rps2.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then ts′=ts2;assume\leavevmode x=aes′;ts4, σv′,σid′⊢ps1;assume\leavevmode x=es⇒sts2;assume\leavevmode x=aes′,
σv′′′=σv′[x→V(aes′)], σid′′′=σid′[x→ID(aes′)],
and σv′′′,σid′′′⊢ps2⇒sts4 (Observation 1, variable names in ts2 do not collide with
variable names in ts4).
By induction hypothesis
[TABLE]
[TABLE]
Because σv′′,σid′′⊢p1⇒st1, S⊢t1⇒exts3,
S⊢t1⇒exts3, ts3⇒rps2, and σv′′′,σid′′′⊢ps2⇒sts4,
[TABLE]
By statement 13
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e⇒s_,_,ae, S⊢ae⇒exaes,ts1,
ts1;assume\leavevmode z=aes⇒rps1;assume\leavevmode z=es and σv′,σid′⊢ps1;assume\leavevmode z=es⇒sts2;assume\leavevmode z=aes′,
[TABLE]
Consider t′=assume\leavevmode x=ae′;t1′.
By definition of ⇒st, S⊢t1,ts4⇒stt1′, and
S⊢ae,aes′,ts2⇒stae′,
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e⇒s_,_,ae′, and
σv′′′,σid′′′⊢ps2⇒sts4 (V(ae′)=V(aes′) and ID(ae′)=ID(aes′) Observation 2)
[TABLE]
Therefore
[TABLE]
Therefore when t=assume\leavevmode x=ae;t1
[TABLE]
[TABLE]
Case 2:
t=observe(Dist(ae)=ev);t1
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then p=observe(Dist(e)=ev);p1, σv,σid⊢e⇒sv,id,ae, and
σv,σid⊢p1⇒st1.
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=ts1;observe(Dist(aes)=ev);ts3, S⊢ae⇒exaes,ts1, and
S⊢t1⇒exts3.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then ps=ps1;observe(Dist(aes)=ev);ps2, ts1⇒rps1, aes⇒res, and ts3⇒rps2.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then ts′=ts2;observe(Dist(aes′)=ev);ts4, σv′,σid′⊢ps1;observe(Dist(es)=ev)⇒sts2;observe(Dist(aes′)=ev),
and σv′,σid′⊢ps2⇒sts4 (Observation 1, variable names in ts2 do not collide with
variable names in ts4).
By induction hypothesis
[TABLE]
[TABLE]
Because σv,σid⊢p1⇒st1, S⊢t1⇒exts3,
S⊢t1⇒exts3, ts3⇒rps2, and σv′,σid′⊢ps2⇒sts4,
[TABLE]
By statement 13
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because σv,σid⊢e⇒s_,_,ae, S⊢ae⇒exaes,ts1,
ts1;observe(Dist(aes)=ev)⇒rps1;observe(Dist(es)=ev) and σv′,σid′⊢ps1;observe(Dist(es)=ev)⇒sts2;observe(Dist(aes′)=ev),
[TABLE]
Consider t′=observe(Dist(ae′)=ev);t1′.
By definition of ⇒st, S⊢t1,ts4⇒stt1′, and
S⊢ae,aes′,ts2⇒stae′,
[TABLE]
Therefore
[TABLE]
By definition of ⇒s, σv′,σid′⊢e⇒s_,_,ae′, and
σv′,σid′⊢ps2⇒sts4
[TABLE]
Therefore
[TABLE]
Therefore when t=observe(Dist(ae)=ev);t1
[TABLE]
[TABLE]
Because we have covered all cases, using induction, the below statement
is true.
[TABLE]
[TABLE]
∎
Lemma 0.
Given a valid trace t and a valid subproblem S
and subtrace ts=ExtractTrace(t,S)
for all possible subtraces ts′ :
[TABLE]
Proof.
Since statement 14 is true for all σv,σid,σv′, and σid′,
given dom\leavevmode σv=dom\leavevmode σid=dom\leavevmode σv′=dom\leavevmode σid′,
replacing σv,σid,σv′, and σid′ with ∅ (empty environment) will result in
[TABLE]
∎
Theorem 16.
Given a valid trace t, a valid subproblem S, subtrace ts=ExtractTrace(t,S).
For all possible subtraces ts′,
ts′∈Traces(Program(ts)) implies
there exist a trace t′ such that:
t′=StitchTrace(t,ts′,S)**
t′∈Traces(Program(t))**
S⊢t≡t′**
Proof.
From Corollary 12 and Lemma 15.
∎
A.3. Completeness
Within this section we prove that our interface is complete, i.e.
given a valid trace t, a valid subproblem S on t, a subtrace ts=ExtractTrace(t,ts,S),
for any trace t′ which can be achived from entangled subproblem interface (i.e. t′∈Traces(Program(t)) and S⊢t≡t′),
there exists a subtrace ts′∈Traces(Program(ts)) such that t′=StitchTrace(t,ts′,S).
Formally, given valid trace t, a valid subproblem S on t, a subtrace ts, and any trace t′
[TABLE]
[TABLE]
We need to prove a few lemmas which will aid us in proving the above statement.
Lemma 0.
Given an augmented expression ae and a subproblem S, a subtrace ts, subaugmented expression aes
and an augmented expressions ae′ such that
[TABLE]
then, there exists an aes′, ts′, ps and es such that
[TABLE]
[TABLE]
Proof.
Proof by Induction
Base Case:
Case 1: ae=(x:x)#id
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(x:x)#id and ts=∅.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=(x:x)#id′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=x.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then x∈/dom\leavevmode σv and id′ is a unique id.
Consider aes′=(x:x)#id′, ts′=∅,
ps=∅ and es=x. By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
Because x∈/dom\leavevmode σv, the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x:x)#id,
[TABLE]
implies
[TABLE]
[TABLE]
Case 2: ae=(x(idv):v)#id
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(x(idv):v)#id and ts=∅.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=(x(idv′):v)#id′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=x.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv(x)=v′,σid(x)=idv′, and id′ is a unique id.
Consider aes′=(x(idv′):v′)#id′, ts′=∅,
ps=∅ and es=x. By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
Because σv(x)=v′,σid(x)=idv′, the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=(x(idv):v)#id,
[TABLE]
implies
[TABLE]
[TABLE]
Case 3: ae=(λ.x\leavevmode e′:v)#id
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(λ.x\leavevmode e′:v)#id and ts=∅.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=(λ.x\leavevmode e′:v)#id′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=λ.x\leavevmode e′.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then id′ is a unique id.
Consider aes′=(λ.x\leavevmode e′:v′)#id′, ts′=∅,
ps=∅ and es=λ.x\leavevmode e′. By definition of ⇒st
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢λ.x\leavevmode e′⇒s_,_,(λ.x\leavevmode e′:v′)#id′, the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=(λ.x\leavevmode e′:v)#id,
[TABLE]
implies
[TABLE]
[TABLE]
Induction Case:
Case 1: ae=(Dist(ae1#ide)=ae2:v)#id and ide∈S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(Dist(aes1#ide)=ae2:v)#id, S⊢ae1⇒exaes1,ts1 and ts=ts1.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=(Dist(ae1′#ide′)=ae2′:v′)#id′ and S⊢ae1≡ae1′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=Dist(e1) and ae1⇒re1.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e1⇒s_,_,ae1′, e2∈dom\leavevmode Dist(v′), and σv,σid⊢e2⇒s_,_,ae2′.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae1⇒exaes1,ts1, S⊢ae1≡ae1′, ae1⇒re1, and
σv,σid⊢e1⇒s_,_,ae1′,
[TABLE]
[TABLE]
Consider aes′=(Dist(aes2#ide)=ae2′:v′)#id′, ts′=ts2, ps=ps1,
and es=Dist(es1). Because S⊢ae1,aes2,ts2⇒stae1′ and ide∈S, the
definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts1⇒rps1,
[TABLE]
Because aes1⇒res1, the definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2, σv,σid⊢e2⇒s_,_,ae2′,
and all variable names introduced by ts2 do not conflict with variable names in ae2′ (Observation 1),
the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=(Dist(ae1#ide)=ae2:v)#id and ide∈S, and assuming induction hypothesis,
[TABLE]
[TABLE]
[TABLE]
Case 2: ae=(Dist(ae1#ide)=ae2:v)#id and ide∈/S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(aes3:v)#id, S⊢ae1⇒exaes1,ts1,
S⊢ae2⇒exaes3,ts3, ae2⇒re2, and ts=ts1;observe(Dist(aes1)=e2);ts3.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=(Dist(ae1′#ide)=ae2′:v′)#id′, S⊢ae1≡ae1′, S⊢ae2≡ae2′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=Dist(e1) and ae1⇒re1.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e1⇒s_,_,ae1′, e2∈dom\leavevmode Dist(v′), and σv,σid⊢e2⇒s_,_,ae2′.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae1⇒exaes1,ts1, S⊢ae1≡ae1′, ae1⇒re1, and
σv,σid⊢e1⇒s_,_,ae1′,
[TABLE]
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae2⇒exaes3,ts3, S⊢ae2≡ae2′, ae2⇒re2, and
σv,σid⊢e2⇒s_,_,ae2′,
[TABLE]
[TABLE]
Consider aes′=(aes4:v′)#id′, ts′=ts2;observe(Dist(aes2)=e2);ts4, ps=ps1;observe(Dist(es1)=e2);ps2,
and es=es2. Because S⊢ae1,aes2,ts2⇒stae1′, S⊢ae2,aes4,ts4⇒staes′, and ide∈S, the
definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts1⇒rps1, ts3⇒rps2, aes1⇒res1,
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because aes3⇒res2, the definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2, σv,σid⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4,
and all variable names introduced by ts2 do not conflict with variable names in aes4 and ts4 (Observation 1),
the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=(\mathsf{Dist}(ae_{1}\id_{e})=ae_{2}:v)#idandid_{e}\in{\mathcal{}S}$, and assuming induction hypothesis,
[TABLE]
[TABLE]
[TABLE]
Case 3: ae=((ae1\leavevmode ae2)aa:v)#id and ID(ae1)∈S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=((aes1\leavevmode aes3)aa:v)#id, S⊢ae1⇒exaes1,ts1,
S⊢ae2⇒exaes3,ts3, and ts=ts1;ts3.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=((ae1′\leavevmode ae2′)aa′:v′)#id′, S⊢ae1≡ae1′, S⊢ae2≡ae2′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=(e1\leavevmode e2), ae1⇒re1, and ae2⇒re2.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e1⇒s_,_,ae1′ and σv,σid⊢e2⇒s_,_,ae2′.
If V(ae1′)=⟨λ.x\leavevmode e′,σv′,σid′⟩,
σv′[y→V(ae2′)],σid′[y→ID(ae2′)]⊢e′[y/x]⇒s_,_,aee, aa=y=aee,
else
aa=⊥.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae1⇒exaes1,ts1, S⊢ae1≡ae1′, ae1⇒re1, and
σv,σid⊢e1⇒s_,_,ae1′,
[TABLE]
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae2⇒exaes3,ts3, S⊢ae2≡ae2′, ae2⇒re2, and
σv,σid⊢e2⇒s_,_,ae2′,
[TABLE]
[TABLE]
Consider aes′=((aes2\leavevmode aes4)aa′:v′)#id′,
ts′=ts2;ts4, ps=ps1;ps2,
and es=(es1\leavevmode es2). Because S⊢ae1,aes2,ts2⇒stae1′, S⊢ae2,aes4,ts4⇒stae2′, and ide∈S, the
definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts1⇒rps1 and ts3⇒rps2,
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because aes3⇒res2 and aes1⇒res1, the definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2, σv,σid⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4,
and all variable names introduced by ts2 do not conflict with variable names in aes4 and ts4 (Observation 1),
the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)aa:v)#id and ide∈S, and assuming induction hypothesis,
[TABLE]
[TABLE]
[TABLE]
Case 4: ae=((ae1\leavevmode ae2)⊥:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=((aes1\leavevmode aes3)⊥:v)#id, S⊢ae1⇒exaes1,ts1,
S⊢ae2⇒exaes3,ts3, and ts=ts1;ts3.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=((ae1′\leavevmode ae2′)⊥:v′)#id′, S⊢ae1≡ae1′, S⊢ae2≡ae2′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=(e1\leavevmode e2), ae1⇒re1, and ae2⇒re2.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e1⇒s_,_,ae1′ and σv,σid⊢e2⇒s_,_,ae2′.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae1⇒exaes1,ts1, S⊢ae1≡ae1′, ae1⇒re1, and
σv,σid⊢e1⇒s_,_,ae1′,
[TABLE]
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae2⇒exaes3,ts3, S⊢ae2≡ae2′, ae2⇒re2, and
σv,σid⊢e2⇒s_,_,ae2′,
[TABLE]
[TABLE]
Consider aes′=((aes2\leavevmode aes4)⊥:v′)#id′,
ts′=ts2;ts4, ps=ps1;ps2,
and es=(es1\leavevmode es2). Because S⊢ae1,aes2,ts2⇒stae1′, S⊢ae2,aes4,ts4⇒stae2′, and ide∈S, the
definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts1⇒rps1 and ts3⇒rps2,
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because aes3⇒res2 and aes1⇒res1, the definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2, σv,σid⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4,
and all variable names introduced by ts2 do not conflict with variable names in aes4 and ts4 (Observation 1),
the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)⊥:v)#id and ide∈/S, and assuming induction hypothesis,
[TABLE]
[TABLE]
[TABLE]
Case 5: ae=((ae1\leavevmode ae2)y=ae3:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(aes5:v)#id, S⊢ae1⇒exaes1,ts1,
S⊢ae2⇒exaes3,ts3, S⊢ae3→exaes5,ts5,
and ts=ts1;assume\leavevmode x=aes1;ts3;assume\leavevmode y=aes3;ts5.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then ae′=((ae1′\leavevmode ae2′)y=ae3′:v′)#id′, S⊢ae1≡ae1′, S⊢ae2≡ae2′,
and S⊢ae3≡ae3′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then e=(e1\leavevmode e2), ae1⇒re1, and ae2⇒re2.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e1⇒s_,_,ae1′ and σv,σid⊢e2⇒s_,_,ae2′,
V(ae1′)=⟨λ.x\leavevmode e3′,σv′,σid′⟩, e3=e3′[y/x], and
σv′[y→V(ae2′)],σid′[y→ID(ae2′)]⊢e3⇒s_,_,ae3′.
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae1⇒exaes1,ts1, S⊢ae1≡ae1′, ae1⇒re1, and
σv,σid⊢e1⇒s_,_,ae1′,
[TABLE]
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae2⇒exaes3,ts3, S⊢ae2≡ae2′, ae2⇒re2, and
σv,σid⊢e2⇒s_,_,ae2′,
[TABLE]
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae3⇒exaes5,ts5, S⊢ae3≡ae3′, ae3⇒re3, and
σv,σid⊢e3⇒s_,_,ae3′,
[TABLE]
[TABLE]
Consider aes′=(aes6:v′)#id′,
ts′=ts2;assume\leavevmode x=aes2;ts4;assume\leavevmode y=aes4;ts6, ps=ps1;assume\leavevmode x=es1;ps2;assume\leavevmode y=es2;ps3,
and es=es3. Because S⊢ae1,aes2,ts2⇒stae1′, S⊢ae2,aes4,ts4⇒stae2′, S⊢ae3,aes6,ts6⇒stae3′, the
definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts1⇒rps1, ts3⇒rps2,
ts5⇒rps3, aes1⇒res1, and aes3⇒res2,
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because aes5⇒res3, the definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es1⇒sts2;assume\leavevmode z=aes2, σv,σid⊢ps2;assume\leavevmode z=es2⇒sts4;assume\leavevmode z=aes4,
and all variable names introduced by ts2 do not conflict with variable names in aes4 and ts4 (Observation 1),
the definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when ae=((ae1\leavevmode ae2)y=ae3:v)#id and ide∈/S, and assuming induction hypothesis,
[TABLE]
[TABLE]
[TABLE]
Because we have covered all cases, using induction, the following statement is true
[TABLE]
[TABLE]
[TABLE]
∎
Lemma 0.
For any two traces t,t′, a valid subproblem S on t and
a subtrace S⊢t⇒exts
[TABLE]
implies there exists a subtrace ts′ such that
[TABLE]
Proof.
Proof by Induction
Base Case:
t=∅
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=∅.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then t′=∅.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then p=∅.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Consider ts′=∅, ps=∅.
By definition of ⇒r
[TABLE]
Therefore
[TABLE]
By definition of ⇒s
[TABLE]
Therefore
[TABLE]
By definition of ⇒st
[TABLE]
Therefore
[TABLE]
Therefore when t=∅,
[TABLE]
[TABLE]
Induction Case:
Case 1:
t=assume\leavevmode x=ae;t1
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=ts1;assume\leavevmode x=aes;ts3, S⊢ae⇒exaes,ts1
and S⊢t1⇒exts3.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then t′=assume\leavevmode x=ae′;t1′, S⊢ae≡ae′, and S⊢t1≡t1′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then p=assume\leavevmode x=e;p1, ae⇒re, and t1⇒rp1.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e⇒s_,_,ae′, σv′,σid′⊢p1⇒st1′,
and σv′=σv[x→V(ae′)], σid′=σid[x→ID(ae′)].
By induction hypothesis
[TABLE]
[TABLE]
Because S⊢t1⇒exts3, S⊢t1≡t1′, t1⇒rp1, and σv′,σid′⊢p1⇒st1′,
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae⇒exaes,ts1, S⊢ae≡ae′,
ae⇒re, and σv,σid⊢e⇒s_,_,ae′,
[TABLE]
[TABLE]
Because S⊢ae,aes′,ts2⇒stae′ and S⊢t1,ts4⇒stt1′,
the definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts3⇒rps2, ts1⇒rps1, and aes⇒res, the
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es⇒sts2;assume\leavevmode z=aes′,
σv′,σid′⊢ps2⇒sts4, all variable names introduced
by ts2 do not conflict with variable names in ts4 (Observation 1),
V(ae′)=V(aes′), and ID(ae′)=ID(aes′) (Observation 2), the
definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when t=assume\leavevmode x=ae;t1, and assuming the induction hypothesis,
[TABLE]
[TABLE]
Case 2:
t=observe(Dist(ae)=ev);t1
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then ts=ts1;observe(Dist(aes)=ev);ts3, S⊢ae⇒exaes,ts1
and S⊢t1⇒exts3.
By assumption
[TABLE]
By definition of ≡
[TABLE]
Then t′=observe(Dist(ae′)=ev);t1′, S⊢ae≡ae′, and S⊢t1≡t1′.
By assumption
[TABLE]
By definition of ⇒r
[TABLE]
Then p=observe(Dist(e)=ev);p1, ae⇒re, and t1⇒rp1.
By assumption
[TABLE]
By definition of ⇒s
[TABLE]
Then σv,σid⊢e⇒s_,_,ae′ and σv,σid⊢p1⇒st1′.
By induction hypothesis
[TABLE]
[TABLE]
Because S⊢t1⇒exts3, S⊢t1≡t1′, t1⇒rp1, and σv,σid⊢p1⇒st1′,
[TABLE]
By induction hypothesis
[TABLE]
[TABLE]
[TABLE]
Because S⊢ae⇒exaes,ts1, S⊢ae≡ae′,
ae⇒re, and σv,σid⊢e⇒s_,_,ae′,
[TABLE]
[TABLE]
Because S⊢ae,aes′,ts2⇒stae′ and S⊢t1,ts4⇒stt1′,
the definition of ⇒st implies
[TABLE]
Therefore
[TABLE]
Because ts3⇒rps2, ts1⇒rps1, and aes⇒res, the
definition of ⇒r implies
[TABLE]
Therefore
[TABLE]
Because σv,σid⊢ps1;assume\leavevmode z=es⇒sts2;assume\leavevmode z=aes′,
σv,σid⊢ps2⇒sts4, and all variable names introduced
by ts2 do not conflict with variable names in ts4 (Observation 1),
the
definition of ⇒s implies
[TABLE]
Therefore
[TABLE]
Therefore when t=observe(Dist(ae)=ev);t1, and assuming the induction hypothesis,
[TABLE]
[TABLE]
All cases have been covered therefore by induction the following statement is true.
[TABLE]
[TABLE]
∎
Theorem 19.
Given a valid trace t and a valid subproblem S of t and a subtrace
ts=ExtractTrace(t,S). For all possible
traces t′,
t′∈Traces(Program(t))\leavevmode ∧\leavevmode S⊢t≡t′
implies
there exists a subtrace ts′ such that
ts′∈Traces(Program(ts))**
t′=StitchTrace(t,ts′,S)**
Proof.
From definitions of Traces, Program, ExtractTrace, and StitchTrace,
given a trace t and a valid subproblem S
[TABLE]
[TABLE]
Because Lemma 18 is true for all environments σv,σid,σv′, and σid′.
The theorem is equivalent to the lemma, but with σv,σid,σv′, and σid′ set to ∅.
∎
A.4. Metaprogramming
Theorem 20.
A reversible subproblem selection strategy SS divides the trace space of program p
into equivalence classes.
Proof.
SS⊢t≡t′ is an equivalence
relation over traces t,t′∈T.
Reflexivity : SS⊢t≡t is true by definition.
Symmetry : SS⊢t≡t∧SS⊢t≡t′⟹SS⊢t′≡t.
Hence its symmetric.
Transitivity : SS⊢t1≡t2, SS⊢t2≡t3
then SS⊢t1≡t3 (by definition of reversibility and symmetry).
∎
Lemma 0.
For any augmented expression ae and subproblem S,
[TABLE]
Proof.
Proof by Induction
Base Case:
Case 1: ae=x:x,
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=x:x and ts=∅.
By definition of pdf,
pdf[[x:x]]=1 and pdf[[∅]]=1.
Therefore
[TABLE]
Case 2: ae=x:v,
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=x:v and ts=∅.
By definition of pdf,
pdf[[x:v]]=1 and pdf[[∅]]=1.
Therefore
[TABLE]
Case 3: ae=λ.x\leavevmode e:v,
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then pdf[[λ.x\leavevmode e:v]]=v,1 and pdf[[∅]]=1.
Therefore
[TABLE]
Induction Case:
Case 1: ae=(ae1\leavevmode ae2)⊥:v,
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(aes1\leavevmode aes2)⊥:v, ts=ts1;ts2, S⊢ae1⇒exaes1,ts1,
and S⊢ae2⇒exaes2,ts2.
By induction hypothesis
[TABLE]
Because S⊢ae1⇒exaes1,ts1,
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2⇒exaes2,ts2
[TABLE]
From definition of pdf
[TABLE]
[TABLE]
From definition of pdf
pdf[[aes]]=pdf[[aes1]]∗pdf[[aes2]]
and
pdf[[ts]]=pdf[[ts1]]∗pdf[[ts2]]. Therefore
[TABLE]
Case 2: ae=((ae1\leavevmode ae2)x=ae3:v)#id and ID(ae1)∈S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=(aes1\leavevmode aes2)x=ae3:v, ts=ts1;ts2, S⊢ae1⇒exaes1,ts1,
and S⊢ae2⇒exaes2,ts2.
By induction hypothesis
[TABLE]
Because S⊢ae1⇒exaes1,ts1,
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2⇒exaes2,ts2
[TABLE]
From definition of pdf
[TABLE]
[TABLE]
From definition of pdf
pdf[[aes]]=pdf[[aes1]]∗pdf[[aes2]]∗pdf[[ae3]]
and
pdf[[ts]]=pdf[[ts1]]∗pdf[[ts2]]. Therefore
[TABLE]
Case 3: ae=((ae1\leavevmode ae2)x=ae3:v)#id and ID(ae1)∈/S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=aes3:v, ts=ts1;assume\leavevmode y=aes1;ts2;assume\leavevmode x=aes2;ts3, S⊢ae1⇒exaes1,ts1,
S⊢ae2⇒exaes2,ts2, S⊢ae3⇒exaes3,ts3.
By induction hypothesis
[TABLE]
Because S⊢ae1⇒exaes1,ts1,
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2⇒exaes2,ts2
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae3⇒exaes3,ts3
[TABLE]
From definition of pdf
[TABLE]
[TABLE]
From definition of pdf
pdf[[aes]]=pdf[[aes3]]
and
pdf[[ts]]=pdf[[ts1]]∗pdf[[aes1]]∗pdf[[ts2]]∗pdf[[aes2]]∗pdf[[ts3]]. Therefore
[TABLE]
Case 4: ae=Dist(ae1#ide)=ae2:v and ide∈S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=Dist(ae1)=ae2:v, ts=ts1, and S⊢ae1⇒exaes1,ts1.
By induction hypothesis
[TABLE]
Because S⊢ae1⇒exaes1,ts1,
[TABLE]
From definition of pdf, ae2⇒re
[TABLE]
[TABLE]
From definition of pdf
pdf[[aes]]=pdf[[aes1]]∗pdf[[ae2]]∗pdfDist(V(ae1),e)
and
pdf[[ts]]=pdf[[ts1]]. Therefore
[TABLE]
Case 5: ae=(Dist(ae1#ide)=ae2:v)#id and ide∈/S
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then aes=aes2:v, ts=ts1, S⊢ae1⇒exaes1,ts1, ae2⇒revand
S⊢ae2⇒exaes1,ts1.
By induction hypothesis
[TABLE]
Because S⊢ae1⇒exaes1,ts1,
[TABLE]
By induction hypothesis
[TABLE]
Because S⊢ae2⇒exaes2,ts2,
[TABLE]
From definition of pdf,
[TABLE]
[TABLE]
From definition of pdf
pdf[[aes]]=pdf[[aes2]],
pdf[[ts]]=pdf[[ts1]]∗pdf[[aes1]]∗pdf[[ts2]]∗pdfDist(V(aes1),e) and V(aes1)=V(ae1) (Observation 2). Therefore
[TABLE]
Because we have considered all cases, by induction, for augmented expression ae, subproblem S, augmented subexpression aes, and a subtrace ts,
[TABLE]
∎
Theorem 22.
Given a trace t and a valid subproblem S on t,
then for subtrace ts=ExtractTrace(t,S),
[TABLE]
i.e. for the unnormalized density of t and ts is equal.
Proof.
Proof by induction
Base Case:
t=∅
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then
ts=∅.
By definition of pdf, pdf[[∅]]=1
[TABLE]
Induction Case:
Case 1:
t=assume\leavevmode x=ae;t1
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then S⊢ae⇒exaes,ts1, S⊢t1⇒exts2.
By induction hypothesis
[TABLE]
Because S⊢t1⇒exts2,
[TABLE]
From Lemma 21 over augmented expressions
[TABLE]
Because S⊢ae⇒exaes,ts1
[TABLE]
From definition of pdf
[TABLE]
Because pdf[[ts]]=pdf[[aes]]∗pdf[[ts1]]∗pdf[[ts2]]
[TABLE]
Case 2:
t=observe(Dist(ae)=e);t1
By assumption
[TABLE]
By definition of ⇒ex
[TABLE]
Then S⊢ae⇒exaes,ts1, S⊢t1⇒exts2.
By induction hypothesis
[TABLE]
Because S⊢t1⇒exts2,
[TABLE]
From Lemma 21 over augmented expressions
[TABLE]
Because S⊢ae⇒exaes,ts1
[TABLE]
From definition of pdf
[TABLE]
Because pdf[[ts]]=pdf[[aes]]∗pdf[[ts1]]∗pdf[[ts2]]∗pdfDist(V(ae),e) and V(ae)=V(aes) (Observation 2)
[TABLE]
Because we have covered all cases, by induction, for any trace t and subproblem S
[TABLE]
Therefore for any trace t and subproblem S
[TABLE]
∎