Limit theorems for cloning algorithms
Letizia Angeli, Stefan Grosskinsky, Adam M. Johansen

TL;DR
This paper provides a rigorous theoretical framework for analyzing the convergence and errors of cloning algorithms used in estimating large deviations in continuous-time stochastic processes, applicable to jump processes on compact spaces.
Contribution
It establishes the first comprehensive, fully rigorous bounds on systematic and random errors of cloning algorithms in continuous time, avoiding time discretization.
Findings
Bounds systematic and random errors of cloning algorithms
Applicable to a large class of jump processes on compact spaces
Provides a framework for evaluating and improving algorithm efficiency
Abstract
Large deviations for additive path functionals of stochastic processes have attracted significant research interest, in particular in the context of stochastic particle systems and statistical physics. Efficient numerical `cloning' algorithms have been developed to estimate the scaled cumulant generating function, based on importance sampling via cloning of rare event trajectories. So far, attempts to study the convergence properties of these algorithms in continuous time have only led to partial results for particular cases. Adapting previous results from the literature of particle filters and sequential Monte Carlo methods, we establish a first comprehensive and fully rigorous approach to bound systematic and random errors of cloning algorithms in continuous time. To this end we develop a method to compare different algorithms for particular classes of observables, based on the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Limit Theorems for Cloning Algorithms
Letizia Angeli
Stefan Grosskinsky
Adam M. Johansen
Letizia Angeli: University of Warwick, The Alan Turing Institute and Heriot-Watt University.
Email: [email protected]
Stefan Grosskinsky: University of Warwick, The Alan Turing Institute and TU Delft.
Email: [email protected]
Adam M. Johansen: University of Warwick and The Alan Turing Institute.
Email: [email protected]
Abstract
Large deviations for additive path functionals of stochastic processes have attracted significant research interest, in particular in the context of stochastic particle systems and statistical physics. Efficient numerical ‘cloning’ algorithms have been developed to estimate the scaled cumulant generating function, based on importance sampling via cloning of rare event trajectories. So far, attempts to study the convergence properties of these algorithms in continuous time have led to only partial results for particular cases. Adapting previous results from the literature of particle filters and sequential Monte Carlo methods, we establish a first comprehensive and fully rigorous approach to bound systematic and random errors of cloning algorithms in continuous time. To this end we develop a method to compare different algorithms for particular classes of observables, based on the martingale characterization of stochastic processes. Our results apply to a large class of jump processes on compact state space, and do not involve any time discretization in contrast to previous approaches. This provides a robust and rigorous framework that can also be used to evaluate and improve the efficiency of algorithms.
65C35,
60F25,
62L20,
60F10,
60J75,
60K35,
cloning algorithm,
dynamic large deviations,
interacting particle systems,
convergence,
Feynman-Kac formulae,
jump processes,
keywords:
[class=MSC]
keywords:
\startlocaldefs\endlocaldefs
,
and
1 Introduction
Cloning algorithms have been introduced to the theoretical physics literature [1, 2] as numerical methods to study large deviations of particle currents and other dynamic observables in stochastic particle systems. They combine importance sampling with a stochastic selection mechanism which is used to evaluate numerically the scaled cumulant generating function for time-additive path functionals of stochastic processes. Based on classical ideas of evolutionary algorithms [3, 4], a fixed size population of copies of the original system evolves in parallel, subject to cloning or killing in such a way as to favor the realization of atypical trajectories contributing to rare events. Various variants of the approach are now applied on a regular basis to different systems and large deviation phenomena of interest [5, 6, 7], including also current fluctuations of non-equilibrium lattice gas models [8, 6, 9, 10], turbulent flows [11], glassy dynamics [12], heat waves in climate models [13] and pressure of the edge-triangle model [14]. Due to its widespread applications, the mathematical justification and convergence properties of the algorithm have recently become a subject of research interest with only partial progress. Formal approaches so far are based on a branching process interpretation of the algorithm in discrete time [15], with limited and mostly numerical results in continuous time [16, 17, 18, 19].
In this paper, we provide a novel interpretation of cloning algorithms through Feynman-Kac models and their particle approximations (see [20, 21, 22, 23] for comprehensive reviews), which is itself an established approach to understanding sequential Monte Carlo methods and particle filtering. Previous results provide rigorous control on convergence properties and error bounds of particle filters and related algorithms, mostly for models in discrete time, beginning with the chain of research initiated by [24] with a recent survey provided in [22]. Fewer results address continuous-time dynamics, dating back to [25] in the filtering context, with a Feynman-Kac-based treatment provided by [20] and references therein; a survey of the filtering literature is provided by [26, Chapter 9]. In the current context, particularly relevant recent works include [27, 28, 29, 30, 31]. This literature generally considers diffusive dynamics and relies upon approximative time-discretisations of those dynamics. Adapting those results to the context of jump processes on locally compact state spaces, for which exact simulation from the dynamics is possible, we can establish first rigorous convergence results for the cloning algorithm in continuous time including bounds on the random error and bounds on the systematic error. These bounds include the explicit dependence on the clone size distribution, which is a key parameter of the cloning algorithm. The setting of finite activity pure jump processes in which cloning algorithms are primarily employed allows these algorithms to avoid time discretisation by simulating exactly from the law of the underlying process and allows the use of different approximating particle systems. Similar methods have been previously employed in the probabilistic rare event analysis literature in both discrete and continuous time, via explicit Feynman-Kac approximations, e.g. [32], and splitting algorithms (see [33] and references therein); however, both the underlying processes and approximations considered are quite different to those for which cloning algorithms are usually employed. Practically, an important contribution of our approach is a systematic method to compare different cloning algorithms and particle approximations for particular classes of observables of interest, based on the martingale characterization of continuous-time stochastic processes.
This framework provides a novel perspective on the underlying structure of cloning algorithms in terms of McKean representations [22, Section 1.2.2], and can be used to systematically explore several degrees of freedom in the design of algorithms that can be used to improve performance, as illustrated in [34] for current large deviations of the inclusion process [10]. Here we focus on presenting full rigorous results obtained by applying this approach to a version of the classical cloning algorithm in continuous time [2]. In contrast to previous work in the context of cloning algorithms [15, 16], our mathematical approach does not require a time discretization and works in the very general setting of a pure jump Markov process on a locally compact state space. This covers in particular any finite-state Markov chain or stochastic particle systems on finite lattices.
The paper is organized as follows. In Section 2 we introduce general Feynman-Kac models associated to pure jump Markov processes and show that they can be interpreted as the law of a non-linear Markov process, known as a McKean interpretation [21]. In Section 3 we introduce particle approximations for Feynman-Kac models, including classical mean-field versions and cloning algorithms. We provide generalized conditions for convergence as our main result (proved in Section 4), and use this to establish rigorous convergence bounds for cloning algorithms. In Section 5 we introduce large deviations and scaled cumulant generating functions (SCGF) of additive observables for pure jump Markov processes and discuss how the results presented in Section 3 can be applied to estimate the SCGF. We conclude with a short discussion in Section 6.
2 Mathematical Setting
2.1 Dynamics and Feynman-Kac models
We consider a continuous-time homogeneous Feller process \big{(}X_{t}:t\geq 0\big{)} taking values on a locally compact Polish state space , where is the Borel field on . We denote by and the sets of measures and probability measures, respectively, on . \big{(}P(t):t\geq 0\big{)} describes the semigroup associated with , which is considered as acting on the Banach space of bounded continuous functions , endowed with the supremum norm
[TABLE]
We use the standard notation and for the distribution and the corresponding expectation on the usual path space
[TABLE]
The measurable structure on is given by the Borel -algebra induced by the Skorokhod topology (see [35], Chapter 3). If we want to emphasize a particular initial condition or distribution of the process we write and , or and , respectively. The semigroup acts on bounded continuous functions and probability measures via
[TABLE]
where the latter provides a weak characterization of the distribution at time . Here and in the following we use the common notation for expectations of w.r.t. measures on .
Using the Hille-Yosida Theorem (see e.g. [36], Chapter 3), it is possible to associate to the above Feller process an infinitesimal generator acting on a dense subset so that
[TABLE]
for all and .
In this work, we restrict ourselves to nonexplosive pure jump Feller processes. We denote by the escape rate from state and the target state is chosen with the probability kernel , so that the overall transition rate is
[TABLE]
for . We assume to be a strictly positive, bounded and continuous function and to be a continuous function for every . Under these assumptions, the pure jump process possesses an infinitesimal generator [37, p. 162] with full domain given by
[TABLE]
Along with jump processes on continuous spaces such as continuous-time random walks on (see e.g. [38]), this setting includes in particular any finite-state continuous-time Markov chain. Typical compact examples we have in mind are given by stochastic particle systems on , with finite local state space and lattice which can be finite or countably infinite. These include spin systems with or exclusion processes with , in which particles can jump only onto empty sites. Stochastic particle systems such as zero-range processes with are locally compact as long as the lattice is finite (see e.g. [39] for details).
We will study Feynman-Kac models associated to the jump process by tilting its generator with a diagonal part or potential, which arise in many applications including dynamic large deviations, as explained in detail in Section 5.
Lemma 2.1**.**
Consider a potential function and the tilted generator
[TABLE]
Then the family of operators \big{(}P^{\mathcal{V}}(t):t\geq 0\big{)} with , defined as the solution to the backward equation
[TABLE]
*for all , forms a non-conservative semigroup, the so-called Feynman-Kac semigroup, and is its infinitesimal generator in the sense of the Hille-Yosida Theorem. *
Proof.
See [36], Theorem 3.47.
∎
In order to control the asymptotic behaviour of , we make the following assumption, which closely resembles [28, Assumption 1], on asymptotic stability.
Assumption 2.2** (Asymptotic Stability).**
The spectrum of (2) is bounded by a principal eigenvalue . Moreover, is associated to a positive eigenfunction and an eigenmeasure . Finally, there exist constants and such that
[TABLE]
for every and .
Asymptotic stability is for example guaranteed for all irreducible, finite-state continuous-time Markov chains which necessarily have a spectral gap. For alternative sufficient conditions implying asymptotic stability in a more general context including continuous state spaces, see Appendix A.
We introduce the measures for any general initial distribution and , defined by
[TABLE]
for any . In the literature [21], is known as the unnormalised t-marginal Feynman-Kac measure. Applying Lemma 2.1, we can see that solves the evolution equation
[TABLE]
for any , and . The measures with which one can most naturally associate a process are the corresponding normalised t-marginal Feynman-Kac measures in ,
[TABLE]
defined for any and .
Observe that, as a direct consequence of asymptotic stability (Assumption 2.2), there exist constants and such that for any ,
[TABLE]
for any and initial distribution . In particular converges weakly to , as . Indeed, by definition of (7) and then by asymptotic stability (Assumption 2.2),
[TABLE]
for any and for some constant . This gives the bound (8) for any large enough. Increasing accordingly to ensure that the bound holds also for small , we obtain (8) for any .
For simplicity, in the rest of this article the initial distribution is fixed and we write (resp. ) instead of (resp. ).
2.2 McKean Interpretations
Now, we want to outline the evolution of the time-marginal distribution in terms of interacting jump-type infinitesimal generators. The content presented in the rest of this section is based on the works of Del Moral and Miclo [21, 22, 20]. In this established framework it is possible to define generic Markov processes with time marginals and then use Monte Carlo sampling techniques to approximate those marginals.
Lemma 2.3**.**
For every and , the normalised -marginal (7) solves the non-linear evolution equation
[TABLE]
Proof.
Using the evolution equation (6) of , we see that
[TABLE]
∎
The evolution equation (10) results from the unique decomposition of the non-conservative generator into a conservative and a diagonal part given by the potential . The latter, together with the normalization of , leads to the nonlinear second part in (10) which we want to rewrite to be in the form of another infinitesimal generator, that we denote by . Since (10) is non-linear in , this depends itself on the current distribution such that
[TABLE]
for every and . The choice of the non-linear generator is not unique, leading to various representations of the form
[TABLE]
where is the overall transition kernel of and depends on the current distribution .
Lemma 2.4** (Sufficient conditions).**
An infinitesimal generator in the form (12) satisfies condition (11) if and only if
[TABLE]
for all and . In particular, a sufficient condition on (12) for (11) to hold is
[TABLE]
for all .
Proof.
It is enough to observe that
[TABLE]
∎
Combining with the linear part of (10) into a so-called McKean generator on ,
[TABLE]
the evolution equation (10) can be written as
[TABLE]
for every and . Therefore, the normalized Feynman-Kac marginal can be interpreted as the law of a Markov process \big{(}\overline{X}_{t}\,:\,t\geq 0\big{)} on , associated to the family of generators \big{(}\overline{\mathcal{L}}_{\mu_{t}}:t\geq 0\big{)}. This process is also known as a McKean representation of the process associated to the Feynman-Kac measure , and it is non-linear and in particular time-inhomogeneous. This can be formulated using the propagator
[TABLE]
for all , which follows directly from the definition of (7) and the semigroup characterizing the time evolution for (3).
While the time evolution of is uniquely determined by (10) and therefore independent of the choice of (13), Lemma 2.4 leads to various different McKean representations of the form (12) (see e.g. [34, 28]), that can be characterized by the operator . One common choice related to algorithms in [1, 2] is
[TABLE]
where is an arbitrary constant, and we use the standard notation and for positive and negative part of .
One other possible representation of (12) we want to mention explicitly here is given by
[TABLE]
This corresponds to a pure jump process on in which every jump strictly increases the value of the potential in contrast to the previous representation (15). We will see in the next section that can be interpreted as a fitness potential for the overall process. Further McKean representations of (10) are discussed in [34], here we focus on cloning algorithms which are based on (15).
3 Interacting Particle Approximations
Independent of the particular representation, the rates of the McKean process depend on the distribution itself, which is in general not known. A standard approach is to sample such processes through particle approximations [23], which involve running, in parallel, copies or clones of the process (called particles), and then approximating by the empirical distribution of the realizations. For any the latter is defined as
[TABLE]
We write for the infinitesimal generator of an -particle system and also call this an IPS generator, and denote the associated empirical distribution as
[TABLE]
We denote by
[TABLE]
the standard carré-du-champ operator associated to the generator .
3.1 A general convergence result
The full dynamics can be set up in various different ways such that converges in an appropriate sense as for any . Theoretical convergence results can be obtained under the following assumptions, which are fulfilled by standard mean field particle approximations (as shown in Section 3.2) and cloning algorithms (Section 3.3).
Assumption 3.1**.**
[TABLE]
Remark*.*
Test functions of the form
[TABLE]
describe mean-field observables averaged over the particle ensemble which are generally of most interest, e.g. for the estimator (79) of the SCGF it is sufficient to consider such functions, as shown in Section 5.2. In general the goal is to approximate for a given , so it is natural to set up the auxiliary particle approximation in a permutation invariant way and use mean-field observables.
To better understand the above assumptions, recall that the carré du champ of an interacting particle system is a quadratic operator associated to the fluctuations of the process, whereas the generator determines the expected behaviour of the observables . Thus, Assumption 3.1 implies that trajectories of mean-field observables in a particle approximation coincide in expectation with average trajectories of the McKean representation they are based on (19a), and concentrate on their expectation with diverging (19b). We include the operators explicitly in (19b), because it allows the condition to be stated in a convenient form and we anticipate it being useful in further analysis. Condition (19c) assures that at any given time only a bounded number of particles can change their state, which is a mild technical assumption, necessary to allow the application of Lemma 4.1 in the proof of the error estimates.
Theorem 3.2**.**
Consider a sequence of particle approximations satisfying Assumption 3.1 with empirical distributions (18). Under Assumption 2.2, for every there exists a constant independent of and such that
[TABLE]
for any . Furthermore, there exists a constant independent of and such that
[TABLE]
for any and large enough.
Remark*.*
The constants and depend on the Feynman-Kac model of interest, on the choice of the McKean model and on the considered interacting particle approximation.
The proof, presented in Section 4, is an adaptation of the results in [28] and makes use of the propagator (14) of and the martingale characterization of .
Remark*.*
Observe that, by Markov’s inequality, Theorem 3.2 implies
[TABLE]
for every , , and , where does not depend on . In particular, considering , we can see that
[TABLE]
as , for any , by a Borel-Cantelli argument. The existence of a countable determining class allows this to be further strengthened to the almost sure convergence of to in the weak topology (see, for example, [40, Theorem 4]).
It is important to clarify that the estimators of the Feynman-Kac distribution given by the empirical measures usually have a bias, i.e. for , which vanishes only asymptotically, as illustrated in Theorem 3.2. This arises from the non-linear time evolution of . However, it is straightforward to derive unbiased estimators of the unnormalized measures (5), as shown by the following result.
Proposition 3.3** (Unbiased Estimators).**
Consider a sequence of particle approximations satisfying (19a) and initial condition (19d), with empirical distributions (18). Then, the unnormalized empirical measure
[TABLE]
is an unbiased estimator of the unnormalized -marginal (5), i.e.
[TABLE]
for any .
Proof.
First observe that \mathbb{E}\big{[}\nu_{0}^{N}(f)\big{]}=\nu_{0}(f). Indeed, is the average of i.i.d. random variables with law , and corresponds to the initial distribution of (5).
Note that \mathbb{E}\big{[}\nu_{t}^{N}(f)\big{]} satisfies the evolution equation
[TABLE]
Moreover, by assumption (19a) and using the characterization of (11)-(13), we have
[TABLE]
Inserting into (24), this simplifies to
[TABLE]
Since also generates the time evolution of (6), a simple Gronwall argument with \mathbb{E}\big{[}\nu_{0}^{N}(f)\big{]}=\nu_{0}(f) gives (23).
∎
A generic version of interacting particle systems, directly related to the above McKean representations has been studied in the applied probability literature in great detail [23, 28], providing quantitative control on error bounds for convergence. After reviewing those results in the next subsection, we present a different approach taken in the theoretical physics literature under the name of cloning algorithms [1, 5], which provides some computational advantages but lacks general rigorous error control so far [15, 16].
3.2 Mean Field Particle Approximation
The most basic particle approximation is simply to run the McKean dynamics in parallel on each of the particles, replacing the distribution by the empirical measure. Formally, the mean field particle model with associated to a McKean generator (13), is a Markov process on with homogeneous infinitesimal generator defined by
[TABLE]
for any . Here denotes the McKean generator (13) acting on the function , where the dependence on has been replaced by the empirical distribution .
In analogy to the decomposition in (13), the generator (25) can be decomposed as with
[TABLE]
where and stand respectively for the operators and acting on the function , i.e. only on particle .
Moreover, using representation (12) for , we can write
[TABLE]
with , which introduces an interaction between the particles. In this decomposition, (26) generates the so-called mutation dynamics, where the particles evolve independently under the dynamics given by the infinitesimal generator of the original process, whereas (27) generates the selection dynamics, which leads to mean-field interactions between particles. With (28) the state of particle gets replaced by that of particle with rate . The total selection rate in the particle approximation is , and depends on the McKean representation, in particular the choice of in (12).
From general practical experience it is favourable to minimize the total selection rate in order to improve the estimator’s asymptotic variance; it’s widely understood in the SMC literature that eliminating unnecessary selection events can significantly improve estimator variances, see, for example, [21, Section 7.2.1, 7.4.2] and [41]. For mean-field particle approximations this suggests that (16) is preferable to (15) since
[TABLE]
for all and . In view of Lemma 2.4, minimizing the total selection rate pertains to maximizing , and can be interpreted as a fitness function. With (16) every selection event therefore increases the fitness of the particle ensemble, which is not necessarily the case with (15), and there are even more optimal choices than (16) in that sense as discussed in [34]111As a side remark, the mutation part of the McKean dynamics (which is fixed for mean-field particle particle approximations by (26)), can naturally also decrease the fitness of the ensemble.. On the other hand, depending on the particular application, implementing particle approximations with lower total selection rate could be computationally more expensive, leading to a trade-off in lower values for to be accessible in practice. This is discussed in [34] for a particular example, and is not the subject of this paper.
In order to motivate the choice of the cloning algorithm in the next subsection which is based on the selection rates (15), we note that one can write (27) as
[TABLE]
using a change of summation indices in the second term. With the above discussion this can be interpreted as follows: If particle is less fit than level it is killed and replaced by a uniformly chosen particle , and if it is fitter than it cloned, replacing a uniformly chosen particle .
Observe that, by definition of (25), for any function on of the form , with , we have that
[TABLE]
thus conditions (19a)-(19b) are satisfied.
Analogous relations hold also for the individual mutation and cloning parts of the generator. Since generators are linear, the identity (30) is immediate. The carré du champ (31) is quadratic in , but off-diagonal terms in the corresponding double sum turn out to vanish in a straightforward computation, leading to the additional factor . Furthermore, by construction, for almost every realization , , of the mean field particle approximation, there exists at most one particle such that , thus condition (19c) is satisfied with . Therefore, Theorem 3.2 holds and provides -error and bias estimates of order and respectively, in accordance with already established results, e.g. in [28, 23, 22].
3.3 The Cloning Algorithm
Cloning algorithms have been proposed in the theoretical physics literature [1, 2] for evaluating large deviation functions associated to Markov processes similar to the mean field system (25), using the same mutation dynamics. While selection and mutation events are independent in the latter due to the additive structure of in (26) and (27), in cloning algorithms both are combined to reduce computational cost. We focus the exposition on a variant of the algorithm proposed in [2], but other continuous-time versions can be analysed analogously. This cloning algorithm is constructed from the McKean model (13) with selection rates \widetilde{W}_{c}(x,y)=\big{(}\mathcal{V}(x)-c\big{)}^{-}+\big{(}\mathcal{V}(y)-c\big{)}^{+} as in (15), and we denote the associated McKean generator by
[TABLE]
We will use in particular the killing/cloning interpretation introduced in (29). We recall that the overall escape rate and probability kernel of the original dynamics are denoted respectively by and .
The infinitesimal description of the cloning algorithm as a continuous-time Markov process on the state space is given by the generator
[TABLE]
for any and . Here is the set of all subsets of particle indices, denotes the vector , with
[TABLE]
and, similarly, denotes the vector with
[TABLE]
for any . Cloning events are now coupled with mutation, and if , a non-empty set of particles is chosen at random from the ensemble with probability and every particle is replaced by a clone of , before particle mutates to a new state . If we set , so that no cloning occurs. Further properties of the cloning distribution , which is the main distinctive feature of this algorithm, are discussed below. The killing part in the second line runs independently and remains unchanged from (29). The algorithm is often applied in situations with for all (in particular also with ), leaving cloning coupled with mutation as the only selection events.
In order to simplify the presentation, we make some further assumptions on , which are all satisfied by common choices in the theoretical physics literature. The probability of choosing a set depends only on its size and not on its elements, i.e. for any
[TABLE]
Denote the mean and second moment of this distribution by
[TABLE]
Of course, and its moments also depend on and , which we omit in the notation for simplicity. In order to ensure that the third condition in Assumption 3.1, namely (19c), is satisfied, we assume that the support of is uniformly bounded in , i.e.
[TABLE]
Note that this implies that also and are uniformly bounded, i.e. . We further assume , i.e. is large enough so that the process (33) is well defined.
The most common choice in the physics literature (see, e.g., the recent summary in [7]) for the distribution is
[TABLE]
This corresponds to a binary distribution on the two integers nearest to the prescribed mean, and minimizes the second moment of the distribution for a given mean. Note that if is an integer, concentrates, which includes the case .
The next two results assure respectively that condition (19a) and condition (19b) in Assumption 3.1 are satisfied for the cloning algorithm, so we can apply Theorem 3.2. The only condition is to choose such that each particle produces on average \big{(}\mathcal{V}(x_{i})-c\big{)}^{+} clones per unit time, in accordance with the second term in (29).
Proposition 3.4**.**
Consider the cloning generator (33) with as in (3.3) and (36), such that the mean of the cloning size (35) is
[TABLE]
and . Then, for any test function of the form , with and large enough, we get
[TABLE]
where is the McKean generator given in (32).
Remark*.*
Note that is essential for (36) and (19c), and a simple sufficient condition is for the escape rates to be uniformly bounded below, i.e. .
Proof.
We start by considering the first term in the expression of (33). Observe that with ,
[TABLE]
Thus, we can write
[TABLE]
Moreover, by (3.3), we have that, for any ,
[TABLE]
Therefore,
[TABLE]
Thus, (33) can be rewritten as
[TABLE]
by changing summation variables in the cloning term and using (32).
∎
Proposition 3.5**.**
Let be a cloning generator satisfying the conditions in Proposition 3.4. Then, for any test function of the form , with ,
[TABLE]
as , where for some constant independent of and , and
[TABLE]
with
[TABLE]
and
[TABLE]
Remark*.*
Due to the linearity of the generator, the combined mutation/cloning events in the cloning algorithm can be decomposed easily, which leads to extra terms only in the quadratic carré du champ. In the expression of the operator (42), the term
[TABLE]
is due to the dependence between mutation and cloning dynamics and its sign is not known a priori. Whereas, the term \lambda(x)\,\big{(}Q(x)-M(x)\big{)}\cdot\big{(}\ell_{\mu}f(x)\big{)}^{2} arises from the dependence between clones (since multiple cloning events are allowed at the same time) and is always non-negative. In particular, in any setting in which there is at most one clone per event, i.e. when , the term vanishes. Furthermore, minimizing as in (37) for given (38) leads to the best bound on the carré du champ and convergence properties of the algorithm.
Proof.
Consider the carré du champ of ,
[TABLE]
Using (39), the first term can be decomposed as
[TABLE]
where with (40) and (41) the last line can be rewritten as
[TABLE]
Substituting in the expression of the carré du champ , we obtain
[TABLE]
The first line in (43) is simply
[TABLE]
Now, considering the second line of (43), we can write
[TABLE]
since
[TABLE]
for every such that .
Recalling that \lambda(x)M(x)=\big{(}\mathcal{V}(x)-c\big{)}^{+}, exchanging summation indices and combining with the third line of (43), we see that
[TABLE]
Moreover,
[TABLE]
with
[TABLE]
for all , for some constant , since and are bounded by condition (36) and is bounded by assumption. Combining all together, we obtain the statement.
∎
Proposition 3.4 and Proposition 3.5 show in particular that Assumption 3.1 is satisfied for cloning algorithms, hence Theorem 3.2 holds and provides bias and error bounds.
4 Proof of Theorem 3.2
This section is devoted to the proof of Theorem 3.2, which is an adaptation of the results presented by M. Rousset in [28]. Throughout this section we consider a generic sequence of IPS generators satisfying Assumption 3.1 for some McKean generator (13). Furthermore, we assume that the normalized Feynman-Kac measure is asymptotically stable, i.e. Assumption 2.2 holds.
The proof makes use of the propagator of defined in (14), and the martingale characterization of . We denote by the set of bounded functions such that is continuous on for every and has continuous time derivative for every . Following the standard martingale characterization of Feller-type Markov processes, using Itô’s formula and (19a) one can show that (see also [28], Proposition 3.3), for every , the process
[TABLE]
is a local martingale. With (19b) its predictable quadratic variation is bounded by
[TABLE]
for some constant independent of and , and with (19c) jumps are bounded by
[TABLE]
The following technical Lemma for martingales will play a central role in the proof of Theorem 3.2.
Lemma 4.1**.**
Let be a locally square-integrable martingale with continuous predictable quadratic variation , and uniformly bounded jumps . Then, for every and , there exists a constant such that
[TABLE]
Proof.
See [28], Lemma 6.2.
∎
4.1 Properties of the normalized propagator
Lemma 4.2**.**
For any test function and , we have for the normalized propagator (14)
[TABLE]
Proof.
See [28], p. 836. The idea of the proof is to substitute (3) into the time derivative of (14).
∎
Lemma 4.3**.**
Under Assumption 2.2 on asymptotic stability, for any and and , there exists a constant such that
[TABLE]
Moreover, for any , there exists some , such that
[TABLE]
Proof.
The proof can be found in [28, Lemma 5.1] and the result is due to the asymptotic stability of the Feynman-Kac model.
∎
Observe that, applying Lemma 4.2 to the martingale characterization (44) of , we obtain
[TABLE]
for any , where the last equality follows by the characterization (11) of McKean models. By (47), we obtain the stochastic differential equation
[TABLE]
Moreover, applying Lemma 4.3 to the predictable quadratic variation (45), we obtain that almost surely,
[TABLE]
where .
Note that Equation (47) for centered test functions can be rewritten as
[TABLE]
The martingale characterization (47)-(50) will be the key element in the proof of Theorem 3.2.
4.2 and bias estimates
Define
[TABLE]
with and . Observe that the measure can be also rewritten in terms of (14) as
[TABLE]
for any . To prove Theorem 3.2, we consider the decomposition
[TABLE]
for any . The proof is structured as follows:
- •
In Lemma 4.4, we bound the first term of the decomposition under Assumptions 2.2 and 3.1;
- •
In Lemma 4.5, we bound the second term under Assumption 2.2;
- •
In Lemma 4.6, we combine Lemma 4.4 and Lemma 4.5 to obtain -error estimates of order , for some ;
- •
Finally, from Lemma 4.6 we derive, by iteration, estimates of order , as presented in Theorem 3.2.
Lemma 4.4**.**
Consider a sequence of particle approximations satisfying Assumption 3.1 with empirical distributions (18). Under Assumption 2.2 on asymptotic stability, for any there exists a constant such that
[TABLE]
for any and .
Proof.
This is an adaptation of the first part of the proof of Lemma 5.3 in [28]. First, consider
[TABLE]
with . Observe that, by the stochastic differential equation (48), we can write
[TABLE]
for any . Therefore,
[TABLE]
Fixing , the process
[TABLE]
with , as the integral of a progressively measurable process with respect to a local martingale, is itself a local martingale with predictable quadratic variation given by
[TABLE]
and jumps bounded by
[TABLE]
by Assumption (19c) on bounded jumps, (46) and Lemma 4.3.
Moreover, with (52), we can write
[TABLE]
where the last equality follows by (55). Noting that (A_{t}^{T})^{-1}\leq\exp\big{(}2(T-t)\cdot\|\mathcal{V}\|\big{)} by definition (54), we get
[TABLE]
By Lemma 4.1, we have that, for any ,
[TABLE]
where the last inequality follows by (49). Therefore, for , , we get
[TABLE]
By Jensen’s inequality, this bound holds for any . Applying this to inequality (56), we obtain the result.
∎
Lemma 4.5**.**
Under Assumption 2.2 on asymptotic stability with constants and , we have that for any and any such that for some , the following bound holds
[TABLE]
Furthermore, when , there exists a constant depending on such that
[TABLE]
Proof.
By definition (51) of , for any and we have
[TABLE]
Taking to be the principal eigenvalue of , using Assumption 2.2 on asymptotic stability and the basic fact , we can write
[TABLE]
Therefore, for , for some , we have
[TABLE]
and similarly
[TABLE]
Therefore,
[TABLE]
Now, for , observe that
[TABLE]
Using the basic fact , to conclude it is enough to observe that, for any ,
[TABLE]
with constant depending on . Indeed, with (19d) at time , is the sum of i.i.d. random variables with law . Inequality (57) is then a direct application of Marcinkiewicz-Zygmund/BDG inequalities for i.i.d. variables.
∎
Lemma 4.6**.**
Consider a sequence of particle approximations satisfying Assumption 3.1 with empirical distributions (18). Under Assumption 2.2, there exists such that for any there exist such that
[TABLE]
for any large enough.
Proof.
Recalling decomposition (53), where the first term is estimated in Lemma 4.4 and the second in Lemma 4.5, and using the basic fact , we obtain
[TABLE]
taking , and
[TABLE]
taking such that is large enough.
The idea is to find and such that
[TABLE]
Recalling that , the solution is given by
[TABLE]
provided to ensure that . Also observe that for large enough, satisfies the conditions in Lemma 4.5.
Otherwise, in case , we consider the bound (58) instead, and we obtain
[TABLE]
with
[TABLE]
Taking the result follows from observing that
[TABLE]
for and at most of order as above, or for given by (60).
∎
Proof of Theorem 3.2.
We denote
[TABLE]
in accordance with Rousset [28], Section 5.2. Using (50), we have
[TABLE]
with for any .
First, observe that, similarly to (57), we have
[TABLE]
for some constant depending on . Moreover, by Lemma 4.1 and bound (49), we get with another -dependent constant
[TABLE]
Finally, writing
[TABLE]
and using Hölder’s inequality, we get
[TABLE]
by Lemma 4.3. Using the fact that
[TABLE]
for centered test functions, and applying the Cauchy-Schwarz inequality, we get
[TABLE]
Combining all together, we obtain
[TABLE]
for any and . In particular,
[TABLE]
for any . Applying Lemma 4.6, we get
[TABLE]
for any , by iteration of (62). Thus, we can conclude
[TABLE]
This proves the -error estimate (20).
We conclude by proving the bias estimate (21). By Equation (50), we have
[TABLE]
By (61) for , we obtain
[TABLE]
∎
5 Interacting particle approximations for dynamic large deviations
5.1 Large deviations and Feynman-Kac models
Dynamic large deviations of continuous-time jump processes are a common application area of cloning algorithms [1, 2]. For a given process with bounded rates (1) and path space as outlined in Section 2, we consider a time-additive observable , taken to be a real measurable function of the paths of over the time interval of the form [42]
[TABLE]
Here is such that , for any , and , with a realization of . Note that is well defined since the bound on implies that the process does not explode and the first sum contains almost surely only finitely many non-zero terms for any .
More precisely, we are interested in studying the limiting behaviour, as , of the family of probability measures on , where represents the initial distribution of the underlying process. This can be characterized by the large deviation principle (LDP) [43, 44], in terms of a rate function. We assume that an LDP with convex rate function holds, which can be written as
[TABLE]
for every closed and open. For the study of large deviations, a key role is played by the scaled cumulant generating function (SCGF)
[TABLE]
Indeed, if the rate function is convex and the limit in (64) exists and is finite for every , then is fully characterized by the SCGF via Legendre duality (see [43], Theorem 4.5.10), i.e.
[TABLE]
The SCGF is also the object that can be numerically approximated by cloning algorithms [1, 2] and related approaches and our main aim in this section is to illustrate how our results on Feynman-Kac models can be applied here. Possible subtleties regarding the LDP are not our focus and we restrict ourselves to settings where exists and is finite. In the following we introduce the associated Feynman-Kac models in the notation that is established in this context.
Lemma 5.1**.**
For any the family of operators \big{(}P_{k}(t):t\geq 0\big{)} on defined by
[TABLE]
with , is well defined and it is a non-conservative semigroup, the so-called tilted semigroup.
Moreover, the infinitesimal generator associated with \big{(}P_{k}(t):t\geq 0\big{)}, in the sense of the Hille-Yosida Theorem, can be written in the form
[TABLE]
for and all , with and the bounded continuous functions which characterize via (63). In particular, the semigroup satisfies the differential equations
[TABLE]
for all and .
Proof.
See [42], Appendix A.1.
∎
Observe that, if the SCGF (64) is independent of the choice of the initial distribution , it can be written in terms of the tilted semigroup as
[TABLE]
for all , moreover is the spectral radius of the generator (see also (70) below). With Assumption 2.2 on asymptotic stability, is also the principal eigenvalue of and there exists a probability measure 222To avoid notation overload, we omit writing explicitly the dependence of certain quantities on the fixed parameter in the rest of this section. and constants and such that
[TABLE]
for every and . Note that this implies the independence of the SCGF from the initial distribution, , and thus (68) holds for every initial state . Note that (69) implies in particular that converges weakly to for all initial distributions , and that is the unique invariant probability measure for the modified semigroup . Therefore we have from the generator of this semigroup that
[TABLE]
Neither the semigroup nor the modified one conserve probability, and therefore they do not provide a corresponding process to sample from and use standard MCMC methods to estimate the SCGF . This can be achieved by interpreting the tilted generator through Feynman-Kac models analogous to Lemma 2.1, so that we can apply our results from Section 3.
Lemma 5.2**.**
The infinitesimal generator (66) can be written as
[TABLE]
for all and . Here
[TABLE]
is the generator of a pure jump process with modified rates , and
[TABLE]
is a diagonal potential term where is the escape rate of .
Proof.
Follows directly from the definition of in (66).
∎
In analogy with (1), in the following we also use the notation with a probability kernel
[TABLE]
Observe that
[TABLE]
thus, we get with (70) another representation of the SCGF,
[TABLE]
Recall the unnormalized and normalized versions of the Feynman-Kac measures defined in (5) and (7) for a given initial distribution ,
[TABLE]
and that asymptotic stability (69) implies that weakly as . This suggests the following finite-time approximations for .
Proposition 5.3**.**
For any and every , we have that
[TABLE]
where is defined in (73). In particular, if asymptotic stability (69) is satisfied,
[TABLE]
Proof.
Recalling the evolution equation (6) of , we have
[TABLE]
And, thus,
[TABLE]
since . We can conclude by observing that and
[TABLE]
using that the SCGF is well defined under asymptotic stability (69).
∎
For any , we define
[TABLE]
as a finite-time approximation for .
Lemma 5.4**.**
For any , under asymptotic stability (69) with , there exists a constant such that
[TABLE]
for any given and .
Proof.
By (8), we have
[TABLE]
where , using the basic fact . In particular, , by (76).
∎
Note that for the above result only implies a convergence rate of order , since errors from the arbitrary initial condition have to be averaged out over time. In contrast for (corresponding to the usual idea of burn-in in conventional Markov chain Monte Carlo approximations – see [45], for example), we get a much better exponential rate of convergence dominated by the asymptotic stability parameter .
5.2 Estimation of the SCGF
In this section we establish the convergence of estimators of the SCGF, (64), provided by interacting particle approximations. Approximating by the empirical distribution (18) associated to an interacting particle system, we can estimate with
[TABLE]
Note that, choosing in Proposition 3.3 and (77) implies that \exp\big{(}{t\cdot\Lambda_{k}^{0,t,N}}\big{)} is an unbiased estimator of \exp\big{(}{t\cdot\Lambda_{k}^{0,t}}\big{)}. Recall that particle approximations are characterized by a sequence of IPS generators on , based on the McKean generators (13)
[TABLE]
where describes the selection dynamics of the McKean model as in Lemma 2.4, with examples in (15) or (16). Due to tilted dynamics explained in Lemma 5.2 we have an additional dependence on the parameter .
Proposition 5.5**.**
Given , let be a sequence of IPS generators satisfying Assumption 3.1 with McKean generators . Under asymptotic stability (69) with , for every and there exist constants independent of and , such that
[TABLE]
and
[TABLE]
for any large enough and .
Proof.
First, note that
[TABLE]
The bound for the second term is given in Lemma 5.4, whereas we can bound the first term by observing that
[TABLE]
and applying Theorem 3.2. The second claim can be established similarly.
∎
Proposition 5.5 provides the and bias estimates of the approximation error with order of convergence respectively given by and . The necessarily finite simulation time leads to an additional error of order , with , which is controlled by asymptotic stability properties of the process as summarized in Lemma 5.4. Ideally, during simulations we want to choose the final time with respect to the population size in order to balance both terms in (80), resp. (81). The details depend on asymptotic stability properties of the process and values of constants, but it is clear in general that choosing any would only give the same order of convergence as , which is computationally cheaper. Proposition 5.5 also implies that converges almost surely to as .
5.3 The Cloning Factor
Most results in the physics literature do not use the estimator (79) based on the ergodic average of the mean fitness of the clone ensemble, but an estimator based on a so-called ‘cloning factor’ (see, e.g., [1, 5, 7]). This is essentially a continuous-time jump process on with , where at each cloning event of size at a given time , the value is updated as
[TABLE]
where occurs when there is a ’killing’ event. In our context, we can define the dynamics of jointly with the cloning algorithm via an extension of the cloning generator (33) as introduced in Section 3.3, with exit rate and probability kernel replaced by and , respectively. On the state space define
[TABLE]
where the test function now has a second counting coordinate, and we denote \varsigma_{n}:=\varsigma\cdot\big{(}1{+}\tfrac{n}{N}\big{)}, with . Also recall that the cloning algorithm is based on a McKean model with parameter as given in (15).
We introduce the coordinate projection in order to observe only the cloning factor, . Note that is not compact, and is an unbounded test function. However, since the range of the clone size distribution is uniformly bounded (condition 19c), is a birth-death process on with bounded jump length, and the generator (82) and associated semigroup is therefore well defined for the test function (see e.g. [46]) and all .
The following result provides an unbiased estimator for the unnormalized quantity based on the cloning factor.
Proposition 5.6**.**
Let be the extension (82) of the cloning generator (33). Then, the quantity is an unbiased estimator for (5), i.e.
[TABLE]
for every and , and all choices of the parameter (cf. (15)).
Proof.
First, observe that following (40) and (41)
[TABLE]
using the mean of the distribution as given in Proposition 3.4. Therefore,
[TABLE]
and analogously to (24), the expected time evolution of is then given by
[TABLE]
This is also the evolution of , since
[TABLE]
With initial conditions , the statement follows by a Gronwall argument analogous to (23) and by Proposition 3.3.
∎
Proposition 5.6 leads to an alternative estimator for (78) given by
[TABLE]
Note that this is not itself unbiased as a consequence of the nonlinear transformation involving the logarithm.
In order to study the convergence of the new estimator to the SCGF, it is convenient to use the martingale characterization of the process, which is given by the following result.
Proposition 5.7**.**
Let be the extension (82) of the cloning generator . Then, the process
[TABLE]
with , is a local martingale satisfying
[TABLE]
and with predictable quadratic variation
[TABLE]
where is the second moment of the distribution (35).
Remark*.*
Note that, in the case in which there is at most one clone per transition event, i.e. if , then
[TABLE]
Proof.
Observe that we can rewrite (82) as
[TABLE]
using the expansion as . Similarly,
[TABLE]
The statement corresponds to the martingale problem associated to .
∎
By Proposition 5.7 and recalling the definition of the SCGF estimators (79) and (84) we immediately get
[TABLE]
In what follows, we discuss the convergence of the estimator to the SCGF , which is based on the cloning factor.
Theorem 5.8**.**
Let be the extension (82) of the cloning generator . Then, for every and , there exists a constant such that for all large enough
[TABLE]
If in addition Assumption (69) on asymptotic stability holds, there exist constants and (dependent on and ) such that
[TABLE]
for every .
Proof.
Thanks to Jensen’s inequality, it is enough to prove the inequality for all , . First, we can write
[TABLE]
Observe that \sup_{t\leq T}\big{|}\mathcal{M}^{\star}_{t}\big{|}<\infty, so the assumptions of Lemma 4.1 are satisfied. Thus, using Lemma 4.1, we obtain
[TABLE]
The second part of the Theorem follows directly by Proposition 5.5.
∎
Therefore, the -error for estimator has the same rate of convergence as . Analogous results hold for the bias estimates, which have order of convergence as for the estimator (Proposition 5.5), since with (85) the difference of both estimators is only of order .
6 Discussion
In this work we have established a framework to compare variants of cloning algorithms and understand their connections with mean field particle approximations. This allowed us to obtain first rigorous results on the convergence properties of cloning algorithms in continuous time. Our results apply in the general setting of jump Markov processes on locally compact state spaces. Essential conditions for our approach are summarized in Assumptions 2.2 on asymptotic stability of the process and 3.1 on the particle approximation, which are usually straightforward to check for practical applications. We summarize further sufficient conditions for asymptotic stability in the Appendix A.
In certain situations the cloning algorithm is computationally cheaper and simpler to implement than mean field particle systems, since only the mutation process has to be sampled independently for all particles and cloning events happen simultaneously. However, as discussed in [34], this choice reduces in general the accuracy of the estimator since it does not consider the fitness potential of the replaced particles during the cloning events. Adjusting the algorithm by allowing only substitutions of particles with lower fitness based on different McKean models could improve the accuracy. The approach developed in this paper can be used to conduct a systematic study of this question, which is current work in progress.
Appendix A Asymptotic Stability
We present sufficient conditions for asymptotic stability as presented in Assumption 2.2. The discussion is based on the work of Tweedie et al. [47, 48], which we briefly recall in Lemma A.3 below.
Definition A.1**.**
A Feller process is said to be -irreducible for a non-trivial measure (i.e. ) on , if \mathbb{E}_{x}\big{[}\int_{0}^{\infty}\mathbbm{1}_{Y_{t}\in A}dt\big{]}>0 for every and every set such that . We simply say that is irreducible if it is -irreducible for some .
Definition A.2**.**
A -irreducible Feller process is called aperiodic if there exists a small set , , such that the associated Markov semigroup satisfies the following conditions:
- •
there exists a non-trivial measure and such that , for all and ;
- •
there exists a time such that , for all and .
Lemma A.3**.**
Let be a -irreducible and aperiodic Feller process on a locally compact state space such that has non-empty interior. Denote by and the associated infinitesimal generator and the semigroup, respectively. Assume that for a given function such that , there exist constants and a compact set such that for all
[TABLE]
Then there exist constants and such that for any test function and ,
[TABLE]
for any , where is the (unique) invariant measure of .
Proof.
See [47], Theorem 5.2(c), using the fact that if a Feller process is -irreducible and has non-empty interior, then every compact set is petite (See [48], Theorem 7.1 and Theorem 5.1).
∎
In the following we discuss how the spectral properties of the tilted generator in Assumption 2.2 can imply asymptotic stability in the sense of (4).
Assumption A.4**.**
We assume that the spectrum of (2) is bounded by a greatest eigenvalue . Moreover, there exist a positive function , unique up to multiplicative constants, and a probability measure satisfying respectively
[TABLE]
and
[TABLE]
Without loss of generality, we can assume .
Remark*.*
Sufficient conditions for Assumption A.4 to hold can be found, for instance, in [49, 50]. These are of course satisfied if the original process with generator is an irreducible, finite-state Markov chain, including for example stochastic particle systems on finite lattices with a fixed number of particles.
Under Assumption A.4, we define the generator
[TABLE]
which is known in the literature as Doob’s -transform of [42] or twisted Markov kernel [51]. Observe that , so that it is a probability generator associated to a Markov process with probability semigroup defined for any by
[TABLE]
Proposition A.5** (Asymptotic stability).**
Assume that there exists such that the set
[TABLE]
is compact. Under Assumption A.4, if the initial pure jump process with generator is -irreducible for some for which has non-empty interior, and aperiodic as defined above then (4) holds, i.e. there exists and such that
[TABLE]
for every and .
Proof.
First, note that if the initial process is irreducible and aperiodic, then also the process associated to is irreducible and aperiodic. Moreover, is bounded in and for every . Therefore, the hypotheses of Lemma A.3 are satisfied for the generator acting on the function . Thus, applying the lemma we obtain
[TABLE]
for any and , where is the invariant measure for . Dividing by and substituting with , we obtain the statement ( and can be included in the constant ).
∎
Acknowledgments
This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the Lloyd’s Register Foundation–Alan Turing Institute Programme on Data-Centric Engineering; AMJ was partially supported by EPSRC grants EP/R034710/1 and EP/T004134/1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Cristian Giardina, Jorge Kurchan, and Luca Peliti. Direct evaluation of large-deviation functions. Physical Review Letters , 96(12):120603, 2006.
- 2[2] Vivien Lecomte and Julien Tailleur. A numerical approach to large deviations in continuous time. Journal of Statistical Mechanics: Theory and Experiment , 2007(03):P 03004, 2007.
- 3[3] James B Anderson. A random-walk simulation of the Schrödinger equation: H 3 + subscript superscript 𝐻 3 {H}^{+}_{3} . The Journal of Chemical Physics , 63(4):1499–1503, 1975.
- 4[4] Peter Grassberger. Go with the winners: A general Monte Carlo strategy. Computer Physics Communications , 147(1-2):64–70, 2002.
- 5[5] Cristian Giardina, Jorge Kurchan, Vivien Lecomte, and Julien Tailleur. Simulating rare events in dynamical processes. Journal of Statistical Physics , 145(4):787–811, 2011.
- 6[6] Pablo I Hurtado, Carlos P Espigares, Jesús J del Pozo, and Pedro L Garrido. Thermodynamics of currents in nonequilibrium diffusive systems: theory and simulation. Journal of Statistical Physics , 154(1-2):214–264, 2014.
- 7[7] Carlos Pérez-Espigares and Pablo I Hurtado. Sampling rare events across dynamical phase transitions. Chaos: An Interdisciplinary Journal of Nonlinear Science , 29(8):083106, 2019.
- 8[8] Pablo I Hurtado and Pedro L Garrido. Test of the additivity principle for current fluctuations in a model of heat conduction. Physical Review Letters , 102(25):250601, 2009.
