Conditional Monte Carlo for Reaction Networks

David F. Anderson; Kurt W. Ehlert

arXiv:1906.05353·math.NA·January 5, 2022


TL;DR

This paper introduces a conditional Monte Carlo estimator for reaction network models that improves probability estimation accuracy in high-dimensional, stochastic systems with small species counts, while maintaining simplicity.

Contribution

The authors develop a novel conditional Monte Carlo estimator with parameter optimization and provide theoretical guarantees including a central limit theorem.

Findings

01 Enhanced estimator accuracy over classical Monte Carlo methods.

02 Efficient parameter approximation for optimal performance.

03 Theoretical validation via a central limit theorem.


Equations

$$p_{t}^{\nu}(x)\overset{\text{def}}{=}P_{\nu}(X(t)=x),\quad x\in\mathbb{Z}_{\geq 0}^{d}.$$

$$\frac{d}{dt}p^{\nu}_{t}(x)=\sum_{r=1}^{R}\big[p_{t}^{\nu}(x-\zeta_{r})\lambda_{r}(x-\zeta_{r})-p_{t}^{\nu}(x)\lambda_{r}(x)\big],\quad x\in\mathbb{Z}_{\geq 0}^{d},$$

$$\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}(X_{i}(t)=x)\approx\mathbb{E}_{\nu,0}[\mathbbm{1}(X(t)=x)]=p_{t}^{\nu}(x),$$

$$\begin{aligned}p_{t}^{\nu}(x)&=\mathbb{E}_{\nu,0}\big[\mathbb{E}_{\nu,0}[\mathbbm{1}(X(t)=x)\mid X(t-h)]\big]\\&=\mathbb{E}_{\nu,0}\big[\mathbb{E}_{X(t-h),t-h}[\mathbbm{1}(X(t)=x)]\big]\\&=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{X_{i}(t-h),t-h}[\mathbbm{1}(X(t)=x)],\quad\text{a.s.}\end{aligned}$$

$$\hat{p}_{t}^{\nu}(x;n,m,h)\overset{\text{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{j=1}^{m}\mathbbm{1}(X_{ij}(t)=x),$$

$$\lim_{n\to\infty}\hat{p}_{t}^{\nu}(x;n,m,h)=\mathbb{E}_{\nu,0}\left[\frac{1}{m}\sum_{j=1}^{m}\mathbbm{1}(X_{ij}(t)=x)\right]=p_{t}^{\nu}(x).$$

$$q(x,x^{\prime})=\sum_{r=1}^{R}\lambda_{r}(x)\,\mathbbm{1}(x^{\prime}-x=\zeta_{r}).$$

$$\lambda_{r}(x)=\kappa_{r}\prod_{i=1}^{d}\frac{x_{i}!}{(x_{i}-y_{i})!}\,\mathbbm{1}(x_{i}\geq y_{i}),$$

$$X(t)=X(0)+\sum_{r=1}^{R}Y_{r}\left(\int_{0}^{t}\lambda_{r}(X(s))\,ds\right)\zeta_{r},$$

$$X\xrightarrow{1}2X.$$

$$\emptyset\xrightarrow{50}X,\quad X\xrightarrow{1}\emptyset.$$

$$A\xrightarrow{2}2A,\quad A+B\xrightarrow{0.01}2B,\quad B\xrightarrow{2}\emptyset.$$

$$G\xrightarrow{25}G+mRNA,\quad mRNA\xrightarrow{100}mRNA+P,\quad 2P\xrightarrow{0.001}D,\quad mRNA\xrightarrow{0.1}\emptyset,\quad P\xrightarrow{1}\emptyset.$$

$$\emptyset\to A,\quad A\to\emptyset,\quad\emptyset\to B,\quad B\to\emptyset.$$

$$\lambda_{1}(x)=\frac{50}{1+2x_{2}},\quad\lambda_{2}(x)=x_{1},\quad\lambda_{3}(x)=\frac{50}{1+2x_{1}},\quad\lambda_{4}(x)=x_{2},$$

$$A\xrightarrow{10}B,\quad B\xrightarrow{10}A,\quad B\xrightarrow{0.1}C.$$

$$\text{MISE}(\hat{p}_{t}^{\nu})\overset{\text{def}}{=}\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x)-p_{t}^{\nu}(x)\big)^{2}\right].$$

$$\underbrace{\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]}_{\text{expected \# of reactions in }[0,t-h]}+m\cdot\underbrace{\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]}_{\text{expected \# of reactions in }[t-h,t]},$$

$$n\cdot c\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right),$$

$$\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\int_{0}^{t-h}\lambda_{0}(X_{i}(s))\,ds,\quad\text{and}\quad\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\int_{t-h}^{t}\lambda_{0}(X_{i}(s))\,ds.$$

$$\min_{n,m,h}\ \underbrace{\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big)^{2}\right]}_{\text{mean integrated squared error (MISE)}},$$

$$n\cdot c\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right)\leq b,\qquad n,m\in\mathbb{Z}_{\geq 1}\ \text{and}\ 0\leq h\leq t.$$

$$\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big)^{2}\right]=\frac{1}{n}\left[\frac{1}{m}+\left(1-\frac{1}{m}\right)P_{\nu}(X_{11}(t)=X_{12}(t))-\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}p_{t}^{\nu}(x)^{2}\right].$$

$$f(m,h)\overset{\text{def}}{=}\frac{1}{m}\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right)\times\left(1+(m-1)P_{\nu}(X_{11}(t)=X_{12}(t))-m\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}p_{t}^{\nu}(x)^{2}\right).$$

$$\min_{m,h}\ f(m,h)\quad\text{subject to}\quad m\in\mathbb{Z}_{\geq 1},\ 0\leq h\leq t.$$

$$\Lambda_{r}^{a,b}=\int_{a}^{b}\lambda_{r}(X(s))\,ds,$$


Code & Models

Repositories

kehlert/conditional_monte_carlo_example (official)


Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Gene Regulatory Network Analysis · Stochastic processes and statistical mechanics

Full text


Conditional Monte Carlo for Reaction Networks

David F. Anderson, Department of Mathematics, University of Wisconsin-Madison

Kurt W. Ehlert, Department of Mathematics, University of Wisconsin-Madison

Abstract

Reaction networks are often used to model interacting species in fields such as biochemistry and ecology. When the counts of the species are sufficiently large, the dynamics of their concentrations are typically modeled via a system of differential equations. However, when the counts of some species are small, the dynamics of the counts are typically modeled stochastically via a discrete state, continuous time Markov chain.

A key quantity of interest for such models is the probability mass function of the process at some fixed time. Since paths of such models are relatively straightforward to simulate, we can estimate the probabilities by constructing an empirical distribution. However, the support of the distribution is often diffuse across a high-dimensional state space, where the dimension is equal to the number of species. Therefore generating an accurate empirical distribution can come with a large computational cost.

We present a new Monte Carlo estimator that fundamentally improves on the “classical” Monte Carlo estimator described above. It also preserves much of classical Monte Carlo’s simplicity. The idea is basically one of conditional Monte Carlo. Our conditional Monte Carlo estimator has two parameters, and their choice critically affects the performance of the algorithm. Hence, a key contribution of the present work is that we demonstrate how to approximate optimal values for these parameters in an efficient manner. Moreover, we provide a central limit theorem for our estimator, which leads to approximate confidence intervals for its error.

Keywords: Monte Carlo, continuous time Markov chain, chemical master equation, nonparametric density estimation, reaction networks

AMS subject classifications: 65C05, 60J28, 62G07

1 Introduction

Systems of interacting species appear often in nature. To better understand the dynamics of such systems, we can model them as reaction networks with deterministic or stochastic dynamics [12, 28, 35, 58]. If the counts of the constituent species are high, then the dynamics are commonly modeled by a system of differential equations [12, 24, 58]. However, if the count of any species is small, then a stochastic model with a discrete state space is more appropriate [11, 12, 42, 50, 55, 58].

Since the amount of each species is necessarily nonnegative and discrete, the state space of the stochastic process is a subset of $\mathbb{Z}_{\geq 0}^{d}$ , where $d$ is the number of species types. Let $\nu$ be the distribution of the initial state, which is often a point mass distribution, and suppose we are interested in the distribution of the state of the process at some fixed time $t>0$ . That is, if $X(t)$ is the state of the process at time $t$ , then we would like to know the value of

$$p_{t}^{\nu}(x)\overset{\text{def}}{=}P_{\nu}(X(t)=x),\quad x\in\mathbb{Z}_{\geq 0}^{d}.$$

In general, finding the exact values of $p_{t}^{\nu}(\cdot)$ is extremely difficult. More precisely, the authors are not aware of any general class of models for which $p_{t}^{\nu}$ can be solved for explicitly, with the exception of linear, or first-order, models [33] or, more generally, models that satisfy a dynamical and restricted complex-balanced condition and admit a time-dependent product form Poisson distribution [13]. However, there are many numerical methods that give an estimate. One type of approach is to approximately solve Kolmogorov’s forward equation, which is called the chemical master equation (CME) in much of the biology and chemistry literature. The CME can be written as

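$$\frac{d}{dt}p^{\nu}_{t}(x)=\sum_{r=1}^{R}\big[p_{t}^{\nu}(x-\zeta_{r})\lambda_{r}(x-\zeta_{r})-p_{t}^{\nu}(x)\lambda_{r}(x)\big],\quad x\in\mathbb{Z}_{\geq 0}^{d},\tag{1}$$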

where $R$ is the number of reactions in the system, $\lambda_{r}:\mathbb{Z}^{d}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is the intensity (or propensity) function for the $r$ th reaction, $\zeta_{r}\in\mathbb{Z}^{d}$ gives the net change in the counts of the species due to an occurrence of the $r$ th reaction, and the initial distribution $p_{0}^{\nu}(\cdot)$ is given by $\nu$ . See Section 2 for the precise specification of the model, including terminology.

For most models of interest, solving (1) entails solving a high-dimensional (often infinite-dimensional) system of linear ordinary differential equations. Solving such a system directly is almost always very difficult, so there has been a considerable amount of research devoted to the development of fast and accurate approximate algorithms. The general approach for many such algorithms is to first truncate the state space of the system to a smaller subset. This truncation makes solving the problem computationally feasible, at the cost of introducing a controllable error to the solution. After truncation, the new system of ODEs must be solved.

There is currently a wide variety of methods for performing both the truncation step and the solution step. In particular, there are the finite state projection algorithm [45, 56], the uniformization method [21], sliding window methods [32, 59], the sparse grid method [31], the radial basis function approximation [37], a class of spectral methods [23, 34], and methods that specialize to systems with multiple scales [16, 19, 39, 40, 48]. Moreover, there are tensor methods [36, 53, 57] that represent the truncated CME with tensors.

As an alternative to approximating (1) directly via the methods above, we can take a Monte Carlo approach. That is, we can generate $n$ independent and identically distributed (i.i.d.) realizations of the process $X$ , denoted by $\{X_{i}\}_{i=1}^{n}$ , and use the Monte Carlo estimator

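$$\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}(X_{i}(t)=x)\approx\mathbb{E}_{\nu,0}[\mathbbm{1}(X(t)=x)]=p_{t}^{\nu}(x),\tag{2}$$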

where $\mathbb{E}_{\nu,0}$ is the expectation under the initial distribution $\nu$ and starting time of zero. By the strong law of large numbers, the approximation becomes an equality as $n$ goes to infinity.

To utilize the above estimator, we need to simulate exact realizations of the process $X$ over the time interval $[0,t]$, and there are many methods to choose from. In particular, the Gillespie algorithm (also called the stochastic simulation algorithm) [26], the next reaction method [25], and the modified next reaction method [1] are all straightforward to implement and often have similar efficiency. For our numerical results in the later sections, we used the modified next reaction method.
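As an illustration of the classical estimator, the following is a minimal Python sketch that simulates paths with the Gillespie direct method (not the modified next reaction method used for the paper's numerical results) and builds the empirical distribution. The function names `gillespie` and `classical_mc`, the fixed seed, and the sample size are assumptions made for this example; the model is the birth–death network $\emptyset\to X$, $X\to\emptyset$ with rate constants 50 and 1, $X(0)=100$, and $t=2$.

```python
import random
from collections import Counter

def gillespie(x0, reactions, t_end, rng):
    """Simulate one exact path on [0, t_end] and return the state at t_end.

    reactions is a list of (intensity_function, net_change_vector) pairs.
    """
    x, t = list(x0), 0.0
    while True:
        rates = [lam(x) for lam, _ in reactions]
        total = sum(rates)
        if total <= 0.0:
            return tuple(x)              # no reaction can fire any more
        t += rng.expovariate(total)      # exponential holding time
        if t > t_end:
            return tuple(x)
        # Pick reaction r with probability rates[r] / total.
        u, acc, chosen = rng.random() * total, 0.0, None
        for rate, (_, zeta) in zip(rates, reactions):
            if rate > 0.0:
                chosen = zeta            # fallback: last reaction with positive rate
                acc += rate
                if u < acc:
                    break
        x = [xi + zi for xi, zi in zip(x, chosen)]

def classical_mc(x0, reactions, t_end, n, seed=0):
    """Empirical distribution of X(t_end) from n independent paths."""
    rng = random.Random(seed)
    counts = Counter(gillespie(x0, reactions, t_end, rng) for _ in range(n))
    return {state: c / n for state, c in counts.items()}

# Birth-death example: intensities 50 (birth) and x (death).
birth_death = [
    (lambda x: 50.0, (1,)),
    (lambda x: float(x[0]), (-1,)),
]
p_hat = classical_mc((100,), birth_death, t_end=2.0, n=1000)
```

The returned dictionary maps each observed state to its relative frequency; states that were never visited are implicitly assigned probability zero, which is exactly why many paths are needed when the distribution is diffuse.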

One drawback of using the Monte Carlo estimator Eq. 2 to approximate the solution to the CME (1) is that huge numbers of simulations are generally required to achieve a high level of accuracy. That said, the Monte Carlo estimator has at least two distinct advantages when compared against the methods that approximately solve the CME directly: it is very simple to implement and it is substantially less sensitive to the dimension of the state space.

There are two natural ways to improve upon a Monte Carlo estimator. The first way is to decrease the time required to generate realizations of the random samples (i.e., the process $X$ in our case). Lowering the time required to generate paths of the processes that we are interested in has been an active area of research for almost two decades [1, 25, 41, 43, 49, 54]. Moreover, researchers have also designed efficient algorithms that generate approximate paths that trade some accuracy for speed [2, 10, 17, 18, 22, 27, 30, 51].

The second way to improve upon a Monte Carlo estimator, and the focus of this article, is to instead lower the variance of the estimator itself. There are many broadly applicable variance reduction techniques, including coupling methods, control variates, stratified sampling, antithetic random variables, quasi-Monte Carlo, and conditional Monte Carlo [29, 47].

In this paper, we utilize a form of conditional Monte Carlo to reduce the variance. Briefly, conditional Monte Carlo follows from the observation that for one-dimensional random variables $X$ and $Y$ , defined on the same probability space, we have $E[X]=E[E[X|Y]]$ , and $\text{Var}(E[X|Y])\leq\text{Var}(X)$ , so long as all the expectations are well defined [14]. That is, one can always reduce variance by conditioning. Of course, the “art” is in the selection of an appropriate random variable $Y$ .
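The variance inequality quoted above is a consequence of the law of total variance. Assuming $X$ has a finite second moment, the decomposition reads:

```latex
\operatorname{Var}(X)
  = \mathbb{E}\big[\operatorname{Var}(X \mid Y)\big]
  + \operatorname{Var}\big(\mathbb{E}[X \mid Y]\big)
  \;\geq\; \operatorname{Var}\big(\mathbb{E}[X \mid Y]\big),
```

since the first term on the right is nonnegative. Equality holds only when $\operatorname{Var}(X\mid Y)=0$ almost surely, that is, when $Y$ already determines $X$.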

Returning to our situation, define $\mathbb{E}_{\nu,s}[f(X(t))]$ as the expectation of $f(X(t))$ taken with respect to the initial state distribution $\nu$ and starting time $0\leq s\leq t$. That is, $P(X(s)=x)=\nu(x)$. If $\nu$ is a point-mass distribution at $y\in\mathbb{Z}^{d}_{\geq 0}$, then we write $\mathbb{E}_{y,s}[f(X(t))]$. Fix $h\in[0,t]$. Then

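$$\begin{aligned}p_{t}^{\nu}(x)&=\mathbb{E}_{\nu,0}\big[\mathbb{E}_{\nu,0}[\mathbbm{1}(X(t)=x)\mid X(t-h)]\big]\\&=\mathbb{E}_{\nu,0}\big[\mathbb{E}_{X(t-h),t-h}[\mathbbm{1}(X(t)=x)]\big]\\&=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{X_{i}(t-h),t-h}[\mathbbm{1}(X(t)=x)],\quad\text{a.s.},\end{aligned}\tag{3}$$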

where the $\{X_{i}(t-h)\}_{i=1}^{n}$ are i.i.d. realizations of $X(t-h)$ . A natural estimator for the right hand side of the above equation is

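$$\hat{p}_{t}^{\nu}(x;n,m,h)\overset{\text{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{j=1}^{m}\mathbbm{1}(X_{ij}(t)=x),\tag{4}$$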

where we generate the $X_{ij}$ in the following manner:

• simulate $n$ independent realizations of the process $X$ over the time interval $[0,t-h]$, each with an initial value determined by $\nu$, and denote the $i$th realization by $X_{i}$,

• for each $i\in\{1,\dots,n\}$, generate $m$ conditionally independent realizations over the time interval $[t-h,t]$, each of which has initial state $X_{i}(t-h)$. Denote the $j$th such realization by $X_{ij}$.

Note that for each $j\in\{1,\dots,m\}$ , the process $X_{ij}$ is equal to $X_{i}$ over the interval $[0,t-h]$ . See Fig. 1.

Since $\{X_{i_{1}j}(t)\}_{j=1}^{m}$ and $\{X_{i_{2}j}(t)\}_{j=1}^{m}$ are independent for $i_{1}\neq i_{2}$, the strong law of large numbers implies that with probability one we have

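$$\lim_{n\to\infty}\hat{p}_{t}^{\nu}(x;n,m,h)=\mathbb{E}_{\nu,0}\left[\frac{1}{m}\sum_{j=1}^{m}\mathbbm{1}(X_{ij}(t)=x)\right]=p_{t}^{\nu}(x).$$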

Hereafter we will refer to the original estimator Eq. 2 as classical Monte Carlo, and the new estimator Eq. 4 as conditional Monte Carlo. The conditional Monte Carlo estimator has two unspecified parameters, denoted $m$ and $h$ . The number of branches is determined by $m$ , and the time at which branching occurs is controlled by $h$ . If $m$ and $h$ are fixed, then the remaining parameter $n$ is simply chosen large enough such that the estimator’s variance is below some desired threshold. If $m=1$ , $h=0$ , or $h=t$ , then the conditional and classical Monte Carlo estimators are the same. If $m>1$ and $h\in(0,t)$ , then for the same computational cost as classical Monte Carlo, the conditional Monte Carlo estimator obtains more observations of $X(t)$ . We would like to choose the values of $m$ and $h$ such that, in some sense, our new estimator is more efficient than classical Monte Carlo. In Section 3, we provide an algorithm for finding optimal values of $m$ and $h$ , which is the key contribution of this article.
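The branching construction can be sketched in a few lines of Python. This is an illustrative sketch only: it uses the Gillespie direct method, assumes a point-mass initial distribution, and picks $n$, $m$, and $h$ by hand rather than by the optimization of Section 3. The names `simulate` and `conditional_mc` are hypothetical. Because the model is time-homogeneous, simulating over $[t-h,t]$ from the state $X_{i}(t-h)$ is the same as advancing that state by a duration $h$.

```python
import random
from collections import Counter

def simulate(x, reactions, duration, rng):
    """Gillespie direct method: advance state x by `duration` time units."""
    x, t = list(x), 0.0
    while True:
        rates = [lam(x) for lam, _ in reactions]
        total = sum(rates)
        if total <= 0.0:
            return tuple(x)
        t += rng.expovariate(total)
        if t > duration:
            return tuple(x)
        u, acc, chosen = rng.random() * total, 0.0, None
        for rate, (_, zeta) in zip(rates, reactions):
            if rate > 0.0:
                chosen = zeta            # fallback: last reaction with positive rate
                acc += rate
                if u < acc:
                    break
        x = [xi + zi for xi, zi in zip(x, chosen)]

def conditional_mc(x0, reactions, t, h, n, m, seed=0):
    """Estimate the distribution of X(t) from n trunks with m branches each."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        trunk_state = simulate(x0, reactions, t - h, rng)       # X_i(t - h)
        for _ in range(m):
            # Each branch restarts from the same trunk state: X_ij(t).
            counts[simulate(trunk_state, reactions, h, rng)] += 1
    return {state: c / (n * m) for state, c in counts.items()}

# Birth-death example with X(0) = 100 and t = 2; h and m chosen arbitrarily.
birth_death = [
    (lambda x: 50.0, (1,)),
    (lambda x: float(x[0]), (-1,)),
]
p_cond = conditional_mc((100,), birth_death, t=2.0, h=0.5, n=300, m=10)
```

With `m=1` the inner loop produces a single continuation per trunk, so the function reduces to the classical estimator with `n` paths.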

The distributions produced by our conditional Monte Carlo method can, of course, be used to construct unbiased estimates of moments and other expectations. However, we stress here that our new estimator is optimized for estimating the entire distribution of the process and not for estimating expectations. Estimating expectations is a separate, and very important, problem that has seen a large amount of research activity over the past decade (see [5, 6, 7, 8, 9, 17, 18, 44] for a subset of works focusing on this problem). In fact, in Appendix B we prove that the type of conditioning we carry out here (optimized for estimating the entire distribution) cannot be more efficient than standard Monte Carlo for the estimation of the expected value of a linear birth process at some future time $t>0$. This may seem surprising at first since conditioning always reduces the variance (as discussed above). However, in the present method we also use Monte Carlo to solve for the conditional expectation, which has its own cost. Determining better, and perhaps optimal, ways to estimate expectations via conditional Monte Carlo in the present context is a worthy direction of future research and will be discussed further in Section 6 and Appendix B.

The remainder of the article is organized as follows. In Section 2, we define the continuous time Markov chain model of reaction networks. Then in Section 3, we present an algorithm for finding the optimal values of $m$ and $h$, and also the full algorithm, Algorithm 3, for the implementation of the conditional Monte Carlo estimator. Next, in Section 4, we give numerical results demonstrating the order of magnitude improvement that can be obtained with the use of conditional Monte Carlo in the current context. In Section 5, we derive a central limit theorem for the error of the conditional Monte Carlo estimator and then test it on examples. Finally, in Section 6, we summarize our results and suggest ideas for future work. The proofs of the main results are in Appendix A. The supplementary material contains more figures related to the numerical results. An example MATLAB implementation of the conditional Monte Carlo algorithm is at https://github.com/kehlert/conditional_monte_carlo_example.

2 Mathematical model

Suppose our reaction network has $d$ types of species and $R$ reactions. For $1\leq r\leq R$ ,

(i) we will denote by $\zeta_{r}$ the reaction vector for the $r$th reaction, meaning that if the $r$th reaction occurs at time $t$, and the process is currently in state $x\in\mathbb{Z}^{d}_{\geq 0}$, then the new state becomes $x+\zeta_{r}$;

(ii) we will denote by $\lambda_{r}:\mathbb{Z}^{d}_{\geq 0}\to[0,\infty)$ the intensity, or propensity, function of the $r$th reaction.

A standing assumption is that $\lambda_{r}(x)=0$ if $x+\zeta_{r}\notin\mathbb{Z}^{d}_{\geq 0}$ , which preserves the non-negativity of the components. We let $X$ be a continuous time Markov chain (CTMC) whose transition rate from state $x$ to $x^{\prime}$ is

$$q(x,x^{\prime})=\sum_{r=1}^{R}\lambda_{r}(x)\,\mathbbm{1}(x^{\prime}-x=\zeta_{r}).$$

Hence, $X$ is a Markov process with infinitesimal generator $Af(x)=\sum_{r=1}^{R}\lambda_{r}(x)(f(x+\zeta_{r})-f(x)),$ where $f:\mathbb{Z}_{\geq 0}^{d}\to\mathbb{R}$ is a bounded function with compact support. We will denote our process by $X$ , so that $X(t)\in\mathbb{Z}^{d}_{\geq 0}$ is the vector whose $i$ th component gives the count of species $i$ at time $t\geq 0$ .

The most common choice of intensity function is stochastic mass action kinetics. Suppose that we require $y_{i}$ copies of species $i$ for the $r$ th reaction to occur. Then we say that $\lambda_{r}$ has stochastic mass action kinetics if

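$$\lambda_{r}(x)=\kappa_{r}\prod_{i=1}^{d}\frac{x_{i}!}{(x_{i}-y_{i})!}\,\mathbbm{1}(x_{i}\geq y_{i}),\tag{5}$$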

for some $\kappa_{r}>0$ , which is called the rate constant of the reaction. For example, for the reaction $2A+B\to A+C$ , where $A$ , $B$ , and $C$ are the species types in our model system, the reaction vector is $(-1,-1,1)^{T}$ and $y=(2,1,0)^{T}$ , in which case $\lambda_{r}(x)=\kappa_{r}x_{1}(x_{1}-1)x_{2},$ where we have ordered the species alphabetically.
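The mass action intensity is a product of falling factorials, which can be evaluated without computing full factorials. The helper below is a small sketch (the name `mass_action_intensity` is hypothetical) that reproduces the $2A+B\to A+C$ example with species ordered $(A,B,C)$.

```python
def mass_action_intensity(kappa, y, x):
    """Return kappa * prod_i x_i! / (x_i - y_i)!, or 0 if some x_i < y_i.

    y[i] is the number of copies of species i consumed by the reaction,
    x[i] is the current count of species i.
    """
    rate = float(kappa)
    for xi, yi in zip(x, y):
        if xi < yi:
            return 0.0                   # not enough molecules: indicator is 0
        for k in range(yi):
            rate *= xi - k               # falling factorial x_i (x_i - 1) ... (x_i - y_i + 1)
    return rate

# 2A + B -> A + C with kappa = 1 in state (A, B, C) = (3, 2, 5):
# kappa * x_1 (x_1 - 1) x_2 = 3 * 2 * 2
rate = mass_action_intensity(1.0, (2, 1, 0), (3, 2, 5))
```

When only one copy of $A$ is present the function returns zero, which is consistent with the standing assumption that a reaction cannot drive a count negative.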

None of our theoretical results assume that the $\lambda_{r}$ have the above mass action form, but the models we tested do use it unless otherwise noted.

One well-known representation for the stochastic process $X$ is the random time change representation of Thomas Kurtz [11, 12, 38],

$$X(t)=X(0)+\sum_{r=1}^{R}Y_{r}\left(\int_{0}^{t}\lambda_{r}(X(s))\,ds\right)\zeta_{r},\tag{6}$$

where $X(0)$ is the initial state and the $Y_{r}$ are independent unit-rate Poisson processes. We will make use of the above representation in some of our proofs.

2.1 Examples

In the subsequent sections, we intersperse numerical results, and below is a list of all the example models we used. The species to the left of the arrows are the reactants (giving the counts of the species consumed in the reaction), and those to the right are the products. The numbers above the arrows are the rate constants $\kappa_{r}$ . Unless otherwise noted, for every model and reaction we define the intensities $\lambda_{r}$ with Eq. 5.

(i) Birth

The initial state is $X(0)=10$ and $t=2$ . The single reaction is

$$X\xrightarrow{1}2X.$$

Following Eq. 5, the rate of the reaction is $\lambda(x)=x$.

(ii) Birth–Death

The initial state is $X(0)=100$ and $t=2$ . There are two reactions

$$\emptyset\xrightarrow{50}X,\quad X\xrightarrow{1}\emptyset.$$

Following Eq. 5, the rates of the reactions are $\lambda_{1}(x)=50$ and $\lambda_{2}(x)=x$, respectively.

(iii) Lotka–Volterra

This model is also often called the predator-prey model. The initial state is $A(0)=200$ and $B(0)=100$ . We set $t=4$ . The reactions are

$$A\xrightarrow{2}2A,\quad A+B\xrightarrow{0.01}2B,\quad B\xrightarrow{2}\emptyset.$$

Following Eq. 5, and after ordering the species as $(A,B)$, the rates of the reactions are $\lambda_{1}(x)=2x_{1}$, $\lambda_{2}(x)=0.01x_{1}x_{2}$, and $\lambda_{3}(x)=2x_{2}$, respectively.

(iv) Dimerization

In this model, $mRNA$ is translated into the protein $P$ , which then dimerizes into $D$ , and the dimer $D$ accumulates over time. The initial state for every species is zero except for $G(0)=1$ . We set $t=1$ . The reactions are

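$$G\xrightarrow{25}G+mRNA,\quad mRNA\xrightarrow{100}mRNA+P,\quad 2P\xrightarrow{0.001}D,\quad mRNA\xrightarrow{0.1}\emptyset,\quad P\xrightarrow{1}\emptyset.$$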

Following Eq. 5, and after ordering the species as $(G,mRNA,P,D)$, the rates of the reactions are $\lambda_{1}(x)=25x_{1}$, $\lambda_{2}(x)=100x_{2}$, $\lambda_{3}(x)=0.001x_{3}(x_{3}-1)$, $\lambda_{4}(x)=0.1x_{2}$, and $\lambda_{5}(x)=x_{3}$, respectively.

(v) Toggle

Each species represses the production of the other, which leads to a probability mass function that is multimodal. The initial state is $A(0)=B(0)=0$ . We set $t=100$ . The reactions are

$$\emptyset\to A,\quad A\to\emptyset,\quad\emptyset\to B,\quad B\to\emptyset.$$

For this model, the first and third intensity functions are not chosen to be mass action. Specifically, we let

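$$\lambda_{1}(x)=\frac{50}{1+2x_{2}},\quad\lambda_{2}(x)=x_{1},\quad\lambda_{3}(x)=\frac{50}{1+2x_{1}},\quad\lambda_{4}(x)=x_{2},$$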

where we again ordered the species as $(A,B)$.

(vi) Fast/Slow

$A$ and $B$ quickly convert into one another, and $B$ slowly turns into $C$ . The initial state is $A(0)=B(0)=100$ and $C(0)=0$ . We set $t=10$ . The reactions are

$$A\xrightarrow{10}B,\quad B\xrightarrow{10}A,\quad B\xrightarrow{0.1}C.$$

Following Eq. 5, and after ordering the species as $(A,B,C)$ , the rates are $\lambda_{1}(x)=10x_{1},$ $\lambda_{2}(x)=10x_{2}$ , and $\lambda_{3}(x)=0.1x_{2}$ , respectively.

3 Determining the values of $m$ and $h$ via optimization

The conditional Monte Carlo estimator Eq. 4 is of little value without knowledge of which values of $m$ and $h$ to use. In this section, we will show that appropriate values can be found by numerically solving an easy optimization problem.

Recall that the distribution of the process is denoted by $p_{t}^{\nu}$ , and we denote an estimate of this distribution by $\hat{p}_{t}^{\nu}$ . We will measure the quality of the estimation via the mean integrated squared error (MISE), which is

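$$\text{MISE}(\hat{p}_{t}^{\nu})\overset{\text{def}}{=}\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x)-p_{t}^{\nu}(x)\big)^{2}\right].\tag{7}$$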

Note that if $\hat{p}_{t}^{\nu}$ is constructed via our conditional Monte Carlo estimator, then it, and by extension $\text{MISE}(\hat{p}_{t}^{\nu})$, is a function of $n$, $m$, and $h$. Suppose we have a fixed computational budget, which we denote by $b$. We then want to choose the values of $n$, $m$, and $h$ so that we minimize $\text{MISE}(\hat{p}_{t}^{\nu})$ subject to our budget constraint $b$. We choose the squared error in (7), as opposed to the total variation norm or some other $L^{p}$ error, because this choice is more amenable to analysis, especially in the derivation of the central limit theorem in Section 5.

3.1 Computational cost model

Assuming that our model is non-explosive (a process is said to explode if there are infinitely many transitions in a finite amount of time, and it is said to be non-explosive if the probability of an explosion is zero for all initial distributions [3, 46]), the expected number of reactions required to generate $\{X_{1j}\}_{j=1}^{m}$ is given by

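$$\underbrace{\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]}_{\text{expected \# of reactions in }[0,t-h]}+m\cdot\underbrace{\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]}_{\text{expected \# of reactions in }[t-h,t]},$$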

where $\lambda_{0}(x)=\sum_{r=1}^{R}\lambda_{r}(x)$ (see Theorem A.1). Hence, the expected computational cost for our conditional Monte Carlo estimator is

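$$n\cdot c\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right),\tag{8}$$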

where $c>0$ is an unknown constant.

Since we cannot generally evaluate the expectations in the cost model Eq. 8, as this would be as difficult as the problem we are attempting to solve, we need to estimate them. To do so, fix a relatively small $\tilde{n}$ and simulate $\tilde{n}$ i.i.d. paths $\{X_{i}\}_{i=1}^{\tilde{n}}$ . Then the expectations are approximately equal to

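$$\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\int_{0}^{t-h}\lambda_{0}(X_{i}(s))\,ds,\quad\text{and}\quad\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\int_{t-h}^{t}\lambda_{0}(X_{i}(s))\,ds.\tag{9}$$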

Importantly, for the fixed set of $\tilde{n}$ paths, the values Eq. 9 can be computed for a variety of different $h$ values. The process $X_{i}$ is piecewise constant, and therefore so is $\lambda_{0}(X_{i})$ . Thus, for any value of $h$ , we can easily compute the integrals so long as we have stored the jump times of $X_{i}$ and the value of $\lambda_{0}(X_{i})$ at each jump.
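Because $\lambda_{0}(X_{i})$ is a step function, each integral is a finite sum of rectangle areas. The sketch below assumes each pilot path has been stored as a pair of lists: the start time of every constant piece (beginning with 0) and the value of $\lambda_{0}$ on that piece. The function names and the storage layout are assumptions for illustration, not the authors' data structures.

```python
def integrate_total_intensity(jump_times, rates, a, b):
    """Integral over [a, b] of a right-continuous step function.

    jump_times[k] is the start of the k-th constant piece (jump_times[0] == 0),
    rates[k] is the value of lambda_0 on [jump_times[k], jump_times[k + 1]),
    and the last piece is taken to extend indefinitely.
    """
    total = 0.0
    for k, (start, rate) in enumerate(zip(jump_times, rates)):
        end = jump_times[k + 1] if k + 1 < len(jump_times) else float("inf")
        lo, hi = max(a, start), min(b, end)
        if hi > lo:                       # piece overlaps [a, b]
            total += rate * (hi - lo)
    return total

def cost_terms(paths, t, h):
    """Sample averages of the two integrals for a list of stored paths."""
    n_tilde = len(paths)
    before = sum(integrate_total_intensity(jt, r, 0.0, t - h) for jt, r in paths) / n_tilde
    after = sum(integrate_total_intensity(jt, r, t - h, t) for jt, r in paths) / n_tilde
    return before, after

# One toy path: lambda_0 = 2 on [0, 0.5), 3 on [0.5, 1.2), 1 afterwards.
toy_paths = [([0.0, 0.5, 1.2], [2.0, 3.0, 1.0])]
before, after = cost_terms(toy_paths, t=2.0, h=0.5)
```

The same stored paths can be reused for every candidate $h$, so scanning many values of $h$ requires no further simulation.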

3.2 Optimization problem

Given a reaction network, our goal is to find values of $n$ , $m$ , and $h$ that minimize the mean integrated squared error (MISE) Eq. 7 for our conditional Monte Carlo estimator (4) while staying within our computational budget of $b$ . More precisely, we want to solve the following optimization problem

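$$\min_{n,m,h}\ \underbrace{\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big)^{2}\right]}_{\text{mean integrated squared error (MISE)}},\tag{10}$$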

subject to

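$$n\cdot c\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right)\leq b,\qquad n,m\in\mathbb{Z}_{\geq 1}\ \text{and}\ 0\leq h\leq t.\tag{11}$$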

The following theorem will allow us to transform the above optimization problem into a more solvable form.

Theorem 3.1.

Suppose the process $X$ is non-explosive. For any fixed $n,m\in\mathbb{Z}_{\geq 1}$ and $h\in[0,t]$

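$$\mathbb{E}_{\nu,0}\left[\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}\big(\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big)^{2}\right]=\frac{1}{n}\left[\frac{1}{m}+\left(1-\frac{1}{m}\right)P_{\nu}(X_{11}(t)=X_{12}(t))-\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}p_{t}^{\nu}(x)^{2}\right].$$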

The proof of Theorem 3.1 can be found in Section A.2.
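As a quick consistency check (a sketch, not part of the proof), the right-hand side of Theorem 3.1, $\frac{1}{n}\big[\frac{1}{m}+(1-\frac{1}{m})P_{\nu}(X_{11}(t)=X_{12}(t))-\sum_{x}p_{t}^{\nu}(x)^{2}\big]$, reduces to the classical Monte Carlo error in the two degenerate cases $m=1$ and $h=0$. For $h=0$ the two branches start at time $t$ from the same state, so $P_{\nu}(X_{11}(t)=X_{12}(t))=1$.

```latex
\begin{align*}
m = 1:\quad
  & \frac{1}{n}\Big[1 - \sum_{x} p_t^{\nu}(x)^2\Big]
    = \frac{1}{n}\sum_{x} p_t^{\nu}(x)\big(1 - p_t^{\nu}(x)\big), \\
h = 0:\quad
  & \frac{1}{n}\Big[\frac{1}{m} + \Big(1 - \frac{1}{m}\Big)\cdot 1
      - \sum_{x} p_t^{\nu}(x)^2\Big]
    = \frac{1}{n}\Big[1 - \sum_{x} p_t^{\nu}(x)^2\Big].
\end{align*}
```

The first line is the sum over $x$ of the variance $p_{t}^{\nu}(x)(1-p_{t}^{\nu}(x))/n$ of an average of $n$ i.i.d. indicators, which is what the classical estimator computes at each state.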

If we allow $n$ to be continuous, then we can use the constraint Eq. 11 to solve for $n^{-1}$ , and subsequently eliminate the constraint by substitution. This leads to a simpler optimization problem. In particular, let

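$$f(m,h)\overset{\text{def}}{=}\frac{1}{m}\left(\mathbb{E}_{\nu,0}\left[\int_{0}^{t-h}\lambda_{0}(X(s))\,ds\right]+m\cdot\mathbb{E}_{\nu,0}\left[\int_{t-h}^{t}\lambda_{0}(X(s))\,ds\right]\right)\times\left(1+(m-1)P_{\nu}(X_{11}(t)=X_{12}(t))-m\sum_{x\in\mathbb{Z}_{\geq 0}^{d}}p_{t}^{\nu}(x)^{2}\right).$$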

Then the original optimization problem Eqs. 10 and 11 is equivalent to

$$\min_{m,h}\ f(m,h)\quad\text{subject to}\quad m\in\mathbb{Z}_{\geq 1},\ 0\leq h\leq t.\tag{12}$$

Note that both $c$ and $b$ have dropped out of the optimization problem.
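The elimination of $n$ can be written out explicitly. The shorthand $E_{1}(h)$ and $E_{2}(h)$ for the two expected integrals is introduced here only for readability, and the sketch assumes the budget constraint holds with equality at the optimum (the MISE of Theorem 3.1 is decreasing in $n$) and that $f$ is the product of the cost factor and the error factor.

```latex
\begin{align*}
E_1(h) &= \mathbb{E}_{\nu,0}\Big[\int_0^{t-h} \lambda_0(X(s))\,ds\Big], \qquad
E_2(h) = \mathbb{E}_{\nu,0}\Big[\int_{t-h}^{t} \lambda_0(X(s))\,ds\Big], \\
\frac{1}{n} &= \frac{c}{b}\,\big(E_1(h) + m\,E_2(h)\big), \\
\text{MISE} &= \frac{1}{n}\cdot\frac{1}{m}\Big[1 + (m-1)\,P_\nu\big(X_{11}(t) = X_{12}(t)\big)
               - m\sum_{x} p_t^{\nu}(x)^2\Big] \\
            &= \frac{c}{b}\cdot\frac{1}{m}\big(E_1(h) + m\,E_2(h)\big)
               \Big[1 + (m-1)\,P_\nu\big(X_{11}(t) = X_{12}(t)\big)
               - m\sum_{x} p_t^{\nu}(x)^2\Big]
             = \frac{c}{b}\, f(m,h).
\end{align*}
```

Since $c/b>0$ does not depend on $m$ or $h$, minimizing the MISE under the budget constraint and minimizing $f$ select the same $(m,h)$.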

There are three terms in $f$ that we must know, or be able to approximate, in order to solve Eq. 12.

• The expectations of the integrals. We discussed how to approximate these in Section 3.1.

• The sum $\sum_{x}p_{t}^{\nu}(x)^{2}$. This sum is the probability that two independent paths end up in the same state at time $t$. For many models, including the ones we tested, it is much smaller than $P_{\nu}(X_{11}(t)=X_{12}(t))$ and close to zero. Thus for our examples we replace the sum with zero, and we make that our general recommendation.

• The term $P_{\nu}(X_{11}(t)=X_{12}(t))$, whose approximation is the subject of the next section.

Note that there are many models for which $\sum_{x}p_{t}^{\nu}(x)^{2}$ will not be near zero. However, for such models a small number of states will necessarily have a large probability. An example of such a model would be a Birth-Death model, as in Section 2.1, with input rate 1 and output rate 1. Such a model has a stationary distribution that is Poisson with a parameter of 1 [4], and so for large $t$ the distribution $p_{t}^{\nu}$ will concentrate on the set $\{0,1,2,3\}$ . Other examples where $\sum_{x}p_{t}^{\nu}(x)^{2}$ is not small include those with extinction events. For such models, it would not be appropriate to set this term to zero. However, for models with diffuse probability mass functions, i.e., those models for which estimating $p_{t}^{\nu}$ is difficult and are the focus of this paper, the assumption will often be valid.
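To make the Poisson example concrete, the short script below evaluates $\sum_{x}p(x)^{2}$ and the mass on $\{0,1,2,3\}$ for a Poisson distribution with parameter 1, that is, for the stationary limit rather than for any finite $t$. The truncation at 60 terms is an arbitrary choice for this sketch; the neglected tail is far below floating-point precision.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability mass function at integer k >= 0."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Probability that two independent Poisson(1) draws coincide.
collision = sum(poisson_pmf(k, 1.0) ** 2 for k in range(60))

# Probability mass carried by the states 0, 1, 2, 3.
mass_0_to_3 = sum(poisson_pmf(k, 1.0) for k in range(4))

print(round(collision, 4), round(mass_0_to_3, 4))  # prints 0.3085 0.981
```

A collision probability of roughly 0.31 is clearly not negligible, which illustrates why setting the sum to zero is inappropriate for such concentrated distributions.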

3.3 Approximating the joint probability

In order to optimize the objective function $f(m,h)$ in Eq. 12, we need to know, or be able to quickly approximate, the term $P_{\nu}(X_{11}(t)=X_{12}(t))$ . The following theorem, proven in Section A.3, will allow us to make a good approximation, without requiring any additional simulations. The theorem makes use of the Skellam $(\mu_{1},\mu_{2})$ distribution, which is the distribution of the difference between two independent Poisson random variables with parameters $\mu_{1}$ and $\mu_{2}$ , respectively.
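For readers unfamiliar with the Skellam distribution, its probability mass function can be evaluated directly from the definition as a convolution of two Poisson distributions. The sketch below does this with a truncated sum; the function names and the truncation length `terms` are assumptions, and the truncation is only adequate for moderate parameter values. Library implementations such as `scipy.stats.skellam` are an alternative.

```python
import math

def poisson_pmf(k, mu):
    """Poisson pmf for mu > 0, computed on the log scale for stability."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def skellam_pmf(k, mu1, mu2, terms=200):
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    Sums P(N1 = j + k) * P(N2 = j) over j, truncated after `terms` terms.
    """
    start = max(0, -k)                    # need j + k >= 0
    return sum(poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
               for j in range(start, start + terms))

# Symmetric case, as in Theorem 3.2 where both parameters are equal.
probs = {k: skellam_pmf(k, 3.0, 3.0) for k in range(-40, 41)}
```

In the symmetric case the distribution is centred at zero and `probs[0]` is the largest entry, which matches the later remark that the approximation is maximized at $k=0$.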

Theorem 3.2.

Let $S$ be the $d\times R$ matrix whose $r$ th column is $\zeta_{r}$ and let $\text{null}(S)$ be the right nullspace of $S$ restricted to integer values. Let $X$ and $Z$ satisfy

[TABLE]

where the $Y_{r}^{X}$ and $Y_{r}^{Z}$ are independent, unit-rate Poisson processes. Assume that $X$ is non-explosive. For each $1\leq r\leq R$ and $0\leq a\leq b\leq t$ , denote

$$\Lambda_{r}^{a,b}=\int_{a}^{b}\lambda_{r}(X(s))\,ds,$$

and let $K_{r}^{a,b}$ have the Skellam( $\Lambda_{r}^{a,b},\Lambda_{r}^{a,b})$ distribution. Then

[TABLE]

Note that $X$ is the process Eq. 6 that is of interest to us. Returning to our setup, if we assume that

[TABLE]

which should be valid for small $h$ , then Theorem 3.2 leads to an approximation of $P_{\nu}(X_{11}(t)=X_{12}(t))$ . In particular, we may sample $\tilde{n}$ paths and for the $i$ th such path define

[TABLE]

Then $P_{\nu}(X_{11}(t)=X_{12}(t))\approx\hat{P}_{\nu}(X_{11}(t)=X_{12}(t))$ , where

[TABLE]

and $\tilde{N}$ is a finite subset of $\text{null}(S)$ .

To find $\tilde{N}$ , we use the “Algorithm for Solving the Linear Diophantine Equation Problem” from Section 1.5.2 of [20]. In general, the algorithm finds solutions $x\in\mathbb{Z}^{d}$ to linear equations of the form $Ax=b$ for rational $A$ and $b$ . In our case, we enumerate solutions to $Sk=0$ for $k\in\mathbb{Z}^{R}$ , since $S$ has $R$ columns. Generally, there are infinitely many solutions; however, the right-hand side of Eq. 14 is always maximized at $k=0$ and decreases as $k$ moves away from $0$ . Thus we approximate Eq. 14 by starting at $k=0$ and enumerating all “nearby” solutions. Algorithm 1 shows how to apply the algorithm from [20] to our particular problem. In all of our numerical examples, we chose $C=4$ in Algorithm 1.
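The enumeration and the resulting approximation can be illustrated with a minimal sketch. It makes several assumptions that are ours: it replaces the Diophantine algorithm of [20] with a brute-force search over a box (the `radius` parameter is not necessarily the constant $C$ of Algorithm 1), it assumes that Eq. 14 averages, over the sampled paths, the sum over $k\in\tilde{N}$ of the product over reactions of equal-parameter Skellam probabilities (the form suggested by the proof in Section A.3), and it takes the per-path integrated intensities as given inputs. The numerical intensities at the bottom are made up for illustration.

```python
import itertools
import math

def skellam_pmf(k, lam):
    """P(N1 - N2 = k) for independent N1, N2 ~ Poisson(lam), by direct summation."""
    k = abs(k)  # symmetric in k when both parameters are equal
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    upper = k + int(lam + 12.0 * math.sqrt(lam)) + 30  # heuristic truncation point
    log_lam = math.log(lam)
    total = 0.0
    for j in range(upper):
        # log of Poisson pmf at j + k times Poisson pmf at j, both with mean lam
        log_term = (-2.0 * lam + (2 * j + k) * log_lam
                    - math.lgamma(j + k + 1) - math.lgamma(j + 1))
        total += math.exp(log_term)
    return total

def nearby_null_vectors(S, radius):
    """All integer vectors k with S k = 0 and |k_r| <= radius (brute force)."""
    R = len(S[0])
    box = itertools.product(range(-radius, radius + 1), repeat=R)
    return [k for k in box
            if all(sum(row[r] * k[r] for r in range(R)) == 0 for row in S)]

def joint_probability_estimate(path_intensities, null_vectors):
    """Average over paths of sum_k prod_r P(K_r = k_r), K_r ~ Skellam(lam_r, lam_r)."""
    total = 0.0
    for lams in path_intensities:  # one list of R integrated intensities per path
        for k in null_vectors:
            total += math.prod(skellam_pmf(k_r, lam_r)
                               for k_r, lam_r in zip(k, lams))
    return total / len(path_intensities)

# Birth-death network: reaction 1 adds a molecule, reaction 2 removes one.
S = [[1, -1]]
N_tilde = nearby_null_vectors(S, radius=4)
estimate = joint_probability_estimate([[0.5, 0.4], [0.7, 0.9]], N_tilde)
```

For this network the null vectors are exactly those with equal entries, so `N_tilde` contains the nine vectors $(j,j)$ with $|j|\leq 4$ . The brute-force search costs $(2\cdot\text{radius}+1)^{R}$ checks, so it is only sensible for a small number of reactions.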

3.4 Approximation to the optimization problem

By using the joint probability approximation Eq. 14, we can approximate the function $f$ in the optimization problem Eq. 12. In particular, let

[TABLE]

where $\bar{\lambda}_{0}(X(s))=\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\sum_{r=1}^{R}\lambda_{r}(X_{i}(s))$ , and the $\{X_{i}\}_{i=1}^{\tilde{n}}$ are independent paths of $X$ . Then we may substitute $f$ with $\hat{f}$ and our new optimization problem is the following:

[TABLE]

Note that above we have allowed $m$ to be real-valued, as opposed to integer-valued. This allows us to use continuous optimization algorithms, which generally converge more rapidly. According to Figure SM1, which shows $\hat{f}(m,h)$ for many values of $m$ and $h$ , $\hat{f}$ does not change too quickly with $m$ , so allowing $m$ to range over the reals instead of the integers should not change the optimal values of $m$ and $h$ appreciably.

It is important to know when the optimization problem Eq. 16 has a finite solution. In the proposition below, we show that a solution necessarily exists when $\hat{P}_{\nu}(X_{11}(t)=X_{12}(t))$ is larger than the approximation used for $\sum_{x}p_{t}^{\nu}(x)^{2}$ . Since we approximate the sum with zero, we may conclude that a finite solution always exists in our setup.

Proposition 3.3.

Let $\widehat{p^{2}}$ be our approximation to $\sum_{x}p_{t}^{\nu}(x)^{2}$ . If $\hat{P}_{\nu}(X_{11}(t)=X_{12}(t))>\widehat{p^{2}}$ for all $h\in[0,t]$ , then Eq. 16 has a finite solution.

Proof 3.4.

Since the integrals are nonnegative, $h$ is in a compact domain, $\hat{f}$ depends continuously on $h$ and $m$ , and $\lim_{m\to\infty}\hat{f}(m,h)=\infty$ , a finite solution exists.

Algorithm 3 outlines the full conditional Monte Carlo algorithm, which brings together all of the individual pieces of the algorithm that we previously discussed.

4 Numerical results

In this section, we present numerical results demonstrating the improvement in accuracy, quantified via the mean integrated squared error (7), that comes from using our conditional Monte Carlo estimator instead of the classical Monte Carlo estimator. In particular, when near-optimal values of $m$ and $h$ are utilized, the accuracy often improves by an order of magnitude for a fixed computational budget. Moreover, we show that the function $\hat{f}$ of Eq. 16 is indeed a very good approximation for $f$ of Eq. 12 for the examples we considered, allowing us to conclude that the values of $m$ and $h$ our method produces are near-optimal.

The following steps were carried out on each of our test examples. First, we fixed an integer $n_{1}$ and computed the classical Monte Carlo estimator

[TABLE]

For all models, we used $n_{1}=10^{4}$ . We also recorded the number of random variates used in generating $p_{t}^{\text{MC}}(\ \cdot\ ;n_{1})$ , which served as the budget $b$ in the computational cost constraint Eq. 11.

After obtaining $p_{t}^{\text{MC}}(\ \cdot\ ;n_{1})$ , we computed the conditional Monte Carlo estimator

[TABLE]

for various pairs of $m$ and $h$ , and $n_{2}$ was allowed to increase until the conditional estimator used essentially the same number of random variates as the classical Monte Carlo estimator. All random variates generated for the conditional estimators were independent of those utilized for the classical estimator.
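The two estimators and the matching of budgets can be sketched for the birth-death model with input rate 1 and output rate 1. The sketch is ours and rests on stated assumptions: it uses the Gillespie direct method, it counts one random variate per waiting time and one per reaction choice (the paper's exact accounting may differ), and it grows the number of outer paths until the budget is reached. The conditional estimator averages the indicators $\mathbbm{1}(X_{ij}=x)$ over all $nm$ terminal states.

```python
import random
from collections import Counter

def simulate(x, t_start, t_end, rng):
    """Gillespie direct method for birth-death (birth rate 1, death rate x).

    Returns the state at t_end and the number of random variates drawn.
    """
    t, variates = t_start, 0
    while True:
        birth, death = 1.0, float(x)
        t += rng.expovariate(birth + death)  # time of the next reaction
        variates += 1
        if t > t_end:
            return x, variates
        variates += 1                        # variate used to pick the reaction
        if rng.random() * (birth + death) < birth:
            x += 1
        else:
            x -= 1

def classical_mc(x0, t, n1, rng):
    """Empirical pmf of X(t) from n1 independent paths, plus the variate count."""
    counts, budget = Counter(), 0
    for _ in range(n1):
        x, used = simulate(x0, 0.0, t, rng)
        counts[x] += 1
        budget += used
    return {x: c / n1 for x, c in counts.items()}, budget

def conditional_mc(x0, t, h, m, budget, rng):
    """Conditional estimator: n outer paths to t - h, each with m continuations to t."""
    counts, used, n = Counter(), 0, 0
    while used < budget:                     # add outer paths until the budget is spent
        y, u0 = simulate(x0, 0.0, t - h, rng)
        used += u0
        for _ in range(m):
            x, u1 = simulate(y, t - h, t, rng)
            counts[x] += 1
            used += u1
        n += 1
    return {x: c / (n * m) for x, c in counts.items()}, n, used

rng = random.Random(1)
p_mc, b = classical_mc(x0=0, t=5.0, n1=2000, rng=rng)
p_cmc, n2, used = conditional_mc(x0=0, t=5.0, h=1.0, m=5, budget=b, rng=rng)
```

Stopping on the budget makes the number of outer paths random, which is a simplification adopted for brevity; fixing $n_{2}$ in advance from an expected cost would avoid this.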

Next, for both classical and conditional Monte Carlo, we computed the integrated squared error

[TABLE]

where $\tilde{S}$ was a large fixed subset of the state space, and $\hat{p}(x)$ was either the classical or conditional Monte Carlo estimate. The ISE is itself a random variable, and so we approximated the mean integrated squared error (MISE) by averaging 100 independent samples of the ISE.

The exact values of $p_{t}^{\nu}(x)$ were unknown. Thus the values were estimated with conditional Monte Carlo with a large value of $n_{1}$ (we used $n_{1}=10^{9}$ ), and with $m$ and $h$ chosen so that they approximately minimize the MISE.

Finally, we denote by $\text{MISE}_{\text{MC}}$ our estimate of the classical Monte Carlo MISE, and, for a given $m$ and $h$ , we denote by $\text{MISE}_{\text{CMC}}(m,h)$ the conditional version. For each model, and for each choice of $m$ and $h$ , an “empirical error improvement” was computed as the following ratio

[TABLE]

where a number greater than one implies that conditional Monte Carlo has a lower MISE than classical Monte Carlo when given the same computational budget. These values, one for each pair of $m$ and $h$ , can then be plotted. In the top half of Figures 2 and 3 (and Figures SM2 to SM5), we display these values with a heatmap. Of particular interest is the order of magnitude improvement in computational efficiency we see with the conditional Monte Carlo estimator as compared to classical Monte Carlo when well-chosen values of $h$ and $m$ are utilized. Specifically, we see a 40-fold improvement for the Lotka-Volterra model and a 20-fold improvement for each of the dimerization, toggle, and fast/slow models. For the birth and birth-death models we see more modest improvements in computational efficiency, but this can be explained by the simplicity of these models, which makes classical Monte Carlo sufficient for the task at hand. One promising aspect of the present work comes into focus with these numerical results: the more complicated the model, and the larger and more diffuse the distribution of the model (which is where other methods, including those that approximately solve the chemical master equation directly, struggle), the better the performance of the conditional Monte Carlo estimator.
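The error metrics described above can be written down in a few lines. This is a sketch with hypothetical function names, and the toy probability tables are made up purely to exercise the arithmetic; probability mass functions are stored as dictionaries from state to probability.

```python
def integrated_squared_error(p_hat, p_ref, states):
    """Sum over the chosen states of the squared difference between two pmfs."""
    return sum((p_hat.get(x, 0.0) - p_ref.get(x, 0.0)) ** 2 for x in states)

def mean_ise(estimates, p_ref, states):
    """Average the ISE over independent replicates to approximate the MISE."""
    return sum(integrated_squared_error(p, p_ref, states)
               for p in estimates) / len(estimates)

def error_improvement(mise_mc, mise_cmc):
    """Ratio above one means the conditional estimator has the lower MISE."""
    return mise_mc / mise_cmc

# Toy data: a two-state reference pmf and two replicates of each estimator.
p_ref = {0: 0.5, 1: 0.5}
mc_runs = [{0: 0.60, 1: 0.40}, {0: 0.45, 1: 0.55}]
cmc_runs = [{0: 0.52, 1: 0.48}, {0: 0.49, 1: 0.51}]

states = [0, 1]
improvement = error_improvement(mean_ise(mc_runs, p_ref, states),
                                mean_ise(cmc_runs, p_ref, states))
```

With these toy numbers the classical MISE is 0.0125 and the conditional MISE is 0.0005, so `improvement` is 25 up to floating-point rounding.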

In practice, we are not given the optimal values of the parameters $m$ and $h$ , so we find them via the optimization problem Eq. 16. In each of the bottom portions of Figures 2 and 3 (and Figures SM2 to SM5), we plot the inverse $1/\hat{f}(m,h)$ for the different pairs of $m$ and $h$ . We report the inverse so that the heatmap will agree qualitatively with the top portion of the figures (higher values are desirable). We also normalized the values by multiplying them by $\hat{f}(1,0)$ , which does not affect the results of the optimization problem in any way. To generate each value $1/\hat{f}(m,h)$ we first sampled $\tilde{n}=500$ paths, which then allowed us to compute $\bar{\lambda}_{0}$ and $\hat{P}_{\nu}(X_{11}(t)=X_{12}(t))$ as detailed in the previous section. We could then use these values to compute $\hat{f}(m,h)$ via Eq. 15.

Note that the empirical error improvement and $\hat{f}$ do not need to have the same value for a pair of $m$ and $h$ . The important thing is that the maximizer of the empirical error improvement is similar to the minimizer of $\hat{f}$ . The heatmaps do indeed suggest that the true and approximate optimization problems have similar solutions. What is also clear from these numerical results is that even if $m$ and $h$ slightly deviate from their optimal values, we still get a substantial improvement.

We stress that such heatmaps do not need to be made by anyone who uses the conditional Monte Carlo algorithm. They are only used here to demonstrate that the optimization problem Eq. 16 can be safely used to find the near-optimal values of $m$ and $h$ , which can then be used to construct the desired estimator Eq. 4 via Algorithm 3.

5 A central limit theorem

In this section, we will show how to obtain an approximate one-sided confidence interval for the integrated squared error Eq. 17 without running more simulations. Specifically, for a fixed (presumably large) finite subset of the state space $\tilde{S}$ , a fixed $\alpha\in(0,1)$ , and large $n$ , we want to find a sequence of positive constants $\{C_{n}\}$ and a constant $u>0$ such that

[TABLE]

where $C_{n}$ is allowed to depend on $m$ and $h$ . The following central limit theorem will lead us to values for $\{C_{n}\}$ and $u$ .

Theorem 5.1.

Fix $m\in\mathbb{Z}_{\geq 1}$ and $h\in[0,t]$ . Let $\mathcal{S}\subset\mathbb{Z}_{\geq 0}^{d}$ be the state space of the continuous time Markov chain, and let $\tilde{\mathcal{S}}$ be a finite subset of $\mathcal{S}$ . Choose an enumeration of $\tilde{\mathcal{S}}$ and denote it $\{x_{i}\}_{i=1}^{|\tilde{\mathcal{S}}|}$ . Let $p_{t}^{\nu},\hat{p}_{t}^{\nu}\in\mathbb{R}^{|\tilde{\mathcal{S}}|}$ with their $i$ th elements equal to $p_{t}^{\nu}(x_{i})$ and $\hat{p}_{t}^{\nu}(x_{i};n,m,h)$ , respectively. Let

[TABLE]

where $\text{diag}(p_{t}^{\nu})$ is the diagonal matrix with $p_{t}^{\nu}$ along its diagonal, and $A$ is a $|\tilde{\mathcal{S}}|\times|\tilde{\mathcal{S}}|$ matrix where $A_{ij}=P_{\nu}(X_{11}(t)=x_{i},X_{12}(t)=x_{j})$ . Then

[TABLE]

where $\{\lambda_{\ell}\}_{\ell=1}^{|\tilde{\mathcal{S}}|}$ are the eigenvalues of $\Sigma$ and $Z_{\ell}\stackrel{\text{i.i.d.}}{\sim}N(0,1)$ .

$\Sigma$ is usually an enormous matrix, so we do not want to store it, much less compute its eigenvalues. The Satterthwaite approximation [52] says that

[TABLE]

where $\chi^{2}(v)$ denotes a $\chi^{2}$ random variable with $v$ degrees of freedom. The approximation is obtained by matching the first two moments of the linear combination (above left-hand side) and the chi-squared distribution (above right-hand side). The advantage of the approximation is that we can estimate $\text{tr}\left(\Sigma\right)$ and $\text{tr}\left(\Sigma^{2}\right)$ without storing $\Sigma$ explicitly or computing its eigenvalues.
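The moment matching can be spelled out in a few lines. The symbols $g$ (a scale factor) and $v$ (the degrees of freedom) are our notation for this sketch. Since $\Sigma$ is a covariance matrix, its eigenvalues are real and nonnegative, and since the $Z_{\ell}^{2}$ are independent with mean 1 and variance 2,

$$\mathbb{E}\Big[\sum_{\ell}\lambda_{\ell}Z_{\ell}^{2}\Big]=\sum_{\ell}\lambda_{\ell}=\text{tr}(\Sigma),\qquad\text{Var}\Big[\sum_{\ell}\lambda_{\ell}Z_{\ell}^{2}\Big]=2\sum_{\ell}\lambda_{\ell}^{2}=2\,\text{tr}(\Sigma^{2}).$$

A scaled chi-squared variable $g\,\chi^{2}(v)$ has mean $gv$ and variance $2g^{2}v$ . Equating the two pairs of moments gives

$$g=\frac{\text{tr}(\Sigma^{2})}{\text{tr}(\Sigma)},\qquad v=\frac{\text{tr}(\Sigma)^{2}}{\text{tr}(\Sigma^{2})}.$$

Note that $v$ need not be an integer, and that $1\leq v\leq|\tilde{\mathcal{S}}|$ whenever $\Sigma\neq 0$ .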

Theorem 5.2.

Fix $n,m\in\mathbb{Z}_{\geq 1}$ and $h\in[0,t]$ . Let $\tilde{\mathcal{S}}$ , $\{x_{k}\}_{k=1}^{|\tilde{\mathcal{S}}|}$ , and $\hat{p}_{t}^{\nu}$ be defined as in Theorem 5.1. For $1\leq i\leq n$ , let $M_{i}\in\mathbb{Z}_{\geq 0}^{|\tilde{\mathcal{S}}|}$ , and set its $k$ th element to $M_{i}(x_{k})\stackrel{{\scriptstyle\text{def}}}{{=}}\sum_{j=1}^{m}\mathbbm{1}(X_{ij}=x_{k})$ (the $\{X_{ij}\}$ are defined in Section 1). Let $\hat{\Sigma}_{n}$ be the usual sample covariance matrix of $\{M_{i}\}_{i=1}^{n}$ . Specifically,

[TABLE]

where $\overline{M}=n^{-1}\sum_{i=1}^{n}M_{i}$ . Then

[TABLE]

and

[TABLE]

Furthermore

[TABLE]

For the models we tested, the optimal value of $m$ was only moderately large (on the order of 10 to 100), and the indicator in the summand of $M_{i}(x)$ is zero for many values of $x$ . Whenever those two conditions hold, $M_{i}$ is sparse. Consequently, storing $\{M_{i}\}_{i=1}^{n}$ does not require too much memory, and the terms $M_{i}^{T}M_{j}$ and $\overline{M}^{T}M_{i}$ are cheap to compute. Algorithm 4 summarizes how we compute the traces. Using the sparsity of the $M_{i}$ is important, because otherwise the vectors are too large to store and the operations are slow.
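One way to exploit the sparsity is sketched below. It is not a transcription of Algorithm 4: it is a naive version, quadratic in $n$ , that relies only on the identities $\text{tr}(\hat{\Sigma}_{n})=(n-1)^{-1}\sum_{i}\|M_{i}-\overline{M}\|^{2}$ and $\text{tr}(\hat{\Sigma}_{n}^{2})=(n-1)^{-2}\sum_{i,j}\big((M_{i}-\overline{M})^{T}(M_{j}-\overline{M})\big)^{2}$ , which follow from the definition of the sample covariance and the cyclic property of the trace. Each $M_{i}$ is stored as a `Counter` mapping a state to its count; the function names are ours.

```python
from collections import Counter

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts (state -> value)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[k] for k, v in a.items() if k in b)

def sample_traces(M):
    """Return (tr(Sigma_hat), tr(Sigma_hat^2)) without forming Sigma_hat."""
    n = len(M)
    mean = Counter()
    for Mi in M:
        for k, v in Mi.items():
            mean[k] += v / n
    mean_sq = dot(mean, mean)                 # Mbar^T Mbar
    mean_dot = [dot(mean, Mi) for Mi in M]    # Mbar^T M_i

    def gram(i, j):
        # (M_i - Mbar)^T (M_j - Mbar), expanded into sparse dot products
        return dot(M[i], M[j]) - mean_dot[i] - mean_dot[j] + mean_sq

    tr1 = sum(gram(i, i) for i in range(n)) / (n - 1)
    tr2 = sum(gram(i, j) ** 2 for i in range(n) for j in range(n)) / (n - 1) ** 2
    return tr1, tr2

# Three outer paths with m = 3 inner samples each; keys are (scalar) states.
M = [Counter({0: 2, 1: 1}), Counter({1: 3}), Counter({0: 1, 2: 2})]
tr1, tr2 = sample_traces(M)
```

Only states that were actually visited appear as keys, so memory scales with the number of distinct sampled states rather than with $|\tilde{\mathcal{S}}|$ .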

Corollary 5.3.

Fix $n,m\in\mathbb{Z}_{\geq 1}$ and $h\in[0,t]$ . Also fix an $\alpha\in(0,1)$ , and let $\chi_{\alpha}^{2}(v)$ be the $1-\alpha$ quantile of the $\chi^{2}$ distribution with $v$ degrees of freedom. An approximate $1-\alpha$ confidence interval for $\sum_{x\in\tilde{\mathcal{S}}}\big{(}\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big{)}^{2}$ is $[0,U_{n}/(nm^{2})]$ , where

[TABLE]
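A sketch of how the upper endpoint might be computed from the two sample traces follows. It assumes that $U_{n}$ has the Satterthwaite form, namely the scale $\text{tr}(\hat{\Sigma}_{n}^{2})/\text{tr}(\hat{\Sigma}_{n})$ times the $1-\alpha$ chi-squared quantile with $\text{tr}(\hat{\Sigma}_{n})^{2}/\text{tr}(\hat{\Sigma}_{n}^{2})$ degrees of freedom. To stay within the standard library, the quantile is computed with the Wilson-Hilferty approximation, which is our substitution and loses accuracy for very small degrees of freedom; an exact quantile routine would be preferable in practice. The trace values in the example are made up.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, dof):
    """Wilson-Hilferty approximation to the chi-squared quantile (not exact)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def ise_upper_bound(tr1, tr2, n, m, alpha=0.05):
    """Approximate 1 - alpha upper bound U_n / (n m^2) on the ISE.

    tr1 and tr2 are the sample values of tr(Sigma) and tr(Sigma^2).
    Assumes the Satterthwaite form of U_n described in the lead-in.
    """
    scale = tr2 / tr1        # matches the variance of the limiting distribution
    dof = tr1 ** 2 / tr2     # effective (possibly non-integer) degrees of freedom
    U_n = scale * chi2_quantile_wh(1.0 - alpha, dof)
    return U_n / (n * m ** 2)

bound = ise_upper_bound(tr1=40.0, tr2=200.0, n=1000, m=20)
```

The bound shrinks at rate $1/n$ for fixed $m$ and $h$ , which reflects the $\sqrt{n}$ scaling in Theorem 5.1.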

Figures 4a and 4b (and also Figures SM6 to SM9) compare the empirical distribution of

[TABLE]

to the approximate asymptotic distribution Eq. 21, where the true traces are replaced with the sample traces from Algorithm 4. The figures also compare the sample 95% quantile to the same quantile based on Corollary 5.3, which turned out to be close.

6 Directions for future research

We demonstrated how to implement a version of conditional Monte Carlo in the context of continuous time Markov chain models for reaction networks. There are many possible directions for future research; we list three.

1. The method could be extended so it provides estimates of the distribution at multiple fixed time-points. The method we developed, and in particular the optimization problem we utilize to find the values of $m$ and $h$ , is tailored to the single time-point case.

2. In the method developed here the conditional expectation in Eq. 3

[TABLE]

is approximated by Monte Carlo with $m$ conditionally independent realizations. However, it could be approximated by solving the chemical master equation directly, perhaps via the finite state projection algorithm [45]. Because the solver need only integrate the system of ODEs over the time interval $[t-h,t]$ , the probability mass should not become too diffuse, thereby solving one of the major difficulties related to these solvers.

We implemented this approach and observed some increase in efficiency over the conditional Monte Carlo algorithm (Algorithm 3), around a factor of three. However, the gains were only realized when an optimal value of $h$ was chosen, and we needed to test many different $h$ values in order to find the optimal value. In practice, we would need a faster method for finding the optimal parameters, similar to the optimization problem detailed in this paper.

3. As discussed in the introduction and Appendix B, the present method is not optimized for the estimation of expectations. Developing a new conditional Monte Carlo estimator tailored to that problem is a natural focus of future work.

Appendix A Proofs

A.1 Theorem regarding the expected number of reactions

Theorem A.1.

Suppose that the process $X$ is non-explosive and fix $h\in[0,t]$ and $m\in\mathbb{Z}_{\geq 1}$ . Then the expected number of reactions required to sample $\{X_{1j}\}_{j=1}^{m}$ is

[TABLE]

Proof A.2.

The number of reactions required to sample $\{X(s)\}_{s\in[a,b]}$ is

[TABLE]

where the $Y_{r}$ are independent unit-rate Poisson processes [38]. For each $r$ ,

[TABLE]

is a martingale [12, Theorem 1.22], so the result follows.

A.2 Proof of Theorem 3.1

For simplicity, denote $X_{ij}(t)$ as $X_{ij}$ . We start with the left-hand side of the desired equality. The monotone convergence theorem implies that we can move the expectation inside the sum, by which we mean

[TABLE]

The last line follows from the fact that the estimator $\hat{p}_{t}^{\nu}$ is unbiased. From the definition of $\hat{p}_{t}^{\nu}$ , and also basic properties of variance, the above is equal to

[TABLE]

We can also take $p_{t}^{\nu}(x)$ to be a marginal distribution. In that case, interpret sums over $x$ as sums over the lower-dimensional marginal variables. Also, view $X_{11}=X_{12}$ as being true if their coordinates corresponding to the marginal variables are equal.

A.3 Proof of Theorem 3.2

Let $\Lambda^{0,t}\in\mathbb{R}_{\geq 0}^{R}$ be the vector whose $r$ th element is $\Lambda_{r}^{0,t}$ , and let $Y^{X},Y^{Z}\in\mathbb{Z}_{\geq 0}^{R}$ be the vectors whose $r$ th elements are $Y_{r}^{X}(\Lambda_{r}^{0,t})$ and $Y_{r}^{Z}(\Lambda_{r}^{0,t})$ , respectively. Then

[TABLE]

The elements of $Y^{X}$ and $Y^{Z}$ are independent when conditioned on $\Lambda^{0,t}$ . Therefore we can expand the conditional probability into a product of probabilities, by which we mean

[TABLE]

When conditioned on $\Lambda_{r}^{0,t}$ , $Y_{r}^{X}-Y_{r}^{Z}$ is the difference of two independent Poissons with the same intensity $\Lambda_{r}^{0,t}$ . Therefore the difference follows a Skellam distribution. To summarize,

[TABLE]

Continuing from above,

[TABLE]

where the expectation is taken over $\Lambda^{0,t}$ .

If we are estimating a marginal distribution, then we need to modify the sum slightly. Let $S^{\prime}$ be the same as $S$ , except the rows corresponding to the marginalized-out variables are removed. Then replace $\text{null}(S)$ with $\text{null}(S^{\prime})$ .

A.4 Proof of Theorem 5.1

Let $\{X_{i}(t-h)\}_{i=1}^{n}$ be i.i.d. realizations of $X(t-h)$ . Define $X_{ij}(t)$ to be the state of the CTMC conditioned on $X_{ij}(t-h)=X_{i}(t-h)$ , where $1\leq j\leq m$ . For simplicity, later we will denote $X_{ij}(t)$ as just $X_{ij}$ .

Let $M_{i}\in\mathbb{Z}_{\geq 0}^{|\tilde{\mathcal{S}}|}$ , where the $k$ th element of $M_{i}$ is defined as $\sum_{j=1}^{m}\mathbbm{1}(X_{ij}=x_{k})$ . Let $\Sigma\in\mathbb{R}^{|\tilde{\mathcal{S}}|\times|\tilde{\mathcal{S}}|}$ be the covariance matrix of $M_{1}$ . The $M_{i}$ are i.i.d., so if $\Sigma$ is finite, then the usual multivariate central limit theorem implies that

[TABLE]

Let $M_{i}(x)$ denote the element of $M_{i}$ corresponding to $x$ . Then by definition, for all $x$

[TABLE]

Therefore

[TABLE]

The dot product is continuous, so the continuous mapping theorem implies that

[TABLE]

[15, Theorem 2.1] implies that the right side has the same distribution as $\sum_{\ell=1}^{|\tilde{S}|}\lambda_{\ell}Z_{\ell}^{2}$ . Let $\Sigma_{xx}$ be the element of $\Sigma$ on the diagonal corresponding to state $x$ . Then by definition

[TABLE]

$\text{Var}\left[\mathbbm{1}(X_{1j}=x)\right]=p_{t}^{\nu}(x)(1-p_{t}^{\nu}(x))$ , and the covariance simplifies when we rewrite it in terms of expectations. We get

[TABLE]

Let $x_{1}$ and $x_{2}$ be distinct states, and let $\Sigma_{x_{1},x_{2}}$ be the element whose row and column correspond to the states $x_{1}$ and $x_{2}$ , respectively. By definition

[TABLE]

Rearrange the terms in the sum to get

[TABLE]

which is equivalent to

[TABLE]

Since $x_{1}\neq x_{2}$ , $\mathbbm{1}(X_{1j}=x_{1})\mathbbm{1}(X_{1j}=x_{2})=0$ . Also, the second expectation can be rewritten as a probability. The above expression simplifies to

[TABLE]

Eq. 19 simply expresses the above results with matrix-vector notation.
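As a compact summary of the computation, the entries of $\Sigma$ can be recovered from the definition of $M_{1}$ alone. The following is our own reconstruction and should agree with Eq. 19 up to the arrangement of terms. It uses the fact that the $m$ indicators for a given state have a common variance, and that each of the $m(m-1)$ ordered pairs $(X_{1j},X_{1j^{\prime}})$ with $j\neq j^{\prime}$ has the same joint distribution as $(X_{11},X_{12})$ :

$$\Sigma_{xx}=m\,p_{t}^{\nu}(x)\big(1-p_{t}^{\nu}(x)\big)+m(m-1)\big(P_{\nu}(X_{11}=x,X_{12}=x)-p_{t}^{\nu}(x)^{2}\big),$$

$$\Sigma_{x_{1},x_{2}}=-m\,p_{t}^{\nu}(x_{1})p_{t}^{\nu}(x_{2})+m(m-1)\big(P_{\nu}(X_{11}=x_{1},X_{12}=x_{2})-p_{t}^{\nu}(x_{1})p_{t}^{\nu}(x_{2})\big),\qquad x_{1}\neq x_{2}.$$

In matrix form, with $A$ as in Theorem 5.1,

$$\Sigma=m\,\text{diag}(p_{t}^{\nu})+m(m-1)A-m^{2}\,p_{t}^{\nu}(p_{t}^{\nu})^{T}.$$

For $m=1$ this reduces to $\text{diag}(p_{t}^{\nu})-p_{t}^{\nu}(p_{t}^{\nu})^{T}$ , the covariance of a single multinomial trial, as expected for classical Monte Carlo.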

If we are estimating a marginal distribution, then take $\mathcal{S}$ to be the lower dimensional space corresponding to the marginal variables. Also interpret $X(t)$ as the state vector containing only the marginal variables.

A.5 Proof of Theorem 5.2

If we write out the definition of $\hat{\Sigma}_{n}$ and use the fact that the trace is linear, we can see that

[TABLE]

We use the cyclic property of the trace to rewrite the right side as

[TABLE]

Expanding the summands leads to

[TABLE]

From the definition of $\bar{M}$ , the above expression is equal to

[TABLE]

By definition, $m\hat{p}_{t}=\bar{M}$ , therefore

[TABLE]

Next consider $\text{tr}\left(\hat{\Sigma}_{n}^{2}\right)$ . We will proceed in a similar way. By definition

[TABLE]

The trace is linear, so

[TABLE]

The last line follows from the cyclic property of the trace. When we expand the summands, the right side becomes

[TABLE]

As for the claim about almost sure convergence of the traces, note that $\hat{\Sigma}_{n}\stackrel{{\scriptstyle\text{a.s.}}}{{\to}}\Sigma$ . Since matrix multiplication and the trace are continuous, the continuous mapping theorem implies the result.

A.6 Proof of Corollary 5.3

Define

[TABLE]

Since $\hat{\Sigma}_{n}\stackrel{{\scriptstyle\text{a.s.}}}{{\to}}\Sigma\text{ as }n\to\infty$ , the continuous mapping theorem and Lemma A.3 taken together imply that $U_{n}\to U$ almost surely as $n\to\infty$ . Also Theorem 5.1 says that

[TABLE]

Therefore by Slutsky’s theorem

[TABLE]

which we can rewrite as

[TABLE]

Applying the Satterthwaite approximation [52] to the right-hand side gives

[TABLE]

The result still holds for marginal distributions. We just need to remove the coordinates of $\tilde{\mathcal{S}}$ corresponding to the variables that are marginalized out.

Lemma A.3.

Let $X_{\theta}$ be a family of random variables parameterized by $\theta\in\mathbb{R}$ with strictly increasing cumulative distribution functions $F_{\theta}$ . Suppose that for each $\theta$ , the function $F_{\theta}$ is continuous. Assume also that $F_{\theta}(x)$ is continuous in $\theta$ for each $x\in\mathbb{R}$ . Then the $1-\alpha$ quantiles of $F_{\theta}$ are also continuous in $\theta$ for all $\alpha\in(0,1)$ .

Proof A.4.

Let $\alpha\in(0,1)$ , and let $\{\theta_{n}\}_{n=1}^{\infty}$ be a sequence that converges to $\theta$ . Define $q_{n}$ and $q$ to be the $1-\alpha$ quantiles corresponding to $\theta_{n}$ and $\theta$ , respectively. We want to show that $q_{n}$ converges to $q$ .

Let $\varepsilon>0$ . Since $\alpha\in(0,1)$ , we know that $q$ is finite. Therefore, we can choose $\underline{q}$ and $\overline{q}$ such that

[TABLE]

We want to show that $|q_{n}-q|<\varepsilon$ for all sufficiently large $n$ , so it will suffice to prove that $\underline{q}<q_{n}<\overline{q}$ for all $n$ large enough.

By assumption, $F_{\theta}(\underline{q})$ is continuous in $\theta$ , so

[TABLE]

The inequality is strict, because $q$ is a quantile, $F_{\theta}$ is strictly increasing, and $\underline{q}<q$ . Since $F_{\theta_{n}}$ is non-decreasing, $q_{n}>\underline{q}$ for all sufficiently large $n$ . We can use essentially the same argument to conclude that $q_{n}<\overline{q}$ for all $n$ large enough.

Appendix B Expectations

The specific conditional Monte Carlo method introduced in this paper has been developed to estimate the entire distribution in a manner that is more efficient than regular Monte Carlo, as quantified by the mean integrated squared error Eq. 10 for a fixed computational budget. This does not imply that it will be more efficient in the computation of any specific expectation. In fact, in this Appendix we prove that it is necessarily less efficient in computing the first moment of a linear birth model. Specifically, we prove that for a fixed computational budget the variance of the estimator generated via the conditional Monte Carlo method is greater than or equal to the variance of the standard Monte Carlo estimator. This demonstrates that caution is required when implementing a method in a context it was not intended for.

Recall the Birth Model, which consists of the single reaction $X\xrightarrow{1}2X,$ where we have chosen a rate parameter of 1. Assuming a fixed initial condition of $X_{0}\in\mathbb{Z}_{\geq 0}$ , it is straightforward to show that

[TABLE]

For a fixed number of paths $n_{1}$ , and a point mass $X_{0}$ , the standard Monte Carlo estimator has an expected cost, quantified by the number of random variables utilized, of

[TABLE]

and a variance of $\text{Var}\left[n_{1}^{-1}\sum_{i=1}^{n_{1}}X_{i}(t)\right]=n_{1}^{-1}\text{Var}(X_{1}(t))=n_{1}^{-1}X_{0}e^{t}(e^{t}-1)$ .

For a fixed number of paths $n$ and $m$ , and a fixed parameter $h\in[0,t]$ , the expected cost of the conditional Monte Carlo estimator is

[TABLE]

The variance of the conditional Monte Carlo estimator is

[TABLE]

Using the generic result that for random variables $X$ and $Y$ on the same probability space $\text{Var}(X)=\mathbb{E}[\text{Var}(X|Y)]+\text{Var}(\mathbb{E}[X|Y])$ , we have

[TABLE]

Thus, dividing by $n\cdot m^{2}$ as in Eq. 26, the variance of the conditional Monte Carlo estimator is

[TABLE]

For a fixed $n_{1}$ , setting $\text{Cost}_{\text{CMC}}(n,m,h)=\text{Cost}_{\text{MC}}(n_{1})$ yields

[TABLE]

or

[TABLE]

Thus, for a fixed $n_{1}$ and with $n$ chosen as above, the variance of the conditional Monte Carlo estimator is

[TABLE]

This is minimized at the boundary with $m=1$ , giving exactly the same variance as the regular Monte Carlo estimator. Thus, to summarize, for a given fixed computational cost the variance of the conditional Monte Carlo estimator is at least as large as the variance of the standard Monte Carlo estimator.
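The conclusion can be checked numerically with a short sketch. It relies on assumptions of ours: the cost of a path is taken to be proportional to the expected number of reactions it fires, which is $X_{0}(e^{s}-1)$ for the birth model over a time span $s$ , and $n$ is treated as real-valued. The conditional variance is obtained from the law of total variance as above, giving $X_{0}e^{t}\big[(e^{h}-1)/m+e^{t}-e^{h}\big]/n$ . The function names are hypothetical.

```python
import math

def variance_mc(n1, t, x0=1):
    """Variance of the standard Monte Carlo estimate of E[X(t)] for the birth model."""
    return x0 * math.exp(t) * (math.exp(t) - 1.0) / n1

def variance_cmc_same_budget(n1, m, h, t, x0=1):
    """Conditional Monte Carlo variance when n is chosen to match the MC budget.

    Assumes cost is proportional to the expected number of reactions simulated.
    """
    et, eh, eth = math.exp(t), math.exp(h), math.exp(t - h)
    # One outer path to time t - h, then m continuations of length h.
    cost_per_outer_path = x0 * ((eth - 1.0) + m * eth * (eh - 1.0))
    n = n1 * x0 * (et - 1.0) / cost_per_outer_path  # real-valued n for simplicity
    return x0 * et * ((eh - 1.0) / m + et - eh) / n

ratio = (variance_cmc_same_budget(n1=1000, m=10, h=1.0, t=2.0)
         / variance_mc(n1=1000, t=2.0))
```

Under this cost model the ratio equals one at $m=1$ and exceeds one for every $m>1$ when $0<h<t$ , in line with the statement above.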

Acknowledgments

We are grateful for financial support from the Army Research Office through grant W911NF-18-1-0324 and the National Science Foundation through grant DMS-2051498.

Bibliography

[1] D. F. Anderson, A modified next reaction method for simulating chemical systems with time dependent propensities and delays, The Journal of Chemical Physics, 127 (2007), p. 214107.
[2] D. F. Anderson, Incorporating postleap checks in tau-leaping, The Journal of Chemical Physics, 128 (2008), p. 054103.
[3] D. F. Anderson, D. Cappelletti, M. Koyama, and T. G. Kurtz, Non-explosivity of stochastically modeled reaction networks that are complex balanced, Bull. Math. Biol., 80 (2018), pp. 2561–2579.
[4] D. F. Anderson, G. Craciun, and T. G. Kurtz, Product-form stationary distributions for deficiency zero chemical reaction networks, Bulletin of Mathematical Biology, 72 (2010), pp. 1947–1970.
[5] D. F. Anderson, A. Ganguly, and T. G. Kurtz, Error analysis of tau-leap simulation methods, Annals of Applied Probability, 21 (2011), pp. 2226–2262.
[6] D. F. Anderson and D. J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, SIAM: Multiscale Modeling and Simulation, 10 (2012), pp. 146–179.
[7] D. F. Anderson, D. J. Higham, and Y. Sun, Complexity of multilevel Monte Carlo tau-leaping, SIAM J. Numer. Anal., 52 (2014), pp. 3106–3127.
[8] D. F. Anderson, D. J. Higham, and Y. Sun, Computational complexity analysis for Monte Carlo approximations of classically scaled population processes, SIAM Multiscale Model. Simul., 16 (2018), pp. 1206–1226.
