Conditional Monte Carlo for Reaction Networks
David F. Anderson, Kurt W. Ehlert

TL;DR
This paper introduces a conditional Monte Carlo estimator for reaction network models that improves probability estimation accuracy in high-dimensional, stochastic systems with small species counts, while maintaining simplicity.
Contribution
The authors develop a novel conditional Monte Carlo estimator with parameter optimization and provide theoretical guarantees including a central limit theorem.
Findings
Enhanced estimator accuracy over classical Monte Carlo methods.
Efficient parameter approximation for optimal performance.
Theoretical validation via a central limit theorem.
Abstract
Reaction networks are often used to model interacting species in fields such as biochemistry and ecology. When the counts of the species are sufficiently large, the dynamics of their concentrations are typically modeled via a system of differential equations. However, when the counts of some species are small, the dynamics of the counts are typically modeled stochastically via a discrete state, continuous time Markov chain. A key quantity of interest for such models is the probability mass function of the process at some fixed time. Since paths of such models are relatively straightforward to simulate, we can estimate the probabilities by constructing an empirical distribution. However, the support of the distribution is often diffuse across a high-dimensional state space, where the dimension is equal to the number of species. Therefore generating an accurate empirical distribution…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMarkov Chains and Monte Carlo Methods · Gene Regulatory Network Analysis · Stochastic processes and statistical mechanics
\newsiamremark
remarkRemark \newsiamremarkhypothesisHypothesis
\newsiamthmclaimClaim \headersConditional Monte Carlo for Reaction NetworksD. F. Anderson and K. W. Ehlert
\externaldocumentsupplement
Conditional Monte Carlo for Reaction Networks
David F. Anderson Department of Mathematics, University of Wisconsin-Madison () [email protected]
Kurt W. Ehlert Department of Mathematics, University of Wisconsin-Madison () [email protected]
Abstract
Reaction networks are often used to model interacting species in fields such as biochemistry and ecology. When the counts of the species are sufficiently large, the dynamics of their concentrations are typically modeled via a system of differential equations. However, when the counts of some species are small, the dynamics of the counts are typically modeled stochastically via a discrete state, continuous time Markov chain.
A key quantity of interest for such models is the probability mass function of the process at some fixed time. Since paths of such models are relatively straightforward to simulate, we can estimate the probabilities by constructing an empirical distribution. However, the support of the distribution is often diffuse across a high-dimensional state space, where the dimension is equal to the number of species. Therefore generating an accurate empirical distribution can come with a large computational cost.
We present a new Monte Carlo estimator that fundamentally improves on the “classical” Monte Carlo estimator described above. It also preserves much of classical Monte Carlo’s simplicity. The idea is basically one of conditional Monte Carlo. Our conditional Monte Carlo estimator has two parameters, and their choice critically affects the performance of the algorithm. Hence, a key contribution of the present work is that we demonstrate how to approximate optimal values for these parameters in an efficient manner. Moreover, we provide a central limit theorem for our estimator, which leads to approximate confidence intervals for its error.
keywords:
Monte Carlo, continuous time Markov chain, chemical master equation, nonparametric density estimation, reaction networks
{AMS}
65C05, 60J28, 62G07
1 Introduction
Systems of interacting species appear often in nature. To better understand the dynamics of such systems, we can model them as reaction networks with deterministic or stochastic dynamics [12, 28, 35, 58]. If the counts of the constituent species are high, then the dynamics are commonly modeled by a system of differential equations [12, 24, 58]. However, if the count of any species is small, then a stochastic model with a discrete state space is more appropriate [11, 12, 42, 50, 55, 58].
Since the amount of each species is necessarily nonnegative and discrete, the state space of the stochastic process is a subset of , where is the number of species types. Let be the distribution of the initial state, which is often a point mass distribution, and suppose we are interested in the distribution of the state of the process at some fixed time . That is, if is the state of the process at time , then we would like to know the value of
[TABLE]
In general, finding the exact values of is extremely difficult. More precisely, the authors are not aware of any general class of models for which can be solved for explicitly, with the exception of linear, or first-order, models [33] or, more generally, models that satisfy a dynamical and restricted complex-balanced condition and admit a time-dependent product form Poisson distribution [13]. However, there are many numerical methods that give an estimate. One type of approach is to approximately solve Kolmogorov’s forward equation, which is called the chemical master equation (CME) in much of the biology and chemistry literature. The CME can be written as
[TABLE]
where is the number of reactions in the system, is the intensity (or propensity) function for the th reaction, gives the net change in the counts of the species due to an occurrence of the th reaction, and the initial distribution is given by . See Section 2 for the precise specification of the model, including terminology.
For most models of interest, solving (1) entails solving a high-dimensional (often infinite-dimensional) system of linear ordinary differential equations. Solving such a system directly is almost always very difficult, so there has been a considerable amount of research devoted to the development of fast and accurate approximate algorithms. The general approach for many such algorithms is to first truncate the state space of the system to a smaller subset. This truncation makes solving the problem computationally feasible, at the cost of introducing a controllable error to the solution. After truncation, the new system of ODEs must be solved.
There is currently a wide variety of methods for performing both the truncation step and solution step. In particular, there is the finite state projection algorithm [45, 56], the uniformization method [21], sliding window methods [32, 59], the sparse grid method [31], the radial basis function approximation [37], a class of spectral methods [23, 34], and methods that specialize to systems with multiple scales [16, 19, 39, 40, 48]. Moreover, there are tensor methods [36, 53, 57] that represents the truncated CME with tensors.
As an alternative to approximating (1) directly via the methods above, we can take a Monte Carlo approach. That is, we can generate independent and identically distributed (i.i.d.) realizations of the process , denoted by , and use the Monte Carlo estimator
[TABLE]
where is the expectation under the initial distribution and starting time of zero. By the strong law of large numbers, the approximation becomes an equality as goes to infinity.
To utilize the above estimator, we need to simulate exact realizations of the process over the time interval , and there are many methods to choose from. In particular, there is the Gillespie algorithm, also called the stochastic simulation algorithm, [26], the next reaction method [25], and the modified next reaction method [1], which are all straightforward to implement and often have similar efficiency. For our numerical results in the later sections, we used the modified next reaction method.
One drawback of using the Monte Carlo estimator Eq. 2 to approximate the solution to the CME (1) is that huge numbers of simulations are generally required to achieve a high level of accuracy. That said, the Monte Carlo estimator has at least two distinct advantages when compared against the methods that approximately solve the CME directly: it is very simple to implement and it is substantially less sensitive to the dimension of the state space.
There are two natural ways to improve upon a Monte Carlo estimator. The first way is to decrease the time required to generate realizations of the random samples (i.e., the process in our case). Lowering the time required to generate paths of the processes that we are interested in has been an active area of research for almost two decades [1, 25, 41, 43, 49, 54]. Moreover, researchers have also designed efficient algorithms that generate approximate paths that trade some accuracy for speed [2, 10, 17, 18, 22, 27, 30, 51].
The second way to improve upon a Monte Carlo estimator, and the focus of this article, is to instead lower the variance of the estimator itself. There are many broadly applicable variance reduction techniques, including coupling methods, control variates, stratified sampling, antithetic random variables, quasi-Monte Carlo, and conditional Monte Carlo [29, 47].
In this paper, we utilize a form of conditional Monte Carlo to reduce the variance. Briefly, conditional Monte Carlo follows from the observation that for one-dimensional random variables and , defined on the same probability space, we have , and , so long as all the expectations are well defined [14]. That is, one can always reduce variance by conditioning. Of course, the “art” is in the selection of an appropriate random variable .
Returning to our situation, define as the expectation of taken with respect to the initial state distribution and starting time . That is, . If is a point-mass distribution at , then we write . Fix , then
[TABLE]
where the are i.i.d. realizations of . A natural estimator for the right hand side of the above equation is
[TABLE]
where we generate the in the following manner:
- •
simulate independent realizations of the process over the time interval , each with an initial value determined by , and denote the th realization by ,
- •
for each , generate conditionally independent realizations over the time interval , each of which has initial state . Denote the th such realization by .
Note that for each , the process is equal to over the interval . See Fig. 1.
Since and are independent for , the strong law of law numbers implies that with probability one we have
[TABLE]
Hereafter we will refer to the original estimator Eq. 2 as classical Monte Carlo, and the new estimator Eq. 4 as conditional Monte Carlo. The conditional Monte Carlo estimator has two unspecified parameters, denoted and . The number of branches is determined by , and the time at which branching occurs is controlled by . If and are fixed, then the remaining parameter is simply chosen large enough such that the estimator’s variance is below some desired threshold. If , , or , then the conditional and classical Monte Carlo estimators are the same. If and , then for the same computational cost as classical Monte Carlo, the conditional Monte Carlo estimator obtains more observations of . We would like to choose the values of and such that, in some sense, our new estimator is more efficient than classical Monte Carlo. In Section 3, we provide an algorithm for finding optimal values of and , which is the key contribution of this article.
The distributions produced by our conditional Monte Carlo method can, of course, be used to construct unbiased estimates of moments and other expectations. However, we stress here that our new estimator is optimized for estimating the entire distribution of the process and not for estimating expectations. Estimating expectations is a separate–and very important–problem that has seen a large amount of research activity over the past decade (see [5, 6, 7, 8, 9, 17, 18, 44] for a subset of works focusing on this problem). In fact, in Appendix B we prove that the type of conditioning we carry out here (optimized for estimating the entire distribution) can not be more efficient than standard Monte Carlo for the estimation of the expected value of a linear birth process at some future time . This may seem surprising at first since conditioning always reduces the variance (as discussed above). However, in the present method we also use Monte Carlo to solve for the conditional expectation, which has its own cost. Determining better, and perhaps optimal, ways to estimate expectations via conditional Monte Carlo in the present context is a worthy direction of future research and will be discussed further in Section 6 and Appendix B.
The remainder of the article is organized as follows. In Section 2, we define the continuous time Markov chain model of reaction networks. Then in Section 3, we present an algorithm for finding the optimal values of and , and also the full algorithm, Algorithm 3, for the implementation of the conditional Monte Carlo estimator. Next, in Section 4, we give numerical results demonstrating the order of magnitude improvement that can be obtained with the use of conditional Monte Carlo in the current context. In Section 5, we derive a central limit theorem for the error of the conditional Monte Carlo estimator and then test it on examples. Finally, in Section 6, we summarize our results and suggest ideas for future work. The proofs of the main results are in Appendix A. The supplementary material contain more figures related to numerical results. An example MATLAB implementation of the conditional Monte Carlo algorithm is at https://github.com/kehlert/conditional_monte_carlo_example.
2 Mathematical model
Suppose our reaction network has types of species and reactions. For ,
- (i)
we will denote by the reaction vector for the th reaction, meaning that if the th reaction occurs at time , and the process is currently in state , then the new state becomes ;
- (ii)
we will denote by the intensity, or propensity, function of the th reaction.
A standing assumption is that if , which preserves the non-negativity of the components. We let be a continuous time Markov chain (CTMC) whose transition rate from state to is
[TABLE]
Hence, is a Markov process with infinitesimal generator where is a bounded function with compact support. We will denote our process by , so that is the vector whose th component gives the count of species at time .
The most common choice of intensity function is stochastic mass action kinetics. Suppose that we require copies of species for the th reaction to occur. Then we say that has stochastic mass action kinetics if
[TABLE]
for some , which is called the rate constant of the reaction. For example, for the reaction , where , , and are the species types in our model system, the reaction vector is and , in which case where we have ordered the species alphabetically.
None of our theoretical results assume that the has the above mass action form, but the models we tested do use it unless otherwise noted.
One well–known representation for the stochastic process is the random time change representation of Thomas Kurtz [11, 12, 38]
[TABLE]
where is the initial state and the are independent unit-rate Poisson processes. We will make use of the above representation in some of our proofs.
2.1 Examples
In the subsequent sections, we intersperse numerical results, and below is a list of all the example models we used. The species to the left of the arrows are the reactants (giving the counts of the species consumed in the reaction), and those to the right are the products. The numbers above the arrows are the rate constants . Unless otherwise noted, for every model and reaction we define the intensities with Eq. 5.
- (i)
Birth
The initial state is and . The single reaction is
[TABLE]
Following Eq. 5, the rate of the reaction is . 2. (ii)
Birth–Death
The initial state is and . There are two reactions
[TABLE]
Following Eq. 5, the rates of the reactions are and , respectively. 3. (iii)
Lotka–Volterra
This model is also often called the predator-prey model. The initial state is and . We set . The reactions are
[TABLE]
Following Eq. 5, and after ordering the species as , the rates of the reactions are , , and , respectively. 4. (iv)
Dimerization
In this model, is translated into the protein , which then dimerizes into , and the dimer accumulates over time. The initial state for every species is zero except for . We set . The reactions are
[TABLE]
Following Eq. 5, and after ordering the species as , the rates of the reactions are , , , , and respectively. 5. (v)
Toggle
Each species represses the production of the other, which leads to a probability mass function that is multimodal. The initial state is . We set . The reactions are
[TABLE]
For this model, the first and third intensity functions are not chosen to be mass action. Specifically, we let
[TABLE]
where we again ordered the species as . 6. (vi)
Fast/Slow
and quickly convert into one another, and slowly turns into . The initial state is and . We set . The reactions are
[TABLE]
Following Eq. 5, and after ordering the species as , the rates are , and , respectively.
3 Determining the values of and via optimization
The conditional Monte Carlo estimator Eq. 4 is of little value without knowledge of which values of and to use. In this section, we will show that appropriate values can be found by numerically solving an easy optimization problem.
Recall that the distribution of the process is denoted by , and we denote an estimate of this distribution by . We will measure the quality of the estimation via the mean integrated squared error (MISE), which is
[TABLE]
Note that if is constructed via our conditional Monte Carlo estimator, then it, and by extension , is a function of , and . Suppose we have a fixed computational budget, which we denote as . We then want to choose the values of , , and so that we minimize MISE subject to our budget constraint . We choose the squared error in (7), as opposed to the total variation norm or some other error, as this choice was more amenable to analysis, especially in the derivation of the central limit theorem in Section 5.
3.1 Computational cost model
Assuming that our model is non-explosive111A process is said to explode if there are an infinite number of transitions in a finite amount of time. A process is said to be non-explosive if the probability of an explosion is zero for all initial distributions [3, 46]. the expected number of reactions required to generate is given by
[TABLE]
where (see Theorem A.1). Hence, the expected computational cost for our conditional Monte Carlo estimator is
[TABLE]
where is an unknown constant.
Since we cannot generally evaluate the expectations in the cost model Eq. 8, as this would be as difficult as the problem we are attempting to solve, we need to estimate them. To do so, fix a relatively small and simulate i.i.d. paths . Then the expectations are approximately equal to
[TABLE]
Importantly, for the fixed set of paths, the values Eq. 9 can be computed for a variety of different values. The process is piecewise constant, and therefore so is . Thus, for any value of , we can easily compute the integrals so long as we have stored the jump times of and the value of at each jump.
3.2 Optimization problem
Given a reaction network, our goal is to find values of , , and that minimize the mean integrated squared error (MISE) Eq. 7 for our conditional Monte Carlo estimator (4) while staying within our computational budget of . More precisely, we want to solve the following optimization problem
[TABLE]
subject to
[TABLE]
The following theorem will allow us to transform the above optimization problem into a more solvable form.
Theorem 3.1**.**
Suppose the process is non-explosive. For any fixed and
[TABLE]
The proof of Theorem 3.1 can be found in Section A.2.
If we allow to be continuous, then we can use the constraint Eq. 11 to solve for , and subsequently eliminate the constraint by substitution. This leads to a simpler optimization problem. In particular, let
[TABLE]
Then the original optimization problem Eqs. 10 and 11 is equivalent to
[TABLE]
Note that both and have dropped out of the optimization problem.
There are three terms in that we must know, or be able to approximate, in order to solve Eq. 12.
- •
The expectations of the integrals. We discussed how to approximate these in Section 3.1.
- •
The sum . However, we note that is the probability that two independent paths end up in the same state at time . For many models, including the ones we tested, that sum is much smaller than and is close to zero. Thus for our examples, we replace the sum with zero and make that our general recommendation.
- •
The term , whose approximation is the subject of the next section.
Note that there are many models for which will not be near zero. However, for such models a small number of states will necessarily have a large probability. An example of such a model would be a Birth-Death model, as in Section 2.1, with input rate 1 and output rate 1. Such a model has a stationary distribution that is Poisson with a parameter of 1 [4], and so for large the distribution will concentrate on the set . Other examples where is not small include those with extinction events. For such models, it would not be appropriate to set this term to zero. However, for models with diffuse probability mass functions, i.e., those models for which estimating is difficult and are the focus of this paper, the assumption will often be valid.
3.3 Approximating the joint probability
In order to optimize the objective function in Eq. 12, we need to know, or be able to quickly approximate, the term . The following theorem, proven in Section A.3, will allow us to make a good approximation, without requiring any additional simulations. The theorem makes use of the Skellam distribution, which is the distribution of the difference between two independent Poisson random variables with parameters and , respectively.
Theorem 3.2**.**
Let be the matrix whose th column is and let be the right nullspace of restricted to integer values. Let and satisfy
[TABLE]
where the and are independent, unit-rate Poisson processes. Assume that is non-explosive. For each and , denote
[TABLE]
and let have the Skellam( distribution. Then
[TABLE]
Note that is the process Eq. 6 that is of interest to us. Returning to our setup, if we assume that
[TABLE]
which should be valid for small , then Theorem 3.2 leads to an approximation of . In particular, we may sample paths and for the th such path define
[TABLE]
Then , where
[TABLE]
and is a finite subset of .
To find , we use the “Algorithm for Solving the Linear Diophantine Equation Problem” from section 1.5.2 of [20]. In general, the algorithm finds solutions to linear equations of the form for rational and . In our case, we enumerate solutions to for . Generally, there are infinitely many solutions, however the right-hand side of (14) is always maximized at , and decreases as moves away from [math]. Thus we approximate (14) by starting at and enumerating all “nearby” solutions. Algorithm 1 shows how to apply the algorithm from [20] to our particular problem. In all of our numerical examples, we chose in Algorithm 1.
3.4 Approximation to the optimization problem
By using the joint probability approximation Eq. 14, we can approximate the function in the optimization problem Eq. 12. In particular, let
[TABLE]
where , and the are independent paths of . Then we may substitute with and our new optimization problem is the following:
[TABLE]
Note that above we have allowed to be real–valued, as opposed to integer valued. This allows us to use continuous optimization algorithms, which generally converge more rapidly. According to Figure SM1, which shows for many values of and , does not change too quickly with , so allowing to range over the reals instead of the integers should not change the optimal values of and appreciably.
It is important to know when the optimization problem Eq. 16 has a finite solution. In the proposition below, we show that a solution necessarily exists when is larger than the approximation used for . Since we approximate the sum with zero, we may conclude that a finite solution always exists in our setup.
Proposition 3.3**.**
Let be our approximation to . If for all , then Eq. 16 has a finite solution.
Proof 3.4**.**
*Since the integrals are nonnegative, is in a compact domain, depends continuously on and , and , a finite solution exists. *
Algorithm 3 outlines the full conditional Monte Carlo algorithm, which brings together all of the individual pieces of the algorithm that we previously discussed.
4 Numerical results
In this section, we present numerical results demonstrating the improvement in accuracy, quantified via the mean integrated squared error (7), that comes from using our conditional Monte Carlo estimator instead of the classical Monte Carlo estimator. In particular, when near–optimal values of and are utilized, the accuracy often improves by an order of magnitude for a fixed computational budget. Moreover, we show that the function of Eq. 16 is indeed a very good approximation for of Eq. 12 for the examples we considered, allowing us to conclude that the values of and our method produces are near–optimal.
The following steps were carried out on each of our test examples. First, we fixed an integer and computed the classical Monte Carlo estimator
[TABLE]
For all models, we used . We also recorded the number of random variates used in generating , which served as the budget in the computational cost constraint Eq. 11.
After obtaining , we computed the conditional Monte Carlo estimator
[TABLE]
for various pairs of and , and was allowed to increase until the conditional estimator used essentially the same number of random variates as the classical Monte Carlo estimator. All random variates generated for the conditional estimators were independent of those utilized for the classical estimator.
Next, for both classical and conditional Monte Carlo, we computed the integrated squared error
[TABLE]
where was a large fixed subset of the state space, and was either the classical or conditional Monte Carlo estimate. The ISE is itself a random variable, and so we approximated the mean integrated square error (MISE) by averaging 100 independent samples of the ISE.
The exact values of were unknown. Thus the values were estimated with conditional Monte Carlo with a large value of (we used ), and with and chosen so that they approximately minimize the MISE.
Finally, we denote by our estimate of the classical Monte Carlo MISE, and, for a given and , we denote by the conditional version. For each model, and for each choice of and , an “empirical error improvement” was computed as the following ratio
[TABLE]
where a number greater than one implies that conditional Monte Carlo has a lower MISE than classical Monte Carlo when given the same computational budget. These values, one for each pair of and , can then be plotted. In the top half of Figures 2 and 3 (and Figures SM2 to SM5), we display these values with a heatmap. Of particular interest is the order of magnitude improvement in computational efficiency we see with the conditional Monte Carlo estimator as compared to classical Monte Carlo when well–chosen values of and are utilized. In particular, for the Lotka-Volterra model we see a 40-fold improvement, for the dimerization model we see a 20-fold improvement, for the toggle model we see a 20-fold improvement, and for the fast/slow model we see a 20-fold improvement. For the birth and birth–death models we see more modest improvements in computational efficiency, but this can be explained by the simplicity of these models which makes classical Monte Carlo sufficient for the task at hand. In particular, one promising aspect of the present work comes into focus with these numerical results: the more complicated the model, and the larger and more diffuse the distribution of the model (which is where other methods, including those that approximately solve the chemical master equation directly, struggle), the better the performance of the conditional Monte Carlo estimator.
In practice, we are not given the optimal values of the parameters and , so we find them via the optimization problem Eq. 16. In each of the bottom portions of Figures 2 and 3 (and Figures SM2 to SM5), we provide the values of for the different pairs of and . We report the inverse so that the heatmap will agree qualitatively with the top portion of the figures (higher values are desirable). We also normalized the values by multiplying them by , which does not affect the results of the optimization problem in any way. To generate each value we first sampled paths, which then allowed us to compute and as detailed in the previous section. We could then use these values to compute via Eq. 15.
Note that the empirical error improvement and do not need to have the same value for a pair of and . The important thing is that the maximizer of the empirical error improvement is similar to the minimizer of . The heatmaps do indeed suggest that the true and approximate optimization problems have similar solutions. What is also clear from these numerical results is that even if and slightly deviate from their optimal values, we still get a substantial improvement.
We stress that such heatmaps do not need to be made by anyone who uses the conditional Monte Carlo algorithm. They are only used here to demonstrate that the optimization problem Eq. 16 can be safely used to find the near–optimal values of and , which can then be used to construct the desired estimator Eq. 4 via Algorithm 3.
5 A central limit theorem
In this section, we will show how to obtain an approximate one-sided confidence interval for the integrated squared error Eq. 17 without running more simulations. Specifically, for a fixed (presumably large) finite subset of the state space , a fixed , and large , we want to find a sequence of positive constants and a constant such that
[TABLE]
where is allowed to depend on and . The following central limit theorem will lead us to values for and .
Theorem 5.1**.**
Fix and . Let be the state space of the continuous time Markov chain, and let be a finite subset of . Choose an enumeration of and denote it . Let with their th elements equal to and , respectively. Let
[TABLE]
where is the diagonal matrix with along its diagonal, and is a matrix where . Then
[TABLE]
*where are the eigenvalues of and . *
is usually an enormous matrix, so we do not want to store it, much less compute its eigenvalues. The Satterthwaite approximation [52] says that
[TABLE]
where denotes a random variable with degrees of freedom. The approximation is obtained by matching the first two moments of the linear combination (above left-hand side) and the chi-squared distribution (above right-hand side). The advantage of the approximation is that we can estimate and without storing explicitly or computing its eigenvalues.
Theorem 5.2**.**
Fix and . Let , , and be defined as in Theorem 5.1. For , let , and set its th element to (the are defined in Section 1). Let be the usual sample covariance matrix of . Specifically,
[TABLE]
where . Then
[TABLE]
and
[TABLE]
Furthermore
[TABLE]
For the models we tested, the optimal value of was only moderately large (on the order of 10 to 100), and the indicator in the summand of is zero for many values of . Whenever those two conditions hold, sparse. Consequently, storing does not require too much memory, and the terms and are cheap to compute. Algorithm 4 summarizes how we compute the traces. Using the sparsity of the is important, because otherwise the vectors are too large to store and the operations are slow.
Corollary 5.3**.**
Fix and . Also fix an , and let be the quantile of the distribution with degrees of freedom. An approximate confidence interval for \sum_{x\in\tilde{\mathcal{S}}}\big{(}\hat{p}_{t}^{\nu}(x;n,m,h)-p_{t}^{\nu}(x)\big{)}^{2} is , where
[TABLE]
Figures 4a and 4b (and also Figures SM6 to SM9), compare the empirical distribution of
[TABLE]
to the approximate asymptotic distribution Eq. 21, where the true traces are replaced with the sample traces from Algorithm 4. The figures also compare the sample 95% quantile to the same quantile based on Corollary 5.3, which turned out to be close.
6 Directions for future research
We demonstrated how to implement a version of conditional Monte Carlo in the context of continuous time Markov chain models for reaction networks. There are many possible directions for future research; we list three.
The method could be extended so it provides estimates of the distribution at multiple fixed time-points. The method we developed, and in particular the optimization problem we utilize to find the values of and , is tailored to the single time-point case. 2. 2.
In the method developed here the conditional expectation in Eq. 3
[TABLE]
is approximated by Monte Carlo with conditionally independent realizations. However, it could be approximated by solving the chemical master equation directly, perhaps via the finite state projection algorithm [45]. Because the solver need only integrate the system of ODEs over the time interval , the probability mass should not become too diffuse, thereby solving one of the major difficulties related to these solvers.
We implemented this approach and observed some increase in efficiency over the conditional Monte Carlo algorithm Algorithm 3, around a factor of three. However, the gains were only realized when an optimal value of was chosen, and we needed to test many different values in order to find the optimal value. In practice, we would need a faster method for finding the optimal parameters, similar to the optimization problem detailed in this paper. 3. 3.
As discussed in the introduction and Appendix B, the present method is not optimized for the estimation of expectations. Developing a new conditional Monte Carlo estimator tailored to that problem is a natural focus of future work.
Appendix A Proofs
A.1 Theorem regarding the expected number of reactions
Theorem A.1**.**
Suppose that the process is non-explosive and fix and . Then the expected number of reactions required to sample is
[TABLE]
Proof A.2**.**
The number of reactions required to sample is
[TABLE]
where the are independent unit-rate Poisson processes [38]. For each ,
[TABLE]
*is a martingale [12, Theorem 1.22], so the result follows. *
A.2 Proof of Theorem 3.1
For simplicity, denote as . We start with the left-hand side of the desired equality. The monotone convergence theorem implies that we can move the expectation inside the sum, by which we mean
[TABLE]
The last line follows from the fact that the estimator is unbiased. From the definition of , and also basic properties of variance, the above is equal to
[TABLE]
We can also take to be a marginal distribution. In that case, interpret sums over as sums over the lower-dimensional marginal variables. Also, view as being true if their coordinates corresponding to the marginal variables are equal.
A.3 Proof of Theorem 3.2
Let be the vector whose th element is , and let be the vectors whose th elements are and , respectively. Then
[TABLE]
The elements of and are independent when conditioned on . Therefore we can expand the conditional probability into a product of probabilities, by which we mean
[TABLE]
When conditioned on , is the difference of two independent Poissons with the same intensity . Therefore the difference follows a Skellam distribution. To summarize,
[TABLE]
Continuing from above,
[TABLE]
where the expectation is taken over .
If we are estimating a marginal distribution, then we need to modify the sum slightly. Let be the same as , except the rows corresponding to the marginalized-out variables are removed. Then replace with .
A.4 Proof of Theorem 5.1
Let be i.i.d. realizations of . Define to be the state of the CTMC conditioned on , where . For simplicity, later we will denote as just .
Let , where the th element of is defined as . Let be the covariance matrix of . The are i.i.d., so if is finite, then the usual multivariate central limit theorem implies that
[TABLE]
Let denote the element if corresponding to . Then by definition, for all
[TABLE]
Therefore
[TABLE]
The dot product is continuous, so the continuous mapping theorem implies that
[TABLE]
[15, Theorem 2.1] implies that the right side has the same distribution as . Let be the element of on the diagonal corresponding to state . Then by definition
[TABLE]
, and the covariance simplifies when we rewrite it in terms of expectations. We get
[TABLE]
Let and be distinct states, and let be the element whose row and column correspond to the states and , respectively. By definition
[TABLE]
Rearrange the terms in the sum to get
[TABLE]
which is equivalent to
[TABLE]
Since , . Also, the second expectation can be rewritten as a probability. The above expression simplifies to
[TABLE]
Equation Eq. 19 simply expresses the above results with matrix-vector notation.
If we are estimating a marginal distribution, then take to be the lower dimensional space corresponding to the marginal variables. Also interpret as the state vector containing only the marginal variables.
A.5 Proof of Theorem 5.2
If we write out the definition of and use the fact that the trace is linear, we can see that
[TABLE]
We use the cyclic property of the trace to rewrite the right side as
[TABLE]
Expanding the summands leads to
[TABLE]
From the definition of , the above expression is equal to
[TABLE]
By definition, , therefore
[TABLE]
Next consider . We will proceed in a similar way. By definition
[TABLE]
The trace is linear, so
[TABLE]
The last line follows from the cyclic property of the trace. When we expand the summands, the right side becomes
[TABLE]
As for the claim about almost sure convergence of the traces, note that . Since matrix multiplication and the trace are continuous, the continuous mapping theorem implies the result.
A.6 Proof of Corollary 5.3
Define
[TABLE]
Since , the continuous mapping theorem and Lemma A.3 taken together imply that almost surely as . Also Theorem 5.1 says that
[TABLE]
Therefore by Slutsky’s theorem
[TABLE]
which we can rewrite as
[TABLE]
Applying the Satterthwaite approximation [52] to the right-hand side gives
[TABLE]
The result still holds for marginal distributions. We just need to remove the coordinates of corresponding to the variables that are marginalized out.
Lemma A.3**.**
*Let be a family of random variables parameterized by with strictly increasing cumulative distribution functions . Suppose that for each , the function is continuous. Assume also that is continuous in for each . Then the quantiles of are also continuous in for all . *
Proof A.4**.**
Let , and let be a sequence that converges to . Define and to be the quantiles corresponding the and , respectively. We want to show that converges to .
Let . Since , we know that is finite. Therefore, we can choose and such that
[TABLE]
We want to show that for all sufficiently large , so it will suffice to prove that for all large enough.
By assumption, is continuous in , so
[TABLE]
*The inequality is strict, because is a quantile and is strictly increasing and . Since is non–decreasing, for all sufficiently large . We can use essentially the same argument to conclude that for all large enough. *
Appendix B Expectations
The specific conditional Monte Carlo method introduced in this paper has been developed to estimate the entire distribution in a manner that is more efficient than regular Monte Carlo, as quantified by the mean integrated squared error Eq. 10 for a fixed computational budget. This does not imply that it will be more efficient in the computation of any specific expectation. In fact, in this Appendix we prove that it is necessarily less efficient in computing the first moment of a linear birth model. Specifically, we prove that for a fixed computational budget the variance of the estimator generated via the conditional Monte Carlo method is greater than or equal to the variance of the standard Monte Carlo estimator. This demonstrates that caution is required when implementing a method in a context it was not intended for.
Recall the Birth Model, which consists of the single reaction where we have chosen a rate parameter of 1. Assuming a fixed initial condition of , it is straightforward to show that
[TABLE]
For a fixed number of paths , and a point mass , the standard Monte Carlo estimator has an expected cost–quantified by the number of random variables utilized–of
[TABLE]
and a variance of .
For a fixed number of paths and , and a fixed parameter , the expected cost of the conditional Monte Carlo estimator is
[TABLE]
The variance of the conditional Monte Carlo estimator is
[TABLE]
Using the generic result that for random variables and on the same probability space , we have
[TABLE]
Thus, dividing by as in Eq. 26, the variance of the conditional Monte Carlo estimator is
[TABLE]
For a fixed , setting yields
[TABLE]
or
[TABLE]
Thus, for a fixed and chosen above the variance of the conditional Monte Carlo estimator is
[TABLE]
This is minimized at the boundary with , giving exactly the same variance as the regular Monte Carlo estimator. Thus, to summarize, for a given fixed computational cost the variance of the conditional Monte Carlo estimator must be larger than the variance of the standard Monte Carlo estimator.
Acknowledgments
We are grateful for financial support from the Army Research Office through grant W911NF-18-1-0324 and the National Science Foundation through grant DMS-2051498.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] D. F. Anderson , A modified next reaction method for simulating chemical systems with time dependent propensities and delays , The Journal of chemical physics, 127 (2007), p. 214107.
- 2[2] D. F. Anderson , Incorporating postleap checks in tau-leaping , The Journal of chemical physics, 128 (2008), p. 054103.
- 3[3] D. F. Anderson, D. Cappelletti, M. Koyama, and T. G. Kurtz , Non-explosivity of stochastically modeled reaction networks that are complex balanced , Bull. Math. Biol., 80 (2018), pp. 2561–2579.
- 4[4] D. F. Anderson, G. Craciun, and T. G. Kurtz , Product-form stationary distributions for deficiency zero chemical reaction networks , Bulletin of mathematical biology, 72 (2010), pp. 1947–1970.
- 5[5] D. F. Anderson, A. Ganguly, and T. G. Kurtz , Error analysis of tau-leap simulation methods , Annals of Applied Probability, 21 (2011), pp. 2226 – 2262.
- 6[6] D. F. Anderson and D. J. Higham , Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics , SIAM: Multiscale Modeling and Simulation, 10 (2012), pp. 146 – 179.
- 7[7] D. F. Anderson, D. J. Higham, and Y. Sun , Complexity of multilevel Monte Carlo tau-leaping , SIAM J. Numer. Anal., 52 (2014), pp. 3106–3127.
- 8[8] D. F. Anderson, D. J. Higham, and Y. Sun , Computational complexity analysis for Monte Carlo approximations of classically scaled population processes , SIAM Multiscale Model. Simul., 16 (2018), pp. 1206–1226.
