A Convex Approach to Steady State Moment Analysis for Stochastic Chemical Reactions
Yuta Sakurai, Yutaka Hori

TL;DR
This paper introduces a convex optimization method to accurately compute steady state moments of molecular counts in stochastic chemical reactions, aiding in biomolecular circuit design.
Contribution
It presents a novel convex semi-algebraic set framework for bounding moments, enabling precise predictions with semidefinite programming.
Findings
Accurately predicts mean and variance of protein copy numbers.
Provides a rigorous convex optimization approach for stochastic moment analysis.
Demonstrates effectiveness on a protein dimerization example.
Abstract
Model-based prediction of stochastic noise in biomolecular reactions often resorts to approximation with unknown precision. As a result, unexpected stochastic fluctuation causes a headache for the designers of biomolecular circuits. This paper proposes a convex optimization approach to quantifying the steady state moments of molecular copy counts with theoretical rigor. We show that the stochastic moments lie in a convex semi-algebraic set specified by linear matrix inequalities. Thus, the upper and the lower bounds of some moments can be computed by a semidefinite program. Using a protein dimerization process as an example, we demonstrate that the proposed method can precisely predict the mean and the variance of the copy number of the monomer protein.
| Reaction | Transition rate |
|---|---|
| Parameter | Value | Meaning |
|---|---|---|
| | Transcription and translation rate | |
| Protein (monomer) degradation | ||
| Dimerization rate | ||
| Total copy of DNA |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A Convex Approach to Steady State Moment Analysis for Stochastic
Chemical Reactions
Yuta Sakurai and Yutaka Hori This work was supported in part by JSPS KAKENHI Grant Number JP16H07175, Okawa Foundation Research Grant under grant number 16-10, Keio Gijuku Academic Development Funds and Research Grant of Keio Leading-edge Laboratory of Science and Technology. ©2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Y. Sakurai and Y. Hori are with Department of Applied Physics and Physico-Informatics, Keio University, Japan. [email protected], [email protected]
Abstract
Model-based prediction of stochastic noise in biomolecular reactions often resorts to approximation with unknown precision. As a result, unexpected stochastic fluctuation causes a headache for the designers of biomolecular circuits. This paper proposes a convex optimization approach to quantifying the steady state moments of molecular copy counts with theoretical rigor. We show that the stochastic moments lie in a convex semi-algebraic set specified by linear matrix inequalities. Thus, the upper and the lower bounds of some moments can be computed by a semidefinite program. Using a protein dimerization process as an example, we demonstrate that the proposed method can precisely predict the mean and the variance of the copy number of the monomer protein.
1 Introduction
A grand challenge of synthetic biology is to build and control layers of artificial biomolecular reaction networks that enable complex tasks by utilizing naturally existing reaction resources in micro-scale organisms such as E. coli. In control engineering community, many theoretical tools were developed over the last decade to enable model-guided design of biomolecular circuits including a fold-change detector [1], oscillators [2, 3, 4, 5], an event detector [6], and a light-controlled feedback circuit for setpoint control [7, 8]. Other successfully implemented examples include logic gates [9], a genetic memory [10], and a communication system between cells [11], to name a few. Given the elementary modules of artificial bimolecular networks, the next step stone is to assemble the circuit parts to build multiple layers of biomolecular networks that can provide practical functions.
As is the case with mechanical and electrical engineering, guaranteeing the performance of individual circuit modules at high precision is a key to successfully building large-scale and robust biomolecular circuits. Biosystems engineering, however, has a unique challenge, in contrast with other fields of engineering, that the signal-to-noise ratio is so low that the heterogeneity of circuit states between biological cells is almost impossible to avoid. A major source of the heterogeneity is the stochastic chemical reactions that are caused by the low copy nature of reactive molecules in a small-volume reactor, that is, a biological cell. In other words, the events of molecular collision that fire reactions are more accurately captured by stochastic process rather than continuous and deterministic process governed by ordinary differential equations (ODEs). This situation raises a strong need for the theoretical tools that can rigorously certify the performance of stochastic chemical reactions using statistical norms such as mean and variance of molecular copy numbers.
The random fluctuation of the copy number of molecules can be regarded as a Markov process that follows the chemical master equation (CME) [12]. Despite a linear ODE, the CME is often hard to solve analytically because of the infinite dimensionality of the equation. A typical solution to this problem is to use Monte Carlo simulations [13]. This approach, however, requires high computational time. Moreover, a strict error bound is hard to obtain. To resolve these issues, the finite state project [14] provides a systematic way to truncate the equation with an error bound. Although useful for analyzing transient dynamics, the FSP still suffers from high dimensionality if one wants to evaluate the probability distribution near the steady state, which is often the operation point of interest.
A more direct approach to evaluating the statistics of biomolecular circuits is to use the moment equation, an infinite dimensional ODE derived from the CME. Recently developed moment closure approaches [15, 16] allow for approximately solving the ODE by order reduction technique. The central idea is to approximate the high order moments in the equation using the lower order moments to derive a finite order closed ODE. Using this method, we can efficiently and directly compute the statistics of the molecular copy numbers of biomolecular circuits. Although appealing, this approach requires assuming the underlying probability distribution, which is not apriori in practice. As a result, biocircuit designers have to engineer biomolecular circuits based on approximated statistics with unknown precision.
The objective of this paper is to propose an algebraic approach to rigorously quantifying the steady state statistics of stochastic biomolecular reactions without assuming underlying probability distributions. Specifically, we formulate a semidefinite program that computes the upper and the lower bounds of statistical values of molecular copy numbers using the moment approach [Lasserre2009]. During the review process, the authors were informed that the same approach has been explored independently by two other research groups [Ghusinga2016, Kuntz2017]. In comparison with these works, this paper presents an optimization algorithm that directly computes the upper bound of the second order central moment, or variance, without running multiple optimizations for computing raw moments, allowing for direct and tighter quantification of the upper bound. We illustrate the proposed method using a stochastic protein dimerization process.
The organization of this paper is as follows. In Section II, we introduce the CME and derive the moment equation. Then, the optimization problem for moment computation is formulated in Section III. In section IV, we analyze the mean and the variance of a stochastic dimerization process. Finally, Section V summarizes our findings and concludes this paper.
The following notations are used in this paper. . . . is the degree of the polynomial . For multivariate polynomials , .
2 Moment Dynamics of Stochastic Chemical Reactions
2.1 Model of stochastic chemical reactions
We consider a chemical reaction network consisting of species of molecules and denote the copy number of each molecular species by . There are types of reactions in the reaction network, and let be the increment of the molecular copy by the -th chemical reaction ().
Chemical reactions are caused by collisions of molecules. When the reaction volume is sufficiently large, the copy number of molecules is so large that the collision event occurs almost continuously in time. On the other hand, in small reactor systems such as microbes, the collision events tend to be stochastic as the copy number of molecules is small. This results in the stochastic fluctuation of . It is known that stochastic chemical reactions can be modeled by a Markov process. More specifically, we define as the probability that there are molecules at time . The stochastic chemical reactions can be modeled by the following Chemical Master Equation (CME) [12],
[TABLE]
where is the transition rate associated with the -th chemical reaction and is defined by
[TABLE]
The transition rate is characterized by the rate of the reaction We assume that all of the reactions are elementary as non-elementary reactions can always be decomposed into elementary reactions. For elementary reactions, the transition rate is a polynomial of ’s, and we utilize this fact in the following subsection to derive the equations of stochastic moments.
As is the vector of non-negative positive integers , the CME (1) is a linear but an infinite dimensional ODE with respect to . An analytic solution to the CME is, thus, hardly obtained except for some simple examples. In the next subsection, we derive the equations of moment dynamics to directly compute the statistical values without computing the distribution .
2.2 The dynamics of moments
In this section, we derive the model of moment dynamics based on the CME (1). First, we define a stochastic moment by
[TABLE]
where . We refer to the sum of the entries of as the order of the moment and denote by with a little abuse of notation.
The stochastic moments carry all the information of the probability distribution, and some of the low order moments are useful to define the design specification of stochastic biomolecular circuits. Figure 1 illustrates an example of the distribution of a molecular copy number of some biomolecular reaction. In this case, the analytic form of the distribution is not known, and thus it is more appropriate to specify the design goal by statistical norms such mean and the standard deviation of the distribution using the moments.
To derive the dynamics of stochastic moments, we multiply the model (1) by and calculate the expected value by taking sum over to obtain the following differential equation.
[TABLE]
where we use to abbreviate the sum over the positive orthant.
[TABLE]
As is a polynomial of , we can rewrite the equation (4) as
[TABLE]
where are the coefficients of the monomials of the polynomial
[TABLE]
Note that the range of the summation in (6) depends on the degree of the polynomial (7). In many cases, the higher order moments are necessary to characterize the low order moments, i.e., there are moments satisfying in the right-hand side of (6). More specifically, when the reactions are elementary, it is reasonable to assume that the polynomial order of the transition rates are at most two since most practical reactions are unimolecular or bimolecular reactions [17]. When or , in the right-hand side of equation (6) satisfies
[TABLE]
implying that the moment equation is closed, i.e, the dynamics of the -th order moment can be written by the -th or lower order moments. On the other hand, when one or more ’s are quadratic, i.e., ,
[TABLE]
Thus, the moment equation (6) becomes an infinite dimensional coupled linear ODE, which is hard to solve analytically.
3 Moment Analysis of Stochastic Chemical Reactions
3.1 Truncation of the moment equation
When we design stochastic chemical reactions using a genetic circuit, the specification of the circuit is often given by a set of statistical constraints such as the mean and the variance of the copy numbers of the molecules. In this paper, we consider the following problem.
Problem 1. For a given CME (1), find the lower and the upper bounds of the mean and the variance of molecular copy number at steady state.
To solve this problem, let in the equation (6), and consider the subset of linear equations by limiting the equations up to the -th order moments. Specifically,
[TABLE]
where the set is defined by
[TABLE]
In what follows, we refer to as the truncation order. When or , the number of equations is the same as the number of variables (see (8)), and thus we can compute the unique solution (if the equations are not degenerated). In the more general case where the transition rates are quadratic, the right-hand side of (10) depends on the higher order moments, i.e., , as shown in (9) Consequently, the equations become underdetermined, and we can only conclude that the solution lies on a certain hyperplane unless we consider an infinite number of equations.
To further specify the solution space of the moments , we utilize the fact that must be the moments of some (probability) measure defined on the positive orthant. In particular, we use a so-called moment condition to constrain and formulate a semidefinite program solving Problem 1.
3.2 Necessary condition for to be moments
Let be the vector that consists of all of the monomial bases satisfying , and be a matrix of the form
[TABLE]
where represents the entry-wise expected value of the matrix . In a similar manner, we define by
[TABLE]
It should be noted that the entries of the matrices and can be written with satisfying
[TABLE]
Using the matrices and , we define the following block Hankel matrices and .
[TABLE]
[TABLE]
where and are determined as follows.
[TABLE]
[TABLE]
Using and , the following proposition provides linear matrix inequality (LMI) conditions that the moments of some (probability) measure on must satisfy.
Proposition 1. [18] Consider a sequence with the set defined by (11). The sequence constitutes moments of some measure defined on the positive orthant only if
[TABLE]
The conditions (19) and (20) are called a moment condition and localized moment conditions, respectively. Combining Proposition 1 with (10), the following theorem specifies semi-algebraic conditions that the moments of the stochastic chemical reactions must satisfy.
Theorem 1. Consider the stochastic chemical reaction modeled by the equation (1). Let denote the steady state moments of the random variable . For a given truncation order , the moments satisfy the following conditions.
[TABLE]
where are the coefficients of the polynomial (7).
Theorem 1 implies that the moments of the stochastic chemical reaction governed by (1) lie in the semi-algebraic set (21). Thus, Problem 1 can be recast as a relaxation problem over the convex semi-algebraic set. In particular, we show, in the following section, that the bounds of the mean and the variance can be computed by semidefinite programming.
Remark 1. In the special case of random variable, the conditions (19) and (20) with become both necessary and sufficient for the sequence to be the moments of some measure on [18]. The associated problem is called Stieltjes moment problem named after the renowned mathematician Thomas Stieltjes. Similar conditions are available for the moments of measures defined on various domains such as , and semi-algebraic sets (see [18, 19] for example).
3.3 Semidefinite programming for mean and variance computation
Theorem 1 asserts that the lower and the upper bounds of the steady state statistics, say , can be calculated by solving the following optimization problems, respectively.
[TABLE]
[TABLE]
Note that the constrains are convex. In what follows, we show that the bounds of mean and variance, the two most important statistics of stochastic chemical reactions, can be formulated as semidefinite programs.
Mean The objective function is obviously linear, and thus the problem falls into the class of semidefinite programming. For example, if one wants to compute the mean of the copy number of the -th molecule, the objective function should be
[TABLE]
with representing the row vector whose -th entry is 1 and 0 elsewhere.
In general, the gap between the upper and the lower bounds decreases monotonically with the increase of the truncation order . Thus, there is a tradeoff between computational cost and the conservativeness of the bounds.
Upper bound of variance By definition, the variance of the copy number of the -th molecule is given by
[TABLE]
The second term of the objective function is quadratic, and the optimization problem (23) is seemingly not semidefinite programming. However, using the Schur complement, we can convert the optimization (23) into an equivalent semidefinite programming as follows.
Theorem 2. The solution of the optimization problem (23) with the objective function (25) is the same as that of the following optimization problem.
[TABLE]
The proof of this theorem can be found in Appendix A.
Lower bound of variance The lower bound of variance can be also computed by semidefinite programming with additional relaxation. Specifically, we first compute the upper bound of the mean value , and then substitute the result into of the equation (25) to obtain a linear objective function. The estimated lower bound of variance monotonically increases with the truncation order since the upper bound of the mean value monotonically decreases.
As the optimization problems shown in this subsection use convex relaxation, there may be the case where the truncation order needs to be unrealistically large to reduce the conservativeness of the bounds to a practically useful level. To the authors’ experience, however, one will only need to use to 12-th order moments to obtain a practically useful bounds for small reaction networks with to 5 molecules. The general quantitative analysis, however, is still an open question.
Remark 2. The moment closure approaches [15, 16] allow for approximating the high order moments, with , using the low order moments, with , to solve the equation (10). This approach, however, is based on assumptions on the underlying distribution . Thus, the quantification of the approximation error is not easy. On the other hand, the proposed semidefinite programs are advantageous in that they can compute the upper and the lower bounds of the statistics of stochastic chemical reactions with theoretical rigor without assuming an underlying distribution.
4 Application to a Stochastic Genetic Circuit
To illustrate the proposed semidefinite programming approach, we analyze a protein dimerization process, which is one of the simplest examples of stochastic biomolecular reactions. Consider the chemical reactions shown in Fig. 2, where there are three species of molecules, DNA, monomer protein and dimer protein, and reactions. In Fig. 2, the protein monomer is first produced from DNA, and then it is either degraded or dimerized. As a result, the copy number of protein monomer reaches a steady state at .
In what follows, our goal is to analyze the steady state mean and variance of the copy number of the protein monomer. Let the copy number of monomer protein and its -th order moment be defined by and
[TABLE]
respectively. It should be noticed that by definition. Let denote the total copy number of DNA molecules and assume that is a constant. The reaction rates of monomer production, degradation and dimerization are then defined by the polynomials in Table 1. The stoichiometry of the reactions, or the increment of the copy number of monomer protein by each reaction, is given by
[TABLE]
Suppose the values of the parameters are given by Table 2. First, we derive the moment equation (10) by expanding the polynomial (7). Specifically, let the truncation order be . Then, we have
[TABLE]
It should be noted that the low order moments and depend on the higher order moment since the transition rate is quadratic in (see Table 1). The equation (30) implies that the moments of the stochastic reactions are in the nullspace of the matrix. To further constrain the feasibility set, we use the moment condition and the localized moment condition, which are given by
[TABLE]
respectively. In practice, the truncation order needs to be determined so that the gap between the lower and the upper bounds is sufficiently small. This might require computing the optimization problems (22) and (23) multiple times.
To analyze the mean copy number of protein monomer, let the objective function be and solve the optimization problems (22) and (23). Figure 3 (A) illustrates the computed lower and the upper bounds of the mean monomer protein copy number for each truncation order . The gap between the bounds monotonically decreases with increasing , and with , we can conclude
[TABLE]
which is a sufficient resolution in practice.
Next, to calculate the upper bound of the variance, let , and solve the optimization problem (26). The lower bound is also computed using the relaxation described in Section III. The optimization result in Fig. 3 (B) shows that the variance is bounded by
[TABLE]
It should be noticed that these results are mathematically strict and thus enable rigorous quantification of biocircuit performance using the statistical norms.
Remark 3. The optimization problem was solved with SeDuMi 1.32 on MATLAB 2016b. To avoid numerical issues, we normalized the variables by constants in the above numerical example. It should be noted that optimization problems associated with the moment matrices tend to be numerically unstable as reported in [Kuntz2016].
5 Conclusion
In this paper, we have formulated the optimization problem to compute the bounds of the steady state statistics of stochastic chemical reactions in genetic circuits. First, we have introduced the steady state moment equation, which is a set of underdetermined linear equations. To restrict the solution space of the moment equation, our key idea is to use the LMI conditions, which the moments of some measure must satisfy. Consequently, we have obtained a convex relaxation problem that can be solved by semidefinite programming. A distinctive feature of the proposed approach is that it can provide mathematically rigorous upper and lower bounds of the statistics for any stochastic chemical reactions modeled by the CME. To demonstrate the method, we have analyzed the protein dimerization process and have obtained tight bounds of the mean and the variance.
Although we have only considered the steady state moments, we can extend the proposed approach to the analysis of transient moments. The authors are currently working to build a stochastic biocircuit design tool based on the specifications of transient statistics.
Appendix A Proof of Theorem 2.
Proof The optimization problem (23) is equivalent to the following problem.
[TABLE]
As the objective function is , the inequality in the constraint can be equivalently written by
[TABLE]
using Schur compliment [20]. The optimal value of (23) is thus equal to that of (26).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] S. Guo, Y. Hori, and R. M. Murray, “Systematic design and implementation of a novel synthetic fold-change detector biocircuit in vivo,” tech. rep., California Institute of Technology, 2014.
- 2[2] M. B. Elowitz and S. Leibler, “A synthetic oscillatory network of transcriptional regulators,” Nature , vol. 403, no. 6767, pp. 335–338, 2000.
- 3[3] Y. Hori, T.-H. Kim, and S. Hara, “Existence criteria of periodic oscillations in cyclic gene regulatory networks,” Automatica , vol. 47, no. 5, pp. 1203–1209, 2011.
- 4[4] Y. Hori, M. Takada, and S. Hara, “Biochemical oscillations in delayed negative cyclic feedback: existence and profiles,” Automatica , vol. 49, no. 9, pp. 2581–2590, 2013.
- 5[5] H. Niederholtmeyer, Z. Sun, Y. Hori, E. Yeung, A. Verpoorte, R. M. Murray, and S. J. Maerkl, “Rapid cell-free forward engineering of novel genetic ring oscillators,” e Life , vol. 4, p. e 09771, 2015.
- 6[6] V. Hsiao, Y. Hori, P. W. K. Rothemund, and R. M. Murray, “A population-based temporal logic gate for timing and recording chemical events,” Molecular Systems Biology , vol. 12, no. 869, 2016.
- 7[7] A. Milias-Argeitis, S. Summers, J. Stewart-Ornstein, I. Zuleta, D. Pincus, H. El-Samad, M. Khammash, and J. Lygeros, “In silico feedback for in vivo regulation of a gene expression circuit,” Nature Biotechnology , vol. 29, no. 12, pp. 1114–1116, 2011.
- 8[8] A. Milias-Argeitis, M. Rullan, S. K. Aoki, P. Buchmann, and M. Khammash, “Automated optogenetic feedback control for precise and robust regulation of gene expression and cell growth,” Nature Communications , vol. 7, no. 12, pp. 1114–1116, 2011.
