Parallel replica dynamics method for bistable stochastic reaction networks: simulation and sensitivity analysis

Ting Wang; Petr Plecháč

arXiv:1705.06807·math.NA·January 17, 2018

TL;DR

This paper introduces a parallel replica method to efficiently sample the stationary distribution of bistable stochastic reaction networks, enabling better understanding of their long-term behavior and sensitivity analysis.

Contribution

The paper presents a novel application of the parallel replica method to stochastic reaction networks, improving sampling efficiency and integrating sensitivity analysis.

Findings

01. Efficient sampling of rare transitions in bistable networks.

02. Accurate sensitivity analysis using combined ParRep and path space bounds.

03. Method validated on the Schlögl model and the genetic switch network.

Abstract

Stochastic reaction networks that exhibit bistability are common in many fields such as systems biology and materials science. Sampling of the stationary distribution is crucial for understanding and characterizing the long term dynamics of bistable stochastic dynamical systems. However, this is normally hindered by the insufficient sampling of the rare transitions between the two metastable regions. In this paper, we apply the parallel replica (ParRep) method for continuous time Markov chain to accelerate the stationary distribution sampling of bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions and it is very easy to implement. We combine ParRep with the path space information bounds for parametric sensitivity analysis. We demonstrate the efficiency and accuracy of the method by studying the Schlögl…

Tables

Table 1: Bistable Schlögl model.

Reaction	Propensity Function	Stoich. Vec.
$A + 2 S \to 3 S$	$λ_{1}^{V} (x, c) = c_{1} a x (x - 1) / V$	$η_{1} = 1$
$3 S \to A + 2 S$	$λ_{2}^{V} (x, c) = c_{2} x (x - 1) (x - 2) / V^{2}$	$η_{2} = - 1$
$B \to S$	$λ_{3}^{V} (x, c) = c_{3} b V$	$η_{3} = 1$
$S \to B$	$λ_{4}^{V} (x, c) = c_{4} x$	$η_{4} = - 1$

Table 2: Estimated path space FIM for Schlögl model

Matrix Element	Estimated pFIM	Half width C.I.
$(1, 1)$	8.75E+01	3.02E-01
$(2, 2)$	1.67E+03	5.66E+00
$(3, 3)$	2.00E+02	2.59E-06
$(4, 4)$	2.46E+01	7.88E-02

Table 3: Estimated sensitivity bounds and approximated sensitivities for Schlögl model

Parameter	$c_{1}$	$c_{2}$	$c_{3}$	$c_{4}$
Bounds	7.16E+03	3.09E+04	1.08E+04	3.80E+03
CME Approx.	4.07E+02	9.10E+02	6.30E+02	-2.65E+02

Table 4: Genetic switching system

Reaction	Propensity Function	Stoich. Vec.
${DNA}_{act} \to {DNA}_{act} + mRNA$	$λ_{1}^{V} (ξ, x_{1}, x_{2}) = a ξ$	$η_{1} = (1, 0)$
$mRNA \to \emptyset$	$λ_{2}^{V} (ξ, x_{1}, x_{2}) = γ x_{1}$	$η_{2} = (- 1, 0)$
$mRNA \to mRNA + Protein$	$λ_{3}^{V} (ξ, x_{1}, x_{2}) = γ b x_{1}$	$η_{3} = (0, 1)$
$Protein \to \emptyset$	$λ_{4}^{V} (ξ, x_{1}, x_{2}) = x_{2}$	$η_{4} = (0, - 1)$

Table 5: Path space FIM of genetic switch network.

Matrix Element	Estimated pFIM	Half width C.I.
$(1, 1)$	3.34E-03	2.35E-05
$(2, 2)$	1.69E+00	1.19E-02
$(3, 3)$	3.57E-01	2.52E-03
$(4, 4)$	4.34E-01	2.83E-03
$(5, 5)$	8.32E-04	5.88E-06
$(6, 6)$	8.33E-03	6.12E-05
$(7, 7)$	8.44E-04	5.23E-06
$(8, 8)$	2.22E-05	1.56E-07

Table 6: IAFs of genetic switch network.

active DNA	inactive DNA	mRNA	Protein
1.64E+02	1.64E+02	7.47E+02	9.45E+08

Equations

$$Q_{x,y}^{V} = \begin{cases} \lambda_{j}^{V}(x,c), & y = x+\eta_{j} \text{ for some } j=1,\dots,m; \\ 0, & \text{otherwise.} \end{cases}$$

$$X^{V}(t) = X^{V}(0) + \sum_{j=1}^{m} \mathcal{P}_{j}\left(\int_{0}^{t} \lambda_{j}^{V}(X^{V}(s))\,ds\right)\eta_{j},$$

$$\frac{d p^{V}(x,t)}{dt} = \sum_{j=1}^{m} \lambda_{j}^{V}(x-\eta_{j})\,p^{V}(x-\eta_{j},t) - \lambda_{j}^{V}(x)\,p^{V}(x,t),$$

$$\frac{d\bar{x}}{dt} = \sum_{j=1}^{m} \eta_{j}\lambda_{j}(\bar{x})$$

$$\lim_{V\to\infty}\ \sup_{0\leq s\leq t} |X_{V}(s)-\bar{x}(s)| = 0$$

$$X_{V}(t) = X_{V}(0) + \frac{1}{V}\sum_{j=1}^{m} \mathcal{P}_{j}\left(\int_{0}^{t} V\lambda_{j}(X_{V}(s))\,ds\right)\eta_{j}.$$

$$\lim_{t\to\infty} \frac{1}{t}\int_{0}^{t} f(X^{V}(s))\,ds = \pi^{c}(f)$$

$$\nabla\pi_{V}^{c}(f) = \left(\frac{\partial\pi_{V}^{c}(f)}{\partial c_{1}},\dots,\frac{\partial\pi_{V}^{c}(f)}{\partial c_{l}}\right)^{tr}$$

$$\mathbb{P}\left(\sup_{0\leq t\leq T} |X_{V}(t)-x(t)| \geq \delta\right) \approx e^{-V I_{0}^{T}(x)}$$

$$\nu(A) = \mathbb{P}^{\nu}(X_{n}\in A \mid N>n)$$

$$N^{*} = \min_{r} N^{r}.$$

$$K = \min\{r=1,\dots,R;\ X_{N^{*}}^{r}\notin W\}.$$

$$\nu_{n_{c}}(A) = \mathbb{P}(X_{n_{c}}\in A \mid N>n_{c})$$

$$S_{f,v}(P^{c}) = \lim_{\epsilon\to 0} \frac{1}{\epsilon}\left\{\mathbb{E}_{P^{c+\epsilon v}}[f] - \mathbb{E}_{P^{c}}[f]\right\}.$$

$$S_{f,v}(P_{[0,T]}^{c}) \leq \sqrt{\mathrm{Var}_{P_{[0,T]}^{c}}(f)}\,\sqrt{v^{tr}\,\mathcal{I}(P_{[0,T]}^{c})\,v},$$

$$S_{f,v}(\pi^{c}) \leq \sqrt{\tau_{\pi^{c}}(f)}\,\sqrt{v^{tr}\,\mathcal{I}_{\mathcal{H}}(P^{c})\,v},$$

$$\tau_{\pi^{c}}(f) = \lim_{T\to\infty} \frac{1}{T}\,\mathrm{Var}_{P_{[0,T]}^{c}}\left(\int_{0}^{T} f(X(s))\,ds\right).$$

$$\frac{1}{T(N-1)}\sum_{k=1}^{N} (Y^{(k)}-\bar{Y})^{2},$$

$$\mathcal{I}_{\mathcal{H}}(P^{c}) = \mathbb{E}_{\pi^{c}}\left\{\sum_{j=1}^{m} \lambda_{j}(x,c)\,\nabla\lambda_{j}(x,c)\,\nabla\lambda_{j}(x,c)^{tr}\right\}.$$

$$\mathcal{I}_{\mathcal{H}}(P^{c}) = \lim_{T\to\infty} \frac{1}{T}\int_{0}^{T} \sum_{j=1}^{m} \lambda_{j}(X(s),c)\,\nabla\lambda_{j}(X(s),c)\,\nabla\lambda_{j}(X(s),c)^{tr}\,ds.$$

$$\frac{1}{N}\sum_{k=1}^{N} Z^{(k)},$$

$$2S \underset{c_{2}}{\overset{c_{1}}{\rightleftharpoons}} 3S, \qquad \emptyset \underset{c_{4}}{\overset{c_{3}}{\rightleftharpoons}} S$$

$$\frac{d\bar{x}}{dt} = c_{1}a\bar{x}^{2} - c_{2}\bar{x}^{3} + c_{3}b - c_{4}\bar{x}.$$

$$W_{+} = \{x\in E : x\leq \bar{X}_{0}^{V}\}, \qquad W_{-} = \{x\in E : x>\bar{X}_{0}^{V}\},$$

$$F(x_{2}) = k_{0}^{\min} + (k_{0}^{\max}-k_{0}^{\min})\,x_{2}^{2}/(x_{2}^{2}+D^{2}), \qquad G(x_{2}) = k_{1}^{\max} - (k_{1}^{\max}-k_{1}^{\min})\,x_{2}^{2}/(x_{2}^{2}+D^{2}).$$

$$X^{V}(t) = X^{V}(0) + \sum_{j=1}^{4} \mathcal{P}_{j}\left(\int_{0}^{t} \lambda_{j}^{V}(\xi(s),X^{V}(s))\,ds\right)\eta_{j},$$

$$\frac{d\bar{X}_{1}}{dt} = \frac{a F(\bar{X}_{2})}{F(\bar{X}_{2})+G(\bar{X}_{2})} - \gamma\bar{X}_{1}, \qquad \frac{d\bar{X}_{2}}{dt} = \gamma b\bar{X}_{1} - \bar{X}_{2},$$

$$W_{+} = \{(x,y)\in E \mid y<511.2865\}$$

$$W_{-} = \{(x,y)\in E \mid y>511.2865\}.$$

$$\mathcal{R}(P_{[0,T]}^{c} \mid P_{[0,T]}^{c'}) = \int_{E} \log\left(\frac{dP_{[0,T]}^{c}(x)}{dP_{[0,T]}^{c'}(x)}\right) P_{[0,T]}^{c}(dx).$$

Full text

Ting Wang

Petr Plecháč

Department of Mathematical Sciences, University of Delaware, Delaware 19716 USA

Abstract

Stochastic reaction networks that exhibit bistable behavior are common in many fields such as systems biology and materials science. Sampling of the stationary distribution is crucial for understanding and characterizing the long term dynamics of bistable stochastic dynamical systems. However, this is normally hindered by the insufficient sampling of the rare transitions between the two metastable regions. In this paper, we apply the parallel replica (ParRep) method for continuous time Markov chains [ParRep-CTMC] to accelerate the stationary distribution sampling of bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions and is easy to implement. We combine ParRep with the path space information bounds [dupuis2016path] for parametric sensitivity analysis. We demonstrate the efficiency and accuracy of the method by studying the Schlögl model and the genetic switch network.

I Introduction

Stochastic reaction networks have become increasingly important as a tool for modeling complex biological and chemical systems with random noise [mcadams1999sa]. Simulation of real-world reaction networks using the stochastic simulation algorithm (SSA) [gillespie-SSA] can be computationally intractable due to the multiscale nature of the systems. For instance, reaction networks in biological cells often involve vastly different numbers of molecules for different species and vastly different rate constants for different reaction channels [kang2013separation]. Such a system is metastable in the sense that the SSA rarely samples the reactions involving small rate constants or low population species. Our paper addresses another type of metastability associated with reaction networks. We consider metastability that is caused by extremely rare transitions between two separate regions of the state space, i.e., bistable reaction networks. It has been discovered recently that many biological and physical systems exhibit bistability, and hence it is of great interest to understand the bistable phenomenon [mehta2008exponential; gardner2000construction; umulis2006robust; angeli2004detection].

We study two important aspects of bistable reaction networks, accelerated stationary distribution sampling and parametric uncertainty quantification [dupuis2016path; pantazis2013relative; arampatzis2015accelerated], using the parallel replica dynamics (ParRep) method [voter1998parallel; ParRep-SDE; ParRep-Chain; ParRep-CTMC]. SSA based sampling for bistable reaction networks can be extremely expensive because transitions between the two metastable regions are rarely sampled. As a remedy, ParRep uses multiple parallel replicas to explore the transition path between the two metastable regions with a controllable error. The method was originally designed for sampling rare events in molecular dynamics simulation such as Langevin dynamics [voter1998parallel]. The mathematical framework of ParRep was recently developed for discrete time Markov chains (DTMC) [ParRep-SDE; binder2015generalized; ParRep-Chain; aristoff2015parallel]. In this paper we apply the version of the ParRep algorithm that we developed for continuous time Markov chains (CTMC) [ParRep-CTMC] to accelerate the simulation of bistable reaction networks. Furthermore, the algorithm allows us to efficiently sample the stationary distribution starting from the transient regime. We also investigate the parametric sensitivity problem of bistable reaction networks, that is, we study how the outputs of a bistable system respond to perturbations in system parameters. This enables us to quantify the parametric uncertainty and system robustness. We point out that the proposed version of ParRep can be easily combined with the path space information bounds [dupuis2016path] to provide useful information and reductions for parametric sensitivity analysis in high dimensions.

I.1 Stochastic reaction network model

We consider a well-mixed chemical system with $n$ species interacting through $m$ reaction channels with system size $V$. Under the well-mixed assumption, the molecular population is modeled as an $n$-dimensional CTMC $X^{V}(t)$. The numbers of molecules of the $i$th species consumed and produced in the $j$th reaction are denoted by $\eta_{ij}^{-}$ and $\eta_{ij}^{+}$, respectively. We call the net change $\eta_{j}=\eta_{j}^{+}-\eta_{j}^{-}$ caused by the $j$th reaction the stoichiometric vector, which is independent of the system size $V$. Each reaction channel is associated with a propensity function $\lambda_{j}^{V}(x,c)$, $j=1,\ldots,m$, such that, given $X^{V}(t)=x$, the probability that the $j$th reaction occurs in the infinitesimal time interval $[t,t+\delta t)$ is $\lambda_{j}^{V}(x,c)\delta t$, where $c$ is the vector of rate constants in $\mathbb{R}^{l}$. In this paper, we suppress $c$ when we write the propensity functions unless we study the parametric sensitivity with respect to $c$. From the propensity functions, we can construct the transition rate matrix (or the infinitesimal generator) $Q^{V}$ of the Markov chain $X^{V}$ such that

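$$Q_{x,y}^{V} = \begin{cases} \lambda_{j}^{V}(x,c), & y = x+\eta_{j} \text{ for some } j=1,\dots,m; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$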

Moreover, it is well known that the time evolution of $X^{V}$ is characterized by the random time change representation [ethier2009markov]

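$$X^{V}(t) = X^{V}(0) + \sum_{j=1}^{m} \mathcal{P}_{j}\left(\int_{0}^{t} \lambda_{j}^{V}(X^{V}(s))\,ds\right)\eta_{j},$$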

where $\mathcal{P}_{j}$ are independent unit rate Poisson processes.

For a system of fixed size $V$, the probability distribution of the population process $X^{V}$ is completely governed by the chemical master equation (CME)

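$$\frac{d p^{V}(x,t)}{dt} = \sum_{j=1}^{m} \lambda_{j}^{V}(x-\eta_{j})\,p^{V}(x-\eta_{j},t) - \lambda_{j}^{V}(x)\,p^{V}(x,t),$$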

where $p^{V}(x,t)=\mathbb{P}(X^{V}(t)=x)$ . In principle, the CME enables the computation of the distribution of $X^{V}(t)$ for any $V$ . However, the CME is normally an infinite dimensional system which cannot be solved explicitly in general. Therefore, Monte Carlo methods such as Gillespie’s SSA are commonly used to obtain the numerical solution to the CME.
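To make the role of the SSA concrete, the following is a minimal sketch of Gillespie's direct method for a generic network described by propensity functions and stoichiometric vectors. The function name `ssa`, its arguments, and the birth-death example with its rate values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def ssa(x0, rates, stoich, t_end, rng):
    """Gillespie's direct method (sketch).

    rates(x) returns the vector of propensities at state x and stoich[j]
    is the stoichiometric vector of reaction j. Returns the jump times and
    the states entered at those times (the path is piecewise constant).
    """
    t, x = 0.0, np.array(x0, dtype=np.int64)
    times, states = [t], [x.copy()]
    while True:
        lam = rates(x)
        total = lam.sum()
        if total <= 0.0:                        # no reaction can fire
            break
        t += rng.exponential(1.0 / total)       # holding time ~ Exp(total)
        if t > t_end:
            break
        j = rng.choice(len(lam), p=lam / total)  # index of the next reaction
        x = x + stoich[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)


# Hypothetical birth-death example: 0 -> S at rate 5, S -> 0 at rate 1 * x.
rates = lambda x: np.array([5.0, 1.0 * x[0]])
stoich = np.array([[1], [-1]])
times, states = ssa([0], rates, stoich, 10.0, np.random.default_rng(0))
```

Each iteration draws one holding time and one reaction index, so the cost grows with the number of reaction events, which is what makes rare transitions expensive to observe.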

We denote by $X_{V}(t)=V^{-1}X^{V}(t)$ the corresponding concentration process for a system with size $V$. When $V$ is large, the randomness of the reaction network can be neglected and $X_{V}(t)$ can be approximated by the solution of the reaction rate equation (RRE) [kurtz1970solutions]

$$\frac{d\bar{x}}{dt} = \sum_{j=1}^{m} \eta_{j}\lambda_{j}(\bar{x})$$

in the sense that

$$\lim_{V\to\infty}\ \sup_{0\leq s\leq t} |X_{V}(s)-\bar{x}(s)| = 0$$

almost surely for any $t>0$, where we assume $\lambda_{j}(x)=V^{-1}\lambda_{j}^{V}(Vx)$ for all $j=1,\ldots,m$. Throughout this paper, we assume this form for the propensity functions, and hence the random time change representation for the concentration process is

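$$X_{V}(t) = X_{V}(0) + \frac{1}{V}\sum_{j=1}^{m} \mathcal{P}_{j}\left(\int_{0}^{t} V\lambda_{j}(X_{V}(s))\,ds\right)\eta_{j}.$$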

We focus on reaction networks that are modeled by an ergodic CTMC $X^{V}(t)$ such that the stationary distribution $\pi^{c}$ exists and the ergodic limit

$$\lim_{t\to\infty} \frac{1}{t}\int_{0}^{t} f(X^{V}(s))\,ds = \pi^{c}(f)$$

holds for suitable observables $f$ . Here the stationary distribution $\pi_{V}^{c}$ depends on $c$ since the process $X^{V}(t)$ depends on $c$ . The gradient of $\pi_{V}^{c}(f)$ with respect to the parameter $c$ , i.e., $\nabla\pi_{V}^{c}(f)$ , serves as an indicator for the system’s parametric uncertainty or robustness. We call the estimation of

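$$\nabla\pi_{V}^{c}(f) = \left(\frac{\partial\pi_{V}^{c}(f)}{\partial c_{1}},\dots,\frac{\partial\pi_{V}^{c}(f)}{\partial c_{l}}\right)^{tr}$$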

the stationary sensitivity analysis problem, where the superscript “tr” means transpose.
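The ergodic limit above suggests estimating the stationary average of $f$ by a time average along a single long trajectory. The sketch below computes this average for a piecewise-constant path; the helper name `time_average` and its argument layout are assumptions made for illustration.

```python
import numpy as np


def time_average(times, values, t_end):
    """Return (1 / t_end) * integral over [0, t_end] of a piecewise-constant path.

    times[k] is the time at which the path enters the state whose observable
    value is values[k]; times[0] must be 0 and times must be increasing.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    holding = np.diff(np.append(times, t_end))   # time spent in each state
    return float(np.dot(values, holding) / t_end)


# The path takes value 2 on [0, 1), 4 on [1, 3) and 6 on [3, 4].
avg = time_average([0.0, 1.0, 3.0], [2.0, 4.0, 6.0], t_end=4.0)
```

For a bistable network the trajectory must cross between the two metastable regions many times before this average stabilizes, which is the difficulty the rest of the paper addresses.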

I.2 Reaction networks with bistability

In this paper, we are mainly interested in accelerating simulation and sensitivity analysis for bistable reaction networks, i.e., reaction networks whose RRE has a pair of asymptotically stable fixed points $\bar{x}_{+}$ and $\bar{x}_{-}$ separated by a saddle point $\bar{x}_{0}$. We denote the neighborhood of $\bar{x}_{+}$ by $W_{+}$ and the neighborhood of $\bar{x}_{-}$ by $W_{-}$. If we neglect the randomness of the network, any initial point placed in $W_{+}$ (resp. $W_{-}$) eventually approaches $\bar{x}_{+}$ (resp. $\bar{x}_{-}$). However, due to the random noise (since $V$ is finite), the system is subject to rare, large fluctuations which drive the concentration process $X_{V}(t)$ far away from one stable fixed point and into the neighborhood of the other. The theoretical tool for studying this type of large fluctuation is the large deviation principle (LDP) [LDP-Dembo-Zeitouni; LDP-Shwartz-Weiss; LDP-Dupuis-Ellis]. The key ingredient in the LDP of $X_{V}$ is the rate function (or action) $I_{0}^{T}(x)$, which characterizes the exponentially small probability for $X_{V}$ remaining in a small neighborhood of a path $x$, i.e.,

$$\mathbb{P}\left(\sup_{0\leq t\leq T} |X_{V}(t)-x(t)| \geq \delta\right) \approx e^{-V I_{0}^{T}(x)}$$

for all small $\delta$ when $V$ is large. By minimizing the rate function over the path space one can find the so-called most probable path [dykman1994large]. In a bistable reaction network, $X_{V}$ sojourns in $W_{+}$ (resp. $W_{-}$) for a long time before it leaves $W_{+}$ (resp. $W_{-}$) along the most probable path, an event of exponentially small probability. In this sense, we call $W_{+}$ and $W_{-}$ metastable sets for $X_{V}$, since the sojourn times in both sets are exponentially long. Metastability normally leads to insufficient sampling of transition events between $W_{+}$ and $W_{-}$ and consequently makes it computationally prohibitive to sample the stationary distribution $\pi_{V}^{c}$ of $X^{V}$. In this work we aim to speed up the sampling of $\pi_{V}^{c}$ by accelerating the exit from metastable sets using parallel computing.

II Methodology

II.1 Parallel replica dynamics

The idea of ParRep was first introduced for simulating rare events [voter1998parallel] and was recently formalized in several papers [ParRep-SDE; ParRep-Chain; ParRep-CTMC]. Our goal in this section is to introduce the ParRep method [ParRep-CTMC] to accelerate the simulation of bistable stochastic reaction networks and estimate the stationary distribution. Since we consider a fixed volume $V$ in this section, we suppress the superscript $V$ to simplify the notation.

The theoretical justification for ParRep relies on the notion of the quasi-stationary distribution (QSD). Given a set $W$ and a DTMC $X_{n}$ , a distribution $\nu$ is called the quasi-stationary distribution of $X_{n}$ in $W$ if

$$\nu(A) = \mathbb{P}^{\nu}(X_{n}\in A \mid N>n)$$

for all $n=1,2,\ldots$ and any measurable set $A\subset W$, where $N$ is the first exit time of $X_{n}$ from $W$. The definition roughly says that the QSD is a distribution supported on $W$ such that, if the initial distribution is $\nu$, then the DTMC $X_{n}$ remains distributed according to $\nu$ until it exits $W$. The existence and uniqueness of the QSD in this setting can be shown rigorously [ParRep-CTMC]. The consequence of assuming that $X_{n}$ starts at the QSD $\nu$ in $W$ is that the first exit time $N$ follows a geometric distribution with some parameter $1-\lambda$, i.e., $\mathbb{P}^{\nu}(N>n)=\lambda^{n}$ for all $n=1,2,\ldots$. Moreover, the first exit time $N$ and the exit state $X_{N}$ are independent [QSD].
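For a finite state space, the QSD can be computed directly, which gives a quick way to check the geometric exit-time property. The sketch below assumes the standard characterization of the QSD as the normalized left Perron eigenvector of the transition matrix restricted to $W$; the function name `qsd` and the two-state matrix `P_W` are hypothetical.

```python
import numpy as np


def qsd(P_W):
    """QSD of a finite DTMC in W (sketch).

    P_W is the transition matrix restricted to W (rows sum to less than one
    because of exits). Returns (nu, lam) with nu @ P_W = lam * nu.
    """
    vals, vecs = np.linalg.eig(P_W.T)        # left eigenvectors of P_W
    k = np.argmax(vals.real)                 # Perron eigenvalue
    nu = np.abs(vecs[:, k].real)
    return nu / nu.sum(), float(vals[k].real)


# Hypothetical chain: W = {0, 1}; the missing row mass is the exit probability.
P_W = np.array([[0.90, 0.09],
                [0.10, 0.85]])
nu, lam = qsd(P_W)

# Survival probability after n steps when started from nu; equals lam ** n.
survival = [(nu @ np.linalg.matrix_power(P_W, n)).sum() for n in range(1, 6)]
```

The list `survival` reproduces $\lambda^{n}$ up to rounding, which is the geometric law stated above.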

Now suppose we have $R$ independent and identically distributed replicas $(X_{n}^{1},\ldots,X_{n}^{R})$ of $X_{n}$ , each with initial distribution QSD $\nu$ . Denote the first exit time of the $r$ th replica by $N^{r}$ and define the smallest first exit time among all $R$ replicas by

$$N^{*} = \min_{r} N^{r}.$$

Note that more than one replica may realize $N^{*}$ (i.e., exit after the same number of steps). We denote by $K$ the smallest index among the exited replicas, i.e.,

$$K = \min\{r=1,\dots,R;\ X_{N^{*}}^{r}\notin W\}.$$

Assuming each of the replicas of $X_{n}$ is initially distributed with the QSD $\nu$, the following two results are crucial for the design of the ParRep algorithm [ParRep-Chain].

1. $X_{N^{*}}^{K}$ is independent of $R(N^{*}-1)+K$;

2. $(X_{N^{*}}^{K},R(N^{*}-1)+K)$ has the same distribution as $(X_{N^{1}}^{1},N^{1})$.
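The time marginal of the second result can be checked by a short calculation. The following is a sketch that uses only the geometric law $\mathbb{P}^{\nu}(N>n)=\lambda^{n}$ and the independence of the replicas; it is not the proof given in the cited references.

```latex
\begin{aligned}
\mathbb{P}\bigl(N^{*}=n,\ K=k\bigr)
  &= \underbrace{\lambda^{R(n-1)}}_{\text{all } R \text{ replicas survive } n-1 \text{ steps}}
     \cdot
     \underbrace{\lambda^{k-1}}_{\text{replicas } 1,\dots,k-1 \text{ survive step } n}
     \cdot
     \underbrace{(1-\lambda)}_{\text{replica } k \text{ exits at step } n} \\
  &= \lambda^{R(n-1)+k-1}\,(1-\lambda)
   = \mathbb{P}\bigl(N^{1}=R(n-1)+k\bigr),
   \qquad n\geq 1,\ 1\leq k\leq R .
\end{aligned}
```

Every positive integer can be written in exactly one way as $R(n-1)+k$ with $n\geq 1$ and $1\leq k\leq R$, so $R(N^{*}-1)+K$ has the same geometric law as $N^{1}$.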

The first result states that the first exit state from $W$ over $R$ replicas is independent of the total sojourn time over $R$ replicas. Furthermore, the second result guarantees that the joint distribution of the first exit time and the first exit state is independent of the number of replicas. These facts suggest that we can use multiple replicas to explore a metastable region in order to accelerate the sampling of exit events without changing the exit distribution. That is, we can achieve acceleration by using parallel computing. However, the gain in efficiency relies on the assumption that all replicas start from the QSD of $W$, which is not the case in general. In order to sample the QSD for launching the parallel step, some preparation steps are needed to bring the process well into the quasi-stationary state. Therefore, a complete cycle of ParRep can be roughly divided into three steps,

S1 Decorrelation: simulate $X_{n}$ until the QSD $\nu$ of the current metastable set $W$ is sampled. Proceed to the dephasing step;

S2 Dephasing: prepare a sequence of iid initial states $(x_{1},\ldots,x_{R})$ from $\nu$. Proceed to the parallel step;

S3 Parallel: launch $R$ replicas of $X_{n}$ at $(x_{1},\ldots,x_{R})$ to explore the exit path from $W$. Return to the decorrelation step.

We can adapt the above ParRep procedure for DTMC to the simulation of CTMC through simulating its embedded chain. More significantly, the algorithm can be modified to effectively sample the stationary distribution of a CTMC without the detailed balance assumption. We present the ParRep algorithm for CTMC in Algorithm 1. The setup of notations in the ParRep algorithm is as follows.

$\tilde{X}(t)$ : ParRep process we simulate throughout the ParRep algorithm;
$T_{s}$ : time clock throughout the ParRep algorithm;
$I_{s}$ : accumulated contribution to the time integral $\int_{0}^{T_{s}}f(\tilde{X}(s))\,ds$ throughout the ParRep algorithm;
$N_{c}$ : count of transitions in each decorrelation step;
$n_{c}$ : decorrelation threshold;
$N_{p}$ : count of transitions in each dephasing step;
$n_{p}$ : dephasing threshold;
$\tau$ : holding time for the next reaction;
$J$ : index of the next reaction;

Before we start the ParRep algorithm, we choose a fixed decorrelation threshold $n_{c}$ and a fixed dephasing threshold $n_{p}$ and initialize $T_{s}=0$, $I_{s}=0$ and $\tilde{X}=x_{0}$.

The decorrelation step can be summarized as follows. If $W$ is not a metastable set, then the process leaves $W$ rapidly and hence there is no need to launch the subsequent dephasing and parallel steps. However, if $W$ is metastable, then the process remains in $W$ for at least $n_{c}$ transitions and the algorithm proceeds to the dephasing step. Since we assume $n_{c}$ is large enough for the process to reach the QSD of $W$, the state we obtain after $n_{c}$ transitions is asymptotically distributed according to the QSD. The dynamics in the decorrelation step is exact, and hence there is neither loss of accuracy nor acceleration during this step. In the dephasing step, we apply the Fleming-Viot particle technique [binder2015generalized] to sample a sequence of iid initial states that can be used in the subsequent parallel step. Similar to the decorrelation step, we specify the dephasing threshold $n_{p}$ and let all $R$ replicas evolve for $n_{p}$ transitions (jumps). During this procedure, if a replica leaves $W$, then we force it to restart from the current state of another replica (chosen uniformly). As with $n_{c}$, we assume $n_{p}$ is large enough that the resulting states $(x_{1},\ldots,x_{R})$ are distributed according to the QSD. Note that the dephasing step does not contribute anything to $T_{s}$, $I_{s}$ and $\tilde{X}$; its only purpose is to prepare the initial states $(x_{1},\ldots,x_{R})$ for the subsequent parallel step.

The acceleration of ParRep comes from the parallel step. We launch $R$ parallel replicas from $(x_{1},\ldots,x_{R})$ to explore the exit event from $W$, that is, to sample $N^{*}$, $K$ and the first exit state $X_{N^{*}}^{K}$. Since $(X_{N^{*}}^{K},R(N^{*}-1)+K)$ has the same distribution as $(X_{N^{1}}^{1},N^{1})$, sampling exit events with $R$ replicas (i.e., sampling $N^{*}$ and $K$) in the parallel step is approximately $R$ times faster than with serial simulation (i.e., sampling $N^{1}$). Moreover, all the data generated by each replica in the parallel step are collected in order to sample the stationary distribution $\pi^{c}$, through the update of the clock time $T_{s}$ and the time integral $I_{s}$. Note that sampling $\pi^{c}$ by reusing the data generated by ParRep is statistically consistent with serial simulation (asymptotically, when $n_{c}$ and $n_{p}$ are large). In fact, we have shown that the averaged contribution to $T_{s}$ or $I_{s}$ over each ParRep cycle is independent of the number of replicas $R$, provided that $x_{1},\ldots,x_{R}$ are independent and distributed according to the QSD [ParRep-CTMC; aristoff2015parallel].
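The three steps can be sketched in code as follows. This is an illustrative reading of the description above, not a transcription of Algorithm 1: the replicas are advanced one after another in a loop rather than on separate processors, the parallel step accumulates the contributions of replicas $1,\ldots,K$ in the final round and of all replicas in earlier rounds, and the Schlögl-type parameter values, function names and thresholds are hypothetical choices.

```python
import numpy as np

# Hypothetical Schlogl-type parameters (NOT from the paper), chosen so that the
# rate equation has stable roots 1 and 4 and an unstable root 2.
C1A, C2, C3B, C4, V = 7.0, 1.0, 8.0, 14.0, 10.0
STOICH = (1, -1, 1, -1)
SADDLE = 20                       # V times the unstable root


def jump(x, rng):
    """One SSA jump from population x: returns (holding time, next state)."""
    lam = np.array([C1A * x * (x - 1) / V,
                    C2 * x * (x - 1) * (x - 2) / V**2,
                    C3B * V,
                    C4 * x])
    total = lam.sum()
    j = rng.choice(4, p=lam / total)
    return rng.exponential(1.0 / total), x + STOICH[j]


def well(x):
    """Label of the set containing x (True below or at the saddle)."""
    return x <= SADDLE


def parrep_cycle(x, f, R, n_c, n_p, rng):
    """One ParRep cycle: returns (new state, elapsed time, integral of f)."""
    T = I = 0.0
    w = well(x)
    # S1 decorrelation: exact dynamics for n_c jumps inside the current set.
    for _ in range(n_c):
        tau, y = jump(x, rng)
        T += tau
        I += f(x) * tau
        x = y
        if well(x) != w:          # left early: restart in the new set
            return x, T, I
    # S2 dephasing (Fleming-Viot): no contribution to T or I.
    reps = [x] * R
    for _ in range(n_p):
        for r in range(R):
            _, y = jump(reps[r], rng)
            if well(y) != w:      # restart from another replica's state
                y = reps[rng.choice([q for q in range(R) if q != r])]
            reps[r] = y
    # S3 parallel: stop at the first exit, scanning replicas in index order.
    while True:
        for r in range(R):
            tau, y = jump(reps[r], rng)
            T += tau
            I += f(reps[r]) * tau
            if well(y) != w:
                return y, T, I
            reps[r] = y


def parrep(x0, f, t_end, R=8, n_c=200, n_p=200, seed=0):
    """Estimate the stationary average of f by the ratio I_s / T_s."""
    rng = np.random.default_rng(seed)
    x, T, I = x0, 0.0, 0.0
    while T < t_end:
        x, dT, dI = parrep_cycle(x, f, R, n_c, n_p, rng)
        T += dT
        I += dI
    return I / T


estimate = parrep(10, lambda x: x / V, t_end=5.0)
```

Scanning the replicas in index order and stopping at the first exit counts exactly $R(N^{*}-1)+K$ jumps in the parallel step. A real implementation would run the replicas concurrently and synchronize after each round, and `R` must be at least 2 for the dephasing restart to be defined.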

II.2 Accuracy and efficiency

The accuracy of the ParRep method relies on the choice of the decorrelation threshold $n_{c}$ and the dephasing threshold $n_{p}$, since these parameters determine how well we sample the QSD before the parallel step. In practice, we never sample the QSD exactly in a ParRep cycle, and hence there is an error associated with the inexact sampling of the QSD. However, for large $n_{c}$ and $n_{p}$ we can expect the error to be sufficiently small. In fact, this can be justified by the following result [ParRep-CTMC]. For fixed $n_{c}$, we define the distribution $\nu_{n_{c}}$ as

$$\nu_{n_{c}}(A) = \mathbb{P}(X_{n_{c}}\in A \mid N>n_{c})$$

for any measurable set $A\subset W$, i.e., the distribution of $X_{n_{c}}$ conditioned on no exit having occurred during the first $n_{c}$ transition steps. If we assume that the dephasing step is exact (i.e., $(x_{1},\ldots,x_{R})$ are independent and distributed as the QSD), then the averaged error for sampling $I_{s}$ over each ParRep cycle can be bounded by a constant times the total variation distance $\|\nu_{n_{c}}-\nu\|_{\text{TV}}$. Furthermore, the total variation distance converges geometrically fast in $n_{c}$. This justifies that the dynamics of a transition from one metastable set to another (i.e., one ParRep cycle) is asymptotically correct. The global error accumulated over all ParRep cycles is hard to analyze. However, our numerical experiments in Sec. IV suggest that ParRep is a rather accurate algorithm for long time simulation.

We briefly discuss the efficiency of ParRep for CTMC. In this paper, we define the speedup as the ratio between the total computational time of serial simulation and that of ParRep simulation. In the idealized scenario, the speedup factor of ParRep could be up to the number of replicas used in the simulation, as suggested by the two properties in Sec. II.1. However, in practice the preparation of a sequence of QSD initial states offsets this linear acceleration. Heuristically, the efficiency of ParRep relies on the metastability of the set. If the set $W$ is strongly metastable, then the time spent in the decorrelation and dephasing steps is negligible compared to the acceleration achieved in the parallel step. However, if the set $W$ is not truly metastable, then the parallel step is not activated and hence ParRep is equivalent to the SSA. In fact, this argument can be formalized, and it turns out that the efficiency of ParRep is determined by the ratio $\lambda_{1}/(\text{Re}(\lambda_{2})-\lambda_{1})$, where $\lambda_{1}$ and $\lambda_{2}$ (with $0>\lambda_{1}>\text{Re}(\lambda_{2})$) are the two largest eigenvalues of the transition rate matrix $Q$ (see (1) for definition) restricted to the metastable set $W$. We do not pursue this aspect rigorously in this paper. The interested reader may refer to the related literature [binder2015generalized].
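For a small one-species network the ratio above can be evaluated numerically. The sketch below builds the rate matrix restricted to a set of consecutive populations and computes its two largest eigenvalues. It assumes that the diagonal entries equal minus the total outflow rate (including jumps that leave $W$), which (1) does not state explicitly, and it uses hypothetical Schlögl-type parameter values.

```python
import numpy as np

# Hypothetical parameters (NOT from the paper): rate-equation roots at 1, 2, 4.
C1A, C2, C3B, C4, V = 7.0, 1.0, 8.0, 14.0, 10.0


def birth(x):
    """Total rate of the two reactions with stoichiometric vector +1."""
    return C1A * x * (x - 1) / V + C3B * V


def death(x):
    """Total rate of the two reactions with stoichiometric vector -1."""
    return C2 * x * (x - 1) * (x - 2) / V**2 + C4 * x


def restricted_generator(states):
    """Rate matrix restricted to consecutive integer states (sketch)."""
    n = len(states)
    Q = np.zeros((n, n))
    for i, x in enumerate(states):
        Q[i, i] = -(birth(x) + death(x))   # assumed: minus the total outflow
        if i + 1 < n:
            Q[i, i + 1] = birth(x)
        if i > 0:
            Q[i, i - 1] = death(x)
    return Q


Q_W = restricted_generator(range(0, 21))            # W = {0, ..., 20}
eig = np.sort(np.linalg.eigvals(Q_W).real)[::-1]    # descending real parts
lam1, lam2 = eig[0], eig[1]
ratio = lam1 / (lam2 - lam1)
```

Here `-lam1` plays the role of the exit rate from the QSD and `lam1 - lam2` that of the relaxation rate towards it, so a small `ratio` indicates that exits are slow relative to relaxation inside `W`.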

III Path space information bound

In this section, we combine the ParRep method with the path-space information bounds dupuis2016path to accelerate the parametric sensitivity analysis of stochastic reaction networks. The bounds are derived using several concepts in information theory. For the readability of the paper, we briefly review these concepts and their connections in Appendix A.

Recall that we defined the sensitivity analysis problem at the end of Section I.1. There exist several types of sensitivity analysis methods, such as the finite difference method [rathinam2010efficient; anderson2012efficient], the likelihood ratio method [plyasunov2007efficient] and the infinitesimal perturbation analysis or pathwise derivative method [sheppard2012pathwise]. We refer to them as direct methods since they aim to estimate the sensitivity directly. However, direct estimation of the sensitivity can be extremely expensive due to the large variance of the estimators [wang2016efficiency] and their complexity when applied to large reaction networks. Alternatively, we aim to compute a gradient-free upper bound of the sensitivity. The computed sensitivity bounds can be used for screening out insensitive parameters (those with small bounds), and then direct methods can be applied to the remaining parameters [arampatzis2015accelerated].

In general, given a probability distribution $P^{c}$ which depends on a vector of parameters $c$ , we define the sensitivity index of an observable $f$ (along the direction $v$ ) as

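$$S_{f,v}(P^{c}) = \lim_{\epsilon\to 0} \frac{1}{\epsilon}\left\{\mathbb{E}_{P^{c+\epsilon v}}[f] - \mathbb{E}_{P^{c}}[f]\right\}.$$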

Note that in the case that $P^{c}=\pi^{c}$ and $v=e_{k}$ (the $k$ -th basis vector), the sensitivity index is simply the $k$ -th component of the gradient $\nabla\pi^{c}(f)$ . When we are interested in the sensitivity analysis of the stochastic process $X(t,c)$ with stationary distribution $\pi^{c}$ , it is often convenient to interpret the distribution $P^{c}$ as the path space distribution $P_{[0,T]}^{c}$ , i.e., the probability distribution of paths of $X(t,c)$ on the time interval $[0,T]$ . It can be shown that in the transient regime (i.e., the initial distribution of $X(t,c)$ is not $\pi^{c}$ ) the sensitivity index can be bounded by

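$$S_{f,v}(P_{[0,T]}^{c}) \leq \sqrt{\mathrm{Var}_{P_{[0,T]}^{c}}(f)}\,\sqrt{v^{tr}\,\mathcal{I}(P_{[0,T]}^{c})\,v}, \tag{8}$$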

where $\mathcal{I}(P_{[0,T]}^{c})$ is the path space Fisher information matrix (FIM) of the relative entropy $\mathcal{R}(P_{[0,T]}^{c}|P_{[0,T]}^{c+\epsilon v})$ (see Appendix B for a formal derivation). In the stationary regime, a similar sensitivity bound can be derived. That is, the stationary sensitivity index can be bounded by

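$$S_{f,v}(\pi^{c}) \leq \sqrt{\tau_{\pi^{c}}(f)}\,\sqrt{v^{tr}\,\mathcal{I}_{\mathcal{H}}(P^{c})\,v}, \tag{9}$$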

where $\tau_{\pi^{c}}(f)$ is the integrated auto-correlation function (IAF) and $\mathcal{I}_{\mathcal{H}}(P^{c})$ is the path space FIM of the relative entropy rate $\mathcal{H}(P^{c}|P^{c+\epsilon v})$ . In fact, $\mathcal{I}_{\mathcal{H}}(P^{c})$ can be roughly interpreted as $\lim_{T\to\infty}T^{-1}\mathcal{I}(P_{[0,T]}^{c})$ . See Appendix B for precise definitions and a formal derivation of the bounds (8) and (9).

We focus on bounding the stationary sensitivity in the context of stochastic reaction networks, i.e., $X(t,c)$ is a continuous time jump Markov process. To make use of the bound (9), we need reliable estimators for the IAF $\tau_{\pi^{c}}(f)$ and the path space FIM $\mathcal{I}_{\mathcal{H}}(P^{c})$. For the IAF, we show in Appendix B that

[TABLE]

Hence, when $T$ is large, an approximate estimator for the IAF is

[TABLE]

where $N$ is the sample size, $Y^{(k)}=\int_{0}^{T}f(X^{(k)}(s))ds$ is the $k$ -th sample and $\bar{Y}=N^{-1}\sum_{k=1}^{N}Y^{(k)}$ is the sample average. Note that (9) assumes the dynamics starts at the stationary regime, hence a burn-in period is necessary for the dynamics to relax to the stationary state before we start sampling the IAF. Now for the path space FIM, it can be written as the stationary expectation of a special observable in terms of the propensity functions (see Appendix A), i.e.,

[TABLE]

Since the expectation is taken under the stationary distribution, the path space FIM can be approximated as the ergodic average of the observable. That is,

[TABLE]

Hence, an estimator for the path space FIM is simply

[TABLE]

where $Z^{(k)}$ is the $k$ -th realization of the ergodic average. Note that the FIM is of great interest in itself since it reflects the identifiability of parameters through the Cramér-Rao inequality. We will use the path space information bound (9) to estimate the stationary sensitivity bounds for the numerical experiments in the next section.
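The two estimators above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trajectories are stored as jump times and visited states, the IAF is estimated by the sample variance of the $Y^{(k)}$ divided by $T$ (with the unbiased $N-1$ normalization, which is a choice made here), and the FIM observable is taken to be $\sum_{j}\lambda_{j}\,\nabla_{c}\log\lambda_{j}\,\nabla_{c}\log\lambda_{j}^{\top}$, the usual form for jump processes. All function names are hypothetical.

```python
import numpy as np

def holding_times(jump_times, t_start, t_end):
    """Time spent in each visited state inside the window [t_start, t_end].

    jump_times[i] is the time at which the path enters its i-th state;
    the sequence must be increasing with jump_times[0] <= t_start.
    """
    t = np.asarray(jump_times, dtype=float)
    left = np.clip(t, t_start, t_end)
    # Each state is held until the next jump (or t_end for the last state).
    right = np.clip(np.append(t[1:], t_end), t_start, t_end)
    return right - left

def time_integral(jump_times, states, f, t_start, t_end):
    """Y = integral of f(X(s)) ds over the window for a piecewise-constant path."""
    dt = holding_times(jump_times, t_start, t_end)
    return float(sum(d * f(x) for d, x in zip(dt, states)))

def iaf_estimate(Y, T):
    """Assumed IAF estimator: sample variance of the Y^(k), divided by T."""
    return float(np.var(np.asarray(Y, dtype=float), ddof=1) / T)

def fim_observable(x, propensities, grad_log_propensities):
    """sum_j lambda_j(x) g_j(x) g_j(x)^T with g_j = grad_c log lambda_j (assumed form)."""
    lam = np.asarray(propensities(x), dtype=float)          # shape (J,)
    g = np.asarray(grad_log_propensities(x), dtype=float)   # shape (J, K)
    return (g * lam[:, None]).T @ g                          # shape (K, K)

def ergodic_fim(jump_times, states, t_start, t_end,
                propensities, grad_log_propensities):
    """Z = time average of the FIM observable along one trajectory."""
    dt = holding_times(jump_times, t_start, t_end)
    total = sum(d * fim_observable(x, propensities, grad_log_propensities)
                for d, x in zip(dt, states))
    return total / (t_end - t_start)
```

Averaging the matrices returned by `ergodic_fim` over independent trajectories gives the FIM estimate. Passing a window that starts after the burn-in period restricts both estimators to the stationary part of the path.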

IV Numerical examples

In this section, we consider two bistable examples arising in chemistry and systems biology. We demonstrate that the ParRep algorithm can efficiently sample rare transitions between two stable equilibrium points and outperforms the standard SSA by a significant speedup factor.

IV.1 Bistable Schlögl model

IV.1.1 Model

The Schlögl model is one of the simplest examples of stochastic reaction networks that exhibit bistability. It is an auto-catalytic network involving three species whose populations change according to the reaction network in Table 1. Following our notational convention, we denote by $X_{V}(t)$ the concentration of the species $S$ and by $X^{V}(t)$ the population of $S$ . The concentrations of $A$ and $B$ (denoted by $a$ and $b$ , respectively) are fixed due to an exchange of chemicals between two material baths vellela2009stochastic and hence $a$ and $b$ are considered as parameters of the network. Therefore, the Schlögl model is equivalent to a one-species network

[TABLE]

with $a$ and $b$ absorbed in the rate constants $c_{1}$ and $c_{3}$ . In this paper, we follow the chemical convention and write the reactions of the Schlögl network as in Table 1.

In the large volume limit, the concentration process has a deterministic limit $\bar{x}$ satisfying the RRE (4)

[TABLE]

We choose $a=1$ , $b=2$ , $c_{1}=3$ , $c_{2}=0.6$ , $c_{3}=0.25$ and $c_{4}=2.95$ cao2013adaptively , in which case the RRE has two stable equilibrium points $\bar{x}_{+}$ and $\bar{x}_{-}$ separated by an unstable equilibrium point $\bar{x}_{0}$ . Therefore, the Schlögl model exhibits two time scales: the fast time scale corresponds to the relaxation to one of the stable equilibrium points and the slow time scale corresponds to the rare transitions between the two stable equilibrium points. This two-time-scale feature is illustrated in Figure 1, where the standard SSA is performed with $V=25$ .
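The three equilibria can be located numerically. The sketch below assumes the standard mass-action form of the four Schlögl reactions, so that the RRE drift is the cubic $c_{1}ax^{2}-c_{2}x^{3}+c_{3}b-c_{4}x$; this form and the function name are assumptions made here for illustration.

```python
import numpy as np

def schlogl_equilibria(a=1.0, b=2.0, c=(3.0, 0.6, 0.25, 2.95)):
    """Real roots of the assumed RRE drift and a stability flag for each."""
    c1, c2, c3, c4 = c
    # Drift polynomial in decreasing powers of x: -c2 x^3 + c1 a x^2 - c4 x + c3 b
    coeffs = [-c2, c1 * a, -c4, c3 * b]
    roots = np.roots(coeffs)
    real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    # In one dimension an equilibrium is stable when the drift derivative is negative.
    slope = np.polyval(np.polyder(coeffs), real)
    return real, slope < 0

if __name__ == "__main__":
    x_eq, stable = schlogl_equilibria()
    for x, s in zip(x_eq, stable):
        print(f"x = {x:.4f}  population at V=25: {25 * x:.2f}  stable: {s}")
```

Under these assumptions the two outer roots are stable and the middle one is unstable, which is the bistable picture described above.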

Due to the bistable nature of the model, a long time simulation is needed to sample enough transition events so that the system relaxes to the stationary distribution. We apply the ParRep algorithm to accelerate the sampling of very long trajectories in order to estimate the stationary distribution $\pi^{c}$ . We decompose the state space into two metastable sets separated by the unstable equilibrium state $\bar{X}_{0}^{V}=V\bar{x}_{0}$ (we multiply the concentration $\bar{x}_{0}$ by $V$ so that all comparisons are in terms of the population instead of the concentration). That is,

[TABLE]

where $E$ is the state space of $X^{V}(t)$ . Note that this decomposition is optimal for ParRep in terms of efficiency since both $W_{+}$ and $W_{-}$ are strongly metastable. This can be seen by contradiction. Suppose the decomposition is defined in terms of a point $X^{\prime}$ to the left of $V\bar{x}_{0}$ ( $X^{\prime}<V\bar{x}_{0}$ ). Then every time $X^{V}$ exits from $W_{+}$ with first exit state in the interval $(X^{\prime},V\bar{x}_{0})$ , it is quickly pulled back to the left stable point $V\bar{x}_{+}$ with dominating probability, by the large deviation principle. Hence, the subinterval $(X^{\prime},V\bar{x}_{0})$ in $W_{-}$ is not metastable, and ParRep is inefficient since the parallel step is not activated while the process is in this interval. Therefore, the optimal choice of separatrix is the point $V\bar{x}_{0}$ , which guarantees that both of the decomposed sets are truly metastable.
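To make the role of the separatrix concrete, the following sketch runs a plain SSA for the one-species Schlögl model and counts how often the path crosses a given threshold. It is not the ParRep algorithm, only a serial reference that shows how the two sets are told apart. The propensities of reactions 2 to 4 follow the standard mass-action Schlögl form, which is an assumption here, and the function names are hypothetical.

```python
import numpy as np

STOICH = np.array([1, -1, 1, -1])  # net change in S for the four reactions

def propensities(x, c, a=1.0, b=2.0, V=25.0):
    """Assumed mass-action propensities of the Schlögl model at population x."""
    return np.array([
        c[0] * a * x * (x - 1) / V,
        c[1] * x * (x - 1) * (x - 2) / V**2,
        c[2] * b * V,
        c[3] * x,
    ])

def ssa_count_crossings(x0, c, t_end, threshold, rng, V=25.0):
    """Simulate up to t_end and count crossings of the threshold."""
    t, x = 0.0, int(x0)
    above = x > threshold
    n_cross = 0
    while True:
        lam = propensities(x, c, V=V)
        total = lam.sum()            # positive since the inflow c3*b*V > 0
        t += rng.exponential(1.0 / total)
        if t >= t_end:
            break
        j = rng.choice(4, p=lam / total)
        x += int(STOICH[j])
        now_above = x > threshold
        if now_above != above:
            n_cross += 1
        above = now_above
    return x, n_cross
```

With a threshold placed at the unstable state $V\bar{x}_{0}$, each crossing that is followed by relaxation to the other stable point is a genuine transition. With a threshold shifted away from it, many crossings are quick recrossings, which is the inefficiency discussed above.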

IV.1.2 Results and discussion

Figure 2 shows the estimates of the stationary average of $X$ (ergodic average at $t=10^{5}$ ) with SSA (blue dashed line) and with the ParRep algorithm (red dot with $95\%$ confidence interval) for different choices of decorrelation and dephasing steps. The number of replicas for ParRep is $R=100$ . We also plot the numerical approximation of the CME as a benchmark (green solid line) for accuracy in Figure 2. It can be seen that the ParRep simulation approximates the stationary average very well (relative error with respect to the CME solution is $0.04\%$ for $n_{c}=n_{p}=5000$ ) when the decorrelation and dephasing steps are large; this is consistent with our expectation that the QSD of each metastable set is well approximated for large $n_{c}$ and $n_{p}$ . All simulation results are obtained based on $100$ sample trajectories. The CPU time of the standard SSA simulation is about $192$ hours for $100$ samples. We show the corresponding speedup factor for $n_{c}=n_{p}=1000$ to $5000$ (smaller values are ignored as the corresponding estimates are not accurate enough). We can see that with $100$ replicas, ParRep outperforms the standard SSA by a significant speedup factor.

We also study the efficiency of ParRep in terms of the number of replicas. In Figure 3 we show the estimation of the stationary average of $X^{V}$ and the corresponding speedup factor. The decorrelation and dephasing steps are fixed at $n_{c}=n_{p}=5000$ . We observe that the speedup factor increases from $7$ to $20$ when the number of replicas changes from $20$ to $100$ . However, the accuracy of ParRep is independent of the number of replicas. In Figure 4 we demonstrate the application of ParRep to estimate the probability distribution of $X$ with $n_{c}=n_{p}=5000$ . The estimated probability distribution (blue bar) is compared with the probability distribution obtained from the CME approximation. The plot suggests that ParRep is a rather accurate method when suitable $n_{c}$ and $n_{p}$ are chosen.

Finally, we apply the path space information bound (9) to obtain a bound for the sensitivity index $S_{f,v}({\pi}^{c})$ . Here we only consider the stationary sensitivity of the observable $f(X^{V}(t))=X^{V}(t)$ with respect to each parameter $c_{i}$ , $i=1,\ldots,4$ . Note that the stochastic reaction networks we simulate start in the transient regime, i.e., the initial distribution of $X^{V}(t)$ is not necessarily $\pi^{c}$ . However, the path space information bound (9) assumes that $X^{V}(t)$ starts in the stationary regime. Therefore, a burn-in period is needed for the process to be well into the stationary regime before we can start sampling the IAF $\tau_{\pi^{c}}(f)$ and the path space FIM $\mathcal{I}_{\mathcal{H}}(P^{c})$ . We choose the terminal time $T=2\times 10^{5}$ and use the first half $[0,10^{5}]$ as the burn-in period to prepare the stationary distribution and the second half $[10^{5},2\times 10^{5}]$ to sample the IAF and the path space FIM. The computed path space FIM and the confidence intervals are shown in Table 2. Note that the path space FIM is not only useful for obtaining the final sensitivity bounds, but also reflects the identifiability of parameters through the Cramér-Rao bound. The computed IAF $\tau_{\pi^{c}}(f)$ is $5.87\times 10^{5}$ . The resulting sensitivity bounds are shown in Table 3. To see whether the obtained sensitivity bounds are tight enough, we compare them with the approximated sensitivities. The approximation is obtained by differentiating the CME (3) at steady state and truncating the state space to $[0,149]$ . The resulting equation is a linear system that can be solved numerically. Comparing the sensitivity bounds with the approximated sensitivities, we observe that the bounds are not tight enough in this example. In fact, it has been observed in several examples that the path space information bounds are not always tight when applied to multi-scale problems. Nevertheless, the bounds are quite useful for screening insensitive parameters in large scale stochastic dynamical systems. We will demonstrate this application of the bounds in the next example.
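The reference sensitivities obtained by differentiating the steady-state CME can be sketched as follows. The code is a minimal illustration under assumptions that are not spelled out in the text: the reactions have the standard mass-action Schlögl propensities, the truncation at $149$ is implemented by switching off the birth rate in the last state, and one redundant balance equation is replaced by the normalization condition. Function names are hypothetical.

```python
import numpy as np

A_CONC, B_CONC, V, N_MAX = 1.0, 2.0, 25.0, 149

def rates(c):
    """Birth and death rates on the truncated state space {0, ..., N_MAX}."""
    n = np.arange(N_MAX + 1, dtype=float)
    birth = c[0] * A_CONC * n * (n - 1) / V + c[2] * B_CONC * V
    death = c[1] * n * (n - 1) * (n - 2) / V**2 + c[3] * n
    birth[-1] = 0.0  # assumed truncation: no jump out of the last state
    return birth, death

def generator(birth, death):
    """Generator matrix Q of the birth-death chain (rows sum to zero)."""
    Q = np.diag(birth[:-1], 1) + np.diag(death[1:], -1)
    return Q - np.diag(Q.sum(axis=1))

def stationary_and_sensitivity(c):
    """Solve pi Q = 0 and its derivative in each c_k; return pi and d E[X] / d c_k."""
    c = np.asarray(c, dtype=float)
    A = generator(*rates(c)).T
    A[-1, :] = 1.0                      # normalization: sum(pi) = 1
    rhs = np.zeros(N_MAX + 1)
    rhs[-1] = 1.0
    pi = np.linalg.solve(A, rhs)
    n = np.arange(N_MAX + 1)
    sens = np.zeros(4)
    for k in range(4):
        e_k = np.zeros(4)
        e_k[k] = 1.0
        # The rates are linear in c, so dQ/dc_k is the generator built from e_k.
        dQT = generator(*rates(e_k)).T
        r = -dQT @ pi
        r[-1] = 0.0                     # derivative of the normalization
        sens[k] = n @ np.linalg.solve(A, r)
    return pi, sens
```

Calling `stationary_and_sensitivity([3.0, 0.6, 0.25, 2.95])` returns the truncated stationary distribution and the four derivatives of the stationary mean, which is the kind of reference value the bounds can be compared against.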

IV.2 Genetic switch with positive feedback

IV.2.1 Model

Another example we study in this paper is the genetic switch network, which is the fundamental mechanism for cells to shift between alternate gene-expression states. See Figure 5 for the diagram of the network. In the genetic switch network, there is an on-off switch for DNA to be in the active or inactive state. Hence the total population of active DNA and inactive DNA is $1$ . The transition rates $F$ (inactive to active) and $G$ (active to inactive) between these two states depend on the number of proteins through a positive feedback. Following Assaf, Roberts and Luthey-Schulten (2011) assaf2011determining , we explicitly take the mRNA noise into account since it has been shown that the presence of mRNA has a significant impact on the dynamics of the network mehta2008exponential . We list the propensity function and stoichiometric vector of each reaction channel in Table 4.

We fix the large volume $V=ab=2400$ throughout this example. The two-dimensional process $X^{V}(t)=(X_{1}^{V}(t),X_{2}^{V}(t))$ denotes the number of mRNA and protein molecules at time $t$ . We denote the number of active DNA by the process $\xi(t)$ and hence the number of inactive DNA is $1-\xi(t)$ . The transition rates are taken to be Hill-type functions $F(x_{2})$ for the inactive to active transition and $G(x_{2})$ for the reverse transition, where

[TABLE]

Throughout this example, we follow Assaf et al. assaf2011determining to set the parameters as follows: $a=2400/b$ , $b=22.5$ , $\gamma=50$ , $k_{0}^{\text{min}}=k_{1}^{\text{min}}=24/b$ , $k_{0}^{\text{max}}=k_{1}^{\text{max}}=2400/b$ and $D=1000$ .

We point out that the genetic switch model does not fall into the standard framework of stochastic reaction networks described in Section I.1. In fact, the random time change representation of $X^{V}(t)$ can be written as

[TABLE]

See Table 4 for the four reactions involved in this representation. Note that the propensity functions $\lambda_{j}$ are functions of both the switching variable $\xi(t)$ and the population process of mRNA and protein $X^{V}(t)$ . Since $\xi(t)\in\{0,1\}$ does not depend on volume $V$ , the process $(\xi,X^{V}(t))$ does not satisfy the large volume limit (5). However, the mean numbers of mRNA and protein still satisfy the following rescaled RRE (i.e., the ODE governing $\bar{X}^{V}(t)=V\bar{x}(t)$ ) lv2014constructing ; li2015large

[TABLE]

where the factor $F/(F+G)$ gives the probability that the DNA is in an active state. With our choice of parameters, (12) has two stable equilibrium points $\bar{X}_{+}^{V}=(0.0225,25.2628)$ and $\bar{X}_{-}^{V}=(1.6353,1839.6883)$ separated by a saddle point $\bar{X}_{0}^{V}=(0.4545,511.2865)$ . Therefore, the genetic switching network is bistable. When $V$ is finite, there are noise induced rare transitions between $\bar{X}_{+}^{V}$ and $\bar{X}_{-}^{V}$ .
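The reported equilibria can be reproduced with a short computation. The sketch below assumes Hill functions with exponent $2$, namely $F(p)=k_{0}^{\text{min}}+(k_{0}^{\text{max}}-k_{0}^{\text{min}})p^{2}/(p^{2}+D^{2})$ and $G(p)=k_{1}^{\text{max}}-(k_{1}^{\text{max}}-k_{1}^{\text{min}})p^{2}/(p^{2}+D^{2})$, and a rescaled RRE of the form $\dot{m}=aF/(F+G)-\gamma m$, $\dot{p}=\gamma bm-p$. These forms are assumptions made here for illustration; they are consistent with the equilibrium values quoted above, but the authoritative definitions are those of the cited reference.

```python
import numpy as np

b = 22.5
a = 2400.0 / b
gamma = 50.0
k_min = 24.0 / b     # k0_min = k1_min
k_max = 2400.0 / b   # k0_max = k1_max
D = 1000.0

def hill(p):
    return p**2 / (p**2 + D**2)

def F(p):
    """Inactive -> active rate (assumed Hill form)."""
    return k_min + (k_max - k_min) * hill(p)

def G(p):
    """Active -> inactive rate (assumed Hill form)."""
    return k_max - (k_max - k_min) * hill(p)

def rre_rhs(m, p):
    """Assumed rescaled RRE for the mean mRNA (m) and protein (p) numbers."""
    return np.array([a * F(p) / (F(p) + G(p)) - gamma * m,
                     gamma * b * m - p])

def equilibria():
    """Equilibria (m, p), sorted by protein number."""
    K = k_min + k_max  # F + G is constant under the assumed forms
    # Eliminating m = p / (gamma b) from the steady state gives a cubic in p.
    coeffs = [K, -a * b * k_max, K * D**2, -a * b * k_min * D**2]
    roots = np.roots(coeffs)
    p = np.sort(roots[np.abs(roots.imag) < 1e-6].real)
    return np.column_stack([p / (gamma * b), p])

def is_saddle(p):
    """The Jacobian has negative trace, so a negative determinant means a saddle."""
    dF = (k_max - k_min) * 2.0 * p * D**2 / (p**2 + D**2) ** 2
    det = gamma * (1.0 - a * b * dF / (k_min + k_max))
    return det < 0
```

Under these assumptions `equilibria()` returns three points, of which only the middle one is flagged by `is_saddle`, in line with the bistable picture described above.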

To find the optimal decomposition of the state space $E$ into two metastable sets, we need to analyze the phase space of (12) to determine the separatrix of the two metastable regions induced by $\bar{X}_{+}^{V}$ and $\bar{X}_{-}^{V}$ . Unlike in the Schlögl model, it is nontrivial to find the separatrix in this example since the dynamics lives in $\mathbb{R}^{2}$ . Instead, we detect rare transitions in an ad hoc way: we simply choose the horizontal line that passes through the saddle point $\bar{X}_{0}^{V}$ as the boundary to define the two metastable sets. We provide a heuristic explanation for this choice. From the large deviation perspective, there exists a most probable transition path from $\bar{X}_{+}^{V}$ to $\bar{X}_{-}^{V}$ li2015large ; lv2014constructing ; dykman1994large such that if a transition occurs, then with dominating probability the transition moves along this path. We know that the true separatrix passes through the saddle point $\bar{X}_{0}^{V}$ and that the most probable transition path crosses the true separatrix along a path that is “sufficiently close” to the saddle point lv2014constructing . Since the points $(0,511)$ , $(0,512)$ , $(1,511)$ and $(1,512)$ are the only possible states that are sufficiently close to the saddle point $(0.4545,511.2865)$ , the most probable transition path can only cross the separatrix (from $\bar{X}_{+}^{V}$ to $\bar{X}_{-}^{V}$ ) by moving from $(0,511)$ to $(0,512)$ or to $(1,511)$ , depending on where the true separatrix lies. Accordingly, this suggests that we can use either $y=511.2865$ (if the most probable path moves from $(0,511)$ to $(0,512)$ ) or $x=0.4545$ (if the most probable path moves from $(0,511)$ to $(1,511)$ ) as the boundary to decompose the state space into two metastable sets. It is readily seen that we should choose the horizontal one since the process is much more metastable in the $y$ direction than in the $x$ direction. Therefore, we decompose the state space into two sets

[TABLE]

and

[TABLE]

Our simulation results confirm that this is a good choice of decomposition. Note that though the choice of decomposition may not be optimal since we do not know the true separatrix a priori, it only affects the efficiency but not the accuracy of ParRep as we discuss in Section II.2. A rigorous approach to defining the optimal decomposition into metastable sets is the subject of ongoing work.

IV.2.2 Results and discussion

Throughout the simulation of the genetic switch network, all simulation results are obtained based on $100$ sample trajectories. The initial population is $1$ molecule for inactive DNA and $0$ molecules for all the remaining species. The terminal time is taken to be $T=10^{6}$ , which is sufficiently large for sampling the ergodic average. We first study the accuracy of ParRep in terms of the decorrelation step $n_{c}$ and dephasing step $n_{p}$ with $100$ replicas. Figure 6 shows the simulation results for the stationary means of mRNA and protein as $n_{c}=n_{p}$ increases. Simulation results with SSA (blue dashed line) are used for accuracy comparison. The corresponding speedup factor is shown in the same plot (lower panel). The plot suggests that with $n_{c}=n_{p}=2\times 10^{4}$ or above, ParRep is as accurate as the standard SSA. Figure 7 shows the speedup of ParRep when we vary the number of replicas with $n_{c}=n_{p}=2\times 10^{4}$ . We do not gain as much speedup as in the Schlögl model since the genetic switch network requires much larger $n_{c}$ and $n_{p}$ to converge to the QSD of each metastable set (as we observed in Figure 6). Nevertheless, we can see that with $100$ replicas, the speedup factor of ParRep is about $7\times$ when compared to SSA.

We also study the parametric sensitivity of the genetic switch model by quantifying the stationary path space sensitivity bounds (9). The observables under consideration are the number of active DNA ( $\xi$ ), inactive DNA ( $1-\xi$ ), mRNA ( $X_{1}^{V}$ ) and protein ( $X_{2}^{V}$ ). The parameters are arranged in the order $a,b,\gamma,k_{0}^{\text{min}},k_{0}^{\text{max}},k_{1}^{\text{min}},k_{1}^{\text{max}},D$ . We aim to apply ParRep to estimate the stationary sensitivity bounds of each observable with respect to each of the parameters. In order to obtain the bounds, we simulate the process up to final time $2\times 10^{6}$ . The time interval $[0,10^{6}]$ corresponds to the transient regime and $[10^{6},2\times 10^{6}]$ corresponds to the stationary regime. The estimated (diagonal) path space FIM along with the confidence interval is shown in Table 5. The estimated IAF for each observable is shown in Table 6. Finally, we combine the path space FIM and IAF to obtain the stationary sensitivity bounds. To illustrate the observation that most of the sensitivity indices are small, we visualize the sensitivity bounds in Figure 8. We see that the active DNA, inactive DNA and mRNA are insensitive to parametric perturbation, whereas the protein tends to be sensitive. If we are interested in quantifying the parametric uncertainty for the genetic switch model, these sensitivity bounds suggest that we can screen out the insensitive combinations and apply direct methods to estimate the remaining sensitivities, such as those of the number of proteins with respect to $b$ , $\gamma$ and $k_{0}^{\text{min}}$ . Note that without these bounds, we have to estimate $4\times 8=32$ sensitivities even when we do not take other observables into consideration. However, with the sensitivity bounds for screening, we need to estimate far fewer sensitivities, depending on the controlled confidence level we use. Therefore, this two-step strategy significantly reduces the computational cost, especially when applied to large scale networks.
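The screening step of this two-step strategy can be sketched as follows. The code assumes that the stationary bound for observable $i$ and parameter $k$ (direction $v=e_{k}$) is $\sqrt{\tau_{i}}\sqrt{\mathcal{I}_{kk}}$, with $\tau_{i}$ the IAF of the observable and $\mathcal{I}_{kk}$ the diagonal entry of the path space FIM. The function names and the tolerance are hypothetical, and no values from Tables 5 and 6 are used.

```python
import numpy as np

def sensitivity_bounds(iaf, fim_diag):
    """Matrix of bounds sqrt(tau_i * I_kk): rows are observables, columns are parameters."""
    return np.sqrt(np.outer(np.asarray(iaf, dtype=float),
                            np.asarray(fim_diag, dtype=float)))

def screen(bounds, tol):
    """Index pairs (observable, parameter) whose bound exceeds tol.

    Only these pairs are passed on to a direct sensitivity estimator;
    the remaining pairs are treated as insensitive.
    """
    return [(int(i), int(k)) for i, k in np.argwhere(bounds > tol)]
```

For the genetic switch example the inputs would be the four estimated IAFs and the eight diagonal FIM entries, giving a $4\times 8$ matrix of bounds. The choice of `tol` controls how many of the $32$ pairs survive the screening.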

Appendix A Basics for information theory

We review some basic information theory concepts for completeness. In particular, we briefly reproduce the formulas for the relative entropy and the path space FIM in the context of stochastic reaction networks (i.e., continuous time jump Markov processes) pantazis2013relative .

Given the path space probability distribution $P_{[0,T]}^{c}$ and its perturbation $P_{[0,T]}^{c^{\prime}}$ on the path space $E$ , their pseudo-distance can be measured by the relative entropy

[TABLE]

In particular, the Radon-Nikodym derivative follows from the following Girsanov formula bremaud1981point

[TABLE]

where $R_{j}(t)$ is the count of the $j$ -th reaction up to time $t$ . Assuming the dynamics starts in the stationary regime (i.e., $\mu^{c}=\pi^{c}$ ) and using the fact that $R_{j}(t)-\int_{0}^{t}\lambda_{j}(X(s))ds$ is a martingale under $P_{[0,T]}^{c}$ , the relative entropy can be simplified to

[TABLE]

where

[TABLE]

is the path space relative entropy rate (RER). Note that when $c^{\prime}=c+\delta$ , the Taylor expansion of $\mathcal{H}(P^{c}|P^{c^{\prime}})$ gives

[TABLE]

where

[TABLE]

is the path space Fisher information matrix (FIM) of RER $\mathcal{H}(P^{c}|P^{c^{\prime}})$ .
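For mass-action propensities the expansion can be checked by hand. The following worked computation is a sketch under two assumptions: the RER has the form given in pantazis2013relative, written out in the first display, and each propensity is linear in its own rate constant, $\lambda_{j}^{c}(x)=c_{j}g_{j}(x)$ (the symbol $g_{j}$ is introduced here only for illustration).

```latex
% Assumed form of the RER (as in the cited reference, not re-derived here)
\mathcal{H}(P^{c}\,|\,P^{c'})
  = \mathbb{E}_{\pi^{c}}\Big[\sum_{j}\lambda_{j}^{c}(x)
      \log\frac{\lambda_{j}^{c}(x)}{\lambda_{j}^{c'}(x)}
      -\big(\lambda_{j}^{c}(x)-\lambda_{j}^{c'}(x)\big)\Big].

% With \lambda_{j}^{c}(x) = c_{j} g_{j}(x) and c' = c + \delta, the j-th term becomes
g_{j}(x)\Big[\delta_{j}-c_{j}\log\Big(1+\frac{\delta_{j}}{c_{j}}\Big)\Big]
  = \lambda_{j}^{c}(x)\,\frac{\delta_{j}^{2}}{2c_{j}^{2}}+O(|\delta_{j}|^{3}),

% so that, to second order,
\mathcal{H}(P^{c}\,|\,P^{c+\delta})
  = \frac{1}{2}\sum_{j}\frac{\delta_{j}^{2}}{c_{j}^{2}}\,
    \mathbb{E}_{\pi^{c}}\big[\lambda_{j}^{c}(x)\big]+O(|\delta|^{3}).
```

In this special case the path space FIM is diagonal with entries $\mathbb{E}_{\pi^{c}}[\lambda_{j}^{c}]/c_{j}^{2}$, so estimating it only requires the stationary averages of the propensities.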

Appendix B Path space information bounds: a formal derivation

For completeness, we give a formal derivation of the path space information bounds in both the transient regime and the stationary regime; see dupuis2016path for a rigorous derivation. We consider a continuous time Markov process $X(t,c)$ with stationary distribution $\pi^{c}$ . For the path space measure $P_{[0,T]}^{c}$ , we assume that it is absolutely continuous with respect to a reference measure $R_{[0,T]}$ such that $P_{[0,T]}^{c}(dx)=p_{[0,T]}^{c}(x){R_{[0,T]}(dx)}$ for any $c$ . Then by the definition of sensitivity indices and the Cauchy-Schwarz inequality,

[TABLE]

where $\mathcal{I}(P_{[0,T]}^{c})$ is the FIM of the relative entropy $\mathcal{R}(P_{[0,T]}^{c}|P_{[0,T]}^{c+\epsilon v})$ . This gives the path space information bound in the transient regime.
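The Cauchy-Schwarz step can be written out as follows. This is a formal sketch that assumes the sensitivity index is the directional derivative of $\mathbb{E}_{P^{c}}[f]$ along $v$ and that differentiation and integration can be interchanged. The subscript $[0,T]$ is dropped to lighten the notation.

```latex
% Differentiate under the integral and center f using \int v^{\top}\nabla_{c}p^{c}\,dR = 0
S_{f,v}(P^{c})
  = \int f(x)\,v^{\top}\nabla_{c}p^{c}(x)\,R(dx)
  = \int \big(f(x)-\mathbb{E}_{P^{c}}[f]\big)\,
         v^{\top}\nabla_{c}\log p^{c}(x)\,P^{c}(dx).

% Cauchy-Schwarz in L^{2}(P^{c})
|S_{f,v}(P^{c})|
  \le \sqrt{\mathrm{Var}_{P^{c}}(f)}\;
      \sqrt{v^{\top}\,\mathbb{E}_{P^{c}}\big[\nabla_{c}\log p^{c}\,
            (\nabla_{c}\log p^{c})^{\top}\big]\,v}.
```

The matrix inside the second square root is the Fisher information matrix of the family $P^{c}$, which is how the FIM enters the bound.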

In the stationary regime, we focus on the ergodic average type observable $F(x)=T^{-1}\int_{0}^{T}f(x(s))\,ds$ . Since the stationary distribution $\pi^{c}$ is also the initial distribution of the stochastic process $X(t,c)$ , it holds that $S_{F,v}(P_{[0,T]}^{c})=S_{f,v}(\pi^{c})$ . Hence by the path space information bounds for $F$ ,

[TABLE]

Taking $T\to\infty$ ,

[TABLE]

where we assumed that $T^{-1}\mathcal{I}(P_{[0,T]}^{c})$ converges to the path space FIM $\mathcal{I}_{\mathcal{H}}(P^{c})$ . Note that

[TABLE]

gives the integrated auto-correlation function (IAF) for observable $f$ , where

[TABLE]

is the covariance between $f(x(u))$ and $f(x(v))$ . We denote the IAF by

[TABLE]

We remark that the IAF differs from the integrated auto-correlation time (IAT) only by the multiplicative factor $\text{Cov}_{f}(0,0)$ , i.e., $\text{IAF}=\text{Cov}_{f}(0,0)\times\text{IAT}$ . Finally, we have the stationary path space information bounds

[TABLE]

Acknowledgements.

The work of P.P. has been partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under the contract number DE-SC0010549. The work of T.W. was partially supported by the DARPA project W911NF-15-2-0122. We thank Professor Tiejun Li for discussions about the genetic switches example.

Bibliography

[1] D. F. Anderson. An efficient finite difference method for parameter sensitivities of continuous time Markov chains. SIAM Journal on Numerical Analysis, 50(5):2237–2258, 2012.
[2] D. Angeli, J. E. Ferrell, and E. D. Sontag. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proceedings of the National Academy of Sciences, 101(7):1822–1827, 2004.
[3] G. Arampatzis, M. A. Katsoulakis, and Y. Pantazis. Accelerated sensitivity analysis in high-dimensional stochastic reaction networks. PLoS ONE, 10(7):e0130825, 2015.
[4] D. Aristoff. The parallel replica method for computing equilibrium averages of Markov chains. Monte Carlo Methods and Applications, 21(4):255–273, 2015.
[5] D. Aristoff, T. Lelièvre, and G. Simpson. The parallel replica method for simulating long trajectories of Markov chains. Appl. Math. Res. Express., 2014(2):332–352, 2014.
[6] M. Assaf, E. Roberts, and Z. Luthey-Schulten. Determining the stability of genetic switches: explicitly accounting for mRNA noise. Phys. Rev. Lett., 106(24):248102, 2011.
[7] A. Binder, T. Lelièvre, and G. Simpson. A generalized parallel replica dynamics. J. Comput. Phys., 284:595–616, 2015.
[8] P. Brémaud. Point processes and queues: martingale dynamics. 1981.