Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations

Ankit Gupta; Muruhan Rathinam; Mustafa Khammash

arXiv:1703.00947·math.PR·January 12, 2018·SIAM J. Numer. Anal.

TL;DR

This paper introduces a new, efficient method for estimating parameter sensitivities in stochastic reaction networks using tau-leap simulations, reducing computational cost while maintaining accuracy.

Contribution

The authors develop a novel integral representation for sensitivity estimation that can be approximated by any tau-leap method, improving efficiency over existing techniques.

Findings

1. The method is easy to implement and compatible with any tau-leap scheme.

2. It achieves similar accuracy to the underlying tau-leap method.

3. Numerical examples demonstrate significant efficiency gains.

Abstract

We consider the important problem of estimating parameter sensitivities for stochastic models of reaction networks that describe the dynamics as a continuous-time Markov process over a discrete lattice. These sensitivity values are useful for understanding network properties, validating their design and identifying the pivotal model parameters. Many methods for sensitivity estimation have been developed, but their computational feasibility suffers from the critical bottleneck of requiring time-consuming Monte Carlo simulations of the exact reaction dynamics. To circumvent this problem one needs to devise methods that speed up the computations while suffering acceptable and quantifiable loss of accuracy. We develop such a method by first deriving a novel integral representation of parameter sensitivity and then demonstrating that this integral may be approximated by any convergent…

Tables (5)

Table 1: Trade-off relationships among the bias $\mathcal{B}(\mathcal{X})$, variance $\mathcal{V}(\mathcal{X})$ and the computational cost $\mathcal{C}(\mathcal{X})$ for existing sensitivity estimation methods. Here $h$ is the perturbation size for finite-difference schemes [2, 40] and $M_{0}$ quantifies the number of auxiliary paths for APA [25] and PPA [26]. The cost of exactly simulating the underlying process is $\mathcal{C}_{0}$.

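Type	Method $\mathcal{X}$	Trade-off quantities	Trade-off parameter	Preserved quantity
Biased	CRP, CFD	$\mathcal{B}(\mathcal{X})$ and $\mathcal{V}(\mathcal{X})$	$h$	$\mathcal{C}(\mathcal{X}) \approx 2\mathcal{C}_{0}$
Unbiased	APA, PPA	$\mathcal{V}(\mathcal{X})$ and $\mathcal{C}(\mathcal{X})$	$M_{0}$	$\mathcal{B}(\mathcal{X}) = 0$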

Table 2: Reactions for the Repressilator network [13]. Here $x=(x_{1},\dots,x_{6})$ denotes the copy-numbers of the 6 network species ordered as $M_{1}$, $M_{2}$, $M_{3}$, $P_{4}$, $P_{5}$ and $P_{6}$.

No.	Reaction	Propensity
1	$\emptyset ⟶ M_{1}$	$λ_{1} (x) = 1 + 100 / (1 + x_{5}^{α_{1}})$
2	$\emptyset ⟶ M_{2}$	$λ_{2} (x) = 1 + 100 / (1 + x_{6}^{α_{2}})$
3	$\emptyset ⟶ M_{3}$	$λ_{3} (x) = 1 + 100 / (1 + x_{4}^{α_{3}})$
4	$M_{1} ⟶ \emptyset$	$λ_{4} (x) = x_{1}$
5	$M_{2} ⟶ \emptyset$	$λ_{5} (x) = x_{2}$
6	$M_{3} ⟶ \emptyset$	$λ_{6} (x) = x_{3}$
7	$M_{1} ⟶ M_{1} + P_{1}$	$λ_{7} (x) = 50 x_{1}$
8	$M_{2} ⟶ M_{2} + P_{2}$	$λ_{8} (x) = 50 x_{2}$
9	$M_{3} ⟶ M_{3} + P_{3}$	$λ_{9} (x) = 50 x_{3}$
10	$P_{1} ⟶ \emptyset$	$λ_{10} (x) = γ_{1} x_{4}$
11	$P_{2} ⟶ \emptyset$	$λ_{11} (x) = γ_{2} x_{5}$
12	$P_{3} ⟶ \emptyset$	$λ_{12} (x) = γ_{3} x_{6}$

Table 3: Birth-death model: Sensitivity estimation results for $T=5,10$. For all the methods, $N=10^{5}$ samples are used to estimate the following quantities: the estimator mean (2.6), the standard deviation (2.9), the relative error (RE) percentage (4.37) and the relative standard deviation adjusted computation cost (RSDCC) (4.38) in seconds. The exact sensitivity values are $-90.204$ for $T=5$ and $-264.241$ for $T=10$.

	eIPA				$τ$ IPA
$T$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
5	-90.079	0.093	0.139	0.379E-5	-90.938	0.078	0.813	0.121E-5
10	-264.5	0.309	0.099	0.97E-5	-266.34	0.243	0.793	0.247E-5
	eCFD				$τ$ CFD
$T$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
5	-90.632	0.088	0.4746	0.078E-5	-86.456	0.089	4.155	0.033E-5
10	-268.77	0.142	1.716	0.054E-5	-268.214	0.146	1.503	0.021E-5
	eCRP				$τ$ CRP
$T$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
5	-90.749	0.097	0.604	0.343E-5	-86.481	0.098	4.128	0.343E-5
10	-268.82	0.169	1.734	0.152E-5	-267.92	0.173	1.393	0.131E-5

Table 4: Genetic toggle switch: Sensitivity estimation results w.r.t. all the model parameters $\alpha_{1},\alpha_{2},\beta$ and $\gamma$. For all the methods, $N=10^{5}$ samples are used to estimate the following quantities: the estimator mean (2.6), the standard deviation (2.9), the relative error (RE) percentage (4.37) and the relative standard deviation adjusted computation cost (RSDCC) (4.38) in seconds. The true sensitivity values are approximately $1.195\pm 0.009$ for $\theta=\alpha_{1}$, $-2.1194\pm 0.01$ for $\theta=\alpha_{2}$, $-5.9929\pm 0.035$ for $\theta=\beta$ and $54.5721\pm 0.133$ for $\theta=\gamma$. These values are estimated with eIPA using $10^{6}$ samples and they are expressed in the form $s_{0}\pm l$, which signifies that the $99\%$ confidence interval is $(s_{0}-l,s_{0}+l)$.

	eIPA				$τ$ IPA
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	1.202	0.0107	0.625	0.0046	1.185	0.0131	0.822	0.0023
$α_{2}$	-2.133	0.0132	0.663	0.0021	-2.3968	0.0148	13.087	0.0008
$β$	-5.924	0.0419	1.144	0.0020	-8.5372	0.0562	42.456	0.0008
$γ$	54.372	0.1679	0.367	0.0009	60.156	0.191	10.232	0.0003
	eCFD				$τ$ CFD
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	1.053	0.11	11.883	0.1925	1.183	0.0491	1.021	0.0088
$α_{2}$	-2.007	0.267	5.305	0.3219	-2.734	0.0991	29.011	0.0066
$β$	-5.865	0.4535	2.1339	0.1053	-8.787	0.1813	46.617	0.0021
$γ$	54.67	1.1589	0.1794	0.0080	59.431	0.3907	8.9044	0.0002
	eCRP				$τ$ CRP
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	1.158	0.0793	3.13	0.0919	1.129	0.0781	5.4895	0.0562
$α_{2}$	-1.999	0.1306	5.701	0.0823	-2.415	0.1109	13.9646	0.0254
$β$	-6.21	0.1777	3.625	0.0161	-8.853	0.2198	47.7203	0.0074
$γ$	54.546	0.4756	0.0469	0.0015	59.807	0.4267	9.5925	0.0006

Table 5: Repressilator model: Sensitivity estimation results w.r.t. model parameters $\alpha_{1},\alpha_{2},\alpha_{3},\gamma_{1},\gamma_{2}$ and $\gamma_{3}$. For all the methods, $N=10^{5}$ samples are used to estimate the following quantities: the estimator mean (2.6), the standard deviation (2.9), the relative error (RE) percentage (4.37) and the relative standard deviation adjusted computation cost (RSDCC) (4.38) in seconds. The exact sensitivity values are approximately $-68.6271\pm 1$ for $\theta=\alpha_{1}$, $-2979.88\pm 8$ for $\theta=\alpha_{2}$, $145.041\pm 0.7$ for $\theta=\alpha_{3}$, $257.091\pm 7.4$ for $\theta=\gamma_{1}$, $-119.526\pm 0.9$ for $\theta=\gamma_{2}$ and $-27.8796\pm 4.5$ for $\theta=\gamma_{3}$. These values are estimated with eIPA using $10^{6}$ samples and they are expressed in the form $s_{0}\pm l$, which signifies that the $99\%$ confidence interval is $(s_{0}-l,s_{0}+l)$.

	eIPA				$τ$ IPA
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	-67.73	1.17	1.31	1.6801	-65.2	0.8	5	0.2886
$α_{2}$	-2982.2	10.6	0.078	0.0193	-2821.8	7.66	5.3	0.0053
$α_{3}$	145.36	1	0.22	0.2880	131.04	0.73	9.66	0.0623
$γ_{1}$	259.45	8.86	0.92	2.0139	250.4	8.2	2.6	0.7723
$γ_{2}$	-119.38	1.01	0.13	0.4097	-90.78	0.74	24.1	0.1251
$γ_{3}$	-30.38	7.82	8.98	104.45	-23.45	2.97	15.75	11.484
	eCFD				$τ$ CFD
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	-633.79	6.21	823.5	0.0334	-621.15	2.1	805.1	0.0017
$α_{2}$	-2987.1	10.01	0.24	0.0039	-2891.5	7.16	2.97	0.0009
$α_{3}$	356.95	22.3	146.1	1.3379	206.2	5.3	42.19	0.0972
$γ_{1}$	265.69	4.59	3.34	0.1019	250.5	1.43	2.5	0.0048
$γ_{2}$	-51.61	10.7	56.8	14.764	-22.8	1.16	80.9	0.3845
$γ_{3}$	-31.74	4.98	13.85	8.407	-24.61	1.42	11.72	0.4871
	eCRP				$τ$ CRP
$θ$	Mean	Std Dev	RE $%$	RSDCC	Mean	Std Dev	RE $%$	RSDCC
$α_{1}$	-648.1	2.38	844.3	0.0039	-620.9	2.1	804.7	0.0028
$α_{2}$	-3076.6	10.5	3.2	0.0033	-2897.2	7.5	2.8	0.0016
$α_{3}$	349.55	4.18	141	0.041	216.6	5.09	49.4	0.1315
$γ_{1}$	260.01	1.23	1.14	0.0064	251.7	1.41	2.1	0.0075
$γ_{2}$	-41.29	0.6	65.5	0.0602	-21.91	1.16	81.7	0.6639
$γ_{3}$	-33.98	0.52	21.88	0.0666	-23.78	0.91	14.7	0.3494

Equations (193)

$$S_{\theta}(f,T) := \frac{\partial}{\partial\theta}\,\mathbb{E}(f(X_{\theta}(T))).$$

$$A h(x) = \sum_{k=1}^{K}\lambda_{k}(x)\,(h(x+\zeta_{k})-h(x)),$$

$$X(t) = X(0) + \sum_{k=1}^{K} Y_{k}\left(\int_{0}^{t}\lambda_{k}(X(s))\,ds\right)\zeta_{k},$$

$$\frac{dp_{t}(x)}{dt} =$$

$$S_{\theta}(f,T) \approx \mathbb{E}(s_{\theta}(f,T)).$$

$$\widehat{\mu}_{N} = \frac{1}{N}\sum_{i=1}^{N} s_{i}.$$

$$\mu = \mathbb{E}(\widehat{\mu}_{N}) = \mathbb{E}(s_{\theta}(f,T)) \quad\text{and}\quad \sigma_{N}^{2} = \textnormal{Var}(\widehat{\mu}_{N}) = \frac{\sigma^{2}}{N}$$

$$\frac{\textnormal{RSD}}{\sqrt{N}} \leq \epsilon,$$

$$\widehat{\sigma}_{N} = \sqrt{\frac{1}{N(N-1)}\sum_{i=1}^{N}(s_{i}-\widehat{\mu}_{N})^{2}}.$$

$$N_{\epsilon}\,\mathcal{C}(\mathcal{X}) \approx (\textnormal{RSD}(\mathcal{X}))^{2}\,\mathcal{C}(\mathcal{X})\,\epsilon^{-2} = \frac{\mathcal{V}(\mathcal{X})}{(\mu(\mathcal{X}))^{2}}\,\mathcal{C}(\mathcal{X})\,\epsilon^{-2},$$

$$S_{\theta,h}(f,T) = \frac{\mathbb{E}\left(f(X_{\theta+h}(T)) - f(X_{\theta}(T))\right)}{h},$$

$$s_{\theta,h}(f,T) = \frac{f(X_{\theta+h}(T)) - f(X_{\theta}(T))}{h}.$$

$$S_{\theta}(f,T) \approx \mathbb{E}(s_{\theta}^{(\tau)}(f,T)).$$

$$R_{k}(t) = Y_{k}\left(\int_{0}^{t}\lambda_{k}(X(x_{0},s))\,ds\right)\zeta_{k},$$

$$Z_{\alpha,\beta}(x_{0},t_{i+1}) = Z_{\alpha,\beta}(x_{0},t_{i}) + \sum_{k=1}^{K}\zeta_{k}R_{k,i,\alpha,\beta} \quad\text{for } i=1,\dots,\mu,$$

$$|f(x)| \leq C(1+\|x\|^{p}) \quad\text{for all } x\in\mathbb{N}^{d}_{0}.$$

$$\sup_{t\in[0,T]}\left|\mathbb{E}(\|Z_{\alpha,\beta}(x_{0},t)\|^{p}) - \mathbb{E}(\|X(x_{0},t)\|^{p})\right|$$

$$\leq$$

$$\|\mu\|_{p} = \sum_{x\in Z}\frac{1}{2}(1+\|x\|^{p})\,|\mu(x)|,$$

$$M_{p} = \{\mu: Z\to\mathbb{R} \mid \|\mu\|_{p}<\infty\},$$

$$\sup_{t\in[0,T]}\left(1+\mathbb{E}(\|X(x_{0},t)\|^{p})\right)$$

$$\text{and}\quad \sup_{t\in[0,T]}\left(1+\mathbb{E}(\|Z_{\alpha,\beta}(x_{0},t)\|^{p})\right)$$

$$\sup_{t\in[0,T]}\left|\mathbb{E}(\phi(Z_{\alpha,\beta}(x_{0},t),t)) - \mathbb{E}(\phi(X(x_{0},t),t))\right|$$

$$\sup_{t\in[0,T]}\left|\mathbb{E}(\phi(X(x_{0},t),t))\right|$$

$$\text{and}\quad \sup_{t\in[0,T]}\left|\mathbb{E}(\phi(Z_{\alpha,\beta}(x_{0},t),t))\right|$$

$$\Psi_{\theta}(x,f,t) = \mathbb{E}(f(X_{\theta}(t)) \mid X_{\theta}(0)=x),$$

$$\Delta_{\zeta_{k}}h(x) = h(x+\zeta_{k}) - h(x).$$

$$S_{\theta}(f,T) = \frac{\partial}{\partial\theta}\Psi_{\theta}(x_{0},f,T) = \sum_{k=1}^{K}\mathbb{E}\left(\int_{0}^{T}\frac{\partial\lambda_{k}(X_{\theta}(t),\theta)}{\partial\theta}\,\Delta_{\zeta_{k}}\Psi_{\theta}(X_{\theta}(t),f,T-t)\,dt\right).$$

$$\frac{\partial\lambda_{k}(X_{\theta}(t),\theta)}{\partial\theta}.$$

$$\Delta_{\zeta_{k}}\Psi_{\theta}(X_{\theta}(t)+\zeta_{k},f,T-t).$$

$$S(f,T) = \sum_{k=1}^{K}\mathbb{E}\left(\int_{0}^{T}\partial\lambda_{k}(X(t))\,\Delta_{\zeta_{k}}\Psi(X(t),f,T-t)\,dt\right).$$

$$\tau_{\max} = \sup_{t\in[0,T]}\{|\beta_{0}|,|\beta_{1}(t)|\}$$

Full text

Ankit Gupta, Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.

Muruhan Rathinam, Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, U.S.A.

Mustafa Khammash, Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.

Abstract

We consider the important problem of estimating parameter sensitivities for stochastic models of reaction networks that describe the dynamics as a continuous-time Markov process over a discrete lattice. These sensitivity values are useful for understanding network properties, validating their design and identifying the pivotal model parameters. Many methods for sensitivity estimation have been developed, but their computational feasibility suffers from the critical bottleneck of requiring time-consuming Monte Carlo simulations of the exact reaction dynamics. To circumvent this problem one needs to devise methods that speed up the computations while suffering acceptable and quantifiable loss of accuracy. We develop such a method by first deriving a novel integral representation of parameter sensitivity and then demonstrating that this integral may be approximated by any convergent tau-leap method. Our method is easy to implement, works with any tau-leap simulation scheme and its accuracy is proved to be similar to that of the underlying tau-leap scheme. We demonstrate the efficiency of our methods through numerical examples. We also compare our method with the tau-leap versions of certain finite-difference schemes that are commonly used for sensitivity estimations.

Keywords: parameter sensitivity; reaction networks; Markov process; tau-leap simulations

Mathematical Subject Classification (2010): 60J22; 60J27; 60H35; 65C05.

1 Introduction

The study of chemical reaction networks is an essential component of the emerging fields of Systems and Synthetic Biology [1, 44, 17]. Traditionally chemical reaction networks were modeled in the deterministic setting, where the dynamics is represented by a set of ordinary differential equations (ODEs) or partial differential equations (PDEs). In the study of intracellular chemical reactions, some chemical species are present in low copy numbers. Since the behavior of individual molecules is best described by a stochastic process, in the low molecular copy number regime the copy numbers of the molecular species are themselves better modeled by a stochastic process than by ODEs [19]. Only in the limit of large molecular copy numbers does one expect the deterministic models to be accurate [3]. While our work in this paper is focused on biochemical reaction networks as primary examples, we emphasize that the mathematical framework of reaction networks can also be used to describe a wide range of other phenomena in fields such as Epidemiology [28] and Ecology [8].

Suppose $\theta$ is a parameter (like ambient temperature, cell-volume, ATP concentration etc.) that influences the rate of firing of reactions. Let $(X_{\theta}(t))_{t\geq 0}$ be the $\theta$ -dependent Markov process representing the reaction dynamics, and suppose that for some real-valued function $f$ and observation time $T$ , our output of interest is $f(X_{\theta}(T))$ . This output is a random variable and we are interested in determining the sensitivity of its expectation $\mathbb{E}(f(X_{\theta}(T)))$ w.r.t. infinitesimal changes in the parameter $\theta$ . We define this sensitivity value, denoted by $S_{\theta}(f,T)$ , as the partial derivative

$$S_{\theta}(f,T) := \frac{\partial}{\partial\theta}\,\mathbb{E}(f(X_{\theta}(T))). \tag{1.1}$$

Determining these parametric-sensitivity values is useful in many applications, such as understanding network design and its robustness properties [42], identifying critical reaction components, inferring model parameters [16] and fine-tuning a system’s behavior [15].

Generally the sensitivities of the form (1.1) cannot be directly evaluated; instead, they need to be estimated with Monte Carlo simulations of the dynamics $(X_{\theta}(t))_{t\geq 0}$. Many methods have been developed for this task [23, 33, 40, 41, 2, 25, 26], but they all rely on exact simulations of $(X_{\theta}(t))_{t\geq 0}$ that can be performed using schemes such as Gillespie’s stochastic simulation algorithm (SSA) [19]. This severely constrains the computational feasibility of these sensitivity estimation methods because these exact simulations become highly impractical if the rate of occurrence of reactions is high [21], which is typically the case. The main difficulty is that exact simulation schemes keep track of each reaction event, which is very time-consuming. To avoid this problem, tau-leaping methods have been developed that proceed by combining many reaction-firings over small time intervals [20]. Tau-leap methods have been shown to produce good approximations of the reaction dynamics, at a small fraction of the computational cost of exact simulations [20, 11, 36, 43, 5, 39, 47, 48, 29, 32]. Their accuracy and stability have also been investigated theoretically in many papers [38, 31, 6, 35, 37].

Our goal in this paper is to develop a method that takes advantage of the computational efficiency of tau-leap methods for the purpose of estimating sensitivity values of the form (1.1). Since tau-leap methods introduce a bias in the estimation, it is highly desirable to start with an unbiased method for computing sensitivities (instead of biased methods such as the Finite Difference (FD)) and then replace exact SSA simulations by a suitable tau-leap method. Having only one form of bias, modulated by the tau-leap step size, allows one to control the bias more effectively and also facilitates the design of multilevel strategies that eliminate or reduce the estimator bias and enhance its computational efficiency [7, 30, 32]. Among the existing methods in the literature, only the Girsanov Transformation (GT) method [22, 33], the Auxiliary Path Algorithm (APA) [25] and the Poisson Path Algorithm (PPA) [26] are unbiased. Since the GT method in general suffers from large variance [26, 2, 25, 40, 41, 45] and the APA/PPA methods are not directly amenable to tau-leap approximation, we develop a variant of the PPA method in which exact SSA simulations are replaced by tau-leap simulations. Our method, called Tau Integral Path Algorithm ($\tau$ IPA), works with any underlying tau-leap simulation scheme and it is based on a novel integral representation of parameter sensitivity $S_{\theta}(f,T)$ that we derive in this paper. We provide computational examples that show that using $\tau$ IPA we can often trade off a small amount of bias for large savings in the overall computational costs for sensitivity estimation. We prove that the bias incurred by $\tau$ IPA depends on the step-size in the same way as the bias of the tau-leap scheme chosen for simulations.
Moreover if we substitute the tau-leap simulations in $\tau$ IPA with the exact SSA generated simulations, then we obtain a new unbiased method for sensitivity estimation which we call the ‘exact’ IPA or eIPA that is similar to the PPA method in [26]. Two main reasons for the high variance of the GT method that have been identified in the existing literature are: 1) low magnitude of the sensitivity parameter $\theta$ (see [26, 25]) and 2) large system-size or volume under the classical volume scaling of the reaction network [45]. The second issue is somewhat resolved by the centered Girsanov Transformation (CGT) method [45, 52] and our numerical results indicate that the volume scaling behavior of eIPA is similar to CGT (see Section 4.1). However eIPA does not suffer from high variance when the sensitivity parameter $\theta$ is small. In addition, when $\theta=0$ , GT or CGT methods are not even applicable while eIPA does not suffer from this restriction. These observations make eIPA more appealing than CGT for unbiased estimation of parameter sensitivity.

For the sake of comparison, we use tau-leap versions of certain commonly used finite-difference estimators (see [2, 40, 51]) that approximate the infinitesimal derivative in (1.1) by a finite-difference (see (2.11)). Such estimators are computationally faster than $\tau$ IPA (in simulation time per trajectory) but they suffer from two sources of bias (finite-differencing and tau-leap approximations) unlike $\tau$ IPA which only incurs bias from the latter source. We note that while in some examples the biases nearly cancel each other fortuitously, as a general principle one has no logical reason to expect such cancellation.

This paper is organized as follows. In Section 2 we describe the stochastic model for reaction dynamics and the sensitivity estimation problem. We also discuss the existing sensitivity estimation methods, the tau-leap simulation schemes and explain the rationale for using such simulations in sensitivity estimation. Section 3 contains the main results of this paper which include a novel integral representation of the exact sensitivity in Section 3.2, a result on error bounds for the sensitivity estimates of $\tau$ IPA in Section 3.3 and the novel tau-leap sensitivity estimation method $\tau$ IPA in Section 3.4. In Section 4 we provide computational examples to compare our method with other methods and finally in Section 5 we conclude and provide directions for future research.

2 Preliminaries

Consider a reaction network with $d$ species and $K$ reactions. We describe its kinetics by a continuous-time Markov process whose state at any time is a vector in the non-negative integer orthant $\mathbb{N}^{d}_{0}$ comprising the molecular counts of all the $d$ species. The state evolves due to transitions caused by the firing of reactions. We suppose that when the state is $x$, the rate of firing of the $k$-th reaction is given by the propensity function $\lambda_{k}(x)$ and the corresponding state-displacement is denoted by the stoichiometric vector $\zeta_{k}\in\mathbb{Z}^{d}$. There are several ways to represent the Markov process $(X(t))_{t\geq 0}$ that describes the reaction kinetics under these assumptions. We can specify the generator (see Chapter 4 in [14]) of this process by the operator

$$A h(x) = \sum_{k=1}^{K}\lambda_{k}(x)\,(h(x+\zeta_{k})-h(x)), \tag{2.2}$$

where $h$ is any bounded real-valued function on $\mathbb{N}^{d}_{0}$ . Alternatively we can express the Markov process directly by its random time-change representation (see Chapter 7 in [14])

$$X(t) = X(0) + \sum_{k=1}^{K} Y_{k}\left(\int_{0}^{t}\lambda_{k}(X(s))\,ds\right)\zeta_{k}, \tag{2.3}$$

where $\{Y_{k}:k=1,\dots,K\}$ is a family of independent unit rate Poisson processes. Since the process $(X(t))_{t\geq 0}$ is Markovian, it can be equivalently specified by writing the Kolmogorov forward equation for the evolution of its probability distribution $p_{t}(x):=\mathbb{P}(X(t)=x)$ at each state $x$ :

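$$\frac{dp_{t}(x)}{dt} = \sum_{k=1}^{K}\left(p_{t}(x-\zeta_{k})\,\lambda_{k}(x-\zeta_{k}) - p_{t}(x)\,\lambda_{k}(x)\right). \tag{2.4}$$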

This set of coupled ordinary differential equations (ODEs) is termed the Chemical Master Equation (CME) in the biological literature [3]. As the number of ODEs in this set is typically infinite, the CME is nearly impossible to solve directly, except in very restrictive cases. A common strategy is to estimate its solution with pathwise simulations of the process $(X(t))_{t\geq 0}$ using Monte Carlo schemes such as Gillespie’s SSA [19], the next reaction method [18], the modified next reaction method [4], and so on. While these schemes are easy to implement, they become computationally infeasible for even moderately large networks, because they account for each and every reaction event. To resolve this issue, tau-leaping methods have been developed which will be described in greater detail in Section 3.1.
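To make the cost of exact simulation concrete, here is a minimal Python sketch of Gillespie's SSA for a generic network given by stoichiometric vectors $\zeta_{k}$ and propensities $\lambda_{k}(x)$. The function names, the birth-death example network and its parameter values ($\theta=10$, $\gamma=1$) are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def gillespie_ssa(x0, stoich, propensities, t_end, rng):
    """Simulate one exact path on [0, t_end] and return the final state.

    x0           -- initial copy-number vector (list of ints)
    stoich       -- list of stoichiometric vectors zeta_k
    propensities -- function mapping a state x to the list of rates lambda_k(x)
    rng          -- a random.Random instance
    """
    x = list(x0)
    t = 0.0
    while True:
        rates = propensities(x)
        total = sum(rates)
        if total <= 0.0:
            return x  # no reaction can fire any more
        # The waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            return x
        # Choose reaction k with probability rates[k] / total.
        u = rng.random() * total
        k = 0
        acc = rates[0]
        while acc <= u and k < len(rates) - 1:
            k += 1
            acc += rates[k]
        x = [xi + zi for xi, zi in zip(x, stoich[k])]


# Illustrative birth-death network (assumed values): 0 -> S at rate theta,
# S -> 0 at rate gamma * x.
BD_STOICH = [[1], [-1]]


def bd_propensities(x, theta=10.0, gamma=1.0):
    return [theta, gamma * x[0]]
```

Every loop iteration handles exactly one reaction event, so the work per path grows with the total number of firings on $[0,T]$. This is the bottleneck that tau-leaping is designed to avoid.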

We now assume that each propensity function $\lambda_{k}$ depends on a real-valued system parameter $\theta$ . To emphasize this dependence we write the rate of firing of the $k$ -th reaction at state $x$ as $\lambda_{k}(x,\theta)$ instead of $\lambda_{k}(x)$ . Let $(X_{\theta}(t))_{t\geq 0}$ be the Markov process representing the reaction dynamics with these parameter-dependent propensity functions. As stated in the introduction, for a function $f:\mathbb{N}^{d}_{0}\to\mathbb{R}$ and an observation time $T\geq 0$ , our goal is to determine the sensitivity value $S_{\theta}(f,T)$ defined by (1.1). This value cannot be computed directly for most examples of interest and so we need to find ways of estimating it using simulations of the process $(X_{\theta}(t))_{t\geq 0}$ . Such simulation-based sensitivity estimation methods work by specifying the construction of a random variable $s_{\theta}(f,T)$ whose expected value is “close” to the true sensitivity value $S_{\theta}(f,T)$ , i.e.

$$S_{\theta}(f,T) \approx \mathbb{E}(s_{\theta}(f,T)). \tag{2.5}$$

Once such a construction is available, a large number (say $N$ ) of independent realizations $s_{1},\dots,s_{N}$ of this random variable $s_{\theta}(f,T)$ are obtained and the sensitivity is estimated by computing their empirical mean $\widehat{\mu}_{N}$ as

$$\widehat{\mu}_{N} = \frac{1}{N}\sum_{i=1}^{N} s_{i}. \tag{2.6}$$

This estimator $\widehat{\mu}_{N}$ is a random variable with mean and variance

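$$\mu = \mathbb{E}(\widehat{\mu}_{N}) = \mathbb{E}(s_{\theta}(f,T)) \quad\text{and}\quad \sigma_{N}^{2} = \textnormal{Var}(\widehat{\mu}_{N}) = \frac{\sigma^{2}}{N} \tag{2.7}$$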

respectively, where $\sigma^{2}=\textnormal{Var}(s_{\theta}(f,T))$. For a large sample size $N$, the distribution of $\widehat{\mu}_{N}$ is approximately Gaussian with mean $\mu$ and variance $\sigma^{2}_{N}$, due to the Central Limit Theorem. The standard deviation $\sigma_{N}$ measures the statistical spread of the estimator $\widehat{\mu}_{N}$: the smaller the spread, the higher the statistical precision. The sample size $N$ must be large enough to ensure that $\sigma_{N}$ is small relative to $\mu$, i.e. for some small parameter $\epsilon>0$, we should have

$$\frac{\textnormal{RSD}}{\sqrt{N}} \leq \epsilon, \tag{2.8}$$

where $\textnormal{RSD}:=\sigma/|\mu|$ is the relative standard deviation of the random variable $s_{\theta}(f,T)$ . If such a condition holds, then $\widehat{\mu}_{N}$ is a reliable estimator for the true sensitivity value $S_{\theta}(f,T)$ because it is very likely to assume a value close to its mean $\mu=\mathbb{E}(s_{\theta}(f,T))$ which in turn is close to $S_{\theta}(f,T)$ (see (2.5)). In practice both $\mu$ and $\sigma$ are unknown, but we can estimate them as $\mu\approx\widehat{\mu}_{N}$ and $\sigma\approx\sqrt{N}\widehat{\sigma}_{N}$ where

$$\widehat{\sigma}_{N} = \sqrt{\frac{1}{N(N-1)}\sum_{i=1}^{N}(s_{i}-\widehat{\mu}_{N})^{2}} \tag{2.9}$$

is the estimated standard deviation $\sigma_{N}$ of the estimator.
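The estimates above are straightforward to compute from a list of realizations. The following sketch evaluates the empirical mean (2.6), the estimator standard deviation (2.9) and the resulting estimate of the RSD. The function name and return layout are hypothetical choices made for illustration.

```python
import math


def estimator_summary(samples):
    """Return (mu_hat, sigma_hat_n, rsd_est) for realizations s_1, ..., s_N."""
    n = len(samples)
    mu_hat = sum(samples) / n                        # empirical mean, (2.6)
    sq_dev = sum((s - mu_hat) ** 2 for s in samples)
    sigma_hat_n = math.sqrt(sq_dev / (n * (n - 1)))  # std. dev. of the estimator, (2.9)
    sigma_est = math.sqrt(n) * sigma_hat_n           # estimate of sigma
    rsd_est = sigma_est / abs(mu_hat)                # estimate of RSD = sigma / |mu|
    return mu_hat, sigma_hat_n, rsd_est
```

Condition (2.8) can then be checked by comparing `sigma_hat_n / abs(mu_hat)` with the chosen tolerance $\epsilon$.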

The performance of any sensitivity estimation method (say $\mathcal{X}$ ) depends on the following three key metrics that are based on the properties of random variable $s_{\theta}(f,T)$ :

1. The bias $\mathcal{B}(\mathcal{X})=\mathbb{E}(s_{\theta}(f,T))-S_{\theta}(f,T)$, which is the error incurred by the approximation (2.5).

2. The variance $\mathcal{V}(\mathcal{X})=\textnormal{Var}(s_{\theta}(f,T))$ of the random variable $s_{\theta}(f,T)$.

3. The computational cost $\mathcal{C}(\mathcal{X})$ of generating one sample of $s_{\theta}(f,T)$.

The bias $\mathcal{B}(\mathcal{X})$ can be positive or negative, and its absolute value $|\mathcal{B}(\mathcal{X})|$ can be seen as the upper-bound on the statistical accuracy that can be achieved with method $\mathcal{X}$ by increasing the sample size $N$ [9]. As mentioned before, the standard deviation $\sigma(\mathcal{X})=\sqrt{\mathcal{V}(\mathcal{X})}$ measures the statistical precision of the method $\mathcal{X}$ and its magnitude relative to the mean $\mu(\mathcal{X})=\mathbb{E}(s_{\theta}(f,T))$ determines the number of samples $N$ that are needed to produce a reliable estimate. In particular, to satisfy condition (2.8) for the relative standard deviation $\textnormal{RSD}(\mathcal{X})=\sigma(\mathcal{X})/|\mu(\mathcal{X})|$ , the number of samples $N_{\epsilon}$ needed would be around $N_{\epsilon}:=(\textnormal{RSD}(\mathcal{X}))^{2}\epsilon^{-2}$ . Hence the total cost of the estimation procedure is

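$$N_{\epsilon}\,\mathcal{C}(\mathcal{X}) \approx (\textnormal{RSD}(\mathcal{X}))^{2}\,\mathcal{C}(\mathcal{X})\,\epsilon^{-2} = \frac{\mathcal{V}(\mathcal{X})}{(\mu(\mathcal{X}))^{2}}\,\mathcal{C}(\mathcal{X})\,\epsilon^{-2}, \tag{2.10}$$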

where $\mathcal{C}(\mathcal{X})$ is the CPU time required for constructing one realization of $s_{\theta}(f,T)$ . The goal of a good estimation method is to simultaneously minimize the three quantities $|\mathcal{B}(\mathcal{X})|$ , $\mathcal{V}(\mathcal{X})$ and $\mathcal{C}(\mathcal{X})$ . This creates various conflicts and trade-offs among the existing sensitivity estimation methods as we now discuss.
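As a worked illustration of the sample-size formula $N_{\epsilon}=(\textnormal{RSD}(\mathcal{X}))^{2}\epsilon^{-2}$, the sketch below recovers $\sigma\approx\sqrt{N}\,\widehat{\sigma}_{N}$ from a reported estimator standard deviation and then computes $N_{\epsilon}$ and the total cost. It assumes that the "Std Dev" column of Table 3 reports $\widehat{\sigma}_{N}$ from (2.9). The tolerance $\epsilon=0.01$ and the helper names are illustrative choices.

```python
import math


def required_samples(mean, sigma_hat_n, n_used, eps):
    """N_eps = (RSD / eps)^2, with sigma recovered as sqrt(N) * sigma_hat_N."""
    sigma = math.sqrt(n_used) * sigma_hat_n
    rsd = sigma / abs(mean)
    return (rsd / eps) ** 2


def total_cost(n_eps, cost_per_sample):
    """Total cost N_eps * C(X) of the estimation procedure."""
    return n_eps * cost_per_sample


# eIPA row of Table 3 for T = 5: mean -90.079, Std Dev 0.093, N = 10^5.
n_eps = required_samples(mean=-90.079, sigma_hat_n=0.093, n_used=10**5, eps=0.01)
```

Under these assumptions `n_eps` is on the order of $10^{3}$, so a relative precision of 1% would need far fewer samples than the $10^{5}$ used in the table.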

2.1 Biased methods

A sensitivity estimation method $\mathcal{X}$ is called biased if $\mathcal{B}(\mathcal{X})\neq 0$ . The most commonly used biased methods are the finite-difference schemes which approximate the infinitesimal derivative in the definition of parameter sensitivity (see (1.1)) by a finite-difference of the form

$$S_{\theta,h}(f,T) = \frac{\mathbb{E}\left(f(X_{\theta+h}(T)) - f(X_{\theta}(T))\right)}{h}, \tag{2.11}$$

for a small perturbation $h$ . The processes $X_{\theta}$ and $X_{\theta+h}$ represent the Markovian reaction dynamics with values of the sensitive parameter set to $\theta$ and $\theta+h$ respectively. These two processes can be simulated independently [23] but it is generally better to couple them in order to reduce the variance of the associated estimator. The two commonly used coupling strategies are called Common Reaction Paths (CRP) [40] and Coupled Finite Differences (CFD) [2] and they are based on the random time-change representation (2.3).

The finite-difference approximation (2.11) for the true sensitivity value can be expressed as the expectation $\mathbb{E}(s_{\theta,h}(f,T))$ of the following random variable

$$s_{\theta,h}(f,T) = \frac{f(X_{\theta+h}(T)) - f(X_{\theta}(T))}{h}. \tag{2.12}$$

The three metrics (bias, variance and computational cost) based on this random variable define the performance of CRP and CFD. Since both these methods estimate the same quantity $S_{\theta,h}(f,T)$, they have the same bias (i.e. $\mathcal{B}(\textnormal{CRP})=\mathcal{B}(\textnormal{CFD})$). However in many cases it is found that the CFD coupling is tighter than the CRP coupling, resulting in a lower variance of $s_{\theta,h}(f,T)$ (i.e. $\mathcal{V}(\textnormal{CFD})<\mathcal{V}(\textnormal{CRP})$) (see [2]). For each realization of $s_{\theta,h}(f,T)$, both CRP and CFD require simulation of a coupled trajectory $(X_{\theta},X_{\theta+h})$ in the time interval $[0,T]$. The computational cost of such a simulation is roughly $2\mathcal{C}_{0}$, where $\mathcal{C}_{0}$ is the cost of exactly simulating the process $X_{\theta}$ using Gillespie’s SSA [19] or a similar method.¹

¹ In fact the cost of generating a realization of $s_{\theta,h}(f,T)$ is usually smaller for CFD in comparison to CRP (i.e. $\mathcal{C}(\textnormal{CFD})<\mathcal{C}(\textnormal{CRP})$), because the CFD coupling is such that if $X_{\theta}(t)=X_{\theta+h}(t)$ for some $t<T$, then this equality will hold for the remaining time-interval $[t,T]$, allowing us to directly set $s_{\theta,h}(f,T)=0$ without completing the simulation in the interval $[t,T]$.
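The coupling idea behind CRP can be sketched in a few lines of Python. In this sketch each reaction $k$ gets its own seeded random stream that plays the role of the unit-rate Poisson process $Y_{k}$ in the random time-change representation (2.3), and the same streams are reused for the paths at $\theta$ and $\theta+h$. The simulation loop follows the modified next reaction method [4]. The birth-death network, its parameter values and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def next_reaction_path(x0, stoich, propensities, t_end, seeds):
    """Modified next reaction method with one private random stream per reaction."""
    streams = [random.Random(s) for s in seeds]
    x = list(x0)
    t = 0.0
    internal = [0.0] * len(stoich)                     # internal time T_k of Y_k
    next_jump = [r.expovariate(1.0) for r in streams]  # next jump time P_k of Y_k
    while True:
        rates = propensities(x)
        # Real time until each unit-rate process Y_k reaches its next jump.
        waits = [
            max(next_jump[k] - internal[k], 0.0) / rates[k] if rates[k] > 0 else float("inf")
            for k in range(len(rates))
        ]
        k = min(range(len(waits)), key=waits.__getitem__)
        dt = waits[k]
        if t + dt > t_end:  # also covers the case where no reaction can fire
            return x
        t += dt
        internal = [internal[j] + rates[j] * dt for j in range(len(rates))]
        x = [xi + zi for xi, zi in zip(x, stoich[k])]
        next_jump[k] += streams[k].expovariate(1.0)


def crp_sample(theta, h, t_end, sample_index, gamma=1.0):
    """One realization of s_{theta,h}(f, T) with f(x) = x for a birth-death network."""
    stoich = [[1], [-1]]  # 0 -> S at rate theta, S -> 0 at rate gamma * x
    seeds = [2 * sample_index, 2 * sample_index + 1]

    def props(th):
        return lambda x: [th, gamma * x[0]]

    x_base = next_reaction_path([0], stoich, props(theta), t_end, seeds)
    x_pert = next_reaction_path([0], stoich, props(theta + h), t_end, seeds)
    return (x_pert[0] - x_base[0]) / h
```

Averaging `crp_sample(theta, h, T, i)` over `i = 0, ..., N - 1` gives the finite-difference estimate. For this linear example the mean of $X_{\theta}(T)$ started from zero is $(\theta/\gamma)(1-e^{-\gamma T})$, so the finite difference carries no bias and the estimate can be checked against $(1-e^{-\gamma T})/\gamma$.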

Finite-difference schemes introduce a bias in the estimate whose size is proportional to the perturbation value $h$ (i.e. $\mathcal{B}(\textnormal{CRP})=\mathcal{B}(\textnormal{CFD})\propto h$ ), but the constant of proportionality can be quite large in many cases, leading to significant errors even for small values of $h$ [26]. Unfortunately we cannot circumvent this problem by choosing a very small $h$ because the variance is proportional to $1/h$ (i.e. $\mathcal{V}(\mathcal{\textnormal{CRP}}),\mathcal{V}(\mathcal{\textnormal{CFD}})\propto 1/h$ ). Therefore if a very small $h$ is selected, the variance will be enormous and the sample-size required to produce a statistically precise estimate will be very large, imposing a heavy computational burden on the estimation procedure [26]. This trade-off between bias and variance is the main drawback of finite-difference schemes and there does not exist a strategy for selecting $h$ that optimally balances these two quantities. Note that unlike bias and variance, the computational cost of generating a sample (i.e. $\mathcal{C}(\textnormal{CRP})$ or $\mathcal{C}(\textnormal{CFD})$ ) does not change significantly with $h$ , thereby ensuring that regardless of $h$ , the total computational burden varies linearly with the required number of samples $N$ . Apart from finite-difference schemes, there exists another biased method, called the regularized pathwise-derivative method [41] for estimating the sensitivity value (1.1), but we do not discuss this approach in this paper.

2.2 Unbiased methods

A sensitivity estimation method $\mathcal{X}$ is called unbiased if $\mathcal{B}(\mathcal{X})=0$ . The main advantage of unbiased methods is that the estimation can in principle be made as accurate as possible by increasing the sample size $N$ . The first unbiased method for sensitivity estimation is called the Girsanov Transformation (GT) method [22, 33], which works by estimating the $\theta$ -derivative of the probability distribution of $X_{\theta}$ . The GT method is easy to implement and the computation cost of generating each sample is roughly $\mathcal{C}_{0}$ – the cost of exact simulation of the process $X_{\theta}$ . The main issue with the GT method is that generally the variance of its associated random variable $s_{\theta}(f,T)$ is very large and so the number of samples needed to obtain a statistically precise estimate is very high [2, 40]. So far two reasons have been identified for this behavior. Firstly, it has been shown that for mass-action models (see [3]) this variance can become unbounded when the magnitude of the sensitive reaction rate-constant $\theta$ approaches zero [26, 25]. This is a serious issue because biological networks often consist of slow reactions which are characterized by low values of the associated rate-constants. Furthermore the GT method does not allow one to estimate the sensitivity w.r.t. a rate-constant set to zero. Such sensitivity values are useful for understanding network design as it allows one to probe the effect of presence or absence of reactions. Another reason for the high variance of GT estimator was provided in [45] where it was theoretically established that this variance can grow boundlessly as the system expands in size, i.e. the system volume $V$ tends to infinity. This issue is somewhat ameliorated by the centered Girsanov Transformation (CGT) method [52] but the problem with small reaction rate-constants persists.

We now discuss a couple of unbiased methods that have been recently proposed. These methods are called the Auxiliary Path Algorithm (APA) [25] and the Poisson Path Algorithm (PPA) [26], and they are based on exact representations of the form (2.5) for the parameter sensitivity (1.1). For both methods, sampling the random variable $s_{\theta}(f,T)$ requires simulation of a fixed number $M_{0}$ of additional paths of the process $X_{\theta}$. It was shown in [25] that in comparison to the GT method, the computational cost of generating each sample for APA is much higher (i.e. $\mathcal{C}(\textnormal{APA})\gg\mathcal{C}(\textnormal{GT})$) but this is often compensated by the fact that its variance is much lower (i.e. $\mathcal{V}(\textnormal{APA})\ll\mathcal{V}(\textnormal{GT})$), resulting in a smaller overall cost of estimation (2.10). The reason for the higher sampling cost for APA is that it needs estimates of certain unknown quantities at each jump-time of the process $X_{\theta}$ in the time interval $[0,T]$, which can be very large in number even for small networks. In PPA, this problem is resolved by randomly selecting a small number of these unknown quantities for estimation in such a way that the estimator remains unbiased. Due to this extra randomness, the sample variance for PPA is generally greater than APA (i.e. $\mathcal{V}(\textnormal{PPA})>\mathcal{V}(\textnormal{APA})$) but the computational cost for realizing each sample is much lower (i.e. $\mathcal{C}(\textnormal{PPA})\ll\mathcal{C}(\textnormal{APA})$). Moreover in comparison to APA, PPA is far easier to implement and has lower memory requirements, making it an attractive unbiased method for sensitivity estimation. In [26] it is shown using many examples that for a given level of statistical accuracy, PPA can be more efficient than GT and also the finite-difference schemes CFD and CRP.
The computational cost of generating each sample in PPA is roughly $(2M_{0}+1)\mathcal{C}_{0}$ , where $M_{0}$ is a small number that upper-bounds the expected number of unknown quantities that will be estimated using additional paths. For both APA and PPA, the parameter $M_{0}$ serves as a trade-off factor between the computational cost and the variance - as $M_{0}$ increases, the cost also increases but the variance decreases. However both these methods remain unbiased for any choice of $M_{0}$ .

The foregoing trade-off relationships for the existing sensitivity estimation methods are summarized in Table 1.

2.3 Rationale for using tau-leap schemes for sensitivity estimation

All the existing sensitivity estimation methods suffer from a critical bottleneck – they are all based on exact simulations of the process $X_{\theta}$ . The computational cost $\mathcal{C}_{0}$ of generating each trajectory of $X_{\theta}$ can be exorbitant even for moderately large networks when those networks have some molecular species in moderately large copy numbers and/or reactions firing at multiple timescales (stiff systems). One way to counter this problem is to develop methods that can accurately estimate parameter sensitivities with approximate computationally inexpensive simulations of the process $X_{\theta}$ obtained with tau-leap methods. The use of tau-leap simulations provides a natural way to trade-off a small amount of error with a potentially large reduction in the computational costs.

The explicit tau-leap method with Poisson random numbers proposed by Gillespie [20] generally works well in non-stiff situations and when molecular copy numbers are modestly large. The major drawback is that it becomes inefficient for stiff systems where vastly different time scales are present. The implicit tau-leap was proposed to remedy this weakness [36]. Many other tau-leap methods and step size selection strategies have been proposed to address stiffness and other issues [11, 43, 5, 39, 48, 47, 32].

In the context of stiff systems, tau-leap methods have not been as successful in maintaining accuracy while reducing computational cost in comparison with the success of stiff solvers for deterministic differential equations. This is because stiffness manifests in a more complex manner in stochastic systems where stability is not the only issue, but accurately capturing the asymptotic distribution of the fast variables is also important [36, 37, 49, 50]. We shall limit our attention to non-stiff or modestly stiff systems in this paper.

Our goal in this paper is to develop a method that can estimate parameter sensitivity $S_{\theta}(f,T)$ of the form (1.1) using only tau-leap simulations of the process $X_{\theta}$ . This can be done by specifying a random variable $s^{(\tau)}_{\theta}(f,T)$ which can be constructed with these tau-leap simulations and whose expected value is “close” to the true sensitivity value $S_{\theta}(f,T)$ , i.e.

$$S_{\theta}(f,T)\approx\mathbb{E}\left(s^{(\tau)}_{\theta}(f,T)\right).\tag{2.12}$$

We propose such a random variable $s^{(\tau)}_{\theta}(f,T)$ in this paper and provide a simple algorithm for generating the realizations of $s^{(\tau)}_{\theta}(f,T)$. We theoretically show that under certain reasonable conditions, the associated estimator is tau-convergent, which means that the bias incurred due to the approximation in (2.12) converges to $0$, as the maximum step-size $\tau_{\textnormal{max}}$ or the coarseness of the time-discretization mesh goes to $0$. Hence by making this mesh finer and finer, we can make the estimator as accurate as we desire, provided that we are willing to bear the increasing computational costs. In the context of estimating expected values $\mathbb{E}(f(X_{\theta}(T)))$, the property of tau-convergence along with the rate of convergence, has already been established for many tau-leap schemes [38, 31, 6, 35]. We use these pre-existing results and obtain a similar tau-convergence result for our sensitivity estimation method. An important feature of our approach is that it is completely flexible, as far as the choice of the tau-leap simulation method is concerned. Furthermore the order of accuracy of our sensitivity estimation method is the same as the order of accuracy of the underlying tau-leap method.

We end this section with observing that incorporating tau-leap schemes in sensitivity estimation opens up a new dimension in attacking this challenging problem. In the trade-off relationships for existing sensitivity estimation methods (see Table 1) parameters like $h$ and $M_{0}$ only allow us to explore one trade-off curve between the variance $\mathcal{V}(\mathcal{X})$ and some other metric like the bias $\mathcal{B}(\mathcal{X})$ (for $\mathcal{X}$ = CRP, CFD) or the computational cost $\mathcal{C}(\mathcal{X})$ (for $\mathcal{X}$ = APA, PPA). The main advantage of employing tau-leap schemes is that they provide a mechanism for exploring another trade-off curve between the bias $\mathcal{B}(\mathcal{X})$ and the computational cost $\mathcal{C}(\mathcal{X})$ , for the purpose of optimizing the performance of a sensitivity estimation method. In Section 4, we provide numerical examples to show that with tau-leap simulations we can indeed trade-off a small amount of bias with large savings in the computational effort required for estimating parameter sensitivity. Moreover this trade-off relationship appears to be independent of existing trade-off relationships mentioned in Table 1 because replacing exact simulations in a sensitivity estimation method, with approximate tau-leap simulations, usually does not alter the variance $\mathcal{V}(\mathcal{X})$ significantly at least when the tau step size is sufficiently small (see Section 4). Of course, the computational advantage of tau-leap schemes can only be appropriated if we can incorporate them into existing sensitivity estimation methods. The main contribution of this paper is to develop a method, similar to PPA, that works well with tau-leap schemes (see Section 3). For the sake of comparison, we also provide tau-leap versions of the finite-difference schemes (CRP and CFD) in Section 4.

3 Sensitivity estimation with tau-leap simulations

In this section we present our approach for accurately estimating parameter sensitivities of the form (1.1) with only approximate tau-leap simulations of the dynamics. This approach is based on an exact integral representation for parameter sensitivity given in Section 3.2. With this representation at hand, we construct a tau-leap estimator for parameter sensitivity and examine its convergence properties as the time-discretization mesh gets finer and finer (see Sections 3.3 and 3.4). Thereafter in Section 3.5 we present an algorithm that computes the tau-leap estimator for sensitivity estimation. We start with the description of a generic tau-leap method that approximately simulates the stochastic reaction paths defined by the Markov process $(X(x_{0},t))_{t\geq 0}$ with generator $\mathbb{A}$ (see (2.2)) and initial state $x_{0}$ .

3.1 A generic tau-leap method

For each reaction $k=1,\dots,K$ , let $R_{k}(t)$ be the number of firings of reaction $k$ until time $t$ . Due to (2.3) we can express each $R_{k}(t)$ as

$$R_{k}(t)=Y_{k}\left(\int_{0}^{t}\lambda_{k}(X(x_{0},s))\,ds\right),$$

where $\{Y_{k}:k=1,\dots,K\}$ is a family of independent unit rate Poisson processes. From now on we refer to $R(t)=(R_{1}(t),\dots,R_{K}(t))$ as the reaction count vector. For any two time values $s,t\geq 0$ (with $s<t$ ), the states at these times satisfy $X(x_{0},t)=X(x_{0},s)+\sum_{k=1}^{K}(R_{k}(t)-R_{k}(s))\zeta_{k}.$ At any given time $t$ and the computed (approximate) state $x$ at time $t$ , a tau-leap method entails taking either a predetermined step of size $\tau>0$ or choosing step-size $\tau$ as a function of the current state and time, i.e. step-size selection is adapted to the information sigma-algebra generated by the tau-leap process. Next an approximating distribution for the state at time $(t+\tau)$ is generated. This distribution is generally found by approximating the difference $(R(t+\tau)-R(t))$ in the reaction count vector by a random variable $\widetilde{R}=(\widetilde{R}_{1},\dots,\widetilde{R}_{K})$ whose probability distribution is easy to sample from. The most straightforward choice is given by the simple (explicit) Euler method [20], which assumes that the propensities are approximately constant in the time interval $[t,t+\tau)$ and conditioned on the information at time $t$ , each $\widetilde{R}_{k}$ is an independent Poisson random variable with rate $\lambda_{k}(x)\tau$ . Other distributions for $\widetilde{R}=(\widetilde{R}_{1},\dots,\widetilde{R}_{K})$ have also been used in the literature to obtain better approximations and particularly to prevent the state-components from becoming negative [43]. The selection method for step size $\tau$ also varies, with the simplest being steps based on a deterministic mesh $0=t_{0}<t_{1}\dots<t_{n}=T$ over the observation time interval $[0,T]$ . To obtain better accuracy several strategies have been proposed that randomly select $\tau$ based on some criteria such as avoidance of negative state-components or constancy of conditional propensities [11, 5, 32].
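The explicit Euler tau-leap step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `explicit_tau_leap`, the birth-death network and its rate values are assumptions made here for demonstration, and the sketch does nothing to prevent negative state-components beyond clipping negative propensities to zero.

```python
import numpy as np

def explicit_tau_leap(x0, zeta, propensities, mesh, rng):
    """Explicit (Euler) tau-leap path on a deterministic mesh.

    x0           : initial state, shape (d,)
    zeta         : stoichiometric vectors, shape (K, d); row k is zeta_k
    propensities : callable x -> array of shape (K,) holding lambda_k(x)
    mesh         : increasing times 0 = t_0 < ... < t_n = T
    rng          : numpy.random.Generator
    Returns the approximate states at the mesh points, shape (n + 1, d).
    """
    x = np.asarray(x0, dtype=np.int64)
    states = [x.copy()]
    for tau in np.diff(mesh):
        # Propensities are frozen at the current state over [t, t + tau).
        lam = np.maximum(propensities(x), 0.0)
        # Each reaction count is an independent Poisson(lambda_k(x) * tau).
        firings = rng.poisson(lam * tau)
        # State update: x + sum_k firings_k * zeta_k.
        x = x + firings @ zeta
        states.append(x.copy())
    return np.array(states)

# Hypothetical birth-death network: 0 -> S at rate theta, S -> 0 at rate gamma * x.
theta, gamma = 10.0, 1.0
zeta = np.array([[1], [-1]])
props = lambda x: np.array([theta, gamma * x[0]])
mesh = np.linspace(0.0, 5.0, 51)
path = explicit_tau_leap([0], zeta, props, mesh, np.random.default_rng(0))
```

Other choices of the distribution of the reaction counts, or an adaptive rule for the step size, would replace the two lines that compute `lam` and `firings` and the loop over `np.diff(mesh)` respectively.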

To represent a generic tau-leap method we shall use a pair of abstract labels $\alpha$ and $\beta$ , where $\alpha$ denotes a method, i.e. a choice of distribution for $\widetilde{R}$ , and $\beta$ denotes a step size selection strategy. We will use $|\beta|$ as a (deterministic) parameter which quantifies the coarseness of the time-discretization scheme $\beta$ . For instance $\alpha$ may stand for the explicit Euler tau-leap method [20] and $\beta$ may stand for a deterministic mesh $0=t_{0}<t_{1}<\dots<t_{n}=T$ , and in this case the coarseness parameter is $|\beta|=\max(t_{j}-t_{j-1})$ . Typically, tau-leap methods produce approximations of the underlying process at certain leap times that are separated by the step-size $\tau$ and one can interpolate these approximate state values at other time points. The most obvious interpolation is the “sample and hold” method, where the tau-leap process is held constant between the consecutive leap times. In circumstances, such as the explicit Euler tau-leap method with Poisson updates, it is more natural to use interpolation strategies based on the random time-change representation (2.3) – for example see the “Poisson bridge” approach in [29]. In the following discussion, we suppose that the interpolation strategy is also determined by the label $\alpha$ . We shall use $(Z_{\alpha,\beta}(x_{0},t))_{t\geq 0}$ to denote the tau-leap process, that approximates the exact dynamics $(X(x_{0},t))_{t\geq 0}$ , and that results from the application of a tau-leap method $\alpha$ with step size selection strategy $\beta$ . This process is defined by the prescription $Z_{\alpha,\beta}(x_{0},t_{0})=x_{0}$ and

[TABLE]

where $\mu$ is the (possibly) random number of time points, $0=t_{0}<t_{1}<\dots<t_{\mu}=T$ are the (possibly) random leap times, and $\widetilde{R}_{k,i,\alpha,\beta}$ for $i=1,\dots,\mu$ and $k=1,\dots,K$ are random variables whose distribution when conditioned on $Z_{\alpha,\beta}(x,t_{i})$ is determined by the method $\alpha$ and step size strategy $\beta$ .
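As an illustration of the "sample and hold" interpolation mentioned above, the following sketch returns the tau-leap state at an arbitrary time by holding the value from the most recent leap time. The helper name `sample_and_hold` and the toy mesh are hypothetical; interpolation schemes based on the random time-change representation would need a different routine.

```python
import numpy as np

def sample_and_hold(mesh, states, t):
    """Piecewise-constant interpolation of a tau-leap path.

    mesh   : increasing leap times t_0 < t_1 < ... < t_mu
    states : array of shape (mu + 1, d), states[i] is the state at mesh[i]
    t      : query time in [t_0, t_mu]
    Returns the state held on the interval [t_i, t_{i+1}) containing t.
    """
    i = np.searchsorted(mesh, t, side="right") - 1
    i = int(np.clip(i, 0, len(mesh) - 1))
    return states[i]

# Toy path with three leap times (illustrative values only).
toy_mesh = np.array([0.0, 1.0, 2.0])
toy_states = np.array([[0], [3], [5]])
held = sample_and_hold(toy_mesh, toy_states, 1.99)
```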

Remark 3.1

Note that this generic tau-leap method reduces to Gillespie’s SSA [19], if at state $Z_{\alpha,\beta}(x_{0},t_{i})=z$, the next step size $\tau$ is an exponentially distributed random variable with rate $\lambda_{0}(z):=\sum_{k=1}^{K}\lambda_{k}(z)$ and each $\widetilde{R}_{k,i,\alpha,\beta}$ is chosen as $1$ if $k=\eta$ and $0$ otherwise, where $\eta$ is a discrete random variable which assumes the value $i\in\{1,\dots,K\}$ with probability $(\lambda_{i}(z)/\lambda_{0}(z))$.
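The special case in Remark 3.1 can be sketched as follows: the step size is exponential with rate $\lambda_{0}(z)$ and exactly one reaction fires per step. The function name `ssa_path` and the birth-death example are assumptions used only for illustration.

```python
import numpy as np

def ssa_path(x0, zeta, propensities, T, rng):
    """Gillespie's SSA written as a 'tau-leap' with random step sizes.

    Each step has length Exp(rate lambda_0(z)) and fires a single reaction,
    chosen with probability lambda_k(z) / lambda_0(z).
    Returns the jump times and the states right after each jump.
    """
    x = np.asarray(x0, dtype=np.int64)
    t = 0.0
    times, states = [t], [x.copy()]
    while True:
        lam = propensities(x)
        lam0 = lam.sum()
        if lam0 <= 0.0:
            break                              # absorbing state: no reaction can fire
        t += rng.exponential(1.0 / lam0)       # numpy takes the scale 1 / rate
        if t > T:
            break
        k = rng.choice(len(lam), p=lam / lam0) # index of the reaction that fires
        x = x + zeta[k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Hypothetical birth-death network: 0 -> S at rate theta, S -> 0 at rate gamma * x.
theta, gamma = 10.0, 1.0
zeta = np.array([[1], [-1]])
props = lambda x: np.array([theta, gamma * x[0]])
times, states = ssa_path([0], zeta, props, 5.0, np.random.default_rng(0))
```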

Later we shall establish tau-convergence of our sensitivity estimator by showing that for a fixed tau-leap method $\alpha$, the bias incurred by our estimator converges to $0$ as the coarseness $|\beta|$ of the time-discretization scheme goes to $0$. For this we shall require (weak) convergence of all moments of the tau-leap process to those of the exact process. We now state this requirement more precisely and present a simple lemma that will be needed later. For $p\geq 0$, we say that a function $f:\mathbb{N}_{0}^{d}\to\mathbb{R}$ is of class $\mathcal{C}_{p}$ if there exists a positive constant $C$ such that

$$|f(x)|\leq C(1+\|x\|^{p})\quad\textnormal{for all }x\in\mathbb{N}_{0}^{d}.\tag{3.14}$$

We shall require that a tau-leap method $\alpha$ satisfies an order $\gamma>0$ convergent error bound. This is stated formally by Assumption 1 and it can be verified using the results in [35].

Assumption 1 Given a tau-leap method $\alpha$ , there exist $\gamma>0$ , $\delta>0$ and a mapping $\xi:\mathbb{R}_{+}\to\mathbb{R}_{+}$ such that, for every $p\geq 0$ and every final time $T>0$ , there exists a constant $C_{1}(p,T,\alpha)$ satisfying

[TABLE]

for any initial state $x_{0}$ provided that $|\beta|\leq\delta$ . Note that here the second inequality is our assumption while the first inequality always holds. In above, we have assumed that there is a common probability space $(\Omega,\mathbb{P})$ carrying the exact process $X$ and the tau-leap process $Z_{\alpha,\beta}$ .

Remark 3.2

We observe that Assumption 1 essentially assumes order $O(|\beta|^{\gamma})$ convergence in the so-called $p$ -th moment variation norm (see [35]) of the probability law of $Z_{\alpha,\beta}(x_{0},t)$ (on $\mathbb{Z}^{d}$ ) to the probability law of $X(x_{0},t)$ (on $\mathbb{Z}^{d}$ ) and it is not as restrictive as it might seem at first glance. The $p$ -th moment variation norm of a signed finite measure $\mu$ on $\mathbb{Z}^{d}$ which possesses a finite $p$ -th moment is defined by

[TABLE]

and the space $\mathcal{M}_{p}$ defined by

[TABLE]

is isometrically isomorphic to $\ell^{1}$, the space of absolutely summable sequences and moreover, $\mathcal{C}_{p}$ is the dual space of $\mathcal{M}_{p}$ (see [35]). We note that by the Schur property, weak convergence implies norm convergence in $\ell^{1}$. In Assumption 1, if we merely assumed weak convergence of order $O(|\beta|^{\gamma})$ in $\mathcal{M}_{p}$, due to the Schur property, we obtain convergence in $p$-th moment variation norm of order $O(|\beta|^{\gamma^{\prime}})$ for any $\gamma^{\prime}\in(0,\gamma)$. Moreover, we note that convergence of tau-leap methods in the moment variation norms has been derived in [35] and applies to a large class of situations including (but not limited to) systems that remain in a bounded subset of the integer state space. We also remark that to the best of our knowledge, all convergence results on tau-leaping have been limited to considering deterministic time steps. However, in the applied literature, adaptive time step selection methods have been explored numerically, and it is reasonable to expect convergence results to be established in the future for a reasonable class of adaptive step size selection schemes. In this paper, our numerical simulations are restricted to deterministic time steps.

Additionally we will require Assumptions 2 and 3 on moment growth bounds of the exact process as well as the tau-leap process. These assumptions can be verified using the results in [34, 24, 35].

Assumptions 2 and 3 Given a tau-leap method $\alpha$ , there exists $\delta>0$ such that for each $T>0$ and $p\geq 0$ there exist constants $C_{2}(p,T)$ and $C_{3}(p,T,\alpha)$ satisfying

[TABLE]

for all $t\in[0,T]$ , provided $|\beta|\leq\delta$ .

We emphasize that constants $C_{1}$ and $C_{3}$ in Assumptions 1 and 3, do not depend on the step-size selection strategy $\beta$ , and all the three constants in these assumptions may be assumed to be monotonic in $T$ without any loss of generality. The following lemma follows readily from the above assumptions.

Lemma 3.3

Consider a function $\phi:\mathbb{N}_{0}^{d}\times[0,T]\rightarrow\mathbb{R}$ and suppose that there exists a constant $C>0$ such that $\sup_{t\in[0,T]}|\phi(x,t)|\leq C(1+\|x\|^{p})$ for all $x\in\mathbb{N}_{0}^{d}$ . Then under Assumptions 1,2 and 3, we have

[TABLE]

provided $|\beta|\leq\delta$ .

3.2 An integral formula for parameter sensitivity

Let $(X_{\theta}(t))_{t\geq 0}$ be the Markov process representing reaction dynamics with initial state $x_{0}$ and let $\Psi_{\theta}(x,f,t)$ be defined by

$$\Psi_{\theta}(x,f,t):=\mathbb{E}\left(f(X_{\theta}(t))\,\middle|\,X_{\theta}(0)=x\right)\tag{3.17}$$

for any state $x\in\mathbb{N}^{d}_{0}$ and time $t\geq 0$ . For any $k=1,\dots,K$ and any function $h:\mathbb{N}^{d}_{0}\to\mathbb{R}$ , let $\Delta_{\zeta_{k}}$ denote the difference operator given by

$$\Delta_{\zeta_{k}}h(x):=h(x+\zeta_{k})-h(x).$$

The following theorem expresses the sensitivity value $S_{\theta}(f,T)$ as the expectation of a random variable which can be computed from the paths of the process $(X_{\theta}(t))_{t\geq 0}$ in the time interval $[0,T]$ . The proof of this theorem is provided in the Appendix 5.1.

Theorem 3.4

Suppose $(X_{\theta}(t))_{t\geq 0}$ is the Markov process with generator $\mathbb{A}_{\theta}$ and initial state $x_{0}$ . Then the sensitivity value $S_{\theta}(f,T)$ is given by

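$$S_{\theta}(f,T)=\sum_{k=1}^{K}\mathbb{E}\left(\int_{0}^{T}\frac{\partial\lambda_{k}}{\partial\theta}(X_{\theta}(t),\theta)\,\Delta_{\zeta_{k}}\Psi_{\theta}(X_{\theta}(t),f,T-t)\,dt\right).$$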

Remark 3.5

This formula has the following simple interpretation. Due to an infinitesimal perturbation of parameter $\theta$ , the probability that the process $(X_{\theta}(t))_{t\geq 0}$ has an “extra” jump at time $t$ in the direction $\zeta_{k}$ is proportional to

$$\frac{\partial\lambda_{k}}{\partial\theta}(X_{\theta}(t),\theta)\,dt.$$

Moreover the change in the expectation of $f(X_{\theta}(T))$ at time $T$ due to this “extra” jump at time $t$ is just

$$\Delta_{\zeta_{k}}\Psi_{\theta}(X_{\theta}(t),f,T-t).$$

The above result shows that the overall sensitivity of the expectation of $f(X_{\theta}(x,T))$ is just the product of these two terms, integrated over the whole time interval $[0,T]$ .
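As a sanity check of this interpretation, consider a simple birth-death network that is not taken from the paper and is assumed here only for illustration: a birth reaction with constant propensity $\theta$ and a death reaction with propensity $\gamma x$, with output function $f(x)=x$. Reading the formula as the time integral of the expected product of the derivative of the propensity and the difference $\Delta_{\zeta_{k}}\Psi_{\theta}$, the sensitivity can be computed in closed form and compared with direct differentiation of the mean:

```latex
% Hypothetical birth-death network (illustrative assumption):
%   reaction 1: 0 -> S,  \lambda_1(x,\theta) = \theta,    \zeta_1 = +1
%   reaction 2: S -> 0,  \lambda_2(x,\theta) = \gamma x,  \zeta_2 = -1,   f(x) = x
\begin{align*}
\Psi_\theta(x,f,t) &= \mathbb{E}\big(X_\theta(t)\,\big|\,X_\theta(0)=x\big)
   = x e^{-\gamma t} + \frac{\theta}{\gamma}\big(1-e^{-\gamma t}\big), \\
\Delta_{\zeta_1}\Psi_\theta(x,f,T-t) &= e^{-\gamma (T-t)}, \qquad
\frac{\partial\lambda_1}{\partial\theta} = 1, \qquad
\frac{\partial\lambda_2}{\partial\theta} = 0, \\
S_\theta(f,T) &= \mathbb{E}\left(\int_0^T 1\cdot e^{-\gamma (T-t)}\,dt\right)
   = \frac{1-e^{-\gamma T}}{\gamma}
   = \frac{\partial}{\partial\theta}\,\mathbb{E}\big(X_\theta(T)\big).
\end{align*}
```

In this linear example the integrand does not depend on the state, so the expectation is trivial; for nonlinear propensities both factors depend on the path and must be estimated by simulation.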

The rest of this section is devoted to the development of a tau-leap estimator for parameter sensitivity using this formula. To simplify our notations, we suppress the dependence on parameter $\theta$ , and hence denote $\lambda_{k}(\cdot,\theta)$ by $\lambda_{k}(\cdot)$ , $\partial\lambda_{k}/\partial\theta$ by $\partial\lambda_{k}$ , $S_{\theta}(f,T)$ by $S(f,T)$ , $\Psi_{\theta}(x,f,t)$ by $\Psi(x,f,t)$ and the process $(X_{\theta}(t))_{t\geq 0}$ by $(X(t))_{t\geq 0}$ . Due to Theorem 3.4 the sensitivity value $S(f,T)$ can be expressed as

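$$S(f,T)=\sum_{k=1}^{K}\mathbb{E}\left(\int_{0}^{T}\partial\lambda_{k}(X(t))\,\Delta_{\zeta_{k}}\Psi(X(t),f,T-t)\,dt\right).\tag{3.18}$$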

3.3 Sensitivity approximation with tau-leap simulations

In order to construct a tau-leap estimator for parameter sensitivity using formula (3.18), we need to replace both $\partial\lambda_{k}(X(t))$ and $\Delta_{\zeta_{k}}\Psi(X(t),f,T-t)$ with approximations derived with tau-leap simulations. Recall from Section 3.1 that a generic tau-leap scheme can be described by a pair of abstract labels $\alpha$ and $\beta$ , specifying the method and the step-size selection strategy respectively. Assuming such a tau-leap scheme is chosen, let the corresponding tau-leap process $(Z_{\alpha,\beta}(x,t))_{t\geq 0}$ (see (3.13)) be an approximation for the exact dynamics starting at state $x$ .

Suppose that we use the tau-leap method $\alpha_{0}$ with the step-size selection strategy $\beta_{0}$ to approximate $X(t)$ and possibly a different tau-leap method $\alpha_{1}$ with a time-dependent step-size selection strategy $\beta_{1}(t)$ to compute an approximation of $\Delta_{\zeta_{k}}\Psi(X(t),f,T-t)$ . This time-dependence in step-size selection is needed because the latter quantity requires simulation of auxiliary tau-leap paths in the interval $[0,T-t]$ which varies with $t$ . We discuss this in greater detail in the next section. In the following discussion, we will assume that both the tau-leap schemes $(\alpha_{0},\beta_{0})$ and $(\alpha_{1},\beta_{1}(t))$ satisfy Assumptions 1,2 and 3, with common $\gamma>0,\delta>0$ and with $|\beta|$ replaced by the supremum step-size

$$\tau_{\textnormal{max}}:=\max\left\{|\beta_{0}|,\,\sup_{t\in[0,T]}|\beta_{1}(t)|\right\}\tag{3.19}$$

which is less than $\delta$ . We define the tau-leap approximation of $\Psi(x,f,t)$ (see (3.17)) by

$$\widetilde{\Psi}_{\alpha,\beta}(x,f,t):=\mathbb{E}\left(f(Z_{\alpha,\beta}(x,t))\right)\tag{3.20}$$

and make the assumption that the step size selection strategy $\beta_{1}(t)$ depends on $t$ in such a way that $t\mapsto\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(x,f,T-t)$ is a measurable function of $t$. Motivated by formula (3.18), we shall approximate the true sensitivity value $S(f,T)$ by

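$$\widetilde{S}(f,T):=\sum_{k=1}^{K}\mathbb{E}\left(\int_{0}^{T}\partial\lambda_{k}(Z_{\alpha_{0},\beta_{0}}(x_{0},t))\,\Delta_{\zeta_{k}}\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(Z_{\alpha_{0},\beta_{0}}(x_{0},t),f,T-t)\,dt\right),\tag{3.21}$$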

where $x_{0}$ is the starting state of the process $(X(t))_{t\geq 0}$ . The next theorem, proved in the Appendix 5.1, shows that the bias of this sensitivity approximation is similar to the bias of the underlying tau-leap scheme. In particular if the tau-leap method satisfies order $\gamma$ convergent error bound, then the same is true for the error incurred by the sensitivity approximation. Before we state the theorem, recall that for any $p\geq 0$ , a function $f:\mathbb{N}^{d}_{0}\to\mathbb{R}$ is in class $\mathcal{C}_{p}$ if it satisfies (3.14) for some constant $C\geq 0$ .

Theorem 3.6

Let $f:\mathbb{N}_{0}^{d}\to\mathbb{R}$ as well as $\partial\lambda_{k}$ for each $k=1,\dots,K$ be of class $\mathcal{C}_{p}$ for some $p\geq 0$ . Suppose that a tau-leap approximation $\widetilde{S}(f,T)$ of the exact sensitivity $S(f,T)$ is computed by (3.21), where a tau-leap method $\alpha_{0}$ with step size strategy $\beta_{0}$ is used to approximate the underlying process $(X(t))_{t\geq 0}$ and possibly a different tau-leap method $\alpha_{1}$ with time-dependent step size strategy $\beta_{1}(t)$ is used to compute approximations $\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(x,f,T-t)$ of $\Psi(x,f,T-t)$ at each $t\in[0,T]$ . If both the tau-leap methods satisfy Assumptions 1,2 and 3, with common $\gamma>0$ and $\delta>0$ , then there exists a constant $\widetilde{C}(f,T)$ such that

$$\left|S(f,T)-\widetilde{S}(f,T)\right|\leq\widetilde{C}(f,T)\,\tau_{\textnormal{max}}^{\gamma},$$

where $\tau_{\textnormal{max}}$ is given by (3.19) and it is less than $\delta$ .

We remark that there are two forms of error analyses in the literature for tau-leap methods. The first type is more conventional, where the analysis is carried out for a given system in an interval $[0,T]$ as $\tau_{\max}\to 0$ (see [38, 31, 35]). An alternative analysis considers a family of systems parametrized by “system size” $V$, where step size $\tau$ is chosen in relation to $V$ as $\tau=V^{-\beta}$ (where $\beta>0$), and the limit considered as $V\to\infty$ [6]. As pointed out in [35] both analyses are useful. The first type of analysis with fixed system size is important in that if convergence or more importantly zero-stability (see [35]) does not hold in this conventional sense, then the computed solution can be very erroneous not only when the step size $\tau$ is too large, but also when it is too small! On the other hand, the system size scaling analysis helps explain why tau-leap remains efficient while leaping over several reaction events. In the interest of space, we limit ourselves to the first type in this paper.

3.4 A tau-leap estimator for parameter sensitivity

We now come to the problem of estimating the sensitivity approximation $\widetilde{S}(f,T)$ using tau-leap simulations. Expression (3.21) shows that $\widetilde{S}(f,T)$ is the expectation of the random variable $\overline{s}(f,T)$ defined by

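$$\overline{s}(f,T):=\sum_{k=1}^{K}\int_{0}^{T}\partial\lambda_{k}(Z_{\alpha_{0},\beta_{0}}(x_{0},t))\,\Delta_{\zeta_{k}}\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(Z_{\alpha_{0},\beta_{0}}(x_{0},t),f,T-t)\,dt.\tag{3.22}$$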

If we can generate samples of this random variable, then the estimation of $\widetilde{S}(f,T)$ would be quite straightforward using (2.6). However this is not the case as the random variable $\overline{s}(f,T)$ is nearly impossible to generate. This is mainly because it requires computing quantities of the form

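$$\Delta_{\zeta_{k}}\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(z,f,T-t)=\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(z+\zeta_{k},f,T-t)-\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(z,f,T-t)\tag{3.23}$$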

at infinitely many time points $t$ . These quantities generally do not have an explicit formula and hence they need to be estimated via auxiliary Monte Carlo simulations, which severely restricts the number of such quantities that can be feasibly estimated. We tackle these problems by constructing another random variable $\widetilde{s}(f,T)$ whose expected value equals $\widetilde{S}(f,T)$ , and whose samples can be easily generated using a simple procedure called $\tau$ IPA (Tau Integral Path Algorithm) that is described in Section 3.5. This random variable is constructed by adding randomness to the random variable $\overline{s}(f,T)$ in such a way that only a small finite number of unknown quantities of the form (3.23) require estimation. We now present this construction.

Construction of the random variable $\widetilde{s}(f,T)$: Recall from Section 3.1 the description of the tau-leap process $(Z_{\alpha_{0},\beta_{0}}(x_{0},t))_{t\geq 0}$ which approximates the exact dynamics $(X(t))_{t\geq 0}$. Let $0=t_{0}<t_{1}<\dots<t_{\mu}=T$ be the (possibly random) mesh corresponding to step size selection strategy $\beta_{0}$. We denote the $\sigma$-algebra generated by the process $(Z_{\alpha_{0},\beta_{0}}(x_{0},t))_{t\geq 0}$ and the random mesh $\beta_{0}$ over the interval $[0,T]$ by $\mathcal{F}_{T}$. Let $\tau_{i}=t_{i+1}-t_{i}$ and let $\eta_{i}$ be the positive integer given by

[TABLE]

where $C$ is a positive constant and $\lceil x\rceil$ denotes the smallest integer greater than or equal to $x$ . The choice of $C$ and its role will be explained later in the section. Define $\sigma_{ij}:=t_{i}+u_{ij}\tau_{i}$ for each $j=1,\dots,\eta_{i}$ , where each $u_{ij}$ is an independent random variable with distribution $\textnormal{Uniform}[0,1]$ . Thus given $t_{i}$ and $t_{i+1}$ , the distribution of each $\sigma_{ij}$ is $\textnormal{Uniform}[t_{i},t_{i+1}]$ . Moreover taking expectation over the distribution of $u_{ij}$ -s we get

[TABLE]

In deriving the last equality we have used the substitution $t=t_{i}+u\tau_{i}$ . This relation along with (3.21) yields

[TABLE]

using linearity of the expectation operator. To obtain the states $Z_{\alpha_{0},\beta_{0}}(x_{0},\sigma_{ij})$ for all the $\sigma_{ij}$ -s, we need to interpolate the tau-leap dynamics between the times $t_{i}$ and $t_{i+1}$ .
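The key step of this construction, replacing the time integral over each interval $[t_{i},t_{i+1}]$ by an average over uniformly distributed points $\sigma_{ij}$, can be checked numerically on a deterministic integrand. The sketch below is only an illustration: the function name `stratified_integral_sample`, the integrand `g` and the constant choice of three points per interval are assumptions, whereas in the construction above $\eta_{i}$ is given by (3.24).

```python
import numpy as np

def stratified_integral_sample(g, mesh, eta, rng):
    """One realization of sum_i (tau_i / eta_i) * sum_j g(sigma_ij),

    where sigma_ij = t_i + u_ij * tau_i and u_ij ~ Uniform[0, 1].
    Its expectation equals the integral of g over [mesh[0], mesh[-1]].
    """
    total = 0.0
    for t_i, t_next, n_i in zip(mesh[:-1], mesh[1:], eta):
        tau_i = t_next - t_i
        sigma = t_i + rng.uniform(0.0, 1.0, size=n_i) * tau_i
        total += tau_i * np.mean(g(sigma))
    return total

# Illustrative integrand and mesh (assumptions, not from the paper).
T = 2.0
g = lambda t: np.exp(-(T - t))
mesh = np.linspace(0.0, T, 6)       # five intervals of equal length
eta = [3] * (len(mesh) - 1)         # three uniform points per interval
rng = np.random.default_rng(1)
samples = np.array([stratified_integral_sample(g, mesh, eta, rng)
                    for _ in range(20000)])
exact = 1.0 - np.exp(-T)            # integral of exp(-(T - t)) over [0, T]
```

Averaging `samples` should reproduce `exact` up to Monte Carlo error, since each interval contributes an unbiased estimate of its own share of the integral.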

To proceed further we define a “conditional estimator” $\widehat{D}_{kij}$ of the quantity (3.23) at $t=\sigma_{ij}$ by

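$$\widehat{D}_{kij}:=f\left(Z^{1kij}_{\alpha_{1},\beta_{1}(\sigma_{ij})}(z+\zeta_{k},T-\sigma_{ij})\right)-f\left(Z^{2kij}_{\alpha_{1},\beta_{1}(\sigma_{ij})}(z,T-\sigma_{ij})\right)\tag{3.26}$$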

where $z=Z_{\alpha_{0},\beta_{0}}(x_{0},\sigma_{ij})$ , and $Z^{1kij}$ and $Z^{2kij}$ are instances of tau-leap approximations of the exact dynamics starting at initial states $(z+\zeta_{k})$ and $z$ respectively. Both these tau-leap processes use the same method $\alpha_{1}$ and the same step-size selection strategy $\beta_{1}(\sigma_{ij})$ . Moreover conditioned on $Z_{\alpha_{0},\beta_{0}}(x_{0},\sigma_{ij})$ and $\sigma_{ij}$ , the processes $Z^{1kij},Z^{2kij}$ and the step-size selection strategy $\beta_{1}(\sigma_{ij})$ are independent of the process $Z_{\alpha_{0},\beta_{0}}$ and the step-size selection strategy $\beta_{0}$ . Therefore it is immediate that

[TABLE]

and hence from (3.25) we obtain the following representation for $\widetilde{S}(f,T)$

[TABLE]

An estimator for $\widetilde{S}(f,T)$ based on this formula can require several computations of $\widehat{D}_{kij}$ . Since each evaluation of $\widehat{D}_{kij}$ is computationally expensive, we would like to control the total number of these evaluations by randomizing the decision of whether $\widehat{D}_{kij}$ should be evaluated at time $\sigma_{ij}$ or not. Moreover this randomization must be performed without introducing a bias in the estimator. We now describe this process.

Define $R_{kij}$ and $P_{kij}$ by

[TABLE]

and let $\rho_{kij}$ be an independent $\{0,1\}$ -valued random variable whose distribution is Bernoulli with parameter $P_{kij}$ . Since $\mathbb{E}\left(\rho_{kij}\middle|Z_{\alpha_{0},\beta_{0}}(x_{0},\sigma_{ij}),\mathcal{F}_{T}\right)=P_{kij}$ we have that

[TABLE]

where we define $R_{kij}/P_{kij}$ to be $0$ when $R_{kij}=0$. This formula suggests that $\widetilde{S}(f,T)$ can be estimated, without any bias, using realizations of the random variable

[TABLE]

In generating each realization of $\widetilde{s}(f,T)$ , the computation of $\widehat{D}_{kij}$ is only needed if the Bernoulli random variable $\rho_{kij}$ is $1$ . Therefore, if we can effectively control the number of such $\rho_{kij}$ -s then we can efficiently generate realizations of $\widetilde{s}(f,T)$ . This can be achieved using the positive parameter $C$ (see (3.24) and (3.29)) as we soon explain. Based on the construction outlined above, we provide a method in Section 3.5 for obtaining realizations of the random variable $\widetilde{s}(f,T)$ . We call this method, the Tau Integral Path Algorithm ( $\tau$ IPA), to emphasize the fact that $\widetilde{s}(f,T)$ is essentially an approximation of the integral (3.22). Using $\tau$ IPA we can efficiently generate realizations $s_{1},s_{2},\dots,s_{N}$ of $\widetilde{s}(f,T)$ and approximately estimate the parameter sensitivity $\widetilde{S}(f,T)$ with the estimator (2.6).
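The effect of this randomization on a single term can be illustrated with a minimal sketch. It is not the implementation used in the paper: the helper names `thinning_probability`, `thinned_term` and `conditional_mean` are hypothetical, and the form $P=(|R|/(C\eta))\wedge 1$ is an assumed reading of the definition of $P_{kij}$, suggested by the bound $|R_{kij}|\leq C\eta_{i}$ discussed in this section. The sketch only shows that weighting by $\rho/P$ leaves the conditional mean of a term equal to $R\,D$ (up to the deterministic factor multiplying it in the estimator), while the expensive evaluation is skipped whenever $\rho=0$.

```python
import random

def thinning_probability(R, C, eta):
    """Assumed form of P_kij: |R| / (C * eta), capped at 1."""
    return min(abs(R) / (C * eta), 1.0)

def thinned_term(R, P, evaluate_D, rng):
    """Return rho * (R / P) * D, calling evaluate_D only when rho == 1."""
    if R == 0.0:
        return 0.0                      # convention: R / P := 0 when R == 0
    rho = 1 if rng.random() < P else 0  # rho ~ Bernoulli(P)
    if rho == 0:
        return 0.0                      # expensive evaluation is skipped
    return (R / P) * evaluate_D()

def conditional_mean(R, P, D):
    """Exact mean of the thinned term over rho, for a fixed value D."""
    if R == 0.0:
        return 0.0
    return P * (R / P) * D              # the rho == 0 branch contributes 0
```

In this sketch a smaller $P$ means fewer calls to `evaluate_D` but a larger weight $R/P$ on each call, which is the cost-variance trade-off that the parameter $C$ controls.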

Minimizing the variance of $\widetilde{s}(f,T)$ : To improve the efficiency of $\tau$ IPA, we must minimize the additional variance due to the extra randomness that has been added to the random variable $\overline{s}(f,T)$ (3.22) to obtain $\widetilde{s}(f,T)$ . Since $\mathbb{E}(\widetilde{s}(f,T)|\mathcal{F}_{T})=\overline{s}(f,T)$ , this additional variance is equal to $\textnormal{Var}(\widetilde{s}(f,T)|\mathcal{F}_{T})$ , and in order to reduce this quantity we focus on reducing the conditional variance $\text{Var}(\widehat{D}_{kij}|\mathcal{F}_{T})$ . Recall that $\widehat{D}_{kij}$ is given by (3.26) and for convenience we abbreviate $Z^{lkij}_{\alpha_{1},\beta_{1}(\sigma_{ij})}$ by $Z^{l}$ for $l=1,2$ . The reduction in this conditional variance can be accomplished by tightly coupling the pair of processes $(Z^{1},Z^{2})$ . For this purpose we use the split-coupling (see [2]) specified by

[TABLE]

where $\{Y_{k},Y^{(1)}_{k},Y^{(2)}_{k}:k=1,\dots,K\}$ is an independent family of unit rate Poisson processes. Here $\alpha(s)=t_{i}$ for $t_{i}\leq s<t_{i+1}$ , and $\{t_{0},t_{1},t_{2},\dots\}$ is the sequence of leap-times of the pair of processes $(Z^{1},Z^{2})$ jointly simulated with the tau-leap scheme $(\alpha_{1},\beta_{1}(t))$ . Note that process $\alpha$ is adapted to the filtration generated by processes $(Z^{1},Z^{2})$ . Hence a solution to (3.32)-(3.33) can be found by explicit construction. The uniqueness of the solution $(Z^{1},Z^{2})$ , until the first time $\tau_{M}$ its norm exceeds some constant $M>0$ , is guaranteed by the local boundedness of the associated generator (see Theorem 4.1 in Chapter 4 of [14]). Using Assumption 3 one can show that as $M\to\infty$ we have $\tau_{M}\to\infty$ a.s. and from this, the uniqueness of the solution $(Z^{1},Z^{2})$ in the whole time-interval $[0,\infty)$ can be established. See Lemma A.1 in [27] for more details on this argument.

Controlling the number of nonzero $\rho_{kij}$ -s: We now discuss how the positive parameter $C$ can be selected to control the total number of $\rho_{kij}$ -s that assume the value $1$ in (3.31), which is $\rho_{\textnormal{tot}}=\sum_{k=1}^{K}\sum_{i=1}^{\mu-1}\sum_{j=1}^{\eta_{i}}\rho_{kij}$ . This is the number of $\widehat{D}_{kij}$ -s that are required to obtain a realization of $\widetilde{s}(f,T)$ . It is immediate that given the sigma field $\mathcal{F}_{T}$ , $\rho_{\textnormal{tot}}$ is a $\mathbb{N}_{0}$ -valued random variable whose expectation is given by:

[TABLE]

Using $a\wedge b\leq a$ and

[TABLE]

we obtain

[TABLE]

We choose a positive integer $M_{0}$ and set

[TABLE]

where the expectation can be approximately estimated using $N_{0}$ tau-leap simulations of the dynamics in the time interval $[0,T]$. Such a choice ensures that $\rho_{\textnormal{tot}}$ is bounded above by $M_{0}$ on average. In most cases we can expect $R_{kij}$ to be close to $\partial\lambda_{k}(Z_{\alpha_{0},\beta_{0}}(x_{0},t_{i}))\tau_{i}$ and so the choice of $\eta_{i}$ automatically ensures that $|R_{kij}|\leq C\eta_{i}$. Hence inequality (3.34) is almost exact and with $C$ chosen as in (3.35) we have $\mathbb{E}\left(\rho_{\textnormal{tot}}\right)\approx M_{0}$. Therefore $M_{0}$ can be interpreted as the expected number of coupled auxiliary paths (3.32)-(3.33) needed to obtain a realization of $\widetilde{s}(f,T)$. This parameter is in the hands of the user and it plays the same role as in PPA (see Section 2.2), namely, it allows one to select the trade-off between the computational cost $\mathcal{C}(\tau\textnormal{IPA})$ and the variance $\mathcal{V}(\tau\textnormal{IPA})$. A higher value of $M_{0}$ reduces the variance while simultaneously increasing the computational cost. Hence it is difficult to ascertain the effect of $M_{0}$ on the overall estimation cost, which depends on the product $\mathcal{C}(\tau\textnormal{IPA})\mathcal{V}(\tau\textnormal{IPA})$ (see (2.10)). Numerical examples suggest that for low values of $M_{0}$, the overall estimation cost decreases gradually as $M_{0}$ increases, but this trend reverses for higher values of $M_{0}$ (see Section 4). More work is needed to examine whether this pattern persists more generally and how one can select the optimal value of $M_{0}$. Note however that $\tau$IPA will provide an unbiased estimator for $\widetilde{S}(f,T)$ (3.21) regardless of the choice of $M_{0}$. Hence the accuracy of $\tau$IPA does not vary much with $M_{0}$, which is also seen in the numerical examples.

3.5 The Tau Integral Path Algorithm ( $\tau$ IPA)

We now provide a detailed description of the method $\tau$ IPA which produces realizations of the random variable $\widetilde{s}(f,T)$ defined by (3.31). Computing the empirical mean (2.6) of these realizations estimates the approximate parameter sensitivity $\widetilde{S}(f,T)$ . Throughout this section we assume that the function $rand()$ returns independent samples from the distribution $\textnormal{Uniform}[0,1]$ .

The method $\tau$IPA can be adapted to work with any tau-leap scheme, but for concreteness, we assume that an explicit tau-leap scheme is used for all the simulations. This means that the current state $z$ and time $t$ are sufficient to determine the distributions of the next time-step $\tau$ and the vector of reaction firings $\widetilde{R}=(\widetilde{R}_{1},\dots,\widetilde{R}_{K})$ in the time interval $[t,t+\tau)$. We suppose that a sample from these two distributions can be obtained using the methods $\textsc{GetTau}(z,t,T)$ and $\textsc{GetReactionFirings}(z,\tau)$ respectively. (We allow the step-size selection to depend on both the current time $t$ and the final time $T$. This is especially important for simulating the auxiliary paths that are required to compute the $\widehat{D}_{kij}$-s in (3.31); see Sections 3.3 and 3.4.) If we use the simplest tau-leap scheme given in [20], then reaction firings can be generated as

[TABLE]

for $k=1,\dots,K$ , where the function $\textsc{Poisson}(r)$ generates an independent Poisson random variable with mean $r$ . Once we have the reaction firings $\widetilde{R}=(\widetilde{R}_{1},\dots,\widetilde{R}_{K})$ , the state at time $(t+\tau)$ is given by $z^{\prime}=(z+\sum_{k=1}^{K}\widetilde{R}_{k}\zeta_{k})$ and for any intermediate time-point $\sigma\in(t,t+\tau)$ the state $\widehat{z}$ can be obtained using the “Poisson bridge” interpolation (see [29]). However this interpolation approach is equivalent to setting $\widehat{z}=(z+\sum_{k=1}^{K}\widetilde{R}^{(1)}_{k}\zeta_{k})$ and $z^{\prime}=(\widehat{z}+\sum_{k=1}^{K}\widetilde{R}^{(2)}_{k}\zeta_{k})$ , where $\widetilde{R}^{(1)}=(\widetilde{R}^{(1)}_{1},\dots,\widetilde{R}^{(1)}_{K})$ and $\widetilde{R}^{(2)}=(\widetilde{R}^{(2)}_{1},\dots,\widetilde{R}^{(2)}_{K})$ are reaction firing vectors generated according to (3.36) with $\tau$ replaced by $(\sigma-t)$ and $(t+\tau-\sigma)$ respectively. This idea can be easily generalized to obtain the interpolated states $\widehat{z}_{1},\dots,\widehat{z}_{\eta}$ at $\eta$ intermediate times $\sigma_{1},\dots,\sigma_{\eta}\in(t,t+\tau)$ sorted in ascending order, i.e. $\sigma_{1}<\dots<\sigma_{\eta}$ .
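The interpolation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's code: the names `poisson` and `leap_with_interpolation` are hypothetical, the propensities are passed as plain Python callables, and the Poisson sampler is a simple textbook one (Knuth's method) chosen only to keep the example free of external libraries. The sketch draws independent Poisson firings on each sub-interval, with all propensities frozen at the pre-leap state $z$.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    if mean <= 0.0:
        return 0
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def leap_with_interpolation(z, t, tau, sigmas, propensities, zetas, rng):
    """One explicit tau-leap step from state z at time t.

    Returns the interpolated states at the sorted times in `sigmas`
    (each inside (t, t + tau)) and the state at time t + tau.
    """
    rates = [lam(z) for lam in propensities]   # evaluated once, at z
    state, last = list(z), t
    visited = []
    for s in sorted(sigmas) + [t + tau]:
        for rate, zeta in zip(rates, zetas):
            firings = poisson(rate * (s - last), rng)
            state = [x + firings * dz for x, dz in zip(state, zeta)]
        visited.append(tuple(state))
        last = s
    return visited[:-1], visited[-1]
```

Because the rates are computed once from $z$, the firings summed over all sub-intervals have the same Poisson distribution as a single draw over the full leap of length $\tau$.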

Let $Z$ denote the tau-leap process approximating the reaction dynamics with initial state $x_{0}$. Our first task is to select the normalization parameter $C$ according to (3.35), by estimating the expectation in the formula using $N_{0}$ simulations of the process $Z$. This is done using the function $\textsc{Select-Normalizing-Constant}(x_{0},M_{0},T)$ (see Algorithm 2 in Appendix 5.2), where $M_{0}$ is the expected number of auxiliary paths (3.32)-(3.33) that need to be simulated (see Section 3.4). Once $C$ is chosen, a single realization of $\widetilde{s}(f,T)$ can be computed using $\textsc{GenerateSample}(x_{0},T,C)$ (Algorithm 1). This method simulates the tau-leap process $Z$ and at each leap-time $t_{i}$, the following happens:

1. The next leap size $\tau_{i}$ ( $=\tau$ ) is chosen and the positive integer $\eta_{i}$ ( $=\eta$ ) is computed.

2. The intermediate time-points $\sigma_{j}$-s are generated for $j=1,\dots,\eta$ and sorted in ascending order.

3. For each $j$, the vector of reaction firings $\widetilde{R}=(\widetilde{R}_{1},\dots,\widetilde{R}_{K})$ for the time-interval $(\sigma_{j-1},\sigma_{j})$ is computed and the interpolated state $\widehat{z}_{j}$ at time $\sigma_{j}$ is evaluated. Then for each reaction $k$ the following happens:

• The variables $R_{kij}$ ( $=R$ ), $P_{kij}$ ( $=P$ ) and $\rho_{kij}$ ( $=\rho$ ) are generated. The function $\textsc{Bernoulli}(P)$ generates an independent Bernoulli random variable with expectation $P$.

• If $\rho_{kij}=1$ then $\widehat{D}_{kij}$ (see (3.26)) is evaluated using $\textsc{EvaluateCoupledDifference}(\widehat{z}_{j},\widehat{z}_{j}+\zeta_{k},\sigma,T)$ (see Algorithm 3 in Appendix 5.2) and the sample value is updated according to (3.31). This method independently simulates the pair of processes $(Z^{1},Z^{2})$ specified by the split-coupling (3.32)-(3.33) in order to compute $\widehat{D}_{kij}$. For simplicity we assume that these simulations are carried out by the same tau-leap scheme which generates reaction firings according to (3.36).

4. Finally, time $t$ is updated to $(t+\tau)$, reaction firings for the time-interval $[\sigma_{\eta},t)$ are computed and the state is updated accordingly.

Note that in the computation of reaction firings the propensities are evaluated at $z$ rather than any of the interpolated states $\widehat{z}_{j}$ .

4 Numerical Examples

In this section we computationally compare six sensitivity estimation methods on many examples. The methods we consider are the following:

1. Tau Integral Path Algorithm or $\tau$IPA: This is the method described in Section 3.5. The tau-leap scheme we use is the simple Euler method [20] with Poisson reaction firings (3.36) and uniform step-size $\tau=\tau_{\textnormal{max}}$. To avoid the possibility of leaping over the final time $T$ at which the sensitivity is to be estimated, we set

[TABLE]

The value of $\tau_{\textnormal{max}}$ will depend on the example being considered and the default value of parameter $M_{0}$ is $10$.

2. Exact Integral Path Algorithm or eIPA: This is the method we obtain by replacing the tau-leap simulations in $\tau$IPA with the exact simulations performed with Gillespie’s SSA [19]. This replacement can be easily made by choosing the step-size and the reaction firings according to Remark 3.1. Moreover we need to change the method EvaluateCoupledDifference to the version given in [26]. Note that eIPA is a new unbiased method for estimating parameter sensitivity, like the methods in Section 2.2. This method is conceptually similar to PPA [26], but unlike PPA, the formula (3.18) underlying $\tau$IPA does not involve summation over the jumps of the process, which makes it more amenable to incorporating tau-leap schemes.

3. Exact Coupled Finite Difference or eCFD: This is the same as the CFD method in [2].

4. Exact Common Reaction Paths or eCRP: This is the same as the CRP method in [40].

5. Tau Coupled Finite Difference or $\tau$CFD: This method is the tau-leap version of CFD which has been proposed in [51]. Let $(Z_{\theta},Z_{\theta+h})$ be the pair of tau-leap processes that approximate the processes $(X_{\theta},X_{\theta+h})$, and suppose that at leap time $t_{i}$ their state is $(Z_{\theta}(t_{i}),Z_{\theta+h}(t_{i}))=(z_{1},z_{2})$. If the next step-size is $\tau$, then for every reaction $k=1,\dots,K$, we set the number of firings $(\widetilde{R}_{\theta,k},\widetilde{R}_{\theta+h,k})$ for this pair of processes as $\widetilde{R}_{\theta,k}=A_{k}+\textsc{Poisson}((\lambda_{k}(z_{1})-\lambda_{k}(z_{1})\wedge\lambda_{k}(z_{2}))\tau)$ and $\widetilde{R}_{\theta+h,k}=A_{k}+\textsc{Poisson}((\lambda_{k}(z_{2})-\lambda_{k}(z_{1})\wedge\lambda_{k}(z_{2}))\tau)$, where $A_{k}=\textsc{Poisson}((\lambda_{k}(z_{1})\wedge\lambda_{k}(z_{2}))\tau)$. Such a selection of reaction firings emulates the CFD coupling. To facilitate comparison, we choose the tau-leap simulation method to be the same as for $\tau$IPA.

6. Tau Common Reaction Paths or $\tau$CRP: This method can be viewed as the tau-leap version of CRP where the CRP coupling is emulated by coupling the Poisson random variables that generate the reaction firings. Using the same notation as before, if $(Z_{\theta}(t_{i}),Z_{\theta+h}(t_{i}))=(z_{1},z_{2})$ and the next step-size is $\tau$, then we set the number of firings $(\widetilde{R}_{\theta,k},\widetilde{R}_{\theta+h,k})$ as $\widetilde{R}_{\theta,k}=\textsc{Poisson}(\lambda_{k}(z_{1})\tau,k)$ and $\widetilde{R}_{\theta+h,k}=\textsc{Poisson}(\lambda_{k}(z_{2})\tau,k)$ for every reaction $k=1,\dots,K$. Here we assume that there are $K$ parallel streams of independent $\textnormal{Uniform}[0,1]$ random variables (see [40]), and the method $\textsc{Poisson}(r,k)$ uses the uniform random variable from the $k$-th stream for generating the Poisson random variable with mean $r$. As for $\tau$CFD, the tau-leap simulation method is the same as for $\tau$IPA.
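The two coupled firing rules used by $\tau$CFD and $\tau$CRP can be sketched for a single reaction channel as follows. This is an illustrative sketch, not the code used for the experiments: the function names are hypothetical, and generating the Poisson variables by inversion of the CDF at a uniform value is an assumption made here so that one uniform from the $k$-th stream can drive both processes in the $\tau$CRP case.

```python
import math
import random

def poisson_inverse(mean, u):
    """Poisson sample by inverting the CDF at u (suitable for moderate means)."""
    k, pmf = 0, math.exp(-mean)
    cdf = pmf
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k

def cfd_firings(rate1, rate2, tau, rng):
    """tau-CFD rule: shared part A_k plus independent residual firings."""
    common = min(rate1, rate2)
    a = poisson_inverse(common * tau, rng.random())
    r1 = a + poisson_inverse((rate1 - common) * tau, rng.random())
    r2 = a + poisson_inverse((rate2 - common) * tau, rng.random())
    return r1, r2

def crp_firings(rate1, rate2, tau, stream_k):
    """tau-CRP rule: both firings generated from one uniform of stream k."""
    u = stream_k.random()
    return poisson_inverse(rate1 * tau, u), poisson_inverse(rate2 * tau, u)
```

Here `rate1` and `rate2` stand for $\lambda_{k}(z_{1})$ and $\lambda_{k}(z_{2})$. In both rules the two firing counts coincide when the rates are equal, which is what keeps the variance of the finite-difference estimator small.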

In all the finite-difference schemes, we use perturbation-size $h=0.1$ and we center the parameter perturbations to obtain better accuracy. This centering can be easily achieved by substituting $\theta$ with $(\theta-h/2)$ and $(\theta+h)$ with $(\theta+h/2)$ in the expression (2.11) and also in the definition of the coupled processes. Since we use Poisson random variables to generate the reaction firings for tau-leap simulations, it is possible that some state-components become negative during the simulation run. In this paper we deal with this problem rather crudely by setting the negative state-components to $0$. We have checked that this does not cause a significant loss of accuracy because the state-components become negative very rarely.

Note that among the methods considered here, eIPA is the only unbiased sensitivity estimation method. All the other methods are biased either due to a finite-difference approximation of the derivative (eCFD and eCRP) or due to tau-leap approximation of the sample paths ( $\tau$ IPA) or due to both these reasons ( $\tau$ CFD and $\tau$ CRP). In the examples, we apply each sensitivity estimation method $\mathcal{X}$ with a sample-size of $N=10^{5}$ , and compute the estimator mean $\widehat{\mu}_{N}$ (2.6), the standard deviation $\widehat{\sigma}_{N}$ (2.9), the relative standard deviation $\textnormal{RSD}(\mathcal{X})$ and the computational cost per sample $\mathcal{C}(\mathcal{X})$ (see Section 2). Assume that the exact sensitivity value is $s_{0}$ which is known. We compare the different estimation methods using the following two quantities - the percentage relative error (RE) defined by

[TABLE]

and the RSD adjusted computational cost (RSDCC) defined by

[TABLE]

The first quantity RE measures the accuracy of a method, while the second quantity RSDCC determines the overall computational time that will be required by the method to yield an estimate with the desired statistical precision (see (2.10)).
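As a rough illustration of how such summary quantities are computed from a batch of realizations, consider the sketch below. The function names are hypothetical, and the standard deviation of the estimator is assumed here to be the usual standard error of the sample mean. The percentage relative error follows the verbal description above. The sketch stops short of RSDCC, whose exact form is the one given in (4.38).

```python
import math

def estimator_summary(samples):
    """Empirical mean and standard error of the mean for i.i.d. samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)  # unbiased variance
    return mean, math.sqrt(var / n)

def relative_error_percent(estimate, s0):
    """Percentage relative error of an estimate against the exact value s0."""
    return 100.0 * abs(estimate - s0) / abs(s0)
```

For instance, `relative_error_percent(mean, s0)` applied to the first output of `estimator_summary` gives the accuracy measure, while the second output indicates how much of that error could be statistical noise rather than bias.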

Our numerical results will show that the exact schemes (eIPA, eCFD and eCRP) usually have a higher RSDCC than their tau-leap counterparts ( $\tau$ IPA, $\tau$ CFD and $\tau$ CRP), but expectedly their RE is lower. Generally the RE for eIPA is smaller than both eCFD and eCRP because of its unbiasedness and this advantage in accuracy often persists when we compare $\tau$ IPA with $\tau$ CFD and $\tau$ CRP. It can be seen that in most of the cases, the sample variance $\mathcal{V}(\mathcal{X})$ or the estimator standard deviation (2.9), remain of similar magnitude, when we switch from an exact scheme to its tau-leap version (see Appendix 5.2). This supports our claim in Section 2.3, that substituting exact paths with tau-leap trajectories allows one to trade-off bias with computational costs, and this trade-off relationship is somewhat “orthogonal” to other trade-off relationships shown in Table 1.

In all the examples below, the propensity functions $\lambda_{k}$ -s for all the reactions have the mass-action form [3] unless stated otherwise. Also $\partial$ always denotes the partial-derivative w.r.t. the designated sensitive parameter $\theta$ .

4.1 Single-species birth-death model

Our first example is a simple birth-death model in which a single species $\mathcal{S}$ is created and destroyed according to the following two reactions:

[TABLE]

Let $\theta_{1}=10$ , $\theta_{2}=0.1$ and assume that the sensitive parameter is $\theta=\theta_{2}$ . Let $(X(t))_{t\geq 0}$ be the Markov process representing the reaction dynamics. Assume that $X(0)=0$ . For $f(x)=x$ we wish to estimate

[TABLE]

for $T=5$ and $T=10$. For this example, we set $\tau_{\textnormal{max}}=0.5$. For each $T$ we estimate the sensitivity using all the six methods and the results are displayed in Table 3 in Appendix 5.2. For this network we can compute the sensitivity $S_{\theta}(f,T)$ exactly as the propensity functions are affine. These exact values are stated in the caption of Table 3, and they allow us to compute the RE of a method according to (4.37). We also compute the RSDCC for each method using (4.38), and we compare these RE and RSDCC values for all the methods in Figure 1A. (All the computations in this paper were performed using C++ programs on an Apple machine with a 2.9 GHz Intel Core i5 processor.) From these comparisons we can make the following observations: 1) The exact methods are typically more accurate than the tau-leap methods but they are usually more computationally demanding. 2) For $T=5$, eCFD/eCRP are far more accurate than $\tau$CFD/$\tau$CRP, suggesting that the two sources of bias (finite-difference and tau-leap approximations) are additive in nature. However the same is not true for $T=10$. 3) For both the cases $T=5$ and $T=10$, $\tau$IPA outperforms $\tau$CFD/$\tau$CRP in terms of accuracy even though it is slightly more computationally expensive. The same is true when we compare eIPA with eCFD/eCRP.
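To make the remark about exact computability concrete, the following sketch evaluates a closed-form sensitivity under an assumed form of the model: creation with constant propensity $\theta_{1}$ and destruction with mass-action propensity $\theta_{2}x$, starting from $X(0)=0$. Under that assumption the mean solves a linear ODE, giving $\mathbb{E}(X(T))=(\theta_{1}/\theta_{2})(1-e^{-\theta_{2}T})$, which can be differentiated with respect to $\theta_{2}$. The function names are hypothetical and the numbers produced are not a substitute for the values reported in the caption of Table 3.

```python
import math

def mean_copy_number(theta1, theta2, T):
    """E[X(T)] for X(0) = 0 under the assumed birth-death propensities."""
    return theta1 / theta2 * (1.0 - math.exp(-theta2 * T))

def sensitivity_theta2(theta1, theta2, T):
    """Closed-form derivative of E[X(T)] with respect to theta2."""
    decay = math.exp(-theta2 * T)
    return -theta1 / theta2**2 * (1.0 - decay) + theta1 * T / theta2 * decay

def centered_difference(theta1, theta2, T, h):
    """Centered finite difference of the mean, for comparison."""
    up = mean_copy_number(theta1, theta2 + h / 2, T)
    down = mean_copy_number(theta1, theta2 - h / 2, T)
    return (up - down) / h

for T in (5.0, 10.0):
    print(T, sensitivity_theta2(10.0, 0.1, T))
```

With $\theta_{1}=10$ and $\theta_{2}=0.1$ this formula evaluates to roughly $-90.2$ for $T=5$ and $-264.2$ for $T=10$. Comparing `sensitivity_theta2` with `centered_difference` for a large $h$ gives a feel for the finite-difference bias that is separate from any tau-leap error.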

In Figure 1B we numerically analyze the performance of $\tau$ IPA w.r.t. its two key parameters - the expected number of auxiliary paths $M_{0}$ and the maximum tau-leap step-size $\tau_{\textnormal{max}}$ . We see that RE is fairly insensitive to variations in $M_{0}$ while RSDCC first decreases with $M_{0}$ up to a certain point, and then it starts increasing with $M_{0}$ . As we are using a first-order explicit tau-leap scheme, it is unsurprising that RE increases almost linearly with $\tau_{\textnormal{max}}$ . However, importantly, RSDCC decreases exponentially with $\tau_{\textnormal{max}}$ , which makes it possible to use tau-leap simulations to trade-off a small amount of accuracy for a large gain in computational efficiency with $\tau$ IPA.

Observe that if we scale the production rate $\theta_{1}$ by the system-size or volume parameter $V$ , then the concentration process, derived by dividing the copy-number counts $X(t)$ by $V$ , converges to a deterministic ODE limit as $V\to\infty$ (see Chapter 11 in [14]). Often it is of interest to determine how the performance of various sensitivity estimation methods scales with the volume parameter $V$ . We investigate this issue for the exact schemes (eIPA, eCFD and eCRP) in Figure 2, by numerically examining the dependence of their RSD, RSDCC and RE on $V$ . Here we set the expected number of auxiliary paths $M_{0}$ for eIPA to be equal to $V$ . Note that RSD for finite-difference schemes (eCFD/eCRP) scales like $1/\sqrt{V}$ as was proved in [45] and consequently their RSDCC is of order $1$ , because the computational time per sample, which is proportional to the number of reaction events per unit time-interval, is of order $V$ . Similar to these finite-difference schemes the RSD for eIPA also scales like $1/\sqrt{V}$ , but its RSDCC is of order $V$ as its computational time per sample is of order $V^{2}$ because to generate each sample for eIPA, $M_{0}=V$ auxiliary paths need to be simulated in addition to the main sample path. This computational disadvantage of eIPA is compensated by the fact that accuracy of eIPA improves with volume (i.e. RE decreases with volume), while for the finite-difference schemes it is almost a constant. These numerical results suggest that the computational efficiency of eIPA scales with volume $V$ in the same way as it does for the CGT method (see Section 2.2) whose RSD has been shown to be of order $1$ w.r.t. volume $V$ (see [45]). Despite this similarity in volume scaling, eIPA is still a preferable unbiased method when compared to the CGT method, as its estimator variance does not become unbounded as the magnitude of the sensitive parameter approaches zero (see Section 2.2). 
The volume-scaling analysis presented here can also be performed for the tau-leap schemes by parameterizing the step-size $\tau_{\textnormal{max}}$ by volume $V$ as discussed in Section 3.3. We expect the results to be qualitatively similar to the exact schemes, because, as mentioned previously, it is observed that the sample variance remains similar when we switch from an exact scheme to its tau-leap version (see Appendix 5.2). However this needs to be investigated in detail in a future work.

4.2 Repressilator Network

Our second example considers the Repressilator network given in [13], which consists of three mutually repressing gene-expression modules (say 1,2 and 3). Repression occurs at the level of transcription, i.e. production of the three mRNAs $M_{1}$ , $M_{2}$ and $M_{3}$ , and it is carried out by the corresponding protein molecules $P_{1}$ , $P_{2}$ and $P_{3}$ in a cyclic pattern. In other words, protein $P_{i}$ represses the transcription of mRNA $M_{i-1}$ , where we identify $M_{0}$ with $M_{3}$ . The repression mechanism is modeled with a nonlinear Hill function. The repressilator network consists of $6$ biomolecular species and $12$ reactions described in Table 2.

We set the Hill coefficient $\alpha_{i}$ for the transcription of each mRNA to be $1$ (see reactions 1-3 in Table 2) and the degradation rate constant $\gamma_{i}$ for each protein to be $0.1$ (see reactions 10-12 in Table 2). Let $(X(t))_{t\geq 0}$ be the $\mathbb{N}^{6}_{0}$ -valued Markov process representing the reaction dynamics, under the species ordering described in the caption of Table 2. We assume that $X(0)=(0,0,0,0,0,0)$ and define $f:\mathbb{N}^{6}_{0}\to\mathbb{R}$ by $f(x_{1},\dots,x_{6})=x_{4}$ . At $T=10$ , our goal is to estimate

[TABLE]

for $\theta=\alpha_{1},\alpha_{2},\alpha_{3},\gamma_{1},\gamma_{2},\gamma_{3}$ . These values measure the sensitivity of the mean of protein $P_{1}$ population at time $T=10$ with respect to the Hill coefficients $\alpha_{i}$ -s and the protein degradation rates $\gamma_{j}$ -s. For this example, we set $\tau_{\textnormal{max}}=0.01$ .

For each $\theta$ we estimate the sensitivity using all the six methods and the results are displayed in Table 5 in Appendix 5.2. Unlike the previous example, we cannot compute the sensitivity values exactly because of nonlinearity of some of the propensity functions. So we obtain accurate approximations of these values using the unbiased estimator (eIPA) with a large sample size ( $N=10^{6}$ ) and they are provided in the caption of Table 5. With these values we can compute the REs (4.37), which are then compared along with RSDCCs for all the methods in Figure 3. The results vary with the choice of the sensitive parameter $\theta$ , but one can clearly see that $\tau$ IPA can be several times more accurate than $\tau$ CFD / $\tau$ CRP even though its RSDCC is of a similar magnitude. This is especially observable for cases $\theta=\alpha_{1},\alpha_{3}$ and $\gamma_{2}$ . Most notably for the case $\theta=\alpha_{1}$ , the RE for finite-difference schemes is around $800\%$ , while it is $1.3\%$ for eIPA and $5\%$ for $\tau$ IPA.

4.3 Genetic toggle switch

As our last example we look at a simple network with nonlinear propensity functions. Consider the network of a genetic toggle switch proposed by Gardner et al. [17]. This network has two species $\mathcal{U}$ and $\mathcal{V}$ that interact through the following four reactions

[TABLE]

where the propensity functions $\lambda_{i}$ -s are given by

[TABLE]

In the above expressions, $x_{1}$ and $x_{2}$ denote the number of molecules of $\mathcal{U}$ and $\mathcal{V}$ respectively. We set $\alpha_{1}=50$ , $\alpha_{2}=16$ , $\beta=2.5$ and $\gamma=1$ . Let $(X(t))_{t\geq 0}$ be the $\mathbb{N}^{2}_{0}$ -valued Markov process representing the reaction dynamics with initial state $(X_{1}(0),X_{2}(0))=(0,0)$ . For $T=10$ and $f(x)=x_{1}$ , our goal is to estimate

[TABLE]

for $\theta=\alpha_{1},\alpha_{2},\beta$ and $\gamma$ . In other words, we would like to measure the sensitivity of the mean of the number of $\mathcal{U}$ molecules at time $T=10$ , with respect to all the model parameters. For this example, we set $\tau_{\textnormal{max}}=0.1$ . We estimate these sensitivities with all the six methods and the results are presented in Table 4 in Appendix 5.2, and in Figure 4A.

As in the previous example, we estimate the true sensitivity values using the unbiased estimator (eIPA) with a large sample size ( $N=10^{6}$ ). These approximate values are given in the caption of Table 4 and they were used in computing the relative errors (4.37) for Figure 4. Here we find that eIPA outperforms eCFD/eCRP both in terms of accuracy and computational efficiency for all the parameters. Similarly $\tau$ IPA is computationally more efficient than $\tau$ CFD/ $\tau$ CRP for all the parameters, but except for the case $\theta=\alpha_{1}$ , its accuracy is similar to $\tau$ CFD/ $\tau$ CRP. In Figure 4B we numerically examine how the performance of $\tau$ IPA is affected by the parameter $M_{0}$ , for a couple of cases. As in Section 4.1, we find this effect to be quite small for RE but RSDCC first decreases with $M_{0}$ and then increases.

5 Conclusions and future work

Estimation of parameter sensitivities for stochastic reaction networks is an important and difficult problem. The main source of difficulty is that all the estimation methods rely on exact simulations of the reaction dynamics performed using Gillespie’s SSA [19] or its variants [18, 4]. It is well-known that these simulation algorithms are computationally very demanding because they track each and every reaction event. This issue represents the main bottleneck in the use of sensitivity analysis for systems modeled as stochastic reaction networks. The aim of this paper is to develop a method, called Tau Integral Path Algorithm ($\tau$IPA), that deals with this issue by requiring only approximate tau-leap simulations of the reaction dynamics, while still providing provably accurate estimates for the sensitivity values. This method is based on an explicit integral representation for parameter sensitivity that was derived from the formula given in [25]. Furthermore, by replacing the tau-leap simulation scheme in $\tau$IPA with an exact simulation scheme like SSA, we obtain a new unbiased method (called eIPA) for sensitivity estimation, that can serve as the natural limit of $\tau$IPA when the step-size $\tau$ gets smaller and smaller.

Using computational examples we compare $\tau$ IPA with tau-leap versions of the finite-difference schemes [2, 40, 51] that are commonly employed for sensitivity estimation. We find that in many cases, $\tau$ IPA outperforms these tau-leap finite-difference schemes in terms of both accuracy and computational efficiency. This makes $\tau$ IPA an appealing method for sensitivity analysis of stochastic reaction networks, where the exact dynamical simulations are computationally infeasible and tau-leap approximations become necessary.

As we argue in Section 2.3, tau-leap simulations provide a natural way to trade-off estimator bias with gains in computational speed. Therefore it would be of fundamental importance to extend the ideas in this paper and try to maximize the computational gains from tau-leap simulations while sacrificing the minimum amount of accuracy. In this context, we now mention two possible directions for future research. The method we proposed here, $\tau$ IPA, can work with any underlying tau-leap simulation scheme, but for simplicity we examined it with the most basic tau-leap scheme i.e. an explicit Euler method with a constant (deterministic) step-size and Poissonian reaction firings [20]. As this tau-leap scheme has several drawbacks (see [21]), it is very likely that $\tau$ IPA can yield much better results if a more sophisticated tau-leap scheme is employed, possibly with random step-sizes [11, 5, 32], or with Binomial leaps [43] or using implicit step-size selection [36]. We shall explore these issues in a future paper. Note that $\tau$ IPA essentially converts the problem of estimating parameter sensitivities to the problem of estimating a collection of expected values of the process with tau-leap simulations. The latter problem can be efficiently handled using multilevel strategies, where estimators are constructed for a range of $\tau$ -values, and are suitably coupled to simultaneously reduce the estimator’s bias and variance [7, 30, 32]. A promising approach would be to integrate these multilevel estimators with $\tau$ IPA to improve its accuracy and computational efficiency.

Appendix

5.1 Proofs of the main results

Proof of Theorem 3.4. Let $\{\mathcal{F}_{t}\}$ be the filtration generated by the process $(X_{\theta}(t))_{t\geq 0}$ and let $\sigma_{i}$ be its $i$-th jump time for $i=1,2,\dots$. We define $\sigma_{0}=0$ for convenience. Since the process $(X_{\theta}(t))_{t\geq 0}$ is constant between consecutive jump times we can write

[TABLE]

where $\delta_{i}=\sigma_{i+1}\wedge T-\sigma_{i}\wedge T$ and the last equality holds due to linearity of the expectation operator and the fact that $\delta_{i}=0$ if $\sigma_{i}\geq T$. Given $X_{\theta}(\sigma_{i})=y$ and $\sigma_{i}=u<T$, the random variable $\delta_{i}$ has the cumulative distribution function given by

[TABLE]

This shows that for any continuous function $g:[0,\infty)\to[0,\infty)$ we have

[TABLE]

where the last relation holds because by applying integration by parts we get

[TABLE]

Taking $g\equiv 1$ gives us $\mathbb{E}\left(\delta_{i}\middle|X_{\theta}(\sigma_{i})=y,\sigma_{i}=u\right)=\int_{0}^{T-u}e^{-\lambda_{0}(y,\theta)s}ds$ and therefore

[TABLE]

Substituting this in (5.1) we obtain

[TABLE]

Theorem 2.3 in [25] shows that the sensitivity value $S_{\theta}(f,T)$ can be expressed as

[TABLE]

where

[TABLE]

Using this fact along with (5.1) we obtain

[TABLE]

where

[TABLE]

However relation (5.41) with $g(s)=\Delta_{\zeta_{k}}\Psi_{\theta}(X_{\theta}(\sigma_{i}),f,T-\sigma_{i}-s)$ implies that given $X_{\theta}(\sigma_{i})$ and $\sigma_{i}<T$ , we have

[TABLE]

Substituting this in the last expression for $S_{\theta}(f,T)$ and using the fact that $X_{\theta}(s)=X_{\theta}(\sigma_{i})$ for all $s\in[\sigma_{i},\sigma_{i+1})$ we get

[TABLE]

This completes the proof of this result. $\Box$
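The holding-time identity used in this proof lends itself to a quick numerical check. The sketch below assumes the identity has the form $\mathbb{E}\left(\int_{0}^{\delta_{i}}g(s)ds\right)=\int_{0}^{T-u}g(s)e^{-\lambda_{0}s}ds$ for a truncated exponential holding time, takes the illustrative choice $g(s)=e^{-s}$, and compares a Monte Carlo estimate of the left side with the closed form of the right side. The function names and the numerical values of the rate and horizon are hypothetical.

```python
import math
import random


def truncated_holding_time(rng, rate, horizon):
    """Sample delta = min(E, horizon), where E is exponential with the given rate."""
    return min(rng.expovariate(rate), horizon)


def monte_carlo_lhs(rng, rate, horizon, n_samples):
    """Estimate E[ integral_0^delta exp(-s) ds ], which equals E[ 1 - exp(-delta) ]."""
    total = 0.0
    for _ in range(n_samples):
        delta = truncated_holding_time(rng, rate, horizon)
        total += 1.0 - math.exp(-delta)
    return total / n_samples


def closed_form_rhs(rate, horizon):
    """Evaluate integral_0^horizon exp(-s) * exp(-rate * s) ds in closed form."""
    return (1.0 - math.exp(-(1.0 + rate) * horizon)) / (1.0 + rate)


rng = random.Random(1)
rate, horizon = 2.0, 1.0  # illustrative stand-ins for lambda_0(y, theta) and T - u
lhs = monte_carlo_lhs(rng, rate, horizon, 100_000)
rhs = closed_form_rhs(rate, horizon)
```

With these settings the two quantities should agree up to Monte Carlo error.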

Proof of Theorem 3.6. For each $k=1,\dots,K$ define $g_{k}$ and $h_{k}$ by $g_{k}(x,t)=\partial\lambda_{k}(x)\Delta_{\zeta_{k}}\Psi(x,f,T-t)$ and $h_{k}(x,t)=\partial\lambda_{k}(x)\Delta_{\zeta_{k}}\widetilde{\Psi}_{\alpha_{1},\beta_{1}(t)}(x,f,T-t)$. Without loss of generality, we can assume that there exists a $C>0$ such that

[TABLE]

Then due to Lemma 3.3 we obtain

[TABLE]

where $c_{0}(p)$ is a constant that depends only on $p$ as well as $\zeta_{1},\dots,\zeta_{K}$ . Lemma 3.3 also shows that

[TABLE]

and $\sup_{t\in[0,T]}|g_{k}(x,t)|\leq c_{1}(p)C^{2}C_{2}(p,T)(1+\|x\|^{2p})$ , where $c_{1}(p)$ is again a constant that depends only on $p$ and $\zeta_{1},\dots,\zeta_{K}$ .

From (5.44) and Lemma 3.3 it follows that

[TABLE]

Moreover from (5.43), we get

[TABLE]

and hence using Assumption 2, we obtain

[TABLE]

Note that

[TABLE]

Using (5.45) and (5.46) we obtain the bound

[TABLE]

which proves the theorem. $\Box$
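As a rough illustration of how the bias in a sensitivity value can shrink with the step-size, and not as a reproduction of the bound above, consider a linear birth-death model with birth rate $b$ and death rate $d$. The sketch assumes that, ignoring the possibility of negative populations, the mean of the explicit Euler tau-leap process follows the recursion $m_{n+1}=m_{n}+\tau(b-dm_{n})$, and differentiates this recursion with respect to $b$. This is the sensitivity of the tau-leap mean, not the $\tau$ IPA estimator, and all names and values are hypothetical.

```python
import math


def exact_mean_sensitivity(death_rate, t_end):
    """d/db E[X(T)] for the birth-death model; it does not depend on X(0)."""
    return (1.0 - math.exp(-death_rate * t_end)) / death_rate


def euler_mean_sensitivity(death_rate, t_end, n_steps):
    """Derivative with respect to b of the Euler mean recursion m_{n+1} = m_n + tau * (b - d * m_n)."""
    tau = t_end / n_steps
    sensitivity = 0.0
    for _ in range(n_steps):
        sensitivity += tau * (1.0 - death_rate * sensitivity)
    return sensitivity


death_rate, t_end = 1.0, 1.0
exact = exact_mean_sensitivity(death_rate, t_end)
# Absolute bias for step-sizes 0.1, 0.05 and 0.025.
errors = [abs(euler_mean_sensitivity(death_rate, t_end, n) - exact) for n in (10, 20, 40)]
# Ratios close to 2 indicate a bias that is first order in the step-size.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```

Halving the step-size roughly halves the bias in this example, which is the first-order behaviour expected of an explicit Euler scheme.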

5.2 Supplementary Tables and Algorithms

Bibliography

[1] U. Alon, An introduction to systems biology: design principles of biological circuits, Chapman & Hall/CRC Mathematical and Computational Biology Series, Chapman & Hall/CRC, 2007.
[2] D. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains, SIAM Journal on Numerical Analysis, 50 (2012).
[3] D. Anderson and T. Kurtz, Continuous time Markov chain models for chemical reaction networks, in Design and Analysis of Biomolecular Circuits, H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore, eds., Springer-Verlag, 2011.
[4] D. F. Anderson, A modified next reaction method for simulating chemical systems with time dependent propensities and delays, The Journal of Chemical Physics, 127 (2007), 214107.
[5] D. F. Anderson, Incorporating postleap checks in tau-leaping, The Journal of Chemical Physics, 128 (2008), 054103.
[6] D. F. Anderson, A. Ganguly, and T. G. Kurtz, Error analysis of tau-leap simulation methods, Ann. Appl. Probab., 21 (2011), pp. 2226–2262.
[7] D. F. Anderson and D. J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications to biochemical kinetics, SIAM Multiscale Modeling and Simulation, 10 (2012), pp. 146–179.
[8] J. Bascompte, Structure and dynamics of ecological networks, Science, 329 (2010), pp. 765–766.

