Identification of Parameters for Large-scale Kinetic Models

Ugur G. Abdulla; Roby Poteau

arXiv:1904.07197·q-bio.QM·December 7, 2020·J. Comput. Phys.

Identification of Parameters for Large-scale Kinetic Models

Ugur G. Abdulla, Roby Poteau

PDF

TL;DR

This paper presents a modified numerical method for identifying parameters in large-scale nonlinear ODE models in systems biology, improving robustness and applicability to noisy, partial data sets.

Contribution

The authors enhance an existing parameter identification method by embedding a staggered corrector and multi-objective optimization, enabling application to large-scale models with non-identifiable parameters.

Findings

01

Demonstrated geometric convergence with minimal data sets

02

Applied the method to a benchmark pathway model with 36 parameters

03

Developed and shared the qlopt software package

Abstract

Inverse problem for the identification of the parameters for large-scale systems of nonlinear ordinary differential equations (ODEs) arising in systems biology is analyzed. In a recent paper in \textit{Mathematical Biosciences, 305(2018), 133-145}, the authors implemented the numerical method suggested by one of the authors in \textit{J. Optim. Theory Appl., 85, 3(1995), 509-526} for identification of parameters in moderate scale models of systems biology. This method combines Pontryagin optimization or Bellman's quasilinearization with sensitivity analysis and Tikhonov regularization. We suggest modification of the method by embedding a method of staggered corrector for sensitivity analysis and by enhancing multi-objective optimization which enables application of the method to large-scale models with practically non-identifiable parameters based on multiple data sets, possibly with…

Tables3

Table 1. Table 1: Comparison of several local optimization methods with the presented method for the three step metabolic network. The relative errors (r.e.), the number of iterations (n.i.), mean of the computational time (c.t.) of 20 runs, and the number of function evaluations (f.e.) are compared in numerical experiment designed by AMIGO2 which contained 16 datasets with 21 time measurements per component and per data set.

metric	qlopt	lsqnonlin	nl2sol
r.e.	$4.271 162 36 × 10^{- 4}$	$3.2356 × 10^{- 5}$	$4.3538 × 10^{- 4}$
n.i.	$8$	$16$	$7$
c.t. (s)	$1.388 763 5$	$7.6563$	$5.0426$
f.e.	$8$	$593$	$299$

Table 2. Table 4: The values of the control paramters P 𝑃 P and S 𝑆 S for estimations that used 16 data sets.

Data sets	P	S
1	$0.1$	$0.05$
2	$0.1$	$0.135 72$
3	$0.1$	$0.3684$
4	$0.1$	$1.0$
5	$0.464 16$	$0.05$
6	$0.464 16$	$0.135 72$
7	$0.464 16$	$0.3684$
8	$0.464 16$	$1.0$
9	$2.1544$	$0.05$
10	$2.1544$	$0.135 72$
11	$2.1544$	$0.3684$
12	$2.1544$	$1.0$
13	$10$	$0.05$
14	$10$	$0.135 72$
15	$10$	$0.3684$
16	$10$	$1.0$

Table 3. Table 5: The values of the control paramters P 𝑃 P and S 𝑆 S for estimations that used 5 data sets.

Data sets	P	S
1	$0.05$	$10$
2	$0.3684$	$2.1544$
3	$1.0$	$0.1$
4	$0.092 86$	$2.1544$
5	$0.135 72$	$2.1544$

Equations93

\frac{d x}{d t} = f (t, x, u, v), t_{0} \leq t \leq t_{1}

\frac{d x}{d t} = f (t, x, u, v), t_{0} \leq t \leq t_{1}

x (t_{0}) = x^{0} \in R^{n},

x (t_{0}) = x^{0} \in R^{n},

x = x (t) = (x_{1} (t), x_{2} (t), \dots, x_{n} (t)) : [t_{0}, t_{1}] \to R^{n}

x = x (t) = (x_{1} (t), x_{2} (t), \dots, x_{n} (t)) : [t_{0}, t_{1}] \to R^{n}

u = (u_{1}, u_{2}, \dots, u_{m}) \in R^{m}

u = (u_{1}, u_{2}, \dots, u_{m}) \in R^{m}

v = (v_{1}, v_{2}, \dots, v_{p}) \in R^{p}

v = (v_{1}, v_{2}, \dots, v_{p}) \in R^{p}

f = (f_{1}

f = (f_{1}

[t_{0}, t_{1}] \times R^{n} \times R^{m} \times R^{p} \to R^{n}

\frac{\partial f}{\partial x}, \frac{\partial f}{\partial u} .

\frac{\partial f}{\partial x}, \frac{\partial f}{\partial u} .

x = x^{d} (t) = x^{d} (t; u) := x (t, u, v^{d}), d = 1, ..., D

x = x^{d} (t) = x^{d} (t; u) := x (t, u, v^{d}), d = 1, ..., D

x_{N}^{d} (t) = p \to \infty lim x_{N, p}^{d} (t), t_{0} \leq t \leq t_{1},

x_{N}^{d} (t) = p \to \infty lim x_{N, p}^{d} (t), t_{0} \leq t \leq t_{1},

\frac{d x _{N, p}^{d}}{d t} = f (t, x_{N, p - 1}^{d}, u, v^{d}) +

\frac{d x _{N, p}^{d}}{d t} = f (t, x_{N, p - 1}^{d}, u, v^{d}) +

J (t, x_{N, p - 1}^{d}, u, v^{d}) (x_{N, p}^{d} - x_{N, p - 1}^{d}),

x_{N, p}^{d} (t_{0}) = x^{0},

J (t, x, u, v) = \frac{\partial f ( t , x , u , v )}{\partial x}

J (t, x, u, v) = \frac{\partial f ( t , x , u , v )}{\partial x}

u_{N} = u_{N - 1} + Δ u,

u_{N} = u_{N - 1} + Δ u,

R = x^{d} (t, u) - x_{N}^{d} (t, u_{_{N}}) .

R = x^{d} (t, u) - x_{N}^{d} (t, u_{_{N}}) .

R = Δ x_{N}^{d} (t) - U_{N}^{d} Δ u + o (∣Δ u ∣), as ∣Δ u ∣ \to 0,

R = Δ x_{N}^{d} (t) - U_{N}^{d} Δ u + o (∣Δ u ∣), as ∣Δ u ∣ \to 0,

Δ x_{N}^{d} (t) = x^{d} (t, u) - x_{N}^{d} (t, u_{N - 1}),

Δ x_{N}^{d} (t) = x^{d} (t, u) - x_{N}^{d} (t, u_{N - 1}),

U_{N}^{d,j}=\Big{(}\frac{\partial\mathbf{x}^{d}_{N}(t,\mathbf{u}_{N-1})}{\partial u^{j}}\Big{)},\ j=1,...,m.

U_{N}^{d,j}=\Big{(}\frac{\partial\mathbf{x}^{d}_{N}(t,\mathbf{u}_{N-1})}{\partial u^{j}}\Big{)},\ j=1,...,m.

\frac{d U _{N}^{d}}{d t} = \frac{\partial}{\partial u} f (t, x_{N}^{d}, u_{N - 1}, v^{d}) +

\frac{d U _{N}^{d}}{d t} = \frac{\partial}{\partial u} f (t, x_{N}^{d}, u_{N - 1}, v^{d}) +

J (t, x_{N}^{d}, u_{N - 1}, v^{d}) U_{N}^{d}, t_{0} \leq t \leq t_{1}

U_{N}^{d} (t_{0}) = 0,

J (Δ u) = d = 1 \sum D ∣∣Δ x_{N}^{d} - U_{N}^{d} Δ u ∣ ∣_{L_{2}^{n} (t_{0}, t_{1})}^{2},

J (Δ u) = d = 1 \sum D ∣∣Δ x_{N}^{d} - U_{N}^{d} Δ u ∣ ∣_{L_{2}^{n} (t_{0}, t_{1})}^{2},

(g, h)_{L_{2}^{n} (t_{0}, t_{1})} = \int_{t_{0}}^{t_{1}} g^{T} h d t .

(g, h)_{L_{2}^{n} (t_{0}, t_{1})} = \int_{t_{0}}^{t_{1}} g^{T} h d t .

\mathcal{J}_{N}^{\prime}(\Delta\mathbf{u})=2\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}\Big{[}(U^{d}_{N})^{T}U^{d}_{N}\Delta\mathbf{u}-(U^{d}_{N})^{T}\Delta\mathbf{x}^{d}_{N}\Big{]}\mathrm{d}t,

\mathcal{J}_{N}^{\prime}(\Delta\mathbf{u})=2\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}\Big{[}(U^{d}_{N})^{T}U^{d}_{N}\Delta\mathbf{u}-(U^{d}_{N})^{T}\Delta\mathbf{x}^{d}_{N}\Big{]}\mathrm{d}t,

J_{N}^{''} (Δ u) = 2 d = 1 \sum D t_{0} \int t_{1} (U_{N}^{d})^{T} U_{N}^{d} d t .

J_{N}^{''} (Δ u) = 2 d = 1 \sum D t_{0} \int t_{1} (U_{N}^{d})^{T} U_{N}^{d} d t .

A_{N} Δ u = P_{N},

A_{N} Δ u = P_{N},

A_{N}=\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}(U^{d}_{N})^{T}U^{d}_{N}\mathrm{d}t=\Big{(}a_{N}^{ij}\Big{)}_{i,j=1}^{m}

A_{N}=\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}(U^{d}_{N})^{T}U^{d}_{N}\mathrm{d}t=\Big{(}a_{N}^{ij}\Big{)}_{i,j=1}^{m}

a_{N}^{ij}=\sum_{d=1}^{D}\int_{t_{0}}^{t_{1}}\Big{(}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{i}}\Big{)}^{T}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{j}}dt,

a_{N}^{ij}=\sum_{d=1}^{D}\int_{t_{0}}^{t_{1}}\Big{(}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{i}}\Big{)}^{T}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{j}}dt,

P_{N}=\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}(U^{d}_{N})^{T}\Delta\mathbf{x}_{N}\mathrm{d}t=\Big{(}p_{N}^{j}\Big{)}_{j=1}^{m}

P_{N}=\sum_{d=1}^{D}\int\limits_{t_{0}}^{t_{1}}(U^{d}_{N})^{T}\Delta\mathbf{x}_{N}\mathrm{d}t=\Big{(}p_{N}^{j}\Big{)}_{j=1}^{m}

\displaystyle p_{N}^{j}=\sum_{d=1}^{D}\int_{t_{0}}^{t_{1}}\Big{(}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{j}}\Big{)}^{T}(x^{d}(t,\mathbf{u})

\displaystyle p_{N}^{j}=\sum_{d=1}^{D}\int_{t_{0}}^{t_{1}}\Big{(}\frac{\partial x_{N}^{d}(t,\mathbf{u}_{N-1})}{\partial u^{j}}\Big{)}^{T}(x^{d}(t,\mathbf{u})

- x_{N}^{d} (t, u_{N - 1})) d t .

a_{N}^{ij} = d = 1 \sum D (U_{N}^{d, i}, U_{N}^{d, j})_{L_{2}^{n} (t_{0}, t_{1})} .

a_{N}^{ij} = d = 1 \sum D (U_{N}^{d, i}, U_{N}^{d, j})_{L_{2}^{n} (t_{0}, t_{1})} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Identification of Parameters for Large-scale Kinetic Models

Ugur G. Abdulla & Roby Poteau

Department of Mathematical Sciences, Florida Institute of Technology, 150 W University Blvd, Melbourne, FL 32901

Abstract

Numerical method for the inverse problem on the identification of parameters for large-scale systems of nonlinear ordinary differential equations (ODEs) arising in systems biology is introduced. The method combines Pontryagin optimization or Bellman’s quasilinearization with sensitivity analysis and Tikhonov regularization. Embedding a method of staggered corrector for sensitivity analysis and by enhancing multi-objective optimization enables application of the method to large-scale models with practically non-identifiable parameters based on multiple data sets, possibly with partial and noisy measurements. The method is tested in two canonical benchmark models, such as three-step pathway modeled by 8 nonlinear ODEs with 36 unknown and two control input parameters, and a model of central carbon metabolism of Escherichia coli described by a system of 18 linear ODEs with 116 unknown parameters. The numerical results demonstrate superlinear convergence with a minimum data sets and with minimum measurements per data set, and possibly with partial and noisy measurements. Software package qlopt is developed and posted in GitHub. MATLAB package AMIGO2 is used to demonstrate advantage of qlopt over most popular methods/software such as lsqnonlin, fmincon and nl2sol.

keywords:

kinetic models , parameter estimation , nonlinear ODEs , three step metabolic network , carbon metabolism of Escherichia coli , superlinear convergence

††journal: Journal of Computational Physics

1 Introduction

Systems Biology is an actively emerging interdisciplinary field between biology and mathematics, based on the idea of treating biological systems as a whole entity which is more than the sum of its interrelated components. These systems are networks with emerging properties generated by complex interaction of a large number of cells or organisms. The mission is to reveal and understand the global properties of biological or bioengineering systems that occur through complex interactions on a microscopic level. One of the major goals of systems biology is to reveal, understand, and predict such properties through the development of mathematical models based on experimental data. In many cases, predictive models of systems biology are described by large systems of nonlinear differential equations. Quantitative identification of such systems requires the solution of inverse problems on the identification of parameters of the system. Inverse problems are ill-posed, meaning that a solution may not exist, may not be unique, and most importantly, may not be continuously dependent on the measurements. Since measurements always contain some error, it is impossible to solve inverse problems without regularization techniques [1, 2].

Ill-posedness of the inverse problems in large scale models of systems biology is strongly associated with the problem of correlation of parameters, expressed in explicit or implicit functional relation between the group of unknown parameters. Correlation of parameters in realistic large scale models is unavoidable, for it is motivated by biological phenomena of correlation between dynamics and signatures of genes and proteins. This creates an essential obstacle in solving the inverse problems for many Systems Biology models [3, 4, 5, 6]. The major difficulty arises due to the fact that correlated parameters in general cannot be identified uniquely via measurements. Therefore, correlated parameters are called non-identifiable. Besides unknown parameters, large-scale models include control input parameters, which are used for experimental design and measurements. Non-identifiable parameters are called structurally non-identifiable, if correlation of parameters is independent of control input parameters. Structural non-identifiability of parameters is an intrinsic property of the model and cannot be resolved with additional or more accurate measurements. On the other hand, if correlation of parameters depends on control input parameters, it is possible that non-identifiable parameter set can be remedied to be identifiable with improved and/or additional measurements. In this case parameters are called practically non-identifiable [1, 3, 4, 5, 6, 7, 8]. Besides the difficulties associated with ill-posedness of inverse problems, nonlinearity of the system causes extreme sensitivity with respect to the parameters. Therefore, delicate sensitivity analysis is of high demand [9]. Inverse problems on the identification of parameters in systems biology are an actively growing research area [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 2, 21, 22]. We refer to survey articles [11, 23, 19, 24] for the extensive list of references. Standard popular approach to parameter identification problem in systems biology is its formulation as nonlinear optimization problem with objective to find unknown parameters via minimization of the mismatch or residual between experimental data and model dynamics. Ultimate goal is to develop a global optimization method with least computational cost, which is robust with respect to nonlinearities and scales well with problem size [24]. Currently such an ideal method does not exist. Local methods can be classified as gradient-based methods ([25]) and gradient-free ones [26, 27]. Gradient based methods are very effective, but they only provide convergence to local optima. Gradient-free methods are less efficient, and has slow convergence rate [27]. Global optimization methods can be classified as stochastic ([28]) and deterministic ones [29]. Stochastic global optimization algorithms are implementing pseudorandom sequences to determine search directions toward the global optimum. Most effective class of stochastic methods are pure random search and adaptive sequential methods, clustering methods, population-based methods and nature inspired methods [30, 19]. Advantage of popular global stochastic methods such as simulated annealing, particle swarm optimization, or genetic algorithms is their scalability, but the downside is high computational cost [31]. Deterministic local optimization methods can be used as global optimization method by embedding "Multi-start" strategy into it which facilitates many optimization runs from randomly selected initial parameter guesses [19, 23]. Latin hypercube sampling [32] for partition of the parameter space is used to guarantee that each parameter estimation iteration starts with initial guess from different region in parameter space.

Some of the most popular local optimization methods available as open source software ([19, 33, 34]) are

Levenberg-Marquardt algorithm and trust-region-reflective method (function lsqnonlin in MatLab) [10];

2.

Sequential Quadratic Programming (function fmincon in MatLab Optimization toolbox);

3.

An adaptive non-linear least-squares algorithm (function nl2sol in MatLab) [20].

One can also consider so called hybrid optimization algorithms as a combination of stochastic and deterministic algorithms: first candidate set of parameter values are generated by stochastic algorithm, followed by deterministic iterative algorithms by choosing candidate set elements as initial guesses [23, 19]. In [23] comprehensive comparison of 15 optimization algorithms from three groups is pursued. 12 stochastic optimization algorithms ([35]) were compared with a deterministic trust region algorithm ([36]) in combination with Latin hypecube sampling and two different approaches for calculating derivatives of the objective function: finite difference approximation and analytically derived sensitivity equations. The results show that multi-start deterministic optimization using the sensitivity equations for the calculation of derivatives significantly outperforms all other tested algorithms. The performance of stochastic optimization algorithms was surprisingly low compared to the hybrid and fully deterministic optimization algorithms [23] .

In another recent paper [24] collection of benchmark problems were selected to evaluate the performance of two families of optimization methods:

multi-start local optimization (MS),

2.

enhanced scatter search metaheuristic (eSS),

the latter may be combined with deterministic local searches, leading to hybrid methods. Selected local optimization methods were

nl2sol-FWD: the nonlinear least-squares algorithm nl2sol, using forward sensitivity analysis for evaluating the gradients of the residuals.

2.

fmincon-ADJ: the interior point algorithm included in fmincon, using adjoint sensitivities for evaluating the gradient of the objective function.

3.

dhc-a gradient-free dynamic hill climbing algorithm.

4.

eSS without any local methods (NOLOC) and particle swarm optimization (PSO) [37].

Comprehensive evaluation of [24] clearly shows that high-quality sensitivity calculation methods provide a competitive advantage to local methods that exploit them. Optimization using adjoint and forward sensitivity analysis ${\it fmincon}-ADJ$ and ${\it nl2sol}-FWD$ usually outperform the gradient-free alternative dhc. The combinations of eSS with gradient-based methods, eSS-fmincon-ADJ and eSS-nl2sol-FWD, clearly outperform the gradient-free alternatives, eSS-dhc and eSS-NOLOC, as well as the gradient-free PSO.

The comparison analysis performed in [23, 24] demonstrates that robust deterministic local optimization methods embedded with MS or eSS strategy, and with sharp sensitivity analysis platform are the best candidates for creation of powerful global optimization methods for large-scale models of systems biology.

The goal of this paper is to develop effective local optimization method for solving inverse problems on the identification of finite-dimensional parameter sets for the large-scale systems of nonlinear ODEs arising in Systems Biology, and to develop an effective software which is competitive with currently known popular methods/software such as lsqnonlin, fmincon and nl2sol.

In [38] we implemented the numerical method, introduced originally in [39, 40], for the solution of the inverse problems for the canonical models of Systems Biology. The iterative method combines ideas of Pontryagin’s optimization or Bellman’s quasilinearization with sensitivity analysis and Tikhonov regularization. Extensive computational analysis pursued in [38] demonstrates that the method is very well adapted to canonical models of system biology with moderate size parameter sets and has quadratic convergence. Software package qlopt was developed and posted in GitHub [41]. MATLAB package AMIGO2 [19] was used to demonstrate the competitiveness and advantage of qlopt with other most popular local search methods like lsqnonlin, fmincon, nl2sol.

However, direct adaptation and scalability of the method implemented in [38] to inverse problems with significantly larger size was not as effective. In this paper we introduce a modification of the method which is effective to solve the inverse problem on the identification of parameters for large scale models in systems biology. In Section 2 we introduce the new modified method accompanied with two types of Tikhonov regularization algorithms. The modification is twofold.

Method of staggered corrector [42] is embedded into the step of calculation of the sensitivity vectors. Precisely, instead of solving linearized system and associated sensitivity system, we first solve original system through quasilinearization [43], and then use its solution to solve linear sensitivity system corresponding to the original nonlinear system. We use software package CVODES [44] to implement the method of staggered corrector into our algorithm.

2.

Multi-objective optimization is added to the method which enables the application of the method to large-scale models that are practically non-identifiable due to parameter correlation.

In Section 3 we present results and analysis of the application of the method to benchmark kinetic model of a biological network for a three-step pathway modeled by 8 nonlinear ODEs describing 8 metabolic concentrations with 36 unknown parameters and two control input parameters specifying the experimental design. Section 3.1 presents the results for noise-free simulated data. It is demonstrated that 5 to 16 data sets are satisfactory to uniquely identify all 36 parameters with high precision. This is followed by Section 3.2 where we demonstrate that the delicate implementation of the Tikhonov regularization with optimal choice of the regularization parameter significantly affects the convergence rate and precision of the algorithm. In the following Section 3.3 we demonstrate the effect of the number of time measurements for each component of the system on the convergence and accuracy of the method. Subsequently in Section 3.4 we explore in depth the major question of identifiability of the parameters with minimum number of data sets. It is demonstrated that minimum five data sets are required to identify uniquely all parameters with high precision. Section 3.5 demonstrates that the careful implementation of the Type II Tikhonov regularization significantly improves the convergence range of the algorithm. In Section 3.6 we apply the method to simulated noisy data and demonstrate its robustness. Section 3.7 presents numerical analysis of the rate of convergence of the method. Section 3.8 demonstrates the robustness of the method with respect to partial measurements. In Section 3.9, we pursue comparison and demonstrate the competitiveness and the advantages of our method and the associated software package qlopt against most advance methods/software like lsqnonlin, fmincon, nl2sol.

In Section 4 we analyze benchmark kinetic model of central carbon metabolism of Escherichia coli modeled by 18 linear differential equations for the concentrations of 17 intracellular metabolites and extracellular glycose [45, 46]. We apply the method to inverse problem on the identification of 116 parameters which express kinetic properties and maximum reaction rates. Section 4.1 presents the results with noise-free simulated data. High accuracy and robustness of the method with respect to number of time-measurements both in linear and logarithmic scale, convergence with partial measurements, and superlinear convergence rate is demonstrated. In Section 4.2 we apply the method to simulated noisy data and demonstrate its robustness. Section 4.3 demonstrates that the careful implementation of the Type II Tikhonov regularization significantly increases convergence range of the method in logarithmic scale.

Finally, in Section 5 we outline the main conclusions.

2 Description of the Method

Consider a dynamical system:

[TABLE]

where

[TABLE]

is the state vector,

[TABLE]

is the unknown parameter vector,

[TABLE]

is the control input parameter vector, and

[TABLE]

is a continuous vector function with continuous derivatives

[TABLE]

Consider inverse problem of finding the parameter $\mathbf{u}$ given $D$ measurements for the state vector $\mathbf{x}$ corresponding to $D$ fixed values of the control vector $\mathbf{v}$ :

[TABLE]

on an interval $t_{0}\leq t\leq t_{1}$ , where $\mathbf{x}^{d}(t_{0})=\mathbf{x}^{0}$ .

Having chosen the initial vector function $\mathbf{x}^{d}_{N,0}$ (say, $\mathbf{x}^{d}_{N,0}=\mathbf{x}^{d}(t)$ ), and initial approximation $\mathbf{u}=\mathbf{u}_{0}$ , we implement quasilinearization of (1) ([43]) and at each fixed iteration $N=1,2,...$ we find the solution as a limit

[TABLE]

where $\mathbf{x}^{d}_{N,p}$ solves the linear system of ODEs in $[t_{0},t_{1}]$ with $\mathbf{u}=\mathbf{u}_{N-1}$ :

[TABLE]

where

[TABLE]

is the n $\times$ n Jacobian matrix. It is well known that the convergence (3) has a quadratic rate [43]. Given the initial guess $\mathbf{u}_{0}$ of the unknown parameter $\mathbf{u}$ , we identify at every step of the iteration a new approximation

[TABLE]

which minimizes the $L_{2}$ -norm of the residues

[TABLE]

We have

[TABLE]

where

[TABLE]

$U^{d}_{N}$ is an $n\times m$ sensitivity matrix with columns

[TABLE]

$U_{N}^{d}$ solves the matrix differential system

[TABLE]

where $\mathbf{x}^{d}_{N}$ is the solution of (1),(2) with $\mathbf{u}=\mathbf{u}_{N-1},\mathbf{v}=\mathbf{v}^{d}$ as it is constructed in (3). Finding $\mathbf{x}^{d}_{N},U_{N}^{d}$ from (3), (4), (7) form the method of staggered corrector [42].

To find $\Delta\mathbf{u}$ , we minimize the multi-objective function

[TABLE]

where $L^{n}_{2}(t_{0},t_{1})$ is a Hilbert space of vector functions $g:(t_{0},t_{1})\to\mathbb{R}^{n}$ with inner product

[TABLE]

We have

[TABLE]

Therefore, minimum $\Delta\mathbf{u}$ satisfies the following system of linear algebraic equations

[TABLE]

where

[TABLE]

is an $m\times m$ symmetric matrix with elements

[TABLE]

and

[TABLE]

is an $m$ -vector with elements

[TABLE]

In fact, $A_{N}$ is a sum of Gram matrices $A_{N}^{d}$ of vectors $U_{N}^{d,j}$ , and

[TABLE]

It is known [47] that

[TABLE]

and it is positive, that is to say, $A_{N}^{d}$ is non-singular, if and only if the vectors $U_{N}^{d,j},j=1,...,m$ are linearly independent.

Hence, we suggest the following modfication of the numerical algorithm from [39].

2.1 Algorithm

Initialize $\mathbf{u}_{0}$ and set $N=1$ . 2. 2.

Set $\mathbf{x}^{d}_{N,0}(t)$ and find $\mathbf{x}^{d}_{N}(t,\mathbf{u}_{N-1})$ via quasilinearization from (3),(4). 3. 3.

Having $\mathbf{x}^{d}_{N}(\cdot,\mathbf{u}_{N-1})$ find sensitivity matrices $U^{d}_{N}$ by solving linear ODE system (7). 4. 4.

Find $\Delta\mathbf{u}$ by solving linear algebraic equations system (9) and update the new value $\mathbf{u}_{N}$ of the parameter using (5). 5. 5.

If satisfactory accuracy is achieved, then terminate the process, otherwise replace $N$ with $N+1$ and go back to Step 2. As termination criteria, the smallness of either of the expressions

[TABLE]

can be used.

2.2 Regularization

As in [38] we implement two types of Tikhonov regularization. Type I regularization is performed by replacing the function (8) with

[TABLE]

This yields the following linear system instead of (9)

[TABLE]

where $I$ is the identity matrix and $\alpha$ is a regularization parameter. Type II regularization is performed by replacing the function (8) with

[TABLE]

where $\mathbf{u}^{*}$ is a known vector expected to be close to the true value of the unknown parameter. This implies the following linear system instead of (9):

[TABLE]

2.3 Identifiability vs. Practical Non-identifiabilty

Convergence of the algorithm is connected to the identifiability of unknown parameters. In fact, $d$ th Gram matrix summand $A_{N}^{d}$ of $A_{N}$ in (9) is called Fisher information matrix (FIM) for the ODE system (1), which characterizes the information content of the experimental measurement in the $d$ th data set. Singularity of $A_{N}^{d}$ is equivalent to linear dependence of the sensitivity vectors $U_{N}^{d,j},j=1,...,m$ . From (7) it follows that the latter is equivalent to linear dependence of the columns of the matrix $\frac{\partial}{\partial\mathbf{u}}\mathbf{f}(t,\mathbf{x}^{d}_{N},\mathbf{u}_{N-1},\mathbf{v}^{d})$ , which is, in general, equivalent to correlation of parameters. Hence, singularity of the matrix $A_{N}$ is equivalent to non-identifiability of parameters. However, note that the matrix $\frac{\partial\mathbf{f}}{\partial\mathbf{u}}$ depends on the control input parameter $\mathbf{v}$ . Therefore, if selection of ${\bf v}$ can arrange independence of the columns of the matrix $\frac{\partial\mathbf{f}}{\partial\mathbf{u}}$ , the correlation of parameters will be remedied, and parameters will be practically non-identifiable. This will guarantee non-singularity of $A_{N}$ , and convergence of the algorithm can be established. The major goal is to pursue experimental design and collection of measurements for carefully selected values of the control parameter $\mathbf{v}$ , so that the algorithm converges to the unique solution of the inverse problem. Yet another difficulty arises when $A_{N}$ is non-singular but $detA_{N}$ is sufficiently small, and for computer simulation $A_{N}$ is treated as a singular matrix [48, 17, 8]. Our two regularization algorithms are developed to address such practical non-identifiability cases. A major factor for the convergence of the algorithm for the identification of practically non-identifiable parameters is the increase of number of data sets $D$ . Specifically, there is a minimum number of data sets with different inputs of control parameters for experimental design needed to relieve the parameter correlations and acquire suitable measurement data for unique parameter estimation [4].

3 Benchmark Model 1: Biological Network for a Three-step Pathway

We tested the method on a benchmark model of a biological network for a three-step pathway modeled by 8 nonlinear ODEs describing 8 metabolic concentrations and 36 parameters $p_{i},i=1,...,36$ ([49]). Two parameters $P$ and $S$ are control input parameters specified by the experimental design. The unknown parameters $p_{i}$ are correlated, but their functional relationship with one another is dependent on the input parameters $P$ and $S$ , and in general parameters are practically non-identifiable, and can be identified with multiple data sets. In [31], the inverse problem was analyzed with 16 noise-free data sets, and in [8] with 16 both noise-free and noisy data sets. The results demonstrated strong parameter correlations in several groups, with accurate parameter values identified in [8]. Parameter correlations were analyzed in [4]. It is demonstrated that correlated parameters are practically non-identifiable for a single data set and at least 5 data sets with different control inputs are required to uniquely estimate the 36 parameters of this model.

[TABLE]

For our experiments we used the common values for the initial conditions $(6.6667e-1,5.7254e-1,4.1758e-1,4.0e-1,3.6409e-1,2.9457e-1,1.419,9.3464e-1)$ , with $t_{0}=0$ and $t_{1}=120.$ True values of 36 parameters are outlined in last columns of Tables 2 and 3 in Appendix. We implemented 16 input parameters given in AMIGO2 [19] (Table 4 in Appendix) and 5 input parameters given in [4] (Table 5 in Appendix) for our experiments. We chose the regularization parameter $\alpha$ as a function of the residual:

[TABLE]

where $C,\gamma>0$ are chosen experimentally.

3.1 Numerical Results with Noise-free Data Sets

We applied the numerical method to identify the 36 parameters with 16 and 5 data sets. We generated simulated measurements for each data set by solving the system of 8 nonlinear ODEs with true values of 36 parameters. We chose the number of time data points for each of the 8 components of the system either at 240 or at 20 uniformly distributed time grid points in the segment $[0,120]$ . Computational cost of each iteration per one data set consists of iterative solution of the system of 8 nonlinear ODEs through quasilinearization; solving a system of 288 linear ODEs to identify sensitivity matrix-function; calculation of 1332 integrals for entries of the matrix $A_{N}$ and vector $P_{N}$ ; and finally solving a system of 36 linear algebraic equations to find the increment of the parameters. The green line in Figures 1 and 2 demonstrate the results for 16 data sets with 240 time points. Rapid convergence to the true solution happens in only 7 iterations. Next we applied the method with 5 data sets. Though it required 3 extra iterations, the black line in Figures 1 and 2 demonstrate the rapid convergence of the method with reduced error.

Next we applied the method by choosing measurements at 20 time grid points for each of the 8 components. The results are demonstrated for 16 and 5 data sets in Figures 1 and 2 by the blue and red lines, respectively. The algorithm converges in the same number of iterations with respect to the number of data sets, while maintaining around the same level of accuracy, as demonstrated in Figures 1 and 2.

3.2 Effect of the Regularization Parameter $\alpha$

The choice of the regularization parameter $\alpha$ is an important factor which significantly improves the convergence rate and computational cost of the algorithm. To demonstrate the existence of the optimal non-trivial value of $\alpha$ at every fixed step $N$ , we considered profiles of $\alpha$ vs $|\mathbf{u}_{N-1}+\Delta\mathbf{u}-\mathbf{u}|$ , where $\mathbf{u}$ is the true solution. Figure 3 corresponds to the 2nd and 4th iterations of the green line in Figure 1. Similarly, Figure 4 corresponds to 3rd and 6th iteration of the black line in the same figure. In each example there is a clear minimum which is the best choice of the regularization parameter. The bullets on the graph corresponds to our choice of the regularization parameter according to the residual method (14). In fact, optimal or nearly optimal choice of the regularization parameter significantly increases convergence rate of the method from geometric to be close to quadratic convergence (see Section 3.7). The residual method provides a close, but not necessarily optimal value of $\alpha$ . Figure 3 demonstrates that our choice of the regularization parameter by the residual method is optimal. However, Figure 4 demonstrates that residual method doesn’t necessarily provide optimal choice of $\alpha$ . This analysis demonstrates that there is room for improvement of the convergence rate of the algorithm through implementation of a more effective method for the search of regularization parameter $\alpha$ without significantly affecting computational cost.

3.3 Convergence vs. Number of Data Points

The method is very robust and convergence is still the case if the number of data points is reduced to a single time measurement at the end of the time interval for each of the 8 components of the system. Figure 5 demonstrates the dependence of the number of time measurements for each component on the average error

[TABLE]

calculated at the final iteration in the experiment with $D=5$ data sets. Three graphs correspond to three different settings of the relative and absolute tolerances for CVODES. Decrease of the latter increases the overall accuracy of the result. Similar dependence in the experiment with 16 data sets and with CVODES tolerance being set up at $1\text{\times}{10}^{-6}$ is demonstrated in Figure 6. Some of the variation in the chart can be attributed to error accumulation and noise.

3.4 Convergence vs. Number of Data Sets

Our numerical analysis confirms the result of [4] that at least 5 data sets with different control inputs are required to uniquely estimate the 36 parameters of this model. Tables 2 and 3 in Appendix demonstrate the results of the numerical experiments when the number of data sets vary from 1 to 5, and time measurements for each of the 8 components of the system is 240 and 20 respectively. Table 2 demonstrates that when the number of data sets increases from 1 to 5 with accuracy $10^{-3}$ , the number of identified parameters increases as 22, 27, 32, 34 and 36 accordingly, provided that 240 time measurements are given. Table 3 demonstrates that with 20 time measurements the same number increases as 11, 24, 32, 33, 36.

3.5 Range of convergence

We define the range of convergence as a neighborhood of the true solution $\mathbf{u}$ in $\mathbb{R}^{36}$ such that for any $\mathbf{u}_{0}$ chosen from it, the sequence $\mathbf{u}_{N}$ constructed according to our algorithm converges to $\mathbf{u}$ . Consider the rectangular prism neighborhood of $\mathbf{u}$ :

[TABLE]

where $\tau$ and $\omega$ are two positive real numbers satisfying $\tau<1<\omega$ . Numerical analysis demonstrates that for our model example, $\mathcal{P}_{0.5}^{1.65}$ is the largest rectangular prism contained in the convergence range according to the algorithm accompanied by Type I regularization. Figure 7 demonstrates the convergence with initial iteration $\mathbf{u}_{0}$ chosen at extremes of $\mathcal{P}_{0.5}^{1.65}$ , namely $0.5\mathbf{u}$ and $1.65\mathbf{u}$ , respectively.

Careful implementation of Type II regularization allows significant expansion of the convergence range. In fact, by selecting $\mathbf{u}^{*}$ at the extremes of $\mathcal{P}_{0.5}^{1.65}$ , namely $\mathbf{u}^{*}=0.5\mathbf{u}$ and $\mathbf{u}^{*}=1.65\mathbf{u}$ we increased the convergence range to $\mathcal{P}_{0.03}^{\infty}$ according to the algorithm accompanied by Type II regularization. Figure 8 demonstrates the results of convergence of the method with Type II regularization when $\mathbf{u}^{*}=0.5\mathbf{u}$ , and initial iteration is chosen as $0.03\mathbf{u}$ . It also demonstrates the results when $\mathbf{u}^{*}=1.65\mathbf{u}$ , and initial iteration is chosen as $1001\mathbf{u}$ .

3.6 Convergence with Noisy Measurements

We pursued numerical experiments with simulated noisy data with Gaussian noise

[TABLE]

where $p$ is a percentage and $\nu_{i}$ is a random variable with standard normal distribution:

[TABLE]

Figure 9 demonstrates the convergence in the experiment with 5 data sets and 240 noisy time measurements with $p=1,3$ and $5$ .

In Figures 10 and 11 we show the box plot based on 100 simulations for the residual and parameter vector error dependence on the noise percentage $p$ .

Similar results with 20 noisy time measurements are given in Figure 12.

3.7 Rate of convergence

To estimate the convergence rate $\gamma$ from the relation

[TABLE]

we plot $\log|\mathbf{u}_{k+1}-\mathbf{u}_{k}|$ vs. $\log|\mathbf{u}_{k}-\mathbf{u}_{k-1}|$ and find a line of best fit to identify $\gamma$ and $C$ . Figures 13 and 14 demonstrate the outcome. For the numerical experiment for the green line in Figure 1 we have $\gamma=1.6104,C=3.1622E-3$ , and for the black line we have $\gamma=1.1674,C=6.3271E-1$ . The difference in convergence rate of two examples is in particular due to choice of the regularization parameter $\alpha$ . Choosing the optimal choice for $\alpha$ , as it is demonstrated in Figure 3 vs. Figure 4, causes higher convergence rate for the numerical experiment expressed in Figure 13 vs. Figure 14. We expect theoretical convergence rate of the method is quadratic [39].

3.8 Convergence with Partial Measurements

We tested the convergence of the method when only some of the components of the system have available measurements or partial measurements. In this case the inverse problem must be solved with partial observations. A typical result is demonstrated in Figure 15. We considered our numerical experiment with 5 data sets, and with 20 time measurements of only components 3, 4, 5, and 7. Figure 15 demonstrates the convergence, although convergence rate slowed down in comparison with the experiment when full set of measurements are given.

3.9 Comparison with lsqnonlin, nl2sol and fmincon

As in our previous paper [38] we are comparing our method qlopt with the most popular methods available as open software [19, 33, 34] such as

Levenberg-Marquardt algorithm and trust-region-reflective method (function lsqnonlin in MatLab) [10].

2.

An adaptive non-linear least-squares algorithm (function nl2sol in MatLab) [20].

3.

Sequential Quadratic Programming (function fmincon in MatLab Optimization toolbox)

We used model example provided by AMIGO2, which had 16 data sets with each component evaluated at 21 time points giving a total of 2688 data points. We ran each algorithm 20 times and recorded the average relative error of the parameter values (r.e.), the median number of objective function evaluations (f.e.), the average computational time (c.t.), and the median number of iterations (n.i.). The results are demonstrated in Table 1. Algorithm performed by fmincon did not converge. All three other methods have a comparable relative error. In terms of required number of iterations, our method is comparable to nl2sol, and both have a clear advantage over lsqnonlin. In terms of computational time and function evaluations our method has an enormous advantage over both methods. It should be noted that our software package qlopt is using C++ and Eigen, which gives an advantage over MatLab-based methods with respect to computational time.

4 Benchmark Model 2: Central Carbon Metabolism of Escherichia coli

The method is tested in a benchmark model of central carbon metabolism of Escherichia coli introduced in [45] and labeled as model B2 in [46]. The mathematical model reproduces the response to a pulse in extracellular glucose concentration. It consists of a system of 18 differential equations for the concentrations of metabolites, which include 17 intracellular metabolites and extracellular glycose. As in [46] we consider inverse problem on the identification of 116 parameters which express kinetic properties and maximum reaction rates.

4.1 Numerical Results with Noise-free Data

We applied the numerical method to identify the 116 parameters with one data set ( $d=1$ ). We generated simulated measurements by solving the system of 18 linear ODEs with true values of 116 parameters. We choose the number of time data points for each of the 18 components of the system at 1200 uniformly distributed time grid points in the segment $[0,300]$ . Note that since the system is linear, there is no need on quasilinearization step in our algorithm. Computational cost of each iteration consists of solution of the original system of 18 linear ODEs; solving a system of 2088 linear ODEs to identify sensitivity matrix-function; calculation of 13572 integrals for entries of the matrix $A_{N}$ and vector $P_{N}$ ; and finally solving a system of 116 linear algebraic equations to find the increment of the parameters. We applied our algorithm by selecting number of uniformly distributed in $[0,300]$ data points varying between 1 and 1200. The method is extremely robust and convergence is still the case if the number of data points is reduced to a single time measurement at the end of the time interval for each of the 18 components of the system. Figure 16 demonstrates the dependence of the number of time measurements for each component on the Euclidean norm difference of the true parameter vector from the parameter vector at the final iteration. Figure 17 demonstrate the results of the same experiment pursued in logarithmic scale. In general, solving problems in the logarithmic scale improves convergence, especially when the parameters vary greatly in magnitude [24, 23].

Next, we pursued an experiment with partial measurements: 100 data points taken only for the 9 components of the system. This is similar to experiments analyzed in [45, 46]. Figure 18 demonstrates rapid numerical convergence. Error $||\mathbf{x}-\mathbf{x}_{N}||_{L_{2}^{n}}$ reaches desired accuracy $10^{-5}$ in 8 steps, if initial iteration is chosen as $\bf{u}_{0}=1.75\bf{u}$ , where $\bf{u}$ is a true parameter vector. Figure 19 demonstrates that the algorithm maintains superlinear convergence rate.

4.2 Convergence with Noisy Measurements

We pursued numerical experiments with simulated noisy data with Gaussian noise as in (15). Figure 20 demonstrates the box plot based on 100 simulations for the parameter vector error dependence on the noise percentage $p$ changing between 1% to 10%. In every experiment the iterative value of the regularization parameter $\alpha$ was chosen optimally.

4.3 Range of convergence

Numerical analysis demonstrates that for our model example B2 using the logarithmic scale for the parameters, with the help of Type II regularization, the convergence range can be extended to $\mathcal{P}_{0.05}^{105}$ (before log transformation). Figure 21 demonstrates the convergence with initial iteration $\mathbf{u}_{0}$ chosen at extremes of $\mathcal{P}_{0.05}^{105}$ , namely $0.05\mathbf{u}$ and $105\mathbf{u}$ , respectively.

5 Conclusions

This paper develops the numerical method for solving inverse problem on the identification of parameters for large scale kinetic models in systems biology. The iterative method combines ideas of Pontryagin optimization or Bellman quasilinearization with sensitivity analysis and Tikhonov regularization. Embedding a method of staggered corrector for sensitivity analysis and by enhancing multi-objective optimization enables application of the method to large-scale models with practically non-identifiable parameters based on multiple data sets, possibly with partial and noisy measurements. The method is tested in two benchmark kinetic models, such as three-step pathway modeled by 8 nonlinear ODEs with 36 unknown and two control input parameters, and a model of central carbon metabolism of Escherichia coli described by a system of 18 linear ODEs with 116 unknown parameters. Extensive analysis demonstrates that the modified method is extremely well adapted to large scale problems. The main conclusions of the paper can be summarized as follows:

There is a minimum number of data sets with different control parameter inputs required to achieve superlinear/quadratic convergence and unique identifiability of parameters for large-scale problems.

2.

Increase of data sets beyond the minimum doesn’t significantly affect convergence rate and accuracy, but possibly affects the computational cost.

3.

The method is extremely robust in terms of required number of time measurements for components of the system for every data set. For the benchmark models, high accuracy is achieved even with the single measurement for each component at the final time instance.

4.

Optimal choice of the Tikhonov regularization parameter significantly increases the convergence rate and precision.

5.

The method is robust with respect to noisy measurements. Simulating up to 5% Gaussian noise in benchmark models does not affect the convergence rate, but only adds some additional error to final output in accordance with the noise level.

6.

Implementation of the Type II Tikhonov regularization significantly increases the convergence range of the algorithm.

7.

Method is robust with respect to partial measurements. Application to the benchmark model with measurements of only half of the components demonstrates convergence with slightly reduced but still quite high accuracy.

8.

The method is highly competitive and has an advantage over popular methods such as lsqnonlin, fmincon, nl2sol in terms of computational time, number of iterations and function evaluations.

9.

Combination of the method with "multi-start" strategy and latin hypercube sampling will open a way to develop a powerful global optimization method with least computational cost, which is robust with respect to nonlinearities and scales well with problem size

References

6 Appendix

Tables 2 and 3 demonstrate the dependence of the parameter identification on the number of data sets 1-5 for the Benchmark Model 1 (see Section 3.4).

Bibliography49

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] A. N. Tikhonov, V. Y. Arsenin, Methods for Solving Ill-posed Problems, John Wiley & Sons, 1977.
2[2] A. Gábor, J. R. Banga, Robust and efficient parameter estimation in dynamic models of biological systems, BMC systems biology 9 (1) (2015) 74.
3[3] S. Audoly, G. Bellu, L. D’Angio, M. P. Saccomani, C. Cobelli, Global identifiability of nonlinear models of biological systems, IEEE Transactions on Biomedical Engineering 48 (1) (2001) 55–65.
4[4] P. Li, Q. D. Vu, Identification of parameter correlations for parameter estimation in dynamic biological models, BMC Systems Biology 7 (91).
5[5] M. J. Chappell, K. R. Godfrey, S. Vajda, Global identifiability of the parameters of nonlinear systems with specified inputs: A comparison of methods, Mathematical Biosciences 102 (1").
6[6] A. Raue, C. Kreutz, T. Maiwald, U. Klingmüller, J. Timmer, Addressing parameter identifiability by model-based experimentation, IET Systems Biology 5 (2) (2011) 120–130.
7[7] O.-T. Chis, J. R. Banga, E. Balsa-Canto, Structural identifiability of systems biology models: A critical comparison of methods, PLOS ONE 6.
8[8] M. Rodriguez-Fernandez, P. Mendes, J. R. Banga, A hybrid approach for efficient and robust parameter estimation in biochemical pathways, Biosystems 83 (2) (2006) 248 – 265.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Identification of Parameters for Large-scale Kinetic Models

Abstract

keywords:

1 Introduction

2 Description of the Method

2.1 Algorithm

2.2 Regularization

2.3 Identifiability vs. Practical Non-identifiabilty

3 Benchmark Model 1: Biological Network for a Three-step Pathway

3.1 Numerical Results with Noise-free Data Sets

3.2 Effect of the Regularization Parameter α\alphaα

3.3 Convergence vs. Number of Data Points

3.4 Convergence vs. Number of Data Sets

3.5 Range of convergence

3.6 Convergence with Noisy Measurements

3.7 Rate of convergence

3.8 Convergence with Partial Measurements

3.9 Comparison with lsqnonlin, nl2sol and fmincon

4 Benchmark Model 2: Central Carbon Metabolism of Escherichia coli

4.1 Numerical Results with Noise-free Data

4.2 Convergence with Noisy Measurements

4.3 Range of convergence

5 Conclusions

6 Appendix

3.2 Effect of the Regularization Parameter $\alpha$