Tropical Abstraction of Biochemical Reaction Networks with Guarantees
Andreea Beica, J\'er\^ome Feret, Tatjana Petrov

TL;DR
This paper introduces a tropical analysis-based approximation method for biochemical reaction network models that guarantees bounds on species concentrations, simplifying complex ODE systems while maintaining accuracy.
Contribution
It presents a novel tropical abstraction technique that provides guaranteed bounds and hybrid models for high-dimensional biochemical ODE systems, improving analysis efficiency.
Findings
Method provides sound interval bounds for species concentrations.
Hybrid models adapt to changing dominance relations during system dynamics.
Tested successfully on multiple biochemical case studies.
Abstract
Biochemical molecules interact through modification and binding reactions, giving raise to a combinatorial number of possible biochemical species. The time-dependent evolution of concentrations of the species is commonly described by a system of coupled ordinary differential equations (ODEs). However, the analysis of such high-dimensional, non-linear system of equations is often computationally expensive and even prohibitive in practice. The major challenge towards reducing such models is providing the guarantees as to how the solution of the reduced model relates to that of the original model, while avoiding to solve the original model. In this paper, we have designed and tested an approximation method for ODE models of biochemical reaction systems, in which the guarantees are our major requirement. Borrowing from tropical analysis techniques, dominance relations among terms of each…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Tropical Abstraction of Biochemical Reaction Networks with Guarantees
Andreea Beica
Département d’informatique
École normale supérieure, CNRS, PSL Research University
75005 Paris, France
Jérôme Feret
Département d’informatique
École normale supérieure, CNRS, PSL Research University
75005 Paris, France
INRIA
Tatjana Petrov
Department of Computer and Information Science
University of Konstanz, Germany
Abstract
Biochemical molecules interact through modification and binding reactions, giving raise to a combinatorial number of possible biochemical species. The time-dependent evolution of concentrations of the species is commonly described by a system of coupled ordinary differential equations (ODEs). However, the analysis of such high-dimensional, non-linear system of equations is often computationally expensive and even prohibitive in practice. The major challenge towards reducing such models is providing the guarantees as to how the solution of the reduced model relates to that of the original model, while avoiding to solve the original model.
In this paper, we have designed and tested an approximation method for ODE models of biochemical reaction systems, in which the guarantees are our major requirement. Borrowing from tropical analysis techniques, we look at the dominance relations among terms of each species’ ODE. These dominance relations can be exploited to simplify the original model, by neglecting the dominated terms. As the dominant subsystems can change during the system’s dynamics, depending on which species dominate the others, several possible modes exist. Thus, simpler models consisting of only the dominant subsystems can be assembled into hybrid, piecewise smooth models, which approximate the behavior of the initial system. By combining the detection of dominated terms with symbolic bounds propagation, we show how to approximate the original model by an assembly of simpler models, consisting in ordinary differential equations that provide time-dependent lower and upper bounds for the concentrations of the initial model’s species.
The utility of our method is twofold. On the one hand, it provides a reduction heuristics that performs without any prior knowledge of the initial system’s behavior (i.e., no simulation of the initial system is needed in order to reduce it). On the other hand, our method provides sound interval bounds for each species, and hence can serve to evaluate the faithfulness of tropicalization reduction heuristics for ODE models of biochemical reduction systems. The method is tested on several case studies.
keywords:
ODE models, model reduction, tropicalization, numerical approximation
††volume: NN††journal: Electronic Notes in Theoretical Computer Science††thanks: Email: \[email protected]††thanks: Email: \[email protected]††thanks: Email: \[email protected], Tatjana Petrov’s research is supported by the Ministry of Science, Research and the Arts of the state of Baden-Württemberg
1 Introduction
As biology becomes a data intensive science, due to advancements in high throughput molecular biology, the importance of in silico dynamical models of complex biological systems that are able to reproduce intricate behaviors observed in experimental settings increases. As such, modeling becomes a part of biological reasoning, but turns out to be a particularly challenging task. In particular, in models of biochemical networks, the number of possible chemical species is often subject to combinatorial explosion, due to the large number of species that may arise as a result of protein bindings and post-translational modifications [17]. As a consequence, mechanistic models of signaling pathways easily become very combinatorial. A common modeling approach describes the time-dependent evolution of concentrations of each of the modeled species through a system of coupled ordinary differential equations (ODEs). The combinatorial explosion of species and rich interaction scheme renders solving such a system of ODEs often prohibitory in practice, let alone the fact that is it already an approximation of its stochastic counter-part [20], as well as that the equations themselves do not transparently reflect the underlying mechanisms. Addressing the latter, formalisms allowing to write the mechanistic hypothesis in form of discrete transition steps have been proposed: Boolean networks[30], logical networks[32], Petri Nets[8], cellular automata[16], rule-based languages[10], to name the most common. Languages such as Kappa[10, 11] and BNGL[3] provide compact ways of describing models prone to combinatorial explosion, of simulating them [9], and even transforming them into ODEs [3]. However, the curse of dimensionality once again rises when trying to compute the system behavior.
A strategy to cope with such complexity is model reduction, in which certain properties of biochemical models are exploited in order to obtain simpler versions of the original complex model; these simpler models should preserve the important behavioral aspects of the initial system. An example of such a property is the multiscaleness of biochemical networks, with respect to both time-scales and species’ abundance. In the case of the former, it is known that biochemical processes governing network dynamics span over many well separated timescales: while protein complex formation occurs on the seconds scale, post-translational protein modification takes minutes, and changing gene expression can take hours, or even days. As for the latter, multiscaleness also applies to the abundance of various species in biochemical networks: the DNA molecule has one to a few copies, while mRNA copy numbers can vary from a few to tens of thousands. On the one hand, these widely different time- and concentration scales represent challenges for the estimation of rate constants, for the measurement of low-concentration species, and even for the numerical integration. On the other hand, they represent a feature that can be exploited for model reduction purposes, allowing to approximate the complete mechanistic description with simpler rate expressions, retaining the essential features of the full problem on the time scale or in the concentration range of interest. The dynamics of multiscale, large biochemical systems can be reduced to those of simpler models, called dominant subsystems [26], which contain less parameters and are easier to analyze. Dominant subsystems are chosen by comparing the time-scales of the large system. For example, the classical quasi steady-state (QSS) [5] and quasi-equilibrium (QE) approximations [23, 18] are conditions that lead to dominance, and represent popular methods for the computation of “first approximations” to the slow invariant manifold. Classical QSS is based on the small concentrations of highly reactive intermediate species (i.e., atoms, ions, enzymes and substrate-enzyme complexes)[4], while in the QE approximation the reduction of the full mechanism is done based on the existence of fast and slow reactions.
The multiscaleness property of biochemical network is by definition closely linked to the mathematical notion of dominance, captured in the framework of tropical analysis[1, 21]. Recently, a class of semi-formal methods for reducing and hybridizing models of biochemical networks has been developed, based on ideas from tropical analysis [26, 27, 28, 29]. These methods exploit the multiscaleness of biochemical networks, in order to deduce dominance relations among parameters and/or reaction rates, which can then be used to obtain a system of truncated ODEs (by eliminating the dominated terms). One of the advantages of using dominance relations in multi-scale networks is that it helps cope with parameter uncertainty: parameter values are replaced with their orders of magnitude, which are easier to determine. However, providing guarantees as to how the solution of the reduced model relates to the original one remains a challenge. Such is also the case in [2], where we proposed a reduction framework for rule-based models, based on time-scale separation. While this time-scale separation technique is justified by asymptotic convergence results, for any concrete parameter values, there is no information on the accuracy of the trajectories obtained by executing the reduced model.
In this paper, we design an approximation method for ODE models of biochemical networks, in which the guarantees are our major requirement. Our method combines abstraction and numerical approximation, and aims at providing better understanding/evaluation of tropical reduction methods. We abstract the solution of the original system of ODEs by a box, that over-approximates the state of the original system and provides lower and upper bounds for the value of each variable of the system in its current state. The simpler equations (which we call tropicalized) defining the hyperfaces of the box are obtained by combining the dominance concept borrowed from tropical analysis with symbolic bounds propagation. Mass invariants of the initial system of ODEs are used to refine the computed bounds, thus improving the accuracy of the method. The resulting (simplified) system provides a posteriori time-dependent lower and upper bounds for the concentrations of the initial model’s species, and thus bounds on numerical errors stemming from tropicalization. This means that no information on the original system’s trajectory is needed - the most important advantage of our approach. By contrast, the main difficulty of applying the classical QSS and QE reductions to biochemical models is that QE reactions and QSS species need to be specified a priori, which implies that some knowledge about the initial system’s behavior is necessary. This, in turn, means that significantly high-dimensional, non-linear systems cannot benefit from these reductions, as their analysis can be prohibitive in practice. An approach similar with respect to providing a posteriori time-dependent lower and upper bounds has been proposed in [7], where the differential semantics of rule-based models with non-contracting dynamics and unbounded sets of variables are treated. Rather than using dominance relations between ODE terms, a finite set of patterns is used in order to bound the number of occurrences of each pattern. Further related works, similar in the sense that they provide automatizible reduction methods with strong reduction guarantees are described in [13, 14]. However, both of these works are designed specifically for rule-based models, where they exploit the site-graph encoding of species’ structure, rather than the dominance regions.
Depending on the chosen granularity of mass-invariant-derived bounds, the method presented in this paper can either be used to reduce models of biochemical networks, or to quantify the approximation error of tropicalization reduction methods that do not involve guarantees. The guarantees of our method are obtained by formalizing the soundness relation between the original system of equations and the abstract system of ordinary differential equations operating on the coordinates of the hyper-faces of the box. The solution of a sound abstraction of an original system of differential equations, starting from a box that contains the initial state of the original system, defines a sound abstraction of the solution(s) of the original system. We apply our method to several case studies.
Outline The rest of this article is organized as follows. In Sect.2 we define the setting and concepts used in our approach, as well as introduce motivating examples. We then formally present and justify the method for deriving the system of reduced ODEs over the lower and upper bounds of species’ concentrations in Sect. 3. Also in Sect. 3, we present the two possible uses of our approach. We then discuss and conclude in Sect. 4.
2 Definitions and Motivating Example(s)
2.1 General Setting and Definitions
We define a dynamic reaction network over a set of species as a reaction system of the form:
[TABLE]
where is the reaction number, and for each reaction , are the non-negative reaction rate constants of the forward, respectively backward reaction.
In this paper, we focus on the ordinary differential equations (ODE) semantics of models of biochemical networks. The underlying assumptions are that the various species of the chemical network are highly abundant, that stochastic fluctuations are negligible, and that the reaction system is well-mixed. In these conditions, the state of the dynamic system (1) can be represented as a multiset of the species’ concentrations , and its dynamics is described by a system of ODEs:
[TABLE]
For each reaction , denotes the reaction (forward and backward) rate, and is a non-linear function of the concentrations. For example, the mass-action law reads:
[TABLE]
which in turn means that is a multivariate polynomial of the species’ concentrations. In other words, the ODE of the -th species, , under mass action kinetics reads as a sum of monomials, which can be split into production and consumption terms, according to the sign that precedes their occurence in the equation:
[TABLE]
where are Laurent polynomials with positive coefficients:
[TABLE]
For convenience purposes, we will denote , where represent the production, respectively the consumption, monomials.
The reduction heuristics that use ideas from tropical analysis exploit the concept of dominance, which we borrow for our method. Let and be two (positive) monomials. We define -dominance as the following partial order relation on the set of multivariate monomials defined on subsets of :
Definition 2.1**.**
(-dominance) For an , we say that dominates at a time point , denoted by , if .
In multiscale biochemical systems, the various monomials that compose the polynomials have different magnitude orders, such that at any given time there is only one or a few dominating monomials.
Definition 2.2**.**
(Dominant monomial of a polynomial) For a given , the dominant monomial of a polynomial is defined as .
By using the max-plus algebra idea that the sum of positive, well separated terms, can be replaced by the maximum term, each of the two polynomials of (4) can be replaced by their dominant monomials. The result is a reduced model, consisting of a piecewise smooth function. As the dominant monomials of the can change from one concentration domain to another, the reduced model is a piecewise-smooth hybrid model.
Definition 2.3**.**
(Two-term tropicalization of the smooth ODE system) We call two-term tropicalization of the smooth ODE system (4) the following piecewise-smooth system:
[TABLE]
We note that a one-term tropicalization of the smooth ODE system, , is also possible, but choosing only one dominant momomial instead of the production-consumption pair of dominant monomials leads to a less precise model reduction (as more information is discarded in the one-term tropicalization). Thus, in this paper, we choose to deal with the two-term method.
2.2 Motivating example: Michaelis-Menten
In Sect.1, we mentioned that the classical QSS[5] and QE[18, 23] approximations represent popular methods for the simplification of biochemical networks.
As such, our motivating example is the Michaelis-Menten mechanism. This enzymatic model illustrates how these two simple methods that use the idea of dominance can be useful for model reduction of nonlinear models with multiple timescales. The Michaelis-Menten mechanism consists of an enzyme that reversibly binds a substrate to form a complex, which in turn releases a product, while preserving the enzyme:
[TABLE]
The ODE system describing the evolution of the species’ concentration writes as:
[TABLE]
The Michaelis-Menten equation relates the rate of product formation to the substrate concentration:
[TABLE]
where is the total enzyme concentration, and is the Michaelis-Menten constant. Eq.(9)can be interpreted as the reaction rate of a reduced reactive system, equivalent to system (7), in which the intermediary complex has been eliminated:
[TABLE]
The approximation of (8) by (9) is generally considered to be sufficiently good if the QSS assumption holds, that is if the total initial enzyme concentration is much lower than the total initial concentration of substrate: . In this case, the complex is a low concentration fast species, whose concentration is dominated by that of the substrate; the value of almost instantly relaxes to a value determined by . Thus, one can set , and exploit this relation to pool the two reactions of the initial system (7) into an unique irreversible reaction (10). The QSS condition can also be stated as [26].
The original MM analysis used the complementary QE approximation, which considers the complex formation reaction to be fast and reversible: . Thus, the term dominates the term in Eq.(8), meaning the latter can be discarded from the ODE system, allowing for pooling of species, and resulting once again in a single step approximation that reads:
[TABLE]
with , if indeed . We note that if the QE assumption is indeed valid, .
One of the main difficulties of applying QSS and QE reductions to biochemical models is that the QE reactions and QSS species need to be specified a priori. Thus, simulation of the original model is sometimes 111In [27], the authors propose a formal method for the identification of QSS species and QE conditions, which follows from the calculation of the tropicalized system, and which does not require simulation of trajectories. needed in order to detect dominated species, which are either QSS species, or participate in QE reactions [26]. For high-dimensional non-linear systems, this requirement can represent an obstacle towards model reduction.
The issue regarding simulation of the initial system also arises when trying to quantify the efficiency of model reduction methods: ideally, the approximation errors resulting from the reduction should be computed without executing the original system.
Thus, in this paper we propose an approximation method for biochemical networks, in which no prior knowledge about the original system’s behavior is required. Our method combines the dominance concept with mass invariants of the original ODE system in order to compute inequality constraints on the species’ concentrations. These constraints are then combined with the original system of equations, in order to obtain a reduced system of ODEs that provides time-dependent lower and upper bounds on the species’ concentrations. Depending on the coarseness of detail we choose to incorporate in the mass invariant-generated inequalities, our approach can serve either as a reduction method, or to quantify the approximation errors of tropicalization reduction heuristics.
To achieve this, we abstract the original system by a box, the hyper-faces of which provide lower and upper bounds for the concentrations of the species. The two equations of the hyper-faces of a species represent simplified versions of the original differential equation of the species, in which only the dominant positive and negative monomials are considered. We refer to these equations as being tropicalized. Then, instead of interpreting the differential equations over the state of the original system, we will lift this interpretation conservatively over each hyper-face of the box. To do this, we will bound, for every hyper-face, the derivative of the corresponding coordinate in the solution of the original differential equation over the whole hyper-face. Our method should allow for formal evaluation of tropicalization approaches, and as such the bounds are derived using the dominance relations between monomials of the original ODE. Mass invariants of the original system will then be used to refine the bounds, and thus increase the accuracy of our method. By construction, the maximal solutions of the original, respectively tropicalized (i.e., abstracted) equations are related by the following soundness criterion: when both defined at time t, the state of the original system is within the hyper-box of the abstract system.
Example 2.4**.**
Let us consider the equations (2) of the Michaelis-Menten mechanism, under the QSS assumption: , i.e. , for an . From (2.1), it follows that one can write (by extension): . Then, we can deduce the following lower and upper bounds (that we call tropicalized) on the concentration of :
[TABLE]
For convenience purposes, we will use the notation for the species’ concentrations, . We propose to approximate the state of the system by a box of . A box of is a set of the form , where are pairs of numbers satisfying , . Intuitively, the real number provides a lower bound to the value of the variable , and denotes the face . We will denote this face as . The other faces are defined in the same way, and the same reasoning applies to , which provides an upper bound to the same variable. For ease of notation, we shall use to denote the vector .
Next, let us consider the following functions:
[TABLE]
The abstraction of the concrete system of equations is then defined as ,
.
If we fix the same initial conditions for both the concrete and the abstracted system, , we can relate the solution of the abstract system to that of the original one. For every , the real number provides a lower bound to the value of the function over the face , whereas the real number provides an upper bound to the value of the function over the face . That is to say, we have , for every pair , and , for every pair . Then, using the results of [19], we can conclude that, for every time point , and , the bounds:
[TABLE]
are satisfied. Thus, the solution of the abstract system of equations provides lower and upper bounds for the value of the variables of the original system of equations.
Remark 2.5**.**
In the above example, in order to obtain safe lower/upper bounds on ’s concentration, we make the variables range over the hyper-faces. One notices that the variable is treated specifically in the derivatives of the variables - any of its occurences is replaced by the variable corresponding to the hyper-face we want to bound. By contrast, the other variables, , are replaced according to the sign of their occurence:
- •
in , is replaced with
- •
in , is replaced with
This comes from the fact that the derivative on is evaluated on the corresponding hyper-face, which allows for greatly reducing the loss of precision. For a formal proof of the soundness of this approach, the reader is referred to [19]. Intuitively, it is justified by the intermediate value theorem: given a family of functions over the real field, if one function does not take the highest value at time , whereas it is the case at time , then necessarily, there exists a time such that in which takes the highest value while crossing another function of the family.
In Example 2.4, the inequality constraints on the concentrations of species were determined based on reaction rates constants that verify the QSS condition. In Fig.1, we show the time-evolution of the bounds on the concentration of the 4 species in the Michaelis-Menten system, for an arbitrarily chosen set of reaction rate constants and initial concentrations that satisfy the QSS condition (i.e., , and at time ). Nonetheless, our model reduction is sound no matter the value of initial concentrations and reaction rate constants. The equations have been integrated using the solver of Matlab[22]. Strictly speaking, numerical errors stemming from numerical integration may accumulate throughout the simulation, but herein we choose to ignore them.
In Fig.1, we notice that the bounds diverge at a fast rate from the original trajectory, despite the restriction of the derivative’s evaluation on the hyper-face of the box (as explained in Note 1). A way to improve accuracy is to take into account the original system’s mass invariants, when computing the bounds. In general, a biochemical system can have several conservation laws/mass invariants, which are linear functions of the concentrations, that are constant in time. These equality constraints can be used to refine the bounds on the initial system’s species’ concentrations. We can safely further restrict the evaluation of the derivative of each coordinate to the intersection of the corresponding hyperface with the subspace delimited by the conservation laws containing the variable itself. Because a variable can appear in more than one mass invariant, we choose to keep the most optimistic bound that can be computed by intersecting the hyper-face with the mass invariant subspace: the greatest lower bound, respectively the smallest upper bound.
Example 2.6**.**
In the Michaelis-Menten system, the total number of enzymes is constant, and so is the overall number of substrates and product. The two conservation laws can be written as: , with , and .
Assuming once more that , by substituting by or into 12, three equivalent tropicalized upper bounds on the concentration of are obtained:
[TABLE]
Lifting the interpretation of the differential equations over the hyper-face corresponding to results in three different expressions for the upper bound on , of possibly different accuracies:
[TABLE]
The most accurate sound upper bound on then writes as:
[TABLE]
Note 2.7**.**
The choice to introduce and operations in the expressions of the computed bounds is accounted for by our initial motivation: because existing tropicalization reduction heuristics are not justified by rigourous estimates, we aim to provide a method for quantifying errors stemming from such tropicalization reduction approaches, at the same time creating a tropicalization approach with guarantees222We nonetheless stress that our goal is not to correct the faults of existing tropicalization-inspired reduction methods, but rather quantify them by proposing a more rigorous tropicalization approach, in which the dominated monomials are bounded, rather than discarded from the ODEs. As such, we aim at computing error bounds that are as precise as possible, hence the choice of using and operations for bound refinement, albeit with the disadvantage of using functions that are not , thus introducing non-smooth vector fields. The trade-off between smoothness and precision can be tuned according to the desired goal: less precise bounds can be obtained by choosing to use smooth functions. Moreover, smoothness of vector fields is generally not guaranteed during the numerical simulation of biochemical models: as the model variables represent biochemical species’ concentrations, a good practice is to call the numerical solvers used to approximate the system’s behavior using with the ’Non-Negative’ option, which amounts to introducing a operation into the equations (i.e., ), in order to prevent negative values of variables.
The same reasoning can be applied to all variables appearing in the expression of , in order to obtain the most accurate upper bound:
[TABLE]
Note 2.8**.**
In (15), when computing the third bound, instead of substituting by its conservation law expression, , we choose to bound its value by an expression not containing . We do so in order to avoid introducing supplementary variables w.r.t. those present in the tropicalized original bound (i.e., , and ). This method, in which mass invariant partial refinement is introduced after the tropicalized bounds have been computed, can be considered as a per se model reduction method, as no supplementary information/complexity is introduced by incorporating the conservation laws. By contrast, the approach in which all the information contained by the conservation laws is exploited in order to derive the most accurate bounds can constitute a method of error-estimation for tropicalization based reduction heuristics. We present the two different methods formally in Section 3.
The issue of specifying QSS species and QE reactions a priori, when performing model reductions, is circumvented by our method. Instead, the notion of region is used in order to eliminate monomials from the species’ ODEs. Our method uses static inspection of each ODE, in order to partition the state space into different regions according to which production, respectively consumption terms dominate the others. Using this partitioning, simplified expressions bounding the species concentrations are derived for each region of the state space, allowing symbolic simplification and limiting numerical approximations.
Example 2.9**.**
In the case of the Michaelis-Menten mechanism, there are three possible dominance regions for leading to three possible pairs of lower and upper bounds:
- \normalshape(1)
Region 1, if dominates :
** 2. \normalshape(2)
Region 2, if dominates :
** 3. \normalshape(3)
Region 3, if there is no dominant rate (i.e., and are of comparable magnitude):
**
The complete system of equations obtained using mass invariants refinement of bounds, for all the possible dominance regions, can be found in Example 2.10. The improvement of bound accuracy via mass invariants can be observed in Fig. 2. As expected, one can also observe in Fig.2 that results become more precise as the value of increases, i.e. as and become more separated.
Example 2.10**.**
For convenience purposes, denote the species concentrations, , using . Then, the derivatives of the lower and upper bounds of the original system’s species’ concentrations write as:
- •
**
- •
**
- •
, with
- •
, with
- •
, with
- •
,with
- •
**
- •
**
The Michaelis-Menten system represents a particular, simple case study: the choice of reaction rate constants fixes the dominance region in which the system evolves. In general, the state of a biochemical network can traverse multiple such regions, as the dominant monomials can change from one concentration domain to another. Thus, we next introduce a case study in which the dominant monomials are concentration-dependent, which in turn means that the dominance region is no longer fixed. Our method is designed with this more general situation in mind: having computed the most accurate bounds for each region of the state space partitioning, and having no information regarding the region in which the original system evolves at a given time , our approach chooses the least accurate local bound, in order to ensure global soundness.
2.3 Motivating example: A DNA model
We construct a simple extension of the Michaelis-Menten system, in which the product formation reaction is catalazyed by a dimer of an enzyme . The reaction system and its ODE system333once again, we denote the species by read:
[TABLE]
[TABLE]
The mass invariants write:
[TABLE]
Dominance regions become concentration dependent: for example, the dominant positive monomial in is determined by the dominance relations between both the concentrations of and , and between reaction rate constants and .
This DNA example will serve as a case study for the remainder of our paper.
3 Combining ODEs and mass invariants
3.1 Model reduction using conservative numerical approximations
The guarantees of our method are a consequence of a carefully designed symbolic propagation of inequality constraints on the species’ concentrations. Thus, symbolic transformations have to be applied on numerical expressions, of which we introduce a syntax and semantics. We also introduce an alternative definition of a biochemical model to that presented in Sect. 2, which is then used to define and justify our approximation method.
Definition 3.1**.**
(Syntax of expressions) Let be a set of variables. We define an -expression inductively, as follows444the syntactic operators are written using a superscript dot, in order to distinguish them from their associated mathematical functions:
- \normalshape(1)
each positive real number is an -expression; 2. \normalshape(2)
each variable is an -expression; 3. \normalshape(3)
if is an -expression, then is an -expression; 4. \normalshape(4)
if and are -expressions, then , , , are all -expressions;
The set of -expressions is denoted as . Given an -expression , we define its support, denoted , as the set of variables it contains.
Definition 3.2**.**
(Semantics of expressions) Let be a set of variables and be an -expression. The semantics of the expression is the function , defined inductively as follows:
- \normalshape(1)
** 2. \normalshape(2)
** 3. \normalshape(3)
** 4. \normalshape(4)
** 5. \normalshape(5)
** 6. \normalshape(6)
** 7. \normalshape(7)
**
for every environment .
We use Defs. 3.1 and 3.2 to define the notion of system of symbolic differential equations and symbolic equality constraints derived from conservation laws.
Definition 3.3**.**
(Symbolic ODE system)
A system of symbolic ordinary differential equations and equality constraints modeling a biochemical network is a tuple , where:
- •
* is a set of variables, denoting species’ concentrations,*
- •
* is a non-negative function, mapping each species to its initial concentration,*
- •
* is a function describing the evolution of species’ concentrations, as described in Eq.(4):*
[TABLE]
with , Laurent polynomials with positive coefficients,
- •
555the number indexes the different ways of expressing a species , by using the mass invariants in which it appears* is a family of functions from the set into the set , denoting equality constraints derived from conservation laws, such that satisfying*
[TABLE]
the constraint
[TABLE]
is satisfied for every function of the family , , and for every time .
Example 3.4**.**
(A DNA example) In our running example, , is defined by the equations of (19), and the equality constraints derived from the conservation laws of (20) write:
[TABLE]
We partition the state space of each ODE into regions, each one defined by the corresponding pair of dominant monomials, (). At any given time , several monomials can be dominant, which can lead to an exponential number of possible regions. To circumvent this issue and obtain a linear number of regions, we choose to replace each region that has more than one dominant term with the unique region in which no term is dominant: if , we choose to keep in the reduced ODE, instead of replacing it with . The following definition formalizes these concepts.
Definition 3.5**.**
(State partitioning of a symbolic ODE) Let be a symbolic ODE system, and a scale separation constant. Then, for every variable , if and , its state space can be partitioned into regions, each one determined by the corresponding pair of dominant monomials
[TABLE]
Example 3.6**.**
(A DNA example) In Eq.(19), the state space of can be partitioned in 9 regions, as its ODE contains 2 positive terms and 2 negative terms:
\begin{array}[]{ccc}r_{2}^{1,1}=(k_{1}x_{1}^{2},k_{-1}x_{2});\quad r_{2}^{2,1}=(k_{-2}x_{4},k_{-1}x_{2});\quad r_{2}^{3,1}=(k_{1}x_{1}^{2}+k_{-2}x_{4},k_{-1}x_{2})\\ r_{2}^{1,2}=(k_{1}x_{1}^{2},k_{2}x_{2}x_{3});\quad r_{2}^{2,2}=(k_{-2}x_{4},k_{2}x_{2}x_{3});\quad r_{2}^{3,2}=(k_{1}x_{1}^{2}+k_{-2}x_{4},k_{2}x_{2}x_{3})\\ r_{2}^{1,3}=(k_{1}x_{1}^{2},k_{-1}x_{2}+k_{2}x_{2}x_{3});\quad r_{2}^{2,3}=(k_{-2}x_{4},k_{-1}x_{2}+k_{2}x_{2}x_{3});\quad r_{2}^{3,3}=(k_{1}x_{1}^{2}+k_{-2}x_{4},k_{-1}x_{2}+k_{2}x_{2}x_{3})\end{array}**
We next use the dominance relations that define each region, in order to obtain region-specific lower and upper bounds on the ODE being considered. The next definition formalizes this procedure:
Definition 3.7**.**
(Region-specific tropicalized bounds) Given a symbolic ODE system , and the set of regions for each species , the dominance definition 2.1 can be used to define the following functions, for every region :
[TABLE]
[TABLE]
Functions and provide symbolic tropicalized lower, resp. upper bounds for on region .
Example 3.8**.**
(A DNA example) In our running example, in region , the dominant positive (production) monomial is , and the dominant negative (consumption) monomial is . Formally, this writes as , and .
Thus, the specific tropicalized bounds write as:
[TABLE]
which by construction satisfy .
The bounds of Def. 3.7 can further be refined by using the mass invariants given by the family of functions , as follows:
Definition 3.9**.**
(Region-specific refined tropicalized bounds) Given a symbolic ODE system , the set of regions and the symbolic tropicalized bounds , for each species , we define the following bounds:
[TABLE]
with , , for each function of the family () that applies to the variable , and is either 0, or a bound generated by the dominating monomial inequality constraints.
Example 3.10**.**
(A DNA example)
When dealing with the tropicalized bounds of Ex.(3.8), one needs to refine the bounds of the variables in their support: . We do so by using their respective equality constraints from (21): , and .
What’s more, the second dominance inequality of region in Ex.(3.8)can be rewritten as . This allows for a new upper bound on variable : .
Using Def.3.9, the -specific bounds on and write as:
\begin{array}[]{cccc}\mathbb{L}_{2,1}^{2,1}(x_{2})=0;\quad\mathbb{L}_{2,2}^{2,1}(x_{2})=0;\quad\mathbb{L}_{2,1}^{2,1}(x_{4})=DNA_{0}-\epsilon\frac{k_{-1}}{k_{2}};\quad\mathbb{L}_{2,2}^{2,1}(x_{4})=0\\ \mathbb{U}_{2,1}^{2,1}(x_{2})=\frac{M_{0}}{2}-x_{4};\quad\mathbb{U}_{2,2}^{2,1}(x_{2})=\frac{M_{0}}{2}-DNA_{0}+\epsilon\frac{k_{-1}}{k_{2}};\quad\mathbb{U}_{2,1}^{2,1}(x_{4})=DNA_{0};\quad\mathbb{U}_{2,2}^{2,1}(x_{4})=\frac{M_{0}}{2}-x_{2}\end{array}**
Using mass invariants to compute the most optimistic bound is done inductively over the expressions of the candidate bounds, by applying usual formulae of interval arithmeticsto propagate the and operators. The resulting evaluation functions, which we call and respectively, are detailed in Appendix A.
With all this in place, we can proceed to the definition of the reduced system.
Definition 3.11**.**
(Reduced system) Let be a system of ordinary equations with equality constraints. The reduction of the system is defined as the triple , with:
- \normalshape(1)
** 2. \normalshape(2)
* is defined by h* 3. \normalshape(3)
, defined as:
[TABLE]
for every variable , where:
- •
**
- •
**
- •
**
- •
**
Intuitively, for each region of species , the reduction method first replaces by the pair of tropicalized lower and upper bounds, and , that result directly from the dominance inequalities that characterize the region. Then, and are refined, using the bounds on variables that can be deduced from the conservation laws of the original system. For example, replacing any occurence of a variable in with one of its expressions (or with its appropriate bound derived from 666 for the positive occurences of , and for its negative occurences) results in another safe upper bound for . By choosing the minimum such candidate bound, one obtains the most accurate, locally safe upper bound. The same reasoning applies to the computation of lower bounds, but the operation is replaced with .
In order to obtain safe (i.e., correct) global bounds, the least precise local bounds are chosen: the miminal lower, resp. the maximal upper bounds.
Finally, the interpretation of the variables is lifted over the hyper-faces. Any occurence of is replaced with its analogue corresponding to the hyperface we want to bound, while the others are replaced to their analogue given by the polarity of their occurence, as explained in Note 2.5.
Theorem 3.12**.**
Let be a system of ordinary equations with equality constraints. Let be a reduction of the system .
Let be a function from the set into the set s.t. for every variable , we have:
[TABLE]
and be a function from the set into the set s.t. for very abstract variable , we have:
[TABLE]
Under these assumptions, we have that for every variable and every time :
[TABLE]
i.e.**, the reduced system provides sound lower and upper bounds for the concentration of the original system’s species.
Example 3.13**.**
We apply our method on the DNA example constructed in Sect.2.3, for different values of the scale separating constant , and for arbitrarily chosen reaction rate constants and initial concentrations . We show in Fig.3 the time evolutions on the bounds on the concentration of the product , i.e. the variable . We notice once again that the results become more precise as decreases, i.e. as the monomials become more separated. As an example, the ODE of the lower bound of species can be found in Appendix B.
4 Error estimates of tropicalized systems using conservative numerical approximations
Our approach also serves as a heuristics for quantifying errors of tropicalization approaches for biochemical model reduction, provided a slight modification is applied to Def.3.9. Instead of computing bounds using only variables from the support of the tropicalized bounds, one can use the equality constraints , to refine the accuracy of bounds. The resulting model presents a trade-off: it introduces new variables w.r.t. the support of the tropicalized bounds, albeit exclusively in the form of conservation laws which are always linear functions, but gains in bound accuracy. As such, the approximation error/accuracy of a given reduction method can be assessed by checking if the reduced trajectory lies between the lower and upper bounds computed by our method.
Example 4.1**.**
It is well known that the Michaelis-Menten reduction is valid only under the QSS and QE assumptions. In Fig.4, we simulate the reduced Michaelis-Menten system (10), as well as our modified reduced system, as presented above, for a set of initial conditions that no longer satisfy the QSS assumption, i.e. the total enzyme concentration is comparable to the total substrate concentration. As expected, the reduced Michaelis-Menten system no longer represents a good approximation of the initial enzymatic system (7); this is reflected by the fact that the trajectory of the reduced model does not lie between the lower and upper bounds computed by our approach.
4.1 Tyson’s Cell Cycle Model
The tropicalization heuristics can be difficult to justify by rigourous estimates, although this is possible in some cases[26]. For example, the existence of tropical varieties - the set of points where at least two monomials of are equal- can lead to sliding modes, which in turn represent challenges in providing accuracy justifications for hybrid models obtained using tropical ideas. Sliding modes are well known phenomena in ODEs with discontinuous vector fields, in which the dynamics can follow discontinuity hyper-surfaces where the vector field is not defined; what’s more, the conditions for the existence of sliding modes are usually intricate. As noted in [27], sliding modes can have a nefarious effect on the behavior of the tropicalized system: tropical varieties (i.e. tropical curves) decompose the state space into sectors corresponding to the smooth modes of the hybrid tropicalized system, which passes from one type of smooth dynamics to another intrinsically, when the trajectory attains the tropical curve. However, if certain conditions w.r.t. the sliding modes are fulfilled, the trajectory can continue along some tropical curve instead of changing sector, which further deviates the reduced system’s trajectory from the original one (see Figure 1 in [27], for an example).
In [27], such phenomena become apparent when tropicalization is applied to the minimal cell cycle model proposed by Tyson[31], in order to obtain a reduced hybrid model. The Tyson model describes the interplay between cyclin and cyclin dependent kinase cdc2 during the progression of the cell cycle, and demonstrates the existence of three possible regimes, that can be associated to different phases in the cell life: the biochemical system can either function as an oscillator, converge to a steady state, or behave as an excitable switch. The three possible behaviours can be associated to early embryos rapid division, arrest of unfertilised eggs and growth controlled division of somatic cells, respectively.
The dynamics of this non-linear model with rational reaction rates contains 6 species and 9 reactions, and is described by the following system of polynomial differential equations:
[TABLE]
and has the conservation law , where the value 1 denoting the total initial concentration of kinase cdc2 (i.e. ) was chosen by convenience. The values of the reaction rates constants are fixed as to have the model display the oscillatory behavior: .
In [26, 27], a hybrid model of the Tyson cell cycle is obtained by detecting and eliminating QSS species of the original model, pruning dominated monomials, and then ultimately tropicalizing the reduced-size model. Besides having the inconvenient of analyzing trajectories of the original model in order to detect QSS species, the reduced model suffers from the the sliding mode-related issues mentioned above: although both the smooth (original) and the reduced system exhibit oscillating behavior and have stable periodic trajectory (i.e. limit cycle), the period of the tropicalized limit cycle is different with respect to that of the smooth cycle, due to the tropicalized trajectory sliding along the tropical manifolds instead of changing sectors. Having different oscillation periods means in turn that assessing the accuracy of the tropicalized reduced model is not a trivial question, as the distance between original and tropicalized trajectories is variable from cycle to cycle (as can be seen in Figure5). What’s more, it can also provide an indication of the poor performance of tropicalization based reduction methods when dealing with more complex systems, such oscillating systems.
Indeed, by applying our method to the original Tyson model, we are able to effectively provide guarantees on the reduced model, albeit not very strong ones: this could be interpreted as an indication of the poor accuracy of the tropicalized Tyson model. In Figure 5, we plot the bounds for the concentration of species obtained using our method, in order to compare the trajectory of the original model to the one of the hybrid one that can be found in [26]. The lack of oscillating behavior in the computed bounds could intuitively be explained by the difference in period of the original and reduced systems, that causes a shift at every cycle in the tropicalized trajectory w.r.t. the original behavior. Nonetheless, the obtained bounds accurately capture the amplitude of the tropicalized model. One also notes that the time points where the upper bound, respectively the tropicalized system, begin to diverge w.r.t. the original trajectory coincide.
From a more practical point of view, we note that while simulation of the tropicalized Tyson model proposed in [27] was performed in 354.876706 seconds, on a 2.2 GHz Intel Core i7 processor and 16 GB 1600 MHz DDR3 memory, simulation of the model obtained via our method was performed in only 9.775511 seconds using the same numerical integration method (i.e. Matlab’s ode15s), thus providing a significant improvement in computation time.
We note that an alternative reduced model is obtained in [26], using tropical equilibration, that circumvents the need to simulate the original system. We plan to include tropical equilibration techniques in future work.
5 Comparison with existing methods
We mentioned previously that numerical errors stemming from numerical integration are ignored herein. Indeed, numerical integration methods, while heavily used, only provide approximations of the solution of the initial value problem (IVP) of ODE systems. Even when using variable-step size methods, there are no guarantees that the approximate solution computed by the chosen method is close to the actual solution. In order to solve the drawbacks associated to traditional ODE solvers/numerical solutions of IVP, interval numerical methods for IVP are used for computing validated enclosures of the solution of an IVP for an ODE. For example, the VNODE-LP[24] C++ solver proves that a unique solution to a problem exists, and then computes rigorous bounds that are guaranteed to contain it. Such bounds can then be used to help prove theoretical results, check if a solution satisfied a condition in a safety-critical calculation, or simply to verify the results produced by a traditional ODE solver. Another example of such software is the CAPD library [6]. Both represent well-established software for computing enclosures of generic ODE systems, and are integrated in various SMT solvers (e.g., iSAT[12], dReal[15]). For a more comprehensive state of the art on such methods, the reader is refered to [25].
Interval methods for IVPs for ordinary differential equations are typically based on Taylor series expansions, which require the computation of Taylor coefficients up to some order . Given a final time point, the aim is to compute interval vectors that are guaranteed to contain the solution to a given IVP, at all intermediary points. In order to compute such interval vectors, interval propagation methods are used to enclose roundoff and truncation errors in the computed bounds, and thus obtain rigorous bounds on the true solution of the ODE.
In our approach, instead of interpreting the differential equations over the state of the system, the interpretation is lifted conservatively over each hyper-face of the hyper-box abstracting the system state (i.e., we over-approximate the derivatives only on the hyper-faces). When compared to our method, interval propagation methods over-approximate the partial derivatives of the function over the whole enclosing hyper-box, instead of doing so only on the hyper-faces. This in turn means that our approach computes tighter bounds than those computed by interval methods for IVPs.
We demonstrate our claim with the following example:
Example 5.1**.**
Let us consider the following initial value problem :
**
As presented in Section 3, our framework can be decomposed in two independents parts: the first part consists in synthesizing bounds on the derivatives of the original system, and the second part deals with the propagation of said bounds, in order to obtain the enclosing system.
As our goal is to better understand and evaluate tropicalization approaches for biochemical model reduction, so far we chose to focus on bounds obtained by using dominance relations between monomials. The second part of our method simply represents an improved alternative to existing ODE enclosure methods, as explained above, and as such can be used in such methods in order to get better enclosure results.
For example, in order to compare the performance of our method to that of VNODE-LP and CAPD, instead of using dominance relations to derive inequality constraints on species’ concentrations, we now use the Taylor Series expansion with terms ( will serve as a parameter) for the functions and , in order to derive bounds on and :
**
*Then, for a fixed order and an , instead of using dominance-related inequalities with our method, one can use the following inequalities:
[TABLE]
,
where for is the residual for the Taylor expansion, and can be bound by , which in turn is for .
Our method then proceeds as usual to the computation of ODEs for bounds on the concentrations of and .
In Fig.6, we compare the accuracy of our method to that of VNODE-LP and CAPD, for different values of the order . The accuracy is given by the tightness of bounds, which can be evaluated by computing the difference between the upper and lower bounds, during a simulation. The results indicate that, when compared to existing enclosure interval methods, our approach represents a consistent improvement of several orders of magnitude, across different values of .
6 Conclusion and outlook
In this paper, we present an approximation method for biochemical networks, which can also serve as a technique for evaluating the faithfulness of existing tropicalization reduction methods that do not involve guarantees. Our approach relies on the multiscaleness property of biochemical systems. Tropical geometry offers a natural framework for studying such networks. Tropical approaches [28, 29] can guide model reduction of ODE systems, by using time- and concentration scales separation to identify and neglect equation terms whose values are significantly smaller than those of other terms of the same equation. This leads to partitioning the state space into different regions, according to which term dominates the others. A similar approach is employed in our method, but instead of neglecting the dominated terms, we propose to conservatively bound their value using an amortizing scale separation constant and the value of the dominant terms. These bounds can be further refined by incorporating the conservation laws of the initial system. The resulting approximated model is composed of two-term ODEs (which we call tropicalized), which by construction provide time-dependent lower and upper bounds for the concentration of the initial system’s species. As such, our approach can also serve to test the accuracy of other given reduction methods, while circumventing the execution of the original system: the suitability of a reduction will be confirmed if the reduced model’s trajectory lies between the bounds provided by our abstraction.
We have tested our approach on the classical Michaelis-Menten system, a simple extension of it, and Tyson’s cell cycle model. Our method can be easily automatized, either using static analysis, or existing symbolic math tools777for example, Matlab’s Symbolic Math Toolbox ; as such, Definitions 3.1-3.11 are written in an operational-semantics style, as to describe the different procedures composing the algorithm that implements our method. A tool that automatizes our approach is currently being developed. Further work also includes expanding the case studies to larger networks, possibly with no conservation laws.
Appendix A Symbolic propagation of min and max
The propagation of the and operations over -expression is done using the evaluation functions and , which are defined by mutual induction over the syntax of -expressions denoting monomials:
- \normalshape(1)
,
- •
- •
2. \normalshape(2)
,
- •
- •
3. \normalshape(3)
,
- •
- •
4. \normalshape(4)
,
- •
- •
5. \normalshape(5)
,
- •
- •
Appendix B A DNA model: bound equations
Below, we give the equation of the lower bound on species from our running DNA model example. According to Def.3.11, the derivative of lower bound on the concentration of is computed by selecting the minimum region-dependent (i.e., local) lower bound, out of the 9 possible cases:
[TABLE]
with
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] F. Baccelli, G. Cohen, G. J. Olsder, J.-P. Quadrat. Synchronisation and linearity . Wiley, 1992
- 2[2] A. Beica, C. C. Guet, T. Petrov. Efficient reduction of kappa models by static inspection of the rule-set . In: A. Abate, D. Safranek (Eds.), Hybrid Systems Biology - Fourth International Workshop, HSB 2015, Madrid, Spain, September 4-5, 2015. Revised Selected Papers, Vol. 9271 of Lecture Notes in Computer Science, Springer, 2015, pp. 173–191
- 3[3] M.L. Blinov, J.R. Faeder, B. Goldstein, and W.S. Hlavacek. Bionetgen: software for rule-based modeling of signal transduction based on the interactions of molecular domains . Bioinformatics, 20(17), 2004.
- 4[4] C. Boseung, GA Rempala, JK Kim. Beyond the Michaelis-Menten equation: Accurate and efficient estimation of enzyme kinetic parameters . Scientific Reports, 7(1), 2017.
- 5[5] G.E. Briggs, J.B.S. Haldane. A note on the kinematics of enzyme action . Biochem J. 19 (2): 338–339, 1925.
- 6[6] Computer Assisted Proofs in Dynamics (CAPD) Library. http://capd.ii.uj.edu.pl
- 7[7] K. Chanseau Saint-Germain, J. Feret, Conservative numerical approximations of the differential semantics in biological rule-based models , 2016
- 8[8] C. Chaouiya. Petri net modelling of biological networks . Briefings in Bioinformatics, Volume 8, Issue 4, 1 July 2007, Pages 210–219.
