Identifying the Parametric Occurrence of Multiple Steady States for some Biological Networks
R. Bradford, J.H. Davenport, M. England, H. Errami, V. Gerdt, D., Grigoriev, C. Hoyt, M. Kosta, O. Radulescu, T. Sturm, and A. Weber

TL;DR
This paper develops symbolic computation methods to identify parameter regions where biological network models exhibit multiple steady states, focusing on MAPK networks with complex algebraic conditions.
Contribution
It introduces semi-algebraic descriptions of multistationarity regions using advanced symbolic tools, improving understanding of parameter-dependent steady states in biological models.
Findings
Successfully characterized multistationarity regions in 2D parameter space
Applied symbolic methods to a complex 11-variable MAPK model
Extended techniques to locate regions in 3D parameter space
Abstract
We consider a problem from biological network analysis of determining regions in a parameter space over which there are multiple steady states for positive real values of variables and parameters. We describe multiple approaches to address the problem using tools from Symbolic Computation. We describe how progress was made to achieve semi-algebraic descriptions of the multistationarity regions of parameter space, and compare symbolic results to numerical methods. The biological networks studied are models of the mitogen-activated protein kinases (MAPK) network which has already consumed considerable effort using special insights into its structure of corresponding models. Our main example is a model with 11 equations in 11 variables and 19 parameters, 3 of which are of interest for symbolic treatment. The model also imposes positivity conditions on all variables and parameters. We…
| 90.6512 | 17.6392 | 122.034 | 323.761 | |
| 2.67311 | 6.97675 | 14.6721 | 9.49621 | |
| 10.4996 | 367.57 | 234.974 | 37.1013 | |
| 17.8545 | 36.6772 | 14.5102 | 6.72938 | |
| 35.9695 | 5.50874 | 7.16952 | 13.6295 | |
| 32.0501 | 12.811 | 35.064 | 43.1428 | |
| 0.0954536 | 0.511775 | 0.42579 | 0.127807 | |
| 15.5631 | 83.4416 | 69.4223 | 20.8381 | |
| 2.39331 | 8.06095 | 7.43877 | 3.21139 | |
| 0.641001 | 0.25622 | 0.70128 | 0.862856 | |
| 45.4331 | 2.73253 | 15.2681 | 61.4581 |
| Numerical | Symbolic | ||||
|---|---|---|---|---|---|
| Model | Mean | Mean | Median | StdDev | Maximum |
| 26 – Original | 2.4 | 0.568 | 0.530 | 0.107 | 0.905 |
| 26 – Reduced | 0.85 | 0.053 | 0.047 | 0.036 | 0.343 |
| 28 – Original | 16.57 | 42.430 | 40.529 | 8.632 | 84.116 |
| 28 – Reduced | 0.485 | 0.468 | 0.119 | 0.796 | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Identifying the Parametric Occurrence of Multiple Steady States for some Biological Networks
R. Bradford
J.H. Davenport
M. England
H. Errami
V. Gerdt
D. Grigoriev
C. Hoyt
M. Košta
O. Radulescu
T. Sturm
A. Weber
Department of Computer Science, University of Bath, UK
Faculty of Engineering, Environment and Computing, Coventry University, UK
Institute for Informatics, University of Bonn, Germany
Joint Institute for Nuclear Research (JINR), Dubna, Russian Federation
and Friendship University of Russia (RUDN University), Moscow, Russian Federation
CNRS & University of Lille, France
Department of Life Science Informatics, B-IT, University of Bonn, Germany
Slovak Academy of Sciences, Slovakia
DIMNP, University of Montpellier, France
CNRS, Inria, and the University of Lorraine, Nancy, France
MPI Informatics and Saarland University, Saarbrücken, Germany
Abstract
We consider a problem from biological network analysis of determining regions in a parameter space over which there are multiple steady states for positive real values of variables and parameters. We describe multiple approaches to address the problem using tools from Symbolic Computation. We describe how progress was made to achieve semi-algebraic descriptions of the multistationarity regions of parameter space, and compare symbolic results to numerical methods.
The biological networks studied are models of the mitogen-activated protein kinases (MAPK) network which has already consumed considerable effort using special insights into its structure of corresponding models. Our main example is a model with 11 equations in 11 variables and 19 parameters, 3 of which are of interest for symbolic treatment. The model also imposes positivity conditions on all variables and parameters.
We apply combinations of symbolic computation methods designed for mixed equality/inequality systems, specifically virtual substitution, lazy real triangularization and cylindrical algebraic decomposition, as well as a simplification technique adapted from Gaussian elimination and graph theory.
We are able to determine multistationarity of our main example over a 2-dimensional parameter space. We also study a second MAPK model and a symbolic grid sampling technique which can locate such regions in 3-dimensional parameter space.
keywords:
Mixed Equation / Inequality Solving , Real Quantifier Elimination , Biological Networks , Signaling Pathways , MAPK
1 Introduction
In this work we describe the application of combinations of symbolic computation methods in various computer algebra systems to a key problem from computational biology. The work serves to demonstrate how recent advances in such algorithms, and crucially their effective combination, allows for their application on problem instances previously thought beyond reach. In this introduction we start by describing the biological networks that are our topic of study, and highlight previous relevant work. We then outline the remainder of the paper and clarify the relationship of this article to prior work.
1.1 Multistationarity
The mathematical modelling of intra-cellular biological processes has been using nonlinear ordinary differential equations since the early ages of mathematical biophysics in the 1940s and 50s [49]. A standard modelling choice for cellular circuitry is to use chemical reactions with mass action law kinetics, leading to polynomial differential equations. Rational functions kinetics, for instance the Michaelis-Menten kinetics, can generally be decomposed into several mass action steps.
An important property of biological systems is their multistationarity by which we mean their having multiple stable steady states. It is instrumental to cellular memory and cell differentiation during development or regeneration of multicellular organisms and is also used by micro-organisms in survival strategies.
It is thus important to determine the parameter values for which a biochemical model is multistationary. As demonstrated in the next section, with mass action reactions, testing for multiple steady states boils down to counting real positive solutions of algebraic systems and so is suitable for study with Symbolic Computation and Computer Algebra Systems.
The models studied in this paper concern intracellular signaling pathways. These pathways transmit information about the cell environment by inducing cascades of protein modifications (phosphorylation) all the way from the plasma membrane via the cytosol to genes in the cell nucleus. Multistationarity of signaling usually occurs as a result of activation of upstream signaling proteins by downstream components [5]. A different mechanism for producing multistationarity in signaling pathways was proposed by Markevich et al. [45]. In this mechanism the cause of multistationarity are multiple phosphorylation/ dephosphorylation cycles that share enzymes. A simple, two steps phosphorylation/dephosphorylation cycle is capable of ultrasensitivity, a form of all or nothing response with no multiple steady states (the Goldbeter–Koshland mechanism). In multiple phosphorylation/dephosphorylation cycles, enzyme sharing provides competitive interactions and positive feedback that ultimately leads to multistationarity [45, 40].
1.2 Bistability
Multistationarity has important consequences on the capacity of signaling pathways to process biological signals, even in its elementary form of two stable steady states. This is known as bistability and is present in our case study problems. Bistable switches can act as memory circuits storing the information needed for later stages of processing [62]. The response of bistable signaling pathways shows hysteresis, namely dynamic and static lags between input and output. Because of hysteresis one can have, at the same time, a sharp binary response and protection against chatter noise.
1.3 Prior Symbolic Work
Our study is complementary to works applying numerical methods to ordinary differential equations models used for biology applications. Gross et al. [31] used polynomial homotopy continuation methods for global parameter estimation of mass action models. Bifurcations and multistationarity of signaling cascades was studied with numerical methods based on the Jacobian matrix by Zumsande and Gross [64].
Algorithmically the task will be to count the positive real solutions of a parameterised system of polynomial or rational systems, making symbolic methods a possible tool. Due to the high computational complexity of this task [30] considerable work has been done to use specific properties of networks and to investigate the potential of multistationarity of a biological network out of the network structure.
This only determines whether or not there exist rate constants allowing multiple steady states, instead of coming up with a semi-algebraic description of the range of parameters yielding this property. These approaches can be traced back to the origins of Feinberg’s Chemical Reaction Network Theory (CRNT) whose main result is that networks of deficiency 0 have a unique positive steady state for all rate constants [26, 17]. We refer to Conradi et al. [15], Millán and Turjanski [48], Johnston [36], and Conradi et al. [14] for the use of CRNT and other graph theoretic methods to determine potential existence of multiple positive steady states, with Joshi and Shiu [37] giving a survey.
Given a bistable mechanism it is also important to compute the bistability domains in parameter space: the parameter values for which there is more than one stable steady state. The size of bistability domains gives the spread of the hysteresis and quantifies the robustness of the switches. The work of Wang and Xia [57] is relevant here: they used symbolic tools, including cylindrical algebraic decomposition as we do, to determine the number of steady states and their stability for several systems. They reported results up to a 5-dimensional system using specified parameter values, but their method is extensible to parametric questions. Higher-dimensional systems were studied using sign conditions on the coefficients of the characteristic polynomial of the Jacobian. In some cases these guarantee uniqueness of the steady state [16].
1.4 Outline and New Contributions
In Section 2 we outline the particular biological model and symbolic problem that we aim to solve: BioModel 26 of the MAPK network, which can be found as Model 26 in the BioModels Database of [42].
In Sections 3 and 4 we describe two independent symbolic attempts to solve the problem. The first in Section 3 is able to identify symbolically the multistationarity regions of a 1-dimensional parameter space with a combination of Virtual Substitution and Cylindrical Algebraic Decomposition in the Redlog package for Reduce. The second in Section 4 goes on to give full semi-algebraic solution formulae with a combination of Real Triangularization and Cylindrical Algebraic Decomposition using the Regular Chains Library for Maple. The solutions were obtained in different computer algebra systems using different fundamental algorithms, but all from the family of methods for real quantifier elimination. We move on in Section 5 to describe a new pre-processing method for the problems inspired by graph theory and Gaussian elimination. Then in Section 6 we describe how a combination of ideas from all three preceding sections can be combined to provide solutions over a 2-dimensional parameter space.
In Section 7 we discuss testing the stability of fixed points. Then in Section 8 we consider an alternative larger model from the MAPK network (Model 28 in the BioModels Database of [42]). In Section 9 we compare the models and detour to describe a symbolic grid sampling approach to this problem, including a comparison of this to a leading numerical solver. We consider how further progress could be achieved in Section 10, identifying a conjecture for determining where multistationarity for MAPK may occur without the costly calculations described. Finally we summarise and give final thoughts in Section 11.
This journal article follows published conference works at ISSAC 2017 [6] and CASC 2017 [23]. The present article reproduces this material clarifying, correcting and extending in places. In particular, Sections 3 and 4 were largely described in the ISSAC 2017 paper and Sections 5 and 9 in the CASC 2017 paper. The most notable new contributions are given in Section 6, where we describe for the first time semi-algebraic solutions with two free parameters; and in Section 10, where we identify a promising conjecture for future investigation.
2 Problem Outline
2.1 MAPK Bio-Model 26
The model of the MAPK cascade we are investigating can be found in the BioModels Database [42] as Model 26111www.ebi.ac.uk/biomodels-main/BIOMD0000000026. This is the first version of the models proposed by Markevich et al. [45] corresponding to the so-called distributive ordered phosphorylation/dephosphorylation mechanism. Hereafter we will refer to it as Model 26.
It is given by the following set of differential equations. We have renamed the species names to , …, and the rate constants to , …, to facilitate reading. As usual means the time derivative of .
[TABLE]
Later, we will use to refer to (1) with all the left hand sides replaced by [math] in order to find fixed points of the system. The BioModels Database gives us meaningful values for the rate constants:
[TABLE]
Some of these values are accurately measured and some are well-educated guesses. For the purpose of our study we assume they are all suitable.
We may add three linear conservation constraints to this system, which in turn introduce three further constant parameters , , :
[TABLE]
Computations to produce these, for example in MathWorks SimBiology, use the left-null space of the stoichiometric matrix under positivity conditions. For details see for example Schuster and Höfer [50].
The constants , and represent total initial concentrations of cell substances, and meaningful values are harder to obtain than for (2). The following are some realistic value estimates, used by Markevich et al. [45]:
[TABLE]
These should be considered significantly less reliable than those in (2). Indeed, the long-term goal of our research is to treat all three of these together parametrically, although in the present work we produce results only with of these parameters free.
Our computational biology problem is to identify regions in parameter space over which the system formed by the unions of constraints in (1) and (3) under estimates (2) exhibits multistationarity.
The system has several special structure properties, e.g. it is a so called MESSI system [47]. However, in the following we will not directly use this structure property. The non-linearities occurring in the system are at most quadratic. As by introducing new variables the general polynomial case can be reduced to such a case and from a dynamical systems perspective point of view already quadratic systems are capable to generate all kinds of structurally stable dynamics including chaos [54] this property is not restrictive.
2.2 Real Algebraic Problem
To identify fixed points we formulate a real algebraic problem by first replacing the left hand sides of all equations in (1) with [math], which as noted above we denote . This, together with the equations in (3), yields an algebraic system with polynomials in
[TABLE]
However, ideal theory is not sufficient, as we are concerned only with real valued solutions. Further, we have the additional inequality restrictions that all entities in our model are strictly positive. This yields an additional system
[TABLE]
establishing a side condition on the solutions of that all variables and parameters of be positive. In terms of first-order logic our specification of and yields a quantifier-free Tarski formula,
[TABLE]
The estimations for the rate constants in (2) formally establish a substitution rule in postfix notation, which can be applied to , , or . Applying this to ; converting the floats from (2) into rational numbers; and multiplying over common denominators, gives us the quantifier-free Tarski formula below.
[TABLE]
Our problem in real algebra is to obtain a semi-algebraic description of the regions in parameter-space where there are multiple solutions of (6). The multistationarity problem would also require to know about the stability of these solutions, as discussed in Section 7.
2.3 Suitable Symbolic Technology
This real algebraic problem is amenable to technology developed for real quantifier elimination. Note that the number of indeterminates (variables and parameters) is high compared to those usually tackled by such technology. However, the degrees involved are low, with every monomial at most degree 2, which helps make it tractable.
As we will not include a priori information about the stability of the fixed points, we must not only consider the existence of (at least) two stable fixed points but also unstable fixed points. Hence we simply investigate where in parameter space there exist multiple different roots of .
In theory, any Real Quantifier Elimination (QE) technology can directly handle the parametric existence of steady states, taking as input and producing as output a quantifier free formula in the parameters describing where solutions exists. However, this is not sufficient to solve our problem as we are not only interested in the existence but also in the number of solutions. We can use a specific QE tool to do this: Cylindrical Algebraic Decomposition.
2.3.1 Cylindrical algebraic decomposition and its terminology
Cylindrical Algebraic Decomposition (CAD) was first proposed by Collins in the 1970s. This original algorithm222see for example the work of Arnon et al. [1]. took as input a set of polynomials in , producing as output a set of cells which together give a decomposition of which is sign-invariant, meaning each input polynomial has constant sign over each cell. The sign-invariance means that the polynomials may be studied over an an infinite domain by querying a finite number of sample points: one per cell.
The cells are all semi-algebraic, meaning they can be described by a polynomial system, and they are arranged cylindrically, meaning their projections with respect to a stated variable ordering are either equal or disjoint. The cylindricity means the semi-algebraic descriptions are triangular and the cells form cylinders over another (induced) CAD of given by the projection of the -dimensional cells. All cells are either sections, defined by a polynomial vanishing; or a sector, defined as the space between two sections, or possibly extending infinitely.
Collins’ algorithm proceeded with a system of: projection, which identified key polynomials in fewer variables; and lifting, where the induced CADs are incrementally constructed via substitution of sample points and univariate root isolation. The act of projection must be defined so that working at a sample point may be concluded representative for the entire cell.
There has been numerous extensions and improvements to CAD since Collins’ original method. The collection edited by Caviness and Johnson [10] is a key resource; in particular the survey paper within by Collins [13]. A more recent survey was given in the Introduction section of the work by Bradford et al. [7]. A key choice for CAD is the variable ordering which defines the cylindricity property and controls the order steps are taken by the algorithm. For use in quantifier elimination CAD must project variables in the order they are quantified. Our problem (6) is not quantified but our desire to understand the problem over parameter space means that we must project variables before parameters. However, besides this the choice is free for us. We define the main variable of a polynomial / constraint to be the highest one present (first to be projected) in the ordering.
The worst-case time complexity of CAD is doubly exponential. Traditionally, this is doubly exponential in the number of indeterminates, which would include our symbolically treated parameters. However recent progress on CAD in the presence of equational constraints (see for example the work of England et al. [21]), of which there are many in (6), allows us to conclude it is actually doubly-exponential in the number of variables minus the number of equational constraints at different levels of the projection [22]. Despite this, the number of variables present in (6) is too large for contemporary CAD implementations to tackle alone.
2.3.2 Combing with other symbolic tools
We are able to make progress by combining CAD with additional symbolic methods. Two independent investigations were undertaken. The first, described in Section 3, uses the Redlog package in Reduce and combines CAD with virtual substitution. The second, described in Section 4, uses the Regular Chains Library in Maple and combines CAD with real triangularization. In both cases we have combined the corresponding methods by hand, but automation is clearly possible.
3 Using Real Quantifier Elimination Technology in Redlog
In this section we are going to combine Virtual Substitution (VS) with CAD. The former smoothly eliminates the majority of the quantifiers while the latter allows us to count numbers of solutions via decomposition of the remaining low-dimensional spaces. That combination of methods requires the solution of several QE runs with each problem and some combinatorial arguments. Throughout this section we are performing computations using the Redlog Package [19] for Reduce revision r3606. Timings are reported for a 2.4 GHz Intel Core i7 with 3 GB RAM or cores on a compute server with similar speed and memory limitations.
3.1 Virtual Substitution
Substitution methods for quantifier elimination date back to an article from Weispfenning [58], which treated the special case with only linear occurrences of the quantified variables. Originally motivated by the proof of tight complexity bounds for the real decision problem, that approach turned out to be applicable to practical problems, especially with many parameters. Consequently, the method was systematically generalized by Weispfenning and his students to arbitrary but bounded degrees [61, 59, 39].
Quantifier elimination proceeds from the inside to the outside of a prenex quantifier block. An innermost existential quantifier is eliminated by equivalently replacing it with a finite disjunction:
[TABLE]
where is a finite elimination set containing abstract test points . The terms are derived from symbolic representations of formal zeros of parametric univariate polynomials from occurring in with possibly adding infinitesimals . They are guarded by quantifier-free formulas that guarantee the existence of the zeros in terms of the parameters. Recall that regular term substitution maps terms to terms, which naturally generalizes to corresponding maps on quantifier-free formulas. Virtual substitution , in contrast, maps atomic formulas to quantifier-free formulas. This allows to express the substitution of the terms without using any non-standard symbols. Furthermore, virtual substitution adds the guarding conditions in a suitable way. For examples and surveys of the virtual substitution method see the work of Sturm [52, 53].
3.2 Parameter Free Computations
We start by considering the case where all parameters in (5) are substituted for their estimates in (2) and (4) (interpreted as rational numbers):
[TABLE]
The closed formula states the existence of a suitable real solution. In a first step, we solve for the following eleven QE problems using VS:
[TABLE]
Each is a univariate quantifier-free formula describing all possible real choices for for which there exist real choices for all other variables such that holds. CAD can easily decompose the corresponding one-dimensional spaces. It happens that for each there are exactly three zero-dimensional cells , , where holds. We extract all , , and as real algebraic numbers, i.e., as the unique root of a univariate defining polynomials with integer coefficients within an isolating interval. By combinatorial arguments it is not hard to see that the following holds for the set of real solutions of :
[TABLE]
Notice that at this point we have proven the existence of multiple fixed points of the system for . We can furthermore compute by plugging the candidates from the Cartesian product into . A straightforward approach requires arithmetic with real algebraic numbers followed by the determination of the signs of the results, which is quite inefficient in practice. However, it turns out that interval arithmetic starting with refinements of the isolating intervals of the real algebraic numbers excludes of the candidate solutions. Even the three remaining candidates then require no further checking with algebraic numbers since we already know that . The overall CPU time is 71.3 seconds for 11 runs of VS plus 11 runs of CAD, followed by 16 hours for checking candidates. Our checking procedure is a file-based prototype starting a Reduce process for every single of the candidates; there is considerable room for optimization.
For instead of all eleven univariate CAD computations yield unique solutions which can be straightforwardly combined to one unique solution for the corresponding . The overall CPU time here is 66.4 seconds for 11 runs of VS plus 11 runs of CAD. Machine float approximations of all our solutions are given in Table 1.
3.3 Parametric Analysis for
We next consider the case where is left as a free parameter:
[TABLE]
Again, we solve for eleven QE problems using VS:
[TABLE]
This time each is a bivariate quantifier-free formula in and the corresponding . Hence we must now construct a two-dimensional CAD for each . The projection order is important: we first project , then the CAD base phase decomposes the -axis, followed by an extension phase that decomposes the -space over the -cells obtained in the base phase. This is feasible if we make one limitation: not to extend over zero-dimensional -cells. In other words, we accept finitely many blind spots in parameter space, which we can explicitly read off from the CAD so that in the end we know exactly what we are missing.
Figure 1 shows our CAD tree for . The first layer from the root shows the decomposition of the -axis. The five zero-dimensional (rectangular) cells are the previously mentioned blind spots, among which the smallest one is not relevant, as it has negative value of . Those zero-dimensional cells also establish the limits of the full dimensional (oval) cells in between. The cylinders over those one-dimensional -cells each contain either one or three zero-dimensional -cells where holds. We have deleted from the tree all -cells where does not hold.
We make two observations, important for a qualitative analysis of our system:
For all positive choices of , extending to infinity, there is at least one positive solution for . 2. 2.
There is a break point around where the system changes from having a unique solution to exactly three solutions.
Recall that for all floating point numbers given here as approximations we in fact know exact real algebraic numbers. For instance, the exact break point is the only real zero in the open interval of an irreducible defining polynomial
[TABLE]
Figure 2 depicts all eleven CAD trees for , …, . They are quite similar to the one just discussed. Even the break point from one to three solutions for is identical for all so that we can generalize our observations from earlier:
For all positive choices of , extending to infinity, there is at least one positive solution for . 2. 2.
There is a break point around where the system changes its qualitative behaviour. We have exactly given as a real algebraic number in Equation (8). For there is exactly one positive solution for . For there are at least and at most positive solutions for .
The overall computation time for our parametric analysis is 4.3 minutes. It is strongly dominated by 2.8 minutes for the computation of one particular CAD tree, for . It turns out that the suitable projection order with eliminated first is computationally considerably harder than projecting the other way round. As a preprocessing step we apply CAD-based simplification of the with the opposite, faster, projection order. Here we use QEPCAD-B (v1.69), which performs better than Redlog at simple solution formula construction [9].
4 Using Triangular Decomposition Tools in the Regular Chains Library for Maple
In this section we are going to apply triangular decomposition methods, including CAD. We find that a triangular decomposition can derive solution formulae for many variables in terms of a smaller subset for which we must apply CAD to count solutions. Throughout this section we are performing computations in Maple 2016, but using an updated version of the Regular Chains Library333www.regularchains.org. Timings are reported for a Windows 7 64 bit Desktop PC with Intel i5.
4.1 Parametric Analysis for
Regular chains are the triangular decompositions of systems of polynomial equations, where triangular means decreasing subsets of variables occurring in each polynomial. Highly efficient methods for working in complex space have been developed based on these; see the work of Wang [56] and Aubry et al. [2] for a survey.
Recent work by Chen et al. [11] proposes adaptations of these tools to the real analogue: semi-algebraic systems. They describe two algorithms to decompose any real polynomial system into finitely many regular semi-algebraic systems. The first, Real Triangularize (RT), does so directly while the second, Lazy Real Triangularize (LRT), produces the highest (complex) dimension solution component and unevaluated function calls, which if all evaluated would combine to give the full solution. These algorithms are implemented in the Regular Chains Library for Maple.
We will apply LRT on the quantifier-free formula (5) evaluated with the parameter estimates for , …, , i.e. the system (7) as studied with Redlog in Section 3.3.
We need to choose a variable ordering: our analysis requires that be the indeterminate considered alone. We place the remaining variables in lexicographical order since the in-built heuristics to make the choice could suggest nothing better. The solutions must hence contain constraints in , constraints in (, in ( and so on.
Applying LRT this way produces one solution component and 6 unevaluated function calls in around 15 seconds.
4.1.1 The main solution component from LRT
In the evaluated component: for each of , …, there is a single equation which has this as the main variable. Further, these are all linear in their main variable meaning they can be easily rearranged into the solution formulae given below.
[TABLE]
Note that these solution formula: are guaranteed valid for all positive excluding three isolated points which are provided as part of the output from LRT and described below; are triangular, with each is expressed in variables ; and are provided for all but variable .
The output of LRT also requires that be both positive and satisfy:
[TABLE]
where the coefficients are univariate polynomials in of maximum degree as given in B. Hence there are at most six solutions for , with the exact number depending on whether solutions of (19) are real and positive.
There are four constraints on free parameter as given below, one of which is the non-vanishing of the polynomial in Appendix A whose root defined the break point found by Redlog in Section 3.3. Note that the coefficients break over lines within the final constraint.
[TABLE]
Evaluating the real roots of the polynomials appearing in the above allows us to conclude that this solution component is valid for all positive values of excluding three points. As with Redlog, Maple can represent these as exact algebraic numbers but for brevity we give float approximations:
[TABLE]
Software Remark 1**.**
In the authors’ ISSAC 2017 paper [6] the description of the evaluated solution component ended here. However, following the publication of that paper a bug was uncovered by one of the authors in the simplifier of the Regular Chains Library when working with a different MAPK model to the one considered presently. For that example the simplifier was incorrectly discarding certain positivity conditions. The bug was reported to the Regular Chains developers, and the current version of the simplifier444http://www.arcnl.org/cchen/software.html now excludes all such simplifications. So presently, the output from LRT includes also the positivity conditions
[TABLE]
Some of these can clearly be removed. For example, if we know and then (16) implies and this coupled with (14) implies . However, it is not trivial to imply all such inequalities, and so any proposed solution in should be checked to see if it implies a positive solution in all the remaining variables before being accepted. This is indeed the case for all solutions described in the ISSAC 2017 paper, and below.
4.1.2 The unevaluated function calls from LRT
The main solution component described in Section 4.1.1 is not the entire solution to the system. LRT produced also six unevaluated function calls which if evaluated and combined with the main component would give the full solution. LRT guarantees that the complex dimension of the solution components from these unevaluated calls is smaller that the main component. In fact, three of the six unevaluated calls define empty solution sets, with evaluating to discover this instantaneous.
With regards to the other three: we can infer from the arguments to these function calls that each defines the solution at one of the three points in (24) that were excluded from the main component. I.e. each of these three calls has as an argument the negation of one of the univariate inequations for from (21)(23). Actually evaluating these solution components is not possible in reasonable time. Thus, as with Redlog in Section 3, we proceed accepting a small number of blind spots.
The output of LRT has quickly given us the structure of the solution space valid at all but three isolated values of . However, it does not identify where the number of real solutions change. Note that although the break point identified in Section 3 has been rediscovered in (24), there is not yet any information gathered by Maple from which we can infer its significance. We also note that there seems to be no significance for our application of the other two isolated points in (24).
4.1.3 Counting solutions with CAD
To finish the analysis we need to decompose -space according to the real roots of ; and also and since the constraints and were specified separately in the output. CAD is ideally suited for this task. We apply the Regular Chains based implementation in Maple first described by Chen et al. [12]. A CAD for , with the ordering chosen so that the -axis is the one decomposed, divides the plane into 135 cells in a few seconds. This CAD decomposes the axis into 11 cells, i.e. identifying five points, which approximate to:
[TABLE]
We give these as floats for brevity but exact algebraic numbers are available555See the Research Data Statement at the end of the paper to access them..
On the cell where , the cylinder above in the -plane is divided into 11 cells: three of which cover (two 2d sectors and a 1d section). We see that is zero on the section but not the sectors. This can be inferred by testing a sample point of the section (the invariance properties of the CAD mean that the signs of the input at this point are representative for the whole cell. In fact, with the CAD implementation we use the cells comes with a semi-algebraic description which for this section is the statement that (along with the bounds on ).
We can perform a similar analysis on the two cells for and . In each case the cylinders above are divided into 15 cells, seven of which cover , with the three sections satisfying .
So we can conclude that: (a) if then has a single positive real solution; and (b) if then has three positive real solutions. We cannot conclude with certainty what happens at the points and .
At the end of this analysis we have rediscovered the break point identified in Section 3 where the system moves from a single positive real solution to three. We also have explicit solutions valid for all except three isolated values. To obtain an actual numerical solution we need only: select the value of interest (call it ); perform univariate root isolation on , noting we know in advance how many to expect based on ; then for each solution substitute recursively into equations (9)(18), starting with (18) and working up, substituting the new variable solution from each formula into the next. The solutions in Table 1 may be easily rediscovered this way, for example.
We note that, as discussed in Software Remark 1, we have ensured that for each cell all the positive solutions in provided by the sample point do indeed lead to positive solutions for all other variables via the back substitution process.
4.2 Repeating for Other Choices
We have repeated the approach described in Section 4.1 for different choices of free parameter and different choices of fixed parameter values. For example:
With set to instead of we find that the break point between 1 and 3 real positive solutions moves to . With set to it moves to .
- 2.
Allowing to be free and fixing we find that there is only ever one positive real solution.
- 3.
Allowing to be free and fixing we find the number of positive real solutions moving from 1 to 3 to 1 breaking at and .
- 4.
Similarly, allowing to be free and fixing we find there is only ever one positive real solution; but fixing instead we find 3 real solutions between and and 1 otherwise.
This hints that there is a shape approximating a paraboloid within (, , )-space within which bistability may occur; with bistability available for any and value but bounded from below in the coordinate.
We note that these conclusions are, as with the one described in detail, valid at all but a handful of isolated values of the free parameter.
5 A Graph Theory Guided Parametric Gaussian Elimination Preprocessing Method
As described above, the complexity of polynomial systems obtained with steady-state approximations of biological models is comparatively high for the application of symbolic methods, particularly in reference to the dimension (number of indeterminates). The two studies described in Sections 3 and 4 both used tools to effectively reduce the problem dimension before applying the costly CAD method.
More generally, it is highly relevant for the the success of general polynomial systems methods if we can first identify and exploit particular structural properties of the input. Here, the MAPK models have remarkably low total degrees with many linear monomials after some substitutions for rate constants. For example, the final equation of suggests a simple polynomial expression for in terms of the remaining variables of the system. This promoted the idea of pre-processing MAPK input with essentially Gaussian elimination: in the sense of solving single suitable equations with respect to some variable and substituting the corresponding solution into the system.
5.1 Parametric Gaussian Elimination
Generalizing this idea to situations where linear variables have parametric coefficients in the other variables requires, in general, a parametric variant of Gaussian elimination, which replaces the input system with a finite case distinction with respect to the vanishing of certain coefficients and one reduced system for each case. Further, for our problem the positivity conditions establish a further apparent obstacle, because we are formally not dealing with a parametric system of linear equations but with a parametric linear programming problem.
The theory of real quantifier elimination by virtual substitution tells us that it is sufficient for the inequality constraints to play a passive role in the sense that their polynomials do not contribute to the elimination set discussed in Section 3.1. This key idea occurred first for the linear case in Theorem 3.11 of the work by Loos and Weispfenning [44]; while the current state-of-the-art is described in the thesis of Košta [39]. The crucial observation is that our entire formula is (and remains during the considered elimination) a single Gauss Prime Constituent in the sense of [39, Section 3.1.1]. Further, for the considered MAPK model, it turns out that those positivity assumptions on the variables are actually strong enough to guarantee the non-vanishing of all relevant coefficients, so case-distinctions are never necessary! We do not claim such an approach will always be so lucky, but it may be this result generalises for the MAPK hierarchy. It was the case also for the second larger MAPK model we describe in Section 8.
5.2 An Optimal Strategy
Parametric Gaussian elimination can increase the degrees of variables in the parametric coefficient, in particular destroying their linearity and suitability to be used for further reductions. For example, solving the last equation of and substituting into the first equation would destroy any linearity present in that first equation.
The natural question is whether there is an optimal strategy to Gauss-eliminate a maximal number of variables? This has been answered positively only recently by Grigoriev et al. [29]: draw a graph, where vertices are variables and edges indicate multiplication between variables within some monomial. Then one can Gauss-eliminate a maximum independent set, which is the complement of a minimum vertex cover. Figure 3 shows that graph for , where is a minimal vertex cover, and all other variables can be linearly eliminated.
Recall that minimum vertex cover is one of 21 classical NP-complete problems described by Karp [38]. However, our instances considered here and instances to be expected from other biological models are so small that the use of existing approximation algorithms [28] appears unnecessary. We have used real quantifier elimination, which did not consume measurable CPU time; alternatively one could use integer linear programming or SAT-solving.
It is a most remarkable fact that a significant number of biological models in the databases have that property of loosely connected variables. This phenomenon resembles the well-known community structure of propositional satisfiability problems, which has been identified as one of the key structural reasons for the impressive success of state-of-the-art CDCL-based SAT solvers by Girvan and Newman [27].
5.3 Reduced System for Model 26
We conclude this section with the reduced system computed with an implementation of this pre-processing in Redlog [19]. From (6) we obtain
[TABLE]
We now have a system of just two equalities in 5 indeterminates together with positivity conditions on those indeterminates. Notice that no complicated positivity constraints come into existence from this method. All corresponding substitution results are entailed by the other constraints, which is implicitly discovered by using the standard simplifier of Dolzmann and Sturm [20] during preprocessing.
Note that, with defined in (6), we have a formal equivalence here, from the theory of quantifier elimination via virtual substitution:
[TABLE]
So if we can determine the region of parameter space where solutions to exist we are guaranteed to also find solutions to there. However, our problem concerns not just the existence of solutions but the number, and so on the surface this may seems inadequate. However, because the only technology used in this reduction is linear substitution we can also conclude that the number of solutions found for will lead to the same number of solution of .
Hence it is sufficient to study . This pre-processing allows us to derive solutions with two free-parameters in the next section. We also give some indication of the performance improvements of various methods offered by the pre-processing later in Section 9.
6 Combined Approach for a Solution over 2-parameter space
In this section we describe a new derivation of a solution to the real algebraic problem with two free parameters, produced after the publication of the authors’ ISSAC 2017 and CASC 2017 conference papers [6, 23]. The progress is made by combining ideas from all three of the preceding sections. We describe in detail below but broadly we: start with the reduced system from the pre-processing of Section 5 with two free-parameters; apply the LRT method of Section 4 to reduce the problem by an indeterminate; build part of a CAD, an idea used in Section 3, sufficient to identify the regions of parameter space of interest. Timings are reported for the same hardware and software as Section 4.
6.1 Applying LRT and Preparing for CAD
We start with the reduced system (25) derived in Section 5 above. We set to 50 and leave and free. Hence we seek the regions of the -plane where there exist multiple solutions.
We first run the LRT algorithm introduced in Section 4, using variable ordering . We needed the parameters to come after the variables so we work over the parameter space, but within the pairs the orders could have been reversed. In around 5 seconds LRT outputs one solution component and 4 unevaluated function calls.
The evaluated component consists of the four positivity conditions from the input and the two equations, which may be seen in C where they are labelled (30) and (31). Of course these equations are triangular: (30) involves while (31) does not depend on . Note that (30) is linear in and so we can easily rearrange to give a solution formula for in terms of . (31) is of degree 6 in but of course not all its solutions need be real and positive. If we can determine where (31) has multiple positive real solutions then all that remains is to back substitute and to get real solutions for the other variables and check these are also positive. We will determine this using CAD.
Before that, we examine the 4 unevaluated functions calls from LRT: two instantaneously evaluate to empty solution sets while the other two cannot be evaluated in reasonable time. We infer from the arguments to the function calls that the latter two define solutions on the graphs of two polynomials in -space. These two polynomials may be found in D. The smaller is degree 5 in and degree 4 in (total degree 5 overall) and the larger degree 14 in and degree 10 in (total degree 14 overall)666As described later in Section 10.3 the boundary of the multistationarity region is actually defined by part of the graph of one of these polynomials, although there is no reason to conclude that at this stage of the analysis..
We proceed on the understanding that any results are valid everywhere in -space except on these graphs. We may compare this to Sections 3.3 and 4.1 which accepted a finite number of isolated blind spots in a one-dimensional parameter space.
6.2 Solution via an Open CAD
A CAD sign-invariant for the polynomial defining (and to allow for positivity checks) would be sufficient. However, the size of the polynomial puts this beyond CAD currently. Instead, we proceed as follows:
Step 1:
Calculate the projection set for CAD input consisting of polynomial defining (31) and polynomial (to allow for positivity check).
This is a set of 19 polynomials in the greatest of which has degree 34, and so it is not reasonable to print them all here.
Step 2:
Build an Open CAD of -space for these polynomials, along with polynomials and (to allow for positivity checks).
An Open CAD means the full dimensional cells only. The boundaries may be determined by algebraic numbers but because we do not lift over the boundaries there no costly algebraic number calculations. The idea has been much discussed by McCallum [46], Strzeboński [51], Wilson et al. [63], and other names used for it include generic CAD and 1-layered Sub-CAD. It was partly applied by the approach in Redlog in Section 3. It is sufficient to solve problems which are only in strict inequalities, but of course, that is not the case here. By making this restriction we are accepting that our solutions and conclusions are not necessarily valid on cell boundaries: a finite number of curve segments in the -plane. However, we have already made such an acceptance, in the use of LRT above.
We perform the above steps with the ProjectionCAD package of England et al. [24] in Maple777http://computing.coventry.ac.uk/~mengland/ProjectionCAD.html in 17 seconds. The resulting CAD has 533 cells.
Step 3:
Identify those cells in the upper quadrant of the -plane.
We only care about solutions in this upper quadrant. We can easily identify 139 such cells by querying sample points (note that no cell can straddle the boundary of the quadrant since the CAD produced was also produced sign-invariant for and as polynomials). Since in Step 1 we ensured that this CAD was built for the projection of the polynomial defining (31) we may conclude that for this polynomial we can work at a sample point of the cell but draw conclusions for the whole cell, as we do next.
Step 4:
Identify the number of positive real roots the polynomial defining (31) has over each of these cells.
We do this by substituting for the sample point and applying Maple’s default real root isolation algorithm. We identify 35 of the 139 cells where there are three positive real roots for , with the other 104 all having one.
Step 5:
Check that these solutions provide a positive solution for via back substitution into (30).
We first checked that the 104 cells with one positive real solution for all lead to one positive real solution for as expected. We then analyse the 35 cells and each of their three positive real solutions for in turn. For 28 of these cells each solution gives a corresponding positive real solution for . For the other 7 cells, only one of the three solutions does, so these join the other 104 as representing the parameter space with one solution.
The semi-algebraic descriptions of these 28 cells provide the exact description of the regions in -space where multistationarity can occur. We use these descriptions to produce the 4 plots of the multistationarity region in Figures 4 and 5. The 4 images are all produced from the data in the 28 cells, but with different plotting regions. In each case, the coloured regions represent the cells with multistationarity, with the only purpose of the different colours to show the separation of the cells888Because we produced an Open CAD above we cannot formally conclude what happens on these cell boundaries..
The left plot in Figure 4 is for the original range of values considered and has the region of multistationarity described by 4 full dimensional CAD cells. The right plot shows that this region grows as increases: at this range 9 cells are in view including the 4 from the left plot which are at the bottom of the region.
The left plot of Figure 5 expands the ranges considerably. There are 24 cells in view of the range but the original 9 described above are now too small to see. The right plot of Figure 5 expands the range further to include all 28 cells; with all 24 from the previous image now too small to see. In this final image the two cells at the top actually extend infinitely in the direction while always being bounded on both sides in the direction.
7 Stability of Fixed Points
The work described in Section 36 was dedicated to identifying where multiple fixed points occur. This alone does not prove multistationarity as we must also check the stability properties of these fixed points.
We may use the three linear conservation constraint equations (3) to eliminate , , and from system (1) and symbolically compute the Jacobian of the obtained reduced system. We can then numerically compute the eigenvalues of for the instances arising from the substitution of the parameter values and the different positive fixed points for the variables.
We have used the float approximations for the unique solution with and the three solutions , …, for in Table 1. For the single positive fixed point the Jacobian has eigenvalues with negative real part only and hence can be shown to be stable. For one of the three positive fixed points can be shown to be unstable, as has one eigenvalue with positive real part; the other seven had negative real parts. In contrast and can be shown to be stable. Hence for the system is indeed bistable.
A verification of the stability of the fixed points using exact real algebraic numbers by the well-known Routh–Hurwitz criterion is possible algorithmically [33], but seems to be out of range of current methods for this example. Notice that in other studies on multistationarity of signaling pathways, such as those of Conradi et al. [15] and Gross et al. [32], the question of stability has also been left to one side.
8 Another MAPK Model
We describe a second MAPK model, which we will use alongside the first from Section 2 in the remaining sections, to broaden the conclusions drawn.
8.1 MAPK Bio-Model 28
The system with number 28 in the BioModels Database is given by the following set of differential equations. This model is the distributive fully random kinetics version of the models proposed by Markevich et al. [45]. Hereafter we refer to it as Model 28. Again, we have renamed the species to and the rate constants to to facilitate reading:
[TABLE]
We denote by the system formed by replacing all left hand sides of (26) by [math].
The estimates of the rate constants given in the BioModels Database are:
[TABLE]
Again, using the left-null space of the stoichiometric matrix under positive conditions as a conservation constraint [25] we obtain the following three linear conservation constraints:
[TABLE]
where , , are new constants. Meaningful values for these three are harder to obtain than the constants in (2). The following are some realistic value estimates:
[TABLE]
Ideally we would treat all three symbolically and identify multistationarity within the parameter space.
8.2 Preprocessing
We may apply the preprocessing procedure outlined in Section 5 to and the positivity constrains similarly to as described in Section 5 for Model 26. The connection graph is given in Figure 6 showing that as a minimum vertex cover. We obtain the simplified system:
[TABLE]
[TABLE]
along with positivity constraints , , , , and .
9 Grid Sampling: Symbolic vs Numeric
In this section we summarise work that was first presented in CASC 2017 [23] which compared the use of symbolic and numeric techniques to identify multistationary regions via grid sampling.
9.1 Algorithms and Software
In this section we will use Symbolic Grid Sampling: so we have results only for a set of numerical sample points, but each sample point will undergo a symbolic computation. The result will still be an approximate identification of the region, since the sampling will be finite, but the results at those sample points will be guaranteed free of numerical errors. The symbolic computations follow exactly the strategy introduced in Section 4 except each sample point will set all parameters (rather than leaving one free) meaning a simpler symbolic computation than in Section 4 performed multiple times. In particular, with no free parameters the Lazy variant of Real Triangularization (LRT) used in Section 4 gives the full solution (no laziness) as we would get from Real Triangularization (RT) and so we just use the latter.
We will compare this symbolic grid sampling with a fully numerical gird sampling approach using the homotopy solver Bertini developed by Bates et al. [4], in its standard configuration to compute complex roots. Alternatives to Bertini include PHCpack by Verschelde [55] and the Numerical Algebraic Geometry package for Macaulay2 by Leykin [41]. Reasons for choosing Bertini include that it is the most cited homotopy solver for the past 8 years and that it allows adaptive and very high-precision arithmetic (whereas PHCpack only allows double-double)999We note that a recent development for Bertini published after this article was in press could be applicable to this problem: Paramotopy by Bates et al. [3] allows for parallelism and computation reuse, well suited for such grid sampling.. We parsed the output of Bertini using Python, and determined numerically which of the complex roots are real and positive using a threshold of for positivity.
Bertini computations (v1.5.1) were carried out on a Linux 64 bit Desktop PC with Intel i7. Maple computations (v2016 with April 2017 Regular Chains) were carried out on a Windows 7 64 bit Desktop PC with Intel i5.
Software Remark 2**.**
For the reduced system of Model 28 Bertini (incorrectly) could not find any roots, not even complex ones, for any of the parameter settings. The situation did not change when going from adaptive precision to a very high fixed precision. However, we have not attempted more sophisticated techniques like providing user homotopies. It seems a bug in Bertini has been triggered by this problem instance. It has been reported to the developers.
9.2 Sample Ranges and Plots
For Model 26 we will use a sampling range for from 200 to 1000 by 50; for from 80 to 200 by 10; and for from 5 to 75 by 5.
For Model 28 we will use a sampling range for from 100 to 1600 by 100; for from 40 to 160 by 10; and for from 120 to 240 by 10.
We produce 2d plots in each case with the third parameter fixed to its values indicated in (4) and (29). In those plots we will colour sample points according to the number of fixed points observed: yellow discs indicate one fixed point and blue boxes three. Diamonds indicate numerical errors where zero (red) or two (green) fixed states were identified.
9.3 Results and Comparison
The plots produced by the grid sampling are presented in Figures 710; and the time taken to produce them is summarised in Table 2.
9.3.1 Comparison of models
Model 28 forms a larger real algebraic problem than Model 26, 16 variables and equations rather than 11, so it unsurprising that it takes longer to perform computations.
Regarding the symbolic computations: Model 28 requires an actual CAD of a plane to be produced for each sample point while Model 26 only real root isolation (decomposition of a line). This was the case regardless of whether the original or reduced system was used as the starting point, since the RT preprocessing also reduced the number of variables that needed analysis by CAD. We note that even with the reduced system it was still beneficial to pre-process CAD with RT: the average time per sample point with pre-processing (and including time taken to pre-process) was 0.485 seconds while without it was 3.577 seconds. It is not clear if this is because of a genuine simplification or because the CAD algorithm from the Regular Chains Library that we used it particularly tuned for triangular systems.
9.3.2 Effects of the pre-processing in Section 5
Figure 7 and Figure 8 both refer to Model 26. The latter is produced by Maple’s symbolic calculations and so guaranteed free of numerical error. The former, Figure 7, represents the output of Bertini on the original system. We see that there are numerous numerical errors present: the rouge red and green diamonds in Figure 7. We find that when computing with the reduced system rather than the original system Bertini was able to to avoid all these errors, producing the same plots as Maple in Figure 8.
With Model 28 we see similar numerical errors from Bertini in Figure 9 when compared with Maple in Figure 10. However, in the case of Model 28 the reduction led to catastrophic effects for Bertini: built-in heuristics quickly (and incorrectly) concluded that there are no zero dimensional solutions for the system, and when switching to a positive dimensional run also no solutions could be found.
From the timing data in Table 2 we see that both Bertini and Maple benefited from the reduced system: For Model 26 Bertini took a third of the original time while Maple took a tenth of the original. For Model 28 the speed-up enjoyed by the symbolic method from the pre-processing was even greater: almost 100 fold!
9.3.3 Symbolic vs Numerical
As described above, we have observed numerous numerical errors when using Bertini which may avoided with the symbolic computations of Maple. However, they can also be avoided (at least for Model 26) by using the pre-processing technique described in Section 5.
However, and surprisingly, for Model 26 the symbolic methods were actually quicker than the numerical ones. The symbolic methods used are well known for their doubly exponential computational complexity (in the number of variables) so it is not necessary surprising that as the system size increases the results of the comparison would change. For Model 28 we have the expected outcome of the numerical calculations being quicker.
We can see some other statistical data for the timings in Maple: the standard deviation for the timings is fairly modest but in each row there are large outliers and so the median is always a little less than the mean average.
9.4 Higher Sampling Rates
Of course, the grid sampling described in this section scales directly with the number of sample points, so we can easily produce plots with higher sampling rates such as those shown later in Figure 11.
10 Going Further
The work presented is a substantial step forward but there is a wide range of directions for future work.
10.1 Solution in 3-parameter Space
The complexity of the fully symbolic approaches puts a complete analysis over this space out of reach (for now). However, the grid-sampling method of Section 9 can already be extended into 3 parameters with relative ease: at a cost linearly proportional to the increased number of sample points. This was completed for Model 26, where the multistationarity region is bounded on both sides in the and directions but extends infinitely above in . For example, with the range bound at 1000 the region is bounded by extending to 800 and to 600. With a sample rate of 20 for and and 50 for we have produced a Maple point plot of 20,400 points in 18 minutes. Figure 12 shows 2D captures of the 3D plot of the bistable points only. Figure 13 gives two views of the convex hull of the bistable points in Figure 12. This was produced using the convex package101010http://www.math.uwo.ca/~mfranz/convex/. We note the lens shape seen in the orientation in the left plot is comparable with the image in the original paper of Markevich et al. [45] (Fig. S7).
10.2 Effect of Other Parameters
Our work has focussed on understanding the behaviour of the system in the 3-parameter space but as described in Section 2 there are many other parameters for which we simply took the values from the BioModels Database. While there is confidence in the accuracy of these values, an important question for future work is the stability of the approaches we present to small perturbations in these values.
10.3 Conjecture for Semi-algebraic Solutions without CAD
All our semi-algebraic calculations used CAD as the backend to produce solutions, although after considerable simplification of the input. CAD is the most expensive technology employed by a significant margin. Its doubly exponential theoretical complexity is felt clearly in practice and so will be a barrier to studying larger parameter spaces or models. However, the results of Sections 4 and 6 hint that the solution could be available without CAD.
Recall from Section 4 that with one free-parameter the key break point in parameter space between 1 and 3 fixed points was determined by a real root of (8), one of the univariate polynomials whose roots were excluded from the validity of the LRT solution component. Similarly, studying the 28 cells where multistationarity could occur identified in Section 6 shows that the key region was also identified by the polynomial defining one of the graphs where LRT’s solution component was not valid.
Figures 14 and 15 give numerical plots of the polynomial (33), the former on smaller ranges and the latter on larger. The images on the right focus on the upper quadrant of interest and should be compared with Figures 4 and 5 of the exact multistationarity region. It is clear that (33) provides the boundary of this region. However, as the images on the left show, it is only one segment of the graph of this polynomial that is of interest.
Of course, this is just an observation. We have yet to derive a proof that this would always be identified by LRT. Even if it were there would still be things to clarify:
Which polynomial from the several that LRT uses to define excluded regions is the one of interest? Recall from Section 4 that as well as (8) LRT identified two further polynomials in (22) and (23); while in Section LRT identified not only (33) but also (32).
- 2.
Which portion of the graph forms the boundary? The graph of (33) is a superset of the boundary. Even, when restricting our view to the positive quadrant (plot on the right of Figure 14) there is a second curve segment that does not have relevance to the application.
Nevertheless, we have identified a promising conjecture for continued study. At the least it gives useful insight on where to look for multistationarity without employing CAD. For example, it could direct future application of detailed grid sampling.
11 Summary and Final Thoughts
11.1 Summary
We have considered the problem of identifying regions of multistationarity in models of biological networks, an important problem with potentially clinical applications. We have investigated a variety of symbolic approaches encompassing multiple algorithms and computer algebra systems. We have derived semi-algebraic solution formulae and region descriptions for a classic MAPK model; as well as demonstrating the utility of symbolic-numeric grid sampling. We have drawn together the work first presented at conferences in 2017 [6, 23] and extended it to give solutions over a 2-parameter space not previously published and a conjecture on where future progress may come from.
11.2 Final Thoughts
We hope this work will inspire further study on the application of symbolic tools to biological network analysis, from both communities. Indeed, work on developing Mathematica tools for such problems has now been undertaken by Lichtblau [43], inspired by Bradford et al. [6] but based on tools for discriminant varieties not considered there. The study of such real world problems is of great benefit not only to the application domains but also to the software developers: these MAPK studies uncovered bugs in both Regular Chains (see Software Remark 1 in Section 4.1) and Bertini (see Software Remark 2 in Section 9.1) which had escaped the numerous other tests and applications of those algorithms.
Key areas of future study include the sensitivity of the analysis to variations in the other parameters (Section 10.2) and the conjecture described in Section 10.3. Additional areas to investigate could include the various degrees of freedom with the algorithms used. For example, we have a free choice of variable ordering: Model 26 has 11 variables corresponding to 39 916 800 possible orderings while Model 28 has 16 variables corresponding to more than orderings! Heuristics that exist to help with this choice, such as those of Dolzmann et al. [18], Bradford et al. [8], could not discriminate between the orderings on offer, even though the orderings do make a difference to the computation. Recent work on using machine learning to make such choice by Huang et al. [35, 34] may be applicable. Also, since MAPK problems contain many equational constraints an approach as described by England et al. [21] may be applicable for the higher dimensional CADs required to study more parameters.
Semi-algebraic solutions over 3-parameter space is out of reach at the time of writing. We note however that instances like MAPK were until recently thought out of reach of symbolic computation altogether, and while writing the ISSAC 2017 contribution we thought the 2-parameter case of Section 6 out of reach. So further progress will surely follow.
Acknowledgements
Section 3 uses two great free software tools: GNU Parallel for distributing computations on several processors, and yEd for visualization of CAD trees.
J. Davenport, M. England and T. Sturm are grateful to the European Union’s Horizon 2020 Research and Innovation programme, under grant agreement No 712689 (SC2). H. Errami, O. Radulescu, and A. Weber thanks the French-German Procope-DAAD program for partial support of this research. V. Gerdt was partially supported by the RUDN University Program 5-100. D. Grigoriev is grateful to the grant RSF 16-11-10075 and to MCCME for wonderful working conditions and an inspiring atmosphere. M. Košta has been supported by the DFG/ANR Project STU 483/2-1 SMArT.
We thank the anonymous reviewers of the present paper and our earlier conference papers for their useful comments which have improved this work.
Research Data Statement:
Data supporting the research in this paper is freely available in a Zenodo repository: https://doi.org/10.5281/zenodo.2533280.
Appendix A Defining Polynomial of the Section 3 Break Point
In Section 3.3 a break point where the system moved from 1 to 3 positive real solutions was discovered at around . The exact point is an algebraic number defined as the only real zero of a polynomial with coefficients as below. Note that the coefficients are too large to fit on a single line: the line breaks between digits should be read as a continuation of the single coefficient description rather than anything else.
[TABLE]
Appendix B Polynomial from Section 4.1.1
In Section 4 we described the application of LRT to (7). The main solution component provided the formulae 918 and required that where the coefficients are as given below.
[TABLE]
Appendix C Evaluated LRT Solution Component from Section 6
In Section 6.1 we applied LRT to (25) to simplify that reduced system further before applying CAD. The evaluated solution component consisted of the positivity conditions and the two following equations.
[TABLE]
[TABLE]
Appendix D The polynomials in )-space excluded by LRT in Section 6
The evaluated solution component in the previous appendix is guaranteed to describes the solution everywhere except upon the graphs of two polynomials in )-space. The smaller of these polynomials is as follows:
[TABLE]
The larger is defined by
[TABLE]
where the are univariate polynomials in given over the following pages. In Section 10.3 we noted that part of the graph of this polynomial forms the boundary of the desired region in )-space where multiple solutions exist.
See pages - of Excl1.pdf
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Arnon et al. [1984] Arnon, D. S., Collins, G. E., Mc Callum, S., 1984. Cylindrical algebraic decomposition I: The basic algorithm. SIAM J. Comput. 13 (4), 865–877.
- 2Aubry et al. [1999] Aubry, P., Lazard, D., Moreno Maza, M., 1999. On the theories of triangular sets. J. Symb. Comput. 28 (1-2), 105–124.
- 3Bates et al. [2018] Bates, D., Brake, D., Niemerg, M., 2018. Paramotopy: Parameter homotopies in parallel. In: Davenport, J., Kauers, M., Labahn, G., Urban, J. (Eds.), Mathematical Software – Proc. ICMS 2018. Vol. 10931 of Lecture Notes in Computer Science. Springer International Publishing, pp. 28–35.
- 4Bates et al. [2013] Bates, D. J., Hauenstein, J. D., Sommese, A. J., Wampler, C. W., 2013. Bertini: Software for numerical algebraic geometry. doi:10.7274/R 0H 41PB 5.
- 5Bhalla and Iyengar [1999] Bhalla, U. S., Iyengar, R., 1999. Emergent properties of networks of biological signaling pathways. Science 283 (5400), 381–387.
- 6Bradford et al. [2017] Bradford, R., Davenport, J., England, M., Errami, H., Gerdt, V., Grigoriev, D., Hoyt, C., Kosta, M., Radulescu, O., Sturm, T., Weber, A., 2017. A case study on the parametric occurrence of multiple steady states. In: Proc. ISSAC ’17. ACM, pp. 45–52.
- 7Bradford et al. [2016] Bradford, R., Davenport, J., England, M., Mc Callum, S., Wilson, D., 2016. Truth table invariant cylindrical algebraic decomposition. J. Symb. Comput. 76, 1–35.
- 8Bradford et al. [2013] Bradford, R., Davenport, J., England, M., Wilson, D., 2013. Optimising problem formulations for cylindrical algebraic decomposition. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (Eds.), Intelligent Computer Mathematics. Vol. 7961 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 19–34.
