The interdependent network of gene regulation and metabolism is robust   where it needs to be

David F. Klosik; Anne Grimbs; Stefan Bornholdt; Marc-Thorsten H\"utt

arXiv:1701.09002·q-bio.MN·November 1, 2017

The interdependent network of gene regulation and metabolism is robust where it needs to be

David F. Klosik, Anne Grimbs, Stefan Bornholdt, Marc-Thorsten H\"utt

PDF

TL;DR

This study applies the theory of interdependent networks to the gene regulation and metabolism networks in E. coli, revealing their differential robustness and sensitivity to various perturbations, and demonstrating the utility of this approach in systems biology.

Contribution

It is the first quantitative application of interdependent network theory to biological systems, elucidating the robustness and sensitivity of gene-metabolism networks.

Findings

01

Network is sensitive to gene regulatory perturbations.

02

Network is robust against metabolic changes.

03

Interdependent network theory aids in analyzing biological robustness.

Abstract

The major biochemical networks of the living cell, the network of interacting genes and the network of biochemical reactions, are highly interdependent, however, they have been studied mostly as separate systems so far. In the last years an appropriate theoretical framework for studying interdependent networks has been developed in the context of statistical physics. Here we study the interdependent network of gene regulation and metabolism of the model organism Escherichia coli using the theoretical framework of interdependent networks. In particular we aim at understanding how the biological system can consolidate the conflicting tasks of reacting rapidly to (internal and external) perturbations, while being robust to minor environmental fluctuations, at the same time. For this purpose we study the network response to localized perturbations and find that the interdependent network is…

Tables4

Table 1. TABLE S1: Overview of the four custom-built randomization schemes in order of their strictness. In particular, the degree of freedeom is denoted by the possible pairs of edges available for the randomization and by the conserved quantities of the graph: source domain (SD), target domain (TD), logical category of an edge (LCE), biological category of a vertex (BCV), and biological category of an edge (BCE).

Scheme	Possible pairs	Conserved quantities
DOMAIN	$134, 942, 137$	(SD, TD)
DOMAIN_LCE	$59, 075, 210$	(SD, TD), LCE
DOMAIN_BCV	$54, 592, 007$	(SD, TD), BCV
DOMAIN_BCE	$42, 774, 454$	(SD, TD), BCE

Table 2. TABLE S3: Biological categories of a vertex (BVC) of the integrative E. coli model and the logical categories of an edge (LCEs) a target vertex of this category may contribute to: ’conjunct’ ( C 𝐶 C ), ’disjunct’ ( D 𝐷 D ) and ’regulation’ ( R 𝑅 R ). The detailed vertex composition will be given in Grimbs et al. Sgrimbs_integrated_2016 .

Vertex	BCV	LCEs
	gene ( $g n$ )	$R$
	protein monomer ( $p m$ )	$D$ , $R$
	protein-protein-complex ( $p p c$ )	$C$ , $D$ , $R$
	protein-compound-complex ( $p c c$ )	$C$ , $D$
	protein-rna-complex ( $p r c$ )	–
	reaction ( $r x n$ )	$C$ , $D$ , $R$
	compound ( $c m p$ )	$C$ , $D$

Table 3. TABLE S4: Domain-related categories of a vertex (DVCs) of the integrative E. coli model, the corresponding biological categories of vertices (BVCs): gene, protein monomer, protein-protein-complex, protein-compound-complex, protein-rna-complex, reaction, compound, as well as the biological categories of an edge (BCEs) a vertex of this category is involved.

DCV	BCVs	BCEs
gene ( $g$ )		$g \to p$
protein ( $p$ )	, ,	$g \to p$ , $p \to x$
complex ( $x$ )	,	$p \to x$ , $c \to x$
enzyme ( $z$ )	,	$z \to r$
reaction ( $r$ )		$z \to r$ , $e \to r$ , $r \to d$
compound ( $c$ )		$c \to x$
educt ( $e$ )	, , ,	$e \to r$
product ( $d$ )	, , ,	$r \to d$

Table 4. TABLE S5: Biological categories of an edge (BCEs) and the corresponding logical categories of an edge (LCEs): ’conjunct’ ( C 𝐶 C ), ’disjunct’ ( D 𝐷 D ) and ’regulation’ ( R 𝑅 R ), for each vertex linkage of the integrative E. coli model. The different linkages are denoted by the combinations of BCVs: gene, protein monomer, protein-protein-complex, protein-compound-complex, protein-rna-complex, reaction, compound. The detailed edge composition will be given in Grimbs et al. Sgrimbs_integrated_2016 .

BCEs	LCEs	Vertex linkages
gene $\to$ protein ( $g \to p$ )	$D$
protein $\to$ complex ( $p \to x$ )	$C$
compound $\to$ complex ( $c \to x$ )	$C$
enzyme $\to$ reaction ( $z \to r$ )	$D$
educt $\to$ reaction ( $e \to r$ )	$C$
reaction $\to$ product ( $r \to d$ )	$D$
transport ( $t$ )	$C$
regulation ( $r^{*}$ )	$R$

Equations26

e_{i}^{LCE} \in {C, D, R}

e_{i}^{LCE} \in {C, D, R}

σ_{i} (t + 1) = f_{i} (σ_{j} (t) ∣ σ_{j} \in Γ_{i}^{-})

σ_{i} (t + 1) = f_{i} (σ_{j} (t) ∣ σ_{j} \in Γ_{i}^{-})

\displaystyle f_{i}=\begin{cases}1&\mbox{if }\parbox[t]{108.405pt}{$\left[\sum_{j}c_{ij}(1-\sigma_{j})=0\right]\wedge\;\left[\sum_{j}d_{ij}\sigma_{j}>0\vee\sum_{j}d_{ij}=0\right]\;\wedge\;\left[\sum_{j}|r_{ij}|(1-\sigma_{j})=0\right]$}\\ 0&\text{otherwise}\end{cases}

\tilde{χ} (X, q) = ⟨ B_{\infty}^{2} ⟩ - ⟨ B_{\infty} ⟩^{2}

\tilde{χ} (X, q) = ⟨ B_{\infty}^{2} ⟩ - ⟨ B_{\infty} ⟩^{2}

X

X

σ_{i} \in {0, 1}, i = {1, \dots, ∣ G ∣}

σ_{i} \in {0, 1}, i = {1, \dots, ∣ G ∣}

∣ V_{t} (X, q) ∣

∣ V_{t} (X, q) ∣

A (X, q)

A (X, q)

B (X, q)

v_{i}^{BCV} \in {r x n, c m p, g n, p m, pp c, p cc, p r c} .

v_{i}^{BCV} \in {r x n, c m p, g n, p m, pp c, p cc, p r c} .

v_{i}^{DCV} \in {g, p, x, z, r, c, e, d} .

v_{i}^{DCV} \in {g, p, x, z, r, c, e, d} .

e_{i} = (v_{s}^{(i)}, v_{t}^{(i)}), v_{j}^{(i)} \in {G_{R}, G_{I}, G_{M}},

e_{i} = (v_{s}^{(i)}, v_{t}^{(i)}), v_{j}^{(i)} \in {G_{R}, G_{I}, G_{M}},

e_{i}^{LCE} \in {C, D, R} .

e_{i}^{LCE} \in {C, D, R} .

e_{i}^{BCE} \in {g \to p, p \to x, c \to x, z \to r, e \to r, r \to d, t, r^{*}} .

e_{i}^{BCE} \in {g \to p, p \to x, c \to x, z \to r, e \to r, r \to d, t, r^{*}} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

The interdependent network of gene regulation and metabolism

is robust where it needs to be

David F. Klosik

[email protected]

Institute for Theoretical Physics, University of Bremen

Anne Grimbs

[email protected]

Department of Life Sciences and Chemistry, Jacobs University Bremen

Stefan Bornholdt

[email protected]

Institute for Theoretical Physics, University of Bremen

Marc-Thorsten Hütt

[email protected]

Department of Life Sciences and Chemistry, Jacobs University Bremen

Abstract

The major biochemical networks of the living cell, the network of interacting genes and the network of biochemical reactions, are highly interdependent, however, they have been studied mostly as separate systems so far. In the last years an appropriate theoretical framework for studying interdependent networks has been developed in the context of statistical physics.

Here we study the interdependent network of gene regulation and metabolism of the model organism Escherichia coli using the theoretical framework of interdependent networks.

In particular we aim at understanding how the biological system can consolidate the conflicting tasks of reacting rapidly to (internal and external) perturbations, while being robust to minor environmental fluctuations, at the same time. For this purpose we study the network response to localized perturbations and find that the interdependent network is sensitive to gene regulatory and protein-level perturbations, yet robust against metabolic changes.

This first quantitative application of the theory of interdependent networks to systems biology shows how studying network responses to localized perturbations can serve as a useful strategy for analyzing a wide range of other interdependent networks.

A main conceptual approach of current research in the life sciences is to advance from a detailed analysis of individual molecular components and processes towards a description of biological systems and to understand the emergence of biological function from the interdependencies on the molecular level.

Supported by the diverse high-throughput ’omics’ technologies, the relatively recent discipline of systems biology has been the major driving force behind this new perspective which becomes manifest, for example, in the effort to compile extensive databases of biological information to be used in genome-scale models kitano_systems_2002 ; ideker_new_2001 ; aderem_systems_2005 .

Despite its holistic ’game plan’, however, systems biology frequently operates on the level of subsystems: Even when considering cell-wide transcriptional regulatory networks, as, e.g., in a network motif analysis milo_network_2002 , this is only one of the cell’s networks. Likewise, the popular approach to studying metabolic networks in systems biology, constraint-based modeling, accounts for steady-state predictions of metabolic fluxes of genome-scale metabolic networks llaneras_stoichiometric_2008 , which again, is only one of the other networks of the cell.

In the analysis of such large networks, systems biology draws its tools considerably from the science of complex networks which provides a mathematical framework especially suitable for addressing interdisciplinary questions. Combining the mathematical subdiscipline of graph theory with methods from statistical physics, this new field greatly contributed to the understanding of, e.g., the percolation properties of networks cohen_percolation_2002 , potential processes of network formation dorogovtsev_evolution_2002 or the spreading of disease on networks newman_spread_2002 . In the early 2000s, gene regulation and metabolism have been among the first applications of the formalisms of ’network biology’ Barabasi:2004p17908 . Among the diverse studies of network structure for these systems, the most prominent ones on the gene regulatory side are the statistical observation and functional interpretation of small over-represented subgraphs (’network motifs’) Shen-Orr:Nat-Genet2002 ; Alon:2007p17924 and the hierarchical organization of gene regulatory networks yu2006genomic . On the metabolic side, the broad degree distribution of metabolic networks stands out Jeong:2000:Nature:11034217 , with the caveat, however, that ’currency metabolites’ (like ATP or H2O) can severely affect network properties ma2003reconstruction , as well as the hierarchical modular organization of metabolic networks Ravasz:2002p351 ; Guimera:2005p15058 .

Over the last decade, the field of complex networks moved its focus from the investigation of single-network representations of systems to the interplay of networks that interact with and/or depend on each other. Strikingly, it turned out that explicit interdependence between network constituents can fundamentally alter the percolation properties of the resulting interdependent networks, which can show a discontinuous percolation transition in contrast to the continuous behavior in single-network percolation parshani_critical_2011 ; parshani_interdependent_2010 ; son_percolation_2012 ; radicchi_abrupt_2013 ; zhou_simultaneous_2014 ; radicchi_percolation_2015 . It has also been found that, contrary to the isolated-network case, networks with broader degree distribution become remarkably fragile as interdependent networks buldyrev_catastrophic_2010 .

However, this set of recent developments in network science still lacks application to systems biology.

Arguably, the most prominent representative of interdependent networks in a biological cell is the combined system of gene regulation and metabolism which are interconnected by various forms of protein interactions, e.g., enzyme catalysis of biochemical reactions couples the regulatory to metabolic network, while the activation or deactivation of transcription factors by certain metabolic compounds provides a coupling in the opposite direction.

Although it is well-known that gene regulatory and metabolic processes are highly dependent on one another only few studies addressed the interplay of gene regulation and metabolism on a larger scale and from a systemic perspective covert_integrating_2004 ; shlomi_genome-scale_2007 ; samal_regulatory_2008 . The first two studies have aimed at finding consistent metabolic-regulatory steady states by translating the influence of metabolic processes on gene activity into metabolic flux predicates and incorporating high-throughput gene expression data. This can be considered as an extension of the constraints-based modeling framework beyond the metabolic network subsystem. In the paper of Samal and Jain samal_regulatory_2008 , on the other hand, the transcriptional regulatory network of Escherichia coli (E. coli) metabolism has been studied as a Boolean network model into which flux predicates can be included as additional interactions. These models were first important steps towards integrating the subsystems of gene regulation and metabolism from a systems perspective.

The formalism of interdependent networks now allows us to go beyond these pioneering works on integrative models, by analyzing the robustness of the combined system in terms of the maximal effect a small perturbation can have on such interdependent systems. In particular, the findings can be interpreted in the context of cascading failures and percolation theory.

We here undertake a first application of the new methodological perspective to the combined networks of gene regulation and metabolism in E. coli.

Using various biological databases, particularly EcoCyc as the main core keseler_ecocyc:_2005 ; keseler_ecocyc:_2013 , we have compiled a graph representation of gene regulatory and metabolic processes of E. coli including a high level of detail in the structural description, distinguishing between a comparatively large number of node and link types according to their biological functionality.

A structural analysis of this compilation reveals that, in addition to a small set of direct links, the gene-regulatory and the metabolic domains are predominantly coupled via a third network domain consisting of proteins and their interactions. Figure 1 shows this three-domain functional division. Details about the data compilation, the network reconstruction and the domain-level analysis are given in Grimbs et al. grimbs_integrated_2016 .

This rich structural description, together with purpose-built, biologically plausible propagation rules allows us to assess the functional level with methods derived from percolation theory. More precisely, we will investigate cascading failures in the three-domain system, emanating from small perturbations, localized in one of the domains. By network response to localized perturbations analysis we will observe below that (i) randomized versions of the graph are much less robust than the original graph and (ii) that the integrated system is much more susceptible to small perturbations in the gene regulatory domain than in the metabolic one.

I The System

The core object of our investigation is an E. coli network representation of its combined gene regulation and metabolism, which can be thought of as functionally divided into three domains: the representation captures both gene regulatory and metabolic processes, with these processes being connected by an intermediate layer that models both, the enzymatic influence of genes on metabolic processes, as well as signaling-effects of the metabolism on the activation or inhibition of the expression of certain genes. The underlying interaction graph $G=(V,E)=\{G_{R},G_{I},G_{M}\}$ with its set of nodes (vertices) $V$ and links (edges) $E$ consists of three interconnected subgraphs, the gene regulatory domain $G_{R}$ , the interface domain $G_{I}$ and the metabolic domain $G_{M}$ . From the functional perspective, $G$ is the union of gene regulatory ( $G_{R}$ ) and metabolic processes ( $G_{M}$ ), and their interactions and preparatory steps form the interface ( $G_{I}$ ). Figure 1 shows a sketch of the network model.

The integrated network representation has been assembled based on the EcoCyc database (version 18.5; keseler_ecocyc:_2005 ; keseler_ecocyc:_2013 ) which offers both data about metabolic processes and (gene) regulatory events incorporated from the corresponding RegulonDB release 8.7 salgado_regulondb_2013 . The extensive metadata allows for the assignment of the vertices to one of the three functional domains. Details of this process and a detailed characterization of the resulting model will be described elsewhere grimbs_integrated_2016 . The corresponding graph representation consists of $10,383$ vertices and $24,150$ directed edges.

Since we are interested in the propagation of a signal between the domains, in the following we will refer to the domains of the source and target vertices of the edge $e_{i}=(v_{s}^{(i)},v_{t}^{(i)})$ as source domain, SD, and target domain, TD, respectively. The metadata can be used to assign properties to the nodes and edges of the graph beyond the domain structure, some of which are used in the following analysis, namely in the construction of the propagation rules of the system and of the randomization schemes.

We distinguish between biological categories of edges (capturing the diverse biological roles of the edges) and the logical categories (determining the rules of the percolation process). According to their biological role in the system, both vertices and edges are assigned to a biological category; we abbreviate the biological category of a vertex as BCV and the biological category of an edge as BCE (for details see Supplementary Materials). Each of the eight BCEs can then be mapped uniquely to one of only three logical categories of an edge, LCE,

[TABLE]

which are of central importance for the spreading dynamics in our system:

C, ’conjunct’

The target vertex of an edge with this logical AND property depends on the source node, i.e., it will fail once the source node fails. For example, for a reaction to take place, all of its educts have to be available.

D, ’disjunct’

Edges with this logical OR property are considered redundant in the sense that a vertex only fails if the source vertices of all of its incoming D-edges fail. For instance, a compound will only become unavailable once all of its producing reactions have been canceled.

R, ’regulation’

Edges of this category cover $14$ different kinds of regulatory events (described in detail in the Supplementary Materials). As shown below, in terms of the propagation dynamics we treat these edges similar to the ’conjunct’ ones.

II Perturbation Rules

Next we describe the dynamical rules for the propagation of an initial perturbation in the network in terms of the logical categories of an edge (LCE), which distinguish between the different roles a given edge has in the update of a target vertex.

Every vertex is assigned a Boolean state variable $\sigma\in\{0,1\}$ ; since we intend to mimic the propagation of a perturbation (rather than simulate a trajectory of actual biological states) we identify the state $1$ with not yet affected by the perturbation while the state [math] corresponds to affected by the perturbation. We stress that the trajectory $\vec{\sigma}(t)$ does not correspond to the time evolution of the abundance of gene products and metabolic compounds, but the rules have been chosen such that the final set of affected nodes provides an estimate of all nodes potentially being affected by the initial perturbation. A node not in this set is topologically very unlikely of being affected by the perturbation at hand (given the biological processes contained in our model).

A stepwise update can now be defined for vertex $i$ with in-neighbours $\Gamma^{-}_{i}$ in order to study the spreading of perturbations through the system by initially switching off a fraction $q$ of vertices:

[TABLE]

where $c_{ij}$ is $1$ if $v_{j}$ is connected to $v_{i}$ via a C-link and [math] otherwise; $d_{ij}$ and $r_{ij}$ are defined analogously.

Thus, a vertex will be considered unaffected by the perturbation if none of its in-neighbours connected via either a $C$ or an $R$ edge have failed (regardless of the sign of the regulatory interaction), and at least one of its in-neighbours connected via a $D$ edge is still intact. With an additional rule it is ensured that an initially switched off vertex stays off. The choice of the update rules ensures that the unperturbed system state is conserved under the dynamics: $\vec{f}(\vec{1})=\vec{1}$ .

As a side remark, the spreading of a perturbation according to the rules defined above could also be considered as an epidemic process with one set of connections with a very large, and a second set of connections with a very low probability of infection watts_simple_2002 .

III Elements of Percolation Theory

In systems which can be described without explicit dependencies between its constituents but with a notion of functionality that coincides with connectivity, percolation theory is a method of first choice to investigate the system’s response to average perturbations of a given size that can be modelled as failing vertices or edges callaway_network_2000 ; cohen_percolation_2002 The fractional size of the giant connected component as a function of the occupation probability $p$ of a constituent typically vanishes at some critical value $p_{c}$ , the percolation threshold. In the following, we will mostly use the complementary quantity $q=1-p$ so that $q_{c}=1-p_{c}$ can be interpreted as the critical size of the initial attack or perturbation of the system. The strong fluctuations of the system’s responses in the vicinity of this point can serve as a proxy for the percolation threshold, which is especially useful in finite systems in which the transition appears smoothed out. In our analysis, the susceptibility $\tilde{\chi}=\langle S^{2}\rangle-\langle S\rangle^{2}$ , where $S$ is the size of the largest cluster, is used PhysRevE.93.030302 .

Upon the introduction of explicit dependencies between the system’s constituents, the percolation properties can change dramatically. The order parameter no longer vanishes continuously but typically jumps at $p_{c}$ in a discontinuous transition parshani_interdependent_2010 ; parshani_critical_2011 as cascades of failures fragment the system. A broader degree distribution now enhances a graph’s vulnerability to random failures, in opposition to the behavior in isolated graphs buldyrev_catastrophic_2010 ; albert_error_2000 . Details of the corresponding theoretical framework have been worked out by Parshani et al. parshani_interdependent_2010 , Son et al. son_percolation_2012 , Baxter et al. baxter_avalanche_2012 and more recently the notion of ’networks of networks’ has been included gao_networks_2011 ; gao_single_2014 ; kenett_networks_2015 . There have also been attempts to integrate this class of models into the framework of multilayer networks kivela_multilayer_2014 .

In addition to random node failure other procedures for initial node removal have been explored, e.g., node removal with respect to their degree (targeted attacks) huang_robustness_2011 .

Currently, two notions of localized attacks have been described. Attacks of the first sort are defined on spatially embedded networks and are ’local’ with respect to a distance in this embedding, i.e. in a ’geographical’ sense berezin_localized_2015 . The second approach considers locality in terms of connectivity: around a randomly chosen seed, neighbours are removed layer by layer yuan_how_2015 ; shao_percolation_2015 . In contrast, as described below in our approach, attacks are localized with respect to the three network domains, while within the domains nodes are chosen randomly.

At this point we would like to shortly comment on the applicability of the mathematical concepts of interdependent networks to real-world data. Aiming at analytical tractability, typical model systems need to choose a rather high level of abstraction. While certainly many systems can be accurately addressed in that way, we argue that especially in the case of biological systems the theoretical concepts can require substantial adjustment to cover essential properties of the system at hand.

When asking for the systemic consequences of interdependency, the distinction between several classes of nodes and links may be required. Effectively, some classes of links may then represent simple connectivity, while others can rather be seen as dependence links. In Biology, such dependencies are typically mediated by specific molecules (e.g., a small metabolite affecting a transcription factor, or a gene encoding an enzyme catalyzing a biochemical reaction). Such implementations of dependence links are no longer just associations and it is hard to formally distinguish them from the functional links.

In contrast to the explicitly alternating ’percolation’ and ’dependency’ steps in typical computational models in which the decoupling of nodes from the largest component yields dependent nodes to fail, in our directed model both, connectivity and dependency links are evaluated in every time step and (apart from nodes failing due to dependency) only fully decoupled vertices cause further dependency failures.

IV Network Response to Localized Perturbation Analysis

Due to the functional three-domain partition of our E. coli gene regulatory and metabolic network reconstruction, we have the possibility to classify perturbations not only according to their size, but also with respect to their localization in one of the domains comprising the full interdependent system, thereby enabling us to address the balance of sensitivity and robustness of the interdependent network of gene regulation and metabolism.

Here we introduce the concept of network response to localized perturbations analysis. This analysis will reveal that perturbations in gene regulation affect the system in a dramatically different way than perturbations in metabolism. Thus we study the response to localized perturbations. We denote such perturbations by $\mbox{Per}(X,q)$ , where $X$ is the domain, in which the perturbation is localized ( $X\in\{R,I,M,T\}$ , with $T$ representing the total network $G$ , i.e., the case of non-localized perturbations). The perturbation size $q=1-p$ is measured in fractions of the total network size $N=|G|$ . A perturbation $\mbox{Per}(M,0.1)$ thus is a perturbation in the metabolic domain with (on average) $0.1|G|$ nodes initially affected. Note that sizes $q$ of such localized perturbations are limited by the domain sizes, e.g., $q|G|<|G_{R}|$ for a perturbation in the gene regulatory domain.

After the initial removal of a fraction $q$ of the vertices from the domain $X$ the stepwise dynamics described above will lead to the deactivation of further nodes and run into a frozen state $\vec{\sigma}_{\infty}$ in which only a fraction $A(X,q)$ of the vertices are unaffected by the perturbation (i.e., are still in state $1$ ). In addition to being directly affected by failing neighbors, in the process of network fragmentation nodes may also become parts of small components disconnected from the network’s core, and could in this sense be considered non-functional; we therefore also monitor the relative sizes of the largest (weakly) connected component in the frozen state, $B(X,q)$ , for different initial perturbation sizes.

In the limit of infinite system size we could expect a direct investigation of $B(X,q)$ as a function of $q$ to yield the critical perturbation size $q_{c}=1-p_{c}$ at which $B$ vanishes. In our finite system, however, we have to estimate $q_{c}$ ; following Radicchi and Castellano PhysRevE.93.030302 and Radicchi radicchi_predicting_2015 we measure the fluctuation of $B(X,q)$ which serves as our order parameter and look for the peak position of the susceptibility

[TABLE]

as a function of parameter $q$ in order to estimate the transition point from the finite system data. Supplementary Figure S1 schematically illustrates the analysis.

V Randomization Schemes

In order to interpret the actual responses of a given network to perturbations, one usually contrasts them to those of suitably randomized versions of the network at hand. Thereby, the often dominant effect of the node degree distribution of a network can be accounted for and the effects of higher-order topological features that shape the response of the network to perturbations can be studied systematically.

The same is true for the localized perturbation response analysis introduced here. In fact, due to the substantially larger number of links from gene regulation to metabolism (both, directly and via the interface component of the interdependent network) than from metabolism to gene regulation we can already expect the response to such localized perturbations to vary.

Here we employ a sequence of ever more stringent randomization schemes to generate sets of randomized networks serving as null models for the localized perturbation response analysis. In all of the four schemes the edge-switching procedure introduced by Maslov and Sneppen Maslov910 is employed which conserves the in- and out-degrees of all vertices.

Our most flexible randomization scheme (DOMAIN) only considers the domains of the source and target vertices of an edge (SD and TD): only pairs of edges are flipped which share both, the source and the target domain (e.g., both link a vertex in the metabolic domain to a vertex in the interface). The remaining three randomization schemes all add an additional constraint. The DOMAIN_LCE randomization further requires the edges to be of the same logical categories of an edge (i.e., C, D, or R), while the DOMAIN_BCV scheme only switches edges whose target vertices also share the same biological category of a vertex, BCV. The strictest randomization, DOMAIN_BCE, finally, only considers edges with, additionally, the same biological category of an edge, BCE. A tabular overview of the four schemes is given in Supplementary Table S1.

VI Results

The main feature of our reconstructed network, the three-domain structure based on the biological role of its constituents, allows us to study the influence of localizing the initial perturbation. Thus, although we will not focus on (topological) details of the graph here (which will be presented elsewhere grimbs_integrated_2016 ), already from the vertex and edge counts in Figure 1 we see that the domains are of different structure. While the regulatory and the metabolic subgraphs, $G_{R}$ and $G_{M}$ have average (internal) degrees of about $1.9$ and $2.5$ , the interface subgraph, $G_{I}$ , is very sparse with $\langle k\rangle\approx 0.6$ and we can expect it to be fragmented. Hence, in the following we decide to only perturb in $G_{R}$ and $G_{M}$ .

In a first step we sample some full cascade trajectories in order to check our expectation of different responses of the system to small perturbations applied in either $G_{R}$ or $G_{M}$ ; two rather large values of $q$ are chosen and the raw number of unaffected nodes is logged during the cascade. Indeed, already this first approach implies a different robustness of the gene regulatory and the metabolic domains in terms of the transmission of perturbation cascades to the other domains. Cascades seeded in the metabolic domain of the network tend to be rather restricted to this domain, while the system seems much more susceptible to small perturbations applied in the gene regulatory domain. This effect can be seen both in the overall sizes of the aggregated cascades as well as in the domain which shows the largest change with respect to the previous time step, which we indicate by black markers in Figure 2. More sample trajectories are shown in the Supplementary Materials and, although they illustrate occasionally large fluctuations between the behaviors of single trajectories, they are consistent with this first observation. They also show that considerably larger metabolic perturbations are needed for large cascades and back-and-forth propagation between domains to emerge.

After this first glance at the system we aim for a more systematic approach and apply our analysis as described above: we compute cascade steady-states $\vec{\sigma}_{\infty}$ but now we choose the largest (weakly) connected component $B(X,q)$ as the order parameter and compute the susceptibility according to equation $(\ref{eq:susceptibility})$ , the peak-position of which, when considered as a function of $q=1-p$ , we use as a proxy for the perturbation size at which the interdependent system breaks down.

The results for different initially perturbed domains illustrate that, indeed, a considerably lower $p_{c}$ (i.e., larger critical perturbation size $q_{c}$ ) is estimated in the case of metabolic perturbations compared to regulatory or non-localized ones (Figure 3, panel a). For each point we average $500$ runs for the corresponding set of parameters.

In order to assess whether the above-described behavior is due to specific properties of the network we use the sets of randomized graphs. For each of the four randomization schemes we prepared $500$ graph instances and repeated the analysis for each of them as done before for the single original graph. The corresponding results for the susceptibility (Figure 3, panels b–e) yield two major observations: firstly, metabolic perturbations still lead to, albeit only slightly, higher $q_{c}=1-p_{c}$ estimates (with exception of DOMAIN randomization). Thus, the system’s tendency to be more robust towards metabolic perturbations is largely preserved. Secondly, we see that overall the original network seems to be much more robust than the randomized networks; very small perturbations are sufficient to break the latter ones. The robustness of the original graph, thus, cannot be solely due to the edge and vertex properties kept in the randomization schemes.

Finally, let us focus on the practical aspect of these findings. Beyond the careful statistical analysis described above, a quantity of practical relevance is the average size of the unaffected part of the system under a perturbation. For this purpose, we examine the fractions of unaffected vertices, $A(X,q)$ , after cascades emanating from perturbations of different sizes and seeded in different domains, regardless of the resulting component structure and for both, the original graph and the shuffled ones (Figure 4).

The number of unaffected vertices for the real network is much larger than for all four randomization schemes, suggesting a strong overall robustness of the biological system. Distinguishing, however, between the metabolic and the gene regulatory components reveals that the metabolic part is substantially more robust than the regulatory part (for not too large initial perturbations, $p>0.94$ ).

VII Discussion and Outlook

We investigated the spreading of perturbations through the three domains of a graph representation of the integrated system of E. coli’s gene regulation and metabolism. Our results quantify the resulting cascading failures as a function of size and localization of the initial perturbation.

Our findings show that the interdependent network of gene regulation and metabolism unites sensitivity and robustness by showing different magnitudes of damage dependent on the site of perturbation.

While the interdependent network of these two domains is in general much more robust than its randomized variants (retaining domain structure, degree sequence, and major biological aspects of the original system), a pronounced difference between the gene regulatory and metabolic domain is found: Small perturbations originating in the gene regulatory domain typically trigger far-reaching system-wide cascades, while small perturbations in the metabolic domain tend to remain more local and trigger much smaller cascades of perturbations.

In order to arrive at a more mechanistic understanding of this statistical observation, we estimated the percolation threshold of the system, $p_{c}$ , and found that it is much lower (i.e, larger perturbations, $q_{c}=1-p_{c}$ , are required) for perturbations seeded in the metabolic domain than for those applied to the gene regulatory domain.

This is in accordance with the intuition that the metabolic system is more directly coupled to the environment (via the uptake and secretion of metabolic compounds) than the gene regulatory domain. The distinct perturbation thresholds therefore allow for implementing a functionally relevant balance between robustness and sensitivity: The biological system can achieve a robustness towards environmental changes, while – via the more sensitive gene regulatory domain – it still reacts flexibly to other systemic perturbations.

Discovering this design principle of the biological system required establishing a novel method of analyzing the robustness of interdependent networks, the network response to localized perturbations: An interdependent network can have markedly different percolation thresholds, when probed with perturbations localized in one network component compared to another.

Lastly, we would like to emphasize that the application of the theoretical concepts of interdependent networks to real-life systems involves several non-trivial decisions:

In the vast majority of (theoretical) investigations, two definitions of interdependent networks coincide: the one derived from a distinction between dependency links and connectivity-representing links and the one based on two functionally distinguishable, but interconnected subnetworks.

Here we have three classes of nodes: those involved in gene regulation, metabolic nodes, and nodes associated with the (protein) interface between these two main domains. These nodes are interconnected with (functionally) different classes of links. These link classes are necessary to define meaningful update rules for perturbations. As a consequence, the notion of dependency links vs. connectivity links is no longer applicable. We expect that such adjustments of the conceptual framework will often be required when applying the notion of interdependent networks to real-life systems.

As mentioned above, one major task when dealing with biological data is to abstract from the minor but keep the essential details; we have outlined that in this study we chose to keep a rather high level of detail.

With only incomplete information available, a challenge is to find the right balance between radical simplifications of systemic descriptions and an appropriate level of detail still allowing for a meaningful evaluation of biological information. Here we incorporate high level of detail in the structural description, distinguishing between a comparatively large number of node and link types. This rich structural description, together with a set of update rules motivated by general biological knowledge, allows us to assess the dynamical/functional level with the comparatively simple methods derived from percolation theory.

An important question is, whether the analysis of the fragmentation of such a network under random removal of nodes can provide a reliable assessment of functional properties, since the response of such a molecular network clearly follows far more intricate dynamical rules than the percolation of perturbations can suggest.

A future step could include the construction of a Boolean network model for the full transcriptional regulatory network and the connection of this model to flux predictions obtained via flux balance analysis, a first attempt of which is given in Samal and Jain samal_regulatory_2008 (where the model of Covert et al. covert_integrating_2004 with still fewer interdependence links has been used).

Our perturbation spreading approach might help bridging the gap between theoretical concepts from statistical physics and biological data integration: Integrating diverse biological information into networks, estimating ’response patterns’ to systemic perturbations and understanding the multiple systemic manifestations of perturbed, pathological states is perceived as the main challenge in systems medicine (see, e.g., Bauer et al. bauer2016interdisciplinary ). Concepts from statistical physics of complex networks may be of enormous importance for this line of research Barabasi:2011iha ; huett2014 .

While the simulation of the full dynamics is still problematic as our knowledge of the networks is still incomplete, our present strategy extracts first dynamical properties of the interdependent networks. At a later time point, we can expect qualitatively advances from full dynamical simulations, however, dependent on the quality of the data sets.

On the theoretical side, future studies might shift the focus onto recasting the system into an appropriate spreading model, e.g., in the form of an unordered binary avalanche samuelsson_exhaustive_2006 ; gleeson_mean_2008 , or as an instance of the Linear Threshold model watts_simple_2002 with a set of links with a very high and a second set with a very low transmission probability (C/R and D-links, respectively).

Radicchi radicchi_percolation_2015 presents an approach for the investigation of the percolation properties of finite size interdependent networks with a specific adjacency matrix with the goal of loosening some of the assumptions underlying the usual models (e.g., infinite system limit, graphs as instances of network model). While this formalism allows for the investigation of many real-world systems there are still restrictions as to the possible level of detail. In our special case, for instance, a considerable amount of information would be lost if the system was restricted to vertices with connections in both the C/R- and D-layers.

The existence of different percolation thresholds for localized perturbations in interdependent networks may reveal itself as a universal principle for balancing sensitivity and robustness in complex systems. The application of these concepts to a wide range of real-life systems is required to make progress in this direction.

Acknowledgements.

S.B. and M.H. acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), grants BO 1242/6 and HU 937/9.

Author Contributions

M.H. and S.B. designed and supervised the study; D.K. and A.G. performed the reconstruction, simulations and analyses; and D.K., A.G., S.B. and M.H. wrote the manuscript.

Competing Financial Interests

The authors declare no competing financial interests.

I Description and Schematic Overview of the analysis

The network response to localized perturbations analysis presented here (see Figure S1), is a multi-step method entailing several runs of the perturbation algorithm, $\mbox{Per}(X,q)$ , and the statistical evaluation of these runs. More precisely, for each analysis 500 runs of the perturbation algorithm have been performed and the set of the remaining vertices have been evalueted, e.g., in terms of the susceptibility, $\chi\left(X,q\right)$ (part a in Figure S1).

For a single run of the perturbation algorithm, $\mbox{Per}(X,q)$ , a domain $X$ has to be chosen where a perturbation of size $q=1-p$ (measured as a fraction of vertices) will be applied,

[TABLE]

Based on $X$ and $q$ the set of initially perturbed vertices, $V_{0}\left(X,q\right)$ , is randomly selected. The state of the system can also be described as a vector of Boolean state variables, $\vec{\sigma}$ ,

[TABLE]

where 0 denotes a perturbed vertex and 1 an unaffected one.

Running the perturbation dynamics described in the main manuscript will (probably) cause the failure of further vertices resulting in a time series of affected vertices, $\{V_{t}(X,q)\}_{t}$ (or, equivalently, to the trajectory $\sigma(t,\,X,q)$ ). The size of the affected network after $t$ propagation steps can be described as

[TABLE]

From the set of affected vertices in the in the asymptotic regime, $V_{\infty}\left(X,q\right)$ , the size of the unaffected network, $A\left(X,q\right)$ , and the size of the largest (weakly) connected component ( $LCC$ ) of the unaffected network, $B\left(X,q\right)$ are computed (Figure S1, part b),

[TABLE]

Randomized networks (we used sets of $500$ instances for each of the randomization schemes) can be passed to the algorithm instead (Figure S1, part c).

II Notation of vertex and edge categories

The following notation concerning vertices $v_{i}\in V$ and edges $e_{i}\in E$ and their properties have been used. In our integrated network model a vertex is characterized by its biological function yielding seven unique biological categories of a vertex (BCVs), ’reaction’ ( $rxn$ ), ’compound’ ( $cmp$ ), ’gene’ ( $gn$ ), ’protein monomer’ ( $pm$ ), ’protein-protein complex’ ( $ppc$ ), ’protein-compound complex’ ( $pcc$ ), and ’protein-rna complex’ ( $prc$ ; see Table S3),

[TABLE]

Introducing an additional vertex classification facilitates the assignment to one of the three functional domains as well as the edge characterization. The domain-related categories of a vertex (DCVs) are eight-fold: ’gene’ ( $g$ ), ’protein’ ( $p$ ), ’complex’ ( $x$ ), ’enzyme’ ( $z$ ), ’reaction’ ( $r$ ), ’compound’ ( $c$ ), ’educt’ ( $e$ ), ’product’ ( $d$ ),

[TABLE]

While the categories $g$ and $r$ are a one-to-one translations of the corresponding BCVs, i.e., $gn$ and $rxn$ , the domain-related category $c$ only comprises vertices of BCV $cmp$ but the inverse does not hold. The remaining five categories are ambiguous assignments for both BCVs to DCVs and vice versa. The complete mapping of BCVs onto DCVs is given in Table S4.

An edge is characterized by its source and target vertices, $v_{s}^{(i)}$ and $v_{t}^{(i)}$ , and their corresponding domains (’source domain’, SD and ’target domain’, TD), as well as the edge’s logical category and its biological category; an edge is given by

[TABLE]

thus determining SD and TD. The logical category of an edge (LCE) determines, qualitatively speaking, whether a perturbation will propagate along this edge via a logical AND or a logical OR; the three categories are ’conjunct’ ( $C$ ), ’disjunct’ ( $D$ ) and ’regulation’ ( $R$ ),

[TABLE]

As an illustration of the potential linkages, two case examples are presented in Figure S2.

The biological categories of an edge (BCEs) are derived from combinations of the domain-related categories of a vertex (DCVs), plus ’transport’ ( $t$ ) and regulation ( $r^{*}$ ),

[TABLE]

The mapping of biological categories of edges onto logical categories of edges is given in Table S5.

III Tables

IV Sample perturbation trajectories

Here, further sample trajectories are given similar to the ones in Fig. 2 in the main manuscript.

References

(1)

Grimbs, A., Klosik, D. F., Bornholdt, S. & Hütt, M.-T.

Integrative system-wide modeling of metabolic and regulatory processes in Escherichia coli (2016).

Unpublished.

Bibliography51

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1) Kitano, H. Systems Biology: A Brief Overview. Science 295 , 1662–1664 (2002).
2(2) Ideker, T., Galitski, T. & Hood, L. A new approach to decoding life: Systems Biology. Annu. Rev. Genomics Hum. Genet. 2 , 343–372 (2001).
3(3) Aderem, A. Systems Biology: Its Practice and Challenges. Cell 121 , 511–513 (2005).
4(4) Milo, R. et al. Network Motifs: Simple Building Blocks of Complex Networks. Science 298 , 824–827 (2002).
5(5) Llaneras, F. & Picó, J. Stoichiometric modelling of cell metabolism. J. Biosci. Bioeng. 105 , 1–11 (2008).
6(6) Cohen, R., ben Avraham, D. & Havlin, S. Percolation critical exponents in scale-free networks. Phys. Rev. E 66 , 036113 (2002).
7(7) Dorogovtsev, S. N. & Mendes, J. F. F. Evolution of networks. Adv Phys 51 , 1079–1187 (2002).
8(8) Newman, M. E. J. Spread of epidemic disease on networks. Phys. Rev. E 66 , 016128 (2002).