Non-Markovian recovery makes complex networks more resilient against large-scale failures
Zhao-Hua Lin, Mi Feng, Ming Tang, Zonghua Liu, Chen Xu, Pak Ming Hui, and Ying-Cheng Lai

TL;DR
This paper investigates how non-Markovian recovery processes with memory influence failure propagation in complex networks, revealing that such memory effects can enhance network resilience against large-scale failures.
Contribution
It introduces a pair approximation analysis considering non-Markovian recovery with delay, demonstrating that memory can unexpectedly improve network robustness.
Findings
Non-Markovian recovery can reduce large-scale failure likelihood.
Memory effects enhance network resilience in both natural and engineered systems.
The analysis provides insights for designing more robust networks.
Abstract
Non-Markovian spontaneous recovery processes with a time delay (memory) are ubiquitous in the real world. How does the non-Markovian characteristic affect failure propagation in complex networks? We consider failures due to internal causes at the nodal level and external failures due to an adverse environment, and develop a pair approximation analysis taking into account the two-node correlation. In general, a high failure stationary state can arise, corresponding to large-scale failures that can significantly compromise the functioning of the network. We uncover a striking phenomenon: memory associated with nodal recovery can counter-intuitively make the network more resilient against large-scale failures. In natural systems, the intrinsic non-Markovian characteristic of nodal recovery may thus be one reason for their resilience. In engineering design, incorporating certain…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Non-Markovian recovery makes complex networks more resilient against large-scale failures
Zhao-Hua Lin
State Key Laboratory of Precision Spectroscopy and School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
Mi Feng
Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China
Ming Tang
State Key Laboratory of Precision Spectroscopy and School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China
Zonghua Liu
State Key Laboratory of Precision Spectroscopy and School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
Chen Xu
School of Physical Science and Technology, Soochow University, Suzhou 215006, China
Pak Ming Hui
Department of Physics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Ying-Cheng Lai
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
Abstract
Abstract
Non-Markovian spontaneous recovery processes with a time delay (memory) are ubiquitous in the real world. How does the non-Markovian characteristic affect failure propagation in complex networks? We consider failures due to internal causes at the nodal level and external failures due to an adverse environment, and develop a pair approximation analysis taking into account the two-node correlation. In general, a high failure stationary state can arise, corresponding to large-scale failures that can significantly compromise the functioning of the network. We uncover a striking phenomenon: memory associated with nodal recovery can counter-intuitively make the network more resilient against large-scale failures. In natural systems, the intrinsic non-Markovian characteristic of nodal recovery may thus be one reason for their resilience. In engineering design, incorporating certain non-Markovian features into the network may be beneficial to equipping it with a strong resilient capability to resist catastrophic failures.
Introduction
The dynamics of failure propagation on complex networks constitute an active area of research in network science and engineering with significant and broad applications. This is because the functioning of a modern society relies on the cooperative working of many networked systems such as the electrical power grids, various transportation networks, computer and communication networks, and business networks, but these networks typically possess a complex structure and are vulnerable to failures and intentional attacks. Among the diverse failure scenarios, one of the most severe types is cascading failures ML:2002 , where the failure of some nodes would cause their neighbors to fail and the process would propagate to the entire network, disabling a large fraction of the nodes and causing malfunctioning of the system at a large scale ZPL:2004 ; ZPLY:2005 ; GC:2007 ; Bialek:2007 ; dobson2007complex ; Gleeson:2008 ; rosato2008modelling ; HLC:2008 ; SBPBH:2008 ; YWLC:2009 ; takayasu2010econophysics ; HL:2011 ; WLA:2011 ; LWLW:2012 . Classic examples of cascading failures include power blackout
- the collapse of power grids Bialek:2007 ; dobson2007complex , traffic jams li2015percolation , and even economic depression parshani2011critical ; WLA:2011 . Previous studies mostly focused on how cascading failures occur, how network structures and failure propagation are related, and on network robustness and vulnerability to failure propagation watts2002simple ; dodds2004universal ; simonsen2008transient ; buldyrev2010catastrophic ; ganin2017resilience .
A tacit assumption employed in many previous studies of cascading failures is irreversible failure propagation, where a node, if it has failed, cannot recover and is no longer able to function actively. A failed node is then removed from the network completely, including all the links associated with it. There are real-world situations of networked systems, such as financial and transportation networks, where failed nodes can recover from malfunctioning spontaneously after a collapse nudo2013recovery ; shang2015impact ; hu2016recovery ; white2001autonomic ; toohey2007self ; desmurget2007contrasting . In general, there are two types of failure-and-recovery scenarios majdandzic2014spontaneous : internal and external. In the first type, a node fails because of internal causes (e.g., the occurrence of some abnormal or undesired dynamical behaviors within the node), which is independent of the states of its neighbors. In this case, the node can recover spontaneously after a period of time. An example is the failure of a company characterized by a drop in its market value due to poor management, followed by recovery due to internal restructuring. The second type is external failures, where a node's failure is externally triggered, e.g., by the failures of its neighboring nodes. After a period of time, as its local ``environment'' is improved, the node is able to recover spontaneously. The time of recovery depends not only on the specific type of failure-and-recovery mechanism, i.e., whether internal or external, but also on the individual node and its position in the network. For example, for a given node in the network, it may take longer to complete an internal restructuring process to recover from a failure due to an internal than an external cause. Previous computation and mean-field analysis have revealed that cascading dynamics incorporating a failure-and-recovery mechanism can exhibit a rich variety of phenomena such as phase transitions, hysteresis, and phase flipping majdandzic2014spontaneous ; podobnik2014network ; Podobnik2015 ; Podobnik2015a ; Majdandzic2016 . With respect to the resilience responses of networks, the effects of removing a fraction of nodes and links on network functions were studied national2012disaster ; gao2016universal ; ganin2016operational ; linkov2019science , demonstrating that resilience can be used to characterize the critical functionality of the network with applications in complex infrastructure engineering ganin2016operational ; linkov2019science .
In spite of the variations in the recovery dynamics across networks or even nodes in the same network, generally the process can be classified into two distinct types: Markovian and non-Markovian. In a Markovian recovery process, an event occurs at a fixed rate and the inter-event time follows an exponential distribution pastor2015epidemic ; wang2017unification ; de2018fundamentals , rendering memoryless the process. On the contrary, a non-Markovian recovery process has memory, as the current state of a node depends not only on the most recent state but also on the previous states. In this case, the inter-event time distribution is not exponential but typically exhibits a heavy tail. For example, in human activity and interaction dynamics, the occurrences of contacts among the individuals in a social network can be characteristically non-Markovian, for which there is mounting empirical evidence Barabasi:2005 ; GHB:2008 ; SGMB:2012 ; ZYZZHL:2013 ; ZHHLL:2014 ; PSRPGB:2015 ; ZZYWL:2016 ; YWGL:2017 . Non-Markovian type of recovery process also occurs in biochemical reactions bratsun2005delay and in the financial markets takayasu2010econophysics ; scalas2006waiting . We note that, in the context of spreading dynamics on complex networks, the effects of the non-Markovian process, due to its high relevance to the real world, have attracted growing attention VRLB:2007 ; IM:2009 ; MB:2013 ; JPKK:2014 ; KRV:2015 ; SGB:2017 ; SMBK:2018 ; FCTL:2019 . From the point of view of mathematical analysis, incorporating memories into the dynamical process makes analytic treatment challenging.
While the impacts of non-Markovian processes on spreading dynamics have been reasonably well documented VRLB:2007 ; IM:2009 ; MB:2013 ; JPKK:2014 ; KRV:2015 ; SGB:2017 ; SMBK:2018 ; FCTL:2019 , there has been little work so far addressing the influence of non-Markovian recovery process on failure propagation dynamics. In this paper, we address this issue systematically through a comparison study of two types of dynamical processes: one with Markovian and another with non-Markovian recovery. In the Markovian recovery (MR) model, failures due to internal and external causes will recover with different constant rates. In the non-Markovian recovery (NMR) model, such a constant rate cannot be defined. We thus resort to the recovery time. In particular, we assume that the failed nodes due to internal and external causes will take different time to recover, so a memory effect is naturally built into the model. For each model, we develop a mean-field theory and an analysis based on the pair approximation (PA) majdandzic2014spontaneous ; Valdez2016 ; Bottcher2017 ; Bottcher2017a ; keeling1997correlation that retains the two-node correlation but ignores any correlation of higher orders. Comparing results with numerical simulations indicates that both mean-field theory and PA analysis capture the key features of the failure propagation dynamics qualitatively, but the PA analysis yields results that are in better quantitative agreement with numerics. The counterintuitive and striking phenomenon is then that non-Markovian character with a memory effect makes the network more resilient against large-scale failures. There are two implications. Firstly, in physical, biological, or other natural networked systems, the intrinsic non-Markovian character of nodal recovery may be one reason for resilience of these networks and their existence in a harsh environment. Secondly, in engineering and infrastructure design, incorporating certain non-Markovian features into the network may help strengthen its resilience and robustness.
Results
.1 Spontaneous recovery models
For general failure propagation dynamics on a network, a node can be in one of two states: an active (labeled as -type) state in which the node functions properly and an inactive state (-type) in which the node has failed. To distinguish the causes for a node to become inactive, we label an inactive node due to internal or external failure as -type or -type, respectively.
In the NMR model, an -type node may fail spontaneously at the rate to become an -type node; or it may fail at the rate to become a -type node when the number of its -type neighboring nodes is less than or equal to a threshold integer value that sets the limit on neighboring support for proper functioning of a node. Without loss of generality, we assume that external failures occur more frequently than internal failures: . This is often the case as internal failures can be made less probable by building up the capability of the nodes through better equipment and/or management, while external failures are uncontrollable and more difficult to avoid. For examples, falling stocks may be the result of unanticipated changes in the market rather than poor management. In a road network, failures are caused more often by congestion than by physical failures. Once a node becomes inactive, it takes time to recover from an internal failure (when the node is of the -type) or time to recover from an external failure (when the node is of the -type). The non-Markovian characteristic is taken into account through the incorporation of a memory effect into the model. In particular, the nodes that will recover at time constitute those that were turned into -type inactive nodes at the time and those turned into -type inactive nodes at the time . Here, we assume , for the reason that repairing a node or restructuring the management due to the malfunctioning of the node itself would need more time. For example, reorganizing a company or repairing a road often takes more time. The failure processes characterized by the rates and as well as the recovery processes as determined by and are schematically illustrated in Fig. 1.
Note that the case of may also arise in the real world. For example, for an infrastructure network in civil engineering, when an earthquake strikes and destroys buildings (nodes), the time to rebuild can be longer than that required for recovering from internal failures, e.g., the collapse of a roof due to some material failure. Our computations of this case yield qualitatively similar results to those in the case of - see Supplementary Note 3 for detail.
The MR and NMR models differ only in the recovery processes. In the MR model, an inactive node of the -type or the -type recovers at a constant rate or , respectively, as illustrated in Fig. 1. Consequently, the number of nodes recovered at time depends only on the number of inactive nodes of both -type and -type at the previous time step.
To develop theories for failure propagation on networks with MR or NMR recovery process and to identify the key differences between the two type of dynamics, we focus on random regular networks. In the numerical simulations, we use a relatively large network size with the degree . In the NMR model, the recovery times are taken to be and for the -type and -type of nodes, respectively. In the MR model, the values of the recovery rates are set to be and so that they correspond to the same scales for the recovery times in the NMR model (see Supplementary Note 1 for a more detailed explanation). The threshold values in both models are . Synchronous updating is invoked in simulations with the time step .
.2 Markovian Recovery Process
Mean-field theory. We start with setting up the dynamical equations for MR dynamics and comparing results with simulations. Based on the mean-field theory in part A of Methods, we first examine the behavior of in Eq. (4). Figure 2(a) shows the dependence of on the fraction of failed nodes . It can be seen that exhibits two different types of behaviors over a large part of : for a wide range of small values (low failure) and for a range of large values (high failure). In the low failure state, external failure events rarely occur. In the high failure state, an active node is supported by an insufficient number of active neighbors and external failure events almost always happen. It implies that the stationary state can possess two branches: setting in Eq. (5) gives as the low-failure branch, while setting gives as the high-failure branch. The two branches are shown in Fig. 2(b) (dashed and solid curves) in terms of the dependence of on , for , and as an example. To check which branch the system would take on and whether there are two states for some range of parameters, the simulation results for moving the value of up (circles) and down (squares) are shown in Fig. 2(b) for comparison. As the values of are increased or decreased, the initial state is taken to be the final state corresponding to the previous value of - the adiabatic process. The results indicate that: (i) the values of from simulations follow the two branches given by the mean-field approximation, and (ii) the low-failure (high-failure) branch is followed when moving up (down) until a particular value at which there is a jump to the high-failure (low-failure) branch - the signature of a hysteresis. The results also imply that if one starts from the initial conditions and , there exists a critical value of for a sudden increase in the number of failed nodes when is small as the system will follow the low-failure branch. However, for large , the critical value becomes as the system will follow the high-failure branch. A plot of against will therefore exhibit two plateaus with for small and for large .
The mean-field approximation not only simplifies the analysis but also provides insights into the dynamical process. For example, the mean-field theory suggests the ratios and as key parameters. In general, solutions can be obtained numerically by solving Eq. (5) together with Eq. (4). The results are shown in Fig. 2(c) as a phase diagram. For parameters falling into the regions corresponding to the low-failure (high-failure) phase, the system will evolve into a low-failure (high-failure) state. For parameters in the bistable phase, the system will evolve either to a low-failure or a high-failure state, depending on the initial conditions. The high-failure and low-failure phase boundaries meet at the critical point determined by and .
In addition to the stationary state, the evolution of the system can also be studied by iterating Eqs. (1) and (2) for a given initial condition. Figure 3(a) shows the evolution of the MR dynamics as obtained by the mean-field theory for and . In the three-dimensional space formed by , , and , the sum rule defines a triangular plane, as shown in Fig. 3. At any time , the state of the system is characterized by a point in the plane. The results show that the MR dynamics will evolve into either the low-failure or the high-failure state (filled circles), depending on where the system begins. The mean-field theory also gives a separatrix, the line traced out by the open circles, where the system will evolve into a different state starting from a point on a different side of the separatrix. For , the system will evolve to a high-failure state with (,,) given approximately by (, , ). For , the system may evolve to the high-failure state or a low-failure state at around (, [math], ). Numerical results are shown in Fig. 3(b), verifying all the features predicted by the mean-field theory. For example, the high-failure state is given by (,,) (, , ) and the low-failure state at around (, [math], ), both are quite close to the values predicted by the mean-field theory.
Pairwise approximation theory for the MR model. It is possible to formulate a theory that takes into account of two-node spatial correlation based on the pairwise approximation (PA). The basic idea is to follow the evolution of different types of links, i.e., links that connect different pairs of neighboring nodes keeling1997correlation . The PA method has been used widely in studying epidemic and information spreading ben1992mean ; mata2013pair ; gross2006epidemic , and in coevolving voter models and adaptive games with two or more strategies ji2013coevolve ; zhang2013SG ; zhang2014coevolve ; choi2017coevolve . In Part B of Methods, we develop a PA based theory for the MR model.
Figure 4 presents a comparison of the predictions of the PA analysis and mean-field theory with the numerical results, where Fig. 4(a) shows the time evolution of and from the initial state for , , , and . While both mean-field and PA theories capture the key features in time evolution, the results of PA are in better agreement with those from simulations. It is useful to understand the dynamical behaviors in the MR model qualitatively (so as to enable a meaningful comparison with those of the NMR model later). For this purpose, we identify several stages in the time evolution as marked in Fig. 4(a). In the early stage, i.e., , most nodes are active and they have more active neighbors, violating the condition . As a result, only internal failures occur and grows but decreases and eventually vanishes. For , , active nodes start to fail into -type nodes, leading to fewer active nodes in the system and triggering more external nodal failures. This results in the observed rapid increase in . In the later stage , there are more failed nodes than active ones. While the failed nodes of and types can recover with their respective rates, the remaining or recovered active nodes will more likely fail again through external than internal causes due to the many failed nodes surrounding the active nodes. Consequently, in this later stage, increases and decreases toward their respective steady-state values for , with when the system evolves into a high-failure state. The PA analysis captures the behavior of over time and the onset of better than the mean-field analysis. Figure 4(b) shows the phase diagram for and . The mean-field phase diagram is the same as that shown in Fig. 2(c), where it can be seen that the results of the PA analysis (solid curves) are indeed in better agreement with the simulation results than the predictions of the single-node mean-field theory.
Note that Fig. 2 reveals the emergence of a critical value of in the spontaneous failure rate beyond which the system incurs a large-scale failure starting from the initial conditions and . The critical rate is calculated by starting the system from the initial conditions for different values of (for a fixed value of ) and search for the value of beyond which the system attains a high-failure state (see Supplementary Figure 1 in Supplementary Note 2). The critical value thus depends on , the initial fraction of failed nodes due to an internal mechanism. Figure 4(c) shows the numerically obtained functional relation (open circles), together with two types of theoretical prediction (PA analysis and mean-field theory). As the initial fraction is increased from a near zero value, maintains at a relatively higher constant value (about 0.007). As increases through the value of about 0.4, the value of the critical rate suddenly decreases to about 0.003. We see that, again, the prediction of the abrupt change in by the PA analysis is more accurate than that by the mean-field theory.
What is the physical meaning of the abrupt decrease in the critical value of the spontaneous failure rate as displayed in Fig. 4(c)? A higher value of means that the network system is more resilient to large-scale failures as it requires a larger rate value to drive the system into a high-failure state. As the fraction of initially failed nodes is increased, the network as a whole is more prone to large-scale failure so we expect the value of to decrease. Because of the lack of any memory effect in the ideal, Markovian type of recovery process, i.e., after a node fails, it either recovers instantaneously or does not recover (with probabilities determined by the rate of recovery), we expect a characteristic change in the system dynamics as characterized by the value of the critical rate to occur in an abrupt manner. Indeed, as Fig. 4(c) reveals, as the fraction of initially failed nodes is increased through a threshold value, there is a sudden decrease of about in the value of , giving rise to a first-order type of transition. This behavior of abrupt transition may not occur in reality because of the assumed Markovian recovery process, which is ideal and cannot be expected to arise typically in the physical world. In the next section, we will demonstrate that making the dynamics more physical by assuming non-Markovian type of recovery process will drastically alter the picture of transition in Fig. 4(c).
.3 Non-Markovian recovery process
To analyze failure propagation dynamics in systems with NMR, a viable approach is to construct difference equations that relate the fractions of types of nodes and links at time to those at time . It is necessary to keep track of the time when a node becomes the or type as well as the time at which a link becomes type . In Part C of Methods, we develop a PA analysis for the NMR model. Figure 5 shows the simulation results from the NMR model, together with predictions of the PA analysis and mean-field approximation for . The time evolution of and is shown for the parameter setting , , (thus ), and (thus ). The initial conditions are . Both theories capture the key features of the dynamics. Comparing with results from the MR model [e.g., Fig. 4(a)], we see that the time evolution of the dynamical variables in the NMR model is different from that in the MR model, in spite of the approximately identical steady-state values.
To describe the key features of the NMR model, we divide the evolution into five stages with the respective time intervals , , , , and , as shown in Fig. 5(a). In the earliest stage , increases due to internal failures but is insufficient to cause external failures. The behavior is similar to that in the MR model, but the duration is shorter and the rise in is steeper in the NMR model. The reason is that the memory effect in NMR model allows the recovery of -type nodes to take place only after steps, while the recovery occurs probabilistically in the MR model. In the narrow time window of , attains a level high enough to trigger the onset of many external failures. As a result, the failed nodes constitute the majority in the system and decreases sharply, giving rise to the sharp increase in . The -type nodes recover deterministically after () into active nodes. In the period , the recovery of -type nodes refuels the system with active nodes that can participate in two paths: more internal and external failures. For , the existing -type nodes have yet to recover and continues to increase but at a slower pace due to the external failure path, while reduces slightly.
In the time window , the initial internally failed nodes begin to recover as , in addition to the recovery of the -type nodes. The -type nodes due to recovery will be more likely to become -type as the failed nodes remain the majority (due to the parameter setting in this example). This leads to the observed increase in and decrease in in the time interval . In the final stage , stops decreasing because the recovery of -type nodes at the time is due to those failed internally at for which the number was small. However, the recovery of -type nodes at a shorter time scale supplies fresh active nodes. The fraction of failed nodes is so high, i.e., approaching the high-failure state, that the dynamics lead to a higher steady value of than in long time. For time well beyond , both and become steady.
Figure 5(b) shows the phase diagram of the NMR model analogous to Fig. 4(b) for the MR model, with and . The results of the PA analysis (solid curve) are in better agreement with the simulation results than those obtained from the mean-field theory (dashed curve). The difference in dynamics in the NMR model also alters the dependence of to sustain a high-failure state on . Carrying out the same analysis as for the MR model (see Supplementary Figure 1 in Supplementary Note 2 for details), we get the relationship for attaining a high-failure state for a given initial condition, as shown in Fig. 5(c). The pair approximation, again, gives more accurate prediction than that from the mean-field theory.
The result in Fig. 5(c) demonstrates the striking effect of non-Markovian type of recovery with memory on the failure propagation dynamics, which is in stark contrast to the ideal case of Markovian process as exemplified in Fig. 4(c). In particular, as the fraction of initially failed nodes is increased from a near zero value to one, the value of begins to decrease continuously and smoothly until it reaches a minimum, at which increases relatively more rapidly to a high value of about 0.006 for . For , the value of remains approximately constant at 0.006. Comparing Fig. 5(c) with Fig. 4(c), we see two major, characteristic differences. Firstly, the behavior of an abrupt decrease in the Markovian case is replaced by a gradual process in the non-Markovian case, essentially converting a first-order like process to a second-order one. Secondly and more importantly, recovers from its minimum value and maintains at a high value regardless of the value of insofar as it exceeds about . This means that, the system can maintain its degree of resilience even when the initial fraction of failed nodes reaches ! This contrasts squarely the behavior in the Markovian case where the system resilience is reduced dramatically even when only about of the nodes failed initially. In this sense we say that a non-Markovian type of memory effect makes the network system more resilient against failure propagation.
While the behavior in Fig. 5(c) is counterintuitive, a heuristic reason is as follows. For an initial state with many initial -type nodes, the few remaining nodes will switch from being active to the -type and back. All the initial -type nodes will have to wait for the time period to recover. At that time, the system becomes one with only a few failed nodes - effectively equivalent to one with small value and requiring a larger value to evolve into the high-failure state. In a range of small , a smaller can already cause more active nodes to become -type, helping maintain the system in a high-failure state as described for Fig. 5(a). Theoretical support for the behavior is provided by the PA analysis and mean-field theory, as shown in Fig. 5(c).
In addition to the different time evolution in the MR and NMR models, there are also cases where the same initial conditions , , and would lead to different final states. Figure 6 shows the final states starting from any and in the - plane (the basin structure), with , , , and . The results from the mean-field theory [Fig. 6(a)] and direct simulations [Fig. 6(b)] show essentially the same features. (Results from the initial-condition setting and are presented in Supplementary Figure 2 of Supplementary Note 2.) It is useful to contrast the final states of the MR and NMR models. From Fig. 3, an initial state, e.g., =, will evolve into a high-failure state in the MR model, but it will end up in a low-failure state in the NMR model. This means that, the NMR process can make the system more resilient to failures. (More examples can be found in Supplementary Figure 3 of Supplementary Note 2 where different steady states from the two models are presented.)
.4 MR and NMR dynamics on heterogeneous networks
So far our analysis and simulations have been carried out for MR and NMR dynamics on random regular networks. We find that altering the network structure causes little change in the qualitative results. For example, we have carried out simulations on scale-free networks of size with degree range and degree distribution . Figure 7 shows the results of versus for the MR and NMR dynamics for networks with . Because of the heterogeneity in the nodal degree distribution, the threshold on external failure is given in terms of the fraction one-half of the failed neighbors.
Comparing results with Fig. 4(c) for MR dynamics and Fig. 5(c) for NMR dynamics in random regular networks, we see that the key features are similar when the underlying random regular networks are replaced by scale-free networks. We have also carried out numerical simulations on four additional types of synthetic and empirical networks: (a) networks with degree-degree correlation, (b) networks with a community structure, (c) empirical arenas-email network, and (d) empirical friendship-hamster network, with results presented in Supplementary Notes 4 and 5 for the former and latter two cases, respectively. These results, together with Fig. 7, suggest that, for heterogeneous networks, a non-Markovian process tends to enhance the network resilience against large-scale failures.
Discussion
The intrinsic memory effect associated with non-Markovian processes makes it challenging to analyze the underlying network dynamics, new and surprising phenomena can arise. Most previous studies treated Markovian processes through either a mean-field type of theory Bottcher2017 ; Bottcher2017a or an effective degree approach Valdez2016 . For non-Markovian processes, the mean-field approximation can still be applied majdandzic2014spontaneous ; Podobnik2015 ; Podobnik2015a ; Majdandzic2016 , but it is necessary to invoke a higher-order theory such as the PA analysis. Our work presents such an example in the context of failure propagation in complex networks.
Our study has demonstrated that, in both models, the network can evolve into a low-failure or a high-failure state, with the latter corresponding to the undesired state of large-scale failure. Both the mean-field and PA theories are capable of predicting the dynamical behaviors of failure propagation, and the performances of the theories are gauged by simulation results, revealing that the more laborious pair approximation gives results in better quantitative agreement with the numerics. Our systematic computations on different complex networks and two types of theoretical analyses have uncovered a striking phenomenon: the non-Markovian memory effect in the nodal recovery can counter-intuitively make the network more resilient against large-scale failures.
Our finding also calls for the incorporation of non-Markovian type of memory factors into the design of communication, computer, and infrastructure networks in various engineering disciplines. We hope our work will stimulate interest in examining and exploiting non-Markovian processes in various network dynamical processes. We have carried out a systematic study of the effects of Markovian versus non-Markovian recovery on network synchronization using the paradigmatic Kuramoto network model, with the main finding that non-Markovian recovery makes the network more resilient against large-scale breakdown of synchronization (Supplementary Note 6).
Methods
A. Mean-field theory for MR dynamics. Let , and be the fractions of -type, -type and -type nodes in the system at time , respectively. A hierarchical set of dynamical equations for the MR model can be constructed to include increasingly longer spatial correlation. The equations for the evolution of the fractions of different types of nodes are:
[TABLE]
and
[TABLE]
where the first term in each equation gives the supply to () due to internal (external) failures and the second term represents the drop in () due to recovery. Note that, because of the relation
[TABLE]
an equation for is unnecessary. The quantity is the probability of an -type node having neighbors of -type nodes at time and thus the node will be infected at the rate .
In general, the quantity involves the correlation between two neighboring nodes. To connect Eqs. (1) and (2) so as to retain the simplicity of a single-node theory, we use the approximation
[TABLE]
where . Equations (1-4) form a set of equations, from which the fractions of different types of nodes can be solved. This is the simplest single-site mean-field approximation for the MR dynamics that ignores any spatial correlation. Despite its simplicity, it is capable of revealing the key features in the stationary state, in which Eqs. (1) and (2) require the fraction of failed nodes to satisfy
[TABLE]
which can be solved for self-consistently with Eq. (4). Equation (5) implies that depends only on the ratios and within the mean-field approximation, and so are the other fractions , , and .
B. Effect of nodal correlation: pairwise approximation for the MR model. Our PA based analysis begins by defining as the fractions of type of links in the system at time , where . A connection that stems out from a node can be classified by a type. For example, for a node with the current state being -type, each link that it carries can be classified into the , , or type, depending on the state of the node at the other end of the link. Taking into account every link from every node, we have that the fractions of links satisfy
[TABLE]
with for .
In general, the equations of single-node quantities, e.g., Eq. (2), necessarily involve quantities of more extensive spatial correlation because the interplay between the failure of a node and the states of its neighboring nodes. Since is the probability of an -type node having an inactive node regardless of the types of the neighbors, the probability that there are exactly neighbors of -type and inactive neighbors of either or type is
[TABLE]
where is the degree of the node. The quantity in Eq. (2), as schematically depicted in Fig. 8(a), is thus given by
[TABLE]
which indicates explicitly that the dynamics of single-node quantities are governed by the two-node quantity . This is reminiscence of the BBGKY (Bogoliubov-Born-Green-Kirkwood-Yvon) hierarchy of equations for the distribution functions in a system consisting of a large number of interacting particles in statistical physics Harris:book . Only under the approximation (so that the two-node correlation can be neglected) will the resulting equation be Eq. (4) - a set of single-node mean-field equations.
To proceed, we derive the dynamical equations for that will in general involve more extensive spatial correlation. For example, a link of the type would evolve into a different type depending on the neighborhoods of the two nodes, effectively a small cluster of nodes. To develop a manageable approximation, we retain the two-node correlation and decouple any longer spatial correlation in terms of one-node and two-node functions. This is the idea behind PA for obtaining a closed set of equations. In particular, the dynamical equations for and are
[TABLE]
and
[TABLE]
where
[TABLE]
is the probability of an -type node having -type neighbors among its neighbors, given that one neighbor is inactive, and
[TABLE]
is the probability of an -type node having -type neighbors among its neighbors, given that one neighbor is active. Figure 8 illustrates the meanings of , , and schematically. The terms in Eqs. (9) and (10) account for how the recovery and failure processes affect the fractions of -type and -type links. The complete set of dynamical equations is listed in Supplementary Note 1, which can be solved iteratively to yield the temporal variations on the type of nodes and the type of links given an initial condition. The steady-state quantities can be obtained through a sufficiently large number of iterations.
C. Pairwise approximation theory for the NMR model. Specifically, we let be the fraction of nodes of type at time , which became type from some other type only time steps ago, and be the fraction of links of the UV type when the corresponding node(s) associated with a link became that of the labeled type and time steps ago. The time evolution of the fraction of -type nodes in the NMR model is given by
[TABLE]
The first line in Eq. (13) gives the new supply due to internal failure of -type nodes in the time duration . The second line accounts for the nodes which were inactive for a duration at time but have not reached the time for recovery at time . The third line states that all -type nodes that came to existence earlier have been recovered. Similarly, the time evolution of the fraction of -type nodes is given by
[TABLE]
where is defined in Eq. (8) and . The fractions of -type and -type nodes, regardless of how long they have been in the corresponding state, are given by and , respectively. The fraction of active nodes follows from .
To develop a PA analysis for failure propagation dynamics with NMR, we construct the equations for the time evolution of -types of links and retain spatial correlation up to two neighboring nodes. Our derivation of the counterparts of Eqs. (13) and (14) in the MR case suggests the necessity to examine the history of the inactive nodes(s) associated with a link. For example, the time evolution of the links in is governed by
[TABLE]
where is defined in Eq. (11). The first line represents the new supply to -type of links due to an internal failure in one of the active nodes associated with a link of the -type, and an internal failure together with a recovery of an inactive node in a link of the - and -types. The second line includes the supply to l-type links due to recoveries from and types as well as the links of l-Δt type that became l type in the recent duration . The last line comes from the fact that an -type node must recover after a time since it became inactive. The fraction of links of -type, regardless of how long the node in the link has taken in the -type, is given by . We thus have that the fraction of -type of links evolves in time as
[TABLE]
where is defined in Eq. (12). Equations for other types of links can also be constructed (Supplementary Note 1). Equations (15) and (16) are analogous to Eqs. (9) and (10) in the MR model. The number of equations is determined by the divisions of and into the small time steps , which increases rapidly when is small compared with the other time scales in the NMR dynamics.
A crude approximation analogous to the mean-field theory can be developed for the NMR model by retaining only the fractions of nodes in the equations, which can be done by decoupling the two-node quantities such as by . The resulting equations governing the fractions of different types of nodes become
[TABLE]
and
[TABLE]
where takes on the approximate form in Eq. (4). Equations (17), (18), and (4) form a set of equations that can be solved to yield the fractions of different types of nodes. The first two terms in Eqs. (17) and (18) correspond to the increase in inactive nodes due to failure and due to those remaining inactive, and the last term corresponds to recovery. The number of equations, again, depends on the choice of . This is the mean-field approximation for the NMR model that ignores any spatial correlation.
Data Availability
The source data underlying Figs. 2-7 and Supplementary Figs. 1-12 are available at https://github.com/zhlin2328/Codes-for-NCOMMS-19-1125220.
Code Availability
C++ codes to reproduce the data in the main text and the Supplementary Information are available at https://github.com/zhlin2328/Codes-for-NCOMMS-19-1125220.
Reference
Acknowledgments
The authors would like to thank Zhenhua Wang for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11975099, 11575041, 11675056 and 11835003), the Natural Science Foundation of Shanghai (Grant No. 18ZR1412200), and the Science and Technology Commission of Shanghai Municipality (Grant No. 14DZ2260800). YCL would like to acknowledge support from the Vannevar Bush Faculty Fellowship program sponsored by the Basic Research Office of the Assistant Secretary of Defense for Research and Engineering and funded by the Office of Naval Research through Grant No. N00014-16-1-2828.
Author Contributions
Z.-H.L., M.T. and Z.H.L. designed research; Z.-H.L. performed research; Z.-H.L., M.F., M.T., Z.H.L., C.X. and P.M.H. contributed analytic tools; Z.-H.L., M.F., M.T., Z.H.L., C.X., P.M.H. and Y.-C.L. analyzed data; Z.-H.L., M.T., Z.H.L., P.M.H. and Y.-C.L. wrote the paper.
Competing Interests
The authors declare no competing interests.
Correspondence
To whom correspondence should be addressed. E-mail: [email protected]; [email protected]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) Motter, A. E. & Lai, Y.-C. Cascade-based attacks on complex networks. Phys. Rev. E 66 , 065102 (2002).
- 2(2) Zhao, L., Park, K. & Lai, Y.-C. Attack vulnerability of scale-free networks due to cascading breakdown. Phys. Rev.E 70 , 035101(R) (2004).
- 3(3) Zhao, L., Park, K., Lai, Y.-C. & Ye, N. Tolerance of scale-free networks against attack-induced cascades. Phys. Rev.E 72 , 025104(R) (2005).
- 4(4) Galstyan, A. & Cohen, P. Cascading dynamics in modular networks. Phys. Rev. E 75 , 036109 (2007).
- 5(5) Bialek, J. W. Why has it happened again? Comparison between the UCTE blackout in 2006 and the blackouts of 2003. In Power Tech 2007 IEEE Lausanne , 51–56 (IEEE, 2007).
- 6(6) Dobson, I., Carreras, B. A., Lynch, V. E. & Newman, D. E. Complex systems analysis of series of blackouts: Cascading failure, critical points, and self-organization. Chaos 17 , 026103 (2007).
- 7(7) Gleeson, J. P. Cascades on correlated and modular random networks. Phys. Rev. E 77 , 046117 (2008).
- 8(8) Rosato, V. et al. Modelling interdependent infrastructures using interacting dynamical models. Int. J. Crit. Infrastruc. 4 , 63–79 (2008).
