Bundled Causal History Interaction
Peishi Jiang, Praveen Kumar

Ven Te Chow Hydrosystem Laboratory, Department of Civil and Environmental Engineering,
University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
Abstract
Complex systems arise as a result of the nonlinear interactions between components. In particular, the evolutionary dynamics of a multivariate system encodes the ways in which different variables interact with each other individually or in groups. One fundamental question that remains unanswered is: How do two non-overlapping multivariate subsets of variables interact to causally determine the outcome of a specific variable? Here, we provide an information-based approach to address this problem. We delineate the temporal interactions between the bundles in a probabilistic graphical model. The strength of the interactions, captured by partial information decomposition, then exposes complex behavior of dependencies and memory within the system. The proposed approach successfully illustrated complex dependence between cations and anions as determinants of pH in an observed stream chemistry system. In the studied catchment, the dynamics of pH is a result of both cations and anions through mainly synergistic effects of the two and their individual influences as well. This example demonstrates the potentially broad applicability of the approach, establishing the foundation to study the interaction between groups of variables in a range of complex systems.
I Introduction
In complex systems shaped by the interaction of a multitude of variables, an interesting question that remains unanswered is: In what ways do the evolutionary histories of two subsets of variables interactively influence the current state of a target variable? Answering this question would further our understanding of the collective behavior of a system’s dynamics, where the interactions of variables in groups play a key role. For instance, in the study of connectivity between different regions of the brain, one may be interested in how a specific reaction pulse is jointly induced by different groups of incoming signals Tononi and Edelman (1998). In stream chemistry, which is shaped by numerous biophysical processes and chemical reactions in both the stream and the contributing landscape, one may be interested in understanding how stream pH level is an outcome of the joint effect of the concentrations of different anions and cations Kirchner and Neal (2013). Addressing how the state of a specific variable at any time is a causal outcome of interaction from the entire or a part of the evolutionary history of a system requires a quantitative approach.
At the most elementary level, this approach calls for investigating whether and in what way two variables interact with each other (Figure 1a). This type of pairwise interaction has been widely addressed by causal influence analysis Granger (1969); Pearl (1995); Sugihara et al. (2012); Imbens and Rubin (2015). Most causality detection and analysis methods fall into two categories: intervention-based and non-intervention-based approaches. While interventional analysis (e.g., Pearl’s causality Pearl (1995)) is more intuitive, because it investigates how intervening on a source variable impacts the target variable, non-interventional analysis is more applicable in most real-life systems, where artificial interventions are almost impossible in a multivariate situation (Figure 1b–d). Among the existing non-interventional approaches, Granger causality Granger (1969) has gained significant popularity due to its intuitive statistical interpretation, which uses observed time series to evaluate the cause–effect relation between two variables. This is done by quantifying the reduced uncertainty of the target that is explained by the source conditioned on the knowledge of the remaining variables in the system. In this study, causality is understood in this Granger sense.
While the original Granger causality only assesses the change of the second-order moment to determine the reduced uncertainty, recent studies extend it to capture the change of the entire probability space by using information measures Schreiber (2000); Runge et al. (2012). For example, transfer entropy Schreiber (2000) quantifies the shared dependency between the current state of a target and the previous state of a source variable given the knowledge of the previous state of the target. In addition, momentary information transfer Runge et al. (2012) combines multivariate transfer entropy with a probabilistic graphical model to efficiently estimate the information flow between two lagged variables in a multivariate system. Information measures are therefore the core tools for quantitative analysis in this study, because they are capable of delineating nonlinear interactions.
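For reference, transfer entropy in its simplest lag-1 form can be written as a conditional mutual information. The symbols below ($X$ for the source, $Y$ for the target) are chosen here for illustration only and are not the notation used elsewhere in this paper:

```latex
% Lag-1 transfer entropy from a source X to a target Y,
% written as a conditional mutual information and as a difference of conditional entropies.
\begin{equation*}
  T_{X \to Y}
  \;=\; I\left(Y_t ;\, X_{t-1} \mid Y_{t-1}\right)
  \;=\; H\left(Y_t \mid Y_{t-1}\right) - H\left(Y_t \mid Y_{t-1},\, X_{t-1}\right).
\end{equation*}
```

The second form makes the "reduction in uncertainty" reading explicit: it is the uncertainty about $Y_t$ left after knowing its own past, minus what remains once the past of $X$ is also known.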
Here, we propose new information measures to quantify and characterize the strength of interaction between two bundled variable sets in affecting the present state of a target variable (Figure 1c). This approach allows us (1) to consider the effect of the entire evolutionary history of all interacting variables, termed causal history Jiang and Kumar (2018, 2019), that determines the current state of a variable of interest; and (2) to characterize this effect by using the partial information decomposition (PID) framework Williams and Beer (2010). We aim to analyze the joint influence from the evolutionary history of two subsets of variables, which requires multivariate interaction assessment. This draws upon another key feature of information measures: characterizing the dynamics among multiple components. For instance, to investigate the total information from two source variables to a target (Figure 1b), PID is used to decompose the total information into unique, redundant, and synergistic information contents. In addition, we extend the PID approach to characterize the information from two lagged sources through specific pathways by conditioning on the remaining system dynamics that are not in the pathways of interest Jiang and Kumar (2018). This approach, called momentary partial information decomposition, builds on the ideas of momentary information and PID. Another illustrative example of using information measures to assess multivariate interaction is the causal history analysis framework Jiang and Kumar (2019, ), which accounts for the influence from the entire evolutionary dynamics of the system (Figure 1c).
The main contribution of this study is that it is the first to delineate the overall effect of multiple interacting grouped sources on a target using information theory. The proposed method is fundamentally different from existing information measures. Information theory has been extensively employed to assess pairwise interactions (e.g., transfer entropy Schreiber (2000) and momentary information transfer Runge et al. (2012)), i.e., whether and how a source affects a target. Some recent efforts go one step further to unravel multivariate interactions (e.g., PID and the previous causal history analysis framework). Nevertheless, neither the pairwise interaction analysis nor the current multivariate analysis using information theory takes into account group interactions, which are the focus of this study. By grouping variables, the proposed analysis targets the dynamics of specific sets of components, thus providing new insights into interactions among different subsets of a system. Such insights cannot be obtained from previous analyses that treat variables as individual components.
The rest of the paper is organized as follows. Section II details the proposed information measures for analyzing the bundled causal history interaction. They are developed based on a directed acyclic graph (DAG) representation for time-series. The DAG further serves as the basis for dimensionality reduction of the measures to ensure reliable information estimations. In Section III, the bundled causal history analysis is employed to investigate the joint influence on the stream pH from cations and anions by using a set of observed stream chemistry data. Last, a brief conclusion is drawn in Section IV.
II Methodology
We consider complex systems that can be conceptualized as a multivariate system consisting of variables , varying in time . The current state of a target variable, , is an outcome of interactions in the entire evolutionary dynamics in the causal history Jiang and Kumar (2019) prior to time , . Among the influence from the entire causal history in , here, we investigate the joint effect from the historical states of two specific groups of variables in on . We denote a bundled set containing variables at time as . The entire historical states of is represented as . Now, the aim of our study is to characterize how is jointly driven by causal histories of two bundled sets, and , where .
In the rest of this section, we first develop the information measures for delineating the information flow from the two bundled sets, and , to the target . Then, to achieve reliable information measure estimation, we introduce a two-stage dimensionality reduction approach to reduce the cardinality of the proposed measures.
II.1 Interactive Information Flow from Two Bundled Variables
Let us denote the remaining variables in the system outside of the two chosen bundles as with as the exclusion symbol. The total information, , given by the evolutionary dynamics of two bundled sets to the target can then be expressed as a conditional mutual information (CMI) Shannon and Weaver (1949) between the two bundled sets given the knowledge of , which is given by:
[Equation (1)]
where . The conditioning on is to exclude the influence from the interactions of the rest of the system in the quantification of the interaction between the two bundles. Note that can belong to any variable in one of the bundled sets or to that in . When , Equation (1) can be considered as a generalized transfer entropy Schreiber (2000). Transfer entropy captures the reduction in the uncertainty associated with the prediction of the current state of a variable given the knowledge of another variable, in addition to the reduction already provided by the knowledge of its own history. This generalization allows us to characterize the reduction in uncertainty from multiple variables that is in addition to that provided by the variable’s own history or that of a set to which it belongs.
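As a sketch, the verbal definition of the total information corresponds to a conditional mutual information of the following form. All symbols are assumptions introduced here for illustration: $Z_t$ for the target, $\mathbf{X}^{b_1}_{t-1:}$ and $\mathbf{X}^{b_2}_{t-1:}$ for the two bundled causal histories, $\mathbf{X}^{\backslash b}_{t-1:}$ for the history of the remaining variables, and $\mathcal{T}$ for the total information.

```latex
% Hedged reconstruction: total information from two bundled causal histories to the target,
% conditioned on the history of the remaining variables. Notation is assumed, not recovered.
\begin{equation*}
  \mathcal{T}
  \;=\; I\left(Z_t ;\, \mathbf{X}^{b_1}_{t-1:},\, \mathbf{X}^{b_2}_{t-1:}
        \,\middle|\, \mathbf{X}^{\backslash b}_{t-1:}\right)
\end{equation*}
```

Under this reading, the generalized transfer entropy case arises when the target itself is among the remaining variables, so that its own history sits in the condition set.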
To characterize the information contents in , we take advantage of Partial Information Decomposition (PID) Williams and Beer (2010), which allows us to decompose into: (1) redundant information —the overlapping information given by two bundled causal histories, and ; (2) synergistic information —the joint information given by and ; and (3) unique information and —the information provided by each bundled set, and , respectively. This is given by:
[Equation (2)]
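The four PID components can be illustrated on a toy discrete system. The sketch below is an assumption-laden illustration, not the estimator used in this study: it works on single binary sources rather than bundled histories, and it uses the minimum-mutual-information redundancy (redundancy taken as the smaller of the two individual mutual informations) in place of the rescaled redundancy mentioned later in the text. The function names `mutual_information` and `pid_mmi` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def pid_mmi(pmf):
    """Split I(Z; X1, X2) into redundant, synergistic, and unique parts.

    pmf maps (x1, x2, z) -> probability. Redundancy is ASSUMED to be
    min(I(Z;X1), I(Z;X2)); other redundancy measures give other splits.
    """
    j1, j2, j12 = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x1, x2, z), p in pmf.items():
        j1[(x1, z)] += p
        j2[(x2, z)] += p
        j12[((x1, x2), z)] += p
    i1, i2, total = mutual_information(j1), mutual_information(j2), mutual_information(j12)
    redundant = min(i1, i2)
    unique1 = i1 - redundant
    unique2 = i2 - redundant
    # Synergy is whatever remains of the total once the other parts are removed.
    synergistic = total - redundant - unique1 - unique2
    return {"T": total, "R": redundant, "S": synergistic, "U1": unique1, "U2": unique2}


# XOR target: neither source alone is informative, but together they determine Z.
xor_pmf = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}
xor_pid = pid_mmi(xor_pmf)
```

For the XOR target, all of the total information is synergistic, which is the kind of joint effect the synergistic component is meant to capture.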
Before we develop quantitative estimation, we ask another related question: Do all the historical states in the bundled sets provide information to ? If the answer is no, then when does such influence end as the lag between and any source node in increases? Otherwise, how much information is given by very early dynamics in ? Answering these questions requires the assessment of the memory dependencies due to the bundled causal history on . Therefore, we partition the entire bundled causal history into two complementary components: (1) a recent dynamics from all the states up to a positive time lag , , termed immediate bundled causal history; and (2) the remaining earlier dynamics, , termed distant bundled causal history. By using the chain rule of CMI Cover and Thomas (2006), we can decompose in Equation (1) into the information from the immediate () and distant () bundled causal histories, which are given by:
[Equation (3)]
Note that the information flows from the two partitioned histories, and in Equation (3), are functions of . Quantifying and along with allows the investigation of the memory dependency due to the evolutionary interactions of the two bundled sets Jiang and Kumar (2019).
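The chain rule of CMI behind this partition can be checked numerically on a toy discrete system. The sketch below is an illustration only: the four binary variables (target, immediate history, distant history, remaining variables), the random joint distribution, and the function names are all assumptions, and each "history" is collapsed to a single variable.

```python
import random
from collections import defaultdict
from itertools import product
from math import log


def marginal(pmf, keep):
    """Marginalise a pmf {outcome tuple: prob} onto the index positions in keep."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def cmi(pmf, a, b, c=()):
    """I(A;B|C) in nats; a, b, c are tuples of index positions into each outcome."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    p_abc = marginal(pmf, a + b + c)
    p_ac = marginal(pmf, a + c)
    p_bc = marginal(pmf, b + c)
    p_c = marginal(pmf, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p <= 0.0:
            continue
        ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
        total += p * log(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


# Outcome layout (assumed): (target, immediate history, distant history, rest).
random.seed(0)
outcomes = list(product((0, 1), repeat=4))
weights = [random.random() for _ in outcomes]
norm = sum(weights)
pmf = {o: w / norm for o, w in zip(outcomes, weights)}

total = cmi(pmf, (0,), (1, 2), (3,))      # information from the entire history
distant = cmi(pmf, (0,), (2,), (3,))      # information from the distant history
immediate = cmi(pmf, (0,), (1,), (2, 3))  # immediate history, given the distant one
```

In this toy setting `total` equals `immediate + distant` up to floating-point error, mirroring the partition into immediate and distant contributions.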
Partitioning into immediate and distant causal histories further highlights the need to characterize the joint interactions of the two bundled sets in the two complementary historical states. This, again, can be achieved by using the PID approach, and is given by:
[Equation (4)]
where the last equation reflects the sum of the corresponding terms in the previous two equations. Therefore, Equation (4) illustrates the additive contribution of each information content (i.e., synergistic, redundant, and unique components) in the two partitioned histories, and , to the entire bundled causal history, .
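The additivity described here can be written out explicitly. In the sketch below, the calligraphic symbols and subscripts are notation assumed for illustration: $\mathcal{T}$, $\mathcal{J}$, and $\mathcal{D}$ stand for the information from the entire, immediate, and distant bundled causal histories, and $\mathcal{S}$, $\mathcal{R}$, $\mathcal{U}_1$, $\mathcal{U}_2$ for the synergistic, redundant, and two unique components.

```latex
% Hedged sketch of the component-wise additivity; notation is assumed, not recovered.
\begin{align*}
  \mathcal{J} &= \mathcal{S}_{\mathcal{J}} + \mathcal{R}_{\mathcal{J}}
                 + \mathcal{U}_{1,\mathcal{J}} + \mathcal{U}_{2,\mathcal{J}}, \\
  \mathcal{D} &= \mathcal{S}_{\mathcal{D}} + \mathcal{R}_{\mathcal{D}}
                 + \mathcal{U}_{1,\mathcal{D}} + \mathcal{U}_{2,\mathcal{D}}, \\
  \mathcal{T} &= \mathcal{J} + \mathcal{D}
               = \underbrace{\left(\mathcal{S}_{\mathcal{J}} + \mathcal{S}_{\mathcal{D}}\right)}_{\mathcal{S}}
               + \underbrace{\left(\mathcal{R}_{\mathcal{J}} + \mathcal{R}_{\mathcal{D}}\right)}_{\mathcal{R}}
               + \underbrace{\left(\mathcal{U}_{1,\mathcal{J}} + \mathcal{U}_{1,\mathcal{D}}\right)}_{\mathcal{U}_1}
               + \underbrace{\left(\mathcal{U}_{2,\mathcal{J}} + \mathcal{U}_{2,\mathcal{D}}\right)}_{\mathcal{U}_2}.
\end{align*}
```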
II.2 Two-Stage Dimensionality Reduction
Computing information flows in Equations (1)–(4) is infeasible due to the possibly infinite length of historical states involved, resulting in a joint probability density with infinite dimensions. To resolve this issue, we represent the temporal dependencies of the system as a Directed Acyclic Graph (DAG), where the dimensionality reduction is performed in the following two stages. First, we employ the probabilistic graphical model approach developed by Eichler and Runge Eichler (2012); Runge et al. (2012) that allows a reduction of the infinite historical states in the above equations into a finite set, by assuming the Markov property for the DAG. Then, a further dimension reduction is achieved by eliminating “redundant” edges from the DAG. This elimination is performed by using weighted transitive reduction Bosnacki et al. (2010) with momentary information transfer serving as the weights. This approach is called Momentary Information Weighted Transitive Reduction (MIWTR) Jiang and Kumar . The second stage of dimensionality reduction avoids the potentially high, albeit finite, cardinalities of the joint probability that remain in computing the information measures after the first round of reduction.
Stage 1: From infinite to finite cardinality: a probabilistic graphical model approach. Figure 2 illustrates the use of a DAG for time-series representation. The DAG, , includes a set of directed edges and a set of nodes connected by edges in . Every state in is represented by a node in , and a directed edge in connecting from an earlier node to a recent node (), , refers to the direct influence from to . An illustration of using the DAG for time-series to depict the temporal multivariate dynamics is shown in Figure 2a through a system consisting of seven components, . We consider as the target, as the first bundled causal history, and as the second bundled causal history. The nodes involved in the bundled causal histories are highlighted in blue for the immediate bundled causal history up to the partitioning time lag , and in orange for the remaining distant bundled causal history . The historical states outside of those two bundled sets in the system, , are denoted as gray nodes.
Estimation of the quantities in Equations (1) and (3), and therefore the corresponding Equations (2) and (4), is challenging because the condition set has a large dimension due to its potentially very long history. Therefore, to avoid the curse of dimensionality in computing Equations (1)–(4), the Markov property for the DAG for time-series, as developed by Lauritzen et al. Lauritzen et al. (1990), is assumed. Loosely speaking, the Markov property for the graphical model states that a node is independent of its non-descendants in given the knowledge of its parents, denoted by . By using the Markov property, the information flow from the entire (), the immediate (), and the distant () bundled causal histories in Equations (1) and (3) can be revised as (see Figure 2b):
[Equation (5)]
where is the parent set of all the nodes in the bundled causal history and the target ; is the intersection of the parents of the target, , and the immediate bundled causal history, ; and is the parent set of the immediate history belonging to the distant history. Figure 2b illustrates , , and in blue, orange, and gray nodes, respectively, in the seven-component system. Equation (5) states that while information from immediate and distant bundled causal histories is aggregated at and influencing , respectively, the conditioning on blocks the information from the remaining dynamics in the system, , on the interaction between and .
The usage of the Markov property successfully reduces the infinite nodes in immediate () and distant () bundled causal histories into two finite sets, and in Equation (5), respectively. However, the condition set in Equation (5) still contains possibly infinite nodes, making the computation infeasible. Therefore, we now further adopt two orders of approximations on , and explore the corresponding implications. At the zeroth order (Order-0), we assume the condition set to be empty, i.e., . This approach of not conditioning on the states in the remaining variables allows the information from to influence the estimation of the interaction between the target and the bundled causal history. At the first order (Order-1) approximation, the condition set is allowed to include the parents of the target in the remaining variables , i.e., , denoted as gray hatched nodes in Figure 2b. The Order-0 approximation mimics the idea of mutual information, which aims at capturing the shared dependency between and . On the other hand, the Order-1 approximation is consistent with the insight of transfer entropy Schreiber (2000), such that we prevent the influence of the states in the remaining system directly affecting the target, represented by , from characterizing the information flowing from the bundled causal history. Note that the simplification due to the Markov property in Equation (5) and the two approximations in the condition set and can also be used in computing the synergistic, redundant, and unique information in Equation (4).
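The two approximations of the condition set can be sketched on a toy graph. Everything in the snippet is an assumption made for illustration: nodes are `(variable, lag)` tuples, the DAG is a dictionary from a node to its set of parents, the variable names `"A"`, `"B"`, `"R"`, and `"Z"` are hypothetical, and the helper functions are not part of any published code.

```python
def condition_set(parents, target, bundled_vars, order):
    """Approximate condition set drawn from the remaining (non-bundled) variables.

    Order 0: empty set, i.e. no conditioning on the remaining variables.
    Order 1: parents of the target that belong to variables outside both bundles.
    """
    if order == 0:
        return set()
    if order == 1:
        return {node for node in parents.get(target, set()) if node[0] not in bundled_vars}
    raise ValueError("only Order-0 and Order-1 approximations are described in the text")


def target_parents_in_immediate_history(parents, target, bundled_vars, tau):
    """Parents of the target that lie in the bundled sets within lags 1..tau."""
    return {
        (var, lag)
        for (var, lag) in parents.get(target, set())
        if var in bundled_vars and 1 <= lag <= tau
    }


# Toy DAG: the target Z at lag 0 has parents in bundle A, bundle B, itself, and R.
target = ("Z", 0)
toy_parents = {target: {("A", 1), ("B", 2), ("Z", 1), ("R", 1)}}
bundles = {"A", "B"}

order0 = condition_set(toy_parents, target, bundles, order=0)
order1 = condition_set(toy_parents, target, bundles, order=1)
immediate_parents = target_parents_in_immediate_history(toy_parents, target, bundles, tau=1)
```

In this toy case the Order-1 set contains the target's own lagged state and the lagged state of `"R"`, which is what makes the Order-1 measure resemble transfer entropy.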
Stage 2: From high to low cardinality: the MIWTR approach. The cardinality of Equation (5) can be potentially high in a strongly interacting multivariate system, leading to higher uncertainty in the estimation of information measures. Here, we adopt a recently proposed Momentary Information Weighted Transitive Reduction approach Jiang and Kumar to further reduce the dimensionality of Equation (5) by simplifying the DAG. The basic idea of MIWTR is to first exclude any “redundant” edges connecting a node in with a node in the immediate history by using weighted transitive reduction (WTR), and then remove any node in that is no longer directly linked to the nodes in , thereby reducing the cardinality of . Here, the edge weight, representing the information flowing through the edge, is measured by momentary information transfer Runge et al. (2012), which quantifies the shared dependency between two linked nodes conditioned on their parents. In WTR, the “redundancy” of a directed edge linking two nodes, say to , is assessed according to the existence of an indirect path connecting and as well as the weights of the edges involved. That is, a directed edge, , is considered “redundant” and thus removed, if and only if there exists a path indirectly linking and and the minimum weight of all the edges in this indirect pathway is larger than that of . In other words, the existence of an indirect pathway, whose capacity of conveying information from to is stronger than the direct channel between the two nodes, makes the direct edge obsolete. More details of MIWTR can be found in Jiang and Kumar .
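The edge-removal rule stated above can be sketched as a widest-path (bottleneck) search. This is a minimal illustration under stated assumptions, not the MIWTR implementation: it applies the rule to every edge of a small weighted DAG rather than only to edges entering the immediate history, the weights are arbitrary positive numbers instead of momentary information transfer estimates, every edge is judged against the original graph, and all names are hypothetical.

```python
import heapq


def widest_indirect_capacity(edges, source, sink):
    """Largest bottleneck (minimum edge weight) over paths source -> sink
    that do not use the direct edge (source, sink). Returns 0.0 if none exists."""
    succ = {}
    for (u, v), w in edges.items():
        if (u, v) != (source, sink):
            succ.setdefault(u, []).append((v, w))
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap on capacity via negation
    while heap:
        neg_cap, node = heapq.heappop(heap)
        cap = -neg_cap
        if cap < best.get(node, 0.0):
            continue  # stale heap entry
        for nxt, w in succ.get(node, []):
            new_cap = min(cap, w)
            if new_cap > best.get(nxt, 0.0):
                best[nxt] = new_cap
                heapq.heappush(heap, (-new_cap, nxt))
    return best.get(sink, 0.0)


def weighted_transitive_reduction(edges):
    """Keep an edge unless some indirect path has a strictly larger bottleneck weight."""
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if widest_indirect_capacity(edges, u, v) <= w
    }


# Toy weighted DAG: a -> c is weaker than the indirect route a -> b -> c.
toy_edges = {
    ("a", "b"): 0.5,
    ("b", "c"): 0.4,
    ("a", "c"): 0.1,
    ("a", "d"): 0.3,
    ("d", "c"): 0.05,
}
reduced_edges = weighted_transitive_reduction(toy_edges)
```

Here the direct edge from `a` to `c` (weight 0.1) is dropped because the path through `b` has a bottleneck of 0.4, while all other edges are kept.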
III Application: Bundled Causal Interaction in Stream Chemistry Dynamics
We used this bundled causal history approach to analyze a set of published stream solute data Kirchner and Neal (2013) to understand how two groups consisting of cations and anions affect pH. The data were recorded every 7 h from March 2007 to January 2009 in the Upper Hafren catchment in the United Kingdom. The catchment, approximately 20 km from the western coast, is mainly covered by grassland over acidic soils. To investigate how different cations and anions jointly determine the pH level of the stream, we considered the cations {Na⁺, Al³⁺, Ca²⁺} as the first bundled set, the anions {Cl⁻, SO₄²⁻} as the second bundled set, and {pH, (the logarithm of flow rate)} as the remaining variables. Based on the observed data shown in Figure 3a, we constructed the DAG for time-series by using the Tigramite algorithm Runge et al. (2012); Runge (2015); Runge et al. (2015, 2017). In general terms, the algorithm first builds preliminary links between nodes by using a mutual-information-based independence test, and then removes any spurious links by using a CMI-based independence test that conditions on the parents of the two connected nodes. The resulting DAG is shown in Figure 3b, with the estimation methodology for the graph detailed in Jiang and Kumar (2019).
Based on Equation (5), the current state of the target pH, the parents of the target in the bundled causal history (), and the parents of the immediate bundled causal history () are denoted in black, blue, and orange colors, respectively, in Figure 3b. The Order-1 approximation of the condition set, , is colored in red (note that is an empty set). Figure 3b shows that consists of 23 nodes which results in high dimensionality of the condition set . Therefore, we reduce the dimensionality of using the MIWTR approach (see the Methodology Section for details). The reduced obtained by using MIWTR is shown in Figure 3c, where we see that the number of nodes in is reduced from 23 to 11.
We next computed the information flow from the entire (), immediate (), and distant () bundled causal histories in Equation (5) as well as their synergistic, redundant, and unique components in Equation (4) by using the k-nearest-neighbor (kNN) estimator Kraskov et al. (2004). The kNN estimator was employed in this study because it performs better on short datasets than other methods (e.g., the binning approach and kernel density estimation Khan et al. (2007); Walters-Williams and Li (2009)). To assess the sensitivity of choosing , we computed the information flow in Equations (4) and (5) with for both orders of approximations and for . The different information contents of the PID framework in Equation (4) were estimated using a rescaled approach for calculating the redundant information proposed in Goodwell and Kumar (2017) and also used in Jiang and Kumar (2018). The results are plotted in Figure 4. The figure shows that the information measures corresponding to different values generally capture similar patterns for each order of approximation. However, the limited data length (shorter than 2000) Kirchner and Neal (2013) and the data gaps shown in Figure 3a result in shorter usable data lengths as gets larger Jiang and Kumar (2019), and impede reliable estimation. The limitation in data leads to the large wiggles and peaks in the estimation for some values, such as the significant drop of in the Order-0 approximation when , and the spike in the Order-1 approximation when . This outcome points to the need for further research to investigate the reliability of information metrics estimation by using the kNN estimator under different data lengths and values. Here, we chose and 5 for the Order-0 and Order-1 approximations in , respectively, for illustration.
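To make the estimator concrete, the following is a minimal pure-Python sketch of the first Kraskov et al. nearest-neighbor estimator for the plain mutual information between two scalar series. It is an illustration under simplifying assumptions, not the code used for the results: it handles neither conditioning nor the PID components, it uses a brute-force neighbor search, and the synthetic Gaussian data, sample size, and function names are all hypothetical.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(x, y, k=5):
    """Nearest-neighbor estimate of I(X;Y) in nats for two scalar sample lists."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbor in the joint space (max norm).
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i)
        eps = dists[k - 1]
        # Count marginal neighbors strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Synthetic data (assumed): one dependent and one independent pairing.
random.seed(1)
n_samples = 400
x = [random.gauss(0, 1) for _ in range(n_samples)]
y_dependent = [xi + 0.5 * random.gauss(0, 1) for xi in x]
y_independent = [random.gauss(0, 1) for _ in range(n_samples)]

mi_dependent = ksg_mutual_information(x, y_dependent, k=5)
mi_independent = ksg_mutual_information(x, y_independent, k=5)
```

Rerunning such a sketch with different `k` and shorter sample lists gives a rough feel for how the estimate fluctuates with the neighbor count and the data length, which is the sensitivity discussed above.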
The estimated information components from immediate () and distant () bundled causal histories, over partitioning time lag from 5 to 150, are plotted in Figure 5a,b for the Order-0 and Order-1 approximations, respectively. In both approximations of , the information from earlier dynamics converges to a non-zero value with increasing (the area above the dotted black line). This illustrates the long-term dependence of pH on the selected cation and anion groups, which consists of both unique information ( and ) and synergistic information (). Moreover, in comparing Order-0 and Order-1 approximations, while larger values of information contents are expected in the Order-0 approximation (since there is no conditioning), the different proportions of this information in the two approximations reveal the effect of the Order-1 conditioning. When not conditioning on the dynamics from the remaining variables (i.e., and pH), Figure 5a shows that cations provide dominant unique information ( and ) to the current state of pH in both histories. This is because cations make up three of the overall four nodes in in Figure 3, thus dominating the contributions that affect pH. Nevertheless, there still exists a certain amount of redundant information from recent dynamics, , and synergy from both histories, and . This captures the overlapping and joint effects due to the dynamics of cation and anion concentrations. On the other hand, in the Order-1 approximation, given the knowledge of the historical states in the remaining variables directly affecting pH, that is, , we single out the information from the entire bundled causal history transferred only through . Preventing the impact from the remaining system on pH, through conditioning on , reduces the total information significantly, from 1.3 nats to 0.14 nats. In particular, the redundancy in the immediate bundled causal history, (which is mainly induced by the dependence of solutes on flow rate Jiang and Kumar (2019)), diminishes. This implies no overlapping influence from cations and anions on pH. Meanwhile, the synergistic effect from recent dynamics, , now occupies a much larger proportion of the total information, illustrating the interactive influence on pH due to the chemical interactions between cations and anions. This analysis, which illustrates a quantitative way to characterize the information guiding the current state of the stream pH level transferred from the selected cations and anions in the stream, can be generalized to other multivariate systems.
IV Conclusion
We present an information flow-based framework to characterize the joint influence from the evolutionary dynamics of two groups of variables on the present state of a target variable. Partitioning the total information into synergistic, redundant, and unique components helps delineate different information characteristics due to the two bundled sets. This framework was applied to observed stream chemistry datasets, and successfully showed the joint impacts of cations and anions on stream pH.
The proposed information measures are fundamentally different from the causal detection analysis and other existing information measures (see Figure 1). The causal detection techniques (e.g., transfer entropy Schreiber (2000)) are used for analyzing whether and how a source affects a target, or learning a pairwise interaction pattern. Meanwhile, the proposed bundled interaction analysis, rooted in multivariate analysis, aims to characterize the joint outcome of the evolutionary dynamics of two bundled source sets. This unique feature also allows the proposed measures to be distinct from other information measures (e.g., the previous causal history analysis framework Jiang and Kumar (2019)).
One key issue associated with most multivariate analyses is the curse of dimensionality, which impairs reliable estimation of the corresponding measures. Here, we propose a two-stage approach to reduce the dimensions of the information measures. Based on the DAG for time-series, the Markov property is first assumed to reduce the dimensions from infinite in Equation (3) to finite in Equation (5), and a further reduction is performed by simplifying the DAG by using weighted transitive reduction. Furthermore, while the reduced graph might also affect the computed dynamics, we assume that such impact is small compared with the estimation bias induced by the high dimensionality. This is especially true when the original graph is highly connected and many edges are removed using the two-stage approach, as in the stream chemistry example. The effectiveness of this approach was verified by the application to stream chemistry dynamics.
The proposed two orders of approximation of the influence from the rest of the system further illustrate such characterization under varying impacts from the remaining variables. Characterizing such information in both immediate and distant bundled causal histories, on the other hand, separates the influence due to recent dynamics from that due to the complementary earlier dynamics. In the stream chemistry application, the analysis shows that the influences of cations and anions on the dynamics of pH in the studied catchment come mainly from their synergistic effect and also from their individual impacts through unique information. This phenomenon is masked (see the top of Figure 5) when the dynamics of the remaining system (i.e., flow rate and earlier history of pH) are not conditioned on in the computation of the information measures in Equation (5) (the case of the Order-0 approximation on ).
With the increasing availability of observed time-series data, such a multivariate analysis framework opens new avenues for understanding the role of different groups of components in controlling the dynamics of a complex system.
Acknowledgements.
Funding support from the following NSF grants is acknowledged: EAR 1331906, ICER 1440315, EAR 1417444, and OAC 1835834. The directed acyclic graph for time-series of the stream chemistry example is estimated by using the Tigramite package Runge et al. (2012); Runge (2015); Runge et al. (2015, 2017). The code for conducting momentary information weighted transitive reduction in Fig. 3 and calculating the information flows in Fig. 5 is available at: https://github.com/HydroComplexity/CausalHistory.
References
- [1] Tononi and Edelman (1998): G. Tononi and G. M. Edelman, Science 282, 1846 (1998).
- [2] Kirchner and Neal (2013): J. W. Kirchner and C. Neal, Proceedings of the National Academy of Sciences 110, 12213 (2013).
- [3] Granger (1969): C. W. J. Granger, Econometrica 37, 424 (1969).
- [4] Pearl (1995): J. Pearl, Biometrika 82, 669 (1995).
- [5] Sugihara et al. (2012): G. Sugihara, R. May, H. Ye, C.-h. Hsieh, E. Deyle, M. Fogarty, and S. Munch, Science 338, 496 (2012).
- [6] Imbens and Rubin (2015): G. W. Imbens and D. B. Rubin, Causal Inference in Statistics, Social, and Biomedical Sciences (Cambridge University Press, 2015).
- [7] Schreiber (2000): T. Schreiber, Phys. Rev. Lett. 85, 461 (2000).
- [8] Runge et al. (2012): J. Runge, J. Heitzig, V. Petoukhov, and J. Kurths, Phys. Rev. Lett. 108, 258701 (2012).
