Estimation of Information Flow-Based Causality with Coarsely Sampled Time Series
X. San Liang

TL;DR
This paper introduces a new method for analyzing causality in time series data sampled at low frequencies, using Lie groups instead of Lie algebras.
Contribution
A novel approach using Lie groups to improve causality estimation in coarsely sampled nonlinear systems.
Findings
The new method works well for linear systems and reduces bias in nonlinear systems with low sampling rates.
The approach was successfully tested on coupled Rössler oscillators, even when they were nearly synchronized.
The method relies on sample covariances and avoids complex differential equations.
Abstract
The past decade has seen growing applications of the information flow-based causality analysis, particularly with the concise formula of its maximum likelihood estimator. At present, the algorithm for its estimation is based on differential dynamical systems, which, however, may raise an issue for coarsely sampled time series. Here, we show that, for linear systems, this is suitable at least qualitatively, but, for highly nonlinear systems, the bias increases significantly as the sampling frequency is reduced. This study provides a partial solution to this problem, showing how causality analysis can be made faithful with coarsely sampled series, provided that the statistics are sufficient. The key point here is that, instead of working with a Lie algebra, we turn to work with its corresponding Lie group. An explicit and concise formula is obtained, with only sample covariances involved.…
- National Science Foundation of China
- Shanghai Institute for Mathematics and Interdisciplinary Sciences
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai)
- Fudan University
Taxonomy
Topics: Functional Brain Connectivity Studies · Model Reduction and Neural Networks · Chaos control and synchronization
1. Introduction
Causality analysis is a fundamental task in scientific research. Though traditionally formulated as a statistical problem (see, for example, the classics by Granger [1], Pearl [2], and Rubin [3,4]) in data science and computer science, among other disciplines, formalisms within the framework of dynamical systems have also been established; refer to a focus issue of Chaos [5] for relevant references. Particularly, in terms of information flow/transfer, it has been argued that causality is “a real notion in physics that can be derived ab initio” [6]. A comprehensive study with generic systems has been conducted, and explicit formulas have been attained in closed form [6,7] with the aid of the Frobenius–Perron operator (e.g., [8]), a technique also exploited in similar studies (e.g., [9]). These formulas have been validated with benchmark systems such as the baker transformation, Hénon map, etc., and have been applied successfully to real-world problems in diverse disciplines, such as global climate change, dynamic meteorology, land–atmosphere interaction, data-driven prediction, near-wall turbulence, neuroscience, financial analysis, and quantum information; see [10] for a brief review of the applications and [11,12,13,14,15,16] for some updates.
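For the purpose of this study, we first give a brief introduction to the theory within the framework of a differential dynamical system (also available for discrete-time mappings; see [6]). Let
$$\frac{d\mathbf{X}}{dt} = \mathbf{F}(\mathbf{X}, t) + \mathbf{B}(\mathbf{X}, t)\,\dot{\mathbf{W}} \tag{1}$$
be a d-dimensional continuous-time stochastic system for $\mathbf{X} = (X_1, \ldots, X_d)$ (we do not distinguish notations for random and deterministic variables), where $\mathbf{F} = (F_1, \ldots, F_d)$ may be arbitrary nonlinear differentiable functions of $\mathbf{X}$ and t, $\dot{\mathbf{W}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any differentiable functions of $\mathbf{X}$ and t. This setting, particularly the setting in the absence of stochasticity, avails us of the arsenal from physics in approaching the problem. For example, the concept of symmetry has been found to play an important role [17]. Using the Frobenius–Perron operator, Liang (2016) [6] proved that the rate of information flowing from $X_j$ to $X_i$ (in nats per unit time) is
$$T_{j\to i} = -E\left[\frac{1}{\rho_i}\int_{\mathbb{R}^{d-2}} \frac{\partial (F_i \rho_{\bar j})}{\partial x_i}\, d\mathbf{x}_{\bar i \bar j}\right] + \frac{1}{2} E\left[\frac{1}{\rho_i}\int_{\mathbb{R}^{d-2}} \frac{\partial^2 (g_{ii} \rho_{\bar j})}{\partial x_i^2}\, d\mathbf{x}_{\bar i \bar j}\right] = -\int_{\mathbb{R}^{d}} \rho_{j|i}(x_j | x_i) \frac{\partial (F_i \rho_{\bar j})}{\partial x_i}\, d\mathbf{x} + \frac{1}{2} \int_{\mathbb{R}^{d}} \rho_{j|i}(x_j | x_i) \frac{\partial^2 (g_{ii} \rho_{\bar j})}{\partial x_i^2}\, d\mathbf{x}, \tag{2}$$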
where $d\mathbf{x}_{\bar i \bar j}$ signifies $dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_{j-1}\, dx_{j+1} \cdots dx_d$, E stands for mathematical expectation, $g_{ii} = \sum_{k=1}^{d} b_{ik} b_{ik}$, $\rho_i = \rho_i(x_i)$ is the marginal probability density function (pdf) of $X_i$, $\rho_{j|i}$ is the pdf of $X_j$ conditioned on $X_i$, and $\rho_{\bar j} = \int_{\mathbb{R}} \rho(\mathbf{x})\, dx_j$. The algorithm for the information flow-based causal inference is as follows: if $T_{j\to i} = 0$, then $X_j$ is not causal to $X_i$; otherwise, it is causal, and the absolute value measures the magnitude of the causality from $X_j$ to $X_i$. This is guaranteed by a property called the “principle of nil causality”. Another property, as proven by Liang (2018) (see [10]), regards the invariance upon coordinate transformation, indicating that the obtained information flow (IF) is an intrinsic property in nature. Also established is that [6], for a linear model, i.e., for $\mathbf{F} = \mathbf{f} + \mathbf{A}\mathbf{X}$ with the vector $\mathbf{f}$ and the matrices $\mathbf{A} = (a_{ij})$ and $\mathbf{B}$ being constant in (1), the information flow rate from $X_j$ to $X_i$ is
$$T_{j\to i} = a_{ij}\,\frac{\sigma_{ij}}{\sigma_{ii}},$$
where $\sigma_{ij}$ is the population covariance of $X_i$ and $X_j$. By this, it follows that, in the linear sense, causation implies correlation, but not vice versa. This corollary gives an explicit mathematical expression to the debate on causation vs. correlation that has continued ever since George Berkeley (1710) [18].
In the case with only d time series $X_1, X_2, \ldots, X_d$, the quantitative causality, i.e., the IF, between them can be estimated using maximum likelihood estimation (see [19,20]). Under the assumption of a linear system with additive noise, the maximum likelihood estimator (mle) of (2) for $T_{2\to 1}$ is [20]
$$\hat{T}_{2\to 1} = \frac{1}{\det \mathbf{C}} \cdot \sum_{j=1}^{d} \Delta_{2j} C_{j,d1} \cdot \frac{C_{12}}{C_{11}}, \tag{3}$$
where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, $\Delta_{ij}$ are the cofactors of the matrix $\mathbf{C} = (C_{ij})$, and $C_{j,d1}$ is the sample covariance between $X_j$ and a series derived from $X_1$ using the Euler forward differencing scheme: $\dot{X}_{1,n} = (X_{1,n+k} - X_{1,n})/(k\Delta t)$, with $k \ge 1$ being some integer. Equation (3) is rather concise in form, involving only the common statistics, i.e., sample covariances. The transparent formula makes causality analysis, which otherwise would be complicated, very easy and computationally efficient. Note, however, that Equation (3) cannot replace Equation (2); it is merely the maximum likelihood estimator of the latter. Statistical significance tests can be performed for the estimators. This is achieved with the aid of a Fisher information matrix. See [19,20] for details.
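As a minimal sketch of how an estimator of the form of Equation (3) might be coded, the function below computes the Euler-differenced series, the sample covariances, and the resulting IF estimate. The function name, argument names, and the zero-based `target`/`source` indices are hypothetical choices for illustration; the cofactor sum divided by det C is evaluated here by solving a linear system, which is algebraically equivalent because C is symmetric.

```python
import numpy as np

def if_rate_euler(series, dt, target=0, source=1, k=1):
    """Sketch of an Eq. (3)-style estimate of the IF rate from `source` to `target`.

    series : array of shape (N, d); column j holds one time series.
    dt     : time step between consecutive rows.
    k      : lag (in steps) used in the Euler forward difference.
    """
    x = np.asarray(series, dtype=float)
    base = x[:-k]                                        # X(n), n = 1..N-k
    dx = (x[k:, target] - base[:, target]) / (k * dt)    # Euler forward difference

    C = np.cov(base, rowvar=False)                       # sample covariances C_ij
    centered = base - base.mean(axis=0)
    # c_d[j] = sample covariance between X_j and the differenced target series
    c_d = centered.T @ (dx - dx.mean()) / (len(base) - 1)

    # (1/det C) * sum_j Delta_{sj} c_d[j] is entry `source` of C^{-1} c_d
    a_hat = np.linalg.solve(C, c_d)
    return a_hat[source] * C[target, source] / C[target, target]
```

With the defaults `target=0, source=1`, the returned value plays the role of the estimate of T_{2→1}; swapping the two indices gives the opposite direction.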
Originally, the formalism was established within the framework of a differential system; in other words, it assumes infinitesimal time increments. (The formalism with discrete mappings was also established by Liang (2016) [6], but no estimation has been made so far.) A question naturally arises about its applicability to coarsely sampled time series. Indeed, it is not unusual that the given series are coarsely sampled due to limited observations. As will be seen in the following section, this may pose a problem for nonlinear systems if the sampling interval is large. This paper therefore attempts to address this issue within the original linear framework. In the following, we first check the applicability of (3) for series from a linear system and a highly nonlinear system (Section 2), with a variety of sampling intervals. A new approach is presented in Section 3, which is then used in Section 4 to redo the causal inferences of Section 2. Some remaining issues are discussed in Section 5.
2. The Issue with Coarsely Sampled Series
2.1. Time Series from Linear Systems
We first test the applicability of (3), as the sampling interval increases, with a well-studied linear system whose IF rates have been found half-analytically. This is the validation example in [19]:
$$\frac{dX_1}{dt} = -X_1 + 0.5 X_2 + 0.1\,\dot{W}_1, \qquad \frac{dX_2}{dt} = -X_2 + 0.1\,\dot{W}_2,$$
where $\dot{W}_1$, $\dot{W}_2$ are components of independent white noise. It has been shown that the rates of information flow per unit time are $T_{2\to 1} \to 0.11$ as $t \to \infty$, and $T_{1\to 2} = 0$ for all t, reflecting accurately the one-way causality from $X_2$ to $X_1$. Now, using the same sample path as that in Liang (2014) [19], we resample the series with low frequencies to obtain new series. (Note that, because of the pseudorandom number generator, the generated sample path using the normal differencing scheme may not be satisfactory. To see whether the obtained sample path is correct, one may check the resulting covariances, which can be rather accurately obtained by solving a deterministic ODE. Here, the data for the generated sample path can be downloaded from http://www.ncoads.org/article/show/68.aspx under the item PRE_2014.dat (accessed on 1 January 2020).) Shown in Figure 1 is part of the sample path, with triangles marking the sampling points.
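The remark that the covariances can be obtained by solving a deterministic ODE can be illustrated with a short sketch. For a linear system with additive noise, the covariance matrix Σ evolves as dΣ/dt = AΣ + ΣAᵀ + BBᵀ, and the linear IF rate follows from a_ij σ_ij/σ_ii. The drift matrix `A`, the noise amplitude matrix `B`, the step size, and the integration length below are assumptions chosen for illustration, not values confirmed by the text.

```python
import numpy as np

# Assumed coefficients for illustration only: drift matrix A, noise amplitudes B.
A = np.array([[-1.0, 0.5],
              [0.0, -1.0]])
B = 0.1 * np.eye(2)

def stationary_covariance(A, B, dt=1e-3, t_end=30.0):
    """Integrate dSigma/dt = A Sigma + Sigma A^T + B B^T from Sigma(0) = 0
    with forward Euler steps until t_end."""
    sigma = np.zeros_like(A)
    Q = B @ B.T
    for _ in range(int(round(t_end / dt))):
        sigma = sigma + dt * (A @ sigma + sigma @ A.T + Q)
    return sigma

def linear_if_rate(A, sigma, target, source):
    """Linear information flow rate a_ij * sigma_ij / sigma_ii (zero-based indices)."""
    return A[target, source] * sigma[target, source] / sigma[target, target]

sigma = stationary_covariance(A, B)
T_21 = linear_if_rate(A, sigma, target=0, source=1)   # flow from X2 to X1
T_12 = linear_if_rate(A, sigma, target=1, source=0)   # flow from X1 to X2
```

Under these assumed coefficients, `T_21` settles near 1/9 (about 0.11 nats per unit time), while `T_12` vanishes because the corresponding drift coefficient is zero. Comparing `sigma` with the sample covariances of a simulated path is one way to check the path.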
The computed IFs for different sampling intervals (SIs) are listed in Table 1. We also compute the confidence intervals at a level of 90% (i.e., at a significance level of 0.1). First, the estimators $\hat{T}_{2\to 1}$ for all SIs here are significantly different from zero, while those in the opposite direction, $\hat{T}_{1\to 2}$, are not significant at a level of 90%. Thus, the causality in a qualitative sense can be faithfully recovered even with very low sampling frequencies (large SIs). (In fact, even with SI = 1000, the result is still correct; we do not consider cases beyond SI = 500 since the sample size is too small for SI > 500, resulting in insufficient statistics.)
Since this example actually has a half-analytical solution ($T_{2\to 1} \approx 0.11$, $T_{1\to 2} = 0$), further conclusions can be made about the computed results. Generally, the result of $\hat{T}_{1\to 2}$ appears satisfactory. For $\hat{T}_{2\to 1}$, it is rather accurate for SI ≤ 100. Beyond 100, it is no longer accurate.
2.2. Time Series from Synchronized Chaotic Oscillators
The following example is from the synchronization problem as examined by Paluš et al. [21]. The system is composed of two Rössler oscillators, and , where
is the master system, and
is the driven one. Following [21], we choose and . Using the Runge–Kutta scheme and choosing a time step , the coupled six-dimensional system can be solved rather accurately with different . Figure 2 plots the solutions of and when the coupling strength (upper panel) and (lower panel). As shown in the latter case, the two subsystems become synchronized if . Again, we choose to study the problem for ( time steps in total).
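For readers who want to reproduce this kind of experiment, the following is a minimal sketch of a fourth-order Runge–Kutta integration of two unidirectionally coupled Rössler oscillators, followed by subsampling. The specific form of the equations, the parameter values (`w1`, `w2`, `a`, `b`, `c`), the initial state, the time step, and the coupling strength `eps` are all assumptions in the spirit of the commonly used master/driven setup; they are not taken from the text above.

```python
import numpy as np

def coupled_roessler(state, eps, w1=1.015, w2=0.985, a=0.15, b=0.2, c=10.0):
    """Right-hand side of an assumed master (x) / driven (y) Roessler pair."""
    x1, x2, x3, y1, y2, y3 = state
    return np.array([
        -w1 * x2 - x3,
        w1 * x1 + a * x2,
        b + x3 * (x1 - c),
        -w2 * y2 - y3 + eps * (x1 - y1),   # one-way coupling: x drives y
        w2 * y1 + a * y2,
        b + y3 * (y1 - c),
    ])

def rk4_integrate(f, state0, dt, n_steps, **params):
    """Classical fourth-order Runge-Kutta; returns an array of shape (n_steps + 1, dim)."""
    s = np.asarray(state0, dtype=float)
    traj = np.empty((n_steps + 1, s.size))
    traj[0] = s
    for n in range(n_steps):
        k1 = f(s, **params)
        k2 = f(s + 0.5 * dt * k1, **params)
        k3 = f(s + 0.5 * dt * k2, **params)
        k4 = f(s + dt * k3, **params)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[n + 1] = s
    return traj

# Hypothetical run: short integration, then keep every 50th point (SI = 50).
traj = rk4_integrate(coupled_roessler, [1.0, 1.0, 0.0, -1.0, 0.5, 0.0],
                     dt=0.01, n_steps=5000, eps=0.1)
coarse = traj[::50]
```

The rows of `coarse` form a coarsely sampled six-dimensional series whose effective time step is 50 times the integration step, which is the quantity that should be passed to an IF estimator.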
For each , we generate six time series of steps and evaluate the IFs according to Equation (3) ( is chosen). The IFs as functions of are then obtained, and they are plotted in Figure 3a, which accurately shows that the master is , and is the slave. An interesting observation is that this causality inference works even when the two oscillators are nearly synchronized as , demonstrating the power of this rigorously formulated causality analysis.
We subsample the series at every SI step, SI = 10, 50, 100, 300, 500, and reconduct the computation using the same scheme. The resulting IF rates are shown in Figure 3b–f, respectively. With the preset causality, the dashed line should be the zero line. Clearly, the causal inference works well for SI ≤ 10. The computed IF becomes biased for SI ≥ 50, and the bias grows significantly as the SI increases. If we focus on , i.e., when the systems are not synchronized (see [21]), the causal inference still functions well for SI ≤ 50. If the synchronized cases are taken into account ( > 0.15), then the inferences in the cases for are significantly biased, and those for SIs exceeding 300, corresponding to an approximate sampling frequency of 20 per period, are no longer correct.
3. Approaching a Partial Solution
As shown above, if the sampling frequency of the time series is low, the resulting linear IF for nonlinear series may be biased. Indeed, in the case with high nonlinearity, the linear assumption is always easy to blame. While theoretically it is not a problem (causality is guaranteed, as proven in a theorem), we agree that, before a fully nonlinear algorithm is developed, this will be a continuing issue. What we want to show here is how much room there is for improvement. At present, the algorithm documented in [19], and later in [20], is based on the Euler–Bernstein differencing scheme, which is rudimentary because the differencing is only first-order accurate. If a time series is coarsely sampled, the error could be large.
A theorem established by Liang (2008) [7] reads that, if the noise is additive in Equation (1), i.e., if $\mathbf{B}$ is a constant matrix, then the noise itself does not appear in the formula of $T_{j\to i}$. Thus, under the additive noise assumption, we can estimate the IF within the framework of a deterministic system. In this case, note that the linear equation set can be solved for an interval $\Delta t$, regardless of the size of $\Delta t$. This gives insight regarding a solution to the low sampling frequency problem.
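Consider
$$\frac{d\mathbf{X}}{dt} = \mathbf{f} + \mathbf{A}\mathbf{X},$$
where $\mathbf{A} = (a_{ij})$ is a $d \times d$ matrix and $\mathbf{f}$ is a constant vector. Let us assume that $\mathbf{f} = 0$, since the time series can always be pretreated by removing the linear trend, and it has been proven that this removal does not alter the IF rates. In this case, on the interval $[t, t + \Delta t]$, we actually have a mapping that takes the state $\mathbf{X}(t)$ to the state at $t + \Delta t$, with the propagating operator
$$\mathbf{G} = e^{\mathbf{A}\Delta t}.$$
It is not easy to estimate $\mathbf{A}$ directly, but it is easy to estimate $\mathbf{G}$ instead, by observing the relation
$$\mathbf{X}(n+1) = \mathbf{G}\,\mathbf{X}(n), \qquad n = 1, \ldots, N.$$
This is written in matrix form as
$$\begin{pmatrix} X_1(1) & \cdots & X_d(1) \\ \vdots & & \vdots \\ X_1(N) & \cdots & X_d(N) \end{pmatrix} \begin{pmatrix} g_{i1} \\ \vdots \\ g_{id} \end{pmatrix} = \begin{pmatrix} X_i(2) \\ \vdots \\ X_i(N+1) \end{pmatrix}$$
for each row $(g_{i1}, \ldots, g_{id})$ of $\mathbf{G} = (g_{ij})$. Averaging all rows of the algebraic equation set, and subtracting the mean from each row, we get
$$X'_{i+}(n) = \sum_{j=1}^{d} g_{ij}\, X'_j(n), \qquad n = 1, \ldots, N,$$
where $X'_j(n) = X_j(n) - \bar{X}_j$, $X'_{i+}(n) = X_i(n+1) - \bar{X}_{i+}$, i.e., the series $X_{i+}$ is the series $X_i$ advanced by one step. Let i run through $\{1, \ldots, d\}$. We have the following d overdetermined equation sets:
$$\begin{pmatrix} X'_1(1) & \cdots & X'_d(1) \\ \vdots & & \vdots \\ X'_1(N) & \cdots & X'_d(N) \end{pmatrix} \begin{pmatrix} g_{i1} \\ \vdots \\ g_{id} \end{pmatrix} = \begin{pmatrix} X'_{i+}(1) \\ \vdots \\ X'_{i+}(N) \end{pmatrix}, \qquad i = 1, \ldots, d. \tag{10}$$
Denote by $\mathcal{X}$ the $N \times d$ matrix $(X'_j(n))$; then, the matrix of unknowns in the above equation sets is $\mathbf{G}^T$. Left multiplication by $\mathcal{X}^T$ on both sides yields d equation sets:
$$\mathbf{C}\,\mathbf{g}_i = \tilde{\mathbf{c}}_i, \qquad i = 1, \ldots, d, \tag{11}$$
where $\mathbf{C}$ is the sample covariance matrix of $\mathbf{X}$, $\mathbf{g}_i = (g_{i1}, \ldots, g_{id})^T$, $\tilde{\mathbf{c}}_i = (C_{1,i+}, \ldots, C_{d,i+})^T$, and $C_{j,i+}$ is the sample covariance between $X_j$ and $X_{i+}$, i.e., $X_i$ advanced by one time step. The least-square solutions of the overdetermined sets (10) are the solutions of (11),
$$\mathbf{g}_i = \mathbf{C}^{-1}\,\tilde{\mathbf{c}}_i,$$
and hence, with $\tilde{\mathbf{C}}$ denoting the matrix whose $(i, j)$ entry is $C_{i,j+}$ (so that $\tilde{\mathbf{c}}_i$ is its ith column),
$$\mathbf{G}^T = \mathbf{C}^{-1}\tilde{\mathbf{C}}, \qquad \text{i.e.,} \qquad \mathbf{G} = \tilde{\mathbf{C}}^T \mathbf{C}^{-1}.$$
The estimator of $\mathbf{A}$ is, therefore,
$$\hat{\mathbf{A}} = \frac{1}{\Delta t} \log\left(\tilde{\mathbf{C}}^T \mathbf{C}^{-1}\right).$$
(Caution should be used in cases of singularity. The irrelevant imaginary part also should be discarded.)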
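The chain of steps above (center the series, form the covariances, solve for the propagator, take the matrix logarithm) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the matrix logarithm is computed through an eigendecomposition so that the sketch depends on NumPy alone, which presumes the estimated propagator is diagonalizable; a general-purpose routine such as `scipy.linalg.logm` would be the more robust choice.

```python
import numpy as np

def matrix_function(M, f):
    """Apply a scalar function f to a diagonalizable matrix via eigendecomposition."""
    vals, vecs = np.linalg.eig(M)
    return vecs @ np.diag(f(vals.astype(complex))) @ np.linalg.inv(vecs)

def _centered_pairs(series):
    """Return centered X(n) and centered X(n+1) as arrays of shape (N, d)."""
    x = np.asarray(series, dtype=float)
    x0, x1 = x[:-1], x[1:]
    return x0 - x0.mean(axis=0), x1 - x1.mean(axis=0)

def estimate_generator(series, dt):
    """Estimate the propagator G and the generator A = log(G) / dt from covariances."""
    x0c, x1c = _centered_pairs(series)
    n = len(x0c)
    C = x0c.T @ x0c / (n - 1)          # sample covariance matrix
    C_adv = x0c.T @ x1c / (n - 1)      # entry (i, j): cov(X_i, X_j advanced one step)
    G = np.linalg.solve(C, C_adv).T    # G^T = C^{-1} C_adv
    A = matrix_function(G, np.log) / dt
    return A.real, G                   # spurious imaginary part discarded

def estimate_propagator_lstsq(series):
    """Same propagator obtained directly as a least-squares solution."""
    x0c, x1c = _centered_pairs(series)
    return np.linalg.lstsq(x0c, x1c, rcond=None)[0].T
```

Both routes should agree up to round-off, since the covariance equations are the normal equations of the least-squares problem. For data generated exactly by a linear map, the generator is recovered regardless of how large `dt` is, which is the point of working with the propagator.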
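Once we have obtained $\hat{\mathbf{A}}$, and hence the coefficients $\hat{a}_{ij}$, we substitute $\hat{a}_{12}$ for the whole part
$$\frac{1}{\det \mathbf{C}} \cdot \sum_{j=1}^{d} \Delta_{2j} C_{j,d1}$$
in Equation (3), i.e., we multiply $\hat{a}_{12}$ by $C_{12}/C_{11}$ to arrive at the desideratum, $\hat{T}_{2\to 1}$. If we denote by $[\,\cdot\,]_{ij}$ the extraction of the $(i, j)$ entry of the matrix, this is
$$\hat{T}_{2\to 1} = \frac{1}{\Delta t} \left[\log\left(\tilde{\mathbf{C}}^T \mathbf{C}^{-1}\right)\right]_{12} \cdot \frac{C_{12}}{C_{11}}. \tag{13}$$
(Note here that log is the matrix logarithm. In MATLAB, the function is logm.)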
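To make the contrast between the two estimators concrete, the sketch below implements a propagator-based estimate alongside a forward-difference estimate and applies both to an idealized, noise-free linear series sampled with a deliberately large step. The system matrix `A_true`, the step `dt`, the initial state, and all function names are hypothetical; the eigendecomposition-based logarithm again assumes a diagonalizable propagator.

```python
import numpy as np

def _funm(M, f):
    """Scalar function of a diagonalizable matrix via eigendecomposition."""
    vals, vecs = np.linalg.eig(M)
    return vecs @ np.diag(f(vals.astype(complex))) @ np.linalg.inv(vecs)

def _cov(x, y):
    """Sample cross-covariance matrix between the columns of x and of y."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    return xc.T @ yc / (len(x) - 1)

def if_rate_propagator(series, dt, target=0, source=1):
    """IF estimate using the matrix logarithm of the estimated propagator."""
    x = np.asarray(series, dtype=float)
    x0, x1 = x[:-1], x[1:]
    C = _cov(x0, x0)
    G = np.linalg.solve(C, _cov(x0, x1)).T       # C_adv^T C^{-1}
    A_hat = (_funm(G, np.log) / dt).real         # discard spurious imaginary part
    return A_hat[target, source] * C[target, source] / C[target, target]

def if_rate_euler(series, dt, target=0, source=1):
    """IF estimate using first-order forward differencing (lag of one step)."""
    x = np.asarray(series, dtype=float)
    x0 = x[:-1]
    dx = (x[1:] - x0)[:, [target]] / dt
    C = _cov(x0, x0)
    a_hat = np.linalg.solve(C, _cov(x0, dx)[:, 0])
    return a_hat[source] * C[target, source] / C[target, target]

# Idealized check: exact linear map with a coarse step (all values hypothetical).
A_true = np.array([[-1.0, 0.5], [0.0, -2.0]])
dt = 0.5
G_true = _funm(A_true * dt, np.exp).real
x = np.empty((31, 2))
x[0] = [1.0, 1.0]
for n in range(30):
    x[n + 1] = G_true @ x[n]

t_new = if_rate_propagator(x, dt)
t_old = if_rate_euler(x, dt)
```

In this noise-free setting, `t_new` coincides with a_12 C_12/C_11 computed from the true coefficient, whereas `t_old` is biased low because forward differencing effectively replaces A by (G − I)/Δt. Real series are noisy and nonlinear, so this only isolates the discretization effect.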
4. The Coarsely Sampled Series Problem Revisited
As demonstrated above, for series generated from linear systems, the estimation of the IF is satisfactory qualitatively. Nevertheless, we want to examine how the new scheme may improve the results. Shown in Table 2 is a recalculation of the estimates. Since this case has a ground truth (half analytical) (≈0.11 nats per unit time), we can see that the result is accurate enough for all SIs.
The new scheme for the estimation is particularly satisfactory for the nonlinear case. For the pair of Rössler oscillators, the computed results are plotted in Figure 4. Compared to Figure 3, it can be seen that the performance is significantly improved. Consider the cases with SI ≤ 100 first. To see the improvement more clearly, we introduce a ratio to measure the performance of the one-way causal inference; the smaller the value, the more accurate the result, with taken as insignificant. By this standard, the cases with SI ≤ 10 (Figure 3a,b) appear satisfactory, but the causal relations as shown in Figure 3c,d are inaccurate or even incorrect. Specifically, for , i.e., when the system is synchronized, reaches and for the cases of SI = 50 and SI = 100, respectively. For , the inference appears satisfactory, albeit inaccurate. However, at , , the inference is incorrect. In contrast, the inferred causalities in Figure 4a–d are rather accurate for the coupling strengths considered (both synchronized and nonsynchronized), with all the r’s being insignificant.
For the case SI = 300, which corresponds to a sampling frequency of 20 points per period, the one-way causality is accurately recovered for the nonsynchronized cases ( ). However, beyond this, i.e., , the inference fails. In particular, when SI = 500 (Figure 4f), the result is even worse than its counterpart obtained with the traditional scheme as plotted in Figure 3f. This may be due to the resulting small ensemble size, which makes the argument of the matrix logarithm nearly singular. Consider the extreme limit when the series are completely synchronized. In this case, it is not possible to determine whether there exists a causal relation or not, as the series are identical and cannot be distinguished. Correspondingly, this means a singular covariance matrix. Now, in the above problem, the series are nearly synchronized. Given the length, the size of the ensemble thus formed reduces as the sampling interval increases. This tends to increase the condition number of the resulting covariance matrix. Whenever a matrix is ill-conditioned, its logarithm is very sensitive, leading to large errors in the subsequent computation.
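The conditioning argument can be illustrated with a small synthetic experiment. The sketch below builds two series that differ only by a perturbation of relative size `delta` (a stand-in for near synchronization, not the Rössler data) and reports the condition number of their sample covariance matrix. The function name, sample size, and seed are hypothetical.

```python
import numpy as np

def covariance_condition(delta, n=2000, seed=0):
    """Condition number of the 2 x 2 sample covariance of two nearly identical series."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = x1 + delta * rng.standard_normal(n)   # delta -> 0 mimics synchronization
    C = np.cov(np.column_stack([x1, x2]), rowvar=False)
    return np.linalg.cond(C)

# Condition numbers for progressively more similar series.
conds = {delta: covariance_condition(delta) for delta in (1.0, 0.1, 0.01)}
```

The condition number grows roughly like 1/delta², so inverting the covariance matrix, and any matrix logarithm computed from it, amplifies sampling errors more and more as the two series approach each other.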
Nonetheless, the success in applying the linear estimator to such a highly nonlinear system is remarkable. The reason for this is probably the same as that for linearization; that is to say, on a small interval, a linearized system can provide a good approximation to an otherwise nonlinear system. While its applicability is yet to be proven, or investigated with more nonlinear problems, by our experience, it indeed works well, provided that the statistics are sufficient (stochastic systems or chaotic deterministic systems) and the sampling interval is not too large.
5. Discussion
The maximum likelihood estimator of the information flow (IF), i.e., Equation (3), provides an easy way toward causal inference. Theoretically, it is based on a linear assumption, but, practically, it has shown considerable success with series generated from highly nonlinear systems; after all, linearization piecewise in time proves to be an efficient approximation to an otherwise nonlinear system. In reality, series may be coarsely sampled, i.e., the time resolution may be low. An issue thus arises, as this formalism is theoretically based on infinitesimal time increments. In this case, as we have shown, it still works for linear systems in a qualitative sense; however, for a highly nonlinear system composed of two Rössler oscillators, the bias becomes increasingly significant as the sampling frequency is reduced.
A new scheme has been proposed to address this problem and provide a partial solution. Due to a property of the IF, as proven in [7], that additive noise does not alter the IF in form, it is reasonable to directly estimate the IF without paying attention to the stochasticity. Instead of estimating through the vector field using Euler–Bernstein differencing, we choose to consider the propagator on the finite time interval, i.e., to estimate the Lie group members. In doing so, the original Formula (3), which is rewritten here for ease of reference,
$$\hat{T}_{2\to 1} = \frac{1}{\det \mathbf{C}} \cdot \sum_{j=1}^{d} \Delta_{2j} C_{j,d1} \cdot \frac{C_{12}}{C_{11}},$$
is replaced with (13),
$$\hat{T}_{2\to 1} = \frac{1}{\Delta t} \left[\log\left(\tilde{\mathbf{C}}^T \mathbf{C}^{-1}\right)\right]_{12} \cdot \frac{C_{12}}{C_{11}},$$
where $\tilde{\mathbf{C}} = (C_{i,j+})$, and $C_{i,j+}$ is the sample covariance between $X_i$ and $X_{j+}$, i.e., $X_j$ advanced by one time step. Note that here log is the matrix logarithm; in MATLAB (Version 23.2 or earlier), the function is logm. (The spurious imaginary part, if it arises, should be discarded.) In this way, the preset causality within the coupled system of chaotic oscillators can be rather accurately reproduced even when the sampling interval is large (sampling frequency is low). For convenience, this is summarized as an algorithm (Algorithm 1).

Algorithm 1: Causal inference through information flow estimation
  input: d time series $X_1, \ldots, X_d$ and time step size $\Delta t$
  output: a causal graph and IFs along edges
  evaluate the covariance matrices $\mathbf{C}$ and $\tilde{\mathbf{C}}$;
  compute $\hat{\mathbf{A}} = \frac{1}{\Delta t} \log(\tilde{\mathbf{C}}^T \mathbf{C}^{-1})$;
  for each pair $(i, j)$ with $i \ne j$ do
    compute $\hat{T}_{j\to i}$ by (13), i.e., $\hat{T}_{j\to i} = \hat{a}_{ij}\, C_{ij} / C_{ii}$
  end
  return the causal graph, together with the IFs
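A compact sketch of how Algorithm 1 might look in code is given below. It returns the full matrix of estimated IF rates and then lists the edges whose magnitude exceeds a user-supplied threshold. The function names and the plain `threshold` argument are assumptions made for illustration; in practice the edge selection would rest on the significance tests of [19,20], which are not implemented here, and the eigendecomposition-based logarithm assumes a diagonalizable propagator.

```python
import numpy as np

def information_flow_matrix(series, dt):
    """Return T with T[i, j] = estimated IF rate from X_j to X_i (diagonal set to NaN)."""
    x = np.asarray(series, dtype=float)
    x0, x1 = x[:-1], x[1:]
    x0c = x0 - x0.mean(axis=0)
    x1c = x1 - x1.mean(axis=0)
    n = len(x0)
    C = x0c.T @ x0c / (n - 1)                # covariance matrix C
    C_adv = x0c.T @ x1c / (n - 1)            # covariance with the advanced series
    G = np.linalg.solve(C, C_adv).T          # propagator estimate C_adv^T C^{-1}
    vals, vecs = np.linalg.eig(G)
    log_G = vecs @ np.diag(np.log(vals.astype(complex))) @ np.linalg.inv(vecs)
    A_hat = log_G.real / dt                  # discard spurious imaginary part
    T = A_hat * C / np.diag(C)[:, None]      # T[i, j] = a_ij * C_ij / C_ii
    np.fill_diagonal(T, np.nan)
    return T

def causal_edges(T, threshold):
    """List (source, target, rate) for off-diagonal entries with |rate| > threshold."""
    d = T.shape[0]
    return [(j, i, T[i, j])
            for i in range(d) for j in range(d)
            if i != j and abs(T[i, j]) > threshold]
```

A call such as `causal_edges(information_flow_matrix(series, dt), threshold)` then yields a directed edge list that can be read as the causal graph, with the IF rate attached to each edge.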
There is still much room for improvement in the above approach. For example, the estimation of the covariances in the quotient is achieved by replacing the population covariances with sample covariances, while the sample is formed from the time series. While this is satisfactory for stochastic systems under the ergodic assumption, this may not be suitable for deterministic chaos, such as in the case of the Rössler oscillators studied here. The reason is obvious: the time mean of the series in Figure 2 is zero, but one can imagine that the ensemble mean of all possible paths is by no means zero; rather, it should be a function of time (like the series itself), which may be close to the asymptotic linear system solution. Thus, it is more reasonable to treat the linear system solution as the mean. As such, we have attempted to improve the estimation by replacing the covariances of with those of , where stands for the resulting linear system solution. With this, we obtain another causal inference result for SI = 300; the resulting IFs are plotted in Figure 5. As one can see, the result appears rather accurate, as expected, in contrast to Figure 4e.
We, however, do not claim that we have solved the problem. What we want to show here is how much room there is for improvement within a linear framework in estimating Equation (2), the original fully nonlinear formula. Indeed, in the case of high nonlinearity, the linear assumption is always easy to blame. While theoretically it is not a problem (causality is guaranteed as proven in a theorem; see [6] and other references), it is believed that, before a fully nonlinear algorithm is developed (when this paper was prepared, ref. [22] was not published), this will be a continuing issue.
It should be pointed out that the above algorithm works only when the sampling interval does not exceed the scale in question. For processes with multiple scales involved, different samplings may result in different physically meaningful information flow rates. In this case, it is no longer just a computational issue; it is a physical problem that deserves an in-depth investigation. We will leave this problem, among others, for future research.
References
- [1] Granger C.W.J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438. doi:10.2307/1912791
- [2] Pearl J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: Cambridge, UK, 2000; 400 p.
- [3] Rubin D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Edu. Psych. 1974, 66, 688–701. doi:10.1037/h0037350
- [4] Imbens G.W.; Rubin D.B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: Cambridge, UK, 2015. doi:10.1017/CBO9781139025751
- [5] Bollt E.M.; Sun J.; Runge J. Introduction to Focus Issue: Causation inference and information flow in dynamical systems: Theory and applications. Chaos 2018, 28, 075201. doi:10.1063/1.5046848. PMID: 30070534
- [6] Liang X.S. Information flow and causality as rigorous notions ab initio. Phys. Rev. E 2016, 94, 052201. doi:10.1103/PhysRevE.94.052201. PMID: 27967120
- [7] Liang X.S. Information flow within stochastic dynamical systems. Phys. Rev. E 2008, 78, 031113. doi:10.1103/PhysRevE.78.031113. PMID: 18850999
- [8] Lasota A.; Mackey M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed.; Springer: New York, NY, USA, 1994; 472 p.
