Digital System Reconstruction by Pairwise Transfer Entropy

Zhong-Qi Kyle Tian; Douglas Zhou; David Cai

arXiv:1905.03972·q-bio.QM·May 13, 2019


TL;DR

This paper proposes a low-order, pairwise transfer entropy method for reconstructing digital system structures from binary data, effectively distinguishing causal links and quantifying coupling strength.

Contribution

It introduces a practical, low-dimensional pairwise TE framework that reliably detects causality and infers structural connectivity in digital systems.

Findings

01 TE values differ significantly between connected and unconnected pairs

02 TE value correlates quadratically with coupling strength

03 Method is robust across various systems and regimes


Equations

$$
\begin{cases}
C\dfrac{dV_{i}}{dt} = -(V_{i}-V_{Na})G_{Na}m_{i}^{3}h_{i} - (V_{i}-V_{K})G_{K}n_{i}^{4} - (V_{i}-V_{L})G_{L} + I_{i}^{\textrm{input}} \\
\dfrac{dm_{i}}{dt} = (1-m_{i})\alpha_{m}(V_{i}) - m_{i}\beta_{m}(V_{i}) \\
\dfrac{dh_{i}}{dt} = (1-h_{i})\alpha_{h}(V_{i}) - h_{i}\beta_{h}(V_{i}) \\
\dfrac{dn_{i}}{dt} = (1-n_{i})\alpha_{n}(V_{i}) - n_{i}\beta_{n}(V_{i})
\end{cases}
$$

$$\alpha_{m}(V_{i}) = \frac{0.1V_{i}+4}{1-\exp(-0.1V_{i}-4)}$$

$$\alpha_{h}(V_{i}) = 0.07\exp(-(V_{i}+65)/20)$$

$$\alpha_{n}(V_{i}) = \frac{0.01V_{i}+0.55}{1-\exp(-0.1V_{i}-5.5)}$$

$$\frac{dG_{i}(t)}{dt} = -\frac{G_{i}(t)}{\sigma_{r}} + H_{i}(t)$$

$$\frac{dH_{i}(t)}{dt} = -\frac{H_{i}(t)}{\sigma_{d}} + f\sum_{l}\delta(t-T_{il}^{F}) + \sum_{j\neq i}A_{ij}\sum_{l}S\,\delta(t-T_{jl}^{S})$$

$$T_{Y\rightarrow X}(\tau) = \sum p(x_{n+1},x_{n}^{(k)},y_{n-\tau}^{(l)})\log\frac{p(x_{n+1}\mid x_{n}^{(k)},y_{n-\tau}^{(l)})}{p(x_{n+1}\mid x_{n}^{(k)})}$$

$p_{a,b}(k,l,\tau)$

$\Delta p_{a,b}(k,l,\tau)$

$$T_{Y\rightarrow X}(\tau) = \frac{1}{2}\sum_{a}\frac{1}{p_{a,0}-p_{a,0}^{2}}\left(\sum_{b}p(a,b)\Delta p_{a,b}^{2} - \frac{\left(\sum_{b}p(a,b)\Delta p_{a,b}\right)^{2}}{p(a)}\right) + O\left(\sum_{a,b}\Delta p_{a,b}^{3}\right)$$

$\frac{dx_{i}}{dt}$, $\frac{dy_{i}}{dt}$, $\frac{dz_{i}}{dt}$

$$X_{i}(n+1) = rX_{i}(n)(1-X_{i}(n)) + \sum_{j\neq i}A_{ij}\sum_{l}S\,\delta_{i}(n+1,T_{jl})$$

$$\delta_{i}(n+1,T_{jl}) = \begin{cases} 1, & T_{jl}=n+1 \text{ and } X_{i}(n+1)+S\in(0,1) \\ 0, & \text{else} \end{cases}$$

$$\frac{dH_{i}(t)}{dt} = -\frac{H_{i}(t)}{\sigma_{d}} + f\sum_{l}\delta(t-T_{il}^{F}) + \sum_{j\neq i}A_{ij}\sum_{l}S\,g(V_{j}^{\textrm{pre}})$$

$$g(V_{j}^{\textrm{pre}}) = \frac{1}{1+\exp(-(V_{j}^{\textrm{pre}}-20)/2)}$$

$$\frac{dx_{i}}{dt} = \sigma(y_{i}-x_{i}) + \sum_{j\neq i}A_{ij}S(x_{j}-x_{i})$$


Full text

Digital System Reconstruction by Pairwise Transfer Entropy

Zhong-Qi Kyle Tian$^{1}$, Douglas Zhou$^{1}$, David Cai$^{1,2,3}$

Abstract

Transfer entropy (TE) is an attractive model-free method for detecting causality and inferring the structural connectivity of general digital systems. However, its definition relies on high-dimensional conditioning to remove the memory effect and to distinguish direct causality from indirect causality, which makes it almost inoperable in practice. In this work, we use a low-order, pairwise TE framework with binary data suitably filtered from the recorded signals to avoid the high-dimensional problem. Under this setting, we find and explain that the TE values of connected and unconnected pairs differ significantly in magnitude, so they can be easily classified by clustering methods. This phenomenon holds robustly over a wide range of systems and dynamical regimes. In addition, we find that the TE value is quadratically related to the coupling strength, and thus we can establish a quantitative mapping between the causal and structural connectivity.

$^{1}$ School of Mathematical Sciences, MOE-LSC, and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China

$^{2}$ Courant Institute of Mathematical Sciences and Center for Neural Science, New York University, New York, NY, United States of America

$^{3}$ NYUAD Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

Introduction

The structural connectivity of a network is of great importance for understanding its function in many scientific fields, for example in genetic regulatory networks and neuronal circuits. When only limited knowledge of the underlying dynamics is available, it is almost impossible to identify the actual connectivity directly in experiment, although abundant time series generated by individual nodes can be recorded much more easily with today’s technology. Based on these massive quantities of time series, one may use statistical approaches to identify the causal connectivity and further infer the function of the network [37, 15, 8, 39]. Transfer entropy (TE), an information-theoretic measure, has attracted growing interest because it is model-free [33, 19, 13, 41]: it requires no detailed knowledge of the system, unlike Granger causality (GC) analysis [8, 16, 11, 43], which is generally limited to linear dynamics. For a system of two nodes $X$ and $Y$ connected as $Y\rightarrow X$, the idea of TE is that the history of $Y$ helps to predict the future of $X$, so the uncertainty about the future of $X$ conditioned on its own memory is reduced by adding the past information of $Y$. This reduction of uncertainty is quantified by TE. Because of its conceptual simplicity and model-free property, TE has been widely and successfully applied to causal problems in many fields, such as neuroscience, social media and financial markets [41, 19, 7, 40, 25].

Despite its conceptual appeal, TE suffers from the “curse of dimensionality” [20, 9, 4, 31, 32, 17] in practical applications. For example, to measure the causality from $Y$ to $X$ cleanly, the memory of $X$, which is often very long, should be well conditioned [42]. In addition, for large networks the intermediate nodes should also be taken into account, since otherwise the computed causality may be only indirect, as in the case of $Y\rightarrow Z\rightarrow X$. Both requirements lead to a high-dimensional problem. Because the data recorded in experiments are limited, TE is often applied with some dimension reduction, e.g., by taking a bivariate setting and truncating the length of memory (low orders in TE’s definition) [33, 13, 41, 21, 31], or by making a specific prior assumption on the distribution of the signals, such as a Gaussian distribution [24, 28, 22, 30, 35]. To reduce the dimension, we use low-order, pairwise TE together with binary time series filtered from the raw signals. A basic question is then whether this reduced TE framework can successfully infer the causal connectivity. Moreover, the causal connectivity revealed by TE is only effective or functional [10], so how can the structural connectivity be inferred from the TE causal connectivity?

In this Letter, we consider three quite different and typical classes of nonlinear networks to show the wide applicability of TE: the Hodgkin-Huxley (HH) neuronal network, the Lorenz network and the discrete logistic map system. Our numerical results show that the TE values of connected and unconnected pairs differ by orders of magnitude. Thus we can set a proper threshold to classify the TE values into two groups, where the larger ones are inferred to come from connected pairs. The inferred TE connectivity is highly consistent with the structural connectivity over a wide range of dynamical regimes for all three systems. Our theoretical analysis shows that the TE values are quadratically related to the coupling strengths, so we can establish a direct map between these two types of connectivity. It is worth pointing out that we use only the spike timing of HH neurons to infer the structural connectivity of HH networks. The spikes of individual neurons in a population can be easily recorded in experiment, for example by calcium imaging [14, 36] and multi-electrode array methods [26, 34]. Our work may provide an operable and efficient TE framework to predict the structural connectivity of these neuronal systems.

Results

We first consider a Hodgkin-Huxley (HH) neuronal network with $N$ excitatory nodes (neurons). The HH model is widely used in computational neuroscience to simulate neuronal networks [1, 23, 29]. The dynamics of the $i$th HH node is governed by

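$$
\begin{cases}
C\dfrac{dV_{i}}{dt} = -(V_{i}-V_{Na})G_{Na}m_{i}^{3}h_{i} - (V_{i}-V_{K})G_{K}n_{i}^{4} - (V_{i}-V_{L})G_{L} + I_{i}^{\textrm{input}} \\
\dfrac{dm_{i}}{dt} = (1-m_{i})\alpha_{m}(V_{i}) - m_{i}\beta_{m}(V_{i}) \\
\dfrac{dh_{i}}{dt} = (1-h_{i})\alpha_{h}(V_{i}) - h_{i}\beta_{h}(V_{i}) \\
\dfrac{dn_{i}}{dt} = (1-n_{i})\alpha_{n}(V_{i}) - n_{i}\beta_{n}(V_{i})
\end{cases} \tag{1}
$$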

where $C$ is the membrane capacitance, $V_{i}$ is the membrane potential, $m_{i}$, $h_{i}$ and $n_{i}$ are gating variables describing the sodium and potassium currents, $V_{Na}$, $V_{K}$ and $V_{L}$ are the reversal potentials for the sodium, potassium and leak currents, respectively, $G_{Na}$, $G_{K}$ and $G_{L}$ are the corresponding maximum conductances, and $\alpha$ and $\beta$ are empirical functions of $V$ [18, 6, 12],

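$$\alpha_{m}(V_{i}) = \frac{0.1V_{i}+4}{1-\exp(-0.1V_{i}-4)}, \quad \alpha_{h}(V_{i}) = 0.07\exp(-(V_{i}+65)/20), \quad \alpha_{n}(V_{i}) = \frac{0.01V_{i}+0.55}{1-\exp(-0.1V_{i}-5.5)}. \tag{2}$$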

The input current $I_{i}^{\textrm{input}}$ is given by $I_{i}^{\textrm{input}}=-G_{i}(t)(V_{i}-V_{G})$ with

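$$\frac{dG_{i}(t)}{dt} = -\frac{G_{i}(t)}{\sigma_{r}} + H_{i}(t), \tag{3}$$

$$\frac{dH_{i}(t)}{dt} = -\frac{H_{i}(t)}{\sigma_{d}} + f\sum_{l}\delta(t-T_{il}^{F}) + \sum_{j\neq i}A_{ij}\sum_{l}S\,\delta(t-T_{jl}^{S}), \tag{4}$$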

where $V_{G}$ is the reversal potential, $G_{i}(t)$ is the conductance, $H_{i}(t)$ is an additional variable used to describe $G_{i}(t)$, $\sigma_{r}$ and $\sigma_{d}$ are the fast rise and slow decay time scales, respectively, and $\delta(\cdot)$ is the Dirac delta function. The second term in Eq. (4) is the feedforward input with magnitude $f$, whose input times $T_{il}^{F}$ are generated from a Poisson process with rate $\nu$. The last term in Eq. (4) describes the synaptic interactions in the network, where $\mathbf{A}=(A_{ij})$ is the adjacency matrix and $S$ is the coupling strength. When the voltage $V_{i}$ of the $i$th node, evolving continuously according to Eq. (1), reaches the threshold $V^{\textrm{th}}$, we say that the node fires a spike at this time, denoted by $T_{il}^{S}$. Instantaneously, all its postsynaptic nodes receive this spike and their corresponding variable $H$ jumps by an amount $S$. We record the spike times of each node and transform them into a binary time series with a small time bin $\Delta t$, taking the value 1 if a spike occurs in the bin and 0 otherwise.
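The binning step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `spikes_to_binary`, the default bin width and the example spike times are all hypothetical.

```python
def spikes_to_binary(spike_times_ms, t_end_ms, dt_ms=0.5):
    """Return a 0/1 list with one entry per bin of width dt_ms over [0, t_end_ms)."""
    n_bins = int(round(t_end_ms / dt_ms))
    series = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // dt_ms)          # index of the bin containing the spike
        if 0 <= idx < n_bins:
            series[idx] = 1            # 1 if at least one spike falls in the bin
    return series


# Hypothetical spike times (ms) of one node, for illustration only.
example = spikes_to_binary([0.2, 1.3, 1.4, 4.9], t_end_ms=5.0, dt_ms=0.5)
print(example)
```

Two spikes falling in the same bin are merged into a single 1, so the bin width should be small compared with the typical inter-spike interval.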

For two binary random processes $X,Y$ with states $x$ and $y$ , respectively, the TE [33] from $Y$ to $X$ is defined by

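$$T_{Y\rightarrow X}(\tau) = \sum p(x_{n+1},x_{n}^{(k)},y_{n-\tau}^{(l)})\log\frac{p(x_{n+1}\mid x_{n}^{(k)},y_{n-\tau}^{(l)})}{p(x_{n+1}\mid x_{n}^{(k)})}, \tag{5}$$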

where $\tau$ is a time delay of interest, $x_{n}^{(k)}=(x_{n},x_{n-1},...,x_{n-k+1})$ and $y_{n-\tau}^{(l)}=(y_{n-\tau},y_{n-\tau-1},...,y_{n-\tau-l+1})$, and $k,l$ are the orders (memory lengths) of $X$ and $Y$, respectively. According to Wiener’s principle [42], we should use a sufficiently large value of $k$ to remove the memory effect of $X$. We can then say that $Y$ is causal to $X$ if the past of $Y$ improves the prediction of $X$. However, a large $k$ greatly increases the dimension and makes the estimation inoperable in practice.
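As an illustration of the definition, the following sketch estimates $T_{Y\rightarrow X}(\tau)$ for two binary sequences by replacing the probabilities with empirical frequencies (a plug-in estimate). The estimator choice, the function name `transfer_entropy` and the synthetic test data are assumptions made for this example; the text does not specify how the probabilities are estimated.

```python
import math
import random
from collections import Counter


def transfer_entropy(x, y, k=1, l=1, tau=0):
    """Plug-in estimate of T_{Y->X}(tau) in nats for two equal-length 0/1 sequences."""
    joint = Counter()                  # counts of (x_{n+1}, x_n^{(k)}, y_{n-tau}^{(l)})
    start = max(k - 1, tau + l - 1)    # first n for which both histories exist
    for n in range(start, len(x) - 1):
        xh = tuple(x[n - i] for i in range(k))
        yh = tuple(y[n - tau - i] for i in range(l))
        joint[(x[n + 1], xh, yh)] += 1
    total = sum(joint.values())
    c_xh_yh, c_x1_xh, c_xh = Counter(), Counter(), Counter()
    for (x1, xh, yh), c in joint.items():
        c_xh_yh[(xh, yh)] += c
        c_x1_xh[(x1, xh)] += c
        c_xh[xh] += c
    te = 0.0
    for (x1, xh, yh), c in joint.items():
        p_full = c / c_xh_yh[(xh, yh)]            # p(x_{n+1} | x history, y history)
        p_self = c_x1_xh[(x1, xh)] / c_xh[xh]     # p(x_{n+1} | x history)
        te += (c / total) * math.log(p_full / p_self)
    return te


# Synthetic toy data (hypothetical rates): y raises the firing probability of x one step later.
rng = random.Random(0)
N = 50_000
y = [1 if rng.random() < 0.05 else 0 for _ in range(N)]
x = [0] + [1 if rng.random() < 0.02 + 0.3 * y[n] else 0 for n in range(N - 1)]

te_yx = transfer_entropy(x, y)   # driven direction
te_xy = transfer_entropy(y, x)   # reverse direction
print(te_yx, te_xy)
```

With $k=l=1$ only eight joint states have to be counted, which is why the low-order setting remains tractable with limited data.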

We now address whether low-order, pairwise TE can be used to reconstruct the structural connectivity of the HH network. We start with the smallest orders $k=l=1$, i.e., no memory effect. For the network shown in Fig. 1(a), we find that the TE values of connected pairs are always significantly greater than those of unconnected pairs, by a factor of more than 100. This property holds robustly when we scan $f$ and $\nu$, as shown in Fig. 1(c), which covers the realistic range of firing rates of 2-50 Hz. We can therefore simply classify the TE values into two groups by the $k$-means method, with the larger ones inferred to come from connected pairs. The adjacency matrix is then successfully reconstructed, as shown in Fig. 1(b), for all the scanned dynamical regimes. For a network of 100 randomly connected nodes, given in Fig. 1(d), there is still a large difference in magnitude between the TE values of connected and unconnected pairs, as shown in Fig. 1(e), and the accuracy of the $k$-means classification is 100%. We further check whether our TE framework is robust to noise by adding to the spike times a jitter drawn from the uniform distribution $U(-2,2)$ ms. As shown in Fig. 1(f), the large difference in magnitude still holds and the reconstruction accuracy is 99.7%.
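The classification step can be sketched as a two-cluster $k$-means on scalar values. The sketch below clusters $\log_{10}$ of the TE values, which is an assumption of this example (the text only states that $k$-means is used); the TE values in the dictionary are made up for illustration and are not results from the paper.

```python
import math


def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with k = 2 on scalars; label 1 marks the larger-mean cluster."""
    lo, hi = min(values), max(values)          # initialise centres at the extremes
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(g0) / len(g0) if g0 else lo
        new_hi = sum(g1) / len(g1) if g1 else hi
        if (new_lo, new_hi) == (lo, hi):       # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical TE values keyed by (source, target); not data from the paper.
te = {(0, 1): 2.1e-4, (1, 0): 1.5e-6, (0, 2): 9.0e-7,
      (2, 0): 1.8e-4, (1, 2): 2.5e-4, (2, 1): 1.1e-6}
labels = two_means_1d([math.log10(v) for v in te.values()])
edges = [pair for pair, lab in zip(te, labels) if lab == 1]
print(edges)
```

Working on the logarithm makes the two groups comparably compact when they differ by orders of magnitude; clustering the raw values is an equally admissible reading of the text.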

We now turn to the question of why a low-order, pairwise TE framework can uncover the structural connectivity of the HH network above. First we present a theoretical estimate of the TE defined in Eq. (5). For simplicity of notation, we map the history state $x_{n}^{(k)}$ to a decimal number; for example, $x_{n}^{(k)}=(x_{n}=1,x_{n-1}=0,x_{n-2}=0,x_{n-3}=0)$ is mapped one-to-one to the binary number 1000, which equals the decimal number 8. Then we can use the notations

$p_{a,b}(k,l,\tau)$ and $\Delta p_{a,b}(k,l,\tau)$,

where $a,b$ are the corresponding decimal numbers. We rewrite Eq. (5) in terms of $p_{a,b}$ and $\Delta p_{a,b}$, and a Taylor expansion gives

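$$T_{Y\rightarrow X}(\tau) = \frac{1}{2}\sum_{a}\frac{1}{p_{a,0}-p_{a,0}^{2}}\left(\sum_{b}p(a,b)\Delta p_{a,b}^{2} - \frac{\left(\sum_{b}p(a,b)\Delta p_{a,b}\right)^{2}}{p(a)}\right) + O\left(\sum_{a,b}\Delta p_{a,b}^{3}\right). \tag{8}$$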

The increase $\Delta p_{a,b}$ comes from the 1 signals of $Y$, and its value is determined by the coupling strength $S$ from $Y$ to $X$. As shown in the inset of Fig. 2, $\Delta p_{a,b}$ is linearly related to the coupling strength. We use the neuronal system to understand this linear relationship. For HH neurons, an excitatory spike with strength $S=0.05\text{ mS/cm}$ increases the voltage of a postsynaptic neuron by around 1.5 mV, and more than 10 spikes arriving at the same time are required to trigger the postsynaptic neuron to fire. Therefore, the influence of a single spike is very limited and the increase $\Delta p_{a,b}$ is very small, of the order of 0.01. Consequently, the first-order term dominates when we expand $\Delta p_{a,b}$ with respect to $S$. From Eq. (8), whose leading term is quadratic in $\Delta p_{a,b}$, we finally conclude that $T_{Y\rightarrow X}\propto S^{2}$, as shown in Fig. 2.
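A small numerical check of the quadratic scaling, under simplifying assumptions that are ours rather than the paper's: $X$ has a single history state (no memory), $p_{a,b}$ is read as the probability that $x_{n+1}=1$ given the states $a,b$, $\Delta p_{a,0}=0$, and $\Delta p = cS$ with a hypothetical slope $c$. In that case the exact TE reduces to a mutual information between $x_{n+1}$ and $y_{n-\tau}$, and the leading term of the expansion becomes $\frac{1}{2}q(1-q)\Delta p^{2}/(p_{0}-p_{0}^{2})$, where $q$ is the probability that $Y$ is 1.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def te_exact(p0, dp, q):
    """Exact TE for memoryless X with P(x=1|y=0) = p0, P(x=1|y=1) = p0 + dp, P(y=1) = q."""
    p_bar = p0 + q * dp                        # marginal probability that x = 1
    return q * kl_bernoulli(p0 + dp, p_bar) + (1 - q) * kl_bernoulli(p0, p_bar)


def te_second_order(p0, dp, q):
    """Leading (quadratic) term of the expansion for a single history state."""
    return 0.5 * q * (1 - q) * dp ** 2 / (p0 - p0 ** 2)


# Hypothetical values: baseline rate p0, driver rate q, slope c in dp = c * S.
p0, q, c = 0.05, 0.02, 0.02
for S in (0.025, 0.05, 0.1):
    dp = c * S
    print(S, te_exact(p0, dp, q), te_second_order(p0, dp, q))
```

Doubling $S$ in this toy model multiplies the exact TE by a factor close to 4, and the quadratic term tracks the exact value closely as long as $\Delta p$ is small compared with $p_{0}$.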

Because of the high-dimensional problem, we apply TE in a low-order, pairwise setting. In principle, however, these two settings could make indirect causality significant through the memory effect and intermediate neurons. Consider a network of 3 nodes connected as $Y\rightarrow X\rightarrow Z$. When we measure the causality from $X$ to $Y$ by TE, a memory effect arises if we use a low order for $Y$: the signal of $X$, which is driven by $Y$, may contain history information of $Y$ that improves the prediction of the future of $Y$, so the computed $T_{X\rightarrow Y}$ may be significantly overestimated. However, this reduction of uncertainty computed by TE is merely a self-prediction. On the other hand, when we measure the causality from $Y$ to $Z$ without conditioning on $X$, we may wrongly infer a significant connection from $Y$ to $Z$; this can be avoided by using a conditional TE that includes $X$.

We point out that both problems can be avoided by using proper binary time series. For neuronal systems, the binary time series can be obtained directly from the spike trains. With a time bin size of 0.5 ms, a resolution that can be realized in experiment, we find that almost all components of the binary data are zeros and only a few are ones. The auto-correlation of the binary data is quite weak, as shown in Fig. 3(a). For binary random variables, uncorrelatedness is equivalent to independence. Therefore, the obtained binary time series are almost whitened, i.e., there is no memory effect. As shown in Fig. 2, the TE value in the reverse direction, $T_{X\rightarrow Y}$, is negligible even with an order of $1$. However, if we use the continuous-valued voltage time series, the memory is over 10 ms, as shown in Fig. 3(b), and we cannot use a low order. For the indirect causality, we have $\Delta p(Y\rightarrow Z)=O(\Delta p(Y\rightarrow X)\Delta p(X\rightarrow Z))$ from the causal path. With a time bin size of 0.5 ms, the increase $\Delta p$ from direct causality is quite small, $O(0.01)$, and hence the indirect $\Delta p$ is much smaller. From Eq. (5), the TE values from indirect causality, $T_{Y\rightarrow Z}$, are also negligible, as shown in Fig. 2.
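The weak auto-correlation of a sparse binary series can be checked with a few lines of code. The sketch below computes the sample auto-correlation of a synthetic, independently generated sparse 0/1 sequence; the generator and its rate are hypothetical stand-ins for a binned spike train, not data from the paper.

```python
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients of a sequence for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum((series[i] - mean) * (series[i + lag] - mean)
                  for i in range(n - lag)) / n
        acf.append(cov / var)
    return acf


# Hypothetical sparse binary series: about 1% of the bins contain a spike.
rng = random.Random(1)
sparse = [1 if rng.random() < 0.01 else 0 for _ in range(20_000)]
acf = autocorrelation(sparse, max_lag=5)
print(acf)
```

For recorded data, lags at which the coefficient stays clearly away from zero indicate memory that a low-order TE would not condition out.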

As a model-free method, our low-order, pairwise TE framework with binary data should also work in other systems. Here we give two examples: the Lorenz system and the discrete logistic map system. The $i$th node of a Lorenz system is governed by

[TABLE]

where $\sigma=10,\beta=8/3,\rho=28$ and we take the threshold $x^{\textrm{th}}=10$. When $x_{i}$ reaches the threshold, the node sends a pulse to all its postsynaptic nodes, and we denote this moment by $T_{il}$. The binary data are obtained in the same way as in the HH system. For the logistic map network, the $i$th node is governed by

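$$X_{i}(n+1) = rX_{i}(n)(1-X_{i}(n)) + \sum_{j\neq i}A_{ij}\sum_{l}S\,\delta_{i}(n+1,T_{jl}),$$

$$\delta_{i}(n+1,T_{jl}) = \begin{cases} 1, & T_{jl}=n+1 \text{ and } X_{i}(n+1)+S\in(0,1) \\ 0, & \text{else} \end{cases}$$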

where $r=4$ and we take the threshold $X^{\textrm{th}}=0.9$. Here $T_{il}$ is the time at which $X_{i}(T_{il}-1)<X^{\textrm{th}}$ and $X_{i}(T_{il})\geq X^{\textrm{th}}$. The binary data take the value 1 only at the times $\{T_{il}\}$. Note that for discrete systems, we cannot remove the memory effect by using a small time bin; instead, we have to include the strongly correlated historical lags, which can be estimated from the auto-correlation function as shown in Fig. 3(b).
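A rough sketch of how such a pulse-coupled logistic network could be simulated and converted to binary data. Several details are assumptions of this example because the text leaves them open: threshold crossings are detected on the uncoupled update $rX_{j}(n)(1-X_{j}(n))$, each pulse is added only if the result stays inside $(0,1)$, and the function name, initial conditions, coupling strength and two-node adjacency matrix are hypothetical.

```python
import random


def simulate_logistic_network(A, S, n_steps, r=4.0, x_th=0.9, seed=0):
    """Return one binary threshold-crossing series per node.

    A[i][j] = 1 means that node j drives node i. Assumption of this sketch:
    crossings are detected on the uncoupled logistic update, then pulses are
    applied to the postsynaptic nodes in the same step.
    """
    rng = random.Random(seed)
    n = len(A)
    x = [rng.uniform(0.05, 0.85) for _ in range(n)]
    binary = [[] for _ in range(n)]
    for _ in range(n_steps):
        base = [r * xi * (1.0 - xi) for xi in x]              # uncoupled update
        fired = [x[j] < x_th <= base[j] for j in range(n)]    # upward crossings
        new_x = list(base)
        for i in range(n):
            for j in range(n):
                # add a pulse only if the state stays inside (0, 1)
                if i != j and A[i][j] and fired[j] and 0.0 < new_x[i] + S < 1.0:
                    new_x[i] += S
        x = new_x
        for i in range(n):
            binary[i].append(1 if fired[i] else 0)
    return binary


# Hypothetical two-node example: node 1 drives node 0 with strength S = 0.02.
A = [[0, 1], [0, 0]]
binary = simulate_logistic_network(A, S=0.02, n_steps=2000)
print(sum(binary[0]), sum(binary[1]))
```

The resulting lists can be passed directly to a binary TE estimator; because the map is discrete, the history orders should cover the lags at which the auto-correlation is still strong.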

As shown in Fig. 2, the TE framework performs similarly for the Lorenz and logistic systems as for the HH system. In particular, the significant difference in magnitude between the TE values of connected and unconnected pairs still holds. We also apply the TE framework to Lorenz and logistic networks of 100 nodes connected as in Fig. 1(d). With classification by the $k$-means method, the reconstruction accuracy is 98.7% and 100% for the Lorenz and logistic networks, respectively, as shown in Fig. 4.

Discussion

In summary, we have proposed a low-order, pairwise TE framework with binary data that avoids the high-dimensional problem and reconstructs the structural connectivity of a network by detecting TE causality. We have found and explained the phenomenon that the TE values of connected and unconnected pairs differ significantly in magnitude, on the basis of which the structural connectivity can be easily reconstructed by clustering methods. Our TE framework can be applied, with high reconstruction accuracy, to a wide class of systems, such as nonlinear, discrete-valued or continuous-valued ones. We have also established a quadratic relationship between the TE values and the coupling strengths.

We should first point out that our TE framework does not rely on the pulse-coupled dynamics used above. For example, we can use a continuous function to describe the dynamics of synaptic interactions in the HH model [5, 38]

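$$\frac{dH_{i}(t)}{dt} = -\frac{H_{i}(t)}{\sigma_{d}} + f\sum_{l}\delta(t-T_{il}^{F}) + \sum_{j\neq i}A_{ij}\sum_{l}S\,g(V_{j}^{\textrm{pre}}),$$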

where

$$g(V_{j}^{\textrm{pre}}) = \frac{1}{1+\exp(-(V_{j}^{\textrm{pre}}-20)/2)}.$$

We can also extend the Lorenz system to a continuously coupled case [27, 3, 2]

$$\frac{dx_{i}}{dt} = \sigma(y_{i}-x_{i}) + \sum_{j\neq i}A_{ij}S(x_{j}-x_{i})$$

and choose a proper threshold to obtain suitable binary time series. The conclusions shown in Figs. 2 and 4 do not change in these two extended models.

The proper time delay depends on the details of the system. For example, once a recipient HH node receives a spike from a driver HH node, its voltage increases and reaches a local peak some time later; the optimal time delay is around this time. Another way is to scan the time delay and select the one that gives the peak TE value. In our TE framework, only the order of magnitude matters, so a relatively wide range of time delays is acceptable.
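The delay scan mentioned above can be sketched as follows, using a compact plug-in TE estimate with $k=l=1$. The estimator, the function name `te_order1`, and the synthetic data with a built-in delay of 3 bins are assumptions made for this illustration.

```python
import math
import random
from collections import Counter


def te_order1(x, y, tau):
    """Plug-in TE from y to x (nats) with k = l = 1 and delay tau >= 0."""
    triples = Counter((x[n + 1], x[n], y[n - tau]) for n in range(tau, len(x) - 1))
    total = sum(triples.values())
    pair_xy, pair_xx, single = Counter(), Counter(), Counter()
    for (x1, x0, y0), c in triples.items():
        pair_xy[(x0, y0)] += c
        pair_xx[(x1, x0)] += c
        single[x0] += c
    return sum(
        (c / total) * math.log((c / pair_xy[(x0, y0)]) / (pair_xx[(x1, x0)] / single[x0]))
        for (x1, x0, y0), c in triples.items()
    )


# Synthetic data (hypothetical rates): x responds to y with a delay of 3 bins.
rng = random.Random(2)
N, true_delay = 50_000, 3
y = [1 if rng.random() < 0.05 else 0 for _ in range(N)]
x = [0] * N
for n in range(N - 1):
    drive = y[n - true_delay] if n >= true_delay else 0
    x[n + 1] = 1 if rng.random() < 0.02 + 0.3 * drive else 0

scan = {tau: te_order1(x, y, tau) for tau in range(8)}   # TE as a function of the delay
best_tau = max(scan, key=scan.get)
print(best_tau, scan[best_tau])
```

In this toy example the interaction is confined to a single bin, so the peak is sharp; a response spread over several bins, as for the HH voltage rise described above, would give a broader peak.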

Finally, we point out that the distributions of TE values of connected and unconnected pairs shown in Figs. 1 and 4 may overlap substantially, e.g., when the coupling strength is inhomogeneous and follows a distribution. The $k$-means method may then misclassify connected pairs with weak couplings and unconnected pairs with strong indirect causality. A possible remedy, left for future work, is to first compute the pairwise TE values and obtain a preliminary inferred adjacency matrix, then recompute for each pair the TE conditioned on the important intermediate nodes identified from this preliminary matrix, and finally classify the values by the $k$-means method again.

Acknowledgments

This work was supported by NYU Abu Dhabi Institute G1301 (Z.K.T, D.Z., and D.C.), NSFC-11671259, NSFC-11722107, NSFC-91630208, Shanghai Rising-Star Program-15QA1402600 (D.Z.), NSFC 31571071, NSF DMS-1009575 (D.C.), Shanghai 14JC1403800, Shanghai 15JC1400104, SJTU-UM Collaborative Research Program (D.Z. and D.C.).

Bibliography

[1] LF Abbott and Thomas B Kepler. Model neurons: from Hodgkin-Huxley to Hopfield. In Statistical Mechanics of Neural Networks, pages 5–18. Springer, 1990.
[2] Igor Belykh, Vladimir Belykh, and Martin Hasler. Synchronization in asymmetrically coupled networks with node balance. Chaos: An Interdisciplinary Journal of Nonlinear Science, 16(1):015102, 2006.
[3] Vladimir N Belykh, Igor V Belykh, and Martin Hasler. Connection graph stability method for synchronized coupled chaotic systems. Physica D: Nonlinear Phenomena, 195(1-2):159–187, 2004.
[4] Stefan Berchtold, Christian Böhm, and Hans-Peter Kriegel. The pyramid-technique: towards breaking the curse of dimensionality. In ACM SIGMOD Record, volume 27, pages 142–153. ACM, 1998.
[5] Albert Compte, Maria V Sanchez-Vives, David A McCormick, and Xiao-Jing Wang. Cellular and network mechanisms of slow oscillatory activity (< 1 Hz) and wave propagations in a cortical network model. Journal of Neurophysiology, 89(5):2707–2725, 2003.
[6] Peter Dayan and Laurence F Abbott. Theoretical Neuroscience, volume 806. Cambridge, MA: MIT Press, 2001.
[7] Alexander G Dimitrov, Aurel A Lazar, and Jonathan D Victor. Information theory in neuroscience. Journal of Computational Neuroscience, 30(1):1–5, 2011.
[8] Mingzhou Ding, Yonghong Chen, and SL Bressler. Granger causality: basic theory and application to neuroscience. arXiv preprint q-bio/0608035, 2006.