TL;DR
This paper introduces a framework to analyze how network structure, noise, and interference affect information transmission in complex dynamical networks, highlighting the advantages of non-normal networks in noise cancellation and information throughput.
Contribution
It develops a mathematical framework to understand information propagation in networks, revealing the benefits of non-normal structures over normal ones for communication efficiency.
Findings
Non-normal networks can cancel noise effects by transiently amplifying input dimensions.
Normal networks suffer from interference noise and are less efficient.
Network wiring details often do not impact transmission quality in normal networks.
Efficient Communication over Complex Dynamical Networks: The Role of Matrix Non-Normality
Giacomo Baggio
Department of Information Engineering, University of Padova, via Gradenigo, 6/B I-35131 Padova, Italy
Virginia Rutten
Gatsby Computational Neuroscience Unit, University College London, London W1T 4JG, UK
Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
Guillaume Hennequin
Sandro Zampieri
Abstract
In both natural and engineered systems, communication often occurs dynamically over networks ranging from highly structured grids to largely disordered graphs. To use, or comprehend the use of, networks as efficient communication media requires understanding of how they propagate and transform information in the face of noise. Here, we develop a framework that enables us to examine how network structure, noise, and interference between consecutive packets jointly determine transmission performance in networks with linear dynamics at single nodes and arbitrary topologies. Mathematically normal networks, which can be decomposed into separate low-dimensional information channels, suffer greatly from readout and interference noise. Interestingly, most details of their wiring have no impact on transmission quality. Non-normal networks, however, can largely cancel the effect of noise by transiently amplifying select input dimensions while ignoring others, resulting in higher net information throughput. Our theory could inform the design of new communication networks, as well as the optimal use of existing ones.
One sentence summary: Non-normal networks spread information efficiently in the face of interference between consecutive transmissions and readout noise.
Introduction
Reliable propagation of information through networks with unreliable nodes is a fundamental problem facing many engineered and natural systems. These include social networks (?, ?), peer-to-peer networks (?), gene regulatory networks (?, ?), power grids (?), and brain networks (?, ?), to cite only a few. In order to engineer better communication networks, make better use of existing ones, or understand how natural (e.g., biological) networked systems function, a theory is needed that relates the network’s connectivity and dynamics to its performance in transmitting information.
† Corresponding author: [email protected].
† Equal contributions.
Previous work at the interface of network science and information theory has been largely restricted to static, feedforward networks, in which packets of activity travel one after the other through layers of memoryless nodes, with no interference. Examples include classic connectionist work where feedforward “neural” networks are optimized so their outputs retain as much information as possible about their inputs (?, ?). These works have influenced how neuroscientists think about sensory pathways, which resemble layered networks of noisy neurons receiving input packets from body senses (?). In particular, the neural representations of visual stimuli that are found along the primate ventral stream are strikingly similar to those that emerge in deep networks trained on object recognition tasks (?). More recent work (?) has drawn a link between deep learning (?) and the information bottleneck method (?), a principled approach to compressive communication. Beyond feedforward networks, the effect of recurrent topologies on information transmission was studied in the context of virtual electrical circuits (?), but this was restricted to steady states and therefore disregarded any potential encoding of information in activity transients.
In most real-world scenarios, however, information does not propagate statically (or instantaneously), but dynamically within complex recurrent networks composed of non-memoryless nodes. The inherent dynamics of the network can greatly affect communication performance in ways that remain poorly understood. In (?), the authors proposed an analytical framework based atop standard notions of time-delayed mutual information and transfer entropy, to quantify the routing of small activity fluctuations propagating on top of oscillatory reference dynamics. While their framework allowed them to identify a generic mechanism capable of generating flexible information-routing patterns in the network, it is based on a small-noise approximation and therefore cannot fully capture the impact of noise on network communication. Moreover, the authors did not systematically study the role of network topology. The authors of (?) investigated the interplay between the network topology and its dynamics. They found that patterns of information are governed by universal laws that depend only on a few relevant parameters of the network dynamics. However, the analysis was carried out in a deterministic setting, and the proposed information transfer metric — which quantifies the sensitivity of a dynamical system to local perturbations — lacks an explicit information-theoretic interpretation. The work (?) used Fisher information theory to quantify the short-term memory storage capacity of networks governed by linear dynamics. In investigating this memory problem, which is a form of network communication through time, the authors were led to study the interactions between single-node dynamics, connectivity, and input statistics, similar to the theory we develop here. However, the network received a one-dimensional input, and temporal correlations were neglected.
Here, we study the role of graph topology on the quality of information transmission in noisy networks with otherwise simple, linear single-node dynamics. We establish a novel framework for quantifying the maximum amount of information about high-dimensional inputs that can be transmitted reliably through such networks. We apply our framework to various network architectures, ranging from simple, structured networks amenable to analytical derivations, to more complex, disordered, and real networks that we investigate numerically. Critically, all the networks we consider here have memory, which gives rise to interference between the network’s responses to multiple packets transmitted in close succession; this interference constitutes a source of internal, structured noise. We show that when the amount of noise present in the information channel is large, anisotropic (mathematically “non-normal”, (?, ?)) networks that embed directed feedforward pathways perform better than isotropic (“normal”) ones (footnote 1). Moreover, we find that such non-normal networks can even entirely overcome the effect of noise in some limit. Our results provide estimates for the amount of information that a network can propagate, and insights into how the propagation of information depends on key network properties. Additionally, we discuss how information propagation can be optimized by using specific distributions of input packets. We expect our theory to contribute to understanding the behaviour of natural networked systems, which are often found to be strongly non-normal (?). Further dissection of the mechanisms at work in natural networks (e.g., single-node dynamics, graph structure, adaptive wiring, …) may also suggest better engineered solutions to network communication.
Footnote 1: A network is said to be normal if its connectivity matrix $A$ is normal, i.e., if it satisfies $AA^{*} = A^{*}A$, where $A^{*}$ denotes the conjugate transpose of $A$. Otherwise, the network is said to be non-normal (?).
Results
Modelling framework
Communication through networks
We consider the following model of a communication channel, whereby a sequence of to-be-transmitted packets of information is probabilistically encoded in a sequence of input vectors $u_k$ (Fig. 1). Information transmission occurs via propagation of the inputs through a dynamical network. In order to obtain analytical, interpretable results that hold for arbitrarily complex graph topologies, we assume minimalistic dynamics for single network nodes: first-order, linear responses to inputs. Specifically, we consider continuous-time linear dynamical systems of the form
$$\dot{x}(t) = A\,x(t) + B \sum_{k} u_k\,\delta(t - kT), \tag{1}$$
where $x(t)$ denotes the state vector and $A$ is the state matrix. We restrict our analysis to the case of “stable” network dynamics, whereby responses to transient inputs do not grow unbounded (which would be physically unfeasible) but fade away after some time. Mathematically, this means we require all eigenvalues of $A$ to have negative real part.
Each input vector $u_k$, independently drawn from an identical encoding probability distribution $p(u)$, contains the information carried by the $k$-th transmitted packet. Each of these inputs is then delivered as an impulse (here modelled as a Dirac delta $\delta(t)$) that excites the network dynamics in Equation 1. Transmission of successive packets occurs every $T$ units of time. The columns of the matrix $B$ define “input nodes” (red circles in Fig. 1), which are the only ones affected by the impulse. Likewise, a readout matrix $C$ singles out specific output nodes (blue circles) whose activations are transmitted to the receiver, further corrupted by independent Gaussian noise of variance $\sigma^2$. This results in corrupted trajectories which the receiver could use to reconstruct the corresponding input packets. In our assessment of communication performance, we will consider Shannon’s mutual information as a proxy for reconstruction quality (see below), instead of considering explicit decoding algorithms.
By reducing the complexity of single node dynamics to simple first-order evolution, Equation 1 allows us to focus on the effect of network architecture on the quality of information transmission. For example, Equation 1 is known as a “rate equation” in computational neuroscience, where it has been shown to capture key aspects of the dynamics of neuronal networks around fixed points (?). Indeed, single neurons are often characterized by input/output functions that remain approximately linear over their relevant dynamic range (?). In that case, $A$ represents the matrix of synaptic connection weights, and $x(t)$ is interpreted as momentary deviations from steady-state firing rates.
Importantly, since each network node is governed by first-order dynamics, the network is not memoryless: activity trajectories elicited by previous communications interfere with (in fact, add linearly to) the network trajectory carrying information about the current input. Thus, for the transmission of any given packet (assuming many packets have already been transmitted), interference contributes an additional source of noise, given by
[Equation 2]
where the unit-step function appearing in Equation 2 equals $1$ if its argument is non-negative, and $0$ otherwise. This phenomenon, known as inter-symbol interference in communications (?), arises in any communication medium that has some form of memory, including networks with node dynamics described by differential equations.
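The linear superposition described above can be illustrated numerically. The following is a minimal sketch, assuming the impulse-driven dynamics of Equation 1 so that a packet $u$ delivered at time zero produces the noise-free readout $C e^{At} B u$; the function names `packet_response` and `interference` are hypothetical and the explicit form of Equation 2 is not reproduced here.

```python
import numpy as np
from scipy.linalg import expm


def packet_response(A, B, C, u, t):
    """Noise-free readout C e^{A t} B u at time t >= 0 after an impulse u at time 0."""
    return C @ expm(A * t) @ B @ u


def interference(A, B, C, past_inputs, T, t):
    """Leftover readout at time t in [0, T) caused by earlier packets.

    past_inputs[0] was sent at time -T, past_inputs[1] at -2T, and so on.
    Because the dynamics are linear, the contributions simply add up.
    """
    total = np.zeros(C.shape[0])
    for k, u_prev in enumerate(past_inputs, start=1):
        total += packet_response(A, B, C, u_prev, t + k * T)
    return total


if __name__ == "__main__":
    # Single node with unit decay rate; 60 identical past packets (illustrative values).
    A = np.array([[-1.0]])
    B = np.eye(1)
    C = np.eye(1)
    past = [np.ones(1)] * 60
    print(interference(A, B, C, past, T=1.0, t=0.0))
```

For a single node the sum is a geometric series, which makes the dependence of the interference on the ratio between the transmission window and the decay time explicit.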
In the following, we study the combined effects of the network architecture (matrix $A$), communication time window ($T$), noise level ($\sigma$), and encoding of input packets under this communication paradigm. We begin by establishing an analytical framework to characterize the quality of information transmission through the network, and highlight the trade-off that arises between sending packets of information at a high temporal rate and the ability of the receiver to accurately reconstruct them. We then summarize our analytical results, and illustrate them using appropriate network architectures.
Information transmission metrics
To quantify the amount of information that can be propagated through the network channel described above, we use the notion of Shannon’s mutual information between the input packet $u_k$ and the corresponding noisy network output observed over the subsequent time interval of length $T$. Denoting by $y$ this output function (on which inter-symbol interference acts as an additional source of noise), and assuming stationarity to drop the subscripts, we can write the mutual information (in bits) between $u$ and $y$ as
$$\mathcal{I}_T(u; y) = \iint p(u, y)\,\log_2 \frac{p(u, y)}{p(u)\,p(y)}\,\mathrm{d}u\,\mathrm{d}y, \tag{3}$$
where the notation $\mathcal{I}_T$ emphasises the dependence of mutual information on the transmission window $T$ (a more formal definition of the integral over functions in Equation 3 is given in Supplementary Note 2). To better utilise the channel, the sender can use the encoding distribution $p(u)$ that maximizes the mutual information; this optimum defines an information metric which is independent of the encoding distribution,
$$\mathcal{C}_T = \max_{p(u)} \mathcal{I}_T(u; y). \tag{4}$$
With a slight abuse of terminology, we will refer to this metric as information capacity, or, simply, capacity (footnote 2).
Footnote 2: Our choice of terminology is motivated by the fact that this metric coincides with the standard capacity of a digital communication channel, when the channel is memoryless, see, e.g., (?). We refer to Supplementary Note 2 for further details on the relation between the channel capacity and our metric in Equation 4.
In Equation 4, the maximization over the encoding distribution must be performed with an additional constraint on input power (input covariance). Theoretically, this is required so that the capacity remains finite (the signal-to-noise ratio can be made arbitrarily large if inputs can be arbitrarily large too). In practice, the nodes of any physical network have limited dynamic range, and therefore network inputs must be power-limited. Here, we consider Gaussian encoding distributions with zero mean and covariance $\Sigma$, and an input power constraint of the form $\operatorname{tr}(\Sigma) = 1$ (without loss of generality; cf. Supplementary Note 3).
An expression for the information capacity
Our main theoretical result is the following expression for the information capacity (Supplementary Note 2):
[Equation 5]
where $\sigma^2$ is the variance of the noise at the receiver, $\mathcal{O}$ denotes the observability Gramian over the interval $[0, T]$ of the system in Equation 1, and $\mathcal{W}$ is the infinite-horizon controllability Gramian of the dynamics in Equation 1 discretized with sampling time $T$ and input matrix $B\Sigma^{1/2}$ (?). The formal definitions of these matrices are reported in Materials and Methods and their properties are discussed in our Supplementary Note 1. Note that Equation 5 still involves a (difficult) maximization over the input distribution (via its covariance matrix $\Sigma$); in the following, we perform this optimization analytically where possible, but otherwise numerically using efficient algorithms (Materials and Methods).
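The two Gramians named above can be computed with standard Lyapunov solvers. The sketch below evaluates, for a fixed input covariance, the Gaussian-channel quantity $\tfrac12\log_2\det(\sigma^2 I + \mathcal{O}\mathcal{W}) - \tfrac12\log_2\det(\sigma^2 I + \mathcal{O}(\mathcal{W} - B\Sigma B^\top))$. This log-determinant form is an assumption derived here from the model description (white Gaussian readout noise, Gaussian inputs, linear interference) and is not claimed to be Equation 5 verbatim; the function names are hypothetical, and no maximization over $\Sigma$ is attempted.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov, solve_discrete_lyapunov


def observability_gramian(A, C, T):
    """Finite-horizon Gramian: integral over [0, T] of e^{A^T t} C^T C e^{A t} (A stable)."""
    # Infinite-horizon Gramian solves A^T X + X A = -C^T C.
    O_inf = solve_continuous_lyapunov(A.T, -C.T @ C)
    F = expm(A * T)
    # Subtract the tail of the integral from T to infinity.
    return O_inf - F.T @ O_inf @ F


def information_bits(A, B, C, Sigma, T, sigma2):
    """Mutual information (bits) for a given input covariance Sigma (assumed form)."""
    n = A.shape[0]
    F = expm(A * T)                      # one-window state transition
    Q = B @ Sigma @ B.T                  # covariance injected by one packet
    W = solve_discrete_lyapunov(F, Q)    # W = F W F^T + Q (current + all past packets)
    O = observability_gramian(A, C, T)
    I = np.eye(n)
    _, logdet_total = np.linalg.slogdet(sigma2 * I + O @ W)
    _, logdet_noise = np.linalg.slogdet(sigma2 * I + O @ (W - Q))
    return 0.5 * (logdet_total - logdet_noise) / np.log(2)


if __name__ == "__main__":
    # Single node, decay time 1, window 1, unit noise variance (illustrative values).
    A = np.array([[-1.0]])
    print(information_bits(A, np.eye(1), np.eye(1), np.eye(1), T=1.0, sigma2=1.0))
```

Wrapping `information_bits` in a constrained optimizer over positive semidefinite $\Sigma$ with fixed trace would give a numerical estimate of the capacity; the optimization algorithm used in the paper is described in its Materials and Methods and is not reproduced here.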
The information capacity has a few intuitive properties (cf. Supplementary Note 3). First, $\mathcal{C}_T$ always grows with increasing SNR $1/\sigma^2$. Second, $\mathcal{C}_T$ is a bounded function of $T$ that attains its maximum as $T$ grows to infinity. This is because, for increasing $T$, (i) network activations left over from previous transmissions have more time to decay away, leading to weaker interference, and (ii) longer stretches of signal are available for decoding, allowing for better estimation of the input signal via additional filtering/de-noising. Third, $\mathcal{C}_T$ cannot decrease if nodes are added to either the set of input nodes, or the set of output nodes.
We also note that, in our framework, propagation of information through the network occurs over a finite time window $T$, and packets of information can only be transmitted one at a time. Thus, a more relevant measure of information transmission performance is the number of bits of information about $u$ contained in $y$ per unit time, i.e.,
$$\mathcal{R}_T = \frac{\mathcal{C}_T}{T}. \tag{6}$$
We term this metric information rate. Since the information capacity is bounded (due to output noise and inter-symbol interference), $\mathcal{R}_T$ always decreases with $T$ for large enough $T$. However, we will see that there often exists a non-zero optimal transmission window $T$, at which $\mathcal{R}_T$ reaches a maximum.
The limitations of normal networks
As we will see later, many high-dimensional networks can be conveniently decomposed as a set of parallel, independent communication channels each transmitting information about a one-dimensional, scalar quantity. We therefore begin our analysis of the role of connectivity in network communication with an in-depth look at a simple case, that of a single isolated node (Fig. 2A). With $B = C = 1$ and $A = -1/\tau$ (where $\tau$ is the node’s decay time constant), Equation 5 simplifies considerably, yielding the following capacity:
[Equation 7]
This expression illuminates some additional properties of the information capacity and its dependence on network parameters. To begin with, $\mathcal{C}_T$ grows with the allotted transmission window $T$ (Fig. 2B and C, left). Intuitively, this is because increasing the transmission window reduces inter-symbol interference, as the node’s activity has more time to decay away before the next packet is transmitted. However, while $\mathcal{C}_T$ grows linearly with $T$ for small $T$, it eventually saturates at a maximum value that grows both with the node’s decay time constant ($\tau$; Fig. 2B, left) and with the SNR ($1/\sigma^2$; Fig. 2C, left). Indeed, for large enough $T$, the output noise becomes the main factor limiting the capacity, and grows increasingly dominant during the transmission of a packet as the node’s activity (the “signal”) decays exponentially over time. Thus, increasing the observation time cannot indefinitely increase the ability of an ideal observer to reconstruct the input packet.
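The qualitative properties listed above can be checked on a worked scalar example. The derivation below is a sketch under explicit assumptions that are ours: a single node with $A = -1/\tau$, $B = C = 1$, unit input variance, white Gaussian readout noise of intensity $\sigma^2$, and the Gaussian-channel ratio of total output covariance to interference-plus-noise covariance. The resulting closed form may differ in presentation from the paper's Equation 7.

```latex
\begin{align*}
\mathcal{O} &= \int_0^T e^{-2t/\tau}\,\mathrm{d}t
             = \frac{\tau}{2}\left(1 - e^{-2T/\tau}\right),
\qquad
\mathcal{W} = \sum_{k=0}^{\infty} e^{-2kT/\tau}
             = \frac{1}{1 - e^{-2T/\tau}},\\[4pt]
\mathcal{C}_T &= \frac{1}{2}\log_2
   \frac{\sigma^2 + \mathcal{O}\,\mathcal{W}}{\sigma^2 + \mathcal{O}\,(\mathcal{W} - 1)}
 = \frac{1}{2}\log_2
   \frac{\sigma^2 + \tau/2}{\sigma^2 + (\tau/2)\,e^{-2T/\tau}}.
\end{align*}
```

Under these assumptions, $\mathcal{C}_T \approx T / \bigl(\ln 2\,(2\sigma^2 + \tau)\bigr)$ for small $T$, which is linear in $T$, and $\mathcal{C}_T \to \tfrac12 \log_2\bigl(1 + \tau/(2\sigma^2)\bigr)$ as $T \to \infty$, a ceiling that grows with both $\tau$ and $1/\sigma^2$.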
Next, as $T$ increases with diminishing returns on the capacity (cf. above), the rate $\mathcal{R}_T$ (information per unit time, Equation 6) is bound to decrease (Fig. 2B and C, right). Thus, keeping the transmission window very short is the most effective way for a single node to transmit information under time pressure. In this limit ($T \to 0$), the rate attains its largest value.
In practice though, transmission windows cannot be made arbitrarily small. For example, visual information conveyed to the brain via the optic nerve fluctuates on a timescale that is limited “at the source” by the rate at which objects move in the scene, and by the frequency and speed of saccadic eye movements which determine an effective sampling frequency (?). Thus, we now assume a finite transmission window $T$. In this case, there exists an optimal value of the decay time constant $\tau$ for both $\mathcal{C}_T$ (Fig. 2D, left) and $\mathcal{R}_T$ (not shown). This reflects a trade-off between the noise and inter-symbol interference, mathematically evident from Equation 7, where $\mathcal{C}_T$ can be seen to go to zero when $\tau$ is either very small or very large. Intuitively, for small decay time constants $\tau$, inter-symbol interference becomes irrelevant, and the information capacity is limited by the effective signal-to-noise ratio, which in turn decreases with decreasing $\tau$. Similarly, for long decay times (increasing $\tau$), inter-symbol interference dominates, and ruins the information capacity by letting the summed activities of many previous transmissions pollute the component relevant to the current packet. Thus, the rate (and capacity) is expected to achieve a maximum for some intermediate, optimal value of the decay time constant. Numerically, we find that this optimal time constant scales near-linearly with the transmission window $T$ (Fig. 2D, right).
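The trade-off between readout noise and inter-symbol interference can be explored with a short scan over decay time constants. This is a minimal sketch that assumes the scalar closed form $\mathcal{C}_T = \tfrac12\log_2\bigl[(\sigma^2 + \tau/2)/(\sigma^2 + (\tau/2)e^{-2T/\tau})\bigr]$, obtained by specializing the Gaussian-channel expression to a single node with unit input variance; that form is our reconstruction, not a quotation of Equation 7, and the function names are hypothetical.

```python
import numpy as np


def scalar_capacity(tau, T, sigma2):
    """Assumed single-node capacity in bits (see the stated assumptions above)."""
    half_tau = tau / 2.0
    return 0.5 * np.log2((sigma2 + half_tau) / (sigma2 + half_tau * np.exp(-2.0 * T / tau)))


def best_time_constant(T, sigma2, taus):
    """Return the decay time on the grid `taus` that maximizes the capacity, plus all values."""
    values = scalar_capacity(taus, T, sigma2)
    return taus[np.argmax(values)], values


if __name__ == "__main__":
    taus = np.logspace(-2, 3, 501)
    for T in (0.5, 1.0, 2.0, 4.0):
        tau_opt, _ = best_time_constant(T, sigma2=1.0, taus=taus)
        print(f"T = {T}: best tau on grid = {tau_opt:.3g}")
```

The capacity vanishes at both ends of the grid, for very fast decay because little signal survives the readout noise and for very slow decay because interference accumulates, so the maximizer is interior.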
The case of a single-node “network” is, in fact, characteristic of the broader class of so-called “normal” networks, which include symmetric, skew-symmetric, and translation-invariant graphs to name only a few examples. Indeed, when $B = C = I$ (footnote 3), any normal network composed of $n$ nodes can be shown to behave like a set of $n$ independent scalar information channels (Supplementary Note 5), each corresponding to a specific spatial “mode” of activity at the network level that decays at a specific rate between consecutive transmission events. For example, for a translation-invariant architecture, these channels correspond to Fourier modes of varying spatial frequencies with decay rates that depend on the strength and spatial smoothness of the recurrent interactions (?, ?).
Footnote 3: We recall that the case $B = C = I$ represents the most favorable communication scenario, since any other choice of $B$ and $C$ provably yields a smaller value of the information capacity and rate (cf. Supplementary Note 3).
Our mathematical analysis of normal networks shows that, despite their appealing interpretation as sets of parallel communication sub-channels, these networks might not be optimally suited for transmitting information. First, as expected from an ensemble of independent scalar sub-channels whose rates each decrease with $T$ (recall Fig. 2B and C, right; further examples are given below), multidimensional normal networks with $B = C = I$ too are best exploited in the limit of very small transmission windows ($T \to 0$). As discussed previously, this limit is irrelevant in most applications (where $T$ is finite), implying that normal networks would always be sub-optimally exploited in practice. Second, and more importantly, we could show that the maximum achievable performance of a normal network does not depend on the fine details of its architecture (e.g., the detailed couplings between nodes) but only on the average decay rate of its nodes (the trace of $A$). Indeed, for any choice of $B$ and $C$, the information rate of a normal network can never exceed (Supplementary Note 5)
[Equation 8]
In particular, the above limit is attained with equality when all nodes are transmitting and receiving packets of information, that is, when $B = C = I$. Critically, there are infinitely many network architectures that share the same $\operatorname{tr}(A)$ but have otherwise very different geometries. Thus, it would be somewhat surprising if, among the very large set of all (i.e., normal and non-normal) networks with the same trace, the restricted subset of normal networks achieved the best performance. What is more, Equation 8 also implies that the maximum rate of any normal network in the low SNR regime reduces to a quantity that no longer depends on the connectivity matrix $A$. In other words, no amount of clever structuring of a normal architecture can ever rescue the drop in information rate incurred by a decrease in SNR. These considerations prompted us to study information transmission through more general, non-normal networks.
Role of non-normality in information transfer
A “non-normal” network is any network whose connectivity matrix is not normal (?, ?). Thus, given the equivalence of normal networks with independent parallel channels discussed above, a non-normal network is one that cannot be so decomposed. This implies the existence of effective feedforward pathways, embedded either explicitly at the level of network nodes (i.e., an “anisotropic” tree-like structure that one would notice by looking at the connection graph; (?)) or implicitly at the level of orthogonal activity modes that involve many nodes simultaneously (“hidden” feedforward pathways; (?, ?, ?)). Mathematically, explicit and implicit tree-like structures can both be identified via the Schur decomposition $A = U S U^{*}$, with $U$ unitary and $S$ upper triangular. If $A$ is normal, this decomposition returns a diagonal matrix $S$, with the Schur modes (columns of $U$) interpreted as separate information channels with decay rates given by the diagonal of $S$. For a non-normal matrix $A$, the Schur decomposition returns a triangular $S$, the off-diagonal elements of which reveal hidden feedforward connections between the Schur modes.
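The Schur-based diagnosis described above can be sketched with SciPy. The two helper names below are hypothetical; the first measures how far a matrix is from commuting with its conjugate transpose, and the second returns the Frobenius norm of the strictly upper-triangular part of the complex Schur factor, which is zero exactly when the matrix is normal.

```python
import numpy as np
from scipy.linalg import schur


def normality_defect(A):
    """Frobenius norm of A A^* - A^* A (zero if and only if A is normal)."""
    Ah = A.conj().T
    return np.linalg.norm(A @ Ah - Ah @ A)


def schur_feedforward_strength(A):
    """Frobenius norm of the off-diagonal (feedforward) part of the complex Schur factor."""
    S, U = schur(A, output="complex")   # A = U S U^*, S upper triangular
    return np.linalg.norm(np.triu(S, k=1))


if __name__ == "__main__":
    symmetric = np.array([[-1.0, 0.5], [0.5, -1.0]])     # normal
    feedforward = np.array([[-1.0, 2.0], [0.0, -1.0]])   # non-normal, same eigenvalue twice
    for name, M in (("symmetric", symmetric), ("feedforward", feedforward)):
        print(name, normality_defect(M), schur_feedforward_strength(M))
```

The individual off-diagonal entries of the Schur factor depend on the ordering of the eigenvalues along its diagonal, but their overall Frobenius norm does not, which makes it a convenient scalar summary of hidden feedforward structure.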
While it is straightforward to classify a matrix as normal or non-normal, the extent or “degree” to which a matrix departs from normality, and how such departure affects the dynamics of the network and communication performance, are more difficult to assess. Indeed, although several non-normality metrics of either “dynamical” or “algebraic” nature have been proposed in the literature (Supplementary Note 6), there does not exist a unique scalar parameter quantifying the amount of non-normality of general matrices. To address this, we begin with a class of linear graphs whose departure from normality is parameterized by two characteristics that we can choose independently and arbitrarily: the length of the chains embedded in the graph, and the directionality of these chains (Fig. 3A; footnote 4). The connectivity matrix of these networks reads
[Equation 9]
where the parameter ranges are chosen to enforce stability. The simplicity of this architecture allows us to conveniently decouple the effects of (i) the eigenvalues of $A$, and (ii) its departure from normality, on the network dynamics (see below). We show later that the insights obtained from this simple structured example topology, especially concerning the role of network non-normality, carry over to higher-dimensional and heterogeneous networks. In particular, analogous considerations apply to the family of “layered” networks described in our Supplementary Note 7. This class consists of networks with arbitrary “baseline topology” made increasingly non-normal through a process of “directed stratification”. In addition, for these networks, one can define two parameters that represent the directionality strength between adjacent layers and the depth of connected layers, respectively. As in the chain network (Equation 9), these parameters regulate departure from normality.
Footnote 4: More generally, it can be shown that structural indicators of network non-normality are: (i) absence of cycles, (ii) low reciprocity of directed edges, and (iii) presence of hierarchical organization (see (?)). However, if the network is stable, the strength and length of directional paths in the networks represent effective indicators of non-normality (cf. Supplementary Note 6 and (?)).
Mathematically normal versions of this chain architecture are obtained either when there effectively is no chain (set of isolated nodes), or when there is no specific directionality in the connectivity (symmetric graph). In either case, the information rate decreases with increasing transmission window (Fig. 3B, lowest curves), consistent with the formal theory developed above. To understand this behaviour, and as a preliminary to our analysis of non-normal networks, we examine the optimal allocation of input power, or the spatial structure of the optimal input distribution. In Fig. 3C, we plot the optimal input covariance (calculated as part of deriving the capacity; recall Equation 5), expressed in the eigenbasis of the connectivity matrix $A$, with eigenvectors sorted by decreasing values of their decay rate. For long transmission windows, more of the input variance is funnelled through slow-decaying modes than through fast-decaying ones (right). This allows more of the input signal to survive the natural decay of activity in the network, thereby sustaining the signal-to-noise ratio at the receiver. For shorter transmission windows, this strategy no longer pays off: much of what is “signal” for the current transmission is effectively “noise” for the next transmission epoch, and prolonging its decay adds further inter-symbol interference. Accordingly, the optimal allocation strategy for short $T$ is the opposite of that for large $T$: each sub-channel is now allocated power proportional to its decay rate (Supplementary Note 5). Finally, while achieving the information capacity requires careful selection of sub-channels according to their decay rates (as just discussed), concentrating the input power on too few channels comes at a cost, as communication no longer exploits all the network’s degrees of freedom. This is best illustrated in a set of independent nodes with identical time constants, for which the best strategy is provably to give each node an equal share of the total available power (Supplementary Note 5). This amounts to maximizing the entropy of the input distribution. The covariance matrices of Fig. 3C represent the optimal way of resolving the above trade-offs, for the chain architecture considered here.
We next show that large gains in information rate can be obtained by making the network connectivity non-normal. The degree of non-normality of the chain’s connectivity matrix ($A$) can be increased, without altering its eigenvalues, by increasing a single parameter reflecting the graph’s directionality (Fig. 3A). As the network is made increasingly non-normal in this way, its information rate grows to eventually exceed the normal networks’ optimal rate by a large margin. Moreover, the optimal rate is now attained at some realistic, finite transmission window (Fig. 3B, left).
To understand the mechanism through which non-normality improves information transmission, we repeat our inspection of optimal power allocation, now for a non-normal network. In Fig. 3D, we plot the optimal input covariances (no longer expressed in the eigenbasis of $A$, but in the standard basis of the network’s nodes) for various transmission window lengths. For large transmission windows, including the one that leads to the largest rate, input power concentrates on the “source” nodes (left-most nodes in Fig. 3A, bottom). This optimal strategy exploits the network’s ability to amplify signals as they propagate down the chain towards the “sink” (the last node). Thus, the SNR at the receiver can display large transient increases, whereas its decay could at best be slowed down in normal networks. For short transmission windows, such a strategy no longer pays off, due to the same trade-offs as uncovered above for normal networks. First, the signal transiently builds up into the next transmission epoch, where it no longer is signal but instead contributes noise. Second, distributing input power unevenly across the network nodes by favouring the “source” nodes reduces the entropy of the input distribution, which fundamentally limits the information rate. Together, these drawbacks explain why the source nodes are not particularly favoured over sink nodes when $T$ is small (Fig. 3D, left), and why, in general, the input power does not concentrate entirely on the first node in the chain, but is distributed among the first few.
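Transient amplification without any change in eigenvalues can be demonstrated on a toy chain. The matrix below is a hypothetical parameterization chosen for illustration (decay `gamma` on the diagonal, forward coupling `beta * alpha`, backward coupling `beta / alpha`); it is not claimed to be the paper's Equation 9. It is diagonally similar to the symmetric chain with `alpha = 1`, so its spectrum does not depend on `alpha`.

```python
import numpy as np
from scipy.linalg import expm


def chain_matrix(n, gamma, beta, alpha):
    """Hypothetical directed chain of n nodes (illustrative parameterization).

    Entry (i, i-1) carries activity forward from node i-1 to node i;
    entry (i-1, i) carries it backward.
    """
    A = -gamma * np.eye(n)
    A += beta * alpha * np.eye(n, k=-1)     # forward (source -> sink) coupling
    A += (beta / alpha) * np.eye(n, k=1)    # backward coupling
    return A


def peak_gain(A, times):
    """Largest spectral norm of e^{A t} over the supplied time grid."""
    return max(np.linalg.norm(expm(A * t), 2) for t in times)


if __name__ == "__main__":
    times = np.linspace(0.0, 10.0, 201)
    for alpha in (1.0, 2.0, 3.0):
        A = chain_matrix(8, gamma=1.0, beta=0.4, alpha=alpha)
        slowest = np.linalg.eigvals(A).real.max()
        print(f"alpha = {alpha}: slowest eigenvalue = {slowest:.4f}, "
              f"peak gain = {peak_gain(A, times):.3f}")
```

For the symmetric case the gain never exceeds one, since a stable normal system can only decay; for the directed cases the gain rises above one before the eventual decay, which is the transient amplification exploited by the optimal input allocation.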
To further substantiate that non-normality benefits the information capacity, we manipulate the degree of non-normality of the chain network discussed above, this time not by increasing its directionality, but through a complementary modification. Specifically, we morph the non-normal chain discussed above back into a normal network, by chopping the original chain into sets of shorter chains (Fig. 3A, top to bottom). Shorter chains consistently yield smaller information capacity (Fig. 3B, right), confirming that network non-normality has a positive impact on information transmission. We found a similar correlation for the more general class of layered topologies described in our Supplementary Note 7. More precisely, for these networks increasing the depth of connected layers has a provably beneficial effect on the communication performance.
How noise shapes the optimal architecture
The results presented so far show that non-normal architectures can, in principle, outperform normal networks as information transmission media. These results were obtained for fixed input SNR, and we now show that non-normality is all the more beneficial as the SNR is poor. To show this, we revisit the chain architecture of the previous section (The limitations of normal networksA) and systematically vary , the amplitude of the noise at the receiver (Role of non-normality in information transfer).
In the low-noise regime, non-normality has little impact on information transmission, whether the network is made non-normal by increasing its directionality (Role of non-normality in information transferA) or by increasing the length of its chains (Role of non-normality in information transferB). In fact, for any , when is small we have (cf. Supplementary Note 4)
[TABLE]
which shows that in the low-noise regime the rate depends on the spectrum of only. For the chain network, Equation 10 reduces to , which is independent of and . For large enough , however, increasing or has pronounced benefits on the maximum information rate (Role of non-normality in information transferA-B). In contrast, modifications of the parameters of the normal network () that affect the eigenvalues without causing any departure from normality have close to no impact on the information rate. Specifically, changing the decay rate of the single nodes is only beneficial in the low-noise regime (Role of non-normality in information transferC), corroborating the conclusions drawn from Equation 8 above. The same equation also predicts that changing the overall coupling strength (while keeping the directionality constant) has no effect on (not shown).
From our analysis of this simple architecture, we conclude that network non-normality can greatly enhance information transmission in the low SNR regime. In fact, we were able to show that non-normality can (in theory) cancel the effect of noise altogether (Supplementary Note 7). Specifically, it holds that
[TABLE]
Equation 11 implies that, no matter how poor the SNR is, by increasing the degree of non-normality of the network via the directionality strength we get arbitrarily close to the maximum information rate achievable in the noiseless regime (by any network with identical value of ; Role of non-normality in information transferA, horizontal dashed red line).
Intriguingly, this result holds not only for the simple line architecture described above, but also for a more complex class of “layered” networks, with the free parameter summarising departure from normality in terms of directionality strength between layers (Supplementary Note 7). In this family of models, as in the linear chain, the detrimental effect of output noise (however large) can be annihilated entirely by making the network sufficiently non-normal (by increasing ). In this limit of strong non-normality, the network effectively behaves as a one-dimensional channel with decay rate , and indeed achieves an information rate equal to that of any network with the same in the absence of output noise (Equation 10).
Finally, we investigated how the noise level shapes the optimal architecture via an optimization approach. More precisely, we numerically computed the network architecture optimizing the maximum information rate with nodes, bounded network weights and different values of the noise variance (Supplementary Note 8). From our numerical analysis, it turns out that as grows optimal networks become increasingly similar to a purely (hidden or effective) feedforward chain of maximal length, with approximately all of the input power allocated to the first nodes of the chain. This further corroborates our claim that non-normality is crucial for enhancing the communication performance of a network in the high-noise regime.
Generalization to heterogeneous topologies
Although the formulae we have derived regarding the information capacity of linear networks hold for arbitrary topology, most of the results presented so far were based either on highly simplified, small, and structured architectures (The limitations of normal networksA), or on networks that deviated from normality in a highly structured way (Supplementary Note 7). To assess the generality of our results, we now study larger and more heterogeneous networks whose departure from normality we can also control. Specifically, we generate random connectivity matrices following (?) as:
[TABLE]
Here is a random positive definite matrix drawn from the inverse Wishart distribution (Materials and Methods), and is a random skew-symmetric matrix whose (upper-triangular) elements are drawn independently from a normal distribution with zero mean and variance . It is easily shown that any state matrix drawn according to Equation 12 implies stable network dynamics, despite the network graph showing apparent disorder with connections of arbitrary average magnitude (e.g. there is no limit to the norm of and ). The degree of network non-normality is set by the parameter : when , is symmetric, hence normal; as increases, departs further from normality (cf. Materials and Methods and Supplementary Note 6). We calculated the maximum rate of such networks for various degrees of non-normality, and found a similar interplay between network non-normality, transmission window, and input SNR as in the simplified architecture of The limitations of normal networks and Role of non-normality in information transfer. Specifically, non-normality results in greater maximum rates realized by non-zero optimal transmission windows (Figure 5A). Moreover, these benefits over normal networks only arise in the low SNR regime (Figure 5B). Finally, enhanced transmission performance at low SNR relies on a low-dimensional allocation of input power (Figure 5C).
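The construction described above can be sketched numerically as follows. Because Equation 12 is not reproduced here, the sketch assumes the simple form A = −P + γS, with P positive definite and S skew-symmetric, which is one form consistent with the stated properties (symmetric for γ = 0, stable for any γ). The names `random_stable_matrix` and `normality_defect`, the identity scale matrix, the choice of `n + 2` degrees of freedom, and the unit variance of the skew-symmetric entries are all hypothetical.

```python
import numpy as np
from scipy.stats import invwishart


def random_stable_matrix(n, gamma, sigma=1.0, seed=0):
    """Draw A = -P + gamma * S (assumed form, see the lead-in).

    P is inverse-Wishart (positive definite) and S is skew-symmetric with
    independent Gaussian upper-triangular entries of standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    P = invwishart.rvs(df=n + 2, scale=np.eye(n), random_state=seed)
    P = 0.5 * (P + P.T)  # remove any rounding asymmetry
    U = np.triu(rng.normal(0.0, sigma, size=(n, n)), k=1)
    S = U - U.T
    return -P + gamma * S


def normality_defect(A):
    """Frobenius norm of A A^T - A^T A, which is zero iff A is normal."""
    return np.linalg.norm(A @ A.T - A.T @ A, "fro")


for gamma in (0.0, 0.5, 2.0):
    A = random_stable_matrix(20, gamma, seed=3)
    max_real = np.max(np.linalg.eigvals(A).real)
    print(f"gamma={gamma}: max Re(eig)={max_real:.4f}, "
          f"defect={normality_defect(A):.3f}")
```

Under this assumed form, the symmetric part of A is −P, which is negative definite, so every eigenvalue has a negative real part regardless of how large γ is.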
The role of non-normality in information transmission is further illuminated by considering the limit of poor SNR (): for any transmission window length , the rate decays with growing as (cf. Supplementary Note 4)
[TABLE]
where represents the maximum total energy that the network can autonomously generate over a time window , for an appropriate encoding of the input packet . While the momentary magnitude of activity in normal networks can only decay in time (leading to sublinear growth of with , i.e. decreasing in Equation 13), non-normal networks have the capacity to transiently amplify certain input codes before the eventual decay of signals implied by collective stability. This leads to superlinear growth of with , which in turn results in transiently increasing peaking at some finite value of (Equation 13).
Finally, in deriving Equation 13, we could also prove that in the limit of large noise , the rate is realized by effectively one-dimensional inputs, whose distribution lies entirely along the most sensitive input direction (i.e. along the initial condition that evokes the largest energy in the window ; Supplementary Note 4). In other words, the best way for the network to counteract a large amount of noise is to map every input packet onto a single, maximally amplified input pattern, thus effectively giving up on most of its degrees of freedom. This corroborates and strengthens the generality of our findings of The limitations of normal networksD and Figure 5C regarding the effective dimensionality of the input distribution in the high-noise regime.
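The notions of maximum evoked energy and most sensitive input direction can be made concrete with a small numerical sketch. It assumes that all nodes are read out with unit weight, so that the energy evoked by an initial condition x0 over a window of length T is x0ᵀ W(T) x0, with W(T) the integral of exp(Aᵀt) exp(At). The helper names, the midpoint quadrature, and the toy chain (feedforward weight 5, unit decay) are illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy.linalg import expm


def energy_gramian(A, T, steps=400):
    """Midpoint-rule approximation of W(T) = int_0^T exp(A^T t) exp(A t) dt."""
    dt = T / steps
    W = np.zeros_like(A, dtype=float)
    for k in range(steps):
        Phi = expm(A * (k + 0.5) * dt)
        W += Phi.T @ Phi * dt
    return W


def most_sensitive_direction(A, T):
    """Return (max energy, unit initial condition that evokes it)."""
    eigvals, eigvecs = np.linalg.eigh(energy_gramian(A, T))
    return eigvals[-1], eigvecs[:, -1]


n = 4
A_normal = -np.eye(n)                                        # uncoupled nodes
A_chain = -np.eye(n) + 5.0 * np.diag(np.ones(n - 1), k=-1)   # feedforward chain

energy, direction = most_sensitive_direction(A_chain, T=2.0)
print("max energy:", energy)
print("|direction|:", np.round(np.abs(direction), 3))

for T in (0.1, 1.0):
    e_normal, _ = most_sensitive_direction(A_normal, T)
    e_chain, _ = most_sensitive_direction(A_chain, T)
    print(f"T={T}: E/T normal {e_normal / T:.3f}, chain {e_chain / T:.3f}")
```

In this toy example the most sensitive direction loads mainly on the first node of the chain, and the energy per unit time grows with the window for the chain while it shrinks for the uncoupled network.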
Discussion
In this paper we have proposed a novel framework to model information propagation through networks with arbitrary topology and nodes governed by linear dynamics. These dynamics imply a form of memory in single nodes, giving rise to interference between the activity transient initiated by the presentation of a given input packet, and the activity left over from previous transmissions. We have used the notion of Shannon’s mutual information to quantify communication performance, and study how the latter depends on the network architecture. Our analysis has shown that the qualitative effects of graph connectivity on communication are largely determined by a property that is often overlooked: the degree of non-normality of the network’s (weighted) adjacency matrix. In particular, we have shown that normal networks perform poorly in the presence of large readout noise at the receiver. In contrast, non-normal networks exhibit more favorable communication properties, including the ability to entirely cancel out the effect of readout noise provided the input packets are appropriately encoded, and the adjacency matrix is sufficiently non-normal. Interestingly, non-normal networks appear ubiquitous, with strong non-normality having been found in foodwebs, transport, biological, social, communication, and citation networks (?). Besides information transfer, non-normality is also key to explaining a variety of other equally important phenomena. Examples include pattern formation in natural and biological systems (?, ?), the selective amplification of cortical activity patterns in the brain (?), and the emergence of giant oscillations in noise-driven dynamical systems (?, ?, ?).
To further highlight the impact and potential practical relevance of our findings, we have used our framework to analyze the communication performance of the neuronal network of the nematode Caenorhabditis elegans. We focused on the (weighted and directed) chemical synapse network described in (?, ?), and examined the linearized and stabilized network dynamics of the neuronal membrane potentials (Materials and Methods). The network, which is illustrated in Figure 6A, comprises 279 neurons (divided into 88 sensory neurons, 82 interneurons, and 107 motor neurons) recurrently coupled through 2194 inhibitory/excitatory synaptic connections. We first wondered whether the non-normality of this directed biological connectome had the beneficial impact on communication that we have documented here for artificial networks. We thus compared its information rate (as a function of the transmission window ) with that of a symmetrized version (implying normal ), as well as a randomized ensemble wherein the direction of each existing coupling in the connectome is reversed with probability . Both manipulations induce a significant drop in from the real network (Figure 6B), indicating that the C. elegans connectome is non-normal in a way that benefits information transmission as shown in this paper. We next wondered if the network’s non-normal structure is likely to be exploited for communication by these organisms. We reasoned that communication would naturally flow from sensory neurons to motor neurons, and that the network should therefore display good communication (in our framework) if, and only if, the input matrix were to select sensory neurons while the output matrix were to read out motor neurons. We found that this is indeed the case (compare Figure 6C green and blue). Strikingly, also, the symmetrized version of the connectome is almost unable to communicate information from sensory to motor nodes (Figure 6B, red).
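The two control manipulations described above, symmetrization and random reversal of edge directions, can be sketched as follows. This is a minimal illustration on a toy graph, not the connectome data. The function names, the reversal probability `p`, and the treatment of reciprocal connections (swapping the two weights) are assumptions.

```python
import numpy as np


def symmetrize(A):
    """Symmetric (hence normal) counterpart of a weighted adjacency matrix."""
    return 0.5 * (A + A.T)


def reverse_edges(A, p, seed=0):
    """Reverse the direction of each coupled node pair with probability p.

    Reversal swaps A[i, j] and A[j, i]; this treatment of reciprocal
    connections is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    A_rev = A.copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            coupled = A[i, j] != 0 or A[j, i] != 0
            if coupled and rng.random() < p:
                A_rev[i, j], A_rev[j, i] = A[j, i], A[i, j]
    return A_rev


# Toy directed graph standing in for the connectome (not real data).
A_toy = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
A_sym = symmetrize(A_toy)
A_shuffled = reverse_edges(A_toy, p=0.5, seed=1)
print(A_sym)
print(A_shuffled)
```

Both manipulations preserve the set of coupled node pairs, so any change in the information rate would reflect the loss of directionality rather than a change in which nodes are connected.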
Although preliminary, these numerical findings could shed light on the actual functioning of the C. elegans neuronal circuit and behavioural responsiveness to external stimuli. More generally, we expect that our theoretical framework could be used to understand and explain the emergence of certain topological structures in biological networks, and to identify their intrinsic communication pathways.
In the paper we focused on weighted networks, and regarded the weights (and, precisely, their directionality and magnitude) as the main factors influencing network non-normality. However, non-normal architectures can also emerge in unweighted networks, e.g., in networks with heterogeneous outdegree/indegree distributions. It would therefore be interesting to investigate what the most relevant features impacting non-normality in unweighted networks are, and to what extent these features affect communication performance. Further, in our framework, noise is modelled as the combined effect of readout noise and an internal, structured source of noise arising from inter-symbol interference. Investigating how different noise models could affect our analysis and results represents a compelling direction of future research. Also, noise could play an active role in the information transfer process as the input source of the communication channel. This change of perspective could lead to an information-theoretic interpretation of the findings of (?, ?, ?), wherein non-normality has been linked with the emergence of amplified oscillations in noise-driven interconnected non-linear systems.
As is well known in the theory of non-normal matrices and operators (?), strong departure from normality often implies heightened sensitivity to structural perturbations, for example the random addition or deletion of nodes or edges in a graph. This suggests a generic trade-off between communication performance and resilience, which would be interesting to study further. For example, we note that in the low-noise regime where normal networks can perform just as well as non-normal ones, constraints on robustness would favour normal networks. A similar trade-off has been identified recently in (?) where network resilience was shown to be generically at odds with network controllability.
Our work may also offer new perspectives on memory and information storage. Information transmission and storage are very similar problems: communication is transmission through space, while memory is transmission through time. Indeed, these two problems admit very similar models, are often both approached using the tools of information theory (?, ?, ?), and may interact in the context of networks in ways that would be interesting to investigate further. Preliminary intuitions suggest that they may benefit each other: in our communication model, for example, inter-symbol interference could be reduced if one could keep a memory of decoded past packets, and subtract their individual contributions to the momentary network activity at any time. Conversely, communication may improve memory. An obvious example is the oral tradition in human communities, where transmission of information from generation to generation emerges as a way to overcome the finite memory- (and indeed, life-) span of individuals.
Materials and Methods
Gramian matrices and numerical computation of information capacity and rate.
The observability Gramian over the interval of the system in Equation 1 is defined as
[TABLE]
and can be numerically evaluated via numerical integration of the matrix-valued differential equation:
[TABLE]
subject to the initial condition (?). The infinite-horizon controllability Gramian of the dynamics in Equation 1 discretized with sampling time is defined as
[TABLE]
and can be computed as the solution of the discrete-time algebraic Lyapunov equation:
[TABLE]
In the numerical evaluation of the capacity and rate, the Gramians (14) and (16) have been computed via (15) and (17), respectively. For vector-valued inputs (), the solution of the optimization problem in Equations 5 and 6 has been numerically carried out in Python using optimization routines from the Pymanopt library (?), together with automatic differentiation techniques provided by Autograd (?). If and is normal, the solution is unique and admits a closed-form expression in terms of eigenvalues of (Supplementary Note 5). More generally, if , then the optimization in Equations 5 and 6 is convex (Supplementary Note 3), and so convergence to the maximum is always guaranteed using trust-region or steepest descent methods. Otherwise, the problem turns out to be, in general, non-convex, and, in order to avoid local maxima, we ran the latter routines several times (-), starting from different random initializations, and selected the largest outcome.
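The two Gramian computations described above can be sketched with SciPy as follows. The sketch assumes the standard textbook definitions: the finite-horizon observability Gramian W(T) satisfies dW/dt = AᵀW + WA + CᵀC with W(0) = 0, and the infinite-horizon controllability Gramian of the system sampled with period T solves a discrete-time Lyapunov equation with unit input covariance. The function names and the symbols `A`, `B`, `C`, `T` are assumptions, and the paper's exact expressions may differ, for example by an input covariance factor.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm, solve_discrete_lyapunov


def observability_gramian(A, C, T):
    """Integrate dW/dt = A^T W + W A + C^T C from W(0) = 0 up to time T."""
    n = A.shape[0]
    Q = C.T @ C

    def rhs(t, w):
        W = w.reshape(n, n)
        return (A.T @ W + W @ A + Q).ravel()

    sol = solve_ivp(rhs, (0.0, T), np.zeros(n * n), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(n, n)


def discrete_controllability_gramian(A, B, T):
    """Solve Ad W Ad^T - W + B B^T = 0 with Ad = exp(A T)."""
    Ad = expm(A * T)
    return solve_discrete_lyapunov(Ad, B @ B.T)


# Sanity check on a system with a known closed form: A = -I, B = C = I.
A = -np.eye(2)
W_obs = observability_gramian(A, np.eye(2), T=1.5)
W_ctr = discrete_controllability_gramian(A, np.eye(2), T=1.5)
print(W_obs)
print(W_ctr)
```

For this diagonal test system the Gramians are multiples of the identity, (1 − e^{−2T})/2 and 1/(1 − e^{−2T}) respectively, which gives a quick check that the integration tolerance is adequate.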
Generation of random non-normal matrices and participation ratio.
In Equation 12, the skew-symmetric matrix has been generated as , with for , and otherwise. The positive definite matrix has been drawn from the inverse Wishart distribution with scale matrix and degrees of freedom. We chose , , in order to guarantee sufficient heterogeneity in the eigenvalues of (?). With this choice, it can be shown that correlates well with standard measures of matrix non-normality (Supplementary Note 6). Following (?), given a positive definite matrix with eigenvalues , we define the participation ratio
[TABLE]
When applied to the covariance matrix, the participation ratio provides a measure of the effective dimensionality of the underlying random vector.
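As a small illustration, the participation ratio can be computed directly from the eigenvalues. The sketch assumes the usual definition, the squared sum of the eigenvalues divided by the sum of their squares, since the equation is not reproduced in this extract. The function name is hypothetical.

```python
import numpy as np


def participation_ratio(Sigma):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of Sigma."""
    eigvals = np.linalg.eigvalsh(Sigma)
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)


# Power spread evenly over 5 dimensions versus concentrated on one.
pr_isotropic = participation_ratio(np.eye(5))
pr_concentrated = participation_ratio(np.diag([1.0, 1e-12, 1e-12]))
print(pr_isotropic, pr_concentrated)
```

Under this definition the ratio ranges from 1, when all power lies along a single direction, up to the matrix dimension, when power is spread evenly.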
Caenorhabditis elegans dataset and network dynamics.
The C. elegans connectivity data of (?, ?) comprise two datasets: the gap junction and chemical synapse wiring diagrams. Since the gap junction dataset does not include link directionality, in our study we focus on the chemical synapse network which possesses clear directionality extracted from electron micrographs. This network consists of 279 neurons. These neurons are categorized in 88 sensory neurons (neurons known to respond to specific environmental conditions), 107 motor neurons (neurons characterized by the presence of neuromuscular junctions), and 82 interneurons (the remainder). The network comprises 2194 synaptic connections. As in (?), we make the common assumption that GABAergic neurons (26 neurons) make inhibitory synapses, whereas the rest of the neurons form excitatory synapses. We describe the autonomous dynamics of the chemical synapse network by the following linear system
[TABLE]
where is the vector containing the membrane potentials of all neurons around an equilibrium, is the adjacency matrix of the chemical synapse network, , and . Here, the parameters , , and represent the (average) neuronal membrane capacitance, synaptic conductance, and membrane conductance, respectively (see (?, ?) for further details). In our numerical study, we set and tune in order to stabilize the network dynamics (19). Specifically, we set the largest real part of the eigenvalues to . This yields , a value within the physiological range of and (?). However, profiles of qualitatively similar to those in Figure 6A, B have been obtained for a wide range of values of parameters , , and noise variance .
Supplementary Materials
Note S1: Controllability and observability Gramians.
Note S2: Derivation of the information capacity formula.
Note S3: Properties of the information capacity and rate.
Note S4: Information rate in the low and high noise regime.
Note S5: Information rate of normal networks.
Note S6: Measures of matrix non-normality and network indicators of non-normality.
Note S7: Information rate of a class of non-normal networks.
Note S8: Optimal communication architectures.
Fig. S1: Non-normality metrics and parameters of chain network.
Fig. S2: Non-normality metrics and parameters of heterogeneous network.
Fig. S3: Construction of “layered” non-normal networks.
Fig. S4: Optimal networks and input covariances as a function of the transmission window .
Fig. S5: Optimal networks and input covariances as a function of the noise covariance .
References: (?, ?, ?, ?, ?, ?, ?, ?).
References
- 1. A. Guille, H. Hacid, C. Favre, D. A. Zighed, Information diffusion in online social networks: A survey. ACM Sigmod Rec. 42, 17–28 (2013).
- 2. S. Molaei, S. Babaei, M. Salehi, M. Jalili, Information spread and topic diffusion in heterogeneous information networks. Sci. Rep. 8, 9549 (2018).
- 3. C. Decker, R. Wattenhofer, Proceedings of the 2013 IEEE Thirteenth International Conference on Peer-to-Peer Computing (P2P) (2013), pp. 1–10.
- 4. R. Cheong, A. Rhee, C. J. Wang, I. Nemenman, A. Levchenko, Information transduction capacity of noisy biochemical signaling networks. Science 334, 354–358 (2011).
- 5. J. Selimkhanov, B. Taylor, J. Yao, A. Pilko, J. Albeck, A. Hoffmann, L. Tsimring, R. Wollman, Accurate information transmission through dynamic biochemical signaling networks. Science 346, 1370–1373 (2014).
- 6. S. Galli, A. Scaglione, Z. Wang, For the grid and through the grid: The role of power line communications in the smart grid. Proc. IEEE 99, 998–1027 (2011).
- 7. S. B. Laughlin, T. J. Sejnowski, Communication in neuronal networks. Science 301, 1870–1874 (2003).
- 8. A. Avena-Koenigsberger, B. Misic, O. Sporns, Communication dynamics in complex brain networks. Nat. Rev. Neurosci. 19, 17 (2018).
