Diffusion geometry unravels the emergence of functional clusters in collective phenomena

Manlio De Domenico

arXiv:1704.07068 · physics.soc-ph · April 25, 2017

TL;DR

This paper introduces a diffusion geometry-based approach using random walk dynamics to identify functional clusters in complex systems, revealing mesoscale organization that differs from structural modules in biological and artificial networks.

Contribution

It presents a novel framework leveraging diffusion geometry and random walk metrics to predict and analyze functional modules in complex networked systems.

Findings

1. Diffusion distance effectively captures functional clusters.
2. Functional modules often differ from structural modules.
3. The approach applies to both biological and synthetic systems.

Equations

$$\dot{\theta}_{i}(\tau)=\omega_{i}+\sum_{j=1}^{N}\sigma_{ij}A_{ij}\sin\left(\theta_{j}(\tau)-\theta_{i}(\tau)\right). \qquad (1)$$

$$\dot{\bm{\theta}}=-\mathbf{\tilde{L}}\bm{\theta}, \qquad (2)$$

$$s_{\tau}^{2}(i,j)=\left[\bm{\tilde{\theta}}(\tau;i)-\bm{\tilde{\theta}}(\tau;j)\right]^{2}, \qquad (3)$$

$$c_{\tau}^{2}(i,j)=\left[\mathbf{\tilde{x}}(\tau;i)-\mathbf{\tilde{x}}(\tau;j)\right]^{2}, \qquad (4)$$

$$\dot{\mathbf{p}}(\tau)=-\mathbf{p}(\tau)\mathbf{\tilde{L}}, \qquad (5)$$

$$d_{\tau}^{2}(i,j)=\left[\mathbf{p}(\tau|i)-\mathbf{p}(\tau|j)\right]^{2}, \qquad (6)$$

Taxonomy

Methods: Random Search

Full text

Manlio De Domenico

Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain

Abstract

Collective phenomena emerge from the interaction of natural or artificial units with a complex organization. The interplay between structural patterns and dynamics might induce functional clusters that, in general, are different from topological ones. In biological systems, like the human brain, the overall functionality is often favored by the interplay between connectivity and synchronization dynamics, with functional clusters that do not coincide with anatomical modules in most cases. In social, socio-technical and engineering systems, the quest for consensus favors the emergence of clusters.

Despite the unquestionable evidence for mesoscale organization of many complex systems and the heterogeneity of their inter-connectivity, a way to predict and identify the emergence of functional modules in collective phenomena continues to elude us. Here, we propose an approach based on random walk dynamics to define the diffusion distance between any pair of units in a networked system. Such a metric allows us to exploit the underlying diffusion geometry to provide a unifying framework for the intimate relationship between metastable synchronization, consensus and random search dynamics in complex networks, pinpointing the functional mesoscale organization of synthetic and biological systems.

The absence of a central authority coordinating the interactions among units of a complex system might lead to interesting collective phenomena, such as synchronization Arenas et al. (2008) in biological systems or consensus Olfati-Saber et al. (2007) in social and technological networks. This type of self-organization is affected by the underlying structure, which for a wide variety of real systems is highly heterogeneous Barabási and Albert (1999) and modular Krause et al. (2003); Guimera and Amaral (2005). Understanding the interplay between structure and dynamics of such systems has been, and still is, a major challenge in the study of complex systems. Empirical observations, confirmed by numerical simulations and theoretical predictions, suggest that complex systems with hierarchical and/or modular mesoscale organization of their units Fortunato (2010) are characterized by topological scales Arenas et al. (2006) and by the emergence of functional clusters that might be, in general, different from topological ones.

In this letter, we show that such functional clusters might be predicted and identified for a wide variety of complex networks: specifically, for biological systems which can be modeled as networks of oscillators, and for systems of individuals or sensors attempting to reach consensus. The unifying picture is provided by diffusion geometry Coifman et al. (2005), developed one decade ago for nonlinear dimensionality reduction of complex data. This approach uses Markov processes to integrate local similarities at different scales, making it possible to approximate the manifold which best describes the data while preserving their topological features. From a physical perspective, this approach relies on topological information gathered by random searches across time, a principle that has been used successfully in network science to unravel the topological mesoscale organization of a system based on how information flows through its units Rosvall and Bergstrom (2008); Delvenne et al. (2010); Schaub et al. (2012); Della Rossa et al. (2013); Lambiotte et al. (2014); Rosvall et al. (2014).

Synchronization dynamics. Let us indicate with $A_{ij}$ the entries of the adjacency matrix $\mathbf{A}$ representing the connections among a set of $N$ units (note that $A_{ij}=1$ if two units are connected and zero otherwise), each one encoding an oscillator with natural frequency $\omega_{i}$ and phase $\theta_{i}$. The dynamics of this networked system of oscillators has been widely studied in recent decades Arenas et al. (2008) and it is generally described by the Kuramoto model:

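$$\dot{\theta}_{i}(\tau)=\omega_{i}+\sum_{j=1}^{N}\sigma_{ij}A_{ij}\sin\left(\theta_{j}(\tau)-\theta_{i}(\tau)\right). \qquad (1)$$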

The choice of $\sigma_{ij}$, the mixing rate, determines the speed of convergence to a synchronized state, if any, and the behavior of the system in the thermodynamic limit $N\longrightarrow\infty$. It has been shown that, contrary to naive expectation, synchronizability does not necessarily correlate with the average distance between oscillators, which might be extraordinarily small in the case of strongly heterogeneous connectivity Cohen and Havlin (2003). Such heterogeneity might, in fact, suppress synchronization in networked oscillators which are coupled symmetrically with uniform coupling strength Nishikawa et al. (2003). A solution to this apparent paradox Motter et al. (2005), which undermines the relevance of the scale-free paradigm as a universal property of robust self-organizing phenomena favored by evolutionary dynamics Barabási and Albert (1999), is to consider a mixing rate which is inversely proportional to the node’s degree $k_{i}=\sum\limits_{j}A_{ij}$, i.e. $\sigma_{ij}=K/k_{i}$, where $K$ is an overall coupling constant (that we set equal to 1 in our analysis). This choice effectively reduces the dephasing effects in hubs, bringing the dynamics of synchronization close to the global attractor into a closer relationship with the dynamics of information diffusion in the network, and confirming that synchronization does not spread only along the shortest paths between two units but along all possible ones.
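The degree-normalized coupling can be made concrete with a small numerical experiment. The following is a minimal sketch, not the author's code: it integrates the Kuramoto model with $\sigma_{ij}=K/k_{i}$ by explicit Euler steps on a toy graph of two 4-cliques joined by one edge. The graph, the step size `dt`, the initial phases and the helper names (`two_cliques`, `kuramoto_step`, `order_parameter`) are all illustrative assumptions.

```python
import numpy as np

def two_cliques(n=4):
    """Adjacency matrix of two n-cliques joined by a single edge (toy example)."""
    A = np.kron(np.eye(2), np.ones((n, n)) - np.eye(n))
    A[n - 1, n] = A[n, n - 1] = 1.0
    return A

def kuramoto_step(theta, omega, A, K=1.0, dt=0.05):
    """One explicit Euler step of the Kuramoto model with sigma_ij = K / k_i."""
    k = A.sum(axis=1)                                # node degrees k_i
    diff = theta[None, :] - theta[:, None]           # diff[i, j] = theta_j - theta_i
    coupling = (K / k) * (A * np.sin(diff)).sum(axis=1)
    return theta + dt * (omega + coupling)

def order_parameter(theta):
    """Kuramoto order parameter r = |mean(exp(i*theta))|; r = 1 is full synchrony."""
    return float(np.abs(np.exp(1j * theta).mean()))

A = two_cliques()
omega = np.zeros(A.shape[0])                         # identical natural frequencies
theta = np.linspace(0.0, 2.0, A.shape[0])            # phases spread over an arc < pi
r_start = order_parameter(theta)
for _ in range(4000):                                # integrate up to time 200
    theta = kuramoto_step(theta, omega, A)
r_end = order_parameter(theta)
print(r_start, r_end)
```

With identical frequencies and initial phases confined to an arc shorter than $\pi$, the phases contract towards a common value, so the order parameter approaches 1.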

In complex networks with a well-defined mesoscale organization, nodes belonging to the same cluster tend to synchronize to a common phase, not necessarily equal for all clusters, while the dynamics towards synchronization evolves Oh et al. (2005). If the natural frequency is the same for all units, there is only one attractor for the dynamics, corresponding to the point where all phases are the same, i.e., $\theta_{i}(\tau\longrightarrow\infty)=\theta^{\star}$ for $i=1,2,...,N$, with $\tau$ representing time. Numerical experiments show that a strong cluster organization favors a metastable synchronized state, where $\theta_{i}\simeq\theta_{j}$ if nodes $i$ and $j$ belong to the same cluster. In this peculiar state, and for a sufficiently small amount of time, contributions from units which act as bridges with other clusters might be neglected with respect to the larger number of intra-cluster contributions. The overall dynamics therefore consists of a first phase, where intra-cluster synchronization takes place, followed by a second phase where cluster-cluster synchronization emerges, slowly driving the system towards its global attractor (see Fig. 1). During both phases, $\sin(\theta_{j}-\theta_{i})\simeq(\theta_{j}-\theta_{i})$ and the dynamics can be approximately described by

$$\dot{\bm{\theta}}=-\mathbf{\tilde{L}}\bm{\theta}, \qquad (2)$$

where $\mathbf{\tilde{L}}=\mathbf{I}-\mathbf{D}^{-1}\mathbf{A}$ is the normalized Laplacian matrix, $\mathbf{I}$ is the identity matrix, $D_{ii}=k_{i}$ and $D_{ij}=0$ for $i\neq j$. The matrix $\mathbf{\tilde{L}}$ governing the dynamics is the same one that governs the diffusion of a random walker and the probability of finding it at a certain node at a certain time step, as we will see later. During the metastable state, we can describe the common phase of nodes which are clustered together by $\theta_{0}^{C_{m}}$, with $m=1,2,...,M$ indicating the cluster, and we indicate with $\bm{\theta}_{0}$ the vector $(\theta_{0}^{C_{1}},\theta_{0}^{C_{2}},...,\theta_{0}^{C_{M}})$. Let us introduce the rectangular matrix $\mathbf{S}$ encoding the (unknown) mesoscale organization of the system, i.e., $S_{im}=1$ if node $i$ belongs to cluster $m$ and zero otherwise. These definitions allow us to write the state vector in a very compact form as $\mathbf{z}=\mathbf{S}\bm{\theta}_{0}$. Let us make a small localized perturbation of the phase of unit $i$: the perturbed state can be written as $\mathbf{z}_{i}=\mathbf{z}+\delta\theta_{0}\mathbf{v}_{i}$, where $\mathbf{v}_{i}$ is the canonical vector with $i$-th component equal to 1 and $\delta\theta_{0}\ll 1$. Taking the perturbed metastable state as the initial condition, the state of the system at time $\tau$ is given by $\bm{\theta}(\tau;i)=\exp{(-\tau\mathbf{\tilde{L}})}\mathbf{z}_{i}$. It is plausible to expect that the magnitude of the difference between the evolution of the perturbed states $\mathbf{z}_{i}$ and $\mathbf{z}_{j}$ is small when the corresponding nodes belong to the same cluster and larger when this is not the case. We define the synchronizability distance between two nodes by

$$s_{\tau}^{2}(i,j)=\left[\bm{\tilde{\theta}}(\tau;i)-\bm{\tilde{\theta}}(\tau;j)\right]^{2}, \qquad (3)$$

with $\bm{\tilde{\theta}}(\tau;i)=\mathbf{z}_{i}\exp(-\tau\mathbf{\tilde{L}})$, to quantify how easy it is for two nodes to reach a common phase during a metastable state. Intriguingly, the synchronizability distance reduces to $s^{2}_{\tau}(i,j)\propto\left[(\mathbf{v}_{i}-\mathbf{v}_{j})e^{-\tau\mathbf{\tilde{L}}}\right]^{2}$, where the right-hand side is better known as the diffusion distance Belkin and Niyogi (2001).
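The reduction follows in one step from the definitions of $\mathbf{z}_{i}$ and $\bm{\tilde{\theta}}(\tau;i)$. Written out as a worked restatement (assuming the row-vector convention used for $\bm{\tilde{\theta}}$, and reading the square as a squared Euclidean norm):

$$\bm{\tilde{\theta}}(\tau;i)-\bm{\tilde{\theta}}(\tau;j)=(\mathbf{z}_{i}-\mathbf{z}_{j})e^{-\tau\mathbf{\tilde{L}}}=\delta\theta_{0}(\mathbf{v}_{i}-\mathbf{v}_{j})e^{-\tau\mathbf{\tilde{L}}}\quad\Longrightarrow\quad s^{2}_{\tau}(i,j)=\delta\theta_{0}^{2}\left[(\mathbf{v}_{i}-\mathbf{v}_{j})e^{-\tau\mathbf{\tilde{L}}}\right]^{2}.$$

The common metastable state $\mathbf{z}$ cancels in the difference, so the distance depends only on which units were perturbed and on the size of the perturbation, with proportionality constant $\delta\theta_{0}^{2}$.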

Consensus dynamics. In a social context, as well as in a system of sensors, decision-making processes require individuals (or units) to exchange information to self-organize and, under certain circumstances, such as the absence of coordinating authorities or external influences, the emergence of consensus is observed Olfati-Saber and Murray (2004); Olfati-Saber et al. (2007). A distributed consensus dynamics based on a linear protocol exists and it is governed by the Laplacian matrix of the network. Because of the natural heterogeneity observed in this type of system Liljeros et al. (2001), it is desirable to define a consensus dynamics where the weight due to the high connectivity of a few individuals is compensated, for instance by rescaling the amount of exchanged information by their degree. This type of decentralized opinion-formation dynamics is equivalent to a continuous-time DeGroot model DeGroot (1974) and can be mathematically described as in Eq. (2), with the opinion vector $\mathbf{x}(\tau)$ playing the role of the phase vector $\bm{\theta}(\tau)$. It is straightforward to show that the weighted-average consensus is asymptotically reached Olfati-Saber et al. (2007). Similarly to the case of synchronization, we expect that in a network with a mesoscale organization, individuals or units within a cluster tend to reach consensus first, subsequently driving the collective dynamics of the system towards the overall consensus (see Fig. 1C). To better understand this process, we consider that the system is in a consensus state except for node $i$, e.g. $\mathbf{x}(0)=\mathbf{v}_{i}$. We consider the same setup with another node $j\neq i$ and then we track the evolution of both states over time. We introduce the consensus distance

$$c_{\tau}^{2}(i,j)=\left[\mathbf{\tilde{x}}(\tau;i)-\mathbf{\tilde{x}}(\tau;j)\right]^{2}, \qquad (4)$$

with $\mathbf{\tilde{x}}(\tau;i)=\mathbf{v}_{i}\exp(-\tau\mathbf{\tilde{L}})$, under the plausible assumption that, as in the case of synchronization, this distance tends to be small if the two nodes belong to the same cluster and larger otherwise. This distance can be rewritten as $c^{2}_{\tau}(i,j)=\left[(\mathbf{v}_{i}-\mathbf{v}_{j})e^{-\tau\mathbf{\tilde{L}}}\right]^{2}$, where the right-hand side is the diffusion distance.

Using diffusion geometry to reveal functional clusters. The dynamics describing how a piece of information diffuses through networked systems has been well studied for classical Noh and Rieger (2004) and multilayer networks De Domenico et al. (2014, 2016) (see Ref. Masuda et al. (2016) for a thorough review). The probability of finding the random walker at any node after a certain amount of time $\tau$ is given by the solution of the master equation

$$\dot{\mathbf{p}}(\tau)=-\mathbf{p}(\tau)\mathbf{\tilde{L}}, \qquad (5)$$

where $\mathbf{\tilde{L}}$ is the normalized Laplacian matrix we have discussed before. The general solution is given by $\mathbf{p}(\tau)=\mathbf{p}(0)\exp{(-\tau\mathbf{\tilde{L}})}$. Here, we indicate by $\mathbf{p}(\tau|i)=\mathbf{v}_{i}\exp{(-\tau\mathbf{\tilde{L}})}$ the probability vector corresponding to the initial condition where the walker’s origin is in node $i$ with probability 1 (i.e., $\mathbf{p}(0)=\mathbf{v}_{i}$).
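The solution of the master equation can be evaluated numerically without a general matrix-exponential routine. The sketch below is one possible implementation under stated assumptions: the graph is undirected with no isolated nodes, so $\mathbf{\tilde{L}}$ is similar to the symmetric matrix $\mathbf{I}-\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ and its exponential follows from a symmetric eigendecomposition. The function name `heat_kernel` and the two-clique toy graph are illustrative choices, not taken from the paper.

```python
import numpy as np

def heat_kernel(A, tau):
    """exp(-tau * L) with L = I - D^{-1} A, for an undirected graph without isolated nodes.

    Row i of the result is p(tau | i), the walker's distribution when it starts at node i.
    """
    k = A.sum(axis=1)
    d_sqrt = np.sqrt(k)
    # Symmetric normalized Laplacian; L = D^{-1/2} L_sym D^{1/2}
    L_sym = np.eye(len(k)) - A / np.outer(d_sqrt, d_sqrt)
    lam, U = np.linalg.eigh(L_sym)
    H_sym = (U * np.exp(-tau * lam)) @ U.T           # exp(-tau * L_sym)
    return H_sym * d_sqrt[None, :] / d_sqrt[:, None]  # D^{-1/2} exp(-tau L_sym) D^{1/2}

# Toy graph (assumption): two 4-cliques joined by the edge (3, 4)
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 1.0
k = A.sum(axis=1)

H = heat_kernel(A, 1.0)           # short time: mass still concentrated near the origin
H_long = heat_kernel(A, 200.0)    # long time: every row approaches the stationary state
stationary = k / k.sum()          # stationary distribution of the walk, k_i / sum_j k_j
print(H[0].round(3))
print(H_long[0].round(3), stationary.round(3))
```

Each row of `H` sums to one at every $\tau$, and for large $\tau$ all rows converge to the degree-proportional stationary distribution, which is why distances between rows shrink with time.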

We exploit the intriguing connection between the measure of synchronizability in the metastable state, consensus and information diffusion to identify synchronization/consensus clusters, after mapping this problem into a hidden geometric space induced by Markov dynamics. The diffusion distance Belkin and Niyogi (2001) between nodes $i$ and $j$ is defined by

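$$d_{\tau}^{2}(i,j)=\left[\mathbf{p}(\tau|i)-\mathbf{p}(\tau|j)\right]^{2}=\sum_{k=1}^{N}\left[p_{k}(\tau|i)-p_{k}(\tau|j)\right]^{2}, \qquad (6)$$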

where $p_{k}(\tau|i)$ encodes the probability of finding a random walker originated in $i$ at node $k$, at time $\tau$. Diffusion maps, built on this concept, are widely adopted for low-dimensional embedding of high-dimensional data Coifman et al. (2005); Jones et al. (2008) and provide a unified probabilistic interpretation for spectral embedding and clustering algorithms Nadler et al. (2008), among others. The diffusion distance between two nodes is small if there are many paths which connect them, allowing information to be easily exchanged. We can exploit this property to gather insight about physical processes, such as information diffusion, and collective phenomena with emergent behavior, such as synchronization and consensus dynamics. In fact, in a complex network where units are organized in functional clusters, the diffusion distance among nodes belonging to the same cluster must be small, because the mesoscale structure favors the information exchange within the clusters rather than across them. The relationship among these processes is made explicit by the identities $s^{2}_{\tau}(i,j)=\delta\theta_{0}^{2}\,d^{2}_{\tau}(i,j)$ and $c^{2}_{\tau}(i,j)=d^{2}_{\tau}(i,j)$.
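As a numerical illustration of the diffusion distance and of its link to the synchronizability distance, the sketch below computes the matrix of squared diffusion distances on a toy graph and compares it with the distance between perturbed metastable states. It is an illustrative check under assumptions: the two-clique graph, the value of `tau`, the cluster phases in `z` and the function names are hypothetical, and the perturbed states are propagated with the row-vector convention $\mathbf{z}_{i}e^{-\tau\mathbf{\tilde{L}}}$.

```python
import numpy as np

def heat_kernel(A, tau):
    """exp(-tau * (I - D^{-1} A)) via the symmetric normalized Laplacian."""
    d_sqrt = np.sqrt(A.sum(axis=1))
    lam, U = np.linalg.eigh(np.eye(len(A)) - A / np.outer(d_sqrt, d_sqrt))
    return ((U * np.exp(-tau * lam)) @ U.T) * d_sqrt[None, :] / d_sqrt[:, None]

def pairwise_sq(rows):
    """Squared Euclidean distance between every pair of rows."""
    diff = rows[:, None, :] - rows[None, :, :]
    return (diff ** 2).sum(axis=2)

# Toy graph (assumption): two 4-cliques joined by the edge (3, 4)
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 1.0
tau = 3.0
H = heat_kernel(A, tau)

D2 = pairwise_sq(H)                      # d^2_tau(i, j); row i of H is p(tau | i)

# Perturbed metastable states z_i = z + dtheta0 * v_i, one per row
dtheta0 = 1e-2
z = np.repeat([0.3, 1.1], 4)             # arbitrary common phases of the two clusters
Z = z[None, :] + dtheta0 * np.eye(8)
S2 = pairwise_sq(Z @ H)                  # s^2_tau(i, j)

same = np.kron(np.eye(2), np.ones((4, 4))).astype(bool)
intra_mean = D2[same & ~np.eye(8, dtype=bool)].mean()
inter_mean = D2[~same].mean()
print(intra_mean, inter_mean)
print(np.allclose(S2, dtheta0 ** 2 * D2, rtol=1e-6, atol=1e-12))
```

The consensus distance needs no separate computation here, since its definition with $\mathbf{x}(0)=\mathbf{v}_{i}$ coincides with the rows of `H`.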

At a specific time delay $\tau$, the diffusion distances among all pairs of nodes define a matrix $\bm{\Delta}_{\tau}$, which we call the diffusion-distance matrix in the following. To obtain a geometrical intuition about its meaning, we can embed the units into a low-dimensional Euclidean space by using, for instance, multidimensional scaling (Fig. 2A). In this diffusion space, closer points correspond to units with smaller diffusion distance, i.e., to nodes that successfully exchange information in less than $\tau$ steps (Fig. 2B). Important consequences of this approach include the mapping from the network’s mesoscale to clusters in space (Fig. 2C) and the identification of hierarchies at multiple resolutions. When $\tau$ is small, the micro scale structure is revealed, while for increasing $\tau$ the mesoscale is screened until the macro scale structure is captured.
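The embedding step can be sketched with classical (Torgerson) multidimensional scaling, which is one common variant; the text does not specify which variant was used, so this choice is an assumption, as are the toy graph and the function names. Because the diffusion distance is a Euclidean distance between rows of the heat kernel, the full-dimensional embedding reproduces it exactly, and the leading coordinates give the low-dimensional picture.

```python
import numpy as np

def heat_kernel(A, tau):
    """exp(-tau * (I - D^{-1} A)) via the symmetric normalized Laplacian."""
    d_sqrt = np.sqrt(A.sum(axis=1))
    lam, U = np.linalg.eigh(np.eye(len(A)) - A / np.outer(d_sqrt, d_sqrt))
    return ((U * np.exp(-tau * lam)) @ U.T) * d_sqrt[None, :] / d_sqrt[:, None]

def classical_mds(D2, dim=2):
    """Classical MDS: coordinates whose squared distances approximate D2."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # Gram matrix of the centered points
    lam, U = np.linalg.eigh(B)
    order = np.argsort(lam)[::-1][:dim]          # indices of the largest eigenvalues
    lam_top = np.clip(lam[order], 0.0, None)     # guard against tiny negative round-off
    return U[:, order] * np.sqrt(lam_top)

# Toy graph (assumption): two 4-cliques joined by the edge (3, 4)
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 1.0
H = heat_kernel(A, 3.0)
D2 = ((H[:, None, :] - H[None, :, :]) ** 2).sum(axis=2)   # squared diffusion distances

X2 = classical_mds(D2, dim=2)       # 2-D diffusion space, one row per node
X_full = classical_mds(D2, dim=8)   # full-dimensional embedding, exact up to round-off
print(X2.round(3))
```

In the two-dimensional output the nodes of each clique land close together, which is the geometric counterpart of the mapping from mesoscale structure to clusters in space.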

For specific applications, it might be useful to identify the mesoscale structure which provides the best coarse-graining of the system, with respect to certain criteria. We use the persistence of the mesoscale across time, if any, to characterize the system. By construction, the diffusion distance between two units tends to zero for increasing time; it is therefore necessary to normalize it appropriately to allow the comparison between the cluster formation at different values of $\tau$. As shown in Fig. 2D, this can be accomplished by using the normalized matrix $\bm{\tilde{\Delta}}_{\tau}=\bm{\Delta}_{\tau}/\max\limits_{ij}(\Delta_{ij}(\tau))$, with the persistence of clusters being encoded in the persistence of the diffusion distance between their units. We exploit the fact that the normalized diffusion distance quickly shrinks for intra-cluster nodes to guarantee that the average diffusion-distance matrix, defined by $\bm{\bar{\Delta}}=\tau_{max}^{-1}\sum\limits_{\tau=1}^{\tau_{max}}\bm{\Delta}_{\tau}$, where $\tau_{max}$ is a temporal cutoff, will preserve this geometrical persistence. For $\tau_{max}\approx N$, i.e., the size of the system, the results obtained from the matrix $\bm{\bar{\Delta}}$ are robust to the choice of this cutoff. If $\tau_{max}\ll N$, the random walkers do not have enough time to search through the system, and only the mesoscale closer to the micro scale can be revealed. Conversely, if $\tau_{max}\gg N$, the information gathered during the search is washed out and only the macro scale can be captured. The hierarchical clustering of units in the diffusion space of average distances reveals the most persistent clusters and their hierarchical organization (Fig. 2E).

To understand which hierarchy better represents the mesoscale structure, it is natural to analyze the corresponding network of clusters, where each node is a functional super-unit, consisting of units belonging to the same functional cluster, and connections between super-units are weighted by inter-cluster connectivity. The average diffusion distance among super-units is expected to be maximum when diffusion between clusters is extremely hindered; this happens when the most representative functional mesoscale is captured, and it is significantly different from random expectation (Fig. 2F).
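The averaging and clustering steps can be sketched as follows. This is a hedged illustration rather than the paper's pipeline: it assumes that the normalized matrices $\bm{\tilde{\Delta}}_{\tau}$ are the ones being averaged, uses $\tau_{max}=N$, and swaps in a plain average-linkage agglomerative clustering since the linkage criterion is not stated in the text. The toy graph and all function names are hypothetical.

```python
import numpy as np

def heat_kernel(A, tau):
    """exp(-tau * (I - D^{-1} A)) via the symmetric normalized Laplacian."""
    d_sqrt = np.sqrt(A.sum(axis=1))
    lam, U = np.linalg.eigh(np.eye(len(A)) - A / np.outer(d_sqrt, d_sqrt))
    return ((U * np.exp(-tau * lam)) @ U.T) * d_sqrt[None, :] / d_sqrt[:, None]

def average_distance_matrix(A, tau_max):
    """Mean over tau = 1..tau_max of the max-normalized diffusion-distance matrix."""
    acc = np.zeros(A.shape)
    for tau in range(1, tau_max + 1):
        H = heat_kernel(A, tau)
        delta = np.sqrt(((H[:, None, :] - H[None, :, :]) ** 2).sum(axis=2))
        acc += delta / delta.max()               # normalization makes scales comparable
    return acc / tau_max

def average_linkage(D, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

# Toy graph (assumption): two 4-cliques joined by the edge (3, 4)
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 1.0
D_bar = average_distance_matrix(A, tau_max=A.shape[0])
labels = average_linkage(D_bar, n_clusters=2)
print(labels)
```

Stopping the merging at different values of `n_clusters` gives the successive levels of the hierarchy; selecting the most representative level, as described in the text, would require the additional super-unit analysis, which is not reproduced here.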

To better understand the relationship between structural communities, due to purely topological connectivity, and the functional clusters, due to the interplay between structure and dynamics previously described, we have generated and analyzed ensembles of Girvan-Newman networks Girvan and Newman (2002), while varying the ratio between inter- and intra-community connectivity. Diffusion geometry identifies clusters in agreement with structural ones when this ratio is very small – i.e., when the structural mesoscale is strongly organized into well-defined clusters – and provides different results for larger ratios, by identifying a larger number of functional modules, compared to other methods Reichardt and Bornholdt (2004); Blondel et al. (2008); Rosvall and Bergstrom (2007, 2008) (see Suppl. Fig. 3).
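For readers who want to reproduce this kind of experiment, a generator for the benchmark can be sketched as below. The sizes are assumptions taken from the commonly used form of the Girvan-Newman benchmark (128 nodes in 4 groups of 32, expected degree 16), since the text does not state the parameters used; `z_out` is the expected number of inter-community links per node, so the inter- to intra-community ratio is `z_out / (16 - z_out)`. The function name and seed are illustrative.

```python
import numpy as np

def girvan_newman_benchmark(z_out, n_groups=4, group_size=32, mean_degree=16, seed=0):
    """Planted-partition graph with expected degree `mean_degree`, of which
    `z_out` links per node (in expectation) point outside the node's own group."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    groups = np.repeat(np.arange(n_groups), group_size)
    p_in = (mean_degree - z_out) / (group_size - 1)    # link probability inside a group
    p_out = z_out / (n - group_size)                   # link probability across groups
    same = groups[:, None] == groups[None, :]
    P = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < P, k=1)       # sample each node pair once
    A = (upper | upper.T).astype(float)                # symmetric, zero diagonal
    return A, groups

A_gn, groups = girvan_newman_benchmark(z_out=2)
same = groups[:, None] == groups[None, :]
mean_degree = A_gn.sum(axis=1).mean()
intra_links = A_gn[same].sum() / 2
inter_links = A_gn[~same].sum() / 2
print(mean_degree, intra_links, inter_links)
```

Sweeping `z_out` from small to large values moves the ensemble from strongly modular graphs to graphs where the planted communities are barely distinguishable. A sampled graph can contain isolated nodes in rare cases, which would need handling before building $\mathbf{D}^{-1}$.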

Given the expected difference between topological and functional clusters, as an application of our framework we analyze an empirical network providing anatomical connectivity within and between visual cortical and sensorimotor areas in the macaque brain Négyessy et al. (2006). Our analysis (see Suppl. Fig. 4) reveals a hierarchical functional organization of cortical units, significantly different from what should be expected from a network with the same connectivity distribution in the absence of correlations. The importance of the ventral intraparietal (VIP) region in bridging the two functional areas is evident from the analysis, in agreement with previous findings Négyessy et al. (2006). Other key functional modules, such as areas 46 and 7a, are successfully identified, confirming studies based on neural collective behavior measured from transfer entropy functional connectivity and blood oxygenation level-dependent correlation patterns Honey et al. (2007). It is worth remarking that, although our analysis does not rely on external functional information, it provides results comparable with existing knowledge obtained from that information. The analysis of similarities among the identified functional clusters, the anatomical ones and the structural mesoscale organization obtained from the spin-glass approach Reichardt and Bornholdt (2004) shows that our diffusion geometry framework identifies a functional organization that is distinct from the structural one (see Suppl. Fig. 5).

Just as diffusion mapping revolutionized applied math and machine learning, we envision many potential applications in complex systems physics based on the unifying framework of diffusion geometry. Complementary to approaches based on a network’s hidden geometry deduced from structural properties Serrano et al. (2008); Boguna et al. (2009); Papadopoulos et al. (2012); Kleineberg et al. (2016), future applications to multilayer networks De Domenico et al. (2013); Del Genio et al. (2016); De Domenico et al. (2016) will provide further insight into collective phenomena emerging from the interplay between structure and dynamics in such systems.

Acknowledgements.

*The author thanks Alex Arenas, Joan T. Matamalas and Massimo Stella for fruitful discussions. MDD acknowledges financial support from MINECO program Juan de la Cierva (IJCI-2014-20225).*

Bibliography

1. Arenas et al. (2008): A. Arenas, A. Díaz-Guilera, J. Kurths, Y. Moreno, and C. Zhou, Physics Reports 469, 93 (2008).
2. Olfati-Saber et al. (2007): R. Olfati-Saber, J. A. Fax, and R. M. Murray, Proceedings of the IEEE 95, 215 (2007).
3. Barabási and Albert (1999): A. Barabási and R. Albert, Science 286, 509 (1999).
4. Krause et al. (2003): A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor, Nature 426, 282 (2003).
5. Guimera and Amaral (2005): R. Guimera and L. A. N. Amaral, Nature 433, 895 (2005).
6. Fortunato (2010): S. Fortunato, Physics Reports 486, 75 (2010).
7. Arenas et al. (2006): A. Arenas, A. Díaz-Guilera, and C. J. Pérez-Vicente, Physical Review Letters 96, 114102 (2006).
8. Coifman et al. (2005): R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, PNAS 102, 7426 (2005).