Two-walks degree assortativity in graphs and networks
Alfonso Allen-Perkins, Juan Manuel Pastor, Ernesto Estrada

TL;DR
This paper introduces the two-walks degree assortativity measure, extending degree correlation analysis to second neighbors, and explores its properties and occurrences in all small graphs and real-world networks.
Contribution
It provides an analytical expression for two-walks degree assortativity and reveals its structural implications and prevalence in various networks.
Findings
Existence of graphs with degree disassortative and two-walks degree assortative properties.
All biological networks studied are in the disassortative-assortative class.
No networks exhibit assortative-disassortative structure.
Abstract
Degree assortativity is the tendency for nodes of high degree (resp. low degree) in a graph to be connected to high degree nodes (resp. to low degree ones). It is usually quantified by the Pearson correlation coefficient of the degree-degree correlation. Here we extend this concept to account for the effect of second neighbours to a given node in a graph. That is, we consider the two-walks degree of a node as the sum of all the degrees of its adjacent nodes. The two-walks degree assortativity of a graph is then the Pearson correlation coefficient of the two-walks degree-degree correlation. We find here an analytical expression for this two-walks degree assortativity index as a function of contributing subgraphs. We then study all the 261,000 connected graphs with 9 nodes and observe the existence of assortative-assortative and disassortative-disassortative graphs according to degree and…
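Table 2. Real-world networks studied: number of nodes (N), number of links (m), degree assortativity and two-walks degree assortativity.
| No. | Dataset | Domain | N | m | Ref. | Degree assortativity | Two-walks degree assortativity |
|---|---|---|---|---|---|---|---|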
| 1 | Drosophila PIN | biological | 3039 | 3715 | [26] | -0.060 | 0.462 |
| 2 | Hpyroli | biological | 710 | 1396 | [27] | -0.243 | 0.161 |
| 3 | KSHV | biological | 50 | 122 | [28] | -0.058 | 0.215 |
| 4 | MacaqueVisualCortex | biological | 30 | 190 | [29] | -0.030 | 0.113 |
| 5 | Malaria PIN | biological | 229 | 604 | [30] | -0.083 | 0.116 |
| 6 | Neurons | biological | 280 | 1973 | [31] | -0.069 | 0.187 |
| 7 | PIN-Afulgidus | biological | 32 | 38 | [32] | -0.472 | 0.154 |
| 8 | Pin-Bsubtilis | biological | 84 | 98 | [33] | -0.486 | 0.136 |
| 9 | PIN-Ecoli | biological | 230 | 695 | [34] | -0.015 | 0.397 |
| 10 | PIN-Human | biological | 2783 | 6438 | [35] | -0.137 | 0.231 |
| 11 | Trans-Ecoli | biological | 328 | 456 | [36] | -0.265 | 0.330 |
| 12 | Transc-yeast | biological | 662 | 1062 | [36] | -0.410 | 0.401 |
| 13 | Trans-urchin | biological | 45 | 80 | [36] | -0.207 | 0.194 |
| 14 | Benguela | ecological | 29 | 191 | [37] | 0.0211 | 0.153 |
| 15 | BridgeBrook | ecological | 75 | 547 | [38] | -0.668 | -0.193 |
| 16 | Canton | ecological | 108 | 708 | [39] | -0.226 | -0.123 |
| 17 | Chesapeake | ecological | 33 | 72 | [40] | -0.196 | 0.081 |
| 18 | Coachella | ecological | 30 | 261 | [41] | 0.0347 | 0.148 |
| 19 | ElVerde | ecological | 156 | 1441 | [42] | -0.174 | 0.009 |
| 20 | ReefSmall | ecological | 50 | 524 | [43] | -0.193 | -0.127 |
| 21 | ScotchBroom | ecological | 154 | 370 | [44] | -0.311 | 0.350 |
| 22 | Shelf | ecological | 81 | 1476 | [45] | -0.094 | -0.035 |
| 23 | Skipwith | ecological | 35 | 364 | [46] | -0.319 | -0.122 |
| 24 | StMarks | ecological | 48 | 221 | [47] | 0.111 | 0.199 |
| 25 | StMartin | ecological | 44 | 218 | [48] | -0.153 | -0.0365 |
| 26 | Stony | ecological | 112 | 832 | [49] | -0.222 | -0.115 |
| 27 | Ythan1 | ecological | 134 | 597 | [50] | -0.263 | -0.119 |
| 28 | World Trade | economic | 80 | 875 | [51] | -0.392 | -0.355 |
| 29 | SmallW | informational | 233 | 994 | [52] | -0.303 | -0.251 |
| 30 | ColoSPG | social | 324 | 347 | [53] | -0.295 | 0.296 |
| 31 | CorporatePeople | social | 1586 | 13126 | [54] | 0.268 | 0.431 |
| 32 | Dolphins | social | 62 | 159 | [55] | -0.044 | 0.303 |
| 33 | Drugs | social | 616 | 2012 | [51] | -0.117 | 0.304 |
| 34 | Hi-tech | social | 33 | 91 | [56] | -0.087 | 0.191 |
| 35 | Geom | social | 3621 | 9461 | [51] | 0.168 | 0.356 |
| 36 | PRISON-Sym | social | 67 | 142 | [57] | 0.103 | 0.332 |
| 37 | Sawmill | social | 36 | 62 | [58] | -0.071 | 0.243 |
| 38 | social3 | social | 32 | 80 | [59] | -0.119 | 0.179 |
| 39 | Zackar | social | 34 | 78 | [60] | -0.476 | -0.089 |
| 40 | electronic1 | technological | 122 | 189 | [61] | -0.002 | 0.337 |
| 41 | electronic2 | technological | 252 | 399 | [61] | -0.006 | 0.355 |
| 42 | electronic3 | technological | 512 | 819 | [61] | -0.030 | 0.367 |
| 43 | Power grid | technological | 4941 | 6594 | [62] | 0.003 | 0.599 |
| 44 | Software Abi | technological | 1035 | 1736 | [63] | -0.086 | 0.208 |
| 45 | Software Digital | technological | 150 | 198 | [63] | -0.228 | 0.447 |
| 46 | Software Mysql | technological | 1480 | 4221 | [63] | -0.083 | 0.147 |
| 47 | Software-XMMS | technological | 971 | 1809 | [63] | -0.114 | 0.397 |
| 48 | Software-VTK | technological | 771 | 1369 | [63] | -0.195 | 0.126 |
| 49 | USA Air 97 | technological | 332 | 2126 | [52] | -0.208 | -0.000 |
Taxonomy
Topics: Complex Network Analysis Techniques · Gene Regulatory Network Analysis · Bioinformatics and Genomic Networks
Two-Walks degree assortativity in Graphs and Networks
Alfonso Allen-Perkins, Juan Manuel Pastor, Ernesto Estrada
Abstract.
Degree assortativity is the tendency for nodes of high degree (resp. low degree) in a graph to be connected to high degree nodes (resp. to low degree ones). It is usually quantified by the Pearson correlation coefficient of the degree-degree correlation. Here we extend this concept to account for the effect of second neighbours to a given node in a graph. That is, we consider the two-walks degree of a node as the sum of all the degrees of its adjacent nodes. The two-walks degree assortativity of a graph is then the Pearson correlation coefficient of the two-walks degree-degree correlation. We find here an analytical expression for this two-walks degree assortativity index as a function of contributing subgraphs. We then study all the 261,000 connected graphs with 9 nodes and observe the existence of assortative-assortative and disassortative-disassortative graphs according to degree and two-walks degree, respectively. More surprisingly, we observe a class of graphs which are degree disassortative and two-walks degree assortative. We explain the existence of some of these graphs by the presence of certain topological features, such as a node of low degree connected to high-degree ones. More importantly, we study a series of 49 real-world networks, where we observe the existence of the disassortative-assortative class in several of them. In particular, all biological networks studied here were in this class. We also conclude that no graphs/networks are possible with assortative-disassortative structure.
1. Introduction
Networks represent the topological skeleton of a wide range of systems in nature and society [1, 2, 3, 4]. The characterization of their structure is crucial since it shapes the evolutionary, functional, and dynamical processes that take place in those systems [4, 5, 6].
It is well known that links generally do not connect nodes regardless of their characteristics. In social networks, for instance, evidence suggests that individuals prefer to associate with others of similar age, religion, education or occupation as themselves [7]. Assortativity or assortative mixing is a graph metric that refers to the tendency for nodes in networks to be connected to other nodes that are similar (or different) to themselves in some way [8]. Typically, it is determined for the degree (i.e. the number of direct neighbours, k) of the nodes in the network [9, 10, 11, 12]. The tendency for high-degree nodes to associate preferentially with other high-degree nodes plays a major role in many important processes, such as epidemic spreading, synchronization or network robustness, among others [9, 13, 14, 15, 16]. However, assortativity may be applied to any characteristics of a node, including non-topological vertex properties, such as language or race [8]. Most of the research done in this area has been summarized in the review of Noldus et al. [17]. Other extensions to account for interactions beyond the nearest-neighbours have also been proposed in the recent literature [18].
The aim of this work is to define an assortativity index that captures the influence of first and second neighbours of a node. We then express this two-walks assortativity in terms of the subgraphs contributing to it.
The paper is organized as follows. In Section 2, the preliminaries are presented. In Section 3, the concept of two-walks degree assortativity is introduced and analysed. The main result is demonstrated in Section 4. Numerical results are presented in Section 5. Conclusions are summarized in Section 6.
2. Preliminaries
Here we consider simple, undirected graphs $G=(V,E)$, i.e., graphs without multiple edges, self-loops, directions or weights in their edges. The notation used is standard and the reader can check for instance [19]. Let us define some of the measures used in this work in order to make it self-contained. First, we define the degree assortativity index $r$ [8]. Mathematically, it is written as:
$$r=\frac{\frac{1}{m}\sum_{(i,j)\in E}k_ik_j-\left[\frac{1}{2m}\sum_{(i,j)\in E}(k_i+k_j)\right]^2}{\frac{1}{2m}\sum_{(i,j)\in E}(k_i^2+k_j^2)-\left[\frac{1}{2m}\sum_{(i,j)\in E}(k_i+k_j)\right]^2} \tag{2.1}$$
where $k_i$ and $k_j$ are the degrees at both ends of link $(i,j)\in E$ and $m=|E|$ is the number of links. A positive assortativity index ($r>0$) indicates the tendency of higher degree nodes in the graph to be connected to other higher degree nodes. On the other hand, $r<0$ indicates the tendency of higher degree nodes to be connected to lower degree nodes. The following result was proved previously [11].
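As an illustration of the definition above, the following minimal sketch computes the degree assortativity index as the Pearson correlation of the degrees found at the two ends of every link. The function name `degree_assortativity` and the two toy graphs are illustrative assumptions, not part of the original paper; NumPy is assumed to be available.

```python
import numpy as np

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    edges = np.asarray(edges)
    n = edges.max() + 1
    k = np.bincount(edges.ravel(), minlength=n)      # node degrees
    # Each undirected link is used in both orientations, so the measure is symmetric.
    x = np.concatenate([k[edges[:, 0]], k[edges[:, 1]]]).astype(float)
    y = np.concatenate([k[edges[:, 1]], k[edges[:, 0]]]).astype(float)
    var = np.mean(x * x) - np.mean(x) ** 2
    if np.isclose(var, 0.0):
        return float("nan")                          # regular graph: index undefined
    return (np.mean(x * y) - np.mean(x) ** 2) / var

# Hypothetical toy graphs, used only as a sanity check.
path4 = [(0, 1), (1, 2), (2, 3)]    # path on four nodes
star4 = [(0, 1), (0, 2), (0, 3)]    # star with three leaves
print(degree_assortativity(path4))
print(degree_assortativity(star4))
```

Both toy graphs are disassortative: the path gives an index of -0.5 and the star, where every link joins the hub to a leaf, gives -1. For a regular graph all end-point degrees coincide, the variance vanishes and the sketch returns NaN.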
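Lemma 1.
Let $G=(V,E)$ be a simple graph. Let $k_i$ be the degree of the vertex $i$. Let $m=|P_1|$, $|P_2|$ and $|P_3|$ be the number of edges and of paths of length two and three, respectively, and let $|C_3|$ be the number of triangles in $G$. Then, the assortativity coefficient can be written combinatorially as:
$$r=\frac{|P_2|\left(|P_{3/2}|+C-|P_{2/1}|\right)}{3|S_{1,3}|+|P_2|\left(1-|P_{2/1}|\right)} \tag{2.2}$$
Let $|P_{r/s}|=|P_r|/|P_s|$ be the ratio of path counts, $|S_{1,3}|$ the number of star graphs of four nodes, and $C=3|C_3|/|P_2|$. Then the graph is:
- (1) assortative ($r>0$) if and only if $|P_{2/1}|<|P_{3/2}|+C$,
- (2) neutral ($r=0$) if and only if $|P_{2/1}|=|P_{3/2}|+C$ (with a non-vanishing denominator), and
- (3) disassortative ($r<0$) if and only if $|P_{2/1}|>|P_{3/2}|+C$.
It is worth mentioning that the denominator of Eq. 2.2 is non-negative. Consequently, the sign of $r$ depends only upon the sign of the numerator, which is determined by the following structural factors: the global clustering coefficient (i.e. $C$), the intermodular connectivity (i.e. $|P_{3/2}|$) and the branching (i.e. $|P_{2/1}|$) [11].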
The number of subgraphs contributing to the degree assortativity can be obtained using the following results [20].
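Lemma 2.
Let $G=(V,E)$ be a simple graph. Let $k_i$ be the degree of the vertex $i$. Let $|C_3|$ be the number of triangles in $G$. Then, the number of edges $m$ and of paths of length two $|P_2|$ and three $|P_3|$ are given, respectively, by
$$m=\frac{1}{2}\sum_{i}k_i$$
$$|P_2|=\sum_{i}\binom{k_i}{2}$$
$$|P_3|=\sum_{(i,j)\in E}(k_i-1)(k_j-1)-3|C_3|$$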
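The subgraph counts used by Lemmas 1 and 2 can be checked numerically. The sketch below assumes the standard counting identities: edges as half the degree sum, two-paths and four-node stars as binomial sums over the degrees, triangles from the trace of the cubed adjacency matrix, and three-paths as the link sum of (k_i - 1)(k_j - 1) minus three times the number of triangles. It then compares the combinatorial form of the assortativity coefficient with a direct Pearson computation. All function names and the test graph (a triangle with one pendant node) are illustrative assumptions.

```python
import numpy as np
from math import comb

def subgraph_counts(A):
    """Return (m, |P2|, |P3|, |C3|, |S13|) for a simple undirected graph."""
    A = np.asarray(A, dtype=np.int64)
    k = A.sum(axis=1)                                # node degrees
    m = int(k.sum()) // 2                            # edges
    c3 = int(np.trace(A @ A @ A)) // 6               # triangles
    p2 = sum(comb(int(d), 2) for d in k)             # paths of length two
    s13 = sum(comb(int(d), 3) for d in k)            # stars with three leaves
    # (k-1)^T A (k-1) visits every link twice; each triangle adds three non-paths.
    p3 = int((k - 1) @ A @ (k - 1)) // 2 - 3 * c3    # paths of length three
    return m, p2, p3, c3, s13

def r_from_subgraphs(A):
    """Degree assortativity written in terms of subgraph counts."""
    m, p2, p3, c3, s13 = subgraph_counts(A)
    num = p3 + 3 * c3 - p2 * p2 / m      # |P2| (|P3|/|P2| + C - |P2|/m)
    den = 3 * s13 + p2 - p2 * p2 / m     # 3|S13| + |P2| (1 - |P2|/m)
    return num / den if not np.isclose(den, 0.0) else float("nan")

def r_pearson(A):
    """Direct Pearson correlation over both orientations of every link."""
    A = np.asarray(A)
    i, j = np.nonzero(A)
    k = A.sum(axis=1).astype(float)
    return np.corrcoef(k[i], k[j])[0, 1]

# Hypothetical test graph: triangle 0-1-2 with a pendant node 3 attached to 0.
paw = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
print(subgraph_counts(paw))
print(r_from_subgraphs(paw), r_pearson(paw))
```

For this graph the counts are 4 edges, 5 two-paths, 2 three-paths, 1 triangle and 1 star, and both routes give the same negative coefficient, so the graph is disassortative because branching outweighs clustering plus intermodular connectivity.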
Lemma 3.
Let be a simple graph. Let be the degree of the vertex in . Let be the adjacency matrix of . Let and be respectively the number of edges and the number of paths of length two in . Let be the number of subgraphs in (see Table 1). Let be the number of cycles of nodes in . Then, , and are given, respectively by
[TABLE]
[TABLE]
[TABLE]
Lemma 4.
Let be a simple graph. Let be the degree of the vertex . Let be the adjacency matrix of . Let , , , , and be the number of subgraphs , , , , and , respectively, in (see Table 1). Then,
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where , , is an all-x vector and denotes the Hadamard product.
3. Two-Walks Degree assortativity in Graphs and Networks
Let us start by the definition of the degree of a node $i$, $k_i$. The intuition behind this index is very simple. Every nearest neighbour of the node $i$ receives an identical weight of $1$. Then, we sum the weights of every node adjacent to $i$ to obtain $k_i$. Mathematically, this corresponds to obtaining the following vector after assigning the unit weights to every node:
$$\mathbf{k}=A\mathbf{1} \tag{3.1}$$
where $A$ is the adjacency matrix of the graph and $\mathbf{1}$ is an all-ones vector.
It is customary to consider that not all the neighbours of one particular node are equally important. This is the basis, for instance, of the Katz centrality index [21], eigenvector centrality [22], PageRank [23], subgraph centrality [24] and so forth. Then, we can consider that every neighbour of the node $i$ is weighted according to its “importance”. Of course, the definition of that importance will define the way in which we will proceed. In order to consider the current development as an extension of the concept of node degree we simply weight every node by its own degree. That is, now we consider the vector $\mathbf{k}$ as the weighting vector for the nodes of the graph. Consequently, an extension of the concept of degree is given by applying a similar procedure as in (3.1) to $\mathbf{k}$,
$$\tilde{\mathbf{k}}=A\mathbf{k} \tag{3.2}$$
It is straightforward to realize that $\tilde{\mathbf{k}}=A^2\mathbf{1}$. Then, the entries of this new vector represent a new kind of centrality of the nodes which counts the number of two-walks starting at the corresponding node. Consequently, we suggest the name of "two-walks" degree for the entries of $\tilde{\mathbf{k}}$. Let us call $\tilde{k}_i$ the $i$th entry of $\tilde{\mathbf{k}}$ in a graph. Notice that $\tilde{k}_i$ accounts for the degree of the node $i$, i.e., closed walks of length two, as well as for the walks of length two leading from this node to other nodes. Then,
$$\tilde{k}_i=\sum_{j\in N(i)}k_j \tag{3.3}$$
where $N(i)$ is the neighbourhood of the node $i$, i.e., $N(i)=\{j\in V\mid (i,j)\in E\}$. That is, the two-walks degree represents the number of weighted neighbours that the node has, where the weight of each neighbour is given by its own degree.
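The construction just described can be sketched in a few lines: the degree vector is the adjacency matrix applied to the all-ones vector, and the two-walks degree is the adjacency matrix applied once more. The helper name `two_walks_degree` and the four-node path used as input are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_walks_degree(A):
    """Two-walks degree: sum of the degrees of the neighbours of each node."""
    A = np.asarray(A, dtype=np.int64)
    k = A @ np.ones(A.shape[0], dtype=np.int64)   # degree vector, A applied to ones
    return A @ k                                   # A applied again, i.e. A^2 times ones

# Hypothetical example: the path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
k = A.sum(axis=1)
k2 = two_walks_degree(A)

# Same quantity obtained node by node as the sum of neighbour degrees.
neigh_sum = np.array([k[np.nonzero(A[i])[0]].sum() for i in range(len(A))])
print(k, k2, neigh_sum)
```

For the path, the end nodes have degree 1 and two-walks degree 2, while the inner nodes have degree 2 and two-walks degree 3. The row sums of the squared adjacency matrix give the same vector, which is the walk-counting interpretation.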
Let us now define a quantity analogous to the degree assortativity index based on the two-walks degrees instead of on the node degrees.
Definition 5.
Let $G=(V,E)$ be a connected simple graph with adjacency matrix $A$ and let $\tilde{k}_i$ be the two-walks degree of the vertex $i$. The two-walks degree assortativity index of a graph is defined as
$$\tilde{r}=\frac{\frac{1}{m}\sum_{(i,j)\in E}\tilde{k}_i\tilde{k}_j-\left[\frac{1}{2m}\sum_{(i,j)\in E}(\tilde{k}_i+\tilde{k}_j)\right]^2}{\frac{1}{2m}\sum_{(i,j)\in E}(\tilde{k}_i^2+\tilde{k}_j^2)-\left[\frac{1}{2m}\sum_{(i,j)\in E}(\tilde{k}_i+\tilde{k}_j)\right]^2} \tag{3.4}$$
This quantity tells us how well connected the most important nodes in a graph are. That is, if $\tilde{r}>0$, the graph is two-walks degree assortative, which means that the most weighted nodes in terms of the degree of their neighbours tend to be connected to each other. On the other hand, if $\tilde{r}<0$, the graph is two-walks degree disassortative, which means that the most weighted nodes in terms of the degree of their neighbours tend to be connected to those with least weight. If $\tilde{r}=0$, neither of these two tendencies is observed and we shall call such graphs neutral.
In Fig. 3.1 we represent a graph which is strongly disassortative for the degree but assortative for the two-walks degree index. We plot the graph with the nodes weighted by the difference between the degree (resp. two-walks degree) and the average degree (resp. average two-walks degree). The negative values are colored in red and the positive ones in blue. The size of the nodes is proportional to the absolute value of this difference. As can be seen in this picture, the degree-degree interaction between the nodes (left panel) is dominated by red-blue interactions, which indicates a large number of interactions between high degree nodes (blue ones) and low degree ones (red nodes). This results in a negative degree assortativity coefficient. On the other hand, for the two-walks degree plot the graph is dominated by blue-blue and red-red interactions. That is, nodes of high two-walks degree interact with each other, and low two-walks degree nodes also interact preferentially among themselves. These effects result in a positive two-walks degree assortativity coefficient.
With the new correlation coefficient introduced here we assess the tendency of neighbourhoods with many interactions to be connected to other highly connected neighbourhoods. However, in order for a graph to display a transition from degree disassortative to two-walks degree assortative it is necessary that there are separator nodes between the high-degree nodes. The graph in Fig. 3.1 has a separator, which is the node of degree 2 connecting both nodes of degree 3 and 5. A separator must be a low-degree node which connects two or more high-degree ones. Notice that if the number of high-degree nodes connected to the separator is too high, it will produce an increase in its own degree, which decreases its chances of being a proper separator. This characteristic, a separator connected to two high-degree nodes, introduces disassortativity to the graph. However, in terms of the second-order correlation a separator allows two-step interactions between hubs, which results in two-walks degree assortativity. Mathematically, it is not difficult to see that the two-walks degree is related to walks of length two between nodes.
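The role of a separator can be illustrated on a small hypothetical graph, which is not claimed to be the graph of Fig. 3.1: a hub with two pendant nodes (degree 3) and a hub with four pendant nodes (degree 5) are joined only through a separator of degree 2. The node labels and the helper `edge_pearson` are assumptions made for this sketch.

```python
import numpy as np

def edge_pearson(A, w):
    """Pearson correlation of the node values w taken at both ends of every link."""
    i, j = np.nonzero(A)                  # every link appears in both orientations
    return np.corrcoef(w[i], w[j])[0, 1]

# Hypothetical graph: hub 0 (leaves 3, 4), hub 1 (leaves 5-8), separator 2.
edges = [(0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (1, 8), (2, 0), (2, 1)]
A = np.zeros((9, 9))
for u, v in edges:
    A[u, v] = A[v, u] = 1

k = A.sum(axis=1)     # degrees
k2 = A @ k            # two-walks degrees
r = edge_pearson(A, k)
r2 = edge_pearson(A, k2)
print(f"degree assortativity: {r:.3f}, two-walks degree assortativity: {r2:.3f}")
```

Here the degree assortativity is about -0.822 while the two-walks degree assortativity is about 0.212. The separator has the largest two-walks degree (8) and is linked to the two hubs, whose two-walks degrees (4 and 6) sit next to those of their own leaves (3 and 5), so links mostly join nodes on the same side of the mean.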
It is easy to realize that the two-walks degree assortativity can be written in matrix-vector form in the following way:
[TABLE]
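One possible matrix-vector evaluation is sketched below. It is an assumed form, derived from the Pearson definition by replacing sums over links with products involving the adjacency matrix, the degree vector and the two-walks degree vector, and it may differ in layout from the expression in the original paper. The function name and the test graphs are illustrative.

```python
import numpy as np

def two_walks_assortativity(A):
    """Two-walks degree assortativity evaluated with matrix-vector products only."""
    A = np.asarray(A, dtype=float)
    one = np.ones(A.shape[0])
    k = A @ one                       # degrees
    k2 = A @ k                        # two-walks degrees
    two_m = one @ k                   # twice the number of links
    mean = (k @ k2) / two_m           # mean two-walks degree at the end of a link
    num = (k2 @ A @ k2) / two_m - mean ** 2
    den = (k @ (k2 * k2)) / two_m - mean ** 2    # k2 * k2 is the Hadamard product
    return num / den if not np.isclose(den, 0.0) else float("nan")

# Hypothetical checks: a four-node path and a triangle (regular, hence undefined).
path4 = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
triangle = np.ones((3, 3)) - np.eye(3)
print(two_walks_assortativity(path4))
print(two_walks_assortativity(triangle))
```

The identities used are that the sum over links of the product of end-point values equals half the quadratic form in the adjacency matrix, and that the sum over links of a node quantity equals its degree-weighted sum over nodes. For the path the result is -0.5; for the triangle the denominator vanishes and the sketch returns NaN.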
4. Main Result
Our main result here consists of the determination of the two-walks degree assortativity of a graph in terms of contributing subgraphs of the graph. This allows us to understand this quantity in structural terms for the analysis of real-world systems in further sections of this work.
Theorem 6.
Let be a simple graph. Then, is, in terms of two-walks degree,
i) assortative if ,
ii) neutral if ,
iii) disassortative if ,
where
[TABLE]
and and .
First, we prove that the denominator of the expression (3.5) is always non-negative.
Lemma 7.
Let be a connected simple graph with adjacency matrix . Let and be vectors of the nodes degrees and a vector of nodes two-walks degrees, respectively. Then,
[TABLE]
where is the network’s number of edges, is an all-ones vector and denotes the Hadamard product.
Proof.
By the Cauchy-Bunyakovsky-Schwarz inequality:
[TABLE]
Then, we have
[TABLE]
As is a connected simple graph, and the maximum degree in the graph is , then, , and hence the last term is always greater than or equal to zero, which proves the result. ∎
What remains now for the proof of the main result is to express the numerator of the Pearson coefficient of the two-walks degree - two-walks degree correlation in terms of subgraphs of the graph (recalling that when the denominator is equal to zero, the Pearson correlation coefficient is not defined). We can write this numerator as follows
[TABLE]
where $\tilde{k}_i$ and $\tilde{k}_j$ are the two-walks degrees of nodes $i$ and $j$, respectively, located at both ends of link $(i,j)$. We can now rewrite the sums in Eq. (4.5) as:
[TABLE]
[TABLE]
Let us now find the expressions for the two terms contributing to . The first is given by
[TABLE]
where and are the number of paths of order and , respectively, and is the number of fragments which are illustrated in Table 1. We will give formulas for calculating these fragments for the sake of completeness of the paper.
For the second term contributing to we have
[TABLE]
Thus, we can rewrite as:
[TABLE]
which proves the main result.
Let us now give the formulas for calculating the subgraphs remaining in the expression of the two-walks degree assortativity which have not been previously defined. The proofs of these results are based on the strategy developed and explained in [25] and are not given here as they are lengthy and technical.
Lemma 8.
Let be a simple graph. Let be the degree of the vertex . Let be the number of subgraphs (see Table 1). Then,
[TABLE]
Lemma 9.
Let be a simple graph. Then, the number of subgraphs and in are given by, respectively,
[TABLE]
[TABLE]
5. Computational results
5.1. Small graphs
In this Section we describe the results obtained for all the 261,000 connected graphs with 9 unlabelled nodes. We have calculated the degree and two-walks degree assortativities for these graphs (see Fig. 5.1). As we can see there is no trivial correlation between the two indices, which indicates that the new index does not duplicate the structural information contained in the degree assortativity and consequently gives some new structural insights about graphs. This conclusion is also easily obtained by considering the subgraph contributions to both measures.
According to computer calculations 7% of the networks are assortative-assortative by both measures (AA), 60% are disassortative-disassortative (DD) and 33% are disassortative by degree and assortative by two-walks degree (DA). The main observation is that there are no graphs which are degree assortative and two-walks degree disassortative (AD). We conjecture that these graphs cannot exist. Computer calculations show that . Therefore, we can express the numerator of the neighbourhood assortativity Eq. (4.10) as follows:
[TABLE]
Using the results from [11], if , then . The intuition behind this result is very simple. Nodes that belong to a degree assortative network tend to be linked to other nodes with similar degree. Therefore, their two-walks degrees tend to be similar too.
Generally, the second-neighbour degree assortativity depends on the balance between four structural factors: a weighted sum of subgraph counts, transitivity, intermodular connectivity and relative branching. The first three produce a positive contribution to the two-walks degree assortativity of a network, while branching is more likely associated with disassortative networks.
5.2. Real-world networks
In this subsection we study a group of 49 real-world networks representing systems in ecological (E), biological (B), social (S), technological (T) and socio-economic (SE) environments. The networks are described in the Appendix of this paper. We have calculated the degree and two-walks degree assortativities for these networks (see Fig. 5.2). According to these results 14% of the networks are assortative-assortative (AA) according to both measures, 24% are disassortative-disassortative (DD) and the majority of networks analyzed (61%) are disassortative-assortative (DA). This confirms our previous observation that there are no graphs/networks which are assortative-disassortative (AD). The analysis of the networks according to their function shows the following trends: 53% of the ecological networks analyzed are DD, 27% are DA and 20% are AA; 50% of the social networks analyzed are DA, 30% are AA and 20% are DD; 80% of technological networks are DA, 10% are AA and 10% are DD. Finally, 100% of biological networks considered are DA. They included 9 protein-protein interaction networks (PINs), 3 transcription networks and 3 brain networks. This is a remarkable observation because it is the only functional class of networks which is formed by a single structural class, i.e., DA.
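The overall class percentages can be recomputed from the two assortativity columns of Table 2. The sketch below transcribes those 49 pairs and labels each coefficient "A" when it is strictly positive and "D" otherwise; treating the entry reported as -0.000 as "D" is an assumption of this sketch, as are the names `TABLE_2` and `mixing_class`.

```python
from collections import Counter

# (degree assortativity, two-walks degree assortativity), rows 1 to 49 of Table 2.
TABLE_2 = [
    (-0.060, 0.462), (-0.243, 0.161), (-0.058, 0.215), (-0.030, 0.113),
    (-0.083, 0.116), (-0.069, 0.187), (-0.472, 0.154), (-0.486, 0.136),
    (-0.015, 0.397), (-0.137, 0.231), (-0.265, 0.330), (-0.410, 0.401),
    (-0.207, 0.194), (0.0211, 0.153), (-0.668, -0.193), (-0.226, -0.123),
    (-0.196, 0.081), (0.0347, 0.148), (-0.174, 0.009), (-0.193, -0.127),
    (-0.311, 0.350), (-0.094, -0.035), (-0.319, -0.122), (0.111, 0.199),
    (-0.153, -0.0365), (-0.222, -0.115), (-0.263, -0.119), (-0.392, -0.355),
    (-0.303, -0.251), (-0.295, 0.296), (0.268, 0.431), (-0.044, 0.303),
    (-0.117, 0.304), (-0.087, 0.191), (0.168, 0.356), (0.103, 0.332),
    (-0.071, 0.243), (-0.119, 0.179), (-0.476, -0.089), (-0.002, 0.337),
    (-0.006, 0.355), (-0.030, 0.367), (0.003, 0.599), (-0.086, 0.208),
    (-0.228, 0.447), (-0.083, 0.147), (-0.114, 0.397), (-0.195, 0.126),
    (-0.208, -0.000),
]

def mixing_class(r, r2):
    """Two-letter class: degree first, two-walks degree second."""
    return ("A" if r > 0 else "D") + ("A" if r2 > 0 else "D")

counts = Counter(mixing_class(r, r2) for r, r2 in TABLE_2)
for label in ("AA", "DD", "DA", "AD"):
    share = 100 * counts[label] / len(TABLE_2)
    print(f"{label}: {counts[label]} networks ({share:.0f}%)")
```

With this convention the tally is 7 AA, 12 DD, 30 DA and 0 AD networks, which rounds to the 14%, 24% and 61% quoted above.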
An important characteristic of our current approach is that we can understand the structural causes for the different kinds of assortativity in networks using the interpretation of these quantities in terms of subgraphs of the graph. As we have seen before, an important structural feature of graphs allowing the transition from degree disassortative to two-walks degree assortative is the presence of separators. It has to be stressed that this is not a unique structural feature of this kind of network and more studies are needed to completely understand their structural characterization. However, it is easy to visualize the connectors in the small PIN of the bacterium B. subtilis (see left panel in Fig. 5.3). In Fig. 5.3 we also illustrate the degree and two-walks degree of the nodes in the food web of ScotchBroom and in the transcription network of E. coli. All of them display DA characteristics.
6. Conclusions
Here we have proposed an extension of the concept of degree assortativity to one that accounts for the correlation between the degrees of the nodes and their nearest neighbours in graphs and networks. This measure, here named the two-walks degree assortativity, is expressed in terms of subgraphs of the graph. As we have proved here, there are a few more fragments contributing to the two-walks degree assortativity than to the degree assortativity. This clearly indicates that the new quantity accounts for more structural information than the previous one. We have seen that the two measures are not linearly correlated, either for all the connected graphs with 9 nodes or for real-world networks. Further studies are needed to understand the role of this quantity in the study of real-world problems. As we have seen here, there are some apparently universal features of some classes of networks in relation to this quantity. For instance, all real-world biological networks studied here are degree disassortative but two-walks degree assortative. The implications of this observation for the study of the biological processes taking place on these networks are far beyond the scope of this work.
Appendix: Real-world network dataset
The real-world networks used in this paper belong to different domains: ecological (includes food webs and ecosystems), social (networks of friendships, communication networks, corporate relationships), technological (internet, transport, software development networks), informational (vocabulary networks, citations) and biological (protein-protein interaction networks, transcriptional regulation networks). The dataset comprises networks of different sizes, ranging from 29 to 4941 nodes. The networks are listed in Table 2.
- [1] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks”, Rev. Mod. Phys. 74, 47 (2002).
- [2] M. E. J. Newman, “The Structure and Function of Complex Networks”, SIAM Rev. 45, 167 (2003).
- [3] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, “Complex networks: Structure and dynamics”, Phys. Rep. 424, 175 (2006).
- [4] S. H. Strogatz, “Exploring complex networks”, Nature (London) 410, 268 (2001).
- [5] A. Barrat, M. Barthélemy, and A. Vespignani, Dynamical Processes on Complex Networks (Cambridge University Press, UK, 2008).
- [6] L. D. F. Costa, F. A. Rodrigues, G. Travieso, and P. R. V. Boas, “Characterization of complex networks: A survey of measurements”, Adv. Phys. 56, 167 (2007).
- [7] M. McPherson, L. Smith-Lovin and J. M. Cook, “Birds of a feather: Homophily in social networks”, Annual Review of Sociology 27, 415 (2001).
- [8] M. E. J. Newman, “Mixing patterns in networks”, Phys. Rev. E 67, 026126 (2003).
