Relational flexibility of network elements based on inconsistent community detection

Heetae Kim; Sang Hoon Lee

arXiv:1904.05523·physics.soc-ph·August 20, 2019

TL;DR

This paper introduces companionship inconsistency as a new measure to quantify how strongly nodes are affiliated with communities, revealing unique node characteristics and offering insights into network structure and node roles.

Contribution

It proposes companionship inconsistency as a novel node centrality measure derived from community detection degeneracy, providing a new perspective on node relationships in networks.

Findings

1. Companionship inconsistency identifies outsider and promiscuous nodes in social networks.

2. It diagnoses balance in power transmission in infrastructure networks.

3. It discloses intrinsic node properties related to higher-order network organization.

Tables

Table 1: The Pearson correlation coefficient $r$ and $p$-value (corresponding to the null hypothesis of no correlation) between CoI and degree, betweenness, and mean externality for the clustered ER network model and real networks. We also list the number of nodes ($N$), the number of edges ($L$), and the resolution parameter $\gamma$ used for community detection. In all cases, the CoI and mean externality values are calculated from $20$ independent community detection results. For the clustered ER networks, the results are collected from the nodes’ centrality values over all $20$ network realizations.

Network	$N$	$L$	$\gamma$	Correlation	Degree	Betweenness	Mean externality
The clustered ER networks	500	5000	0.7	$r$	$0.155$	$0.458$	$0.553$
The clustered ER networks	500	5000	0.7	$p$-value	$9.99 \times 10^{-55}$	$< 2.22 \times 10^{-308}$	$< 2.22 \times 10^{-308}$
Zachary’s Zachary Karate club Zachary (1977)	34	77	0.7	$r$	$-0.131$	$-0.101$	$0.458$
Zachary’s Zachary Karate club Zachary (1977)	34	77	0.7	$p$-value	$0.461$	$0.571$	$6.49 \times 10^{-3}$
Star Wars (all episodes) Gabasova (2016)	111	450	0.6	$r$	$0.311$	$0.204$	$0.357$
Star Wars (all episodes) Gabasova (2016)	111	450	0.6	$p$-value	$9.06 \times 10^{-4}$	$0.0318$	$1.22 \times 10^{-4}$
Work place contacts Génois et al. (2015)	92	755	0.6	$r$	$-0.150$	$-0.0966$	$0.421$
Work place contacts Génois et al. (2015)	92	755	0.6	$p$-value	$0.154$	$0.360$	$3.00 \times 10^{-5}$
Facebook friends Mastrandrea et al. (2015)	156	4515	0.8	$r$	$0.0846$	$0.130$	$0.542$
Facebook friends Mastrandrea et al. (2015)	156	4515	0.8	$p$-value	$0.294$	$0.1045$	$2.61 \times 10^{-13}$
Central Chilean power grid Kim et al. (2018)	347	444	0.7	$r$	$0.0691$	$0.0683$	$0.332$
Central Chilean power grid Kim et al. (2018)	347	444	0.7	$p$-value	$0.199$	$0.204$	$2.4 \times 10^{-10}$

Table 2: The list of Star Wars characters, including a result of community partition [as illustrated in Fig. 4(c)], CoI, BC, mean externality ($E$), and degree ($k$). Each character belongs to one of three groups: resistances (R), Jedi knights (J), and imperial military army (M).

Character	Community	CoI	BC	$E$	$k$	CoI $\times k$
Luke Skywalker	J	0.552	0.3935	0.512	26	$14.352$
C-3PO	R	0.231	0.1174	0.170	20	$4.620$
Princess Leia	R	0.231	0.1306	0.274	19	$4.389$
Han Solo	R	0.231	0.1006	0.144	16	$3.696$
Darth Vader	M	0.189	0.2326	0.469	16	$3.024$
R2-D2	R	0.231	0.0126	0.133	12	$2.772$
Chewbacca	R	0.231	0.0117	0.145	11	$2.541$
Lando	R	0.231	0.0271	0.227	11	$2.541$
Wedge	J	0.257	0.0581	0.250	8	$2.056$
Biggs	J	0.257	0.0164	0.363	8	$2.056$
Obi-Wan	R	0.231	0.0129	0.287	8	$1.848$
Mon Mothma	R	0.231	0.0005	0.088	8	$1.848$
Red Leader	J	0.257	0.0136	0.286	7	$1.799$
Boba Fett	R	0.231	0.0129	0.257	7	$1.617$
Admiral Ackbar	R	0.231	0.0065	0.229	7	$1.617$
Jabba	R	0.231	0.0050	0.117	6	$1.386$
Gold Leader	J	0.257	0.0009	0.040	5	$1.285$
Beru	R	0.231	0.0008	0.140	5	$1.155$
Yoda	J	0.552	0	0.350	2	$1.104$
Zev	J	0.552	0	0.350	2	$1.104$
Boushh	R	0.231	0.0006	0	4	$0.924$
Owen	R	0.231	0	0.175	4	$0.924$
Rieekan	R	0.231	0	0	4	$0.924$
Dondonna	J	0.257	0	0.067	3	$0.771$
Piett	M	0.189	0.0021	0.225	4	$0.756$
Bib Fortuna	R	0.231	0	0.232	3	$0.693$
Tarkin	M	0.189	0	0.300	3	$0.567$
Motti	M	0.189	0	0.300	3	$0.567$
Dack	J	0.552	0	0	1	$0.552$
Camie	J	0.257	0	0.100	2	$0.514$
Red Ten	J	0.257	0	0.100	2	$0.514$
Derlin	R	0.231	0	0	2	$0.462$
Ozzel	M	0.189	0	0	2	$0.378$
Needa	M	0.189	0	0	2	$0.378$
Emperor	M	0.180	0	0.500	2	$0.360$
Anakin	M	0.180	0	0.500	2	$0.360$
Janson	J	0.257	0	0	1	$0.257$
Greedo	R	0.231	0	0	1	$0.231$
Jerjerrod	M	0.189	0	0	1	$0.189$

Equations

$$\phi_{ij} = \frac{1}{n_{d}} \sum_{\alpha=1}^{n_{d}} \delta_{\alpha}(g_{i}, g_{j}),$$

$$\Phi_{i} = 1 - \frac{1}{N-1} \sum_{j(\neq i)} (1 - 2\phi_{ij})^{2},$$

$$Q = \frac{1}{2m} \sum_{i \neq j} \left[ \left( A_{ij} - \gamma \frac{k_{i} k_{j}}{2m} \right) \delta(g_{i}, g_{j}) \right],$$

$$E_{i} = 1 - \frac{\sum_{j} A_{ij} \delta(g_{i}, g_{j})}{\sum_{j} A_{ij}} = 1 - \frac{\sum_{j} A_{ij} \delta(g_{i}, g_{j})}{k_{i}}.$$

$$p(T) = \frac{1}{10} \Big[ \delta(T,0.2) + \delta(T,0.4) + \delta(T,0.6) + \delta(T,0.8) + 6\delta(T,1) \Big],$$

Full text

Heetae Kim (김희태)

Department of Industrial Engineering, Universidad de Talca, Curicó 3341717, Chile

Asia Pacific Center for Theoretical Physics, Pohang 37673, Korea

Sang Hoon Lee (이상훈)

Department of Liberal Arts, Gyeongnam National University of Science and Technology, Jinju 52725, Korea

Abstract

Community identification of network components enables us to understand the mesoscale clustering structure of networks. A number of algorithms have been developed to determine the most likely community structures in networks. The probabilistic or stochastic nature of this problem naturally introduces ambiguity in the resulting community structures. More specifically, stochastic algorithms can, in principle, produce a different community structure for each realization. In this study, instead of trying to “solve” this community degeneracy problem, we turn the tables by taking the degeneracy as a chance to quantify how strong a companionship each node has with other nodes. For that purpose, we define the concept of companionship inconsistency, which indicates how inconsistently a node is identified as a member of a community with respect to the other nodes. Analyzing model and real networks, we show that companionship inconsistency discloses unique characteristics of nodes, and we therefore suggest it as a new type of node centrality. In social networks, for example, companionship inconsistency can classify outsider nodes without firm community membership and promiscuous nodes with multiple connections to several communities. In infrastructure networks such as power grids, it can diagnose how evenly the connection structure is balanced in terms of power transmission. Companionship inconsistency, therefore, abstracts an individual node’s intrinsic property regarding its relationship to a higher-order organization of the network.

I Introduction

Community structures of networks Porter et al. (2009); Fortunato (2010) are arguably the most popular concept in investigating the mesoscale connectivity between node groups of networks, in the field of network science Newman (2010). Various community detection algorithms have been developed to divide a network into communities based on modularity optimization Newman (2004); Newman and Girvan (2004); Clauset et al. (2004); Fortunato and Barthélemy (2007); Blondel et al. (2008), information theory Rosvall and Bergstrom (2008), clique percolation Palla et al. (2005), etc. The main objective of community detection algorithms is to provide a principled guideline to determine each node’s community membership in a network. The algorithms work under the assumption that the nodes inside each community are statistically better connected to each other, compared with the connection to the other parts of the network, which is basically the very definition of communities in networks.

There are many ways to classify community detection algorithms, but for our purpose we divide them into two classes. Deterministic community detection algorithms, by definition, produce a single community structure for given control parameters. Stochastic algorithms, on the other hand, can yield a different community structure at each realization in principle (and in practice, as we will show). So far, the inconsistent results of stochastic community detection have generally been regarded as a defect of such algorithms. In other words, the inconsistency (sometimes dubbed the community degeneracy problem) has been taken as an inaccuracy of stochastic detection algorithms Kwak et al. (2011); Lancichinetti and Fortunato (2012).

In this paper, however, we argue that there is nothing wrong with the “inconsistent” results such stochastic algorithms produce, as it is fundamentally impossible to define the exact boundary of one’s community identity in the first place. For example, people are naturally involved in various groups of other people, and their degrees of participation in those groups differ. Such different types of participation of a node in groups can carry crucial information on the node’s social existence or influence. Throughout our study, therefore, we directly confront the inconsistent community detection results of a stochastic algorithm and harness them instead of evading them. Based on ensembles of community detection results, we examine how frequently pairs of nodes are identified as members of different communities (or of the same community). Applying the method to real networks in addition to model networks with prescribed communities, we show that the companionship inconsistency represents the sense of belonging of nodes in networks and thus conveys their unique properties, in comparison to conventional centrality measures.

The paper is organized as follows. First, we introduce the concept of companionship inconsistency (CoI) and the methodology in Sec. II with an illustrative example network. We apply the method to model networks to investigate the characteristics of CoI in Sec. III.1. In Sec. III.2, we apply the CoI measure to various real networks and identify the roles of nodes. We summarize the results and conclude the paper with open questions and discussions in Sec. IV.

II Methods

II.1 Companionship inconsistency

In principle, a community detection algorithm based on stochastic methods may produce different results by definition, even for the same parameters, and it is not difficult to observe such cases in practice. Again, we would like to emphasize that this arises not only from the limitations of the algorithm but also from the innately ambiguous character of networks when it comes to community boundaries. To quantify the ambiguity of community structures, we introduce a principled measure of CoI for each node as a new type of centrality. The CoI captures how inconsistently the node is classified as a community’s member, with or without fixed companion nodes. When a node tends to be clustered with different nodes in different realizations, we consider the node’s community identity to be inconsistent.

In our previous paper Kim et al. (2015), we introduced the concept of CoI to relate the stability of power-grid nodes to their community membership structure (we defined the “community consistency” there, but we have changed it in this paper to focus on the inconsistent nodes representing functional flexibility). We recap the formal definition here for completeness. For a visual illustration, see Fig. 1, which uses the small example network in Fig. 1(a). To formulate the CoI, we first define the co-occurrence matrix elements $\phi_{ij}$ as the fraction of realizations in which nodes $i$ and $j$ are identified as members of the same community, which corresponds to the matrix in step 2 of Fig. 1(c):

$$\phi_{ij} = \frac{1}{n_{d}} \sum_{\alpha=1}^{n_{d}} \delta_{\alpha}(g_{i}, g_{j}),$$

where $\delta_{\alpha}$ is the Kronecker delta for the $\alpha$th realization of community detection, $g_{i}$ is the community index of node $i$, and $n_{d}$ is the total number of realizations of community detection. In other words, $\delta_{\alpha}(g_{i},g_{j})=1$ when $i$ and $j$ are in the same community and $\delta_{\alpha}(g_{i},g_{j})=0$ otherwise in the $\alpha$th realization of community detection. The measure $\phi_{ij}$ is $0$ (or $1$) if $i$ and $j$ consistently belong to different communities (the same community), and it takes intermediate values when the pairwise community membership is inconsistent. The extreme values $\phi_{ij}=1$ and $\phi_{ij}=0$ thus represent consistency in community detection, while intermediate values represent inconsistency. Based on this, we define the CoI of node $i$ shown in step 3 of Fig. 1(c), denoted by $\Phi_{i}$, as

$$\Phi_{i} = 1 - \frac{1}{N-1} \sum_{j(\neq i)} (1 - 2\phi_{ij})^{2},$$

where $N$ is the total number of the nodes. As a result, $\Phi_{i}=0$ when node $i$ always forms communities with the same nodes. In principle, $\Phi_{i}=1$ implies that the probability that node $i$ is clustered with any other node is $1/2$ (the maximum uncertainty in the comembership). The “community consistency” defined in Ref. Kim et al. (2015) is equal to $1-\Phi_{i}$ .
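As a minimal sketch of the two definitions above, the following Python code computes $\phi_{ij}$ and $\Phi_{i}$ from an ensemble of partitions. The input format (one dictionary from node to community label per realization) and the function names are assumptions made for illustration; they are not taken from the GenLouvain code or from the authors' implementation.

```python
from fractions import Fraction
from itertools import combinations

def comembership(partitions):
    """Return phi[i][j], the fraction of realizations in which i and j
    share a community. `partitions` is a list of dicts mapping
    node -> community label (a hypothetical input format)."""
    nodes = sorted(partitions[0])
    n_d = len(partitions)
    phi = {i: {} for i in nodes}
    for i, j in combinations(nodes, 2):
        same = sum(1 for g in partitions if g[i] == g[j])
        phi[i][j] = phi[j][i] = Fraction(same, n_d)
    return phi

def coi(partitions):
    """Return Phi_i = 1 - (1/(N-1)) * sum_{j != i} (1 - 2*phi_ij)^2."""
    phi = comembership(partitions)
    n = len(phi)
    return {i: 1 - sum((1 - 2 * p) ** 2 for p in row.values()) / (n - 1)
            for i, row in phi.items()}

# Toy ensemble with two realizations: nodes 0-1 and 3-4 are stable pairs,
# while node 2 switches sides. The labels differ between realizations on
# purpose, because only pairwise comembership enters the calculation.
partitions = [
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
    {0: "x", 1: "x", 2: "y", 3: "y", 4: "y"},
]
values = coi(partitions)
print({i: str(v) for i, v in values.items()})
```

In this toy ensemble the switching node reaches the maximum $\Phi_{2}=1$, since $\phi_{2j}=1/2$ for every other node, while the stable nodes obtain a small but nonzero value because their comembership with node 2 is itself uncertain.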

Note that the maximum value $\Phi_{i}=1$ , i.e., the comembership matrix element $\phi_{ij}=1/2$ for all of the other nodes $j$ , assumes exactly two “ground-truth” communities potentially connected to the node $i$ . Therefore, in principle, one has to be careful when it comes to the comparison of the results CoI produces, as the maximum value $\Phi_{i}$ can be different for specific circumstances depending on the number of communities and community size heterogeneity. For instance, the fact that $\phi_{ij}$ is a decreasing function of the number of communities attached to node $i$ can make the head-to-head comparison between the nodes attached to different numbers of communities nontrivial. In reality, however, we are not able to know the number of ground-truth communities a priori, let alone the local communities connected to each node. Therefore, to design a measure first addressing this particular characteristic of each node, we stick to the simple assumption of two (or at least not many) communities attached to the bridge nodes. As we demonstrate in the following sections, our CoI measure produces meaningful results and works fine in practice.

In order to measure CoI, one can utilize any stochastic community detection algorithm. In this study, we take the GenLouvain Jeub et al. (2011-2017) algorithm, which is a variant of the original Louvain algorithm Blondel et al. (2008), with the default randomization option move. The GenLouvain (just as its ancestor Louvain) algorithm separates communities based on the modularity maximization Newman (2004); Newman and Girvan (2004). The algorithm can detect communities in different scales by tuning the resolution parameter $\gamma$ in the modularity function Newman (2004); Newman and Girvan (2004)

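$$Q = \frac{1}{2m} \sum_{i \neq j} \left[ \left( A_{ij} - \gamma \frac{k_{i} k_{j}}{2m} \right) \delta(g_{i}, g_{j}) \right],$$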

where $A_{ij}$ is an element of the adjacency matrix representing the network structure, $k_{i}$ is the degree (the number of neighbors) of node $i$, $g_{i}$ is the community index of node $i$, $\delta$ is the Kronecker delta, and $m$ is the total number of edges, which plays the role of a normalization factor that matches the scale of the $A_{ij}$ and $k_{i}k_{j}$ terms for $\gamma=1$ and ensures $-1\leq Q\leq 1$. The smaller the $\gamma$ value we use, the larger the communities (and thus the smaller the total number of communities) we detect. To generate statistical ensembles, we run multiple realizations of the GenLouvain algorithm for given $\gamma$ values. In general, one needs to tune the value of $\gamma$. For the rest of the paper, we choose the $\gamma$ value in a rather heuristic way, so that it generates a reasonable number of communities, as our goal is to demonstrate the utility of CoI for various types of networks rather than to provide the most precisely tuned value of $\gamma$ for each network. Therefore, one always has to keep in mind that CoI values depend on the choice of the resolution parameter $\gamma$ as well.
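To illustrate how the resolution parameter shifts the preferred scale of communities, the sketch below evaluates the modularity function as written above (with the sum restricted to $i\neq j$) for two fixed partitions of a toy graph. It only scores given partitions and does not perform any optimization; the graph, the partitions, and the function name are hypothetical.

```python
def modularity(adj, community, gamma=1.0):
    """Q = (1/2m) * sum_{i != j} (A_ij - gamma*k_i*k_j/(2m)) * delta(g_i, g_j).

    `adj` maps each node to the set of its neighbors
    (undirected, unweighted graph without self-loops)."""
    degree = {i: len(nbrs) for i, nbrs in adj.items()}
    two_m = sum(degree.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if i != j and community[i] == community[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - gamma * degree[i] * degree[j] / two_m
    return q / two_m

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one community per triangle
merged = {i: 0 for i in adj}                    # a single large community

for gamma in (1.0, 0.2):
    print(gamma, modularity(adj, split, gamma), modularity(adj, merged, gamma))
```

For this toy graph the two-triangle partition scores higher at $\gamma=1$, whereas the single merged community scores higher at $\gamma=0.2$, in line with the statement that smaller $\gamma$ values favor larger communities.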

We note that, in the community detection literature, other measures such as flexibility, promiscuity, disjointedness, cohesion strength, and the Rand index also consider changes in the community identity of nodes Garcia et al. (2018); Rand (1971). Flexibility counts the number of times a node changes its community identity, while promiscuity counts the number of communities to which the node ever belongs. In contrast to CoI, neither flexibility nor promiscuity considers the pairwise relationship with companions. Disjointedness and cohesion strength take the community identity of the other nodes into account, but disjointedness focuses on how a node changes its community identity independently of the other nodes, and cohesion strength only counts the mutual companionship without taking the absence of companionship into account. The Rand index Rand (1971) measures the similarity between data clusterings, but it is a cluster-centric measure Gates et al. (2019), while CoI is node centric. The aforementioned measures also utilize the fuzziness of communities Reichardt and Bornholdt (2004), as CoI does. However, those measures require information on the community label of each node, whereas CoI only considers whether a pair of nodes is in the same community or not. Therefore, we emphasize that CoI can reveal unique characteristics of nodes that are not captured by these seemingly similar measures, as we demonstrate in the following.

II.2 Implication of companionship inconsistency

Let us revisit the example network shown in Fig. 1 to inspect the implication of CoI, in particular in comparison with another network centrality that is also known to be able to detect “bridge” nodes between groups. The network in Fig. 1(a) has three nodes (denoted by 6, 7, and 8) located between two groups of nodes: the group of nodes 1, 2, 3, 4, and 5 on the left, and the group of nodes 9, 10, 11, 12, and 13 on the right. The nodes in the two (left and right) groups are densely connected, so they are almost always detected together with their own group’s members. However, since the community identity of the three nodes (6, 7, and 8) is rather ambiguous, it changes from one detection to another, e.g., node 6 is sometimes clustered with the left group and sometimes with the right group. By counting this co-occurrence of each node with the others in detected communities, we calculate how flexible the partnership of each node is, as presented in Sec. II.1.

In this manner, the CoI reveals the characteristics of nodes considering the mesoscopic functional relationship between them on top of the structure. For instance, all three nodes 6, 7, and 8 are in the middle of the other two groups, which is captured by their large CoI values [step 3 of Fig. 1(c)]. However, the portion of betweenness centrality (BC)—the fraction of shortest paths between all of the pairs of nodes that go through the node Goh et al. (2001)—is highly concentrated on node 8 and the BC values of nodes 6 and 7 are almost indistinguishable from the internal nodes such as nodes 2 and 3, because BC only takes the shortest path (likely using the path through node 8 instead of nodes 6 and 7) into account for given source and target nodes [Fig. 1(b)]. The similar topological position of the three nodes can only be disclosed by CoI.

Since CoI is based not solely on the structure of the community partitioning but on the mutual relationship between nodes, it reveals the functional context of nodes. In Fig. 2, for example, the central node marked by the red arrow belongs to different communities in three realizations of community detection. The CoI of the central node, calculated from its comembership structure with all of the other nodes in the network, is $8/9$. One can see the feature of CoI clearly by comparing it to the measure called “externality,” which we define as the proportion of the external degree of a node (the number of neighbors belonging to a different community from the node of interest), i.e.,

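$$E_{i} = 1 - \frac{\sum_{j} A_{ij} \delta(g_{i}, g_{j})}{\sum_{j} A_{ij}} = 1 - \frac{\sum_{j} A_{ij} \delta(g_{i}, g_{j})}{k_{i}}.$$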

It represents the fraction of a node’s neighbors that belong to a different community from the node itself.

Compared with CoI, externality only counts the relative fraction of a node’s connections to communities other than its own. In Fig. 2, the central node’s externality values are $1/3$, $2/3$, and $1/3$ for the three realizations (from left to right), which results in the mean externality value $\langle E_{\mathrm{central}}\rangle=4/9$. By comparison, the large value of CoI reflects the central node’s functional role of switching communities, while the intermediate value of mean externality reflects the averaged level of the node’s participation in other communities. In other words, the fluctuations in the membership structure are largely ignored in the mean externality, while the CoI mainly uses such fluctuations to quantify the node’s property. These two measures, as we will show later, are clearly related, but they also measure different types of bridgeness Wu et al. (2018).
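The two values quoted for the central node can be checked with exact arithmetic. The sketch below defines a hypothetical `externality` helper following the definition of $E_{i}$ and then reproduces $4/9$ and $8/9$. The network of Fig. 2 itself is not reproduced here; the CoI line relies only on the observation that, with three realizations, $\Phi=8/9$ requires every $\phi_{ij}$ of the central node to be $1/3$ or $2/3$.

```python
from fractions import Fraction

def externality(adj, community, i):
    """E_i = 1 - (number of neighbors of i inside i's community) / k_i."""
    internal = sum(1 for j in adj[i] if community[j] == community[i])
    return 1 - Fraction(internal, len(adj[i]))

# Hypothetical toy case: a node with three neighbors, one of them outside.
toy_adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
toy_community = {0: "a", 1: "a", 2: "a", 3: "b"}
toy_e = externality(toy_adj, toy_community, 0)

# Mean of the three per-realization externality values quoted in the text.
per_realization = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 3)]
mean_externality = sum(per_realization) / len(per_realization)

# With n_d = 3, phi_ij is one of 0, 1/3, 2/3, 1. For phi = 1/3 or 2/3 the
# term (1 - 2*phi)^2 equals 1/9, so a node whose every phi_ij takes one of
# these two values has Phi = 1 - 1/9.
terms = {(1 - 2 * Fraction(c, 3)) ** 2 for c in (1, 2)}
coi_central = 1 - terms.pop()

print(toy_e, mean_externality, coi_central)
```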

III Results

III.1 Model network

As presented in Sec. II.2, the CoI characterizes attributes of nodes that cannot be captured by other simple measures that do not explicitly take the community structure into account, such as BC. With a series of clustered network models, we find that CoI is indeed not correlated with degree (the number of each node’s neighbors Newman (2010)) or BC, but it is correlated with the externality introduced in Sec. II.2. We generate random subgraphs and rewire some of their initially internal edges to the outside, merging them into a network that consists of densely connected nodes within each group and sparse edges between the groups.

First, we create $n_{c}$ Erdős-Rényi (ER) random subnetworks Erdős and Rényi (1959) with $N_{c}$ nodes and $L_{c}$ edges in each subnetwork, indexed by $c\in\{1,2,\cdots,n_{c}\}$. As a result, there are $N=\sum_{c=1}^{n_{c}}{N_{c}}$ nodes in total. Then, we assign the initial community identity of each node as the label of the subnetwork to which it belongs, so that nodes are initially connected only to their (preassigned) community members. To set the external connections, we rewire the edges of every node $i$ until $E_{i}=1-T_{i}$ (thus, $T_{i}=1$ means that node $i$ is not rewired at all). Note that, as we apply the rewiring process to each node sequentially, the externality of a node can still change when other nodes are rewired afterward. Therefore, we measure the actual externality values for each network realization after all of the rewiring processes are finished.

Figure 3 shows a sample network from the model and the results from an ensemble of $20$ networks composed of $n_{c}=10$ subnetworks with $N_{c}=50$ , $L_{c}=500$ for each of them. Therefore, the total number of nodes $N=500$ and the total number of edges $L=5000$ for each sample network. We randomly distribute the threshold value to each node independently, according to the prescribed probability distribution $p(T)$ for discrete threshold values $T\in\{0.2,0.4,0.6,0.8,1\}$ :

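$$p(T) = \frac{1}{10} \Big[ \delta(T,0.2) + \delta(T,0.4) + \delta(T,0.6) + \delta(T,0.8) + 6\delta(T,1) \Big],$$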

to generate different levels of CoI, where $\delta$ is the Kronecker delta. With this setting, nodes still statistically belong to their initial community (as long as $T_{i}>0.1$, where $1-E_{i}=1/n_{c}=0.1$ corresponds to the case of randomly distributed membership). For each of these network realizations, we identify $20$ community structures by independently running the GenLouvain algorithm with $\gamma=0.7$ (which actually gives $10$ communities, as we intended), and obtain the CoI values of the nodes for each network. Figure 3 shows one of the $20$ sample networks, color coded by the original cluster in Fig. 3(a) and by externality in Fig. 3(b). One can identify $10$ communities, with low-externality nodes at the boundary and high-externality nodes in the center.
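The construction of the clustered ER model described above can be sketched as follows. Several details are assumptions, because the text does not fix them: a rewiring step here replaces a random internal edge $(i,j)$ with an edge from $i$ to a uniformly chosen node outside the community of $i$, rewiring stops as soon as $E_{i}\geq 1-T_{i}$, and the threshold weights give probability $6/10$ to $T=1$ and $1/10$ to each of the other four levels. The function name is hypothetical.

```python
import random

def clustered_er_model(n_c=10, N_c=50, L_c=500, seed=0):
    """Sketch of the clustered ER model; requires n_c >= 2."""
    if n_c < 2:
        raise ValueError("at least two communities are needed for rewiring")
    rng = random.Random(seed)
    N = n_c * N_c
    community = {i: i // N_c for i in range(N)}
    adj = {i: set() for i in range(N)}

    # Step 1: one random subnetwork with N_c nodes and L_c edges per community.
    for c in range(n_c):
        members = range(c * N_c, (c + 1) * N_c)
        pairs = [(i, j) for i in members for j in members if i < j]
        for i, j in rng.sample(pairs, L_c):
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: draw a threshold T_i for every node (assumed weights 1,1,1,1,6).
    levels, weights = [0.2, 0.4, 0.6, 0.8, 1.0], [1, 1, 1, 1, 6]
    T = {i: rng.choices(levels, weights)[0] for i in range(N)}

    # Step 3: sequential rewiring of internal edges to outside nodes
    # (assumed rule) until the externality of node i reaches 1 - T_i.
    for i in range(N):
        while adj[i]:
            internal = [j for j in adj[i] if community[j] == community[i]]
            if 1 - len(internal) / len(adj[i]) >= 1 - T[i]:
                break
            k = rng.randrange(N)
            if community[k] == community[i] or k in adj[i]:
                continue  # redraw: k must be a new neighbor outside
            j = rng.choice(internal)
            adj[i].remove(j)
            adj[j].remove(i)
            adj[i].add(k)
            adj[k].add(i)
    return adj, community, T

adj, community, T = clustered_er_model()
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(len(adj), n_edges)
```

Because each rewiring step removes one edge and adds one new edge, the total number of edges is preserved. The externality of a node can drift after its own rewiring is finished, which is why the text measures the actual externality only at the end.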

We compare CoI of all nodes from the $20$ sample networks with degree, betweenness, and the mean externality from $20$ realizations of GenLouvain community detection for each sample, as shown in Fig. 3(c) and Table 1. The CoI values are weakly correlated with the degree and strongly correlated with the BC, which supposedly comes from the fact that the degree, BC, and bridgeness are all well correlated in our model networks generated from unstructured random networks. Those correlations are indeed very different in real networks as we will present in Sec. III.2.
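For reference, the Pearson correlation coefficient $r$ reported in Table 1 can be computed from two lists of node values as in the sketch below. The data are toy values chosen only to exercise the function, not values from the paper, and the $p$-value is not computed here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical per-node values (e.g., CoI against another centrality).
toy_coi = [0.10, 0.20, 0.30, 0.40]
toy_other = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(toy_coi, toy_other))
```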

As expected, the CoI and mean externality values are (positively) well correlated. However, there is a notable difference between the two. The externality values are multimodally distributed over a wide range [as shown in the histogram above the top horizontal axis of the rightmost panel in Fig. 3(c)], which is caused by the prescribed discrete levels of rewiring. On the other hand, most CoI values are very small and distributed within a narrow range [as shown in the histogram on the right vertical axis of the rightmost panel in Fig. 3(c)], except for a few outliers that are responsible for the positive correlation.

The difference also highlights a property of CoI in comparison to externality: Even the nodes that have gone through a significant amount of rewiring (e.g., the nodes with $T_{i}=0.2$) still maintain their original membership profile, so their community identity itself is relatively intact, which is reflected in small CoI values even for such nodes. In contrast, the externality almost directly measures the level of rewiring, which results in gradually changing values. In this respect, CoI is a more robust measure for quantifying the community identity of nodes with different levels of participation, compared with externality. In Sec. III.2, we move on to real-world networks to check whether these properties hold there as well.

III.2 Real networks

In this subsection, we examine the properties of CoI in various real networks, whose different contexts provide opportunities to interpret CoI from multiple perspectives. For instance, if the sum of the weights attached to a node is bounded, then a larger number of neighbors (degree) can result in a weaker connection to each of the node’s neighbors, because the node has a limited amount of interaction resource. In the context of social networks, a node (or a person) with a large CoI value could then be an outsider, as the person’s attention to its own group members is diminished as a result. On the other hand, if the sum of the weights attached to a node is unbounded and scales with its degree, for instance, then a node with a large CoI value may be a multiplayer who intermediates between several different communities. Therefore, practical applications may require the observation of different centralities, including both CoI and conventional ones.

We provide multiple types of such contexts, by introducing different types of networks in the following. Note that as in the case of clustered ER network model in Sec. III.1, we use $20$ community detection results for each real network.

III.2.1 Zachary’s Zachary karate club

Zachary’s karate club network Zachary (1977) is the most popular benchmark network for community detection algorithms. The network is known to have two separated groups caused by the conflict between the administrator and the master. We use the “original” version of the Zachary’s karate club network with two nodes with a single neighbor instead of the “conventional” version with one node with a single neighbor, thanks to the recent blog post by dear colleague Holme (2018). We denote this original version by Zachary’s Zachary karate club (ZZKC) network, according to the title of his blog post.

The correlations between CoI and other network centralities for the ZZKC network are listed in Table 1, and Fig. 5(a) shows the regression line with the envelopes of the 95% confidence interval on top of the scatter plot shaded by density. In contrast to the clustered ER networks in Sec. III.1, the correlation between CoI and degree and that between CoI and BC are negative, and they are not statistically significant, as indicated by the large $p$-values. Therefore, we conclude that real networks with nontrivial structures can show more than the simple positive correlation between CoI and degree or BC observed in clustered ER networks. In the ZZKC network as well, though, the correlation between CoI and mean externality is positive and statistically significant, which is not surprising considering that both measure community belonging (again, the correlation is not perfect), as we argued in Sec. II.2.

We note that there is a peculiar type of node in this network, which we can designate as a clearly observable outsider node that is weakly connected to both sides. As shown in Fig. 4(a), the network is segregated into two communities, colored red and green. For multiple community detection results with $\gamma=0.7$, which yields two communities, most club members consistently belong to one of the two communities. However, there exists a person who has a notably large CoI value compared with the others [the brightest node in Fig. 4(b), which corresponds to the CoI value $>0.6$ in Fig. 5(a)]. We interpret this as a consequence of the fact that this outsider node is connected only to two other nodes [the red edges in Fig. 4(b)], which (consistently) belong to opposite communities. In particular, the node sometimes belongs to the same community as one neighbor and sometimes to the same community as the other neighbor, which results in the large value of CoI.
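The outsider mechanism can be made concrete with an idealized calculation. Assume, as a simplification that the paper does not state, that there are exactly two communities, that all other nodes are perfectly consistent, and that the outsider joins one community in a fraction $f$ of the realizations. Then $\phi_{ij}=f$ for members of that community and $1-f$ for the others, both of which give $(1-2\phi_{ij})^{2}=(1-2f)^{2}$, so $\Phi=1-(1-2f)^{2}$. The sketch below evaluates this for $20$ realizations.

```python
from fractions import Fraction

def outsider_coi(f):
    """CoI of a node that joins one of two otherwise perfectly consistent
    communities in a fraction f of the realizations (idealized case)."""
    return 1 - (1 - 2 * f) ** 2

n_d = 20
# Numbers of realizations (out of 20) spent in one community for which the
# idealized CoI exceeds 0.6.
counts = [c for c in range(n_d + 1)
          if outsider_coi(Fraction(c, n_d)) > Fraction(6, 10)]
print(counts[0], counts[-1], outsider_coi(Fraction(1, 2)))
```

Under these assumptions a CoI value above $0.6$ arises whenever the node spends between $4$ and $16$ of the $20$ realizations in either community, and the maximum $\Phi=1$ is reached for an even split.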

III.2.2 Star Wars characters

In social networks, CoI can also effectively identify multiplayers connecting different communities. For instance, the Star Wars network Gabasova (2016) is a social network of the characters in the Star Wars movie series. The nodes represent characters, and the edges connect characters who communicate with each other (not necessarily in a human language, considering R2-D2 and Chewbacca) in the same scene. For brevity, Figs. 4(c) and 4(d) show the result of community detection and the CoI for the original Star Wars trilogy (with $\gamma=0.6$, which gives three communities, roughly corresponding to the three major groups in the movies). Each character belongs to one of the communities: resistances (violet), Jedi knights (orange), and imperial military army (green) (see Table 2 for the complete list). Luke Skywalker, C-3PO, Princess Leia, Han Solo, and Darth Vader are leading characters in the series, characterized by their large degree values. All of these leading characters interact with many other characters, but their communication is usually focused on their own communities, with the important exception of Luke Skywalker.

Luke Skywalker, as the main protagonist of the original trilogy, is in contact with characters from all three communities [the red edges in Fig. 4(d)]: resistances, Jedi knights, and even the imperial military army (his father). As a result, he achieves the unique status of having the largest values of both CoI and degree. As an illustrative example, we list the product of CoI and degree as a type of influence score, along with other centralities and community identity, in Table 2; Luke Skywalker's influence score dominates all the others. In contrast to the outsider in ZZKC, whose influence is limited by the small number of connections, we call nodes with both large CoI and large degree values "multiplayers": they actively connect different communities, and Luke Skywalker is a representative case in this network. This shows that CoI, possibly combined with other network centralities, can be a useful ingredient to quantify nodes' influence.
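
The influence score mentioned above, the product of CoI and degree, is simple enough to sketch directly. The node names and numbers below are made up for illustration and are not taken from Table 2; the function name `influence_scores` is likewise hypothetical.

```python
def influence_scores(coi, degree):
    """Product of CoI and degree for every node present in both mappings."""
    return {node: coi[node] * degree[node] for node in coi if node in degree}

# Illustrative numbers only; they are not the values reported in Table 2.
coi = {"outsider": 0.6, "multiplayer": 0.5, "local_hub": 0.05}
degree = {"outsider": 2, "multiplayer": 15, "local_hub": 12}

scores = influence_scores(coi, degree)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy case the node with both large CoI and large degree ranks first, while a node with large CoI but few connections stays far behind, which is the distinction between multiplayers and outsiders drawn in the text.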

Analyzing social relationships is not trivial Jones-Correa (2012) but important, as social metrics are developed to detect bullying behaviors in school or the work place Cowie et al. (2002); Alivernini et al. (2017); Vivolo-Kantor et al. (2014) or to identify influencers Brandwatch (2019); Sprout Social (2019). For this purpose, CoI has the merit that it requires only the information of whether a pair of people are in the same group or not, not any further information on the identities of the groups. This means that CoI needs less information to analyze the companionship structure, which is particularly important considering privacy issues. We believe that CoI can augment those metrics by providing a new type of information on nodes' social belonging.

For more statistically sound results, we show the correlations between centralities for the Star Wars network of all six episodes (episodes IV, V, VI, I, II, and III, in the chronological order of the release date) with $\gamma=0.7$ in Fig. 5(b). As shown in Table 1, again, the positive correlation between CoI and mean externality is significant, while the correlation between CoI and degree or BC is moderately positive, possibly related to Luke Skywalker’s dominance for those centralities as well.
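
For readers who want to reproduce this kind of comparison on their own data, a minimal sketch of the Pearson correlation coefficient $r$ follows. It uses only the standard library and does not compute the $p$-value, which needs a $t$-distribution and is left to a statistics package. The input values are made up to exercise the function and are not data from Table 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Made-up per-node values (assumption), chosen so the answers are obvious.
coi = [0.10, 0.20, 0.30, 0.40]
rising = [0.05, 0.15, 0.25, 0.35]   # increases linearly with coi
falling = [4, 3, 2, 1]              # decreases linearly with coi

r_rising = pearson_r(coi, rising)
r_falling = pearson_r(coi, falling)
print(r_rising, r_falling)
```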

III.2.3 Work place contacts

The work place contacts network Génois et al. (2015) represents the face-to-face contacts between people in a work place, recorded by radio-frequency identification (RFID) devices. The place is composed of five departments in an office building, and the RFID devices tracked the contacts for 10 days. Each individual contact with its time stamp forms an edge in a temporal network Holme and Saramäki (2012), but we aggregate all temporal edges into a static network for the purpose of our analysis.
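
The aggregation step can be sketched as follows, assuming each contact record is a `(time, person_i, person_j)` tuple; this field layout is an assumption, not a description of the actual data files. The sketch also keeps the number of contacts per pair, although the text only states that the temporal edges are merged into a static network, so the counts are an optional extra.

```python
from collections import Counter

def aggregate_contacts(contacts):
    """Merge time-stamped contacts into undirected static edges.

    Returns a Counter mapping each (smaller_id, larger_id) pair to the
    number of recorded contacts between the two people.
    """
    weights = Counter()
    for _time, i, j in contacts:
        if i != j:  # ignore self-contacts, should any appear
            weights[tuple(sorted((i, j)))] += 1
    return weights

# Hypothetical records: (time stamp, person, person).
contacts = [(20, 1, 2), (40, 2, 1), (60, 2, 3)]

weights = aggregate_contacts(contacts)
static_edges = sorted(weights)  # the unweighted static network
print(static_edges, dict(weights))
```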

In this network, people who play the intermediary role are not necessarily "hub" nodes with many neighbors, as shown in Fig. 5(c) and by the lack of a statistically significant correlation between CoI and degree in Table 1 (both show the results with $\gamma=0.6$). Instead, they are nodes that have relatively few neighbors but connect multiple communities, which results in large values of CoI. Compared with the ZZKC and Star Wars networks, these intermediary nodes lie somewhere between the extreme outsider in ZZKC and the extreme multiplayer (Luke Skywalker) in Star Wars. The positive correlation between CoI and mean externality appears again in this network (Table 1).

III.2.4 Facebook friends

The Facebook friends network Mastrandrea et al. (2015) is an online social network among high school students. In contrast to the work place contacts network, the nodes in this network have a more diverse range of degree values, as expected from the nature of online contacts. Apart from that difference, the overall profile of the different centralities is similar to that of the work place contacts, with a few notable intermediary nodes [see Fig. 5(d), which shows the results with $\gamma=0.8$]. The similarity is also reflected in the lack of significant correlations between CoI and degree or BC, and in the significant positive correlation between CoI and mean externality in Table 1.

III.2.5 Central Chilean power grid

The central Chilean power grid is the electrical power transmission grid of the central region of Chile. A power grid is one of the infrastructure networks that are spatially embedded with geographical coordinates. The nodes represent electrical power system facilities such as power plants and substations, and the edges represent the high-voltage transmission lines between the nodes. In this study, we use the "without tap" (WOT) version of the Chilean power-grid network Kim et al. (2018). Basically, the WOT version takes into account that the power-grid nodes are directly connected to the transmission line. Using the WOT version simplifies the real network while still retaining the physical connection structure of the power grid (see the data description in Ref. Kim et al. (2018) for more detailed information).

The structure of power grids is determined by multiple factors such as the population, the location of natural resources, economic and environmental constraints, the distance between the facilities, etc. On top of that, by far the most important principle is that power grids should operate stably under possible external perturbations caused by natural disasters and intentional attacks against the infrastructure. We have already verified that the high-CoI nodes are unstable against external perturbations in terms of synchronous dynamic stability in Ref. Kim et al. (2015), where we used a more primitive version (in terms of data processing) of the Chilean power grid than the more sophisticated version we use in this work.

By analyzing the CoI values of the WOT version in this paper, we obtain an additional hint about the organizational principle of power grids, based on the results (with $\gamma=0.7$) shown in Fig. 5(e). In contrast to the other networks used in this study, which have relatively narrowly distributed CoI values, the CoI values for the power grid are broadly distributed. Furthermore, if we look at how the CoI values are spatially distributed in the power grid (the rightmost network in Fig. 5, where the node layout comes from the actual geographical coordinates), the CoI values change gradually along the power-grid nodes. This gradual change indicates that the power grid is organized hierarchically at different levels, from the most local to the most global. The large CoI values correspond to a few nodes in the southern region, where they connect the central and southern regions.

For the statistical correlation between CoI and the other centrality measures in the central Chilean power grid, see Table 1. The power grid shares similar properties with the other networks: no statistically significant correlation between CoI and degree or BC, and a strong positive correlation between CoI and mean externality.

IV Summary and outlook

In this study, we have extracted the nodes' companionship in terms of community identity from inconsistent community detection results. We have measured the CoI from the individual community relationship between pairs of nodes. Using the clustered ER network model and several real networks (ZZKC, Star Wars characters, work place contacts, Facebook friendship, and the central Chilean power grid), we have shown that CoI can effectively reveal various types of nodes' social roles: outsiders, multiplayers, and building blocks of hierarchical structures.

Considering the context of networks, one can interpret CoI in various ways and apply it to acquire further information. For example, outsiders and multiplayers, both of which tend to have high companionship inconsistency, can be distinguished by degree: outsiders have small degree and multiplayers have large degree, as we have shown in the case of the ZZKC and Star Wars networks. Combining several different centrality measures in this way suggests a new classification method, which could be a topic of further research. In this sense, CoI can be useful nodal information, as a projection of network properties through the mesoscale convex lens.
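
A minimal sketch of such a classification is given below. The two cut-off values and the label for low-CoI nodes are hypothetical choices made for illustration; the paper does not prescribe thresholds, and the example values are not measured data.

```python
def classify(coi, degree, coi_cut, degree_cut):
    """Label one node from its CoI and degree using two cut-offs (assumptions)."""
    if coi < coi_cut:
        return "consistent member"
    return "multiplayer" if degree >= degree_cut else "outsider"

# Made-up (CoI, degree) pairs for three nodes.
nodes = {"u": (0.6, 2), "v": (0.5, 15), "w": (0.05, 12)}

roles = {name: classify(c, k, coi_cut=0.3, degree_cut=10)
         for name, (c, k) in nodes.items()}
print(roles)
```

In practice the cut-offs would have to be chosen per network, for instance from the distributions of CoI and degree, since both scales differ considerably between the networks studied here.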

A common conclusion from all of the networks is that CoI and mean externality are positively correlated, which implies that the nodes sparsely (densely) connected to their community members are more likely to have inconsistent (consistent) companions. One may take this as too obvious a fact, because it fits well with our intuition about the very concept of community structures. However, we believe that it is important to validate this fact by actual data analysis, as we have done in this study. An interesting future direction could be to look for exceptions to this rule.
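
To make the quantity concrete, the sketch below computes a mean externality under a stated assumption: the externality of a node in one run is taken to be the fraction of its neighbors assigned to a different community, and the mean is taken over runs. The formal definition is given earlier in the paper and may differ in detail. The adjacency lists and the two runs are toy data.

```python
def mean_externality(adjacency, partitions):
    """Average, over runs, of the fraction of neighbours in another community.

    Assumed definition for illustration; isolated nodes are given 0.0.
    """
    result = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            result[node] = 0.0
            continue
        per_run = [
            sum(1 for n in neighbours if run[n] != run[node]) / len(neighbours)
            for run in partitions
        ]
        result[node] = sum(per_run) / len(per_run)
    return result

# Toy network (hypothetical): "x" bridges the pair a-a2 and the pair b-b2.
adjacency = {
    "x": ["a", "b"],
    "a": ["x", "a2"], "a2": ["a"],
    "b": ["x", "b2"], "b2": ["b"],
}
runs = [
    {"x": 0, "a": 0, "a2": 0, "b": 1, "b2": 1},  # x grouped with the a side
    {"x": 1, "a": 0, "a2": 0, "b": 1, "b2": 1},  # x grouped with the b side
]

ext = mean_externality(adjacency, runs)
print(ext)
```

In this toy case the bridging node has the largest mean externality and is also the only node whose companions change between runs, in line with the positive correlation reported above.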

The CoI value depends, of course, on the free parameters involved in the algorithms, e.g., the choice of the resolution parameter $\gamma$ when using the GenLouvain algorithm. Turning this around, we could actually use the CoI values to determine the most reasonable choice of $\gamma$. So far, the parameter $\gamma$ has been treated merely as a factor that tunes the overall community scale, but the resultant CoI values and their distribution show more than just a scale. They reveal richer structural properties such as hierarchy, as we have demonstrated in the case of the power grid.

Beyond the simple and static network structure we use in this study, the pairwise co-occurrence can also be measured on the evolving connection structure of temporal networks Holme and Saramäki (2012) by treating each time slice as an individual network snapshot for community detection. In addition, in the same manner of individually counting the multiple identities of a node, CoI can be applied to hierarchical or overlapping communities Palla et al. (2005); Ahn et al. (2010). For example, one can capture a comprehensive landscape of CoI from an ensemble of community detection results generated from different resolution parameter values Jeub et al. (2018) or even from different community detection algorithms. The CoI is based on the mutual companionship between a pair of nodes. However, even higher-order relations such as triplets or quadruplets Gates et al. (2019); Gates and Ahn (2017) can also be used to analyze their consistency. We believe that all of these are good candidates for future study.

Acknowledgements.

The authors greatly thank Petter Holme for establishing the formal definition of CoI during our collaboration Kim et al. (2015), Yong-Yeol Ahn for the fruitful discussions, and an anonymous referee for the insightful review on the formulation of the CoI measure. This work was supported by Gyeongnam National University of Science and Technology Grant in 2018–2019.

Bibliography

1. Porter et al. (2009) M. A. Porter, J. P. Onnela, and P. J. Mucha, "Communities in networks," Not. Am. Math. Soc. 56, 1082 (2009).
2. Fortunato (2010) S. Fortunato, "Community detection in graphs," Phys. Rep. 486, 75–174 (2010).
3. Newman (2010) M. E. J. Newman, Networks: An Introduction (Oxford University Press, Inc., New York, NY, USA, 2010).
4. Newman (2004) M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Phys. Rev. E 69, 066133 (2004).
5. Newman and Girvan (2004) M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E 69, 026113 (2004).
6. Clauset et al. (2004) A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Phys. Rev. E 70, 066111 (2004).
7. Fortunato and Barthélemy (2007) S. Fortunato and M. Barthélemy, "Resolution limit in community detection," Proc. Natl. Acad. Sci. U.S.A. 104, 36 (2007).
8. Blondel et al. (2008) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," J. Stat. Mech.: Theory Exp. 2008, P10008 (2008).