The way to uncover community structure with core and diversity
YunFeng Chang, SeungKee Han, XiDong Wang

TL;DR
This paper introduces a simple, efficient method using rational random selection to uncover and analyze the emergence, core, and diversity of community structures in complex systems, with high sensitivity and reliability.
Contribution
It presents a novel approach that reveals deterministic and diverse community states without heavy computation or prior information, enhancing understanding of complex system behaviors.
Findings
High sensitivity and reliability in community detection
Reveals hidden deterministic community states
Provides insights into community diversity and core structures
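Table 1. Most similar neighbors of each node in Zachary's karate club, in decreasing order of similarity (read down each three-column group, then left to right).
| node | most similar neighbor | similarity | node | most similar neighbor | similarity | node | most similar neighbor | similarity | node | most similar neighbor | similarity | node | most similar neighbor | similarity | node | most similar neighbor | similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 34 | 11 | 9 | 33 | 4 | 7 | 1 | 3 | 15 | 34 | 2 | 20 | 2 | 2 | 26 | 32 | 2 |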
| 34 | 33 | 11 | 14 | 1 | 4 | 7 | 6 | 3 | 16 | 33 | 2 | 21 | 33 | 2 | 27 | 30 | 2 |
| 1 | 2 | 8 | 14 | 2 | 4 | 11 | 1 | 3 | 16 | 34 | 2 | 21 | 34 | 2 | 27 | 34 | 2 |
| 2 | 1 | 8 | 14 | 3 | 4 | 31 | 9 | 3 | 17 | 6 | 2 | 22 | 1 | 2 | 28 | 24 | 2 |
| 3 | 1 | 6 | 14 | 4 | 4 | 31 | 33 | 3 | 17 | 7 | 2 | 22 | 2 | 2 | 28 | 34 | 2 |
| 4 | 1 | 6 | 24 | 34 | 4 | 31 | 34 | 3 | 18 | 1 | 2 | 23 | 33 | 2 | 29 | 32 | 2 |
| 8 | 1 | 4 | 30 | 34 | 4 | 32 | 34 | 3 | 18 | 2 | 2 | 23 | 24 | 2 | 29 | 34 | 2 |
| 8 | 2 | 4 | 5 | 1 | 3 | 13 | 1 | 2 | 19 | 33 | 2 | 25 | 26 | 2 | 10 | 3 | 1 |
| 8 | 3 | 4 | 6 | 1 | 3 | 13 | 4 | 2 | 19 | 34 | 2 | 25 | 32 | 2 | 10 | 34 | 1 |
| 8 | 4 | 4 | 6 | 7 | 3 | 15 | 33 | 2 | 20 | 1 | 2 | 26 | 25 | 2 | 12 | 1 | 1 |
Table 2. Community detection results for the karate club in the normal diverse state over 10000 runs (CCom: core-community, RCom: real-community).
| type of community | number of communities | number of appearances | detailed structure (community sizes) | number of appearances |
|---|---|---|---|---|
| CCom | 2 | 2513 | Fig.1(a)(17,17) | 1261 |
| | | | Fig.1(b)(16,18) | 1252 |
| | 3 | 4987 | Fig.1(c)(15,17,2) | 1196 |
| | | | Fig.1(d)(14,17,3) | 1255 |
| | | | Fig.1(e)(16,16,2) | 1251 |
| | | | Fig.1(f)(13,18,3) | 1285 |
| | 4 | 2500 | Fig.1(g)(14,15,2,3) | 1221 |
| | | | Fig.1(h)(13,16,2,3) | 1279 |
| RCom | 2 | 5634 | Fig.1(a)(17,17) | 2821 |
| | | | Fig.1(b)(16,18) | 2813 |
| | 3 | 3733 | Fig.1(c)(15,17,2) | 906 |
| | | | Fig.1(d)(14,17,3) | 909 |
| | | | Fig.1(e)(16,16,2) | 945 |
| | | | Fig.1(f)(13,18,3) | 973 |
| | 4 | 633 | Fig.1(g)(14,15,2,3) | 297 |
| | | | Fig.1(h)(13,16,2,3) | 336 |
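Table 3. Community detection results for the ISI social science journal system.
| state | type of community | number of communities |
|---|---|---|
| hidden deterministic | RCom | 1 |
| hidden deterministic | CCom | 11 |
| normal diverse | RCom | 3 |
| normal diverse | CCom | 10, 11, 12, 13 |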
The way to uncover community structure with core and diversity
**YunFeng Chang1,2 (corresponding author), SeungKee Han2, XiDong Wang1**
1 College of Science, China Three Gorges University, Yichang 443002, Hubei, P. R. China
2 Department of Physics, Chungbuk National University, Cheongju 28644, Chungbuk, R. Korea
Abstract
Communities are ubiquitous in nature and society. Individuals that share common properties often self-organize to form communities. Being able to identify community structure helps us understand and explore complex systems efficiently. To avoid the drawbacks of high computational complexity, reliance on pre-given information, and unstable results across different runs, in this paper we propose a simple and efficient method that aims to give a deeper understanding of the emergence and diversity of communities in complex systems. By introducing rational random selection, our method reveals the hidden deterministic and normal diverse community states of community structure. To demonstrate this method, we test it with real-world systems. The results show that our method not only detects community structure with high sensitivity and reliability, but also provides instructional information about our normal diverse community world and the hidden deterministic community world by giving the core-community, the real-community, the tide (boundary) and the diversity. This is of paramount importance in understanding, predicting, and controlling a variety of collective behaviors in complex systems.
Key words: community detection, core-community, real-community, diversity
Introduction
Communities are supposed to play special roles in the structure-function relationship. For example, the communities in the WWW are sets of web pages sharing the same topic [1]; the modular structures in biological networks are widely believed to play important roles in biological functions [10, 2, 3, 4, 5]. The identification of community structure helps when analyzing the functionalities and organizations of complex systems.
With the rise of complex networks, which have attracted considerable attention in physics and other fields as a foundation for the mathematical representation of a variety of complex systems, many systems in different areas such as biology [3], sociology [6], medicine [7], the web [8], and many others [9] are represented as networks. In the field of complex network study, communities are defined as groups of nodes that are densely interconnected but only sparsely connected with the rest of the network [10, 11, 42, 43]. With this network-based definition, researchers have proposed different algorithms for detecting communities, such as topology-based methods [10, 33], modularity optimization [34, 35], dynamic label propagation [36, 37, 38], and statistical inference [39, 40, 41]. These methods suffer from high computational complexity, reliance on pre-given information, and unstable results across different runs. Moreover, some social networks show a different community property: the individuals in one group might be gregarious, having many contacts with others, while the individuals in another group might be more reticent. An example of this behavior is seen in networks of sexual contacts, where separate communities of high- and low-activity individuals have been observed [17, 18]. That is why there is no commonly agreed definition of community. For reviews see references [26, 27].
Community detection is also called cluster analysis, which can be carried out with different kinds of relationships. Specifically, cluster analysis is the assignment of a set of observations into clusters whose components are similar to each other but different from components in other clusters. It is often used to ascertain whether a complex system comprises a set of distinct clusters, each representing components with substantially different properties. The segmentation of complex systems into clusters can also allow us to find specific functions naturally assigned to each cluster, as in the case of the human functional brain system [19] and the metabolic system [3]. A number of clustering methods have been developed as tools for handling large and heterogeneous collections of systems, e.g., hierarchical clustering, k-means clustering, and affinity propagation [20, 21, 22].
Incorporating the two community detection considerations above, the elusive question is: how do communities with common internal properties arise? Inspired by the ultimatum game experiment in reference [23], in this paper we propose a simple and efficient community detection method based on this question. This method is able to cluster complex systems efficiently by merging components of the closest proximity into the same community. During the merging process, characteristic numbers of communities are obtained for the complex systems, which correspond to various resolution scales for viewing the system. Moreover, by introducing rational random selection we get two practical community states, hidden deterministic and normal diverse. These two states provide an in-depth view of the community structure of real-life complex systems with diversity. To demonstrate this method, we test it with real-world systems and find that the method not only detects community structure with high sensitivity and reliability but also provides instructional information about our community world by giving the core-community, the real-community, the tide (boundary) and the diversity. This is of paramount importance in understanding, predicting, and controlling a variety of collective behaviors in complex systems, especially social complex systems.
Method
To find meaningful communities, it is better to follow the real process of community formation. Han et al. tried to explore this process with the ultimatum game experiment [23]. In real-life systems, communities are constructed by individuals choosing their friends. This choice is based on each individual's judgment of its relationship with its surroundings. Most probably, an individual chooses the one most similar to it or the one that best satisfies its expectations. It is therefore reasonable to reconstruct and detect communities from the most similar pairs, that is, to detect communities by formalizing those relationships or components believed to be the most significant.
The procedure of our method is explained in the following steps using Zachary's karate club [25] as an example. The network of Zachary's karate club, a university-based karate club, consists of 34 nodes. At the beginning of Zachary's study there was an incipient conflict between the club instructor (node 1) and the administrator (node 34) over the price of karate lessons. Over the course of the study, the entire club became divided over this issue. The graph representation of the relationships in the club (shortly before the fission) can be seen in [25]. The karate system is represented as a graph in which a set of nodes models the members of the club and a set of edges indicates that two individuals were consistently observed to interact outside the normal activities of the club.
Similarity can be based on different kinds of interaction according to the properties of the complex system. For example, internet users share common interests [28], social communities with distinctive social norms form spontaneously [29], and related proteins group together to execute specific functions within a cell [30]. Here, for the karate club, the connected common neighbor is a good choice because humans tend to follow the majority; this is also the basic idea of the label propagation algorithm. As in a friendship network, each person affects the people they know, and more common acquaintances make two people closer to each other. The neighborhood of node $i$ is the set of adjacent nodes of $i$,
$$N(i) = \{\, j : j \text{ is adjacent to } i \,\}.$$
The common neighbors of two nodes $i$ and $j$ is the set of adjacent nodes of node $i$ that are also adjacent nodes of node $j$,
$$CN(i,j) = N(i) \cap N(j).$$
With the adjacency matrix $A$, where $a_{ij}=1$ if nodes $i$ and $j$ are connected and $a_{ij}=0$ otherwise, the connected common neighbor (similarity) can be calculated in the following way:
$$S_{ij} = a_{ij}\Bigl(1 + \sum_{k} a_{ik}\,a_{kj}\Bigr).$$
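A minimal sketch of the connected common neighbor similarity, reading it as one plus the number of common neighbors for connected pairs and zero for unconnected pairs. This reading is an assumption, chosen because it agrees with the values in Table 1 (for example, similarity 1 for the pair (12,1), where node 12 is attached to the club only through node 1). The function names and the toy edge list are hypothetical and are not taken from the paper.

```python
def neighborhoods(edges):
    """Map each node to the set of its adjacent nodes."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def similarity(nbrs, i, j):
    """Assumed form: 0 for unconnected pairs, else 1 + number of common neighbors."""
    if j not in nbrs[i]:
        return 0
    return 1 + len(nbrs[i] & nbrs[j])


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = neighborhoods(edges)

print(similarity(nbrs, 1, 2))  # 2: connected, one common neighbor (node 3)
print(similarity(nbrs, 3, 4))  # 1: connected, no common neighbor
print(similarity(nbrs, 1, 4))  # 0: not connected
```

Working from neighbor sets rather than a full matrix keeps the cost proportional to the number of edges, which fits the remark that only a list of most similar pairs needs to be processed.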
With this similarity, our method is carried out in the following steps:
Step 1: Identify all the most similar neighbors of each node and record their similarities in decreasing order, as shown in Table 1. For a system with $N$ components, our method only needs to process the elements in this list instead of dealing with a matrix of $N \times N$ elements.
Step 2: Communities are constructed by starting from the connected node pair with the maximum similarity and then including more connected node pairs from the list. In this example, nodes 33 and 34 first form the first core-community. Then nodes 1 and 2 form the second core-community. The second core-community grows by including nodes 3 and 4 through their connections with node 1. Node 8 is included in the second core-community through any one of its connections with nodes 1, 2, 3, 4. Similarly, node 9 is added to the first, node 14 to the second, nodes 24 and 30 to the first, nodes 5, 6, 7, 11 to the second, nodes 31 and 32 to the first, node 13 to the second, nodes 15 and 16 to the first, nodes 17 and 18 to the second, node 19 to the first, node 20 to the second, node 21 to the first, node 22 to the second, and node 23 to the first. Nodes 25 and 26 are added to the first through (25,32) and (26,32). Then nodes 27, 28, 29 are added to the first. Node 10 is then added, and the two core-communities are connected by the tide (10,34). Finally, node 12 is added to the second through its connection to node 1.
The tide (10,34) causes the two core-communities to merge into a single real-community.
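Steps 1 and 2 can be sketched as follows. This is an illustrative reading and not the authors' implementation: nodes are visited in the order of the most-similar list, an unplaced node joins the core-community of its first already placed partner, a node with no placed partner starts a new core-community together with its partners, and a pair whose ends sit in different core-communities is recorded as a tide that merges them at the real-community level. The similarity form (one plus the number of common neighbors), the tie-breaking by node id, all function names, and the toy graph are assumptions.

```python
def neighborhoods(edges):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def most_similar_list(nbrs):
    """Step 1: keep each node's most similar neighbor(s), sorted by decreasing similarity."""
    rows = []
    for i in nbrs:
        # Assumed similarity for connected pairs: 1 + number of common neighbors.
        sims = {j: 1 + len(nbrs[i] & nbrs[j]) for j in nbrs[i]}
        best = max(sims.values())
        rows.append((best, i, sorted(j for j, s in sims.items() if s == best)))
    return sorted(rows, key=lambda r: (-r[0], r[1]))


def detect(rows):
    """Step 2: grow core-communities and record tides (pairs bridging two of them)."""
    label, tides, next_id = {}, [], 0
    for _, i, partners in rows:
        placed = [j for j in partners if j in label]
        if i not in label:
            if placed:
                label[i] = label[placed[0]]   # join an existing core-community
            else:
                for node in [i] + partners:   # start a new core-community
                    label[node] = next_id
                next_id += 1
        tides += [(i, j) for j in partners if j in label and label[j] != label[i]]
    return label, tides


def merge_by_tides(label, tides):
    """Real-communities: core-communities joined by at least one tide are merged."""
    parent = {c: c for c in set(label.values())}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for i, j in tides:
        parent[find(label[i])] = find(label[j])
    return {node: find(c) for node, c in label.items()}


def groups(label):
    out = {}
    for node, c in label.items():
        out.setdefault(c, []).append(node)
    return sorted(sorted(g) for g in out.values())


# Hypothetical toy graph: two triangles bridged by node 7 (a stand-in for a vacillating node).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 7), (4, 7)]
label, tides = detect(most_similar_list(neighborhoods(edges)))

print(groups(label))                         # [[1, 2, 3, 7], [4, 5, 6]]
print(tides)                                 # [(7, 4)]
print(groups(merge_by_tides(label, tides)))  # [[1, 2, 3, 4, 5, 6, 7]]
```

In the toy graph, node 7 plays the role that node 10 plays in the karate club: it is equally similar to one node on each side, so it joins one core-community and its other pair becomes the tide that merges the two into one real-community.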
Results and Analysis
The hidden deterministic state. We call this community detection process, which uses all the most similar pairs, the hidden deterministic state. This is an ideal but usually impossible world for large complex systems because of limited information and the lack of an exact description. However, the hidden deterministic world does provide us with instructional community information about the system. Here, for the karate club, we get the following instructions:
- The club has two hidden major core-communities, with (33,34) and (1,2) acting as the central nodes. Detailed statistics show that the core nodes are 34 and 1, because more nodes are attached to the two core-communities through their connections to 34 and 1, respectively. This is in accordance with the real situation, where node 34 is the administrator and node 1 is the instructor [25].
- In the above community detection process, all the most similar pairs of one individual are processed together. However, if we observe these node pairs in detail, more community information about the system can be discovered. Some of the nodes with two or more most similar node pairs show no ambiguity in their choice of community, such as nodes 8, 14, 31, 13. However, there are also some special nodes that deserve more attention. For example, consider the node pairs for node 6. If we deal with (6,7) first, a new core-community will appear temporarily. As for the node pairs for node 25, a new core-community will be found. All this detailed community information has heuristic and instructional meaning in community structure analysis. We will explain it later in the part on the normal diverse community world.
- One vacillating node, node 10, is found: it has two most similar nodes, 3 and 34, since it straddles the two core-communities. This is because of the oversimplified relationships in the network data. By investigating the system in more detail, Zachary found that node 10 is more similar to node 34 [25]. So there is no doubt that node 10 belongs to the core-community led by the administrator, node 34. It is also natural for node 10 to choose this core-community because it is connected directly to its leader. However, the choice of node 10 is unknown before it is made; we can only guess it probabilistically from its behavior and conclude which core-community it will most probably choose.
- Finally and importantly, further detection can be done for the hidden deterministic world by regarding the detected communities as coarse-grained components. The coarse-grained system comprising these renormalized components can then be further classified by steps 1 and 2 [24].
The hidden deterministic state shows that Zachary's karate club is a system with two hidden core-communities, and that it will evolve into two parts led by the instructor (node 1) and the administrator (node 34).
The normal diverse state. Is the above, then, our real-life community world? In fact, it is more common and universal for individuals to choose one neighbor at a time according to their preference: rational random selection. We call community detection with rational random selection the normal diverse state. The organization of the real-life community world is usually based on such a rational random selection.
Clearly, the hidden deterministic world would be the same as the normal diverse world if every individual had only one distinct most similar node pair. For the karate system, as shown in Table 1, most nodes have two or more most similar node pairs, which results in a diverse community world. Table 2 gives the detailed community detection results in the normal diverse state for the karate system over 10000 runs. The normal diverse state shows that:
- After introducing rational random selection, stable small local communities emerge spontaneously and induce social diversity into the system; this phenomenon was also found experimentally by Han et al. in [23]. One of them is recognized as a small community because its members sit at the outer boundary of one major core-community and have no connection with the other. One can see this from Figure 1. For the same reason, the other is recognized as a small community, and it is absorbed into its adjacent major community in the fission because it lies on that community's outer boundary while having no connection with the other one. The discovery of these small communities shows that rational randomness causes ordered diversity.
These small communities are also discovered by some other community detection methods. However, former works pay no attention to them and instead append different kinds of manmade rules to achieve the fission results [16]. In this paper, we try to reduce human intervention in community detection and reproduce the real diverse community world, which is the factual property of complex systems. With this idea, we get more instructional information.
- Diverse core-community results in Table 2 show that the individuals live a diverse life in the normal state before the fission, in which the instructor (node 1) left and opened another club. Every possible structure in Fig. 1 appears with almost equal probability. Diverse real-community results in Table 2 show that the probability of dividing into two parts is 56.34% (5634 of 10000 runs), the probability of dividing into 3 parts is 37.33%, and the probability of dividing into 4 parts is 6.33%. The structure with 2 communities clearly appears more frequently.
- These results could also provide valuable information for controlling the dynamics of community evolution in complex systems. For example, in the karate system it is clearly easier for the instructor (node 1) to win the fission by persuading nodes (25,26), and correspondingly for the administrator (node 34) by persuading nodes (6,7,17).
The normal diverse state reveals that Zachary's karate club is a diverse real-life system with major and minor communities. Diverse communities emerge from the rational local interaction of individuals according to their preferences, and they have no influence on the fission. That is, the karate system is a diverse complex system with hidden inside order.
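The normal diverse state can be sketched by repeating the detection many times while each node keeps only one of its tied most similar neighbors. The sketch below is an illustration under stated assumptions, not the authors' code: the choice among tied partners is uniform, the similarity is one plus the number of common neighbors, the detection rules are the node-by-node reading used for the hidden deterministic state, and the function names, seed, and toy graph are hypothetical.

```python
import random
from collections import Counter


def neighborhoods(edges):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def most_similar_list(nbrs):
    rows = []
    for i in nbrs:
        # Assumed similarity for connected pairs: 1 + number of common neighbors.
        sims = {j: 1 + len(nbrs[i] & nbrs[j]) for j in nbrs[i]}
        best = max(sims.values())
        rows.append((best, i, sorted(j for j, s in sims.items() if s == best)))
    return sorted(rows, key=lambda r: (-r[0], r[1]))


def detect(rows):
    label, tides, next_id = {}, [], 0
    for _, i, partners in rows:
        placed = [j for j in partners if j in label]
        if i not in label:
            if placed:
                label[i] = label[placed[0]]
            else:
                for node in [i] + partners:
                    label[node] = next_id
                next_id += 1
        tides += [(i, j) for j in partners if j in label and label[j] != label[i]]
    return label, tides


def count_real(label, tides):
    """Number of real-communities after merging core-communities joined by tides."""
    parent = {c: c for c in set(label.values())}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for i, j in tides:
        parent[find(label[i])] = find(label[j])
    return len({find(c) for c in parent})


def diverse_runs(edges, runs=1000, seed=0):
    """Count how often each number of core- and real-communities appears."""
    rng = random.Random(seed)
    rows = most_similar_list(neighborhoods(edges))
    core, real = Counter(), Counter()
    for _ in range(runs):
        # Rational random selection (assumed uniform): one tied partner per node.
        chosen = [(s, i, [rng.choice(p)]) for s, i, p in rows]
        label, tides = detect(chosen)
        core[len(set(label.values()))] += 1
        real[count_real(label, tides)] += 1
    return core, real


# Hypothetical toy graph: a triangle, a 4-clique, a bridging node 7,
# and a pair (9, 10) hanging off node 4.
edges = [(1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 8), (5, 6), (5, 8), (6, 8),
         (3, 7), (4, 7), (9, 10), (4, 9), (4, 10)]
core, real = diverse_runs(edges, runs=1000, seed=42)
print("core-community counts:", dict(core))
print("real-community counts:", dict(real))
```

Dividing each count by the number of runs gives the kind of appearance frequencies reported in Table 2; fixing the seed only makes a single experiment repeatable and does not remove the run-to-run diversity the state is meant to expose.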
We also tested our method with the dolphin system (38 components) and the ISI social science journal system (1575 components).
In the dolphin system, the social network data contain 62 individuals [31], while only 40 of them are included in the association analysis [32]. Of these 40 dolphins, one (BZ) is not included in the 62-individual system, so we could not get its connection information, and another (Natch) is connected only to BZ in the 40-individual system. The remaining 38 dolphins are therefore selected as our research objects. With the above community detection method, we found all three associations and their core dolphins mentioned in [32]. Figure 2 shows the community detection results for the dolphin system. Figure 2(a) shows the 4 real-communities and Figure 2(b) the 6 core-communities in the hidden deterministic state. The three major communities in Figure 2(b) correspond to the three significant associations found by D. Lusseau [32], with nodes 9 (Gallatin), 24 (Scabs) and 32 (Topless) holding the central positions, respectively. The communities led by 24 (red) and 32 (blue) are the two significant male associations, and the community led by 9 (green) is the female association. These results agree remarkably well with the observations in reference [32].
Figure 2(c) shows the distribution of the number of communities in the normal diverse state. The distribution indicates that the dolphins live a diverse life with more communities in the normal state. This diversity occurs in the biggest community in Figure 2(a) (grey, composed of the two male associations), while the female community remains unchanged all the time. This stability indicates that the dolphins live in a diverse social association with hidden inside order.
Finally, we tested our method on a large complex system with 1575 nodes, the ISI social science journal system [24]. In 2011, we used the cosine of co-citation as the similarity, which reflects the citation pattern of journals in detail. With this accurate information, every journal has only one most similar journal. This means that the co-citation similarity yields the same hidden deterministic and normal diverse community world. In this paper, we use the connected common neighbor as the similarity for comparison and simplicity. Table 3 gives the community detection results. In the hidden deterministic state, these journals are all social science journals with 11 different research fields. In the normal diverse state, three major research domains appear, with 11.1689 different research fields on average. Figure 3(a) shows the distribution of the number of communities in the normal diverse state. These results are in accordance with our former work [24]. The three real-communities in the normal diverse state correspond to three knowledge domains. Domain I is the study of sociology, which includes sociology, politics and geography, with American Sociological Review (Impact Factor: 3.989) as the leading journal. Domain II is psychology and related research fields, with Psychological Review (Impact Factor: 10.872) as the leading journal; this domain contains about half of the 1575 journals. Domain III is the study of social phenomena, which includes history, economics, finance and law, with American Historical Review (Impact Factor: 1.618) as the leading journal. These three leading journals are the most outstanding journals in their respective research domains.
In order to show the hidden inside order of the normal diverse state, we compared the 3 real-communities over 10000 runs. Taking the result of the first run (Domain I (266 journals), Domain II (848 journals), Domain III (461 journals)) as the standard for comparison, Figure 3(b) gives the fraction of runs in which each member journal stays consistent, in sorted order. We can see from Figure 3(b) that the SSCI journals form a stationary system with 3 domains, because most of the journals keep their research domain unchanged during the 10000 runs. In detail, of the 1575 journals, 282 have more than one most similar node pair (282 nodes with 402 most similar node pairs). However, of these 282 journals, only 36 nodes are found to lie on the boundary between different domains.
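The comparison against the first run can be sketched as a per-node consistency fraction. This is a minimal illustration that assumes domain labels are already comparable across runs (for instance, each domain named after its leading journal); the function name and the four hypothetical runs below are not data from the paper.

```python
def consistency(runs):
    """Fraction of runs in which each node keeps the domain it has in the first run.

    runs: list of dicts mapping node -> domain label; runs[0] is the reference.
    Returns the fractions sorted in decreasing order.
    """
    reference = runs[0]
    fractions = {
        node: sum(run[node] == ref for run in runs) / len(runs)
        for node, ref in reference.items()
    }
    return sorted(fractions.values(), reverse=True)


# Hypothetical results of four runs for three journals "a", "b", "c".
runs = [
    {"a": "I", "b": "II", "c": "III"},
    {"a": "I", "b": "II", "c": "II"},
    {"a": "I", "b": "I", "c": "II"},
    {"a": "I", "b": "II", "c": "III"},
]
print(consistency(runs))  # [1.0, 0.75, 0.5]
```

A node that never leaves its reference domain has fraction 1.0, while nodes on a boundary between domains show up as lower values at the tail of the sorted list.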
Figure 3(c) shows the cumulative probability distribution of the journal-journal similarity with different community detection methods. The curve labeled unclustered is the cumulative probability distribution for the original SSCI journal co-citation similarity. The curve labeled MSC is the cumulative probability distribution with 3 research domains in our former work [24]. The curve labeled diverse 3 core-communities is the cumulative probability distribution of this paper. Comparison of the distributions shows the efficiency of our method, which places components that are similar to each other in the same community while keeping dissimilar ones apart. It behaves even better than the former coarse-grained results.
Summary
Communities are common and play a significant role in the functioning of complex systems. Inspired by the ultimatum game experiment in reference [23] and our former work [24], in this paper we propose a simple and efficient community detection method. Our method detects the community structure of complex systems efficiently by merging components of the closest proximity. By introducing rational random selection, our method reveals the hidden deterministic and normal diverse community states of community structure. These two states have direct corresponding meaning in real-life applications. Most of the time, individuals live in a diverse world with many small communities until some big event happens. There are many examples of this kind, such as small research groups with the same research interest in different countries, small marathon groups all over the earth, and human society before or after an election. The community structures in the different states correspond to various points of view for viewing the community structure of real-life complex systems with diversity. We test our method with 3 real-world systems and find that the method not only detects community structure with high sensitivity and reliability but also provides instructional information about our community world by giving the core-community, the real-community, the tide (boundary) and the diversity.
What is more, statistics of the core-community and real-community reveal the hidden inside properties of complex systems. These results also give a possible indication that it is the rational randomness based on self-expectation that is significant in the emergence of diversity and stability of communities.
Acknowledgements
This work is supported by National Natural Science Foundation of China (Grant No. 11547003), and China Scholarship Council (Grant No. 201607620007).
Author contributions statement
Y.F.C. conceived the method, performed the experiment, analyzed the results and prepared the manuscript. S.K.H. gave helpful comments on the analysis of the results and the manuscript. X.D.W. gave helpful comments on the manuscript.
References
- [1] G.W. Flake, S. Lawrence, C.L. Giles and F.M. Coetzee, Self-organization and identification of Web communities, IEEE Computer, 35(3), 66-71 (2002).
- [2] M. Girvan and M.E.J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences, 99(12), 7821-7826 (2002).
- [3] E. Ravasz, A.L. Somera, A. Mongru, Z.N. Oltvai and A.L. Barabasi, Hierarchical organization of modularity in metabolic networks, Science, 297(5586), 1551-1555 (2002).
- [4] R. Guimerà and L.A.N. Amaral, Functional cartography of complex metabolic networks, Nature, 433, 895-900 (2005).
- [5] G. Palla, I. Derényi, I. Farkas and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435, 814-818 (2005).
- [6] M. Huss and P. Holme, Currency and commodity metabolites: Their identification and relation to the modularity of metabolic networks, IET Systems Biology, 1(5), 280-285 (2007).
- [7] M.C. Gonzalez, C.A. Hidalgo and A.L. Barabasi, Understanding individual human mobility patterns, Nature, 453, 779-782 (2008).
- [8] A.C. Gavin, et al., Proteome survey reveals modularity of the yeast cell machinery, Nature, 440, 631-636 (2006).
