Modularity in Multilayer Networks using Redundancy-based Resolution and Projection-based Inter-Layer Coupling
Alessia Amelio, Giuseppe Mangioni, Andrea Tagarelli

TL;DR
This paper introduces a new multilayer network modularity formulation that better captures layer relevance, order relations, and community structure, improving community detection accuracy.
Contribution
It revises the semantics of resolution and inter-layer coupling parameters and incorporates layer ordering, addressing limitations of existing multislice modularity.
Findings
Proposed modularity outperforms existing methods on synthetic and real-world networks.
Reveals the impact of different parameter combinations on community detection.
Supports layer ordering in multilayer network analysis.
| 0.667 | 1.000 | - | 0.667 | 1.000 | |
| 0.525 | 0.558 | 0.667 | 0.667 | 0.667 | |
| 0.602 | 0.667 | 0.667 | 0.602 | 0.558 | |
| 2.000 | - | - | - | 2.000 |
| 0.135 0.038 | 0.612 0.214 | 0.444 0.314 | 0.525 0.071 | 0.619 0.274 | 0.606 0.159 | |
| 0.398 0.096 | 1.105 0.284 | 1.115 0.157 | 1.192 0.320 | 1.018 0.245 | 1.193 0.305 | |
| 0.416 0.074 | 1.148 0.258 | 1.188 0.239 | 1.088 0.118 | 1.249 0.306 | 1.048 0.158 | |
| 0.018 0.000 | 0.091 0.000 | 0.000 0.000 | 0.000 0.000 | 0.091 0.000 | 0.110 0.000 |
| 0.064 | 0.098 | 0.098 | 0.129 | 0.131 | 0.142 | 0.141 | 0.126 | 0.063 | 0.097 | 0.097 | 0.123 | |
| 0.164 | 0.214 | 0.214 | 0.318 | 0.320 | 0.317 | 0.329 | 0.310 | 0.162 | 0.213 | 0.213 | 0.307 | |
| 0.161 | 0.213 | 0.213 | 0.313 | 0.309 | 0.325 | 0.312 | 0.266 | 0.160 | 0.212 | 0.212 | 0.299 | |
| 0.022 | 0.028 | 0.028 | 0.044 | 0.044 | 0.045 | 0.045 | 0.048 | 0.022 | 0.027 | 0.027 | 0.047 | |
| 0.411 | 0.554 | 0.554 | 0.803 | 0.804 | 0.828 | 0.827 | 0.750 | 0.408 | 0.550 | 0.550 | 0.777 |
| | #entities | #edges | #layers | node set coverage | degree | avg. path length | clustering coefficient |
|---|---|---|---|---|---|---|---|
| AUCS | 61 | 620 | 5 | 0.73 | 10.43 4.91 | 2.43 0.73 | 0.43 0.097 |
| EU-Air | 417 | 3 588 | 37 | 0.13 | 6.26 2.90 | 2.25 0.34 | 0.07 0.08 |
| FAO-Trade | 214 | 318 346 | 364 | 1.00 | 7.35 6.17 | 2.43 0.39 | 0.31 0.11 |
| FF-TW-YT | 6 407 | 74 836 | 3 | 0.58 | 9.97 7.27 | 4.18 1.27 | 0.13 0.09 |
| Flickr | 789 019 | 17 071 325 | 5 | 0.33 | 23.15 5.61 | 4.50 0.60 | 0.04 0.01 |
| GH-SO-TW | 55 140 | 2 944 592 | 3 | 0.68 | 41.29 45.09 | 3.66 0.62 | 0.02 0.01 |
| Higgs-Twitter | 456 631 | 16 070 185 | 4 | 0.67 | 18.28 31.20 | 9.94 9.30 | 0.003 0.004 |
| London | 369 | 441 | 3 | 0.36 | 2.12 0.16 | 11.89 3.18 | 0.036 0.032 |
| Obama | 2 281 259 | 4 061 960 | 3 | 0.50 | 4.27 1.08 | 13.22 4.49 | 0.001 0.0005 |
| VC-Graders | 29 | 518 | 3 | 1.00 | 17.01 6.85 | 1.66 0.22 | 0.61 0.89 |
| | GL | LART | PMM | M-EMCD∗ |
|---|---|---|---|---|
| AUCS | 5 | 27 | 2 | 13 |
| EU-Air | 10 | 381 | 5 | 39 |
| FAO-Trade | 12 | - | 10 | 11 |
| FF-TW-YT | 749 | - | 10 | 115 |
| GH-SO-TW | 87 | - | 10 | 392 |
| Flickr | 12290 | - | 10 | 7660 |
| Higgs-Twitter | 15218 | - | 10 | 121 |
| London | 21 | 339 | 30 | 46 |
| Obama | 297062 | - | 10 | 328367 |
| VC-Graders | 3 | 6 | 2 | 16 |
| | #comm. by GL | | | | | | |
|---|---|---|---|---|---|---|---|
| ER-ER | 10 | 0.249 | 0.192 | 0.196 | 0.290 | 0.258 | 0.262 |
| LFR-ER | 16 | 0.486 | 0.404 | 0.411 | 0.486 | 0.434 | 0.441 |
| GN-ER-GN-ER | 4 | 0.429 | 0.432 | 0.436 | 0.552 | 0.471 | 0.475 |
| GN-ER-ER-GN | 4 | 0.429 | 0.432 | 0.436 | 0.552 | 0.471 | 0.475 |
| AUCS | 0.41 | 0.37 | 0.39 | 0.35 |
| EU-Air | 0.04 | 0.03 | 0.04 | 0.03 |
| FAO-Trade | 0.11 | 0.03 | 0.11 | 0.03 |
| FF-TW-YT | 0.50 | 0.42 | 0.42 | 0.34 |
| Flickr | 0.32 | 0.31 | 0.28 | 0.27 |
| GH-SO-TW | 0.40 | 0.40 | 0.35 | 0.35 |
| Higgs-Twitter | 0.15 | 0.13 | 0.14 | 0.12 |
| London | 0.35 | 0.26 | 0.34 | 0.26 |
| Obama | 0.43 | 0.32 | 0.43 | 0.32 |
| VC-Graders | 0.54 | 0.53 | 0.44 | 0.43 |
| AUCS | 0.47 | 0.19 | 0.43 | 0.15 |
| EU-Air | 1.00 | 0.02 | 1.00 | 0.02 |
| London | 1.00 | 0.01 | 1.00 | 0.01 |
| VC-Graders | 0.30 | 0.28 | 0.22 | 0.20 |
| AUCS | 0.51 | 0.33 | 0.50 | 0.32 |
| EU-Air | 0.20 | 0.14 | 0.20 | 0.14 |
| FAO-Trade | 0.02 | 0.03 | 0.02 | 0.03 |
| FF-TW-YT | 0.47 | 0.41 | 0.47 | 0.41 |
| Flickr | 0.37 | 0.35 | 0.31 | 0.29 |
| GH-SO-TW | 0.64 | 0.63 | 0.61 | 0.60 |
| Higgs-Twitter | 0.58 | 0.58 | 0.52 | 0.52 |
| London | 0.46 | 0.26 | 0.46 | 0.25 |
| Obama | 0.42 | 0.29 | 0.42 | 0.29 |
| VC-Graders | 0.52 | 0.32 | 0.50 | 0.30 |
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| GL | 0.786 | 0.734 | 0.512 | 0.511 | 0.783 | 0.729 | 0.504 | 0.503 |
| LART | 0.981 | 0.972 | 0.665 | 0.656 | 0.981 | 0.972 | 0.664 | 0.656 |
| M-EMCD∗ | 0.997 | 0.998 | 0.970 | 0.969 | 0.997 | 0.999 | 0.974 | 0.973 |
| | #nodes/#edges | avg. degree | avg. path length | clust. coeff. | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 4/8 | 3.000 | 1.081 | 0.523 | 0.076 | 0.083 | 0.037 | 0.052 | 0.038 | 0.054 | 0.073 | 0.069 | 0.064 | 0.078 | 0.067 |
| | 11/29 | 4.853 | 1.564 | 0.635 | 0.202 | 0.239 | 0.118 | 0.131 | 0.135 | 0.148 | 0.223 | 0.229 | 0.145 | 0.226 | 0.181 |
| | 6/4 | 1.210 | 1.148 | 0.120 | 0.052 | 0.061 | 0.020 | 0.035 | 0.020 | 0.036 | 0.044 | 0.039 | 0.038 | 0.053 | 0.040 |
| | 5/6 | 2.105 | 1.038 | 0.413 | 0.062 | 0.069 | 0.029 | 0.043 | 0.029 | 0.044 | 0.057 | 0.054 | 0.051 | 0.063 | 0.053 |
| | 3/4 | 2.100 | 1.133 | 0.600 | 0.054 | 0.066 | 0.018 | 0.034 | 0.018 | 0.035 | 0.041 | 0.034 | 0.036 | 0.056 | 0.038 |
| | 6/8 | 2.430 | 1.208 | 0.428 | 0.083 | 0.095 | 0.036 | 0.054 | 0.037 | 0.055 | 0.072 | 0.067 | 0.065 | 0.085 | 0.068 |
| | 7/13 | 3.570 | 1.551 | 0.659 | 0.123 | 0.141 | 0.057 | 0.074 | 0.058 | 0.075 | 0.103 | 0.101 | 0.096 | 0.128 | 0.103 |
| Global modularity | | | | | 0.651 | 0.754 | 0.314 | 0.424 | 0.337 | 0.447 | 0.615 | 0.593 | 0.495 | 0.689 | 0.550 |
Modularity in Multilayer Networks using Redundancy-based Resolution and Projection-based Inter-Layer Coupling
Accepted for publication in IEEE Trans. on Network Science and Engineering, April 2019. DOI: 10.1109/TNSE.2019.2913325. An abridged version of this work appeared in [1].
Alessia Amelio
Dept. of Computer Engineering, Modeling, Electronics and Systems Engineering (DIMES), University of Calabria, Rende (CS), Italy
Email: {aamelio, tagarelli}@dimes.unical.it.
Giuseppe Mangioni
Dept. of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy
Email: [email protected].
Andrea Tagarelli (corresponding author)
Dept. of Computer Engineering, Modeling, Electronics and Systems Engineering (DIMES), University of Calabria, Rende (CS), Italy
Email: {aamelio, tagarelli}@dimes.unical.it.
Abstract
The generalized version of modularity for multilayer networks, a.k.a. multislice modularity, is characterized by two model parameters, namely the resolution factor and the inter-layer coupling factor. The former corresponds to a notion of layer-specific relevance, whereas the latter represents the strength of node connections across the network layers. Despite the potential of this approach, both parameters can be set arbitrarily, without considering specific characteristics of the topology of the multilayer network or of an available community structure. Also, the multislice modularity is not designed to explicitly model order relations over the layers, which is of primary importance for dynamic networks.
This paper aims to overcome the main limitations of the multislice modularity by introducing a new formulation of modularity for multilayer networks. We revise the role and semantics of both the resolution and inter-layer coupling factors based on information available from the within-layer and inter-layer structures of the multilayer communities. Also, our proposed multilayer modularity is general enough to consider orderings of network layers and their constraints on layer coupling. Experiments were carried out on synthetic and real-world multilayer networks using state-of-the-art approaches for multilayer community detection. The obtained results have shown the meaningfulness of the proposed modularity, revealing the effects of different combinations of the resolution and inter-layer coupling functions. This work also represents a starting point for the development of new optimization methods for community detection in multilayer networks.
1 Introduction
Complex network systems, such as social networks, biological networks, and transportation networks, are inherently organized into communities, a.k.a. clusters or modules, with dense internal links and sparse external links. Since members of a community tend to generally share common properties, revealing the community structure in a network can provide a better understanding of the overall functioning of the network.
The well-known modularity function [2, 3] was originally conceived to evaluate a community structure in a network graph in terms of the difference between the actual number of edges linking nodes inside each community and the expected connectivity in a null model. Typically, the expected connectivity is expressed through a configuration graph model, having a certain degree distribution and randomized edges. Since this graph ignores any community structure, a large difference between actual connectivity and expected connectivity would indicate the presence of a community structure.
Modularity has been widely utilized as the objective function in several optimization methods designed for discovering communities in networks [4, 5], including greedy agglomeration [3, 6], spectral division [7, 8], simulated annealing [9], and extremal optimization [10].
The traditional approach to network analysis models a real-world system as a single network of interacting entities. While this approach has been widely used to study a variety of applications, there are plenty of scenarios for which this methodology appears strongly limiting [11]. In general, ties among entities could be induced by one or several types of relations or interactions, or even be dependent on side-information based on specific dimensions or aspects of interest for the entities in the network. Within this view, multilayer networks [12, 13] represent a powerful tool to model systems interconnected by multiple types of relations. In the multilayer network model, each type of connection is represented by a layer, and an entity may be present in different layers depending on the types of relations that connect it to its neighbor entities. To mention one real example, online users nowadays usually have multiple accounts across different online social networks, and several online/offline relationships are likely to occur among the same group of individuals (e.g., following relations, like/comment interactions, working together, having lunch). It should be emphasized that neglecting this kind of complex organization by reducing the whole system to a single network (e.g., through some kind of projection, or by aggregation) has been shown to be much less informative than the multilayer representation [14]. For the above-mentioned reasons, multilayer networks are attracting increasing interest from the scientific community, leading to an explosion of scientific papers in many areas of science, and have thus become one of the most used tools for interdisciplinary research [11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23].
Clearly, the problem of community detection in such multilayer networks takes a central role to unveil meaningful patterns of node groupings into communities, by leveraging the various interaction modes that involve all the entity nodes in the network. To address these needs, modularity has been extended to the general case of multilayer networks. In particular, Mucha et al. [15] extend the modularity function to arbitrary multilayer networks (also called multislice in that work), by introducing two additional parameters w.r.t. classic modularity: a resolution parameter and an inter-layer coupling factor. The resolution parameter acts on the expected connectivity terms, thus controlling the effect on the size distribution of community due to the resolution limit known in modularity [24]. The inter-layer coupling factor focuses on the links across layers and hence impacts on the strength of the inter-layer connections of entities in the network. While being important to enhance the ability of modularity in evaluating a community structure, the two parameters introduced in the multislice modularity are nonetheless subjected to arbitrary choices, which raise a number of issues in the application of this modularity function. In particular, the resolution parameter can be arbitrarily set for each layer, but it discards any structure information at graph or community level. Moreover, the inter-layer coupling terms do not differentiate among the selected layers, and all pairs of layers can in principle be considered, which makes no sense in certain scenarios such as modeling of time-evolving networks.
The above considerations prompted us to revisit the notion of modularity in multilayer networks, and in particular to introduce novel aspects to take into account in both the resolution and inter-layer coupling definitions. First, the layer-specific resolution factor is also made dependent on each particular community. We notice that, since a high-quality multilayer community should embed high information content among its nodes, the resolution of a specific layer, which controls the expected connectivity of a given community in the modularity function, should be lower the higher the contribution of that layer to the information content of the community. By relating the information content of a multilayer community to the amount and variety of types of links internal to the community, we provide a new definition of resolution factor based on the concept of community redundancy. Second, to determine the strength of coupling of nodes across layers, we again consider it at community level, such that for each pair of layers, the inter-layer coupling factor for nodes in a community depends on the relevance of the community projection on the two layers. Moreover, we account for an available ordering of the layers, and the related constraints on the validity of their coupling.
Our main contributions are summarized as follows:
- We propose a novel definition of multilayer modularity, in which we reconsider the role and semantics of its two key terms, that is, the resolution factor and inter-layer coupling factor. We conceive parameter-free unsupervised approaches for their computation, which leverage information from the within-layer and across-layer structures of the communities in the multilayer network. Moreover, our formulation of multilayer modularity is general enough to account for an available ordering of the layers, and is therefore also well suited to deal with temporal multilayer networks.
- We provide theoretical insights into properties of the proposed multilayer modularity. More specifically, we investigate the effect of increasing the number of communities on the behavior of the multilayer modularity, and we analytically derive the lower and upper bounds on the values of the multilayer modularity.
- We conduct an extensive experimental evaluation, primarily to understand how the proposed multilayer modularity behaves w.r.t. different settings regarding the resolution and the inter-layer coupling terms. Using 4 state-of-the-art methods for multilayer community detection (GL, PMM, LART, and M-EMCD∗), LFR synthetic multilayer networks and 10 real-world multilayer networks, results have shown the significance of our formulation and its different expressiveness compared with the previously existing multislice modularity.
2 Related Work
Community detection is a key enabling task in network analysis and mining, with a large number of methods developed in the last ten years; see [25, 26, 16, 27] for surveys on this topic. In addition, different metrics for community structure evaluation have been introduced. As discussed in Section 1, the most popular and widely accepted measure is the so-called “modularity”, defined by Newman [2]. Initially conceived for undirected networks, the modularity function has been subsequently extended to cover different cases. In [28, 29], modularity has been generalized to directed networks by incorporating the information contained in edge directions, while in [30] modularity is also adapted to capture communities in weighted networks. To overcome the well-known modularity resolution limit [24, 31, 32], in [33] and [34] modularity has been modified by incorporating a resolution parameter that helps reveal communities at different resolution scales. A further step towards a generalization of modularity is its extension to signed networks [35, 36]. Also, to deal with bipartite networks, modifications have been proposed in [37, 38, 39]. To uncover overlapping communities, in [40] the authors propose an extension to the modularity function that includes the notion of belonging (or membership) coefficient, which measures to which extent a node belongs to the various communities. This approach is sometimes referred to as fuzzy community discovery. Finally, as introduced in Section 1, modularity has been generalized by [15] to capture communities in multislice networks. Such a version of the modularity function is detailed in the next section.
3 Background
3.1 Modularity
Given an undirected graph $G=(\mathcal{V},\mathcal{E})$, with $n=|\mathcal{V}|$ nodes and $m=|\mathcal{E}|$ edges, let $\mathcal{C}$ be a community structure over $G$. For any $v\in\mathcal{V}$, we use $d(v)$ to denote the degree of $v$, and for any community $C\in\mathcal{C}$, symbol $d(C)$ to denote the degree of $C$, i.e., $d(C)=\sum_{v\in C}d(v)$; also, the total degree of nodes over the entire graph, $d(\mathcal{V})$, is defined as $d(\mathcal{V})=\sum_{v\in\mathcal{V}}d(v)=\sum_{C\in\mathcal{C}}d(C)=2m$. Moreover, we denote with $d_{int}(C)$ the internal degree of $C$, i.e., the portion of $d(C)$ that corresponds to the number of edges linking nodes in $C$ to other nodes in $C$. Newman and Girvan’s modularity is defined as follows [2]:
$$Q(\mathcal{C})=\sum_{C\in\mathcal{C}}\left[\frac{d_{int}(C)}{d(\mathcal{V})}-\left(\frac{d(C)}{d(\mathcal{V})}\right)^{2}\right] \tag{1}$$
In the above equation, the first term is maximized when many edges are contained in clusters, whereas the second term is minimized by partitioning the graph into many clusters with small total degrees. The value of $Q$ ranges within -0.5 and 1.0 [4]; it is minimum for any bipartite network with canonic clustering, and maximum when the network is composed of disjoint cliques.
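As a minimal sketch of the definition above, the following Python function evaluates modularity from the degree and internal degree of each community. The function name and the input format (an edge list plus a list of node sets) are illustrative assumptions, not part of the original formulation.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    node_to_comm = {v: idx for idx, comm in enumerate(communities) for v in comm}
    total_degree = 2 * len(edges)  # d(V) = 2m
    degree = defaultdict(int)      # d(C): sum of the degrees of the nodes in C
    internal = defaultdict(int)    # internal degree of C
    for u, v in edges:
        cu, cv = node_to_comm[u], node_to_comm[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 2      # an internal edge counts for both endpoints
    return sum(
        internal[c] / total_degree - (degree[c] / total_degree) ** 2
        for c in range(len(communities))
    )


two_triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
q_split = modularity(two_triangles, [{1, 2, 3}, {4, 5, 6}])
q_single_edge = modularity([(1, 2)], [{1}, {2}])
print(q_split, q_single_edge)
```

Two disjoint triangles placed in separate communities score 0.5, while a single edge whose endpoints are split (the smallest bipartite case) reaches the lower bound of -0.5.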
3.2 Multilayer network model
Let $G_{\mathcal{L}}=(V_{\mathcal{L}},E_{\mathcal{L}},\mathcal{V},\mathcal{L})$ be a multilayer network graph, where $\mathcal{V}$ is a set of entities and $\mathcal{L}=\{L_{1},\ldots,L_{\ell}\}$ is a set of layers. Each layer represents a specific type of relation between entity nodes. Let $V_{\mathcal{L}}\subseteq\mathcal{V}\times\mathcal{L}$ be the set containing the entity-layer combinations, i.e., the occurrences of each entity in the layers. $E_{\mathcal{L}}\subseteq V_{\mathcal{L}}\times V_{\mathcal{L}}$ is the set of undirected links connecting the entity-layer elements. For every $L_{i}\in\mathcal{L}$, we define $V_{i}=\{v\in\mathcal{V}\ |\ (v,L_{i})\in V_{\mathcal{L}}\}\subseteq\mathcal{V}$ as the set of nodes in the graph of $L_{i}$, and $E_{i}\subseteq V_{i}\times V_{i}$ as the set of edges in $L_{i}$. Each entity must be present in at least one layer, i.e., $\bigcup_{i=1..\ell}V_{i}=\mathcal{V}$, but each layer is not required to contain all elements of $\mathcal{V}$. We assume that the inter-layer links only connect the same entity in different layers; however, each entity in one layer could be linked to the same entity in a few or all other layers.
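The model can be illustrated with a small Python sketch. The dictionary layout, the toy data, and the helper names are hypothetical; the sketch also assumes that the nodes of a layer are exactly the endpoints of its edges (so isolated occurrences are not represented) and that an entity is coupled with itself in every other layer where it occurs, which is only one of the options the model allows.

```python
# Hypothetical toy network: layer name -> set of undirected within-layer edges.
layers = {
    "L1": {("a", "b"), ("b", "c")},
    "L2": {("a", "b"), ("c", "d")},
    "L3": {("d", "e")},
}


def layer_nodes(layers):
    """Per-layer node sets: the entities that occur in each layer."""
    return {name: {v for edge in edges for v in edge} for name, edges in layers.items()}


def inter_layer_links(nodes_by_layer):
    """Links between the occurrences of the same entity in two different layers."""
    names = sorted(nodes_by_layer)
    return {
        ((v, li), (v, lj))
        for pos, li in enumerate(names)
        for lj in names[pos + 1:]
        for v in nodes_by_layer[li] & nodes_by_layer[lj]
    }


nodes_by_layer = layer_nodes(layers)
entities = set().union(*nodes_by_layer.values())     # every entity occurs somewhere
occurrences = sum(len(nodes) for nodes in nodes_by_layer.values())
couplings = inter_layer_links(nodes_by_layer)
print(sorted(entities), occurrences, len(couplings))
```

Here no layer contains all five entities, yet their union does, which is the only coverage requirement stated above.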
3.3 Multislice Modularity
Given a community structure $\mathcal{C}$ identified over a multilayer network $G_{\mathcal{L}}$, the multislice modularity [15] of $\mathcal{C}$ is defined as:
$$Q_{ms}(\mathcal{C})=\frac{1}{d(V_{\mathcal{L}})}\sum_{u,v,L_{i},L_{j}}\left[\left(A_{uvL_{i}}-\gamma_{i}\frac{d_{L_{i}}(u)\,d_{L_{i}}(v)}{2|E_{i}|}\right)\delta_{L_{i},L_{j}}+\delta_{u,v}\,C_{v,L_{i},L_{j}}\right]\delta(g_{u,L_{i}},g_{v,L_{j}}) \tag{2}$$
where $d(V_{\mathcal{L}})$ is the total degree of the multilayer network graph, $d_{L_{i}}(u)$ denotes the degree of node $u$ in layer $L_{i}$, $A_{uvL_{i}}$ denotes a link between $u$ and $v$ in $L_{i}$, $2|E_{i}|$ is the total degree of the graph of layer $L_{i}$, $\gamma_{i}$ is the resolution parameter for layer $L_{i}$, and $C_{v,L_{i},L_{j}}$ quantifies the links of node $v$ across layers $L_{i}$, $L_{j}$. Moreover, the Kronecker delta terms have the following meanings: $\delta_{L_{i},L_{j}}$ is equal to 1 if $L_{i}=L_{j}$ and 0 otherwise, $\delta_{u,v}$ is equal to 1 if $u=v$ and 0 otherwise (i.e., the inter-layer coupling is allowed only for nodes corresponding to the same entity), and $\delta(g_{u,L_{i}},g_{v,L_{j}})$ is equal to 1 if the community assignments of node $u$ in $L_{i}$ and node $v$ in $L_{j}$ are the same and 0 otherwise.
**Limitations of $Q_{ms}$.** As previously mentioned, a different resolution parameter can be associated with each layer to express its relevance weight; however, in [15], there is no specification of any principled way to set a layer-weighting scheme, possibly including information from the available multilayer community structure. Moreover, neither the inter-layer coupling term (i.e., $C_{v,L_{i},L_{j}}$) nor any constraint on the layer comparability is clearly defined; actually, all nonzero inter-layer edges are set to a constant value $\omega$, for all unordered pairs of layers. In general, both the $\gamma$ and $\omega$ parameters can assume any non-negative value, which further obscures the properties of $Q_{ms}$.
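To make the role of the two free parameters concrete, here is a hedged Python sketch of the multislice modularity of [15]. It assumes a constant coupling `omega` between every ordered pair of occurrences of the same entity and a normalization by the total within-layer degree plus the total coupling weight; the function name and the input layout are illustrative choices.

```python
def multislice_modularity(layers, assignment, gamma, omega):
    """layers: {layer: [(u, v), ...]}; assignment: {(node, layer): community};
    gamma: {layer: resolution}; omega: constant inter-layer coupling."""
    intra = 0.0
    two_mu = 0.0  # total degree of the multilayer graph, couplings included
    for layer, edges in layers.items():
        two_m = 2 * len(edges)
        two_mu += two_m
        degree, internal = {}, {}
        for u, v in edges:
            cu, cv = assignment[(u, layer)], assignment[(v, layer)]
            degree[cu] = degree.get(cu, 0) + 1
            degree[cv] = degree.get(cv, 0) + 1
            if cu == cv:
                internal[cu] = internal.get(cu, 0) + 2
        for comm, d in degree.items():
            # actual minus expected within-layer connectivity of the community
            intra += internal.get(comm, 0) - gamma[layer] * d * d / two_m
    coupling = 0.0
    for (v, li), comm in assignment.items():
        for lj in layers:
            if lj != li and (v, lj) in assignment:
                two_mu += omega
                if assignment[(v, lj)] == comm:
                    coupling += omega  # same entity, same community in both layers
    return (intra + coupling) / two_mu


triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
layers = {"L1": triangles, "L2": triangles}
assignment = {(v, name): (0 if v <= 3 else 1) for name in layers for v in range(1, 7)}
gamma = {"L1": 1.0, "L2": 1.0}
q_uncoupled = multislice_modularity(layers, assignment, gamma, omega=0.0)
q_coupled = multislice_modularity(layers, assignment, gamma, omega=1.0)
print(q_uncoupled, q_coupled)
```

On this toy network, changing `omega` from 0 to 1 moves the score from 0.5 to about 0.667 without any change in the network or in the communities, which illustrates why arbitrary parameter choices make the values hard to interpret.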
4 Proposed Multilayer Modularity
In this section, we propose a new definition of modularity for multilayer networks that aims to overcome all of the issues of $Q_{ms}$ previously discussed. We pursue this goal by focusing on the role and semantics of the two key elements in multilayer modularity: the layer-specific resolution and the inter-layer coupling.
Our definitions of the two terms are independent of a-priori assumptions on the network and/or user-specified settings; instead, we conceive parameter-free unsupervised approaches for their computation, by leveraging information from the within-layer and inter-layer structures of the communities. Our proposed resolution factor is computed for pairs of layer and community, rather than for each layer globally. Analogously, to define the inter-layer coupling term, we account for properties related to a community on two layers; in more detail, we evaluate the projections of a community over any two comparable layers, i.e., the sets of nodes belonging to a community that lie on those layers. Remarkably, the comparability of layers is another key aspect of our definition of modularity: we generalize the inter-layer coupling term by admitting the existence of a partial order relation over the layers, in order to properly represent scenarios in which a particular ordering among layers is required. For instance, it may be the case that the network layers have to be processed according to their natural order (e.g., lexicographic order of the network labels), or according to a temporal order; moreover, it may be required to compare adjacent layers only, or each layer with any other succeeding it in the ordering. Figure 1 provides an illustration of a multilayer network and the aforementioned key aspects we deal with in our proposed multilayer modularity, which is formally presented next.
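**Definition 4.1 (Multilayer Modularity).**
Let $G_{\mathcal{L}}=(V_{\mathcal{L}},E_{\mathcal{L}},\mathcal{V},\mathcal{L})$ be a multilayer network graph, and let $\prec_{\mathcal{L}}$ be an optionally provided partial order relation over the set of layers $\mathcal{L}$. Given a community structure $\mathcal{C}$ as a partitioning of the multilayer graph $G_{\mathcal{L}}$, the multilayer modularity is defined as:
$$Q(\mathcal{C})=\frac{1}{d(V_{\mathcal{L}})}\sum_{C\in\mathcal{C}}\sum_{L\in\mathcal{L}}\left[d^{int}_{L}(C)-\gamma(L,C)\frac{(d_{L}(C))^{2}}{d(V_{\mathcal{L}})}+\beta\sum_{L'\in\mathcal{P}(L)}IC(C,L,L')\right] \tag{3}$$
where for any $C\in\mathcal{C}$ and $L\in\mathcal{L}$:
- $d_{L}(C)$ and $d^{int}_{L}(C)$ are the degree of $C$ and the internal degree of $C$, respectively, by considering only edges of layer $L$;
- $\gamma(L,C)$ is the value of the resolution function;
- $IC(C,L,L')$ is the value of the inter-layer coupling function for any valid layer pairings with $L$;
- $\beta\in\{0,1\}$ is a parameter to control the exclusion/inclusion of inter-layer couplings; and
- $\mathcal{P}(L)$ is the set of valid pairings with $L$ defined as:
$$\mathcal{P}(L)=\begin{cases}\{L'\in\mathcal{L}\ |\ L\prec_{\mathcal{L}}L'\}, & \text{if } \prec_{\mathcal{L}} \text{ is defined}\\ \mathcal{L}\setminus\{L\}, & \text{otherwise}\end{cases}$$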
Notably, unlike the multislice modularity in Eq. (2), our proposed modularity originally introduces a resolution factor that varies with each community, and an inter-layer coupling scheme that might also depend on the layer ordering. Moreover, Eq. (3) utilizes the total degree of the multilayer graph instead of the layer-specific degree (i.e., term $2|E_{i}|$, for each $L_{i}\in\mathcal{L}$). The latter difference w.r.t. the multislice modularity is also important because, as we shall later discuss in more detail, the total degree of the multilayer graph includes the inter-layer couplings and it might be defined in different ways depending on the scheme of inter-layer coupling. In the following, we elaborate on the resolution functional term, $\gamma(L,C)$, and the inter-layer coupling functional term, $IC(C,L,L')$.
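The structure of the proposed modularity can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: communities are taken to be sets of entities, the bracketed term is assumed to be the internal degree minus the resolution-weighted squared degree over the total degree plus the weighted sum of couplings, and the resolution function `gamma`, the coupling function `ic`, the order predicate `precedes`, and the total degree are all supplied by the caller, since the text notes that the total degree depends on the coupling scheme.

```python
def valid_pairings(layer, layer_names, precedes=None):
    """Layers that may be coupled with `layer`: all other layers when no order
    is given, otherwise only those that `layer` precedes."""
    if precedes is None:
        return [other for other in layer_names if other != layer]
    return [other for other in layer_names if precedes(layer, other)]


def multilayer_modularity(layers, communities, gamma, ic, total_degree,
                          beta=1, precedes=None):
    """layers: {layer: [(u, v), ...]}; communities: list of entity sets;
    gamma(layer, community) and ic(community, layer, other) are callables."""
    names = list(layers)
    score = 0.0
    for comm in communities:
        for layer in names:
            degree = internal = 0
            for u, v in layers[layer]:
                inside = (u in comm) + (v in comm)
                degree += inside            # degree of the community in this layer
                if inside == 2:
                    internal += 2           # internal degree in this layer
            score += internal - gamma(layer, comm) * degree ** 2 / total_degree
            if beta:
                pairs = valid_pairings(layer, names, precedes)
                score += beta * sum(ic(comm, layer, other) for other in pairs)
    return score / total_degree


order = ["L1", "L2", "L3"]
adjacent = lambda a, b: order.index(b) == order.index(a) + 1
triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
q_single_layer = multilayer_modularity(
    {"L1": triangles}, [{1, 2, 3}, {4, 5, 6}],
    gamma=lambda layer, comm: 1.0, ic=lambda comm, a, b: 0.0,
    total_degree=12, beta=0,
)
print(valid_pairings("L1", order, adjacent), q_single_layer)
```

With a single layer, a constant resolution of 1 and couplings switched off, the sketch falls back to classic modularity (0.5 for two disjoint triangles), which is a useful sanity check on the assumed form.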
4.1 Redundancy-based resolution factor
The layer-specific resolution factor intuitively expresses the relevance of a particular layer to the calculation of the expected community connectivity in that layer. While this can always reflect some predetermined scheme of relevance weighting of layers, we propose a more general definition that accounts for the strength of the contribution that a layer makes in determining the internal connectivity of each community. The key assumption underlying our approach is that, since a high-quality community should embed high information content among its elements, the resolution of a layer, which controls the expected connectivity of a given community, should be lowered as the layer’s contribution to the information content of the community increases.
In this regard, the redundancy measure proposed in [41] is particularly suited to quantify the variety of connections, such that it is higher if edges of more types (layers) connect each pair of nodes in the community. Let us denote with $P$ the set of node pairs connected in at least one layer in the graph, and with $\overline{P}\subseteq P$ the set of “redundant” pairs, i.e., the pairs of nodes connected in at least two layers. Given a community $C$, $P_{C}\subseteq P$ and $\overline{P}_{C}\subseteq\overline{P}$ denote the subset of $P$ and the subset of $\overline{P}$, respectively, corresponding to nodes in $C$. The redundancy of $C$, $\rho(C)$, expresses the number of pairs in $C$ with redundant connections, divided by the number of layers connecting the pairs. Formally [41]:
$$\rho(C)=\sum_{(v,u)\in\overline{P}_{C}}\frac{|SL(v,u)|}{|\mathcal{L}|\times|P_{C}|} \tag{4}$$
with $SL(v,u)=\{L_{i}\in\mathcal{L}\ |\ (v,u)\in E_{i}\}$.
Note that in the above formula, each of the sets $SL(v,u)$ refers to the layers on which two nodes in a redundant pair are linked. Upon this concept, we define the set of supporting layers for each community $C$ as:
[TABLE]
Using the supporting layers defined above, we provide the following definition of redundancy-based resolution factor.
**Definition 4.2 (Redundancy-based resolution factor).**
Given a layer $L$ and a community $C$, the redundancy-based resolution factor $\gamma(L,C)$ in Eq. (3) is defined as:
$$\gamma(L,C)=\frac{2}{1+\log_{2}(1+nrp(L,C))} \tag{6}$$
where $nrp(L,C)$ expresses the number of times layer $L$ participates in redundant pairs.
Note that $\gamma(L,C)$ ranges in $(0,2]$. In particular, it ranges in $(0,1]$ as long as $L$ participates in at least one redundant pair, and it decreases as $nrp(L,C)$ increases; moreover, as a special case, $\gamma(L,C)$ is equal to 2 when $nrp(L,C)=0$.
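The behavior just described can be checked with a short Python sketch. It assumes the resolution factor takes the form 2 / (1 + log2(1 + n)), where n is the number of redundant pairs of the community in which the layer participates; this form reproduces the stated values (2 for no redundant pair, 1 for exactly one, below 1 otherwise). The function name and the toy layers are hypothetical.

```python
import math
from collections import Counter


def resolution_factors(layers, community):
    """Redundancy-based resolution of every layer for one community.

    layers: {layer: [(u, v), ...]}; community: set of entities.
    """
    # Layers that link each pair of community members.
    supporting = {}
    for name, edges in layers.items():
        for u, v in edges:
            if u != v and u in community and v in community:
                supporting.setdefault(frozenset((u, v)), set()).add(name)
    # Count, per layer, the redundant pairs (linked in >= 2 layers) it supports.
    participation = Counter()
    for pair_layers in supporting.values():
        if len(pair_layers) >= 2:
            participation.update(pair_layers)
    return {
        name: 2 / (1 + math.log2(1 + participation[name]))
        for name in layers
    }


layers = {
    "L1": [(1, 2), (2, 3), (1, 3)],
    "L2": [(1, 2), (2, 3), (1, 3)],
    "L3": [(1, 2)],
    "L4": [(3, 4)],
}
gammas = resolution_factors(layers, {1, 2, 3})
print({name: round(value, 3) for name, value in gammas.items()})
```

In this toy case the two triangle layers support three redundant pairs each and get a resolution of about 0.667, the layer with a single shared edge gets 1, and the layer with no edge inside the community keeps the maximum value of 2.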
**Example 4.1.**
Consider the network with 16 entities and 5 layers shown in Fig. 2, and let us first focus on some specific cases for the computation of the terms. For instance, given community and layers and , the corresponding values of redundancy-based resolution are equal to 2, because never participate in redundant pairs for nodes of . By contrast, layers and participate in one redundant pair for community , which corresponds to the edge linking entities 2 and 3 for and the edge linking entities 1 and 2 for ; therefore, the values of redundancy-based resolution associated with on and are equal to 1. Also, the resolution for on takes a value lower than 1 since there is more than one redundant pair. Table 1 reports on the entire set of values for the resolution factor computed on the network of Fig. 2.
4.2 Projection-based inter-layer couplings
We propose a general and versatile approach to quantify the strength of coupling of nodes in one layer with nodes on another layer. Our key idea is to determine the fraction of nodes belonging to a community on a layer that appear in the projection of the community on another layer, and express the relevance of this projection w.r.t. that pair of layers.
Given a community $C$ and layers $L_{i},L_{j}\in\mathcal{L}$, we will use symbols $proj(C,L_{i})$ and $proj(C,L_{j})$ to denote the projection of $C$ onto the two layers, i.e., the set of nodes in $C$ that lie on $L_{i}$ and $L_{j}$, respectively. In the following, we define two approaches for measuring inter-layer couplings based on community projection.
For any two layers $L_{i},L_{j}$ and community $C$, the first approach, which we call symmetric, determines the relevance of inter-layer coupling of nodes belonging to $C$ as proportional to the fraction of nodes shared between $L_{i}$ and $L_{j}$ that belong to $C$.
**Definition 4.3.**
Given a community $C$ and layers $L_{i},L_{j}$, the symmetric projection-based inter-layer coupling, denoted as $IC_{s}(C,L_{i},L_{j})$ and referring to term $IC(C,L,L')$ in Eq. (3), is defined as the probability that $C$ lies on $L_{i}$ and $L_{j}$:
$$IC_{s}(C,L_{i},L_{j})=\frac{|proj(C,L_{i})\cap proj(C,L_{j})|}{|V_{i}\cap V_{j}|} \tag{7}$$
The above definition assumes that the two events “$C$ in $L_{i}$” and “$C$ in $L_{j}$” are independent of each other, and it does not consider that the coupling might have a different meaning depending on the relevance a community has on a particular layer in which it is located. By relevance of community, we simply mean here the fraction of nodes in a layer graph that belong to the community; therefore, the larger the community in a layer, the more relevant it is w.r.t. that layer. However, we observe that a more relevant community in a layer corresponds to a less surprising projection from that layer to another. This would imply that the inter-layer coupling for that community is less interesting w.r.t. projections of smaller communities, and hence the strength of the coupling might be lowered. We capture the above intuition by the following definition of asymmetric projection-based inter-layer coupling.
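**Definition 4.4.**
Given a community $C$ and layers $L_{i},L_{j}$, the asymmetric projection-based inter-layer coupling, denoted as $IC_{a}(C,L_{i},L_{j})$ and referring to term $IC(C,L,L')$ in Eq. (3), is defined as the probability that $C$ lies on $L_{j}$ given that $C$ lies on $L_{i}$:
$$IC_{a}(C,L_{i},L_{j})=\frac{|proj(C,L_{i})\cap proj(C,L_{j})|}{|V_{i}\cap V_{j}|}\times\frac{|V_{i}|}{|proj(C,L_{i})|} \tag{8}$$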
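The two coupling schemes can be sketched in Python under one possible reading of Definitions 4.3 and 4.4. The normalizations used below are assumptions made for illustration: the symmetric value is the fraction of the entities shared by the two layers that belong to the community, and the asymmetric value divides it by the relevance of the community on the source layer (the fraction of that layer's nodes that belong to the community). Function names and the toy sets are hypothetical.

```python
def ic_symmetric(community, nodes_i, nodes_j):
    """Fraction of the entities shared by the two layers that belong to the community."""
    shared = nodes_i & nodes_j
    if not shared:
        return 0.0
    return len(community & shared) / len(shared)


def ic_asymmetric(community, nodes_src, nodes_dst):
    """Symmetric value rescaled by the community's relevance on the source layer."""
    projection_src = community & nodes_src
    if not projection_src:
        return 0.0
    relevance_src = len(projection_src) / len(nodes_src)
    return ic_symmetric(community, nodes_src, nodes_dst) / relevance_src


nodes_i = {1, 2, 3, 4, 5, 6}
nodes_j = {1, 2, 3, 4}
community = {1, 2, 7}
symmetric = ic_symmetric(community, nodes_i, nodes_j)
from_i_to_j = ic_asymmetric(community, nodes_i, nodes_j)
from_j_to_i = ic_asymmetric(community, nodes_j, nodes_i)
print(symmetric, from_i_to_j, from_j_to_i)
```

The community covers a smaller share of the first layer than of the second, so under this reading the coupling evaluated from the first layer is the stronger of the two, in line with the intuition that a less relevant projection is more surprising.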
**Dealing with layer ordering.** Our formulation of multilayer modularity is general enough to account for an available ordering of the layers, according to a given partial order relation.
The previously defined asymmetric inter-layer coupling is well suited to model situations in which we might want to express the inter-layer coupling from a “source” layer to a “destination” layer. Given any two layers $L_{i},L_{j}$, it may be the case that only comparison of $L_{i}$ to $L_{j}$, or vice versa, is allowed. This is clearly motivated when there exist layer-coupling constraints, so that only some of the layer couplings should be considered in the computation of $Q$.
In practical cases, we might assume that the layers can be naturally ordered to reflect a predetermined lexicographic order, which might be set, for instance, according to a progressive enumeration of layers or to a chronological order of the time-steps corresponding to the layers. That said, we can consider two special cases of layer ordering:
- •
Adjacent layer coupling: iff according to a predetermined natural order.
- •
Succeeding-layer coupling: iff according to a predetermined natural order.
Note that the adjacent layer coupling scheme requires ℓ − 1 pairs to consider, where ℓ is the number of layers, while the succeeding-layer coupling scheme involves the comparison between a layer and its subsequent ones, i.e., ℓ(ℓ − 1)/2 pairs.
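The two schemes can be enumerated directly. The sketch below assumes that layers are indexed from 1 following the natural order, that adjacent coupling pairs each layer with its immediate successor, and that succeeding-layer coupling pairs each layer with every later one. Function names and the number of layers are illustrative.

```python
from itertools import combinations


def adjacent_pairs(num_layers):
    """Pairs (i, i+1) of consecutive layers, indexed from 1."""
    return [(i, i + 1) for i in range(1, num_layers)]


def succeeding_pairs(num_layers):
    """Pairs (i, j) with i < j: each layer with all its subsequent ones."""
    return list(combinations(range(1, num_layers + 1), 2))


ell = 5  # hypothetical number of layers
adj = adjacent_pairs(ell)
succ = succeeding_pairs(ell)
print(len(adj), len(succ))  # prints "4 10"
```

With five layers the adjacent scheme yields 4 pairs and the succeeding-layer scheme yields 10, so the normalization over valid layer pairings differs between the two schemes.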
Moreover, it should be noted that the availability of a layer ordering enables two variants of the asymmetric projection-based inter-layer coupling given in Eq. (8). For any two layers , such that holds, we refer to as inner the direct evaluation of , and as outer the case in which and are switched, i.e., .
In the inner case, the strength of coupling is higher as the projection of on the source layer (i.e., the preceding one in the order) is less relevant; vice versa, the outer case weights the coupling more as the projection on the destination layer (i.e., the subsequent one in the order) is less relevant.
**Example 4.2.**
Consider again the example network of Fig. 2. The asymmetric coupling for the projection of community from to is in the inner case, and in the outer case.
We hereinafter use symbols and to distinguish between the inner asymmetric and the outer asymmetric evaluation cases.
*Time-evolving multilayer networks.* So far we have assumed that when comparing any two layers , with , the number of layers between and does not matter. Intuitively, we might want to penalize the strength of the coupling the more “distant” is from . This is often the case in time-sliced networks, whereby we want to understand how community structures evolve over time.
In light of the above remarks, we define a refinement of the asymmetric projection-based inter-layer coupling, by introducing a multiplicative factor that smoothly decreases the value of the function as the temporal distance between and increases.
**Definition 4.5.**
Given a community and layers , such that , the time-aware asymmetric projection-based inter-layer coupling, denoted as , is defined as
[TABLE]
Note that the second term in the above equation is 1 for the adjacent layer coupling scheme, so no penalization is applied when only consecutive layers are considered.
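To illustrate this behaviour, the following sketch uses a hypothetical damping factor, 2 / (1 + log2(1 + d)), with d the distance between the two layers in the order. This specific form is an assumption, chosen only because it equals 1 for d = 1 and decreases smoothly for larger d; it should not be read as the definition itself. All names are illustrative.

```python
import math


def damping(i, j):
    """Hypothetical smoothing factor for layers at positions i < j.

    Equals 1 when j == i + 1 and decreases as the distance j - i grows.
    """
    if j <= i:
        raise ValueError("expected i < j")
    return 2.0 / (1.0 + math.log2(1 + (j - i)))


def time_aware_coupling(asym_coupling, i, j):
    """Scale a given asymmetric coupling value by the damping factor."""
    return asym_coupling * damping(i, j)


# Damping factors from layer 1 to layers 2, 3, 4 and 5.
factors = [damping(1, j) for j in range(2, 6)]
```

Any other monotonically decreasing factor that equals 1 at distance 1 would reproduce the same qualitative effect: adjacent pairs are left untouched, while couplings between more distant layers are progressively reduced.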
**Example 4.3.**
Referring again to the example in Fig. 2, in Table 2 we summarize the mean and standard deviation values for the different variants of the inter-layer coupling factor. One remark is that the values for communities and are higher than those corresponding to communities and . This is mainly due to the representativity of and in all layers. The lowest values are obtained for community , which is in fact less represented than other communities (only in layers and ). For instance, let us focus on this community. The mean inter-layer coupling factor for community is 0.11, since: , (which is exactly the size of without node 16), and ; this determines an inter-layer coupling factor of 1.1, which is divided by the admissible pairings of layers, i.e., 10. On the contrary, the mean for is equal to zero, because the projection of this community is always empty when the adjacent coupling scheme is used.
Finally, Table 3 reports the multilayer modularity values, including the community-specific contributions. Note that, regardless of the settings of and factors, communities and obtain the highest values of modularity, mainly because they are disconnected from the rest of the graph at layer . In general, it should be noted that the contribution given by each community is consistent w.r.t. the various settings of and factors. Also, it is interesting to note that discarding the inter-layer couplings (, which corresponds to the 9th and 13th columns) can lead to values of community-modularity and global modularity that tend to be much higher than the corresponding cases with . This overestimation can also occur, though to a lesser extent, when fixing (13th column) vs. redundancy-based (9th column). Also, it is worth noting that using the redundancy-based resolution factor with unordered layers (2nd, 3rd and 4th columns) increases the community as well as global modularity vs. the same cases with (10th, 11th, and 12th columns).
4.2.1 Relations between the resolution and inter-layer coupling factors
Both factors take the network context into consideration; however, they differ in that considers a “global” multiplex context, whereas considers a “local” multiplex context. Intuitively, is defined for each valid layer-community pair according to the status of the links among nodes in community that lies on versus their status on the other layers. By contrast, considers the status of the same community from one layer to another comparable layer.
In terms of numerical comparison, when the size of the community structure tends to the number of nodes of the network, tends to increase (i.e., to the maximum value of 2) while tends to decrease (i.e., to zero).
4.3 Properties of the proposed multilayer modularity
We provide theoretical insights on , focusing on the effect of an increase in the size of the community structure and on the analytical derivation of the range of values of .
4.3.1 Effect of increase in the number of communities
We discuss the effect of increasing (i.e., decreasing the average size of communities in ) by distinguishing three configurations of : (i) symmetric inter-layer coupling, (ii) asymmetric inter-layer coupling, and (iii) ordered layers.
In the first case, tends to have a monotonically decreasing trend. This is easily explained by the combination of three contingencies. The first one is an average decrease in the internal degree . The second contingency is an increase in the redundancy-based resolution factor : in fact, smaller communities correspond to lower probability of observing redundant pairs within communities over different layers; this decreases the logarithmic term in the resolution factor, which will progressively tend to 2 (maximum value). The third contingency is a decrease in the inter-layer coupling factor , since the size of community intersection becomes increasingly smaller as the community size decreases.
By contrast, when equipped with the asymmetric projection-based inter-layer coupling , tends to differ from a monotonically decreasing trend because of the bias term , which increases with communities of smaller size.
In the third case (i.e., ordered layers), can again follow an increasing or decreasing trend. Recall that the term includes the contribution of the inter-layer edges, which obviously are fewer when the layer couplings are order-dependent. A decrease in the number of inter-layer couplings also makes the decrease in the actual connectivity term (i.e., ) slower as increases, since is smaller than in the unordered-layer contingency. Consequently, the inter-layer coupling term could compensate for the actual connectivity term, which will result in increasing the value of . Finally, considering time-aware asymmetric inter-layer coupling, is more likely to follow a decreasing trend because of the effect of the smoothing term , which penalizes for any two layers that are not consecutive in time. Consequently, since the inter-layer coupling factor is smaller than , could monotonically decrease despite the bias term in .
4.3.2 Lower and upper bounds
To determine the range of values of the basic modularity in simple graphs, the theoretical frameworks previously studied in [4] and [24] define two canonical structures to support the analytical computation of the minimum and maximum value of the modularity, respectively. More specifically, the former work proved that any bipartite graph with the canonical two-way clustering obtains the minimum value of modularity, whereas the latter work proved that the maximum modularity is reached in a graph composed of disjoint cliques.
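The two canonical structures can be checked numerically for the basic (single-layer) modularity. The sketch below is a minimal implementation of the standard Newman-Girvan modularity for an undirected simple graph; the specific graphs (a complete bipartite graph with three nodes per side and four disjoint 4-cliques) are illustrative choices.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected simple graph.

    Q = sum over communities c of  l_c / m - (d_c / (2 m))^2,
    with m the number of edges, l_c the number of edges inside c, and
    d_c the total degree of the nodes in c.
    """
    m = len(edges)
    label = {v: k for k, comm in enumerate(communities) for v in comm}
    inside = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(inside, degree))


# Complete bipartite graph K_{3,3} with its two sides as communities.
left, right = ["a0", "a1", "a2"], ["b0", "b1", "b2"]
bipartite_edges = [(u, v) for u in left for v in right]
q_min = modularity(bipartite_edges, [left, right])

# Four disjoint 4-cliques, one community per clique.
cliques = [[(c, i) for i in range(4)] for c in range(4)]
clique_edges = [
    (c[i], c[j]) for c in cliques for i in range(4) for j in range(i + 1, 4)
]
q_max = modularity(clique_edges, cliques)
```

Here `q_min` evaluates to -0.5 and `q_max` to 0.75, i.e., 1 − 1/k for k = 4 equally sized cliques, so the maximum approaches 1 as the number of cliques grows.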
Following the lead of the above works, here we provide theoretical results about the analytical derivation of the lower bound and upper bound of our proposed multilayer modularity.
**Proposition 1.**
Given a multilayer network $G_{\mathcal{L}}=(V_{\mathcal{L}},E_{\mathcal{L}},\mathcal{V},\mathcal{L})$, with $n=|\mathcal{V}|$, $\ell=|\mathcal{L}|$, and a community structure for , the lower bound of is as follows:
[TABLE]
with for and for , and is the total number of valid layer-pairings.
*Proof.* The proof is reported in the Appendix.
**Proposition 2.**
Given a multilayer network $G_{\mathcal{L}}=(V_{\mathcal{L}},E_{\mathcal{L}},\mathcal{V},\mathcal{L})$, with $n=|\mathcal{V}|$, $\ell=|\mathcal{L}|$, and a community structure for , the upper bound of is as follows:
[TABLE]
with for and for , and is the total number of valid layer-pairings.
*Proof.* The proof is reported in the Appendix.
Note that, in the special case for , i.e., the inter-layer coupling factor is discarded, the lower bound of is
[TABLE]
Analogously, the upper bound of is rewritten as:
[TABLE]
with .
5 Evaluation Methodology
We discuss here the evaluation networks (Sect. 5.1), the multilayer community detection methods (Sect. 5.2), and the experimental settings (Sect. 5.3).
5.1 Datasets
Our selection of network datasets was motivated by the reproducibility requirement: in fact, all of our evaluation datasets, including both real-world networks and synthetic generators, are publicly available. Moreover, we also took the opportunity to diversify the choice of real-world networks by considering various domains that are profitably modeled as multilayer networks.
5.1.1 Real-world network datasets
We considered 10 real-world multilayer network datasets. AUCS [42, 43] describes relationships among university employees: work together, lunch together, off-line friendship, friendship on Facebook, and coauthorship. EU-Air transport network [42] (EU-Air, for short) represents European airport connections considering different airlines. FAO Trade network (FAO-Trade) [44] represents different types of trade relationships among countries, obtained from FAO (Food and Agriculture Organization of the United Nations). FF-TW-YT (stands for FriendFeed, Twitter, and YouTube) [12] was built by exploiting the feature of FriendFeed as social media aggregator to align registered users who were also members of Twitter and YouTube. Flickr refers to the dataset studied in [45]. We used the corresponding timestamped interaction network whose links express “who puts a favorite-marking to a photo of whom”. We extracted the layers on a month-basis and aggregated every six (or more) months. GH-SO-TW (stands for GitHub, StackOverflow and Twitter) [46] is another cross-platform network where edges express followships on Twitter and GitHub, and “who answers to whom” relations on StackOverflow. Higgs-Twitter [42] represents friendship, reply, mention, and retweet relations among Twitter users. London transport network [18] (London, for short) models three types of connections of train stations in London: underground lines, overground, and DLR. ObamaInIsrael2013 [47] (Obama, for short) models retweet, mention, and reply relations of users of Twitter during Obama’s visit to Israel in 2013. 7thGraders [18] (VC-Graders, for short) represents students involved in friendship, work together, and affinity relations in the class. Table 4 reports for each dataset, the size of set , the number of edges in all layers, and the average coverage of node set (i.e., ). 
The table also shows basic, monoplex structural statistics (degree, average path length, and clustering coefficient) for the layer graphs of each dataset.
5.1.2 Synthetic network datasets
Besides the real-world network data, we generated four synthetic multilayer network datasets. Our goal was the evaluation of the multilayer modularity on different network models. Two out of the four networks are composed of 2 layers and 256 entities. In one network, hereinafter referred to as ER-ER, the two layers are Erdős-Rényi (ER) random graphs. In the second network, dubbed LFR-ER, the first layer is generated by the Lancichinetti-Fortunato-Radicchi (LFR) benchmark, while the second layer is an Erdős-Rényi random graph. The other two networks are composed of 4 layers and 128 nodes. Both networks are characterized by two Erdős-Rényi layers and two layers built as Girvan-Newman (GN) graphs, but they differ in the layer ordering: GN-ER-GN-ER in the first network, and GN-ER-ER-GN in the second network.
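For concreteness, a network of the ER-ER kind can be sketched with the standard library alone. The edge probability `p`, the seed, and the dictionary layout below are assumptions of this example, since the generation parameters of the random layers are not stated here; only the number of layers (2) and of entities (256) comes from the description above.

```python
import random
from itertools import combinations


def erdos_renyi_layer(nodes, p, rng):
    """G(n, p) random graph: each node pair is linked with probability p."""
    return {frozenset(pair) for pair in combinations(nodes, 2) if rng.random() < p}


def er_er_network(n=256, p=0.05, seed=0):
    """Two-layer multilayer network whose layers are independent ER graphs.

    The edge probability p and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    nodes = list(range(n))
    layers = [erdos_renyi_layer(nodes, p, rng) for _ in range(2)]
    return {"nodes": nodes, "layers": layers}


net = er_er_network()
```

Since both layers are drawn independently over the same entity set, no community structure is planted in either layer, which is what makes this configuration a useful baseline against the LFR-ER and GN-ER variants.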
Moreover, mainly for purposes of efficiency evaluation, we generated a set of synthetic multilayer networks using the Lancichinetti-Fortunato-Radicchi (LFR) benchmark. In particular, single-layer network datasets were generated by the LFR benchmark using a variable number of nodes, in steps of 128 up to 1024. Also, the maximum and average node degrees were set to 16, and the mixing coefficient was set to 0.1. Each network dataset was characterized by four communities. From each of such networks, a multilayer network was created by replicating the LFR single layer over a number of layers varying from 2 to 10.
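The replication step can be sketched as follows. The tiny edge set stands in for an LFR-generated graph and is purely hypothetical, as are the function name and the dictionary-based representation.

```python
def replicate_layer(edges, num_layers):
    """Build a multilayer network by copying one edge set on every layer.

    Returns a dict mapping a layer index (1-based) to its own copy of the
    single-layer edge set, so layers can later be modified independently.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    return {layer: set(edges) for layer in range(1, num_layers + 1)}


# Hypothetical single-layer edge set standing in for an LFR graph.
single_layer = {(0, 1), (1, 2), (2, 0), (2, 3)}

# One multilayer network for each number of layers from 2 to 10.
networks = {k: replicate_layer(single_layer, k) for k in range(2, 11)}
```

Because every layer is an identical copy, the community structure is the same on all layers, so running time can be studied as a function of the number of nodes and layers only.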
5.2 Community detection methods
We resorted to state-of-the-art methods for community detection in multilayer networks, which belong to the two major approaches, namely aggregation and direct methods. The former detect a community structure separately for each network layer, after which an aggregation mechanism is used to obtain the final community structure, while the latter work directly on the multilayer graph by optimizing a multilayer quality-assessment criterion. (Note that while it is possible to flatten the multilayer graph in order to apply any conventional community detection algorithm to it, this approach can be too simplistic, since, e.g., it would not allow investigating the temporal evolution of communities.)
As exemplary methods of the aggregation approach, we used Principal Modularity Maximization (PMM) [48] and Enhanced Modularity-driven Ensemble-based Multilayer Community Detection (M-EMCD∗) [20]. PMM aims to find a concise representation of features from the various layers (dimensions) through structural feature extraction and cross-dimension integration. Features from each dimension are first extracted via modularity maximization, then concatenated and subjected to PCA to select the top eigenvectors, which represent possible community partitions. Using these eigenvectors, a low-dimensional embedding is computed to capture the principal patterns across all the dimensions of the network; finally, k-means is carried out on this embedding to discover a community structure. M-EMCD∗ is a parameter-free extension of the M-EMCD method proposed in [19]. Given an ensemble of community structures available for a multilayer network, M-EMCD optimizes a consensus objective function to discover a consensus solution with maximum modularity, subject to the constraint of being searched over a hypothetical space of consensus community structures that are valid w.r.t. the input ensemble and topologically bounded by two baseline solutions. To detect the initial cluster memberships of nodes, M-EMCD utilizes a consensus or co-association matrix, which stores the fraction of clusterings in which any two nodes are assigned to the same cluster. To filter out noisy, irrelevant co-associations, a user-defined threshold must be specified. Besides introducing flexibility in community assignments of nodes during the modularity optimization, M-EMCD∗ overcomes the limitation of setting such a parameter of minimum co-association, by providing a parameter-free identification of consensus clusters based on generative models for graph pruning.
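The co-association matrix mentioned above can be illustrated with a small sketch. It is not the M-EMCD implementation: the data layout, the function names, and the "greater than or equal" reading of the threshold are assumptions made for the example.

```python
from itertools import combinations


def co_association(clusterings, nodes):
    """Fraction of clusterings in which each node pair shares a cluster.

    `clusterings` is a list of dicts mapping node -> cluster label.
    """
    counts = {pair: 0 for pair in combinations(sorted(nodes), 2)}
    for labels in clusterings:
        for u, v in counts:
            if labels[u] == labels[v]:
                counts[(u, v)] += 1
    return {pair: c / len(clusterings) for pair, c in counts.items()}


def filter_pairs(matrix, threshold):
    """Keep only pairs whose co-association reaches the threshold."""
    return {pair: w for pair, w in matrix.items() if w >= threshold}


# Hypothetical ensemble: three clusterings of four nodes.
ensemble = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 1},
]
matrix = co_association(ensemble, ["a", "b", "c", "d"])
strong = filter_pairs(matrix, threshold=0.5)
```

In this toy ensemble the pairs (a, b), (b, c) and (c, d) co-occur in two clusterings out of three and survive a threshold of 0.5, which shows how sensitive the retained pairs are to the chosen value.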
As for the direct methods, we resorted to Generalized Louvain (GL) [15] and Locally Adaptive Random Transitions (LART) [17]. GL extends the classic Louvain method using multislice modularity, so that each node-layer tuple is assigned separately to a community. Majority voting is adopted to decide the final assignment of an entity node to the community that contains the majority of its layer-specific instances. LART is a random-walk based method. It first runs a different random walk for each layer, then a dissimilarity measure between nodes is obtained leveraging the per-layer transition probabilities. A hierarchical clustering method is used to produce a hierarchy of communities which is eventually cut at the level corresponding to the best value of multislice modularity.
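The majority-voting step used to map node-layer assignments back to entities can be sketched as follows. The data layout and the tie-breaking rule (first community encountered) are assumptions of this example rather than details of the GL implementation.

```python
from collections import Counter


def majority_vote(node_layer_assignment):
    """Assign each entity to the community holding most of its layer instances.

    `node_layer_assignment` maps (entity, layer) -> community label.
    Ties are broken by the first community encountered (an assumption).
    """
    votes = {}
    for (entity, _layer), community in node_layer_assignment.items():
        votes.setdefault(entity, Counter())[community] += 1
    return {entity: counter.most_common(1)[0][0] for entity, counter in votes.items()}


# Hypothetical node-layer assignments over three layers.
assignment = {
    ("u", 1): "C1", ("u", 2): "C1", ("u", 3): "C2",
    ("v", 1): "C2", ("v", 2): "C2", ("v", 3): "C2",
}
final = majority_vote(assignment)
```

In this toy input, entity `u` ends up in `C1` (two of its three instances) and `v` in `C2`.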
It should be emphasized that we selected the above methods because, while having different characteristics, they all use modularity either as optimization criterion (GL, PMM and M-EMCD∗) or as evaluation criterion to produce the final community structure (LART).
Note also that PMM requires the desired number of communities () as input. Due to the different sizes of our evaluation datasets, we devised several configurations of variation of parameter in PMM, reasonably adapting the range and increment step of each configuration to the network size. Concerning M-EMCD∗, we used the marginal likelihood filter (MLF) to perform parameter-free detection of the number of communities [20].
It should be noted that the selected methods actually discover different community structures, thus supporting our choice in terms of diversity of evaluation scenarios for the two competing modularity measures under study. Table 5 reports the number of communities of the solutions found by the various methods on the real-world network datasets. (The number of communities in PMM is selected according to the solution with highest modularity value.) We found that GL tends to discover a high number of communities for larger networks (i.e., Flickr, Higgs-Twitter, FF-TW-YT, and Obama), and the size distribution of these communities (results not shown) is highly right-skewed on the larger networks, while it is moderately left-skewed on the remaining datasets. A similar result can be observed in M-EMCD∗ for the different networks, although in Higgs-Twitter and FF-TW-YT (resp. GH-SO-TW) the number of communities is much lower (resp. higher) than in GL. By contrast, the best performances of PMM usually correspond to a low and quite stable number of communities. Also, LART generally tends to produce many more communities than the other methods, on the networks for which it is able to discover communities.
As a final general remark, we used the original implementations of the selected methods, based on the source code made available by the respective authors. We emphasize that it is beyond the goals of this work to make any performance improvement in the community detection methods under study, which hence are considered here with no intent of comparative evaluation and with all their limitations. (This explains, in particular, the inability of LART to terminate the task for some network datasets.)
5.3 Experimental settings
We ran the GL, PMM, LART and M-EMCD∗ methods on each of the network datasets and measured, for each community structure solution, our proposed multilayer modularity () as well as Mucha et al.’s multislice modularity ().
We evaluated using the redundancy-based resolution factor with either the symmetric () or the asymmetric () projection-based inter-layer coupling. We also considered cases corresponding to ordered layers, using either the adjacent-layer scheme or the succeeding-layer scheme, and for both schemes considering inner () as well as outer () asymmetric coupling. We further evaluated the case of temporal ordering, using the time-aware asymmetric projection-based inter-layer coupling. Finally, we considered the particular setting of uniform resolution (i.e., , for each layer and community ).
As for , we devised two settings: the first by varying within while fixing , the second by varying and [15].
6 Results
We organize our main experimental results into two parts, depending on whether layer ordering was considered in the evaluation networks. Experiments were carried out on an Intel Core i7-3960X CPU @3.30GHz, 64GB RAM machine.
6.1 Evaluation with unordered layers
6.1.1 Synthetic network datasets
Table 6 reports the multilayer modularity , multislice modularity and number of communities obtained by the GL solution on the four synthetic networks.
One first remark is that using the redundancy-based resolution factor always leads to higher w.r.t. the cases corresponding to fixed to 1. In particular, we observe gains up to 0.1 on ER-ER, 0.07 on LFR-ER, and 0.12 on GN-ER-GN-ER and GN-ER-ER-GN.
Another remark is that the full combination of resolution and inter-layer coupling factors (i.e., rightmost two columns) tends to lower the value of w.r.t. the cases corresponding to varying with (i.e., third last column); moreover, the asymmetric inter-layer coupling results in a higher w.r.t. the symmetric setting of . This would hint that when the normalization term in the equation accounts for the inter-layer couplings, this results in lowering the value of , which is in turn smoother when the asymmetric setting is used.
Comparing and , it should be noted that the two measures behave consistently on ER-ER vs. LFR-ER, i.e., the presence of a layer with a (LFR) modular structure actually leads to an increase in both modularity measures w.r.t. ER-ER. By contrast, tends to increase faster than on the two GN-ER networks: this can be explained since a higher number of layers (as occurs for the two GN-ER networks than for the ER-ER and LFR-ER networks) has a higher effect on the inter-layer coupling factor , which is not present in .
6.1.2 Real-world network datasets
Tables 7–9 and Fig. 3 report measurements on the community structure solutions obtained by the various community detection methods.
Concerning GL (Table 7), we observe that with the exception of GH-SO-TW on which effects on are equivalent, using leads to higher than . On average over all networks, using yields an increment of 13.4% and 14.6% (with fixed to 1) w.r.t. the value of corresponding to . This higher performance of due to supports our initial hypothesis on the opportunity of asymmetric inter-layer coupling. It is also interesting to note that, when fixing to 1, decreases w.r.t. the setting with redundancy-based resolution — decrement of 11% and 12% using and , respectively.
Table 8 shows results obtained from LART solutions. (Due to memory-resource and efficiency issues shown by the currently available implementation of LART, we are able to report results only on some networks). We observe that the relative performance difference between and settings is consistent with results found in the GL evaluation; this difference is even extreme (0.98 or 0.99) on EU-Air and London, which is likely due also to the different sizes of community structures detected by the two methods (cf. Sect. 5.2).
Table 9 shows results obtained by M-EMCD∗ solutions. Also in this case, using generally leads to better than , regardless of the setting of . In particular, the observed increase is higher in VC-Graders and London (0.2), followed by AUCS (0.18) and Obama (0.13). Moreover, when fixing to 1, in most cases decreases (0.01-0.06) w.r.t. the setting with redundancy-based resolution .
Figure 3 shows how varies as a function of the number () of clusters given as input to PMM. One major remark is that tends to decrease as increases. This holds consistently for the configuration of with symmetric inter-layer coupling; in fact, as discussed in Sect. 4.3.1, the decrease of for increasing depends on a combination of decrease of the internal degree , decrease of the symmetric inter-layer coupling factor , and increase of the redundancy-based resolution factor . Moreover, values of corresponding to tend to be close to the ones obtained for on the large networks, while on the smaller ones, trends are above , diverging for high in some cases; in particular, in London modularity for follows a rapid, roughly linear increasing trend with ; even more evident is the divergence of the and trends for AUCS. Again, as we previously discussed in Sect. 4.3.1, this is due to the bias term of , which increases with communities of smaller size. Note that, from an inspection of the behavior of for higher regimes of , we also found that values eventually tend to stabilize below 1. As concerns the setting with fixed to 1 (results not shown), while the trends of for and for do not change significantly, the values are typically lower than those obtained with redundancy-based resolution, which is again consistent with results observed for GL, LART and M-EMCD∗ evaluations.
**Correlation analysis.** We investigated whether any correlation may exist at community-level between the value of and selected statistics based on structural characteristics of the input network. For this purpose, we focused on the average path length, clustering coefficient, redundancy, and node- and edge-set coverage, for each community in an evaluation network; note that the latter two statistics are computed as, given a community , the fraction of nodes (resp. edges) in a layer that belong to , averaged over all layers in the network.
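The two coverage statistics can be sketched as follows. The example assumes that each layer is given as a pair (node set, edge list) with at least one node and one edge, and that an edge belongs to a community when both of its endpoints do; the latter is an interpretation made for this sketch, and all names and data are hypothetical.

```python
def node_set_coverage(community, layers):
    """Average over layers of the fraction of layer nodes that belong to C."""
    fractions = [len(community & nodes) / len(nodes) for nodes, _edges in layers]
    return sum(fractions) / len(layers)


def edge_set_coverage(community, layers):
    """Average over layers of the fraction of layer edges inside C.

    Assumption: an edge belongs to C when both endpoints are in C.
    """
    fractions = [
        sum(1 for u, v in edges if u in community and v in community) / len(edges)
        for _nodes, edges in layers
    ]
    return sum(fractions) / len(layers)


# Hypothetical two-layer network and community.
layers = [
    ({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]),
    ({1, 2, 5, 6}, [(1, 2), (5, 6)]),
]
community = {1, 2}
node_cov = node_set_coverage(community, layers)
edge_cov = edge_set_coverage(community, layers)
```

For this toy input the node-set coverage is 0.5 (half of the nodes on each layer) and the edge-set coverage is the mean of 1/3 and 1/2.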
Figure 4 shows the correlation between each of the above structural characteristics and the values of , with redundancy-based resolution factor and , on the solution found on selected networks by GL, M-EMCD∗ and PMM; for the latter, was chosen as that corresponding to the best modularity performance. Note also that the correlation results obtained by with and , and , only, and combination of and , do not show significant differences, hence their presentation is discarded. Looking at the three plots in the figure, we observe a mid-high positive correlation of with the topological measures in most cases. More in detail, in Fig. 4 (a) the modularity of the solution found by GL on EU-Air shows an average correlation of 0.85 with the other measures. Also, an average correlation of 0.95 and 0.96 is obtained between and respectively node-set and edge-set coverage on FF-TW-YT. For AUCS, has a positive correlation of 0.76 with clustering coefficient and a negative correlation with the other measures. For VC-Graders, shows a positive correlation with all measures except with redundancy. For London and GH-SO-TW, the correlation is up to 0.5. For FAO-Trade, shows a higher correlation up to 1 with node-set and edge-set coverage, and a lower correlation up to 0.5 with average path length, clustering coefficient and redundancy. Considering Fig. 4 (b), the multilayer modularity of the solution found by PMM shows an average correlation of 0.99 with clustering coefficient for EU-Air, of 1 and 0.92 with average path length, node-set and edge-set coverage for AUCS and FF-TW-YT, respectively, and of 1 with redundancy for VC-Graders. On the contrary, a correlation of -1 is obtained between and the clustering coefficient and redundancy for AUCS, and between and all measures except the redundancy for VC-Graders. For FAO-Trade, shows a positive correlation up to 1 with average path length, node-set and edge-set coverage, and a negative correlation up to -1 with the redundancy. 
A weakly negative correlation is shown between and clustering coefficient. In the other cases, the correlation ranges between -0.5 and 0.5. Finally, considering Fig. 4 (c), the multilayer modularity of the solution found by M-EMCD∗ shows a very high correlation with node-set and edge-set coverage in all networks. Also, shows a correlation with the average path length which is up to 0.5 in all networks, with the only exception of EU-Air and GH-SO-TW. For redundancy and clustering coefficient, obtains a high correlation with clustering coefficient and redundancy in FAO-Trade and with clustering coefficient in VC-Graders.
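A community-level correlation of this kind can be computed as in the sketch below. The type of correlation coefficient is not specified in the text; Pearson's coefficient is used here as an assumption, and the per-community values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-community values: modularity contribution vs. a statistic.
community_modularity = [0.10, 0.25, 0.05, 0.30]
node_coverage = [0.20, 0.45, 0.15, 0.50]
r = pearson(community_modularity, node_coverage)
```

With only a handful of communities per solution, coefficients close to 1 or -1 are easy to obtain, which is worth keeping in mind when reading the extreme values reported for some networks.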
Figure 5 shows the correlation between various settings of and the previously analyzed set of statistics for solutions obtained by LART. Looking at the plots, obtains the highest correlation with the edge-set coverage, followed by the node-set coverage, clustering coefficient and redundancy. Overall, results by LART confirm the trends observed for GL and PMM, with even higher tendency to positive correlation in general. Remarkably, this particularly holds when involves the inter-layer coupling terms, with leading to higher correlation than .
6.2 Evaluation with ordered layers
In this section we focus on evaluation scenarios that correspond to the specification of an ordering of the set of layers. We will present results on the real-world networks EU-Air and Flickr. The former was chosen because of its highest dimensionality (i.e., number of layers) over all datasets, the latter is a time-evolving multilayer network and was chosen for evaluating the time-aware asymmetric inter-layer coupling.
Table 10 summarizes results by GL, LART and M-EMCD∗ on EU-Air, corresponding to adjacent and succeeding-layer coupling. We observe that, regardless of the setting of the resolution factor, values of with succeeding-layer coupling are higher than the corresponding ones for the adjacent layer coupling scheme. This suggests that the impact on the inter-layer coupling term is higher when all ordered pairs of layers are taken into account, than when only adjacent pairs are considered. In this regard, recall that the total degree of the multilayer graph, which normalizes the inter-layer coupling term as well, is properly computed according to the actual number of inter-layer couplings considered, depending on whether adjacent or succeeding-layer scheme was selected.
The above result is also confirmed by PMM, as shown in Fig. 6, where the plots for the succeeding-layer scheme lie above those for the adjacent scheme, over the various . Note also that, while results on EU-Air are shown only for the ascending layer ordering, inverting this order would swap the results corresponding to the inner asymmetric case with those corresponding to the outer asymmetric case. Moreover, Fig. 6(b) compares the effect of asymmetric inter-layer coupling on Flickr with and without time-awareness, for PMM solutions. Here we observe that both and plots are above those corresponding to and . This indicates that considering a smoothing term for the temporal distance between layers (Eq. (9)) leads to an increase in modularity. This general result is also confirmed by GL, LART and M-EMCD∗ (results not shown); for instance, GL achieved on Flickr modularity 0.462 for , 0.468 for , and 0.460 for , which compared to results shown in Table 7 represent increments in of 43%. Similarly, M-EMCD∗ obtained on Flickr modularity 0.975 for , , and , which is higher than the corresponding values reported in Table 9 for and , respectively.
**Correlation analysis.** Analogously to the correlation analysis performed for the unordered case, we compare different settings of with selected statistics on topological properties. Figure 7 shows results on EU-Air obtained by GL, LART, PMM and M-EMCD∗, respectively. Again, for PMM, was set to the number of communities corresponding to the best modularity performance achieved by the method.
As a general remark, is always non-negatively correlated with all topological measures. More specifically, the correlation is highly positive with all measures, when GL is used, and with all measures but average path length and clustering coefficient, when LART and M-EMCD∗ are used; for PMM, correlation is very high with clustering coefficient, and mid-low with the other measures. When equipped with succeeding-layer coupling, correlation is higher than in the adjacent-layer setting with average path length (up to +0.14), node-set coverage (up to +0.02) and redundancy (up to +0.05) for the solution found by GL and M-EMCD∗, and with average path length (up to +0.07), node-set coverage (up to +0.11) and edge-set coverage (up to +0.11) for the solution found by PMM. We also found that the layer ordering does not provide meaningful variations on the correlation values; plots regarding the descending layer ordering are reported in the Appendix.
6.3 Analysis of and comparison with
We discuss here performance results obtained by the community detection algorithms with as assessment criterion. We will refer to the default setting of unordered set of layers as stated in [15].
Using GL, tends to decrease as increases (while decreases, as it was varied with as ). This occurs monotonically in most datasets, within positive ranges (e.g., from 0.636 to 0.384 on FF-TW-YT, from 0.525 to 0.391 on GH-SO-TW) or including negative modularity for higher (e.g., from 0.645 to -0.05 on Flickr, from 0.854 to -4 on AUCS). Remarkably, the simultaneous effect of and on leads on some datasets (Obama, EU-Air, London) to a drastic degradation of modularity (down to strongly negative values) for some , followed by a rapid increase to modularity of 1 as approaches 2. Analogous considerations hold for LART, PMM and M-EMCD∗. For the latter method, the trend of drastic degradation of modularity followed by a rapid increase is only visible for EU-Air. For PMM, the plots of Fig. 8 show results by varying , on the real-world datasets. Surprisingly, it appears that is relatively less sensitive to the variation in the community structure than our . This is particularly visible in AUCS, London (not shown), and Obama where shows no variation for increasing . Also, it is worth noting that, for specific values of , may have an abrupt decrease with very low peaks, as it happens for Obama. By contrast, for FAO-Trade, a value around 0.8 for induces high modularity which is stable for the different values.
When varying the inter-layer coupling factor within [0..2], the multislice modularity tends to increase monotonically as the coupling factor increases. This holds consistently on all datasets (Fig. 9 shows plots for some of them), with the only exception of FAO-Trade, where it is stable at 1 for 0.8. Variations always fall within positive intervals (e.g., from 0.248 to 0.621 on Flickr, from 0.305 to 0.541 on FF-TW-YT, from 0.136 to 0.356 on Higgs-Twitter).
6.3.1 Comparison between the two modularity measures: qualitative evaluation on the solutions generated by the community detection methods
In light of the above analysis, a few interesting remarks arise from observing the different behavior of the multislice modularity and our modularity over the same community structure solutions, as a function of the resolution and inter-layer coupling factors. From a qualitative viewpoint, the effect of the inter-layer coupling factor on the multislice modularity turns out to be opposite, in most cases, to the effect of our inter-layer coupling terms on the proposed modularity: that is, accounting more for the inter-layer couplings leads to an increase in the multislice modularity, while this does not necessarily happen in ours. Less straightforward is the comparison between the use of a constant resolution for all layers, as done in the multislice modularity, and the use of variable resolutions (i.e., one for each pair of layer and community), as done in ours. In this regard, we have previously observed that the use of a varying redundancy-based resolution factor improves our modularity w.r.t. the setting with resolution fixed to 1. By coupling this general remark with the results (not shown) of an inspection of the redundancy-based resolution values in the computation of our modularity on the various network datasets (which confirmed that these values do span their range in practice), we can conclude that our modularity gives a more appropriate consideration to the term modeling the expected connectivity of a community than the multislice modularity does by keeping the resolution constant for all layers.
6.3.2 Comparison between the two modularity measures: evaluation on AUCS ground-truth communities
Let us now consider a further stage of evaluation of the multislice modularity vs. our modularity, which is complementary to the previous comparative analysis and has the specific purpose of assessing their behavior w.r.t. a ground-truth community structure. To this aim, we resorted again to the AUCS data: in their original work [43], the authors addressed a gap in the literature (actually, still largely open) corresponding to a lack of benchmarks for understanding multilayer/multiplex networks. In that work, the authors also provided a ground-truth multiplex community structure for AUCS, which reflects the affiliation of the university employees/students to research groups. Please refer to [43] for a detailed description of how this ground truth was obtained.
In this context of evaluation, we analyzed again the behaviors of the multislice modularity and our modularity under particular settings of the resolution and inter-layer coupling factors, while keeping the community structure fixed to a reference one corresponding to ground-truth knowledge. Before going into the details of this analysis, let us first provide some remarks on the trends of the multislice modularity under the two settings previously considered in Sect. 6.3: varying the resolution factor within [0..2], and varying the inter-layer coupling factor within [0..2]. As we observe from the results shown in Fig. 10, the multislice modularity can vary significantly depending on the setting of its parameters: when the inter-layer coupling factor is varied, it monotonically increases with that factor, within a relatively small range (i.e., from 0.38 to 0.76); however, when the resolution factor is varied, it follows an inverse trend, with a more rapid decrease over part of the range and an overall wider range (from 0.768 to -1.09). Note that these considerations on the trends are consistent with the previous analysis of the multislice modularity computed on the solution found by PMM for AUCS (cf. Figs. 8–9 (e)). We shall come back later to this high parameter sensitivity of the multislice modularity.
Table 11 shows the global and community-specific values of the multislice modularity and of our modularity for particular combinations of their corresponding resolution and inter-layer coupling factors. Columns 2 to 5 report basic structural statistics for each of the communities, while the rest of the table is organized into three subtables: the first refers to the multislice modularity, the second to our modularity, and the third again to the multislice modularity, this time with parameter settings biased toward our modularity. In the latter case, we wanted to make the multislice modularity “closer to” ours by setting its resolution and inter-layer coupling parameters to the values of our redundancy-based resolution and inter-layer coupling factors, respectively, averaged over the different communities and layers. Moreover, note that there are 14 ground-truth communities in AUCS in total; however, we only report results for the 7 that contain more than one node, to avoid cluttering the table with roughly constant, close-to-zero modularity values corresponding to the singleton communities.
Looking at the table, the same two communities (resp. the same three communities) correspond to the highest (resp. lowest) modularity values for either modularity measure, under each of the parameter combinations considered. In general, beyond the differences in the respective values of modularity (note that the multislice modularity values still differ from the corresponding values of our modularity even for the biased settings), the two measures appear to behave similarly over the various communities. To confirm this intuition, we evaluated the Pearson correlation of different pairings of the community-specific values of the two measures. Results indeed show almost perfect correlation (always above 0.98).
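The correlation check described above can be sketched as follows. This is only a minimal illustration in plain Python: the function name `pearson` and the two score vectors are hypothetical and are not taken from Table 11; they merely stand in for the community-specific values of the two modularity measures over the 7 non-singleton communities.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the factors cancel out.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

# Hypothetical per-community scores for 7 communities (NOT the paper's values).
multislice_scores = [0.10, 0.02, 0.25, 0.01, 0.03, 0.22, 0.02]
proposed_scores = [0.08, 0.01, 0.21, 0.01, 0.02, 0.19, 0.02]

r = pearson(multislice_scores, proposed_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates that the two measures rank and scale the communities in nearly the same way, even when their absolute values differ.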
One aspect of evaluation that we also investigated is whether disrupting the multiplex network by removing layers may affect the comparison between the multislice modularity and our modularity, which resembles a sort of layer-oriented resiliency analysis. More specifically, based on the structural impact of the various layers in AUCS [43], we considered the following alternative configurations: (i) we removed layer co-authorship (i.e., the smallest and least connected of all layers), (ii) we removed layers co-authorship and leisure (i.e., the ones having the lowest number of edges), (iii) we removed layer work (i.e., the one with the most edges), and (iv) we retained layers work and lunch only. For each of these multiplex-disruption configurations, we replicated the above analysis corresponding to the full multiplex. Results (not shown) indicated no particular differences in the rankings of community modularity obtained by the two measures; however, we also observed a general decrease in the Pearson correlation of the pairings of their community-specific values, although the correlation remained high, always above 0.96 (e.g., when removing layer work, the correlation was 0.965 for one pairing of parameter settings and 0.98 for another).
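The four disruption configurations listed above amount to filtering the multiplex by layer. A minimal sketch, assuming the network is stored as a list of (layer, node, node) triples: the toy edges, the layer labels (`"coauthor"`, `"other"`, etc.) and the helper names `remove_layers` and `retain_layers` are all hypothetical, and `"other"` stands in for any AUCS layer not named in the text.

```python
# Toy multiplex edge list: (layer, u, v). These are NOT the real AUCS edges.
edges = [
    ("work", "a", "b"), ("work", "b", "c"), ("work", "a", "c"),
    ("lunch", "a", "b"), ("lunch", "c", "d"),
    ("leisure", "b", "d"),
    ("coauthor", "a", "c"),
    ("other", "a", "d"),
]

def remove_layers(edges, layers):
    """Return the edges whose layer is not among the given ones."""
    drop = set(layers)
    return [e for e in edges if e[0] not in drop]

def retain_layers(edges, layers):
    """Return only the edges belonging to the given layers."""
    keep = set(layers)
    return [e for e in edges if e[0] in keep]

configurations = {
    "(i) without co-authorship": remove_layers(edges, ["coauthor"]),
    "(ii) without co-authorship and leisure": remove_layers(edges, ["coauthor", "leisure"]),
    "(iii) without work": remove_layers(edges, ["work"]),
    "(iv) work and lunch only": retain_layers(edges, ["work", "lunch"]),
}

for name, sub in configurations.items():
    layers_left = sorted({layer for layer, _, _ in sub})
    print(name, "->", layers_left, f"({len(sub)} edges)")
```

Each reduced edge list would then be fed to the same modularity computations as the full multiplex, keeping the ground-truth communities unchanged.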
To sum up, in this ground-truth evaluation, the multislice modularity and our modularity exhibited consistently similar behaviors at community level for specific settings of their respective resolution and inter-layer coupling parameters. However, it should be emphasized that such a similarity between the two modularity values was actually achieved either for canonical settings of the parameters of the multislice modularity or for settings biased toward our modularity. In general, the multislice modularity has been shown to be highly sensitive to the settings of its parameters, whereas our modularity has the key advantage of automatically determining the resolution and inter-layer coupling factors based on the structural information of the communities in the multilayer network.
6.4 Efficiency evaluation
We analyzed the computation time of our modularity for the different combinations of redundancy-based resolution factor and inter-layer coupling factors. The 3-D plots in Figs. 11–12 display the time vs. the number of layers and the number of nodes per layer. For this analysis, we referred to the solutions found by GL for the multilayer networks generated through the LFR benchmark (cf. Section 5.1.2).
As expected, the computation time increases with both the number of layers and the number of nodes per layer, with the latter effect being less evident when the resolution factor is fixed to 1. Also, while it is unsurprising that the computation time is higher when using a variable (i.e., redundancy-based) resolution factor than with the resolution fixed to 1 (with a percentage increase of 50% for the maximum number of layers and nodes per layer), we observe far fewer fluctuations in the plot surfaces than in the case of resolution factor fixed to 1, regardless of the setting of the inter-layer coupling factors. The reader is also referred to the Appendix for further results obtained by varying the inter-layer coupling settings.
7 Conclusion
We proposed a new definition of modularity for multilayer networks. Motivated by the opportunity of revising the multislice modularity proposed in [15], we conceived alternative notions of layer resolution and inter-layer coupling, which are key-enabling for generalizations of modularity for multilayer networks. Using four state-of-the-art methods for multilayer community detection, synthetic multilayer networks and ten real-world multilayer networks, we provided empirical evidence of the significance of our proposed modularity.
Our work paves the way for the development of new optimization methods of community detection in multilayer networks which, by embedding our multilayer modularity, can discover community structures having the interesting properties relating to the proposed per-layer/community redundancy-based resolution factors and projection-based inter-layer coupling schemes. In this respect, we point out that our multilayer modularity is able to cope with communities that are overlapping at entity level, which eventually reflect the different roles that the same entity can play when occurring in two or more layers of the network. Within this view, one benefit of adopting the multilayer network model is that the problem of computing soft community-memberships of entities can be translated into a simpler problem of identification of crisp community-memberships of nodes within each layer. Nonetheless, a further interesting direction would be to evaluate our multilayer modularity in contexts of node-overlapping communities. In this case, however, one challenge to face is whether and to what extent an overlapping-aware multilayer modularity should be able to measure the community overlaps within each layer and/or across the layers. Along this direction, it would be interesting to study an integration of our multilayer modularity into recently developed works that propose probabilistic representations or stochastic generative models for overlapping community detection in multilayer networks [49, 50].
Acknowledgements
This research work has been partly supported by the Start-(H)Open POR Grant No. J28C17000380006 funded by the Calabria Region Administration, and by the NextShop PON Grant No. F/050374/01-03/X32 funded by the Italian Ministry for Economic Development.
Appendix
Appendix A Analytical derivation of lower and upper bounds
**Proof of Proposition 1 (lower bound of the proposed modularity).** Let us assume that each of the layer graphs in has the form of a bipartite graph , with , and sets and induce a partitioning of the set of nodes in two communities denoted as and , respectively, so that with , and no internal links are drawn between nodes of the same community (because of the bipartite assumption).
To begin with, consider the reduction of to its simplest form, i.e., fixed to 1 for any and . Therefore, the contribution of community to is:
[TABLE]
Since and , is calculated as:
[TABLE]
The same holds for . Therefore, the minimum value of when and is as follows:
[TABLE]
Let us now consider with its redundancy-based resolution factor while keeping . Since the internal degree of community is 0, there are no redundant pairs for any community and layer, and hence . Consequently, the contribution of to is:
[TABLE]
The same holds for . Therefore, the lower bound of when is variable and is as follows:
[TABLE]
In the general form of with both resolution and inter-layer coupling factors (i.e., varying and ), the contribution of to is:
[TABLE]
In the above formula, let us indicate the terms , and . If we denote with the total number of valid layer-pairings, then , and can be rewritten as follows:
[TABLE]
For the reduction of the term , we consider the two cases of symmetric inter-layer coupling and asymmetric inter-layer coupling. In the first case, the minimum value for is equal to . Accordingly, is reduced as follows:
[TABLE]
In the second case, . Accordingly, is reduced as follows:
[TABLE]
The above expressions for and hold for . Therefore, the lower bound of in its general form is as follows:
[TABLE]
with for and for .
It should be noted that Eq. 15 is a special case of Eq. 16 with and discarding the contribution given by the inter-layer edges (i.e., ).
**Proof of Proposition 2 (upper bound of the proposed modularity).** Let us assume that each of the layer graphs in has community structure such that and each community is a clique with edges. Moreover, there are no external edges connecting the communities, therefore . Note that by uniformly distributing the nodes into the two communities, it can easily be shown that the maximum of is higher.
Analogously to the analysis of the minimum value of , let us first consider the setting , for any , and . Therefore, the contribution of community to is:
[TABLE]
Because and $d_{L}(\mathcal{V})=\frac{n}{2}\left(\frac{n}{2}-1\right)2\ell$, the above expression is rewritten as:
[TABLE]
The same holds for . (If and had and 1 nodes, respectively, the resulting maximum value of would be , hence lower than what we obtain in Eq. 17.) Therefore, the maximum of when and is as follows:
[TABLE]
Consider now the setting with redundancy-based and . The contribution of to is calculated as:
[TABLE]
Since , it follows that is equal to for each layer and community. Note that the above constant quantity, hereinafter denoted as , tends to be , and it is smaller for higher number of nodes . The contribution of to is rewritten as:
[TABLE]
The same holds for . Therefore, the maximum of with redundancy-based and is as follows:
[TABLE]
Note that the above quantity gets closer to 1 as and increase.
In the general setting of with redundancy-based and , the contribution of to is:
[TABLE]
In the above formula, let us indicate the terms , and . Because , $d_{L}(\mathcal{V})=\frac{n}{2}\left(\frac{n}{2}-1\right)2\ell+np$, and for each layer and community, we obtain:
[TABLE]
[TABLE]
For the reduction of the term , we again consider the two cases of symmetric inter-layer coupling and asymmetric inter-layer coupling. In the first case, . Therefore:
[TABLE]
In the second case, . Therefore:
[TABLE]
The same holds for . Therefore, the maximum value of is as follows:
[TABLE]
with for and for .
It is worth noting that Eq. 18 is a special case of Eq. 19 with and discarding the contribution given by the inter-layer edges.
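As a sanity check on the two extremal constructions used in the proofs above (a bipartite graph whose two sides are the communities, and two disjoint equal-size cliques), the sketch below evaluates them on a single layer with the classic Newman–Girvan modularity at resolution 1. This is an assumption made purely for illustration: it is not the multilayer formulation of this paper, so the values it prints are the well-known single-layer extremes of -1/2 and 1/2 rather than the bounds of Eqs. 15–19. The function names and the choice of n = 8 nodes are hypothetical.

```python
from itertools import combinations

def newman_girvan_modularity(edges, communities):
    """Single-layer modularity, resolution 1: sum_c [ e_c/m - (d_c/(2m))^2 ]."""
    m = len(edges)
    membership = {v: i for i, com in enumerate(communities) for v in com}
    internal = [0] * len(communities)  # e_c: edges with both ends in c
    degree = [0] * len(communities)    # d_c: sum of node degrees in c
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))

def complete_bipartite(left, right):
    """Every left node linked to every right node, no intra-side edges."""
    return [(u, v) for u in left for v in right]

def two_cliques(left, right):
    """Two disjoint cliques with no edges between them."""
    return list(combinations(left, 2)) + list(combinations(right, 2))

n = 8  # total number of nodes, split evenly into two communities
left = [f"u{i}" for i in range(n // 2)]
right = [f"v{i}" for i in range(n // 2)]

q_min = newman_girvan_modularity(complete_bipartite(left, right), [left, right])
q_max = newman_girvan_modularity(two_cliques(left, right), [left, right])
print(q_min, q_max)
```

The complete bipartite graph is just one instance of the first construction: for any bipartite graph partitioned by its two sides, each community has no internal edges and holds half of the total degree, which yields -1/2 in this single-layer setting.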
Appendix B Evaluation with ordered layers
Figure 13 provides further details on the correlation analysis using descending layer ordering.
Appendix C Efficiency results
Figures 14–15 show the computation time of our modularity for the different combinations of resolution and inter-layer coupling factors.
References
- [1] A. Amelio and A. Tagarelli, “Revisiting Resolution and Inter-Layer Coupling Factors in Modularity for Multilayer Networks,” in Proc. IEEE/ACM ASONAM, 2017, pp. 266–273.
- [2] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, 2004.
- [3] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Phys. Rev. E, vol. 69, 2004.
- [4] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 172–188, 2008.
- [5] M. Chen, K. Kuzmin, and B. K. Szymanski, “Community detection via maximization of modularity and its variants,” IEEE Trans. Comput. Social Syst., vol. 1, no. 1, pp. 46–65, 2014.
- [6] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, vol. 70, 2004.
- [7] M. E. J. Newman, “Modularity and community structure in networks,” Proc. Natl. Acad. Sci., vol. 103, no. 23, pp. 8577–8582, 2006.
- [8] S. White and P. Smyth, “A spectral clustering approach to finding communities in graphs,” in Proc. SDM, 2005.
