Adaptive Modularity Maximization via Edge Weighting Scheme

Xiaoyan Lu; Konstantin Kuzmin; Mingming Chen; Boleslaw K. Szymanski

arXiv:1705.04863·cs.SI·October 10, 2017

TL;DR

This paper introduces a novel edge weighting scheme using a regression model to improve modularity-based community detection, addressing the resolution limit problem and enhancing accuracy on real and synthetic networks.

Contribution

It proposes a new edge weighting approach with a regression model based on local topological features to improve community detection accuracy.

Findings

1. Significant performance improvements on real networks

2. Effective edge weighting with linear-time feature extraction

3. Enhanced community detection accuracy in synthetic graphs

Tables

Table 1: Notations

Symbol	Meaning
$V$	the set of nodes
$E$	the set of edges
$d_{c}$	the sum of the degrees of nodes in community $c$
$W$	the sum of all edge weights
$W_{c_{i}, c_{j}}$	the sum of weights of edges connecting communities $c_{i}$ and $c_{j}$
$W_{c}^{in}$	the sum of weights of edges inside community $c$
$W_{c}^{out}$	the sum of weights of edges with exactly one endpoint in community $c$
$W_{c}$	the weight of community $c$, equal to $2 W_{c}^{in} + W_{c}^{out}$
$C$	a partition of the graph, formed by a set of disjoint communities
$\Delta Q_{c_{i}, c_{j}}$	the modularity change caused by joining communities $c_{i}$ and $c_{j}$
$w_{e}$	the weight of edge $e$
$x_{e}$	the topological feature vector of edge $e$
$h()$	the loss function

Table 2: Summary of the networks

No.	Network	#Nodes	#Edges	Type	Ref.
1	American college football	115	613	Real	[10]
2	LFR benchmark	5000	$\approx$ 35000	Synthetic	[18]
3	Amazon product co-purchasing network	334863	925872	Real	[36]
4	DBLP collaboration network	317080	1049866	Real	[36]

Table 3: Metric values characterizing the community structures computed over the original unweighted LFR benchmark networks but discovered by different algorithms. FG: Fast Greedy modularity maximization algorithm on the original unweighted graphs. FG-w: Fast Greedy modularity maximization algorithm running on the weighted graphs produced by our model.

$μ$	Method	VI	NMI	F-measure	ARI	$Q$	$Q_{d s}$
0.45	FG	3.2135	0.5953	0.3379	0.2355	0.4214	0.0366
	Fine-Tuned $Q_{d s}$	1.1523	0.8925	0.8806	0.7337	0.4536	0.1632
	FG-w	0.0137	0.9987	0.9990	0.9972	0.5152	0.1668
0.5	FG	3.5187	0.5481	0.2937	0.1993	0.3739	0.0274
	Fine-Tuned $Q_{d s}$	1.9677	0.8036	0.7489	0.4984	0.3563	0.1196
	FG-w	0.0678	0.9934	0.9950	0.9864	0.4625	0.1381

Table 4: Metric values characterizing the community structures computed over the original unweighted American college football network but discovered in either the original unweighted graph or the corresponding weighted graph produced by our model. FG: Fast Greedy algorithm [8], LE: leading eigenvector method [21], LP: label propagation algorithm [27], RW: community detection based on random walks [26], ML: multilevel algorithm [3], NMI: normalized mutual information, ARI: adjusted Rand index.

Metric	Graph	FG	LE	LP	RW	ML
NMI	Original	0.58528	0.58140	0.76962	0.83833	0.83391
NMI	Weighted	0.91117	0.85903	0.92635	0.91117	0.87272
ARI	Original	0.49333	0.49441	0.71749	0.86938	0.85815
ARI	Weighted	0.94723	0.88982	0.91539	0.94723	0.90085
$Q$	Original	0.56860	0.49326	0.57668	0.60337	0.60503
$Q$	Weighted	0.60140	0.59338	0.57315	0.60140	0.60356
$Q_{d s}$	Original	0.15877	0.13661	0.21106	0.23650	0.23626
$Q_{d s}$	Weighted	0.25696	0.23893	0.24025	0.25696	0.24889

Equations

$$Q(G,C)=\sum_{c_{i}\in C}\left[\frac{|E^{in}_{c_{i}}|}{|E|}-\left(\frac{d_{c_{i}}}{2|E|}\right)^{2}\right]$$

$$Q^{w}(G^{w},C)=\sum_{c_{i}\in C}\left[\frac{W^{in}_{c_{i}}}{W}-\left(\frac{W_{c_{i}}}{2W}\right)^{2}\right]$$

$$|E^{in}_{i}|\geq\sqrt{\frac{|E|}{2}}$$

$$C^{*}=\underset{C=\{c_{i}\},\ \cup_{i}c_{i}=V}{\arg\max}\ Q(G,C)$$

$$\text{Jaccard}(u,v)=\frac{|\mathcal{N}(u)\cap\mathcal{N}(v)|}{|\mathcal{N}(u)\cup\mathcal{N}(v)|}$$

$$\sum_{w\in\mathcal{N}(u)\cap\mathcal{N}(v)}\frac{1}{|\mathcal{N}(w)|}$$

$$\sum_{w\in\mathcal{N}(u)\cap\mathcal{N}(v)}\frac{1}{\log|\mathcal{N}(w)|}$$

$$\text{rel}(u,v)=\frac{\min(|\mathcal{N}(u)|,|\mathcal{N}(v)|)}{\max(|\mathcal{N}(u)|,|\mathcal{N}(v)|)}$$

$$\Delta Q^{w}_{c_{i},c_{j}}=\frac{W_{c_{i},c_{j}}}{W}-\frac{W_{c_{i}}W_{c_{j}}}{2W^{2}}$$

$$\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\leq 0\quad\text{for } i\in I$$

$$\min_{w}F(w)=(\bar{w}-1)^{2}+\lambda_{1}\sigma_{w}^{2}+\lambda_{2}\sum_{1\leq i\leq I}h\left(\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\right)$$

$$w_{e}=p_{0}+\sum_{i=1}^{r}p_{i}x_{e}^{<i>}$$

$$w_{e}=p^{T}x_{e}$$

$$\frac{\partial F(w(p))}{\partial p_{i}}=\frac{\partial F(w)}{\partial w}\times\frac{\partial w}{\partial p_{i}}$$

$$\frac{\partial w}{\partial p_{i}}=\left(x_{1}^{<i>},x_{2}^{<i>},\dots,x_{|E|}^{<i>}\right)$$

$$\frac{\partial F(w)}{\partial w}=\frac{\partial(\bar{w}-1)^{2}}{\partial w}+\lambda_{1}\frac{\partial\sigma_{w}^{2}}{\partial w}+\lambda_{2}\sum_{i\in I}\frac{\partial h\left(\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\right)}{\partial\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}}\frac{\partial\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}}{\partial w}$$

$$W=\sum_{e\in E_{a}}w_{e}=\sum_{e\in E_{a}}p^{T}x_{e}=p^{T}\sum_{e\in E_{a}}x_{e}$$

$$VI(C,GN)=H(C)+H(GN)-2I(C,GN)$$

$$H(C,GN)=-\sum_{c_{i}\in C,\ g_{j}\in GN}\frac{|c_{i}\cap g_{j}|}{|V|}\log\frac{|c_{i}\cap g_{j}|}{|V|}$$

$$NMI(C,GN)=\frac{2I(C,GN)}{H(C)+H(GN)}$$

$$\text{F-measure}(C,GN)=\frac{1}{|V|}\sum_{c_{i}\in C}|c_{i}|\max_{g_{j}\in GN}\frac{2|c_{i}\cap g_{j}|}{|c_{i}|+|g_{j}|}$$

$$ARI(C,GN)=\frac{\sum_{ij}\binom{|c_{i}\cap g_{j}|}{2}-\left[\sum_{i}\binom{|c_{i}|}{2}\sum_{j}\binom{|g_{j}|}{2}\right]\Big/\binom{|V|}{2}}{\frac{1}{2}\left[\sum_{i}\binom{|c_{i}|}{2}+\sum_{j}\binom{|g_{j}|}{2}\right]-\left[\sum_{i}\binom{|c_{i}|}{2}\sum_{j}\binom{|g_{j}|}{2}\right]\Big/\binom{|V|}{2}}$$

$$W^{in}_{c_{i}}=\sum_{e\in E^{in}_{c_{i}}}w_{e}\geq\sum_{e\in E^{in}_{c_{i}}}1=|E^{in}_{c_{i}}|$$

$$Q^{w}(G^{w},C)\geq\sum_{c_{i}\in C}\left[\frac{|E^{in}_{c_{i}}|}{|E|}-\left(\frac{d_{c_{i}}}{2|E|}\right)^{2}\right]=Q(G,C)$$

$$\Delta Q^{w}_{c_{i},c_{j}}\leq\frac{|E_{c_{i},c_{j}}|}{|E|}-\frac{d_{c_{i}}d_{c_{j}}}{2|E|^{2}}=\Delta Q_{c_{i},c_{j}}$$

Full text

Adaptive Modularity Maximization via Edge Weighting Scheme

Xiaoyan Lu

Konstantin Kuzmin

Mingming Chen

Boleslaw K. Szymanski

Department of Computer Science, Rensselaer Polytechnic Institute

Google Inc.

Abstract

Modularity maximization is one of the state-of-the-art methods for community detection that has gained popularity in the last decade. Yet it suffers from the resolution limit problem by preferring under certain conditions large communities over small ones. To solve this problem, we propose to expand the meaning of the edges that are currently used to indicate propensity of nodes for sharing the same community. In our approach this is the role of edges with positive weights while edges with negative weights indicate aversion for putting their end-nodes into one community. We also present a novel regression model which assigns weights to the edges of a graph according to their local topological features to enhance the accuracy of modularity maximization algorithms. We construct artificial graphs based on the parameters sampled from a given unweighted network and train the regression model on ground truth communities of these artificial graphs in a supervised fashion. The extraction of local topological edge features can be done in linear time, making this process efficient. Experimental results on real and synthetic networks show that the state-of-the-art community detection algorithms improve their performance significantly by finding communities in the weighted graphs produced by our model.

Keywords: community detection, scalability, modularity maximization, regularization

Journal: Journal of Information Sciences

1 Introduction

Community structures are observed across a wide variety of networks, including World Wide Web, Internet, collaboration, transportation, social and biochemical networks. Many important tasks, such as data extraction, link prediction, network evolution analysis, and graph mining are based on the community structures discovered in these networks.

Modularity maximization is one of the state-of-the-art methods for community detection that has gained popularity in the last decade. It aims at discovering the partition of the network which maximizes modularity [22], a widely used community quality measure proposed by Newman et al. Modularity measures the difference between the observed fraction of edges within a community and the fraction of edges expected in a random graph with the same number of nodes and the same degree sequence. Thus, high positive modularity indicates the quality of a community structure in the network. Although modularity maximization has been widely used in many applications, in certain cases it tends to merge small communities into large ones, giving rise to the so-called resolution limit problem [11]. In the literature, initially, it was assumed that community structure with maximum modularity is the best. Discovery of the resolution limit problem demonstrated that this is not the case. Another assumption is that the number of communities in the given graph is unknown.

In this paper, we propose to expand the meaning of the edges that are currently used to indicate propensity of nodes for sharing the same community. In our approach this is the role of edges with positive weights, while edges with negative weights indicate aversion to putting their end-nodes into one community. We also propose a novel feature-based edge weighting scheme that learns how the local topological features indicate whether a given edge is intra- or inter-community, using small artificial graphs similar to the network in question. Further, we demonstrate that our proposed regression model assigns weights to edges in such a way that the state-of-the-art community detection algorithms achieve higher accuracy on the produced weighted graphs than they do on the original unweighted ones. Recent work [2] shows that an edge weighting scheme is capable of decreasing the upper bound on the size of communities detectable by modularity maximization. A similar approach has been adopted in [9], where edges are weighted according to their centrality. In contrast to [2, 9], where the edge weighting schemes are specified by experts, we develop a feature-based regression model and use labeled ground truth communities in artificial networks as training data to infer suitable weights for the edges of the input graph. These artificial networks are constructed to have degree distribution and clustering coefficient similar to the original unweighted networks. Considering the comprehensive definition of local community structures across different network instances, the regression model trained on the ground truth communities in the artificial networks is therefore able to assign such weights to the edges that community detection is enhanced. (Footnote 1: If ground truth communities are not available then, thanks to the small size of the artificial graph, we use communities detected algorithmically as ground truth.) Furthermore, the local topological features of edges can be extracted efficiently, so our model converts a graph into a weighted one in time proportional to the number of edges in the network.

The experimental results on real and synthetic networks show that modularity maximization algorithms achieve higher accuracy on weighted graphs than on the original unweighted ones. For example, the optimal modularity obtained by the Fast Greedy algorithm [8] increases by at least 15% on an LFR benchmark [18]. We also show that our approach solves the resolution limit problem on the American college football network [10]. In addition, the state-of-the-art community detection algorithms, including the label propagation algorithm of Raghavan et al. [27], Newman’s leading eigenvector method [21], algorithms based on random walks [26] and the multilevel algorithm of Blondel et al. [3], also improve their performance on the weighted graph produced by our approach, which validates the point that weighting graphs properly guides the algorithm to the desirable community detection results.

This paper is organized as follows. Section 2 introduces the related work on modularity maximization and edge weighting schemes. Section 3 discusses the effectiveness of the edge weighting scheme. The regression model is presented in Section 4, followed by the description of the key speedup improvements of the training algorithm. In Section 5, we describe the experimental results on real and synthetic networks. We close our work with conclusions presented in Section 6.

2 Related Work

2.1 Modularity maximization

The goal of the modularity maximization is to discover community structure in a network by maximizing the modularity, defined as

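$$Q(G,C)=\sum_{c_{i}\in C}\left[\frac{|E^{in}_{c_{i}}|}{|E|}-\left(\frac{d_{c_{i}}}{2|E|}\right)^{2}\right], \qquad (1)$$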

where $G=(V,E)$ is an unweighted, undirected graph with the node set $V$ and the edge set $E$ ; $C=\{c_{i}\}$ is a partition of $G$ into communities, $c_{i}$ is the set of nodes in the $i$ -th community, $d_{c_{i}}$ is the sum of degrees of nodes in $c_{i}$ , $E_{c_{i}}^{in}$ denotes the set of edges residing within community $c_{i}$ .

The modularity can be naturally extended to the networks with weighted edges by replacing the count of edges with the sum of their weights. Hence, the weighted modularity is defined as

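$$Q^{w}(G^{w},C)=\sum_{c_{i}\in C}\left[\frac{W^{in}_{c_{i}}}{W}-\left(\frac{W_{c_{i}}}{2W}\right)^{2}\right], \qquad (2)$$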

where $W$ is the sum of weights of edges in the entire graph, $W_{c_{i}}^{in}$ is the sum of weights of edges within community $c_{i}$ , and the weight of a community is defined as $W_{c_{i}}=2W_{c_{i}}^{in}+W_{c_{i}}^{out}$ where $W_{c_{i}}^{out}$ is the sum of weights of edges with exactly one endpoint inside $c_{i}$ . The original definition of modularity is a special case of the weighted version when the weight of every edge is 1.
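The weighted modularity of Eq. (2) can be evaluated in one pass over the edges. The following is a minimal sketch, not the authors' implementation; the function name `weighted_modularity`, the `(u, v, w)` edge tuples, and the `community_of` mapping are illustrative assumptions, and the total weight $W$ is assumed to be nonzero.

```python
from collections import defaultdict


def weighted_modularity(edges, community_of):
    """Evaluate Q^w of Eq. (2).

    edges: list of (u, v, w) tuples for an undirected graph (hypothetical layout).
    community_of: dict mapping each node to its community label.
    """
    total = 0.0                    # W, the sum of all edge weights
    w_in = defaultdict(float)      # W_c^in per community
    w_out = defaultdict(float)     # W_c^out per community
    for u, v, w in edges:
        total += w
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            w_in[cu] += w
        else:
            w_out[cu] += w
            w_out[cv] += w
    q = 0.0
    for c in set(community_of.values()):
        w_c = 2.0 * w_in[c] + w_out[c]          # W_c = 2 W_c^in + W_c^out
        q += w_in[c] / total - (w_c / (2.0 * total)) ** 2
    return q


# Two triangles joined by a single bridge edge, all weights equal to 1.
TRIANGLES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
PARTITION = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
Q_EXAMPLE = weighted_modularity(TRIANGLES, PARTITION)
```

With all weights set to 1 this reduces to the unweighted modularity of Eq. (1), which for the toy graph is $2(3/7-1/4)=5/14$.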

Many algorithms, including [3, 8, 20, 23, 31, 34], were proposed to discover communities in a network by maximizing the modularity. One interesting finding is that Newman’s modularity measure is related to the broader family of spectral clustering methods [34]. There are two categories of spectral algorithms for maximizing modularity: one is based on the modularity matrix [21, 22, 28], the other is based on the Laplacian matrix of a network [34, 29]. The first greedy algorithm, Fast Greedy [8], iteratively merges communities in the network to maximize the modularity. Initially, every node is a single community. In every step, the algorithm evaluates the partition obtained by temporarily merging each pair of communities and then merges the pair that yields the largest modularity. After $|V|-1$ steps, a single community remains in the network and a total of $|V|$ partitions have been generated, counting the initial one. The algorithm then outputs the partition with the largest modularity.
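The agglomerative procedure described above can be sketched naively as follows. This is an illustration only: it recomputes the modularity of Eq. (1) for every candidate pair and therefore lacks the efficient data structures of Fast Greedy [8]. The names `modularity` and `greedy_agglomeration` are hypothetical, and node labels are assumed to be sortable.

```python
from itertools import combinations


def modularity(edges, labels):
    """Unweighted modularity of Eq. (1) for edges given as (u, v) pairs."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def greedy_agglomeration(nodes, edges):
    """Merge the best pair at every step and return the best partition seen."""
    labels = {v: v for v in nodes}          # every node starts as its own community
    best_q, best_labels = modularity(edges, labels), dict(labels)
    while len(set(labels.values())) > 1:
        step_q, step_labels = None, None
        for a, b in combinations(sorted(set(labels.values())), 2):
            # Temporarily merge community b into community a.
            trial = {v: (a if c == b else c) for v, c in labels.items()}
            q = modularity(edges, trial)
            if step_q is None or q > step_q:
                step_q, step_labels = q, trial
        labels = step_labels                # commit the best merge of this step
        if step_q > best_q:
            best_q, best_labels = step_q, dict(labels)
    return best_labels, best_q
```

On two triangles joined by one edge, the sketch stops improving once the two triangles are formed, since joining them would lower the modularity.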

The greedy algorithms solve the maximization problem efficiently, yet they suffer from the resolution limit problem. This problem is defined as an increase of modularity when small well-formed (or ground truth) communities are undesirably joined together into a large community. As pointed out in [11, 12], this problem arises because the definition of modularity does not penalize for the increase of the diameter in a community created by merging together smaller ones. In recent work [5], Chen et al. introduced a new quality metric, called modularity density, to limit such bias towards large communities. The new metric is also shown to be able to handle another known weakness of modularity, the counterproductive splitting of large communities. This is because the modularity density takes into account the density of discovered communities and penalizes the splitting of large communities. The fine-tuned $Q_{ds}$ algorithm [6] was proposed to maximize this new quality metric.

2.2 Edge weighting scheme

Efforts have been made to improve the performance of community detection by using fine-tuned similarity measures between pairs of nodes. Such methods enhance the performance of clustering algorithms via smart edge weighting strategies. In [30], the edge weight is obtained by fusing content (pictures, tags, text) and link information (friends, followers, users) for community detection in social networks. In [19], the authors use a set of must-link links and cannot-link links as constraints of the symmetric non-negative matrix factorization (SNMF) approach to improve the quality of discovered communities. The must-link links (edges residing within communities) and the cannot-link links (edges connecting nodes in different communities) are presumed to be known in advance. With a focus on link prediction and recommendations, Leskovec et al. [1] proposed the supervised random walk algorithm which converts an unweighted graph to a weighted graph in order to improve the performance of the random walk algorithm. Recent work [35] proposed a random walk based approach to assign weights to nodes so that irrelevant nodes called free-riders can be excluded from some small communities. Other works including [7, 17, 2] discuss the limitations of modularity maximization in unweighted graphs and use edge weighting schemes to improve the performance of modularity maximization algorithms.

3 Edge Weighting Scheme to Enhance Community Detection

As shown in [11], for the modularity maximization to be able to find a community $c_{i}$ with $|E^{in}_{i}|$ edges inside, the following inequality must hold,

$$|E^{in}_{i}|\geq\sqrt{\frac{|E|}{2}}, \qquad (3)$$

where $E^{in}_{i}$ is the set of the edges inside community $c_{i}$ , and $E$ is the set of edges of the entire graph. In large networks with millions of edges, the number of edges in most communities is often smaller than this lower bound. In [2], it has been shown that an edge weighting scheme is capable of decreasing this theoretical bound and enhancing community detection performance in practice. Inspired by this result, we define an edge weighting scheme that enhances a particular community as follows.

Definition 1

An edge weighting scheme enhances a community $c_{i}$ by the sum of additional weights $e_{i}=\sum_{e\in{E^{in}_{c_{i}}}}(w_{e}-1)>0$ if $w_{e}\geq 1$ for all $e\in E^{in}_{c_{i}}$ and $w_{e}\leq 1$ for all $e\in E^{out}_{c_{i}}$ , while $W_{c_{i}}=d_{c_{i}}$ holds. Such a scheme is a balanced enhancement if both communities connected by a cross-community edge with decreased weight are enhanced.

The edge weighting scheme enhances a community $c_{i}$ by increasing the weights of edges residing within this community by a total of $e_{i}>0$ , while reducing the total weight of edges crossing to other communities by $2e_{i}$ . The factor of two follows from $W_{c_{i}}=2W_{c_{i}}^{in}+W_{c_{i}}^{out}$ : adding $e_{i}$ inside the community adds $2e_{i}$ to $W_{c_{i}}$ , so the same amount must be removed from $W_{c_{i}}^{out}$ to keep $W_{c_{i}}$ equal to $d_{c_{i}}$ . It is worth noting that a balanced enhancement preserves the total weight of the original graph, which is $W=|E|$ . Here, we show that such a balanced weighting scheme never decreases the modularity of a partition.

Theorem 1

$Q^{w}(G^{w},C)\geq Q(G,C)$ if the weighting scheme is balanced.

Proof 1

See Appendix A.

Although the edge weighting scheme which enhances one community in a partition always increases this community’s modularity, it does not necessarily guarantee that such enhanced partition would maximize modularity in the weighted graph. Here, we define a notion of locally maximal partition and prove that the proper edge weighting scheme can preserve such property.

Definition 2

Modularity of a partition $C$ is locally maximal if the modularity decreases upon splitting any community in $C$ or joining any two communities in $C$ .

Theorem 2

$\Delta Q^{w}_{c_{i},c_{j}}\leq\Delta Q_{c_{i},c_{j}}$ if $c_{i},c_{j}$ are enhanced by the balanced edge weighting scheme.

Proof 2

See Appendix A.

Theorem 3

If a community $c$ with $d_{c}\leq\sqrt{8|E|}$ is enhanced by the balanced edge weighting scheme and split into communities $c_{i},c_{j}$ , and $\Delta Q_{c_{i},c_{j}}\geq 0$ , then also $\Delta Q^{w}_{c_{i},c_{j}}\geq 0$ .

Proof 3

See Appendix A.

From Theorems 2 and 3 it follows immediately that if partition $C^{*}$ is locally maximal and each community $c$ satisfies the condition $d_{c}\leq\sqrt{8|E|}$ and is enhanced by the balanced edge weighting scheme, the modularity of this partition is locally maximal also for the weighted graph $G^{w}$ .

Since, by Theorem 2, joining communities $c_{i}$ and $c_{j}$ makes change $\Delta Q^{w}_{c_{i},c_{j}}\leq\Delta Q_{c_{i},c_{j}}$ , it is entirely possible that $\Delta Q^{w}_{c_{i},c_{j}}\leq 0<\Delta Q_{c_{i},c_{j}}$ . Thus, modularity maximization for the graph with the enhanced weights will avoid joining possibly well-formed communities $c_{i}$ and $c_{j}$ while the maximization on the original graph would join them. This example demonstrates that if the well-formed small communities are enhanced, then their chances of being detected will increase. This observation motivates us to propose a regression model for assigning weights to edges so that the real ground truth communities can be enhanced in a network.
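A small numeric check of Theorems 1 and 2 may help here. The sketch below uses a toy graph of two triangles joined by one bridge edge and applies a balanced enhancement: a total weight $e$ is added inside each triangle and $2e$ is removed from the bridge. The graph, the function names, and the choice $e=0.25$ are illustrative assumptions, not taken from the paper.

```python
def community_term(w_in, w_out, total):
    """One summand of the weighted modularity: W_in/W - (W_c/(2W))^2."""
    w_c = 2.0 * w_in + w_out
    return w_in / total - (w_c / (2.0 * total)) ** 2


def merge_gain(w_between, w_ci, w_cj, total):
    """Modularity change upon joining two communities:
    W_ij/W - W_ci*W_cj/(2 W^2)."""
    return w_between / total - (w_ci * w_cj) / (2.0 * total ** 2)


def enhance_two_triangles(e):
    """Add total weight e inside each triangle and remove 2e from the bridge.

    Returns (modularity of the two-triangle partition,
             gain of merging the triangles, total weight W).
    """
    w_in = 3.0 + e                   # three unit edges plus the added weight
    bridge = 1.0 - 2.0 * e           # may become negative for e > 0.5
    total = 2.0 * w_in + bridge      # remains |E| = 7
    w_c = 2.0 * w_in + bridge        # remains d_c = 7
    q = 2.0 * community_term(w_in, bridge, total)
    return q, merge_gain(bridge, w_c, w_c, total), total


Q_PLAIN, GAIN_PLAIN, W_PLAIN = enhance_two_triangles(0.0)
Q_ENH, GAIN_ENH, W_ENH = enhance_two_triangles(0.25)
```

In this toy case the modularity of the partition rises from $5/14$ to $3/7$ while the gain of merging the two triangles falls from $-5/14$ to $-3/7$, in line with the two theorems; the total weight stays at 7.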

4 Approach

4.1 Overview

Provided a graph $G=(V,E)$ , the modularity maximization problem is to find a partition of the graph that maximizes the modularity. A partition of the graph is defined as a set of disjoint communities $C=\{c_{i}\}$ . The modularity maximization seeks to find the partition $C^{*}$ such that,

$$C^{*}=\underset{C=\{c_{i}\},\ \cup_{i}c_{i}=V}{\arg\max}\ Q(G,C), \qquad (4)$$

where $Q(G,C)$ is defined by Eq. (1). Since the modularity maximization problem is known to be $\mathcal{NP}$-hard [4], almost all proposed solutions are heuristics which do not guarantee the optimality of the partition. In this paper, we follow the same paradigm of the original modularity maximization to detect communities but seek to assign weights to the edges to improve the quality of results. To be precise, a regression model is developed to convert an unweighted graph $G$ to a weighted graph $G^{w}$ so that modularity maximization finds communities of better quality by maximizing modularity in $G^{w}$ rather than in $G$ . The regression model takes the local topological features of edges as input and outputs the weight of every edge. Notations used in this paper are listed in Table 1.

As illustrated in Figure 1, the proposed procedure is divided into the following three steps:

1. Artificial network construction estimates the network parameters of the input graph and constructs a similar artificial graph in which the ground truth communities are known beforehand by construction. The goal is to ensure these ground truth communities can be successfully separated in the modularity maximization process. The construction scheme is described in Section 4.2.

2. Edge feature extraction is executed on the edges of the artificial graph. The edge features are used as input to the regression model. The specific features selected by us for this purpose are discussed in Section 4.3.

3. Regression on edge weights uses a regression model to compute the edge weights such that the modularity maximization is able to separate adjacent ground truth communities in the artificial network. Section 4.4 covers the details of the regression model and the corresponding training algorithm.

4.2 Artificial network construction

The first step is to construct a small artificial network with the ground truth communities and with topological properties similar to the properties of the input graph. The negative edge weights are introduced to discourage the algorithm from merging ground truth communities connected by cross community edges. For clarity, we describe the usage of these ground truth communities in Section 4.4 and here we focus on the construction scheme.

Given a large unweighted input graph, our approach constructs an artificial network which shares degree distribution and clustering coefficients with the input graph. Specifically, multiple Stochastic Block Model (SBM) networks [15] are created with high intra-block edge densities and with a few randomly chosen inter-block edges, resulting in a relatively small inter-block edge density and with blocks forming ground truth communities. Then, the edges in these SBM graphs are randomly removed from network instances until the average node degree becomes close to that of the input graph. Among all SBM network instances, the one with the average clustering coefficient closest in its value to the input graph is chosen as the final artificial network.
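The construction just described can be sketched in plain Python as follows. This is a rough illustration under several assumptions that the text does not fix: the block sizes, the edge probabilities `p_in` and `p_out`, the number of candidate instances, and all function names are hypothetical. Removing random edges "until the average degree is close to the target" is implemented here by keeping a uniformly random subset of the right size.

```python
import random
from itertools import combinations


def sbm_edges(block_sizes, p_in, p_out, rng):
    """Sample a Stochastic Block Model; blocks act as ground truth communities."""
    block_of, node = {}, 0
    for block, size in enumerate(block_sizes):
        for _ in range(size):
            block_of[node] = block
            node += 1
    edges = [(u, v) for u, v in combinations(range(node), 2)
             if rng.random() < (p_in if block_of[u] == block_of[v] else p_out)]
    return edges, block_of


def thin_to_average_degree(edges, n_nodes, target_degree, rng):
    """Randomly drop edges so that 2|E|/n is at most the target average degree."""
    edges = list(edges)
    rng.shuffle(edges)
    keep = min(len(edges), int(round(target_degree * n_nodes / 2.0)))
    return edges[:keep]


def average_clustering(edges, n_nodes):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    nbrs = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k >= 2:
            links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
            total += 2.0 * links / (k * (k - 1))
    return total / n_nodes


def build_artificial_network(target_degree, target_clustering, block_sizes,
                             p_in=0.8, p_out=0.02, candidates=10, seed=0):
    """Return (edges, block_of) of the candidate closest in average clustering."""
    rng = random.Random(seed)
    n_nodes = sum(block_sizes)
    best = None
    for _ in range(candidates):
        edges, block_of = sbm_edges(block_sizes, p_in, p_out, rng)
        edges = thin_to_average_degree(edges, n_nodes, target_degree, rng)
        gap = abs(average_clustering(edges, n_nodes) - target_clustering)
        if best is None or gap < best[0]:
            best = (gap, edges, block_of)
    return best[1], best[2]
```

In practice `target_degree` and `target_clustering` would be measured on the input graph; the quadratic pair enumeration is acceptable only because the artificial network is small.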

The ground truth communities (i.e., the nodes in blue and red in Figure 1) in the artificial network are used as training data to infer the parameters of the regression model. As Theorem 2 suggests, if the correct communities have been enhanced, then the probability of properly detecting these communities would increase. Therefore, the regression model incorporates these ground truth communities into the detection algorithm framework to enhance it. This approach will be discussed in detail in Section 4.4.

4.3 Extracting edge features

Since communities are considered local structures, the second step of our approach is to extract the local topological features of every edge in the network. For each edge $e=(u,v)$ in the graph, the following local topological features are extracted efficiently from the network.

f-1. The square root of the number of common neighbors, $\sqrt{|\mathcal{N}(u)\cap\mathcal{N}(v)|}$ , where $\mathcal{N}(v)$ denotes the set of neighbors of node $v$ .

f-2. The difference in clustering coefficients of the endpoints, $|c(u)-c(v)|$ , where $c(v)$ denotes the clustering coefficient of node $v$ .

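f-3. Jaccard coefficient, which is defined as

$$\text{Jaccard}(u,v)=\frac{|\mathcal{N}(u)\cap\mathcal{N}(v)|}{|\mathcal{N}(u)\cup\mathcal{N}(v)|} \qquad (5)$$

f-4. Resource allocation index, which is defined as

$$\sum_{w\in\mathcal{N}(u)\cap\mathcal{N}(v)}\frac{1}{|\mathcal{N}(w)|} \qquad (6)$$

f-5. Adamic-Adar index, which is defined as

$$\sum_{w\in\mathcal{N}(u)\cap\mathcal{N}(v)}\frac{1}{\log|\mathcal{N}(w)|} \qquad (7)$$

f-6. Relative degree ratio, which is defined as

$$\text{rel}(u,v)=\frac{\min(|\mathcal{N}(u)|,|\mathcal{N}(v)|)}{\max(|\mathcal{N}(u)|,|\mathcal{N}(v)|)} \qquad (8)$$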

When the degrees of nodes $u$ , $v$ are equal, $\text{rel}(u,v)=1$ .
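The six features f-1 to f-6 can be computed from neighbor sets alone. The sketch below is a minimal illustration with hypothetical function names; it recomputes clustering coefficients on every call, whereas an efficient implementation would presumably cache them per node.

```python
import math
from itertools import combinations


def build_neighbors(edges):
    """Map every node to the set of its neighbors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def clustering(nbrs, v):
    """Local clustering coefficient c(v); 0 for nodes of degree below 2."""
    k = len(nbrs[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
    return 2.0 * links / (k * (k - 1))


def edge_features(nbrs, u, v):
    """Feature list [f-1, ..., f-6] for an existing edge (u, v)."""
    common = nbrs[u] & nbrs[v]
    union = nbrs[u] | nbrs[v]          # non-empty because u and v are adjacent
    du, dv = len(nbrs[u]), len(nbrs[v])
    return [
        math.sqrt(len(common)),                              # f-1
        abs(clustering(nbrs, u) - clustering(nbrs, v)),      # f-2
        len(common) / len(union),                            # f-3 Jaccard
        sum(1.0 / len(nbrs[w]) for w in common),             # f-4 resource allocation
        # A common neighbor has degree >= 2, so the logarithm is positive.
        sum(1.0 / math.log(len(nbrs[w])) for w in common),   # f-5 Adamic-Adar
        min(du, dv) / max(du, dv),                           # f-6 relative degree
    ]
```

On two triangles joined by a bridge, an edge inside a triangle has one common neighbor and a Jaccard coefficient of $1/3$, while the bridge has no common neighbors, so f-1, f-3, f-4 and f-5 all vanish for it.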

The attributes of edges or nodes, such as text content and user profiles, can also be used as features, if they are available. Using more local topological features generally leads to better accuracy because more information is embedded in these features.

4.4 Regression model

As pointed out in [11], community detection algorithms based on modularity maximization tend to execute counterproductive merges of small communities into large ones. One way to handle this resolution limit problem is to cause such merging operations to decrease the modularity.

According to the definition of weighted modularity, the change in $Q$ upon joining two communities $c_{i}$ and $c_{j}$ is

$$\Delta Q^{w}_{c_{i},c_{j}}=\frac{W_{c_{i},c_{j}}}{W}-\frac{W_{c_{i}}W_{c_{j}}}{2W^{2}}, \qquad (9)$$

where $W_{c_{i},c_{j}}$ is the sum of weights of the edges between $c_{i}$ and $c_{j}$ , $W_{c_{i}}=2W_{c_{i}}^{in}+W_{c_{i}}^{out}$ is twice the sum of weights of the edges inside community $c_{i}$ plus the sum of weights of the edges with exactly one endpoint in $c_{i}$ , and $W$ is the sum of weights of all edges.

To avoid the merging of some pairs of small communities $\{(c_{i}^{1},c_{i}^{2})\}_{i\in I}$ existing in the artificial networks described in Section 4.2, joining them should cause a decrease of modularity, hence

$$\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\leq 0\quad\text{for } i\in I, \qquad (10)$$

where the parameter $I$ defines the number of pairs of small communities to be selected from the artificial network.

Using the penalty method, the optimization problem can be formulated as

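$$\min_{w}F(w)=(\bar{w}-1)^{2}+\lambda_{1}\sigma_{w}^{2}+\lambda_{2}\sum_{1\leq i\leq I}h\left(\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\right), \qquad (11)$$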

where $w=\{w_{e}\}$ is the set of the weights of edges in the entire graph, $\sigma_{w}^{2}$ is the variance of the edge weights, $\bar{w}$ is the average edge weight $\frac{\sum_{e\in E}{w_{e}}}{|E|}$ , $h(x)$ is the loss function such as the sigmoid function $h(x)=\frac{1}{1+e^{-x}}$ , $\lambda_{1}$ is a constant, and $\lambda_{2}$ is a coefficient for the penalty terms.

The regularization term $(\bar{w}-1)^{2}$ ensures that the resulting average edge weight is close to 1. Using regularization on $\bar{w}$ directly is likely to result in very small weights that are inconvenient in community detection tasks. For the same reason, the regularization on the variance of edge weights $\sigma_{w}^{2}$ limits the total number of negative edges. When $\lambda_{1}\gg\lambda_{2}$ , the above optimization problem converges at $w_{e}=1$ for $\forall e\in E$ , yielding the weights of edges in an unweighted graph.
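To make the penalized objective of Eq. (11) concrete, the following sketch evaluates it for a given weight vector. It is an illustration under stated assumptions: the variance is taken as the population variance, communities are given as Python sets of nodes, the default values of `lam1` and `lam2` are arbitrary, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Loss h(x) = 1 / (1 + exp(-x)); large when a merge would raise modularity."""
    return 1.0 / (1.0 + math.exp(-x))


def merge_gain(edges, weights, ci, cj):
    """Delta Q^w of Eq. (9) for joining the disjoint node sets ci and cj."""
    total = sum(weights)
    w_between = w_ci = w_cj = 0.0
    for (u, v), w in zip(edges, weights):
        in_i = (u in ci) + (v in ci)     # number of endpoints inside ci
        in_j = (u in cj) + (v in cj)
        w_ci += in_i * w                 # inner edges count twice, boundary edges once
        w_cj += in_j * w
        if in_i == 1 and in_j == 1:      # one endpoint in each community
            w_between += w
    return w_between / total - (w_ci * w_cj) / (2.0 * total ** 2)


def objective(edges, weights, pairs, lam1=1.0, lam2=1.0, loss=sigmoid):
    """F(w) = (mean - 1)^2 + lam1 * variance + lam2 * sum of losses."""
    n = len(weights)
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n
    penalty = sum(loss(merge_gain(edges, weights, ci, cj)) for ci, cj in pairs)
    return (mean - 1.0) ** 2 + lam1 * variance + lam2 * penalty
```

With all weights equal to 1 the first two terms vanish and only the penalty term remains; moving weight from a cross-community edge to the edges inside the two communities lowers the penalty at the cost of a larger variance term.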

So far, we have presented an optimization method of modifying edge weights which helps avoid improper merging of communities. However, it involves as many variables as the total number of edges and does not guarantee that edges with similar features have similar weights. Let us denote the $i$ -th topological feature of an edge $e$ as $x_{e}^{<i>}$ . The weight of an edge $e$ is obtained by the linear regression

$$w_{e}=p_{0}+\sum_{i=1}^{r}p_{i}x_{e}^{<i>} \qquad (12)$$

where $p_{i}$ is the parameter of the $i$ -th feature, for $i=1,\ldots,r$ . Let the feature vector of an edge $e$ be $x_{e}=(1,x_{e}^{<1>},x_{e}^{<2>},\ldots,x_{e}^{<r>})^{T}$ . Then Eq. (12) can be rewritten as

$$w_{e}=p^{T}x_{e} \qquad (13)$$

where vector $p=(p_{0},p_{1},\ldots,p_{r})^{T}$ . This way, the objective function in Eq. (11) becomes a function over $p$ .
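As a minimal sketch of the linear model described above, the following computes an edge weight as the inner product of the parameter vector $p$ with the feature vector $x_{e}$, where a leading 1 is prepended for the intercept $p_{0}$. The function name and the example feature values are hypothetical.

```python
def edge_weight(p, features):
    """Return w_e = p^T x_e with x_e = (1, x^<1>, ..., x^<r>).

    `p` holds (p_0, p_1, ..., p_r); `features` holds the r topological
    features of one edge (illustrative values, not from the paper).
    """
    x = [1.0] + list(features)
    if len(x) != len(p):
        raise ValueError("p must have one more entry than features")
    return sum(p_i * x_i for p_i, x_i in zip(p, x))


# Intercept 0.5, two features with parameters 2.0 and -1.0.
print(edge_weight([0.5, 2.0, -1.0], [1.0, 3.0]))  # 0.5 + 2.0 - 3.0 = -0.5
```

A negative result such as this one corresponds to an edge that discourages placing its two endpoints in the same community.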

The first order partial derivative of the objective function over $p_{i}$ is

[TABLE]

The second term on the right side of the above equation is

[TABLE]

The first term on the right side of Eq. (14) is

[TABLE]

where the partial derivative $\frac{\partial h(\Delta Q_{c_{i}^{1},c_{i}^{2}})}{\partial\Delta Q_{c_{i}^{1},c_{i}^{2}}}$ is obtained from the specific loss function $h()$ . It is also easy to compute the partial derivative $\frac{\partial\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}}{\partial w}$ according to Eq. (9).
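For illustration only, the chain rule behind this term can be written out under two assumptions taken from the text: the linear model $w_{e}=p^{T}x_{e}$ (with $x_{e}^{<0>}=1$ for the intercept) and the sigmoid loss $h$. This is a sketch of the reasoning, not necessarily the exact form of the authors' equation.

```latex
\frac{\partial h\bigl(\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\bigr)}{\partial p_{j}}
  = h'\bigl(\Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}\bigr)
    \sum_{e\in E}
    \frac{\partial \Delta Q^{w}_{c_{i}^{1},c_{i}^{2}}}{\partial w_{e}}\,
    \frac{\partial w_{e}}{\partial p_{j}},
\qquad
\frac{\partial w_{e}}{\partial p_{j}} = x_{e}^{<j>},
\qquad
h'(x) = h(x)\bigl(1-h(x)\bigr).
```

Because $\partial w_{e}/\partial p_{j}$ is just a feature value, the gradient with respect to $p$ reuses the same per-edge derivatives for every parameter.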

Algorithm. To solve the optimization problem presented above, we can apply a quasi-Newton method, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [24], which requires only the first derivative of the objective function to find the optimal result. The pseudo code of the training algorithm is presented in Algorithm 1.

During the training phase, $|I|$ pairs of ground truth communities $\{c_{i}^{1},c_{i}^{2}\}_{i\in I}$ can be chosen randomly from the artificial network assuming that the ground truth communities are provided. One efficient way to obtain the required number of pairs of ground truth communities is to sample adjacent communities in the artificial network randomly until $|I|$ pairs are collected. As indicated by Theorem 3, small communities are preferred to large communities here. Hence, we can set an upper bound on the size of the chosen communities. After the parameters are inferred, the regression model which assigns weights to edges can be applied to enhance the performance of community detection algorithms.
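The sampling step described above can be sketched as follows. This is a simplified variant that first enumerates all adjacent community pairs satisfying the size bound and then draws from them at random, rather than sampling until $|I|$ pairs are found; the function name, the `max_size` parameter and the toy data are assumptions made for illustration.

```python
import random


def sample_adjacent_pairs(edges, membership, num_pairs, max_size, seed=0):
    """Pick up to `num_pairs` distinct pairs of adjacent small communities.

    `membership` maps node -> ground truth community label. Two communities
    are adjacent if at least one edge connects them. Communities larger
    than `max_size` nodes are skipped (the upper bound on community size).
    """
    sizes = {}
    for community in membership.values():
        sizes[community] = sizes.get(community, 0) + 1

    candidates = set()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv and sizes[cu] <= max_size and sizes[cv] <= max_size:
            candidates.add((min(cu, cv), max(cu, cv)))

    ordered = sorted(candidates)          # deterministic order before shuffling
    random.Random(seed).shuffle(ordered)
    return ordered[:num_pairs]


toy_membership = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c", 6: "c"}
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(sample_adjacent_pairs(toy_edges, toy_membership, num_pairs=5, max_size=2))
```

With `max_size=2` only the pair of two-node communities qualifies, because the third community has three nodes.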

The time complexity of Algorithm 1 is $O(k(|I|+|E_{a}|))$ where $k$ is the number of BFGS iterations before the algorithm converges, $|I|$ is the number of constraints in Eq. (10) and $|E_{a}|$ is the total number of edges in the artificial graph. In order to accelerate the computation, we adopt the following key speedup improvements.

To compute the change of modularity upon joining two communities, the weights of all edges in the artificial graph need to be summed up which takes significant amount of time in each BFGS step. The summation of weights is

$$W=\sum_{e\in E}w_{e}=\sum_{e\in E}p^{T}x_{e}=p^{T}\sum_{e\in E}x_{e} \qquad (17)$$

which can be calculated efficiently because $\sum_{e}{x_{e}}$ needs to be computed only once at the outset of the optimization process, and $W$ is re-computed as the inner product of $p$ and $\sum_{e}{x_{e}}$ in every iteration. The sums of weights of the edges related to each community $c$, such as $W^{in}_{c}$ and $W^{out}_{c}$, and the variance of edge weights $\sigma_{w}^{2}$ can also be computed in a similar manner. Note that this speedup is possible because we intentionally use linear regression to compute the edge weights in Eq. (12). If a non-linear regression function were used instead, Eq. (17) would not hold and obtaining the sums of weights would generally take more time.
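The speedup can be illustrated with a short sketch: the feature vectors are summed once, after which the total weight for any parameter vector $p$ costs a single inner product instead of a pass over all edges. The helper names and feature values below are hypothetical.

```python
def feature_sum(feature_rows):
    """Sum the augmented vectors x_e = (1, x^<1>, ..., x^<r>) over all edges.

    Computed once, before the optimization starts.
    """
    r = len(feature_rows[0])
    total = [0.0] * (r + 1)
    for row in feature_rows:
        total[0] += 1.0                      # intercept component
        for i, value in enumerate(row, start=1):
            total[i] += value
    return total


def total_weight(p, summed):
    """W = p^T (sum_e x_e), evaluated in O(r) per iteration."""
    return sum(p_i * s_i for p_i, s_i in zip(p, summed))


rows = [[1.0, 2.0], [0.0, 4.0], [3.0, 1.0]]   # three edges, two features each
p = [1.0, 0.5, -0.25]
summed = feature_sum(rows)

# Naive alternative: compute every edge weight, then add them up.
naive = sum(p[0] + p[1] * a + p[2] * b for a, b in rows)
print(total_weight(p, summed), naive)
```

Both routes give the same total, but only the first avoids touching every edge in each BFGS iteration.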

In our algorithm, the edges with both endpoints not in any communities in pairs $\{c_{i}^{1},c_{i}^{2}\}$ for $i\in I$ are not involved in the computation of every BFGS iteration. The number of edges involved in every BFGS iteration is at most $2|I|Z$ where $Z$ is the average number of edges in communities in pairs $\{c_{i}^{1},c_{i}^{2}\}$ . So, the time complexity is reduced to $O(|E_{a}|+k|I|\times 2|I|Z)=O(|E_{a}|+k|I|^{2}Z)$ . In practice, this accelerated algorithm provided at least a 50-fold speedup compared to Algorithm 1.

Interpretation of edge weight in social networks. Edges are usually considered equally important in many community detection applications. But are the relationships between individuals really equally important with respect to forming communities in social networks? In real-world cases, one may know many people and meet with them regularly, but trust only a few. The weight of a connection could be interpreted as the strength of the trust between people, or the strength of their social influence on each other. The inference of social influence has been studied in [14, 32]. Compared to these publications, our work focuses on assigning the edge weights in a way that assists the formation of communities rather than explaining the spreading of opinions or ideas by social influence. Compared to other edge weighting schemes [7, 17, 2], the proposed regression model learns the edge weighting scheme from real ground truth communities in a supervised fashion. In addition, our work assigns a one-dimensional weight to each edge as a scalar quantitative measure, yet the weight could be extended to a multi-dimensional measure of the strength of influence or trust in different contexts.

It is worth noting that our approach is a novel pre-processing tool to enhance community detection algorithms in most cases, even if they are not based on modularity maximization. However, since the proposed edge weighting scheme aims at improving the modularity maximization approaches, community detection algorithms based on other principles are not guaranteed to perform better on the weighted networks than they do on the original unweighted networks.

5 Experimental Results

In this section, we describe the experimental results obtained on real and synthetic networks. We compare the accuracy of modularity maximization algorithms running on original unweighted graphs and weighted graphs produced by our model. The experimental settings and evaluation metrics are explained in Section 5.1. The experimental results on synthetic and real networks are presented in Section 5.2.

5.1 Simulation configurations

To evaluate the performance, the state-of-the-art greedy modularity maximization algorithm, Fast Greedy [8], is executed on several real and synthetic networks. The regression model is trained by sampling the ground truth communities in artificial Stochastic Block Model (SBM) networks [15] in which the ground truth communities are complete, dense and well-defined [25]. In the SBM networks, nodes are connected to one another with particular edge densities, depending on their membership in the pre-defined communities. The artificial SBM network used by our model is constructed as follows: multiple SBM network instances are created with a high intra-edge density and a random, relatively small inter-edge density. Then, edges are randomly removed from the network instances until the average node degree becomes close to that of the input graph. Among all SBM instances, the one whose average clustering coefficient is closest to that of the input graph is used to train the regression model. The convergence of our training algorithm and the construction of SBM networks take only a few seconds.
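The construction of one SBM instance can be sketched as below, under simplifying assumptions: a single intra-community density `p_in`, a single inter-community density `p_out`, and random edge removal down to a target average degree. All names and parameter values are hypothetical, and the final selection among instances by average clustering coefficient is not shown.

```python
import random


def build_sbm(block_sizes, p_in, p_out, target_avg_degree, seed=0):
    """Generate one SBM instance, then thin it to a target average degree.

    Returns (edges, membership), where membership[v] is the block of node v.
    """
    rng = random.Random(seed)
    membership = []
    for block, size in enumerate(block_sizes):
        membership.extend([block] * size)
    n = len(membership)

    # Connect each node pair with the density assigned to its block pair.
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            density = p_in if membership[u] == membership[v] else p_out
            if rng.random() < density:
                edges.append((u, v))

    # Randomly remove edges until the average degree 2|E|/n is at most the target.
    max_edges = int(target_avg_degree * n / 2)
    if len(edges) > max_edges:
        rng.shuffle(edges)
        edges = edges[:max_edges]
    return edges, membership


edges, membership = build_sbm([10, 10, 10], p_in=0.9, p_out=0.05,
                              target_avg_degree=4.0, seed=1)
print(len(edges), 2 * len(edges) / len(membership))
```

Repeating this with different seeds and inter-edge densities would yield the pool of instances from which one is selected for training.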

The details of the tested networks are summarized in Table 2. The regression model converts each graph into a weighted one. Then, the Fast Greedy algorithm [8] is executed to detect communities in both the weighted and unweighted networks. We compare the detected communities with the given ground truth communities to compute the quality measures. The ground truth communities in real networks are often determined by the specific label of nodes. Although the goal of the community detection differs from the discovery of meta-data of nodes [25], we consider such labels to be a strong sign of the existence of some valid partitions.

Let the ground truth partition of the graph be denoted as $GN=\{g_{1},g_{2},\ldots\}$ where $g_{i}$ is a single ground truth community. The following evaluation metrics measure the similarity between the produced partition $C$ and ground truth partition $GN$ .

Variation of Information (VI) [13] measures the similarity between $C$ and $GN$ based on information theory

[TABLE]

where $I(C,GN)=H(C)+H(GN)-H(C,GN)$ is the Mutual Information, and $H()$ is the entropy function defined as

[TABLE]

Normalized Mutual Information (NMI) [33] is defined as

[TABLE]
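The information-theoretic measures above can be computed directly from two label lists. The sketch below uses the standard definition $VI=H(C)+H(GN)-2I(C,GN)$ and assumes the normalization $NMI=2I(C,GN)/(H(C)+H(GN))$, which is one common choice and may differ from the variant in [33]; entropies are estimated from community size fractions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling, from label frequencies."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def mutual_information(a, b):
    """I(a, b) = H(a) + H(b) - H(a, b), with the joint taken over label pairs."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def variation_of_information(a, b):
    """VI = H(a) + H(b) - 2 I(a, b); zero for identical partitions."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def normalized_mutual_information(a, b):
    """Assumed normalization: 2 I / (H(a) + H(b))."""
    denom = entropy(a) + entropy(b)
    return 1.0 if denom == 0 else 2.0 * mutual_information(a, b) / denom


detected = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]      # same partition, different label names
crossed = [0, 1, 0, 1]         # statistically independent of `detected`
print(variation_of_information(detected, relabelled),
      normalized_mutual_information(detected, crossed))
```

Both measures depend only on how nodes are grouped, so renaming the communities leaves them unchanged.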

F-measure [33] is given as

[TABLE]

Adjusted Rand Index (ARI) [16] computes the similarity by comparing all pairs of nodes that are assigned to the same or different communities in $C$ and $GN$ . Specifically, ARI is defined as

[TABLE]
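A minimal sketch of the pair-counting computation follows, using the standard Hubert-Arabie form of the Adjusted Rand Index. It assumes at least two nodes and label lists of equal length; the function names are illustrative.

```python
from collections import Counter


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(a, b):
    """ARI between two labellings given as equal-length lists (n >= 2)."""
    n = len(a)
    same_in_both = sum(pairs(k) for k in Counter(zip(a, b)).values())
    same_in_a = sum(pairs(k) for k in Counter(a).values())
    same_in_b = sum(pairs(k) for k in Counter(b).values())
    expected = same_in_a * same_in_b / pairs(n)     # chance agreement
    maximum = 0.5 * (same_in_a + same_in_b)
    if maximum == expected:                          # degenerate partitions
        return 1.0
    return (same_in_both - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions: 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # worse than chance: -0.5
```

Unlike the plain Rand index, the adjusted form can be negative when two partitions agree less than expected by chance.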

Modularity Density [6] is a measure of the quality of communities in a network. Like the original modularity, it does not need the ground truth. The formal definition is

[TABLE]

where $d_{c_{i}}$ is the internal density of community $c_{i}$ , and $d_{c_{i},c_{j}}$ is the pair-wise density between community $c_{i}$ and community $c_{j}$ .

In addition, we evaluate the execution time of the training of the regression model and the additional time needed to convert an unweighted graph into a weighted one. We do not report the time cost of community detection, which depends on the specific algorithm. Hence, the reported time cost consists of two parts: (i) Training: the time cost to infer all the parameters of the regression model from the artificial graph; (ii) Weighting: the time cost to compute the weights of every edge in the original graph. Note that both parts include the I/O cost of loading the network files from disk and the edge topological feature extraction time.

5.2 Performance on synthetic and real networks

5.2.1 LFR benchmark

The LFR benchmark networks [18] serve as one of the standards for the evaluation of community detection algorithms. The properties of the network generated from the benchmark are defined by the following three parameters: $\gamma$ which is an exponent of the node degree in the power law distribution, $\beta$ which is an exponent of the community size in the power law distribution, and $\mu$ which is the mixing parameter that defines the fraction of edges originating in a community that have one endpoint outside of it. In our experiments, every LFR benchmark network has 5,000 nodes with the average node degree 15 and the maximum node degree 50. The exponents $\gamma$ and $\beta$ are set as 2 and 1 respectively and the mixing parameter $\mu$ takes two values, 0.45 and 0.5, which are quite challenging because high values of the mixing parameter are likely to result in loose community structures. Considering the randomness in the generation of synthetic networks, 10 network instances are constructed for each $\mu$ value.
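As a rough check on a generated instance, the mixing level can be estimated from the ground truth labels. The sketch below computes a global proxy, the fraction of edges whose endpoints lie in different communities; this is an assumption made for illustration, since the LFR generator itself specifies $\mu$ per node. The function name and the toy graph are hypothetical.

```python
def empirical_mixing(edges, membership):
    """Fraction of edges that connect two different communities."""
    if not edges:
        raise ValueError("graph has no edges")
    external = sum(1 for u, v in edges if membership[u] != membership[v])
    return external / len(edges)


# Two triangles joined by one bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(empirical_mixing(toy_edges, toy_membership))  # 1 of 7 edges is external
```

Values near 0.5, as used in the experiments, mean that roughly half of all edges leave their community, which is why those settings are hard.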

We evaluate the modularity maximization performance on the original unweighted LFR benchmark networks. As seen in Table 3, the performance of the Fast Greedy (FG) algorithm [8] is significantly improved by maximizing the modularity on the weighted networks instead of on the original unweighted graph. The F-measure is improved by nearly 40% and the NMI metric is improved by 25% in all cases. In Table 3, the modularity $Q$ and modularity density $Q_{ds}$ values are all computed over the original unweighted LFR benchmark networks. This surprising result shows that the execution of the Fast Greedy algorithm on a weighted graph can improve the $Q_{ds}$ value for the corresponding unweighted graph. In other words, the weighted edges allow the greedy algorithm to escape from a local maximum of $Q_{ds}$ and reach a better value of it on the original unweighted graph. Also, using the edge weighting scheme, the Fast Greedy algorithm which maximizes the weighted modularity performs better than the fine-tuned $Q_{ds}$ algorithm [6]. In addition, we compared our approach to the previously published algorithm CNM [2]. The edge weighting scheme introduced here achieves an average Jaccard-index score of 85%, while the CNM algorithm obtains an average score of 82% on 10 different realizations of the LFR benchmark networks using the same parameters, with the mixing parameter set to $\mu=0.5$. The remaining construction parameters of these LFR benchmark networks and the definition of the Jaccard index can be found in [2].

5.2.2 American college football network

The American college football network [10] consists of 115 nodes representing college football teams playing in a league with 11 conferences. Two teams are linked if they have played with each other in the year 2000 season. The teams in each of the 11 conferences can be treated as one community because they play with each other often. There are 8 independent teams (not members of any conference), each forming a single community. 19 ground truth communities are shown in Figure 2.a with each color representing a single community. However, only 6 communities are detected by the Fast Greedy algorithm on an unweighted graph as shown in Figure 2.b because some adjacent ground truth communities are joined together.

The regression model converts the original unweighted graph to a weighted graph where the edges with negative weights are marked in black in Figure 2.a. On the weighted graph, the Fast Greedy algorithm can find 11 league communities, each containing one individual conference, although it allocates the independent teams to some of these league communities. The regression model is trained by sampling the ground truth communities on the artificial SBM network, which is constructed to be similar to the Football network. The training process takes approximately 10 seconds on a machine with a single 2.5GHz CPU.

As illustrated in Table 4, in addition to the Fast Greedy modularity maximization algorithm, the state-of-the-art community detection algorithms, including the label propagation algorithm by Raghavan et al. [27], Newman’s leading eigenvector method [21], the algorithm based on random walks [26] and the multilevel algorithm by Blondel et al. [3], also demonstrate improved performance on weighted graphs produced by our method. [Footnote 2: In the experiments, the edges with negative weight are removed from the graph because some community detection algorithms are not able to handle negative weights due to the algorithm design or implementation.] This result additionally supports our claim that properly weighting a graph can lead to an improved quality of community detection.

For a fair comparison, regardless of whether the partition of the graph is determined with or without the edge weights, both the modularity $Q$ and modularity density $Q_{ds}$ are computed over the unweighted graph, i.e., edge weights are all set to 1. Hence, a better $Q$ or $Q_{ds}$ found on the weighted graph indicates that the edge weights allow the maximization algorithm to avoid inferior local optima. The NMI and ARI measures indicate that the communities detected in the weighted graph are generally accurate. However, from the aspect of modularity, for three algorithms (LP, RW and ML) such communities may be evaluated as inferior to the communities discovered in the original unweighted graph, as they have slightly lower modularity. Consequently, even if the maximum modularity is reached in the original unweighted graph, the resulting communities are still not likely to match the ground truth. In contrast, the modularity density $Q_{ds}$ of the communities detected in the weighted graph is higher than in the original unweighted graph, which means that it accurately measures the quality of these communities. [Footnote 3: Note that the modularity density values are all computed over the original Football network.] The proposed edge weighting scheme leads to a higher modularity density $Q_{ds}$ in all cases because the weighted edges allow the greedy algorithm to escape from a local maximum of $Q_{ds}$ and reach a better value of it on the original unweighted graph.
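The evaluation protocol described here, scoring any partition on the unweighted graph, can be sketched with the standard unweighted modularity $Q=\sum_{c}\left[\frac{|E_{c}^{in}|}{|E|}-\left(\frac{d_{c}}{2|E|}\right)^{2}\right]$. The function name and the toy graph are assumptions for illustration; modularity density $Q_{ds}$ is not covered by this sketch.

```python
def unweighted_modularity(edges, membership):
    """Modularity Q of a partition with every edge weight set to 1.

    `edges` is a list of (u, v) pairs; `membership` maps node -> community.
    The partition may come from a run on the weighted or unweighted graph.
    """
    m = len(edges)
    internal = {}   # number of edges inside each community
    degree = {}     # sum of node degrees d_c in each community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by one bridge edge.
graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {node: 0 for node in range(6)}
print(unweighted_modularity(graph, split), unweighted_modularity(graph, merged))
```

Scoring both partitions with the same unweighted function is what makes results from weighted and unweighted runs directly comparable.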

5.2.3 Large Networks

We evaluate the performance of our model on two large real networks: the Amazon co-purchasing network and the DBLP co-authorship network. The Amazon co-purchasing network [36] consists of 334,863 products with two frequently co-purchased products linked by an undirected edge. Each collection of products from the same category forms one ground-truth community. The DBLP collaboration network [36] is the co-authorship network where every node represents a researcher. Two researchers who published at least one paper together are linked. Following others, we assume that each individual ground-truth community is defined by the publication venue, e.g., a journal or conference. As seen in Figure 3, this assumption is not correct. In the case of large conferences and most journals, each researcher writes papers with only a small fraction of all authors publishing in a venue. Yet, each researcher is likely to write several papers with the same co-authors for a group of conferences and journals covering their research interests. Hence, we believe that the real ground truth communities in the DBLP network are smaller than the set of authors for a single venue. In both networks, the top 5000 high-quality ground truth communities are provided and we compare them with the detected ones to compute the F-measure, as shown in Figure 3.c.

The proposed weighting scheme is compared with the wCNM_1 algorithm [2] which computes the weight of an edge using all the triangles and 4-cycles containing it. In our experiments, the wCNM_1 algorithm iterates only once over updates, because the results in [2] show that additional iterations negligibly improve the final results.

The regression model which converts the original unweighted graphs to weighted ones is trained by sampling the dynamically constructed artificial SBM networks as described in Section 5.1. In addition, we also test the performance of our model trained by the ground truth communities in the American college football network, as shown in Figure 3. Perhaps surprisingly, the accuracy of the modularity maximization algorithm on the weighted graph when weights were based on SBM artificial network has improved for the Amazon network by almost 50% as measured by F-score and even more for the DBLP network.

The sizes of communities discovered in Amazon and DBLP networks containing more than $3$ nodes are plotted in Figure 3a-b. In the weighted graph produced by our model for the Amazon network, the distribution of the sizes of the detected communities is close to the distribution of the sizes of ground truth communities for weights based on SBM artificial network but quite different for weights based on the Football network. Since the F-score was similar for those two cases, this result demonstrates the importance of inspecting the distribution of the community sizes. In case of the DBLP network, the improvement of F-score is significant for the weights based on the SBM network, but the distribution of the community sizes is different. We believe that these two results show that presumed ground truth communities in DBLP are not correct, and that smaller communities of researchers co-authoring papers across several venues are the right communities. These results show that our model successfully converts large networks to the weighted ones where the modularity maximization algorithms can perform better than they do on the original unweighted networks.

In general, these large networks can be processed in a few minutes as shown in Figure 3d. The computation time is divided into two parts: (i) Training: the time spent to infer all the parameters of the regression model; (ii) Weighting: the time needed to compute the weights of every edge in the graph. Both steps include the I/O processing time of loading the network files from disk. The edge topological feature extraction (i.e., weighting) time increases as the number of edges grows, therefore processing of dense graphs can be more time-consuming. Unlike the weighting time, the training time does not change much with the size of the original network. This is because the size of the constructed artificial network is independent of the size of the original input graph. Last but not least, in our experiments, the edge topological feature extraction and edge weight evaluation use a single-threaded implementation. However, as these problems are easily parallelizable, they can be partitioned into many individual tasks to achieve better performance.

6 Conclusions

We have developed a novel regression model for assigning weights to edges to assist the discovery of community structures based on modularity maximization. Surprisingly, the results show that the execution of the Fast Greedy algorithm on a weighted graph improves the $Q_{ds}$ value for the original unweighted graph. In other words, the weighted edges allow the algorithm to escape from a local maximum of $Q_{ds}$ in the unweighted network, and the solution found with the weighted edges has a higher value of the $Q_{ds}$ metric in the original unweighted network. Other community detection algorithms which are not based on the modularity maximization principle may also benefit from running on the weighted graph produced by our model rather than the original unweighted graph. Moreover, we introduce speedup improvements to accelerate the training of our regression model. Experimental results show that our approach significantly improves the quality of community detection in both real and synthetic networks.

Acknowledgements

This work was supported in part by the Army Research Laboratory (ARL) under Cooperative Agreement Number W911NF-09-2-0053 (NS-CTA), by the Army Research Office (ARO), grant W911NF-16-1-0524, and by the Office of Naval Research (ONR) Grant No. N00014-15-1-2640. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies either expressed or implied of the Army Research Laboratory or the U.S. Government.

Appendix A Proof of Theorems 1, 2 and 3

Theorem 1.

Proof 4

For any community $c_{i}$ ,

[TABLE]

Thus,

[TABLE]

Theorem 2.

Proof 5

Without loss of generality, consider two ground truth communities $c_{i}$ and $c_{j}$ enhanced by the balanced edge weighting scheme. The change in modularity $Q^{w}$ upon joining these two communities is,

[TABLE]

Theorem 3.

Proof 6

Consider a community $c=c_{i}\cup c_{j}$ with $d_{c}\leq\sqrt{8|E|}$ enhanced by the balanced edge weighting scheme, where $c_{i}$ and $c_{j}$ are two non-empty communities, then $\Delta Q_{c_{i},c_{j}}\geq 0$ by assumption that modularity reaches local maximum for partition $C^{*}$ .

If $E_{c_{i},c_{j}}=\emptyset$ , then $W_{c_{i},c_{j}}=0$ and $\Delta Q_{c_{i},c_{j}}=-\frac{d_{c_{i}}d_{c_{j}}}{2|E|^{2}}\geq 0$ . This leads to either $d_{c_{i}}=0$ or $d_{c_{j}}=0$ which causes contradiction. Otherwise, if $|E_{c_{i},c_{j}}|\geq 1$ , then $W_{c_{i},c_{j}}\geq 1$ because the edge weighting scheme assigns weight $w_{e}\geq 1$ to any edge $e\in E_{c_{i},c_{j}}$ . Since $W_{c_{i}}+W_{c_{j}}=W_{c}=d_{c}$ , we have $W_{c_{i}}W_{c_{j}}\leq(\frac{d_{c}}{2})^{2}$ . When community $c$ splits into communities $c_{i}$ and $c_{j}$ , the change in modularity $Q^{w}$ is,

[TABLE]

Note that the last inequality holds because of the condition $d_{c}\leq\sqrt{8|E|}$ .

Bibliography

[1] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 635–644. ACM, 2011.
[2] J. W. Berry, B. Hendrickson, R. A. La Violette, and C. A. Phillips. Tolerating the community detection resolution limit with edge weighting. Physical Review E, 83(5):056119, 2011.
[3] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[4] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2):172–188, 2008.
[5] M. Chen, T. Nguyen, and B. K. Szymanski. A new metric for quality of network community structure. ASE Human Journal, 2(4):226–240, 2012.
[6] M. Chen, K. Kuzmin, and B. K. Szymanski. Community detection via maximization of modularity and its variants. IEEE Transactions on Computational Social Systems, 1(1):46–65, 2014.
[7] M. Ciglan, M. Laclavík, and K. Nørvåg. On community detection in real-world networks and the importance of degree assortativity. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1007–1015. ACM, 2013.
[8] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.