Modularity of complex networks models

Liudmila Ostroumova Prokhorenkova; Pawel Pralat; Andrei Raigorodskii

arXiv:1701.03141·math.PR·July 18, 2017·WAW

TL;DR

This paper examines the concept of modularity in complex networks, comparing spatial and non-spatial models, providing theoretical insights, and discussing implications for community detection and model selection.

Contribution

It offers theoretical results for classical and preferential attachment models and contrasts them with spatial models, enhancing understanding of modularity in different network types.

Findings

01

Classical random d-regular graphs have low modularity.

02

Spatial preferential attachment models naturally produce high modularity.

03

Results aid in statistical testing and model selection for network clustering.

Abstract

Modularity is designed to measure the strength of division of a network into clusters (known also as communities). Networks with high modularity have dense connections between the vertices within clusters but sparse connections between vertices of different clusters. As a result, modularity is often used in optimization methods for detecting community structure in networks, and so it is an important graph parameter from a practical point of view. Unfortunately, many existing non-spatial models of complex networks do not generate graphs with high modularity; on the other hand, spatial models naturally create clusters. We investigate this phenomenon by considering a few examples from both sub-classes. We prove precise theoretical results for the classical model of random d-regular graphs as well as the preferential attachment model, and contrast these results with the ones for the spatial…

Tables

Table 1: Upper bounds $U_{1}$, $U_{3}$ for $q^{*}({\mathcal{G}}_{n,d})$ and $U_{2}$ for $q_{\delta}({\mathcal{G}}_{n,d})$

$d$	$U_{1}$	$U_{2}$	$U_{3}$
3	0.9386	0.8771	0.8038
4	0.8900	0.7800	0.6834
5	0.8539	0.7078	0.6024
6	0.8261	0.6521	0.5435
7	0.8038	0.6076	0.4984
8	0.7855	0.5710	0.4624
9	0.7702	0.5403	0.4330
10	0.7570	0.5140	0.4083

Table 2: Lower bounds for $q^{*}(G_{m}^{n})$

$m$	$7$	$8$	$9$	$10$	$100$	$1000$
$L_{1}$	0.142	0.125	0.111	0.100	0.0100	0.0010
$L_{2}$	0.156	0.136	0.136	0.123	0.0397	0.0126

Equations

$$q_{{\cal{A}}}=\sum_{A\in{\cal{A}}}\left(\frac{e(A)}{|E(G)|}-\frac{\left(\sum_{v\in A}\deg(v)\right)^{2}}{4|E(G)|^{2}}\right),$$

$$q^{*}(G)=\max_{{\cal{A}}}q_{{\cal{A}}}(G).$$

$$\mathbb{P}(i=s)=\begin{cases}\deg(v_{s},t-1)/(2t-1) & 1\leq s\leq t-1,\\ 1/(2t-1) & s=t,\end{cases}$$

$$d(x,y)=\min\big\{\|x-y+u\|_{p}\,:\,u\in\{-1,0,1\}^{m}\big\}.$$

$$|S(v,t)|=\min\left\{\frac{A_{1}\deg^{-}(v,t)+A_{2}}{t},1\right\}.$$

$$\rho(G)=\min_{V(G)=V_{1}\cup V_{2}}\frac{e(V_{1},V_{2})}{\min\{|V_{1}|,|V_{2}|\}},$$

$$q^{*}(G)\leq\max\left\{1-\frac{\rho(G)}{d},\frac{3}{4}\right\}.$$

$$q^{*}({\mathcal{G}}_{n,d})\leq U_{1}=U_{1}(d):=\max\left\{\frac{1}{2}+\frac{\eta}{2},\frac{3}{4}\right\}.$$

$$q_{\delta}(G)\leq 1-\frac{2\rho(G)}{d}+\varepsilon.$$

$$U_{2}=U_{2}(d):=\eta+\varepsilon$$

$$q^{*}({\mathcal{G}}_{n,d})\geq\frac{2}{d}-O\left(\frac{1}{\sqrt{n}}\right)=\frac{2+o(1)}{d}.$$

$$q_{{\cal{A}}}=\sum_{i=1}^{k}x_{i}\left(\frac{y_{i}}{d}-x_{i}\right).$$

$$f(x,y,d)=x\left(\frac{y}{2}-1\right)\log(x)+(1-x)(d-1)\log(1-x)+\frac{d\log(d)}{2}-\frac{xy\log(y)}{2}-x(d-y)\log(d-y)-\frac{(d-2xd+xy)\log(d-2xd+xy)}{2}.$$

$$q^{*}({\mathcal{G}}_{n,d})\leq U_{3}+\varepsilon/d,$$

$$U_{3}=U_{3}(d):=\sup_{x\in(0,1)}\left(\frac{\bar{y}(x,d)}{d}-x\right).$$

$$X(x,y)$$

$$M(i)=\frac{i!}{(i/2)!\,2^{i/2}}.$$

$$X(x,y)$$

$$X(x,y)=\Theta(n^{-1})\,e^{f(x,y,d)n},$$

$$\lambda(G)\leq 2\sqrt{d-1}+\varepsilon.$$

$$q^{*}(G_{n,d})\leq\frac{\lambda}{d}.$$

$$q^{*}({\mathcal{G}}_{n,d})\leq\frac{2}{\sqrt{d}}.$$

$$|E(S,V\setminus S)|\geq\frac{(d-\lambda)|S||V\setminus S|}{n}$$

$$e(S)=\frac{d|S|-|E(S,V\setminus S)|}{2}\leq\frac{dxn-(d-\lambda)x(1-x)n}{2}=\frac{dx+\lambda(1-x)}{2}\cdot xn.$$

$$q^{*}(F_{n})\geq 1-3\sqrt{\frac{\Delta}{n}}.$$

$$q^{*}(G_{n})\geq\frac{2}{\bar{d}}-3\sqrt{\frac{\Delta}{n\bar{d}}}-\frac{\Delta}{n\bar{d}}.$$

$$\sum_{A\in{\cal{A}}}\frac{e(A)}{|E(G_{n})|}\geq\frac{n-k}{n\bar{d}/2}\geq\frac{2}{\bar{d}}-\frac{2}{\frac{h}{\Delta}-1}=\frac{2}{\bar{d}}-2\sqrt{\frac{\Delta}{n\bar{d}}}.$$

$$\sum_{A\in{\cal{A}}}\frac{\operatorname{vol}_{G_{n}}(A)^{2}}{4|E(G_{n})|^{2}}\leq\frac{hn\bar{d}}{n^{2}\bar{d}^{2}}=\frac{h}{n\bar{d}}=\sqrt{\frac{\Delta}{n\bar{d}}}+\frac{\Delta}{n\bar{d}}.$$

$$\sum_{A\in{\cal{A}}}\frac{e(A)}{|E(F_{n})|}\geq 1-\frac{1}{\frac{h}{2\Delta}}=1-2\sqrt{\frac{\Delta}{n}}.$$

$$\sum_{A\in{\cal{A}}}\frac{\operatorname{vol}_{F_{n}}(A)^{2}}{4|E(F_{n})|^{2}}\leq\frac{h\operatorname{vol}_{F_{n}}(F_{n})}{\operatorname{vol}_{F_{n}}(F_{n})^{2}}\leq\frac{h}{n}=\sqrt{\frac{\Delta}{n}}.$$

Full text

Modularity of complex networks models

Liudmila Ostroumova Prokhorenkova1,2

Paweł Prałat3,4

Andrei Raigorodskii1,2,5,6

(1Moscow Institute of Physics and Technology, Moscow, Russia

2Yandex, Moscow, Russia

3Ryerson University, Toronto, ON, Canada

4The Fields Institute for Research in Mathematical Sciences, Toronto, ON, Canada

5Moscow State University, Moscow, Russia

6Buryat State University, Ulan-Ude, Buryat Republic, Russia)

Abstract

Modularity is designed to measure the strength of division of a network into clusters (known also as communities). Networks with high modularity have dense connections between the vertices within clusters but sparse connections between vertices of different clusters. As a result, modularity is often used in optimization methods for detecting community structure in networks, and so it is an important graph parameter from a practical point of view. Unfortunately, many existing non-spatial models of complex networks do not generate graphs with high modularity; on the other hand, spatial models naturally create clusters. We investigate this phenomenon by considering a few examples from both sub-classes. We prove precise theoretical results for the classical model of random $d$-regular graphs as well as the preferential attachment model, and contrast these results with the ones for the spatial preferential attachment (SPA) model, a model for complex networks in which vertices are embedded in a metric space and each vertex has a sphere of influence whose size increases if the vertex gains an in-link, and otherwise decreases with time. The results obtained in this paper can be used for developing statistical tests for model selection and for measuring the statistical significance of clusters observed in complex networks.

1 Introduction

Many social, biological, and information systems can be represented by networks, whose vertices are items and links are relations between these items [2, 7, 9, 16]. That is why the evolution of complex networks attracted a lot of attention in recent years and there has been a great deal of interest in modelling of these networks [12, 20, 42]. The hyperlinked structure of the Web, citation patterns, friendship relationships, infectious disease spread are seemingly disparate linked data sets which have fundamentally very similar natures. Indeed, it turns out that many real-world networks have some typical properties: heavy tailed degree distribution, small diameter, high clustering coefficient, and others [39, 41, 47]. Such properties are well-studied both in real-world networks and in many theoretical models.

Another important property of complex networks is their community structure, that is, the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters [24, 28]. In social networks communities may represent groups by interest, in citation networks they correspond to related papers, in the Web communities are formed by pages on related topics, etc. Being able to identify communities in a network could help us to exploit this network more effectively. For example, clusters in citation graphs may help to find similar scientific papers, discovering users with similar interests is important for targeted advertisement, clustering can also be used for network compression and visualization.

The key ingredient for many clustering algorithms is modularity, which is at the same time a global criterion to define communities, a quality function of community detection algorithms, and a way to measure the presence of community structure in a network. Modularity was introduced by Newman and Girvan [43] and it is based on the comparison between the actual density of edges inside a community and the density one would expect to have if the vertices of the graph were attached at random, regardless of community structure.

Unfortunately, modularity is not a well-studied parameter for the existing random graph models, at least from a rigorous, theoretical point of view. We are only aware of results for binomial random graphs $G(n,p)$ and random $d$-regular graphs (see Section 2.3 for more details). In this paper, we continue investigating random $d$-regular graphs and obtain new upper bounds for their modularity. Then we move to the preferential attachment model, introduced by Barabási and Albert [8], which is probably the most well-studied model of complex networks. For this model no results on modularity are known and we obtain both lower and upper bounds. In fact, one of the lower bounds we present holds for all graphs with average degree $d$ and sublinear maximum degree.

As expected, the models discussed above, as well as many others, have a common weakness of low modularity. One family of models which overcomes this deficiency is the family of spatial (or geometric) models, wherein the vertices are embedded in a metric space such that similar vertices are closer to each other than dissimilar ones. The underlying geometry of spatial models naturally leads to the emergence of clusters. We prove this statement rigorously for one example of a geometric model, the Spatial Preferential Attachment model introduced in [1].

This paper is a journal version of [44] and is structured as follows. In the next section, we formally define modularity, discuss several random graph models and present known results on modularity in these models. In Sections 3, 5 and 6 we analyze modularity in random $d$ -regular graphs, preferential attachment and SPA models, respectively. In Section 4 we discuss lower bounds for modularity of forests and constant average degree graphs. Section 7 concludes the paper and outlines the directions for future research.

2 Preliminaries

2.1 Modularity

The definition of modularity was first introduced by Newman and Girvan in [43]. Since then, many popular and applied algorithms used to find clusters in large data-sets are based on finding partitions with high modularity [18, 34, 40]. The modularity function favours partitions in which a large proportion of the edges fall entirely within the parts and biases against having too few or too unequally sized parts. Formally, for a given partition ${\cal{A}}=\{A_{1},\ldots,A_{k}\}$ of the vertex set $V(G)$ , let

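$$q_{{\cal{A}}}=\sum_{A\in{\cal{A}}}\left(\frac{e(A)}{|E(G)|}-\frac{\left(\sum_{v\in A}\deg(v)\right)^{2}}{4|E(G)|^{2}}\right),\tag{1}$$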

where $e(A)=|\{uv\in E(G):u,v\in A\}|$ is the number of edges in the graph induced by the set $A$ . The first term, $\sum_{A\in{\cal{A}}}\frac{e(A)}{|E(G)|}$ , is called the edge contribution, whereas the second one, $\sum_{A\in{\cal{A}}}\frac{(\sum_{v\in A}\deg(v))^{2}}{4|E(G)|^{2}}$ , is called the degree tax. It is easy to see that $q_{{\cal{A}}}$ is always smaller than one. Also, if ${\cal{A}}=\{V(G)\}$ , then $q_{{\cal{A}}}=0$ .

The modularity $q^{*}(G)$ is defined as the maximum of $q_{{\cal{A}}}$ over all possible partitions ${\cal{A}}$ of $V(G)$ ; that is,

$$q^{*}(G)=\max_{{\cal{A}}}q_{{\cal{A}}}(G).$$

In order to maximize $q_{{\cal{A}}}(G)$ one wants to find a partition with large edge contribution subject to small degree tax. If $q^{*}(G)$ approaches 1 (which is the maximum), we observe a strong community structure; conversely, if $q^{*}(G)$ is close to zero, we are given a graph with no community structure.
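To make the two terms concrete, here is a minimal sketch that evaluates $q_{{\cal{A}}}$ for a given partition, with the edge contribution and the degree tax computed exactly as in the definition. The function name `modularity` and the edge-list representation are illustrative choices, not something taken from the paper.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Return q_A for an undirected (multi)graph given as a list of edges (u, v)."""
    m = len(edges)
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    inside = defaultdict(int)  # e(A): edges with both endpoints in part A
    volume = defaultdict(int)  # sum of degrees in part A (a loop adds 2)
    for u, v in edges:
        volume[part_of[u]] += 1
        volume[part_of[v]] += 1
        if part_of[u] == part_of[v]:
            inside[part_of[u]] += 1
    # edge contribution minus degree tax, summed over the parts
    return sum(inside[i] / m - (volume[i] / (2 * m)) ** 2
               for i in range(len(partition)))


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_trivial = modularity(edges, [{0, 1, 2, 3, 4, 5}])
```

For the two-triangle example the split partition has edge contribution $6/7$ and degree tax $1/2$, while the trivial partition has modularity zero, in line with the remark that ${\cal{A}}=\{V(G)\}$ gives $q_{{\cal{A}}}=0$.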

Modularity is known to have some weaknesses, as discussed in [24]. For example, [25] shows that this measure fails to detect communities if their sizes are too small. Despite this, modularity remains the most popular measure used by many well-known clustering algorithms [18, 34, 40].

2.2 Random graph models

Random $d$ -regular graphs.

We consider the probability space of random $d$ -regular graphs with uniform probability distribution. This space is denoted $\mathcal{G}_{n,d}$ , and asymptotics are for $n\to\infty$ with $d\geq 2$ fixed, and $n$ even if $d$ is odd.

We say that an event in a probability space holds asymptotically almost surely (or a.a.s.) if the probability that it holds tends to $1$ as $n$ goes to infinity. Since we aim for results that hold a.a.s., we will always assume that $n$ is large enough.

Preferential Attachment.

The Preferential Attachment (PA) model [8] was an early stochastic model of complex networks. We will use the following precise definition of the model, as considered by Bollobás and Riordan in [13] as well as Bollobás, Riordan, Spencer, and Tusnády [14].

Let $G_{1}^{0}$ be the null graph with no vertices (or let $G_{1}^{1}$ be the graph with one vertex, $v_{1}$ , and one loop). The random graph process $(G_{1}^{t})_{t\geq 0}$ is defined inductively as follows. Given $G_{1}^{t-1}$ , we form $G_{1}^{t}$ by adding a vertex $v_{t}$ together with a single edge between $v_{t}$ and $v_{i}$ , where $i$ is selected randomly with the following probability distribution:

$$\mathbb{P}(i=s)=\begin{cases}\deg(v_{s},t-1)/(2t-1) & 1\leq s\leq t-1,\\ 1/(2t-1) & s=t,\end{cases}$$

where $\deg(v_{s},t-1)$ denotes the degree of $v_{s}$ in $G_{1}^{t-1}$ (loops are counted twice). In other words, at $t$ -th step of the process we send an edge $e$ from $v_{t}$ to a random vertex $v_{i}$ , where the probability that a vertex is chosen is proportional to its current degree, counting $e$ as already contributing one to the degree of $v_{t}$ .

For $m\in\mathbb{N}\setminus\{1\}$ , the process $(G_{m}^{t})_{t\geq 0}$ is defined similarly with the only difference that $m$ edges are added to $G_{m}^{t-1}$ to form $G_{m}^{t}$ (one at a time), counting previous edges as already contributing to the degree distribution. Equivalently, one can define the process $(G_{m}^{t})_{t\geq 0}$ by considering the process $(G_{1}^{t})_{t\geq 0}$ on a sequence $v^{\prime}_{1},v^{\prime}_{2},\ldots$ of vertices; the graph $G_{m}^{t}$ is formed from $G_{1}^{tm}$ by identifying vertices $v^{\prime}_{1},v^{\prime}_{2},\ldots,v^{\prime}_{m}$ to form $v_{1}$ , identifying vertices $v^{\prime}_{m+1},v^{\prime}_{m+2},\ldots,v^{\prime}_{2m}$ to form $v_{2}$ , and so on. Note that in this model $G_{m}^{t}$ is in general a multigraph, possibly with multiple edges between two vertices (if $m\geq 2$ ) and self-loops.

It was shown in [14] that for any $m\in\mathbb{N}$ a.a.s. the degree distribution of $G_{m}^{n}$ follows a power law: the number of vertices with degree at least $k$ falls off as $(1+o(1))ck^{-2}n$ for some explicit constant $c=c(m)$ and large $k\leq n^{1/15}$ . Also, in the case $m=1$ , each vertex sends an edge either to itself or to an earlier vertex, so $G_{1}^{n}$ is a forest with each component containing a single looped vertex. The expected number of components is then $\sum_{t=1}^{n}1/(2t-1)\sim(1/2)\log n$ and, since events are independent, we derive that a.a.s. there are $(1/2+o(1))\log n$ components in $G_{1}^{n}$ by Chernoff’s bound. In contrast, for the case $m\geq 2$ it is known that a.a.s. $G_{m}^{n}$ is connected and its diameter is $(1+o(1))\log n/\log\log n$ [13].
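The process $(G_{1}^{t})$ is easy to simulate, which can help build intuition for the forest structure described above. The following is a minimal sketch under the definition given in this section; the function name `pa_tree_process` and the endpoint-list trick for degree-proportional sampling are illustrative assumptions, not part of the paper.

```python
import random


def pa_tree_process(n, seed=0):
    """Simulate G_1^n; targets[t-1] = i means v_t sends its edge to v_i (1-based)."""
    rng = random.Random(seed)
    endpoints = []  # each edge stores both endpoints, so a loop at v_t stores t twice
    targets = []
    for t in range(1, n + 1):
        # 2t-1 equally likely slots: the 2(t-1) existing endpoints plus one for v_t itself
        r = rng.randrange(2 * t - 1)
        i = t if r == 2 * (t - 1) else endpoints[r]
        targets.append(i)
        endpoints.extend((t, i))
    return targets


targets = pa_tree_process(1000)
# each looped vertex is the root of exactly one component of the forest
components = sum(1 for t, i in enumerate(targets, 1) if i == t)
```

Picking a uniformly random entry of `endpoints` selects $v_{s}$ with probability proportional to its current degree, and the extra slot accounts for the new edge already contributing one to the degree of $v_{t}$.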

Spatial Preferential Attachment.

The Spatial Preferential Attachment (SPA) model [1], designed as a model for the World Wide Web, combines geometry and preferential attachment, as its name suggests. Setting the SPA model apart is the incorporation of ‘spheres of influence’ to accomplish preferential attachment: the greater the degree of a vertex, the larger its sphere of influence, and hence the higher the likelihood of the vertex gaining more neighbours.

We now give a precise description of the SPA model. Let $S=[0,1]^{m}$ be the unit hypercube in $\mathbb{R}^{m}$ , equipped with the torus metric derived from any of the $L_{p}$ norms. This means that for any two points $x$ and $y$ in $S$ ,

$$d(x,y)=\min\big\{\|x-y+u\|_{p}\,:\,u\in\{-1,0,1\}^{m}\big\}.$$

The torus metric thus ‘wraps around’ the boundaries of the unit square; this metric was chosen to eliminate boundary effects. The parameters of the model consist of the link probability $p\in[0,1]$ , and two positive constants $A_{1}$ and $A_{2}$ , which, in order to avoid the resulting graph becoming too dense, must be chosen so that $pA_{1}<1$ . The SPA model generates stochastic sequences of directed graphs $(G_{t}:t\geq 0)$ , where $G_{t}=(V_{t},E_{t})$ , and $V_{t}\subseteq S$ . Let $\deg^{-}(v,t)$ be the in-degree of the vertex $v$ in $G_{t}$ , and $\deg^{+}(v,t)$ its out-degree. We define the sphere of influence $S(v,t)$ of the vertex $v$ at time $t\geq 1$ to be the ball centered at $v$ with volume $|S(v,t)|$ defined as follows:

$$|S(v,t)|=\min\left\{\frac{A_{1}\deg^{-}(v,t)+A_{2}}{t},1\right\}.$$

The process begins at $t=0$ , with $G_{0}$ being the null graph. Time step $t$ , $t\geq 1$ , is defined to be the transition between $G_{t-1}$ and $G_{t}$ . At the beginning of each time step $t$ , a new vertex $v_{t}$ is chosen uniformly at random from $S$ , and added to $V_{t-1}$ to create $V_{t}$ . Next, independently, for each vertex $u\in V_{t-1}$ such that $v_{t}\in S(u,t-1)$ , a directed link $(v_{t},u)$ is created with probability $p$ . Thus, the probability that a link $(v_{t},u)$ is added in time-step $t$ equals $p\,|S(u,t-1)|$ .
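The process can be sketched in a few lines. The version below is an illustration under specific assumptions: it uses the $L_{\infty}$ norm (so a ball of volume $V$ is a cube of side $V^{1/m}$), defaults to $m=2$, and the names `torus_distance` and `spa_model` are hypothetical. It is meant to mirror the definition, not to be an efficient implementation.

```python
import random


def torus_distance(x, y):
    """Torus metric on [0, 1)^m derived from the L-infinity norm."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def spa_model(n, p, a1, a2, m=2, seed=0):
    """Run n steps of the SPA process; return (positions, directed links)."""
    rng = random.Random(seed)
    pos, indeg, links = [], [], []
    for t in range(1, n + 1):
        v = tuple(rng.random() for _ in range(m))
        for u, pu in enumerate(pos):
            # |S(u, t-1)|; here t >= 2 because pos is empty at t = 1
            volume = min((a1 * indeg[u] + a2) / (t - 1), 1.0)
            radius = 0.5 * volume ** (1.0 / m)  # cube of side 2*radius has this volume
            if torus_distance(v, pu) <= radius and rng.random() < p:
                links.append((t - 1, u))  # link (v_t, u), using 0-based indices
                indeg[u] += 1
        pos.append(v)
        indeg.append(0)
    return pos, links


pos, links = spa_model(200, p=0.5, a1=1.0, a2=1.0)
```

The example parameters satisfy $pA_{1}<1$. Updating `indeg[u]` immediately is safe because each old vertex is examined once per time step, so the update only affects later steps.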

The SPA model produces scale-free networks, which exhibit many of the characteristics of real-life networks (see [1, 19]). In [31], it was shown that the SPA model gave the best fit, in terms of graph structure, for a series of social networks derived from Facebook. In [32], some properties of common neighbors were used to explore the underlying geometry of the SPA model and quantify vertex similarity based on distance in the space. However, the distribution of vertices in space was assumed to be uniform in [32]; non-uniform distributions, which are clearly a more realistic setting, were investigated in [33].

2.3 Previous results on modularity

In this section we discuss known bounds for modularity in different random graph models.

The isoperimetric number (known also as edge expansion) of a graph $G$ is defined as

$$\rho(G)=\min_{V(G)=V_{1}\cup V_{2}}\frac{e(V_{1},V_{2})}{\min\{|V_{1}|,|V_{2}|\}},$$

where $e(V_{1},V_{2})=|\{uv\in E(G):u\in V_{1},v\in V_{2}\}|$ is the number of edges between the sets $V_{1}$ and $V_{2}$ . The following result was shown by McDiarmid and Skerman in [35]. Let $G$ be any $d$ -regular graph on $n$ vertices. Then, the following useful upper bound on the modularity is almost immediate:

$$q^{*}(G)\leq\max\left\{1-\frac{\rho(G)}{d},\frac{3}{4}\right\}.$$

Turning to random $d$ -regular graphs, Bollobás in [11] showed that a.a.s. $\rho({\mathcal{G}}_{n,d})\geq(1-\eta)d/2$ , where $0<\eta<1$ is such that $2^{4/d}<(1-\eta)^{1-\eta}(1+\eta)^{1+\eta}$ and so a.a.s.

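$$q^{*}({\mathcal{G}}_{n,d})\leq U_{1}=U_{1}(d):=\max\left\{\frac{1}{2}+\frac{\eta}{2},\frac{3}{4}\right\}.$$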

As a result, we get the first non-trivial upper bounds for $q^{*}({\mathcal{G}}_{n,d})$ presented in Table 1 that hold a.a.s.

In [35], the bound (3) was slightly improved when the maximum size of parts in our partition is restricted. Formally, given $\delta>0$, for a graph $G$ with $n\geq 1/\delta$ vertices, they define $q_{\delta}(G)$ to be the maximum modularity of all partitions for $G$ such that each part has size at most $\delta n$. They show that for any $\varepsilon>0$ there exists $\delta>0$ such that any $d$-regular graph with at least $1/\delta$ vertices satisfies

$$q_{\delta}(G)\leq 1-\frac{2\rho(G)}{d}+\varepsilon.$$

Again, using the result of Bollobás we get that there exists $\delta>0$ such that

$$U_{2}=U_{2}(d):=\eta+\varepsilon$$

serves as an upper bound that holds a.a.s. for $q_{\delta}({\mathcal{G}}_{n,d})$; again, see Table 1 for numerical values for small values of $d$. It is straightforward to see that $\rho(G)\geq d/2-\sqrt{(\log 2)d}$ (see, for example, [11]) and so, in particular, $U_{2}$ can be made arbitrarily small by taking $d$ large enough (and $\delta$ small enough). However, let us note that these upper bounds for $q_{\delta}({\mathcal{G}}_{n,d})$, while useful, cannot be directly translated into any bound for $q^{*}({\mathcal{G}}_{n,d})$.
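The $U_{1}$ and $U_{2}$ columns of Table 1 can be recomputed from the condition on $\eta$. The sketch below assumes that the relevant $\eta$ is the boundary value where $2^{4/d}=(1-\eta)^{1-\eta}(1+\eta)^{1+\eta}$, and that $U_{2}$ is reported in the limit $\varepsilon\to 0$; the helper names `eta`, `upper_bound_u1` and `upper_bound_u2` are hypothetical.

```python
import math


def eta(d, iters=80):
    """Solve (1-e)log(1-e) + (1+e)log(1+e) = (4/d) log 2 for e in (0, 1), d >= 3."""
    target = 4.0 / d * math.log(2.0)

    def g(e):
        return (1 - e) * math.log(1 - e) + (1 + e) * math.log(1 + e)

    lo, hi = 0.0, 1.0 - 1e-12  # g is increasing on (0, 1), so bisection applies
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def upper_bound_u1(d):
    return max(0.5 + eta(d) / 2, 0.75)


def upper_bound_u2(d):
    return eta(d)  # limit of eta + epsilon as epsilon -> 0
```

Taking logarithms turns the condition into a monotone equation in $\eta$, which is why plain bisection suffices here.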

Investigating random $d$ -regular graphs continues in [36], a very recent paper. In fact, the numerical upper bound presented in Section 3.3, as well as the result in Theorem 4, are obtained independently there. Moreover, [36] investigates the class of graphs whose product of treewidth and maximum degree is much less than the number of edges. Their result shows, for example, that random planar graphs typically have modularity close to 1, which is another indication that clusters naturally emerge where geometry is included. Also, a particular case of their theorem shows that trees with maximum degree $o(n)$ have asymptotic modularity one.

3 Random $d$ -regular graphs

3.1 Pairing model

Instead of working directly in the uniform probability space of random regular graphs on $n$ vertices $\mathcal{G}_{n,d}$ , we use the pairing model (also known as the configuration model) of random regular graphs, first introduced by Bollobás [10], which is described next. Suppose that $dn$ is even, as in the case of random regular graphs, and consider $dn$ points partitioned into $n$ labelled buckets $v_{1},v_{2},\ldots,v_{n}$ of $d$ points each. A pairing of these points is a perfect matching into $dn/2$ pairs. Given a pairing $P$ , we may construct a multigraph $G(P)$ , with loops allowed, as follows: the vertices are the buckets $v_{1},v_{2},\ldots,v_{n}$ , and a pair $\{x,y\}$ in $P$ corresponds to an edge $v_{i}v_{j}$ in $G(P)$ if $x$ and $y$ are contained in the buckets $v_{i}$ and $v_{j}$ , respectively. It is an easy fact that the probability of a random pairing corresponding to a given simple graph $G$ is independent of the graph, hence the restriction of the probability space of random pairings to simple graphs is precisely $\mathcal{G}_{n,d}$ . Moreover, it is well known that a random pairing generates a simple graph with probability asymptotic to $e^{-(d^{2}-1)/4}$ depending on $d$ , so that any event holding a.a.s. over the probability space of random pairings also holds a.a.s. over the corresponding space $\mathcal{G}_{n,d}$ . For this reason, asymptotic results over random pairings suffice for our purposes. For more information on this model, see, for example, the survey of Wormald [48].
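The pairing model translates directly into a sampling procedure. The following is a minimal sketch: a uniformly random perfect matching of the $dn$ points is obtained by shuffling them and pairing consecutive entries, and conditioning on simplicity is done by rejection. The function names are illustrative, and rejection sampling is only practical for small $d$, since the acceptance probability decays like $e^{-(d^{2}-1)/4}$.

```python
import random
from collections import Counter


def random_pairing(n, d, rng):
    """Return the multigraph G(P) of a uniformly random pairing as an edge list."""
    if (n * d) % 2:
        raise ValueError("d * n must be even")
    points = [v for v in range(n) for _ in range(d)]  # each point labelled by its bucket
    rng.shuffle(points)
    # consecutive shuffled points are matched, giving a uniform perfect matching
    return [(points[i], points[i + 1]) for i in range(0, n * d, 2)]


def is_simple(edges):
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True


def random_regular_graph(n, d, seed=0, max_tries=10_000):
    """Sample from G_{n,d} by rejecting pairings with loops or multiple edges."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        edges = random_pairing(n, d, rng)
        if is_simple(edges):
            return edges
    raise RuntimeError("no simple pairing found")


edges = random_regular_graph(20, 3, seed=1)
degrees = Counter(v for e in edges for v in e)
```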

3.2 Lower bound

For completeness, let us briefly discuss the following known lower bound for the modularity of ${\mathcal{G}}_{n,d}$ . It is known that a.a.s. for any $d\in\mathbb{N}\setminus\{1,2\}$ , ${\mathcal{G}}_{n,d}$ is Hamiltonian. As pointed out in [35], one can use this fact to partition the graph such that it breaks the cycle into $\lceil\sqrt{n}\rceil$ paths of length at most $\lceil\sqrt{n}\rceil$ . For this particular partition the edge contribution is $2/d-O(1/\sqrt{n})$ and the degree tax is $O(1/\sqrt{n})$ . It follows then that a.a.s.

$$q^{*}({\mathcal{G}}_{n,d})\geq\frac{2}{d}-O\left(\frac{1}{\sqrt{n}}\right)=\frac{2+o(1)}{d}.$$

(Our more general lower bound that holds for graphs with average degree $d$ implies the same—see Theorem 6 for more.) Whereas this trivial lower bound could be sharp for $d=3$ it is definitely not the case for large $d$ . As pointed out in [36], there exists a universal constant $c>0$ such that a.a.s. $q^{*}({\mathcal{G}}_{n,d})\geq c/\sqrt{d}$ .
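The arithmetic behind the Hamiltonian-cycle bound can be checked directly. The sketch below counts only the cycle edges that stay inside the paths (so it is a lower bound on the edge contribution) and computes the degree tax exactly for a $d$-regular graph; the function name is hypothetical and the near-equal split of the cycle is one convenient choice.

```python
import math


def hamiltonian_partition_bound(n, d):
    """Lower bound on q_A when a Hamiltonian cycle is cut into ceil(sqrt(n)) paths."""
    k = math.isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    # a path on s vertices keeps s - 1 cycle edges, so n - k cycle edges survive
    edge_contribution = (n - k) / (d * n / 2)
    degree_tax = sum((d * s) ** 2 for s in sizes) / (d * n) ** 2
    return edge_contribution - degree_tax


bound = hamiltonian_partition_bound(10**6, 3)
```

For $n=10^{6}$ and $d=3$ both correction terms are of order $10^{-3}$, matching the $O(1/\sqrt{n})$ error terms quoted above.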

3.3 Numerical upper bound

The following straightforward lemma is useful for obtaining upper bounds for modularity of random $d$ -regular graphs.

Lemma 1

Consider any $d$-regular graph on $n$ vertices $G_{n,d}$. If no subset of $V(G_{n,d})$ of size $xn$ induces $yxn/2$ edges with $y/d-x\geq U$, then $q^{*}(G_{n,d})<U$.

*Proof.* For a given partition ${\cal{A}}=\{A_{1},\ldots,A_{k}\}$ of the vertex set $V(G)$, let $x_{i}=|A_{i}|/n$ and $y_{i}=2e(A_{i})/|A_{i}|$; that is, set $A_{i}$ has $x_{i}n$ vertices and induces $y_{i}x_{i}n/2$ edges. Then, taking into account the fact that for any $A\subseteq V(G)$ we have $\sum_{v\in A}\deg(v)=d|A|$ and that $|E(G)|=dn/2$, we can rewrite (1) as

$$q_{{\cal{A}}}=\sum_{i=1}^{k}x_{i}\left(\frac{y_{i}}{d}-x_{i}\right).$$

As it is simply a weighted average, $q_{{\cal{A}}}\geq U$ would imply that there exists some set of size $xn$ that induces $yxn/2$ edges, and $y/d-x\geq U$ . So, the proof of the lemma is finished. $\Box$

To formulate the main theorem of this section, we need the following notation. For a given $d\in\mathbb{N}\setminus\{1,2\}$ , let

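$$f(x,y,d)=x\left(\frac{y}{2}-1\right)\log(x)+(1-x)(d-1)\log(1-x)+\frac{d\log(d)}{2}-\frac{xy\log(y)}{2}-x(d-y)\log(d-y)-\frac{(d-2xd+xy)\log(d-2xd+xy)}{2}.\tag{5}$$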

It will be clear once we establish the connection between function $f$ and random $d$ -regular graphs, but it is straightforward to see that for any $x\in(0,1)$ we have $f(x,d,d)<0$ (more precisely, its limit value) and $f(x,y,d)>0$ for some $y\in(0,d)$ . Indeed, for example note that $f(x,xd,d)=-x\log(x)+(x-1)\log(1-x)>0$ . Also, it is easy to see that $f(x,y,d)$ is continuous on $y\in(0,d)$ .

Finally, let $\bar{y}=\bar{y}(x,d)$ be the largest value of $y\in(0,d)$ such that $f(x,y,d)=0$; in particular, $f(x,y,d)<0$ for any $y\in(\bar{y},d)$.

Theorem 2

Let $d\in\mathbb{N}\setminus\{1,2\}$ and $\varepsilon>0$ be an arbitrarily small constant. Then a.a.s.

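$$q^{*}({\mathcal{G}}_{n,d})\leq U_{3}+\varepsilon/d,$$

where

$$U_{3}=U_{3}(d):=\sup_{x\in(0,1)}\left(\frac{\bar{y}(x,d)}{d}-x\right).$$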

As usual, see Table 1 for numerical values for small values of $d$ .
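The quantity $U_{3}$ is straightforward to approximate numerically. The sketch below rests on an assumption: the closed form of $f$ coded here is a reconstruction from the first-moment computation in the pairing model, chosen so that $f(x,xd,d)=-x\log(x)+(x-1)\log(1-x)$ and so that $f$ is negative in the limit $y\to d$. Since this $f$ is concave in $y$, positive at $y=xd$ and negative near $y=d$, bisection on $[xd,d]$ locates $\bar{y}(x,d)$. The helper names `xlogx`, `f`, `y_bar` and `u3` are hypothetical.

```python
import math


def xlogx(t):
    """t * log(t), extended by its limit value 0 at t = 0."""
    return 0.0 if t <= 0.0 else t * math.log(t)


def f(x, y, d):
    w = d - 2 * x * d + x * y  # positive whenever x*d <= y <= d and 0 < x < 1
    return (x * (y / 2 - 1) * math.log(x)
            + (1 - x) * (d - 1) * math.log(1 - x)
            + d / 2 * math.log(d)
            - x * xlogx(y) / 2
            - x * xlogx(d - y)
            - xlogx(w) / 2)


def y_bar(x, d, iters=50):
    """Largest root of f(x, ., d) in (0, d), found by bisection on [x*d, d]."""
    lo, hi = x * d, float(d)  # f(lo) > 0 and f(hi) < 0 for d >= 3
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(x, mid, d) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def u3(d, grid=1000):
    """Grid approximation of sup over x in (0, 1) of y_bar(x, d) / d - x."""
    return max(y_bar(k / grid, d) / d - k / grid for k in range(1, grid))
```

A grid search over $x$ is a crude stand-in for the supremum, so the result should be read as an approximation from below of $U_{3}$ rather than a certified value.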

*Proof.* We prove below that the following property holds a.a.s. for ${\mathcal{G}}_{n,d}$. No set $A$ of size $xn$ (for any $x=x(n)\in(0,1)$) induces a graph with $yxn/2$ edges, where $\bar{y}(x,d)+\varepsilon\leq y\leq d$ and $\bar{y}(x,d)$ is defined as above. Then Theorem 2 follows directly from Lemma 1.

Consider $\mathcal{G}_{n,d}$ for some $d\in\mathbb{N}\setminus\{1,2\}$ and let $\varepsilon>0$ be an arbitrarily small constant. Our goal is to show that the expected number of sets $S$ such that $|S|=xn$ and $e(S)=yxn/2$ with $y\geq\bar{y}(x,d)+\varepsilon$ is $o(n^{-2})$. (For simplicity, we do not round numbers that are supposed to be integers either up or down; this is justified since these rounding errors are negligible to the asymptotic calculations we will make.) This, together with the first moment principle, implies that a.a.s. no such set exists for any $x\in(0,1)$ and $y\in[\bar{y}(x,d)+\varepsilon,d]$ (as there are $O(n)$ possible sizes of $S$ and $O(n)$ possible values of $e(S)$ that we need to consider).

Let $x=x(n)$ and $y=y(n)$ be any functions of $n$ such that $0<x<1$ and $\bar{y}(x,d)+\varepsilon<y<d$ . Let $X(x,y)$ be the expected number of sets $S$ such that $|S|=xn$ and $e(S)=yxn/2$ . Using the pairing model, it is clear that

[TABLE]

where $M(i)$ is the number of pairings of $i$ vertices, that is,

$$M(i)=\frac{i!}{(i/2)!\,2^{i/2}}.$$

(Each time we deal with pairings, $i$ is assumed to be an even number.) After simplification we get

[TABLE]

Using Stirling’s formula ( $i!\sim\sqrt{2\pi i}(i/e)^{i}$ ) and focusing on the exponential part we obtain

$$X(x,y)=\Theta(n^{-1})\,e^{f(x,y,d)n},$$

where $f(x,y,d)$ is defined in (5). It follows immediately from the definition of $\bar{y}(x,d)$ that $f(x,y,d)<0$ is bounded away from zero for any pair of integers $xn$ and $yxn/2$ under consideration, and so for any pair we get $X(x,y)=o(n^{-2})$ and the proof is finished. $\Box$
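For readers who want to retrace the first-moment count, here is one way to assemble $X(x,y)$ in the pairing model. The factorisation is a reconstruction from the definitions above and should be treated as an assumption rather than a quotation of the authors' display: choose $S$, choose the $yxn$ points of $S$ that are paired inside $S$ and pair them, choose the $(d-y)xn$ points outside $S$ that are matched to the remaining points of $S$ and match them, and finally pair the leftover $(d-2xd+xy)n$ points outside $S$.

```latex
X(x,y) \;=\; \binom{n}{xn}\binom{xdn}{yxn}\,M(yxn)\,
\binom{(1-x)dn}{(d-y)xn}\,\big((d-y)xn\big)!\;
\frac{M\big((d-2xd+xy)n\big)}{M(dn)} .
```

Under this form, Stirling's formula gives three factors of order $n^{-1/2}$ from the binomial coefficients and one of order $n^{1/2}$ from the factorial, which is consistent with the $\Theta(n^{-1})$ prefactor. The $\log n$ terms in the exponent cancel because $\frac{yx}{2}+(d-y)x+\frac{d-2xd+xy}{2}=\frac{d}{2}$.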

3.4 Explicit but weaker upper bound

Theorem 2 provides an upper bound that can be easily computed numerically for a given $d\in\mathbb{N}\setminus\{1,2\}$. Next, we present a slightly weaker but explicit bound that can be obtained using the expansion properties of random $d$-regular graphs that follow from their eigenvalues. In particular, it will imply that a.a.s. $q^{*}({\mathcal{G}}_{n,d})=O(1/\sqrt{d})$ and so $q^{*}({\mathcal{G}}_{n,d})\to 0$ as $d\to\infty$.

The adjacency matrix $A=A(G)$ of a given $d$ -regular graph $G$ with $n$ vertices, is an $n\times n$ real and symmetric matrix. Thus, the matrix $A$ has $n$ real eigenvalues which we denote by $\lambda_{1}\geq\lambda_{2}\geq\cdots\geq\lambda_{n}$ . It is known that certain properties of a $d$ -regular graph are reflected in its spectrum but, since we focus on expansion properties, we are particularly interested in the following quantity: $\lambda=\lambda(G)=\max(|\lambda_{2}|,|\lambda_{n}|)$ . In words, $\lambda$ is the largest absolute value of an eigenvalue other than $\lambda_{1}=d$ (for more details, see the general survey [29] about expanders, or [6], Chapter 9).

The value of $\lambda$ for random $d$ -regular graphs has been studied extensively. A major result due to Friedman [26] is the following:

Lemma 3 ([26])

For every fixed $\varepsilon>0$ and for $G\in\mathcal{G}_{n,d}$ , a.a.s.

$$\lambda(G)\leq 2\sqrt{d-1}+\varepsilon.$$

We prove the following theorem.

Theorem 4

Let $d\in\mathbb{N}\setminus\{1,2\}$ . Then, for any $d$ -regular graph $G_{n,d}$ we have

$$q^{*}(G_{n,d})\leq\frac{\lambda}{d}.$$

In particular, for random $d$-regular graphs a.a.s.

$$q^{*}({\mathcal{G}}_{n,d})\leq\frac{2}{\sqrt{d}}.$$

*Proof.* The second part of the theorem follows from Lemma 3, as for random $d$-regular graphs a.a.s. $\frac{\lambda}{d}\leq\frac{2\sqrt{d-1}+\varepsilon}{d}\leq\frac{2}{\sqrt{d}}$ for sufficiently small $\varepsilon>0$. Let us now show that $q^{*}(G_{n,d})\leq\frac{\lambda}{d}$.

The number of edges $|E(S,T)|$ between sets $S$ and $T$ is expected to be close to the expected number of edges between $S$ and $T$ in a random graph of edge density $d/n$ , namely $d|S||T|/n$ . A small $\lambda$ (or large spectral gap) implies that this deviation is small. Namely, for our purpose here we will use the following lower estimate for $|E(S,V\setminus S)|$

$$|E(S,V\setminus S)|\geq\frac{(d-\lambda)|S||V\setminus S|}{n}$$

for all $S\subseteq V$ . This is proved in [5], see also [6]. Using this inequality we get immediately that for any $S$ of size $xn$ we have

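$$e(S)=\frac{d|S|-|E(S,V\setminus S)|}{2}\leq\frac{dxn-(d-\lambda)x(1-x)n}{2}=\frac{dx+\lambda(1-x)}{2}\cdot xn.$$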

So, a.a.s., in ${\mathcal{G}}_{n,d}$ no set $A$ of size $xn$ induces a graph with more than $yxn/2$ edges, where $y=dx+\lambda(1-x)$ . Now the desired upper bound follows from Lemma 1. $\Box$
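The final appeal to Lemma 1 can be spelled out in one line. This is a short worked step that uses only the value $y=dx+\lambda(1-x)$ obtained above:

```latex
\frac{y}{d}-x \;=\; \frac{dx+\lambda(1-x)}{d}-x \;=\; \frac{\lambda(1-x)}{d} \;<\; \frac{\lambda}{d}
\qquad\text{for all } x\in(0,1).
```

Hence no set of size $xn$ attains $y/d-x\geq\lambda/d$, and Lemma 1 applied with $U=\lambda/d$ gives the claimed bound.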

We have also tried several other ideas in an attempt to obtain a better upper bound. Unfortunately, they did not lead to improvements, so we defer the discussion of these ideas to the Appendix.

4 Lower bounds in terms of average degree

In this section, we obtain some general lower bounds for modularity. In particular, the obtained bounds are useful for graphs with bounded average degree. In Section 5, we apply these results to obtain a lower bound for the modularity of preferential attachment model (see Theorem 10).

Let us start with the analysis of trees. It was proven in [38] that trees with maximum degree $\Delta=o(\sqrt[5]{n})$ have asymptotic modularity 1. We generalize this result in two ways: first, we relax the condition on maximum degree; second, we allow our graphs to be disconnected, that is, we consider forests instead of trees. We prove the following theorem.

Theorem 5

Let $\{F_{n}\}$ be a sequence of forests, where $F_{n}$ has $n$ non-isolated vertices and the maximum degree $\Delta=\Delta(F_{n})$ . Then the following lower bound holds

[TABLE]

This theorem implies that if the maximum degree $\Delta(F_{n})=o(n)$ , then $q^{*}(F_{n})=1-o(1)$ . Note that it is also known that the asymptotic modularity of trees with maximum degree $\Delta=\Omega(n)$ is strictly less than 1 [38]. Hence, the assumption $\Delta=o(n)$ cannot be eliminated.

We further generalize the above theorem to all connected graphs and prove the following result.

Theorem 6

Let $\{G_{n}\}$ be a sequence of graphs, where $G_{n}$ is a connected graph on $n$ vertices with the maximum degree $\Delta=\Delta(G_{n})$ and the average degree $\bar{d}=\bar{d}(G_{n})$ . Then

[TABLE]

The theorem implies that if $\bar{d}(G_{n})\leq D$ for some constant $D$ and $\Delta(G_{n})=o(n)$ , then $q^{*}(G_{n})\geq\frac{2}{D}-o(1)$ . Note that for $\bar{d}=2$ Theorem 6 looks similar to Theorem 5. However, there are two important differences: Theorem 6 is not restricted to forests, but requires graphs to be connected.

Before we prove both theorems let us introduce some notation and the main lemma which we will use.

Definition 7

Let $G$ be a graph and let $A$ be any subset of its vertex set $V(G)$ . We define $\operatorname{vol}_{G}(A):=\sum_{v\in A}\deg(v)$ , where $\deg(v)$ is the degree of a vertex $v$ in $G$ . We also use the notation $\operatorname{vol}_{G}(G^{\prime}):=\operatorname{vol}_{G}(V(G^{\prime}))$ , where $G^{\prime}$ is a subgraph of $G$ .

Lemma 8

For every connected graph $G$ with maximum degree $\Delta$ and every $h>0$ there exists a partition of the vertex set into connected parts $A_{1},\ldots,A_{k}$ such that $\frac{h}{\Delta}-1\leq\operatorname{vol}_{G}(A_{i})\leq h$ for all $1\leq i\leq k$ .

*Proof. * For a graph $G$ let us consider its spanning tree $T$ and decompose it, by removing some edges, into subtrees $T_{1},\ldots,T_{k}$ such that $\frac{h}{\Delta}-1\leq\operatorname{vol}_{G}(T_{i})\leq h$ for each $1\leq i\leq k$ . The way we do this decomposition is in a sense similar to the algorithm greedy-decompose≤h from [38]. Namely, we first redefine a notion of a centroid edge of a subtree $T^{\prime}$ of the initial tree $T$ .

Definition 9

The removal of any edge from a tree $T^{\prime}$ splits $T^{\prime}$ into two parts $T^{1}$ and $T^{2}$ . A centroid edge of $T^{\prime}$ is an edge chosen to maximize $\min\{\operatorname{vol}_{G}(T^{1}),\operatorname{vol}_{G}(T^{2})\}$ .

Our algorithm is the following: as long as our forest contains a tree $T^{\prime}$ with $\operatorname{vol}_{G}(T^{\prime})>h$ , it finds a centroid edge $e$ of $T^{\prime}$ and removes it. After this decomposition, we obtain trees $T_{1},\ldots,T_{k}$ and we set $A_{i}=V(T_{i})$ for $1\leq i\leq k$ .

Obviously, for each $i$ we have $\operatorname{vol}_{G}(A_{i})\leq h$ . Let us show that we also have $\operatorname{vol}_{G}(A_{i})\geq\frac{h}{\Delta}-1$ . Consider any step of our decomposition procedure. We take a tree $T^{\prime}$ with $\operatorname{vol}_{G}(T^{\prime})=h^{\prime}>h$ , remove its centroid edge $e$ , and obtain two trees $T^{1}$ and $T^{2}$ . Without loss of generality we may assume that $\operatorname{vol}_{G}(T^{1})\leq\operatorname{vol}_{G}(T^{2})$ . Let $s=\operatorname{vol}_{G}(T^{1})$ , $s\leq h^{\prime}/2$ . Let $x$ be the vertex incident with $e$ and belonging to $T^{2}$ . For every edge $e^{\prime}$ incident with $x$ , for the part $T^{\prime\prime}$ of $T^{\prime}-e^{\prime}$ not containing $x$ we have $\operatorname{vol}_{G}(T^{\prime\prime})\leq s$ (otherwise $e$ is not a centroid edge). As $x$ has degree at most $\Delta$ , we have $h^{\prime}\leq\Delta s+\Delta$ (at most $s$ for each of the $\leq\Delta$ parts plus the degree of $x$ itself). So, $s\geq\frac{h^{\prime}-\Delta}{\Delta}>\frac{h}{\Delta}-1$ . This proves that $\operatorname{vol}_{G}(A_{i})\geq\frac{h}{\Delta}-1$ and completes the proof of the lemma. $\Box$
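The decomposition procedure from this proof can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `centroid_decompose`, the input format (a list of spanning-tree edges plus a dictionary `deg` of degrees in $G$), and the requirement $h\geq\Delta$ (so that a single vertex never exceeds $h$) are choices made here.

```python
from collections import defaultdict


def centroid_decompose(tree_edges, deg, h):
    """Greedy decomposition in the spirit of Lemma 8.

    tree_edges: edges of a spanning tree T of G; deg: dict vertex -> degree in G.
    Repeatedly removes a centroid edge from any subtree whose volume exceeds h.
    Assumes h >= max degree, so that a single vertex never exceeds h.
    """
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)

    def component(root):
        """Vertices of the current subtree containing root (parents first)."""
        parent, order, stack = {root: None}, [], [root]
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if w != parent[v]:
                    parent[w] = v
                    stack.append(w)
        return order, parent

    parts, done, work = [], set(), list(deg)
    while work:
        root = work.pop()
        if root in done:
            continue
        order, parent = component(root)
        total = sum(deg[v] for v in order)
        if total <= h or len(order) == 1:
            parts.append(set(order))
            done.update(order)
            continue
        # Volume of the subtree hanging below each vertex (children before parents).
        sub = {v: deg[v] for v in order}
        for v in reversed(order):
            if parent[v] is not None:
                sub[parent[v]] += sub[v]
        # A centroid edge (v, parent[v]) maximises the volume of the smaller side.
        best = max((v for v in order if parent[v] is not None),
                   key=lambda v: min(sub[v], total - sub[v]))
        adj[best].discard(parent[best])
        adj[parent[best]].discard(best)
        work.extend([best, parent[best]])
    return parts


# Example: a path on 12 vertices with G = T, so Delta = 2; take h = 6.
path = [(i, i + 1) for i in range(11)]
deg = {v: (1 if v in (0, 11) else 2) for v in range(12)}
for part in centroid_decompose(path, deg, 6):
    print(sorted(part), sum(deg[v] for v in part))
```

Volumes are always measured with respect to $G$, not the current subtree, which is why `deg` is passed in separately from the tree edges.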

Now, we are ready to prove Theorem 6 and Theorem 5.

*Proof. * (Proof of Theorem 6.) Let us take $h=\sqrt{n\Delta\bar{d}}+\Delta$ and partition $V(G_{n})$ into $A_{1},\ldots,A_{k}$ according to Lemma 8. To obtain the desired lower bound, we estimate $q_{{\cal{A}}}$ for ${\cal{A}}=\{A_{1},\ldots,A_{k}\}$ . We first deal with the edge contribution. As stated in Lemma 8, we have $\operatorname{vol}_{G_{n}}(A_{i})>\frac{h}{\Delta}-1$ for all $i$ . Also, $\sum_{i}\operatorname{vol}_{G_{n}}(A_{i})=\operatorname{vol}_{G_{n}}(G_{n})=n\bar{d}$ . Therefore, $k\leq n\bar{d}/(\frac{h}{\Delta}-1)$ . The number of intracluster edges in the spanning tree is $n-k$ , and clearly this is a lower bound for $\sum_{A\in{\cal{A}}}e(A)$ . Finally,

[TABLE]

It remains to estimate the degree tax. Recall that $\operatorname{vol}_{G_{n}}(A_{i})\leq h$ for all $i$ and $\sum_{i}\operatorname{vol}_{G_{n}}(A_{i})=n\bar{d}$ . Therefore,

[TABLE]

and so the proof is finished. $\Box$
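For readers following the computation, the two estimates in this proof can be written out from the quantities stated in the text. What follows is a reconstruction, not a transcription of the original displays; it assumes the usual definition of the degree tax, $\sum_{A\in{\cal{A}}}\operatorname{vol}(A)^{2}/(4|E|^{2})$ , and uses $|E(G_{n})|=n\bar{d}/2$ and $h-\Delta=\sqrt{n\Delta\bar{d}}$ .

$$\frac{1}{|E(G_{n})|}\sum_{A\in{\cal{A}}}e(A)\;\geq\;\frac{n-k}{n\bar{d}/2}\;=\;\frac{2}{\bar{d}}-\frac{2k}{n\bar{d}}\;\geq\;\frac{2}{\bar{d}}-\frac{2\Delta}{h-\Delta}\;=\;\frac{2}{\bar{d}}-2\sqrt{\frac{\Delta}{n\bar{d}}},$$

$$\sum_{i=1}^{k}\frac{\operatorname{vol}_{G_{n}}(A_{i})^{2}}{4|E(G_{n})|^{2}}\;\leq\;\frac{h\sum_{i}\operatorname{vol}_{G_{n}}(A_{i})}{(n\bar{d})^{2}}\;=\;\frac{h}{n\bar{d}}\;=\;\sqrt{\frac{\Delta}{n\bar{d}}}+\frac{\Delta}{n\bar{d}}.$$

Subtracting the second estimate from the first gives $q_{{\cal{A}}}\geq\frac{2}{\bar{d}}-3\sqrt{\frac{\Delta}{n\bar{d}}}-\frac{\Delta}{n\bar{d}}$ , which is $\frac{2}{\bar{d}}-o(1)$ whenever $\bar{d}$ is bounded and $\Delta=o(n)$ .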

*Proof. * (Proof of Theorem 5.) This proof is similar to the previous one. Let us fix $h=\sqrt{\Delta n}$ . The idea is to partition $V(F_{n})$ into $A_{1},\ldots,A_{k}$ such that for each $i$ : $\operatorname{vol}_{F_{n}}(A_{i})\leq h$ and a subgraph induced by $A_{i}$ is a tree. Our forest $F_{n}$ may already contain trees $T_{1},\ldots,T_{\ell}$ with $\operatorname{vol}_{F_{n}}(T_{i})\leq h$ . Let us denote the corresponding vertex sets by $A_{1},\ldots,A_{\ell}$ . We decompose the remaining trees according to Lemma 8 (applied to each tree separately) into $A_{\ell+1},\ldots,A_{k}$ .

Now we have the partition ${\cal{A}}=\{A_{1},\ldots,A_{k}\}$ of the vertex set $V(F_{n})$ . In order to estimate $q_{{\cal{A}}}$ we first consider the edge contribution. According to Lemma 8, $\operatorname{vol}_{F_{n}}(A_{i})\geq\frac{h}{\Delta}-1$ for $\ell+1\leq i\leq k$ . Therefore, it is easy to show that for each intercluster edge we can find at least $\frac{h}{2\Delta}-1$ intracluster edges. Hence,

[TABLE]

It remains to estimate the degree tax. Recall that $\operatorname{vol}_{F_{n}}(A_{i})\leq h$ for $1\leq i\leq k$ and $\sum_{i}\operatorname{vol}_{F_{n}}(A_{i})\geq n$ . Therefore,

[TABLE]

and so the proof is finished. $\Box$
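Both proofs in this section bound $q_{{\cal{A}}}$ by treating the edge contribution and the degree tax separately. A minimal Python sketch of that split is given below; the function name and input format are assumptions made for illustration, and the degree tax is taken in its usual form $\sum_{A}\operatorname{vol}(A)^{2}/(4|E|^{2})$ .

```python
from collections import Counter


def modularity(edges, partition):
    """Return (edge_contribution, degree_tax) of a partition of a simple graph.

    edges: list of undirected edges (u, v); partition: list of disjoint vertex
    sets covering every endpoint.  q_A = edge_contribution - degree_tax.
    """
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    m = len(edges)
    inside = sum(1 for u, v in edges if part_of[u] == part_of[v])
    vol = Counter()
    for u, v in edges:
        vol[part_of[u]] += 1   # each edge adds one to the degree of each endpoint
        vol[part_of[v]] += 1
    edge_contribution = inside / m
    degree_tax = sum(s * s for s in vol.values()) / (4 * m * m)
    return edge_contribution, degree_tax


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ec, tax = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
print(ec, tax, ec - tax)
```

In this toy example six of the seven edges are intracluster and each part carries half of the total volume, so the degree tax is exactly $1/2$ .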

5 The Preferential Attachment model

5.1 Lower bound

The following theorem easily follows from the results of the previous section.

Theorem 10

For any $\varepsilon>0$ a.a.s.

[TABLE]

*Proof. * Let $\varepsilon>0$ . It is well-known that a.a.s. $\Delta(G_{m}^{n})=O\left(n^{\frac{1}{2}+2\varepsilon}\right)$ (see, e.g., [22] and Theorem 17 in [12]). Also, clearly the average degree of $G_{m}^{n}$ is at most $2m$ (it can be less due to the removal of loops and multiple edges). In addition, for $m\geq 2$ a.a.s. $G_{m}^{n}$ is connected [13]. So, the statement of Theorem 10 follows directly from Theorems 5 and 6. $\Box$

We would like to remark that the obtained lower bound holds for many other models of complex networks. For example, it holds for the Random Apollonian Network [50] (in this case $m=3$ ) or for the Buckley-Osthus model [17] (with a slightly corrected error term).

As in the case of random $d$ -regular graphs, it is natural to conjecture that the above lower bound is not sharp. Let $c\in(0,1)$ and consider the following partition: $A_{1}=\{v_{1},\ldots,v_{cn}\}$ , $A_{2}=V(G_{m}^{n})\setminus A_{1}=\{v_{cn+1},\ldots,v_{n}\}$ . Using martingales, it is possible to show that a.a.s. $\sum_{v\in A_{1}}\deg(v,n)\sim 2mn\sqrt{c}$ (and so $\sum_{v\in A_{2}}\deg(v,n)\sim 2mn(1-\sqrt{c})$ ); see Lemma 11 below. Clearly, $e(A_{1})=mcn$ and so a.a.s. $e(A_{1},A_{2})\sim 2mn(\sqrt{c}-c)$ and $e(A_{2})\sim mn(1+c-2\sqrt{c})$ . The edge contribution and the degree tax are then both asymptotic to $1+2c-2\sqrt{c}$ . Not surprisingly, such a partition cannot be used to get a non-trivial lower bound for the modularity but, similarly to the situation for random $d$ -regular graphs, we may try to use it as a starting point to get a slightly better partition. The basic idea is very simple: one can start with a given partition (or partition the vertices randomly into two classes), and if a vertex has more neighbours in the other class than in its own, then we randomly decide whether to shift it to the other class or leave it where it is. This approach proved to be useful to get a bound for the bisection width in random $d$ -regular graphs [3] which, in turn, yields a lower bound for the modularity [36]. In the proceedings version of this paper [44] we promised to investigate this approach. However, the following turns out to be slightly easier to do.
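The claim that both quantities are asymptotic to $1+2c-2\sqrt{c}$ for the two-part partition $\{A_{1},A_{2}\}$ can be verified directly from the values quoted above. The short calculation below assumes $|E(G_{m}^{n})|\sim mn$ and the usual form of the degree tax; it is a worked check, not part of the original text.

$$\frac{e(A_{1})+e(A_{2})}{mn}\sim c+(1+c-2\sqrt{c})=1+2c-2\sqrt{c},$$

$$\frac{(2mn\sqrt{c})^{2}+(2mn(1-\sqrt{c}))^{2}}{(2mn)^{2}}=c+(1-\sqrt{c})^{2}=1+2c-2\sqrt{c}.$$

The two terms cancel to leading order, so this partition has modularity $o(1)$ .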

We will use the following standard martingale tool: the Hoeffding-Azuma inequality; for more details, see, for example, [30]. Let $X_{0},X_{1},\ldots$ be a martingale. Suppose that there exist $c_{1},c_{2},\ldots,c_{n}>0$ such that $|X_{k}-X_{k-1}|\leq c_{k}$ for each $1\leq k\leq n$ . Then, for every $x>0$ ,

[TABLE]

The Hoeffding-Azuma inequality can be generalized to include random variables close to martingales. One of our proofs, the proof of Lemma 11, will use the supermartingale method of Pittel et al. [46], as described in [49, Corollary 4.1]. Let $X_{0},X_{1},\ldots,X_{n}$ be a sequence of random variables. Suppose that there exist $c_{1},c_{2},\ldots,c_{n}>0$ and $b_{1},b_{2},\ldots,b_{n}>0$ such that

[TABLE]

for each $1\leq k\leq n$ . Then, for every $x>0$ ,

[TABLE]

Let us now prove the following lemma.

Lemma 11

Fix any constant $c\in(0,1)$ and $m\in\mathbb{Z}_{\geq 0}$ . The following property holds a.a.s. for $G^{n}_{m}$ . For any $s$ , $cn\leq s\leq n$ ,

[TABLE]

where $Y_{s}:=\sum_{w\in[cn]}\deg(w,s)$ .

*Proof. * In view of the identification between the models $G_{m}^{n}$ (on the vertex set $1,2,\ldots,n$ ) and $G_{1}^{mn}$ (on the vertex set $1^{\prime},2^{\prime},\ldots,mn^{\prime}$ ), it will be useful to investigate the following random variable instead of $Y_{s}$ : for $m\lfloor cn\rfloor\leq t\leq mn$ , let

[TABLE]

Clearly, $Y_{s}=X_{sm}$ . It follows that $X_{m\lfloor cn\rfloor}=Y_{\lfloor cn\rfloor}=2m\lfloor cn\rfloor$ . Moreover, for $m\lfloor cn\rfloor<t\leq mn$ ,

[TABLE]

The conditional expectation is given by

[TABLE]

Taking expectation again, we derive that

[TABLE]

Hence, it follows that

[TABLE]

In order to transform $X_{t}$ into something close to a martingale (to be able to apply the generalized Azuma-Hoeffding inequality (9)), we set for $m\lfloor cn\rfloor\leq t\leq mn$

[TABLE]

(note that $Z_{m\lfloor cn\rfloor}=0$ ) and use the following stopping time

[TABLE]

Indeed, we have for $m\lfloor cn\rfloor<t\leq mn$

[TABLE]

provided $t\leq T$ , and $|Z_{t}-Z_{t-1}|\leq 1$ as $t>cmn$ . Let $t\wedge T$ denote $\min\{t,T\}$ . We apply the generalized Azuma-Hoeffding inequality (9) to the sequence $(Z_{t\wedge T}:m\lfloor cn\rfloor\leq t\leq mn)$ , with $c_{t}=1$ , $b_{t}=0.51t^{-1/3}$ and $x=0.1t^{2/3}$ , to conclude that a.a.s. for all $t$ such that $m\lfloor cn\rfloor\leq t\leq mn$

[TABLE]

To complete the proof we need to show that a.a.s. $T=mn$ . The events asserted by the equation hold a.a.s. up until time $T$ , as shown above. Thus, in particular, a.a.s.

[TABLE]

which implies that $T=mn$ a.a.s. In particular, it follows that a.a.s., for any $cn\leq s\leq n$ , $Y_{s}=X_{ms}<2mn\sqrt{cs/n}+(mn)^{2/3}=2mn\sqrt{cs/n}+o(n)$ . The lower bound can be obtained by applying the same argument symmetrically to $(-Z_{t\wedge T}:m\lfloor cn\rfloor\leq t\leq mn)$ , and so the proof is finished. $\Box$

Now, we are ready to prove the following, stronger, lower bound.

Theorem 12

A.a.s.:

[TABLE]

That is, a.a.s.

[TABLE]

In particular, a.a.s. $q^{*}(G_{m}^{n})=\Omega\left(1/\sqrt{m}\right)$ .

Before we prove the theorem, let us present numerical values for a few values of $m$ : $L_{1}=L_{1}(m)=1/m$ is the lower bound following from Theorem 10 and $L_{2}=L_{2}(m)$ is the lower bound from Theorem 12; see Table 2. Large degree tax hidden in $L_{2}$ makes this bound weaker for small values of $m\leq 6$ ; for larger values $L_{2}$ is better than $L_{1}$ .

*Proof. * Let $\varepsilon>0$ be any constant. Let us start with generating $G_{m}^{\varepsilon n}$ ; vertices from $[\varepsilon n/4]$ are coloured red and vertices from $[\varepsilon n]\setminus[\varepsilon n/4]$ are coloured blue. We will continue generating $G_{m}^{n}$ , colouring vertices red or blue (one by one, as they are introduced in the process), depending on how many of their neighbours are of each colour. We want to control the sum of degrees of vertices in each colour; that is, the following random variable

[TABLE]

The colouring process depends on the parity of $m$ . If $m$ is even, we colour vertex $t\in[n]\setminus[\varepsilon n]$ red if more than $m/2$ neighbours (in $G_{m}^{t}$ ) are red. If the number of red neighbours is precisely $m/2$ , we colour it red with probability $1/2+p_{t}$ , where $p_{t}=p_{t}(Y_{t-1})=o(1)$ will be determined soon. Otherwise, $t$ is coloured blue. If $m$ is odd, the process is slightly different. If the number of red neighbours is more than $(m+1)/2$ , we colour it red. If it is $(m+1)/2$ or $(m-1)/2$ , we colour it red with probability $1-q_{t}$ and, respectively $r_{t}$ , where $q_{t}+r_{t}=q_{t}(Y_{t-1})+r_{t}(Y_{t-1})=o(1)$ . Otherwise, $t$ is coloured blue. The arguments for both cases are almost identical so we assume now that $m$ is even; it will be clear what needs to be adjusted for odd value of $m$ . In both situations, our hope is that the two graphs, induced by red and blue vertices, will be dense.
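For even $m$ , the colouring rule just described is a majority vote with a slightly biased tie-break. The sketch below illustrates only that rule; the function name, the string labels, and passing $p_{t}$ in as a plain number are assumptions made here, and the choice of $p_{t}$ itself is determined later in the proof.

```python
import random


def colour_new_vertex(red_neighbours, m, p_t, rng=random):
    """Colour rule for even m: the majority colour wins; on a tie the vertex
    is coloured red with probability 1/2 + p_t and blue otherwise."""
    if m % 2 != 0:
        raise ValueError("this sketch covers only even m")
    if 2 * red_neighbours > m:
        return "red"
    if 2 * red_neighbours == m:
        return "red" if rng.random() < 0.5 + p_t else "blue"
    return "blue"


print(colour_new_vertex(3, 4, 0.0), colour_new_vertex(1, 4, 0.0))
```

The bias $p_{t}$ is the only free parameter; it is what allows the expected increment of the red volume to be tuned to exactly $m$ .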

It follows from Lemma 11 that a.a.s. $|Y_{\varepsilon n}-m\varepsilon n|\leq(m\varepsilon n)^{2/3}$ , so we may assume that this inequality holds. This time we use the following stopping time

[TABLE]

Arguing as in the previous lemma, we get that

[TABLE]

provided that $t<T$ . Since

[TABLE]

and

[TABLE]

we get that

[TABLE]

Since ${\mathbb{P}}\left(\operatorname{Bin}\left(m,1/2\right)=m/2\right)=\binom{m}{m/2}2^{-m}=\Theta(1)$ , we can adjust $p_{t}=p_{t}(Y_{t-1})=O(t^{-1/3})$ so that ${\bf E}(Y_{t}-Y_{t-1}~{}~{}|~{}~{}Y_{t-1})=m$ ; that is, the sequence of random variables $Z_{t}=(Y_{t}-Y_{\varepsilon n})-m(t-\varepsilon n)$ is a martingale. It follows from the classic Hoeffding-Azuma inequality (8), applied to $Z_{t}$ with $c_{t}=m$ and $x=(mn)^{2/3}$ , that a.a.s., for each $\varepsilon n\leq t\leq n$ ,

[TABLE]

The rest of the proof is straightforward. We partition the vertex set of $G_{m}^{n}$ into red and blue vertices. The degree tax is a.a.s.

[TABLE]

It remains to estimate the edge contribution. Clearly, the process guarantees that at least half of the edges are within the two clusters. However, we will do slightly better than that. For any $i\in[m/2]$ , with probability asymptotic to $2{\mathbb{P}}(\operatorname{Bin}(m,1/2)=m/2+i)$ , at any point of the process we add $m/2+i$ edges to some cluster; $m/2$ edges are added with probability asymptotic to ${\mathbb{P}}(\operatorname{Bin}(m,1/2)=m/2)$ . Hence, the expected number of edges added to some cluster is asymptotic to

[TABLE]

The expected edge contribution is then asymptotic to

[TABLE]

Finally, one can bound the edge contribution (independently, from above and from below) by the sum of independent random variables, and use the Chernoff bound to obtain concentration. It follows that a.a.s.

[TABLE]

and the result holds after taking $\varepsilon\to 0$ sufficiently slowly.

Finally, some elementary calculations show that for any $t\in[0,m/8]$ , we have

[TABLE]

see, for example, [15]. (More general and precise bounds can be found in [21].) It follows that a.a.s. $q^{*}(G_{m}^{n})\geq 2(\sqrt{m}/8)e^{-1/4}/(15m)+o(1)=\Omega(1/\sqrt{m})$ , and the proof is finished. $\Box$
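The expected number of intracluster edges added per step, as described in the proof for even $m$ , can be evaluated exactly for small $m$ . The sketch below sums the terms named in the text ( $m/2+i$ edges with probability $2{\mathbb{P}}(\operatorname{Bin}(m,1/2)=m/2+i)$ , and $m/2$ edges with probability ${\mathbb{P}}(\operatorname{Bin}(m,1/2)=m/2)$ ), ignoring the $o(1)$ tie-breaking bias. The function name is an assumption, and the printed values are not taken from Table 2.

```python
from fractions import Fraction
from math import comb


def expected_intracluster_edges(m):
    """Expected number of the m new edges landing inside a cluster (even m):
    E[max(X, m - X)] for X ~ Bin(m, 1/2), ignoring the o(1) bias."""
    if m % 2 != 0:
        raise ValueError("this sketch covers only even m")
    half = m // 2

    def prob(j):
        return Fraction(comb(m, j), 2 ** m)   # P(Bin(m, 1/2) = j)

    total = prob(half) * half                 # tie: m/2 edges stay inside
    for i in range(1, half + 1):
        total += 2 * prob(half + i) * (half + i)
    return total


for m in (2, 4, 6, 8, 50):
    share = expected_intracluster_edges(m) / m   # approximate edge contribution
    print(m, float(share), float(share) - 0.5)
```

The last column subtracts $1/2$ , on the assumption that the two colour classes each carry about half of the total volume, so it approximates the resulting modularity bound for that $m$ .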

5.2 Upper bound

Recall that the edge expansion $\rho=\rho(G)$ of a graph $G$ is defined as follows:

[TABLE]

In [37] it was shown that a.a.s. $\rho(G^{n}_{m})\geq\alpha$ , provided that $2(m-1)-4\alpha-1>0$ . In other words, for any $\varepsilon>0$ we have that a.a.s.

[TABLE]

Using this observation one can easily obtain a non-trivial upper bound for $q^{*}(G_{m}^{n})$ .

Let $\varepsilon>0$ be an arbitrarily small constant. Consider any partition ${\cal{A}}=\{A_{1},\ldots,A_{k}\}$ of the vertex set $V(G_{m}^{n})$ . If $|A_{i}|>n/2$ for some $i$ , then the degree tax is at least

[TABLE]

On the other hand, if $|A_{i}|\leq n/2$ for all $i$ , then a.a.s. the number of edges between parts is equal to

[TABLE]

and so the edge contribution is a.a.s. at most

[TABLE]

for any $m\geq 2$ . Therefore, the following result holds.

Theorem 13

For any $\varepsilon>0$ a.a.s.

[TABLE]

Moreover, for any $m\geq 3$ a.a.s.

[TABLE]

Some stronger expansion properties were recently obtained in [27]. However, whereas they presumably could be used to obtain some small improvements for an upper bound of $q^{*}(G_{m}^{n})$ (for specific values of $m$ ), we do not know how to show that $q^{*}(G_{m}^{n})\to 0$ as $m\to\infty$ . Perhaps $q^{*}(G_{m}^{n})=\Theta(1/\sqrt{m})$ as in the case of random $(2m)$ -regular graphs?

6 The Spatial Preferential Attachment model

Consider $G_{n}=(V_{n},E_{n})$ , a graph generated by the SPA model. As the modularity is defined for undirected graphs, we consider $\hat{G}_{n}$ that is a graph obtained from $G_{n}$ by replacing each directed edge $(u,v)$ by undirected edge $uv$ . (As edges in $G_{n}$ are always from ‘younger’ to ‘older’ vertices, there is no problem with generating a multigraph; $\hat{G}_{n}$ is a simple graph.) Let us recall that $V_{n}\subseteq S$ where $S$ is the unit hypercube $[0,1]^{m}$ . We will use the geometry of the model to obtain a suitable partition that yields high modularity of $G_{n}$ . The following properties (proved many times; see, for example, [1, 19]) are the only properties of the model that will be used in the proof: a.a.s. for every pair $i,t$ such that $1\leq i\leq t\leq n$ we have that

[TABLE]

and $|E(G_{n})|=\Theta(n)$ . Since we aim for a result that holds a.a.s., we may assume in the proof below that these properties hold deterministically. Now, we are ready to state our result for the SPA model.

Theorem 14

Let $p\in(0,1]$ , $A_{1},A_{2}>0$ , and suppose that $pA_{1}<1$ . Then, the following holds a.a.s.:

[TABLE]

*Proof. * Let $\omega=\left[n^{\min\{1/m,1-pA_{1}\}/2}\log n^{-1/2}\right]$ . Note that $\omega\geq n^{\varepsilon}$ for some $\varepsilon>0$ that depends on the parameters of the model. Let us partition the space $S$ into $\omega$ parts as follows: for each integer $1\leq r\leq\omega$ ,

[TABLE]

This partition of $S$ naturally gives us a partition ${\cal{A}}$ of the vertex set: for each $1\leq r\leq\omega$ , $A_{r}=V_{n}\cap S_{r}$ . We will show that a.a.s.

[TABLE]

which will finish the proof as $q^{*}(\hat{G}_{n})\geq q_{{\cal{A}}}(\hat{G}_{n})$ and always $q^{*}(\hat{G}_{n})\leq 1$ .
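The partition used here cuts the hypercube into $\omega$ slabs along the first coordinate, with cutting hyperplanes at $s_{1}=r/\omega$ . A minimal sketch of the induced vertex partition follows; the function names and the convention of half-open slabs (with the boundary $s_{1}=1$ placed in the last slab) are assumptions for illustration.

```python
def strip_index(point, omega):
    """1-based index r of the slab containing point, cutting [0,1]^m along the
    first coordinate into omega slabs of width 1/omega."""
    r = int(point[0] * omega) + 1
    return min(r, omega)          # put the boundary s_1 = 1 into the last slab


def partition_by_strips(points, omega):
    """Group vertex indices by slab, mirroring A_r = V_n intersected with S_r."""
    parts = {r: [] for r in range(1, omega + 1)}
    for i, p in enumerate(points):
        parts[strip_index(p, omega)].append(i)
    return parts


points = [(0.1, 0.2), (0.6, 0.9), (0.12, 0.7)]
print(partition_by_strips(points, 2))
```

Only the first coordinate matters, which is why an edge can cross between parts only if its older endpoint lies close to one of the two hyperplanes bounding its slab.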

First, let us start with estimating the edge contribution. In order to do that, we need to estimate the number of edges between different parts. So, let us focus on any part $A_{r}$ . We will investigate how many bad edges in $G_{n}$ connect vertices outside of $A_{r}$ with vertices inside $A_{r}$ by counting (independently) bad edges directed to vertices of similar age. (Note that for convenience we consider here directed graph $G_{n}$ instead of $\hat{G}_{n}$ .) For a given integer $k$ such that $0\leq k\leq\lfloor\log n\rfloor$ , let

[TABLE]

It is clear that $\{V^{(k)}:0\leq k\leq\lfloor\log n\rfloor\}$ and $\{E^{(k)}:0\leq k\leq\lfloor\log n\rfloor\}$ are partitions of the vertex set and the edge set (both in $\hat{G_{n}}$ and $G_{n}$ ), respectively, and so $\{C^{(k)}:0\leq k\leq\lfloor\log n\rfloor\}$ is a partition of the bad edges we want to count. It remains to estimate the size of $C^{(k)}$ for a given value of $k$ .

Fix $0\leq k\leq\lfloor\log n\rfloor$ , and let us concentrate on any $v_{i}\in V^{(k)}$ . It follows from (10) that the maximum volume of a sphere of influence of $v_{i}$ is $O(i^{-1}\log^{2}n)=O(e^{-k}\log^{2}n)$ (during the whole process) and so the maximum radius of influence of $v_{i}$ is $O((e^{-k}\log^{2}n)^{1/m})$ . Therefore, if there is an edge in the cut directed to $v_{i}=(s_{1},\ldots,s_{m})$ , then $v_{i}$ must fall not only into $A_{r}$ but also into a strip within distance $O((e^{-k}\log^{2}n)^{1/m})$ from one of the two cutting hyperplanes separating $A_{r}$ from the neighbouring parts; that is, $|s_{1}-\frac{r-1}{\omega}|=O((e^{-k}\log^{2}n)^{1/m})$ or $|s_{1}-\frac{r}{\omega}|=O((e^{-k}\log^{2}n)^{1/m})$ . Since $|V^{(k)}|=O(e^{k})$ , we get that

[TABLE]

vertices of $V^{(k)}$ are expected to appear in these two strips during the whole process. Hence, it follows from the Chernoff bound that with probability at least $1-\exp(-\Theta(\log^{2}n))$ there are $O(e^{k(1-1/m)}\log^{2}n)$ vertices in these strips at the end of the process. Note that the exponent of $\log n$ has changed from $2/m$ to $2$ in order to guarantee the claimed upper bound is at least $\log^{2}n$ which is required for a bound to hold with the desired probability. Using (10) one more time, we get that all vertices introduced in this time period have (final) in-degree $O((n/e^{k})^{pA_{1}}\log^{2}n)$ . Hence, there are

[TABLE]

edges in the cut with probability at least $1-\exp(-\Theta(\log^{2}n))$ and so this property holds a.a.s. for all parts $A_{r}$ and all values of $k$ . It follows that a.a.s. the number of bad edges involving $A_{r}$ is at most

[TABLE]

Finally, we get an estimate for the edge contribution: a.a.s.

[TABLE]

It remains to estimate the degree tax. In order to do that we need to, for a given $r$ under consideration, estimate $\sum_{v\in A_{r}}\deg(v)$ in $\hat{G}_{n}$ ; that is, $\sum_{v\in A_{r}}(\deg^{-}(v)+\deg^{+}(v))$ in $G_{n}$ . As before, we partition the vertices of $A_{r}$ into sets containing vertices of similar age. Let $k_{0}$ be the largest integer $k$ such that $(k-1)\omega\log^{2}n<n$ . Clearly, $k_{0}=O(n/(\omega\log^{2}n))$ . This time, for a given integer $k$ such that $1\leq k\leq k_{0}$ , let

[TABLE]

and our goal is to estimate the size of $A_{r}\cap V^{(k)}$ . The expected number of vertices of $V^{(k)}$ that fall into $A_{r}$ is $|V^{(k)}|/\omega\leq\log^{2}n+1$ and it follows from Chernoff’s bound that with probability at least $1-\exp(-\Theta(\log^{2}n))$ it is $O(\log^{2}n)$ . Using (10) for the last time, we get that all vertices introduced in this time period have (final) in-degree $O((n/(k\omega\log^{2}n))^{pA_{1}}\log^{2}n)$ , provided $k\geq 2$ ; and $O(n^{pA_{1}}\log^{2}n)$ for $k=1$ . It follows that with the desired probability

[TABLE]

and so it holds a.a.s. for all $r$ . Similarly, using Chernoff’s bound and (11) we get that a.a.s. for all $r$ we have $|A_{r}|\sim n/\omega$ and so

[TABLE]

Finally, we are able to get an estimate for the degree tax in $\hat{G_{n}}$ : a.a.s.

[TABLE]

and the proof is finished. $\Box$

7 Discussion and future research

In this paper, we investigated modularity and provided precise theoretical bounds for several random graph models, such as random $d$ -regular graphs, constant average degree graphs, preferential attachment and SPA models. However, there are plenty of directions for future research. For example, for the preferential attachment model we expect that $q^{*}(G_{m}^{n})=\Theta(1/\sqrt{m})$ . However, even the fact that $q^{*}(G_{m}^{n})\to 0$ as $m\to\infty$ is still unproven.

Also, in this paper we studied the most popular version of modularity, while other definitions (suitable for some particular clustering problems) were proposed in the literature (see discussion in [24]). For example, it was proposed to multiply the degree tax by a resolution parameter $\gamma$ . Note that most of our results can be easily extended to such a definition, as we separately estimate edge contribution and degree tax. Also, the Erdős–Rényi random graph model can be used as a null model (instead of the pairing model) to compute the degree tax. This version of modularity is much easier to analyze, but such a null model cannot describe real networks well, since it has an unrealistic Poisson degree distribution.

Finally, we would like to note that there is another model, which, similarly to SPA, combines geometry and preferential attachment [23]. It would be interesting to investigate the modularity for this model and we expect that its modularity tends to 1 (as for the SPA model). However, these two models are different and our result does not imply anything for the other model.

Acknowledgements

This work is supported by Russian Science Foundation (grant number 16-11-10014), NSERC, The Tutte Institute for Mathematics and Computing, and Ryerson University.

Appendix

7.1 Random $d$ -regular graphs, some ideas for an upper bound

Idea 1: Recall that in order to get the current best upper bound we showed that a.a.s. no set of size $xn$ induces more than $\bar{y}(x,d)xn/2$ edges. As a result the largest value of $y_{i}/d-x_{i}$ in (4) is at most $U_{3}(d)$ . For example, for $d=3$ the optimal choice that maximizes $U_{3}(3)$ is: $x=\hat{x}\approx 0.0225$ , $y=\hat{y}\approx 2.4789$ , and so $U_{3}(3)\approx 0.8038$ as reported in Table 1. However, clearly it is impossible to partition a graph precisely into parts of size $\hat{x}n$ . It is possible to show that the following upper bound holds, which is clearly not larger than the previous one:

[TABLE]

Unfortunately, this maximum value is achieved for $k=45$ (which corresponds to parts of size roughly $0.0222n$ ), and no improvement is achieved: $U_{4}(3)\approx 0.8038$ . The reason this idea fails is that the optimal value of $\hat{x}$ is small so that rounding to the nearest integer for $k$ does not improve the bound much.

Idea 2: Let us look at (4) again but this time let us order the terms so that

[TABLE]

It follows that

[TABLE]

It is slightly more tedious than before, but one can get an improvement by considering (ordered) pairs of disjoint vertex sets $X_{1},X_{2}$ with $|X_{1}|=x_{1}n$ , $|X_{2}|=x_{2}n$ , $e(X_{1})=y_{1}x_{1}n/2$ , $e(X_{2})=y_{2}x_{2}n/2$ , and $e(X_{1},X_{2})=zn$ . Unfortunately, this idea also does not provide any reasonable improvement. For $d=3$ , the expected number of pairs of sets for the following vector $(x_{1},x_{2},y_{1},y_{2},z)=(\hat{x_{1}},\hat{x_{2}},\hat{y_{1}},\hat{y_{2}},\hat{z})\approx(0.0239,0.0225,2.4830,2.4790,0.000037)$ and, again, no substantial improvement is achieved: $U_{5}(3)\approx 0.8038$ .

Idea 3: As before, let us concentrate on the case $d=3$ but similar ideas can be used for any integer $d\geq 3$ . We can try to use the fact that $G_{n,3}$ can be constructed by putting a random matching on the vertices of a Hamiltonian cycle. Let us fix any set of vertices of size $xn$ that induces $zn$ components (paths) by restricting only to edges of the Hamiltonian cycle. Each such set can be represented by the following triple: vertex $v$ , vector $(a_{1}-1,\ldots,a_{zn}-1)$ , and vector $(b_{1}-1,\ldots,b_{zn}-1)$ : $v$ starts some path, $a_{i}$ is the number of vertices on path $i$ , $b_{i}$ is the number of vertices not in the set and right after path $i$ . The number of such sets is at most $n{xn\choose zn}{(1-x)n\choose zn}$ . The number of edges within this set that are part of the Hamiltonian cycle is $xn-zn$ . Hence, in order for the set to induce $yxn/2$ edges, $(yx/2-x+z)n$ edges must be coming from the matching.

The hope is (that is, was) that for small values of $z$ , there are only a few sets to consider. On the other hand, if $z$ is closer to $x$ , then fewer edges are “for free” (edges of the Hamiltonian cycle). Unfortunately, again this idea does not lead to any substantial improvement. Concentrating on $d=3$ , $\hat{x}=0.0225$ , $\hat{y}=2.4789$ , and tuning $\hat{z}\approx 0.00392$ , the expected number of such sets is tending to infinity as $n\to\infty$ .

Conclusion: The lack of improvement is disappointing but perhaps should not be surprising. Looking at one or two parts of a partition maximizing $q^{*}$ is not enough (local property). Having one large term $y_{i}/d-x_{i}$ in (4) might be possible but having all of them to be large perhaps is not. So in order to improve the upper bound, one needs to consider all parts at the same time (global property).

Bibliography

[1] W. Aiello, A. Bonato, C. Cooper, J. Janssen, and P. Prałat, A spatial web graph model with local influence regions, Internet Mathematics 5 (2009), 175–196.
[2] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Reviews of modern physics, vol. 74 (2002), 47–97.
[3] N. Alon, On the edge-expansion of graphs, Combinatorics, Probability and Computing 6 (1997), 145–152.
[4] N. Alon and F.R.K. Chung, Explicit construction of linear sized tolerant networks, Discrete Math., 72 (1988), 15–19.
[5] N. Alon and V.D. Milman, $\lambda_{1}$ , isoperimetric inequalities for graphs and superconcentrators, J. Combinatorial Theory, Ser. B 38 (1985), 73–88.
[6] N. Alon and J.H. Spencer, The Probabilistic Method, Wiley, 1992 (Second Edition, 2000).
[7] S. Bansal, S. Khandelwal, L.A. Meyers, Exploring biological network structure with clustered random networks, BMC Bioinformatics, 10:405 (2009).
[8] A.L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
