Step-by-Step Community Detection in Volume-Regular Graphs
Luca Becchetti, Emilio Cruciani, Francesco Pasquale, Sara Rizzo

TL;DR
This paper extends spectral community detection methods to volume-regular graphs, showing that under certain spectral gap conditions, the community structure can be efficiently recovered without explicit eigenvector computation.
Contribution
It generalizes previous approaches to a broader class of graphs, establishing a connection between volume regularity and Markov chain lumpability for community detection.
Findings
Community structure can be recovered in logarithmic time.
The class of volume-regular graphs admits stepwise eigenvectors.
Spectral gap conditions ensure successful recovery.
Step-by-Step Community Detection in Volume-Regular Graphs
Luca Becchetti
Sapienza Università di Roma
Rome, Italy
Partially supported by ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”.
Emilio Cruciani
Inria, I3S Lab, UCA, CNRS
Sophia Antipolis, France
Francesco Pasquale
Università di Roma Tor Vergata
Rome, Italy
Partially supported by the University of “Tor Vergata” under research programme “Mission: Sustainability” project ISIDE (grant no. E81I18000110005).
Sara Rizzo
Gran Sasso Science Institute
L’Aquila, Italy
Abstract
Spectral techniques have proved amongst the most effective approaches to graph clustering. However, in general they require explicit computation of the main eigenvectors of a suitable matrix (usually the Laplacian matrix of the graph).
Recent work (e.g., Becchetti et al., SODA 2017) suggests that observing the temporal evolution of the power method applied to an initial random vector may, at least in some cases, provide enough information on the space spanned by the first two eigenvectors, so as to allow recovery of a hidden partition without explicit eigenvector computations. While the results of Becchetti et al. apply to perfectly balanced partitions and/or graphs that exhibit very strong forms of regularity, we extend their approach to graphs containing a hidden $k$-partition and characterized by a milder form of volume-regularity. We show that the class of $k$-volume regular graphs is the largest class of undirected (possibly weighted) graphs whose transition matrix admits $k$ “stepwise” eigenvectors (i.e., vectors that are constant over each set of the hidden partition). To obtain this result, we highlight a connection between volume regularity and lumpability of Markov chains. Moreover, we prove that if the stepwise eigenvectors are those associated to the first $k$ eigenvalues and the gap between the $k$-th and the $(k+1)$-th eigenvalues is sufficiently large, the Averaging dynamics of Becchetti et al. recovers the underlying community structure of the graph in logarithmic time, with high probability.
Keywords: Distributed algorithms, Community detection, Markov chains, Spectral analysis
1 Introduction
Clustering a graph in a way that reflects underlying community structure is a very important mining task [For10]. Informally speaking, in the classical setting, we are given a possibly weighted graph $G$ and an integer $k$. Our goal is to partition the vertex set of $G$ into $k$ disjoint subsets, so that the $k$ induced subgraphs have high inner and low outer expansion. Spectral techniques have proved amongst the most effective approaches to graph clustering [NJW02, SM00, VL07]. The general approach to spectral graph clustering [VL07] normally implies embedding the vertices of $G$ into the $k$-dimensional subspace spanned by the $k$ main eigenvectors of a matrix defined in terms of $G$'s adjacency matrix, typically its (normalized) Laplacian. Intuitively, one expects that, for a well-clustered graph with $k$ communities, the profiles of the first $k$ eigenvectors are correlated with the underlying community structure of $G$. Recent work has provided theoretical support to this approach. In particular, [LGT14] showed that, given the first $k$ orthonormal eigenvectors of the normalized Laplacian, it is possible to produce a $k$-partition of the vertex set, corresponding to $k$ suitably-defined indicator vectors, such that the associated values of the Rayleigh quotient are relatively small. More recently, [PSZ17] proved that, under suitable hypotheses on the spectral gap between the $k$-th and $(k+1)$-th eigenvalue of the normalized Laplacian of $G$, the span of the first $k$ eigenvectors largely overlaps with the span of $\{D^{1/2} \mathbf{g}_1, \ldots, D^{1/2} \mathbf{g}_k\}$, where $D$ is the diagonal degree matrix of $G$, while the $\mathbf{g}_i$'s are indicator vectors describing a $k$-way partition $\{S_1, \ldots, S_k\}$ of $V$ such that, for every $i$, the conductance of $S_i$ is at most the $k$-way expansion constant $\rho(k)$ [LGT14]. Note that, if $\mathbf{u}$ is an eigenvector associated to the $i$-th smallest eigenvalue of the normalized Laplacian, $D^{-1/2} \mathbf{u}$ is an eigenvector corresponding to the $i$-th largest eigenvalue of the random walk's transition matrix $P$ associated to $G$.
Hence, when $G$ is well-clustered, one might reasonably expect the first $k$ eigenvectors of $P$ to exhibit almost-“stepwise” profiles reflecting $G$'s underlying community structure. The aforementioned spectral approaches require explicit computation of the $k$ main eigenvectors of a (generally symmetric) matrix.
In [BCN+17], the authors considered the case $k = 2$, for which they proposed the following distributed algorithm (Averaging dynamics, Algorithm 1): “At the outset, every node picks an initial value, independently and uniformly at random in $\{-1, 1\}$; then, in each synchronous round, every node updates its value to the average of those held by its neighbors. A node also tags itself blue if the last update increased its value, red otherwise” [BCN+17]. The authors showed that, under a variety of graph models exhibiting sparse balanced cuts, including the stochastic block model [HLL83], the process resulting from the above simple local rule converges, in logarithmic time, to a coloring that, depending on the model, exactly or approximately reflects the underlying cut. They further elaborated on how to extend the proposed approach to the case of multiple communities, providing an analysis for a strongly regular version of the stochastic block model with multiple communities. While results like those presented in [LGT14, PSZ17] provide further theoretical justification for spectral clustering, the approach proposed in [BCN+17] suggests that observing the temporal evolution of the power method applied to an initial random vector may, at least in some cases, provide equivalent information, without requiring explicit eigenvector computations.
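The local rule quoted above can be simulated in a few lines of plain Python. This is only a minimal sketch under stated assumptions: the test graph (two 6-cliques joined by a perfect matching), the function names and the fixed starting vector `x0` are illustrative choices and do not come from the paper.

```python
import random

def two_cliques_with_matching(m):
    """Adjacency lists of two m-cliques joined by a perfect matching (hypothetical test graph)."""
    adj = []
    for u in range(2 * m):
        base = 0 if u < m else m
        same_side = [v for v in range(base, base + m) if v != u]
        partner = u + m if u < m else u - m
        adj.append(same_side + [partner])
    return adj

def averaging_dynamics(adj, rounds, x0=None, seed=0):
    """Run `rounds` synchronous averaging steps; return (labels, values)."""
    rng = random.Random(seed)
    n = len(adj)
    # Each node starts from -1 or +1, chosen independently and uniformly at random,
    # unless an explicit starting vector is supplied.
    x = list(x0) if x0 is not None else [rng.choice((-1.0, 1.0)) for _ in range(n)]
    labels = ["red"] * n
    for _ in range(rounds):
        # Every node moves to the average of its neighbours' current values.
        new_x = [sum(x[v] for v in adj[u]) / len(adj[u]) for u in range(n)]
        # Blue if the last update increased the node's value, red otherwise.
        labels = ["blue" if new_x[u] > x[u] else "red" for u in range(n)]
        x = new_x
    return labels, x

adj = two_cliques_with_matching(6)
x0 = [1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, -1]  # fixed start, for reproducibility
labels, values = averaging_dynamics(adj, rounds=20, x0=x0)
```

With this particular start, after 20 rounds all nodes of one clique carry the same colour and the two cliques carry opposite colours. With a random start the same typically happens, except when the initial values happen to sum to the same amount on both sides.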
1.1 Our contributions
The goal of this work is to take a further step in this direction by considering a more general class of graphs, even if still relatively “regular”, than the one considered in [BCN+17]. The analysis of the Averaging dynamics on this class is considerably harder, but it is likely to provide insights into the challenges of analyzing the general case, without all the intricacies of the latter. Our contribution is as follows:
- We define the class of $k$-volume-regular graphs. This class of edge-weighted graphs includes those considered in [BCN+17] and it is the largest class of undirected, possibly weighted graphs that admit $k$ “stepwise” eigenvectors (i.e., having constant values over the $k$ steps that identify the hidden partition). This result uses a connection between volume regularity and lumpability of Markov chains [KS60, TK06].
- If the stepwise eigenvectors are those associated to the first $k$ eigenvalues and the gap between the $k$-th and the $(k+1)$-th eigenvalues is sufficiently large, we show that running the Averaging dynamics for a suitable number of steps allows recovery of the underlying community structure of the graph, with high probability (w.h.p.; an event $\mathcal{E}_n$ holds w.h.p. if $\mathbf{P}(\mathcal{E}_n) = 1 - \mathcal{O}(n^{-\gamma})$, for some constant $\gamma > 0$). To prove this, we provide a family of mutually orthonormal vectors which, when the graph is volume-regular, span the eigenspace of the $k$ main eigenvectors of the normalized adjacency matrix of the graph. It should be noted that the first and second of these vectors are respectively the main eigenvector and the Fiedler vector [Fie89] associated to the normalized adjacency matrix.
- While the results of [BCN+17] apply when the underlying communities are of the same size, our results do not require this assumption and they apply to weighted graphs. It should also be noted that volume regularity is a weaker notion than regularity of the graph.
- We further show that variants of the Averaging dynamics (and/or its labeling rule) can address different problems (e.g., identifying bipartiteness) and/or other graph classes.
We further note that the overall algorithm we consider can be viewed as a fully decentralized, synchronous algorithm that works in anonymous networks (i.e., nodes do not possess distinguished identities), with a completely local clustering criterion, though it cannot be considered a dynamics in the sense of [BCN+17] since it requires a bound on the number of nodes in the underlying network.
Finally, this paper extends a preliminary version [BCPR19] in several ways. To begin, the main result presented in [BCPR19] was weaker, in the sense that the constraints imposed on the eigenvalues in [BCPR19, Theorem 9] polynomially depend on network parameters like the maximum degree and the number of vertices. In this respect, they are substantially stronger than those imposed to prove Theorem 4.1, where results (in particular, the time window in which recovery of the hidden partition is possible) are expressed in terms of the spectrum of the graph, while constraints imposed on the second eigenvalue only logarithmically depend on the aforementioned network parameters. In reframing these results, we also realized that the presence of a window in which recovery is possible is something that is hardly avoidable in general using the simple averaging heuristic of [BCN+17]. This is something we remark right after Theorem 4.1 (see Remark 1), while we also observe (see Remarks 3 and 4) that the analysis presented here also encompasses the class of regular graphs considered in [BCN+17] as a special case, something that was not obvious in [BCPR19]. Lastly, the result given in [BCPR19] for bipartite graphs assumed volume regularity, an assumption that is not necessary as we show in Section 5.
1.2 Further related work
We briefly discuss further work that bears some relationship to this paper, either because it adopts simple and/or decentralized heuristics to uncover community structure, or because it relies on the use of spectral techniques.
Decentralized heuristics for block reconstruction.
Label propagation algorithms [RAK07] are dynamics based on majority updating rules [AAE08] and have been applied for detecting communities in complex networks. Several papers present experimental results for such protocols on specific classes of clustered graphs [BC09, LM10, RAK07]. The only available rigorous analysis of a label propagation algorithm on planted partition graphs is the one presented in [KPS13], where the authors consider the case of dense topologies. In particular, their analysis considers a range of the edge probabilities $p$ (inside communities) and $q$ (across communities) in which very dense clusters of constant diameter separated by a sparse cut occur w.h.p. In this setting, characterized by a polynomial gap between $p$ and $q$, simple combinatorial and concentration arguments show that the protocol converges in constant expected time. A logarithmic bound for sparser topologies is conjectured in [KPS13].
Following [BCN+17], a number of recent papers analyze simple distributed algorithms for community detection that rely on elementary dynamics. In the Averaging dynamics considered in this paper, every node communicates in parallel with all its neighbors in each round. While this might be too expensive in scenarios characterized by dense topologies, it is simply infeasible in other settings (for instance, when links represent opportunistic meetings that occur asynchronously). Motivated by similar considerations, a first line of follow-up work considered “sparsified”, asynchronous variants of the Averaging dynamics [BCM+18, MMM18, SZ17].
Another interesting direction is the rigorous analysis of well-known (non-linear) dynamics based on majority rules on graphs that exhibit community structure. In [CNNS18], Cruciani et al. consider the 2-Choices dynamics where, in each round, every node picks two random neighbors and updates its value to the most frequent among its value and those held by its sampled neighbors. They show that if the underlying graph has a suitable core-periphery structure and the process starts in a configuration where nodes in core and periphery have different states, the system either rapidly converges to the core’s state or reaches a metastable regime that reflects the underlying graph structure. Similar results have been also obtained for clustered regular graphs with dense communities in [CNS19], where the 2-Choices dynamics is proposed as a distributed algorithm for community detection.
Although based on the Averaging dynamics and thus extremely simple and fully decentralized, the algorithm we consider in this paper is not itself a dynamics in the sense proposed in [BCN+17], since its clustering criterion is applied within a time window, which in turn requires (at least approximate) knowledge of the network size.
Because of their relevance for the reconstruction problem, we also briefly discuss the class of belief propagation algorithms, best known as message-passing algorithms for performing inference in graphical models [Mac03]. Though not a dynamics, belief propagation is still a simple approach. Moreover, there is non-rigorous, strong supporting evidence that some belief propagation algorithms might be optimal for the reconstruction problem [DKMZ11]. A rigorous analysis is a major challenge; in particular, convergence to the correct value of belief propagation is far from being fully-understood on graphs which are not trees [MK07, Wei00]. As we discuss in the next subsection, more complex algorithms inspired by belief propagation have been rigorously shown to perform reconstruction optimally.
General algorithms for block reconstruction.
Several algorithms for community detection are spectral: they typically consider the eigenvector associated to the second largest eigenvalue of the adjacency matrix $A$ of $G$, or the eigenvector corresponding to the largest eigenvalue of the matrix $A - \frac{d}{n} J$ [Bop87, CO05, CO10, McS01] (here $A$ is the adjacency matrix of $G$, $J$ is the matrix having all entries equal to $1$, $d$ is the average degree, and $n$ is the number of vertices), since these are correlated with the hidden partition. More recently spectral algorithms have been proposed [AS15, BLM15, CO10, KMM+13, MNS13, PSZ17] that find a weak reconstruction even in the sparse, tight regime.
Interestingly, spectral algorithms turn out to be a feasible approach also in distributed settings. In particular, Kempe and McSherry [KM04] show that eigenvalue computations can be performed in a distributed fashion, yielding distributed algorithms for community detection under various models, including the stochastic block model. However, their algorithm does not match any simple decentralized computing model. In particular, the algorithm of Kempe and McSherry as well as any distributed version of the above mentioned centralized algorithms are neither dynamics, nor do they correspond to the notion of light-weight algorithm of Hassin and Peleg [HP01]. Moreover, the mixing time of the simple random walk on the graph is a bottleneck for the distributed algorithm of Kempe and McSherry and for any algorithm that performs community detection in a graph by employing the power method or the Lanczos method [Lan50] as a subroutine. This is not the case for the Averaging dynamics, since it removes the component of the state in the span of the main eigenvector.
In general, the reconstruction problem has been studied extensively using a multiplicity of techniques, which include combinatorial algorithms [DF89], belief propagation [DKMZ11] and variants of it [MNS16], spectral-based techniques [CO10, McS01], Metropolis approaches [JS98], and semidefinite programming [ABH14], among others.
1.3 Roadmap
The rest of this paper is organized as follows. In Section 2, we formally define the Averaging dynamics and briefly recall how it is connected with the transition matrix of a random walk on the underlying graph. We also define the notion of community-sensitive algorithm and the class of clustered volume-regular graphs. In Section 3 we show the relation between lumpability of Markov chains and volume-regular graphs. In Section 4 we state the main result of the paper (see Theorem 4.1) on the analysis of the Averaging dynamics for clustered volume-regular graphs: we give the main technical lemmas and show how the main theorem derives from them. In Section 5, we show how slightly modified versions of the Averaging dynamics can be used to identify the hidden partition of other non-clustered volume-regular graphs, e.g., bipartite graphs. In Section 6 we briefly show how our approach can be extended to slightly more general graph classes than the ones considered in this paper. We finally highlight some open problems and directions for further research on the topic.
2 Preliminaries
Notation.
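Consider an undirected edge-weighted graph $G = (V, E, w)$ with nonnegative weights. For each node $u \in V$, we denote by $\delta(u)$ the volume, or weighted degree, of node $u$, namely $\delta(u) = \sum_{v : (u,v) \in E} w(u,v)$. Similarly, we denote the volume of a set of nodes $T \subseteq V$ as $\mathrm{vol}(T) = \sum_{u \in T} \delta(u)$. $D$ denotes the diagonal matrix, such that $D_{uu} = \delta(u)$ for each $u \in V$. Without loss of generality we assume $\min_{u \in V} \delta(u) = 1$, since the behavior of the Averaging dynamics (and the corresponding analysis) is not affected by a normalization of the weights. We refer to the maximum volume of a node as $\Delta = \max_{u \in V} \delta(u)$.
In the remainder, $W$ denotes the weighted adjacency matrix of $G$, while $P = D^{-1} W$ is the transition matrix of a random walk on $G$, in which a transition from node $u$ to node $v$ occurs with probability proportional to $w(u,v)$. We call $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ the eigenvalues of $P$, in non-increasing order, and $\mathbf{v}_1, \ldots, \mathbf{v}_n$ a family of eigenvectors of $P$, such that $P \mathbf{v}_i = \lambda_i \mathbf{v}_i$. We let $N = D^{-1/2} W D^{-1/2} = D^{1/2} P D^{-1/2}$ denote the normalized weighted adjacency matrix of $G$. Note that $N$ is real and symmetric (thus, its eigenvectors can be chosen to be orthonormal) and that its spectrum is the same as that of $P$. We denote by $\mathbf{w}_1, \ldots, \mathbf{w}_n$ a family of orthonormal eigenvectors of $N$, such that $N \mathbf{w}_i = \lambda_i \mathbf{w}_i$. It is important to note that $\mathbf{v}_i$ is an eigenvector of $P$ if and only if $D^{1/2} \mathbf{v}_i$ is an eigenvector of $N$.
We use the Bachmann–Landau asymptotic notation (i.e., $\omega, \Omega, \Theta, \mathcal{O}, o$) to describe the limiting behavior of functions depending on $n$. In this sense, our results only hold for large $n$. We say that an event $\mathcal{E}_n$ holds with high probability (w.h.p., in short) if $\mathbf{P}(\mathcal{E}_n) = 1 - \mathcal{O}(n^{-\gamma})$, for some positive constant $\gamma$.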
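The correspondence between eigenvectors of the transition matrix and of the normalized adjacency matrix can be checked numerically. The sketch below assumes the standard definitions $P = D^{-1} W$ and $N = D^{-1/2} W D^{-1/2}$, with $D$ the diagonal matrix of node volumes; the 6-node weighted graph and all identifiers are hypothetical and chosen only so that the eigenvector is easy to verify by hand.

```python
import math

# Hypothetical weighted graph on 6 nodes (symmetric weights).
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 3, (3, 4): 1, (3, 5): 1, (4, 5): 3,
         (0, 3): 1, (1, 4): 2, (2, 5): 2}
n = 6
W = [[0.0] * n for _ in range(n)]
for (u, v), weight in edges.items():
    W[u][v] = W[v][u] = float(weight)

delta = [sum(row) for row in W]  # volumes (weighted degrees)
P = [[W[u][v] / delta[u] for v in range(n)] for u in range(n)]  # D^{-1} W
N = [[W[u][v] / math.sqrt(delta[u] * delta[v]) for v in range(n)]
     for u in range(n)]  # D^{-1/2} W D^{-1/2}

def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

lam = 1.0 / 3.0
v = [1, 1, 1, -1, -1, -1]                           # eigenvector of P for eigenvalue 1/3
y = [math.sqrt(delta[u]) * v[u] for u in range(n)]  # D^{1/2} v

# Largest entrywise violation of P v = lam v and of N y = lam y.
p_residual = max(abs(a - lam * b) for a, b in zip(matvec(P, v), v))
n_residual = max(abs(a - lam * b) for a, b in zip(matvec(N, y), y))
```

Both residuals vanish up to floating-point error, so the same eigenvalue is attained by $\mathbf{v}$ for $P$ and by $D^{1/2} \mathbf{v}$ for $N$.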
2.1 Averaging dynamics
The simple algorithm we consider in this paper, named Averaging dynamics (Algorithm 1) after [BCN+17], where it was first proposed, can be seen as an application of the power method, augmented with a Rademacher initialization and a suitable labeling scheme. In this form, it is best described as a distributed process, executed by the nodes of an underlying edge-weighted graph. The Averaging dynamics can be used as a building-block to achieve “community detection” in some classes of “regular” and “almost regular” graphs. Herein, we extend its use and analysis to broader graph classes and, in one case, to a different problem.
Spectral decomposition of the transition matrix.
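Let $\mathbf{x}^{(t)}$ denote the state vector at time $t$, i.e., the vector whose $u$-th entry is the value held by node $u$ at time $t$. We let $\mathbf{x}^{(0)} = \mathbf{x}$ denote the initial state vector. Globally, the averaging update rule of Algorithm 1 corresponds to one iteration of the power method, in this case an application of the transition matrix $P$ to the current state vector, i.e., $\mathbf{x}^{(t)} = P \mathbf{x}^{(t-1)}$. We can write
$$\mathbf{x}^{(t)} = P^t \mathbf{x} = D^{-1/2} N^t D^{1/2} \mathbf{x} \overset{(a)}{=} D^{-1/2} \Big( \sum_{i=1}^{n} \lambda_i^t \mathbf{w}_i \mathbf{w}_i^\top \Big) \Big( \sum_{j=1}^{n} \beta_j \mathbf{w}_j \Big) \overset{(b)}{=} \sum_{i=1}^{n} \lambda_i^t \beta_i D^{-1/2} \mathbf{w}_i,$$
where in $(a)$ we spectrally decomposed the matrix $N^t$ and expressed the vector $D^{1/2} \mathbf{x}$ as a linear combination of the eigenvectors of $N$, i.e., $D^{1/2} \mathbf{x} = \sum_{i=1}^{n} \beta_i \mathbf{w}_i$, with $\beta_i = \langle D^{1/2} \mathbf{x}, \mathbf{w}_i \rangle$; in $(b)$ we used that the eigenvectors of $N$ are orthonormal, i.e., that $\mathbf{w}_i^\top \mathbf{w}_i = 1$ for every $i \in [n]$ and that $\mathbf{w}_i^\top \mathbf{w}_j = 0$ for every $i$ and $j$ such that $i \neq j$. By explicitly writing the $\beta_i$s and by noting that $\mathbf{w}_i = D^{1/2} \mathbf{v}_i / \| D^{1/2} \mathbf{v}_i \|$ we conclude that
$$\mathbf{x}^{(t)} = \sum_{i=1}^{n} \lambda_i^t \alpha_i \mathbf{v}_i, \tag{1}$$
where $\alpha_i = \frac{\langle D^{1/2} \mathbf{x}, D^{1/2} \mathbf{v}_i \rangle}{\| D^{1/2} \mathbf{v}_i \|^2}$ is the length of the projection of $D^{1/2} \mathbf{x}$ on $D^{1/2} \mathbf{v}_i$, measured in units of $\| D^{1/2} \mathbf{v}_i \|$.
Note that $\lambda_1 = 1$ and $\mathbf{v}_1 = \mathbf{1}$ (here and in the remainder, $\mathbf{1}$ denotes the vector whose entries are all $1$), since $P$ is stochastic, and $\lambda_i \in (-1, 1)$ for every $i > 1$, if $G$ is connected and non-bipartite. The long term behavior of the dynamics can be written as
$$\lim_{t \to \infty} \mathbf{x}^{(t)} = \alpha_1 \mathbf{1} = \Big( \sum_{u \in V} \frac{\delta(u)}{\mathrm{vol}(V)} \, \mathbf{x}(u) \Big) \mathbf{1},$$
i.e., each node converges to the initial global weighted average of the network.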
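The convergence to the volume-weighted average of the initial values can be observed on a toy instance. The following is a small sketch, assuming the update rule is "replace each value by the weighted average of the neighbours' values"; the weighted triangle and the starting vector are hypothetical.

```python
def averaging_step(W, x):
    """One round: each node takes the weighted average of its neighbours' values."""
    n = len(W)
    return [sum(W[u][v] * x[v] for v in range(n)) / sum(W[u]) for u in range(n)]

# Hypothetical weighted triangle: connected and non-bipartite.
W = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
x0 = [1.0, -1.0, 1.0]

volumes = [sum(row) for row in W]  # weighted degrees: 4, 3, 5
weighted_avg = sum(d * xi for d, xi in zip(volumes, x0)) / sum(volumes)

x = x0
for _ in range(200):
    x = averaging_step(W, x)
```

Here `weighted_avg` equals $(4 - 3 + 5)/12 = 0.5$, and after 200 rounds every entry of `x` agrees with it up to floating-point error. Note that the limit is the volume-weighted average, not the plain arithmetic mean of `x0` (which is $1/3$).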
2.2 Community-sensitive algorithms
We give the following definition of community-sensitive algorithm, which closely resembles that of locality-sensitive hashing (see, e.g., [LRU14]).
Definition 2.1 (Community-sensitive algorithm).
Let $\mathcal{A}$ be a randomized algorithm that takes in input a (possibly weighted) graph $G = (V, E)$ with a hidden partition $\mathcal{V} = \{V_1, \ldots, V_k\}$ and assigns a Boolean value $\mathcal{A}(u) \in \{0, 1\}$ to each node $u \in V$. We say $\mathcal{A}$ is an $(\varepsilon, \delta)$-community-sensitive algorithm, for some $\varepsilon, \delta > 0$, if the following two conditions hold:
1. For each set $V_i$ of the partition and for each pair of nodes $u, v \in V_i$ in that set, the probability that the algorithm assigns the same Boolean value to $u$ and $v$ is at least $1 - \varepsilon$:
$$\forall i \in [k], \ \forall u, v \in V_i: \quad \mathbf{P}(\mathcal{A}(u) = \mathcal{A}(v)) \geq 1 - \varepsilon.$$
2. For each pair $V_i, V_j$ of distinct sets of the partition and for each pair of nodes $u \in V_i$ and $v \in V_j$, the probability that the algorithm assigns the same value to $u$ and $v$ is at most $\delta$:
$$\forall i \neq j, \ \forall u \in V_i, \ \forall v \in V_j: \quad \mathbf{P}(\mathcal{A}(u) = \mathcal{A}(v)) \leq \delta.$$
For example, whenever $\varepsilon < 1/2 \leq \delta < 1$, an algorithm that simply assigns the same value to all nodes would satisfy the first condition but not the second one, while an algorithm assigning $0$ or $1$ to each node with probability $1/2$, independently of the other nodes, would satisfy the second condition but not the first one.
Note that Algorithm 1 is a distributed algorithm that, at each round , assigns one out of two labels to each node of a graph. In the next section (see Theorem 4.1) we prove that a time window exists, such that for all rounds , the assignment of the Averaging dynamics satisfies both conditions in Definition 2.1: The first condition with , the second with .
Community-sensitive labeling.
We here generalize the concept of community-sensitive labeling (which appeared in [BCM+18, Definition 3]), given only for the case of two communities, to the case of multiple communities. If we execute $\ell$ independent runs of an $(\varepsilon, \delta)$-community-sensitive algorithm $\mathcal{A}$, each node is assigned a signature of $\ell$ binary values, with pairwise Hamming distances probabilistically reflecting community membership of the nodes. More precisely, let $\mathcal{A}$ be an $(\varepsilon, \delta)$-community-sensitive algorithm and let $\mathcal{A}_1, \ldots, \mathcal{A}_\ell$ be $\ell$ independent runs of $\mathcal{A}$. For each node $u \in V$, let $\mathbf{s}(u) = (s_1(u), \ldots, s_\ell(u))$ denote the signature of node $u$, where $s_i(u) = \mathcal{A}_i(u)$. For each pair of nodes $u, v$, let $h(u, v) = |\{ i \in [\ell] : s_i(u) \neq s_i(v) \}|$ be the Hamming distance between $\mathbf{s}(u)$ and $\mathbf{s}(v)$.
Lemma 2.2 (Community-sensitive labeling).
Let be an -community-sensitive algorithm with for any arbitrarily small positive constant , and . Let , with , and for any constant and such that . Then, for each pair of nodes it holds that:
1. If $u$ and $v$ belong to the same community then , w.h.p.
2. If $u$ and $v$ belong to different communities then , w.h.p.
Proof.
From the definition of -community-sensitive algorithm we have that, if and belong to the same community, then . Similarly, if they belong to different communities, then . If and belong to the same community, we compute and by Markov inequality we get that
[TABLE]
where in the last inequality we use the hypothesis and . On the other hand, if and belong to different communities, we apply Theorem A.1 to by using the lower bound on the expected value of and the hypothesis . Thus,
[TABLE]
where is a positive constant. The thesis follows by combining Eqs. 2 and 3. ∎
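The signature construction behind the lemma can be illustrated as follows. This is a sketch only: `toy_labeling` is a hypothetical stand-in for one run of a community-sensitive algorithm (it is not the Averaging dynamics), and the number of runs used here is arbitrary rather than the value required by the lemma.

```python
import random

def toy_labeling(partition, rng, flip_prob=0.0):
    """Hypothetical stand-in for one run of a community-sensitive algorithm:
    one random bit per community, each node flipping it with probability `flip_prob`."""
    labels = {}
    for community in partition:
        bit = rng.randint(0, 1)
        for u in community:
            labels[u] = bit ^ (rng.random() < flip_prob)
    return labels

def signatures(partition, runs, seed=0, flip_prob=0.0):
    """Collect the labels of `runs` independent runs into one signature per node."""
    rng = random.Random(seed)
    sig = {u: [] for community in partition for u in community}
    for _ in range(runs):
        labels = toy_labeling(partition, rng, flip_prob)
        for u, bit in labels.items():
            sig[u].append(bit)
    return sig

def hamming(a, b):
    """Number of positions in which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

partition = [[0, 1, 2], [3, 4, 5]]
sig = signatures(partition, runs=32)
same_community_distance = hamming(sig[0], sig[1])
cross_community_distance = hamming(sig[0], sig[3])
```

With `flip_prob=0.0`, nodes of the same community always agree, so their Hamming distance is 0, while nodes of different communities agree in each run with probability $1/2$ and their distance concentrates around half the number of runs. Clustering nodes by thresholding pairwise Hamming distances is the idea the lemma makes rigorous.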
2.3 Volume-regular graphs
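Recall that, for an undirected edge-weighted graph $G = (V, E, w)$, we denote by $\delta(u)$ the volume of a node $u \in V$, i.e., $\delta(u) = \sum_{v : (u,v) \in E} w(u,v)$. Note that the transition matrix $P$ of a random walk on $G$ is such that $P_{uv} = w(u,v) / \delta(u)$. Given a partition $\mathcal{V} = \{V_1, \ldots, V_k\}$ of the set of nodes $V$, for a node $u \in V$ and a partition index $i \in [k]$, $\delta_i(u)$ denotes the overall weight of edges connecting $u$ to nodes in $V_i$, i.e., $\delta_i(u) = \sum_{v \in V_i : (u,v) \in E} w(u,v)$. Hence, $\delta(u) = \sum_{i=1}^{k} \delta_i(u)$.
Definition 2.3 (Volume-regular graph).
Let $G = (V, E, w)$ be an undirected edge-weighted graph with $|V| = n$ nodes and let $\mathcal{V} = \{V_1, \ldots, V_k\}$ be a $k$-partition of the nodes, for some $k \in [n]$. We say that $G$ is volume regular with respect to $\mathcal{V}$ if, for every pair of partition indexes $i, j \in [k]$ and for every pair of nodes $u, v \in V_i$, $\frac{\delta_j(u)}{\delta(u)} = \frac{\delta_j(v)}{\delta(v)}$. We say that $G$ is $k$-volume regular if there exists a $k$-partition $\mathcal{V}$ of the nodes such that $G$ is volume regular with respect to $\mathcal{V}$.
In other words, $G$ is volume regular if there exists a partition of the nodes such that the fraction of a node’s volume toward a set of the partition is constant across nodes of the same set. Note that all graphs with $n$ nodes are trivially $1$- and $n$-volume regular.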
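The condition "the fraction of a node's volume toward each set of the partition is constant across nodes of the same set" translates directly into a check. The sketch below uses exact rational arithmetic; the helper names and the 6-node example graph (whose node volumes are 3, 6, 6, 3, 6, 6, so it is not regular) are hypothetical.

```python
from fractions import Fraction

def weight_matrix(n, edges):
    """Symmetric weight matrix from a dict {(u, v): weight}."""
    W = [[0] * n for _ in range(n)]
    for (u, v), weight in edges.items():
        W[u][v] = W[v][u] = weight
    return W

def is_volume_regular(W, partition):
    """True if, for every ordered pair of sets (block, target), all nodes of
    `block` send the same fraction of their volume toward `target`."""
    for block in partition:
        for target in partition:
            shares = {Fraction(sum(W[u][v] for v in target), sum(W[u]))
                      for u in block}
            if len(shares) > 1:
                return False
    return True

edges = {(0, 1): 1, (0, 2): 1, (1, 2): 3, (3, 4): 1, (3, 5): 1, (4, 5): 3,
         (0, 3): 1, (1, 4): 2, (2, 5): 2}
W = weight_matrix(6, edges)

regular = is_volume_regular(W, [[0, 1, 2], [3, 4, 5]])         # True
not_regular = is_volume_regular(W, [[0, 1], [2, 3, 4, 5]])     # False
```

In the first partition every node keeps $2/3$ of its volume inside its own set, even though volumes differ across nodes. The single-set partition and the partition into singletons pass the check for any graph, matching the remark that every graph is trivially $1$- and $n$-volume regular.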
Let $G$ be a $k$-volume regular graph and let $P$ be the transition matrix of a random walk on $G$. In the next lemma we prove that the span of $k$ linearly independent eigenvectors of $P$ equals the span of the indicator vectors of the $k$ communities of $G$. The proof makes use of the correspondence between random walks on volume regular graphs and ordinary lumpable Markov chains [KS60]; in particular the result follows from Lemma 3.2 and Lemma 3.3, which we prove in Section 3.
Lemma 2.4.
Let $P$ be the transition matrix of a random walk on a $k$-volume regular graph $G$ with $k$-partition $\mathcal{V} = \{V_1, \ldots, V_k\}$. There exists a family $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ of linearly independent eigenvectors of $P$ such that $\mathrm{Span}(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}) = \mathrm{Span}(\{\mathbf{1}_{V_1}, \ldots, \mathbf{1}_{V_k}\})$, with $\mathbf{1}_{V_i}$ the indicator vector of the $i$-th set of the partition, for $i \in [k]$.
In the rest of the paper we call “stepwise” the eigenvectors of $P$ that can be written as linear combinations of the indicator vectors of the communities. In the next definition, we formalize the fact that a $k$-volume regular graph is clustered if the $k$ linearly independent stepwise eigenvectors of $P$, whose existence is guaranteed by the above lemma, are associated to the $k$ largest eigenvalues of $P$.
Definition 2.5 (Clustered volume regular graph).
Let $G$ be a $k$-volume regular graph and let $P$ be the transition matrix of a random walk on $G$. We say that $G$ is a clustered $k$-volume regular graph if the $k$ stepwise eigenvectors of $P$ are associated to the first $k$ largest eigenvalues of $P$.
3 Volume-regular graphs and lumpable Markov chains
The class of volume-regular graphs is deeply connected with the definition of lumpability [KS60] of Markov chains. We here first recall the definition of lumpable Markov chain and then show that a graph is volume-regular if and only if the associated weighted random walk is a lumpable Markov chain.
Definition 3.1 (Ordinary lumpability of Markov chains).
Let $\{X_t\}_t$ be a finite Markov chain with state space $V$ and transition matrix $P = (P_{uv})_{u,v \in V}$ and let $\mathcal{V} = \{V_1, \ldots, V_k\}$ be a partition of the state space. Markov chain $\{X_t\}_t$ is ordinary lumpable with respect to $\mathcal{V}$ if, for every pair of partition indexes $i, j \in [k]$ and for every pair of nodes $u, v \in V_i$ in the same set of the partition, it holds that
$$\sum_{z \in V_j} P_{uz} = \sum_{z \in V_j} P_{vz}.$$
We define the lumped matrix $\widehat{P}$ of the Markov chain as the $k \times k$ matrix such that $\widehat{P}_{ij} = \sum_{v \in V_j} P_{uv}$, for any $u \in V_i$.
We first prove that random walks on volume-regular graphs define exactly the subset of reversible and ordinary lumpable Markov chains.
Lemma 3.2.
A reversible Markov chain is ordinary lumpable if and only if it is a random walk on a volume-regular graph.
Proof.
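Assume first that the chain is ordinary lumpable and let $P$ be the corresponding transition matrix. Consider the weighted graph $G = (V, E, w)$ obtained from $P$ as follows: $V$ corresponds to the set of states of the chain, while $w(u, v) = \pi(u) P_{uv}$, for every $u, v \in V$, with $\pi$ the stationary distribution of $P$. Note that $G$ is an undirected graph, i.e., $w(u, v) = \pi(u) P_{uv} \overset{(a)}{=} \pi(v) P_{vu} = w(v, u)$, where $(a)$ holds because $P$ is reversible. Moreover
$$\delta(u) = \sum_{v \in V} w(u, v) = \pi(u) \sum_{v \in V} P_{uv} \overset{(b)}{=} \pi(u),$$
where $(b)$ holds because $P$ is stochastic. Thus $G$ meets Definition 2.3 because, for any $i, j \in [k]$ and any $u, v \in V_i$,
$$\frac{\delta_j(u)}{\delta(u)} = \frac{\sum_{z \in V_j} \pi(u) P_{uz}}{\pi(u)} = \sum_{z \in V_j} P_{uz} = \sum_{z \in V_j} P_{vz} = \frac{\delta_j(v)}{\delta(v)}.$$
Next, assume $G$ is $k$-volume-regular with respect to the partition $\mathcal{V}$. Let $P$ be the transition matrix of the corresponding random walk. For every $i, j \in [k]$ and for every $u, v \in V_i$ we have:
$$\sum_{z \in V_j} P_{uz} = \sum_{z \in V_j} \frac{w(u, z)}{\delta(u)} = \frac{\delta_j(u)}{\delta(u)} \overset{(c)}{=} \frac{\delta_j(v)}{\delta(v)} = \sum_{z \in V_j} P_{vz},$$
where $(c)$ follows from Definition 2.3. Moreover note that $P$ is reversible with respect to distribution $\pi$, where $\pi(u) = \delta(u) / \mathrm{vol}(V)$. ∎
Note that infinitely many $k$-volume-regular graphs have the same $k$-ordinary lumpable random walk chain.
We next show that a Markov chain is $k$-ordinary lumpable if and only if the corresponding transition matrix $P$ has $k$ stepwise, linearly independent eigenvectors.
Lemma 3.3.
Let $P$ be the transition matrix of a Markov chain. Then $P$ has $k$ stepwise linearly independent eigenvectors if and only if $P$ is ordinary lumpable.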
Proof.
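We divide the proof in two parts. First, we assume that $P$ is ordinary lumpable and show that $P$ has $k$ stepwise linearly independent eigenvectors. Second, we assume that $P$ has $k$ stepwise linearly independent eigenvectors and show that $P$ is ordinary lumpable.
Let $P$ be ordinary lumpable and $\widehat{P}$ its lumped matrix. Let $\lambda_i, \widehat{\mathbf{v}}_i$ be the eigenvalues and eigenvectors of $\widehat{P}$, for each $i \in [k]$. Let $\mathbf{v}_i \in \mathbb{R}^n$ be a stepwise vector defined as
$$\mathbf{v}_i = \big( \widehat{\mathbf{v}}_i(1), \ldots, \widehat{\mathbf{v}}_i(1), \ \widehat{\mathbf{v}}_i(2), \ldots, \widehat{\mathbf{v}}_i(2), \ \ldots, \ \widehat{\mathbf{v}}_i(k), \ldots, \widehat{\mathbf{v}}_i(k) \big)^\top,$$
where $\widehat{\mathbf{v}}_i(j)$ indicates the $j$-th component of $\widehat{\mathbf{v}}_i$, and then the components relative to $V_j$ are all equal to $\widehat{\mathbf{v}}_i(j)$.
Since the eigenvectors of $\widehat{P}$ are linearly independent, the vectors $\mathbf{v}_i$ are also linearly independent. Moreover, it is easy to see that $P \mathbf{v}_i = \lambda_i \mathbf{v}_i$ by just verifying the equation for every $u \in V$.
Assume $P$ has $k$ stepwise linearly independent eigenvectors $\mathbf{v}_i$, associated to $k$ eigenvalues $\lambda_i$, for each $i \in [k]$. Let $\widehat{\mathbf{v}}_i \in \mathbb{R}^k$ be the vector that has as components the constant values in the steps of $\mathbf{v}_i$. Since the $\mathbf{v}_i$ are linearly independent, the $\widehat{\mathbf{v}}_i$ also are.
For every eigenvector $\mathbf{v}_i$ and for every two states $u, w \in V_l$, for every $l \in [k]$, we have that $\mathbf{v}_i(u) = \mathbf{v}_i(w)$ since $\mathbf{v}_i$ is stepwise. Then, since $(P \mathbf{v}_i)(u) = \lambda_i \mathbf{v}_i(u) = \lambda_i \mathbf{v}_i(w) = (P \mathbf{v}_i)(w)$, we have that
$$\sum_{j=1}^{k} \widehat{\mathbf{v}}_i(j) \sum_{z \in V_j} P_{uz} = \sum_{j=1}^{k} \widehat{\mathbf{v}}_i(j) \sum_{z \in V_j} P_{wz}.$$
Thus $\sum_{j=1}^{k} \widehat{\mathbf{v}}_i(j) \big( \sum_{z \in V_j} P_{uz} - \sum_{z \in V_j} P_{wz} \big) = 0$ and then it follows that
$$\langle \widehat{\mathbf{v}}_i, \mathbf{y} \rangle = 0 \quad \text{for every } i \in [k],$$
where $\mathbf{y}(j) = \sum_{z \in V_j} (P_{uz} - P_{wz})$. Since the $\widehat{\mathbf{v}}_i$ are $k$ linearly independent vectors in a $k$-dimensional space, $\mathbf{y}$ cannot be orthogonal to all of them unless it is the null vector, i.e., $\mathbf{y}(j) = 0$ for all $j \in [k]$. This implies that $P$ is ordinary lumpable, i.e., $\sum_{z \in V_j} P_{uz} = \sum_{z \in V_j} P_{wz}$. It is easy to verify that the eigenvalues and eigenvectors of $\widehat{P}$ are exactly $\lambda_i, \widehat{\mathbf{v}}_i$, with $i \in [k]$. ∎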
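The first direction of the proof (from an eigenvector of the lumped matrix to a stepwise eigenvector of the full transition matrix) can be replayed on a small instance. The sketch below assumes the lumped matrix has entries equal to the total transition probability from any state of one set into another set; function names and the example graph are hypothetical, and exact fractions are used so that equalities can be tested without tolerances.

```python
from fractions import Fraction

def lumped_matrix(P, partition):
    """Return the lumped matrix of P w.r.t. `partition`, or None if P is not
    ordinary lumpable with respect to it."""
    lumped = []
    for block in partition:
        row = []
        for target in partition:
            masses = {sum(P[u][v] for v in target) for u in block}
            if len(masses) > 1:  # two states of the same block disagree
                return None
            row.append(masses.pop())
        lumped.append(row)
    return lumped

def lift(v_hat, partition, n):
    """Stepwise vector taking the value v_hat[i] on the i-th set of the partition."""
    v = [0] * n
    for i, block in enumerate(partition):
        for u in block:
            v[u] = v_hat[i]
    return v

# Hypothetical 6-node weighted graph and its random-walk transition matrix.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 3, (3, 4): 1, (3, 5): 1, (4, 5): 3,
         (0, 3): 1, (1, 4): 2, (2, 5): 2}
n = 6
W = [[0] * n for _ in range(n)]
for (u, v), weight in edges.items():
    W[u][v] = W[v][u] = weight
P = [[Fraction(W[u][v], sum(W[u])) for v in range(n)] for u in range(n)]

partition = [[0, 1, 2], [3, 4, 5]]
P_hat = lumped_matrix(P, partition)   # [[2/3, 1/3], [1/3, 2/3]]

lam, v_hat = Fraction(1, 3), [1, -1]  # eigenpair of P_hat, checked by hand
v = lift(v_hat, partition, n)
Pv = [sum(P[u][z] * v[z] for z in range(n)) for u in range(n)]
```

Here `Pv` equals `lam * v` entry by entry, so the lifted vector is a stepwise eigenvector of `P` with the same eigenvalue as `v_hat` has for `P_hat`. For a partition that is not compatible with the graph, such as `[[0, 1], [2, 3, 4, 5]]`, `lumped_matrix` returns `None`.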
4 Averaging dynamics on clustered volume regular graphs
Let and be the maximum and minimum sizes of the communities of a volume-regular graph with nodes and -partition . Recall also that is the maximum weighted degree of the nodes of and are the eigenvalues of the transition matrix of a random walk on (see Section 2). In this section we prove the following result.
Theorem 4.1.
Let be a connected clustered -volume-regular graph with nodes and -partition , such that and . Assume further that and . A non-empty time interval exists, with and , such that for each , the Averaging dynamics truncated at round is a -community-sensitive algorithm.
Remark 1 (The extent of the time-window).
Notice that the time window cannot be too long: by Cheeger’s inequality ,555This can be seen by observing that: the minimum volume of a cut must be at least half the minimum degree of the graph, which we normalize to , and in computing , we restrict to subsets of volume at most , which is at most . thus .
Remark 2 (The extent of non-regularity).
Notice that the condition implies
[TABLE]
In other words, the Averaging dynamics gives a good community-sensitive labeling when the communities are not too unbalanced in terms of their volumes. Moreover, the smaller the number of communities, the tighter the volume-balance requirement.
In the remainder of this section, we first introduce further notation and then state the main technical lemmas (Lemmas 4.2, 4.3 and 4.4), that will be used in the proof of Theorem 4.1, which concludes this section.
Let be a clustered -volume regular graph and, without loss of generality, let be an arbitrary ordering of its communities. We introduce a family of stepwise vectors that generalize the Fiedler vector [Fie89], namely
[TABLE]
where is the indicator vector of the set and, for convenience, we denote by the volume of the -th community, by the set of all nodes in communities , and by the volume of , i.e., , , and . Note that the vectors s are “stepwise” with respect to the communities of (i.e., for every , whenever and belong to the same community).
Recall from Eq. 1 that the initial state vector can be written as . Let and note that by applying Lemma 2.4 and because . Let us now define the vector or, equivalently,
[TABLE]
Note that the coefficients s are proportional to the length of the projection of the (inhomogeneously) contracted state vector on the (inhomogeneously) contracted s; the previous expression is valid since the vectors in are mutually orthogonal. (Footnote 6: The mutual orthogonality of the vectors, including , is also one of the reasons why other “simpler” families of stepwise vectors, e.g., the indicator vectors of the communities, are not used instead.)
In Lemma 4.2 we show that every component of , i.e., the projection of the (inhomogeneously) contracted initial state vector on the (inhomogeneously) contracted vectors s, is not too small, w.h.p.
Lemma 4.2 (Length of the projection of the state vector).
Let be a connected clustered -volume-regular graph with nodes and -partition . Under the hypotheses of Theorem 4.1, for every ,
[TABLE]
Proof.
Without loss of generality, we assume , which possibly just amounts to a relabeling of the nodes. With this assumption, we have
[TABLE]
where the second equality follows from the definitions of the ’s (Eq. 5) and the fact that . Next, observe that we have:
[TABLE]
where . We now bound
[TABLE]
More precisely, we prove that it is at least with probability , where probability is computed over the randomness of .
Assume for the moment that . From the definition of we have:
[TABLE]
Now, set and note that from the hypothesis that and since , for every . We can thus apply Theorem A.5 to with , so that we can write:
[TABLE]
where the equality follows since . Hence, with probability we have and thus, with the same probability:
[TABLE]
Assume now that . This time we write:
[TABLE]
and we set . Note that, again, for every . Proceeding as in the previous case we obtain with probability and thus, with the same probability:
[TABLE]
where in we used that (see Remark 2). This concludes the proof. ∎
In Lemma 4.3 we show that given any “pair of steps” of the vector (defined in Eq. 6), the two steps have different signs, with constant probability.
Lemma 4.3 (Different communities, different signs).
Let be a clustered -volume regular graph with maximum weighted degree and with . For each pair of nodes and , with , it holds that
[TABLE]
Proof.
Since the ordering of the communities (and consequent definition of the ’s, given in Eq. 5) is completely arbitrary, we can assume and , without loss of generality. Let us define , where is the initial state vector.
Note that and , since the other terms of the s are equal to 0 on the components relative to and . Thus, with some algebra, we get
[TABLE]
where . Note that, by linearity of expectation, . Moreover, since the terms s are independent Rademacher random variables, we can write the standard deviation of as
[TABLE]
Then we can upper and lower bound the standard deviation getting where the lower bound follows from , where is the vector of weighted degrees of nodes in community , and for the upper bound we used that , for each .
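The step above relies on a standard fact: for independent Rademacher variables, the variance of a weighted sum equals the sum of the squared weights, so the standard deviation is the Euclidean norm of the weight vector. The sketch below verifies this by exhaustive enumeration; the coefficients `a` are hypothetical and are not the ones appearing in the proof.

```python
from itertools import product


def rademacher_sum_variance(a):
    """Exact variance of X = sum_i a_i * x_i over all 2^n equally likely sign patterns."""
    sums = [
        sum(ai * si for ai, si in zip(a, signs))
        for signs in product((-1, 1), repeat=len(a))
    ]
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / len(sums)


a = [1, 2, 3]  # hypothetical coefficients, for illustration only
variance = rademacher_sum_variance(a)
sum_of_squares = sum(ai * ai for ai in a)
print(variance, sum_of_squares)
```

Both printed values agree, which is the identity used to write the standard deviation in closed form before bounding it from above and below.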
Let us now define the following three events:
1. ;
2. ;
3. ,
When are true it follows directly that . As for we have
[TABLE]
since, for the last inequality, by hypothesis.
Note that each of the three events has at least constant probability and, since the events are independent, also is at least constant. Indeed, the constant lower bounds on the probabilities can be proved by approximating the random variables with Gaussian ones via the Berry-Esseen theorem (Theorem A.4). Note that all are of the form , for some and where . Recall that and that . Moreover, note that the third absolute moment of is . Therefore we can apply Theorem A.4, which states that there exists a positive constant [Ber41] such that, for every ,
[TABLE]
where is the cumulative distribution function of the standardized normal distribution. Thus
[TABLE]
Since by hypothesis and for every , taking it follows from Eq. 7 that
[TABLE]
since . Since the distribution of is symmetric for every , it holds that . Similarly, it also holds that
[TABLE]
Recall that the binary labeling of each node only depends on the difference of its state in two consecutive rounds (see Algorithm 1). In Lemma 4.4 we show that, under suitable assumptions on the transition matrix of a random walk on , a large enough time window exists where, for each node , the sign of the difference of the state vector across two consecutive rounds equals the sign of , w.h.p. Since (defined in Eq. 6) is a stepwise vector, this implies that two nodes in the same community have the same label, w.h.p. For the sake of readability, in the proof of Lemma 4.4 we use two technical lemmas as black boxes, postponing their proofs to Subsection 4.1.
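The labeling rule recalled above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1 (which is not shown in this section): each node replaces its state with the weighted average of its neighbors' states, and sets its label according to whether its state increased or decreased. The tie-breaking convention (`>=`), the 4-node graph, and all function names are assumptions of the sketch.

```python
import random


def averaging_round(W, x):
    """One synchronous round: each node moves to the weighted average of its neighbors."""
    return [sum(w * xv for w, xv in zip(row, x)) / sum(row) for row in W]


def labels_from_rounds(x_prev, x_next):
    """Label 1 if the state did not decrease between the two rounds, else 0.

    The tie-breaking convention (>=) is an assumption of this sketch.
    """
    return [1 if b >= a else 0 for a, b in zip(x_prev, x_next)]


def averaging_dynamics(W, rounds, seed=0):
    """Run the dynamics from random +/-1 states; return the labels set in the last round."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in W]
    labels = None
    for _ in range(rounds):
        x_next = averaging_round(W, x)
        labels = labels_from_rounds(x, x_next)
        x = x_next
    return labels


# Hypothetical 4-node graph with communities {0, 1} and {2, 3}.
W = [
    [0, 2, 1, 0],
    [2, 0, 0, 1],
    [1, 0, 0, 2],
    [0, 1, 2, 0],
]
x0 = [1, 1, -1, -1]          # deterministic start, chosen to make the example reproducible
x1 = averaging_round(W, x0)  # every state moves towards the global average 0
print(labels_from_rounds(x0, x1))
```

With this deterministic start, nodes 0 and 1 decrease while nodes 2 and 3 increase, so the two communities receive different labels. With a random start, as in `averaging_dynamics`, the outcome depends on the initial vector, which is why the statements in this section hold only with some probability.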
Lemma 4.4 (Sign of the difference).
Let be a clustered -volume regular graph with maximum weighted degree . If , , for every , then a non-empty time interval exists, with and , such that, for every and every of the Averaging dynamics,
[TABLE]
Proof.
Recall from Eq. 1 that the state vector at time , i.e., , can be written as the sum of the first stepwise vectors of and of the remaining ones, namely
[TABLE]
In what follows we call the core contribution and the error contribution. If we look at the difference of the state vector in two consecutive rounds, the first term cancels out being constant over time, so that
[TABLE]
for each node . Note that the sign of the difference between two consecutive rounds is determined by the difference of the core contributions, , whenever
[TABLE]
To identify conditions on for which Eq. 8 holds, we give suitable bounds on both sides of the inequality. In more detail:
1. In Lemma 4.5 we prove that for every , so that
[TABLE]
2. In Lemma 4.6 we prove that for every and for every time , where ; note that the hypotheses on imply . Moreover, the assumptions of Lemma 4.6 are satisfied, since and .
Combining Lemma 4.5 and Lemma 4.6, we see that Eq. 8 holds whenever
[TABLE]
An easy calculation shows that this happens for all , where
[TABLE]
Note that and, e.g., whenever .
We next show that, under the assumptions of the lemma, the window is not empty and, actually, it has a width that depends on the magnitude of and the ratio . To this purpose, we first observe that Cheeger’s inequality for weighted graphs (Theorem A.3) implies (recall the footnote in Remark 1). Moreover, recalling that we are assuming (Footnote 7: our hypothesis on holds with high probability by Lemma 4.2), we have:
[TABLE]
where: in we used Cheeger’s inequality (Theorem A.3) in the way described above and our assumptions on , in we used the hypothesis on , which implies , and in the lower bound on given by Lemma 4.6.
From Lemma 4.6 we also know that for every time ; therefore we conclude that
[TABLE]
for every node and for every round of the Averaging dynamics. ∎
Proof of Theorem 4.1.
The binary labeling of the nodes of produced by the Averaging dynamics during the time window is such that the two conditions required by the definition of -community-sensitive algorithm (Definition 2.1) are met, with and . Indeed, the first condition follows directly from Lemma 4.4 together with the fact that is a “stepwise” vector, while Lemma 4.2 implies that , since is not too small with probability at least . The second condition, instead, follows directly from the combination of Lemmas 4.4 and 4.3. ∎
Remark 3 (Equal-sized communities).
If , then an alternative version of Lemma 4.6 would tell us that, for every node , and thus , in every round (with no need of ); this would imply an infinite time window starting at the first round (where the “error contribution” becomes small). In this sense our result also covers the case of multiple communities analyzed in [BCN+17], with equal-sized communities in an unweighted graph and then .
Remark 4 (Two communities).
Our result also generalizes that of [BCN+17] in the simpler case of two communities. In fact we do not require the graph to be regular, but only volume-regular, thus taking into account communities that are potentially unbalanced. Indeed, for , the Averaging dynamics truncated at round is a -community-sensitive algorithm for every round , with . Therefore, a single run of the dynamics highlights the community structure, i.e., the sign of the difference is equal for nodes in the same community and different for nodes in different communities, w.h.p.
4.1 Proofs for Lemma 4.4
In this section we prove the two lemmas used in the proof of Lemma 4.4: the upper bound on the “error contribution” and the lower bound on the “core contribution.”
Lemma 4.5 (Upper bound on the error contribution).
Let . For every , it holds that
[TABLE]
Proof.
To bound all components of vector we use its norm, defined for any vector as . In particular
[TABLE]
By the Cauchy-Schwarz inequality (Theorem A.2) and the definition of the spectral norm of an operator, i.e., , we get that
[TABLE]
since the s are orthonormal. With some additional simple bounds it follows that
[TABLE]
By using the fact that the spectral norm of a diagonal matrix is equal to its maximum value, we conclude that
[TABLE]
Thus, for every it holds that ∎
In Lemma 4.6 we show that the difference of the core contribution in consecutive rounds can be approximated, for our purposes in Lemma 4.4, with .
Lemma 4.6 (Lower bound on the core contribution).
Let and let for every . If , then, for every and for every , with , the following holds:
- ;
- .
Proof.
Let us define . Note that
[TABLE]
where in the last equality we applied Lemma 2.4 to get , and where we defined . Using the definition of , we get
[TABLE]
Note that , for every . Since the minimum and the maximum are obtained for and respectively, we have . Let us call the positive and negative terms of as
[TABLE]
Therefore, for each , it holds that
[TABLE]
In the following we look for a time such that, for every it holds that
[TABLE]
Note that . We consider two cases: and .
Case : We look for a time such that, for every time , it holds that
[TABLE]
Indeed, since , we can use to upper bound the right hand side of the previous equation, so that Eq. 13 holds for every that satisfies:
[TABLE]
i.e., for every , where
[TABLE]
Next, note that whenever . Hence, under the hypotheses of the lemma, we can use that for every and for every . Thus:
[TABLE]
Plugging Eq. 13 into Eq. 11 we finally get
[TABLE]
Case : Proceeding along the same lines we obtain:
[TABLE]
for every . Therefore, by combining Eq. 15 and Eq. 12 we obtain
[TABLE]
Finally, by combining Eq. 14 and Eq. 16:
- ;
- . ∎
5 Bipartite graphs
Assume is an edge-weighted bipartite graph with and , i.e., a graph with a hidden partition identified by the bipartition. In this case, basic properties of random walks imply that the Averaging dynamics does not converge to the global (weighted) average of the values, but it periodically oscillates. In fact, in this case the transition matrix has an eigenvector with eigenvalue (as implied by Lemma 5.1). Thus, the state vector is mainly affected by the eigenvectors associated to the two eigenvalues of absolute value (i.e., and ). After a number of rounds of the dynamics that depends on , we have that, in even rounds, all nodes in () have a state that is close to some local average ; in odd rounds, these values are swapped (as shown in Eq. 17).
If one were observing the process in even rounds (or, equivalently, in odd rounds), however, the states of nodes in would converge to and those of nodes in would converge to . Unfortunately, convergence to the local average for nodes belonging to the same community does not eventually become monotone (i.e., increasing or decreasing). This follows since the eigenvector associated to is no longer stepwise in general. However, we can easily modify the labeling scheme of the Averaging dynamics to perform bipartiteness detection as follows: Nodes apply the labeling rule every two time steps and they do it between the states of two consecutive rounds, i.e., each node sets if and otherwise. We call this new protocol Averaging Bipartite dynamics.
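The modified labeling scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the rule is applied once every two rounds, comparing the state at an even round with the state at the next odd round; the `>=` tie-breaking, the complete bipartite example graph, the initial states, and the function names are all hypothetical.

```python
def averaging_round(W, x):
    """One synchronous round: each node moves to the weighted average of its neighbors."""
    return [sum(w * xv for w, xv in zip(row, x)) / sum(row) for row in W]


def bipartite_labels(W, x0, pairs):
    """Apply the labeling rule once every two rounds, between an even round and the next odd one.

    Returns the labels set during the last pair of rounds.
    """
    x_even = list(x0)
    labels = None
    for _ in range(pairs):
        x_odd = averaging_round(W, x_even)  # round 2k -> round 2k + 1
        labels = [1 if b >= a else 0 for a, b in zip(x_even, x_odd)]
        x_even = averaging_round(W, x_odd)  # round 2k + 1 -> round 2k + 2
    return labels


# Hypothetical example: complete bipartite graph with sides {0, 1} and {2, 3}.
W = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
x0 = [1, 3, -1, -5]  # arbitrary initial states, for illustration only
labels = bipartite_labels(W, x0, pairs=3)
print(labels)
```

In this toy graph the states oscillate between the two side averages (2 and -3) after the first round, so nodes on the same side always move in the same direction and the labels separate the two sides.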
We now show how Averaging Bipartite dynamics can perform bipartiteness detection. Recall that we denote with the weighted adjacency matrix of . Since is undirected and bipartite, the matrix can be written as
[TABLE]
Thus, the transition matrix of a random walk on , i.e., where is a diagonal matrix and , has the form
[TABLE]
Lemma 5.1 shows that the spectrum of is symmetric and it gives a relation between the eigenvectors of symmetric eigenvalues.
Lemma 5.1.
Let be an edge-weighted undirected bipartite graph with bipartition and such that . If , with , is an eigenvector of with eigenvalue , then is an eigenvector of with eigenvalue .
Proof.
If then we have that and . Using these two equalities we get that . Indeed,
[TABLE]
The transition matrix is stochastic, thus the vector (i.e., the vector of all ones) is an eigenvector associated to , which is the largest eigenvalue of . Lemma 5.1 implies that is an eigenvector of with eigenvalue .
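The consequence of Lemma 5.1 stated above is easy to verify numerically: on a bipartite graph, the all-ones vector is fixed by the transition matrix, and flipping its sign on one side of the bipartition yields an eigenvector with eigenvalue -1. The sketch below checks this on a small weighted bipartite graph; the graph, its weights, and the helper names are hypothetical.

```python
def transition_matrix(W):
    """Row-normalize a symmetric weighted adjacency matrix W into P = D^{-1} W."""
    return [[w / sum(row) for w in row] for row in W]


def mat_vec(P, x):
    """Matrix-vector product P x."""
    return [sum(p * xv for p, xv in zip(row, x)) for row in P]


# Hypothetical weighted bipartite graph with sides V1 = {0, 1} and V2 = {2, 3, 4}.
W = [
    [0, 0, 1, 2, 0],
    [0, 0, 0, 3, 4],
    [1, 0, 0, 0, 0],
    [2, 3, 0, 0, 0],
    [0, 4, 0, 0, 0],
]
P = transition_matrix(W)

ones = [1, 1, 1, 1, 1]
chi = [1, 1, -1, -1, -1]  # +1 on V1, -1 on V2

P_ones = mat_vec(P, ones)  # equals ones: eigenvalue +1
P_chi = mat_vec(P, chi)    # equals -chi: eigenvalue -1
print(P_ones, P_chi)
```

The check does not need the graph to be regular or the two sides to have the same size: every row of the transition matrix puts all of its mass on the opposite side, which is all the argument uses.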
As in Section 2, we write the state vector at time using the spectral decomposition of . Let be the eigenvalues of . We denote by a family of linearly independent eigenvectors of , where each is the eigenvector associated to . Thus, we have that
[TABLE]
where . The last equation implies that does not converge to some value as tends to infinity, but oscillates. In particular, nodes in on even rounds and nodes in on odd rounds, converge to . Instead in the symmetric case, i.e., odd rounds for nodes in and even rounds for nodes in , the process converges to . These quantities are proportional to the weighted average of the initial values in the first and in the second partition, respectively.
Theorem 5.2, whose proof follows, shows that Averaging Bipartite dynamics performs bipartiteness detection in rounds. Note that, as in the case of volume-regular graphs with two communities (see Remark 4), one single run of the dynamics identifies the bipartition. Moreover, if , then the Averaging Bipartite dynamics takes logarithmic time to find the bipartition.
Theorem 5.2.
Let be an edge-weighted bipartite graph with bipartition and maximum weighted degree , for any arbitrary positive constant . Then for every time , with , the Averaging Bipartite dynamics truncated at round is a -community-sensitive algorithm.
Proof.
We assume that the labeling rule is applied between every even and every odd round (conversely, the signs of the nodes in the analysis are swapped). Recall the definition of the error contribution, namely . We compute the difference between the state vectors of two consecutive steps by using Eq. 17, namely
[TABLE]
We want to find a time such that for every the sign of a node depends only on , i.e., . Since , the last equation holds whenever
[TABLE]
We upper bound by using Lemma 4.5, getting that . Therefore, with some algebra we get that Eq. 18 holds in every round , where is defined as
[TABLE]
To conclude the proof, we provide a lower bound on showing that it is not too small, w.h.p. Recall that and thus
[TABLE]
The lower bound then follows, with high probability. Indeed,
[TABLE]
where in we used Eq. 19 and in we applied Theorem A.5. The thesis then follows from the above bound on and from the hypothesis on , for any arbitrary positive constant . ∎
6 Discussion and Outlook
The focus of this work is on heuristics that implicitly perform spectral graph clustering, without explicitly computing the main eigenvectors of a matrix describing connectivity properties of the underlying network (typically, its Laplacian or a related matrix). In this perspective, we extended the work of Becchetti et al. [BCN+17] in several ways. In particular, for communities, [BCN+17] considered an extremely regular case, in which the second eigenvalue of the (normalized) Laplacian has algebraic and geometric multiplicities and the corresponding eigenspace is spanned by a basis of indicator vectors. We considered a more general case in which the first eigenvalues are in general different, but the span of the corresponding eigenvectors again admits a basis of indicator vectors. We also made a connection between this stepwise property and lumpability properties of the underlying random walk, which results in a class of volume-regular graphs that may neither have constant degree nor exhibit balanced communities. We further showed that our approach naturally lends itself to addressing related, yet different problems, such as identifying bipartiteness. Finally, in the paragraphs that follow we discuss extensions to slightly more general classes than the ones considered in this work.
Other graph classes.
Consider -volume regular graphs whose stepwise eigenvectors are associated to the largest eigenvalues, in absolute value. These graphs include many -partite graphs (e.g., regular ones) and graphs that are “close” to being -partite (i.e., ones that would become -partite upon removal of a few edges). Unlike in the clustered case (Theorem 4.1), some of the eigenvalues can in general be negative.
Consider the following variant of the labeling scheme of the Averaging dynamics, in which nodes apply their labeling rule only on even rounds, comparing their value with the one they held at the end of the last even round, i.e., each node sets if and otherwise. Since the above protocol amounts to only taking even powers of eigenvalues, the analysis of this modified protocol proceeds along the same lines as the clustered case, while the results of Theorem 4.1 seamlessly extend to this class of graphs.
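To see why comparing states across even rounds helps when a relevant eigenvalue is negative, consider the following toy sketch. It is only an illustration under stated assumptions: the 4-node graph (with heavier edges across the partition than inside it), the deterministic initial vector, the `>=` tie-breaking, and the function names are hypothetical. With the consecutive-round rule the labels flip from one round to the next, while the even-round rule yields the same labels every time.

```python
def averaging_round(W, x):
    """One synchronous round: each node moves to the weighted average of its neighbors."""
    return [sum(w * xv for w, xv in zip(row, x)) / sum(row) for row in W]


def consecutive_labels(W, x0, rounds):
    """Labels obtained by comparing round t with round t - 1, for t = 1, ..., rounds."""
    history, x = [], list(x0)
    for _ in range(rounds):
        x_next = averaging_round(W, x)
        history.append([1 if b >= a else 0 for a, b in zip(x, x_next)])
        x = x_next
    return history


def even_round_labels(W, x0, pairs):
    """Labels obtained by comparing round 2k with round 2k - 2, for k = 1, ..., pairs."""
    history, x = [], list(x0)
    for _ in range(pairs):
        x_next = averaging_round(W, averaging_round(W, x))
        history.append([1 if b >= a else 0 for a, b in zip(x, x_next)])
        x = x_next
    return history


# Hypothetical graph with partition {0, 1}, {2, 3} and heavier edges across it.
W = [
    [0, 1, 2, 0],
    [1, 0, 0, 2],
    [2, 0, 0, 1],
    [0, 2, 1, 0],
]
x0 = [1, 1, -1, -1]  # stepwise start: an eigenvector with eigenvalue -1/3 for this graph

flipping = consecutive_labels(W, x0, rounds=2)
stable = even_round_labels(W, x0, pairs=2)
print(flipping, stable)
```

Here each round multiplies the state by -1/3, so consecutive differences alternate in sign, whereas two rounds multiply it by 1/9 and the even-round differences keep the same sign.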
Outlook.
Though far from conclusive, we believe our results point to potentially interesting directions for future research. In general, our analysis sheds further light on the connections between temporal evolution of the power method and spectral-related clustering properties of the underlying network. At the same time, we showed that variants of the Averaging dynamics (and/or its labeling rule) might be useful in addressing different problems and/or other graph classes, as the examples given in Section 5 suggest. On the other hand, identifying hidden partitions using the algorithm presented in [BCN+17] requires relatively strong assumptions on the main eigenvalues and knowledge of an upper bound on the graph size (Footnote 9: As anecdotal experimental evidence suggests, the presence of a time window to perform labeling is not an artifact of our analysis), while the analysis becomes considerably more intricate than the perfectly regular and completely balanced case addressed in [BCN+17]. Some aspects of our analysis (e.g., the aforementioned presence of a size-dependent time window in which the labeling rule has to be applied) suggest that more sophisticated variants of the Averaging dynamics might be needed to express the full power of a spectral method that explicitly computes the main eigenvectors of a graph-related matrix. While we believe this goal can be achieved, designing and analyzing such an algorithm might prove a challenging task.
Appendix
Appendix A Useful inequalities
Theorem A.1 (Extension of Chernoff Bounds [DP09]).
Let where are independent distributed random variables taking values in and let . Suppose that . Then, for ,
[TABLE]
[TABLE]
Theorem A.2 (Cauchy-Schwarz’s inequality).
For all vectors of an inner product space it holds that where is the inner product.
Theorem A.3 (Cheeger’s inequality [Chu96]).
Let be the transition matrix of a connected edge-weighted graph and let be its second largest eigenvalue. Let and Then
[TABLE]
Theorem A.4 (Berry-Esseen’s theorem [Ber41]).
Let be independent random variables with mean , variance , and third absolute moment , for every . Let and let be the standard deviation of ; let be the cumulative distribution function of ; let be the cumulative distribution function of the standard normal distribution. Then, there exists a positive constant such that, for all and for all ,
[TABLE]
where .
Theorem A.5 (Littlewood-Offord’s small ball [Erd45]).
Let be real numbers with for every and let be any real number. Let be a family of independent Rademacher random variables (taking values with probability ) and let be their sum weighted with the s, i.e., , then
[TABLE]
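As a sanity check on this kind of anti-concentration statement, the sketch below enumerates all sign patterns for small coefficient vectors and compares the exact small-ball probability with the classical bound due to Erdős: when every coefficient has absolute value at least 1, at most C(n, ⌊n/2⌋) of the 2^n signed sums fall in any open interval of length 2. The displayed bound of Theorem A.5 may be stated in a different form; the coefficient vectors, the centers, and the function names here are hypothetical.

```python
from itertools import product
from math import comb


def small_ball_probability(a, r):
    """Exact Pr(|X - r| < 1) for X = sum_i a_i * x_i with independent Rademacher x_i."""
    n = len(a)
    hits = sum(
        1
        for signs in product((-1, 1), repeat=n)
        if abs(sum(ai * si for ai, si in zip(a, signs)) - r) < 1
    )
    return hits / 2 ** n


def erdos_bound(n):
    """Classical bound: at most C(n, floor(n/2)) of the 2^n sums lie in an open interval of length 2."""
    return comb(n, n // 2) / 2 ** n


p_equal = small_ball_probability([1, 1, 1, 1], r=0)        # all coefficients equal
p_spread = small_ball_probability([1, 2, 3, 4, 5], r=0.5)  # distinct coefficients
print(p_equal, erdos_bound(4), p_spread, erdos_bound(5))
```

Equal coefficients attain the bound, while more spread-out coefficients give a strictly smaller probability, which matches the intuition that the sum cannot concentrate in a short interval.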
- [AAE08] Dana Angluin, James Aspnes, and David Eisenstat. A Simple Population Protocol for Fast Robust Approximate Majority. Distributed Computing, 21(2):87–102, 2008. (Preliminary version in DISC’07).
- [ABH14] Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Trans. on Information Theory, 62(1):471–487, 2014.
- [AS15] Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.
- [BC09] Michael J. Barber and John W. Clark. Detecting network communities by propagating labels under constraints. Physical Review E, 80(2):026129, 2009.
- [BCM+18] Luca Becchetti, Andrea E.F. Clementi, Pasin Manurangsi, Emanuele Natale, Francesco Pasquale, Prasad Raghavendra, and Luca Trevisan. Average whenever you meet: Opportunistic protocols for community detection. In 26th Annual European Symposium on Algorithms, ESA 2018, August 20-22, 2018, Helsinki, Finland, pages 7:1–7:13, 2018.
- [BCN+17] Luca Becchetti, Andrea E.F. Clementi, Emanuele Natale, Francesco Pasquale, and Luca Trevisan. Find your place: Simple distributed algorithms for community detection. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 940–959, 2017.
- [BCPR19] Luca Becchetti, Emilio Cruciani, Francesco Pasquale, and Sara Rizzo. Step-by-step community detection in volume-regular graphs. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
- [Ber41] Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136, 1941.
