Step-by-Step Community Detection in Volume-Regular Graphs

Luca Becchetti; Emilio Cruciani; Francesco Pasquale; Sara Rizzo

arXiv:1907.07149·cs.DC·May 11, 2020

TL;DR

This paper extends spectral community detection methods to volume-regular graphs, showing that under certain spectral gap conditions, the community structure can be efficiently recovered without explicit eigenvector computation.

Contribution

It generalizes previous approaches to a broader class of graphs, establishing a connection between volume regularity and Markov chain lumpability for community detection.

Findings

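01. The Averaging dynamics recovers the community structure in logarithmic time, with high probability.

02. The class of $k$-volume-regular graphs is the largest class of undirected (possibly weighted) graphs whose transition matrix admits $k$ stepwise eigenvectors.

03. Recovery is guaranteed when the stepwise eigenvectors are associated with the first $k$ eigenvalues and the gap between the $k$-th and the $(k+1)$-th eigenvalues is sufficiently large.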

Equations

$$\bm{x}^{(t)}=P^{t}\bm{x}=D^{-\frac{1}{2}}N^{t}D^{\frac{1}{2}}\bm{x}\stackrel{(a)}{=}D^{-\frac{1}{2}}\sum_{i=1}^{n}\lambda_{i}^{t}\bm{w}_{i}\bm{w}_{i}^{\intercal}\sum_{j=1}^{n}\beta_{j}\bm{w}_{j}\stackrel{(b)}{=}\sum_{i=1}^{n}\lambda_{i}^{t}\beta_{i}D^{-\frac{1}{2}}\bm{w}_{i},$$

$$\bm{x}^{(t)}=\sum_{i=1}^{n}\lambda_{i}^{t}\,\frac{\langle D^{\frac{1}{2}}\bm{x},D^{\frac{1}{2}}\bm{v}_{i}\rangle}{\|D^{\frac{1}{2}}\bm{v}_{i}\|}\,D^{-\frac{1}{2}}\frac{D^{\frac{1}{2}}\bm{v}_{i}}{\|D^{\frac{1}{2}}\bm{v}_{i}\|}=\sum_{i=1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i},$$

$$\lim_{t\to\infty}\bm{x}^{(t)}=\lim_{t\to\infty}\sum_{i=1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}=\alpha_{1}\bm{1},\quad\text{with}\quad\alpha_{1}=\frac{\sum_{u\in V}\delta(u)\bm{x}(u)}{\sum_{u\in V}\delta(u)}=\sum_{u\in V}\frac{\delta(u)}{\mathrm{vol}(V)}\bm{x}(u),$$

$$\forall i\in[k],\ \forall u,v\in V_{i},\quad\mathbf{P}\left(\mathcal{A}(G)[u]=\mathcal{A}(G)[v]\right)\geqslant 1-\varepsilon.$$

$$\forall i,j\in[k]\text{ with }i\neq j,\ \forall u\in V_{i},\ \forall v\in V_{j},\quad\mathbf{P}\left(\mathcal{A}(G)[u]=\mathcal{A}(G)[v]\right)\leqslant\delta.$$

$$\mathbf{P}\left(h(u,v)\geqslant\alpha\ell\right)\leqslant\frac{\mathbf{E}\left[h(u,v)\right]}{\alpha\ell}\leqslant\frac{\varepsilon}{\alpha}=\mathcal{O}\left(\frac{1}{n^{c}}\right),$$

$$\mathbf{P}\left(h(u,v)<b(1-\delta)\ell\right)\leqslant\exp\left(-\frac{(1-b)^{2}}{2}(1-\delta)\ell\right)=\mathcal{O}\left(\frac{1}{n^{d}}\right),$$

$$\sum_{w\in V_{i}}P_{uw}=\sum_{w\in V_{i}}P_{vw},\quad\forall u,v\in V_{j}.$$

$$\delta(u)=\sum_{z\in V}w(u,z)=\sum_{z\in V}\pi(u)P_{uz}=\pi(u)\sum_{z\in V}P_{uz}\stackrel{(a)}{=}\pi(u),$$

$$\frac{\delta_{j}(u)}{\delta(u)}=\frac{1}{\pi(u)}\sum_{z\in V_{j}}w(u,z)=\sum_{z\in V_{j}}P_{uz}=\sum_{z\in V_{j}}P_{vz}=\frac{1}{\pi(v)}\sum_{z\in V_{j}}w(v,z)=\frac{\delta_{j}(v)}{\delta(v)}.$$

$$\sum_{z\in V_{j}}P_{uz}=\sum_{z\in V_{j}}\frac{w(u,z)}{\delta(u)}=\frac{\delta_{j}(u)}{\delta(u)}\stackrel{(a)}{=}\frac{\delta_{j}(v)}{\delta(v)}=\sum_{z\in V_{j}}\frac{w(v,z)}{\delta(v)}=\sum_{z\in V_{j}}P_{vz},$$

$$\bm{w}_{i}=(\underbrace{\bm{v}_{i}(1),\dots,\bm{v}_{i}(1)}_{|V_{1}|\text{ times}},\underbrace{\bm{v}_{i}(2),\dots,\bm{v}_{i}(2)}_{|V_{2}|\text{ times}},\dots,\underbrace{\bm{v}_{i}(k),\dots,\bm{v}_{i}(k)}_{|V_{k}|\text{ times}})^{\intercal},$$

$$\sum_{j=1}^{k}\sum_{z\in V_{j}}P_{xz}\bm{v}_{i}(j)=(P\bm{w}_{i})(x)=(P\bm{w}_{i})(y)=\sum_{j=1}^{k}\sum_{z\in V_{j}}P_{yz}\bm{v}_{i}(j).$$

$$\sum_{j=1}^{k}\bm{v}_{i}(j)\bm{u}_{xy}(j)=\langle\bm{u}_{xy},\bm{v}_{i}\rangle=0,$$

$$\max_{i\in[k]}\{\mathrm{vol}(V_{i})\}\leqslant\Delta n_{\max}<\frac{k}{2}n_{\min}\leqslant\frac{k}{2}\min_{i\in[k]}\{\mathrm{vol}(V_{i})\}.$$

$$\left\{\bm{\chi}_{i}=\sqrt{\frac{\hat{m}_{i}}{m_{i}}}\bm{1}_{V_{i}}-\sqrt{\frac{m_{i}}{\hat{m}_{i}}}\bm{1}_{\hat{V}_{i}}\,:\,i\in[k-1]\right\},$$

$$\bm{y}=\sum_{i=1}^{k-1}\gamma_{i}\bm{\chi}_{i},\quad\text{where}\quad\gamma_{i}=\frac{\bm{x}^{\intercal}D\bm{\chi}_{i}}{\|D^{1/2}\bm{\chi}_{i}\|^{2}}.$$

$$\mathbf{P}\left(|\bm{y}(u)|\geqslant\frac{1}{\Delta n}\right)\geqslant 1-\mathcal{O}\left(\frac{1}{\sqrt{n}}\right).$$

$$\bm{y}(u)=\gamma_{1}\bm{\chi}_{1}(u)=\gamma_{1}\sqrt{\frac{\hat{m}_{1}}{m_{1}}},$$

$$\|D^{1/2}\bm{\chi}_{1}\|^{2}=\frac{\hat{m}_{1}}{m_{1}}\sum_{v\in V_{1}}\delta(v)+\frac{m_{1}}{\hat{m}_{1}}\sum_{v\in\hat{V}_{1}}\delta(v)=\hat{m}_{1}+m_{1}=m,$$

$$|\gamma_{1}|=\frac{|\bm{x}^{\intercal}D\bm{\chi}_{1}|}{\|D^{1/2}\bm{\chi}_{1}\|^{2}}=\frac{|\bm{x}^{\intercal}D\bm{\chi}_{1}|}{m}.$$

$$\bm{x}^{\intercal}D\bm{\chi}_{1}=\bm{x}^{\intercal}D\left(\sqrt{\frac{\hat{m}_{1}}{m_{1}}}\bm{1}_{V_{1}}-\sqrt{\frac{m_{1}}{\hat{m}_{1}}}\bm{1}_{\hat{V}_{1}}\right)=\sqrt{\frac{m_{1}}{\hat{m}_{1}}}\,\bm{x}^{\intercal}D\left(\frac{\hat{m}_{1}}{m_{1}}\bm{1}_{V_{1}}-\bm{1}_{\hat{V}_{1}}\right).$$

$$\mathbf{P}\left(|\bm{x}^{\intercal}D\bm{\chi}_{1}|<\sqrt{\frac{m_{1}}{\hat{m}_{1}}}\right)=\mathbf{P}\left(|\bm{x}^{\intercal}\bm{w}|<1\right)\leqslant\mathcal{O}\left(\frac{1}{\sqrt{n}}\right),$$

$$|\bm{y}(u)|=|\gamma_{1}|\sqrt{\frac{\hat{m}_{1}}{m_{1}}}\geqslant\frac{1}{m}\geqslant\frac{1}{\Delta n}.$$

$$\bm{x}^{\intercal}D\bm{\chi}_{1}=\sqrt{\frac{\hat{m}_{1}}{m_{1}}}\,\bm{x}^{\intercal}D\left(\bm{1}_{V_{1}}-\frac{m_{1}}{\hat{m}_{1}}\bm{1}_{\hat{V}_{1}}\right)$$

$$|\bm{y}(u)|=|\gamma_{1}|\sqrt{\frac{\hat{m}_{1}}{m_{1}}}\geqslant\frac{\hat{m}_{1}}{m\,m_{1}}=\frac{m-m_{1}}{m\,m_{1}}\stackrel{(a)}{\geqslant}\frac{1}{m}\geqslant\frac{1}{\Delta n},$$

$$\mathbf{P}\left(\mathrm{sgn}(\bm{y}(u))\neq\mathrm{sgn}(\bm{y}(v))\right)=\Omega(1).$$

$$\sigma^{2}(X(V_{i}))=\sum_{w\in V_{i}}\sigma^{2}(\delta(w)\bm{x}(w))=\sum_{w\in V_{i}}\left(\mathbf{E}\left[\delta(w)^{2}\bm{x}(w)^{2}\right]-\mathbf{E}\left[\delta(w)\bm{x}(w)\right]^{2}\right)=\sum_{w\in V_{i}}\delta(w)^{2}.$$

$$\frac{\hat{m}_{1}}{m_{1}}X(V_{1})-X(V_{2})-X(\hat{V}_{2})$$

Full text

Step-by-Step Community Detection in Volume-Regular Graphs

Luca Becchetti

Sapienza Università di Roma

Rome, Italy

Partially supported by ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”.

Emilio Cruciani

Inria, I3S Lab, UCA, CNRS

Sophia Antipolis, France

Francesco Pasquale

Università di Roma Tor Vergata

Rome, Italy

Partially supported by the University of “Tor Vergata” under research programme “Mission: Sustainability” project ISIDE (grant no. E81I18000110005).

Sara Rizzo

Gran Sasso Science Institute

L’Aquila, Italy

Abstract

Spectral techniques have proved amongst the most effective approaches to graph clustering. However, in general they require explicit computation of the main eigenvectors of a suitable matrix (usually the Laplacian matrix of the graph).

Recent work (e.g., Becchetti et al., SODA 2017) suggests that observing the temporal evolution of the power method applied to an initial random vector may, at least in some cases, provide enough information on the space spanned by the first two eigenvectors, so as to allow recovery of a hidden partition without explicit eigenvector computations. While the results of Becchetti et al. apply to perfectly balanced partitions and/or graphs that exhibit very strong forms of regularity, we extend their approach to graphs containing a hidden $k$-partition and characterized by a milder form of volume-regularity. We show that the class of $k$-volume-regular graphs is the largest class of undirected (possibly weighted) graphs whose transition matrix admits $k$ “stepwise” eigenvectors (i.e., vectors that are constant over each set of the hidden partition). To obtain this result, we highlight a connection between volume regularity and lumpability of Markov chains. Moreover, we prove that if the stepwise eigenvectors are those associated with the first $k$ eigenvalues and the gap between the $k$-th and the $(k+1)$-th eigenvalues is sufficiently large, the Averaging dynamics of Becchetti et al. recovers the underlying community structure of the graph in logarithmic time, with high probability.

Keywords: Distributed algorithms, Community detection, Markov chains, Spectral analysis

1 Introduction

Clustering a graph in a way that reflects underlying community structure is a very important mining task [For10]. Informally speaking, in the classical setting, we are given a possibly weighted graph $G$ and an integer $k$ . Our goal is to partition the vertex set of $G=(V,E)$ into $k$ disjoint subsets, so that the $k$ induced subgraphs have high inner and low outer expansion. Spectral techniques have proved amongst the most effective approaches to graph clustering [NJW02, SM00, VL07]. The general approach to spectral graph clustering [VL07] normally implies embedding the vertices of $G$ into the $k$ -dimensional subspace spanned by the main $k$ eigenvectors of a matrix defined in terms of $G$ ’s adjacency matrix, typically its (normalized) Laplacian. Intuitively, one expects that, for a well-clustered graph with $k$ communities, the profiles of the first $k$ eigenvectors are correlated with the underlying community structure of $G$ . Recent work has provided theoretical support to this approach. In particular, [LGT14] showed that, given the first $k$ orthonormal eigenvectors of the normalized Laplacian, it is possible to produce a $k$ -partition of the vertex set, corresponding to $k$ suitably-defined indicator vectors, such that the associated values of the Rayleigh quotient are relatively small. More recently, [PSZ17] proved that, under suitable hypotheses on the spectral gap between the $k$ -th and ( $k$ +1)-th eigenvalue of the normalized Laplacian of $G$ , the span of the first $k$ eigenvectors largely overlaps with the span of $\{D^{\frac{1}{2}}\bm{g}_{1},\ldots,D^{\frac{1}{2}}\bm{g}_{k}\}$ , where $D$ is the diagonal degree matrix of $G$ , while the $\bm{g}_{i}$ ’s are indicator vectors describing a $k$ -way partition $\{S_{i}\}_{i=1}^{k}$ of $V$ such that, for every $i$ , the conductance of $S_{i}$ is at most the $k$ -way expansion constant $\rho(k)$ [LGT14]. 
Note that, if $\bm{v}$ is an eigenvector associated to the $i$ -th smallest eigenvalue of the normalized Laplacian, $D^{-\frac{1}{2}}\bm{v}$ is an eigenvector corresponding to the $i$ -th largest eigenvalue of the random walk’s transition matrix associated to $G$ . Hence, when $G$ is well-clustered, one might reasonably expect the first $k$ eigenvectors of $P$ to exhibit almost-“stepwise” profiles reflecting $G$ ’s underlying community structure. The aforementioned spectral approaches require explicit computation of the $k$ main eigenvectors of a (generally symmetric) matrix.

In [BCN+17], the authors considered the case $k=2$, for which they proposed the following distributed algorithm (Averaging dynamics, Algorithm 1): “At the outset, every node picks an initial value, independently and uniformly at random in $\{-1,1\}$; then, in each synchronous round, every node updates its value to the average of those held by its neighbors. A node also tags itself blue if the last update increased its value, red otherwise” [BCN+17]. The authors showed that, under a variety of graph models exhibiting sparse balanced cuts, including the stochastic block model [HLL83], the process resulting from the above simple local rule converges, in logarithmic time, to a coloring that, depending on the model, exactly or approximately reflects the underlying cut. They further elaborated on how to extend the proposed approach to the case of multiple communities, providing an analysis for a strongly regular version of the stochastic block model with multiple communities. While results like those presented in [LGT14, PSZ17] provide further theoretical justification for spectral clustering, the approach proposed in [BCN+17] suggests that observing the temporal evolution of the power method applied to an initial random vector may, at least in some cases, provide equivalent information, without requiring explicit eigenvector computations.

1.1 Our contributions

The goal of this work is to take a further step in this direction by considering a class of graphs that is more general than the one considered in [BCN+17], even if still relatively “regular”. The analysis of the Averaging dynamics on this class is considerably harder, but it is likely to provide insights into the challenges of analyzing the general case, without all the intricacies of the latter. Our contribution is as follows:

• We define the class of $k$-volume-regular graphs. This class of edge-weighted graphs includes those considered in [BCN+17] and it is the largest class of undirected, possibly weighted graphs that admit $k$ “stepwise” eigenvectors (i.e., having constant values over the $k$ steps that identify the hidden partition). This result uses a connection between volume regularity and lumpability of Markov chains [KS60, TK06].

• If the stepwise eigenvectors are those associated with the first $k$ eigenvalues and the gap between the $k$-th and the $(k+1)$-th eigenvalues is sufficiently large, we show that running the Averaging dynamics for a suitable number of steps allows recovery of the underlying community structure of the graph, with high probability (an event $\mathcal{E}_{n}$ holds with high probability, w.h.p., if $\mathbf{P}\left(\mathcal{E}_{n}\right)=1-\mathcal{O}(n^{-\gamma})$ for some constant $\gamma>0$). To prove this, we provide a family of mutually orthonormal vectors which, when the graph is volume-regular, span the eigenspace of the main $k$ eigenvectors of the normalized adjacency matrix of the graph. It should be noted that the first and second of these vectors are respectively the main eigenvector and the Fiedler vector [Fie89] associated to the normalized adjacency matrix.

• While the results of [BCN+17] apply when the underlying communities are of the same size, our results do not require this assumption and they apply to weighted graphs. It should also be noted that volume regularity is a weaker notion than regularity of the graph.

• We further show that variants of the Averaging dynamics (and/or its labeling rule) can address different problems (e.g., identifying bipartiteness) and/or other graph classes.

We further note that the overall algorithm we consider can be viewed as a fully decentralized, synchronous algorithm that works in anonymous networks (i.e., nodes do not possess distinguished identities), with a completely local clustering criterion, though it cannot be considered a dynamics in the sense of [BCN+17] since it requires a bound on the number of nodes in the underlying network.

Finally, this paper extends a preliminary version [BCPR19] in several ways. To begin, the main result presented in [BCPR19] was weaker, in the sense that the constraints imposed on the eigenvalues in [BCPR19, Theorem 9] depend polynomially on network parameters like the maximum degree and the number of vertices. In this respect, they are substantially stronger than those imposed to prove Theorem 4.1, where results (in particular, the time window in which recovery of the hidden partition is possible) are expressed in terms of the spectrum of the graph, while constraints imposed on the second eigenvalue depend only logarithmically on the aforementioned network parameters. In reframing these results, we also realized that the presence of a window in which recovery is possible is hardly avoidable in general using the simple averaging heuristic of [BCN+17]. We remark on this right after Theorem 4.1 (see Remark 1), while we also observe (see Remarks 3 and 4) that the analysis presented here encompasses the class of regular graphs considered in [BCN+17] as a special case, something that was not obvious in [BCPR19]. Finally, the result given in [BCPR19] for bipartite graphs assumed volume regularity, an assumption that is not necessary, as we show in Section 5.

1.2 Further related work

We briefly discuss further work that bears some relationship to this paper, either because it adopts simple and/or decentralized heuristics to uncover community structure, or because it relies on the use of spectral techniques.

Decentralized heuristics for block reconstruction.

Label propagation algorithms [RAK07] are dynamics based on majority updating rules [AAE08] and have been applied for detecting communities in complex networks. Several papers present experimental results for such protocols on specific classes of clustered graphs [BC09, LM10, RAK07]. The only available rigorous analysis of a label propagation algorithm on planted partition graphs is the one presented in [KPS13], where the authors analyze a label propagation algorithm on $\mathcal{G}_{2n,p,q}$ graphs in the case of dense topologies. In particular, their analysis considers the case where $p=\Omega(1/n^{\frac{1}{4}-\varepsilon})$ and $q=\mathcal{O}(p^{2})$ , a parameter range in which very dense clusters of constant diameter separated by a sparse cut occur w.h.p. In this setting, characterized by a polynomial gap between $p$ and $q$ , simple combinatorial and concentration arguments show that the protocol converges in constant expected time. A logarithmic bound for sparser topologies is conjectured in [KPS13].

Following [BCN+17], a number of recent papers analyze simple distributed algorithms for community detection that rely on elementary dynamics. In the Averaging dynamics considered in this paper, every node communicates in parallel with all its neighbors in each round. While this might be too expensive in scenarios characterized by dense topologies, it is simply infeasible in other settings (for instance, when links represent opportunistic meetings that occur asynchronously). Motivated by similar considerations, a first line of follow-up work considered “sparsified”, asynchronous variants of the Averaging dynamics [BCM+18, MMM18, SZ17].

Another interesting direction is the rigorous analysis of well-known (non-linear) dynamics based on majority rules on graphs that exhibit community structure. In [CNNS18], Cruciani et al. consider the 2-Choices dynamics where, in each round, every node picks two random neighbors and updates its value to the most frequent among its value and those held by its sampled neighbors. They show that if the underlying graph has a suitable core-periphery structure and the process starts in a configuration where nodes in core and periphery have different states, the system either rapidly converges to the core’s state or reaches a metastable regime that reflects the underlying graph structure. Similar results have been also obtained for clustered regular graphs with dense communities in [CNS19], where the 2-Choices dynamics is proposed as a distributed algorithm for community detection.

Although based on the Averaging dynamics and thus extremely simple and fully decentralized, the algorithm we consider in this paper is not itself a dynamics in the sense proposed in [BCN+17], since its clustering criterion is applied within a time window, which in turn requires (at least approximate) knowledge of the network size.

Because of their relevance for the reconstruction problem, we also briefly discuss the class of belief propagation algorithms, best known as message-passing algorithms for performing inference in graphical models [Mac03]. Though not a dynamics, belief propagation is still a simple approach. Moreover, there is non-rigorous, strong supporting evidence that some belief propagation algorithms might be optimal for the reconstruction problem [DKMZ11]. A rigorous analysis is a major challenge; in particular, convergence to the correct value of belief propagation is far from being fully-understood on graphs which are not trees [MK07, Wei00]. As we discuss in the next subsection, more complex algorithms inspired by belief propagation have been rigorously shown to perform reconstruction optimally.

General algorithms for block reconstruction.

Several algorithms for community detection are spectral: They typically consider the eigenvector associated to the second largest eigenvalue of the adjacency matrix $A$ of $G$, or the eigenvector corresponding to the largest eigenvalue of the matrix $A-\frac{d}{n}J$ [Bop87, CO05, CO10, McS01] (here $J$ is the matrix having all entries equal to $1$, $d$ is the average degree, and $n$ is the number of vertices), since these are correlated with the hidden partition. More recently, spectral algorithms have been proposed [AS15, BLM15, CO10, KMM+13, MNS13, PSZ17] that find a weak reconstruction even in the sparse, tight regime.

Interestingly, spectral algorithms turn out to be a feasible approach also in distributed settings. In particular, Kempe and McSherry [KM04] show that eigenvalue computations can be performed in a distributed fashion, yielding distributed algorithms for community detection under various models, including the stochastic block model. However, their algorithm does not match any simple decentralized computing model. In particular, the algorithm of Kempe and McSherry as well as any distributed version of the above mentioned centralized algorithms are neither dynamics, nor do they correspond to the notion of light-weight algorithm of Hassin and Peleg [HP01]. Moreover, the mixing time of the simple random walk on the graph is a bottleneck for the distributed algorithm of Kempe and McSherry and for any algorithm that performs community detection in a graph $G$ by employing the power method or the Lanczos method [Lan50] as a subroutine. This is not the case for the Averaging dynamics, since it removes the component of the state in the span of the main eigenvector.

In general, the reconstruction problem has been studied extensively using a multiplicity of techniques, which include combinatorial algorithms [DF89], belief propagation [DKMZ11] and variants of it [MNS16], spectral-based techniques [CO10, McS01], Metropolis approaches [JS98], and semidefinite programming [ABH14], among others.

1.3 Roadmap

The rest of this paper is organized as follows. In Section 2, we formally define the Averaging dynamics and briefly recall how it is connected with the transition matrix of a random walk on the underlying graph. We also define the notion of community-sensitive algorithm and the class of clustered volume-regular graphs. In Section 3, we show the relation between lumpability of Markov chains and volume-regular graphs. In Section 4, we state the main result of the paper (see Theorem 4.1) on the analysis of the Averaging dynamics for clustered volume-regular graphs: We give the two main technical lemmas and show how the main theorem derives from them. In Section 5, we show how slightly modified versions of the Averaging dynamics can be used to identify the hidden partition of other non-clustered volume-regular graphs, e.g., bipartite graphs. In Section 6, we briefly show how our approach can be extended to slightly more general graph classes than the ones considered in this paper. We finally highlight some open problems and directions for further research on the topic.

2 Preliminaries

Notation.

Consider an undirected edge-weighted graph $G=(V,E,w)$ with nonnegative weights. For each node $u\in V$ , we denote by $\delta(u)$ the volume, or weighted degree, of node $u$ , namely $\delta(u)=\sum_{v:(u,v)\in E}w(u,v).$ Similarly, we denote the volume of a set of nodes $T\subseteq V$ as $\mathrm{vol}(T):=\sum_{u\in T}\delta(u)$ . $D$ denotes the diagonal matrix, such that $D_{uu}=\delta(u)$ for each $u\in V$ . Without loss of generality we assume $\min_{u}\delta(u)=1$ , since the behavior of the Averaging dynamics (and the corresponding analysis) is not affected by a normalization of the weights. We refer to the maximum volume of a node as $\Delta:=\max_{u}\delta(u)$ .

In the remainder, $W$ denotes the weighted adjacency matrix of $G$ , while $P=D^{-1}W$ is the transition matrix of a random walk on $G$ , in which a transition from node $u$ to node $v$ occurs with probability proportional to $w(u,v)$ . We call $\lambda_{1},\ldots,\lambda_{n}$ the eigenvalues of $P$ , in non-increasing order, and $\bm{v}_{1},\ldots,\bm{v}_{n}$ a family of eigenvectors of $P$ , such that $P\bm{v}_{i}=\lambda_{i}\bm{v}_{i}$ . We let $N=D^{-\frac{1}{2}}WD^{-\frac{1}{2}}=D^{\frac{1}{2}}PD^{-\frac{1}{2}}$ denote the normalized weighted adjacency matrix of $G$ . Note that $N$ is real and symmetric (thus, the eigenvectors of $N$ are orthogonal) and that its spectrum is the same as that of $P$ . We denote by $\bm{w}_{1},\ldots,\bm{w}_{n}$ a family of eigenvectors of $N$ , such that $N\bm{w}_{i}=\lambda_{i}\bm{w}_{i}$ . It is important to note that $\bm{w}_{i}$ is an eigenvector of $N$ if and only if $D^{-\frac{1}{2}}{}\bm{w}_{i}$ is an eigenvector of $P$ .
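To make the notation concrete, the following Python sketch builds the volumes $\delta(u)$, the transition matrix $P=D^{-1}W$ and the normalized adjacency matrix $N=D^{-\frac{1}{2}}WD^{-\frac{1}{2}}$ for a small weighted graph, and checks two of the facts stated above: $N$ is symmetric with $N=D^{\frac{1}{2}}PD^{-\frac{1}{2}}$, and an eigenvector of $P$ (here $\bm{1}$) corresponds to an eigenvector of $N$ (here $D^{\frac{1}{2}}\bm{1}$) with the same eigenvalue. The 4-node weight matrix `W` and the helper `matvec` are illustrative assumptions, not taken from the paper, and the weights are not rescaled to enforce $\min_{u}\delta(u)=1$.

```python
import math

# Hypothetical weighted graph on 4 nodes (symmetric weights, no isolated node).
W = [
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 3.0, 0.0],
]
n = len(W)

# Volume (weighted degree) of each node: delta(u) = sum_v w(u, v).
delta = [sum(row) for row in W]
sqrt_delta = [math.sqrt(d) for d in delta]

# Transition matrix P = D^{-1} W and normalized adjacency N = D^{-1/2} W D^{-1/2}.
P = [[W[u][v] / delta[u] for v in range(n)] for u in range(n)]
N = [[W[u][v] / math.sqrt(delta[u] * delta[v]) for v in range(n)] for u in range(n)]


def matvec(M, x):
    """Return the matrix-vector product M x for a list-of-lists matrix."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# P is row-stochastic, so the all-ones vector is an eigenvector with eigenvalue 1.
P_ones = matvec(P, [1.0] * n)

# Correspondingly, D^{1/2} 1 is an eigenvector of N with the same eigenvalue.
N_sqrt_delta = matvec(N, sqrt_delta)

# N is symmetric, and N = D^{1/2} P D^{-1/2} entry by entry.
N_symmetric = all(abs(N[u][v] - N[v][u]) < 1e-12 for u in range(n) for v in range(n))
N_similar_to_P = all(
    abs(N[u][v] - sqrt_delta[u] * P[u][v] / sqrt_delta[v]) < 1e-12
    for u in range(n)
    for v in range(n)
)
```

Since $N$ and $P$ are similar matrices, the same correspondence holds for every eigenpair, not only for the first one checked here.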

We use the Bachmann–Landau asymptotic notation (i.e., $\omega,\Omega,\Theta,\mathcal{O},o$) to describe the limiting behavior of functions depending on $n$. In this sense, our results only hold for large $n$. We say that an event $\mathcal{E}_{n}$ holds with high probability (w.h.p., in short) if $\mathbf{P}\left(\mathcal{E}_{n}\right)=1-\mathcal{O}(n^{-\gamma})$, for some positive constant $\gamma$.

2.1 Averaging dynamics

The simple algorithm we consider in this paper, named Averaging dynamics (Algorithm 1) after [BCN+17], in which the algorithm was first proposed, can be seen as an application of the power method, augmented with a Rademacher initialization and a suitable labeling scheme. In this form, it is best described as a distributed process, executed by the nodes of an underlying edge-weighted graph. The Averaging dynamics can be used as a building block to achieve “community detection” in some classes of “regular” and “almost regular” graphs. Herein, we extend its use and analysis to broader graph classes and, in one case, to a different problem.
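As a rough illustration, the process can be simulated centrally in a few lines of Python. This is a minimal sketch rather than the paper's Algorithm 1: it assumes the graph is given as a symmetric weight matrix, uses the weighted average of the neighbors' values as the update, and tags a node blue when its last update increased its value and red otherwise. The function name `averaging_dynamics`, the toy graph (two triangles joined by one light edge) and the number of rounds are hypothetical choices made for the example.

```python
import random


def averaging_dynamics(W, rounds, rng):
    """Simulate the averaging rule on a symmetric weight matrix W.

    Returns the node values and the blue/red labels after the last round.
    """
    n = len(W)
    delta = [sum(row) for row in W]
    # Rademacher initialization: each node picks -1 or +1 uniformly at random.
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    labels = [None] * n
    for _ in range(rounds):
        # Every node moves to the weighted average of its neighbors' values.
        new_x = [sum(W[u][v] * x[v] for v in range(n)) / delta[u] for u in range(n)]
        # Labeling rule: blue if the last update increased the value, red otherwise.
        labels = ["blue" if new_x[u] > x[u] else "red" for u in range(n)]
        x = new_x
    return x, labels


# Hypothetical example: two triangles joined by a single light edge.
W = [[0.0] * 6 for _ in range(6)]
for block in ((0, 1, 2), (3, 4, 5)):
    for u in block:
        for v in block:
            if u != v:
                W[u][v] = 1.0
W[2][3] = W[3][2] = 0.1

values, labels = averaging_dynamics(W, rounds=5, rng=random.Random(0))
print(labels)
```

A single run only produces one binary label per node; whether that labeling reflects the communities depends on the graph and on the round at which it is read, which is what the analysis in the following sections addresses.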

Spectral decomposition of the transition matrix.

Let $\bm{x}^{(t)}$ denote the state vector at time $t$ , i.e., the vector whose $u$ -th entry is the value held by node $u$ at time $t$ . We let $\bm{x}^{(0)}=\bm{x}$ denote the initial state vector. Globally, the averaging update rule of Algorithm 1 corresponds to one iteration of the power method, in this case an application of the transition matrix $P$ to the current state vector, i.e., $\bm{x}^{(t)}=P\bm{x}^{(t-1)}$ . We can write

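$$\bm{x}^{(t)}=P^{t}\bm{x}=D^{-\frac{1}{2}}N^{t}D^{\frac{1}{2}}\bm{x}\stackrel{(a)}{=}D^{-\frac{1}{2}}\sum_{i=1}^{n}\lambda_{i}^{t}\bm{w}_{i}\bm{w}_{i}^{\intercal}\sum_{j=1}^{n}\beta_{j}\bm{w}_{j}\stackrel{(b)}{=}\sum_{i=1}^{n}\lambda_{i}^{t}\beta_{i}D^{-\frac{1}{2}}\bm{w}_{i},$$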

where in $(a)$ we spectrally decomposed the matrix $N^{t}$ and expressed the vector $D^{\frac{1}{2}}\bm{x}$ as a linear combination of the eigenvectors of $N$ , i.e., $D^{\frac{1}{2}}\bm{x}=\sum_{i=1}^{n}\beta_{i}\bm{w}_{i}$ , with $\beta_{i}=\langle D^{\frac{1}{2}}\bm{x},\bm{w}_{i}\rangle$ ; in $(b)$ we used that the eigenvectors of $N$ are orthonormal, i.e., that $\bm{w}_{i}^{\intercal}\bm{w}_{i}=1$ for every $i\in\{1,\ldots,n\}$ and that $\bm{w}_{i}^{\intercal}\bm{w}_{j}=0$ for every $i,j\in\{1,\ldots,n\}$ and such that $i\neq j$ . By explicitly writing the $\beta_{i}$ s and by noting that $\bm{w}_{i}=\frac{D^{\frac{1}{2}}\bm{v}_{i}}{\|D^{\frac{1}{2}}\bm{v}_{i}\|}$ we conclude that

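$$\bm{x}^{(t)}=\sum_{i=1}^{n}\lambda_{i}^{t}\,\frac{\langle D^{\frac{1}{2}}\bm{x},D^{\frac{1}{2}}\bm{v}_{i}\rangle}{\|D^{\frac{1}{2}}\bm{v}_{i}\|}\,D^{-\frac{1}{2}}\frac{D^{\frac{1}{2}}\bm{v}_{i}}{\|D^{\frac{1}{2}}\bm{v}_{i}\|}=\sum_{i=1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i},$$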

where $\alpha_{i}:=\frac{\langle D^{\frac{1}{2}}\bm{x},D^{\frac{1}{2}}\bm{v}_{i}\rangle}{\|D^{\frac{1}{2}}\bm{v}_{i}\|^{2}}=\frac{\bm{x}^{\intercal}D\bm{v}_{i}}{\|D^{\frac{1}{2}}\bm{v}_{i}\|^{2}}$ is the length of the projection of $D^{\frac{1}{2}}\bm{x}$ on $D^{\frac{1}{2}}\bm{v}_{i}$ .

Note that $\lambda_{1}=1$ and $\bm{v}_{1}=\bm{1}$ (here and in the remainder, $\bm{1}$ denotes the vector whose entries are all $1$), since $P$ is stochastic, and $\lambda_{i}\in(-1,1)$ for every $i>1$ if $G$ is connected and non-bipartite. The long-term behavior of the dynamics can be written as

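$$\lim_{t\to\infty}\bm{x}^{(t)}=\lim_{t\to\infty}\sum_{i=1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}=\alpha_{1}\bm{1},\quad\text{with}\quad\alpha_{1}=\frac{\sum_{u\in V}\delta(u)\bm{x}(u)}{\sum_{u\in V}\delta(u)}=\sum_{u\in V}\frac{\delta(u)}{\mathrm{vol}(V)}\bm{x}(u),$$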

i.e., each node converges to the initial global weighted average of the network.
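The convergence to the volume-weighted average can be checked numerically. Taking $\bm{v}_{1}=\bm{1}$ in the definition of $\alpha_{i}$ gives $\alpha_{1}=\bm{x}^{\intercal}D\bm{1}/\|D^{\frac{1}{2}}\bm{1}\|^{2}=\sum_{u}\delta(u)\bm{x}(u)/\mathrm{vol}(V)$, and the sketch below compares this value with the state reached after many averaging rounds. The 4-node graph (connected, and non-bipartite because it contains a triangle), the initial vector `x0` and the number of rounds are assumptions chosen for illustration.

```python
# Hypothetical weighted graph: connected, and non-bipartite (nodes 0, 1, 2 form a triangle).
W = [
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 3.0, 0.0],
]
n = len(W)
delta = [sum(row) for row in W]

# One fixed choice of initial values in {-1, +1}.
x0 = [1.0, -1.0, 1.0, 1.0]

# alpha_1 = sum_u delta(u) x(u) / vol(V): the volume-weighted average of x0.
alpha_1 = sum(d * value for d, value in zip(delta, x0)) / sum(delta)

# Apply the averaging rule x <- P x repeatedly.
x = list(x0)
for _ in range(200):
    x = [sum(W[u][v] * x[v] for v in range(n)) / delta[u] for u in range(n)]

# Largest deviation of any node's value from alpha_1 after the last round.
max_gap = max(abs(value - alpha_1) for value in x)
```

Note that the limit is the average weighted by volumes, so it generally differs from the plain arithmetic mean of the initial values unless all nodes have the same volume.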

2.2 Community-sensitive algorithms

We give the following definition of community-sensitive algorithm, that closely resembles that of locality-sensitive hashing (see, e.g., [LRU14]).

Definition 2.1 (Community-sensitive algorithm).

Let $\mathcal{A}$ be a randomized algorithm that takes as input a (possibly weighted) graph $G=(V,E)$ with a hidden partition $\mathcal{V}=\{V_{1},\dots,V_{k}\}$ and assigns a Boolean value $\mathcal{A}(G)[v]\in\{0,1\}$ to each node $v\in V$. We say $\mathcal{A}$ is an $\left(\varepsilon,\,\delta\right)$-community-sensitive algorithm, for some $\varepsilon,\delta>0$, if the following two conditions hold:

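1. For each set $V_{i}$ of the partition and for each pair of nodes $u,v\in V_{i}$ in that set, the probability that the algorithm assigns the same Boolean value to $u$ and $v$ is at least $1-\varepsilon$:

$$\forall i\in[k],\ \forall u,v\in V_{i},\quad\mathbf{P}\left(\mathcal{A}(G)[u]=\mathcal{A}(G)[v]\right)\geqslant 1-\varepsilon.$$

2. For each pair $V_{i},V_{j}$ of distinct sets of the partition and for each pair of nodes $u\in V_{i}$ and $v\in V_{j}$, the probability that the algorithm assigns the same value to $u$ and $v$ is at most $\delta$:

$$\forall i,j\in[k]\text{ with }i\neq j,\ \forall u\in V_{i},\ \forall v\in V_{j},\quad\mathbf{P}\left(\mathcal{A}(G)[u]=\mathcal{A}(G)[v]\right)\leqslant\delta.$$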

For example, for $(\varepsilon,\delta)=(1/n,1/2)$ , an algorithm that simply assigns the same value to all nodes would satisfy the first condition but not the second one, while an algorithm assigning $0$ or $1$ to each node with probability $1/2$ , independently of the other nodes, would satisfy the second condition but not the first one.

Note that Algorithm 1 is a distributed algorithm that, at each round $t$ , assigns one out of two labels to each node of a graph. In the next section (see Theorem 4.1) we prove that a time window $[T_{1},T_{2}]$ exists, such that for all rounds $t\in[T_{1},T_{2}]$ , the assignment of the Averaging dynamics satisfies both conditions in Definition 2.1: The first condition with $\varepsilon=\varepsilon(n)=\mathcal{O}(n^{-\frac{1}{2}})$ , the second with $\delta=\delta(n)=1-\Omega(1)$ .

Community-sensitive labeling.

We here generalize the concept of community-sensitive labeling (which appeared in [BCM+18, Definition 3] for the case of two communities only) to the case of multiple communities. If we execute $\ell=\Theta(\log n)$ independent runs of an $(\varepsilon,\delta)$ -community-sensitive algorithm $\mathcal{A}$ , each node is assigned a signature of $\ell$ binary values, with pairwise Hamming distances probabilistically reflecting community membership of the nodes. More precisely, let $\mathcal{A}$ be an $\left(\varepsilon,\,\delta\right)$ -community-sensitive algorithm and let $\mathcal{A}_{1},\dots,\mathcal{A}_{\ell}$ be $\ell=\Theta(\log n)$ independent runs of $\mathcal{A}$ . For each node $u\in V$ , let $\bm{s}(u)=(s_{1}(u),\dots,s_{\ell}(u))$ denote the signature of node $u$ , where $s_{i}(u)=\mathcal{A}_{i}(G)[u]$ . For each pair of nodes $u,v$ , let $h(u,v)=|\{i\in[\ell]\,:\,s_{i}(u)\neq s_{i}(v)\}|$ be the Hamming distance between $\bm{s}(u)$ and $\bm{s}(v)$ .

Lemma 2.2 (Community-sensitive labeling).

Let $\mathcal{A}$ be an $\left(\varepsilon,\,\delta\right)$ -community-sensitive algorithm with $\varepsilon=\mathcal{O}(\frac{1}{n^{\gamma}})$ for any arbitrarily small positive constant $\gamma$ , and $\delta=1-\Omega(1)$ . Let $\ell=\Theta(\log n)$ , $\alpha=\Omega(\frac{1}{n^{\gamma-c}})$ with $c\in(0,\gamma]$ , and $\beta=b(1-\delta)$ for any constant $b\in(0,1)$ and such that $0\leqslant\alpha\leqslant\beta\leqslant 1$ . Then, for each pair of nodes $u,v\in V$ it holds that:

1. If $u$ and $v$ belong to the same community then $h(u,v)<\alpha\ell$ , w.h.p.

2. If $u$ and $v$ belong to different communities then $h(u,v)\geqslant\beta\ell$ , w.h.p.

Proof.

From the definition of $\left(\varepsilon,\,\delta\right)$ -community-sensitive algorithm we have that, if $u$ and $v$ belong to the same community, then $\mathbf{E}\left[h(u,v)\right]=\sum_{i=1}^{\ell}\mathbf{P}\left(s_{i}(u)\neq s_{i}(v)\right)\leqslant\varepsilon\ell$ . Similarly, if they belong to different communities, then $\mathbf{E}\left[h(u,v)\right]=\sum_{i=1}^{\ell}\mathbf{P}\left(s_{i}(u)\neq s_{i}(v)\right)\geqslant(1-\delta)\ell$ . If $u$ and $v$ belong to the same community, we bound $\mathbf{P}\left(h(u,v)>\alpha\ell\right)$ via Markov’s inequality, which gives

[TABLE]

where in the last inequality we use the hypothesis $\varepsilon=\mathcal{O}(\frac{1}{n^{\gamma}})$ and $\alpha=\Omega(\frac{1}{n^{\gamma-c}})$ . On the other hand, if $u$ and $v$ belong to different communities, we apply Theorem A.1 to $h(u,v)$ by using the lower bound on the expected value of $h(u,v)$ and the hypothesis $\ell=\Theta(\log n)$ . Thus,

[TABLE]

where $d$ is a positive constant. The thesis follows by combining Eqs. 2 and 3. ∎
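The signature construction behind Lemma 2.2 can be sketched in a few lines. In the sketch below, `toy_algorithm` is a hypothetical stand-in for an $(\varepsilon,\delta)$ -community-sensitive algorithm (one fair bit per community, each node's bit flipped independently with probability `flip_prob`), and `group_by_signature` is an illustrative greedy grouping rule with a Hamming threshold; neither is taken from the paper.

```python
import random

def hamming(s, t):
    """Number of positions where two equal-length signatures differ."""
    return sum(a != b for a, b in zip(s, t))

def signatures(run_algorithm, num_nodes, num_runs):
    """Collect one bit per node from each independent run of the algorithm."""
    sigs = [[] for _ in range(num_nodes)]
    for _ in range(num_runs):
        labels = run_algorithm()
        for u in range(num_nodes):
            sigs[u].append(labels[u])
    return [tuple(s) for s in sigs]

def group_by_signature(sigs, threshold):
    """Greedy grouping: a node joins the first group whose representative
    is at Hamming distance strictly below the threshold."""
    groups = []
    for u, s in enumerate(sigs):
        for group in groups:
            if hamming(sigs[group[0]], s) < threshold:
                group.append(u)
                break
        else:
            groups.append([u])
    return groups

def toy_algorithm(communities, rng, flip_prob):
    """Hypothetical community-sensitive algorithm used only for illustration."""
    def run():
        labels = {}
        for community in communities:
            bit = rng.randint(0, 1)
            for u in community:
                # Flip the community bit with probability flip_prob.
                labels[u] = bit ^ (rng.random() < flip_prob)
        return labels
    return run
```

A typical use would call `signatures(toy_algorithm(communities, random.Random(), 0.01), n, ell)` with `ell` proportional to $\log n$ and then group with a threshold between $\alpha\ell$ and $\beta\ell$ .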

2.3 Volume-regular graphs

Recall that, for an undirected edge-weighted graph $G=(V,E,w)$ , we denote by $\delta(u)$ the volume of a node $u\in V$ , i.e., $\delta(u)=\sum_{v:(u,v)\in E}w(u,v)$ . Note that the transition matrix $P$ of a random walk on $G$ is such that $P_{uv}=w\left(u,v\right)/\delta(u)$ . Given a partition ${\mathcal{V}}=\{V_{1},\ldots,V_{k}\}$ of the set of nodes $V$ , for a node $u\in V$ and a partition index $i\in[k]$ , $\delta_{i}(u)$ denotes the overall weight of edges connecting $u$ to nodes in $V_{i}$ , i.e., $\delta_{i}(u)=\sum_{v\in V_{i}\,:\,(u,v)\in E}w\left(u,v\right).$ Hence, $\delta(u)=\sum_{i=1}^{k}\delta_{i}(u)$ .

Definition 2.3 (Volume-regular graph).

Let $G=(V,E,w)$ be an undirected edge-weighted graph with $|V|=n$ nodes and let ${\mathcal{V}}=\{V_{1},\ldots,V_{k}\}$ be a $k$ -partition of the nodes, for some $k\in[n]$ . We say that $G$ is volume regular with respect to $\mathcal{V}$ if, for every pair of partition indexes $i,j\in[k]$ and for every pair of nodes $u,v\in V_{i}$ , $\frac{\delta_{j}(u)}{\delta(u)}=\frac{\delta_{j}(v)}{\delta(v)}.$ We say that $G$ is $k$ -volume regular if there exists a $k$ -partition $\mathcal{V}$ of the nodes such that $G$ is volume regular with respect to $\mathcal{V}$ .

In other words, $G$ is volume regular if there exists a partition of the nodes such that the fraction of a node’s volume toward a set of the partition is constant across nodes of the same set. Note that all graphs with $n$ nodes are trivially $1$ - and $n$ -volume regular.
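Definition 2.3 translates directly into a check on the ratios $\delta_{j}(u)/\delta(u)$ . The sketch below assumes the graph is given as a symmetric matrix `W` of integer weights (a representation chosen here for convenience); the example graph is hypothetical and was picked so that node volumes differ while the volume fractions agree within each block.

```python
from fractions import Fraction

def volume_fractions(W, partition, u):
    """Tuple of delta_j(u) / delta(u) for every block j, as exact fractions."""
    volume = sum(W[u])
    return tuple(Fraction(sum(W[u][v] for v in block), volume) for block in partition)

def is_volume_regular(W, partition):
    """True if, inside every block, all nodes share the same volume fractions."""
    return all(
        len({volume_fractions(W, partition, u) for u in block}) == 1
        for block in partition
    )

# Illustrative graph with node volumes 3, 9, 6, 6 (so it is not regular).
W = [
    [0, 2, 0, 1],
    [2, 0, 4, 3],
    [0, 4, 0, 2],
    [1, 3, 2, 0],
]

print(is_volume_regular(W, [[0, 1, 2], [3]]))   # partition used in this example
print(is_volume_regular(W, [[0, 1], [2, 3]]))   # a different partition of the same graph
```

With the partition `[[0, 1, 2], [3]]` every node of the first block sends $2/3$ of its volume inside the block and $1/3$ to node `3`; the trivial partitions (one block, or all singletons) always pass the check.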

Let $G=(V,E,w)$ be a $k$ -volume regular graph and let $P$ be the transition matrix of a random walk on $G$ . In the next lemma we prove that the span of $k$ linearly independent eigenvectors of $P$ equals the span of the indicator vectors of the $k$ communities of $G$ . The proof makes use of the correspondence between random walks on volume regular graphs and ordinary lumpable Markov chains [KS60]; in particular the result follows from Lemma 3.2 and Lemma 3.3, that we prove in Section 3.

Lemma 2.4.

Let $P$ be the transition matrix of a random walk on a $k$ -volume regular graph $G=(V,E,w)$ with $k$ -partition $\mathcal{V}=\{V_{1},\dots,V_{k}\}$ . There exists a family $\{\bm{v}_{1},\ldots,\bm{v}_{k}\}$ of linearly independent eigenvectors of $P$ such that $Span\left(\{\bm{v}_{1},\dots,\bm{v}_{k}\}\right)=Span\left(\{{\mathbf{1}}_{V_{1}},\dots,{\mathbf{1}}_{V_{k}}\}\right),$ with ${\mathbf{1}}_{V_{i}}$ the indicator vector of the $i$ -th set of the partition, for $i\in[k]$ .

In the rest of the paper we call “stepwise” the eigenvectors of $P$ that can be written as linear combinations of the indicator vectors of the communities. In the next definition, we formalize the fact that a $k$ -volume regular graph is clustered if the $k$ linearly independent stepwise eigenvectors of $P$ , whose existence is guaranteed by the above lemma, are associated to the $k$ largest eigenvalues of $P$ .

Definition 2.5 (Clustered volume regular graph).

Let $G=(V,E,w)$ be a $k$ -volume regular graph and let $P$ be the transition matrix of a random walk on $G$ . We say that $G$ is a clustered $k$ -volume regular graph if the $k$ stepwise eigenvectors of $P$ are associated to the $k$ largest eigenvalues of $P$ .

3 Volume-regular graphs and lumpable Markov chains

The class of volume-regular graphs is deeply connected with the definition of lumpability [KS60] of Markov chains. We here first recall the definition of lumpable Markov chain and then show that a graph $G$ is volume-regular if and only if the associated weighted random walk is a lumpable Markov chain.

Definition 3.1 (Ordinary lumpability of Markov Chains).

Let $\{X_{t}\}_{t}$ be a finite Markov chain with state space $V$ and transition matrix $P=(P_{uv})_{u,v\in V}$ and let $\mathcal{V}=\{V_{1},\ldots,V_{k}\}$ be a partition of the state space. Markov chain $\{X_{t}\}_{t}$ is ordinary lumpable with respect to $\mathcal{V}$ if, for every pair of partition indexes $i,j\in[k]$ and for every pair of nodes in the same set of the partition $u,v\in V_{i}$ , it holds that

$$\sum_{w\in V_{j}}P_{uw}=\sum_{w\in V_{j}}P_{vw}.$$

We define the lumped matrix $\widehat{P}$ of the Markov chain as the matrix such that $\widehat{P}_{ij}=\sum_{w\in V_{j}}P_{uw}$ , for any $u\in V_{i}$ .

We first prove that random walks on volume-regular graphs define exactly the subset of reversible and ordinary lumpable Markov chains.

Lemma 3.2.

A reversible Markov chain $\{X_{t}\}_{t}$ is ordinary lumpable if and only if it is a random walk on a volume-regular graph.

Proof.

Assume first that $\{X_{t}\}_{t}$ is ordinary lumpable and let $P$ be the corresponding transition matrix. Consider the weighted graph $G=(V,E,w)$ obtained from $P$ as follows: $V$ corresponds to the set of states in $P$ , while $w(u,v)=\pi(u)P_{uv}$ , for every $u,v\in V$ , with $\pi$ the stationary distribution of $P$ . Note that $G$ is an undirected graph, i.e., $w(u,v)=\pi(u)P_{uv}\stackrel{{\scriptstyle(a)}}{{=}}\pi(v)P_{vu}=w(v,u),$ where $(a)$ holds because $P$ is reversible. Moreover

[TABLE]

where $(a)$ holds because $P$ is stochastic. Thus $G$ meets Definition 2.3 because, for any $u,v\in V_{i}$ ,

[TABLE]

Next, assume $G$ is $k$ -volume-regular with respect to the partition ${\mathcal{V}}=\{V_{1},\ldots,V_{k}\}$ . Let $P$ be the transition matrix of the corresponding random walk. For every $i,j\in[k]$ and for every $u,v\in V_{i}$ we have:

[TABLE]

where $(a)$ follows from Definition 2.3. Moreover note that $P$ is reversible with respect to distribution $\pi$ , where $\pi(u)=\frac{\delta(u)}{\mathrm{vol}(G)}$ . ∎
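The two constructions used in the proof can be checked with exact arithmetic. The sketch below, on a hypothetical integer-weighted graph `W`, builds the random-walk matrix $P$ and the distribution $\pi(u)=\delta(u)/\mathrm{vol}(G)$ , then recovers edge weights from the chain via $w(u,v)=\pi(u)P_{uv}$ .

```python
from fractions import Fraction

def transition_matrix(W):
    """P_uv = w(u, v) / delta(u), computed exactly."""
    return [[Fraction(w, sum(row)) for w in row] for row in W]

def stationary_distribution(W):
    """pi(u) = delta(u) / vol(G)."""
    total_volume = sum(sum(row) for row in W)
    return [Fraction(sum(row), total_volume) for row in W]

# Illustrative symmetric integer weights (an assumption of this sketch).
W = [
    [0, 2, 0, 1],
    [2, 0, 4, 3],
    [0, 4, 0, 2],
    [1, 3, 2, 0],
]
n = len(W)
P = transition_matrix(W)
pi = stationary_distribution(W)

# Weights of the graph rebuilt from the chain: w'(u, v) = pi(u) * P_uv.
rebuilt = [[pi[u] * P[u][v] for v in range(n)] for u in range(n)]
```

The rebuilt weights equal the original ones divided by $\mathrm{vol}(G)$ , which illustrates why rescaling all weights of a graph leaves the random walk unchanged.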

Note that infinitely many $k$ -volume-regular graphs have the same $k$ -ordinary lumpable random walk chain.

We next show that a Markov chain is $k$ -ordinary lumpable if and only if the corresponding transition matrix $P$ has $k$ stepwise, linearly independent eigenvectors.

Lemma 3.3.

Let $P$ be the transition matrix of a Markov chain. Then $P$ has $k$ stepwise linearly independent eigenvectors if and only if $P$ is ordinary lumpable.

Proof.

We divide the proof into two parts. First, we assume that $P$ is ordinary lumpable and show that $P$ has $k$ stepwise linearly independent eigenvectors. Second, we assume that $P$ has $k$ stepwise linearly independent eigenvectors and show that $P$ is ordinary lumpable.

Let $P$ be ordinary lumpable and $\widehat{P}$ its lumped matrix. Let $\lambda_{i},\bm{v}_{i}$ be the eigenvalues and eigenvectors of $\widehat{P}$ , for each $i\in[k]$ . Let $\bm{w}_{i}\in\mathbb{R}^{n}$ be a stepwise vector defined as

[TABLE]

where $\bm{v}_{i}(j)$ indicates the $j$ -th component of $\bm{v}_{i}$ , and then the $|V_{j}|$ components relative to $V_{j}$ are all equal to $\bm{v}_{i}(j)$ .

Since the eigenvectors $\bm{v}_{i}$ of $\widehat{P}$ are linearly independent, the vectors $\bm{w}_{i}$ are also linearly independent. Moreover, it is easy to see that $P\bm{w}_{i}=\lambda_{i}\bm{w}_{i}$ by just verifying the equation for every $i\in[k]$ .

Assume $P$ has $k$ stepwise linearly independent eigenvectors $\bm{w}_{i}$ , associated to $k$ eigenvalues $\lambda_{i}$ , for each $i\in[k]$ . Let $\bm{v}_{i}\in\mathbb{R}^{k}$ be the vector whose components are the $k$ constant values in the steps of $\bm{w}_{i}$ . Since the $\bm{w}_{i}$ are linearly independent, the $\bm{v}_{i}$ also are.

For every eigenvector $\bm{w}_{i}$ and for every two states $x,y\in V_{l}$ , for every $l\in[k]$ , we have that $\lambda_{i}\bm{w}_{i}(x)=\lambda_{i}\bm{w}_{i}(y)$ since $\bm{w}_{i}$ is stepwise. Then, since $P\bm{w}_{i}=\lambda_{i}\bm{w}_{i}$ , we have that

[TABLE]

Thus $\sum_{j=1}^{k}\bm{v}_{i}(j){\sum_{z\in V_{j}}\left(P_{xz}-P_{yz}\right)}=0$ and then it follows that

[TABLE]

where $\bm{u}_{xy}(j):=\sum_{z\in V_{j}}\left(P_{xz}-P_{yz}\right)$ . Since the $\bm{v}_{i}$ are $k$ linearly independent vectors in a $k$ -dimensional space, they span it, so a vector orthogonal to all of them must be the null vector; hence $\bm{u}_{xy}(j)=0$ for all $j\in[k]$ . This implies that $P$ is ordinary lumpable, i.e., $\sum_{z\in V_{j}}P_{xz}=\sum_{z\in V_{j}}P_{yz}$ . It is easy to verify that the eigenvalues and eigenvectors of $\widehat{P}$ are exactly $\lambda_{i},\bm{v}_{i}$ , with $i\in[k]$ . ∎
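The first direction of the proof can be replayed on a small instance. The sketch below assumes the convention that entry $(i,j)$ of the lumped matrix is the probability of moving from any state of block $i$ into block $j$ ; the graph, the eigenpair $(\lambda,\bm{v})=(-1/3,(1,-3))$ of the lumped matrix and the helper names are illustrative choices.

```python
from fractions import Fraction

def transition_matrix(W):
    return [[Fraction(w, sum(row)) for w in row] for row in W]

def lumped_matrix(P, partition):
    """Lumped chain; raises if the block-to-block probabilities depend on the state."""
    lumped = []
    for block_i in partition:
        rows = {
            tuple(sum(P[u][z] for z in block_j) for block_j in partition)
            for u in block_i
        }
        if len(rows) != 1:
            raise ValueError("chain is not ordinary lumpable w.r.t. this partition")
        lumped.append(list(rows.pop()))
    return lumped

def lift(v, partition, n):
    """Stepwise vector of length n that is constant (equal to v[j]) on block j."""
    w = [None] * n
    for j, block in enumerate(partition):
        for u in block:
            w[u] = v[j]
    return w

def apply(M, vec):
    """Matrix-vector product M * vec."""
    return [sum(m * x for m, x in zip(row, vec)) for row in M]

W = [
    [0, 2, 0, 1],
    [2, 0, 4, 3],
    [0, 4, 0, 2],
    [1, 3, 2, 0],
]
partition = [[0, 1, 2], [3]]
P = transition_matrix(W)
P_hat = lumped_matrix(P, partition)

lam, v = Fraction(-1, 3), [1, -3]   # eigenpair of P_hat, found by hand
w = lift(v, partition, len(W))      # candidate stepwise eigenvector of P
```

Here `apply(P, w)` equals `lam` times `w` entry by entry, so the lifted vector is a stepwise eigenvector of $P$ with the same eigenvalue.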

4 Averaging dynamics on clustered volume regular graphs

Let $n_{{}_{\min}}:=\min_{i\in[k]}|V_{i}|$ and $n_{{}_{\max}}:=\max_{i\in[k]}|V_{i}|$ be the minimum and maximum sizes of the communities of a volume-regular graph $G=(V,E,w)$ with $n$ nodes and $k$ -partition $\mathcal{V}=\{V_{1},\ldots,V_{k}\}$ . Recall also that $\Delta$ is the maximum weighted degree of the nodes of $G$ and $\lambda_{1},\dots,\lambda_{n}$ are the eigenvalues of the transition matrix of a random walk on $G$ (see Section 2). In this section we prove the following result.

Theorem 4.1.

Let $G=(V,E,w)$ be a connected clustered $k$ -volume-regular graph with $n$ nodes and $k$ -partition $\mathcal{V}=\{V_{1},\ldots,V_{k}\}$ , such that $\Delta\leqslant\frac{\sqrt{n_{{}_{\min}}}}{25}$ and $2\Delta(n_{{}_{\max}}/n_{{}_{\min}})<k\leqslant\sqrt{n}$ . Assume further that $1-\lambda_{2}\leqslant\frac{\lambda_{k}\log(\lambda_{k}/\lambda_{k+1})}{7\log(2\Delta n)}$ and $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ . A non-empty time interval $[T_{1},T_{2}]$ exists, with $T_{1}=\mathcal{O}\left(\frac{\log n}{\log(\lambda_{k}/\lambda_{k+1})}\right)$ and $T_{2}=\Omega\left(\frac{\lambda_{k}}{1-\lambda_{2}}\right)$ , such that for each $t\in[T_{1},T_{2}]$ , the Averaging dynamics truncated at round $t$ is a $(\mathcal{O}(n^{-1/2}),\,1-\Omega(1))$ -community-sensitive algorithm.
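To get a feeling for how the hypotheses interact, the following sketch evaluates them for given parameters. The function names and the sample values are hypothetical; the returned time scales drop the constants hidden in the $\mathcal{O}(\cdot)$ and $\Omega(\cdot)$ notation, so they indicate orders of magnitude only.

```python
import math

def satisfies_theorem_hypotheses(n, k, n_min, n_max, max_degree, lam2, lam_k, lam_k1):
    """Check the four numeric conditions of Theorem 4.1.
    Assumes 0 < lam_k1 < lam_k so that the logarithm is well defined."""
    degree_ok = max_degree <= math.sqrt(n_min) / 25
    count_ok = 2 * max_degree * (n_max / n_min) < k <= math.sqrt(n)
    gap_ok = 1 - lam2 <= (
        lam_k * math.log(lam_k / lam_k1) / (7 * math.log(2 * max_degree * n))
    )
    spread_ok = lam_k >= (7 * lam2 - 5) / 2
    return degree_ok and count_ok and gap_ok and spread_ok

def time_window_scales(n, lam2, lam_k, lam_k1):
    """Scales of T1 and T2 with all hidden constants set to 1 (an assumption)."""
    t1_scale = math.log(n) / math.log(lam_k / lam_k1)
    t2_scale = lam_k / (1 - lam2)
    return t1_scale, t2_scale

# Illustrative parameters: four balanced communities of 2500 nodes each.
ok = satisfies_theorem_hypotheses(10000, 4, 2500, 2500, 1.5, 0.99, 0.98, 0.2)
t1, t2 = time_window_scales(10000, 0.99, 0.98, 0.2)
print(ok, t1, t2)
```

With these sample values the conditions hold and the $T_{1}$ scale is well below the $T_{2}$ scale; lowering $\lambda_{2}$ to $0.9$ (with $\lambda_{k}=0.89$ ) violates the condition on $1-\lambda_{2}$ .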

Remark 1 (The extent of the time-window).

Notice that the time window cannot be too long: by Cheeger’s inequality $1-\lambda_{2}\geqslant\frac{h_{G}^{2}}{2}\geqslant 1/(2\Delta^{2}n^{2})$ , thus $T_{2}=\mathcal{O}(\Delta^{2}n^{2})$ . The lower bound on $h_{G}$ can be seen by observing that: $i)$ the minimum volume of a cut must be at least half the minimum degree of the graph, which we normalize to $1$ , and $ii)$ in computing $h_{G}$ , we restrict to subsets of volume at most $\mathrm{vol}(G)$ , which is at most $\Delta n$ .

Remark 2 (The extent of non-regularity).

Notice that the condition $k>2\Delta(n_{{}_{\max}}/n_{{}_{\min}})$ implies

[TABLE]

In other words, the Averaging dynamics gives a good community-sensitive labeling when the communities are not too unbalanced in terms of their volumes. Moreover, the smaller the number of communities, the tighter the volume-balance requirement.

In the remainder of this section, we first introduce further notation and then state the main technical lemmas (Lemmas 4.2, 4.3 and 4.4), that will be used in the proof of Theorem 4.1, which concludes this section.

Let $G=(V,E,w)$ be a clustered $k$ -volume regular graph and, without loss of generality, let $V_{1},\dots,V_{k}$ be an arbitrary ordering of its communities. We introduce a family of stepwise vectors that generalize the Fiedler vector [Fie89], namely

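$$\bm{\chi}_{i}:=\sqrt{\frac{\hat{m}_{i}}{m_{i}}}\,{\mathbf{1}}_{V_{i}}-\sqrt{\frac{m_{i}}{\hat{m}_{i}}}\,{\mathbf{1}}_{\hat{V}_{i}},\qquad i\in[k-1],\tag{5}$$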

where ${\mathbf{1}}_{V_{i}}$ is the indicator vector of the set $V_{i}$ and, for convenience, we denote by $m_{i}$ the volume of the $i$ -th community, by $\hat{V}_{i}$ the set of all nodes in communities $i+1,\dots,k$ , and by $\hat{m}_{i}$ the volume of $\hat{V}_{i}$ , i.e., $m_{i}:=\sum_{u\in V_{i}}\delta(u)$ , $\hat{V}_{i}:=\bigcup_{h=i+1}^{k}V_{h}$ , and $\hat{m}_{i}:=\sum_{h=i+1}^{k}m_{h}.$ Note that the vectors $\bm{\chi}_{i}$ are “stepwise” with respect to the communities of $G$ (i.e., for every $i\in[k-1]$ , $\bm{\chi}_{i}(u)=\bm{\chi}_{i}(v)$ whenever $u$ and $v$ belong to the same community).

Recall from Eq. 1 that the initial state vector can be written as $\bm{x}=\sum_{i=1}^{n}\alpha_{i}\bm{v}_{i}$ . Let $\bm{z}:=\sum_{i=1}^{k}\alpha_{i}\bm{v}_{i}$ and note that $\bm{z}=\alpha_{1}{\mathbf{1}}{}+\sum_{i=1}^{k-1}\gamma_{i}\bm{\chi}_{i}$ by applying Lemma 2.4 and because $Span\left(\{{\mathbf{1}}{},\bm{\chi}_{1},\dots,\bm{\chi}_{k-1}\}\right)=Span\left(\{{\mathbf{1}}_{V_{1}},\dots,{\mathbf{1}}_{V_{k}}\}\right)$ . Let us now define the vector $\bm{y}:=\bm{z}-\alpha_{1}{\mathbf{1}}{}$ or, equivalently,

[TABLE]

Note that the coefficients $\gamma_{i}$ s are proportional to the length of the projection of the (inhomogeneously) contracted state vector on the (inhomogeneously) contracted $D^{\frac{1}{2}}{}\bm{\chi}_{i}$ s; the previous expression is valid since the vectors in $\{D^{\frac{1}{2}}{}{\mathbf{1}}{}\}\cup\{D^{\frac{1}{2}}{}\bm{\chi}_{i}:i\in[k-1]\}$ are mutually orthogonal. (The mutual orthogonality of the vectors, including $D^{\frac{1}{2}}{}{\mathbf{1}}{}$ , is also one of the reasons why other “simpler” families of stepwise vectors, e.g., the indicator vectors of the communities, are not used instead.)
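The mutual orthogonality can be verified numerically. The sketch below assumes the form $\bm{\chi}_{i}=\sqrt{\hat{m}_{i}/m_{i}}\,{\mathbf{1}}_{V_{i}}-\sqrt{m_{i}/\hat{m}_{i}}\,{\mathbf{1}}_{\hat{V}_{i}}$ , which is inferred here from the identities used in the proof of Lemma 4.2; the node volumes and the partition are hypothetical.

```python
import math

def chi_vectors(volumes, partition):
    """Stepwise vectors chi_1, ..., chi_{k-1} for an ordered partition.
    volumes[u] is delta(u); the formula for chi_i is an assumption of this sketch."""
    n = len(volumes)
    m = [sum(volumes[u] for u in block) for block in partition]
    chis = []
    for i in range(len(partition) - 1):
        m_hat = sum(m[i + 1:])            # volume of communities i+1, ..., k
        chi = [0.0] * n                   # zero on communities 1, ..., i-1
        for u in partition[i]:
            chi[u] = math.sqrt(m_hat / m[i])
        for block in partition[i + 1:]:
            for u in block:
                chi[u] = -math.sqrt(m[i] / m_hat)
        chis.append(chi)
    return chis

def d_inner(volumes, a, b):
    """Inner product <D^{1/2} a, D^{1/2} b> = sum_u delta(u) a(u) b(u)."""
    return sum(d * x * y for d, x, y in zip(volumes, a, b))

volumes = [3, 9, 6, 6, 2, 4]              # illustrative node volumes
partition = [[0, 1, 2], [3], [4, 5]]      # illustrative ordered communities
family = [[1.0] * len(volumes)] + chi_vectors(volumes, partition)
```

All pairwise values of `d_inner` over `family` vanish up to floating-point error, whereas the community indicator vectors are orthogonal to each other but not to the all-ones vector.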

In Lemma 4.2 we show that every component of $\bm{y}$ , i.e., the projection of the (inhomogeneously) contracted initial state vector $D^{\frac{1}{2}}\bm{x}$ on the (inhomogeneously) contracted vectors $D^{\frac{1}{2}}{}\bm{\chi}_{i}$ s, is not too small, w.h.p.

Lemma 4.2 (Length of the projection of the state vector).

Let $G=(V,E,w)$ be a connected clustered $k$ -volume-regular graph with $n$ nodes and $k$ -partition $\mathcal{V}=\{V_{1},\ldots,V_{k}\}$ . Under the hypotheses of Theorem 4.1, for every $u\in V$ ,

[TABLE]

Proof.

Without loss of generality, we assume $u\in V_{1}$ , which possibly just amounts to a relabeling of the nodes. With this assumption, we have

[TABLE]

where the second equality follows from the definitions of the $\bm{\chi}_{i}$ ’s (Eq. 5) and the fact that $u\in V_{1}$ . Next, observe that we have:

[TABLE]

where $m:=\mathrm{vol}(V)$ . We now bound

[TABLE]

More precisely, we prove that it is at least $1/m$ with probability $1-\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$ , where probability is computed over the randomness of $\bm{x}$ .

Assume for the moment that $\hat{m}_{1}\geqslant m_{1}$ . From the definition of $\bm{\chi}_{1}$ we have:

[TABLE]

Now, set $\bm{w}=D\left(\frac{\hat{m}_{1}}{m_{1}}\bm{1}_{V_{1}}-\bm{1}_{\hat{V}_{1}}\right)$ and note that $|\bm{w}(v)|\geqslant 1$ for every $v\in V$ , from the hypothesis that $\hat{m}_{1}\geqslant m_{1}$ and since $\delta(v)\geqslant 1$ for every $v\in V$ . We can thus apply Theorem A.5 to $\bm{w}$ with $r=0$ , so that we can write:

[TABLE]

where the equality follows since $\bm{x}^{\intercal}D\bm{\chi}_{1}=\sqrt{\frac{m_{1}}{\hat{m}_{1}}}\bm{x}^{\intercal}\bm{w}$ . Hence, with probability $1-\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$ we have $\left|\gamma_{1}\right|\geqslant\sqrt{\frac{m_{1}}{\hat{m}_{1}}}\cdot\frac{1}{m}$ and thus, with the same probability:

[TABLE]

Assume now that $m_{1}>\hat{m}_{1}$ . This time we write:

[TABLE]

and we set $\bm{w}=D\left(\bm{1}_{V_{1}}-\frac{m_{1}}{\hat{m}_{1}}\bm{1}_{\hat{V}_{1}}\right)$ . Note that, again, $|\bm{w}(v)|\geqslant 1$ for every $v\in V$ . Proceeding as in the previous case we obtain $\left|\gamma_{1}\right|\geqslant\sqrt{\frac{\hat{m}_{1}}{m_{1}}}\cdot\frac{1}{m}$ with probability $1-\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$ and thus, with the same probability:

[TABLE]

where in $(a)$ we used that $m_{i}<\frac{m}{2}$ (see Remark 2). This concludes the proof. ∎

In Lemma 4.3 we show that given any “pair of steps” of the vector $\bm{y}$ (defined in Eq. 6), the two steps have different signs, with constant probability.

Lemma 4.3 (Different communities, different signs).

Let $G=(V,E,w)$ be a clustered $k$ -volume regular graph with maximum weighted degree $\Delta\leqslant\frac{\sqrt{n_{{}_{\min}}}}{25}$ and with $k>2\Delta(n_{{}_{\max}}/n_{{}_{\min}})$ . For each pair of nodes $u\in V_{i}$ and $v\in V_{j}$ , with $i\neq j$ , it holds that

[TABLE]

Proof.

Since the ordering of the communities (and consequent definition of the $\bm{\chi}_{i}$ ’s, given in Eq. 5) is completely arbitrary, we can assume $i=1$ and $j=2$ , without loss of generality. Let us define $X(V_{i}):=\sum_{w\in V_{i}}\delta(w)\bm{x}(w)$ , where $\bm{x}=\bm{x}^{(0)}$ is the initial state vector.

Note that $\bm{y}(u)=\gamma_{1}\bm{\chi}_{1}(u)$ and $\bm{y}(v)=\gamma_{1}\bm{\chi}_{1}(v)+\gamma_{2}\bm{\chi}_{2}(v)$ , since the other terms of the $\bm{\chi}_{i}$ s are equal to 0 on the components relative to $u$ and $v$ . Thus, with some algebra, we get

[TABLE]

where $\hat{V}_{i}:=\bigcup_{h=i+1}^{k}V_{h}$ . Note that, by linearity of expectation, $\mathbf{E}\left[X(V_{i})\right]=0$ . Moreover, since the terms $\bm{x}(w)$ s are independent Rademacher random variables, we can write the standard deviation of $X(V_{i})$ as

[TABLE]

Then we can upper and lower bound the standard deviation $\sigma(X(V_{i}))$ , getting $\frac{m_{i}}{\sqrt{n_{i}}}\leqslant\sigma(X(V_{i}))\leqslant\Delta\sqrt{n_{i}},$ where $n_{i}:=|V_{i}|$ , the lower bound follows from $\left\|\bm{d}_{i}\right\|_{2}\geqslant\left\|\bm{d}_{i}\right\|_{1}/\sqrt{n_{i}}$ , with $\bm{d}_{i}$ the vector of weighted degrees of the nodes in community $V_{i}$ , and for the upper bound we used that $\delta(w)\leqslant\Delta$ , for each $w\in V$ .
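For a tiny community the standard deviation of $X(V_{i})$ can be computed exactly by enumerating all Rademacher sign vectors. The sketch below does so for a hypothetical list of weighted degrees and compares the result with $\|\bm{d}_{i}\|_{2}$ and with the two bounds.

```python
import itertools
import math

def exact_std(degrees):
    """Standard deviation of sum_w degrees[w] * x(w), with x uniform over {-1, +1}^n,
    obtained by brute-force enumeration of all sign vectors."""
    values = [
        sum(d * s for d, s in zip(degrees, signs))
        for signs in itertools.product((-1, 1), repeat=len(degrees))
    ]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

degrees = [1, 2, 3, 5]                                   # illustrative weighted degrees
sigma = exact_std(degrees)
lower = sum(degrees) / math.sqrt(len(degrees))           # m_i / sqrt(n_i)
upper = max(degrees) * math.sqrt(len(degrees))           # Delta * sqrt(n_i)
print(lower, sigma, upper)
```

For this example the variance equals the sum of squared degrees, $1+4+9+25=39$ , and the standard deviation lies between the two bounds.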

Let us now define the following three events:

1. $E_{1}\text{: }X(V_{1})>\sigma(X(V_{1}))\implies X(V_{1})>\frac{m_{1}}{\sqrt{n_{1}}}\geqslant\frac{\min_{i}m_{i}}{\sqrt{n_{{}_{\max}}}}$ ;

2. $E_{2}\text{: }X(V_{2})<-\sigma(X(V_{2}))\implies X(V_{2})<-\frac{m_{2}}{\sqrt{n_{2}}}\leqslant-\frac{\min_{i}m_{i}}{\sqrt{n_{{}_{\max}}}}$ ;

3. $E_{3}\text{: }0\leqslant X(\hat{V}_{2})<\left(2/\sqrt{k}\right)\sigma(X(\hat{V}_{2}))\implies 0\leqslant X(\hat{V}_{2})<2\Delta\sqrt{(1/k)\sum_{i=3}^{k}n_{i}}\leqslant 2\Delta\sqrt{n_{{}_{\max}}}$ .

When $E_{1},E_{2},E_{3}$ are true it follows directly that $\bm{y}(v)<0$ . As for $\bm{y}(u)>0$ we have

[TABLE]

since, for the last inequality, $k>2\Delta(n_{{}_{\max}}/n_{{}_{\min}})$ by hypothesis.

Note that each of the three events $E_{1},E_{2},E_{3}$ has probability bounded below by a constant and, since the events are independent, $\mathbf{P}\left(E_{1}\cap E_{2}\cap E_{3}\right)$ is also bounded below by a constant. Indeed, it is possible to prove the constant lower bounds on the probabilities by approximating the random variables with Gaussian ones using Berry-Esseen’s theorem (Theorem A.4). Note that $X(V_{1}),X(V_{2}),X(\hat{V}_{2})$ all are of the form $Z=\sum_{w\in T}Z_{w}$ , for some $T\subseteq V$ and where $Z_{w}=\delta(w)\bm{x}(w)$ . Recall that $\mathbf{E}\left[Z_{w}\right]=0$ and that $\sigma^{2}(Z_{w})=\delta(w)^{2}$ . Moreover, note that the third absolute moment of $Z_{w}$ is $\mathbf{E}\left[\lvert Z_{w}\rvert^{3}\right]=\delta(w)^{3}\mathbf{E}\left[\lvert\bm{x}(w)\rvert^{3}\right]=\delta(w)^{3}.$ Therefore we can apply Theorem A.4, which states that there exists a positive constant $C\leqslant 1.88$ [Ber41] such that, for every $z\in\mathbb{R}$ ,

[TABLE]

where $\Phi$ is the cumulative distribution function of the standardized normal distribution. Thus

[TABLE]

Since $\Delta\leqslant\frac{\sqrt{n_{{}_{\min}}}}{25}$ by hypothesis and $\sigma(Z)\geqslant\sqrt{|T|}$ for every $T\subseteq V$ , taking $z=1$ it follows from Eq. 7 that

[TABLE]

since $\frac{1}{25}<\frac{1-\Phi(1)}{2C}\approx 0.042$ . Since the distribution of $Z$ is symmetric for every $T\subseteq V$ , it holds that $\mathbf{P}\left(E_{2}\right)\geqslant 1-\Phi(1)-\frac{C}{25}\approx 0.08$ . Similarly, it also holds that

[TABLE]
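The numeric constants quoted in this argument are easy to reproduce. The sketch below evaluates $1-\Phi(1)$ through the error function and checks the two quantities that appear in the text, $\frac{1-\Phi(1)}{2C}$ and $1-\Phi(1)-\frac{C}{25}$ , taking $C=1.88$ .

```python
import math

def standard_normal_cdf(z):
    """Phi(z), the CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

C = 1.88                                  # Berry-Esseen constant used in the text
tail = 1.0 - standard_normal_cdf(1.0)     # 1 - Phi(1)
threshold = tail / (2 * C)                # must exceed 1/25 for the argument to work
lower_bound = tail - C / 25               # resulting lower bound on P(E_1) and P(E_2)
print(tail, threshold, lower_bound)
```

The threshold is only slightly above $1/25$ , which is why the constant $25$ appears in the hypothesis $\Delta\leqslant\frac{\sqrt{n_{{}_{\min}}}}{25}$ .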

Recall that the binary labeling of each node only depends on the difference of its state in two consecutive rounds (see Algorithm 1). In Lemma 4.4 we show that, under suitable assumptions on the transition matrix of a random walk on $G$ , a large enough time window exists where, for each node $u$ , the sign of the difference $\bm{x}^{(t)}(u)-\bm{x}^{(t+1)}(u)$ of the state vector across two consecutive rounds equals the sign of $\bm{y}(u)$ , w.h.p. Since $\bm{y}$ (defined in Eq. 6) is a stepwise vector, this implies that two nodes in the same community have the same label, w.h.p. For the sake of readability, in the proof of Lemma 4.4 we use two technical lemmas as black boxes, postponing their proofs to Subsection 4.1.

Lemma 4.4 (Sign of the difference).

Let $G=(V,E,w)$ be a clustered $k$ -volume regular graph with maximum weighted degree $\Delta\leqslant\frac{\sqrt{n_{{}_{\min}}}}{25}$ . If $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ , $1-\lambda_{2}\leqslant\frac{\lambda_{k}\log(\lambda_{k}/\lambda_{k+1})}{7\log(2\Delta n)}$ , $\left|\bm{y}(u)\right|\geqslant\frac{1}{\Delta n}$ for every $u\in V$ , then a non-empty time interval $[T_{1},T_{2}]$ exists, with $T_{1}=\mathcal{O}\left(\frac{\log n}{\log(\lambda_{k}/\lambda_{k+1})}\right)$ and $T_{2}=\Omega\left(\frac{\lambda_{k}}{1-\lambda_{2}}\right)$ , such that, for every $u\in V$ and every $t\in[T_{1},T_{2}]$ of the Averaging dynamics,

[TABLE]

Proof.

Recall from Eq. 1 that the state vector at time $t$ , i.e., $\bm{x}^{(t)}$ , can be written as the sum of the first $k$ stepwise vectors of $P$ and of the remaining ones, namely

[TABLE]

In what follows we call $\bm{c}^{(t)}:=\sum_{i=2}^{k}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}$ the core contribution and $\bm{e}^{(t)}:=\sum_{i=k+1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}$ the error contribution. If we look at the difference of the state vector in two consecutive rounds, the first term cancels out being constant over time, so that

[TABLE]

for each node $u\in V$ . Note that the sign of the difference between two consecutive rounds is determined by the difference of the core contributions, $\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u)$ , whenever

[TABLE]

To identify conditions on $t$ for which Eq. 8 holds, we give suitable bounds on both sides of the inequality. In more detail:

1. In Lemma 4.5 we prove that $|\bm{e}^{(t)}(u)|\leqslant\lambda_{k+1}^{t}\sqrt{\Delta n},$ for every $u\in V$ , so that

[TABLE]

2. In Lemma 4.6 we prove that $\left|\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u)\right|>\lambda_{k}^{t}(1-\lambda_{2})\left|\bm{y}(u)\right|$ for every $u\in V$ and for every time $t<T_{2}$ , where $T_{2}\geqslant\frac{\lambda_{k}}{2(1-\lambda_{2})}$ ; note that the hypotheses on $1-\lambda_{2}$ imply $T_{2}=\Omega\left(\frac{\log n}{\log(\lambda_{k}/\lambda_{k+1})}\right)$ . Moreover, the assumptions of Lemma 4.6 are satisfied, since $\bm{y}(u)\neq 0$ and $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ .

Combining Lemma 4.5 and Lemma 4.6, we see that Eq. 8 holds whenever

[TABLE]

An easy calculation shows that this happens for all $t>T_{1}$ , where

[TABLE]

Note that $T_{1}=\mathcal{O}\left(\frac{\log n}{\log(\lambda_{k}/\lambda_{k+1})}\right)$ and, e.g., $T_{1}=\mathcal{O}(\log n)$ whenever $\frac{\lambda_{k}}{\lambda_{k+1}}=1+\Omega(1)$ .

We next show that, under the assumptions of the lemma, the window $[T_{1},T_{2}]$ is not empty and, actually, it has a width that depends on the magnitude of $\lambda_{2}$ and the ratio $\lambda_{k}/\lambda_{k+1}$ . To this purpose, we first observe that Cheeger’s inequality for weighted graphs (Theorem A.3) implies $1-\lambda_{2}\geqslant\frac{h_{G}^{2}}{2}\geqslant\frac{1}{2(\Delta n)^{2}}$ (recall the footnote in Remark 1). Moreover, recalling that we are assuming $\left|\bm{y}(u)\right|\geqslant\frac{1}{\Delta n}$ (a hypothesis that holds with high probability by Lemma 4.2), we have:

[TABLE]

where: in $(a)$ we used Cheeger’s inequality (Theorem A.3) in the way described above and our assumptions on $\left|\bm{y}(u)\right|$ , in $(b)$ we used the hypothesis on $1-\lambda_{2}$ , which implies $\log(\lambda_{k}/\lambda_{k+1})\geqslant\frac{7(1-\lambda_{2})\log(2\Delta n)}{\lambda_{k}}$ , and in $(c)$ the lower bound on $T_{2}$ given by Lemma 4.6.

From Lemma 4.6 we also know that $\mathrm{sgn}(\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u))=\mathrm{sgn}(\bm{y}(u))$ for every time $t<T_{2}$ ; therefore we conclude that

[TABLE]

for every node $u\in V$ and for every round $t\in[T_{1},T_{2}]$ of the Averaging dynamics. ∎

Proof of Theorem 4.1.

The binary labeling of the nodes of $G$ produced by the Averaging dynamics during the time window $[T_{1},T_{2}]$ is such that the two conditions required by the definition of $(\varepsilon,\delta)$ -community-sensitive algorithm (Definition 2.1) are met, with $\varepsilon=\mathcal{O}(n^{-1/2})$ and $\delta=1-\Omega(1)$ . Indeed, the first condition follows directly from Lemma 4.4 together with the fact that $\bm{y}$ is a “stepwise” vector, while Lemma 4.2 implies that $\varepsilon=\mathcal{O}(n^{-\frac{1}{2}})$ , since $\bm{y}(u)$ is not too small with probability at least $1-\mathcal{O}(n^{-1/2})$ . The second condition, instead, follows directly from the combination of Lemmas 4.4 and 4.3. ∎

Remark 3 (Equal-sized communities).

If $\lambda_{k}=\lambda_{2}$ , then an alternative version of Lemma 4.6 would tell us that, for every node $u\in V$ , $\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u)=\lambda_{k}^{t}(1-\lambda_{2})\bm{y}(u)$ and thus $\mathrm{sgn}(\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u))=\mathrm{sgn}(\bm{y}(u))$ , in every round (with no need of $T_{2}$ ); this would imply an infinite time window starting at the first round $t>T_{1}$ (where the “error contribution” becomes small). In this sense our result also covers the case of multiple communities analyzed in [BCN+17], with $k$ equal-sized communities in an unweighted graph and then $\lambda_{k}=\lambda_{2}$ .

Remark 4 (Two communities).

Our result also generalizes that of [BCN+17] in the simpler case of two communities. In fact, we do not require the graph to be regular, but only volume-regular, thus taking into account communities that are potentially unbalanced. Indeed, for $k=2$ , the Averaging dynamics truncated at round $t$ is a $(\mathcal{O}(n^{-\frac{1}{2}}),\,\mathcal{O}(n^{-\frac{1}{2}}))$ -community-sensitive algorithm for every round $t>T_{1}$ , with $T_{1}=\mathcal{O}\left(\frac{\log n}{\log(\lambda_{2}/\lambda_{3})}\right)$ . Therefore, a single run of the dynamics highlights the community structure, i.e., the sign of the difference $\bm{x}^{(t)}-\bm{x}^{(t+1)}$ is equal for nodes in the same community and different for nodes in different communities, w.h.p.
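The two-community case can be illustrated with a short simulation. The sketch below makes several assumptions that are not taken from the paper: the graph is two cliques of eight nodes joined by light cross edges, the label of a node is $1$ when $\bm{x}^{(t)}(u)>\bm{x}^{(t+1)}(u)$ and $0$ otherwise, and the initial $\pm 1$ vector is fixed by hand (rather than drawn at random) so that the run is reproducible.

```python
def two_clique_weights(size, cross_weight):
    """Two cliques of `size` nodes with unit internal weights, joined by a
    complete bipartite graph whose edges all have weight `cross_weight`."""
    n = 2 * size
    W = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                same_side = (u < size) == (v < size)
                W[u][v] = 1.0 if same_side else cross_weight
    return W

def averaging_step(W, x):
    """x^(t+1) = P x^(t): every node takes the weighted average of its neighbours."""
    return [sum(w * xv for w, xv in zip(row, x)) / sum(row) for row in W]

def labels_at_round(W, x0, t):
    """Run t rounds, then label each node by the sign of x^(t) - x^(t+1)."""
    x = list(x0)
    for _ in range(t):
        x = averaging_step(W, x)
    x_next = averaging_step(W, x)
    return [1 if a > b else 0 for a, b in zip(x, x_next)]

W = two_clique_weights(8, 0.05)
# Hand-picked +/-1 start: the first clique sums to +4, the second to -4.
x0 = [1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1]
labels = labels_at_round(W, x0, 20)
print(labels)
```

After twenty rounds the first clique is labeled uniformly with one value and the second clique with the other; with a random start the same happens unless the two cliques begin with (nearly) the same weighted sum.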

4.1 Proofs for Lemma 4.4

In this section we prove the two lemmas used in the proof of Lemma 4.4: the upper bound on the “error contribution” and the lower bound on the “core contribution.”

Lemma 4.5 (Upper bound on the error contribution).

Let $\bm{e}^{(t)}:=\sum_{i=k+1}^{n}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}$ . For every $u\in V$ , it holds that

[TABLE]

Proof.

To bound all components of vector $\bm{e}^{(t)}$ we use its $\ell^{\infty}$ norm, defined for any vector $\bm{x}$ as $\|\bm{x}\|_{\infty}:=\sup_{i}|\bm{x}(i)|$ . In particular

[TABLE]

By using the Cauchy-Schwarz inequality (Theorem A.2) and applying the definition of spectral norm of an operator, i.e., $\|A\|:=\sup_{\bm{x}:\|\bm{x}\|=1}\|A\bm{x}\|$ , we get that

[TABLE]

since the $\bm{w}_{i}$ s are orthonormal. With some additional simple bounds it follows that

[TABLE]

By using the fact that the spectral norm of a diagonal matrix equals the largest absolute value of its diagonal entries, we conclude that

[TABLE]

Thus, for every $u\in V$ it holds that $|\bm{e}^{(t)}(u)|\leqslant\sqrt{\|\bm{e}^{(t)}\|_{\infty}^{2}}\leqslant\lambda_{k+1}^{t}\sqrt{\Delta n}.$ ∎

In Lemma 4.6 we show that the difference of the core contribution in consecutive rounds can be approximated, for our purposes in Lemma 4.4, with $\bm{y}$ .

Lemma 4.6 (Lower bound on the core contribution).

Let $\bm{c}^{(t)}:=\sum_{i=2}^{k}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}$ and let $\bm{y}(u)\neq 0$ for every $u\in V$ . If $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ , then, for every $u\in V$ and for every $t\leqslant T_{2}$ , with $T_{2}\geqslant\frac{\lambda_{k}}{2(1-\lambda_{2})}$ , the following holds:

•

$\mathrm{sgn}(\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u))=\mathrm{sgn}(\bm{y}(u))$ ;

•

$\left|\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u)\right|>\lambda_{k}^{t}(1-\lambda_{2})\left|\bm{y}(u)\right|$ .

Proof.

Let us define $d_{i,j}:=\lambda_{i}-\lambda_{j}$ . Note that

[TABLE]

where in the last equality we applied Lemma 2.4 to get $\sum_{i=2}^{k}\alpha_{i}\bm{v}_{i}=\bm{y}$ , and where we defined $c_{i}:=\lambda_{k}^{t}d_{2,i}+[(\lambda_{k}+d_{i,k})^{t}-\lambda_{k}^{t}](1-\lambda_{2}+d_{2,i})$ . Using the definition of $d_{i,j}$ , we get

[TABLE]

Note that $\min_{i}[\lambda_{i}^{t}(1-\lambda_{i})]-\lambda_{k}^{t}(1-\lambda_{2})\leqslant c_{i}\leqslant\max_{i}[\lambda_{i}^{t}(1-\lambda_{i})]-\lambda_{k}^{t}(1-\lambda_{2})$ , for every $i\in\{2,\dots,k\}$ . Since the minimum and the maximum are obtained for $i=k$ and $i=2$ respectively, we have $\lambda_{k}^{t}(\lambda_{2}-\lambda_{k})\leqslant c_{i}\leqslant(\lambda_{2}^{t}-\lambda_{k}^{t})(1-\lambda_{2})$ . Let us call the positive and negative terms of $\bm{y}(u)$ as

[TABLE]

Therefore, for each $u\in V$ , it holds that

[TABLE]

In the following we look for a time $T_{2}$ such that, for every $t\leqslant T_{2}$ it holds that

[TABLE]

Note that $\bm{y}(u)=\bm{y}^{+}(u)-\bm{y}^{-}(u)$ . We consider two cases: $\bm{y}(u)>0$ and $\bm{y}(u)<0$ .

Case $\bm{y}(u)>0$ : We look for a time $T_{2}$ such that, for every time $t\leqslant T_{2}$ , it holds that

[TABLE]

Indeed, since $\bm{y}(u)>0$ , we can use $\bm{y}^{+}(u)>\bm{y}^{-}(u)$ to upper bound the right hand side of the previous equation, so that Eq. 13 holds for every $t$ that satisfies:

[TABLE]

i.e., for every $t\leqslant T_{2}$ , where

[TABLE]

Next, note that $\frac{\lambda_{2}-\lambda_{k}}{1-\lambda_{2}}\leqslant\frac{5}{2}$ whenever $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ . Hence, under the hypotheses of the lemma, we can use that $1+x\geqslant e^{\frac{x}{2}}$ for every $x\in[0,\frac{5}{2}]$ and $1+x\leqslant e^{x}$ for every $x$ . Thus:

[TABLE]
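The two elementary facts used in this step can be checked numerically. The following Python sketch is not part of the original analysis: the grid resolution, the function names, and the sample eigenvalue pairs are arbitrary illustrative choices. It verifies that $1+x\geqslant e^{x/2}$ on a fine grid of $[0,\frac{5}{2}]$ and that, on a few sample pairs, the hypothesis $\lambda_{k}\geqslant\frac{7\lambda_{2}-5}{2}$ agrees with the condition $\frac{\lambda_{2}-\lambda_{k}}{1-\lambda_{2}}\leqslant\frac{5}{2}$.

```python
import math

def lower_exp_bound_holds(x):
    """Return True if 1 + x >= exp(x / 2) at the point x."""
    return 1.0 + x >= math.exp(x / 2.0)

def ratio_within_range(lam2, lamk):
    """Return True if (lam2 - lamk) / (1 - lam2) <= 5/2, assuming lam2 < 1."""
    return (lam2 - lamk) / (1.0 - lam2) <= 2.5

def hypothesis_holds(lam2, lamk):
    """Return True if lamk >= (7 * lam2 - 5) / 2, the hypothesis of the lemma."""
    return lamk >= (7.0 * lam2 - 5.0) / 2.0

# Grid check of 1 + x >= exp(x / 2) on [0, 5/2] (10001 points, arbitrary resolution).
grid = [2.5 * i / 10000 for i in range(10001)]
all_hold = all(lower_exp_bound_holds(x) for x in grid)

# Hypothetical eigenvalue pairs (lam2, lamk) with lamk <= lam2 < 1.
pairs = [(0.9, 0.8), (0.9, 0.6), (0.5, 0.1), (0.99, 0.97), (0.99, 0.9)]
agree = all(ratio_within_range(a, b) == hypothesis_holds(a, b) for a, b in pairs)
```

The interval $[0,\frac{5}{2}]$ is close to the largest on which the lower bound can be used: the inequality $1+x\geqslant e^{x/2}$ already fails at $x=2.6$.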

Plugging Eq. 13 into Eq. 11 we finally get

[TABLE]

Case $\bm{y}(u)<0$ : Proceeding along the same lines we obtain:

[TABLE]

for every $t\leqslant T_{2}$ . Therefore, by combining Eq. 15 and Eq. 12 we obtain

[TABLE]

Finally, by combining Eq. 14 and Eq. 16:

• $\mathrm{sgn}(\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u))=\mathrm{sgn}(\bm{y}(u))$;

• $\left|\bm{c}^{(t)}(u)-\bm{c}^{(t+1)}(u)\right|>\lambda_{k}^{t}(1-\lambda_{2})\left|\bm{y}(u)\right|$. ∎

5 Bipartite graphs

Assume $G=(V,E,w)$ is an edge-weighted bipartite graph with $V=V_{1}\cup V_{2}$ and $E\subseteq V_{1}\times V_{2}$, i.e., a graph whose hidden partition is the bipartition itself. In this case, basic properties of random walks imply that the Averaging dynamics does not converge to the global (weighted) average of the values, but oscillates periodically. In fact, in this case the transition matrix $P$ has an eigenvector $\bm{\chi}=\bm{1}_{V_{1}}-\bm{1}_{V_{2}}$ with eigenvalue $\lambda_{n}=-1$ (as implied by Lemma 5.1). Thus, the state vector is mainly affected by the eigenvectors associated to the two eigenvalues of absolute value $1$ (i.e., $\lambda_{1}$ and $\lambda_{n}$). After a number of rounds of the dynamics that depends on $1/\lambda_{2}$, we have that, in even rounds, all nodes in $V_{i}$ ($i=1,2$) have a state that is close to some local average $\mu_{i}$; in odd rounds, these values are swapped (as shown in Eq. 17).

If one were observing the process only in even rounds (or, equivalently, only in odd rounds), however, the states of nodes in $V_{1}$ would converge to $\mu_{1}$ and those of nodes in $V_{2}$ would converge to $\mu_{2}$. Unfortunately, convergence to the local average for nodes belonging to the same community does not eventually become monotone (i.e., increasing or decreasing). This follows since the eigenvector associated to $\lambda_{2}$ is no longer stepwise in general. However, we can easily modify the labeling scheme of the Averaging dynamics to perform bipartiteness detection as follows: nodes apply the labeling rule every two time steps, comparing the states of two consecutive rounds, i.e., each node $v\in V$ sets $\texttt{label}^{(2t)}(v)=1$ if $\bm{x}^{(2t)}(v)\geqslant\bm{x}^{(2t-1)}(v)$ and $\texttt{label}^{(2t)}(v)=0$ otherwise. We call this new protocol Averaging Bipartite dynamics.
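The labeling rule just described can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `averaging_step` and `averaging_bipartite_labels`, the adjacency representation (a list of `{neighbor: weight}` dictionaries), the unweighted 6-cycle, and the fixed initial vector are all assumptions made for illustration.

```python
def averaging_step(adj, x):
    """One round of the Averaging dynamics: every node replaces its value with
    the weighted average of its neighbors' values. adj[u] is {v: w(u, v)}."""
    return [
        sum(w * x[v] for v, w in adj[u].items()) / sum(adj[u].values())
        for u in range(len(adj))
    ]

def averaging_bipartite_labels(adj, x0, t):
    """Labels at round 2t: label(v) = 1 if x^(2t)(v) >= x^(2t-1)(v), else 0."""
    if t < 1:
        raise ValueError("t must be at least 1")
    x = list(x0)
    for _ in range(2 * t - 1):       # advance to round 2t - 1
        x = averaging_step(adj, x)
    x_next = averaging_step(adj, x)  # state at round 2t
    return [1 if b >= a else 0 for a, b in zip(x, x_next)]

# Toy instance (an assumption for illustration): the unweighted 6-cycle,
# which is bipartite with V1 = {0, 2, 4} and V2 = {1, 3, 5}.
n = 6
cycle = [{(u - 1) % n: 1.0, (u + 1) % n: 1.0} for u in range(n)]
x0 = [1, 1, 1, -1, 1, -1]            # fixed +/-1 initial values
labels = averaging_bipartite_labels(cycle, x0, t=10)
```

On this instance the labels alternate around the cycle, so nodes on the same side of the bipartition share a label. The initial vector is fixed rather than random so that the run is reproducible; with random signs a run can fail when the two sides have equal weighted sums.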

We now show how Averaging Bipartite dynamics can perform bipartiteness detection. Recall that we denote with $W\in\mathbb{R}^{n\times n}$ the weighted adjacency matrix of $G$ . Since $G$ is undirected and bipartite, the matrix $W$ can be written as

[TABLE]

Thus, the transition matrix of a random walk on $G$, i.e., $P=D^{-1}W$ where $D^{-1}$ is a diagonal matrix with $D^{-1}_{ii}=\frac{1}{\delta(i)}$, has the form

[TABLE]

Lemma 5.1 shows that the spectrum of $P$ is symmetric and it gives a relation between the eigenvectors of symmetric eigenvalues.

Lemma 5.1.

Let $G=(V_{1}\cup V_{2},E,w)$ be an edge-weighted undirected bipartite graph with bipartition $(V_{1},V_{2})$ and such that $|V_{i}|=n_{i}$. If $\bm{v}=(\bm{v}_{1},\bm{v}_{2})^{\intercal}$, with $\bm{v}_{i}\in\mathbb{R}^{n_{i}}$, is an eigenvector of $P$ with eigenvalue $\lambda$, then $\bm{v}^{\prime}=(\bm{v}_{1},-\bm{v}_{2})^{\intercal}$ is an eigenvector of $P$ with eigenvalue $-\lambda$.

Proof.

If $P\bm{v}=\lambda\bm{v}$ then we have that $P_{1}\bm{v}_{2}=\lambda\bm{v}_{1}$ and $P_{1}^{\intercal}\bm{v}_{1}=\lambda\bm{v}_{2}$. Using these two equalities we get that $P\bm{v}^{\prime}=-\lambda\bm{v}^{\prime}$. Indeed,

[TABLE]

The transition matrix $P$ is stochastic, thus the vector $\bm{1}$ (i.e., the vector of all ones) is an eigenvector associated to $\lambda_{1}=1$, which is the largest eigenvalue of $P$. Lemma 5.1 implies that $\bm{\chi}=\bm{1}_{V_{1}}-\bm{1}_{V_{2}}$ is an eigenvector of $P$ with eigenvalue $\lambda_{n}=-1$.
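As a sanity check of the last claim, the following Python sketch builds $P=D^{-1}W$ for a small weighted bipartite graph and applies it to $\bm{1}$ and to $\bm{\chi}$. The weights in `W1`, the helper names, and the convention that the nodes of $V_{1}$ are listed first are assumptions made for illustration only.

```python
def transition_matrix(W):
    """Row-normalize a weighted adjacency matrix: P = D^{-1} W."""
    return [[w / sum(row) for w in row] for row in W]

def mat_vec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical weighted bipartite graph with V1 = {0, 1} and V2 = {2, 3, 4};
# W1[i][j] is the weight of the edge between node i of V1 and node j of V2.
W1 = [[1.0, 2.0, 0.0],
      [3.0, 0.0, 1.0]]
n1, n2 = len(W1), len(W1[0])

# Assemble the symmetric adjacency matrix with zero diagonal blocks:
# rows of V1 first, then rows of V2 (the transpose of W1).
W = [[0.0] * n1 + row for row in W1]
W += [[W1[i][j] for i in range(n1)] + [0.0] * n2 for j in range(n2)]

P = transition_matrix(W)
ones = [1.0] * (n1 + n2)
chi = [1.0] * n1 + [-1.0] * n2

P_ones = mat_vec(P, ones)   # should reproduce ones   (eigenvalue  1)
P_chi = mat_vec(P, chi)     # should reproduce -chi   (eigenvalue -1)
```

Each row of $P$ in $V_{1}$ puts all its mass on $V_{2}$ and vice versa, which is why multiplying by $P$ flips the sign of $\bm{\chi}$ regardless of the specific weights.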

As in Section 2, we write the state vector at time $t$ using the spectral decomposition of $P$. Let $1=\lambda_{1}>\lambda_{2}\geqslant\ldots\geqslant\lambda_{n-1}>\lambda_{n}=-1$ be the eigenvalues of $P$. We denote by $\bm{1}=\bm{v}_{1},\bm{v}_{2},\ldots,\bm{v}_{n}=\bm{\chi}$ a family of $n$ linearly independent eigenvectors of $P$, where each $\bm{v}_{i}$ is the eigenvector associated to $\lambda_{i}$. Thus, we have that

[TABLE]

where $\alpha_{i}=\frac{\langle D^{\frac{1}{2}}\bm{x},D^{\frac{1}{2}}\bm{v}_{i}\rangle}{\|D^{\frac{1}{2}}\bm{v}_{i}\|^{2}}$ . The last equation implies that $\bm{x}^{(t)}=P^{t}\bm{x}$ does not converge to some value as $t$ tends to infinity, but oscillates. In particular, nodes in $V_{1}$ on even rounds and nodes in $V_{2}$ on odd rounds, converge to $\alpha_{1}+\alpha_{n}$ . Instead in the symmetric case, i.e., odd rounds for nodes in $V_{1}$ and even rounds for nodes in $V_{2}$ , the process converges to $\alpha_{1}-\alpha_{n}$ . These quantities are proportional to the weighted average of the initial values in the first and in the second partition, respectively.
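The oscillation between $\alpha_{1}+\alpha_{n}$ and $\alpha_{1}-\alpha_{n}$ can be observed on a small example. The sketch below is illustrative only: the weighted path on four nodes, the initial vector, the number of rounds, and the helper names `averaging_step` and `coefficient` are assumptions, not taken from the paper.

```python
def averaging_step(adj, x):
    """One round of the Averaging dynamics; adj[u] is {v: w(u, v)}."""
    return [
        sum(w * x[v] for v, w in adj[u].items()) / sum(adj[u].values())
        for u in range(len(adj))
    ]

def coefficient(adj, x, v):
    """alpha = <D^{1/2} x, D^{1/2} v> / ||D^{1/2} v||^2 for a vector v."""
    deg = [sum(nbrs.values()) for nbrs in adj]
    num = sum(d * xi * vi for d, xi, vi in zip(deg, x, v))
    den = sum(d * vi * vi for d, vi in zip(deg, v))
    return num / den

# Hypothetical weighted path 0 - 1 - 2 - 3 with weights 1, 2, 1;
# it is bipartite with V1 = {0, 2} and V2 = {1, 3}.
path = [{1: 1.0}, {0: 1.0, 2: 2.0}, {1: 2.0, 3: 1.0}, {2: 1.0}]
ones = [1.0, 1.0, 1.0, 1.0]
chi = [1.0, -1.0, 1.0, -1.0]
x0 = [1.0, -1.0, 1.0, 1.0]

alpha_1 = coefficient(path, x0, ones)
alpha_n = coefficient(path, x0, chi)

x = x0
for _ in range(40):                  # an even number of rounds
    x = averaging_step(path, x)
x_even = x                           # state at round 40
x_odd = averaging_step(path, x)      # state at round 41
```

Here `alpha_1 + alpha_n` coincides with the degree-weighted average of the initial values over $V_{1}$ and `alpha_1 - alpha_n` with the one over $V_{2}$; nodes of $V_{1}$ sit near the former in `x_even` and near the latter in `x_odd`, and the roles are reversed for $V_{2}$.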

Theorem 5.2, whose proof follows, shows that Averaging Bipartite dynamics performs bipartiteness detection in $\mathcal{O}(\log n\,/\,\log(1/\lambda_{2}))$ rounds. Note that, as in the case of volume-regular graphs with two communities (see Remark 4), one single run of the dynamics identifies the bipartition. Moreover, if $\log(1/\lambda_{2})=\Omega(1)$ , then the Averaging Bipartite dynamics takes logarithmic time to find the bipartition.

Theorem 5.2.

Let $G=(V,E,w)$ be an edge-weighted bipartite graph with bipartition $(V_{1},V_{2})$ and maximum weighted degree $\Delta=\mathcal{O}(n^{K})$, for an arbitrary positive constant $K$. Then for every time $t>T$, with $T=\mathcal{O}(\log n\,/\,\log(1/\lambda_{2}))$, the Averaging Bipartite dynamics truncated at round $t$ is a $(\mathcal{O}(n^{-\frac{1}{2}}),\,\mathcal{O}(n^{-\frac{1}{2}}))$-community-sensitive algorithm.

Proof.

We assume that the labeling rule is applied between every even and every odd round (in the converse case, the signs of the nodes in the analysis are simply swapped). Recall the definition of the error contribution, namely $\bm{e}^{(t)}(u)=\sum_{i=2}^{n-1}\lambda_{i}^{t}\alpha_{i}\bm{v}_{i}(u).$ We compute the difference between the state vectors of two consecutive steps by using Eq. 17, namely

[TABLE]

We want to find a time $T$ such that for every $t>T$ the sign of a node $u\in V$ depends only on $\bm{\chi}(u)$ , i.e., $\mathrm{sgn}(\bm{x}^{(2t)}(u)-\bm{x}^{(2t+1)}(u))=\mathrm{sgn}(\alpha_{n}\bm{\chi}(u))$ . Since $|\bm{\chi}(u)|=1$ , the last equation holds whenever

[TABLE]

We upper bound $|\bm{e}^{(2t)}(u)-\bm{e}^{(2t+1)}(u)|$ by using Lemma 4.5, getting that $|\bm{e}^{(2t)}(u)-\bm{e}^{(2t+1)}(u)|\leqslant 2\lambda_{2}^{2t}\sqrt{\Delta n}$ . Therefore, with some algebra we get that Eq. 18 holds in every round $t>T$ , where $T$ is defined as

[TABLE]
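The algebra behind the choice of $T$ can be sketched as follows. This is a reconstruction under stated assumptions, not the omitted display: it assumes that Eq. 18 is the condition $|\bm{e}^{(2t)}(u)-\bm{e}^{(2t+1)}(u)|<2|\alpha_{n}|$, that $0<\lambda_{2}<1$, and that $\alpha_{n}\neq 0$; the constants may differ from those used by the authors.

```latex
\begin{align*}
2\lambda_{2}^{2t}\sqrt{\Delta n} < 2|\alpha_{n}|
  &\iff \lambda_{2}^{2t} < \frac{|\alpha_{n}|}{\sqrt{\Delta n}} \\
  &\iff 2t\,\log\frac{1}{\lambda_{2}} > \log\frac{\sqrt{\Delta n}}{|\alpha_{n}|} \\
  &\iff t > \frac{\log\left(\sqrt{\Delta n}\,/\,|\alpha_{n}|\right)}{2\log(1/\lambda_{2})}.
\end{align*}
```

When $\Delta$ is polynomial in $n$ and $|\alpha_{n}|$ is at least inversely polynomial in $n$, the numerator is $\mathcal{O}(\log n)$, which is consistent with the bound $T=\mathcal{O}(\log n\,/\,\log(1/\lambda_{2}))$ stated in Theorem 5.2.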

To conclude the proof, we provide a lower bound on $|\alpha_{n}|$ showing that it is not too small, w.h.p. Recall that $\alpha_{i}=\frac{\langle D^{\frac{1}{2}}\bm{x},D^{\frac{1}{2}}\bm{v}_{i}\rangle}{\|D^{\frac{1}{2}}\bm{v}_{i}\|^{2}}$ and thus

[TABLE]

The lower bound then follows, with high probability. Indeed,

[TABLE]

where in $(a)$ we used Eq. 19 and in $(b)$ we applied Theorem A.5. The claim then follows from the above bound on $|\alpha_{n}|$ and from the hypothesis $\Delta=\mathcal{O}(n^{K})$, for an arbitrary positive constant $K$. ∎

6 Discussion and Outlook

The focus of this work is on heuristics that implicitly perform spectral graph clustering, without explicitly computing the main eigenvectors of a matrix describing connectivity properties of the underlying network (typically, its Laplacian or a related matrix). In this perspective, we extended the work of Becchetti et al. [BCN+17] in several ways. In particular, for $k$ communities, [BCN+17] considered an extremely regular case, in which the second eigenvalue of the (normalized) Laplacian has algebraic and geometric multiplicities $k-1$ and the corresponding eigenspace is spanned by a basis of indicator vectors. We considered a more general case in which the first $k$ eigenvalues are in general different, but the span of the corresponding eigenvectors again admits a basis of indicator vectors. We also made a connection between this stepwise property and lumpability properties of the underlying random walk, which results in a class of volume-regular graphs that may neither have constant degree nor exhibit balanced communities. We further showed that our approach naturally lends itself to addressing related, yet different problems, such as identifying bipartiteness. Finally, in the paragraphs that follow we discuss extensions to slightly more general classes than the ones considered in this work.

Other graph classes.

Consider $k$-volume-regular graphs whose $k$ stepwise eigenvectors are associated to the $k$ largest eigenvalues, in absolute value. These graphs include many $k$-partite graphs (e.g., regular ones) and graphs that are “close” to being $k$-partite (i.e., ones that would become $k$-partite upon removal of a few edges). Unlike in the clustered case (Theorem 4.1), some of the $k$ eigenvalues can in general be negative.

Consider the following variant of the labeling scheme of the Averaging dynamics, in which nodes apply their labeling rule only on even rounds, comparing their value with the one they held at the end of the last even round, i.e., each node $v\in V$ sets $\texttt{label}^{(2t)}(v)=1$ if $\bm{x}^{(2t)}(v)\geqslant\bm{x}^{(2t-2)}(v)$ and $\texttt{label}^{(2t)}(v)=0$ otherwise. Since the above protocol amounts to only taking even powers of eigenvalues, the analysis of this modified protocol proceeds along the same lines as the clustered case, while the results of Theorem 4.1 seamlessly extend to this class of graphs.
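A minimal Python sketch of this even-round labeling rule follows. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, and the complete tripartite graph $K_{2,2,2}$ is chosen only as a toy instance (its random walk has eigenvalues $1$, $0$ and $-\frac{1}{2}$, and the eigenvectors for $1$ and $-\frac{1}{2}$ are constant on each of the three parts).

```python
def averaging_step(adj, x):
    """One round of the Averaging dynamics; adj[u] is {v: w(u, v)}."""
    return [
        sum(w * x[v] for v, w in adj[u].items()) / sum(adj[u].values())
        for u in range(len(adj))
    ]

def even_round_labels(adj, x0, t):
    """Labels at round 2t: label(v) = 1 if x^(2t)(v) >= x^(2t-2)(v), else 0."""
    if t < 1:
        raise ValueError("t must be at least 1")
    x_prev = list(x0)
    for _ in range(2 * (t - 1)):     # advance to round 2t - 2
        x_prev = averaging_step(adj, x_prev)
    x_curr = averaging_step(adj, averaging_step(adj, x_prev))  # round 2t
    return [1 if c >= p else 0 for p, c in zip(x_prev, x_curr)]

# Toy instance (an assumption): complete tripartite graph K_{2,2,2}
# with parts {0, 1}, {2, 3}, {4, 5} and unit edge weights.
parts = [[0, 1], [2, 3], [4, 5]]
part_of = {v: i for i, part in enumerate(parts) for v in part}
k222 = [
    {v: 1.0 for v in range(6) if part_of[v] != part_of[u]}
    for u in range(6)
]
x0 = [1, 1, -1, -1, 1, 1]            # fixed +/-1 initial values
labels = even_round_labels(k222, x0, t=2)
```

In this run the nodes of each part receive the same label. A single binary label cannot separate three parts, so two of them coincide here; this is in line with the need for several independent runs when $k>2$.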

Outlook.

Though far from conclusive, we believe our results point to potentially interesting directions for future research. In general, our analysis sheds further light on the connections between temporal evolution of the power method and spectral-related clustering properties of the underlying network. At the same time, we showed that variants of the Averaging dynamics (and/or its labeling rule) might be useful in addressing different problems and/or other graph classes, as the examples given in Section 5 suggest. On the other hand, identifying $k$ hidden partitions using the algorithm presented in [BCN+17] requires relatively strong assumptions on the $k$ main eigenvalues and knowledge of an upper bound on the graph size (as anecdotal experimental evidence suggests, the presence of a time window to perform labeling is not an artifact of our analysis), while the analysis becomes considerably more intricate than the perfectly regular and completely balanced case addressed in [BCN+17]. Some aspects of our analysis (e.g., the aforementioned presence of a size-dependent time window in which the labeling rule has to be applied) suggest that more sophisticated variants of the Averaging dynamics might be needed to express the full power of a spectral method that explicitly computes the $k$ main eigenvectors of a graph-related matrix. While we believe this goal can be achieved, designing and analyzing such an algorithm might prove a challenging task.

Appendix

Appendix A Useful inequalities

Theorem A.1 (Extension of Chernoff Bounds [DP09]).

Let $X=\sum_{i=1}^{n}X_{i}$ where the $X_{i}$ are independently distributed random variables taking values in $\{0,1\}$ and let $\mu=\mathbf{E}\left[X\right]$. Suppose that $\mu_{L}\leqslant\mu\leqslant\mu_{H}$. Then, for $0<\delta<1$,

[TABLE]

Theorem A.2 (Cauchy-Schwarz’s inequality).

For all vectors $\bm{u},\bm{v}$ of an inner product space it holds that $|\langle\bm{u},\bm{v}\rangle|^{2}\leqslant\langle{\bm{u},\bm{u}}\rangle\cdot\langle{\bm{v},\bm{v}}\rangle,$ where $\langle{\cdot,\cdot}\rangle$ is the inner product.

Theorem A.3 (Cheeger’s inequality [Chu96]).

Let $P$ be the transition matrix of a connected edge-weighted graph $G=(V,E,w)$ and let $\lambda_{2}$ be its second largest eigenvalue. Let $|E(S,V\setminus S)|=\sum_{u\in S,\,v\in V\setminus S}w(u,v)$ and $h_{G}=\min_{S:\mathrm{vol}(S)\leqslant\frac{\mathrm{vol}(V)}{2}}\frac{|E(S,V\setminus S)|}{\mathrm{vol}(S)}.$ Then

[TABLE]

Theorem A.4 (Berry-Esseen’s theorem [Ber41]).

Let $X_{1},\ldots,X_{n}$ be independent random variables with mean $\mu_{i}=0$, variance $\sigma_{i}^{2}>0$, and third absolute moment $\rho_{i}<\infty$, for every $i=1,\ldots,n$. Let $S_{n}=\sum_{i=1}^{n}X_{i}$ and let $\sigma=\sqrt{\sum_{i=1}^{n}\sigma_{i}^{2}}$ be the standard deviation of $S_{n}$; let $F_{n}$ be the cumulative distribution function of $\frac{S_{n}}{\sigma}$; let $\Phi$ be the cumulative distribution function of the standard normal distribution. Then, there exists a positive constant $C$ such that, for all $x$ and for all $n$,

[TABLE]

where $\psi:=\max_{i\in\{1,\ldots,n\}}\frac{\rho_{i}}{\sigma_{i}^{2}}$ .

Theorem A.5 (Littlewood-Offord’s small ball [Erd45]).

Let $a_{1},\dots,a_{n}\in\mathbb{R}$ be real numbers with $|a_{i}|\geqslant 1$ for every $i=1,\dots,n$ and let $r\in\mathbb{R}$ be any real number. Let $\{X_{i}\,:\,i=1,\dots,n\}$ be a family of independent Rademacher random variables (taking values $\pm 1$ with probability $1/2$ ) and let $X$ be their sum weighted with the $a_{i}$ s, i.e., $X=\sum_{i=1}^{n}a_{i}X_{i}$ , then

[TABLE]
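For small $n$ the small-ball probability can be computed exactly by enumeration. The Python sketch below is illustrative only: the coefficients `a`, the grid of centers $r$, and the function name are arbitrary assumptions. It compares the count against Erdős's classical counting form of the result (at most $\binom{n}{\lfloor n/2\rfloor}$ sign patterns yield a sum in any open interval of length $2$), which is recalled here as background and is not necessarily the exact form of the omitted display.

```python
from itertools import product
from math import comb

def small_ball_count(a, r):
    """Number of sign patterns s in {-1, +1}^n with |sum_i a_i * s_i - r| < 1."""
    return sum(
        1
        for signs in product((-1, 1), repeat=len(a))
        if abs(sum(ai * si for ai, si in zip(a, signs)) - r) < 1
    )

# Hypothetical coefficients, all with absolute value at least 1.
a = [1.0, 1.5, -2.0, 1.0, 3.0, -1.2, 1.0, 2.5, 1.0, 1.1]
n = len(a)

# Largest count over a grid of centers r in [-15, 15] with step 1/4.
worst = max(small_ball_count(a, r / 4) for r in range(-60, 61))
erdos_bound = comb(n, n // 2)        # central binomial coefficient

# Extremal case: all coefficients equal to 1 and r = 0 attain the bound.
extremal = small_ball_count([1] * n, 0)
```

Dividing the counts by $2^{n}$ gives the corresponding probabilities; since $\binom{n}{\lfloor n/2\rfloor}/2^{n}$ is of order $1/\sqrt{n}$, the resulting bound decays as $1/\sqrt{n}$.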

Bibliography

[AAE08] Dana Angluin, James Aspnes, and David Eisenstat. A Simple Population Protocol for Fast Robust Approximate Majority. Distributed Computing, 21(2):87–102, 2008. (Preliminary version in DISC’07).
[ABH14] Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Trans. on Information Theory, 62(1):471–487, 2014.
[AS15] Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.
[BC09] Michael J. Barber and John W. Clark. Detecting network communities by propagating labels under constraints. Physical Review E, 80(2):026129, 2009.
[BCM+18] Luca Becchetti, Andrea E.F. Clementi, Pasin Manurangsi, Emanuele Natale, Francesco Pasquale, Prasad Raghavendra, and Luca Trevisan. Average whenever you meet: Opportunistic protocols for community detection. In 26th Annual European Symposium on Algorithms, ESA 2018, August 20-22, 2018, Helsinki, Finland, pages 7:1–7:13, 2018.
[BCN+17] Luca Becchetti, Andrea E.F. Clementi, Emanuele Natale, Francesco Pasquale, and Luca Trevisan. Find your place: Simple distributed algorithms for community detection. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 940–959, 2017.
[BCPR19] Luca Becchetti, Emilio Cruciani, Francesco Pasquale, and Sara Rizzo. Step-by-step community detection in volume-regular graphs. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
[Ber41] Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136, 1941.
