Randomized Gossiping with Effective Resistance Weights: Performance Guarantees and Applications
Bugra Can, Saeed Soori, Necdet Serhat Aybat, Maryam Mehri Dehnavi, Mert Gurbuzbalaban

TL;DR
This paper studies randomized gossiping with effective resistance (ER) weights, which can speed up distributed averaging and distributed optimization on graphs with bottleneck edges or clusters.
Contribution
It provides theoretical performance guarantees for ER-based gossip algorithms proposed in earlier work, showing shorter averaging times than uniform weights on barbell and c-barbell graphs, and applies ER gossiping within distributed optimization methods.
Findings
ER weights reduce averaging time compared to uniform weights
Numerical experiments confirm improved communication efficiency
ER gossiping enhances performance of distributed optimization algorithms
Abstract
The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced when a unit current is injected at one node and extracted from the other, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications, e.g., solving linear systems, Markov chains, and continuous-time averaging networks. We consider effective resistances (ER) in the context of designing randomized gossiping methods for the consensus problem, where the aim is to compute the average of node values in a distributed manner through iteratively computing weighted averages among randomly chosen neighbors. We show that employing ER weights improves the averaging time corresponding to the traditional choice of uniform weights; the amount of improvement depends on the network structure. We illustrate…
| Graph | Method | | |
|---|---|---|---|
| | ER | 2.9 | 81 |
| | FMMC | 1.28 | 65 |
| | ER | 8.4 | 198 |
| | FMMC | 3.93 | 130 |
| | ER | 2.6 | 433 |
| | FMMC | 6.4 | 251 |
| | ER | 7.9 | 566 |
| | FMMC | | 287 |

| Graph | Method | | |
|---|---|---|---|
| | ER | 6.4 | 41 |
| | FMMC | 41075.2 | 84 |
| | ER | 16.8 | 130 |
| | FMMC | | 143 |
| | ER | 19.20 | 315 |
| | FMMC | | 370 |
| | ER | 20.00 | 403 |
| | FMMC | | 512 |

| Graph Topology | ER-Kac | ER-Ex | Uniform | Metropolis |
|---|---|---|---|---|
| Small-world | 530 | 527 | 801 | 623 |
| | 1577 | 1525 | 8661 | 8341 |
| SBM(100,2,0.5,0.01) | 749 | 721 | 1213 | 914 |
| SBM(100,2,0.9,0.01) | 835 | 814 | 1459 | 1248 |
| SBM(120,3,0.9,0.01) | 1022 | 1003 | 2079 | 1541 |
| SBM(120,3,0.9,0.05) | 646 | 639 | 1185 | 674 |

| | | | | |
|---|---|---|---|---|
| 20 | 5.914 | 7.076 | 4.407 | 2.50 |
| 24 | 6.292 | 7.599 | 4.605 | 3.10 |
| 28 | 6.610 | 8.043 | 4.771 | 3.48 |
| 32 | 6.884 | 8.429 | 4.913 | 6.47 |
| 36 | 7.125 | 8.771 | 5.037 | 8.63 |
| 40 | 7.340 | 9.078 | 5.147 | 12.33 |
| 44 | 7.534 | 9.357 | 5.247 | 13.45 |
| 52 | 7.873 | 9.846 | 5.421 | 21.79 |
| 72 | 8.532 | 10.803 | 5.756 | 106.22 |

| n | | | | |
|---|---|---|---|---|
| 22 | 6.490 | 7.150 | 5.457 | 3.02 |
| 24 | 6.276 | 7.356 | 5.072 | 4.09 |
| 30 | 6.743 | 8.049 | 5.236 | 4.02 |
| 32 | 7.104 | 8.328 | 5.573 | 6.98 |
| 36 | 6.498 | 7.524 | 5.302 | 6.74 |
| 38 | 6.268 | 6.959 | 4.974 | 8.58 |
| 40 | 6.613 | 7.475 | 5.264 | 8.97 |
| 44 | 6.621 | 7.420 | 5.265 | 11.04 |
| 48 | 6.904 | 7.814 | 5.361 | 13.58 |
| 50 | 6.969 | 7.947 | 5.505 | 15.05 |
| 52 | 7.166 | 8.249 | 5.566 | 16.93 |
| 54 | 6.934 | 7.840 | 5.474 | 21.39 |

| | | | | |
|---|---|---|---|---|
| 10 | 0.802 | 1.735 | 0.865 | 1.848 |
| 16 | 0.818 | 1.756 | 0.883 | 1.873 |
| 18 | 0.822 | 1.761 | 0.887 | 1.878 |
| 20 | 0.825 | 1.765 | 0.891 | 1.882 |
| 22 | 0.828 | 1.769 | 0.893 | 1.886 |
| 28 | 0.834 | 1.778 | 0.900 | 1.894 |
| 30 | 0.836 | 1.780 | 0.901 | 1.896 |
| 36 | 0.841 | 1.786 | 0.905 | 1.900 |
| 38 | 0.842 | 1.788 | 0.906 | 1.902 |
| 44 | 0.845 | 1.793 | 0.909 | 1.905 |
| 46 | 0.846 | 1.794 | 0.910 | 1.906 |
| 48 | 0.847 | 1.795 | 0.911 | 1.907 |
| 50 | 0.848 | 1.797 | 0.912 | 1.908 |
| 100 | 0.862 | 1.815 | 0.923 | 1.920 |
| 500 | 0.886 | 1.847 | 0.939 | 1.938 |
| 1000 | 0.894 | 1.858 | 0.944 | 1.943 |
Randomized Gossiping with Effective Resistance Weights: Performance Guarantees and Applications
Bugra Can∗
Management Sciences and Information Systems
Rutgers Business School

Saeed Soori
Department of Computer Sciences
University of Toronto

Necdet Serhat Aybat
Industrial and Manufacturing Engineering Department
Penn State University

Maryam Mehri Dehnavi
Department of Computer Sciences
University of Toronto

Mert Gürbüzbalaban
Management Sciences and Information Systems
Rutgers Business School

∗Bugra Can and Mert Gürbüzbalaban acknowledge support from the Office of Naval Research Award Number N00014-21-1-2244 and from the National Science Foundation (NSF) grants CCF-1814888, DMS-2053485, and DMS-1723085.
Abstract
The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced when a unit current is injected at one node and extracted from the other, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications, e.g., solving linear systems, Markov chains, and continuous-time averaging networks. We consider effective resistances (ER) in the context of designing randomized gossiping methods for the consensus problem, where the aim is to compute the average of node values in a distributed manner through iteratively computing weighted averages among randomly chosen neighbours. For barbell graphs, we prove that choosing wake-up and communication probabilities proportional to ER weights improves the averaging time corresponding to the traditional choice of uniform weights. For $c$-barbell graphs, we show that ER weights admit lower and upper bounds on the averaging time that improve upon the lower and upper bounds available for uniform weights. Furthermore, for graphs with a small diameter, we show that ER weights can improve upon the existing bounds for Metropolis weights by a constant factor under some assumptions. We illustrate these results through numerical experiments where we showcase the efficiency of our approach on several graph topologies, including barbell graphs, small-world graphs, and stochastic block models. We also present an application of ER gossiping to distributed optimization: we numerically verify that using ER gossiping within the EXTRA and DPGA-W methods improves their practical performance in terms of communication efficiency.
Keywords: Distributed algorithms/control; networks of autonomous agents; optimization; randomized gossiping algorithms
1 Introduction
Let $\mathcal{G}=(\mathcal{N},\mathcal{E},w)$ be an undirected, weighted and connected graph defined by the set of nodes (agents) $\mathcal{N}=\{1,\dots,n\}$, the set of edges $\mathcal{E}\subseteq\mathcal{N}\times\mathcal{N}$, and the edge weights $w_{ij}>0$ for $(i,j)\in\mathcal{E}$. Since $\mathcal{G}$ is undirected, we assume that both $(i,j)$ and $(j,i)$ refer to the same edge when it exists, and for all $(i,j)\in\mathcal{E}$, we set $w_{ji}=w_{ij}$. Identifying the weighted graph $\mathcal{G}$ with an electrical network in which each edge $(i,j)$ corresponds to a branch of conductance $w_{ij}$, the effective resistance $R_{ij}$ between a pair of nodes $i$ and $j$ is defined as the voltage potential difference induced between them when a unit current is injected at $i$ and extracted at $j$. The effective resistance (ER), also known as the resistance distance, is a key quantity in many applications and algorithmic questions over graphs. It defines a metric on the graph and provides bounds on its conductance [1, 2]. Furthermore, it is closely associated with the hitting and commute times of a random walk¹ on the graph when the probability of a transition from $i$ to $j$ is $w_{ij}/\sum_{k\in\mathcal{N}_i}w_{ik}$, where $\mathcal{N}_i$ denotes the set of neighboring nodes of $i$; therefore, it arises naturally in the study of random walks over graphs and their mixing time properties [3, 4, 5], in spectral approximation of graphs [6], and in continuous-time averaging networks, including consensus problems in distributed optimization [3].

¹The hitting time from $i$ to $j$ is the expected number of steps of a random walk starting from $i$ until it first visits $j$. The commute time is the expected number of steps required to go from $i$ to $j$ and back again to $i$.
There exist centralized algorithms for computing or approximating effective resistances accurately, which require global communication beyond local information exchange among neighboring agents [7, 8, 9, 6, 10]. The references [7, 8, 9] develop key techniques for computing effective resistances explicitly on specific network types. In particular, [8] addresses a class of graphs that are underlying networks of some symmetric association schemes, whereas [7] considers two-dimensional resistor networks. The reference [9] provides an algorithm for calculating the resistance between two arbitrary nodes in a distance-regular network and also provides analytical formulas. The works [6, 10] are based on computing or approximating the entries of the pseudoinverse $L^{\dagger}$ of the graph Laplacian matrix $L$, based on the identity [6]
$$R_{ij}=(e_i-e_j)^\top L^{\dagger}(e_i-e_j),\qquad i,j\in\mathcal{N}.\tag{1}$$
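A small numerical illustration of the pseudoinverse identity for effective resistances: the sketch below computes all effective resistances of a small weighted graph from the Moore–Penrose pseudoinverse of its Laplacian (NumPy's `np.linalg.pinv`) and checks Foster's theorem, $\sum_{(i,j)\in\mathcal{E}}w_{ij}R_{ij}=n-1$. This is a centralized, dense $O(n^3)$ computation meant only for small examples. It is not the distributed D-RK method of [14], and the helper names `laplacian` and `effective_resistances` are hypothetical.

```python
import numpy as np


def laplacian(W):
    """Weighted Laplacian L = diag(W 1) - W of a symmetric weight matrix W with zero diagonal."""
    return np.diag(W.sum(axis=1)) - W


def effective_resistances(W):
    """Dense n x n matrix of effective resistances:
    R[i, j] = (e_i - e_j)^T L^+ (e_i - e_j) = L^+_ii + L^+_jj - 2 L^+_ij."""
    L_pinv = np.linalg.pinv(laplacian(W))
    diag = np.diag(L_pinv)
    return diag[:, None] + diag[None, :] - 2.0 * L_pinv


# Path graph 0 - 1 - 2 with unit conductances: resistors in series add up.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
R = effective_resistances(W)
print(R[0, 1], R[0, 2])     # approximately 1.0 and 2.0

# Foster's theorem: the sum of w_ij * R_ij over the edges equals n - 1.
foster_sum = np.sum(np.triu(W * R))
print(foster_sum)           # approximately 2.0 = n - 1
```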
However, such centralized algorithms are impractical or infeasible for several key applications in multi-agent systems; e.g., randomized gossiping algorithms for averaging node values across the whole network use only local communications between random neighbors (see [11, 12, 13]). This motivates distributed algorithms for computing effective resistances that rely only on information exchange among immediate neighbors. In these applications, communication among the agents is typically the bottleneck compared to the complexity of the agents' local computations; thus, it is crucial to develop distributed algorithms that are efficient in terms of the total number of communications required. To the best of the authors’ knowledge, the first attempt at computing effective resistances in a decentralized way, and also the first ER-based randomized gossiping algorithms, appeared in [14]. The latter algorithms are asynchronous gossiping algorithms in which each agent's wake-up and communication probabilities are chosen proportional to ER weights (see Section 2 for details). Aybat and Gürbüzbalaban have shown in [14] that effective resistance (ER) weights can be computed at each agent locally with an efficient distributed algorithm, Distributed Randomized Kaczmarz (D-RK). Our paper is motivated by the numerical evidence presented in [14] that using ER weights has the potential to improve the performance of randomized gossiping algorithms on specific graphs. Since [14] provided no rigorous performance guarantees for the use of ER weights, here we focus on establishing the missing theoretical results that explain the strong empirical behavior.
Contributions. First, we provide theoretical guarantees for the ER-based randomized gossiping algorithms proposed in [14] for the consensus problem, where the objective is to compute the average of node values over a network in a decentralized manner [12]. A standard approach for solving the consensus problem is randomized uniform gossiping, where each node keeps a local estimate of the average of node values and has an equal (uniform) probability of being activated to communicate with a randomly chosen neighbour to update its local estimate. However, this approach treats all edges uniformly and can be slow in practice. To overcome this problem, [14] proposed ER-based randomized gossiping algorithms, without theoretical guarantees, in which edges are activated with non-uniform probabilities proportional to their effective resistances.
Our theoretical results presented in Section 3 (see Results 1, 2, and 3) explain the superior empirical behaviour of ER-based gossiping over uniform gossiping observed in [14]. Briefly, we bound the time required to compute an inexact average using an analysis based on the conductance and spectral properties of the underlying weighted communication graph, and we compare the bounds obtained for the ER and uniform gossiping methods. We show that the averaging time with ER weights is shorter than that of uniform gossiping on a barbell graph $B_n$, where $n$ is the number of agents. Furthermore, we prove that for connected graphs with a small diameter, the averaging time with resistance weights can be shorter than the known performance bounds for gossiping with Metropolis weights by a constant factor (see Remark 12). We also provide numerical experiments on several graph topologies that illustrate the performance improvements obtainable with ER-based gossiping. In our experiments, the effective resistances are first computed with the normalized D-RK algorithm of [14] and then used for ER-based gossiping. Our theoretical and numerical results show that ER weights are especially useful in the presence of “bottleneck edges” or clusters that induce graph cuts with small conductance.
On a different note, Aybat and Gürbüzbalaban [14] introduced two alternative methods to compute ER weights in a decentralized manner, D-RK and normalized D-RK, both converging linearly. In our experiments in Section 5, we adopt normalized D-RK, having proven that its convergence rate is better than that of D-RK; this resolves a conjecture raised in [14] (see the Supplementary Material).
Second, we consider the consensus optimization problem, where agents connected over a network aim to collaboratively solve $\min_{x}\sum_{i\in\mathcal{N}}f_i(x)$, where $f_i$ is a cost function available only to agent $i$. This problem includes a number of key problems in supervised learning, such as distributed regression and logistic regression or, more generally, distributed empirical risk minimization [15, 16]. Consensus iterations are a building block of many state-of-the-art distributed consensus optimization algorithms, such as EXTRA and the distributed proximal gradient algorithm (DPGA-W) [17]. We show through numerical experiments that our framework based on effective resistances can improve the performance of EXTRA and DPGA-W for consensus optimization in terms of the total number of communications required. We believe our framework can also improve the communication efficiency of many other distributed algorithms, including distributed subgradient and ADMM methods; this will be the subject of future work.
Related work. For consensus problems, there are alternative methods for accelerating commonly used consensus protocols. The approach in [18] is a synchronous algorithm combining Metropolis weights with a momentum averaging scheme. Other approaches accelerate consensus methods using momentum averaging [19, 20, 21], min-sum splitting [22], and Chebyshev acceleration [23, 24, 25]. This paper is orthogonal to momentum averaging-based approaches in the sense that it can be used in combination with the aforementioned momentum-based acceleration schemes; we refer the reader to the Supplementary Material for details. There are also works that provide lower bounds on the distributed averaging time on a graph [12, 26, 27, 28]. In particular, it follows from these lower bounds that for the two-dimensional grid, even the best gossiping weights will not lead to accelerated performance compared to baseline approaches. Indeed, for special graphs such as the two-dimensional grid, the cycle graph, or the line graph, ER weights are similar to uniform weights due to the symmetries in the graph structure, and consequently ER weights will not improve performance compared to uniform weights. However, for graphs with asymmetries involving clusters or bottleneck edges along which the graph cut has low conductance, our numerical and theoretical results lead us to expect ER weights to improve performance.
Outline. In Section 2, we give a brief overview of randomized gossiping including uniform and ER-based gossiping methods. In Section 3, we state our main contributions. In Section 4, we provide detailed arguments establishing the main results stated in Section 3. In Section 5, we provide numerical experiments illustrating that using ER weights can improve the performance of EXTRA and DPGA-W algorithms for consensus optimization. In Section 6, we give some concluding remarks. Finally, we present some of the proofs and supporting results in Appendix A–B.
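Notation. Let $|\mathcal{S}|$ denote the cardinality of a set $\mathcal{S}$, let $\lfloor\cdot\rfloor$ denote the floor function, and let $\mathbb{Z}_+$ be the set of nonnegative integers. We define $d_i:=|\mathcal{N}_i|$ as the degree of node $i\in\mathcal{N}$. Throughout the paper, $L$ denotes the weighted Laplacian of $\mathcal{G}$, i.e., $L_{ii}=\sum_{j\in\mathcal{N}_i}w_{ij}$, $L_{ij}=-w_{ij}$ if $(i,j)\in\mathcal{E}$, and $L_{ij}=0$ otherwise. The diameter of a graph $\mathcal{G}$ is $\mathrm{diam}(\mathcal{G}):=\max_{i,j\in\mathcal{N}}\mathrm{dist}(i,j)$, where $\mathrm{dist}(i,j)$ is the length of the shortest path in the graph between nodes $i$ and $j$. The set $\mathbb{S}^n$ denotes the set of $n\times n$ real symmetric matrices. We use the notation $[a_i]_{i=1}^n$, where the $a_i$'s are either the columns or the rows of a matrix, depending on the context. $\mathbf{1}$ is the column vector with all entries equal to 1, and $I$ is the identity matrix. We let $\|x\|_p$ denote the $\ell_p$ norm of a vector $x$ for $p\ge 1$, and let $\|A\|_F$ denote the Frobenius norm of a matrix $A$. A square matrix is doubly stochastic if all of its entries are non-negative and all its rows and columns sum up to 1. We say that a square matrix $A$ is weakly diagonally dominant if its diagonal entries satisfy $|A_{ii}|\ge\sum_{j\ne i}|A_{ij}|$ for every $i$. Let $f$ and $g$ be real-valued functions defined over positive integers. We say $f(n)=O(g(n))$ if $f$ is bounded above by $g$ asymptotically, i.e., there exist constants $c>0$ and $n_0$ such that $f(n)\le c\,g(n)$ for all $n\ge n_0$. Similarly, we say $f(n)=\Omega(g(n))$ if there exist constants $c>0$ and $n_0$ such that $f(n)\ge c\,g(n)$ for every $n\ge n_0$; and we say $f(n)=\Theta(g(n))$ if $f(n)=O(g(n))$ and $f(n)=\Omega(g(n))$. Finally, $\log(x)$ denotes the natural logarithm of $x$, and $e_i$ is the $i$-th standard basis vector in $\mathbb{R}^n$ for $i\in\mathcal{N}$.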
2 Preliminaries
2.1 Randomized gossiping
Here we give an overview of randomized gossiping methods for the consensus problem. These methods can compute the average of node values over a network in an asynchronous and decentralized manner, for details see [12, 28].
Let $x^0\in\mathbb{R}^n$ be a vector whose $i$-th component $x^0_i$ represents the initial value at node $i$. The aim of randomized gossiping algorithms is to have each node compute the average $\bar{x}:=\frac{1}{n}\sum_{i\in\mathcal{N}}x^0_i$ in a decentralized manner through an iterative procedure. At every iteration $k$, each node $i$ possesses a local estimate $x^k_i$ of the average and communicates only with randomly selected neighbors to update its estimate. The setup is that each node $i$ has an exponential clock ticking with rate $\mu_i>0$, i.e., the times between consecutive ticks are exponentially distributed and independent of the other nodes' clocks. A node wakes up when its clock ticks. Since all the clocks are independent, if a node wakes up at time $t$, it is node $i$ with probability (w.p.) $p_i:=\mu_i/\sum_{\ell\in\mathcal{N}}\mu_\ell$. Given that node $i$ wakes up at time $t$, the conditional probability that it picks its neighbor $j\in\mathcal{N}_i$ to communicate with is $q_{ij}$, where the probabilities $q_{ij}\ge 0$ are design parameters satisfying $\sum_{j\in\mathcal{N}_i}q_{ij}=1$. When either $i$ wakes up and picks $j$, or vice versa, we say that the edge $(i,j)$ is activated. Once the edge $(i,j)$ is activated, nodes $i$ and $j$ exchange their local variables $x^k_i$ and $x^k_j$ at time $t$ and both compute the average $(x^k_i+x^k_j)/2$. This is illustrated in Algorithm 1 below, which admits an asynchronous implementation; see, e.g., [12].
Assuming *there are no self-loops*, for each $i,j\in\mathcal{N}$ let
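$$P_{ij}=\begin{cases}p_i\,q_{ij}, & \text{if } j\in\mathcal{N}_i,\\ 0, & \text{otherwise},\end{cases}\tag{2}$$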
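where $P_{ij}$ is the (unconditional) probability that the edge $(i,j)$ is activated by node $i$. By definition, $\sum_{i,j\in\mathcal{N}}P_{ij}=1$. Let $\mathcal{A}(P)$ denote an asynchronous gossiping algorithm characterized by a probability matrix $P$ as in (2) for some choice of probabilities $\{p_i\}_{i\in\mathcal{N}}$ and $\{q_{ij}\}_{j\in\mathcal{N}_i}$ for $i\in\mathcal{N}$. The performance of $\mathcal{A}(P)$ is typically measured by the $\epsilon$-averaging time, defined for any $\epsilon>0$ as: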
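$$T_{\mathrm{ave}}(\epsilon,P):=\sup_{x^0\ne 0}\,\inf\Big\{k\in\mathbb{Z}_+:\ \Pr\Big(\frac{\|x^k-\bar{x}\mathbf{1}\|_2}{\|x^0\|_2}\ge\epsilon\Big)\le\epsilon\Big\},\tag{3}$$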
see, e.g., [12]. Suppose $(i,j)$ is activated by node $i$ at iteration $k$; then we can write the update in Step 1 of Algorithm 1 as
$$x^{k+1}=W_{ij}\,x^{k},\qquad W_{ij}:=I-\frac{1}{2}(e_i-e_j)(e_i-e_j)^\top.$$
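To make the procedure concrete, here is a minimal simulation sketch of pairwise randomized gossip driven by a probability matrix $P$, where $P_{ij}$ is the unconditional probability that node $i$ wakes up and activates edge $(i,j)$. It works tick by tick rather than in continuous time. This relies on two standard facts: the superposition of the nodes' independent exponential clocks is a single Poisson process, and each tick belongs to a given node with probability proportional to its clock rate. The function name `simulate_gossip` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def simulate_gossip(x0, P, num_ticks, seed=0):
    """Simulate pairwise randomized gossip, one clock tick at a time.

    P[i, j] is the unconditional probability that node i wakes up and
    activates edge (i, j); entries are nonnegative and sum to one.
    On each tick an ordered pair (i, j) is drawn from P and both
    endpoints replace their local values with the pairwise average.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.size
    probs = P.ravel() / P.sum()                  # guard against rounding errors
    pairs = rng.choice(n * n, size=num_ticks, p=probs)
    for k in pairs:
        i, j = divmod(int(k), n)                 # row-major index -> (i, j)
        x[i] = x[j] = 0.5 * (x[i] + x[j])        # x^{k+1} = W_ij x^k
    return x


# Example: complete graph K_4 with uniform probabilities P_ij = 1 / (n (n - 1)).
n = 4
P = (np.ones((n, n)) - np.eye(n)) / (n * (n - 1))
x0 = np.array([1.0, 2.0, 3.0, 10.0])
x = simulate_gossip(x0, P, num_ticks=200)
print(x, x0.mean())    # every entry should be very close to the average 4.0
```

Each pairwise average preserves the sum of the node values, which is why the iterates can only converge to the true average.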
We also define
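$$\bar{W}:=\mathbb{E}\big[W_{ij}\big]=\sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{N}_i}P_{ij}\,W_{ij},\tag{4}$$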
which is the expected value of the random iteration matrix $W_{ij}$ with respect to the distribution $P$ defined over $i\in\mathcal{N}$ and $j\in\mathcal{N}_i$. The following theorem from [12] shows that the second largest eigenvalue of $\bar{W}$ determines the $\epsilon$-averaging time.
Theorem 1 ([12, Theorem 3]).
For a given $\epsilon>0$, the $\epsilon$-averaging time of $\mathcal{A}(P)$, with $\bar{W}$ the symmetric matrix defined in (4), satisfies
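$$\frac{0.5\,\log\epsilon^{-1}}{\log\lambda_2(\bar{W})^{-1}}\ \le\ T_{\mathrm{ave}}(\epsilon,P)\ \le\ \frac{3\,\log\epsilon^{-1}}{\log\lambda_2(\bar{W})^{-1}},$$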
where $\lambda_2(\bar{W})$ is the second largest eigenvalue of $\bar{W}$.
This result connects the convergence time of an asynchronous gossiping algorithm to the spectrum of the expected iteration matrix $\bar{W}$. It is therefore of interest to design $\bar{W}$ by carefully choosing the probabilities $p_i$ and $q_{ij}$ for $i\in\mathcal{N}$, $j\in\mathcal{N}_i$, in order to get the best performance, i.e., the smallest $\epsilon$-averaging time.
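The sketch below forms the expected iteration matrix $\bar W$ by summing $P_{ij}W_{ij}$ over all ordered pairs, with the pairwise-averaging matrix $W_{ij}=I-\frac{1}{2}(e_i-e_j)(e_i-e_j)^\top$, reads off its second largest eigenvalue, and evaluates both sides of Theorem 1. We assume the bounds take the form $0.5\log\epsilon^{-1}/\log\lambda_2(\bar W)^{-1}\le T_{\mathrm{ave}}(\epsilon,P)\le 3\log\epsilon^{-1}/\log\lambda_2(\bar W)^{-1}$ from [12, Theorem 3]; the constants should be checked against [12]. The function names are hypothetical.

```python
import numpy as np


def expected_iteration_matrix(P):
    """W_bar = sum_{i,j} P[i, j] * W_ij with W_ij = I - (e_i - e_j)(e_i - e_j)^T / 2."""
    n = P.shape[0]
    I = np.eye(n)
    W_bar = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if P[i, j] > 0:
                e = I[i] - I[j]
                W_bar += P[i, j] * (I - 0.5 * np.outer(e, e))
    return W_bar


def averaging_time_bounds(P, eps):
    """Lower and upper bounds on T_ave(eps, P), assuming the constants 0.5 and 3 of Theorem 1."""
    eigs = np.sort(np.linalg.eigvalsh(expected_iteration_matrix(P)))[::-1]
    lam2 = eigs[1]                          # second largest eigenvalue of W_bar
    rate = np.log(1.0 / lam2)
    return 0.5 * np.log(1.0 / eps) / rate, 3.0 * np.log(1.0 / eps) / rate


# Uniform gossip on K_4: W_bar = I - L_K4 / 12, whose second largest eigenvalue is 2/3.
n = 4
P = (np.ones((n, n)) - np.eye(n)) / (n * (n - 1))
lower, upper = averaging_time_bounds(P, eps=1e-3)
print(lower, upper)
```

Because $\bar W$ is a sum of symmetric matrices, `np.linalg.eigvalsh` (the symmetric eigensolver) is the appropriate routine here.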
In this paper, we consider two randomized gossiping algorithms, uniform gossiping and ER gossiping, which differ in how the probabilities $p_i$ and $q_{ij}$ for $i\in\mathcal{N}$, $j\in\mathcal{N}_i$ are selected. In particular, based on Theorem 1, we study the second largest eigenvalue of the expected iteration matrix $\bar{W}$ corresponding to these two algorithms and compare their $\epsilon$-averaging times.
2.2 Randomized uniform gossiping
In randomized uniform gossiping, each node wakes up with equal probability $p^u_i=1/n$, i.e., using uniform clock rates $\mu_i=\mu>0$ for $i\in\mathcal{N}$. The superscript $u$ stands for the uniform choice of clock rates. Then, node $i$ picks the edge $(i,j)$ with conditional probability $q^u_{ij}=1/d_i$ for $j\in\mathcal{N}_i$; thus,
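$$P^u_{ij}=\begin{cases}\dfrac{1}{n\,d_i}, & \text{if } j\in\mathcal{N}_i,\\ 0, & \text{otherwise},\end{cases}$$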
see, e.g., [29, 30]. One of the drawbacks of this approach is that it can be quite slow over graphs with a high bottleneck ratio [31], where, intuitively speaking, some “bottleneck edges” limit the spread of information over the underlying graph. A classical example of a graph with a high bottleneck ratio is the barbell graph. Barbell graphs are frequently studied in the consensus literature, as they constitute a worst-case example in terms of both the mixing properties of random walks [4, Section 5] and the performance of distributed averaging algorithms (see, e.g., [3, 32]).
Barbell graphs consist of two complete subgraphs connected by an edge (see Figure 1). Letting $K_m$ denote a complete graph with $m$ nodes, we denote a barbell graph with $n$ nodes by $B_n$; for even $n$, it consists of two copies of $K_{n/2}$ joined by a single edge. Let $(i^*,j^*)$ be the edge that connects the two complete subgraphs, which we refer to as the bottleneck edge. This is the only edge that allows node values to propagate between the two complete subgraphs; therefore, how frequently it is sampled is a key factor that determines the averaging time.
The probability of sampling the bottleneck edge $(i^*,j^*)$ with uniform weights can be computed explicitly:
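$$P^u_{i^*j^*}+P^u_{j^*i^*}=\frac{1}{n\,d_{i^*}}+\frac{1}{n\,d_{j^*}}=\frac{4}{n^2}\qquad\text{since } d_{i^*}=d_{j^*}=n/2.\tag{5}$$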
This implies that it takes $\Theta(n^2)$ iterations in expectation to activate this edge, which is the underlying reason why the randomized uniform gossiping iterates converge slowly on the barbell graph when $n$ is large. The effect of bottleneck edges on the performance of gossiping algorithms has recently been studied experimentally by Aybat and Gürbüzbalaban [14] on different topologies, including barbell and small-world graphs. The authors proposed ER gossiping, where edges are sampled with non-uniform probabilities proportional to effective resistances, and the numerical experiments in [14] showed that this can lead to significant performance improvements over graphs with bottleneck edges, such as barbell graphs. We next describe this method.
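Under the reading that $B_n$ consists of two copies of $K_{n/2}$, so that both endpoints of the bottleneck edge have degree $n/2$, uniform gossiping activates the bottleneck edge with probability $\frac{1}{n}\cdot\frac{2}{n}+\frac{1}{n}\cdot\frac{2}{n}=\frac{4}{n^2}$. The following sketch builds the barbell graph, forms $P^u_{ij}=1/(n d_i)$, and confirms this value numerically. The node labelling (bottleneck edge between nodes $n/2-1$ and $n/2$) and the helper names are our own choices.

```python
import numpy as np


def barbell(n):
    """Unweighted barbell graph B_n: two copies of K_{n/2} joined by one edge.
    Nodes 0..n/2-1 form the first clique; (n/2 - 1, n/2) is the bottleneck edge."""
    assert n % 2 == 0 and n >= 4
    m = n // 2
    A = np.zeros((n, n))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A


def uniform_probabilities(A):
    """P^u[i, j] = 1 / (n * d_i) for every neighbor j of i, and 0 otherwise."""
    n = A.shape[0]
    degrees = A.sum(axis=1)
    return A / (n * degrees[:, None])


n = 20
P_u = uniform_probabilities(barbell(n))
i_star, j_star = n // 2 - 1, n // 2
p_bottleneck = P_u[i_star, j_star] + P_u[j_star, i_star]
print(p_bottleneck, 4 / n**2)    # both equal 0.01 up to rounding
```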
2.3 Effective-resistance (ER) gossiping
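In ER gossiping, each node $i$ wakes up with probability $p^{ER}_i=\frac{\sum_{j\in\mathcal{N}_i}w_{ij}R_{ij}}{2\sum_{(\ell,k)\in\mathcal{E}}w_{\ell k}R_{\ell k}}$, i.e., setting clock rate $\mu_i=p^{ER}_i$ for $i\in\mathcal{N}$, and node $i$ picks $j$ with conditional probability $q^{ER}_{ij}=\frac{w_{ij}R_{ij}}{\sum_{\ell\in\mathcal{N}_i}w_{i\ell}R_{i\ell}}$ for all $j\in\mathcal{N}_i$; thus, ER gossiping corresponds to the unconditional probabilities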
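$$P^{ER}_{ij}=p^{ER}_i\,q^{ER}_{ij}=\frac{w_{ij}R_{ij}}{2\sum_{(\ell,k)\in\mathcal{E}}w_{\ell k}R_{\ell k}}=\frac{w_{ij}R_{ij}}{2(n-1)}$$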
for all $i\in\mathcal{N}$ and $j\in\mathcal{N}_i$, where the third equality follows from Foster’s Theorem, which states that $\sum_{(i,j)\in\mathcal{E}}w_{ij}R_{ij}=n-1$; see, e.g., [33]. This choice of sampling probabilities can lead to bottleneck edges being sampled more frequently. We illustrate this fact on the barbell graph $B_n$: the unconditional probability of sampling the bottleneck edge $(i^*,j^*)$ is given explicitly as
$$P^{ER}_{i^*j^*}+P^{ER}_{j^*i^*}=\frac{2\,w_{i^*j^*}R_{i^*j^*}}{2(n-1)}=\frac{1}{n-1},\tag{6}$$
where $w_{i^*j^*}=1$ and we used the fact that $R_{i^*j^*}=1$ (see the proof of Lemma 16 for the derivation of (6)). Hence, comparing (5) and (6), we see that ER weights sample the bottleneck edge more frequently than uniform gossiping on $B_n$, by a factor of $\Theta(n)$. Intuitively speaking, this is why ER gossiping can be efficient on barbell graphs. Numerical experiments in [14] support this intuition: ER gossiping outperforms uniform gossiping over an unweighted barbell graph as well as over small-world graphs, which are random graphs that arise frequently in real-world applications such as social networks.
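Similarly, the sketch below forms the ER probabilities $P^{ER}_{ij}=w_{ij}R_{ij}/(2(n-1))$ on the same barbell graph (again assumed to be two copies of $K_{n/2}$ with unit weights), checks that they sum to one as Foster's theorem predicts, and compares the bottleneck-edge activation probability with the uniform value $4/n^2$. Because the bottleneck edge is a bridge with unit weight, its effective resistance is 1, so ER gossiping activates it with probability $1/(n-1)$. That is $n^2/(4(n-1))$ times the uniform value. Helper names are hypothetical, and the resistances are computed centrally with a dense pseudoinverse.

```python
import numpy as np


def barbell(n):
    """Unweighted barbell graph: two copies of K_{n/2}; bottleneck edge (n/2 - 1, n/2)."""
    m = n // 2
    A = np.zeros((n, n))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A


def er_probabilities(W):
    """P^ER[i, j] = w_ij * R_ij / (2 (n - 1)), with R from the Laplacian pseudoinverse."""
    n = W.shape[0]
    L_pinv = np.linalg.pinv(np.diag(W.sum(axis=1)) - W)
    diag = np.diag(L_pinv)
    R = diag[:, None] + diag[None, :] - 2.0 * L_pinv
    return W * R / (2.0 * (n - 1))        # zero wherever there is no edge


n = 20
A = barbell(n)
P_er = er_probabilities(A)
i_star, j_star = n // 2 - 1, n // 2
p_er = P_er[i_star, j_star] + P_er[j_star, i_star]
p_u = 2 * (1 / n) * (2 / n)               # uniform gossiping: both endpoints have degree n/2
print(P_er.sum())                         # 1 by Foster's theorem, up to rounding
print(p_er, 1 / (n - 1), p_er / p_u)      # ratio n^2 / (4 (n - 1)), i.e., Theta(n)
```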
Despite the empirical success of ER gossiping, theoretical results supporting its performance have been lacking in the literature. The purpose of this paper is to provide rigorous convergence guarantees for ER gossiping algorithms on certain network topologies (see Section 3 for the statements of our main results and Section 4 for the proofs), and to present further numerical evidence that ER gossiping, beyond distributed averaging, can also improve the practical performance of distributed methods for consensus optimization (Section 5). Specifically, in our analysis we consider connected graphs characterized by their diameter $\mathrm{diam}(\mathcal{G})$, barbell graphs, and $c$-barbell graphs, which generalize barbell graphs. More specifically, a $c$-barbell graph with $n$ nodes, for $c\ge 2$, is a path of $c$ equal-sized complete graphs $K_{n/c}$ [34]; e.g., see Figure 2. In the special case $c=2$, a $c$-barbell graph is equivalent to the barbell graph. We show that for these graphs, ER gossiping has provably better convergence properties than uniform gossiping in terms of $\epsilon$-averaging times. Precise results are stated in the next section.
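A $c$-barbell graph can be generated in a few lines. Reference [34] is summarized here only as "a path of equal-sized complete graphs", so which nodes the connecting edges attach to is our assumption. In the sketch below, clique $k$ is linked to clique $k+1$ by a single edge between the last node of the former and the first node of the latter, and `c_barbell` is a hypothetical helper. With $c=2$, this reduces to the barbell graph $B_n$.

```python
import numpy as np


def c_barbell(c, m):
    """c copies of K_m arranged in a path (n = c * m nodes). Assumption: clique k is
    linked to clique k+1 by the single edge (k*m + m - 1, (k+1)*m)."""
    n = c * m
    A = np.zeros((n, n))
    for k in range(c):
        s = k * m
        A[s:s + m, s:s + m] = 1.0           # complete graph on nodes s .. s+m-1
    np.fill_diagonal(A, 0.0)
    for k in range(c - 1):
        u, v = k * m + m - 1, (k + 1) * m   # bridge between consecutive cliques
        A[u, v] = A[v, u] = 1.0
    return A


A = c_barbell(c=3, m=5)
num_edges = int(A.sum()) // 2
print(num_edges)                  # 3 * (5 * 4 / 2) + 2 = 32

# Connectivity check: the second smallest Laplacian eigenvalue is positive.
lap_eigs = np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)
print(lap_eigs[1] > 1e-9)         # True
```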
3 Main Results
In this section, we state our main theoretical results: we provide performance bounds for ER gossiping in terms of the $\epsilon$-averaging time $T_{\mathrm{ave}}(\epsilon,P^{ER})$. Our results highlight the performance improvements obtained with this approach.
Our first result concerns $c$-barbell graphs, for which we focus on the $\epsilon$-averaging times of the uniform and ER gossiping algorithms. To the best of our knowledge, no analytical formula for the second largest eigenvalue is available for $c$-barbell graphs; therefore, we estimate this eigenvalue using graph conductance techniques (see Section 4.1 for details), which leads to the following lower and upper bounds on the $\epsilon$-averaging times.
Result 1.
Given , and such that , asynchronous randomized gossiping algorithms and on a c-barbell graph with satisfy
[TABLE]
These bounds from Result 1 for the $c$-barbell graph show that, for any given precision $\epsilon$, using effective resistances improves both the upper and the lower bounds on the averaging times, by the factors obtained by comparing (7) and (8). In the Supplementary Material, we also compared the averaging times $T_{\mathrm{ave}}(\epsilon,P^u)$ and $T_{\mathrm{ave}}(\epsilon,P^{ER})$ numerically, by computing the second largest eigenvalues $\lambda_2(\bar{W}^u)$ and $\lambda_2(\bar{W}^{ER})$ and invoking Theorem 1. These numerical results are in line with Result 1, showing that effective resistances improve upon uniform weights in the sense that the averaging time with effective resistances scales better with the number of nodes $n$.
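A comparison in the spirit of the one described above can be sketched in a short script. The sketch assumes the $c$-barbell construction in which consecutive cliques are joined by single edges (our assumption about [34]). It uses the closed form $\bar W=I-\frac12 D+\frac12(P+P^\top)$ with $D_{ii}=\sum_j(P_{ij}+P_{ji})$, which follows from expanding (4), and it reports the spectral gaps $1-\lambda_2(\bar W)$ for uniform and ER probabilities. By Theorem 1, a larger gap means a shorter averaging time. This is an illustration, not a reproduction of the Supplementary Material experiments, and all helper names are hypothetical.

```python
import numpy as np


def c_barbell(c, m):
    """c copies of K_m in a path; clique k joined to clique k+1 by one edge (assumption)."""
    n = c * m
    A = np.zeros((n, n))
    for k in range(c):
        A[k * m:(k + 1) * m, k * m:(k + 1) * m] = 1.0
    np.fill_diagonal(A, 0.0)
    for k in range(c - 1):
        u, v = k * m + m - 1, (k + 1) * m
        A[u, v] = A[v, u] = 1.0
    return A


def uniform_P(A):
    n = A.shape[0]
    return A / (n * A.sum(axis=1)[:, None])          # P_ij = 1 / (n d_i)


def er_P(A):
    n = A.shape[0]
    L_pinv = np.linalg.pinv(np.diag(A.sum(axis=1)) - A)
    diag = np.diag(L_pinv)
    R = diag[:, None] + diag[None, :] - 2.0 * L_pinv
    return A * R / (2.0 * (n - 1))                    # P_ij = w_ij R_ij / (2 (n - 1))


def spectral_gap(P):
    """1 - lambda_2(W_bar), with W_bar = I - D/2 + (P + P^T)/2."""
    D = np.diag(P.sum(axis=1) + P.sum(axis=0))
    W_bar = np.eye(P.shape[0]) - 0.5 * D + 0.5 * (P + P.T)
    return 1.0 - np.linalg.eigvalsh(W_bar)[-2]        # eigvalsh sorts ascending


ratios = []
for m in (5, 10, 20):
    A = c_barbell(3, m)
    gap_u, gap_er = spectral_gap(uniform_P(A)), spectral_gap(er_P(A))
    ratios.append(gap_er / gap_u)
    print(f"n = {3 * m:3d}   gap uniform = {gap_u:.2e}   gap ER = {gap_er:.2e}   ratio = {gap_er / gap_u:.1f}")
```

For these sizes the ratio of gaps should exceed one and grow with the clique size, consistent with the discussion of bottleneck edges above.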
The next result shows that for barbell graphs (i.e., when $c=2$), ER gossiping is in fact faster by a factor of $\Theta(n)$. The proof idea is to compute the eigenvalues of $\bar{W}^u$ and $\bar{W}^{ER}$ explicitly by exploiting the symmetry group properties of barbell graphs, and to show that the lower bounds in (7)–(8) are attained for $c=2$.
Result 2.
Given and , let . The -averaging times of asynchronous gossiping algorithms and on barbell graph satisfy the equality:
[TABLE]
A natural question is whether it is possible to further improve the ER gossiping bounds for barbell graphs; in the next result, we show that this is not possible as long as the matrix $P$ is symmetric; thus, ER gossiping is optimal. Finally, we also obtain $\epsilon$-averaging bounds for a more general class of connected graphs depending on their diameters.
Result 3.
Given and , let . Among all the gossiping algorithms with a symmetric on the barbell graph, , randomized ER gossiping leads to , which is optimal with respect to and , and cannot be improved.
In a more general setting, let $\mathcal{G}$ be a connected graph with diameter $\mathrm{diam}(\mathcal{G})$. The $\epsilon$-averaging time of $\mathcal{A}(P^{ER})$ satisfies
[TABLE]
Remark 2.
The $\epsilon$-averaging time of randomized gossiping with lazy Metropolis weights² on any graph is ; while, for the barbell graph, Metropolis weights perform similarly to uniform weights: both require time which can be improved to by ER gossiping.

²For lazy Metropolis weights, see (15) and the paragraph after it.
Remark 3.
If the diameter , our bounds for ER gossiping improve upon that of the randomized gossiping with lazy Metropolis weights by a (small) constant factor (see Remark 12). Note for barbell graphs and is also reasonable for mid-size small-world graphs which are random graphs that arise frequently in real-world applications [35]. For instance, Cont et al. [35] show that the diameter of the randomized community-based small-world graphs admits upper bound almost surely; hence, for these graphs almost surely for . Indeed, we empirically observe that randomly generated small-world graphs with parameters and using the methodology described in the numerical experiments in Section 5.1 satisfy on average over independent and identically distributed (i.i.d.) samples.
4 Proofs of Main Results
In order for both the uniform and ER gossiping methods to have the same expected number of node wake-ups in a given time period, one should set $\mu_i=1/n$ for $i\in\mathcal{N}$ within the uniform gossiping model (recall that $\mu_i=p^{ER}_i$ for $i\in\mathcal{N}$ in ER gossiping); hence, the rates of both Poisson processes are the same, i.e., $\sum_{i\in\mathcal{N}}\mu_i=1$. We note that the number of clock ticks can easily be converted to absolute time with standard arguments (simply dividing $k$ by the total rate $\sum_{i\in\mathcal{N}}\mu_i$ gives the expected time of the $k$-th tick); see, e.g., [12, Lemma 1]. This allows us to use the number of iterations (clock ticks) to compare asynchronous algorithms.
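The conversion between clock ticks and absolute time rests on the superposition property of independent Poisson clocks. The sketch below simulates independent exponential clocks with hypothetical rates $\mu=(0.5,0.3,0.2)$, whose total is 1. It checks empirically that ticks arrive at rate $\sum_i\mu_i$ and that each tick belongs to node $i$ with frequency close to $\mu_i/\sum_\ell\mu_\ell$. The function name `simulate_clocks` is our own.

```python
import numpy as np


def simulate_clocks(mu, horizon, seed=0):
    """Run independent exponential clocks with rates mu[i] on [0, horizon).
    Returns the merged, time-sorted list of (tick_time, node) events."""
    rng = np.random.default_rng(seed)
    events = []
    for node, rate in enumerate(mu):
        t = rng.exponential(1.0 / rate)          # first tick of this node's clock
        while t < horizon:
            events.append((t, node))
            t += rng.exponential(1.0 / rate)     # i.i.d. exponential inter-tick times
    events.sort()
    return events


mu = np.array([0.5, 0.3, 0.2])                   # total rate sum(mu) = 1
horizon = 20000.0
events = simulate_clocks(mu, horizon)
owners = np.array([node for _, node in events])
print(len(events) / horizon)                     # close to sum(mu) = 1 tick per unit time
print(np.bincount(owners, minlength=mu.size) / len(events))   # close to mu / sum(mu)
```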
It can easily be verified that, for a given $P$, the expected iteration matrix $\bar{W}$ defined in (4) satisfies
$$\bar{W}=I-\frac{1}{2}D+\frac{P+P^\top}{2},$$
where $D$ is a diagonal matrix with $i$-th diagonal entry $D_{ii}=\sum_{j\in\mathcal{N}}(P_{ij}+P_{ji})$. Note that $W_{ij}$ defined in Section 2.1 is a doubly stochastic, non-negative and weakly diagonally dominant matrix for all $i\in\mathcal{N}$ and $j\in\mathcal{N}_i$; therefore, $\bar{W}$, which is a convex combination of such matrices, is also doubly stochastic, non-negative and weakly diagonally dominant. It then follows from Gershgorin’s Disc Theorem (see, e.g., [36]) that all the eigenvalues of $\bar{W}$ are non-negative. Moreover, since $\bar{W}$ is a non-negative doubly stochastic matrix, its largest eigenvalue is $\lambda_1(\bar{W})=1$. Plugging $P=P^u$ and $P=P^{ER}$ into this identity leads immediately to the following result.
Lemma 4.
The matrices and satisfy the following identities:
[TABLE]
where and are diagonal matrices satisfying , where .
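The identity $\bar W=I-\frac12 D+\frac12(P+P^\top)$, with $D_{ii}=\sum_j(P_{ij}+P_{ji})$, and the spectral facts used above (non-negative eigenvalues and $\lambda_1(\bar W)=1$) can be checked numerically. The sketch compares the closed form with the defining sum (4) for a random probability matrix without self-loops and inspects the spectrum. It is a sanity check under these assumptions, not a proof, and the function names are hypothetical.

```python
import numpy as np


def w_bar_from_sum(P):
    """Definition (4): sum over (i, j) of P[i, j] * (I - (e_i - e_j)(e_i - e_j)^T / 2)."""
    n = P.shape[0]
    I = np.eye(n)
    W_bar = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e = I[i] - I[j]                       # zero vector when i == j
            W_bar += P[i, j] * (I - 0.5 * np.outer(e, e))
    return W_bar


def w_bar_closed_form(P):
    """Closed form I - D/2 + (P + P^T)/2 with D_ii = sum_j (P_ij + P_ji)."""
    D = np.diag(P.sum(axis=1) + P.sum(axis=0))
    return np.eye(P.shape[0]) - 0.5 * D + 0.5 * (P + P.T)


# Random probability matrix with zero diagonal (no self-loops), entries summing to 1.
rng = np.random.default_rng(0)
n = 7
P = rng.random((n, n))
np.fill_diagonal(P, 0.0)
P /= P.sum()

W1, W2 = w_bar_from_sum(P), w_bar_closed_form(P)
eigs = np.linalg.eigvalsh(W2)
print(np.max(np.abs(W1 - W2)))                    # round-off level: the two forms agree
print(eigs.min() >= -1e-12, abs(eigs.max() - 1.0) < 1e-12)
```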
Recall the definition of $T_{\mathrm{ave}}(\epsilon,P)$ given in (3), i.e., the $\epsilon$-averaging time of an asynchronous gossiping algorithm characterized by a probability matrix $P$. According to Theorem 1, to compare the uniform and ER gossiping methods introduced in Section 2, it suffices to estimate the second largest eigenvalues of $\bar{W}^u$ and $\bar{W}^{ER}$ and compare them. In the rest of this section, we discuss estimating the second largest eigenvalues of $\bar{W}^u$ and $\bar{W}^{ER}$ based on the notions of graph conductance and hitting times when the eigenvalues are not readily available in closed form. We also discuss some examples for which we can compute the eigenvalues explicitly.
It is worth emphasizing that, since the matrices $\bar{W}^u$ and $\bar{W}^{ER}$ are symmetric and doubly stochastic, both can be viewed as probability transition matrices of reversible Markov chains on the graph $\mathcal{G}$, each with a uniform stationary distribution. We saw that, depending on the type of randomized gossiping, the sampling probabilities of the bottleneck edge can differ significantly, by a factor of $\Theta(n)$ on barbell graphs as implied by (5) and (6). A similar effect can be observed for the Markov chains defined by the transition probability matrices $\bar{W}^u$ and $\bar{W}^{ER}$. In fact, an explicit computation based on Lemma 4 (see Lemma 16 for details) gives
[TABLE]
That is, the probability of moving from one complete subgraph to the other is significantly larger (by a factor of ) for the Markov chain corresponding to than for the chain with . Intuitively, this allows the ER-based chain to move between the complete subgraphs faster when is large, leading to faster averaging over the nodes. This is formalized and proven in the next two subsections, where we study gossiping algorithms over barbell and c-barbell graphs.
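The bridge-edge effect can be reproduced numerically. The sketch below assumes, as our reading of Section 2, that uniform gossiping picks a neighbor with probability 1/d_i and that ER gossiping picks neighbor j with probability proportional to the effective resistance R_ij; all helper names are ours. On a barbell graph made of two copies of K_m joined by one bridge, the bridge is a cut edge, so every resistance inside a clique equals the K_m value 2/m. The bridge node therefore crosses the bridge with probability 1/m under uniform weights versus m/(3m − 2) under ER weights, roughly a factor m/3 larger.

```python
import numpy as np


def barbell_adjacency(m):
    """Two copies of K_m joined by the bridge edge (m-1, m); n = 2m nodes."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A


def effective_resistances(A):
    """R[i, j] = (e_i - e_j)^T L^+ (e_i - e_j) with unit edge conductances."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp


def uniform_probabilities(A):
    """P[i, j] = 1 / d_i for each neighbor j of i."""
    return A / A.sum(axis=1, keepdims=True)


def er_probabilities(A):
    """Assumed ER rule: P[i, j] proportional to R[i, j] over the neighbors of i."""
    weights = A * effective_resistances(A)
    return weights / weights.sum(axis=1, keepdims=True)


m = 10
A = barbell_adjacency(m)
p_u = uniform_probabilities(A)[m - 1, m]   # bridge node crossing the bridge
p_er = er_probabilities(A)[m - 1, m]
print(f"uniform: {p_u:.4f}  ER: {p_er:.4f}  m/(3m-2): {m / (3 * m - 2):.4f}")
```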
4.1 Proof of Result 1 via conductance-based analysis
Probability transition matrices on graphs are well studied; in particular, there are combinatorial techniques that bound their eigenvalues via graph conductance [4], as well as algebraic techniques that compute all the eigenvalues explicitly by exploiting the symmetry groups of a graph [37], as we discuss in Section 4.2.
The notion of graph conductance is defined with respect to a transition matrix on a graph that corresponds to a reversible Markov chain with an arbitrary stationary distribution . It can be viewed as a measure of how hard it is, in the worst case, for the Markov chain to move from a subgraph to its complement.
The notion of graph conductance allows us to provide bounds on the mixing time of the corresponding Markov chain as we discuss below.
Definition 5** (Conductance).**
Let be the transition matrix of a reversible Markov chain (Footnote 3: that is, the detailed-balance condition holds for all pairs of nodes.) on the graph with a stationary distribution . The conductance is defined as
[TABLE]
where .
Given a transition matrix , the relation between conductance and the second largest eigenvalue is well-known and given by the Cheeger inequalities:
[TABLE]
(see, e.g., [38, Proposition 6]). Therefore, larger conductance leads to faster averaging, i.e., a shorter averaging time, in light of Theorem 1. In particular, Cheeger's inequalities yield lower and upper bounds on the averaging time of both uniform and ER gossiping. We study these performance bounds over c-barbell graphs, and our next result shows that effective resistance-based transition probabilities improve the conductance compared to uniform probabilities on a c-barbell graph with nodes.
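For small graphs, the conductance can be computed by brute force and compared with the Cheeger bounds. The sketch below assumes a uniform stationary distribution and the common form Φ²/2 ≤ 1 − λ₂ ≤ 2Φ of the Cheeger inequalities for reversible chains (the constants in (11) may be stated differently). The lazy chain I − L/(2 d_max) on a barbell graph of two K₄'s is an illustrative choice of ours; there the minimizing set is one clique, which gives Φ = 1/32. The enumeration is exponential in n, so it is only meant for graphs with a dozen or so nodes.

```python
import itertools

import numpy as np


def conductance(P, pi):
    """Brute-force Phi(P) = min over S with 0 < pi(S) <= 1/2 of Q(S, S^c) / pi(S),
    where Q(S, S^c) = sum_{i in S, j not in S} pi_i P_ij. Exponential in n."""
    n = P.shape[0]
    best = np.inf
    for size in range(1, n):
        for S in itertools.combinations(range(n), size):
            S = list(S)
            pi_S = pi[S].sum()
            if pi_S > 0.5 + 1e-12:
                continue
            Sc = [j for j in range(n) if j not in S]
            flow = (pi[S][:, None] * P[np.ix_(S, Sc)]).sum()
            best = min(best, flow / pi_S)
    return best


# Illustrative chain: lazy walk P = I - L / (2 d_max) on a barbell of two K_4's.
m = 4
A = np.zeros((2 * m, 2 * m))
A[:m, :m] = 1.0
A[m:, m:] = 1.0
np.fill_diagonal(A, 0.0)
A[m - 1, m] = A[m, m - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A
P = np.eye(2 * m) - L / (2 * A.sum(axis=1).max())   # symmetric, doubly stochastic
pi = np.full(2 * m, 1.0 / (2 * m))                    # uniform stationary distribution

phi = conductance(P, pi)
lam2 = np.linalg.eigvalsh(P)[-2]                       # second largest eigenvalue
gap = 1.0 - lam2
print(f"Phi = {phi:.5f}, 1 - lambda_2 = {gap:.5f}")
print("Cheeger bounds hold:", phi**2 / 2 <= gap <= 2 * phi)
```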
Proposition 6**.**
*Given such that , consider the two Markov chains on the c-barbell graph with nodes defined by the transition matrices and . Let $c_{*}=\big(\lfloor c/2\rfloor\big)^{-1}$. The conductance values are given by*
[TABLE]
Remark 7**.**
Since a barbell graph is a special case of a c-barbell graph with and , Proposition 6 implies that and .
Given the transition matrix , by taking the logarithm of the Cheeger inequalities in (11), for , we obtain
[TABLE]
Then, choosing and above, applying Theorem 1 and Proposition 6, and noting that for close to 0, we obtain the lower and upper bounds on the averaging time of uniform and ER gossiping stated in Result 1 of our main results (Section 3). In the Supplementary Material, we also study the tightness of our conductance bounds (13) numerically on c-barbell graphs, which shows that our bounds are reasonable. In particular, we observe that our lower bounds get tighter as the number of nodes, , increases.
Although this analysis also applies to other graphs with low conductance, it typically does not lead to tight estimates, i.e., the lower and upper bounds do not match in their dependence on . In the next subsection, we show that for barbell graphs we obtain tight estimates of the averaging time by computing the eigenvalues of the averaging matrices and explicitly. More precisely, we show in Proposition 9 that the lower bounds in (7)–(8) are tight for in the sense that and , and that effective resistance-based averaging is faster by a factor of , which implies Result 2.
4.2 Proof of Result 2 via spectral analysis
Eigenvalues of probability transition matrices defined on barbell graphs have been studied in the literature. Consider the edge-weighted barbell graph with nodes, where is the vector of edge weights with positive entries. Suppose each node has a self-loop (see, e.g., Fig. 3). Let be the edge that connects the two complete subgraphs. The result [37, Prop. 5.1] gives an explicit formula for the eigenvalues of a probability transition matrix with transition probabilities proportional to edge weights, i.e., where satisfy the following assumptions: , , for all and , for all in each such that and , and for for some . Note that we cannot use this result directly to compute the eigenvalues of the transition matrices and defined in Lemma 4, mainly because the diagonal entries of and are all strictly positive, which violates the assumptions of [37, Prop. 5.1]. In Proposition 8, we adapt [37, Prop. 5.1] to our setting with some minor modifications that allow for any , so that it applies to and . The proof of Proposition 8, provided in the Supplementary Material, is similar to that of [37, Prop. 5.1] and exploits the symmetry properties of the weighted barbell graph described above, as illustrated in Figure 3 for .
Proposition 8** (Generalization of Proposition 5.1 in [37]).**
Consider the edge-weighted barbell graph with nodes. Let be the edge that connects the two complete subgraphs. Assume that weights are of the form , , for all and , for all in each such that and , and for for some . Consider the transition matrix associated with this graph with entries . Then the eigenvalues of are
- * with multiplicity one,*
- * with multiplicity one,*
- * with multiplicity ,*
- $\lambda_{\pm}\triangleq\frac{1}{2}\Big(\frac{F}{B+F}+\frac{G-A}{A+E+G}\pm\sqrt{S}\Big)$,

*where , and $S\triangleq\Big(\frac{F}{B+F}+\frac{G-A}{A+E+G}\Big)^{2}-\frac{4(FG-BE-AF)}{(B+F)(A+E+G)}$.*
Based on this result, in Proposition 9 we characterize the second largest eigenvalue of the transition matrices and – the proof can be found in the appendix.
Proposition 9**.**
Consider Markov chains on the barbell graph with transition matrices and . The second largest eigenvalues of these matrices are given by
[TABLE]
Result 2 follows as a direct consequence of Proposition 9 and Theorem 1. Thus, averaging with effective resistance weights is faster on a barbell graph.
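The gap between the two second largest eigenvalues can also be checked numerically. The self-contained sketch below rests on the same assumptions as before (uniform wake-ups, pairwise averaging, ER probabilities proportional to R_ij over each node's neighbors) and evaluates λ₂ of the expected iteration matrix I − D/(2n) + (P + Pᵀ)/(2n) for both rules on barbell graphs of increasing size. It illustrates, rather than proves, Proposition 9: the ratio of spectral gaps should grow with n.

```python
import numpy as np


def barbell_adjacency(m):
    """Two copies of K_m joined by the bridge edge (m-1, m)."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A


def effective_resistances(A):
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp


def expected_iteration_matrix(P):
    """Closed form I - D/(2n) + (P + P^T)/(2n), D_ii = sum_j (P_ij + P_ji)."""
    n = P.shape[0]
    S = P + P.T
    return np.eye(n) - np.diag(S.sum(axis=1)) / (2 * n) + S / (2 * n)


def second_largest_eigenvalue(W):
    return np.linalg.eigvalsh(W)[-2]   # eigvalsh returns ascending eigenvalues


results = {}
for m in (5, 10, 20):
    A = barbell_adjacency(m)
    P_u = A / A.sum(axis=1, keepdims=True)
    weights = A * effective_resistances(A)        # assumed ER rule
    P_er = weights / weights.sum(axis=1, keepdims=True)
    lam_u = second_largest_eigenvalue(expected_iteration_matrix(P_u))
    lam_er = second_largest_eigenvalue(expected_iteration_matrix(P_er))
    results[m] = (lam_u, lam_er)
    print(f"n={2 * m:3d}  1-lambda_2: uniform {1 - lam_u:.2e}  ER {1 - lam_er:.2e}  "
          f"ratio {(1 - lam_er) / (1 - lam_u):.1f}")
```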
4.3 Proof of Result 3 via hitting and mixing times
Before giving a formal definition of the -mixing time, we introduce the total variation (TV) distance between two probability measures and defined on the set of nodes . TV distance between and is defined as Given a Markov chain with a probability transition matrix and stationary distribution , -mixing time is a measure of how many iterations are needed for the probability distribution of the chain to be -close to the stationary distribution in the TV distance. A related notion is the hitting time which is a measure of how fast the Markov chain travels between any two nodes.
Definition 10**.**
(Mixing time and hitting times) Given and a Markov chain with probability transition matrix and stationary distribution , the -mixing time is defined as
[TABLE]
and the hitting time is the expected number of steps until the Markov chain reaches starting from .
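Both quantities in Definition 10 are easy to compute for small chains. In the sketch below, hitting times to a fixed target solve the linear system h_target = 0, h_i = 1 + Σ_k P_ik h_k, and the mixing time is found by powering P. The total variation convention ½ Σ_i |μ_i − ν_i| and the helper names are assumptions on our part. For the simple random walk on the path 0–1–2, the hitting times to node 2 are 4 and 3 steps from nodes 0 and 1; the lazy version is used for the mixing time because the plain walk is periodic.

```python
import numpy as np


def hitting_times_to(P, target):
    """h[i] = expected number of steps to reach `target` from i.

    Solves h[target] = 0 and h[i] = 1 + sum_k P[i, k] h[k] for i != target.
    """
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    Q = P[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with max_i TV(P^t(i, .), pi) <= eps, TV(mu, nu) = 0.5 * sum |mu - nu|."""
    Pt = np.eye(P.shape[0])
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        worst_tv = 0.5 * np.abs(Pt - pi[None, :]).sum(axis=1).max()
        if worst_tv <= eps:
            return t
    raise RuntimeError("chain did not mix within t_max steps")


# Simple random walk on the path 0 - 1 - 2 and its lazy version.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
h = hitting_times_to(P, target=2)           # from 0: 4 steps, from 1: 3 steps
P_lazy = 0.5 * (np.eye(3) + P)              # aperiodic, same stationary law
pi = np.array([0.25, 0.5, 0.25])            # proportional to degrees
t_mix = mixing_time(P_lazy, pi, eps=0.25)
print(h, t_mix)
```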
Mixing times and averaging times are closely related. In fact, given a probability transition matrix , it is known that and admit the same bounds up to factors [12, Theorem 7] for . (Footnote 4: Note that [12, Theorem 7] uses absolute time, whereas we use the number of node wake-ups to define -averaging and -mixing times; therefore, we multiplied the factor in [12, Theorem 7] by to convert absolute times to numbers of node wake-ups.) Hence, designing algorithms with a smaller mixing time often leads to better algorithms for distributed averaging (see also [28]). It is also known that mixing times are closely related to hitting times [39, Theorem 1.1].
Next, we show the first part of Result 3, i.e., that is optimal among all with a symmetric . Note that the symmetry of implies that it is doubly stochastic. For large and doubly stochastic , by [12, Corollary 1], we have . On the other hand, Roch proved in [26, Section 3.3.1] that any symmetric doubly stochastic matrix on the barbell graph with nodes satisfies the bound . Inserting this estimate into the expression for the averaging time, we obtain for any with symmetric on barbell graphs.
We conclude that the averaging time of ER-based gossiping on the barbell graph, which satisfies by Proposition 9 and Theorem 1, is optimal with respect to its dependence on and among all symmetric choices of the matrix.
Next, given any connected graph , we obtain a bound on the second largest eigenvalue of and show that the averaging time with effective resistance weights is bounded in terms of , where is the diameter of the graph.
Theorem 11**.**
Let be a graph with diameter . The second largest eigenvalue of satisfies
Proof.
It follows from our discussion in Section 4 that is non-negative and doubly stochastic (see the paragraph before Lemma 4). Therefore, for analysis purposes, we can interpret as the transition matrix of a Markov chain whose stationary distribution is uniform. Our analysis relates the eigenvalues of the matrix to the hitting times of this Markov chain, following the proof technique of [40, Lemma 2.1]. By Lemma 17 in the appendix, we get if . For any graph, it is also known that (Footnote 5: This follows directly from Rayleigh's monotonicity law [5], which says that if an edge is removed from a graph, the effective resistance between any pair of nodes can only increase. Therefore, the complete graph provides a lower bound for , where (see also [41]).) Therefore, for any neighbors and , . For any two vertices and that are not necessarily neighbors, with , let be the shortest path connecting and . Then, by the subadditivity of hitting times, for any , we obtain . It follows from an analysis similar to [42] that
[TABLE]
From [42, eqn. (12.12)], we also have
[TABLE]
Combining this with the estimate (14) implies directly which proves the claim. ∎
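The lower bound in the footnote on Rayleigh's monotonicity law above can be checked numerically for unit edge weights: every effective resistance in K_n equals 2/n, and since any graph on n nodes is a subgraph of K_n, none of its effective resistances can fall below that value. The random test graph below (a spanning path plus random chords) is a hypothetical example of ours.

```python
import numpy as np


def effective_resistances(A):
    """All-pairs effective resistances with unit edge conductances, via L^+."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp


n = 12
rng = np.random.default_rng(0)

# Hypothetical test graph: a spanning path (so it is connected) plus random chords.
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
chords = np.triu((rng.random((n, n)) < 0.3).astype(float), k=2)
A = np.maximum(A, chords + chords.T)

off_diag = ~np.eye(n, dtype=bool)
R = effective_resistances(A)
R_complete = effective_resistances(np.ones((n, n)) - np.eye(n))
print("min R on test graph:", R[off_diag].min(), " bound 2/n:", 2 / n)
```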
Metropolis vs ER gossiping: Given a connected , suppose there are no self-loops, i.e., for . Uniform weights can result in slow mixing on some graphs, such as the barbell graph (see Proposition 9) or lollipop graphs [4], which contain both high-degree and low-degree nodes. A popular alternative to uniform weights for is the Metropolis weights, defined as
[TABLE]
Let denote the matrix whose entries are the Metropolis weights . The weights given by the matrix , referred to as the lazy Metropolis weights, are also popular in distributed optimization practice [40]. The matrix is symmetric and positive semi-definite, unlike the matrix , which may have negative eigenvalues close to −1 (this can be problematic for the convergence of distributed algorithms; see, e.g., [43]). Combined with uniform wake-up of nodes, this leads to the following wake-up probabilities for the Metropolis weights-based system: , and the associated matrix . In particular, for any connected graph with nodes, we have the following guarantees from [40, Lemma 2.1] on the lazy Metropolis weights:
[TABLE]
By (9), we also have . Therefore, from (16), we get the bound for any connected graph . We conclude from Theorem 1 that the -averaging time of Metropolis weights-based gossiping on any graph is , again using the fact that for close to 0. That said, on barbell graphs Metropolis weights perform similarly to uniform weights: both require time, which effective resistance-based weights improve to . This completes the proof of Result 3.
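A minimal sketch of the Metropolis and lazy Metropolis matrices follows. It assumes the common Metropolis rule M_ij = 1/max(d_i, d_j) on edges with the remaining mass placed on the diagonal, and the lazy version (I + M)/2; both forms and the function names are assumptions on our part. The two-node example shows why laziness matters: M has eigenvalue −1 there, while (I + M)/2 is positive semi-definite.

```python
import numpy as np


def metropolis_matrix(A):
    """Assumed Metropolis rule: M_ij = 1 / max(d_i, d_j) on edges, rest on the diagonal.
    Requires every node to have at least one neighbor."""
    d = A.sum(axis=1)
    M = np.where(A > 0, 1.0 / np.maximum(d[:, None], d[None, :]), 0.0)
    np.fill_diagonal(M, 0.0)
    np.fill_diagonal(M, 1.0 - M.sum(axis=1))
    return M


def lazy_metropolis_matrix(A):
    """Lazy version (I + M) / 2: symmetric, doubly stochastic, positive semi-definite."""
    return 0.5 * (np.eye(A.shape[0]) + metropolis_matrix(A))


# Two nodes joined by one edge: M swaps the values, so it has eigenvalue -1.
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
M2 = metropolis_matrix(A2)
print(np.linalg.eigvalsh(M2), np.linalg.eigvalsh(lazy_metropolis_matrix(A2)))

# Star graph with center 0 and four leaves.
A_star = np.zeros((5, 5))
A_star[0, 1:] = A_star[1:, 0] = 1.0
M_star = lazy_metropolis_matrix(A_star)
```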
Remark 12**.**
Comparing the inequalities and , we see that for , the upper bound on will be smaller than the upper bound for . Therefore, performance bounds obtained on the -averaging time through Theorem 1 for ER weights will be better than those of Metropolis weights by a (small) constant factor for .
5 Numerical Experiments
In this section, we demonstrate the benefits of using effective resistances for solving the consensus problem and also within DPGA-W [17] and EXTRA [43] algorithms for consensus optimization.
5.1 Consensus exploiting effective resistances
Gossiping algorithms have been studied extensively, and a number of approaches exist [28, 44, 45, 46, 47, 48, 49]. In light of Theorem 1, among all algorithms with a symmetric , the one whose matrix minimizes the second largest eigenvalue, i.e., , is the fastest. The gossiping algorithm with the optimal choice of the probability matrix is called the Fastest Mixing Markov Chain (FMMC) in the literature [27]. In [12], Boyd et al. propose a distributed subgradient method to compute the matrix . This method requires a decaying step size and, at every iteration, the computation of a subgradient of the objective with respect to the decision variable, which itself requires solving a consensus problem. This can be expensive in practice in terms of the average number of communications required per node, and its convergence to can be slow, with at most a sublinear rate [12]. In contrast, ER probabilities are optimal for some graphs (such as the barbell graph, see Result 3) and can be computed efficiently with the normalized D-RK algorithm (see the Supplementary Material), which admits linear convergence guarantees. Therefore, ER weights can serve as a computationally efficient alternative to optimal weights for consensus. To illustrate this point, we compare the communication requirements per node of ER gossiping and FMMC on barbell and small-world graphs. The comparison consists of two stages: a pre-computation stage (where the probability matrices and are computed up to a given tolerance) and an asynchronous consensus stage (where we run ER and FMMC with the probability matrices and obtained in the previous stage to solve a consensus problem).
First, we implement the subgradient method with decaying step size from [12], where is tuned to the graph to achieve the best performance, and we stop the computation of the FMMC matrix at step if the iterate satisfies , where is the given precision level. (Footnote 6: The optimal probability matrix , which serves as a baseline in the stopping criterion, is estimated accurately by solving the semi-definite program (SDP) [12, eqn. (53)] directly using the CVX software [50] with a centralized method; the computations required to solve this SDP are not counted as part of the communication cost we report for FMMC in Tables 1-2.) Similarly, we compute for ER and stop the normalized D-RK algorithm when the iterate at step satisfies . Since the distributed subgradient method of [12] is based on synchronous computations, we also implemented the normalized D-RK algorithm with synchronous computations for a fair comparison. We define a communication for a node as a contact with one of its neighbors, either to compute an average of their state vectors or to update the matrix at any iteration.
We compared both algorithms based on their communication performance in stage-i and stage-ii. In particular, in stage-i we considered the number of communications per node required to obtain the matrix for ER and FMMC; in stage-ii, we generated 1000 instances of to start consensus and compared the average number of communications per node required to achieve satisfying , where is the tolerance level.
For the barbell graph, the initial state vector for consensus is sampled from the normal distribution if and from if where tolerance levels are set to be . We also compare ER and FMMC on small-world graphs while the number of nodes is varied with an edge density where is the total number of edges. On small-world graphs we generated instances of drawn from and stopped algorithms whenever tolerance levels are obtained or the number of communications per node exceeded .
Results for both graph types are reported in Tables 1 and 2, where we compare the average communication per node in the pre-computation stage (stage-) and in the consensus stage (stage-), averaged over 1000 runs. On the barbell graph, FMMC requires fewer communications in the second (consensus) stage, as expected since FMMC is based on the optimal matrix , but ER outperforms FMMC in terms of total communications (stage- + stage-). On small-world graphs, the computation of exceeded the maximum communication limit, which caused FMMC to perform worse than ER in stage- (since the stage- solution is no longer a precise approximation of ). Overall, ER performs better than FMMC in terms of total communications for both graph types.
In addition to FMMC, we consider Metropolis weights-based gossiping and fastest quantum gossiping (FQG) proposed by Jafarizadeh in [51]. In the Metropolis-based gossiping approach, each node wakes up with uniform probabilities (i.e. ) and communicates with one of its neighbors with probability . The FQG, on the other hand, calculates the wake-up () and conditional communication probabilities at each agent by solving an SDP problem. This SDP is targeted to optimize the spectral gap of the expected iteration matrix. The method proposed in [51] for solving this SDP is a centralized algorithm; therefore, we made the comparisons among these methods in terms of the time required to compute the probability matrices and in a centralized manner. The entries of these matrices are computed according to and . For the Metropolis weights, the probability matrix does not require any pre-computation time as it is only based on the degrees of the nodes and is assumed to be known. We also introduce the spectral gaps defined as , , of the corresponding expected iteration matrices.
In our next set of experiments, we consider barbell graphs and random graphs generated with the stochastic block model (SBM). The stochastic block model , also known as the planted partition model [52, 53], consists of nodes and clusters, where each node in a cluster is connected to every other node in the same cluster with probability , whereas nodes in different clusters are connected with probability .
We summarize our results for the barbell graph in Table 4 and for in Table 5. A gossiping algorithm is faster if its spectral gap is larger. We observe that is smaller than and larger than as is varied; therefore, is larger than and is smaller than . We conclude that ER is faster than Metropolis and slower than FQG. This is expected, as effective-resistance weights are not optimized to increase the spectral gap, whereas FQG weights are designed to optimize it. However, the CPU times required to compute effective resistances and FQG weights, reported in the columns titled “CPU Time ER" and “CPU Time FQG", show that effective resistance weights can be computed faster, since they only require a matrix inversion, whereas the FQG algorithm requires solving a semi-definite program (SDP). (Footnote 7: We used the SDP solver SeDuMi in the software CVX to compute FQG weights and the matrix inversion function in Matlab to compute ER weights in a centralized manner.) This advantage becomes more pronounced for larger graphs. Moreover, ER weights can be computed efficiently (with a linear convergence rate) and asynchronously in the decentralized setting using the randomized Kaczmarz algorithm. Solving SDPs in the decentralized setting is also possible with subgradient methods, as discussed in [12], but these are typically much slower since they admit at most sublinear rates. This is also reflected in Table 1 and Table 2 for the FMMC method, which used a subgradient method to compute the weights. To summarize, Metropolis weights require no pre-computation but are the slowest in terms of the spectral gap; ER weights are faster than Metropolis weights but slower than FQG weights, with the advantage that they require less CPU time to compute and can be computed in the decentralized setting with a linearly convergent algorithm.
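A planted-partition graph of this kind can be sampled as follows. The sketch is ours: it assigns nodes to clusters round-robin (an assumption, since the cluster sizes used in the experiments are not specified here) and does not enforce connectivity, so disconnected samples should be rejected, for example with the included `is_connected` check, before computing effective resistances.

```python
import numpy as np


def sbm_adjacency(n, k, p, q, rng):
    """Planted-partition sample: k clusters, intra-cluster edges w.p. p, inter-cluster w.p. q."""
    labels = np.arange(n) % k                   # hypothetical round-robin cluster assignment
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)         # symmetric, no self-loops
    return A, labels


def is_connected(A):
    """Connectivity check via the number of (numerically) zero Laplacian eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    return int(np.sum(np.linalg.eigvalsh(L) < 1e-9)) == 1


rng = np.random.default_rng(1)
A, labels = sbm_adjacency(n=60, k=3, p=0.5, q=0.02, rng=rng)
print("edges:", int(A.sum() // 2), " connected:", is_connected(A))
```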
Lastly, we compare the performance of ER-based asynchronous gossiping with Metropolis weights-based asynchronous gossiping (Metropolis) and classical asynchronous gossiping (Uniform) on small-world, barbell, and random graphs generated with the stochastic block model, . We considered two types of ER-based gossiping algorithms: (i) ER-Ex uses the exact effective resistance probabilities, computed from the pseudo-inverse of the Laplacian matrix (with a centralized approach based on standard matrix inversion techniques); (ii) ER-Kac uses effective-resistance weights approximated by the decentralized Kaczmarz method.
In the experiments, each node possesses an initial vector , and the goal is to approximate the node average . We draw the data randomly from the standard multivariate normal distribution with zero mean and identity covariance matrix. In each trial, we record the number of wake-ups required to reach the relative accuracy . We set and generated 250 independent runs. We computed the average and the standard deviation of the number of wake-ups over these 250 runs on SBM, barbell, and small-world graphs. We present our results in Table 3 (see Figure 4 for details of these graphs).
We observe that the effective-resistance-based algorithms (ER-Kac and ER-Ex) clearly improve upon uniform-weights gossiping on all graph types, in terms of both the average number of wake-ups required and its standard deviation. Compared with Metropolis weights, ER-Kac and ER-Ex are also more efficient, requiring fewer wake-ups on average with a smaller standard deviation. The improvement is more pronounced for the graphs in Fig. 4(b)–4(e). These experimental results illustrate the effectiveness of our approach compared to existing approaches on a number of random graph topologies that can arise in practice.
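The two ways of obtaining effective resistances can be contrasted in a few lines. `er_exact` uses the pseudo-inverse, in the spirit of ER-Ex. `er_kaczmarz` is only a centralized stand-in for the decentralized method: it runs the randomized Kaczmarz iteration of Strohmer and Vershynin on L x = e_i − e_j starting from zero, which keeps the iterate in the range of L and converges linearly to L⁺(e_i − e_j). It is not the authors' normalized D-RK algorithm, whose details are in the Supplementary Material, and the function names are ours. On the cycle C₆, the exact values are R₀₁ = 5/6 and R₀₃ = 3/2.

```python
import numpy as np


def laplacian(A):
    return np.diag(A.sum(axis=1)) - A


def er_exact(L, i, j):
    """R_ij = b^T L^+ b with b = e_i - e_j (Moore-Penrose pseudo-inverse)."""
    b = np.zeros(L.shape[0])
    b[i], b[j] = 1.0, -1.0
    return b @ np.linalg.pinv(L) @ b


def er_kaczmarz(L, i, j, iters, rng):
    """Centralized randomized Kaczmarz on L x = e_i - e_j, started at x = 0.

    Row k is sampled with probability ||L_k||^2 / ||L||_F^2. Every update adds a
    multiple of a row of L, so x stays in range(L) and tends to L^+ (e_i - e_j).
    """
    n = L.shape[0]
    b = np.zeros(n)
    b[i], b[j] = 1.0, -1.0
    row_sq = (L ** 2).sum(axis=1)
    x = np.zeros(n)
    for k in rng.choice(n, size=iters, p=row_sq / row_sq.sum()):
        x += (b[k] - L[k] @ x) / row_sq[k] * L[k]
    return x[i] - x[j]       # equals (e_i - e_j)^T x


# Cycle C_6: R between nodes at distance k is k (6 - k) / 6.
n = 6
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
L = laplacian(A)
rng = np.random.default_rng(0)
print(er_exact(L, 0, 1), er_kaczmarz(L, 0, 3, iters=3000, rng=rng))
```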
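The wake-up counting protocol can be sketched as below. We assume the stopping rule ‖x − x̄‖ ≤ ε‖x₀ − x̄‖ (Frobenius norm over all nodes) as the relative-accuracy criterion, and `gossip_wakeups` is a hypothetical name. Each wake-up averages one pair, so the network average is preserved exactly at every step; in the example, uniform weights on the cycle C₈ are an illustrative choice.

```python
import numpy as np


def gossip_wakeups(P, x0, tol, rng, max_wakeups=1_000_000):
    """Asynchronous pairwise gossip; returns (number of wake-ups, final states).

    At each wake-up a uniformly random node i contacts j ~ P[i, :] and both
    store the pair average. Assumed stopping rule:
    ||x - xbar||_F <= tol * ||x0 - xbar||_F.
    """
    x = np.array(x0, dtype=float)
    n = x.shape[0]
    xbar = x.mean(axis=0)                     # preserved by every pairwise average
    err0 = np.linalg.norm(x - xbar)
    for t in range(1, max_wakeups + 1):
        i = rng.integers(n)
        j = rng.choice(n, p=P[i])
        x[i] = x[j] = 0.5 * (x[i] + x[j])
        if np.linalg.norm(x - xbar) <= tol * err0:
            return t, x
    raise RuntimeError("tolerance not reached within max_wakeups")


# Uniform weights on the cycle C_8, 3-dimensional data at each node.
n = 8
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
P_u = A / A.sum(axis=1, keepdims=True)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((n, 3))
t, x = gossip_wakeups(P_u, x0, tol=1e-2, rng=rng)
print("wake-ups:", t)
```

Repeating the call over independent draws of x0 and averaging t gives the kind of statistic reported in Table 3; swapping P_u for another probability matrix compares weighting schemes.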
5.2 Effective resistance-based DPGA-W and EXTRA
We implemented our ER-based communication framework in two state-of-the-art distributed algorithms, DPGA-W [17] and EXTRA [43], to solve regularized logistic regression problems over a barbell graph with nodes. We minimize with
[TABLE]
where is the number of samples at each node, and for denote the set of feature vectors and corresponding labels. We let and . For each and , we randomly generated i.i.d. instances of the problem in (17) by sampling independently from the normal distribution and setting if and to otherwise. Both algorithms are terminated after iterations. As a benchmark, we also solved each instance of (17) using MOSEK [54] within CVX [50]. We initialized the iterates by uniformly sampling each component from the interval for nodes in one , and from for nodes in the other . The results for and are displayed in Fig. 5 and Fig. 6, respectively. We plot the relative suboptimality , the function value sequence for the range , and the consensus violation , where denotes the (synchronous) communication round counter (in each communication round, neighboring nodes communicate with each other synchronously once) and denotes the th iterate; moreover, , , and is the minimizer of (17).
Both DPGA-W (Footnote 8: In DPGA-W, the stepsize parameter is set to for ; see [17].) and EXTRA use a communication matrix that encodes the network topology. DPGA-W uses node-specific step sizes initialized at for , where denotes the Lipschitz constant of , and we adopted the adaptive step-size strategy described in [17, Sec. III.D]. For EXTRA, we choose a constant step size, common to all nodes, as suggested in [43], i.e., , where .
For both algorithms, we compared two choices of : based on uniform edge weights, and based on effective resistances. In DPGA-W, the graph Laplacian is adopted for uniform weights, i.e., , while for the ER-based weights we set , where for , for , and 0 otherwise. For EXTRA, , where , with denoting the largest eigenvalue; on the other hand, , where for .
Figures 5 and 6 compare DPGA-W and EXTRA with effective resistance and uniform weights in terms of suboptimality, function values and consensus violation for the barbell graph and , respectively; the reported results are averages over the 20 problem instances. The subfigures on the left of Figures 5 and 6 are for noise level , whereas those on the right are for . In both figures, we observe that ER weights consistently and significantly improve upon uniform weights for both EXTRA and DPGA-W on the logistic regression problem, in terms of suboptimality, function values and consensus violation. We also observe that with noisier data, DPGA-W typically converges faster than EXTRA in terms of function values and suboptimality. This is because when the noise level gets larger, the local Lipschitz constants of the nodes exhibit higher variability, and DPGA-W adapts to this variability by using a different step size at each node, tuned to , whereas EXTRA uses a constant step size that is the same for all nodes. On the other hand, in terms of consensus violation, EXTRA with ER weights typically outperforms DPGA-W with ER weights.
6 Conclusions
We obtained a number of theoretical guarantees for ER gossiping algorithms for the consensus problem for -barbell graphs and barbell graphs, and for arbitrary graphs depending on their diameter. The results fill a gap between the theory and practice of these methods. We also showed that these methods are effective for solving the consensus problem in practice over barbell graphs and small-world graphs. We provided numerical experiments demonstrating that using ER gossiping within EXTRA and DPGA-W methods improves their practical performance in terms of communication efficiency.
Acknowledgments
Bugra Can and Mert Gürbüzbalaban’s research was supported by the Office of Naval Research Award Number N00014-21-1-2244, and by the grants National Science Foundation (NSF) CCF-1814888 and NSF DMS-2053485. N. Serhat Aybat’s research is supported by the grants NSF CMMI-1635106 and ARO W911NF-17-1-0298.
Appendix A Proof of Propositions 6 and 9
Proof of Proposition 6.
The proof is based on finding the subset of the vertex set of -barbell graph that determines the conductance, i.e. that solves the minimization problem (10). First, for any given , the conductance of a subset with respect to the probability transition matrix is defined as
[TABLE]
Notice that the definition (10) implies that we have . (Footnote 9: This follows after straightforward computations based on the fact that the Markov chain with transition matrix and stationary distribution is reversible, i.e., for any with .) With a slight abuse of notation, for a subgraph with vertex set , we define . We say that a vertex set on graph is a one-cut set if its complement is a connected subgraph of . Similarly, we define a two-cut set to be a set whose complement consists of two disjoint non-empty connected subgraphs and of . We define
[TABLE]
For , we also define
[TABLE]
Note that matrices and are symmetric and Markov chains with these transition matrices have the uniform distribution as a stationary distribution. Therefore, Lemmas 13 and 14 provided in Appendix B imply that a set with minimal conductance should be a one-cut set and has to be given by the vertices of a subgraph for some for both and . The conductance of one-cut subgraphs with respect to these transition matrices can be computed explicitly (see Proof of Lemma 14 for details):
[TABLE]
Both of the expressions in (21) are minimized for the choice of . Therefore, the minimal conductance is attained for the subgraph . Plugging into the expressions above yields the graph conductance values in (12). The bounds (7) and (8) follow from Theorem 1 and inequalities (13). ∎
Proof of Proposition 9.
It follows from Corollary 15 and Lemma 16 in Appendix B that the second largest eigenvalues of and are given by: and . This implies directly and , which completes the proof. ∎
Appendix B Supporting Results
Lemma 13**.**
Consider a reversible Markov chain on a -barbell graph with a uniform stationary distribution. Let be a subgraph of whose vertex set is a non-empty two-cut set satisfying . Then, there exists another subgraph of such that .
Proof.
Let and be the vertex sets of two disjoint non-empty connected subgraphs within satisfying . Note that implies either or . Using the fact that the transition matrix of a reversible Markov chain with a uniform stationary distribution is symmetric, the definition (18) implies . Without loss of generality, choose to be the subgraph with vertices with (otherwise, pick the subgraph with vertex set instead), then
[TABLE]
which proves Lemma 13. ∎
Lemma 14**.**
Consider a Markov chain on a -barbell graph with a probability transition matrix . If or , then for any subgraph having a one-cut vertex set , there exists a subgraph for some such that where is defined by (19) and (20).
Proof.
For any subgraph having a one-cut vertex set , we can always find a subgraph with vertex set for some such that either or (with the convention that is a singleton graph whose vertex set consists of a single node). Let be the subgraph with vertex set . Since for both and , without loss of generality we can assume that satisfies the property (otherwise, we can replace with in the proof below). It follows after a straightforward computation (similar to the proof technique of Lemma 16) that the transition probability matrices and on admit the explicit formulas , $[\overline{W}_{P^{u}}]_{i^{*}j}=\frac{1}{2c\tilde{n}^{2}}\Big(\frac{2\tilde{n}-1}{\tilde{n}-1}\Big)$, , whereas , , , where and denote two adjacent nodes belonging to different complete subgraphs of , i.e., those with degree , and or such that and denote nodes in with degree . Note that is greater than , as in the case. Hence, for ,
[TABLE]
In the case of , let be the subset of nodes in the subgraph that contains nodes from both and – if no such exists, then corresponds to a subgraph for some . Now consider the former case, let us denote . The number of edges between and is given by . This is due to the fact that each node in has exactly many edges that connect to its complement. We have also for . This yields . ∎
Corollary 15**.**
Under the setting of Proposition 8, assume that the weight matrix is normalized, i.e., for all . Then is a doubly stochastic matrix and the eigenvalues of become
- * with multiplicity one,*
- * with multiplicity one,*
- * with multiplicity ,*
- $\lambda_{\pm}=\frac{1}{2}\big(F+G-A\pm\sqrt{S}\big)$,
where and are as in Proposition 8. Moreover, satisfies
[TABLE]
and is the second largest eigenvalue, i.e. .
Proof.
Since is normalized, Proposition 8 applies with and . Thus, the eigenvalues simplify to the forms given in the statement. Note that . Therefore, satisfies (22). The eigenvalue is the unique largest eigenvalue since is stochastic. It remains to show that is the second largest eigenvalue. Using (22), we can write $\lambda_{+}\geq\frac{1}{2}\big(F+G-A+|F-G+A|\big)$. There are two cases: or . In both cases, we observe . Since , we also have . Therefore, . Furthermore, since ; therefore . Finally, since . Thus, is non-negative and is the second largest eigenvalue. ∎
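In the normalized setting B + F = A + E + G = 1, the lower bound on λ₊ used in this proof can be checked by expanding S directly. The expansion below is our own and only uses that B and E are nonnegative edge weights:

$$
\begin{aligned}
S &= (F+G-A)^2 - 4(FG-BE-AF) \\
  &= F^2 + G^2 + A^2 - 2FG + 2AF - 2AG + 4BE \\
  &= (F-G+A)^2 + 4BE \;\ge\; (F-G+A)^2 .
\end{aligned}
$$

Hence $\sqrt{S} \ge |F-G+A|$ and $\lambda_+ \ge \tfrac12\big(F+G-A+|F-G+A|\big) = \max\{F,\; G-A\}$, using $\tfrac12(x+y+|x-y|)=\max\{x,y\}$ with $x=F$ and $y=G-A$.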
Lemma 16**.**
Consider the setting of Proposition 8:
If , then Proposition 8 applies with , , , and where
[TABLE]
The second largest eigenvalue of is given by , for .
If , then Proposition 8 applies with , , , and where
[TABLE]
Moreover, the second largest eigenvalue of is given by , for .
Proof of Lemma 16.
We first compute the entries of both and explicitly for the barbell graph (i.e., ). The former can be obtained directly from the node degrees: if , if . Computing requires the effective resistances on the graph. The following characterization of effective resistance allows us to calculate them using Cayley's formula for complete graphs,
[TABLE]
A complete graph with vertices has spanning trees; therefore, the barbell graph has spanning trees. Let be the number of spanning trees containing an edge ; then . So we have . This implies that the number of spanning trees of the barbell graph containing an edge is , and clearly the number of spanning trees containing the edge is . This implies if , and otherwise. Once we have explicit characterizations of and , Lemma 4 gives the entries of and as in and . The second largest eigenvalues of and follow from Corollary 15. ∎
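The spanning-tree argument above can be cross-checked numerically. The sketch below is our own illustration, not code from the paper: it assumes unit edge weights, labels the two cliques 0..m-1 and m..2m-1 with the bridge (m-1, m), and uses hypothetical helper names. It computes each effective resistance exactly by grounding one endpoint and solving the reduced Laplacian system in rational arithmetic, and it counts spanning trees with the matrix-tree theorem (determinant of the Laplacian with one row and column deleted).

```python
from fractions import Fraction

def barbell_edges(m):
    """Edges of two K_m cliques (nodes 0..m-1 and m..2m-1) joined by the bridge (m-1, m)."""
    edges = [(off + i, off + j) for off in (0, m) for i in range(m) for j in range(i + 1, m)]
    edges.append((m - 1, m))
    return edges

def laplacian(n, edges):
    """Unweighted graph Laplacian with exact rational entries."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def grounded(L, v):
    """Delete row and column v (i.e., ground node v); also return the kept indices."""
    keep = [k for k in range(len(L)) if k != v]
    return [[L[r][c] for c in keep] for r in keep], keep

def eliminate(A, b=None):
    """Gaussian elimination over Fractions: returns (det(A), solution of A x = b or None)."""
    n = len(A)
    M = [row[:] + ([b[i]] if b is not None else []) for i, row in enumerate(A)]
    det = Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0), None
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0:
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    if b is None:
        return det, None
    x = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution on the upper-triangular system
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return det, x

def effective_resistance(L, u, v):
    """R_uv: ground v, inject a unit current at u, and read off the potential at u."""
    Lg, keep = grounded(L, v)
    b = [Fraction(1 if k == u else 0) for k in keep]
    _, x = eliminate(Lg, b)
    return x[keep.index(u)]

m = 5
edges = barbell_edges(m)
L = laplacian(2 * m, edges)
R = {e: effective_resistance(L, *e) for e in edges}
n_trees, _ = eliminate(grounded(L, 0)[0])  # matrix-tree theorem
print("clique edge:", R[(0, 1)], "bridge:", R[(m - 1, m)], "spanning trees:", n_trees)
```

For m = 5 this gives 2/5 on every clique edge, 1 on the bridge, and 125^2 = 15625 spanning trees; the resistances also sum to n - 1 = 9, as Foster's theorem requires.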
Lemma 17.
[55, Eqn. (2.2)] Let be the transition matrix of a Markov chain with stationary distribution . Let be a neighbor of , i.e., , then
Discussion of Momentum-Based Acceleration Methods and ER-Based Gossiping
In the literature, there are two main approaches to improving the performance of gossiping algorithms: (i) improving the communication weights and (ii) modifying the averaging scheme, e.g., by adding a momentum term. The ER-based approach belongs to the first category, whereas the papers [19, 20, 21] belong to the second and propose alternative averaging techniques based on a momentum term. In momentum-based approaches, the next iterate at node depends not only on the current iterate but also on the previous iterate as well as on , i.e., the current and previous iterates of the neighbors of node (see, e.g., [19]).
In the following discussion, we illustrate the benefits of momentum-based approaches and how they can be combined with effective resistance weights to improve performance. For simplicity, we consider synchronous updates. In this case, if denotes the local estimate of the global average, , at node in iteration , where denotes the vector of ones, gossiping algorithms consist of updates of the form:
[TABLE]
starting from the initial point , where is a doubly stochastic matrix. A common choice for the mixing matrix is
[TABLE]
where is a symmetric weighted Laplacian matrix and is a scalar satisfying (see, e.g., [43, Section 2.4]). For each , for all , where is the set of neighbors of node ; if and . Different choices of the matrix give different algorithms. For example, uniform gossiping corresponds to the choice such that (Footnote 10: Note that instead of , for uniform gossip we set it as in (25) so that becomes symmetric.)
[TABLE]
Similarly, we can study gossiping based on the ER-based weights in synchronous setting by considering the choice such that
[TABLE]
where is the effective resistance on the edge , and for all .
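A minimal sketch of the synchronous update with mixing matrix W = I - αL, assuming a barbell graph and two illustrative weight choices: unit weights (standing in for uniform gossip) and w_ij = R_ij, i.e., 2/m on clique edges and 1 on the bridge (standing in for ER weights). The exact normalizations in (25) and (26) and the admissible range of α are not reproduced here; the step α = 1/(1 + max weighted degree) is our assumption, chosen so that W is symmetric, entrywise non-negative, and doubly stochastic. Function names are hypothetical.

```python
import random

def barbell_laplacian(m, w_clique, w_bridge):
    """Weighted Laplacian L = D - W of two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    L = [[0.0] * n for _ in range(n)]

    def add_edge(i, j, w):
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w

    for off in (0, m):
        for i in range(m):
            for j in range(i + 1, m):
                add_edge(off + i, off + j, w_clique)
    add_edge(m - 1, m, w_bridge)
    return L

def synchronous_gossip(L, x0, iters):
    """Iterate x <- W x with W = I - alpha*L; each node only uses its neighbors' values."""
    n = len(L)
    # Assumed step (not from the paper): 1 / (1 + max weighted degree) keeps W >= 0
    # with a positive diagonal, so W is symmetric and doubly stochastic.
    alpha = 1.0 / (1.0 + max(L[i][i] for i in range(n)))
    x = list(x0)
    for _ in range(iters):
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - alpha * gi for xi, gi in zip(x, Lx)]
    return x

m = 5
laplacians = {
    "uniform": barbell_laplacian(m, 1.0, 1.0),
    "ER": barbell_laplacian(m, 2.0 / m, 1.0),  # hypothetical: w_ij = R_ij
}
rng = random.Random(0)
x0 = [rng.uniform(0.0, 10.0) for _ in range(2 * m)]
avg = sum(x0) / len(x0)
final = {name: synchronous_gossip(L, x0, 500) for name, L in laplacians.items()}
for name, x in final.items():
    print(name, "max deviation from the average:", max(abs(v - avg) for v in x))
```

Because every row and column of W sums to one, the network average is preserved at every step, so both runs converge to the same consensus value.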
Gossiping algorithms with a weighted Laplacian matrix are related to first-order, i.e., gradient-based, optimization algorithms. To illustrate this point, consider the following convex quadratic optimization problem:
[TABLE]
where and is the global average that we want to compute. Noting that , the updates given in (23) with the choice of as in (24) can be viewed as gradient descent updates with step size on the quadratic optimization problem (27). From the standard theory of gradient descent, it is well known that the distance of converges to linearly at a rate for , where denotes the minimum positive eigenvalue of , i.e., the second-smallest eigenvalue for connected graphs. Therefore, we get the following non-asymptotic convergence:
[TABLE]
If we set the step size as , where denotes the largest eigenvalue of , we get
[TABLE]
is called the condition number. When the condition number is very large, convergence can be slow. Adding a momentum term is a standard technique for improving the dependence of the convergence rate of gradient descent on the condition number. For example, Polyak's heavy-ball (HB) method applied to the objective (27) consists of the iterations
[TABLE]
where the last term is referred to as the momentum term and is called the momentum parameter (see, e.g., [19]). The convergence rate of the HB method on quadratic objectives of the form (27) is well studied, and it can be shown that the iterations (28) converge to the consensus vector with the asymptotic linear convergence rate
[TABLE]
for a specific choice of the step size , provided that is tuned properly as a function of the eigenvalue [19]. Achieving this rate with the choice of in [19] requires estimating . That said, for ill-conditioned problems, i.e., when the condition number is sufficiently large, HB converges faster, i.e., . For example, for the barbell graph, with an analysis similar to that in Proposition 8, we can characterize the eigenvalues of the weighted graph Laplacians and corresponding to the uniform and ER-based weights given in (25) and (26), respectively, and obtain
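The effect of the momentum term can be seen on a small example. The sketch below runs x^{k+1} = x^k - αLx^k + β(x^k - x^{k-1}) with β = 0 (plain gossip, i.e., gradient descent with α = 1/λ_n) and with the classical heavy-ball tuning for quadratics, α = 4/(√λ_n + √λ_2)^2 and β = ((√λ_n - √λ_2)/(√λ_n + √λ_2))^2, which may differ from the exact choice in [19]. We use an unweighted path graph rather than a barbell graph (our choice) because its Laplacian eigenvalues 2 - 2cos(kπ/n), k = 0, ..., n-1, are known in closed form, so no eigenvalue estimation is needed. Function names are hypothetical.

```python
import math

def path_laplacian_times(x):
    """Return L x for the unweighted path graph 0 - 1 - ... - (n-1); only neighbors interact."""
    y = [0.0] * len(x)
    for i in range(len(x) - 1):
        d = x[i] - x[i + 1]
        y[i] += d
        y[i + 1] -= d
    return y

def iterations_to_tol(x0, alpha, beta, tol, max_iter=100_000):
    """x^{k+1} = x^k - alpha L x^k + beta (x^k - x^{k-1}); beta = 0 gives plain gossip."""
    avg = sum(x0) / len(x0)

    def dist(x):
        return math.sqrt(sum((v - avg) ** 2 for v in x))

    err0 = dist(x0)
    prev, cur = list(x0), list(x0)  # x^{-1} = x^0, so the average is preserved
    for k in range(1, max_iter + 1):
        Lx = path_laplacian_times(cur)
        prev, cur = cur, [c - alpha * g + beta * (c - p) for c, g, p in zip(cur, Lx, prev)]
        if dist(cur) <= tol * err0:
            return k, cur
    raise RuntimeError("no convergence within max_iter")

n = 30
lam2 = 2 - 2 * math.cos(math.pi / n)  # smallest positive Laplacian eigenvalue of the path
lamn = 2 + 2 * math.cos(math.pi / n)  # largest eigenvalue, 2 - 2cos((n-1)pi/n)
x0 = [float(i % 7) for i in range(n)]  # arbitrary initial node values

k_gd, _ = iterations_to_tol(x0, 1 / lamn, 0.0, 1e-8)
sl, sn = math.sqrt(lam2), math.sqrt(lamn)
k_hb, x_hb = iterations_to_tol(x0, 4 / (sl + sn) ** 2, ((sn - sl) / (sn + sl)) ** 2, 1e-8)
print("plain gossip:", k_gd, "iterations; heavy ball:", k_hb, "iterations")
```

With n = 30 the condition number is about 364, and heavy ball reaches the same relative accuracy in far fewer iterations than plain gossip, in line with the √κ versus κ dependence.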
[TABLE]
Therefore, without momentum averaging (when ), we obtain the convergence rates
[TABLE]
for uniform weights and ER-based weights. On the other hand, for HB method, we obtain the rates
[TABLE]
We observe that the fastest rate is obtained by using the HB method on the quadratic problem (27) defined by the weighted Laplacian corresponding to the ER weights, i.e., is the fastest rate in terms of its dependence on . This shows that ER weights can be used together with momentum averaging techniques. In particular, (29) shows that the effective-resistance-based approach yields a better-conditioned Laplacian than uniform weights, and further improvement can be achieved by employing momentum averaging. In other words, ER weights improve the conditioning of the weighted Laplacian matrix, and momentum-based approaches can be used on top of them for further performance gains. Besides the HB method, Nesterov's accelerated gradient method is an alternative momentum-based technique that yields similar accelerated convergence rates.
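To see the conditioning effect numerically, the sketch below computes the condition number κ = λ_n/λ_2 of the barbell Laplacian for two illustrative weight choices (unit weights, and w_ij = R_ij with R_ij = 2/m on clique edges and 1 on the bridge), together with the resulting rates 1 - 1/κ for plain gossip and (√κ - 1)/(√κ + 1) for heavy ball. These weights are our assumptions, not the normalizations in (25) and (26), so the numbers only illustrate the trend in (29) and (30). To avoid external libraries, eigenvalues come from a textbook cyclic Jacobi method for symmetric matrices; with NumPy one would call numpy.linalg.eigvalsh instead.

```python
import math

def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def barbell_laplacian(m, w_clique, w_bridge):
    """Weighted Laplacian of two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    L = [[0.0] * n for _ in range(n)]
    edges = [(off + i, off + j, w_clique) for off in (0, m) for i in range(m) for j in range(i + 1, m)]
    edges.append((m - 1, m, w_bridge))
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L

def condition_number(L):
    ev = jacobi_eigenvalues(L)  # ascending; ev[0] is the zero eigenvalue of a connected graph
    return ev[-1] / ev[1]

m = 5
results = {}
for name, w_clique in (("uniform", 1.0), ("ER", 2.0 / m)):  # ER: hypothetical w_ij = R_ij
    kappa = condition_number(barbell_laplacian(m, w_clique, 1.0))
    results[name] = {
        "kappa": kappa,
        "gd": 1 - 1 / kappa,  # plain gossip / gradient descent rate
        "hb": (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1),  # heavy-ball rate
    }
for name, r in results.items():
    print(name, r)
```

For m = 5 this gives κ ≈ 22.5 with unit weights and κ ≈ 17.9 with the assumed ER weights, and the rates order as heavy ball with ER < heavy ball uniform < gossip with ER < gossip uniform.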
The discussion above was for the synchronous setup; the asynchronous setup can be analyzed similarly. (Footnote 11: In the asynchronous setup, at every iteration, node contacts a neighbor randomly to update its decision variable rather than contacting all the neighbors. In the case of the barbell graph, each node has neighbors, so it needs on average iterations to contact all of them. Consequently, more iterations are required to converge than in the synchronous setup. With an analysis similar to the one above, it can be shown that ER weights on barbell graphs lead to where instead of obtained above in (30). The rate also follows directly from Proposition 9.)
Further Discussions on Our Conductance Bounds and Averaging Time with Effective Resistances
We recall that the averaging time with an expected iteration matrix satisfies
[TABLE]
where denotes the second-largest eigenvalue. Therefore, comparing effective-resistance (ER) weights with uniform weights amounts to comparing the second-largest eigenvalues and , where and are the expected iteration matrices defined using ER and uniform weights, respectively. For barbell graphs (the special case of -barbell graphs with ), our analysis is tight because we derived explicit formulas for the second-largest eigenvalues of both and . However, for -barbell graphs with , the second-largest eigenvalues of the gossiping matrices and are not explicitly known. We therefore resort to conductance bounds, a common technique in the literature for bounding the second-largest eigenvalue, and hence the spectral gap, from below and above through the Cheeger inequalities. This approach yields the following lower and upper bounds for the spectral gaps and corresponding to ER and uniform weights, respectively:
[TABLE]
where denotes the graph conductance as defined in the paper for the reversible Markov chain corresponding to transition probability matrix .
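The Cheeger-type bounds can be checked by brute force on a small instance. The sketch below is our own illustration: it takes the simple random walk P = D^{-1}A on an unweighted barbell graph with two 4-cliques (not the paper's ER or uniform gossiping matrices), computes the second-largest eigenvalue through the similar symmetric matrix D^{-1/2}AD^{-1/2}, enumerates all vertex subsets to obtain the conductance Φ = min over sets with π(S) ≤ 1/2 of Q(S, S^c)/π(S), and verifies the standard inequalities Φ^2/2 ≤ 1 - λ_2 ≤ 2Φ for reversible chains. Exhaustive enumeration is exponential in n, so this only works for small graphs; function names are hypothetical.

```python
import math

def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def barbell_adjacency(m):
    """0/1 adjacency of two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    A = [[0.0] * n for _ in range(n)]
    for off in (0, m):
        for i in range(m):
            for j in range(m):
                if i != j:
                    A[off + i][off + j] = 1.0
    A[m - 1][m] = A[m][m - 1] = 1.0
    return A

def gap_and_conductance(A):
    n = len(A)
    deg = [sum(row) for row in A]
    vol = sum(deg)
    # P = D^{-1} A is similar to D^{-1/2} A D^{-1/2}, so both have the same eigenvalues.
    S = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    lam2 = jacobi_eigenvalues(S)[-2]
    # With pi_i = deg_i / vol, Q(S, S^c) / pi(S) = cut(S) / vol(S); keep pi(S) <= 1/2.
    phi = math.inf
    for mask in range(1, 2 ** n - 1):
        members = [i for i in range(n) if mask >> i & 1]
        vol_s = sum(deg[i] for i in members)
        if 2 * vol_s > vol:
            continue
        cut = sum(A[i][j] for i in members for j in range(n) if not mask >> j & 1)
        phi = min(phi, cut / vol_s)
    return 1.0 - lam2, phi

gap, phi = gap_and_conductance(barbell_adjacency(4))
print("phi^2/2 =", phi ** 2 / 2, " gap =", gap, " 2*phi =", 2 * phi)
```

Here Φ = 1/13 (a single bridge edge against a clique of volume 13), and the spectral gap 1 - λ_2 ≈ 0.113 lies between Φ^2/2 ≈ 0.003 and 2Φ ≈ 0.154.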
To illustrate the tightness of our bounds, we consider the approximation ratio, i.e., the ratio of these bounds on a logarithmic scale
[TABLE]
We define and similarly for the uniform weights.
The closer the ratios and are to 1, the better the approximation. In Table 6, we illustrate the tightness of our bounds for -barbell graphs consisting of cliques, each with nodes, where we report , . We also display the ratios and for uniform weights, computed in the same way. All the ratios lie in a reasonable range (in the interval ), with the lower bounds being tighter than the upper bounds. These results show that the conductance-based analysis yields useful approximations. In particular, the lower bounds become tighter ( increases) as the number of nodes in the graph grows.
As an additional experiment, we also computed the eigenvalues of and with the standard eigenvalue solver in Matlab 2021a (using the function eig with default settings). Using the second largest eigenvalues of and , we compute the times and required for both approaches. From (31), we see that
[TABLE]
In Figure 7, we plot the ratio on the right-hand side for the -barbell graph, denoted as . For different fixed values of , we vary and observe that the ratio is always larger than 1 and grows as increases. This shows that ER weights admit better (smaller) averaging times, especially for large networks, with the performance gain becoming more significant as the number of nodes increases. These experiments thus confirm the superiority of ER weights over uniform weights from a numerical perspective as well.
Normalized D-RK Algorithm
The D-RK method for computing effective resistances in a decentralized way and its normalized version, which we call normalized D-RK, were introduced in [14], where the authors show that these methods converge linearly with rates
[TABLE]
respectively, where denotes the smallest positive eigenvalue and is a normalization matrix defined as
[TABLE]
Based on numerical evidence, it was conjectured in [14] that normalized D-RK is faster than D-RK, i.e., . We first provide a technical result; Proposition 19 below then proves this conjecture.
Lemma 18.
The Laplacian has the following property: , where is defined by (34).
Proof.
Note that , where we used the fact that for all . Applying the arithmetic-harmonic mean inequality to the sequence , we obtain
\frac{1}{n}\|\mathcal{L}\|_{F}^{2}=\frac{1}{n}\sum_{i=1}^{n}s_{i}\geq n\Big[\sum_{i=1}^{n}\frac{1}{s_{i}}\Big]^{-1}.
We conclude by multiplying both sides by . ∎
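The arithmetic-harmonic mean step can be checked exactly on a small Laplacian. In this sketch we take s_i to be the squared Euclidean norm of the i-th row of the Laplacian, which is consistent with ‖L‖_F^2 = Σ_i s_i in the proof; the normalization matrix in (34) is not reproduced, and the unweighted barbell graph with two 4-cliques is only an example of our choosing.

```python
from fractions import Fraction

def barbell_laplacian(m):
    """Unweighted Laplacian of two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    L = [[0] * n for _ in range(n)]
    edges = [(off + i, off + j) for off in (0, m) for i in range(m) for j in range(i + 1, m)]
    edges.append((m - 1, m))
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

L = barbell_laplacian(4)
n = len(L)
s = [Fraction(sum(v * v for v in row)) for row in L]  # s_i = squared norm of row i (assumed)
frob_sq = sum(s)                                      # ||L||_F^2 = sum_i s_i
am = frob_sq / n                                      # arithmetic mean of the s_i
hm = n / sum(1 / si for si in s)                      # harmonic mean of the s_i
print("s =", [int(v) for v in s], " AM =", am, " HM =", hm, " AM >= HM:", am >= hm)
```

Here s = (12, 12, 12, 20, 20, 12, 12, 12), so the arithmetic mean 14 exceeds the harmonic mean 40/3, exactly as the inequality requires.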
Now we are ready to prove our conjecture.
Proposition 19.
For defined by (34), the following inequality holds: . Then, it follows that where and are defined by (33).
Proof.
Since and are symmetric matrices, so are and . Let and denote the eigenvalues of these matrices sorted in increasing order, i.e., is the largest eigenvalue and is the smallest one. By the eigenvalue interlacing result in [56, Chapter 2, Eq. (2.0.7)], we obtain (Footnote 12: We set and for in Eq. (2.0.7) in [56].)
[TABLE]
where all the matrices have non-negative real eigenvalues, as both and are symmetric with non-negative eigenvalues. Clearly, . Furthermore, the eigenvalues of and are the same (Footnote 13: If is an eigenvector of the latter matrix corresponding to a non-zero eigenvalue , then is a right eigenvector of the former matrix with the same eigenvalue; similarly, if is a right eigenvector of corresponding to a nonzero eigenvalue , then is an eigenvector of with the same eigenvalue.). Therefore, since is positive semidefinite with , we also have
[TABLE]
Moreover, is a diagonal matrix with diagonal entries ; therefore, the eigenvalues of are given by with . Hence, (35) is equivalent to
[TABLE]
where the inequalities follow from Lemma 18 and the fact that , since is a connected graph, where denotes the smallest positive eigenvalue. From (36) and (37), we conclude that is the smallest positive eigenvalue of , i.e.,
[TABLE]
Finally, using the fact that the eigenvalues of and are the same once again, we get . Combining this with (37) and (38) leads to
[TABLE]
which directly implies . This completes the proof. ∎
Proof of Proposition 8
The proof follows by adapting the proof of [37, Proposition 5.1] to our setting with minor modifications. It is based on exploiting the symmetry group properties of the barbell graph with algebraic techniques. We first give relevant background material below before going into the details of the proof.
Background Material
Consider a weighted graph . A permutation is a mapping that rearranges the vertices, i.e., a bijection from the node set to itself. We consider a permutation group , i.e., a group whose elements are permutations of and whose group operation is the composition of permutations in . By the group property, if two permutations , then their composition ; in particular, the identity permutation, which maps every element of to itself, is also contained in . The group of all permutations defined on is denoted as .
The direct product of two groups is defined as the group that consists of elements from the Cartesian product of and with the elementwise composition, i.e. if and only if and and if and then the composition operation over is defined as . A subgroup of a group is normal if for all and we have . The semidirect product of two groups and is the group that consists of elements with and and the subgroup is normal in with the condition . The orbit of an element , under a permutation group is the set . In other words, the orbit of node is the set of vertices that can be mapped to by an element of the permutation group . This definition creates an equivalence relation on ; for , we say if . In particular, equivalence classes form a partition of .
A permutation is called an automorphism of the weighted graph if the weight matrix is invariant under , i.e., if . From this definition, an automorphism also satisfies , where is the transition probability. We are interested in such permutations that preserve the structure of and therefore . The group of all automorphisms, with composition of permutations as the group operation, is called the automorphism group of the graph and is denoted by . Let be a subgroup of and consider the orbits under the permutation group , which partition the set . We define the orbit graph to be the graph whose vertices are the equivalence classes for , and we consider an induced Markov chain on the orbit graph with transition probabilities defined as
[TABLE]
This Markov chain is also called the orbit chain. It can be shown that the definition of the weights above does not depend on the choice of the element from the set (see e.g. [37]).
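The orbit-chain construction can be made concrete. The sketch below builds the orbit chain (39) from every representative of each orbit, checks that the representative does not matter, and verifies the identity PΠ = ΠP̂, where Π is the n × k orbit-indicator matrix. This identity is what makes every eigenvalue of P̂ an eigenvalue of P: if P̂v̂ = λv̂, then P(Πv̂) = λ(Πv̂), and Πv̂ ≠ 0. For simplicity we assume an unweighted barbell graph without self-loops rather than the general weights of Proposition 8; function names are hypothetical.

```python
from fractions import Fraction

def barbell_transition(m):
    """Exact P = D^{-1} A for two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    adj = [[0] * n for _ in range(n)]
    for off in (0, m):
        for i in range(m):
            for j in range(m):
                if i != j:
                    adj[off + i][off + j] = 1
    adj[m - 1][m] = adj[m][m - 1] = 1
    return [[Fraction(a, sum(row)) for a in row] for row in adj]

def orbit_chain(P, orbits):
    """P_hat(O_a, O_b) = sum_{y in O_b} P(x, y), required to agree for every x in O_a."""
    k = len(orbits)
    P_hat = [[Fraction(0)] * k for _ in range(k)]
    for a, Oa in enumerate(orbits):
        for b, Ob in enumerate(orbits):
            values = {sum(P[x][y] for y in Ob) for x in Oa}
            if len(values) != 1:
                raise ValueError("transition mass depends on the representative")
            P_hat[a][b] = values.pop()
    return P_hat

def lifts(P, P_hat, orbits):
    """Check P Pi == Pi P_hat, where Pi[x][b] = 1 if node x lies in orbit b."""
    n, k = len(P), len(orbits)
    Pi = [[1 if x in orbits[b] else 0 for b in range(k)] for x in range(n)]
    lhs = [[sum(P[x][y] * Pi[y][b] for y in range(n)) for b in range(k)] for x in range(n)]
    rhs = [[sum(Pi[x][a] * P_hat[a][b] for a in range(k)) for b in range(k)] for x in range(n)]
    return lhs == rhs

m = 5
P = barbell_transition(m)
left, right = set(range(0, m - 1)), set(range(m + 1, 2 * m))
orbits4 = [left, {m - 1}, {m}, right]  # four orbits, as in part c)
orbits2 = [left | right, {m - 1, m}]   # two orbits, as in part a)
P4, P2 = orbit_chain(P, orbits4), orbit_chain(P, orbits2)
lam = P2[0][0] + P2[1][1] - 1          # the 2x2 chain has eigenvalues 1 and trace - 1
print(lifts(P, P4, orbits4), lifts(P, P2, orbits2), lam)
```

For m = 5 the two-orbit chain is [[3/4, 1/4], [4/5, 1/5]], whose non-unit eigenvalue trace - 1 = -1/20 is therefore also an eigenvalue of the 10 × 10 matrix P.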
Proof
First, we consider the automorphism group of the barbell graph with edge weights given by Proposition 8. Consider the nodes and that connect the complete subgraphs of the barbell graph and, without loss of generality, enumerate the nodes so that , any node is in the complete subgraph on the left-hand side, and any node is in the complete subgraph on the right-hand side. The symmetry of shows that taking any two nodes from a complete subgraph and permuting them gives an automorphism. Similarly, swapping the two complete subgraphs is an automorphism; i.e., the permutation that maps is an automorphism. It follows from these observations that the automorphism group of is the group (see also [37] for more details). It is known that, for any subgroup of the automorphism group, every eigenvalue of the transition matrix defined by (39) is also an eigenvalue of the transition matrix (see, e.g., [37, Section 3]). Note that the square matrix has dimension , where , so the set of eigenvalues of is a subset of the set of all eigenvalues of . We use this result to prove Proposition 8. Next, we consider the eigenvalues of the transition matrices of the orbit chains under subgroups of :
a) The orbit chain under (Figure 8) has the transition matrix . Since is an eigenvalue and the trace equals the sum of the eigenvalues, the other eigenvalue of this matrix is given by .
b) Consider the orbit chain under illustrated on the left panel of Figure 9.
This orbit graph has two orbits under the permutation : one contains only one node (the node with a self-loop with weight ), and the other contains the remaining nodes. Notice that the latter orbit has identical elements; therefore, the permutation group fixes one of the nodes having a loop with weight and permutes the remaining nodes among themselves without affecting the orbit with one node. Hence, by [37, Theorem 3.1], the eigenvalues of the transition matrix of the orbit graph obtained by the permutation group (illustrated on the right panel of Figure 9) are also eigenvalues of the transition matrix . The transition matrix is with three eigenvalues, including and , which we already found in part . The third eigenvalue can be computed from the transition matrix of the orbit chain under :
[TABLE]
where we use to denote the entries of this matrix that are not relevant to our discussion. In particular, the eigenvalues of this matrix are , and ; the latter is an eigenvalue of with multiplicity . Again, using the fact that the trace of a matrix equals the sum of its eigenvalues, we obtain
[TABLE]
c) Lastly, the orbit chain under consists of four orbits: the points in the left and right complete graphs and the vertices and , as illustrated in Figure 10.
This orbit chain has the transition matrix of the form
[TABLE]
A straightforward computation shows that this matrix has the eigenvalues , where
[TABLE]
and S=\bigg(\frac{F}{B+F}+\frac{G-A}{A+E+G}\bigg)^{2}-\frac{4(FG-BE-AF)}{(B+F)(A+E+G)}.
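For the uniform-weight barbell graph without self-loops (our simplifying assumption), the quadratic behind λ± in part c) can be written down explicitly. Swapping the two cliques maps the four-orbit chain to itself, so vectors of the form (a, b, -b, -a) on the orbits (left clique, left bridge node, right bridge node, right clique) form a subspace invariant under P, on which the chain acts as a 2 × 2 matrix M. Its eigenvalues are λ± = ½(tr M ± √(tr(M)^2 - 4 det M)), which has the same structure as the formula above; the mapping to the paper's A, B, E, F, G is not reproduced here. The sketch lifts each eigenvector of M back to the full graph and checks Pv = λv; function names are hypothetical.

```python
import math

def barbell_transition(m):
    """Simple random walk P = D^{-1} A on two K_m cliques (0..m-1, m..2m-1) joined by the bridge (m-1, m)."""
    n = 2 * m
    adj = [[0.0] * n for _ in range(n)]
    for off in (0, m):
        for i in range(m):
            for j in range(m):
                if i != j:
                    adj[off + i][off + j] = 1.0
    adj[m - 1][m] = adj[m][m - 1] = 1.0
    return [[a / sum(row) for a in row] for row in adj]

def antisymmetric_block(m):
    """Action of P on vectors equal to a on the left clique, b on node m-1, -b on node m, -a on the right clique."""
    return [[(m - 2) / (m - 1), 1 / (m - 1)],  # non-bridge node: m-2 neighbors in its orbit, 1 bridge node
            [(m - 1) / m, -1 / m]]             # node m-1: m-1 clique neighbors, 1 neighbor holding -b

def lambda_pm(M):
    """Roots of the characteristic polynomial of a 2x2 matrix with real eigenvalues."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def lifted_residual(m, lam, M):
    """Lift the eigenvector (1, b) of M to the full graph and return max |P v - lam v|."""
    b = (lam - M[0][0]) / M[0][1]  # first row of (M - lam I)(1, b)^T = 0
    v = [1.0] * (m - 1) + [b, -b] + [-1.0] * (m - 1)
    P = barbell_transition(m)
    n = 2 * m
    Pv = [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]
    return max(abs(Pv[x] - lam * v[x]) for x in range(n))

for m in (4, 5, 8):
    M = antisymmetric_block(m)
    lp, lm = lambda_pm(M)
    print(m, lp, lm, lifted_residual(m, lp, M), lifted_residual(m, lm, M))
```

For m = 5 this gives λ± = (11 ± √681)/40, and λ+ ≈ 0.927 is the second-largest eigenvalue of P in this uniform-weight example.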
Remark 20.
Boyd et al. [37] studied the case where similar orbit chains and graphs arise. The proof of Proposition 8 given here is a minor modification of the original proof of Boyd et al. [37, Proposition 2.2] and extends it to the more general case where or can be strictly positive.
References
- [1] D. J. Klein. Resistance-distance sum rules. Croatica Chemica Acta, 75(2):633–649, 2002.
- [2] D. J. Klein and M. Randić. Resistance distance. Journal of Mathematical Chemistry, 12(1):81–95, 1993.
- [3] A. Ghosh, S. Boyd, and A. Saberi. Minimizing effective resistance of a graph. SIAM Review, 50(1):37–66, 2008.
- [4] D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs, 2014. Unfinished monograph, available at: http://www.stat.berkeley.edu/~aldous/RWG/book.html.
- [5] P. G. Doyle and J. L. Snell. Random walks and electric networks. Mathematical Association of America, 1984.
- [6] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.
- [7] Rajat Chandra Mishra and Himadri Barman. Effective resistances of two-dimensional resistor networks. European Journal of Physics, 42(1):015205, Dec 2020.
- [8] M. A. Jafarizadeh, R. Sufiani, and S. Jafarizadeh. Calculating effective resistances on underlying networks of association schemes. Journal of Mathematical Physics, 49(7):073303, Jul 2008.
