Randomized Load Balancing on Networks with Stochastic Inputs
Leran Cai, Thomas Sauerwald

TL;DR
This paper analyzes the average-case performance of load balancing algorithms on various network topologies with stochastic inputs, providing bounds on discrepancy that highlight differences from worst-case scenarios.
Contribution
It introduces new bounds on load discrepancy for multiple network types under stochastic inputs, extending previous worst-case analyses to average-case scenarios.
Findings
Bounds on discrepancy for cycles, tori, hypercubes, and expanders.
Significant difference between worst-case and average-case convergence.
Applicable to various probability distributions including unbounded ones.
Abstract
Iterative load balancing algorithms for indivisible tokens have been studied intensively in the past. Complementing previous worst-case analyses, we study an average-case scenario where the load inputs are drawn from a fixed probability distribution. For cycles, tori, hypercubes and expanders, we obtain almost matching upper and lower bounds on the discrepancy, the difference between the maximum and the minimum load. Our bounds hold for a variety of probability distributions including the uniform and binomial distribution but also distributions with unbounded range such as the Poisson and geometric distribution. For graphs with slow convergence like cycles and tori, our results demonstrate a substantial difference between the convergence in the worst- and average-case. An important ingredient in our analysis is new upper bound on the t-step transition probability of a general Markov…
Click any figure to enlarge with its caption.
Figure 1
Figure 2Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\Copyright
Leran Cai and Thomas Sauerwald
Randomized Load Balancing on Networks with Stochastic Inputs
Leran Cai
University of Cambridge, email: (lc647|tms41)@cl.cam.ac.uk
Thomas Sauerwald
University of Cambridge, email: (lc647|tms41)@cl.cam.ac.uk
Abstract.
Iterative load balancing algorithms for indivisible tokens have been studied intensively in the past, e.g., [22, 19, 25]. Complementing previous worst-case analyses, we study an average-case scenario where the load inputs are drawn from a fixed probability distribution. For cycles, tori, hypercubes and expanders, we obtain almost matching upper and lower bounds on the discrepancy, the difference between the maximum and the minimum load. Our bounds hold for a variety of probability distributions including the uniform and binomial distribution but also distributions with unbounded range such as the Poisson and geometric distribution. For graphs with slow convergence like cycles and tori, our results demonstrate a substantial difference between the convergence in the worst- and average-case. An important ingredient in our analysis is new upper bound on the -step transition probability of a general Markov chain, which is derived by invoking the evolving set process.
Key words and phrases:
random walks, randomized algorithms, parallel computing
1991 Mathematics Subject Classification:
G.3 Probability and Statistics
1. Introduction
In the last decade, large parallel networks became widely available for industrial and academic users. An important prerequisite for their efficient usage is to balance their work efficiently. Load balancing is known to have applications to scheduling [28], routing [8], numerical computation such as solving partial differential equations [30, 29, 27], and finite element computations [13]. In the standard abstract formulation of load balancing, processors are represented by nodes of a graph, while links are represented by edges. The objective is to balance the load by allowing nodes to exchange loads with their neighbors via the incident edges. In this work we will study a decentralized and iterative load balancing protocol where a processor knows only its current load and that of the neighboring processors and based on this, decides how much load should be sent (or received).
Load Balancing Models. A widely used approach is diffusion, e.g., the first-order-diffusion scheme [8, 19], where the amount of load sent along each edge in each round is proportional to the load difference between the incident nodes. In this work, we consider the alternative, the so-called matching model, where in each round only the edges of the matching are used to average the load locally. In comparison to diffusion, the matching model reduces the communication in the network and moreover tends to behave in a more “monotone” fashion than diffusion, since it avoids concurrent load exchanges which may increase the maximum load or decrease the minimum load in certain cases.
We measure the smoothness of the load distribution by the so-called discrepancy which is the difference between the maximum and minimum load among all nodes. In view of more complex scenarios where jobs are eventually removed or new jobs are generated, the discrepancy seems to be a more appropriate measure than the makespan, which only considers the maximum load.
Many studies in load balancing assume that load is arbitrarily divisible. In this so-called continuous case, load balancing corresponds to a Markov chain on the graph and one can resort to a wide range of established techniques to analyze the convergence speed [6, 10, 19]. In particular, the spectral gap captures the time to reach a small discrepancy fairly accurately, e.g., see [26, 22] for the diffusion and see [7, 18] for the matching model.
However, in many applications a processor’s load may consist of tasks which are not further divisible, which is why the continuous case has been also referred to as “idealized case” [22]. A natural way to model indivisible tasks is the unit-size token model where one assumes a smallest load entity, the unit-size token, and load is always represented by a multiple of this smallest entity. In the following, we will refer to the unit-size token model as the discrete case.
Initiated by the work of [22], there has been a number of studies on load balancing in the discrete case. Unlike [22], [25] analyzed a randomized rounding based strategy, meaning that an excess token will be distributed uniformly at random among the two communicating nodes. The authors of [25] proved that with this strategy the time to reach constant discrepancy in the discrete case is essentially the same as the corresponding time in the continuous case. Their results hold both for the random matching model, where in each round a new random matching is generated by a simple distributed protocol, and the balancing circuit model (a.k.a. dimension exchange), where a fixed sequence of matching is applied periodically. In this work, we will focus on the balancing circuit model, which is particularly well suited for highly structured graphs such as cycles, tori or hypercubes.
Worst-Case vs. Average-Case Inputs. Previous work has almost always adopted the usual worst-case framework for deriving bounds on the load discrepancy [22]. That means that any upper bound on the discrepancy holds for an arbitrary input, i.e., an arbitrary initial load vector. While it is of course very natural and desirable to have such general bounds, the downside is that for graphs with poor expansion like cycles or 2D-tori, the convergence is rather slow, i.e., quadratic or linear in the number of nodes .
This serves as a motivation to explore an average-case input. Specifically, we assume that the number of load items at each node is sampled independently from a fixed distribution. Our main results demonstrate that the convergence of the load vector is considerably quicker (measured by the load discrepancy), especially on networks with slow convergence in the worst-case such as cycles and 2D-tori.
We point out that many related problems including scheduling on parallel machines or load balancing in a dynamic setting (meaning that jobs are continuously added and processed) have been studied under random inputs, e.g., [4, 11, 2]. To the best of our knowledge, only very few works have studied this question in iterative load balancing. One exception is [23], which investigated the performance of continuous load balancing on tori in the diffusion model. In contrast to this work, however, only upper bounds are given and they hold for the multiplicative ratio between maximum and minimum load, rather than the discrepancy.
Our main results in this paper hold for all distributions satisfying the following definition, which is satisfied by the uniform, binomial, Poisson and geometric distribution (see Section 2).
Definition 1.1**.**
We say that a distribution over is exponentially concentrated if there is a constant so that for any , ,
[TABLE]
where and are the expectation and variance of . In the following, we refer to average-case when the initial number of load items on each vertex is drawn independently from a fixed exponentially concentrated distribution.
Our Results. Our first contribution is a general formula that allows us to express the load difference between an arbitrary pair of nodes in round . Here the round matrix is the product of the matching matrices that are applied periodically (cf. Section 2).
Theorem 1.2**.**
Consider the balancing circuit model with an arbitrary round matrix in the average case. Then for any pair of nodes and round , it holds for any that
[TABLE]
Further, for any pair of vertices and any round satisfying ,
[TABLE]
The proof of the upper bound Theorem 1.2 is the easier direction, and it relies on a previous result relating continuous and discrete load balancing from [25]. The lower bound is technically more challenging and applies a generalized version of the central limit theorem.
Together, the upper and lower bound in the above result establish that the load deviation between any two nodes and is essentially captured by . However, in some instances it might be desirable to have a more tangible estimate at the expense of generality. A first step towards this goal is to observe that (see Lemma 4.1). Hence we are left with the problem of understanding the -step probablity vector .
For reversible Markov chains, the last expression has been analyzed in several works, e.g., a result from [15, Lemma 3.6] implies that for random walks on graphs, (cf. [15]). However, the Markov chain associated to is not reversible in general. For irreversible Markov chains, [14] use the so-called evolving set process to derive a similar bound. Specifically, they proved in [14, Theorem 17.17] that if denotes the transition matrix of a lazy random walk (i.e., a random walk with loop probability at least ) on a graph with maximal degree , then for any vertex :
[TABLE]
where is the stationary distribution of . Such estimates have been used in various applications besides load balancing, including distributed random walks and spanning tree enumeration [24, 15]. Here we generalize this result to Markov chains with an arbitrary loop probability and to arbitrary -step transition probabilities:
Theorem 1.3**.**
Let be the transition matrix of an irreducible Markov chain and its stationary distribution. Then we have for all states and step ,
[TABLE]
where and .
Applying this bound to a round matrix that is formed of matchings, we obtain It should be noted that [25, Lemma 2.5] proved a weaker version where the upper bound is only instead of . As we will prove in Lemma 5.3, the bound is asymptotically tight if we consider the balancing circuit model on cycles.
Combining the bound in Theorem 1.3 with the upper bound in Theorem 1.2 yields:
Theorem 1.4**.**
Consider the balancing circuit model with an arbitrary round matrix consisting of matchings in the average case. Then the discrepancy after rounds is with probability .
Since the initial discrepancy in the average case is (see Lemma 2.2), Theorem 1.4 implies that in the average case, there is a signficant decrease (roughly of order ) in the discrepancy, regardless of the underlying topology.
For round matrices with small second largest eigenvalue, the next result provides a significant improvement:
Theorem 1.5**.**
Consider the balancing circuit model with an arbitrary round matrix consisting of matchings in the average case. Then the discrepancy after rounds is with probability .
Hence for graphs where is bounded away from , we even obtain an exponential convergence.
In Section 5, we derive bounds on the discrepancy for cycles, -dim. Torus, expanders and hypercubes. A summary of these results can be found in Figure 1.
Finally, we discuss our results and contrast them to the convergence of the discrepancy in the worst-case in Section 6. On a high level, these results demonstrate that on all the considered topologies, we have much faster convergence in the average-case than in the worst-case. However, if we are only interested in the time to achieve a very small, say, constant or poly-logarithmic discrepancy, then we reveal an interesting dichotomy: we have a quicker convergence than in the worst-case if and only if the standard deviation is smaller than some threshold, which depends on the actual toplogy. We observe the same phenomena in our experiments, which are also discussed in Section A.
2. Notation and Background
We assume that is an undirected, connected graph with nodes labelled in . Unless stated otherwise, all logarithms are to the base . The notations and denote the probability of an event and the expectation of a random variable , respectively. For any -dimensional vector , denotes the discrepancy.
Matching Model. In the matching model (sometimes also called dimension exchange model), every two matched nodes in round balance their load as evenly as possible. This can be expressed by a symmetric by matching matrix , where with slight abuse of notation we use the same symbol for the matching and the corresponding matching matrix. Formally, matrix is defined by , and if , and , if is not matched.
Balancing Circuit. In the balancing circuit model, a specific sequence of matchings is applied periodically. More precisely, let be a sequence of matching matrices, also called period 111Note that may be different from the maximal degree (or degree) of the underlying graph.. Then in step , we apply the matching matrix . We define the round matrix by . If is symmetric, we define to be its second largest eigenvalue (in absolute value). Following [22], if is not symmetric (which is usually the case), we define as the second largest eigenvalue of the symmetric matrix , where is the transpose of . We always assume that , which is guaranteed to hold if the matrix is irreducible. Notice that since is doubly stochastic, all powers of are doubly stochastic as well. A natural choice for the matching matrices is given by an edge coloring of . There are various efficient distributed edge coloring algorithms, e.g. [21, 20].
Balancing Circuit on Specific Toplogies. For hypercubes, the canonical choice is dimension exchange consisting of matching matching matrices by if and only if the bit representation of and differ only in bit . Then the round matrix is defined by . For cycles, we will consider the natural “Odd-Even”-scheme meaning that for , the matching consists of all edges for any odd , while for , the matching consists of all edges for any even . More generally, for -dimensional tori with vertex set , we will have matchings in total, meaning that for every dimension we have two matchings along dimension , similar to the definition of matchings for the cycle.
The Continuous Case. In the continuous case, load is arbitrarily divisible. Let be the initial load represented as a row vector, and in every round two matched nodes average their load perfectly. We consider the load vector after rounds in the balancing circuit model (that means, after the executions of matchings in total). This process corresponds to a linear system and , can be expressed as , which results in .
The Discrete Case. Let us now turn to the discrete case with indivisible, unit-size tokens. Let be the initial load vector with average load , and be the load vector at the end of round . In case the sum of tokens of the two paired nodes is odd, we employ the so-called random orientation (or randomized rounding) [22, 25]. More precisely, if there are two nodes and with load and being paired by matching , then node gets either \big{\lceil}\frac{a+b}{2}\big{\rceil} or \big{\lfloor}\frac{a+b}{2}\big{\rfloor} tokens, with probability each. The remaining tokens are assigned to node .
The Average-Case Setting. We consider a setting where each entry of the initial load vector is chosen from an exponentially concentrated probability distribution with expectation and variance (see Definition 1.1). It is not difficult to verify that many natural distributions satisfy the condition of exponentially concentrated (see the appendix for more details).
Lemma 2.1**.**
The uniform distribution, binomial distribution, geometric distribution and Poisson distribution are all exponentially concentrated.
Proof.
Note that the uniform distribution is trivially exponentially concentrated, since . However, also distributions with unbounded range may be exponentially concentrated, with one example being the geometric distribution . To verify this, first note that we have and (and so ) and thus holds trivially for a sufficiently small constant . Secondly, for the upper tail, by Markov’s inequality, , and by the memoryless property of the geometric distribution, for any , .
For the binomial distribution with expectation and standard deviation , we will assume w.l.o.g. that , so that . Then by [17, Theorem 2.3], we have for , . Choosing yields , as needed. For the lower tails, we use and obtain a similar result as before (see again [17, Theorem 2.3]).
For the Poisson Distribution , we can verify in an analogous way that it is exponentially distributed by using the following two Chernoff bounds for Poisson random variables (B.1). ∎
The definition of exponentially concentrated implies the following concentration result:
Lemma 2.2**.**
Let be an exponentially concentrated distribution and let . Then,
[TABLE]
In particular, the initial discrepancy satisfies with probability at least .
The advantage of Lemma 2.2 is that we can use a simple conditioning trick to work with distributions that have a finite range and are therefore easier to analyze with concentration tools like Hoeffding’s inequality (Theorem B.3). That is in the analysis we simply work with a bounded-range distribution , which is the distribution under the condition that only values in the interval occur.
3. Proof of the General Bound (Theorem 1.2)
See 1.2
3.1. Proof of Theorem 1.2 (Upper Bound)
We will use the following result from [25] that bounds the deviation between the continuous and discrete load, assuming that we have .
Theorem 3.1** ([25, Theorem 3.6()]).**
Consider the balancing circuit model with an arbitrary round matrix . Then for any round it holds that
[TABLE]
Proof of Theorem 1.2 (Upper Bound).
Recall that the initial vector consists of i.i.d. random variables. As explained at the end of Section 2, we condition on the event
[TABLE]
By Lemma 2.2, . In the remainder of the proof, all random variables are conditional on , but for simplicity we will not explicitly express this conditioning.
Since , the load is just a weighted sum of i.i.d. random variables and we obtain
[TABLE]
which is in fact still a sum of i.i.d. random variables. The expectation is
[TABLE]
where the last equality holds since is doubly stochastic.
Now applying Hoeffding’s inequality (Theorem B.3) and recalling that conditional on , the range of each is , we obtain that
[TABLE]
Applying Theorem 3.1 yields
[TABLE]
The statement of the theorem follows by scaling and recalling that . ∎
3.2. Proof of Theorem 1.2 (Lower Bound)
The proof of the lower bound will use the following quantitative version of a central limit type theorem for independent but non-identical random variables.
Theorem 3.2** (Berry-Esseen Theorem [5, 9] for non-identical r.v.).**
Let be independently distributed with , , and . If is the distribution of and is the standard normal distribution, then
[TABLE]
where and is a constant.
With this concentration tool at hand, we are able to prove the lower bound in Theorem 1.2. Unfortunately, it appears quite difficult to apply Theorem 3.2 directly to equation (3.1), since we need a good bound on the error term . To this end, we will first partition the vertex set into buckets with equal contribution to . Then we will apply Theorem to the bucket with the largest variance, for which we can show that , thanks to the precondition that and the bound in Theorem 1.3.
Proof of Theorem 1.2 (Lower Bound).
As in the derivation of the upper bound, we first consider :
[TABLE]
Again we are dealing with a weighted sum of i.i.d. random variables with expectation and variance . As mentioned earlier, we have since is a doubly stochastic matrix. Of course, we could apply Theorem 3.2 directly to , but it appears difficult to control the error term . Therefore we will first partition the above sum into buckets where the weights of the random variables are roughly the same.
More precisely, we will partition into buckets, where for each we have for , and .
Further, let us consider the variance of :
[TABLE]
Then by the pigeonhole principle there exists an index such that
[TABLE]
Firstly, if that index is equal to , then
[TABLE]
and the lower bounds holds trivially. Therefore, we will assume in the remainder of the proof that . We now decompose into , where
[TABLE]
and
[TABLE]
Let us first analyze . We will now apply Theorem 3.2 to . In preparation for this, let us first upper bound . Using the definition of exponentially concentrated, it follows that for any constant , the first moments are all bounded from above by . Hence,
[TABLE]
Recalling that for any , , we can simplify the above expression as follows:
[TABLE]
However, since we have , by Theorem 1.3, and therefore it must be that , and we conclude that .
Before applying Theorem 3.2, we scale the original distribution to . Since , we have
[TABLE]
As derived earlier , and therefore
[TABLE]
where last inequality uses [1, Formula 7.1.13]:
[TABLE]
Therefore, by substitution, we get
[TABLE]
Hence with ,
[TABLE]
Similarly, we can derive that
[TABLE]
Hence, independent of what the value is, there is still a probability of at least so that . ∎
4. Proof of the Universal Bounds (Theorem 1.4, Theorem 1.5)
In the previous section we proved that the deviation between the loads of two nodes and is essentially captured by . However, in some cases it might be hard to compute or estimate this quantity for arbitrary vertices and . Therefore we will first prove the following universal upper bound on the discrepancy that works for arbitrary graphs and pair of nodes, as stated on page 1.4.
See 1.4
4.1. Proof of Theorem 1.4
The proof of Theorem 1.4 is fairly involved and we first sketch the high level ideas. We first show that can be upper bounded in terms of the -distance to the stationary distribution.
Lemma 4.1**.**
Consider the balancing circuit model with an arbitrary round matrix . Then for all , we have Further, for any we have .
Proof.
[TABLE]
and the first statement follows. We now prove the second statement:
[TABLE]
We first look at the difference between these two terms squared. That is, for any vertex we have
[TABLE]
Now let be a uniform random variable over the set . Then it follows that
[TABLE]
Further, by linearity of expectations
[TABLE]
By definition of expectation, this implies that there exists a vertex such that
[TABLE]
[TABLE]
where the second last inequality holds since is doubly stochastic. ∎
The next step and main ingredient of the proof of Theorem 1.4 is to establish that . This result will be a direct application of a general bound on the -step probabilities of an arbitrary, possibly non-reversible Markov chain, as given in Theorem 1.3 from page 1.3:
See 1.3
In this subsection we prove Theorem 1.4, assuming the correctness of Theorem 1.3 whose proof is deferred to Section 4.2.
Proof of Theorem 1.4.
By Theorem 1.2 and Lemma 4.1, we obtain
[TABLE]
Hence we can find a so that the latter probability gets smaller than . Further, by applying Theorem 1.3 with to we conclude that since . Using the fact , and by the union bound, with probability at least . ∎
4.2. Proof of Theorem 1.3
This section is devoted to the proof of Theorem 1.3. Our proof is based on the evolving-set process, which is a Markov chain based on any given irreducible, not necessarily reversible Markov chain on . For the definition of the evolving set process, we closely follow the exposition in [14, Chapter 17].
Let denote the transition matrix of an irreducible Markov chain and its stationary distribution. is the -step transition probability matrix. The edge measure is defined by and .
Definition 4.2**.**
Given a transition matrix , the evolving-set process is a Markov chain on subsets of defined as follows. Suppose the current state is . Let be a random variable which is uniform on . The next state of the chain is the set
[TABLE]
This chain is not irreducible because and are absorbing states. It follows that
[TABLE]
since the probability that is equal to the probability of the event that the chosen value of is less than .
Proposition 4.3** ([14, Proposition 17.19]).**
Let be a non-negative martingale with respect to , and define Assume that for any
- (i)
For , , and 2. (ii)
.
Let . If is a constant, then
We now generalize [14, Lemma 17.14] to cover arbitrarily small loop probabilities.
Lemma 4.4**.**
Let be a sequence of independent random variables, each uniform on , such that is generated from using . Then with ,
[TABLE]
We list a few auxiliary results from [14] about the evolving set process that will be used to prove the result.
Lemma 4.5** ([14, Lemma 17.12]).**
If is the evolving-set process associated to the transition matrix , then for any time and
[TABLE]
Recall that is the evolving-set process based on the Markov chain whose transition matrix is . means the probability of the event with the initial state of the evolving set being .
Lemma 4.6** ([14, Lemma 17.13]).**
The sequence is a martingale.
Theorem 4.7** ([14, Corollary 17.7]).**
Let be a martingale and a stopping time. If and for all and some constant where , then .
Proof of Lemma 4.4.
Given , the distribution of is uniform on .
**Case 1: **For , we know that for satisfying
[TABLE]
and for satisfying ,
[TABLE]
We know that
[TABLE]
Since if and only if , we therefore can combine the above results by using an inequality and conclude that
[TABLE]
because and .
Case 2: For , we have , it follows that when
[TABLE]
We have
[TABLE]
Based on previous results, we can see that
[TABLE]
By Lemma 4.6 and the formulas above,
[TABLE]
Rearranging shows that
[TABLE]
∎
The derivation of the next lemma closely follows the analysis in [14, Chapter 17]. For the sake of completeness, a proof can be found in the appendix.
Lemma 4.8**.**
For any two states ,
Proof.
First of all, let the hitting time
[TABLE]
We have and . We consider an evolving set process with . By Theorem B.2 and Lemma 4.6,
[TABLE]
For the last equality, it is true because we know that can only be or . Hence, the probability that is an element in is equal to the probability that is . Note that here the second in the last line can be any other element in . For example, we also know that
[TABLE]
For our bound, we know that by Lemma 4.5 and (4.3),
[TABLE]
By (4.4),
[TABLE]
By simple substitution we obtain
[TABLE]
The last line is true because we remove all possible intersections. ∎
Now we want to use Proposition 4.3 to bound . To apply it, we substitute the following parameters: is chosen to be , is , and . Hence in our case, is the same as (or ) in the proposition. The following two lemmas elaborate on the two preconditions (i) and (ii) of Proposition 4.3.
Lemma 4.9**.**
For any time and ,
Proof.
Conditioning always reduces variance and or , we have
[TABLE]
For ,
[TABLE]
and by Lemma 4.4, we know that
[TABLE]
For simplicity, we let be , be and be . Then we have
[TABLE]
In order to derive a lower bounds on this variance, based on Lemma 4.4 we let and . With this we obtain
[TABLE]
Therefore, provided , we have
[TABLE]
The last inequality follows from the fact that if then there exist with , whence
[TABLE]
Since , we finally obtain
[TABLE]
∎
Finally, we derive an upper bound on the amount by which can increase in one iteration.
Lemma 4.10**.**
For any time and ,
Proof.
Since
[TABLE]
If decreases to [math], then every is at least connected to an . In other words, for and . Hence .
We also know that
[TABLE]
∎
The proof of Theorem 1.3 follows then by combining Proposition 4.3, Lemma 4.4, Lemma 4.8, Lemma 4.9 and Lemma 4.10.
Proof of Theorem 1.3.
With the help of the previous three lemmas, we can apply Proposition 4.3 with , and to obtain
[TABLE]
∎
4.3. Proof of Theorem 1.5
We now prove the following discrepancy bound that depends on the , as defined in Section 2.
Proof.
By [25, Lemma 2.4], for any pair of vertices , Hence by Lemma 4.1 , and the bound on the discrepancy follows from Theorem 1.2 and the union bound over all vertices. ∎
5. Applications to Different Graph Topologies
Cycles. Recall that for the cycle, is the set of vertices, and the distance between two vertices is for any pair of vertices .
The upper bound on the discrepancy follows directly from Theorem 1.4, and it only remains to prove the lower bound. To this end, we will apply the lower bound in Theorem 1.2 and need to derive a lower bound on . Intuitively, if we had a simple random walk, we could immediately infer that this quantity is , since after steps, the random walk is with probability at any vertex with distance at most . To prove that this also holds for the load balancing process, we first derive a concentration inequality that upper bounds the probability for the random walk to reach a distant state:
Lemma 5.1**.**
Consider the standard balancing circuit model on the cycle with round matrix . Then for any and , we have
[TABLE]
Proof.
The proof of the lemma above makes uses of the following variant of Azuma’s concentration inequality for martingales, which can be for instance found in McDiarmid’s survey on concentration inequalities.
Lemma 5.2** ([17, Theorem 3.13 & Inequality 41]).**
Let be a martingale difference sequence with for each , for suitable constants . Then for any ,
[TABLE]
Note that the balancing circuit on the cycle corresponds to the following random walk on the vertex set , where for any time-step , denotes the position of the random walk after step . First, we consider the transition for any odd : If is odd, then with probability , and otherwise . If is even, then with probability , and otherwise (additions and subtractions are under the implicit assumptions that and ). The case for even is analogous.
We will couple the random walk with another random walk on the integers , where again denotes the position of the walk after step . The transition probabilities are exactly the same as for the walk , the only difference is that we don’t use the equivalences and . It is clear that we can couple the transitions of the two walks so that they evolve identically as long as the walks do not reach any of the two boundary points or .
Let us first analyze for an odd time step. As described above, the distribution of depends on whether is even or not. However, notice regardless of where the random walk is at step , the random walk will be at an odd or even vertex at step with probability each. Hence for any starting position ,
[TABLE]
and further,
[TABLE]
Combining the last two inequalities shows that for any start vertex ,
[TABLE]
With the same arguments as before we conclude that for any fixed start vertex ,
[TABLE]
because the expected differences of are all zero whenever .
Let us now consider the martingale , and let be the corresponding martingale difference sequence. As shown before, . Hence by Lemma 5.2,
[TABLE]
If for every , holds, then this implies both random walks and behave identically since none of them ever reaches any of the two boundary points or . In particular we conclude that for the original walk ,
[TABLE]
where the second-to-last inequality is due to the fact that . ∎
With the help Lemma 5.1, we can indeed verify our intuition:
Lemma 5.3**.**
Consider the standard balancing circuit model on the cycle with round matrix . Then for any vertex , .
Proof.
Define , so that . With and
[TABLE]
By Cauchy-Schwarz inequality,
[TABLE]
∎
Lemma 5.3 also proves that the factor in the upper bound in Theorem 1.3 is best possible. The lower bound on the discrepancy now follows by combining Lemma 5.3 with Theorem 1.2 and Lemma 4.1 stating that for any vertex , there exists another vertex such that .
Tori. In this section we consider -dimensional tori, where is any constant. For the upper bound, note that the computation of can be decomposed to independent computations in the dimensions, and each dimension has the same distribution as the cycle on vertices. Specifically, if we denote by the round matrix of the standard balancing circuit scheme on the cycle with vertices and is the round matrix of the -dimensional torus with vertices, then for any pair of vertices on the torus we have From Theorem 1.3, , and therefore, since is constant,
[TABLE]
and thus for any pair of vertices . Hence by Lemma 4.1, . Plugging this bound into Theorem 1.2 yields that the load difference between any pair of the nodes and at round is at most with probability at least . The bound on the discrepancy now simply follows by the union bound.
We now turn to the lower bound on the discrepancy. With the same derivation as in Lemma 5.3 we obtain the following result:
Lemma 5.4**.**
Consider the standard balancing circuit model on the -dimensional torus with round matrix . Then for any vertex , .
As before, the lower bound on the torus now follows by combining Lemma 5.4 with the general lower bound given in Theorem 1.2.
Expanders. The upper bound for expanders follows immediately from Theorem 1.5. For the lower bound, since the round matrix consists of matchings, it is easy to verify that whenever , we have . Consequently, for any vertex , . Plugging this into Theorem 1.2 yields a lower bound on the discrepancy which is .
Hypercubes. For the hypercube, there is a worst-case bound of [16, Theorem 5.1 5.3] for any input after iterations of the dimension-exchange, i.e., after one execution of the round matrix. Hence, we will only analyze the discrepancy after matchings, where .
The derivation of the lower bound is almost analogous to the one for expanders, since for any pair of vertices , (recall that is the matching applied in the -step of the dimension exchange). The only difference is that we are counting matchings individually and not full periods. By applying the same analysis as in Theorem 1.5, but with the stronger inequality , and we obtain that the upper bound of the discrepancy is . Applying Theorem 1.2, we obtain the lower bound .
6. Discussion and Empirical Results
6.1. Average-Case versus Worst-Case
We will now compare our average-case to a worst-case scenario on cycles, 2D-tori and hypercubes. For the sake of concreteness, we always assume that the input is drawn from the uniform distribution , where will be specified later. Note that the total number of tokens is , and the initial discrepancy will be . Our choice for the worst-case load vector will have the same number of tokens and initial discrepancy, however, the exact definition of the vector as well as the choice of the parameter will depend on the underlying topology.
Cycles. As one representative of a worst-case setting, fix an arbitrary node and let all nodes with distance at most initially have a load of while all other nodes have load [math]. This gives rise to a load vector with tokens and initial discrepancy .
2D-Tori. Again, we fix an arbitrary node and assign a load of to the -nearest neighbors of and load [math] to the other nodes. Again, this defines a load vector with tokens and initial discrepancy .
The next result provides a lower bound on the discrepancy for cycles and 2D-tori in the aforementioned worst-case setting. It essentially shows that for worst-case inputs, rounds and rounds are necessary for the cycle, 2D-tori, respectively, in order to reduce the discrepancy by more than a constant factor. This stands in sharp contrast to Theorem 1.4, proving a decay of the discrepancy by , starting from the first round.
Proposition 6.1**.**
For the aforementioned worst-case setting on the cycle, it holds for any round that with probability at least . Further, for 2D-tori, it holds for any round that with probability at least .
Proof.
We first consider the case of a cycle. Let be the subset of nodes that have a non-zero initial load; so . Clearly, there is a subset of nodes with so that for each node , only nodes with can have .
We will now derive a lower bound on the discrepancy in this worst-case setting by upper bounding the load of vertices in the subset . To lower bound the discrepancy at round , recall that by Lemma 5.1 we have that
[TABLE]
Let us now choose , and we thus conclude that
[TABLE]
This implies for the total load of vertices in at time :
[TABLE]
where is the average load. Recalling that , by the pigeonhole principle there exists a node such that
[TABLE]
This immediately implies the following lower bound on the discrepancy:
[TABLE]
where is the average load. The corresponding lower bound on follows by Theorem 3.1 and the union bound.
The proof for the 2-dimensional torus is almost identical. Again, let be the set of nodes that have a non-zero load. Clearly, there is a subset with so that for each node , only nodes with can have .
Let us now view as the transition matrix of a Markov chain. Then is obtained by running two independent Markov chains (one for each dimension), where each of the two Markov chains corresponds to the round matrix of the cycle. We can still apply Lemma 5.1 as before, even though here the size of each cycle is , to obtain that
[TABLE]
Here we choose , and the remaining part of the proof is exactly the same as before. ∎
Hypercube. Regarding the hypercube, we will consider only rounds, since the discrepancy is after rounds and after rounds [16]. A natural corresponding worst-case distribution is to have load on all nodes whose -th bit is equal to one and load [math] otherwise. This way, the discrepancy is only reduced in the final round .
6.2. Experimental Setup
For each of the three graphs cycles, 2D-tori and hypercube, we consider two comparative experiments with an average-case load vector and a worst-case initial load vector each. The plots and tables on the next two pages display the results, where for each case we took the average discrepancy over 10 independent runs.
The first experiment considers a “lightly loaded case”, where the theoretical results suggest that a small (i.e., constant or logarithmic) discrepancy is reached well before the expected “worst-case load balancing times”, which are for cycles and for 2D-tori. The second experiments considers a “heavily loaded case”, where the theoretical results suggest that a small discrepancy is not reached faster than in the worst-case.
Specifically, for cycles and 2D-tori, we choose for the lightly loaded case and for the heavily loaded case . The experiments confirm the theoretical results in the sense that for both choices of , we have a much quicker convergence of the discrepancy than in the corresponding worst cases. However, the experiments also demonstrate that only in the lightly loaded case we reach a small discrepancy quickly, whereas in the heavily loaded case there is no big difference between worst-case and average-case if it comes to the time to reach a small discrepancy.
On the hypercube, since we are interested in the case where , our bounds on the discrepancy indicates that we should choose smaller than in the case of cycles and 2D-tori. That is why we choose in the lightly loaded case and in the heavily loaded case (As a side remark, we note that due to the symmetry of the hypercube, any initial load vector sampled from is equivalent to an initial load vector sampled from .) With these adjustments of in both cases, the experimental results of the hypercube are inline with the ones for the cycle and 2D-tori.
The details of the experiments containing plots and tables with the sampled discrepancies can be found on the following two pages (Section A).
Appendix A Experimental Data and Charts
t$$25$$50$$75$$100$$12512^{4}$$2^{8}$$2^{12}$$2^{16}$$2^{20}$$2^{24}$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE] t$$10^{1}$$10^{2}$$10^{3}$$10^{4}$$10^{5}$$10^{6}$$10^{7}$$10^{8}12^{4}$$2^{8}$$2^{12}$$2^{16}$$2^{20}$$2^{24}$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE] t$$100$$200$$300$$400$$50012^{4}$$2^{8}$$2^{12}$$2^{16}$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE]
Experimental Results: Experiments (i) on the cycle with and initial discrepancy , (ii) on the cycle with and initial discrepancy , and (iii) on the 2D-torus with and initial discrepancy of . For the heavily loaded case, we used logarithmic scaling on the -axis to highlight the behavior when is close to the worst-case load balancing time.
t$$10^{1}$$10^{2}$$10^{3}$$10^{4}$$10^{5}$$10^{6}$$10^{7}$$10^{8}$$10^{9}$$10^{10}12^{4}$$2^{8}$$2^{12}$$2^{16}$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE] t$$25$$50$$75$$100$$12515$$10$$15$$20$$25$$28$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE] t$$10^{1}$$10^{2}$$10^{3}$$10^{4}$$10^{5}$$10^{6}$$10^{7}$$10^{8}$$10^{9}15$$10$$15$$20$$25$$28$$\operatorname{disc}_{wc}(x^{(t)})$$\operatorname{disc}_{ac}(x^{(t)})
[TABLE]
Experimental Results (cntd.): Experiments (iv) on the 2D-torus with and initial discrepancy , (v) on the hypercube with and initial discrepancy , and (vi) on the hypercube with and initial discrepancy of . For the heavily loaded cases, we used logarithmic scaling on the -axis to highlight the behaviour when is close to the worst-case load balancing time.
Appendix B Concentration Tools
Lemma B.1** ([3, Theorem A.1.15]).**
Let have a Poisson distribution with mean . Then for any ,
[TABLE]
Theorem B.2** (Optional Stopping Theorem [14, Corollary 17.7]).**
Let be a martingale and a stopping time. If and for all and some constant where , then .
Theorem B.3** (Hoeffding’s Inequality [12]).**
Consider a collection of independent random variables with . Then for any number ,
[TABLE]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Milton Abramowitz, Irene A Stegun, et al. Handbook of mathematical functions. Applied mathematics series , 55:62, 1966.
- 2[2] Dan Alistarh, Keren Censor-Hillel, and Nir Shavit. Are lock-free concurrent algorithms practically wait-free? J. ACM , 63(4):31:1–31:20, 2016.
- 3[3] N. Alon and J. Spencer. The Probabilistic Method . Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, 2nd edition, 2000.
- 4[4] Aris Anagnostopoulos, Adam Kirsch, and Eli Upfal. Load balancing in arbitrary network topologies with stochastic adversarial input. SIAM J. Comput. , 34(3):616–639, 2005.
- 5[5] Andrew C Berry. The accuracy of the gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society , 49(1):122–136, 1941.
- 6[6] J. E. Boillat. Load balancing and poisson equation in a graph. Concurrency: Pract. Exper. , 2:289–313, 1990.
- 7[7] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized Gossip Algorithms. IEEE Transactions on Information Theory and IEEE/ACM Transactions on Networking , 52:2508–2530, 2006.
- 8[8] G. Cybenko. Load balancing for distributed memory multiprocessors. J. Parallel and Distributed Comput. , 7:279–301, 1989.
