Randomized Load Balancing on Networks with Stochastic Inputs

Leran Cai; Thomas Sauerwald

arXiv:1703.08702·cs.DC·March 28, 2017


TL;DR

This paper analyzes the average-case performance of load balancing algorithms on various network topologies with stochastic inputs, providing bounds on discrepancy that highlight differences from worst-case scenarios.

Contribution

It introduces new bounds on load discrepancy for multiple network types under stochastic inputs, extending previous worst-case analyses to average-case scenarios.

Findings

01

Bounds on discrepancy for cycles, tori, hypercubes, and expanders.

02

Significant difference between worst-case and average-case convergence.

03

Applicable to various probability distributions including unbounded ones.

Abstract

Iterative load balancing algorithms for indivisible tokens have been studied intensively in the past. Complementing previous worst-case analyses, we study an average-case scenario where the load inputs are drawn from a fixed probability distribution. For cycles, tori, hypercubes and expanders, we obtain almost matching upper and lower bounds on the discrepancy, the difference between the maximum and the minimum load. Our bounds hold for a variety of probability distributions including the uniform and binomial distribution but also distributions with unbounded range such as the Poisson and geometric distribution. For graphs with slow convergence like cycles and tori, our results demonstrate a substantial difference between the convergence in the worst- and average-case. An important ingredient in our analysis is a new upper bound on the t-step transition probability of a general Markov…


Equations

$$\mathbb{P}\left[\,|X-\mu|\geq\delta\cdot\sigma\,\right]\leq\exp\left(-\kappa\delta\right),$$

$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\delta\cdot\frac{\sqrt{128}}{\kappa}\cdot\sigma\cdot\log n\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}+\sqrt{48\log n}\right]\leq 2\cdot e^{-\delta^{2}}+2n^{-2}.$$

$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\frac{\sigma}{2\sqrt{2\log_{2}\sigma}}\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}-\sqrt{48\log n}\right]\geq\frac{1}{16}.$$

$$\left|\mathbf{P}^{t}_{x,x}-\pi_{x}\right|\leq\frac{\sqrt{2}\,\Delta^{5/2}}{\sqrt{t}},$$

$$\left|\mathbf{P}^{t}_{x,y}-\pi_{y}\right|\leq\frac{\pi_{\max}^{3/2}}{\pi_{\min}^{3/2}}\cdot\frac{2}{\beta^{1/2}\alpha}\cdot\sqrt{\frac{1-\beta+\alpha}{\alpha t}},$$

$$\mathbb{P}\left[X\in\left[\mu-\frac{8}{\kappa}\cdot\sigma\log n,\ \mu+\frac{8}{\kappa}\cdot\sigma\log n\right]\right]\geq 1-n^{-2}.$$

$$\mathbb{P}\left[\max_{w\in V}\left|x_{w}^{(t)}-\xi_{w}^{(t)}\right|\leq\sqrt{12\cdot\log n}\right]\geq 1-n^{-2}.$$

$$\mathcal{E}:=\bigcap_{w\in V}\left\{\left|\xi_{w}^{(0)}-\mu\right|\leq\frac{8}{\kappa}\cdot\sigma\cdot\log n\right\}.$$

$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\delta+\sqrt{48\cdot\log n}\right]\leq 2\cdot\exp\left(\frac{-2\delta^{2}}{256/\kappa^{2}\cdot\sigma^{2}\cdot\log^{2}n\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}}\right)+n^{-2}.$$

$$dev:=\xi_{u}^{(t)}-\xi_{v}^{(t)}=\sum_{w\in V}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right).$$

$$\mathrm{Var}\left[dev\right]=\sigma^{2}\cdot\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}.$$

$$\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}\geq\frac{1}{2\log_{2}\sigma}\cdot\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}.$$

$$\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}=O(\sigma^{-1}),$$

$$S:=\sum_{w\in V_{i}}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)$$

$$S^{c}:=\sum_{w\notin V_{i}}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right).$$

$$\begin{aligned}F_{n}(x)&=\mathbb{P}\left[\frac{\sum_{w\in V_{i}}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)-\sum_{w\in V_{i}}\mu\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)}{\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}}\leq x\right]\\&=\mathbb{P}\left[\frac{S-\mathbb{E}\left[S\right]}{\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}}\leq x\right]\\&=\mathbb{P}\left[S-\mathbb{E}\left[S\right]\leq x\,\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}\right].\end{aligned}$$

$$\mathbb{P}\left[S-\mathbb{E}\left[S\right]\geq x\,\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}\right]\geq\frac{1}{\sqrt{\pi}\left(x+\sqrt{x^{2}+2}\right)e^{x^{2}}}-o(1),$$

$$\frac{1}{x+\sqrt{x^{2}+2}}<e^{x^{2}}\int_{x}^{\infty}e^{-t^{2}}\,dt\leq\frac{1}{x+\sqrt{x^{2}+4/\pi}}\qquad(x>0).$$

$$\frac{1}{\sqrt{\pi}\left(x+\sqrt{x^{2}+2}\right)e^{x^{2}}}<\Phi^{c}(x)\leq\frac{1}{\sqrt{\pi}\left(x+\sqrt{x^{2}+4/\pi}\right)e^{x^{2}}}.$$

$$\mathbb{P}\left[S-\mathbb{E}\left[S\right]\geq\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}\right]\geq\frac{1}{16},\qquad\mathbb{P}\left[\mathbb{E}\left[S\right]-S\geq\sigma\sqrt{\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}\right]\geq\frac{1}{16}.$$


Full text


Randomized Load Balancing on Networks with Stochastic Inputs

Leran Cai

University of Cambridge, email: (lc647|tms41)@cl.cam.ac.uk

Thomas Sauerwald

University of Cambridge, email: (lc647|tms41)@cl.cam.ac.uk

Abstract.

Iterative load balancing algorithms for indivisible tokens have been studied intensively in the past, e.g., [22, 19, 25]. Complementing previous worst-case analyses, we study an average-case scenario where the load inputs are drawn from a fixed probability distribution. For cycles, tori, hypercubes and expanders, we obtain almost matching upper and lower bounds on the discrepancy, the difference between the maximum and the minimum load. Our bounds hold for a variety of probability distributions including the uniform and binomial distribution but also distributions with unbounded range such as the Poisson and geometric distribution. For graphs with slow convergence like cycles and tori, our results demonstrate a substantial difference between the convergence in the worst- and average-case. An important ingredient in our analysis is a new upper bound on the $t$-step transition probability of a general Markov chain, which is derived by invoking the evolving set process.

Key words and phrases:

random walks, randomized algorithms, parallel computing

1991 Mathematics Subject Classification:

G.3 Probability and Statistics

1. Introduction

In the last decade, large parallel networks became widely available for industrial and academic users. An important prerequisite for their efficient usage is to balance their work efficiently. Load balancing is known to have applications to scheduling [28], routing [8], numerical computation such as solving partial differential equations [30, 29, 27], and finite element computations [13]. In the standard abstract formulation of load balancing, processors are represented by nodes of a graph, while links are represented by edges. The objective is to balance the load by allowing nodes to exchange loads with their neighbors via the incident edges. In this work we will study a decentralized and iterative load balancing protocol where a processor knows only its current load and that of the neighboring processors and based on this, decides how much load should be sent (or received).

Load Balancing Models. A widely used approach is diffusion, e.g., the first-order-diffusion scheme [8, 19], where the amount of load sent along each edge in each round is proportional to the load difference between the incident nodes. In this work, we consider the alternative, the so-called matching model, where in each round only the edges of the matching are used to average the load locally. In comparison to diffusion, the matching model reduces the communication in the network and moreover tends to behave in a more “monotone” fashion than diffusion, since it avoids concurrent load exchanges which may increase the maximum load or decrease the minimum load in certain cases.

We measure the smoothness of the load distribution by the so-called discrepancy which is the difference between the maximum and minimum load among all nodes. In view of more complex scenarios where jobs are eventually removed or new jobs are generated, the discrepancy seems to be a more appropriate measure than the makespan, which only considers the maximum load.

Many studies in load balancing assume that load is arbitrarily divisible. In this so-called continuous case, load balancing corresponds to a Markov chain on the graph and one can resort to a wide range of established techniques to analyze the convergence speed [6, 10, 19]. In particular, the spectral gap captures the time to reach a small discrepancy fairly accurately, e.g., see [26, 22] for the diffusion and see [7, 18] for the matching model.

However, in many applications a processor’s load may consist of tasks which are not further divisible, which is why the continuous case has been also referred to as “idealized case” [22]. A natural way to model indivisible tasks is the unit-size token model where one assumes a smallest load entity, the unit-size token, and load is always represented by a multiple of this smallest entity. In the following, we will refer to the unit-size token model as the discrete case.

Initiated by the work of [22], there have been a number of studies on load balancing in the discrete case. Unlike [22], [25] analyzed a randomized rounding based strategy, meaning that an excess token will be distributed uniformly at random among the two communicating nodes. The authors of [25] proved that with this strategy the time to reach constant discrepancy in the discrete case is essentially the same as the corresponding time in the continuous case. Their results hold both for the random matching model, where in each round a new random matching is generated by a simple distributed protocol, and the balancing circuit model (a.k.a. dimension exchange), where a fixed sequence of matchings is applied periodically. In this work, we will focus on the balancing circuit model, which is particularly well suited for highly structured graphs such as cycles, tori or hypercubes.

Worst-Case vs. Average-Case Inputs. Previous work has almost always adopted the usual worst-case framework for deriving bounds on the load discrepancy [22]. That means that any upper bound on the discrepancy holds for an arbitrary input, i.e., an arbitrary initial load vector. While it is of course very natural and desirable to have such general bounds, the downside is that for graphs with poor expansion like cycles or 2D-tori, the convergence is rather slow, i.e., quadratic or linear in the number of nodes $n$ .

This serves as a motivation to explore an average-case input. Specifically, we assume that the number of load items at each node is sampled independently from a fixed distribution. Our main results demonstrate that the convergence of the load vector is considerably quicker (measured by the load discrepancy), especially on networks with slow convergence in the worst-case such as cycles and 2D-tori.

We point out that many related problems including scheduling on parallel machines or load balancing in a dynamic setting (meaning that jobs are continuously added and processed) have been studied under random inputs, e.g., [4, 11, 2]. To the best of our knowledge, only very few works have studied this question in iterative load balancing. One exception is [23], which investigated the performance of continuous load balancing on tori in the diffusion model. In contrast to this work, however, only upper bounds are given and they hold for the multiplicative ratio between maximum and minimum load, rather than the discrepancy.

Our main results in this paper hold for all distributions satisfying the following definition, which is satisfied by the uniform, binomial, Poisson and geometric distribution (see Section 2).

Definition 1.1.

We say that a distribution $D$ over $\mathbb{N}\cup\{0\}$ is exponentially concentrated if there is a constant $\kappa>0$ so that for any $X\sim D$ , $\delta>0$ ,

$$\mathbb{P}\left[\,|X-\mu|\geq\delta\cdot\sigma\,\right]\leq\exp\left(-\kappa\delta\right),$$

where $\mu$ and $\sigma^{2}$ are the expectation and variance of $D$ . In the following, we refer to average-case when the initial number of load items on each vertex is drawn independently from a fixed exponentially concentrated distribution.

Our Results. Our first contribution is a general formula that allows us to express the load difference between an arbitrary pair of nodes in round $t$ . Here the round matrix $\mathbf{M}$ is the product of the matching matrices that are applied periodically (cf. Section 2).

Theorem 1.2.

Consider the balancing circuit model with an arbitrary round matrix $\mathbf{M}$ in the average case. Then for any pair of nodes $u,v$ and round $t$ , it holds for any $\delta>0$ that

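$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\delta\cdot\frac{\sqrt{128}}{\kappa}\cdot\sigma\cdot\log n\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}+\sqrt{48\log n}\right]\leq 2\cdot e^{-\delta^{2}}+2n^{-2}.$$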

Further, for any pair of vertices $u,v$ and any round $t$ satisfying $t=\omega(1)$ ,

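$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\frac{\sigma}{2\sqrt{2\log_{2}\sigma}}\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}-\sqrt{48\log n}\right]\geq\frac{1}{16}.$$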

The upper bound in Theorem 1.2 is the easier direction; its proof relies on a previous result relating continuous and discrete load balancing from [25]. The lower bound is technically more challenging and applies a generalized version of the central limit theorem.

Together, the upper and lower bound in the above result establish that the load deviation between any two nodes $u$ and $v$ is essentially captured by $\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}$. However, in some instances it might be desirable to have a more tangible estimate at the expense of generality. A first step towards this goal is to observe that $\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}\leq 4\cdot\max_{k\in V}\|\mathbf{M}^{t}_{.,k}-\mathbf{\frac{1}{n}}\|_{2}^{2}$ (see Lemma 4.1). Hence we are left with the problem of understanding the $t$-step probability vector $\mathbf{M}^{t}_{.,k}$.

For reversible Markov chains, the last expression has been analyzed in several works, e.g., a result from [15, Lemma 3.6] implies that for random walks on graphs, $\mathbf{P}_{u,v}^{t}=O(\operatorname{deg}(v)/\sqrt{t})$ (cf. [15]). However, the Markov chain associated to $\mathbf{M}$ is not reversible in general. For irreversible Markov chains, [14] use the so-called evolving set process to derive a similar bound. Specifically, they proved in [14, Theorem 17.17] that if $\mathbf{P}$ denotes the transition matrix of a lazy random walk (i.e., a random walk with loop probability at least $1/2$ ) on a graph with maximal degree $\Delta$ , then for any vertex $x\in V$ :

$$\left|\mathbf{P}^{t}_{x,x}-\pi_{x}\right|\leq\frac{\sqrt{2}\,\Delta^{5/2}}{\sqrt{t}},$$

where $\pi$ is the stationary distribution of $\mathbf{P}$ . Such estimates have been used in various applications besides load balancing, including distributed random walks and spanning tree enumeration [24, 15]. Here we generalize this result to Markov chains with an arbitrary loop probability and to arbitrary $t$ -step transition probabilities:

Theorem 1.3.

Let $\mathbf{P}$ be the transition matrix of an irreducible Markov chain and $\pi$ its stationary distribution. Then we have for all states $x,y$ and step $t$ ,

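$$\left|\mathbf{P}^{t}_{x,y}-\pi_{y}\right|\leq\frac{\pi_{\max}^{3/2}}{\pi_{\min}^{3/2}}\cdot\frac{2}{\beta^{1/2}\alpha}\cdot\sqrt{\frac{1-\beta+\alpha}{\alpha t}},$$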

where $\alpha:=\min\limits_{u\neq v}\mathbf{P}_{u,v}>0$ and $\beta:=\min\limits_{u}\mathbf{P}_{u,u}>0$ .

Applying this bound to a round matrix $\mathbf{M}$ that is formed of $d=O(1)$ matchings, we obtain $\left|\mathbf{M}^{t}_{u,v}-1/n\right|=O(t^{-1/2}).$ It should be noted that [25, Lemma 2.5] proved a weaker version where the upper bound is only $O(t^{-1/8})$ instead of $O(t^{-1/2})$ . As we will prove in Lemma 5.3, the bound $O(t^{-1/2})$ is asymptotically tight if we consider the balancing circuit model on cycles.
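The decay of $\left|\mathbf{M}^{t}_{u,v}-1/n\right|$ on a cycle can be observed numerically. The following is a minimal sketch, assuming the Odd-Even scheme on a cycle with an even number of nodes; the helper names (`odd_even_cycle`, `row_of_power`, `max_deviation`) are illustrative and not taken from the paper. It obtains row $u$ of $\mathbf{M}^{t}$ by balancing a unit load placed on node $u$ in the continuous model, and prints the largest deviation together with its value rescaled by $\sqrt{t}$.

```python
from typing import List, Tuple

Matching = List[Tuple[int, int]]


def odd_even_cycle(n: int) -> List[Matching]:
    """Odd-Even scheme on a cycle; n is assumed even so both edge sets are matchings."""
    odd = [(j, (j + 1) % n) for j in range(1, n, 2)]
    even = [(j, (j + 1) % n) for j in range(0, n, 2)]
    return [odd, even]


def row_of_power(n: int, u: int, t: int, matchings: List[Matching]) -> List[float]:
    """Row u of M^t: balance a unit load on node u for t rounds (continuous case)."""
    row = [0.0] * n
    row[u] = 1.0
    for _ in range(t):
        for matching in matchings:
            for a, b in matching:
                row[a] = row[b] = (row[a] + row[b]) / 2
    return row


def max_deviation(n: int, t: int, matchings: List[Matching]) -> float:
    """max over u, v of |M^t[u][v] - 1/n|."""
    return max(abs(p - 1.0 / n)
               for u in range(n)
               for p in row_of_power(n, u, t, matchings))


if __name__ == "__main__":
    n = 64
    matchings = odd_even_cycle(n)
    for t in (1, 4, 16, 64):
        dev = max_deviation(n, t, matchings)
        # If the deviation behaves like t^(-1/2), the last column stays bounded.
        print(t, dev, dev * t ** 0.5)
```

The sketch only illustrates the quantity being bounded; it makes no claim about the constants in Theorem 1.3.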

Combining the bound in Theorem 1.3 with the upper bound in Theorem 1.2 yields:

Theorem 1.4.

Consider the balancing circuit model with an arbitrary round matrix $\mathbf{M}$ consisting of $d=O(1)$ matchings in the average case. Then the discrepancy after $t$ rounds is $O(t^{-1/4}\cdot\sigma\cdot(\log n)^{3/2}+\sqrt{\log n})$ with probability $1-O(n^{-1})$ .

Since the initial discrepancy in the average case is $O(\sigma\cdot\log n)$ (see Lemma 2.2), Theorem 1.4 implies that in the average case, there is a significant decrease (roughly of order $t^{-1/4}$) in the discrepancy, regardless of the underlying topology.

For round matrices $\mathbf{M}$ with small second largest eigenvalue, the next result provides a significant improvement:

Theorem 1.5.

Consider the balancing circuit model with an arbitrary round matrix $\mathbf{M}$ consisting of $d$ matchings in the average case. Then the discrepancy after $t$ rounds is $O(\lambda(\mathbf{M})^{t/4}\cdot\sigma\cdot(\log n)^{3/2}+\sqrt{\log n})$ with probability $1-O(n^{-1})$ .

Hence for graphs where $\lambda$ is bounded away from $1$ , we even obtain an exponential convergence.

In Section 5, we derive bounds on the discrepancy for cycles, $r$-dimensional tori, expanders and hypercubes. A summary of these results can be found in Figure 1.

Finally, we discuss our results and contrast them to the convergence of the discrepancy in the worst-case in Section 6. On a high level, these results demonstrate that on all the considered topologies, we have much faster convergence in the average-case than in the worst-case. However, if we are only interested in the time to achieve a very small, say, constant or poly-logarithmic discrepancy, then we reveal an interesting dichotomy: we have a quicker convergence than in the worst-case if and only if the standard deviation $\sigma$ is smaller than some threshold, which depends on the actual topology. We observe the same phenomenon in our experiments, which are also discussed in Section A.

2. Notation and Background

We assume that $G=(V,E)$ is an undirected, connected graph with $n$ nodes labelled in $[0,n-1]$ . Unless stated otherwise, all logarithms are to the base $e$ . The notations $\mathbb{P}\left[{\mathcal{E}}\right]$ and $\mathbb{E}\left[{X}\right]$ denote the probability of an event $\mathcal{E}$ and the expectation of a random variable $X$ , respectively. For any $n$ -dimensional vector $x$ , $\operatorname{disc}(x)=\max_{i}x_{i}-\min_{i}x_{i}$ denotes the discrepancy.

Matching Model. In the matching model (sometimes also called dimension exchange model), every two matched nodes in round $t$ balance their load as evenly as possible. This can be expressed by a symmetric $n$ by $n$ matching matrix $\mathbf{M}^{(t)}$, where with slight abuse of notation we use the same symbol for the matching and the corresponding matching matrix. Formally, matrix $\mathbf{M}^{(t)}$ is defined by $\mathbf{M}_{u,u}^{(t)}:=1/2$, $\mathbf{M}_{v,v}^{(t)}:=1/2$ and $\mathbf{M}_{u,v}^{(t)}=\mathbf{M}_{v,u}^{(t)}:=1/2$ if $\{u,v\}\in\mathbf{M}^{(t)}\subseteq E$, and $\mathbf{M}^{(t)}_{u,u}=1$, $\mathbf{M}^{(t)}_{u,v}=0\ (u\neq v)$ if $u$ is not matched.

Balancing Circuit. In the balancing circuit model, a specific sequence of matchings is applied periodically. More precisely, let $\mathbf{M}^{(1)},\dots,\mathbf{M}^{(d)}$ be a sequence of $d$ matching matrices, also called the period (note that $d$ may be different from the maximal degree of the underlying graph). Then in step $t\geq 1$, we apply the matching matrix $\mathbf{M}^{(t)}:=\mathbf{M}^{(((t-1)\mod d)+1)}$. We define the round matrix by $\mathbf{M}:=\prod_{s=1}^{d}\mathbf{M}^{(s)}$. If $\mathbf{M}$ is symmetric, we define $\lambda(\mathbf{M})$ to be its second largest eigenvalue (in absolute value). Following [22], if $\mathbf{M}$ is not symmetric (which is usually the case), we define $\lambda(\mathbf{M})$ as the second largest eigenvalue of the symmetric matrix $\mathbf{M}\cdot\mathbf{M}^{T}$, where $\mathbf{M}^{T}$ is the transpose of $\mathbf{M}$. We always assume that $\lambda(\mathbf{M})<1$, which is guaranteed to hold if the matrix $\mathbf{M}$ is irreducible. Notice that since $\mathbf{M}$ is doubly stochastic, all powers of $\mathbf{M}$ are doubly stochastic as well. A natural choice for the $d$ matching matrices is given by an edge coloring of $G$. There are various efficient distributed edge coloring algorithms, e.g. [21, 20].

Balancing Circuit on Specific Topologies. For hypercubes, the canonical choice is dimension exchange consisting of $d=\log_{2}n$ matching matrices $\mathbf{M}^{(i)}$, defined by $\mathbf{M}_{u,v}^{(i)}=1/2$ if and only if the bit representations of $u$ and $v$ differ only in bit $i$. Then the round matrix $\mathbf{M}$ is defined by $\prod_{i=1}^{\log_{2}n}\mathbf{M}^{(i)}$. For cycles, we will consider the natural “Odd-Even”-scheme, meaning that for $\mathbf{M}^{(1)}$, the matching consists of all edges $\{j,(j+1)\pmod{n}\}$ for any odd $j$, while for $\mathbf{M}^{(2)}$, the matching consists of all edges $\{j,(j+1)\pmod{n}\}$ for any even $j$. More generally, for $r$-dimensional tori with vertex set $[0,n^{1/r}-1]^{r}$, we will have $2\cdot r$ matchings in total, meaning that for every dimension $1\leq i\leq r$ we have two matchings along dimension $i$, similar to the definition of matchings for the cycle.
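The constructions above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the cycle is assumed to have an even number of nodes (so that the odd and the even edge sets are both matchings), and hypercube bits are indexed from 0 rather than from 1. It builds the matching matrices, multiplies them in the order $\mathbf{M}^{(1)}\cdots\mathbf{M}^{(d)}$, and prints one row of each round matrix.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Matching = List[Tuple[int, int]]


def matching_matrix(n: int, matching: Matching) -> Matrix:
    """Matched pairs get four entries 1/2; unmatched nodes keep a 1 on the diagonal."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in matching:
        m[u][u] = m[v][v] = 0.5
        m[u][v] = m[v][u] = 0.5
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def round_matrix(n: int, matchings: List[Matching]) -> Matrix:
    """Round matrix M = M^(1) * M^(2) * ... * M^(d)."""
    m = matching_matrix(n, [])  # the empty matching yields the identity
    for matching in matchings:
        m = matmul(m, matching_matrix(n, matching))
    return m


def odd_even_cycle(n: int) -> List[Matching]:
    """Odd-Even scheme on a cycle with an even number of nodes."""
    odd = [(j, (j + 1) % n) for j in range(1, n, 2)]
    even = [(j, (j + 1) % n) for j in range(0, n, 2)]
    return [odd, even]


def hypercube_dimension_exchange(dim: int) -> List[Matching]:
    """One matching per bit position on 2**dim nodes (bits indexed from 0 here)."""
    n = 1 << dim
    return [[(u, u ^ (1 << i)) for u in range(n) if not u & (1 << i)]
            for i in range(dim)]


if __name__ == "__main__":
    cycle = round_matrix(8, odd_even_cycle(8))
    cube = round_matrix(8, hypercube_dimension_exchange(3))
    print("cycle, row 0:", cycle[0])
    print("hypercube, row 0:", cube[0])
```

On the hypercube a single round of dimension exchange already averages over all nodes, so every entry of its round matrix equals $1/n$; the cycle round matrix is doubly stochastic but not symmetric.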

The Continuous Case. In the continuous case, load is arbitrarily divisible. Let $\xi^{(0)}\in\mathbb{R}^{n}$ be the initial load represented as a row vector, and in every round two matched nodes average their load perfectly. We consider the load vector $\xi^{(t)}$ after $t$ rounds in the balancing circuit model (that means, after the executions of $t\cdot d$ matchings in total). This process corresponds to a linear system and $\xi^{(t)},$ $t\in\mathbb{N}$ , can be expressed as $\xi^{(t)}=\xi^{(t-1)}\,\mathbf{M}$ , which results in $\xi^{(t)}=\xi^{(0)}\,\mathbf{M}^{t}$ .

The Discrete Case. Let us now turn to the discrete case with indivisible, unit-size tokens. Let $x^{(0)}\in\mathbb{Z}^{n}$ be the initial load vector with average load $\overline{x}:=\sum_{w\in V}x_{w}^{(0)}/n$, and $x^{(t)}$ be the load vector at the end of round $t$. In case the sum of tokens of the two paired nodes is odd, we employ the so-called random orientation (or randomized rounding) [22, 25]. More precisely, if there are two nodes $u$ and $v$ with load $a$ and $b$ being paired by matching $\mathbf{M}^{(t)}$, then node $u$ gets either $\big\lceil\frac{a+b}{2}\big\rceil$ or $\big\lfloor\frac{a+b}{2}\big\rfloor$ tokens, with probability $1/2$ each. The remaining tokens are assigned to node $v$.
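To make the two processes concrete, here is a small simulation sketch that runs the continuous and the discrete (randomized rounding) process side by side on a cycle. It is written under assumptions that are not part of the paper: the Odd-Even scheme on an even cycle, initial loads drawn uniformly from $\{0,\dots,100\}$, and hypothetical helper names.

```python
import random
from typing import List, Tuple

Matching = List[Tuple[int, int]]


def odd_even_cycle(n: int) -> List[Matching]:
    """Odd-Even scheme on a cycle; n is assumed even."""
    odd = [(j, (j + 1) % n) for j in range(1, n, 2)]
    even = [(j, (j + 1) % n) for j in range(0, n, 2)]
    return [odd, even]


def continuous_round(xi: List[float], matchings: List[Matching]) -> List[float]:
    """One round of the continuous process: matched nodes average perfectly."""
    xi = list(xi)
    for matching in matchings:
        for u, v in matching:
            xi[u] = xi[v] = (xi[u] + xi[v]) / 2
    return xi


def discrete_round(x: List[int], matchings: List[Matching],
                   rng: random.Random) -> List[int]:
    """One round of the discrete process with randomized rounding."""
    x = list(x)
    for matching in matchings:
        for u, v in matching:
            total = x[u] + x[v]
            low = total // 2        # floor((a + b) / 2)
            high = total - low      # ceil((a + b) / 2)
            # Node u receives the ceiling or the floor with probability 1/2 each;
            # for an even total both values coincide.
            if rng.random() < 0.5:
                x[u], x[v] = high, low
            else:
                x[u], x[v] = low, high
    return x


def disc(x: List[float]) -> float:
    return max(x) - min(x)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 32
    matchings = odd_even_cycle(n)
    x = [rng.randint(0, 100) for _ in range(n)]
    xi = [float(v) for v in x]
    for t in range(1, 41):
        x = discrete_round(x, matchings, rng)
        xi = continuous_round(xi, matchings)
        if t % 10 == 0:
            print(t, disc(x), round(disc(xi), 3))
```

Both processes preserve the total load, and in the discrete process the maximum load never increases and the minimum load never decreases, so the discrepancy is non-increasing along every run.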

The Average-Case Setting. We consider a setting where each entry of the initial load vector $x^{(0)}$ is chosen from an exponentially concentrated probability distribution $D$ with expectation $\mu$ and variance $\sigma^{2}$ (see Definition 1.1). It is not difficult to verify that many natural distributions satisfy the condition of exponentially concentrated (see the appendix for more details).

Lemma 2.1.

The uniform distribution, binomial distribution, geometric distribution and Poisson distribution are all exponentially concentrated.

Proof.

Note that the uniform distribution $\mathsf{Uni}[0,k]$ is trivially exponentially concentrated, since $\sigma=\Theta(k)$ . However, also distributions with unbounded range may be exponentially concentrated, with one example being the geometric distribution $\mathsf{Geo}(p)$ . To verify this, first note that we have $\mu=1/p$ and $\sigma=\sqrt{(1-p)/p^{2}}$ (and so $\mu=\Theta(\sigma)$ ) and thus $\mathbb{P}\left[{\mu-X\geq\delta\cdot\sigma}\right]\leq\exp\left(-\kappa\delta\right)$ holds trivially for a sufficiently small constant $\kappa>0$ . Secondly, for the upper tail, by Markov’s inequality, $\mathbb{P}\left[{X\geq 2\cdot\mathbb{E}\left[{X}\right]}\right]\leq 1/2$ , and by the memoryless property of the geometric distribution, for any $j\geq 1$ , $\mathbb{P}\left[{X\geq j\cdot 2\cdot\mathbb{E}\left[{X}\right]}\right]\leq 2^{-j}$ .

For the binomial distribution $\mathsf{Bin}[m,p]$ with expectation $\mu=m\cdot p$ and standard deviation $\sigma=\sqrt{m\cdot p\cdot(1-p)}$ , we will assume w.l.o.g. that $p\leq 1/2$ , so that $\sigma=\Theta(\sqrt{mp})$ . Then by [17, Theorem 2.3], we have for $X\sim\mathsf{Bin}[m,p]$ , $\mathbb{P}\left[{X-\mu\geq\epsilon\cdot\mu}\right]\leq\exp\left(-\frac{\epsilon^{2}\mu}{2+2\epsilon/3}\right)$ . Choosing $\epsilon=\delta\cdot\sigma/\mu$ yields $\mathbb{P}\left[{X-\mu\geq\delta\cdot\sigma}\right]\leq\exp\left(-\frac{\delta^{2}\cdot\sigma^{2}/\mu}{2+2\sigma/\mu}\right)$ , as needed. For the lower tails, we use $\mathbb{P}\left[{\mu-X\geq\epsilon\cdot\mu}\right]\leq e^{-1/2\epsilon^{2}\mu}$ and obtain a similar result as before (see again [17, Theorem 2.3]).

For the Poisson distribution $\mathsf{Poi}[\mu]$, we can verify in an analogous way that it is exponentially concentrated by using two Chernoff bounds for Poisson random variables (B.1). ∎

The definition of exponentially concentrated implies the following concentration result:

Lemma 2.2.

Let $D$ be an exponentially concentrated distribution and let $X\sim D$ . Then,

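$$\mathbb{P}\left[X\in\left[\mu-\frac{8}{\kappa}\cdot\sigma\log n,\ \mu+\frac{8}{\kappa}\cdot\sigma\log n\right]\right]\geq 1-n^{-2}.$$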

In particular, the initial discrepancy satisfies $\operatorname{disc}(x^{(0)})=O(\sigma\cdot\log n)$ with probability at least $1-n^{-1}$ .
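The step from the single-node statement to the bound on the initial discrepancy is a union bound. The following worked lines spell it out, reading Lemma 2.2 as the statement that $X$ lies within $\frac{8}{\kappa}\cdot\sigma\log n$ of $\mu$ with probability at least $1-n^{-2}$; the constant $16/\kappa$ is simply what this reading gives.

```latex
% Union bound over the n nodes, applying Lemma 2.2 to each x_w^{(0)}:
\mathbb{P}\left[\exists\, w\in V\colon
    \left|x_{w}^{(0)}-\mu\right| > \tfrac{8}{\kappa}\cdot\sigma\log n\right]
  \;\leq\; n\cdot n^{-2} \;=\; n^{-1}.
% On the complementary event every load lies in an interval of length
% 2 * (8/kappa) * sigma * log n around mu, hence
\operatorname{disc}\left(x^{(0)}\right)
  \;\leq\; \tfrac{16}{\kappa}\cdot\sigma\log n
  \;=\; O(\sigma\cdot\log n)
  \qquad\text{with probability at least } 1-n^{-1}.
```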

The advantage of Lemma 2.2 is that we can use a simple conditioning trick to work with distributions that have a finite range and are therefore easier to analyze with concentration tools like Hoeffding’s inequality (Theorem B.3). That is, in the analysis we simply work with a bounded-range distribution $\widetilde{D}$, which is the distribution $D$ under the condition that only values in the interval $[\mu-8/\kappa\cdot\sigma\log n,\mu+8/\kappa\cdot\sigma\log n]$ occur.

3. Proof of the General Bound (Theorem 1.2)


3.1. Proof of Theorem 1.2 (Upper Bound)

We will use the following result from [25] that bounds the deviation between the continuous and discrete load, assuming that we have $\xi^{(0)}=x^{(0)}$ .

Theorem 3.1 ([25, Theorem 3.6( $i$ )]).

Consider the balancing circuit model with an arbitrary round matrix $\mathbf{M}$ . Then for any round $t\geq 1$ it holds that

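$$\mathbb{P}\left[\max_{w\in V}\left|x_{w}^{(t)}-\xi_{w}^{(t)}\right|\leq\sqrt{12\cdot\log n}\right]\geq 1-n^{-2}.$$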

Proof of Theorem 1.2 (Upper Bound).

Recall that the initial vector $\xi^{(0)}=x^{(0)}$ consists of $n$ i.i.d. random variables. As explained at the end of Section 2, we condition on the event

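$$\mathcal{E}:=\bigcap_{w\in V}\left\{\left|\xi_{w}^{(0)}-\mu\right|\leq\frac{8}{\kappa}\cdot\sigma\cdot\log n\right\}.$$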

By Lemma 2.2, $\mathbb{P}\left[{\mathcal{E}}\right]\geq 1-n^{-2}$ . In the remainder of the proof, all random variables are conditional on $\mathcal{E}$ , but for simplicity we will not explicitly express this conditioning.

Since $\xi_{u}^{(t)}=\sum_{w\in V}\xi_{w}^{(0)}\mathbf{M}^{t}_{w,u}$, the load $\xi_{u}^{(t)}$ is just a weighted sum of i.i.d. random variables and we obtain

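$$\xi_{u}^{(t)}-\xi_{v}^{(t)}=\sum_{w\in V}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right),$$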

which is in fact still a sum of $n$ i.i.d. random variables. The expectation is

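$$\mathbb{E}\left[\xi_{u}^{(t)}-\xi_{v}^{(t)}\right]=\sum_{w\in V}\mathbb{E}\left[\xi_{w}^{(0)}\right]\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)=\mu\cdot\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)=0,$$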

where the last equality holds since $\mathbf{M}$ is doubly stochastic.

Now applying Hoeffding’s inequality (Theorem B.3) and recalling that conditional on $\mathcal{E}$, the range of each $\xi_{w}^{(0)}$ is $16/\kappa\cdot\sigma\cdot\log n$, we obtain that

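$$\mathbb{P}\left[\left|\xi_{u}^{(t)}-\xi_{v}^{(t)}\right|\geq\delta\right]\leq 2\cdot\exp\left(\frac{-2\delta^{2}}{256/\kappa^{2}\cdot\sigma^{2}\cdot\log^{2}n\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}}\right).$$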

Applying Theorem 3.1 yields

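$$\mathbb{P}\left[\left|x_{u}^{(t)}-x_{v}^{(t)}\right|\geq\delta+\sqrt{48\cdot\log n}\right]\leq 2\cdot\exp\left(\frac{-2\delta^{2}}{256/\kappa^{2}\cdot\sigma^{2}\cdot\log^{2}n\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}}\right)+n^{-2}.$$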

The statement of the theorem follows by scaling $\delta$ and recalling that $\mathbb{P}\left[{\mathcal{E}}\right]\geq 1-n^{-2}$ . ∎
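The scaling of $\delta$ in the last step can be made explicit. The lines below are a worked sketch, assuming the Hoeffding bound is applied with range $R=\frac{16}{\kappa}\cdot\sigma\cdot\log n$ for every summand, as stated in the proof; the symbols $R$ and $\delta'$ are introduced here only for the calculation.

```latex
% Hoeffding's inequality for the weighted sum, with range R = (16/\kappa) \sigma \log n:
\mathbb{P}\left[\left|\xi_{u}^{(t)}-\xi_{v}^{(t)}\right|\geq\delta'\right]
  \leq 2\exp\left(-\frac{2\delta'^{2}}
        {R^{2}\cdot\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}}\right).
% Choose \delta' so that the exponent equals -\delta^{2}:
\delta' = \delta\cdot\frac{R}{\sqrt{2}}\cdot
          \left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}
        = \delta\cdot\frac{\sqrt{128}}{\kappa}\cdot\sigma\cdot\log n\cdot
          \left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2},
\qquad\text{since } \frac{16}{\sqrt{2}}=\sqrt{128}.
% The two failure events (the complement of \mathcal{E}, and Theorem 3.1)
% contribute n^{-2} each, which gives the additive 2 n^{-2}.
```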

3.2. Proof of Theorem 1.2 (Lower Bound)

The proof of the lower bound will use the following quantitative version of a central limit type theorem for independent but non-identical random variables.

Theorem 3.2 (Berry-Esseen Theorem [5, 9] for non-identical r.v.).

Let $X_{1},X_{2},...,X_{n}$ be independently distributed with $\mathbb{E}\left[{X_{i}}\right]=0$ , $\mathbb{E}\left[{X_{i}^{2}}\right]=\mathrm{Var}{X_{i}}=\sigma_{i}^{2}$ , and $\mathbb{E}\left[{|X_{i}|^{3}}\right]=\rho_{i}<\infty$ . If $F_{n}(x)$ is the distribution of $\frac{X_{1}+...+X_{n}}{\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}+...+\sigma_{n}^{2}}}$ and $\Phi(x)$ is the standard normal distribution, then

$$\sup_{x\in\mathbb{R}}\left|F_{n}(x)-\Phi(x)\right|\leq C_{0}\cdot\psi_{0},$$

where $\psi_{0}=\left(\sum_{i=1}^{n}\sigma_{i}^{2}\right)^{-3/2}\cdot\sum_{i=1}^{n}\rho_{i}$ and $C_{0}>0$ is a constant.

With this concentration tool at hand, we are able to prove the lower bound in Theorem 1.2. Unfortunately, it appears quite difficult to apply Theorem 3.2 directly to equation (3.1), since we need a good bound on the error term $\psi_{0}$. To this end, we will first partition the vertex set $V$ into buckets whose vertices contribute roughly equally to $\xi_{u}^{(t)}-\xi_{v}^{(t)}$. Then we will apply Theorem 3.2 to the bucket with the largest variance, for which we can show that $\psi_{0}=o(1)$, thanks to the precondition that $t=\omega(1)$ and the bound in Theorem 1.3.

Proof of Theorem 1.2 (Lower Bound).

As in the derivation of the upper bound, we first consider $\xi_{u}^{(t)}-\xi_{v}^{(t)}$ :

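$$dev:=\xi_{u}^{(t)}-\xi_{v}^{(t)}=\sum_{w\in V}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right).$$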

Again we are dealing with a weighted sum of i.i.d. random variables with expectation $\mu$ and variance $\sigma^{2}$. As mentioned earlier, we have $\mathbb{E}\left[{dev}\right]=\sum_{w\in V}\mathbb{E}\left[{\xi_{w}^{(0)}}\right]\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)=0$ since $\mathbf{M}$ is a doubly stochastic matrix. Of course, we could apply Theorem 3.2 directly to $dev$, but it appears difficult to control the error term $\psi_{0}$. Therefore we will first partition the above sum into buckets where the weights of the random variables are roughly the same.

More precisely, we will partition $V$ into $2\log_{2}\sigma$ buckets, where $V_{i}:=\left\{w\in V\colon|\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}|\in(2^{-i},2^{-i+1}]\right\}$ for $1\leq i\leq 2\log_{2}\sigma-1$, and $V_{2\log_{2}\sigma}:=\left\{w\in V\colon\left|\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right|\leq\frac{1}{\sigma^{2}}\right\}$.
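The bucketing step can be illustrated with a short sketch. It rests on assumptions that go beyond the text: the number of buckets is a free parameter `num_buckets` (standing in for $2\log_{2}\sigma$), the last bucket collects every weight not covered by a dyadic range, and all names are hypothetical. The code assigns each weight $\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}$ to its bucket and returns the bucket with the largest sum of squares, which by the pigeonhole principle carries at least a `1 / num_buckets` fraction of the total.

```python
from typing import Dict, List


def bucket_index(weight: float, num_buckets: int) -> int:
    """Index i with |weight| in (2^-i, 2^-(i-1)]; small weights go to the last bucket."""
    a = abs(weight)
    for i in range(1, num_buckets):
        if a > 2.0 ** (-i):
            return i
    return num_buckets


def partition(weights: List[float], num_buckets: int) -> Dict[int, List[int]]:
    """Map each bucket index to the list of vertices w falling into it."""
    buckets: Dict[int, List[int]] = {i: [] for i in range(1, num_buckets + 1)}
    for w, weight in enumerate(weights):
        buckets[bucket_index(weight, num_buckets)].append(w)
    return buckets


def squared_mass(weights: List[float], vertices: List[int]) -> float:
    return sum(weights[w] ** 2 for w in vertices)


def heaviest_bucket(weights: List[float], num_buckets: int) -> int:
    """Bucket index maximising the sum of squared weights."""
    buckets = partition(weights, num_buckets)
    return max(buckets, key=lambda i: squared_mass(weights, buckets[i]))


if __name__ == "__main__":
    # Hypothetical weights, each of absolute value at most 1.
    weights = [0.6, -0.3, 0.3, 0.05, -0.05, 0.01, 0.0]
    best = heaviest_bucket(weights, 5)
    buckets = partition(weights, 5)
    total = squared_mass(weights, list(range(len(weights))))
    print(best, squared_mass(weights, buckets[best]), total / 5)
```

Because the buckets cover every vertex, the heaviest one always reaches the average, which is the only property the proof needs from this step.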

Further, let us consider the variance of $dev$ :

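$$\mathrm{Var}\left[dev\right]=\sigma^{2}\cdot\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}.$$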

Then by the pigeonhole principle there exists an index $1\leq i\leq 2\log_{2}\sigma$ such that

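$$\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}\geq\frac{1}{2\log_{2}\sigma}\cdot\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}.$$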

Firstly, if that index $i$ is equal to $2\log_{2}\sigma$ , then

$$\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}=O(\sigma^{-1}),$$

and the lower bound holds trivially. Therefore, we will assume in the remainder of the proof that $i<2\log_{2}\sigma$. We now decompose $dev$ into $dev=S+S^{c}$, where

$$S:=\sum_{w\in V_{i}}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)$$

and

$$S^{c}:=\sum_{w\notin V_{i}}\xi_{w}^{(0)}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right).$$

Let us first analyze $S$ . We will now apply Theorem 3.2 to $S$ . In preparation for this, let us first upper bound $\psi_{0}$ . Using the definition of exponentially concentrated, it follows that for any constant $k$ , the first $k$ moments are all bounded from above by $O(\sigma^{k})$ . Hence,

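$$\psi_{0}=\left(\sum_{w\in V_{i}}\sigma^{2}\cdot\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}\right)^{-3/2}\cdot\sum_{w\in V_{i}}O(\sigma^{3})\cdot\left|\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right|^{3}=O\left(\frac{\sum_{w\in V_{i}}\left|\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right|^{3}}{\left(\sum_{w\in V_{i}}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}\right)^{3/2}}\right).$$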

Recalling that for any $w\in V_{i}$ , $\left|\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right|\in(2^{-i},2^{-i+1}]$ , we can simplify the above expression as follows:

[TABLE]

However, since we have $t=\omega(1)$ , by Theorem 1.3, $|\mathbf{M}^{t}_{x,y}-\frac{1}{n}|=O(t^{-1/2})$ and therefore it must be that $|V_{i}|=\omega(1)$ , and we conclude that $\psi_{0}=o(1)$ .

Before applying Theorem 3.2, we scale the original distribution to $\xi_{w}^{{}^{\prime}(0)}=\xi_{w}^{(0)}-\mu$ . Since $\mathrm{Var}(aX)=a^{2}\mathrm{Var}(X)$ , we have

[TABLE]

As derived earlier $\psi_{0}=o(1)$ , and therefore

[TABLE]

where the last inequality uses [1, Formula 7.1.13]:

[TABLE]

Therefore, by substitution, we get

[TABLE]

Hence with $x=1$ ,

[TABLE]

Similarly, we can derive that

[TABLE]

Hence, regardless of the value of $S^{c}$, with probability at least $1/16$ we have $|S+S^{c}|\geq\sigma/2\cdot\sqrt{1/(2\log_{2}\sigma)}\cdot\sqrt{\sum_{w\in V}\left(\mathbf{M}^{t}_{w,u}-\mathbf{M}^{t}_{w,v}\right)^{2}}$. ∎

4. Proof of the Universal Bounds (Theorem 1.4, Theorem 1.5)

In the previous section we proved that the deviation between the loads of two nodes $u$ and $v$ is essentially captured by $\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}$. However, in some cases it might be hard to compute or estimate this quantity for arbitrary vertices $u$ and $v$. Therefore we first prove the universal upper bound on the discrepancy stated in Theorem 1.4, which holds for arbitrary graphs and pairs of nodes.

4.1. Proof of Theorem 1.4

The proof of Theorem 1.4 is fairly involved and we first sketch the high level ideas. We first show that $\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}^{2}$ can be upper bounded in terms of the $\ell_{2}$ -distance to the stationary distribution.

Lemma 4.1.

Consider the balancing circuit model with an arbitrary round matrix $\mathbf{M}$ . Then for all $u,v\in V$ , we have $\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\|_{2}^{2}\leq 4\cdot\max_{k\in V}\|\mathbf{M}^{t}_{.,k}-\mathbf{\frac{1}{n}}\|_{2}^{2}.$ Further, for any $u\in V$ we have $\max_{v\in V}\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\|_{2}^{2}\geq\|\mathbf{M}^{t}_{.,u}-\mathbf{\frac{1}{n}}\|_{2}^{2}$ .

Proof.

[TABLE]

and the first statement follows. We now prove the second statement:

[TABLE]

We first look at the difference between these two terms squared. That is, for any vertex $v\in V$ we have

[TABLE]

Now let $Z$ be a uniform random variable over the set $V\backslash\{u\}$ . Then it follows that

[TABLE]

Further, by linearity of expectations

[TABLE]

By definition of expectation, this implies that there exists a vertex $v\in V,v\neq u$ such that

[TABLE]

Combining (4.1) and (4.2),

[TABLE]

where the second last inequality holds since $\mathbf{M}$ is doubly stochastic. ∎
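Both statements of Lemma 4.1 can be checked numerically on a small instance. The sketch below assumes the usual balancing-circuit convention in which a matched pair averages its load (entries $1/2$), and builds the round matrix of an even cycle from its two perfect matchings; the function names are ours.

```python
import numpy as np


def cycle_round_matrix(n):
    """Round matrix of a two-matching balancing circuit on an n-cycle (n even)."""
    def matching(offset):
        m = np.zeros((n, n))
        for i in range(offset, n + offset, 2):
            a, b = i % n, (i + 1) % n
            m[a, a] = m[b, b] = m[a, b] = m[b, a] = 0.5
        return m
    return matching(0) @ matching(1)


def lemma_4_1_holds(M, t, tol=1e-12):
    """Check both inequalities of Lemma 4.1 for the matrix power M^t."""
    n = M.shape[0]
    Mt = np.linalg.matrix_power(M, t)
    # sq_dev[k] = || M^t_{.,k} - 1/n ||_2^2
    sq_dev = ((Mt - 1.0 / n) ** 2).sum(axis=0)
    # pair[u, v] = || M^t_{.,u} - M^t_{.,v} ||_2^2
    diff = Mt[:, :, None] - Mt[:, None, :]
    pair = (diff ** 2).sum(axis=0)
    upper = np.all(pair <= 4 * sq_dev.max() + tol)
    lower = np.all(pair.max(axis=1) >= sq_dev - tol)
    return bool(upper and lower)
```

Such a check is only a sanity test on one topology; the lemma itself holds for any doubly stochastic round matrix.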

The next step and main ingredient of the proof of Theorem 1.4 is to establish that $\|\mathbf{M}^{t}_{.,k}-\mathbf{\frac{1}{n}}\|_{\infty}=O(1/\sqrt{t})$. This result is a direct application of Theorem 1.3, a general bound on the $t$-step probabilities of an arbitrary, possibly non-reversible Markov chain.

In this subsection we prove Theorem 1.4, assuming the correctness of Theorem 1.3 whose proof is deferred to Section 4.2.

Proof of Theorem 1.4.

By Theorem 1.2 and Lemma 4.1, we obtain

[TABLE]

Choosing $\delta=\sqrt{3\log n}$, the latter probability is smaller than $3n^{-2}$. Further, by applying Theorem 1.3 with $\alpha=\beta=2^{-d}$ to $\mathbf{P}=\mathbf{M}$ we conclude that $\|\mathbf{M}^{t}_{.,k}-\mathbf{\frac{1}{n}}\|_{\infty}=O(t^{-1/2})$, since $d=O(1)$. Using the fact that $\|.\|_{2}^{2}\leq\|.\|_{\infty}\cdot\|.\|_{1}$, we obtain $\|\mathbf{M}^{t}_{.,k}-\mathbf{\frac{1}{n}}\|_{2}^{2}=O(t^{-1/2})$, and by the union bound, $\operatorname{disc}(x^{(t)})=O(t^{-1/4}\cdot\sigma\cdot(\log n)^{3/2}+\sqrt{\log n})$ with probability at least $1-3n^{-1}$. ∎
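The two norm facts used in this proof can be observed on a concrete round matrix. The following sketch again assumes the cycle with two matchings and averaging entries $1/2$; `sup_deviation` and `norm_chain_holds` are illustrative names, not notation from the paper.

```python
import numpy as np


def cycle_round_matrix(n):
    """Round matrix of a two-matching balancing circuit on an n-cycle (n even)."""
    def matching(offset):
        m = np.zeros((n, n))
        for i in range(offset, n + offset, 2):
            a, b = i % n, (i + 1) % n
            m[a, a] = m[b, b] = m[a, b] = m[b, a] = 0.5
        return m
    return matching(0) @ matching(1)


def sup_deviation(M, t):
    """Largest entry of |M^t - 1/n|, i.e. max_k ||M^t_{.,k} - 1/n||_inf."""
    n = M.shape[0]
    return float(np.abs(np.linalg.matrix_power(M, t) - 1.0 / n).max())


def norm_chain_holds(M, t, tol=1e-12):
    """Check ||x||_2^2 <= ||x||_inf * ||x||_1 for every column x of M^t - 1/n."""
    n = M.shape[0]
    X = np.linalg.matrix_power(M, t) - 1.0 / n
    l2_squared = (X ** 2).sum(axis=0)
    product = np.abs(X).max(axis=0) * np.abs(X).sum(axis=0)
    return bool(np.all(l2_squared <= product + tol))
```

Because $\mathbf{M}$ is doubly stochastic, the sup-norm deviation cannot increase from one round to the next; Theorem 1.3 is what turns this monotonicity into the quantitative $O(t^{-1/2})$ rate.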

4.2. Proof of Theorem 1.3

This section is devoted to the proof of Theorem 1.3. Our proof is based on the evolving-set process, which is a Markov chain based on any given irreducible, not necessarily reversible Markov chain on $\Omega$ . For the definition of the evolving set process, we closely follow the exposition in [14, Chapter 17].

Let $\mathbf{P}$ denote the transition matrix of an irreducible Markov chain and $\pi$ its stationary distribution. $\mathbf{P}^{t}$ is the $t$ -step transition probability matrix. The edge measure $Q$ is defined by $Q_{x,y}:=\pi_{x}\mathbf{P}_{x,y}$ and $Q(A,B)=\sum_{x\in A,y\in B}Q_{x,y}$ .

Definition 4.2.

Given a transition matrix $\mathbf{P}$ , the evolving-set process is a Markov chain on subsets of $\Omega$ defined as follows. Suppose the current state is $S\subset\Omega$ . Let $U$ be a random variable which is uniform on $[0,1]$ . The next state of the chain is the set

[TABLE]

This chain is not irreducible because $\varnothing$ and $\Omega$ are absorbing states. It follows that

[TABLE]

since the probability that $y\in S_{t+1}$ is equal to the probability of the event that the chosen value of $U$ is less than $\frac{Q(S_{t},y)}{\pi_{y}}$ .
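A single transition of the evolving-set process can be sketched as follows. The update rule $S_{t+1}=\{y\colon Q(S_{t},y)\geq U\pi_{y}\}$ is taken from the standard definition in [14] and is consistent with the inclusion probability described above; the function name and the array-based representation are assumptions of this sketch.

```python
import numpy as np


def evolving_set_step(P, pi, S, U):
    """Return S' = {y : Q(S, y) >= U * pi_y}, where Q(S, y) = sum_{x in S} pi_x * P[x, y].

    P  : transition matrix as a 2-D array
    pi : stationary distribution as a 1-D array
    S  : current state, a set of state indices
    U  : the uniform sample in [0, 1] driving this transition
    """
    members = sorted(S)
    if members:
        Q_S = pi[members] @ P[members, :]
    else:
        Q_S = np.zeros(len(pi))
    return {y for y in range(len(pi)) if Q_S[y] >= U * pi[y]}
```

For instance, on the lazy random walk on a 4-cycle started from $S=\{0\}$, a small value of $U$ adds both neighbours of $0$, an intermediate value keeps $S$ unchanged, and a value above the holding probability $1/2$ empties the set.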

Proposition 4.3 ([14, Proposition 17.19]).

Let $(M_{t})$ be a non-negative martingale with respect to $(Y_{t})$, and define $T_{h}:=\min\{t\geq 0:M_{t}=0\text{ or }M_{t}\geq h\}$. Assume that for any $h\geq 0$,

(i) for $t<T_{h}$, $\mathrm{Var}(M_{t+1}\,|\,Y_{0},\ldots,Y_{t})\geq\sigma^{2}$, and

(ii) $M_{T_{h}}\leq Dh$.

Let $T:=T_{1}$ . If $M_{0}$ is a constant, then $\mathbb{P}\left[{T>t}\right]\leq\frac{2M_{0}}{\sigma}\sqrt{\frac{D}{t}}.$

We now generalize [14, Lemma 17.14] to cover arbitrarily small loop probabilities.

Lemma 4.4.

Let $(U_{t})$ be a sequence of independent random variables, each uniform on $[0,1]$ , such that $S_{t+1}$ is generated from $S_{t}$ using $U_{t+1}$ . Then with $\beta:=\min\limits_{u}\mathbf{P}_{u,u}>0$ ,

[TABLE]

We list a few auxiliary results from [14] about the evolving set process that will be used to prove the result.

Lemma 4.5 ([14, Lemma 17.12]).

If $(S_{t})_{t\geq 0}$ is the evolving-set process associated to the transition matrix $\mathbf{P}$ , then for any time $t$ and $x,y\in\Omega$

[TABLE]

Recall that $(S_{t})$ is the evolving-set process based on the Markov chain whose transition matrix is $\mathbf{P}$ . $\mathbb{P}_{\{x\}}\left[{y\in S_{t}}\right]$ means the probability of the event $y\in S_{t}$ with the initial state of the evolving set being $\{x\}$ .

Lemma 4.6 ([14, Lemma 17.13]).

The sequence $\{\pi(S_{t})\}$ is a martingale.
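The martingale property can be verified exactly for one step, without sampling. The sketch assumes the inclusion probability $\mathbb{P}[y\in S_{t+1}\mid S_{t}=S]=Q(S,y)/\pi_{y}$ described after Definition 4.2 and sums it against $\pi$; `expected_next_measure` is a name chosen here for illustration.

```python
import numpy as np


def expected_next_measure(P, pi, S):
    """Exact value of E[pi(S_{t+1}) | S_t = S] for the evolving-set process."""
    members = sorted(S)
    if not members:
        return 0.0
    Q_S = pi[members] @ P[members, :]           # Q(S, y) for every y
    inclusion_prob = np.minimum(1.0, Q_S / pi)  # P[y in S_{t+1} | S_t = S]
    return float(pi @ inclusion_prob)
```

When $\pi$ is stationary we have $Q(S,y)\leq\pi_{y}$, so the sum collapses to $\sum_{y}Q(S,y)=\pi(S)$, which is the statement of the lemma. The check also works for non-reversible chains.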

Theorem 4.7 ([14, Corollary 17.7]).

Let $(M_{t})$ be a martingale and $\tau$ a stopping time. If $\mathbb{P}\left[{\tau<\infty}\right]=1$ and $|M_{t\land\tau}|\leq K$ for all $t$ and some constant $K$, where $t\land\tau:=\min\{t,\tau\}$, then $\mathbb{E}\left[{M_{\tau}}\right]=\mathbb{E}\left[{M_{0}}\right]$.

Proof of Lemma 4.4.

Given $U_{t+1}\leq\beta$ , the distribution of $U_{t+1}$ is uniform on $[0,\beta]$ .

Case 1: For $y\notin S$, we know that for $y$ satisfying $\frac{Q(S,y)}{\pi_{y}}\in[0,\beta]$

[TABLE]

and for $y$ satisfying $\frac{Q(S,y)}{\pi_{y}}\in(\beta,1]$ ,

[TABLE]

We know that

[TABLE]

Since $y\in S_{t+1}$ if and only if $U_{t+1}\leq Q(S_{t},y)/\pi_{y}$, we can combine the above results into a single inequality and conclude that

[TABLE]

because $\beta\leq 1$ and $Q(S,y)/\pi_{y}\leq 1$ .

Case 2: For $y\in S$, we have $Q(S,y)/\pi_{y}\geq Q(y,y)/\pi_{y}\geq\beta$. It follows that when $U_{t+1}\leq\beta$

[TABLE]

We have

[TABLE]

Based on previous results, we can see that

[TABLE]

By Lemma 4.6 and the formulas above,

[TABLE]

Rearranging shows that

[TABLE]

∎

The derivation of the next lemma closely follows the analysis in [14, Chapter 17]. For the sake of completeness, we include a proof.

Lemma 4.8.

For any two states $x,y$ , $\left|\mathbf{P}^{t}_{x,y}-\pi_{y}\right|\leq\frac{\pi_{y}}{\pi_{x}}\cdot\mathbb{P}_{\{x\}}\left[{\tau>t}\right].$

Proof.

First of all, let the hitting time

[TABLE]

We have $S_{\tau}\in\{\varnothing,\Omega\}$ and $\pi(S_{\tau})=\mathbbm{1}_{\{S_{\tau}=\Omega\}}$ . We consider an evolving set process with $S_{0}=\{x\}$ . By Theorem B.2 and Lemma 4.6,

[TABLE]

The last equality holds because $S_{\tau}$ can only be $\varnothing$ or $\Omega$; hence the probability that $x$ is an element of $S_{\tau}$ equals the probability that $S_{\tau}=\Omega$. Note that the second $x$ in the last line can be replaced by any other element of $\Omega$. For example, we also know that

[TABLE]

For our bound, we know that by Lemma 4.5 and (4.3),

[TABLE]

By (4.4),

[TABLE]

By simple substitution we obtain

[TABLE]

The last line is true because we remove all possible intersections. ∎

Now we want to use Proposition 4.3 to bound $\mathbb{P}_{\{x\}}\left[{\tau>t}\right]$ . To apply it, we substitute the following parameters: $M_{0}$ is chosen to be $\pi(\{x\})$ , $Y_{t}$ is $S_{t}$ , and $T=T_{1}:=\min\{t\geq 0:\pi(S_{t})=0\ or\ \pi(S_{t})\geq 1\}$ . Hence in our case, $\tau$ is the same as $T$ (or $T_{1}$ ) in the proposition. The following two lemmas elaborate on the two preconditions (i) and (ii) of Proposition 4.3.

Lemma 4.9.

For any time $t$ and $S_{0}=\{x\}$ , $\mathrm{Var}_{S_{t}}(\pi(S_{t+1}))\geq\beta\pi_{\min}^{2}\alpha^{2}.$

Proof.

Since conditioning always reduces variance and $S_{t}\notin\{\varnothing,\Omega\}$, we have

[TABLE]

For $S_{t}=S$ ,

[TABLE]

and by Lemma 4.4, we know that

[TABLE]

For simplicity, we let $X:=\mathbb{E}_{S_{t}}\left[{\pi(S_{t+1})\,|\,\mathbbm{1}_{\{U_{t+1}\leq\beta\}}}\right]$, $x_{1}:=\mathbb{E}\left[{\pi(S_{t+1})\,|\,U_{t+1}\leq\beta,S_{t}=S}\right]$ and $x_{2}:=\mathbb{E}\left[{\pi(S_{t+1})\,|\,U_{t+1}>\beta,S_{t}=S}\right]$. Then we have

[TABLE]

In order to derive a lower bound on this variance, based on Lemma 4.4 we let $x_{1}=\pi(S)+Q(S,S^{c})$ and $x_{2}=\pi(S)-\frac{\beta}{1-\beta}Q(S,S^{c})$. With this we obtain

[TABLE]

Therefore, provided $S_{t}\notin\{\varnothing,\Omega\}$ , we have

[TABLE]

The last inequality follows from the fact that if $S\notin\{\varnothing,\Omega\}$ then there exist $u\in S,v\notin S$ with $\mathbf{P}_{u,v}>0$ , whence

[TABLE]

Since $1-\beta<1$ , we finally obtain

[TABLE]

∎

Finally, we derive an upper bound on the amount by which $S_{t}$ can increase in one iteration.

Lemma 4.10.

For any time $t$ and $S_{0}=\{x\}$ , $\pi(S_{t+1})\leq\left(\frac{1-\beta}{\alpha}+1\right)\frac{\pi_{\max}}{\pi_{\min}}\cdot\pi(S_{t}).$

Proof.

Since

[TABLE]

If $U$ decreases to [math], then every $y\in S_{t+1}$ is at least connected to an $x\in S_{t}$ . In other words, $\mathbf{P}_{x,y}>0$ for $x\in S_{t}$ and $y\in S_{t+1}$ . Hence $|S_{t+1}|\leq(\frac{1-\beta}{\alpha}+1)|S_{t}|$ .

We also know that

[TABLE]

∎

The proof of Theorem 1.3 follows then by combining Proposition 4.3, Lemma 4.4, Lemma 4.8, Lemma 4.9 and Lemma 4.10.

Proof of Theorem 1.3.

With the help of the previous three lemmas, we can apply Proposition 4.3 with $M_{0}=\pi_{x}$ , $\sigma\geq\beta^{1/2}\pi_{x}\alpha$ and $D=\left(\frac{1-\beta}{\alpha}\right)\frac{\pi_{\max}}{\pi_{\min}}$ to obtain

[TABLE]

∎

4.3. Proof of Theorem 1.5

We now prove the discrepancy bound of Theorem 1.5, which depends on $\lambda(\mathbf{M})$ as defined in Section 2.

Proof.

By [25, Lemma 2.4], for any pair of vertices $u,v\in V$ , $\left|\mathbf{M}^{t}_{u,v}-\frac{1}{n}\right|\leq\lambda(\mathbf{M})^{t/2}.$ Hence by Lemma 4.1 $\left\|\mathbf{M}^{t}_{.,u}-\mathbf{M}^{t}_{.,v}\right\|_{2}=O(\lambda(\mathbf{M})^{t/4})$ , and the bound on the discrepancy follows from Theorem 1.2 and the union bound over all vertices. ∎

5. Applications to Different Graph Topologies

Cycles. Recall that for the cycle, $V=\{0,\ldots,n-1\}$ is the set of vertices, and the distance between two vertices is $\operatorname{dist}(x,y)=\min\{y-x,x+n-y\}$ for any pair of vertices $x<y$ .

The upper bound on the discrepancy follows directly from Theorem 1.4, and it only remains to prove the lower bound. To this end, we will apply the lower bound in Theorem 1.2 and need to derive a lower bound on $\|\mathbf{M}_{.,u}^{t}-\mathbf{\frac{1}{n}}\|_{2}^{2}$ . Intuitively, if we had a simple random walk, we could immediately infer that this quantity is $\Omega(1/\sqrt{t})$ , since after $t$ steps, the random walk is with probability $\approx 1/\sqrt{t}$ at any vertex with distance at most $O(\sqrt{t})$ . To prove that this also holds for the load balancing process, we first derive a concentration inequality that upper bounds the probability for the random walk to reach a distant state:

Lemma 5.1.

Consider the standard balancing circuit model on the cycle with round matrix $\mathbf{M}$ . Then for any $u\in V$ and $\delta\in(0,n/2-1)$ , we have

[TABLE]

Proof.

The proof makes use of the following variant of Azuma’s concentration inequality for martingales, which can be found, for instance, in McDiarmid’s survey on concentration inequalities.

Lemma 5.2 ([17, Theorem 3.13 & Inequality 41]).

Let $Z_{1},Z_{2},\ldots,Z_{n}$ be a martingale difference sequence with $a_{k}\leq Z_{k}\leq b_{k}$ for each $k$ , for suitable constants $a_{k},b_{k}$ . Then for any $\delta\geq 0$ ,

[TABLE]

Note that the balancing circuit on the cycle corresponds to the following random walk $(X_{1},X_{2},\ldots,X_{t})$ on the vertex set $V=\{-n/2+1,\ldots,0,\ldots,n/2-1\}$ , where for any time-step $t\in\mathbb{N}$ , $X_{t}$ denotes the position of the random walk after step $t$ . First, we consider the transition for any odd $s$ : If $X_{s}$ is odd, then with probability $1/2$ , $X_{s+1}=X_{s}+1$ and otherwise $X_{s+1}=X_{s}$ . If $X_{s}$ is even, then with probability $1/2$ , $X_{s+1}=X_{s}-1$ and otherwise $X_{s+1}=X_{s}$ (additions and subtractions are under the implicit assumptions that $-n/2+1\equiv n/2-1$ and $n/2\equiv-n/2+1$ ). The case for even $s$ is analogous.

We will couple the random walk $(X_{t})_{t\geq 0}$ with another random walk $(Y_{t})_{t\geq 0}$ on the integers $\mathbb{Z}$, where again $Y_{t}$ denotes the position of the walk after step $t$. The transition probabilities are exactly the same as for the walk $(X_{t})_{t\geq 0}$; the only difference is that we do not use the equivalences $-n/2+1\equiv n/2-1$ and $n/2\equiv-n/2+1$. It is clear that we can couple the transitions of the two walks so that they evolve identically as long as the walks do not reach either of the two boundary points $-n/2+1$ or $n/2-1$.
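The transition rule of the uncoupled walk can be written out explicitly. In the sketch below, the rule for odd steps is the one stated in the text; the rule for even steps, with the two directions swapped, is our reading of "analogous" and should be treated as an assumption. `walk_step` and `exact_distribution` are illustrative names.

```python
from collections import defaultdict
from fractions import Fraction


def walk_step(position, step_index, coin):
    """One step of the walk on the integers.

    At an odd step an odd position may move +1 and an even position may move -1;
    at an even step the directions are swapped (assumed).
    `coin` is True if the walk moves (probability 1/2) and False if it stays.
    """
    if not coin:
        return position
    position_is_odd = position % 2 == 1
    step_is_odd = step_index % 2 == 1
    return position + 1 if position_is_odd == step_is_odd else position - 1


def exact_distribution(start, num_steps):
    """Exact law of the walk after steps 1, ..., num_steps, by enumerating coins."""
    dist = {start: Fraction(1)}
    for s in range(1, num_steps + 1):
        nxt = defaultdict(Fraction)
        for pos, prob in dist.items():
            for coin in (False, True):
                nxt[walk_step(pos, s, coin)] += prob / 2
        dist = dict(nxt)
    return dist
```

Under these assumptions, two consecutive steps from an even start spread the walk uniformly over four consecutive integers, so the parity of the position is uniform afterwards. This is the observation used in the next paragraph to compute $\mathbb{E}\left[{Y_{t}}\right]$.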

Let us first analyze $\mathbb{E}\left[{Y_{t}}\right]$ for an odd time step. As described above, the distribution of $Y_{t}-Y_{t-1}$ depends on whether $Y_{t-1}$ is even or not. However, notice that regardless of where the random walk is at step $t-2$, the random walk will be at an odd or even vertex at step $t-1$ with probability $1/2$ each. Hence for any starting position $y$,

[TABLE]

and further,

[TABLE]

Combining the last two inequalities shows that for any start vertex $y$ ,

[TABLE]

With the same arguments as before we conclude that for any fixed start vertex $Y_{0}=y_{0}$ ,

[TABLE]

because the expected differences of $Y_{t}-Y_{t-1}$ are all zero whenever $t\geq 3$ .

Let us now consider the martingale $W_{i}=\mathbb{E}\left[{Y_{t}\,\mid\,Y_{0},Y_{1},\ldots,Y_{i}}\right]$ , and let $Z_{i}:=W_{i}-W_{i-1}$ be the corresponding martingale difference sequence. As shown before, $|W_{i}-W_{i-1}|\leq 2$ . Hence by Lemma 5.2,

[TABLE]

If for every $1\leq j\leq t$, $\sum_{i=1}^{j}W_{i}<\delta$ holds, then this implies that both random walks $(X_{t})_{t\geq 0}$ and $(Y_{t})_{t\geq 0}$ behave identically, since neither of them ever reaches either of the two boundary points $-n/2+1$ or $n/2-1$. In particular we conclude that for the original walk $(X_{t})_{t\geq 0}$,

[TABLE]

where the second-to-last inequality is due to the fact that $\mathbb{E}\left[{\left|\sum_{i=1}^{j}Y_{t}\right|}\right]\leq 2$ . ∎

With the help of Lemma 5.1, we can indeed verify our intuition:

Lemma 5.3.

Consider the standard balancing circuit model on the cycle with round matrix $\mathbf{M}$ . Then for any vertex $u\in V$ , $\|\mathbf{M}_{.,u}^{t}-\mathbf{\frac{1}{n}}\|_{2}^{2}=\Omega(1/\sqrt{t})$ .

Proof.

Define $S_{\delta}:=\{w\in V:\operatorname{dist}(w,u)\leq\delta\}$ , so that $|S_{\delta}|=2\delta$ . With $\delta=20\sqrt{t}$ and $t\geq 10$

[TABLE]

By the Cauchy-Schwarz inequality,

[TABLE]

∎
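The $\Omega(1/\sqrt{t})$ behaviour in Lemma 5.3 can be observed numerically for $t$ well below $n^{2}$. The sketch assumes the two-matching round matrix with averaging entries $1/2$ and reports $\sqrt{t}\cdot\|\mathbf{M}^{t}_{.,u}-\mathbf{\frac{1}{n}}\|_{2}^{2}$, which should stay bounded away from zero in that range; the function names are ours.

```python
import numpy as np


def cycle_round_matrix(n):
    """Round matrix of a two-matching balancing circuit on an n-cycle (n even)."""
    def matching(offset):
        m = np.zeros((n, n))
        for i in range(offset, n + offset, 2):
            a, b = i % n, (i + 1) % n
            m[a, a] = m[b, b] = m[a, b] = m[b, a] = 0.5
        return m
    return matching(0) @ matching(1)


def scaled_l2_deviation(n, t, u=0):
    """Return sqrt(t) * || M^t_{.,u} - 1/n ||_2^2 on the n-cycle."""
    Mt = np.linalg.matrix_power(cycle_round_matrix(n), t)
    return float(np.sqrt(t) * ((Mt[:, u] - 1.0 / n) ** 2).sum())
```

Evaluating `scaled_l2_deviation(512, t)` for a few values of `t` much smaller than `512**2` gives a quick impression of the constant hidden in the $\Omega$-notation; for $t$ of order $n^{2}$ the quantity necessarily decays, since the walk has mixed.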

Lemma 5.3 also proves that the factor $\sqrt{1/t}$ in the upper bound in Theorem 1.3 is best possible. The lower bound on the discrepancy now follows by combining Lemma 5.3 with Theorem 1.2 and Lemma 4.1 stating that for any vertex $u\in V$ , there exists another vertex $v\in V$ such that $\|\mathbf{M}_{.,u}^{t}-\mathbf{M}_{.,v}^{t}\|_{2}^{2}\geq\|\mathbf{M}_{.,u}^{t}-\mathbf{\frac{1}{n}}\|_{2}^{2}=\Omega(1/\sqrt{t})$ .

Tori. In this section we consider $r$-dimensional tori, where $r\geq 1$ is any constant. For the upper bound, note that the computation of $\mathbf{M}^{t}_{.,.}$ can be decomposed into independent computations in the $r$ dimensions, and each dimension has the same distribution as the cycle on $n^{1/r}$ vertices. Specifically, if we denote by $\widetilde{\mathbf{M}}$ the round matrix of the standard balancing circuit scheme on the cycle with $n^{1/r}$ vertices and $\mathbf{M}$ is the round matrix of the $r$-dimensional torus with $n$ vertices, then for any pair of vertices $x=(x_{1},\ldots,x_{r}),y=(y_{1},\ldots,y_{r})$ on the torus we have $\mathbf{M}_{x,y}^{t}=\prod_{i=1}^{r}\widetilde{\mathbf{M}}_{x_{i},y_{i}}^{t}.$ From Theorem 1.3, $|\widetilde{\mathbf{M}}_{x_{i},y_{i}}^{t}-\frac{1}{n^{1/r}}|=O(t^{-1/2})$, and therefore, since $r$ is constant,

[TABLE]

and thus $\left\|\mathbf{M}_{.,x}^{t}-\mathbf{\frac{1}{n}}\right\|_{2}^{2}=O(t^{-r/2})$ for any vertex $x$. Hence by Lemma 4.1, $\left\|\mathbf{M}_{.,u}^{t}-\mathbf{M}_{.,v}^{t}\right\|_{2}^{2}=O(t^{-r/2})$. Plugging this bound into Theorem 1.2 yields that the load difference between any pair of nodes $u$ and $v$ at round $t$ is at most $O(t^{-r/4}\cdot\sigma\cdot\log^{3/2}n+\sqrt{\log n})$ with probability at least $1-2n^{-2}$. The bound on the discrepancy now simply follows by the union bound.
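The product formula for the torus can be checked on a small 2D instance. The sketch assumes that the round matrix of the 2D torus is the Kronecker product of two cycle round matrices (first balance along one dimension, then along the other), with vertex $(x_{1},x_{2})$ stored at index $x_{1}\cdot\text{side}+x_{2}$; both conventions are ours.

```python
import numpy as np


def cycle_round_matrix(n):
    """Round matrix of a two-matching balancing circuit on an n-cycle (n even)."""
    def matching(offset):
        m = np.zeros((n, n))
        for i in range(offset, n + offset, 2):
            a, b = i % n, (i + 1) % n
            m[a, a] = m[b, b] = m[a, b] = m[b, a] = 0.5
        return m
    return matching(0) @ matching(1)


def torus_power_factorizes(side, t):
    """Check that M^t on the 2D torus equals the entrywise product over dimensions."""
    Mc = cycle_round_matrix(side)
    M_torus = np.kron(Mc, Mc)  # assumed round matrix of the side x side torus
    lhs = np.linalg.matrix_power(M_torus, t)
    Mc_t = np.linalg.matrix_power(Mc, t)
    # Entry ((x1, x2), (y1, y2)) of the Kronecker product is Mc_t[x1, y1] * Mc_t[x2, y2].
    return bool(np.allclose(lhs, np.kron(Mc_t, Mc_t)))
```

The identity is an instance of the mixed-product property of the Kronecker product, so it holds for every `t`; the numerical check merely makes the indexing concrete.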

We now turn to the lower bound on the discrepancy. With the same derivation as in Lemma 5.3 we obtain the following result:

Lemma 5.4.

Consider the standard balancing circuit model on the $r$ -dimensional torus with round matrix $\mathbf{M}$ . Then for any vertex $u\in V$ , $\|\mathbf{M}_{.,u}^{t}-\mathbf{\frac{1}{n}}\|_{2}^{2}=\Omega(t^{-r/2})$ .

As before, the lower bound on the torus now follows by combining Lemma 5.4 with the general lower bound given in Theorem 1.2.

Expanders. The upper bound $O(\lambda(\mathbf{M})^{t/4}\cdot\sigma\cdot(\log n)^{3/2}+\sqrt{\log n})$ for expanders follows immediately from Theorem 1.5. For the lower bound, since the round matrix consists of $d$ matchings, it is easy to verify that whenever $\mathbf{M}_{u,v}^{t}>0$ , we have $\mathbf{M}_{u,v}^{t}\geq 2^{-d\cdot t}$ . Consequently, for any vertex $u\in V$ , $\left\|\mathbf{M}_{.,u}^{t}-\mathbf{\frac{1}{n}}\right\|_{2}^{2}=\Omega(2^{-d\cdot t})$ . Plugging this into Theorem 1.2 yields a lower bound on the discrepancy which is $\Omega(2^{-d\cdot t/2}\cdot\sigma/\sqrt{\log\sigma})$ .

Hypercubes. For the hypercube, there is a worst-case bound of $\log_{2}\log_{2}n+O(1)$ [16, Theorem 5.1 $\&$ 5.3] for any input after $\log_{2}n$ iterations of the dimension-exchange, i.e., after one execution of the round matrix. Hence, we will only analyze the discrepancy after $s$ matchings, where $1\leq s<\log_{2}n$ .

The derivation of the lower bound is almost analogous to the one for expanders, since for any pair of vertices $u,v$, $\prod_{s=1}^{t}\mathbf{M}_{u,v}^{(s)}\in\{0,2^{-t}\}$ (recall that $\mathbf{M}_{.,.}^{(s)}$ is the matching applied in the $s$-th step of the dimension exchange). The only difference is that we are counting matchings individually and not full periods. By applying the same analysis as in Theorem 1.5, but with the stronger inequality $|\prod_{s=1}^{t}\mathbf{M}_{u,v}^{(s)}-\frac{1}{n}|\leq 2^{-t}$, we obtain that the discrepancy is at most $O(2^{-t/2}\cdot\sigma\cdot(\log n)^{3/2}+\sqrt{\log n})$. Applying Theorem 1.2, we obtain the lower bound $\Omega(2^{-t/2}\cdot\sigma/\sqrt{\log\sigma})$.
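The claim that the product of the first $t$ dimension-exchange matchings has entries in $\{0,2^{-t}\}$ can be checked directly. The sketch assumes that the $j$-th matching pairs each vertex $u$ with $u\oplus 2^{j}$ and that matched pairs average their load; the function names are hypothetical.

```python
import numpy as np


def dimension_exchange_matching(dim, j):
    """Matching matrix pairing every vertex u of the dim-cube with u XOR 2**j."""
    n = 1 << dim
    m = np.zeros((n, n))
    for u in range(n):
        m[u, u] = 0.5
        m[u, u ^ (1 << j)] = 0.5
    return m


def product_of_first_matchings(dim, t):
    """Product of the matchings for dimensions 0, ..., t-1."""
    prod = np.eye(1 << dim)
    for j in range(t):
        prod = prod @ dimension_exchange_matching(dim, j)
    return prod
```

After $t$ matchings the load of a vertex is spread evenly over the $2^{t}$ vertices that agree with it outside the first $t$ coordinates, which is why every non-zero entry equals $2^{-t}$ and the deviation from $1/n$ is at most $2^{-t}$.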

6. Discussion and Empirical Results

6.1. Average-Case versus Worst-Case

We will now compare our average-case to a worst-case scenario on cycles, 2D-tori and hypercubes. For the sake of concreteness, we always assume that the input is drawn from the uniform distribution $\mathsf{Uni}[0,2K]$, where $K$ will be specified later. Note that the total number of tokens is $\approx n\cdot K$, and the initial discrepancy will be $\Theta(K)$. Our choice for the worst-case load vector will have the same number of tokens and initial discrepancy; however, the exact definition of the vector as well as the choice of the parameter $K$ will depend on the underlying topology.

Cycles. As one representative of a worst-case setting, fix an arbitrary node $u\in V$ and let all nodes with distance at most $n/4$ initially have a load of $2K$ while all other nodes have load $0$. This gives rise to a load vector with $n\cdot K$ tokens and initial discrepancy $2K$.

2D-Tori. Again, we fix an arbitrary node $u\in V$ and assign a load of $2K$ to the $n/2$ nearest neighbors of $u$ and load $0$ to the other nodes. Again, this defines a load vector with $n\cdot K$ tokens and initial discrepancy $2K$.

The next result provides a lower bound on the discrepancy for cycles and 2D-tori in the aforementioned worst-case setting. It essentially shows that for worst-case inputs, $\Omega(n^{2})$ rounds and $\Omega(n)$ rounds are necessary for the cycle and 2D-tori, respectively, in order to reduce the discrepancy by more than a constant factor. This stands in sharp contrast to Theorem 1.4, which proves a decay of the discrepancy by $\approx t^{-1/4}$, starting from the first round.

Proposition 6.1.

For the aforementioned worst-case setting on the cycle, it holds for any round $t>0$ that $\operatorname{disc}(x^{(t)})\geq\frac{1}{8}\cdot K\cdot\left(1-\exp\left(-\frac{n^{2}}{2048t}\right)\right)-\sqrt{48\log n},$ with probability at least $1-n^{-1}$ . Further, for 2D-tori, it holds for any round $t>0$ that $\operatorname{disc}(x^{(t)})\geq\frac{1}{8}\cdot K\cdot\left(1-\exp\left(-\frac{n}{2048t}\right)\right)-\sqrt{48\log n},$ with probability at least $1-n^{-1}$ .

Proof.

We first consider the case of a cycle. Let $S_{1}$ be the subset of nodes that have a non-zero initial load; so $|S_{1}|=n/2$ . Clearly, there is a subset of nodes $S_{2}\subseteq V$ with $|S_{2}|=n/8$ so that for each node $u\in S_{2}$ , only nodes $v$ with $\operatorname{dist}(u,v)\geq n/16$ can have $x_{v}^{(0)}>0$ .

We will now derive a lower bound on the discrepancy in this worst-case setting by upper bounding the load of vertices in the subset $S_{2}$ . To lower bound the discrepancy at round $t$ , recall that by Lemma 5.1 we have that

[TABLE]

Let us now choose $\delta=n/16$ , and we thus conclude that

[TABLE]

This implies for the total load of vertices in $S_{2}$ at time $t$ :

[TABLE]

where $K$ is the average load. Recalling that $|S_{2}|=n/8$ , by the pigeonhole principle there exists a node $v\in S_{2}$ such that

[TABLE]

This immediately implies the following lower bound on the discrepancy:

[TABLE]

where $\bar{\xi}=K$ is the average load. The corresponding lower bound on $\operatorname{disc}(x^{(t)})$ follows by Theorem 3.1 and the union bound.

The proof for the 2-dimensional torus is almost identical. Again, let $S_{1}$ be the set of nodes that have a non-zero load. Clearly, there is a subset $S_{2}\subseteq V$ with $|S_{2}|=n/8$ so that for each node $u\in S_{2}$ , only nodes $v$ with $\operatorname{dist}(u,v)\geq\sqrt{n}/16$ can have $x_{v}^{(0)}>0$ .

Let us now view $\mathbf{M}$ as the transition matrix of a Markov chain. Then $\mathbf{M}^{t}$ is obtained by running two independent Markov chains (one for each dimension), where each of the two Markov chains corresponds to the round matrix of the cycle. We can still apply Lemma 5.1 as before, even though here the size of each cycle is $\sqrt{n}$ , to obtain that

[TABLE]

Here we choose $\delta=\sqrt{n}/16$ , and the remaining part of the proof is exactly the same as before. ∎
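To get a feeling for the bounds in Proposition 6.1, they can be evaluated for concrete parameters. The sketch below assumes that $\log$ denotes the natural logarithm; the function names are ours.

```python
import math


def cycle_worst_case_lower_bound(n, K, t):
    """K/8 * (1 - exp(-n^2 / (2048 t))) - sqrt(48 log n), as stated for the cycle."""
    return K / 8 * (1 - math.exp(-n**2 / (2048 * t))) - math.sqrt(48 * math.log(n))


def torus_worst_case_lower_bound(n, K, t):
    """K/8 * (1 - exp(-n / (2048 t))) - sqrt(48 log n), as stated for 2D-tori."""
    return K / 8 * (1 - math.exp(-n / (2048 * t))) - math.sqrt(48 * math.log(n))
```

For the cycle, the exponential term is negligible as long as $t$ is much smaller than $n^{2}$, so the bound stays close to $K/8$ throughout that range; it only starts to drop once $t$ approaches the order of $n^{2}$ (respectively $n$ for 2D-tori).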

Hypercube. Regarding the hypercube, we will consider only $\log_{2}n$ rounds, since the discrepancy is $\log_{2}\log_{2}n+O(1)$ after $\log_{2}n$ rounds and $O(1)$ after $2\log_{2}n$ rounds [16]. A natural corresponding worst-case distribution is to have load $2K$ on all nodes whose $\log_{2}n$-th bit is equal to one and load $0$ otherwise. This way, the discrepancy is only reduced in the final round $\log_{2}n$.

6.2. Experimental Setup

For each of the three graphs (cycles, 2D-tori and hypercubes), we consider two comparative experiments with an average-case load vector and a worst-case initial load vector each. The plots and tables in Appendix A display the results, where for each case we took the average discrepancy over 10 independent runs.

The first experiment considers a “lightly loaded case”, where the theoretical results suggest that a small (i.e., constant or logarithmic) discrepancy is reached well before the expected “worst-case load balancing times”, which are $\approx n^{2}$ for cycles and $\approx n$ for 2D-tori. The second experiment considers a “heavily loaded case”, where the theoretical results suggest that a small discrepancy is not reached faster than in the worst-case.

Specifically, for cycles and 2D-tori, we choose for the lightly loaded case $K=\sqrt{n}$ and for the heavily loaded case $K=n^{2}$. The experiments confirm the theoretical results in the sense that for both choices of $K$, we have a much quicker convergence of the discrepancy than in the corresponding worst cases. However, the experiments also demonstrate that only in the lightly loaded case do we reach a small discrepancy quickly, whereas in the heavily loaded case there is no big difference between worst-case and average-case when it comes to the time to reach a small discrepancy.
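The parameter choices for cycles and 2D-tori translate into initial discrepancies as follows. This is a small arithmetic sketch: it assumes $n$ is a perfect square (true for the powers of two used in the experiments) and uses the initial discrepancy $2K$ of the worst-case vectors from Section 6.1; the function name and the regime labels are ours.

```python
import math


def initial_discrepancy(n, regime):
    """Initial discrepancy 2K for cycles and 2D-tori.

    regime = "light": K = sqrt(n); regime = "heavy": K = n^2.
    """
    if regime == "light":
        K = math.isqrt(n)
        if K * K != n:
            raise ValueError("this sketch assumes n is a perfect square")
    elif regime == "heavy":
        K = n * n
    else:
        raise ValueError("regime must be 'light' or 'heavy'")
    return 2 * K
```

For example, `initial_discrepancy(2**12, "light")` evaluates to $2^{7}$ and `initial_discrepancy(2**12, "heavy")` to $2^{25}$, matching the initial discrepancies reported for the cycle experiments in Appendix A.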

On the hypercube, since we are interested in the case where $1\leq t\leq\log_{2}n$, our bounds on the discrepancy indicate that we should choose $K$ smaller than in the case of cycles and 2D-tori. That is why we choose $K=n^{1/4}$ in the lightly loaded case and $K=n$ in the heavily loaded case. (As a side remark, we note that due to the symmetry of the hypercube, any initial load vector sampled from $\mathsf{Uni}[0,\beta\cdot(n-1)]$ is equivalent to an initial load vector sampled from $\mathsf{Uni}[0,n-1]$.) With these adjustments of $K$ in both cases, the experimental results for the hypercube are in line with the ones for the cycle and 2D-tori.

The details of the experiments, including plots and tables with the sampled discrepancies, can be found in Appendix A.

Appendix A Experimental Data and Charts

[Figure: plots of the worst-case discrepancy $\operatorname{disc}_{wc}(x^{(t)})$ and the average-case discrepancy $\operatorname{disc}_{ac}(x^{(t)})$ against the round $t$ for experiments (i) to (iii), each accompanied by a table of sampled discrepancies.]

[TABLE]

[TABLE]

[TABLE]

Experimental Results: Experiments (i) on the cycle with $n=2^{12}$ and initial discrepancy $2^{7}=128$ , (ii) on the cycle with $n=2^{12}$ and initial discrepancy $2^{25}=33,554,432$ , and (iii) on the 2D-torus with $n=2^{16}$ and initial discrepancy of $2^{9}$ . For the heavily loaded case, we used logarithmic scaling on the $y$ -axis to highlight the behavior when $t$ is close to the worst-case load balancing time.

[Figure: plots of the worst-case discrepancy $\operatorname{disc}_{wc}(x^{(t)})$ and the average-case discrepancy $\operatorname{disc}_{ac}(x^{(t)})$ against the round $t$ for experiments (iv) to (vi), each accompanied by a table of sampled discrepancies.]

[TABLE]

[TABLE]

[TABLE]

Experimental Results (cntd.): Experiments (iv) on the 2D-torus with $n=2^{16}$ and initial discrepancy $2^{33}=8,589,934,592$ , (v) on the hypercube with $n=2^{28}$ and initial discrepancy $256$ , and (vi) on the hypercube with $n=2^{28}$ and initial discrepancy of $2^{28}=268,435,456$ . For the heavily loaded cases, we used logarithmic scaling on the $y$ -axis to highlight the behaviour when $t$ is close to the worst-case load balancing time.

Appendix B Concentration Tools

Lemma B.1 ([3, Theorem A.1.15]).

Let $X$ have a Poisson distribution with mean $\mu$ . Then for any $\epsilon>0$ ,

[TABLE]

Theorem B.2 (Optional Stopping Theorem [14, Corollary 17.7]).

Let $(M_{t})$ be a martingale and $\tau$ a stopping time. If $\mathbb{P}\left[{\tau<\infty}\right]=1$ and $|M_{t\land\tau}|\leq K$ for all $t$ and some constant $K$, where $t\land\tau:=\min\{t,\tau\}$, then $\mathbb{E}\left[{M_{\tau}}\right]=\mathbb{E}\left[{M_{0}}\right]$.

Theorem B.3 (Hoeffding’s Inequality [12]).

Consider a collection of independent random variables $X_{i}\in[a_{i},b_{i}]$ with $i\in[n]$ . Then for any number $\delta>0$ ,

[TABLE]

Bibliography

[1] Milton Abramowitz, Irene A Stegun, et al. Handbook of mathematical functions. Applied mathematics series, 55:62, 1966.
[2] Dan Alistarh, Keren Censor-Hillel, and Nir Shavit. Are lock-free concurrent algorithms practically wait-free? J. ACM, 63(4):31:1–31:20, 2016.
[3] N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, 2nd edition, 2000.
[4] Aris Anagnostopoulos, Adam Kirsch, and Eli Upfal. Load balancing in arbitrary network topologies with stochastic adversarial input. SIAM J. Comput., 34(3):616–639, 2005.
[5] Andrew C Berry. The accuracy of the gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136, 1941.
[6] J. E. Boillat. Load balancing and poisson equation in a graph. Concurrency: Pract. Exper., 2:289–313, 1990.
[7] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized Gossip Algorithms. IEEE Transactions on Information Theory and IEEE/ACM Transactions on Networking, 52:2508–2530, 2006.
[8] G. Cybenko. Load balancing for distributed memory multiprocessors. J. Parallel and Distributed Comput., 7:279–301, 1989.
