Dynamic Averaging Load Balancing on Arbitrary Graphs

Petra Berenbrink; Lukas Hintze; Hamed Hosseinpour; Dominik Kaaser; Malin Rau

arXiv:2302.12201·cs.DC·February 24, 2023


TL;DR

This paper analyzes dynamic load balancing on arbitrary graphs using averaging over matchings, providing bounds on load discrepancy for both discrete and continuous loads, and introduces a novel drift analysis technique.

Contribution

It presents the first analysis of discrete and dynamic averaging load balancing on general graphs, employing a new drift technique linked to electrical network resistance.

Findings

01 Bounds the discrepancy for three matching models

02 Applies to a broad class of graphs

03 Introduces a drift analysis method


Tables

Table 1: Asymptotic upper bounds on the discrepancy in specific graph classes.

Graph	$\textsc{SBal}(\mathcal{D}_{\textsc{RM}}(G),1,m)$ (Section B.5)	$\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ (Section C.1)	$\textsc{ABal}(\mathcal{D}_{\textsc{A}}(G),1)$ (Section D.1)
$d$-regular graph (const. $d$)	$\log (n) + \sqrt{m \cdot \log (n)}$	$\log (n) + \sqrt{m \cdot \log (n)}$	$\sqrt{n \cdot \log (n)}$
cycle $C_{n}$	$\log (n) + \sqrt{m \cdot \log (n)}$	$\log (n) + \sqrt{m \cdot \log (n)}$	$\sqrt{n \cdot \log (n)}$
2-D torus	$\log (n) + \sqrt{m / n} \cdot \log^{3 / 2} (n)$	$(1 + \sqrt{m / n}) \cdot \log (n)$	$\log^{3 / 2} (n)$
$r$-D torus (const. $r \geq 3$)	$(1 + \sqrt{m / n}) \cdot \log (n)$	$\log (n) + \sqrt{m / n \cdot \log (n)}$	$\log (n)$
hypercube	$(1 + \sqrt{m / n}) \cdot \log (n)$	$(1 + \sqrt{m / n}) \cdot \log (n)$	$\log (n)$

Table 2: Asymptotic lower bounds on the discrepancy in specific graph classes.

Graph	$\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ (Corollary C.2)
$d$-regular graph (const. $d$)	$\sqrt{m}$
cycle $C_{n}$	$\sqrt{m}$
2-D torus	$\sqrt{(m / n) \cdot \log (n)}$
$r$-D torus (const. $r \geq 3$)	$\sqrt{m / n}$
hypercube	$\sqrt{m / n}$

Equations

L_{i,j}\coloneqq\begin{cases}\left\lceil\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}\right\rceil,&\text{with probability }p,\\[5pt] \left\lfloor\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}\right\rfloor,&\text{with probability }1-p.\end{cases}

\mathbf{M}^{\beta}_{i,j}(t)\coloneqq\begin{cases}1,&\text{if }i=j\text{ and }i\text{ is not matched at time }t,\\ 1-\beta/2,&\text{if }i=j\text{ and }i\text{ is matched at time }t,\\ \beta/2,&\text{if }i\text{ and }j\text{ are matched at time }t,\\ 0,&\text{otherwise.}\end{cases}

\mathbf{M}^{[t_{1},t_{2}]}\coloneqq\mathbf{M}(t_{2})\cdot\mathbf{M}(t_{2}-1)\cdot\dots\cdot\mathbf{M}(t_{1}+1)\cdot\mathbf{M}(t_{1}),

\vec{X}(t)=\mathbf{M}(t)\cdot\left(\vec{X}(t-1)+\vec{\ell}(t)\right)+\vec{\varepsilon}(t).

\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}\left(\log(n)\cdot\left(1+\sqrt{\frac{m}{n}\cdot\frac{\operatorname{t^{*}_{\mathrm{hit}}(G)}}{n}}\right)+\sqrt{\frac{\log(n)}{\beta}\cdot\frac{m}{n}\cdot T(G)}\right).

\begin{aligned}\vec{X}(t)&=\mathbf{M}(t)\cdot\Big(\underbrace{\left(\mathbf{M}(t-1)\cdot\left(\vec{X}(t-2)+\vec{\ell}(t-1)\right)+\vec{\varepsilon}(t-1)\right)}_{\vec{X}(t-1)}+\vec{\ell}(t)\Big)+\vec{\varepsilon}(t)\\ &=\mathbf{M}^{[t-1,t]}\cdot\vec{X}(t-2)+\sum_{\tau=t-1}^{t}\mathbf{M}^{[\tau,t]}\cdot\vec{\ell}(\tau)+\sum_{\tau=t-1}^{t}\mathbf{M}^{[\tau+1,t]}\cdot\vec{\varepsilon}(\tau)\end{aligned}

\vec{X}(t)=\underbrace{\mathbf{M}^{[1,t]}\cdot\vec{X}(0)}_{\vec{I}(t)}+\underbrace{\sum_{\tau=1}^{t}\mathbf{M}^{[\tau,t]}\cdot\vec{\ell}(\tau)}_{\vec{D}(t)}+\underbrace{\sum_{\tau=1}^{t}\mathbf{M}^{[\tau+1,t]}\cdot\vec{\varepsilon}(\tau)}_{\vec{R}(t)}.

\operatorname{\mathrm{disc}}(\vec{X}(t))\leq\operatorname{\mathrm{disc}}(\vec{I}(t))+\operatorname{\mathrm{disc}}(\vec{D}(t))+\operatorname{\mathrm{disc}}(\vec{R}(t)).

\operatorname{\mathrm{disc}}(\vec{D}(t))={\operatorname{O}}\left(\gamma\log(n)\cdot\left(1+\sqrt{\frac{m}{n}\cdot\frac{\operatorname{t^{*}_{\mathrm{hit}}(G)}}{n}}\right)+\sqrt{\frac{\gamma\log(n)}{\beta}\cdot\frac{m}{n}\cdot T(G)}\right).

\operatorname{\mathrm{disc}}(\vec{I}(t))\leq 1.

\operatorname{\mathrm{disc}}(\vec{R}(t))\leq 2\cdot\gamma\log(n)/\beta.

\Upsilon(\mathbf{M}^{[t]})\coloneqq\max_{k\in[n]}\Upsilon_{k}(\mathbf{M}^{[t]}),\quad\textup{where}\quad\Upsilon_{k}(\mathbf{M}^{[t]})\coloneqq\sqrt{\sum_{\tau=1}^{t}\left\lVert\mathbf{M}^{[\tau,t]}_{k,\cdot}-\frac{\vec{1}}{n}\right\rVert_{2}^{2}}.

\left\lvert D_{k}(t)-t\cdot\frac{m}{n}\right\rvert\geq\frac{4}{3}\cdot\gamma\log(n)+\sqrt{8\gamma\log(n)\cdot\frac{m}{n}}\cdot\Upsilon_{k}(\mathbf{m}^{[t]}).

B(\tau,j,w)\coloneqq\begin{cases}1,&\text{if the }j\text{-th load item of step }\tau\text{ is allocated to node }w,\\ 0,&\text{otherwise.}\end{cases}

D_{k}(t)

\operatorname{\mathbb{E}}\left[C_{k}(\tau,j)\right]=\operatorname{\mathbb{E}}\left[\sum_{w\in[n]}\left(\mathbf{m}^{[\tau,t]}_{k,w}\cdot B(\tau,j,w)\right)\right]=\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot\operatorname{\mathbb{E}}\left[B(\tau,j,w)\right]=\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot\frac{1}{n}=\frac{1}{n},

\operatorname{\mathrm{Var}}\left[C_{k}(\tau,j)\right]=\sum_{w^{\prime}\in[n]}\frac{1}{n}\cdot\left(\mathbf{m}^{[\tau,t]}_{k,w^{\prime}}-\frac{1}{n}\right)^{2}=\frac{1}{n}\cdot\left\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-\frac{\vec{1}}{n}\right\rVert_{2}^{2},

\operatorname{\mathrm{Var}}\left[\sum_{\tau=1}^{t}\sum_{j\in[m]}C_{k}(\tau,j)\right]=\frac{m}{n}\cdot\left(\Upsilon_{k}(\mathbf{m}^{[t]})\right)^{2},

\operatorname{\mathbb{P}}\left[D_{k}(t)-t\cdot\frac{m}{n}\geq\frac{2}{3}\cdot\gamma\log(n)+\sqrt{2\gamma\log(n)\cdot\frac{m}{n}}\cdot\Upsilon_{k}(\mathbf{m}^{[t]})\right]\leq n^{-\gamma}.

\operatorname{\mathbb{P}}\left[\left\lvert D_{k}(t)-t\cdot\frac{m}{n}\right\rvert\geq\frac{4}{3}\cdot\gamma\log(n)+\sqrt{8\gamma\log(n)\cdot\frac{m}{n}}\cdot\Upsilon_{k}(\mathbf{m}^{[t]})\right]\leq 2\cdot n^{-\gamma}.

\operatorname{\Phi}(\vec{x})\coloneqq\sum_{i\in[n]}\left(x_{i}-\overline{x}\right)^{2},\quad\text{where}\quad\overline{x}\coloneqq\frac{1}{n}\cdot\sum_{j\in[n]}x_{j}.

\operatorname{\Psi}_{S}(\vec{x})\coloneqq\sum_{\{i,j\}\in S}(x_{i}-x_{j})^{2}.

\left(\Upsilon_{k}(\mathbf{M}^{[t]})\right)^{2}\leq 8\sigma^{2}(\gamma\log(n)+\log(8\sigma^{2}))+\frac{2}{\beta}\cdot\int_{0}^{1}\frac{x}{g(x)}\,\mathrm{d}x.

g_{G}(x)\coloneqq\frac{1}{16d}\cdot\max\left\{d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot x,\frac{x^{2}}{\mathrm{Res}(G)},\frac{4}{27}\cdot x^{3}\right\}\text{ and }\sigma_{G}^{2}=32\cdot(\operatorname{t^{*}_{\mathrm{hit}}(G)}/n)+5.

\operatorname{\Phi}(\vec{x})-\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]\geq\frac{1}{16d}\cdot\operatorname{\Psi}_{G}(\vec{x}).

\operatorname{\Psi}_{G}(\vec{x})\geq\max\left\{d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\operatorname{\Phi}(\vec{x}),\frac{\operatorname{\Phi}(\vec{x})^{2}}{\mathrm{Res}(G)},\frac{4}{27}\cdot\operatorname{\Phi}(\vec{x})^{3}\right\}.

\operatorname{\Phi}(\vec{x})-\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]\geq\frac{1}{16d}\cdot\max\left\{d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\operatorname{\Phi}(\vec{x}),\frac{\operatorname{\Phi}(\vec{x})^{2}}{\mathrm{Res}(G)},\frac{4}{27}\cdot\operatorname{\Phi}(\vec{x})^{3}\right\},


Full text

Dynamic Averaging Load Balancing on Arbitrary Graphs

Petra Berenbrink¹, Lukas Hintze¹, Hamed Hosseinpour¹, Dominik Kaaser², Malin Rau¹

¹ Universität Hamburg, Germany
² TU Hamburg, Germany

Abstract

In this paper we study dynamic averaging load balancing on general graphs. We consider infinite time and dynamic processes, where in every step new load items are assigned to randomly chosen nodes. A matching is chosen, and the load is averaged over the edges of that matching. We analyze the discrete case where load items are indivisible; moreover, our results also carry over to the continuous case where load items can be split arbitrarily. For the choice of the matchings we consider three different models: random matchings of linear size, random matchings containing only single edges, and deterministic sequences of matchings covering the whole graph. We bound the discrepancy, which is defined as the difference between the maximum and the minimum load. Our results cover a broad range of graph classes and, to the best of our knowledge, our analysis is the first result for discrete and dynamic averaging load balancing processes. As our main technical contribution we develop a drift result that allows us to apply techniques based on the effective resistance in an electrical network to the setting of dynamic load balancing.

Petra Berenbrink, Hamed Hosseinpour, Malin Rau: Supported by DFG Research Group ADYN (FOR 2975) under grant DFG 411362735.

Petra Berenbrink, Hamed Hosseinpour: Supported by the DFG under grant 427756233.

1 Introduction

Parallel and distributed computing is ubiquitous in science, technology, and beyond. Key to the performance of a distributed system is the efficient utilization of resources: to obtain a substantial speed-up, it is of utmost importance that all processors handle the same amount of work. Unfortunately, many practical applications such as finite element simulations are highly “irregular”, and the amount of load generated on some processors is much larger than the amount of load generated on others. We therefore investigate load balancing to redistribute the load. Efficient load balancing schemes have a plenitude of applications, including high performance computing [45], cloud computing [39], numerical simulations [37], and finite element simulations [41].

In this paper we consider neighborhood load balancing on arbitrary graphs with $n$ nodes, where the nodes balance their load in each step only with their direct neighbors. We assume discrete load items as opposed to continuous (or idealized) load items which can be broken into arbitrarily small pieces. We study infinite and dynamic processes where new load items are generated in every step. We consider two different settings. In the synchronous setting $m$ load items are generated on randomly chosen nodes. Then a matching is chosen and the load of the nodes is balanced (via weighted averaging) over the edges of that matching. Here we further distinguish between two matching models. We consider the random matching model where linear-size matchings are randomly chosen, and the balancing circuit model where the graph is divided deterministically into $d_{\max}$ many matchings. Here $d_{\max}$ is the maximum degree of any node. In the asynchronous model exactly one load item is generated on a randomly chosen node. In turn, the node chooses one of its edges at random and balances its load with the corresponding neighbor. This model can be regarded as a variant of the synchronous model where the randomly chosen matching has size one. It was introduced by [4] where the authors show results for cycles assuming continuous load. Our goal is to bound the so-called discrepancy, which is defined as the maximal load of any node minus the minimal load of any node.

Results in a Nutshell

In this paper we present, for the three models introduced above, bounds on the expected discrepancy and bounds that hold with high probability. Our bounds for the synchronous model with balancing circuits hold for arbitrary graphs $G$; the bounds for the asynchronous model and the synchronous model with random matchings hold for regular graphs $G$ only. For the asynchronous model and the model with random matchings our bounds on the discrepancy are expressed in terms of hitting times of a standard random walk on $G$, as well as in terms of the spectral gap of the Laplacian of $G$. For the synchronous model with balancing circuits we express our bounds in terms of the global divergence. This can be thought of as a measure of the convergence speed of the Markov chains modeling a random walk on $G$. However, it does not directly measure the speed of convergence of the chain; rather, it accounts for the time period in which the chain keeps a given distance from the stationary (and uniform) distribution. In physics terminology, it is a measure of total absement, which is the time-integral of displacement.

For all three infinite processes our bounds on the discrepancy hold at an arbitrary point of time as long as the system is initially empty. Otherwise, the bounds hold after an initial time period whose length is a function of the initial discrepancy. In the following we give some exemplary results assuming that the system is initially empty and $m=n$. For the synchronous model with random matchings and the asynchronous model we can bound the discrepancy by ${\operatorname{O}}(\sqrt{n}\log(n))$ for any regular graph $G$. Our results show a polylogarithmic bound on the discrepancy for all regular graphs with a hitting time at most ${\operatorname{O}}(n\operatorname{\mathrm{poly}}\log(n))$ (e.g., the two-dimensional torus or the hypercube). In all models we can bound the discrepancy by ${\operatorname{O}}(\sqrt{n\log(n)})$ for arbitrary constant-degree regular graphs. For the full results we refer the reader to Theorem 3.1, Theorem 4.1, and Theorem 5.1. We give a detailed overview of the results on specific graph classes in Table 1 in Section 7.

All bounds presented in this paper also hold for the corresponding continuous processes without rounding. The authors of [4] consider the asynchronous process on cycles in the continuous setting where the load items can be divided into arbitrarily small pieces. They bound the expected discrepancy and show that $\operatorname{\mathrm{disc}}(G)=O(\sqrt{n}\log(n))$ for a cycle $G$ with $n$ nodes. We improve that bound for the cycle to $\operatorname{\mathrm{disc}}(G)=O(\sqrt{n\log(n)})$. Note that our result not only bounds the expected discrepancy but also holds with high probability.
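To get a feel for the size of this improvement on the cycle, the ratio of the two bounds can be evaluated numerically. The sketch below ignores the constants hidden in the $O$-notation and assumes natural logarithms; the function name `bound_ratio` is chosen here for illustration only. The ratio simplifies to $\sqrt{\log(n)}$, so the gain grows slowly with $n$.

```python
import math


def bound_ratio(n):
    """Ratio sqrt(n)*log(n) / sqrt(n*log(n)), with all constants ignored."""
    earlier = math.sqrt(n) * math.log(n)   # bound shape attributed to [4]
    improved = math.sqrt(n * math.log(n))  # bound shape stated for this paper
    return earlier / improved


for n in (10**3, 10**6, 10**9):
    # The second and third columns agree: the ratio equals sqrt(log n).
    print(n, round(bound_ratio(n), 3), round(math.sqrt(math.log(n)), 3))
```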

Our main analytical vehicle is a drift theorem that bounds the tail of the sum of a non-increasing sequence of random variables. Our drift theorem adapts known drift results from the literature, similarly to the Variable Drift Theorem in [31].

1.1 Related Work

There is a vast body of literature on iterative load balancing schemes on graphs where nodes are allowed to balance (or average) their load with neighbors only. One distinguishes between diffusion load balancing, where the nodes balance their load with all neighbors at the same time, and the matching (or dimension exchange) model, where the edges which are used for the balancing form a matching. In the latter model every resource is only involved in one balancing action per step, which greatly facilitates the analysis.

In this overview we only consider theoretical results and, as it is beyond the scope of this work to provide a complete survey, we focus on results for discrete load balancing. For results about continuous load balancing see, for example, [18, 29]. There are also many results on balancing schemes where the tokens (acting as selfish players), rather than the resources, try to find a resource with minimum load. See [22] for a comprehensive survey about selfish load balancing and [2, 27, 12] for some recent results. Another related topic is token distribution where nodes do not balance their entire load with neighbors but send only single tokens over to neighboring nodes with a smaller load. See [24, 7, 42] for the static setting and [6] for the dynamic setting.

Discrete Models

The authors of [40] give the first rigorous result for discrete load balancing in the diffusion model. They assume that the number of tokens sent along each edge is obtained by rounding down the amount of load that would be sent in the continuous case. Using this approach they establish that the discrepancy is at most $O(n^{2})$ after $O(\log(Kn))$ steps, where $K$ is the initial discrepancy. Similar results for the matching model are shown in [25]. While always rounding down may lead to quick stabilization, the discrepancy tends to be quite large, a function of the diameter of the graph. Therefore, the authors of [43] suggest using randomized rounding in order to get a better approximation of the continuous case. They show results for a wide class of diffusion and matching load balancing protocols and introduce the so-called local divergence, which aggregates the sum of load differences over all edges in all rounds. The authors prove that the local divergence gives an upper bound on the maximum deviation between the continuous and discrete case of a protocol. In [23] the authors show several results for a randomized protocol with rounding in the matching model. For complete graphs their results show a discrepancy of $O(n\sqrt{\log n})$ after $\Theta(\log(Kn))$ steps. Later, [8] extended some of these results to the diffusion model. In [44] the authors show that the number of rounds needed to reach constant discrepancy is w.h.p. bounded by a function of the spectral gap of the relevant mixing matrix and the initial discrepancy. In [9] the authors propose a very simple potential function technique to analyze discrete diffusion load balancing schemes, both for discrete and continuous settings. In [10] the authors investigate a load balancing process on complete graphs. In each round a pair of nodes is selected uniformly at random, and the two nodes completely balance their loads up to a rounding error of $\pm 1$.

The authors of [15] study load balancing via matchings assuming random placement of the load items. The initial load distribution is sampled from exponentially concentrated distributions (including the uniform, binomial, geometric, and Poisson distributions). The authors show that in this setting the convergence time is smaller than in the worst case setting. Regardless of the graph’s topology, the discrepancy decreases by a factor of $\sqrt[4]{t}$ within $t$ synchronous rounds. Their approach of using concentration inequalities to bound the discrepancy (in terms of the squared $2$ -norm of the columns of the matrices underlying the mixing process) strongly influenced our approach.

Dynamic Models

There are far fewer results for the dynamic setting where new load enters the system over time. In [4] the authors study a model similar to our asynchronous model. In each step one load item is allocated to a chosen node. In the same step the chosen node picks a random neighbor, and the two nodes balance their loads by averaging them (continuous model). The authors show that the expected discrepancy is bounded by $O(\sqrt{n}\log n)$, and they give a lower bound of $\Omega(n)$ on the square of the discrepancy. The authors of [5] consider load balancing via matchings in a dynamic model where the load is, in every step, distributed by an adversary. They show the system is stable for sufficiently limited adversaries. They also give some upper bounds on the maximum load for a somewhat more restricted adversary. The authors of [11] consider discrete dynamic diffusion load balancing on arbitrary graphs. In each step up to $n$ load items are generated on arbitrary nodes (the allocation is determined by an adversary). Then the nodes balance their load with each neighbor and finally one load item is deleted from every non-empty node. The authors show that the system is stable, which means that the total load remains bounded over time (as a function of $n$ alone and independently of the time $t$).

2 Balancing Models and Notation

We consider the following class of dynamic load balancing processes on $d$ -regular graphs $G$ with $n$ nodes $V(G)=[n]$ . Each process is modeled by a Markov chain $(\vec{X}(t))_{t\in\operatorname{\mathbb{N}}_{0}}$ , where the load vector $\vec{X}(t)=(X_{i}(t))_{i\in[n]}\in\operatorname{\mathbb{R}}^{n}$ is the state of the process at the end of step $t$ , and $X_{i}(t)$ is the load of node $i$ at time $t$ . We measure a load vector’s imbalance by the discrepancy $\operatorname{\mathrm{disc}}(\vec{x})$ , which is the difference between the maximum load and the minimum load $\operatorname{\mathrm{disc}}(\vec{x})\coloneqq\max_{i\in[n]}x_{i}-\min_{j\in[n]}x_{j}$ .
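The discrepancy defined above is straightforward to compute. The following is a minimal sketch; the function name `disc` mirrors the notation of the text, and the example load vector is made up for illustration.

```python
def disc(x):
    """Discrepancy of a load vector: maximum load minus minimum load."""
    return max(x) - min(x)


loads = [7, 3, 5, 5]   # hypothetical load vector on n = 4 nodes
print(disc(loads))     # 7 - 3 = 4
print(disc([2, 2, 2])) # perfectly balanced, so 0
```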

We consider two balancing processes, the synchronous process SBal and the asynchronous process ABal. Both processes are parameterized by a balancing parameter $\beta$ determining the balancing speed and a matching distribution $\mathcal{D}(G)$ . For SBal, $\mathcal{D}(G)$ is a distribution over linear-sized matchings of $G$ . For ABal, $\mathcal{D}(G)$ is a distribution over edges of $G$ . SBal is additionally parameterized by the number of load items $m\in\operatorname{\mathbb{N}}^{+}$ allocated in each round. ABal allocates only one new load item per step.

Synchronous Processes

The synchronous process $\textsc{SBal}(\mathcal{D}(G),\beta,m)$ works as follows. The process first allocates $m$ items to randomly chosen nodes. Then it uses the matching distribution $\mathcal{D}(G)$ to determine the matching which is applied. Finally it balances the load over the edges of the matching (see Process $\textsc{Bal}(\mathbf{m},\beta)$ described below). The parameter $\beta\in(0,1]$ controls the fraction of the load difference that is sent over an edge in a step.

For the synchronous process SBal we consider two families of matching distributions, random matchings ( $\mathcal{D}_{\textsc{RM}}(G)$ ) and balancing circuits ( $\mathcal{D}_{\textsc{BC}}(G)$ ). $\mathcal{D}_{\textsc{RM}}(G)$ is generated according to the following method described in [25]. First an edge set $S$ is formed by including each edge with probability $1/(4d)-1/(16d^{2})=\Theta(1/d)$, independently of all other edges. Then a linear-sized matching $\mathbf{M}(t)\subseteq S$ is computed locally. We will use capital $\mathbf{M}$ for randomly chosen matchings. The analysis for the random matching model can be found in Section 3. In the balancing circuit model we assume $G$ is covered by $\zeta$ fixed matchings $\mathbf{m}(1),\ldots,\mathbf{m}(\zeta)$. $\mathcal{D}_{\textsc{BC}}(G)$ deterministically chooses matchings in a periodic manner such that in step $t$ the matching $\mathbf{m}(t)=\mathbf{m}(t\bmod\zeta)$ is chosen. We will use small $\mathbf{m}$ for deterministically chosen matchings. The analysis for the balancing circuit model can be found in Section 4.
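The two matching models can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the construction from [25]: the text only says that the matching is "computed locally", and the rule used here (keep a sampled edge only if no other sampled edge shares an endpoint with it) is an assumption made for this sketch. The helper names and the 0-based indexing of the balancing circuit are likewise chosen here for illustration.

```python
import random


def sample_random_matching(edges, d, rng):
    """Sketch of D_RM(G): sample S edge by edge, then keep isolated edges of S.

    The keep-if-isolated rule is an assumption standing in for the
    'computed locally' step, whose details the text defers to [25].
    """
    q = 1 / (4 * d) - 1 / (16 * d * d)  # inclusion probability from the text
    sampled = [e for e in edges if rng.random() < q]
    touch = {}
    for u, v in sampled:
        touch[u] = touch.get(u, 0) + 1
        touch[v] = touch.get(v, 0) + 1
    return [(u, v) for u, v in sampled if touch[u] == 1 and touch[v] == 1]


def is_matching(edge_list):
    """True if no node appears in more than one edge."""
    seen = set()
    for u, v in edge_list:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def cycle_balancing_circuit(n):
    """Two fixed matchings that together cover the cycle C_n (n even)."""
    even = [(i, i + 1) for i in range(0, n, 2)]
    odd = [(i, (i + 1) % n) for i in range(1, n, 2)]
    return [even, odd]


n = 12
cycle_edges = [(i, (i + 1) % n) for i in range(n)]
rng = random.Random(0)
print(sample_random_matching(cycle_edges, d=2, rng=rng))

circuit = cycle_balancing_circuit(n)
for t in range(1, 5):
    # Periodic schedule: step t uses the matching with index t mod zeta.
    print(t, circuit[t % len(circuit)])
```

With the keep-if-isolated rule every returned edge set is a matching by construction, at the price of discarding sampled edges that collide.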

Asynchronous Process

The asynchronous process $\textsc{ABal}(\mathcal{D}(G),\beta)$ works as follows. The process first uses $\mathcal{D}(G)$ to generate a matching, this time containing one edge only. The distribution we consider, $\mathcal{D}_{\textsc{A}}(G)$, first chooses a node $i$ uniformly at random and then chooses one of that node’s edges $(i,j)$ uniformly at random. Finally one new token is assigned to either node $i$ or $j$ and then the edge $(i,j)$ is used for balancing (see $\textsc{Bal}(\mathbf{m},\beta)$ ). Note that for $\textsc{ABal}(\mathcal{D}_{\textsc{A}}(G),\beta)$ the load allocation heavily depends on the edges which are used for balancing. This makes the analysis for this model quite challenging. In contrast, in $\textsc{SBal}(\mathcal{D}_{\textsc{A}}(G),\beta,m)$ the load allocation and the balancing are independent. Note that in the case of $d$-regular graphs $\mathcal{D}_{\textsc{A}}(G)$ is equivalent to the uniform distribution over all edges or to choosing a random matching of size one. We analyze the asynchronous model in Section 5.
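The remark that node-then-edge sampling is uniform over edges on regular graphs can be checked exactly on small examples. The sketch below computes the probability of every undirected edge with exact fractions; the adjacency-list representation and the two example graphs (a 6-cycle and a 4-node path) are assumptions chosen for illustration.

```python
from fractions import Fraction


def edge_distribution(adj):
    """Exact edge probabilities when a node is picked uniformly at random
    and then one of its incident edges is picked uniformly at random."""
    n = len(adj)
    prob = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            e = frozenset((i, j))
            prob[e] = prob.get(e, Fraction(0)) + Fraction(1, n) * Fraction(1, len(nbrs))
    return prob


cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 2-regular
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}              # not regular

cycle_probs = edge_distribution(cycle)
path_probs = edge_distribution(path)

# On the regular graph all edges are equally likely.
print(len(set(cycle_probs.values())) == 1)
# On the path the end edges are more likely than the middle edge.
print(path_probs[frozenset((0, 1))] > path_probs[frozenset((1, 2))])
```

On the path the two end edges each receive probability 3/8 and the middle edge 1/4, so the equivalence does rely on regularity.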

$\textsc{SBal}(\mathcal{D}(G),\beta,m)$ : In each round $t\in\operatorname{\mathbb{N}}^{+}$ :

Allocate $m$ discrete, unit-sized load items to the nodes uniformly and independently at random. Define $\ell_{i}(t)$ as the number of tokens assigned to node $i$ .

Sample a matching $\mathbf{M}(t)$ according to $\mathcal{D}(G)$ .

Balance with $\textsc{Bal}(\mathbf{M}(t),\beta)$ applied to $X_{i}(t):=X_{i}(t)+\ell_{i}(t)$ for all $i\in\{1,\ldots,n\}$.
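The three steps of one SBal round can be sketched as follows. To keep the example short it uses the idealized (continuous) transfer of $\beta/2$ times the load difference instead of the rounded transfer of Bal, and it takes the matching as an argument rather than sampling it from $\mathcal{D}(G)$; both simplifications, as well as the function name, are assumptions of this sketch.

```python
import random


def sbal_round_idealized(x, matching, beta, m, rng):
    """One SBal round with continuous (unrounded) balancing.

    x is the list of loads; matching is a list of disjoint node pairs (i, j).
    """
    n = len(x)
    # Allocate m unit-sized items uniformly and independently at random.
    for _ in range(m):
        x[rng.randrange(n)] += 1
    # Send a beta/2 fraction of the load difference across each matching edge.
    for i, j in matching:
        flow = beta * (x[i] - x[j]) / 2
        x[i] -= flow
        x[j] += flow
    return x


rng = random.Random(1)
loads = [0.0] * 6
matching = [(0, 1), (2, 3), (4, 5)]  # hypothetical stand-in for a sample from D(G)
sbal_round_idealized(loads, matching, beta=1.0, m=6, rng=rng)
print(loads, sum(loads))
```

Balancing only moves load along edges, so the total load after the round equals the previous total plus $m$; with $\beta=1$ the two endpoints of every matching edge end up with equal load.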

$\textsc{ABal}(\mathcal{D}(G),\beta)$ : In each round $t\in\operatorname{\mathbb{N}}^{+}$ :

Select an edge $\{i,j\}$ according to $\mathcal{D}(G)$ .

Allocate a single unit-size load item to either node $i$ or $j$ with a probability of $1/2$ .

I.e., with prob. $1/2$ set $\ell_{i}(t)=1$ and $\ell_{k}(t)=0$ for all $k\neq i$, otherwise set $\ell_{j}(t)=1$ and $\ell_{k}(t)=0$ for all $k\neq j$.

Balance with $\textsc{Bal}(\mathbf{M}(t),\beta)$ applied to $X_{i}(t):=X_{i}(t)+\ell_{i}(t)$ , where $\mathbf{M}(t)$ includes just the edge $\{i,j\}$ .
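A round of ABal with the distribution $\mathcal{D}_{\textsc{A}}(G)$ can be sketched in the same spirit. As before, the example uses the idealized transfer without rounding; the adjacency-list input, the function name, and the choice of a 16-node cycle for the demonstration are assumptions of this sketch.

```python
import random


def abal_round_idealized(x, adj, beta, rng):
    """One ABal round with continuous (unrounded) balancing."""
    i = rng.randrange(len(x))   # node chosen uniformly at random
    j = rng.choice(adj[i])      # one of its edges, uniformly at random
    target = i if rng.random() < 0.5 else j
    x[target] += 1              # the new item lands on an endpoint of the edge
    flow = beta * (x[i] - x[j]) / 2
    x[i] -= flow
    x[j] += flow


n = 16
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
rng = random.Random(0)
loads = [0.0] * n
for t in range(1000):
    abal_round_idealized(loads, cycle, 1.0, rng)
print("total load:", sum(loads))
print("discrepancy:", max(loads) - min(loads))
```

Note how the item is placed on an endpoint of the very edge that is then balanced, which is the dependence between allocation and balancing mentioned in the text.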

$\textsc{Bal}(\mathbf{m},\beta)$ : For each edge $\{i,j\}$ in the matching $\mathbf{m}$ balance loads of $i$ and $j$ :

Assume w.l.o.g. that $X_{i}(t)\geq X_{j}(t)$ .

Let $p=\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}-\left\lfloor\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}\right\rfloor$.

Then, node $i$ sends $L_{i,j}$ load items to node $j$ where

$L_{i,j}\coloneqq\begin{cases}\left\lceil\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}\right\rceil,&\text{with probability }p,\\[5pt] \left\lfloor\frac{\beta\cdot(X_{i}(t)-X_{j}(t))}{2}\right\rfloor,&\text{with probability }1-p.\end{cases}$

In the idealized setting, where the load is continuously divisible, a load of ${\beta(X_{i}(t)-X_{j}(t))}/{2}$ is sent from node $i$ to node $j$ .
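A minimal sketch of the discrete balancing step and of one $\textsc{SBal}$ round may help to make the randomized rounding concrete. The function names `bal_edge` and `sbal_round`, the list-based load vector, and the use of Python's `random.Random` are assumptions of this example; the rounding rule itself follows the description of $\textsc{Bal}(\mathbf{m},\beta)$ above.

```python
import math
import random

def bal_edge(x, i, j, beta, rng):
    """Balance the loads of nodes i and j with randomized rounding; mutates x."""
    if x[i] < x[j]:
        i, j = j, i  # w.l.o.g. x[i] >= x[j]
    ideal = beta * (x[i] - x[j]) / 2
    p = ideal - math.floor(ideal)  # fractional part = probability of rounding up
    sent = math.ceil(ideal) if rng.random() < p else math.floor(ideal)
    x[i] -= sent
    x[j] += sent
    return sent

def sbal_round(x, matching, m, beta, rng):
    """One SBal round: allocate m unit items uniformly at random, then balance."""
    for _ in range(m):
        x[rng.randrange(len(x))] += 1
    for i, j in matching:
        bal_edge(x, i, j, beta, rng)
```

Since the number of items sent is rounded up with probability equal to the fractional part, the expected transfer equals the idealized amount $\beta(X_{i}(t)-X_{j}(t))/2$, and the total load is preserved in every balancing step.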

2.1 Notation

We are given an arbitrary graph $G=(V,E)$ with $n$ nodes. We mainly assume that $G$ is regular and write $d$ for the node degree. Recall that the process is modeled by a Markov chain $(\vec{X}(t))_{t\in\operatorname{\mathbb{N}}}$ , where $\vec{X}(t)=(X_{i}(t))_{i\in[n]}\in\operatorname{\mathbb{R}}^{n}$ is the load vector at the end of step $t$ , and $X_{i}(t)$ is the load of node $i$ at time $t$ . We write $\ell_{i}(t)$ for the number of load items allocated to node $i$ in step $t$ and define $\vec{\ell}(t)=(\ell_{i}(t))_{i\in[n]}$ . We will use upper case letters such as $X_{i}(t)$ and $\mathbf{M}(t)$ to denote random variables and random matrices and lower case letters (like $x_{i}(t)$ , $\mathbf{m}(t)$ ) for fixed outcomes. If clear from the context we will omit $t$ from a random variable.

We model the idealized balancing step in round $t$ by multiplication with a matrix $\mathbf{M}^{\beta}(t)\in\operatorname{\mathbb{R}}^{n\times n}$ given by

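$\mathbf{M}^{\beta}_{i,j}(t)\coloneqq\begin{cases}1-\beta/2,&\text{if }i=j\text{ and }i\text{ is matched in round }t,\\ \beta/2,&\text{if }\{i,j\}\text{ is an edge of the matching in round }t,\\ 1,&\text{if }i=j\text{ and }i\text{ is not matched in round }t,\\ 0,&\text{otherwise.}\end{cases}$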

We will omit the parameter $\beta$ if it is clear from context. With slight abuse of notation we use the same symbol $\mathbf{M}(t)$ for the matching itself and the associated balancing matrix and refer to both as just “matchings”. Furthermore, we write $E(\mathbf{M}(t))$ for their edges. For the product of all matching matrices from time $t_{1}$ to time $t_{2}$ we write

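$\mathbf{M}^{[t_{1},t_{2}]}\coloneqq\mathbf{M}(t_{2})\cdot\mathbf{M}(t_{2}-1)\cdot\ldots\cdot\mathbf{M}(t_{1}),$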

where for $t_{1}>t_{2}$ we consider this to be the identity matrix. We generally refer to these matrices as mixing matrices. Moreover, we write $\mathbf{M}^{[t]}$ for the sequence of matching matrices $(\mathbf{M}(\tau))_{\tau\in[t]}$ and analogously $\mathbf{m}^{[t]}$ for a fixed sequence of matching matrices $(\mathbf{m}(\tau))_{\tau\in[t]}$ . We will write $\mathbf{M}_{k,\cdot}$ for the vector forming the $k$ th row of the matrix $\mathbf{M}$ (which we often treat as a column vector despite it being a row).
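The matrix notation can be made concrete with a short sketch using exact rational arithmetic. It assumes the balancing matrix implied by the idealized transfer $\beta(X_{i}(t)-X_{j}(t))/2$ (diagonal entry $1-\beta/2$ and off-diagonal entry $\beta/2$ for matched pairs, identity elsewhere) and assumes that later matchings multiply from the left, which is consistent with the way the mixing matrices are applied to the load vectors later in the paper. The helper names are hypothetical.

```python
from fractions import Fraction

def matching_matrix(n, matching, beta=Fraction(1)):
    """Idealized balancing matrix of a matching on n nodes (assumed form)."""
    mat = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for i, j in matching:
        mat[i][i] = mat[j][j] = 1 - beta / 2
        mat[i][j] = mat[j][i] = beta / 2
    return mat

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def mixing_matrix(n, mats):
    """Product M(t2) * ... * M(t1) for mats = [M(t1), ..., M(t2)]; identity if empty."""
    result = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for mat in mats:
        result = matmul(mat, result)  # later matchings multiply from the left
    return result

# Hypothetical example on 4 nodes: first match {0,1} and {2,3}, then {1,2}.
m1 = matching_matrix(4, [(0, 1), (2, 3)])
m2 = matching_matrix(4, [(1, 2)])
prod = mixing_matrix(4, [m1, m2])  # equals m2 * m1
```

Each factor is symmetric and doubly stochastic, so the product is doubly stochastic as well; in the example, row 1 of `prod` is already the uniform vector $(1/4,1/4,1/4,1/4)$.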

In the balancing circuit model we define the round matrix $\mathbf{R}\coloneqq\mathbf{m}^{[1,\zeta]}$ as the product of the matching matrices forming a complete period of the balancing circuit. Note that $\zeta$ has no relation to the minimum or maximum degree, although we may assume w.l.o.g. that each edge is covered by at least one of the matchings. We write $\operatorname{\lambda}(\mathbf{R})$ for the spectral gap of the round matrix $\mathbf{R}$ , i.e., for the difference between the largest two eigenvalues of $\mathbf{R}$ .

We write $\vec{\varepsilon}(t)\in\operatorname{\mathbb{R}}^{n}$ for the vector of additive rounding errors in round $t$ . Then $\varepsilon_{k}(t)$ is the difference between the load at node $k$ after step $t$ and the load at node $k$ after step $t$ in an idealized scheme where loads are arbitrarily divisible.

Putting all of this together we can express the load vector at the end of step $t\in\operatorname{\mathbb{N}}^{+}$ as

$\vec{X}(t)=\mathbf{M}(t)\cdot\big(\vec{X}(t-1)+\vec{\ell}(t)\big)+\vec{\varepsilon}(t).\qquad(1)$

We write $\operatorname{t_{\mathrm{hit}}(G)}$ for the hitting time of $G$ , which is the maximum expected time it takes for a standard random walk on $G$ (i.e., the walk moves to a neighbor chosen uniformly at random in each step) to reach a given node $i$ from a given node $j$ , with the maximum taken over all such pairs of nodes. We write $\operatorname{t^{*}_{\mathrm{hit}}(G)}$ for the edge hitting time of $G$ , which is defined like the hitting time, except that the maximum is taken over adjacent nodes only. We write $\operatorname{\mathbf{L}}(G)$ for the normalized Laplacian matrix of a graph $G$ . For regular graphs it may be defined as $\operatorname{\mathbf{L}}(G)\coloneqq\operatorname{\mathbf{I}}-\operatorname{\mathbf{A}}(G)/d$ , where $\operatorname{\mathbf{A}}(G)$ is the adjacency matrix of $G$ . Writing $\lambda_{0}\leq\lambda_{1}\leq\ldots\leq\lambda_{n-1}$ for the real eigenvalues of $\operatorname{\mathbf{L}}(G)$ , we let $\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\coloneqq\lambda_{1}-\lambda_{0}$ be the spectral gap of the Laplacian of $G$ .
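For small graphs, the hitting time and the edge hitting time can be computed exactly from the standard first-step equations $h(v)=1+\frac{1}{\deg(v)}\sum_{u\sim v}h(u)$ with $h(\text{target})=0$. The following sketch assumes a connected graph given as an adjacency list and solves the linear system with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

def hitting_times_to(adj, target):
    """Expected number of steps of a simple random walk to reach `target` from each node."""
    others = [v for v in range(len(adj)) if v != target]
    index = {v: r for r, v in enumerate(others)}
    size = len(others)
    # Augmented system: h(v) - (1/deg v) * sum over neighbours u != target of h(u) = 1.
    rows = []
    for v in others:
        row = [Fraction(0)] * (size + 1)
        row[index[v]] += 1
        for u in adj[v]:
            if u != target:
                row[index[u]] -= Fraction(1, len(adj[v]))
        row[size] = Fraction(1)
        rows.append(row)
    # Gauss-Jordan elimination with exact arithmetic (graph assumed connected).
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [entry * inv for entry in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    times = {target: Fraction(0)}
    for v in others:
        times[v] = rows[index[v]][size]
    return times

def hitting_time(adj):
    """Maximum expected hitting time over all ordered pairs of nodes."""
    return max(max(hitting_times_to(adj, t).values()) for t in range(len(adj)))

def edge_hitting_time(adj):
    """Maximum expected hitting time over ordered pairs of adjacent nodes."""
    best = Fraction(0)
    for t in range(len(adj)):
        times = hitting_times_to(adj, t)
        best = max(best, max(times[s] for s in adj[t]))
    return best

# Hypothetical example: the cycle on 6 nodes.
cycle = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
```

On the cycle with $n$ nodes the expected hitting time between nodes at distance $k$ is $k(n-k)$, so for $n=6$ the sketch yields a hitting time of $9$ and an edge hitting time of $5$.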

3 Random Matching Model

In this section we analyze the process $\textsc{SBal}(\mathcal{D}_{\textsc{RM}}(G),\beta,m)$ for $d$ -regular graphs $G$ , where the matching distribution $\mathcal{D}_{\textsc{RM}}(G)$ is generated by the algorithm given in [25]. Note that the result (as well as the results for the two other models) holds at any point of time $t$ if the system is initially empty. Furthermore, we can show the same results in the idealized setting where load items can be divided into arbitrarily small pieces (see [4]). For more details we refer the reader to the paragraph directly after Eq. 3.

Theorem 3.1.

Let $G$ be a $d$-regular graph and define $T(G)\coloneqq\min\Big\{\frac{\operatorname{t_{\mathrm{hit}}(G)}}{n}\cdot\log(n),\sqrt{\frac{d}{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}},\frac{1}{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}\Big\}$. Let $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{RM}}(G),\beta,m)$ at time $t$ with $\operatorname{\mathrm{disc}}(\vec{X}(0))\eqqcolon K\geq 1$. There exists a constant $c>0$ such that for all $t\geq c\cdot\log(K\cdot n)/({\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\beta})$ it holds w.h.p. (with high probability, i.e., with probability at least $1-n^{-\Omega(1)}$) and in expectation

[TABLE]

Proof.

We first expand the recurrence of Eq. 1 (cf. [43]). After one step we get

[TABLE]

We repeatedly expand this form up to the beginning of the process and get

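$\vec{X}(t)=\underbrace{\mathbf{M}^{[1,t]}\cdot\vec{X}(0)}_{\vec{I}(t)}+\underbrace{\sum_{\tau=1}^{t}\mathbf{M}^{[\tau,t]}\cdot\vec{\ell}(\tau)}_{\vec{D}(t)}+\underbrace{\sum_{\tau=1}^{t}\mathbf{M}^{[\tau+1,t]}\cdot\vec{\varepsilon}(\tau)}_{\vec{R}(t)}.\qquad(2)$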

We write $\vec{I}(t)$ , $\vec{D}(t)$ , and $\vec{R}(t)$ for the three terms as indicated. Note that in general these terms are vectors of real numbers. The sum $\vec{I}(t)+\vec{D}(t)$ can be regarded as the contribution of an idealized process, where $\vec{I}(t)$ is the contribution of the initial load and $\vec{D}(t)$ is the contribution of the dynamically allocated load. Thus, $\vec{R}(t)$ is the deviation between the idealized process without rounding and the discrete process described in Section 2.

To bound the discrepancy $\operatorname{\mathrm{disc}}(\vec{X}(t))$ of the load vector $\vec{X}(t)$ at time $t$ we use the fact that the discrepancy is sub-additive such that $\operatorname{\mathrm{disc}}(\vec{x}+\vec{y})\leq\operatorname{\mathrm{disc}}(\vec{x})+\operatorname{\mathrm{disc}}(\vec{y})$ (see B.1 in Appendix B). Hence, to bound $\operatorname{\mathrm{disc}}(\vec{X}(t))$ we individually bound the discrepancies of the three terms in Eq. 2 and get

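$\operatorname{\mathrm{disc}}(\vec{X}(t))\leq\operatorname{\mathrm{disc}}(\vec{I}(t))+\operatorname{\mathrm{disc}}(\vec{D}(t))+\operatorname{\mathrm{disc}}(\vec{R}(t)).\qquad(3)$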
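The sub-additivity used in this step is elementary: the maximum of a sum is at most the sum of the maxima, and the minimum of a sum is at least the sum of the minima. A small randomized sanity check, with hypothetical helper names and integer test vectors chosen only for illustration, can be sketched as follows.

```python
import random

def disc(x):
    """Discrepancy: difference between the maximum and the minimum load."""
    return max(x) - min(x)

def check_subadditivity(trials, n, rng):
    """Return True if disc(x + y) <= disc(x) + disc(y) holds on random integer vectors."""
    for _ in range(trials):
        x = [rng.randint(-50, 50) for _ in range(n)]
        y = [rng.randint(-50, 50) for _ in range(n)]
        if disc([a + b for a, b in zip(x, y)]) > disc(x) + disc(y):
            return False
    return True
```

Such a check is of course no substitute for the proof referenced above; it only illustrates why the three terms can be bounded separately.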

If the system is initially empty, then $\operatorname{\mathrm{disc}}(\vec{I}(t))=0$. Moreover, in the idealized setting without rounding $\operatorname{\mathrm{disc}}(\vec{R}(t))=0$. Techniques to bound the first term $\operatorname{\mathrm{disc}}(\vec{I}(t))$ and the last term $\operatorname{\mathrm{disc}}(\vec{R}(t))$ are well-established. We state the corresponding results in Lemma 3.2 and Lemma 3.3 directly below the proof of our theorem. The main part of the proof is to bound $\operatorname{\mathrm{disc}}(\vec{D}(t))$, which will be done in Section 3.1.

Let now $\gamma>1$. First, it follows from Lemma 3.2 that for all $t\geq c\cdot\log(K\cdot n)/({\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\beta})$ we have $\operatorname{\mathrm{disc}}(\vec{I}(t))\leq 1$ with probability at least $1-n^{-\gamma}$. Second, it follows from Lemma 3.3 that $\operatorname{\mathrm{disc}}(\vec{R}(t))\leq 2\sqrt{\gamma\log(n)/\beta}$ with probability at least $1-2\cdot n^{-\gamma+1}$. Third, it follows from Lemma 3.4 that

[TABLE]

with probability at least $1-3\cdot n^{-\gamma+1}$. The statement of the theorem therefore follows from a union bound over the statements of Lemma 3.2, Lemma 3.3, and Lemma 3.4. The bound on expectation follows analogously from the linearity of expectation and the bounds on the expected discrepancies in the aforementioned lemmas. ∎

Intuitively, Lemma 3.2 states that the contribution of the initial load to the discrepancy is insignificant if $t$ is large enough. We generalize the analysis of Theorem 1 in [43] (or Theorem 2.9 in [44]) to establish a bound on the discrepancy of the initial load as a function of $\beta$. For the sake of completeness the proof of Lemma 3.2 is given in Section B.1.

Lemma 3.2 (Memorylessness Property).

Let $G$ be a $d$-regular graph. Let $K=\operatorname{\mathrm{disc}}(\vec{X}(0))$. Then there exists a constant $c>0$ such that for all $\gamma>0$ and $t\in\operatorname{\mathbb{N}}$ with $t\geq t_{0}(\gamma)\coloneqq c\cdot\max\left\{\gamma\log(n),\log(K\cdot n)\right\}\cdot\frac{1}{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\beta}$ we get with probability at least $1-n^{-\gamma}$ and in expectation

[TABLE]

The next lemma bounds $\operatorname{\mathrm{disc}}(\vec{R}(t))$, the discrepancy contribution of cumulative rounding errors. Note that this result does not just hold for the random matching model, but for all three models that we consider in this paper. In the proof of the lemma we extend the results of Theorem 3.6 in [44] (which is based on work in [8]) to establish a bound as a function of $\beta$. The proof is given in Section B.2.

Lemma 3.3 (Insignificance of Rounding Errors).

Let $G$ be an arbitrary graph. Then for all $\gamma>1$ , $t\in\operatorname{\mathbb{N}}$ , and $k\in[n]$ we get with probability at least $1-2n^{-\gamma+1}$ and in expectation

[TABLE]

To bound $\operatorname{\mathrm{disc}}(\vec{D}(t))$, the discrepancy contribution of dynamically allocated load items, we apply the next lemma, which is the core of our work. We prove it in Section 3.1.

Lemma 3.4 (Contribution of Dynamically Allocated Load).

Let $G$ be a $d$-regular graph. Define $T(G)\coloneqq\min\left\{\operatorname{t_{\mathrm{hit}}(G)}\cdot\log n/{n},\sqrt{d/{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}},1/{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}\right\}$. Then for all $\gamma>1$ and $t\in\operatorname{\mathbb{N}}$ we get with probability at least $1-3n^{-\gamma+1}$ and in expectation

[TABLE]

3.1 Bounding the Contribution of Dynamically Allocated Load

In this section we prove Lemma 3.4. Some of the proofs are omitted and can be found in Section B.3. As a first step, we bound $\operatorname{\mathrm{disc}}(\vec{D}(t))$ using the global divergence $\Upsilon(\mathbf{M}^{[t]})$ , which is defined over a sequence of matching matrices $\mathbf{M}^{[t]}$ as

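$\Upsilon_{k}(\mathbf{M}^{[t]})\coloneqq\sqrt{\sum_{\tau=1}^{t}\left\lVert\mathbf{M}^{[\tau,t]}_{k,\cdot}-1/n\right\rVert_{2}^{2}}\quad\text{and}\quad\Upsilon(\mathbf{M}^{[t]})\coloneqq\max_{k\in[n]}\Upsilon_{k}(\mathbf{M}^{[t]}).$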

The global divergence can be regarded as a measure of the convergence speed of a random walk that uses the matching matrices as transition probabilities. In [23, 44, 8] the authors use a related notion which they call the local $p$ -divergence, also defined on a sequence of matchings $\mathbf{m}^{[t]}$ . The difference lies in the fact that the global divergence, essentially, measures differences between nodes’ values and a global average, while the local divergence measures differences between neighboring nodes. To show Lemma 3.4 we first observe the following.

Observation 3.5.

It holds that $\operatorname{\mathrm{disc}}(\vec{D}(t))\leq 2\cdot\max_{k\in[n]}\lvert D_{k}(t)-t\cdot m/n\rvert$ .
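For a fixed sequence of matchings, the global divergence can be evaluated without forming any matrix product. The sketch below assumes the form $\Upsilon_{k}(\mathbf{m}^{[t]})=\sqrt{\sum_{\tau=1}^{t}\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-1/n\rVert_{2}^{2}}$ stated for the balancing circuit model in Section 4, and it uses the symmetry of the matching matrices: row $k$ of $\mathbf{m}^{[\tau,t]}$ is obtained by starting from the unit vector at node $k$ and applying the matchings $\mathbf{m}(t),\mathbf{m}(t-1),\ldots,\mathbf{m}(\tau)$ in this order. All function names are hypothetical.

```python
import math

def apply_matching(v, matching, beta=1.0):
    """Idealized balancing of the vector v over the edges of one matching."""
    out = list(v)
    for i, j in matching:
        shift = beta * (v[i] - v[j]) / 2
        out[i] -= shift
        out[j] += shift
    return out

def global_divergence_k(n, matchings, k, beta=1.0):
    """Upsilon_k for matchings = [m(1), ..., m(t)] on n nodes."""
    v = [0.0] * n
    v[k] = 1.0
    total = 0.0
    for matching in reversed(matchings):  # tau = t, t-1, ..., 1
        v = apply_matching(v, matching, beta)  # v is now row k of m^{[tau, t]}
        total += sum((value - 1.0 / n) ** 2 for value in v)
    return math.sqrt(total)

def global_divergence(n, matchings, beta=1.0):
    """Maximum of Upsilon_k over all nodes k."""
    return max(global_divergence_k(n, matchings, k, beta) for k in range(n))
```

For example, on four nodes with $\mathbf{m}(1)=\{\{0,1\},\{2,3\}\}$ and $\mathbf{m}(2)=\{\{1,2\}\}$, node $0$ is unmatched in the last round, which gives the summands $3/4$ and $1/4$ and hence $\Upsilon_{0}=1$.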

Next we consider a fixed node $k$ and show a concentration inequality on $D_{k}(t)$ in terms of $\Upsilon_{k}(\mathbf{m}^{[t]})$, where $\mathbf{m}^{[t]}$ is the sequence of matchings applied by our process (Lemma 3.6). Note that in the lemma we assume the matchings are fixed and the randomness is due to the random load placement only. Hence, the lemma directly applies to $\mathcal{D}_{\textsc{BC}}(G)$. Afterwards, we bound the global divergence $\Upsilon_{k}(\mathbf{M}^{[t]})$ of the random sequence of matchings in terms of a notion of “goodness” of the matching distribution $\mathcal{D}$ (Lemma 3.9), and then bound the “goodness” of the distribution $\mathcal{D}_{\textsc{RM}}(G)$ used in the random matching model (Lemma 3.10). We start with a bound on the deviation of $D_{k}(t)$ from the average load $t\cdot m/n$ in terms of $\Upsilon(\mathbf{m}^{[t]})$.

Lemma 3.6 (Load Concentration).

Let $\mathbf{m}^{[t]}$ be an arbitrary sequence of matchings. Then for all $\gamma>0$ , $t\in\operatorname{\mathbb{N}}$ , and $k\in[n]$ we get with probability at most $2\cdot n^{-\gamma}$

[TABLE]

Proof.

Our goal is to decompose $D_{k}(t)$ into a sum of independent random variables. Recall that we assume that the matching matrices are fixed and all randomness is due to the random choices of the load items. This will enable us to apply a concentration inequality to this sum. For the decomposition observe that $\vec{D}(t)=\sum_{\tau=1}^{t}\mathbf{m}^{[\tau,t]}\cdot\vec{\ell}(\tau),$ where $\vec{\ell}(\tau)$ is the random load vector corresponding to the $m$ load items allocated at time $\tau$ . So the $k$ th coordinate of $\vec{D}(t)$ is $D_{k}(t)=\sum_{\tau=1}^{t}\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot\ell_{w}(\tau).$ We define the indicator random variable ${B}({\tau,j,w})$ for $\tau\in[t],j\in[m]$ and $w\in[n]$ as

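${B}({\tau,j,w})\coloneqq\begin{cases}1,&\text{if the }j\text{-th load item of step }\tau\text{ is allocated to node }w,\\ 0,&\text{otherwise.}\end{cases}$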

Note that for fixed $\tau$ and $j$ we have $\sum_{w\in[n]}{B}({\tau,j,w})=1$ , ${\operatorname{\mathbb{P}}}\mathopen{}\mathclose{{}\left[{B}({\tau,j,w})=1}\right]=1/n$ and $\operatorname{\mathbb{E}}[{B}({\tau,j,w})]=1/n$ . Observe that $\ell_{w}(\tau)$ , the load allocated to node $w$ at step $\tau$ , can be expressed as $\sum_{j\in[m]}{B}({\tau,j,w})$ . Merging this with the value of $D_{k}(t)$ gives

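$D_{k}(t)=\sum_{\tau=1}^{t}\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot\sum_{j\in[m]}{B}({\tau,j,w})=\sum_{\tau=1}^{t}\sum_{j\in[m]}\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot{B}({\tau,j,w}).$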

For a fixed $\tau\in[t]$ and $j\in[m]$ we define ${C}_{k}{(\tau,j)}\coloneqq\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot{B}({\tau,j,w})$. This random variable measures the contribution of the $j$-th load item of round $\tau$ to $D_{k}(t)$. Note that the load items are allocated independently from each other. Since the $\mathbf{m}^{[\tau,t]}$ are fixed matrices, ${C}_{k}{(\tau,j)}$ and ${C}_{k}{(\tau^{\prime},j^{\prime})}$ are independent whenever $(\tau,j)\neq(\tau^{\prime},j^{\prime})$. To apply the concentration inequality from Theorem A.14 we need to show that ${C}_{k}{(\tau,j)}\leq 1$ and compute an upper bound on $\operatorname{\mathrm{Var}}[{C}_{k}{(\tau,j)}]$. Showing the first condition is easy since exactly one of the indicator random variables ${B}({\tau,j,w})$ is one and $\mathbf{m}^{[\tau,t]}_{k,w}$ has a value between zero and one.

It remains to consider the variance of ${C}_{k}{(\tau,j)}$ . First note that by linearity of expectation

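$\operatorname{\mathbb{E}}[{C}_{k}{(\tau,j)}]=\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}\cdot\operatorname{\mathbb{E}}[{B}({\tau,j,w})]=\frac{1}{n}\sum_{w\in[n]}\mathbf{m}^{[\tau,t]}_{k,w}=\frac{1}{n},$

where the last equality follows from the fact that $\mathbf{m}^{[\tau,t]}$ is doubly stochastic. Now we get

$\operatorname{\mathrm{Var}}[{C}_{k}{(\tau,j)}]=\operatorname{\mathbb{E}}[{C}_{k}{(\tau,j)}^{2}]-\frac{1}{n^{2}}=\frac{1}{n}\sum_{w\in[n]}\left(\mathbf{m}^{[\tau,t]}_{k,w}\right)^{2}-\frac{1}{n^{2}}=\frac{1}{n}\sum_{w\in[n]}\left(\mathbf{m}^{[\tau,t]}_{k,w}-\frac{1}{n}\right)^{2}=\frac{1}{n}\cdot\left\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-1/n\right\rVert_{2}^{2},$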

where we used that for each $\tau$ and each $j$ exactly one of the ${B}({\tau,j,w})$ is one and all others are zero, and each of the $n$ possible cases has uniform probability.
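The variance computation can be checked on a concrete row of a mixing matrix. The sketch assumes only what the argument above uses: the load item lands on a uniformly random node $w$, so ${C}_{k}{(\tau,j)}$ takes the value $\mathbf{m}^{[\tau,t]}_{k,w}$ with probability $1/n$ each, and the row sums to one. The example row and the function name are hypothetical.

```python
from fractions import Fraction

def contribution_stats(row):
    """Mean and variance of C when C equals row[w] for a uniformly random index w."""
    n = len(row)
    mean = sum(row) / n
    second_moment = sum(r * r for r in row) / n
    return mean, second_moment - mean * mean

# Hypothetical row k of a doubly stochastic mixing matrix on n = 4 nodes.
row = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)]
mean, var = contribution_stats(row)

# Closed form: (1/n) * || row - 1/n ||_2^2.
closed_form = sum((r - Fraction(1, len(row))) ** 2 for r in row) / len(row)
```

Here the mean is $1/4=1/n$ and both expressions for the variance evaluate to $1/32$.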

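Recall that ${C}_{k}{(\tau,j)}$ and ${C}_{k}{(\tau^{\prime},j^{\prime})}$ are independent whenever $(\tau,j)\neq(\tau^{\prime},j^{\prime})$. Hence we get

$\operatorname{\mathrm{Var}}[D_{k}(t)]=\sum_{\tau=1}^{t}\sum_{j\in[m]}\operatorname{\mathrm{Var}}[{C}_{k}{(\tau,j)}]=\frac{m}{n}\cdot\sum_{\tau=1}^{t}\left\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-1/n\right\rVert_{2}^{2}=\frac{m}{n}\cdot\left(\Upsilon_{k}(\mathbf{m}^{[t]})\right)^{2},$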

where the final equality uses the definition of the global divergence $\Upsilon_{k}(\mathbf{m}^{[t]})$ . Applying Theorem A.14 with $M=1$ and $X=D_{k}(t)=\sum_{\tau=1}^{t}\sum_{j\in[m]}{C}_{k}{(\tau,j)}$ with $\lambda=2\gamma\log(n)/3+\Upsilon_{k}(\mathbf{m}^{[t]})\cdot\sqrt{2\gamma m/n}$ results in

[TABLE]

The lower bound can be established using Theorem A.15 (with $a_{i}=0$ and $M=1$ ) instead of Theorem A.14. Via a union bound we get

[TABLE]

To bound the global divergence of the matching sequence used by the process we use two potential functions. The quadratic node potential $\operatorname{\Phi}(\vec{x})$ is given by

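$\operatorname{\Phi}(\vec{x})\coloneqq\sum_{i\in[n]}\left(x_{i}-\bar{x}\right)^{2},\quad\text{where }\bar{x}\coloneqq\frac{1}{n}\sum_{i\in[n]}x_{i}.$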

For a set of edges $S$ on the nodes $[n]$ and a vector $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ , the quadratic edge potential is

$\operatorname{\Psi}_{S}(\vec{x})\coloneqq\sum_{\{i,j\}\in S}(x_{i}-x_{j})^{2}.$

We may also write $\operatorname{\Psi}_{G}\coloneqq\operatorname{\Psi}_{E(G)}$ whenever $G$ is a graph, and $\operatorname{\Psi}_{\mathbf{M}}\coloneqq\operatorname{\Psi}_{E(\mathbf{M})}$ whenever $\mathbf{M}$ is a matching matrix. The following observation relates the drop of node potential to the edge potential in terms of $\beta$ .

Observation 3.7.

Let $\mathbf{M}^{\beta}$ be a matching matrix with parameter $\beta\in(0,1]$ . Then for any $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ we have $\operatorname{\Phi}(\vec{x})-\operatorname{\Phi}(\mathbf{M}^{\beta}\cdot\vec{x})=\frac{1-(1-\beta)^{2}}{2}\cdot\operatorname{\Psi}_{E(\mathbf{M}^{\beta})}(\vec{x})$ .
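The identity of Observation 3.7 can be verified exactly on a small instance. The sketch assumes that the node potential is the sum of squared deviations from the average load and that the edge potential is the sum of squared load differences over the given edges; both definitions, the example vector, and the function names are assumptions of this example.

```python
from fractions import Fraction

def phi(x):
    """Quadratic node potential (assumed form): squared deviations from the average."""
    avg = sum(x) / len(x)
    return sum((value - avg) ** 2 for value in x)

def psi(x, edges):
    """Quadratic edge potential (assumed form): squared differences over the edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

def balance(x, matching, beta):
    """Idealized balancing: the heavier endpoint sends beta * difference / 2."""
    out = list(x)
    for i, j in matching:
        shift = beta * (x[i] - x[j]) / 2
        out[i] -= shift
        out[j] += shift
    return out

# Hypothetical instance on 6 nodes with beta = 1/3.
x = [Fraction(v) for v in (9, 1, 4, 4, 0, 6)]
matching = [(0, 1), (2, 4)]
beta = Fraction(1, 3)
drop = phi(x) - phi(balance(x, matching, beta))
expected = (1 - (1 - beta) ** 2) / 2 * psi(x, matching)
```

In this instance the edge potential of the matching is $80$ and both sides of the identity equal $200/9$.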

We now define a notion of a matching distribution being good. In Lemma 3.9 below we show that the notion is sufficient for showing that matching sequences generated from such distributions have bounded global divergence. Note that the “goodness” of a distribution does not depend on $\beta$ but on graph properties and the random choices with which the matchings are chosen. Hence, we assume $\beta=1$.

Definition 3.8.

Assume $G$ is an arbitrary $d$ -regular graph. Let $g\colon\operatorname{\mathbb{R}}_{0}^{+}\to\operatorname{\mathbb{R}}^{+}$ be an increasing function and let $\sigma^{2}>1$ . Then a matching distribution $\mathcal{D}(G)$ is $(g,\sigma^{2})$ -good if the following conditions hold for $\mathbf{M}^{1}\sim\mathcal{D}(G)$ and all stochastic vectors $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ .

1. $\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\geq g(\operatorname{\Phi}(\vec{x})).$

2. ${\operatorname{\mathrm{Var}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\leq(\sigma^{2}-1)\cdot\left(\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\right)^{2}.$

It remains to show two results. First, assuming a matching distribution is $(g,\sigma^{2})$-good, the global divergence of a matching sequence generated by that distribution can be bounded in terms of $g$ and $\sigma$ (Lemma 3.9). Second, we have to calculate a function $g_{G}$ and the value of $\sigma_{G}$ for which the matching distribution $\mathcal{D}_{\textsc{RM}}(G)$ is $(g_{G},\sigma_{G}^{2})$-good (see Lemma 3.10).

Lemma 3.9 (Global Divergence).

Assume $G$ is an arbitrary graph. Let $g\colon\operatorname{\mathbb{R}}_{0}^{+}\to\operatorname{\mathbb{R}}^{+}$ be an increasing function, $\sigma^{2}>1$ , and $\beta\in(0,1]$ . Let $\mathbf{M}^{[t]}=(\mathbf{M}^{\beta}(\tau))_{\tau=1}^{t}$ be an i.i.d. sequence of matching matrices generated by $\mathcal{D}(G)$ and assume $\mathcal{D}(G)$ is a $(g,\sigma^{2})$ -good matching distribution. Then for all $\gamma>0$ and $k\in[n]$ we get with probability at least $1-n^{-\gamma}$

[TABLE]

Lemma 3.10.

Assume $G$ is an arbitrary $d$ -regular graph. Let

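$g_{G}(x)\coloneqq\frac{1}{16d}\cdot\max\left\{d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot x,\frac{x^{2}}{\mathrm{Res}(G)},\frac{4x^{3}}{27}\right\}\quad\text{and}\quad\sigma_{G}^{2}\coloneqq 32\cdot\frac{\operatorname{t^{*}_{\mathrm{hit}}(G)}}{n}+5.$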

Then $\mathcal{D}_{\textsc{RM}}(G)$ is $(g_{G},\sigma_{G}^{2})$ -good.

Proof.

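First, note that the function $g_{G}(x)$ is increasing in $x$. Applying the first part of Lemma 3.11 (see below) we get that for any vector $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ it holds that

$\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\geq\frac{1}{16d}\cdot\operatorname{\Psi}_{G}(\vec{x}).$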

From the first two statements of Lemma 3.12 (stated after Lemma 3.11) we see that for $\mathbf{M}^{1}\sim\mathcal{D}_{\textsc{RM}}(G)$ and all stochastic vectors $\vec{x}\in\operatorname{\mathbb{R}}^{n}$

[TABLE]

Hence,

[TABLE]

and as a consequence, $\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\geq g_{G}(\operatorname{\Phi}(\vec{x}))$ by the definition of $g_{G}$ .

It remains to check the second condition of Definition 3.8 with our claimed value $\sigma_{G}^{2}$ . Inserting its value as stated in the lemma, the condition requires that

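${\operatorname{\mathrm{Var}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\leq\left(32\cdot\frac{\operatorname{t^{*}_{\mathrm{hit}}(G)}}{n}+4\right)\cdot\left(\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})]}\right)^{2},$

which is given in the second part of Lemma 3.11 (see below). ∎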

In Lemma 3.11 we first relate the drop of $\operatorname{\Phi}$ to the quadratic edge potential $\operatorname{\Psi}$. In the second part we bound the variance of the potential drop as a function of the edge hitting time.

Lemma 3.11.

Let $G$ be a $d$ -regular graph, let $\mathbf{M}^{1}\sim\mathcal{D}_{\textsc{RM}}(G)$ , and let $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ , then

1. $\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}\geq\frac{1}{16d}\cdot\operatorname{\Psi}_{G}(\vec{x}).$

2. ${\operatorname{\mathrm{Var}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}\leq(32\cdot(\operatorname{t^{*}_{\mathrm{hit}}(G)}/n)+4)\cdot\left(\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}\right)^{2}.$

In Lemma 3.12 we relate the size of the quadratic edge potential $\operatorname{\Psi}_{G}$ to the second-smallest eigenvalue of $\operatorname{\mathbf{L}}(G)$, the effective resistance of $G$, and the node potential. To state it, we need some additional definitions. For any two nodes $i$ and $j$ of the graph $G$, $\mathrm{Res}(i,j)$ is the effective resistance (or resistive distance) between $i$ and $j$ in $G$ (for a detailed definition see Section A.1). Furthermore, we write $\mathrm{Res}(G)$ for the resistive diameter of $G$, i.e., the largest resistive distance between any pair of nodes in $G$, and write $\mathrm{Res}^{*}(G)$ for the maximum effective resistance between any pair of nodes adjacent in $G$. I.e., $\mathrm{Res}(G)\coloneqq\max_{i,j\in[n]}\mathrm{Res}(i,j)$ and $\mathrm{Res}^{*}(G)\coloneqq\max_{\{i,j\}\in E(G)}\mathrm{Res}(i,j)$. The first part of the following lemma was previously shown in [25, 44].

Lemma 3.12.

Let $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ , and let $G$ be a connected $d$ -regular graph.

1. $\operatorname{\Psi}_{G}(\vec{x})\geq d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\operatorname{\Phi}(\vec{x})$.

2. If $\vec{x}$ is stochastic, then $\operatorname{\Psi}_{G}(\vec{x})\geq\max\left\{\frac{1}{\mathrm{Res}(G)}\cdot\operatorname{\Phi}(\vec{x})^{2},\frac{4}{27}\cdot\operatorname{\Phi}(\vec{x})^{3}\right\}$.

3. $\max_{\{i,j\}\in E(G)}(x_{i}-x_{j})^{2}\leq\mathrm{Res}^{*}(G)\cdot\operatorname{\Psi}_{G}(\vec{x}).$

Proof of Lemma 3.4

Proof.

Define $g_{G}(x)=\frac{1}{16d}\cdot\max\left\{d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot x,x^{2}/\mathrm{Res}(G),4x^{3}/27\right\}$ and let $\sigma_{G}^{2}\coloneqq 32\cdot(\operatorname{t^{*}_{\mathrm{hit}}(G)}/n)+5$. Then by Lemma 3.10 the matching distribution $\mathcal{D}_{\textsc{RM}}(G)$ is $(g_{G},\sigma_{G}^{2})$-good. By Lemma 3.9 we have for all $t\in\operatorname{\mathbb{N}}$, $k\in[n]$

[TABLE]

To bound $\Upsilon_{k}(\mathbf{M}^{[t]})$ we use the following two claims (see Section B.4 for the proofs).

Claim 3.13.

It holds that $\displaystyle\int_{0}^{1}{x}/{g_{G}(x)}\,{\mathrm{d}x}={\operatorname{O}}(T(G))$ .

Claim 3.14.

For any $d$ -regular graph $G$ it holds that $\operatorname{t^{*}_{\mathrm{hit}}(G)}/n\geq 1/2$ .

Together we get from Claim 3.13 and Claim 3.14 that with probability at least $1-n^{-(\gamma+1)}$

[TABLE]

Since $\operatorname{t^{*}_{\mathrm{hit}}(G)}={\operatorname{O}}(n^{3})$ (Proposition 10.16 in [32]), $\log(\operatorname{t^{*}_{\mathrm{hit}}(G)}/n)={\operatorname{O}}(\log n)$ , and $\gamma>1$ ,

[TABLE]

Now Lemma 3.6 states that for any fixed sequence of matching matrices $\mathbf{m}^{[t]}$ , with probability at least $1-2n^{-(\gamma+1)}$ it holds that

[TABLE]

Applying a union bound over all $k\in[n]$ , Eq. 4 and Eq. 5 hold for all $k$ with probability at least $1-3n^{-\gamma}$ . Hence, for all $k\in[n]$

[TABLE]

The high-probability bound now follows from Observation 3.5. The corresponding bound on ${\operatorname{\mathbb{E}}[\operatorname{\mathrm{disc}}(\vec{D}(t))]}$ follows readily; see Lemma A.7 in Section A.2 for the details. ∎

4 Balancing Circuit Model

Here we assume $\beta=1$ . Recall that we assume $G$ is covered by $\zeta$ fixed matchings $\mathbf{m}(1),\ldots,\mathbf{m}(\zeta)$ . The matching distribution $\mathcal{D}_{\textsc{BC}}(G)$ then deterministically chooses the matching $\mathbf{m}(t)=\mathbf{m}(t\bmod\zeta)$ in step $t$ . The round matrix is defined as $\mathbf{R}\coloneqq\mathbf{m}^{[1,\zeta]}$ and the mixing matrices are fixed in this model. Thus, for a sequence of matchings $\mathbf{m}^{[t]}$ the global divergence is $\Upsilon(\mathbf{m}^{[t]})\coloneqq\max_{k\in[n]}\sqrt{\sum_{\tau=1}^{t}\mathopen{}\mathclose{{}\left\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-1/n}\right\rVert_{2}^{2}}$ . The next theorem provides an upper bound on the discrepancy for this model. Note that the following theorem holds for arbitrary graphs, while Theorem 3.1 only holds for $d$ -regular graphs.
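The periodic schedule of the balancing circuit model can be simulated in a few lines. The sketch below runs the idealized process with $\beta=1$ and without new load, so it only illustrates how repeated periods spread an initial load. The example graph, the function names, and the zero-based indexing of the matchings (round $t$ uses list entry $(t-1)\bmod\zeta$) are assumptions of this example.

```python
def balance_ideal(x, matching):
    """Idealized balancing with beta = 1: matched nodes average their loads."""
    out = list(x)
    for i, j in matching:
        out[i] = out[j] = (x[i] + x[j]) / 2
    return out

def run_balancing_circuit(x0, matchings, steps):
    """Apply the periodic schedule for rounds t = 1, ..., steps (no new load)."""
    x = list(x0)
    for t in range(1, steps + 1):
        x = balance_ideal(x, matchings[(t - 1) % len(matchings)])
    return x

# Hypothetical example: the 6-cycle covered by zeta = 2 perfect matchings.
even_edges = [(0, 1), (2, 3), (4, 5)]
odd_edges = [(1, 2), (3, 4), (5, 0)]
final = run_balancing_circuit([6.0, 0.0, 0.0, 0.0, 0.0, 0.0], [even_edges, odd_edges], 60)
```

In this example the load vector approaches the uniform vector at a geometric rate per period, which matches the role of the spectral gap of the round matrix $\mathbf{R}$ in the bounds of this section.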

Theorem 4.1.

Let $G$ be an arbitrary graph and $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ at time $t$ with $\operatorname{\mathrm{disc}}(\vec{X}(0))\eqqcolon K$ . For all $t\in\operatorname{\mathbb{N}}$ with $t\geq\frac{\zeta}{\operatorname{\lambda}{(\mathbf{R})}}\cdot\mathopen{}\mathclose{{}\left(\ln(K\cdot n)}\right)$ it holds w.h.p. and in expectation

[TABLE]

Proof.

The proof follows the same line as the proof of Theorem 3.1, which is proved via Lemma 3.2, Lemma 3.4, and Lemma 3.3 bounding $\vec{I}(t),\vec{D}(t)$, and $\vec{R}(t)$, respectively. Lemma 3.2 is replaced by Lemma 4.2 below. Lemma 3.3 can also be applied to the balancing circuit model since it only requires that the subgraph used for balancing is a matching.

It remains to replace Lemma 3.4. Since the matching matrices are fixed this time the proof is much simpler. The proof of Lemma 3.6 carries over to this model, giving us a bound on $\lvert D_{k}(t)-tm/n\rvert$ for $k\in[n]$ with probability at least $1-2\cdot n^{-\gamma}$. Applying the union bound over all nodes $k\in[n]$, together with Observation 3.5 (stating that $\operatorname{\mathrm{disc}}(\vec{D}(t))\leq 2\cdot\max_{k\in[n]}\lvert D_{k}(t)-t\cdot m/n\rvert$), gives a bound on $\operatorname{\mathrm{disc}}(\vec{D}(t))$ which holds with probability at least $1-2\cdot n^{-\gamma+1}$. ∎

Lemma 4.2 (Memorylessness Property).

For all $t\in\operatorname{\mathbb{N}}$ with $t\geq{\zeta}/{\operatorname{\lambda}{(\mathbf{R})}}\cdot\mathopen{}\mathclose{{}\left(\ln(K\cdot n)}\right)$ it holds that $\operatorname{\mathrm{disc}}(\vec{I}(t))\leq 2$ .

Proof.

Since $\operatorname{\Phi}(\vec{x})\leq K^{2}\cdot n$ it follows from Lemma 2 in [26] that

[TABLE]

Setting $t\geq(\zeta/\operatorname{\lambda}{(\mathbf{R})})\cdot\ln(Kn)$ gives $\operatorname{\Phi}\left(\mathbf{m}^{[1,t]}\cdot\vec{x}\right)\leq 1$, which implies that $\operatorname{\mathrm{disc}}(\vec{I}(t))\leq 2$. ∎

Note that a similar statement was shown in [43, 44, 8].

The next theorem provides a lower bound on the discrepancy for this model. The proof can be found in Appendix C.

Theorem 4.3.

Let $G$ be an arbitrary graph and $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ at time $t$ . Then for all $t\in\operatorname{\mathbb{N}}$ and $m\geq 4n\cdot\log(n)/\Upsilon(\mathbf{m}^{[t]})$ it holds with constant probability

[TABLE]

5 Asynchronous Model

The following is our main theorem for the asynchronous model. The bounds provided by Theorem 5.1 for the asynchronous model differ from those in Theorem 3.1 for the random matching model in two details. First, the lower bound on the balancing time is larger by a factor of $n$. This is due to the fact that the asynchronous model balances across just one edge per round in contrast to $\Theta(n)$ edges in the random matching model. Second, the upper bound on $\operatorname{\mathrm{disc}}(\vec{X}(t))$ is much simpler. Note, however, that setting $m=n$ in Theorem 3.1 and further simplifying the result by using $\operatorname{t^{*}_{\mathrm{hit}}(G)}/n=\Omega(1)$ (see also Claim 3.14 in the proof of Lemma 3.4) results in the same asymptotic bound as in Theorem 5.1.

Theorem 5.1.

Let $G$ be a $d$-regular graph and define $T(G)\coloneqq\min\Big\{\frac{\operatorname{t_{\mathrm{hit}}(G)}}{n}\cdot\log(n),\sqrt{\frac{d}{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}},\frac{1}{\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}\Big\}$. Let $\vec{X}(t)$ be the state of process $\textsc{ABal}(\mathcal{D}_{\textsc{A}}(G),\beta)$ at time $t$ with $\operatorname{\mathrm{disc}}(\vec{X}(0))\eqqcolon K\geq 1$. There exists a constant $c>0$ such that for all $t\geq c\cdot n\cdot\log(K\cdot n)/(\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\beta)$ it holds w.h.p. and in expectation

[TABLE]

Proof Sketch of Theorem 5.1.

The proof of the theorem follows along the same lines as the proof of Theorem 3.1. However, there are some major differences. Most importantly, the proof of Lemma 3.6 (giving a concentration bound on $D_{k}(t)$ in terms of the global divergence of the sequence of matching matrices) cannot be applied to ABal. That proof heavily relies on the fact that the load allocation and the matching edges are chosen independently from each other, which is certainly not the case for ABal. Our new lemma (Lemma D.1 in Appendix D) carefully analyzes the dependency, and it uses a stronger concentration inequality. In addition, we also have to re-calculate the function $g_{G}$ and $\sigma_{G}$ to show that the matching distribution $\mathcal{D}_{\textsc{A}}(G)$ is $(g_{G},\sigma_{G}^{2})$-good (see Lemma D.2 in Appendix D). ∎

6 Drift Result

In our analysis we use the following tail bound for the sum of a non-increasing sequence of random variables with variable negative drift. The proof uses established methods from drift analysis. In particular, it relies on techniques found in the proof of the Variable Drift Theorem in [31]. The full technical proof can be found in Appendix E.

Theorem 6.1.

Let $(X(t))_{t\geq 0}$ be a non-increasing sequence of discrete random variables with $X(t)\in\operatorname{\mathbb{R}}^{+}_{0}$ for all $t$ with fixed $X(0)=x_{0}$ . Assume there exists an increasing function $h\colon\operatorname{\mathbb{R}}^{+}_{0}\to\operatorname{\mathbb{R}}^{+}$ and a constant $\sigma>0$ such that the following holds. For all $t\in\operatorname{\mathbb{N}}$ and all $x>0$ with ${\operatorname{\mathbb{P}}[X(t)=x]}>0$

1. ${\operatorname{\mathbb{E}}[X(t+1)\mid X(t)=x]}\leq x-h(x),$

2. ${\operatorname{\mathrm{Var}}[X(t+1)\mid X(t)=x]}\leq\sigma\cdot\left({\operatorname{\mathbb{E}}[X(t+1)\mid X(t)=x]}-x\right)^{2}.$

Then the following statements hold.

1. For all $\delta\in(0,1)$ and any arbitrary but fixed $t$

[TABLE]

2. For all $\delta\in(0,1)$ and $p\in(0,1)$ we define $t_{0}\coloneqq\frac{2(\sigma+1)}{\delta^{2}}\left(-\log(p)+\log\left(\frac{2(\sigma+1)}{\delta^{2}}\right)\right)$. Then

[TABLE]
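To get a feeling for the threshold $t_{0}$ in the second statement, the formula can be evaluated directly. The sketch assumes that $\log$ denotes the natural logarithm, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def drift_t0(sigma, delta, p):
    """Evaluate t_0 = a * (-log(p) + log(a)) with a = 2 * (sigma + 1) / delta**2."""
    if not (sigma > 0 and 0 < delta < 1 and 0 < p < 1):
        raise ValueError("requires sigma > 0, 0 < delta < 1 and 0 < p < 1")
    a = 2 * (sigma + 1) / delta ** 2
    return a * (-math.log(p) + math.log(a))  # natural logarithm assumed
```

For instance, `drift_t0(1.0, 0.5, 0.01)` gives $a=16$, and $t_{0}$ grows only logarithmically as the failure probability $p$ decreases.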

7 Conclusions and Open Problems

In this paper we analyze discrete load balancing processes on graphs. As our main contribution we bound the discrepancy that arises in dynamic load balancing in three models, the random matching model, the balancing circuit model, and the asynchronous model. Our results for the random matching model and the asynchronous model hold for $d$ -regular graphs, while our analysis for the balancing circuit model applies to arbitrary graphs.

To the best of our knowledge our results constitute the first bounds for discrete, dynamic balancing processes on graphs. Furthermore, our results improve the work by Alistarh et al. [4] who prove that the expected discrepancy is bounded by $\sqrt{n}\log(n)$ in the (arguably simpler) continuous asynchronous process $\textsc{ABal}^{\text{(cont)}}(\mathcal{D}_{\textsc{A}}(G),1)$ . We improve their bound to $\sqrt{n\log(n)}$ and additionally show that it holds with high probability. We conjecture that our results are tight up to polylogarithmic factors. However, showing tight upper and lower bounds remains an open problem.

Results for Specific Graph Classes

We show an overview of our bounds on the discrepancy for specific graph classes in Table 1. The corresponding results are formally derived in Section B.5 for the random matching model, Section C.1 for the balancing circuit model, and Section D.1 for the asynchronous model.

Open Problems

We are confident that our results carry over to arbitrary graphs (as opposed to regular graphs), provided that there exists a lower bound on the probability $p_{min}$ with which an edge is used for balancing. However, to show bounds on the discrepancy one has to overcome fundamental problems such as the bias introduced by high-degree nodes. Another interesting open question is whether the results carry over to a model where the amount of load that may be transmitted over an edge in each step is bounded by a constant. If only a single load item can be transferred per edge and step, the problem is similar to the token distribution problem (see, for example, [7]).

Finally, we believe that one can also adapt our analysis to a variant of a graphical balls-into-bins process. The process works as follows. In each step an edge $(i,j)$ is sampled uniformly at random. W.l.o.g. assume that the load of $i$ is smaller than the load of $j$ by an additive term $\Delta$ . Then a biased coin is tossed showing heads with probability $p\coloneqq\min\{1,(1+\beta\cdot\Delta)/2\}$ and tails otherwise, where $\beta$ is a suitably chosen and non-constant parameter. If the coin shows heads, one item is allocated to $i$ , and otherwise to $j$ . A formal analysis of this allocation process (as well as of other, related balls-into-bins processes) is beyond the scope of our paper and remains an open problem.
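The allocation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, $\beta$ is fixed to an arbitrary constant (whereas the text asks for a suitably chosen, non-constant parameter), and the example graph is a cycle on 8 nodes.

```python
import random

def heads_probability(delta: float, beta: float) -> float:
    """p = min{1, (1 + beta * delta) / 2} for a load gap delta >= 0."""
    return min(1.0, (1.0 + beta * delta) / 2.0)

def graphical_allocation(edges, n, steps, beta, seed=0):
    """Allocate `steps` items; each step samples an edge and favors its lighter endpoint."""
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(steps):
        i, j = rng.choice(edges)          # edge sampled uniformly at random
        if load[i] > load[j]:
            i, j = j, i                   # w.l.o.g. i is the lighter endpoint
        delta = load[j] - load[i]
        if rng.random() < heads_probability(delta, beta):
            load[i] += 1                  # heads: the item goes to the lighter node i
        else:
            load[j] += 1                  # tails: the item goes to the heavier node j
    return load

# Hypothetical example: cycle on 8 nodes, 1000 items, beta = 0.5 (arbitrary choice).
cycle = [(k, (k + 1) % 8) for k in range(8)]
loads = graphical_allocation(cycle, n=8, steps=1000, beta=0.5)
```

With $\Delta=0$ the coin is fair, and once $\beta\cdot\Delta\geq 1$ the item always goes to the lighter endpoint.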

Appendix A Auxiliary Results

A.1 Random Walks, Hitting Times, and Effective Resistance

In this appendix we present for completeness fundamental definitions and relations concerning random walks, hitting times, and the effective resistance. We start with a definition of the effective resistance of a network in Definition A.1. For a motivation of the definition see [32, Chapter 9]. Further details and properties can also be found in [19] and [34, Section 4].

Definition A.1 (Harmonic Functions and Effective Resistance).

Let $G$ be a graph and let $i,j\in[n]$ be nodes of the graph. Then a harmonic function on $G$ with the poles $i$ and $j$ (for unit edge weights) is a function $f:[n]\to\operatorname{\mathbb{R}}$ such that for all $k\in[n]\setminus\{i,j\}$ we have $f(k)=\frac{1}{d(k)}\cdot\sum_{l\in N_{G}(k)}f(l)$ , where $N_{G}(k)$ is the set of $k$ ’s neighbors in $G$ .

Given a harmonic function $f$ on $G$ with the poles $i$ and $j$ (with arbitrary boundary values $f(i)\neq f(j)$ ), the effective resistance (or resistive distance) between $i$ and $j$ in $G$ is given by

[TABLE]

Note that the value is not dependent on the boundary values of the harmonic function.

Note that for boundary values $f(i)$ and $f(j)$ the harmonic function is unique [32, Proposition 9.1].

The following is a well-known property of effective resistances; it is a direct consequence of, e.g., Corollary 9.13 in [32].

Lemma A.2.

Let $G$ be a graph, and write $\mathrm{d}(i,j)$ for the (standard) distance between $i$ and $j$ in $G$ . Then $\mathrm{Res}(i,j)\leq\mathrm{d}(i,j)$ .
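As an illustration of the definition and of Lemma A.2, the following Python sketch computes effective resistances on small graphs in exact rational arithmetic. It assumes the standard "voltage drop divided by current leaving the pole" form of the effective resistance, $(f(i)-f(j))/\sum_{l\in N_{G}(i)}(f(i)-f(l))$, which may differ in presentation from the display above. All function names are ours, and the graph is assumed to be simple and connected, given as adjacency lists.

```python
from fractions import Fraction

def harmonic_function(adj, i, j):
    """Solve f(i)=1, f(j)=0 and f(k) = mean of f over N(k) for all other k."""
    n = len(adj)
    rows = []
    for k in range(n):
        row = [Fraction(0)] * (n + 1)      # augmented row [coefficients | rhs]
        if k == i:
            row[k], row[n] = Fraction(1), Fraction(1)
        elif k == j:
            row[k] = Fraction(1)
        else:
            row[k] = Fraction(len(adj[k]))  # d(k) f(k) - sum of neighbors = 0
            for l in adj[k]:
                row[l] -= 1
        rows.append(row)
    # Gauss-Jordan elimination over the rationals.
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [v * inv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]

def effective_resistance(adj, i, j):
    """Voltage drop f(i) - f(j) divided by the current leaving pole i."""
    f = harmonic_function(adj, i, j)
    current = sum(f[i] - f[l] for l in adj[i])
    return (f[i] - f[j]) / current

path = [[1], [0, 2], [1, 3], [2]]              # path 0 - 1 - 2 - 3
cycle4 = [[1, 3], [0, 2], [1, 3], [2, 0]]      # cycle on 4 nodes
res_path = effective_resistance(path, 0, 3)    # equals the distance 3
res_cycle = effective_resistance(cycle4, 0, 1) # 3/4, below the distance 1
```

On the path the bound of Lemma A.2 is tight, while on the cycle the second route between the poles lowers the resistance below the distance.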

For a graph $G$ , and nodes $i,j\in V(G)$ , let $H(i,j)$ be the hitting time from $i$ to $j$ , i.e., the expected time for a random walk on $G$ starting at $i$ to reach $j$ for the first time.

Theorem A.3 (Theorem 4.1 (i) in [34]).

Let $G$ be a graph. Then for any $i,j\in V(G)$ ,

[TABLE]

Corollary A.4.

Let $G$ be a graph. Then for any $i,j\in V(G)$ ,

[TABLE]

Proof.

For the first inequality, since one of $H(i,j)$ and $H(j,i)$ is at least the maximum of the two, we have, by Theorem A.3:

[TABLE]

And for the second inequality, since both $H(i,j)$ and $H(j,i)$ are at most the maximum of the two, we have, again by Theorem A.3

[TABLE]

as claimed. ∎
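The relation between hitting times and effective resistance can be checked numerically on a small example. The sketch below assumes that Theorem A.3 is the standard commute-time identity $H(i,j)+H(j,i)=2\lvert E\rvert\cdot\mathrm{Res}(i,j)$, and it uses the closed form $\mathrm{Res}(i,j)=k(n-k)/n$ for two nodes at distance $k$ on a cycle with $n$ nodes. The function names and the choice $n=7$ are ours.

```python
from fractions import Fraction

def hitting_times_to(adj, target):
    """Expected steps H(k, target) of a simple random walk, for every start node k.

    Solves h(target) = 0 and h(k) = 1 + mean of h over N(k) by Gauss-Jordan elimination.
    """
    n = len(adj)
    rows = []
    for k in range(n):
        row = [Fraction(0)] * (n + 1)
        row[k] = Fraction(1)
        if k != target:
            row[n] = Fraction(1)
            for l in adj[k]:
                row[l] -= Fraction(1, len(adj[k]))
        rows.append(row)
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [v * inv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]

n = 7
cycle = [[(k - 1) % n, (k + 1) % n] for k in range(n)]
num_edges = n                                        # a cycle has n edges
H = [hitting_times_to(cycle, j) for j in range(n)]   # H[j][i] = H(i, j)

def cycle_resistance(i, j):
    """Two parallel paths of lengths k and n - k between i and j."""
    k = abs(i - j)
    return Fraction(k * (n - k), n)
```

For every pair $i\neq j$ the sum `H[j][i] + H[i][j]` then equals `2 * num_edges * cycle_resistance(i, j)`, and the larger of the two hitting times lies between $\lvert E\rvert\cdot\mathrm{Res}(i,j)$ and $2\lvert E\rvert\cdot\mathrm{Res}(i,j)$.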

Theorem A.5 (Dirichlet’s principle, see Exercise 2.13 in [35]; or Exercise 9.9 in [32], referencing Theorem 6.1 in [33]).

Let $u,v$ be distinct nodes of a graph $G$ . Then

[TABLE]

Theorem A.6 (Corollary 3.3 in [34], applied to $d$ -regular graphs).

Let $G$ be an arbitrary graph on $n$ nodes. Then

[TABLE]

A.2 Tail Bounds

The following lemma allows us to turn a high-probability bound into a bound on the expected value. We consider this result folklore. For completeness we give a formal proof below.

Lemma A.7.

Let $X$ be a non-negative real random variable, and let $n\in\operatorname{\mathbb{N}}$ . Then if there are $c,C>0$ such that for all $\gamma>0$ ,

[TABLE]

then

[TABLE]

Proof.

Observe that when $x=(\gamma+1)C$ we have $\gamma=\frac{x}{C}-1$ , so that for all $x\geq C$ we have

[TABLE]

Thus,

[TABLE]

as claimed. ∎

Theorem A.8 (Bhatia-Davis inequality [14]).

Let $X$ be a real random variable with $X\in[m,M]$ . Then ${\operatorname{\mathrm{Var}}[X]}\leq(M-{\operatorname{\mathbb{E}}[X]})({\operatorname{\mathbb{E}}[X]}-m).$
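A small numeric instance may help to see the Bhatia-Davis inequality at work. The following Python sketch evaluates both sides exactly for a finite distribution; the helper names and the example distributions are ours and carry no meaning beyond illustration.

```python
from fractions import Fraction

def mean_and_variance(values, probs):
    """Exact mean and variance of a finite distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var

def bhatia_davis_bound(mean, lo, hi):
    """Right-hand side (M - E[X]) (E[X] - m) for X in [m, M] = [lo, hi]."""
    return (hi - mean) * (mean - lo)

# Three-point distribution on [0, 4]: the inequality is strict.
values = [Fraction(0), Fraction(1), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
mean, var = mean_and_variance(values, probs)
bound = bhatia_davis_bound(mean, min(values), max(values))

# Two-point distribution on the endpoints {0, 4}: the inequality is tight.
mean2, var2 = mean_and_variance([Fraction(0), Fraction(4)], [Fraction(3, 4), Fraction(1, 4)])
bound2 = bhatia_davis_bound(mean2, 0, 4)
```

The second example shows that equality holds when all probability mass sits on the two endpoints of the interval.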

Theorem A.9 (Azuma–Hoeffding inequality, Theorem 13.6 in [38]).

Let $(X(t))_{t=0}^{n}$ be a martingale associated with the filter $(\mathcal{F}(t))_{t=0}^{n}$ , where there exist non-negative sequences $(a_{t})_{t=1}^{n}$ , $(b_{t})_{t=1}^{n}$ and $(\sigma_{t})_{t=1}^{n}$ such that for all $t\in[n]$ ,

[TABLE]

Then for all $\varepsilon>0$ ,

[TABLE]

Theorem A.10 (Adapted from Theorem 6.6 in [17]).

Let $(X(t))_{t=0}^{n}$ be a martingale associated with the filter $(\mathcal{F}(t))_{t=0}^{n}$ , where there exist $(a_{t})_{t=1}^{n}$ and $(\sigma_{t})_{t=1}^{n}$ such that for all $t\in[n]$ ,

1. $X(t)-X(t-1)\geq a_{t}$ ;

2. ${\operatorname{\mathrm{Var}}[X(t)\mid\mathcal{F}(t-1)]}\leq\sigma_{t}^{2}$ .

Then for all $\varepsilon>0$ ,

[TABLE]

Theorem A.11 (Adapted from Theorem 2.1 and combined with Remark 2.1 and Equation 18 in [21]).

Let $(X(t))_{t=0}^{n}$ be a supermartingale associated with the filter $(\mathcal{F}(t))_{t=0}^{n}$ , where $X(t)-X(t-1)\leq 1$ for all $t\in[n]$ . Let $\langle X\rangle$ be the quadratic characteristic of $X$ , i.e., let

[TABLE]

Then, for any $\varepsilon\geq 0$ and $\sigma>0$ ,

[TABLE]

Corollary A.12.

Let $(X(t))_{t=0}^{n}$ be a martingale associated with the filter $(\mathcal{F}(t))_{t=0}^{n}$ , where $\lvert X(t)-X(t-1)\rvert\leq 1$ for all $t\in[n]$ . Then with $\langle X\rangle$ as in Theorem A.11, for any $\varepsilon\geq 0$ and $\sigma>0$ ,

[TABLE]

Proof.

As $(X(t))_{t=0}^{n}$ is a martingale, it is also a supermartingale, and it fulfills the conditions of Theorem A.11 by the assumptions of the claim. So we may use Theorem A.11 to see that

[TABLE]

As ${\operatorname{\mathbb{P}}[A]}\leq{\operatorname{\mathbb{P}}[(A\wedge B)\vee B]}\leq{\operatorname{\mathbb{P}}[A\wedge B]}+{\operatorname{\mathbb{P}}[B]},$ this implies that

[TABLE]

The claim follows from applying the same argument to the supermartingale $(-X(t))_{t=0}^{n}$ and a union bound. ∎

Theorem A.13 (Berry-Esseen Theorem [13, 20] for Non-identical Random Variables).

Let $Y_{1},Y_{2},\cdots,Y_{k}$ be independently distributed with $\operatorname{\mathbb{E}}[Y_{i}]=0$ , $\operatorname{\mathbb{E}}[Y_{i}^{2}]=\operatorname{\mathrm{Var}}[Y_{i}]=\sigma_{i}^{2}$ and $\operatorname{\mathbb{E}}[|Y_{i}|^{3}]=\rho_{i}<\infty$ . If $F_{k}(x)$ is the distribution of $\frac{Y_{1}+Y_{2}+\cdots+Y_{k}}{\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}+\cdots+\sigma_{k}^{2}}}$ and $\Phi_{N}(x)$ is the standard normal distribution, then

[TABLE]

where $\psi_{0}=\frac{\sum_{i=1}^{k}\rho_{i}}{\mathopen{}\mathclose{{}\left(\sum_{i=1}^{k}\sigma_{i}^{2}}\right)^{3/2}}$ and $C_{0}$ is a constant.

Theorem A.14 (Theorem 3.4 of [17], [36]).

Let $X_{i}$ ( $1\leq i\leq n$ ) be independent random variables satisfying $X_{i}\leq\operatorname{\mathbb{E}}[X_{i}]+M$ , for $1\leq i\leq n$ . We consider the sum $X=\sum_{i=1}^{n}X_{i}$ with expectation $\operatorname{\mathbb{E}}[X]=\sum_{i=1}^{n}\operatorname{\mathbb{E}}[X_{i}]$ and variance $\operatorname{\mathrm{Var}}[X]=\sum_{i=1}^{n}\operatorname{\mathrm{Var}}[X_{i}]$ . Then we have

[TABLE]

Theorem A.15 (Theorem 4.1 of [17]).

Let $X_{i}$ denote independent random variables satisfying $X_{i}\geq\operatorname{\mathbb{E}}[X_{i}]-a_{i}-M$ for $1\leq i\leq n$ . For $X=\sum_{i=1}^{n}X_{i}$ we have

[TABLE]

Appendix B Omitted Proofs from Section 3

In this appendix we present the omitted proofs from Section 3. We first formally prove that the discrepancy is sub-additive.

Observation B.1.

For two vectors $\vec{x},\vec{y}\in\operatorname{\mathbb{R}}^{n}$ ,

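$\operatorname{\mathrm{disc}}(\vec{x}+\vec{y})\leq\operatorname{\mathrm{disc}}(\vec{x})+\operatorname{\mathrm{disc}}(\vec{y}).$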

Proof.

For any $\vec{a},\vec{b}\in\operatorname{\mathbb{R}}^{n}$ ,

[TABLE]

and thus

[TABLE]

as claimed. ∎
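Sub-additivity of the discrepancy is easy to probe empirically. The sketch below, with names and sample sizes of our choosing, draws random integer vectors and records the smallest observed slack $\operatorname{\mathrm{disc}}(\vec{x})+\operatorname{\mathrm{disc}}(\vec{y})-\operatorname{\mathrm{disc}}(\vec{x}+\vec{y})$; it is a sanity check, not a substitute for the proof.

```python
import random

def disc(x):
    """Discrepancy: difference between the maximum and the minimum entry."""
    return max(x) - min(x)

rng = random.Random(1)
worst_slack = float("inf")
for _ in range(1000):
    x = [rng.randint(-50, 50) for _ in range(10)]
    y = [rng.randint(-50, 50) for _ in range(10)]
    s = [a + b for a, b in zip(x, y)]
    # Slack of the sub-additivity inequality for this sample.
    worst_slack = min(worst_slack, disc(x) + disc(y) - disc(s))
```

The smallest slack stays non-negative, and it is zero exactly when the maxima and the minima of the two vectors are attained at common coordinates.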

B.1 Proof of LABEL:lem:initial:load:vanishes

Proof.

To bound $\operatorname{\mathrm{disc}}(\vec{I}(t))$ , we use the following claim:

*Claim**.*

If $t\geq t_{0}(0)$ , then ${\operatorname{\mathbb{E}}[\operatorname{\Phi}(\vec{I}(t))]}\leq 1/4,$ and if $t\geq t_{0}(\gamma)$ , then ${\operatorname{\mathbb{P}}[\operatorname{\Phi}(\vec{I}(t))\leq\frac{1}{4}]}\geq 1-n^{-\gamma}.$

First, note that $\max_{i\in[n]}\lvert x_{i}-\overline{x}\rvert\leq\sqrt{\operatorname{\Phi}(\vec{x})}$ by definition of $\operatorname{\Phi}$ . Hence, $\operatorname{\mathrm{disc}}(\vec{x})\leq 2\sqrt{\operatorname{\Phi}(\vec{x})}$ . By the claim, if $t\geq t_{0}(\gamma)$ , then $\operatorname{\Phi}(\vec{I}(t))\leq 1/4$ with probability at least $1-n^{-\gamma}$ , and hence $\operatorname{\mathrm{disc}}(\vec{I}(t))\leq 2\sqrt{\operatorname{\Phi}(\vec{I}(t))}\leq 2\sqrt{1/4}=1$ . Also by the claim, if $t\geq t_{0}(0)$ , then ${\operatorname{\mathbb{E}}[\operatorname{\Phi}(\vec{I}(t))]}\leq 1/4$ , and then by Jensen’s inequality,

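${\operatorname{\mathbb{E}}[\operatorname{\mathrm{disc}}(\vec{I}(t))]}\leq 2\cdot{\operatorname{\mathbb{E}}[\sqrt{\operatorname{\Phi}(\vec{I}(t))}]}\leq 2\sqrt{{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\vec{I}(t))]}}\leq 2\sqrt{1/4}=1.$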
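The two deterministic inequalities used in this argument, $\max_{i}\lvert x_{i}-\overline{x}\rvert\leq\sqrt{\operatorname{\Phi}(\vec{x})}$ and $\operatorname{\mathrm{disc}}(\vec{x})\leq 2\sqrt{\operatorname{\Phi}(\vec{x})}$, can be checked on random vectors. The sketch assumes $\operatorname{\Phi}(\vec{x})=\sum_{i}(x_{i}-\overline{x})^{2}$ and compares squared quantities so that the arithmetic stays exact; the function names are ours.

```python
from fractions import Fraction
import random

def disc(x):
    return max(x) - min(x)

def potential(x):
    """Phi(x): sum of squared deviations from the average load (assumed definition)."""
    mean = Fraction(sum(x), len(x))
    return sum((v - mean) ** 2 for v in x)

def bounds_hold(x):
    mean = Fraction(sum(x), len(x))
    phi = potential(x)
    max_dev_sq = max((v - mean) ** 2 for v in x)
    # Compare squares to avoid square roots and floating-point error.
    return max_dev_sq <= phi and disc(x) ** 2 <= 4 * phi

rng = random.Random(3)
samples = [[rng.randint(0, 100) for _ in range(12)] for _ in range(500)]
all_ok = all(bounds_hold(x) for x in samples)
```

The first inequality holds because the largest squared deviation is a single term of the sum defining $\operatorname{\Phi}$, and the second follows from the triangle inequality through the average.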

Proof of the claim.

We aim to use the first statement of Theorem 6.1 on $\operatorname{\Phi}(\vec{I}(t))$ and therefore need to check its preconditions. By the definition of $\vec{I}(t)$ , for all $t\geq 1$ ,

[TABLE]

Entirely analogous to the calculations in the proof of LABEL:lem:glob:div:bound:drift (Eqs. 9 and 10), we have, writing $\vec{V}=\vec{I}(t-1)$ (so that $\vec{I}(t)=\mathbf{M}^{\beta}\cdot\vec{V}$ ),

[TABLE]

and from the latter it immediately follows that for all $\varphi$

[TABLE]

Combining the first statement of LABEL:lem:edge_potential_bounds and the first statement of LABEL:prop:node_potential_change_statistics gives us, for all $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ ,

[TABLE]

so that, for all $\varphi$ ,

[TABLE]

By the second statement of LABEL:prop:node_potential_change_statistics, for all $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ :

[TABLE]

And so,

[TABLE]

So we can now apply Theorem 6.1 with

[TABLE]

With these values and $\delta=1/2$ , the first statement of Theorem 6.1 gives us

[TABLE]

The integral evaluates to

[TABLE]

This is at least $t/2$ if and only if

[TABLE]

which follows after rearranging the initial inequality and exponentiation. So

[TABLE]

Now, let $K\coloneqq\operatorname{\mathrm{disc}}(\vec{I}(0))=\operatorname{\mathrm{disc}}(\vec{X}(0))$ . Then in particular, $\operatorname{\Phi}(\vec{I}(0))\leq n\cdot K^{2}$ , so that $\log(\operatorname{\Phi}(\vec{I}(0)))\leq 2\log(K\cdot n)$ . Furthermore, it is the case that $0.5\leq\operatorname{t^{*}_{\mathrm{hit}}(G)}/n\leq 1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))$ (by Theorem A.6) and that $\beta\in(0,1]$ .

Therefore, there is a sufficiently large constant $c>0$ such that if $t\geq t_{0}(\gamma)=c\cdot\max\{\gamma\log(n),\log(K\cdot n)\}/(\beta\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G)))$ , then

[TABLE]

as well as

[TABLE]

From $t\geq\frac{\beta\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}{32}\cdot\log(8\cdot\operatorname{\Phi}(\vec{I}(0)))$ , it follows that

[TABLE]

From $t\geq\max\{\gamma\log(n),\log(\operatorname{\Phi}(\vec{I}(0)))\}\cdot 8(\sigma+1)$ , it follows that

[TABLE]

And so, for $t\geq t_{0}(\gamma)$ , Eq. 7 entails

[TABLE]

which is the remaining claim for the high-probability statement.

For the remaining claim (i.e., the statement concerning the expectation), note that for $t\geq t_{0}(0),$ the calculations above and Eq. 7 entail that

[TABLE]

Hence, as $\operatorname{\Phi}(\vec{I}(\tau))\leq\operatorname{\Phi}(\vec{I}(0))$ for all $\tau\in\operatorname{\mathbb{N}}$ , we have, for all $t\geq t_{0}(0)$ ,

[TABLE]

as claimed. ∎

This concludes the proof of the lemma. ∎

B.2 Proof of LABEL:lem:rounding:errors:are:small

The proof is similar to the proof of [44, Theorem 3.4].

Proof.

We show the concentration bound on $\operatorname{\mathrm{disc}}(\vec{R}(t))$ by proving concentration bounds on the absolute values $\lvert R_{k}(t)\rvert$ for each $k\in[n]$ and then applying a union bound over all $k$ . We show that the concentration bound on $R_{k}(t)$ holds for any fixed sequence of matchings $\mathbf{m}^{[t]}=(\mathbf{m}^{\beta}(\tau))_{\tau=1}^{t}$ ; this implies a concentration bound for a random sequence of matchings by the law of total probability.

So we fix $\mathbf{m}^{[t]}$ . Recall that

[TABLE]

where $\vec{\varepsilon}(\tau)=(\varepsilon_{k}(\tau))_{k\in[n]}$ is the vector of additive rounding errors incurred in round $\tau$ : it is the difference between the load vector after step $\tau$ and what the load vector would be after step $\tau$ if the balancing in this step were idealized. This additive rounding error stems from the constraint that only whole items can be transferred across the edges $\{i,j\}$ of the matching at time $\tau$ . From the description of the protocol, it is immediate that the rounding errors at matched nodes sum to $0$ , so that $\varepsilon_{i}(\tau)=-\varepsilon_{j}(\tau)$ for all edges $\{i,j\}\in E(\mathbf{m}(\tau))$ matched in round $\tau$ . Thus,

[TABLE]

We will derive the claimed tail bound on $R_{k}(t)$ by applying the Azuma-Hoeffding inequality (Theorem A.9) to a sequence of partial sums as follows. We sequence the rounding actions with $\tau$ increasing and arbitrarily within rounds. If $i$ is the representative node of the $k$ th edge in round $\tau$ (with $k\in[\lfloor n/2\rfloor]$ and $\tau\in[t]$ ), for $l=(\tau-1)\cdot\lfloor n/2\rfloor+k$ let us write

[TABLE]

and let $Y_{l}=0$ if there are fewer than $k$ edges in the matching in round $\tau$ . The sequence of partial sums is then $S_{l}\coloneqq\sum_{a\in[l]}Y_{a}$ , which we consider with respect to the filtration $(\mathcal{F}(l))_{l=0}^{t\cdot\lfloor n/2\rfloor}$ in which $\mathcal{F}(l-1)$ completely determines the state right before the rounding action corresponding to the term $Y_{l}$ . Note that $S_{t\cdot\lfloor n/2\rfloor}=R_{k}(t)$ . To apply Theorem A.9, it is enough to show that the conditional expectation of the difference between successive terms is zero, and that we can bound the differences between terms.

To check these preconditions, let us write $F_{l}$ for the fractional value of the load at node $i$ before the rounding action (i.e., the fractional value of the load at $i$ if balancing were idealized and no rounding was necessary). Then the load will be rounded up with probability $F_{l}$ , resulting in a positive rounding error of $\varepsilon_{i}(\tau)=1-F_{l}$ , or rounded down with probability $1-F_{l}$ , resulting in a negative rounding error of $\varepsilon_{i}(\tau)=-F_{l}$ . Hence,

[TABLE]

so that, as required,

[TABLE]

From this description, it is also clear that writing $\delta_{i,j}(\tau)\coloneqq\mathbf{m}^{[\tau+1,t]}_{k,i}-\mathbf{m}^{[\tau+1,t]}_{k,j}$ , the term $Y_{l}$ is bounded from above by $a_{l}\coloneqq\delta_{i,j}(\tau)(1-F_{i}(\tau))$ , and from below by $-b_{l}\coloneqq-\delta_{i,j}(\tau)F_{i}(\tau)$ , so that $a_{l}+b_{l}=\delta_{i,j}(\tau)$ .
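The randomized rounding step analyzed above can be sketched as follows. The code is a minimal illustration under the stated rule (round up with probability equal to the fractional part $F$, round down otherwise); the function names are ours, and exact rationals are used so that the zero-mean property can be verified without floating-point error.

```python
from fractions import Fraction
import random

def rounding_error_distribution(frac):
    """Pairs (error, probability) when rounding a load with fractional part `frac`."""
    return [(1 - frac, frac), (-frac, 1 - frac)]

def expected_error(frac):
    """Mean rounding error; it vanishes for every fractional part."""
    return sum(err * prob for err, prob in rounding_error_distribution(frac))

def randomized_round(value, rng):
    """Round `value` up with probability equal to its fractional part, else down."""
    floor = value.numerator // value.denominator
    frac = value - floor
    return floor + 1 if rng.random() < frac else floor

rng = random.Random(0)
rounded = randomized_round(Fraction(7, 2), rng)   # either 3 or 4
```

The two possible errors $1-F$ and $-F$ always span an interval of length $1$, which is why the increments $Y_{l}$ have a range governed only by the factor $\delta_{i,j}(\tau)$.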

So we may apply Theorem A.9; to use it we require (an upper bound on) the value of the sum $\sum_{l=1}^{\tau\cdot\lfloor n/2\rfloor}(a_{l}+b_{l})^{2}$ , which we bound by applying LABEL:obs:node_potential_change_exact and collapsing the ensuing telescoping sum (analogously to the proof of Theorem 3.2 in [44]):

[TABLE]

where $(a)$ follows from the fact that $\beta\in(0,1]$ and therefore, $1-(1-\beta)^{2}\geq\beta$ . So by Theorem A.9 (with $\varepsilon=\sqrt{(\gamma+1)\log(n)/\beta}$ and $\operatorname{\mathbb{E}}[R_{k}(t)]=0$ ) we have

[TABLE]

Since $\operatorname{\mathrm{disc}}(\vec{R}(t))=\max_{k\in[n]}R_{k}(t)-\min_{k\in[n]}R_{k}(t)$ , applying a union bound over all nodes $k\in[n]$ we see that

[TABLE]

which is the claimed concentration bound.

To show the bound on ${\operatorname{\mathbb{E}}[\operatorname{\mathrm{disc}}(\vec{R}(t))]}$ , we apply Lemma A.7 with $X=\operatorname{\mathrm{disc}}(\vec{R}(t))$ , $c=2$ and $C=2\sqrt{\log(n)/{\beta}}$ to see that

[TABLE]

B.3 Omitted Proofs from Section 3.1

\restateObsPotentialRelation

Proof.

We assume w.l.o.g. that the entries of $\vec{x}$ sum to $0$ , meaning that $\overline{x}=0$ , so that $\operatorname{\Phi}(\vec{x})=\sum_{i\in[n]}x_{i}^{2}$ . As loads only change at matched nodes, let us investigate the potential change at two matched nodes $i$ and $j$ , where w.l.o.g. $x_{i}\geq x_{j}$ . The amount of load transferred from $i$ to $j$ under idealized balancing (without rounding) is $(x_{i}-x_{j})\cdot\beta/2$ . So with

[TABLE]

the loads before balancing are $x_{i}=a+b$ and $x_{j}=a-b$ , and the loads after idealized balancing are $x_{i}^{\prime}=a+c$ and $x_{j}^{\prime}=a-c$ . So the change of the potential contributions at $i$ and $j$ is

[TABLE]

where we used $(x+y)^{2}+(x-y)^{2}=(x^{2}+2xy+y^{2})+(x^{2}-2xy+y^{2})=2x^{2}+2y^{2}$ . Now,

[TABLE]

Summing this over all edges in the matching gives, as claimed,

[TABLE]
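The computation in this proof can be replayed numerically. The sketch below assumes that the resulting identity reads $\operatorname{\Phi}(\vec{x})-\operatorname{\Phi}(\vec{x}^{\prime})=\frac{1-(1-\beta)^{2}}{2}\sum_{\{i,j\}}(x_{i}-x_{j})^{2}$, with the sum over matched edges; this follows from the steps above with $b=(x_{i}-x_{j})/2$ and $c=(1-\beta)b$. The function names and the example vector are ours.

```python
from fractions import Fraction

def potential(x):
    """Phi(x): sum of squared deviations from the average load."""
    mean = Fraction(sum(x), len(x))
    return sum((v - mean) ** 2 for v in x)

def balance(x, matching, beta):
    """Idealized balancing: move beta/2 times the load difference across each matched edge."""
    y = list(x)
    for i, j in matching:
        transfer = (x[i] - x[j]) * beta / 2
        y[i] -= transfer
        y[j] += transfer
    return y

def predicted_drop(x, matching, beta):
    """(1 - (1 - beta)^2) / 2 times the sum of squared differences over matched edges."""
    psi = sum((x[i] - x[j]) ** 2 for i, j in matching)
    return (1 - (1 - beta) ** 2) * psi / 2

x = [Fraction(v) for v in (9, 1, 4, 4, 0, 7)]
matching = [(0, 1), (2, 5)]        # node-disjoint edges
beta = Fraction(1, 3)
drop = potential(x) - potential(balance(x, matching, beta))
```

Here `drop` coincides with `predicted_drop(x, matching, beta)`, and for $\beta\in(0,1]$ the factor $1-(1-\beta)^{2}$ lies between $\beta$ and $2\beta$.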

\restateLemGlobalDivergence

Proof.

First recall that

[TABLE]

As the mixing matrices are doubly stochastic, each row is a stochastic vector $\vec{x}$ . By definition of the node potential $\operatorname{\Phi}$ we know

[TABLE]

and hence

[TABLE]

To bound this sum we will apply the second statement of Theorem 6.1 to the sequence of values $\operatorname{\Phi}(\mathbf{M}^{[\tau,t]})$ for $\tau=t,\dots,1$ . Since the matching matrices $\mathbf{M}^{\beta}(1),\ldots,\mathbf{M}^{\beta}(t)$ are symmetric we get

[TABLE]

By LABEL:obs:node_potential_change_exact with $S=E(\mathbf{M}^{\beta}(\tau))$ defined as the edges of $\mathbf{M}^{\beta}(\tau)$ we get

[TABLE]

This shows that $\operatorname{\Phi}(\mathbf{M}^{[\tau,t]}_{k,\cdot})\leq\operatorname{\Phi}(\mathbf{M}^{[\tau+1,t]}_{k,\cdot})$ for all $\tau$ . Expressing Eq. 8 with balancing parameter $1$ and, for ease of presentation, setting $\vec{V}\coloneqq\mathbf{M}^{[\tau+1,t]}_{k,\cdot}$ gives us

[TABLE]

Since $\beta\leq 1-(1-\beta)^{2}\leq 2\beta$ for $\beta\in(0,1]$ we get

[TABLE]

As $\mathcal{D}(G)$ is $(g,\sigma^{2})$ -good, for any stochastic vector $\vec{v}\in\operatorname{\mathbb{R}}^{n}$ we have ${\operatorname{\mathbb{E}}[\operatorname{\Phi}(\vec{v})-\operatorname{\Phi}(\mathbf{M}^{1}(\tau)\cdot\vec{v})]}\geq g(\operatorname{\Phi}(\vec{v})).$ Combining this with Eq. 9 gives

[TABLE]

And thus,

[TABLE]

Similarly, as $\mathcal{D}(G)$ is $(g,\sigma^{2})$ -good, for any stochastic vector $\vec{v}\in\operatorname{\mathbb{R}}^{n}$ we have

${\operatorname{\mathrm{Var}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{v})]}\leq(\sigma^{2}-1)\cdot\mathopen{}\mathclose{{}\left(\operatorname{\Phi}(\vec{v})-{\operatorname{\mathbb{E}}[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{v})]}}\right)^{2}.$ Combining this with Eq. 10 gives us

[TABLE]

and thus

[TABLE]

We apply the second statement of Theorem 6.1 with $p=n^{-\gamma}$ , $\delta=0.5$ , and $h(x)\coloneqq\beta\cdot g(x)$ , which is an increasing function as $g$ is increasing by the definition of $(g,\sigma^{2})$-good, and get

[TABLE]

where $t_{0}=8\sigma^{2}(\gamma\log(n)+\log(8\sigma^{2}))$ . From this it follows that with probability at least $1-n^{-\gamma}$

[TABLE]

where $(a)$ follows from the fact that $\operatorname{\Phi}(\mathbf{M}_{k,\cdot})<1$ for the $k$-th row of any stochastic matrix $\mathbf{M}$ . The lemma follows by applying the definition of $t_{0}$ . ∎

\restateLemNodePotentialChangeStatistics

Proof.

By LABEL:obs:node_potential_change_exact, we have

[TABLE]

Rearranging this lower bound into

[TABLE]

and expanding the definition of $\operatorname{\Psi}_{E(\mathbf{M}^{1})}$ we have by linearity of expectation

[TABLE]

where the inequality uses that, for $\mathbf{M}^{1}\sim\mathcal{D}_{\textsc{RM}}(G)$ and all edges $e\in E(G)$ , it holds that ${\operatorname{\mathbb{P}}[e\in E(\mathbf{M}^{1})]}\geq 1/(8d)$ [25, Lemma 2]. This finishes the proof of the first statement.

For the second statement observe that by LABEL:obs:node_potential_change_exact we have

[TABLE]

Then, as $\operatorname{\Phi}(\vec{x})$ is constant for a given $\vec{x}$ ,

[TABLE]

Recall that the matching distribution $\mathcal{D}_{\textsc{RM}}(G)$ is obtained as follows. First, generate a random edge set $S$ as follows. For each $e\in E(G)$ , $e\in S$ with probability $p_{\mathrm{max}}\coloneqq{\operatorname{\mathbb{P}}[e\in S]}=1/(4d)-1/(64d^{2})\leq 1/(4d)$ , independently of all other edges. Then, some edges of $S$ are deleted to create a proper matching, resulting in $E(\mathbf{M}^{1})\subseteq S$ . Hence

[TABLE]

and

[TABLE]

Observe that $\operatorname{\Psi}_{S}(\vec{x})$ can be expressed as $\operatorname{\Psi}_{S}(\vec{x})=\sum_{\{i,j\}\in E(G)}(x_{i}-x_{j})^{2}\operatorname{\mathbf{1}}_{\{i,j\}\in S}$ with ${\operatorname{\mathbb{P}}[\operatorname{\mathbf{1}}_{\{i,j\}\in S}=1]}=p_{\mathrm{max}}$ . Thus,

[TABLE]

By using LABEL:lem:edge_potential_bounds(3) and then Lemma B.2(1) we get that

[TABLE]

Hence,

[TABLE]

Applying the first statement of this lemma we get

[TABLE]

Putting everything together the second statement follows from

[TABLE]

\restateEdgePotentialBounds

Proof.

First note that for all $\vec{x}\in\operatorname{\mathbb{R}}^{n}$ , $a,b\in\operatorname{\mathbb{R}}$ , and $S\subseteq E(G)$ ,

[TABLE]

The proof of the first part is similar to that of Theorem 2.6 in [44]. First, see that

[TABLE]

As $\operatorname{\Psi}_{G}(\vec{x}-b)=\operatorname{\Psi}_{G}(\vec{x})$ by Eq. 15, we may assume w.l.o.g. that $\langle\vec{x},\vec{1}\rangle=0$ by subtracting $b\coloneqq\langle\vec{x},\vec{1}\rangle/n$ from every coordinate of $\vec{x}$ . For such a vector we have $\operatorname{\Phi}(\vec{x})=\lVert x\rVert^{2}_{2}=\langle\vec{x},\vec{x}\rangle$ , and

[TABLE]

where the final equality is due to the min-max theorem and the fact that the smallest eigenvalue of $\operatorname{\mathbf{L}}(G)$ is $0$ , with its associated eigenvector being $\vec{1}$ .

For the second part, let $i,j\in[n]$ be two distinct nodes of the graph with $x_{i}\neq x_{j}$ . Then

[TABLE]

where the first equality uses Eq. 15, the central inequality holds because the argument of $\operatorname{\Psi}_{G}$ is a vector $\vec{a}\in\operatorname{\mathbb{R}}^{n}$ with $a_{i}=1$ and $a_{j}=0$ , and the final equality is by Dirichlet’s principle (Theorem A.5). Note that the bound also holds when $x_{i}=x_{j}$ .
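The bound $\operatorname{\Psi}_{G}(\vec{x})\geq(x_{i}-x_{j})^{2}/\mathrm{Res}(i,j)$ obtained here from Dirichlet's principle can be tested on a cycle, where the effective resistance between two nodes at distance $k$ has the closed form $k(n-k)/n$. The sketch below assumes $\operatorname{\Psi}_{G}(\vec{x})=\sum_{\{i,j\}\in E(G)}(x_{i}-x_{j})^{2}$; the names, the cycle length and the sample sizes are ours.

```python
import random

def edge_potential(x, edges):
    """Psi_G(x): sum of squared load differences over all edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

n = 9
cycle_edges = [(k, (k + 1) % n) for k in range(n)]

def violates(x):
    """True if Psi_G(x) < (x_i - x_j)^2 / Res(i, j) for some pair i < j."""
    psi = edge_potential(x, cycle_edges)
    for i in range(n):
        for j in range(i + 1, n):
            k = j - i
            # Res(i, j) = k (n - k) / n on the cycle; compare without division.
            if psi * k * (n - k) < n * (x[i] - x[j]) ** 2:
                return True
    return False

rng = random.Random(7)
samples = [[rng.randint(-20, 20) for _ in range(n)] for _ in range(500)]
any_violation = any(violates(x) for x in samples)

# Shift invariance: adding a constant to every entry leaves Psi_G unchanged.
shifted_equal = all(
    edge_potential([v + 5 for v in x], cycle_edges) == edge_potential(x, cycle_edges)
    for x in samples
)
```

No sample violates the bound, in line with the fact that, among all vectors with a prescribed difference $x_{i}-x_{j}$, the harmonic function with poles $i$ and $j$ minimizes $\operatorname{\Psi}_{G}$.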

Given Eq. 16, we now show that $\operatorname{\Psi}_{G}(\vec{x})$ is larger than the first, resp. second, term inside the maximum of the second part’s statement. For the first term, we choose $i$ and $j$ such that $x_{i}-x_{j}=\operatorname{\mathrm{disc}}(\vec{x})$ , and recall that $\mathrm{Res}(i,j)\leq\mathrm{Res}(G)$ for all $i,j\in[n]$ . Then, Eq. 16 states that $\operatorname{\Psi}_{G}(\vec{x})\geq\operatorname{\mathrm{disc}}(\vec{x})^{2}/\mathrm{Res}(G),$ and it remains to bound $\operatorname{\mathrm{disc}}(\vec{x})$ from below by $\operatorname{\Phi}(\vec{x})$ . To that end, as the vector $\vec{x}$ is stochastic by assumption, the sum over all its entries is 1, and there is at least one $k\in[n]$ with $x_{k}\leq 1/n$ . Hence, $\operatorname{\mathrm{disc}}(\vec{x})\geq\max_{k\in[n]}(x_{k}-1/n)$ , and so

[TABLE]

as needed to complete the bound for the first term.

For the second term, we choose $i$ and $j$ such that $x_{i}=\max_{k\in[n]}x_{k}$ , $x_{j}\leq x_{i}-2/3\cdot\operatorname{\mathrm{disc}}(\vec{x})$ with the distance $D$ between $i$ and $j$ being minimal. As $x_{i}\geq\operatorname{\mathrm{disc}}(\vec{x})$ , each of the entries of $\vec{x}$ for the $D-1$ non-terminal nodes on a shortest path between $i$ and $j$ is at least $\operatorname{\mathrm{disc}}(\vec{x})/3$ . As $\vec{x}$ is stochastic by assumption, the sum of all loads is at most $1$ , and we have

[TABLE]

which implies $D\leq 3/\operatorname{\mathrm{disc}}(\vec{x}).$ Since $\mathrm{Res}(i,j)$ is bounded by the standard distance between $i$ and $j$ (see Lemma A.2), and $x_{i}-x_{j}\geq 2/3\cdot\operatorname{\mathrm{disc}}(\vec{x}),$ we thus have, by Eq. 16,

[TABLE]

where the final inequality uses $\operatorname{\mathrm{disc}}(\vec{x})\geq\operatorname{\Phi}(\vec{x})$ as shown above.

For the third statement we first rearrange Eq. 16 to see that, for all $i\neq j$ ,

[TABLE]

Taking the maximum over all $\{i,j\}\in E(G)$ on both sides gives us

[TABLE]

as claimed, where the final equality is by definition of $\mathrm{Res}^{*}(G)$ . ∎

The following lemma is well-known; we state it for completeness. It relates the hitting time of a graph $G$ to its resistive diameter and the edge hitting time of $G$ to $\mathrm{Res}^{*}(G)$ .

Lemma B.2.

For any graph $G=(V,E)$

1. $\mathrm{Res}^{*}(G)\cdot\lvert E\rvert\leq\operatorname{t^{*}_{\mathrm{hit}}(G)}\leq 2\cdot\mathrm{Res}^{*}(G)\cdot\lvert E\rvert$ , and

2. $\mathrm{Res}(G)\cdot\lvert E\rvert\leq\operatorname{t_{\mathrm{hit}}(G)}\leq 2\cdot\mathrm{Res}(G)\cdot\lvert E\rvert$ .

Proof.

Recall that

[TABLE]

and that

[TABLE]

For the first inequality, let $i,j\in V$ be adjacent nodes for which $\mathrm{Res}(i,j)=\mathrm{Res}^{*}(G)$ . Then, by Corollary A.4,

[TABLE]

which becomes the first inequality after dividing by 2 on both sides. For the second inequality, let $i,j\in V$ be adjacent nodes for which $\operatorname{t^{*}_{\mathrm{hit}}(G)}=H(i,j)$ . Then, again by Corollary A.4,

[TABLE]

The second statement is entirely analogous, except that the $i,j\in V$ are no longer required to be adjacent, and that they are chosen such that $\mathrm{Res}(i,j)=\mathrm{Res}(G)$ for the first inequality, or, for the second inequality, that $H(i,j)=\operatorname{t_{\mathrm{hit}}(G)}$ . ∎

B.4 Omitted Details from the Proof of Lemma 3.4

Proof of 3.13.

First, expanding the definition of $g_{G}(x)$ , pulling out constant factors, and simplifying fractions results in

[TABLE]

and we write $f_{1}(x)$ , $f_{2}(x)$ , and $f_{3}(x)$ for the first, second, and third argument of the minimum. For $x\geq 0$ , the indefinite integrals of these functions are

[TABLE]

First, we show that $\int_{0}^{1}x/g_{G}(x)\,{\mathrm{d}x}={\operatorname{O}}(1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G)))$ : As $\min\{f_{1}(x),f_{2}(x),f_{3}(x)\}\leq f_{1}(x)$ , we bound the integral in question as

[TABLE]

Next, we show that $\int_{0}^{1}x/g_{G}(x)\,{\mathrm{d}x}={\operatorname{O}}(\sqrt{d/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))})$ : Let $x_{1,3}\coloneqq\sqrt{\frac{27}{4}d\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))}$ be the $x$ such that $f_{1}(x)=f_{3}(x)$ . If $x_{1,3}\leq 1$ , then

[TABLE]

But if $x_{1,3}>1$ , the same bound also holds: we showed above that the integral in question is bounded by ${\operatorname{O}}(1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G)))$ , so that if $x_{1,3}>1$ , we have an upper bound of

[TABLE]

Last, we show that $\int_{0}^{1}x/g_{G}(x)\,{\mathrm{d}x}={\operatorname{O}}(\operatorname{t_{\mathrm{hit}}(G)}/n\cdot\log(n))$ : Let $x_{1,2}\coloneqq d\cdot\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))\cdot\mathrm{Res}(G)$ be the $x$ such that $f_{1}(x)=f_{2}(x)$ . If $x_{1,2}\leq 1$ , then

[TABLE]

where the penultimate bound uses the fact that $\mathrm{Res}(G)\cdot\lvert E(G)\rvert=\mathrm{Res}(G)\cdot dn/2\leq\operatorname{t_{\mathrm{hit}}(G)}$ (Lemma B.2), and the final bound uses the fact that the inverse spectral gap of the normalized Laplacian $1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))$ is bounded from above by ${\operatorname{O}}(n^{3})$ (cf. [3]), and that $\operatorname{t_{\mathrm{hit}}(G)}\geq 1$ , so that the argument of the logarithm is polynomial in $n$ .

Otherwise, if $x_{1,2}>1$ , the same bound also holds: we showed above that the integral is bounded by ${\operatorname{O}}(1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G)))$ , so that if $x_{1,2}>1$ we have an upper bound of

[TABLE]

Combining the three bounds, we have, as claimed,

[TABLE]

Proof of 3.14.

By the first inequality of Corollary 3.3 in [34] it holds for any nodes $i,j\in V(G)$ that

[TABLE]

As $G$ is regular we have $d(i)=d(j)=d$ and $\lvert E(G)\rvert=dn/2$ , and since the statement holds in particular for any pair of nodes that is adjacent this entails

[TABLE]

and the claim follows. ∎

B.5 Bounds for Specific Graph Classes

In this appendix we show bounds on the discrepancy for specific graph classes. Note that we assume that initially the system is empty.

Corollary B.3.

Let $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{RM}}(G),\beta,m)$ where $\vec{X}(0)=\vec{0}$ . For an arbitrary $t$ it holds w.h.p. and in expectation

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\sqrt{m}\log(n))$ for any regular graph.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n)+\sqrt{m\log(n)})$ for cycles and constant-degree regular graphs.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n)+\sqrt{m/n}\cdot\log^{3/2}(n))$ for two-dimensional torus graphs.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}((1+\sqrt{m/n})\cdot\log(n))$ for torus graphs with $\geq 3$ dimensions, the hypercube, and all $d$-regular graphs with $d\geq\lfloor n/2\rfloor$ .

To show the above corollary we require bounds on $T(G)$ (Lemma B.4) and bounds on $\operatorname{t^{*}_{\mathrm{hit}}(G)}$ (Lemma B.6). Then the corollary immediately follows from Theorem 3.1.

In the following lemma we provide some bounds on $T(G)$ for several specific graph classes.

Lemma B.4.

Assume $G$ is a graph with $n$ nodes.

• For constant-degree regular graphs $G$ we have $T(G)={\operatorname{O}}(n)$ .

• For a two-dimensional $k\times k$ toroidal mesh $G$ we have $T(G)={\operatorname{O}}(\log^{2}(n))$ .

• For an $r$-dimensional $k\times\cdots\times k$ toroidal mesh (with $r\geq 3$ ) we have $T(G)={\operatorname{O}}(\log(n))$ .

• For an $r$-dimensional hypercube $G$ we have $T(G)={\operatorname{O}}(\log(n))$ .

• For a $d$-regular graph $G$ with $d\geq\lfloor\frac{n}{2}\rfloor$ we have $T(G)={\operatorname{O}}(\log(n))$ .

• For an arbitrary $d$-regular graph $G$ we have $T(G)={\operatorname{O}}(n\log(n))$ .

Proof.

Recall that $T(G)=\min\left\{1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G)),\sqrt{d/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))},(\operatorname{t_{\mathrm{hit}}(G)}/n)\cdot\log(n)\right\}$, and that $\operatorname{t_{\mathrm{hit}}(G)}\leq 2\cdot\mathrm{Res}(G)\cdot\lvert E\rvert$ (LABEL:claim:hitting_time_resistance_relation), so that $\operatorname{t_{\mathrm{hit}}(G)}/n={\operatorname{O}}(d\cdot\mathrm{Res}(G))$.

For $d$-regular graphs with constant $d$, $1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))={\operatorname{O}}(n\cdot d\cdot(\mathrm{diam}(G)+1))$ by [30], where $\mathrm{diam}(G)$ is the diameter of $G$. As $\mathrm{diam}(G)\leq n$ and $d$ is constant, $1/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))={\operatorname{O}}(n^{2})$, so that $T(G)={\operatorname{O}}(\sqrt{d/\operatorname{\lambda}(\operatorname{\mathbf{L}}(G))})={\operatorname{O}}(n)$.

For the two-dimensional $k\times k$ toroidal mesh, $d\leq 4$ and $\mathrm{Res}(G)={\operatorname{O}}(\log(n))$ by [16, Theorem 6.1], so that $T(G)={\operatorname{O}}((\operatorname{t_{\mathrm{hit}}(G)}/n)\cdot\log(n))={\operatorname{O}}(\log^{2}(n))$ .

For an $r$-dimensional $k\times\cdots\times k$ toroidal mesh with $r\geq 3$, as well as the $r$-dimensional hypercube, $d\leq 2r$ and $\mathrm{Res}(G)={\operatorname{O}}(r^{-1})$ by [16, Theorem 6.1], so that $T(G)={\operatorname{O}}((\operatorname{t_{\mathrm{hit}}(G)}/n)\cdot\log(n))={\operatorname{O}}((d\cdot\mathrm{Res}(G))\log(n))={\operatorname{O}}(r\cdot r^{-1}\cdot\log(n))={\operatorname{O}}(\log(n))$.

For a $d$ -regular graph $G$ with $d\geq\lfloor\frac{n}{2}\rfloor$ , $\mathrm{Res}(G)={\operatorname{O}}(d^{-1})$ by [16, Theorem 3.3], so that $T(G)={\operatorname{O}}((\operatorname{t_{\mathrm{hit}}(G)}/n)\cdot\log(n))={\operatorname{O}}((d\cdot\mathrm{Res}(G))\log(n))={\operatorname{O}}(d\cdot d^{-1}\cdot\log(n))={\operatorname{O}}(\log(n))$ .

For general $d$ -regular graphs $G$ , $\operatorname{t_{\mathrm{hit}}(G)}\leq 3n^{2}-nd$ by [32, Proposition 10.16], so that $T(G)={\operatorname{O}}((\operatorname{t_{\mathrm{hit}}(G)}/n)\cdot\log(n))={\operatorname{O}}((n^{2}/n)\log(n))={\operatorname{O}}(n\log(n))$ . ∎
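The reduction used in the torus, hypercube, and dense cases can be written out in one line. The following is a worked restatement that assumes only the two facts recalled at the start of the proof together with $\lvert E\rvert=dn/2$ for a $d$-regular graph; it adds no new bound.

```latex
\frac{t_{\mathrm{hit}}(G)}{n}
  \;\le\; \frac{2\cdot\mathrm{Res}(G)\cdot\lvert E\rvert}{n}
  \;=\; \frac{2\cdot\mathrm{Res}(G)\cdot (dn/2)}{n}
  \;=\; d\cdot\mathrm{Res}(G),
\qquad\text{hence}\qquad
T(G) \;\le\; \frac{t_{\mathrm{hit}}(G)}{n}\cdot\log(n)
     \;\le\; d\cdot\mathrm{Res}(G)\cdot\log(n).
```

Each case then only needs a bound on the product $d\cdot\mathrm{Res}(G)$.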

To bound $\operatorname{t^{*}_{\mathrm{hit}}(G)}$ for many specific graph classes we use the following.

Theorem B.5 (Theorem 2.10 of [34], citing [28]).

Let $G$ be a graph and $i\in[n]$ be one of its nodes. Then if $J\in[n]$ is chosen uniformly at random from the neighbors of $i$ in $G$ , ${\operatorname{\mathbb{E}}[H(i,J)]}=2\lvert E\rvert/d(i)-1$ , where $d(i)$ is the degree of $i$ in $G$ .

This gives us the following bounds.

Lemma B.6.

Assume $G$ is a graph with $n$ nodes.

• For $G$ being a toroidal mesh (including cycles and hypercubes), or being a $d$-regular graph with $d\geq\lfloor n/2\rfloor$, we have $\operatorname{t^{*}_{\mathrm{hit}}(G)}={\operatorname{O}}(n)$.

• For an arbitrary $d$-regular graph $G$ we have $\operatorname{t^{*}_{\mathrm{hit}}(G)}\leq dn$.

Proof.

Recall that $\operatorname{t^{*}_{\mathrm{hit}}(G)}\coloneqq\max_{i,j\in V,\{i,j\}\in E}H(i,j)$. Toroidal meshes are symmetric (arc-transitive) graphs: for every two ordered pairs of adjacent nodes $(i_{1},j_{1})$ and $(i_{2},j_{2})$ there is a graph automorphism $f$ such that $f(i_{1})=i_{2}$ and $f(j_{1})=j_{2}$. Hence, for every two such ordered pairs, $H(i_{1},j_{1})=H(i_{2},j_{2})$, and thus $\operatorname{t^{*}_{\mathrm{hit}}(G)}=H(i,j)$ for any pair of adjacent nodes $i,j$. Applying Theorem B.5 therefore shows that $\operatorname{t^{*}_{\mathrm{hit}}(G)}=2\lvert E\rvert/d-1$. As $\lvert E\rvert=dn/2$ for $d$-regular graphs, $\operatorname{t^{*}_{\mathrm{hit}}(G)}=2(dn/2)/d-1=n-1={\operatorname{O}}(n),$ as claimed.

For dense graphs we bound $\operatorname{t^{*}_{\mathrm{hit}}(G)}$ as $\operatorname{t^{*}_{\mathrm{hit}}(G)}\leq\operatorname{t_{\mathrm{hit}}(G)}\leq 2\cdot\mathrm{Res}(G)\cdot\lvert E\rvert$ (see LABEL:claim:hitting_time_resistance_relation). As $\mathrm{Res}(G)={\operatorname{O}}(1/d)$ by [16, Theorem 3.3] and $\lvert E\rvert=dn/2$, we get $\operatorname{t^{*}_{\mathrm{hit}}(G)}={\operatorname{O}}(dn/d)={\operatorname{O}}(n)$.

For arbitrary $d$ -regular graphs, $\operatorname{t^{*}_{\mathrm{hit}}(G)}\leq 2\cdot\mathrm{Res}^{*}(G)\cdot\lvert E\rvert$ by the first statement of LABEL:claim:hitting_time_resistance_relation. As $\lvert E\rvert=dn/2$ for a $d$ -regular graph, and as $\mathrm{Res}^{*}(G)\leq 1$ (by definition of $\mathrm{Res}^{*}(G)$ and Lemma A.2), we thus have $\operatorname{t^{*}_{\mathrm{hit}}(G)}\leq 2\cdot 1\cdot dn/2=dn$ . ∎
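The value $n-1$ obtained for arc-transitive graphs can be checked numerically on a small instance. The sketch below is not part of the paper: it computes exact hitting times of the simple random walk by solving the standard linear system, and the helper names (`hitting_times_to`, `cycle`, `max_edge_hitting_time`) are hypothetical.

```python
from fractions import Fraction

def hitting_times_to(adj, target):
    """Expected hitting times H(i, target) of the simple random walk, for all i.

    Solves h(target) = 0 and h(i) = 1 + (1/d(i)) * sum_{u ~ i} h(u) for
    i != target by Gauss-Jordan elimination over exact rationals.
    """
    nodes = [v for v in range(len(adj)) if v != target]
    idx = {v: r for r, v in enumerate(nodes)}
    size = len(nodes)
    rows = []  # augmented matrix [A | b] of the system A h = b
    for v in nodes:
        row = [Fraction(0)] * (size + 1)
        row[idx[v]] += 1
        for u in adj[v]:
            if u != target:
                row[idx[u]] -= Fraction(1, len(adj[v]))
        row[size] = Fraction(1)
        rows.append(row)
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    h = {target: Fraction(0)}
    for v in nodes:
        h[v] = rows[idx[v]][size]
    return h

def cycle(n):
    """Adjacency lists of the cycle on nodes 0..n-1 (n >= 3)."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]

def max_edge_hitting_time(adj):
    """Maximum of H(i, j) over ordered pairs of adjacent nodes."""
    best = Fraction(0)
    for j in range(len(adj)):
        h = hitting_times_to(adj, j)
        for i in adj[j]:
            best = max(best, h[i])
    return best
```

For the cycle the result agrees with the closed form $H(i,j)=k(n-k)$ for nodes at distance $k$, which gives $n-1$ for adjacent nodes.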

Appendix C Balancing Circuit Model

In this appendix we prove Theorem 4.3. The proof is similar to that of Theorem 1.2 in [15].

Proof of Theorem 4.3.

First we show a lower bound on $D_{k}(t)$. The idea is to decompose $D_{k}(t)$ into a sum of independent random variables $Y_{\ell}$ with expected value zero. It then remains to show that $\sum_{\ell}\operatorname{\mathbb{E}}\left[\left|Y_{\ell}\right|^{3}\right]$ is properly bounded, which allows us to apply a concentration inequality to the sum. To do so, we define several intermediate random variables, similarly to the proof of Lemma 3.6.

Fix round $t$ and consider node $k\in[n]$ such that $\Upsilon_{k}(\mathbf{m}^{[t]})=\Upsilon(\mathbf{m}^{[t]})$ . Recall that,

[TABLE]

We define indicator random variables ${B}({\tau,j,w})$ for $\tau\in[t]$ , $j\in[m]$ and $w\in[n]$ as follows.

[TABLE]

Note that for fixed $j$ and $\tau$, $\sum_{w\in[n]}{B}({\tau,j,w})=1$ and ${\operatorname{\mathbb{P}}}\left[{B}({\tau,j,w})=1\right]=1/n$. Recall that $\ell_{w}(\tau)$ can be expressed as $\sum_{j\in[m]}{B}({\tau,j,w})$. It then follows that

[TABLE]

We define the deviation from the average for $D_{k}(t)$ as

[TABLE]

It immediately follows that $\widetilde{D}_{k}(t)=D_{k}(t)-t\cdot m/n.$ We call

[TABLE]

the contribution of the $j$ -th load item (of step $\tau$ ) to $\widetilde{D}_{k}(t)$ . For a fixed $\tau$ and $j$ , from the linearity of expectation, it follows that

[TABLE]

where the last inequality follows since $\mathbf{m}^{[\tau,t]}$ is a doubly stochastic matrix.

For $\ell=(\tau-1)\cdot m+j$ with $\tau\in[t]$ and $j\in[m]$ we define $Y_{\ell}\coloneqq{C}_{k}{(\tau,j)}$, so that $\widetilde{D}_{k}(t)=\sum_{\ell=1}^{t\cdot m}Y_{\ell}$. Note that the $Y_{\ell}$ are independent. We want to apply the Berry-Esseen Theorem [13, 20] (see Theorem A.13 in Section A.2). To do so, we need to compute $\operatorname{\mathrm{Var}}[Y_{\ell}]$ and $\operatorname{\mathbb{E}}[|Y_{\ell}|^{3}]$. We get

[TABLE]

where in the second last equality we used the fact that for each $\tau$ and each $j$ exactly one of the ${B}({\tau,j,w})$ is one and all others are zero, and that each of the $n$ possible cases has uniform probability. Similarly we have

[TABLE]

where $(a)$ follows from the law of total expectation, and $(b)$ from the fact that for any $w^{\prime}\in[n]$, $|\mathbf{m}^{[\tau,t]}_{k,w^{\prime}}-1/n|<1$.
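The moment computation can be illustrated exactly on a small example. The sketch below assumes that the contribution of a single load item has the form $\mathbf{m}^{[\tau,t]}_{k,W}-1/n$, where $W$ is the uniformly random node receiving the item; this form is inferred from the surrounding text, and the function names and the example row are hypothetical.

```python
from fractions import Fraction

def contribution_moments(row):
    """Exact mean and variance of Y = row[W] - 1/n for W uniform on range(n)."""
    n = len(row)
    values = [Fraction(p) - Fraction(1, n) for p in row]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

def potential(row):
    """Phi(row) = || row - (1/n) * 1 ||_2^2 for a stochastic row vector."""
    n = len(row)
    return sum((Fraction(p) - Fraction(1, n)) ** 2 for p in row)

# A stochastic row, standing in for one row of a doubly stochastic matrix.
row = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)]
mean, variance = contribution_moments(row)
```

Under this assumption the mean is zero because the row sums to one, and the variance equals $\operatorname{\Phi}(\text{row})/n$, which is how the potential enters the normalization of the sum.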

Recall that $\|\mathbf{m}^{[\tau,t]}_{k,\cdot}-\frac{\vec{1}}{n}\|_{2}^{2}=\operatorname{\Phi}{(\mathbf{m}^{[\tau,t]}_{k,\cdot})}$ . By defining $F_{t\cdot m}(x)$ as the distribution of $\widetilde{D}_{k}(t)/\sqrt{\sum_{\ell=1}^{t\cdot m}\operatorname{\mathrm{Var}}[Y_{\ell}]}$ , from Theorem A.13 it follows that,

[TABLE]

in which the last inequality follows from the assumption, $m\geq 4n\log(n)/\sum_{\tau=1}^{t}\operatorname{\Phi}{(\mathbf{m}^{[\tau,t]}_{k,\cdot})}$ , and $C_{0}$ is some constant. Note that $\Phi_{N}(x)$ is the standard normal distribution. Therefore it holds that,

[TABLE]

where the last inequality follows from [1, Formula 7.1.13], which states

[TABLE]

Hence with $x=1$ we have

[TABLE]

Therefore by replacing the definition of $F_{t\cdot m}(1)$ we get that

[TABLE]

Recall that $\widetilde{D}_{k}(t)=D_{k}(t)-\operatorname{\mathbb{E}}\mathopen{}\mathclose{{}\left[D_{k}(t)}\right]$ , then it follows that

[TABLE]

Moreover, when node $k$ receives more than expectation from the allocated load items, there is (at least) one node $w$ receiving less than expectation. Hence,

[TABLE]

Since $\vec{X}(0)=\vec{0}$, we have $I_{k}(t)=I_{w}(t)=0$. From Lemma 3.3 it follows that $\lvert R_{k}(t)-R_{w}(t)\rvert\leq\sqrt{\log n}$ with probability $1-o(1)$. Since $m\geq 4n\cdot\log(n)/\sum_{\tau=1}^{t}\operatorname{\Phi}{(\mathbf{m}^{[\tau,t]}_{k,\cdot})}$ and $X_{k}(t)=I_{k}(t)+D_{k}(t)+R_{k}(t)$, it follows that

[TABLE]

∎

Theorem 4.3 states that for a sequence of matchings $\mathbf{m}^{[t]}$, as long as $m\geq 4n\cdot\log n/\Upsilon_{k}(\mathbf{m}^{[t]})$, the deviation of the load of node $k$ from its expectation at round $t$, normalized by its standard deviation, is approximately standard normally distributed.

C.1 Bounds for Specific Graph Classes

In the following we derive some bounds on the discrepancy for specific graph classes. Note that we assume that initially the system is empty. The first corollary gives upper bounds and the second one lower bounds. Corollary C.1 and Corollary C.2 are summarized in Table 1 (in Section 7) and Table 2 (below), respectively.

Corollary C.1.

Let $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ at time $t$ with $\vec{X}(0)=\vec{0}$ and assume $G$ has $n$ nodes. For an arbitrary $t$ it holds w.h.p. and in expectation

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n)+\sqrt{(\zeta\cdot m)/(n\cdot\operatorname{\lambda}(\mathbf{R}))}\cdot\sqrt{\log(n)})$ for arbitrary graphs with round matrix $\mathbf{R}$.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n)+\sqrt{m}\cdot\sqrt{\log(n)})$ for cycles and regular graphs with constant $\zeta$.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}((1+\sqrt{{m}/{n}})\cdot\log(n))$ for the two-dimensional torus or hypercube graphs.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n)+\sqrt{{m}/{n}}\cdot\sqrt{\log(n)})$ for tori of constant dimension three or more.

Proof.

The bounds follow from a straightforward combination of the upper bounds on the global divergence from Lemma C.3 with Theorem 4.1. ∎

Corollary C.2.

Let $\vec{X}(t)$ be the state of process $\textsc{SBal}(\mathcal{D}_{\textsc{BC}}(G),1,m)$ at time $t$ with $\vec{X}(0)=\vec{0}$ . It holds with constant probability that

• $\operatorname{\mathrm{disc}}(\vec{X}(t))=\Omega\left(\sqrt{m}\right)$ for cycles and $d$-regular graphs with constant $d$, $t=\Omega(n^{2})$, and $m\geq 4\log(n)$.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))=\Omega\left(\sqrt{\frac{m}{n}\cdot\log(n)}\right)$ for the two-dimensional torus, $t=\Omega(n)$, and $m\geq 4n$.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))=\Omega\left(\sqrt{\frac{m}{n}}\right)$ for $r$-dimensional tori with constant $r\geq 3$ and hypercube graphs, $t\in\mathbb{N}$, and $m\geq 4n\cdot\log(n)$.

Proof.

The bounds follow from a straightforward combination of the bounds on the global divergence from Lemma C.3 with Theorem 4.3. ∎

The two corollaries above show that our bounds are almost tight for cycle graphs, $d$-regular graphs with constant $d$, $r$-dimensional torus graphs with constant $r$, and hypercube graphs. For instance, consider a cycle with the balancing circuit given by the Odd-Even scheme and assume $m\geq\log(n)$. Corollary C.1 states that the discrepancy is, w.h.p., ${\operatorname{O}}(\sqrt{m\cdot\log n})$, while Corollary C.2 implies that, with constant probability, the discrepancy is $\Omega(\sqrt{m})$.

We now compute the global divergence for the following concrete graphs and circuits. For cycles of even length, we consider the “Odd-Even” scheme in which the first matching $\mathbf{m}(1)$ consists of all edges $\{j,(j+1)\pmod{n}\}$ for any odd $j\in[n]$, and the second matching $\mathbf{m}(2)$ consists of all edges $\{j,(j+1)\pmod{n}\}$ for any even $j\in[n]$. More generally, for the $r$-dimensional torus with node set $[n^{1/r}]^{r}$, the balancing circuit consists of $2r$ matchings in total, two matchings for each dimension $i$, analogously to the cycle. For the hypercube, the canonical choice is the dimension exchange circuit consisting of $\log_{2}(n)$ matchings, where nodes $u$ and $v$ are matched in $\mathbf{m}(i)$ if and only if their binary representations differ in bit $i$ only (see, e.g., [15]).
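The two circuits described above can be sketched in a few lines. This is an illustration for continuous (divisible) load only, without the rounding of the discrete process and without dynamic allocation; nodes are numbered from $0$, and all function names are hypothetical.

```python
def odd_even_matchings(n):
    """Two perfect matchings of the cycle on nodes 0..n-1 (n even)."""
    if n % 2 != 0:
        raise ValueError("the Odd-Even scheme needs a cycle of even length")
    odd = [(j, (j + 1) % n) for j in range(1, n, 2)]
    even = [(j, (j + 1) % n) for j in range(0, n, 2)]
    return [odd, even]

def dimension_exchange_matchings(r):
    """r perfect matchings of the r-dimensional hypercube on nodes 0..2^r - 1."""
    n = 1 << r
    # In matching i, node u is paired with the node that differs in bit i only.
    return [[(u, u ^ (1 << i)) for u in range(n) if not u & (1 << i)]
            for i in range(r)]

def average_over_matching(load, matching):
    """Continuous averaging: both endpoints of every matched edge get the mean."""
    new_load = list(load)
    for u, v in matching:
        new_load[u] = new_load[v] = (load[u] + load[v]) / 2
    return new_load

def run_circuit(load, matchings, steps):
    """Apply the matchings periodically for the given number of steps."""
    for step in range(steps):
        load = average_over_matching(load, matchings[step % len(matchings)])
    return load
```

On the hypercube, one full pass through the dimension exchange circuit balances any continuous load vector perfectly, whereas on the cycle the Odd-Even scheme only reduces the imbalance gradually.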

Recall that $\operatorname{\Phi}{(\mathbf{m}^{[\tau,t]}_{k,\cdot})}=\lVert\mathbf{m}^{[\tau,t]}_{k,\cdot}-\frac{\vec{1}}{n}\rVert_{2}^{2}$ and $\mathbf{R}\coloneqq\mathbf{m}^{[1,\zeta]}$ . The next lemma is about the global divergence of some specific graphs for the distribution $\mathcal{D}_{\textsc{BC}}(G)$ .

Lemma C.3 (Global Divergence).

Let $G$ be a graph and consider $\mathcal{D}_{\textsc{BC}}(G)$ constructed by the Odd-Even scheme such that it produces the round matrix $\mathbf{R}$.

1. For each $t\in\operatorname{\mathbb{N}}$ it holds $(\Upsilon(\mathbf{M}^{[t]}))^{2}={\operatorname{O}}\left(\zeta/\operatorname{\lambda}(\mathbf{R})\right)$.

2. For a constant $\zeta$ and each $t\in\operatorname{\mathbb{N}}$ it holds $(\Upsilon(\mathbf{M}^{[t]}))^{2}={\operatorname{O}}\left(n\right)$. It also holds for any $t=\Omega(n^{2})$, $(\Upsilon(\mathbf{M}^{[t]}))^{2}=\Omega(n)$.

3. For two-dimensional torus $G$ and for each $t\in\operatorname{\mathbb{N}}$ it holds $(\Upsilon(\mathbf{M}^{[t]}))^{2}={\operatorname{O}}\left(\log(n)\right)$. It also holds for any $t=\Omega(n)$, $(\Upsilon(\mathbf{M}^{[t]}))^{2}=\Omega(\log n)$.

4. For constant $r\geq 3$-dimensional torus $G$ and each $t\in\operatorname{\mathbb{N}}$ it holds $(\Upsilon(\mathbf{M}^{[t]}))^{2}={\operatorname{O}}\left(r\right)$. It also holds for any $t\in\operatorname{\mathbb{N}}$, $(\Upsilon(\mathbf{M}^{[t]}))^{2}=\Omega(1)$.

5. For hypercube graphs $G$ and each $t\in\operatorname{\mathbb{N}}$ it holds $(\Upsilon(\mathbf{M}^{[t]}))^{2}={\operatorname{O}}\left(\log(n)\right)$. It also holds for any $t$, $(\Upsilon(\mathbf{M}^{[t]}))^{2}=\Omega(1)$.

Proof.

Recall that the sequence of matching matrices $\mathbf{m}^{[t]}$ has global divergence $\Upsilon(\mathbf{m}^{[t]})$ , if

[TABLE]

Since the matchings are fixed we have $\left(\Upsilon(\mathbf{m}^{[t]})\right)^{2}=\max_{w\in[n]}\sum_{\tau=1}^{t}\|\mathbf{m}^{[\tau,t]}_{w,\cdot}-\frac{\vec{1}}{n}\|_{2}^{2}$. Consider a node $k\in[n]$ such that $\Upsilon_{k}(\mathbf{m}^{[t]})=\Upsilon(\mathbf{m}^{[t]})$. We have seen that

[TABLE]

Since $\operatorname{\Phi}{(\mathbf{R}^{[1,\tau]}_{k,\cdot})}$ is non-increasing in $\tau\in\operatorname{\mathbb{N}}$ and $\mathbf{R}\coloneqq\mathbf{m}^{[1,\zeta]}$, we get

[TABLE]

Hence, to bound $\left(\Upsilon(\mathbf{m}^{[t]})\right)^{2}$, it is enough to bound $\zeta\cdot\sum_{\tau=1}^{\infty}\operatorname{\Phi}{(\mathbf{R}^{[1,\tau]}_{k,\cdot})}$.

General case:

Here we get,

[TABLE]

where $(a)$ follows from [26, Lemma 2]. Note that $\operatorname{\Phi}{(\mathbf{R}^{[1,1]}_{k,\cdot})}\leq 1$ .

Cycles:

Recall that for the cycle $\zeta=2$. It holds that

[TABLE]

where $(b)$ and $(d)$ follow from [15], and $(c)$ from [26, Lemma 2]. To see $(e)$, recall that the spectral gap of the round matrix corresponding to a cycle is $\Theta(1/n^{2})$ [43]. Moreover, for $t=cn^{2}$ with some constant $c$, it follows from [15] that

[TABLE]

for $c_{1}\in[1,2]$ .

Two-dimensional torus:

Note that in $r$-dimensional torus graphs $\zeta=2r$, so that $\zeta=4$ here, and the spectral gap of the round matrix corresponding to an $r$-dimensional torus is $\Theta(1/n^{2/r})$ [43]. Hence,

[TABLE]

where $(f)$ and $(h)$ follow from [15], $(g)$ from [26, Lemma 2]. Moreover, for $t=cn$ with some constant $c$ , it follows from [15] that

[TABLE]

for $c_{1}\in[1,4]$ .

Constant three or more-dimensional torus:

Let us assume $r=2(1+\epsilon)$ for some $\epsilon>0$. Then

[TABLE]

where $(i)$ follows from [15].

Hypercubes:

Similarly, it holds that

[TABLE]

where $(j)$ follows from [15]. Recall that for the hypercube $\zeta\leq\log(n)$.

The lower bound of $1$ is trivial. ∎
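For small instances the quantity bounded in this lemma can be evaluated exactly. The sketch below assumes the convention $\mathbf{m}^{[\tau,t]}=\mathbf{m}(t)\cdot\mathbf{m}(t-1)\cdots\mathbf{m}(\tau)$, with round $\tau$ using the $((\tau-1) \bmod \zeta)$-th matching of the circuit; the indexing in the paper's definition is not reproduced in this text, so absolute constants may differ. All names are hypothetical.

```python
from fractions import Fraction

def matching_matrix(n, matching):
    """Doubly stochastic matrix that averages over each matched edge."""
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v in matching:
        m[u][u] = m[v][v] = m[u][v] = m[v][u] = Fraction(1, 2)
    return m

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def squared_global_divergence(n, matchings, t):
    """max_w sum_{tau=1}^{t} || M^{[tau,t]}_{w,.} - 1/n ||_2^2 (assumed convention)."""
    totals = [Fraction(0)] * n
    product = None  # holds M^{[tau,t]} while tau runs from t down to 1
    for tau in range(t, 0, -1):
        m_tau = matching_matrix(n, matchings[(tau - 1) % len(matchings)])
        product = m_tau if product is None else matmul(product, m_tau)
        for w in range(n):
            totals[w] += sum((x - Fraction(1, n)) ** 2 for x in product[w])
    return max(totals)
```

On the three-dimensional hypercube with the dimension exchange circuit, only the last two rounds contribute under this convention, since any three consecutive matchings already average over all nodes.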

Appendix D Asynchronous Model

The following is the equivalent of Lemma 3.6 for the process ABal:

Lemma D.1.

Let $G$ be a regular graph, and let $t\in\operatorname{\mathbb{N}}$ . Then in $\textsc{ABal}(\mathcal{D}_{\textsc{A}}(G),\beta)$ , for all $k\in[n]$ , $\gamma>0$ , and for $\hat{\Upsilon}_{k}>0$ such that ${\operatorname{\mathbb{P}}[\Upsilon_{k}(\mathbf{M}^{[t]})+1>\hat{\Upsilon}_{k}]}\leq n^{-\gamma}$ , we have

[TABLE]

Proof.

Let $\vec{\ell}(\tau)$ be the vector of allocated loads in round $\tau$ and recall that we have

[TABLE]

Using $\mathbf{M}^{[\tau,t]}=\mathbf{M}^{[\tau+1,t]}\cdot\mathbf{M}(\tau)$ , we can express the $k$ th coordinate of $\vec{D}(t)$ as

[TABLE]

is the contribution of the load item allocated in round $\tau$ to $D_{k}(t)$ . Note that in the second factorization of the $C_{k}(\tau)$ , the two factors are independent as they concern disjoint rounds.

Now consider the sequence $(Y(l))_{l=0}^{t}$ of partial sums $Y(l)=\sum_{\tau=t-l+1}^{t}(C_{k}(\tau)-1/n)$ with respect to the natural filtration $\mathcal{F}=(\mathcal{F}(l))_{l=0}^{t}$ on the sequence of edges $(I(t-l),J(t-l))$ . In particular, we have

[TABLE]

and $\mathcal{F}(l)$ determines all edges used in rounds $t-l+1$ up to round $t$ . To apply the martingale tail inequality Corollary A.12 to $(Y(l))_{l=0}^{t}$ , we need to check that ${\operatorname{\mathbb{E}}[Y(l)-Y(l-1)\mid\mathcal{F}(l-1)]}=0$ and that $\lvert Y(l)-Y(l-1)\rvert\leq 1$ .

For the boundedness condition, note that both $\mathbf{M}^{[\tau,t]}_{k,\cdot}$ and $\vec{\ell}(\tau)$ are stochastic vectors (for the latter, this is because exactly one load item is allocated in each round in the asynchronous model). Thus, their inner product $C_{k}(\tau)$ has a value in the interval $[0,1]$, so that $\lvert Y(l)-Y(l-1)\rvert=\lvert C_{k}(t-l+1)-1/n\rvert\leq 1-1/n\leq 1$, as required.

For the condition on the conditional expectation, note that

[TABLE]

so that it is enough to show that the expected value of the $C_{k}(\tau)$ is $1/n$ when conditioned on the matching choices in rounds $\tau+1$ to $t$ . The bound given by Corollary A.12 also involves the quantity

[TABLE]

so we will investigate $C_{k}(\tau)$ more thoroughly than would be required to compute only its conditional expectation.

To this end, let us first make the dependence between $\mathbf{M}(\tau)$ and $\vec{\ell}(\tau)$ more explicit. Let $(I(\tau),J(\tau))$ be the random orientation of the random edge selected in round $\tau$ , so that the load item in round $\tau$ is allocated to $I(\tau)$ , and then the load is balanced across the edge $\{I(\tau),J(\tau)\}$ . Then

[TABLE]

Using this, we may see that

[TABLE]

Now $\mathcal{D}_{\textsc{A}}(G)$ is the uniform distribution over the edges of $G$ , and the node to which load is allocated is a uniformly random endpoint of the chosen edge. Thus, $(I(\tau),J(\tau))$ is distributed uniformly over the oriented edges $\bigcup_{\{i,j\}\in E(G)}\{(i,j),(j,i)\}$ . Since $G$ is $d$ -regular, there are $2\cdot\lvert E(G)\rvert=2\cdot(dn/2)=dn$ such oriented edges. Hence, for all $i\in[n]$ ,

[TABLE]

By an entirely analogous calculation, ${\operatorname{\mathbb{P}}[J(\tau)=i]}=1/n$ holds as well. So $I(\tau)$ and $J(\tau)$ are identically distributed (but not necessarily independent). Because of this, the two sums over $i\in[n]$ on the right-hand side of Eq. 17 are also identically distributed.
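The marginal distributions of $I(\tau)$ and $J(\tau)$ can be checked by direct enumeration. The sketch below, with hypothetical names, enumerates all oriented edges of the three-dimensional hypercube as one example of a $d$-regular graph and computes the exact laws of both endpoints.

```python
from collections import Counter
from fractions import Fraction

def endpoint_marginals(edges):
    """Exact laws of I and J when (I, J) is uniform over all oriented edges."""
    oriented = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    total = len(oriented)
    law_i = Counter(i for i, _ in oriented)
    law_j = Counter(j for _, j in oriented)
    return ({v: Fraction(c, total) for v, c in law_i.items()},
            {v: Fraction(c, total) for v, c in law_j.items()})

# 3-dimensional hypercube: 8 nodes, 3-regular, 12 edges.
hypercube_edges = [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
                   if u < u ^ (1 << b)]
law_i, law_j = endpoint_marginals(hypercube_edges)
```

Regularity matters here: on a path with three nodes the middle node is the first endpoint of half of the oriented edges, so the marginal is no longer uniform.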

We can now compute the conditional expectation of $C_{k}(\tau)$ . Using Eq. 17 and linearity of expectation we see that

[TABLE]

So ${\operatorname{\mathbb{E}}[Y(l)-Y(l-1)\mid\mathcal{F}(l-1)]}=1/n-1/n=0,$ as required for applying Corollary A.12.

So all preconditions of Corollary A.12 hold. Applying it with $\varepsilon=\gamma\log(n)$ and $\sigma=\hat{\Upsilon}_{k}/\sqrt{n}$ yields

[TABLE]

We will now show that $\langle Y\rangle_{t}\leq 1/n\cdot(\Upsilon_{k}(\mathbf{M}^{[t]})+1)^{2}$ , which finishes the proof after noting that then,

[TABLE]

with the last inequality using the condition on $\hat{\Upsilon}_{k}$ in the statement.

So to bound $\langle Y\rangle_{t}$ , recall that

[TABLE]

with the latter equality using the fact that the expected value of $(Y(l)-Y(l-1))$ conditioned on $\mathcal{F}(l-1)$ is $0$. And since $Y(l)-Y(l-1)=C_{k}(t-l+1)-1/n$ and $1/n$ is a constant,

[TABLE]

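By Eq. 17, and as for two identically distributed random variables $A$ and $B$ and $a,b\in\operatorname{\mathbb{R}}^{+}$ we have ${\operatorname{\mathrm{Var}}[aA+bB]}=a^{2}{\operatorname{\mathrm{Var}}[A]}+2ab{\operatorname{\mathrm{Cov}}[A,B]}+b^{2}{\operatorname{\mathrm{Var}}[B]}\leq(a^{2}+2ab+b^{2}){\operatorname{\mathrm{Var}}[A]}=(a+b)^{2}{\operatorname{\mathrm{Var}}[A]}$ (using ${\operatorname{\mathrm{Var}}[B]}={\operatorname{\mathrm{Var}}[A]}$ and, by the Cauchy-Schwarz inequality, ${\operatorname{\mathrm{Cov}}[A,B]}\leq\sqrt{{\operatorname{\mathrm{Var}}[A]}\cdot{\operatorname{\mathrm{Var}}[B]}}={\operatorname{\mathrm{Var}}[A]}$):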

[TABLE]

And hence we may bound $\langle Y\rangle_{t}$ from above using the global divergence:

[TABLE]

which is all that remained to be shown. ∎

The next result is the analogue of Lemma 3.10:

Lemma D.2.

Assume $G$ is an arbitrary $d$ -regular graph. Then $\mathcal{D}_{\textsc{A}}(G)$ is $(g_{G},\sigma_{G}^{2})$ -good, where

[TABLE]

The proof of Lemma D.2 is analogous to that of Lemma 3.10, except that we use Lemma D.3 stated below instead of Lemma 3.11.

Lemma D.3.

Let $G$ be a $d$-regular graph, let $\mathbf{M}^{1}\sim\mathcal{D}_{\textsc{A}}(G)$, and let $\vec{x}\in\operatorname{\mathbb{R}}^{n}$. Then

1. $\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}=\frac{1}{dn}\cdot\operatorname{\Psi}_{G}(\vec{x}).$

2. ${\operatorname{\mathrm{Var}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}\leq(2\cdot\operatorname{t^{*}_{\mathrm{hit}}(G)}-1)\cdot\left(\operatorname{\Phi}(\vec{x})-{\operatorname{\mathbb{E}}\left[\operatorname{\Phi}(\mathbf{M}^{1}\cdot\vec{x})\right]}\right)^{2}.$

Proof.

For the first statement, we use Observation 3.7 as well as the fact that $\mathcal{D}_{\textsc{A}}(G)$ is the uniform distribution over the edges of $G$ to see that

[TABLE]

as claimed.

For the second statement we first observe that $\operatorname{\Phi}(\vec{x})$ is constant, and by Observation 3.7 we have

[TABLE]

We bound this variance using the Bhatia-Davis inequality (see Theorem A.8 in Section A.2). It states that, for a random variable $X$ taking values in $[m,M]$, and with $\mu\coloneqq{\operatorname{\mathbb{E}}[X]}$, it is the case that ${\operatorname{\mathrm{Var}}[X]}\leq(M-\mu)(\mu-m).$ Now from the definition of $\operatorname{\Psi}$, it is immediate that $\operatorname{\Psi}_{\mathbf{M}^{1}}(\vec{x})\geq 0$. For the upper bound on $\operatorname{\Psi}_{\mathbf{M}^{1}}(\vec{x})$, recall that the matchings $\mathbf{M}^{1}\sim\mathcal{D}_{\textsc{A}}$ consist of just one edge, and so $\operatorname{\Psi}_{\mathbf{M}^{1}}\leq\max_{\{i,j\}\in E(G)}(x_{i}-x_{j})^{2}$. The latter is bounded from above by the third statement of Lemma 3.12, yielding

[TABLE]

And so, by the Bhatia-Davis inequality (Theorem A.8),

[TABLE]

where the last inequality used the fact that $\mathrm{Res}^{*}(G)\cdot dn=2\cdot\mathrm{Res}^{*}(G)\cdot\lvert E\rvert\leq 2\cdot\operatorname{t^{*}_{\mathrm{hit}}(G)}$ by LABEL:claim:hitting_time_resistance_relation. ∎
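Both ingredients of this proof can be checked exactly on a small example. The sketch below assumes $\operatorname{\Phi}(\vec{x})=\sum_{i}(x_{i}-\bar{x})^{2}$ and reads $\operatorname{\Psi}_{G}(\vec{x})$ as the sum of $(x_{i}-x_{j})^{2}$ over the edges of $G$; both readings are assumptions about definitions not reproduced in this text, and the names and the example vector are hypothetical.

```python
from fractions import Fraction

def potential(x):
    """Phi(x) = sum_i (x_i - mean(x))^2 (assumed definition)."""
    mean = sum(x, Fraction(0)) / len(x)
    return sum((xi - mean) ** 2 for xi in x)

def average_edge(x, i, j):
    """Load vector after averaging over the single edge {i, j}."""
    y = list(x)
    y[i] = y[j] = (x[i] + x[j]) / 2
    return y

def potential_drop_stats(x, edges):
    """Exact mean and variance of Phi(x) - Phi(M x) for a uniformly random edge."""
    drops = [potential(x) - potential(average_edge(x, i, j)) for i, j in edges]
    mean = sum(drops) / len(drops)
    variance = sum((dr - mean) ** 2 for dr in drops) / len(drops)
    return drops, mean, variance

# Cycle on 6 nodes (2-regular) with an arbitrary load vector.
x = [Fraction(v) for v in (5, 1, 0, 2, 7, 3)]
n, d = len(x), 2
cycle_edges = [(v, (v + 1) % n) for v in range(n)]
drops, mean, variance = potential_drop_stats(x, cycle_edges)
edge_energy = sum((x[i] - x[j]) ** 2 for i, j in cycle_edges)
```

Averaging over a single edge $\{i,j\}$ lowers the potential by exactly $(x_{i}-x_{j})^{2}/2$, so the expected drop over a uniform edge is the edge sum divided by $dn$, and the Bhatia-Davis bound applies to the drop with lower endpoint $0$.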

D.1 Bounds for Specific Graph Classes

As in Section B.5, we consider specific graph classes and use the bounds on $T(G)$ and on the hitting time from Section B.5. Plugging these into Theorem 5.1 yields the following results, which hold w.h.p. and in expectation.

Corollary D.4.

Let $\vec{X}(t)$ be the state of process $\textsc{ABal}(\mathcal{D}_{\textsc{A}}(G),\beta)$ where $\vec{X}(0)=\vec{0}$. For an arbitrary $t$ it holds w.h.p. and in expectation

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\sqrt{n}\log(n))$ for any regular graph.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\sqrt{n\log(n)})$ for cycles and constant-degree regular graphs.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log^{3/2}(n))$ for the two-dimensional torus graph.

• $\operatorname{\mathrm{disc}}(\vec{X}(t))={\operatorname{O}}(\log(n))$ for $r$-dimensional torus graphs with $r\geq 3$ dimensions, for the hypercube, and for all $d$-regular graphs with $d\geq\lfloor n/2\rfloor$.

Appendix E Proof of the Drift Result

In this appendix we give the full proof of our drift result from Section 6. We restate it for convenience.

Theorem 6.1 (restated from Section 6).

Proof.

Throughout this proof we write

[TABLE]
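The displayed definition is not reproduced in this text. The identity $f(x_{0})=\int_{x_{0}}^{x_{0}}(1/h(\varphi))\,\mathrm{d}\varphi=0$ used later in the proof suggests $f(x)=\int_{x}^{x_{0}}(1/h(\varphi))\,\mathrm{d}\varphi$; the following worked example assumes this form, with a linear drift function $h(x)=c\cdot x$ for a hypothetical constant $c>0$.

```latex
f(x) \;=\; \int_{x}^{x_{0}} \frac{1}{h(\varphi)}\,\mathrm{d}\varphi
     \;=\; \int_{x}^{x_{0}} \frac{1}{c\,\varphi}\,\mathrm{d}\varphi
     \;=\; \frac{1}{c}\,\ln\!\left(\frac{x_{0}}{x}\right),
\qquad\text{so that}\qquad
f(X(t)) \ge s
\;\Longleftrightarrow\;
X(t) \le x_{0}\cdot e^{-c\,s}.
```

In this special case a lower bound on $f(X(t))$ that grows linearly in $t$ corresponds to exponential decay of $X(t)$, and $f$ is non-increasing in $x$, as used in the second part of the proof.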

We start by proving the first statement. Let $a,b\in\operatorname{\mathbb{R}}^{+}$ with $a\leq b\leq x_{0}$ be two arbitrary numbers. Since $h$ is increasing we have $h(a)\leq h(b)$ and $1/h(a)\geq 1/h(b)$ . Hence,

[TABLE]

From Condition 1 of the theorem it follows that ${\operatorname{\mathbb{E}}[X(t+1)\mid X(t)=b]}\leq b-h(b)$ and consequently $h(b)\leq b-{\operatorname{\mathbb{E}}[X(t+1)\mid X(t)=b]}$, which for $X(t)=b$ gives us

[TABLE]

We introduce a new sequence of random variables for which we will derive a lower tail bound, defined as $(Y(t))_{t\in\operatorname{\mathbb{N}}}$ given by $Y(0)\coloneqq 0$ and

[TABLE]

Comparing this with Eq. 18 we see that regardless of the value of $X(t)$ it holds that

[TABLE]

By induction over $t$ , and since $f(x_{0})=\int_{x_{0}}^{x_{0}}(1/h(\varphi))\,{\mathrm{d}\varphi}=0$ and $Y(0)=0$ , we have for all $t$

[TABLE]

From the definition of $(Y(t))_{t\geq 0}$ it follows, assuming $X(t)=x$, that

[TABLE]

Then, from the law of total expectation we get that

[TABLE]

Since $Y(0)=0$ it immediately follows that $\operatorname{\mathbb{E}}[Y(t)]=t$ . Furthermore, we may bound the variance of the change of $Y$ given $X(t)=x$ by

[TABLE]

where $(a)$ follows from Condition 2 of the theorem. The sequence $(Y(t)-{\operatorname{\mathbb{E}}[Y(t)]})_{t\geq 0}$ is a martingale and hence fulfills the preconditions of Theorem A.10 (Theorem 6.6 from [17]) with $a_{t}\coloneqq 1$ and $\sigma^{2}_{t}\coloneqq\sigma$ . Note that $\operatorname{\mathbb{E}}[Y(t)-\operatorname{\mathbb{E}}[Y(t)]]=0$ . Hence, we obtain

[TABLE]

Recalling that $f(X(t))\geq Y(t)$ and ${\operatorname{\mathbb{E}}[Y(t)]}=t$, and setting $\varepsilon=\delta t$ for some $\delta\in(0,1)$, we arrive at the first statement of the theorem:

[TABLE]

Next we prove the second statement and bound $\sum_{t=t_{0}+1}^{\infty}X(t)$ . Let $T(x)\coloneqq\min\{t\in\operatorname{\mathbb{N}}\mid X(t)\leq x\}$ be a hitting time for the event that $X(t)\leq x$ . Using $\operatorname{\mathbf{1}}_{x<X(t)}$ as the indicator variable (which is one if $x<X(t)$ and zero otherwise) we can write $X(t)=\int_{0}^{x_{0}}\operatorname{\mathbf{1}}_{X(t)>x}\,{\mathrm{d}x}$ because $x_{0}$ is fixed and $X(t)$ is non-increasing in $t$ resulting in $X(t)\in[0,x_{0}]$ . As a consequence it holds that

[TABLE]

We now proceed to bound the $T(x)$ . Using the first statement with a union bound over all $t>t_{0}\coloneqq\frac{2(\sigma+1)}{\delta^{2}}\cdot\mathopen{}\mathclose{{}\left(-\log(p)+\log\mathopen{}\mathclose{{}\left(\frac{2(\sigma+1)}{\delta^{2}}}\right)}\right)$ gives us

[TABLE]

As a consequence,

[TABLE]

and

[TABLE]

Recalling that $T(x)\coloneqq\min\{t\in\operatorname{\mathbb{N}}\mid X(t)\leq x\}$, Eq. 19 implies that

[TABLE]

Since $X(T(x)-1)>x$ by the definition of $T(x)$ and $f$ is non-increasing, it holds that $f(X(T(x)-1))\leq f(x)$. It follows that

[TABLE]

As a consequence we get that with probability at least $1-p$

[TABLE]

Finally, we find that

[TABLE]

Putting everything together we see with probability at least $1-p$ that

[TABLE]

Bibliography


[1] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition, 1964.
[2] Heiner Ackermann, Simon Fischer, Martin Hoefer, and Marcel Schöngens. Distributed algorithms for QoS load balancing. Distributed Comput., 23(5-6):321–330, 2011. doi:10.1007/s00446-010-0125-1.
[3] Sinan G. Aksoy, Fan Chung, Michael Tait, and Josh Tobin. The maximum relaxation time of a random walk. Adv. Appl. Math., 101:1–14, 2018. doi:10.1016/j.aam.2018.07.002.
[4] Dan Alistarh, Giorgi Nadiradze, and Amirmojtaba Sabour. Dynamic averaging load balancing on cycles. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, volume 168 of LIPIcs, pages 7:1–7:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.ICALP.2020.7.
[5] Aris Anagnostopoulos, Adam Kirsch, and Eli Upfal. Load balancing in arbitrary network topologies with stochastic adversarial input. SIAM Journal on Computing, 34(3):616–639, 2005. doi:10.1137/S0097539703437831.
[6] Elliot Anshelevich, David Kempe, and Jon M. Kleinberg. Stability of load balancing algorithms in dynamic adversarial systems. SIAM J. Comput., 37(5):1656–1673, 2008. doi:10.1137/050639272.
[7] Friedhelm Meyer auf der Heide, Brigitte Oesterdiekhoff, and Rolf Wanka. Strongly adaptive token distribution. Algorithmica, 15(5):413–427, 1996. doi:10.1007/BF01955042.
[8] Petra Berenbrink, Colin Cooper, Tom Friedetzky, Tobias Friedrich, and Thomas Sauerwald. Randomized diffusion for indivisible loads. J. Comput. Syst. Sci., 81(1):159–185, 2015. doi:10.1016/j.jcss.2014.04.027.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Dynamic Averaging Load Balancing on Arbitrary Graphs

Abstract

1 Introduction

Results in a Nutshell

1.1 Related Work

Discrete Models

Dynamic Models

2 Balancing Models and Notation

Synchronous Processes

Asynchronous Process

2.1 Notation

3 Random Matching Model

Theorem 3.1**.**

Proof.

Lemma 3.2** (name=Memorylessness Property,restate=restateInitialLoadVanishes,label=lem:initial:load:vanishes).**

Lemma 3.3** (name=Insignificance of Rounding Errors,restate=restateRoundingErrorsAreSmall,label=lem:rounding:errors:are:small).**

Lemma 3.4** (Contribution of Dynamically Allocated Load).**

3.1 Bounding the Contribution of Dynamically Allocated Load

Observation 3.5**.**

Lemma 3.6** (Load Concentration).**

Proof.

Observation 3.7** (name=,label=obs:node_potential_change_exact,restate=restateObsPotentialRelation).**

Theorem 3.8**.**

Lemma 3.9** (name=Global Divergence,label=lem:glob:div:bound:drift,restate=restateLemGlobalDivergence).**

Lemma 3.10**.**

Proof.

Lemma 3.11** (label=prop:node_potential_change_statistics,restate=restateLemNodePotentialChangeStatistics).**

Lemma 3.12** (label=lem:edge_potential_bounds,restate=restateEdgePotentialBounds).**

Proof of Lemma 3.4

Proof.

Claim 3.13*.*

Claim 3.14*.*

4 Balancing Circuit Model

Theorem 4.1**.**

Proof.

Lemma 4.2** (Memorylessness Property).**

Proof.

Theorem 4.3**.**

5 Asynchronous Model

Theorem 5.1**.**

Proof Sketch of Theorem 5.1.

6 Drift Result

Theorem 6.1** (name=,restate=restateLemDrift,label=lem:drift).**

7 Conclusions and Open Problems

Results for Specific Graph Classes

Open Problems

Appendix A Auxiliary Results

A.1 Random Walks, Hitting Times, and Effective Resistance

Theorem A.1** (Harmonic Functions and Effective Resistance).**

Lemma A.2**.**

Theorem A.3** (Theorem 4.1 (i) in [34]).**

Corollary A.4**.**

Proof.

Theorem A.5** (Dirichlet’s principle, see Exercise 2.13 in [35]; or Exercise 9.9 in [32], referencing Theorem 6.1 in [33]).**

Theorem A.6** (Corollary 3.3 in [34], applied to ddd-regular graphs).**

A.2 Tail Bounds

Lemma A.7**.**

Proof.

Theorem A.8 (Bhatia-Davis inequality [14]).

Theorem A.9 (Azuma–Hoeffding inequality, Theorem 13.6 in [38]).

Theorem A.10 (Adapted from Theorem 6.6 in [17]).

Theorem A.11 (Adapted from Theorem 2.1, combined with Remark 2.1 and Equation 18 in [21]).

Corollary A.12.

Proof.

Theorem A.13 (Berry-Esseen Theorem [13, 20] for Non-identical Random Variables).

Theorem A.14 (Theorem 3.4 of [17], [36]).

Theorem A.15 (Theorem 4.1 of [17]).

Appendix B Omitted Proofs from Section 3

Observation B.1.

Proof.

B.1 Proof of Lemma 3.2

Proof.

Claim.

Proof of the claim.

Lemma B.2.

Corollary B.3.

Lemma B.4.

Theorem B.5 (Theorem 2.10 of [34], citing [28]).

Lemma B.6.

Corollary C.1.

Corollary C.2.

Lemma C.3 (Global Divergence).

Lemma D.1.

Lemma D.2.

Lemma D.3.

Corollary D.4.