Noidy Conmunixatipn: On the Convergence of the Averaging Population Protocol

Frederik Mallmann-Trenn; Yannic Maus; Dominik Pajak

arXiv:1904.10984·cs.DC·April 26, 2019

TL;DR

This paper analyzes a distributed averaging protocol with noisy communication, providing probabilistic bounds on convergence time and showing that the total sum of squares remains small for polynomially many rounds despite eventual divergence.

Contribution

It offers the first probabilistic bounds on convergence time and precise analysis of the divergence of the total sum of squares in noisy averaging protocols.

Findings

1. Convergence time of the running average is probabilistically bounded.

2. Total sum of squares remains small for polynomially many rounds.

3. Results extend to synchronous and discrete-value settings.

Equations

$$\bar{\phi}(X^{(t_{1})})$$

$$TSS(x^{(t)})=\sum_{i}(x_{i}^{(t)}-\varnothing^{(0)})^{2},\qquad \bar{\phi}(x^{(t)})=\sum_{i}(x_{i}^{(t)}-\varnothing^{(t)})^{2},\qquad \phi(x^{(t)})=\sum_{i,j}(x_{i}^{(t)}-x_{j}^{(t)})^{2}.$$

$$\begin{aligned}TSS(t)&=\sum_{i}\left((x_{i}^{(t)}-\varnothing^{(t)})^{2}-2(x_{i}^{(t)}-\varnothing^{(t)})(\varnothing^{(0)}-\varnothing^{(t)})+(\varnothing^{(0)}-\varnothing^{(t)})^{2}\right)\\&=\bar{\phi}(X^{(t)})-2\left(\sum_{i}x_{i}^{(t)}-n\varnothing^{(t)}\right)(\varnothing^{(0)}-\varnothing^{(t)})+n(\varnothing^{(0)}-\varnothing^{(t)})^{2}\\&=\bar{\phi}(X^{(t)})+n(\varnothing^{(0)}-\varnothing^{(t)})^{2}.\end{aligned}$$

$$\phi(x)=2n\left(\sum_{i}x_{i}^{2}-\sum_{i}x_{i}\varnothing\right)=2n\left(\sum_{i}x_{i}^{2}-2\sum_{i}x_{i}\varnothing+n\varnothing^{2}\right)=2n\sum_{i}(x_{i}-\varnothing)^{2}=2n\cdot\bar{\phi}(x).$$

$$N^{\prime(t)}=(N_{1}^{(t)})^{2}+(N_{2}^{(t)})^{2},\qquad N^{*(t)}=N_{1}^{(t)}+N_{2}^{(t)}.$$

$$\bar{\phi}(X^{(t+1)})-\bar{\phi}(x^{(t)})\leq-\Delta^{(t+1)}\bar{\phi}(x^{(t)})+\frac{N^{\prime(t+1)}}{4}+N^{*(t+1)}\left(\frac{x_{i}^{(t)}+x_{j}^{(t)}}{2}-\varnothing^{(t)}\right).$$

$$S^{\prime}(\mathcal{I})=\sum_{\tau\in\mathcal{I}}N^{\prime(\tau)}/4,\qquad S^{*}(\mathcal{I})=\sum_{\tau\in\mathcal{I}}N^{*(\tau)}\left(\frac{x_{i}^{(\tau-1)}+x_{j}^{(\tau-1)}}{2}-\varnothing^{(\tau-1)}\right),\qquad S^{-}(\mathcal{I})=\sum_{\tau\in\mathcal{I}}\Delta^{(\tau)}.$$

$$m_{t,\delta}=\arg\max_{\ell}\left\{\mathbb{P}\left[\max\left(\{N^{\prime(t_{0})},\dots,N^{\prime(t_{0}+t)}\}\cup\{N^{*(t_{0})},\dots,N^{*(t_{0}+t)}\}\right)\leq\ell\right]\geq 1-\delta\right\}.$$

$$S^{*}(\mathcal{I})+S^{\prime}(\mathcal{I})\leq\frac{t}{4}\mathbb{E}[N^{\prime}]+5\frac{t}{n}\left(\ln(4t/\delta)\,m_{t,\delta/4}^{*}\right)^{2}(2+\mathbb{E}[N^{\prime}])\,\bar{\phi}(x^{(t_{0})})+9t\,\mathbb{E}[N^{\prime}]+2.$$

$$\mathbb{P}\left[\bar{\phi}(X^{(t^{*})})\geq\ln(1/\delta)\,n\,\mathbb{E}[N^{\prime}]+b\mid\mathcal{F}_{t_{0}}\right]\leq\delta,$$

$$\frac{1}{\sqrt{2\pi}}\frac{x}{x^{2}+1}\exp(-x^{2}/2)\leq\Phi(x)\leq\frac{1}{\sqrt{2\pi}}\frac{1}{x}\exp(-x^{2}/2).$$

$$\mathbb{P}\left[2n(\varnothing^{(t)}-\varnothing^{(0)})\geq x\right]=\Phi\left(\frac{x-\mu_{x}}{\sigma_{x}}\right),$$

$$\mathbb{P}\left[2n(\varnothing^{(t)}-\varnothing^{(0)})\leq-x\right]\leq\frac{\sqrt{2t\sigma^{2}}}{2\sigma\sqrt{t\ln(1/\delta)}\sqrt{2\pi}}\exp\left(-\frac{4\sigma^{2}t\ln(1/\delta)}{4t\sigma^{2}}\right)\leq\frac{\delta}{2}.$$

$$\mathbb{P}\left[2n(\varnothing^{(t)}-\varnothing^{(0)})\leq-x\right]\geq\frac{1}{\sqrt{2\pi}}\frac{y}{y^{2}}\exp\left(-\frac{4\sigma^{2}t\ln(1/\delta)}{4t\sigma^{2}}\right)\geq\frac{\delta}{2\sqrt{2\ln(1/\delta)}}.$$

$$\mathbb{E}\left[\mathbf{1}^{T}X^{(t+1)}-\mathbf{1}^{T}x\mid X^{(t)}=x,\ \zeta_{t}=(i,j)\right]=\frac{\mathbb{E}[N_{i}]+\mathbb{E}[N_{j}]}{2}=0.$$

$$-m_{t,\delta/(2t)}\leq\mathbf{1}^{T}X^{(t+1)}-\mathbf{1}^{T}X^{(t)}\leq m_{t,\delta/(2t)}$$

$$\mathbb{P}\left[\,|\varnothing^{(t)}-\varnothing^{(0)}|\geq b\,\right]\leq\exp\left(-\frac{b^{2}}{2\left(\sum_{i=1}^{t}\sigma^{2}+Mb/3\right)}\right)\leq\exp(-\ln(2t/\delta)).$$

$$\prod_{i}(1-x_{i})^{w_{i}}\geq 1-\sum_{i}w_{i}x_{i},$$

$$\prod_{i}(1-x_{i})^{w_{i}}\leq 1-\sum_{i}w_{i}x_{i},$$

$$\sum_{i}w_{i}\ln(1-x_{i})\leq\ln\left(\sum_{i}w_{i}\cdot(1-x_{i})\right)\leq\ln\left(1-\sum_{i}w_{i}x_{i}\right),$$

$$\mathbb{P}\left[\,|X-\mathbb{E}[X]|\geq b\,\right]\leq\exp\left(-\frac{2b^{2}}{\sum_{i=1}^{m}(b_{i}-a_{i})^{2}}\right).$$

$$\mathbb{P}\left[\,X-\mathbb{E}[X]\geq b\,\right]\leq\exp\left(-\frac{b^{2}}{2\left(\sum_{i=1}^{m}\sigma_{i}^{2}+Mb/3\right)}\right).$$

$$\mathbb{P}\left[\,X\leq\mathbb{E}[X]-b\,\right]\leq\exp\left(-\frac{b^{2}}{2\left(\sum_{i=1}^{m}(\sigma_{i}^{2}+a_{i}^{2})+Mb/3\right)}\right).$$

$$\bar{\phi}(X^{(t+1)})-\bar{\phi}(x^{(t)})=-\frac{(x_{i}-x_{j})^{2}}{2}+\frac{n_{i}^{2}+n_{j}^{2}}{4}-\frac{(n_{i}+n_{j})^{2}}{4n}+(n_{i}+n_{j})\left(\frac{x_{i}+x_{j}}{2}-\varnothing^{(t)}\right)$$

Full text

Frederik Mallmann-Trenn

Department of Computer Science, Technion, Haifa, Israel

Yannic Maus

Dominik Pajak

Department of Computer Science, Technion, Haifa, Israel

Abstract

We study a process of averaging in a distributed system with noisy communication. Each of the agents in the system starts with some value, and the goal of each agent is to compute the average of all the initial values. In each round, one pair of agents is drawn uniformly at random from the whole population; the two agents communicate with each other, and each of them updates its local value based on its own value and the received message. The communication is noisy: whenever an agent sends a value $v$, the receiving agent receives $v+N$, where $N$ is a zero-mean Gaussian random variable. The two quality measures of interest are (i) the total sum of squares $TSS(t)$, which measures the sum of squared distances from the agents' values to the initial average, and (ii) $\bar{\phi}(t)$, which measures the sum of squared distances from the agents' values to the running average (the average at time $t$).

It is known that the simple averaging protocol, in which an agent sends its current value and sets its new value to the average of the received value and its current value, converges eventually to a state where $\bar{\phi}(t)$ is small. It has been observed that $TSS(t)$, due to the noise, eventually diverges, and previous research, mostly in control theory, has focused on showing eventual convergence w.r.t. the running average. We obtain the first probabilistic bounds on the convergence time of $\bar{\phi}(t)$ and precise bounds on the drift of $TSS(t)$. These show that although $TSS(t)$ eventually diverges, for a wide and interesting range of parameters $TSS(t)$ stays small for a number of rounds that is polynomial in the number of agents. Our results extend to the synchronous setting and to settings where the agents are restricted to discrete values and perform rounding.

Frederik Mallmann-Trenn and Dominik Pajak were supported in part by NSF Award Numbers CCF-1461559, CCF-0939370, and CCF-18107. Yannic Maus was partly supported by ERC Grant No. 336495 (ACDC).

1 Introduction
1.1 Motivation and Related Work
1.2 Formal Results
1.3 Technical Contributions
2 Model
3 The Sequential Setting: Convergence towards the Running Average
4 Deviation from the Initial Average
5 Experimental Results
5.1 The Distribution of the Distances
5.2 The Bounded Values Setting
6 Conclusion and Open Problems
A Auxiliary Claims
B Missing Proofs: Sequential Setting (Section 3)
B.1 Sequential Setting: One Step Potential Change
B.2 Bounding $S^{\prime}$ , $S^{*}$ , and $S^{-}$
B.3 Proof of Section 3 and Theorem 1.1
C Synchronous Model
D The Influence of Rounding

1 Introduction

We consider the problem of distributed averaging by a group of agents (e.g., sensors), initialized with values that represent, for example, different temperature measurements. The agents' goal is to compute the average of all the initial values using the following simple dynamic: In each discrete round, two agents are drawn uniformly at random from the whole population, communicate their values to each other, and set their new values to the average of their old value and the received value. Converging to the average plays a key role in many applications, e.g., for sensor networks [58, 52], social insects [10], and robotics [21, 31]. In all of these applications, the agents (sensors, ants, and robots) are very simple and are therefore limited in both memory and communication. Moreover, communication is often erroneous. (Consult Section 1.1 for a more detailed review of these applications, including the limitations of the agents and further motivation. Section 1.1 also contains related work on the averaging protocol.) This motivates the study of the aforementioned simple averaging dynamic in a setting where the agents only remember one value, do not use any additional memory, and the communication is subject to noise. We model the noise in the communication as follows: Whenever an agent sends any value $v$, the receiving agent receives $v+N$, where the random variable $N$ is distributed according to some zero-mean probability distribution $\aleph$, e.g., a normal distribution. The agents update their values as follows: whenever two agents communicate, each agent sets its new value to the average of its old value and the received value; note that, due to the noise, the two agents might end up with distinct new values.
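The dynamic just described can be sketched in a few lines. This is a minimal simulation sketch, not the authors' code: it assumes Gaussian noise with standard deviation `sigma` and one uniformly random pair per round, and the names `noisy_averaging_step`, `phi_bar`, and `simulate` are illustrative.

```python
import random


def noisy_averaging_step(values, sigma, rng):
    """One round: a uniformly random pair of distinct agents exchanges noisy values."""
    i, j = rng.sample(range(len(values)), 2)
    xi, xj = values[i], values[j]
    # Each agent receives the other's value plus independent zero-mean noise.
    received_by_i = xj + rng.gauss(0.0, sigma)
    received_by_j = xi + rng.gauss(0.0, sigma)
    # Each agent averages its own old value with what it received.
    values[i] = (xi + received_by_i) / 2.0
    values[j] = (xj + received_by_j) / 2.0


def phi_bar(values):
    """Sum of squared distances to the running average."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def simulate(initial, sigma, rounds, seed=0):
    """Run the dynamic for `rounds` rounds and return the final values."""
    rng = random.Random(seed)
    values = list(initial)
    for _ in range(rounds):
        noisy_averaging_step(values, sigma, rng)
    return values
```

With `sigma = 0` the sum of the values is preserved in every round, so the agents converge to the initial average; with `sigma > 0` the two agents of a round generally end up with different values and the sum performs a random walk.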

The values of the $n$ nodes in step $t$ of the process are denoted by $X_{1}^{(t)},X_{2}^{(t)},\dots,X_{n}^{(t)}$ . We consider the following models: (i) the sequential setting where one pair of agents is chosen uniformly at random and (ii) the synchronous setting where each agent is matched to exactly one other agent chosen uniformly at random. The two quality measures of the convergence used in this work are (i) the total sum of squares $TSS(t)=\sum_{i}(X_{i}^{(t)}-\varnothing^{(0)})^{2}$ , where $\varnothing^{(0)}=\sum_{i}X_{i}^{(0)}/n$ is the initial average and (ii) the sum of squared distances to the running average $\bar{\phi}(t)=\sum_{i}(X_{i}^{(t)}-\varnothing^{(t)})^{2}$ , where $\varnothing^{(t)}=\sum_{i}X_{i}^{(t)}/n$ is the running average. Our contributions can be informally summarized as follows:

(i) We give, under mild assumptions on the noise, the first bounds on the convergence time of the potential $\bar{\phi}(t)$ (the sum of squared distances to the running average) in the noisy gossip-based communication setting. The bounds we obtain are tight up to a constant factor. In particular, the potential converges to a value that is linear in $n$ and in the second moment of the noise $\mathbb{E}\left[\,N^{2}\,\right]$, which is tight. So far it was only known that the process eventually converges to a state where $\bar{\phi}(t)$ is small (e.g., [56]), but precise bounds were not known. (Theorem 1.1)

(ii) We show that, in contrast to the current belief, one can hope to converge to the initial average in addition to the running average as long as the number of rounds is bounded. It was known that $TSS(t)$, due to the noise, eventually diverges (the running average diverges from the initial average), and for this reason related research, mostly in control theory, has focused on showing eventual convergence w.r.t. $\bar{\phi}(t)$, leaving $TSS(t)$ aside. Since we give precise bounds on the convergence time of the running average, we can show the following. Under mild assumptions on the noise, $TSS(t)$ converges to almost the same value as $\bar{\phi}(t)$ as long as the number of time steps $t$ is bounded by $O(n^{2})$, where $n$ is the number of nodes. (Section 1.2)

(iii) We initiate the study of the discrete setting, in which the agents can store only integer values and the noise is also an integer. In this setting the agents in our algorithm perform randomized rounding. We show that this only causes a negligible difference from the continuous case. (Section 1.2)

(iv) We study both the sequential and the synchronous setting and show that there is no significant difference (up to a scaling of time) between the models. (Section 1.2)

(v) We perform simulations in the setting where nodes are limited in storage, i.e., they can only store values from a bounded range. This leads to a much faster (by an order of magnitude) divergence between the running average and the initial average. Our simulations also seem to indicate strong bounds on the distribution of distances to the running average in our main model (unbounded values). (Section 5)

The convergence time of the averaging processes in the gossip-based communication setting without noise has been studied before (e.g., [39]). However, to the best of our knowledge, no bounds on the convergence time are known in the gossip-based communication setting with noise. We continue with a detailed motivation for studying noise in the simple averaging dynamic and related work.

1.1 Motivation and Related Work

Converging to the average plays a key role in many applications in which agents have limited computational and communication power, e.g.,

(i) sensor networks [58, 52]: here there is a wide range of applications, including terrain monitoring [53], computing an average temperature, PIR sensors measuring the infrared light radiation emitted from objects, and many more. In such scenarios links are often faded [48, 14],

(ii) social insects: for ants, values could represent the individuals' different assessments of nest qualities when house hunting [10] or the deficit of workers at a given task [43], and

(iii) robotics [21, 31] and in particular memory-limited robots, e.g., Kilobots exploring the percentage of white tiles in an area [22], or microbots measuring the concentration of chemicals.

In all of these applications the agents (representing sensors, ants or robots) are very simple and severely limited in both memory and communication. Moreover, the communication is often not only limited but also erroneous (e.g., consider wireless communication with obstacles between robots), or received messages are subject to interpretation (e.g., when insects communicate through gestures [41]). Motivated by this unreliable communication in applications we study the simple averaging dynamic where the communication is subject to noise.

We continue with related work. The problem of distributed values converging to the average (often without noise) has been studied in various areas reaching back to early versions studied in statistics [19, 27, 32]. However, to the best of our knowledge, none of the studied models match our model. We review the related work by areas:

average consensus and its applications,
gossip-based communication models,
consensus protocols in population protocols,
biological distributed algorithms,
noise and failures in sensor networks.

Average consensus and its applications

Consensus has been studied intensively in various settings in general network topologies, much of it under the name of average consensus [57, 55]. Most of this work is orthogonal to ours. First, these works consider general network topologies, and in each step of the studied algorithms the agents update their values with a weighted average of all of their neighbors' values, whereas in our averaging dynamic an agent can only access a single other value per interaction. Second, while the potential functions in these works and the noise, if any, are usually defined identically or similarly to ours, the main goal of these papers is, just as in the classic works, to study under which circumstances the processes eventually converge to a state with a small potential function [57], whereas we are interested in the number of interactions until our process obtains a small potential. Recent papers [47, 11, 42, 15] consider the convergence rate of the weighted averaging process, but only in the noiseless setting. Average consensus has also been studied in networks with time-varying topologies [46, 51]. Variants with noisy communication were studied [57, 38], but they only consider additive noise and assume it to be zero-mean with unit variance (as mentioned before, only convergence in the limit is shown). The noisy version of the problem also received ample attention in control theory [54, 50, 49]. Already in the early works on average consensus, immediate applications of converging to the average were discovered and intensively studied, e.g., applications to load balancing between parallel machines [9, 18] or to coordinating distributed mobile agents [9, 36, 24]. For a more detailed overview of average linear consensus, consult the survey [28].

Gossip-based communication models

Much closer to our work is the study of aggregating information in the gossip-based model. In this model, each node can contact one of its neighbors in the network in each round and exchange information with it. Even though a node can be contacted by many neighbors in a single round, this model, if applied to the complete graph, is very similar to our synchronous model. On the complete graph, [39] shows that $O(n\cdot\ln n)$ interactions are enough to approximate the average well with high probability. On the one hand, they consider more general graphs (in some sense we consider only the complete graph); on the other hand, they do not consider noise, which simplifies their analysis of the convergence time significantly.

Consensus protocols in population protocols, biological distributed algorithms

Motivated by biological applications, population protocols have also been studied in the noisy setting in the context of biological distributed algorithms. The authors of [25] study rumor spreading and consensus in extremely faulty networks where a bit in a message can be flipped with probability $1/2-\varepsilon$ . This was later generalized in [26] to plurality consensus. The authors of [8] study the differences between pull and push rumor spreading in the noisy setting. Reaching consensus to an opinion in population protocols in the noiseless setting has received much attention (see e.g., [4, 23, 1, 2, 5, 6, 20, 7, 40, 30, 29, 37]).

Noise and failures in sensor networks

The problem of converging to the average (and similar problems) has also been studied in (noisy) sensor networks [58, 52], where nodes again can interact with all their neighbors. In these networks another type of unreliable communication, namely dropped packets, has received ample attention, e.g., [12] studies the broadcast problem and [13] develops a framework to transform certain algorithms for failure-free networks to also work in faulty sensor networks.

An interesting type of failure has been studied in [33]. There failures do not happen during the communication but the algorithm itself might be faulty, i.e., a state machine run at an agent might switch to a wrong state.

1.2 Formal Results

We now formally state our main theorems. For ease of presentation, in the discussion we assume that the noise is normally distributed with unit variance, $N\sim\mathcal{N}(0,1)$, but our results hold for general variance $\sigma^{2}$. Let $\phi_{0}=\bar{\phi}(\mathbf{X}^{(0)})$ be the initial potential. Our first theorem shows that the agents converge to a small value of $\bar{\phi}(t)=O(n)$ after parallel time that is logarithmic in $\phi_{0}/n$. (Recall that in parallel time we scale time by a factor of $n$ for a fair comparison with the synchronous time model.) In particular, if we use $b$ to denote the initial imbalance ($b=\max_{i,j}\{x_{i}^{(0)}-x_{j}^{(0)}\}$), then it takes $O(\ln b)$ parallel steps for the potential to become $\bar{\phi}(t)=O(n)$. Note that $\bar{\phi}(t)=O(n)$ means that the 'average' difference between the values of any two agents is constant, and we show that the constant hidden in the $O$-notation is actually very small. It is worth mentioning that this is tight in two senses: (i) In expectation we have $\bar{\phi}(t)=\Omega(n)$ for any fixed time step $t\geq n$ (i.e., after one parallel time step). Even in the case where all nodes initially have the same value, our results show that the potential increases after $n$ interactions in expectation by $\Omega(n\mathbb{E}\left[\,N^{2}\,\right])=\Omega(n)$. (ii) At least $\Omega(\ln b)$ parallel time steps are required to decrease the potential to $O(n)$ (for the case where a constant fraction of the values are at distance $b$), since the potential only drops in expectation by a constant factor in each parallel step. The formal statement is as follows.

Theorem 1.1 (Convergence to Running Avg.).

Consider any noise distribution $\aleph$ with (at least) exponential decay. (In fact we only require the function to be smooth, which we define later. This class is much broader and contains most of the famous distributions, including the normal distribution, the geometric distribution, and the Poisson distribution.) Fix any $\delta\in\mathbb{R}$. Let $n=n(\delta)$ be large enough. The following hold:

(i) for any $t=\Omega\left(n\ln\left(\frac{\phi_{0}}{\delta\sigma^{2}n}\right)\right)$ with probability at least $1-\delta$ we have $\bar{\phi}(\mathbf{X}^{(t)})=O(\sigma^{2}n\ln(1/\delta))$,

(ii) for any $t\geq n$ (parallel time) with constant probability we have $\bar{\phi}(\mathbf{X}^{(t)})=\Omega(\sigma^{2}n)$, and

(iii) even without noise, for any $t=o\left(n\ln\left(\frac{\phi_{0}}{\sigma^{2}n}\right)\right)$ we have $\mathbb{E}\left[\,\bar{\phi}(\mathbf{X}^{(t)})\,\right]=\omega(\sigma^{2}n)$.
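The $\sigma^{2}n$ scale in parts (i) and (ii) can be motivated by a heuristic expectation computation. The following is a sketch under stated assumptions, not the paper's proof: it assumes the pair $(i,j)$ is uniform over distinct pairs, the two noise terms $n_{i},n_{j}$ are independent of the current values with mean zero and variance $\sigma^{2}$, and it starts from the one-step identity $\bar{\phi}(X^{(t+1)})-\bar{\phi}(x^{(t)})=-\frac{(x_{i}-x_{j})^{2}}{2}+\frac{n_{i}^{2}+n_{j}^{2}}{4}-\frac{(n_{i}+n_{j})^{2}}{4n}+(n_{i}+n_{j})\left(\frac{x_{i}+x_{j}}{2}-\varnothing^{(t)}\right)$.

```latex
\begin{aligned}
\mathbb{E}\left[\bar{\phi}(X^{(t+1)})-\bar{\phi}(x^{(t)}) \mid X^{(t)}=x^{(t)}\right]
  &= -\frac{1}{2}\,\mathbb{E}\left[(x_{i}-x_{j})^{2}\right]
     + \frac{\sigma^{2}}{2} - \frac{\sigma^{2}}{2n} + 0 \\
  &= -\frac{\bar{\phi}(x^{(t)})}{n-1}
     + \frac{\sigma^{2}}{2}\left(1-\frac{1}{n}\right),
\end{aligned}
\qquad\text{using}\qquad
\mathbb{E}\left[(x_{i}-x_{j})^{2}\right]
  = \frac{\phi(x^{(t)})}{n(n-1)}
  = \frac{2\,\bar{\phi}(x^{(t)})}{n-1}.
```

Under these assumptions the expected change is negative exactly when $\bar{\phi}(x^{(t)})>\sigma^{2}(n-1)^{2}/(2n)\approx\sigma^{2}n/2$, which is consistent with a potential that settles at a value linear in $n$ and in $\sigma^{2}$, and with an expected multiplicative decrease of $1-\frac{1}{n-1}$ per round while the potential is large.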

While the above theorem shows a quick convergence to the running average, this does not imply convergence to the initial average. In fact, as time progresses the distance to the initial average ($TSS(\mathbf{X}^{(t)})$) is likely to increase. Nonetheless, in the case of the Gaussian white noise model we can bound the drift of the running average from the initial average in a time window of $O(n^{2})$ steps (cf. Section 4). Theorem 1.1 roughly says that after at least $t=\Omega(n\log n)$ steps the distance to the running average is small if we start with a potential that is polynomial in $n$. Thus, as long as $t=\Omega(n\log n)$ and $t=O(n^{2})$, we obtain $TSS(\mathbf{X}^{(t)})=O\left(n\right)$. After the $O(n^{2})$-step time window the potential starts to increase again, which is unavoidable because the noise causes the running average to drift: in the Gaussian white noise model, the running average after $t$ steps diverges with constant probability from the initial average by $\frac{\sqrt{t}}{n}$ (Section 4). This in turn implies that $TSS(\mathbf{X}^{(t)})\geq t/n$.
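The $\sqrt{t}/n$ drift and the $n^{2}$ threshold can be made concrete with a short variance computation. This is a worked sketch that assumes independent $\mathcal{N}(0,\sigma^{2})$ noise terms $N_{1}^{(\tau)},N_{2}^{(\tau)}$ in round $\tau$; it only uses the fact that a round changes the sum of all values by $(N_{1}^{(\tau)}+N_{2}^{(\tau)})/2$.

```latex
\begin{aligned}
\varnothing^{(t)}-\varnothing^{(0)}
  &= \frac{1}{2n}\sum_{\tau=1}^{t}\left(N_{1}^{(\tau)}+N_{2}^{(\tau)}\right)
  \;\sim\; \mathcal{N}\!\left(0,\ \frac{t\,\sigma^{2}}{2n^{2}}\right), \\
\mathbb{E}\left[\,n\left(\varnothing^{(0)}-\varnothing^{(t)}\right)^{2}\right]
  &= \frac{t\,\sigma^{2}}{2n}.
\end{aligned}
```

The standard deviation of the drift is therefore $\sigma\sqrt{t}/(\sqrt{2}\,n)$. The term $n(\varnothing^{(0)}-\varnothing^{(t)})^{2}$ is of order $\sigma^{2}t/n$, which stays below the order-$\sigma^{2}n$ level of $\bar{\phi}$ as long as $t=O(n^{2})$ and dominates once $t=\omega(n^{2})$.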

Corollary ((Bounded) Divergence from Initial Avg.).

In the case of the Gaussian white noise model, for any $\delta\in\mathbb{R}$ and large enough $n=n(\delta)$ and all $t=\Omega\left(n\ln\left(\frac{\bar{\phi}(\mathbf{X}^{(0)})}{\delta\sigma^{2}n}\right)\right)$ we have

(i) 'non-divergence for $O(n^{2})$ steps', i.e., $TSS(\mathbf{X}^{(t)})=O\left(\left(\frac{t}{n}+n\right)\sigma^{2}\ln(1/\delta)\right)$ with probability at least $1-\delta$, and

(ii) 'divergence for $\omega(n^{2})$ steps', i.e., $TSS(\mathbf{X}^{(t)})=\Omega\left(\left(\frac{t}{n}+n\right)\sigma^{2}\right)$ with constant probability.

If one can bound the divergence between the running average and the initial average for a general noise distribution $\aleph$ with (at least) exponential decay (again, we only require the function to be smooth, which we define in Section 3), the following remark is useful to obtain a similar bound for $TSS(\mathbf{X}^{(t)})$ as in Section 1.2.

Recall that $\varnothing^{(t)}=\sum_{i}X_{i}^{(t)}/n$ and in particular, $\varnothing^{(0)}$ denotes the initial average.

Remark.

Fix any $\delta\in\mathbb{R}$ . Let $n=n(\delta)$ be large enough. For any fixed $t=\Omega\left(n\ln\left(\frac{\phi_{0}}{\delta\sigma^{2}n}\right)\right)$ with probability at least $1-\delta$ we have $TSS(\mathbf{X}^{(t)})=\Theta\left(n\left(\varnothing^{(t)}-\varnothing^{(0)}\right)^{2}+\sigma^{2}n\ln(1/\delta)\right)$ .

The remark follows by rewriting $TSS(t)=\bar{\phi}(\mathbf{X}^{(t)})+n\cdot\left(\varnothing^{(0)}-\varnothing^{(t)}\right)^{2}$ (cf. Section 2) and plugging in the first part of Theorem 1.1. The corollary on the (bounded) divergence from the initial average then follows by plugging in the bounded deviation of the running average from the initial average for the Gaussian white noise model (cf. Section 4).
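The rewriting of $TSS(t)$ used here, and the relation $\phi=2n\cdot\bar{\phi}$ between the pairwise potential and $\bar{\phi}$, are exact algebraic identities and can be checked numerically. The sketch below uses illustrative function names that are not from the paper.

```python
def running_average(values):
    return sum(values) / len(values)


def tss(values, initial_average):
    """Total sum of squares: squared distances to the initial average."""
    return sum((x - initial_average) ** 2 for x in values)


def phi_bar(values):
    """Sum of squared distances to the running average."""
    avg = running_average(values)
    return sum((x - avg) ** 2 for x in values)


def phi(values):
    """Sum of squared differences over all ordered pairs (i, j)."""
    return sum((x - y) ** 2 for x in values for y in values)


def tss_decomposition_gap(values, initial_average):
    """TSS minus (phi_bar + n * drift^2); zero up to floating-point error."""
    n = len(values)
    drift = initial_average - running_average(values)
    return tss(values, initial_average) - (phi_bar(values) + n * drift ** 2)
```

For any list of values and any reference point used as the initial average, `tss_decomposition_gap` returns a value at the level of floating-point rounding, and `phi(values)` agrees with `2 * len(values) * phi_bar(values)`.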

The Influence of Rounding

Agents with limited computational power might not be able to store real values. Motivated by this, we also consider the setting where agents can only store integers. In particular, we consider the case that the averaging protocol is augmented with the following rounding procedure: Assume that the noise $N\sim\aleph$ takes only integer values. After a node $i$ receives the value from node $j$, the node averages it as before and then rounds up or down with equal probability. In Appendix D we show how to relate the setting of rounding to the original setting, allowing us to derive the following corollary.

Corollary.

The bounds of Theorem 1.1 and Section 1.2 hold even if rounding is used.
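The rounding procedure can be sketched as follows. This is a minimal illustration under assumptions that go beyond the text: the stored value and the received value (the sender's value plus integer noise) are Python integers, and when their average is already an integer no rounding takes place. The name `rounded_average` is hypothetical.

```python
import random


def rounded_average(own, received, rng):
    """Average two integers; a half-integer result is rounded up or down
    with probability 1/2 each, so the rounding is unbiased."""
    total = own + received
    if total % 2 == 0:
        return total // 2  # the average is already an integer
    floor_value = total // 2  # floor of the half-integer average
    return floor_value + (1 if rng.random() < 0.5 else 0)
```

Because the two outcomes are equally likely, the expected stored value equals the exact average, which is the intuition for why rounding behaves like an additional zero-mean noise term of bounded size.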

The Synchronous Model

In Appendix C, we show how our results extend to the synchronous setting. It turns out that the results are the same up to a rescaling of time.

Corollary (Synchronous Setting).

The bounds of Theorem 1.1 and Section 1.2 hold even in the synchronous setting, where time is rescaled by a factor of $2/n$ .
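One synchronous round can be sketched as a uniformly random perfect matching followed by the same noisy pairwise update. The sketch assumes an even number of agents and Gaussian noise with standard deviation `sigma`; the name `synchronous_round` is illustrative and not from the paper.

```python
import random


def synchronous_round(values, sigma, rng):
    """Match all agents into disjoint random pairs and update every pair.

    Returns a new list; the input list is left unchanged.
    """
    n = len(values)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of agents")
    order = list(range(n))
    rng.shuffle(order)  # consecutive entries of a random order form the matching
    new_values = list(values)
    for k in range(0, n, 2):
        i, j = order[k], order[k + 1]
        # Each agent averages its own value with the noisy value it received.
        new_values[i] = (values[i] + values[j] + rng.gauss(0.0, sigma)) / 2.0
        new_values[j] = (values[j] + values[i] + rng.gauss(0.0, sigma)) / 2.0
    return new_values
```

A synchronous round performs $n/2$ pairwise interactions at once, which matches the rescaling of time by a factor of $2/n$ in the corollary.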

Experimental Results

In Section 5, we simulate the averaging dynamic in various settings. In the first setting, we consider the distribution of the distances between agents’ values and the running average. Our simulations show that these distances seem to follow an exponential law, i.e., the concentration is even stronger than what Theorem 1.1 implies.

Due to the limited memory of agents it would be desirable to obtain similar results as in Theorem 1.1 for the averaging dynamic in the setting where agents can only store values from a bounded range. However, our simulations in Section 5 show that this setting leads to a much faster (by order of magnitude) divergence between the running average and the initial average.

1.3 Technical Contributions

While it is not hard to show that in expectation the potentials $TSS(t)$ and $\bar{\phi}(t)$ decrease in one step as long as their value is large, it is surprisingly challenging to derive probabilistic bounds on either potential at an arbitrary point in time, i.e., bounds of the type $\mathbb{P}\left[\,\bar{\phi}(t)\geq b\,\right]\leq p(b)$. Two of the reasons are as follows. (i) The potential decreases in expectation only conditioned on the fact that it is large enough. In fact, when the potential is small, then due to the noise it will increase in expectation. (ii) Since we study general distributions and in particular the normal distribution, the noise in a given round can be arbitrarily large, leading to an arbitrarily large increase in $\bar{\phi}(t)$; if the protocol runs long enough (possibly exponentially long in $n$) we will indeed have encountered some time steps with a very large potential increase. There are surprisingly few analytical tools for potentials such as $\bar{\phi}(t)$ that face challenges (i) and (ii). One notable exception is Hajek's theorem [34], which can be used to bound the value of such a potential at a given time $t$. However, in our setting, with our potential function, the results obtained are very weak. (Hajek's theorem considers the moment generating function of the potential. In order to apply the theorem to our potential, it seems that one would need to consider a logarithmic version of the potential, which together with the moment generating function results in a bound that is weaker than a simple union bound.)

Instead, we use a more sophisticated approach that at its core has a decomposition of the potential change in a single time step into three additive (but dependent) random variables. We iterate this decomposition over time throughout some interval $\mathcal{I}=(t_{0},t_{1}]$ and sum the respective variables which we will denote as $S^{-}(\mathcal{I})$ , $S^{\prime}(\mathcal{I})$ , and $S^{*}(\mathcal{I})$ . Then (cf. Section 3) we are able to bound the potential change at the end of the interval as

[TABLE]

Due to the dependencies between the three variables we use strong martingale concentration bounds to separately upper bound $S^{\prime}(\mathcal{I})+S^{*}(\mathcal{I})$ and lower bound $S^{-}(\mathcal{I})$ (cf. Section 3). We then use a union bound, which circumvents the dependencies, to combine the bounds on these variables, allowing us to get a bound on Equation 1. It is critical that we define the random variable $S^{-}$ in such a way that it always has an expected decrease. This is in stark contrast to the entire potential, which, as we mentioned before in (i), only decreases in expectation when it is large. Having an unconditional decrease of $S^{-}$ allows us to consider arbitrarily large intervals. With these bounds at hand one can use Equation 1 to obtain probabilistic bounds on the potential at any given point in time $t_{1}$. However, due to the bound on $S^{\prime}(\mathcal{I})+S^{*}(\mathcal{I})$ the total bound becomes very weak for large intervals. As a remedy, we carefully trace the change in the potential in different regimes (with several phases in each regime) and we separately apply the aforementioned analysis with a fresh (small) interval in each phase. The intervals (and thus also the phases) have variable length, decreasing geometrically or even exponentially, depending on the regime.

2 Model

In this section we present the model including all assumptions. We have a collection of $n$ agents that have initial values $X_{1}^{(0)},X_{2}^{(0)},\dots,X_{n}^{(0)}$ . Time is discrete and $X_{i}^{(t)}$ denotes the value of agent $i\in[n]$ at time $t$ . Recall that $\varnothing^{(t)}=\sum_{i}X_{i}^{(t)}/n$ denotes the average value at time $t$ ; in particular, $\varnothing^{(0)}$ denotes the initial average. For two random variables $X$ and $Y$ we write $X\stackrel{{\scriptstyle d}}{{=}}Y$ if they have the same (probability) distribution. Next, we define the communication models.

Definition (Communication Models).

We consider two communication models.

(i) Sequential model: At every discrete time step two of the agents $i,j$ are chosen uniformly at random (with replacement) and exchange their current values $x_{i}$ and $x_{j}$, where the values received are $x_{i}+N_{i}$ and $x_{j}+N_{j}$, with $N_{i},N_{j}\stackrel{d}{=}N$. (Footnote 7: Choosing with replacement is not crucial to our results, but simplifies the calculations slightly.)

(ii) Synchronous model: At every discrete time step a perfect matching is chosen u.a.r. among all perfect matchings on the $n$ agents. All matched agents interchange their values as in the sequential model. (Footnote 8: Again, we allow matchings of the kind $(i,i)$ for simplicity. It is easy but slightly less aesthetic to modify our results to exclude matchings $(i,i)$.)

We use the notion of parallel time, which was first defined in [3], to denote the time step $t/n$ in the sequential model. This notion eases the comparison of results in both models, as the total numbers of interactions are then equal up to a factor of $2$.

Definition (Noise Models).

Let $v$ be the value sent by an agent. The value received is $v+N$, where $N$ is distributed according to some zero-mean noise distribution $\aleph$. We write $\sigma^{2}=\operatorname{Var}\left[\,N\,\right]$.

We consider general noise distributions and our results depend on the moments of $N$ . The following two models are of special interest in this paper.

(i) Gaussian white noise model where $\aleph=\mathcal{N}(0,\sigma^{2})$ for an arbitrary $\sigma$.

(ii) Discrete white noise model where $\aleph=\mathcal{D}(p)$, with $\mathbb{P}\left[\,N=i\,\right]=\frac{1}{2}p(1-p)^{|i|}$ for $i\in\mathbb{Z}\setminus\{0\}$ and $\mathbb{P}\left[\,N=0\,\right]=p$, where $p\in(0,1]$. Note that $\operatorname{Var}\left[\,N\,\right]=p\sum_{k\geq 1}k^{2}(1-p)^{k}=\frac{(1-p)(2-p)}{p^{2}}$.
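The discrete white noise model can be made concrete with a small sketch. The Python code below is an illustration only: the function names (`discrete_noise_pmf`, `sample_discrete_noise`, `truncated_moments`) and the truncation `cutoff` are assumptions introduced here and are not taken from the authors' implementation [44]. It encodes the probability mass function stated above, draws samples from it, and evaluates the total mass and the first two moments numerically on a truncated support.

```python
import random


def discrete_noise_pmf(i: int, p: float) -> float:
    """P[N = i] for the discrete white noise model D(p) as stated in the text."""
    if i == 0:
        return p
    return 0.5 * p * (1.0 - p) ** abs(i)


def sample_discrete_noise(p: float, rng: random.Random) -> int:
    """Draw one sample of N: zero w.p. p, otherwise a signed geometric magnitude."""
    if rng.random() < p:
        return 0
    # Conditioned on N != 0, |N| = k has probability p * (1 - p)^(k - 1) for k >= 1.
    magnitude = 1
    while rng.random() >= p:
        magnitude += 1
    return magnitude if rng.random() < 0.5 else -magnitude


def truncated_moments(p: float, cutoff: int = 2000):
    """Total mass, mean and second moment on the truncated support [-cutoff, cutoff]."""
    support = range(-cutoff, cutoff + 1)
    total_mass = sum(discrete_noise_pmf(i, p) for i in support)
    mean = sum(i * discrete_noise_pmf(i, p) for i in support)
    second_moment = sum(i * i * discrete_noise_pmf(i, p) for i in support)
    return total_mass, mean, second_moment


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_discrete_noise(0.5, rng) for _ in range(10)])
    print(truncated_moments(0.5))
```

Because the distribution is symmetric around zero, the mean vanishes and the second moment returned by `truncated_moments` coincides with the variance, up to the truncation error.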

From now on we assume that the noise $N$ is distributed according to a fixed noise distribution $\aleph$ that is independent of $n$ .

Definition (Averaging Dynamic).

We consider the real valued and the discrete valued algorithm. A node with value $v$ at time $t$ that receives the input $w$ sets its new value to

(i) $v^{\prime}=\nicefrac{(v+w)}{2}$ in the real valued model.

(ii) $v^{\prime}=\begin{cases}\left\lceil\nicefrac{(v+w)}{2}\right\rceil&\text{w.p. }\frac{1}{2}\\ \left\lfloor\nicefrac{(v+w)}{2}\right\rfloor&\text{otherwise}\end{cases}$ in the discrete valued model.
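The two update rules and one round of the sequential model can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the names `update_real`, `update_discrete` and `sequential_step` are hypothetical, the noise is passed in as a zero-argument callable, and for the degenerate case $i=j$ (which the model allows) the sketch simply keeps the result of the second assignment.

```python
import math
import random


def update_real(v: float, w: float) -> float:
    """Real valued model: the new value is the plain average."""
    return (v + w) / 2.0


def update_discrete(v: int, w: int, rng: random.Random) -> int:
    """Discrete valued model: round the average up or down with probability 1/2 each."""
    half = (v + w) / 2.0
    return math.ceil(half) if rng.random() < 0.5 else math.floor(half)


def sequential_step(x, noise, update, rng):
    """One round of the sequential model on the list x (modified in place)."""
    n = len(x)
    i = rng.randrange(n)
    j = rng.randrange(n)  # chosen with replacement, so i == j is possible
    received_by_i = x[j] + noise()  # agent i hears x_j plus fresh noise
    received_by_j = x[i] + noise()  # agent j hears x_i plus fresh noise
    x[i], x[j] = update(x[i], received_by_i), update(x[j], received_by_j)


if __name__ == "__main__":
    rng = random.Random(1)
    values = [0.0, 10.0] * 5
    for _ in range(2000):
        sequential_step(values, lambda: rng.gauss(0.0, 0.1), update_real, rng)
    print(values)
```

For the discrete valued model one would pass `lambda v, w: update_discrete(v, w, rng)` as `update` together with an integer-valued noise source.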

A probability distribution $\mathcal{D}$ is called sub-Gaussian if for $X\sim\mathcal{D}$ there exist positive constants $c_{1},c_{2}$ such that for every $x$ we have $\mathbb{P}\left[\,|X|\geq x\,\right]\leq c_{1}\exp(-c_{2}x^{2}).$

Whenever we calculate the new values $\mathbf{X}^{(t+1)}$ by conditioning on the current state, $\mathbf{X}^{(t)}=\mathbf{x}^{(t)}$ we use small letters $x_{i}^{(t)}$ to denote fixed values and capitalized letters $X_{i}^{(t+1)}$ to denote random variables. Furthermore, we use bold-face to denote vectors. Throughout the paper we will assume that the number of agents $n$ is large enough and in particular $n\mathbb{E}\left[\,N^{2}\,\right]\geq 1.$

We define the following potentials which are essential in all our proofs and formal results.

Definition (Potentials).

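For a vector of values $\mathbf{x}^{(t)}$ at time $t$ we define

$\phi(\mathbf{x}^{(t)})=\sum_{i,j\in[n]}\left(x_{i}^{(t)}-x_{j}^{(t)}\right)^{2},\qquad\bar{\phi}(\mathbf{x}^{(t)})=\sum_{i\in[n]}\left(x_{i}^{(t)}-\varnothing^{(t)}\right)^{2},\qquad TSS(\mathbf{x}^{(t)})=\sum_{i\in[n]}\left(x_{i}^{(t)}-\varnothing^{(0)}\right)^{2}.$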

When clear from the context we drop the time index $t$ and we write $\mathbf{x}$ instead of $\mathbf{x}^{(t)}$ , $x_{i}$ instead of $x_{i}^{(t)}$ , etc. Similarly we will use the following short forms $TSS(t)=TSS(\mathbf{x}^{(t)})$ and $\bar{\phi}(t)=\bar{\phi}(\mathbf{x}^{(t)})$ . We emphasize that the difference between $\bar{\phi}(\mathbf{x})$ and $TSS(t)$ is that the former measures the squared distance w.r.t. the running average and the latter w.r.t. initial average. Initially, we have $\bar{\phi}(\mathbf{x}^{(0)})=TSS(0)$ . The following fact shows how $\bar{\phi}(\mathbf{X}^{(t)})$ relates to $TSS(t)$ and how $\bar{\phi}$ relates to $\phi$ .

Fact.

We have that

(i) $TSS(t)=\bar{\phi}(\mathbf{X}^{(t)})+n\cdot\left(\varnothing^{(0)}-\varnothing^{(t)}\right)^{2}$ and

(ii) $\phi(\mathbf{x})=2n\cdot\bar{\phi}(\mathbf{x}).$

Proof.

Consider part $(i)$ .

[TABLE]

Consider part $(ii)$.

[TABLE]

∎
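Both identities of the fact can be checked numerically. The Python sketch below assumes that $\bar{\phi}$ is the sum of squared distances to the running average, that $TSS$ is the sum of squared distances to the initial average, and that $\phi$ sums the squared differences over all ordered pairs $(i,j)$; the last convention is an assumption chosen so that the factor $2n$ in part $(ii)$ comes out. The function names are hypothetical.

```python
def running_average(x):
    """Average of the current values."""
    return sum(x) / len(x)


def phi_bar(x):
    """Sum of squared distances to the running average."""
    avg = running_average(x)
    return sum((xi - avg) ** 2 for xi in x)


def tss(x, initial_average):
    """Sum of squared distances to the initial average."""
    return sum((xi - initial_average) ** 2 for xi in x)


def phi(x):
    """Sum of squared differences over all ordered pairs (assumed convention)."""
    return sum((xi - xj) ** 2 for xi in x for xj in x)


if __name__ == "__main__":
    x0 = [1.0, 4.0, 7.0, 10.0]   # initial values
    xt = [5.2, 5.9, 6.1, 6.4]    # some later state whose average has drifted
    n = len(xt)
    avg0, avgt = running_average(x0), running_average(xt)
    print(tss(xt, avg0), phi_bar(xt) + n * (avg0 - avgt) ** 2)  # part (i)
    print(phi(xt), 2 * n * phi_bar(xt))                         # part (ii)
```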

Note that many alternative ways to define the potential at a time $t$, such as the max distance and the $\ell_{1}$ norm, give only a very partial picture: The max distance to the mean, for example, does not distinguish between just one node being far and all nodes being far. On the other hand, the $\ell_{1}$ norm does not ‘punish’ outliers enough: there is no difference between $n$ nodes being off by $1$ from the average and one node being off by $n$.

Notation

We use $X\sim\mathcal{D}$ to denote that $X$ is distributed according to probability distribution $\mathcal{D}$. For two random variables $X$ and $Y$ we write $X\leq^{\text{st}}Y$ if $X$ is stochastically dominated by $Y$, i.e., $\mathbb{P}\left[\,X\geq x\,\right]\leq\mathbb{P}\left[\,Y\geq x\,\right]$ for all $x\in\mathbb{R}$. We use $\left\lVert\mathbf{x}\right\rVert_{2}$ to denote the $L_{2}$-norm. In the sequential model we have two random variables $N_{1}^{(t)}$ and $N_{2}^{(t)}$ for the noise of the channel at time step $t$ (recall that $N_{1}^{(t)}$ and $N_{2}^{(t)}$ are distributed according to $\aleph$). We define the following two random variables $N^{\prime(t)}$ and $N^{*(t)}$ that will play a key role in our analysis:

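$N^{\prime(t)}=\left(N_{1}^{(t)}\right)^{2}+\left(N_{2}^{(t)}\right)^{2}\quad\text{and}\quad N^{*(t)}=N_{1}^{(t)}+N_{2}^{(t)}.$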

Fact.

In the Gaussian noise model, we have $N^{*(t)}\sim\mathcal{N}(0,2\sigma^{2})$ and $N^{\prime(t)}\sim\Gamma(1,2\sigma^{2})$ , where $\Gamma(\cdot,\cdot)$ denotes the gamma distribution.

When clear from the context we simply write $N^{\prime}$ and $N^{*}$ instead of $N^{\prime(t)}$ and $N^{*(t)}$ , respectively. We use $\mathcal{F}_{t}$ to denote the filtration at time $t$ , which encapsulates all randomness up to time $t$ as well as the initial values of the nodes; hence it defines the state at time $t$ completely.
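The distributional fact for the Gaussian noise model can be illustrated with a short Monte Carlo sketch. It uses the definitions $N^{\prime}=N_{1}^{2}+N_{2}^{2}$ and $N^{*}=N_{1}+N_{2}$ recalled in Section B.1, and the standard fact that $\Gamma(1,2\sigma^{2})$ (shape $1$, scale $2\sigma^{2}$) is the exponential distribution with mean $2\sigma^{2}$. The function name, sample size and seed are illustrative assumptions.

```python
import math
import random
import statistics


def sample_noise_statistics(sigma: float, samples: int, rng: random.Random):
    """Draw independent pairs (N1, N2) ~ N(0, sigma^2) and return lists of N' and N*."""
    n_prime, n_star = [], []
    for _ in range(samples):
        n1 = rng.gauss(0.0, sigma)
        n2 = rng.gauss(0.0, sigma)
        n_prime.append(n1 * n1 + n2 * n2)  # N' = N1^2 + N2^2
        n_star.append(n1 + n2)             # N* = N1 + N2
    return n_prime, n_star


if __name__ == "__main__":
    sigma = 1.5
    n_prime, n_star = sample_noise_statistics(sigma, 100_000, random.Random(0))
    scale = 2 * sigma ** 2
    # Mean of N' versus the mean of Gamma(1, 2 sigma^2).
    print(statistics.fmean(n_prime), scale)
    # Variance of N* versus 2 sigma^2.
    print(statistics.pvariance(n_star), scale)
    # Empirical tail of N' versus the exponential tail exp(-x / (2 sigma^2)).
    x = scale
    print(sum(v > x for v in n_prime) / len(n_prime), math.exp(-x / scale))
```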

3 The Sequential Setting: Convergence towards the Running Average

Conditioning on all the randomness until time $t$, i.e., conditioning on $\mathcal{F}_{t}$, we define

$\Delta^{(t+1)}=\begin{cases}\frac{\left(x_{i}^{(t)}-x_{j}^{(t)}\right)^{2}}{2\bar{\phi}(\mathbf{x}^{(t)})}&\text{for }\bar{\phi}(\mathbf{x}^{(t)})>0\\ 1/n&\text{otherwise}\end{cases}$ , where $i$ and $j$ are the agents chosen in round $t$.

Lemma (One Step Bound).

Fix an arbitrary potential at time $t$ . Suppose the pair $i,j$ was chosen to communicate and condition on the filtration $\mathcal{F}_{t}$ (all events that happened up to round $t$ ). Then, the following holds

[TABLE]

Further we have $\mathbb{E}\left[\,\Delta^{(t+1)}~{}|~{}\mathcal{F}_{t}\,\right]=\frac{1}{n}$ .
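The identity $\mathbb{E}\left[\,\Delta^{(t+1)}~|~\mathcal{F}_{t}\,\right]=\frac{1}{n}$ can be verified exactly for any fixed state by enumerating all $n^{2}$ equally likely ordered pairs $(i,j)$. The Python sketch below does this; it assumes that $\bar{\phi}$ is the sum of squared distances to the running average, and the function name `expected_delta` is hypothetical.

```python
def expected_delta(x):
    """Exact expectation of Delta over a uniformly random ordered pair (i, j)."""
    n = len(x)
    avg = sum(x) / n
    phi_bar = sum((xi - avg) ** 2 for xi in x)
    if phi_bar == 0:
        return 1.0 / n  # Delta is defined as 1/n when the potential vanishes
    total = sum((xi - xj) ** 2 / (2 * phi_bar) for xi in x for xj in x)
    return total / (n * n)  # each ordered pair has probability 1/n^2


if __name__ == "__main__":
    state = [3.0, -1.0, 4.0, 1.5, 9.0]
    print(expected_delta(state), 1 / len(state))
```

The enumeration works because $\sum_{i,j}(x_{i}-x_{j})^{2}=2n\bar{\phi}(\mathbf{x})$, so the sum of all $\Delta$ values equals $n$.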

In order to prove the statement, we first calculate the exact expected change in one step (Section B.1). We then majorize (stochastic dominance) with the slightly more convenient statement above.

For an arbitrary time interval $\mathcal{I}$ define

[TABLE]

Note that, in the definition of $S^{*}$ , we sum up over all time steps $\tau$ in the interval $\mathcal{I}$ and we consider the pair $i$ and $j$ that is chosen in round $\tau$ (in each round a different pair $i$ and $j$ can be chosen). With Section 3 and the definitions of $S^{\prime},S^{*}$ and $S^{-}$ we can deduce the following decomposed bound on the potential for an arbitrary interval.

Proposition (Decomposition of Potential).

Fix arbitrary $t_{0},t_{1}$ and consider the interval $\mathcal{I}=(t_{0},t_{1}]$ . For $t=t_{1}-t_{0}$ we have that

[TABLE]

In the following we define smooth noise distributions. Define

[TABLE]

Using strong martingale concentration bounds (Theorem A.3 and Theorem A.4) and bounding the variance, we deduce the following upper bound on $S^{*}+S^{\prime}$ and lower bound on $S^{-}$ .

Lemma.

Let $t_{0},t_{1}$ be such that $t_{1}>t_{0}$ and consider the interval $\mathcal{I}=(t_{0},t_{1}]$ of length $t=t_{1}-t_{0}$.

(i) With probability $1-\delta$ we have

[TABLE]

(ii) For any $\gamma<1$, w.p. at least $1-\exp\left(-\frac{3\gamma^{2}t}{8n}\right)$ we have $S^{-}(\mathcal{I})\geq(1-\gamma)\frac{t}{n}.$

Our main results only hold for smooth noise distributions, which we define in the following.

Definition.

A noise distribution $\aleph$ is smooth if for all $\delta>0$ and all $t>0$ we have $m_{t,\delta}\leq\left(\frac{t}{\delta}\right)^{1/20}$ .

However, note that any (sub-)linear probability distribution and even some inverse polynomial distributions are smooth. Thus many practically relevant distributions such as Gaussian, binomial and Poisson distributions are smooth. For example, for the standard normal distribution ($N\sim\mathcal{N}(0,1)$) we have $m_{t,\delta}=\log(t/\delta)$, since in each time step the probability that $N^{2}$ exceeds $\log(t/\delta)$ is equal to the probability that $|N|$ exceeds $\sqrt{\log(t/\delta)}$, which happens w.p. at most $\delta/t$. Taking a union bound over all $t$ steps shows that it is smooth.

For smooth noise distributions we can upper bound the additive increase due to the noise.

The following proposition almost directly implies Theorem 1.1.

Proposition.

Fix any $\delta\in(0,1]$ and assume that the noise distribution is smooth. There exists a constant $c$ such that for a time step $t_{0}$ with potential $\bar{\phi}(\mathbf{x}^{(t_{0})})$ we have

[TABLE]

where $t^{*}=t_{0}+cn\ln\left(\frac{\bar{\phi}(\mathbf{x}^{(t_{0})})}{\mathbb{E}\left[\,N^{\prime}\,\right]n\delta}\right)$ and $b=2\left(1+\mathbb{E}\left[\,N^{\prime}\,\right]\right)\left(\ln(1/\delta)\right)^{9}n^{9/10}.$

Proof Sketch.

We only sketch the proof idea for a simplified setting; during the sketch we assume that $N\sim\mathcal{N}(0,1)$ (with $\mathbb{E}\left[\,N^{\prime}\,\right]=O(1)$ ) and also that $\delta$ is at least $1/n^{3}$ . The main ingredients for the proof are Section 3 and Section 3. For an interval $\mathcal{I}=(t_{0},t_{1}]$ Section 3 upper bounds the potential at time $t_{1}$ by

[TABLE]

where $t$ is the length of the interval. Section 3 lower bounds $S^{-}(\mathcal{I})$ and upper bounds the sum $S^{\prime}(\mathcal{I})+S^{*}(\mathcal{I})$ . To prove Section 3 we have to show that the initial potential $\bar{\phi}(\mathbf{x}^{(t_{0})})$ decreases to $O(n)$ after $O(n\cdot\log\bar{\phi}(\mathbf{x}^{(t_{0})}))$ time steps with probability $1-\delta$ . Optimally, we would use a single application of Section 3 to upper bound the potential as in Equation 2 and then bound the terms $S^{-}(\mathcal{I})$ and $S^{\prime}(\mathcal{I})+S^{*}(\mathcal{I})$ via Section 3. However, the bounds on $S^{-}$ and $S^{\prime}+S^{*}$ given by Section 3 are too loose to yield the desired result via a single application of Section 3 and Section 3 with the whole time interval $\mathcal{I}=[t_{0},t_{0}+O(n\log\bar{\phi}(\mathbf{x}^{(t_{0})}))]$ . For example, the bound on $S^{\prime}+S^{*}$ inherently has a term of order $\sqrt{\bar{\phi}}$ , where $\bar{\phi}$ is the potential at the start of the interval for which Section 3, (i) is applied. Thus a one shot proof as described above can never reach a potential below $\sqrt{\bar{\phi}}$ . This is not sufficient if the initial potential is large, e.g., say for $\bar{\phi}\gg n^{8/3}$ .

To circumvent this problem we apply Section 3 and Section 3 several times for smaller time intervals. In more detail, we split the proof of Section 3 into two regimes. In regime $2$ we use several phases to decrease the potential to $\Theta(n^{4/3})$. If the potential is $\bar{\phi}$ at the beginning of a phase, a single application of Section 3 and Section 3 reduces the potential to $\bar{\phi}^{3/4}$. The length of each such phase is geometrically decreasing by a factor $3/4$, where the first phase is of length $O\left(n\ln\left(\frac{\bar{\phi}(\mathbf{x}^{(t_{0})})}{n\delta}\right)\right)$. After the last phase of regime $2$ the potential is of order $n^{4/3}$.

Then, in regime $1$ the potential reduces from $\Theta(n^{4/3})$ to $O(n)$, again through several phases. If the first phase of regime $1$ starts with a potential of size $B$, the phase has length $t=O(n\ln(B))$. If there were no additive increase due to the noise, then this would reduce the potential to [math]. However, there is an additive increase of $\Theta(t)=\Theta(n\ln(B))$, which leaves us with a potential of size $O(n\ln(B))$. The next phase will therefore be of length $n\ln\ln(B)$, etc. This is repeated for $\ln^{*}(B)$ phases until the potential reduces to $O(n)$, which, as we explained in Section 1.2, is the furthest the potential can be decreased.

Putting everything together, we get that after $O\left(n\ln\left(\frac{\bar{\phi}(\mathbf{x}^{(t_{0})})}{n\delta}\right)\right)$ rounds the potential reduces to $O(n)$ . ∎

The full proof of Section 3 handles general $\mathbb{E}\left[\,N^{\prime}\,\right]$ and general $\delta$ and thus it is significantly more technical. It can be found in Section B.3. From Section 3 we are able to derive Theorem 1.1, whose proof can be found in the appendix.

4 Deviation from the Initial Average

An informal argument for the statements in this section in the special case of $\sigma=1$ can be found in [56]. Before we state our results we need the following result on the standard normal distribution.

Theorem 4.1 ([17]).

Let $\Phi(x)$ denote the cumulative distribution function of the standard normal distribution. We have for $x\geq 0$ :

[TABLE]

We can now state and prove the main results of this section.

Lemma.

For any $t$ and any $\delta<1$, we have $\varnothing^{(t)}-\varnothing^{(0)}\sim\frac{\sum_{\tau=1}^{2t}N^{(\tau)}}{2n}$ with probability at least $1-\delta$, where $N^{(\tau)}$ is the noise of the channel. In particular, for the Gaussian white noise model setting where $N\sim\mathcal{N}(0,\sigma^{2})$ we have $\sum_{\tau=1}^{2t}N^{(\tau)}\sim\mathcal{N}(0,2t\sigma^{2})$. Thus

(i) $|\varnothing^{(t)}-\varnothing^{(0)}|\leq\frac{\sigma\sqrt{t\ln(1/\delta)}}{n}$ w.p. at least $1-\delta$, and

(ii) $|\varnothing^{(t)}-\varnothing^{(0)}|\geq\frac{\sigma\sqrt{t\ln(1/\delta)}}{n}$ w.p. at least $\frac{\delta}{2\sqrt{2\ln(1/\delta)}}$.

Proof.

Note that $\mathbf{1}^{T}\mathbf{X}^{(t+1)}=\frac{N_{i}+N_{j}}{2}+\mathbf{1}^{T}\mathbf{X}^{(t)}$ , where $i$ and $j$ are the nodes scheduled in the current round and $\mathbf{1}^{T}\mathbf{X}^{(t)}=\sum_{i}x_{i}^{(t)}$ . Applying this recursively and using that all $N_{i}$ follow the same distribution we have $\mathbf{1}^{T}\mathbf{X}^{(t)}=\frac{\sum_{\tau=1}^{2t}N^{(\tau)}}{2}+\mathbf{1}^{T}\mathbf{X}^{(0)}$ . Using that $\varnothing^{(t)}=\mathbf{1}^{T}\mathbf{X}^{(t)}/n$ completes the proof of the first part.

Consider $(i)$. For a general normal distribution with mean $\mu_{x}$ and variance $\sigma^{2}_{x}$ we have that

[TABLE]

where $\Phi(x)$ denotes the cumulative distribution function of the standard normal distribution. Applying the upper bound of Theorem 4.1 and using symmetry of the normal distribution, it holds for $x=2\sigma\sqrt{t\ln(1/\delta)}$, $\sigma_{x}=\sqrt{2t\sigma^{2}}$ and $\mu_{x}=0$ that

[TABLE]

Taking a union bound yields the claim.

Consider $(ii)$. By applying the lower bound of Theorem 4.1 and using similar arguments as before, we have for $x=2\sigma\sqrt{t\ln(1/\delta)}$, $\sigma_{x}=\sqrt{2t\sigma^{2}}$, $\mu_{x}=0$ and $y=\frac{x-\mu_{x}}{\sigma_{x}}=\sqrt{2\ln(1/\delta)}\geq\sqrt{2}$

[TABLE]

Using the Berry-Esseen theorem, one can easily prove similar bounds for any distribution with bounded third moment including discrete white noise. Similarly, rounding can easily be taken care of by applying the ideas from Appendix D.
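The first part of the lemma, namely that the running average moves by exactly the accumulated channel noise divided by $2n$, can be observed directly in a simulation. The following Python sketch runs the real valued sequential protocol with Gaussian noise and compares the observed drift with the prediction. As a simplification that is an assumption of this sketch, the two agents are drawn without replacement so that the bookkeeping for the case $i=j$ is avoided; the function name and parameters are hypothetical.

```python
import math
import random


def simulate_drift(n: int, steps: int, sigma: float, rng: random.Random):
    """Run the noisy averaging protocol and return (observed drift, predicted drift)."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]  # arbitrary initial values
    initial_average = sum(x) / n
    noise_sum = 0.0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)    # two distinct agents (simplification)
        noise_i = rng.gauss(0.0, sigma)   # noise on the message received by i
        noise_j = rng.gauss(0.0, sigma)   # noise on the message received by j
        new_i = (x[i] + x[j] + noise_i) / 2
        new_j = (x[j] + x[i] + noise_j) / 2
        x[i], x[j] = new_i, new_j
        noise_sum += noise_i + noise_j
    observed = sum(x) / n - initial_average
    predicted = noise_sum / (2 * n)       # sum of the 2t noise terms over 2n
    return observed, predicted


if __name__ == "__main__":
    n, steps, sigma = 50, 5000, 1.0
    observed, predicted = simulate_drift(n, steps, sigma, random.Random(1))
    print(observed, predicted)
    # Standard deviation of the drift implied by N(0, 2 t sigma^2) / (2n).
    print(sigma * math.sqrt(steps / 2) / n)
```

Since the sum of the $2t$ noise terms has variance $2t\sigma^{2}$, the drift has standard deviation $\sigma\sqrt{t/2}/n$, which is the scale that the bounds in $(i)$ and $(ii)$ are expressed in.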

In the following we consider the running average $(\varnothing^{(t)})_{t\geq 0}$ as a martingale, allowing us to use Theorem A.3 to derive the desired concentration bounds. The following bound is weaker than the aforementioned bounds; however, it is useful whenever the noise is such that $m_{t,\delta/(2t)}$ is small.

Proposition.

For any $t\geq 2$ and any $\delta<1,$ we have $-m_{t,\delta/(2t)}\sigma\sqrt{2t}\leq\varnothing^{(t)}-\varnothing^{(0)}\leq m_{t,\delta/(2t)}\sigma\sqrt{2t}$ with probability at least $1-\delta$ .

Proof.

We start by showing that the sum of entries $(\mathbf{1}^{T}\mathbf{X}^{(t)})_{t\geq 0}$ is a Martingale

[TABLE]

By law of total expectation, summing over all choices of $i$ and $j$ , we get that $(\mathbf{1}^{T}\mathbf{X}^{(t)})_{t\geq 0}$ is a Martingale.

Since $(\mathbf{1}^{T}\mathbf{X}^{(t)})_{t\geq 0}$ is a Martingale, so is $(\varnothing^{(t)})_{t\geq 0}$ , where we used that $\varnothing^{(t)}=\mathbf{1}^{T}\mathbf{X}^{(t)}/n$ . Note that

[TABLE]

w.p. at least $1-\delta/(2t)$ per time step and hence, by a union bound, w.p. at least $1-\delta/2$ throughout the interval. By Theorem A.3, with $M=m_{t,\delta/(2t)}$, $\sigma_{i}^{2}\leq\sigma^{2}$, and $b=m_{t,\delta/(2t)}\sigma\sqrt{2t}$ we get

[TABLE]

Taking a union bound yields the r.h.s. inequality of the claim. The l.h.s. follows by using Theorem A.4 instead of Theorem A.3. ∎

5 Experimental Results

The goal of this section is twofold. First, we seek to better understand the distribution $\mathcal{D}$ of the distances $x_{i}^{(t)}-\varnothing^{(t)}$ . Second, we simulate a setting in which the range of values is bounded motivated by computational and storage limited agents. All results in this section are based on an implementation of the simple averaging dynamic. The code (python3) for the experiments can be found here [44].

5.1 The Distribution of the Distances

The experiments suggest that the distance decays at least exponentially. Note that the experiments only show a single iteration; however, this phenomenon was observable in every single run. The bound on $\mathbb{E}\left[\,\bar{\phi}\left(\mathbf{X}^{(t)}\right)\,\right]$ we obtained in Theorem 1.1 only implies that $\mathcal{D}$ is at most $O(1/d^{3})$. However, we conjecture that, for sub-Gaussian noise, $\mathbb{P}\left[\,|X_{i}^{(t)}-\varnothing^{(t)}|\geq x\,\right]=O(\exp(-x))$ (cf. Figure 1(a)). Showing this rigorously is challenging due to the dependencies among the values. Nonetheless, such bounds are very important since they immediately bound the maximum difference, and we consider this the most important open question.

5.2 The Bounded Values Setting

One of the motivations for the very simple averaging dynamic arises in the setting of limited computational power of the interacting agents. So far we assumed that agents can store and transmit (intermediate) values from an unbounded range. For many applications and in particular motivated by agents with bounded memory one would hope for similar results if there is a maximum and a minimum value that can be stored or transmitted. The formal definition is as follows: values can only be from the range $[v_{min},v_{max}]$ ( $=[1,10]$ in our experiments). We assume noise of the channel cannot produce values larger than $v_{max}$ or smaller than $v_{min}$ , which can be motivated as follows in the setting where the values correspond to amplitudes: here $v_{max}$ and $v_{min}$ are simply the amplitudes (high amplitude and no amplitude) where the signal-to-noise ratio is very large, and noise becomes negligible. An equivalent model is that the agents know the range of possible communication values, and hence, they can simply correct every value larger than $v_{max}$ to $v_{max}$ . In particular when agents only have limited storage, the communication range will often be bounded, and even rounding might become necessary (see Appendix D).

We refer to these equivalent models as the model with cutoffs. While the experiments indicate that values still converge towards the running average, there is a clear drift of the running average from the initial average if the input values are chosen unsuitably. In our experiments, we set the range of values to $[1,10]$ and use the noise described in the discrete noise model together with rounding. Initially, all agents have value $10$. We see a drastic drift of the running average (see Figure 1(b)). Even though the initial average is $10$, the running average appears to approach the midpoint of the range, i.e., 5. The histogram of distances to the initial average shows even more clearly that the values are not concentrated around the initial average. Although the experiments only show a single iteration, this phenomenon was observable in every single run. We believe that the reason for this is simply that the noise is no longer symmetric and no longer zero-mean due to the cutoffs $[1,10]$. Proving convergence to the running average in this model seems challenging and interesting.
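The model with cutoffs can be sketched in a few lines. The Python code below is a hedged illustration and not the implementation from [44]: it assumes the discrete white noise model with parameter `p`, clamps every received value to $[1,10]$, applies the rounded averaging rule, and starts all agents at $10$. All function names and the parameter choices in the usage part are hypothetical.

```python
import math
import random

V_MIN, V_MAX = 1, 10  # range of storable and transmittable values


def sample_discrete_noise(p: float, rng: random.Random) -> int:
    """Discrete white noise: 0 w.p. p, otherwise a signed geometric magnitude."""
    if rng.random() < p:
        return 0
    magnitude = 1
    while rng.random() >= p:
        magnitude += 1
    return magnitude if rng.random() < 0.5 else -magnitude


def clamp(value: int) -> int:
    """Cut a received value off at the boundaries of the allowed range."""
    return max(V_MIN, min(V_MAX, value))


def rounded_average(v: int, w: int, rng: random.Random) -> int:
    """Discrete valued update: round the average up or down w.p. 1/2 each."""
    half = (v + w) / 2
    return math.ceil(half) if rng.random() < 0.5 else math.floor(half)


def simulate_with_cutoffs(n: int, steps: int, p: float, rng: random.Random):
    """Sequential protocol with cutoffs; all agents start at V_MAX."""
    x = [V_MAX] * n
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        received_by_i = clamp(x[j] + sample_discrete_noise(p, rng))
        received_by_j = clamp(x[i] + sample_discrete_noise(p, rng))
        new_i = rounded_average(x[i], received_by_i, rng)
        new_j = rounded_average(x[j], received_by_j, rng)
        x[i], x[j] = new_i, new_j
    return x


if __name__ == "__main__":
    values = simulate_with_cutoffs(n=100, steps=20_000, p=0.5, rng=random.Random(0))
    print(sum(values) / len(values))  # running average after the run
```

At the upper boundary the clamped noise can only pull a received value downwards, which is one way to see why the effective noise is no longer zero-mean in this setting.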

We believe that the insights in bounding this potential might be useful in similar problems.

6 Conclusion and Open Problems

In this paper we showed bounds on the convergence time for the unbounded setting. Our simulations in Section 5 yield two interesting open problems: (i) study the setting where the values are restricted to some interval (in this case the noise is no longer symmetrical) and (ii) prove tail bounds on the distance distribution w.r.t. the running or initial average. Another interesting research direction is to move away from zero-mean noise and consider biased noise models: how quickly can the bias(es) be estimated and is convergence still feasible by compensating for the (learned) bias?

Appendix A Auxiliary Claims

Theorem A.1 (Weierstrass Product Inequality).

We have

(i)

[TABLE]

if $x_{i}\leq 1$ and either $w_{i}\geq 1$ for all $i$ or $w_{i}\leq 0$ for all $i$.

(ii)

[TABLE]

if $\sum_{i}w_{i}\leq 1$ , $w_{i}\in[0,1]$ and $x_{i}\leq 1$ for all $i$ .

Proof.

Consider $(i)$, which trivially holds for $w_{i}\geq 1$. Now consider $(ii)$. Taking the logarithm on both sides and treating the $w_{i}$ as probabilities, we introduce a dummy element with $x_{0}=0$, $w_{0}=1-\sum_{i\geq 1}w_{i}$ and derive

[TABLE]

where we used Jensen’s inequality. This concludes the proof.

∎

Proposition (Distribution Facts).

Let $X\sim\mathcal{N}(0,\sigma^{2})$. We have

(i) $\mathbb{E}\left[\,X^{2}\,\right]=\sigma^{2}$ and

(ii) $\operatorname{Var}\left[\,X^{2}\,\right]=2\sigma^{4}$.

Proof.

First observe that $X^{2}\sim\sigma^{2}\chi_{1}^{2}$ , where $\chi_{1}^{2}$ is the chi-squared distribution with $1$ degree of freedom. Hence, $\mathbb{E}\left[\,\chi_{1}^{2}\,\right]=1$ and $\operatorname{Var}\left[\,\chi_{1}^{2}\,\right]=2$ implying $(i)$ and $(ii)$ . ∎
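Both moments can be checked with a quick Monte Carlo experiment. The sketch below is illustrative only: the function name, the sample size and the seed are assumptions of this example, and the printed estimates fluctuate around the stated values $\sigma^{2}$ and $2\sigma^{4}$.

```python
import random
import statistics


def squared_gaussian_moments(sigma: float, samples: int, rng: random.Random):
    """Estimate E[X^2] and Var[X^2] for X ~ N(0, sigma^2) by sampling."""
    squares = [rng.gauss(0.0, sigma) ** 2 for _ in range(samples)]
    mean = statistics.fmean(squares)
    variance = statistics.pvariance(squares, mu=mean)
    return mean, variance


if __name__ == "__main__":
    sigma = 2.0
    mean, variance = squared_gaussian_moments(sigma, 200_000, random.Random(0))
    print(mean, sigma ** 2)          # estimate versus sigma^2
    print(variance, 2 * sigma ** 4)  # estimate versus 2 sigma^4
```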

We will make use of a slightly generalized version of the Hoeffding bound (see [35]).

Theorem A.2 ([35]).

Let $X=\sum_{i=1}^{m}X_{i}$ be a sum of $m$ independent random variables with $a_{i}\leq X_{i}\leq b_{i}$ for all $i$ . Then

[TABLE]

The following Theorem finds its origins in the work of [45].

Theorem A.3 ([16, Theorem 6.1]).

Let $X$ be the martingale associated with a filter $\mathcal{F}$ satisfying

(i) $\operatorname{Var}\left[\,X_{i}~|~\mathcal{F}_{i-1}\,\right]\leq\sigma_{i}^{2}$, for $1\leq i\leq m$;

(ii) $|X_{i}-X_{i-1}|\leq M$, for $1\leq i\leq m$.

Then we have

[TABLE]

Theorem A.4 ([16, Theorem 6.5]).

Let $X$ be the martingale associated with a filter $\mathcal{F}$ satisfying

(i) $\operatorname{Var}\left[\,X_{i}~|~\mathcal{F}_{i-1}\,\right]\leq\sigma_{i}^{2}$, for $1\leq i\leq m$;

(ii) $X_{i-1}-M-a_{i}\leq X_{i}$, for $1\leq i\leq m$.

Then we have

[TABLE]

Throughout this paper we will frequently make use of the fact that the sum of independent variables is a martingale.

Appendix B Missing Proofs: Sequential Setting (Section 3)

B.1 Sequential Setting: One Step Potential Change

In the following we bound the one step potential change.

Lemma.

Fix an arbitrary potential at time $t$ . Suppose the pair $i,j$ was chosen to communicate and that the coins have been flipped to determine the noise, i.e., $N_{i}=n_{i}$ and $N_{j}=n_{j}$ . Then,

[TABLE]

Proof.

In order to bound $\bar{\phi}(\mathbf{X}^{(t+1)})-\bar{\phi}(\mathbf{x}^{(t)})$ we will make use of Section 2 and analyze ${\phi}(\mathbf{X}^{(t+1)})-{\phi}(\mathbf{x}^{(t)})$ , which is slightly more convenient, since we do not need to compute the change of $\varnothing^{(t+1)}$ .

Note that besides node $i$ and node $j$ no other nodes will change their value. However, the contribution of each agent to the potential might change. Consider for $k\not\in\{i,j\}$

[TABLE]

The same holds if we substitute $n_{i}$ with $n_{j}$ and thus, for agent $k\not\in\{i,j\}$ the change in the contribution to the potential equals:

[TABLE]

We get that if in step $t$ an interaction between $i$ and $j$ happens, and $N_{i}=n_{i},N_{j}=n_{j}$ that the change of the potential is as follows:

[TABLE]

By Section 2 and via dividing the equation by $2n$ we obtain:

[TABLE]

Recall that by definition (see Section 2) $N^{\prime(t+1)}=\left(N_{1}^{(t+1)}\right)^{2}+\left(N_{2}^{(t+1)}\right)^{2}$ and $N^{*(t+1)}=N_{1}^{(t+1)}+N_{2}^{(t+1)}$, where $N_{1}^{(t+1)}$ and $N_{2}^{(t+1)}$ are the random variables that determine the noise of the communication in time step $t+1$. Conditioning on all the randomness that has happened until time $t$, i.e., conditioning on $\mathcal{F}_{t}$, we define $\Delta^{(t+1)}$ as follows:

$\Delta^{(t+1)}=\begin{cases}\frac{\left(x_{i}^{(t)}-x_{j}^{(t)}\right)^{2}}{2\bar{\phi}(\mathbf{x}^{(t)})}&\text{ for$ \bar{\phi}(\mathbf{x}^{(t)})>0 $}\\ 1/n&\text{ otherwise }\end{cases}$ , where $i$ and $j$ are the chosen agents in round $t$ .

Using the above, we can prove the following two statements.

Proof of Section 3.

The first result follows with the definition of $N^{\prime(t+1)},N^{*(t+1)}$ and $\Delta^{(t+1)}$ and with Section B.1.

Note that the term $-\frac{(n_{i}+n_{j})^{2}}{4n}$ is never positive. Hence, by Section B.1, we get that for fixed $i$ and $j$

[TABLE]

W.l.o.g. assume $\bar{\phi}(\mathbf{x}^{(t)})>0$ ; otherwise the claim follows trivially (by definition of $\Delta^{(t+1)}$ ). Taking the expectation over all choices of $i$ and $j$ :

[TABLE]

Note that in the Gaussian noise model, all $N_{i}$ follow the same law $\mathcal{N}(0,\sigma^{2})$. Thus, using that the sum of two independent Gaussians with law $\mathcal{N}(0,\sigma^{2})$ is distributed as $\mathcal{N}(0,2\sigma^{2})$, we obtain $N^{*}\stackrel{d}{=}N_{i}+N_{j}\sim\mathcal{N}(0,2\sigma^{2})$. Finally, for the sum of two squared independent Gaussians $N_{i},N_{j}\sim\mathcal{N}(0,\sigma^{2})$ we have $N^{\prime}\stackrel{d}{=}N_{i}^{2}+N_{j}^{2}\sim\Gamma(1,2\sigma^{2})$, since each squared term follows $\Gamma(1/2,2\sigma^{2})$ and the shape parameters of independent gamma variables with a common scale add up.

∎

For an arbitrary time interval $\mathcal{I}$ define

[TABLE]

Note that $i$ and $j$ in the definition of $S^{*}$ also depend on $\tau$ and are the nodes (we use nodes and agents interchangeably) that are chosen in that round.

Proof of Section 3.

The potential $\bar{\phi}(\mathbf{X}^{(t_{1})})$ is maximized, if all decreases happen at the beginning and all increases happen in the last time step.

In order to analyze the decrease due to $S^{-}$ we will make use of the Weierstrass Product Inequality (Theorem A.1) to derive

[TABLE]

B.2 Bounding $S^{\prime}$ , $S^{*}$ , and $S^{-}$

In this section we bound the terms of Section 3 separately. Therefore, for any $\delta\in[0,1]$ , any $t_{0}$ and any $t$ we define the following values:

[TABLE]

We want to emphasize that these values are not random variables and their values are not related to the actual outcome of the randomness during a run of the protocol. In words $m^{\prime}_{t,\delta}$ ( $m^{*}_{t,\delta}$ , respectively) denotes the maximum value that is reached w.p. at most $\delta$ by $N^{\prime}$ and $N^{*}$ during the interval $[t_{0},t_{0}+t]$ , respectively. Note, that the value of $m^{\prime}_{t,\delta}$ and $m^{*}_{t,\delta}$ is independent from the choice of $t_{0}$ as the noise at different time steps is independent. We will assume $m_{t,\delta}\geq 1$ throughout the proofs; we will only consider $t$ that are a function of $n$ and hence we restrict ourselves to noise functions that grow with $n$ .

From now on and throughout the proof assume that $\delta>0$ is fixed and we continue with upper bounding $S^{\prime}$ .

Lemma.

Fix arbitrary $t_{0},t_{1}$ , consider the interval $\mathcal{I}=(t_{0},t_{1}]$ and let $b^{\prime}=b_{t,\delta/2}^{\prime}$ be the value as defined in Equation 7 where $t=t_{1}-t_{0}$ . Then, w.p. at least $1-\delta$ we have

[TABLE]

Proof.

By the definition of $m^{\prime}_{t,\delta/2}$, w.p. at least $1-\delta/2$ we have $N^{\prime(t^{\prime})}\leq m^{\prime}_{t,\delta/2}$ for every $t^{\prime}\in\mathcal{I}$. Assume that this property holds throughout the interval $\mathcal{I}$.

Define $X_{t^{\prime}}=S^{\prime}((t_{0},t^{\prime}])-\mathbb{E}\left[\,S^{\prime}((t_{0},t^{\prime}])\,\right]$ and note that $\big{(}X_{t^{\prime}}\big{)}_{t_{0}<t^{\prime}\leq t_{1}}$ is a martingale. We obtain that $|X_{t^{\prime}}-X_{t^{\prime}-1}|\leq N^{\prime(t^{\prime})}/4\leq M$ and $\operatorname{Var}\left[\,X_{t^{\prime}}~{}|~{}\mathcal{F}_{t^{\prime}-1}\,\right]=\operatorname{Var}\left[\,N^{\prime}/4\,\right]=\operatorname{Var}\left[\,N^{\prime}\,\right]/16$ . Let $b^{\prime\prime}=b^{\prime}-\frac{t}{4}\mathbb{E}\left[\,N^{\prime}\,\right]$ . We have that

[TABLE]

and apply Theorem A.3 to the martingale which yields

[TABLE]

Taking a union bound with the case that $N^{\prime(t^{\prime})}$ is not smaller than $M$ for some $t^{\prime}\in\mathcal{I}$ , yields the claim. ∎

In the following we bound the first and second moments of $Z^{(t+1)}=\frac{X_{i}^{(t)}+X_{j}^{(t)}}{2}-\varnothing^{(t)}$ as well as its maximum possible value; we use this result in the proof of Section B.2.

Fact.

Fix $\mathcal{F}_{t}$ . In particular, this fixes the vector of values $\mathbf{x}^{(t)},\varnothing^{(t)}$ and $\bar{\phi}(\mathbf{x}^{(t)})$ . Define the following random variable $Z^{(t+1)}=\frac{X_{i}^{(t)}+X_{j}^{(t)}}{2}-\varnothing^{(t)}$ , where $i$ and $j$ are chosen uniformly at random. We have:

(i) $\mathbb{E}\left[\,Z^{(t+1)}~{}|~{}\mathcal{F}_{t}\,\right]=0$

(ii) $\mathbb{E}\left[\,\left(Z^{(t+1)}\right)^{2}~{}\middle|~{}\mathcal{F}_{t}\,\right]\leq\frac{\bar{\phi}(\mathbf{x}^{(t)})}{n}$

(iii) $Z^{(t+1)}\leq\sqrt{\bar{\phi}(\mathbf{x}^{(t)})}$

Proof.

Recall that we allow $i=j$ . Using this, we derive

[TABLE]

Moreover, using that $(a+b)^{2}\leq 2a^{2}+2b^{2}$ , we get

[TABLE]

The third claim is true because for any $i$ we have

[TABLE]
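The three claims of the fact can be checked numerically on a concrete vector. The sketch below enumerates all $n^{2}$ ordered pairs $(i,j)$ (with $i=j$ allowed, as in the proof) and computes the exact moments of $Z$. It assumes that $\bar{\phi}(\mathbf{x})$ is the sum of squared distances to the current average of $\mathbf{x}$; the function name is hypothetical.

```python
from itertools import product


def z_moments(x):
    """Exact mean, second moment and maximum of |Z| over uniformly random pairs (i, j)."""
    n = len(x)
    avg = sum(x) / n
    # Assumed potential: sum of squared distances to the current average.
    phi = sum((v - avg) ** 2 for v in x)
    # Z = (x_i + x_j) / 2 - avg for every ordered pair, including i == j.
    zs = [(x[i] + x[j]) / 2 - avg for i, j in product(range(n), repeat=2)]
    mean = sum(zs) / len(zs)
    second = sum(z * z for z in zs) / len(zs)
    z_max = max(abs(z) for z in zs)
    return mean, second, z_max, phi
```

For independently and uniformly drawn $i$ and $j$ the second moment evaluates to $\bar{\phi}(\mathbf{x})/(2n)$, which is consistent with the bound in (ii).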

We continue with upper bounding $S^{*}$ .

Lemma.

Fix arbitrary $t_{0},t_{1}$ , consider the interval $\mathcal{I}=(t_{0},t_{1}]$ and let $b^{*}=b_{t,\delta}^{*}$ be the value as defined in Equation 9 with $t=t_{1}-t_{0}$ . Then, w.p. at least $1-\delta$ , we have

[TABLE]

Proof.

By the definition of $m^{*}_{t,\delta/4}$ (see Equation 4) we have $N^{*{(\tau)}}\leq m^{*}_{t,\delta/4}$ w.p. $1-\delta/4$ throughout $\mathcal{I}$ . Further, by Section B.2, with probability $1-\delta/4$ we have $S^{\prime}((t_{0},t^{\prime}])\leq S^{\prime}((t_{0},t_{1}])\leq b^{\prime}_{t,\delta/4}$ for all $t^{\prime}\in(t_{0},t_{1}]$ . We assume that both properties hold; the case that they do not hold is absorbed into a union bound over all undesirable cases, which has total probability at most $\delta$ .

Recall the definition $Z^{(\tau)}=\frac{x_{i}^{(\tau-1)}+x_{j}^{(\tau-1)}}{2}-\varnothing^{(\tau-1)}$ as in Section B.2 and note that for $t^{\prime}\in[t_{0},t_{1}]$ , we have $S^{*}((t_{0},t^{\prime}])=\sum_{\tau\in(t_{0},t^{\prime}]}N^{*{(\tau)}}Z^{(\tau)}$ . For each such $t^{\prime}$ the sequence $\big{(}S^{*}((t_{0},\tau])\big{)}_{t_{0}\leq\tau\leq t^{\prime}}$ is a martingale and the goal is to apply Theorem A.3 to it.

We assume a process $P^{*}$ in which $\bar{\phi}(\mathbf{X}^{(t^{\prime})})\leq z$ for all $t^{\prime}\in[t_{0},t_{1})$ , where $z$ is defined as in (8). In this process we will bound the size of $S^{*}$ . Using this bound, we show that the original process $P$ and $P^{*}$ never diverge (with large probability) and hence the bound on $S^{*}$ we obtained in $P^{*}$ carries over to $P$ .

Consider $P^{*}$ . Using that the potential is at most $z$ and Section B.2, we get a bound of $\frac{z}{n}$ on the second moment of $Z^{(\tau)}$ (conditioned on $\mathcal{F}_{\tau-1}$ ). Using this bound, and since $Z^{(\tau)}$ and $N^{*{(\tau)}}$ are independent and both $\mathbb{E}\left[\,Z^{(\tau)}~{}|~{}\mathcal{F}_{\tau-1}\,\right]$ and $\mathbb{E}\left[\,N^{*{(\tau)}}~{}|~{}\mathcal{F}_{\tau-1}\,\right]=\mathbb{E}\left[\,N^{*{(\tau)}}\,\right]$ equal $0$ , we obtain that

[TABLE]

Define $M=m^{*}_{t,\delta/4}\sqrt{z}$ . As the potential is never above $z$ we obtain $Z^{(\tau)}\leq\sqrt{z}$ for all $\tau\in(t_{0},t]$ due to Section B.2, (iii). Due to the definition of $m^{*}_{t,\delta/4}$ this implies that

[TABLE]

throughout the interval $(t_{0},t^{\prime}]$ .

The bound on the variance and Equation 10 are sufficient to apply Theorem A.3 and we obtain

[TABLE]

where we used that $\mathbb{E}\left[\,S^{*}((t_{0},t^{\prime}])\,\right]=0$ . Let $A=\sqrt{\frac{2\ln(2t/\delta)t\mathbb{E}\left[\,\left(N^{*}\right)^{2}\,\right]}{n}}$ and $B=\frac{2\ln(2t/\delta)m^{*}_{t,\delta/4}}{3}$ . Then we have $(b^{*})^{2}=b^{*}\cdot(A+B)\cdot\sqrt{z}=b^{*}A\sqrt{z}+b^{*}B\sqrt{z}\geq A^{2}z+Bb^{*}\sqrt{z}~{},$ which yields

[TABLE]

To prove the equivalence between the processes, we need to show that the potential at step $t^{\prime}$ does not exceed $z$ ; in this proof we use $S^{*}((t_{0},t^{\prime}])\leq b^{*}$ and $S^{\prime}((t_{0},t^{\prime}])\leq b^{\prime}_{t,\delta/4}$ . The first statement holds with probability $1-\delta/(2t)$ as we just showed and we assumed $S^{\prime}((t_{0},t^{\prime}])\leq b^{\prime}_{t,\delta/4}$ to hold throughout the whole proof at the very beginning. Thus we obtain

[TABLE]

where the last inequality follows as $x\sqrt{z^{\prime}}\leq z^{\prime}$ for all $z^{\prime}\geq x^{2}$ .

The last induction step shows $\mathbb{P}\left[\,S^{*}((t_{0},t^{\prime}])\geq b^{*}\,\right]\leq\delta/(2t)$ . A union bound over the at most $t$ values of $t^{\prime}$ bounds the probability of this failure by $\delta/2$ ; combined with the error probabilities of the two assumptions at the start of the proof ( $\delta/4$ each), this yields that the result holds with probability $1-\delta$ . ∎
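The role of $A$ and $B$ in the proof above can be checked numerically. The sketch below assumes that Theorem A.3 is a Bernstein-type martingale bound of the form $\mathbb{P}[X\geq\lambda]\leq\exp\left(-\frac{\lambda^{2}}{2(V+M\lambda/3)}\right)$ , where $V$ bounds the sum of the conditional variances and $M$ bounds the increments; this form is an assumption of the example, as are the function and parameter names. With $b^{*}=(A+B)\sqrt{z}$ the resulting tail bound is at most $\delta/(2t)$ .

```python
from math import exp, log, sqrt


def s_star_tail_bound(t, n, delta, second_moment, m_star, z):
    """Return (b_star, tail bound) for the choice b_star = (A + B) * sqrt(z).

    second_moment stands for E[(N*)^2] and m_star for m*_{t, delta/4};
    both are plain inputs here (hypothetical parameter names).
    """
    log_term = log(2 * t / delta)
    a = sqrt(2 * log_term * t * second_moment / n)
    b = 2 * log_term * m_star / 3
    b_star = (a + b) * sqrt(z)
    variance_sum = t * second_moment * z / n   # t steps, each with variance <= E[(N*)^2] * z / n
    increment_bound = m_star * sqrt(z)         # M = m* * sqrt(z)
    # Assumed Bernstein-type form of Theorem A.3.
    tail = exp(-b_star ** 2 / (2 * (variance_sum + increment_bound * b_star / 3)))
    return b_star, tail
```

The inequality $(b^{*})^{2}\geq A^{2}z+Bb^{*}\sqrt{z}$ from the proof is exactly what makes the exponent at least $\ln(2t/\delta)$ .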

Now, we lower bound $S^{-}$ which is essential to obtain progress through Section 3.

Proof of Section 3, (ii).

Let $\tau\in\mathcal{I}$ . Note that $\Delta^{(\tau)}\in[0,1]$ . By Section 3, we have $\mathbb{E}\left[\,\Delta^{(\tau)}\,\right]=\frac{1}{n}$ and $\operatorname{Var}\left[\,\Delta^{(\tau)}\,\right]\leq\mathbb{E}\left[\,(\Delta^{(\tau)})^{2}\,\right]\leq\mathbb{E}\left[\,\Delta^{(\tau)}\,\right]\leq 1^{2}\cdot\frac{1}{n}=\frac{1}{n}$ , where $\mathbb{E}\left[\,(\Delta^{(\tau)})^{2}\,\right]\leq\mathbb{E}\left[\,\Delta^{(\tau)}\,\right]$ follows since $\Delta^{(\tau)}\in[0,1]$ implies that each element of the sum is smaller. By Theorem A.4 with $M=1,a_{i}=0$ for all $i$ , with $b=\frac{\gamma t}{n}$ , we get

[TABLE]

Proof of Section 3 (i).

In order to derive the result, we use the bounds on $S^{\prime}$ (Section B.2) and on $S^{*}$ (Section B.2). Section B.2 applied with $\delta/4$ , for $t\geq n$ and $n$ large enough, yields with probability $1-\delta/4$ :

Roughly upper bounding the terms yields

[TABLE]

Furthermore, we have, by the definition of $z$ (Equation 8),

[TABLE]

where we used that $\sqrt{\operatorname{Var}\left[\,N^{\prime}\,\right]/8}\leq\sqrt{n}\leq\sqrt{t}$ . Note that $\sqrt{\mathbb{E}\left[\,(N^{*})^{2}\,\right]}=\sqrt{\mathbb{E}\left[\,N^{\prime}\,\right]}\leq 1+\mathbb{E}\left[\,N^{\prime}\,\right]$ . Thus, putting everything together yields

[TABLE]

Applying Section B.2 with $\delta/4$ yields that with probability $1-\delta/4$ we have

[TABLE]

Combining the bound on $S^{\prime}(\mathcal{I})$ and $S^{*}(\mathcal{I})$ yields the claim with probability $1-\delta$ .

∎

B.3 Proof of Section 3 and Theorem 1.1

In this section we prove Section 3. A proof sketch can be found in Section 3. Using Section 3 and Section 3, our main theorem, Theorem 1.1, follows almost immediately.

Proof of Section 3.

We distinguish between regimes based on the value of $\bar{\phi}=\bar{\phi}(\mathbf{X}^{(t)})$ for thresholds that we define later. We have two regimes: regime $(2)$ starts at time $t_{0}$ and ends when the potential is below a threshold $b_{2}(1)$ , which marks the start of regime $(1)$ ; note that in order to simplify indices in the calculations, we define regimes and phases (within the regimes) backwards, starting with large numbers and then reducing. We divide each regime into phases. The phases of regime $(2)$ are such that the $i$ ’th phase starts when $\bar{\phi}\in[b_{2}(i),b_{2}(i+1)]$ . Phases in regime $(2)$ are also counted backwards starting from phase $i_{max}=\min\{\{i:\bar{\phi}(\mathbf{x}^{(t_{0})})\geq b_{2}(i)\}\cup\{0\}\}$ until we reach phase [math]. We use $\tau_{i}^{\iota}$ , $\iota\in\{(1),(2)\}$ , to denote the start of the $i$ ’th phase of regime $\iota$ ; the first phase starts at $\tau^{(2)}_{i_{max}}$ .

Let $\gamma^{*}=1-e$ , $c^{*}=\frac{8}{3\gamma^{*}}\leq 0.91$ . We define the boundaries $b_{1}$ and $b_{2}$ (which will be used to guide the potential decrease) as follows. Let $c^{**}$ be a large enough constant and $\varepsilon=1/20$ .

[TABLE]

Regime 2

Consider phase $i$ , that is, the potential $\bar{\phi}=\bar{\phi}(\mathbf{x}^{(\tau^{(2)}_{i})})$ at the start of the phase is in the interval $[b_{2}(i),b_{2}(i+1)]$ . Let $t_{i}=100n\ln\left(\frac{b_{2}(i+1)}{\delta}\right)$ .

In the following we bound the increase due to $S^{*}+S^{-}$ after $t_{i}$ steps. By Section 3, (i), after $t_{i}$ time steps each of the following bounds holds w.p. at least $1-\frac{\delta}{b_{2}(i+1)}$ .

[TABLE]

In the following we bound the terms of (12). First we obtain

[TABLE]

Here, $*_{1}$ follows because $\ln(x)\leq x^{\varepsilon}/400$ for large enough $x$ and $*_{2}$ follows because $100\cdot\frac{n\mathbb{E}\left[\,N^{\prime}\,\right]}{\delta^{\varepsilon}}\leq b_{2}^{1-\varepsilon}(i)$ .

We continue by bounding the terms in front of the square-root of (12). To do so we consider the factors separately. Due to $\sqrt{x}\leq x$ and $\ln(x\cdot y)\leq\ln x\cdot\ln y$ for large enough $x,y>0$ we have

[TABLE]

Due to the smoothness of the noise distribution, we have

[TABLE]

Moreover, again using $\ln(x\cdot y)\leq\ln x\cdot\ln y$ for large enough $x,y$ , we can bound

[TABLE]

Putting everything together and using that $b_{2}(i+1)=b^{4/3}_{2}(i)$ , we obtain

[TABLE]

where $*_{1}$ follows because $\ln(x)\leq x^{2/15}/10^{6}$ for large enough $x$ .

Plugging (B.3) and (B.3) into (12) yields

[TABLE]

Finally, by Section 3, (ii) , $S^{-}\geq(1-\gamma^{*})\frac{t_{i}}{n}\geq\ln(b_{2}(i+1))$ and since $\bar{\phi}(\mathbf{X}^{(\tau^{(2)}_{i})})\leq b_{2}(i+1)$ we have

[TABLE]

By Section 3, Section B.3 and Equation 16 we obtain

[TABLE]

The numbers of time steps $t_{i}$ of the phases in regime $(2)$ form a geometric series, so their sum $\sum_{i}t_{i}$ is dominated by the length of the first phase; that is, regime $(2)$ takes at most

[TABLE]

time steps. The probability of success of each phase also forms a geometric series and by a union bound over all phases, the probability of failure is $\sum_{i}\frac{3\delta}{b_{2}(i+1)}\leq\frac{\delta}{3}$ .
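To see why the first (longest) phase dominates, the following sketch lists the phase lengths $t_{i}=100n\ln\left(b_{2}(i+1)/\delta\right)$ under the recursion $b_{2}(i+1)=b_{2}^{4/3}(i)$ . It works with logarithms to avoid overflow. The concrete value of $b_{2}(1)$ is defined in a display that is not reproduced here, so $\ln b_{2}(1)$ and the logarithm of the initial potential are treated as free, hypothetical inputs.

```python
from math import log


def regime2_phase_lengths(n, delta, log_b2_1, log_phi0):
    """Phase lengths t_i of regime (2), longest phase (i_max) first.

    log_b2_1 = ln b_2(1) and log_phi0 = ln of the initial potential are
    hypothetical inputs; a phase i exists while b_2(i) <= initial potential.
    """
    lengths = []
    log_b_i = log_b2_1                       # ln b_2(1)
    while log_b_i <= log_phi0:
        log_b_next = 4.0 / 3.0 * log_b_i     # ln b_2(i+1) = (4/3) * ln b_2(i)
        # t_i = 100 * n * ln(b_2(i+1) / delta)
        lengths.append(100 * n * (log_b_next - log(delta)))
        log_b_i = log_b_next
    return lengths[::-1]
```

In this sketch the sum stays within a constant factor of the longest phase as long as $\ln(1/\delta)$ does not exceed $\ln b_{2}(1)$ ; the additive $\ln(1/\delta)$ term per phase is the only part that is not geometric.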

Regime 1

In this regime we have $b_{1}\leq\bar{\phi}<b_{2}(1)$ . Here we define the phases informally to avoid an overload of notation involving the tower-function. Instead of reducing $\phi$ to $\phi^{3/4}$ as in a phase in the regime above, here, in a single phase, we reduce the potential from $\phi$ to $f\ln(\phi)$ and then from $f\ln(\phi)$ to $f\ln(f\ln(\phi))$ etc., where $f$ is such that this recursion forms a geometric series. We stop once the potential is smaller than $b_{1}$ .
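The informal recursion $\phi\mapsto f\ln(\phi)$ collapses the potential very quickly, which is why only few phases are needed. The sketch below iterates this idealized update until the potential drops below $b_{1}$ . It is a deterministic caricature of the phases: the actual proof controls the potential only with high probability, and the values of $f$ and $b_{1}$ used here are hypothetical inputs, as is the cap on the number of phases.

```python
from math import log


def regime1_potentials(phi, f, b1, max_phases=64):
    """Potentials at the start of each phase under the idealized update phi -> f * ln(phi).

    Requires phi > 1. The loop stops once the potential is below b1, or after
    max_phases updates (a safety cap for inputs where f * ln(phi) never drops below b1).
    """
    potentials = [phi]
    for _ in range(max_phases):
        if phi < b1:
            break
        phi = f * log(phi)
        potentials.append(phi)
    return potentials
```

For example, with $f=1000$ and $b_{1}=20000$ a potential of $10^{30}$ falls below $b_{1}$ after two updates.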

From now on fix a phase in regime $(1)$ and assume that the potential is of value $\bar{\phi}$ at the start of the phase. Let $t_{\bar{\phi}}=c^{*}n\ln\left(c^{**}\frac{\bar{\phi}}{\mathbb{E}\left[\,N^{\prime}\,\right]n\delta}\right)$ , where $c^{**}$ is some large enough constant. If $\bar{\phi}\gg b_{1}$ the length of a phase is $t_{\bar{\phi}}$ and once $\bar{\phi}$ is close to $b_{1}$ the length of a phase is $c^{*}n\ln(c^{**}/\delta)$ , more formally the length of a phase is $t^{\prime}=\max\{t_{\bar{\phi}},c^{*}n\ln(c^{**}/\delta)\}$ . By Section 3, (i), after $t^{\prime}$ time steps we have,

[TABLE]

In the following we bound the second term of (17). Since we are in regime $(1)$ we have $\bar{\phi}\leq b_{2}(1)$ and we can deduce that $t^{\prime}=O\left(n\ln\left(\frac{b_{2}(1)}{\delta}\right)\right)\leq b_{2}(1)$ and $9{t^{\prime}}\mathbb{E}\left[\,N^{\prime}\,\right]+2\leq b_{2}(1)$ . We obtain

[TABLE]

To upper bound this term by $b_{1}$ we observe that the polynomial dependence on $n$ is $n^{9/10}$ in $b_{1}$ and only $n^{5/6}$ in the term $\frac{b_{2}(1)}{\sqrt{n}}$ . The remaining factors do not affect this comparison, and we can bound (18) by $b_{1}$ . Thus,

[TABLE]

By Section 3 and using the lower bound on $S^{-}$ from Section 3, (ii) we obtain

[TABLE]

Now there are two cases. If $b_{1}\leq n\mathbb{E}\left[\,N^{\prime}\,\right]\frac{\ln(\bar{\phi})}{2}$ , then $\bar{\phi}(\mathbf{X}^{(\tau^{(1)}_{\bar{\phi}}+t^{\prime})})\leq n\mathbb{E}\left[\,N^{\prime}\,\right]\ln(\bar{\phi})$ and we continue with the next phase. On the other hand, if $b_{1}>n\mathbb{E}\left[\,N^{\prime}\,\right]\frac{\ln(\bar{\phi})}{2}$ , then we have $\bar{\phi}(\mathbf{X}^{(\tau^{(1)}_{\bar{\phi}}+t^{\prime})})\leq 2b_{1}$ and we are done.

We now calculate the success probability as well as the total length of regime $(1)$ . In the last run we set the error parameter $\delta$ to $\delta/20$ . In the $i$ ’th run before the last run the error parameter is set to $\frac{1}{2^{i}}\delta/20$ . The total error therefore sums up to at most $\frac{\delta}{20}\sum_{i\geq 0}2^{-i}=\delta/10$ . Thus with probability at least $1-\delta/3$ the potential decreases to $2b_{1}$ .

To analyze the runtime of regime $(1)$ , first consider the case that regime $(2)$ is executed before regime $(1)$ because the initial potential $\bar{\phi}(\mathbf{x}^{(t_{0})})$ was larger than $b_{2}(1)$ . Then we have $\log^{*}b_{2}(1)$ phases and the longest phase has length $O\left(n\ln\left(b_{2}(1)/\delta\right)\right)$ . Thus one can immediately bound the runtime of regime $(1)$ by $O\left(\log^{*}b_{2}(1)\cdot n\ln\left(b_{2}(1)/\delta\right)\right)$ . As the length of the phases (ignoring the factor of $n$ ) decreases more than geometrically, a tighter analysis shows that the runtime of all phases can be bounded by $O\left(n\ln\left(b_{2}(1)/\delta\right)\right)$ . In the other case, regime $(2)$ is not executed before regime $(1)$ ; then the initial potential $\bar{\phi}(\mathbf{x}^{(t_{0})})$ is smaller than $b_{2}(1)$ and we can replace all occurrences of $b_{2}(1)$ in the runtime analysis with $\bar{\phi}(\mathbf{x}^{(t_{0})})$ .

Combining Regimes and Phases

Taking a union bound over all errors in all phases in both regimes gives an error probability of at most $\delta$ . Note that regime 1 takes at most $O\left(n\ln\left(\frac{\bar{\phi}(\mathbf{x}^{(t_{0})})}{\mathbb{E}\left[\,N^{\prime}\,\right]n\delta}\right)\right)$ rounds. If regime 2 is necessary, then $\bar{\phi}(\mathbf{x}^{(t_{0})})\geq b_{2}(1)\geq(n\mathbb{E}\left[\,N^{\prime}\,\right])^{1.1}$ and hence we can bound the number of rounds in regime 1 by

[TABLE]

Summing over both regimes yields the claim. ∎

We are ready to prove the first main theorem.

Proof of Theorem 1.1.

Proof of (i). First observe that if $t_{0}=0$ and $t$ were to coincide with time $t^{*}$ as in Section 3 then the proposition immediately yields the result. Otherwise we apply Section 3 with an initial potential $\tilde{\phi}$ that is larger than $\bar{\phi}(\mathbf{X}^{(t_{0})})$ . $\tilde{\phi}$ is chosen such that $t^{*}$ in Section 3 equals the $t$ in Theorem 1.1. Choosing a potential larger than the actual one does not harm the correctness of Section 3, as the proof never uses the exact potential but only upper bounds on it.

Proof of (iii). The lower bound on the expected size ( $\mathbb{E}\left[\,\bar{\phi}(\mathbf{X}^{(t)})\,\right]=\omega(\sigma^{2}n)$ ) follows from the following argument. By Section B.1, summing over all pairs of nodes, we get

[TABLE]

Taking expectations on both sides and applying this recursively implies $\mathbb{E}\left[\,\bar{\phi}(\mathbf{X}^{(t)})\,\right]\geq(1-1/n)^{t}\bar{\phi}(\mathbf{x}^{(0)})$ . Thus choosing $t=o\left(n\ln\left(\frac{\bar{\phi}(\mathbf{x}^{(0)})}{\sigma^{2}n}\right)\right)$ yields $\mathbb{E}\left[\,\bar{\phi}(\mathbf{X}^{(t)})\,\right]=\omega(\sigma^{2}n)$ .
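The lower bound $(1-1/n)^{t}\bar{\phi}(\mathbf{x}^{(0)})$ is easy to evaluate numerically. The sketch below computes the bound and the largest number of steps for which it still exceeds a given target, for instance a target of the order $\sigma^{2}n$ . Function names and the example target are assumptions made for illustration.

```python
from math import log


def expected_potential_lower_bound(phi0, n, t):
    """Lower bound (1 - 1/n)^t * phi0 on the expected potential after t steps."""
    return (1.0 - 1.0 / n) ** t * phi0


def steps_while_bound_exceeds(phi0, n, target):
    """Largest integer t with (1 - 1/n)^t * phi0 >= target (0 if phi0 <= target)."""
    if phi0 <= target:
        return 0
    # Solve (1 - 1/n)^t = target / phi0 for t and round down.
    return int(log(phi0 / target) / -log(1.0 - 1.0 / n))
```

The returned number of steps is roughly $n\ln\left(\bar{\phi}(\mathbf{x}^{(0)})/\text{target}\right)$ , which matches the order of the convergence time discussed above.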

Proof of (ii). Fix an arbitrary potential at round $t^{\prime}$ and consider the next $n$ iterations. W.l.o.g. there is a constant fraction of the nodes with a value greater than or equal to the running average at time $t^{\prime}$ ; otherwise there is such a fraction of nodes that have a value strictly smaller than the running average, in which case the proof is symmetric. Let $S$ be the set of these nodes. Order the nodes of $S$ according to their value in decreasing order (ties broken arbitrarily). Assign the first $\left\lfloor|S|/2\right\rfloor$ nodes to $S_{1}$ and the remaining $\left\lceil|S|/2\right\rceil$ nodes to $S_{2}$ . W.h.p. there is, for each $i\in\{1,2\}$ , a set $S_{i}^{\prime}$ of size linear in $n$ consisting of nodes of $S_{i}$ that are chosen exactly once to exchange with another node of $S_{i}$ during the last $n$ steps and were not part of any other exchange during these $n$ steps. Now consider the exchange of two nodes that belong to $S_{1}^{\prime}$ : with constant probability, the node with the initially lower value has, after averaging, a value that is larger by $\Omega(\sigma)$ than before. Similarly, consider the exchange of two nodes that belong to $S_{2}^{\prime}$ : with constant probability, the node with the initially higher value has, after averaging, a value that is smaller by $\Omega(\sigma)$ than before.

Hence, by definition of the running average, irrespective of value of the running average at time $t$ , the potential is of size $\Omega(n\cdot\sigma^{2})$ (due to the nodes of $S_{1}^{\prime}$ or due to the nodes of $S_{2}^{\prime}$ ).

∎

Appendix C Synchronous Model

In this section we consider the synchronous model and show that, up to scaling time by a factor of $n/2$ , it behaves almost the same as the sequential model. In order to avoid confusion, we introduce for every variable $V$ in the sequential model the synchronous/parallel counterpart ${}^{\parallel}V$ to emphasize the different model and the slightly different notation. The following two lemmas are the counterparts of Section 3 and Section 3. These two lemmas encapsulate the essential difference between both models.

Lemma (Synchronous Setting).

There exist random variables $N^{*}$ , $N^{\prime}$ , and ${}^{\parallel}\Delta^{(t+1)}$ s.t.

[TABLE]

where

[TABLE]

In particular, in the Gaussian noise model, we have $N^{*}\sim\mathcal{N}(0,2\sigma^{2})$ and $N^{\prime}\sim\Gamma(1,2\sigma^{2})$ , where $\Gamma(\cdot,\cdot)$ denotes the gamma distribution.

This follows almost immediately from the sequential counterpart. In order to calculate the expectation, we can simply use linearity of expectation and multiply the sequential bound by a factor of $n/2$ .

Proposition (Synchronous Setting).

Consider the interval $\mathcal{I}=(t_{0},t_{1}]$ . Let ${}^{\parallel}S^{\prime}=\sum_{\tau\in\mathcal{I}}\sum_{i=1}^{n}N^{\prime(\tau)}_{i}/4$ , let ${}^{\parallel}S^{*}=\sum_{\tau\in\mathcal{I}}N^{*{(\tau)}}\left(x_{i}^{(\tau)}-\varnothing^{(\tau)}\right)$ and let ${}^{\parallel}S^{-}=\sum_{\tau\in\mathcal{I}}^{\parallel}\Delta^{(\tau)}$ . We have that

[TABLE]

The rest of the analysis is a straightforward adaptation of the sequential setting with time being scaled by a factor of $n/2$ .

Appendix D The Influence of Rounding

The rounding can be implemented as follows, assuming that the noise $N\sim\aleph$ takes only integer values. After a node $i$ receives the value from node $j$ , the node averages it as before and then rounds up or down with equal probability. In symbols,

[TABLE]

where $N\sim\aleph$ is the integer valued channel noise. Equivalently, we can write

[TABLE]

where $R$ is the random variable satisfying

[TABLE]

Regardless of the current state, it holds that $\mathbb{E}\left[\,R~{}|~{}\mathcal{F}_{t}\,\right]=0$ and $\operatorname{Var}\left[\,R~{}|~{}\mathcal{F}_{t}\,\right]=\mathbb{E}\left[\,R^{2}~{}|~{}\mathcal{F}_{t}\,\right]\leq 1$ . Thus we obtain that $\mathbb{E}\left[\,(N+R)^{2}\,\right]=\mathbb{E}\left[\,N^{2}+2NR+R^{2}\,\right]\leq\mathbb{E}\left[\,N^{2}\,\right]+1$ . If we substitute $N$ with $N+R$ in all proofs we obtain essentially the same results; the variance in the statements only increases by $1$ .
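The rounding step described above can be sketched as follows, assuming integer node values and integer noise, so that the sum is an integer and the average is either an integer or a half-integer. The function name and signature are hypothetical; the update rule itself is the one given in the display above.

```python
import random


def rounded_average(x_i, x_j, noise, rng):
    """Average x_i with the noisy received value x_j + noise, then round randomly.

    All inputs are integers. If the sum is odd, the half-integer average is
    rounded up or down with equal probability, so the rounding error R is
    +1/2 or -1/2; if the sum is even, R = 0.
    """
    total = x_i + x_j + noise
    if total % 2 == 0:
        return total // 2
    # total // 2 is the floor (also for negative totals); add 0 or 1 with probability 1/2 each.
    return total // 2 + rng.randint(0, 1)
```

Under these assumptions $R$ has mean zero and $R^{2}\leq 1/4$ , which is consistent with the bound $\mathbb{E}\left[\,R^{2}~{}|~{}\mathcal{F}_{t}\,\right]\leq 1$ used above.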

