Local non-Bayesian social learning with stubborn agents

Daniel Vial; Vijay Subramanian

arXiv:1904.12767·cs.SI·September 21, 2022


TL;DR

This paper investigates how stubborn agents spreading false information can disrupt social learning, revealing that correct beliefs formed early on can be overwritten over time, and devises strategies for seeding stubborn agents that disrupt learning more effectively than intuitive heuristics.

Contribution

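It introduces a non-Bayesian social learning model with stubborn agents, analyzes how misinformation overtakes correct beliefs as the learning horizon grows, and uses this analysis to design bot-seeding strategies that expose vulnerabilities relevant to countering fake news.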

Findings

1. Agents learn the true state initially but forget it over time.

2. Seeding stubborn agents can effectively disrupt correct learning.

3. The proposed seeding strategies disrupt learning more than intuitive heuristics.

Abstract

We study a social learning model in which agents iteratively update their beliefs about the true state of the world using private signals and the beliefs of other agents in a non-Bayesian manner. Some agents are stubborn, meaning they attempt to convince others of an erroneous true state (modeling fake news). We show that while agents learn the true state on short timescales, they "forget" it and believe the erroneous state to be true on longer timescales. Using these results, we devise strategies for seeding stubborn agents so as to disrupt learning, which outperform intuitive heuristics and give novel insights regarding vulnerabilities in social learning.

Tables

Table 1: Dataset details

Name	Description	Nodes	Edges
Gnutella	Peer-to-peer network	6,301	20,777
Wiki-Vote	Wiki admin elections	7,115	103,689
Pokec	Slovakian social network	1,632,803	30,622,564
LiveJournal	Blogging platform	4,847,571	68,993,773

Equations

$$\theta_{T_{n}}(i^{*})\xrightarrow[n\rightarrow\infty]{\mathbb{P}}\begin{cases}\theta,&\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=0\\ 0,&\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=\infty\end{cases}.$$

$$\mu_{t}(i)(\mathcal{A})\propto\int_{x\in\mathcal{A}}x^{\alpha_{t}(i)-1}(1-x)^{\beta_{t}(i)-1}\,dx.$$

$$\alpha_{t}(i)=(1-\eta)\big(\alpha_{t-1}(i)+s_{t}(i)\big)+\sum_{j\in N_{in}(i)}\frac{\eta\,\alpha_{t-1}(j)}{d_{in}(i)},$$

$$\beta_{t}(i)=(1-\eta)\big(\beta_{t-1}(i)+1-s_{t}(i)\big)+\sum_{j\in N_{in}(i)}\frac{\eta\,\beta_{t-1}(j)}{d_{in}(i)},$$

$$\alpha_{t}(i)=0,\quad\beta_{t}(i)=\bar{\beta}+(1-\eta)t\quad\forall\ t\in[T].$$

$$W_{p}(\mu,\nu)=\inf_{(X,Y):\,X\sim\mu,\,Y\sim\nu}\big(\mathbb{E}|X-Y|^{p}\big)^{1/p}$$

$$d_{out}(i),d_{in}^{A}(i)\in\mathbb{N},\quad d_{in}^{B}(i)\in\mathbb{N}_{0}\quad\forall\ i\in A,$$

$$\sum_{i\in A}d_{out}(i)=\sum_{i\in A}d_{in}^{A}(i).$$

$$d(i)=\big(d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\big).$$

$$\sum_{j\in A\cup B}\frac{\eta\,\big|\{j^{\prime}\rightarrow i^{\prime}\in E:j^{\prime}=j,\,i^{\prime}=i\}\big|\,\alpha_{t-1}(j)}{d_{in}(i)},$$

$f_{n}^{*}(i,j,k)$

$f_{n}(i,j,k)$

$\tilde{p}_{n}^{*}$

$\tilde{p}_{n}$

$\tilde{q}_{n}$

$\Omega_{n,1}$

$$\cdots\cap\Big\{\Big|\frac{\sum_{i=1}^{n}d_{out}(i)^{2}}{n}-\nu_{2}\Big|<n^{-\gamma}\Big\}$$

$$\cap\Big\{\Big|\frac{\sum_{i=1}^{n}d_{out}(i)d_{in}^{A}(i)}{n}-\nu_{3}\Big|<n^{-\gamma}\Big\}.$$

$$\Omega_{n,2}=\big\{|p_{n}-\tilde{p}_{n}|<\delta_{n},\ \tilde{p}_{n}^{*}\geq\tilde{p}_{n},\ \tilde{q}_{n}<1-\xi\big\}.$$

$$\theta_{T_{n}}(i^{*})\xrightarrow[n\rightarrow\infty]{\mathbb{P}}\begin{cases}\theta,&T_{n}(1-p_{n})\rightarrow 0\\ \frac{\theta(1-e^{-c\eta})}{c\eta},&T_{n}(1-p_{n})\rightarrow c\in(0,\infty)\\ 0,&T_{n}(1-p_{n})\rightarrow\infty\end{cases}.$$

$$\frac{d_{in}^{A}(X_{l})}{d_{in}^{A}(X_{l})+d_{in}^{B}(X_{l})}.$$

$$\sum_{a\in A}\frac{d_{in}^{A}(a)}{d_{in}^{A}(a)+d_{in}^{B}(a)}\frac{d_{out}(a)}{\sum_{a^{\prime}\in A}d_{out}(a^{\prime})}=\tilde{p}_{n}.$$

$$\tilde{p}_{n}^{T_{n}}\approx p_{n}^{T_{n}}=\Big(1-\frac{T_{n}(1-p_{n})}{T_{n}}\Big)^{T_{n}}\approx e^{-\lim_{n\rightarrow\infty}T_{n}(1-p_{n})}.$$

$$\sum_{a\in A}\frac{d_{in}^{A}(a)}{\big(d_{in}^{A}(a)+d_{in}^{B}(a)\big)^{2}}\frac{d_{out}(a)}{\sum_{a^{\prime}\in A}d_{out}(a^{\prime})}=\tilde{q}_{n}.$$

$$\frac{\nu_{3}}{\nu_{1}}\approx\sum_{i=1}^{n}\frac{d_{out}(i)}{\sum_{i^{\prime}=1}^{n}d_{out}(i^{\prime})}d_{in}^{A}(i)\geq 1,$$

$$(\nu_{3}/\nu_{1})^{T_{n}}=O\big((\nu_{3}/\nu_{1})^{\zeta\log_{\nu_{3}/\nu_{1}}(n)}\big)=O(n^{\zeta})=o(n).$$

$$\tilde{p}_{n}(d)=\sum_{i=1}^{n}\frac{d_{in}^{A}(i)}{d_{in}^{A}(i)+d(i)}\frac{d_{out}(i)}{m_{n}}\quad\forall\ d\in\mathbb{N}_{0}^{n},$$

$$\min_{d\in\mathbb{N}_{0}^{n}}\tilde{p}_{n}(d)\quad\text{s.t.}\quad\sum_{i=1}^{n}d(i)\leq b_{n}.$$

$$\hat{p}_{n}(d)=\begin{cases}\tilde{p}_{n}(d),&d\in dom(\hat{p}_{n})\\ \infty,&\textrm{otherwise}\end{cases}.$$

$$\min_{d\in\mathbb{R}_{+}^{n}}\tilde{p}_{n}(d)\quad\text{s.t.}\quad\sum_{i=1}^{n}d(i)\leq b_{n},$$

$$d_{n}^{rel}(i)=d_{in}^{A}(i)\Big(\frac{r(i)}{h^{*}}-1\Big)_{+}\quad\forall\ i\in[n],$$

$$h(x)=\frac{\sum_{i\in[n]:\,r(i)\geq x^{2}}d_{out}(i)d_{in}^{A}(i)}{b_{n}+\sum_{i\in[n]:\,r(i)\geq x^{2}}d_{in}^{A}(i)}\quad\forall\ x\in\mathbb{R}_{+}.$$

$$\frac{d_{in}^{A}(i)}{d_{in}^{A}(i)+d_{n}^{rel}(i)}=\frac{d_{in}^{A}(j)}{d_{in}^{A}(j)+d_{n}^{rel}(j)}.$$

$$d_{n}^{opt}\in\arg\min_{d\in\mathbb{N}_{0}^{n}:\,\sum_{i=1}^{n}d(i)\leq b_{n}}\tilde{p}_{n}(d),$$


Full text


Vijay Subramanian, Senior Member, IEEE. We are grateful for financial support from the NSF (grants EPCN:1603861 and CIF:AF:2008130) and LGE Inc. via Mcity. D. Vial is with the University of Texas, Austin, TX (email: [email protected]). V. Subramanian is with the University of Michigan, Ann Arbor, MI (email: [email protected]).


1 Introduction

With the rise of social networks, people increasingly receive news through non-traditional sources. One recent study shows that two-thirds of American adults have gotten news through social media [1]. Such news sources are fundamentally different from traditional ones like print media and television, in the sense that social media users read and discuss news on the same platform. As a consequence, users turning to these platforms for news receive information not only from major publications but from other users as well; in the words of [2], a user “with no track record or reputation can in some cases reach as many readers as Fox News, CNN, or the New York Times.” This phenomenon famously reared its head during the 2016 United States presidential election, when fake news stories were shared tens of millions of times [2], and it has remained a critical issue in 2020 [3].

In this paper, we study a mathematical model describing this situation. The model includes a set of agents attempting to learn the true state of the world (e.g. which of two candidates is better suited for office). Each agent iteratively updates its belief (i.e. its distribution over possible states) in a manner similar to the non-Bayesian social learning model of [4] using information from three sources. First, each agent receives noisy observations of the true state, modeling e.g. news stories. Second, each agent observes the beliefs of a subset of other agents, modeling e.g. discussions with other social media users. Third, each agent may observe the beliefs of stubborn agents or bots who aim to persuade others of an erroneous true state, modeling e.g. users spreading fake news. (The term stubborn agents has been used in the literature to describe such agents; the term bots refers to automated social media accounts that spread fake news while masquerading as real users [5].) This process continues iteratively until a finite learning horizon.

Under this model, two competing forces emerge as the learning horizon grows. On the one hand, agents receive more observations of the true state, which help them learn. On the other hand, the beliefs of the bots gradually propagate through the system, suggesting that agents become increasingly exposed to bots and thus less likely to learn. Hence, while the horizon clearly affects the learning outcome, the nature of this effect – namely, whether learning becomes more or less likely as the horizon grows – is less clear.

This effect of the learning horizon has often been ignored in works with models similar to ours. For example, our model is nearly identical to that in the empirical work [6], in which the authors show that polarized opinions can arise when there are two types of bots with diametrically opposed viewpoints. However, the experiments in [6] simply fix a large learning horizon and do not consider the effect of varying it. Models similar to ours have also been treated analytically in e.g. [4, 7, 8, 9], but these works consider infinite horizons and/or cooperative settings (i.e. no stubborn agents). See Section 5 for details on these (and other) works.

In the first part of the paper (see Section 3), we argue that the learning horizon plays a prominent role when stubborn agents are present and should not be ignored. In particular, we show that the learning outcome depends on the relationship between the horizon $T_{n}$ and a quantity $p_{n}$ that describes the “density” of bots in the system, where both quantities may vary with the number of agents $n$. Mathematically, letting $\theta\in(0,1)$ denote the true state and $\theta_{T_{n}}(i^{*})$ the mean of the belief (hereafter, the estimate) for a uniformly random agent $i^{*}$ at the horizon $T_{n}$, we show the following (see Theorem 1, which also addresses the case $\lim_{n\rightarrow\infty}T_{n}(1-p_{n})\in(0,\infty)$):

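$$\theta_{T_{n}}(i^{*})\xrightarrow[n\rightarrow\infty]{\mathbb{P}}\begin{cases}\theta,&\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=0\\ 0,&\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=\infty\end{cases}\qquad(1)$$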

Here $p_{n}$ is smaller when more bots are present and 0 is the erroneous true state promoted by the bots. Hence, in words, (1) says the following: if there are sufficiently few bots, in the sense that $T_{n}(1-p_{n})\rightarrow 0$ , $i^{*}$ learns the true state; if there are sufficiently many bots, in the sense that $T_{n}(1-p_{n})\rightarrow\infty$ , $i^{*}$ adopts the extreme estimate 0 promoted by the bots. Additionally, we show the belief of $i^{*}$ converges to a Dirac measure in a certain sense (see Corollary 1).

We note the result in (1) assumes a particular random graph model for the social network connecting agents and bots (a modification of the so-called directed configuration model). For such models, phase transitions – wherein small changes to model parameters lead to starkly different behaviors – are often observed. In this case, assuming $T_{n}=(1-p_{n})^{-k}$ for some $k>0$, and also assuming $p_{n}\rightarrow 1$, the learning outcome suddenly drops from $\theta$ to $0$ as $k$ changes from e.g. $0.99$ to $1.01$. Put differently, agents initially (at time $(1-p_{n})^{-0.99}$) learn the true state, then suddenly (at time $(1-p_{n})^{-1.01}$) “forget” the true state and adopt the extreme estimate $0$. Hence, we show the horizon can lead to drastically different outcomes. We also note that proving (1) involves analyzing hitting probabilities for random walks on random graphs with absorbing states (bots in our setting), which may be of independent interest.
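The dichotomy follows from a one-line computation. Under the stated assumption $T_{n}=(1-p_{n})^{-k}$ with $p_{n}\rightarrow 1$, the quantity that governs (1) behaves as

```latex
T_{n}(1-p_{n}) = (1-p_{n})^{1-k}
\xrightarrow[n\rightarrow\infty]{}
\begin{cases}
0, & k<1,\\
1, & k=1,\\
\infty, & k>1,
\end{cases}
```

so $k=0.99$ falls in the learning regime and $k=1.01$ in the forgetting regime; the boundary case $k=1$ corresponds to the intermediate regime $T_{n}(1-p_{n})\rightarrow c=1$ treated in Theorem 1.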

In the second part of the paper (see Section 4), we study a setting in which an adversary chooses how many bots to connect to each agent, subject to a budget constraint. The adversary would like to minimize $\theta_{T_{n}}(i^{*})$ (i.e. to convince agents of the erroneous state $0$), but this quantity depends on the graph topology, which is not publicly available for social networks like Twitter. Hence, motivated by (1), we formulate the adversary’s problem as minimizing $p_{n}$, which only depends on the degrees in the graph (e.g. numbers of followers on Twitter, which are publicly available). We clarify that $\theta_{T_{n}}(i^{*})$ is monotone in $p_{n}$ only as $n\rightarrow\infty$ for the random graph of Section 3 (see Theorem 1). Thus, we use $p_{n}$ as a tractable (albeit nonrigorous) surrogate for the true objective function $\theta_{T_{n}}(i^{*})$, and we show empirically that these quantities are closely correlated for real social networks (see Figure 2). Alternatively, given a target $\theta_{T_{n}}(i^{*})$, one could minimize the horizon $T_{n}$ at which this target estimate is reached. However, we view $T_{n}$ as fixed and thus do not pursue this dual problem.

Minimizing $p_{n}$ amounts to solving an integer program, which can be done in polynomial time owing to the structure of $p_{n}$ . However, the computational complexity is $\Omega(n^{2})$ , which is infeasible for social networks like Twitter. Thus, we propose a randomized approximation algorithm that runs in time $n\log n$ and that produces a constant-fraction approximation of the optimal solution with high probability (see Theorem 2). Moreover, whereas the logic of the optimal solution is somewhat opaque, the form of our approximate solution offers the interpretation that successful adversaries carefully balance agents’ influence and susceptibility to influence. For a social network like Twitter, this means targeting users with many followers (i.e. influential users) who follow very few users themselves, so that fake news will occupy a greater portion of the targeted users’ feeds. While somewhat intuitive, the precise form of the randomized scheme is far from obvious. Furthermore, empirical results show that our scheme disrupts learning to a larger extent than schemes that more obviously balance influence and susceptibility. Thus, we believe our analysis provides new insights into vulnerabilities of news sharing platforms and non-Bayesian social learning models.
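To make the adversary's integer program concrete, the sketch below minimizes the degree-based surrogate $\tilde{p}_{n}(d)=\sum_{i}\frac{d_{in}^{A}(i)}{d_{in}^{A}(i)+d(i)}\frac{d_{out}(i)}{m_{n}}$ over bot allocations $d\in\mathbb{N}_{0}^{n}$ with $\sum_{i}d(i)\leq b_{n}$, assuming $m_{n}$ is the total out-degree. It is neither the paper's optimal algorithm nor its randomized approximation. Instead, it is a textbook greedy rule for separable objectives whose per-agent decrease shrinks as $d(i)$ grows. That rule is exact for such problems, but its running time $O((n+b_{n})\log n)$ scales with the budget. The function names are hypothetical.

```python
import heapq
from typing import List


def surrogate_objective(d_in_a: List[int], d_out: List[int], d: List[int]) -> float:
    """Degree-based surrogate sum_i d_in^A(i) / (d_in^A(i) + d(i)) * d_out(i) / m_n.

    Assumption: m_n is taken to be the total out-degree sum_i d_out(i).
    """
    m_n = sum(d_out)
    return sum(a / (a + x) * w / m_n for a, w, x in zip(d_in_a, d_out, d))


def greedy_allocation(d_in_a: List[int], d_out: List[int], budget: int) -> List[int]:
    """Assign `budget` bots one at a time, each to the agent whose term in the
    surrogate decreases the most (a standard greedy rule, not the paper's scheme)."""
    d = [0] * len(d_in_a)

    def gain(i: int) -> float:
        # Drop in d_out(i) * a / (a + x) when d(i) goes from x to x + 1.
        a, x = d_in_a[i], d[i]
        return d_out[i] * a / ((a + x) * (a + x + 1))

    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-gain(i), i) for i in range(len(d))]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        d[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return d


# Hypothetical degrees: agent 0 has many followers and follows only one account.
allocation = greedy_allocation(d_in_a=[1, 5, 2], d_out=[10, 10, 1], budget=3)
# allocation == [2, 1, 0]
```

The first bot goes to the agent maximizing $d_{out}(i)/(d_{in}^{A}(i)+1)$, i.e. an agent with many followers who follows few accounts, in line with the influence/susceptibility trade-off described above.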

The paper is organized as follows. In Section 2, we define our learning model. Sections 3 and 4 follow the outline above. We discuss related work in Section 5.

Notational conventions: The following notation is used frequently. For $k\in\mathbb{N}$, we let $[k]=\{1,\ldots,k\}$, and for $k,k^{\prime}\in\mathbb{N}$ we let $[k]+k^{\prime}=k^{\prime}+[k]=\{1+k^{\prime},\ldots,k+k^{\prime}\}$. All vectors are treated as row vectors. We let $e_{i}$ denote the vector with 1 in the $i$-th position and 0 elsewhere. We denote the set of nonnegative integers by $\mathbb{N}_{0}=\mathbb{N}\cup\{0\}$. We use $1(A)$ for the indicator function, i.e. $1(A)=1$ if $A$ is true and 0 otherwise. All random variables are defined on a common probability space $(\Omega,\mathcal{F},\mathbb{P})$, with $\mathbb{E}[\cdot]=\int_{\Omega}\cdot\ d\mathbb{P}$ denoting expectation, $\xrightarrow{\mathbb{P}}$ denoting convergence in probability, and $a.s.$ meaning $\mathbb{P}$-almost surely.

2 Learning model

We begin by defining the model of social learning studied throughout the paper. The basic ingredients are (1) a true state of the world, (2) a social network connecting two sets of nodes, some who aim to learn the true state and some who wish to persuade others of an erroneous true state, and (3) a learning horizon. We discuss each in turn.

The true state of the world is a constant $\theta\in(0,1)$. For example, in an election between candidates representing two political parties (say, Party 1 and Party 2), $\theta\approx 0$ and $\theta\approx 1$ mean that the Party 1 and Party 2 candidates, respectively, are superior. We emphasize that $\theta$ is a deterministic constant that depends neither on time nor on the number of nodes in the system.

A directed graph $G=(A\cup B,E)$ connects disjoint sets of nodes $A$ and $B$. We refer to elements of $A$ as regular agents, or simply agents, and elements of $B$ as stubborn agents or bots. While agents attempt to learn the true state $\theta$, bots aim to disrupt this learning and convince agents that the true state is instead 0. In the election example, agents represent voters who study the two candidates to learn which is superior, while bots are loyal to Party 1 and aim to convince agents that the corresponding candidate is superior (despite possible evidence to the contrary). Edges in the graph represent connections in a social network over which nodes share beliefs in a manner that will be described shortly. An edge $j\rightarrow i$ means that $i$ observes $j$’s belief. Let $N_{in}(i)=\{j\in A\cup B:j\rightarrow i\in E\}$ and $d_{in}(i)=|N_{in}(i)|$; we assume $N_{in}(i)\neq\emptyset$.

Agents and bots share beliefs until a learning horizon $T\in\mathbb{N}$ . We will allow the horizon to depend on the number of agents $n\triangleq|A|$ and will thus denote it by $T_{n}$ at times. In the election example, $T$ represents the duration of the election, i.e. the number of time units that agents can learn about the candidates and that bots can attempt to convince agents of the superiority of the Party 1 candidate.

Given these basic ingredients, we can define the learning process. At time $t=0$ , agent $i\in A$ has a $\text{Beta}(\alpha_{0}(i),\beta_{0}(i))$ belief, where $\alpha_{0}(i)\in(0,\bar{\alpha}]$ and $\beta_{0}(i)\in(0,\bar{\beta}]$ for some $\bar{\alpha},\bar{\beta}\in(0,\infty)$ that do not depend on $n$ . For each $t\in[T]$ , $i$ receives the signal $s_{t}(i)\sim\text{Bernoulli}(\theta)$ . In the absence of a network, the Bayesian approach dictates that $i$ update its parameters to $\alpha_{t}(i)=\alpha_{t-1}(i)+s_{t}(i)$ and $\beta_{t}(i)=\beta_{t-1}(i)+(1-s_{t}(i))$ and its belief to $\mu_{t}(i)=\text{Beta}(\alpha_{t}(i),\beta_{t}(i))$ , namely, for any (measurable) $\mathcal{A}\subset[0,1]$ ,

$$\mu_{t}(i)(\mathcal{A})\propto\int_{x\in\mathcal{A}}x^{\alpha_{t}(i)-1}(1-x)^{\beta_{t}(i)-1}\,dx.$$

In our running example, $\alpha_{t}(i)$ and $\beta_{t}(i)$ represent the number of news stories favorable to respective parties that $i$ has read during the election, plus some prior parameters $\alpha_{0}(i)$ and $\beta_{0}(i)$ that account for $i$ ’s biases from before the election. As $t$ grows, the belief $\mu_{t}(i)$ converges to a Dirac measure on its mean $\theta_{t}(i)=\alpha_{t}(i)/(\alpha_{t}(i)+\beta_{t}(i))$ ; intuitively, $i$ becomes increasingly confident that the true state is the fraction of stories favorable to a certain party.
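Without a network, the estimate is simply a smoothed running fraction of favorable signals. The following Python sketch (standard library only; the function name, the $\text{Beta}(1,1)$ prior, and the seed are illustrative choices) iterates the Bayesian update above and records $\theta_{t}(i)$.

```python
import random
from typing import List


def bayesian_beta_updates(alpha0: float, beta0: float, theta: float,
                          T: int, seed: int = 0) -> List[float]:
    """Isolated agent: add each Bernoulli(theta) signal to the Beta parameters
    and return the estimates theta_t = alpha_t / (alpha_t + beta_t), t = 1..T."""
    rng = random.Random(seed)
    alpha, beta = alpha0, beta0
    estimates = []
    for _ in range(T):
        s = 1 if rng.random() < theta else 0  # s_t(i) ~ Bernoulli(theta)
        alpha += s                            # alpha_t = alpha_{t-1} + s_t
        beta += 1 - s                         # beta_t = beta_{t-1} + (1 - s_t)
        estimates.append(alpha / (alpha + beta))
    return estimates


estimates = bayesian_beta_updates(alpha0=1.0, beta0=1.0, theta=0.7, T=10_000)
```

With $\theta=0.7$ and $T=10{,}000$, the final estimate lands close to $0.7$, since the prior's weight shrinks like $1/t$.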

In the presence of a network, we proceed in the same manner, except the parameters are updated as follows:

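$$\alpha_{t}(i)=(1-\eta)\big(\alpha_{t-1}(i)+s_{t}(i)\big)+\sum_{j\in N_{in}(i)}\frac{\eta\,\alpha_{t-1}(j)}{d_{in}(i)},\qquad(3)$$

$$\beta_{t}(i)=(1-\eta)\big(\beta_{t-1}(i)+1-s_{t}(i)\big)+\sum_{j\in N_{in}(i)}\frac{\eta\,\beta_{t-1}(j)}{d_{in}(i)},$$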

where $\eta\in(0,1)$. Intuitively, $i$ reads the news and calculates its favorability toward the parties as before, then discusses with its neighbors to update its favorability. Mathematically, $i$ performs a Bayesian parameter update and then averages parameters. [6] uses the same update, whereas agents in [4] do Bayesian belief updates and then average beliefs. Our update also resembles the deGroot model [10], where there are no signals and estimates are averaged across neighbors. See Section 5.
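A minimal sketch of one synchronous round of this update, written in Python for illustration (the function and variable names are hypothetical). Parameters are stored per node, and `in_nbrs[i]` lists the nodes that $i$ observes; listing a neighbor twice weights it by edge multiplicity, and when $N_{in}(i)=\{i\}$ the network term is just $\eta$ times the node's own previous parameter.

```python
from typing import Dict, List, Tuple

Params = Dict[int, float]


def learning_step(alpha: Params, beta: Params, signals: Dict[int, int],
                  in_nbrs: Dict[int, List[int]], eta: float) -> Tuple[Params, Params]:
    """One synchronous round: a Bayesian step weighted by (1 - eta) plus eta
    times the average of the in-neighbors' previous parameters."""
    new_alpha: Params = {}
    new_beta: Params = {}
    for i, nbrs in in_nbrs.items():
        d_in = len(nbrs)  # d_in(i), counting repeated edges
        s = signals[i]
        new_alpha[i] = (1 - eta) * (alpha[i] + s) + eta * sum(alpha[j] for j in nbrs) / d_in
        new_beta[i] = (1 - eta) * (beta[i] + 1 - s) + eta * sum(beta[j] for j in nbrs) / d_in
    return new_alpha, new_beta


# Hypothetical example: agents 0 and 1, bot 2. Agent 0 observes agent 1 and the
# bot, agent 1 observes agent 0, and the bot observes only itself.
in_nbrs = {0: [1, 2], 1: [0], 2: [2]}
alpha = {0: 1.0, 1: 1.0, 2: 0.0}
beta = {0: 1.0, 1: 1.0, 2: 1.0}
signals = {0: 1, 1: 0, 2: 0}
new_alpha, new_beta = learning_step(alpha, beta, signals, in_nbrs, eta=0.5)
# new_alpha == {0: 1.25, 1: 1.0, 2: 0.0}; new_beta == {0: 1.0, 1: 1.5, 2: 1.5}
```

Computing both new dictionaries from the previous round's values (rather than updating in place) keeps the round synchronous, as in the update above.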

Finally, we specify bot behavior. For $i\in B$ , we set $N_{in}(i)=\{i\}$ , $\alpha_{0}(i)=0$ , $\beta_{0}(i)=\bar{\beta}$ , and $s_{t}(i)=0\ \forall\ t\in[T]$ , then iteratively define $\{\alpha_{t}(i),\beta_{t}(i)\}_{t=1}^{T}$ via (3). More explicitly, a simple inductive proof shows

$$\alpha_{t}(i)=0,\quad\beta_{t}(i)=\bar{\beta}+(1-\eta)t\quad\forall\ t\in[T].$$

In our running example, $\alpha_{0}(i)=0$ , $\beta_{0}(i)=\bar{\beta}$ , and $s_{t}(i)=0$ means $i$ ’s prior parameters and signals are maximally biased toward Party 1. Furthermore, we can interpret $N_{in}(i)=\{i\}$ as bots being “echo chambers” who only listen to themselves. Finally, note that since all bots $i\in B$ have the same behavior, we assume (without loss of generality) that the outgoing neighbor set of $i\in B$ is $N_{out}(i)=\{i,i^{\prime}\}$ for some $i^{\prime}\in A$ , i.e. in addition to its self-loop, each bot has a single outgoing neighbor from the agent set.
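The closed form for bot parameters can be checked numerically by iterating the update with the bot's self-loop, zero prior $\alpha_{0}(i)$, and all-zero signals. This is an illustrative Python sketch; `beta_bar`, `eta`, and `T` are arbitrary test values.

```python
from typing import List, Tuple


def bot_parameters(beta_bar: float, eta: float, T: int) -> List[Tuple[float, float]]:
    """Iterate the update for a bot: N_in(i) = {i}, prior (0, beta_bar),
    and s_t(i) = 0 for every t. Returns (alpha_t, beta_t) for t = 1..T."""
    alpha, beta = 0.0, beta_bar
    history = []
    for _ in range(T):
        s = 0  # bots always receive signal 0
        # With a self-loop only, the network term is eta times the bot's own value.
        alpha = (1 - eta) * (alpha + s) + eta * alpha
        beta = (1 - eta) * (beta + 1 - s) + eta * beta
        history.append((alpha, beta))
    return history


history = bot_parameters(beta_bar=2.0, eta=0.3, T=50)
```

Each step leaves $\alpha$ at 0 and adds exactly $1-\eta$ to $\beta$, which is the induction behind the closed form.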

3 Learning outcome

To begin our analysis of the learning outcome, we show that when all agents are (pathwise) connected to bots, their beliefs converge to those of the bots. Formally, for $p\geq 1$, let

$$W_{p}(\mu,\nu)=\inf_{(X,Y):\,X\sim\mu,\,Y\sim\nu}\big(\mathbb{E}|X-Y|^{p}\big)^{1/p}$$

denote the $p$ -Wasserstein distance for probability measures $\mu$ and $\nu$ , where $X\sim\mu,Y\sim\nu$ means $X$ and $Y$ have respective marginals $\mu$ and $\nu$ . For $x\in[0,1]$ , let $\delta_{x}$ denote the Dirac measure $\delta_{x}(\mathcal{A})=1(x\in\mathcal{A})$ for measurable $\mathcal{A}\subset[0,1]$ . We then have the following (see Appendix 10 for a proof).

Proposition 1

Suppose that for any $i\in A$, there exist $l\in\mathbb{N}$ and $(i_{\tau})_{\tau=0}^{l}\in(A\cup B)^{l+1}$ such that $i_{0}=i$, $i_{\tau}\rightarrow i_{\tau-1}\in E\ \forall\ \tau\in[l]$, and $i_{l}\in B$ (i.e. $i$ can be reached from some bot along a directed path). Then for any $i\in A$ and $p\geq 1$, $\lim_{t\rightarrow\infty}\theta_{t}(i)=\lim_{t\rightarrow\infty}W_{p}(\mu_{t}(i),\delta_{0})=0\ a.s.$
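To see Proposition 1 at work, the hypothetical Python sketch below runs the update on a three-agent graph in which a single bot reaches every agent through a directed path; the graph, priors, $\theta$, $\eta$, and horizon are illustrative choices, not taken from the paper. It also uses the fact that the only coupling of a measure with a Dirac mass is the product coupling, so $W_{p}(\mu,\delta_{0})=(\mathbb{E}X^{p})^{1/p}$ for $X\sim\mu$. For a $\text{Beta}(\alpha,\beta)$ belief this equals $\big(B(\alpha+p,\beta)/B(\alpha,\beta)\big)^{1/p}$, which reduces to the estimate $\alpha/(\alpha+\beta)$ when $p=1$.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Params = Dict[int, float]


def simulate(in_nbrs: Dict[int, List[int]], bots: Set[int], theta: float, eta: float,
             T: int, beta_bar: float = 1.0,
             seed: int = 0) -> Tuple[Dict[int, List[float]], Params, Params]:
    """Run T rounds of the update; return each agent's estimate path and the
    final Beta parameters. Agent priors Beta(1, 1) are an illustrative choice."""
    rng = random.Random(seed)
    alpha = {i: 0.0 if i in bots else 1.0 for i in in_nbrs}
    beta = {i: beta_bar if i in bots else 1.0 for i in in_nbrs}
    estimates: Dict[int, List[float]] = {i: [] for i in in_nbrs if i not in bots}
    for _ in range(T):
        # Bots always emit 0; agents draw Bernoulli(theta) signals.
        s = {i: 0 if i in bots else int(rng.random() < theta) for i in in_nbrs}
        new_alpha: Params = {}
        new_beta: Params = {}
        for i, nb in in_nbrs.items():
            new_alpha[i] = (1 - eta) * (alpha[i] + s[i]) + eta * sum(alpha[j] for j in nb) / len(nb)
            new_beta[i] = (1 - eta) * (beta[i] + 1 - s[i]) + eta * sum(beta[j] for j in nb) / len(nb)
        alpha, beta = new_alpha, new_beta
        for i in estimates:
            estimates[i].append(alpha[i] / (alpha[i] + beta[i]))
    return estimates, alpha, beta


def wasserstein_to_zero(a: float, b: float, p: float) -> float:
    """W_p(Beta(a, b), delta_0) = (E[X^p])^(1/p); requires a > 0."""
    # E[X^p] = B(a + p, b) / B(a, b), evaluated with log-gamma for stability.
    log_moment = math.lgamma(a + p) + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(a + b + p)
    return math.exp(log_moment / p)


# Bot 3 -> agent 0 -> agent 1 -> agent 2 -> agent 0, so the bot reaches every agent.
in_nbrs = {0: [2, 3], 1: [0], 2: [1], 3: [3]}
est, alpha, beta = simulate(in_nbrs, bots={3}, theta=0.8, eta=0.5, T=5000)
w2 = {i: wasserstein_to_zero(alpha[i], beta[i], 2) for i in est}
```

In this example the bot's $\beta$ grows linearly while the agents' $\alpha$ parameters stay bounded, so both the estimates and the $W_{2}$ distances to $\delta_{0}$ shrink as the horizon grows.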

Hence, for a large enough horizon, estimates and beliefs become arbitrarily close to zero. A natural follow-up question is how such a horizon scales – and in which graph parameters – for a sequence of graphs $\{G_{n}\}_{n=1}^{\infty}$ . In this section, we address this question for a particular random graph model, succinctly described as the directed configuration model (DCM) plus bots. The DCM constructs a graph with prespecified degrees, which, conditioned on being simple (i.e. having no self-loops or multi-edges), is uniformly distributed among (simple) graphs of those degrees [11, Proposition 7.15]. This is an appealing property for deGroot-like learning models such as ours, because in the deGroot model for undirected graphs, asymptotic estimates depend only on the degrees and the initial beliefs. Thus, loosely speaking, our analysis is “average-case” over relevant graphs. Furthermore, we will show the graph parameters that dictate learning for the DCM are tractable, which we exploit in Section 4 for general graphs.

Having motivated our study of the DCM, we define it in Section 3.1, present our main result for the DCM in Section 3.2, and discuss our assumptions in Section 3.3.

3.1 Graph model

To begin, we realize a sequence $\{d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\}_{i\in A}$, called the degree sequence, from some distribution; here we let $A=[n]$. In the construction described next, $i\in A$ will have $d_{out}(i)$ outgoing neighbors ($i$ will be observed by $d_{out}(i)$ other agents), $d_{in}^{A}(i)$ incoming neighbors from $A$ ($i$ will observe $d_{in}^{A}(i)$ agents), and $d_{in}^{B}(i)$ incoming neighbors from $B$ ($i$ will observe $d_{in}^{B}(i)$ bots). Here the total in-degree of $i$ is $d_{in}(i)=d_{in}^{A}(i)+d_{in}^{B}(i)$ (as used in (5)). We assume

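$$d_{out}(i),d_{in}^{A}(i)\in\mathbb{N},\quad d_{in}^{B}(i)\in\mathbb{N}_{0}\quad\forall\ i\in A,$$

$$\sum_{i\in A}d_{out}(i)=\sum_{i\in A}d_{in}^{A}(i).$$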

In words, the first condition says that each agent $i$ is observed by at least one agent, observes at least one agent, and may observe one or more bots. The second condition says that the total out-degree must equal the total $A$-in-degree in the agent subgraph; this is necessary to construct a graph with the given degrees. Finally, it will be convenient to define the degree vector of $i\in A$ as

$$d(i)=\big(d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\big).\qquad(9)$$

After realizing the degree sequence, we begin the graph construction (presented more formally in Appendix 7.1). First, we attach $d_{out}(i)$ outgoing half-edges, $d_{in}^{A}(i)$ incoming half-edges labeled $A$, and $d_{in}^{B}(i)$ incoming half-edges labeled $B$ to each $i\in A$; we will refer to these half-edges as outstubs, $A$-instubs, and $B$-instubs, respectively. Let $O_{A}$ denote the set of all agents’ outstubs. We then pair each outstub in $O_{A}$ with an $A$-instub to form edges between agents in a breadth-first-search fashion that proceeds as follows:

•

Sample $i^{*}$ from $A$ uniformly. For each of the $d_{in}^{A}(i^{*})$ $A$-instubs attached to $i^{*}$, sample an outstub uniformly from $O_{A}$ (resampling if the sampled outstub has already been paired), and connect the instub and outstub to form an edge from some agent to $i^{*}$.

•

Let $A_{1}=\{i\in A\setminus\{i^{*}\}:\textrm{an outstub of }i\textrm{ was paired with an }A\textrm{-instub of }i^{*}\}$. For each $i\in A_{1}$, pair the $d_{in}^{A}(i)$ $A$-instubs attached to $i$ in the same manner the $A$-instubs of $i^{*}$ were paired in the previous step.

•

Continue iteratively until all $A$ -instubs have been paired. In particular, during the $l$ -th iteration, we pair all $A$ -instubs attached to $A_{l}$ , the agents at geodesic distance $l$ from $i^{*}$ .

The procedure above yields the standard DCM, plus unpaired $B$ -instubs attached to some agents. To pair these instubs, we define $B=n+\big{[}\sum_{i\in A}d_{in}^{B}(i)\big{]}$ to be the set of bots (hence, the node set is $A\cup B=\big{[}n+\sum_{i\in A}d_{in}^{B}(i)\big{]}$ ). To each $i\in B$ we add a single self-loop and a single unpaired outstub (as described at the end of Section 2). This yields $\sum_{i\in A}d_{in}^{B}(i)$ unpaired outstubs attached to bots. Finally, we pair these outstubs arbitrarily with the $\sum_{i\in A}d_{in}^{B}(i)$ unpaired $B$ -instubs from above to form edges from bots to agents (the pairing can be arbitrary since all bots behave the same).
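The construction can be sketched in Python as follows. This is illustrative only: the function name and seed are hypothetical, and the formal version is in Appendix 7.1. Outstubs are represented by the agent that owns them, and each $A$-instub of a reached agent is paired with a uniformly sampled outstub, resampling as in the bullet list. The text pairs instubs breadth-first from $i^{*}$ until all are paired. If the search from $i^{*}$ runs out of newly reached agents first, this sketch restarts it from an unreached agent, which is an assumption on our part.

```python
import random
from collections import deque
from typing import List, Tuple


def dcm_with_bots(d_out: List[int], d_in_a: List[int], d_in_b: List[int],
                  seed: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Sketch of the DCM-plus-bots construction.

    Agents are 0..n-1 and bots are n, n+1, ...; an edge (j, i) means j -> i,
    i.e. i observes j. Returns the edge multiset and the root agent i*.
    """
    n = len(d_out)
    assert sum(d_out) == sum(d_in_a), "total out-degree must equal total A-in-degree"
    rng = random.Random(seed)
    outstubs = [i for i in range(n) for _ in range(d_out[i])]  # O_A, labeled by owner
    paired = [False] * len(outstubs)
    edges: List[Tuple[int, int]] = []
    root = rng.randrange(n)  # i* is sampled uniformly from A
    visited = set()
    # Breadth-first pairing from i*, restarting from unreached agents if needed.
    for start in [root] + [i for i in range(n) if i != root]:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for _ in range(d_in_a[i]):
                k = rng.randrange(len(outstubs))
                while paired[k]:  # resample if this outstub is already paired
                    k = rng.randrange(len(outstubs))
                paired[k] = True
                j = outstubs[k]
                edges.append((j, i))
                if j not in visited:  # j is reached for the first time
                    visited.add(j)
                    queue.append(j)
    # Each B-instub gets its own bot, with a self-loop and one edge to the agent.
    bot = n
    for i in range(n):
        for _ in range(d_in_b[i]):
            edges.extend([(bot, bot), (bot, i)])
            bot += 1
    return edges, root


edges, root = dcm_with_bots([2, 1, 1, 2], [1, 2, 2, 1], [0, 1, 2, 0], seed=1)
```

Because the total out-degree equals the total $A$-in-degree, an unpaired outstub always exists when an instub still needs one, so the resampling loop terminates.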

We note that the pairing of $A$ -instubs with outstubs from $O_{A}$ did not prohibit multi-edges, so the set of edges $E$ formed will in general be a multi-set. For this reason, we replace the summation in the $\alpha_{t}(i)$ update (3) with

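$$\sum_{j\in A\cup B}\frac{\eta\,\big|\{j^{\prime}\rightarrow i^{\prime}\in E:j^{\prime}=j,\,i^{\prime}=i\}\big|\,\alpha_{t-1}(j)}{d_{in}(i)},$$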

and analogously for the $\beta_{t}(i)$ update, i.e. we weight the parameters of each neighbor $j$ of $i$ in proportion to the number of edges from $j$ to $i$. We also note that if $d_{in}^{B}(i)=0\ \forall\ i\in A$, the construction above reduces to the standard DCM.

Our results will require assumptions on the degree sequence $\{d(i)\}_{i\in A}$ , where (we recall) $d(i)$ is the degree vector of $i$ (see (9)). First, we define $f_{n}^{*},f_{n}:\mathbb{N}\times\mathbb{N}\times\mathbb{N}_{0}\rightarrow[0,1]$ by

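$$f_{n}^{*}(i,j,k)=\frac{1}{n}\sum_{a\in A}1\big(d(a)=(i,j,k)\big),\qquad f_{n}(i,j,k)=\frac{\sum_{a\in A}d_{out}(a)\,1\big(d(a)=(i,j,k)\big)}{\sum_{a\in A}d_{out}(a)}.$$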

In words, $f_{n}^{*}$ and $f_{n}$ are the degree distributions of agents sampled uniformly and sampled proportional to out-degree, respectively. Note that, since the first agent $i^{*}$ added to the graph is sampled uniformly from $A$ , the degrees of $i^{*}$ are distributed as $f_{n}^{*}$ . Furthermore, recall that, to pair $A$ -instubs, we sample outstubs uniformly from $O_{A}$ , resampling if the sampled outstub is already paired. It follows that, each time we add a new agent to the graph (besides $i^{*}$ ), its degrees are distributed as $f_{n}$ . We also note that, because the degree sequence is random, these distributions are random as well. From these random distributions, we define the random variables

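$$\tilde{p}_{n}^{*}=\sum_{(i,j,k)}\frac{j}{j+k}f_{n}^{*}(i,j,k),\qquad\tilde{p}_{n}=\sum_{(i,j,k)}\frac{j}{j+k}f_{n}(i,j,k),\qquad\tilde{q}_{n}=\sum_{(i,j,k)}\frac{j}{(j+k)^{2}}f_{n}(i,j,k),$$

where each sum ranges over $(i,j,k)\in\mathbb{N}\times\mathbb{N}\times\mathbb{N}_{0}$.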

Following the discussion above, $\tilde{p}_{n}^{*}$ is the expected value (conditioned on the degree sequence) of the ratio of $A$ -instubs to total instubs for $i^{*}$ ; $\tilde{p}_{n}$ is the expected value of this same ratio, but for new agents added to the graph. The interpretation of $\tilde{q}_{n}$ is similar. At the end of Section 3.2, we discuss in more detail why these random variables arise in our analysis.
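For a concrete degree sequence, these quantities are straightforward to compute. The Python sketch below (hypothetical names; a degree vector is the triple $(d_{out},d_{in}^{A},d_{in}^{B})$) builds the two empirical distributions described above: uniform over agents, and proportional to out-degree. It then averages $j/(j+k)$ over a degree triple $(i,j,k)$ to obtain $\tilde{p}_{n}^{*}$ and $\tilde{p}_{n}$. For $\tilde{q}_{n}$ it averages $j/(j+k)^{2}$ under the out-degree-weighted distribution, which is an assumption that follows the expression for $\tilde{q}_{n}$ listed among the equations above.

```python
from collections import Counter
from typing import Dict, List, Tuple

Degree = Tuple[int, int, int]  # (d_out, d_in^A, d_in^B)


def degree_distributions(degrees: List[Degree]) -> Tuple[Dict[Degree, float], Dict[Degree, float]]:
    """Return (f_star, f): the distribution of d(i) for i uniform on A, and for
    i sampled proportional to d_out(i)."""
    n = len(degrees)
    m = sum(d[0] for d in degrees)  # total out-degree
    counts = Counter(degrees)
    f_star = {d: c / n for d, c in counts.items()}
    f = {d: c * d[0] / m for d, c in counts.items()}
    return f_star, f


def ratio_statistics(degrees: List[Degree]) -> Tuple[float, float, float]:
    """Return (p_star, p, q): the mean A-instub fraction j / (j + k) under f_star
    and under f, and the mean of j / (j + k)^2 under f."""
    f_star, f = degree_distributions(degrees)
    p_star = sum(w * j / (j + k) for (_, j, k), w in f_star.items())
    p = sum(w * j / (j + k) for (_, j, k), w in f.items())
    q = sum(w * j / (j + k) ** 2 for (_, j, k), w in f.items())
    return p_star, p, q


degrees = [(2, 1, 0), (1, 2, 1), (1, 2, 2), (2, 1, 0)]
p_star, p, q = ratio_statistics(degrees)
```

Grouping agents by degree triple mirrors the fact that these quantities depend on the graph only through its degree sequence.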

We now state four assumptions, which we discuss in detail in Section 3.3. Two of these require the degree sequence to be well-behaved (with high probability) – specifically, A1 requires certain moments of the degree sequence to be finite, while A3 requires $\{\tilde{p}_{n}\}_{n\in\mathbb{N}}$ to be close to a deterministic sequence $\{p_{n}\}_{n\in\mathbb{N}}$ . The other assumptions, A2 and A4, impose maximum and minimum rates of growth for the learning horizon $T_{n}$ . In particular, $T_{n}$ must be finite for each finite $n$ but grow to infinity with $n$ .

A1

$\lim_{n\rightarrow\infty}\mathbb{P}(\Omega_{n,1})=1$, where, for some $\nu_{1},\nu_{2},\nu_{3},\gamma>0$ independent of $n$ such that $\nu_{3}>\nu_{1}$ (a condition that only eliminates the trivial case of a line graph; see Section 3.3 for details),

[TABLE]

A2

$\exists\ N\in\mathbb{N}$ and $\zeta\in(0,1/2)$ independent of $n$ such that $T_{n}\leq\zeta\log(n)/\log(\nu_{3}/\nu_{1})\ \forall\ n\geq N$.

A3

$\lim_{n\rightarrow\infty}\mathbb{P}(\Omega_{n,2})=1$ , where, for some $p_{n}\in[0,1]$ s.t. $\lim_{n\rightarrow\infty}p_{n}=p\in[0,1]$ , some $0\leq\delta_{n}=o(1/T_{n})$ , and some $\xi\in(0,1)$ independent of $n$ ,

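$$\Omega_{n,2}=\big\{|p_{n}-\tilde{p}_{n}|<\delta_{n},\ \tilde{p}_{n}^{*}\geq\tilde{p}_{n},\ \tilde{q}_{n}<1-\xi\big\}.$$

A4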

$\lim_{n\rightarrow\infty}T_{n}=\infty$ .

3.2 Main result

We can now present Theorem 1. The theorem states that the estimate at time $T_{n}$ of a uniformly random agent converges in probability as $n\rightarrow\infty$ . As discussed in the introduction, the limit depends on the relative asymptotics of the time horizon $T_{n}$ and the quantity $p_{n}$ defined in A3. For example, this limit is $\theta$ when $T_{n}(1-p_{n})\rightarrow 0$ ; note that $T_{n}(1-p_{n})\rightarrow 0$ requires $p_{n}$ to quickly approach 1 (since $T_{n}\rightarrow\infty$ by A4), which by A3 and (13) suggests the number of bots is small. Hence, $i^{*}$ learns the true state when there are sufficiently few bots. (The other cases can be interpreted similarly.)

Theorem 1

Assume that $G$ is the DCM and that A1, A2, A3, and A4 hold. Then for $i^{*}\sim A$ uniformly,

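$$\theta_{T_{n}}(i^{*})\xrightarrow[n\rightarrow\infty]{\mathbb{P}}\begin{cases}\theta, & T_{n}(1-p_{n})\xrightarrow[n\rightarrow\infty]{}0,\\ \theta\,\dfrac{1-e^{-c\eta}}{c\eta}, & T_{n}(1-p_{n})\xrightarrow[n\rightarrow\infty]{}c\in(0,\infty),\\ 0, & T_{n}(1-p_{n})\xrightarrow[n\rightarrow\infty]{}\infty.\end{cases}$$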

Before discussing the proof, we make several observations:

•

Suppose $p_{n}$ is fixed and consider varying $T_{n}$ . To be concrete, let $p_{n}=1-(\log n)^{-1/2}$ and define $T_{n,1}=(\log n)^{1/4}$ and $T_{n,2}=(\log n)^{3/4}$ (note $T_{n,1},T_{n,2}$ satisfy A2, A4). Then $T_{n,1}(1-p_{n})\rightarrow 0$ and $T_{n,2}(1-p_{n})\rightarrow\infty$ , so by Theorem 1, the estimate of $i^{*}$ converges to $\theta$ at time $T_{n,1}$ and to 0 at time $T_{n,2}$ . In words, $i^{*}$ initially (at time $(\log n)^{1/4}$ ) learns the state of the world, then later (at time $(\log n)^{3/4}$ ) forgets it and adopts the bot estimates.

•

Alternatively, suppose $T_{n}$ is fixed and consider varying $p_{n}$ . For example, let $p_{n}=1-c/T_{n}$ for some $c\in(0,\infty)$ . Here smaller $c$ implies fewer bots, and Theorem 1 says the limiting estimate of $i^{*}$ is a decreasing convex function of $c$ . One interpretation is that, if an adversary deploys bots in hopes of driving agent estimates to 0, the marginal benefit of deploying additional bots is smaller when $c$ is larger, i.e. the adversary experiences “diminishing returns”. It is also worth noting that, since $(1-e^{-c\eta})/(c\eta)\rightarrow 1$ as $c\rightarrow 0$ and $(1-e^{-c\eta})/(c\eta)\rightarrow 0$ as $c\rightarrow\infty$ , the limiting estimate of $i^{*}$ is continuous as a function of $c$ .

•

If $T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ , consider the limiting estimate of $i^{*}$ as a function of $\eta$ . By Theorem 1, this estimate tends to $\theta$ as $\eta\rightarrow 0$ and tends to $\theta(1-e^{-c})/c$ as $\eta\rightarrow 1$ . This is expected from (3): when $\eta=0$ , agents ignore the network (and thus avoid exposure to biased bot beliefs) and form estimates based only on unbiased signals; when $\eta=1$ , agents place full weight on the network and are maximally exposed to bot beliefs.

•

If $p_{n}\rightarrow p<1$ , we must have $T_{n}(1-p_{n})\rightarrow\infty$ (since $T_{n}\rightarrow\infty$ by A4), and the estimate of $i^{*}$ tends to 0 by Theorem 1. Loosely speaking, this says that a necessary condition for learning is that the bots vanish asymptotically (in the sense that $p_{n}\rightarrow 1$ ).

•

In fact, in the case $p_{n}\not\rightarrow 1$ , a stronger result holds under slightly stronger assumptions: the set of agents $i$ for which $\theta_{T_{n}}(i)\not\rightarrow 0$ vanishes relative to $n$ . See Appendix 6 for details.
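The observations above can be reproduced numerically. The Python sketch below is illustrative only: it assumes, consistently with the bullets above, that the limit in Theorem 1 is $\theta$, $\theta(1-e^{-c\eta})/(c\eta)$, or $0$ according to whether $T_{n}(1-p_{n})$ tends to $0$, to $c\in(0,\infty)$, or to $\infty$. The helper names `limiting_estimate` and `horizon_products` are hypothetical. It checks the diminishing returns in $c$ and evaluates $T_{n,1}(1-p_{n})$ and $T_{n,2}(1-p_{n})$ from the first bullet.

```python
import math

def limiting_estimate(theta, eta, x):
    """Assumed limit of theta_{T_n}(i*) as a function of x = lim T_n (1 - p_n) in [0, inf].

    Middle case assumed to be theta * (1 - exp(-x * eta)) / (x * eta).
    """
    if math.isinf(x):
        return 0.0                      # bots dominate: estimate driven to 0
    z = x * eta
    if z == 0.0:
        return theta                    # x = 0 (few bots) or eta = 0 (network ignored)
    return theta * -math.expm1(-z) / z  # -expm1(-z) = 1 - exp(-z), computed stably

theta, eta = 0.5, 0.9                   # the values used in the experiments of Section 4.3
values = [limiting_estimate(theta, eta, c) for c in (1.0, 2.0, 3.0, 4.0)]
drops = [a - b for a, b in zip(values, values[1:])]
print(values)
print(drops)                            # shrinking drops: diminishing returns in c

def horizon_products(n):
    """T_{n,1}(1 - p_n) and T_{n,2}(1 - p_n) for p_n = 1 - (log n)^(-1/2)."""
    L = math.log(n)                     # math.log accepts arbitrarily large ints
    one_minus_p = L ** -0.5
    return L ** 0.25 * one_minus_p, L ** 0.75 * one_minus_p

for e in (6, 50, 500):
    x1, x2 = horizon_products(10 ** e)
    print(f"n = 1e{e}: T1*(1-p) = {x1:.3f}, T2*(1-p) = {x2:.3f}")
```

Even at $n=10^{500}$ the two products are only about $0.17$ and $5.8$, since they depend on $n$ through $\log n$; the timescale separation in the first bullet is genuinely asymptotic.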

The proof of Theorem 1 is lengthy and deferred to Appendices 7 and 9, where Appendix 7 lays out the structure of the proof. However, we next present a short argument to illustrate the fundamental reason why the three cases of the limiting estimate arise in Theorem 1.

At a high level, these three cases arise as follows. First, when $T_{n}(1-p_{n})\rightarrow 0$ , the “density” of bots within the $T_{n}$ -step incoming neighborhood of $i^{*}$ is small. As a consequence, $i^{*}$ is not exposed to the biased beliefs of bots by time $T_{n}$ and is able to learn the true state ( $\theta_{T_{n}}(i^{*})\rightarrow\theta$ ). On the other hand, when $T_{n}(1-p_{n})\rightarrow\infty$ , this “density” is large; $i^{*}$ is exposed to bot beliefs and thus adopts them. Finally, when $T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ , the “density” is moderate; $i^{*}$ does not fully learn, nor does $i^{*}$ fully adopt bot beliefs.

This explanation is not at all surprising; what is more subtle is what precisely the density of bots within the $T_{n}$ -step incoming neighborhood of $i^{*}$ means. It turns out that the relevant quantity is the probability that a random walker exploring this neighborhood reaches the set of bots. To illustrate this, consider a random walk $\{X_{l}\}_{l\in\mathbb{N}}$ that begins at $X_{0}=i^{*}$ and, for $l\geq 0$ , chooses $X_{l+1}$ uniformly from all incoming neighbors of $X_{l}$ (agents and bots); note that the walk traverses edges against their orientation in the graph. For this walk, it is easy to see that, conditioned on the event $X_{l}\in A$ , the event $X_{l+1}\in A$ occurs with probability

$$\frac{d_{in}^{A}(X_{l})}{d_{in}^{A}(X_{l})+d_{in}^{B}(X_{l})}. \tag{21}$$

Crucially, we sample this walk and construct the graph simultaneously, by choosing which instub of $X_{l-1}$ to follow before actually pairing these instubs. Assuming they are later paired with agent outstubs chosen uniformly at random, and hence connected to agents chosen proportional to out-degree, we can average (21) over the out-degree distribution to obtain that $X_{l+1}\in A$ occurs with probability

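$$\sum_{i=1}^{n}\frac{d_{out}(i)}{\sum_{j=1}^{n}d_{out}(j)}\cdot\frac{d_{in}^{A}(i)}{d_{in}^{A}(i)+d_{in}^{B}(i)}=\tilde{p}_{n}. \tag{22}$$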

Now since bots have a self-loop and no other incoming edges, they are absorbing states on this walk. It follows that $X_{T_{n}}\in A$ if and only if $X_{l}\in A\ \forall\ l\in[T_{n}]$ ; by the argument above, this latter event occurs with probability $\tilde{p}_{n}^{T_{n}}$ . Since $\tilde{p}_{n}\approx p_{n}$ by A3, we thus obtain that $X_{T_{n}}\in A$ with probability

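$$\tilde{p}_{n}^{T_{n}}\approx p_{n}^{T_{n}}=\left(1-(1-p_{n})\right)^{T_{n}}\approx e^{-T_{n}(1-p_{n})}.$$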

From this final expression, Theorem 1 emerges: when $T_{n}(1-p_{n})\rightarrow 0$ , the random walker remains in the agent set with probability $\approx 1$ ; this corresponds to $i^{*}$ avoiding exposure to bot beliefs and learning the true state. Similarly, $T_{n}(1-p_{n})\rightarrow\infty$ means the walker is absorbed into the bot set with probability $\approx 1$ , corresponding to $i^{*}$ adopting bot estimates. Finally, $T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ means the walker stays in the agent set with probability $\approx e^{-c}\in(0,1)$ , corresponding to $i^{*}$ not fully learning nor fully adopting bot estimates.
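The random-walk heuristic can be simulated directly. This is a minimal Python sketch under the same simplification as the argument above, plus one more: every agent the walk visits, including $X_{0}$, is drawn proportional to out-degree and independently of earlier steps (the paper instead treats the first step through $\tilde{p}_{n}^{*}$). Under these assumptions $\mathbb{P}(X_{T}\in A)=\tilde{p}_{n}^{T}$ exactly, which the Monte Carlo estimate should match. The degree sequences and the helper `survival_mc` are hypothetical.

```python
import itertools
import math
import random

def p_tilde(d_out, d_in_a, d_in_b):
    """Out-degree-weighted mean of the agent-instub fraction d_in^A / (d_in^A + d_in^B)."""
    m_n = sum(d_out)
    return sum(o * a / (a + b) for o, a, b in zip(d_out, d_in_a, d_in_b)) / m_n

def survival_mc(d_out, d_in_a, d_in_b, steps, trials, seed=0):
    """Monte Carlo estimate of P(X_steps in A) for the mean-field walk (hypothetical helper)."""
    rng = random.Random(seed)
    agents = range(len(d_out))
    cum = list(itertools.accumulate(d_out))
    survived = 0
    for _ in range(trials):
        # X_0, ..., X_{steps-1}: each drawn proportional to out-degree (outstub pairing)
        path = rng.choices(agents, cum_weights=cum, k=steps)
        # from agent i, a uniform instub leads to another agent w.p. d_in^A(i) / d_in(i);
        # otherwise the walk hits a bot, which is absorbing
        if all(rng.random() < d_in_a[i] / (d_in_a[i] + d_in_b[i]) for i in path):
            survived += 1
    return survived / trials

d_out, d_in_a, d_in_b = [1, 2, 3, 4], [1, 2, 1, 3], [0, 1, 1, 0]  # toy degrees
T = 5
p = p_tilde(d_out, d_in_a, d_in_b)
mc = survival_mc(d_out, d_in_a, d_in_b, steps=T, trials=20_000, seed=1)
print(p ** T, mc, math.exp(-T * (1 - p)))
```

Here $1-\tilde{p}_{n}=13/60$ is not small, so $e^{-T(1-\tilde{p}_{n})}$ noticeably exceeds $\tilde{p}_{n}^{T}$; the exponential form is accurate in the regime $1-p_{n}\rightarrow 0$ that Theorem 1 is concerned with.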

We note that the actual proof of Theorem 1 does not precisely follow the foregoing argument. Instead, we locally approximate the graph construction with a certain branching process; we then study random walks on the tree resulting from this branching process. (This is necessary because the argument leading to (22) assumes instubs are paired with outstubs chosen uniformly at random, which is not true if outstubs are resampled in the construction from Section 3.1.) However, the foregoing argument illustrates the basic reason why the three distinct cases of Theorem 1 arise. We also observe that the argument leading to (22) shows why $\tilde{p}_{n}$ enters into our analysis. The other random variables defined in (13) enter similarly. Specifically, $\tilde{p}_{n}^{*}$ arises in almost the same manner, but pertains only to the first step of the walk; this distinction arises because the walk starts at $i^{*}$ , whose degrees enter $\tilde{p}_{n}^{*}$ . On the other hand, $\tilde{q}_{n}$ arises when we analyze the variance of agent estimates. This is because analyzing the variance involves studying two random walks; by an argument similar to (22), the probability of both walks visiting the same agent is

[TABLE]

Finally, we note that the proof of Theorem 1 reveals that the variance of each agent’s belief vanishes, so beliefs converge to Dirac measures. Combined with the theorem, this yields the following corollary. See Appendix 10 for a proof.

Corollary 1

Assume $G$ is the DCM and A1, A2, A3, and A4 hold. Let $L(p_{n})=L(\{p_{n}\}_{n=1}^{\infty},T_{n})$ denote the limit (in probability) of $\theta_{T_{n}}(i^{*})$ from Theorem 1. Then for any $p\geq 1$ (here $p$ denotes the order of the Wasserstein distance $W_{p}$ , not the limit of $p_{n}$ ) and for $i^{*}\sim A$ uniformly, $W_{p}(\mu_{T_{n}}(i^{*}),\delta_{L(p_{n})})\xrightarrow[n\rightarrow\infty]{\mathbb{P}}0$ .

3.3 Comments on assumptions

We now return to comment on the assumptions needed to prove our results. First, A1 states that certain empirical moments of the degree distribution – namely, for $i^{*}\sim A$ uniformly, the first two moments of $d_{out}(i^{*})$ and the correlation between $d_{out}(i^{*})$ and $d_{in}^{A}(i^{*})$ – converge to finite limits. Roughly speaking, this says our graph lies in a sparse regime, where typical node degrees do not grow with the number of nodes (analogous to, e.g., an Erdős-Rényi model with edge probability $\lambda/n$ for constant $\lambda>0$ , where degrees converge to $\textrm{Poisson}(\lambda)$ random variables). We also note that the condition $\nu_{3}>\nu_{1}$ in A1 is mild and simply eliminates an uninteresting case. To see this, first note that when $\Omega_{n,1}$ holds, we have (roughly)

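$$\nu_{3}\approx\frac{1}{n}\sum_{i=1}^{n}d_{out}(i)\,d_{in}^{A}(i)\geq\frac{1}{n}\sum_{i=1}^{n}d_{out}(i)\approx\nu_{1},$$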

where we have used the assumed inequality $d_{in}^{A}(i)\geq 1\ \forall\ i\in[n]$ . Hence, $\nu_{3}<\nu_{1}$ cannot occur, so assuming $\nu_{3}>\nu_{1}$ only prohibits $\nu_{3}=\nu_{1}$ . This remaining case is uninteresting because $\nu_{3}/\nu_{1}$ is the limiting number of offspring for each node in the branching process we analyze; thus, if $\nu_{3}=\nu_{1}$ , the tree resulting from this process is simply a line graph.

Next, A2 states $T_{n}=O(\log n)$ . Together with A1, these assumptions are standard given our analysis approach, which, as discussed previously, locally approximates the graph construction with a branching process. We also note that, with the interpretation of $\nu_{3}/\nu_{1}$ above, it follows that the number of agents within the $T_{n}$ -step neighborhood of $i^{*}$ is roughly

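$$\left(\frac{\nu_{3}}{\nu_{1}}\right)^{T_{n}}\leq\left(\frac{\nu_{3}}{\nu_{1}}\right)^{\zeta\log(n)/\log(\nu_{3}/\nu_{1})}=n^{\zeta}=o(n).$$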

In words, the size of the aforementioned neighborhood vanishes relative to $n$ . This is why our title refers to the learning as “local”: only a vanishing fraction of other agents (those within this neighborhood) affect the estimate of $i^{*}$ .
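The neighborhood-size bound is simple arithmetic. A short Python sketch, assuming (per the description of A1 above) that $\nu_{1}$ and $\nu_{3}$ are the limits of the empirical means of $d_{out}(i)$ and $d_{out}(i)d_{in}^{A}(i)$; the degrees and $\zeta=0.4$ are hypothetical. At the largest horizon A2 allows, $(\nu_{3}/\nu_{1})^{T_{n}}=n^{\zeta}$, so the fraction of agents reached decays like $n^{\zeta-1}$.

```python
import math

def empirical_nus(d_out, d_in_a):
    """Empirical counterparts of nu_1 = E[d_out] and nu_3 = E[d_out * d_in^A] (uniform i*)."""
    n = len(d_out)
    return sum(d_out) / n, sum(o * a for o, a in zip(d_out, d_in_a)) / n

def max_horizon(n, zeta, nu1, nu3):
    """Largest T_n permitted by A2: zeta * log(n) / log(nu3 / nu1)."""
    return zeta * math.log(n) / math.log(nu3 / nu1)

# Hypothetical degrees; every agent has d_in^A >= 1, so nu3 >= nu1.
nu1, nu3 = empirical_nus([2, 1, 3, 2], [1, 3, 2, 2])
zeta = 0.4
fractions = []
for n in (10**4, 10**6, 10**9):
    T = max_horizon(n, zeta, nu1, nu3)
    size = (nu3 / nu1) ** T         # approx. number of agents within T steps of i*
    fractions.append(size / n)      # equals n**(zeta - 1)
    print(n, round(T, 2), round(size), size / n)
```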

The remaining statements are needed to establish estimate convergence on the tree resulting from the branching process. A4 states $T_{n}\rightarrow\infty$ with $n$ , which is an obvious requirement for convergence. A3 essentially says that three events occur with high probability. First, $\tilde{p}_{n}$ should be close to a convergent, deterministic sequence $p_{n}$ ; this is necessary since the asymptotics of $p_{n}$ play a prominent role in Theorem 1. Second, $\tilde{p}_{n}^{*}\geq\tilde{p}_{n}$ essentially says that bots prefer to attach to agents with higher out-degrees, i.e. more influential agents; this is the behavior one would intuitively expect from bots aiming to disrupt learning. Third, $\tilde{q}_{n}<1-\xi\in(0,1)$ is satisfied if, for example, all agents have total in-degree at least two.

Finally, while we focused on the DCM in this section, our analytical approach is more general. At a high level, the key properties of the DCM we used are that most nodes’ $O(\log n)$ -step neighborhoods are treelike and “statistically similar,” which allows for a branching process coupling. Such couplings exist more generally, though this $O(\log n)$ scaling will be smaller for denser graphs, which makes $T_{n}$ smaller as well.

4 Adversarial setting

We next formalize the adversarial problem introduced in Section 1. We begin with some notation. Let $m_{n}=\sum_{i=1}^{n}d_{out}(i)$ , and (with slight abuse of notation relative to the previous section) define the function $\tilde{p}_{n}:\mathbb{N}_{0}^{n}\rightarrow[0,1]$ by

[TABLE]

which is simply $\tilde{p}_{n}$ , as defined in (13), viewed as a function of the bot in-degrees $d(i)\triangleq d_{in}^{B}(i)$ (we suppress the sub- and superscripts to avoid cumbersome notation). Given a budget $b_{n}\in\mathbb{N}$ , the adversary’s problem is then as follows:

[TABLE]

Thus, the adversary’s objective function only depends on the agent degrees $\{d_{out}(i),d_{in}^{A}(i)\}_{i\in[n]}$ (e.g. numbers of followers and followees on Twitter), and not the topology of the agent sub-graph. Consequently, the topology will play no role in this section, i.e. we do not require the DCM assumption. We reiterate that, by Theorem 1, solving (28) is equivalent to minimizing estimates asymptotically for the DCM. (More precisely, this only holds if the solution of (28) converges in the sense of A3. We are unsure whether this holds, but we view it as a minor technical point and leave it as an open problem.) For general graph topologies, we treat (28) as a nonrigorous but tractable surrogate for estimate minimization, and we will soon show empirically that this is a reasonable choice.
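Because the objective depends only on degrees, it is cheap to evaluate. A minimal Python sketch, assuming the form $\tilde{p}_{n}(d)=\frac{1}{m_{n}}\sum_{i}d_{out}(i)\frac{d_{in}^{A}(i)}{d_{in}^{A}(i)+d(i)}$, i.e. the out-degree-weighted average fraction of agent instubs discussed in Section 3; the helper name and the two-agent example are hypothetical, and exact rational arithmetic is used only so the toy values are easy to check.

```python
from fractions import Fraction

def p_tilde(d_out, d_in_a, d_bot):
    """Hypothetical helper: tilde p_n(d) for bot in-degrees d_bot.

    d_out[i]  : out-degree of agent i (influence)
    d_in_a[i] : agent in-degree of agent i (assumed >= 1)
    d_bot[i]  : number of bot edges pointing at agent i
    """
    m_n = sum(d_out)
    total = sum(Fraction(o * a, a + d) for o, a, d in zip(d_out, d_in_a, d_bot))
    return total / m_n

# Two agents: agent 0 is influential (r = 3), agent 1 is not (r = 1/2).
d_out, d_in_a = [3, 1], [1, 2]
print(p_tilde(d_out, d_in_a, [0, 0]))  # 1: no bots
print(p_tilde(d_out, d_in_a, [1, 0]))  # 5/8: one bot edge on agent 0
print(p_tilde(d_out, d_in_a, [0, 1]))  # 11/12: one bot edge on agent 1
```

With a budget of one bot edge, targeting the agent with the larger ratio $d_{out}/d_{in}^{A}$ lowers $\tilde{p}_{n}$ more, in line with the role of $r(i)$ in Section 4.2.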

4.1 Exact solution

First, we let $dom(\hat{p}_{n})=\{d\in\mathbb{N}_{0}^{n}:\sum_{i=1}^{n}d(i)=b_{n}\}$ and rewrite (28) as $\min_{d\in\mathbb{Z}^{n}}\hat{p}_{n}(d)$ , where

[TABLE]

In words, we incorporated the constraints from (28) into the objective; we also used the (obvious) fact that the solution of (28) satisfies the budget constraint with equality. The new objective $\hat{p}_{n}$ satisfies a certain discrete convexity property, which implies that $d$ minimizes $\hat{p}_{n}$ if and only if $\hat{p}_{n}(d)\leq\hat{p}_{n}(d+e_{i}-e_{j})$ for any $i,j$ pair. Hence, we can find the minimizer by iteratively replacing $d$ with $d+e_{i}-e_{j}$ until the objective stops decreasing. This approach is known as steepest descent [12, Section 10.1.1] and is provided in Algorithm 1. In Appendix 8.5, we show its runtime is $\Theta(n^{2})$ in the best case and $O(n^{2}b_{n})$ in the general case.
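The exchange-based search can be sketched as below. This is a minimal Python illustration, not the paper's Algorithm 1 verbatim: it enforces $d\geq 0$ and $\sum_{i}d(i)=b_{n}$ directly instead of through $\hat{p}_{n}$, scans all $(i,j)$ pairs in each iteration (so each iteration costs $\Theta(n^{2})$ operations), and checks the result against exhaustive enumeration on a hypothetical toy instance. The helper names are hypothetical.

```python
import itertools

def term(o, a, d):
    """Agent i's contribution o * a / (a + d) to m_n * tilde p_n(d); convex, decreasing in d."""
    return o * a / (a + d)

def objective(d_out, d_in_a, d):
    return sum(term(o, a, x) for o, a, x in zip(d_out, d_in_a, d))

def steepest_descent(d_out, d_in_a, budget):
    """Move one bot edge at a time (d -> d + e_i - e_j) while the objective decreases."""
    n = len(d_out)
    d = [0] * n
    d[0] = budget                              # any start that spends the full budget
    while True:
        best_delta, best_pair = -1e-12, None   # require a strict decrease
        for i in range(n):                     # i gains one bot edge
            add = term(d_out[i], d_in_a[i], d[i] + 1) - term(d_out[i], d_in_a[i], d[i])
            for j in range(n):                 # j loses one bot edge
                if j == i or d[j] == 0:
                    continue
                rem = term(d_out[j], d_in_a[j], d[j] - 1) - term(d_out[j], d_in_a[j], d[j])
                if add + rem < best_delta:
                    best_delta, best_pair = add + rem, (i, j)
        if best_pair is None:                  # no improving exchange: d is optimal
            return d
        i, j = best_pair
        d[i] += 1
        d[j] -= 1

def exhaustive(d_out, d_in_a, budget):
    """Brute-force reference for tiny instances."""
    feasible = (d for d in itertools.product(range(budget + 1), repeat=len(d_out))
                if sum(d) == budget)
    return min(feasible, key=lambda d: objective(d_out, d_in_a, d))

d_out, d_in_a, budget = [5, 1, 3, 8], [2, 1, 4, 1], 6   # hypothetical toy instance
d_sd = steepest_descent(d_out, d_in_a, budget)
d_ex = exhaustive(d_out, d_in_a, budget)
print(d_sd, objective(d_out, d_in_a, d_sd), objective(d_out, d_in_a, d_ex))
```

Because each term $d\mapsto d_{out}(i)d_{in}^{A}(i)/(d_{in}^{A}(i)+d)$ is convex, an allocation that no single exchange improves is globally optimal, which is why the loop may stop at the first such point.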

4.2 Approximation algorithm

Algorithm 1’s $\Omega(n^{2})$ runtime is prohibitive for massive networks like Twitter, which motivates our approximation scheme. The idea is to first solve the relaxed problem

[TABLE]

and then to sample bot locations in proportion to the relaxed solution. More formally, our approximate solution $d_{n}^{rand}$ is constructed via Algorithm 2. We note that by definition, the budget constraint holds with equality for Algorithm 2. Also, as shown in Appendix 8.1, the solution of (30) is

[TABLE]

where $x_{+}=x\,\mathbb{1}(x>0)$ , $r(i)=d_{out}(i)/d_{in}^{A}(i)\ \forall\ i\in[n]$ , $h^{*}=\max_{x\in\mathbb{R}_{+}}h(x)$ , and

[TABLE]

This randomized scheme yields useful insights, in contrast to the optimal algorithm. In particular, the randomized and relaxed solutions $d_{n}^{rand}$ and $d_{n}^{rel}$ are equal in expectation, and the relaxed solution $d_{n}^{rel}$ satisfies some intuitive properties:

•

$d_{n}^{rel}(i)$ grows with $r(i)=d_{out}(i)/d_{in}^{A}(i)$ , i.e. the adversary targets agents $i$ with large $d_{out}(i)$ and small $d_{in}^{A}(i)$ under the relaxed solution. Here large $d_{out}(i)$ means $i$ is influential (e.g. $i$ has many Twitter followers), while small $d_{in}^{A}(i)$ means $i$ is susceptible to influence (e.g. $i$ has few Twitter followees, so bot tweets will appear prominently in $i$ ’s Twitter feed).

•

If $r(i)<(h^{*})^{2}$ , then $d_{n}^{rel}(i)=d_{n}^{rand}(i)=0$ . Hence, if $i$ is sufficiently non-influential, and/or sufficiently non-susceptible, targeting $i$ gives no value to the adversary.

•

If $r(i)=r(j)>(h^{*})^{2}$ , the relaxed solution yields

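$$\frac{d_{n}^{rel}(i)}{d_{in}^{A}(i)+d_{n}^{rel}(i)}=\frac{d_{n}^{rel}(j)}{d_{in}^{A}(j)+d_{n}^{rel}(j)}.$$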

This can be interpreted as follows: the adversary strives for a similar proportion of fake news in the feeds of users with similar ratios of influence to susceptibility.

In short, our approximate solution strives to balance influence and susceptibility. While somewhat intuitive, the precise manner in which this balance occurs (in particular, the form of (31)-(32)) is far from obvious.
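The relaxed solution and the sampling step can be sketched as follows. This is a minimal Python sketch under two stated assumptions: (i) the relaxed problem (30) minimizes $\tilde{p}_{n}(d)$ over real $d\geq 0$ with $\sum_{i}d(i)\leq b_{n}$, whose KKT conditions give the water-filling form $d(i)=d_{in}^{A}(i)(\sqrt{r(i)}/h-1)_{+}$, consistent with the threshold $(h^{*})^{2}$ above; we locate $h$ by bisection rather than through the function $h(x)$ used in the paper; and (ii) Algorithm 2 draws $W_{1},\ldots,W_{b_{n}}$ i.i.d. with $\mathbb{P}(W_{j}=i)\propto d_{n}^{rel}(i)$, which matches the statement that $d_{n}^{rand}$ and $d_{n}^{rel}$ agree in expectation. Function names and the toy instance are hypothetical.

```python
import math
import random
from collections import Counter

def relaxed_solution(d_out, d_in_a, budget, iters=200):
    """Water-filling sketch: d(i) = d_in^A(i) * (sqrt(r(i)) / h - 1)_+ with sum(d) = budget."""
    r = [o / a for o, a in zip(d_out, d_in_a)]

    def alloc(h):
        return [a * max(math.sqrt(ri) / h - 1.0, 0.0) for a, ri in zip(d_in_a, r)]

    hi = math.sqrt(max(r))          # at h = sqrt(max r), every allocation is zero
    lo = hi
    while sum(alloc(lo)) < budget:  # shrink lo until the allocation covers the budget
        lo /= 2.0
    for _ in range(iters):          # bisection keeps sum(alloc(lo)) >= budget > sum(alloc(hi))
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) >= budget:
            lo = mid
        else:
            hi = mid
    return alloc(hi), hi            # hi approximates h*

def randomized_solution(d_out, d_in_a, budget, seed=0):
    """Sample W_1..W_b i.i.d. with P(W_j = i) proportional to d_rel(i); count hits per agent."""
    d_rel, _ = relaxed_solution(d_out, d_in_a, budget)
    rng = random.Random(seed)
    w = rng.choices(range(len(d_rel)), weights=d_rel, k=budget)
    hits = Counter(w)
    return [hits[i] for i in range(len(d_rel))]

d_out, d_in_a, budget = [10, 1, 6, 2], [2, 4, 3, 1], 4   # r = [5, 0.25, 2, 2]
d_rel, h_star = relaxed_solution(d_out, d_in_a, budget)
d_rand = randomized_solution(d_out, d_in_a, budget, seed=0)
print([round(x, 3) for x in d_rel], round(h_star ** 2, 3), d_rand)
```

In this toy instance the agent with index 1 has $r=0.25<(h^{*})^{2}\approx 1.03$ and receives no bots, while the agents with indices 2 and 3 share $r=2$ and end up with the same bot fraction $d(i)/(d_{in}^{A}(i)+d(i))$.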

In Appendix 8.5, we show Algorithm 2 has complexity $O(n\log n+b_{n})$ . In terms of accuracy, we next prove that with high probability, Algorithm 2 is a $(2+\delta)$ -approximation algorithm for the constrained problem $\max_{d\in\mathbb{N}_{0}^{n}:\sum_{i}d(i)\leq b_{n}}(1-\tilde{p}_{n}(d))$ , which is equivalent to (28). More precisely, letting $d_{n}^{opt}$ be any solution of (28), i.e.

[TABLE]

we have the following result.

Theorem 2

Let $\delta>0$ and $c_{\delta}=\frac{\delta^{2}}{4(2+\delta)^{2}}$ . Then

[TABLE]

Proof 4.3.

As mentioned above, Appendix 8.1 shows (31) solves (30) (the proof amounts to verifying KKT conditions, see e.g. [13, Section 5.5.3]). Hence, by definition (34),

[TABLE]

We next rewrite $1-\tilde{p}_{n}(d_{n}^{rand})$ in terms of the random vector $W=(W_{j})_{j=1}^{b_{n}}$ from Algorithm 2. Toward this end, let $\bar{r}=\max_{j\in[n]}r(j)$ , and for $w=(w_{j})_{j=1}^{b_{n}}\in[n]^{b_{n}}$ define

[TABLE]

Then a simple calculation yields

[TABLE]

and using Jensen’s inequality, one can show

[TABLE]

(see Appendix 8.2 for details.) Combining (37)-(40),

[TABLE]

Also, using (40) and recalling $\bar{r}=\max_{j\in[n]}r(j)$ , we have

[TABLE]

By the previous two lines, the following implies the theorem:

[TABLE]

Such an inequality would follow from a simple Hoeffding bound if $g_{n}(W)$ were simply $\sum_{j}W_{j}$ ; however, $g_{n}(W)$ is a much more complicated function. Fortunately, $g_{n}$ belongs to a special class called self-bounding functions [14, Section 3.3], for which concentration inequalities of the form (43) are known. See Appendix 8.3 for details.

The tail bound in Theorem 2 is opaque, as it relies on $\tilde{p}_{n}(d_{n}^{rel})$ , which (in general) is difficult to interpret. Under certain assumptions, we can obtain more transparent results. For example, we have the following corollary.

Corollary 4.4.

Let $\bar{r}=\max_{j\in[n]}r(j)$ as above. Assume $\lim_{n\rightarrow\infty}b_{n}=\infty$ and for some $\epsilon>0$ independent of $n$ ,

[TABLE]

Then $\exists\ \{\delta_{n}\}_{n\in\mathbb{N}}\subset(0,\infty)$ s.t. $\lim_{n\rightarrow\infty}\delta_{n}=0$ and

[TABLE]

Proof 4.5.

Since $d_{n}^{rel}$ solves (30), we can weaken the bound in Theorem 2 by replacing $\tilde{p}_{n}(d_{n}^{rel})$ with $\tilde{p}_{n}(d)$ for any $d\in\mathbb{R}_{+}^{n}$ with $\sum_{i}d(i)\leq b_{n}$ . Thus, the proof chooses a particular $d$ that leads to a more tractable bound, and the assumptions ensure this bound vanishes. See Appendix 8.4 for details.

In words, the corollary shows our randomized scheme is (asymptotically) a $2$ -approximation algorithm with probability tending to $1$ . The assumption (44) only precludes the case where only finitely many of the degree ratios $r(i)$ are comparable to the maximum $\bar{r}$ . This restriction arises because our self-bounding concentration analysis in Theorem 2 requires normalization by $\bar{r}$ (see Appendix 8.3).

4.3 Empirical results

A fundamental assumption in our adversary solutions is that $\tilde{p}_{n}$ and $\theta_{T_{n}}(i^{*})$ are correlated, in the sense that minimizing $\tilde{p}_{n}$ also minimizes $\theta_{T_{n}}(i^{*})$ . While Theorem 1 states this correlation holds for the random graph model of Section 3.1, it is unclear if this correlation occurs in practice. To conclude this section, we present empirical results suggesting that this indeed occurs. In our experiments, we compare our proposed solutions against some natural heuristics:

•

A naive baseline, which uses Algorithm 2 but samples each $W_{j}$ uniformly from $[n]$ .

•

Three schemes which similarly use Algorithm 2, along with the observed degrees: sampling $W_{j}$ proportional to $d_{out}$ (i.e. targeting influential nodes), $d_{in}^{A}$ (i.e. targeting susceptible nodes), and $d_{out}/d_{in}^{A}$ (i.e. naively balancing the two).

•

Sampling $W_{j}$ proportional to $\textrm{PageRank}(\epsilon)$ [15] (in experiments, we compute the first $\lceil\log(0.01)/\log(1-\epsilon)\rceil$ summands, which guarantees an $l_{1}$ error bound of $0.01$ ), where

[TABLE]

where $\epsilon\in(0,1)$ , $\mathbf{1}_{n}$ is the length- $n$ ones vector, and $P_{A}$ is the agent sub-graph’s column-normalized adjacency matrix, i.e. the matrix with $(i,j)$ -th element

[TABLE]

PageRank is a commonly-used measure of influence or centrality for graphs in many domains [16] (and a richer such measure than $d_{out}$ ).
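A truncated power-series computation of the kind described above might look as follows. This Python sketch assumes the standard series form $\textrm{PageRank}(\epsilon)=\frac{\epsilon}{n}\sum_{k\geq 0}(1-\epsilon)^{k}P_{A}^{k}\mathbf{1}_{n}$ (the paper's display is not reproduced here), and the adjacency matrix is a hypothetical toy example. When every column of $P_{A}$ sums to one, the first $K$ summands carry total mass $1-(1-\epsilon)^{K}$, so $K=\lceil\log(0.01)/\log(1-\epsilon)\rceil$ summands leave an $l_{1}$ error of at most $0.01$; the number of summands, and hence the runtime, grows as $\epsilon$ shrinks.

```python
import math

def column_normalize(adj):
    """Divide each column by its sum (columns summing to zero are left as zeros)."""
    n = len(adj)
    col = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]

def truncated_pagerank(P, eps, err=0.01):
    """Sum the first K terms of (eps / n) * sum_k (1 - eps)^k P^k 1_n."""
    n = len(P)
    K = math.ceil(math.log(err) / math.log(1.0 - eps))
    term = [eps / n] * n                    # k = 0 summand
    total = term[:]
    for _ in range(1, K):
        # next summand: (1 - eps) * P @ previous summand
        term = [(1.0 - eps) * sum(P[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [s + t for s, t in zip(total, term)]
    return total, K

adj = [[0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 1, 0]]                        # hypothetical; every column has a nonzero entry
P_A = column_normalize(adj)
scores, K = truncated_pagerank(P_A, eps=0.15)
print(K, scores, 1 - sum(scores))
```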

We compare our proposed solutions with these heuristics using four datasets from [17], described in Table 1. We chose these datasets so we could test our proposed solutions on real social networks of two scales: Gnutella and Wiki-Vote have $n<10^{4}$ , a scale at which the exact solution Algorithm 1 is feasible; Pokec and LiveJournal have $n>10^{6}$ , a scale that renders Algorithm 1 infeasible but that more closely resembles social networks of interest. For the experiments, we set $\theta=0.5$ (to maximize signal variance), $\eta=0.9$ (to emphasize the effect of the network), and $T_{n}=101$ (to ensure the code had reasonable runtime). We let $b_{n}=\lceil|E_{n}|/400\rceil$ , so that 0.25% of all agent in-edges are connected to bots. For each graph and each of five experimental trials, we chose $\{d_{in}^{B}(i)\}_{i\in[n]}$ as described above, added bots to the original graph accordingly, and simulated the learning process from Section 2.

In Figure 1, we plot the mean and standard deviation (across experimental trials) of $\theta_{t}(i^{*})$ as a function of $t$ . For all datasets, our proposed solutions outperform all heuristics, in the sense that our solutions yield the lowest average $\theta_{t}(i^{*})$ for most values of $t$ . Furthermore, we note the following:

•

Across all graphs, our solutions outperform $\textrm{PageRank}(\epsilon)$ for all values of $\epsilon$ tested. This is quite surprising, because PageRank uses the entire graph topology, whereas our solutions only use degree information. Also, $\textrm{PageRank}(\epsilon)$ performs better as $\epsilon$ decreases, but this comes at the cost of a higher runtime to estimate $\textrm{PageRank}(\epsilon)$ .

•

Among the heuristics using (at most) degree information, $d_{out}/d_{in}^{A}$ performs best – but still worse than Algorithm 2 – across all datasets. Put differently, naively balancing influence and susceptibility is not enough; the non-obvious form of Algorithm 2 yields better performance.

•

For Gnutella and Wiki-Vote, Algorithm 1 noticeably outperforms Algorithm 2. Though the former is an exact solution and the latter is an approximation, this is still surprising, since it is unclear that these schemes are even optimizing the correct objective for real graphs.

While Figure 1 only considers one choice of $b_{n}$ , we believe our conclusions are robust. In particular, we also tested the cases $b_{n}=\lceil\tilde{b}|E_{n}|\rceil$ for each $\tilde{b}\in\{\frac{1}{1600},\frac{1}{800},\frac{1}{400},\frac{1}{200},\frac{1}{100}\}$ , so that between $\approx 0.0625\%$ and $\approx 1\%$ of edges are connected to bots (thus, Figure 1 shows the intermediate case $\tilde{b}=\frac{1}{400}$ ). Appendix 8.6 contains a figure analogous to Figure 1 for the other choices of $\tilde{b}$ ; the plots are qualitatively similar.
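The budgets $b_{n}=\lceil\tilde{b}|E_{n}|\rceil$ can be tabulated directly. A Python sketch, assuming $|E_{n}|$ equals the edge count listed in Table 1 for each dataset; integer ceiling division avoids floating-point rounding for the large graphs.

```python
# Edge counts from Table 1 (assumed to be the |E_n| used for b_n).
EDGES = {"Gnutella": 20_777, "Wiki-Vote": 103_689,
         "Pokec": 30_622_564, "LiveJournal": 68_993_773}
DENOMS = (1600, 800, 400, 200, 100)   # b_tilde = 1 / denom

def bot_budget(num_edges, denom):
    """b_n = ceil(num_edges / denom), computed with integers only."""
    return -(-num_edges // denom)

budgets = {name: [bot_budget(e, k) for k in DENOMS] for name, e in EDGES.items()}
for name, row in budgets.items():
    print(name, row)
```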

We have thus far shown that our solutions outperform heuristics, even those using graph topology. This is quite surprising: our solutions were derived under the fundamental assumption that minimizing $\theta_{T_{n}}(i^{*})$ amounts to minimizing $\tilde{p}_{n}$ , but we only verified this assumption asymptotically for a class of random graphs. Thus, our empirical results suggest that even for real social networks, this assumption holds. Indeed, in Figure 2 we show scatter plots of $\theta_{T_{n}}(i^{*})$ against $\tilde{p}_{n}$ (each dot represents one experimental trial). For all datasets, the two quantities are closely correlated.

5 Related work

As discussed in Section 2, (3) resembles the non-Bayesian social learning model from [4], which uses belief update

[TABLE]

where $\sum_{j}\eta_{ij}=1$ , $\omega_{t}(i)$ is a signal, and BU means Bayesian update. Hence, in both models agents perform Bayesian updates and then average, but the averaging is over beliefs in [4] and over parameters in this work. The main advantage of the latter is that beliefs remain Beta distributions, which simplifies our analysis. This simplification, along with weights $\eta/d_{in}(j)$ instead of (48), is needed since we consider a finite horizon and a graph which need not be connected, in contrast to [4]. Another distinction is that agents in [4] cannot learn the true state individually, and need the network for learning. In contrast, agents in our work can learn in isolation (simply by averaging their signals), so the network can either speed up learning or be a detriment. Here we highlight the detriment, which is relevant to platforms like Twitter, where users who could have read accurate news in isolation instead risk exposure to bots.

Our parameter update is also studied in [6], which features bots defined in a slightly different manner but in the same spirit. However, [6] only includes theoretical results in the case $B=\emptyset$ ; the case $B\neq\emptyset$ is studied empirically. This allowed [6] to use a slightly richer model, including a time-varying graph and agent-dependent mixture parameters $\sum_{j\in N_{in}(i)\cup\{i\}}\eta_{ij}$ . Notably, the empirical results from [6] fix a learning horizon and do not investigate the effects of different timescales; in particular, the delicate relationship between timescale and bot prevalence from Theorem 1 is not brought to light. Beyond stubborn agents, [18, 19] propose different non-Bayesian updates to cope with Byzantine agents with arbitrary behavior.

From an analytical perspective, our approach of analyzing estimates by studying random walks is similar to the deGroot model [10]. Here the estimate vector $\theta_{t}=\{\theta_{t}(i)\}_{i}$ is updated as $\theta_{t}=\theta_{t-1}W$ for some column-stochastic matrix $W$ . Hence, $\theta_{t}=\theta_{0}W^{t}$ , so $i$ ’s belief is determined by the distribution of a $t$ -step random walk from $i$ . This observation has been exploited in the literature; see the surveys [20, Section 3] and [21, Section 4], and the references therein. For example, assuming $W$ is irreducible and aperiodic, and therefore has a well-defined stationary distribution $\pi$ , [7] establishes conditions for learning using the fact that $\theta_{t}(i)=\theta_{0}W^{t}e_{i}^{\mathsf{T}}\approx\theta_{0}\pi^{\mathsf{T}}\ \forall\ i$ when $t$ is large. Roughly speaking, our model combines deGroot-like averaging with exogenous unbiased signals. As discussed, the averaging in our case exposes agents to biased beliefs (due to bots); the resulting tension between biased and unbiased information is a key feature in our model not present in deGroot’s. Ours is arguably a richer model of platforms like Twitter, where there is a similar tension between legitimate news and bots. Beyond the deGroot model, agents in [22] perform Bayesian updates using the prior of a randomly-chosen neighbor, which yields a different connection to random walks; assuming strong connectedness, the authors exploit the fact that the walk visits every agent infinitely often (i.o.) to derive conditions for learning.
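The random-walk view of the deGroot update can be illustrated numerically. This Python sketch uses a hypothetical $3\times 3$ column-stochastic matrix with positive entries (hence irreducible and aperiodic), iterates $\theta_{t}=\theta_{t-1}W$, and compares every agent's estimate with $\theta_{0}\pi^{\mathsf{T}}$, where $\pi$ solves $W\pi^{\mathsf{T}}=\pi^{\mathsf{T}}$ and is found by power iteration. The helper names are hypothetical.

```python
def degroot_step(theta, W):
    """theta_t = theta_{t-1} W: agent i averages theta_{t-1}(j) with weights W[j][i]."""
    n = len(theta)
    return [sum(theta[j] * W[j][i] for j in range(n)) for i in range(n)]

def stationary(W, iters=500):
    """Power iteration for pi with W pi^T = pi^T (W column-stochastic)."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(W[i][j] * pi[j] for j in range(n)) for i in range(n)]
    return pi

# Hypothetical column-stochastic W with positive entries (irreducible, aperiodic).
W = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.3],
     [0.2, 0.3, 0.4]]
theta = [1.0, 0.0, 0.5]
target = sum(t * p for t, p in zip(theta, stationary(W)))   # theta_0 pi^T
for _ in range(200):
    theta = degroot_step(theta, W)
print(theta, target)   # every agent approaches the same value theta_0 pi^T
```

Convergence here relies on global mixing of the walk; this is exactly the property that is unavailable within the local horizon $T_{n}$ of our model.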

Similar to [4], the papers of the previous paragraph typically assume strong connectedness and long learning horizons so as to leverage properties such as stationary distributions and i.o. visits. This is a fundamental distinction from our work. Indeed, even if we disregard stubborn agents, the random walk converges to a stationary distribution, but it does not converge within our local learning horizon. This is because, as shown in [23], the DCM we consider has mixing time that exceeds

[TABLE]

where we used Jensen’s inequality and (25). The right side exceeds $T_{n}$ by A2, i.e. our learning horizon occurs before the underlying random walk mixes. In fact, [23] shows that the random walk on the DCM exhibits cutoff, meaning that the $T_{n}$ -step distribution of this walk can be maximally far from the stationary distribution (i.e. the total variation distance between these distributions can be 1 for certain starting locations of the walk). Hence, not only can we not use this stationary distribution, we cannot even use an approximation of it. Again, this means our analysis cannot leverage global properties typically used when relating estimates to random walks. We circumvent this using the DCM, which has a well-behaved local structure. We also note that our idea to simultaneously construct the graph and sample the walk is taken from [23].

Some other works have considered social learning with stubborn agents. For example, [8] studies a model in which agents meet and either retain their own estimates, adopt the average of their estimates, or adopt a weighted average; the agent whose estimate has a larger weight is called a “forceful” agent. Here the authors show that all agent estimates converge to a common random variable and study its deviation from the true state. A crucial difference between this work and ours is that [8] assumes even forceful agents occasionally observe other agents’ opinions. This yields an underlying Markov chain that is irreducible (unlike ours); the analysis then relies on this chain having a well-defined stationary distribution.

Stubborn agents have also been considered in the consensus setting [24], which asks whether agent estimates converge to a common value, i.e. a consensus. For example, [25] considers a model in which regular agents adopt weighted averages of estimates upon meeting other agents, while stubborn agents always retain their own estimates. This intuitively prohibits a consensus from forming; indeed, it is shown that agent estimates fail to converge, i.e. disagreement can persist indefinitely. Another example is [26], in which an agent’s estimate at time $t+1$ is a weighted average of their own estimate at time 0 and their neighbors’ estimates at time $t$ . In this model, stubborn agents place all weight on their own estimate from time 0 and thus do not update their estimates. The analysis in [26] is similar to ours as it relates agent estimates to hitting probabilities of the stubborn agent set, but it differs as the learning horizon is infinite in [26]. Also in the consensus setting, [27] investigates protocols for robust consensus that may lessen the undesirable effects of stubborn agents.

The problem of deploying stubborn agents is studied in [28, 29], though for the voter model. Both assume knowledge of a matrix describing the graph topology (like $P_{A}$ from Section 4.3), and the optimization requires inverting this matrix, at $O(n^{3})$ cost. Our algorithms overcome both of these issues. We also note this inversion is common in more general influence maximization settings.

Without stubborn agents, [30] considers a non-Bayesian update for infinite horizons, where agents treat neighbors’ beliefs as independent. Convergence rates are provided in [9, 31, 32] for (3) or similar Bayesian-plus-aggregation updates. An open question is how these models behave with stubborn agents, particularly for [9, 31, 32], where the convergence may be slower than the propagation of stubborn agent bias.

6 Special case

While Theorem 1 establishes convergence for the estimate of a typical agent, a natural question to ask is how many agents have convergent estimates. Our second result, Theorem 6.6, provides a partial answer to this question. To prove the result, we require slightly stronger assumptions than those required for Theorem 1 (we will return shortly to comment on why these are needed). First, we strengthen A1 and A3 to include particular rates of convergence for the probabilities $\mathbb{P}(\Omega_{n,i}),i\in\{1,2\}$ . Second, we strengthen A4 with a minimum rate at which $T_{n}\rightarrow\infty$ (specifically, $T_{n}=\Omega(\log n)$ ). Third, and perhaps most restrictively, we require $p_{n}\rightarrow p<1$ in A3. As a result, Theorem 6.6 only applies to the case $T_{n}(1-p_{n})\rightarrow\infty$ , for which Theorem 1 states the estimate of a uniform agent converges to zero. In this setting, Theorem 6.6 provides an upper bound on how many agents’ estimates do not converge to zero. In particular, this bound is $O(n^{k})$ for some $k<1$ .

Theorem 6.6.

Assume $\exists\ \kappa,\mu>0$ and $N^{\prime}\in\mathbb{N}$ independent of $n$ s.t. the following hold:

•

A1, with $\mathbb{P}(\Omega_{n,1}^{c})=O(n^{-\kappa})$ .

•

A2.

•

A3, with $\mathbb{P}(\Omega_{n,2}^{c})=O(n^{-\kappa})$ and $p<1$ .

•

A4, with $T_{n}\geq\mu\log n\ \forall\ n\geq N^{\prime}$ .

Then for any $\epsilon>0$ , $k>1-\min\{(1/2)-\zeta,\mu(\epsilon\eta(1-p)/\theta)^{2}/16,\kappa\}$ , and $K>0$ , all independent of $n$ ,

[TABLE]

We reiterate that $\zeta<1/2$ by A2 and $\mu,\kappa>0$ by the theorem statement. Hence, $\min\{(1/2)-\zeta,\mu(\epsilon\eta(1-p)/\theta)^{2}/16,\kappa\}>0$ , so one can choose $k<1$ in Theorem 6.6 to show that the size of the non-convergent set of agents vanishes relative to $n$ . We suspect that such a result is the best one could hope for; in particular, we suspect that showing all agent estimates converge to zero is impossible. This is in part because our assumptions do not preclude the graph from being disconnected. Hence, there may be small connected components composed of agents but no bots; in such components, agent estimates will converge to $\theta$ (not zero). Additionally, while the lower bound for $k$ in Theorem 6.6 is somewhat unwieldy, certain terms are easily interpretable: the bound sharpens as $\eta$ grows (i.e. as agents place less weight on their unbiased signals), as $p$ decays (i.e. as the number of bots grows), and as $\theta$ decays (i.e. as signals are more likely to be zero, pushing estimates to zero).
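The lower bound on $k$ is easy to evaluate. The Python sketch below computes it for hypothetical parameter values (not taken from the paper) and confirms the monotonicity just described: decreasing $p$ or increasing $\eta$ lowers the admissible $k$. The function name is hypothetical.

```python
def k_threshold(zeta, mu, eps, eta, p, theta, kappa):
    """1 - min{1/2 - zeta, mu * (eps * eta * (1 - p) / theta)^2 / 16, kappa} from Theorem 6.6."""
    rate = mu * (eps * eta * (1.0 - p) / theta) ** 2 / 16.0
    return 1.0 - min(0.5 - zeta, rate, kappa)

# Hypothetical parameter values, chosen only to illustrate the formula.
base = dict(zeta=0.4, mu=2.0, eps=0.1, eta=0.9, p=0.5, theta=0.5, kappa=1.0)
k0 = k_threshold(**base)
k_more_bots = k_threshold(**{**base, "p": 0.2})        # smaller p: more bots
k_more_network = k_threshold(**{**base, "eta": 1.0})   # larger eta: more weight on network
print(k0, k_more_bots, k_more_network)
```

With these values the middle term of the minimum is binding, so $k$ must exceed roughly $0.999$: the guaranteed non-convergent set is only barely sublinear, although it still vanishes relative to $n$.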

As for Theorem 1, the proof of Theorem 6.6 is outlined in Appendix 7, with details provided in Appendix 9. The crux of the proof is obtaining a sufficiently fast rate for the convergence in Theorem 1; namely, we show that for some $\gamma>0$, $\mathbb{P}(\theta_{T_{n}}(i^{*})>\epsilon)=O(n^{-\gamma})$. (Footnote 10: One may wonder why we derive a separate bound for Theorem 6.6, since we have already bounded $\mathbb{P}(\theta_{T_{n}}(i^{*})>\epsilon)$ to prove Theorem 1. The reason is that the bound for Theorem 1 does not decay quickly enough as $n\rightarrow\infty$ to prove Theorem 6.6; on the other hand, the bound for Theorem 6.6 does not decay at all as $n\rightarrow\infty$ in the case $T_{n}(1-p_{n})\rightarrow c\in[0,\infty)$ and therefore cannot be used for all cases of Theorem 1. See Appendix 7.4.2 for details.) At a high level, obtaining such a bound requires bounding three probabilities by $O(n^{-\gamma})$, which also helps explain the stronger assumptions of Theorem 6.6:

•

As for Theorem 1, we first locally approximate the graph construction with a branching process so as to analyze the estimates on a tree. Here strengthening A1 with $\mathbb{P}(\Omega_{n,1})=O(n^{-\kappa})$ is necessary to ensure this approximation fails with probability at most $O(n^{-\gamma})$ .

•

To analyze the estimates on a tree, we first condition on the random tree structure and treat the estimate as a weighted sum of i.i.d. signals using an approach similar to Hoeffding’s inequality. Namely, we obtain the Hoeffding-like tail $O(e^{-2\epsilon^{2}T_{n}})$ ; strengthening A4 with $T_{n}\geq\mu\log n$ is necessary to show this tail is $O(e^{-2\epsilon^{2}\mu\log n})=O(n^{-2\epsilon^{2}\mu})=O(n^{-\gamma})$ .

•

Finally, after conditioning on the tree structure, we show this structure is close to its mean. More specifically, letting $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ denote the expected estimate for the root node in the tree conditioned on the random tree structure (see Appendix 7 for details), we show

[TABLE]

Note the only source of randomness in $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ is the random tree; because this tree is recursively generated, it has a martingale-like structure that can be analyzed using an approach similar to the Azuma-Hoeffding inequality for bounded-difference martingales. Here we require $\mathbb{P}(\Omega_{n,2})=O(n^{-\kappa})$ to ensure the degree sequence is ill-behaved with probability at most $O(n^{-\gamma})$ ; we also require $p_{n}\rightarrow p<1$ in this step (and only in this step).

We now address the most notable difference between Theorems 1 and 6.6; namely, that the latter only applies when $p_{n}\rightarrow p<1$ . We believe this reflects a fundamental distinction between the cases $p_{n}\rightarrow p<1$ and $p_{n}\rightarrow 1$ and is not an artifact of our analysis. An intuitive reason for this is that more bots are present in the former case, so fewer random signals are present (recall we model bot signals as being deterministically zero). As a result, $\theta_{T_{n}}(i^{*})$ is “less random”, so its concentration around its mean is stronger. Towards a more rigorous explanation, we first note that Appendix 7.4.1 provides the following condition for extending Theorem 6.6 to other cases of $p_{n}$ :

[TABLE]

where $L(p_{n})$ is the limit from Theorem 1 based on the relative asymptotics of $T_{n}$ and $p_{n}$ , i.e.

[TABLE]

It is the convergence of $|\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]-L(p_{n})|$ in (52) that we suspect is fundamentally different in the cases $p_{n}\rightarrow p<1$ and $p_{n}\rightarrow 1$. To illustrate this, we provide empirical results in Figure 3. In the leftmost plot, we show $1-\tilde{p}_{n}$ versus $T_{n}$; since the plot is on a log-log scale, a line with slope $m$ means $(1-\tilde{p}_{n})\propto T_{n}^{m}$. Hence, we are comparing four cases: $m\approx 0$, so that $p_{n}\approx p<1$ (blue circles); $m\approx-0.5$, so that $T_{n}(1-p_{n})\rightarrow\infty$ and $p_{n}\rightarrow 1$ (orange squares); $m\approx-1$, so that $T_{n}(1-p_{n})$ tends to a positive constant (yellow diamonds); and $m\approx-1.5$, so that $T_{n}(1-p_{n})\rightarrow 0$ (purple triangles). The second plot shows the corresponding cases of $L(p_{n})$: $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ decays to zero in the first two cases, grows towards $\theta=0.5$ in the fourth case, and approaches an intermediate limit in the third case. The final two plots illustrate the convergence (or lack thereof) in (52). Here the empirical mean of the error term $|\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]-L(p_{n})|$ decays quickly in the first case but more slowly (or even non-monotonically) in the other cases. More strikingly, the empirical variance of this error term is several orders of magnitude smaller in the first case. This suggests that $\mathbb{P}(|\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]-L(p_{n})|>\epsilon)$ decays much more rapidly when $p_{n}\rightarrow p<1$, which is why we believe this is the only case in which (52) is satisfied.
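The four regimes in Figure 3 are distinguished by the log-log slope $m$ of $1-\tilde{p}_{n}$ against $T_{n}$. As a small illustration (not the code used to produce the figure), the sketch below fits $m$ by least squares for exact power laws $1-p_{n}=cT_{n}^{m}$ with a hypothetical constant $c=0.3$, and reports the implied value of $T_{n}(1-p_{n})=cT_{n}^{1+m}$ at the largest horizon.

```python
import numpy as np

def loglog_slope(T, one_minus_p):
    """Least-squares slope m of log(1 - p_n) versus log(T_n)."""
    m, _intercept = np.polyfit(np.log(T), np.log(one_minus_p), 1)
    return float(m)

T = np.arange(2, 12, dtype=float)   # T_n in {2, ..., 11}, as in the experiment
c = 0.3                             # hypothetical constant
for m_true in (0.0, -0.5, -1.0, -1.5):
    m_hat = loglog_slope(T, c * T ** m_true)
    # T_n (1 - p_n) = c T_n^(1 + m): grows if m > -1, constant if m = -1, decays if m < -1
    print(f"m = {m_hat:+.2f}, T_n(1 - p_n) at T_n = 11: {c * 11 ** (1 + m_hat):.3f}")
```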

In addition to the summary statistics shown in Figure 3, we also show histograms of the error term $|\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]-L(p_{n})|$ across the 400 trials in Figure 4. As discussed above, this term must converge to zero (in probability) at a sufficiently fast rate to prove Theorem 6.6. In Figure 4, these histograms appear to converge quickly to a point mass at zero in the case $p_{n}\rightarrow p<1$; in the other cases, such behavior does not occur, further suggesting a fundamental difference between the cases.

We note here that the basic workflow of the experiment above was as follows:

•

Choose a sequence of time horizons $T_{n}$ that increase linearly, then set $n$ accordingly.

•

Realize the degrees $\{d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\}_{i\in[n]}$ after selecting $n$ .

•

Define the empirical distributions $f_{n},f_{n}^{*}$ using the degrees as in (11).

•

Evaluate the quantity of interest $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ empirically via (87) using $f_{n},f_{n}^{*}$.

We repeated this experiment 400 times to obtain 400 samples of $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ ; the plots in Figure 3 show empirical means and variances across these 400 samples. We used the following parameters:

•

We set $\eta=0.9$ to emphasize the effect of the network.

•

We let $d_{in}^{A}(i)=1+\textrm{Poisson}(\lambda_{A}-1)\ \forall\ i\in[n]$ , so that $\mathbb{E}[d_{in}^{A}(i)]=\lambda_{A}$ ; we choose $\lambda_{A}$ independent of $n$ so that $\mathbb{E}[d_{in}^{A}(i)]=O(1)$ , as required by A1. In particular, we choose $\lambda_{A}=2.1$ .

•

After realizing $\{d_{in}^{A}(i)\}_{i\in[n]}$, we assign one outgoing edge to each $i\in[n]$, then assign each of the remaining $\sum_{i\in[n]}d_{in}^{A}(i)-n$ outgoing edges to agents independently and uniformly at random. Note that this implies $d_{in}^{A}(i),d_{out}(i)>0$ and $\sum_{i\in[n]}d_{in}^{A}(i)=\sum_{i\in[n]}d_{out}(i)$, as required by (7).

•

We let $d_{in}^{B}(i)=\textrm{Poisson}(\lambda_{B})$ , with $\lambda_{B}=\lambda_{A}(1-p_{n})/p_{n}$ , so that

[TABLE]

(This is not precisely what we desire, since A3 assumes $p_{n}\approx\tilde{p}_{n}=\mathbb{E}_{n}[\frac{d_{in}^{A}(v^{*})}{d_{in}^{A}(v^{*})+d_{in}^{B}(v^{*})}]$ for $v^{*}$ sampled proportionally to out-degree; however, as shown in the leftmost plot in Figure 3, this choice empirically yields distinct rates of convergence for $(1-p_{n})\rightarrow 0$.)

•

We compare four cases of $p_{n}$ : $p_{n}=p$ and $p_{n}=1-c_{i}T_{n}^{(-i+1)/2}$ for $i\in\{2,3,4\}$ , with $p$ and $c_{i}$ independent of $n$ . Note that the three latter cases satisfy

[TABLE]

as shown in Figure 3. Here $p$ and $c_{i}$ were chosen via trial-and-error so that all four cases behaved roughly the same at the smallest value of $n$ (as in Figure 3). In particular, we chose

[TABLE]

•

We let $T_{n}\in\{2,3,\ldots,11\}$ ; here the minimum of 2 was chosen since $T_{n}=1$ is a trivial case and the maximum of 11 was chosen due to computational limitations.

•

Given $T_{n}$ , we let $n=\lceil\lambda_{A}^{2T_{n}}\rceil$ . Note that this implies $T_{n}\approx(\log n)/(2\log\lambda_{A})$ , roughly the upper bound in A2. With our choice of $T_{n}$ and $\lambda_{A}$ , $n$ ranged from 20 to (roughly) 12 million.
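The degree-realization steps above can be sketched in Python as follows. This is a hypothetical reconstruction from the description, not our experimental code: the names `realize_degrees` and `p_tilde` and the default `p_n=0.8` are illustrative, and only small $T_{n}$ should be used (at $T_{n}=11$, $n$ is roughly 12 million).

```python
import math
import numpy as np

def realize_degrees(T, lam_A=2.1, p_n=0.8, rng=None):
    """Sketch of the degree-realization steps described above (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n = math.ceil(lam_A ** (2 * T))                  # n = ceil(lam_A^(2 T_n))
    d_in_A = 1 + rng.poisson(lam_A - 1, size=n)      # E[d_in^A(i)] = lam_A
    # One outgoing edge per agent; the remaining sum(d_in_A) - n outstubs
    # are assigned to agents independently and uniformly at random.
    extra = d_in_A.sum() - n
    d_out = 1 + np.bincount(rng.integers(0, n, size=extra), minlength=n)
    lam_B = lam_A * (1 - p_n) / p_n                  # bot in-degree rate
    d_in_B = rng.poisson(lam_B, size=n)
    return n, d_out, d_in_A, d_in_B

def p_tilde(d_out, d_in_A, d_in_B):
    """p~_n = E_n[d_in^A(v*) / (d_in^A(v*) + d_in^B(v*))], v* ~ out-degree."""
    w = d_out / d_out.sum()
    return float(np.sum(w * d_in_A / (d_in_A + d_in_B)))

rng = np.random.default_rng(0)
n, d_out, d_in_A, d_in_B = realize_degrees(T=4, rng=rng)
print(n, p_tilde(d_out, d_in_A, d_in_B))   # n = 379; p~_n is only roughly p_n
```

As the parenthetical remark above notes, $\tilde{p}_{n}$ weights agents by out-degree, so it need not equal $p_{n}$ exactly.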

7 Proof of Theorems 1 and 6.6 (outline)

The proofs of Theorems 1 and 6.6 proceed in two steps. First, we show that the graph construction can be locally approximated by a certain branching process. Second, we analyze the estimates of agents in the graph by instead analyzing the estimates of agents in the tree resulting from the branching process. We note that studying tree agent estimates rather than graph agent estimates is advantageous because the tree has a comparatively simple structure that is more amenable to analysis.

The first step is identical for both theorems, while the second step requires a different analysis for each theorem. In Appendix 7.1, we outline the first step, and in Appendices 7.2 and 7.3 we outline the second step for Theorems 1 and 6.6, respectively. To highlight the key ideas of our analysis, we defer many details to Appendix 9; in particular, proofs pertaining to Appendices 7.1, 7.2, and 7.3 can be found in Appendices 9.1, 9.2, and 9.3, respectively. Finally, throughout the analysis we use $\mathbb{P}_{n}$ and $\mathbb{E}_{n}$ to denote probability and expectation, respectively, conditioned on the degree sequence $\{d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\}_{i\in[n]}$.

7.1 Branching process approximation (Step 1 for proofs of Theorems 1 and 6.6)

We first show that the estimate of any agent in the graph depends (asymptotically) only on the structure of the agent’s local neighborhood and on certain signals realized within this neighborhood. This will facilitate the definition of the branching process with which we will approximate the graph construction. Importantly, the graph agent’s estimate will not depend on the prior parameters $\alpha_{0},\beta_{0}$ (asymptotically). This is necessary as we have not specified these priors (beyond assuming they are bounded by some $\bar{\alpha},\bar{\beta}$ independent of $n$ , as discussed in Section 2).

To begin, we require some notation. Let $P$ denote the graph’s column-normalized adjacency matrix, i.e. $P(i,j)=|\{i^{\prime}\rightarrow j^{\prime}\in E:i^{\prime}=i,j^{\prime}=j\}|/d_{in}(j)$ , and set $Q=(1-\eta)I+\eta P$ , where $I$ is the identity matrix of appropriate dimension. (Recall from Section 3.1 that $E$ is in general a multi-set; hence, the numerator in $P(i,j)$ may exceed 1.) Next, for $t\in\mathbb{N}$ , let $s_{t}$ denote the collection of signals $\{s_{t}(i)\}_{i\in A\cup B}$ in vector form. Finally, for $i\in A$ define

[TABLE]

We note that (57) can be rewritten as

[TABLE]

where we have used the fact that $s_{t}(j)=0\ \forall\ t\in\mathbb{N},j\in B$ . From this expression, it is clear that $\vartheta_{T_{n}}(i)$ only depends on the structure of the $T_{n}$ -step neighborhood into $i$ (since only this sub-graph affects the $e_{j}Q^{t}e_{i}^{\mathsf{T}}$ terms) and on certain signals within this neighborhood, as mentioned above. We can then establish the following.
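Since $e_{j}Q^{t}e_{i}^{\mathsf{T}}$ is simply the $(j,i)$ entry of $Q^{t}$, the construction of $P$ and $Q$ can be sketched with NumPy as below. The edge list is a hypothetical four-node example (node 3 plays the role of a bot with a self-loop); it shows that a repeated edge makes the numerator of $P(i,j)$ exceed 1, and that $Q$, like $P$, is column stochastic when every node has positive in-degree.

```python
import numpy as np

def lazy_matrix(num_nodes, edges, eta):
    """Return Q = (1 - eta) I + eta P for a multiset of directed edges.

    edges holds (source, target) pairs, possibly repeated; P(i, j) is the number
    of copies of i -> j divided by d_in(j), so each column of P sums to one.
    """
    counts = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        counts[i, j] += 1.0                  # parallel edges accumulate
    d_in = counts.sum(axis=0)                # in-degree of each target node
    P = np.divide(counts, d_in, out=np.zeros_like(counts), where=d_in > 0)
    return (1 - eta) * np.eye(num_nodes) + eta * P

# Hypothetical graph: agents 0-2, bot 3 with a self-loop; 1 -> 0 appears twice.
edges = [(1, 0), (1, 0), (2, 0), (0, 2), (3, 1), (3, 3)]
Q = lazy_matrix(4, edges, eta=0.9)
Q3 = np.linalg.matrix_power(Q, 3)
print(Q3[1, 0])    # the (1, 0) entry of Q^3, i.e. e_1 Q^3 e_0^T
```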

Lemma 7.7.

Given A4, $\forall\ \epsilon>0\ \exists\ N$ s.t. $\forall\ n\geq N$ , $|\theta_{T_{n}}(i)-\vartheta_{T_{n}}(i)|<\epsilon\ a.s.\ \forall\ i\in A$ .

Proof 7.8.

See Appendix 9.1.1.

Before defining the aforementioned branching process, we formally define the graph construction described in Section 3.1. For this, we will use the following additional notation.

•

We let $A_{l},l\in\mathbb{N}_{0}$ denote the set of agents at distance $l$ from the initial agent $i^{*}$ , i.e. $i\in A_{l}$ means a path from $i$ to $i^{*}$ of length $l$ exists, but no shorter path exists (hence, $A_{0}=\{i^{*}\}$ , $A_{1}=N_{in}(i^{*})$ , etc.). Similarly, we let $B_{l},l\in\mathbb{N}_{0}$ denote the set of bots at distance $l$ from $i^{*}$ .

•

We let $\{(i,j):j\in[d_{out}(i)]\}$ denote the set of outstubs belonging to $i\in A$ ; we let $O_{A}$ denote the set of all such outstubs.

•

For each $(i,j)\in O_{A}$ , we define a label $g((i,j))\in\{1,2,3\}$ as follows:

[TABLE]

We will explain the utility of these labels shortly.

With this notation in place, we present the formal graph construction as Algorithm 3. We offer some further comments to help explain the algorithm:

•

The algorithm takes as input the degree sequence $\{d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\}_{i\in A}$ , which is used in Line 3 to define $O_{A}$ . Also in Line 3, we label all outstubs as 1 (since no agents have been added to the graph), and we initialize the set of bots to the empty set.

•

In Line 3, we sample the agent $i^{*}$ from which the graph construction begins. Since $i^{*}$ then belongs to the graph, we change the labels of its outstubs to 2.

•

For the remainder of the algorithm, we proceed in a breadth-first-search fashion, looping over distance $l$ and agents $i$ at distance $l$ from $i^{*}$ . For each such agent, we do the following:

–

For each of the $d_{in}^{A}(i)$ instubs of $i$ intended for pairing with agent outstubs, we sample an agent outstub uniformly (Line 3), resampling until an unpaired outstub (i.e. one with label 1 or 2) has been found (Line 3). Upon finding such an outstub, denoted $(i^{\prime},j^{\prime})$ , we pair it with $i$ ’s instub to form an edge from $i^{\prime}$ to $i$ (Line 3). Note that $g((i^{\prime},j^{\prime}))=1$ implies $i^{\prime}$ was added to the graph when edge $i^{\prime}\rightarrow i$ was formed; hence, because $i\in A_{l}$ , $i^{\prime}$ is at distance $l+1$ from $i^{*}$ and must be added to $A_{l+1}$ (Line 3). Finally, we update the labels of the outstubs of $i^{\prime}$ via (59) (Lines 3-3). (Line 3 will be used in the branching process approximation and will be discussed shortly.)

–

For each of the $d_{in}^{B}(i)$ instubs of $i$ intended for pairing with bot outstubs, we add a new bot with a self-loop and an unpaired outstub to the set of bots, updating $B_{l+1}$ accordingly (Line 3), and then add an edge from the new bot to $i$ (Line 3). Note that $B=\emptyset$ at the start of the construction; it follows that the $k$-th bot added to the graph is $n+k$, so $B=n+[\sum_{i\in A}d_{in}^{B}(i)]$ is the set of bots at the end of the construction.

–

Finally, if all agent outstubs have been paired, the construction terminates (Line 3).

We now return to discuss Line 3 of Algorithm 3. Here $\tau_{n}$ denotes the first iteration at which an outstub with label 2 or 3 is sampled for pairing with an instub. Put differently, $\tau_{n}>l$ means that for the first $l$ iterations of the construction, only outstubs with label 1 have been sampled. This has two consequences. First, no edges have been added between two nodes both at distance $\leq l$ from $i^{*}$ , i.e. the $l$ -step incoming neighborhood of $i^{*}$ is a tree (except for the self-loops attached to bots). Second, no resampling of outstubs has occurred (Line 3); this implies that the outstub $(i^{\prime},j^{\prime})$ paired in Line 3 is chosen uniformly from $O_{A}$ , so the degrees $(d_{out}(i^{\prime}),d_{in}^{A}(i^{\prime}),d_{in}^{B}(i^{\prime}))$ of $i^{\prime}$ are distributed according to the out-degree distribution $f_{n}$ defined in (11).
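The role of the labels and of $\tau_{n}$ can be illustrated with the following Python sketch of the agent-pairing loop of Algorithm 3, run up to a fixed depth. It is a simplified, hypothetical reconstruction from the description above (the function `explore` and its arguments are ours): bot instubs are omitted because each one receives a fresh bot and never affects $\tau_{n}$, and we index iterations from 1 so that $\tau_{n}>l$ means the first $l$ levels were built from label-1 outstubs only.

```python
import random

def explore(d_out, d_in_A, depth, rng=random):
    """Sketch of the breadth-first pairing of agent instubs in Algorithm 3.

    Returns (levels, tau): levels[l] lists the agents at distance l from i*,
    and tau is the first (1-indexed) iteration in which an outstub with label
    2 or 3 was sampled, or None if that did not happen within `depth` iterations.
    """
    n = len(d_out)
    stubs = [(i, j) for i in range(n) for j in range(d_out[i])]   # O_A
    label = dict.fromkeys(stubs, 1)      # 1: owner not yet in the graph
    i_star = rng.randrange(n)            # initial agent, sampled uniformly
    for j in range(d_out[i_star]):
        label[(i_star, j)] = 2           # 2: owner in the graph, stub unpaired
    levels, tau = [[i_star]], None
    for l in range(depth):
        next_level = []
        for i in levels[l]:
            for _ in range(d_in_A[i]):
                if all(g == 3 for g in label.values()):
                    levels.append(next_level)
                    return levels, tau   # all agent outstubs paired: stop
                while True:              # resample until an unpaired stub
                    s = rng.choice(stubs)
                    if label[s] != 1 and tau is None:
                        tau = l + 1      # first sample with label 2 or 3
                    if label[s] != 3:
                        break
                owner = s[0]             # edge owner -> i is formed
                if label[s] == 1:        # owner joins the graph at distance l + 1
                    next_level.append(owner)
                    for j in range(d_out[owner]):
                        label[(owner, j)] = 2
                label[s] = 3             # 3: stub now paired
        levels.append(next_level)
    return levels, tau

rng = random.Random(1)
d_out = [2, 1, 1, 2, 1, 1]               # hypothetical degrees; both lists sum to 8
d_in_A = [1, 2, 1, 1, 2, 1]
levels, tau = explore(d_out, d_in_A, depth=3, rng=rng)
print(levels, tau)  # e.g. tau > 1 means the 1-step in-neighborhood of i* is a tree
```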

These observations motivate a tree construction that we define next. In particular, we will construct a tree (except for bot self-loops) with edges pointing towards the root. Agents will be added to the tree with degrees sampled from $f_{n}$ , except for the root node, whose degrees are sampled from $f_{n}^{*}$ (also defined in (11)), corresponding to the degrees of $i^{*}$ in the graph construction.

The tree construction requires further notation. First, we let $\hat{A}_{l}$ ( $\hat{B}_{l}$ , respectively) denote agents (bots, respectively) at distance $l$ from the tree’s root. We also set $\hat{A}=\cup_{l=0}^{\infty}\hat{A}_{l},\hat{B}=\cup_{l=0}^{\infty}\hat{B}_{l}$ . (Here and moving forward, we use $\hat{\cdot}$ to distinguish tree-related objects from similarly-defined graph-related ones.) At times, we will use branching process terminology and e.g. refer to $\hat{A}_{l}$ as the $l$ -th generation of agents. We let $\phi$ denote the root node, so that $\hat{A}_{0}=\{\phi\}$ . We will denote a generic node in $\hat{A}_{l}\cup\hat{B}_{l}$ as $\mathbf{i}\in\mathbb{N}^{l}$ ; here $\mathbf{i}=(i_{1},\ldots,i_{l})$ encodes the ancestry of $\mathbf{i}$ , i.e. $(i_{1},\ldots,i_{l})$ is the child of $(i_{1},\ldots,i_{l-1})$ , who is in turn the child of $(i_{1},\ldots,i_{l-2})$ , etc. Finally, for such $\mathbf{i}$ and for $j\in\mathbb{N}$ , $(\mathbf{i},j)=(i_{1},\ldots,i_{l},j)$ is the concatenation operation and $\mathbf{i}|j=(i_{1},\ldots,i_{j})$ denotes $\mathbf{i}$ ’s ancestor in generation $j$ , with $\mathbf{i}|0=\phi$ by convention (note also that $\mathbf{i}|l=\mathbf{i}$ for such $\mathbf{i}$ ).

With this notation in place, we define the tree construction in Algorithm 4. We offer several more explanatory comments:

•

Lines 4 and 4-4 define a particular random walk that will be used in Appendix 7.2; they do not affect the tree structure and we defer further explanation to Appendix 7.2.

•

As mentioned above, the root node $\phi$ has degrees sampled from $f_{n}^{*}$ (Line 4), while all other agents have degrees sampled from $f_{n}$ (Line 4).

•

In Line 4, a directed edge is added from $(\mathbf{i},j)$ to $\mathbf{i}$ ; the other $d_{out}((\mathbf{i},j))-1$ outstubs of $(\mathbf{i},j)$ are left unpaired so that the tree structure is preserved (except for bot self-loops).

•

At the conclusion of the $l$ -th iteration, $\mathbf{i}\in\hat{A}_{l}$ has incoming neighbor set (offspring, in the branching process terminology) $\{(\mathbf{i},j):j\in[d_{in}^{A}(\mathbf{i})+d_{in}^{B}(\mathbf{i})]\}$ . More specifically, the subset $(\mathbf{i},1),\ldots,(\mathbf{i},d_{in}^{A}(\mathbf{i}))$ of $\mathbf{i}$ ’s incoming neighbors are agents (Line 4), while the subset $(\mathbf{i},d_{in}^{A}(\mathbf{i})+1),\ldots,(\mathbf{i},d_{in}^{A}(\mathbf{i})+d_{in}^{B}(\mathbf{i}))$ of $\mathbf{i}$ ’s incoming neighbors are bots (Line 4).

•

Unlike the graph construction, the tree construction continues indefinitely, yielding an infinite tree (except for bot self-loops) with edges pointing towards the root node $\phi$ .

Having defined the tree construction, we also define $\hat{\vartheta}_{T_{n}}(\phi)$ as in (57) but using the tree from Algorithm 4 instead of the graph from Algorithm 3. Specifically, we let

[TABLE]

where $\hat{s}_{t}(\mathbf{i})\sim\textrm{Bernoulli}(\theta)\ \forall\ t\in\mathbb{N},\mathbf{i}\in\hat{A}$ ; $\hat{s}_{t}(\mathbf{i})=0\ \forall\ t\in\mathbb{N},\mathbf{i}\in\hat{B}$ ; $\hat{Q}=(1-\eta)I+\eta\hat{P}$ ; and $\hat{P}$ is the column-normalized adjacency matrix of the tree from Algorithm 4. We pause to note that

[TABLE]

where the first inequality holds since (60) is a sum of nonnegative terms, the second follows since $\sum_{\mathbf{i}\in\hat{A}}\hat{s}_{T_{n}-t}(\mathbf{i})e_{\mathbf{i}}\leq\mathbf{1}$ component-wise (where $\mathbf{1}$ is the all-ones vector) and since $\hat{Q}^{t}e_{\phi}^{\mathsf{T}}$ is element-wise nonnegative, and the equality holds by column stochasticity of $\hat{Q}$.

We can now state Lemma 7.9, which relates the estimate of a uniformly random agent in the graph with the estimate of the root node in the tree. For the first statement in the lemma, we argue that, conditioned on $\tau_{n}>T_{n}$ , the $T_{n}$ -step neighborhood of $i^{*}$ in the graph and the $T_{n}$ -step neighborhood of $\phi$ in the tree are constructed via the same procedure; since the signals are defined in the same manner as well, this implies $\vartheta_{T_{n}}(i^{*})$ and $\hat{\vartheta}_{T_{n}}(\phi)$ have the same distribution. The second statement of the lemma says that the condition $\tau_{n}>T_{n}$ occurs with high probability; it is essentially implied by [33, Lemma 5.4]. We note that the assumptions A1 and A2 are required for this second statement to hold, and are standard assumptions needed to locally approximate a sparse random graph construction with a branching process. Finally, we recall $\zeta<1/2$ by A2, which is why the limit shown in Lemma 7.9 holds.

Lemma 7.9.

Assume A1 and A2 hold, and let $\stackrel{{\scriptstyle\mathscr{D}}}{{=}}$ denote equality in distribution. Then

[TABLE]

Proof 7.10.

See Appendix 9.1.2.

We can now state and prove Lemma 7.11, which is the main result for Step 1 of the proofs of the theorems. This result will allow us to analyze convergence of $\theta_{T_{n}}(i^{*})$ (the graph agent estimate) by instead analyzing convergence of $\hat{\vartheta}_{T_{n}}(\phi)$ (the tree agent estimate).

Lemma 7.11.

Assume A1, A2, and A4 hold. Then $\forall\ x\in\mathbb{R}$ and all $n\in\mathbb{N}$ sufficiently large,

[TABLE]

Proof 7.12.

First, given $\epsilon>0$ , we have for sufficiently large $n$ ,

[TABLE]

where the first inequality uses the triangle inequality and in the second we used Lemma 7.7 to bound $|\theta_{T_{n}}(i^{*})-\vartheta_{T_{n}}(i^{*})|$ by $\epsilon/2$ $a.s.$ Furthermore, by the law of total probability, we have

[TABLE]

Combining the previous two inequalities and using Lemma 7.9 (which applies since A1, A2 are assumed to hold), we obtain

[TABLE]

which is what we set out to prove.

Before proceeding, we state another lemma that will be used in Step 2 of the proofs for both theorems. This lemma uses the fact that each agent in the tree has a unique path to the root. As a result, we can obtain an alternate expression for the terms $e_{\mathbf{i}}\hat{Q}^{t}e_{\phi}^{\mathsf{T}}$ appearing in (60).

Lemma 7.13.

For each $n\in\mathbb{N}$ ,

[TABLE]

where by convention $\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}=1$ when $l=0$ .

Proof 7.14.

See Appendix 9.1.3.
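The display of Lemma 7.13 is not reproduced here. Our reading of it, based on the product convention just stated, is the lazy-walk identity $e_{\mathbf{i}}\hat{Q}^{t}e_{\phi}^{\mathsf{T}}=\binom{t}{l}\eta^{l}(1-\eta)^{t-l}\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}$ for $\mathbf{i}\in\hat{A}_{l}$ with $l\leq t$ (and zero for $l>t$): a walk from $\mathbf{i}$ must use exactly $l$ of its $t$ steps to move along the unique path to $\phi$, picking up a factor $\eta/d_{in}$ of each node it enters. The sketch below checks this numerically on a small hypothetical agent-only tree; treat it as an illustration of that reading rather than a restatement of the lemma.

```python
import math
import numpy as np

def lazy_tree_matrix(parent, eta):
    """Q = (1 - eta) I + eta P for a finite tree with edges child -> parent.

    parent[v] is v's parent, or None for the root phi (node 0), so that
    P(v, parent[v]) = 1 / d_in(parent[v]). Leaves have d_in = 0; their columns
    then sum to 1 - eta, which does not affect the entries (v, root).
    """
    N = len(parent)
    d_in = np.zeros(N)
    for u in parent:
        if u is not None:
            d_in[u] += 1
    Q = (1 - eta) * np.eye(N)
    for v, u in enumerate(parent):
        if u is not None:
            Q[v, u] += eta / d_in[u]
    return Q, d_in

def path_formula(v, t, parent, d_in, eta):
    """binom(t, l) eta^l (1 - eta)^(t - l) prod_{j<l} 1/d_in(v|j), v in generation l."""
    ancestors = []                 # v|(l-1), ..., v|0 = phi
    u = parent[v]
    while u is not None:
        ancestors.append(u)
        u = parent[u]
    l = len(ancestors)
    if l > t:
        return 0.0
    weight = math.prod(1.0 / d_in[a] for a in ancestors)
    return math.comb(t, l) * eta**l * (1 - eta) ** (t - l) * weight

parent = [None, 0, 0, 1, 1, 1, 2]  # hypothetical tree rooted at node 0
Q, d_in = lazy_tree_matrix(parent, eta=0.7)
Q4 = np.linalg.matrix_power(Q, 4)
print([round(Q4[v, 0], 4) for v in range(len(parent))])
```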

7.2 Step 2 for proof of Theorem 1

Our next goal is to establish convergence of $\hat{\vartheta}_{T_{n}}(\phi)$ , from which convergence of $\theta_{T_{n}}(i^{*})$ will follow via Lemma 7.11. For this, we will use Chebyshev’s inequality, so we begin with two lemmas describing the limiting behavior of the mean and variance of $\hat{\vartheta}_{T_{n}}(\phi)$ . Here and moving forward, for random variables $X$ and $Y$ we use $\textrm{Var}_{n}(X)=\mathbb{E}_{n}[X^{2}]-(\mathbb{E}_{n}[X])^{2}$ and $\textrm{Cov}_{n}(X,Y)=\mathbb{E}_{n}[XY]-\mathbb{E}_{n}[X]\mathbb{E}_{n}[Y]$ to denote variance and covariance conditional on the degree sequence.

Lemma 7.15.

Given A3 and A4, we have the following:

[TABLE]

Proof 7.16.

See Appendix 9.2.1.

Lemma 7.17.

Given A3 and A4, $\lim_{n\rightarrow\infty}\textrm{Var}_{n}(\hat{\vartheta}_{T_{n}}(\phi))1(\Omega_{n,2})=0\ a.s.$

Proof 7.18.

See Appendix 9.2.2.

Before proceeding, we briefly describe our approach to proving these lemmas. First, we note that in analyzing the moments of $\hat{\vartheta}_{T_{n}}(\phi)$, the i.i.d. Bernoulli random variables $\hat{s}_{T_{n}-t}(\mathbf{i})$ in (67) are easily dealt with; the difficulty arises from the $\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}$ terms. Fortunately, these terms have a simple interpretation that guides our analysis. First, define a random walk $\{X_{l}^{1}\}_{l\in\mathbb{N}_{0}}$ with $X_{0}^{1}=\phi$ and $X_{l}^{1}$ chosen uniformly from the incoming neighbors of $X_{l-1}^{1}$, for each $l\in\mathbb{N}$. Then, as shown in (181) in Appendix 9.2.1,

[TABLE]

In short, computing the mean of $\hat{\vartheta}_{T_{n}}(\phi)$ amounts to computing hitting probabilities of the form $\mathbb{P}(X_{l}^{1}\in\hat{A}_{l})$. Similarly, to analyze the second moment of $\hat{\vartheta}_{T_{n}}(\phi)$, we compute hitting probabilities of the form $\mathbb{P}(X_{l}^{1}\in\hat{A}_{l},X_{l}^{2}\in\hat{A}_{l})$, where $X_{l}^{2}$ is defined in the same manner as $X_{l}^{1}$ and is conditionally independent of $X_{l}^{1}$ given the tree structure. We note that, in principle, the $k$-th moment of $\hat{\vartheta}_{T_{n}}(\phi)$ can be computed by analyzing $k$ walks. However, the calculations become exceedingly complex as $k$ grows, and because we only require two moments, we do not study any case $k>2$.

This interpretation explains Lines 4 and 4-4 of Algorithm 4: in Line 4, we begin two random walks at the root node $\phi$ ; each time Lines 4-4 are reached, we advance the random walks one step. Importantly, we simultaneously sample the walks and construct the tree. In particular, the $l$ -th step of the walk is taken at Line 4, before the degrees of the corresponding node are realized in Line 4; this is crucial to our computation of the aforementioned hitting probabilities. Finally, we note that in Line 4 of Algorithm 4, the condition $j^{*}>d_{in}^{A}(\mathbf{i})$ implies the walk reaches the set of bots $\hat{B}$ ; since bots have self-loops but no other incoming edges, they act as absorbing states on the walk. This is why the entire future trajectory of the walk can be defined in Line 4.
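Because the walk's $l$-th step is taken before the degrees of the node it reaches are revealed, each step lands on an agent with probability $j/(j+k)$ given the current node's agent and bot in-degrees $(j,k)$, and the next node's degrees are then drawn afresh. Consequently one expects $\mathbb{P}(X_{l}^{1}\in\hat{A}_{l})=\tilde{p}_{n}^{*}\tilde{p}_{n}^{l-1}$ for $l\geq 1$, consistent with the description of (75) below. The Monte Carlo sketch that follows checks this on hypothetical degree distributions, taking $f_{n}^{*}$ uniform over a short list and $f_{n}$ as a weighted list; none of these numbers come from the paper.

```python
import random

def frac_agent(jk):
    j, k = jk
    return j / (j + k)         # chance a uniform incoming neighbor is an agent

def hit_prob_exact(root_degs, degs, weights, l):
    """p~*_n * p~_n^(l-1): the walk stays in the agent set for l >= 1 steps."""
    p_star = sum(map(frac_agent, root_degs)) / len(root_degs)
    p = sum(w * frac_agent(d) for d, w in zip(degs, weights)) / sum(weights)
    return p_star * p ** (l - 1)

def hit_prob_mc(root_degs, degs, weights, l, trials, rng):
    """Monte Carlo estimate of P(X_l in A_l), revealing degrees lazily."""
    hits = 0
    for _ in range(trials):
        jk = rng.choice(root_degs)                 # root degrees ~ f_n^*
        for _step in range(l):
            if rng.random() >= frac_agent(jk):     # stepped to a bot: absorbed
                break
            jk = rng.choices(degs, weights)[0]     # new agent's degrees ~ f_n
        else:
            hits += 1                              # never absorbed in l steps
    return hits / trials

root_degs = [(1, 0), (2, 1), (1, 1)]               # hypothetical (d_in^A, d_in^B)
degs, weights = [(1, 0), (2, 1), (3, 1)], [1, 2, 3]  # weights play the out-degree role
rng = random.Random(0)
print(hit_prob_exact(root_degs, degs, weights, 3),
      hit_prob_mc(root_degs, degs, weights, 3, 100_000, rng))
```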

In Lemmas 7.19 and 7.21, we compute the hitting probabilities needed for the proofs of Lemmas 7.15 and 7.17. We note that, in addition to the random variables $\tilde{p}_{n},\tilde{p}_{n}^{*},\tilde{q}_{n}$ defined in (13) in Section 3.1, Lemma 7.21 requires the definition of several similar random variables; we define these in (72) (and also recall the definitions of $\tilde{p}_{n},\tilde{p}_{n}^{*},\tilde{q}_{n}$ for convenience). We discuss these in more detail shortly.

[TABLE]

Lemma 7.19.

We have

[TABLE]

Proof 7.20.

See Appendix 9.2.4.

Lemma 7.21.

For $l^{\prime}>l$ , we have

[TABLE]

Furthermore,

[TABLE]

Proof 7.22.

See Appendix 9.2.5.

Before proceeding, we comment on the form of (75), which helps explain the definitions in (72). Namely, in (75), $\tilde{r}_{n}^{*}\tilde{p}_{n}^{2(l-1)}$ is the probability of the two random walks visiting different agents on the first step of the walk ( $\tilde{r}_{n}^{*}$ term), then separately remaining in the agent set for the next $l-1$ steps of the walk ( $\tilde{p}_{n}^{2(l-1)}$ term); similarly, $\tilde{q}_{n}^{*}\tilde{q}_{n}^{t-2}\tilde{r}_{n}\tilde{p}_{n}^{2(l-t)}$ is the probability of the walks visiting the same agents for $t-1$ steps ( $\tilde{q}_{n}^{*}\tilde{q}_{n}^{t-2}$ term), then visiting a different agent on the $t$ -th step ( $\tilde{r}_{n}$ term), then separately remaining in the agent set for $l-t$ steps ( $\tilde{p}_{n}^{2(l-t)}$ term); finally, $\tilde{q}_{n}^{*}\tilde{q}_{n}^{l-1}$ is the probability of the walks remaining together and in the agent set for $l$ steps. Each of these arguments follows from (72): $\tilde{p}_{n}$ gives the probability of a single walk proceeding to an agent ( $j/(j+k)$ term), $\tilde{q}_{n}$ gives the probability of two walks proceeding to the same agent ( $j/(j+k)$ term for the first walk, $1/(j+k)$ term for the second walk), and $\tilde{r}_{n}$ gives the probability of two walks proceeding to different agents ( $j/(j+k)$ term for the first walk, $(j-1)/(j+k)$ term for the second walk). Similar arguments apply to $\tilde{p}_{n}^{*},\tilde{q}_{n}^{*},\tilde{r}_{n}^{*}$ , except these pertain to the first steps of the walks.
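The per-step quantities in this description can be computed directly from a degree sequence. The sketch below assumes, following the parenthetical remark on $\tilde{p}_{n}$ in the experiment description, that $f_{n}$ samples an agent proportionally to out-degree; the function name and toy degrees are hypothetical, and the starred versions would use $f_{n}^{*}$ in place of $f_{n}$. Since $\frac{j}{j+k}\cdot\frac{1}{j+k}+\frac{j}{j+k}\cdot\frac{j-1}{j+k}=\left(\frac{j}{j+k}\right)^{2}$, the identity $\tilde{q}_{n}+\tilde{r}_{n}=\mathbb{E}_{n}[(j/(j+k))^{2}]$ provides a quick sanity check.

```python
import numpy as np

def walk_coefficients(d_out, d_in_A, d_in_B):
    """(p~_n, q~_n, r~_n) for the out-degree-biased distribution f_n (assumed)."""
    w = d_out / d_out.sum()                # probability f_n assigns to each agent
    total = d_in_A + d_in_B                # j + k
    to_agent = d_in_A / total              # j / (j + k): first walk moves to an agent
    p = np.sum(w * to_agent)
    q = np.sum(w * to_agent / total)                 # second walk: same agent
    r = np.sum(w * to_agent * (d_in_A - 1) / total)  # second walk: a different agent
    return float(p), float(q), float(r)

d_out = np.array([1, 2, 3])                # hypothetical degrees
d_in_A = np.array([1, 2, 3])
d_in_B = np.array([0, 1, 1])
p, q, r = walk_coefficients(d_out, d_in_A, d_in_B)
print(p, q, r)
```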

Equipped with Lemmas 7.15 and 7.17, we can prove Theorem 1. First, suppose $T_{n}(1-p_{n})\rightarrow 0$ . Given $\epsilon>0$ , we can use Lemma 7.11 to obtain (provided the limits exist)

[TABLE]

where we have used $\mathbb{P}(\Omega_{n,1}^{C})\rightarrow 0$ by A1 and $\zeta<1/2$ by A2. Next, using total probability,

[TABLE]

We can further expand the first summand in (78) as

[TABLE]

where we have simply used the triangle inequality and the union bound. Now for the first summand in (80), we have (via total expectation and the conditional form of Chebyshev’s inequality)

[TABLE]

where the limit holds by Lemma 7.17. For the second summand in (80), we write

[TABLE]

where the first two lines use total expectation and the inequality $1(x>y)\leq x/y$ for $x,y>0$ (which is easily proven by considering the cases $x>y$ and $x\leq y$ ), and the limit holds by Lemma 7.15. Finally, combining (76), (78), (80), (81), and (83), and recalling that $\mathbb{P}(\Omega_{n,2}^{C})\rightarrow 0$ by A3, we obtain

[TABLE]

Since $\epsilon>0$ was arbitrary, we conclude that $\theta_{T_{n}}(i^{*})$ converges to $\theta$ in probability, completing the proof in the case $T_{n}(1-p_{n})\rightarrow 0$. For the cases $T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ and $T_{n}(1-p_{n})\rightarrow\infty$, we can replace $\theta$ with $\theta(1-e^{-c\eta})/(c\eta)$ and $0$, respectively (the corresponding cases from Lemma 7.15), but otherwise follow the same approach.
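For reference, the three limits targeted by the argument above can be collected into one function of $c=\lim_{n\rightarrow\infty}T_{n}(1-p_{n})\in[0,\infty]$, with the limit zero for $c=\infty$ as stated in the discussion preceding Theorem 6.6. This small Python sketch (the function name is ours; $\theta=0.5$ and $\eta=0.9$ echo the experiment) also shows that the intermediate limit $\theta(1-e^{-c\eta})/(c\eta)$ interpolates between $\theta$ as $c\rightarrow 0$ and zero as $c\rightarrow\infty$.

```python
import math

def limit_L(theta, eta, c):
    """Limit of theta_{T_n}(i*) from Theorem 1, given c = lim T_n (1 - p_n)."""
    if c == 0:
        return theta                                   # T_n(1 - p_n) -> 0
    if math.isinf(c):
        return 0.0                                     # T_n(1 - p_n) -> infinity
    # 1 - exp(-c eta) computed as -expm1(-c eta) to avoid cancellation for small c
    return theta * -math.expm1(-c * eta) / (c * eta)   # T_n(1 - p_n) -> c

for c in (0, 1e-6, 0.5, 1, 2, 10, math.inf):
    print(c, limit_L(0.5, 0.9, c))
```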

7.3 Step 2 for proof of Theorem 6.6

Similar to the second step in the proof of Theorem 1, we begin by analyzing the limiting behavior of $\hat{\vartheta}_{T_{n}}(\phi)$ . However, we will use a different approach than that used in Theorem 1. This approach is made possible by the stronger assumptions of Theorem 6.6, and it will yield a fast rate of convergence that will allow us to prove the theorem.

To explain our approach, we first recall that Lemma 7.13 states

[TABLE]

Hence, letting $\mathcal{T}$ denote the collection of random variables defining the tree structure,

[TABLE]

where we have simply used the fact that the signals are i.i.d. $\textrm{Bernoulli}(\theta)$ random variables. Our basic approach will now proceed in two steps. First, in Lemma 7.23 we condition on the tree structure, so that $\hat{\vartheta}_{T_{n}}(\phi)$ is simply a weighted sum of i.i.d. $\textrm{Bernoulli}(\theta)$ random variables; the lemma shows that this weighted sum is close to its conditional mean $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ with high probability. Second, in Lemma 7.25, we show that the conditional mean $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ converges to zero in probability. Before proceeding, we also note that an argument similar to (61) implies

[TABLE]

which will be used in the proofs of the lemmas in this appendix.

We now state Lemma 7.23. As mentioned, the proof involves analyzing a weighted sum of i.i.d. random variables; hence, our analysis is similar to the derivation of Hoeffding’s inequality.

Lemma 7.23.

Assume $\exists\ \mu>0$ and $N^{\prime}\in\mathbb{N}$ independent of $n$ s.t. the following hold:

• A4, with $T_{n}\geq\mu\log n\ \forall\ n\geq N^{\prime}$.

Then $\forall\ \epsilon>0$ ,

[TABLE]

Proof 7.24.

See Appendix 9.3.1.
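Conditioned on $\mathcal{T}$, $\hat{\vartheta}_{T_{n}}(\phi)$ is a nonnegative weighted sum of i.i.d. $\textrm{Bernoulli}(\theta)$ signals, so the classical two-sided Hoeffding bound $2\exp(-2\epsilon^{2}/\sum_{i}w_{i}^{2})$ on its deviation from the conditional mean is the prototype for Lemma 7.23 (whose exact constants are in the lemma's display and Appendix 9.3.1, not reproduced here). The sketch below compares that textbook bound with a Monte Carlo estimate for hypothetical weights; with $T$ equal weights $1/T$ the bound becomes $2e^{-2\epsilon^{2}T}$, which equals $2n^{-2\epsilon^{2}\mu}$ when $T=\mu\log n$.

```python
import numpy as np

def empirical_tail(weights, theta, eps, trials, rng):
    """Estimate P(|sum_i w_i (s_i - theta)| > eps) with s_i ~ Bernoulli(theta) i.i.d."""
    s = (rng.random((trials, weights.size)) < theta).astype(float)
    deviation = s @ weights - theta * weights.sum()
    return float(np.mean(np.abs(deviation) > eps))

def hoeffding_bound(weights, eps):
    """Two-sided Hoeffding bound for weighted sums of [0, 1]-valued variables."""
    return float(2 * np.exp(-2 * eps**2 / np.sum(weights**2)))

T = 50
weights = np.full(T, 1 / T)         # hypothetical equal weights, summing to one
rng = np.random.default_rng(0)
print(empirical_tail(weights, 0.5, 0.15, 20_000, rng), hoeffding_bound(weights, 0.15))
```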

Lemma 7.25 states that the conditional mean $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ converges to zero in probability. Note that the only source of randomness in $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ is the tree structure. Since the tree structure is generated recursively, $\mathbb{E}[\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{T}]$ has a martingale-like structure; this allows us to use an approach similar to the Azuma-Hoeffding inequality for bounded-difference martingales.

Lemma 7.25.

Assume $\exists\ \kappa,\mu>0$ and $N^{\prime}\in\mathbb{N}$ independent of $n$ s.t. the following hold:

• A3, with $\mathbb{P}(\Omega_{n,2})=O(n^{-\kappa})$ and $p<1$.

• A4, with $T_{n}\geq\mu\log n\ \forall\ n\geq N^{\prime}$.

Then $\forall\ \epsilon>0$ ,

[TABLE]

Proof 7.26.

See Appendix 9.3.2.

With Lemmas 7.23 and 7.25 in place, we can prove Theorem 6.6. First, since $\theta_{T_{n}}(i^{*}),\hat{\vartheta}_{T_{n}}(\phi)\geq 0$ , taking $x=0$ in Lemma 7.11 yields

[TABLE]

where the equality is by the theorem assumptions. For the first summand in (93), we write

[TABLE]

where the first equality adds and subtracts a term, the first inequality is immediate, the second inequality uses the union bound, the second equality uses Lemmas 7.23 and 7.25, and the final equality holds since $\eta,p\in(0,1)$ implies $\epsilon^{2}\mu/8>\mu(\epsilon\eta(1-p)/\theta)^{2}/16$ . Substituting into (93),

[TABLE]

We can then write

[TABLE]

where we have used (98). Hence, by Markov’s inequality,

[TABLE]

where the limit holds by the assumption on $k$ in the statement of the theorem.

7.4 Other remarks

7.4.1 A sufficient condition for extending Theorem 6.6

Here we show that the condition (52) from Appendix 6 is sufficient to extend Theorem 6.6 to other cases of $p_{n}$ . Recall this condition is

[TABLE]

where $L(p_{n})$ is the limit from Theorem 1 based on the relative asymptotics of $T_{n}$ and $p_{n}$ , i.e.

[TABLE]

Suppose (103) holds in the case $T_{n}(1-p_{n})\rightarrow 0$ , so that $L(p_{n})=\theta$ . In this case, we have

[TABLE]

where the first inequality is Lemma 7.11 (which holds for all cases of $p_{n}$ ) with $\mathbb{P}(\Omega_{n,1})=O(n^{-\kappa})$ and the third uses Lemma 7.23 (which holds for all cases of $p_{n}$ ) and the sufficient condition (103). Hence, by the argument following (98), we obtain for any $\epsilon>0$ , $K>0$ , and $k^{\prime}>1-\min\{\epsilon^{2}\mu/8,\gamma^{\prime},\kappa,(1/2)-\zeta\}$ ,

[TABLE]

i.e. Theorem 6.6 holds with $k$ replaced by $k^{\prime}$ . The same argument shows that Theorem 6.6 holds (with only a change of $k$ ) in the cases $T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ and $T_{n}(1-p_{n})\rightarrow\infty$ with $p_{n}\rightarrow 1$ .

7.4.2 Comparing Step 2 for proofs of Theorems 1 and 6.6

As shown in Appendices 7.2 and 7.3, Step 2 for the proofs of both theorems involves bounding $\mathbb{P}(|\hat{\vartheta}_{T_{n}}(\phi)-L(p_{n})|>\epsilon/2)$ for the appropriate $L(p_{n})$. One may wonder why we have conducted a different analysis for the two theorems. The reason is that, as shown in Appendix 9.3.3, the analysis for Step 2 of Theorem 6.6 yields a bound that does not decay with $n$ in the case $T_{n}(1-p_{n})\rightarrow c\in[0,\infty)$. Hence, we have derived a bound for Theorem 1 that encompasses all cases of $\lim_{n\rightarrow\infty}T_{n}(1-p_{n})$. On the other hand, the bound from Theorem 1 only states $\mathbb{P}(|\hat{\vartheta}_{T_{n}}(\phi)-L(p_{n})|>\epsilon/2)\rightarrow 0$ but does not provide a rate of convergence, and so it cannot be used to prove Theorem 6.6. We also note that Appendix 9.3.3 shows that, while the bound for Step 2 of Theorem 6.6 does decay in $n$ for the case $T_{n}(1-p_{n})\rightarrow\infty$ with $p_{n}\rightarrow 1$, it does not decay quickly enough to establish (52).

8 Section 4 proof and experiment details

8.1 Solution of the relaxed problem

We aim to show $d_{n}^{rel}\in\operatorname*{arg\,min}_{d\in\mathbb{R}_{+}^{n}:\sum_{i=1}^{n}d(i)\leq b_{n}}\tilde{p}_{n}(d)$ , where (we recall)

[TABLE]

We also recall $x_{+}=x1(x>0)$ , $r(i)=d_{out}(i)/d_{in}^{A}(i)\ \forall\ i\in[n]$ , $h^{*}=\max_{x\in\mathbb{R}_{+}}h(x)$ , and

[TABLE]

First note that strict convexity of $y\mapsto 1/y$ for $y\in(0,\infty)$ implies strict convexity of $\tilde{p}_{n}$, i.e. for any $d\neq d^{\prime}\in\mathbb{R}^{n}_{+}$ and $\rho\in(0,1)$,

[TABLE]

Also note we can rewrite the relaxed problem (30) as

[TABLE]

where $c(d)=\sum_{i=1}^{n}d(i)-b_{n},c_{i}(d)=-d(i)$ . Given $\lambda,\lambda_{i}\geq 0$ , we also define the Lagrangian

[TABLE]

Finally, we set $\lambda^{*}=(h^{*})^{2}/m_{n},\lambda_{i}^{*}=((h^{*})^{2}-r(i))_{+}/m_{n}$ (clearly, $\lambda^{*},\lambda_{i}^{*}\geq 0$ ). Now to prove the theorem, it suffices to establish the following KKT conditions (see e.g. [13, Section 5.5.3]):

1. $c(d_{n}^{rel})\leq 0$ and $c_{i}(d_{n}^{rel})\leq 0\ \forall\ i\in[n]$, i.e. $d_{n}^{rel}$ is a feasible point of (112).

2. $\nabla L(d_{n}^{rel},\lambda^{*},\lambda_{1}^{*},\ldots,\lambda_{n}^{*})=0$, i.e. the first-order condition is satisfied.

3. $\lambda^{*}c(d_{n}^{rel})=\lambda_{1}^{*}c_{1}(d_{n}^{rel})=\cdots=\lambda_{n}^{*}c_{n}(d_{n}^{rel})=0$, i.e. complementary slackness holds.

We proceed to the proofs of these three statements.

1. Clearly, $c_{i}(d_{n}^{rel})\leq 0\ \forall\ i\in[n]$. To show $c(d_{n}^{rel})\leq 0$, we claim (and will return to prove) that $h^{*}$ is a fixed point of $h$, i.e. $h^{*}=h(h^{*})$. Assuming this claim holds, we have

[TABLE]

where the last two equalities use the fixed point claim and the definition of $h$, respectively.

2. First, let $i\in[n]$ satisfy $r(i)>(h^{*})^{2}$, so that $d_{n}^{rel}(i)=-d_{in}^{A}(i)+d_{in}^{A}(i)\sqrt{r(i)}/h^{*}$ and $\lambda_{i}^{*}=0$. Then

[TABLE]

Next, let $i\in[n]$ satisfy $r(i)\leq(h^{*})^{2}$, so that $d_{n}^{rel}(i)=0$ and $\lambda_{i}^{*}=((h^{*})^{2}-r(i))/m_{n}$. Then

[TABLE]

3. For any $i\in[n]$, we have

[TABLE]

Clearly, the first $(\cdot)_{+}$ term is zero if $r(i)>(h^{*})^{2}$ , the second is zero if $r(i)<(h^{*})^{2}$ , and both are zero if $r(i)=(h^{*})^{2}$ . Finally, $\lambda^{*}c(d_{n}^{rel})=0$ holds by (114).

We return to establish the fixed point claim. We in fact prove the slightly stronger result

[TABLE]

The fixed point claim then follows, since $h^{*}\geq h(h^{*})$ by definition and $h^{*}\leq h(h^{*})$ by (120) with $x=x^{*}$ , where $x^{*}$ is a maximizer of $h$ . Thus, it suffices to prove (120). Towards this end, fix $x\in\mathbb{R}_{+}$ . We first assume $x\geq h(x)$ and will return to address the other case. For any $y,z\in\mathbb{R}\cup\{\infty\}$ , we define

[TABLE]

where by convention $N(y,z)=D(y,z)=0$ if $y,z$ are such that $\{i\in[n]:r(i)\in[y^{2},z^{2})\}=\emptyset$ (i.e. if the sums are over empty sets). Then by definition of $h$ , $N$ , and $D$ , we have

[TABLE]

Again by definition of $h$ , $N$ , and $D$ , and recalling $r(i)=d_{out}(i)/d_{in}^{A}(i)$ , we also have

[TABLE]

Thus, combining the previous two equations, we obtain

[TABLE]

If instead $x\leq h(x)$ , we can use the same argument to obtain

[TABLE]

8.2 Rewriting the objective function

We aim to prove (39) and (40), which we restate here for convenience:

[TABLE]

For the equality in (128), we write

[TABLE]

where the first, fourth, and fifth equalities hold by definition of $g_{n}$ (see proof of Theorem 2), $d_{n}^{rand}$ (see Algorithm 2), and $\tilde{p}_{n}(d)$ (see (27)), respectively, and the others are straightforward. For the inequality in (128), we first write (similar to above)

[TABLE]

Now fix $i\in[n]$ and $j\in[b_{n}]$ . Then by the smoothing property,

[TABLE]

We next observe

[TABLE]

where we used independence of $\{W_{j}\}_{j=1}^{k}$ , Algorithm 2, and (114), respectively. Combining the previous two identities,

[TABLE]

where the first inequality is Jensen’s and the second holds since $1\leq d_{in}^{A}(i)$ and $\frac{b_{n}-1}{b_{n}}<2$ . Substituting into (131), we obtain

[TABLE]

where the equality holds by definition of $\tilde{p}_{n}$ (27).

8.3 Self-bounding concentration

As mentioned in the main text, we exploit the theory of self-bounding functions.

Definition 8.27.

[14, Section 3.3] Let $\mathcal{X}$ be a measurable space, $l\in\mathbb{N}$, and $f:\mathcal{X}^{l}\rightarrow[0,\infty)$. We say $f$ is a self-bounding function if there exist auxiliary functions $f_{-i}:\mathcal{X}^{l-1}\rightarrow\mathbb{R},i\in[l]$ such that, for any $x=(x_{1},\ldots,x_{l})\in\mathcal{X}^{l}$,

[TABLE]

where $x_{-i}=(x_{1},\ldots,x_{i-1},x_{i+1},\ldots,x_{l})\ \forall\ i\in[l]$ .

Theorem 8.28.

[14, Theorem 6.12] Let $X_{1},\ldots,X_{l}$ be independent $\mathcal{X}$-valued random variables, define $X=(X_{1},\ldots,X_{l})$, and let $f:\mathcal{X}^{l}\rightarrow[0,\infty)$ be self-bounding. Then for any $t\in(0,\mathbb{E}f(X)]$,

[TABLE]

Assuming for the moment that $g_{n}$ is self-bounding, we can apply the theorem with $t=\delta\mathbb{E}g_{n}(W)/(2+\delta)$ to obtain

[TABLE]

where $c_{\delta}=\frac{\delta^{2}}{4(2+\delta)^{2}}$ as in Theorem 2. This completes the proof of (43) from the main text.
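To see Theorem 8.28 at work on a simple example (not on $g_{n}$ itself), the Python sketch below uses the number of distinct values among $l$ i.i.d. uniform draws from $m$ symbols, a standard self-bounding function: with $f_{-i}$ the number of distinct values after deleting the $i$-th draw, each difference lies in $\{0,1\}$, and the differences sum to the number of values seen exactly once, which is at most $f$. The sketch checks these two conditions of Definition 8.27 on random inputs, computes the exact law of the count by dynamic programming, and compares its lower tail with $\exp(-t^{2}/(2\mathbb{E}f(X)))$, the lower-tail bound for self-bounding functions in [14]. The parameters and function names are arbitrary illustrative choices.

```python
import math
import random


def occupancy_distribution(m, l):
    """Exact law of Z = number of distinct values among l i.i.d. uniform draws from m symbols."""
    dist = [1.0] + [0.0] * m  # dist[j] = P(Z = j); before any draw, Z = 0
    for _ in range(l):
        new = [0.0] * (m + 1)
        for j, pj in enumerate(dist):
            if pj == 0.0:
                continue
            new[j] += pj * j / m  # the draw repeats an already-seen symbol
            if j < m:
                new[j + 1] += pj * (m - j) / m  # the draw reveals a new symbol
        dist = new
    return dist


def lower_tail_vs_bound(m, l, frac):
    """Return (exact P(Z <= EZ - t), exp(-t^2 / (2 EZ))) for t = frac * EZ, 0 < frac <= 1."""
    dist = occupancy_distribution(m, l)
    ez = sum(j * pj for j, pj in enumerate(dist))
    t = frac * ez
    tail = sum(pj for j, pj in enumerate(dist) if j <= ez - t)
    return tail, math.exp(-t * t / (2.0 * ez))


def self_bounding_conditions_hold(x):
    """Check Definition 8.27 for f = number of distinct entries, f_{-i} = same without x_i."""
    f = len(set(x))
    diffs = [f - len(set(x[:i] + x[i + 1:])) for i in range(len(x))]
    return all(0 <= d <= 1 for d in diffs) and sum(diffs) <= f


rng = random.Random(0)
sample = [rng.randrange(5) for _ in range(8)]
print(sample, self_bounding_conditions_hold(sample))
print(lower_tail_vs_bound(20, 15, 0.25))
```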

To verify $g_{n}$ is self-bounding, we use the most obvious choice of auxiliary functions: let

[TABLE]

where $w_{-i}=(w_{1},\ldots,w_{i-1},w_{i+1},\ldots,w_{b_{n}})$ for $w=(w_{1},\ldots,w_{b_{n}})\in[n]^{b_{n}}$, i.e. we simply ignore the $i$-th coordinate of $w$. Towards bounding $g_{n}(w)-g_{n,-i}(w_{-i})$, we first observe

[TABLE]

where in (144) we computed the difference of fractions in (143), in (145) we replaced $w_{j}$ by $w_{i}$ (which is permitted due to the indicator $1(w_{i}=w_{j})$ ), and in (146) we rearranged the expression; the upper bound in (147) is obvious, while the lower bound holds since the second factor in (146) is less than 1. Using the upper bound in (147), we can then obtain

[TABLE]

On the other hand, using the lower bound in (147), along with (148)-(149), we immediately obtain $g_{n}(w)-g_{n,-i}(w_{-i})>0$ . Together with (150), the first condition in Definition 8.27 holds. To verify the second condition in Definition 8.27, we use the leftmost expression in (150) to obtain

[TABLE]

8.4 Proof of Corollary 4.4

Let $\mathcal{I}_{n}=\{i\in[n]:r(i)\geq\epsilon\bar{r}\}$ ; recall $|\mathcal{I}_{n}|\rightarrow\infty$ as $n\rightarrow\infty$ by assumption. Define $d_{n}\in\mathbb{R}_{+}^{n}$ by

[TABLE]

Clearly, $\sum_{i=1}^{n}d_{n}(i)=b_{n}$ , so $\tilde{p}_{n}(d_{n}^{rel})\leq\tilde{p}_{n}(d_{n})$ (see Appendix 8.1). Hence, by Theorem 2, we aim to find $\{\delta_{n}\}_{n\in\mathbb{N}}$ s.t.

[TABLE]

where $c_{\delta_{n}}=\frac{\delta_{n}^{2}}{4(2+\delta_{n})^{2}}$ as in Theorem 2. In fact, it suffices to show $m_{n}(1-\tilde{p}_{n}(d_{n}))/\bar{r}\rightarrow\infty$, since then we can choose $\delta_{n}$ such that (for example) $c_{\delta_{n}}=\sqrt{\frac{\bar{r}}{m_{n}(1-\tilde{p}_{n}(d_{n}))}}$ to ensure (153) holds. Toward this end, first note that for any $i\in\mathcal{I}_{n}$,

[TABLE]

Hence, by definition of $\tilde{p}_{n}$ (27),

[TABLE]

Recall $|\mathcal{I}_{n}|\rightarrow\infty$ by assumption, so

[TABLE]

Since also $\epsilon b_{n}\rightarrow\infty$ by assumption, the final expression in (155) diverges, as desired.
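Since $\delta\mapsto c_{\delta}=\delta^{2}/(4(2+\delta)^{2})$ is increasing on $(0,\infty)$ with range $(0,1/4)$, the choice of $\delta_{n}$ above can be made explicit: solving $\sqrt{c}=\delta/(2(2+\delta))$ gives $\delta=4\sqrt{c}/(1-2\sqrt{c})$ for $c\in(0,1/4)$. Because the prescribed $c_{\delta_{n}}$ tends to zero (and so lies below $1/4$ for $n$ large), this also shows $\delta_{n}\rightarrow 0$. A short Python check of the inversion (the function names are illustrative):

```python
import math


def c_delta(delta):
    """c_delta = delta^2 / (4 (2 + delta)^2), as in Theorem 2."""
    return delta ** 2 / (4.0 * (2.0 + delta) ** 2)


def delta_from_c(c):
    """Inverse of c_delta on delta > 0; defined for 0 < c < 1/4."""
    if not 0.0 < c < 0.25:
        raise ValueError("c_delta takes values in (0, 1/4) for delta > 0")
    s = math.sqrt(c)
    return 4.0 * s / (1.0 - 2.0 * s)


# As the prescribed c_{delta_n} shrinks, the implied delta_n shrinks too.
for c in (0.1, 0.01, 0.0001):
    print(c, delta_from_c(c))
```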

8.5 Other algorithmic details

We first show Algorithm 1 solves (28). We require a basic fact about discrete convexity.

Definition 8.29.

[12, Section 1.4.2] Let $f:\mathbb{Z}^{n}\rightarrow\mathbb{R}\cup\{\infty\}$ and $dom(f)=\{x\in\mathbb{Z}^{n}:f(x)\in\mathbb{R}\}$. Then $f$ is called M-convex if for any $x,y\in dom(f)$ and any $i\in[n]$ satisfying $x(i)>y(i)$, there exists $j\in[n]$ satisfying

[TABLE]

Theorem 8.30.

[12, Theorem 6.26] Let $f$ be M-convex, and let $x\in dom(f)$. Then

[TABLE]

In words, the theorem says that $x$ minimizes $f$ if and only if $f$ cannot be decreased by an “exchange,” wherein $x$ is replaced by $x-e_{i}+e_{j}$. Note that Algorithm 1 terminates precisely when this criterion is satisfied, so if we can show that (29) is M-convex, we obtain as a corollary that Algorithm 1 solves (28).

To show M-convexity, let $d,d^{\prime}\in dom(\hat{p}_{n}),i\in[n]$ s.t. $d(i)>d^{\prime}(i)$ . Then since $\sum_{k=1}^{n}d(k)=\sum_{k=1}^{n}d^{\prime}(k)=b_{n}$ , we clearly have $d^{\prime}(j)>d(j)$ for some $j\in[n]$ . From $\sum_{k=1}^{n}d(k)=\sum_{k=1}^{n}d^{\prime}(k)=b_{n}$ and $d(i),d^{\prime}(j)\geq 1$ , it is also clear that $d-e_{i}+e_{j},d^{\prime}+e_{i}-e_{j}\in dom(\hat{p}_{n})$ . Hence, letting $\mu(k)=d_{out}(k)d_{in}^{A}(k)/m_{n}$ ,

[TABLE]

where we have simply used the definitions of $\hat{p}_{n},\tilde{p}_{n}$ . Similarly, we obtain

[TABLE]

Adding the previous two equations, and using the inequalities $d(i)\geq d^{\prime}(i)+1,d^{\prime}(j)\geq d(j)+1$ (where the first holds since $d(i)>d^{\prime}(i)$ and $d(i),d^{\prime}(i)\in\mathbb{Z}$ , and the second holds similarly) gives $\hat{p}_{n}(d-e_{i}+e_{j})+\hat{p}_{n}(d^{\prime}+e_{i}-e_{j})\leq\hat{p}_{n}(d)+\hat{p}_{n}(d^{\prime})$ .

For the runtime of Algorithm 1, we note the following:

•

The complexity of each iteration is dominated by the computation of $\{\hat{p}_{n}(d-e_{i}+e_{j})\}_{i,j\in[n]}$ . By (159), we can compute $\hat{p}_{n}(d-e_{i}+e_{j})$ in $O(1)$ time per $i,j$ pair, which yields $O(n^{2})$ complexity per iteration.

•

In the best case, the initial choice of $d$ is actually a solution. However, it still requires one iteration to verify this, so the best-case complexity is $O(n^{2})$ .

•

In the general case, [12, Section 10.1.1] provides a tie-breaking rule for the choice of $(i^{*},j^{*})$ that guarantees termination in $\max\{\|d-d^{\prime}\|_{1}:d,d^{\prime}\in dom(\hat{p}_{n})\}=O(b_{n})$ iterations, which means $O(n^{2}b_{n})$ complexity.
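The exchange step certified by Theorem 8.30 can be illustrated with a short Python sketch. It assumes, for illustration only, that $dom(\hat{p}_{n})=\{d\in\mathbb{Z}_{+}^{n}:\sum_{k}d(k)=b_{n}\}$ and that the objective on this domain has the separable form $\sum_{k}\mu(k)/(d_{in}^{A}(k)+d(k))$ with $\mu(k)$ as in the M-convexity argument above. The function names, the starting point, and the steepest-descent choice of $(i^{*},j^{*})$ are hypothetical and need not match Algorithm 1 or the tie-breaking rule of [12, Section 10.1.1]. Each sweep evaluates all $O(n^{2})$ pairs at $O(1)$ cost per pair.

```python
import itertools


def objective(d, mu, a):
    # Assumed separable objective: sum_k mu[k] / (a[k] + d[k]), with a[k] = d_in^A(k) >= 1.
    return sum(m / (ak + dk) for m, ak, dk in zip(mu, a, d))


def exchange_minimize(mu, a, budget, tol=1e-12):
    """Local search over integer d >= 0 with sum(d) == budget, using moves d - e_i + e_j."""
    n = len(mu)
    d = [0] * n
    d[0] = budget  # any feasible starting point

    def term(k, x):
        return mu[k] / (a[k] + x)

    while True:
        best_delta, best_pair = -tol, None
        for i in range(n):
            if d[i] == 0:
                continue  # d - e_i would leave the domain
            loss = term(i, d[i] - 1) - term(i, d[i])  # change from removing one unit at i
            for j in range(n):
                if j == i:
                    continue
                gain = term(j, d[j] + 1) - term(j, d[j])  # change from adding one unit at j
                if loss + gain < best_delta:
                    best_delta, best_pair = loss + gain, (i, j)
        if best_pair is None:
            return d  # no improving exchange, i.e. the criterion of Theorem 8.30 holds
        i, j = best_pair
        d[i] -= 1
        d[j] += 1


def brute_force(mu, a, budget):
    """Minimum of the objective over all feasible d, by enumeration (small n only)."""
    values = [objective(d, mu, a)
              for d in itertools.product(range(budget + 1), repeat=len(mu))
              if sum(d) == budget]
    return min(values)


mu, a, budget = [3.0, 1.0, 2.0, 0.5], [1, 2, 1, 3], 5
print(exchange_minimize(mu, a, budget))
```

On small instances the exchange result can be compared with brute-force enumeration; for an M-convex objective the two should agree, since local and global optimality coincide.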

For the randomized scheme (Algorithm 2), first observe that by definition of $h$ , $\{h(x)\}_{x\in\mathbb{R}_{+}}=\{h(\sqrt{r(i)})\}_{i\in[n]}$ . Furthermore, $\{h(\sqrt{r(i)})\}_{i\in[n]}$ , and thus $\{h(x)\}_{x\in\mathbb{R}_{+}}$ , can be computed in time $O(n\log n)$ as follows:

•

Compute a vector containing $\{r(i)\}_{i\in[n]}$ sorted in decreasing order ( $O(n\log n)$ time).

•

Iteratively compute the sums in (32) at each $x\in\{\sqrt{r(i)}\}_{i\in[n]}$ ( $O(n)$ time).

•

Compute $\{h(\sqrt{r(i)})\}_{i\in[n]}$ ( $O(n)$ time).
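The three steps above can be sketched in Python. The sketch assumes a specific form of $h$ that is not reproduced in this section, namely $h(x)=\sum_{i:r(i)\geq x^{2}}\sqrt{d_{out}(i)d_{in}^{A}(i)}\,/\,(b_{n}+\sum_{i:r(i)\geq x^{2}}d_{in}^{A}(i))$. This is a reconstruction chosen so that the closed form $d_{n}^{rel}(i)=d_{in}^{A}(i)(\sqrt{r(i)}/h^{*}-1)_{+}$ from Appendix 8.1 exhausts the budget exactly, and it should be checked against (32). The relaxed objective is likewise assumed to be $\sum_{i}d_{out}(i)d_{in}^{A}(i)/(d_{in}^{A}(i)+d(i))$ up to the factor $1/m_{n}$. The helper names are hypothetical.

```python
import math


def relaxed_solution(d_out, d_in_a, budget):
    """Sketch of the O(n log n) computation of h* and d_rel (assumed form of h, see text).

    h(x) is taken to be N(x) / (budget + D(x)), where N(x) and D(x) sum
    sqrt(d_out(i) * d_in_a(i)) and d_in_a(i) over agents with r(i) >= x^2.
    Requires d_in_a(i) >= 1 for all i and d_out(i) > 0 for some i.
    """
    n = len(d_out)
    r = [d_out[i] / d_in_a[i] for i in range(n)]
    order = sorted(range(n), key=lambda i: r[i], reverse=True)  # O(n log n)
    num = den = 0.0
    h_star = 0.0
    for pos, i in enumerate(order):  # running sums: O(n) in total
        num += math.sqrt(d_out[i] * d_in_a[i])
        den += d_in_a[i]
        # Evaluate h at x = sqrt(r(i)) once all agents tied with r(i) are included.
        if pos == n - 1 or r[order[pos + 1]] != r[i]:
            h_star = max(h_star, num / (budget + den))
    d_rel = [d_in_a[i] * max(math.sqrt(r[i]) / h_star - 1.0, 0.0) for i in range(n)]
    return h_star, d_rel


def relaxed_objective(d_out, d_in_a, d):
    """Assumed relaxed objective, up to the constant factor 1/m_n."""
    return sum(o * a / (a + x) for o, a, x in zip(d_out, d_in_a, d))


d_out, d_in_a, budget = [4, 1, 9, 2, 0], [1, 1, 1, 2, 3], 3
h_star, d_rel = relaxed_solution(d_out, d_in_a, budget)
print(h_star, d_rel)
```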

In summary, $\{h(x)\}_{x\in\mathbb{R}_{+}}$ (which contains at most $n$ elements) can be computed in $O(n\log n)$ time. After computing this set, $h^{*}$ , and subsequently $d_{n}^{rel}$ , can each be computed in linear time. Thus, computing the relaxed solution (31) requires $O(n\log n)$ complexity. Finally, assuming we can obtain one sample from $d_{n}^{rel}$ in $O(1)$ time after $O(n\log n)$ pre-processing time (using e.g. the alias method [34, Section 3.4.1]), Algorithm 2 has total complexity $O(n\log n+b_{n})$ .
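The alias method mentioned above can be sketched as follows. This is Vose's variant of Walker's alias method, written as an illustration rather than the implementation used for the experiments: preprocessing takes $O(n)$ time (within the $O(n\log n)$ budget stated above) and each draw costs $O(1)$. Passing the weights $d_{n}^{rel}(i)$ would produce draws with probability proportional to $d_{n}^{rel}(i)$, which is our assumption about how Algorithm 2 samples; the weights below are a hypothetical stand-in.

```python
import random


def build_alias(weights):
    """Vose's alias method: O(n) tables for O(1) sampling proportional to weights."""
    n = len(weights)
    total = float(sum(weights))
    prob = [w * n / total for w in weights]  # scaled so the average column height is 1
    alias = list(range(n))
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l  # column s is filled up to height 1 with mass from l
        prob[l] = prob[l] + prob[s] - 1.0
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:  # leftover columns have height 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def draw(prob, alias, rng):
    """One O(1) draw: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


weights = [1.0, 0.0, 2.5, 0.5, 3.0]  # hypothetical stand-in for d_rel
prob, alias = build_alias(weights)
rng = random.Random(1)
print([draw(prob, alias, rng) for _ in range(10)])
```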

8.6 Additional experiments

Figure 5 shows an analogue of Figure 1 with budget $b_{n}=\lceil\tilde{b}|E_{n}|\rceil$ for each $\tilde{b}\in\{\frac{1}{1600},\frac{1}{800},\frac{1}{200},\frac{1}{100}\}$. The results are qualitatively similar to Figure 1 (Algorithm 1 outperforms Algorithm 2, which itself outperforms the heuristics). We also observe that the gap between the heuristics and our algorithms generally increases as the budget decreases for a fixed social network. Put differently, if an adversary with a limited budget spends this budget intelligently (i.e. using our proposed solutions), they can still disrupt learning; in contrast, an adversary with a large budget need not be as careful.

9 Proof of Theorems 1 and 6.6 (details)

9.1 Branching process approximation (Step 1 for proofs of Theorems 1 and 6.6)

9.1.1 Proof of Lemma 7.7

For $t\in\mathbb{N}_{0}$ , let $\alpha_{t},\beta_{t}$ denote the parameters $\{\alpha_{t}(i)\}_{i\in A\cup B},\{\beta_{t}(i)\}_{i\in A\cup B}$ in vector form, and let $\mathbf{1}$ denote the all ones vector. We claim

[TABLE]

We prove (162) for $\alpha_{t}$ ; the proof for $\beta_{t}$ follows the same approach. First, we use the parameter update equations (3), and the definitions of $P$ and $Q$ from Appendix 7.1 ( $P$ being the column-normalized adjacency matrix and $Q=(1-\eta)I+\eta P$ ) to write the parameter update equation in vector form as

[TABLE]

We next use induction. For $t=1$ , (162) is equivalent to (163). Assuming (162) holds for $t-1$ , we have

[TABLE]

which completes the proof. Next, recalling $e_{i}$ is the vector with 1 in the $i$ -th position and 0 elsewhere,

[TABLE]

where the equalities hold by definition, by (162), since the columns of $Q$ sum to 1 by definition, and by multiplying numerator and denominator by $\frac{1}{(1-\eta)T_{n}}$ , respectively. Next, recall from Section 2 that $\alpha_{0}(j)\in[0,\bar{\alpha}]\ \forall\ j\in A\cup B$ for some $\bar{\alpha}>0$ . Hence, $\alpha_{0}$ is element-wise upper bounded by $\bar{\alpha}\mathbf{1}$ , so $\alpha_{0}Q^{T_{n}}e_{i}^{\mathsf{T}}\leq\bar{\alpha}\mathbf{1}Q^{T_{n}}e_{i}^{\mathsf{T}}=\bar{\alpha}$ , where we have used column stochasticity of $Q$ . Additionally, $\alpha_{0}Q^{T_{n}}e_{i}^{\mathsf{T}}\geq 0$ (since the three terms in the product are elementwise nonnegative). By a similar argument, $0\leq\beta_{0}Q^{T_{n}}e_{i}^{\mathsf{T}}\leq\bar{\beta}$ . Taken together, we can use the previous equation to obtain

[TABLE]

Moreover, recall from Section 2 that $\bar{\alpha}$ and $\bar{\beta}$ are independent of $n$. Hence, because $T_{n}\rightarrow\infty$ as $n\rightarrow\infty$ (by A4 in the statement of the lemma), $\bar{\alpha}/T_{n},\bar{\beta}/T_{n}\rightarrow 0$ as $n\rightarrow\infty$. It follows that, for given $\epsilon>0$ and $n$ sufficiently large, $|\theta_{T_{n}}(i)-\frac{1}{T_{n}}\sum_{\tau=1}^{T_{n}}s_{\tau}Q^{T_{n}-\tau}e_{i}^{\mathsf{T}}|<\epsilon$. Finally, by changing the index of summation, it is clear that $\frac{1}{T_{n}}\sum_{\tau=1}^{T_{n}}s_{\tau}Q^{T_{n}-\tau}e_{i}^{\mathsf{T}}=\vartheta_{T_{n}}(i)$, completing the proof.
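The unrolling in (162) is a generic fact about linear recursions of the form (163). The Python sketch below checks it numerically, assuming (as (163) suggests) the row-vector update $\alpha_{t}=\alpha_{t-1}Q+s_{t}$; the random graph, $\eta$, and the signals are arbitrary illustrative choices. It also checks the identity $\mathbf{1}Q=\mathbf{1}$ (columns of $Q$ sum to 1) and the bound $0\leq\alpha_{0}Q^{T_{n}}e_{i}^{\mathsf{T}}\leq\bar{\alpha}$ with $\bar{\alpha}=1$.

```python
import numpy as np


def iterate(alpha0, signals, Q):
    """Run the assumed row-vector recursion alpha_t = alpha_{t-1} Q + s_t."""
    alpha = alpha0.copy()
    for s in signals:
        alpha = alpha @ Q + s
    return alpha


def unrolled(alpha0, signals, Q):
    """Closed form alpha_T = alpha_0 Q^T + sum_{tau=1}^T s_tau Q^(T - tau)."""
    T = len(signals)
    out = alpha0 @ np.linalg.matrix_power(Q, T)
    for tau, s in enumerate(signals, start=1):
        out = out + s @ np.linalg.matrix_power(Q, T - tau)
    return out


rng = np.random.default_rng(0)
n, T, eta = 5, 8, 0.4
adj = rng.integers(0, 2, size=(n, n)).astype(float)
adj[(np.arange(n) + 1) % n, np.arange(n)] = 1.0  # give every column at least one entry
P = adj / adj.sum(axis=0, keepdims=True)          # column-normalised adjacency matrix
Q = (1 - eta) * np.eye(n) + eta * P
alpha0 = rng.uniform(0.0, 1.0, size=n)             # alpha_0 entries in [0, 1], i.e. alpha_bar = 1
signals = rng.integers(0, 2, size=(T, n)).astype(float)
print(iterate(alpha0, signals, Q))
```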

9.1.2 Proof of Lemma 7.9

We begin by arguing $\vartheta_{T_{n}}(i^{*})|\{\tau_{n}>T_{n}\}\stackrel{{\scriptstyle\mathscr{D}}}{{=}}\hat{\vartheta}_{T_{n}}(\phi)$ . For this, first consider the sub-graph containing only edges between two agents formed during the first $T_{n}$ iterations of Algorithm 3. Conditioned on $\tau_{n}>T_{n}$ , this sub-graph is constructed as follows:

•

The initial agent $i^{*}$ is sampled uniformly from $A$ (Line 3), which implies its degrees $(d_{out}(i^{*}),d_{in}^{A}(i^{*}),d_{in}^{B}(i^{*}))$ are distributed as $f_{n}^{*}$. (In fact, this holds even if $\tau_{n}\leq T_{n}$.)

•

Each time an edge is added to the sub-graph (Line 3), the paired outstub $(i^{\prime},j^{\prime})$ is sampled uniformly from $O_{A}$ (else, $\tau_{n}>T_{n}$ is contradicted by Line 3-3), so the degrees $(d_{out}(i^{\prime}),d_{in}^{A}(i^{\prime}),d_{in}^{B}(i^{\prime}))$ of the corresponding agent $i^{\prime}$ are distributed as $f_{n}$.

•

The initial agent $i^{*}$ has no paired outstubs, while all other agents in the sub-graph have one paired outstub (otherwise, an outstub with label 2 was paired within the first $T_{n}$ iterations, contradicting $\tau_{n}>T_{n}$ by Line 3); in particular, the sub-graph has $|\cup_{l=0}^{T_{n}}A_{l}|$ nodes and $|\cup_{l=0}^{T_{n}}A_{l}|-1$ edges. Also, every agent in the sub-graph has a path to $i^{*}$ by the breadth-first-search nature of the construction, so, neglecting edge polarities, we obtain a connected graph with $|\cup_{l=0}^{T_{n}}A_{l}|$ nodes and $|\cup_{l=0}^{T_{n}}A_{l}|-1$ edges, i.e. a tree. Finally, since all edges point towards $i^{*}$ (see Line 3), the sub-graph is a directed tree pointed towards $i^{*}$ .

In summary, the sub-graph is a directed tree pointing towards an agent with degrees distributed as $f_{n}^{*}$ , in which all other nodes have degrees distributed as $f_{n}$ . This is precisely the procedure used to construct the sub-graph of agents during the first $T_{n}$ iterations of Algorithm 4. Additionally, Algorithms 3 and 4 add bots in the same manner (Lines 3-3 in Algorithm 3, Lines 4-4 in Algorithm 4). Taken together, we conclude that, conditioned on $\tau_{n}>T_{n}$ , the $T_{n}$ -step neighborhood into $i^{*}$ is constructed in the same manner in Algorithm 3 as the $T_{n}$ -step neighborhood into $\phi$ is constructed in Algorithm 4. Furthermore, by (58) and (60), it is clear that $\vartheta_{T_{n}}(i)$ and $\hat{\vartheta}_{T_{n}}(\phi)$ , respectively, depend only on these respective neighborhoods, and on the signals $s_{T_{n}-t}(i)$ and $\hat{s}_{T_{n}-t}(\mathbf{i})$ , respectively. Since the signals $s_{T_{n}-t}(i)$ and $\hat{s}_{T_{n}-t}(\mathbf{i})$ are also defined in the same manner ( $s_{T_{n}-t}(i),\hat{s}_{T_{n}-t}(\mathbf{i})\sim\textrm{Bernoulli}(\theta)$ for $i\in A,\mathbf{i}\in\hat{A}$ ; $s_{T_{n}-t}(i)=\hat{s}_{T_{n}-t}(\mathbf{i})=0$ for $i\in B,\mathbf{i}\in\hat{B}$ ), we ultimately conclude that $\vartheta_{T_{n}}(i^{*})$ and $\hat{\vartheta}_{T_{n}}(\phi)$ have the same distribution when $\tau_{n}>T_{n}$ holds.

We next argue $\{\tau_{n}>T_{n}\}$ occurs with high probability when $\Omega_{n,1}$ holds. For this, we note that Algorithm 3 is nearly identical to the graph construction described in [33, Section 5.2]. More specifically, the only difference is that the construction in [33] does not include the pairing of agent instubs with bots in Lines 3-3 of Algorithm 3. However, these lines do not affect $\tau_{n}$ . Moreover, when A1 holds, the assumptions of [33, Lemma 5.4] are satisfied. This lemma states that, if $t_{n}<(\log n)/(2\log(\nu_{3}/\nu_{1}))$ and $\nu_{3}>\nu_{1}$ (with $\nu_{1},\nu_{3}$ defined as in A1), then $P(\tau_{n}\leq t_{n}|\Omega_{n,1})=O((\nu_{3}/\nu_{1})^{t_{n}}/\sqrt{n})$ . In particular, by A2 we have $T_{n}\leq\zeta\log(n)/\log(\nu_{3}/\nu_{1})$ for $n$ sufficiently large, with $\zeta\in(0,1/2)$ independent of $n$ ; substituting gives

[TABLE]
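Spelled out, the substitution uses only A2 and $\nu_{3}>\nu_{1}$ (this is a reconstruction of the step, not the omitted display itself):

$$
\left(\frac{\nu_{3}}{\nu_{1}}\right)^{T_{n}}=\exp\!\left(T_{n}\log\frac{\nu_{3}}{\nu_{1}}\right)\leq\exp(\zeta\log n)=n^{\zeta},
\qquad\text{so}\qquad
\mathbb{P}(\tau_{n}\leq T_{n}\,|\,\Omega_{n,1})=O\!\left(n^{\zeta-1/2}\right)\rightarrow 0 .
$$

The hypothesis $t_{n}<(\log n)/(2\log(\nu_{3}/\nu_{1}))$ of [33, Lemma 5.4] holds with $t_{n}=T_{n}$ because $\zeta<1/2$, and the same condition makes the exponent $\zeta-1/2$ negative.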

9.1.3 Proof of Lemma 7.13

We first claim that for $l\in\mathbb{N}_{0}$ and $\mathbf{i}\in\hat{A}_{l}$ ,

[TABLE]

(Recall $\hat{P}$ is the column-normalized adjacency matrix.) We prove (170) separately for $l=0$ and $l\in\mathbb{N}$ . When $l=0$ , the only case is $\mathbf{i}=\phi$ (since $\hat{A}_{0}=\{\phi\}$ ); if $l^{\prime}=0$ , the left side is clearly 1 and the right side is 1 by convention; if $l^{\prime}\in\mathbb{N}$ , the left side is 0 since $e_{\phi}\hat{P}^{l^{\prime}}=0$ ( $\phi$ has no outgoing neighbors in the tree). Next, we aim to prove (170) for $\mathbf{i}\in\hat{A}_{l}$ and $l\in\mathbb{N}$ . For such $\mathbf{i}$ , there is a unique path from $\mathbf{i}$ to $\phi$ with length $l$ that visits the nodes $\mathbf{i}|l=\mathbf{i},\mathbf{i}|l-1,\ldots,\mathbf{i}|0=\phi$ . By definition of $\hat{P}$ , it follows that

[TABLE]

On the other hand, if $l^{\prime}\neq l$ , no path of length $l^{\prime}$ from $\mathbf{i}$ to $\phi$ exists, so $e_{\mathbf{i}}\hat{P}^{l^{\prime}}e_{\phi}=0$ . This proves (170).

Recalling that $\hat{Q}=(1-\eta)I+\eta\hat{P}$ , we next claim that $\forall\ t\in\mathbb{N}_{0}$ ,

[TABLE]

We prove (172) inductively: both sides equal $I$ when $t=0$ ; assuming (172) is true for $t$ , we have

[TABLE]

where in the first line we have used the definition of $\hat{Q}$ and the inductive hypothesis, the second line simply uses the distributive property, the third rearranges summations, and the fourth uses Pascal’s rule ( $[t+1]$ has ${t+1\choose l}$ subsets of cardinality $l$ ; ${t\choose l-1}$ that contain 1 and ${t\choose l}$ that do not contain 1). This completes the proof of (172).
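(172) is the binomial theorem applied to $\hat{Q}=(1-\eta)I+\eta\hat{P}$, which is valid because $I$ and $\hat{P}$ commute, while (170) says that $\hat{P}^{l}$ only sees the unique root-ward path of length $l$. A small numerical check on a hypothetical four-node tree follows; the tree, $\eta$, and the function name are illustrative choices, and the right side of (172) is written as $\sum_{l=0}^{t}{t\choose l}\eta^{l}(1-\eta)^{t-l}\hat{P}^{l}$, the form implied by the Pascal's-rule argument above.

```python
import math

import numpy as np


def binomial_expansion(P, eta, t):
    """sum_{l=0}^t C(t, l) eta^l (1 - eta)^(t - l) P^l."""
    total = np.zeros_like(P, dtype=float)
    for l in range(t + 1):
        total += math.comb(t, l) * eta ** l * (1 - eta) ** (t - l) * np.linalg.matrix_power(P, l)
    return total


# Hypothetical tree pointed at the root phi = node 0: nodes 1, 2 -> 0 and node 3 -> 1.
# Column j of P_hat holds 1 / d_in(j) for each in-neighbour of j (column-normalised).
P_hat = np.zeros((4, 4))
P_hat[1, 0] = P_hat[2, 0] = 0.5
P_hat[3, 1] = 1.0
eta = 0.3
Q_hat = (1 - eta) * np.eye(4) + eta * P_hat
print(np.linalg.matrix_power(Q_hat, 3)[:, 0])
```

Consistent with (170), the entry of $\hat{P}^{2}$ from node 3 to the root equals $d_{in}(\phi)^{-1}d_{in}(1)^{-1}=1/2$, and the corresponding entry of $\hat{P}^{1}$ is zero.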

Having established (172) and (170), we can combine them to obtain $\forall\ t\in\mathbb{N}_{0},\mathbf{i}\in\hat{A}_{l}$ ,

[TABLE]

Finally, substituting the previous equation into (60), and recalling $\hat{A}=\cup_{l=0}^{\infty}\hat{A}_{l}$ , we obtain

[TABLE]

which completes the proof.

9.2 Step 2 for proof of Theorem 1

9.2.1 Proof of Lemma 7.15

First, letting $\mathcal{D}$ denote the degree sequence and $\mathcal{T}$ denote the set of random variables defining the tree structure, we can use Lemma 7.13 to write

[TABLE]

where the first equality uses the tower property of conditional expectation and the fact that $\hat{A}_{l}$ and $d(\mathbf{i}|j)^{-1}$ are fixed given the tree structure, the second uses the fact that $\hat{s}_{T_{n}-t}(\mathbf{i})\sim\textrm{Bernoulli}(\theta)$ , and the third holds by the tower property and the definition of $X_{l}^{1}$ , i.e.

[TABLE]

Here we have also used the fact that $\{X_{l}^{1}\}_{l\in\mathbb{N}}$ is a random walk starting at the root of a directed tree; hence, for $\mathbf{i}\in\hat{A}_{l}$ , $\mathbb{P}(X_{l}^{1}=\mathbf{i}|\mathcal{D},\mathcal{T})$ is the probability of the lone path from $\phi$ to $\mathbf{i}$ , which is $\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}$ , and $X_{l}^{1}\in\hat{A}\Leftrightarrow X_{l}^{1}=\mathbf{i}$ for some $\mathbf{i}\in\hat{A}_{l}$ . Next, using (180) and Lemma 7.19, we obtain

[TABLE]

where by convention the summation over $l$ is zero when $t=0$ . Adding and subtracting $(1-\eta)^{t}\tilde{p}_{n}^{*}/\tilde{p}_{n}$ , the previous equation can be rewritten as

[TABLE]

where we have simply used the binomial theorem and computed two geometric series.
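To make the last step concrete, assume (as its proof in Appendix 9.2.4 suggests) that Lemma 7.19 gives $\mathbb{P}_{n}(X_{l}^{1}\in\hat{A})=\tilde{p}_{n}^{*}\tilde{p}_{n}^{l-1}$ for $l\in\mathbb{N}$. The binomial theorem and the two geometric series then yield the reconstruction $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]=\frac{\theta}{T_{n}}\big[\frac{\tilde{p}_{n}^{*}}{\tilde{p}_{n}}\cdot\frac{1-(1-\eta(1-\tilde{p}_{n}))^{T_{n}}}{\eta(1-\tilde{p}_{n})}+(1-\frac{\tilde{p}_{n}^{*}}{\tilde{p}_{n}})\frac{1-(1-\eta)^{T_{n}}}{\eta}\big]$, which should agree with (182) up to presentation. The Python sketch below checks this closed form against the original double sum for a few arbitrary parameter values.

```python
import math


def expectation_double_sum(theta, eta, p, p_star, T):
    """(theta/T) * sum_{t<T} sum_{l<=t} C(t,l) eta^l (1-eta)^(t-l) P(X_l in A-hat)."""
    total = 0.0
    for t in range(T):
        for l in range(t + 1):
            # Assumed form of Lemma 7.19: P(X_0 in A-hat) = 1, P(X_l in A-hat) = p* p^(l-1).
            visit = 1.0 if l == 0 else p_star * p ** (l - 1)
            total += math.comb(t, l) * eta ** l * (1 - eta) ** (t - l) * visit
    return theta * total / T


def expectation_closed_form(theta, eta, p, p_star, T):
    """Same quantity after the binomial theorem and two geometric series (needs 0 < p < 1)."""
    ratio = p_star / p
    first = ratio * (1 - (1 - eta * (1 - p)) ** T) / (eta * (1 - p))
    second = (1 - ratio) * (1 - (1 - eta) ** T) / eta
    return theta * (first + second) / T


print(expectation_double_sum(0.6, 0.3, 0.8, 0.9, 40), expectation_closed_form(0.6, 0.3, 0.8, 0.9, 40))
```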

Next, we assume temporarily that $p_{n}\rightarrow 1$ as $n\rightarrow\infty$ . By A3, we have for $\omega\in\Omega_{n,2}$

[TABLE]

Hence, by $p_{n}\rightarrow 1$ , and since $\delta_{n}\rightarrow 0$ by A3, we have for $\gamma_{1}>0$ , $n$ sufficiently large, and such $\omega$

[TABLE]

where we have also used the fact that $1\geq\tilde{p}_{n}^{*}\geq\tilde{p}_{n}$ on $\Omega_{n,2}$ by A3. Also, by A4, it is clear that $(1-(1-\eta)^{T_{n}})/T_{n}\rightarrow 0$, so for given $\gamma_{2}>0$ and $n$ sufficiently large,

[TABLE]

Combining the previous four equations implies that for $n$ sufficiently large and $\omega\in\Omega_{n,2}$ ,

[TABLE]

We complete the proof for the case $T_{n}(1-p_{n})\rightarrow 0$ ; the proof for the other two cases is similar. In this case, we can use Lemma 9.31 from Appendix 9.4 to obtain for any $\gamma_{3}>0$ and for $n$ large enough

[TABLE]

Combining the previous two equations gives for $n$ large and $\omega\in\Omega_{n,2}$

[TABLE]

Hence, for given $\gamma>0$ , we can find $\gamma_{1},\gamma_{2},\gamma_{3}$ sufficiently small and $n$ sufficiently large such that, for $\omega\in\Omega_{n,2}$ , $|\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)](\omega)-\theta|<\gamma$ . This clearly also implies $|\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)](\omega)-\theta|1(\Omega_{n,2})(\omega)<\gamma$ for such $\omega$ . On the other hand, for $\omega\notin\Omega_{n,2}$ , it is trivial that $|\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)](\omega)-\theta|1(\Omega_{n,2})(\omega)=0<\gamma$ . This completes the proof for the case $T_{n}(1-p_{n})\rightarrow 0$ .

We now return to the case $p_{n}\rightarrow p\in[0,1)$. In this case, since $T_{n}\rightarrow\infty$ by A4, $T_{n}(1-p_{n})\rightarrow c\in[0,\infty)$ cannot occur, i.e. we need only consider the case $T_{n}(1-p_{n})\rightarrow\infty$. First, note that since $p_{n}\rightarrow p<1$ and $\delta_{n}\rightarrow 0$, we have $p_{n}+\delta_{n}<1-\gamma_{1}$ for some $\gamma_{1}>0$ and $n$ sufficiently large. For such $n$, and for $\omega\in\Omega_{n,2}$, we then obtain $\tilde{p}_{n}(\omega)<1-\gamma_{1}$; substituting into (182) (evaluated at $\omega$) gives

[TABLE]

where in the first inequality we used $\tilde{p}_{n}(\omega)<1-\gamma_{1}$ and $\tilde{p}_{n}^{*}(\omega)\leq 1$ , in the second we used $1-\gamma_{1}\in(0,1)$ (so that $(1-\eta)^{t}<(1-\eta)^{t}/(1-\gamma_{1})$ ), for the equality we used the binomial theorem and computed a geometric series, and the final inequality is immediate. Since $\theta,\eta,\gamma_{1}$ are independent of $n$ , while $T_{n}\rightarrow\infty$ as $n\rightarrow\infty$ by A4, it is clear from this final expression that, for given $\gamma>0$ , $n$ sufficiently large, and $\omega\in\Omega_{n,2}$ , $0\leq\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)](\omega)<\gamma$ . It follows that $|\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]|1(\Omega_{n,2})\rightarrow 0\ a.s.$ , completing the proof.

9.2.2 Proof of Lemma 7.17

First, suppose $p_{n}\rightarrow p\in[0,1)$ . Then, since $\hat{\vartheta}_{T_{n}}(\phi)\leq 1\ a.s.$ (see (61) and the following argument), $\textrm{Var}_{n}(\hat{\vartheta}_{T_{n}}(\phi))\leq\mathbb{E}_{n}\hat{\vartheta}_{T_{n}}(\phi)^{2}\leq\mathbb{E}_{n}\hat{\vartheta}_{T_{n}}(\phi)$ . Furthermore, since $T_{n}\rightarrow\infty$ by A4, the fact that $p_{n}\rightarrow p\in[0,1)$ means only the case $T_{n}(1-p_{n})\rightarrow\infty$ can occur. In this case, since $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]1(\Omega_{n,2})\rightarrow 0\ a.s.$ by Lemma 7.15, we immediately obtain from $\textrm{Var}_{n}(\hat{\vartheta}_{T_{n}}(\phi))\leq\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]$ that $\textrm{Var}_{n}(\hat{\vartheta}_{T_{n}}(\phi))1(\Omega_{n,2})\rightarrow 0\ a.s.$ as well. Hence, it only remains to prove the lemma in the case $p_{n}\rightarrow 1$ , which we assume to hold for the remainder of the proof.

Towards this end, letting $\mathcal{D}$ denote the degree sequence and $\mathcal{T}$ denote the set of random variables defining the tree structure (as in Appendix 9.2.1), we have

[TABLE]

We next consider the two summands in (197) in turn. In particular, we aim to show that each summand multiplied by $1(\Omega_{n,2})$ tends to zero $a.s.$ as $n$ tends to infinity.

For the first summand in (197), we use the fact that the signals are i.i.d. Bernoulli( $\theta$ ) given the tree structure, as well as Lemma 7.13, to write

[TABLE]

where in the final step we have used $\sum_{\mathbf{i}\in\hat{A}_{l}}\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}\leq 1$ and $\sum_{l=0}^{t}{t\choose l}\eta^{l}(1-\eta)^{t-l}=1$ . It immediately follows that $0\leq\mathbb{E}_{n}[\textrm{Var}(\hat{\vartheta}_{T_{n}}(\phi)|\mathcal{D},\mathcal{T})]1(\Omega_{n,2})\leq 1/T_{n}\ a.s.$ Hence, because $T_{n}\rightarrow\infty$ as $n\rightarrow\infty$ by A4, analysis of the first summand in (197) is complete.

For the second summand in (197), we first use the argument of (180) to write

[TABLE]

where we have defined $Y_{l}=\sum_{\mathbf{i}\in\hat{A}_{l}}\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}$ and $u_{T_{n},l}=\sum_{t=l}^{T_{n}-1}{t\choose l}\eta^{l}(1-\eta)^{t-l}$ . Therefore,

[TABLE]

It remains to compute the variance and covariance terms in (203). First, for any $l,l^{\prime}\in\mathbb{N}$ , we note

[TABLE]

where we have used the argument of (181) and the fact that $\{X_{i}^{1}\}_{i=1}^{\infty}$ and $\{X_{i}^{2}\}_{i=1}^{\infty}$ are independent random walks given the tree structure. By a similar argument, $\mathbb{E}_{n}[Y_{l}]=\mathbb{P}_{n}(X_{l}^{1}\in\hat{A})$ . Hence, using Lemmas 7.19 and 7.21, and assuming for the moment that $l>1$ , we have

[TABLE]

Next, using (72) and Jensen’s inequality, we have

[TABLE]

and so $1-\tilde{r}_{n}/(\tilde{p}_{n}^{2}-\tilde{q}_{n})\leq 0$ , i.e. the second term in (212) is non-positive, so $\forall\ l>1$ ,

[TABLE]

In the case $l=1$ , we have (again by Lemmas 7.19 and 7.21)

[TABLE]

where the inequality is (213) and $\tilde{p}_{n}^{*}\leq 1$ ; hence, (214) holds for $l=1$ as well. Finally, since $Y_{0}=1\ a.s.$ , it is immediate that (214) also holds for $l=0$ . We next analyze the covariance terms in (203). First, if $l^{\prime}>l>0$ , we can use (204) and Lemmas 7.19 and 7.21 to obtain

[TABLE]

On the other hand, if $l^{\prime}>l=0$ , we have $Y_{l}=1\ a.s.$ , so $\textrm{Cov}_{n}(Y_{l},Y_{l^{\prime}})=0=\tilde{p}_{n}^{l^{\prime}}\textrm{Var}_{n}(Y_{0})$ . Hence, combined with (214), we have argued

[TABLE]

Hence, combining (203), (214), and (219), we obtain

[TABLE]

where the second inequality is simply $\theta,\tilde{p}_{n}\leq 1$ , the first equality is immediate, and the second equality holds by definition of $u_{T_{n},l}$ . It clearly follows that

[TABLE]

and so we can complete the proof by showing the right side of (224) tends to zero $a.s.$ Clearly, the right side is zero if $\omega\notin\Omega_{n,2}$ ; we aim to also show that, given $\gamma>0$ , $\exists\ N$ s.t. for $n>N$ and $\omega\in\Omega_{n,2}$ ,

[TABLE]

To prove (225), we first recall that by A3, we have for $\omega\in\Omega_{n,2}$ , $\tilde{p}_{n}^{*}(\omega)\geq\tilde{p}_{n}(\omega)>p_{n}-\delta_{n}$ . Hence, since we are assuming $p_{n}\rightarrow 1$ , and since $\delta_{n}\rightarrow 0$ by A3, we have for $\gamma^{\prime}>0$ , $n$ sufficiently large, and such $\omega$ , $\tilde{p}_{n}(\omega)^{2},\tilde{p}_{n}^{*}(\omega)^{2}>1-\gamma^{\prime}$ . We thus obtain for $n$ large and $\omega\in\Omega_{n,2}$ ,

[TABLE]

To further upper bound the right side of (226), we note $\tilde{r}_{n}\leq 1-\tilde{q}_{n}\ a.s.$ by the first equality in (213). The same argument gives $\tilde{r}_{n}^{*}\leq 1-\tilde{q}_{n}^{*}\ a.s.$ Note, however, that to use the second bound, we must ensure $1-\gamma^{\prime}-\tilde{q}_{n}(\omega)>0$ . To this end, recall that $\tilde{q}_{n}(\omega)<1-\xi$ for $\omega\in\Omega_{n,2}$ by A3. Hence, assuming we choose $\gamma^{\prime}<\xi$ , we obtain $1-\gamma^{\prime}-\tilde{q}_{n}(\omega)>0$ for such $\omega$ . Thus,

[TABLE]

where the first inequality uses (226) and the bounds from the previous paragraph, the equalities are straightforward, the second inequality uses $\tilde{q}_{n}(\omega)<1-\xi$ for $\omega\in\Omega_{n,2}$ by A3, and the third uses $\tilde{q}_{n}^{*}(\omega)\leq 1$ (recall we have chosen $\gamma^{\prime}<\xi$ ). Finally, it is straightforward to see the final bound in (230) tends to zero with $\gamma^{\prime}$ . Hence, for sufficiently small $\gamma^{\prime}$ , (225) follows, completing the proof.

9.2.3 Notation for proofs of Lemmas 7.19 and 7.21

In the next two subsections, we prove Lemmas 7.19 and 7.21. For these proofs, we let $\mathcal{D}$ denote the degree sequence $\{d_{out}(i),d_{in}^{A}(i),d_{in}^{B}(i)\}_{i\in[n]}$ , and we let $D$ denote a realization of this set. Note that the random variables defined in (72) are all functions of $\mathcal{D}$ ; for a realization $D$ of $\mathcal{D}$ , we let e.g. $\tilde{p}_{n,D}$ denote the realization of $\tilde{p}_{n}$ . We similarly define $f_{n,D},f_{n,D}^{*}$ for realizations of $f_{n},f_{n}^{*}$ , defined in (11). Finally, letting $g(D)=\mathbb{P}(\cdot|\mathcal{D}=D)$ , we have $\mathbb{P}_{n}(\cdot)=g(\mathcal{D})$ by definition of $\mathbb{P}_{n}$ . Hence, to prove Lemma 7.19, it suffices to show

[TABLE]

while to prove Lemma 7.21, it suffices to show

[TABLE]

9.2.4 Proof of Lemma 7.19

The $l=0$ case is trivial, since $X_{0}^{1}=\phi\in\hat{A}$ , so we assume $l\in\mathbb{N}$ moving forward. First, since $\hat{A}^{C}=\hat{B}$ is an absorbing set, we have $X_{l}^{1}\in\hat{A}\Rightarrow X_{l-1}^{1}\in\hat{A}$ , so

[TABLE]

For the first term in (234), we have

[TABLE]

where the second equality holds by Algorithm 4. More specifically, for $l>1$ , the degrees of $X_{l-1}^{1}$ are sampled from $f_{n,D}$ (Line 4 in Algorithm 4) after realizing $X_{l-1}^{1}$ (Line 4), yielding the $\sum_{i\in\mathbb{N}}f_{n,D}(i,j,k)$ term; further, $X_{l}^{1}$ is chosen uniformly from the incoming neighbors of $X_{l-1}^{1}$ (Line 4) after realizing the degrees of $X_{l-1}^{1}$ , yielding the $j/(j+k)$ term (the $l=1$ case is similarly justified). Combining (234) and (235), and using the fact that $X_{0}^{1}=\phi\in\hat{A}$ by definition, completes the proof in the case $l=1$ . For $l>1$ , we again use (234) and (235) to obtain

[TABLE]

which completes the proof.
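To make the single-walk computation in (234) and (235) concrete, the sketch below simulates one walk of the kind described. The root's degrees come from a pmf `F_STAR` and every later vertex's degrees come from a pmf `F`. At each step, the walk moves to an agent child with probability $j/(j+k)$ and otherwise to a stubborn child, where the walk is absorbed. The two pmfs are toy, hypothetical stand-ins for $f_{n,D}^{*}$ and $f_{n,D}$ and are not taken from the paper. The comparison assumes that the argument yields the product form $\tilde{p}^{*}\tilde{p}^{\,l-1}$, with $\tilde{p}=\sum_{i,j,k}f(i,j,k)\,j/(j+k)$ and $\tilde{p}^{*}$ defined analogously. This reading is ours, since the displayed statement of Lemma 7.19 was lost in extraction.

```python
import random

# Toy, hypothetical degree pmfs over (out-degree i, agent in-degree j, stubborn in-degree k).
# They stand in for f*_{n,D} (root) and f_{n,D} (all later vertices); every j + k > 0.
F_STAR = {(1, 2, 0): 0.5, (2, 1, 1): 0.3, (1, 1, 3): 0.2}
F = {(1, 3, 1): 0.4, (2, 1, 1): 0.4, (3, 0, 2): 0.2}


def agent_fraction(pmf):
    """E[j / (j + k)] under pmf: the analogue of p-tilde (or p-tilde-star)."""
    return sum(w * j / (j + k) for (_, j, k), w in pmf.items())


def walk_stays_in_agents(l, f_star, f, rng):
    """Run l steps of the walk from the root phi; return True if X_l is still an agent."""
    for step in range(l):
        pmf = f_star if step == 0 else f          # root uses f_star, later vertices use f
        keys = list(pmf)
        _, j, k = rng.choices(keys, weights=[pmf[key] for key in keys])[0]
        # Step to a uniformly chosen incoming neighbour: an agent w.p. j / (j + k).
        if rng.random() >= j / (j + k):
            return False                          # entered the absorbing stubborn set
    return True


def estimate(l, f_star, f, trials=50_000, seed=0):
    """Monte Carlo estimate of P(X_l in A-hat)."""
    rng = random.Random(seed)
    hits = sum(walk_stays_in_agents(l, f_star, f, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    l = 3
    exact = agent_fraction(F_STAR) * agent_fraction(F) ** (l - 1)
    print(f"product formula {exact:.4f}, Monte Carlo {estimate(l, F_STAR, F):.4f}")
```

Because the stubborn set is absorbing, a single "leave" event ends the walk's time in $\hat{A}$. This is why the probability factors step by step.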

9.2.5 Proof of Lemma 7.21

We begin by proving the first statement in the lemma, i.e. (232). First, we note that for the $l=0$ case, $X_{0}^{1}=\phi\in\hat{A}$ by definition, so $\mathbb{P}(X_{l}^{1}\in\hat{A},X_{l^{\prime}}^{2}\in\hat{A}|\mathcal{D}=D)=\mathbb{P}(X_{l^{\prime}}^{2}\in\hat{A}|\mathcal{D}=D)$ , and the statement holds by Lemma 7.19. For the $l\in\mathbb{N}$ case, we first write

[TABLE]

where the first equality holds since $\hat{A}^{C}=\hat{B}$ is an absorbing set (i.e. $X_{l^{\prime}}^{2}\in\hat{A}\Rightarrow X_{l^{\prime}-1}^{2}\in\hat{A}$ ) and the second simply rewrites a conditional probability. Next, by the same argument as (235),

[TABLE]

where we have used the $l^{\prime}>1$ case of (235), since $l^{\prime}>l\geq 1$ . Hence, the previous two equations give

[TABLE]

This completes the proof of (232). For the second statement, i.e. (233), the $l=0$ case is trivial, since $X_{0}^{1}=X_{0}^{2}=\phi\in\hat{A}$ by definition, so we assume $l\in\mathbb{N}$ for the remainder of the proof. First, let $\tau=\inf\{t\in\mathbb{N}_{0}:X_{t}^{1}\neq X_{t}^{2}\}$ denote the first step at which the two walks diverge. Note that $X_{0}^{1}=X_{0}^{2}=\phi$ by definition, so $\tau\in\mathbb{N}\ a.s.$ ; also, due to the tree structure, the walks remain apart forever after diverging, i.e. $X_{\tau+1}^{1}\neq X_{\tau+1}^{2},X_{\tau+2}^{1}\neq X_{\tau+2}^{2},\ldots\ a.s.$ Next, for $l\in\mathbb{N}$ , we write

[TABLE]

We begin by computing the second term in (245). Here we have

[TABLE]

where the first and last equalities hold by definition of $\tau$ and the second holds since $\hat{A}^{C}=\hat{B}$ is an absorbing set. Now for $l>1$ , we obtain

[TABLE]

where the first equality uses independence and removes redundant events, and the third follows an argument similar to that following (235). Combining (247) and (254),

[TABLE]

Finally, by an argument similar to (254), we have

[TABLE]

Hence, combining (259) and (261) gives

[TABLE]

For the first term in (245), we first consider the $t=l$ summand. For $l>1$ , similar to (254),

[TABLE]

where in the final step we have also used (264). Similarly, for $l=1$ ,

[TABLE]

To summarize, we have shown

[TABLE]

Next, we consider the $t<l$ summands in (245) (such summands are present only for $l>1$ ). We have

[TABLE]

where in the first equality we used the fact that $\hat{A}^{C}=\hat{B}$ is an absorbing set and the fact that once the walks diverge they remain apart; in the second equality we used the fact that $X_{l}^{1}$ and $X_{l}^{2}$ are conditionally independent given the event $X_{l-1}^{1}\neq X_{l-1}^{2}$ . Further, for $h\in\{1,2\}$ ,

[TABLE]

and so, combining the previous two equations and applying recursively yields

[TABLE]

where the final equality uses (273). Finally, combining (245), (264), (273), and (281) yields

[TABLE]

which is what we set out to prove.

9.3 Step 2 for proof of Theorem 6.6

9.3.1 Proof of Lemma 7.23

We first write

[TABLE]

where the first equality uses the law of total expectation and the second is immediate. For the first summand in the expectation in (286), we fix $\lambda>0$ and write

[TABLE]

Here the first equality holds by monotonicity of $x\mapsto e^{\lambda x}$ , the first inequality is Markov’s inequality, the second equality holds by (87), the second inequality uses Lemma 9.35 from Appendix 9.4, the third inequality uses ${t\choose l}\eta^{l}(1-\eta)^{t-l},\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}\leq 1$ , the third equality is immediate, the fourth equality again uses (87), and the fourth inequality uses (89). Since the preceding argument holds $\forall\ \lambda>0$ , we choose the minimizing value $\lambda=4\epsilon T_{n}$ ; substituting it into (293) yields the bound $e^{-2\epsilon^{2}T_{n}}$ . The same argument applies to the second summand in the expectation of (286). Since the bound $e^{-2\epsilon^{2}T_{n}}$ is non-random, we may also discard the expectation. In summary, we have shown

[TABLE]

Hence, for $n$ sufficiently large, we have by assumption on $T_{n}$

[TABLE]

which is what we set out to prove.
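The choice $\lambda=4\epsilon T_{n}$ minimizes a quadratic exponent. The display (293) was lost, so we assume here that its exponent has the Hoeffding form $-\lambda\epsilon+\lambda^{2}/(8T_{n})$. This is the form consistent with the stated minimizer and the resulting value $-2\epsilon^{2}T_{n}$. The sketch below checks the minimizer numerically. It also compares the resulting bound with an empirical tail for an average of $T_{n}$ i.i.d. Bernoulli variables, which is our own illustration and not the paper's random walk.

```python
import math
import random


def exponent(lam, eps, T):
    """Assumed Hoeffding-type Chernoff exponent: -lam * eps + lam^2 / (8 T)."""
    return -lam * eps + lam ** 2 / (8 * T)


def best_lambda(eps, T):
    # A quadratic -a*lam + b*lam^2 is minimised at lam = a / (2 b); here a = eps, b = 1 / (8 T).
    return 4 * eps * T


def empirical_tail(eps, T, prob=0.5, trials=20_000, seed=1):
    """Fraction of trials in which the mean of T Bernoulli(prob) draws is at least prob + eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < prob for _ in range(T)) / T
        hits += mean >= prob + eps
    return hits / trials


if __name__ == "__main__":
    eps, T = 0.1, 100
    lam = best_lambda(eps, T)
    print(lam, exponent(lam, eps, T), -2 * eps ** 2 * T)   # the last two values agree
    print(empirical_tail(eps, T), math.exp(-2 * eps ** 2 * T))
```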

9.3.2 Proof of Lemma 7.25

We begin by deriving a bound conditioned on the degree sequence. First, we fix $\tilde{\lambda}>0$ and use monotonicity of $x\mapsto e^{\tilde{\lambda}x}$ and Markov’s inequality to write

[TABLE]

The bulk of the proof will involve bounding the expectation term. For this, we first note

[TABLE]

where the first equality holds by (87), the second rearranges summations, and in the third we have defined $\lambda=\tilde{\lambda}\theta/T_{n}$ , $u_{T_{n},l}=\sum_{t=l}^{T_{n}-1}{t\choose l}\eta^{l}(1-\eta)^{t-l}$ , and $Y_{l}=\sum_{\mathbf{i}\in\hat{A}_{l}}\prod_{j=0}^{l-1}d_{in}(\mathbf{i}|j)^{-1}$ . For the remainder of the proof, we use $\mathbb{E}_{n,l}$ to denote conditional expectation with respect to the degree sequence and the set of random variables realized during the first $l$ iterations of Algorithm 4 (i.e. the random variables defining the first $l$ generations of the tree). Using this notation, we have

[TABLE]

where in the third equality we have multiplied and divided by $\exp(\lambda u_{T_{n},T_{n}-1}\tilde{p}_{n}Y_{T_{n}-2})$ .

[TABLE]

where in the first equality we rewrote the sum based on the construction of $\hat{A}_{T_{n}-1}$ in Algorithm 4, in the second we have used the fact that $\mathbf{i}|j=\mathbf{i}^{\prime}|j$ for $j\in\{0,\ldots,T_{n}-2\}$ by Algorithm 4 (in words, $\mathbf{i}$ and $\mathbf{i}^{\prime}$ share the same ancestry in the tree), in the third we have recognized that the $\mathbf{i}$ -th summand does not depend on $\mathbf{i}$ , and in the fourth we have used $\mathbf{i}^{\prime}|(T_{n}-2)=\mathbf{i}^{\prime}$ (since $\mathbf{i}^{\prime}\in\hat{A}_{T_{n}-2}$ ) and the construction of the agent offspring of $\mathbf{i}^{\prime}$ in Algorithm 4. It follows that

[TABLE]

where $\mathbb{E}_{n,T_{n}-2}(d_{in}^{A}(\mathbf{i}^{\prime})/d_{in}(\mathbf{i}^{\prime}))=\tilde{p}_{n}$ holds by definition of $d_{in}^{A}(\mathbf{i}^{\prime}),d_{in}(\mathbf{i}^{\prime})$ in Algorithm 4 and of $\tilde{p}_{n}$ from (72). In summary, we have argued $\mathbb{E}_{n,T_{n}-2}(Y_{T_{n}-1}-Y_{T_{n}-2}\tilde{p}_{n})=0$ . On the other hand, we note $0\leq Y_{T_{n}-1}\leq Y_{T_{n}-2}\leq\cdots\leq Y_{0}=1$ , where the first inequality holds since $Y_{T_{n}-1}$ is a sum of nonnegative terms and the second holds by (303) (using $d_{in}(\mathbf{i}^{\prime})=d_{in}^{A}(\mathbf{i}^{\prime})+d_{in}^{B}(\mathbf{i}^{\prime})\geq d_{in}^{A}(\mathbf{i}^{\prime})$ ), and where $Y_{0}=1$ by definition. Hence, we can use Lemma 9.35 from Appendix 9.4 to obtain

[TABLE]

Substituting into (299) then yields

[TABLE]

We can then iteratively apply the preceding argument. Namely, we have

[TABLE]

(The precise form of the summations in (314) can be verified by considering the case $T_{n}=4$ in (312) and (313).) Note that the final step of the iteration is slightly different; this is because the root node has degrees sampled from $f_{n}^{*}$ (the uniform distribution) instead of $f_{n}$ (the size-biased distribution) in Algorithm 4. Nevertheless, a similar argument holds: here we have $\mathbb{E}_{n,0}Y_{1}=\tilde{p}_{n}^{*}Y_{0}$ and $Y_{1}\in[0,1]\ a.s.$ , so by an argument similar to that leading to (306),

[TABLE]

Combining the previous inequality with (307) and (314) then yields

[TABLE]

Next, we recall $Y_{0}=1$ by definition. Additionally, we have

[TABLE]

where the first equality uses the definition of $u_{T_{n},l}$ , the second rearranges summations, and the third uses (182). Combining the previous two equations therefore yields

[TABLE]

Hence, recalling that $\lambda=\tilde{\lambda}\theta/T_{n}$ , and substituting into (296), we have shown

[TABLE]

Clearly, this inequality still holds if we multiply both sides by $1(\Omega_{n,2})$ . Additionally, by A3, $\tilde{p}_{n}(\omega)<p_{n}+\delta_{n}$ for $\omega\in\Omega_{n,2}$ , where $p_{n}\rightarrow p$ and $\delta_{n}\rightarrow 0$ ; since the statement of the lemma also assumes $p<1$ , we conclude $\tilde{p}_{n}(\omega)<p_{n}+\delta_{n}<1$ for $\omega\in\Omega_{n,2}$ and $n$ sufficiently large. For such $n$ , we can therefore write

[TABLE]

where the second inequality uses Lemma 9.33 from Appendix 9.4. Additionally, since $p_{n}\rightarrow p<1$ , we can use the argument leading to (195) to obtain $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)](\omega)<c/T_{n}$ (for some $c$ independent of $n$ ) whenever $\omega\in\Omega_{n,2}$ and $n$ is sufficiently large. For such $n$ , we obtain

[TABLE]

Now since $\tilde{\lambda}>0$ was arbitrary, we can choose $\tilde{\lambda}=4T_{n}\epsilon\eta^{2}(1-(p_{n}+\delta_{n}))^{2}/\theta^{2}$ . Upon substituting into the exponent in the previous equation, this exponent becomes

[TABLE]

where the inequality simply uses $p_{n},\delta_{n}>0$ and $p_{n}+\delta_{n}\in(0,1)$ (for large $n$ ). Now note that since $p_{n}\rightarrow p$ , we have (for example) $(1-p_{n})^{2}>(1-p)^{2}/2$ for $n$ sufficiently large. Additionally, since $\delta_{n}=o(1/T_{n})$ , we have (for example) $T_{n}\delta_{n}<c/\epsilon$ for $n$ sufficiently large. Combining these observations, we can upper bound (331) as

[TABLE]

Hence, substituting into (327) gives

[TABLE]

Finally, we write

[TABLE]

where the first equality is the law of total expectation, the inequality uses (333) and upper bounds a probability by 1, and the second equality uses the assumptions in the statement of the lemma.

9.3.3 Where the proof fails in the general case

As shown in Appendix 7.4.1, extending Theorem 6.6 to the case $p_{n}\rightarrow 1$ amounts to showing that for some $\gamma^{\prime}>0$ ,

[TABLE]

where $L(p_{n})$ is the appropriate limit from (104). Here we show (roughly) why the approach from the preceding proof fails to establish (336) in the case $p_{n}\rightarrow 1$ . To begin, we note that the assumption $p_{n}\rightarrow p<1$ was first used following (323). Hence, in the case $p_{n}\rightarrow 1$ , we can still follow the approach leading to (323) to obtain the (one-sided) bound

[TABLE]

where the approximate equality uses $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]\approx L(p_{n})$ on $\Omega_{n,2}$ by Lemma 7.15. We next note

[TABLE]

where the inequality discards nonnegative terms, the first equality is by definition of $u_{T_{n},l^{\prime}}$ , the second rearranges summations and multiplies/divides by $(\tilde{p}_{n}^{*})^{2}$ , and the third uses (182). Hence, we have shown (339) is (roughly) lower bounded by

[TABLE]

where we have also used $\tilde{p}_{n}^{*}\approx 1$ for large $n$ on $\Omega_{n,2}$ when $p_{n}\rightarrow 1$ by A3. Now we consider three cases for the exponent in the previous expression:

•

$T_{n}(1-p_{n})\rightarrow 0$ : Here Lemma 7.15 states $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]\approx\theta$ for large $n$ on $\Omega_{n,2}$ ; for such $n$ , the exponent is roughly

[TABLE]

where the inequality holds for large $n$ (so that $\theta(1-(1-\eta)^{T_{n}})/(T_{n}\eta)<1-1/\sqrt{2}$ , which holds since $T_{n}\rightarrow\infty$ ) and the equality holds by choosing the minimizing $\tilde{\lambda}$ (namely, $\tilde{\lambda}=8\epsilon/\theta^{2}$ ). Since this lower bound is constant in $n$ , (339) does not decay as $n$ grows.

•

$T_{n}(1-p_{n})\rightarrow c\in(0,\infty)$ : Here Lemma 7.15 states $\mathbb{E}_{n}[\hat{\vartheta}_{T_{n}}(\phi)]\approx\theta(1-e^{-c\eta})/(c\eta)$ for large $n$ on $\Omega_{n,2}$ . An argument similar to the previous case shows (339) does not decay as $n$ grows.

•

$T_{n}(1-p_{n})\rightarrow\infty$ with $p_{n}\rightarrow 1$ : Here we consider an example to show (339) does not decay sufficiently quickly for the general case. In particular, we assume $T_{n}=\bar{c}\log n$ for some constant $\bar{c}$ that satisfies the theorem assumptions and we set $p_{n}=1-(\log n)^{-0.9}$ . Then since $\delta_{n}=o((\log n)^{-1})$ per A3 while $1-p_{n}=(\log n)^{-0.9}$ , we have e.g. $\delta_{n}<(1-p_{n})/2$ , i.e. $1-p_{n}-\delta_{n}>(1-p_{n})/2$ , for large $n$ . Hence,

[TABLE]

where the first inequality holds by (189) in Appendix 9.2.1 (where $\gamma_{1},\gamma_{2}$ are arbitrarily small, hence the approximate inequality), the second holds for our chosen $T_{n},p_{n},\delta_{n}$ , and the third holds for some constant $\tilde{c}$ and for large $n$ . Hence, the exponent is (roughly) lower bounded by

[TABLE]

where the equality holds for the minimizer $\tilde{\lambda}=(4\epsilon/\tilde{c}^{2})(\log n)^{0.2}$ . From here it follows that (339) cannot be $O(n^{-\gamma^{\prime}})$ : if it were, we would have, for all large $n$ and some constant $\tilde{C}$ ,

[TABLE]

The final inequality is a contradiction, since $-(2\epsilon^{2}/\tilde{c}^{2})(\log n)^{0.2}+\gamma^{\prime}\log n\rightarrow\infty$ as $n\rightarrow\infty$ .
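The contradiction comes from comparing $\exp(-a(\log n)^{0.2})$, with $a=2\epsilon^{2}/\tilde{c}^{2}$, against $n^{-\gamma^{\prime}}=\exp(-\gamma^{\prime}\log n)$. The sketch below works in log-space, where the log of their ratio is $\gamma^{\prime}\log n-a(\log n)^{0.2}$. It also evaluates $T_{n}(1-p_{n})=\bar{c}(\log n)^{0.1}$ for the example schedule in the text. The values of $a$, $\gamma^{\prime}$ and $\bar{c}$ are arbitrary illustrative choices.

```python
def log_ratio(log_n, a, gamma):
    """log of exp(-a (log n)^0.2) / n^(-gamma), computed in log-space to avoid overflow."""
    return gamma * log_n - a * log_n ** 0.2


def example_schedule(log_n, c_bar=1.0):
    """Example from the text: T_n = c_bar log n and 1 - p_n = (log n)^(-0.9)."""
    T = c_bar * log_n
    one_minus_p = log_n ** -0.9
    return T, T * one_minus_p          # second entry: T_n (1 - p_n) = c_bar (log n)^0.1


if __name__ == "__main__":
    a, gamma = 5.0, 0.01               # a plays the role of 2 eps^2 / c~^2 (arbitrary values)
    for log_n in (10.0, 1e2, 1e3, 1e4, 1e5):
        print(log_n, example_schedule(log_n)[1], log_ratio(log_n, a, gamma))
```

The log-ratio starts negative and only becomes positive at astronomically large $n$. This reflects how slowly $(\log n)^{0.2}$ loses to $\log n$.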

9.4 Auxiliary results

In this appendix, we collect several auxiliary results used in other proofs. (These results are either cited from other sources, or their proofs are computationally heavy but elementary, so we collect them here to avoid cluttering other parts of our analysis.)

Lemma 9.31.

For $T_{n}\rightarrow\infty$ , $p_{n}\rightarrow 1$ , and $\delta_{n}\rightarrow 0$ s.t. $\delta_{n}=o(1/T_{n})$ , we have

[TABLE]

Proof 9.32.

We consider the three cases of (348) in turn; the proof of (349) follows the same approach.

First, suppose $\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=\infty$ . Then since $T_{n}\delta_{n}\rightarrow 0$ and $T_{n}(1-p_{n})\rightarrow\infty$ , we have $T_{n}\delta_{n}<1<T_{n}(1-p_{n})$ for sufficiently large $n$ , which implies $(1-p_{n}-\delta_{n})>0$ for such $n$ . Clearly, we also have $(1-p_{n}-\delta_{n})<1$ for all $n$ . Taken together, it follows that $1-(1-\eta(1-p_{n}-\delta_{n}))^{T_{n}}\in(0,1)$ for $n$ large. For such $n$ , we can then write

[TABLE]

where we used $(1-p_{n}-\delta_{n})>0$ in the denominator. Now since $T_{n}(1-p_{n})\rightarrow\infty$ and $T_{n}\delta_{n}\rightarrow 0$ , $T_{n}(1-p_{n}-\delta_{n})\rightarrow\infty$ , so taking $n\rightarrow\infty$ in the above inequality gives the result.

Next, suppose $\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=c\in(0,\infty)$ . Since $\eta T_{n}(1-p_{n}-\delta_{n})\rightarrow\eta c$ by $T_{n}(1-p_{n})\rightarrow c$ and $T_{n}\delta_{n}\rightarrow 0$ , it suffices to show $(1-\eta(1-p_{n}-\delta_{n}))^{T_{n}}\rightarrow e^{-\eta c}$ as $n\rightarrow\infty$ . First, since $T_{n}(1-p_{n})\rightarrow c$ , $\forall\ \epsilon_{1}>0\ \exists\ N_{1}$ s.t. $c-\epsilon_{1}<T_{n}(1-p_{n})<c+\epsilon_{1}\ \forall\ n\geq N_{1}$ . Further, since $T_{n}\delta_{n}\rightarrow 0$ , $\forall\ \epsilon_{2}>0\ \exists\ N_{2}$ s.t. $-\epsilon_{2}<T_{n}\delta_{n}<\epsilon_{2}\ \forall\ n\geq N_{2}$ . Hence, $\forall\ n\geq\max\{N_{1},N_{2}\}$ ,

[TABLE]

Next, we note

[TABLE]

Hence, $\forall\ \epsilon_{3}>0\ \exists\ N_{3}$ s.t. $\forall\ n\geq N_{3}$ ,

[TABLE]

Combining these arguments, we obtain $\forall\ n\geq\max\{N_{1},N_{2},N_{3}\}$

[TABLE]

Since both bounds converge to $e^{-\eta c}$ as $\epsilon_{1},\epsilon_{2},\epsilon_{3}\rightarrow 0$ , $(1-\eta(1-p_{n}-\delta_{n}))^{T_{n}}\rightarrow e^{-\eta c}$ follows.

Finally, suppose $\lim_{n\rightarrow\infty}T_{n}(1-p_{n})=0$ . First, we observe

[TABLE]

where the inequality holds for $n$ s.t. $(1-p_{n}-\delta_{n})>0$ (which indeed occurs for large $n$ ; see proof of $T_{n}(1-p_{n})\rightarrow\infty$ case), since then the sum is over $T_{n}$ terms, each upper bounded by 1. On the other hand, we can use the binomial theorem to write

[TABLE]

Next, we observe (assuming $1-p_{n}-\delta_{n}>0$ as above)

[TABLE]

where the first inequality replaces negative terms with positive ones; the second inequality uses $\eta<1$ , $(t-2)!<t!$ , and $(T_{n}-j)<T_{n}$ for $j>0$ ; and the third inequality upper bounds the summation by replacing its upper limit with infinity. Hence, (355), (356), and (358) yield

[TABLE]

where the final equality holds since $T_{n}(1-p_{n}),T_{n}\delta_{n}\rightarrow 0$ by assumption.
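The three cases can be checked numerically. The display (348) was lost, so we assume, based on the bounds used in the proof above, that it concerns the ratio $\frac{1-(1-\eta(1-p_{n}-\delta_{n}))^{T_{n}}}{\eta T_{n}(1-p_{n}-\delta_{n})}$. Under this reading, the limits are $1$, $(1-e^{-\eta c})/(\eta c)$ and $0$ when $T_{n}(1-p_{n})$ tends to $0$, $c$ and $\infty$, respectively. The sketch uses $T_{n}=n$ and hypothetical sequences $p_{n},\delta_{n}$ with $\delta_{n}=o(1/T_{n})$. It also checks that the ratio equals the average $\frac{1}{T_{n}}\sum_{t=0}^{T_{n}-1}(1-x)^{t}$ with $x=\eta(1-p_{n}-\delta_{n})$, which is the representation used in the $T_{n}(1-p_{n})\rightarrow 0$ case.

```python
import math


def ratio(T, p, delta, eta):
    """(1 - (1 - x)^T) / (T x) with x = eta (1 - p - delta); assumes 0 < x < 1."""
    x = eta * (1 - p - delta)
    # 1 - (1 - x)^T computed as -expm1(T log1p(-x)) for accuracy when x is tiny.
    return -math.expm1(T * math.log1p(-x)) / (T * x)


if __name__ == "__main__":
    eta, c = 0.5, 2.0
    for n in (10**3, 10**4, 10**6):
        small = ratio(n, 1 - n**-1.5, n**-2.0, eta)     # T_n (1 - p_n) -> 0
        mid = ratio(n, 1 - c / n, n**-1.5, eta)         # T_n (1 - p_n) -> c
        large = ratio(n, 1 - n**-0.5, n**-1.5, eta)     # T_n (1 - p_n) -> infinity
        print(n, small, mid, large)
    print("limits:", 1.0, (1 - math.exp(-eta * c)) / (eta * c), 0.0)
```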

Lemma 9.33.

Let $u_{T_{n},l}=\sum_{t=l}^{T_{n}-1}{t\choose l}\eta^{l}(1-\eta)^{t-l}$ . Then for any $x\in(0,1)$ ,

[TABLE]

Proof 9.34.

For $l\in\mathbb{N}_{0}$ , define $w_{l}=\sum_{l^{\prime}=l}^{T_{n}-1}u_{T_{n},l^{\prime}}x^{l^{\prime}-l}$ . Then

[TABLE]

Assuming temporarily that $u_{T_{n},l^{\prime}}\geq u_{T_{n},l^{\prime\prime}}$ whenever $l^{\prime}\leq l^{\prime\prime}$ (which we will return to prove),

[TABLE]

Hence, using the previous two equations, we obtain $w_{l+1}-w_{l}=(1-x)w_{l+1}-u_{T_{n},l}\leq 0$ , i.e. the sequence $\{w_{l}\}$ is nonincreasing in $l$ . It is also clearly nonnegative. Therefore,

[TABLE]

To further bound the right hand side, we note

[TABLE]

where the first equality uses the definition of $u_{T_{n},l^{\prime}}$ , the second rearranges summations, the third uses the binomial theorem, the fourth is immediate, the inequality is immediate, and the final equality computes a geometric series. Combining the previous two inequalities proves the lemma.

We return to prove $u_{T_{n},l^{\prime}}\geq u_{T_{n},l^{\prime\prime}}$ whenever $l^{\prime}\leq l^{\prime\prime}$ . For this, we first claim

[TABLE]

We prove (369) by induction on $t^{*}$ . First, when $t^{*}=1$ , the only case to prove is $l=1$ ; when $t^{*}=l=1$ , it is immediate that both sides of (369) equal $\eta(1-\eta)$ . Next, assume (369) holds for $t^{*}-1$ . If $l=t^{*}$ , both sides of (369) equal $\eta^{t^{*}}(1-\eta)$ . If $l\in\{1,\ldots,t^{*}-1\}$ , we write

[TABLE]

where the first equality simply writes the final summands separately, the second uses the inductive hypothesis on the term in parentheses, the third is immediate, the fourth uses Pascal’s rule ( $[t^{*}+1]$ has ${t^{*}+1\choose l+1}$ subsets of cardinality $l+1$ ; ${t^{*}\choose l}$ that contain 1 and ${t^{*}\choose l+1}$ that do not contain 1), and the fifth is immediate. This establishes (369). We then write

[TABLE]

where the first equality holds by definition of $u_{T_{n},l^{\prime}}$ , the second adds and subtracts a term, and the third uses (369). This shows $u_{T_{n},l^{\prime}}\geq u_{T_{n},l^{\prime}+1}$ ; iterating gives $u_{T_{n},l^{\prime}}\geq u_{T_{n},l^{\prime\prime}}$ whenever $l^{\prime}\leq l^{\prime\prime}$ .
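The proof of Lemma 9.33 rests on three elementary facts:

- $u_{T_{n},l}$ is nonincreasing in $l$.
- $w_{l}$ is nonincreasing in $l$.
- $w_{0}=\sum_{l}u_{T_{n},l}x^{l}$ collapses, via the binomial theorem, to the geometric sum $\sum_{t=0}^{T_{n}-1}(1-\eta(1-x))^{t}$, which is at most $1/(\eta(1-x))$ once the finite horizon is dropped.

The sketch checks all three for one arbitrary choice of $(T_{n},\eta,x)$. It is a numerical sanity check, not a substitute for the proof, and it does not reproduce the lemma's lost display.

```python
from math import comb


def u(T, l, eta):
    """u_{T,l} = sum_{t=l}^{T-1} C(t, l) eta^l (1 - eta)^(t - l)."""
    return sum(comb(t, l) * eta ** l * (1 - eta) ** (t - l) for t in range(l, T))


def w(T, l, eta, x):
    """w_l = sum_{l'=l}^{T-1} u_{T,l'} x^(l' - l)."""
    return sum(u(T, lp, eta) * x ** (lp - l) for lp in range(l, T))


def closed_form(T, eta, x):
    """Geometric-series value of sum_l u_{T,l} x^l = sum_{t<T} (1 - eta (1 - x))^t."""
    r = 1 - eta * (1 - x)
    return (1 - r ** T) / (eta * (1 - x))


if __name__ == "__main__":
    T, eta, x = 30, 0.3, 0.6
    us = [u(T, l, eta) for l in range(T)]
    ws = [w(T, l, eta, x) for l in range(T)]
    print(all(a >= b for a, b in zip(us, us[1:])))   # u_{T,l} nonincreasing in l
    print(all(a >= b for a, b in zip(ws, ws[1:])))   # w_l nonincreasing in l
    print(ws[0], closed_form(T, eta, x), 1 / (eta * (1 - x)))
```

Setting $x=1$ in the same identity gives $\sum_{l}u_{T_{n},l}=T_{n}$, since each of the $T_{n}$ binomial rows sums to one.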

Lemma 9.35.

Let $Z$ be a random variable satisfying $\mathbb{E}Z=0$ and $Z\in[a,b]\ a.s.$ , and let $\lambda>0$ . Then

$$\mathbb{E}\left[e^{\lambda Z}\right]\leq\exp\left(\frac{\lambda^{2}(b-a)^{2}}{8}\right).$$

Proof 9.36.

See e.g. [35, Lemma 5.1].
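Lemma 9.35 is Hoeffding's lemma, whose standard form is $\mathbb{E}e^{\lambda Z}\leq\exp(\lambda^{2}(b-a)^{2}/8)$. As a quick sanity check (not a proof), the sketch evaluates the exact moment generating function of mean-zero two-point distributions on $\{a,b\}$ over a grid of $\lambda$. By convexity of $z\mapsto e^{\lambda z}$, these distributions are the extremal case for the bound. The example mirrors the use in the proof of Lemma 7.25. There, $Z=Y_{l}-\tilde{p}_{n}Y_{l-1}$ lies in $[-\tilde{p}_{n}Y_{l-1},(1-\tilde{p}_{n})Y_{l-1}]$, an interval of length $Y_{l-1}\leq 1$. The value $\tilde{p}=0.7$ below is an arbitrary choice.

```python
import math


def two_point_mgf(lam, a, b):
    """E[exp(lam Z)] for the mean-zero Z supported on {a, b}, with a < 0 < b."""
    q = -a / (b - a)                  # P(Z = b), chosen so that E[Z] = 0
    return q * math.exp(lam * b) + (1 - q) * math.exp(lam * a)


def hoeffding_bound(lam, a, b):
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)


if __name__ == "__main__":
    # Z = Y_l - p~ Y_{l-1} with Y_{l-1} = 1 lies in [-p~, 1 - p~] (p~ = 0.7 is illustrative).
    p = 0.7
    for lam in (0.5, 2.0, 10.0):
        print(lam, two_point_mgf(lam, -p, 1 - p), hoeffding_bound(lam, -p, 1 - p))
```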

10 Belief convergence results

In this appendix, we prove two belief convergence results. We first discuss some basic tools used in both proofs. To begin, fix $p\geq 1$ and $y\in[0,1]$ , and let $\mu$ be a probability measure over $[0,1]$ . Then clearly,

[TABLE]

where $X$ is distributed as $\mu$ in the final expression. Also, for any $\upsilon>0$ , since $|X-y|\leq 1$ , we clearly have

[TABLE]

Furthermore, for any $z\in[0,1]$ such that $|y-z|^{p}\leq 2^{-p}\upsilon$ , by convexity,

[TABLE]

which implies

[TABLE]

Combined with (380) and (381), we obtain

[TABLE]
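For a probability measure $\mu$ on $[0,1]$ and a point mass $\delta_{y}$, the only coupling is the product coupling. Hence $W_{p}(\mu,\delta_{y})^{p}=\mathbb{E}|X-y|^{p}$ with $X\sim\mu$. The bounds above then split this expectation according to whether $|X-y|^{p}$ exceeds $\upsilon$, using $|X-y|\leq 1$. The exact displays were lost in extraction, so the sketch shows one natural form of that split, $\mathbb{E}|X-y|^{p}\leq\upsilon+\mathbb{P}(|X-y|^{p}>\upsilon)$. It is evaluated on an empirical sample from a Beta distribution, which is our arbitrary choice. The inequality holds pointwise, so it also holds exactly for the empirical measure.

```python
import random


def wp_to_point_mass(samples, y, p):
    """W_p between the empirical measure of samples and delta_y: (mean |X - y|^p)^(1/p)."""
    return (sum(abs(s - y) ** p for s in samples) / len(samples)) ** (1 / p)


def split_bound(samples, y, p, upsilon):
    """upsilon + fraction of samples with |X - y|^p > upsilon (valid because |X - y| <= 1)."""
    tail = sum(abs(s - y) ** p > upsilon for s in samples) / len(samples)
    return upsilon + tail


if __name__ == "__main__":
    rng = random.Random(2)
    xs = [rng.betavariate(40.0, 10.0) for _ in range(10_000)]   # mean 0.8
    y, p, upsilon = 0.8, 2.0, 0.01
    print(wp_to_point_mass(xs, y, p) ** p, split_bound(xs, y, p, upsilon))
```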

Next, we recall some notation and basic results from Appendices 7.1 and 9.1.1. Denote by $\alpha_{t}$ and $\beta_{t}$ the parameters $\{\alpha_{t}(i)\}_{i\in A\cup B}$ and $\{\beta_{t}(i)\}_{i\in A\cup B}$ in vector form. Set $Q=(1-\eta)I+\eta P$ , where $P$ is the graph’s column-normalized adjacency matrix (normalized so each column sums to $1$ ). Let $\mathbf{1}$ be the all-ones vector. Then (see (162) in Appendix 9.1.1)

[TABLE]

Hence, by column stochasticity of $Q$ , we obtain the following componentwise inequality:

[TABLE]

Finally, for any $\alpha,\beta\in(0,\infty)$ , we let $X(\alpha,\beta)$ denote a $\text{Beta}(\alpha,\beta)$ random variable. Thus, recalling the expressions for the mean and variance of the beta distribution, for any $\upsilon>0$ , Chebyshev’s inequality implies

[TABLE]

In particular, for any $i\in[n]$ and $t\in\mathbb{N}$ , since $\theta_{t}(i)=\frac{\alpha_{t}(i)}{\alpha_{t}(i)+\beta_{t}(i)}$ by definition, the previous two inequalities imply

[TABLE]

Hence, because $\mu_{t}(i)$ is the $\text{Beta}(\alpha_{t}(i),\beta_{t}(i))$ distribution, we can use (384) to obtain the following:

[TABLE]
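A $\text{Beta}(\alpha,\beta)$ variable has mean $\alpha/(\alpha+\beta)$ and variance $\alpha\beta/((\alpha+\beta)^{2}(\alpha+\beta+1))\leq 1/(4(\alpha+\beta+1))$. Chebyshev's inequality therefore gives $\mathbb{P}(|X(\alpha,\beta)-\frac{\alpha}{\alpha+\beta}|\geq\upsilon)\leq\frac{1}{4\upsilon^{2}(\alpha+\beta+1)}$, so concentration improves as $\alpha+\beta$ grows. The paper's displays were lost, and this particular simplified form is our reconstruction of the idea. The sketch compares the variance-based bound and its looser simplification with a Monte Carlo estimate from the standard library's `random.betavariate`.

```python
import random


def beta_mean_var(alpha, beta):
    s = alpha + beta
    return alpha / s, alpha * beta / (s ** 2 * (s + 1))


def chebyshev_bound(alpha, beta, upsilon):
    """Return (Var / upsilon^2, the looser 1 / (4 upsilon^2 (alpha + beta + 1)))."""
    _, var = beta_mean_var(alpha, beta)
    return var / upsilon ** 2, 1 / (4 * upsilon ** 2 * (alpha + beta + 1))


def empirical_deviation(alpha, beta, upsilon, trials=20_000, seed=3):
    """Monte Carlo estimate of P(|X - mean| >= upsilon) for X ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    mean, _ = beta_mean_var(alpha, beta)
    hits = sum(abs(rng.betavariate(alpha, beta) - mean) >= upsilon for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for alpha, beta in ((3.0, 1.0), (30.0, 10.0), (300.0, 100.0)):
        print(alpha + beta, empirical_deviation(alpha, beta, 0.1), chebyshev_bound(alpha, beta, 0.1))
```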

10.1 Proof of Proposition 1

Fix $i\in A$ . We first show $\theta_{t}(i)\rightarrow 0$ . Let $e_{A}=\sum_{i^{\prime}\in A}e_{i^{\prime}}$ . Recall $s_{\tau}(i^{\prime})\in\{0,1\}$ for $i^{\prime}\in A$ and $s_{\tau}(i^{\prime})=0$ for $i^{\prime}\in B$ , so $s_{\tau}\leq e_{A}$ componentwise. Combined with (385) and (386), we obtain

[TABLE]

Hence, because $\alpha_{0}Q^{t}e_{i}^{\mathsf{T}}$ is bounded independent of $t$ , it suffices to show that for any $\epsilon>0$ and all $t$ large,

[TABLE]

Toward this end, we first observe that $Q(i^{\prime},i^{\prime})=1$ for any $i^{\prime}\in B$ , i.e., $B$ is a set of absorbing states in the Markov chain with transition matrix $Q$ . By the assumption of the proposition, all agents can reach this set. Taken together, we have an absorbing Markov chain with absorbing states $B$ and non-absorbing states $A$ . It follows that $e_{A}Q^{\tau}e_{i}^{\mathsf{T}}\rightarrow 0$ as $\tau\rightarrow\infty$ . Hence, we can find $T_{\epsilon}$ such that $e_{A}Q^{\tau}e_{i}^{\mathsf{T}}<\frac{\epsilon}{2}$ whenever $\tau\geq T_{\epsilon}$ . Thus, for any $t\geq 2T_{\epsilon}/\epsilon$ , we obtain the desired inequality (391):

[TABLE]
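The absorbing-chain step can be seen on a toy example. The sketch builds $Q=(1-\eta)I+\eta P$ for a hypothetical three-node graph with agents $A=\{0,1\}$ and one stubborn agent $B=\{2\}$. The column of $P$ for node 2 is $e_{2}$, so $Q(2,2)=1$. The sketch then tracks the mass $e_{A}Q^{\tau}e_{i}^{\mathsf{T}}$ remaining in $A$ and its running (Cesàro) average, which is the quantity the argument above controls. The graph, $\eta$ and the starting agent are arbitrary illustrative choices.

```python
def build_Q(P, eta):
    """Q = (1 - eta) I + eta P for a column-stochastic P given as a list of rows."""
    n = len(P)
    return [[(1 - eta) * (r == c) + eta * P[r][c] for c in range(n)] for r in range(n)]


def mat_vec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def mass_in_A(Q, start, agents, steps):
    """Return [e_A Q^tau e_start^T for tau = 0, ..., steps - 1]."""
    v = [0.0] * len(Q)
    v[start] = 1.0                      # column vector e_start
    masses = []
    for _ in range(steps):
        masses.append(sum(v[a] for a in agents))
        v = mat_vec(Q, v)               # v <- Q v, so v = Q^tau e_start
    return masses


# Hypothetical graph: column c of P lists whom agent c listens to (each column sums to 1).
# Agent 0 listens to 1 and 2, agent 1 listens to 0, stubborn agent 2 listens only to itself.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.0, 1.0]]
Q = build_Q(P, eta=0.5)

if __name__ == "__main__":
    masses = mass_in_A(Q, start=1, agents=[0, 1], steps=2000)
    print(masses[:4], masses[200])
    print(sum(masses) / len(masses))    # Cesaro average; tends to 0 as the horizon grows
```

Because the total mass in $A$ is summable here (it equals $8$ for this toy chain), its Cesàro average decays like $1/t$. This mirrors the choice $t\geq 2T_{\epsilon}/\epsilon$ in the proof.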

Next, we show $W_{p}(\mu_{t}(i),\delta_{0})\rightarrow 0$ . Fix $\epsilon>0$ . Since $\theta_{t}(i)\rightarrow 0$ , we can find $T(\epsilon)$ such that

[TABLE]

Hence, if we let $y=0$ and $\upsilon=\epsilon^{p}/2$ , then $|y-\theta_{t}(i)|=|\theta_{t}(i)|<2^{-(p+1)}\epsilon^{p}=2^{-p}\upsilon$ , so by (389) and (393), for any $t\geq T(\epsilon)$ ,

[TABLE]

10.2 Proof of Corollary 1

Fix $p\geq 1$ and $\epsilon>0$ . Similar to the proof of Proposition 1, we set $y=L(p_{n})$ and $\upsilon=\epsilon^{p}/2$ . Then by (389),

[TABLE]

Hence, because $T_{n}\rightarrow\infty$ as $n\rightarrow\infty$ by A4, we conclude that for all $n$ sufficiently large,

[TABLE]

Combined with Theorem 1, we thus obtain

[TABLE]

Bibliography

[1] E. Shearer and J. Gottfried, “News use across social media platforms 2017,” Pew Research Center, Journalism and Media, 2017.
[2] H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of Economic Perspectives, vol. 31, no. 2, pp. 211–36, 2017.
[3] P. Savodnik, “‘You start seeing the dreaded sensitivity label’: Is a pro-Trump Twitter army strategically throttling Biden ads?” https://www.vanityfair.com/news/2020/10/is-a-pro-trump-twitter-army-strategically-throttling-biden-ads, 2020.
[4] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, “Non-Bayesian social learning,” Games and Economic Behavior, vol. 76, no. 1, pp. 210–225, 2012.
[5] C. Shao, G. L. Ciampaglia, O. Varol, K.-C. Yang, A. Flammini, and F. Menczer, “The spread of low-credibility content by social bots,” Nature Communications, vol. 9, no. 1, p. 4787, 2018.
[6] M. Azzimonti and M. Fernandes, “Social media networks, fake news, and polarization,” National Bureau of Economic Research, Tech. Rep., 2018.
[7] B. Golub and M. O. Jackson, “Naive learning in social networks and the wisdom of crowds,” American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112–49, 2010.
[8] D. Acemoglu, A. Ozdaglar, and A. Parandeh Gheibi, “Spread of (mis)information in social networks,” Games and Economic Behavior, vol. 70, no. 2, pp. 194–227, 2010.
