On the Distance Between the Rumor Source and Its Optimal Estimate in a Regular Tree

Tetsunao Matsuta; Tomohiko Uyematsu

arXiv:1901.03039·cs.IT·January 23, 2019

TL;DR

This paper analyzes the accuracy of rumor source detection in regular trees, showing that the estimated source is typically within three edges of the true origin with high probability.

Contribution

It provides a probabilistic analysis of the distance between the true rumor source and its optimal estimate in regular tree networks.

Findings

01

The estimated rumor source is within distance 3 of the true source with high probability.

02

The probability distribution of the distance between the true source and the estimate is characterized.

03

The analysis applies specifically to regular, cycle-free tree networks.

Abstract

This paper addresses the rumor source identification problem, where the goal is to find the origin node of a rumor in a network among a given set of nodes with the rumor. In this paper, we focus on a network represented by a regular tree which does not have any cycle and in which all nodes have the same number of edges connected to a node. For this network, we clarify that, with quite high probability, the origin node is within the distance 3 from the node selected by the optimal estimator, where the distance is the number of edges of the unique path connecting two nodes. This is clarified by the probability distribution of the distance between the origin and the selected node.

Equations

\varphi_{\mathrm{ML}}(\mathcal{G}_{n})\triangleq\mathop{\mathrm{argmax}}\limits_{v\in\mathcal{V}(\mathcal{G}_{n})}\Pr\{\mathcal{G}_{n}|v\},

\lim_{n\to\infty}\mathbf{C}_{n}=\delta I_{1/2}\left(\frac{1}{\delta-2},\frac{\delta-1}{\delta-2}\right)-(\delta-1),

\mathbf{D}_{n}(d)\triangleq\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})=d\},

\lim_{n\to\infty}\mathbf{D}_{n}(d)=f(d),

f(d)

\times\left(\frac{1}{4}+\sum_{l=1}^{d}(-1)^{l}\left(\frac{(\ln 2)^{l}}{l!}-2+\sum_{m=0}^{l}\frac{(\ln 2)^{m}}{m!}\right)\right).

0\leq\lim_{n\to\infty}\mathbf{D}_{n}(d)-g(\delta,d,m)\leq\epsilon_{m},

g(\delta,d,m)

p_{1}(\delta,d,k)

p_{2}(\delta,k)

-(\delta-1)I_{1/2}\left(k-1+\frac{\delta-1}{\delta-2},\frac{1}{\delta-2}\right),

\lim_{n\to\infty}\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})\leq d\}=\sum_{l=0}^{d}f(l).

0\leq\lim_{n\to\infty}\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})\leq d\}-\sum_{l=0}^{d}g(\delta,l,m)\leq d\cdot\epsilon_{m}.

\Pr\{\hat{V}_{n}=v\mid v\in\mathcal{V}_{n},\overline{X}(v)<n/2\}=1,

\Pr\{\hat{V}_{n}=v\mid v\in\mathcal{V}_{n},\overline{X}(v)=n/2\}=\frac{1}{2},

\Pr\{\hat{V}_{n}=v\mid v\in\mathcal{V}_{n},\overline{X}(v)>n/2\}=0.

\mathbf{D}_{n}(d)

=\Pr\{\hat{V}_{n}\in\mathcal{V}^{(d)}\}

=\sum_{v^{(d)}\in\mathcal{V}^{(d)}}\Pr\{\hat{V}_{n}=v^{(d)}\}

=\sum_{v^{(d)}\in\mathcal{V}^{(d)}}\Pr\{v^{(d)}\in\mathcal{V}_{n},\hat{V}_{n}=v^{(d)}\}

=\sum_{v^{(d)}\in\mathcal{V}^{(d)}}\Big(\Pr\{v^{(d)}\in\mathcal{V}_{n},\hat{V}_{n}=v^{(d)},\overline{X}(v^{(d)})<n/2\}

+\Pr\{v^{(d)}\in\mathcal{V}_{n},\hat{V}_{n}=v^{(d)},\overline{X}(v^{(d)})=n/2\}

+\Pr\{v^{(d)}\in\mathcal{V}_{n},\hat{V}_{n}=v^{(d)},\overline{X}(v^{(d)})>n/2\}\Big)

=\sum_{v^{(d)}\in\mathcal{V}^{(d)}}\Big(\Pr\{v^{(d)}\in\mathcal{V}_{n},\overline{X}(v^{(d)})<n/2\}

+\frac{1}{2}\Pr\{v^{(d)}\in\mathcal{V}_{n},\overline{X}(v^{(d)})=n/2\}\Big),

\Pr\{v^{(d)}\in\mathcal{V}_{n},\overline{X}(v^{(d)})<n/2\}

=\Pr\{\cup_{k=d+1}^{n}\{V_{k}=v^{(d)}\},\overline{X}(v^{(d)})<n/2\}

=\Pr\{\cup_{k=d+1}^{n}\{V_{k}=v^{(d)},\overline{X}(v^{(d)})<n/2\}\}

=\sum_{k=d+1}^{n}\Pr\{V_{k}=v^{(d)},\overline{X}(v^{(d)})<n/2\}

=\sum_{k=d+1}^{\lceil n/2\rceil}\Pr\{V_{k}=v^{(d)},\overline{X}(v^{(d)})<n/2\}

=\sum_{k=d+1}^{\lceil n/2\rceil}\sum_{\substack{x^{\delta}:\sum_{j=1}^{\delta}x_{j}=n-1,\\ \max_{1\leq j\leq\delta}\{x_{j}\}<n/2}}\Pr\{V_{k}=v^{(d)},X^{\delta}(v^{(d)})=x^{\delta}\},

\Pr\{v^{(d)}\in\mathcal{V}_{n},\overline{X}(v^{(d)})=n/2\}

=\sum_{k=d+1}^{\lfloor n/2\rfloor+1}\sum_{\substack{x^{\delta}:\sum_{j=1}^{\delta}x_{j}=n-1,\\ \max_{1\leq j\leq\delta}\{x_{j}\}=n/2}}\Pr\{V_{k}=v^{(d)},X^{\delta}(v^{(d)})=x^{\delta}\}.

\Pr\{V_{n}=v_{n}\mid V^{n-1}=v^{n-1}\}

=\frac{1}{(n-1)\delta-2(n-2)}.

\{V_{j_{1}}=v^{(d,1)},V_{j_{2}}=v^{(d,2)},\dots,V_{j_{d}}=v^{(d,d)}\}

=\{V_{2}\neq v^{(d,1)},V_{3}\neq v^{(d,1)},\dots,V_{j_{1}-1}\neq v^{(d,1)},V_{j_{1}}=v^{(d,1)},

V_{j_{1}+1}\neq v^{(d,2)},\dots,V_{j_{2}-1}\neq v^{(d,2)},V_{j_{2}}=v^{(d,2)},\dots,

V_{j_{d-1}+1}\neq v^{(d,d)},\dots,V_{j_{d}-1}\neq v^{(d,d)},V_{j_{d}}=v^{(d,d)}\}

=\cap_{i=1}^{d}\Big\{\cap_{l=j_{i-1}+1}^{j_{i}-1}\{V_{l}\neq v^{(d,i)}\},V_{j_{i}}=v^{(d,i)}\Big\}

=\cap_{i=1}^{d}\mathcal{E}_{i},

\Pr\{V_{k}=v^{(d)}\}

=\sum_{2\leq j_{1}<j_{2}<\cdots<j_{d-1}<j_{d}=k}\Pr\{\cap_{i=1}^{d}\mathcal{E}_{i}\}

=\sum_{2\leq j_{1}<j_{2}<\cdots<j_{d-1}\leq k-1}\Pr\{\cap_{i=1}^{d}\mathcal{E}_{i}\}

Full text

On the Distance Between the Rumor Source and Its Optimal Estimate in a Regular Tree

An earlier version was presented at SITA2014 [1]. In this paper, we improved notations, added Corollary 1, revised proofs, and corrected the bound of Theorem 3 and many errors.

Tetsunao Matsuta and Tomohiko Uyematsu

Department of Information and Communications Engineering, Tokyo Institute of Technology

I Introduction

In social networks, a rumor spreads like an infectious disease, and it can in fact be modeled as one [2, 3]. Most studies of rumors (or infectious diseases) analyze the mechanisms by which a rumor spreads in a given network [4, 5].

In contrast to such studies, we address the rumor source identification problem introduced by Shah and Zaman [3]. The goal of this problem is to find the origin node of a rumor (the rumor source) in a network among a given set of nodes that have the rumor. If the rumor source can be detected, one can, for example, find a weak node that spreads a computer virus or rank websites for a search engine. For this problem, Shah and Zaman [3] introduced the optimal estimator and analyzed its correct detection probability for several types of networks. This probability asymptotically goes to one for a very special network called a geometric tree (see [3, Sec. IV.D]). However, they showed analytically or experimentally that the probability is asymptotically not high, or goes to zero, for many other networks such as regular trees, small-world networks, and scale-free networks. Here, a regular tree is a network that has no cycle and in which all nodes have the same degree, i.e., the same number of edges connected to a node.

Although the optimal estimator may not find the rumor source, it actually selects a node near the rumor source. This fact is known experimentally (cf. [3, Sect. V.B] and [6, Sect. 8]) but, to the best of our knowledge, has not been shown analytically. In this paper, we focus on this fact and establish it analytically. In particular, we focus on regular trees and show that, with quite high probability, the rumor source is within distance $3$ of the node selected by the optimal estimator, where the distance is the number of edges on the unique path connecting two nodes. This follows from the probability distribution of the distance between the rumor source and the selected node.

II Rumor Source Identification Problem

In this section, we introduce the rumor source identification problem and show some known results of this problem.

Let $\mathcal{G}$ be an undirected and connected graph. Let $\mathcal{V}(\mathcal{G})$ denote the set of nodes and $\mathcal{E}(\mathcal{G})$ denote the set of edges of the graph $\mathcal{G}$. We denote the edge connecting two nodes $i,j\in\mathcal{V}(\mathcal{G})$ by the set of nodes $\{i,j\}\in\mathcal{E}(\mathcal{G})$. In this paper, we consider the case where $\mathcal{G}$ is a regular tree, that is, the graph does not have any cycle, and all nodes have the same degree $\delta\geq 3$. (The line graph, $\delta=2$, is not considered in this paper because this case is somewhat difficult to treat in a unified manner. However, the essential argument for this case is the same as for the case $\delta\geq 3$.) We assume that the number of nodes is countably infinite in order to avoid boundary effects.

A rumor spreads in a given regular tree $\mathcal{G}$. Initially, only one node $v_{1}\in\mathcal{V}(\mathcal{G})$ (the rumor source) possesses the rumor. A node possessing the rumor passes it to its adjacent nodes, and these nodes keep it forever. For $\{i,j\}\in\mathcal{E}(\mathcal{G})$, let $\tau_{ij}\in\mathbb{R}$ be a real-valued random variable (RV) that represents the rumor spreading time from the node $i$ to the node $j$ after $i$ gets the rumor. In this model, the spreading times $\{\tau_{ij}:\{i,j\}\in\mathcal{E}(\mathcal{G})\}$ are independent and drawn according to the exponential distribution with unit mean. Thus, the cumulative distribution function $F$ of $\tau_{ij}$ is $F(x)=1-e^{-x}$ if $x\geq 0$, and $F(x)=0$ if $x<0$. This spreading model is sometimes called the susceptible-infected (SI) model [3].
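As an illustration of the SI model described above, the following Python sketch simulates the spread with independent unit-mean exponential spreading times on each edge. The function name `simulate_si`, the integer node labels, and the lazily grown tree are assumptions made for this example and are not taken from the paper. Because the tree has no cycle, each susceptible neighbor is adjacent to exactly one infected node, so one pending clock per such neighbor is enough.

```python
import heapq
import random


def simulate_si(delta, n, rng):
    """Simulate the SI model on an infinite delta-regular tree.

    Returns (parent, time) for the first n infected nodes.
    Node 0 is the rumor source; parent[i] is the node that infected node i.
    """
    parent = [-1]
    time = [0.0]
    heap = []  # pending infections: (arrival time, infecting node)
    for _ in range(delta):
        heapq.heappush(heap, (rng.expovariate(1.0), 0))
    while len(parent) < n:
        t, u = heapq.heappop(heap)  # earliest pending infection
        v = len(parent)             # label of the newly infected node
        parent.append(u)
        time.append(t)
        # v has delta - 1 susceptible neighbors (all of them except u)
        for _ in range(delta - 1):
            heapq.heappush(heap, (t + rng.expovariate(1.0), v))
    return parent, time


if __name__ == "__main__":
    parent, time = simulate_si(delta=3, n=10, rng=random.Random(0))
    print(parent)
```

The returned `parent` list encodes the observed infected subtree, while `time` is the hidden information that the estimator does not see.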

Suppose that we observe a network consisting of $n$ infected nodes in the graph $\mathcal{G}$ at some time. Since the rumor spreads only to adjacent nodes, this network is a connected subgraph of $\mathcal{G}$. We denote the RV of this network by $G_{n}$ and its realization by $\mathcal{G}_{n}$. We only know the observed network and do not know the realization of the spreading times on edges. Then, the goal of the rumor source identification problem is to find the rumor source $v_{1}$ among $\mathcal{V}(\mathcal{G}_{n})$ given $\mathcal{G}_{n}$.

For this problem, the optimal estimator is the maximum likelihood (ML) estimator $\varphi_{\mathrm{ML}}(\mathcal{G}_{n})$ (cf. [3]) defined as

[TABLE]

where ties are broken uniformly at random and $\Pr\{\mathcal{G}_{n}|v\}$ is the probability of observing $\mathcal{G}_{n}$ under the SI model assuming $v$ is the rumor source. For this optimal estimator, let $\mathbf{C}_{n}$ be the correct detection probability when a graph of $n$ infected nodes is observed, i.e., $\mathbf{C}_{n}=\Pr\{\varphi_{\mathrm{ML}}(G_{n})=v_{1}\}$. Shah and Zaman [7] characterized the asymptotic behavior of $\mathbf{C}_{n}$ in the next theorem.

Theorem 1 ([7, Theorem 3.1])

For a regular tree with degree $\delta\geq 3$ , it holds that

[TABLE]

where $I_{x}(a,b)$ is the regularized incomplete beta function defined as $I_{x}(a,b)\triangleq\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_{0}^{x}t^{a-1}(1-t)^{b-1}dt,$ and $\Gamma(\cdot)$ is the Gamma function.

According to this theorem, when $\delta=3$ , $\lim_{n\rightarrow\infty}\mathbf{C}_{n}=0.25$ . Moreover, it rapidly converges to $1-\ln(2)\approx 0.307$ as $\delta$ goes to infinity (cf. [7, Corollary 1 and Figure 3]). This means that, unfortunately, the correct detection probability is not very high for regular trees.
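The values quoted above can be checked numerically from the limit $\delta I_{1/2}\big(\frac{1}{\delta-2},\frac{\delta-1}{\delta-2}\big)-(\delta-1)$. The sketch below is only an illustration: the function names and the Simpson-rule integration are choices made for this example, and a library routine such as SciPy's `scipy.special.betainc` could replace the hand-written integral.

```python
import math


def reg_inc_beta(x, a, b, steps=20000):
    """Regularized incomplete beta function I_x(a, b) via Simpson's rule.

    The substitution t = s**(1/a) removes the t**(a-1) singularity at 0:
    int_0^x t^(a-1) (1-t)^(b-1) dt = (1/a) int_0^(x^a) (1 - s^(1/a))^(b-1) ds.
    steps must be even.
    """
    upper = x ** a
    h = upper / steps

    def g(s):
        return (1.0 - s ** (1.0 / a)) ** (b - 1.0)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0 / a
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


def limit_correct_detection(delta):
    """Limit of C_n for a delta-regular tree (delta >= 3), as in Theorem 1."""
    a = 1.0 / (delta - 2)
    b = (delta - 1.0) / (delta - 2)
    return delta * reg_inc_beta(0.5, a, b) - (delta - 1)


if __name__ == "__main__":
    for delta in (3, 4, 5, 10):
        print(delta, round(limit_correct_detection(delta), 4))
    print("1 - ln 2 =", round(1 - math.log(2), 4))
```

For $\delta=3$ the integral is elementary, $I_{1/2}(1,2)=1-(1/2)^{2}=0.75$, which gives $3\cdot 0.75-2=0.25$.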

III Main Results

In this section, we show that the ML estimator can select a node near the rumor source with high probability.

To this end, we clarify the probability distribution of the distance $d\ (\geq 1)$ between the rumor source and the node selected by the ML estimator. We denote this probability by $\mathbf{D}_{n}(d)$ and define it as

[TABLE]

where $\hat{V}_{n}=\varphi_{\mathrm{ML}}(G_{n})$ and $d_{\mathcal{G}}(v,w)$ denotes the distance between nodes $v$ and $w$ in the graph $\mathcal{G}$ . Note that $\mathbf{D}_{n}(0)=\mathbf{C}_{n}$ .

When $\delta=3$ , we can clarify a closed-form expression of the asymptotic behavior of $\mathbf{D}_{n}(d)$ as the next theorem.

Theorem 2

Let $\delta=3$ . Then, for any $d\geq 1$ , we have

[TABLE]

where

[TABLE]

We denote the rising factorial $x(x+1)(x+2)\cdots(x+k-1)$ by $x^{\overline{k}}$ . The next theorem gives tight upper and lower bounds of $\lim_{n\rightarrow\infty}\mathbf{D}_{n}(d)$ for more general degrees.

Theorem 3

For any $\delta\geq 3$ , $d\geq 1$ , and $m\geq d+1$ , we have

[TABLE]

where $\epsilon_{m}=e^{2}(8+5m+m^{2})2^{-m+3}$ ,

[TABLE]

$\zeta_{k}^{d}(x)=\sum_{1\leq j_{1}<j_{2}<\cdots<j_{d}\leq k}\left(\prod_{i=1}^{d}\frac{1}{j_{i}+x}\right)$ , and $\zeta_{k}^{0}(x)=1$ for any $k\geq 0$ .

$\zeta_{k}^{d}(x)$ is a partial sum of the multiple Hurwitz zeta function (cf. e.g. [8]) or the shifted multiple harmonic sums (cf. e.g. [9]). We note that the difference of bounds (i.e., $\epsilon_{m}$ ) does not depend on degrees.
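Since $\zeta_{k}^{d}(x)$ is the degree-$d$ elementary symmetric polynomial of the terms $1/(j+x)$, $j=1,\dots,k$, it can be evaluated without enumerating all index tuples. The following sketch uses exact rational arithmetic; the function name `zeta_partial` is an assumption made for this example.

```python
from fractions import Fraction


def zeta_partial(k, d, x):
    """zeta_k^d(x): sum over 1 <= j_1 < ... < j_d <= k of prod_i 1/(j_i + x)."""
    # e[r] is the degree-r elementary symmetric polynomial of the terms seen so far
    e = [Fraction(1)] + [Fraction(0)] * d
    for j in range(1, k + 1):
        term = 1 / (Fraction(j) + x)
        for r in range(d, 0, -1):  # descending so each term is used at most once
            e[r] += term * e[r - 1]
    return e[d]


if __name__ == "__main__":
    print(zeta_partial(3, 1, 0))               # 1 + 1/2 + 1/3
    print(zeta_partial(3, 2, Fraction(1, 2)))  # shift x = 1/(delta - 2) with delta = 4
```

The update runs in $O(kd)$ operations, and $\zeta_{k}^{0}(x)=1$ falls out of the initial value `e[0]`.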

These theorems imply that the ML estimator can select a node near the rumor source with high probability. This is clear from the next corollary and its numerical results (Fig. 1).

Corollary 1

Let $\delta=3$ . Then, for any $d\geq 0$ , we have

[TABLE]

More generally, for any $\delta\geq 3$ , $d\geq 0$ , and $m\geq d+1$ , we have

[TABLE]

Here, $f(0)$ and $g(\delta,0,m)$ denote the right-hand side of (1).

Proof:

By noticing that $\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})\leq d\}=\sum_{l=0}^{d}\mathbf{D}_{n}(l)$ , the corollary is immediately obtained by Theorems 1-3. ∎

Since $\epsilon_{40}\approx 10^{-7}$, Fig. 1 gives almost exact numerical results of $\lim_{n\rightarrow\infty}\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})\leq d\}$. We note that numerical results for other degrees $\delta$ are almost the same (see Fig. 2). Thus, these results show that the rumor source is within distance $3$ of the node selected by the ML estimator with quite high probability. We note that Khim and Loh [6, Corollary 2] gave another lower bound of $\lim_{n\rightarrow\infty}\Pr\{d_{\mathcal{G}}(\hat{V}_{n},v_{1})\leq d\}$. However, it is much looser than our bound and is zero at least when the values of the parameters $d$ and $\delta$ are within the range shown in Fig. 1 and Fig. 2.
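The gap $\epsilon_{m}=e^{2}(8+5m+m^{2})2^{-m+3}$ from Theorem 3 is easy to evaluate directly. The short sketch below (function names are illustrative) computes it and searches for the smallest $m$ that meets a given tolerance, which is valid because $\epsilon_{m}$ is decreasing in $m$.

```python
import math


def epsilon(m):
    """Gap between the bounds of Theorem 3: e^2 (8 + 5m + m^2) 2^(-m+3)."""
    return math.exp(2) * (8 + 5 * m + m * m) * 2.0 ** (-m + 3)


def smallest_m(tol):
    """Smallest m >= 1 with epsilon(m) <= tol."""
    m = 1
    while epsilon(m) > tol:
        m += 1
    return m


if __name__ == "__main__":
    print(epsilon(40))
    print(smallest_m(1e-7))
```

Note that Theorem 3 also requires $m\geq d+1$, which this search does not enforce.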

IV Proofs of Theorems

In this section, we prove our main theorems. We denote a length-$n$ sequence of RVs $(X_{1},X_{2},\cdots,X_{n})$ and its realization $(x_{1},x_{2},\cdots,x_{n})$ by $X^{n}$ and $x^{n}$, respectively. For the sake of brevity, we denote $\mathcal{V}(\mathcal{G})$ by $\mathcal{V}$ and $\mathcal{V}(G_{n})$ by $\mathcal{V}_{n}$.

Any node $v\in\mathcal{V}$ in a regular tree with degree $\delta\geq 3$ has $\delta$ neighbors. Thus, there are $\delta$ subtrees rooted at these $\delta$ neighbors with the parent node $v$. In other words, the regular tree is divided into these $\delta$ subtrees and the node $v$. Let $X_{j}(v)$ be the number of infected nodes in the $j$th subtree among those subtrees ($j=1,2,\cdots,\delta$). When $v$ is not the rumor source, let the $\delta$th subtree contain the rumor source $v_{1}$. Note that, if $v$ is an infected node, we have $\sum_{j=1}^{\delta}X_{j}(v)=n-1$. The next lemma is the key to proving our main theorems.

Lemma 1

For a node $v\in\mathcal{V}$ , let $\overline{X}(v)=\max_{1\leq j\leq\delta}\{X_{j}(v)\}$ . Then, we have

[TABLE]

Since this lemma can be obtained by [10, Proposition 1] (see also [10, Lemma 6]), we prove this in Appendix C.

We denote the set of nodes with distance $d\ (\geq 1)$ from the rumor source by $\mathcal{V}^{(d)}$ . Note that the number of elements of $\mathcal{V}^{(d)}$ is $\delta(\delta-1)^{d-1}$ . Then, $\mathbf{D}_{n}(d)$ can be represented as

[TABLE]

where the last equality comes from Lemma 1.

On the other hand, let $\{V_{i}\}_{i=1}^{\infty}$ be the sequence of RVs each representing $i$ th infected node, where $V_{1}=v_{1}$ with probability 1. Then, we have $\mathcal{V}_{n}=\{V_{1},V_{2},\cdots,V_{n}\}$ . This implies that the event $\{v^{(d)}\in\mathcal{V}_{n}\}$ is equal to the event $\cup_{k=d+1}^{n}\{V_{k}=v^{(d)}\}$ . Hence, we have

[TABLE]

where $X^{\delta}(v)=(X_{1}(v),X_{2}(v),\cdots,X_{\delta}(v))$ . We also have

[TABLE]

Thus, we need to obtain closed-form expressions of $\Pr\{V_{k}=v^{(d)}\}$ and $\Pr\{X^{\delta}(v^{(d)})=x^{\delta}|V_{k}=v^{(d)}\}$ .

IV-A Closed-Form Expression of $\Pr\{V_{k}=v^{(d)}\}$

Let $\mathcal{N}(v)$ be the set of neighboring nodes of $v$ in the graph $\mathcal{G}$ . Suppose that the set $\hat{\mathcal{V}}$ of nodes are infected with a rumor, and any other nodes are not infected. Then, we denote the set of boundary nodes which may be infected by the infected nodes $\hat{\mathcal{V}}$ by $\mathcal{B}(\hat{\mathcal{V}})$ , i.e., $\mathcal{B}(\hat{\mathcal{V}})=\{\cup_{v\in\hat{\mathcal{V}}}\,\mathcal{N}(v)\}\backslash\hat{\mathcal{V}}$ . Let $\mathcal{S}_{n}$ be the set of ordered $n$ nodes on possible paths of infection, i.e., $\mathcal{S}_{n}=\{v^{n}\in\mathcal{V}^{n}:v_{i+1}\in\mathcal{B}(\{v_{1},\cdots,v_{i}\})\}$ , where $v^{n}=(v_{1},v_{2},\cdots,v_{n})$ . Since $\{\tau_{ij}\}$ are independent and these have the memoryless property, an infecting node is uniformly selected from boundary nodes at each step. Hence, we have for any $v^{n-1}\in\mathcal{S}_{n-1}$ and $v_{n}\in\mathcal{B}(\{v_{1},\cdots,v_{n-1}\})$ ,

[TABLE]
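The uniform selection of the next infected node among the boundary nodes, combined with Lemma 1, gives a direct way to estimate $\mathbf{D}_{n}(d)$ by simulation. The following Python sketch is a Monte Carlo illustration only: the function names, the `slots` representation of the boundary, and the values of `n`, `trials`, and `seed` are assumptions made for this example. Following Lemma 1, the selected node is the one whose largest neighboring subtree holds fewer than $n/2$ infected nodes, with a fair coin deciding between two nodes that attain exactly $n/2$.

```python
import random
from collections import Counter


def sample_distance(delta, n, rng):
    """Grow an n-node infected subtree and return the distance between
    the rumor source (node 0) and the node selected according to Lemma 1."""
    parent = [-1]
    depth = [0]
    slots = [0] * delta  # one entry per boundary node, storing its infected neighbor
    while len(parent) < n:
        i = rng.randrange(len(slots))  # uniform choice among boundary nodes
        slots[i], slots[-1] = slots[-1], slots[i]
        u = slots.pop()
        v = len(parent)
        parent.append(u)
        depth.append(depth[u] + 1)
        slots.extend([v] * (delta - 1))  # v exposes delta - 1 new boundary nodes
    size = [1] * n
    max_child = [0] * n
    for v in range(n - 1, 0, -1):  # children always have larger labels than parents
        p = parent[v]
        size[p] += size[v]
        max_child[p] = max(max_child[p], size[v])
    # max(max_child[v], n - size[v]) is the largest subtree count around v
    centers = [v for v in range(n)
               if 2 * max(max_child[v], n - size[v]) <= n]
    return depth[rng.choice(centers)]  # ties broken uniformly at random


def distance_distribution(delta, n, trials, seed=0):
    """Empirical distribution of the distance over independent trials."""
    rng = random.Random(seed)
    counts = Counter(sample_distance(delta, n, rng) for _ in range(trials))
    return {d: c / trials for d, c in sorted(counts.items())}


if __name__ == "__main__":
    print(distance_distribution(delta=3, n=201, trials=2000))
```

With $\delta=3$ and a moderately large $n$, the empirical mass at distance $0$ should be close to the limit $0.25$ of Theorem 1, up to sampling noise and finite-$n$ effects.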

Let $(v^{(d,0)},v^{(d,1)},\cdots,v^{(d,d)})$ be the (shortest) path from the rumor source $v_{1}=v^{(d,0)}$ to $v^{(d)}=v^{(d,d)}$ . Then, for $d\geq 1$ and $k\geq d+1$ , the $k$ th infected node is $v^{(d)}$ if and only if the following event occurs for some $j_{1},j_{2},\cdots,j_{d}$ such that $2\leq j_{1}<j_{2}<\cdots<j_{d-1}<j_{d}=k$ :

[TABLE]

where $\mathcal{E}_{i}=\{\cap_{l=j_{i-1}+1}^{j_{i}-1}\{V_{l}\neq v^{(d,i)}\},V_{j_{i}}=v^{(d,i)}\}$ and $j_{0}=1$ . Hence, if $d\geq 2$ and $k\geq d+1$ , we have

[TABLE]

where (a) comes from the chain rule of the probability, and (b) comes from Appendix A.

The remaining case is that $d=1$ and $k\geq d+1\,(=2)$ . In this case, we have

[TABLE]

where (a) comes from Appendix A. Thus, by recalling that $\zeta_{k-2}^{d-1}\Big{(}\frac{1}{\delta-2}\Big{)}=1$ if $d=1$ and $k\geq 2$ , (8) implies that (7) also holds in this case.

Consequently, (7) holds for any $d\geq 1$ and $k\geq d+1$ .

IV-B Closed-Form Expression of $\Pr\{X^{\delta}(v^{(d)})=x^{\delta}|V_{k}=v^{(d)}\}$

Suppose that the $k$th infected node is $v_{k}$. Since we consider a regular tree, $v_{k}$ has $\delta$ neighboring nodes $\{v_{k,1},\cdots,v_{k,\delta}\}$. Let $Y_{j}(v_{k})$ be the number of nodes of the subtree rooted at $v_{k,j}$ with the parent node $v_{k}$ that are infected after $v_{k}$ is infected. Let the subtree rooted at $v_{k,\delta}$ contain the rumor source. Thus, at the time that $v_{k}$ is infected, it holds that $X_{\delta}(v_{k})=k-1$. From then on, an infecting node is uniformly selected from boundary nodes at each step. We note that $X_{j}(v_{k})=Y_{j}(v_{k})$ for all $j\in\{1,2,\cdots,\delta-1\}$, and $X_{\delta}(v_{k})=Y_{\delta}(v_{k})+k-1$. Then, the numbers $\{Y_{j}(v_{k})\}$ are drawn according to Pólya’s urn model with balls of $\delta$ colors (cf. [3] and [10]): Initially, $b_{j}$ balls of color $C_{j}$ $(j=1,2,\cdots,\delta)$ are in the urn, where $b_{j}=1$ if $j\neq\delta$ and $b_{j}=(k-1)(\delta-2)+1$ if $j=\delta$. At each step, a single ball is drawn uniformly from the urn. Then, the drawn ball is returned with $m=\delta-2$ additional balls of the same color. This drawing process is repeated.

$Y_{j}(v_{k})$ corresponds to the number of times that the balls of color $C_{j}$ are drawn. According to [11, Chap. 4], when the total number of drawing balls is $n-k$ , the joint distribution of $Y^{\delta}(v_{k})=(Y_{1}(v_{k}),\cdots,Y_{\delta}(v_{k}))$ is given by

[TABLE]

where $b=\sum_{j=1}^{\delta}b_{j}$ and $\sum_{j=1}^{\delta}y_{j}=n-k$ . We note that the above probability only depends on $n$ , $k$ and $\delta$ .
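For a concrete check of the urn description, the sketch below evaluates the joint distribution of the draw counts using the standard Pólya urn formula, $\frac{N!}{\prod_{j}y_{j}!}\cdot\frac{\prod_{j}\prod_{i=0}^{y_{j}-1}(b_{j}+im)}{\prod_{i=0}^{N-1}(b+im)}$ with $N=\sum_{j}y_{j}$. This is the textbook form and is assumed, not confirmed, to agree with the expression referred to above; the function names are also illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def polya_pmf(y, b, m):
    """Probability that color j is drawn y[j] times in sum(y) draws,
    starting from b[j] balls of color j and adding m balls per draw."""
    n_draws = sum(y)
    coeff = factorial(n_draws)  # multinomial coefficient: number of draw orders
    for yj in y:
        coeff //= factorial(yj)
    num = 1
    for yj, bj in zip(y, b):
        for i in range(yj):
            num *= bj + i * m
    den = 1
    for i in range(n_draws):
        den *= sum(b) + i * m
    return Fraction(coeff * num, den)


def initial_balls(delta, k):
    """Initial urn for the k-th infected node: b_j = 1 for j < delta,
    and b_delta = (k - 1)(delta - 2) + 1."""
    return [1] * (delta - 1) + [(k - 1) * (delta - 2) + 1]


if __name__ == "__main__":
    delta, k, n = 3, 2, 6
    b = initial_balls(delta, k)
    total = sum(polya_pmf(y, b, delta - 2)
                for y in product(range(n - k + 1), repeat=delta)
                if sum(y) == n - k)
    print(b, total)
```

Exact rational arithmetic makes it possible to confirm that the probabilities over all count vectors with $\sum_{j}y_{j}=n-k$ sum to exactly one.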

Now, by definition, we have

[TABLE]

IV-C Proof of Theorem 2

When $\delta=3$ , according to (7), (9) and (10), we have

[TABLE]

for any $d\geq 1$ and $k\geq d+1$ .

When $n$ is odd, we have $\Pr\{v^{(d)}\in\mathcal{V}_{n},\overline{X}(v^{(d)})=n/2\}=0$. Thus, we only consider the first term of (3). According to (7), (9) and (10), (4) can be represented as

[TABLE]

where the equality follows since

[TABLE]

Thus, we have

[TABLE]

In a similar way, we have $\mathbf{D}_{n}(d)$ for even $n$ as follows:

[TABLE]

This is because

[TABLE]

where (a) follows since

[TABLE]

and

[TABLE]

Since $\zeta_{k}^{d}(1)=\zeta_{k+1}^{d}(0)-\zeta_{k}^{d-1}(1)$ for any $d\geq 1$ and $k\geq d$ (see Appendix D), we have for any $d\geq 2$ and $k\geq d+1$ ,

[TABLE]

where $\zeta_{k-1}^{0}(0)=1$ . Note that this holds even if $d=1$ and $k\geq d+1$ . Since it holds [12, 13] that

[TABLE]

for any $l\geq 1$ and $k\geq l$ , we have for any $d\geq 1$ and $k\geq d+1$ ,

[TABLE]

where $[{k\atop l}]$ is the unsigned Stirling numbers of the first kind [14] and $s(k,l)$ is the signed Stirling numbers of the first kind [14] defined as $s(k,l)\triangleq(-1)^{k-l}\left[{k\atop l}\right]$ . Thus, we have for odd $n\geq 3$ ,

[TABLE]

and for even $n\geq 2$ ,

[TABLE]

Now, Lebesgue’s dominated convergence theorem and the facts that

[TABLE]

and

[TABLE]

imply (see a precise derivation in Appendix E)

[TABLE]

Thus, we can evaluate the probability as follows:

[TABLE]

where (a) comes from Appendix B, and (b) follows since $\sum_{l=1}^{k}s(k,l)=1$ if $k=1$ and $\sum_{l=1}^{k}s(k,l)=0$ if $k\neq 1$ . This completes the proof of Theorem 2.
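The property of the signed Stirling numbers of the first kind used in step (b) can be verified numerically. The sketch below builds the numbers from the standard recurrence $\left[{k\atop l}\right]=(k-1)\left[{k-1\atop l}\right]+\left[{k-1\atop l-1}\right]$; the function name is an assumption made for this example.

```python
def signed_stirling_first(k_max):
    """Table s[k][l] of signed Stirling numbers of the first kind
    for 0 <= k, l <= k_max (entries with l > k are zero)."""
    c = [[0] * (k_max + 1) for _ in range(k_max + 1)]  # unsigned numbers
    c[0][0] = 1
    for k in range(1, k_max + 1):
        for l in range(1, k + 1):
            c[k][l] = (k - 1) * c[k - 1][l] + c[k - 1][l - 1]
    # s(k, l) = (-1)^(k-l) * [k, l]
    return [[(-1 if (k - l) % 2 else 1) * c[k][l] for l in range(k_max + 1)]
            for k in range(k_max + 1)]


if __name__ == "__main__":
    s = signed_stirling_first(6)
    for k in range(1, 7):
        print(k, s[k][1:k + 1], sum(s[k][1:k + 1]))
```

The row sums vanish for $k\geq 2$ because $\sum_{l}s(k,l)u^{l}=u(u-1)\cdots(u-k+1)$ is zero at $u=1$.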

IV-D Proof of Theorem 3

In this section, we denote $I_{1/2}(k-1+\frac{\delta-1}{\delta-2},\frac{1}{\delta-2})$ by $I^{(1)}(\delta,k)$ and $I_{1/2}(k-1+\frac{1}{\delta-2},\frac{\delta-1}{\delta-2})$ by $I^{(2)}(\delta,k)$ .

Let $\mathcal{E}_{j}(v^{(d)})\triangleq\{X_{j}(v^{(d)})<n/2\}$ . Due to (3), we have

[TABLE]

where (a) comes from the fact that

[TABLE]

and (b) comes from the symmetric property of $\mathcal{E}_{i}(v^{(d)})$ for all $1\leq i\leq\delta-1$ . Similarly, by letting $\mathcal{F}_{j}(v^{(d)})\triangleq\{X_{j}(v^{(d)})\leq n/2\}$ , we have

[TABLE]

where (a) comes from the fact that events $[\mathcal{F}_{1}(v^{(d)})]^{c}$ , $[\mathcal{F}_{2}(v^{(d)})]^{c}$ , $\cdots$ , $[\mathcal{F}_{\delta}(v^{(d)})]^{c}$ are disjoint.

In the same way as in [10, Chapter III.B] (see also [7, Section 4.1.5]), we have (see a precise derivation in Appendix F)

[TABLE]

According to these equalities, (16), (17), and the dominated convergence theorem, we have (see a precise derivation in Appendix G)

[TABLE]

where $g(\delta,d,m)$ is a partial sum of (20), and the inequality comes from the fact that (according to (17), (18), and (19))

[TABLE]

On the other hand, we have

[TABLE]

where (a) comes from the fact that

[TABLE]

and (b) comes from the same (but a bit improved) inequality in [7, Sect. 4.5]. Thus, for any $M\geq m+1$ , we have

[TABLE]

Since $\lim_{M\rightarrow\infty}g(\delta,d,M)=\lim_{n\rightarrow\infty}\mathbf{D}_{n}(d)$ , we have

[TABLE]

This completes the proof of Theorem 3.

Appendix A

We have

[TABLE]

where we use the convention that if $j_{i}=j_{i-1}+1$ ,

[TABLE]

On the other hand, we have

[TABLE]

where $\mathcal{P}_{l,i}=\{v^{l-1}\in\mathcal{S}_{l-1}:v_{j_{h}}=v^{(d,h)},\ \forall h\in\{1,\cdots,i-1\},v_{m}\neq v^{(d,i)},\ \forall m\in\{j_{i-1}+1,\cdots,l-1\}\}$ , and (a) comes from (5). Similarly, we have

[TABLE]

where

[TABLE]

By substituting (22) and (23) into (21), we have (6).

Appendix B

Let $f(u,z)$ be a double series defined as

[TABLE]

where we assume that $s(-1,l)=0$ . First of all, we show that $f(u,z)$ is absolutely convergent.

If we assume that $\left[{-1\atop l}\right]=0$ , we have

[TABLE]

where $\binom{a}{k}$ denotes the generalized binomial coefficient defined as for any real number $a\in\mathbb{R}$ ,

[TABLE]

(a) follows since

[TABLE]

and (b) comes from the Maclaurin series for $(1+z)^{a}$ which is convergent if $|z|<1$ . Since $|z(1-z)^{-u}|<\infty$ for any $u\in\mathbb{R}$ and $z\in\mathbb{R}$ such that $|z|<1$ , the above iterated series is convergent. According to [15, Proposition 212], if $u\geq 0$ and $z\in[0,1)$ , the double series is also convergent, i.e.,

[TABLE]

Since for any $u,z,k,l\geq 0$ ,

[TABLE]

we also have, according to [15, Corollary 210],

[TABLE]

Now, for any $u\in\mathbb{R}$ and $z\in\mathbb{R}$ such that $|z|<1$ , we have

[TABLE]

This means that $f(u,z)$ is absolutely convergent.

We note that, according to this fact and [15, Proposition 213], iterated series are equivalent for any $u\in\mathbb{R}$ and $z\in\mathbb{R}$ such that $|z|<1$ , i.e.,

[TABLE]

Let

[TABLE]

Since

[TABLE]

we need a closed-form expression of $\frac{1}{z}f_{l}(z)$ for $z=-\frac{1}{2}$ . To this end, we evaluate the following series:

[TABLE]

where (a) comes from (24), (b) follows since $\sum_{l=0}^{\infty}s(k,l)u^{l}=u(u-1)\cdots(u-k+1)$ , (c) comes from the fact that $\binom{u+1}{k}=1$ if $k=0$ , (d) comes from Maclaurin series with respect to $z$ which are convergent if $|z|<1$ , and (e) comes from Maclaurin series with respect to $u$ which are convergent if $|u|<1$ .

Thus, for any $z,u\in\mathbb{R}$ such that $|z|<1$ and $|u|<1$ , we have

[TABLE]

Since two power series are convergent in a neighborhood of [math], all coefficients are equal (see [16, Corollary 3.8]). This means that

[TABLE]

where $|z|<1$ . Thus, we have

[TABLE]

Especially, when $z=-\frac{1}{2}$ , we have

[TABLE]

Appendix C

In this appendix, we prove Lemma 1.

First of all, we introduce some notation. Let $R(v,\mathcal{G}_{n})$ be the rumor centrality [3] of a node $v$ in $\mathcal{G}_{n}$, $T_{w}^{v}$ be the subtree of $\mathcal{G}_{n}$ rooted at the node $w$ with the ancestor node $v$, and $|T_{w}^{v}|$ be the number of nodes in $T_{w}^{v}$. Here, we assume that $T_{w}^{v}=\emptyset$ and $|T_{w}^{v}|=0$ if $w\notin\mathcal{V}(\mathcal{G}_{n})$. We note that the ML estimator becomes (see [3, Section II-C])

[TABLE]

Consider a sub-neighborhood $\mathcal{N}_{l}(v)\subseteq\mathcal{N}(v)$, where $\mathcal{N}(v)$ is the set of neighboring nodes of $v$ in the graph $\mathcal{G}$. For $v\in\mathcal{V}(\mathcal{G}_{n})$, if $R(v,\mathcal{G}_{n})\geq R(w,\mathcal{G}_{n})$ for all $w\in\mathcal{N}_{l}(v)\cap\mathcal{V}(\mathcal{G}_{n})$, then $v$ is called the local rumor center w.r.t. $\mathcal{N}_{l}(v)$. For the local rumor center, we know the following properties (see [10, Proposition 1]):

•

For a node $v\in\mathcal{V}(\mathcal{G}_{n})$ , it holds that $|T_{w}^{v}|\leq\frac{n}{2}$ for all $w\in\mathcal{N}_{l}(v)$ $\Leftrightarrow$ the node $v$ is a local rumor center w.r.t. $\mathcal{N}_{l}(v)$ .

•

A node $v\in\mathcal{V}(\mathcal{G}_{n})$ is a local rumor center w.r.t. $\mathcal{N}_{l}(v)$ $\Rightarrow$ it holds that

[TABLE]

•

A node $v\in\mathcal{V}(\mathcal{G}_{n})$ is a local rumor center w.r.t. $\mathcal{N}_{l}(v)$ $\Rightarrow$ there exists at most one node $w\in\mathcal{N}_{l}(v)$ such that

[TABLE]

where the equality holds if and only if

[TABLE]
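The first property can be checked on a small example. The sketch below assumes the rumor centrality formula of Shah and Zaman [3], $R(v,\mathcal{G}_{n})=n!/\prod_{u\in\mathcal{V}(\mathcal{G}_{n})}|T_{u}^{v}|$, which counts the infection orders that start from $v$. The adjacency-dictionary representation, the function names, and the six-node tree are assumptions made for this illustration.

```python
from math import factorial


def subtree_sizes(adj, root):
    """Sizes |T_u^root| of the subtrees of all nodes when the tree is rooted at root."""
    order, parent = [root], {root: None}
    for u in order:  # breadth-first traversal; order grows while iterating
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                order.append(w)
    size = {u: 1 for u in order}
    for u in reversed(order[1:]):  # accumulate sizes from the leaves upward
        size[parent[u]] += size[u]
    return size


def rumor_centrality(adj, v):
    """R(v, G_n) = n! / prod_u |T_u^v| for a tree given as an adjacency dict."""
    prod = 1
    for s in subtree_sizes(adj, v).values():
        prod *= s
    return factorial(len(adj)) // prod


if __name__ == "__main__":
    # tree with edges 0-1, 0-2, 0-3, 3-4, 4-5 (n = 6)
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
    print({v: rumor_centrality(adj, v) for v in adj})
```

In this tree, nodes 0 and 3 each see a neighboring subtree of exactly $n/2=3$ nodes, and they tie for the largest rumor centrality, which is the tie case that the estimator breaks uniformly at random.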

According to these properties, for a node $v\in\mathcal{V}(\mathcal{G}_{n})$, if it holds that $|T_{w}^{v}|\leq\frac{n}{2}$ for all $w\in\mathcal{N}(v)$, the node $v$ is a (local) rumor center w.r.t. $\mathcal{N}(v)$. Then, there exists at most one node $w\in\mathcal{N}(v)$ such that

[TABLE]

and

[TABLE]

where the equality holds if and only if

[TABLE]

Hence, for a node $v\in\mathcal{V}(\mathcal{G}_{n})$ , if $\overline{X}(v)<n/2$ , i.e., $\max\{|T_{w}^{v}|,w\in\mathcal{N}(v)\}<n/2$ , we have

[TABLE]

Thus, the ML estimator outputs $v$ , and hence

[TABLE]

For a node $v\in\mathcal{V}(\mathcal{G}_{n})$ , if $\overline{X}(v)=\frac{n}{2}$ , i.e., there exists a node $w\in\mathcal{N}(v)$ such that $|T_{w}^{v}|=\frac{n}{2}$ and $|T_{w^{\prime}}^{v}|<\frac{n}{2}$ for any other $w^{\prime}\in\mathcal{N}(v)$ , we have

[TABLE]

and

[TABLE]

Thus, the ML estimator outputs $v$ with probability $1/2$ , and hence

[TABLE]

For a node $v\in\mathcal{V}(\mathcal{G}_{n})$ , if $\overline{X}(v)>n/2$ , i.e., $\max\{|T_{w}^{v}|,w\in\mathcal{N}(v)\}>n/2$ , the node $v$ is not a local rumor center w.r.t. $\mathcal{N}(v)$ . Hence there exists $w\in\mathcal{N}(v)$ such that

[TABLE]

Then, the ML estimator does not output $v$ , and hence

[TABLE]

This completes the proof.

Appendix D

We note that

[TABLE]

and

[TABLE]

Thus, for any $d\geq 1$ and $k\geq d$ , we have

[TABLE]

where $\zeta_{k}^{0}(1)=1$ .
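The identity $\zeta_{k}^{d}(1)=\zeta_{k+1}^{d}(0)-\zeta_{k}^{d-1}(1)$ can also be confirmed by brute force for small parameters. The sketch below evaluates the sums directly from their definition with exact fractions; the function name `zeta_brute` is an assumption made for this example.

```python
from fractions import Fraction
from itertools import combinations


def zeta_brute(k, d, x):
    """zeta_k^d(x) computed directly from the definition."""
    total = Fraction(0)
    for js in combinations(range(1, k + 1), d):
        term = Fraction(1)
        for j in js:
            term /= (j + x)
        total += term
    return total  # for d = 0 the single empty tuple gives 1


if __name__ == "__main__":
    for d in range(1, 4):
        for k in range(d, 7):
            lhs = zeta_brute(k, d, 1)
            rhs = zeta_brute(k + 1, d, 0) - zeta_brute(k, d - 1, 1)
            print(d, k, lhs, lhs == rhs)
```

The identity reflects splitting the index tuples of $\zeta_{k+1}^{d}(0)$ according to whether they contain the index $1$.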

Appendix E

In order to show the equation (15), we use the next lemma (cf. e.g. [17]).

Lemma 2 (Dominated convergence theorem)

Let $f_{1},f_{2},\cdots:\mathbb{N}\to\mathbb{R}$ be a sequence of real-valued functions on positive integers $\mathbb{N}$ such that

[TABLE]

Suppose that there is $g:\mathbb{N}\to\mathbb{R}$ such that

[TABLE]

Then, we have

[TABLE]

We note that

[TABLE]

where (a) follows since $\Pr\{V_{k}=v^{(d)}\}=0$ for any $k\leq d$ , and (b) comes from the fact that if $v^{(d)}$ is the $k$ th infected node ( $k\geq\lceil n/2\rceil+1$ ), it must hold that $\overline{X}(v^{(d)})\geq n/2$ . We also note that

[TABLE]

where (a) comes from the fact that if $v^{(d)}$ is the $k$th infected node ($k\geq\lfloor n/2\rfloor+2$), it must hold that $\overline{X}(v^{(d)})>n/2$. Thus, we have

[TABLE]

By noticing that $\Pr\{V_{k}=v^{(d)}\}$ does not depend on $n$ (see (7)), we can set

[TABLE]

Then, we have

[TABLE]

We also have

[TABLE]

On the other hand, according to (13) and (14), we have for any $k\geq d+1$ and odd $n\geq 3$ ,

[TABLE]

and for any $k\geq d+1$ and even $n\geq 2$ ,

[TABLE]

By noticing that

[TABLE]

and

[TABLE]

we have for any $k\geq d+1$ ,

[TABLE]

We note that for any $k\leq d$ ,

[TABLE]

Thus, according to Lemma 2, we have

[TABLE]

By noticing that $|\mathcal{V}^{(d)}|=\delta(\delta-1)^{d-1}$ , we have (15) from (25).

Appendix F

We consider Pólya’s urn model with balls of $2$ colors: Initially, $b_{j}$ balls of color $C_{j}$ $(j=1,2)$ are in the urn. At each step, a single ball is drawn uniformly from the urn. Then, the drawn ball is returned with $m$ additional balls of the same color. This drawing process is repeated $n$ times. Let $\tilde{Y}_{j}$ denote the number of balls of the color $C_{j}$ in the urn at the end of time $n$. Let $Y_{j}$ denote the number of times that balls of color $C_{j}$ are drawn in the $n$ draws.

According to [7, Theorem 4.1], we have the next theorem.

Theorem 4

[TABLE]

where $b_{1}+b_{2}+n\cdot m$ is the total number of balls in the urn at the end of time $n$, and $Y$ is a Beta random variable with parameters $b_{1}/m$ and $b_{2}/m$. That is, for $x\in[0,1]$,

[TABLE]

We immediately have the next corollary.

Corollary 2

$$\lim_{n\to\infty}\frac{Y_{1}}{n}=Y\quad\text{a.s.},$$

where $Y$ is the same Beta random variable as that of Theorem 4.

Proof:

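$Y_{1}$ can be written as

$$Y_{1}=\frac{\tilde{Y}_{1}-b_{1}}{m}.$$

Thus, we have

$$\frac{Y_{1}}{n}=\frac{\tilde{Y}_{1}}{b_{1}+b_{2}+n\cdot m}\cdot\frac{b_{1}+b_{2}+n\cdot m}{m\cdot n}-\frac{b_{1}}{m\cdot n}.$$

Since $\frac{b_{1}+b_{2}+n\cdot m}{m\cdot n}\to 1$ and $\frac{b_{1}}{m\cdot n}\to 0$ as $n\to\infty$, we have

$$\lim_{n\to\infty}\frac{Y_{1}}{n}=\lim_{n\to\infty}\frac{\tilde{Y}_{1}}{b_{1}+b_{2}+n\cdot m}=Y\quad\text{a.s.},$$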

where almost sure convergence comes from Theorem 4. This completes the proof. ∎
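A Monte Carlo sketch can make Corollary 2 concrete. It compares the empirical mean and variance of $Y_{1}/n$ over many independent urns with the mean $a/(a+b)$ and variance $ab/((a+b)^{2}(a+b+1))$ of a Beta random variable with parameters $a=b_{1}/m$ and $b=b_{2}/m$. The helper names, the sample sizes, and the seed are arbitrary choices for this illustration. Agreement of two moments is only a sanity check, not a proof of convergence in distribution.

```python
import random


def draw_fraction(b1, b2, m, n, rng):
    # Run one Polya urn for n draws and return Y_1 / n.
    color1, total, y1 = b1, b1 + b2, 0
    for _ in range(n):
        if rng.randrange(total) < color1:
            color1 += m
            y1 += 1
        total += m
    return y1 / n


def beta_mean_var(a, b):
    # Mean and variance of a Beta(a, b) random variable.
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def empirical_mean_var(samples):
    # Sample mean and (population-style) sample variance.
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var


if __name__ == "__main__":
    b1, b2, m, n, trials = 1, 3, 2, 300, 1000
    rng = random.Random(1)
    samples = [draw_fraction(b1, b2, m, n, rng) for _ in range(trials)]
    print("empirical:", empirical_mean_var(samples))
    print("Beta     :", beta_mean_var(b1 / m, b2 / m))
```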

After $v^{(d)}$ is infected as the $k$th node, the evolution of $X_{1}(v^{(d)})$ can be regarded as the Pólya urn model with the following settings: $Y_{1}=X_{1}(v^{(d)})$, $Y_{2}=\sum_{j=2}^{\delta}X_{j}(v^{(d)})-k+1$, $b_{1}=1$, $b_{2}=(k-1)(\delta-2)+\delta-1$, and $m=\delta-2$. Here, we assume that the total number of draws is $n-k$. Then, according to Corollary 2, we have

[TABLE]

where $Y$ is a Beta random variable with parameters $1/(\delta-2)$ and $k-1+(\delta-1)/(\delta-2)$ . Thus, we have

[TABLE]

Similarly, we have

[TABLE]

Due to (26) and (27), we have (18).

On the other hand, after $v^{(d)}$ is infected as the $k$th node, the evolution of $X_{\delta}(v^{(d)})$ can be regarded as the Pólya urn model with the following settings: $Y_{1}=X_{\delta}(v^{(d)})-k+1$, $Y_{2}=\sum_{j=1}^{\delta-1}X_{j}(v^{(d)})$, $b_{1}=(k-1)(\delta-2)+1$, $b_{2}=\delta-1$, and $m=\delta-2$. Here, we assume that the total number of draws is $n-k$. Then, according to Corollary 2, we have

[TABLE]

where $Y$ is a Beta random variable with parameters $k-1+1/(\delta-2)$ and $(\delta-1)/(\delta-2)$ . Thus, we have

[TABLE]

Similarly, we have

[TABLE]

Due to (28) and (29), we have (19).
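The two urn settings used in this appendix can be cross-checked with exact rational arithmetic. The sketch below confirms that the stated $b_{1}$, $b_{2}$, and $m$ give the Beta parameters $b_{1}/m$ and $b_{2}/m$ quoted in the text. The function names are hypothetical labels for the $X_{1}$ and $X_{\delta}$ cases, and $\delta\geq 3$ is assumed so that $m=\delta-2>0$.

```python
from fractions import Fraction


def urn_for_x1(delta, k):
    # Settings stated for X_1(v^(d)): b1 = 1, b2 = (k-1)(delta-2) + delta - 1.
    return 1, (k - 1) * (delta - 2) + delta - 1, delta - 2


def urn_for_xdelta(delta, k):
    # Settings stated for X_delta(v^(d)): b1 = (k-1)(delta-2) + 1, b2 = delta - 1.
    return (k - 1) * (delta - 2) + 1, delta - 1, delta - 2


def beta_parameters(b1, b2, m):
    # Corollary 2: the limit is Beta with parameters b1/m and b2/m.
    return Fraction(b1, m), Fraction(b2, m)


if __name__ == "__main__":
    delta, k = 3, 4
    print(beta_parameters(*urn_for_x1(delta, k)))
    print(beta_parameters(*urn_for_xdelta(delta, k)))
```

In both settings the initial total is $b_{1}+b_{2}=k(\delta-2)+2$, which equals the number of edges leaving a connected set of $k$ nodes in a $\delta$-regular tree. This suggests reading each ball as one such edge, although that interpretation is an observation made here rather than a statement from the text.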

Appendix G

According to (16) and (17), it holds that

[TABLE]

and

[TABLE]

For $k\leq\lceil n/2\rceil$ , we set

[TABLE]

For $k\geq\lceil n/2\rceil+1$ , we set $f_{n}(k)=0$ . According to (18) and (19), we have for any $k\geq d+1$ ,

[TABLE]

and for any $k\leq d$ ,

[TABLE]

On the other hand, we set

[TABLE]

Obviously, for $k\geq\lceil n/2\rceil+1$ , it holds that $|f_{n}(k)|\leq g(k)$ . For $k\leq\lceil n/2\rceil$ , we have

[TABLE]

We also have

[TABLE]

Thus, according to Lemma 2, we have

[TABLE]

For $k\leq\lfloor n/2\rfloor+1$ , we set

[TABLE]

For $k\geq\lfloor n/2\rfloor+2$ , we set $h_{n}(k)=0$ . According to (18) and (19), we have for any $k\geq d+1$ ,

[TABLE]

and for any $k\leq d$ ,

[TABLE]

We also have $|h_{n}(k)|\leq g(k)$ .

Thus, according to Lemma 2, we have

[TABLE]

By noticing that $|\mathcal{V}^{(d)}|=\delta(\delta-1)^{d-1}$, we have (20) from (30) and (31).

Bibliography

[1] T. Matsuta and T. Uyematsu, “Probability distributions of the distance between the rumor source and its estimation on regular trees,” in Proc. 37th Symp. on Inf. Theory and its Apps., Dec. 2014, pp. 605–610.
[2] N. T. J. Bailey, The Mathematical Theory of Infectious Diseases and Its Applications. Charles Griffin & Company Ltd., 1975.
[3] D. Shah and T. Zaman, “Rumors in a network: Who’s the culprit?” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5163–5181, Aug. 2011.
[4] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Phys. Rev. Lett., vol. 86, pp. 3200–3203, Apr. 2001.
[5] R. M. May and A. L. Lloyd, “Infection dynamics on scale-free networks,” Phys. Rev. E, vol. 64, p. 066112, Nov. 2001.
[6] J. Khim and P. Loh, “Confidence sets for the source of a diffusion in regular trees,” IEEE Trans. on Netw. Sci. Eng., vol. 4, no. 1, pp. 27–40, Jan. 2017.
[7] D. Shah and T. Zaman, “Finding rumor sources on random trees,” Operations Research, vol. 64, no. 3, pp. 736–755, Feb. 2016.
[8] M. R. Murty and K. Sinha, “Multiple Hurwitz zeta functions,” Proc. Symp. in Pure Math., vol. 75, pp. 135–156, 2006.
