On the Distance Between the Rumor Source and Its Optimal Estimate in a Regular Tree
Tetsunao Matsuta, Tomohiko Uyematsu

TL;DR
This paper analyzes the accuracy of rumor source detection in regular trees, showing that the estimated source is typically within three edges of the true origin with high probability.
Contribution
It provides a probabilistic analysis of the distance between the true rumor source and its optimal estimate in regular tree networks.
Findings
The estimated rumor source is within distance 3 of the true source with high probability.
The probability distribution of the distance between the true source and the estimate is characterized.
The analysis applies specifically to regular, cycle-free tree networks.
Abstract
This paper addresses the rumor source identification problem, where the goal is to find the origin node of a rumor in a network among a given set of nodes with the rumor. In this paper, we focus on a network represented by a regular tree which does not have any cycle and in which all nodes have the same number of edges connected to a node. For this network, we clarify that, with quite high probability, the origin node is within the distance 3 from the node selected by the optimal estimator, where the distance is the number of edges of the unique path connecting two nodes. This is clarified by the probability distribution of the distance between the origin and the selected node.
An earlier version of this paper was presented at SITA2014 [1]. In this version, we improved the notation, added Corollary 1, revised the proofs, and corrected the bound of Theorem 3 as well as many other errors.
Department of Information and Communications Engineering, Tokyo Institute of Technology
I Introduction
In social networks, a rumor spreads like an infectious disease, and it can in fact be modeled as one [2, 3]. Most studies of rumors (or infectious diseases) analyze the mechanisms by which a rumor spreads in a given network [4, 5].
In contrast to such studies, we address the rumor source identification problem introduced by Shah and Zaman [3]. The goal of this problem is to find the origin node of a rumor (the rumor source) in a network, given the set of nodes that possess the rumor. If the rumor source can be detected, it becomes possible, for example, to find a weak node that spreads a computer virus or to rank websites for a search engine. For this problem, Shah and Zaman [3] introduced the optimal estimator and analyzed its correct detection probability for several types of networks. This probability asymptotically goes to one for a very special network called a geometric tree (see [3, Sec. IV.D]). However, they showed analytically or experimentally that the probability is asymptotically not high, or goes to zero, for many other networks such as regular trees, small-world networks, and scale-free networks. Here, a regular tree is a network that has no cycle and in which all nodes have the same degree, i.e., the same number of incident edges.
Although the optimal estimator may fail to find the rumor source, it does select a node near the rumor source. This fact is known experimentally (cf. [3, Sect. V.B] and [6, Sect. 8]) but, to the best of our knowledge, has not been established analytically. In this paper, we establish it analytically. In particular, we focus on regular trees and show that, with quite high probability, the rumor source is within distance 3 of the node selected by the optimal estimator, where the distance is the number of edges of the unique path connecting two nodes. This follows from the probability distribution of the distance between the rumor source and the selected node.
II Rumor Source Identification Problem
In this section, we introduce the rumor source identification problem and review some known results on it.
Let be an undirected and connected graph. Let denote the set of nodes and denote the set of edges of the graph . We denote the edge connecting two nodes by the set of nodes . In this paper, we consider the case where is a regular tree, that is, the graph does not have any cycle, and all nodes have the same degree $\delta \geq 3$. (Footnote: The line graph ($\delta = 2$) is not considered in this paper because this case is somewhat difficult to treat in a unified manner. However, the essential argument for this case is the same as for the case where $\delta \geq 3$.) We assume that the number of nodes is countably infinite in order to avoid boundary effects.
A rumor spreads in a given regular tree . Initially, only one node (the rumor source) possesses the rumor. A node possessing the rumor passes it to its adjacent nodes, and these nodes keep it forever. For , let be a real-valued random variable (RV) that represents the rumor spreading time from the node to the node after gets the rumor. In this model, the spreading times are independent and drawn according to the exponential distribution with unit mean. Thus, the cumulative distribution function of is $1 - e^{-t}$ if $t \geq 0$, and $0$ if $t < 0$. This spreading model is sometimes called the susceptible-infected (SI) model [3].
Suppose that we observe a network consisting of infected nodes in the graph at some time. Since the rumor spreads to adjacent nodes, this network is a connected subgraph of . We denote the RV of this network by and its realization by . We know only the observed network and do not know the realization of the spreading times on the edges. The goal of the rumor source identification problem is then to find the rumor source given .
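The spreading model described above can be simulated directly. The following is a minimal sketch, not the authors' code: the tuple labeling of nodes, the function names `neighbors` and `simulate_si`, and the parameter names are all assumptions made for illustration. Each edge receives an independent exponential spreading time with unit mean, and a priority queue returns the nodes in the order in which the rumor reaches them.

```python
import heapq
import random


def neighbors(node, degree):
    """Neighbors of `node` in an infinite regular tree.

    Nodes are tuples (a labeling chosen for this sketch): the source is (),
    and every other node is its parent's tuple plus one child index.
    """
    n_children = degree if node == () else degree - 1
    children = [node + (i,) for i in range(n_children)]
    return children if node == () else [node[:-1]] + children


def simulate_si(degree, n_infected, rng):
    """Return [(infection_time, node), ...] for the first n_infected nodes."""
    infected = {()}
    order = [(0.0, ())]
    # Heap of (time at which the rumor would arrive, node).
    arrivals = [(rng.expovariate(1.0), v) for v in neighbors((), degree)]
    heapq.heapify(arrivals)
    while len(order) < n_infected:
        t, v = heapq.heappop(arrivals)
        infected.add(v)
        order.append((t, v))
        for w in neighbors(v, degree):
            if w not in infected:
                # Independent Exp(1) spreading time on the edge {v, w}.
                heapq.heappush(arrivals, (t + rng.expovariate(1.0), w))
    return order


snapshot = simulate_si(degree=3, n_infected=10, rng=random.Random(0))
```

Because the graph is a tree, every uninfected node has exactly one possible infector, so each node enters the queue at most once. The observer in the identification problem would see only the set of nodes in `snapshot`, not the times.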
For this problem, the optimal estimator is the maximum likelihood (ML) estimator (cf. [3]) defined as
[TABLE]
where ties are broken uniformly at random and is the probability of observing under the SI model assuming that is the rumor source. For this optimal estimator, let be the correct detection probability when a graph of infected nodes is observed, i.e., . Shah and Zaman [7] characterized the asymptotic behavior of in the next theorem.
Theorem 1 ([7, Theorem 3.1])
For a regular tree with degree , it holds that
[TABLE]
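where $I_x(a, b)$ is the regularized incomplete beta function defined as $I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$ and $\Gamma(\cdot)$ is the Gamma function.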
According to this theorem, when $\delta = 3$, the limiting correct detection probability is $0.25$. Moreover, it rapidly converges to $1 - \ln 2 \approx 0.307$ as $\delta$ goes to infinity (cf. [7, Corollary 1 and Figure 3]). This means that, unfortunately, the correct detection probability is not very high for regular trees.
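The limit in Theorem 1 can be evaluated numerically. The sketch below assumes that the limit has the form reported in [7], namely $\delta\, I_{1/2}\big(\tfrac{1}{\delta-2}, \tfrac{\delta-1}{\delta-2}\big) - (\delta-1)$ for degree $\delta \geq 3$; this expression is recalled from [7] rather than recovered from the text above, and the function names are hypothetical. The integral is computed with composite Simpson's rule after a substitution that removes the endpoint singularity.

```python
import math


def incomplete_beta_half(degree, steps=4000):
    """I_{1/2}(a, b) with a = 1/(degree-2) and b = (degree-1)/(degree-2).

    Substituting t = u**(degree-2) turns t**(a-1) dt into (1/a) du, so the
    remaining integrand (1 - u**(degree-2))**(b-1) is smooth on [0, 0.5**a].
    """
    m = degree - 2
    a, b = 1.0 / m, (degree - 1.0) / m
    upper = 0.5 ** a

    def f(u):
        return (1.0 - u ** m) ** (b - 1.0)

    h = upper / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = (total * h / 3.0) / a
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) * integral


def detection_limit(degree):
    """Assumed form of the limiting correct detection probability (see [7])."""
    return degree * incomplete_beta_half(degree) - (degree - 1)


for degree in (3, 4, 5, 10, 100):
    print(degree, detection_limit(degree))
```

Under this assumed form, degree 3 gives exactly one quarter, and large degrees approach $1 - \ln 2$, in agreement with the values quoted from [7].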
III Main Results
In this section, we show that the ML estimator can select a node near the rumor source with high probability.
To this end, we clarify the probability distribution of the distance between the rumor source and the node selected by the ML estimator. We denote this probability by and define it as
[TABLE]
where and denotes the distance between nodes and in the graph . Note that .
When , we can give a closed-form expression for the asymptotic behavior of , as stated in the next theorem.
Theorem 2
Let . Then, for any , we have
[TABLE]
where
[TABLE]
We denote the rising factorial by . The next theorem gives tight upper and lower bounds of for more general degrees.
Theorem 3
For any , , and , we have
[TABLE]
where ,
[TABLE]
, and for any .
is a partial sum of the multiple Hurwitz zeta function (cf. e.g. [8]) or of the shifted multiple harmonic sums (cf. e.g. [9]). We note that the difference between the bounds (i.e., ) does not depend on the degree.
These theorems imply that the ML estimator can select a node near the rumor source with high probability. This is clear from the next corollary and its numerical results (Fig. 1).
Corollary 1
Let . Then, for any , we have
[TABLE]
More generally, for any , , and , we have
[TABLE]
Here, and denote the right-hand side of (1).
Proof:
By noticing that , the corollary is immediately obtained by Theorems 1-3. ∎
Since , Fig. 1 gives almost exact numerical results of . We note that numerical results for other degrees are almost the same (see Fig. 2). Thus, these results show that the rumor source is within distance 3 of the node selected by the ML estimator with quite high probability. We note that Khim and Loh [6, Corollary 2] gave another lower bound on . However, it is considerably looser than our bound and is zero at least when the values of the parameters and are within the range shown in Fig. 1 and Fig. 2.
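The claim that the estimate is usually close to the source can be checked empirically with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the procedure used for Fig. 1 or Fig. 2. It assumes that on a regular tree the ML estimator coincides with the rumor center (cf. [3]), and that a node is a rumor center exactly when every subtree hanging off it contains at most half of the infected nodes (cf. [10, Proposition 1]). The tuple labeling of nodes and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def grow_infection(degree, n_infected, rng):
    """Infected nodes of a regular tree, grown from the source ().

    Nodes are tuples of child indices (a labeling chosen for this sketch).
    Each step infects a node drawn uniformly from the current boundary.
    """
    infected = [()]
    boundary = [(i,) for i in range(degree)]
    while len(infected) < n_infected:
        k = rng.randrange(len(boundary))
        boundary[k], boundary[-1] = boundary[-1], boundary[k]
        v = boundary.pop()
        infected.append(v)
        boundary.extend(v + (i,) for i in range(degree - 1))
    return infected


def estimate_distance(infected, rng):
    """Distance from the source () to the rumor center of the infected tree."""
    n = len(infected)
    size = Counter()  # size[v]: number of infected nodes in the subtree below v
    children = defaultdict(list)
    for v in infected:
        for depth in range(len(v) + 1):
            size[v[:depth]] += 1
        if v:
            children[v[:-1]].append(v)
    center = ()
    while True:
        heavy = max(children[center], key=size.__getitem__, default=None)
        if heavy is None or 2 * size[heavy] < n:
            return len(center)  # unique rumor center
        if 2 * size[heavy] == n:
            return len(center) + rng.randrange(2)  # two centers, fair coin
        center = heavy  # the center lies deeper in the heavy subtree


def distance_distribution(degree, n_infected, trials, seed=0):
    rng = random.Random(seed)
    counts = Counter(
        estimate_distance(grow_infection(degree, n_infected, rng), rng)
        for _ in range(trials)
    )
    return {d: counts[d] / trials for d in sorted(counts)}


dist = distance_distribution(degree=3, n_infected=200, trials=1000)
within_3 = sum(p for d, p in dist.items() if d <= 3)
print(dist, within_3)
```

The empirical frequencies are only a sanity check for finite sizes; the theorems above concern the limit as the number of infected nodes grows.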
IV Proofs of Theorems
In this section, we prove our main theorems. We will denote -length sequences of RVs and their realizations by and , respectively. For the sake of brevity, we denote by and by .
For any node in a regular tree with degree , there are neighbors. Thus, there are subtrees rooted at these neighbors with the parent node . In other words, the regular tree is divided into these subtrees and the node . Let be the number of infected nodes in the th subtree among those subtrees (). When is not the rumor source, let th subtree contain the rumor source . Note that, if is an infected node, we have . The next lemma is a key lemma to prove our main theorems.
Lemma 1
For a node , let . Then, we have
[TABLE]
Since this lemma follows from [10, Proposition 1] (see also [10, Lemma 6]), we defer its proof to Appendix C.
We denote the set of nodes at distance $d$ from the rumor source by . Note that the number of elements of is $\delta(\delta-1)^{d-1}$ for $d \geq 1$. Then, can be represented as
[TABLE]
where the last equality comes from Lemma 1.
On the other hand, let be the sequence of RVs each representing th infected node, where with probability 1. Then, we have . This implies that the event is equal to the event . Hence, we have
[TABLE]
where . We also have
[TABLE]
Thus, we need to obtain closed-form expressions of and .
IV-A Closed-Form Expression of
Let be the set of neighboring nodes of in the graph . Suppose that the nodes in the set are infected with the rumor and that no other nodes are infected. We denote by the set of boundary nodes, i.e., the uninfected nodes that may be infected next by the infected nodes: . Let be the set of ordered nodes on possible paths of infection, i.e., , where . Since the spreading times are independent and memoryless, the next node to be infected is selected uniformly from the boundary nodes at each step. Hence, we have for any and ,
[TABLE]
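The uniform selection above can be made concrete by counting the boundary nodes. The following is a sketch in our own notation, which is an assumption rather than the paper's: $n$ is the number of currently infected nodes and $\delta$ is the degree of the regular tree. The $n$ infected nodes have $\delta n$ edge endpoints in total, and the $n-1$ edges inside the infected subtree use two of them each. Since the tree has no cycle, each remaining endpoint leads to a distinct boundary node, so

```latex
\underbrace{\delta n}_{\text{endpoints at infected nodes}}
\;-\; \underbrace{2(n-1)}_{\text{edges inside the infected subtree}}
\;=\; (\delta-2)\,n + 2
\quad \text{boundary nodes},
\qquad
\Pr[\text{a given boundary node is infected next}]
\;=\; \frac{1}{(\delta-2)\,n + 2}.
```

For example, with $\delta = 3$ and $n = 1$ there are $3$ boundary nodes, and every further infection adds $\delta - 2 = 1$ more.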
Let be the (shortest) path from the rumor source to . Then, for and , the th infected node is if and only if the following event occurs for some such that :
[TABLE]
where and . Hence, if and , we have
[TABLE]
where (a) comes from the chain rule of the probability, and (b) comes from Appendix A.
The remaining case is that and . In this case, we have
[TABLE]
where (a) comes from Appendix A. Thus, by recalling that $\zeta_{k-2}^{d-1}\left(\frac{1}{\delta-2}\right) = 1$ if and , (8) implies that (7) also holds in this case.
Consequently, (7) holds for any and .
IV-B Closed-Form Expression of
Suppose that the th infected node is . Since we consider a regular tree, has neighboring nodes . Let be the number of infected nodes of the subtree rooted at with the parent node after is infected. Let the subtree rooted at contain the rumor source. Thus, at the time that is infected, it holds that . From then on, the next node to be infected is selected uniformly from the boundary nodes at each step. We note that for all , and . Then, the numbers are drawn according to Pólya’s urn model with balls of colors (cf. [3] and [10]): Initially, balls of color are in the urn, where if and if . At each step, a single ball is drawn uniformly from the urn. Then, the drawn ball is returned with additional balls of the same color. This drawing process is repeated.
corresponds to the number of times that the balls of color are drawn. According to [11, Chap. 4], when the total number of draws is , the joint distribution of is given by
[TABLE]
where and . We note that the above probability only depends on , and .
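The urn process described above is easy to simulate. The sketch below is illustrative only: the function name `polya_urn_draw_counts`, its parameters, and the example values are assumptions. In the example, each ball stands for one boundary node of a subtree, and `added=1` reflects our reading that one infection replaces a boundary node by degree minus one new ones, a net gain of degree minus two, here for degree 3.

```python
import random


def polya_urn_draw_counts(initial, added, draws, rng):
    """Simulate a Polya urn.

    initial: number of balls of each color at the start
    added:   extra balls of the drawn color returned after each draw
    Returns (times each color was drawn, final ball counts).
    """
    balls = list(initial)
    counts = [0] * len(initial)
    for _ in range(draws):
        # Draw one ball uniformly: a color is chosen proportionally to its count.
        color = rng.choices(range(len(balls)), weights=balls)[0]
        counts[color] += 1
        balls[color] += added
    return counts, balls


rng = random.Random(0)
counts, balls = polya_urn_draw_counts(initial=[1, 1, 1], added=1, draws=100, rng=rng)
print(counts, balls)
```

The draw counts always sum to the number of draws, and the total number of balls grows by `added` per draw, which mirrors the growth of the boundary in the infection process.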
Now, by definition, we have
[TABLE]
IV-C Proof of Theorem 2
When , according to (7), (9) and (10), we have
[TABLE]
for any and .
When is odd, we have . Thus, we only consider the first term of (3). According to (7), (9) and (10), (4) can be represented as
[TABLE]
where the equality follows since
[TABLE]
Thus, we have
[TABLE]
In a similar way, we have for even as follows:
[TABLE]
This is because
[TABLE]
where (a) follows since
[TABLE]
and
[TABLE]
Since for any and (see Appendix D), we have for any and ,
[TABLE]
where . Note that this holds even if and . Since it holds [12, 13] that
[TABLE]
for any and , we have for any and ,
[TABLE]
where is the unsigned Stirling numbers of the first kind [14] and is the signed Stirling numbers of the first kind [14] defined as . Thus, we have for odd ,
[TABLE]
and for even ,
[TABLE]
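The rising factorial and the Stirling numbers of the first kind used in this part of the proof are linked by a standard identity: the rising factorial $x(x+1)\cdots(x+n-1)$ expands into powers of $x$ with the unsigned Stirling numbers as coefficients, and the signed numbers play the same role for the falling factorial. The sketch below checks both expansions numerically; the function names are hypothetical, and the sign convention $s(n,k) = (-1)^{n-k} c(n,k)$ is the usual one, assumed here because the definition in the text is not recoverable.

```python
from math import prod


def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, c(n, 0..n)."""
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        # Recurrence: c(m+1, k) = m * c(m, k) + c(m, k-1)
        row = [m * (row[k] if k <= m else 0) + (row[k - 1] if k >= 1 else 0)
               for k in range(m + 2)]
    return row


def signed_stirling_first(n):
    """Row n of the signed numbers, s(n, k) = (-1)**(n - k) * c(n, k)."""
    return [(-1) ** (n - k) * c for k, c in enumerate(unsigned_stirling_first(n))]


def rising_factorial(x, n):
    return prod(x + i for i in range(n))


def falling_factorial(x, n):
    return prod(x - i for i in range(n))


n, x = 4, 3
print(unsigned_stirling_first(n))
print(rising_factorial(x, n),
      sum(c * x ** k for k, c in enumerate(unsigned_stirling_first(n))))
```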
Now, Lebesgue’s dominated convergence theorem, together with the facts that
[TABLE]
and
[TABLE]
implies (see the precise derivation in Appendix E)
[TABLE]
Thus, we can evaluate the probability as follows:
[TABLE]
where (a) comes from Appendix B, and (b) follows since if and if . This completes the proof of Theorem 2.
IV-D Proof of Theorem 3
In this section, we denote by and by .
Let . Due to (3), we have
[TABLE]
where (a) comes from the fact that
[TABLE]
and (b) comes from the symmetric property of for all . Similarly, by letting , we have
[TABLE]
where (a) comes from the fact that events , , , are disjoint.
In the same way as in [10, Chapter III.B] (see also [7, Section 4.1.5]), we have (see the precise derivation in Appendix F)
[TABLE]
According to these equalities, (16), (17), and the dominated convergence theorem, we have (see a precise derivation in Appendix G)
[TABLE]
where is a partial sum of (20), and the inequality comes from the fact that (according to (17), (18), and (19))
[TABLE]
On the other hand, we have
[TABLE]
where (a) comes from the fact that
[TABLE]
and (b) comes from a slightly improved version of the inequality in [7, Sect. 4.5]. Thus, for any , we have
[TABLE]
Since , we have
[TABLE]
This completes the proof of Theorem 3.
Appendix A
We have
[TABLE]
where we use the convention that if ,
[TABLE]
On the other hand, we have
[TABLE]
where , and (a) comes from (5). Similarly, we have
[TABLE]
where
[TABLE]
By substituting (22) and (23) into (21), we have (6).
Appendix B
Let be a double series defined as
[TABLE]
where we assume that . First of all, we show that is absolutely convergent.
If we assume that , we have
[TABLE]
where denotes the generalized binomial coefficient defined as for any real number ,
[TABLE]
(a) follows since
[TABLE]
and (b) comes from the Maclaurin series for which is convergent if . Since for any and such that , the above iterated series is convergent. According to [15, Proposition 212], if and , the double series is also convergent, i.e.,
[TABLE]
Since for any ,
[TABLE]
we also have, according to [15, Corollary 210],
[TABLE]
Now, for any and such that , we have
[TABLE]
This means that is absolutely convergent.
We note that, according to this fact and [15, Proposition 213], iterated series are equivalent for any and such that , i.e.,
[TABLE]
Let
[TABLE]
Since
[TABLE]
we need a closed-form expression of for . To this end, we evaluate the following series:
[TABLE]
where (a) comes from (24), (b) follows since , (c) comes from the fact that if , (d) comes from Maclaurin series with respect to which are convergent if , and (e) comes from Maclaurin series with respect to which are convergent if .
Thus, for any such that and , we have
[TABLE]
Since two power series are convergent in a neighborhood of [math], all coefficients are equal (see [16, Corollary 3.8]). This means that
[TABLE]
where . Thus, we have
[TABLE]
Especially, when , we have
[TABLE]
Appendix C
In this appendix, we prove Lemma 1.
First of all, we introduce some notation. Let be the rumor centrality [3] of a node in , be the subtree of rooted at the node with the ancestor node , and be the number of nodes in . Here, we assume that and if . We note that the ML estimator becomes (see [3, Section II-C])
[TABLE]
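Rumor centrality can be computed directly on a small tree. The sketch below assumes the formula from [3], in which the rumor centrality of a node equals $n!$ divided by the product of the sizes of all subtrees when the tree is rooted at that node; the adjacency-dictionary input and the function name are hypothetical choices for illustration.

```python
from math import factorial


def rumor_centrality(adj, root):
    """n! divided by the product of subtree sizes of the tree rooted at `root`.

    adj: dict mapping each node of a finite tree to a list of its neighbors.
    """
    n = len(adj)
    order, parent = [root], {root: None}
    for v in order:  # breadth-first traversal; `order` grows while we iterate
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in adj}
    product = 1
    for v in reversed(order):  # children are processed before their parents
        product *= size[v]
        if parent[v] is not None:
            size[parent[v]] += size[v]
    return factorial(n) // product


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = {v: rumor_centrality(path, v) for v in path}
estimate = max(scores, key=scores.get)
print(scores, estimate)
```

On the three-node path, the middle node has the largest rumor centrality, since the rumor can reach both ends in either order from there, while from an end node only one infection order is possible.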
Consider a sub-neighborhood , where is the set of neighboring nodes of in the graph . For , if for all , then is called the local rumor center w.r.t. . For the local rumor center, we know the following properties (see [10, Proposition 1]):
- For a node , it holds that for all the node is a local rumor center w.r.t. .
- A node is a local rumor center w.r.t. it holds that
[TABLE]
- A node is a local rumor center w.r.t. there exists at most a node such that
[TABLE]
where the equality holds if and only if
[TABLE]
According to these properties, for a node , if it holds that for all , the node is a (local) rumor center w.r.t. . Then, there exists at most a node such that
[TABLE]
and
[TABLE]
where the equality holds if and only if
[TABLE]
Hence, for a node , if , i.e., , we have
[TABLE]
Thus, the ML estimator outputs , and hence
[TABLE]
For a node , if , i.e., there exists a node such that and for any other , we have
[TABLE]
and
[TABLE]
Thus, the ML estimator outputs with probability , and hence
[TABLE]
For a node , if , i.e., , the node is not a local rumor center w.r.t. . Hence there exists such that
[TABLE]
Then, the ML estimator does not output , and hence
[TABLE]
This completes the proof.
Appendix D
We note that
[TABLE]
and
[TABLE]
Thus, for any and , we have
[TABLE]
where .
Appendix E
In order to show the equation (15), we use the next lemma (cf. e.g. [17]).
Lemma 2 (Dominated convergence theorem)
Let be a sequence of real-valued functions on positive integers such that
[TABLE]
Suppose that there is such that
[TABLE]
Then, we have
[TABLE]
We note that
[TABLE]
where (a) follows since for any , and (b) comes from the fact that if is the th infected node (), it must hold that . We also note that
[TABLE]
where (a) comes from the fact that if is the th infected node (), it must hold that . Thus, we have
[TABLE]
By noticing that does not depend on (see (7)), we can set
[TABLE]
Then, we have
[TABLE]
We also have
[TABLE]
On the other hand, according to (13) and (14), we have for any and odd ,
[TABLE]
and for any and even ,
[TABLE]
By noticing that
[TABLE]
and
[TABLE]
we have for any ,
[TABLE]
We note that for any ,
[TABLE]
Thus, according to Lemma 2, we have
[TABLE]
By noticing that , we have (15) from (25).
Appendix F
We consider Pólya’s urn model with balls of colors: Initially, balls of color are in the urn. At each step, a single ball is drawn uniformly from the urn. Then, the drawn ball is returned with additional balls of the same color. This drawing process is repeated times. Let denote the number of balls of color in the urn at the end of time . Let denote the number of times that balls of color are drawn after draws.
According to [7, Theorem 4.1], we have the next theorem.
Theorem 4
[TABLE]
where is the total number of balls in the urn at the end of time , and is a Beta random variable with parameters and . That is for ,
[TABLE]
We immediately have the next corollary.
Corollary 2
[TABLE]
where is the same Beta random variable as that of Theorem 4.
Proof:
can be written as
[TABLE]
Thus, we have
[TABLE]
Since and as , we have
[TABLE]
where almost sure convergence comes from Theorem 4. This completes the proof. ∎
After is infected th, can be regarded as Pólya’s urn model with the following settings: , , , , and . Here, we assume that the total number of draws is . Then, according to Corollary 2, we have
[TABLE]
where is a Beta random variable with parameters and . Thus, we have
[TABLE]
Similarly, we have
[TABLE]
Due to (26) and (27), we have (18).
On the other hand, after is infected th, can be regarded as Pólya’s urn model with the following settings: , , , , and . Here, we assume that the total number of draws is . Then, according to Corollary 2, we have
[TABLE]
where is a Beta random variable with parameters and . Thus, we have
[TABLE]
Similarly, we have
[TABLE]
Due to (28) and (29), we have (19).
Appendix G
According to (16) and (17), it holds that
[TABLE]
and
[TABLE]
For , we set
[TABLE]
For , we set . According to (18) and (19), we have for any ,
[TABLE]
and for any ,
[TABLE]
On the other hand, we set
[TABLE]
Obviously, for , it holds that . For , we have
[TABLE]
We also have
[TABLE]
Thus, according to Lemma 2, we have
[TABLE]
For , we set
[TABLE]
For , we set . According to (18) and (19), we have for any ,
[TABLE]
and for any ,
[TABLE]
We also have .
Thus, according to Lemma 2, we have
[TABLE]
References
- [1] T. Matsuta and T. Uyematsu, “Probability distributions of the distance between the rumor source and its estimation on regular trees,” in Proc. 37th Symp. on Inf. Theory and its Apps., Dec. 2014, pp. 605–610.
- [2] N. T. J. Bailey, The Mathematical Theory of Infectious Diseases and Its Applications. Charles Griffin & Company Ltd., 1975.
- [3] D. Shah and T. Zaman, “Rumors in a network: Who’s the culprit?” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5163–5181, Aug. 2011.
- [4] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Phys. Rev. Lett., vol. 86, pp. 3200–3203, Apr. 2001.
- [5] R. M. May and A. L. Lloyd, “Infection dynamics on scale-free networks,” Phys. Rev. E, vol. 64, p. 066112, Nov. 2001.
- [6] J. Khim and P. Loh, “Confidence sets for the source of a diffusion in regular trees,” IEEE Trans. on Netw. Sci. Eng., vol. 4, no. 1, pp. 27–40, Jan. 2017.
- [7] D. Shah and T. Zaman, “Finding rumor sources on random trees,” Operations Research, vol. 64, no. 3, pp. 736–755, Feb. 2016.
- [8] M. R. Murty and K. Sinha, “Multiple Hurwitz zeta functions,” Proc. Symp. in Pure Math., vol. 75, pp. 135–156, 2006.
