Growth of Common Friends in a Preferential Attachment Model
Bikramjit Das, Souvik Ghosh

TL;DR
This paper analyzes how the number of common friends grows in a preferential attachment model, revealing different growth regimes and providing estimates relevant for social network analysis.
Contribution
It derives the growth rate of common friends in a linear preferential attachment model and identifies phase transitions in their limiting behavior.
Findings
Growth rate of common friends varies with model parameters
Identifies power-law, logarithmic, and static growth regimes
Provides estimates for common friends in social networks
Abstract
The number of common friends (or connections) in a graph is a commonly used measure of proximity between two nodes. Such measures are used in link prediction algorithms and recommendation systems in large online social networks. We obtain the rate of growth of the number of common friends in a linear preferential attachment model. We apply our result to develop an estimate for the number of common friends. We also observe a phase transition in the limiting behavior of the number of common friends: depending on the range of the parameters of the model, the growth is either power-law, logarithmic, or static with the size of the graph.
Bikramjit Das, Singapore University of Technology and Design, 20 Dover Drive, Singapore 138682
Souvik Ghosh, LinkedIn Corporation, 700 E. Middlefield Road, Mountain View, CA 94043, USA
AMS subject classifications: 60F15, 60G42, 90B15, 91D30.
Keywords: heavy-tail, limit theorem, link prediction, preferential attachment, social network.
The authors gratefully acknowledge support from MOE Tier 2 grant MOE2017-T2-2-161.
1 Introduction
Network platforms like LinkedIn, Facebook, Instagram and Twitter form a big part of our culture. These networks have facilitated an increasing number of personal as well as professional interactions. The networking platforms strive to grow the network (graph) both in terms of the number of users (nodes) and the number of friendships or connections (edges), since a more densely connected user network typically results in a more engaged user base. The platforms often use recommendation systems like People You May Know (LinkedIn, Facebook) or Who to Follow (Twitter) (Gupta et al., 2013), which recommend that individual users connect with other users on the platform. Such recommendation systems look for signals that two individuals might know each other. For example, having a common friend is a signal that two users may know each other, and if they have many friends in common, there is a high chance that they do. A generalization of this problem, link prediction in a network, is well studied in the literature (Liben-Nowell and Kleinberg, 2007).
In this paper we establish the rate of growth of the number of common friends of a fixed pair of nodes in a linear preferential attachment model, a commonly used generative graph model. The preferential attachment model, made popular by Barabási and Albert (1999), is a very well studied class of graph models. Studies have covered the behavior of the degree sequence (Bollobás et al., 2001, Samorodnitsky et al., 2016, Resnick and Samorodnitsky, 2015), the maximal degrees in a graph, second-order degree sequences (the size of the network of friends of friends) (van der Hofstad, 2017, Section 8), generalizations to sublinear preferential attachment (Dereich and Mörters, 2009) and the limiting structure of networks (Elwes, 2016). Other generalizations and extensions of these scale-free models have been studied in Cooper and Frieze (2003) and Bollobás et al. (2003). See van der Hofstad (2017) for a nice overview and precise definitions of the models. To the best of our knowledge, this is the first theoretical study of the number of common friends in a graph.
Two important observations follow from our result:
- There is a phase transition in the asymptotic behavior of common friends. Depending on the parameter values of the preferential attachment model, the number of common friends can exhibit power-law or logarithmic growth, or be static, as the graph grows.
- A corollary of our result is that we can use sampling techniques to estimate the number of common friends in a large network. This is helpful because computing the number of common friends for every pair of nodes in a graph is computationally expensive, especially for large networks with hundreds of millions of nodes and hundreds of billions of edges.
This paper is organized as follows. In Section 2 we describe the linear preferential attachment model we work with and state the main result. In Section 3 we present simulations that provide intuition for our results. We provide the proof of the main result and some required supplementary results in Section 4. We conclude with directions for future work in Section 5.
2 Growth of Common Friends: Main Result
The model paradigm we work with is a version of the well-known undirected linear preferential attachment graph. The idea is that whenever a new node joins the network, it creates $m$ independent edges and attaches to previous nodes following a preferential attachment rule. The process is described as follows:
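At any time $n \ge 1$, the graph is denoted $G(n) = (V(n), E(n))$, where $V(n) = \{1, \dots, n\}$ is the set of nodes and $E(n)$ is the multiset of edges. Initially, the graph $G(1)$ has one node, named $1$, with $m$ self-loops. Then $G(n)$ evolves to $G(n+1)$ as follows: at the $(n+1)$-th stage, a new node named $n+1$ is added along with $m$ edges, each of which has $n+1$ as one of its vertices, and the other vertex is selected from $V(n)$ with probability proportional to the degree of the vertex (shifted by a parameter $\delta$) in $G(n)$. For $i \in V(n)$:
$$\mathbb{P}\big(\text{a given edge of } n+1 \text{ attaches to } i \,\big|\, G(n)\big) = \frac{D_i(n) + \delta}{(2m+\delta)\,n}.$$
Here $D_i(n)$ denotes the degree of $i$ in $G(n)$; the normalization follows from $\sum_{i=1}^{n} D_i(n) = 2mn$, since every stage adds $m$ edges and hence $2m$ to the total degree. The evolution of $D_i$ occurs as:
$$D_i(n+1) = D_i(n) + \Delta_i(n+1),$$
where $\Delta_i(n+1)$ is the number of stubs of $n+1$ (out of $m$) which attach to $i$. Moreover, at any stage $n$, for $1 \le i < j \le n$, call
$$N_{ij}(n) = \#\{k \in V(n) \setminus \{i, j\} : k \sim i \text{ and } k \sim j \text{ in } G(n)\}$$
the number of common friends of $i$ and $j$ in $G(n)$. We ignore multi-edges when counting $N_{ij}(n)$ in the graph $G(n)$, that is, $k$ counts as one common friend (vertex) between $i$ and $j$ in $G(n)$, for $k \le n$, if $k \sim i$ and $k \sim j$, regardless of their multiplicity. Our goal is to understand the behavior of $N_{ij}(n)$ for fixed $i < j$ as $n$ becomes large. Observe that in our model $N_{ij}$ grows via the recurrence relation
$$N_{ij}(n+1) = N_{ij}(n) + \mathbf{1}_{A_{n+1}}, \qquad n \ge j,$$
where $A_{n+1}$ is the event $\{\Delta_i(n+1) \ge 1,\ \Delta_j(n+1) \ge 1\}$. Also, note that the possible range of parameters is $m \ge 1$ and $\delta > -m$.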
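A minimal simulation sketch of this model may help fix ideas. It assumes the attachment rule in which each of the $m$ edges of node $n+1$ independently attaches to node $i$ with probability $(D_i(n)+\delta)/((2m+\delta)n)$, with all $m$ draws using the degrees of $G(n)$; the self-loops of node 1 count towards its degree but not as friendships. The function names `simulate_pa` and `common_friends` are hypothetical and not taken from the paper.

```python
import random


def simulate_pa(n_final, m, delta, seed=None):
    """Grow G(1), ..., G(n_final) by linear preferential attachment.

    Assumed rule: each of the m edges of node n + 1 independently picks an
    existing node i in {1, ..., n} with probability proportional to
    D_i(n) + delta, using the degrees of G(n).
    Returns (neighbours, degree); multi-edges are collapsed in neighbours.
    """
    if m < 1 or delta <= -m:
        raise ValueError("need m >= 1 and delta > -m")
    rng = random.Random(seed)
    # Index 0 is unused padding so that node labels are 1, ..., n.
    degree = [0, 2 * m]           # G(1): node 1 with m self-loops, degree 2m
    neighbours = [set(), set()]   # self-loops never create a friendship
    for n in range(1, n_final):
        nodes = range(1, n + 1)
        weights = [degree[i] + delta for i in nodes]
        # m independent draws (with replacement) based on the degrees of G(n)
        targets = rng.choices(nodes, weights=weights, k=m)
        new = n + 1
        degree.append(m)          # the new node contributes m edge ends
        neighbours.append(set())
        for t in targets:
            degree[t] += 1
            neighbours[t].add(new)
            neighbours[new].add(t)
    return neighbours, degree


def common_friends(neighbours, i, j):
    """Number of distinct nodes adjacent to both i and j."""
    return len(neighbours[i] & neighbours[j])


if __name__ == "__main__":
    nbrs, deg = simulate_pa(1000, m=3, delta=-2.0, seed=1)
    print("D_1 =", deg[1], " N_12 =", common_friends(nbrs, 1, 2))
```

Rebuilding the weight list at every step makes the cost quadratic in `n_final`, which is fine for small graphs such as those in Figure 1 but not for large-scale experiments.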
The power-law growth of the degree of a specific node in a linear preferential attachment model is well known (Bollobás et al., 2001; van der Hofstad, 2017, Section 8).
Proposition 2.1.
For any fixed node , we have
[TABLE]
where , and is a non-negative random variable with .
Proposition 2.1 can be derived using arguments from van der Hofstad (2017, Proposition 8.2), or using arguments similar to those in the proof of Proposition 4.4 given in Section 4.
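The normalization in Proposition 2.1 can be seen at the level of expectations. Assuming each of the $m$ edges of node $n+1$ attaches to $i$ with probability $(D_i(n)+\delta)/((2m+\delta)n)$, one gets $\mathbb{E}[D_i(n+1)+\delta \mid G(n)] = (D_i(n)+\delta)(1+\gamma/n)$ with $\gamma = m/(2m+\delta)$, so $\mathbb{E}[D_i(n)+\delta]$ is a product of such factors and, by Stirling's formula, grows like a constant times $n^{\gamma}$. The sketch below (hypothetical helper names; it checks the mean only, not the almost sure convergence stated in the proposition) computes the product directly and in closed form via `math.lgamma`.

```python
import math


def expected_shifted_degree(i, n, m, delta):
    """E[D_i(n) + delta], iterating the assumed one-step identity
    E[D_i(k+1) + delta | G(k)] = (D_i(k) + delta) * (1 + gamma / k)."""
    gamma = m / (2 * m + delta)
    value = (2 * m if i == 1 else m) + delta   # D_1(1) = 2m; D_i(i) = m for i >= 2
    for k in range(i, n):
        value *= 1 + gamma / k
    return value


def expected_shifted_degree_closed(i, n, m, delta):
    """Closed form of the same product:
    (D_i(i) + delta) * Gamma(n + gamma) Gamma(i) / (Gamma(n) Gamma(i + gamma))."""
    gamma = m / (2 * m + delta)
    start = (2 * m if i == 1 else m) + delta
    log_ratio = (math.lgamma(n + gamma) - math.lgamma(n)
                 + math.lgamma(i) - math.lgamma(i + gamma))
    return start * math.exp(log_ratio)


if __name__ == "__main__":
    m, delta, i = 2, -1.0, 3
    gamma = m / (2 * m + delta)
    for n in (10**2, 10**4, 10**6):
        # the ratio settles down to a constant as n grows
        print(n, expected_shifted_degree_closed(i, n, m, delta) / n ** gamma)
```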
Our main contribution is the following theorem, which provides the growth rate of the number of common friends of two nodes in such a model. The proof is given in Section 4.
Theorem 2.2.
Under the linear preferential attachment model, , with , for any two fixed nodes , we have
[TABLE]
where , , . Furthermore, and with ; is the limit of the scaled degree sequence of node as defined in Proposition 2.1.
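A heuristic calculation (ours, not the proof in Section 4) suggests where the three regimes come from. It assumes that each of the $m$ edges of node $n+1$ independently attaches to node $k$ with probability $p_k = (D_k(n)+\delta)/((2m+\delta)n)$, writes $A_{n+1}$ for the event that node $n+1$ attaches to both $i$ and $j$, and replaces $D_k(n)$ by $\zeta_k n^{\gamma}$ with $\gamma = m/(2m+\delta)$, where $\zeta_k$ (our notation) stands for the limit in Proposition 2.1:

$$
\begin{aligned}
\mathbb{P}(A_{n+1}\mid G(n)) &= 1-(1-p_i)^m-(1-p_j)^m+(1-p_i-p_j)^m = m(m-1)\,p_i p_j + O\big(p_i^2 p_j + p_i p_j^2\big),\\
p_i p_j &\approx \frac{\zeta_i\zeta_j}{(2m+\delta)^2}\, n^{2\gamma-2}, \qquad 2\gamma-1 = \frac{-\delta}{2m+\delta}.
\end{aligned}
$$

Summing over $n$, the expected number of new common friends up to time $n$ is of the order of $\sum_{k\le n} k^{2\gamma-2}$, which stays bounded when $\delta > 0$, grows like $\log n$ when $\delta = 0$, and grows like $n^{2\gamma-1}$ when $\delta < 0$. These match the static, logarithmic and power-law regimes; the exact limits and constants are those stated in Theorem 2.2.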
Remark 2.3.
An interesting feature of Theorem 2.2 is the presence of different regimes in the growth rate of the number of common friends, depending on the parameter $\delta$.
1. When $\delta > 0$, we are in a mildly preferential attachment regime. In this regime, nodes with low degree also get enough new friends, and as $\delta$ increases, more nodes have a similar chance of being selected. Although the degree of a fixed node grows as a power law, the number of common friends of two fixed nodes has finite expectation even in the limit.
2. For $\delta \le 0$, and especially for $\delta$ close to $-m$, new nodes prefer to befriend nodes with high degree. In this case the number of common friends tends to grow with the number of nodes: as a power law for $\delta < 0$ and at a logarithmic rate for $\delta = 0$.
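The classification in the remark can be written as a small helper, sketched under the assumption that the degree exponent is $\gamma = m/(2m+\delta)$ (so that the chance of a new common friend at step $n$ is of order $n^{2\gamma-2}$). Both function names are hypothetical; `mean_field_sum` is only a deterministic proxy for the growth of $\mathbb{E}\,N_{ij}(n)$, not a quantity from the paper.

```python
import math


def growth_regime(m, delta):
    """Classify the growth of N_ij(n) by the sign of delta.

    Assumes gamma = m / (2m + delta), so that the chance that node n + 1
    becomes a new common friend is of order n ** (2*gamma - 2).
    Returns (regime, 2*gamma - 1), where 2*gamma - 1 = -delta / (2m + delta).
    """
    if m < 2 or delta <= -m:
        # with m = 1 a new node cannot attach to both i and j
        raise ValueError("need m >= 2 and delta > -m")
    exponent = -delta / (2 * m + delta)
    if delta > 0:
        return "static", exponent       # negative exponent: summable increments
    if delta == 0:
        return "logarithmic", exponent  # exponent 0: harmonic-type sum
    return "power-law", exponent        # grows like n ** exponent


def mean_field_sum(n, m, delta):
    """sum_{k=1}^{n} k ** (2*gamma - 2), a heuristic proxy for E N_ij(n)."""
    gamma = m / (2 * m + delta)
    return math.fsum(k ** (2 * gamma - 2) for k in range(1, n + 1))


if __name__ == "__main__":
    for delta in (1.0, 0.0, -1.5):
        regime, e = growth_regime(3, delta)
        sums = [round(mean_field_sum(n, 3, delta), 2) for n in (10**3, 10**4, 10**5)]
        print(f"delta={delta:+.1f}: {regime:12s} 2*gamma-1={e:+.3f}", sums)
```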
Corollary 2.4.
Under the preferential attachment model, , with , for any two fixed nodes and , we have
[TABLE]
Proof.
The result is an easy application of the almost sure convergences of observed in Theorem 2.2. ∎
The above corollary states that we can consistently estimate the number of common friends for a given pair of nodes using an earlier state of the graph, i.e., for any
[TABLE]
For a large value of , the graph can be significantly smaller than , so estimating the number of common friends from it is significantly cheaper.
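One way to turn this into a computation, as a sketch only: assume (our reading of the three regimes of Theorem 2.2, whose displayed constants are not reproduced in this copy) that $N_{ij}(n)$ grows like a constant when $\delta > 0$, like $\log n$ when $\delta = 0$, and like $n^{-\delta/(2m+\delta)}$ when $\delta < 0$, with $m$ and $\delta$ known. A count observed in an earlier graph can then be rescaled to the current time. The helper `extrapolate_common_friends` is hypothetical and is not claimed to be the estimator of Corollary 2.4.

```python
import math


def extrapolate_common_friends(n_small, count_small, n_large, m, delta):
    """Hypothetical estimate of N_ij(n_large) from N_ij(n_small), observed in
    the earlier graph G(n_small), by rescaling with the assumed growth rate:
    constant (delta > 0), log n (delta = 0), n ** (-delta / (2m + delta)) (delta < 0).
    """
    if not 1 < n_small <= n_large:
        raise ValueError("need 1 < n_small <= n_large")
    if delta > 0:
        return float(count_small)                 # static regime
    if delta == 0:
        return count_small * math.log(n_large) / math.log(n_small)
    exponent = -delta / (2 * m + delta)           # assumed power-law exponent
    return count_small * (n_large / n_small) ** exponent


if __name__ == "__main__":
    # e.g. 5 common friends seen in G(1000), extrapolated to G(8000)
    for delta in (1.0, 0.0, -1.5):
        print(delta, extrapolate_common_friends(1000, 5, 8000, m=3, delta=delta))
```

The design choice is that the expensive count is only ever taken on the smaller graph; the rescaling itself is constant-time.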
3 Simulation Study
We illustrate the key idea behind Theorem 2.2 in the following simulated examples. Figure 1 shows instances of graphs simulated from the preferential attachment model with 20 nodes and ; the left one with and the right one with . When , we observe that the graph grows quite preferentially: new nodes tend to connect with the same few nodes, and hence the number of common friends of those nodes keeps growing fast. When , the graph is more spread out: new nodes tend to connect with different nodes, and hence the number of common friends does not grow as much.
To understand the asymptotic behavior of common friends, we simulate larger graphs and replicate the exercise multiple times. The left plot in Figure 2, the histogram of the number of common friends of two fixed nodes ( and ) for , replicated 2500 times, shows the heavy-tailed phenomenon. The right plot of Figure 2 shows trajectories of the number of common friends for 5 arbitrary simulations as the size of the network grows.
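Replications like these do not require storing the whole graph. Under the assumed attachment rule (each of the $m$ edges of node $n+1$ independently hits node $k$ with probability $(D_k(n)+\delta)/((2m+\delta)n)$), the law of the new edges depends on the graph only through the two tracked degrees and $n$, so the triple $(D_1(n), D_2(n), N_{12}(n))$ is itself a Markov chain. The sketch below uses nodes 1 and 2 as a hypothetical choice of the fixed pair (the pair used in Figure 2 is not recoverable from this copy); in $G(2)$ node 2 sends all $m$ edges to node 1, so $D_1(2) = 3m$, $D_2(2) = m$ and $N_{12}(2) = 0$.

```python
import random


def simulate_pair(n_final, m, delta, rng):
    """Simulate D_1(n), D_2(n) and the trajectory of N_12(n), n = 2..n_final.

    Assumed rule: each of the m edges of node n + 1 independently attaches to
    node k with probability (D_k(n) + delta) / ((2m + delta) n). Every node
    other than 1 and 2 is lumped into a single "rest" outcome.
    """
    d1, d2, common = 3 * m, m, 0   # G(2): node 2 sends all m edges to node 1
    trajectory = [common]          # trajectory[t] is N_12(t + 2)
    for n in range(2, n_final):
        scale = (2 * m + delta) * n
        p1, p2 = (d1 + delta) / scale, (d2 + delta) / scale
        hits1 = hits2 = 0
        for _ in range(m):         # m independent edge ends of node n + 1
            u = rng.random()
            if u < p1:
                hits1 += 1
            elif u < p1 + p2:
                hits2 += 1
        d1 += hits1
        d2 += hits2
        if hits1 and hits2:        # event A_{n+1}: node n + 1 is a new common friend
            common += 1
        trajectory.append(common)
    return d1, d2, trajectory


def replicate_final_counts(n_final, m, delta, reps, seed=0):
    """N_12(n_final) over independent replications (e.g. for a histogram)."""
    rng = random.Random(seed)
    return [simulate_pair(n_final, m, delta, rng)[2][-1] for _ in range(reps)]


if __name__ == "__main__":
    counts = replicate_final_counts(2000, m=3, delta=-1.5, reps=200, seed=42)
    print("mean:", sum(counts) / len(counts), "max:", max(counts))
```

Each step costs $O(m)$ rather than $O(n)$, which makes many replications feasible, although pure Python remains slow for thousands of large graphs.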
Figure 3 shows the behavior of common friends when . As expected, we see that common friends do not grow that fast in this case.
We also check the validity of Corollary 2.4 using simulations. Figure 4 provides simulation results for the estimator for a model. The first row shows the histogram of 500 simulations of for and . The second row shows the same for . We see the concentration of near 1 as increases.
4 Proof of Main Result
In this section, we prove Theorem 2.2 by studying the asymptotic behavior of , jointly, and of functions thereof. Recall that our preferential attachment graph sequence is . We assume that and for all the results in this section. For convenience, we use the following notation:
[TABLE]
From Proposition 2.1 we get
[TABLE]
where .
In the next few steps we apply the martingale convergence theorem to show almost sure convergence of the appropriately scaled sequences of random variables . We also prove uniform integrability of the sequences and , so that we additionally have convergence in and can compute the expectation of the limit.
Lemma 4.1.
For any and for a fixed , we have
[TABLE]
Proof.
We prove the result by induction on . We have and . Hence for with we get
[TABLE]
We also have Now
[TABLE]
using Stirling’s formula (Abramowitz and Stegun, 2012) given by
[TABLE]
Hence for any ,
[TABLE]
Thus the result holds for . As the induction hypothesis, suppose the result is true for , so that there are constants such that
[TABLE]
Denoting we get
[TABLE]
where and (appropriately chosen) are constants, and we denote , . Now using (4.4) recursively we get
[TABLE]
Using Stirling’s formula, we have for any ,
[TABLE]
Therefore,
[TABLE]
where . Hence dividing both sides by we get
[TABLE]
Hence the result holds for . ∎
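The proofs of Lemma 4.1 and Proposition 4.4 use Stirling's formula through its standard consequence $\Gamma(n+a)/\Gamma(n) \sim n^{a}$ as $n \to \infty$ for fixed $a$; more precisely, the ratio equals $n^{a}\big(1 + a(a-1)/(2n) + O(n^{-2})\big)$. A short numerical check, using only the standard-library `math.lgamma` (the helper name `gamma_ratio` is ours):

```python
import math


def gamma_ratio(n, a):
    """Gamma(n + a) / Gamma(n), evaluated on the log scale to avoid overflow."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n))


if __name__ == "__main__":
    for a in (0.25, 0.5, 2 / 3):
        for n in (10, 1_000, 100_000):
            # Stirling: ratio / n**a = 1 + a(a - 1) / (2n) + O(n**-2)
            print(f"a={a:.3f} n={n:>7}: {gamma_ratio(n, a) / n ** a:.8f}")
```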
Lemma 4.2.
For any , we have for a fixed ,
[TABLE]
Proof.
This follows from Lemma 4.1 and the Cauchy-Schwarz inequality. ∎
Remark 4.3.
Since both sequences and are bounded for some by Lemmas 4.1 and 4.2, they are also uniformly integrable; see (Durrett, 2019, Theorem 4.6.2).
Proposition 4.4 below describes the asymptotic behavior of the product of the degrees of two nodes, which, as expected, also has power-law growth.
Proposition 4.4.
For any we have
[TABLE]
where with , , . Here are as defined in Proposition 2.1.
Proof.
Note that
[TABLE]
and writing we have
[TABLE]
where . Moreover, for we have,
[TABLE]
Therefore,
[TABLE]
Define
[TABLE]
using Stirling’s formula. Hence by Lemma 4.2, is uniformly integrable. Moreover and for . Hence by Doob’s martingale convergence theorem (Durrett, 2019, Theorem 4.2.11 and Theorem 4.6.4)
[TABLE]
where and . Hence we have
[TABLE]
both almost surely and in with . From Proposition 2.1 and (4.1) we can check that , a.s.; hence a.s. ∎
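One way to see where a power-law normalization for the degree product comes from (our derivation under the assumed rule that each of the $m$ edges of node $n+1$ attaches to $k$ with probability $X_k(n)/((2m+\delta)n)$, where $X_k(n) = D_k(n)+\delta$; the constants of Proposition 4.4 itself are not recoverable from this copy): the multinomial moments $\mathbb{E}[\Delta_k] = m p_k$ and $\mathbb{E}[\Delta_i\Delta_j] = m(m-1)p_ip_j$ give $\mathbb{E}[X_i(n+1)X_j(n+1)\mid G(n)] = f(n)\,X_i(n)X_j(n)$ with $f(n) = 1 + 2\gamma/n + m(m-1)/((2m+\delta)^2 n^2)$. Dividing by $\prod_k f(k)$, which grows like a constant times $n^{2\gamma}$, yields a martingale. The sketch checks the one-step identity by exact enumeration; all names are hypothetical.

```python
import math


def one_step_factor(n, m, delta):
    """f(n) with E[X_i(n+1) X_j(n+1) | G(n)] = f(n) X_i(n) X_j(n),
    where X_k(n) = D_k(n) + delta (derived under the assumed rule)."""
    gamma = m / (2 * m + delta)
    c = m * (m - 1) / (2 * m + delta) ** 2
    return 1 + 2 * gamma / n + c / n ** 2


def exact_conditional_product(x_i, x_j, n, m, delta):
    """E[X_i(n+1) X_j(n+1) | G(n)] by summing over the multinomial law of
    (h_i, h_j, h_rest), the numbers of edges node n + 1 sends to i, j, others."""
    s = (2 * m + delta) * n
    p_i, p_j = x_i / s, x_j / s
    p_rest = 1 - p_i - p_j
    total = 0.0
    for h_i in range(m + 1):
        for h_j in range(m + 1 - h_i):
            h_rest = m - h_i - h_j
            coeff = math.factorial(m) // (
                math.factorial(h_i) * math.factorial(h_j) * math.factorial(h_rest))
            prob = coeff * p_i ** h_i * p_j ** h_j * p_rest ** h_rest
            total += prob * (x_i + h_i) * (x_j + h_j)
    return total


def normaliser(n, m, delta, start=2):
    """prod_{k=start}^{n-1} f(k); for i < j = start, X_i(n) X_j(n) divided by
    this product is a martingale in n >= start."""
    out = 1.0
    for k in range(start, n):
        out *= one_step_factor(k, m, delta)
    return out


if __name__ == "__main__":
    m, delta = 3, -1.5
    two_gamma = 2 * m / (2 * m + delta)
    for n in (10**3, 10**4, 10**5):
        print(n, normaliser(n, m, delta) / n ** two_gamma)   # approaches a constant
```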
Lemma 4.5.
For any , we have
[TABLE]
where is as defined in Proposition 4.4.
Proof.
Let all our random variables be defined on the probability space . From Proposition 4.4, for ,
[TABLE]
holds with probability 1. Fix such an . Then given any small , there exists such that for any ,
[TABLE]
(1) If , we have . Also for , we have . Hence
[TABLE]
Check that as , . Since using (4.7) we get
[TABLE]
Therefore we have,
[TABLE]
By Proposition 4.4, (4.6) holds almost surely implying (4.8) holds almost surely and hence Lemma 4.5(1) holds.
(2) For , we get which means . Note that for any , we have
[TABLE]
Now we can prove Lemma 4.5(2) in the same manner as we proved (1). ∎
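Lemma 4.5 converts the almost sure convergence in Proposition 4.4 into the growth of a weighted sum. The deterministic fact behind this kind of step (a Cesàro-type argument; we do not claim it is the exact statement of the lemma, whose displays are missing here) is: if $a_k/k^{2\gamma} \to \zeta$, then $\sum_{k\le n} a_k/k^2$ behaves like $\zeta\, n^{2\gamma-1}/(2\gamma-1)$ when $2\gamma > 1$ and like $\zeta \log n$ when $2\gamma = 1$. The sketch checks this numerically for two hypothetical sequences standing in for the degree product.

```python
import math


def weighted_sum_ratio(a, n, two_gamma):
    """(sum_{k=1}^{n} a(k) / k**2) divided by its predicted order:
    n**(2g - 1) / (2g - 1) when 2g > 1, or log n when 2g = 1."""
    total = math.fsum(a(k) / k ** 2 for k in range(1, n + 1))
    if two_gamma > 1:
        return total / (n ** (two_gamma - 1) / (two_gamma - 1))
    if two_gamma == 1:
        return total / math.log(n)
    raise ValueError("for 2*gamma < 1 the sum converges; no normalisation is needed")


def power_sequence(k):
    """Hypothetical a(k) with a(k) / k**(4/3) -> 2 (power-law case)."""
    return 2.0 * k ** (4 / 3) * (1 + k ** -0.5)


def linear_sequence(k):
    """Hypothetical a(k) with a(k) / k -> 2 (logarithmic case)."""
    return 2.0 * k


if __name__ == "__main__":
    for n in (10**3, 10**5):
        # both ratios approach 2; the logarithmic one only at rate 1 / log n
        print(n, weighted_sum_ratio(power_sequence, n, 4 / 3),
              weighted_sum_ratio(linear_sequence, n, 1))
```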
Lemma 4.6.
For any , we have
[TABLE]
where is as defined in Proposition 4.4.
Proof.
Define
[TABLE]
Note that for ,
[TABLE]
which holds a.s. using Propositions 2.1 and 4.4. Now we can proceed to prove the statements using the same arguments as in Lemma 4.5 by replacing with . ∎
With the aid of all the results above we are in a position to prove Theorem 2.2.
Proof of Theorem 2.2.
Using (2.3) recursively we have
[TABLE]
For any with ,
[TABLE]
We can check that,
[TABLE]
holding with equality for and
[TABLE]
Proof of part (1). First we prove the case when . Clearly as , where
[TABLE]
We want to show that a.s. Taking expectations in (4.10) and using (4.5) we get
[TABLE]
Applying this argument recursively we get
[TABLE]
where . Therefore for any we have
[TABLE]
since for . Since the right-hand side in (4.12) does not depend on , . Using the Borel–Cantelli lemma (Durrett, 2019, Theorem 2.3.1), this implies
[TABLE]
and hence a.s. and since we have
[TABLE]
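For part (1), a complementary first-moment sketch (ours, assuming each of the $m$ edges of node $n+1$ independently attaches to node $k$ with probability $(D_k(n)+\delta)/((2m+\delta)n)$) shows why only finitely many of the events $A_{n+1} = \{\text{node } n+1 \text{ attaches to both } i \text{ and } j\}$ occur when $\delta > 0$. Writing $\Delta_k(n+1)$ for the number of edges from $n+1$ to $k$, we have $\mathbf{1}_{A_{n+1}} \le \Delta_i(n+1)\Delta_j(n+1)$, and the multinomial moments give

$$
\begin{aligned}
\mathbb{P}(A_{n+1}\mid G(n)) &\le \mathbb{E}\big[\Delta_i(n+1)\,\Delta_j(n+1)\mid G(n)\big]
= \frac{m(m-1)\,(D_i(n)+\delta)(D_j(n)+\delta)}{(2m+\delta)^2\,n^2},\\
\sum_{n\ge j}\mathbb{P}(A_{n+1}) &\le \frac{m(m-1)}{(2m+\delta)^2}\sum_{n\ge j}\frac{\mathbb{E}\big[(D_i(n)+\delta)(D_j(n)+\delta)\big]}{n^2}
\le C\sum_{n\ge j} n^{2\gamma-2} < \infty,
\end{aligned}
$$

where the last step assumes a bound $\mathbb{E}[(D_i(n)+\delta)(D_j(n)+\delta)] \le C' n^{2\gamma}$ of the kind provided by Proposition 4.4, sets $C = m(m-1)C'/(2m+\delta)^2$, and uses $2\gamma-2 = -1-\delta/(2m+\delta) < -1$ for $\delta > 0$. The first Borel–Cantelli lemma then shows that $N_{ij}(n)$ is eventually constant.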
Proof of part (2). Here we address the case where . Define
[TABLE]
Using the conditional Borel–Cantelli lemma (Durrett, 2019, Theorem 4.4.5) we have
[TABLE]
Note that, using (4.10) and (4.11), we have for ,
[TABLE]
Using the above recursively we obtain
[TABLE]
Now, from Lemmas 4.5(1) and 4.6(1) we have
[TABLE]
Therefore we get
[TABLE]
Hence from (4.13) and (4.14) we have
[TABLE]
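Part (2) rests on the conditional Borel–Cantelli lemma: on the event that the compensator $S(n) = \sum_{k<n} \mathbb{P}(A_{k+1}\mid G(k))$ diverges, the number of events that occurred divided by $S(n)$ tends to 1 almost surely (our paraphrase of the step used above). The hypothetical sketch below tracks both quantities for nodes 1 and 2, assuming each of the $m$ edges of node $k+1$ independently attaches to node $l$ with probability $(D_l(k)+\delta)/((2m+\delta)k)$ and computing $\mathbb{P}(A_{k+1}\mid G(k))$ exactly by inclusion-exclusion.

```python
import random


def count_and_compensator(n_final, m, delta, seed=0):
    """Return (N_12(n_final), S(n_final)) for nodes 1 and 2, where
    S(n) = sum_{k=2}^{n-1} P(A_{k+1} | G(k)).

    Assumed rule: each of the m edges of node k + 1 independently attaches to
    node l with probability (D_l(k) + delta) / ((2m + delta) k).
    """
    rng = random.Random(seed)
    d1, d2 = 3 * m, m              # degrees in G(2)
    count, comp = 0, 0.0           # N_12(2) = 0 and empty compensator
    for k in range(2, n_final):
        scale = (2 * m + delta) * k
        p1, p2 = (d1 + delta) / scale, (d2 + delta) / scale
        p_rest = max(0.0, 1 - p1 - p2)
        # P(at least one edge to 1 and at least one edge to 2 | G(k))
        comp += 1 - (1 - p1) ** m - (1 - p2) ** m + p_rest ** m
        hits1 = hits2 = 0
        for _ in range(m):
            u = rng.random()
            if u < p1:
                hits1 += 1
            elif u < p1 + p2:
                hits2 += 1
        d1, d2 = d1 + hits1, d2 + hits2
        if hits1 and hits2:        # node k + 1 is a new common friend
            count += 1
    return count, comp


if __name__ == "__main__":
    for delta in (0.0, -1.5):
        count, comp = count_and_compensator(20_000, 3, delta, seed=11)
        print(f"delta={delta:+.1f}: N_12={count}, S={comp:.1f}, ratio={count / comp:.3f}")
```

For $\delta \le 0$ and large `n_final`, the printed ratio should typically be close to 1, although convergence is slow in the logarithmic case $\delta = 0$.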
Proof of part (3). The case where can be shown using the same technique as for by using Lemmas 4.5(2) and 4.6(2) in place of Lemmas 4.5(1) and 4.6(1). ∎
5 Conclusion
In this paper we establish the rate of growth of the number of common friends of two fixed nodes in a linear preferential attachment model. The growth rate is shown to be static, logarithmic or power-law depending on whether the parameter satisfies $\delta > 0$, $\delta = 0$ or $\delta < 0$, respectively. We use this result to prove consistency of an estimator of the number of common friends that is less expensive to compute. Such results are applicable both to link prediction problems for large dynamic networks and to detection methods for a preferential attachment model.
This is the first step in showing a more general result regarding the growth behavior for common friends of any randomly chosen pair of nodes and obtaining uniform convergence bounds for estimators of common friends. Further properties of such models and estimation issues are under current investigation.
6 Acknowledgement
The authors are very grateful to the referee for insightful comments and also for providing us with precise ideas to fill gaps in parts of the proof of Theorem 2.1.
- Abramowitz, M. and Stegun, I. A. (2012). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Courier Corporation.
- Barabási, A. and Albert, R. (1999). Emergence of scaling in random networks. Science 286, 509–512.
- Bollobás, B., Riordan, O., Spencer, J. and Tusnády, G. (2001). The degree sequence of a scale-free random graph process. Random Structures & Algorithms 18, 279–290.
- Bollobás, B., Borgs, C., Chayes, J. and Riordan, O. (2003). Directed scale-free graphs. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, 2003), 132–139.
- Cooper, C. and Frieze, A. (2003). A general model of web graphs. Random Structures & Algorithms 22, 311–335.
- Dereich, S. and Mörters, P. (2009). Random networks with sublinear preferential attachment: degree evolutions. Electronic Journal of Probability 14, no. 43, 1222–1267.
- Durrett, R. T. (2019). Probability: Theory and Examples, fifth ed. Cambridge Series in Statistical and Probabilistic Mathematics 49. Cambridge University Press, Cambridge.
- Elwes, R. (2016). A linear preferential attachment process approaching the Rado graph. arXiv:1603.08806v2.
