On the dense Preferential Attachment Graph models and their graphon induced counterpart
Ágnes Backhausz, Dávid Kunszenti-Kovács

TL;DR
This paper compares the dense Preferential Attachment Graph (PAG) model with its graphon-based W-random graph counterpart, providing bounds on their expected distance and insights into their convergence behavior.
Contribution
It introduces a coupling method to bound the expected jumble norm distance between PAG and W-random graphs, advancing understanding of their relationship.
Findings
Expected jumble norm distance bounded by O(log^2 n * n^{-1/3})
Universal lower bound established independent of coupling
Analysis enhances understanding of PAG convergence to graphons
Abstract
Letting $\mathcal{M}$ denote the space of finite measures on $\mathbb{N}$, and $\mu_\lambda\in\mathcal{M}$ denote the Poisson distribution with parameter $\lambda$, the function $W\colon[0,1]^2\to\mathcal{M}$ given by \[ W(x,y)=\mu_{c\log x\log y} \] is called the PAG graphon with density $c$. It is known that this is the limit, in the multigraph homomorphism sense, of the dense Preferential Attachment Graph (PAG) model with edge density $c$. This graphon can then in turn be used to generate the so-called W-random graphs in a natural way. The aim of this paper is to compare the dense PAG model with the W-random graph model obtained from the corresponding graphon. Motivated by the multigraph limit theory, we investigate the expected jumble norm distance of the two models in terms of the number of vertices $n$. We present a coupling for which the expectation can be bounded from above by…
Ágnes Backhausz
Eötvös Loránd University and MTA Alfréd Rényi Institute of Mathematics
Pázmány Péter sétány 1/c, H-1117, Budapest, Hungary
and
Dávid Kunszenti-Kovács
MTA Alfréd Rényi Institute of Mathematics
P.O. Box 127, H-1364 Budapest, Hungary
Abstract.
Letting $\mathcal{M}$ denote the space of finite measures on $\mathbb{N}$, and $\mu_\lambda\in\mathcal{M}$ denote the Poisson distribution with parameter $\lambda$, the function $W\colon[0,1]^2\to\mathcal{M}$ given by
\[ W(x,y)=\mu_{c\log x\log y} \]
is called the PAG graphon with density $c$. It is known that this is the limit, in the multigraph homomorphism sense, of the dense Preferential Attachment Graph (PAG) model with edge density $c$. This graphon can then in turn be used to generate the so-called W-random graphs in a natural way.
The aim of this paper is to compare the dense PAG model with the W-random graph model obtained from the corresponding graphon. Motivated by the multigraph limit theory, we investigate the expected jumble norm distance of the two models in terms of the number of vertices $n$. We present a coupling for which the expectation can be bounded from above by $O(\log^2 n\cdot n^{-1/3})$, and provide a universal lower bound that is coupling independent, but with a worse exponent.
Key words and phrases:
dense graph limits, Pólya urn processes, cut norm, jumble norm
2010 Mathematics Subject Classification:
Primary: 05C80
1. Introduction
Preferential attachment graphs (PAGs) form a group of random growing graph models that have been studied for a long time [2, 5, 8]. The main motivation is modelling randomly evolving large real-world networks, like online and offline social networks, the internet, or biological networks (e.g. protein-protein interactions). The basic PAG models have been extended by various features, for example duplication steps, weighted edges, vertices with random fitness. The study of this wide family of models provided information about several phenomena in real-world networks (asymptotic degree distribution, clustering, relation of local and global properties, epidemic spread). The limiting behaviour of PAG models has also been investigated from various points of view, depending somewhat on the edge density along the graph sequences. For instance, in [3], N. Berger, C. Borgs, J. T. Chayes and A. Saberi consider a sparse version of the process, with a linear number of edges compared to the number of vertices, and prove convergence in the sense of Benjamini–Schramm to a Pólya point graph. A variation with added randomness is considered by R. Elwes in [6, 7], where the preferential attachment model is amended in such a way that the number of edges added at each stage itself is a random variable, but in expectation still preserves a linear growth. The limit here is the infinite Rado graph, or a multigraph variant of the same, depending on whether multiple edges are allowed during the process.
At the dense end of the spectrum, C. Borgs, J. Chayes, L. Lovász, V. Sós and K. Vesztergombi considered in [4] the case when the edge density along the sequence is essentially constant (i.e. the number of edges grows quadratically in the number of vertices), under the convergence notion of injective graph densities. They showed that with probability 1 the graph sequence converges to the graphon given by $W(x,y)=c\log x\log y$. Later, B. Ráth and L. Szakács considered in [13] convergence of a more general family of processes with respect to induced graph densities, showing that the limit object is a graphon that now takes Poisson distributions as values instead.
If instead of considering induced densities, we look for homomorphism densities, the limit object can be seen to be in some sense a combination of the two previously mentioned ones: we obtain a graphon $W$ with $W(x,y)$ being a Poisson distribution with parameter $c\log x\log y$ (i.e., the injective density limit is the first moment of the homomorphism density limit). Hence the corresponding graphs contain multiple edges, and the original notions for limits of simple graphs cannot be used any more. The paper [10] by D. Kunszenti-Kovács, L. Lovász and B. Szegedy provides a framework for handling homomorphism densities in the context of multigraphs, and makes use of the so-called jumble norm to measure distance between graphons.
All of the papers [4, 13, 10] also deal with $W$-random graph sequences induced by the limit objects $W$, and show that with probability 1, the resulting graph sequence converges to $W$ in the respective densities sense. These $W$-random graph models are thus very similar to the classical graph sequences that gave rise to the limit $W$, but also exhibit some significant differences.
Our goal in this paper is to compare the dense preferential attachment graph model to its $W$-random counterpart, showing that with probability 1 they are close (but not too close) in the jumble distance. The idea of the proof of the main result is to define a family of random graph models (see Section 3), which connects the $W$-random graph and the PAG model, and which can be coupled (see Section 4) so that the pairwise jumble-norm distances are easier to bound. In the discussion part (Section 6), we point out some features of the $W$-random version that can make it more useful in certain applications.
2. Terminology and main result
We shall start by defining the distance notion between multigraphs that we intend to use in this paper. It may be defined more generally for graphons (which essentially are weighted graphs with vertex set $[0,1]$), but that shall not be needed here, and we refer to [10] for more details.
Definition 1.
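Let $G$ and $H$ be two (multi-)graphs on the same vertex set $[n]=\{1,\ldots,n\}$ for some positive integer $n$. Then we define their jumble norm distance as
\[ d_{\boxtimes}(G,H)=\max_{\emptyset\neq S,T\subseteq[n]}\frac{1}{n\sqrt{|S|\,|T|}}\Big|\sum_{i\in S,\,j\in T}\big(U_{ij}-V_{ij}\big)\Big|, \]
where $U_{ij}$ and $V_{ij}$ denote the multiplicity of edge $ij$ in $G$ and $H$, respectively.
The cut norm distance $d_{\square}$ used in many other papers (see e.g. [4] for details) differs from this in the factor $n/\sqrt{|S|\,|T|}$ that is omitted there. As such, our current distance notion magnifies the differences that occur on small sets, and we clearly have $d_{\square}\leq d_{\boxtimes}$. Also the jumble norm distance can be considered as an $L^2$-version of the cut norm distance, since $\sqrt{|S|\,|T|}$ corresponds to the $L^2$ norm of the characteristic function of the set $S\times T$.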
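For very small graphs the distance of Definition 1 can be evaluated by exhaustive search over pairs of vertex sets. The Python sketch below is only an illustration: it assumes the normalisation $1/(n\sqrt{|S||T|})$ for the jumble distance and $1/n^2$ for the cut distance, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def nonempty_subsets(n):
    """Yield every non-empty subset of {0, ..., n-1} as a tuple."""
    for size in range(1, n + 1):
        yield from combinations(range(n), size)


def jumble_and_cut_distance(U, V):
    """Brute-force jumble and cut distances of two multigraphs.

    U and V are symmetric n x n edge-multiplicity matrices.
    Assumed normalisation (an assumption of this sketch):
      jumble = max |sum_{i in S, j in T} (U - V)| / (n * sqrt(|S||T|))
      cut    = max |sum_{i in S, j in T} (U - V)| / n**2
    """
    n = len(U)
    subsets = list(nonempty_subsets(n))
    jumble = cut = 0.0
    for S in subsets:
        for T in subsets:
            total = abs(sum(U[i][j] - V[i][j] for i in S for j in T))
            jumble = max(jumble, total / (n * sqrt(len(S) * len(T))))
            cut = max(cut, total / n**2)
    return jumble, cut


# Two multigraphs on 3 vertices that differ only in a double edge {0, 1}.
U = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
V = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
jumble, cut = jumble_and_cut_distance(U, V)
```

The search is exponential in $n$, so it is useful only as a check on toy examples; in this example the jumble value exceeds the cut value, in line with the comparison of the two distances made above.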
Next, fix a positive parameter $c$. Let $\mathcal{M}$ denote the space of finite measures on $\mathbb{N}$, and $W\colon[0,1]^2\to\mathcal{M}$ be the function given by
\[ W(x,y)=\mu_{c\log x\log y}, \]
where $\mu_\lambda$ denotes the Poisson distribution with parameter $\lambda$. We want to define the notion of $W$-random (multi-)graphs. The essence of the two-step randomization is as follows. We consider the set $[0,1]$ as the vertex set of the infinite graph with “adjacency function” $W$, and sample a random spanned subgraph on $n$ vertices by choosing its vertices independently uniformly from $[0,1]$. After this first randomization, we obtain a “graph” on $n$ vertices where each “edge” is a Poisson distribution. To obtain a true multigraph, we then independently sample an edge multiplicity for each pair of vertices from the corresponding Poisson distribution. Allowing loops gives one version of the random graph and disallowing them gives another; both are made precise in Definition 2.
Definition 2.
We choose independent exponential random variables with parameter for every . For , let be a Poisson random variable with parameter . For every , let be a Poisson random variable with parameter . Assume that all s are conditionally independent with respect to the s. We put edges between vertices and for every . This yields a random multigraph .
If, compared to , we erase the loops, we obtain the random multigraph .
Remark.
Note that using exponential variables instead of the uniform valued ones is compensated by the loss of the in the parameter.
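The two-step randomization can be sketched in a few lines of Python. The sketch is hedged: it assumes that the vertex weights are i.i.d. exponential variables with parameter 1 (which is what $-\log$ of a uniform variable gives), that a pair $i<j$ receives a Poisson number of edges with parameter $c$ times the product of the two weights, and that loops use half of the corresponding parameter. The names `sample_poisson` and `sample_w_random_multigraph` are hypothetical.

```python
import random


def sample_poisson(lam, rng):
    """Sample Poisson(lam) by counting rate-1 arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_w_random_multigraph(n, c, rng, loops=True):
    """Two-step sampling of a W-random multigraph on n vertices.

    Assumptions of this sketch (not quoted from the text):
      * vertex weights xi_i are i.i.d. Exp(1), i.e. xi_i = -log(U_i);
      * the pair {i, j}, i < j, gets Poisson(c * xi_i * xi_j) edges;
      * a loop at i gets Poisson(c * xi_i**2 / 2) edges.
    """
    xi = [rng.expovariate(1.0) for _ in range(n)]
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            Y[i][j] = Y[j][i] = sample_poisson(c * xi[i] * xi[j], rng)
        if loops:
            Y[i][i] = sample_poisson(c * xi[i] ** 2 / 2, rng)
    return xi, Y


rng = random.Random(0)
xi, Y = sample_w_random_multigraph(n=6, c=0.5, rng=rng, loops=False)
```

Passing `loops=False` corresponds to erasing the loops; the rest of the sampling is unchanged.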
These are the random models we wish to compare to the below version of the PAG model.
Definition 3.
We assign an urn to each vertex, initially with one single ball in each of them. Then we run a Pólya urn process for $\lfloor cn^{2}\rfloor$ steps. That is, at each step $t=1,2,\ldots,\lfloor cn^{2}\rfloor$, we choose an urn, with probabilities proportional to the number of balls inside the urn, and put a new ball into it (each random choice is conditionally independent from the previous steps, given the actual distribution of the balls). Finally, for $k=1,2,\ldots,\big\lfloor\lfloor cn^{2}\rfloor/2\big\rfloor$, we add an edge between the vertices where the balls at step $2k-1$ and at step $2k$ have been placed. This yields a random multigraph; multiple edges and loops may occur.
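A minimal simulation of this urn construction, under the assumption that the urn runs for $\lfloor cn^2\rfloor$ steps and that consecutive pairs of steps are joined by an edge; the function name `sample_pag` is hypothetical.

```python
import math
import random


def sample_pag(n, c, rng):
    """Sample a dense PAG multigraph from a Polya urn.

    Assumption of this sketch: the urn runs for floor(c * n**2) steps and
    the balls of steps 2k-1 and 2k are joined by an edge for
    k = 1, ..., floor(steps / 2).
    Returns the edge list (loops and repeated edges are possible) and the
    final ball counts.
    """
    steps = math.floor(c * n * n)
    balls = [1] * n                      # one initial ball per urn
    chosen = []
    for _ in range(steps):
        # pick an urn with probability proportional to its ball count
        urn = rng.choices(range(n), weights=balls)[0]
        balls[urn] += 1
        chosen.append(urn)
    edges = [(chosen[2 * k], chosen[2 * k + 1]) for k in range(steps // 2)]
    return edges, balls


rng = random.Random(1)
edges, balls = sample_pag(n=10, c=0.75, rng=rng)
```

With an odd number of steps the last ball is left unpaired, which is why the number of edges is the floor of half the number of steps.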
It was proved in [10] that with probability 1, the $W$-random graph converges with respect to multigraph homomorphism densities to the original function $W$. As mentioned in the introduction, this is also the limit object obtained when looking at the random graphs defined as the preferential attachment graph on $n$ vertices with $\big\lfloor\lfloor cn^{2}\rfloor/2\big\rfloor$ edges.
Given that, letting $n$ go to infinity, the two random sequences tend to the same limit, it is natural to ask how close these two sequences are as a function of $n$.
Our main result is that under an appropriate coupling, we obtain a polynomial bound on the expected distance.
Theorem 1.
There exists a coupling for which for every there exists such that for every we have
[TABLE]
where . With this bound, the optimum value for $\alpha$ is $5/3$, yielding $O(\log^2 n\cdot n^{-1/3})$.
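The balancing of exponents behind the choice of $\alpha$ can be checked numerically. The sketch below assumes that the $\alpha$-dependent part of the bound is $n^{1/2-\alpha/2}+n^{\alpha-2}$, which is what dividing the quantity $k_n$ from the proof of Proposition 2 by $n$ suggests; this form is an assumption of the illustration and not a quotation of the theorem.

```python
from fractions import Fraction


def worst_exponent(alpha):
    """Larger of the two assumed exponents 1/2 - alpha/2 and alpha - 2."""
    return max(Fraction(1, 2) - alpha / 2, alpha - 2)


# Grid search over alpha in (1, 2) with step 1/300, in exact arithmetic.
grid = [Fraction(k, 300) for k in range(301, 600)]
best_alpha = min(grid, key=worst_exponent)
best_exponent = worst_exponent(best_alpha)
```

The first exponent decreases in $\alpha$ and the second increases, so the maximum is smallest where the two coincide.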
In the last section, we provide a universal, coupling-independent lower bound of . The exponents are far from each other, but the lower bound uses very little of the structure of the models, so there is room for improvement.
3. Random graph models
We define a family of random graph models such that the neighboring ones are easier to compare in the jumble norm, and the whole family connects the two models of Theorem 1. In the next section we will also present possible couplings for these pairs of models, which provide a coupling satisfying the conditions of the theorem. A positive number will be a common parameter of all of the models, and it will be considered fixed for the rest of the paper. Model 1 will be a realization of , whilst models 6 and 7 will be realizations of and , respectively.
The graphs will have $n$ vertices, labeled by $1,2,\ldots,n$. The parameter $\alpha$ will be chosen later so that the bounds are the best possible available from our approach.
Model 1
We assign an urn to each vertex, initially with one single ball in each of them. Then we run a Pólya urn process for $\lfloor cn^{2}\rfloor$ steps. That is, at each step $t=1,2,\ldots,\lfloor cn^{2}\rfloor$, we choose an urn, with probabilities proportional to the number of balls inside the urn, and put a new ball into it (each random choice is conditionally independent from the previous steps, given the actual distribution of the balls). Finally, for $k=1,2,\ldots,\big\lfloor\lfloor cn^{2}\rfloor/2\big\rfloor$, we add an edge between the vertices where the balls at step $2k-1$ and at step $2k$ have been placed. We obtain a random multigraph $\mathbb{G}_{1}(n)$ this way; multiple edges and loops may occur.
Model 2
Fix . Let be a random variable with negative binomial distribution, with parameters and (we mean the version of negative binomial distribution with possible values ). Let ; this has values (sometimes this distribution is called negative binomial). The urn process is the same as in model (independent of ), but we add edges between vertices chosen at step and at step only for (if , then we get the empty graph). We obtain a random multigraph .
Model 3
Let and be defined as in model . For , we run the Pólya urn as before. Let be the proportion of the balls in urn after steps (for ). For , independently at each step, we put a new ball in an urn chosen randomly according to the distribution . That is, the probability that the ball at step falls into urn is , for all . Finally, for , we add an edge between the vertices chosen at step and at step . (If , we mean the empty graph.) We obtain this way.
Model 4
Let and be defined as in model . If , take the empty graph. Otherwise, for every pair , we take a random variable with Poisson distribution of parameter . For every , we take a random variable with Poisson distribution of parameter . We assume that all s are conditionally independent of each other, given the s. Finally, we put edges between vertices and for every pair . We obtain this way.
Model 5
Given and , the model is the same as model except that is not included any more; the model is the same as the previous one in the non-empty case. We obtain this way.
Model 6
We choose independent exponential random variables with parameter for every . For , let be a Poisson random variable with parameter . For every , let be a Poisson random variable with parameter . Assume that all s are conditionally independent with respect to the s. We put edges between vertices and for every . We obtain a random multigraph this way.
Model 7
For every , let be defined as in model . We add edges between vertices and for all these pairs, but there are no loops in this case. We obtain this way.
4. Couplings
In order to prove Theorem 1, we need to construct a particular coupling for which the distance of and is smaller than the upper bound. We do this through a sequence of couplings between the consecutive pairs, with respect to the order of random graph models in the previous section. It will be easy to see that the coupling of the first one (which is a realization of ) and the last one (which is a realization of ) can be constructed following the same order. At each step, we can simply add a finite family of random variables to the probability space independently where necessary, and use the already existing random variables in the other cases.
Coupling of model 1 and model 2
These two models can be coupled easily. Take a realization of model , and delete the edges corresponding to steps and for . That is, we do not add the edges in the first steps.
Proposition 1.
For all there exists such that
[TABLE]
holds in the coupling given above.
Coupling of model 2 and model 3
We start from a realization of model 2. Let be the proportion of the balls in urn after steps. Then, for , conditionally on the process in model 2 until steps, we choose a coupling of the distributions given by and which minimizes the probability of choosing different urns and which is conditionally independent from the couplings used in the previous steps (with respect to the evolution of the number of balls). After adding the edges, we get a realization of model 3, because the distributions are determined by , and the steps are conditionally independent of each other (and there is no difference in the first steps).
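A coupling of two distributions on the urns that minimizes the probability of choosing different urns is a maximal coupling, and its mismatch probability equals the total variation distance of the two distributions. The following Python sketch illustrates this standard construction for a single step; the function names and the example distributions are hypothetical.

```python
import random


def total_variation(p, q):
    """Total variation distance of two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_maximal_coupling(p, q, rng):
    """Draw (X, Y) with X ~ p, Y ~ q and P(X != Y) = TV(p, q)."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    mass = sum(overlap)
    idx = range(len(p))
    if rng.random() < mass:
        # common part: both models choose the same urn
        x = rng.choices(idx, weights=overlap)[0]
        return x, x
    # residual parts have disjoint supports, so X != Y in this branch
    x = rng.choices(idx, weights=[a - m for a, m in zip(p, overlap)])[0]
    y = rng.choices(idx, weights=[b - m for b, m in zip(q, overlap)])[0]
    return x, y


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
rng = random.Random(7)
pairs = [sample_maximal_coupling(p, q, rng) for _ in range(20000)]
mismatch_rate = sum(x != y for x, y in pairs) / len(pairs)
tv = total_variation(p, q)
```

In the coupling of the two models the role of `p` and `q` would be played by the current urn proportions and the frozen proportions, respectively, and a fresh coupling is drawn at every step.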
Proposition 2.
For all there exists such that for every we have
[TABLE]
in the coupling given above.
Coupling of model 3 and model 4
The negative binomial random variable is common in the two models, this is chosen first. If , then both models give the empty graph, so we assume the contrary, and construct the coupling given . Notice that in model , since all steps are independent and use the same probability distribution, the edges are chosen independently, with probabilities proportional to for and for loops.
We assign independent Poisson processes to each pair of vertices. For , the rate of the process is for , and for , the rate is for . We denote by the number of events until time in the process (). The sum of these processes is also a Poisson process; let be the time when the total number of events reaches . If we put edges between and for all , then we get model , because all events are distributed among the pairs of vertices independently, with probabilities proportional to the rates. On the other hand, if we put edges between and , then we get model , as the number of edges between the pairs are independent Poisson random variables with the appropriate parameter. Hence this provides a coupling of the two models.
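The mechanism of this coupling can be sketched as follows: the same superposed Poisson process is read off once at the random time of its $M$-th event and once at a fixed time $T$. The Python code below is an illustration with hypothetical names and arbitrary rates; it does not use the specific rates of the models.

```python
import random


def coupled_counts(rates, M, T, rng):
    """Couple 'exactly M events' with 'events up to a fixed time T'.

    rates: rates of independent Poisson processes, one per vertex pair.
    Returns (counts_at_tau, counts_at_T), where tau is the time of the
    M-th event of the superposed process.
    """
    total = sum(rates)
    types = range(len(rates))
    at_tau = [0] * len(rates)
    at_T = [0] * len(rates)
    t, events = 0.0, 0
    while True:
        t += rng.expovariate(total)              # next event of the superposition
        if events >= M and t > T:
            break                                # event counts for neither read-off
        k = rng.choices(types, weights=rates)[0]  # type is independent of the time
        if events < M:
            at_tau[k] += 1                       # among the first M events
        if t <= T:
            at_T[k] += 1                         # occurred before the fixed time
        events += 1
    return at_tau, at_T


rng = random.Random(5)
at_tau, at_T = coupled_counts([1.0, 2.0, 0.5], M=20, T=6.0, rng=rng)
differences = [a - b for a, b in zip(at_tau, at_T)]
```

The counts at the $M$-th event always sum to $M$, while the counts at time $T$ are independent Poisson variables; the two vectors differ only through the events that fall between the two read-off times, so all coordinate differences have the same sign.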
Proposition 3.
For all there exists such that for every we have
[TABLE]
in the coupling given above.
Coupling of model 4 and model 5
For , there is no difference between the two models. Whenever , the graph is the empty graph, so no coupling is needed.
Proposition 4.
For all there exists such that for every we have
[TABLE]
in the coupling given above.
Coupling of model 5 and model 6
First, we wish to couple the exponential random variables with the variables from the Pólya urn. The following representation of the urn process until steps and its connection to independent exponential random variables yields a natural way to do this. In addition, this lemma will be useful when comparing models and as well.
Lemma 5.
Fix . Let be defined as in model . Let be the number of balls in urn (for ) after steps (we continue the Pólya urn process even if ). Let be independent random variables with exponential distribution of parameter . We define
[TABLE]
Then and have the same joint distribution.
Proof.
After steps, the total number of balls is ; that is, . As it is well known, by the interchangeability property of the chosen colors in the urn process, for every and we have
[TABLE]
On the other hand, for every and , the definition of implies that
[TABLE]
Hence has geometric distribution of parameter (where we mean the version with possible values ). The random variables s are independent, thus has the same negative binomial distribution as . Hence and have the same distribution. In addition, the conditional distributions given the sum are also the same, because we have
[TABLE]
This depends only on the sum of the s, which implies that
[TABLE]
just as we have seen in the previous case. ∎
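The exchangeability used in the proof has a well-known consequence that is easy to verify exactly for small cases: when every urn starts with one ball, the vector of ball counts after $m$ steps is uniformly distributed over all ways of distributing the $m$ new balls among the $n$ urns. The Python sketch below checks the resulting marginal law of a single urn in exact arithmetic; the function names are hypothetical, and the check is an illustration rather than a restatement of the displayed formulas.

```python
from fractions import Fraction
from math import comb


def urn_marginal(n, m):
    """Exact law of the number of extra balls in urn 1 after m steps."""
    dist = {0: Fraction(1)}                  # extra balls in urn 1 -> probability
    for t in range(m):
        new = {}
        for k, p in dist.items():
            hit = Fraction(1 + k, n + t)     # urn 1 holds 1 + k of the n + t balls
            new[k + 1] = new.get(k + 1, 0) + p * hit
            new[k] = new.get(k, 0) + p * (1 - hit)
        dist = new
    return dist


def composition_formula(n, m, k):
    """P(k extra balls in urn 1) when all count vectors are equally likely."""
    return Fraction(comb(m - k + n - 2, n - 2), comb(m + n - 1, n - 1))


n, m = 4, 7
exact = urn_marginal(n, m)
matches = all(exact[k] == composition_formula(n, m, k) for k in range(m + 1))
```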
Recall that the -s corresponded to the ratio of the colors in the urn after steps, and therefore the Pólya urn model can be coupled to the family of random variables in such a way that
[TABLE]
Next we couple the Poisson random variables and for each pair . We exploit the fact that the sum of two independent Poisson distributions is again a Poisson distribution whose parameter is the sum of the original parameters. Let be the -algebra generated by the families and . Conditioned on , the coupling is done so that for each pair , we generate independent Poisson random variables and of parameter and respectively, and set
[TABLE]
For the variables , the coupling is done similarly, with all parameters halved.
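The additivity of independent Poisson variables gives a simple way to couple two Poisson variables with nearby parameters: a common part with the smaller parameter is shared, and an independent extra part carries the difference of the parameters. The Python sketch below illustrates this idea; the function names and the parameters `a`, `b` are hypothetical and the code is not claimed to reproduce the exact displayed construction.

```python
import random


def sample_poisson(lam, rng):
    """Sample Poisson(lam) by counting rate-1 arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def coupled_poissons(a, b, rng):
    """Return (X, Y) with X ~ Poisson(a), Y ~ Poisson(b) sharing a common part."""
    common = sample_poisson(min(a, b), rng)   # shared by both variables
    extra = sample_poisson(abs(a - b), rng)   # independent of the common part
    if a >= b:
        return common + extra, common
    return common, common + extra


rng = random.Random(11)
samples = [coupled_poissons(3.0, 3.4, rng) for _ in range(20000)]
mean_gap = sum(y - x for x, y in samples) / len(samples)
mean_x = sum(x for x, _ in samples) / len(samples)
```

Under this coupling the absolute difference of the two variables is itself Poisson with parameter $|a-b|$, so its expectation is exactly the difference of the parameters.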
Proposition 6.
For all there exists such that for every we have
[TABLE]
in the coupling given above.
Coupling of model 6 and model 7
Generate , then delete the loops. This yields the natural coupling between and .
Proposition 7.
There exists such that for every we have
[TABLE]
in the coupling given above.
We also conclude that this sequence of couplings can be realized in a single probability space, if we start with an appropriate family of independent random variables. Thus we constructed a coupling of and .
5. Proofs
Proof of Theorem 1
The result follows from the triangle inequality and Propositions 1 through 6.
We shall therefore now turn our attention to proving the bounds connecting each pair of models. Since the jumble norm distance is not always easy to work with, we shall make use of the following lemma.
Lemma 8.
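Let $G$ and $H$ be two (undirected) multigraphs on the vertex set $[n]$. Let $X_{ij}$ be the number of edges between $i$ and $j$ in $G$, and $Y_{ij}$ the same quantity in $H$. Then the following holds:
\[ d_{\boxtimes}(G,H)\leq\frac{1}{n}\max_{1\leq i\leq n}\sum_{j=1}^{n}|X_{ij}-Y_{ij}|. \]
Proof.
Let $M=\max_{1\leq i\leq n}\sum_{j=1}^{n}|X_{ij}-Y_{ij}|$. Notice that if $S,T\subseteq[n]$, $S,T\neq\emptyset$, and $|S|\leq|T|$, then
\[ \Big|\sum_{i\in S,\,j\in T}(X_{ij}-Y_{ij})\Big|\leq\sum_{i\in S}\sum_{j=1}^{n}|X_{ij}-Y_{ij}|\leq|S|\,M. \]
Hence
\[ \frac{1}{n\sqrt{|S|\,|T|}}\Big|\sum_{i\in S,\,j\in T}(X_{ij}-Y_{ij})\Big|\leq\frac{|S|}{n\sqrt{|S|\,|T|}}\,M\leq\frac{M}{n}, \]
as we assumed that $|S|\leq|T|$. In the reverse case $|S|>|T|$, we get the same with the bound $\max_{1\leq j\leq n}\sum_{i=1}^{n}|X_{ij}-Y_{ij}|$. Since $X_{ij}=X_{ji}$ and $Y_{ij}=Y_{ji}$, this is equal to the previous maximum. This finishes the proof. ∎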
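A bound of this type can be sanity-checked by brute force on small random multigraphs. The Python sketch below assumes that the jumble distance is normalised by $1/(n\sqrt{|S||T|})$ and that the lemma bounds it by $\frac{1}{n}$ times the largest row sum of absolute multiplicity differences; both forms are assumptions of this illustration, and the function names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def jumble_distance(X, Y):
    """Brute-force jumble distance with the assumed 1/(n*sqrt(|S||T|)) scaling."""
    n = len(X)
    subsets = [s for r in range(1, n + 1) for s in combinations(range(n), r)]
    best = 0.0
    for S in subsets:
        for T in subsets:
            total = abs(sum(X[i][j] - Y[i][j] for i in S for j in T))
            best = max(best, total / sqrt(len(S) * len(T)))
    return best / n


def row_sum_bound(X, Y):
    """(1/n) * max_i sum_j |X_ij - Y_ij|, the assumed right-hand side."""
    n = len(X)
    return max(sum(abs(X[i][j] - Y[i][j]) for j in range(n)) for i in range(n)) / n


def random_multigraph(n, rng):
    """Symmetric matrix of edge multiplicities in {0, 1, 2, 3}."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A[i][j] = A[j][i] = rng.randint(0, 3)
    return A


rng = random.Random(3)
checks = []
for _ in range(20):
    X, Y = random_multigraph(4, rng), random_multigraph(4, rng)
    checks.append(jumble_distance(X, Y) <= row_sum_bound(X, Y) + 1e-12)
```

The advantage of the row-sum bound is that it only involves one vertex at a time, which is what makes the couplings of the individual models tractable.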
5.1. Models 1 and 2
Proof of Proposition 1
Let be the number of edges between and in model , and the number of edges between and in model . By the definition of the coupling, can never be smaller than . If , then is the number of edges added to model during the first steps. Therefore is at most the number of steps in which urn was chosen during the first steps, which is (cf. Lemma 5). Even if , the sum cannot be larger than , since there are no more edges in model . By Lemma 8 and Lemma 5, we obtain
[TABLE]
Equation (1) implies
[TABLE]
Hence the expectation of the minimum is at most plus some constant depending only on . This finishes the proof.
5.2. Models 2 and 3
The idea of the proof of Proposition 2 is to find the expected value of the maximum when all global random variables (like ) are close to their mean, and then use large deviation theorems to show that this is the case with high probability. Throughout this proof, the constant factor in the notation may depend only on .
First we fix . Let be the number of balls in urn after steps. Recall that denotes the number of balls in urn after steps. We define the proportions similarly (recall that the initial configuration consists of one ball at each urn):
[TABLE]
We will use an application of de Finetti’s theorem to the urn process (see e.g. Theorem 2.2. in [12]). The joint distribution of the urns chosen randomly can be represented as follows. Let be a random variable with distribution (as there is a single ball in urn at the beginning and balls in the other urns). Then, conditionally on , generate independent Bernoulli random variables taking value with probability . This has the same distribution as the indicators of the steps when a new ball is placed to urn . This representation has an immediate consequence on the maximum of the proportion.
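The mixing representation can be checked exactly for a single urn. The Python sketch below assumes that the mixing distribution is $\mathrm{Beta}(1,n-1)$, as the remark about one ball in the given urn and $n-1$ balls in the others suggests, and compares the probability that the urn is chosen in each of the first $r$ steps with the $r$-th moment of that Beta distribution. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def urn_prob_all_hits(n, r):
    """P(a fixed urn receives the ball in each of the first r steps)."""
    prob = Fraction(1)
    for t in range(r):
        prob *= Fraction(1 + t, n + t)   # the urn holds 1 + t of the n + t balls
    return prob


def beta_moment(n, r):
    """E[p**r] for p ~ Beta(1, n - 1), from the Beta-function formula."""
    return Fraction(factorial(r) * factorial(n - 1), factorial(n + r - 1))


n = 5
agree = all(urn_prob_all_hits(n, r) == beta_moment(n, r) for r in range(8))
```

Agreement of all these moments is exactly what the conditional Bernoulli description predicts, since given the mixing variable $p$ the probability of $r$ consecutive hits is $p^r$.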
Lemma 9.
(a) Let be a random variable with distribution with . Then we have
[TABLE]
(b) For every we have
[TABLE]
Proof.
By using that , we have
[TABLE]
Using exponential Markov’s inequality and part (a), we have
[TABLE]
where we assumed that . This immediately implies (b). ∎
We will use the following lemma, which is based on a large deviation argument.
Lemma 10.
Fix integers . Let be a random variable with distribution . Let be a random variable whose conditional distribution with respect to is binomial with parameters and . We define
[TABLE]
Then there exists such that
[TABLE]
Proof.
We will compare the difference to the variance of the binomial distribution, given . We start with
[TABLE]
We will choose but keep writing for clarity. Since is measurable with respect to , the first term is equal to
[TABLE]
where denotes the indicator function of the event .
We define and ; then the first event in (5) is . It is clear that and ; hence we can apply large deviation arguments. Furthermore, we have on the event , as the following calculation shows.
[TABLE]
We also need . That is, we have to check whether the following holds:
[TABLE]
Since we have on and we assumed , this holds for large enough (recall that does not depend on any of the parameters).
Hence we can apply the relative entropy version of the Chernoff bound for binomial distributions, conditionally with respect to . We obtain
[TABLE]
where . We need the following quantities for the calculations.
[TABLE]
It is easy to check that implies . On the event we have , and hence . Therefore
[TABLE]
Similarly, we have
[TABLE]
Substituting this into the Chernoff bound, we obtain that for defined by equation (5) we have
[TABLE]
for large enough. As for the first term:
[TABLE]
for large enough. Hence the first term is , as we have chosen . In the exponent of the second term, since holds on , we get
[TABLE]
Putting this together, we conclude that , which is a bound for the first term of (4). The second term of (4) can be bounded as follows.
[TABLE]
by equation (2), if . This finishes the proof. ∎
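The relative entropy version of the Chernoff bound used in this proof states that for a binomial variable $X$ with parameters $m$ and $p$ and any $a>p$, we have $\mathbb{P}(X\geq am)\leq\exp(-m\,D(a\,\|\,p))$, where $D(a\,\|\,p)=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}$. The Python sketch below compares the bound with the exact tail for a few illustrative values; the parameters are arbitrary and the function names are hypothetical.

```python
from math import ceil, comb, exp, log


def kl_bernoulli(a, p):
    """Relative entropy D(a || p) of two Bernoulli distributions."""
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))


def binomial_upper_tail(m, p, a):
    """Exact P(X >= a * m) for X ~ Binomial(m, p)."""
    start = ceil(a * m)
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(start, m + 1))


m, p = 200, 0.1
rows = []
for a in (0.15, 0.2, 0.3):
    tail = binomial_upper_tail(m, p, a)
    bound = exp(-m * kl_bernoulli(a, p))
    rows.append((a, tail, bound))
```

Each row of `rows` holds the threshold, the exact tail probability and the Chernoff bound; the bound decays exponentially in $m$ at a rate governed by the relative entropy.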
Now we compare the differences of the proportions after steps and the further steps. This will give the order of the distance in the coupling. We define
[TABLE]
Proposition 11.
Assuming , there exists such that for every fixed the following hold.
(a) [TABLE]
(b) [TABLE]
(c) [TABLE]
(d) We define
[TABLE]
Then for some we have
[TABLE]
(e) For defined in (d), we have
[TABLE]
Proof.
We will assume that ; otherwise the sums become empty, and .
We will use the representation based on de Finetti’s theorem together with the following decomposition.
[TABLE]
According to the representation, we know that is a binomial random variable with parameters and , given and . We will use Lemma 10 for this conditional distribution. Notice that , and in this case. Therefore for defined in Lemma 10 we have
[TABLE]
It follows that
[TABLE]
Similarly, is a binomial random variable with parameters and , given and . Again, we have that . Thus Lemma 10 can be applied. We get that there exists such that
[TABLE]
This implies
[TABLE]
In addition, using that holds on the event , we can write
[TABLE]
Now we reformulate the third term.
[TABLE]
By equation (2) we obtain
[TABLE]
Putting this together with equations (6) and (7), we obtain that there exists such that
[TABLE]
Since and , for large enough, the middle term is the largest one, and we conclude that for some
[TABLE]
This finishes the proof of .
It follows from part that
[TABLE]
On , we have , as , for large enough . By equation (3) we get that
[TABLE]
The two equations together imply the statement.
Similarly to the proof of Lemma 9, for every we have
[TABLE]
Therefore, writing
[TABLE]
we have
[TABLE]
because on the event we have in all terms (and the inequality is valid for as well).
For large enough (which may depend only on ), the condition
[TABLE]
implies that either the event in part , or the event in inequality (8), or holds, according to the value of . Notice that for we have , hence for large enough we can get rid of the maximum. Thus, combining these inequalities with part of Lemma 9, we get the statement of .
For the first term of , we know this statement with constant from part . We may assume that is so large that holds. Then we can apply Lemma 9 to get
[TABLE]
On the other hand, if holds and the second term of is greater than the bound in , then
[TABLE]
holds. By choosing , this implies that for some we have
[TABLE]
Putting this together with part , this finishes the proof of (notice that does not depend on ).
To see that implies , we only have to check that
[TABLE]
Recall that the random variable has negative binomial distribution with parameters and . For large enough, the inequality holds and we also have
[TABLE]
Notice that can be expressed as the independent sum of geometric random variables supported on with mean . Thus, we compare to , which is less than the mean of the geometric random variables. Hence we can apply Cramér’s theorem for . We obtain that
[TABLE]
where is the moment generating function of these geometric random variables, and minimizes the expression in the exponent. That is, we have
[TABLE]
This yields
[TABLE]
It follows from inequality (10) that for large enough we have
[TABLE]
Since we assumed that , this implies inequality (9). ∎
Proof of Proposition 2. If , then both models give the empty graph and the distance is $0$; we will ignore this case. For odd, let be the indicator of the following event: either vertex gets different edges at step in the coupling of model 2 and model 3, or it gets an edge in exactly one of the models. For even, let . We will be interested in . In addition, we define
[TABLE]
Whenever takes value , we either choose vertex in exactly one of the models at step or , or we choose vertex in both models, but it gets different pairs in the two models. Thus, by the definition of the coupling, we have that
[TABLE]
A slight modification of Proposition 11 implies that for some we have
[TABLE]
To see this, note that the sum for the first two terms for odd gives the first term of defined in part of Proposition 11. The third term here corresponds to the second term of with even s omitted. Finally, for the fourth term it is easy to see that the proof of Proposition 11 is valid if is replaced by .
Let be the event in equation (11), and let $k_{n}=K_{6}\log^{2}n\cdot\big(n^{3/2-\alpha/2}+n^{\alpha-1}\big)$. By using that and given , the indicators are conditionally independent by the definition of the coupling, we obtain
[TABLE]
Putting this together with equation (11), we get that
[TABLE]
This immediately implies that
[TABLE]
The sum of the indicators is at most . We conclude that
[TABLE]
Since the definition of model 2 and model 3 is the same during the first steps, and we included all possible differences into the indicators, is an upper bound for , where is the number of edges between and in model , and is the corresponding quantity in model (at the end of the whole process). By using Lemma 8 we get the statement of Proposition 2.
5.3. Models 3 and 4
Proof of Proposition 3
Let be the number of edges between and in model , and be the number of edges between them in model . By using the notations introduced for the coupling of the two models, we have . If , then all the differences are nonnegative, and all of them are negative if . Thus
[TABLE]
We will use the fact that by cumulating the independent Poisson processes assigned to the pairs of vertices we get a Poisson process with rate . In addition, the types of the events are independent of the moments when they occur. Let be the total number of events until time ; i.e. , which has Poisson distribution with parameter . Since there are events in the cumulated process until , there are events between and . On the other hand, independently of each other, all these events increase $\big|\sum_{j=1}^{n}N_{\tau}^{(ij)}-N_{cn^{2}}^{(ij)}\big|$ by with probability . We conclude that the quantity in equation (12) has binomial distribution with parameters and conditionally with respect to and . Let be the following event:
[TABLE]
By using the moment generating function of the binomial distribution, we obtain
[TABLE]
It follows from part of Lemma 9 and equation (9) that . Similarly to the proof of equation (9) in part of Proposition 11, it can be shown that ; one can use Cramér’s large deviation theorem and the fact that the expectation of is smaller than . Finally, recall that has Poisson distribution with parameter . We can think of it as the independent sum of Poisson random variables with parameter , and apply Cramér’s theorem. That is,
[TABLE]
where is the moment generating function of , and we can choose to minimize the expression on the right hand side. By using and , it follows that this probability is also . The same argument works for . On the other hand, , hence for large .
Putting this together, we obtain that , and
[TABLE]
Since the total sum cannot be larger than , we get Proposition 3 similarly to the arguments in the previous section.
5.4. Models 4 and 5
Proof of Proposition 4
The expected value $\mathbb{E}\big(d_{\boxtimes}\big(\mathbb{G}_{4}(n,\alpha),\mathbb{G}_{5}(n)\big)\big)$ can be split according to the value of as follows.
[TABLE]
The second term is zero by the coupling, whilst the first is
[TABLE]
To bound this, note that we always have
[TABLE]
But we have by the definition of the variables
[TABLE]
whence
[TABLE]
Since is, as noted before, the sum of independent geometric distributions of parameter supported on , we have
[TABLE]
Provided , this yields .
5.5. Models and
To be able to bound the jumble distance, we have to deal with each of the random variables . Recall that denoted the $\sigma$-algebra generated by the and , . By our coupling we may write for each
[TABLE]
Lemma 12.
Provided , we have for all non-negative integers
[TABLE]
where denotes the factorial moment for any , i.e. .
Proof.
It is known that for any we have . Suppose now that . By the law of total expectation, we have
[TABLE]
where
[TABLE]
and we made use of the power mean inequality in the form . Note that we may consider as the error that stems from the randomization in the denominator, whilst captures the error that comes from the rounding .
Let us first bound . It is known that for the i.i.d. exponential variables , their sum is independent from the ratios . Hence
[TABLE]
Also, we have . The first term can thus be bounded by
[TABLE]
We have that given i.i.d. random variables with expectation [math], and an integer , the moment of their sum is bounded by , with depending only on the distribution (see e.g. [1, 9]). In addition, . The second term can therefore be bounded by
[TABLE]
Thus we obtain
[TABLE]
For a fixed , this means
[TABLE]
Let us now turn to the term . The first idea is to get rid of the absolute value by observing that if we have random variables such that and , then for any we have
[TABLE]
The role of shall be played by .
Using the fact that by the rounding, for each , we have
[TABLE]
and so we can have
[TABLE]
play the role of . Applying first the power mean inequality, and using that the reciprocal of the sum has inverse gamma distribution, whilst the ratio is a distribution independent of it, for large enough, we obtain
[TABLE]
Again by the rounding, we have the lower bound
[TABLE]
Here it is clear that the last expression is negative, so let’s continue without the minus sign.
[TABLE]
So the role of will be played by
[TABLE]
We use that the sum is independent of the proportions, use inequality (5.5), the Cauchy–Schwarz inequality and the moments of the Gamma distribution:
[TABLE]
Hence , and summing up we obtain
[TABLE]
Proof of Proposition 6
Recall that in the coupling of model and , the absolute value of the difference of the number of edges between and is . By Lemma 12 with , for some , for every fixed we have
[TABLE]
Let now , and . Clearly we have
[TABLE]
For fixed , conditionally on , the random variables () are independent. Since they fall between [math] and , by the Hoeffding inequality we have
[TABLE]
for any . Using the same constant as above, and choosing , we have by the bound on that
[TABLE]
A trivial bound then yields
[TABLE]
Since always holds, we obtain
[TABLE]
It is clear that , since whenever , its rd factorial moment is positive, and strictly larger than itself. Therefore
[TABLE]
From the above, together with inequality (13) :
[TABLE]
where the last inequality follows from a weighted AM-GM.
Finally, Lemma 8 concludes the proof.
5.6. Models and
We have that and coincide everywhere but the main diagonal, and it is then easy to see that
[TABLE]
Proof of Proposition 7
Recall that has Poisson distribution with parameter , where has distribution. Assume first that is fixed, and . Then
[TABLE]
We will use the factorial moments of the Poisson distribution again. For every fixed and integers for some we have
[TABLE]
because the exponential distribution has finite moments.
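For reference, the standard identity behind such factorial-moment computations can be written out as follows. This is only a sketch in generic notation: the symbols $X$, $\lambda$, $\Lambda$, $k$ and $m$ are introduced here for illustration and are not the paper's notation. For $X$ Poisson distributed with parameter $\lambda$ and an integer $k\geq 1$,

\[
\mathbb{E}\big[X(X-1)\cdots(X-k+1)\big]
=\sum_{m\geq k}\frac{m!}{(m-k)!}\,e^{-\lambda}\frac{\lambda^{m}}{m!}
=\lambda^{k}\sum_{m\geq k}e^{-\lambda}\frac{\lambda^{m-k}}{(m-k)!}
=\lambda^{k}.
\]

If the parameter is itself a random variable $\Lambda$, then conditioning on $\Lambda$ and applying the law of total expectation gives $\mathbb{E}\big[X(X-1)\cdots(X-k+1)\big]=\mathbb{E}\big[\Lambda^{k}\big]$, which is finite whenever $\Lambda$ is a polynomial expression in exponentially distributed variables, since all moments of the exponential distribution are finite.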
For an arbitrary function we may apply the above inequality to obtain
[TABLE]
Let now be fixed, set and . For large enough (such that ) and , this yields
[TABLE]
Lemma 8 concludes the proof.
6. Discussion
Our main theorem shows that the classical dense preferential attachment graph model yields random graphs that are close to the random graph model obtained through the PAG-graphon, the limit object in the multigraph homomorphism sense of the random sequence . They are not indistinguishable though (we provide a lower bound on their distance below), and they each have their own advantages for applications.
The random graphs have the advantage that the number of edges is deterministic, but contrary to the sparse PAG models, one cannot easily generate a growing family of graphs . For the graphon induced , the number of edges is random, though still asymptotically concentrated around the expected value. Also, the way it is generated does not carry the preferential attachment flavour. This may be an advantage from the simulation point of view: the random variables in the model can be generated simultaneously, without the steps that have to be performed after each other in the PAG model.
However, it is possible to couple the elements of the sequence (or ) so that we obtain a growing sequence (and still keep the convergence with probability 1). Indeed, passing from to only means that we have to generate the random variable , independently of the previous -s, and then generate the appropriate Poisson random variables for . This coupling shows that adding an extra vertex and extending to can be performed easily. It seems that this does not hold for the model.
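The extension step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's construction: it assumes that each vertex carries an independent exponential label with parameter 1 (minus the logarithm of a uniform variable), that the number of edges between two distinct vertices is Poisson with parameter `c` times the product of their labels, and that there are no loops. The names `sample_poisson`, `add_vertex` and `grow` are hypothetical helpers introduced here.

```python
import random

def sample_poisson(lam, rng):
    """Sample Poisson(lam) by counting rate-1 exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def add_vertex(xi, edges, c, rng):
    """Extend the multigraph by one vertex; existing entries are left untouched."""
    new_xi = rng.expovariate(1.0)  # assumed Exp(1) label of the new vertex
    # Edge multiplicities between the new vertex and each earlier vertex.
    new_row = [sample_poisson(c * new_xi * old, rng) for old in xi]
    for row, m in zip(edges, new_row):
        row.append(m)              # keep the adjacency matrix symmetric
    edges.append(new_row + [0])    # assumption: no loops on the diagonal
    xi.append(new_xi)

def grow(n, c, seed=0):
    """Build an n-vertex multigraph one vertex at a time."""
    rng = random.Random(seed)
    xi, edges = [], []
    for _ in range(n):
        add_vertex(xi, edges, c, rng)
    return xi, edges

xi, edges = grow(5, 0.5, seed=1)
print(edges)
```

Because `add_vertex` only appends one row and one column, the graph on the first vertices is a subgraph of every later one, which is the growing-sequence property discussed above.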
Unfortunately, we do not have a lower bound for the jumble norm distance of and that matches the upper bound given in Theorem 1. Recall that there we obtained as an upper bound for a particular coupling. On the other hand, there is a universal lower bound of , which holds for every coupling, and for both of the random graphs and . The exponents are quite far from each other, but the arguments used for the lower bound use very little of the structure of the graphs. We present a short argument giving this lower bound for both and .
If we take in Definition 1, then we obtain a lower bound for the jumble norm distance of and by understanding the difference of the number of edges. The main point is that the distribution of this quantity does not depend on the coupling. In , the number of edges is deterministic and it is equal to . We denote by the number of edges in the graph model. Let be the $\sigma$-algebra generated by (recall that the latter random variables are independent and have exponential distribution with parameter ). Then, conditionally with respect to , the random variable has Poisson distribution with parameter . Hence by the law of total expectation.
In any coupling of these two models, by we have
[TABLE]
Notice that
[TABLE]
for an appropriate positive number . This holds for every coupling; therefore the exponent in Theorem 1 cannot be smaller than .
The previous argument relies on the fact that the expected number of edges is different in the two models, due to the lack of loops in the model. For the and the models, although the expected numbers of edges are equal, one can prove that the jumble norm distance is still at least for every coupling. The key point is to use the formula for the central absolute moment of the Poisson distribution and see that it is at least a constant times the square root of the parameter.
To see this, we have to consider the random variable , which is the number of edges in . It has Poisson distribution with parameter conditionally with respect to (recall Definition 2). For the sake of simplicity, let be a Poisson() distributed random variable, and . First notice that
[TABLE]
On the other hand, by using the formula for the central absolute moment of the Poisson distribution and the well-known upper bound version of Stirling’s formula, we have
[TABLE]
Putting this together, we get
[TABLE]
Now we apply this for the conditional distribution of with . We obtain
[TABLE]
Therefore, since is the number of edges in the PAG model, we conclude that for every coupling of and , we have
[TABLE]
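The Poisson estimate underlying this lower bound can be checked numerically. The sketch below assumes the classical closed form for the mean absolute deviation of a Poisson variable with parameter $\lambda$, namely $2e^{-\lambda}\lambda^{\lfloor\lambda\rfloor+1}/\lfloor\lambda\rfloor!$, which may differ in presentation from the formula used in the argument above. It compares this with a direct summation and prints the ratio to $\sqrt{\lambda}$. The function names are hypothetical.

```python
import math

def poisson_mad_closed_form(lam):
    """E|X - lam| for X ~ Poisson(lam): 2 e^{-lam} lam^{floor(lam)+1} / floor(lam)!."""
    k = math.floor(lam)
    # Work on the log scale to avoid overflow for large lam.
    return 2.0 * math.exp(-lam + (k + 1) * math.log(lam) - math.lgamma(k + 1))

def poisson_mad_direct(lam, terms=2000):
    """The same quantity, summing |m - lam| * P(X = m) over 0 <= m < terms."""
    total = 0.0
    for m in range(terms):
        log_pmf = -lam + m * math.log(lam) - math.lgamma(m + 1)
        total += abs(m - lam) * math.exp(log_pmf)
    return total

for lam in (3.0, 50.0, 400.5):
    closed = poisson_mad_closed_form(lam)
    # The ratio stays bounded away from zero as lam grows.
    print(lam, closed, poisson_mad_direct(lam), closed / math.sqrt(lam))
```

By Stirling's formula the printed ratio approaches $\sqrt{2/\pi}$ as the parameter grows, which is consistent with a lower bound of a constant times the square root of the parameter.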
Remark.
In this paper we considered the jumble distance between the two random models for the dense PAG graph, as that is the more natural distance notion for multigraphs generated by unbounded graphons (in this particular case, this corresponds to the unboundedness of the parameters of the Poisson distributions). However, as each finite multigraph generated is bounded per se, one may wonder if it is possible to say anything about the cut distance between, e.g., and .
We recall that the cut distance of two graphs on the same set of vertices is defined as
[TABLE]
It is easily seen that , hence the upper bounds given for the jumble distance apply a fortiori to the cut distance as well. On the other hand, the methods used in this paper do not yield stronger bounds for the cut norm distance.
Acknowledgements
The first author was supported by the Hungarian National Research, Development and Innovation Office, NKFIH grant K108615 and by the MTA Rényi Institute Lendület Limits of Structures Research Group. The second author has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement 617747, and from the MTA Rényi Institute Lendület Limits of Structures Research Group.
References
[1] B. von Bahr, On the convergence of moments in the central limit theorem, Ann. Math. Statist. 36 (1965), 808–818.
[2] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286 (1999), no. 5439, 509–512.
[3] N. Berger, C. Borgs, J. T. Chayes, and A. Saberi, Asymptotic behavior and distributional limits of preferential attachment graphs, Ann. Prob. 42 (2014), 1–40.
[4] C. Borgs, J. Chayes, L. Lovász, V. Sós, and K. Vesztergombi, Limits of randomly grown graph sequences, Eur. J. Combin. 32 (2011), no. 7, 985–999.
[5] R. Durrett, Random graph dynamics, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge Univ. Press, Cambridge, 2007.
[6] R. Elwes, Preferential attachment processes approaching the Rado multigraph, arXiv preprint arXiv:1502.05618 (2015).
[7] R. Elwes, A linear preferential attachment process approaching the Rado graph, arXiv preprint arXiv:1603.08806 (2016).
[8] A. Frieze and M. Karoński, Introduction to random graphs, Cambridge University Press, Cambridge, 2015.
