TL;DR
This paper analyzes how the PageRank random walk converges to equilibrium on large sparse random directed graphs, revealing a trichotomy in behavior depending on the refresh probability relative to the graph's mixing time.
Contribution
It identifies a universal three-regime behavior of PageRank convergence on sparse random digraphs, depending on the refresh probability and the graph's mixing time.
Findings
When refresh probability is very small, convergence shows cutoff behavior.
When refresh probability is large, convergence is exponential with rate equal to the refresh probability.
Intermediate refresh probabilities lead to a mixed convergence behavior.
Mixing time of PageRank surfers on sparse random digraphs
Pietro Caputo♯ and Matteo Quattropani♭
♯ Dipartimento di Matematica e Fisica, Università Roma Tre, Largo Murialdo 1, 00146 Roma, Italy.
♭ Dipartimento di Matematica e Fisica, Università Roma Tre, Largo Murialdo 1, 00146 Roma, Italy.
Abstract.
We consider the generalised PageRank walk on a digraph $G$, with refresh probability $\alpha$ and resampling distribution $\lambda$. We analyse convergence to stationarity when $G$ is a large sparse random digraph with given degree sequences, in the limit of vanishing $\alpha$. We identify three scenarios: when $\alpha$ is much smaller than the inverse of the mixing time of $G$ the relaxation to equilibrium is dominated by the simple random walk and displays a cutoff behaviour; when $\alpha$ is much larger than the inverse of the mixing time of $G$ on the contrary one has pure exponential decay with rate $\alpha$; when $\alpha$ is comparable to the inverse of the mixing time of $G$ there is a mixed behaviour interpolating between cutoff and exponential decay. This trichotomy is shown to hold uniformly in the starting point and uniformly in the resampling distribution $\lambda$.
Key words and phrases:
PageRank, random digraphs, non-reversible Markov chain, mixing time, random walks on networks.
2010 Mathematics Subject Classification:
Primary: 05C81, 60J10, 60C05. Secondary: 60G42
1. Introduction and results
Given a directed graph $G$ and a parameter $\alpha\in(0,1)$, the PageRank surf on $G$ with damping factor $1-\alpha$ is the Markov chain with state space $V(G)$ and transition probabilities given by
$$P_{\alpha}(x,y) = (1-\alpha)\,P(x,y) + \frac{\alpha}{n}, \tag{1.1}$$
where $n$ is the number of vertices of $G$, and, writing $d_x^+$ for the out-degree of vertex $x$,
$$P(x,y) = \begin{cases} 1/d_x^+ & \text{if there is an edge from } x \text{ to } y,\\ 0 & \text{otherwise,}\end{cases} \tag{1.2}$$
denotes the transition matrix of the simple random walk on $G$. The interpretation is that of a surfer that at each step, with probability $1-\alpha$ moves to a vertex chosen uniformly at random among the out-neighbours of its current state, and with probability $\alpha$ moves to a uniformly random vertex in $V(G)$. The surfer eventually reaches a stationary distribution $\pi_\alpha$ over $V(G)$, called the PageRank of $G$. Since its introduction by Brin and Page in the seminal paper [10], PageRank has played a fundamental role in the ranking functions of all major search engines; see e.g. [15, 17]. A common generalization is the so-called customised or generalised PageRank, where the uniform resampling is replaced by an arbitrary probability distribution $\lambda$ over $V(G)$, so that (1.1) becomes
$$P_{\alpha,\lambda}(x,y) = (1-\alpha)\,P(x,y) + \alpha\,\lambda(y). \tag{1.3}$$
The resulting stationary distribution $\pi_{\alpha,\lambda}$, characterised by the equation
$$\pi_{\alpha,\lambda} = \pi_{\alpha,\lambda}\,P_{\alpha,\lambda}, \tag{1.4}$$
depends in a nontrivial way on the parameter $\alpha$ and the distribution $\lambda$. There have been several investigations of the structural properties of $\pi_{\alpha,\lambda}$; see e.g. [18, 2, 9]; we refer in particular to the recent works [11, 16, 21] for cases where the graph $G$ is drawn from the configuration model. Here we focus on the dynamical problem of determining the time needed for the surfer to reach the equilibrium distribution $\pi_{\alpha,\lambda}$, namely we study the mixing time of the Markov chain with transition matrix $P_{\alpha,\lambda}$. In the case $\alpha=0$, this corresponds to the classical question of determining the mixing time of the simple random walk on the graph $G$; see e.g. [19]. Even for graphs where the latter is well understood, it is in general not immediate to deduce the influence of the parameter $\alpha$ and of the resampling distribution $\lambda$ on the speed of convergence to equilibrium.
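As an illustration of the definitions above, the following minimal sketch builds the simple random walk matrix and the generalised PageRank kernel for a small digraph given as out-neighbour lists, and approximates the stationary distribution by repeated multiplication. The function names (`simple_walk_matrix`, `pagerank_matrix`, `step`, `stationary`) and the list-of-lists representation are assumptions made for this example only, and the iteration is meant for a strictly positive refresh probability.

```python
def simple_walk_matrix(out_neighbours):
    """Simple random walk matrix; out_neighbours[x] lists the out-neighbours of x."""
    n = len(out_neighbours)
    P = [[0.0] * n for _ in range(n)]
    for x, nbrs in enumerate(out_neighbours):
        for y in nbrs:
            P[x][y] += 1.0 / len(nbrs)  # a repeated entry models a multiple edge
    return P


def pagerank_matrix(P, alpha, lam):
    """Walk with probability 1 - alpha, resample from lam with probability alpha."""
    n = len(P)
    return [[(1.0 - alpha) * P[x][y] + alpha * lam[y] for y in range(n)]
            for x in range(n)]


def step(mu, Q):
    """Push a row vector one step forward: returns mu Q."""
    n = len(Q)
    return [sum(mu[x] * Q[x][y] for x in range(n)) for y in range(n)]


def stationary(Q, tol=1e-13, max_iter=100_000):
    """Iterate mu -> mu Q from the uniform law until it stops moving."""
    n = len(Q)
    mu = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(mu, Q)
        if sum(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu
```

For instance, `stationary(pagerank_matrix(simple_walk_matrix([[1, 2], [2], [0]]), 0.15, [0.5, 0.5, 0.0]))` returns an approximation of the generalised PageRank vector of a three-vertex digraph.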
It is intuitively reasonable to guess that if the parameter $\alpha$ is suitably large compared to the inverse of the mixing time of the graph $G$, then the time to reach stationarity will be essentially the time needed to make the first $\lambda$-resampling transition, which is a geometric random variable with parameter $\alpha$, while if $\alpha$ is suitably small compared to the inverse of the mixing time of the graph $G$, then one should reach stationarity well before the first $\lambda$-resampling, so that the speed of convergence to equilibrium will be essentially that of the simple random walk on $G$. Moreover, one could expect that when $\alpha$ is neither too small nor too large compared to the inverse of the mixing time of the graph $G$, then some interpolation between the two opposite behaviours should take place. In this paper we substantiate this intuitive picture for a large class of sparse directed graphs. The results hold uniformly in the initial position and uniformly in the resampling distribution $\lambda$.
1.1. Two models of sparse digraphs
We shall consider two families of directed graphs. Both are obtained via the so-called configuration model, with the difference that in the first case we fix both in and out degrees, while in the second case we only fix the out degrees. The models are sparse in that the degrees are bounded. We now proceed with the formal definition.
Let $V$ be a set of $n$ vertices. For simplicity we often write $V=[n]$, with $[n]=\{1,\dots,n\}$. For each $n$, we are given two finite sequences $\mathbf{d}^+=(d_x^+)_{x\in[n]}$ and $\mathbf{d}^-=(d_x^-)_{x\in[n]}$ of non negative integers such that
$$m = \sum_{x\in[n]} d_x^+ = \sum_{x\in[n]} d_x^-. \tag{1.5}$$
The directed configuration model DCM($\mathbf{d}^\pm$) is the distribution of the random graph $G$ obtained as follows: 1) equip each node $x$ with $d_x^+$ tails and $d_x^-$ heads; 2) pick uniformly at random one of the $m!$ bijective maps from the set of all tails into the set of all heads, call it $\omega$; 3) for all $x,y\in V$, add a directed edge $(x,y)$ every time a tail from $x$ is mapped into a head from $y$ through $\omega$. The resulting graph $G$ may have self-loops and multiple edges, however it is classical that by conditioning on the event that there are no multiple edges and no self-loops one obtains a uniformly random simple digraph with in degree sequence $\mathbf{d}^-$ and out degree sequence $\mathbf{d}^+$.
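The three construction steps of the directed configuration model can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_dcm` and the edge-list output are choices made here, and shuffling the list of heads is used as one way to draw a uniformly random bijection from tails to heads.

```python
import random


def sample_dcm(d_out, d_in, rng=random):
    """Sample the directed configuration model as a list of directed edges (x, y)."""
    if sum(d_out) != sum(d_in):
        raise ValueError("out-degrees and in-degrees must have the same total")
    # Step 1: one list entry per tail (resp. head), labelled by its vertex.
    tails = [x for x, d in enumerate(d_out) for _ in range(d)]
    heads = [y for y, d in enumerate(d_in) for _ in range(d)]
    # Step 2: a uniform shuffle of the heads gives a uniform bijection.
    rng.shuffle(heads)
    # Step 3: each matched (tail, head) pair becomes a directed edge.
    return list(zip(tails, heads))
```

The output may contain self-loops and repeated pairs, in line with the remark above; rejecting such samples would give a uniformly random simple digraph with the prescribed degrees.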
Structural properties of random graphs obtained in this way have been extensively studied in [13]. Here we shall consider the sparse case corresponding to bounded degree sequences. Moreover, in order to avoid non irreducibility issues, we shall assume that all degrees are at least $2$. Thus, throughout this work it will always be assumed that
$$\min_{x\in[n]}\min\{d_x^-,d_x^+\}\ge 2, \qquad \max_{x\in[n]}\max\{d_x^-,d_x^+\} = O(1). \tag{1.6}$$
We often use the notation $\Delta := \max_{x\in[n]}\max\{d_x^-,d_x^+\}$. Under the first assumption it is known that DCM($\mathbf{d}^\pm$) is strongly connected with high probability; see e.g. [13]. Under the second assumption, it is known that DCM($\mathbf{d}^\pm$) has a uniformly (in $n$) positive probability of having no self-loops or multiple edges; see e.g. [12]. In particular, any property that holds with high probability for DCM($\mathbf{d}^\pm$) will also hold with high probability for a uniformly chosen simple digraph subject to the constraint that in and out degrees be given by $\mathbf{d}^-$ and $\mathbf{d}^+$ respectively. Here and throughout the rest of the paper we say that a property holds with high probability (w.h.p. for short) if the probability of the corresponding event converges to $1$ as $n\to\infty$. In particular, it follows that w.h.p. there exists a unique stationary distribution $\pi_0$ for the simple random walk on $G$. Several properties of $\pi_0$ have been established recently in [6], where it was shown, among other facts, that $\pi_0$ can be described in terms of recursive distributional equations determined by the sequences $\mathbf{d}^\pm$.
To define the second model, for each $n$ let $\mathbf{d}^+=(d_x^+)_{x\in[n]}$ be a finite sequence of non negative integers and define the out-configuration model OCM($\mathbf{d}^+$) as the distribution of the random graph $G$ obtained as follows: 1) equip each node $x$ with $d_x^+$ tails; 2) pick, for every $x$ independently, a uniformly random injective map from the set of tails at $x$ to the set of all vertices $V$, call it $\omega_x$; 3) for all $x,y\in V$, add a directed edge $(x,y)$ if a tail from $x$ is mapped into $y$ through $\omega_x$. Equivalently, $G$ is the graph whose adjacency matrix is uniformly random in the set of all $n\times n$ matrices with entries $0$ or $1$ such that every row $x$ sums to $d_x^+$. Notice that $G$ may have self-loops, but there are no multiple edges in this construction. This is due to the requirement that the maps $\omega_x$ be injective. The latter choice is only a matter of convenience, and everything we say below is actually seen to hold as well for the model obtained by dropping that requirement. We write $\omega=(\omega_x)_{x\in[n]}$ for the collection of maps. As before we shall make the assumptions
$$\min_{x\in[n]} d_x^+\ge 2, \qquad \max_{x\in[n]} d_x^+ = O(1), \tag{1.7}$$
and use the notation $\Delta := \max_{x\in[n]} d_x^+$. We remark that under the above assumptions there can still be vertices with in-degree zero, and therefore in this case $G$ is not necessarily strongly connected. However, it is still possible to show that w.h.p. there exists a unique stationary distribution $\pi_0$ for the simple random walk on $G$; see e.g. [1, 7] for more details.
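For comparison with the first model, here is a minimal sketch of the out-configuration model. The name `sample_ocm` and the adjacency-list output are assumptions of this example; sampling without replacement stands in for the uniformly random injective map from the tails of a vertex into the vertex set.

```python
import random


def sample_ocm(d_out, rng=random):
    """Sample the out-configuration model as out-neighbour lists.

    Vertex x receives d_out[x] distinct out-neighbours, chosen uniformly
    among all vertices (self-loops allowed, multiple edges impossible).
    """
    n = len(d_out)
    # random.sample draws without replacement, so the targets are distinct.
    return [sorted(rng.sample(range(n), d)) for d in d_out]
```

Each row of the corresponding adjacency matrix is a uniformly random 0/1 vector with the prescribed row sum, independently across rows, which matches the equivalent description given above.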
In what follows $G$ denotes a given realization of either the directed configuration model DCM($\mathbf{d}^\pm$) or the out-configuration model OCM($\mathbf{d}^+$) and all the results to be discussed will hold w.h.p. within these two ensembles. For the sake of simplicity we often refer to these as model 1 and model 2 respectively.
1.2. Main results
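Let $P$ denote the transition matrix of the simple random walk on $G$. When $G$ is a digraph without multiple edges this is given by (1.2). If $G$ has multiple edges, $P$ is defined as $P(x,y)=m(x,y)/d_x^+$ where $m(x,y)$ denotes the number of directed edges from $x$ to $y$. For any $\alpha\in[0,1)$ and any resampling distribution $\lambda$, let $P_{\alpha,\lambda}$ denote the PageRank transition matrix defined in (1.3). Notice that as soon as $\alpha>0$, regardless of the realization of the graph $G$ and of the chosen distribution $\lambda$, there exists a unique stationary distribution $\pi_{\alpha,\lambda}$ on $V$. Indeed, the transition matrix $P_{\alpha,\lambda}$ satisfies the so-called Doeblin condition if $\alpha>0$; see Proposition 7 below for an explicit expression of $\pi_{\alpha,\lambda}$. Convergence to equilibrium will be quantified using the total variation distance. For two probability measures $\mu,\nu$ on $V$, the latter is defined by
$$\|\mu-\nu\|_{\rm TV} = \max_{A\subseteq V}|\mu(A)-\nu(A)|, \tag{1.8}$$
where the maximum ranges over all possible events in the underlying probability space. Starting at a node $x$ the distribution of the PageRank surfer after $t$ steps is $P^t_{\alpha,\lambda}(x,\cdot)$, and the distance to equilibrium is defined by
$$\mathcal{D}^x_{\alpha,\lambda}(t) := \|P^t_{\alpha,\lambda}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\rm TV}. \tag{1.9}$$
This defines a non-increasing function of $t\in\mathbb{N}$. It is convenient to extend it to a monotone function of $t\in[0,\infty)$, e.g. by considering the integer part of the argument. Finally, for any $\varepsilon\in(0,1)$, the $\varepsilon$-mixing time is defined by
$$T_{\alpha,\lambda}(\varepsilon) := \inf\Big\{t\ge 0:\ \max_{x\in V}\mathcal{D}^x_{\alpha,\lambda}(t)\le\varepsilon\Big\}. \tag{1.10}$$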
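The total variation distance and the worst-case mixing time can be computed by brute force on a small chain. The sketch below is an illustration only: it assumes the transition matrix `Q` and its stationary law `pi` are given as plain Python lists, uses the standard fact that the total variation distance equals half the $\ell^1$ distance, and the names `tv`, `step` and `mixing_time` are introduced here for convenience.

```python
def tv(mu, nu):
    """Total variation distance, computed as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def step(mu, Q):
    """Push a row vector one step forward: returns mu Q."""
    n = len(Q)
    return [sum(mu[x] * Q[x][y] for x in range(n)) for y in range(n)]


def mixing_time(Q, pi, eps, t_max=10_000):
    """Smallest t such that every starting vertex is within eps of pi."""
    n = len(Q)
    # rows[x] holds the law at the current time of the chain started at x.
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv(row, pi) for row in rows) <= eps:
            return t
        rows = [step(row, Q) for row in rows]
    raise RuntimeError("no mixing within t_max steps")
```

The cost is of order $n^3$ per time step, so this is only meant for sanity checks on graphs with a few hundred vertices at most.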
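Both $\mathcal{D}^x_{\alpha,\lambda}(t)$ and $T_{\alpha,\lambda}(\varepsilon)$ are functions of the underlying graph $G$, and are therefore random variables. When $\alpha=0$, we write $\mathcal{D}^x_0(t)$ and $T_0(\varepsilon)$ for the corresponding quantities. The behaviour of the distance $\mathcal{D}^x_0(t)$ and of the mixing time $T_0(\varepsilon)$ has been thoroughly investigated in [6] for model 1 and in [7] for model 2. Let us briefly recall the main conclusions of these works. In order to simplify the exposition, we shall adopt the following unified notation. Let us define the in-degree distribution
$$\mu_{\rm in}(x) := \begin{cases} d_x^-/m & \text{for model 1},\\ \langle d\rangle/m & \text{for model 2},\end{cases} \tag{1.11}$$
where we use the notation
$$\langle d\rangle := \frac{m}{n} = \frac1n\sum_{x\in[n]} d_x^+$$
for the average degree. Note that for model 2 the distribution $\mu_{\rm in}$ represents the average in-degrees rather than the actual in-degrees. Next, let the entropy $H$ and the associated entropic time $T_{\rm ENT}$ be defined by
$$H := \sum_{x\in V}\mu_{\rm in}(x)\log d_x^+, \qquad T_{\rm ENT} := \frac{\log n}{H}. \tag{1.12}$$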
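To make the entropic time concrete, the following sketch evaluates it from the degree sequences. It assumes the definitions used in the cited works [6, 7], namely that the entropy is the average of the logarithm of the out-degree under the in-degree distribution and that the entropic time is $\log n$ divided by the entropy; the function name `entropic_time` and the convention that omitting `d_in` selects model 2 are choices made for this example.

```python
import math


def entropic_time(d_out, d_in=None):
    """Return (H, T_ENT) for the given degree sequences.

    With d_in given (model 1) the in-degree distribution is d_in[x] / m.
    With d_in omitted (model 2) every vertex has average in-degree m / n,
    so the distribution is uniform.
    """
    n = len(d_out)
    m = sum(d_out)
    if d_in is None:
        mu_in = [1.0 / n] * n
    else:
        mu_in = [d / m for d in d_in]
    H = sum(p * math.log(d) for p, d in zip(mu_in, d_out))
    return H, math.log(n) / H
```

For a graph on 1024 vertices with all in- and out-degrees equal to 2 this gives an entropy of $\log 2$ and an entropic time of 10 steps, which illustrates the logarithmic scale of the cutoff time.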
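Note that under our assumptions on $\mathbf{d}^\pm$ the deterministic quantities $H$, $T_{\rm ENT}$ satisfy $H=\Theta(1)$ and $T_{\rm ENT}=\Theta(\log n)$. The main results of [6, 7] state that, uniformly in the starting point $x$, the rescaled function $\beta\mapsto\mathcal{D}^x_0(\beta\,T_{\rm ENT})$, $\beta>0$, converges in probability as $n\to\infty$ to the step function
$$\vartheta(\beta) := \begin{cases} 1 & \text{if } \beta<1,\\ 0 & \text{if } \beta>1.\end{cases} \tag{1.13}$$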
More precisely, we may combine [6, Theorem 1] and [7, Theorem 1] to obtain the following statement.
Theorem 1 (Uniform cutoff at the entropic time [6, 7]).
Let $G$ be a random graph from either the directed configuration model DCM($\mathbf{d}^\pm$) or the out-configuration model OCM($\mathbf{d}^+$). For each $\beta>0$, $\beta\neq 1$, one has:
$$\max_{x\in V}\big|\mathcal{D}^x_0(\beta\,T_{\rm ENT})-\vartheta(\beta)\big| \overset{\mathbb{P}}{\longrightarrow} 0. \tag{1.14}$$
In (1.14) we use the notation $\overset{\mathbb{P}}{\longrightarrow}$ for convergence in probability as $n\to\infty$. In terms of mixing times, (1.14) implies in particular that for any $\varepsilon\in(0,1)$:
$$\frac{T_0(\varepsilon)}{T_{\rm ENT}} \overset{\mathbb{P}}{\longrightarrow} 1. \tag{1.15}$$
The fact that the distance to equilibrium approaches a step function, or equivalently that the $\varepsilon$-mixing time is to leading order insensitive to the value of $\varepsilon$, is commonly referred to as a cutoff phenomenon; see e.g. [14, 19] for a review. We also refer to [20, 4, 5] for similar results in the case of undirected graphs. We stress that a fundamental difference between the case of undirected graphs and the case of directed graphs considered here is that the underlying stationary distribution $\pi_0$ is not known explicitly in the directed case.
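We now formulate our main results. To obtain explicit asymptotic statements we shall assume that $\alpha=\alpha_n$ is a sequence such that $\alpha_n\to 0$ and such that the limit
$$\gamma := \lim_{n\to\infty}\alpha_n\,T_{\rm ENT} \tag{1.16}$$
exists, with possibly $\gamma=0$ or $\gamma=\infty$. We call $\mathcal{P}(V)$ the set of all probability measures on $V$.
Theorem 2.
Let $G$ be a random graph from either the directed configuration model DCM($\mathbf{d}^\pm$) or the out-configuration model OCM($\mathbf{d}^+$). Let $\alpha,\gamma$ be parameters as in (1.16). Then, according to the value of $\gamma$ there are three scenarios:
(1) If $\gamma=0$ then for all $\beta>0$, $\beta\neq 1$:
$$\max_{\lambda\in\mathcal{P}(V)}\max_{x\in V}\big|\mathcal{D}^x_{\alpha,\lambda}(\beta\,T_{\rm ENT})-\vartheta(\beta)\big| \overset{\mathbb{P}}{\longrightarrow} 0. \tag{1.17}$$
(2) If $\gamma\in(0,\infty)$ then for all $\beta>0$, $\beta\neq 1$:
$$\max_{\lambda\in\mathcal{P}(V)}\max_{x\in V}\big|\mathcal{D}^x_{\alpha,\lambda}(\beta\,T_{\rm ENT})-e^{-\gamma\beta}\,\vartheta(\beta)\big| \overset{\mathbb{P}}{\longrightarrow} 0. \tag{1.18}$$
(3) If $\gamma=\infty$ then for all $\beta>0$:
$$\max_{\lambda\in\mathcal{P}(V)}\max_{x\in V}\big|\mathcal{D}^x_{\alpha,\lambda}(\beta/\alpha)-e^{-\beta}\big| \overset{\mathbb{P}}{\longrightarrow} 0. \tag{1.19}$$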
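In terms of mixing times, Theorem 2 implies the following statements.
Corollary 3.
In the setting of Theorem 2, the following holds uniformly with respect to $\lambda\in\mathcal{P}(V)$:
(1) If $\gamma=0$ then for all $\varepsilon\in(0,1)$:
$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}} \overset{\mathbb{P}}{\longrightarrow} 1. \tag{1.20}$$
(2) If $\gamma\in(0,\infty)$, then for $\varepsilon\in(0,1)$:
$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}} \overset{\mathbb{P}}{\longrightarrow} \min\Big\{1,\ \frac{1}{\gamma}\log\frac{1}{\varepsilon}\Big\}. \tag{1.21}$$
(3) If $\gamma=\infty$ then for all $\varepsilon\in(0,1)$:
$$\alpha\,T_{\alpha,\lambda}(\varepsilon) \overset{\mathbb{P}}{\longrightarrow} \log\frac{1}{\varepsilon}. \tag{1.22}$$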
The trichotomy displayed in Theorem 2 and Corollary 3 reflects the competition between two distinct mechanisms of relaxation to equilibrium: the simple random walk dominates in the first scenario, while the $\lambda$-resampling dominates in the third; the intermediate scenario interpolates between the two extremes; see Figure 1.
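The three limiting shapes can be tabulated with a few lines of code. The sketch below is an assumption-laden illustration, not a statement of the theorem: it takes the limit of the distance at time $\beta\,T_{\rm ENT}$ to be $e^{-\gamma\beta}$ before the cutoff point and $0$ after it when $\gamma$ is finite, and the limit at time $\beta/\alpha$ to be $e^{-\beta}$ when $\gamma=\infty$. The function name `limiting_profile` is hypothetical.

```python
import math


def limiting_profile(gamma, beta):
    """Assumed limiting distance to equilibrium in the three scenarios.

    gamma finite: time is measured in units of the entropic time.
    gamma infinite: time is measured in units of 1 / alpha.
    """
    if math.isinf(gamma):
        return math.exp(-beta)            # pure exponential decay
    if beta == 1:
        raise ValueError("the profile is not determined at the cutoff point")
    if beta < 1:
        return math.exp(-gamma * beta)    # equals 1 when gamma = 0
    return 0.0                            # past the cutoff point
```

With `gamma = 0` this returns the step function of the pure cutoff regime, while intermediate values of `gamma` give an exponential decay that is abruptly truncated at `beta = 1`.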
Remarkably, essentially the same trichotomy was uncovered recently by [3] in a model of random walk on dynamically evolving undirected graphs. In that case, the role of the resampling is played by the underlying reshuffling of the graph edges. It is interesting to observe that, in contrast with the undirected case considered in [3], in our setting the two competing processes may well have very distinct goals, and the overall stationary distribution is the result of a nontrivial balance.
To give some guidelines, below we illustrate the main ideas involved in the proof.
The starting point is the observation that the distance to stationarity satisfies the following general identity at all times $t$, for all choices of the parameter $\alpha$ and distribution $\lambda$:
$$\mathcal{D}^x_{\alpha,\lambda}(t) = (1-\alpha)^t\,\big\|P^t(x,\cdot)-\pi_{\alpha,\lambda}P^t\big\|_{\rm TV}. \tag{1.23}$$
Here we use the notation $\mu P^t$ for the distribution at time $t$ of the simple random walk started at a random vertex distributed according to some distribution $\mu$. The relation (1.23) follows from a simple coupling argument; see Proposition 8 below. Moreover, the stationary distribution $\pi_{\alpha,\lambda}$ admits the power series expansion
$$\pi_{\alpha,\lambda} = \alpha\sum_{k=0}^{\infty}(1-\alpha)^k\,\lambda P^k; \tag{1.24}$$
see Proposition 7 below. A particularly simple special case is when the resampling distribution $\lambda$ equals the stationary distribution $\pi_0$. Indeed, in this case the stationary distribution is the result of a trivial balance and $\pi_{\alpha,\pi_0}=\pi_0$, so that (1.23) becomes
$$\mathcal{D}^x_{\alpha,\pi_0}(t) = (1-\alpha)^t\,\mathcal{D}^x_0(t). \tag{1.25}$$
Therefore, when $\lambda=\pi_0$ the results in Theorem 2 are an immediate consequence of Theorem 1. Moreover, this shows that the trichotomy in Theorem 2 follows from Theorem 1 whenever the distribution $\lambda$ is such that
$$\|\pi_{\alpha,\lambda}-\pi_0\|_{\rm TV} \overset{\mathbb{P}}{\longrightarrow} 0, \tag{1.26}$$
since in this case $\mathcal{D}^x_{\alpha,\lambda}(t)$ is well approximated by $(1-\alpha)^t\,\mathcal{D}^x_0(t)$, and the three claims in Theorem 2 would follow from (1.23). As we shall see, the approximation (1.26) is rather straightforward in the first scenario. Indeed, if $\gamma=0$ then the simple random walk has enough time to reach equilibrium between successive resampling events and (1.26) holds uniformly in $\lambda$, see Proposition 16 below. The second and third scenarios require a different approach since one cannot expect (1.26) to hold for all $\lambda$. There is however a special class of distributions, that we refer to as widespread, which does satisfy (1.26) in all three scenarios.
Definition 4 (Widespread measure).
A sequence of probability measures on is widespread if
(i) There exists such that
[TABLE]
(ii) Bounded -distance from the uniform distribution:
[TABLE]
Note that there is no requirement on the minimum of $\lambda$, so that large portions of the set of vertices are allowed to receive zero mass. An important property of widespread measures is that, if we start with such a distribution $\lambda$, then the time needed to reach stationarity for the simple random walk is much smaller than the entropic time $T_{\rm ENT}$. More precisely we shall establish the following facts.
Lemma 5.
Let be a random graph from either the directed configuration model DCM() or the out-configuration model OCM(). If is widespread, then for any sequence ,
[TABLE]
Moreover, in all three scenarios (1.26) holds for every widespread distribution .
The result in Lemma 5 illustrates well the mechanism behind the trichotomy in the case of widespread measures , but it is far from explaining the general phenomenon described in Theorem 2. For instance, if is a Dirac mass at a vertex , then and therefore (1.29) must fail for all , with fixed, since by Theorem 1 we know that in this case
[TABLE]
Moreover, the stationary distribution can be very far from in both scenarios 2 and 3. In particular, using our analysis in Section 3 one can check that in scenario 3,
[TABLE]
While we believe the result in Lemma 5 to be of interest in its own right, the proof of Theorem 2 will be based on a different approach.
The first observation is that the identity (1.23) together with the result of Theorem 1 is already sufficient to establish all the upper bounds on the distance $\mathcal{D}^x_{\alpha,\lambda}(t)$ required in the proof of Theorem 2, see Section 4 for the details. On the other hand, some extra work is needed for the proof of the lower bounds on $\mathcal{D}^x_{\alpha,\lambda}(t)$. A key technical point for establishing the desired lower bounds will be the following fact concerning scenarios 2 and 3.
Lemma 6.
Let be a random graph from either the directed configuration model DCM() or the out-configuration model OCM(). For fixed , including , and , for any sequence , satisfying , and :
[TABLE]
Essentially, (1.32) says that the -step evolution of the random walk starting at any given vertex is singular with respect to the evolution starting at the PageRank distribution, as soon as for some fixed . The uniformity in and in Lemma 6 is a delicate matter. We shall see that for general , if , then is a nontrivial mixture of and another measure , see Lemma 13 below for the precise version of this statement. Depending on the nature of , the measure can be either supported on a small subset of , e.g. if for some , or very spread out, e.g. if is widespread as in Definition 4. We shall however show that structural features of the random graph and the fact that imply that the measure cannot concentrate any mass on the support of the distribution and thus and are approximately singular. We refer to Section 6 for the derivation of this anti-concentration phenomenon. Since and are approximately singular for as in (1.30), this will be sufficient to prove Lemma 6.
The rest of the paper is arranged as follows: the next section establishes the basic identities (1.23) and (1.24) and some more preliminary material; Section 3 contains our main technical estimates and the proof of Lemma 6; Section 4 shows how to derive the main results from Lemma 6 and the facts established in Section 2. The discussion of widespread measures and the proof of Lemma 5 form an independent piece of work and are given in Section 5.
2. Preliminaries
Here we collect some simple general facts about the PageRank surf. The statements in this section do not depend on the graph where the original walk takes place. Therefore, we fix an arbitrary digraph $G$ with vertex set $V$, and let $P$ be the transition matrix in (1.2). If $d_x^+=0$ for some $x$ we may define $P(x,x)=1$ and $P(x,y)=0$ for all $y\neq x$.
2.1. The stationary distribution
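Proposition 7.
For any $\alpha\in(0,1)$, any probability vector $\lambda$, let $P_{\alpha,\lambda}$ be defined by (1.3). There exists a unique probability vector $\pi_{\alpha,\lambda}$ satisfying $\pi_{\alpha,\lambda}P_{\alpha,\lambda}=\pi_{\alpha,\lambda}$. Moreover, $\pi_{\alpha,\lambda}$ is given by
$$\pi_{\alpha,\lambda} = \alpha\sum_{k=0}^{\infty}(1-\alpha)^k\,\lambda P^k. \tag{2.1}$$
Proof.
The equation $\pi_{\alpha,\lambda}P_{\alpha,\lambda}=\pi_{\alpha,\lambda}$ is equivalent to
$$\pi_{\alpha,\lambda}\,\big(I-(1-\alpha)P\big) = \alpha\,\lambda.$$
Since $P$ is a stochastic matrix, the matrix $I-(1-\alpha)P$ is strictly diagonally dominant, and therefore invertible. Then (2.1) follows by expanding the expression $\pi_{\alpha,\lambda}=\alpha\,\lambda\,\big(I-(1-\alpha)P\big)^{-1}$. ∎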
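The power series representation lends itself to a direct numerical check. The sketch below assumes the expansion has the form $\alpha\sum_{k\ge 0}(1-\alpha)^k\lambda P^k$, truncates it once the geometric weight drops below a tolerance, and can be compared with the stationarity equation. The names `step` and `pagerank_series` are introduced for this example, and a strictly positive `alpha` is required for the loop to add any term.

```python
def step(mu, P):
    """Push a row vector one step forward: returns mu P."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


def pagerank_series(P, alpha, lam, tol=1e-14):
    """Truncated series alpha * sum_k (1 - alpha)^k * lam P^k."""
    pi = [0.0] * len(P)
    term = list(lam)      # holds lam P^k
    weight = alpha        # holds alpha * (1 - alpha)^k
    while weight > tol:
        pi = [a + weight * b for a, b in zip(pi, term)]
        term = step(term, P)
        weight *= 1.0 - alpha
    return pi
```

The neglected tail has total mass at most `tol / alpha`, so for small refresh probabilities the tolerance should be scaled accordingly; the number of terms needed grows like $\alpha^{-1}\log(1/\mathtt{tol})$.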
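In particular, (2.1) and the triangle inequality imply that for any other probability vector $\lambda'$:
$$\|\pi_{\alpha,\lambda}-\pi_{\alpha,\lambda'}\|_{\rm TV} \le \alpha\sum_{k=0}^{\infty}(1-\alpha)^k\,\|\lambda P^k-\lambda' P^k\|_{\rm TV}.$$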
2.2. Walk vs. teleport
A trajectory of the PageRank surf can be sampled as follows. At each time unit independently, we flip an $\alpha$-biased coin: if heads (with probability $\alpha$) then the surfer is teleported to a new vertex, chosen according to $\lambda$; if tails (with probability $1-\alpha$) then the surfer walks one step according to the transition matrix $P$. The probability associated to this construction will be denoted by $\mathbb{P}$. If $\tau_\alpha$ denotes the first time the surfer is teleported, then for all $t\in\mathbb{N}$:
$$\mathbb{P}(\tau_\alpha>t) = (1-\alpha)^t. \tag{2.3}$$
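The coin-flip construction translates directly into a simulation. In the sketch below the function name `surf`, the out-neighbour list representation and the returned pair are assumptions of this example; the first teleport time is reported as `None` when no teleport happens within the simulated horizon.

```python
import random


def surf(out_neighbours, alpha, lam, x0, t, rng=random):
    """Simulate t steps of the PageRank surf started at x0.

    Returns (position at time t, first teleport time or None).
    """
    x, tau = x0, None
    vertices = range(len(out_neighbours))
    for s in range(1, t + 1):
        if rng.random() < alpha:
            # Heads: teleport to a vertex drawn from lam.
            x = rng.choices(vertices, weights=lam)[0]
            if tau is None:
                tau = s
        else:
            # Tails: one step of the simple random walk.
            x = rng.choice(out_neighbours[x])
    return x, tau
```

Since the coins are independent, the first teleport time is geometric with parameter `alpha`, and running two copies with the same coins and the same teleport targets gives the coupling used in the proof that follows.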
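Proposition 8.
For any $\alpha\in(0,1)$, any probability vector $\lambda$, and all $x\in V$, $t\in\mathbb{N}$:
$$\mathcal{D}^x_{\alpha,\lambda}(t) = (1-\alpha)^t\,\big\|P^t(x,\cdot)-\pi_{\alpha,\lambda}P^t\big\|_{\rm TV}.$$
Proof.
We use the construction introduced above, and write $X^x_t$ for the position of the surfer at time $t$ with initial vertex $x$. By using the same sample of the teleporting distribution we couple two trajectories in such a way that $X^x_t=X^y_t$, for all $t\ge\tau_\alpha$. Therefore, letting $\mathbb{E}$ denote the expectation with respect to this coupling, for all $z\in V$:
$$P^t_{\alpha,\lambda}(x,z)-P^t_{\alpha,\lambda}(y,z) = \mathbb{E}\big[\big(\mathbf{1}(X^x_t=z)-\mathbf{1}(X^y_t=z)\big)\,\mathbf{1}(\tau_\alpha>t)\big].$$
Moreover,
$$\mathbb{E}\big[\mathbf{1}(X^x_t=z)\,\mathbf{1}(\tau_\alpha>t)\big] = \mathbb{P}(\tau_\alpha>t)\,P^t(x,z).$$
Therefore,
$$P^t_{\alpha,\lambda}(x,z)-P^t_{\alpha,\lambda}(y,z) = \mathbb{P}(\tau_\alpha>t)\,\big(P^t(x,z)-P^t(y,z)\big).$$
Multiplying by $\pi_{\alpha,\lambda}(y)$, summing over $y\in V$, and using (2.3) one obtains
$$P^t_{\alpha,\lambda}(x,z)-\pi_{\alpha,\lambda}(z) = (1-\alpha)^t\,\big(P^t(x,z)-\pi_{\alpha,\lambda}P^t(z)\big).$$
It follows that
$$\mathcal{D}^x_{\alpha,\lambda}(t) = (1-\alpha)^t\,\big\|P^t(x,\cdot)-\pi_{\alpha,\lambda}P^t\big\|_{\rm TV}.$$
∎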
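The identity obtained from the coupling can be checked numerically on a small example. The sketch below assumes the identity reads: distance to equilibrium of the PageRank surf at time $t$ equals $(1-\alpha)^t$ times the total variation distance between the simple random walk started at $x$ and the simple random walk started from the PageRank distribution. The function names are introduced for this example, and the stationary law is computed from the truncated power series, which requires a strictly positive `alpha`.

```python
def step(mu, Q):
    """Push a row vector one step forward: returns mu Q."""
    n = len(Q)
    return [sum(mu[a] * Q[a][b] for a in range(n)) for b in range(n)]


def tv(mu, nu):
    """Total variation distance, computed as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def both_sides(P, alpha, lam, x, t):
    """Return (distance of the PageRank surf, (1-alpha)^t times walk distance)."""
    n = len(P)
    Q = [[(1 - alpha) * P[a][b] + alpha * lam[b] for b in range(n)]
         for a in range(n)]
    # Stationary law of Q from the truncated power series.
    pi, term, weight = [0.0] * n, list(lam), alpha
    while weight > 1e-15:
        pi = [p + weight * q for p, q in zip(pi, term)]
        term = step(term, P)
        weight *= 1 - alpha
    surf_law = [1.0 if y == x else 0.0 for y in range(n)]
    walk_law = list(surf_law)
    pi_walk = list(pi)
    for _ in range(t):
        surf_law = step(surf_law, Q)   # PageRank surf from x
        walk_law = step(walk_law, P)   # simple random walk from x
        pi_walk = step(pi_walk, P)     # simple random walk from pi
    return tv(surf_law, pi), (1 - alpha) ** t * tv(walk_law, pi_walk)
```

The two returned numbers should agree up to floating point error for every choice of graph, starting vertex, time and resampling distribution, which is a convenient regression test for any PageRank implementation.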
Since the total variation distance is always bounded above by $1$, Proposition 8 implies the upper bound
$$\max_{x\in V}\mathcal{D}^x_{\alpha,\lambda}(t) \le (1-\alpha)^t.$$
The latter, in turn, gives the following upper bound on the mixing time.
Corollary 9.
For any , any probability vector , and all , the -mixing time (1.10) satisfies
[TABLE]
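A further immediate consequence of Proposition 8 is that if $\lambda$ is stationary for $P$, then the distance to equilibrium takes a simple form.
Corollary 10.
For any $\alpha\in(0,1)$, for all $x\in V$ and all $t\in\mathbb{N}$, if $\mu$ is a probability vector such that $\mu P=\mu$, then taking $\lambda=\mu$,
$$\mathcal{D}^x_{\alpha,\mu}(t) = (1-\alpha)^t\,\big\|P^t(x,\cdot)-\mu\big\|_{\rm TV}.$$
Proof.
From Proposition 7 it follows that $\pi_{\alpha,\mu}=\mu$, and therefore $\pi_{\alpha,\mu}P^t=\mu$ for all $t$. ∎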
Finally, another useful consequence of Proposition 8 is that it allows us to control the distance in terms of the distance , for some stationary as in Corollary 10, by means of the distance between and .
Corollary 11.
For any , all , any probability vector , if is such that ,
[TABLE]
Proof.
From the triangle inequality and the fact that is monotone in for all distributions , one has
[TABLE]
The conclusion then follows from Proposition 8 and Corollary 10. ∎
3. Main technical estimates
The goal of this section is to prove Lemma 6. The proof is divided into three main steps. The first step is a decomposition of as a mixture of and a distribution defined below. The second and most delicate step is the proof that and are approximately singular for and as in Lemma 6. The third step concludes the desired result collecting the technical estimates established in the first two steps.
3.1. Decomposition of
We start with a useful decomposition of as a mixture of and a distribution defined as follows. Fix , , and define and as
[TABLE]
Note that depends on the graph while is deterministic. We consider the case and treat the two cases and separately.
Lemma 12.
Fix and assume and . For all , there exists such that with high probability:
[TABLE]
and the normalization in (3.1) satisfies .
Proof.
Since we have . It follows that . Using Proposition 7,
[TABLE]
and therefore . ∎
Lemma 13.
Fix , and assume and . For all , there exists such that with high probability:
[TABLE]
where and are given in (3.1).
Proof.
For any , , define the probability vector
[TABLE]
Since , , and we may take small enough and assume that , . Using Proposition 7, letting denote the Dirac mass at :
[TABLE]
Note that , and . We show that the middle term above is negligible and that is well approximated by . If , by Riemann integration it follows that for all large enough
[TABLE]
for some constant . Next, using the monotonicity in time of total variation distance and Theorem 1, w.h.p.
[TABLE]
It follows that w.h.p.
[TABLE]
Writing and taking and as in (3.1) concludes the proof. ∎
3.2. Singularity of and
The key to this result is a property of the random walk that was established in [6, 7]. Roughly speaking this says that with high probability, for most vertices , the trajectory of the walk started at up to time is supported by a “small" directed tree rooted at provided that where is an arbitrary positive constant. As a result the distribution is rather strongly localized. We shall see that the distribution , depending on the nature of , could be either supported on a small subset of (e.g. if for some ) or very spread out (e.g. if is widespread). The approximate singularity of and turns out to be the result of a delicate structural property of the digraph which guarantees that even if is localized it must be sufficiently smeared out and cannot concentrate on the support of . We first recall the construction of the tree and then address the structural properties ensuring this anti-concentration.
3.2.1. The tree
Given the digraph , the tree , for fixed , can be discovered algorithmically as described in [6, Section 6.2] and [7, Section 4.1]. We recall the detailed construction for model 1. A very similar construction can be given for model 2; see [7, Section 4.1].
Below we describe a sequence of digraphs such that at each step is a subset of the out-neighborhood of of height in and such that is obtained from by adding a single edge of . Moreover, we obtain a sequence of directed trees such that for every , is a spanning tree of . The tree will be defined as .
Initially all matchings of tails and heads in are unrevealed and ; let (resp. ) denote the set of unrevealed heads (resp. tails) whose endpoint belongs to ; the height of a tail is defined as plus the number of edges in the unique path in from to the endpoint of ; the weight of is defined as
[TABLE]
where denotes the path in from to the endpoint of ; we then iterate the following steps:
- a tail is selected with maximal weight among all with and (using an arbitrary ordering of the tails to break ties);
- the head matched to in is revealed, and is obtained from by adding the edge ;
- if was not in , then its endpoint and the edge are added to to form .
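The iteration above can be sketched for a graph that is already fully known, so that "revealing" a head simply means looking up the target of an out-edge. Everything specific in this sketch is an assumption made for illustration: the weight of a tail is taken to be the product of the inverse out-degrees along the tree path from the root up to and including the vertex carrying the tail, and the height and weight thresholds are exposed as hypothetical parameters `h_max` and `w_min` rather than the values used in [6, 7].

```python
import heapq


def explore_tree(out_neighbours, root, h_max, w_min):
    """Heaviest-tail-first exploration from root.

    Returns (tree edges in order of discovery, number of revealed edges).
    """
    in_tree = {root}
    tree_edges = []
    heap = []        # entries: (-weight, tie-breaker, height, vertex, edge index)
    counter = 0      # arbitrary ordering of the tails, used to break ties

    def push_tails(v, weight_to_v, height):
        nonlocal counter
        degree = len(out_neighbours[v])
        weight = weight_to_v / degree
        if height <= h_max and weight >= w_min:
            for k in range(degree):
                heapq.heappush(heap, (-weight, counter, height, v, k))
                counter += 1

    push_tails(root, 1.0, 1)
    revealed = 0
    while heap:
        neg_weight, _, height, v, k = heapq.heappop(heap)
        y = out_neighbours[v][k]          # reveal the head matched to this tail
        revealed += 1
        if y not in in_tree:              # new vertex: extend the spanning tree
            in_tree.add(y)
            tree_edges.append((v, y))
            push_tails(y, -neg_weight, height + 1)
    return tree_edges, revealed
```

Revealed edges that lead back to an already discovered vertex are counted but not added to the tree, which mirrors the distinction between the explored subgraph and its spanning tree.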
The process stops at when there are no tails with height and weight . Note that is a directed spanning tree of at each step. The tree is defined as . After the construction of the tree , exactly edges of have been revealed, some of which may not belong to . Note that has edges and coincides with the union of all directed paths from which have length at most and at least probability with respect to the random walk started at . As in [6, Lemma 11], [7, Lemma 7], it is not difficult to see that when exploring the out-neighborhood of in this way the number , regardless of the realization of , is bounded as
[TABLE]
Let us recall the following key facts established in [6, Section 6] for model 1 and in [7, Section 4] for model 2. For every , for every , the trajectory of the random walk started at in satisfies with high probability for most initial positions . More precisely, let denote the quenched law of the random walk in started at . Let denote the set of such that is a directed tree, where , and denotes the out neighborhood of of height in (that is the subgraph of induced by the set of vertices which can be reached from with a path of length at most ). Then, from [6, Proposition 10, part (ii)], and [7, Lemma 11], one has
[TABLE]
where the notation indicates that the walk up to time traverses only edges of .
3.2.2. Key technical estimate
Let denote the set of vertices in that have distance from exactly in . Recall the definition in (3.1).
Lemma 14.
Assume . Fix and take . Then, for all , with high probability
[TABLE]
The proof of Lemma 14 is based on a structural property of the graph which says that the intersections of the trees and , where are two arbitrary vertices and , are such that, with high probability, for all , no path in can intersect more than times the set where is a suitably large constant. Let us use the notation for the set of paths in having length exactly and, for all , let denote the set of vertices along that path. Note that the endpoint of is necessarily a vertex of and , since is a tree.
Lemma 15.
Fix . For every and ,
[TABLE]
for all large enough, where . In particular, the event
[TABLE]
holds with high probability.
Proof.
We sample the pair in the random digraph by generating first the subgraph , and then the subgraph conditionally on . The construction of follows the steps described by the algorithm in Section 3.2.1 with the understanding that, for model 1 the head to be matched to the tail to form is chosen uniformly at random among all heads that are unmatched after the -th step, while for model 2 the tail has to be connected to a uniformly random vertex in . The process terminates when the tree has been fully generated after steps. A crucial feature of this construction is that the tails of all vertices are unmatched once the tree has been generated. Moreover, the number of vertices of satisfies, as in (3.11)
[TABLE]
Next, we generate the tree , conditionally on . This is done by starting at and by repeating the same steps for the construction described in Section 3.2.1 with the difference that if at step a tail is chosen which had already been matched during the generation of then the corresponding edge is included in the construction (and possibly in the tree being generated). The process terminates when the tree has been fully generated after steps. Thus, after steps we have a sample from the joint distribution of and in . Note that the total number of edges of discovered after the generation of both trees is . Let denote the filtration associated to this generation process, so that is the -field associated to the tree .
During the process generating conditionally on , we say that a bad matching occurs at step if the tail chosen at that step is currently unmatched (that is it was not revealed during the sampling of ) and it gets connected to a vertex that was already discovered in . The first key observation is that the conditional probability of a bad matching at step given is uniformly bounded above by
[TABLE]
Indeed, in the case of model 1 this probability is at most , while for model 2 this probability is at most . In either case it is less than the number defined in (3.16) for all large enough.
The second key observation is that if a path is such that then at least bad matchings have occurred during the formation of that given path. To see this, observe that after a vertex is visited for the first time during the construction of , the tails of will be all matched (at suitable steps ) to a uniformly sampled head among the ones that are currently unmatched (for model 1) or to a uniformly random vertex (for model 2). Indeed, the tails of all vertices all have the same weight and all of them are unmatched after the tree has been generated. Also, by definition, every path in can visit a given vertex at most once, and after a visit to it has to return to with a bad matching in order to visit some other . Hence, the number of visits to in a given path is at most the number of bad matchings occurred along that path +1. The extra 1 comes from the fact that could have started already inside , for instance if .
Next, consider an auxiliary directed tree with random marks defined as follows: is a directed regular tree with deterministic offspring and height , with independent and identically distributed Bernoulli() marks on its edges, where is as in (3.16). Edges whose Bernoulli mark is are colored red. A path of length from the root to one of the leaves is called bad if it has at least red edges. The previous construction then shows that the number of such that is stochastically dominated by the number of bad paths in . The probability that a given path in is bad is given by
[TABLE]
Therefore, the probability that there exists a bad path in is at most . Since , it follows that
[TABLE]
for all sufficiently large. Taking concludes the proof of (3.14). A union bound then implies that the event in the statement of the lemma holds with probability at least . ∎
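The counting step in this proof can be checked numerically. The following is a minimal Python sketch, not the authors' code: the names `offspring`, `height`, `p` and `k` are hypothetical stand-ins for the offspring number of the auxiliary tree, its height, the Bernoulli mark parameter from (3.16) and the red-edge threshold, whose actual values are the ones fixed in the proof above.

```python
from math import comb


def bad_path_probability(height, p, k):
    """Probability that a fixed root-to-leaf path of `height` edges, each
    independently red with probability p, has at least k red edges.
    This is the upper tail of a Binomial(height, p) variable."""
    return sum(
        comb(height, j) * p**j * (1 - p) ** (height - j)
        for j in range(k, height + 1)
    )


def union_bound_bad_path(offspring, height, p, k):
    """Union bound on the probability that some root-to-leaf path is bad.
    A regular tree with the given offspring number has offspring**height
    root-to-leaf paths, each bad with the same probability."""
    n_paths = offspring**height
    return min(1.0, n_paths * bad_path_probability(height, p, k))
```

The bound is only useful when the binomial tail decays faster than the number of paths grows, which is the regime the proof works in.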
Once Lemma 15 is available, we can prove Lemma 14.
Proof of Lemma 14.
The distribution satisfies , where . Hence, it is sufficient to prove that w.h.p.
[TABLE]
We write
[TABLE]
where is defined as in (3.12). As in [6, Proposition 6] one shows that for both models, with high probability:
[TABLE]
for any fixed constant . Hence, for all nonnegative integers with :
[TABLE]
Set . For any we write
[TABLE]
By (3.12), the first term in the right hand side is w.h.p. less than uniformly in , for any fixed . The second term, taking the summation over satisfies, w.h.p.
[TABLE]
where is the constant from Lemma 15 and we have used the fact that the event from Lemma 15 holds with high probability. From (3.19)-(3.21), noting that the first terms in the summation over contribute to (3.19) at most , we then obtain, w.h.p.
[TABLE]
Since and , it follows that for all fixed one has as . Since the parameters and are arbitrary this implies the desired conclusion. ∎
3.3. Proof of Lemma 6
Assume , and , with as in the statement of Lemma 6. The proof below applies to both cases and . We are going to show that for every fixed , there exists an event such that , , and such that on for all there are sets satisfying
[TABLE]
Indeed, using the decompositions in Lemma 12 (for the case ) and in Lemma 13 (for the case ), if (3.23) holds then w.h.p.
[TABLE]
where it is understood that if . Since (3.3) holds uniformly in and , this completes the proof of Lemma 6. We turn to the proof of (3.23).
It is important that the estimates in (3.23) hold for all and small enough (but fixed), where is the parameter implicit in the definition of . Since , we may assume by taking small enough. By Theorem 1 we know that for each , with high probability there exist sets , such that for all :
[TABLE]
For , take such that , and call . Since and , we have for all large enough. For all , call the subset of vertices in such that . Define
[TABLE]
From (3.12) we know that, for all , with high probability,
[TABLE]
By (3.20), (3.25) and (3.26) we obtain
[TABLE]
From (3.25) we also know that . Taking , this and (3.3) imply the last two items in (3.23) since . It remains to estimate . Since we obtain
[TABLE]
From Lemma 14 we see that with high probability, uniformly in and , (3.28) is at most for any fixed . Thus taking concludes the proof of Lemma 6.
4. Proof of the trichotomy
In this section we show how to prove Theorem 2 from the facts established above. Thus, is a random graph from either the directed configuration model DCM() or the out-configuration model OCM(), where the degree sequences satisfy the assumptions (1.6) and (1.7) respectively, and denotes the (w.h.p.) unique stationary distribution for the simple random walk on .
4.1. Scenario 1
We begin with scenario , namely when .
Proposition 16.
For any sequence such that ,
[TABLE]
Proof.
We need to show that, uniformly in , for any ,
[TABLE]
The upper bound (2.2) shows that for all :
[TABLE]
Take , with some fixed , and observe that by Theorem 1 we know that for all , for all :
[TABLE]
In particular, using :
[TABLE]
∎
The claim (1.17) is thus a consequence of Corollary 10, Corollary 11 and Theorem 1.
4.2. Scenario 3
Suppose , and for some fixed . From Lemma 6, Proposition 8 and the upper bound (2.10) we obtain:
[TABLE]
Equivalently,
[TABLE]
This proves (1.19).
4.3. Scenario 2
Here . We take , with fixed . We consider separately the case and the case .
Suppose first . By Proposition 8 and the triangle inequality
[TABLE]
Since , for some we have . Therefore, by Theorem 1 it follows that
[TABLE]
On the other hand, suppose that . Here we can apply Lemma 6, Proposition 8 and the upper bound (2.10), as in Section 4.2 above, to obtain
[TABLE]
Combining (4.9) and (4.10), we have proved (1.18).
5. Widespread measures
The goal of this section is to prove Lemma 5. We remark that the statement in probability is a consequence of (1.29). Indeed, fix any sequence , and take such that . From (1.29) we know that
[TABLE]
As in (4.3), from the upper bound (2.2) and the monotonicity in time of total variation distance to stationarity we obtain:
[TABLE]
Using (5.1) and we conclude the proof. Thus, we are left to prove (1.29).
In the special case where , and for the directed configuration model DCM(), a similar result was already obtained in [6]. Here we are going to prove it for the case of the out-configuration model OCM() as well, and more importantly we are going to extend it to the case of an arbitrary widespread probability measure . Following the approach in [6], the proof of Lemma 5 will be based on the construction of a martingale approximation for the distribution . The latter, in turn, rests on a branching approximation which allows one to couple the in-neighbourhood of a uniformly distributed random vertex of with a marked Galton-Watson tree up to depth .
We start with the definition of the relevant branching processes and the associated martingales. These will later be used in a coupling argument to provide an approximate description of the in-neighbourhood of a vertex in our random graphs, and of the stationary distribution at that vertex. Since the constructions differ slightly for the two models DCM() or OCM() we will define two distinct random trees and .
5.1. The marked Galton-Watson trees ,
Given , and a double sequence of degrees satisfying (1.5) and (1.6), for each , we define the rooted random marked tree recursively with the following rules:
- the root is given the mark ;
- every vertex with mark has children, each of which is given independently the mark with probability .
On the other hand, given , and a sequence of degrees satisfying (1.7), for each , the rooted random marked tree is defined by:
- the root is given the mark ;
- regardless of its own mark every vertex has, for each independently with probability , a child with mark .
There are several differences between the two trees and . In the first case the number of children of a given vertex is a deterministic function of the vertex’s mark, whereas in the second case it is a random variable that can be written as
[TABLE]
where the are independent Bernoulli random variables with parameters . In particular, the average number of children of any given vertex in is
[TABLE]
Since can be zero, in contrast with the tree , the tree is finite with positive probability. However, the two trees share several common features and we shall try to treat the two cases in a unified fashion as much as possible.
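The offspring mechanism of the second tree can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `child_probs` is a hypothetical list supplied by the caller, whose entry `j` plays the role of the Bernoulli parameter attached to mark `j` in the definition above, and the function names are not taken from the paper.

```python
import random


def sample_children_marks(child_probs, rng):
    """Sample the marks of the children of one vertex.
    For each mark j, independently, a child with mark j is created with
    probability child_probs[j]; the parent's own mark plays no role."""
    return [j for j, p in enumerate(child_probs) if rng.random() < p]


def expected_children(child_probs):
    """Mean number of children: a sum of independent Bernoulli variables
    has mean equal to the sum of their parameters."""
    return sum(child_probs)
```

Since the returned list can be empty, a vertex may have no children at all, which is why this tree can die out, unlike the first tree where the number of children is determined by the mark.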
We write for the root and for other vertices of the tree, with the notation if is a child of . Each vertex of the tree has a mark, which we denote by . If denotes an independent uniformly random , and the root is given the mark , then we write and . Notice that and have the same average degree at the root, given by (5.4). We often write for short if this creates no confusion. For each we let denote the set of vertices in the generation of the tree. Each vertex has a unique path connecting it to the root with and . To any such we associate the weight
[TABLE]
If coincides with the in-neighbourhood of in a digraph , then is the probability that the simple random walk on goes from to in steps.
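The random walk interpretation of the weight can be made concrete with a small Python sketch. It assumes, as the interpretation above suggests, that the weight is the product of the inverse out-degrees of the marks met along the path from the vertex up to, but excluding, the root; the helper name `path_weight` and the list `out_degrees` are hypothetical.

```python
from fractions import Fraction


def path_weight(marks_root_to_x, out_degrees):
    """Weight of the tree path whose marks, listed from the root down to
    the vertex x, are marks_root_to_x.
    Assumption: a walker at a vertex with mark j picks each of its
    out_degrees[j] out-edges with equal probability, so following the
    unique path from x to the root has probability equal to the product
    of 1/out_degrees[j] over all vertices on the path except the root."""
    weight = Fraction(1)
    for mark in marks_root_to_x[1:]:
        weight *= Fraction(1, out_degrees[mark])
    return weight
```

For instance, with `out_degrees = [2, 4, 1]` the path with marks `[0, 1, 2]` has weight `Fraction(1, 4)`, and the trivial path consisting of the root alone has weight one.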
5.2. Martingale approximation
Given a function , we define the process
[TABLE]
We write for the -algebra generated by the random tree up to and including generation .
Lemma 17.
Let be either or , and write . Then, for all :
[TABLE]
Proof.
Let the symbol denote the sum over the set of children of and note the symbolic identity
[TABLE]
Therefore,
[TABLE]
For the tree we have
[TABLE]
For the tree we have
[TABLE]
This proves (5.7). ∎
In particular, when , then
[TABLE]
Therefore, is a martingale with respect to the filtration . It is convenient to normalize it and consider instead the martingale defined as
[TABLE]
Notice that . In the case of model 1, the following convergence result was already discussed in [6, Proposition 15].
Proposition 18.
For every fixed , as the martingale converges to a limit , both almost surely and in (see [22, Ch. 12]) and for all :
[TABLE]
where the constants are given by
[TABLE]
Proof.
Consider the increments
[TABLE]
Reasoning as in Lemma 17, for both models we write
[TABLE]
where is defined as
[TABLE]
As in Lemma 17 one has . Let us compute . For the tree , we can rewrite
[TABLE]
Therefore,
[TABLE]
where we use the notation
[TABLE]
For the tree we have
[TABLE]
where is as in (5.14). Since for all with ,
[TABLE]
Therefore, combining (5.19) and (5.2) we have
[TABLE]
where are given by (5.14). Furthermore, observe that in both models one has
[TABLE]
Thus, iterating we obtain
[TABLE]
Since one has . Thus is a martingale bounded in , and therefore almost surely and in , for some . Using the orthogonality for all , (5.13) follows by summing (5.24) from to . ∎
Remark 19.
For each fixed , one can characterise the random variable as the solution to a distributional fixed point equation. For the directed configuration model DCM() this is discussed in [6, Lemma 16]. With a similar reasoning, for the out-configuration model OCM() one obtains that
[TABLE]
where stands for equality of distributions, are i.i.d. copies of and are independent Bernoulli random variables with parameter .
The next result will be crucial for the analysis of widespread measures. Notice that the constant appearing in the estimate below is bounded uniformly in if and only if satisfies (1.28).
Proposition 20.
For any probability vector , and any :
[TABLE]
where is as in Proposition 18 and is defined as
[TABLE]
Proof.
Setting , we write . Since , Lemma 17 shows that . We now compute
[TABLE]
Using one has
[TABLE]
For the tree we have
[TABLE]
On the other hand for the tree we have
[TABLE]
Summarising, we have shown that
[TABLE]
Thus, the same argument used in (5.2) implies that in both models
[TABLE]
where is defined as in (5.14). Therefore,
[TABLE]
The desired bound follows from the fact that in both models . ∎
5.3. Branching approximation for in-neighbourhoods
The -in-neighbourhood of a vertex , denoted , is defined as the subgraph of induced by the set of directed paths of length in which terminate at vertex . Here we observe that for any fixed , if is a small multiple of then with high probability can be coupled to the first generations of the random trees defined in Section 5.1. We consider the two models separately.
5.3.1. for DCM()
Recall that each vertex has heads and tails. Call and the sets of heads and tails at respectively. The uniform bijection between heads and tails, viewed as a matching, can be sampled by iterating the following steps until there are no unmatched heads left:
1) pick an unmatched head according to some priority rule;
2) pick an unmatched tail uniformly at random;
3) match with , i.e. set .
Note that this gives the desired uniform distribution over matchings regardless of the priority rule chosen at step 1. The graph is obtained by adding a directed edge whenever and in step 3 above.
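The sequential matching above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes that heads are in-stubs and tails are out-stubs, so that matching a tail of `x` with a head of `y` creates the edge `(x, y)`; the function name and the choice of processing heads in vertex order as the priority rule are hypothetical.

```python
import random


def sample_dcm_edges(in_degrees, out_degrees, rng):
    """Sample a digraph with prescribed degrees by sequential matching.
    Vertex v owns in_degrees[v] heads and out_degrees[v] tails."""
    heads = [v for v, d in enumerate(in_degrees) for _ in range(d)]
    tails = [v for v, d in enumerate(out_degrees) for _ in range(d)]
    if len(heads) != len(tails):
        raise ValueError("total in-degree must equal total out-degree")
    edges = []
    # Step 1: the priority rule here is simply the order of the list of heads.
    for head_owner in heads:
        # Step 2: pick a uniformly random tail among the unmatched ones.
        i = rng.randrange(len(tails))
        tails[i], tails[-1] = tails[-1], tails[i]
        tail_owner = tails.pop()
        # Step 3: match the tail with the head, creating one directed edge.
        edges.append((tail_owner, head_owner))
    return edges
```

Any other priority rule in step 1, such as the distance-based rule used to explore an in-neighbourhood, gives the same law for the resulting matching, because each tail is always drawn uniformly from those still unmatched.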
To generate only, one can start at vertex and run the previous sequence of steps, by giving priority to those unmatched heads which have minimal distance from vertex , until this minimal distance exceeds , at which point the process stops. During the process, say that a vertex is exposed if at least one of the tails or heads has been already matched. Notice that as long as in step 2 no tail is picked from exposed vertices, the resulting digraph is a directed tree.
Let us now describe a coupling of the in-neighbourhood and the marked tree , where stands for the marked tree up to generation ; see Section 5.1 for the definition of . Clearly, step 2 above can be modified by picking uniformly at random among all (matched or unmatched) tails and rejecting the proposal if the tail was already matched. The tree can then be generated by iteration of the same sequence of steps with the difference that at step 2 we never reject the proposal and at step 3 we add a new leaf to the current tree, with mark if , together with a new set of unmatched heads attached to it. Call the first time that a uniform random choice among all tails gives with already in the tree. By construction, the in-neighbourhood and the tree coincide up to time . At the -th iteration, the probability of picking a tail with a mark already used is at most , where is the maximum degree. Therefore, by a union bound,
[TABLE]
Taking steps, we have necessarily uncovered the whole in-neighbourhood . Thus, we have proved the following statement.
Lemma 21.
The -in-neighbourhood and the marked tree can be coupled in such a way that
[TABLE]
5.3.2. for OCM()
Recall that each vertex has tails, and call the sets of tails at . Consider the following exploration process of the in-neighbourhood at a fixed vertex . The process is defined as a triple where are respectively the completed set and the active set at time , and is a map such that for each , . At time zero we set , and for all . The -th iteration of the exploration determines the triple by executing the following steps:
1) pick a vertex according to some priority rule;
2) for each independently, sample defined as the Bernoulli random variable with parameter
[TABLE]
call the set of such that , and define
[TABLE]
3) define the new triple as
[TABLE]
Note that this process stops when becomes empty. Let us call this random time:
[TABLE]
For instance, with probability . We may construct a digraph along with the above process by adding the directed edges for all at step . Notice that when the process stops is a sample of the subgraph of induced by all directed paths in that terminate at . In particular, if the priority in step 1 is given to which have minimal distance to , and if we stop the process as soon as all active vertices have distance to larger than in the current graph , we obtain the in-neighbourhood of at distance , namely the digraph for the model OCM(). More formally, if denotes the minimal such that all have distance to at least in then, is given by the subgraph of induced by the completed set , where denotes the minimum of .
Let us remark that the quantity in (5.36) cannot exceed . In fact, in case there exists some such that then it means that at most vertices need to be discovered at step , and vertex needs to link to all of them. Hence, stays up to the end of the process.
Let us now describe a coupling of and the marked tree , where we write for the marked tree up to generation ; see Section 5.1. First, observe that the tree is obtained by iterating the steps above with the difference that at step 2 the probability must be taken always equal to , and that each yields a new child with mark in the current tree. Let denote the tree obtained after iterations, and let .
Lemma 22.
The random variables can be coupled in such a way that for every :
[TABLE]
Proof.
Let . Since at time 0 one has , the event satisfies , so that
[TABLE]
Consider now the -th iteration, and assume that . Thus, we may pick the same in step 1 for both samples. At step 2, let denote the Bernoulli random variables with parameter used for the sampling of and let be the Bernoulli random variables with parameter used for the sampling of . The total variation distance between two Bernoulli random variables equals the absolute value of the difference of their parameters. Therefore, for each independently we may couple with probability . Notice that if , then either at least one of the pairs fails to couple, or at least one of the has . Thus, on the event , the probability of given the history up to the -th iteration is bounded above by
[TABLE]
If , then and . For the second term we write , where denotes the number of edges in the tree . In conclusion, (5.41) is bounded by
[TABLE]
Thus, letting denote the -algebra generated by the two processes up to time , we have obtained
[TABLE]
From (5.4) we deduce . Therefore, the estimate (5.39) follows from (5.40) and (5.3.2). ∎
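The coupling of Bernoulli variables used in this proof can be illustrated in Python. The sketch below is a standard construction, not code from the paper: both variables are read off the same uniform number `u`, so they disagree exactly when `u` falls between the two parameters. The function names are hypothetical.

```python
def bernoulli_tv(p, q):
    """Total variation distance between Bernoulli(p) and Bernoulli(q),
    computed as half the L1 distance between the two probability vectors.
    It simplifies to abs(p - q)."""
    return 0.5 * (abs(p - q) + abs((1 - p) - (1 - q)))


def couple_bernoullis(u, p, q):
    """Given u uniform on [0, 1), return a pair (X, Y) with X ~ Bernoulli(p)
    and Y ~ Bernoulli(q). The two values differ only when u lies between
    min(p, q) and max(p, q), an event of probability abs(p - q)."""
    return int(u < p), int(u < q)
```

Because the disagreement probability equals the total variation distance, no coupling of the two variables can do better, which is what makes the bound in the proof sharp at this step.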
The next lemma establishes the coupling estimate for the -in-neighbourhood and the tree . The estimate could be refined but (5.43) below will be more than sufficient for our purposes.
Lemma 23.
The random variables and the tree can be coupled in such a way that for every , for all large enough:
[TABLE]
Proof.
Let denote the number of edges in the tree . Since at each iteration the number of edges added is stochastically dominated by a binomial random variable with parameters and , one has a large deviation bound for of the form: there exist absolute constants such that
[TABLE]
The estimate (5.44) can be proved e.g. by repeating the argument in [8, Lemma 23]. Next, observe that if and , then there must exist such that . The latter probability can be bounded via Lemma 22. Summarizing,
[TABLE]
The estimate (5.43) follows by taking for some large enough constant , and by taking sufficiently large. ∎
5.4. Proof of Lemma 5
Recall that in both models DCM() and OCM() one has w.h.p. a unique stationary distribution for the simple random walk on , which we denote . The starting point is a result that follows directly from [6, 7], which allows us to replace the unknown distribution with a local approximation.
Proposition 24.
For any fixed , taking , as both models satisfy
[TABLE]
Proof.
For a specific choice of , this result appears in [6, Eq. (11)] for model 1 and [7, Eq. (12)] for model 2. In fact, the proofs in [6, 7] apply to any fixed without modifications. Since is monotone in the statement (5.46) holds for all . ∎
To prove Lemma 5, by monotonicity of as a function of , we may restrict to sequences with . Thus, taking advantage of Proposition 24, the conclusion of Lemma 5 is a consequence of the following result.
Proposition 25.
There exists such that if , then for any with , for any widespread measure :
[TABLE]
Proof.
The proof is based on a first moment argument. Indeed, it suffices to show that
[TABLE]
Observe that
[TABLE]
where denotes an independent uniformly random vertex in and the expectation is understood to include the expectation over as well. Consider the first term above. We are going to use Lemma 21 for model 1 and Lemma 23 for model 2. Notice that since these estimates apply to any fixed vertex , they apply just as well if the vertex is taken to be uniformly random in , i.e. if , as is the case here. In particular, since , as ,
[TABLE]
where we use the unified notation for the first generations of the tree in either model 1 or model 2. Next, note that by definition, if , then
[TABLE]
where we use the notation from (5.6) and (5.12). Therefore,
[TABLE]
where we used the fact that
[TABLE]
which follows from . Using Schwarz’ inequality and Proposition 20 it follows that
[TABLE]
Since as and , using (5.50) we conclude that
[TABLE]
for every widespread measure . This settles the convergence of the first term in (5.4). To handle the second term, reasoning as in (5.51) we obtain
[TABLE]
If , Lemma 21 and Lemma 23 imply that both models satisfy
[TABLE]
Moreover, Schwarz’ inequality, Proposition 18 and standard facts about square integrable martingales (see, e.g., [22, Ch. 12]) imply
[TABLE]
Since the constant is bounded, letting concludes the proof.
∎
Acknowledgments
We acknowledge support of PRIN 2015 5PAWZB “Large Scale Random Structures”, and of INdAM-GNAMPA Project 2019 “Markov chains and games on networks”.
- [1] Louigi Addario-Berry, Borja Balle, and Guillem Perarnau. Diameter and stationary distribution of random r-out digraphs. The Electronic Journal of Combinatorics, pages P 3–28, 2020.
- [2] Reid Andersen, Fan Chung, and Kevin Lang. Local graph partitioning using PageRank vectors. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 475–486. IEEE, 2006.
- [3] Luca Avena, Hakan Güldas, Remco van der Hofstad, and Frank den Hollander. Random walks on dynamic configuration models: a trichotomy. Stochastic Processes and their Applications, 2018.
- [4] Anna Ben-Hamou and Justin Salez. Cutoff for nonbacktracking random walks on sparse random graphs. The Annals of Probability, 45(3):1752–1770, 2017.
- [5] Nathanael Berestycki, Eyal Lubetzky, Yuval Peres, and Allan Sly. Random walks on the random graph. The Annals of Probability, 46(1):456–490, 2018.
- [6] Charles Bordenave, Pietro Caputo, and Justin Salez. Random walk on sparse random digraphs. Probability Theory and Related Fields, 170(3):933–960, Apr 2018.
- [7] Charles Bordenave, Pietro Caputo, and Justin Salez. Cutoff at the “entropic time” for sparse Markov chains. Probability Theory and Related Fields, 173(1):261–292, Feb 2019.
- [8] Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Nonbacktracking spectrum of random graphs: Community detection and nonregular Ramanujan graphs. The Annals of Probability, 46(1):1–71, Jan 2018.
