Mixing time of PageRank surfers on sparse random digraphs

Pietro Caputo; Matteo Quattropani

arXiv:1905.04993·math.PR·February 2, 2021·Random Struct. Algorithms

TL;DR

This paper analyzes how the PageRank random walk converges to equilibrium on large sparse random directed graphs, revealing a trichotomy in behavior depending on the refresh probability relative to the graph's mixing time.

Contribution

It identifies a universal three-regime behavior of PageRank convergence on sparse random digraphs, depending on the refresh probability and the graph's mixing time.

Findings

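01

When the refresh probability is much smaller than the inverse of the graph's mixing time, convergence displays a cutoff, as for the simple random walk.

02

When the refresh probability is much larger than the inverse of the mixing time (while still vanishing), convergence is a pure exponential decay with rate equal to the refresh probability.

03

When the refresh probability is comparable to the inverse of the mixing time, the behaviour interpolates between cutoff and exponential decay.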

Abstract

We consider the generalised PageRank walk on a digraph $G$ , with refresh probability $α$ and resampling distribution $λ$ . We analyse convergence to stationarity when $G$ is a large sparse random digraph with given degree sequences, in the limit of vanishing $α$ . We identify three scenarios: when $α$ is much smaller than the inverse of the mixing time of $G$ the relaxation to equilibrium is dominated by the simple random walk and displays a cutoff behaviour; when $α$ is much larger than the inverse of the mixing time of $G$ on the contrary one has pure exponential decay with rate $α$ ; when $α$ is comparable to the inverse of the mixing time of $G$ there is a mixed behaviour interpolating between cutoff and exponential decay. This trichotomy is shown to hold uniformly in the starting point and uniformly in the resampling distribution $λ$ .

Equations

$$P_{\alpha}(x,y)=(1-\alpha)P(x,y)+\frac{\alpha}{n}, \tag{1.1}$$

$$P(x,y)=\begin{cases}1/d_{x}^{+} & \text{if }(x,y)\in E\\ 0 & \text{otherwise}\end{cases} \tag{1.2}$$

$$P_{\alpha,\lambda}(x,y)=(1-\alpha)P(x,y)+\alpha\lambda(y). \tag{1.3}$$

$$\pi_{\alpha,\lambda}(y)=\sum_{x\in V}\pi_{\alpha,\lambda}(x)P_{\alpha,\lambda}(x,y), \tag{1.4}$$

$$m=\sum_{x\in V}d_{x}^{+}=\sum_{x\in V}d_{x}^{-}. \tag{1.5}$$

$$\min_{x\in[n]}d_{x}^{-}\wedge d_{x}^{+}\geq 2,\qquad\max_{x\in[n]}d_{x}^{-}\vee d_{x}^{+}=O(1). \tag{1.6}$$

$$\min_{x\in[n]}d_{x}^{+}\geq 2,\qquad\max_{x\in[n]}d_{x}^{+}=O(1), \tag{1.7}$$

$$\|\mu-\nu\|_{\texttt{TV}}=\max_{E}|\mu(E)-\nu(E)|, \tag{1.8}$$

$$\mathcal{D}_{\alpha,\lambda}^{x}(t)=\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}. \tag{1.9}$$

$$T_{\alpha,\lambda}(\varepsilon)=\inf\Big\{t\geq 0:\ \max_{x\in V}\mathcal{D}_{\alpha,\lambda}^{x}(t)\leq\varepsilon\Big\}. \tag{1.10}$$

$$\mu_{\rm in}(x)=\frac{1}{n}\times\begin{cases}d_{x}^{-}/\langle d\rangle & \text{model 1}\\ 1 & \text{model 2}\end{cases}$$

$$\langle d\rangle=\frac{1}{n}\sum_{x\in V}d_{x}^{-}=\frac{m}{n}$$

$$H=\sum_{x\in V}\mu_{\rm in}(x)\log d_{x}^{+},\qquad T_{\rm ENT}=\frac{\log n}{H}.$$

$$\vartheta(s)=\begin{cases}1 & \text{if }s<1\\ 0 & \text{if }s>1.\end{cases}$$

$$\max_{x\in[n]}\left|\mathcal{D}_{0}^{x}(s\,T_{\rm ENT})-\vartheta(s)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.14}$$

$$\frac{T_{0}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}1, \tag{1.15}$$

$$\gamma=\lim_{n\to\infty}\alpha\,T_{\rm ENT}\in[0,\infty] \tag{1.16}$$

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s\,T_{\rm ENT})-\vartheta(s)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.17}$$

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s/\alpha)-e^{-s}\vartheta(s/\gamma)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.18}$$

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s/\alpha)-e^{-s}\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.19}$$

$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}1, \tag{1.20}$$

$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}\begin{cases}1 & \text{if }\varepsilon\in(0,e^{-\gamma})\\ \frac{1}{\gamma}\log(1/\varepsilon) & \text{if }\varepsilon\in[e^{-\gamma},1).\end{cases} \tag{1.21}$$

$$\alpha\,T_{\alpha,\lambda}(\varepsilon)\overset{\mathbb{P}}{\longrightarrow}\log(1/\varepsilon). \tag{1.22}$$

$$\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}=(1-\alpha)^{t}\,\|P^{t}(x,\cdot)-\pi_{\alpha,\lambda}P^{t}\|_{\texttt{TV}}. \tag{1.23}$$

$$\pi_{\alpha,\lambda}=\alpha\sum_{k=0}^{\infty}(1-\alpha)^{k}\lambda P^{k}, \tag{1.24}$$

$$\mathcal{D}_{\alpha,\pi_{0}}^{x}(t)=(1-\alpha)^{t}\,\mathcal{D}_{0}^{x}(t). \tag{1.25}$$

$$\|\pi_{\alpha,\lambda}-\pi_{0}\|_{\texttt{TV}}\overset{\mathbb{P}}{\longrightarrow}0, \tag{1.26}$$

$$|\lambda|_{\infty}=\max_{x\in[n]}\lambda(x)=O(n^{-1/2-\delta}). \tag{1.27}$$

$$\frac{1}{n}\sum_{j\in[n]}(1-n\lambda(j))^{2}=O(1). \tag{1.28}$$

$$\|\lambda P^{t}-\pi_{0}\|_{\texttt{TV}}\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.29}$$

Full text

Mixing time of PageRank surfers on sparse random digraphs

Pietro Caputo*♯*

♯ Dipartimento di Matematica e Fisica, Università Roma Tre, Largo Murialdo 1, 00146 Roma, Italy.

and

Matteo Quattropani*♭*

♭ Dipartimento di Matematica e Fisica, Università Roma Tre, Largo Murialdo 1, 00146 Roma, Italy.

Abstract.

We consider the generalised PageRank walk on a digraph $G$ , with refresh probability $\alpha$ and resampling distribution $\lambda$ . We analyse convergence to stationarity when $G$ is a large sparse random digraph with given degree sequences, in the limit of vanishing $\alpha$ . We identify three scenarios: when $\alpha$ is much smaller than the inverse of the mixing time of $G$ the relaxation to equilibrium is dominated by the simple random walk and displays a cutoff behaviour; when $\alpha$ is much larger than the inverse of the mixing time of $G$ on the contrary one has pure exponential decay with rate $\alpha$ ; when $\alpha$ is comparable to the inverse of the mixing time of $G$ there is a mixed behaviour interpolating between cutoff and exponential decay. This trichotomy is shown to hold uniformly in the starting point and uniformly in the resampling distribution $\lambda$ .

Key words and phrases:

PageRank, random digraphs, non-reversible Markov chain, mixing time, random walks on networks.

2010 Mathematics Subject Classification:

Primary: 05C81, 60J10, 60C05. Secondary: 60G42

1. Introduction and results

Given a directed graph $G=(V,E)$ and a parameter $\alpha\in(0,1)$ , the PageRank surf on $G$ with damping factor $1-\alpha$ is the Markov chain with state space $V$ and transition probabilities given by

$$P_{\alpha}(x,y)=(1-\alpha)P(x,y)+\frac{\alpha}{n}, \tag{1.1}$$

where $n=|V|$ is the number of vertices of $G$ , and, writing $d_{x}^{+}$ for the out-degree of vertex $x$ ,

$$P(x,y)=\begin{cases}1/d_{x}^{+} & \text{if }(x,y)\in E\\ 0 & \text{otherwise}\end{cases} \tag{1.2}$$

denotes the transition matrix of the simple random walk on $G$. The interpretation is that of a surfer who at each step, with probability $1-\alpha$, moves to a vertex chosen uniformly at random among the out-neighbours of its current state, and with probability $\alpha$ moves to a uniformly random vertex in $V$. The surfer eventually reaches a stationary distribution $\pi_{\alpha}$ over $V$, called the PageRank of $G$. Since its introduction by Brin and Page in the seminal paper [10], PageRank has played a fundamental role in the ranking functions of all major search engines; see e.g. [15, 17]. A common generalization is the so-called customised or generalised PageRank, where the uniform resampling is replaced by an arbitrary probability distribution $\lambda$ over $V$, so that (1.1) becomes

$$P_{\alpha,\lambda}(x,y)=(1-\alpha)P(x,y)+\alpha\lambda(y). \tag{1.3}$$

The resulting stationary distribution $\pi_{\alpha,\lambda}$ , characterised by the equation

$$\pi_{\alpha,\lambda}(y)=\sum_{x\in V}\pi_{\alpha,\lambda}(x)P_{\alpha,\lambda}(x,y), \tag{1.4}$$

depends in a nontrivial way on the parameter $\alpha$ and the distribution $\lambda$ . There have been several investigations of the structural properties of $\pi_{\alpha,\lambda}$ ; see e.g. [18, 2, 9]; we refer in particular to the recent works [11, 16, 21] for cases where the graph $G$ is drawn from the configuration model. Here we focus on the dynamical problem of determining the time needed for the surfer to reach the equilibrium distribution $\pi_{\alpha,\lambda}$ , namely we study the mixing time of the Markov chain with transition matrix $P_{\alpha,\lambda}$ . In the case $\alpha=0$ , this corresponds to the classical question of determining the mixing time of the simple random walk on the graph $G$ ; see e.g. [19]. Even for graphs where the latter is well understood, it is in general not immediate to deduce the influence of the parameter $\alpha$ and of the resampling distribution $\lambda$ on the speed of convergence to equilibrium.
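As a concrete illustration of the definitions above, the following minimal Python sketch builds the simple random walk matrix $P$, the generalised PageRank matrix $P_{\alpha,\lambda}$, and an approximation of $\pi_{\alpha,\lambda}$. The toy graph, the values of `alpha` and `lam`, and all function names are illustrative assumptions, not taken from the paper; the stationary vector is approximated by plain power iteration, which is one convenient choice among many.

```python
def simple_walk_matrix(out_neighbours):
    """Transition matrix P of the simple random walk.

    out_neighbours[x] lists the heads of the edges leaving x; repeated
    entries are treated as multiple edges.
    """
    n = len(out_neighbours)
    P = [[0.0] * n for _ in range(n)]
    for x, nbrs in enumerate(out_neighbours):
        for y in nbrs:
            P[x][y] += 1.0 / len(nbrs)
    return P


def pagerank_matrix(P, alpha, lam):
    """P_{alpha,lambda}(x, y) = (1 - alpha) P(x, y) + alpha * lambda(y)."""
    n = len(P)
    return [[(1 - alpha) * P[x][y] + alpha * lam[y] for y in range(n)]
            for x in range(n)]


def push_forward(mu, M):
    """Row vector times matrix: (mu M)(y) = sum_x mu(x) M(x, y)."""
    n = len(M)
    return [sum(mu[x] * M[x][y] for x in range(n)) for y in range(n)]


def stationary(M, iterations=500):
    """Approximate the stationary distribution by power iteration."""
    pi = [1.0 / len(M)] * len(M)
    for _ in range(iterations):
        pi = push_forward(pi, M)
    return pi


# Hypothetical toy digraph on 4 vertices, every out-degree equal to 2.
graph = [[1, 2], [2, 3], [3, 0], [0, 1]]
alpha = 0.15
lam = [0.7, 0.1, 0.1, 0.1]   # arbitrary resampling distribution (assumption)

P = simple_walk_matrix(graph)
M = pagerank_matrix(P, alpha, lam)
pi = stationary(M)
```

Since $\alpha>0$, the iteration contracts at rate $1-\alpha$ in total variation, so a few hundred iterations are ample for this small example.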

It is intuitively reasonable to guess that if the parameter $\alpha$ is suitably large compared to the inverse of the mixing time of the graph $G$, then the time to reach stationarity will essentially be the time needed to make the first $\lambda$-resampling transition, which is a geometric random variable with parameter $\alpha$. If instead $\alpha$ is suitably small compared to the inverse of the mixing time of $G$, then one should reach stationarity well before the first $\lambda$-resampling, so that the speed of convergence to equilibrium will essentially be that of the simple random walk on $G$. Moreover, one could expect that when $\alpha$ is neither too small nor too large compared to the inverse of the mixing time of $G$, some interpolation between the two opposite behaviours should take place. In this paper we substantiate this intuitive picture for a large class of sparse directed graphs. The results hold uniformly in the initial position and uniformly in the resampling distribution $\lambda$.

1.1. Two models of sparse digraphs

We shall consider two families of directed graphs. Both are obtained via the so-called configuration model, with the difference that in the first case we fix both in and out degrees, while in the second case we only fix the out degrees. The models are sparse in that the degrees are bounded. We now proceed with the formal definition.

Let $V$ be a set of $n$ vertices. For simplicity we often write $V=[n]$ , with $[n]=\{1,\dots,n\}$ . For each $n$ , we are given two finite sequences $\mathbf{d}^{+}=(d_{x}^{+})_{x\in[n]}$ and $\mathbf{d}^{-}=(d_{x}^{-})_{x\in[n]}$ of non negative integers such that

$$m=\sum_{x\in V}d_{x}^{+}=\sum_{x\in V}d_{x}^{-}. \tag{1.5}$$

The directed configuration model DCM($\mathbf{d}^{\pm}$) is the distribution of the random graph $G$ obtained as follows: 1) equip each node $x$ with $d_{x}^{+}$ tails and $d_{x}^{-}$ heads; 2) pick uniformly at random one of the $m!$ bijective maps from the set of all tails into the set of all heads, call it $\omega$; 3) for all $x,y\in V$, add a directed edge $(x,y)$ every time a tail from $x$ is mapped into a head from $y$ through $\omega$. The resulting graph $G$ may have self-loops and multiple edges. However, it is classical that by conditioning on the event that there are no multiple edges and no self-loops one obtains a uniformly random simple digraph with in-degree sequence $\mathbf{d}^{-}$ and out-degree sequence $\mathbf{d}^{+}$.
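The three-step construction above can be sketched in Python as follows. This is a minimal illustration, assuming the graph is stored as a list of out-neighbour lists; the function name `sample_dcm` and the example degree sequences are hypothetical. Shuffling the list of heads while keeping the tails in a fixed order is one way of drawing a uniformly random bijection from tails to heads.

```python
import random


def sample_dcm(d_out, d_in, rng):
    """Sample out-neighbour lists from the directed configuration model."""
    if sum(d_out) != sum(d_in):
        raise ValueError("out-degrees and in-degrees must have the same sum m")
    # One entry per tail (resp. head), labelled by the vertex that owns it.
    tails = [x for x, d in enumerate(d_out) for _ in range(d)]
    heads = [y for y, d in enumerate(d_in) for _ in range(d)]
    rng.shuffle(heads)                       # uniform random matching
    out_neighbours = [[] for _ in d_out]
    for x, y in zip(tails, heads):
        out_neighbours[x].append(y)          # self-loops and multi-edges allowed
    return out_neighbours


rng = random.Random(0)                       # fixed seed, for reproducibility only
d_out = [2, 3, 2, 3]                         # illustrative degree sequences
d_in = [3, 2, 3, 2]
G = sample_dcm(d_out, d_in, rng)
in_degrees = [sum(nbrs.count(y) for nbrs in G) for y in range(len(d_in))]
```

Rejecting samples that contain self-loops or multiple edges would give a uniformly random simple digraph with the prescribed degrees, as recalled in the text.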

Structural properties of random graphs obtained in this way have been extensively studied in [13]. Here we shall consider the sparse case corresponding to bounded degree sequences. Moreover, in order to avoid non irreducibility issues, we shall assume that all degrees are at least $2$ . Thus, throughout this work it will always be assumed that

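$$\min_{x\in[n]}d_{x}^{-}\wedge d_{x}^{+}\geq 2,\qquad\max_{x\in[n]}d_{x}^{-}\vee d_{x}^{+}=O(1). \tag{1.6}$$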

We often use the notation $\Delta=\max_{x\in[n]}d_{x}^{-}\vee d_{x}^{+}$ . Under the first assumption it is known that DCM( $\mathbf{d}^{\pm}$ ) is strongly connected with high probability; see e.g. [13]. Under the second assumption, it is known that DCM( $\mathbf{d}^{\pm}$ ) has a uniformly (in $n$ ) positive probability of having no self-loops nor multiple edges; see e.g. [12]. In particular, any property that holds with high probability for DCM( $\mathbf{d}^{\pm}$ ) will also hold with high probability for a uniformly chosen simple digraph subject to the constraint that in and out degrees be given by $\mathbf{d}^{-}$ and $\mathbf{d}^{+}$ respectively. Here and throughout the rest of the paper we say that a property holds with high probability (w.h.p. for short) if the probability of the corresponding event converges to $1$ as $n\to\infty$ . In particular, it follows that w.h.p. there exists a unique stationary distribution $\pi_{0}$ for the simple random walk on $G$ . Several properties of $\pi_{0}$ have been established recently in [6], where it was shown, among other facts, that $\pi_{0}$ can be described in terms of recursive distributional equations determined by the sequences $\mathbf{d}^{\pm}$ .

To define the second model, for each $n$ let $\mathbf{d}^{+}=(d_{x}^{+})_{x\in[n]}$ be a finite sequence of non negative integers and define the out-configuration model OCM($\mathbf{d}^{+}$) as the distribution of the random graph $G$ obtained as follows: 1) equip each node $x$ with $d_{x}^{+}$ tails; 2) pick, for every $x$ independently, a uniformly random injective map from the set of tails at $x$ to the set of all vertices $V$, call it $\omega_{x}$; 3) for all $x,y\in V$, add a directed edge $(x,y)$ if a tail from $x$ is mapped into $y$ through $\omega_{x}$. Equivalently, $G$ is the graph whose adjacency matrix is uniformly random in the set of all $n\times n$ matrices with entries $0$ or $1$ such that every row $x$ sums to $d_{x}^{+}$. Notice that $G$ may have self-loops, but there are no multiple edges in this construction. This is due to the requirement that the maps $\omega_{x}$ be injective. The latter choice is only a matter of convenience, and everything we say below is actually seen to hold as well for the model obtained by dropping that requirement. We write $\omega=(\omega_{x})_{x\in[n]}$ for the collection of maps. As before we shall make the assumptions

$$\min_{x\in[n]}d_{x}^{+}\geq 2,\qquad\max_{x\in[n]}d_{x}^{+}=O(1), \tag{1.7}$$

and use the notation $\Delta=\max_{x\in[n]}d_{x}^{+}$ . We remark that under the above assumptions there can still be vertices with in-degree zero, and therefore in this case $G$ is not necessarily strongly connected. However, it is still possible to show that w.h.p. there exists a unique stationary distribution $\pi_{0}$ for the simple random walk on $G$ ; see e.g. [1, 7] for more details.
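A minimal Python sketch of the out-configuration model follows, under the assumption that the graph is stored as a list of out-neighbour lists; the name `sample_ocm` and the example sizes are hypothetical. Drawing $d_{x}^{+}$ distinct vertices for each $x$ independently plays the role of the injective maps $\omega_{x}$, so the sample can contain self-loops but no multiple edges.

```python
import random


def sample_ocm(d_out, rng):
    """Sample out-neighbour lists from the out-configuration model."""
    n = len(d_out)
    # rng.sample draws without replacement, i.e. an injective choice of heads.
    return [rng.sample(range(n), d) for d in d_out]


rng = random.Random(0)                 # fixed seed, for reproducibility only
d_out = [2, 3, 2, 3, 2, 2]             # illustrative out-degree sequence
G = sample_ocm(d_out, rng)

# Vertices with in-degree zero may occur, as remarked in the text.
targets = {y for nbrs in G for y in nbrs}
sources_only = [y for y in range(len(d_out)) if y not in targets]
```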

In what follows $G=G(\omega)$ denotes a given realization of either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ) and all the results to be discussed will hold w.h.p. within these two ensembles. For the sake of simplicity we often refer to these as model 1 and model 2 respectively.

1.2. Main results

Let $P$ denote the transition matrix of the simple random walk on $G$ . When $G$ is a digraph without multiple edges this is given by (1.2). If $G$ has multiple edges, $P(x,y)$ is defined as $m(x,y)/d^{+}_{x}$ where $m(x,y)$ denotes the number of directed edges from $x$ to $y$ . For any $\alpha\in(0,1)$ and any resampling distribution $\lambda$ , let $P_{\alpha,\lambda}$ denote the PageRank transition matrix defined in (1.3). Notice that as soon as $\alpha>0$ , regardless of the realization of the graph $G$ and of the chosen distribution $\lambda$ , there exists a unique stationary distribution $\pi_{\alpha,\lambda}$ on $V$ . Indeed, the transition matrix $P_{\alpha,\lambda}$ satisfies the so-called Doeblin condition if $\alpha>0$ ; see Proposition 7 below for an explicit expression of $\pi_{\alpha,\lambda}$ . Convergence to equilibrium will be quantified using the total variation distance. For two probability measures $\mu,\nu$ , the latter is defined by

$$\|\mu-\nu\|_{\texttt{TV}}=\max_{E}|\mu(E)-\nu(E)|, \tag{1.8}$$

where the maximum ranges over all possible events in the underlying probability space. Starting at a node $x$ the distribution of the PageRank surfer after $t$ steps is $P^{t}_{\alpha,\lambda}(x,\cdot)$ , and the distance to equilibrium is defined by

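$$\mathcal{D}_{\alpha,\lambda}^{x}(t)=\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}. \tag{1.9}$$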

This defines a non-increasing function of $t\in{\mathbb{N}}$ . It is convenient to extend it to a monotone function of $t\in[0,\infty)$ , e.g. by considering the integer part of the argument. Finally, for any $\varepsilon\in(0,1)$ , the $\varepsilon$ -mixing time is defined by

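$$T_{\alpha,\lambda}(\varepsilon)=\inf\Big\{t\geq 0:\ \max_{x\in V}\mathcal{D}_{\alpha,\lambda}^{x}(t)\leq\varepsilon\Big\}. \tag{1.10}$$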
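For small graphs, the total variation distance, the worst-case distance to equilibrium and the $\varepsilon$-mixing time defined above can be computed by brute force. The Python sketch below is only an illustration: the toy graph, the parameters and the function names are assumptions, and the total variation distance is computed as half the $\ell_{1}$ distance, which coincides with the maximum over events.

```python
def tv_distance(mu, nu):
    """Total variation distance, computed as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def push_forward(mu, M):
    """Row vector times matrix: (mu M)(y) = sum_x mu(x) M(x, y)."""
    n = len(M)
    return [sum(mu[x] * M[x][y] for x in range(n)) for y in range(n)]


def worst_case_distances(M, pi, t_max):
    """Return [max_x D^x(t) for t = 0..t_max] for the chain with matrix M."""
    n = len(M)
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    profile = []
    for _ in range(t_max + 1):
        profile.append(max(tv_distance(row, pi) for row in rows))
        rows = [push_forward(row, M) for row in rows]
    return profile


def mixing_time(profile, eps):
    """Smallest t in the computed window with profile[t] <= eps, else None."""
    for t, d in enumerate(profile):
        if d <= eps:
            return t
    return None


# Hypothetical toy example: 4 vertices, out-degree 2, arbitrary alpha and lam.
graph = [[1, 2], [2, 3], [3, 0], [0, 1]]
alpha, lam = 0.15, [0.7, 0.1, 0.1, 0.1]
n = len(graph)
M = [[(1 - alpha) * graph[x].count(y) / len(graph[x]) + alpha * lam[y]
      for y in range(n)] for x in range(n)]
pi = [1.0 / n] * n
for _ in range(500):                    # power iteration for the stationary law
    pi = push_forward(pi, M)

profile = worst_case_distances(M, pi, 40)
T = mixing_time(profile, 0.25)
```

The cost grows like $n^{3}$ per time step, so this is meant for sanity checks on tiny examples rather than for the large sparse graphs studied in the paper.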

Both $\mathcal{D}^{x}_{\alpha,\lambda}(t)$ and $T_{\alpha,\lambda}(\varepsilon)$ are functions of the underlying graph $G$ , and are therefore random variables. When $\alpha=0$ , we write $\mathcal{D}^{x}_{0}(t)$ and $T_{0}(\varepsilon)$ for the corresponding quantities. The behaviour of the distance $\mathcal{D}^{x}_{0}(t)$ and of the mixing time $T_{0}(\varepsilon)$ has been thoroughly investigated in [6] for model 1 and in [7] for model 2. Let us briefly recall the main conclusions of these works. In order to simplify the exposition, we shall adopt the following unified notation. Let us define the in-degree distribution

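$$\mu_{\rm in}(x)=\frac{1}{n}\times\begin{cases}d_{x}^{-}/\langle d\rangle & \text{model 1}\\ 1 & \text{model 2}\end{cases}$$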

where we use the notation

$$\langle d\rangle=\frac{1}{n}\sum_{x\in V}d_{x}^{-}=\frac{m}{n}$$

for the average degree. Note that for model 2 the distribution $\mu_{\rm in\,\!}$ represents the average in-degrees rather than the actual in-degrees. Next, let the entropy $H$ and the associated entropic time $T_{\rm ENT}$ be defined by

$$H=\sum_{x\in V}\mu_{\rm in}(x)\log d_{x}^{+},\qquad T_{\rm ENT}=\frac{\log n}{H}.$$

Note that under our assumptions on $\mathbf{d}^{\pm}$ the deterministic quantities $H,T_{\rm ENT}$ satisfy $H=\Theta(1)$ and $T_{\rm ENT}=\Theta(\log n)$ . The main results of [6, 7] state that, uniformly in the starting point $x\in V$ , the rescaled function $\mathcal{D}^{x}_{0}(s\,T_{\rm ENT})$ , $s>0$ , converges in probability as $n\to\infty$ to the step function

$$\vartheta(s)=\begin{cases}1 & \text{if }s<1\\ 0 & \text{if }s>1.\end{cases}$$
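The entropy $H$ and the entropic time $T_{\rm ENT}$ depend only on the degree sequences, so they are straightforward to evaluate. The following Python sketch is a hedged illustration: the function name `entropic_time` and the convention that omitting the in-degrees selects model 2 are assumptions made here for convenience.

```python
import math


def entropic_time(d_out, d_in=None):
    """Return (H, T_ENT) for the given degree sequences.

    With d_in given, mu_in(x) = d_in[x] / m (model 1); with d_in=None,
    mu_in is uniform on the n vertices (model 2).
    """
    n = len(d_out)
    if d_in is None:
        mu_in = [1.0 / n] * n
    else:
        m = sum(d_in)
        mu_in = [d / m for d in d_in]
    H = sum(w * math.log(d) for w, d in zip(mu_in, d_out))
    return H, math.log(n) / H


# Out-regular example: every out-degree equal to 2 on n = 1024 vertices.
H_reg, T_reg = entropic_time([2] * 1024)

# Small example comparing the two models for the same out-degrees.
H1, _ = entropic_time([2, 4, 2, 4], d_in=[4, 2, 4, 2])
H2, _ = entropic_time([2, 4, 2, 4])
```

When all out-degrees equal $d$, one gets $H=\log d$ and $T_{\rm ENT}=\log n/\log d$, which is $10$ for $d=2$ and $n=1024$.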

More precisely, we may combine [6, Theorem 1] and [7, Theorem 1] to obtain the following statement.

Theorem 1 (Uniform cutoff at the entropic time [6, 7]).

Let $G$ be a random graph from either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ). For each $s>0,s\neq 1$ one has:

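$$\max_{x\in[n]}\left|\mathcal{D}_{0}^{x}(s\,T_{\rm ENT})-\vartheta(s)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.14}$$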

In (1.14) we use the notation $\overset{\mathbb{P}}{\longrightarrow}$ for convergence in probability as $n\to\infty$ . In terms of mixing times, (1.14) implies in particular that for any $\varepsilon\in(0,1)$ :

$$\frac{T_{0}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}1, \tag{1.15}$$

The fact that the distance to equilibrium approaches a step function, or equivalently that the $\varepsilon$ -mixing time is to leading order insensitive to the value of $\varepsilon\in(0,1)$ , is commonly referred to as a cutoff phenomenon; see e.g. [14, 19] for a review. We also refer to [20, 4, 5] for similar results in the case of undirected graphs. We stress that a fundamental difference between the case of undirected graphs and the case of directed graphs considered here is that the underlying stationary distribution $\pi_{0}$ is not known explicitly in the directed case.

We now formulate our main results. To obtain explicit asymptotic statements we shall assume that $\alpha=\alpha(n)\in(0,1)$ is a sequence such that $\alpha\to 0$ and such that the limit

$$\gamma=\lim_{n\to\infty}\alpha\,T_{\rm ENT}\in[0,\infty] \tag{1.16}$$

exists, with possibly $\gamma=0$ or $\gamma=\infty$ . We call $\mathcal{S}_{n}$ the set of all probability measures on $[n]$ .

Theorem 2.

Let $G$ be a random graph from either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ). Let $\alpha=\alpha(n)\in(0,1)$ be parameters as in (1.16). Then, according to the value of $\gamma$ there are three scenarios:

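(1) If $\gamma=0$ then for all $s>0,s\neq 1$:

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s\,T_{\rm ENT})-\vartheta(s)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.17}$$

(2) If $\gamma\in\left(0,\infty\right)$ then for all $s>0,s\neq\gamma$:

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s/\alpha)-e^{-s}\vartheta(s/\gamma)\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.18}$$

(3) If $\gamma=\infty$ then for all $s>0$:

$$\max_{\lambda\in\mathcal{S}_{n}}\max_{x\in[n]}\left|\mathcal{D}_{\alpha,\lambda}^{x}(s/\alpha)-e^{-s}\right|\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.19}$$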

In terms of mixing times, Theorem 2 implies the following statements.

Corollary 3.

In the setting of Theorem 2, the following holds uniformly with respect to $\lambda$ :

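(1) If $\gamma=0$ then for all $\varepsilon\in(0,1)$:

$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}1, \tag{1.20}$$

(2) If $\gamma\in\left(0,\infty\right)$:

$$\frac{T_{\alpha,\lambda}(\varepsilon)}{T_{\rm ENT}}\overset{\mathbb{P}}{\longrightarrow}\begin{cases}1 & \text{if }\varepsilon\in(0,e^{-\gamma})\\ \frac{1}{\gamma}\log(1/\varepsilon) & \text{if }\varepsilon\in[e^{-\gamma},1).\end{cases} \tag{1.21}$$

(3) If $\gamma=\infty$ then for all $\varepsilon\in(0,1)$:

$$\alpha\,T_{\alpha,\lambda}(\varepsilon)\overset{\mathbb{P}}{\longrightarrow}\log(1/\varepsilon). \tag{1.22}$$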

The trichotomy displayed in Theorem 2 and Corollary 3 reflects the competition between two distinct mechanisms of relaxation to equilibrium: the simple random walk dominates in the first scenario, while the $\lambda$ -resampling dominates in the third; the intermediate scenario interpolates between the two extremes; see Figure 1.
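The limiting objects in Theorem 2 and Corollary 3 can be tabulated with a few lines of Python. The sketch below is an illustration under stated assumptions: time is measured in units of $T_{\rm ENT}$, so that a time $u\,T_{\rm ENT}$ corresponds to $s=\alpha t\approx\gamma u$ in the theorem; only finite $\gamma\geq 0$ is covered (for $\gamma=\infty$ the natural time scale is $1/\alpha$ and the limit profile is $e^{-s}$); and the function names are hypothetical.

```python
import math


def theta(s):
    """Step function: 1 for s < 1 and 0 for s > 1.

    The value returned at s = 1 is a convention of this sketch; the
    theorems exclude that point.
    """
    return 1.0 if s < 1 else 0.0


def limit_profile(u, gamma):
    """Limit of the worst-case distance at time u * T_ENT, finite gamma >= 0."""
    return math.exp(-gamma * u) * theta(u)


def limit_mixing_ratio(eps, gamma):
    """Limit of T(eps) / T_ENT from Corollary 3, finite gamma >= 0."""
    if gamma == 0 or eps < math.exp(-gamma):
        return 1.0                      # cutoff at the entropic time
    return math.log(1.0 / eps) / gamma  # exponential decay reaches eps first


ratio_small_eps = limit_mixing_ratio(0.1, 2.0)   # 0.1 < e^{-2}: cutoff regime
ratio_large_eps = limit_mixing_ratio(0.5, 2.0)   # 0.5 >= e^{-2}: exponential regime
```

The two functions are consistent with each other: for $\varepsilon\geq e^{-\gamma}$ the profile $e^{-\gamma u}\vartheta(u)$ first drops to $\varepsilon$ at $u=\frac{1}{\gamma}\log(1/\varepsilon)\leq 1$, while for smaller $\varepsilon$ it only falls below $\varepsilon$ at the jump $u=1$.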

Remarkably, essentially the same trichotomy was uncovered recently by [3] in a model of random walk on dynamically evolving undirected graphs. In that case, the role of the resampling is played by the underlying reshuffling of the graph edges. It is interesting to observe that, in contrast with the undirected case considered in [3], in our setting the two competing processes may well have very distinct goals, and the overall stationary distribution $\pi_{\alpha,\lambda}$ is the result of a nontrivial balance.

To give some guidelines, below we illustrate the main ideas involved in the proof.

The starting point is the observation that the distance to stationarity $\mathcal{D}_{\alpha,\lambda}^{x}(t)$ satisfies the following general identity at all times $t$ , for all choices of the parameter $\alpha$ and distribution $\lambda$ :

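$$\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}=(1-\alpha)^{t}\,\|P^{t}(x,\cdot)-\pi_{\alpha,\lambda}P^{t}\|_{\texttt{TV}}. \tag{1.23}$$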

Here we use the notation $\mu P^{t}(y)=\sum_{x\in V}\mu(x)P^{t}(x,y)$ for the distribution at time $t$ of the simple random walk started at a random vertex distributed according to some distribution $\mu$ . The relation (1.23) follows from a simple coupling argument; see Proposition 8 below. Moreover, the stationary distribution admits the power series expansion

$$\pi_{\alpha,\lambda}=\alpha\sum_{k=0}^{\infty}(1-\alpha)^{k}\lambda P^{k}, \tag{1.24}$$

see Proposition 7 below. A particularly simple special case is when the resampling distribution $\lambda$ equals the stationary distribution $\pi_{0}$ . Indeed, in this case the stationary distribution is the result of a trivial balance and $\pi_{\alpha,\lambda}=\pi_{0}$ , so that (1.23) becomes

$$\mathcal{D}_{\alpha,\pi_{0}}^{x}(t)=(1-\alpha)^{t}\,\mathcal{D}_{0}^{x}(t). \tag{1.25}$$

Therefore, when $\lambda=\pi_{0}$ the results in Theorem 2 are an immediate consequence of Theorem 1. Moreover, this shows that the trichotomy in Theorem 2 follows from Theorem 1 whenever the distribution $\lambda\in\mathcal{S}_{n}$ is such that

$$\|\pi_{\alpha,\lambda}-\pi_{0}\|_{\texttt{TV}}\overset{\mathbb{P}}{\longrightarrow}0, \tag{1.26}$$

since in this case $\pi_{\alpha,\lambda}P^{t}$ is well approximated by $\pi_{0}$, and the three claims in Theorem 2 would follow from (1.23). As we shall see, the approximation (1.26) is rather straightforward in the first scenario. Indeed, if $\alpha T_{\rm ENT}\to 0$ then the simple random walk has enough time to reach equilibrium between successive resampling events and (1.26) holds uniformly in $\lambda\in\mathcal{S}_{n}$; see Proposition 16 below. The second and third scenarios require a different approach, since one cannot expect (1.26) to hold for all $\lambda\in\mathcal{S}_{n}$. There is however a special class of distributions, which we refer to as widespread, that does satisfy (1.26) in all three scenarios.

Definition 4 (Widespread measure).

A sequence of probability measures $\lambda=\lambda_{n}$ on $[n]$ is widespread if

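(i) There exists $\delta>0$ such that

$$|\lambda|_{\infty}=\max_{x\in[n]}\lambda(x)=O(n^{-1/2-\delta}). \tag{1.27}$$

(ii) Bounded $\ell_{2}$-distance from the uniform distribution:

$$\frac{1}{n}\sum_{j\in[n]}(1-n\lambda(j))^{2}=O(1). \tag{1.28}$$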
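Being widespread is an asymptotic property of a sequence $\lambda_{n}$, so it cannot be decided from a single $n$. The two quantities entering the definition can nevertheless be evaluated for a given $\lambda$, which helps to build intuition. The Python sketch below is illustrative only; the function name and the two example measures are assumptions.

```python
def widespread_statistics(lam):
    """Return (max_x lam(x), (1/n) * sum_j (1 - n * lam(j))**2)."""
    n = len(lam)
    sup_norm = max(lam)
    l2_term = sum((1 - n * p) ** 2 for p in lam) / n
    return sup_norm, l2_term


n = 1000
# Uniform on half of the vertices: many vertices receive zero mass.
uniform_on_half = [2.0 / n if j < n // 2 else 0.0 for j in range(n)]
# Dirac mass at vertex 0.
dirac = [1.0 if j == 0 else 0.0 for j in range(n)]

half_stats = widespread_statistics(uniform_on_half)
dirac_stats = widespread_statistics(dirac)
```

For the uniform measure on half of the vertices the maximum is $2/n$ and the second quantity equals $1$ for every even $n$, in line with both conditions. For a Dirac mass the maximum is $1$ and the second quantity equals $n-1$, so both conditions fail along the sequence.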

Note that there is no requirement on the minimum of $\lambda(x)$ , so that large portions of the set of vertices are allowed to receive zero mass. An important property of widespread measures is that, if we start with such a distribution $\lambda$ , then the time needed to reach stationarity for the simple random walk is much smaller than the entropic time $T_{\rm ENT}$ . More precisely we shall establish the following facts.

Lemma 5.

Let $G$ be a random graph from either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ). If $\lambda=\lambda_{n}$ is widespread, then for any sequence $t=t(n)\to\infty$ ,

$$\|\lambda P^{t}-\pi_{0}\|_{\texttt{TV}}\overset{\mathbb{P}}{\longrightarrow}0. \tag{1.29}$$

Moreover, in all three scenarios (1.26) holds for every widespread distribution $\lambda$ .

The result in Lemma 5 illustrates well the mechanism behind the trichotomy in the case of widespread measures $\lambda$ , but it is far from explaining the general phenomenon described in Theorem 2. For instance, if $\lambda=\delta_{z}$ is a Dirac mass at a vertex $z$ , then $\lambda P^{t}=P^{t}(z,\cdot)$ and therefore (1.29) must fail for all $t=sT_{\rm ENT}$ , with $s\in(0,1)$ fixed, since by Theorem 1 we know that in this case

[TABLE]

Moreover, the stationary distribution $\pi_{\alpha,\delta_{z}}$ can be very far from $\pi_{0}$ in both scenarios 2 and 3. In particular, using our analysis in Section 3 one can check that in scenario 3,

[TABLE]

While we believe the result in Lemma 5 to be of interest in its own right, the proof of Theorem 2 will be based on a different approach.

The first observation is that the identity (1.23) together with the result of Theorem 1 is already sufficient to establish all the upper bounds on the distance $\mathcal{D}_{\alpha,\lambda}^{x}(t)$ required in the proof of Theorem 2, see Section 4 for the details. On the other hand, some extra work is needed for the proof of the lower bounds on $\mathcal{D}_{\alpha,\lambda}^{x}(t)$ . A key technical point for establishing the desired lower bounds will be the following fact concerning scenarios 2 and 3.

Lemma 6.

Let $G$ be a random graph from either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ). For fixed $\gamma>0$ , including $\gamma=\infty$ , and $s\in(0,\gamma)$ , for any sequence $\alpha\to 0$ , satisfying $\alpha T_{\rm ENT}\to\gamma$ , and $t=s/\alpha$ :

[TABLE]

Essentially, (1.32) says that the $t$-step evolution of the random walk starting at any given vertex $x$ is singular with respect to the evolution starting at the PageRank distribution, as soon as $t\leq(1-\varepsilon)T_{\rm ENT}$ for some fixed $\varepsilon>0$. The uniformity in $x\in[n]$ and $\lambda\in\mathcal{S}_{n}$ in Lemma 6 is a delicate matter. We shall see that for general $\lambda\in\mathcal{S}_{n}$, if $\alpha T_{\rm ENT}\to\gamma>0$, then $\pi_{\alpha,\lambda}P^{t}$ is a nontrivial mixture of $\pi_{0}$ and another measure $\mu_{\lambda}$; see Lemma 13 below for the precise version of this statement. Depending on the nature of $\lambda\in\mathcal{S}_{n}$, the measure $\mu_{\lambda}$ can be either supported on a small subset of $[n]$, e.g. if $\lambda=\delta_{z}$ for some $z$, or very spread out, e.g. if $\lambda$ is widespread as in Definition 4. We shall however show that structural features of the random graph $G$ and the fact that $\alpha\to 0$ imply that the measure $\mu_{\lambda}$ cannot concentrate any mass on the support of the distribution $P^{t}(x,\cdot)$, and thus $\mu_{\lambda}$ and $P^{t}(x,\cdot)$ are approximately singular. We refer to Section 6 for the derivation of this anti-concentration phenomenon. Since $\pi_{0}$ and $P^{t}(x,\cdot)$ are approximately singular for $t\leq(1-\varepsilon)T_{\rm ENT}$ as in (1.30), this will be sufficient to prove Lemma 6.

The rest of the paper is arranged as follows: the next section establishes the basic identities (1.23) and (1.24) and some more preliminary material; Section 3 contains our main technical estimates and the proof of Lemma 6; Section 4 shows how to derive the main results from Lemma 6 and the facts established in Section 2. The discussion of widespread measures and the proof of Lemma 5 form an independent piece of work and are given in Section 5.

2. Preliminaries

Here we collect some simple general facts about the PageRank surf. The statements in this section do not depend on the graph $G$ where the original walk takes place. Therefore, we fix an arbitrary digraph $G$ with vertex set $V=[n]$ , and let $P$ be the transition matrix in (1.2). If $d^{+}_{x}=0$ for some $x$ we may define $P(x,x)=1$ and $P(x,y)=0$ for all $y\in V\setminus\{x\}$ .

2.1. The stationary distribution $\pi_{\alpha,\lambda}$

Proposition 7.

For any $\alpha\in(0,1)$ , any probability vector $\lambda$ , let $P_{\alpha,\lambda}$ be defined by (1.3). There exists a unique probability vector $\pi_{\alpha,\lambda}$ satisfying $\pi_{\alpha,\lambda}P_{\alpha,\lambda}=\pi_{\alpha,\lambda}$ . Moreover, $\pi_{\alpha,\lambda}$ is given by

$$\pi_{\alpha,\lambda}=\alpha\sum_{k=0}^{\infty}(1-\alpha)^{k}\lambda P^{k}. \tag{2.1}$$

Proof.

The equation $\pi_{\alpha,\lambda}P_{\alpha,\lambda}=\pi_{\alpha,\lambda}$ is equivalent to

$$\pi_{\alpha,\lambda}\,({\bf 1}-(1-\alpha)P)=\alpha\lambda.$$

Since $P$ is a stochastic matrix, the matrix ${\bf 1}-(1-\alpha)P$ is strictly diagonally dominant, and therefore invertible. Then (2.1) follows by expanding the expression $\pi_{\alpha,\lambda}=\alpha\lambda({\bf 1}-(1-\alpha)P)^{-1}$ . ∎
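The series representation of $\pi_{\alpha,\lambda}$ can be checked numerically on a small example. The Python sketch below truncates the series after a fixed number of terms and verifies that the result is (numerically) invariant under $P_{\alpha,\lambda}$. The toy graph, the parameters, the truncation level and the function names are assumptions made for illustration; the truncation error is of order $(1-\alpha)^{K}$ when $K$ terms are kept.

```python
def push_forward(mu, M):
    """Row vector times matrix: (mu M)(y) = sum_x mu(x) M(x, y)."""
    n = len(M)
    return [sum(mu[x] * M[x][y] for x in range(n)) for y in range(n)]


def series_stationary(P, alpha, lam, terms):
    """Truncated series alpha * sum_{k < terms} (1 - alpha)^k * lam P^k."""
    pi = [0.0] * len(P)
    walk = list(lam)                 # holds lam P^k
    weight = alpha                   # holds alpha * (1 - alpha)^k
    for _ in range(terms):
        pi = [p + weight * w for p, w in zip(pi, walk)]
        walk = push_forward(walk, P)
        weight *= 1 - alpha
    return pi


# Hypothetical toy digraph and parameters.
graph = [[1, 2], [2, 3], [0, 3], [0, 1, 2]]
alpha, lam = 0.15, [0.7, 0.1, 0.1, 0.1]
n = len(graph)
P = [[graph[x].count(y) / len(graph[x]) for y in range(n)] for x in range(n)]
M = [[(1 - alpha) * P[x][y] + alpha * lam[y] for y in range(n)] for x in range(n)]

pi_series = series_stationary(P, alpha, lam, terms=400)
residual = max(abs(a - b) for a, b in zip(push_forward(pi_series, M), pi_series))
```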

In particular, (2.1) and the triangle inequality imply that for any other probability vector $\mu$ :

[TABLE]

2.2. Walk vs. teleport

A trajectory of the PageRank surf can be sampled as follows. At each time unit independently, we flip an $\alpha$-biased coin: if heads (with probability $\alpha$) then the surfer is teleported to a new vertex, chosen according to $\lambda$; if tails (with probability $1-\alpha$) then the surfer walks one step according to the transition matrix $P$. The probability associated to this construction will be denoted by ${\mathbb{P}}$. If $\tau_{\alpha}$ denotes the first time the surfer is teleported, then for all $t\in{\mathbb{N}}$:

$$\mathbb{P}(\tau_{\alpha}>t)=(1-\alpha)^{t}.$$
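The coin-flip construction of a trajectory can be simulated directly. The Python sketch below is a minimal illustration with a hypothetical toy graph, arbitrary parameters and a fixed random seed; it records the first teleport time and compares the empirical frequency of "no teleport within the first $t$ steps" with $(1-\alpha)^{t}$, the value expected when the first teleport time is geometric with parameter $\alpha$.

```python
import random


def sample_trajectory(out_neighbours, lam, alpha, x0, steps, rng):
    """Return (path, tau): the visited vertices and the first teleport time.

    tau is None when no teleport happens within the first `steps` steps.
    """
    path, tau = [x0], None
    for t in range(1, steps + 1):
        if rng.random() < alpha:            # heads: teleport according to lam
            nxt = rng.choices(range(len(lam)), weights=lam)[0]
            if tau is None:
                tau = t
        else:                               # tails: one step of the simple walk
            nxt = rng.choice(out_neighbours[path[-1]])
        path.append(nxt)
    return path, tau


graph = [[1, 2], [2, 3], [3, 0], [0, 1]]    # hypothetical toy digraph
lam = [0.7, 0.1, 0.1, 0.1]
alpha, steps, runs = 0.2, 5, 20000
rng = random.Random(1)                       # fixed seed, for reproducibility only

samples = [sample_trajectory(graph, lam, alpha, 0, steps, rng) for _ in range(runs)]
survival = sum(tau is None for _, tau in samples) / runs   # estimates (1 - alpha)^steps
```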

Proposition 8.

For any $\alpha\in(0,1)$ , any probability vector $\lambda$ , and all $t\in\mathbb{N}$ , $x\in[n]$ :

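$$\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}=(1-\alpha)^{t}\,\|P^{t}(x,\cdot)-\pi_{\alpha,\lambda}P^{t}\|_{\texttt{TV}}.$$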

Proof.

We use the construction introduced above, and write $X^{x}_{t}$ for the position of the surfer at time $t$ with initial vertex $x$ . By using the same sample of the teleporting distribution $\lambda$ we couple two trajectories $X^{x}_{t},X^{z}_{t}$ in such a way that $X^{x}_{t}=X^{z}_{t}$ , for all $t\geq\tau_{\alpha}$ . Therefore, letting ${\mathbb{E}}$ denote the expectation with respect to this coupling:

[TABLE]

Moreover,

[TABLE]

Therefore,

[TABLE]

Multiplying by $\pi_{\alpha,\lambda}(z)$ , summing over $z$ , and using (2.3) one obtains

[TABLE]

It follows that

[TABLE]

∎
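The identity of Proposition 8 holds for every digraph, every $\alpha\in(0,1)$ and every $\lambda$, so it can be sanity-checked numerically. The Python sketch below compares, for a hypothetical toy graph and arbitrary parameters, the distance $\|P_{\alpha,\lambda}^{t}(x,\cdot)-\pi_{\alpha,\lambda}\|_{\texttt{TV}}$ with $(1-\alpha)^{t}\|P^{t}(x,\cdot)-\pi_{\alpha,\lambda}P^{t}\|_{\texttt{TV}}$ for the first few values of $t$. All names are illustrative, and $\pi_{\alpha,\lambda}$ is approximated by power iteration.

```python
def tv_distance(mu, nu):
    """Total variation distance, computed as half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def push_forward(mu, M):
    """Row vector times matrix: (mu M)(y) = sum_x mu(x) M(x, y)."""
    n = len(M)
    return [sum(mu[x] * M[x][y] for x in range(n)) for y in range(n)]


# Hypothetical toy digraph, parameters and starting vertex.
graph = [[1, 2], [2, 3], [0, 3], [0, 1, 2]]
alpha, lam, x = 0.1, [0.0, 0.0, 0.5, 0.5], 0
n = len(graph)
P = [[graph[u].count(v) / len(graph[u]) for v in range(n)] for u in range(n)]
M = [[(1 - alpha) * P[u][v] + alpha * lam[v] for v in range(n)] for u in range(n)]

pi = [1.0 / n] * n
for _ in range(2000):                       # power iteration for pi_{alpha,lambda}
    pi = push_forward(pi, M)

surfer = [1.0 if v == x else 0.0 for v in range(n)]   # P_{alpha,lambda}^t(x, .)
walker = list(surfer)                                  # P^t(x, .)
pi_walk = list(pi)                                     # pi_{alpha,lambda} P^t
gaps = []
for t in range(21):
    lhs = tv_distance(surfer, pi)
    rhs = (1 - alpha) ** t * tv_distance(walker, pi_walk)
    gaps.append(abs(lhs - rhs))
    surfer = push_forward(surfer, M)
    walker = push_forward(walker, P)
    pi_walk = push_forward(pi_walk, P)
```

Up to floating-point error all entries of `gaps` vanish, reflecting the fact that the difference of two surfer distributions is multiplied by $(1-\alpha)P$ at each step, since the teleport term cancels.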

Since the total variation distance is always bounded above by $1$ , Proposition 8 implies the upper bound

[TABLE]

The latter, in turn, gives the following upper bound on the mixing time.

Corollary 9.

For any $\alpha\in(0,1)$ , any probability vector $\lambda$ , and all $\varepsilon\in(0,1)$ , the $\varepsilon$ -mixing time (1.10) satisfies

[TABLE]

A further immediate consequence of Proposition 8 is that if $\lambda$ is stationary for $P$ , then the distance to equilibrium $\mathcal{D}^{x}_{\alpha,\lambda}(t)$ takes a simple form.

Corollary 10.

For any $\alpha\in(0,1)$ , for all $x\in V$ and all $t\in\mathbb{N}$ , if $\pi_{0}$ is a probability vector such that $\pi_{0}P=\pi_{0}$ , then taking $\lambda=\pi_{0}$ ,

$$\mathcal{D}_{\alpha,\pi_{0}}^{x}(t)=(1-\alpha)^{t}\,\mathcal{D}_{0}^{x}(t).$$

Proof.

From Proposition 7 it follows that $\pi_{\alpha,\pi_{0}}=\pi_{0}$ , and therefore $\pi_{\alpha,\pi_{0}}P^{t}=\pi_{0}$ for all $t$ . ∎

Finally, another useful consequence of Proposition 8 is that it allows us to control the distance $\mathcal{D}^{x}_{\alpha,\lambda}(t)$ in terms of the distance $\mathcal{D}^{x}_{\alpha,\pi_{0}}(t)$ , for some stationary $\pi_{0}$ as in Corollary 10, by means of the distance between $\pi_{\alpha,\lambda}$ and $\pi_{0}$ .

Corollary 11.

For any $\alpha\in(0,1)$ , all $t\in\mathbb{N}$ , any probability vector $\lambda$ , if $\pi_{0}$ is such that $\pi_{0}P=\pi_{0}$ ,

[TABLE]

Proof.

From the triangle inequality and the fact that $\|\mu P^{t}-\nu P^{t}\|_{\texttt{TV}}$ is monotone in $t$ for all distributions $\mu,\nu$ , one has

[TABLE]

The conclusion then follows from Proposition 8 and Corollary 10. ∎

3. Main technical estimates

The goal of this section is to prove Lemma 6. The proof is divided into three main steps. The first step is a decomposition of $\pi_{\alpha,\lambda}P^{t}$ as a mixture of $\pi_{0}$ and a distribution $\mu_{\lambda}$ defined below. The second and most delicate step is the proof that $\mu_{\lambda}$ and $P^{t}(x,\cdot)$ are approximately singular for $t$ and $\alpha$ as in Lemma 6. The third step concludes the desired result collecting the technical estimates established in the first two steps.

3.1. Decomposition of $\pi_{\alpha,\lambda}P^{t}$

We start with a useful decomposition of $\pi_{\alpha,\lambda}P^{t}$ as a mixture of $\pi_{0}$ and a distribution $\mu_{\lambda}$ defined as follows. Fix $\eta\in(0,1/2)$ , $t\leq(1-2\eta)T_{\rm ENT}$ , and define $\mu_{\lambda}=\mu^{\eta,t}_{\lambda}$ and $A=A^{\eta,t}$ as

[TABLE]

Note that $\mu_{\lambda}$ depends on the graph $G$ while $A$ is deterministic. We consider the case $\alpha T_{\rm ENT}\to\gamma\in(0,\infty]$ and treat the two cases $\gamma=\infty$ and $\gamma\in(0,\infty)$ separately.

Lemma 12.

Fix $s\in(0,\infty)$ and assume $\alpha T_{\rm ENT}\to+\infty$ and $t=s/\alpha$ . For all $\varepsilon>0$ , there exists $\eta>0$ such that with high probability:

[TABLE]

and the normalization in (3.1) satisfies $A\geq 1-\varepsilon$ .

Proof.

Since $\alpha T_{\rm ENT}\to+\infty$ we have $t\ll T_{\rm ENT}$ . It follows that $A\to 1$ . Using Proposition 7,

[TABLE]

and therefore $\|\pi_{\alpha,\lambda}P^{t}-\mu_{\lambda}\|_{\texttt{TV}}\leq(1-A)$ . ∎

Lemma 13.

Fix $\gamma\in(0,\infty)$ , $s\in(0,\gamma)$ and assume $\alpha T_{\rm ENT}\to\gamma$ and $t=s/\alpha$ . For all $\varepsilon>0$ , there exists $\eta>0$ such that with high probability:

[TABLE]

where $A=A^{\eta,t}$ and $\mu_{\lambda}=\mu^{\eta,t}_{\lambda}$ are given in (3.1).

Proof.

For any $a<b$ , $z\in[n]$ , define the probability vector

[TABLE]

Since $t=s/\alpha$ , $s\in(0,\gamma)$ , and $\alpha T_{\rm ENT}\to\gamma$ we may take $\eta>0$ small enough and assume that $t=\kappa T_{\rm ENT}$ , $\kappa\in(0,1-2\eta)$ . Using Proposition 7, letting $\delta_{z}$ denote the Dirac mass at $z$ :

[TABLE]

Note that $Z_{0,1-\eta-\kappa}=A$ , and $\nu^{z}_{0,1-\eta-\kappa}=\mu_{\delta_{z}}$ . We show that the middle term above is negligible and that $\nu^{z}_{1+\eta-\kappa,\infty}$ is well approximated by $\pi_{0}$ . If $\alpha T_{\rm ENT}\to\gamma\in(0,\infty)$ , by Riemann integration it follows that for all $n$ large enough

[TABLE]

for some constant $C>0$ . Next, using the monotonicity in time of total variation distance and Theorem 1, w.h.p.

[TABLE]

It follows that w.h.p.

[TABLE]

Writing $\pi_{\alpha,\lambda}P^{t}=\sum_{z}\lambda(z)\pi_{\alpha,\delta_{z}}P^{t}$ and taking $A$ and $\mu_{\lambda}$ as in (3.1) concludes the proof. ∎
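The Riemann integration step can be illustrated numerically. The sketch below assumes that the normalizations $Z_{a,b}$ are geometric masses of the form $\sum_{k}\alpha(1-\alpha)^{k}$ over a window of $k$ between $aT_{\rm ENT}$ and $bT_{\rm ENT}$ (the precise definition is in the display above); under $\alpha T_{\rm ENT}\to\gamma$ such a mass converges to $e^{-\gamma a}-e^{-\gamma b}$. Function names are illustrative.

```python
import math


def geometric_mass(alpha, lo, hi):
    """Exact value of sum_{k=lo}^{hi-1} alpha * (1 - alpha)**k,
    obtained from the telescoping geometric sum."""
    return (1.0 - alpha) ** lo - (1.0 - alpha) ** hi


def riemann_limit(gamma, a, b):
    """Limit of the mass of the window [a*T, b*T) when alpha * T -> gamma."""
    return math.exp(-gamma * a) - math.exp(-gamma * b)
```

For instance, with $T=10^{6}$, $\gamma=2$ and the window $[0.5\,T,1.5\,T)$, the two functions agree to several decimal places.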

3.2. Singularity of $\mu_{\lambda}$ and $P^{t}(x,\cdot)$

The key to this result is a property of the random walk that was established in [6, 7]. Roughly speaking this says that with high probability, for most vertices $x$ , the trajectory $\{X_{u}^{x},\,u\leq t\}$ of the walk started at $x$ up to time $t$ is supported by a “small” directed tree $\mathcal{T}_{x}(t)$ rooted at $x$ provided that $t\leq(1-\eta)T_{\rm ENT}$ where $\eta$ is an arbitrary positive constant. As a result the distribution $P^{t}(x,\cdot)$ is rather strongly localized. We shall see that the distribution $\mu_{\lambda}$ , depending on the nature of $\lambda$ , could be either supported on a small subset of $[n]$ (e.g. if $\lambda=\delta_{z}$ for some $z\in[n]$ ) or very spread out (e.g. if $\lambda$ is widespread). The approximate singularity of $\mu_{\lambda}$ and $P^{t}(x,\cdot)$ turns out to be the result of a delicate structural property of the digraph $G$ which guarantees that even if $\mu_{\lambda}$ is localized it must be sufficiently smeared out and cannot concentrate on the support of $P^{t}(x,\cdot)$ . We first recall the construction of the tree $\mathcal{T}_{x}(t)$ and then address the structural properties ensuring this anti-concentration.

3.2.1. The tree $\mathcal{T}_{z}(t)$

Given the digraph $G$ , the tree $\mathcal{T}_{z}(t)$ , for fixed $t\leq(1-\eta)T_{\rm ENT}$ , can be discovered algorithmically as described in [6, Section 6.2] and [7, Section 4.1]. We recall the detailed construction for model 1. A very similar construction can be given for model 2; see [7, Section 4.1].

Below we describe a sequence of digraphs $\mathcal{G}^{0},\mathcal{G}^{1},\dots,\mathcal{G}^{\kappa}$ such that at each step $\mathcal{G}^{\ell}$ is a subset of the out-neighborhood of $z$ of height $t$ in $G$ and such that $\mathcal{G}^{\ell}$ is obtained from $\mathcal{G}^{\ell-1}$ by adding a single edge of $G$ . Moreover, we obtain a sequence of directed trees $\mathcal{T}^{0},\mathcal{T}^{1},\dots,\mathcal{T}^{\kappa}$ such that for every $\ell$ , $\mathcal{T}^{\ell}$ is a spanning tree of $\mathcal{G}^{\ell}$ . The tree $\mathcal{T}_{z}(t)$ will be defined as $\mathcal{T}_{z}(t)=\mathcal{T}^{\kappa}$ .

Initially all matchings of tails and heads in $G$ are unrevealed and $\mathcal{G}^{0}=\mathcal{T}^{0}=\{z\}$ ; let $\partial_{-}\mathcal{T}^{\ell}$ (resp. $\partial_{+}\mathcal{T}^{\ell}$ ) denote the set of unrevealed heads (resp. tails) whose endpoint belongs to $\mathcal{T}^{\ell}$ ; the height $h(e)$ of a tail $e\in\partial_{+}\mathcal{T}^{\ell}$ is defined as $1$ plus the number of edges in the unique path in $\mathcal{T}^{\ell}$ from $z$ to the endpoint of $e$ ; the weight of $e\in\partial_{+}\mathcal{T}^{\ell}$ is defined as

[TABLE]

where $(z=x_{0},x_{1},\dots,x_{h(e)-1})$ denotes the path in $\mathcal{T}^{\ell}$ from $z$ to the endpoint of $e$ ; we then iterate the following steps:

•

a tail $e\in\partial_{+}\mathcal{T}^{\ell}$ is selected with maximal weight among all $e\in\partial_{+}\mathcal{T}^{\ell}$ with $h(e)\leq t$ and $\mathbf{w}(e)\geq\mathbf{w}_{\min}:=n^{-1+\eta^{2}}$ (using an arbitrary ordering of the tails to break ties);

•

the head $f$ matched to $e$ in $G$ is revealed, and $\mathcal{G}^{\ell+1}$ is obtained from $\mathcal{G}^{\ell}$ by adding the edge $ef$ ;

•

if $f$ was not in $\partial_{-}\mathcal{T}^{\ell}$ , then its endpoint and the edge $ef$ are added to $\mathcal{T}^{\ell}$ to form $\mathcal{T}^{\ell+1}$ .

The process stops at $\ell=\kappa$ when there are no tails $e\in\partial_{+}\mathcal{T}^{\kappa}$ with height $h(e)\leq t$ and weight $\mathbf{w}(e)\geq\mathbf{w}_{\min}$ . Note that $\mathcal{T}^{\ell}$ is a directed spanning tree of $\mathcal{G}^{\ell}$ at each step. The tree $\mathcal{T}_{z}(t)$ is defined as $\mathcal{T}^{\kappa}$ . After the construction of the tree $\mathcal{T}_{z}(t)$ , exactly $\kappa$ edges of $G$ have been revealed, some of which may not belong to $\mathcal{T}_{z}(t)$ . Note that $\mathcal{G}^{\kappa}$ has $\kappa$ edges and coincides with the union of all directed paths from $z$ which have length at most $t$ and probability at least $\mathbf{w}_{\min}$ with respect to the random walk started at $z$ . As in [6, Lemma 11], [7, Lemma 7], it is not difficult to see that when exploring the out-neighborhood of $z$ in this way the number $\kappa$ , regardless of the realization of $G$ , is bounded as

[TABLE]

Let us recall the following key facts established in [6, Section 6] for model 1 and in [7, Section 4] for model 2. For every $\eta>0$ , for every $t\leq(1-\eta)T_{\rm ENT}$ , the trajectory $(X_{0},\dots,X_{t})$ of the random walk started at $z$ in $G$ satisfies with high probability $(X_{0},\dots,X_{t})\subset\mathcal{T}_{z}(t)$ for most initial positions $z$ . More precisely, let $Q_{z}(\cdot)$ denote the quenched law of the random walk $(X_{0},X_{1},\dots)$ in $G$ started at $z$ . Let $V_{*}$ denote the set of $z\in[n]$ such that $B^{+}_{z,\hslash}$ is a directed tree, where $\hslash:=\frac{1}{10}\log_{\Delta}(n)$ , and $B^{+}_{z,\hslash}$ denotes the out-neighborhood of $z$ of height $\hslash$ in $G$ (that is, the subgraph of $G$ induced by the set of vertices which can be reached from $z$ with a path of length at most $\hslash$ ). Then, from [6, Proposition 10, part (ii)], and [7, Lemma 11], one has

[TABLE]

where the notation $(X_{0},\dots,X_{t})\subset\mathcal{T}_{z}(t)$ indicates that the walk up to time $t$ traverses only edges of $\mathcal{T}_{z}(t)$ .
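The exploration described in this subsection can be sketched as follows for a digraph that is already given. This is a minimal sketch, not the construction of [6, 7] verbatim: it assumes that the weight of a tail $e$ is $\mathbf{w}(e)=\prod_{i=0}^{h(e)-1}1/d^{+}_{x_{i}}$ (the probability that the walk started at $z$ follows the tree path and then uses $e$), and it represents $G$ by a hypothetical dictionary `out_edges` listing, for each vertex, the endpoints of its tails.

```python
import heapq


def explore_tree(out_edges, z, t, w_min):
    """Greedy exploration of the out-neighbourhood of z.

    out_edges[x] lists the endpoints of the tails of x (one entry per tail).
    Returns (tree vertices, tree edges, kappa), where kappa is the number
    of revealed edges."""
    tree_vertices = {z}
    tree_edges = []
    kappa = 0
    heap = []      # entries: (-weight, tie_breaker, height, vertex, tail index)
    counter = 0    # insertion order plays the role of the arbitrary tie-breaking

    def push_tails(x, path_weight, height):
        nonlocal counter
        d = len(out_edges[x])
        if d == 0:
            return
        w = path_weight / d  # assumed weight of each tail of x
        if height <= t and w >= w_min:
            for j in range(d):
                heapq.heappush(heap, (-w, counter, height, x, j))
                counter += 1

    push_tails(z, 1.0, 1)
    while heap:
        neg_w, _, height, x, j = heapq.heappop(heap)  # eligible tail of maximal weight
        y = out_edges[x][j]                           # reveal the matched head
        kappa += 1
        if y not in tree_vertices:                    # head was not on the tree boundary
            tree_vertices.add(y)
            tree_edges.append((x, y))
            push_tails(y, -neg_w, height + 1)
    return tree_vertices, tree_edges, kappa
```

A max-heap keyed on the weight is one convenient way to realise the "maximal weight first" rule; any other priority structure with the same ordering would produce the same tree.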

3.2.2. Key technical estimate

Let $\mathcal{A}_{x}(t)$ denote the set of vertices in $\mathcal{T}_{x}(t)$ that have distance from $x$ exactly $t$ in $\mathcal{T}_{x}(t)$ . Recall the definition $\mu_{\lambda}=\mu_{\lambda}^{\eta,t}$ in (3.1).

Lemma 14.

Assume $\alpha T_{\rm ENT}\to\gamma\in(0,\infty]$ . Fix $\eta\in(0,1/2)$ and take $t\leq(1-2\eta)T_{\rm ENT}$ . Then, for all $\varepsilon>0$ , with high probability

[TABLE]

The proof of Lemma 14 is based on a structural property of the graph $G$ which says that the intersections of the trees $\mathcal{T}_{z}(u)$ and $\mathcal{T}_{x}(t)$ , where $x,z$ are two arbitrary vertices and $t\leq u\leq(1-\eta)T_{\rm ENT}$ , are such that, with high probability, for all $x,z\in[n]$ , no path in $\mathcal{T}_{z}(u)$ can intersect more than $K$ times the set $\mathcal{A}_{x}(t)$ where $K$ is a suitably large constant. Let us use the notation $\mathcal{P}_{z}(u)$ for the set of paths in $\mathcal{T}_{z}(u)$ having length exactly $u$ and, for all $\mathfrak{p}\in\mathcal{P}_{z}(u)$ , let $V(\mathfrak{p})$ denote the set of vertices along that path. Note that the endpoint of $\mathfrak{p}\in\mathcal{P}_{z}(u)$ is necessarily a vertex of $\mathcal{A}_{z}(u)$ and $|V(\mathfrak{p})|=u$ , since $\mathcal{T}_{z}(u)$ is a tree.

Lemma 15.

Fix $\eta\in(0,1/2)$ . For every $x,z\in[n]$ and $t\leq u\leq(1-\eta)T_{\rm ENT}$ ,

[TABLE]

for all $n$ large enough, where $K=(9+3\log_{2}\Delta)/\eta^{2}$ . In particular, the event

[TABLE]

holds with high probability.

Proof.

We sample the pair $(\mathcal{T}_{x}(t),\mathcal{T}_{z}(u))$ in the random digraph $G$ by generating first the subgraph $\mathcal{T}_{x}(t)$ , and then the subgraph $\mathcal{T}_{z}(u)$ conditionally on $\mathcal{T}_{x}(t)$ . The construction of $\mathcal{T}_{x}(t)$ follows the steps described by the algorithm in Section 3.2.1 with the understanding that, for model 1 the head $f$ to be matched to the tail $e$ to form $\mathcal{G}^{\ell+1}$ is chosen uniformly at random among all $m-\ell$ heads that are unmatched after the $\ell$ -th step, while for model 2 the tail $e$ has to be connected to a uniformly random vertex in $[n]$ . The process terminates when the tree $\mathcal{T}_{x}(t)$ has been fully generated after $\kappa_{x}$ steps. A crucial feature of this construction is that the tails of all vertices $v\in\mathcal{A}_{x}(t)$ are unmatched once the tree $\mathcal{T}_{x}(t)$ has been generated. Moreover, the number of vertices of $\mathcal{T}_{x}(t)$ satisfies, as in (3.11)

[TABLE]

Next, we generate the tree $\mathcal{T}_{z}(u)$ , conditionally on $\mathcal{T}_{x}(t)$ . This is done by starting at $z$ and by repeating the same steps for the construction described in Section 3.2.1 with the difference that if at step $\ell$ a tail $e$ is chosen which had already been matched during the generation of $\mathcal{T}_{x}(t)$ then the corresponding edge is included in the construction (and possibly in the tree being generated). The process terminates when the tree $\mathcal{T}_{z}(u)$ has been fully generated after $\kappa_{z}$ steps. Thus, after $\kappa_{x}+\kappa_{z}$ steps we have a sample from the joint distribution of $\mathcal{T}_{x}(t)$ and $\mathcal{T}_{z}(u)$ in $G$ . Note that the total number of edges of $G$ discovered after the generation of both trees is $\kappa_{x}+\kappa_{z}\leq 2n^{1-\eta^{2}/2}$ . Let $\{\mathcal{F}_{\ell}\}$ denote the filtration associated to this generation process, so that $\mathcal{F}_{\kappa_{x}}$ is the $\sigma$ -field associated to the tree $\mathcal{T}_{x}(t)$ .

During the process generating $\mathcal{T}_{z}(u)$ conditionally on $\mathcal{T}_{x}(t)$ , we say that a bad matching occurs at step $\ell$ if the tail chosen at that step is currently unmatched (that is it was not revealed during the sampling of $\mathcal{T}_{x}(t)$ ) and it gets connected to a vertex $y$ that was already discovered in $\mathcal{T}_{x}(t)$ . The first key observation is that the conditional probability of a bad matching at step $\ell$ given $\mathcal{F}_{\ell}$ is uniformly bounded above by

[TABLE]

Indeed, in the case of model 1 this probability is at most $\kappa_{x}\Delta/(m-\kappa_{x}-\kappa_{z})$ , while for model 2 this probability is at most $\kappa_{x}/n$ . In either case it is less than the number $p$ defined in (3.16) for all $n$ large enough.

The second key observation is that if a path $\mathfrak{p}\in\mathcal{P}_{z}(u)$ is such that $|\mathcal{A}_{x}(t)\cap V(\mathfrak{p})|>K$ then at least $K$ bad matchings have occurred during the formation of that given path. To see this, observe that after a vertex $y\in\mathcal{A}_{x}(t)$ is visited for the first time during the construction of $\mathcal{T}_{z}(u)$ , the tails of $y$ will be all matched (at suitable steps $\ell_{1},\dots,\ell_{d_{y}^{+}}$ ) to a uniformly sampled head among the ones that are currently unmatched (for model 1) or to a uniformly random vertex (for model 2). Indeed, the tails of all vertices $v\in\mathcal{A}_{x}(t)$ all have the same weight $\mathbf{w}(e)$ and all of them are unmatched after the tree $\mathcal{T}_{x}(t)$ has been generated. Also, by definition, every path in $\mathcal{T}_{z}(u)$ can visit a given vertex $y$ at most once, and after a visit to $y\in\mathcal{A}_{x}(t)$ it has to return to $\mathcal{T}_{x}(t)$ with a bad matching in order to visit some other $y^{\prime}\in\mathcal{A}_{x}(t)$ . Hence, the number of visits to $\mathcal{A}_{x}(t)$ in a given path $\mathfrak{p}\in\mathcal{P}_{z}(u)$ is at most the number of bad matchings that occurred along that path plus one. The extra one comes from the fact that $z$ could have started already inside $\mathcal{T}_{x}(t)$ , for instance if $z\in\mathcal{A}_{x}(t)$ .

Next, consider an auxiliary directed tree with random marks $\tilde{\mathcal{T}}$ defined as follows: $\tilde{\mathcal{T}}$ is a directed regular tree with deterministic offspring $\Delta$ and height $u$ , with independent and identically distributed Bernoulli( $p$ ) marks on its edges, where $p$ is as in (3.16). Edges whose Bernoulli mark is $1$ are colored red. A path of length $u$ from the root to one of the leaves is called bad if it has at least $K$ red edges. The previous construction then shows that the number of $\mathfrak{p}\in\mathcal{P}_{z}(u)$ such that $|\mathcal{A}_{x}(t)\cap V(\mathfrak{p})|>K$ is stochastically dominated by the number of bad paths in $\tilde{\mathcal{T}}$ . The probability that a given path in $\tilde{\mathcal{T}}$ is bad is given by

[TABLE]

Therefore, the probability that there exists a bad path in $\tilde{\mathcal{T}}$ is at most $\Delta^{u}(up)^{K}$ . Since $u\leq T_{\rm ENT}\leq\log n/\log 2$ , it follows that

[TABLE]

for all $n$ sufficiently large. Taking $K=(9+3\log_{2}\Delta)/\eta^{2}$ concludes the proof of (3.14). A union bound then implies that the event $\mathcal{E}_{K}$ in the statement of the lemma holds with probability at least $1-1/n$ . ∎
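The elementary estimate behind the bound $\Delta^{u}(up)^{K}$ is that a binomial variable with parameters $u$ and $p$ is at least $K$ with probability at most $\binom{u}{K}p^{K}\leq(up)^{K}$. The following sketch, with illustrative function names and parameter values that are not taken from the paper, compares the exact tail with this bound.

```python
from math import comb


def bad_path_probability(u, p, K):
    """Exact P(Binomial(u, p) >= K): the chance that a fixed root-to-leaf
    path of length u carries at least K red edges."""
    return sum(comb(u, j) * p ** j * (1 - p) ** (u - j) for j in range(K, u + 1))


def single_path_bound(u, p, K):
    """The cruder bound (u p)^K used for a single path."""
    return (u * p) ** K


def all_paths_bound(delta, u, p, K):
    """Union bound over the delta^u root-to-leaf paths of the regular tree."""
    return delta ** u * single_path_bound(u, p, K)
```

The union bound is only useful when $up$ is polynomially small in $n$, which is what the choice of $p$ in (3.16) and of the constant $K$ is designed to achieve.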

Once Lemma 15 is available, we can prove Lemma 14.

Proof of Lemma 14.

The distribution $\mu_{\lambda}$ satisfies $\mu_{\lambda}=\sum_{z\in[n]}\lambda(z)\mu_{z}$ , where $\mu_{z}:=\mu_{\delta_{z}}$ . Hence, it is sufficient to prove that w.h.p.

[TABLE]

We write

[TABLE]

where $Q_{z}$ is defined as in (3.12). As in [6, Proposition 6] one shows that for both models, with high probability:

[TABLE]

for any fixed constant $k_{0}$ . Hence, for all nonnegative integers $k,t$ with $k_{0}\leq k+t\leq(1-\eta)T_{\rm ENT}$ :

[TABLE]

Set $u=(1-\eta)T_{\rm ENT}$ . For any $v\in V_{*}$ we write

[TABLE]

By (3.12), the first term in the right hand side is w.h.p. less than $\varepsilon$ uniformly in $v\in V_{*}$ , for any fixed $\varepsilon>0$ . The second term, taking the summation over $k\in[k_{0},(1-\eta)T_{\rm ENT}-t]$ satisfies, w.h.p.

[TABLE]

where $K$ is the constant from Lemma 15 and we have used the fact that the event $\mathcal{E}_{K}$ from Lemma 15 holds with high probability. From (3.19)-(3.21), noting that the first $k_{0}$ terms in the summation over $k$ contribute to (3.19) at most $\alpha k_{0}/A$ , we then obtain, w.h.p.

[TABLE]

Since $\alpha T_{\rm ENT}\to\gamma\in(0,\infty]$ and $\alpha\to 0$ , it follows that for all fixed $\eta>0$ one has $\alpha/A\to 0$ as $n\to\infty$ . Since the parameters $\varepsilon>0$ and $k_{0}\in{\mathbb{N}}$ are arbitrary this implies the desired conclusion. ∎

3.3. Proof of Lemma 6

Assume $\alpha T_{\rm ENT}\to\gamma$ , and $t=s/\alpha$ , with $s\in(0,\gamma)$ as in the statement of Lemma 6. The proof below applies to both cases $\gamma\in(0,\infty)$ and $\gamma=\infty$ . We are going to show that for every fixed $\varepsilon>0$ , there exists an event $\mathcal{E}_{\varepsilon}=\mathcal{E}_{\varepsilon}(n)$ such that ${\mathbb{P}}(\mathcal{E}_{\varepsilon})\to 1$ , $n\to\infty$ , and such that on $\mathcal{E}_{\varepsilon}$ for all $x\in[n]$ there are sets $\mathcal{C}_{x}\subset V$ satisfying

[TABLE]

Indeed, using the decompositions in Lemma 12 (for the case $\gamma=\infty$ ) and in Lemma 13 (for the case $\gamma\in(0,\infty)$ ), if (3.23) holds then w.h.p.

[TABLE]

where it is understood that $A=1$ if $\gamma=\infty$ . Since (3.3) holds uniformly in $\lambda$ and $x$ , this completes the proof of Lemma 6. We turn to the proof of (3.23).

It is important that the estimates in (3.23) hold for all $\varepsilon>0$ and $\eta>0$ small enough (but fixed), where $\eta$ is the parameter implicit in the definition of $\mu_{\lambda}=\mu_{\lambda}^{\eta,t}$ . Since $s\in(0,\gamma)$ , we may assume $t\leq(1-2\eta)T_{\rm ENT}$ by taking $\eta$ small enough. By Theorem 1 we know that for each $\delta>0$ , with high probability there exist sets $\mathcal{B}_{x}$ , such that for all $x\in[n]$ :

[TABLE]

For $\varepsilon>0$ , take $k_{0}=k_{0}(\varepsilon)$ such that $2^{-k_{0}}\leq\varepsilon/2$ , and call $t^{\prime}=t-k_{0}$ . Since $t=\Theta(\alpha^{-1})$ and $\alpha\to 0$ , we have $t^{\prime}>0$ for all $n$ large enough. For all $x\in[n]$ , call $V_{x}$ the subset of vertices in $y\in V_{*}$ such that $P^{k_{0}}(x,y)>0$ . Define

[TABLE]

From (3.12) we know that, for all $\delta>0$ , with high probability,

[TABLE]

By (3.20), (3.25) and (3.26) we obtain

[TABLE]

From (3.25) we also know that $\pi_{0}(\mathcal{C}_{x})\leq\pi_{0}(\mathcal{B}_{x})\leq\delta$ . Taking $\delta=\varepsilon/4$ , this and (3.3) imply the last two items in (3.23) since $2^{-k_{0}}\leq\varepsilon/2$ . It remains to estimate $\mu_{\lambda}(\mathcal{C}_{x})$ . Since $\max_{x}|V_{x}|\leq\Delta^{k_{0}}$ we obtain

[TABLE]

From Lemma 14 we see that with high probability, uniformly in $\lambda$ and $x$ , (3.28) is at most $\Delta^{k_{0}}\delta$ for any fixed $\delta>0$ . Thus taking $\delta=\Delta^{-k_{0}}\varepsilon$ concludes the proof of Lemma 6.

4. Proof of the trichotomy

In this section we show how to prove Theorem 2 from the facts established above. Thus, $G$ is a random graph from either the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) or the out-configuration model OCM( $\mathbf{d}^{+}$ ), where the degree sequences satisfy the assumptions (1.6) and (1.7) respectively, and $\pi_{0}$ denotes the (w.h.p.) unique stationary distribution for the simple random walk on $G$ .

4.1. Scenario 1

We begin with scenario $1$ , namely when $\alpha T_{\rm ENT}\to 0$ .

Proposition 16.

For any sequence $\alpha$ such that $\alpha T_{\rm ENT}\to 0$ ,

[TABLE]

Proof.

We need to show that, uniformly in $\lambda$ , for any $\delta>0$ ,

[TABLE]

The upper bound (2.2) shows that for all $t\in{\mathbb{N}}$ :

[TABLE]

Take $t=sT_{\rm ENT}$ , with some fixed $s>1$ , and observe that by Theorem 1 we know that for all $k>t$ , for all $\lambda$ :

[TABLE]

In particular, using $\alpha t\to 0$ :

[TABLE]

∎

The claim (1.17) is thus a consequence of Corollary 10, Corollary 11 and Theorem 1.

4.2. Scenario 3

Suppose $\alpha T_{\rm ENT}\to+\infty$ , and $t=s/\alpha$ for some fixed $s\in(0,\infty)$ . From Lemma 6, Proposition 8 and the upper bound (2.10) we obtain:

[TABLE]

Equivalently,

[TABLE]

This proves (1.19).

4.3. Scenario 2

Here $\alpha T_{\rm ENT}\to\gamma\in(0,\infty)$ . We take $t=s/\alpha$ , with fixed $s\in(0,\infty)$ . We consider separately the case $s\in(\gamma,\infty)$ and the case $s\in(0,\gamma)$ .

Suppose first $s\in(\gamma,\infty)$ . By Proposition 8 and the triangle inequality

[TABLE]

Since $s\in(\gamma,\infty)$ , for some $\varepsilon>0$ we have $t\geq(1+\varepsilon)T_{\rm ENT}$ . Therefore, by Theorem 1 it follows that

[TABLE]

On the other hand, suppose that $s\in(0,\gamma)$ . Here we can apply Lemma 6, Proposition 8 and the upper bound (2.10), as in Section 4.2 above, to obtain

[TABLE]

Combining (4.9) and (4.10), we have proved (1.18).

5. Widespread measures

The goal of this section is to prove Lemma 5. We remark that the statement $\|\pi_{\alpha,\lambda}-\pi_{0}\|_{\texttt{TV}}\to 0$ in probability is a consequence of (1.29). Indeed, fix any sequence $\alpha=\alpha(n)\to 0$ , and take $t=t(n)\to\infty$ such that $\alpha t\to 0$ . From (1.29) we know that

[TABLE]

As in (4.3), from the upper bound (2.2) and the monotonicity in time of total variation distance to stationarity we obtain:

[TABLE]

Using (5.1) and $\alpha t\to 0$ we conclude the proof. Thus, we are left to prove (1.29).

In the special case where $\lambda=\mu_{\rm in\,\!}$ , and for the directed configuration model DCM( $\mathbf{d}^{\pm}$ ), a similar result was already obtained in [6]. Here we are going to prove it for the case of the out-configuration model OCM( $\mathbf{d}^{+}$ ) as well, and more importantly we are going to extend it to the case of an arbitrary widespread probability measure $\lambda$ . Following the approach in [6], the proof of Lemma 5 will be based on the construction of a martingale approximation for the distribution $\lambda P^{t}$ . The latter, in turn, rests on a branching approximation which allows one to couple the in-neighbourhood of a uniformly distributed random vertex of $G$ with a marked Galton-Watson tree up to depth $t=o(\log n)$ .

We start with the definition of the relevant branching processes and the associated martingales. These will later be used in a coupling argument to provide an approximate description of the in-neighbourhood of a vertex in our random graphs, and of the stationary distribution at that vertex. Since the constructions differ slightly for the two models DCM( $\mathbf{d}^{\pm}$ ) or OCM( $\mathbf{d}^{+}$ ) we will define two distinct random trees $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ and $\mathcal{T}^{-}(\mathbf{d}^{+})$ .

5.1. The marked Galton-Watson trees $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ , $\mathcal{T}^{-}(\mathbf{d}^{+})$

Given $n\in{\mathbb{N}}$ , and a double sequence $\mathbf{d}^{\pm}$ of degrees satisfying (1.5) and (1.6), for each $i\in[n]$ , we define the rooted random marked tree $\mathcal{T}^{-}_{i}(\mathbf{d}^{\pm})$ recursively with the following rules:

•

the root is given the mark $i$ ;

•

every vertex with mark $j$ has $d^{-}_{j}$ children, each of which is given independently the mark $k\in[n]$ with probability $d^{+}_{k}/m$ .

On the other hand, given $n\in{\mathbb{N}}$ , and a sequence $\mathbf{d}^{+}$ of degrees satisfying (1.7), for each $i\in[n]$ , the rooted random marked tree $\mathcal{T}^{-}_{i}(\mathbf{d}^{+})$ is defined by:

•

the root is given the mark $i$ ;

•

regardless of its own mark every vertex has, for each $j\in[n]$ independently with probability $d^{+}_{j}/n$ , a child with mark $j$ .

There are several differences between the two trees $\mathcal{T}^{-}_{i}(\mathbf{d}^{\pm})$ and $\mathcal{T}^{-}_{i}(\mathbf{d}^{+})$ . In the first case the number of children of a given vertex is a deterministic function of the vertex’s mark, whereas in the second case it is a random variable $D$ that can be written as

[TABLE]

where the $Y_{j}$ are independent Bernoulli random variables with parameters $d^{+}_{j}/n$ . In particular, the average number of children of any given vertex in $\mathcal{T}^{-}_{i}(\mathbf{d}^{+})$ is

[TABLE]

Since $D$ can be zero, in contrast with the tree $\mathcal{T}^{-}_{i}(\mathbf{d}^{\pm})$ , the tree $\mathcal{T}^{-}(\mathbf{d}^{+})$ is finite with positive probability. However, the two trees share several common features and we shall try to treat the two cases in a unified fashion as much as possible.
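The offspring rule for $\mathcal{T}^{-}_{i}(\mathbf{d}^{+})$ can be sketched as follows. The helper names are hypothetical, and the sketch assumes $d^{+}_{j}\leq n$ for all $j$ so that $d^{+}_{j}/n$ is a probability. It samples the marks of the children of one vertex and evaluates the mean of $D$ and the probability that $D=0$.

```python
import random
from math import prod


def sample_ocm_children(d_out, rng):
    """Marks of the children of one vertex: each mark j appears
    independently with probability d_out[j] / n."""
    n = len(d_out)
    return [j for j in range(n) if rng.random() < d_out[j] / n]


def mean_offspring(d_out):
    """E[D] = sum_j d_out[j] / n, the mean of a sum of independent Bernoullis."""
    return sum(d_out) / len(d_out)


def prob_no_children(d_out):
    """P(D = 0) = prod_j (1 - d_out[j] / n)."""
    n = len(d_out)
    return prod(1 - d / n for d in d_out)
```

Since each mark is used at most once per vertex, the children of a given vertex in this tree always carry distinct marks, unlike in $\mathcal{T}^{-}_{i}(\mathbf{d}^{\pm})$.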

We write $\mathbf{o}$ for the root and $\mathbf{x},\mathbf{y}$ for other vertices of the tree, with the notation $\mathbf{y}\to\mathbf{x}$ if $\mathbf{y}$ is a child of $\mathbf{x}$ . Each vertex $\mathbf{x}$ of the tree has a mark, which we denote by $i(\mathbf{x})$ . If $\mathcal{I}$ denotes an independent uniformly random $i\in[n]$ , and the root is given the mark $i(\mathbf{o})=\mathcal{I}$ , then we write $\mathcal{T}^{-}(\mathbf{d}^{\pm})=\mathcal{T}^{-}_{\mathcal{I}}(\mathbf{d}^{\pm})$ and $\mathcal{T}^{-}(\mathbf{d}^{+})=\mathcal{T}^{-}_{\mathcal{I}}(\mathbf{d}^{+})$ . Notice that $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ and $\mathcal{T}^{-}(\mathbf{d}^{+})$ have the same average degree at the root, given by (5.4). We often write $\mathcal{T}^{-}$ for short if this creates no confusion. For each $t\in{\mathbb{N}}$ we let $\mathcal{T}^{-,t}$ denote the set of vertices in the generation $t$ of the tree. Each vertex $\mathbf{x}\in\mathcal{T}^{-,t}$ has a unique path $(\mathbf{x}_{t},\mathbf{x}_{t-1},\dots,\mathbf{x}_{1},\mathbf{x}_{0})$ connecting it to the root with $\mathbf{x}_{t}=\mathbf{x}$ and $\mathbf{x}_{0}=\mathbf{o}$ . To any such $\mathbf{x}$ we associate the weight

[TABLE]

If $\mathcal{T}^{-,t}$ coincides with the in-neighbourhood of $\mathbf{o}$ in a digraph $G$ , then $w(\mathbf{x})$ is the probability that the simple random walk on $G$ goes from $\mathbf{x}$ to $\mathbf{o}$ in $t$ steps.
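To make the recursive definition of $\mathcal{T}^{-}_{i}(\mathbf{d}^{\pm})$ concrete, here is a minimal sampler that keeps track of the weights generation by generation. It assumes that the weight of a vertex is $w(\mathbf{x})=\prod_{s=1}^{t}1/d^{+}_{i(\mathbf{x}_{s})}$, which is consistent with the random walk interpretation just given; the function name and the list-of-pairs representation are illustrative choices.

```python
import random


def sample_generations(d_in, d_out, root_mark, t, rng):
    """Sample the first t generations of the marked tree for model 1.

    Returns a list gens where gens[s] is a list of (mark, weight) pairs
    for the vertices in generation s."""
    marks = list(range(len(d_out)))
    gens = [[(root_mark, 1.0)]]  # the root has weight 1 (empty product)
    for _ in range(t):
        nxt = []
        for mark, w in gens[-1]:
            # a vertex with mark j has d_in[j] children, each receiving
            # mark k independently with probability d_out[k] / m
            children = rng.choices(marks, weights=d_out, k=d_in[mark])
            nxt.extend((k, w / d_out[k]) for k in children)
        gens.append(nxt)
    return gens
```

In the regular case $d^{+}_{j}=d^{-}_{j}=d$ for all $j$, generation $t$ contains $d^{t}$ vertices, each of weight $d^{-t}$, so the total weight of every generation equals $1$.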

5.2. Martingale approximation

Given a function $\varphi:[n]\mapsto{\mathbb{R}}$ , we define the process

[TABLE]

We write $\mathcal{F}_{t}$ for the $\sigma$ -algebra generated by the random tree $\mathcal{T}^{-}$ up to and including generation $t$ .

Lemma 17.

Let $\mathcal{T}^{-}$ be either $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ or $\mathcal{T}^{-}(\mathbf{d}^{+})$ , and write $\bar{\varphi}=\sum_{j=1}^{n}\varphi(j)$ . Then, for all $t\in{\mathbb{N}}$ :

[TABLE]

Proof.

Let the symbol $\sum_{\mathbf{y}\to\mathbf{x}}$ denote the sum over the set of children of $\mathbf{x}$ and note the symbolic identity

[TABLE]

Therefore,

[TABLE]

For the tree $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ we have

[TABLE]

For the tree $\mathcal{T}^{-}(\mathbf{d}^{+})$ we have

[TABLE]

This proves (5.7). ∎

In particular, when $\varphi=\mu_{\rm in\,\!}$ , then

[TABLE]

Therefore, $X_{t}(\mu_{\rm in\,\!})$ is a martingale with respect to the filtration $\mathcal{F}_{t}$ . It is convenient to normalize it and consider instead the martingale defined as

[TABLE]

Notice that ${\mathbb{E}}[M_{t}]={\mathbb{E}}[M_{0}]=n{\mathbb{E}}[\mu_{\rm in\,\!}(\mathcal{I})]=1$ . In the case of model 1, the following convergence result was already discussed in [6, Proposition 15].

Proposition 18.

For every fixed $n$ , as $t\to\infty$ the martingale $M_{t}$ converges to a limit $M_{\infty}$ , both almost surely and in $L^{2}$ (see [22, Ch. 12]) and for all $t\in{\mathbb{N}}$ :

[TABLE]

where the constants $\rho,C$ are given by

[TABLE]

Proof.

Consider the increments

[TABLE]

Reasoning as in Lemma 17, for both models we write

[TABLE]

where $\psi$ is defined as

[TABLE]

As in Lemma 17 one has ${\mathbb{E}}[\psi(\mathbf{x})\,|\,\mathcal{F}_{t}]=0$ . Let us compute ${\mathbb{E}}[\psi(\mathbf{x})^{2}|\,\mathcal{F}_{t}]$ . For the tree $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ , we can rewrite

[TABLE]

Therefore,

[TABLE]

where we use the notation

[TABLE]

For the tree $\mathcal{T}^{-}(\mathbf{d}^{+})$ we have

[TABLE]

where $\rho$ is as in (5.14). Since ${\mathbb{E}}[\psi(\mathbf{x})\psi(\mathbf{x}^{\prime})\,|\,\mathcal{F}_{t}]=0$ for all $\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{T}^{-,t}$ with $\mathbf{x}\neq\mathbf{x}^{\prime}$ ,

[TABLE]

Therefore, combining (5.19) and (5.2) we have

[TABLE]

where $\rho,C$ are given by (5.14). Furthermore, observe that in both models one has

[TABLE]

Thus, iterating we obtain

[TABLE]

Since $d^{+}_{j}\geq 2$ one has $\rho\leq 1/2$ . Thus $M_{t}$ is a martingale bounded in $L^{2}$ , and therefore $M_{t}\to M_{\infty}$ almost surely and in $L^{2}$ , for some $M_{\infty}\in L^{2}$ . Using the orthogonality ${\mathbb{E}}[\Delta_{t}\Delta_{t^{\prime}}]=0$ for all $t\neq t^{\prime}$ , (5.13) follows by summing (5.24) from $t$ to $+\infty$ . ∎

Remark 19.

For each fixed $n\in{\mathbb{N}}$ , one can characterise the random variable $M_{\infty}$ as the solution to a distributional fixed point equation. For the directed configuration model DCM( $\mathbf{d}^{\pm}$ ) this is discussed in [6, Lemma 16]. With a similar reasoning, for the out-configuration model OCM( $\mathbf{d}^{+}$ ) one obtains that

[TABLE]

where $\overset{d}{=}$ stands for equality of distributions, $M_{\infty,j}$ are i.i.d. copies of $M_{\infty}$ and $Y_{j}$ are independent Bernoulli random variables with parameter $d^{+}_{j}/n$ .

The next result will be crucial for the analysis of widespread measures. Notice that the constant $\gamma(\lambda)$ appearing in the estimate below is bounded uniformly in $n$ if and only if $\lambda$ satisfies (1.28).

Proposition 20.

For any probability vector $\lambda$ , and any $t\in{\mathbb{N}}$ :

[TABLE]

where $\rho\in(0,1)$ is as in Proposition 18 and $\gamma(\lambda)$ is defined as

[TABLE]

Proof.

Setting $\varphi(j)=n(\mu_{\rm in\,\!}(j)-\lambda(j))$ , we write $M_{t}-nX_{t}(\lambda)=X_{t}(\varphi)$ . Since $\bar{\varphi}=0$ , Lemma 17 shows that ${\mathbb{E}}[M_{t}-nX_{t}(\lambda)|\mathcal{F}_{t-1}]=0$ . We now compute

[TABLE]

Using $\bar{\varphi}=0$ one has

[TABLE]

For the tree $\mathcal{T}^{-}(\mathbf{d}^{\pm})$ we have

[TABLE]

On the other hand for the tree $\mathcal{T}^{-}(\mathbf{d}^{+})$ we have

[TABLE]

Summarising, we have shown that

[TABLE]

Thus, the same argument used in (5.2) implies that in both models

[TABLE]

where $\rho$ is defined as in (5.14). Therefore,

[TABLE]

The desired bound follows from the fact that in both models $C(\lambda)\leq\gamma(\lambda)$ . ∎

5.3. Branching approximation for in-neighbourhoods

The $t$ -in-neighbourhood of a vertex $v$ , denoted $B^{-}_{v,t}$ , is defined as the subgraph of $G$ induced by the set of directed paths of length $t$ in $G$ which terminate at vertex $v$ . Here we observe that for any fixed $v\in[n]$ , if $t$ is a small multiple of $\log n$ then with high probability $B^{-}_{v,t}$ can be coupled to the first $t$ generations of the random trees defined in Section 5.1. We consider the two models separately.

5.3.1. $B^{-}_{v,t}$ for DCM( $\mathbf{d}^{\pm}$ )

Recall that each vertex $x$ has $d^{-}_{x}$ heads and $d^{+}_{x}$ tails. Call $E_{x}^{-}$ and $E_{x}^{+}$ the sets of heads and tails at $x$ respectively. The uniform bijection $\omega$ between heads and tails, viewed as a matching, can be sampled by iterating the following steps until there are no unmatched heads left:

1) pick an unmatched head $e_{-}$ according to some priority rule;

2) pick an unmatched tail $e_{+}$ uniformly at random;

3) match $e_{-}$ with $e_{+}$ , i.e. set $\omega(e_{+})=e_{-}$ .

Note that this gives the desired uniform distribution over matchings regardless of the priority rule chosen at step 1. The graph $G$ is obtained by adding a directed edge $(x,y)$ whenever $e_{-}\in E_{y}^{-}$ and $e_{+}\in E_{x}^{+}$ in step 3 above.
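The sequential sampling of the matching can be sketched as follows. This is an illustrative implementation under the simplest priority rule (heads are processed in a fixed list order); the function name and the edge-list output are our own choices.

```python
import random


def sample_dcm_edges(d_in, d_out, rng):
    """Sample the directed configuration model by sequential matching.

    Returns a list of directed edges (x, y), one per matched (tail, head) pair."""
    heads = [y for y, d in enumerate(d_in) for _ in range(d)]   # head -> its vertex
    tails = [x for x, d in enumerate(d_out) for _ in range(d)]  # tail -> its vertex
    if len(heads) != len(tails):
        raise ValueError("in-degrees and out-degrees must have the same sum")
    edges = []
    for y in heads:                        # step 1: priority rule = list order
        j = rng.randrange(len(tails))      # step 2: uniform unmatched tail
        tails[j], tails[-1] = tails[-1], tails[j]
        x = tails.pop()                    # the chosen tail is now matched
        edges.append((x, y))               # step 3: edge from x to y
    return edges
```

Replacing the list order by a breadth-first order around a vertex $v$ gives the exploration of the in-neighbourhood used in the rest of this subsection, without changing the law of the resulting graph.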

To generate $B^{-}_{v,t}$ only, one can start at vertex $v$ and run the previous sequence of steps, by giving priority to those unmatched heads which have minimal distance from vertex $v$ , until this minimal distance exceeds $t$ , at which point the process stops. During the process, say that a vertex $x$ is exposed if at least one of the tails $e_{+}\in E_{x}^{+}$ or heads $e_{-}\in E_{x}^{-}$ has been already matched. Notice that as long as in step 2 no tail $e_{+}$ is picked from exposed vertices, the resulting digraph is a directed tree.

Let us now describe a coupling of the in-neighbourhood $B^{-}_{v,t}$ and the marked tree $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{\pm})$, where $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{\pm})$ stands for the marked tree $\mathcal{T}^{-}_{v}(\mathbf{d}^{\pm})$ up to generation $t$; see Section 5.1 for the definition of $\mathcal{T}^{-}_{v}(\mathbf{d}^{\pm})$. Clearly, step $2$ above can be modified by picking $e_{+}$ uniformly at random among all (matched or unmatched) tails and rejecting the proposal if the tail was already matched. The tree can then be generated by iterating the same sequence of steps, with the difference that at step $2$ we never reject the proposal and at step $3$ we add a new leaf to the current tree, with mark $x$ if $e_{+}\in E_{x}^{+}$, together with a new set of $d^{-}_{x}$ unmatched heads attached to it. Call $\tau$ the first time that a uniform random choice among all tails gives $e_{+}\in E_{x}^{+}$ with $x$ already in the tree. By construction, the in-neighbourhood and the tree coincide up to time $\tau$. At the $k$-th iteration, the probability of picking a tail with a mark already used is at most $k\Delta/m$, where $\Delta$ is the maximum degree. Therefore, by a union bound,

[TABLE]

Taking $k=\Delta^{t+1}$ steps, we have necessarily uncovered the whole in-neighbourhood $B^{-}_{v,t}$ . Thus, we have proved the following statement.

Lemma 21.

The $t$ -in-neighbourhood $B^{-}_{v,t}$ and the marked tree $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{\pm})$ can be coupled in such a way that

[TABLE]
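The arithmetic behind the union bound can be spelled out as follows. This is a sketch built only from the per-step bound $k\Delta/m$ and the choice $k=\Delta^{t+1}$ stated above; the constants in the paper's displayed estimates may differ.

```latex
\[
\mathbb{P}(\tau\le k)\;\le\;\sum_{j=1}^{k}\frac{j\Delta}{m}
=\frac{\Delta\,k(k+1)}{2m}\;\le\;\frac{\Delta k^{2}}{m},
\qquad
k=\Delta^{t+1}\;\Longrightarrow\;\frac{\Delta k^{2}}{m}=\frac{\Delta^{2t+3}}{m}.
\]
```

In particular, the failure probability of the coupling is small as long as $\Delta^{2t+3}$ is small compared with $m$.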

5.3.2. $B^{-}_{v,t}$ for OCM( $\mathbf{d}^{+}$ )

Recall that each vertex $x$ has $d^{+}_{x}$ tails, and call $E_{x}^{+}$ the set of tails at $x$. Consider the following exploration process of the in-neighbourhood at a fixed vertex $v$. The process is defined as a triple $(\mathcal{C}_{\ell},\mathcal{A}_{\ell},\phi_{\ell})$ where $\mathcal{C}_{\ell},\mathcal{A}_{\ell}\subset[n]$ are respectively the completed set and the active set at time $\ell$, and $\phi_{\ell}:[n]\to{\mathbb{Z}}_{+}$ is a map such that $\phi_{\ell}(y)\in\{0,\dots,d^{+}_{y}\}$ for each $y\in[n]$, $\ell\in{\mathbb{Z}}_{+}$. At time zero we set $\mathcal{C}_{0}=\emptyset$, $\mathcal{A}_{0}=\{v\}$, and $\phi_{0}(y)=0$ for all $y\in[n]$. The $\ell$-th iteration of the exploration determines the triple $(\mathcal{C}_{\ell},\mathcal{A}_{\ell},\phi_{\ell})$ by executing the following steps:

1) pick a vertex $x\in\mathcal{A}_{\ell-1}$ according to some priority rule;

2) for each $y=1,\dots,n$ independently, sample $X_{\ell,y}$ defined as the Bernoulli random variable with parameter

[TABLE]

call $V_{\ell}$ the set of $y\in[n]$ such that $X_{\ell,y}=1$, and define

[TABLE]

3) define the new triple $(\mathcal{C}_{\ell},\mathcal{A}_{\ell},\phi_{\ell})$ as

[TABLE]

Note that this process stops when $\mathcal{A}_{\ell}$ becomes empty. Let us call $\tau_{\emptyset}$ this random time:

[TABLE]

For instance, $\tau_{\emptyset}=1$ with probability $\prod_{y=1}^{n}(1-d^{+}_{y}/n)$. We may construct a digraph $G_{v}(\ell)$ along with the above process by adding the directed edges $(y,x)$ for all $y\in V_{\ell}$ at step $2$. Notice that when the process stops, $G_{v}(\tau_{\emptyset})$ is a sample of the subgraph of $G$ induced by all directed paths in $G$ that terminate at $v$. In particular, if the priority in step 1 is given to the vertices $x$ with minimal distance to $v$, and if we stop the process as soon as all active vertices have distance to $v$ larger than $t$ in the current graph $G_{v}(\ell)$, we obtain the in-neighbourhood of $v$ at distance $t$, namely the digraph $B^{-}_{v,t}$ for the model OCM($\mathbf{d}^{+}$). More formally, if $\tau_{t}$ denotes the minimal $\ell$ such that all $x\in\mathcal{A}_{\ell}$ have distance to $v$ at least $t+1$ in $G_{v}(\ell)$, then $B^{-}_{v,t}$ is given by the subgraph of $G_{v}(\tau_{t}\wedge\tau_{\emptyset})$ induced by the completed set $\mathcal{C}_{\tau_{t}\wedge\tau_{\emptyset}}$, where $a\wedge b$ denotes the minimum of $a,b$.
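For small $n$, the object produced by the exploration can be compared against a brute-force computation. The sketch below does not implement the sequential exploration with the probabilities $p_{\ell,y}$; it samples the whole digraph first and then reads off the in-neighbourhood by a reverse breadth-first search. It assumes that in OCM($\mathbf{d}^{+}$) each vertex $y$ picks $d^{+}_{y}$ distinct out-neighbours uniformly from all $n$ vertices, which is consistent with the probability of $\tau_{\emptyset}=1$ stated above. The names `sample_ocm` and `in_ball` are hypothetical.

```python
import random
from collections import deque

def sample_ocm(d_out, seed=0):
    """Assumed OCM rule: vertex y picks d_out[y] distinct out-neighbours uniformly from [n]."""
    rng = random.Random(seed)
    n = len(d_out)
    return [(y, x) for y in range(n) for x in rng.sample(range(n), d_out[y])]

def in_ball(edges, v, t):
    """Return (dist, ball_edges) for the vertices that reach v in at most t steps.

    dist maps each such vertex to its distance to v; ball_edges lists the edges
    (y, x), meaning y -> x, entering vertices at distance less than t from v.
    """
    preds = {}
    for y, x in edges:
        preds.setdefault(x, []).append(y)
    dist = {v: 0}
    queue = deque([v])
    ball_edges = []
    while queue:
        x = queue.popleft()
        if dist[x] == t:      # do not look behind vertices at depth t
            continue
        for y in preds.get(x, []):
            ball_edges.append((y, x))
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist, ball_edges
```

On the directed path `[(1, 0), (2, 1), (3, 2)]` with `v = 0` and `t = 2`, the call returns distances for vertices 0, 1, 2 and the two edges `(1, 0)` and `(2, 1)`.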

Let us remark that the quantity $p_{\ell,y}$ in (5.36) cannot exceed $1$. Indeed, if $p_{\ell,y}=1$ for some $\ell\in\mathbb{N}$, then at most $d_{y}^{+}$ vertices remain to be discovered at step $\ell$, and vertex $y$ must link to all of them. Hence, $p_{\ell,y}$ stays equal to $1$ until the end of the process.

Let us now describe a coupling of $B^{-}_{v,t}$ and the marked tree $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{+})$, where we write $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{+})$ for the marked tree $\mathcal{T}^{-}_{v}(\mathbf{d}^{+})$ up to generation $t$; see Section 5.1. First, observe that the tree $\mathcal{T}^{-}_{v}(\mathbf{d}^{+})$ is obtained by iterating the steps above, with the difference that at step 2 the probability $p_{\ell,y}$ is always taken equal to $d^{+}_{y}/n$, and that each $y\in V_{\ell}$ yields a new child with mark $y$ in the current tree. Let $\mathcal{T}^{-}_{v}(\ell)$ denote the tree obtained after $\ell$ iterations, and let $\Delta=\max_{x}d^{+}_{x}$.

Lemma 22.

The random variables $G_{v}(\ell),\mathcal{T}^{-}_{v}(\ell)$ can be coupled in such a way that for every $\ell\in{\mathbb{N}}$ :

[TABLE]

Proof.

Let $E_{\ell}=\{G_{v}(\ell)\neq\mathcal{T}^{-}_{v}(\ell)\}$ . Since at time 0 one has $G_{v}(0)=\mathcal{T}^{-}_{v}(0)=\{v\}$ , the event $E_{\ell}$ satisfies $E_{\ell}=\cup_{k=1}^{\ell}E_{k-1}^{c}\cap E_{k}$ , so that

[TABLE]

Consider now the $k$-th iteration, and assume that $G_{v}(k-1)=\mathcal{T}^{-}_{v}(k-1)$. Thus, we may pick the same $x$ in step 1 for both samples. At step 2, let $X_{k,y}$ denote the Bernoulli random variables with parameter $p_{k,y}$ used for the sampling of $G_{v}(k)$ and let $\tilde{X}_{k,y}$ be the Bernoulli random variables with parameter $d^{+}_{y}/n$ used for the sampling of $\mathcal{T}^{-}_{v}(k)$. The total variation distance between two Bernoulli random variables equals the absolute value of the difference of their parameters. Therefore, for each $y$ independently we may couple $(X_{k,y},\tilde{X}_{k,y})$ so that the two variables coincide with probability $1-|p_{k,y}-d^{+}_{y}/n|$. Notice that if $G_{v}(k)\neq\mathcal{T}^{-}_{v}(k)$, then either at least one of the pairs $(X_{k,y},\tilde{X}_{k,y})$ fails to couple, or at least one of the $y\in\mathcal{C}_{k-1}\cup\mathcal{A}_{k-1}$ has $\tilde{X}_{k,y}=1$. Thus, on the event $E_{k-1}^{c}$, the probability of $E_{k}$ given the history up to the $(k-1)$-th iteration is bounded above by

[TABLE]
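The coupling of two Bernoulli variables used in this step can be realised with a single shared uniform variable. The following is a minimal sketch; the function name `coupled_bernoulli` and the convention of passing the uniform sample `u` explicitly are choices made for the example.

```python
import random

def coupled_bernoulli(p, q, u):
    """Couple Bernoulli(p) and Bernoulli(q) through one uniform sample u in [0, 1).

    Both indicators use the same u, so they differ exactly when u lies between
    p and q, an event of probability |p - q| for uniform u.
    """
    return int(u < p), int(u < q)

def sample_coupled_pair(p, q, rng=random):
    """Draw the shared uniform from rng and return the coupled pair."""
    return coupled_bernoulli(p, q, rng.random())
```

With $p=0.3$ and $q=0.5$, the two variables disagree only for $u\in[0.3,0.5)$, which matches the total variation distance $|p-q|=0.2$.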

If $y\not\in\mathcal{C}_{k-1}\cup\mathcal{A}_{k-1}$ , then $\phi_{k-1}(y)=0$ and $p_{k,y}-d^{+}_{y}/n=\frac{d_{y}^{+}}{n}\cdot\frac{k-1}{n-k+1}$ . For the second term we write $|\mathcal{C}_{k-1}\cup\mathcal{A}_{k-1}|\leq Z_{k-1}$ , where $Z_{\ell}$ denotes the number of edges in the tree $\mathcal{T}^{-}_{v}(\ell)$ . In conclusion, (5.41) is bounded by

[TABLE]

Thus, letting $\mathcal{F}_{\ell}$ denote the $\sigma$ -algebra generated by the two processes up to time $\ell$ , we have obtained

[TABLE]

From (5.4) we deduce ${\mathbb{E}}[Z_{k-1}]=(k-1)\left\langle d\right\rangle\leq(k-1)\Delta$. Therefore, the estimate (5.39) follows from (5.40) and the preceding display. ∎
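The difference of parameters quoted in the proof can be checked directly. The computation below assumes that $p_{k,y}=d^{+}_{y}/(n-k+1)$ whenever $\phi_{k-1}(y)=0$; this form is inferred from the stated difference and is not copied from (5.36).

```latex
\[
p_{k,y}-\frac{d^{+}_{y}}{n}
=\frac{d^{+}_{y}}{n-k+1}-\frac{d^{+}_{y}}{n}
=\frac{d^{+}_{y}}{n}\cdot\frac{k-1}{n-k+1},
\qquad
\sum_{y\in[n]}\frac{d^{+}_{y}}{n}\cdot\frac{k-1}{n-k+1}\;\le\;\frac{\Delta\,(k-1)}{n-k+1}.
\]
```

The second inequality uses only $d^{+}_{y}\leq\Delta$ for every $y\in[n]$.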

The next lemma establishes the coupling estimate for the $t$ -in-neighbourhood $B^{-}_{v,t}$ and the tree $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{+})$ . The estimate could be refined but (5.43) below will be more than sufficient for our purposes.

Lemma 23.

The random variables $B^{-}_{v,t}$ and the tree $\mathcal{T}^{-}_{v,t}(\mathbf{d}^{+})$ can be coupled in such a way that for every $t\leq\frac{\log n}{4\log\Delta}$ , for all $n$ large enough:

[TABLE]

Proof.

Let $|\mathcal{T}^{-}_{v,t}|$ denote the number of edges in the tree $\mathcal{T}^{-}_{v,t}=\mathcal{T}^{-}_{v,t}(\mathbf{d}^{+})$ . Since at each iteration the number of edges added is stochastically dominated by a binomial random variable with parameters $n$ and $\Delta/n$ , one has a large deviation bound for $|\mathcal{T}^{-}_{v,t}|$ of the form: there exist absolute constants $a,A>0$ such that

[TABLE]

The estimate (5.44) can be proved e.g. by repeating the argument in [8, Lemma 23]. Next, observe that if $|\mathcal{T}^{-}_{v,t}|\leq s\Delta^{t}$ and $B^{-}_{v,t}\neq\mathcal{T}^{-}_{v,t}$ , then there must exist $\ell=1,\dots,s\Delta^{t}$ such that $G_{v}(\ell)\neq\mathcal{T}^{-}_{v}(\ell)$ . The latter probability can be bounded via Lemma 22. Summarizing,

[TABLE]

The estimate (5.43) follows by taking $s=K\log n$ for some large enough constant $K$ , and by taking $n$ sufficiently large. ∎
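As a quick check of what the restriction on $t$ provides, assuming $\Delta\geq 2$ so that $\log\Delta>0$:

```latex
\[
t\le\frac{\log n}{4\log\Delta}
\;\Longrightarrow\;
\Delta^{t}=e^{t\log\Delta}\le n^{1/4},
\qquad
s\,\Delta^{t}\le K\,n^{1/4}\log n\quad\text{for } s=K\log n .
\]
```

Thus Lemma 22 only needs to be applied for a number of iterations of order at most $n^{1/4}\log n$.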

5.4. Proof of Lemma 5

Recall that in both models DCM( $\mathbf{d}^{\pm}$ ) and OCM( $\mathbf{d}^{+}$ ) one has w.h.p. a unique stationary distribution for the simple random walk on $G$ , which we denote $\pi_{0}$ . The starting point is a result that follows directly from [6, 7], which allows us to replace the unknown distribution $\pi_{0}$ with a local approximation.

Proposition 24.

For any fixed $\varepsilon>0$ , taking $h=\varepsilon T_{\rm ENT}$ , as $n\to\infty$ both models satisfy

[TABLE]

Proof.

For a specific choice of $\varepsilon=\varepsilon_{0}$, this result appears in [6, Eq. (11)] for model 1 and [7, Eq. (12)] for model 2. In fact, the proofs in [6, 7] apply to any fixed $\varepsilon\in(0,\varepsilon_{0})$ without modification. Since $\left\|\mu_{\rm in\,\!}P^{h}-\pi_{0}\right\|_{\texttt{TV}}$ is non-increasing in $h$, the statement (5.46) then holds for all $\varepsilon>0$. ∎
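The monotonicity used here is the standard contraction of total variation distance under a stochastic matrix, combined with the stationarity $\pi_{0}P=\pi_{0}$. For any probability measure $\mu$ on $[n]$ and any $h\geq 0$:

```latex
\[
\left\|\mu P^{h+1}-\pi_{0}\right\|_{\texttt{TV}}
=\left\|\left(\mu P^{h}-\pi_{0}\right)P\right\|_{\texttt{TV}}
\le\left\|\mu P^{h}-\pi_{0}\right\|_{\texttt{TV}} .
\]
```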

To prove Lemma 5, by monotonicity of $\left\|\lambda P^{t}-\pi_{0}\right\|_{\texttt{TV}}$ as a function of $t$ , we may restrict to sequences $t=t(n)\to\infty$ with $t=o(\log n)$ . Thus, taking advantage of Proposition 24, the conclusion of Lemma 5 is a consequence of the following result.

Proposition 25.

There exists $\varepsilon>0$ such that if $h=\varepsilon T_{\rm ENT}$ , then for any $t=t(n)\to\infty$ with $t=o(\log n)$ , for any widespread measure $\lambda$ :

[TABLE]

Proof.

The proof is based on a first moment argument. Indeed, it suffices to show that

[TABLE]

Observe that

[TABLE]

where $\mathcal{I}$ denotes an independent uniformly random vertex in $[n]$ and the expectation ${\mathbb{E}}$ is understood to include the expectation over $\mathcal{I}$ as well. Consider the first term above. We are going to use Lemma 21 for model 1 and Lemma 23 for model 2. Notice that since these estimates apply to any fixed vertex $v$, they apply just as well if the vertex $v$ is taken to be uniformly random in $[n]$, i.e. if $v=\mathcal{I}$, as is the case here. In particular, since $t=o(\log n)$, as $n\to\infty$,

[TABLE]

where we use the unified notation $\mathcal{T}^{-}_{t}$ for the first $t$ generations of the tree $\mathcal{T}^{-}_{\mathcal{I}}$ in either model 1 or model 2. Next, note that by definition, if $B^{-}_{\mathcal{I},t}=\mathcal{T}^{-}_{t}$ , then

[TABLE]

where we use the notation from (5.6) and (5.12). Therefore,

[TABLE]

where we used the fact that

[TABLE]

which follows from $\left\|\lambda P^{t}-\mu_{\rm in\,\!}P^{t}\right\|_{\texttt{TV}}\leq 1$ . Using Schwarz’ inequality and Proposition 20 it follows that

[TABLE]

Since $t=t(n)\to\infty$ as $n\to\infty$ and $\rho\in(0,1)$ , using (5.50) we conclude that

[TABLE]

for every widespread measure $\lambda$. This settles the convergence of the first term in the decomposition above. To handle the second term, reasoning as in (5.51) we obtain

[TABLE]

If $h\leq\frac{\log n}{4\log\Delta}$ , Lemma 21 and Lemma 23 imply that both models satisfy

[TABLE]

Moreover, Schwarz’ inequality, Proposition 18 and standard facts about square integrable martingales (see, e.g., [22, Ch. 12]) imply

[TABLE]

Since the constant $C$ is bounded, letting $n\to\infty$ concludes the proof.

∎

Acknowledgments

We acknowledge support of PRIN 2015 5PAWZB “Large Scale Random Structures", and of INdAM-GNAMPA Project 2019 “Markov chains and games on networks”.

Bibliography

[1] Louigi Addario-Berry, Borja Balle, and Guillem Perarnau. Diameter and stationary distribution of random $r$-out digraphs. The Electronic Journal of Combinatorics, pages P3–28, 2020.
[2] Reid Andersen, Fan Chung, and Kevin Lang. Local graph partitioning using PageRank vectors. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 475–486. IEEE, 2006.
[3] Luca Avena, Hakan Güldas, Remco van der Hofstad, and Frank den Hollander. Random walks on dynamic configuration models: a trichotomy. Stochastic Processes and their Applications, 2018.
[4] Anna Ben-Hamou and Justin Salez. Cutoff for nonbacktracking random walks on sparse random graphs. The Annals of Probability, 45(3):1752–1770, 2017.
[5] Nathanael Berestycki, Eyal Lubetzky, Yuval Peres, and Allan Sly. Random walks on the random graph. The Annals of Probability, 46(1):456–490, 2018.
[6] Charles Bordenave, Pietro Caputo, and Justin Salez. Random walk on sparse random digraphs. Probability Theory and Related Fields, 170(3):933–960, 2018.
[7] Charles Bordenave, Pietro Caputo, and Justin Salez. Cutoff at the "entropic time" for sparse Markov chains. Probability Theory and Related Fields, 173(1):261–292, 2019.
[8] Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Nonbacktracking spectrum of random graphs: Community detection and nonregular Ramanujan graphs. The Annals of Probability, 46(1):1–71, 2018.
