Limiting shape of the Depth First Search tree in an Erdős-Rényi graph

Nathanaël Enriquez, Gabriel Faraud, Laurent Ménard

arXiv:1704.00696 · math.PR · March 16, 2020







Abstract

We show that the profile of the tree constructed by the Depth First Search Algorithm in the giant component of an Erdős-Rényi graph with $N$ vertices and connection probability $c/N$ converges to an explicit deterministic shape. This makes it possible to exhibit a long non-intersecting path of length $\left(\rho_{c}-\frac{\mathrm{Li}_{2}(\rho_{c})}{c}\right)\times N$ , where $\rho_{c}$ is the density of the giant component.

**Keywords.** Erdős-Rényi graphs, Depth First Search Algorithm.

2010 Mathematics Subject Classification. 60K35, 82C21, 60J20, 60F10.

Université Paris Nanterre

1 Introduction

The celebrated Erdős-Rényi model of random graphs [ER] exhibits a phase transition when the average degree in the graph is $1$. Above this threshold, the graph contains with high probability a unique connected component of macroscopic size, called the giant component. The geometry of this giant component has been the subject of numerous research articles, and we refer to the monographs by Bollobás [B], Durrett [D] or Frieze and Karoński [F] for extensive surveys. Some results are strikingly sharp. This is the case for the typical distance between vertices (see Durrett [D]) and for the diameter (see Riordan and Wormald [RW]), which are both of logarithmic order in the number of vertices. One could ask whether this small-world effect prevents the graph from containing a long simple path. This is not the case: Ajtai, Komlós and Szemerédi [AKS] proved that there exists a simple path of linear length in the supercritical regime, solving a conjecture of P. Erdős [E]. In the recent paper [KS], Krivelevich and Sudakov propose a simple proof of the phase transition which also exhibits a simple path of linear length in the supercritical regime. However, they only show the existence of a simple path of length $\varepsilon$ times the number of vertices in the graph, for some positive $\varepsilon$. Their strategy is to analyse the classical Depth First Search algorithm (DFS), which we now describe informally.

For any finite graph $G$, the DFS is an exploration process on $G$. Starting at one vertex, say $v$, it jumps to any neighbor of $v$, continues to a neighbor of this new vertex and so on, with the restriction that the process is not allowed to visit a vertex twice. The process thus draws a non-intersecting path in the graph and ultimately gets stuck. The rule is then to take a step back (that is, towards $v$) and start exploring again. It is clear that, at any time, the set of visited edges is a tree. Eventually, the process completely visits the connected component of $v$ and draws a spanning tree of it.

In this paper, we study the length of the longest simple path constructed by the DFS when it is started at a vertex belonging to the giant component (Theorem 1). In fact, we even get the scaling limit of the spanning tree constructed by the DFS (see Theorem 2). This gives an explicit lower bound for the longest simple path in the graph:

Theorem 1.

Let $H_{N}$ be the length of the longest simple path in an Erdős-Rényi random graph with $N$ vertices and parameter $c/N$ . Then, in probability

$$\liminf_{N\to\infty}\frac{H_{N}}{N}\geq\rho_{c}-\frac{\mathrm{Li}_{2}(\rho_{c})}{c},$$

where $Li_{2}$ stands for the dilogarithm function and $\rho_{c}$ is the survival probability of a Galton-Watson branching process with Poisson( $c$ ) offspring distribution characterized by the equation

$$1-\rho_{c}=\exp(-c\rho_{c}).\tag{1}$$

It is interesting to consider the behavior of this lower bound when $c$ is large. As $Li_{2}(1)=\pi^{2}/6$ , we have the asymptotic expansion

$$\liminf_{N\to\infty}\frac{H_{N}}{N}\geq 1-\frac{\pi^{2}}{6c}+o\left(\frac{1}{c}\right),$$

improving the former lower bound $1-2.21/c$ derived by Fernandez de la Vega in [V] as mentioned in [AKS]. It is natural to ask whether the bound of Theorem 1 is optimal or not. We did not find any evidence in either direction.
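The bound of Theorem 1 is easy to evaluate numerically. The following Python sketch is our own illustration (the names `rho`, `li2` and `dfs_lower_bound` are hypothetical): it solves the fixed point equation (1) by iteration and evaluates the dilogarithm by its power series, using Euler's reflection formula for arguments above $1/2$. It can be used to compare the bound with the expansion $1-\pi^{2}/(6c)$ and with $1-2.21/c$.

```python
import math


def rho(c, tol=1e-15, max_iter=100_000):
    """Giant-component density: the root in (0, 1] of 1 - rho = exp(-c * rho), for c > 1."""
    if c <= 1:
        raise ValueError("a giant component only exists for c > 1")
    r = 1.0  # iterating rho -> 1 - exp(-c rho) from 1 decreases monotonically to the root
    for _ in range(max_iter):
        nxt = 1.0 - math.exp(-c * r)
        if abs(nxt - r) < tol:
            return nxt
        r = nxt
    return r


def li2(x):
    """Dilogarithm Li_2(x) = sum_{k >= 1} x**k / k**2 for 0 <= x <= 1."""
    if x > 0.5:
        if x >= 1.0:
            return math.pi ** 2 / 6
        # Euler's reflection formula moves the argument below 1/2, where the series converges fast
        return math.pi ** 2 / 6 - math.log(x) * math.log(1.0 - x) - li2(1.0 - x)
    total, power, k = 0.0, x, 1
    while power > 1e-18:
        total += power / k ** 2
        k += 1
        power *= x
    return total


def dfs_lower_bound(c):
    """Right-hand side of Theorem 1: rho_c - Li_2(rho_c) / c."""
    r = rho(c)
    return r - li2(r) / c


if __name__ == "__main__":
    for c in (1.5, 2.0, 5.0, 20.0):
        print(f"c={c:5.1f}  bound={dfs_lower_bound(c):.4f}  "
              f"1-pi^2/(6c)={1 - math.pi ** 2 / (6 * c):.4f}  1-2.21/c={1 - 2.21 / c:.4f}")
```

For large $c$, $\rho_c$ is within $e^{-c}$ of $1$, so the printed bound and $1-\pi^{2}/(6c)$ agree to many digits, while for $c$ close to $1$ the bound is small but still positive.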

2 The Depth First Search algorithm and its scaling limit

In the following, we denote by $\mathcal{G}_{N}=(V_{N},E_{N})$ an Erdős-Rényi random graph with $N$ vertices and parameter $c/N$. The vertex set of $\mathcal{G}_{N}$ is $V_{N}=\{1,2,\dots,N\}$ and, for a pair $(i,j)\in V_{N}^{2}$ with $i\neq j$, the edge $\{i,j\}$ belongs to $E_{N}$ with probability $c/N$, independently of all the others. As already mentioned, if $c>1$ then there is a constant $\rho_{c}$ such that the largest connected component of $\mathcal{G}_{N}$ grows asymptotically like $\rho_{c}\,N$ as $N$ goes to infinity, where, for any $c>1$, the constant $\rho_{c}$ is characterized by the fixed point equation (1).

2.1 The Depth First Search algorithm

Let us formally define the DFS algorithm on $\mathcal{G}_{N}$ by induction. At each step we define the following objects:

•

$A_{n}$ is an ordered set of vertices, called active vertices at time $n$ . With a slight abuse of notation, we will sometimes also denote by $A_{n}$ the unordered set of vertices of the ordered list $A_{n}$ .

•

$a_{n}$ is the last element of $A_{n}$ ,

•

$S_{n}$ is a set of vertices, called sleeping vertices,

•

$R_{n}=\{1,\ldots,N\}\setminus(A_{n}\cup S_{n})$ is also a set of vertices, called the retired vertices.

Initially we set:

$$\begin{cases}A_{0}=(1),\\ S_{0}=\{2,3,\dots,N\},\\ R_{0}=\emptyset.\end{cases}$$

The process stops when $A_{n}=\emptyset$ . This occurs when $n=2|\mathcal{C}(1)|-1$ , where $|\mathcal{C}(1)|$ is the number of vertices in the connected component of $1$ in $\mathcal{G}_{N}$ . Knowing $A_{n},S_{n}$ and $R_{n}$ we define $A_{n+1},S_{n+1}$ and $R_{n+1}$ according to the following rules:

•

If $a_{n}$ has a neighbor in $S_{n}$ , we set

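$$\begin{cases}a_{n+1}=\inf\{k\in S_{n}:\{a_{n},k\}\in E_{N}\},\\ A_{n+1}=A_{n}\cup a_{n+1}\ \text{(that is, the concatenation of $A_{n}$ and $a_{n+1}$)},\\ S_{n+1}=S_{n}\setminus\{a_{n+1}\},\\ R_{n+1}=R_{n}.\end{cases}$$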

•

If however, $a_{n}$ has no neighbor in $S_{n}$ , we set

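$$\begin{cases}A_{n+1}=A_{n}\setminus a_{n}\ \text{(that is, $A_{n}$ with its last element removed)},\\ S_{n+1}=S_{n},\\ R_{n+1}=R_{n}\cup\{a_{n}\}.\end{cases}$$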

The sequence of vertices $(a_{n})$ is a nearest neighbor walk on the connected component of $1$ and its trace is a spanning tree of this component. Moreover, the chronology of the construction makes this tree rooted and planar. By construction, the list $A_{n}$ is the ancestral line between $a_{n}$ and $1$ in this spanning tree. The set $S_{n}$ is the set of vertices that have not been visited by the walk $(a_{n})$ before time $n$ . The vertices in $R_{n}$ are those for which the construction of the process ensures that they have no neighbor in $S_{n}$ .
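The two rules above translate almost line by line into code. The following Python sketch is our own illustration, not the authors' code; the names `erdos_renyi` and `dfs_contour` are hypothetical, and the graph is stored as a dictionary of neighbor sets. It runs the DFS from vertex $1$, always moving to the sleeping neighbor of smallest index, and records the height $X_{n}=|A_{n}|-1$ studied in Section 2.2.

```python
import random


def erdos_renyi(n, c, seed=None):
    """Sample G(N, c/N) on the vertex set {1, ..., n} as a dict vertex -> set of neighbors."""
    rng = random.Random(seed)
    p = c / n
    adj = {v: set() for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def dfs_contour(adj):
    """Run the DFS of Section 2.1 from vertex 1; return the heights X_0, ..., X_{2|C(1)|-1}."""
    active = [1]                    # A_n as an ordered list; a_n is active[-1]
    sleeping = set(adj) - {1}       # S_n
    retired = set()                 # R_n (kept only to mirror the definition)
    heights = [0]                   # X_0 = |A_0| - 1 = 0
    while active:
        a = active[-1]
        candidates = adj[a] & sleeping
        if candidates:
            nxt = min(candidates)   # first rule: move to the sleeping neighbor of smallest index
            active.append(nxt)
            sleeping.remove(nxt)
        else:
            retired.add(active.pop())  # second rule: retire a_n and step back
        heights.append(len(active) - 1)
    return heights


if __name__ == "__main__":
    X = dfs_contour(erdos_renyi(2000, 2.0, seed=1))
    print("component size:", len(X) // 2, " max height / N:", max(X) / 2000)
```

The naive $O(N^{2})$ edge sampling keeps the sketch short; it is only meant for graphs with a few thousand vertices.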

Remark.

From a probabilistic point of view, it might seem unnatural to take the neighbor with smallest index in the definition of $(a_{n})$ instead of, for example, picking a neighbor at random. As will become clear in the proofs, this does not change the asymptotics of the process.

2.2 Scaling limit of the DFS

At each step, the current height of the walker in the spanning tree constructed by this algorithm is denoted by $X_{n}=|A_{n}|-1$. This defines a Dyck path $X=(X_{n})_{0\leq n\leq 2|\mathcal{C}(1)|-1}$: it starts at $0$, has increments in $\{-1,+1\}$ and is non-negative except at its final value $-1$. The process $X$ is the canonical contour process, in clockwise order, of the spanning tree constructed by the DFS algorithm. Because of all the information it encodes, the process $X$ will be our main object of interest. Since we are mainly interested in the geometry of the giant component of $\mathcal{G}_{N}$, we study the process $X$ conditional on the event $\mathbf{S}$ that $1$ belongs to the largest component of $\mathcal{G}_{N}$. This event has asymptotic probability $\rho_{c}$. Our main result is the convergence of the process $X$ to a deterministic curve, illustrated in Figure 1:

Theorem 2.

Conditional on $\mathbf{S}$ , the following limit holds in probability for the topology of uniform convergence:

$$\lim_{N\to\infty}\frac{X_{\lceil tN\rceil}}{N}=h(t),$$

where the function $h$ is continuous and defined on the interval $[0,2\rho_{c}]$ . The graph $(t,h(t))_{t\in[0,2\rho_{c}]}$ can be divided into a first increasing part and a second decreasing part. These parts are respectively parametrized by:

[TABLE]

where the functions $f$ and $g$ are given by

[TABLE]

and $Li_{2}$ stands for the dilogarithm function.

Since the ancestral line $A_{n}$ is a simple path with $X_{n}$ edges, Theorem 1 is easily obtained by computing the maximal height of the curve given in Theorem 2, which is equal to

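$$g(0)=\frac{1}{c}\left(\log\frac{1}{1-\rho_{c}}-\mathrm{Li}_{2}(\rho_{c})\right)=\rho_{c}-\frac{\mathrm{Li}_{2}(\rho_{c})}{c}.$$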

3 Pseudo renewal times and strategy of the proof

We call $\mathcal{F}_{n}$ the canonical filtration associated with $(a_{n})$. Notice that $\mathcal{F}_{n}$ carries some partial information on the underlying Erdős-Rényi graph, but not all of it. In particular, given $\mathcal{F}_{n}$, the graph structure of $S_{n}$ is that of an Erdős-Rényi graph with connection probability $c/N$, since the connections between vertices of $S_{n}$ have not yet been tested at time $n$.

We call $\alpha_{n}=|A_{n}\cup R_{n}|/N$ the non-decreasing proportion of vertices explored by the process at time $n$. It is straightforward to check that $\alpha_{n}=\frac{X_{n}+n}{2N}$, up to an additive term $1/N$ accounting for the root, which is negligible. Note that at time $n$, conditional on $\alpha_{n}$, the expected number of unexplored vertices neighboring $a_{n}$ is $(1-\alpha_{n})c$. Therefore it is natural to expect two successive phases:

•

When $(1-\alpha_{n})c>1$ , the walker finds a lot of unexplored vertices allowing it to drift away from its starting point. We call that phase the way up.

•

When $(1-\alpha_{n})c<1$ , the walker spends most of the time backtracking towards its starting point. We call that phase the way down.
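For the reader who wants to check the identity for $\alpha_{n}$ and locate the boundary between the two phases, here is the bookkeeping as a short derivation (our own; $p_{n}$ and $q_{n}$ are notation introduced only here). Let $p_{n}$ be the number of steps among the first $n$ that follow the first rule (a new vertex is activated) and $q_{n}=n-p_{n}$ the number that follow the second rule (a vertex is retired).

```latex
\begin{aligned}
|A_n| &= 1 + p_n - q_n, \qquad |R_n| = q_n,
  \qquad\text{so}\qquad X_n = |A_n| - 1 = p_n - q_n,\\
|A_n \cup R_n| &= 1 + p_n = 1 + \frac{X_n + n}{2},
  \qquad\text{hence}\qquad \alpha_n = \frac{X_n + n}{2N} + \frac{1}{N},\\
(1-\alpha_n)\,c = 1 &\iff \alpha_n = 1 - \frac{1}{c}.
\end{aligned}
```

The extra $1/N$ comes from the root, which is active from time $0$; it disappears in the limit, and at first order the way up ends when the explored proportion reaches $1-1/c$.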

3.1 Pseudo renewal times and the way up

On the way up, every time the walker visits a new vertex, there is a positive probability that this vertex belongs to the largest component of the new $S_{n}$ . However this is not guaranteed, as the walker could be in a dead end. If this is indeed the case, the walker will soon go back to the previously visited vertex. On the other hand, if the walker is not in a dead end, it is going to spend a very long time (that is of order $N$ ) before returning to the current vertex, as it needs to fully explore the largest component of $S_{n}$ . Therefore the walk $(X_{n})$ contains a "spine" of macroscopic size, with small excursions. In order to detect this spine, we introduce the following sequence of random pseudo renewal times. Let

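$$\begin{cases}\tau_{0}=0,\\ \tau_{i+1}=\inf\left\{n>\tau_{i}:X_{n}=i+1,\ \inf\{k:X_{n+k}=i\}>\sqrt{N}\right\}\wedge 2N.\end{cases}$$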

In words, $(\tau_{i})$ is the sequence of times at which the walk hits a vertex and does not come back before having visited a macroscopic portion of the graph (see Figure 2 for an illustration). We take the minimum with $2N$ to ensure that these times are well defined even if the set $\{n>\tau_{i}:X_{n}=i+1,\ \inf\{k:X_{n+k}=i\}>\sqrt{N}\}$ is empty, in which case $\tau_{j}=2N$ for every $j>i$. However, this only happens when $(1-\alpha_{n})c$ is close to $1$.
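The definition of the pseudo renewal times only involves the path $X$, so they can be computed directly from a simulated contour. The Python sketch below is an illustration under our own conventions: `return_times`, `pseudo_renewal_times` and `threshold` are hypothetical names (the text uses the threshold $\sqrt{N}$), and instead of padding with the value $2N$ it simply returns the finite $\tau_{i}$'s, all later ones being equal to $2N$ by the convention above. Return times are precomputed in a single backward pass.

```python
import math


def return_times(X):
    """ret[n] = inf{k >= 1 : X[n + k] == X[n] - 1}, or math.inf if that level is never reached again."""
    ret = [math.inf] * len(X)
    next_visit = {}  # level -> smallest index > n at which X sits at that level
    for n in range(len(X) - 1, -1, -1):
        if X[n] - 1 in next_visit:
            ret[n] = next_visit[X[n] - 1] - n
        next_visit[X[n]] = n
    return ret


def pseudo_renewal_times(X, threshold):
    """tau_0 = 0, tau_{i+1} = first n > tau_i with X_n = i + 1 and return time to level i above threshold."""
    ret = return_times(X)
    taus = [0]
    while True:
        i = len(taus) - 1  # X[taus[-1]] == i by construction
        nxt = next((m for m in range(taus[-1] + 1, len(X))
                    if X[m] == i + 1 and ret[m] > threshold), None)
        if nxt is None:    # the remaining tau's equal 2N by convention
            return taus
        taus.append(nxt)


if __name__ == "__main__":
    path = [0, 1, 0, 1, 2, 3, 2, 3, 2, 1, 2, 1, 0, -1]
    print(pseudo_renewal_times(path, threshold=3))  # [0, 3, 4]
```

In the toy path, the first visit to level $1$ at time $1$ is a dead end (it returns to $0$ immediately), so $\tau_{1}=3$; the short excursions above level $2$ prevent a third renewal time.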

An important observation is that the $\tau_{i}$ ’s are not stopping times with respect to $\mathcal{F}_{n}$ . However, in the large $N$ limit, they have a nice description as we will see in the following.

3.2 Strategy of the proof

When hitting a pseudo renewal time $n=\tau_{i}$, we know that the walker is necessarily at a vertex $a_{n}$ belonging to the largest component of $S_{n-1}$. The neighbors of $a_{n}$ in $S_{n}$ are vertices picked at random, independently of the edges between vertices of $S_{n}$. Among these neighbors, some are in small components (typically of finite size), while at least one of them is in the largest one. Therefore, the increment $\tau_{i+1}-\tau_{i}$ corresponds to the time it takes to find the largest component of $S_{n}$. The number $C(a_{n})$ of neighbors of $a_{n}$ in $S_{n}$ is close to a Poisson distribution, while the number $G(a_{n})$ of tries it takes to find the largest component of $S_{n}$ is close to a geometric distribution, as it results from a sequence of almost independent tries, owing to the very small number of vertices visited between two tries. Since we know that the procedure succeeds, the number of neighbors tested before finding the good one is a geometric (minus one) random variable conditioned to be smaller than a Poisson random variable. Figure 3 gives an illustration of this situation.

When the walker goes to a neighbor of $a_{n}$ , it has to visit its whole connected component inside $S_{n}$ before returning to $a_{n}$ . The time it takes to do so is twice the number of vertices in this connected component and will be small. Indeed, by definition, this connected component is not the giant component of the graph $S_{n}$ and therefore is asymptotically a subcritical Galton-Watson tree with an explicit offspring distribution. These observations make it possible to study in detail the conditional expectation

$$\mathbb{E}\left[\tau_{i+1}-\tau_{i}\,\middle|\,\tau_{i}\right]$$

in Section 4.2. A precise statement is given in Lemma 4.
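The constant appearing in Lemma 4 can be anticipated from the heuristic above. The derivation below is our own sketch in an idealized model, with $c':=(1-\alpha_{\tau_{i}})c$, $\rho:=\rho_{c'}$ and $q:=1-\rho$, so that $q=e^{-c'\rho}$ by (1). We assume that the neighbors of $a_{n}$ in $S_{n}$ form a Poisson$(c')$ family, each independently in the largest component with probability $\rho$ and listed in uniform random order, and that the $W_{j}$ are independent sizes of subcritical Poisson$(c'q)$ Galton-Watson trees, with mean $1/(1-c'q)$. Embedding the neighbors as a Poisson process on $[0,1]$, the first neighbor in the largest component arrives at a time $T$ with density $c'\rho\,e^{-c'\rho t}$, and given $T=t$ the number of neighbors tried before it is Poisson$(c'qt)$.

```latex
\begin{aligned}
\mathbb{P}\big(G<C\big) &= \mathbb{P}(T\le 1) = 1-e^{-c'\rho} = \rho,\\
\mathbb{E}\big[G\,\mathbf{1}_{G<C}\big] &= \int_0^1 c'q\,t\;c'\rho\,e^{-c'\rho t}\,dt
  = c'q\Big(\frac{1-e^{-c'\rho}}{c'\rho}-e^{-c'\rho}\Big) = q\,(1-c'q),\\
\mathbb{E}\big[\tau_{i+1}-\tau_i \,\big|\, \tau_i\big] &\approx 1 + 2\,\frac{\mathbb{E}[G\,\mathbf{1}_{G<C}]}{\mathbb{P}(G<C)}\cdot\frac{1}{1-c'q}
  = 1+\frac{2q}{\rho} = \frac{2}{\rho}-1 .
\end{aligned}
```

With $\alpha_{\tau_{i}}\approx k\varepsilon$, this is the leading term $\frac{2}{\rho_{(1-k\varepsilon)c}}-1$ of Lemma 4.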

A crucial parameter in our estimates of the above expectation is the proportion of sleeping vertices available at time $\tau_{i}$ , that is $(1-\alpha_{n})$ with our notation. In order to control this parameter, we introduce in Section 4.2 a sequence of random times $(h_{k})$ corresponding to times where this proportion of available vertices hits fixed levels, independent of $N$ . As we already mentioned, the times $\tau_{i}$ are not stopping times. However, we will see in Section 4.3 that the $\tau_{i}$ ’s can be viewed as a Markov chain for which the $h_{k}$ ’s are stopping times. This allows us to prove a concentration result for the $h_{k}$ ’s with a martingale argument. See Lemma 5 for a precise statement.

The knowledge of the $h_{k}$ ’s and of the associated times $\tau_{h_{k}}$ provides pinning points through which the profile of the walk has to pass. Slope arguments then show that the normalized profile of the walk converges and the expectations $\mathbb{E}\left[\tau_{i+1}-\tau_{i}\middle|\tau_{i}\right]$ give us access to the derivative of the increasing part of the limiting profile. The decreasing part is then deduced from the increasing one by a simple argument once one realizes that the time it takes to go back to a given level is twice the size of the giant component of the graph composed by the current sleeping vertices. This proof of Theorem 2 is detailed in Section 4.4.

4 The proof itself

4.1 Giant component among sleeping vertices

We already mentioned in Section 3.1 that the pseudo renewal times $\tau_{i}$ may degenerate. This will not be the case if, for every $n$ during the way up, the graph $S_{n}$ has no connected component of mesoscopic size. The next lemma shows that the probability of this event converges to $1$ . For later convenience, we also include a logarithmic bound for the maximal degree in the graph.

To avoid problems at criticality, we fix a margin $\eta>0$ and consider times where $(1-\alpha_{n})c>1+\eta,$ or equivalently

$$\alpha_{n}<1-\frac{1+\eta}{c}.\tag{3}$$

Lemma 3.

Let $\bf G$ be the event that, for every $n$ such that $\alpha_{n}$ verifies (3), the graph $S_{n}$ has no connected component of size between $N^{1/10}$ and $N^{9/10}$ , and that the maximum degree of a vertex in $S_{0}$ , hence in every $S_{n}$ , is at most $\log N$ . Then

$$\lim_{N\to\infty}\mathbb{P}(\mathbf{G})=1.$$

Proof.

The maximum degree of Erdős-Rényi graphs is well known (see e.g. [B, F]) and we just focus on the size of the connected components.

Recall that, by construction, for every $n$ , the subgraph spanned by $S_{n}$ is an Erdős-Rényi random graph with $(1-\alpha_{n})N$ vertices and parameter $c/N$ .

Fix $k\geq 0$ and let $Z_{k}$ denote the number of connected components of size $k$ in an Erdős-Rényi graph of size $n$ and parameter $p$ . Using the fact that a complete graph with $k$ vertices has $k^{k-2}$ spanning trees, we get:

$$\mathbb{E}[Z_{k}]\leq\binom{n}{k}k^{k-2}p^{k-1}(1-p)^{k(n-k)}.$$

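When $p=c/N$ and $n=(1-\alpha)N$ with $(1-\alpha)c>1+\eta$, using classical inequalities we obtain, for some constant $A>0$,

$$\mathbb{E}[Z_{k}]\leq\frac{AN}{k^{5/2}}\left(c\,e^{-\eta+c\frac{k}{N}}\right)^{k}.$$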

Now, if $k\in[N^{1/10},N^{9/10}]$ we obtain

[TABLE]

If $N$ is large enough, the parameter $c$ being fixed, we have $ce^{-\eta+cN^{-1/10}}<1$ and therefore

[TABLE]

The lemma follows from the union bound and Markov’s Inequality. ∎

4.2 The renewal increments

To get Theorem 2, we need a good estimate of the expected difference between two consecutive pseudo renewal times. As we will see, the law of $\tau_{i+1}-\tau_{i}$ mainly depends on $\alpha_{\tau_{i}}$; we therefore introduce the random indices $(h_{k})$, depending on $N$ and a fixed $\varepsilon>0$, defined as

$$h_{k}=\inf\{i:\alpha_{\tau_{i}}>k\varepsilon\}.\tag{4}$$

These indices correspond to heights for the walk $(X_{n})$ by the relation $X_{\tau_{h_{k}}}=h_{k}$ . The points $(\tau_{h_{k}},h_{k})$ will be our pinning points for the profile of the walk.

The $h_{k}$ ’s are well-defined during the way up, at least for times $n$ such that $(1-\alpha_{n})c>1+\eta$ . This corresponds to

$$k\leq K:=\frac{1-(1+\eta)/c}{\varepsilon}.$$

The fact that the parameter $\alpha$ varies only slightly between two consecutive $h_{k}$ ’s means that the sequence $(\tau_{i+1}-\tau_{i})_{h_{k}\leq i\leq h_{k+1}}$ is almost an i.i.d. sequence.

Lemma 4.

There exists a constant $C$ such that, if $N$ is large enough, for every integer $i\in[h_{k},h_{k+1}[$ with $k\leq K$ , one has

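$$\frac{2}{\rho_{(1-k\varepsilon)c}}-1-C\varepsilon\leq\mathbb{E}\left[\tau_{i+1}-\tau_{i}\,\middle|\,\tau_{i}\right]\leq\frac{2}{\rho_{(1-k\varepsilon)c}}-1+C\varepsilon.$$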

Proof.

To bound the conditional expectation of $\tau_{i+1}-\tau_{i}$, we need to introduce the fundamental decomposition of the trajectory of $(X_{n})$ during this interval, leading to identity (6) below. At time $n$, the walker is at a vertex $a_{n}$ having $C(a_{n})$ neighbors inside $S_{n}$ (see Figure 3). The law of $C(a_{n})$ is complicated unless the time $n$ is the first visit to $a_{n}$. Indeed, for such a time $n$, the algorithm has never tested the connections between vertices of $S_{n}$ and $a_{n}$, meaning that $C(a_{n})$ is simply a binomial random variable with parameters $(1-\alpha_{n})N$ and $c/N$. We denote by ${\bf F}_{n}$ the event that $n$ is the first visit to $a_{n}$. In addition, notice that, on the event ${\bf F}_{n}$, the number $C(a_{n})$ and the neighbors $x_{1}<\cdots<x_{C(a_{n})}$ of $a_{n}$ in $S_{n}$ are independent of the connections inside $S_{n}$.

For every $n$ , call ${\bf H}_{n}$ the event that the return time to $a_{n-1}$ is at least $\sqrt{N}$ . On $\mathbf{F}_{n}$ , this is equivalent to the fact that the connected component of $a_{n}$ in $S_{n-1}$ has at least $\sqrt{N}/2$ vertices meaning that $\{n=\tau_{i}\}=\{X_{n}=i\}\cap{\bf F}_{n}\cap{\bf H}_{n}$ .

We denote by $G(a_{n})$ the smallest $k$ such that the connected component of $x_{k+1}$ in $S_{n}$ has size larger than $\sqrt{N}/2$, and set $G(a_{n})=C(a_{n})$ if none of the $x_{i}$'s is in such a connected component. For $1\leq i\leq G(a_{n})$, we call $W_{i}$ the number of vertices in the connected component of $x_{i}$ in $S_{n}$. However, we set $W_{i}=0$ if $x_{i}$ belongs to the connected component of a previously explored neighbor, meaning that $x_{i}$ will be retired before the algorithm has the chance to test the connection between $a_{n}$ and $x_{i}$ (see for example vertices $2$ and $4$ in Figure 3).

On the event $\{X_{n}=i\}\cap{\bf F}_{n}\cap\mathbf{G}$, the event $\mathbf{H}_{n}$ is equivalent to the fact that the connected component of $a_{n}$ in $S_{n-1}$ contains at least $N^{9/10}$ vertices. Using the bound on the maximal degree in the graph given by $\mathbf{G}$, this is also equivalent to the fact that at least one of the neighbors of $a_{n}$ in $S_{n-1}$ has a connected component in $S_{n}$ of size at least $N^{9/10}/\log{N}$ or, since on $\mathbf{G}$ no component has a size between $N^{1/10}$ and $N^{9/10}$, of size at least $\sqrt{N}/2$. Therefore, on the event $\{X_{n}=i\}\cap{\bf F}_{n}\cap\mathbf{G}$, the event $\mathbf{H}_{n}$ is equivalent to $G(a_{n})<C(a_{n})$ and

$$\tau_{i+1}-\tau_{i}=1+2\sum_{j=1}^{G(a_{n})}W_{j}.\tag{6}$$

Conditional on $\tau_{i}$ , the distribution of $(G(a_{n}),(W_{j})_{1\leq j\leq G(a_{n})})$ is explicit and only depends on $\alpha_{\tau_{i}}=\frac{i+\tau_{i}}{2N}$ . Therefore we have shown that, on the event ${\bf G}$ , the sequence $(\tau_{i})$ is coupled with a non-homogeneous Markov chain, and the $h_{k}$ ’s are stopping times for this Markov chain.

We can now turn to the actual proof of the lemma. We assume that $\varepsilon$ is small enough and that $N$ is large enough. In all our computations, $C$ denotes a constant independent of $k$, $N$ and $\varepsilon$, which may change from line to line to keep the computations readable.

Recall $h_{k}\leq i<h_{k+1}$ , meaning that

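$$k\varepsilon:=\alpha_{-}\leq\alpha_{\tau_{i}}<\alpha_{\tau_{i+1}}<\alpha_{+}:=(k+1)\varepsilon+\varepsilon^{2}.$$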

Indeed, on the event ${\bf G}$ , the difference $\tau_{i+1}-\tau_{i}$ is at most $N^{1/10}\log N$ . Therefore, if $N$ is large enough, we can make sure that the variation in $\alpha$ between two subsequent $\tau_{i}$ ’s stays arbitrarily small.

Dropping the dependency on $N$, we call $p_{\alpha}$ the probability that a randomly chosen vertex in an Erdős-Rényi graph with $(1-\alpha)N$ vertices and parameter $c/N$ belongs to a connected component of size larger than $\sqrt{N}/2$. By Dini's theorem, as $N$ goes to infinity, $p_{\alpha}$ converges to $\rho_{(1-\alpha)c}$ uniformly in $\alpha$. We want to compute

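$$\mathbb{E}\left[\sum_{j=1}^{G(a_{n})}W_{j}\,\middle|\,G(a_{n})<C(a_{n})\right]=\frac{\sum_{k=0}^{\infty}\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\mathbf{1}_{G(a_{n})=k}\mathbf{1}_{C(a_{n})>k}\right]}{\mathbb{P}\left(G(a_{n})<C(a_{n})\right)}.$$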

For a fixed $k$ ,

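$$\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\mathbf{1}_{G(a_{n})=k}\mathbf{1}_{C(a_{n})>k}\right]=\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\,\mathbb{P}\left(G(a_{n})=k\text{ and }C(a_{n})>k\,\middle|\,\sum_{j=1}^{k}W_{j}\right)\right].$$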

Conditional on $(W_{l})_{l\leq k}$, on the event $\bigcap_{l\leq k}\{W_{l}<\sqrt{N}\}$, the event $\{G(a_{n})=k\}$ means that $x_{k+1}$ belongs to a large component of $S_{n}$. This remains true after removing the components of $x_{1},\dots,x_{k}$, which gets rid of dependencies. Besides, $\{C(a_{n})>k\}$ means that $a_{n}$ has at least $k+1$ neighbors in $S_{n}$. By independence between the neighbors of $a_{n}$ and the connections inside $S_{n}$,

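$$\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\,\mathbb{P}\left(G(a_{n})=k\text{ and }C(a_{n})>k\,\middle|\,\sum_{j=1}^{k}W_{j}\right)\right]\leq\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\mathbf{1}_{\bigcap_{l\leq k}\{W_{l}<\sqrt{N}\}}\right]\mathbb{P}\left(C(a_{n})>k\right)p_{\alpha_{-}}$$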

and

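$$\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\,\mathbb{P}\left(G(a_{n})=k\text{ and }C(a_{n})>k\,\middle|\,\sum_{j=1}^{k}W_{j}\right)\right]\geq\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\mathbf{1}_{\bigcap_{l\leq k}\{W_{l}<\sqrt{N}\}}\right]\mathbb{P}\left(C(a_{n})>k\right)p_{\alpha_{+}}.$$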

We turn to the expectation in the last bounds.

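$$\mathbb{E}\left[\sum_{j=1}^{k}W_{j}\mathbf{1}_{\bigcap_{l\leq k}\{W_{l}<\sqrt{N}\}}\right]=\sum_{j=1}^{k}\mathbb{P}\left(\bigcap_{l\leq j-1}\{W_{l}<\sqrt{N}\}\right)\times\mathbb{E}\left[W_{j}\mathbf{1}_{W_{j}<\sqrt{N}}\,\middle|\,\bigcap_{l\leq j-1}\{W_{l}<\sqrt{N}\}\right]\mathbb{P}\left(\bigcap_{j+1\leq l\leq k}\{W_{l}<\sqrt{N}\}\,\middle|\,\bigcap_{l\leq j}\{W_{l}<\sqrt{N}\}\right).$$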

Using once again the fact that the local value of $\alpha$ remains between $\alpha_{-}$ and $\alpha_{+}$ with high probability,

[TABLE]

and

[TABLE]

Conditional on $\bigcap_{l\leq j}\{W_{l}<\sqrt{N}\}$ , the random variable $W_{j}$ is either the size of a small component in an Erdős-Rényi random graph with parameter $c$ and a number of vertices between $(1-\alpha_{+})N$ and $(1-\alpha_{-})N$ , or zero if $x_{j}$ belongs to one of the previously visited components, which has probability smaller than $\varepsilon$ for $N$ large enough. The expected size of a small component in an Erdős-Rényi random graph with parameter $c$ and $(1-\alpha)N$ vertices converges, for a fixed $\alpha$ , to the expected size of a Galton-Watson tree with Poisson $((1-\alpha)c)$ offspring distribution conditioned on extinction. This, in turn, is a subcritical Galton-Watson tree with Poisson $((1-\rho_{(1-\alpha)c})(1-\alpha)c)$ offspring distribution having expected size

$$\frac{1}{1-(1-\rho_{(1-\alpha)c})(1-\alpha)c}.$$

Using the smoothness of $\rho_{x}$ as a function of $x$ , for $N$ large enough and for every $\alpha\in[\alpha_{-},\alpha_{+}],$ the expected size of a small component is thus in the interval

[TABLE]

Equation (7) then gives

[TABLE]

and

[TABLE]

As both bounds are treated similarly, we focus on the upper bound (9). By a coupling argument, we can assume that, for $N$ large enough, with high probability, the random variable $C(a_{n})$ is smaller than a Poisson$(c(1-\alpha_{-}))$ random variable, denoted by $X$ in the following. We call

[TABLE]

Isolating the sum in (9), we compute

[TABLE]

with the convention $S_{-1}=0$ . It is straightforward to check

[TABLE]

For any $\alpha_{-}$, the function on the right-hand side of (11) is infinitely differentiable and therefore Lipschitz in $p_{\alpha_{+}}$. In addition, its Lipschitz constant can be computed explicitly and bounded uniformly in $\alpha_{-}$.

By uniform convergence, $|p_{\alpha_{+}}-\rho_{(1-\alpha_{-})c}|\leq\varepsilon$ and $|p_{\alpha_{-}}-\rho_{(1-\alpha_{-})c}|\leq\varepsilon$ if $N$ is large enough. Therefore we can replace $p_{\alpha_{+}}$ by $\rho_{(1-\alpha_{-})c}$ in the previous computation, with only an error of order $C\varepsilon$. Recalling relation (1) characterizing $\rho$, we obtain

[TABLE]

We turn to the factor ${\mathbb{P}}(G(a_{n})<C(a_{n}))$ . Denote by $Y$ a Poisson $(c(1-\alpha_{+}))$ random variable. Using once again a coupling argument as well as the same decomposition as when dealing with the first member of (8) we get

[TABLE]

Hence

[TABLE]

Putting equations (9), (10), (12) and (13) together

[TABLE]

Recalling

[TABLE]

we get the desired result. ∎
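As a sanity check of Lemma 4, one can simulate the idealized renewal increment $1+2\sum_{j=1}^{G}W_{j}$ directly. The Python sketch below relies on simplifying assumptions that are ours and describe an idealization, not the finite-$N$ process: $C$ is Poisson$(c')$ with $c'=(1-\alpha)c$, each neighbor is independently in the largest component with probability $\rho_{c'}$, we condition on at least one such neighbor, and each $W_{j}$ is the total size of an independent Poisson$(c'(1-\rho_{c'}))$ Galton-Watson tree. All function names are hypothetical. Under these assumptions the empirical mean should be close to $2/\rho_{c'}-1$.

```python
import math
import random


def rho(c, tol=1e-15, max_iter=100_000):
    """Root in (0, 1] of 1 - rho = exp(-c * rho), for c > 1 (fixed-point iteration from 1)."""
    r = 1.0
    for _ in range(max_iter):
        nxt = 1.0 - math.exp(-c * r)
        if abs(nxt - r) < tol:
            return nxt
        r = nxt
    return r


def poisson(rng, lam):
    """Poisson(lam) sample by Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def gw_total_size(rng, mean):
    """Total progeny of a Galton-Watson tree with Poisson(mean) offspring, mean < 1."""
    size, pending = 0, 1
    while pending:
        pending -= 1
        size += 1
        pending += poisson(rng, mean)
    return size


def renewal_increment(rng, c_eff, r):
    """One draw of 1 + 2 * sum_{j <= G} W_j in the idealized model."""
    while True:  # rejection sampling: condition on some neighbor lying in the largest component
        marks = [rng.random() < r for _ in range(poisson(rng, c_eff))]
        if any(marks):
            break
    g = marks.index(True)  # neighbors explored before the first one in the largest component
    return 1 + 2 * sum(gw_total_size(rng, c_eff * (1 - r)) for _ in range(g))


def mean_increment(c_eff, samples=50_000, seed=0):
    """Return (empirical mean of the increment, predicted value 2 / rho - 1)."""
    rng = random.Random(seed)
    r = rho(c_eff)
    total = sum(renewal_increment(rng, c_eff, r) for _ in range(samples))
    return total / samples, 2 / r - 1


if __name__ == "__main__":
    for c_eff in (1.5, 2.0, 3.0):
        empirical, predicted = mean_increment(c_eff)
        print(f"c'={c_eff}: simulated {empirical:.3f}, 2/rho - 1 = {predicted:.3f}")
```

Rejection sampling is the simplest way to implement the conditioning on $\{G<C\}$; it accepts with probability $\rho_{c'}$, which is not too small away from criticality.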

4.3 Concentration for the pinning heights

The sharp estimate of the length of the renewal intervals obtained in Lemma 4 translates into a concentration result for the pinning heights $(h_{k})_{1\leq k\leq K}$ defined by (4):

Lemma 5.

There exists a constant $C$ , depending only on $\eta$ , such that for every $k\leq K$ , with high probability,

[TABLE]

Proof.

Fix $k\leq K$ , we are going to construct a martingale involving the sequence $(\tau_{i})_{h_{k}\leq i<h_{k+1}}$ . Recall that on $\mathbf{G},$ the sequence $(\tau_{i})$ is a Markov chain, and that $h_{k}$ is a stopping time for it. Indeed, as we saw earlier, $\tau_{i+1}-\tau_{i}$ has an explicit distribution, depending only on $\tau_{i}+i$ .

We slightly modify the sequence $\tau_{h_{k}+i}$ in the following way. Let $\tilde{\tau}_{h_{k}+i}$ be equal to $\tau_{h_{k}+i}$ as long as $\tau_{h_{k}+i}-\tau_{h_{k}}+i\leq 2\varepsilon N$. Then complete the sequence by adding to the last term $\tau_{h_{k+1}}$ i.i.d. copies of $\tau_{h_{k}+1}-\tau_{h_{k}}$ at each step. This is just a formal definition: we are only interested in $h_{k+1}-h_{k}$, which is precisely, by definition, the hitting time of $2\varepsilon N$ by the sequence $\tau_{h_{k}+i}-\tau_{h_{k}}+i$, and changing the sequence after this hitting time obviously does not modify it.

Now we introduce the martingale $M^{(k)}_{n}$ with respect to $\sigma(\tilde{\tau}_{h_{k}+i})_{i\geq 0}$ defined on $\mathbf{G}$ by

[TABLE]

Recall that, still on the event $\mathbf{G}$, the difference $|\tilde{\tau}_{h_{k}+i+1}-\tilde{\tau}_{h_{k}+i}|$ is smaller than $N^{1/10}\log N$, while by construction and Lemma 4, for every $i\geq 0$,

[TABLE]

Therefore, for $N$ large enough, the increments of $M^{(k)}$ are bounded by $N^{1/10}\log N$ .

The Azuma-Hoeffding inequality gives

[TABLE]

therefore, by the union bound,

[TABLE]

and, since $K\leq C/\varepsilon$, using once again the union bound,

[TABLE]

which can be made arbitrarily small by taking $N$ large enough.
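To see how the increment bound $N^{1/10}\log N$ interacts with the two union bounds, the sketch below evaluates the logarithm of the Azuma-Hoeffding bound $2\exp(-t^{2}/(2nc^{2}))$ on ${\mathbb{P}}(|M_{n}-M_{0}|\geq t)$ for a martingale with increments bounded by $c$, multiplied by at most $N$ times and $K$ blocks. The deviation threshold $t=N^{9/10}$ and the value $K=100$ are hypothetical choices made for illustration only; they are not the thresholds used in the displayed inequalities.

```python
import math


def azuma_log_bound(n_steps, c, t):
    """Natural log of the Azuma-Hoeffding bound 2*exp(-t^2 / (2 n c^2)) on
    P(|M_n - M_0| >= t) for a martingale with increments bounded by c."""
    return math.log(2.0) - t * t / (2.0 * n_steps * c * c)


def union_log_bound(N, K, dev_exponent=0.9):
    """Log of the union bound over K blocks and at most N times per block.

    Increments are bounded by N^{1/10} log N, as on the event G; the deviation
    threshold t = N^{dev_exponent} is a hypothetical choice for illustration.
    """
    c = N ** 0.1 * math.log(N)
    t = N ** dev_exponent
    return math.log(K) + math.log(N) + azuma_log_bound(N, c, t)


for N in (1e6, 1e8, 1e10):
    print(f"N={N:.0e}  log(bound) = {union_log_bound(N, K=100):.1f}")
```

With these hypothetical choices the logarithm is still positive (a trivial bound) at $N=10^{6}$ and becomes very negative from $N=10^{8}$ on, which illustrates that "for $N$ large" may require fairly large $N$.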

This implies that, as $N\to\infty$ , with high probability

[TABLE]

whence, for all $n\leq\varepsilon N$,

[TABLE]

Recalling that $h_{k+1}-h_{k}$ is the hitting time of $2\varepsilon N$ by the sequence $(\tilde{\tau}_{h_{k}+n}-\tilde{\tau}_{h_{k}}+n)_{n}$, we get the result. ∎

4.4 Proof of Theorem 2

Since we are now going to let $\varepsilon$ vary, we keep track of the dependence of the $h_{k}$'s on $\varepsilon$ by writing $h^{\varepsilon}_{k}$. As a consequence of Lemma 5, for every $k\leq K=(1-(1+\eta)/c)/\varepsilon$,

[TABLE]

Taking $k=\lceil u/\varepsilon\rceil$, we recognize a Riemann sum, so that, by differentiability of the integrand $x\mapsto\rho_{x}$, uniformly in $u\in[0,1-(1+\eta)/c]$,

[TABLE]
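The summand of the Riemann sum is not reproduced in this text. As an assumption made only for illustration, the sketch below takes the sum to be $\varepsilon\sum_{j=1}^{\lceil u/\varepsilon\rceil}\rho_{(1-j\varepsilon)c}$ and compares it with $\int_{0}^{u}\rho_{(1-v)c}\,dv$. It computes $\rho_{x}$ from definition (1), $1-\rho_{x}=\exp(-x\rho_{x})$, by fixed-point iteration, and shows the error shrinking roughly linearly in $\varepsilon$, as expected for a differentiable integrand on $[0,1-(1+\eta)/c]$. The function names are hypothetical.

```python
import math


def rho(x, tol=1e-14, max_iter=100_000):
    """Largest solution of 1 - r = exp(-x r): the giant component density rho_x.

    Fixed-point iteration r <- 1 - exp(-x r) started at r = 1 decreases
    monotonically to the largest root; for x <= 1 that root is 0.
    """
    if x <= 1.0:
        return 0.0
    r = 1.0
    for _ in range(max_iter):
        r_next = 1.0 - math.exp(-x * r)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r


def riemann_sum(u, eps, c):
    """eps * sum_{j=1}^{ceil(u/eps)} rho_{(1 - j eps) c} (hypothetical summand)."""
    k = math.ceil(u / eps)
    return eps * sum(rho((1.0 - j * eps) * c) for j in range(1, k + 1))


def integral(u, c, n=2000):
    """Composite Simpson rule for int_0^u rho_{(1-v)c} dv (n must be even)."""
    h = u / n
    s = rho(c) + rho((1.0 - u) * c)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * rho((1.0 - i * h) * c)
    return s * h / 3.0


c, u = 3.0, 0.5  # requires (1 - u) c > 1
exact = integral(u, c)
for eps in (0.1, 0.01, 0.001):
    print(eps, abs(riemann_sum(u, eps, c) - exact))
```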

The $K$ points of the normalized profile $(n/N,X_{n}/N)$ of the walk taken at the times $\tau_{h_{k}^{\varepsilon}}$, for $k\in\{1,\ldots,K\}$, can be written

[TABLE]

the last equality coming from the fact that, on the event $\mathbf{G}$, each increment $\tau_{i+1}-\tau_{i}$ is bounded from above by $N^{1/10}\,\log N$.

Gathering (14) and (15), we obtain that, as $N\to\infty$, these $K$ points of the normalized profile of the walk are uniformly within distance $C\varepsilon$ of the following parametrized curve:

[TABLE]

Finally, recalling that

[TABLE]

and that the slope of the renormalized profile is bounded by $1$ in absolute value, we conclude that the whole normalized profile stays within distance $C\varepsilon$ of the curve defined by (16). Letting first $\varepsilon\to 0$ and then $\eta\to 0$, we obtain the convergence of the normalized profile of the walk for the parameter $u$ ranging from $0$ to $1-1/c$.

To identify the parametrized curve defined by (16) with the explicit one given in Theorem 2, it suffices to parametrize the curve by $\rho_{(1-u)c}$ instead of $u$. The definition (1) of $\rho_{(1-u)c}$ gives

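$$1-\rho_{(1-u)c}=\exp\left(-(1-u)c\,\rho_{(1-u)c}\right),\qquad\text{i.e.}\qquad u=1+\frac{\log\left(1-\rho_{(1-u)c}\right)}{c\,\rho_{(1-u)c}}.$$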

From this relation, we can perform a change of variable in the integral appearing in (16), which yields the announced formulas.
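Definition (1), applied at $x=(1-u)c$, can be solved explicitly for $u$: taking logarithms in $1-\rho=\exp(-(1-u)c\rho)$ gives $u=1+\log(1-\rho)/(c\rho)$ with $\rho=\rho_{(1-u)c}$. The short check below recovers $u$ numerically from $\rho_{(1-u)c}$; the helper names `rho` and `u_from_rho` are hypothetical. Note also that $\log(1-\rho)/\rho\to -1$ as $\rho\to 0$, so $u\to 1-1/c$, the upper end of the parameter range.

```python
import math


def rho(x, tol=1e-14, max_iter=100_000):
    """Largest solution of 1 - r = exp(-x r), computed by fixed-point iteration from r = 1."""
    if x <= 1.0:
        return 0.0
    r = 1.0
    for _ in range(max_iter):
        r_next = 1.0 - math.exp(-x * r)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r


def u_from_rho(r, c):
    """Invert r = rho_{(1-u)c} using -log(1 - r) = (1 - u) c r."""
    return 1.0 + math.log(1.0 - r) / (c * r)


c = 3.0
for u in (0.0, 0.25, 0.5):
    r = rho((1.0 - u) * c)
    print(u, r, u_from_rho(r, c))
```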

We now turn to the convergence of the profile of the process after it reaches criticality, that is, during the way down.

For every $k\leq K$ , we introduce

[TABLE]

the time when the walker returns to its position at time $\tau_{h_{k}}$ after exploring the connected component of $a_{\tau_{h_{k}+1}}$ in $S_{\tau_{h_{k}}}$. On $\mathbf{G}$, the difference $\zeta_{k}-\tau_{h_{k}}$ is twice the size of this connected component. Recall that, on $\mathbf{G}$, the subgraph induced by $S_{\tau_{h_{k}}}$ is an Erdős-Rényi graph whose number of vertices is $(1-k\varepsilon)N+O(N^{1/5})$ and whose connection probability is $c/N$. As a consequence, for every $k$,

[TABLE]
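The factor $2$ comes from a general property of depth first search: every sleeping vertex reachable from the newly discovered vertex through sleeping vertices is discovered before that vertex is retired, and each of them is pushed once and popped once. The sketch below checks this in the simplest situation, where the step right after time $t$ pushes a vertex $w$: the walk returns to its height at time $t$ exactly $2\times$ (size of the component of $w$ among sleeping vertices) steps later. It follows the rule $a_{n+1}=\inf\{k\in S_{n}:\{a_{n},k\}\in E_{N}\}$, explores only the component of vertex $1$, and all function names are hypothetical.

```python
import random
from collections import deque


def dfs_walk(n, adj):
    """DFS from vertex 1: push the smallest sleeping neighbour of the top vertex,
    or retire the top vertex when there is none.

    Returns heights X_t = |A_t| and, for each step t, the pair
    (sleeping set at time t, vertex pushed at step t or None).
    """
    active = [1]
    sleeping = set(range(2, n + 1))
    heights = [len(active)]
    steps = []
    while active:
        candidates = [k for k in adj[active[-1]] if k in sleeping]
        if candidates:
            nxt = min(candidates)
            steps.append((frozenset(sleeping), nxt))
            sleeping.remove(nxt)
            active.append(nxt)  # walk goes up by one
        else:
            steps.append((frozenset(sleeping), None))
            active.pop()  # top vertex is retired; walk goes down by one
        heights.append(len(active))
    return heights, steps


def component_size(start, allowed, adj):
    """Size of the component of `start` in the subgraph induced by `allowed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)


def excursions_match(heights, steps, adj):
    """Check zeta - t == 2 * |component of the pushed vertex in S_t| at every push."""
    for t, (sleeping, pushed) in enumerate(steps):
        if pushed is None:
            continue
        level = heights[t]
        zeta = next(s for s in range(t + 1, len(heights)) if heights[s] == level)
        if zeta - t != 2 * component_size(pushed, sleeping, adj):
            return False
    return True


def erdos_renyi(n, c, seed):
    """G(n, c/n) as an adjacency dict on vertices 1..n."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < c / n:
                adj[i].add(j)
                adj[j].add(i)
    return adj


adj = erdos_renyi(300, 2.0, seed=1)
heights, steps = dfs_walk(300, adj)
print(excursions_match(heights, steps, adj))
```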

Moreover, $X_{\zeta_{k}}=X_{\tau_{h_{k}}}=h_{k}$. This implies that the $K$ points of the profile taken at the times $\zeta_{k}$, for $k\in\{1,\ldots,K\}$, can be written

[TABLE]

where the term $o(1)$ goes to zero as $N\to\infty$.

Using the same slope argument as before, we obtain the announced parametrization.

Acknowledgments

N.E. is partially supported by ANR PPPP (ANR-16-CE40-0016). N.E. and G.F. are partially supported by ANR MALIN. L.M. is partially supported by ANR GRAAL (ANR-14-CE25-0014).

All three authors acknowledge the support of Labex MME-DII (ANR11-LBX-0023-01).
