Limiting shape of the Depth First Search tree in an Erdős-Rényi graph
Nathanaël Enriquez, Gabriel Faraud, Laurent Ménard

TL;DR
This paper characterizes the limiting shape of the DFS tree in Erdős-Rényi graphs, revealing a deterministic profile and identifying a long non-intersecting path related to the giant component's density.
Contribution
It provides an explicit deterministic shape for the DFS tree in Erdős-Rényi graphs and demonstrates the existence of a long non-intersecting path proportional to the graph size.
Findings
DFS tree profile converges to a deterministic shape
Existence of a long non-intersecting path of specified length
Explicit relation involving the giant component density
Abstract
We show that the profile of the tree constructed by the Depth First Search Algorithm in the giant component of an Erdős-Rényi graph with $N$ vertices and connection probability $c/N$ converges to an explicit deterministic shape. This makes it possible to exhibit a long non-intersecting path of length $\left(\rho_c - \frac{\mathrm{Li}_2(\rho_c)}{c}\right) N$, where $\rho_c$ is the density of the giant component.
Keywords. Erdős-Rényi graphs, Depth First Search Algorithm.
2010 Mathematics Subject Classification. 60K35, 82C21, 60J20, 60F10.
Université Paris Nanterre
1 Introduction
The celebrated Erdős-Rényi model of random graphs [ER] exhibits a phase transition when the average degree in the graph is $1$. Above this threshold, the graph contains with high probability a unique connected component of macroscopic size called the giant component. The geometry of this giant component has been the subject of numerous research articles and we refer to the monographs by Bollobás [B], Durrett [D] or Frieze and Karoński [F] for extensive surveys. Some results are striking by their sharpness. This is the case for the typical distance between vertices (see Durrett [D]) and the diameter (see Riordan and Wormald [RW]), which are both of logarithmic order in the number of vertices. One could ask whether this small world effect prevents the graph from containing a long simple path. This is not the case: Ajtai, Komlós and Szemerédi [AKS] proved that there exists a simple path of linear length in the supercritical regime, solving a conjecture by P. Erdős [E]. In the recent paper [KS], Krivelevich and Sudakov propose a simple proof of the phase transition which also exhibits a simple path of linear length in the supercritical regime. However, they only show the existence of a simple path of length $\varepsilon$ times the number of vertices in the graph for some positive $\varepsilon$. Their strategy is to analyse the classical Depth First Search algorithm (DFS), which we now describe informally.
For any finite graph $G$, the DFS is an exploration process on $G$. Starting at one vertex, say $v$, it jumps to any neighbor of $v$, continues to a neighbor of this new vertex and so on, with the restriction that the process is not allowed to visit a vertex twice. The process will draw a non-intersecting path in the graph, and ultimately get stuck. The rule is then to make a step back (that is, towards $v$) and start exploring again. It is clear that, at any time, the set of visited edges is a tree. Eventually, the process will completely visit the connected component of $v$ and draw a spanning tree of it.
In this paper, we study the length of the longest simple path constructed by the DFS when it is started at a vertex belonging to the giant component (Theorem 1). In fact, we even get the scaling limit of the spanning tree constructed by the DFS (see Theorem 2). This gives an explicit lower bound for the longest simple path in the graph:
Theorem 1.
Let $H_N$ be the length of the longest simple path in an Erdős-Rényi random graph with $N$ vertices and parameter $c/N$, with $c > 1$. Then, in probability,
$$\liminf_{N \to \infty} \frac{H_N}{N} \ge \rho_c - \frac{\mathrm{Li}_2(\rho_c)}{c},$$
where $\mathrm{Li}_2$ stands for the dilogarithm function and $\rho_c$ is the survival probability of a Galton-Watson branching process with Poisson($c$) offspring distribution, characterized by the equation
$$1 - \rho_c = e^{-c \rho_c}. \tag{1}$$
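It is interesting to consider the behavior of this lower bound when $c$ is large. As $c \to \infty$, we have the asymptotic expansion
$$\rho_c - \frac{\mathrm{Li}_2(\rho_c)}{c} = 1 - \frac{\pi^2}{6c} + O\!\left(\frac{e^{-c}}{c}\right),$$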
improving the former lower bound derived by Fernandez de la Vega in [V] as mentioned in [AKS]. It is natural to ask whether the bound of Theorem 1 is optimal or not. We did not find any evidence in either direction.
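The lower bound can be evaluated numerically. The sketch below is only an illustration: it assumes the bound has the form $\rho_c - \mathrm{Li}_2(\rho_c)/c$ with $\rho_c$ the positive root of $1 - \rho = e^{-c\rho}$, and the function names `survival_probability`, `dilog` and `long_path_bound` are hypothetical helpers, not taken from the paper.

```python
import math


def survival_probability(c, tol=1e-14, max_iter=10_000):
    """Positive root of 1 - rho = exp(-c * rho) for c > 1, else 0.

    Fixed-point iteration started at rho = 1; the map rho -> 1 - exp(-c * rho)
    is increasing and concave, so the iterates decrease to the positive root.
    """
    if c <= 1.0:
        return 0.0
    rho = 1.0
    for _ in range(max_iter):
        new = 1.0 - math.exp(-c * rho)
        if abs(new - rho) < tol:
            return new
        rho = new
    return rho


def dilog(x, terms=200):
    """Dilogarithm Li_2(x) for 0 <= x <= 1.

    Uses the power series sum x^k / k^2 for x <= 1/2 and the reflection
    formula Li_2(x) + Li_2(1 - x) = pi^2/6 - log(x) log(1 - x) otherwise.
    """
    if x == 0.0:
        return 0.0
    if x == 1.0:
        return math.pi ** 2 / 6
    if x > 0.5:
        return math.pi ** 2 / 6 - math.log(x) * math.log(1.0 - x) - dilog(1.0 - x, terms)
    return sum(x ** k / k ** 2 for k in range(1, terms + 1))


def long_path_bound(c):
    """Assumed form of the lower bound on (longest path length) / N."""
    rho = survival_probability(c)
    return rho - dilog(rho) / c
```

For large `c`, `long_path_bound(c)` can be compared with the leading-order value `1 - math.pi**2 / (6 * c)`.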
2 The Depth First Search algorithm and its scaling limit
In the following, we denote by $G(N, c/N)$ an Erdős-Rényi random graph with $N$ vertices and parameter $c/N$. The vertex set of $G(N, c/N)$ is $\{1, \dots, N\}$ and, for a pair $i \neq j$, the edge $\{i, j\}$ belongs to $G(N, c/N)$ with probability $c/N$, independently of all the others. As already mentioned, if $c > 1$ then there is a constant $\rho_c > 0$ such that the largest connected component of $G(N, c/N)$ grows asymptotically like $\rho_c N$ as $N$ goes to infinity, where, for any $c > 1$, the constant $\rho_c$ is characterized by the fixed point equation (1).
2.1 The Depth First Search algorithm
Let us formally define the DFS algorithm on $G(N, c/N)$ by induction, started at a vertex $v$. At each step $n$ we define the following objects:
- $A_n$ is an ordered set of vertices, called active vertices at time $n$. With a slight abuse of notation, we will sometimes also denote by $A_n$ the unordered set of vertices of the ordered list $A_n$,
- $v_n$ is the last element of $A_n$,
- $S_n$ is a set of vertices, called sleeping vertices,
- $R_n$ is also a set of vertices, called the retired vertices.
Initially we set:
$$A_0 = (v), \qquad S_0 = \{1, \dots, N\} \setminus \{v\}, \qquad R_0 = \emptyset.$$
The process stops when $A_n = \emptyset$. This occurs when $n = 2|C(v)| - 1$, where $|C(v)|$ is the number of vertices in the connected component of $v$ in $G(N, c/N)$. Knowing $(A_n, S_n, R_n)$, we define $(A_{n+1}, S_{n+1}, R_{n+1})$ according to the following rules:
- If $v_n$ has a neighbor in $S_n$, we call $u$ the neighbor of $v_n$ in $S_n$ with smallest index and set
$$A_{n+1} = (A_n, u), \qquad S_{n+1} = S_n \setminus \{u\}, \qquad R_{n+1} = R_n.$$
- If, however, $v_n$ has no neighbor in $S_n$, we set
$$A_{n+1} = A_n \text{ with its last element } v_n \text{ removed}, \qquad S_{n+1} = S_n, \qquad R_{n+1} = R_n \cup \{v_n\}.$$
The sequence of vertices $(v_n)$ is a nearest neighbor walk on the connected component of $v$ and its trace is a spanning tree of this component. Moreover, the chronology of the construction makes this tree rooted and planar. By construction, the list $A_n$ is the ancestral line between $v_n$ and $v$ in this spanning tree. The set $S_n$ is the set of vertices that have not been visited by the walk up to time $n$. The vertices in $R_n$ are those for which the construction of the process ensures that they have no neighbor in $S_n$.
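The inductive definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `dfs_contour` and the adjacency-dictionary input are assumptions made here, and the variables `active`, `sleeping` and `retired` mirror the active, sleeping and retired vertices of the text.

```python
def dfs_contour(adj, root):
    """Run the DFS described in the text and record the walker's height.

    adj  : dict mapping each vertex (an int) to the set of its neighbors
    root : starting vertex
    Returns (heights, retired), where heights[n] = len(active) - 1 after n steps.
    """
    active = [root]                  # ordered list; its last element is the walker
    sleeping = set(adj) - {root}     # vertices never visited so far
    retired = set()                  # vertices with no sleeping neighbor left
    heights = [0]
    while active:
        current = active[-1]
        candidates = adj[current] & sleeping
        if candidates:
            nxt = min(candidates)    # smallest-index rule
            active.append(nxt)
            sleeping.remove(nxt)
        else:
            retired.add(active.pop())  # step back towards the root
        heights.append(len(active) - 1)
    return heights, retired
```

On a triangle `{0, 1, 2}` with an extra isolated vertex `3`, starting from `0`, the recorded heights go up to 2 and then back down, ending at -1 once the root itself is retired; the isolated vertex stays sleeping.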
Remark.
From a probabilistic point of view it might seem unnatural to take the neighbor with smallest index in the definition of the next visited vertex instead of, for example, picking a neighbor at random. As will become clear in the proofs, this does not change the asymptotics of the process.
2.2 Scaling limit of the DFS
At each step $n$, the current height of the walker in the spanning tree constructed by this algorithm is denoted by $X_n$. This defines a Dyck path $(X_n)$: it starts at $0$, has increments in $\{-1, +1\}$ and is non-negative except at its final value $-1$. The process $(X_n)$ is the canonical contour process in clockwise order of the spanning tree constructed by the DFS algorithm. Because of all the information it encodes, the process $(X_n)$ will be our main object of interest. Since we are mainly interested in the geometry of the giant component of $G(N, c/N)$, we study the process $(X_n)$ conditional on the event that the starting vertex $v$ belongs to the largest component of $G(N, c/N)$. This event has asymptotic probability $\rho_c$, the density of the giant component. Our main result is the convergence of the process $(X_n)$ to a deterministic curve, illustrated in Figure 1:
Theorem 2.
Conditional on , the following limit holds in probability for the topology of uniform convergence:
[TABLE]
where the function is continuous and defined on the interval . The graph can be divided into a first increasing part and a second decreasing part. These parts are respectively parametrized by:
[TABLE]
where the functions and are given by
[TABLE]
and stands for the dilogarithm function.
Theorem 1 is easily obtained by computing the maximal height of the curve given in Theorem 2, which is equal to
[TABLE]
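The height process can also be explored by simulation. The sketch below is an illustration under stated assumptions, not the authors' method: it runs the DFS with the smallest-index rule on a graph with connection probability `c / n_vertices`, revealing each edge only when the walker first tests it (deferred decisions). The function name `lazy_dfs_heights`, the choice of vertex `0` as the root and the per-vertex `pointer` array are all hypothetical.

```python
import random


def lazy_dfs_heights(n_vertices, c, seed=0):
    """Height process of the DFS on an Erdos-Renyi graph sampled on the fly.

    Each pair (walker, sleeping vertex) is tested at most once, with success
    probability c / n_vertices, so the walk has the same law as the DFS on a
    graph sampled in advance. Runs in O(n_vertices ** 2) time.
    """
    rng = random.Random(seed)
    p = c / n_vertices
    sleeping = [True] * n_vertices
    pointer = [0] * n_vertices       # first index each vertex has not tested yet
    sleeping[0] = False
    active = [0]                     # vertex 0 plays the role of the root
    heights = [0]
    while active:
        v = active[-1]
        u = pointer[v]
        found = -1
        while u < n_vertices:
            # Only sleeping vertices matter; their edge to v is still untested.
            if sleeping[u] and rng.random() < p:
                found = u
                break
            u += 1
        if found >= 0:
            pointer[v] = found + 1   # resume the scan here when the walker returns
            sleeping[found] = False
            active.append(found)
        else:
            pointer[v] = n_vertices
            active.pop()             # no sleeping neighbor left: retire v
        heights.append(len(active) - 1)
    return heights
```

When the root happens to lie in the giant component, `max(heights) / n_vertices` gives an empirical value of the maximal height that can be compared with the limit stated in the text.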
3 Pseudo renewal times and strategy of the proof
We call the canonical filtration associated to . Notice that carries some partial information on the underlying Erdős-Rényi graph but not all of it. In particular the graph structure of given is that of an Erdős-Rényi graph with connection probability since the connections between vertices of have not yet been tested at time .
We call the non-decreasing proportion of vertices explored by the process at time . It is straightforward to check that . Note that at time , conditional on , the expected number of unexplored vertices neighboring is Therefore it is natural to expect two successive phases:
- •
When , the walker finds a lot of unexplored vertices allowing it to drift away from its starting point. We call that phase the way up.
- •
When , the walker spends most of the time backtracking towards its starting point. We call that phase the way down.
3.1 Pseudo renewal times and the way up
On the way up, every time the walker visits a new vertex, there is a positive probability that this vertex belongs to the largest component of the new . However this is not guaranteed, as the walker could be in a dead end. If this is indeed the case, the walker will soon go back to the previously visited vertex. On the other hand, if the walker is not in a dead end, it is going to spend a very long time (that is of order ) before returning to the current vertex, as it needs to fully explore the largest component of . Therefore the walk contains a "spine" of macroscopic size, with small excursions. In order to detect this spine, we introduce the following sequence of random pseudo renewal times. Let
[TABLE]
In words, is the sequence of times where the walk hits a vertex and does not come back before having visited a macroscopic portion of the graph (see Figure 2 for an illustration). We take the minimum with to ensure that these times are well defined even if the set is empty, in which case for every . However this only happens when is close to .
An important observation is that the ’s are not stopping times with respect to . However, in the large limit, they have a nice description as we will see in the following.
3.2 Strategy of the proof
When hitting a pseudo renewal time , we know that the walker is necessarily at a vertex belonging to the largest component of . The neighbors of in are vertices picked at random, independently of the edges between vertices of . Among these neighbors, some are in small components – typically of finite size – while at least one of them is in the largest one. Therefore, the increment corresponds to the time it takes to find the largest component of . The number of neighbors of in is close to a Poisson distribution, while the number of tries it takes to find the largest component of is close to a geometric distribution, as it is a sequence of almost independent tries due to the very small number of vertices visited between two tries. As we know that the procedure succeeds, the number of neighbors tested before finding the good one is a geometric (minus one) random variable conditioned to be smaller than a Poisson random variable. Figure 3 gives an illustration of this situation.
When the walker goes to a neighbor of , it has to visit its whole connected component inside before returning to . The time it takes to do so is twice the number of vertices in this connected component and will be small. Indeed, by definition, this connected component is not the giant component of the graph and therefore is asymptotically a subcritical Galton-Watson tree with an explicit offspring distribution. These observations make it possible to study in detail the conditional expectation
[TABLE]
in Section 4.2. A precise statement is given in Lemma 4.
A crucial parameter in our estimates of the above expectation is the proportion of sleeping vertices available at time , that is with our notation. In order to control this parameter, we introduce in Section 4.2 a sequence of random times corresponding to times where this proportion of available vertices hits fixed levels, independent of . As we already mentioned, the times are not stopping times. However, we will see in Section 4.3 that the ’s can be viewed as a Markov chain for which the ’s are stopping times. This allows us to prove a concentration result for the ’s with a martingale argument. See Lemma 5 for a precise statement.
The knowledge of the ’s and of the associated times provides pinning points through which the profile of the walk has to pass. Slope arguments then show that the normalized profile of the walk converges and the expectations give us access to the derivative of the increasing part of the limiting profile. The decreasing part is then deduced from the increasing one by a simple argument once one realizes that the time it takes to go back to a given level is twice the size of the giant component of the graph composed by the current sleeping vertices. This proof of Theorem 2 is detailed in Section 4.4.
4 The proof itself
4.1 Giant component among sleeping vertices
We already mentioned in Section 3.1 that the pseudo renewal times may degenerate. This will not be the case if, for every during the way up, the graph has no connected component of mesoscopic size. The next lemma shows that the probability of this event converges to $1$. For later convenience, we also include a logarithmic bound for the maximal degree in the graph.
To avoid problems at criticality, we fix a margin and consider times where or equivalently
[TABLE]
Lemma 3.
Let be the event that, for every such that verifies (3), the graph has no connected component of size between and , and that the maximum degree of a vertex in , hence in every , is at most . Then
[TABLE]
Proof.
The maximum degree of Erdős-Rényi graphs is well known (see e.g. [B, F]) and we just focus on the size of the connected components.
Recall that, by construction, for every , the subgraph spanned by is an Erdős-Rényi random graph with vertices and parameter .
Fix and let denote the number of connected components of size $k$ in an Erdős-Rényi graph of size and parameter . Using the fact that a complete graph with $k$ vertices has $k^{k-2}$ spanning trees (Cayley's formula), we get:
[TABLE]
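For orientation, a standard first-moment bound of this type can be written as follows. The notation is an assumption made here for illustration ($Z_k$ for the number of components of size $k$, $m$ for the number of vertices, $p$ for the connection probability) and need not match the displayed formula of the proof: choose the $k$ vertices, choose a spanning tree on them, require its $k-1$ edges to be present and all $k(m-k)$ edges leaving the component to be absent.

```latex
\mathbb{E}[Z_k] \;\le\; \binom{m}{k}\, k^{k-2}\, p^{\,k-1}\, (1-p)^{k(m-k)} .
```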
When and with , using classical inequalities we obtain
[TABLE]
Now, if we obtain
[TABLE]
If is large enough, the parameter being fixed, we have and therefore
[TABLE]
The lemma follows from the union bound and Markov’s inequality. ∎
4.2 The renewal increments
To get Theorem 2, we need a good estimate of the expected difference between two consecutive pseudo renewal times. As we will see, the law of mainly depends on , therefore we introduce the random indices , depending on and a fixed , defined as
[TABLE]
These indices correspond to heights for the walk by the relation . The points will be our pinning points for the profile of the walk.
The ’s are well-defined during the way up, at least for times such that . This corresponds to
[TABLE]
The fact that the parameter varies only slightly between two consecutive ’s means that the sequence is almost an i.i.d. sequence.
Lemma 4.
There exists a constant such that, if is large enough, for every integer with , one has
[TABLE]
Proof.
To be able to bound the conditional expectation of , we need to introduce the fundamental decomposition of the trajectory of during this interval, leading to identity (6) below. At time , the walker is at a vertex having neighbors inside (see Figure 3). The law of is complicated unless the time is the first visit of . Indeed, for such a time , the algorithm has never tested the connection between vertices of and , meaning that the integer is just a binomial random variable with parameters and . We denote by the event that is the first visit to . In addition, notice that, on the event , the number and the neighbors of in are independent of the connections inside .
For every , call the event that the return time to is at least . On , this is equivalent to the fact that the connected component of in has at least vertices meaning that .
We denote by the smallest such that the connected component of in has size larger than , and if none of the ’s is in such a connected component. For , we call the number of vertices in the connected component of in . We fix however if belongs to the connected component of a previously explored neighbor, meaning will be retired before the algorithm has the chance to test the connection between and (see for example the vertices number and in Figure 3).
On the event , the event is equivalent to the fact that the connected component of in contains at least vertices. Using the bound on the maximal degree in the graph given by , this is also equivalent to the fact that at least one of the neighbors of in has a connected component in of size at least , or . Therefore, on the event , the event is equivalent to and
[TABLE]
Conditional on , the distribution of is explicit and only depends on . Therefore we have shown that, on the event , the sequence is coupled with a non-homogeneous Markov chain, and the ’s are stopping times for this Markov chain.
We can now turn to the actual proof of the lemma. We assume that is small enough and that is large enough. In all our computations, denotes a constant independent of , and which may change from line to line to keep computations easier to read.
Recall , meaning that
[TABLE]
Indeed, on the event , the difference is at most . Therefore, if is large enough, we can make sure that the variation in between two subsequent ’s stays arbitrarily small.
Dropping the dependency in , we call the probability that a randomly taken vertex in an Erdős-Rényi graph with vertices and parameter belongs to a connected component of size larger than . By Dini’s theorem, the sequence converges uniformly to as goes to infinity. We want to compute
[TABLE]
For a fixed ,
[TABLE]
Conditional on , if the event means that belongs to a large component of . This is also true after removing the components of to get rid of dependencies. Besides, means that has at least children. By independence between the neighbors of and the connections inside
[TABLE]
and
[TABLE]
We turn to the expectation in the last bounds.
[TABLE]
Using once again the fact that the local value of remains between and with high probability,
[TABLE]
and
[TABLE]
Conditional on , the random variable is either the size of a small component in an Erdős-Rényi random graph with parameter and a number of vertices between and , or zero if belongs to one of the previously visited components, which has probability smaller than for large enough. The expected size of a small component in an Erdős-Rényi random graph with parameter and vertices converges, for a fixed , to the expected size of a Galton-Watson tree with Poisson offspring distribution conditioned on extinction. This, in turn, is a subcritical Galton-Watson tree with Poisson offspring distribution having expected size
[TABLE]
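As a reminder of the standard facts behind this step, stated here for the global parameter $c$ as an assumption (the proof applies them with the local parameter of the graph of sleeping vertices): a Poisson($c$) Galton-Watson tree conditioned on extinction is a Poisson($\mu$) Galton-Watson tree with $\mu = c(1-\rho_c) < 1$, and summing the expected generation sizes gives its expected total size $|T|$.

```latex
\mu = c\,(1-\rho_c) < 1,
\qquad
\mathbb{E}\big[\,|T|\,\big] \;=\; \sum_{k \ge 0} \mu^{k} \;=\; \frac{1}{1-\mu} \;=\; \frac{1}{1 - c\,(1-\rho_c)} .
```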
Using the smoothness of as a function of , for large enough and for every the expected size of a small component is thus in the interval
[TABLE]
Equation (7) then gives
[TABLE]
and
[TABLE]
As both bounds will be treated similarly, we will focus on the upper bound (9). By a coupling argument we can always assume that for large enough, with high probability, the random variable is larger than a Poisson random variable, denoted by X in the following. We call
[TABLE]
Isolating the sum in (9), we compute
[TABLE]
with the convention . It is straightforward to check
[TABLE]
For any , the function on the right hand side of (11) is infinitely differentiable and therefore Lipschitz in . In addition, the Lipschitz coefficient of this function can be computed explicitly and bounded uniformly in .
By uniform convergence and if is large enough. Therefore we can replace by in the previous computation, with only an error of order Recalling relation (1) characterizing , we obtain
[TABLE]
We turn to the factor . Denote by a Poisson random variable. Using once again a coupling argument as well as the same decomposition as when dealing with the first member of (8) we get
[TABLE]
Hence
[TABLE]
Putting equations (9), (10), (12) and (13) together
[TABLE]
Recalling
[TABLE]
we get the desired result. ∎
4.3 Concentration for the pinning heights
The sharp estimate of the length of the renewal intervals obtained in Lemma 4 converts into concentration for the pinning heights defined by (4):
Lemma 5.
There exists a constant , depending only on , such that for every , with high probability,
[TABLE]
Proof.
Fix , we are going to construct a martingale involving the sequence . Recall that on the sequence is a Markov chain, and that is a stopping time for it. Indeed, as we saw earlier, has an explicit distribution, depending only on .
We modify slightly the sequence in the following way. Let be equal to as long as . Then complete the sequence by adding to the last term i.i.d. copies of at each step. This is just a formal definition, and we are only interested in , which is precisely, by definition, the hitting time of by the sequence . Obviously changing the sequence after this hitting time won’t modify it.
Now we introduce the martingale with respect to defined on by
[TABLE]
Recall that, still on the event , the difference is smaller than , while by construction and Lemma 4, for every ,
[TABLE]
Therefore, for large enough, the increments of are bounded by .
The Azuma-Hoeffding inequality gives that
[TABLE]
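For reference, the general form of the inequality used here is the following. The notation is chosen for this reminder only and is an assumption: $(M_j)_{0 \le j \le k}$ is a martingale whose increments satisfy $|M_j - M_{j-1}| \le b_j$ almost surely, and $t > 0$.

```latex
\mathbb{P}\big( |M_k - M_0| \ge t \big)
\;\le\;
2 \exp\!\left( - \frac{t^{2}}{2 \sum_{j=1}^{k} b_j^{2}} \right).
```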
therefore, by the union bound,
[TABLE]
and, since , using once again the union bound
[TABLE]
which can be made as small as requested by taking large.
This implies that, as , with high probability
[TABLE]
whence, for all ,
[TABLE]
Recalling that is the hitting time of by the sequence , we get the result. ∎
4.4 Proof of Theorem 2
As we are now going to manipulate , we keep track of the dependency of the ’s on by writing . As a consequence of Lemma 5, for every
[TABLE]
Taking , we identify a Riemann sum, so that by differentiability of the integrated function , uniformly in
[TABLE]
The points of the normalized profile of the walk taken at the times for , can be written
[TABLE]
the last equality coming from the fact that, on the event , each increment is bounded from above by .
Gathering (14) and (15), we obtain that as , these points of the normalized profile of the walk are uniformly at distance of the following parametrized curve:
[TABLE]
Finally, recalling that
[TABLE]
and that the slope of the renormalized profile is smaller than in absolute value, we are assured that the whole normalized profile stays at distance smaller than from the curve defined by (16). Taking first and then , we have the convergence of the normalized profile of the walk for the parameter ranging from [math] to .
To identify the parametrized curve defined by (16) with the explicit one given in Theorem 2, we just have to parametrize the curve by instead of . The definition (1) of gives
[TABLE]
From this relation, we can proceed to a change of variable in the integral appearing in (16) and get the announced formulas.
We now turn to the convergence of the profile of the process after reaching criticality, that is during the way down.
For every , we introduce
[TABLE]
the time when the walker returns to its position at time after exploring the connected component of in . On G, the difference is twice the size of this connected component. Recall that, on G, the subgraph is an Erdős-Rényi graph with number of vertices in and connection probability . As a consequence, for every ,
[TABLE]
Besides . This implies that the K points of the profile taken at times for can be written
[TABLE]
where the term goes to zero as .
Using the same slope arguments as before, we get the announced parametrization.
Acknowledgments
N.E. is partially supported by ANR PPPP (ANR-16-CE40-0016). N.E. and G.F. are partially supported by ANR MALIN. L.M. is partially supported by ANR GRAAL (ANR-14-CE25-0014).
All three authors acknowledge the support of Labex MME-DII (ANR11-LBX-0023-01).
