Loop-erased partitioning of a graph: mean-field analysis
Luca Avena, Alexandre Gaudilliere, Paolo Milanesi, Matteo Quattropani

Luca Avena: Leiden University, Mathematical Institute, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands.
Alexandre Gaudillière: Aix-Marseille Université, CNRS, Centrale Marseille, I2M UMR CNRS 7373, 39 rue Joliot Curie, 13 453 Marseille Cedex 13, France.
Paolo Milanesi: Aix-Marseille Université, CNRS, Centrale Marseille, I2M UMR CNRS 7373, 39 rue Joliot Curie, 13 453 Marseille Cedex 13, France.
Matteo Quattropani: Dipartimento di Matematica e Fisica, Università di Roma Tre, Largo S. Leonardo Murialdo 1, 00146 Roma, Italy.
Abstract.
We consider a random partition of the vertex set of an arbitrary graph that can be sampled using loop-erased random walks stopped at a random independent exponential time of parameter $q > 0$, which we see as a tuning parameter. The related random blocks tend to cluster nodes visited by the random walk on time scale $1/q$. We explore the emerging macroscopic structure by analyzing 2-point correlations. To this aim, we define an interaction potential between pairs of vertices as the probability that they do not belong to the same block of the random partition. This interaction potential can be seen as an affinity measure for "densely connected nodes" and captures well-separated regions in network models presenting non-homogeneous landscapes. In this spirit, we compute this potential and its scaling limits on a complete graph and on a non-homogeneous weighted version with community structures. For the latter geometry we show a phase transition for "community detectability" as a function of the tuning parameter and the edge weights.
Key words and phrases:
Discrete Laplacian, random partitions, loop-erased random walk, Wilson's algorithm, spanning rooted forests
2010 Mathematics Subject Classification:
05C81, 05C85, 60J10, 60J27, 60J28
1. Intro: Loop-erasure and random partitioning
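Consider an arbitrary simple undirected weighted connected graph $G = (V, E, w)$ on $|V| = n$ vertices, where $E$ stands for the edge set and $w$ is a given edge-weight function. We call Random Walk (RW) associated to $G$ the continuous-time Markov chain $X = (X(t))_{t \ge 0}$ with state space $V$ and the discrete Laplacian $\mathcal{L}$ as infinitesimal generator, i.e., the matrix
$$\mathcal{L} = A - D, \qquad \mathcal{L}(x, y) = w(x, y) \ \ (x \neq y), \qquad \mathcal{L}(x, x) = -\sum_{z \neq x} w(x, z), \qquad (1.1)$$
where $w(x, y) := w(e)$ for any edge $e = \{x, y\} \in E$ (and $w(x, y) := 0$ if $\{x, y\} \notin E$), $A = (w(x, y))_{x, y \in V}$ is the weighted adjacency matrix and $D$ is the diagonal matrix guaranteeing that the entries of each row in $\mathcal{L}$ sum up to $0$.
The goal of this paper is to explore the following probability measure on the set of partitions of the vertex set $V$.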
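Definition 1 (Loop-erased partitioning).
Given $G$, fix a positive parameter $q$. We call loop-erased partition of $G$ a random partition of $V$ into blocks sampled according to the following probability measure on the set $\mathcal{P}(V)$ of partitions of $V$:
$$\mu_q(\Pi) = \frac{q^{|\Pi|}}{Z(q)} \sum_{F:\ \Pi(F) = \Pi} w(F), \qquad \Pi \in \mathcal{P}(V), \qquad (1.2)$$
where the sum is over spanning rooted forests $F$ of $G$, $\Pi(F)$ stands for the partition of $V$ induced by a forest $F$, $w(F) = \prod_{e \in F} w(e)$ for the forest weight, $|\Pi|$ for the number of blocks of $\Pi$, and $Z(q)$ is a normalizing constant. We denote by $\Pi_q$ a random variable in $\mathcal{P}(V)$ with law $\mu_q$.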
In the above definition a spanning rooted forest of a graph is a collection of rooted trees spanning its vertex set. Denoting by $\mathcal{F}$ the set of spanning rooted forests of $G$, we note that, due to the matrix tree theorem, the normalizing constant in Eq. 1.2 can be expressed as the characteristic polynomial of the matrix $\mathcal{L}$ evaluated at $q$, i.e.,
$$Z(q) = \sum_{F \in \mathcal{F}} q^{\rho(F)}\, w(F) = \det(qI - \mathcal{L}),$$
where $\rho(F)$ denotes the number of trees in $F$. Furthermore, the number of blocks in $\Pi_q$, denoted by $|\Pi_q|$, is distributed as the sum of $n$ independent Bernoulli random variables with success probabilities $q/(q + \lambda_j)$, for $j = 1, \dots, n$, with the $\lambda_j$'s being the eigenvalues of $-\mathcal{L}$. We refer the reader to [5, Prop. 2.1] for a proof of these statements.
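Both identities can be checked by brute force on a tiny graph. The following Python sketch (function names are ours and purely illustrative) enumerates all acyclic edge subsets, roots every tree in all possible ways with weight $q$ per root, and compares the resulting forest sum with $\det(qI - \mathcal{L})$ for $\mathcal{L} = A - D$; it also compares the forest-based mean number of blocks with $\sum_j q/(q+\lambda_j)$, the $\lambda_j$ being the eigenvalues of $-\mathcal{L}$. The enumeration is exponential in the number of edges, so it is only meant as a sanity check.

```python
import itertools
from collections import Counter

import numpy as np


def _find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def forest_sums(W, q):
    """Brute force over edge subsets (tiny graphs only).

    Returns (Z, Z_blocks) with
      Z        = sum over spanning rooted forests F of q^rho(F) * w(F),
      Z_blocks = sum over the same forests of rho(F) * q^rho(F) * w(F),
    so that the mean number of blocks is Z_blocks / Z.
    """
    W = np.asarray(W, dtype=float)
    n = len(W)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if W[i, j] > 0]
    Z = Z_blocks = 0.0
    for k in range(n):  # a forest on n vertices has at most n - 1 edges
        for subset in itertools.combinations(edges, k):
            parent = list(range(n))
            weight, acyclic = 1.0, True
            for i, j in subset:
                ri, rj = _find(parent, i), _find(parent, j)
                if ri == rj:  # this edge would close a cycle
                    acyclic = False
                    break
                parent[ri] = rj
                weight *= W[i, j]
            if not acyclic:
                continue
            sizes = list(Counter(_find(parent, v) for v in range(n)).values())
            # a tree with s vertices can be rooted in s ways; each root weighs q
            term = weight * np.prod([q * s for s in sizes])
            Z += term
            Z_blocks += len(sizes) * term
    return Z, Z_blocks


def spectral_sums(W, q):
    """det(qI - L) and sum_j q / (q + lambda_j), lambda_j eigenvalues of -L."""
    W = np.asarray(W, dtype=float)
    L = W - np.diag(W.sum(axis=1))  # L = A - D
    lam = np.linalg.eigvalsh(-L)  # real, nonnegative: -L is symmetric PSD
    return np.prod(q + lam), np.sum(q / (q + lam))
```

On any small symmetric weight matrix `W` with zero diagonal, the first outputs of `forest_sums(W, q)` and `spectral_sums(W, q)` should agree up to rounding, and so should `Z_blocks / Z` and the second output of `spectral_sums`.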
1.1. Tuning parameter and underlying geometry.
The first factor in Eq. 1.2 favors partitions having many small blocks as $q$ grows, while as $q$ vanishes, the measure degenerates into a one-block partition. The second, combinatorial factor takes into account the underlying geometry and, for example, in the unweighted case (i.e., constant edge weights $w \equiv 1$) counts how many rooted forests are compatible with a given partition. In the simple setup of an unweighted complete graph on $n$ vertices, $K_n$, the measure in Definition 1 reduces to
$$\mu_q(\Pi) = \frac{q^{k}}{Z(q)} \prod_{i=1}^{k} n_i^{\,n_i - 1}, \qquad (1.3)$$
for a partition $\Pi$ constituted of $k$ blocks with sizes $n_1, \dots, n_k$, such that $n_1 + \dots + n_k = n$. In particular, we see in this setup that this second factor favors partitions with a few "fat" blocks. Notice that Eq. 1.3 holds true because, by Cayley's formula, $n_i^{\,n_i - 2}$ unrooted trees can cover block $i$, and since we are dealing with rooted trees, an extra volume factor $n_i$ for the possible roots is needed. In general, the competition between these two factors depends on the delicate interplay among the tuning parameter $q$, the underlying geometry and the weight function $w$.
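Eq. 1.3 is easy to tabulate for small $n$. The sketch below (function names are illustrative) enumerates all set partitions of $\{0, \dots, n-1\}$, gives each the weight $q^k \prod_i n_i^{n_i-1}$, and divides by $Z(q) = \det(qI - \mathcal{L}) = q(q+n)^{n-1}$, which follows from the spectrum $\{0, n, \dots, n\}$ of $-\mathcal{L}$ on $K_n$. Summing the output is a direct check that the Cayley-based weights are correctly normalized.

```python
def set_partitions(items):
    """Yield all set partitions of a list, as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new singleton block
        yield [[first]] + smaller


def complete_graph_lep(n, q):
    """Loop-erased partition law on the unweighted complete graph K_n.

    Weight of a partition with block sizes n_1, ..., n_k:
    q^k * prod_i n_i^(n_i - 1)   (Cayley: n_i^(n_i - 2) trees, times n_i roots).
    """
    Z = q * (q + n) ** (n - 1)  # det(qI - L); -L has spectrum {0, n (n-1 times)}
    law = {}
    for blocks in set_partitions(list(range(n))):
        weight = q ** len(blocks)
        for b in blocks:
            weight *= len(b) ** (len(b) - 1)
        law[frozenset(frozenset(b) for b in blocks)] = weight / Z
    return law
```

With `n = 3` and `q = 1`, adding up the probabilities of the three partitions that separate vertices 0 and 1 gives $(q^3 + 4q^2)/(q(q+3)^2) = 5/16$.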
1.2. Sampling algorithm and Loop-Erased RW (LERW)
An attractive feature of this measure is that there exists a simple exact sampling algorithm, originally due to Wilson [22] and based on the associated LERW killed at random times. The LERW with killing is the process obtained by running the RW $X$, erasing cycles as soon as they appear, and stopping the evolving self-avoiding trajectory at an independent random time $T_q$ with law an exponential of parameter $q$.
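The chronological loop erasure used above can be sketched as follows for a finite trajectory given as a list of vertices (the function name is ours): each time the trajectory revisits a vertex of the current self-avoiding path, the cycle created since the earlier visit is erased.

```python
def loop_erase(trajectory):
    """Chronological loop erasure: erase each cycle as soon as it is closed."""
    path = []
    position = {}  # vertex -> index in the current self-avoiding path
    for v in trajectory:
        if v in position:
            # v closes a loop: drop everything visited after its earlier visit
            cut = position[v] + 1
            for z in path[cut:]:
                del position[z]
            del path[cut:]
        else:
            position[v] = len(path)
            path.append(v)
    return path
```

For example, `loop_erase([0, 1, 2, 1, 3])` returns `[0, 1, 3]`: the loop 1, 2, 1 is erased as soon as it closes.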
The algorithm can be described as follows:
- (1) pick any arbitrary vertex in $V$ and run a LERW up to an independent time $T_q$. Call $\gamma_1$ the obtained self-avoiding trajectory.
- (2) pick any arbitrary vertex in $V$ that does not belong to $\gamma_1$. Run a LERW until $\min\{T_q, \tau_{\gamma_1}\}$, $\tau_{\gamma_1}$ being the first time the RW hits a vertex in $\gamma_1$. Call $\gamma_2$ the union of $\gamma_1$ and the new self-avoiding trajectory obtained in this step.
- (3) iterate step (2) with $\gamma_2$ in place of $\gamma_1$ until exhaustion of the vertex set $V$.
In step (2) we note that if the killing occurs before the walk hits the trajectory obtained in step (1), then the union obtained in step (2) is a rooted forest in $G$, else it is a rooted tree.
When the above algorithm stops, it produces a spanning rooted forest $F$, where the roots are the points where the involved LERWs were killed along the algorithm steps. The resulting forest on $G$ induces a partition of the vertex set $V$, where each block is identified by vertices belonging to the same tree. It can be shown that the probability to obtain a given rooted spanning forest $F$ is proportional to $q$ to the power of the number of trees, times the forest weight $w(F)$. It then follows that the induced partition is distributed as in Definition 1. We refer the reader to [5] for the proof of the latter and for more detailed aspects of this algorithm, including dynamical variants. In the sequel we will denote by $\mathbb{P}$ a probability measure on an abstract probability space sufficiently rich for the randomness required by this algorithm.
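A compact implementation of the algorithm above may help fix ideas. The Python sketch below rests on assumptions we state explicitly: it simulates the jump chain of the killed RW (from $u$ the walk is killed with probability $q/(q + d(u))$, where $d(u) = \sum_v w(u,v)$, and otherwise jumps to $v$ with probability $w(u,v)/(q + d(u))$), and it uses the classical last-exit pointer trick from Wilson's algorithm, so that loops are erased implicitly instead of being stored. The names `wilson_partition` and `ROOT` are hypothetical; the output labels every vertex by the root of its tree, i.e. by its block.

```python
import random
from bisect import bisect_right
from itertools import accumulate

ROOT = -1  # marker: the walk was killed here, so this vertex is a root


def wilson_partition(W, q, rng):
    """Sample a loop-erased partition of a weighted graph (sketch).

    W is a symmetric weight matrix (list of lists, zero diagonal), q > 0 is the
    killing rate and rng a random.Random instance.  Returns labels[v] = root of
    the tree containing v, so that equal labels mean same block.
    """
    n = len(W)
    cum = [list(accumulate(row)) for row in W]  # cumulative weights per row
    in_forest = [False] * n
    nxt = [None] * n
    for start in range(n):  # the starting points may be chosen arbitrarily
        u = start
        while not in_forest[u]:
            r = rng.random() * (q + cum[u][-1])
            if r < q:  # exponential clock rang: u becomes a root
                nxt[u] = ROOT
                break
            v = bisect_right(cum[u], r - q)  # neighbour w.p. w(u,v)/(q+d(u))
            nxt[u] = v  # overwriting the last exit erases loops implicitly
            u = v
        # retrace the loop-erased branch and add it to the forest
        u = start
        while not in_forest[u]:
            in_forest[u] = True
            if nxt[u] == ROOT:
                break
            u = nxt[u]
    # every vertex follows its pointers down to the root of its tree
    labels = []
    for v in range(n):
        while nxt[v] != ROOT:
            v = nxt[v]
        labels.append(v)
    return labels
```

On the unweighted $K_4$ with $q = 1$, averaging the number of distinct labels over a few thousand samples should give roughly $1 + 3 \cdot 1/(1+4) = 1.6$, in agreement with the Bernoulli description of the number of blocks recalled after Definition 1 (the eigenvalues of $-\mathcal{L}$ are $0, 4, 4, 4$).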
1.3. Partition detecting "metastable landscapes".
Wilson's sampling algorithm described above shows that the resulting partition tends to cluster in the same block (tree) points that can be visited by the RW with high probability on time scale $1/q$. In this sense the loop-erased partitioning tends to capture metastable-like regions (blocks), namely, regions of points from which it is difficult for the RW to escape on time scale $1/q$. This makes the loop-erased partition measure an interesting tool for randomized clustering procedures; see in this direction [2] and [3, Sec. 5]. Yet, a priori it is not clear how strong and stable this feature of capturing "metastable landscapes" is, since it heavily depends on the underlying geometry (weighted adjacency matrix) and on the choice of the killing parameter $q$. The scope of this paper is to start making this heuristic precise by analyzing 2-point correlations associated to the loop-erased partition on the simplest dense informative geometries.
1.4. Two-point correlations
For a pair of distinct vertices $x, y \in V$, consider the event that these vertices belong to different blocks of the loop-erased partition $\Pi_q$. That is, the event
$$\{B_{\Pi_q}(x) \neq B_{\Pi_q}(y)\},$$
where $B_{\Pi}(x)$ stands for the block in $\Pi$ containing $x$. The probability of this event induces a 2-point correlation function which turns out to be analyzable by means of LERW explorations, and it encodes relevant information on what the resulting partition looks like on the underlying graph as a function of the parameters. Here is the formal definition together with an operative characterization.
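Definition 2 (Pairwise LEP-interaction potential).
For given $G$ and $q > 0$, and any pair $x, y \in V$, we call pairwise LEP-interaction potential the following probability:
$$U_q(x, y) := \mathbb{P}\big(x \text{ and } y \text{ belong to different blocks}\big) = \sum_{\gamma} P^{LE,q}_x(\Gamma = \gamma)\, P_y(\tau_\gamma > T_q), \qquad (1.4)$$
where $P^{LE,q}_x$ and $P_y$ stand for the laws of the LERW $\Gamma$ killed at rate $q$ and of the RW, respectively, starting from $x$ and from $y$, $\tau_\gamma$ denotes the first time the RW hits $\gamma$, and the above sum runs over all possible self-avoiding paths $\gamma$ starting at $x$.
The representation in Eq. 1.4 is a consequence of Wilson's sampling procedure described in Section 1.2, and it holds true since, remarkably, in steps (1) and (2) of the algorithm the starting points can be chosen arbitrarily.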
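Eq. 1.4 translates directly into a two-walk Monte Carlo estimator: sample one killed LERW from $x$, then run the killed RW from $y$ and record whether it dies before touching the loop-erased path. The sketch below is a minimal version assuming the standard jump-chain simulation of the killed RW (from $u$, killed with probability $q/(q+d(u))$, $d(u) = \sum_v w(u,v)$, otherwise jump to $v$ with probability $w(u,v)/(q+d(u))$); helper names are ours, and the estimate carries the usual Monte Carlo error of order $\text{samples}^{-1/2}$.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def _killed_step(cum, u, q, rng):
    """One jump of the RW killed at rate q; returns None if T_q rings first."""
    r = rng.random() * (q + cum[u][-1])  # cum[u][-1] is the total weight d(u)
    if r < q:
        return None
    return bisect_right(cum[u], r - q)  # neighbour v w.p. w(u, v) / (q + d(u))


def _killed_lerw(cum, x, q, rng):
    """Loop-erased path of the killed RW started at x, stopped at T_q."""
    path, pos = [x], {x: 0}
    while True:
        v = _killed_step(cum, path[-1], q, rng)
        if v is None:
            return path
        if v in pos:  # a loop is closed: erase it
            for z in path[pos[v] + 1:]:
                del pos[z]
            del path[pos[v] + 1:]
        else:
            pos[v] = len(path)
            path.append(v)


def _killed_before_hitting(cum, y, q, target, rng):
    """True if the killed RW from y dies before it hits the set `target`."""
    u = y
    while u not in target:
        u = _killed_step(cum, u, q, rng)
        if u is None:
            return True
    return False


def lep_potential_mc(W, x, y, q, samples=20_000, seed=0):
    """Monte Carlo estimate of P(x and y lie in different blocks) via Eq. 1.4."""
    rng = random.Random(seed)
    cum = [list(accumulate(row)) for row in W]
    different = 0
    for _ in range(samples):
        gamma = set(_killed_lerw(cum, x, q, rng))
        different += _killed_before_hitting(cum, y, q, gamma, rng)
    return different / samples
```

For the unweighted $K_3$, Definition 1 together with Cayley's formula gives the probability $q(q+4)/(q+3)^2$ that two given vertices fall in different blocks, e.g. $5/16$ at $q = 1$; `lep_potential_mc([[0, 1, 1], [1, 0, 1], [1, 1, 0]], 0, 1, 1.0)` should land within about $0.01$ of it.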
Furthermore, we notice that, as for any generic random partition of $V$, such an interaction potential defines a distance on the vertex set. This specific metric can be interpreted as an affinity measure capturing how densely connected the vertices $x$ and $y$ are in the graph $G$, thus providing a further motivation to analyze it.
Still, the observable captured by the LEP-potential is not the only one inducing a natural notion of 2-point correlations associated to the loop-erased partition. For example, if we express the LEP-potential in Definition 2 as an expectation, i.e., as the mean of the indicator that $x$ and $y$ lie in different blocks, we may think of normalizing it with the masses of the related blocks and obtain another natural 2-point correlation function. This is captured in the following definition.
Definition 3 (Pairwise RW-interaction potential).
For given $G$ and $q > 0$, and any pair $x, y \in V$, we call pairwise RW-interaction potential the following correlation function:
[TABLE]
where $\pi$ denotes the uniform measure on $V$.
As we will see, the RW-potential is actually much simpler to analyze, but it captures less insightful information on the underlying graph structure. Further, unlike the LEP-potential, it is neither a probability nor a metric, and it does not allow one to derive a description of the macroscopic structure of the partition. In a sense, the latter is not surprising: in fact (see Lemma 1) this alternative correlation function can be expressed in terms of the sole RW Green's kernel, without the need to introduce the LEP measure. Note in particular that the uniform measure in Definition 3 corresponds to the invariant measure of the RW.
1.5. Related literature
Several properties of the forest measure associated to the loop-erased partitioning have been derived in the recent works [5, 6]. Based on these results, in [3, Prop. 6] and [4, Sect. 5.2], the authors proposed an approach making use of the loop-erased partitioning and so-called intertwining dualities to describe the evolution of local equilibria of a finite state space Markov chain.
As mentioned before, this sampling method based on LERW is originally due to Wilson [22] and shows that the measure considered herein is intimately related to the well-known Uniform Spanning Tree (UST) measure. Actually, the measure on spanning rooted forests mentioned in Section 1.2 can be seen as a generalized version of the UST measure, which is recovered in the limit $q \to 0$. Therefore the results presented in this manuscript are in line with the flourishing literature on statistical properties of the UST and LERW; see, e.g., [1, 7, 8, 9, 11, 18, 12, 14, 15, 19, 20, 21].
A detailed exact and asymptotic analysis of observables related to Wilson's algorithm on a complete graph has been pursued in [16]. The derivation of our results is in this spirit, although we deal with the additional randomness given by the presence of the killing parameter, which in turn makes the combinatorics more involved.
We further mention that in dense geometries, the UST has been studied under the perspective of the continuum random tree topology on the complete graph [1] and with respect to local weak convergence, still on the complete graph [9], and more recently on growing expanders admitting a limiting graphon [10]. These other interesting lines of investigation could also be naturally considered for the forest measure in Section 1.2, but we will not pursue these approaches in this work.
1.6. Paper overview
Our main theorems are presented in Section 2 and identify the LEP-potential in Definition 2 and its asymptotics on a complete graph, Theorem 1, and on a non-homogeneous complete graph with two communities, Theorems 2 and 3. Some consequences on the macroscopic emergent partition on these mean-field models are derived in Corollary 1. The last result, Proposition 1, concerns the asymptotic detectability related to the other 2-point correlation function in Definition 3. The concluding Sections 3 and 4 are devoted to the proofs for the complete graph and the community model, respectively.
1.7. Basic standard notation
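In what follows we will use the following standard asymptotic notation. For given positive sequences $(a_n)$ and $(b_n)$, we write:
- $a_n = o(b_n)$ if $\lim_{n \to \infty} a_n / b_n = 0$.
- $a_n = O(b_n)$ if $\limsup_{n \to \infty} a_n / b_n < \infty$.
- $a_n = \omega(b_n)$ if $\lim_{n \to \infty} a_n / b_n = \infty$.
- $a_n = \Omega(b_n)$ if $\liminf_{n \to \infty} a_n / b_n > 0$.
- $a_n = \Theta(b_n)$ if $0 < \liminf_{n \to \infty} a_n / b_n \le \limsup_{n \to \infty} a_n / b_n < \infty$.
- $a_n \sim b_n$ if $\lim_{n \to \infty} a_n / b_n = 1$.
For $a \in \mathbb{R}$ and $k \in \mathbb{N}$ we will denote by $(a)_k = a(a-1)\cdots(a-k+1)$ the descending factorial. Furthermore, we denote by $I$ the identity matrix, and by $\mathbf{1}^T$ and $\mathbf{1}$, respectively, the row and column vectors of all 1's, where the dimensions will be clear from the context. We will write $A^T$ for the transpose of a matrix $A$.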
2. Results: correlations and emerging partition on mean-field models
Our first result characterizes the LEP-potential in the absence of geometry for finite $n$, and shows that this probability is asymptotically non-degenerate at a critical scale of $q$:
Theorem 1 (Mean-field LEP-potential and limiting law).
Fix $q > 0$ and let $K_n$ be a complete graph on $n$ vertices with constant edge weight $w > 0$. Then, for all $x \neq y$,
[TABLE]
Furthermore, if , for fixed , then
[TABLE]
with being a standard Gaussian random variable.
Notice that the critical scale $\sqrt{n}$ is the typical length of a LERW path with no killing and, as can be derived from the results in [16], it is the typical length of the first branch of Wilson's algorithm on the complete graph when $q = 0$.
Our second result is the analogue of Eq. 2.1 when every vertex is still accessible from any other, but the edge weights are non-homogeneous and give rise to a community structure. In this sense we will informally refer to this network as a mean-field-community model. Formally, for given positive reals $w_1$ and $w_2$, we denote by $G$ the graph with $V = V_1 \cup V_2$, $|V_1| = |V_2| = n/2$, and $w(x, y) = w_1$ if $\{x, y\}$ is such that either $x, y \in V_1$ or $x, y \in V_2$, and $w(x, y) = w_2$ otherwise. Thus, the weight $w_1$ measures the pairwise connection intensity within the same community, while $w_2$ measures it between pairs of nodes belonging to different communities. Given the symmetry of the model, we will write $U^{\mathrm{out}}$ for the LEP-potential of Definition 2 evaluated at $x$ and $y$ in different communities. Conversely, we write $U^{\mathrm{in}}$ for the potential associated to two nodes belonging to the same community.
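The spectrum of $-\mathcal{L}$ for this two-community graph is explicit, which gives a convenient sanity check and, through the Bernoulli description of the number of blocks after Definition 1, the mean number of blocks. By direct computation, with $m = n/2$ vertices per community, in-community weight `w_in` and cross-community weight `w_out` (names chosen for the sketch): the constant vector has eigenvalue $0$; the vector equal to $+1$ on one community and $-1$ on the other has eigenvalue $2m\,w_{out}$; and vectors summing to zero inside one community and vanishing on the other have eigenvalue $m(w_{in} + w_{out})$, with total multiplicity $2m - 2$. The Python sketch below checks this numerically.

```python
import numpy as np


def two_community_weights(m, w_in, w_out):
    """Weight matrix of two communities of m vertices each (n = 2m), with
    weight w_in inside a community, w_out across, and no self-loops."""
    labels = np.repeat([0, 1], m)
    same = labels[:, None] == labels[None, :]
    W = np.where(same, w_in, w_out).astype(float)
    np.fill_diagonal(W, 0.0)
    return W


def predicted_spectrum(m, w_in, w_out):
    """Eigenvalues of -L = D - A for this graph, sorted increasingly."""
    return np.sort(np.concatenate((
        [0.0],                                   # constant vector
        [2 * m * w_out],                         # +1 on one community, -1 on the other
        np.full(2 * m - 2, m * (w_in + w_out)),  # zero-sum inside one community
    )))
```

For instance, with `W = two_community_weights(5, 2.0, 0.3)` and `L = W - np.diag(W.sum(axis=1))`, the output of `np.linalg.eigvalsh(-L)` should match `predicted_spectrum(5, 2.0, 0.3)`, i.e. $0$, $3$ and eight copies of $11.5$.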
Theorem 2 (LEP-potential for mean-field-community model).
Fix and consider a two-community graph . Let be a geometric random variable with success parameter
[TABLE]
and let be a discrete-time Markov chain with state space and transition matrix
[TABLE]
Denote by the corresponding local time in state up to time and by the corresponding path measure starting from .
For , set if , and if ,then
[TABLE]
where
[TABLE]
with, for ,
[TABLE]
and
[TABLE]
with
[TABLE]
The above theorem says that the pairwise LEP-potential can be seen as the double expectation of the function in Eq. 2.3 with respect to the geometric time and to the local time of the coarse-grained RW . As can be seen in the proof, the analysis of this model can in fact be reduced to the study of such a coarse-grained RW jumping between the two "lumped communities" up to the independent random time . The function is the crucial combinatorial term encoding, in the different parameter regimes, the most likely trajectories for such a stopped two-state macroscopic walk .
Remark 1 (Extensions to many communities of arbitrary sizes and weights).
The formula in Eq. 2.3 can be derived also for the general model with an arbitrary number of communities of variable compatible sizes and arbitrary weights within and among communities. The corresponding statement and proof are more involved, but they follow exactly the same scheme as this equal-size two-community case captured in the above theorem. We refer the reader interested in such an extension to [17].
The next theorem gives the limit of the LEP-potential computed in Theorem 2; the resulting scenario is summarized in the phase diagram in Fig. 1.
Theorem 3 (Detectability and phase diagram for two communities).
Under the assumptions of Theorem 2, set , and for some . Then:
- (a)
if , and .
- (b)
if , and .
- (c)
if , and .
- (d)
if ,
- (e)
if , , .
- (f)
if ,
Remark 2 (Anticommunities for negative ).
The above theorem is stated for arbitrary and . We notice that while for we are back to the complete graph with constant weight 1, for , it would be more appropriate to speak about "anticommunities" rather than communities. In fact, in this case, at every step the RW prefers to change community rather than stay in its original one. Thus, it is somewhat artificial to see what the loop-erased partitioning captures. This is the reason why the plot in Fig. 1 is restricted to . However, the theorem still remains valid for negative and, not surprisingly, the difference between the in and out potentials turns out to be zero.
The next statement collects some simple consequences, deduced from these two-point LEP-potential, on the macroscopic structure of . We recall that stands for the number of blocks in the random partition .
Corollary 1 (Macroscopic emergent structure).
Under the assumption of Theorem 3, the following scenarios hold true. If , there exists depending only on and s.t.
[TABLE]
Moreover:
- (a)
if then whp there are two blocks of linear size s.t. each block has a fraction of vertices from the same community.
- (b)
if then whp there are two blocks of size s.t. each block has a fraction of vertices from the same community.
- (c)
if then whp there is at least one block of linear size.
- (d)
if then whp there is one block of size .
- (e)
if then whp there is at least one block of linear size.
- (f)
if then whp blocks of linear size do not exist.
Theorem 3 says that the LEP-potential contains sufficient information to detect the underlying communities in a parametric region where the ratio of the out and in weights is bigger than . This suggests that estimating the probabilities in Definition 2 could be a valuable method to design a community detection algorithm for well-separated regions. Nonetheless, there can be other observables associated to which perform better, meaning, e.g., that they can be used for detection beyond regions (a)-(c) in Fig. 1. However, it is not the scope of this paper to explore the practical applications and implications of this loop-erased partitioning in the context of community detection. For this reason we will omit complexity and other algorithmic considerations. As already mentioned, our main goal is rather to start understanding analytically the measure and its emergent structure.
Our last result, Proposition 1, is the analogue of Theorem 3 for the RW-potential in Definition 3 and shows that this other potential gives essentially no insight on the emergent partition and very little can be detected from it. To state the result, we first give in the next lemma a characterization of the RW-potential which reveals that in reality this other 2-body interaction is determined only by the RW flow in the graph rather than by the LEP-measure.
Lemma 1 (RW-potential independent of LEP structure).
For any arbitrary graph on vertices, the pairwise correlation function in Definition 3 admits the following representation:
[TABLE]
where
[TABLE]
is, up to a constant factor, the Green's kernel of the RW stopped at an independent exponentially distributed time $T_q$ with rate $q$.
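The display above is not legible in this copy, so as a hedged illustration we only recall a standard identity behind such kernels: for the RW with generator $\mathcal{L} = A - D$ and an independent $T_q \sim \mathrm{Exp}(q)$, one has $\mathbb{P}_x(X(T_q) = y) = q\,(qI - \mathcal{L})^{-1}(x, y)$, where $(qI - \mathcal{L})^{-1}(x, y)$ is the expected time spent at $y$ before $T_q$. The sketch below (illustrative names) computes this matrix and compares one of its rows with simulated killing positions of the jump chain.

```python
import random
from bisect import bisect_right
from itertools import accumulate

import numpy as np


def killed_position_kernel(W, q):
    """K[x, y] = P_x(X(T_q) = y) = q * (qI - L)^{-1}[x, y], with L = A - D."""
    W = np.asarray(W, dtype=float)
    L = W - np.diag(W.sum(axis=1))
    return q * np.linalg.inv(q * np.eye(len(W)) - L)


def empirical_killed_position(W, q, x, samples, seed=0):
    """Frequencies of the vertex occupied by the RW from x when T_q rings."""
    rng = random.Random(seed)
    cum = [list(accumulate(row)) for row in W]
    counts = [0] * len(W)
    for _ in range(samples):
        u = x
        while True:
            r = rng.random() * (q + cum[u][-1])
            if r < q:  # killed while sitting at u
                break
            u = bisect_right(cum[u], r - q)  # jump w.p. w(u, v) / (q + d(u))
        counts[u] += 1
    return [c / samples for c in counts]
```

Rows of the returned matrix sum to one, and the matrix is symmetric because $\mathcal{L}$ is; both facts are cheap checks on an implementation.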
We can now state the detectability captured by this RW-potential in the mean-field-community model. As for the LEP-potential, we adapt the notation to distinguish between pairs within the same community or not.
Proposition 1 (Detectability via RW-potential).
Consider the two-community graph with , and . Then, if and
[TABLE]
On the other hand:
[TABLE]
As anticipated, this last statement shows that this RW-potential is less informative than the LEP one. In particular, the detectable parametric region is narrower and corresponds to the triangle for in the detectable region depicted in Fig. 1.
3. Proof of Theorem 1: homogeneous complete graph
Proof of Eq. 2.1
For convenience, we consider a discretization of the continuous time Markov process with generator
[TABLE]
Set , so that and the associated transition matrix is given by
[TABLE]
If we consider the killing as an absorbing state within the state space of the Markov chain extended from to , denoting this absorbing state, we get the adjacency matrix
[TABLE]
and generator
[TABLE]
We can then normalize it by setting
[TABLE]
and get a discrete RW with transition matrix given by
[TABLE]
where
[TABLE]
It should be clear that a sample of a LE-path starting at a given vertex can be obtained as the output of the following procedure:
- With probability the discrete process reaches the absorbing state. In particular we set for a geometric random variable of parameter .
- With probability the LERW moves according to the law , where is the last reached node.
- We call the vertices covered by the LE-path up to time . Then, if at time the transition takes place and the vertex , then . Conditioning on , the latter event occurs with probability . Conversely, if , then we remove from all the vertices that have been visited by the LERW since its last visit to . As a consequence the quantity reduces. One can then compute that the reductions occur with law
[TABLE]
It is easier to look at the quantity by using the following metaphor. We interpret as the height from which a bear falls down while moving on a stair of height . In particular, we will assume that
- The bear starts with probability 1 from the first step.
- At each time the bear selects a step of the stair uniformly at random, including the step he currently stands on.
- If the choice made by the bear is a lower step (or the current one), he moves to that step.
- If he chooses an upper step, then he climbs up by a single step.
- Before each move, there is a probability as in Eq. 3.7 that the bear "falls down".
Let us next fix , that is, , so that we can study the bear's dynamics independently of his falling. By setting for the position of the bear at time , we get
[TABLE]
The latter implies that at time we reached the ergodic measure over the first steps of the stair, while at time the probability measure is exactly the ergodic one.
It is interesting to notice that an easier expression can be written for the cumulative distribution of the variable , i.e.
[TABLE]
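The bear chain without falling can be studied exactly on a small stair. The sketch below assumes, since the height is not legible in this copy, a stair with $n$ steps, and builds the transition matrix described by the bullet points above (from height $h$: move to any $j \le h$ with probability $1/n$ each, otherwise climb to $h + 1$). A short computation shows that the "birthday" law $\pi(h) = \frac{h}{n}\prod_{i=1}^{h-1}\bigl(1 - \frac{i}{n}\bigr)$ is stationary for this chain and that, for the height $H_t$ after $t$ steps started from the first step, $\mathbb{P}(H_t \ge h) = \prod_{i=1}^{h-1}(1 - i/n)$ for $h \le t + 1$. The code propagates the law step by step so that both statements can be checked numerically; function names are ours.

```python
import numpy as np


def bear_transition_matrix(n):
    """Transition matrix on heights 1..n (row/column h - 1) of the bear chain
    without falling: from h, pick j uniformly in {1, ..., n}; move to j if
    j <= h, otherwise climb one step to h + 1."""
    P = np.zeros((n, n))
    for h in range(1, n + 1):
        P[h - 1, :h] = 1.0 / n  # chosen step at or below the current one
        if h < n:
            P[h - 1, h] = (n - h) / n  # chosen step above: climb by one
    return P


def birthday_law(n):
    """pi(h) = (h / n) * prod_{i=1}^{h-1} (1 - i / n), for h = 1, ..., n."""
    pi = np.empty(n)
    tail = 1.0  # P(H >= h)
    for h in range(1, n + 1):
        pi[h - 1] = tail * h / n
        tail *= 1 - h / n
    return pi


def law_after(n, t):
    """Exact law of the height after t steps, starting from the first step."""
    P = bear_transition_matrix(n)
    dist = np.zeros(n)
    dist[0] = 1.0
    for _ in range(t):
        dist = dist @ P
    return dist
```

For every $t < n$, `law_after(n, t)[:t]` should coincide with `birthday_law(n)[:t]`, and from $t = n - 1$ on the two laws agree entirely. Since $\log \prod_{i<h}(1 - i/n) \approx -h^2/(2n)$, the tail $\mathbb{P}(H \ge x\sqrt{n})$ is close to $e^{-x^2/2}$, one way to see the $\sqrt{n}$ length scale of the loop-erased path on the complete graph.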
Next, calling the time immediately before the bear falls, we get
[TABLE]
which gives us the distribution of the last step of the bear before he falls. Recall that this is the same as the length of the original LERW starting on , when the walk is stopped at an exponential time of rate . Hence, we are now left to compute the probability that another walker, starting on , is killed before it hits the previously sampled LERW.
Thanks to the bear metaphor, for the size of the LE-trajectory we get:
[TABLE]
and by explicit computation, setting for the first hitting time of the LE-path ,
[TABLE]
□
Proof of Eq. 2.2
Let
[TABLE]
and notice that if , with , then
[TABLE]
Call
[TABLE]
in order to rewrite
[TABLE]
and notice that the first term in the latter sum is the probability that the geometric random variable assumes value . Moreover it trivially holds that
[TABLE]
Hence,
[TABLE]
Let us approximate to first order as follows
[TABLE]
Next, set and , notice that and that
[TABLE]
since converges in distribution to as diverges. In view of the latter together with Eq. 3.22, we can estimate
[TABLE]
where the last inequality holds true by choosing any which in particular guarantees that . □
4. Proofs for mean-field-communities
4.1. Proof of Theorem 2
We use here the same line of argument used in the proof of Theorem 1. We will consider the process having state space , where
[TABLE]
and generator
[TABLE]
We will specialize later on the case .
We now consider a killed LERW , and we denote by the set of points of the -th community belonging to , i.e.,
[TABLE]
We can write
[TABLE]
and we assume, without loss of generality, that ; then, by conditioning, we get for with ,
[TABLE]
being the hitting time of .
The LERW starting from
A result due to Marchal [13] provides the following explicit expression for the probability of a loop erased trajectory:
[TABLE]
By looking closely at the latter formula we distinguish two parts: a product over the weights of the edges of the path and an algebraic part containing the ratio of two determinants, which encodes the "loop-erased" feature of the process. In particular we notice that the former contains all the details about the trajectory, while the latter only depends on the number of points visited in each community. Let (respectively, ) be the number of jumps from the first community to the second (from the second to the first, respectively) along the LE-path. We have
[TABLE]
where
- The binomial coefficients account for the possible choices of the points in (one of those must be ) among the possible points of the first community (except ). In the second community we can choose any vertices among the possible vertices of the second community (except ).
- The factorials account for the possible orderings of the nodes covered in each community. Notice that the path on the first community must start at .
- We sum over all the possible numbers of jumps from the first community to the second, , and from the second to the first, (notice that if , then must be equal to or one smaller than ).
- For any choice counted by the product of the previous three terms, we have a path whose probability is given by Marchal's formula.
In the case in which we condition on having both and in the same (first, say) community we have
[TABLE]
Namely, only the first combinatorial term changes.
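Marchal's formula (Eq. 4.5) is easy to test numerically on small graphs. The sketch below uses the form we take to be standard for a chain with a cemetery state (an assumption, since Eq. 4.5 is not legible in this copy): for the jump chain $P(u,v) = w(u,v)/(q+d(u))$, with killing probabilities $\kappa(u) = q/(q+d(u))$ and Green's function $\mathcal{G} = (I - P)^{-1}$, the killed LERW from $x_0$ equals $(x_0, \dots, x_k)$ with probability $\prod_{i<k} P(x_i, x_{i+1})\, \kappa(x_k) \det \mathcal{G}|_{\{x_0, \dots, x_k\}}$. By Jacobi's complementary-minor identity that determinant is a ratio of two determinants of $I - P$, matching the structure described above. Summing over all self-avoiding paths must give one; helper names are ours.

```python
import itertools

import numpy as np


def killed_jump_chain(W, q):
    """Sub-stochastic jump matrix P[u, v] = w(u, v) / (q + d(u)) and killing
    probabilities kappa[u] = q / (q + d(u)) of the RW killed at rate q."""
    W = np.asarray(W, dtype=float)
    denom = q + W.sum(axis=1)
    return W / denom[:, None], q / denom


def marchal_probability(W, q, path):
    """Probability that the killed LERW from path[0] equals `path`:
    prod_i P(x_i, x_{i+1}) * kappa(x_k) * det(G restricted to the path)."""
    P, kappa = killed_jump_chain(W, q)
    G = np.linalg.inv(np.eye(len(P)) - P)  # Green's function of the killed chain
    prob = kappa[path[-1]]  # the walk is killed at the last vertex
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob * np.linalg.det(G[np.ix_(path, path)])


def self_avoiding_paths(n, start):
    """All sequences of distinct vertices of {0, ..., n-1} beginning at start
    (sequences using non-edges simply get probability zero)."""
    others = [v for v in range(n) if v != start]
    for k in range(len(others) + 1):
        for tail in itertools.permutations(others, k):
            yield [start, *tail]
```

On the unweighted $K_3$ with $q = 1$, this gives probability $1/2$ to the one-point path $(x)$ and $3/16$ to each two-point path, values that can also be obtained by hand from a first-step analysis of the loop-erased walk.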
The ratio of determinants
In our mean-field setup, the terms in Eq. 4.6 and Eq. 4.7 coming from  Eq. 4.5 can be explicitly computed. We consider here the two communities case, i.e. , where the communities possibly have different sizes, and . Now, consider the matrix obtained by erasing () rows and corresponding columns in the first community (the second one, respectively) in . We are left with a square matrix made of two square blocks on the diagonal of size (respectively ). We will denote this matrix by
[TABLE]
where the elements on the diagonal are given by
[TABLE]
We want to find solutions of the problem
[TABLE]
First we consider eigenvectors of the form , where the upper component has length and the lower one has length . If we write explicitly Eq. 4.10 we get the following linear system:
[TABLE]
from which we get two eigenvalues, which we will refer to as and .
Then we consider ; with this choice we are left with the system
[TABLE]
and we have to find eigenvalues that are associated with eigenvectors orthogonal to constants. By direct computation, has eigenvalue with multiplicity . With the opposite choice, namely , we get
[TABLE]
Namely, there is an eigenvalue with multiplicity . So the spectrum of is
[TABLE]
with multiplicity denoted by :
[TABLE]
Therefore, we can see that the ratio of determinants in Eq. 4.6 and Eq. 4.7 can be written explicitly. Indeed, at the denominator we have
[TABLE]
while at the numerator we are left with
[TABLE]
where
[TABLE]
while and are the two solutions of the system in Eq. 4.11. In particular, if we specialize to the case we can conclude that the ratio of determinants is given by
[TABLE]
where we defined
[TABLE]
and
[TABLE]
The path starting from
Now we have to consider the second path, starting from , which decides the root to which will be connected in the forest generated by the algorithm. The latter corresponds to the second factor in Eq. 4.4. Notice that it is sufficient to consider such a path in the simpler fashion, i.e., without erasing the loops, since we are only concerned with the absorption of the walker: either in or killed at rate . Moreover, we can exploit again the symmetry of the model to reduce it to a Markov chain with state space corresponding to the sets , where is again the absorbing state, i.e., the "state-independent" exponential killing. We will assume that
[TABLE]
Hence, the transition matrix we are interested in is given by
[TABLE]
where
[TABLE]
[TABLE]
with
[TABLE]
The states represent:
- ()
nodes of the community that have not been covered by the LE-path started at .
- ()
nodes of the community that have not been covered by the LE-path started at .
- ()
nodes of both communities that have been covered by the LE-path started at .
- ()
the absorbing state .
Called the hitting time of the absorbing set , we want to compute the probability that the process is absorbed in the state, and not in . In terms of our original process, this means that the process is killed before the hitting of the LE-path starting at . By direct computation
[TABLE]
notice that the first component of the vector corresponds to the intra-community case for some , i.e., , while the second one to the inter-community case, namely .
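On small graphs the same absorption probability can be computed without lumping, directly on the vertex set. The sketch below (illustrative names; the killed RW is simulated through its jump chain, killed at $u$ with probability $\kappa(u) = q/(q+d(u))$ and jumping to $v$ with probability $w(u,v)/(q+d(u))$) solves the linear system $h = \kappa + P h$ off a vertex set $S$, with $h = 0$ on $S$, so that $h(y)$ is the probability that the RW from $y$ is killed before hitting $S$. Taking $S$ to be the LE-path started at $x$ gives the second factor in Eq. 4.4.

```python
import numpy as np


def killed_before_hitting(W, q, S):
    """h[y] = P_y(T_q < tau_S) for every vertex y (h = 0 on S), for the RW
    with Laplacian L = A - D killed at rate q.  Solves h = kappa + P h off S."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    denom = q + W.sum(axis=1)
    P, kappa = W / denom[:, None], q / denom
    S = set(S)
    free = [v for v in range(n) if v not in S]
    h = np.zeros(n)
    if free:  # nothing to solve when S covers every vertex
        A = np.eye(len(free)) - P[np.ix_(free, free)]
        h[free] = np.linalg.solve(A, kappa[free])
    return h
```

On the unweighted $K_3$ with $q = 1$, a walker started off $S$ is killed before hitting $S$ with probability $1/2$ when $|S| = 1$ and $1/3$ when $|S| = 2$, as a one-line first-step analysis confirms.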
If we now use the assumption that , the steps above allow us to write the following formulas
[TABLE]
[TABLE]
where
[TABLE]
as in Eq. 4.19 and
[TABLE]
By direct computation we see that
[TABLE]
where
[TABLE]
Local time interpretation
Now consider the part of the formula concerning the jumps among the two communities of the killed-LE-path starting at , i.e.
[TABLE]
The latter can be thought of as a function of a Markov Chain on the state space , with transition matrix
[TABLE]
where the -th state stands for the -th community. Indeed, we can rewrite Eq. 4.32 as
[TABLE]
[TABLE]
with being the local time as in the statement of Theorem 2.
Geometric smoothing
From the previous steps we get the following expression
[TABLE]
Next, we would like to make a geometric term appear, as in the complete and uniform case of Theorem 1. Notice that multiplying and dividing by one obtains
[TABLE]
we can then define
[TABLE]
in order to obtain
[TABLE]
and
[TABLE]
where is an independent random variable with law .
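In generic form, the “multiply and divide” step rests on the identity Σ_{k≥0} r^k a_k = (1 − r)^{-1} E[a_G], where G is geometric with P(G = k) = (1 − r) r^k for k ≥ 0, valid for 0 ≤ r < 1 and bounded (a_k). The Python sketch below checks it for the hypothetical choice a_k = 1/(k + 1), for which the left-hand side equals −log(1 − r)/r; it does not reproduce the specific ratio used in the text.

```python
import numpy as np

def series(r, a):
    """Partial sum of r**k * a(k) over k = 0, ..., 5000."""
    k = np.arange(5001)
    return float(np.sum(r ** k * a(k)))

def via_geometric(r, a, n_samples, rng):
    """(1 - r)^{-1} times a Monte Carlo estimate of E[a(G)], G geometric on {0, 1, ...}."""
    # numpy's geometric law lives on {1, 2, ...}; subtract 1 to start at 0
    G = rng.geometric(1.0 - r, size=n_samples) - 1
    return float(np.mean(a(G))) / (1.0 - r)

a = lambda k: 1.0 / (k + 1.0)   # hypothetical bounded sequence
r = 0.7
exact = -np.log(1.0 - r) / r
rng = np.random.default_rng(2)
print(exact, series(r, a), via_geometric(r, a, 200_000, rng))
```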
Conclusions
One can ideally divide the formulas in Eqs. 4.38 and 4.39 into five terms, namely:
- (1) The entropic term
[TABLE]
which was already present in the complete and uniform case, Eq. 2.1. Indeed,
[TABLE]
- (2) The term related to the spectrum of the 2 × 2 matrix presented in Eq. 4.11, i.e.,
[TABLE]
which is the same in both the in- and out-community cases. It can be rewritten as the ratio between two parabolas in , i.e.,
[TABLE]
- (3) The term related to the geometric random variable of parameter , which was also present in the case of the uniform graph, Eq. 2.1.
- (4) The term related to the local times of the two-state Markov chain , in Eq. 4.33.
- (5) The term related to the absorption probability, i.e., to the quantity , see Eq. 4.25, as a function of the process presented in Eq. 4.21.
It is worth noticing that the above is slightly different from the in the statement of Theorem 2, which contains the extra factor . At this point, by setting
[TABLE]
[TABLE]
we can write
[TABLE]
and
[TABLE]
which is equivalent to the statement in Theorem 2. ∎
4.2. Proof of Theorem 3
**Proofs of (a) and (b) (detectability)**
As the following lemma shows, in this regime the RW is confined to its starting community for its entire lifetime.
**Lemma 2 (RW is confined to its community until it dies).**
Let and for , consider the event
[TABLE]
where is the first time in which the RW moves out of the community in which lies.
Then, as ,
[TABLE]
Proof.
Let be a r.v. that can assume values in the set with probabilities:
[TABLE]
[TABLE]
Let be a sequence of i.i.d. r.v.s with the same law as , and notice that
[TABLE]
Therefore
[TABLE]
from which the claim follows. ∎
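The proof of Lemma 2 is a race argument: at each step the walker independently stays in its community, leaves it, or is killed. With hypothetical per-step probabilities p_leave and p_kill (our notation, not the paper's parametrization), the probability of being killed before the first change of community is p_kill/(p_kill + p_leave), which tends to 1 as soon as p_leave/p_kill → 0. The following Python sketch checks this by simulation.

```python
import numpy as np

def killed_before_leaving(p_leave, p_kill):
    """Exact probability that killing occurs before the first community change."""
    return p_kill / (p_kill + p_leave)

def killed_before_leaving_mc(p_leave, p_kill, n_runs, rng):
    """Monte Carlo version: at each step the walker is killed (prob. p_kill),
    leaves its community (prob. p_leave), or stays inside it, independently."""
    killed = 0
    for _ in range(n_runs):
        while True:
            u = rng.random()
            if u < p_kill:                 # killed while still inside
                killed += 1
                break
            if u < p_kill + p_leave:       # left the starting community first
                break
    return killed / n_runs

rng = np.random.default_rng(4)
print(killed_before_leaving(0.01, 0.05), killed_before_leaving_mc(0.01, 0.05, 20_000, rng))
# As p_leave / p_kill -> 0, the walker stays in its community up to dying.
print([killed_before_leaving(0.05 / m, 0.05) for m in (1, 10, 100)])
```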
In view of the decomposition in Eq. 1.4 and the above lemma, we can write for any
[TABLE]
Let us first consider . In this case, by Lemma 2, for any and uniformly in , we have that
[TABLE]
As a consequence , and by plugging this estimate in Eq. 4.48, we get .
Concerning , notice that, for every LERW starting from and ending at the absorbing state, we can consider the event
[TABLE]
Once more, uniformly in , we get by Lemma 2 that
[TABLE]
Thus, for , by Eq. 4.48, we can estimate
[TABLE]
Notice that, under such conditioning, the sum can be read as the probability that two vertices in a complete graph with vertices end up in two different trees. Therefore, this reduces to Eq. 2.2, which in turn gives for and otherwise. ∎
**Proof of (f) (high killing region)**
We only show that ; this suffices since, e.g., by direct computation one can check that .
Observe first that, since , the length of the loop-erased path must be “small” with high probability. In particular, we can bound
[TABLE]
hence
[TABLE]
∎
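The bound above only uses that loop erasure cannot lengthen a path, so the LE-path has at most as many steps as the killed walk that generates it. For concreteness, here is a minimal Python sketch of chronological loop erasure (the standard construction; the example paths are hypothetical):

```python
import random

def loop_erase(path):
    """Chronological loop erasure: each loop is erased as soon as it closes."""
    erased = []
    position = {}                      # vertex -> its index in `erased`
    for v in path:
        if v in position:              # a loop closes at v
            cut = position[v]
            for w in erased[cut + 1:]:
                del position[w]
            erased = erased[:cut + 1]  # keep the path up to the first visit of v
        else:
            position[v] = len(erased)
            erased.append(v)
    return erased

print(loop_erase([0, 1, 2, 1, 3, 0, 4]))   # the loops 1-2-1 and 0-1-3-0 are erased

# Random walk on a 5-cycle: the loop-erased path is never longer than the walk.
rng = random.Random(0)
walk = [0]
for _ in range(30):
    walk.append((walk[-1] + rng.choice((-1, 1))) % 5)
print(len(walk), len(loop_erase(walk)))
```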
We next prove the remaining items of Theorem 3, for which we implement a common strategy that we now explain. In all remaining regimes we need to show that either vanishes or stays bounded away from zero. To this aim, we will use the representation in Eq. 2.3.
Depending on the parameter regime, we will split the sum over into different pieces, to be treated according to the asymptotic behavior of the involved factors. To simplify the exposition, we restrict ourselves in what follows to the positive quadrant . We stress, however, that, as the reader can check, the following estimates hold true, and actually converge faster, even outside of the positive quadrant.
Let us start with a few observations. Notice that for every choice of ; moreover, if . Furthermore, for each ,
[TABLE]
and, when estimating the involved factors, the behavior of the product will be crucial. In general, we can observe the following facts about it.
- (A) For any , if , then it follows from Eq. 3.23 that decays to zero, uniformly in , faster than any polynomial as . For such 's, since is polynomially bounded (uniformly in ), the contribution of such terms to Eq. 2.3 can be neglected.
- (B) Whenever we consider 's for which , because of Eq. 4.49 and the uniform control on , the contribution of such terms to Eq. 2.3 can also be neglected.
- (C) For 's for which neither Item A nor Item B holds, we estimate the asymptotics of this part of the sum by controlling the mass of the geometric time against and, in the most delicate cases (on the separation lines in Fig. 1), by also taking into account the behavior of the local time.
We are now ready to treat the remaining parameter regimes using such facts.
**Proof of (d) (changing communities before dying)**
In this regime, the overall picture resembles the phenomenology of the complete graph. In particular, the RW manages to change community before being killed and, up to the killing time scale, forgets its starting community. Moreover, with high probability a single tree of size is formed, so that, given any two points , they end up in the same tree with high probability, independently of their communities.
To prove the claim notice that, uniformly in ,
[TABLE]
As a consequence the asymptotics of will be independent of . To show that such a limit is zero we argue as follows. Within this parameter region:
[TABLE]
which together with Eq. 4.50 leads to
[TABLE]
We can now plug in this asymptotic representation of in Eq. 2.3, and separately treat the four resulting terms.
For the first term, namely the sum in Eq. 2.3 with in place of , we split the sum over into two parts at , for small , and show that both go to zero, using Item C and Item B, respectively. Indeed, with this “cut” we see that:
[TABLE]
Analogously, for the second term we split the sum over into two parts at , with small . Using Item C for the first part and Item A for the second one, we see that
[TABLE]
For the third term we need to split the corresponding sum into three parts at and , which will be controlled by Item B, Item C and Item A, respectively. That is
[TABLE]
Finally, for the last term, we split the sum at . Indeed we see that: on the one hand, for , we can use Item C since
[TABLE]
On the other hand, for , we can argue as in Item A. Hence,
[TABLE]
∎
**Proofs of (c) and (e) (high-entropy separating lines)**
We start by proving (c), i.e.
[TABLE]
Let us start by noting that, under our assumptions on and , we have that
[TABLE]
and
[TABLE]
We are going to split the sum over in Eq. 2.3 in three parts:
- . For such 's, the product is of order . Hence we can neglect this part by using Item C together with the estimate
[TABLE]
- . This part can also be neglected thanks to the argument of Item A.
- . This is the delicate non-vanishing part. We start by noticing that, due to Eq. 4.67 and Eq. 4.68, the leading term in does not involve , so that “at first order” must equal . In order to show that the latter two are asymptotically bounded away from zero, we fix and consider
[TABLE]
Moreover, thanks to Eq. 4.71, we can easily deduce that the limit is strictly smaller than .
We conclude by proving (e), i.e., we show that
[TABLE]
Observe that, under our assumptions on and , we have that
[TABLE]
and
[TABLE]
hence, their product behaves asymptotically as
[TABLE]
To evaluate the asymptotic behavior of , we split the sum over in Eq. 2.3 in three pieces:
- : here, thanks to Eq. 4.75, we know that . Arguing as in Item C, we obtain
[TABLE]
- : in this case we can argue as in Item A.
- : in this case we have to distinguish between and .
Consider first . We call the following event concerning the Markov chain
[TABLE]
Notice that, if , then the event occurs with high probability. Hence, for any choice of and , we can write
[TABLE]
with being the Kronecker delta. Hence
[TABLE]
Concerning , it is easy to get a lower bound via a soft argument by considering the events
[TABLE]
[TABLE]
Indeed,
[TABLE]
Finally, we are left to show that is asymptotically bounded away from . We consider the further split
[TABLE]
Focusing on the first sum in the latter display, thanks to Eq. 4.75, we have that
[TABLE]
Concerning the second sum, we have
[TABLE]
∎
4.3. Proof of Corollary 1
Let be the eigenvalues of . As shown in [5, Prop. 2.1], the number of blocks of the induced partition, , is distributed as the sum of independent Bernoulli random variables with success probabilities . That is,
[TABLE]
In case of the two-communities model we have
[TABLE]
Therefore
[TABLE]
where
[TABLE]
Hence
[TABLE]
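Under [5, Prop. 2.1], the mean and variance of the number of blocks are Σ_i p_i and Σ_i p_i(1 − p_i). The Python sketch below assumes, as in [5], success probabilities p_i = q/(q + λ_i), with λ_i the eigenvalues of the graph Laplacian L = D − W taken with the sign convention that makes them nonnegative; the two-community weights (n vertices per community, weight w_in inside and w_out across) are a hypothetical stand-in for the paper's parametrization.

```python
import numpy as np

def two_community_laplacian(n, w_in, w_out):
    """L = D - W for 2n vertices: weight w_in inside communities, w_out across."""
    W = np.full((2 * n, 2 * n), float(w_out))
    W[:n, :n] = w_in
    W[n:, n:] = w_in
    np.fill_diagonal(W, 0.0)                  # no self-loops
    return np.diag(W.sum(axis=1)) - W

def block_count_moments(L, q):
    """Mean and variance of a sum of independent Bernoulli(q / (q + lambda_i))."""
    lam = np.linalg.eigvalsh(L)               # real, nonnegative up to rounding
    p = q / (q + lam)
    return float(p.sum()), float((p * (1 - p)).sum())

for w_out in (1.0, 0.1, 0.001):
    L = two_community_laplacian(5, 1.0, w_out)
    print(w_out, block_count_moments(L, q=2.0))
```

For this toy graph the spectrum is 0, 2n·w_out and n(w_in + w_out) (the latter with multiplicity 2n − 2); as w_out decreases, the eigenvalue 2n·w_out carried by the community structure approaches 0, so its Bernoulli variable contributes one extra block almost surely.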
Moreover, we can prove the concentration result claimed in the first part of the statement by using the multiplicative version of the Chernoff bound on the sum of the 's. Indeed, denoting by
[TABLE]
we have that
[TABLE]
and since
[TABLE]
we can deduce the concentration of .
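For reference, the multiplicative Chernoff bound for a sum X of independent Bernoulli variables with mean μ gives, for 0 < δ < 1, P(|X − μ| ≥ δμ) ≤ 2 exp(−δ²μ/3). The Python sketch below compares this bound with the exact two-sided tail, computed by convolving the Bernoulli laws one at a time; the success probabilities are hypothetical placeholders for those of Corollary 1.

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Law of a sum of independent Bernoulli(p_i), by successive convolution."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

def exact_two_sided_tail(p, delta):
    """Return P(|X - mu| >= delta * mu) and mu, for X the Bernoulli sum."""
    pmf = poisson_binomial_pmf(p)
    mu = float(np.sum(p))
    k = np.arange(len(pmf))
    return float(pmf[np.abs(k - mu) >= delta * mu].sum()), mu

def chernoff_two_sided(mu, delta):
    """Multiplicative Chernoff bound, valid for 0 < delta < 1."""
    return 2.0 * np.exp(-delta ** 2 * mu / 3.0)

p = np.linspace(0.05, 0.6, 200)               # hypothetical success probabilities
for delta in (0.1, 0.2, 0.5):
    tail, mu = exact_two_sided_tail(p, delta)
    print(delta, tail, chernoff_two_sided(mu, delta))
```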
Notice also that the second part of the statement is a trivial consequence of the detectability result of Theorem 3. ∎
4.4. Proof of Lemma 1
In this proof we will consider the probability measure on the space of rooted spanning forests studied in [5], namely,
[TABLE]
where we denoted by the set of roots of . As mentioned in Section 1.2, we stress that the measure in Definition 1 can be obtained by projecting this forest measure on the set of partitions.
Denote by the σ-field generated by the block structure of the random forest . By [5, Proposition 6.4], we have
[TABLE]
Now we notice that by Definition 3 and the tower property,
[TABLE]
We can now invoke [5, Theorem 3.4], stating that the set of roots is a determinantal process with kernel . As a consequence we obtain that
[TABLE]
and the claim readily follows. ∎
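The forest measure above can be sampled with Wilson's algorithm run on the graph extended by an absorbing vertex connected to every vertex with weight q: each loop-erased random walk runs until it is killed (steps into the absorbing vertex) or hits the forest built so far, and the vertices attached directly to the absorbing vertex are the roots. The Python sketch below follows this standard construction for a weighted adjacency matrix W; the function name and the marker `ROOT` are our own. As a sanity check, it compares the empirical frequency with which a vertex is a root against the diagonal of K_q = q(q + L)^{-1}, L = D − W, which is what the determinantal property from [5, Theorem 3.4] predicts under this sign convention for the Laplacian.

```python
import numpy as np

ROOT = -1   # parent marker: "attached to the absorbing vertex", i.e. a root

def wilson_forest(W, q, rng):
    """Sample a rooted spanning forest; returns parent pointers, ROOT at roots."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    deg = W.sum(axis=1)
    in_forest = np.zeros(n, dtype=bool)
    parent = np.full(n, ROOT)
    nxt = np.full(n, ROOT)
    for start in range(n):
        # Walk until killed or until the current forest is hit; overwriting
        # nxt[x] at every visit keeps only the last exit, i.e. erases loops.
        x = start
        while not in_forest[x]:
            if rng.random() < q / (q + deg[x]):
                nxt[x] = ROOT                  # killed at rate q
                break
            nxt[x] = rng.choice(n, p=W[x] / deg[x])
            x = nxt[x]
        # Retrace the loop-erased path from `start` and add it to the forest.
        x = start
        while not in_forest[x]:
            in_forest[x] = True
            parent[x] = nxt[x]
            if nxt[x] == ROOT:
                break
            x = nxt[x]
    return parent

W6 = np.ones((6, 6)) - np.eye(6)               # complete graph K_6
q = 2.0
L6 = np.diag(W6.sum(axis=1)) - W6
K_diag = np.diag(q * np.linalg.inv(q * np.eye(6) + L6))
rng = np.random.default_rng(6)
samples = [wilson_forest(W6, q, rng) for _ in range(3000)]
root_freq = np.mean([s == ROOT for s in samples], axis=0)
print(K_diag, root_freq)
```

The blocks of the partition are then the trees of the sampled forest, i.e. the sets of vertices sharing the same root when parent pointers are followed.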
4.5. Proof of Proposition 1
We consider here the discrete-time version of the process as presented in Theorem 1, see (3.6). As a warm-up, we start by computing the potential in the complete graph with unitary weights. In this case,
[TABLE]
where
[TABLE]
Therefore,
[TABLE]
From which:
[TABLE]
Thus, in order to have a non-degenerate potential on , we need to take . ∎
We next move to the mean-field-community model with , and arbitrary . The corresponding discrete-time RW is killed at an independent geometric time with
[TABLE]
Denoting by the random variable that counts the number of times, up to time , in which this random walk changes community, we notice that:
[TABLE]
that is, conditionally on , has a binomial distribution with success parameter
[TABLE]
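Since the walker ends in its starting community exactly when it has changed community an even number of times, the binomial law above yields, after n steps and with success parameter p, the probability (1 + (1 − 2p)^n)/2 of being in the starting community. This is a standard binomial parity identity rather than a formula quoted from the paper; the Python sketch below checks it numerically for arbitrary illustrative values of n and p.

```python
from math import comb

def prob_same_community(n, p):
    """P(an even number of community changes in n steps), summing the binomial law."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(0, n + 1, 2))

def prob_same_community_closed(n, p):
    """Closed form of the same probability: (1 + (1 - 2p)^n) / 2."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** n)

for n in (1, 5, 50):
    print(n, prob_same_community(n, 0.1), prob_same_community_closed(n, 0.1))
```

The closed form makes visible that the memory of the starting community decays like (1 − 2p)^n, so it is lost once the number of steps is large compared with 1/p.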
We are now in a position to compute the probability that is absorbed in some . Without loss of generality we assume , so that and determine the and potentials, respectively.
Thus:
[TABLE]
where the last identity is due to the fact that the sum in Eq. 4.95 is a probability and hence bounded above by .
(high killing) When , with , , thus the term in Eq. 4.96 is negligible, and . In particular, the potential diverges as or depending on or , respectively.
(order one killing) In the regime , the term in Eq. 4.96 is no longer negligible and needs to be analyzed further. Let us first consider the sub-regime .
Notice that, when ,
[TABLE]
Clearly, implies that , while if , then . Hence, if , then
[TABLE]
while, for :
[TABLE]
where in Eqs. 4.98 and 4.99 we used the fact that, in order to compute the first order, it is sufficient to restrict the sum over to the values on the scale . By Eq. 4.95 and the above estimates, we conclude that, for :
[TABLE]
and , which together with Definition 3 lead to:
[TABLE]
On the other hand, for , the estimate in Eq. 4.99 shows that, regardless of the community of , . Thus the and potentials are asymptotically equivalent. In particular, .
(vanishing killing) It remains to analyze the case when for some negative . In this case, we have that
[TABLE]
We can then argue as in the case , but distinguishing between being bigger or smaller than . In particular, due to Eq. 4.102, when , the resulting and potentials are asymptotically equivalent and decay as . On the other hand, for , , which, together with Eq. 4.102 and Eq. 4.95, leads to the estimates , for and for pairs in different communities. By plugging these estimates into Lemma 1, the statement follows. ∎
Acknowledgments
L. Avena was supported by NWO Gravitation Grant 024.002.003-NETWORKS. M. Quattropani was partially supported by the INdAM-GNAMPA Project 2019 “Markov chains and games on networks”. Part of this work started during the preparation of the master's thesis [17], and the authors are thankful to Diego Garlaschelli for acting as co-supervisor of this thesis project.
- [1] D. Aldous, The Continuum Random Tree. I, Ann. Probab. 19, 1–28 (1991).
- [2] L. Avena, F. Castell, A. Gaudillière and C. Mélot, Intertwining wavelets or multiresolution analysis on graphs through random forests, ACHA, DOI:10.1016/j.acha.2018.09.006 (2018).
- [3] L. Avena, F. Castell, A. Gaudillière and C. Mélot, Random Forests and Networks Analysis, J. Stat. Phys. 173, 985–1027 (2018).
- [4] L. Avena, F. Castell, A. Gaudillière and C. Mélot, Approximate and exact solutions of intertwining equations through random spanning forests, arXiv:1702.05992 (2017).
- [5] L. Avena and A. Gaudillière, Two applications of random spanning forests, J. Theor. Probab. 31, 1975–2004 (2018).
- [6] L. Avena and A. Gaudillière, A proof of the transfer-current theorem in absence of reversibility, Stat. Probab. Lett. 142, 17–22 (2018).
- [7] I. Benjamini and G. Kozma, Loop-erased random walk on a torus in dimensions 4 and above, Comm. Math. Phys. 259, 257–286 (2005).
- [8] R. Burton and R. Pemantle, Local characteristics, entropy and limit theorems for spanning trees and domino tilings via transfer-impedances, Ann. Probab. 21, 1329–1371 (1993).
