Finding a Small Vertex Cut on Distributed Networks
Yonggang Jiang, Sagnik Mukhopadhyay

Yonggang Jiang MPI-INF, Germany, [email protected]
Sagnik Mukhopadhyay University of Sheffield, UK, [email protected]
We present an algorithm for distributed networks to efficiently find a small vertex cut in the CONGEST model. Given a positive integer , our algorithm can, with high probability, either find vertices whose removal disconnects the network or return that such vertices do not exist. Our algorithm takes rounds, where is the number of vertices in the network and denotes the network’s diameter. This implies round complexity whenever .
Prior to our result, a bound of is known only when [Parter, Petruschka DISC’22]. For , this bound can be obtained only by an -approximation algorithm [Censor-Hillel, Ghaffari, Kuhn PODC’14], and the only known exact algorithm takes rounds, where is the maximum degree [Parter DISC’19]. Our result answers an open problem by Nanongkai, Saranurak, and Yingchareonthawornchai [STOC’19].
Contents
- 2.1 IsolatingSmallCut (proof sketch of Lemma 2.1; full proof in Section 4)
- 2.2 SingleSourceLocalCut (proof sketch of Lemma 2.2; full proof in Section 5)
1 Introduction
For any undirected non-complete graph , a set is called a vertex cut if contains at least two connected components, where is obtained by removing vertices in from . (For complete graphs, the problem is trivial, so we ignore this case.) In the vertex cut or vertex connectivity problem, we are given a positive integer and want to either find a vertex cut of size at most or to answer that no such vertex cut exists. Vertex cut is a fundamental graph property and computing it is one of the most basic problems in graph algorithms. For example, it quantifies the vulnerability of a communication network in terms of the minimum number of vertices whose failures can disconnect the network. In the sequential model, this problem has been extensively studied over many decades (e.g. [Kle69, Tar72, HT73, ET75, Eve75, BDD+82, LLW88, KR91, CT91, NI92b, HRG00, Gab06, Geo10, HRW20, NSY19, FNY+20, LNP+21]). For , a linear-time algorithm via depth-first search was long known due to Tarjan [Tar72]. For , the linear-time algorithm was due to Hopcroft and Tarjan [HT73]. For , an -time algorithm was recently discovered by [NSY19, FNY+20]. For other values of , a reduction to maxflow by [LNP+21] together with the very recent fast maxflow algorithm of [CKL+22] led to an almost-linear time algorithm.
To conclude, the vertex cut/connectivity problem is almost solved in the sequential setting. However, when it comes to distributed networks computing their own vertex cut, much less is known. This is the case even when a network wants to find only a few (say, ) vertices whose failures might destroy its communication. A distributed algorithm for finding a small vertex cut is the focus of this paper.
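To make the problem statement concrete, here is a minimal sequential sketch of the definition. It is not the paper's algorithm: it simply tests every candidate set by brute force, which is exponential in the cut size. The adjacency-dictionary representation and the names `is_vertex_cut` and `brute_force_vertex_cut` are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def is_vertex_cut(adj, removed):
    """True if deleting `removed` leaves at least two connected components."""
    removed = set(removed)
    remaining = [v for v in adj if v not in removed]
    if len(remaining) < 2:
        return False
    # Search the remaining graph from an arbitrary surviving vertex.
    seen = {remaining[0]}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                queue.append(w)
    # If some surviving vertex was not reached, the graph is disconnected.
    return len(seen) < len(remaining)


def brute_force_vertex_cut(adj, k):
    """Smallest vertex cut of size at most k, or None if there is none."""
    for size in range(k + 1):
        for candidate in combinations(sorted(adj), size):
            if is_vertex_cut(adj, candidate):
                return set(candidate)
    return None


# Hypothetical example: the 4-cycle 0-1-2-3-0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
```

On the 4-cycle, no single vertex is a cut, while removing two opposite vertices disconnects the remaining two.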
Distributed Vertex Cut.
We study computing the vertex cut problem in the CONGEST model of distributed networks. In this model, an undirected graph is given as the communication network. Two important parameters are and , the diameter of . Time is divided into discrete rounds. In each round, each vertex can send an bits message to each of its neighbors. After each round, each vertex can locally perform arbitrary computation and decide what to send in the next round. Initially, each vertex is given a specified input indicating some local information of the network (e.g. neighbors and weights of its incident edges). For the vertex cut problem, the input of each vertex is simply the set of its neighbors and integer . After several rounds, all vertices are expected to terminate and generate the desired output. For the case of the vertex cut problem, we expect at most vertices to identify themselves as being in a vertex cut if such a cut exists; otherwise, every vertex knows that such a cut does not exist. The goal is to minimize the number of rounds before all vertices terminate.
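The round structure described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it simulates synchronous rounds centrally, every message is a single integer (so it fits within the per-edge message budget), and the function name `congest_bfs` is hypothetical. It computes BFS distances from a root by flooding, one hop per round.

```python
def congest_bfs(adj, root):
    """Simulate synchronous flooding from `root`; return (distances, rounds)."""
    dist = {root: 0}
    frontier = [root]
    rounds = 0
    while frontier:
        # One round: every frontier vertex sends one small message
        # (its distance plus one) to each of its neighbors.
        inbox = {}
        for u in frontier:
            for w in adj[u]:
                inbox.setdefault(w, []).append(dist[u] + 1)
        rounds += 1
        # Local computation after the round: vertices hearing a message
        # for the first time fix their distance and join the next frontier.
        frontier = []
        for w, messages in inbox.items():
            if w not in dist:
                dist[w] = min(messages)
                frontier.append(w)
    return dist, rounds


# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

The simulation uses one round more than the eccentricity of the root, because the last round is the one in which no vertex learns anything new.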
The CONGEST model is a standard model for studying basic graph algorithms in message-passing distributed networks, e.g. minimum spanning tree (MST), shortest paths, min-cut, and approximate maxflow [GKP98, KP98, PR00, Elk06, DHK+12, GK13, NS14, GKK+15, FN18, Elk20, GL18, DEMN21]. These problems typically admit a trivial lower bound of ; thus, the focus is usually on the dependency on . A large number of graph problems were shown to require rounds, and this bound has become a gold standard. (Throughout, , , and hide .) Examples of such problems include MST [GKP98, KP98, PR00, Elk06, DHK+12], approximate shortest paths [LP13, NS14, HKN21], approximate 2-edge connected spanning subgraph (2-ECSS) [Dor18, DG19], tree packing [CGK14] and approximate maxflow [GKK+18]. For cut-related problems, a line of work (e.g. [DHK+12, GK13, NS14, DHNS19, GNT20, DEMN21]) led to an bound for computing edge cut that holds even for weighted graphs [DEMN21, MN20]. The bound matches the lower bounds from [GK13, DHK+12] (the lower bounds hold when is large enough). ([DHK+12] proved a lower bound of for computing weighted mincut on some graphs of diameter . For the unweighted case, it follows from [GK13, Theorem 6.4] that for any , there is a lower bound of for some graphs with diameter and edge cut .) Moreover, when the edge cut is small, better algorithms exist: For , the problem can be solved in time [PT11]. For other values of , there is a rounds algorithm [Par19]. (The last bound is small under a typical assumption that .)
In sharp contrast with the above, our understanding of distributed vertex cut is much less complete. To the best of our knowledge, existing algorithms consist of
1. an -round algorithm that works only when [Thu97],
2. an -round algorithm that works only when [PT11] ( denotes the maximum degree),
3. an -approximation -round algorithm [CHGK14a],
4. an -round algorithm [Par19], and
5. an -round algorithm that works only when [PP22].
Thus, even to find vertices that can disconnect the network, the available solutions are to either settle for a much bigger approximate solution of size [CHGK14b] or find an exact solution in time [Par19], which can be prohibitively slow for typical networks with large-degree “hubs” (e.g. star networks). In other words, even for we are already very far from the typical -time exact algorithms!
Challenges.
A fundamental difficulty in solving the vertex cut problem is its tight connection to maxflow computation. For example, while edge cut algorithms that are faster than solving maxflow were known in the sequential model for many decades (e.g. [NI92a, Kar00, MN20, GMW20, GMW21]), it was only very recently that a vertex cut algorithm that is as fast as solving maxflow (and not faster) was found [LNP+21]. The situation is even worse in the distributed setting. For example, consider the case where we know two vertices and such that removing vertices in would disconnect from (this is a basic case that all the state-of-the-art sequential algorithms have to solve [LNP+21, FNY+20, NSY19]). When , one can solve vertex cut in linear time in the sequential model using the Ford-Fulkerson algorithm. In contrast, in the distributed setting we cannot even solve this case in the typical rounds because it generalizes the distributed reachability problem, whose best-known round complexities are [GU15] and [LJS19]. More generally, the distributed setting poses an additional challenge for computing vertex cut because there is no non-trivial maxflow algorithm available. (The exception is the approximate maxflow algorithm of [GKK+15]; however, approximate maxflow was not known to be useful for solving vertex cut.) Thus, to design distributed vertex cut algorithms, one needs to address the fundamental questions of whether one could avoid maxflow computations or develop maxflow algorithms specialized for solving vertex cut. Since efficient maxflow algorithms are not available in many models of computation (e.g. graph streaming and parallel computing), answering these questions may lead to efficient vertex cut algorithms in other models as well.
1.1 Our Result
We show that, in rounds, a distributed network can find up to vertices that can disconnect itself. More generally, our result is the following.
Theorem 1.1 (Informal; see Theorem 2.11 for a formal version).
There is a randomized algorithm in the CONGEST model that, with input and undirected graph , takes rounds and determines whether is -vertex-connected or not; if not, it outputs the minimum vertex cut. (When , our running time guarantee becomes at least , which is quite bad, so we do not consider this case here. Also notice that by running the Ford–Fulkerson algorithm from a sampled node to every other node in parallel, one can easily get a algorithm.)
Our bound can be thought of as generalizing the bound of [Thu97] that works only when to any ; however, the techniques we use are very different. It is sublinear in as long as . Our result answers an open problem from [NSY19].
1.2 Techniques
Our algorithm follows the framework used by the algorithms of [NSY19, FNY+20] for solving vertex cut in time in the sequential model, where denotes the number of edges. We provide a detailed overview of this framework and our algorithm in the next section. Here, we discuss some challenges, and techniques to overcome them, that might be of independent interest. The algorithms of [NSY19, FNY+20] consider two types of vertex cuts of size (assuming that they exist): a vertex cut that leads to a small connected component is called unbalanced, and otherwise it is called balanced.
To find these cuts, we have to execute some maxflow algorithms which keep finding augmenting paths. For intuition, suppose that there are internally vertex-disjoint -paths between two vertices and . An augmenting path is an -path that, together with the existing paths, lets us create internally vertex-disjoint -paths. (See Fig. 1 for an example and Section 2 for a more detailed definition.) Finding an augmenting path is useful because we can show that it exists if and only if there is no vertex cut of size that disconnects and . We now consider finding two types of cuts. Note that below we use ‘Lemma’ for lemmas that are used to provide intuition and are not actually proven.
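The link between augmenting paths and internally vertex-disjoint paths can be sketched with the textbook sequential reduction: split every vertex into an "in" and an "out" copy joined by a unit-capacity arc, then repeatedly augment along BFS paths. This is a standard Ford-Fulkerson style sketch, not the distributed procedure of this paper; the function name and graph encoding are assumptions, and the two endpoints are assumed to be non-adjacent.

```python
from collections import deque


def max_vertex_disjoint_paths(adj, s, t):
    """Maximum number of internally vertex-disjoint s-t paths (s, t non-adjacent)."""
    cap = {}

    def add_arc(a, b):
        cap.setdefault(a, {})[b] = 1             # forward arc with unit capacity
        cap.setdefault(b, {}).setdefault(a, 0)   # residual arc, initially empty

    for v in adj:
        if v not in (s, t):
            add_arc((v, "in"), (v, "out"))       # each inner vertex used at most once
        for w in adj[v]:
            add_arc((v, "out"), (w, "in"))

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # BFS in the residual graph to look for one augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b, c in cap.get(a, {}).items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow                          # no augmenting path is left
        # Push one unit of flow back along the path that was found.
        b = sink
        while parent[b] is not None:
            a = parent[b]
            cap[a][b] -= 1
            cap[b][a] += 1
            b = a
        flow += 1


# Hypothetical example: s and t joined through three middle vertices.
k23 = {"s": {1, 2, 3}, "t": {1, 2, 3}, 1: {"s", "t"}, 2: {"s", "t"}, 3: {"s", "t"}}
```

By Menger's theorem, the returned count equals the minimum number of vertices whose removal separates the two endpoints, which is why the search stops exactly when no augmenting path remains.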
Finding Unbalanced Cuts: Local Flows and Resolving Congestions (‘Lemmas’ 2.6 and 4.3).
To find unbalanced cuts, [NSY19, FNY+20] use local flow algorithms. Like many maxflow algorithms, a local flow algorithm keeps finding augmenting paths to increase the flow size; however, under some conditions, it can find augmenting paths without reading the whole input graph. For example, for vertex cut, [NSY19, FNY+20] use local flow algorithms to solve a problem where, given a vertex in the above small connected component , the algorithms can find the cut vertices in time roughly the size of the connected component defined above (more precisely, the volume of ), which can be much less than the size of the whole input graph. By not reading the whole graph, we can execute multiple local flow algorithms in near-linear time in total. This feature plays a key role in designing many efficient sequential algorithms, e.g. finding balanced cuts [ST13, SW19], edge cut [KT15, HRW20], and dynamically maintaining expanders [Wul17, NS17, NSW17, SW19, CGL+20].
Applying the above idea in the CONGEST model, however, requires solving the congestion issue: many augmenting paths from different executions may go through the same edge. For example, the sequential vertex cut algorithms of [NSY19, FNY+20] need to compute local flows at some point, and we cannot rule out the case where all these executions require augmenting paths that share the same edge, which would cause rounds to modify all flows along these augmenting paths.
Congestion is a fundamental issue in the CONGEST model (thus the name). It is typically avoided by not executing too many algorithms in parallel. However, for vertex cut, we do not know how to avoid this. As far as we know, the same issue also arose in the distributed expander decomposition computation [CPZ19, CS19], where the authors use PageRank algorithms instead of local flow algorithms (both algorithms can be used to compute the expander decomposition in the sequential model). Then, they exploit the property of PageRank to show that there is not much congestion, thus the congestion issue can be avoided.
In this paper, we solve the congestion issue differently. Essentially, we show that even when there is heavy congestion, fraction of the executions can still proceed. To show this, we prove the following (see ‘Lemmas’ 2.6 and 4.3 for details). We have up to executions of the local flow algorithm of [FNY+20] running in parallel. Consider two augmenting paths and from two executions with sources and . If and meet at some vertex , then there is a path either from to or from to that uses only edges explored by the two executions so far such that can be used as an augmenting path by one of the two executions. (Here, we also exploit the fact that a source of one execution can be a sink for other executions.) In other words, if the augmenting paths from two executions meet at the same vertex, then one of them can augment to the other.
This argument can be extended to show that if many augmenting paths meet at a vertex, then they can stop and only use what they have explored to finish the augmentation for half of them. This property helps reduce congestion when finding augmenting paths from different vertices.
To conclude, the above property allows us to find a vertex cut of size in where , the number of vertices in one of the connected components in the cut (see Lemma 2.1 for detail). Finally, note that given the prevalence of local flow algorithms in designing efficient graph algorithms, similar issues to the above may arise for other problems, and it is interesting to see if our technique can be applied elsewhere.
Finding Balanced Cuts: Specialized Fast Reachability Algorithm (‘Lemma’ 2.7).
Before discussing this case, note that the above algorithm with round complexity already lends itself to a sublinear time algorithm for vertex cut with —one can use this algorithm for small , and Ford-Fulkerson and reachability algorithm when is large. In order to improve the round complexity to even when , there is another fundamental barrier: the need to solve the distributed reachability problem.
For concreteness, assume that removing vertices leaves us with two connected components and each of vertices. This case cannot be solved efficiently by the local flow algorithm since . In the sequential setting, this case can be easily solved by sampling two vertices and and computing a -maxflow of size in a graph. To do this, simply find augmenting paths for rounds (i.e. the Ford-Fulkerson algorithm). This takes time and succeeds with constant probability (since ). In the CONGEST model, however, even answering the simpler question of whether there is one augmenting path from to (i.e., solving -reachability) requires more than rounds: the best distributed algorithms for reachability require rounds [GU15] and rounds [LJS19].
In this paper, we develop an algorithm specialized for our case: when we want to find an augmenting path, we are solving a reachability problem where most edges in the graph are undirected. A result implied by our technique when is as follows. (See ‘Lemma’ 2.7 for the full statement.)
‘Lemma’ 1.2.
There exists a randomized CONGEST algorithm that, given two vertices and a set of internally vertex-disjoint -paths , either returns an augmenting path or declares that no such path exists. The algorithm takes rounds.
So, to find internally vertex-disjoint -paths, we use the above algorithm times, taking rounds in total. This partially explains the round complexity of our final algorithm.
The main technique for proving the above ‘lemma’ is to modify the framework in the reachability algorithms [Nan14, GU15, LJS19]: As usual, we sample hubs and grow BFS trees from each hub and build a virtual graph on the hubs. Our novelty is to use a clustering technique (Lemma 5.4) to create a small number of strongly connected components (or clusters) and give them some ordering with the following guarantee: Any vertex in a cluster can reach any vertex in another cluster which is ordered lower than the former cluster. This clustering lets us reduce the number of vertices and edges in the virtual graph without affecting reachability as well as makes it possible to broadcast the whole virtual graph. See Section 2.2 for an overview of this algorithm.
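The generic "sample hubs, grow bounded BFS trees, build a virtual graph" step that this framework starts from can be sketched sequentially as follows. This is a simplified illustration only: it does not include the clustering technique (Lemma 5.4) that is the novelty here, and the sampling probability, depth bound, and function names are hypothetical parameters chosen for the example.

```python
import random
from collections import deque


def sample_hubs(vertices, p, rng):
    """Keep each vertex as a hub independently with probability p."""
    return [v for v in vertices if rng.random() < p]


def build_hub_graph(succ, hubs, depth):
    """Virtual graph: hub h points to hub g if g is within `depth` hops of h."""
    hub_set = set(hubs)
    virtual = {}
    for h in hubs:
        # Depth-bounded BFS along directed arcs, started at the hub h.
        dist = {h: 0}
        queue = deque([h])
        while queue:
            u = queue.popleft()
            if dist[u] == depth:
                continue
            for w in succ.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        virtual[h] = {g for g in dist if g in hub_set and g != h}
    return virtual


# Hypothetical example: the directed path 0 -> 1 -> 2 -> 3 -> 4.
line = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
```

Reachability between hubs in the original graph can then be studied on the much smaller virtual graph, provided consecutive hubs on a path are within the chosen depth of each other.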
1.3 Open problems
This paper presents a study on the computational complexity of the vertex connectivity problem for small in the CONGEST model. There are several avenues for future research that may further improve upon the findings presented in this study.
Vertex connectivity in CONGEST model.
- (Small ) For small values of , it would be interesting to investigate whether it is possible to surpass the running time with an algorithm, given that there is no lower bound for unweighted vertex connectivity. Although algorithms have been developed that run in rounds for , the true complexity for larger remains unknown.
- (Large ) The current best algorithms for the general vertex connectivity problem in the CONGEST model do not have sub-linear time complexity when is as large as . It would be interesting to explore the development of sub-linear algorithms for cases where is large.
- (Universally optimal) In recent years, there have been many papers seeking universally optimal algorithms, starting from the work by Haeupler, Wajc and Zuzic [HWZ21]. Since our algorithm meets the upper bound for , it would be interesting to explore the development of an algorithm that is universally optimal.
Parallel vertex connectivity.
By combining the current best sequential algorithm for small with the current best parallel algorithm for reachability with depth , it is possible to develop an almost linear work parallel algorithm with depth . It would be interesting to investigate whether it is possible to further reduce the depth of the algorithm to the best reachability algorithm depth of or better. As this paper provides an example of surpassing the reachability running time for small in the CONGEST model, it is reasonable to expect that similar improvements may be possible in the parallel model as well.
Other models of computation.
In addition to advancements in the CONGEST and parallel models of computation, we would like to see further advancements in cut-query and two-party communication models, both in classical and quantum settings, for the problem of vertex connectivity (and minimum vertex cut). Notably, edge connectivity (and minimum edge cut) has nearly been resolved within the classical setting [RSW18, MN20, LLSZ21], and considerable progress has been achieved within the quantum setting [LSZ21, AEG+22]. However, no substantial progress has been made for vertex connectivity.
Other graph cut problems.
Finally, significant advancements have yet to be made on other variants of graph cut problems, including directed edge connectivity and minimum weighted vertex cut, in CONGEST and other distributed models of computation. Any progress on these problems in any of the distributed models of computation would therefore be of considerable interest.
2 Overview
In this section, we sketch the proof of our main result, i.e. Theorem 1.1. For notations, we use the following: for , denotes the neighbors of in graph , and .
The crux of our algorithm is the subroutines called IsolatingSmallCut and SingleSourceLocalCut, which give guarantees as in Lemmas 2.1 and 2.2 below. We sketch the proofs of Lemmas 2.1 and 2.2 in Section 2.1 and Section 2.2 respectively. Then, in Section 2.3, we show how to combine them together by following the framework of [FNY+20].
We also denote the vertex cut of the graph by , where are the two sides of the cut, and is the set of vertices whose removal disconnects from . Lemma 2.1 roughly guarantees that if we have a set of vertices such that, for some -cut , exactly one vertex in is in (i.e. and its neighbors), then we will be able to find such a cut or a similar cut in rounds. So, to find a small vertex cut when is small (the “unbalanced case” mentioned earlier), this algorithm will be fast assuming that we can find such an . For intuition, note the following related sequential algorithms. (i) In [LNP+21], the same statement as ours is proved in the sequential setting with an algorithm that takes max-flow time (which is currently almost linear [CKL+22]). This is done via the isolating cut technique [LP20], hence the word “Isolating” in the name of our algorithm. Unfortunately, we cannot use the same technique since we do not have an efficient exact max-flow algorithm in the distributed setting. (ii) In [FNY+20], a similar statement can be guaranteed in time in the sequential setting. Compared to our requirement that , the statement of [FNY+20] requires a weaker condition that ( that satisfies this condition can be easily found, e.g. ). As we will show in Section 2.1, our algorithm follows the idea of [FNY+20], but our stricter condition gives us some leverage to avoid the congestion issue that we would face if we simply followed the ideas of [FNY+20] (discussed in the previous section).
Lemma 2.1 (IsolatingSmallCut(); Proof in Section 4).
There exists a CONGEST algorithm that, given an undirected graph , a set of vertices , and (every vertex knows its membership in and ), either outputs a valid -cut (every vertex knows its membership in ) with one side such that , or outputs . The output satisfies the following:
- if there exists a vertex set such that and , then the algorithm outputs with at most constant probability (when we say “with constant probability” in this paper, we mean a constant less than 1), and
- the algorithm runs in rounds.
Lemma 2.2 roughly guarantees that if we know two vertices and that are on the opposite sides of a -cut, i.e. for some -cut we have and , then we can find a -cut efficiently; here, “efficiently” means the dilation of and congestion of . We need the congestion to be so that we can run algorithms with different simultaneously, while still keeping the running time . It is necessary to run algorithms since we need to sample vertices to guarantee at least one vertex is inside . Each algorithm will take one sampled vertex as .
Note that a similar statement was achieved in the sequential setting in time, which is the time to compute a max-flow of size using Ford-Fulkerson algorithm. As discussed earlier, computing a max-flow will not allow us to beat the time to solve reachability. For this reason, we need some clustering ideas which we show in Section 2.2.
Lemma 2.2 (SingleSourceLocalCut(); Proof in Section 5).
There exists a CONGEST algorithm that, given an undirected graph , two vertices and , where , either outputs a valid -cut, or outputs , such that
- if there exists such that , then the algorithm outputs with constant probability, and
- the algorithm has dilation and congestion .
Remark 2.3.
Throughout this paper, it is important for the reader to keep in mind that our algorithm is a Monte Carlo algorithm with one-sided error. Specifically, when the output is a cut, it must be a valid cut with a size less than . However, when the output is , it is possible that the graph has a cut with a size less than , and the algorithm cannot distinguish whether the output is correct or not. Nevertheless, since the algorithm has one-sided error, as long as the error probability is bounded by a constant between 0 and 1, it can be reduced to as small as by repeating the algorithm times.
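The amplification argument in this remark can be sketched as follows, under stated assumptions: a single run wrongly reports "no cut" with probability at most some constant `c` below 1, runs are independent, and any cut that is returned is always valid. The names `repetitions_needed` and `amplify`, and the convention that `None` stands for the "no cut" output, are hypothetical.

```python
def repetitions_needed(c, delta):
    """Smallest t with c**t <= delta, i.e. t independent failures are that unlikely."""
    t, p = 0, 1.0
    while p > delta:
        p *= c
        t += 1
    return t


def amplify(run_once, repetitions):
    """Repeat a one-sided-error routine; return the first cut found, else None."""
    for _ in range(repetitions):
        result = run_once()
        if result is not None:
            return result   # a returned cut is always valid, so stop here
    return None             # every run reported "no cut"
```

Because the error is one-sided, the repeated procedure is wrong only if every single run fails, which happens with probability at most `c` raised to the number of repetitions.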
2.1 IsolatingSmallCut (proof sketch of Lemma 2.1; full proof in Section 4)
The starting point is to run the algorithm of [FNY+20] for every vertex in simultaneously, i.e., run rounds of DFS to find augmenting paths on residual graphs, defined below. For a path , we define for any and for any . are called the internal vertices of . A set of paths is called internally vertex disjoint if no two of them share an internal vertex. We define as the vertex set consisting of all vertices in . For a set of paths , we define .
Definition 2.4 (-Augmenting Path).
Let be an undirected graph, and let be a set of internally vertex disjoint paths starting from . (We call a flow-path set of .) A path in is called -augmenting if
- (i) Starting vertex: starts at , and
- (ii) Forced retreat: for any consecutive vertices in where is not the end of and any , if and , then .
Figure 1 provides an example of such an -augmenting path. Intuitively speaking, if an augmenting path enters a vertex in path that is not from its successor, then it is forced to go backward (or retreat).
For a minimum vertex cut , our goal is to find the maximum number of vertex disjoint paths from to (from which we can infer the vertex cut), and we use augmenting paths to this end as follows. ‘Lemma’ 2.5 shows that (i) if an augmenting path ending at can be found, then we can increase the number of vertex disjoint paths, and (ii) if no augmenting path ending at can be found, then we can find a vertex cut.
‘Lemma’ 2.5 (Simplified version of Lemmas 3.3 and 3.4).
Suppose is an undirected graph. If is a set of internally vertex disjoint paths starting from and ending at a vertex set , then
- (i) (Augmentation.) Suppose is a -augmenting path ending at . Then there exists a set of internally vertex disjoint paths ending at . See Figure 1 for an example.
- (ii) (Find a cut.) Let contain all the nodes that can reach through a -augmenting path. If , then the following nodes form a vertex cut: for any , the node in that has the largest distance to on . See Figure 2 for an example.
It is not hard to see that, in the Augmentation case above, the minimum vertex cut separating and has a size at least if does not have an edge to —this follows from Menger’s theorem.
Our algorithm IsolatingSmallCut for Lemma 2.1 works as follows. Initially, each node has an empty flow-path set . We run iterations where, in each iteration, we increase the size of by for each vertex . In each iteration, very informally, each vertex sends a DFS token to explore in a DFS manner for rounds in order to find a -augmenting path. If the DFS gets stuck (explained shortly), then we use ‘Lemma’ 2.5 to find a cut. Our main challenge is to reduce the congestion caused by all of these DFS traversals running in parallel. To this end, we exploit the following property of augmenting paths, which is the main technical lemma of this subsection. We start with some definitions which provide the necessary context.
A -augmenting path is called retreating if there exists , such that , i.e., the only way to extend to a -augmenting path is to set . For example, in Figure 1, the red -augmenting path from to is retreating. A -augmenting path is called non-retreating if it is not retreating.
‘Lemma’ 2.6 (Simplified version of Lemma 4.3).
For any undirected graph , consider two vertices , and let and be flow-path sets of and , respectively. Let and be non-retreating - and -augmenting paths, respectively. If and end at the same vertex, then there exists a path on the subgraph of resulting from combining all edges of such that is either -augmenting ending at , or -augmenting ending at .
With ‘Lemma’ 2.6, the algorithm becomes the following. Denote the flow-path set of as . Initially . Run the following procedure for iterations: In each iteration, we make sure that the size of increases by 1 for all .
- (i) Whole-graph DFS: In parallel, every vertex sends a token (denoted by the -token) to explore new vertices in in a DFS manner: Each vertex (including ), once receiving the token, finds out which of its neighbors is not explored yet by the -token, and sends the -token to one such unexplored neighbor. The DFS follows the forced retreat property described in Definition 2.4, i.e., when an -token arrives at a vertex on a flow-path not from , then the token must be sent to . (The astute reader may observe that this DFS traversal may visit a vertex on a flow-path more than once because it is forced to do so by a forced retreat. In Algorithm 1, however, we use a directed graph representation, defined in Definition 3.2, which avoids this problem.) The DFS traversal ends in one of the following three ways:
- If explores vertices or reaches another vertex , it stops.
- If two tokens from meet at a vertex , then they stop, form a pair , and report this fact back to and through DFS trees. Denote the path from and to in the DFS trees by and respectively. Define subgraph as the subgraph formed by the union of edges in . This graph will be used in the next step. If many tokens meet at , we pair them up to get subgraphs . In the case where is odd, is allowed to continue its DFS onward.
- If finishes DFS (i.e., has explored all vertices it can reach) without exploring vertices and without reaching another vertex , output the small cut using ‘Lemma’ 2.5 (ii). (If several vertices finish DFS, we just need to pick an arbitrary one.)
Let be the vertex cut as claimed in Lemma 2.1, i.e., , and . Note that succeeds in finding a -augmenting path that terminates in in the first case with a constant probability: (i) If explores vertices, then a random vertex among the explored vertices is in with probability at least . So we can choose this random vertex as the terminating vertex of the augmenting path. (A priori we do not know whether our chosen vertex is in or not. However, we show that, if the algorithm outputs a valid vertex cut in the end, it will be a cut of size at most ; see Remark 2.3.) (ii) If reaches that , then is the terminating vertex and .
Once the DFS traversals stop for every , we move to the next step.
- (ii) Subgraphs DFS: For each pair , and run DFS traversal on . These DFS traversals in all ’s are run simultaneously using the random delay technique [Gha15] to avoid congestion. (According to [Gha15], running independent CONGEST algorithms simultaneously can be done using random delay in rounds; see Lemma 3.1 for more details.) If finds a -augmenting path to , it uses to increase the size of by . Do the same for . ‘Lemma’ 2.6 guarantees that one of and will succeed in finding an augmenting path.
Note that executing Step (i) and (ii) will increase for a constant fraction of by ‘Lemma’ 2.6. We repeat these two steps times to make sure increases for every .
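The pairing rule used when tokens meet in Step (i) can be sketched as a small bookkeeping routine. This is an illustrative sketch only: the input format (a map from each meeting vertex to the list of sources whose tokens arrived there) and the name `pair_tokens` are assumptions, and the actual algorithm performs this pairing locally at each vertex.

```python
def pair_tokens(arrivals):
    """Pair up tokens that met at the same vertex; an odd token out continues.

    `arrivals` maps a meeting vertex to the list of sources whose tokens
    arrived there in the same round.
    """
    pairs, continuing = [], []
    for vertex, sources in arrivals.items():
        # Consecutive sources form pairs that share this meeting vertex.
        for i in range(0, len(sources) - 1, 2):
            pairs.append((sources[i], sources[i + 1], vertex))
        # With an odd number of tokens, the last one keeps exploring.
        if len(sources) % 2 == 1:
            continuing.append((sources[-1], vertex))
    return pairs, continuing


# Hypothetical example: three tokens meet at v, two meet at w.
meetings = {"v": ["x1", "x2", "x3"], "w": ["x4", "x5"]}
```

Each pair then gives rise to one subgraph for Step (ii), and by ‘Lemma’ 2.6 at least one source of every pair makes progress, which is where the constant fraction comes from.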
Round complexity. We first bound the round complexity for the two steps. One can see that Step (i) runs in rounds. The round complexity of Step (ii) depends on the dilation (i.e., the diameter of subgraph ) and congestion (i.e., the maximum number of for different pairs that share the same edge), which we bound below. We crucially use the following fact: A -augmenting path of length w.r.t. a flow-path set can increase the number of path edges in the new flow-path set by at most an additive factor of . (This observation follows directly from the following fact, which is easy to see. Suppose has flow-path set at the end of each iteration (we assume ), and consider the -augmenting paths that are used to generate different : each is a -augmenting path. We claim that the edges in are a subset of the edges in . Note that it might not be true that the set of edges in is a subset of the set of edges in .)
Dilation.
Note that each , , is of size . From the fact stated above, it is straightforward to bound the size of (which is composed of ) by .
Congestion.
The number of that contain an edge is bounded by the number of times is visited by DFS traversals in Step (i), as can be included in some only after it is visited in any DFS traversal in Step (i) by or . Every edge is included in at most one DFS traversal in each round of Step (i). Since Step (i) lasts for rounds in each of the iterations, an upper bound on the number of times is visited by DFS traversals in Step (i) is .
The total round complexity is : The first is the number of iterations, is the number of times Step (i) and (ii) are repeated in each iteration. See Section 4 for more details.
2.2 SingleSourceLocalCut (proof sketch of Lemma 2.2; full proof in Section 5.)
For intuition, note that a statement similar to Lemma 2.2 can be shown in the sequential setting [NSY19, FNY+20] by running the Ford-Fulkerson algorithm. This algorithm runs for iterations where in each iteration it increases the amount of -flow by one via an augmenting path. We follow this basic idea but need some modifications. First, in each of the iterations, we randomly select some terminals, where each vertex has probability to be the terminal. We allow the augmenting path to end at a terminal instead of at . This suffices because if there exists a vertex cut such that (thus satisfies the condition in the first bullet of Lemma 2.2), a simple union bound shows that the random terminals on all rounds are in with constant probability. The algorithm for finding the augmenting path is stated as the following lemma. We will use this algorithm with . Recall from Definition 2.4 the notion of flow-paths and -augmenting path.
Lemma 2.7 (RandomAugmenting).
There exists a CONGEST algorithm called RandomAugmenting that takes an undirected graph , two vertices , integer and a set of flow-paths of where each path in has length bounded by , as input and the algorithm either
- outputs a vertex cut of size , or
- outputs a -augmenting path with length bounded by , either ending at , or ending at a random vertex , where for any .
The algorithm has dilation and congestion .
To prove Lemma 2.2 using Lemma 2.7, our algorithm starts with . It proceeds in iterations, where in each iteration we find a -augmenting path using Lemma 2.7 with to increase the size of by . Since , one can see that the dilation is and the congestion is , which is what we want in Lemma 2.2. The rest of this section is devoted to showing the proof idea of Lemma 2.7.
Proof idea of Lemma 2.7.
We first review the framework for distributed reachability algorithms used in [Nan14, GU15, LJS19]. (We will modify this framework to find a -augmenting path as guaranteed in Lemma 2.7.) This framework consists of two phases, where the first phase is identical in all algorithms in [Nan14, GU15, LJS19], and these algorithms differ in the second phase. Suppose we want to find a path from to . The two phases are:
- (i) Build a virtual graph. Pick appropriate parameter (we will pick to prove Lemma 2.7). Construct a virtual graph where (also called set of hubs) includes every vertex of with probability as well as , and an edge is included in if the distance from to in is at most . can be constructed by constructing a BFS tree of depth from each vertex in . (Footnote 14: By “virtual” we mean that edges in the virtual graph might not be edges in the input network.)
- (ii) Reachability in the virtual graph. Find all the hubs that can reach in , denoted by (the way to efficiently find differs by different algorithms). Now we claim that are all the vertices can reach in the original graph .
The correctness is guaranteed by the following arguments: since we sample hubs with probability , the path from to a vertex contains hubs with distance one after another along the path, with high probability. Therefore, the hubs in the path form a directed path in the virtual graph, where the last hub in the path has distance to in .
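The two-phase framework can be illustrated with a small sequential simulation. The sketch below is not the distributed algorithm of [Nan14, GU15, LJS19] and it ignores round complexity entirely; the function names `bounded_bfs` and `hub_reachability`, and the parameters `p` (hub sampling probability) and `depth` (BFS depth), are hypothetical names chosen for illustration. It samples hubs, connects hubs that lie within `depth` hops of each other, finds the hubs reachable from `s` in the virtual graph, and reports every vertex within `depth` hops of a reached hub.

```python
import random
from collections import deque

def bounded_bfs(adj, source, depth):
    """Return the set of vertices within `depth` hops of `source` (directed graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # do not expand beyond the depth limit
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def hub_reachability(adj, s, p, depth, rng):
    """Two-phase reachability: sample hubs, then search the virtual graph."""
    vertices = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    # Phase (i): sample hubs (s is always a hub) and grow a bounded BFS from each.
    hubs = {v for v in vertices if rng.random() < p} | {s}
    ball = {h: bounded_bfs(adj, h, depth) for h in hubs}
    virtual = {h: [g for g in hubs if g != h and g in ball[h]] for h in hubs}
    # Phase (ii): hubs reachable from s in the virtual graph.
    reached_hubs = bounded_bfs(virtual, s, len(hubs))
    # Every vertex within `depth` hops of a reached hub is reported as reachable.
    return set().union(*(ball[h] for h in reached_hubs))
```

The returned set is always a subset of the vertices truly reachable from `s`; it equals the full reachable set when consecutive hubs along each path are at most `depth` apart, which is the event the sampling argument above establishes with high probability.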
Using reachability algorithm to find an augmenting path.
Our definition for augmenting path in Definition 3.2 can be reformulated as a directed path in a directed graph, by the standard way of duplicating each vertex into in-vertex and out-vertex. See Section 3.2 for more detail. Thus, we can use a directed graph reachability algorithm to find an augmenting path.
However, directly applying this framework to prove Lemma 2.7 is not efficient as there can be BFS tree constructions that can lead to dilation and congestion . Recall that in Lemma 2.7 we want dilation and congestion , where can be much smaller than . There is no way to set appropriate to satisfy both the dilation and congestion. To achieve a better dilation and congestion trade-off, we will only grow a BFS tree on fewer carefully chosen hubs instead of all hubs.
Path centered clustering.
The key idea to reduce the number of BFS tree constructions is a structure called path-centered clustering. The details of this structure are described in Definition 5.1, and Lemma 5.4 shows that we can efficiently construct this structure. Here we give a simplified version of the structure. Note that the following definition is different from the definition in Section 5.1, because the following definition fails to satisfy Assumption 2.8 in some cases, which affects the correctness of our algorithm. However, it shows the general idea of the more complicated definition, so we use it for ease of explanation.
For a given network of diameter , a path centered clustering is a tuple where is a flow-path set, and is a partition of (i.e. is a disjoint union of all ’s), called clusters with the following guarantees: Each cluster contains , and each induced subgraph has a diameter at most . We call the center of every vertex and denote it by . See Figure 3 for an example.
Definition of and active hubs.
We need a few definitions to show the properties of path centered clustering. For a path and on the path, we say if and we say on path if . For any two vertices and a path centered clustering , we say , if and belong to some path and . The relationship is not total as not every two vertices in are comparable by . For each hub (recall that hubs are sampled vertices in with sample probability ), we use to denote the number of hubs with . We make the following assumption.
Assumption 2.8.
If , then can reach through an augmenting path.
Remark 2.9.
It is to be noted that our actual clustering is more fine-grained than what is described above, in order to tackle the following technical problem: Assumption 2.8 is true if (in Figure 4, the blue line shows an augmenting path from to ) and may not be true if . This is solved by making the clustering more fine-grained; more details are provided in Section 5.1. In this section, we assume Assumption 2.8 holds for ease of explanation.
Build a virtual graph with fewer BFS tree constructions.
In this part we will show how to build a virtual graph on hubs with BFS tree constructions, such that either
- preserves the -reachability (in the sense that all the vertices reachable by in can be reached from a vertex in with distance , such that can reach in ), or
- can reach a random vertex such that each vertex in becomes with probability .
Now we give our algorithm. We first compute a path centered clustering . We call a hub an active hub if . Other hubs are called non-active hubs. Denote the set of all active hubs as . One can argue that . We only grow BFS trees on active hubs. By setting , the dilation and congestion of constructing all the BFS trees satisfy the requirement in Lemma 2.7. By doing that, we can get a virtual graph where includes an edge if and has distance at most to in .
Now we argue the property of . If in , can reach a non-active hub through active hubs, then we can pick a uniform random hub among all hubs as the destination. Notice that a non-active node satisfies , thus, each node has probability at most to be the destination. On the other hand, if cannot reach any non-active hub, then by growing BFS trees on all active hubs, we can find all vertices that can reach in exactly.
Find reachability in virtual graph
Let contain all the active hubs that can reach in the virtual graph . Our goal in this part is to find efficiently. Notice that if we can find , then, according to the argument in the previous part, either we can find all vertices in that can reach, or find a non-active hub such that we can choose a random destination with probability .
We first discuss the difficulty. Notice that . Possible values of are and . In this case, . All the existing algorithms fail to find reachability with round complexity on a virtual graph with vertices. However, our virtual graph is not an arbitrary directed graph. We will exploit some properties of our virtual graph to come up with an efficient algorithm.
The idea is to sparsify the transitive closure of and broadcast the whole sparsified graph. We will make sure that the sparsified graph has the same reachability relationship as the original graph, and it is possible to broadcast the sparsified graph using messages. There are two types of edges in the sparsified graph.
Backward edges.
These are edges where . To learn this type of edge, we give each flow-path an id. Each vertex on can learn ’s id and its position on (the number of vertices with ) efficiently by existing results. After that, each active hub broadcasts the flow-path id where is on, as well as the position on the flow-path.
Forward edges.
For each active hub , recall that is the directed tree with depth rooted at . Instead of keeping all edges from to all hubs in , we preserve the “highest hub” for each path : let contain all hubs in with on . Let be an arbitrary hub in such that for every other hub , we have . is added to the virtual graph for any .
One can see that the number of messages broadcast by every active hub is bounded by . Thus, the congestion is , which fits our goal. To see that the reachability relationship does not change, suppose where is on , then can reach in the virtual graph by first using the upward edge , then using the downward edge .
Remark 2.10.
We skip the mapping of each edge in the virtual graph to a path in the original graph efficiently in the technical overview, see Section 5.4 for more details. Actually, to recover the path in the original graph efficiently, the sparsified virtual graph defined in Section 5.4 is different from here and more complicated, while the high-level ideas are the same.
2.3 Putting everything together
We first restate Theorem 1.1 formally.
Theorem 2.11.
There is a randomized vertex cut algorithm in the CONGEST model that, with input and undirected graph , takes rounds, either outputs a minimum vertex cut of , or outputs , satisfying
1. If the output is a vertex cut, then it must be a minimum vertex cut of .
2. If is not -connected, then is output with at most constant probability.
Since Theorem 2.11 states a one-sided error algorithm, the success probability can be boosted efficiently. The following is the schematic of the algorithm, using the subroutines described in Lemmas 2.1 and 2.2.
Schematic algorithm for vertex cut
• Input: An undirected graph with nodes, a positive integer .
• Output: A vertex cut with size less than , or .
1. If a vertex has degree less than in , output all the neighbors of this vertex. Otherwise continue the following procedures.
2. For do:
(a) Let . Each vertex is included in with probability independently.
(b) If ,
• discard vertices in with degree larger than in , and run a -coloring algorithm in ([HKMT21]) to get independent sets (see Lemma 2.12);
• run IsolatingSmallCut (see Lemma 2.1) for any .
(c) If , discard all vertices in with degree at least in , run IsolatingSmallCut (see Lemma 2.1).
(d) If , for each , let be an arbitrary vertex which is distinct from . Run SingleSourceLocalCut (see Lemma 2.2) for any in parallel (see Lemma 3.1).
3. If any subroutine described in Lemmas 2.1 and 2.2 outputs a cut, then the algorithm outputs the cut and stops. Otherwise, output .
Correctness.
According to Lemmas 2.1 and 2.2, if a cut is output, then it must be a valid vertex cut with size less than . Thus, if the graph has no valid vertex cut with size less than , then the algorithm will output with probability .
Suppose there is a vertex cut with . We assume the max degree of the graph is at least , otherwise, a vertex cut of size less than can be trivially found in the first step of the algorithm. We will show that in the second step, at the first iteration when , a cut with a size less than will be output with constant probability.
Case 1 ():
In this case, we get independent sets . We first prove the following lemma.
Lemma 2.12.
At least one of (denoted by ) satisfies: is an independent set on , contains exactly one vertex in , and .
Proof.
Since and we sample each vertex into with probability , with constant probability there is exactly one vertex . Let contain all neighbors of in and itself. Since the degree of is at least and , the set has size at most . Thus, with constant probability, contains no vertex in . Consider the independent set among that contains . We have , which finishes the proof. ∎
According to Lemma 2.1, once Lemma 2.12 is proved, a cut with size less than will be output with constant probability when IsolatingSmallCut is called.
Case 2 ():
Since we sample each vertex into with probability and , with constant probability, exactly one vertex is in and . According to Lemma 2.1, a cut with size less than will be output with constant probability.
Case 3 ():
According to the same argument, with constant probability, exactly one vertex is in and . Consider the instance with , that instance satisfies the premise of Lemma 2.2 to output a cut with size less than .
Round complexity.
When , the round complexity for the coloring algorithm is . There are instances of IsolatingSmallCut in Lemma 2.1, which leads to the round complexity since . When , the round complexity is . When , the dilation is and the total congestion is , since there are vertices in w.h.p. Thus, the round complexity is .
2.4 Organization
The rest of this paper is organized as follows. In Section 3, we give some basic definitions and define the vertex residual graph. In Section 4, we describe the algorithm IsolatingSmallCut to prove Lemma 2.1. In Section 5, we describe the algorithm SingleSourceLocalCut to prove Lemma 2.2. Other proofs for less important lemmas are deferred to the appendices.
3 Preliminary
3.1 Basic Definitions
We will use the following terminology throughout the paper.
Graph terminologies.
For convenience, we treat an undirected graph as a directed graph with each undirected edge replaced by two directed edges , i.e., and are different edges in an undirected graph.
For a graph , a path with length is a vertex sequence , where for all . We say starts at and ends at . For , we write to denote precedes on path . The edges are called the edges of , denoted as . Normally we assume a path cannot contain repeated edges (but can contain repeated vertices). are called vertices of , denoted as , are called internal vertices of , denoted as . is called simple path if are distinct. A set of paths are called internally vertex disjoint, if every two paths intersect only at non-internal vertices (start and end vertices). Similarly we define . We say ends at the multiset if the union of end vertices of paths in is . In multiset, we also care about the number of occurrences of elements. A circle with length is a length path where are different. A set of circles are called vertex disjoint if any two of the circles do not share any vertices.
A subgraph of is an edge set . Let be all the vertices adjacent to , is the subgraph associated with . We do not distinguish and the subgraph if there is no ambiguity in the context. For a vertex set , the induced subgraph is the graph with vertex set and edge set . Further, we define the boundary of a subset of vertices as
[TABLE]
Moreover, we define . Two vertex sets are called connected if .
A vertex cut is a vertex set such that is not connected. is called the size of the vertex cut. We also use the 3-tuple to represent a vertex cut, where , are mutually disjoint, and .
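As a concrete reading of this definition, the following sequential check tests whether a given vertex set is a vertex cut. It is a minimal sketch under stated assumptions: the graph is given as a symmetric adjacency dictionary `adj`, the function name `is_vertex_cut` is hypothetical, and a graph with fewer than two remaining vertices is treated as connected (a convention chosen here, not taken from the paper).

```python
def is_vertex_cut(adj, cut):
    """Return True if removing `cut` leaves two vertices in different components."""
    remaining = [v for v in adj if v not in cut]
    if len(remaining) < 2:
        return False  # convention: nothing left to disconnect
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in cut and v not in seen:
                seen.add(v)
                stack.append(v)
    # The remaining graph is disconnected iff the search missed some vertex.
    return len(seen) < len(remaining)
```

For example, on a path with three vertices the middle vertex alone is a vertex cut, while on a cycle with four vertices a cut needs two non-adjacent vertices.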
CONGEST model.
Suppose the communication happens in the network . In the CONGEST model, time is divided into discrete time slots, where each slot is called a round. Throughout the paper, we always use to denote the number of vertices in our distributed network, i.e., . In each round, each vertex in can send a bit message to each of its neighbors. At the end of each round, vertices can do arbitrary local computations. A CONGEST algorithm initially specifies the input for each vertex, after several rounds, all vertices terminate and generate output. The time complexity of a CONGEST algorithm is measured by the number of rounds.
Distributed inputs and outputs.
Since the inputs and outputs to the distributed network should be specified for each vertex, we must be careful when we say something is given as input or is output. Here we make some assumptions. For the network , we say a subset of vertices (or a single vertex) is the input or output if every vertex is given the information about whether it is in . We say a subgraph (a subset of edges, for example, paths or circles) is the input or output if every vertex knows the edges in adjacent to it. When we say a number is input or output (for example, ), we normally mean the number is the input or output of every vertex unless otherwise specified.
Dilation and congestion.
For independent CONGEST algorithms on the same network , the dilation of is defined to be where is the round complexity of algorithm , and the congestion of is defined to be , where is the number of messages sent through edge by algorithm . The following lemma is taken from [Gha15]. We will frequently use this lemma in our algorithm description.
Lemma 3.1.
All algorithms in can be simulated in rounds.
As an example, consider growing BFS trees with depth starting from vertices, one can see that this can be done in rounds.
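To make the two quantities concrete, the sketch below computes them from explicit message schedules. It assumes the usual reading of the definitions above: dilation is the largest round count of any single algorithm, and congestion is the largest total number of messages any single edge carries, summed over all algorithms. The function name `dilation_and_congestion` and the schedule format (one list of rounds per algorithm, each round a list of edges that carry a message) are hypothetical choices for illustration.

```python
from collections import Counter

def dilation_and_congestion(schedules):
    """schedules[i] is a list of rounds; each round lists the edges algorithm i uses."""
    # Dilation: the longest running time among the individual algorithms.
    dilation = max(len(rounds) for rounds in schedules)
    # Congestion: the heaviest total load on one edge, summed over all algorithms.
    load = Counter()
    for rounds in schedules:
        for edges in rounds:
            load.update(edges)
    congestion = max(load.values(), default=0)
    return dilation, congestion
```

In the BFS example above, each BFS tree sends at most one message over each directed edge, so the congestion is at most the number of trees and the dilation is the depth.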
3.2 Vertex Residual Graph
In this section, we will define vertex residual graph. We will define a directed graph , such that finding a directed path on is equivalent to finding a -Augmenting Path defined in Definition 2.4.
We use ideas from the well-known reduction from vertex connectivity to edge connectivity. We split each vertex into two vertices , with a directed edge from to . For each edge in the original graph, we build an edge from to . Moreover, the residual graph is the graph obtained by reversing edge directions on , i.e., for each edge we reverse the edge direction of edge and for any we reverse the edge direction of edge . Since only edges with one end vertex in will change direction, in the following definition, we only duplicate vertex to , and each vertex not in can be combined into just one vertex .
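The reduction described in this paragraph can be sketched as follows. This is the textbook construction in which every vertex is split, not the more economical Definition 3.2 (which only splits vertices on the flow paths); the function names `split_graph` and `residual`, and the `"in"`/`"out"` tags, are hypothetical names used only for illustration. Flow paths are given as vertex lists.

```python
def split_graph(adj):
    """Split every vertex v into (v, 'in') -> (v, 'out') and redirect graph edges."""
    edges = set()
    for v in adj:
        edges.add(((v, "in"), (v, "out")))
        for w in adj[v]:
            edges.add(((v, "out"), (w, "in")))
    return edges

def residual(edges, flow_paths):
    """Reverse every split-graph edge that some flow path traverses."""
    used = set()
    for path in flow_paths:
        for u, w in zip(path, path[1:]):
            used.add(((u, "out"), (w, "in")))   # path edge (u, w)
        for v in path[1:-1]:
            used.add(((v, "in"), (v, "out")))   # internal vertex v
    return {(b, a) if (a, b) in used else (a, b) for a, b in edges}
```

A directed path in the graph returned by `residual` may enter an internal path vertex only through its in-copy and must then leave along the reversed path edge, which is the behaviour the augmenting-path definition asks for.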
Definition 3.2 (Vertex residual graph).
Given an undirected graph , a vertex and a set containing internally vertex disjoint simple paths starting from , we define the vertex residual graph on as the directed graph . Let be all the internal vertices of .
[TABLE]
[TABLE]
We say is the projection of vertex , denoted as . Similarly, for a path in , is the path on that maps each vertex in to its projection, maintaining the order, and combining consecutive repeated vertices mapped from . We call a vertex in an in-vertex or an out-vertex according to its superscript, i.e., whether it is or for some .
One should keep in mind the following relationship between Definition 2.4 and Definition 3.2: if a path reaches , that means the path enters from a vertex other than and must go to .
The following lemmas show how to relate finding paths in a vertex residual graph to increasing the number of internally vertex disjoint paths. For an edge , the reversed edge is defined to be . For an edge set , the reversed edge set is . The symmetric difference of is defined to be , i.e., all the converse edges are cancelled.
Lemma 3.3 (Augmenting; Proof in Appendix B).
Given an undirected graph , a vertex and a set containing internally vertex disjoint simple paths starting from and ending at multiset , let be the vertex residual graph on . If there exists a simple directed path in starting from ending at , while the internal vertices of do not contain any for , then there exist internally vertex disjoint simple paths starting from and ending at multiset .
Moreover, in the subgraph , the maximal connected component containing is ; the other maximal connected components of are vertex disjoint circles.
The following lemma shows how to find a cut if an augmenting path cannot be found.
Lemma 3.4 (Find cut).
Given an undirected graph , a vertex and a set containing internally vertex disjoint simple paths starting from . Suppose is the vertex residual graph on , and are all the vertices can reach in . Then let , has size at most and is a vertex cut in if .
Proof of Lemma 3.4.
First, notice that can contain at most vertex in each path . If it contains two vertices where , then we know , which means can reach , and can reach through the backwards direction of . This leads to a contradiction as . Thus, we have .
Denote . We will prove that and , which means is a vertex cut.
We first prove that . To prove , suppose is a neighbor of where , we will prove . Notice that according to the definition of . Therefore, if does not exist, then is an edge and that contradicts the fact that . Thus, exists and is an edge, which means . Since , we have . To prove , for a vertex with and , we first have . Since , there exists a path (since the edge that goes into must come from an out-vertex). We have and , which proves .
Then we prove that there exists a destination of a path such that . Otherwise, if contains all destinations of , then contains all , which means contains all the nodes in as the graph is connected. Now we get , and we also know that according to the definition. Since we proved , we get .
∎
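The idea behind Lemma 3.4, reading a cut off the set of vertices reachable in the residual graph once no augmenting path exists, can be illustrated in the sequential two-terminal setting. The sketch below is the textbook minimum s-t vertex cut computation and not the statement of Lemma 3.4 itself: it assumes `s` and `t` are non-adjacent, gives original edges a large capacity so that only vertex-splitting edges can be saturated, and uses the hypothetical function name `min_vertex_cut`.

```python
from collections import deque, defaultdict

def min_vertex_cut(adj, s, t):
    """Minimum s-t vertex cut of an undirected graph; assumes s, t non-adjacent."""
    big = len(adj)  # larger than any possible number of disjoint paths
    cap = defaultdict(int)
    nbrs = defaultdict(set)

    def add_edge(a, b, c):
        cap[(a, b)] += c
        nbrs[a].add(b)
        nbrs[b].add(a)  # allow traversal of the reverse (residual) direction

    for v in adj:
        add_edge((v, "in"), (v, "out"), 1)        # each vertex carries one path
        for w in adj[v]:
            add_edge((v, "out"), (w, "in"), big)  # edges are never the bottleneck

    source, sink = (s, "out"), (t, "in")

    def search():
        """BFS over residual edges; returns the parent map of reached vertices."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            a = queue.popleft()
            for b in nbrs[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        return parent

    while True:
        parent = search()
        if sink not in parent:
            break
        b = sink
        while parent[b] is not None:  # push one unit along the augmenting path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
    # Cut vertices: in-copy reachable in the residual graph, out-copy not.
    return {v for v in adj if (v, "in") in parent and (v, "out") not in parent}
```

When the search can no longer reach the sink, the vertices whose in-copy is reachable but whose out-copy is not separate `s` from `t`, and their number equals the number of augmentations performed.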
4 IsolatingSmallCut (Proof of Lemma 2.1)
In this section, we prove Lemma 2.1. We will give details of the algorithm described in Section 2.1, in the context of vertex residual graph.
4.1 Distributed Algorithm Details
We first describe the detailed DFS subroutine for one vertex . Recall that initially gets an empty flow-path set , and at each loop, it uses DFS to increase the size of by , by finding an augmenting path in the residual graph. We use to denote a sufficiently large constant.
Remark 4.1*.*
More details about line 1: Let be the subgraph containing all edges in that have transferred the DFS-token. Clearly, is connected and contains all paths from to vertices in . To sample , we sample a random rank in for every out-vertex and pick the highest one by communicating inside ; to find , we simply follow the DFS-token back along its forward path, which is also inside . To update to using , recall Lemma 3.3. We first map to . Then we truncate at the vertex that fits one of the following two cases:
1. is an end vertex of a path .
2. (for the definition of , see Algorithm 2).
We compute , where is the path from to in , find the connected component of containing , and divide it into internally vertex disjoint paths.
Fact 4.2.
Algorithm 1 has the following properties.
1. It either outputs a valid -vertex cut, or internally vertex disjoint paths .
2. It has constant congestion inside , and has dilation .
3. When the algorithm ends, has size .
The first fact is due to Lemmas 3.3 and 3.4. The second fact is straightforward from the algorithm description. The third fact is due to the DFS procedure: each round of the for loop (line 1) either adds a new vertex to , or sends back the DFS-token from a vertex in , while each vertex can send back the DFS-token only once.
The main algorithm will run Algorithm 1 for all vertices in simultaneously, which might lead to higher congestion. To avoid congestion, we need the following lemma, which is the restatement of Lemma 2.6 in the context of the vertex residual graph.
Lemma 4.3 (Path-handshaking; proof in Section 4.2).
Let be an undirected graph and . For , suppose is a flow-path set of . Let be the vertex residual graph on , and suppose there exists a simple path on starting from , ending at . Then there exists a path with , such that is either on starting from ending at or , or on starting from ending at or .
See Algorithm 2 for our main algorithm in this section. For convenience, we duplicate each edge in the communication graph into two parallel edges, one edge transfers message sending from and one edge transfers message sending from .
Line 2 refers to the DFS procedure in Algorithm 1 loop 1. Line 2 refers to line 1, 1 in Algorithm 1, running simultaneously for every using Lemma 3.1. Line 2 is also running simultaneously for every pair in using Lemma 3.1.
4.2 Proof of Lemma 4.3
In this section, we will prove the important path-handshaking lemma. Suppose end at multiset , respectively. According to Lemma 3.3, there exist internally vertex disjoint paths ending at , where . Denote the edge set as a subgraph . We have according to Lemma 3.3. Let be the vertex residual graph restricted on (i.e., the subgraph of that contains edges with or ). Let denote the set of all the vertices such that can reach in . Since both end at , we have . Now we suppose cannot reach on , and cannot reach on , and derive a contradiction. We first show some properties of and for .
Claim 4.4.
If and or is reachable from in , then .
Proof.
According to the definition of , are either in for some , or in . In the former case, can reach , can reach , which means . Now suppose . In the latter case, we also have according to the definition of . Thus, has no edge to or . Since , the only possibility is . Recall that is a path from in , we have . ∎
Claim 4.5.
For each path , exactly a prefix of the path is in ; i.e., either , or there exists a vertex , such that for any .
Proof.
Suppose . We have . Suppose for some , which means is reachable from in . Since we have , according to Claim 4.4, we get . Thus, for any , we have , which leads to the claim. ∎
Claim 4.6.
In , every neighbor of is either in or in for some .
Proof.
Suppose is a neighbor of . If , then there is an edge from to , which means can be reached from in . Thus, . Now suppose and . There is an edge from to , which means is reachable from in . Suppose for some , and , i.e., are two vertices adjacent to in . We first show that . To prove this, we only need to show that . Actually, either or is in , both of which mean is reachable from and , which is a contradiction. Thus, is true. Then we show that . To prove this, notice that is either in a circle, or in according to Lemma 3.3. Suppose is in a circle . We have and . Since are internally vertex disjoint simple paths, there must exist an edge in that is in . An end point of this edge must be in . According to Claim 4.4, all vertices on the circle are in , which is a contradiction to . Thus, the only possibility is . Let be the path that , and . Recall that is reachable from in . According to Claim 4.4, . According to Claim 4.5, , which finishes the proof. ∎
We divide the paths in into three types, defined as follows. For , we define as the opposite index. Suppose path starts at and ends at , is called
1. type 1, if does not exist (which means according to Claim 4.5), and .
2. type 2, if exists, and .
3. type 3, if it is not a type 1 or type 2 path.
Among all the paths, let the number of type 1, type 2, type 3 paths be respectively. For a type 1 or type 2 path , where we set if does not exist, and if exists, we define as the vertex with the smallest index satisfying and . Since we have and (if , then can reach , in which case can reach or , but remember that we assume cannot reach or ), must exist. We first show that for a different type 1 or type 2 path , it must hold that . Otherwise, suppose , where . Note that since are internally vertex disjoint paths. Further, , according to Claim 4.5. That leads to a contradiction to the definition of .
Since there are type 1 or type 2 paths, we get in total such vertices and corresponding edge , where is on and is not in . Thus, according to Claim 4.6, for some . Path is a type 2 path since . Since there are type 2 paths, and is distinct for different type 1 or 2 paths , we have , which means . However, remember that contains a path ending at , and , which is a type 1 path. That leads to a contradiction since type paths do exist.
4.3 Analysis of Algorithm
Round complexity.
We first bound the number of while loops in line 2.
Lemma 4.7.
For any , the while loop in line 2 contains loops.
Proof.
We will prove that each while loop will decrease the size of by at least half. Since pairs in are disjoint, and vertices not inside are all deleted from , we just need to prove line 2 will find a path either from to or from to . Notice that are inside because in line 2, the algorithm collide at some edge . Remember that we duplicate each edge into two edges, one for transferring messages from out-vertex and one for transferring messages from in-vertex, so there are two cases to consider:
- both send from out-vertex . That means the DFS-tokens from and both arrive at , which means there is a path from and to in the residual graphs, satisfying the precondition of Lemma 4.3. Also notice that the mapping of these paths is included in . Thus, by DFS searching in , at least one of or will find a path to or . Consider the case where reaches ; we will argue that does not exist in the residual graph of : that is because of the truncation in Remark 4.1, so any internal vertex of a path in cannot contain . Finally, we have that reaches in the residual graph.
- both send from in-vertex . Recall that in Definition 3.2, has only one out-neighbor, which is an out-vertex . Besides, the edge is in both . Therefore, both have paths to in the residual graphs, and the mappings of these paths are inside . Thus, by DFS searching in , at least one of or will successfully update or based on the same argument as above.
This finishes the proof. ∎
Then we bound the complexity of line 2, 2.
Lemma 4.8.
For each such that is not stopped, let be a subgraph containing all edges involved in . has dilation , and for any edge , the number of different where is bounded by .
For each pair in , let in line 2, the dilation of is bounded by , and for any edge , the number of different pairs where is bounded by .
Proof.
According to Fact 4.2, has dilation . Consider the for loop 2, each round each edge can transfer one message, which means each edge can be included in at most one in each round. Since there are rounds, the first part of the lemma is proved.
The above arguments also hold for . To bound the dilation and congestion for , we focus on analysing and . The updating rule of guarantees that if an edge is included in , it must be transferring message in line 2 for at some previous loops. Since in each round, each edge can transfer one message, and line 2 runs in , the lemma is proved. The term comes from the number of while loops 2 according to Lemma 4.7, and term comes from the number of outer loops 2. ∎
The round complexity claimed in Lemma 2.1 is ; the third term is the complexity of the inner loop, and the first two terms are the numbers of outer loops and inner loops.
Correctness.
Then we prove the correctness. We first prove that, if a cut is output, it must be a valid cut. Recall that in Algorithm 1, a cut is output if and only if the token goes back to . This can only happen when contains all the vertices that can reach. According to Lemma 3.4, is a cut as long as . The latter claim is because . As , must be non-empty.
Now suppose there exists such that , we will prove that will be output with at most constant probability. Denote . If all paths in end at , then according to Lemma 3.3, there exist internally vertex disjoint paths between and , which contradicts the fact that . Let be the first outer loop (line 2) where the ending set of contains a vertex not in . Let be the path found in line 1. If is truncated by Remark 4.1 by where is an end vertex of a path , the ending set of is the same in the -th loop; if is truncated by Remark 4.1 by where , then must be in . Thus, in the -th loop, must end at a vertex in .
Let be the event that the first outer loop (line 2) where the ending set of contains a vertex not in is loop . Let be the path found in line 1 in loop . The endpoint of is a random vertex among vertices, which lies in with probability bounded by since . Thus, the event happens with probability . If the algorithm returns , then one of must happen. By the union bound, the event that the algorithm returns nothing has probability bounded by . By letting the constant hidden in be sufficiently small, the probability is bounded by a constant.
5 SingleSourceLocalCut (Proof of Lemma 2.2)
In this section we prove Lemma 2.2.
5.1 Path Centered Clustering
We first give the definition of the paths centered clustering promised in Remark 2.9. For a path and vertices with , we write to denote that precedes on path .
Definition 5.1** (Paths centered clustering).**
For an undirected graph with diameter and a set of simple paths , a paths centered clustering on is a tuple , where is a partition of ; are functions on ; and each is called a cluster. (A set of vertex sets is defined to be a partition of a vertex set if any two sets in are disjoint and the union of is .) For any , we have
1. is a vertex set containing at most one vertex in each path , and .
2. is connected; each vertex in is a neighbor of some vertex in ; the subgraph has diameter at most .
3. Suppose is on path . We write as the cluster containing if . Then there exists an edge such that , and satisfies
   - If , then either , or there exists such that . is called a lower cluster of .
   - If , then either or there exists such that . is called an upper cluster of .
Recall that in the simplified version defined in Section 2.2, the clustering is a partition of and each cluster contains exactly one vertex in the flow-paths as its center. The above definition differs: the clustering is a partition of , and each cluster contains no vertex in but still has a unique “representative” adjacent to in . Moreover, each node in might be the “representative” of several different clusters. The function defines whether is connected to the upper part or the lower part of the “representative” path. See Figure 4 for an example.
The following definition gives a partial order on based on a paths centered clustering . This order resolves the problem mentioned in Remark 2.9; see Remark 5.3.
Definition 5.2** (Paths centered order).**
A paths centered order on the paths centered clustering on is a partial order defined as follows. We first extend functions to all vertices: for , ; for , let be the cluster that is in, then . For , we define iff. are both on path and , or .
Remark 5.3*.*
The paths centered ordering is defined based on the following intuition: if , then can reach in the vertex residual graph on . See Figure 5 for an example.
The following lemma shows that there exists a fast algorithm to compute a paths centered clustering.
Lemma 5.4** (Clustering; Proof in Section A.1).**
On an undirected graph with diameter , there exists a CONGEST model algorithm that, given a set of vertex disjoint simple paths , either outputs a vertex cut with size at most , or computes a paths centered clustering on , denoted as , where each vertex in knows . The algorithm has dilation and congestion .
5.2 Algorithm Overview
Partial Virtual Graph.
Recall that in Section 2.2, we showed the framework of reachability in the CONGEST model: construct a virtual graph which depicts the reachability of the original graph. We also showed that it is not efficient to build the virtual graph on all sampled hubs. In fact, any edge in the virtual graph starts at an active hub. The reason is that if we can reach a non-active hub, then we can find the desired path and can stop. This leads to the following definition of a partial virtual graph, which is only guaranteed either to depict the reachability of the graph or to reach the desired destination.
For two vertices in graph , we use to denote an arbitrary path from to in .
Definition 5.5** (Partial Virtual Graph).**
For a directed graph and a set , a partial virtual graph on with dilation is a virtual graph where satisfying the following property: For any two vertices , if exists, then there exists and satisfies one of the following two conditions.
1. .
2. has distance at most to in .
The following is the schematic of our algorithm. We omit most of the details to give the reader a high-level idea of what the algorithm is doing. The implementation details will be presented in the following sub-sections.
Schematic of SingleSourceLocalCut()
• Inputs: An undirected graph , vertices , integers with .
• Outputs: A valid -vertex cut, or .
• Initially set . is a set of internally vertex disjoint paths starting from . Let . Repeat the following steps for loops.
1. Each vertex becomes a hub with probability . become hubs with probability . Use Lemma 5.4 on to get a paths centered clustering , or Return a vertex cut. Each hub becomes an active hub if the number of hubs with is bounded by .
2. Let be the vertex residual graph on . If is a hub or active hub in , then is a hub or active hub in . Let contain all the hubs and contain all the active hubs in . Each vertex in broadcasts bits of messages to the whole graph (specified in Section 5.4, analogous to the “downward edges” and “upward edges” described in Section 2.2). Using this information, a partial virtual graph with dilation on can be known by all vertices.
3. Let contain all the vertices that can reach in . If , then sample a uniformly random from the vertices in , and let ; otherwise, if , let ; otherwise, if there exists , then sample a uniformly random hub among all hubs with and let ; otherwise, for each , let contain all the vertices that a vertex in can reach with distance ; if has no outgoing edges, Return the vertex cut , otherwise Return .
4. Map into a path in . Update by the augmenting path using Lemma 3.3.
• If no cut is output, Return .
Distributed implementation organization.
One can see that there are four steps in each loop. We will describe the implementation details for each step in the following subsections.
Step 1 (Section 5.3):
We will describe how to get hubs and active hubs; this can be done easily by aggregation through clusters and paths.
Step 2 (Section 5.4):
We will define the partial virtual graph by giving types of edges in . We will also show that those types of edges can be constructed by having each active vertex broadcast bits of messages.
Step 3 (Sections 5.5 and 5.6):
There are four if-else possibilities in step 3; we will give more details for each possibility in Section 5.5. One important detail is how to get the path , which basically shows how to get the path according to Remark 5.3. This is in Section 5.6.
Step 4 (Section 5.7):
The final step will give details about mapping paths in into paths in , by distributively mapping each type of edge in to paths in . The final step also contains the details of updating .
5.3 Step 1: Find hubs and active hubs.
We first introduce a basic algorithm that we will frequently use. The proof is deferred to Appendix C. The lemma mainly shows how to efficiently aggregate values in a path by divide and conquer.
Lemma 5.6** (Path aggregation; Proof in Appendix C).**
There exists a CONGEST algorithm that, given a directed graph with undirected diameter , a path , and an integer such that , where each vertex on the path gets a polynomially bounded integer (we treat repeated vertices on the path as different vertices), lets each vertex output the value . The algorithm has dilation and congestion .
Now we show how to compute hubs and active hubs. First, we calculate for each vertex , its path’s index (defined later) and its own index (defined later) on the path. This can be done by running the algorithm in Lemma 5.6 twice for each path : first for all to get ; second for all except to get . Since there are at most paths, each of length bounded by , this causes dilation and congestion .
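A minimal sketch of the kind of aggregation used here, assuming the aggregate is an inclusive prefix sum along the path (the exact output of Lemma 5.6 is not reproduced in this text). The name `path_prefix_sums` and the centralized round-by-round simulation are illustrative assumptions, not the paper's CONGEST procedure. The sketch only shows why logarithmically many combining rounds suffice and ignores how messages between far-apart path vertices are routed.

```python
def path_prefix_sums(values):
    """Inclusive prefix sums by doubling (Hillis-Steele style).

    In the round with offset `step`, position i adds the partial sum held at
    position i - step. The offset doubles each round, so after
    ceil(log2(len(values))) rounds position i holds sum(values[:i + 1]).
    Returns the prefix sums and the number of rounds used.
    """
    sums = list(values)
    n = len(sums)
    step = 1
    rounds = 0
    while step < n:
        prev = list(sums)  # snapshot: all positions update simultaneously
        for i in range(step, n):
            sums[i] = prev[i] + prev[i - step]
        step *= 2
        rounds += 1
    return sums, rounds


# Position of each vertex on a path of 8 vertices: prefix sum of all-ones inputs.
positions, rounds = path_prefix_sums([1] * 8)
```

Feeding 0/1 indicator values instead of all ones gives, under the same assumption, the number of marked vertices up to each position.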
Then we use Lemma 5.4 on to create . Notice that is a subgraph with diameter at most , and are disjoint for different . Thus, in the following, we always assume the aggregation problem inside happens in , and are computed in parallel for all . Each vertex can get the information by broadcasting inside . Recall that we sample each vertex in as a hub with probability , and is a hub. For each , let denote the number of hubs inside all lower clusters of , and let denote the number of hubs in all upper clusters of plus an indicator variable that equals if is a hub. This can be computed by collecting the number of hubs inside and sending it to using . This procedure causes dilation and congestion .
Recall that we defined as the paths centered ordering in Definition 5.2. Now we want to compute for each hub , the number of hubs such that , denoted as . This can be done by running the algorithm in Lemma 5.6 for each path , where each vertex on the path receives input value . Once each receives the output of the algorithm , it sends the output to all its upper clusters and sends the output minus to all its lower clusters. Finally, each hub becomes an active hub if . This guarantees that the number of active hubs is bounded by . This can be done with dilation and congestion .
5.4 Step 2: Build partial virtual graph.
In this step, we show how to build the partial virtual graph on the vertex residual graph efficiently, mainly by reducing the number of edges in the partial virtual graph while maintaining the mutual reachability relationship and broadcasting a small amount of information to let all vertices know all edges in .
Recall that is the vertex residual graph on , and is a hub or active hub in iff. is a hub or active hub in . There are five types of edges in the partial virtual graph. We will define them and also show what information each active vertex should broadcast to build the edges. Recall that in Section 2.2, we only gave two types of edges, namely upwards edges and downwards edges. The five types of edges are an extension of these, designed to make it easy to recover a path in from an edge in (see Section 5.7 for how to get the path in from each type of edge). For convenience, for each , we use to denote the cluster that contains . If , then we also define and .
Type 1: Edges inside clusters.
Each active hub broadcasts its cluster’s id (each cluster has a unique cluster id, it can be the largest vertex id in the cluster, for example) to the whole graph. All the hubs in the same cluster form a clique in .
Type 2: Upwards edges for upper clusters.
For each active hub , it broadcasts tokens to build a BFS tree with depth on , denoted as . For each , we define an order over all active hubs with as follows: Recall that can contain at most element, denote the only element as . Define if , and if . Then iff. or . If there exists with , then broadcasts the partial virtual graph edge to the whole graph where is an arbitrary maximal active hub with respect to , i.e., for any active hubs such that , we have . This can be found by aggregation on .
Type 3: Upwards edges for lower clusters.
For each active hub , it builds a reversed BFS tree with depth , i.e., all vertices in can reach by a path with length at most . For each , if there exists active hub with , then broadcasts the partial virtual graph edge to the whole graph where is an arbitrary minimal active hub with respect to , i.e., for any active hub such that , we have . This can be found by aggregation on .
Type 4: Downwards edges.
For each active hub , for any , it broadcasts ’s path id and position on path , and two indicators defined by: If then , otherwise if then , otherwise ; similarly if then , otherwise if in then , otherwise . Now type 4 edges contain all the edges such that there exists satisfying , and either , or . Notice that in that case, can reach through a path with length bounded by .
Type 5: Terminal edges.
For each active hub , if contains a vertex in , then it broadcasts the edge to the whole graph.
One can see that each vertex broadcasts at most messages to the whole graph, and the tree aggregates messages. Since there are at most active hubs, this causes dilation and congestion . Now every vertex knows the same graph locally. The following lemma shows that is a partial virtual graph. The proof is deferred to Section A.2.
Lemma 5.7** (Partial virtual graph; Proof in Section A.2).**
With high probability, the defined by the above 5 types of edges is a partial virtual graph on with dilation .
5.5 Step 3: Find augmenting path.
One can see that there are four cases in Step 3. In the first three cases, a path should be found; in the last case, a cut should be output. We discuss each case separately in the following. Recall that is the set of all the active hubs that can reach in .
Case 1 ():
We do not sample a hub uniformly among all hubs in because we might end up with a path in with length , which we want to avoid. Thus, instead of sampling from , we find a subset such that can reach all the vertices in through vertices in , and if one hub in a cluster is in , then all hubs in the same cluster are in . This can be done since each cluster contains at most hubs in . To find , we start from and repeatedly add that can reach; whenever is added, all hubs in the same cluster as in are also added, until . Then we sample a hub uniformly at random and find as the path using vertices in from to . All the above procedures can be done locally at each vertex since is shared by all the vertices. We also need to guarantee that each vertex gets the same path in ; this can be done by electing a leader that samples a vertex in , after which each vertex finds the unique lexicographically smallest path to the sampled vertex. We will show how to turn this path into a path in in Section 5.7.
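The growing procedure of Case 1 can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `adj` is a hypothetical adjacency map of the shared virtual graph, `cluster_of` is a hypothetical map from each hub (including the source) to its cluster id, and `target_size` stands in for the size threshold that is elided in the text. Neighbors are visited in sorted order so that every vertex running the same local computation obtains the same set.

```python
from collections import deque


def grow_cluster_closed_set(adj, cluster_of, source, target_size):
    """Grow a cluster-closed set of hubs reachable from `source`.

    Whenever a hub is added, all hubs of its cluster are added with it.
    Growth stops once the set has at least `target_size` hubs or nothing new
    is reachable. Returns the hubs in the (deterministic) order of insertion.
    """
    members = {}
    for hub, cid in cluster_of.items():
        members.setdefault(cid, []).append(hub)

    chosen = set()
    order = []
    queue = deque()

    def add_cluster(hub):
        # Add the hub together with every other hub of the same cluster.
        for h in sorted(members[cluster_of[hub]]):
            if h not in chosen:
                chosen.add(h)
                order.append(h)
                queue.append(h)

    add_cluster(source)
    while queue and len(chosen) < target_size:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v not in chosen:
                add_cluster(v)
                if len(chosen) >= target_size:
                    break
    return order
```

Because hubs of one cluster form a clique in the virtual graph (type 1 edges), adding a whole cluster at once keeps every chosen hub reachable from the source through chosen hubs. A hub could then be drawn from the returned list with a seed shared by all vertices.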
Case 2 ():
The path is . Similarly, a unique path from to in is shared by all vertices. We will show how to turn this path into a path in in Section 5.7.
Case 3 (there exist with ):
We need to sample a uniformly random hub among all hubs with . One can see that since is not an active hub, the number of hubs with must be . To sample , recall that in Definition 5.2, only needs the information . Thus, broadcasts the path id and position on the path of and . By using this information, each vertex with can mark itself. Denote the set of all the marked vertices as . To sample one vertex, each vertex in samples itself with probability , and the outcome is aggregated through the whole graph to see whether exactly one vertex sampled itself. If not, then repeat. The procedure will end with high probability in rounds. This shows how to sample uniformly at random. The path is . We will see how to turn the path into a path in in Section 5.7. According to Remark 5.3, there is a path from to in . However, finding the path from to is a complicated procedure; we defer it to Section 5.6.
Case 4 ( does not contain any or a non-active hub):
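The repeat-until-unique sampling described in Case 3 can be sketched as below. The self-sampling probability is elided in this text, so the sketch assumes it is one over the number of marked vertices; the function name `sample_exactly_one` and the `max_rounds` safety cap are hypothetical. In the distributed setting the count of winners would be obtained by aggregation over the whole graph rather than by a list comprehension.

```python
import random


def sample_exactly_one(marked, rng, max_rounds=10_000):
    """Pick one element of `marked` uniformly at random.

    In every round each marked vertex independently selects itself with
    probability 1/len(marked) (an assumption of this sketch). The round is
    accepted as soon as exactly one vertex selected itself; by symmetry the
    accepted vertex is uniform over `marked`.
    Returns (winner, number_of_rounds_used).
    """
    marked = list(marked)
    if not marked:
        raise ValueError("no marked vertices")
    p = 1.0 / len(marked)
    for rounds in range(1, max_rounds + 1):
        winners = [v for v in marked if rng.random() < p]
        if len(winners) == 1:
            return winners[0], rounds
    raise RuntimeError("no round produced a unique winner")
```

With this choice of probability a round succeeds with probability at least 1/e, so logarithmically many rounds suffice with high probability.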
We will need the following claim.
Claim 5.8**.**
* contains all the vertices that can reach in with high probability.*
To see this, suppose is reachable from but not in ; then, according to Lemma 5.7 and Definition 5.5, should reach a vertex in , which is a contradiction. Thus, according to Lemma 3.4, by finding all the vertices that are in , we get a vertex cut with size at most .
5.6 Substep: Find path from to .
We first describe a lemma showing how to find a path between two vertices in the same cluster .
Lemma 5.9** (Find path in cluster; Proof in Appendix C).**
On an undirected graph with diameter , given a connected induced subgraph of , two vertices , and an integer , there exists a CONGEST model algorithm on that finds a path from to in , with dilation and congestion .
For two vertices , we use to denote the path found by Lemma 5.9 from to in , using as the communication graph in the lemma. For , let the path in be the corresponding backward path of . We use for two vertices in to denote the subpath from to in . Such a path can be found distributively by broadcasting the position of ; all vertices in the path with position between make an edge towards . In the following, we consider three cases and show how to find the path from to in each case. For convenience, we assume , and the clusters containing are respectively. The case when or is in only makes things easier and can be treated similarly. In the following, we will also define cluster . For convenience, we assume the sizes of are all bounded by , which guarantees the length of the path. We will discuss what we should do when one of has size in step 4. Let and let be the vertices such that is a neighbor of , is a neighbor of . Let be the path that is in.
Case 1 ( is an upper cluster):
Suppose is connected to cluster through edge , with and . Let be the vertices in that are neighbors of . Then the path is . Note that according to the definition of upper cluster, it might be the case that is connected to a vertex on the path (but not a cluster). In that case, the problem becomes easier: there is an edge directly from to .
Case 2 ( is a lower cluster and ):
can go to its representative and then follow the path backwards to reach . The path is .
Case 3 ( is a lower cluster and ):
In that case, must be a lower cluster. Suppose is connected to cluster through edge , with and . Let be the vertices in that are neighbors of . Then the path is . Note that according to the definition of lower cluster, it might be the case that is connected to a vertex on the path (but not a cluster). In that case the problem becomes easier: there is an edge from to .
For the case that one of has size , we need the following lemma. The proof of the lemma is deferred to Appendix C.
Lemma 5.10** (Find small piece in cluster; proof in Appendix C).**
On an undirected graph with diameter , given an induced subgraph of satisfying such that is connected, a vertex , and an integer , there exists a CONGEST model algorithm on that finds a vertex set such that , , is connected, with dilation and congestion .
Without loss of generality, suppose the first of that has size is . Then when we want to find the path in by Lemma 5.9, we first use Lemma 5.10 to find a connected induced subgraph inside with vertices containing , then sample a vertex in uniformly at random, then use Lemma 5.9 to find the path from to in . Instead of reaching , we reach as the final vertex. The same applies if the first of them is or : we stop at the point when we want to find a path inside or , and end at inside or .
Since each cluster has size bounded by , the above procedure has dilation , congestion . The length of the path from to is bounded by .
5.7 Step 4: Change path in to and update .
Now given a path from to in ( can be or in step 3), we want to turn it into a path in . We first refine this path such that it goes into each cluster (hubs with the same cluster IDs, which form a clique) and goes out at most once: we put an edge from the first vertex on the path intersecting the clique to the last vertex on the path intersecting the clique and discard the subpath between them. Now we replace each edge in with a path in . We consider the following cases according to the type of .
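The refinement step (enter and leave each clique at most once) can be sketched as follows. The function name `enter_each_cluster_once` and the `cluster_of` map are hypothetical; vertices without a cluster entry, such as the path endpoints, are kept as they are.

```python
def enter_each_cluster_once(path, cluster_of):
    """Shortcut a path so that every cluster is entered and left at most once.

    For each cluster met by the path, jump directly from the first path vertex
    in that cluster to the last path vertex in that cluster and drop the
    subpath in between. The jump is a valid edge because hubs of one cluster
    form a clique in the virtual graph.
    """
    last_index = {}
    for i, v in enumerate(path):
        cid = cluster_of.get(v)
        if cid is not None:
            last_index[cid] = i

    refined = []
    i = 0
    while i < len(path):
        v = path[i]
        refined.append(v)
        cid = cluster_of.get(v)
        j = last_index[cid] if cid is not None else i
        if j > i:
            refined.append(path[j])  # clique edge to the last vertex of the cluster
        i = j + 1
    return refined
```

For example, the path `s, a1, x, a2, y, b1, a3, t` with `a1, a2, a3` in one cluster becomes `s, a1, a3, t`.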
Case 1 (Type 1 edge):
In this case, are in the same cluster, denoted as . The path becomes in , which can be found with dilation and congestion using Lemma 5.9. Let be the number of hubs inside ; since we sample each hub with probability , w.h.p. we have . Notice that exists at most once for each , and there are active hubs inside (or ), which means all type 1 edges can be turned into paths with dilation and congestion , while the total length of these paths is bounded by .
Case 2 (Type 2,3,4,5 edge):
Notice that in this case, has distance bounded by to . The path from to in can be found by growing a BFS tree with depth at most . Finding all of them causes dilation and congestion , and the sum of the lengths of all these paths is bounded by .
The total dilation is , the total congestion is , and the total length of the first part path is .
Now we have a path in , but it is not necessarily a simple path. We use Lemma 5.6 on the path to find the position of each vertex on the path. If a vertex is repeated, it preserves only the edge pointing out to the vertex with the largest position id (farthest from ), and deletes its other out-edges. Now all the edges form a subgraph, where the connected component of the subgraph containing is a simple path. The following lemma shows how to find it. The proof is deferred to Appendix C.
Lemma 5.11** (Turn path into simple path; Proof in Appendix C).**
On the network with undirected diameter , there exists a CONGEST model algorithm that, given the edge direction, a vertex , a subgraph with vertices which contains a path starting from such that there are no edges in between vertices in and other vertices in , and an integer , outputs for each vertex whether it is in or not; the algorithm has dilation and congestion .
We use Lemma 5.11 to fix the final simple path. Now we get a simple path on . Then we check whether the internal points intersect with the endpoints of any path in ; if such an intersection exists, we discard all the vertices after it. According to Lemma 3.3, by computing locally and using Lemma 5.11, we get internally vertex disjoint paths starting from . Since has length bounded by , the step has dilation and congestion .
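The rule for repeated vertices in this subsection can be sketched centrally as follows. This is an illustration only: the function name `walk_to_simple_path` is hypothetical, and the distributed version relies on Lemma 5.6 and Lemma 5.11 rather than on a single pass over a list.

```python
def walk_to_simple_path(walk):
    """Turn a walk (vertices may repeat) into a simple path with the same endpoints.

    Each vertex keeps only the out-edge leaving its last occurrence on the
    walk, i.e. the edge to the successor with the largest position. Following
    the kept edges from the first vertex never revisits a vertex.
    """
    last_pos = {v: i for i, v in enumerate(walk)}  # largest position of each vertex
    simple = []
    i = 0
    while i < len(walk):
        v = walk[i]
        simple.append(v)
        i = last_pos[v] + 1  # continue right after the last occurrence of v
    return simple
```

Every edge of the result is an edge of the original walk, so the output is a simple path in the same graph.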
5.8 Analysis of Algorithm
Round complexity.
The algorithm is described above. As shown in each step of the algorithm details, the total dilation and congestion for all loops are bounded by and . Recall that , which leads to the dilation and congestion claimed in Lemma 2.2.
Correctness.
Similarly to Section 4.3, we first prove that, if a cut is output, it must be a valid cut. Recall that in the algorithm, a cut is output iff. it falls into the last sentence of Step 3. Since contains no outgoing edges, contains all the vertices that can reach. According to Lemma 3.4, is a cut as long as . The latter claim holds because .
Now suppose there exists such that . We will prove that will be output with at most constant probability. According to Claim 5.8, will not be output at Step 3 with high probability, so we only consider the case that is output at the last line of the algorithm. Let . If all paths in end at , then according to Lemma 3.3, there exist internally vertex disjoint paths between and , which contradicts the fact that .
Let be the event that the first loop where the ending set of contains a vertex not in is loop . We will bound the probability that happens. Let be the flow-path before the -th loop. According to the algorithm description, at the -th loop, it finds an augmenting path in either ending at an endpoint of (we denote all the endpoints of as ), or ends at or or . Denote . Note that implies . ends at a point in with probability at least . We consider each case.
1. If ends at or , then we are done since and .
2. If ends at : recall that is a uniformly random hub from hubs. Since we sample each vertex as a hub with probability at least inside , the number of hubs inside is bounded by with high probability. Since , we have with probability at least .
3. If ends at : recall that is a uniformly random vertex from vertices. Thus, with probability at least .
The event happens with probability at most according to the above discussion, which means is output with probability bounded by a constant.
Acknowledgment
We would like to thank Danupon Nanongkai for numerous fruitful discussions throughout the project, and the reviewers for their meticulous reading and comments.
Appendix A Clustering and Partial Virtual Graph
In this section we prove two main ingredients of our algorithm for large : the clustering algorithm in Lemma 5.4 and the partial virtual graph properties in Lemma 5.7.
A.1 Proof of Lemma 5.4
We give the algorithm for building a paths centered clustering on . Each vertex in broadcasts tokens to build a BFS tree: initially only vertices in are active; in each round, every active vertex sends a token to all its neighbors if it has not done so. At the end of each round, if a vertex that has not been activated receives a token (possibly more than one token), then it joins the BFS tree of an arbitrary vertex that sent it the token and becomes active. The procedure has dilation and congestion , since each active vertex only sends once, and each vertex has distance at most to a vertex in .
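The token-spreading step above is a multi-source BFS. A minimal centralized sketch, assuming a hypothetical adjacency map `adj` and a list `sources` standing in for the vertices that start active:

```python
from collections import deque


def multi_source_bfs_forest(adj, sources):
    """Partition the reachable vertices into BFS trees rooted at `sources`.

    Returns (root, parent, depth) dictionaries. A vertex reached by several
    tokens at the same distance joins the tree of whichever sender is
    processed first, mirroring the "arbitrary sender" rule.
    """
    root = {s: s for s in sources}
    parent = {s: None for s in sources}
    depth = {s: 0 for s in sources}
    frontier = deque(sources)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in root:  # v has not been activated yet
                root[v] = root[u]
                parent[v] = u
                depth[v] = depth[u] + 1
                frontier.append(v)
    return root, parent, depth
```

Each vertex is activated once and forwards the token once, which is the reason the text gives for the low congestion of this step.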
Now we get a partition of , each part is a tree rooted at a vertex with depth at most , we denote it as . Each subtree rooted at a child of on is a cluster . All form a partition of , and has diameter . Now we want to combine clusters in order to make each cluster either an upper cluster or a lower cluster. We maintain the center set for each cluster . Initially , where is the tree containing . We write . One can see that has diameter initially. During the algorithm we will maintain the property that has diameter . For each vertex , we use to denote the cluster is in.
For path , we define to denote the path id and position on path for . For each edge where , if there does not exist such that , then we call edge a critical edge. The idea of the algorithm is to eliminate all critical edges.
We run several phases until there are no critical edges. At each phase, each cluster tosses a coin, getting head or tail with equal probability . Clusters use the shortcuts to share information inside a cluster. For a critical edge where has tail and has head, wants to join . Let contain all the clusters that want to join . We run the following loop at most times. For a cluster , define . At each loop, tries to find , i.e., a path id in but not in . This can be done with congestion . If does not exist, then the loops end. Otherwise, pick an arbitrary with and merge into . After merging, the new center of becomes . One can see that still contains at most elements, according to the definition of a critical edge (there do not exist two different elements with the same path id). Also notice that each vertex in has distance at most to a vertex in ; thus, has diameter . After merging, some critical edges might no longer be critical edges. Recompute all critical edges and (note that we do not re-toss the coin), and continue to the next loop. Since each loop increases the size of by , there can be at most loops. After all the loops, all clusters satisfy . Now all clusters in merge into , after which the phase ends.
At each phase, a critical edge with tail on one side and head on the other side must disappear at the end of this phase. Moreover, if an edge is not a critical edge, then it will never become a critical edge. That is because an edge is not a critical edge if either one of its endpoints is in , or the centers of both sides share the same path id with different positions on the path, and never deletes elements. Each critical edge disappears with constant probability at each phase; thus, after phases, with high probability, there are no critical edges. Each phase contains at most loops, and each loop contains a broadcast inside the shortcut with messages and diameter . Thus, the algorithm has dilation and congestion .
At the end, contains at most one vertex in each path, and forms a shortcut for with diameter . Moreover, none of the edges adjacent to is a critical edge. Consider several cases:
1. If there exists such that , then there exists such that . We let be the representative of , denoted as , and set as upper cluster or lower cluster according to whether or .
2. Otherwise, if there exists such that . We assume ; otherwise consider two cases: if contains a center with , then let and be another vertex among ; if does not contain such a center, then we add to . In any case, we can assume . Let be the representative of , and set as upper cluster or lower cluster according to whether or .
3. Otherwise, all the edges adjacent to are of the form where and only appears once for each path. In that case, there are at most vertices in . We return as a vertex cut.
A.2 Proof of Lemma 5.7
We will prove several claims that show properties of , which will help us prove the lemma. Recall that we define as the cluster in containing . For convenience, if , then we also define and , and is treated as an upper cluster (since it can reach ). We define for two hubs as the following: there exists and . Recall that , and our goal is, assuming a path from to a vertex exists in , to prove that there is a path from to in or to in where has distance to .
Claim A.1**.**
With high probability, for each cluster that contains at least one hub, there exists a hub in the cluster that has distance at most to any vertex . Moreover, any consecutive subpath of with length contains at least one hub.
Proof.
The first sentence holds because is connected and also connects to . In the induced subgraph , we find all the vertices with distance to . There are at least such vertices (or it covers the whole , in which case it must contain the hub in the cluster), and with high probability one vertex is sampled as a hub because we sample a hub with probability . The same argument holds for a consecutive subpath with length . Since there are at most clusters and subpaths, by using a union bound we get the desired conclusion. ∎
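The sampling step in this proof rests on a standard calculation, sketched here with generic symbols that are assumptions of this note rather than the paper's exact parameters: $p$ is the per-vertex hub probability (assuming independent sampling), $k$ is the number of candidate vertices, $n$ is the number of vertices, and $c$ is a constant.

```latex
\Pr[\text{no hub among } k \text{ vertices}] \;=\; (1-p)^{k} \;\le\; e^{-pk},
\qquad\text{so}\qquad
p \ge \frac{c \ln n}{k} \;\Longrightarrow\; \Pr[\text{no hub}] \le n^{-c}.
```

A union bound over polynomially many candidate sets (clusters and subpaths) then loses only a polynomial factor, which a large enough $c$ absorbs.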
Claim A.2**.**
For any two hubs , if there exists such that , then can reach either or in .
Proof.
For any active hub satisfying , denote the only element in as . Let the set contain all the active hubs that can reach in such that , and let be one of the minimal elements in with respect to , i.e., for any . First, is not an empty set, since is in . We claim that . Otherwise, for convenience we suppose and is in cluster . The case only makes things easier. Then with high probability there exists an active hub which has distance at most to according to Claim A.1, and has an edge to according to type 1 edge. There must exist an active hub in such that and , according to Claim A.1. According to the definition of type 4 edge, has an edge to , which means can be reached from , leading to a contradiction. Thus, we have . For convenience we assume is in a cluster ; the case that is in only makes things easier. Let be the hub in that has distance to . If , then can reach by first using a type 1 edge to reach , then using a type 5 edge to reach since has distance at most to . Otherwise is an active hub, and we can use a type 4 edge to reach , then use a type 1 edge to reach . ∎
Claim A.3**.**
For , if , then either can reach , or can reach in , with high probability.
Proof.
Let be the path that is in. For any , we define as the only vertex in . If contains a vertex in then we are done. Otherwise, according to the definition of type 2 edge, there exists an edge such that or and is an active hub. In the case that , we are done according to Claim A.2, since . So we only need to consider the case , we denote as . Recall that and is or if is a lower or upper cluster. Thus, we have , which means . Now consider two cases.
1. If , in which case is an upper cluster. Then according to the definition of upper cluster, there exists an edge where and is in cluster with and . According to Claim A.1, there is an active hub that has distance at most to . Consider the path starting from , going through the connected induced subgraph to , and going back to through the reversed path of . If has length bounded by , then we have , which means is connected to a vertex in the cluster according to type 4 edge; otherwise, the length of is , in which case there is a hub which is an internal vertex of , such that w.h.p. Notice that either or ; in both cases we have . Thus, there is a type 2 edge such that according to the definition of type 2 edge. According to Claim A.2, we are done.
2. If , then both are lower clusters. According to the definition of lower cluster, there exists an edge where , is in a cluster with and . According to Claim A.1, there is an active hub that has distance at most to . Consider the path starting from , going back to through the reversed path of , and going to in the connected induced subgraph . If has length bounded by , then we have , which means there is a vertex in that connects to according to type 4 edge; otherwise, the length of is , in which case there is a hub which is an internal vertex of , such that , w.h.p. Notice that either or ; in both cases we have . Thus, there is a type 3 edge such that according to the definition of type 3 edge. According to Claim A.2, we are done.
∎
Now we are ready to prove the lemma. Suppose can reach and is the path from to in . With high probability, there exists a sequence of hubs on of the form such that for any , and . This is because every consecutive vertices must contain a hub with high probability. According to A.3, for each active hub , either it can reach in , or it can reach . Therefore, either can reach , or can reach , and can reach with distance at most .
Appendix B Proof of Vertex Residual Graph Lemmas
In this section we prove Lemma 3.3 and Lemma 3.4, which relate the vertex residual graph to vertex connectivity.
Proof of Lemma 3.3.
We prove it by induction on the length of . When the length is [math], the lemma trivially holds. Now suppose and the lemma holds for . We consider two cases.
1. Suppose ends at for some , and is of the form . We apply the induction hypothesis to the path to get vertex-disjoint paths , which contain a path ending at . We will argue that by extending the path ending at to , we get which satisfies everything we want. There are three things to show: (a) are internally vertex-disjoint paths. Since are already internally vertex-disjoint, we only need to show that do not intersect any other paths unless , and do not intersect any other internal vertices. (b) are simple paths. (c) is indeed the connected component of the subgraph containing , and the other components are cycles.
First we show that do not intersect any other paths in unless . We have since cannot contain any vertices in . Thus, since ends at the multiset , exactly one path ends at , and is not an internal vertex of any other path according to the induction hypothesis.
Now we consider two cases, depending on whether or not. Suppose . We first show that . We know , and we also have , and since is a simple path with length at least . is also not in , since is a simple path. Thus, is not adjacent to any edge in . Therefore, by extending the path ending at to , all paths are still internally vertex-disjoint, since we have shown do not intersect other paths. The component structure does not change.
Suppose . Then is not an internal vertex of any path in according to the induction hypothesis. Thus, by adding to the path ending at in , we still get internally vertex-disjoint paths. The component structure does not change.
Finally, satisfies since and .
2. Suppose ends at for some . According to the definition of the residual graph, and since the internal vertices of do not intersect , there must exist such that , and must be of the form . Now consider two cases.
Suppose . Apply the induction hypothesis to the path to get internally vertex-disjoint paths , which contain a path ending at . Since is a simple path, is not in . Thus, , because is the only edge that can be mapped to . Since , we have . is also in since is in the same component as . Thus, there exists a path ending at edge in . By deleting this edge , we get . Since we only delete an edge, all properties of still hold.
Suppose . We apply the induction hypothesis to the path to get internally vertex-disjoint paths , which contain a path ending at . By the same argument as in case 1, we know there is exactly one path in that ends at . Since is a simple path, are not in . Thus, , since is the only edge that can be mapped to . Therefore, we have . If also holds, since is an endpoint of , one path in must be of the form . By deleting the last two edges of the path to , we get , which maintains all the properties of . Thus, the only case left is . In that case we need to do path shifting. We consider two cases, depending on whether is in or not.
If , suppose is the path in containing . Let be the unique path in ending at . We consider two cases. Suppose . Then by adding the edge and deleting from , we get a path ending at (which is the subpath of obtained by deleting the last edge of ), as well as a cycle containing the subpath of from to and the edge . Since is a simple path, the cycle is vertex-disjoint, and the new path is simple. If , then we shift these two paths by adding and deleting from . One of the new paths takes the whole of and the latter subpath of starting from ; the other new path takes the former subpath of ending at . All properties hold.
If , then according to the induction hypothesis, is in a vertex-disjoint cycle. By adding and deleting , we add the cycle into the path and get a new path ending at . All properties hold.
∎
Appendix C CONGEST Algorithms Based on Merging Clusters
In this section we prove the missing lemmas stated in Section 5. They are all based on techniques that start from clusters with single vertices, merge clusters while maintaining cluster properties, and finally combine all clusters into a single cluster.
Proof of Lemma 5.11.
We maintain clusters ; initially, each cluster contains exactly one vertex in . We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and tail clusters merge into head clusters if they share an edge in . Each cluster also maintains the information of whether it contains . Communication inside each cluster happens by broadcasting within the cluster if the cluster has size at most , or by broadcasting through the whole network if it has size more than . Since there are at most vertices in , the number of clusters that broadcast through the whole network is bounded by . Since in each phase each edge has constant probability of merging two clusters (and becoming an edge inside a cluster), after phases there is no edge connecting two different clusters, with high probability. Now, the path is the cluster that contains . ∎
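The coin-tossing merge used in this proof can be illustrated with a small sequential simulation. This is only a sketch under assumptions: it ignores all CONGEST communication costs, the function name `merge_clusters` and the edge-list input are hypothetical, and a tail cluster with several head neighbours simply joins the first one found.

```python
import random

def merge_clusters(vertices, edges, seed=0):
    """Sequential sketch of head/tail cluster merging (no message passing).

    Returns (label, phases): label[v] identifies the final cluster of v.
    """
    rng = random.Random(seed)
    label = {v: v for v in vertices}  # every vertex starts as its own cluster
    phases = 0
    # Repeat while some edge still connects two different clusters.
    while any(label[u] != label[v] for u, v in edges):
        phases += 1
        # Each cluster tosses a fair coin: True means head, False means tail.
        coin = {c: rng.random() < 0.5 for c in set(label.values())}
        target = {}  # tail cluster -> head cluster it merges into
        for u, v in edges:
            cu, cv = label[u], label[v]
            if cu == cv:
                continue
            for tail, head in ((cu, cv), (cv, cu)):
                if coin[head] and not coin[tail] and tail not in target:
                    target[tail] = head
        # Heads never move, so merges cannot form chains within a phase.
        for v in vertices:
            label[v] = target.get(label[v], label[v])
    return label, phases
```

Because only tail clusters move and only into head clusters, every merge within a phase is a star around a head cluster, which is what keeps the cluster bookkeeping simple.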
Proof of Lemma 5.6.
If a vertex is repeated times, we just treat it as vertices, in which case we can treat as a simple path. We maintain clusters ; initially, each cluster contains exactly one vertex in . Each cluster is a subpath of . Suppose ; each vertex also maintains the value . Initially the value is simply .
We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and tail clusters merge into head clusters if they share an edge , where is the head cluster and is the tail cluster. One can see that each tail cluster can merge into at most head cluster, and each head cluster can absorb at most tail cluster. sends the value to all the vertices in the cluster containing , and they add this value to their results. Communication inside each cluster happens by broadcasting within the cluster if the cluster has size at most , or by broadcasting through the whole network if it has size more than . Since the path has length , the number of clusters that broadcast through the whole network is bounded by . Since in each phase each edge has constant probability of merging two clusters (and becoming an edge inside a cluster), after phases there is no edge connecting two different clusters, with high probability. ∎
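The following sketch shows how merging subpaths can maintain path aggregates. The exact quantity of Lemma 5.6 is not recoverable from the text here, so the code assumes inclusive suffix sums, and it assumes that a head cluster absorbs the tail cluster immediately to its right. The name `suffix_sums_by_merging` is hypothetical, and the simulation is sequential rather than a CONGEST procedure.

```python
import random

def suffix_sums_by_merging(values, seed=0):
    """Compute inclusive suffix sums on a path by merging adjacent subpaths.

    Assumption: a head cluster absorbs the tail cluster directly after it.
    Returns (result, phases) with result[i] = values[i] + ... + values[-1].
    """
    rng = random.Random(seed)
    n = len(values)
    result = list(values)                # suffix sum restricted to own cluster
    bounds = [(i, i) for i in range(n)]  # clusters as (first, last) indices
    phases = 0
    while len(bounds) > 1:
        phases += 1
        heads = [rng.random() < 0.5 for _ in bounds]
        merged, i = [], 0
        while i < len(bounds):
            lo, hi = bounds[i]
            if i + 1 < len(bounds) and heads[i] and not heads[i + 1]:
                nlo, nhi = bounds[i + 1]
                carry = result[nlo]      # total of the absorbed tail cluster
                for j in range(lo, hi + 1):
                    result[j] += carry   # head vertices add the received value
                merged.append((lo, nhi))
                i += 2
            else:
                merged.append((lo, hi))
                i += 1
        bounds = merged
    return result, phases
```

In this sketch the first vertex of a cluster always holds the cluster total, so a single value crossing the shared edge is enough to update the whole head cluster.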
Proof of Lemma 5.9.
The idea is to combine directed rooted trees while maintaining the tree properties. We first prove the following frequently used claim: given several vertex-disjoint rooted directed trees inside , where each vertex knows its subtree size inside its tree, and given each vertex a number , with dilation and congestion , each vertex gets , where is the subtree rooted at .
To prove the claim, we use the idea of heavy-light tree decomposition and Lemma 5.6. We first decompose each tree into paths: each vertex builds a directed edge to one of its children with the largest subtree size. The decomposition has the property that every path from a root to a leaf goes through at most paths. To see this, consider each time the path from the root to a leaf leaves a path of the decomposition: since a vertex always points to the child with the largest subtree size, a child without the edge from its parent must have at most half of the parent's subtree size. Thus, we only need to run the algorithm in Lemma 5.6 for steps: initially, each vertex gets the value , and each path calculates , which is the sum of for all after on this path. At the beginning of any later step, each vertex sends the value to its parent if is the beginning of some path, i.e., the parent of does not have an edge pointing to . Then all vertices set the received value as and redo Lemma 5.6. For each path, if its length is at most , then we do the calculation just by pipelining on this path; otherwise we use Lemma 5.6, which causes dilation and congestion , where is the length of the path. Since has vertices, there are at most such instances, while the sum of the congestion of all instances is bounded by . Thus, the claim is proved.
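The halving argument behind the heavy-light decomposition can be checked on a small tree. The sketch below is sequential and only illustrates the decomposition, not the distributed computation; the names `heavy_light` and `light_edge_bound` and the dict-of-children tree representation are assumptions made for illustration.

```python
import math

def heavy_light(children, root):
    """Return (size, heavy, light_depth) for a rooted tree.

    children: dict mapping a vertex to the list of its children.
    heavy[v] is the child of v with the largest subtree (ties: first listed).
    light_depth[v] counts non-heavy edges on the root-to-v path.
    """
    order, stack = [], [root]
    while stack:                      # iterative DFS, parents before children
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):         # children are processed before parents
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    heavy = {v: max(children[v], key=lambda c: size[c])
             for v in order if children.get(v)}
    light_depth = {root: 0}
    for v in order:
        for c in children.get(v, []):
            light_depth[c] = light_depth[v] + (0 if heavy[v] == c else 1)
    return size, heavy, light_depth

def light_edge_bound(n):
    """Each light edge at least halves the subtree size, so at most log2(n)."""
    return math.floor(math.log2(n))
```

A root-to-leaf path enters a new decomposition path exactly when it follows a light edge, so the number of paths it visits is at most `light_depth` plus one.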
Now we proceed to prove the lemma. We maintain clusters ; initially, each cluster contains exactly one vertex in . Each cluster maintains a rooted directed tree, and each vertex in the tree knows the size of the subtree rooted at that vertex. We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and tail clusters merge into head clusters if they share an edge in . Now we consider how to maintain the tree information. Suppose the head cluster is , and there is a tail cluster that wants to merge into through the edge with . first reverses all the edges from to the root of ; this can be done by applying the claim, putting value on and value [math] on any other vertices. All the vertices on the path change their subtree value from to , where is the size of the tree . One can see that this maintains the correct subtree sizes inside . Now we want to maintain the correct subtree sizes inside . sends the tree size of to . Now each leaf of on which a merged subtree hangs gets the value of the tree size. uses the claim to add these values to the original subtree size of each vertex. All the communication happens simultaneously, with the same argument: if a tree has size at most , then all the properties are calculated through a pipeline inside ; otherwise we use the claim, which causes dilation and congestion , while the sum of all is bounded by .
Since each edge becomes an edge inside a cluster with constant probability in each phase, after phases, becomes a single cluster with high probability, since is connected. Each phase causes dilation and congestion , so the total dilation and congestion are and . Now we get a rooted directed tree on . To find the path from to , we can use the claim to find the path from to the root, and from the root to . ∎
Proof of Lemma 5.10.
Each vertex in broadcasts tokens to build a BFS tree: initially only vertices in are active, and in each round every active vertex sends a token to all its neighbors if it has not done so. At the end of each round, if a vertex that has not been activated receives a token (possibly more than one token), then it joins the BFS tree of an arbitrary vertex that sent it the token, and then becomes active. The procedure has dilation and congestion , since each active vertex sends only once, and each vertex has distance at most to a vertex in .
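The token procedure above is a multi-source breadth-first search. A minimal sequential sketch follows, assuming an adjacency-list dict and a hypothetical function name `bfs_forest`; ties between several senders are broken by queue order, which is one valid choice of "arbitrary".

```python
from collections import deque

def bfs_forest(adj, sources):
    """Multi-source BFS: every vertex joins the tree of a nearest source.

    adj: dict mapping a vertex to a list of neighbours.
    Returns (parent, root, depth) dicts over all reached vertices.
    """
    parent = {s: None for s in sources}
    root = {s: s for s in sources}
    depth = {s: 0 for s in sources}
    queue = deque(sources)            # the initially active vertices
    while queue:
        u = queue.popleft()
        for w in adj[u]:              # u sends its token exactly once
            if w not in parent:       # w was not activated before
                parent[w] = u
                root[w] = root[u]
                depth[w] = depth[u] + 1
                queue.append(w)
    return parent, root, depth
```

The depth of each tree equals the distance from its vertices to the nearest source, which matches the depth bound stated for the partition.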
Now we get a partition of , where each part is a tree rooted at a vertex with depth at most ; we denote it as . Each subtree rooted at a child of on is a cluster . All form a partition of , denoted as , and is a rooted tree with depth . Now we want to combine clusters in order to increase the number of vertices in each cluster. Each cluster also maintains a center set , initially containing , where is the cluster which is a subtree of a child of . We will maintain the property that has diameter at most .
The algorithm runs for several phases. In each phase, each cluster with tosses a coin, getting tail or head with equal probability . If there exists an edge where are in different clusters , and has head while has tail, then will merge into . They merge their center sets as well as their vertex sets. Notice that each vertex in any cluster still has distance at most to one of its centers. Thus, as long as a cluster is connected, the set has diameter at most , because all the vertices that have distance to a certain center have mutual distance at most . Since an edge connecting two small clusters has constant probability of being merged in one phase, there are at most phases with high probability. Each phase contains a broadcast with dilation and congestion .
At last, every edge in has the following property: either are in the same cluster, or one of the clusters containing has size at least . Since merging happens only between two clusters that both have size at most , if a final cluster has been merged during the algorithm, it must have size ; otherwise, it has never been merged, which means it remains a rooted tree with depth . Now suppose is in cluster . If , then we are done; otherwise, either is an initial cluster, or and one of the neighbors of has size . In the latter case, we will include both and some part of into . If , then we are done; otherwise is an initial cluster. So the only thing left is the following problem: given a rooted tree with depth , number of vertices and a vertex , find a connected component in containing with size . Without loss of generality, we can assume is the root of , while the depth is still bounded by . We first compute for each vertex in the subtree size , by a pipeline on . Then we compute the preorder traversal number for each vertex in in the following way: suppose a vertex has the preorder traversal number , and it has children ; then gets the preorder traversal number . Starting from the root with preorder traversal number , we compute the preorder traversal numbers for layer 2 vertices in rounds, then for layers 3, 4, and so on. The total dilation is . ∎
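The layer-by-layer preorder numbering can be sketched as follows. The exact formula is not recoverable from the text here, so the code assumes the standard rule: a child receives its parent's number plus one plus the subtree sizes of its earlier siblings, with the root numbered 0. The helper `prefix_component` shows one plausible use, which is an assumption about intent: the vertices with the smallest preorder numbers always form a connected subtree containing the root. Both function names are hypothetical.

```python
def preorder_numbers(children, root):
    """Assign preorder numbers top-down using subtree sizes only.

    children: dict mapping a vertex to the ordered list of its children.
    Assumed rule: child i gets pre[parent] + 1 + sizes of children before i.
    """
    order, stack = [], [root]
    while stack:                      # parents appear before their children
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    pre = {root: 0}                   # assumption: numbering starts at 0
    for v in order:
        offset = pre[v] + 1
        for c in children.get(v, []):
            pre[c] = offset
            offset += size[c]         # skip over the whole subtree of c
    return pre, size

def prefix_component(pre, k):
    """Vertices with preorder number below k: connected, contains the root."""
    return {v for v, p in pre.items() if p < k}
```

Each vertex needs only its own number and the subtree sizes of its children to number them, so one tree layer can be handled after another, as in the layer-by-layer description above.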
References
- [AEG+22] Simon Apers, Yuval Efron, Pawel Gawrychowski, Troy Lee, Sagnik Mukhopadhyay, and Danupon Nanongkai. Cut query algorithms with star contraction. In FOCS, pages 507–518. IEEE, 2022.
- [BDD+82] M. Becker, W. Degenhardt, J. Doenhardt, S. Hertel, G. Kaninke, W. Keber, K. Mehlhorn, S. Näher, H. Rohnert, and T. Winter. A probabilistic algorithm for vertex connectivity of graphs. Information Processing Letters, 15(3):135–136, 1982.
- [CGK14] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. Distributed connectivity decomposition. In PODC, pages 156–165. ACM, 2014.
- [CGL+20] Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. A deterministic algorithm for balanced cut with applications to dynamic connectivity, flows, and beyond. In FOCS, pages 1158–1167. IEEE, 2020.
- [CHGK14a] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. Distributed connectivity decomposition. In Proceedings of the 2014 ACM symposium on Principles of distributed computing, pages 156–165, 2014.
- [CHGK14b] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. A new perspective on vertex connectivity. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 546–561. SIAM, 2014.
- [CKL+22] Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In FOCS, pages 612–623. IEEE, 2022.
- [CPZ19] Yi-Jun Chang, Seth Pettie, and Hengjie Zhang. Distributed triangle detection via expander decomposition. In SODA, pages 821–840. SIAM, 2019.
