Finding a Small Vertex Cut on Distributed Networks

Yonggang Jiang; Sagnik Mukhopadhyay

arXiv:2302.11651·cs.DS·February 24, 2023

Abstract

We present an algorithm for distributed networks to efficiently find a small vertex cut in the CONGEST model. Given a positive integer $\kappa$, our algorithm can, with high probability, either find $\kappa$ vertices whose removal disconnects the network or return that such $\kappa$ vertices do not exist. Our algorithm takes $\kappa^{3}\cdot\tilde{O}(D+\sqrt{n})$ rounds, where $n$ is the number of vertices in the network and $D$ denotes the network's diameter. This implies $\tilde{O}(D+\sqrt{n})$ round complexity whenever $\kappa=\mathrm{polylog}(n)$. Prior to our result, a bound of $\tilde{O}(D)$ is known only when $\kappa=1,2$ [Parter, Petruschka DISC'22]. For $\kappa\geq 3$, this bound can be obtained only by an $O(\log n)$-approximation algorithm [Censor-Hillel, Ghaffari, Kuhn PODC'14], and the only known exact algorithm takes $O((\kappa\Delta D)^{O(\kappa)})$ rounds,…

Equations

$N(V^{\prime})=\{v\mid\exists(u,v)\in E,\ u\in V^{\prime},\ v\notin V^{\prime}\}$

$V^{\prime}=\{u^{out}\mid u\notin X\}\cup\{u^{in},u^{out}\mid u\in X\}$

$E^{\prime}=\ \ldots\ \cup\{(v^{in},u^{out})\mid v,u\in X,\ (u,v)\in E(P)\}\cup\{(v^{out},u^{out})\mid(u,v)\in E(P),\ v\notin X\}\cup\{(u^{out},u^{in})\mid u\in X\}$

Yonggang Jiang, MPI-INF, Germany

Sagnik Mukhopadhyay, University of Sheffield, UK

We present an algorithm for distributed networks to efficiently find a small vertex cut in the CONGEST model. Given a positive integer $\kappa$ , our algorithm can, with high probability, either find $\kappa$ vertices whose removal disconnects the network or return that such $\kappa$ vertices do not exist. Our algorithm takes $\kappa^{3}\cdot\tilde{O}(D+\sqrt{n})$ rounds, where $n$ is the number of vertices in the network and $D$ denotes the network’s diameter. This implies $\tilde{O}(D+\sqrt{n})$ round complexity whenever $\kappa=\mathrm{polylog}(n)$ .

Prior to our result, a bound of $\tilde{O}(D)$ is known only when $\kappa=1,2$ [Parter, Petruschka DISC’22]. For $\kappa\geq 3$ , this bound can be obtained only by an $O(\log n)$ -approximation algorithm [Censor-Hillel, Ghaffari, Kuhn PODC’14], and the only known exact algorithm takes $O\left((\kappa\Delta D)^{O(\kappa)}\right)$ rounds, where $\Delta$ is the maximum degree [Parter DISC’19]. Our result answers an open problem by Nanongkai, Saranurak, and Yingchareonthawornchai [STOC’19].

1 Introduction
1.1 Our Result
1.2 Techniques
1.3 Open problems
2 Overview
2.1 IsolatingSmallCut (proof sketch of Lemma 2.1; full proof in Section 4)
2.2 SingleSourceLocalCut (proof sketch of Lemma 2.2; full proof in Section 5.)
2.3 Putting everything together
2.4 Organization
3 Preliminary
3.1 Basic Definitions
3.2 Vertex Residual Graph
4 IsolatingSmallCut (Proof of Lemma 2.1)
4.1 Distributed Algorithm Details
4.2 Proof of Lemma 4.3
4.3 Analysis of Algorithm
5 SingleSourceLocalCut (Proof of Lemma 2.2)
5.1 Path Centered Clustering
5.2 Algorithm Overview
5.3 Step 1: Find hubs and active hubs.
5.4 Step 2: Build partial virtual graph.
5.5 Step 3: Find augmenting path.
5.6 Substep: Find path from $h_{0}$ to $h^{*}$ .
5.7 Step 4: Change path in $G^{\prime\prime}$ to $G^{\prime}$ and update $P$ .
5.8 Analysis of Algorithm
A Clustering and Partial Virtual Graph
A.1 Proof of Lemma 5.4
A.2 Proof of Lemma 5.7
B Proof of Vertex Residual Graph Lemmas
C CONGEST Algorithms Based on Merging Clusters

1 Introduction

For any undirected non-complete graph $G=(V,E)$ (for complete graphs the problem is trivial, so we ignore this case), a set $S\subseteq V$ is called a vertex cut if $G\setminus S$ contains at least two connected components, where $G\setminus S$ is obtained by removing the vertices in $S$ from $G$. In the vertex cut or vertex connectivity problem, we are given a positive integer $\kappa$ and want to either find a vertex cut of size at most $\kappa$ or to answer that no such vertex cut exists. Vertex cut is a fundamental graph property and computing it is one of the most basic problems in graph algorithms. For example, it quantifies the vulnerability of a communication network in terms of the minimum number of vertices whose failures can disconnect the network. In the sequential model, this problem has been extensively studied over many decades (e.g. [Kle69, Tar72, HT73, ET75, Eve75, BDD+82, LLW88, KR91, CT91, NI92b, HRG00, Gab06, Geo10, HRW20, NSY19, FNY+20, LNP+21]). For $\kappa=1$, a linear-time algorithm via depth-first search was long known due to Tarjan [Tar72]. For $\kappa=2$, the linear-time algorithm was due to Hopcroft and Tarjan [HT73]. For $\kappa=\mathrm{polylog}(n)$, an $\tilde{O}(m\kappa^{2})$-time algorithm was recently discovered by [NSY19, FNY+20]. For other values of $\kappa$, a reduction to maxflow by [LNP+21] together with the very recent fast maxflow algorithm of [CKL+22] led to an almost-linear time algorithm.
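The definition of a vertex cut can be checked directly on a small graph. The following is a minimal sequential sketch, not part of the paper's algorithm; the function name `is_vertex_cut` and the adjacency-dictionary representation `adj` are assumptions made for illustration.

```python
from collections import deque

def is_vertex_cut(adj, S):
    """Return True if removing the vertex set S leaves at least two connected components.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    remaining = set(adj) - set(S)
    if len(remaining) < 2:
        return False  # fewer than two vertices cannot form two components
    start = next(iter(remaining))
    seen = {start}
    queue = deque([start])
    while queue:  # BFS restricted to the vertices that were not removed
        u = queue.popleft()
        for v in adj[u]:
            if v in remaining and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen != remaining  # some surviving vertex is unreachable from start

# A 4-cycle a - b - c - d - a: removing two opposite vertices disconnects it.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
```

On the 4-cycle above, `{"a", "c"}` is a vertex cut while no single vertex is one.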

To conclude, the vertex cut/connectivity problem is almost solved in the sequential setting. However, when it comes to distributed networks computing their own vertex cut, much less is known. This is the case even when a network wants to find only a few (say, $\kappa=2$) vertices whose failures might destroy its communication. A distributed algorithm for finding a small vertex cut is the focus of this paper.

Distributed Vertex Cut.

We study the vertex cut problem in the CONGEST model of distributed networks. In this model, an undirected graph $G=(V,E)$ is given as the communication network. Two important parameters are $n:=|V|$ and $D$, the diameter of $G$. Time is divided into discrete rounds. In each round, each vertex can send an $O(\log n)$-bit message to each of its neighbors. After each round, each vertex can locally perform arbitrary computation and decide what to send in the next round. Initially, each vertex is given a specified input indicating some local information of the network (e.g. neighbors and weights of its incident edges). For the vertex cut problem, the input of each vertex is simply the set of its neighbors and the integer $\kappa$. After several rounds, all vertices are expected to terminate and generate the desired output. For the case of the vertex cut problem, we expect at most $\kappa$ vertices to identify themselves as being in a vertex cut if such a cut exists; otherwise, every vertex knows that such a cut does not exist. The goal is to minimize the number of rounds before all vertices terminate.
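To make the round structure concrete, here is a small sequential simulation of a BFS flood in the spirit of the CONGEST model. It is only an illustrative sketch: the function name `congest_bfs` and the dictionary-based graph are assumptions, and each simulated message carries a single hop count, which fits in $O(\log n)$ bits.

```python
def congest_bfs(adj, root):
    """Simulate synchronous rounds of a BFS flood from root.

    Returns (dist, rounds): hop distances from root and the number of rounds used.
    """
    dist = {root: 0}
    frontier = [root]
    rounds = 0
    while frontier:
        # One round: every frontier vertex sends its hop count + 1 to each neighbor.
        inbox = {}
        for u in frontier:
            for v in adj[u]:
                inbox.setdefault(v, []).append(dist[u] + 1)
        rounds += 1
        # Local computation after the round: newly reached vertices fix their distance.
        frontier = []
        for v, msgs in inbox.items():
            if v not in dist:
                dist[v] = min(msgs)
                frontier.append(v)
    return dist, rounds

# A path a - b - c - d, flooded from one end.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
dist, rounds = congest_bfs(path, "a")
```

The number of simulated rounds grows with the distance from the root to the farthest vertex, which is why bounds in this model typically include a term depending on the diameter $D$.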

The CONGEST model is a standard model for studying basic graph algorithms in message-passing distributed networks, e.g. minimum spanning tree (MST), shortest paths, min-cut, and approximate maxflow [GKP98, KP98, PR00, Elk06, DHK+12, GK13, NS14, GKK+15, FN18, Elk20, GL18, DEMN21]. These problems typically admit a trivial lower bound of $\Omega(D)$; thus, the focus is usually on the dependency on $n$. A large number of graph problems were shown to require $\tilde{\Theta}(D+\sqrt{n})$ rounds, and this bound has become a gold standard. (Throughout, $\tilde{O}$, $\tilde{\Omega}$, and $\tilde{\Theta}$ hide $\mathrm{polylog}(n)$ factors.) Examples of such problems include MST [GKP98, KP98, PR00, Elk06, DHK+12], approximate shortest paths [LP13, NS14, HKN21], approximate 2-edge connected spanning subgraph (2-ECSS) [Dor18, DG19], tree packing [CGK14] and approximate maxflow [GKK+18]. For cut-related problems, a line of work (e.g. [DHK+12, GK13, NS14, DHNS19, GNT20, DEMN21]) led to an $\tilde{O}(D+\sqrt{n})$ bound for computing edge cut $\lambda$ that holds even for weighted graphs [DEMN21, MN20]. The bound matches the lower bounds from [GK13, DHK+12] (the lower bounds hold when $\lambda$ is large enough). (Specifically, [DHK+12] proved a lower bound of $\tilde{\Omega}(\sqrt{n})$ for computing weighted mincut on some graphs of diameter $D=\Theta(\log n)$. For the unweighted case, it follows from [GK13, Theorem 6.4] that for any $\epsilon>0$, there is a lower bound of $\tilde{\Omega}(n^{1/2-\epsilon})$ on some graphs with diameter $D=\tilde{O}(n^{1/2-3\epsilon})$ and edge cut $n^{2\epsilon}$.) Moreover, when the edge cut $\lambda$ is small, better algorithms exist: For $\lambda\in\{1,2\}$, the problem can be solved in $O(D)$ time [PT11]. For other values of $\lambda$, there is an $O((\lambda D)^{O(\lambda)})$-round algorithm [Par19]. (The last bound is small under a typical assumption that $D\ll n$.)

In sharp contrast with the above, our understanding of distributed vertex cut is much less complete. To the best of our knowledge, existing algorithms consist of

1. an $O(D+\sqrt{n}\log^{*}(n))$-round algorithm that works only when $\kappa=1$ [Thu97],

2. an $O(D+\Delta/\log{n})$-round algorithm that works only when $\kappa=1$ [PT11] ($\Delta$ denotes the maximum degree),

3. an $O(\log n)$-approximation $\tilde{O}(\sqrt{n}+D)$-round algorithm [CHGK14a],

4. an $O\left((\kappa\Delta D)^{O(\kappa)}\right)$-round algorithm [Par19], and

5. an $\widetilde{O}(D)$-round algorithm that works only when $\kappa=1,2$ [PP22].

Thus, even to find $\kappa=3$ vertices that can disconnect the network, the available solutions are to either settle for a much bigger approximate solution of size $\Theta(\log n)$ [CHGK14b] or find an exact solution in $O\left((\kappa\Delta D)^{O(\kappa)}\right)$ time [Par19], which can be prohibitively slow for typical networks with large-degree “hubs” (e.g. the star networks). In other words, even for $\kappa=3$ we are already very far from the typical $\tilde{O}(\sqrt{n}+D)$-time exact algorithms!

Challenges.

A fundamental difficulty in solving the vertex cut problem is its tight connection to maxflow computation. For example, while edge cut algorithms that are faster than solving maxflow were known in the sequential model for many decades (e.g. [NI92a, Kar00, MN20, GMW20, GMW21]), it was only very recently that a vertex cut algorithm that is as fast as solving maxflow (and not faster) was found [LNP+21]. The situation is even worse in the distributed setting. For example, consider the case where we know two vertices $s$ and $t$ such that removing $\kappa$ vertices in $G$ would disconnect $s$ from $t$ (this is a basic case that all the state-of-the-art sequential algorithms have to solve [LNP+21, FNY+20, NSY19]). When $\kappa=O(1)$, one can solve vertex cut in linear time in the sequential model using the Ford-Fulkerson algorithm. In contrast, in the distributed setting we cannot even solve this case in the typical $\tilde{O}(\sqrt{n}+D)$ rounds because it generalizes the distributed reachability problem, whose best-known round complexities are $\tilde{O}(D+\sqrt{n}D^{1/4})$ [GU15] and $\widetilde{O}(\sqrt{n}+n^{1/3+o(1)}\cdot D^{2/3})$ [LJS19]. More generally, the distributed setting poses an additional challenge for computing vertex cut because there is no non-trivial maxflow algorithm available. (The exception is the approximate maxflow algorithm of [GKK+15]; however, approximate maxflow was not known to be useful for solving vertex cut.) Thus, to design distributed vertex cut algorithms, one needs to overcome fundamental questions of whether one could avoid maxflow computations or develop maxflow algorithms specialized for solving vertex cut. Since efficient maxflow algorithms are not available in many models of computation (e.g. graph streaming and parallel computing), answering these questions may lead to efficient vertex cut algorithms in other models as well.

1.1 Our Result

We show that, in $\tilde{O}(D+\sqrt{n})$ rounds, a distributed network can find up to $O(\mathrm{polylog}(n))$ vertices whose removal disconnects it. More generally, our result is the following.

Theorem 1.1 (Informal. See Theorem 2.11 for a formal version.).

There is a randomized algorithm in the CONGEST model that, with input $\kappa<n^{1/4}$ and undirected graph $G$, takes $\kappa^{3}\cdot\tilde{O}(D+\sqrt{n})$ rounds and determines whether $G$ is $\kappa$-vertex-connected or not; if not, it outputs the minimum vertex cut. (When $\kappa\geq n^{1/4}$, our running time guarantee becomes at least $\Omega(\kappa n)$, which is quite bad, so we do not consider this case here. Also notice that by running the Ford–Fulkerson algorithm from a sampled node to all other nodes in parallel, one can easily get an $O(\kappa n)$ algorithm.)

Our bound can be thought of as generalizing the $\tilde{O}(D+\sqrt{n})$ bound of [Thu97] that works only when $\kappa=1$ to any $\kappa=O(\log n)$ ; however, the techniques we use are very different. It is sublinear in $n$ as long as $\kappa\ll n^{1/6}$ . Our result answers an open problem from [NSY19].
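The sublinearity threshold can be verified with a short calculation. As a simplifying assumption made here, only the $\sqrt{n}$ term is considered (that is, $D\leq\sqrt{n}$) and polylogarithmic factors are ignored:

$$\kappa^{3}\sqrt{n}\leq n\iff\kappa^{3}\leq\sqrt{n}\iff\kappa\leq n^{1/6}.$$

For example, for $n=10^{6}$ the bound $\kappa^{3}\sqrt{n}$ stays below $n$ for $\kappa$ up to $10$.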

1.2 Techniques

We provide a detailed overview of this framework and our algorithm in the next section. Here, we discuss some challenges and techniques to overcome them that might be of independent interest. Our algorithm follows the framework used by the algorithms of [NSY19, FNY+20] for solving vertex cut in $\tilde{O}(m\kappa^{2})$ time in the sequential model, where $m$ denotes the number of edges. These algorithms consider two types of vertex cuts of size $\kappa$ (assuming that they exist): a vertex cut that leads to a small connected component $C$ is called unbalanced and otherwise it is called balanced.

To find these cuts, we have to execute some maxflow algorithms which keep finding augmenting paths. For intuition, suppose that there are $\kappa$ internally vertex-disjoint $(s,t)$-paths between two vertices $s$ and $t$. An augmenting path is an $(s,t)$-path that, together with the existing paths, lets us create $\kappa+1$ internally vertex-disjoint $(s,t)$-paths. (See Fig. 1 for an example and Section 2 for a more detailed definition.) Finding an augmenting path is useful because we can show that it exists if and only if there is no vertex cut of size $\kappa$ that disconnects $s$ and $t$. We now consider finding two types of cuts. Note that below we use ‘Lemma’ for lemmas that are used to provide intuition and are not actually proven.

Finding Unbalanced Cuts: Local Flows and Resolving Congestions (‘Lemmas’ 2.6 and 4.3).

To find unbalanced cuts, [NSY19, FNY+20] use local flow algorithms. Like many maxflow algorithms, a local flow algorithm keeps finding augmenting paths to increase the flow size; however, under some conditions, it can find augmenting paths without reading the whole input graph. For example, for vertex cut, [NSY19, FNY+20] use local flow algorithms to solve a problem where, given a vertex $s$ in the above small connected component $C$, the algorithms can find the cut vertices in time roughly the size of the connected component $C$ defined above (more precisely, the volume of $C$), which can be much less than the size of the whole input graph. By not reading the whole graph, we can execute multiple local flow algorithms in near-linear time in total. This feature plays a key role in designing many efficient sequential algorithms, e.g. finding balanced cuts [ST13, SW19], edge cut [KT15, HRW20], and dynamically maintaining expanders [Wul17, NS17, NSW17, SW19, CGL+20].

Applying the above idea in the CONGEST model, however, requires solving the congestion issue: many augmenting paths from different executions may go through the same edge. For example, the sequential vertex cut algorithms of [NSY19, FNY+20] need to compute $\Omega(n)$ local flows at some point, and we cannot rule out the case where all these executions require augmenting paths that share the same edge, which would cause $\Omega(n)$ rounds to modify all $\Omega(n)$ flows along these augmenting paths.

Congestion is a fundamental issue in the CONGEST model (thus the name). It is typically avoided by not executing too many algorithms in parallel. However, for vertex cut, we do not know how to avoid this. As far as we know, the same issue also arose in the distributed expander decomposition computation [CPZ19, CS19], where the authors use PageRank algorithms instead of local flow algorithms (both algorithms can be used to compute the expander decomposition in the sequential model). Then, they exploit the property of PageRank to show that there is not much congestion, thus the congestion issue can be avoided.

In this paper, we solve the congestion issue differently. Essentially, we show that even when there are huge congestions, an $\Omega(1)$ fraction of the executions can still proceed. To show this, we prove the following (see ‘Lemmas’ 2.6 and 4.3 for detail). We have up to $\Omega(n)$ executions of the local flow algorithm of [FNY+20] running in parallel. Consider two augmenting paths $p_{1}$ and $p_{2}$ from two executions with sources $s_{1}$ and $s_{2}$. If $p_{1}$ and $p_{2}$ meet at some vertex $t$, then there is a path $p$ either from $s_{1}$ to $s_{2}$ or from $s_{2}$ to $s_{1}$ that uses only edges explored by the two executions so far such that $p$ *can be used as an augmenting path by one of the two executions*. (Here, we also exploit the fact that a source of one execution can be a sink for other executions.) In other words, if the augmenting paths from two executions meet at the same vertex, then one of them can augment to another one.

This argument can be extended to show that if many augmenting paths meet at a vertex, then they can stop and only use what they have explored to finish the augmentation for half of them. This property helps reduce congestion when finding augmenting paths from different vertices.

To conclude, the above property allows us to find a vertex cut of size $\kappa$ in $\tilde{O}(\kappa^{3}\alpha)$ where $\alpha:=|C|$ , the number of vertices in one of the connected components in the cut (see Lemma 2.1 for detail). Finally, note that given the prevalence of local flow algorithms in designing efficient graph algorithms, similar issues to the above may arise for other problems, and it is interesting to see if our technique can be applied elsewhere.

Finding Balanced Cuts: Specialized Fast Reachability Algorithm (‘Lemma’ 2.7).

Before discussing this case, note that the above algorithm with round complexity $\tilde{O}(\kappa^{3}\alpha)$ already lends itself to a sublinear time algorithm for vertex cut with $\kappa=O(1)$ —one can use this algorithm for small $\alpha$ , and Ford-Fulkerson and reachability algorithm when $\alpha$ is large. In order to improve the round complexity to $\tilde{O}(D+\sqrt{n})$ even when $\kappa=O(1)$ , there is another fundamental barrier: the need to solve the distributed reachability problem.

For concreteness, assume that removing $\kappa=O(1)$ vertices leaves us with two connected components $A$ and $B$, each of $\Omega(n)$ vertices. This case cannot be solved efficiently by the local flow algorithm since $\alpha=\Omega(n)$. In the sequential setting, this case can be easily solved by sampling two vertices $s$ and $t$ and computing an $(s,t)$-maxflow of size $\Theta(\kappa)$ in a graph. To do this, simply find augmenting paths for $\Theta(\kappa)$ rounds (i.e. the Ford-Fulkerson algorithm). This takes $O(m\kappa)$ time and succeeds with constant probability (since $\Pr[s\in A\text{ and }t\in B]=\Omega(1)$). In the CONGEST model, however, even answering a simpler question of whether there is one augmenting path from $s$ to $t$ (i.e., solving the $(s,t)$-reachability) requires more than $\tilde{O}(D+\sqrt{n})$ rounds: The best distributed algorithms for reachability require $\tilde{O}(D+\sqrt{n}D^{1/4})$ rounds [GU15] and $\widetilde{O}(\sqrt{n}+n^{1/3+o(1)}\cdot D^{2/3})$ rounds [LJS19].
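The constant success probability of the sampling step can be spelled out as follows. As assumptions made here for illustration, $s$ and $t$ are sampled independently and uniformly at random from $V$, and $|A|,|B|\geq cn$ for some constant $c>0$ (the name $c$ is introduced only for this calculation):

$$\Pr[s\in A\text{ and }t\in B]=\frac{|A|}{n}\cdot\frac{|B|}{n}\geq c^{2}=\Omega(1).$$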

In this paper, we develop an algorithm specialized for our case: when we want to find an augmenting path, we are solving a reachability problem where most edges in the graph are undirected. A result implied by our technique when $\alpha=\Omega(n)$ is as follows. (See ‘Lemma’ 2.7 for the full statement.)

‘Lemma’ 1.2.

There exists a randomized CONGEST algorithm that, given two vertices $s,t\in V$ and a set of $\ell$ internally vertex-disjoint $(s,t)$ -paths $P$ , either returns an augmenting path or declares that such path does not exist. The algorithm takes $\ell^{2}\cdot\widetilde{O}(D+\sqrt{n})$ rounds.

So, to find $\kappa$ internally vertex-disjoint $(s,t)$ -paths, we use the above algorithm $\kappa$ times, taking $\kappa^{3}\cdot\widetilde{O}(D+\sqrt{n})$ rounds in total. This partially explains the round complexity of our final algorithm.
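The $\kappa^{3}$ factor can be obtained by summing the cost of the individual calls. Assuming, as a reading of the statement above, that the call that extends a set of $\ell$ paths to $\ell+1$ paths costs $\ell^{2}\cdot\widetilde{O}(D+\sqrt{n})$ rounds and that $\ell$ ranges over $0,1,\ldots,\kappa-1$, bounding each term by $\kappa^{2}$ gives

$$\sum_{\ell=0}^{\kappa-1}\ell^{2}\cdot\widetilde{O}(D+\sqrt{n})\leq\kappa\cdot\kappa^{2}\cdot\widetilde{O}(D+\sqrt{n})=\kappa^{3}\cdot\widetilde{O}(D+\sqrt{n}).$$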

The main technique for proving the above ‘lemma’ is to modify the framework in the reachability algorithms [Nan14, GU15, LJS19]: As usual, we sample hubs and grow BFS trees from each hub and build a virtual graph on the hubs. Our novelty is to use a clustering technique (Lemma 5.4) to create a small number of strongly connected components (or clusters) and give them some ordering with the following guarantee: Any vertex in a cluster can reach any vertex in another cluster which is ordered lower than the former cluster. This clustering lets us reduce the number of vertices and edges in the virtual graph without affecting reachability as well as makes it possible to broadcast the whole virtual graph. See Section 2.2 for an overview of this algorithm.

1.3 Open problems

This paper presents a study on the computational complexity of the vertex connectivity problem for small $\kappa$ in the CONGEST model. There are several avenues for future research that may further improve upon the findings presented in this study.

Vertex connectivity in CONGEST model.

• (Small $\kappa$) For small values of $\kappa$, it would be interesting to investigate whether it is possible to surpass the $O(D+\sqrt{n})$ running time, given that there is no $\Omega(D+\sqrt{n})$ lower bound for unweighted vertex connectivity. Although algorithms have been developed that run in $\tilde{O}(D)$ rounds for $\kappa=1,2$, the true complexity for larger $\kappa$ remains unknown.

• (Large $\kappa$) The current best algorithms for the general vertex connectivity problem in the CONGEST model do not have sub-linear time complexity when $\kappa$ is as large as $\Theta(n)$. It would be interesting to explore the development of sub-linear algorithms for cases where $\kappa$ is large.

• (Universally optimal) In recent years, there have been many papers seeking universally optimal algorithms, starting from the work by Haeupler, Wajc and Zuzic [HWZ21]. Since our algorithm meets the $\tilde{O}(D+\sqrt{n})$ upper bound for $\kappa=\text{polylog}(n)$, it would be interesting to explore the development of an algorithm that is universally optimal.

Parallel vertex connectivity.

By combining the current best sequential algorithm for small $\kappa$ with the current best parallel algorithm for reachability with depth $n^{1/2}$ , it is possible to develop an almost linear work parallel algorithm with depth $n^{3/4}$ . It would be interesting to investigate whether it is possible to further reduce the depth of the algorithm to the best reachability algorithm depth of $n^{1/2}$ or better. As this paper provides an example of surpassing the reachability running time for small $\kappa$ in the CONGEST model, it is reasonable to expect that similar improvements may be possible in the parallel model as well.

Other models of computation.

In addition to advancements in the CONGEST and parallel models of computation, we would like to see further advancements in cut-query and two-party communication models, both in classical and quantum settings, for the problem of vertex connectivity (and minimum vertex cut). Notably, edge connectivity (and minimum edge cut) has nearly been resolved within the classical setting [RSW18, MN20, LLSZ21] and considerable progress has been achieved within the quantum setting [LSZ21, AEG+22]. However, no substantial progress has been made for vertex connectivity.

2 Overview

In this section, we sketch the proof of our main result, i.e. Theorem 1.1. For notations, we use the following: for $S\subseteq V$ , $\mathsf{N}(S)=\{v\mid\exists(u,v)\in E,u\in S,v\not\in S\}$ denotes the neighbors of $S$ in graph $G=(V,E)$ , and $\mathsf{N}^{+}(S)=\mathsf{N}(S)\cup S$ .
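The neighborhood notation can be made concrete with a short sketch. The function names `neighbors` and `closed_neighbors` and the adjacency-dictionary representation `adj` are assumptions introduced for illustration; they correspond to $\mathsf{N}(S)$ and $\mathsf{N}^{+}(S)$ respectively.

```python
def neighbors(adj, S):
    """N(S): vertices outside S that are adjacent to some vertex in S."""
    S = set(S)
    return {v for u in S for v in adj[u] if v not in S}

def closed_neighbors(adj, S):
    """N^+(S) = N(S) together with S itself."""
    return neighbors(adj, S) | set(S)

# A path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

On this path, $\mathsf{N}(\{a,b\})=\{c\}$, so $L=\{a,b\}$ is one side of a vertex cut with $S=\{c\}$ and $R=\{d\}$.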

The crux of our algorithm is the subroutines called IsolatingSmallCut and SingleSourceLocalCut, which give guarantees as in Lemmas 2.1 and 2.2 below. We sketch the proofs of Lemmas 2.1 and 2.2 in Section 2.1 and Section 2.2 respectively. Then, in Section 2.3, we show how to combine them together by following the framework of [FNY+20].

We also denote the vertex cut of the graph $G$ by $(L,S,R)$, where $L$ and $R$ with $|L|\leq|R|$ are the two sides of the cut, and $S$ is the set of vertices whose removal disconnects $L$ from $R$. Lemma 2.1 roughly guarantees that if we have a set of vertices $A\subseteq V$ such that, for some $\kappa$-cut $(L,S,R)$, exactly one vertex in $A$ is in $\mathsf{N}^{+}(L)$ (i.e. $L$ and its neighbors), then we will be able to find such a cut or a similar cut in $\widetilde{O}(\kappa^{3}|L|)$ rounds. So, to find a small vertex cut $(L,S,R)$ when $|L|$ is small (the “unbalanced case” mentioned earlier), this algorithm will be fast assuming that we can find such an $A$. For intuition, note the following related sequential algorithms. (i) In [LNP+21], the same statement as ours is proved in the sequential setting with an algorithm that takes max-flow time (which is currently almost linear [CKL+22]). This is done via the isolating cut technique [LP20], thus the word “Isolating” in the name of our algorithm. Unfortunately, we cannot use the same technique since we do not have an efficient exact max-flow algorithm in the distributed setting. (ii) In [FNY+20], a similar statement can be guaranteed in $O(m\kappa^{2})$ time in the sequential setting. Compared to our requirement that $|A\cap\mathsf{N}^{+}(L)|=1$, the statement of [FNY+20] requires a weaker condition that $|A\cap L|\geq 1$ (an $A$ that satisfies this condition can be easily found, e.g. $A=V$). As we will show in Section 2.1, our algorithm follows the idea of [FNY+20], but our stricter condition gives us some leverage to avoid the congestion issue that we would face if we simply followed the ideas of [FNY+20] (discussed in the previous section).

Lemma 2.1 (IsolatingSmallCut( $G=(V,E),A\subseteq V,\kappa,\alpha$ ); Proof in Section 4).

There exists a CONGEST algorithm that, given an undirected graph $G=(V,E)$, a set of vertices $A\subseteq V$, and $\kappa,\alpha\in\mathbb{N}$ (every vertex knows its membership in $A$ as well as $\kappa$ and $\alpha$), either outputs a valid $\kappa$-cut $(L,S,R)$ (every vertex knows its membership in $S$) with one side $L$ such that $|A\cap\mathsf{N}^{+}(L)|=1$, or outputs $\bot$. The output satisfies

• if there exists a vertex set $L\subseteq V$ such that $|\mathsf{N}(L)|<\kappa$, $|A\cap\mathsf{N}^{+}(L)|=1$, and $|L|\leq\alpha$, then the algorithm outputs $\bot$ with at most constant probability (when we say “with constant probability” in this paper, we mean a constant less than 1), and

• the algorithm runs in $\tilde{O}(\kappa^{3}\alpha)$ rounds.

Lemma 2.2 roughly guarantees that if we know two vertices $s$ and $t$ that are on the opposite sides of a $\kappa$ -cut, i.e. for some $\kappa$ -cut $(L,S,R)$ we have $s\in L$ and $t\in R$ , then we can find a $\kappa$ -cut efficiently; here, “efficiently” means the dilation of $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ and congestion of $\widetilde{O}(\kappa^{2.5}|L|/\sqrt{n})$ . We need the congestion to be $\widetilde{O}(\kappa^{2.5}|L|/\sqrt{n})$ so that we can run $O(n/|L|)$ algorithms with different $s,t$ simultaneously, while still keeping the running time $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ . It is necessary to run $\Theta(n/|L|)$ algorithms since we need to sample $\Theta(n/|L|)$ vertices to guarantee at least one vertex is inside $L$ . Each algorithm will take one sampled vertex as $s$ .
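The two quantitative claims in this paragraph can be checked with a short calculation. The first assumes, for illustration, that $c\cdot n/|L|$ vertices are sampled independently and uniformly at random (the constant $c$ is introduced here); the probability that none of them lands in $L$ is

$$\left(1-\frac{|L|}{n}\right)^{c\,n/|L|}\leq e^{-c}.$$

The second adds up the congestion of the $O(n/|L|)$ simultaneous executions:

$$O\!\left(\frac{n}{|L|}\right)\cdot\widetilde{O}\!\left(\frac{\kappa^{2.5}|L|}{\sqrt{n}}\right)=\widetilde{O}(\kappa^{2.5}\sqrt{n}),$$

which is no larger than the dilation term $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$. This is consistent with the stated running time under the assumption, not spelled out at this point in the text, that the executions can be scheduled in roughly dilation plus total congestion rounds.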

Note that a similar statement was achieved in the sequential setting in $O(m\kappa)$ time, which is the time to compute a max-flow of size $\kappa$ using Ford-Fulkerson algorithm. As discussed earlier, computing a max-flow will not allow us to beat the time to solve reachability. For this reason, we need some clustering ideas which we show in Section 2.2.

Lemma 2.2 (SingleSourceLocalCut( $G=(V,E),s,t,\kappa,\alpha$ ); Proof in Section 5).

There exists a CONGEST algorithm that, given an undirected graph $G=(V,E)$, two vertices $s,t\in V$ and $\kappa,\alpha\in\mathbb{N}$, where $\kappa\leq\alpha$, either outputs a valid $\kappa$-cut, or outputs $\bot$, such that

• if there exists $L\subseteq V$ such that $|\mathsf{N}(L)|<\kappa$, $\{s,t\}\cap\mathsf{N}^{+}(L)=\{s\}$, and $|L|\leq\alpha$, then the algorithm outputs $\bot$ with constant probability,

• the algorithm has dilation $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ and congestion $\widetilde{O}(\kappa^{2.5}\alpha/\sqrt{n})$.

*Remark 2.3.*

Throughout this paper, it is important for the reader to keep in mind that our algorithm is a Monte Carlo algorithm with one-sided error. Specifically, when the output is a cut, it must be a valid cut with a size less than $\kappa$ . However, when the output is $\bot$ , it is possible that the graph has a cut with a size less than $\kappa$ , and the algorithm cannot distinguish whether the output is correct or not. Nevertheless, since the algorithm has one-sided error, as long as the error probability is bounded by a constant between 0 and 1, it can be reduced to as small as $\frac{1}{n^{c}}$ by repeating the algorithm $O(\log n)$ times.
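The amplification step in the remark can be quantified with a small helper. This is a sketch under the assumption that the repetitions are independent and that a single run wrongly outputs $\bot$ with probability at most `p`; the function name `repetitions_needed` and its parameters are hypothetical.

```python
import math

def repetitions_needed(p, n, c=1):
    """Number t of independent runs such that p**t <= n**(-c).

    p: per-run failure probability bound, with 0 < p < 1.
    n: number of vertices; c: exponent in the target error 1/n^c.
    """
    return math.ceil(c * math.log(n) / math.log(1 / p))

# With failure probability 1/2 per run and n = 1024, about log2(n) runs suffice.
t = repetitions_needed(0.5, 1024)
```

Since a run never outputs an invalid cut, it suffices to report a cut if any of the `t` runs finds one, and $\bot$ otherwise.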

2.1 IsolatingSmallCut (proof sketch of Lemma 2.1; full proof in Section 4)

The starting point is to run the algorithm of [FNY+20] for every vertex in $A$ simultaneously, i.e., run $\kappa$ rounds of DFS to find augmenting paths on residual graphs, defined below. For a path $p=(v_{0},v_{1},v_{2},...,v_{\ell})$, we define $pre_{p}(v_{i})=v_{i-1}$ for any $0<i\leq\ell$ and $suc_{p}(v_{i})=v_{i+1}$ for any $0\leq i<\ell$. The vertices $v_{1},v_{2},...,v_{\ell-1}$ are called the internal vertices of $p$. A set of paths is called internally vertex disjoint if no two of them share an internal vertex. We define $V(p)$ as the vertex set consisting of all vertices in $p$. For a set of paths $P$, we define $V(P)=\cup_{p\in P}V(p)$.

Definition 2.4 ( $(G,s,P)$ -Augmenting Path).

Let $G=(V,E)$ be an undirected graph, let $s\in V$, and let $P$ be a set of $k$ internally vertex disjoint paths starting from $s$. (We call $P$ a flow-path set of $s$.) A path $p_{aug}$ in $G$ is called $(G,s,P)$-augmenting if

(i) Starting vertex: $p_{aug}$ starts at $s$, and

(ii) Forced retreat: for any consecutive vertices $u_{1},u_{2}$ in $p_{aug}$ where $u_{2}$ is not the end of $p_{aug}$ and any $p\in P$, if $u_{2}\in V(p)\setminus\{s\}$ and $u_{1}\not=suc_{p}(u_{2})$, then $suc_{p_{aug}}(u_{2})=pre_{p}(u_{2})$.

Figure 1 provides an example of such an $(G,s,P)$ -augmenting path. Intuitively speaking, if an augmenting path enters a vertex in path $p\in P$ that is not from its successor, then it is forced to go backward (or retreat).
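The two conditions of Definition 2.4 can be checked mechanically. The following is a minimal sketch, assuming paths are given as vertex lists that start at `s`; the function name `is_augmenting` and the example graph are hypothetical. One further assumption is labeled in the code: when $u_{2}$ is the last vertex of $p$, $suc_{p}(u_{2})$ is undefined and is treated as different from $u_{1}$.

```python
def is_augmenting(adj, s, P, p_aug):
    """Check whether p_aug is a (G, s, P)-augmenting path per Definition 2.4."""
    # (i) Starting vertex.
    if not p_aug or p_aug[0] != s:
        return False
    # p_aug must follow edges of G.
    if any(v not in adj[u] for u, v in zip(p_aug, p_aug[1:])):
        return False
    # (ii) Forced retreat, checked against every flow path p in P.
    for p in P:
        pos = {v: i for i, v in enumerate(p)}
        for i in range(1, len(p_aug) - 1):  # u2 = p_aug[i] is not the end of p_aug
            u1, u2, nxt = p_aug[i - 1], p_aug[i], p_aug[i + 1]
            if u2 == s or u2 not in pos:
                continue
            j = pos[u2]
            # Assumption: an undefined successor never equals u1.
            suc = p[j + 1] if j + 1 < len(p) else None
            if u1 != suc and nxt != p[j - 1]:
                return False  # entered p not from the successor but did not retreat
    return True

# Example graph with one existing flow path s -> a -> b -> t.
adj = {"s": {"a", "b"}, "a": {"s", "b", "t"}, "b": {"s", "a", "t"}, "t": {"a", "b"}}
P = [["s", "a", "b", "t"]]
```

In this example, `["s", "b", "a", "t"]` is augmenting: it enters `b` from `s`, retreats to `a`, and then leaves the flow path. The path `["s", "b", "t"]` is not, because after entering `b` from `s` it fails to retreat to `a`.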

For a minimum vertex cut $(L,S,R)$, our goal is to find the maximum number of vertex disjoint paths from $s\in L$ to $R$ (from which we can infer the vertex cut), and we use augmenting paths to this end as follows. ‘Lemma’ 2.5 shows that (i) if an augmenting path ending at $R$ can be found, then we can increase the number of vertex disjoint paths, and (ii) if no augmenting path ending at $R$ can be found, then we can find a vertex cut.

Lemma 2.5 (Simplified version of Lemmas 3.3 and 3.4).

Suppose $G=(V,E)$ is an undirected graph and $P$ is a set of $k$ internally vertex disjoint paths starting from $s\in V$ and ending at a vertex set $T$ . Then

(i) (Augmentation.) Suppose $p$ is a $(G,s,P)$ -augmenting path ending at $t$ . Then there exists a set of $k+1$ internally vertex disjoint paths $P^{\prime}$ ending at $T\cup\{t\}$ . See Figure 1 for an example.

(ii) (Find a cut.) Let $S^{\prime}$ contain all the nodes that $s$ can reach through a $(G,s,P)$ -augmenting path. If $S^{\prime}\neq V$ , then the following nodes form a vertex cut: for each $p\in P$ , the node in $S^{\prime}\cap V(p)$ that has the largest distance to $s$ on $p$ . See Figure 2 for an example.

It is not hard to see that, in the Augmentation case above, the minimum vertex cut separating $s$ and $T\cup\{t\}$ has a size at least $k+1$ if $s$ does not have an edge to $T\cup\{t\}$ —this follows from Menger’s theorem.

Our algorithm IsolatingSmallCut for Lemma 2.1 works as follows. Initially, each node $s\in A$ has an empty flow-path set $P_{s}$ . We run $\kappa$ iterations where, in each iteration, we increase the size of $P_{s}$ by $1$ for each vertex $s\in A$ : In each iteration, very informally, each vertex $s$ sends a DFS token to explore $G$ in a DFS manner for $\Theta(\kappa\alpha)$ rounds in order to find a $(G,s,P_{s})$ -augmenting path. If the DFS gets stuck (this is explained shortly), then we use Lemma 2.5 to find a cut. Indeed, our main challenge is to reduce the congestion caused by all of these DFS traversals running in parallel. To this end, we exploit the following property of augmenting paths, which is the main technical lemma of this subsection. We start with some definitions that provide the necessary context.

A $(G,s,P)$ -augmenting path $p_{aug}=(s,v_{1},...,v_{\ell-1},v_{\ell})$ is called retreating if there exists $p\in P$ , such that $v_{\ell}\in V(p)\backslash\{s\},v_{\ell-1}\not=suc_{p}(v_{\ell})$ , i.e., the only way to extend $p_{aug}$ to a $(G,s,P)$ -augmenting path $(s,v_{1},...,v_{\ell},v_{\ell+1})$ is to set $v_{\ell+1}=pre_{p}(v_{\ell})$ . For example, in Figure 1, the red $(G,s,P)$ -augmenting path from $s$ to $u_{2}$ is retreating. A $(G,s,P)$ -augmenting path is called non-retreating if it is not retreating.
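The retreating condition only looks at the last two vertices of the path, which the following sketch makes explicit. The function name `is_retreating` is hypothetical, flow paths are assumed to be simple vertex lists starting at `s`, and the input path is assumed to already be $(G,s,P)$ -augmenting.

```python
def is_retreating(s, flow_paths, path):
    """True if the augmenting path `path` ends on a flow path p at a vertex
    other than s, having arrived there not from that vertex's successor on p."""
    if len(path) < 2:
        return False
    prev, last = path[-2], path[-1]
    for p in flow_paths:
        if last == s or last not in p:
            continue
        j = p.index(last)
        suc = p[j + 1] if j + 1 < len(p) else None
        if prev != suc:
            return True  # the only legal extension is to pre_p(last)
    return False
```

With the single flow path $(s,a,b,t)$ , the path $(s,c,b)$ is retreating (it must continue to $a$ ), while $(s,c,b,a)$ is non-retreating because it entered $a$ from $suc_{p}(a)=b$ .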

Lemma 2.6 (Simplified version of Lemma 4.3).

For any undirected graph $G=(V,E)$ , consider two vertices $u,v\in V$ , and let $P_{u}$ and $P_{v}$ be flow-path sets of $u$ and $v$ , respectively. Let $p_{u}$ and $p_{v}$ be non-retreating $(G,u,P_{u})$ - and $(G,v,P_{v})$ -augmenting paths, respectively. If $p_{u}$ and $p_{v}$ end at the same vertex, then there exists a path $p$ on the subgraph of $G$ resulting from combining all edges of $P_{u},P_{v},p_{u},p_{v}$ such that $p$ is either $(G,u,P_{u})$ -augmenting ending at $v$ , or $(G,v,P_{v})$ -augmenting ending at $u$ .

With Lemma 2.6, the algorithm becomes the following. Denote the flow-path set of $s\in A$ as $P_{s}$ . Initially $P_{s}=\emptyset$ . Run the following procedure for $\kappa$ iterations: In each iteration, we make sure that the size of $P_{s}$ increases by 1 for all $s\in A$ .

(i) Whole-graph DFS: In parallel, every vertex $s\in A$ sends a token (denoted by the $s$ -token) to explore new vertices in $G$ in a DFS manner: Each vertex $u$ (including $s$ ), once receiving the token, finds out which of its neighbors is not explored yet by the $s$ -token, and sends the $s$ -token to one such unexplored neighbor. The DFS follows the forced retreat property described in Definition 2.4, i.e., when an $s$ -token arrives at a vertex $u$ on a flow-path $p\in P_{s}$ not from $suc_{p}(u)$ , then the token must be sent to $pre_{p}(u)$ . [Footnote 10: The astute reader may observe that this DFS traversal may visit a vertex on a flow-path more than once because it is forced to do so by a forced retreat. In Algorithm 1, however, we use a directed graph representation, defined in Definition 3.2, which avoids this problem.] The DFS traversal ends in one of the following three ways:

•

If $s$ explores $\Theta(\kappa\alpha)$ vertices or $s$ reaches another vertex $t\in A$ , it stops.

•

If two tokens from $u,v\in A$ meet at a vertex $t$ , then they stop, form a pair $(u,v)$ , and report this fact back to $u$ and $v$ through DFS trees. Denote the path from $u$ and $v$ to $t$ in the DFS trees by $p_{u}$ and $p_{v}$ respectively. Define subgraph $H_{(u,v)}$ as the subgraph formed by the union of edges in $p_{u},p_{v},P_{u},P_{v}$ . This graph will be used in the next step.

If many tokens $u_{1},u_{2},\ldots,u_{\ell}$ meet at $t$ , we pair them up $(u_{1},u_{2}),\ldots$ to get subgraphs $H_{(u_{1},u_{2})},\ldots$ . In the case where $\ell$ is odd, $u_{\ell}$ is allowed to continue its DFS $t$ onward.

•

If $s$ finishes DFS (i.e., has explored all vertices it can reach) without exploring $\Theta(\kappa\alpha)$ vertices and without reaching another vertex $t\in A$ , output the small cut using Lemma 2.5 (ii). (If several vertices finish DFS, we just need to pick an arbitrary one.)

Let $(L,S,R)$ be the vertex cut as claimed in Lemma 2.1, i.e., $|S|<\kappa$ , $A\cap(L\cup S)=\{s\}$ and $|L|\leq\alpha$ . Note that $s$ succeeds in finding a $(G,s,P_{s})$ -augmenting path that terminates in $R$ in the first case with a constant probability: (i) If $s$ explores $\Omega(\kappa\alpha)$ vertices, then a random vertex among the explored vertices is in $R$ with probability at least $1-\frac{1}{\Omega(\kappa)}$ . So we can choose this random vertex as the terminating vertex of the augmenting path. [Footnote 11: A priori, we do not know whether our chosen vertex is in $R$ . However, we show that, if the algorithm outputs a valid vertex cut in the end, it will be a cut of size at most $\kappa$ . See Remark 2.3.] (ii) If $s$ reaches $t\in A$ with $t\not=s$ , then $t$ is the terminating vertex and $t\in R$ .

Once the DFS traversals stop for every $s\in A$ , we move to the next step.

(ii) Subgraphs DFS: For each pair $(u,v)$ , $u$ and $v$ run DFS traversal on $H_{(u,v)}$ . These DFS traversals in all $H_{(u,v)}$ ’s are run simultaneously using the random delay technique [Gha15] to avoid congestion. [Footnote 12: According to [Gha15], running independent CONGEST algorithms simultaneously can be done using random delay in $\widetilde{O}(\text{dilation}+\text{congestion})$ rounds. See Lemma 3.1 for more details.] If $u$ finds a $(G,u,P_{u})$ -augmenting path $p$ to $v$ , it uses $p$ to increase the size of $P_{u}$ by $1$ . Do the same for $v$ . Lemma 2.6 guarantees that one of $u$ and $v$ will succeed in finding an augmenting path.

Note that executing Steps (i) and (ii) will increase $|P_{s}|$ for a constant fraction of $s\in A$ by Lemma 2.6. We repeat these two steps $O(\log n)$ times to make sure $|P_{s}|$ increases for every $s\in A$ .

Round complexity. We first bound the round complexity for the two steps. One can see that Step (i) runs in $O(\kappa\alpha)$ rounds. The round complexity of Step (ii) depends on the dilation (i.e., the diameter of subgraph $H_{(u,v)}$ ) and congestion (i.e., the maximum number of $H_{(u,v)}$ for different pairs $(u,v)$ that share the same edge), which we bound below. We crucially use the following fact: A $(G,s,P)$ -augmenting path $p$ of length $\ell$ w.r.t. a flow-path set $P$ can increase the number of path edges in the new flow-path set by at most an additive term of $\ell$ . [Footnote 13: This observation follows directly from the following fact, which is easy to see. Suppose $s\in A$ has flow-path set $P^{i}_{s}$ at the end of each iteration $i$ (we assume $P^{0}_{s}=\emptyset$ ), and consider the augmenting paths $p^{1}_{s},p^{2}_{s},...,p^{i}_{s}$ that are used to generate the successive flow-path sets: Each $p^{i}_{s}$ is a $(G,s,P^{i-1}_{s})$ -augmenting path. We claim that the edges in $P^{i}_{s}$ are a subset of the edges in $p^{1}_{s},p^{2}_{s},...,p^{i}_{s}$ . Note that it might not be true that the set of edges in $P^{i-1}_{s}$ is a subset of the set of edges in $P^{i}_{s}$ .]

Dilation.

Note that each $p^{i}_{s}$ , $i\in[\kappa]$ , is of size $O(\kappa\alpha)$ . From the fact stated above, it is straightforward to bound the size of $H_{(u,v)}$ (which is composed of $p_{u},p_{v},P_{u},P_{v}$ ) by $\widetilde{O}(\kappa^{2}\alpha)$ .

Congestion.

The number of $H_{(u,v)}$ that contain an edge $e$ is bounded by the number of times $e$ is visited by DFS traversals in Step (i), as $e$ can be included in some $H_{(u,v)}$ only after it is visited in any DFS traversal in Step (i) by $u$ or $v$ . Every edge $e$ is included in at most one DFS traversal in each round of Step (i). Since Step (i) lasts for $\widetilde{O}(\kappa\alpha)$ rounds in each of the $\kappa$ iterations, an upper bound on the number of times $e$ is visited by DFS traversals in Step (i) is $\widetilde{O}(\kappa^{2}\alpha)$ .

The total round complexity is $\kappa\times O(\log n)\times\widetilde{O}(\kappa^{2}\alpha)=\tilde{O}(\kappa^{3}\alpha)$ : The first $\kappa$ is the number of iterations, $O(\log n)$ is the number of times Step (i) and (ii) are repeated in each iteration. See Section 4 for more details.

2.2 SingleSourceLocalCut (proof sketch of Lemma 2.2; full proof in Section 5)

For intuition, note that a statement similar to Lemma 2.2 can be shown in the sequential setting [NSY19, FNY+20] by running the Ford-Fulkerson algorithm. This algorithm runs for $\kappa$ iterations where in each iteration it increases the amount of $st$ -flow by one via an augmenting path. We follow this basic idea but need some modifications. First, in each of the $\kappa$ iterations, we randomly select some terminals, where each vertex has probability $O(1/(\kappa\alpha))$ to be a terminal. We allow the augmenting path to end at a terminal instead of at $t$ . This suffices because if there exists a vertex cut $(L,S,R)$ such that $\{s,t\}\cap(L\cup S)=\{s\},|L\cup S|\leq\alpha$ (thus $L$ satisfies the condition in the first bullet of Lemma 2.2), a simple union bound shows that the random terminals in all $\kappa$ iterations are in $R$ with constant probability. The algorithm for finding the augmenting path is stated as the following lemma. We will use this algorithm with $x=\kappa\alpha$ . Recall from Definition 2.4 the notion of flow-paths and $(G,s,P)$ -augmenting path.

Lemma 2.7 (RandomAugmenting $(G=(V,E),s,P,x)$ ).

There exists a CONGEST algorithm called RandomAugmenting that takes an undirected graph $G=(V,E)$ , two vertices $s,t\in V$ , integer $x$ and a set $P$ of flow-paths of $s$ where each path in $P$ has length bounded by $O(x)$ , as input and the algorithm either

- outputs a vertex cut of size $|P|$ , or

- outputs a $(G,s,P)$ -augmenting path with length bounded by $\widetilde{O}(x)$ , either ending at $t$ , or ending at a random vertex $\tilde{t}$ , where $\Pr[\tilde{t}=v]=O(1/x)$ for any $v\in V$ .

The algorithm has dilation $\widetilde{O}(|P|^{1.5}\sqrt{n}+|P|^{2}D)$ and congestion $\widetilde{O}(|P|^{0.5}x/\sqrt{n})$ .

To prove Lemma 2.2 using Lemma 2.7, our algorithm starts with $P=\emptyset$ . It proceeds in $\kappa$ iterations, where in each iteration we find a $(G,s,P)$ -augmenting path using Lemma 2.7 with $x=\Theta(\kappa\alpha)$ to increase the size of $P$ by $1$ . Since $|P|<\kappa$ , one can see that the dilation is $\kappa\cdot\widetilde{O}(|P|^{1.5}\sqrt{n}+|P|^{2}D)=\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ and the congestion is $\kappa\cdot\widetilde{O}(|P|^{0.5}x/\sqrt{n})=\widetilde{O}(\kappa^{2.5}\alpha/\sqrt{n})$ , which is what we want in Lemma 2.2. The rest of this section is devoted to showing the proof idea of Lemma 2.7.

Proof idea of Lemma 2.7.

We first review the framework for distributed reachability algorithms used in [Nan14, GU15, LJS19]. (We will modify this framework to find a $(G,s,P)$ -augmenting path as guaranteed in Lemma 2.7.) This framework consists of two phases, where the first phase is identical in all algorithms in [Nan14, GU15, LJS19], and these algorithms differ in the second phase. Suppose we want to find a path from $s$ to $t$ . The two phases are:

(i) Build a virtual graph. Pick an appropriate parameter $d$ (we will pick $d=|P|^{1.5}\sqrt{n}$ to prove Lemma 2.7). Construct a virtual graph $G_{vir}=(V_{vir},E_{vir})$ [Footnote 14: “Virtual” means that edges in the virtual graph might not be edges in the input network.] where $V_{vir}$ (also called the set of hubs) includes every vertex of $V$ with probability $1/d$ as well as $s$ , and an edge $e=(h_{1},h_{2})$ is included in $E_{vir}$ if the distance from $h_{1}$ to $h_{2}$ in $G$ is at most $d$ . $E_{vir}$ can be constructed by constructing a BFS tree $T_{h}$ of depth $d$ from each vertex $h\in V_{vir}$ in $G$ .

(ii) Reachability in the virtual graph. Find all the hubs that $s$ can reach in $G_{vir}$ , denoted by $H_{r}$ (the way to efficiently find $H_{r}$ differs between algorithms). Now we claim that $\cup_{h\in H_{r}}T_{h}$ are all the vertices $s$ can reach in the original graph $G$ .

The correctness is guaranteed by the following arguments: since we sample hubs with probability $\frac{1}{d}$ , the path from $s$ to a vertex $v$ contains hubs with distance $\widetilde{O}(d)$ one after another along the path, with high probability. Therefore, the hubs in the path form a directed path in the virtual graph, where the last hub in the path has distance $d$ to $v$ in $G$ .
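The two-phase framework can be illustrated with a centralized toy version. This is only a sketch of the idea, not the distributed algorithms of [Nan14, GU15, LJS19]: the graph is a dictionary of out-neighbour lists, and the names `sample_hubs`, `bounded_bfs` and `reachable_via_hubs` are hypothetical.

```python
import random
from collections import deque


def sample_hubs(vertices, s, d, rng):
    """Phase (i): every vertex becomes a hub with probability 1/d; s always is."""
    return {v for v in vertices if rng.random() < 1.0 / d} | {s}


def bounded_bfs(adj, root, depth):
    """Vertices within directed distance `depth` of `root` (the tree T_h)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def reachable_via_hubs(adj, s, hubs, d):
    """Phase (ii): search the virtual graph, then take the union of the T_h."""
    trees = {h: bounded_bfs(adj, h, d) for h in hubs}
    seen = {s}
    queue = deque([s])
    while queue:
        h = queue.popleft()
        for h2 in trees[h]:  # virtual edge (h, h2) iff h2 is within distance d
            if h2 in hubs and h2 not in seen:
                seen.add(h2)
                queue.append(h2)
    reached = set()
    for h in seen:
        reached |= trees[h]
    return reached
```

On a directed path with ten vertices and $d=3$ , hubs spaced three apart recover all reachable vertices, while a gap larger than $d$ between consecutive hubs loses the tail of the path; the sampling probability $1/d$ is what makes such gaps unlikely.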

Using reachability algorithm to find an augmenting path.

Our definition for augmenting path in Definition 3.2 can be reformulated as a directed path in a directed graph, by the standard way of duplicating each vertex into in-vertex and out-vertex. See Section 3.2 for more detail. Thus, we can use a directed graph reachability algorithm to find an augmenting path.

However, directly applying this framework to prove Lemma 2.7 is not efficient as there can be $\Omega(n/d)$ BFS tree constructions that can lead to dilation $\Omega(d)$ and congestion $\Omega(n/d)$ . Recall that in Lemma 2.7 we want dilation $\widetilde{O}(|P|^{1.5}\sqrt{n}+|P|^{2}D)$ and congestion $\widetilde{O}(|P|^{0.5}x/\sqrt{n})$ , where $x$ can be much smaller than $n$ . There is no way to set an appropriate $d$ that satisfies both the dilation and the congestion requirements. To achieve a better dilation and congestion trade-off, we will only grow BFS trees on fewer, carefully chosen hubs instead of all $\Omega(n/d)$ hubs.

Path centered clustering.

The key idea to reduce the number of BFS tree constructions is a structure called path-centered clustering. The details of this structure are described in Definition 5.1, and Lemma 5.4 shows that we can efficiently construct this structure. Here we give a simplified version of the structure. Note that the following definition is different from the definition in Section 5.1, because the following definition fails to satisfy Assumption 2.8 in some cases, which affects the correctness of our algorithm. However, it shows the general idea of the more complicated definition, so we use it for ease of explanation.

For a given network $G=(V,E)$ of diameter $D$ , a path centered clustering is a tuple $\mathcal{C}=(P,\{S_{u}\}_{u\in V(P)})$ where $P$ is a flow-path set, and $\{S_{u}\}_{u\in V(P)}$ is a partition of $V$ (i.e. $V$ is a disjoint union of all $S_{u}$ ’s), called clusters with the following guarantees: Each cluster $S_{u}$ contains $u\in V(P)$ , and each induced subgraph $G[S_{u}]$ has a diameter at most $D$ . We call $u$ the center of every vertex $v\in S_{u}$ and denote it by $\mathsf{Center}_{\mathcal{C}}(v)$ . See Figure 3 for an example.

Definition of $\mathsf{Before}$ and active hubs.

We need a few definitions to show the properties of path centered clustering. For a path $p=(v_{0},v_{1},...,v_{\ell})$ and $v_{i},v_{j}$ on the path, we say $v_{i}\preceq_{p}v_{j}$ if $i\leq j$ and we say $v_{i}\prec v_{j}$ on path $p$ if $i<j$ . For any two vertices $h,h^{\prime}\in V$ and a path centered clustering $\mathcal{C}=(P,\{S_{u}\}_{u\in V(P)})$ , we say $h^{\prime}\preceq_{\mathcal{C}}h$ , if $\mathsf{Center}_{\mathcal{C}}(h^{\prime})$ and $\mathsf{Center}_{\mathcal{C}}(h)$ belong to some path $p\in P$ and $\mathsf{Center}_{\mathcal{C}}(h^{\prime})\preceq_{p}\mathsf{Center}_{\mathcal{C}}(h)$ . The relationship $\preceq_{\mathcal{C}}$ is not total as not every two vertices in $G$ are comparable by $\preceq_{\mathcal{C}}$ . For each hub $h$ (recall that hubs are sampled vertices in $G$ with sample probability $\frac{1}{d}$ ), we use $\mathsf{Before}_{\mathcal{C}}[h]$ to denote the number of hubs $h^{\prime}$ with $h^{\prime}\preceq_{\mathcal{C}}h$ . We will assume the following assumption.

Assumption 2.8.

If $h^{\prime}\preceq_{\mathcal{C}}h$ , then $h$ can reach $h^{\prime}$ through an augmenting path.

Remark 2.9.

Note that our actual clustering is more fine-grained than what is described above, in order to tackle the following technical problem: Assumption 2.8 is true if $\mathsf{Center}_{\mathcal{C}}(h^{\prime})\prec_{p}\mathsf{Center}_{\mathcal{C}}(h)$ (in Figure 4, the blue line shows an augmenting path from $h$ to $h^{\prime}$ ) and may not be true if $\mathsf{Center}_{\mathcal{C}}(h^{\prime})=\mathsf{Center}_{\mathcal{C}}(h)$ . This is solved by making the clustering more fine-grained; more details are provided in Section 5.1. In this section, we assume Assumption 2.8 holds for ease of explanation.
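The order $\preceq_{\mathcal{C}}$ and the counter $\mathsf{Before}_{\mathcal{C}}$ can be computed directly from the simplified clustering. The sketch below assumes flow paths are vertex lists starting at $s$ and that `center` maps each hub to its cluster center in $V(P)$ ; the names `before_counts` and `active_hubs`, and the explicit `threshold` argument standing in for the $O(x/d)$ bound used in the text that follows, are hypothetical.

```python
def before_counts(flow_paths, center, hubs):
    """Before_C[h] = number of hubs h' with h' <=_C h (simplified clustering)."""
    # pos[v][pid] = index of vertex v on flow path number pid.
    pos = {}
    for pid, p in enumerate(flow_paths):
        for i, v in enumerate(p):
            pos.setdefault(v, {})[pid] = i

    def preceq(h_prime, h):
        a, b = pos[center[h_prime]], pos[center[h]]
        # Both centers lie on some common path, with Center(h') not after Center(h).
        return any(pid in b and a[pid] <= b[pid] for pid in a)

    return {h: sum(1 for hp in hubs if preceq(hp, h)) for h in hubs}


def active_hubs(before, threshold):
    """Hubs whose Before value is at most `threshold`."""
    return {h for h, count in before.items() if count <= threshold}
```

Note that $\preceq_{\mathcal{C}}$ is reflexive, so every hub counts itself, and that hubs whose centers lie on different paths (other than at $s$ ) are incomparable.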

Build a virtual graph with fewer BFS tree constructions.

In this part we will show how to build a virtual graph $G_{vir}$ on hubs with $O(x/d)\cdot|P|$ BFS tree constructions, such that either

- $G_{vir}$ preserves the $s$ -reachability (in the sense that every vertex reachable from $s$ in $G$ can be reached from some vertex $u$ of $G_{vir}$ within distance $d$ , where $s$ can reach $u$ in $G_{vir}$ ), or

- $s$ can reach a random vertex $\tilde{t}$ such that each vertex in $V$ becomes $\tilde{t}$ with probability $O(\frac{1}{x})$ .

Now we give our algorithm. We first compute a path centered clustering $\mathcal{C}$ . We call a hub $h$ an active hub if $\mathsf{Before}_{\mathcal{C}}[h]=O(x/d)$ . Other hubs are called non-active hubs. Denote the set of all active hubs as $V_{act}$ . One can argue that $|V_{act}|=O(x/d)\cdot|P|$ . We only grow BFS trees on active hubs. By setting $d=|P|^{1.5}\sqrt{n}$ , the dilation and congestion of constructing all the BFS trees satisfy the requirement in Lemma 2.7. By doing that, we can get a virtual graph $G_{vir}=(V_{vir},E_{vir})$ where $E_{vir}$ includes an edge $e=(h_{1},h_{2})$ if $h_{1}\in V_{act}$ and $h_{1}$ has distance at most $d$ to $h_{2}$ in $G$ .

Now we argue the property of $G_{vir}$ . If in $G_{vir}$ , $s$ can reach a non-active hub $h$ through active hubs, then we can pick a uniform random hub $h^{\prime}$ among all hubs $h^{\prime}\preceq_{\mathcal{C}}h$ as the destination. Notice that a non-active hub $h$ satisfies $\mathsf{Before}_{\mathcal{C}}[h]=\Omega(x/d)$ ; thus, each node has probability at most $O(1/d)\cdot O(d/x)=O(1/x)$ to be the destination. On the other hand, if $s$ cannot reach any non-active hub, then by growing BFS trees on all active hubs, we can find all vertices that $s$ can reach in $G$ exactly.

Find reachability in virtual graph

Let $H_{r}$ contain all the active hubs that $s$ can reach in the virtual graph $G_{vir}$ . Our goal in this part is to find $H_{r}$ efficiently. Notice that if we can find $H_{r}$ , then according to the argument in the previous part, either we can find all vertices in $G$ that $s$ can reach, or we can find a non-active hub from which we can choose a random destination with probability $O(1/x)$ .

We first discuss the difficulty. Notice that $|H_{r}|=O(|P|\cdot x/d)=O\left(x/(|P|^{0.5}\sqrt{n})\right)$ . Possible values of $x,|P|$ are $x=\Theta(n)$ and $|P|=O(1)$ . In this case, $|H_{r}|=O(\sqrt{n})$ . All the existing algorithms fail to find reachability with round complexity $\tilde{O}(\sqrt{n}+D)$ on a virtual graph with $\sqrt{n}$ vertices. However, our virtual graph is not an arbitrary directed graph. We will exploit some properties of our virtual graph to come up with an efficient algorithm.

The idea is to sparsify the transitive closure of $G_{vir}$ and broadcast the whole sparsified graph. We will make sure that the sparsified graph has the same reachability relationship as the original graph, and it is possible to broadcast the sparsified graph using $O(|P|\cdot|V_{act}|)$ messages. There are two types of edges in the sparsified graph.

Backward edges.

These are edges $(h,h^{\prime})$ where $h^{\prime}\preceq_{\mathcal{C}}h$ . To learn this type of edge, we give each flow-path $p\in P$ an id. Each vertex $v$ on $p$ can learn $p$ ’s id and its position on $p$ (the number of vertices $x$ with $x\preceq_{p}v$ ) efficiently by existing results. After that, each active hub $h$ broadcasts the flow-path id where $\mathsf{Center}_{\mathcal{C}}[h]$ is on, as well as the position on the flow-path.

Forward edges.

For each active hub $h$ , recall that $T_{h}$ is the directed tree with depth $d$ rooted at $h$ . Instead of keeping all edges from $h$ to all hubs in $T_{h}$ , we preserve the “highest hub” for each path $p\in P$ : let $T^{p}_{h}$ contain all hubs $x$ in $T_{h}$ with $\mathsf{Center}_{\mathcal{C}}[x]$ on $p$ . Let $h^{*}_{p}$ be an arbitrary hub in $T^{p}_{h}$ such that for every other hub $h^{\prime}\in T^{p}_{h}$ , we have $h^{\prime}\preceq_{\mathcal{C}}h^{*}_{p}$ . $(h,h^{*}_{p})$ is added to the virtual graph for any $p\in P$ .

One can see that the number of messages broadcast by every active hub is bounded by $|P|$ . Thus, the congestion is $\widetilde{O}(|P|^{0.5}x/\sqrt{n})$ , which fits our goal. To see that the reachability relationship does not change, suppose $h^{\prime}\in T_{h}$ where $\mathsf{Center}_{\mathcal{C}}[h^{\prime}]$ is on $p$ . Then $h$ can reach $h^{\prime}$ in the virtual graph by first using the forward edge $(h,h^{\ast}_{p})$ , then using the backward edge $(h^{\ast}_{p},h^{\prime})$ .
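The forward-edge rule (keep only the highest hub per flow path in each tree $T_{h}$ ) can be sketched as follows. This is a simplified illustration: it assumes every cluster center lies on exactly one flow path (ignoring the shared start vertex $s$ ), and the names `forward_edges`, `path_of` and `position` are hypothetical.

```python
def forward_edges(tree_hubs, center, path_of, position):
    """Return the sparsified forward edges (h, h*_p).

    tree_hubs: active hub h -> set of hubs contained in T_h
    center:    hub -> its cluster center (a vertex on a flow path)
    path_of:   center vertex -> id of the flow path it lies on
    position:  center vertex -> its index on that flow path
    """
    edges = set()
    for h, hubs_in_tree in tree_hubs.items():
        highest = {}  # path id -> hub in T_h whose center is furthest along p
        for x in hubs_in_tree:
            c = center[x]
            pid = path_of[c]
            if pid not in highest or position[c] > position[center[highest[pid]]]:
                highest[pid] = x
        for top in highest.values():
            edges.add((h, top))
    return edges
```

Each active hub contributes at most one edge per flow path, which is the source of the $|P|$ bound on the number of messages per hub stated above.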

Remark 2.10.

We skip the mapping of each edge in the virtual graph to a path in the original graph efficiently in the technical overview, see Section 5.4 for more details. Actually, to recover the path in the original graph efficiently, the sparsified virtual graph defined in Section 5.4 is different from here and more complicated, while the high-level ideas are the same.

2.3 Putting everything together

We first restate Theorem 1.1 formally.

Theorem 2.11.

There is a randomized vertex cut algorithm in the CONGEST model that, given $\kappa<n^{1/4}$ and an undirected graph $G$ as input, takes $\kappa^{3}\cdot\tilde{O}(D+\sqrt{n})$ rounds and either outputs a minimum vertex cut of $G$ or outputs $\bot$ , satisfying

1. If the output is a vertex cut, then it must be a minimum vertex cut of $G$ .

2. If $G$ is not $\kappa$ -connected, then $\bot$ is output with at most constant probability.

Since Theorem 2.11 states a one-sided error algorithm, the success probability can be boosted efficiently. The following is the schematic of the algorithm, using the subroutines described in Lemmas 2.1 and 2.2.

Schematic algorithm for vertex cut

•

Input: An undirected graph $G$ with $n$ nodes, a positive integer $\kappa<n^{1/4}$ .

•

Output: A vertex cut with size less than $\kappa$ , or $\bot$ .

If a vertex has degree less than $\kappa$ in $G$ , output all the neighbors of this vertex. Otherwise continue the following procedures.

For $1\leq i\leq\log n$ do:

(a)

Let $\alpha=2^{i},A=\emptyset$ . Each vertex is included in $A$ with probability $1/\alpha$ independently.

(b)

If $\alpha\leq\kappa$ ,

•

discard vertices in $A$ with degree larger than $\kappa$ in $G[A]$ , and run a $O(\kappa)$ -coloring algorithm in $G[A]$ ([HKMT21]) to get $\ell=O(\kappa)$ independent sets $A_{1},A_{2},...,A_{\ell}$ (see Lemma 2.12);

•

run IsolatingSmallCut $(G,A_{i},\kappa,\alpha)$ (see Lemma 2.1) for any $i\in[\ell]$ .

(c)

If $\kappa<\alpha<\sqrt{n}$ , discard all vertices in $A$ with degree at least $1$ in $G[A]$ , run IsolatingSmallCut $(G,A,\kappa,\alpha)$ (see Lemma 2.1).

(d)

If $\sqrt{n}\leq\alpha$ , for each $s\in A$ , let $t_{s}\in A$ be an arbitrary vertex which is distinct from $s$ . Run SingleSourceLocalCut $(G,s,t_{s},\kappa,\alpha)$ (see Lemma 2.2) for any $s\in A$ in parallel (see Lemma 3.1).

If any subroutine described in Lemmas 2.1 and 2.2 outputs a cut, then the algorithm outputs the cut and stops. Otherwise, it outputs $\bot$ .
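The control flow of the schematic, that is, which branch is taken for each guess $\alpha=2^{i}$ , can be traced with a few lines. This sketch only reproduces the case distinction and the sampling of $A$ ; the subroutines themselves are not implemented. The names `branch`, `schedule` and `sample_A` are hypothetical, and $\log n$ is assumed to mean $\lfloor\log_{2}n\rfloor$ .

```python
import random


def branch(alpha, kappa, n):
    """Which step of the schematic applies for this value of alpha."""
    if alpha <= kappa:
        return "b"  # colour G[A], then IsolatingSmallCut on each colour class
    if alpha * alpha < n:  # alpha < sqrt(n), written without floating point
        return "c"  # IsolatingSmallCut on the independent part of A
    return "d"  # SingleSourceLocalCut from every s in A, in parallel


def schedule(n, kappa):
    """The (alpha, branch) pairs for i = 1, ..., floor(log2 n)."""
    log_n = n.bit_length() - 1
    return [(2 ** i, branch(2 ** i, kappa, n)) for i in range(1, log_n + 1)]


def sample_A(vertices, alpha, rng):
    """Step (a): include each vertex in A independently with probability 1/alpha."""
    return {v for v in vertices if rng.random() < 1.0 / alpha}
```

For example, with $n=1024$ and $\kappa=4$ the guesses $\alpha=2,4$ fall into step (b), $\alpha=8,16$ into step (c), and $\alpha\geq 32$ into step (d).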

Correctness.

According to Lemmas 2.1 and 2.2, if a cut is output, then it must be a valid vertex cut with size less than $\kappa$ . Thus, if the graph $G$ has no valid vertex cut with size less than $\kappa$ , then the algorithm will output $\bot$ with probability $1$ .

Suppose there is a vertex cut $(L,S,R)$ with $|S|<\kappa$ . We assume the max degree of the graph is at least $\kappa$ , otherwise, a vertex cut of size less than $\kappa$ can be trivially found in the first step of the algorithm. We will show that in the second step, at the first iteration when $|L|<\alpha=O(|L|)$ , a cut with a size less than $\kappa$ will be output with constant probability.

Case 1 ( $\kappa\geq\alpha$ ):

In this case, we get $\ell$ independent sets $A_{1},A_{2},...,A_{\ell}$ . We first prove the following lemma.

Lemma 2.12.

At least one of $A_{1},A_{2},...,A_{\ell}$ (denoted by $A^{*}$ ) satisfies: $A^{*}$ is an independent set on $G$ , contains exactly one vertex in $L$ , and $A^{*}\cap S=\emptyset$ .

Proof.

Since $|L|=\Theta(\alpha)$ and we sample each vertex into $A$ with probability $1/\alpha$ , with constant probability there is exactly one vertex $u\in A\cap L$ . Let $\mathsf{N}^{+}(u)$ contain all neighbors of $u$ in $G$ and $u$ itself. Since the degree of $u$ is at least $\kappa$ and $|S|<\kappa$ , the set $(L\cup S)-\mathsf{N}^{+}(u)$ has size at most $|L|+\kappa-\kappa=|L|=O(\alpha)$ . Thus, with constant probability, $(L\cup S)-\mathsf{N}^{+}(u)$ contains no vertex in $A$ . Consider the independent set $A^{*}$ among $A_{1},...,A_{\ell}$ that contains $u$ . We have $A^{*}\cap(L\cup S)=\{u\}$ , which finishes the proof. ∎

According to Lemma 2.1, once Lemma 2.12 is proved, a cut with size less than $\kappa$ will be output with constant probability when IsolatingSmallCut $(G,A^{*},\kappa,\alpha)$ is called.

Case 2 ( $\kappa<\alpha<\sqrt{n}$ ):

Since we sample each vertex into $A$ with probability $1/\alpha$ and $|L|=\Theta(\alpha),|S|=O(\kappa)=O(\alpha)$ , with constant probability, exactly one vertex is in $A\cap L$ and $A\cap S=\emptyset$ . According to Lemma 2.1, a cut with size less than $\kappa$ will be output with constant probability.

Case 3 ( $\alpha\geq\sqrt{n}$ ):

According to the same argument, with constant probability, exactly one vertex $u$ is in $A\cap L$ and $A\cap S=\emptyset$ . Consider the instance with $s\leftarrow u$ ; that instance satisfies the premise of Lemma 2.2, so a cut with size less than $\kappa$ is output.

Round complexity.

When $\alpha<\kappa$ , the round complexity for the coloring algorithm is $\widetilde{O}(1)$ . There are $\widetilde{O}(\kappa)$ instances of IsolatingSmallCut in Lemma 2.1, which leads to the round complexity $\widetilde{O}(\kappa^{4}\alpha)=\widetilde{O}(\kappa^{5})=\widetilde{O}(\kappa^{3}\sqrt{n})$ since $\kappa=O(n^{1/4})$ . When $\kappa\leq\alpha<\sqrt{n}$ , the round complexity is $\widetilde{O}(\kappa^{3}\alpha)=\widetilde{O}(\kappa^{3}\sqrt{n})$ . When $\sqrt{n}\leq\alpha$ , the dilation is $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ and the total congestion is $\widetilde{O}(n/\alpha)\cdot\widetilde{O}(\kappa^{2.5}\alpha/\sqrt{n})$ , since there are $\widetilde{O}(n/\alpha)$ vertices in $A$ w.h.p. Thus, the round complexity is $\kappa^{3}\cdot\widetilde{O}(\sqrt{n}+D)$ .
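The arithmetic behind the three cases can be spelled out as follows. This is a worked restatement of the bounds above, using the per-call bound $\widetilde{O}(\kappa^{3}\alpha)$ for IsolatingSmallCut from Section 2.1 and Lemma 3.1 for combining dilation and congestion; polylogarithmic factors are hidden in $\widetilde{O}$ .

```latex
\begin{align*}
% Case alpha <= kappa: \widetilde{O}(\kappa) calls, each costing \widetilde{O}(\kappa^{3}\alpha)
\kappa\cdot\kappa^{3}\alpha &= \kappa^{4}\alpha \le \kappa^{5}
  = \kappa^{3}\cdot\kappa^{2} < \kappa^{3}\sqrt{n}
  && (\alpha\le\kappa,\ \kappa<n^{1/4}) \\
% Case kappa < alpha < sqrt(n): a single call
\kappa^{3}\alpha &< \kappa^{3}\sqrt{n}
  && (\alpha<\sqrt{n}) \\
% Case alpha >= sqrt(n): total congestion over the \widetilde{O}(n/\alpha) parallel instances
\frac{n}{\alpha}\cdot\frac{\kappa^{2.5}\alpha}{\sqrt{n}} &= \kappa^{2.5}\sqrt{n} \\
% Lemma 3.1: rounds = \widetilde{O}(dilation + congestion)
\kappa^{2.5}\sqrt{n}+\kappa^{3}D+\kappa^{2.5}\sqrt{n} &\le 2\,\kappa^{3}\,(\sqrt{n}+D)
\end{align*}
```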

2.4 Organization

The rest of this paper is organized as follows. In Section 3, we give some basic definitions and define the vertex residual graph. In Section 4, we describe the algorithm IsolatingSmallCut to prove Lemma 2.1. In Section 5, we describe the algorithm SingleSourceLocalCut to prove Lemma 2.2. Other proofs for less important lemmas are deferred to the appendices.

3 Preliminary

3.1 Basic Definitions

We will use the following terminology throughout the paper.

Graph terminologies.

For convenience, we treat an undirected graph as a directed graph with each undirected edge $(u,v)$ replaced by two directed edges $(u,v),(v,u)$ , i.e., $(u,v)$ and $(v,u)$ are different edges in an undirected graph.

For a graph $G=(V,E)$ , a path $p$ with length $k$ is a vertex sequence $(v_{0},v_{1},...,v_{k})$ , where $(v_{i},v_{i+1})\in E$ for all $0\leq i<k$ . We say $p$ starts at $v_{0}$ and ends at $v_{k}$ . For $i<j$ , we write $v_{i}\prec_{p}v_{j}$ to denote $v_{i}$ precedes $v_{j}$ on path $p$ . The edges $\{(v_{i},v_{i+1})\mid 0\leq i<k\}$ are called the edges of $p$ , denoted as $E(p)$ . Normally we assume a path cannot contain repeated edges (but can contain repeated vertices). $\{v_{i}\mid 0\leq i\leq k\}$ are called vertices of $p$ , denoted as $V(p)$ , $\{v_{i}\mid 0<i<k\}$ are called internal vertices of $p$ , denoted as $V_{I}(p)$ . $p$ is called simple path if $v_{0},...,v_{k}$ are distinct. A set of paths $P$ are called internally vertex disjoint, if every two paths intersect only at non-internal vertices (start and end vertices). Similarly we define $E(P)=\cup_{p\in P}E(p),V(P)=\cup_{p\in P}V(p),V_{I}(P)=\cup_{p\in P}V_{I}(p)$ . We say $P$ ends at the multiset $V^{\prime}$ if the union of end vertices of paths in $P$ is $V^{\prime}$ . In multiset, we also care about the number of occurrences of elements. A circle with length $k$ is a length $k$ path $(v_{0},v_{1},...,v_{k}=v_{0})$ where $v_{0},v_{1},...,v_{k-1}$ are different. A set of circles are called vertex disjoint if any two of the circles do not share any vertices.

A subgraph of $G$ is an edge set $E^{\prime}\subseteq E$ . Let $V^{\prime}$ be all the vertices adjacent to $E^{\prime}$ ; then $(V^{\prime},E^{\prime})$ is the subgraph associated with $E^{\prime}$ . We do not distinguish $E^{\prime}$ and the subgraph $(V^{\prime},E^{\prime})$ if there is no ambiguity in the context. For a vertex set $V^{\prime}\subseteq V$ , the induced subgraph $G[V^{\prime}]$ is the graph with vertex set $V^{\prime}$ and edge set $\{(u,v)\mid u,v\in V^{\prime},(u,v)\in E\}$ . Further, we define the boundary of a subset of vertices as

$$\mathsf{N}(V^{\prime})=\{v\mid\exists(u,v)\in E,\ u\in V^{\prime},\ v\notin V^{\prime}\}.$$

Moreover, we define $\mathsf{N}^{+}(V^{\prime})=\mathsf{N}(V^{\prime})\cup V^{\prime}$ . Two vertex sets $V_{1},V_{2}$ are called connected if $\mathsf{N}^{+}(V_{1})\cap\mathsf{N}^{+}(V_{2})\not=\emptyset$ .

A vertex cut is a vertex set $S\subseteq V$ such that $G[V\backslash S]$ is not connected. $|S|$ is called the size of the vertex cut. We also use the 3-tuple $(L,S,R)$ to represent a vertex cut, where $L\cup S\cup R=V$ , $L,S,R$ are mutually disjoint, and $\mathsf{N}(L)\subseteq S$ .
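The boundary and vertex cut definitions translate directly into code. The following sketch uses an adjacency dictionary for the undirected graph; the function names `boundary`, `is_vertex_cut` and `is_cut_triple` are hypothetical.

```python
from collections import deque


def boundary(adj, subset):
    """N(V'): vertices outside `subset` with a neighbour inside it."""
    subset = set(subset)
    return {v for u in subset for v in adj[u] if v not in subset}


def is_vertex_cut(adj, cut):
    """True if removing `cut` leaves a disconnected graph G[V \\ cut]."""
    rest = set(adj) - set(cut)
    if not rest:
        return False  # nothing left to disconnect
    start = next(iter(rest))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in rest and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen != rest


def is_cut_triple(adj, L, S, R):
    """Check exactly the stated conditions: L, S, R partition V and N(L) is in S.

    Non-emptiness of L and R is not stated in the text and is not checked here.
    """
    L, S, R = set(L), set(S), set(R)
    disjoint = not (L & S or L & R or S & R)
    return disjoint and (L | S | R) == set(adj) and boundary(adj, L) <= S
```

On the path $a - b - c$ , the set $\{b\}$ is a vertex cut represented by the triple $(\{a\},\{b\},\{c\})$ , whereas $\{a\}$ is not a vertex cut.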

CONGEST model.

Suppose the communication happens in the network $G=(V,E)$ . In the CONGEST model, time is divided into discrete time slots, where each slot is called a round. Throughout the paper, we always use $n$ to denote the number of vertices in our distributed network, i.e., $|V|$ . In each round, each vertex in $V$ can send an $O(\log n)$ -bit message to each of its neighbors. At the end of each round, vertices can do arbitrary local computations. A CONGEST algorithm initially specifies the input for each vertex; after several rounds, all vertices terminate and generate output. The time complexity of a CONGEST algorithm is measured by the number of rounds.

Distributed inputs and outputs.

Since the inputs and outputs to the distributed network should be specified for each vertex, we must be careful when we say something is given as input or is output. Here we make some assumptions. For the network $G=(V,E)$ , we say a subset of vertices (or a single vertex) $V^{\prime}\subseteq V$ is the input or output if every vertex is given the information about whether it is in $V^{\prime}$ . We say a subgraph $H$ (a subset of edges, for example, paths or circles) is the input or output if every vertex knows the edges in $H$ adjacent to it. When we say a number is input or output (for example, $\kappa$ ), we normally mean the number is the input or output of every vertex unless otherwise specified.

Dilation and congestion.

For $k$ independent CONGEST algorithms $\mathcal{A}=\{A_{i}\mid i\in[k]\}$ on the same network $G=(V,E)$ , the dilation of $\mathcal{A}$ is defined to be $d_{\mathcal{A}}=\max_{i\in[k]}d_{i}$ where $d_{i}$ is the round complexity of algorithm $A_{i}$ , and the congestion of $\mathcal{A}$ is defined to be $c_{\mathcal{A}}=\max_{e\in E}\left(\sum_{i\in[k]}c^{e}_{i}\right)$ , where $c^{e}_{i}$ is the number of messages sent through edge $e$ by algorithm $A_{i}$ . The following lemma is taken from [Gha15]. We will frequently use this lemma in our algorithm description.

Lemma 3.1.

All algorithms in $\mathcal{A}$ can be simulated in $\widetilde{O}(d_{\mathcal{A}}+c_{\mathcal{A}})$ rounds.

As an example, consider growing BFS trees with depth $d$ starting from $c$ vertices. This collection has dilation $d$ and congestion $O(c)$ , so it can be done in $\widetilde{O}(d+c)$ rounds.
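The two quantities in Lemma 3.1 can be computed directly from their definitions. The following Python sketch assumes that each algorithm is summarized by its round count $d_{i}$ and by a dictionary mapping each edge $e$ to its message count $c^{e}_{i}$ ; this summary format and the function names are illustrative assumptions.

```python
def dilation(round_counts):
    """d_A: the largest round complexity among the algorithms."""
    return max(round_counts)

def congestion(edge_loads):
    """c_A: the largest total number of messages on a single edge.

    `edge_loads` has one dict per algorithm, mapping edge -> message count.
    """
    total = {}
    for loads in edge_loads:
        for edge, count in loads.items():
            total[edge] = total.get(edge, 0) + count
    return max(total.values(), default=0)
```

If three BFS trees each send one message over the same edge, that edge has load $3$ , which matches the congestion $O(c)$ in the BFS example with $c=3$ .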

3.2 Vertex Residual Graph

In this section, we define the vertex residual graph: a directed graph $G^{\prime}$ such that finding a directed path on $G^{\prime}$ is equivalent to finding a $(G,s,P)$ -Augmenting Path defined in Definition 2.4.

We use ideas from the well-known reduction from vertex connectivity to edge connectivity. We split each vertex $v$ into two vertices $v^{in},v^{out}$ , with a directed edge from $v^{in}$ to $v^{out}$ . For each edge $(u,v)$ in the original graph, we build an edge from $u^{out}$ to $v^{in}$ . Moreover, the residual graph is the graph obtained by reversing edge directions on $P$ , i.e., for each edge $(u,v)\in E(P)$ we reverse the direction of edge $(u^{out},v^{in})$ and for any $u\in V(P)$ we reverse the direction of edge $(u^{in},u^{out})$ . Since only edges with an endpoint in $V_{I}(P)$ change direction, in the following definition we only split a vertex $v\in V_{I}(P)$ into $v^{in},v^{out}$ ; for a vertex $v$ not in $V_{I}(P)$ , we combine $v^{in},v^{out}$ into a single vertex $v^{out}$ .

Definition 3.2 (Vertex residual graph).

Given an undirected graph $G=(V,E)$ , a vertex $s\in V$ and a set $P$ containing $k$ internally vertex disjoint simple paths starting from $s$ , we define the vertex residual graph on $(G,P)$ as the directed graph $G^{\prime}=(V^{\prime},E^{\prime})$ . Let $X=V_{I}(P)$ be all the internal vertices of $P$ .

[TABLE]

We say $s\in G$ is the projection of the vertices $s^{in},s^{out}\in G^{\prime}$ , denoted as $M(s^{in})=M(s^{out})=s$ . Similarly, for a path $p^{\prime}$ in $G^{\prime}$ , $M(p^{\prime})$ is the path on $G$ that maps each vertex in $p^{\prime}$ to its projection, maintaining the order, and combining consecutive repeated vertices mapped from $(u^{out},u^{in})$ . We call a vertex in $V^{\prime}$ an in-vertex or an out-vertex according to its superscript, i.e., whether it is $v^{in}$ or $v^{out}$ for some $v\in V$ .

One should keep in mind the following relationship between Definition 2.4 and Definition 3.2: if a path reaches $v^{in}$ , that means the path enters $v$ from a vertex other than $suc_{p}(v)$ and must go to $pre_{p}(v)$ .

The following lemmas show how to relate finding paths in a vertex residual graph to increasing the number of internally vertex disjoint paths. For an edge $(u,v)$ , the reversed edge is defined to be $(u,v)^{T}=(v,u)$ . For an edge set $E$ , the reversed edge set is $E^{T}=\{(v,u)\mid(u,v)\in E\}$ . The symmetric difference of $E$ is defined to be $\oplus(E)=E-(E\cap E^{T})$ , i.e., every pair of opposite edges is cancelled.
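The operators $E^{T}$ and $\oplus(E)$ are plain set operations on directed edges. A minimal Python sketch, with edges stored as ordered pairs and with illustrative function names, is:

```python
def reverse_edges(edges):
    """E^T: every directed edge (u, v) becomes (v, u)."""
    return {(v, u) for (u, v) in edges}

def oplus(edges):
    """The operator E - (E intersect E^T): drop each pair of opposite edges."""
    edges = set(edges)
    return edges - (edges & reverse_edges(edges))
```

For instance, in $\{(a,b),(b,a),(b,c)\}$ the edges $(a,b)$ and $(b,a)$ cancel and only $(b,c)$ survives.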

Lemma 3.3 (Augmenting; Proof in Appendix B).

Given an undirected graph $G=(V,E)$ , a vertex $s\in V$ and a set $P$ containing $k$ internally vertex disjoint simple paths starting from $s$ and ending at multiset $T\subseteq V$ , let $G^{\prime}$ be the vertex residual graph on $(G,P)$ . If there exists a simple directed path $p^{\prime}$ in $G^{\prime}$ starting from $s$ and ending at $v^{out}$ , while the internal vertices of $p^{\prime}$ do not contain any $t^{out}$ for $t\in T$ , then there exist $k+1$ internally vertex disjoint simple paths $P^{\prime}$ starting from $s$ and ending at multiset $T\cup\{v\}$ .

Moreover, in the subgraph $\oplus(E(P)\cup E(M(p^{\prime})))$ , the maximal connected component containing $s$ is $E(P^{\prime})$ ; the other maximal connected components of $\oplus(E(P)\cup E(M(p^{\prime})))$ are vertex disjoint circles.

The following lemma shows how to find a cut if an augmenting path cannot be found.

Lemma 3.4 (Find cut).

Let $G=(V,E)$ be an undirected graph, $s\in V$ a vertex, and $P$ a set containing $k$ internally vertex disjoint simple paths starting from $s$ . Suppose $G^{\prime}=(V^{\prime},E^{\prime})$ is the vertex residual graph on $(G,P)$ , and $S^{\prime}\subseteq V^{\prime}$ is the set of all vertices that $s^{out}$ can reach in $G^{\prime}$ . Let $S=\{v\in V\mid v^{in}\in S^{\prime},v^{out}\not\in S^{\prime}\}$ . Then $S$ has size at most $k$ and is a vertex cut in $G$ if $V^{\prime}\backslash S^{\prime}\not=\emptyset$ .

Proof of Lemma 3.4.

First, notice that $S$ can contain at most $1$ vertex in each path $p\in P$ . If it contains two vertices $v_{i},v_{j}$ where $p=(\ldots,v_{i},\ldots,v_{j},\ldots)$ , then we know $v^{in}_{j}\in S^{\prime}$ , which means $s^{out}$ can reach $v^{in}_{j}$ , and $v^{in}_{j}$ can reach $v^{out}_{i}$ by following $p$ backwards. This leads to a contradiction as $v^{out}_{i}\not\in S^{\prime}$ . Thus, we have $|S|\leq k$ .

Denote $L=\{v\in V\mid v^{out}\in S^{\prime}\}$ . We will prove that $\mathsf{N}(L)=S$ and $V\backslash\mathsf{N}^{+}(L)\not=\emptyset$ , which means $S$ is a vertex cut.

We first prove that $\mathsf{N}(L)=S$ . To prove $\mathsf{N}(L)\subseteq S$ , suppose $v$ is a neighbor of $u\in L$ where $v\not\in L$ ; we will prove $v\in S$ . Notice that $u^{out}\in S^{\prime}$ according to the definition of $L$ . Therefore, if $v^{in}$ does not exist, then $(u^{out},v^{out})$ is an edge, which contradicts the fact that $v^{out}\not\in S^{\prime}$ . Thus, $v^{in}$ exists and $(u^{out},v^{in})$ is an edge, which means $v^{in}\in S^{\prime}$ . Since $v^{out}\not\in S^{\prime}$ , we have $v\in S$ . To prove $S\subseteq\mathsf{N}(L)$ , for a vertex $v$ with $v^{in}\in S^{\prime}$ and $v^{out}\not\in S^{\prime}$ , we first have $v\not\in L$ . Since $v^{in}\in S^{\prime}$ , there exists a path $(s^{out},\ldots,u^{out},v^{in})$ (since every edge that goes into $v^{in}$ must come from an out-vertex). We have $u^{out}\in S^{\prime},u\in L$ and $(u,v)\in E$ , which proves $v\in\mathsf{N}(L)$ .

Then we prove that there exists a destination $t$ of a path $p\in P$ such that $t^{out}\not\in S^{\prime}$ . Otherwise, if $S^{\prime}$ contains all destinations of $p\in P$ , then $S^{\prime}$ contains all $v^{in},v^{out}\in V(P)$ , which means $S^{\prime}$ contains all the nodes in $V^{\prime}$ as the graph is connected. Now we get $t\not\in L$ , and we also know that $t\not\in S$ according to the definition. Since we proved $\mathsf{N}(L)=S$ , we get $t\in V\backslash\mathsf{N}^{+}(L)$ .

∎
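Lemma 3.4 can be tried out on small graphs with the following centralized Python sketch. It does not reproduce Definition 3.2 verbatim. As an assumption, it splits every vertex (not only those in $V_{I}(P)$ ) and uses the textbook residual graph of the split network: every arc $u^{out}\to v^{in}$ is kept, each path edge adds the backward arc $v^{in}\to u^{out}$ , and the arc between $v^{in}$ and $v^{out}$ is reversed for internal path vertices. The names `residual_arcs` and `find_cut` are illustrative.

```python
from collections import deque

def residual_arcs(adj, paths):
    """Arcs of a split-vertex residual graph; nodes are (v, "in") and (v, "out")."""
    internal = {v for p in paths for v in p[1:-1]}
    arcs = set()
    for u in adj:
        if u in internal:
            arcs.add(((u, "out"), (u, "in")))   # split arc is used, so reversed
        else:
            arcs.add(((u, "in"), (u, "out")))
        for v in adj[u]:
            arcs.add(((u, "out"), (v, "in")))   # assumption: edge arcs are kept
    for p in paths:
        for u, v in zip(p, p[1:]):
            arcs.add(((v, "in"), (u, "out")))   # backward arc along the path
    return arcs

def find_cut(adj, paths, s):
    """Return S = {v : v_in reachable, v_out not}, or None if all is reachable."""
    out = {}
    for a, b in residual_arcs(adj, paths):
        out.setdefault(a, []).append(b)
    reach = {(s, "out")}
    queue = deque(reach)
    while queue:
        x = queue.popleft()
        for y in out.get(x, []):
            if y not in reach:
                reach.add(y)
                queue.append(y)
    nodes = {(v, side) for v in adj for side in ("in", "out")}
    if reach == nodes:
        return None
    return {v for v in adj if (v, "in") in reach and (v, "out") not in reach}
```

On the path $a - b - c$ with $P=\{(a,b,c)\}$ and $s=a$ , the search from $a^{out}$ reaches $b^{in}$ but not $b^{out}$ , so the sketch returns $\{b\}$ , a vertex cut of size $|P|=1$ .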

4 IsolatingSmallCut (Proof of Lemma 2.1)

In this section, we prove Lemma 2.1. We will give details of the algorithm described in Section 2.1, in the context of the vertex residual graph.

4.1 Distributed Algorithm Details

We first describe the detailed DFS subroutine for one vertex $u$ . Recall that $u$ initially gets an empty flow-path set $P$ , and in each loop, it uses DFS to increase the size of $P$ by $1$ , by finding an augmenting path in the residual graph. We use $c_{0}$ to denote a sufficiently large constant.

Remark 4.1.

More details about line 1: Let $H$ be the subgraph containing all edges in $G^{\prime}$ that have transferred the DFS-token. Clearly, $H$ is connected and contains all paths from $u^{out}$ to vertices in $S^{\prime}$ . To sample $v^{out}$ , we sample a random rank in $[1,n^{c}]$ for every out-vertex and pick the highest one by communicating inside $H$ ; to find $p^{\prime}$ , we simply follow the DFS-token path backward, which is also inside $H$ . To update $P$ to $P^{\prime}$ using $p^{\prime}$ , recall Lemma 3.3. We first map $p^{\prime}$ to $M(p^{\prime})$ . Then we truncate $M(p^{\prime})$ at a vertex $t$ that fits one of the following two cases:

1. $t$ is an end vertex of a path $p\in P$ .

2. $t\not=u,t\in A$ (for the definition of $A$ , see Algorithm 2).

Then we compute $H^{\prime}=\oplus(E(P)\cup E(truncated[M(p^{\prime})]))\subseteq H$ , where $truncated[M(p^{\prime})]$ is the path from $u$ to $t$ in $M(p^{\prime})$ , find the connected component of $H^{\prime}$ containing $u$ , and divide it into $|P|+1$ internally vertex disjoint paths.

Fact 4.2.

Algorithm 1 has the following properties.

1. It either outputs a valid $|P|$ -vertex cut, or internally vertex disjoint paths $P^{\prime}$ .

2. It has constant congestion inside $H$ , and has dilation $O(\kappa\alpha)$ .

3. When the algorithm ends, $S^{\prime}$ has size $\Omega(\kappa\alpha)$ .

The first fact is due to Lemmas 3.3 and 3.4. The second fact is straightforward from the algorithm description. The third fact is due to the DFS procedure: each round of the for loop (line 1) either adds a new vertex to $S^{\prime}$ , or sends back the DFS-token from a vertex in $S^{\prime}$ , while each vertex can send back the DFS-token only once.

The main algorithm will run Algorithm 1 for all vertices in $A$ simultaneously, which might lead to higher congestion. To avoid congestion, we need the following lemma, which is the restatement of ‘Lemma’ 2.6 in the context of the vertex residual graph.

Lemma 4.3 (Path-handshaking; proof in Section 4.2).

Let $G=(V,E)$ be an undirected graph and $s_{1},s_{2},t\in V$ . For $i\in\{1,2\}$ , suppose $P_{i}$ is a flow-path set of $s_{i}$ . Let $G^{\prime}_{i}$ be the vertex residual graph on $(G,P_{i})$ , and suppose there exists a simple path $p^{\prime}_{i}$ on $G^{\prime}_{i}$ starting from $s^{out}_{i}$ , ending at $t^{out}$ . Then there exists a path $p^{\prime}$ with $E(M(p^{\prime}))\subseteq E\left(\{M(p^{\prime}_{1}),M(p^{\prime}_{2})\}\cup P_{1}\cup P_{2}\right)$ , such that $p^{\prime}$ is either on $G^{\prime}_{1}$ starting from $s^{out}_{1}$ ending at $s_{2}^{in}$ or $s_{2}^{out}$ , or on $G^{\prime}_{2}$ starting from $s^{out}_{2}$ ending at $s_{1}^{in}$ or $s_{1}^{out}$ .

See Algorithm 2 for our main algorithm in this section. For convenience, we duplicate each edge $(u,v)$ in the communication graph into two parallel edges: one edge carries messages sent from $u^{out}$ and the other carries messages sent from $u^{in}$ .

Line 2 refers to the DFS procedure in Algorithm 1 loop 1. Line 2 refers to line 1, 1 in Algorithm 1, running simultaneously for every $Alg_{s}$ using Lemma 3.1. Line 2 is also running simultaneously for every pair in $(u,v)$ using Lemma 3.1.

4.2 Proof of Lemma 4.3

In this section, we will prove the important path-handshaking lemma. Suppose $P_{1},P_{2}$ end at multisets $T_{1},T_{2}$ , respectively. According to Lemma 3.3, there exist internally vertex disjoint paths $P^{\prime}_{1},P^{\prime}_{2}$ ending at $T_{1}\cup\{t\},T_{2}\cup\{t\}$ , where $E(P^{\prime}_{1})\subseteq\oplus(E(P_{1}\cup\{M(p^{\prime}_{1})\})),E(P^{\prime}_{2})\subseteq\oplus(E(P_{2}\cup\{M(p^{\prime}_{2})\}))$ . Denote the edge set $E(P_{1}\cup P_{2}\cup\{M(p^{\prime}_{1}),M(p^{\prime}_{2})\})$ as a subgraph $H=(V_{H},E_{H})$ . We have $E(P^{\prime}_{1})\cup E(P^{\prime}_{2})\subseteq H$ according to Lemma 3.3. Let $H^{\prime}_{1},H^{\prime}_{2}$ be the vertex residual graphs restricted to $H$ (i.e., $H^{\prime}_{i}$ is the subgraph of $G^{\prime}_{i}$ that contains the edges $(u,v)$ with $M(u)=M(v)$ or $(M(u),M(v))\in H$ ). Let $S_{i}$ denote the set of all the vertices $v$ such that $s_{i}$ can reach $v^{out}$ in $H^{\prime}_{i}$ . Since $p^{\prime}_{1},p^{\prime}_{2}$ both end at $t^{out}$ , we have $t\in S_{1}\cap S_{2}$ . Now we suppose $s^{out}_{1}$ cannot reach $s^{in}_{2},s^{out}_{2}$ on $H^{\prime}_{1}$ , and $s^{out}_{2}$ cannot reach $s^{in}_{1},s^{out}_{1}$ on $H^{\prime}_{2}$ , and derive a contradiction. We first show some properties of $S_{i}$ and $P^{\prime}_{i}$ for $i\in\{1,2\}$ .

Claim 4.4.

If $(u,v)\in\oplus(E(P_{i}\cup\{M(p^{\prime}_{i})\}))$ and $v^{in}$ or $v^{out}$ is reachable from $s_{i}^{out}$ in $H^{\prime}_{i}$ , then $u\in S_{i}$ .

Proof.

According to the definition of $\oplus$ , $(u,v)$ is either in $E(p^{*})$ for some $p^{*}\in P_{i}$ , or in $E(M(p^{\prime}_{i}))$ . In the former case, $v^{out}$ can reach $v^{in}$ , $v^{in}$ can reach $u^{out}$ , which means $u\in S_{i}$ . In the latter case, suppose $(u,v)\not\in E(P_{i})$ . We also have $(v,u)\not\in E(P_{i})$ according to the definition of $\oplus$ . Thus, $u^{in}$ has no edge to $v^{in}$ or $v^{out}$ . Since $(u,v)\in E(M(p^{\prime}_{i}))$ , the only possibility is $u^{out}\in V(p^{\prime}_{i})$ . Recall that $p^{\prime}_{i}$ is a path from $s_{i}^{out}$ in $H^{\prime}_{i}$ , so we have $u\in S_{i}$ . ∎

Claim 4.5.

For each path $p\in P^{\prime}_{i}$ , exactly a prefix of the path is in $S_{i}$ ; i.e., either $V(p)\subseteq S_{i}$ , or there exists a vertex $F(p)\in V(p),F(p)\not\in S_{i}$ , such that $u\in S_{i}$ for any $u\prec_{p}F(p)$ .

Proof.

Suppose $p\in P^{\prime}_{i},p=(s_{i}=v_{0},v_{1},...,v_{\ell})$ . We have $s_{i}\in S_{i}$ . Suppose $v_{m}\in S_{i}$ for some $m>0$ , which means $v_{m}^{out}$ is reachable from $s_{i}^{out}$ in $H^{\prime}_{i}$ . Since we have $(v_{m-1},v_{m})\in\oplus(E(P_{i}\cup\{M(p^{\prime}_{i})\}))$ , according to Claim 4.4, we get $v_{m-1}\in S_{i}$ . Thus, for any $u\prec_{p}v_{m}$ , we have $u\in S_{i}$ , which leads to the claim. ∎

Claim 4.6.

In $H$ , every neighbor of $w\in S_{i}$ is either in $S_{i}$ or equal to $F(p)$ for some $p\in P^{\prime}_{i}$ .

Proof.

Suppose $w^{\prime}$ is a neighbor of $w$ . If $w^{\prime}\not\in V_{I}(P_{i})$ , then there is an edge from $w^{out}$ to $w^{\prime out}$ , which means $w^{\prime out}$ can be reached from $w^{out}$ in $H^{\prime}_{i}$ . Thus, $w^{\prime}\in S_{i}$ . Now suppose $w^{\prime}\in V_{I}(P_{i})$ and $w^{\prime}\not\in S_{i}$ . There is an edge from $w^{out}$ to $w^{\prime in}$ , which means $w^{\prime in}$ is reachable from $s_{i}^{out}$ in $H^{\prime}_{i}$ . Suppose $w^{\prime}\in V_{I}(p^{*})$ for some $p^{*}\in P_{i}$ , and $p^{*}=(s_{i}=v_{0},v_{1},...,v_{m-1},v_{m}=w^{\prime},v_{m+1},...)$ , i.e., $v_{m-1},v_{m+1}$ are the two vertices adjacent to $w^{\prime}$ in $p^{*}$ . We first show that $(v_{m},v_{m+1})\in\oplus E(P_{i}\cup\{M(p^{\prime}_{i})\})$ . To prove this, we only need to show that $(v_{m+1},v_{m})\not\in E(M(p^{\prime}_{i}))$ . Indeed, otherwise either $v^{out}_{m+1}$ or $v^{in}_{m+1}$ is in $V(p^{\prime}_{i})$ , and in both cases $v^{out}_{m}$ is reachable from $s_{i}^{out}$ and $v_{m}\in S_{i}$ , which is a contradiction. Thus, $(v_{m+1},v_{m})\not\in E(M(p^{\prime}_{i}))$ is true. Then we show that $v_{m}\in V(P^{\prime}_{i})$ . To prove this, notice that $(v_{m},v_{m+1})$ is either in a circle, or in $E(P^{\prime}_{i})$ according to Lemma 3.3. Suppose $(v_{m},v_{m+1})$ is in a circle $C$ . We have $C\subseteq\oplus E(P_{i}\cup\{M(p^{\prime}_{i})\})$ and $s_{i}\not\in C$ . Since $P_{i}$ are internally vertex disjoint simple paths, there must exist an edge in $C$ that is in $E(M(p^{\prime}_{i}))$ . An endpoint of this edge must be in $S_{i}$ . According to Claim 4.4, all vertices on the circle are in $S_{i}$ , which is a contradiction to $v_{m}\not\in S_{i}$ . Thus, the only possibility is $(v_{m},v_{m+1})\in E(P^{\prime}_{i})$ . Let $p^{\prime\prime}\in P^{\prime}_{i}$ be the path with $v_{m}=w^{\prime}\in V(p^{\prime\prime})$ , and let $(w^{\prime}_{pre},w^{\prime})\in E(p^{\prime\prime})$ . Recall that $w^{\prime in}$ is reachable from $s_{i}^{out}$ in $H^{\prime}_{i}$ . According to Claim 4.4, $w^{\prime}_{pre}\in S_{i}$ . According to Claim 4.5, $w^{\prime}=F(p^{\prime\prime})$ , which finishes the proof. ∎

We divide the paths in $P_{1}\cup P_{2}$ into three types, defined as follows. For $i=1,2$ , we define $i^{\prime}=2,1$ as the opposite index. Suppose path $p$ starts at $s_{i}$ and ends at $t_{p}$ , $p$ is called

1. type 1, if $F(p)$ does not exist (which means $V(p)\subseteq S_{i}$ according to Claim 4.5), and $t_{p}\in S_{i^{\prime}}$ .

2. type 2, if $F(p)$ exists, and $F(p)\in S_{i^{\prime}}$ .

3. type 3, if it is not a type 1 or type 2 path.

Among all the paths, let the number of type 1, type 2, type 3 paths be $n_{1},n_{2},n_{3}$ respectively. Consider a type 1 or type 2 path $p=(s_{i}=w_{0},w_{1},w_{2},...,w_{\ell},...)$ , where we set $w_{\ell}=t_{p}$ if $F(p)$ does not exist, and $w_{\ell}=F(p)$ if $F(p)$ exists. We define $v(p)=w_{m}$ as the vertex with the smallest index $m\in(0,\ell)$ satisfying $w_{m+1}\in S_{i^{\prime}}$ and $w_{m}\not\in S_{i^{\prime}}$ . Since we have $w_{\ell}\in S_{i^{\prime}}$ and $w_{1}\not\in S_{i^{\prime}}$ (if $w_{1}\in S_{i^{\prime}}$ , then $s_{i^{\prime}}$ can reach $w_{1}^{out}$ , in which case $s_{i^{\prime}}$ can reach $w_{0}^{in}$ or $w_{0}^{out}$ , but remember that we assume $s_{i^{\prime}}$ cannot reach $w_{0}^{in}$ or $w_{0}^{out}$ ), $v(p)$ must exist. We first show that for a different type 1 or type 2 path $p^{\prime}\not=p$ , it must hold that $v(p)\not=v(p^{\prime})$ . Otherwise, suppose $v(p)=v(p^{\prime})$ , where $p^{\prime}=(s_{j}=w^{\prime}_{0},w^{\prime}_{1},...,w^{\prime}_{m^{\prime}}=w_{m}=v(p)=v(p^{\prime}),w^{\prime}_{m^{\prime}+1},...)$ . Note that $i\not=j$ since $P^{\prime}_{i}$ are internally vertex disjoint paths. Further, $w_{m}\in S_{j}$ , according to Claim 4.5. That leads to a contradiction to the definition of $v(p)$ .

Since there are $n_{1}+n_{2}$ type 1 or type 2 paths, we get in total $n_{1}+n_{2}$ such vertices $v(p)$ and corresponding edges $(w_{m}=v(p),w_{m+1})$ , where $w_{m+1}$ is in $S_{i^{\prime}}$ and $w_{m}$ is not in $S_{i^{\prime}}$ . Thus, according to Claim 4.6, $w_{m}=F(p^{*})$ for some $p^{*}\in P^{\prime}_{i^{\prime}}$ . Path $p^{*}$ is a type 2 path since $F(p^{*})\in S_{i}$ . Since there are $n_{2}$ type 2 paths, and $v(p)=w_{m}$ is distinct for different type 1 or 2 paths $p$ , we have $n_{1}+n_{2}\leq n_{2}$ , which means $n_{1}=0$ . However, remember that $P^{\prime}_{1}$ contains a path ending at $t$ , and $t\in S_{1}\cap S_{2}$ , which is a type 1 path. That leads to a contradiction since type $1$ paths do exist.

4.3 Analysis of Algorithm

Round complexity.

We first bound the number of iterations of the while loop in line 2.

Lemma 4.7.

For any $i$ , the while loop in line 2 runs for $O(\log n)$ iterations.

Proof.

We will prove that each iteration of the while loop decreases the size of $A^{\prime}$ by at least half. Since pairs in $Pairs$ are disjoint, and vertices not inside $Pairs$ are all deleted from $A^{\prime}$ , we just need to prove line 2 will find a path either from $u^{out}$ to $v^{out}$ or from $v^{out}$ to $u^{out}$ . Notice that $(u,v)$ is inside $Pairs$ because in line 2, the algorithms $Alg_{u},Alg_{v}$ collide at some edge $(w_{1},w_{2})$ . Remember that we duplicate each edge into two edges, one for transferring messages from the out-vertex and one for transferring messages from the in-vertex, so there are two cases to consider:

•

$Alg_{u},Alg_{v}$ both send from out-vertex $w^{out}_{1}$ . That means the DFS-tokens from $u^{out}$ and $v^{out}$ both arrive at $w^{out}_{1}$ , which means there are paths from $u^{out}$ and $v^{out}$ to $w^{out}_{1}$ in the residual graphs, satisfying the precondition of Lemma 4.3. Also notice that the mapping of these paths is included in $H_{(u,v)}$ . Thus, by DFS searching in $H_{(u,v)}\cup E(P_{u})\cup E(P_{v})$ , at least one of $u^{out}$ or $v^{out}$ will find a path to $v^{in},v^{out}$ or $u^{in},u^{out}$ , respectively. Let us consider the case where $u^{out}$ reaches $v^{in}$ or $v^{out}$ . We argue that $v^{in}$ does not exist in the residual graph of $u$ : because of the truncation in Remark 4.1, $v$ cannot be an internal vertex of any path in $P_{u}$ . Hence $u^{out}$ reaches $v^{out}$ in the residual graph.

•

$Alg_{u},Alg_{v}$ both send from in-vertex $w^{in}_{1}$ . Recall that in Definition 3.2, $w^{in}_{1}$ has only one out-neighbor, which is an out-vertex $w^{out}_{2}$ . Besides, the edge $(w_{2},w_{1})$ is in both $P_{u},P_{v}$ . Therefore, $u^{out},v^{out}$ both have paths to $w^{out}_{2}$ in the residual graphs, and the mappings of these paths are inside $H_{(u,v)}\cup E(P_{u})\cup E(P_{v})$ . Thus, by DFS searching in $H_{(u,v)}\cup E(P_{u})\cup E(P_{v})$ , at least one of $u$ or $v$ will successfully update $P_{u}$ or $P_{v}$ based on the same argument as above.

This finishes the proof. ∎

Then we bound the complexity of line 2, 2.

Lemma 4.8.

For each $v\in A^{\prime}$ such that $Alg_{v}$ is not stopped, let $H_{v}$ be a subgraph containing all edges involved in $Alg_{v}$ . $H_{v}$ has dilation $O(\kappa\alpha)$ , and for any edge $e$ , the number of different $s\in A^{\prime}$ where $e\in H_{s}$ is bounded by $O(\kappa\alpha)$ .

For each pair $(u,v)$ in $Pairs$ , let $H^{\prime}_{(u,v)}=H_{(u,v)}\cup E(P_{u})\cup E(P_{v})$ in line 2, the dilation of $H^{\prime}_{(u,v)}$ is bounded by $\widetilde{O}(\kappa^{2}\alpha)$ , and for any edge $e$ , the number of different pairs $(x,y)$ where $e\in H_{(x,y)}$ is bounded by $\widetilde{O}(\kappa^{2}\alpha)$ .

Proof.

According to Fact 4.2, $H_{v}$ has dilation $O(\kappa\alpha)$ . Consider the for loop 2: in each round each edge can transfer one message, which means each edge can be included in at most one $H_{v}$ in each round. Since there are $O(\kappa\alpha)$ rounds, the first part of the lemma is proved.

The above arguments also hold for $H_{(u,v)}$ . To bound the dilation and congestion for $H^{\prime}_{(u,v)}$ , we focus on analysing $E(P_{u})$ and $E(P_{v})$ . The updating rule of $P_{u}$ guarantees that if an edge is included in $P_{u}$ , it must have transferred a message in line 2 for $Alg_{u}$ in some previous loop. Since in each round each edge can transfer one message, and line 2 runs in $O(\kappa\alpha)\cdot O(\log n)\cdot\kappa=\widetilde{O}(\kappa^{2}\alpha)$ rounds, the lemma is proved. The term $O(\log n)$ comes from the number of while loops 2 according to Lemma 4.7, and the $\kappa$ term comes from the number of outer loops 2. ∎

The round complexity claimed in Lemma 2.1 is $\kappa\cdot O(\log n)\cdot\widetilde{O}(\kappa^{2}\alpha)=\widetilde{O}(\kappa^{3}\alpha)$ , where the first two terms are the numbers of outer and inner loops and the third term is the complexity of one inner loop.

Correctness.

Then we prove the correctness. We first prove that, if a cut is output, it must be a valid cut. Recall that in Algorithm 1, a cut is output if and only if the token goes back to $u_{0}$ . This can only happen when $S^{\prime}$ contains all the vertices that $u^{out}$ can reach. According to Lemma 3.4, $\{v\in V\mid v^{in}\in S^{\prime},v^{out}\not\in S^{\prime}\}$ is a cut as long as $V^{\prime}\backslash S^{\prime}\not=\emptyset$ . The latter condition holds because $\kappa<n^{1/4},\alpha<\sqrt{n}$ : as $|S^{\prime}|=O(\kappa\alpha)$ , $V^{\prime}\backslash S^{\prime}$ must be non-empty.

Now suppose there exist $s\in A,L\subseteq V$ such that $|\mathsf{N}(L)|<\kappa,A\cap\mathsf{N}^{+}_{v}(L)=\{s\},|L|\leq\alpha$ ; we will prove that $\bot$ will be output with at most constant probability. Denote $R=V\backslash\mathsf{N}^{+}(L)$ . If all paths in $P_{s}$ end at $R$ , then according to Lemma 3.3, there exist $\kappa$ internally vertex disjoint paths between $s$ and $R$ , which contradicts the fact that $|\mathsf{N}(L)|<\kappa$ . Let $i$ be the first outer loop (line 2) where the ending set of $P_{s}$ contains a vertex not in $R$ . Let $p^{\prime}$ be the path found in line 1. If $p^{\prime}$ is truncated by Remark 4.1 at $t$ where $t$ is an end vertex of a path $p\in P$ , the ending set of $P_{s}$ is the same as in the $(i-1)$ -th loop; if $p^{\prime}$ is truncated by Remark 4.1 at $t$ where $t\not=s,t\in A$ , then $t$ must be in $R$ . Thus, in the $i$ -th loop, $p^{\prime}$ must end at a vertex in $\mathsf{N}^{+}(L)$ .

Let $I_{i}$ be the event that the first outer loop (line 2) where the ending set of $P_{s}$ contains a vertex not in $R$ is loop $i$ . Let $p^{\prime}$ be the path found in line 1 in loop $i$ . The endpoint of $p^{\prime}$ is a random vertex among $\Omega(\kappa\alpha)$ vertices, which lies in $\mathsf{N}^{+}(L)$ with probability bounded by $O(1/\kappa)$ since $|\mathsf{N}^{+}(L)|=O(\alpha+\kappa)$ . Thus, the event $I_{i}$ happens with probability $O(1/\kappa)$ . If the algorithm returns $\bot$ , then one of $I_{1},I_{2},...,I_{\kappa}$ must happen. By a union bound, the event that the algorithm returns $\bot$ has probability bounded by $O(1/\kappa)\cdot\kappa$ . By making the constant hidden in $O$ sufficiently small, the probability is bounded by a constant.

5 SingleSourceLocalCut (Proof of Lemma 2.2)

In this section we prove Lemma 2.2.

5.1 Path Centered Clustering

We first give the definition of paths centered clustering promised in Remark 2.9. For a path $p=(v_{0},v_{1},...,v_{\ell})$ and vertices $v_{i},v_{j}$ with $i<j$ , we write $v_{i}\prec_{p}v_{j}$ to denote that $v_{i}$ precedes $v_{j}$ on path $p$ .

Definition 5.1 (Paths centered clustering).

For an undirected graph $G=(V,E)$ with diameter $D$ and a set of $k$ simple paths $P$ , a paths centered clustering on $(G,P)$ is a tuple $\mathcal{C}=(\mathcal{S},Centers,Rep,LU)$ , where $\mathcal{S}$ is a partition of $V\backslash V(P)$ (a set of vertex sets $\mathcal{S}$ is defined to be a partition of a vertex set $U$ if any two sets in $\mathcal{S}$ are disjoint and the union of $\mathcal{S}$ is $U$ ); $Centers,Rep,LU$ are functions on $\mathcal{S}$ , and each $S\in\mathcal{S}$ is called a cluster. For any $S\in\mathcal{S}$ , we have

1. $Centers[S]$ is a vertex set containing at most one vertex in each path $p\in P$ , and $Rep[S]\in Centers[S]$ .

2. $G[S]$ is connected; each vertex in $Centers[S]$ is a neighbor of some vertex in $S$ ; the subgraph $\{(u,v)\mid u\in S,v\in S\cup Centers[S]\}$ has diameter at most $kD$ .

3. Suppose $Rep[S]$ is on path $p\in P$ . We write $S_{v}$ as the cluster $S_{v}\in\mathcal{S}$ containing $v$ if $v\in V\backslash V(P)$ . Then there exists an edge $(u,v)$ such that $u\in S$ , and $v$ satisfies

•

If $LU[S]=0$ , then either $v\in V(p),v\prec_{p}Rep[S]$ , or there exists $c\in Centers[S_{v}]$ such that $c\prec_{p}Rep[S]$ . $S$ is called a lower cluster of $Rep[S]$ .

•

If $LU[S]=1$ , then either $v\in V(p),Rep[S]\prec_{p}v$ or there exists $c\in Centers[S_{v}]$ such that $Rep[S]\prec_{p}c$ . $S$ is called an upper cluster of $Rep[S]$ .

Recall that in the simplified version defined in Section 2.2, the clustering is a partition of $V$ and each cluster contains exactly one vertex in the flow-paths $P$ as its center. However, it is different in the above definition: clustering is a partition of $V\backslash V(P)$ , each cluster $S$ contains no vertex in $P$ , but still has a unique “representative” $Rep[S]$ adjacent to $S$ in $P$ . Moreover, each node in $P$ might be the “representative” of several different clusters. The function $LU$ defines whether $S$ is connected to the upper part or the lower part of the “representative” path. See Figure 4 as an example.

The following definition defines a partial order $\preceq_{\mathcal{C}}$ on $V$ based on a paths centered clustering $\mathcal{C}$ . Using the following definition, the problem mentioned in Remark 2.9 is solved, see Remark 5.3.

Definition 5.2 (Paths centered order).

A paths centered order on the paths centered clustering $\mathcal{C}=(\mathcal{S},Centers,Rep,LU)$ on $(G,P)$ is a partial order $\preceq_{\mathcal{C}}$ defined as follows. We first extend functions $Rep,LU$ to all vertices: for $v\in V(P)$ , $Rep[v]=v,LU[v]=1$ ; for $v\not\in V(P)$ , let $S_{v}\in\mathcal{S}$ be the cluster that $v$ is in, then $Rep[v]=Rep[S_{v}],LU[v]=LU[S_{v}]$ . For $u,v\in V$ , we define $u\preceq_{\mathcal{C}}v$ iff. $Rep[u],Rep[v]$ are both on path $p\in P$ and $Rep[u]\prec_{p}Rep[v]$ , or $Rep[u]=Rep[v],LU[u]\leq LU[v]$ .
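The comparison in Definition 5.2 can be written as a short predicate. The Python sketch below assumes that the paths in $P$ share no vertices, so that every path vertex has a unique (path, position) pair, and that `rep` and `lu` are dictionaries already extended to all vertices as in the definition; all names are illustrative.

```python
def path_position(paths):
    """Map each path vertex to (index of its path, position on that path)."""
    return {v: (i, j) for i, p in enumerate(paths) for j, v in enumerate(p)}

def precedes(u, v, rep, lu, pos):
    """True if u precedes-or-equals v in the paths centered order."""
    if rep[u] == rep[v]:
        return lu[u] <= lu[v]
    (path_u, idx_u), (path_v, idx_v) = pos[rep[u]], pos[rep[v]]
    # Different representatives: same path and strictly earlier position.
    return path_u == path_v and idx_u < idx_v
```

For a single path $(a,b,c)$ with a lower cluster $\{x\}$ and an upper cluster $\{y\}$ of $b$ , this gives $a\preceq_{\mathcal{C}}x\preceq_{\mathcal{C}}b\preceq_{\mathcal{C}}y$ , while $b\preceq_{\mathcal{C}}x$ does not hold.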

Remark 5.3.

The paths centered ordering is defined based on the following intuition: if $u\preceq_{\mathcal{C}}v$ , then $u$ can reach $v$ in the vertex residual graph $G^{\prime}$ on $(G,P)$ . See Figure 5 for an example.

The following lemma shows there exists a fast algorithm to compute paths centered clustering.

Lemma 5.4 (Clustering; Proof in Section A.1).

On an undirected graph $G=(V,E)$ with diameter $D$ , there exists a CONGEST model algorithm that, given a set of vertex disjoint simple paths $P$ , either outputs a vertex cut with size at most $|P|$ , or computes a paths centered clustering on $(G,P)$ , denoted as $(\mathcal{S},Centers[S],Rep[S])$ , where each vertex in $S\in\mathcal{S}$ knows $S,Centers[S],Rep[S],LU[S]$ . The algorithm has dilation $\tilde{O}(|P|^{2}D)$ and congestion $\widetilde{O}(|P|^{2})$ .

5.2 Algorithm Overview

Partial Virtual Graph.

Recall that in Section 2.2, we showed the framework of reachability in the CONGEST model: construct a virtual graph which depicts the reachability of the original graph. We also showed that it is not efficient to build the virtual graph on all sample hubs. In fact, any edge in the virtual graph starts at active hubs. The reason is that if we can reach a non-active hub, then we can find the desired path and can stop. This leads to the following definition of partial virtual graph, where the partial virtual graph is only guaranteed to either depict the reachability of the graph or can reach the desired destination.

For two vertices in graph $G$ , we use $u\stackrel{{\scriptstyle G}}{{\to}}v$ to denote an arbitrary path from $u$ to $v$ in $G$ .

Definition 5.5 (Partial Virtual Graph).

For a directed graph $G=(V,E)$ and a set $T\subseteq V$ , a partial virtual graph on $(G,T)$ with dilation $d$ is a virtual graph $G^{\prime}=(V^{\prime},E^{\prime})$ where $V^{\prime}\subseteq V$ satisfying the following property: For any two vertices $s\in V^{\prime},v\in V$ , if $s\stackrel{{\scriptstyle G}}{{\to}}v$ exists, then there exists $v^{\prime}\in V^{\prime},s\stackrel{{\scriptstyle G^{\prime}}}{{\to}}v^{\prime}$ and $v^{\prime}$ satisfies one of the following two conditions.

1. $v^{\prime}\in T$ .

2. $v^{\prime}$ has distance at most $d$ to $v$ in $G$ .

The following is the schematic of our algorithm. We omit most of the details to give the reader a high-level idea of what the algorithm is doing. The implementation details will be presented in the following sub-sections.

Schematic of SingleSourceLocalCut( $G,s,t,\kappa,\alpha$ )

•

Inputs: An undirected graph $G=(V,E)$ , vertices $s,t\in V$ , integers $\kappa,\alpha,d\in\mathbb{N}$ with $\kappa<n^{\frac{1}{4}}$ .

•

Outputs: A valid $\kappa$ -vertex cut, or $\bot$ .

•

Initially set $P=\emptyset$ . $P$ is a set of internally vertex disjoint paths starting from $s$ . Let $d=\kappa^{1.5}\sqrt{n}$ . Repeat the following steps for $\kappa$ loops.

Each vertex becomes a hub with probability $p=1/d$ . $s,t$ become hubs with probability $1$ . Use Lemma 5.4 on $(G,P)$ to get a path centered clustering $\mathcal{C}$ , or Return a vertex cut. Each hub $v$ becomes an active hub if the number of hubs $u$ with $u\preceq_{\mathcal{C}}v$ is bounded by $O(\kappa\alpha p\log n)$ .

Let $G^{\prime}$ be the vertex residual graph on $(G,P)$ . If $v$ is a hub or active hub in $V$ , then $v^{out}$ is a hub or active hub in $V^{\prime}$ . Let $H$ contain all the hubs and $H_{a}$ contain all the active hubs in $V^{\prime}$ . Each vertex in $H_{a}$ broadcasts a message of $\widetilde{O}(\kappa)$ bits to the whole graph (specified in Section 5.4, analogous to the “downward edges” and “upward edges” described in Section 2.2). Using this information, a partial virtual graph $G^{\prime\prime}$ with dilation $\Theta(d\log n)$ on $(G^{\prime},T=\{t^{out}\}\cup H\backslash H_{a})$ can be known by all vertices.

Let $H_{r}$ contain all the vertices that $s$ can reach in $G^{\prime\prime}$ . If $|H_{r}|=\Omega(\kappa\alpha p\log n)$ , then sample a uniformly random $h_{1}$ from $\Theta(\kappa\alpha p\log n)$ vertices in $H_{r}$ , and let $p^{\prime\prime}=s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}h_{1}$ ; otherwise, if $t^{out}\in H_{r}$ , let $p^{\prime\prime}=s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}t^{out}$ ; otherwise, if there exists $h_{0}\in H_{r}\cap(H\backslash H_{a})$ , then sample a uniformly random hub $h^{*}$ among all hubs $h$ with $h\preceq_{\mathcal{C}}h_{0}$ and let $p^{\prime\prime}=s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}h_{0}\stackrel{{\scriptstyle G^{\prime}}}{{\to}}h^{*}$ ; otherwise, let $S^{\prime}$ contain all the vertices that some vertex in $H_{r}$ can reach within distance $\Theta(d\log n)$ ; if $S^{\prime}$ has no outgoing edges, Return the vertex cut $\{v\in V\mid v^{in}\in S^{\prime},v^{out}\not\in S^{\prime}\}$ , otherwise Return $\bot$ .

Map $p^{\prime\prime}$ into a path $p^{\prime}$ in $G^{\prime}$ . Update $P$ by the augmenting path $p^{\prime}$ using Lemma 3.3.

If no cut is output, Return $\bot$ .

Distributed implementation organization.

One can see that there are four steps in each loop. We will describe the implementation details for each step in the following subsections.

Step 1 (Section 5.3):

We will describe how to get hubs and active hubs. This can be done easily by aggregation through clusters and paths.

Step 2 (Section 5.4):

We will define the partial virtual graph $G^{\prime\prime}$ by giving $5$ types of edges in $G^{\prime\prime}$ . We will also show that these types of edges can be constructed by having each active hub broadcast $\widetilde{O}(\kappa)$ bits of messages.

Step 3 (Sections 5.5 and 5.6):

There are four if-else possibilities in Step 3. We give more details for each possibility in Section 5.5. One important detail is how to get the path $h_{0}\stackrel{{\scriptstyle G^{\prime}}}{{\to}}h^{*}$ , whose existence is guaranteed by Remark 5.3. This is described in Section 5.6.

Step 4 (Section 5.7):

The final step will give details about mapping paths in $G^{\prime\prime}$ into paths in $G^{\prime}$ , by distributively mapping each type of edge in $G^{\prime\prime}$ to paths in $G^{\prime}$ . The final step also contains the details of updating $P$ .

5.3 Step 1: Find hubs and active hubs.

We first introduce a basic algorithm that we will frequently use. The proof is deferred to Appendix C. The lemma mainly shows how to efficiently aggregate values in a path by divide and conquer.

Lemma 5.6 (Path aggregation; Proof in Appendix C).

There exists a CONGEST algorithm that, given a directed graph $G=(V,E)$ with undirected diameter $D$ , a path $p=(v_{0},v_{1},...,v_{k})$ , an integer $d$ with $1\leq d\leq k$ , and a polynomially bounded integer $x[v_{i}]$ held by each vertex $v_{i}$ on the path (repeated vertices on the path are treated as different vertices), lets each vertex $v_{i}$ output the value $\sum_{0\leq j\leq i}x[v_{j}]$ . The algorithm has dilation $\tilde{O}(d+D)$ and congestion $\tilde{O}(k/d)$ .
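The proof of Lemma 5.6 is in Appendix C and is not reproduced here. As a rough intuition for the trade-off between the block length $d$ and the $k/d$ messages, the following centralized Python sketch computes the same prefix sums in three phases. The function name `path_prefix_sums` and the three-phase split are illustrative assumptions, not the authors' distributed procedure.

```python
from itertools import accumulate

def path_prefix_sums(x, d):
    """Return [x[0], x[0]+x[1], ...] computed block by block.

    x: values x[v_0], ..., x[v_k] in path order.
    d: block length (hypothetical stand-in for the parameter d of the lemma).
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    blocks = [x[i:i + d] for i in range(0, len(x), d)]
    # Phase 1: every block sums its own values (work confined to ~d hops).
    block_totals = [sum(b) for b in blocks]
    # Phase 2: the ~k/d block totals are combined; offsets[b] is the sum of
    # all blocks strictly before block b.
    offsets = [0] + list(accumulate(block_totals))[:-1]
    # Phase 3: every block scans locally, starting from its offset.
    out = []
    for off, b in zip(offsets, blocks):
        out.extend(off + s for s in accumulate(b))
    return out
```

With all inputs equal to 1 the output is the position of each vertex on the path, and with a single nonzero value at the first vertex every vertex learns that value.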

Now we show how to compute hubs and active hubs. First, we calculate for each vertex $v\in V(P)$ its path’s index $PI[v]$ and its own index $I[v]$ on the path (both defined later). This can be done by running the algorithm in Lemma 5.6 twice for each path $p_{i}\in P$ : once with $x_{j}=1$ for all $j$ to get $I[v]$ , and once with $x_{j}=0$ for all $j$ except $x_{0}=i$ to get $PI[v]$ . Since there are at most $\kappa$ paths, each of length bounded by $\widetilde{O}(\kappa\alpha)$ , this causes dilation $\tilde{O}(d+D)$ and congestion $\tilde{O}(\kappa^{2}\alpha/d)$ .

Then we use Lemma 5.4 on $(G,P)$ to create $\mathcal{C}=(S,Centers,Rep,LU)$ . Notice that $H_{S}=(S\cup Centers[S],\{(u,v)\mid u\in S,v\in S\cup Centers[S]\})$ is a subgraph with diameter at most $O(kD)$ , and these subgraphs are disjoint for different $S$ . Thus, in the following, we always assume that an aggregation problem inside $S$ is solved in $H_{S}$ , in parallel for all $S$ . Each vertex $v\in S$ can get the information $PI[Rep[S]],I[Rep[S]]$ by broadcasting inside $H_{S}$ . Recall that we sample each vertex in $G$ as a hub with probability $p=1/d$ , and $s$ is a hub. For each $v\in V(P)$ , let $Num_{L}[v]$ denote the number of hubs inside all lower clusters of $v$ , and let $Num_{U}[v]$ denote the number of hubs in all upper clusters of $v$ plus an indicator variable that equals $1$ if $v$ is a hub. These values can be computed by collecting the number of hubs inside $S$ and sending it to $Rep[S]$ using $H_{S}$ . This procedure causes dilation $\widetilde{O}(\kappa D)$ and congestion $\widetilde{O}(1)$ .

Recall that we defined $\preceq_{\mathcal{C}}$ as the paths centered ordering in Definition 5.2. Now we want to compute, for each hub $h$ , the number of hubs $h^{\prime}$ such that $h^{\prime}\preceq_{\mathcal{C}}h$ , denoted as $\mathsf{Before}_{\mathcal{C}}[h]$ . This can be done by running the algorithm in Lemma 5.6 for each path $p$ , where each vertex $v$ on the path receives the input value $Num_{L}[v]+Num_{U}[v]$ . Once $v$ receives the output $\sum_{v^{\prime}\leq_{p}v}(Num_{L}[v^{\prime}]+Num_{U}[v^{\prime}])$ of the algorithm, it sends the output to all its upper clusters and sends the output minus $Num_{U}[v]$ to all its lower clusters. Finally, each hub $h$ becomes an active hub if $\mathsf{Before}_{\mathcal{C}}[h]=\widetilde{O}(\kappa\alpha p)$ . This guarantees that the number of active hubs is bounded by $\widetilde{O}(\kappa^{2}\alpha p)$ . This can be done with dilation $\widetilde{O}(d+\kappa D)$ and congestion $\widetilde{O}(\kappa^{2}\alpha/d)$ .
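The counting rule of this paragraph can be illustrated on a single path with a small centralized Python sketch. It takes the per-vertex counts $Num_{L}[v]$ and $Num_{U}[v]$ in path order and returns the two values that each path vertex forwards to its lower and upper clusters. The names `before_counts` and `is_active`, and the explicit numeric `threshold`, are assumptions made for illustration; the paper only fixes the threshold up to $\widetilde{O}(\kappa\alpha p)$ .

```python
from itertools import accumulate

def before_counts(num_lower, num_upper):
    """For a single path, return (to_lower, to_upper).

    to_upper[i]: value that path vertex i sends to its upper clusters,
                 the prefix sum of Num_L + Num_U up to and including i.
    to_lower[i]: value sent to its lower clusters, the same prefix sum
                 minus Num_U[i].
    """
    prefix = list(accumulate(l + u for l, u in zip(num_lower, num_upper)))
    to_lower = [p - u for p, u in zip(prefix, num_upper)]
    return to_lower, prefix

def is_active(before, threshold):
    # threshold is a hypothetical concrete stand-in for the ~O(kappa*alpha*p) bound.
    return before <= threshold
```

For example, with `num_lower = [2, 0, 1]` and `num_upper = [1, 3, 0]`, the prefix sums are `[3, 6, 7]`, so lower clusters receive `[2, 3, 7]` and upper clusters receive `[3, 6, 7]`.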

5.4 Step 2: Build partial virtual graph.

In this step, we show how to build the partial virtual graph $G^{\prime\prime}$ on the vertex residual graph efficiently, mainly by reducing the number of edges in the partial virtual graph while maintaining the mutual reachability relationship and broadcasting a small amount of information to let all vertices know all edges in $G^{\prime\prime}$ .

Recall that $G^{\prime}$ is the vertex residual graph on $(G,P)$ , and $v^{out}$ is a hub or active hub in $G^{\prime}$ iff. $v$ is a hub or active hub in $G$ . There are five types of edges in the partial virtual graph. We will define them and also show what information each active hub should broadcast to build the edges. Recall that in Section 2.2, we only gave two types of edges, namely upwards edges and downwards edges. The five types of edges are an extension of these, designed to make it easy to recover a path in $G^{\prime}$ from an edge in $G^{\prime\prime}$ (see Section 5.7 for how to get the path in $G^{\prime}$ from each type of edge). For each $v^{out}$ , we use $S_{v^{out}}$ to denote the cluster that contains $v$ . For convenience, if $v\in V(P)$ , then we also define $S_{v^{out}}=\{v\}$ and $Centers[S_{v^{out}}]=\{v\},Rep[S_{v^{out}}]=v,LU[S_{v^{out}}]=1$ .

Type 1: Edges inside clusters.

Each active hub broadcasts its cluster’s id (each cluster has a unique cluster id, it can be the largest vertex id in the cluster, for example) to the whole graph. All the hubs in the same cluster form a clique in $G^{\prime\prime}$ .

Type 2: Upwards edges for upper clusters.

For each active hub $h$ , it broadcasts tokens to build a BFS tree with depth $\Theta(d\log n)$ on $G^{\prime}$ , denoted as $T_{h}$ . For each $p\in P$ , we define an order $\preceq^{\prime}_{p}$ over all active hubs $h^{\prime}\in T_{h}$ with $Centers[S_{h^{\prime}}]\cap V(p)\not=\emptyset$ as follows: Recall that $Centers[S_{h^{\prime}}]\cap V(p)$ can contain at most $1$ element, denote the only element as $v_{h^{\prime}}$ . Define $LU_{p}[S_{h^{\prime}}]=LU[S_{h^{\prime}}]$ if $v_{h^{\prime}}=Rep[S_{h^{\prime}}]$ , and $LU_{p}[S_{h^{\prime}}]=-1$ if $v_{h^{\prime}}\not=Rep[S_{h^{\prime}}]$ . Then $h_{1}\preceq^{\prime}_{p}h_{2}$ iff. $v_{h_{1}}\prec_{p}v_{h_{2}}$ or $v_{h_{1}}=v_{h_{2}},LU_{p}[S_{h_{1}}]\leq LU_{p}[S_{h_{2}}]$ . If there exists $h^{\prime}\in T_{h}$ with $Centers[S_{h^{\prime}}]\cap V(p)\not=\emptyset$ , then $h$ broadcasts the partial virtual graph edge $(h,h^{*})$ to the whole graph where $h^{*}$ is an arbitrary maximal active hub with respect to $\preceq^{\prime}_{p}$ , i.e., for any active hubs $h^{\prime}\in T_{h}$ such that $Centers[S_{h^{\prime}}]\cap V(p)\not=\emptyset$ , we have $h^{\prime}\preceq^{\prime}_{p}h^{*}$ . This can be found by aggregation on $T_{h}$ .

Type 3: Upwards edges for lower clusters.

For each active hub $h$ , it builds a reversed BFS tree $T^{\prime}_{h}$ with depth $O(d\log n)$ , i.e., all vertices in $T^{\prime}_{h}$ can reach $h$ by a path with length at most $O(d\log n)$ . For each $p\in P$ , if there exists active hub $h^{\prime}\in T^{\prime}_{h}$ with $Centers[S_{h^{\prime}}]\cap V(p)\not=\emptyset$ , then $h$ broadcasts the partial virtual graph edge $(h^{*},h)$ to the whole graph where $h^{*}$ is an arbitrary minimal active hub with respect to $\preceq^{\prime}_{p}$ , i.e., for any active hub $h^{\prime}\in T^{\prime}_{h}$ such that $Centers[S_{h^{\prime}}]\cap V(p)\not=\emptyset$ , we have $h^{*}\preceq^{\prime}_{p}h^{\prime}$ . This can be found by aggregation on $T^{\prime}_{h}$ .

Type 4: Downwards edges.

For each active hub $h$ and each $c\in Centers[S_{h}]$ , $h$ broadcasts $c$ ’s path id and position on the path, $PI[c],I[c]$ , and two indicators $To_{c}[h],From_{c}[h]\in\{0,1,2\}$ defined as follows: if $c^{out}\in T_{h}$ then $To_{c}[h]=2$ , otherwise if $c^{in}\in T_{h}$ then $To_{c}[h]=1$ , otherwise $To_{c}[h]=0$ ; similarly, if $c^{in}\in T^{\prime}_{h}$ then $From_{c}[h]=2$ , otherwise if $c^{out}\in T^{\prime}_{h}$ then $From_{c}[h]=1$ , otherwise $From_{c}[h]=0$ . The type 4 edges are all the edges $(h_{1},h_{2})$ such that there exist $c_{1}\in Centers[S_{h_{1}}],c_{2}\in Centers[S_{h_{2}}]$ satisfying $PI[c_{1}]=PI[c_{2}],To_{c_{1}}[h_{1}]>0,From_{c_{2}}[h_{2}]>0$ , and either $0<I[c_{1}]-I[c_{2}]<d\log n$ , or $To_{c_{1}}[h_{1}]+From_{c_{2}}[h_{2}]>2,I[c_{1}]=I[c_{2}]$ . Notice that in that case, $h_{1}$ can reach $h_{2}$ through a path with length bounded by $\widetilde{O}(d)$ .
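Because every vertex receives the broadcast tuples, the type 4 test can be evaluated locally. The Python sketch below shows one way to do so. It reads the indicator subscripts as $To$ for $c_{1}$ at $h_{1}$ and $From$ for $c_{2}$ at $h_{2}$ , which is an interpretation of the condition above, and it takes the window $d\log n$ as an explicit parameter `window`. The record type `CenterInfo` and the function name are hypothetical.

```python
from typing import NamedTuple

class CenterInfo(NamedTuple):
    """One broadcast tuple of an active hub h about a center c (hypothetical layout)."""
    path_id: int  # PI[c]
    index: int    # I[c]
    to: int       # To_c[h], in {0, 1, 2}
    frm: int      # From_c[h], in {0, 1, 2}

def has_type4_edge(centers_h1, centers_h2, window):
    """Decide whether (h1, h2) is a type 4 edge from the broadcast tuples."""
    for c1 in centers_h1:
        for c2 in centers_h2:
            if c1.path_id != c2.path_id or c1.to == 0 or c2.frm == 0:
                continue
            gap = c1.index - c2.index
            # h1 reaches c1, walks backwards along the path, and leaves at c2.
            if 0 < gap < window:
                return True
            # Same center: the in/out copies must be compatible.
            if gap == 0 and c1.to + c2.frm > 2:
                return True
    return False
```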

Type 5: Terminal edges.

For each active hub $h$ , if $T_{h}$ contains a vertex $v$ in $\{t^{out}\}\cup H\backslash H_{a}$ , then $h$ broadcasts the edge $(h,v)$ to the whole graph.

One can see that each vertex broadcasts at most $\widetilde{O}(\kappa)$ messages to the whole graph, and each of the trees $T_{h},T^{\prime}_{h}$ aggregates $\kappa$ messages. Since there are at most $\widetilde{O}(\kappa^{2}\alpha p)$ active hubs, this causes dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(\kappa^{3}\alpha p)$ . Now every vertex knows the same graph $G^{\prime\prime}$ locally. The following lemma shows that $G^{\prime\prime}$ is a partial virtual graph. The proof is deferred to Section A.2.

Lemma 5.7 (Partial virtual graph; Proof in Section A.2).

With high probability, the $G^{\prime\prime}$ defined by the above 5 types of edges is a partial virtual graph on $(G^{\prime},T=\{t^{out}\}\cup H\backslash H_{a})$ with dilation $\Theta(d\log n)$ .

5.5 Step 3: Find augmenting path.

One can see that there are four cases in Step 3. In the first three cases, a path $p^{\prime\prime}$ should be found, in the last case, a cut should be output. We discuss each case separately in the following. Recall that $H_{r}$ is the set of all the active hubs that $s$ can reach in $G^{\prime\prime}$ .

Case 1 ( $|H_{r}|=\Omega(\kappa\alpha p\log n)$ ):

We do not sample uniformly among all hubs in $H_{r}$ because we might then get a path in $G^{\prime}$ with length $\omega(\kappa\alpha\log n)$ , which we want to avoid. Instead, we find a subset $H^{\prime}_{r}\subseteq H_{r},|H^{\prime}_{r}|=\Theta(\kappa\alpha p\log n)$ such that $s$ can reach all the vertices in $H^{\prime}_{r}$ through vertices in $H^{\prime}_{r}$ , and such that if one hub of a cluster is in $H^{\prime}_{r}$ , then all hubs of that cluster are in $H^{\prime}_{r}$ . This is possible since each cluster contains at most $O(\kappa\alpha p\log n)$ hubs in $H_{r}$ . To find $H^{\prime}_{r}$ , we start from $H^{\prime}_{r}=\{s\}$ and repeatedly add some $h\in H_{r}$ that $H^{\prime}_{r}$ can reach; whenever $h$ is added, all hubs of $H_{r}$ in the same cluster as $h$ are also added. We stop when $|H^{\prime}_{r}|=\Theta(\kappa\alpha p\log n)$ . Then we sample a hub $h_{1}\in H^{\prime}_{r}$ uniformly at random and let $p^{\prime\prime}$ be a path from $s$ to $h_{1}$ using vertices in $H^{\prime}_{r}$ . All the above procedures can be done locally at each vertex since $G^{\prime\prime}$ is known to all the vertices. We also need to guarantee that each vertex gets the same path in $G^{\prime\prime}$ . This can be done by electing a leader that samples a vertex in $H^{\prime}_{r}$ and broadcasts it, after which each vertex finds the unique path with the smallest lexicographical order to the sampled vertex. We will show how to turn this path into a path in $G^{\prime}$ in Section 5.7.
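Since $G^{\prime\prime}$ is known to every vertex, the growth of $H^{\prime}_{r}$ is a purely local computation. The following Python sketch is one possible way to carry it out: a breadth-first growth from $s$ that pulls in whole clusters and stops once a target size is reached. The names `grow_reachable_subset` and `path_to` are hypothetical, the sketch assumes that hubs of the same cluster are pairwise adjacent in $G^{\prime\prime}$ (the type 1 clique), and it uses a deterministic BFS parent pointer in place of the lexicographically smallest path mentioned in the text.

```python
def grow_reachable_subset(adj, cluster_of, s, reachable, target_size):
    """Grow H'_r from s inside H_r, adding whole clusters at a time.

    adj: dict mapping a vertex of G'' to its out-neighbours.
    cluster_of: dict mapping each hub in `reachable` to its cluster id.
    reachable: the set H_r.
    Returns (chosen, parent): the vertices of H'_r in insertion order and a
    parent pointer for each of them (None for s).
    """
    members = {}
    for h in reachable:
        members.setdefault(cluster_of[h], []).append(h)
    chosen, parent = [s], {s: None}
    i = 0
    while i < len(chosen) and len(chosen) < target_size:
        u = chosen[i]
        i += 1
        for v in sorted(adj.get(u, ())):  # sorted: every vertex computes the same set
            if v in parent or v not in reachable:
                continue
            parent[v] = u
            chosen.append(v)
            # Pull in every hub of H_r in v's cluster via the assumed clique edges.
            for w in sorted(members[cluster_of[v]]):
                if w not in parent:
                    parent[w] = v
                    chosen.append(w)
            if len(chosen) >= target_size:
                break
    return chosen, parent

def path_to(parent, h):
    """Follow parent pointers back to s and return the path from s to h."""
    path = []
    while h is not None:
        path.append(h)
        h = parent[h]
    return path[::-1]
```

The returned set may overshoot `target_size` by at most the number of hubs in one cluster, which is consistent with a $\Theta(\cdot)$ size bound when clusters contain few hubs.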

Case 2 ( $t^{out}\in H_{r}$ ):

The path is $p^{\prime\prime}=s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}t^{out}$ . As in Case 1, a unique path from $s$ to $t^{out}$ in $G^{\prime\prime}$ is agreed upon by all vertices. We will show how to turn this path into a path in $G^{\prime}$ in Section 5.7.

Case 3 (there exists $h_{0}\in H_{r}$ with $h_{0}\in H\backslash H_{a}$ ):

We need to sample a uniformly random hub $h^{*}$ among all hubs $h$ with $h\preceq_{\mathcal{C}}h_{0}$ . Since $h_{0}$ is not an active hub, the number of hubs $h$ with $h\preceq_{\mathcal{C}}h_{0}$ must be $\Omega(\kappa\alpha p\log n)$ . To sample $h^{*}$ , recall that in Definition 5.2, $\preceq_{\mathcal{C}}$ only needs the information $Rep[h_{0}],LU[h_{0}]$ . Thus, $h_{0}$ broadcasts the path id and position on the path of $Rep[h_{0}]$ , together with $LU[h_{0}]$ . Using this information, each hub $h$ with $h\preceq_{\mathcal{C}}h_{0}$ can mark itself. Denote the set of all marked vertices as $L_{h_{0}}$ . To sample one vertex, each vertex in $L_{h_{0}}$ samples itself with probability $1/|L_{h_{0}}|$ , and the vertices aggregate through the whole graph to check whether exactly one vertex sampled itself. If not, they repeat. With high probability, the procedure ends after $O(\log n)$ repetitions. This shows how to sample $h^{*}$ uniformly at random. The path is $p^{\prime\prime}=s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}h_{0}\stackrel{{\scriptstyle G^{\prime}}}{{\to}}h^{*}$ . We will see how to turn the path $s\stackrel{{\scriptstyle G^{\prime\prime}}}{{\to}}h_{0}$ into a path in $G^{\prime}$ in Section 5.7. According to Remark 5.3, there is a path from $h_{0}$ to $h^{*}$ in $G^{\prime}$ . However, finding the path from $h_{0}$ to $h^{*}$ is a complicated procedure, which we defer to Section 5.6.
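The retry-until-unique sampling step can be simulated centrally in a few lines of Python. In each repetition every marked vertex volunteers independently with probability $1/|L_{h_{0}}|$ ; a repetition succeeds when exactly one vertex volunteers, which happens with probability $(1-1/m)^{m-1}\geq 1/e$ for $m=|L_{h_{0}}|$ , and by symmetry the winner is uniform. The function name `sample_one`, the explicit `rng` argument, and the `max_rounds` safety cap are assumptions added for the sketch.

```python
import random

def sample_one(marked, rng, max_rounds=10_000):
    """Pick one element of `marked` by repeated independent self-sampling."""
    marked = list(marked)
    if not marked:
        raise ValueError("nothing to sample from")
    q = 1.0 / len(marked)
    for _ in range(max_rounds):
        # Each marked vertex flips its own coin; the list stands for the
        # global aggregation that counts how many vertices volunteered.
        volunteers = [v for v in marked if rng.random() < q]
        if len(volunteers) == 1:
            return volunteers[0]
    raise RuntimeError("no repetition produced exactly one volunteer")
```

A call such as `sample_one(range(20), random.Random(0))` returns one of the twenty candidates; fixing the seed only makes the simulation reproducible.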

Case 4 ( $H_{r}$ contains neither $t^{out}$ nor a non-active hub):

We will need the following claim.

Claim 5.8.

With high probability, $\cup_{h\in H_{r}}T_{h}$ contains all the vertices that $s^{out}$ can reach in $G^{\prime}$ .

To see this, suppose $v\in V^{\prime}$ is reachable from $s$ but not in $\cup_{h\in H_{r}}T_{h}$ . Then according to Lemma 5.7 and Definition 5.5, $s^{out}$ should reach a vertex in $T=\{t^{out}\}\cup H\backslash H_{a}$ , which is a contradiction. Thus, according to Lemma 3.4, by finding all the vertices $v\in V$ such that $v^{out}$ is in $\cup_{h\in H_{r}}T_{h}$ , we get a vertex cut with size at most $\kappa$ .

5.6 Substep: Find path from $h_{0}$ to $h^{*}$ .

We first describe a lemma showing how to find a path between two vertices in the same cluster $S\in\mathcal{S}$ .

Lemma 5.9 (Find path in cluster; Proof in Appendix C).

On an undirected graph $H=(V,E)$ with diameter $D$ , given a connected induced subgraph $H[V^{\prime}]$ of $H$ , two vertices $s,t\in V^{\prime}$ , and an integer $1\leq d\leq N$ , there exists a CONGEST model algorithm on $H$ that finds a path from $s$ to $t$ in $H[V^{\prime}]$ , with dilation $\tilde{O}(d+D)$ and congestion $\tilde{O}(|V|/d)$ .

For two vertices $u,v\in S$ , we use $Path_{S}(u,v)$ to denote the path found by Lemma 5.9 from $u$ to $v$ in $S$ , using $H_{S}$ as the communication graph $H$ in the lemma. For $p\in P$ , let the path $p^{\prime}$ in $G^{\prime}$ be the corresponding backward path of $p$ . We use $Path_{p^{\prime}}(u,v)$ for two vertices $u,v$ in $p^{\prime}$ to denote the subpath from $u$ to $v$ in $p^{\prime}$ . Such a path can be found distributively by broadcasting the positions of $u,v$ , after which every vertex on the path with position between $u$ and $v$ selects its edge towards $v$ . In the following, we consider three cases and show how to find the path from $h_{0}$ to $h^{*}$ in each case. For convenience, we assume $M(h_{0}),M(h^{*})\not\in V(P)$ , and denote the clusters containing $M(h_{0}),M(h^{*})$ by $A,C$ respectively. The case when $M(h_{0})$ or $M(h^{*})$ is in $V(P)$ only makes things easier and can be treated similarly. In the following, we will also define a cluster $B$ . For convenience, we assume the sizes of $A,B,C$ are all bounded by $O(\kappa\alpha\log n)$ , which bounds the length of the path. We discuss what to do when one of $A,B,C$ has size $\Omega(\kappa\alpha\log n)$ after the three cases. Let $a=Rep[A],c=Rep[C]$ and let $a^{\prime}\in A,c^{\prime}\in C$ be vertices such that $a^{\prime}$ is a neighbor of $a$ and $c^{\prime}$ is a neighbor of $c$ . Let $p$ be the path containing $a$ and $c$ .

Case 1 ( $A$ is an upper cluster):

Suppose $A$ is connected to cluster $B$ through edge $(a_{0},b_{0})$ , with $b\in Centers[B]$ and $a\prec_{p}b$ . Let $b^{\prime}$ be a vertex in $B$ that is a neighbor of $b$ . Then the path is $Path_{A}(h_{0},a_{0}^{out})\to b_{0}^{out}\to Path_{B}(b_{0}^{out},b^{\prime out})\to b^{in}\to Path_{p^{\prime}}(b^{in},c^{out})\to c^{\prime out}\to Path_{C}(c^{\prime out},h^{*})$ . Note that according to the definition of upper cluster, it might be the case that $A$ is connected to a vertex $b\succ_{p}a$ on the path (but not to a cluster). In that case, the problem becomes easier: there is an edge directly from $a_{0}^{out}$ to $b^{in}$ .

Case 2 ( $A$ is a lower cluster and $c\prec_{p}a$ ):

$A$ can go to its representative and then follow the path backwards to reach $C$ . The path is $Path_{A}(h_{0},a^{\prime out})\to a^{in}\to Path_{p^{\prime}}(a^{in},c^{out})\to c^{\prime out}\to Path_{C}(c^{\prime out},h^{*})$ .

Case 3 ( $A$ is a lower cluster and $a=c$ ):

In that case, $C$ must be a lower cluster. Suppose $C$ is connected to cluster $B$ through edge $(b_{0},c_{0})$ , with $b\in Centers[B]$ and $b\prec_{p}c$ . Let $b^{\prime}$ be a vertex in $B$ that is a neighbor of $b$ . Then the path is $Path_{A}(h_{0},a^{\prime out})\to a^{in}\to Path_{p^{\prime}}(a^{in},b^{out})\to b^{\prime out}\to Path_{B}(b^{\prime out},b_{0}^{out})\to c_{0}^{out}\to Path_{C}(c_{0}^{out},h^{*})$ . Note that according to the definition of lower cluster, it might be the case that $C$ is connected to a vertex $b\prec_{p}c$ on the path (but not to a cluster). In that case the problem becomes easier: there is an edge from $b^{out}$ to $c_{0}^{out}$ .

For the case that one of $A,B,C$ has size $\Omega(\kappa\alpha)$ , we need the following lemma. The proof of the lemma is deferred to Appendix C.

Lemma 5.10 (Find small piece in cluster; proof in Appendix C).

On an undirected graph $H=(V,E)$ with diameter $D$ , given a connected induced subgraph $H[V^{\prime}]$ of $H$ satisfying $|V\backslash V^{\prime}|\leq k$ , a vertex $s\in V^{\prime}$ , and an integer $x$ , there exists a CONGEST model algorithm on $H$ that finds a vertex set $V^{\prime\prime}\subseteq V^{\prime}$ such that $s\in V^{\prime\prime}$ , $|V^{\prime\prime}|=\Theta(x)$ , and $H[V^{\prime\prime}]$ is connected, with dilation $\tilde{O}(kD)$ and congestion $\tilde{O}(k)$ .

Without loss of generality, suppose the first of $A,B,C$ that has size $\Omega(\kappa\alpha)$ is $A$ . Then, when we want to find the path in $A$ by Lemma 5.9, we first use Lemma 5.10 to find a connected induced subgraph $A^{\prime}$ inside $A$ with $\Theta(\kappa\alpha)$ vertices containing $h_{0}$ , then sample a vertex $t^{*}$ in $A^{\prime}$ uniformly at random, and then use Lemma 5.9 to find the path from $h_{0}$ to $t^{*}$ in $A^{\prime}$ . Instead of reaching $h^{*}$ , the path ends at $t^{*}$ . The same applies if the first of them is $B$ or $C$ : we stop at the point where we would look for a path inside $B$ or $C$ , and end at a vertex $t^{*}$ inside $B$ or $C$ .

Since each cluster has size bounded by $\widetilde{O}(\kappa\alpha)$ , the above procedure has dilation $\widetilde{O}(d+D)$ , congestion $\widetilde{O}(\kappa\alpha/d)$ . The length of the path from $h_{0}$ to $h^{*}$ is bounded by $\widetilde{O}(\kappa\alpha)$ .

5.7 Step 4: Change path in $G^{\prime\prime}$ to $G^{\prime}$ and update $P$ .

Now given a path from $s$ to $h^{\prime}$ in $G^{\prime\prime}$ ( $h^{\prime}$ can be $h_{1},t^{out}$ or $h_{0}$ in Step 3), we want to turn it into a path in $G^{\prime}$ . We first refine this path such that it enters and leaves each cluster (hubs with the same cluster id, which form a clique) at most once: we put an edge from the first vertex on the path intersecting the clique to the last vertex on the path intersecting the clique and discard the subpath between them. Now we replace each edge $(h_{L},h_{R})$ in $p^{\prime\prime}$ by a path in $G^{\prime}$ . We consider two cases according to the type of $(h_{L},h_{R})$ .

Case 1 (Type 1 edge):

In this case, $h_{L},h_{R}$ are in the same cluster, denoted as $A$ . The path becomes $Path_{A}(h_{L},h_{R})$ in $G^{\prime}$ , which can be found with dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(|A|/d)$ using Lemma 5.9. Let $S[A]$ be the number of hubs inside $A$ . Since we sample each hub with probability $p$ , w.h.p. we have $|A|=\tilde{O}(S[A]/p)=\tilde{O}(S[A]d)$ . Notice that $Path_{A}(h_{L},h_{R})$ occurs at most once for each $A$ , and there are $\widetilde{O}(\kappa\alpha p)$ active hubs inside $H_{r}$ (or $H^{\prime}_{r}$ ), which means all type 1 edges can be turned into paths with dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(\kappa\alpha p)$ , while the total length of these paths is bounded by $\sum|A|=\tilde{O}(\sum S[A]d)=\tilde{O}(\kappa\alpha)$ .

Case 2 (Type 2,3,4,5 edge):

Notice that in this case, $h_{L}$ has distance bounded by $\widetilde{O}(d)$ to $h_{R}$ . The path from $h_{L}$ to $h_{R}$ in $G^{\prime}$ can be found by growing a BFS tree with depth at most $\widetilde{O}(d)$ . Finding all of them causes dilation $\widetilde{O}(d)$ and congestion $\widetilde{O}(\kappa\alpha p)$ , and the sum of the lengths of all these paths is bounded by $\widetilde{O}(\kappa\alpha)$ .

The total dilation is $\widetilde{O}(d+D)$ , the total congestion is $\widetilde{O}(\kappa\alpha p)$ , and the total length of the first part path is $\widetilde{O}(\kappa\alpha)$ .
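The refinement performed at the start of this step, which makes the path in $G^{\prime\prime}$ enter each cluster at most once, is a local computation on the shared graph. A minimal Python sketch follows, assuming that hubs of the same cluster are pairwise adjacent in $G^{\prime\prime}$ and that vertices without a cluster entry (for example $s$ ) count as singleton clusters. The function name `shortcut_clusters` is hypothetical.

```python
def shortcut_clusters(path, cluster_of):
    """Shortcut `path` so that every cluster is visited in one contiguous piece.

    For each cluster, jump from its first vertex on the path directly to its
    last vertex on the path and drop everything in between.
    """
    def key(v):
        # Vertices with no cluster id are treated as their own singleton cluster.
        return cluster_of.get(v, ("solo", v))

    last = {}
    for i, v in enumerate(path):
        last[key(v)] = i
    out, i = [], 0
    while i < len(path):
        v = path[i]
        j = last[key(v)]
        out.append(v)
        if path[j] != v:
            out.append(path[j])  # clique edge from first to last cluster vertex
        i = j + 1
    return out
```

For instance, the path `s, a1, x, a2, b1, a3, t` with `a1, a2, a3` in one cluster becomes `s, a1, a3, t`.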

Now we have a path in $G^{\prime}$ that is not necessarily simple. We use Lemma 5.6 on the path to find the position of each vertex on the path. If a vertex is repeated, it keeps only the out-edge pointing to the vertex with the largest position id (farthest from $s$ ) and deletes its other out-edges. The remaining edges form a subgraph in which the connected component containing $s$ is a simple path. The following lemma shows how to find it. The proof is deferred to Appendix C.
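The edge-pruning rule can be checked with a short centralized Python sketch. Every vertex keeps only the out-edge that follows its last occurrence on the walk; following the kept edges from $s$ then yields a simple path, because the last-occurrence positions strictly increase along it. The function name `simplify_walk` is hypothetical, and the sketch replaces the distributed component search of the lemma below by a direct traversal.

```python
def simplify_walk(walk):
    """Turn a non-empty walk (list of vertices, repeats allowed) into a simple path.

    The result starts at walk[0] and ends at walk[-1].
    """
    # Position of the last occurrence of every vertex.
    last = {v: i for i, v in enumerate(walk)}
    # Each vertex keeps only the out-edge used right after its last occurrence.
    nxt = {v: walk[i + 1] for v, i in last.items() if i + 1 < len(walk)}
    path, v = [walk[0]], walk[0]
    while v in nxt:
        v = nxt[v]
        path.append(v)
    return path
```

For the walk `s, a, b, a, c, b, t` the kept edges are `s→a`, `a→c`, `c→b`, `b→t`, giving the simple path `s, a, c, b, t`.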

Lemma 5.11 (Turn path into simple path; Proof in Appendix C).

On the network $G=(V,E)$ with undirected diameter $D$ , there exists a CONGEST model algorithm that, given the edge directions, a vertex $s$ , a subgraph $H$ with $\mu$ vertices that contains a path $p$ starting from $s$ such that there are no edges in $H$ between vertices in $p$ and other vertices in $H$ , and an integer $1\leq d\leq\mu$ , outputs for each vertex whether it is in $p$ or not. The algorithm has dilation $\tilde{O}(d+D)$ and congestion $\tilde{O}(\mu/d)$ .

We use Lemma 5.11 to fix the final simple path. Now we have a simple path $p^{\prime}$ in $G^{\prime}$ . Then we check whether its internal vertices intersect the endpoints of any path in $P$ ; if such an intersection exists, we discard all the vertices after it. According to Lemma 3.3, by computing $\oplus(E(M(p^{\prime}))\cup E(P))$ locally and using Lemma 5.11, we get $k+1$ internally vertex disjoint paths starting from $s$ . Since $p^{\prime}$ has length bounded by $\widetilde{O}(\kappa\alpha)$ , the step has dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(\kappa\alpha/d)$ .

5.8 Analysis of Algorithm

Round complexity.

The algorithm is described above. As shown in each step of the algorithm details, the total dilation and congestion for all $\kappa$ loops are bounded by $\widetilde{O}(\kappa d+\kappa^{3}D)$ and $\widetilde{O}(\kappa^{4}\alpha/d)$ . Recall that $d=\kappa^{1.5}\sqrt{n}$ , which leads to the dilation $\widetilde{O}(\kappa^{2.5}\sqrt{n}+\kappa^{3}D)$ and congestion $\widetilde{O}(\kappa^{2.5}\alpha/\sqrt{n})$ claimed in Lemma 2.2.
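For completeness, the substitution of $d=\kappa^{1.5}\sqrt{n}$ into the two bounds can be written out term by term (suppressing the polylogarithmic factors hidden in $\widetilde{O}$ ):

```latex
\kappa d + \kappa^{3} D
  = \kappa \cdot \kappa^{1.5}\sqrt{n} + \kappa^{3} D
  = \kappa^{2.5}\sqrt{n} + \kappa^{3} D,
\qquad
\frac{\kappa^{4}\alpha}{d}
  = \frac{\kappa^{4}\alpha}{\kappa^{1.5}\sqrt{n}}
  = \frac{\kappa^{2.5}\alpha}{\sqrt{n}} .
```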

Correctness.

Similarly to Section 4.3, we first prove that if a cut is output, it must be a valid cut. Recall that in the algorithm, a cut is output iff. the execution reaches the last sentence of Step 3. Since $S^{\prime}$ has no outgoing edges, $S^{\prime}$ contains all the vertices that $s^{out}$ can reach. According to Lemma 3.4, $\{v\in V\mid v^{in}\in S^{\prime},v^{out}\not\in S^{\prime}\}$ is a cut as long as $V^{\prime}\backslash S^{\prime}\not=\emptyset$ . The latter holds because $t\not\in S^{\prime}$ .

Now suppose there exists $L\subseteq V$ such that $|\mathsf{N}(L)|<\kappa,\{s,t\}\cap\mathsf{N}^{+}_{v}(L)=\{s\},|L|\leq\alpha$ . We will prove that $\bot$ is output with at most constant probability. According to Claim 5.8, $\bot$ will not be output at Step 3 with high probability, so we only consider the case that $\bot$ is output at the last line of the algorithm. Let $R=V\backslash\mathsf{N}^{+}(L)$ . If all paths in $P$ end in $R$ , then according to Lemma 3.3, there exist $\kappa$ internally vertex disjoint paths between $s$ and $R$ , which contradicts the fact that $|\mathsf{N}(L)|<\kappa$ .

Let $I_{i}$ be the event that the first loop in which the ending set of $P$ contains a vertex not in $R$ is loop $i$ . We will bound the probability that $I_{i}$ happens. Let $P$ be the flow-path before the $i$ -th loop. According to the algorithm description, at the $i$ -th loop, the algorithm finds an augmenting path $p^{\prime}$ in $G^{\prime}$ that ends either at an endpoint of $P$ (we denote the set of all endpoints of $P$ as $T$ ), or at $t^{out}$ , $h_{1}$ , $h^{*}$ or $t^{*}$ . Denote $R=V\backslash\mathsf{N}^{+}_{v}(L)$ . Note that $I_{i}$ implies $T\subseteq R$ . We show that $M(p^{\prime})$ ends at a point in $R$ with probability at least $1-\frac{1}{2\kappa}$ by considering each case.

1. If $p^{\prime}$ ends at $T$ or $t^{out}$ , then we are done since $T\subseteq R$ and $t\in R$ .

2. If $p^{\prime}$ ends at $h^{*}$ or $h_{1}$ : recall that $h^{*},h_{1}$ is a uniformly random hub from $\Omega(\kappa\alpha p\log n)$ hubs. Since we sample each vertex inside $L$ as a hub with probability at least $p$ , with high probability the number of hubs inside $\mathsf{N}^{+}_{v}(L)$ is bounded by $O((\alpha+\kappa)p\log n)$ . Since $\kappa<\alpha$ , we have $M(h_{1}),M(h^{*})\in R$ with probability at least $1-\frac{1}{2\kappa}$ .

3. If $p^{\prime}$ ends at $t^{*}$ : recall that $t^{*}$ is a uniformly random vertex from $\Omega(\kappa\alpha)$ vertices. Thus, $M(t^{*})\in R$ with probability at least $1-\frac{1}{2\kappa}$ .

The event $I_{i}$ happens with probability at most $\frac{1}{2\kappa}$ according to the above discussion, which means $\bot$ is output with probability bounded by a constant.
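The constant can be made explicit with a union bound over the $\kappa$ loops. The step below assumes, as argued earlier in this proof, that outputting $\bot$ at the last line requires some path of $P$ to end outside $R$ , that is, some event $I_{i}$ to occur:

```latex
\Pr[\bot \text{ is output at the last line}]
  \;\le\; \Pr\Big[\bigcup_{i=1}^{\kappa} I_{i}\Big]
  \;\le\; \sum_{i=1}^{\kappa} \Pr[I_{i}]
  \;\le\; \kappa \cdot \frac{1}{2\kappa}
  \;=\; \frac{1}{2}.
```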

Acknowledgment

We would like to thank Danupon Nanongkai for numerous fruitful discussions throughout the project, and the reviewers for their meticulous reading and comments.

Appendix A Clustering and Partial Virtual Graph

In this section we prove two main ingredients of our algorithm for large $\alpha$ : the clustering algorithm in Lemma 5.4 and the partial virtual graph properties in Lemma 5.7.

A.1 Proof of Lemma 5.4

We give the algorithm for building a paths centered clustering on $(G,P)$ . Each vertex in $V(P)$ broadcasts tokens to build a BFS tree: initially only vertices in $V(P)$ are active, and in each round every active vertex sends a token to all its neighbors if it has not done so. At the end of each round, if a vertex that has not been activated receives a token (possibly more than one), then it joins the BFS tree of an arbitrary vertex that sent it a token and becomes active. The procedure has dilation $O(D)$ and congestion $O(1)$ , since each active vertex sends only once, and each vertex has distance at most $D$ to a vertex in $V(P)$ .

Now we get a partition of $V$ , each part is a tree rooted at a vertex $v\in V(P)$ with depth at most $D$ , we denote it as $T_{v}$ . Each subtree rooted at a child of $v$ on $T_{v}$ is a cluster $S$ . All $S$ form a partition of $V\backslash V(P)$ , and $G[S]$ has diameter $D$ . Now we want to combine clusters in order to make each cluster either an upper cluster or a lower cluster. We maintain the center set $Centers[S]$ for each cluster $S$ . Initially $Centers[S]=\{v\}$ , where $T_{v}$ is the tree containing $S$ . We write $H_{S}=(S\cup Centers[S],\{(u,v)\mid u\in S,v\in S\cup Centers[S]\})$ . One can see that $H_{S}$ has diameter $D$ initially. During the algorithm we will maintain the property that $H_{S}$ has diameter $kD$ . For each vertex $u\in V\backslash V(P)$ , we use $S_{u}$ to denote the cluster $u$ is in.
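The initial partition can be pictured with a centralized multi-source BFS. The Python sketch below grows one tree from every vertex of $V(P)$ on an undirected adjacency list and then groups the remaining vertices by the child of the root they hang under, which corresponds to the initial clusters $S$ . The function name `initial_clusters` is hypothetical, and ties are broken by queue order rather than arbitrarily.

```python
from collections import deque

def initial_clusters(adj, path_vertices):
    """Multi-source BFS from V(P); returns (root, clusters).

    adj: dict vertex -> list of neighbours (undirected graph).
    root[v]: the vertex of V(P) whose tree T_root contains v.
    clusters: dict mapping a child c of some root to the vertex list of the
              subtree rooted at c (one initial cluster S per child).
    """
    root = {v: v for v in path_vertices}
    top = {}  # top[v]: the child of the root on the tree path to v
    queue = deque(sorted(path_vertices))
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in root:
                root[w] = root[u]
                # A neighbour of a root starts a new cluster; others inherit.
                top[w] = w if u in path_vertices else top[u]
                queue.append(w)
    clusters = {}
    for v, c in top.items():
        clusters.setdefault(c, []).append(v)
    return root, clusters
```

On the path graph `0-1-2-3-4` with `path_vertices = {0, 4}`, vertex 2 is claimed by the tree of 0 (through 1), so the initial clusters are `{1, 2}` and `{3}`.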

For path $p_{i}=(v_{0},v_{1},...,v_{j},...,v_{\ell})$ , we define $PI[v_{j}]=i,I[v_{j}]=j$ to denote the path id and position on the path for $v_{j}$ . For each edge $(u,v)$ where $u,v\not\in V(P),S_{u}\not=S_{v}$ , if there do not exist $c_{1}\in Centers[S_{u}],c_{2}\in Centers[S_{v}]$ such that $PI[c_{1}]=PI[c_{2}],I[c_{1}]\not=I[c_{2}]$ , then we call edge $(u,v)$ a critical edge. The idea of the algorithm is to eliminate all critical edges.
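The definition of a critical edge translates directly into a small predicate. In the Python sketch below, the caller is assumed to have already checked that $u,v\not\in V(P)$ and $S_{u}\not=S_{v}$ ; the function name `is_critical` and the dictionary representation of $PI$ and $I$ are illustrative choices.

```python
def is_critical(centers_u, centers_v, PI, I):
    """Return True iff no pair of centers shares a path id at different positions.

    centers_u, centers_v: the center sets Centers[S_u] and Centers[S_v].
    PI, I: dicts giving the path id and position of every center.
    """
    for c1 in centers_u:
        for c2 in centers_v:
            if PI[c1] == PI[c2] and I[c1] != I[c2]:
                return False  # witnessed: the edge is not critical
    return True
```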

We run several phases until there are no critical edges. In each phase, each cluster tosses a coin, getting head or tail with equal probability $\frac{1}{2}$ . Clusters use the shortcuts $H_{S}$ to share information inside a cluster. For a critical edge $(u,v)$ where $S_{u}$ has tail and $S_{v}$ has head, $S_{u}$ wants to join $S_{v}$ . Let $S^{\prime}$ contain all the clusters that want to join $S_{v}$ . We run the following loop at most $k$ times. For a cluster $S$ , define $PI[S]=\{PI[c]\mid c\in Centers[S]\}$ . In each loop, $S_{v}$ tries to find $i\in(\cup_{S\in S^{\prime}}PI[S])\backslash PI[S_{v}]$ , i.e., a path id in $S^{\prime}$ but not in $S_{v}$ . This can be done with congestion $k$ . If no such $i$ exists, the loops end. Otherwise, pick an arbitrary $S\in S^{\prime}$ with $i\in PI[S]$ and merge $S$ into $S_{v}$ . After merging, the new center set of $S_{v}$ becomes $Centers[S_{v}]\cup Centers[S]$ . One can see that $Centers[S_{v}]\cup Centers[S]$ still contains at most $k$ elements, according to the definition of critical edge (there do not exist two different elements with the same path id). Also notice that each vertex in $S_{v}$ has distance at most $D$ to a vertex in $Centers[S_{v}]$ ; thus, $H_{S_{v}}$ has diameter $kD$ . After merging, some critical edges might no longer be critical. Recompute all critical edges and $S^{\prime}$ (note that we do not re-toss the coins), and continue with the next loop. Since each loop increases the size of $Centers[S_{v}]$ by at least $1$ , there can be at most $k$ loops. After all the loops, all clusters $S\in S^{\prime}$ satisfy $PI[S]\subseteq PI[S_{v}]$ . Now all clusters in $S^{\prime}$ merge into $S_{v}$ , after which the phase ends.

In each phase, a critical edge with tail on one side and head on the other side must disappear by the end of the phase. Moreover, if an edge is not a critical edge, then it will never become a critical edge. That is because an edge is not critical if either one of its endpoints is in $V(P)$ , or the centers of both sides share the same path id with different positions on the path, and $Centers[S]$ never loses elements. Each critical edge disappears with constant probability in each phase; thus, after $O(\log n)$ phases, with high probability, there are no critical edges. Each phase contains at most $k$ loops, and each loop contains a broadcast inside a shortcut with $k$ messages and diameter $kD$ . Thus, the algorithm has dilation $\tilde{O}(k^{2}D)$ and congestion $\tilde{O}(k^{2})$ .

At the end, $Centers[S]$ contains at most one vertex in each path, and $H_{S}$ forms a shortcut for $S$ with diameter $kD$ . Moreover, no edge $(u,v)$ with $u\in S,v\not\in S$ is a critical edge. Consider several cases:

If there exists $(u,v)$ such that $v\not\in V(P)$ , then there exists $c_{1}\in Centers[S_{u}],c_{2}\in Centers[S_{v}]$ such that $PI[c_{1}]=PI[c_{2}],I[c_{1}]\not=I[c_{2}]$ . We let $c_{1}$ be the representative of $S$ , denoted as $Rep[S]$ , and set $S$ as upper cluster or lower cluster according to whether $I[c_{1}]<I[c_{2}]$ or $I[c_{2}]<I[c_{1}]$ . 2. 2.

Otherwise, if there exists $(u,v_{1}),(u,v_{2})$ such that $v_{1},v_{2}\in V(P),PI[v_{1}]=PI[v_{2}],I[v_{1}]\not=I[v_{2}]$ . We assume $v_{1}\in Centers[S]$ , otherwise consider two cases: if $Centers[S]$ contains a center $c$ with $PI[c]=PI[v_{1}]$ , then let $v_{1}=c$ and $v_{2}$ be another vertex among $v_{1},v_{2}$ ; if $Centers[S]$ do not contains such a center, then we add $v_{1}$ to $Centers[S]$ . In any case, we can assume $v_{1}\in Centers[S]$ . Let $v_{1}$ be the representative of $S$ , and set $S$ as upper cluster or lower cluster according to whether $I[v_{1}]<I[v_{2}]$ or $I[v_{2}]<I[v_{1}]$ . 3. 3.

Otherwise, all the edges adjacent to $S$ is like $(u,v)$ where $v\in V(P)$ and $v$ only appear once for each path. In that case, there are at most $k$ vertices in $\mathsf{N}(S_{i})$ . We return $\mathsf{N}(S_{i})$ as a vertex cut.

A.2 Proof of Lemma 5.7

We will prove several claims that show properties of $G^{\prime\prime}$, which will help us prove the lemma. Recall that we define $S_{v^{out}}$ as the cluster in $\mathcal{S}$ containing $v$. For convenience, if $v\in V(P)$, then we also define $S_{v^{out}}=\{v\}$ and $Centers[S_{v^{out}}]=\{v\},Rep[S_{v^{out}}]=\{v\},LU[S_{v^{out}}]=1$, and $S_{v^{out}}$ is treated as an upper cluster (since it can reach $v^{out}$). We define $h_{1}\prec^{\prime}_{p}h_{2}$ for two hubs $h_{1},h_{2}$ as follows: there exist $c_{1}\in Centers[S_{h_{1}}]\cap V(p),c_{2}\in Centers[S_{h_{2}}]\cap V(p)$ with $c_{1}\prec_{p}c_{2}$. Recall that $T=\{t\}\cup H\backslash H_{a}$. Our goal is, assuming that a path from $s$ to a vertex $v\in V^{\prime}$ exists in $G^{\prime}$, to prove that there is a path from $s$ to $T$ in $G^{\prime\prime}$, or from $s$ to some $v^{\prime}$ in $G^{\prime\prime}$ where $v^{\prime}$ has distance $\Theta(d\log n)$ to $v$.

Claim A.1.

With high probability, for each cluster $S$ that contains at least one hub, there exists a hub in the cluster that has distance at most $\widetilde{O}(d\log n)$ to any vertex $c\in Centers[S]\cup S$. Moreover, any consecutive subpath of $P$ with length $\Omega(d\log n)$ contains at least one hub.

Proof.

The first statement holds because $G[S]$ is connected and is also connected to $c$. In the induced subgraph $G[S\cup\{c\}]$, consider all the vertices with distance $\Theta(d\log n)$ to $c$. There are at least $\Omega(d\log n)$ such vertices (or they cover the whole $S_{i}$, in which case they must contain the hub in the cluster), and with high probability one of them is sampled as a hub because we sample each hub with probability $p=1/d$. The same argument holds for a consecutive subpath of $P$ with length $\Omega(d\log n)$. Since there are at most $n$ clusters and $n^{2}$ subpaths, a union bound gives the desired conclusion. ∎
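The sampling step can be spelled out as follows. This is a sketch assuming (our assumptions) that vertices are sampled as hubs independently, that the set in question contains at least $c\,d\ln n$ vertices for a constant $c>2$, and using the standard inequality $1-x\leq e^{-x}$.

$$
\begin{aligned}
\Pr[\text{no hub among } c\,d\ln n \text{ vertices}] &= \left(1-\tfrac{1}{d}\right)^{c\,d\ln n}\leq e^{-c\ln n}=n^{-c},\\
\Pr[\text{some cluster or subpath has no hub}] &\leq (n+n^{2})\cdot n^{-c}\leq 2n^{2-c}.
\end{aligned}
$$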

Claim A.2.

For any two hubs $h_{1}\in H_{a},h_{2}\in H$ , if there exists $p\in P$ such that $h_{2}\prec^{\prime}_{p}h_{1}$ , then $h_{1}$ can reach either $T$ or $h_{2}$ in $G^{\prime\prime}$ .

Proof.

For any active hub $h$ satisfying $Centers[S_{h}]\cap V(p)\not=\emptyset$, denote the only element of $Centers[S_{h}]\cap V(p)$ by $c_{h}$. Let the set $H^{*}$ contain all the active hubs $h$ that $h_{1}$ can reach in $G^{\prime\prime}$ such that $c_{h_{2}}\prec_{p}c_{h}$, and let $h^{*}$ be a minimal element of $H^{*}$ with respect to $\preceq^{\prime}_{p}$, i.e., $h^{*}\preceq^{\prime}_{p}h$ for any $h\in H^{*}$. First, $H^{*}$ is not empty, since $h_{1}$ is in $H^{*}$. We claim that $I[c_{h^{*}}]-I[c_{h_{2}}]=O(d\log n)$. Suppose otherwise. For convenience we suppose $M(h^{*})\in V\backslash V(P)$ and that it is in cluster $S$; the case $M(h^{*})\in V(P)$ only makes things easier. Then with high probability there exists an active hub $c\in S$ which has distance at most $O(d\log n)$ to $c_{h^{*}}$ according to Claim A.1, and $h^{*}$ has an edge to $c$ according to the definition of type 1 edges. There must exist an active hub $h_{3}$ in $V(P)$ such that $I[c_{h^{*}}]>I[M(h_{3})]>I[c_{h_{2}}]$ and $I[c_{h^{*}}]-I[M(h_{3})]=O(d\log n)$, according to Claim A.1. According to the definition of type 4 edges, $c$ has an edge to $h_{3}$, which means $h_{3}$ can be reached from $h_{1}$, contradicting the minimality of $h^{*}$. Thus, we have $I[c_{h^{*}}]-I[c_{h_{2}}]=O(d\log n)$. For convenience we assume $M(h_{2})$ is in a cluster $S^{\prime}$; the case that $M(h_{2})$ is in $V(P)$ only makes things easier. Let $c^{\prime}$ be the hub in $S^{\prime}$ that has distance $O(d\log n)$ to $c_{h_{2}}$. If $c^{\prime}\in T$, then $h^{*}$ can reach $c^{\prime}$ by first using a type 1 edge to reach $c$ and then using a type 5 edge to reach $c^{\prime}$, since $c$ has distance at most $O(d\log n)$ to $c^{\prime}$. Otherwise $c^{\prime}$ is an active hub, and we can use a type 4 edge to reach $c^{\prime}$ and then a type 1 edge to reach $h_{2}$. ∎

Claim A.3.

For $h_{1},h_{2}\in H_{a}$ , if $h_{2}\in T_{h_{1}}$ , then either $h_{1}$ can reach $T$ , or $h_{1}$ can reach $h_{2}$ in $G^{\prime\prime}$ , with high probability.

Proof.

Let $p\in P$ be the path that contains $Rep[S_{h_{2}}]$. For any $h\in H_{a}$, we define $v_{h}$ as the only vertex in $Centers[S_{h}]\cap V(p)$. If $T_{h_{1}}$ contains a vertex in $T$ then we are done. Otherwise, according to the definition of type 2 edges, there exists an edge $(h_{1},h^{*})$ such that $h^{*}$ is an active hub and either $v_{h_{2}}\prec_{p}v_{h^{*}}$, or $v_{h_{2}}=v_{h^{*}}$ and $LU[S_{h_{2}}]\leq LU[S_{h^{*}}]$. In the case $v_{h_{2}}\prec_{p}v_{h^{*}}$, we are done according to Claim A.2, since $h_{2}\prec^{\prime}_{p}h^{*}$. So we only need to consider the case $v_{h_{2}}=v_{h^{*}},LU[S_{h_{2}}]\leq LU[S_{h^{*}}]$, where we denote $v_{h_{2}}=v_{h^{*}}$ by $c$. Recall that $LU[S_{h_{2}}]>0$, and that it equals $1$ or $2$ according to whether $S_{h_{2}}$ is a lower or an upper cluster. Thus, we have $LU[S_{h^{*}}]>0$, which means $c=Rep[S_{h^{*}}]$. Now consider two cases.

1. If $LU[S_{h^{*}}]=2$, then $S_{h^{*}}$ is an upper cluster. According to the definition of upper cluster, there exists an edge $(v_{1},v_{2})$ where $v_{1}\in S_{h^{*}}$ and $v_{2}$ is in a cluster $B$ with $b\in Centers[B]$ and $c\prec_{p}b$. According to Claim A.1, there is an active hub $h^{\prime}\in S_{h^{*}}$ that has distance at most $O(d\log n)$ to $v_{1}$. Consider the path $p_{0}$ that starts from $v_{2}$, goes through the connected induced subgraph $G^{\prime}[B]$ to $b^{in}$, and goes back to $c^{out}$ through the reversed path of $p$. If $p_{0}$ has length bounded by $O(d\log n)$, then we have $c^{out}\in T_{h^{\prime}}$, which means $h^{\prime}$ is connected to a vertex in the cluster $S_{h_{2}}$ according to the definition of type 4 edges; otherwise, the length of $p_{0}$ is $\Omega(d\log n)$, in which case there is a hub $h_{p_{0}}$ which is an internal vertex of $p_{0}$ such that $h_{p_{0}}\in T_{h^{\prime}}$ w.h.p. Notice that either $h_{p_{0}}\in B$ or $h_{p_{0}}\in p$; in both cases we have $h_{2}\prec^{\prime}_{p}h_{p_{0}}$. Thus, there is a type 2 edge $(h^{\prime},h^{*}_{p_{0}})$ such that $h_{2}\prec^{\prime}_{p}h^{*}_{p_{0}}$ according to the definition of type 2 edges. According to Claim A.2, we are done.

2. If $LU[S_{h^{*}}]=1$, then both $S_{h^{*}}$ and $S_{h_{2}}$ are lower clusters. According to the definition of lower cluster, there exists an edge $(v_{1},v_{2})$ where $v_{1}\in S_{h_{2}}$ and $v_{2}$ is in a cluster $B$ with $b\in Centers[B]$ and $b\prec_{p}c$. According to Claim A.1, there is an active hub $h^{\prime}\in S_{h_{2}}$ that has distance at most $O(d\log n)$ to $v_{1}$. Consider the path $p_{0}$ that starts from $c^{in}$, goes back to $b^{out}$ through the reversed path of $p$, and goes to $v_{2}$ in the connected induced subgraph $G^{\prime}[B]$. If $p_{0}$ has length bounded by $O(d\log n)$, then we have $c^{in}\in T_{h_{2}}$, which means there is a vertex in $S_{h^{*}}$ that connects to $h^{\prime}$ according to the definition of type 4 edges; otherwise, the length of $p_{0}$ is $\Omega(d\log n)$, in which case there is a hub $h_{p_{0}}$ which is an internal vertex of $p_{0}$ such that $h^{\prime}\in T_{h_{p_{0}}}$ w.h.p. Notice that either $h_{p_{0}}\in B$ or $h_{p_{0}}\in p$; in both cases we have $h_{p_{0}}\prec^{\prime}_{p}h^{*}$. Thus, there is a type 3 edge $(h^{*}_{p_{0}},h^{\prime})$ such that $h^{*}_{p_{0}}\prec^{\prime}_{p}h^{*}$ according to the definition of type 3 edges. According to Claim A.2, we are done.

∎

Now we are ready to prove the lemma. Suppose $s$ can reach $v\in V^{\prime}$ and $p$ is the path from $s$ to $v$ in $G^{\prime}$. With high probability, there exists a sequence of hubs $(s=h_{0},h_{1},...,h_{k})$ on $p$ such that $h_{i+1}\in T_{h_{i}}$ for any $0\leq i\leq k-1$, and $v\in T_{h_{k}}$; that is because every $\Theta(d\log n)$ consecutive vertices must contain a hub with high probability. According to Claim A.3, each active hub $h_{i}$ can either reach $h_{i+1}$ in $G^{\prime\prime}$ or reach $T$. Therefore, either $s$ can reach $T$, or $s$ can reach $h_{k}$, and $h_{k}$ can reach $v$ with distance at most $\Theta(d\log n)$.

Appendix B Proof of Vertex Residual Graph Lemmas

In this section we prove Lemma 3.3 and Lemma 3.4, which relate the vertex residual graph to vertex connectivity.

Proof of Lemma 3.3.

We prove it by induction on the length of $p^{\prime}$. When the length is $0$, the lemma trivially holds. Now suppose $|p^{\prime}|=k$ and the lemma holds for $|p^{\prime}|<k$. We consider two cases.

1. Suppose $p^{\prime}$ ends at $v^{out}$ for some $v\not\in V_{I}(P)$, and $p^{\prime}$ is of the form $(u,...,v^{\prime},v)$. We apply the induction hypothesis to the path $p^{\prime\prime}=(u,...,v^{\prime})$ to get vertex disjoint paths $P^{\prime\prime}$, which contain a path ending at $M(v^{\prime})$. We will argue that by extending the path ending at $M(v^{\prime})$ to $v$, we get $P^{\prime}$, which satisfies all the requirements. There are three things to show: (a) the paths in $P^{\prime}$ are internally vertex disjoint; since the paths in $P^{\prime\prime}$ are already internally vertex disjoint, we only need to show that $M(v^{\prime})$ does not intersect any other path unless $M(v^{\prime})=u$, and that $v$ does not intersect any other internal vertex; (b) the paths in $P^{\prime}$ are simple; (c) $E(P^{\prime})$ is indeed the connected component of the subgraph $\oplus(E(P)\cup E(M(p^{\prime})))$ containing $u$, and the other components are cycles.

First we show that $M(v^{\prime})$ does not intersect any other path in $P^{\prime\prime}$ unless $M(v^{\prime})=u$. We have $M(v^{\prime})\not\in T$ since $V_{I}(p^{\prime})$ cannot contain any vertex in $T$. Thus, since $P^{\prime\prime}$ ends at the multiset $T\cup\{M(v^{\prime})\}$, exactly one path ends at $M(v^{\prime})$, and $M(v^{\prime})$ is not an internal vertex of any other path according to the induction hypothesis.

Now we consider two cases, depending on whether $v\in T$ or not. Suppose $v\not\in T$. We first show that $v\not\in V(P)$. We know $v\not\in V_{I}(P)$, and we also have $v\not\in T$, and $v\not=u$ since $p^{\prime}$ is a simple path with length at least $1$. Moreover, $v$ is not in $V(p^{\prime\prime})$, since $p^{\prime}$ is a simple path. Thus, $v$ is not adjacent to any edge in $\oplus(E(P)\cup E(M(p^{\prime\prime})))$. Therefore, after extending the path ending at $M(v^{\prime})$ to $v$, all paths are still internally vertex disjoint, since we have shown that $M(v^{\prime})$ and $v$ do not intersect other paths. The component structure does not change.

Suppose $v\in T$. Then $v$ is not an internal vertex of any path in $P^{\prime\prime}$ according to the induction hypothesis. Thus, by adding $(M(v^{\prime}),v)$ to the path ending at $M(v^{\prime})$ in $P^{\prime\prime}$, we still get internally vertex disjoint paths. The component structure does not change.

Finally, $P^{\prime}$ satisfies $E(P^{\prime})=\oplus(E(P)\cup E(M(p^{\prime})))$ since $E(P^{\prime})=E(P^{\prime\prime})\cup\{(M(v^{\prime}),v)\}$ and $E(M(p^{\prime}))=E(M(p^{\prime\prime}))\cup\{(M(v^{\prime}),v)\}$.

2. Suppose $p^{\prime}$ ends at $v^{out}$ for some $v\in V_{I}(P)$. According to the definition of the residual graph, and since the internal vertices of $p^{\prime}$ do not intersect $T$, there must exist $p\in P$ such that $(v,w)\in E(p),w\not\in T$, and $p^{\prime}$ must be of the form $(u,...,w^{\prime},w^{in},v^{out})$. Now consider two cases.

Suppose $w^{\prime}=w^{out}$. We apply the induction hypothesis to the path $p^{\prime\prime}=(u,...,w^{\prime}=w^{out})$ to get internally vertex disjoint paths $P^{\prime\prime}$, which contain a path ending at $w$. Since $p^{\prime}$ is a simple path, $(w^{in},v^{out})$ is not in $E(p^{\prime\prime})$. Thus, $(w,v)\not\in E(M(p^{\prime\prime}))$, because $(w^{in},v^{out})$ is the only edge that can be mapped to $(w,v)$. Since $(v,w)\in E(P)$, we have $(v,w)\in\oplus(E(P)\cup E(M(p^{\prime\prime})))$. The edge $(v,w)$ is also in $E(P^{\prime\prime})$ since $w$ is in the same component as $u$. Thus, there exists a path in $P^{\prime\prime}$ ending with the edge $(v,w)$. By deleting this edge $(v,w)$, we get $P^{\prime}$. Since we only delete an edge, all properties of $P^{\prime\prime}$ still hold.

Suppose $w^{\prime}\not=w^{out}$. We apply the induction hypothesis to the path $p^{\prime\prime}=(u,...,w^{\prime})$ to get internally vertex disjoint paths $P^{\prime\prime}$, which contain a path ending at $M(w^{\prime})$. According to the same argument as in case 1, there is exactly one path in $P^{\prime\prime}$ that ends at $M(w^{\prime})$. Since $p^{\prime}$ is a simple path, $(w^{in},v^{out})$ and $(w^{\prime},w^{in})$ are not in $E(p^{\prime\prime})$. Thus, $(w,v)\not\in E(M(p^{\prime\prime}))$, since $(w^{in},v^{out})$ is the only edge that can be mapped to $(w,v)$. Therefore, we have $(v,w)\in\oplus(E(M(p^{\prime\prime}))\cup E(P))$. If $(w,M(w^{\prime}))\in\oplus(E(M(p^{\prime\prime}))\cup E(P))$ also holds, then since $M(w^{\prime})$ is an endpoint of $P^{\prime\prime}$, one path in $P^{\prime\prime}$ must be of the form $(u,...,v,w,M(w^{\prime}))$. By deleting the last two edges of this path to obtain $(u,...,v)$, we get $P^{\prime}$, which maintains all the properties of $P^{\prime\prime}$. Thus, the only case left is $(w,M(w^{\prime}))\not\in\oplus(E(M(p^{\prime\prime}))\cup E(P))$. In that case we need to do path shifting. We consider two cases, depending on whether $(v,w)$ is in $E(P^{\prime\prime})$ or not.

If $(v,w)\in E(P^{\prime\prime})$, suppose $p_{v}$ is the path in $P^{\prime\prime}$ containing $(v,w)$. Let $p_{w}$ be the unique path in $P^{\prime\prime}$ ending at $M(w^{\prime})$. We consider two cases. Suppose $p_{v}=p_{w}$. Then by adding the edge $(M(w^{\prime}),w)$ to and deleting $(v,w)$ from $E(P^{\prime\prime})$, we get a path ending at $v$ (the prefix of $p_{v}$ up to $v$), as well as a cycle consisting of the subpath of $p_{v}$ from $w$ to $M(w^{\prime})$ and the edge $(M(w^{\prime}),w)$. Since $p_{v}$ is a simple path, the cycle is vertex disjoint, and the new path is simple. If $p_{v}\not=p_{w}$, then we shift these two paths by adding $(M(w^{\prime}),w)$ to and deleting $(v,w)$ from $E(P^{\prime\prime})$. One of the new paths takes the whole of $p_{w}$ followed by the latter subpath of $p_{v}$ starting from $w$; the other new path takes the former subpath of $p_{v}$ ending at $v$. All properties hold.

If $(v,w)\not\in E(P^{\prime\prime})$, then according to the induction hypothesis, $(v,w)$ is in a vertex disjoint cycle. By adding $(M(w^{\prime}),w)$ and deleting $(v,w)$, we attach the cycle to the path and get a new path ending at $v$. All properties hold.

∎

Appendix C CONGEST Algorithms Based on Merging Clusters

In this section we prove the missing lemmas stated in Section 5. They are all based on techniques that start from clusters with single vertices, merge clusters while maintaining cluster properties, and finally combine all clusters into a single cluster.

Proof of Lemma 5.11.

We maintain clusters $\mathcal{S}$; initially each cluster $S\in\mathcal{S}$ contains exactly one vertex in $H$. We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and tail clusters merge into head clusters if they share an edge in $H$. Each cluster also maintains the information of whether it contains $s$. Communication inside a cluster happens by broadcasting inside the cluster if the cluster has size at most $d$, or by broadcasting through the whole network if it has size more than $d$. Since there are at most $\mu$ vertices in $H$, the number of clusters that broadcast through the whole network is bounded by $\mu/d$. Since in each phase each edge has constant probability to merge two clusters (and become an edge inside a cluster), after $O(\log n)$ phases there is no edge connecting two different clusters. Now, the path $p$ is the cluster that contains $u$. ∎
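The head/tail merging scheme used in this proof can be illustrated by a small sequential simulation. This is only a sketch: it ignores the CONGEST communication entirely, and the function name `merge_phases`, the edge-list input, and the `seed` parameter are our own illustrative choices, not part of the paper.

```python
import random

def merge_phases(n, edges, seed=0):
    """Sequentially simulate head/tail cluster merging.

    Returns (label, phases), where label[v] is the id of the final cluster
    containing v and phases is the number of phases that were run.
    """
    rng = random.Random(seed)
    label = list(range(n))            # initially every vertex is its own cluster
    phases = 0
    # Keep going while some edge still connects two different clusters.
    while any(label[u] != label[v] for u, v in edges):
        phases += 1
        # Every cluster tosses a fair coin: True = head, False = tail.
        coin = {c: rng.random() < 0.5 for c in sorted(set(label))}
        target = {}                   # tail cluster id -> head cluster id it joins
        for u, v in edges:
            cu, cv = label[u], label[v]
            if cu == cv:
                continue
            if coin[cu] and not coin[cv]:
                target.setdefault(cv, cu)   # tail cv joins head cu
            elif coin[cv] and not coin[cu]:
                target.setdefault(cu, cv)   # tail cu joins head cv
        # Heads never move, so there are no merge chains within a phase.
        label = [target.get(c, c) for c in label]
    return label, phases
```

Because only tail clusters move and they only move into head clusters, each phase contracts a collection of stars, which is why a single relabeling pass per phase suffices in the simulation.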

Proof of Lemma 5.6.

If a vertex is repeated $r$ times, we just treat it as $r$ vertices, in which case we can treat $p$ as a simple path. We maintain clusters $\mathcal{P}$; initially each cluster $p^{\prime}\in\mathcal{P}$ contains exactly one vertex in $p$. Each cluster is a subpath of $p$. Suppose $p^{\prime}=(v_{a},v_{a+1},\ldots,v_{a+\ell})$; each vertex $v_{i}\in p^{\prime}$ also maintains the value $s[v_{i}]=\sum_{a\leq j\leq i}x[v_{j}]$. Initially the value is simply $x[v_{i}]$.

We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and a tail cluster merges into a head cluster if they share an edge $(v_{i},v_{i+1})$, where $v_{i}$ is in the head cluster and $v_{i+1}$ is in the tail cluster. One can see that each tail cluster can merge into at most $1$ head cluster, and each head cluster can absorb at most $1$ tail cluster. $v_{i}$ sends the value $s[v_{i}]$ to all the vertices in the cluster containing $v_{i+1}$, and they add this value to their results. Communication inside a cluster happens by broadcasting inside the cluster if the cluster has size at most $d$, or by broadcasting through the whole network if it has size more than $d$. Since the path has length $k$, the number of clusters that broadcast through the whole network is bounded by $k/d$. Since in each phase each edge has constant probability to merge two clusters (and become an edge inside a cluster), after $O(\log n)$ phases there is no edge connecting two different clusters. ∎
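The prefix-sum maintenance in this proof can be checked on a small example with the following sequential sketch. It assumes the path is given as a Python list `x` in path order; the function name `path_prefix_sums` and the `seed` parameter are hypothetical, and the simulation does not model rounds or congestion.

```python
import random

def path_prefix_sums(x, seed=0):
    """Simulate merging of sub-path clusters while maintaining
    s[i] = sum of x[j] over the part of i's cluster up to and including i."""
    rng = random.Random(seed)
    n = len(x)
    s = list(x)                       # initially every vertex is its own cluster
    starts = list(range(n))           # first index of each cluster, in path order
    phases = 0
    while len(starts) > 1:
        phases += 1
        head = [rng.random() < 0.5 for _ in starts]   # True = head, False = tail
        new_starts = [starts[0]]
        for c in range(1, len(starts)):
            if head[c - 1] and not head[c]:
                # Tail cluster c joins the head cluster c-1 to its left.
                a = starts[c]
                b = starts[c + 1] if c + 1 < len(starts) else n
                carry = s[a - 1]      # s[v_i] of the last vertex of the head cluster
                for j in range(a, b):
                    s[j] += carry
            else:
                new_starts.append(starts[c])
        starts = new_starts
    return s, phases
```

A head cluster is never modified in the phase in which it absorbs a tail cluster, so `carry` is always the full sum of the head cluster, and the final values coincide with the ordinary prefix sums of `x`.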

Proof of Lemma 5.9.

The idea is to combine directed rooted trees while maintaining the tree properties. We first prove the following frequently used claim: given several vertex disjoint rooted directed trees inside $H$, where each vertex $v$ knows its subtree size $s[v]$ inside its tree, and given each vertex $v$ a number $x[v]$, with dilation $\tilde{O}(d+D)$ and congestion $\widetilde{O}(N/d)$ each vertex $v$ gets $\sum_{v^{\prime}\in T_{v}}x[v^{\prime}]$, where $T_{v}$ is the subtree rooted at $v$.

To prove the claim, we use the idea of heavy-light tree decomposition and Lemma 5.6. We first decompose each tree into paths: each vertex builds a directed edge to one of its children with the largest subtree size. The decomposition has the property that every path from a root to a leaf goes through at most $\log N$ paths. To see this, consider each time the path from the root to a leaf leaves a path: since a vertex always points to the child with the largest subtree size, a child without the edge from its parent must have at most half of the parent's subtree size. Thus, we only need to run the algorithm in Lemma 5.6 for $O(\log n)$ steps: initially, each vertex gets the value $x[v]$, and each path calculates $s[v]$, which is the sum of $x[v^{\prime}]$ for all $v^{\prime}$ after $v$ on this path. At the beginning of any later step, each vertex $v$ sends the value $s[v]$ to its parent if $v$ is the beginning of some path, i.e., the parent of $v$ does not have an edge pointing to $v$. Then all vertices $v$ set the received value as $x[v]$ and redo Lemma 5.6. For each path, if its length is at most $d$, then we do the calculation just by pipelining on this path; otherwise we use Lemma 5.6, which causes dilation $\tilde{O}(d)$ and congestion $\tilde{O}(k/d)$, where $k$ is the length of the path. Since $H$ has $N$ vertices, there are at most $N/d$ such instances, while the sum of the congestion of all instances is bounded by $\tilde{O}(N/d)$. Thus, the claim is proved.
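The $\log N$ bound on the number of paths crossed by a root-to-leaf path can be checked with a small centralized sketch of the heavy-light decomposition. The input convention is an assumption made for brevity: the tree is given as a list `parent` with root `0`, `parent[0] = -1`, and `parent[v] < v` for every other vertex; the function names are hypothetical.

```python
import math

def heavy_light(parent):
    """Return (size, heavy, light) for a rooted tree.

    size[v]  = number of vertices in the subtree of v
    heavy[v] = child of v with the largest subtree (-1 for a leaf)
    light[v] = number of non-heavy (light) edges on the root-to-v path
    """
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):         # children have larger indices than parents
        size[parent[v]] += size[v]
    heavy = [-1] * n
    for v in range(1, n):
        p = parent[v]
        if heavy[p] == -1 or size[v] > size[heavy[p]]:
            heavy[p] = v
    light = [0] * n
    for v in range(1, n):                 # parents are processed before children
        p = parent[v]
        light[v] = light[p] + (0 if heavy[p] == v else 1)
    return size, heavy, light

def light_bound_holds(parent):
    """Check that every root-to-vertex path uses at most log2(N) light edges."""
    _, _, light = heavy_light(parent)
    return max(light) <= math.log2(len(parent))
```

Each light edge leads to a child whose subtree has fewer than half of the parent's vertices, so the subtree size at least halves along every light edge; this is what `light_bound_holds` verifies on a concrete tree.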

Now we proceed to prove the lemma. We maintain clusters $\mathcal{S}$; initially each cluster $S\in\mathcal{S}$ contains exactly one vertex in $V^{\prime}$. Each cluster maintains a rooted directed tree, and each vertex in the tree knows the size of the subtree rooted at it. We run several phases. In each phase, each cluster tosses a coin, getting head or tail, and tail clusters merge into head clusters if they share an edge in $G[V^{\prime}]$. Now we consider how to maintain the tree information. Suppose the head cluster is $T$, and there is a tail cluster $T^{\prime}$ that wants to merge into $T$ through the edge $(u,v)$ with $u\in T,v\in T^{\prime}$. $T^{\prime}$ first reverses all the edges on the path from $v$ to the root of $T^{\prime}$; this can be done by applying the claim, putting value $1$ on $v$ and value $0$ on every other vertex. All the vertices $v^{\prime}$ on the path change their subtree values from $s[v^{\prime}]$ to $s[r]-s[v^{\prime}]+1$, where $s[r]$ is the size of the tree $T^{\prime}$. One can see that this maintains the correct subtree sizes inside $T^{\prime}$. Now we want to maintain the correct subtree sizes inside $T$. $v$ sends the tree size of $T^{\prime}$ to $u$. Now each leaf of $T$ from which a merged subtree hangs gets the value of that tree size. $T$ uses the claim to add these values to the original subtree size of each vertex. All the communication happens simultaneously, with the same argument: if a tree $T$ has size at most $d$, then all the properties are calculated through a pipeline inside $T$; otherwise we use the claim, which causes dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(|T|/d)$, while the sum of all $|T|$ is bounded by $N$.

Since each edge becomes an edge inside a cluster with constant probability in each phase, after $O(\log n)$ phases $V^{\prime}$ becomes a single cluster, since $H[V^{\prime}]$ is connected. Each phase causes dilation $\widetilde{O}(d+D)$ and congestion $\widetilde{O}(N/d)$, so the total dilation and congestion are $\widetilde{O}(d+D)$ and $\widetilde{O}(N/d)$. Now we get a rooted directed tree $T$ on $V^{\prime}$. To find the path from $s$ to $t$, we can use the claim to find the path from $s$ to the root, and from the root to $t$. ∎

Proof of Lemma 5.10.

Each vertex in $V\backslash V^{\prime}$ broadcasts tokens to build a BFS tree: initially only the vertices in $V\backslash V^{\prime}$ are active, and in each round every active vertex sends a token to all its neighbors if it has not done so. At the end of each round, if a vertex that has not been activated receives a token (possibly more than one token), then it joins the BFS tree of an arbitrary vertex that sent it a token, and then becomes active. The procedure has dilation $O(D)$ and congestion $O(1)$, since each active vertex only sends once, and each vertex has distance at most $D$ to a vertex in $V\backslash V^{\prime}$.
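The token-passing procedure is a multi-source BFS. A minimal round-by-round sketch, assuming an adjacency-list dictionary `adj` and a list `sources` standing in for $V\backslash V^{\prime}$ (both names are ours), could look as follows.

```python
def bfs_forest(adj, sources):
    """Grow one BFS tree per source, round by round.

    Returns (root, parent, depth): the source whose tree each vertex joined,
    its tree parent (None for sources), and its depth in that tree.
    """
    root = {s: s for s in sources}
    parent = {s: None for s in sources}
    depth = {s: 0 for s in sources}
    frontier = list(sources)              # vertices that send tokens this round
    while frontier:
        activated = []
        for u in frontier:                # every active vertex sends exactly once
            for w in adj[u]:
                if w not in root:         # first token wins; later tokens are ignored
                    root[w] = root[u]
                    parent[w] = u
                    depth[w] = depth[u] + 1
                    activated.append(w)
        frontier = activated              # newly activated vertices send next round
    return root, parent, depth
```

Each vertex appears in `frontier` exactly once, which mirrors the observation that every active vertex sends only once, and `depth[w]` equals the distance from `w` to its nearest source.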

Now we get a partition of $V$, where each part is a tree rooted at a vertex $v\in V\backslash V^{\prime}$ with depth at most $D$; we denote it as $T_{v}$. Each subtree rooted at a child of $v$ in $T_{v}$ is a cluster $S$. All the clusters $S$ form a partition of $V^{\prime}$, denoted as $\mathcal{S}$, and $G[S]$ is a rooted tree with depth $D$. Now we want to combine clusters in order to increase the number of vertices in each cluster. Each cluster $S$ also maintains a center set $Centers[S]$, initially containing $v$, where $S$ is the cluster that is the subtree of a child of $v$. We will maintain the property that $H_{S}=\{(u,v)\mid u\in S,v\in S\cup Centers[S]\}$ has diameter at most $O(kD)$.

The algorithm runs for several phases. In each phase, each cluster $S$ with $|S|<x$ tosses a coin, getting tail or head with equal probability $1/2$. If there exists an edge $(u,v)$ where $u,v$ are in different clusters $S_{u},S_{v}$, and $S_{u}$ has head while $S_{v}$ has tail, then $S_{v}$ merges into $S_{u}$. They merge their center sets $Centers[S_{u}],Centers[S_{v}]$ as well as their vertex sets. Notice that each vertex in any cluster still has distance at most $D$ to one of its centers. Thus, as long as a cluster $S$ is connected, the set $H_{S}$ has diameter at most $O(kD)$, because all the vertices within distance $D$ of a given center have mutual distance at most $2D$. Since an edge connecting two small clusters has constant probability to be merged in one phase, there are at most $O(\log n)$ phases with high probability. Each phase contains a broadcast with dilation $O(kD)$ and congestion $O(k)$.

At the end, every edge $(u,v)$ in $H[V^{\prime}]$ has the following property: either $u,v$ are in the same cluster, or one of the clusters containing $u,v$ has size at least $x$. Since merging happens between two clusters that both have size at most $x$, if a final cluster has been merged during the algorithm, it must have size $\Theta(x)$; otherwise, it has never been merged, which means it remains a rooted tree with depth $D$. Now suppose $s$ is in cluster $S$. If $|S|=\Theta(x)$ then we are done; otherwise, either $S$ is an initial cluster, or $|S|=o(x)$ and one of the neighbors $S^{\prime}$ of $S$ has size $\Omega(x)$. In the latter case, we will include both $S$ and some part of $S^{\prime}$ in $V^{\prime\prime}$. If $|S^{\prime}|=\Theta(x)$, then we are done; otherwise $S^{\prime}$ is an initial cluster. So the only thing left is the following problem: given a rooted tree $T$ with depth $D$ and $\Omega(x)$ vertices, and a vertex $s\in T$, find a connected component in $T$ containing $s$ with size $\Theta(x)$. Without loss of generality, we can assume $s$ is the root of $T$, while the depth is still bounded by $O(D)$. We first compute for each vertex $v$ in $T$ the subtree size $s[v]$, by a pipeline on $T$. Then we compute the preorder traversal number of each vertex in $T$ in the following way: suppose a vertex $v$ has the preorder traversal number $p[v]$, and it has $\ell$ children $v_{1},...,v_{\ell}$; then $v_{i}$ gets the preorder traversal number $\sum_{1\leq j<i}s[v_{j}]+p[v]+1$. Starting from the root with preorder traversal number $1$, the preorder traversal numbers of the layer 2 vertices are computed in $O(1)$ rounds, then those of layers 3, 4, and so on. The total dilation is $O(D)$. ∎
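The preorder numbering rule $p[v_{i}]=\sum_{1\leq j<i}s[v_{j}]+p[v]+1$ can be checked with a short centralized sketch. The tree is assumed to be given as a dictionary `children` mapping each vertex to the ordered list of its children; the function names are hypothetical. The helper `prefix_component` reflects our reading of how the numbers would be used (taking the vertices with $p[v]\leq x$), which is not spelled out in the text above.

```python
def preorder_numbers(children, root=0):
    """Compute subtree sizes and preorder numbers layer by layer."""
    order = [root]
    for v in order:                       # BFS order; the list grows while iterating
        order.extend(children[v])
    size = {v: 1 for v in order}
    for v in reversed(order):             # children are handled before their parents
        for c in children[v]:
            size[v] += size[c]
    p = {root: 1}
    for v in order:                       # parents assign numbers to their children
        offset = p[v] + 1
        for c in children[v]:
            p[c] = offset                 # p[v] + 1 + sizes of the earlier siblings
            offset += size[c]
    return size, p

def prefix_component(p, x):
    """Vertices whose preorder number is at most x (assumed use of the numbering)."""
    return {v for v, num in p.items() if num <= x}
```

A prefix of a preorder traversal always contains the parent of each of its vertices, so the set returned by `prefix_component` is connected, contains the root, and has exactly `min(x, N)` vertices.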

Bibliography


[AEG+22] Simon Apers, Yuval Efron, Pawel Gawrychowski, Troy Lee, Sagnik Mukhopadhyay, and Danupon Nanongkai. Cut query algorithms with star contraction. In FOCS, pages 507–518. IEEE, 2022.
[BDD+82] M. Becker, W. Degenhardt, J. Doenhardt, S. Hertel, G. Kaninke, W. Keber, K. Mehlhorn, S. Näher, H. Rohnert, and T. Winter. A probabilistic algorithm for vertex connectivity of graphs. Information Processing Letters, 15(3):135–136, 1982.
[CGK14] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. Distributed connectivity decomposition. In PODC, pages 156–165. ACM, 2014.
[CGL+20] Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. A deterministic algorithm for balanced cut with applications to dynamic connectivity, flows, and beyond. In FOCS, pages 1158–1167. IEEE, 2020.
[CHGK14a] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. Distributed connectivity decomposition. In Proceedings of the 2014 ACM symposium on Principles of distributed computing, pages 156–165, 2014.
[CHGK14b] Keren Censor-Hillel, Mohsen Ghaffari, and Fabian Kuhn. A new perspective on vertex connectivity. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 546–561. SIAM, 2014.
[CKL+22] Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In FOCS, pages 612–623. IEEE, 2022.
[CPZ19] Yi-Jun Chang, Seth Pettie, and Hengjie Zhang. Distributed triangle detection via expander decomposition. In SODA, pages 821–840. SIAM, 2019.
