Log Diameter Rounds Algorithms for $2$-Vertex and $2$-Edge Connectivity
Alexandr Andoni, Clifford Stein, Peilin Zhong

TL;DR
This paper develops efficient parallel algorithms for 2-vertex and 2-edge connectivity problems in the MPC model, improving on previous bounds and establishing lower bounds, with scalability and practical relevance for large distributed systems.
Contribution
It introduces new MPC algorithms for 2-edge and 2-vertex connectivity with improved parallel time bounds based on graph diameter, and provides lower bounds for biconnectivity.
Findings
Algorithms run in roughly log diameter rounds.
Achieves linear total memory and scalable per-processor memory.
Provides a conditional lower bound of $\Omega(\log D')$ for biconnectivity.
Columbia [email protected] partly supported by NSF Grants (CCF-1617955 and CCF-1740833), Simons Foundation (#491119) and Google Research Award. Columbia [email protected] partly supported by NSF Grants CCF-1714818 and CCF-1822809. Columbia [email protected] partly supported by NSF Grants (CCF-1703925, CCF-1421161, CCF-1714818, CCF-1617955 and CCF-1740833), Simons Foundation (#491119) and Google Research Award. Copyright: Alexandr Andoni, Clifford Stein and Peilin Zhong. CCS concepts: Theory of computation → MapReduce algorithms; Mathematics of computing → Paths and connectivity problems.
46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), July 9–12, 2019, Patras, Greece. Editors: Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi. LIPIcs series volume 132, article no. 9.
Abstract
Many modern parallel systems, such as MapReduce, Hadoop and Spark, can be modeled well by the MPC model. The MPC model captures well coarse-grained computation on large data — data is distributed to processors, each of which has a sublinear (in the input data) amount of memory and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. This model is stronger than the classical PRAM model, and it is an intriguing question to design algorithms whose running time is smaller than in the PRAM model.
In this paper, we study two fundamental problems, $2$-edge connectivity and $2$-vertex connectivity (biconnectivity). PRAM algorithms which run in $O(\log n)$ time have been known for many years. We give algorithms using roughly log diameter rounds in the MPC model. Our main results are, for an $n$-vertex, $m$-edge graph of diameter $D$ and bi-diameter $D'$, 1) an $O(\log D\log\log_{m/n} n)$ parallel time $2$-edge connectivity algorithm, 2) an $O(\log D\log^2\log_{m/n} n+\log D'\log\log_{m/n} n)$ parallel time biconnectivity algorithm, where the bi-diameter $D'$ is the largest cycle length over all the vertex pairs in the same biconnected component. Our results are fully scalable, meaning that the memory per processor can be $O(n^{\delta})$ for arbitrary constant $\delta>0$, and the total memory used is linear in the problem size. Our $2$-edge connectivity algorithm achieves the same parallel time as the connectivity algorithm of [4]. We also show an $\Omega(\log D')$ conditional lower bound for the biconnectivity problem.
keywords:
parallel algorithms, biconnectivity, $2$-edge connectivity, the MPC model
1 Introduction
The success of modern parallel and distributed systems such as MapReduce [16, 17], Spark [42], Hadoop [40], Dryad [24], together with the need to solve problems on massive data, is driving the development of new algorithms which are more efficient and scalable in these large-scale systems. An important theoretical problem is to develop models which are good abstractions of these computational frameworks. The Massively Parallel Computation (MPC) model [26, 22, 11, 3, 9, 15, 4] captures the capabilities of these computational systems while keeping the description of the model itself simple. In the MPC model, there are $p$ machines (processors), each with $\Theta(N^{\delta})$ local memory, where $N$ denotes the size of the input and $\delta\in(0,1)$. The computation proceeds in rounds, where each machine can perform unlimited local computation in a round and exchange data at the end of the round. The parallel time of an algorithm is measured by the total number of computation-communication rounds. The MPC model is a variant of the Bulk Synchronous Parallel (BSP) model [39]. It is also a more powerful model than the PRAM, since any PRAM algorithm can be simulated in the MPC model [26, 22] while some problems can be solved in faster parallel time in the MPC model. For example, computing the XOR of $N$ bits takes $O(1/\delta)$ parallel time in the MPC model but needs near-logarithmic parallel time on the most powerful CRCW PRAM [10].
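The XOR example can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the function name `mpc_xor` and the parameter `s` (the per-machine memory in words) are hypothetical, and each loop iteration stands for one MPC round in which every machine XORs at most `s` values and forwards a single word. With `s` about $N^{\delta}$, the number of rounds is about $1/\delta$.

```python
from functools import reduce
from operator import xor


def mpc_xor(bits, s):
    """Simulate tree aggregation of XOR with per-machine memory s (hypothetical helper).

    Returns (xor_of_all_bits, number_of_rounds).
    """
    if s < 2:
        raise ValueError("each machine must hold at least 2 words")
    values = list(bits)
    rounds = 0
    while len(values) > 1:
        # One round: every machine receives at most s values,
        # XORs them locally and sends one word to the next level.
        values = [reduce(xor, values[i:i + s]) for i in range(0, len(values), s)]
        rounds += 1
    return (values[0] if values else 0), rounds
```

For instance, with 1000 input bits and `s = 32` (roughly $N^{1/2}$), the simulation finishes in two rounds, whereas `s = 10` needs three.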
A natural question to ask is: which problems can be solved in faster parallel time in the MPC model than on a PRAM? This question has been studied by a line of recent papers [26, 19, 30, 3, 1, 6, 23, 15, 7, 14, 13, 33, 20]. Most of these results studied the graph problems, which are the usual benchmarks of parallel/distributed models. Many graph problems such as graph connectivity [36, 34, 31], graph biconnectivity [38, 37], maximal matching [27], minimum spanning tree [28] and maximal independent set [32, 2] can be solved in the standard logarithmic time in the PRAM model, but these problems have been shown to have a better parallel time in the MPC model.
In addition, we hope to develop fully scalable algorithms for graph problems, i.e., the algorithm should work for any constant $\delta\in(0,1)$. Previous work shows that a graph problem in the MPC model with large local memory may be much easier than the same problem with smaller local memory. In particular, when the local memory size per machine is close to the number of vertices $n$, many graph problems have efficient algorithms. For example, if the local memory size per machine is , the connectivity problem [7] and the approximate matching problem [5] can be solved in parallel time. If the local memory size per machine is $O(n)$, then the MPC model meets the congested clique model [12]. In this setting, the connectivity problem and the minimum spanning tree problem can be solved in $O(1)$ parallel time [25]. If the local memory size per machine is $n^{1+\Omega(1)}$, many graph problems such as maximal matching, approximate weighted matchings, approximate vertex and edge covers, minimum cuts, and the biconnectivity problem can be solved in $O(1)$ parallel time [30, 8]. The landscape of graph algorithms in the MPC model with small local memory is more nuanced and challenging for algorithm designers. If the local memory size per machine is $n^{\delta}$ for an arbitrary constant $\delta\in(0,1)$, then the best connectivity algorithm takes $O(\log D\cdot\log\log_{m/n} n)$ parallel time, where $D$ is the diameter of the graph [4], and the best approximate maximum matching algorithm takes $\widetilde{O}(\sqrt{\log n})$ parallel time [33].
Therefore, the main open question is: which graph problems admit fully scalable MPC algorithms that are faster than the standard logarithmic-time PRAM algorithms?
Two fundamental graph problems in graph theory are $2$-edge connectivity and $2$-vertex connectivity (biconnectivity). In this work, we study these two problems in the MPC model. Consider an $n$-vertex, $m$-edge undirected graph $G$. A bridge of $G$ is an edge whose removal increases the number of connected components of $G$. In the $2$-edge connectivity problem, the goal is to find all the bridges of $G$. For any two different edges $e,e'$ of $G$, $e,e'$ are in the same biconnected component (block) of $G$ if and only if there is a simple cycle which contains both $e,e'$. If we define a relation $R$ such that $eRe'$ if and only if $e=e'$ or $e,e'$ are contained by a simple cycle, then $R$ is an equivalence relation [18]. Thus, a biconnected component is an induced graph of an equivalence class of $R$. In the biconnectivity problem, the goal is to output all the biconnected components of $G$. We propose faster, fully scalable algorithms for both the $2$-edge connectivity problem and the biconnectivity problem by parameterizing the running time as a function of the diameter and the bi-diameter of the graph. The diameter $D$ of $G$ is the largest diameter of its connected components. The definition of bi-diameter is a natural generalization of the definition of diameter. If vertices $u,v$ are in the same biconnected component, then the cycle length of $(u,v)$ is defined as the minimum length of a simple cycle which contains both $u$ and $v$. The bi-diameter $D'$ of $G$ is the largest cycle length over all the vertex pairs $(u,v)$ where both $u$ and $v$ are in the same biconnected component. Our main results are 1) a fully scalable $O(\log D\cdot\log\log_{m/n} n)$ parallel time $2$-edge connectivity algorithm, 2) a fully scalable $O(\log D\cdot\log^2\log_{m/n} n+\log D'\cdot\log\log_{m/n} n)$ parallel time biconnectivity algorithm. Our $2$-edge connectivity algorithm achieves the same parallel time as the connectivity algorithm of [4]. We also show an $\Omega(\log D')$ conditional lower bound for the biconnectivity problem.
1.1 The Model
Our model of computation is the Massively Parallel Computation (MPC) model [26, 22, 11].
Consider two non-negative parameters $\gamma\geq 0$, $\delta\in(0,1)$. In the $(\gamma,\delta)$-MPC model [4], there are $p$ machines (processors) each with local memory size $s$, where $p\cdot s=\Theta(N^{1+\gamma})$, $s=\Theta(N^{\delta})$ and $N$ denotes the size of the input data. Thus, the space per machine is sublinear in $N$, and the total space is only an $O(N^{\gamma})$ factor more than the input size. In particular, if $\gamma=0$, the total space available in the system is linear in the input size $N$. The space size is measured by words each containing $\Theta(\log(s\cdot p))$ bits. Before the computation starts, the input data is distributed on $\Theta(N/s)$ input machines. The computation proceeds in rounds. In each round, each machine can perform local computation on its local data, and send messages to other machines at the end of the round. In a round, the total size of messages sent/received by a machine should be bounded by its local memory size $s$. For example, a machine can send $s$ size-$1$ messages to $s$ machines or send a size-$s$ message to $1$ machine in a single round. However, it cannot broadcast a size-$s$ message to every machine. In the next round, each machine only holds the received messages in its local memory. At the end of the computation, the output data is distributed on the output machines. An algorithm in this model is called a $(\gamma,\delta)$-MPC algorithm. The parallel time of an algorithm is the total number of rounds needed to finish its computation. In this paper, we consider $\delta$ an arbitrary constant in $(0,1)$.
1.2 Our Results
Our main results are efficient MPC algorithms for the $2$-edge connectivity and biconnectivity problems. In our algorithms, one important subroutine is computing the Depth-First-Search (DFS) sequence [4], which is a variant of the Euler tour representation proposed by [38, 37] in 1984. We show how to efficiently compute the DFS sequence in the MPC model with linear total space. Conditioned on the hardness of the connectivity problem in the MPC model, we prove a hardness result on the biconnectivity problem.
For $2$-edge connectivity and biconnectivity, the input is an undirected graph $G=(V,E)$ with $n=|V|$ vertices and $m=|E|$ edges. $N=n+m$ denotes the size of the representation of $G$, $D$ denotes the diameter of $G$, and $D'$ denotes the bi-diameter of $G$. We state our results in the following.
Biconnectivity. In the biconnectivity problem, we want to find all the biconnected components (blocks) of the input graph $G$. Since the biconnected components of $G$ define a partition on $E$, we just need to color each edge, i.e., at the end of the computation, $\forall e\in E$, there is a unique tuple $(x,c)$ with $x=e$ stored on an output machine, where $c$ is called the color of $e$, such that the edges $e_1,e_2$ are in the same biconnected component if and only if they have the same color.
Theorem 1.1 (Biconnectivity in MPC).
For any $\gamma\in[0,2]$ and any constant $\delta\in(0,1)$, there is a randomized $(\gamma,\delta)$-MPC algorithm which outputs all the biconnected components of the graph $G$ in $O\left(\log D\cdot\log^2\log_{N^{1+\gamma}/n} n+\log D'\cdot\log\log_{N^{1+\gamma}/n} n\right)$ parallel time. The success probability is at least . If the algorithm fails, then it returns FAIL.
The worst case is when the input graph is sparse and the total space available is linear in the input size, i.e., $N=O(n)$ and $\gamma=0$. In this case, the parallel running time of our algorithm is $O(\log D\cdot\log^2\log n+\log D'\cdot\log\log n)$. If the graph is slightly denser ($m=n^{1+c}$ for some constant $c>0$), or the total space is slightly larger ($\gamma>0$ is a constant), then we obtain $O(\log D+\log D')$ time.
A cut vertex (articulation point) in the graph $G$ is a vertex whose removal increases the number of connected components of $G$. Since a vertex $v$ is a cut vertex if and only if there are two edges which share the endpoint $v$ and are not in the same biconnected component, our algorithm can also find all the cut vertices of $G$.
$2$-Edge connectivity. In the $2$-edge connectivity problem, we want to output all the bridges of the input graph $G$. Since an edge is a bridge if and only if each of its endpoints is either a cut vertex or a vertex with degree $1$, the $2$-edge connectivity problem should be easier than the biconnectivity problem. We show how to solve $2$-edge connectivity in the same parallel time as the algorithm proposed by [4] for solving connectivity.
Theorem 1.2 ($2$-Edge connectivity in MPC).
For any $\gamma\in[0,2]$ and any constant $\delta\in(0,1)$, there is a randomized $(\gamma,\delta)$-MPC algorithm which outputs all the bridges of the graph $G$ in $O\left(\log D\cdot\log\log_{N^{1+\gamma}/n} n\right)$ parallel time. The success probability is at least . If the algorithm fails, then it returns FAIL.
DFS sequence. A rooted tree with a vertex set $V$ can be represented by pairs $(v,\mathrm{par}(v))$, where $\mathrm{par}:V\rightarrow V$ is a set of parent pointers, i.e., for a non-root vertex $v$, $\mathrm{par}(v)$ denotes the parent of $v$, and for the root vertex $v$, $\mathrm{par}(v)=v$. We show an algorithm which can compute the DFS sequence (Definition 2.2) of the rooted tree in the MPC model with linear total space.
Theorem 1.3 (DFS sequence of a tree in MPC).
Given a rooted tree represented by a set of parent pointers $\mathrm{par}:V\rightarrow V$, there is a randomized $(0,\delta)$-MPC algorithm which outputs the DFS sequence in $O(\log D)$ parallel time, where $\delta\in(0,1)$ is an arbitrary constant and $D$ is the depth of the tree. The success probability is at least . If the algorithm fails, then it returns FAIL.
Conditional hardness for biconnectivity. A conjectured hardness for the connectivity problem is the one cycle vs. two cycles conjecture: for any $\gamma\geq 0$ and any constant $\delta\in(0,1)$, any $(\gamma,\delta)$-MPC algorithm requires $\Omega(\log n)$ parallel time to determine whether the input $n$-vertex graph is a single cycle or contains two disjoint length-$n/2$ cycles. This conjectured hardness result is widely used in the MPC literature [26, 11, 29, 35, 41]. Under this conjecture, we show that $\Omega(\log D')$ parallel time is necessary for the biconnectivity problem, and this is true even when $D=O(1)$, i.e., the diameter of the graph is a constant.
Theorem 1.4 (Hardness of biconnectivity in MPC).
For any $\gamma\geq 0$ and any constant $\delta\in(0,1)$, unless there is a $(\gamma,\delta)$-MPC algorithm which can distinguish the following two instances: 1) a single cycle with $n$ vertices, 2) two disjoint cycles each containing $n/2$ vertices, in $o(\log n)$ parallel time, any $(\gamma,\delta)$-MPC algorithm requires $\Omega(\log D')$ parallel time for testing whether a graph $G$ with a constant diameter is biconnected.
1.3 Our Techniques
Biconnectivity. At a high level, our biconnectivity algorithm is based on a framework proposed by [37]. The main idea is to construct a new graph $G'$ and reduce the problem of finding biconnected components of $G$ to the problem of finding connected components of the new graph $G'$. At first glance, it should be efficiently solved by the connectivity algorithm [4]. However, there are two main issues: 1) since the parallel time of the MPC connectivity algorithm of [4] depends on the diameter of the input graph, we need to make the diameter of $G'$ small, 2) we need to construct $G'$ efficiently. Let us first consider the first issue, and we will discuss the second issue later.
We give an analysis of the diameter of the graph $G'$ constructed by [37]. Without loss of generality, we can suppose the input graph $G$ is connected. Each vertex in $G'$ corresponds to an edge of $G$. Let $T$ be an arbitrary spanning tree of $G$ with depth $d$. Each non-tree edge $e$ defines a simple cycle $C_e$ which contains the edge $e$ and the unique path between the endpoints of $e$ in the tree $T$. Thus, the length of $C_e$ is at most $2d+1$. If such a cycle contains two tree edges $e',e''$, then the vertices $e',e''$ are connected in $G'$. For each non-tree edge $e$, we connect the vertex $e$ to the vertex $e'$ in the graph $G'$, where $e'$ is an arbitrary tree edge in the cycle $C_e$. By the construction of $G'$, any $e,e'$ from the same connected component of $G'$ are in the same biconnected component of $G$. Now consider two arbitrary edges $e,e'$ in the same biconnected component of $G$. There must be a simple cycle $C$ in $G$ which contains both edges. Since the simple cycles defined by the non-tree edges form a cycle basis of $G$ [18], the edge set of $C$ can be represented by the xor sum of the edge sets of basis cycles $C_1,C_2,\dots,C_k$, where $C_i$ is the simple cycle defined by a non-tree edge $e_i$ on the cycle $C$. The number $k$ is upper bounded by the bi-diameter of $G$. Furthermore, we can assume $C_i$ intersects $C_{i+1}$. Hence there is a path between $e,e'$ in $G'$, and the length of the path is at most $O(k\cdot d)$. So, the diameter of $G'$ is upper bounded by $O(k\cdot d)$. Thus, according to [4], we can find the connected components of $G'$ in roughly $\log d+\log k$ parallel time (up to $\log\log$ factors), where $d$ and $k$ are upper bounded by the diameter and the bi-diameter of $G$ respectively.
Now let us consider how to construct $G'$ efficiently. The bottleneck is to determine whether the tree edges $(u,v),(v,w)$ should be connected in $G'$ or not. Suppose $v$ is the parent of $u$ and $w$ is the parent of $v$. The vertex $(u,v)$ should connect to the vertex $(v,w)$ in $G'$ if and only if there is a non-tree edge that connects a vertex in the subtree of $u$ and a vertex outside the subtree of $v$. For each vertex $x$, let $\ell(x)$ be the minimum depth of the least common ancestor (LCA) of $(x,y)$ over all the non-tree edges $(x,y)$. Then $(u,v)$ should be connected to $(v,w)$ in $G'$ if and only if there is a vertex $x$ in the subtree of $u$ in $T$ such that $\ell(x)$ is smaller than the depth of $v$. Since the vertices in a subtree appear consecutively in the DFS sequence, this question can be solved by range queries over the DFS sequence. Next, we will discuss how to compute the DFS sequence of a tree.
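The reduction to connectivity described above can be sketched sequentially. This is an illustration under assumptions, not the MPC implementation: the graph is assumed connected and simple, the spanning tree is a BFS tree, LCAs are found by naive climbing, subtree minima are aggregated bottom-up instead of by range queries over the DFS sequence, and the connectivity step on the auxiliary graph is replaced by a union-find. All names (`biconnected_edge_colors`, `low`, `sub`) are hypothetical. A non-root vertex `v` stands for the tree edge `(v, par[v])`.

```python
from collections import deque


def biconnected_edge_colors(n, edges):
    """Color edges of a connected simple graph on 0..n-1 by biconnected component."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS spanning tree rooted at 0; the root is its own parent.
    par, dep, order = [-1] * n, [0] * n, [0]
    par[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if par[v] == -1:
                par[v], dep[v] = u, dep[u] + 1
                order.append(v)
                queue.append(v)

    def lca_depth(u, v):
        while u != v:  # climb from the deeper endpoint until both meet
            if dep[u] < dep[v]:
                u, v = v, u
            u = par[u]
        return dep[u]

    # low[x]: minimum LCA depth over non-tree edges at x (dep[x] if none).
    low, nontree = dep[:], []
    for u, v in edges:
        if par[u] != v and par[v] != u:
            d = lca_depth(u, v)
            low[u], low[v] = min(low[u], d), min(low[v], d)
            nontree.append((u, v, d))
    # sub[v]: minimum of low over the subtree of v.
    sub = low[:]
    for v in reversed(order):
        if par[v] != v:
            sub[par[v]] = min(sub[par[v]], sub[v])

    uf = list(range(n))

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    for v in order[1:]:
        p = par[v]
        # Tree edges (v, p) and (p, par[p]) are joined when some non-tree edge
        # leaves the subtree of p from inside the subtree of v.
        if par[p] != p and sub[v] < dep[p]:
            uf[find(v)] = find(p)
    for u, v, d in nontree:
        if d < dep[u] and d < dep[v]:  # neither endpoint is an ancestor of the other
            uf[find(u)] = find(v)

    colors = {}
    for u, v in edges:
        if par[u] == v:
            rep = u
        elif par[v] == u:
            rep = v
        else:  # non-tree edge: use the tree edge above its deeper endpoint
            rep = u if dep[u] >= dep[v] else v
        colors[(u, v)] = find(rep)
    return colors
```

Two edges receive the same color exactly when their representatives end up in the same union-find class, which plays the role of a connected component of the auxiliary graph.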
DFS sequence. The DFS sequence of a tree is a variant of the Euler tour representation of the tree. For an $n$-vertex tree $T$, [37] gives an $O(\log n)$ parallel time PRAM algorithm for the Euler tour representation of $T$. However, since their construction method destroys the tree structure, it is hard to get a faster MPC algorithm based on this framework. Instead, we follow the leaf sampling framework proposed by [4]. Although the DFS sequence algorithm proposed by [4] takes $O(\log D)$ time where $D$ is the depth of $T$, it needs $\Omega(n\log D)$ total space. The bottleneck is the subroutine which needs to solve the least common ancestors problem and generate multiple path sequences. The previous algorithm uses the doubling algorithm for the subroutine, i.e., for each vertex $v$, they store the $2^i$-th ancestor of $v$ for every $i\in\{0,1,\dots,\lceil\log D\rceil\}$. This is the reason why [4] cannot achieve linear total space. We show how to compress the tree $T$ into a new tree $T'$ which only contains at most $O(n/\log D)$ vertices. We argue that applying the doubling algorithm on $T'$ is sufficient for us to find the DFS sequence of $T$.
$2$-Edge connectivity. Without loss of generality, we can assume the input graph $G$ is connected. Consider a rooted spanning tree $T$ and an edge $e=(u,v)$ in $G$. Suppose the depth of $u$ is at least the depth of $v$ in $T$, i.e., $v$ cannot be a child of $u$. The edge $e$ is not a bridge if and only if either $e$ is a non-tree edge or there is a non-tree edge connecting the subtree of $u$ and a vertex outside the subtree of $u$. Similarly, the second case can be solved by range queries over the DFS sequence of $T$.
Conditional hardness for biconnectivity. We want to reduce the connectivity problem to the biconnectivity problem. For an undirected graph $G$, if we add an additional vertex $v^*$ and connect $v^*$ to every vertex of $G$, then the diameter of the resulting graph $G'$ is at most $2$ and each biconnected component of $G'$ corresponds to a connected component of $G$. Furthermore, the bi-diameter of $G'$ is upper bounded by the diameter of $G$ plus $2$. Therefore, if the parallel time of an algorithm for finding the biconnected components of $G'$ depends on the bi-diameter of $G'$, there exists an algorithm which can find all the connected components of $G$ in parallel time with the same dependence on the diameter of $G$.
1.4 A Roadmap
The rest of this paper is organized as follows. Section 2 includes the notation and some useful definitions. Section 3 describes the offline algorithms for $2$-edge connectivity and biconnectivity. It also includes the analysis of some crucial properties and the correctness of the algorithms. In Section 4, we show how to find the DFS sequence of a tree in the MPC model with linear total space. Section 5 discusses the implementations of the $2$-edge connectivity algorithm and the biconnectivity algorithm in the MPC model. Section 6 contains the conditional hardness result for the biconnectivity problem in the MPC model.
2 Preliminaries
We follow the notation of [4]. $[n]$ denotes the set of integers $\{1,2,\dots,n\}$.
Diameter and bi-diameter. Consider an undirected graph with a vertex set and an edge set . For any two vertices , we use to denote the distance between and in graph . If are not in the same (connected) component of , then . The diameter of is the largest diameter of its connected components, i.e., . is a cycle of length if and . We say a cycle is simple if and each vertex only appears once in the cycle except . Consider two different vertices . We use to denote the minimum length of a simple cycle which contains both vertices and . If there is no simple cycle which contains both and , . is defined as [math]. The bi-diameter of , , is defined as .
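For intuition on tiny graphs, both quantities can be computed by brute force. The sketch below is illustrative only and exponential in the worst case: the function names are hypothetical, the graph is given as a dictionary of adjacency lists of a simple graph, and vertex pairs that lie on no common simple cycle are simply skipped when taking the maximum (a graph without cycles gets bi-diameter 0 here, which is a convention of this sketch and not taken from the paper).

```python
from collections import deque
from itertools import combinations


def diameter(adj):
    """Largest finite BFS distance, i.e. the largest diameter over all components."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def simple_paths(adj, u, v):
    """Yield every simple path from u to v as a tuple of vertices."""
    stack = [(u, (u,))]
    while stack:
        x, path = stack.pop()
        if x == v:
            yield path
            continue
        for y in adj[x]:
            if y not in path:
                stack.append((y, path + (y,)))


def cycle_length(adj, u, v):
    """Minimum length of a simple cycle through u and v, or None if there is none."""
    best = None
    # A simple cycle through u and v is a pair of internally disjoint u-v paths.
    for p, q in combinations(list(simple_paths(adj, u, v)), 2):
        if set(p[1:-1]).isdisjoint(q[1:-1]):
            length = (len(p) - 1) + (len(q) - 1)
            best = length if best is None else min(best, length)
    return best


def bi_diameter(adj):
    """Largest cycle length over vertex pairs that share a simple cycle."""
    best = 0
    for u, v in combinations(sorted(adj), 2):
        c = cycle_length(adj, u, v)
        if c is not None:
            best = max(best, c)
    return best
```

On a 5-cycle the diameter is 2 while the bi-diameter is 5; on two triangles sharing one vertex the diameter is 2 and the bi-diameter is 3, since vertices from different triangles share no simple cycle.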
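Representation of a rooted forest. Let $V$ denote a set of vertices. We represent a rooted forest in the same manner as [4]. Consider a mapping $\mathrm{par}:V\rightarrow V$. For $v\in V$ and $i\in\mathbb{Z}_{>0}$, we define $\mathrm{par}^{(i)}(v)$ as $\mathrm{par}(\mathrm{par}^{(i-1)}(v))$, and $\mathrm{par}^{(0)}(v)$ is defined as $v$ itself. If $\forall v\in V,\exists i>0$ such that $\mathrm{par}^{(i)}(v)=\mathrm{par}^{(i+1)}(v)$, then we call $\mathrm{par}$ a set of parent pointers on $V$. For $v\in V$, if $\mathrm{par}(v)=v$, then we say $v$ is a root of $\mathrm{par}$. Notice that $\mathrm{par}$ actually can represent a rooted forest, thus $\mathrm{par}$ can have more than one root. The depth of $v\in V$, $\mathrm{dep}_{\mathrm{par}}(v)$, is the smallest $i\in\mathbb{Z}_{\geq 0}$ such that $\mathrm{par}^{(i)}(v)$ is the same as $\mathrm{par}^{(i+1)}(v)$. The root of $v$, $\mathrm{par}^{(\infty)}(v)$, is defined as $\mathrm{par}^{(\mathrm{dep}_{\mathrm{par}}(v))}(v)$. The depth of $\mathrm{par}$ is defined as $\mathrm{dep}(\mathrm{par})=\max_{v\in V}\mathrm{dep}_{\mathrm{par}}(v)$.
Ancestor and path. For two vertices $u,v\in V$, if $\exists k\in\mathbb{Z}_{\geq 0}$ such that $v=\mathrm{par}^{(k)}(u)$ then $v$ is an ancestor of $u$ (in $\mathrm{par}$). If $v$ is an ancestor of $u$, then the path $P(u,v)$ (in $\mathrm{par}$) from $u$ to $v$ is a sequence $(u,\mathrm{par}(u),\mathrm{par}^{(2)}(u),\dots,v)$ and the path $P(v,u)$ is the reverse of $P(u,v)$, i.e., $P(v,u)=(v,\dots,\mathrm{par}^{(2)}(u),\mathrm{par}(u),u)$. If an ancestor $w$ of $u$ is also an ancestor of $v$, then $w$ is a common ancestor of $(u,v)$. Furthermore, if a common ancestor $w$ of $(u,v)$ satisfies $\mathrm{dep}_{\mathrm{par}}(w)\geq\mathrm{dep}_{\mathrm{par}}(w')$ for any common ancestor $w'$ of $(u,v)$, then $w$ is the lowest common ancestor (LCA) of $(u,v)$.
Children and leaves. For any non-root vertex $u$ of $\mathrm{par}$, $u$ is a child of $\mathrm{par}(u)$. For any vertex $v\in V$, $\mathrm{child}_{\mathrm{par}}(v)$ denotes the set of all the children of $v$, i.e., $\mathrm{child}_{\mathrm{par}}(v)=\{u\in V\mid u\neq v,\mathrm{par}(u)=v\}$. If $u$ is the $k$-th smallest vertex in the set $\mathrm{child}_{\mathrm{par}}(v)$ then we define $\mathrm{rank}_{\mathrm{par}}(u)=k$, or in other words, $u$ is the $k$-th child of $v$. If $u$ is a root vertex of $\mathrm{par}$, then $\mathrm{rank}_{\mathrm{par}}(u)$ is defined as $1$. $\mathrm{child}_{\mathrm{par}}(v,k)$ denotes the $k$-th child of $v$. For simplicity, if $\mathrm{par}$ is clear in the context, we just use $\mathrm{child}(v)$, $\mathrm{rank}(v)$ and $\mathrm{child}(v,k)$ to denote $\mathrm{child}_{\mathrm{par}}(v)$, $\mathrm{rank}_{\mathrm{par}}(v)$ and $\mathrm{child}_{\mathrm{par}}(v,k)$ for short. If $\mathrm{child}(v)=\emptyset$, then $v$ is a leaf of $\mathrm{par}$. We denote $\mathrm{leaves}(\mathrm{par})$ as the set of all the leaves of $\mathrm{par}$, i.e., $\mathrm{leaves}(\mathrm{par})=\{v\in V\mid\mathrm{child}(v)=\emptyset\}$.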
2.1 Depth-First-Search Sequence
The Euler tour representation of a tree is proposed by [38, 37]. It is a crucial building block in many graph algorithms including biconnectivity algorithms. The Depth-First-Search (DFS) sequence [4] of a rooted tree is a variant of the Euler tour representation. Let us first introduce some relevant concepts of the DFS sequence.
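Definition 2.1 (Subtree [4]).
Consider a set of parent pointers $\mathrm{par}$ on a vertex set $V$. Let $v$ be a vertex in $V$, and let $V'=\{u\in V\mid v\text{ is an ancestor of }u\}$. $\mathrm{par}'$ is a set of parent pointers on $V'$. If $\forall u\in V'\setminus\{v\}$, $\mathrm{par}'(u)=\mathrm{par}(u)$, and $\mathrm{par}'(v)=v$, then $\mathrm{par}'$ is a subtree of $v$ in $\mathrm{par}$. For $u\in V'$, we say $u$ is in the subtree of $v$.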
The definition of the DFS sequence is the following:
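Definition 2.2 (DFS sequence [4]).
Consider a set of parent pointers $\mathrm{par}$ on a vertex set $V$. Let $v$ be a vertex in $V$. If $v$ is a leaf in $\mathrm{par}$, then the DFS sequence of the subtree of $v$ is $(v)$. Otherwise, the DFS sequence of the subtree of $v$ is defined recursively as
$$(v,a_{1,1},a_{1,2},\dots,a_{1,n_1},v,a_{2,1},a_{2,2},\dots,a_{2,n_2},v,\dots,a_{k,1},a_{k,2},\dots,a_{k,n_k},v),$$
where $k=|\mathrm{child}(v)|$ and $(a_{i,1},a_{i,2},\dots,a_{i,n_i})$ is the DFS sequence of the subtree of $\mathrm{child}(v,i)$, i.e., the $i$-th child of $v$.
If $\mathrm{par}$ has a unique root $v$, then we define the DFS sequence of $\mathrm{par}$ as the DFS sequence of the subtree of $v$. By the definition of the DFS sequence, for any two consecutive elements $a_i$ and $a_{i+1}$ in the sequence, $a_i$ is either a parent of $a_{i+1}$ or a child of $a_{i+1}$. Furthermore, for any vertex $v$, if both elements $a_i$ and $a_j$ $(i<j)$ in the DFS sequence are $v$, any element between $a_i$ and $a_j$ (i.e., $a_i,a_{i+1},\dots,a_j$) should be a vertex in the subtree of $v$.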
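A short sequential sketch makes the recursive definition concrete. It assumes the parent pointers are given as a dictionary with a unique root that points to itself, and that children are visited in increasing vertex order; the function name `dfs_sequence` is hypothetical. It only illustrates the definition and is not the MPC algorithm of Section 4.

```python
def dfs_sequence(par):
    """Return the DFS sequence of a rooted tree given as {vertex: parent}.

    The unique root r is assumed to satisfy par[r] == r.
    """
    children = {v: [] for v in par}
    root = None
    for v in sorted(par):  # children are collected in increasing order
        if par[v] == v:
            root = v
        else:
            children[par[v]].append(v)

    seq = [root]
    stack = [(root, 0)]  # (vertex, index of the next child to visit)
    while stack:
        v, i = stack.pop()
        if i < len(children[v]):
            stack.append((v, i + 1))
            child = children[v][i]
            seq.append(child)        # step down to the i-th child
            stack.append((child, 0))
        elif stack:
            seq.append(stack[-1][0])  # subtree of v finished: step back to its parent
    return seq
```

For the tree with root 0, children 1 and 2 of the root, and child 3 of vertex 1, the result is `[0, 1, 3, 1, 0, 2, 0]`: every vertex other than a leaf brackets the sequences of its children, and all entries between the first and last occurrence of a vertex belong to its subtree.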
2.2 Data Organization and Basic Algorithms in the MPC Model
We organize the data in the MPC model as in [4].
Set. Consider a set of items where each can be described by a constant number of words. If there is a unique machine which stores a pair in its local memory, then the set is stored in the system. is the name of the set and can be represented by a constant number of words. Let be a family of sets, where is stored in the system and the name of can be represented by a constant number of words. If there is a unique machine which stores a pair in its local memory, then we say is stored in the system. The total space for storing is .
An undirected graph can be represented by a pair of the sets , where denotes the set of the vertices and denotes the set of the edges. To store the graph in the system, we just need to store both and in the system.
Mapping. Consider a mapping where are two finite sets and every element from or only requires a constant number of words to describe. Let . Then is a set representation of the mapping , and the name of is . If the set is stored in the system, then we say the mapping is stored in the system. The total space needed for storing is .
A set of parent pointers on a vertex set can be regarded as a mapping .
Sequence. Let be a sequence of elements, where each element can be represented by a constant number of words. Let where . Then is a set representation of the sequence , and the name of is . If is stored in the system, then we say the sequence is stored in the system. The total space needed for storing is .
Basic MPC operations. One of the most basic algorithms in the MPC model is sorting.
Theorem 2.3 ([21, 22]).
Sorting can be solved in $c/\delta$ parallel time in the $(0,\delta)$-MPC model for any constant $\delta\in(0,1)$, where $c\geq 0$ is a universal constant.
For any number of machines with local memory can always be simulated by number of machines with local memory. Therefore, if an algorithm can solve a problem in -MPC model in rounds, then the such algorithm can be simulated in -MPC model in rounds for any . Thus, for any and any constant , sorting takes parallel time in the -MPC model.
Sorting is an important tool to build the MPC subroutines. One such MPC subroutine is to handle multiple queries at the same time. Roughly speaking, a random access shared memory can be simulated in the MPC model. Suppose there are sets stored in the system, and the of them are set representations of mappings . Suppose each machine has several queries where each query requires the value for some . All the queries can be simultaneously handled in constant parallel time in the -MPC model for any constant . For more basic MPC operations, we refer readers to [4].
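One common way to realize such batched lookups with a single sort, shown here sequentially, is to sort the stored pairs and the queries together by key and then scan, so that each query sits right after the pair that answers it. This is a sketch under assumptions: the function name and record layout are hypothetical, keys are assumed to be comparable (for example integers), and the construction is not claimed to be the exact procedure of [4].

```python
def answer_queries(mapping_pairs, queries):
    """Answer many lookups f(x) at once with one sort and one scan.

    mapping_pairs: list of (x, f(x)); queries: list of (query_id, x).
    Returns {query_id: f(x)} for every query whose key is present.
    """
    # Tag 0 marks a stored pair, tag 1 marks a query, so that after sorting
    # by (key, tag) each stored pair precedes the queries asking for its key.
    records = [(x, 0, fx) for x, fx in mapping_pairs]
    records += [(x, 1, qid) for qid, x in queries]
    records.sort(key=lambda r: (r[0], r[1]))

    answers = {}
    have_value, current_key, current_value = False, None, None
    for x, tag, payload in records:
        if tag == 0:
            have_value, current_key, current_value = True, x, payload
        elif have_value and current_key == x:
            answers[payload] = current_value
    return answers
```

In a distributed setting the scan would be replaced by propagating each stored value to the queries that follow it in sorted order; the sequential loop above only mirrors that data flow.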
3 $2$-Edge Connectivity and Biconnectivity
Consider a connected undirected graph $G$ with a vertex set $V$ and an edge set $E$. In the $2$-edge connectivity problem, the goal is to find all the bridges of $G$, where an edge $e\in E$ is called a bridge if its removal disconnects $G$. In the biconnectivity problem, the goal is to partition the edges into several groups $E_1,E_2,\dots,E_k$, i.e., $E=\bigcup_{i=1}^{k}E_i$ with $E_i\cap E_j=\emptyset$ for $i\neq j$, such that $\forall e\neq e'\in E$, $e$ and $e'$ are in the same group if and only if there is a simple cycle in $G$ which contains both $e$ and $e'$. A subgraph induced by an edge group $E_i$ is called a biconnected component (block). In other words, the goal of the biconnectivity problem is to find all the blocks of $G$.
In this section, we describe the algorithms for both the $2$-edge connectivity problem and the biconnectivity problem in the offline setting. In Section 5, we will discuss how to implement them in the MPC model.
3.1 $2$-Edge Connectivity
The $2$-edge connectivity problem is much simpler than the biconnectivity problem. We first compute a spanning tree of the graph. Only a tree edge can be a bridge. Then for any non-root vertex $v$, if there is no non-tree edge which crosses between the subtree of $v$ and the outside of the subtree of $v$, then the tree edge which connects $v$ to its parent is a bridge.
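The bridge test above can be sketched as a sequential procedure. The sketch makes several assumptions that are not from the paper: the graph is connected and simple, the spanning tree is a BFS tree, LCAs are computed by naive climbing, and the minimum over a subtree is aggregated bottom-up rather than by range queries over the DFS sequence. The names `find_bridges`, `low` and `sub` are hypothetical.

```python
from collections import deque


def find_bridges(n, edges):
    """Return the bridges of a connected simple graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # BFS spanning tree rooted at 0; the root is its own parent.
    par, dep, order = [-1] * n, [0] * n, [0]
    par[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if par[v] == -1:
                par[v], dep[v] = u, dep[u] + 1
                order.append(v)
                queue.append(v)

    def lca_depth(u, v):
        while u != v:  # climb from the deeper endpoint until both meet
            if dep[u] < dep[v]:
                u, v = v, u
            u = par[u]
        return dep[u]

    # low[x]: minimum depth of LCA(x, y) over non-tree edges (x, y),
    # or dep[x] if x has no non-tree edge.
    low = dep[:]
    for u, v in edges:
        if par[u] != v and par[v] != u:
            d = lca_depth(u, v)
            low[u], low[v] = min(low[u], d), min(low[v], d)

    # sub[v]: minimum of low over the subtree of v (children before parents).
    sub = low[:]
    for v in reversed(order):
        if par[v] != v:
            sub[par[v]] = min(sub[par[v]], sub[v])

    # The tree edge (par[v], v) is a bridge iff no non-tree edge leaves the
    # subtree of v, i.e. every LCA depth seen inside it is at least dep[v].
    return {(par[v], v) for v in order[1:] if sub[v] >= dep[v]}
```

A non-tree edge with one endpoint inside the subtree of `v` and the other outside has its LCA strictly above `v`, which is why comparing the subtree minimum with `dep[v]` is enough.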
Lemma 3.1 ($2$-Edge connectivity).
Consider an undirected graph . Let be the output of Bridges. Then is the set of all the bridges of .
Proof 3.2.
Suppose is not a bridge. If is a non-tree edge in , then since only contains tree edges, . Otherwise, suppose . There must be a non-tree edge such that is in the subtree of but is not in the subtree of . Thus, the LCA of is not , and it is an ancestor of which means that the depth of the LCA of is smaller than . By step 2, we have . Let be the first and the last appearance of in the DFS sequence of . Since is in the subtree of , there exists such that . By step 4, since , .
If is a bridge. Then must be a tree edge in , i.e., either or . Suppose . Then for any non-tree edge with in the subtree of , must also be in the subtree of . Thus, the depth of the LCA of should be at least . By step 2, for any in the subtree of , we have . Let be the first and the last appearance of in the DFS sequence of . Since all the vertices are in the subtree of , we have by step 4.
3.2 Biconnectivity
In this section, we will show a biconnectivity algorithm. It is a modification of the algorithm proposed by [37]. The high level idea is to construct a new graph $G'$ based on the input graph $G$, and reduce the biconnectivity problem of $G$ to the connectivity problem of $G'$. Since the running time of the connectivity algorithm [4] depends on the diameter of the graph, we also give an analysis of the diameter of the graph $G'$.
Lemma 3.3 (Biconnectivity).
Consider an undirected graph . Let be the output of Biconn. Then satisfies there is a simple cycle in which contains both and . Furthermore, the diameter of the graph constructed by Biconn is at most , the number of vertices of is at most , and the number of edges of is at most .
Proof 3.4.
Each corresponds to a tree edge . Since , . By step 5 and step 6, each edge of creates at most edge of . Thus, .
Claim 1.
If , i.e., vertices are in the same connected component of , then there is a simple cycle in which contains both edges and .
Proof 3.5.
Firstly, let us consider the case when . If is added into by step 6, then there is a simple cycle in :
[TABLE]
Both edges and are in the such cycle. If is added into by step 5, then . Let be the first and the last appearance of in respectively. By step 5, there exists with such that . Thus, there is a vertex in the subtree of such that . By step 2, there is an edge such that the depth of the LCA of is smaller than which means that is not in the subtree of . In this case, there is a simple cycle in :
[TABLE]
Since , both edges , are in the such cycle.
Suppose are in the same connected component of and , are in a simple cycle in . Suppose are in the same connected component of and , are in a simple cycle in . Then, and are in the same connected component of . The symmetric difference of the edge set of and the edge set of should form another simple cycle in which contains both edges and . By induction on , the claim holds.
By Claim 1 and step 8, , if , then there should be a simple cycle in which contains both edges and . Consider an edge such that neither nor is the LCA of , i.e., is a non-tree edge. Without loss of generality, suppose . There is always a cycle in :
[TABLE]
which contains both edges . By step 8, we have . Therefore, , there are always tree edges such that , are either in a simple cycle in or , and are either in a simple cycle in or . If , then which implies that are either in a simple cycle in or . Hence if then either there is a simple cycle in which contains both or .
Next, let us show that if there is a simple cycle in which contains both edges , then . An observation is that each non-tree edge (i.e., neither nor is the LCA of in ) defines a simple cycle in :
[TABLE]
Claim 2.
For any simple cycle defined by a non-tree edge , there is a path in such that contains every vertex in except the LCA of in . Furthermore, the length of is at most .
Proof 3.6.
Without loss of generality, we can assume . If is an ancestor of , then the cycle is
[TABLE]
for some . For each , is in the subtree of . By step 5, since for any , we have . Thus, there is a path in : . In this case, the length of should be at most .
If is not an ancestor of , then the cycle is
[TABLE]
for some . By the similar argument, the edge ( the edge ) is added into by step 5. By step 6, is added into . Therefore, there is a path in :
[TABLE]
In this case, the length of should be at most .
Notice that all the simple cycles defined by the non-tree edges form a cycle basis of the cycle space of , i.e., the edge set of any simple cycle in can be represented by an xor sum of the edge sets of cycles defined by some non-tree edges [18]. Consider any two tree edges contained by a simple cycle . Let be all the non-tree edges in . Then can be represented by an xor sum of . Furthermore, and should have a common tree edge. According to Claim 2, for each , we can find a path in and , intersects . Therefore, and are in the same connected component in . By step 8, . Now consider a non-tree edge . Without loss of generality, we can assume . A tree edge is in the simple cycle defined by . By step 8, we know that . Therefore, we can conclude that , if there is a simple cycle in which contains both , then .
The only thing remaining to prove is the diameter of . According to Claim 1, with , there is a cycle in which contains both edges and .
Claim 3.
, if there is a cycle in which contains both edges , , then there is a cycle in with length which contains both edges , .
Proof 3.7.
By the definition of , there is a cycle with length at most which contains both vertices . If already contains both edges , , then we are done. Otherwise, suppose does not contain . There is another cycle with length at most which contains both vertices . We can regard as two disjoint paths from to . Thus at least one of the paths does not contain the edge . Suppose this path is where is the first vertex which appears in . Then we can combine the path with the path obtained by removing the sub-path from to of to get a new cycle which contains both the edge and . The length of the new cycle is at most . We can apply a similar operation to add edge into the cycle. Thus, we finally get a cycle which contains both , with length at most .
According to the above claim, we can find a cycle in which contains both edges , with length at most . It means that can be represented by an xor sum of basis cycles defined by non-tree edges . Furthermore, and have at least one common tree edge. By Claim 2, we can find paths defined by in such that intersects at some vertex, and are on some path respectively. Thus, , where the second inequality follows from Claim 2. To conclude, .
4 Parallel DFS Sequence in Linear Total Space
In Section 4.1, we review an algorithmic framework proposed by [4] for the DFS sequence. In Sections 4.2, 4.3, and 4.4, we discuss the subroutines needed for our DFS sequence algorithm in the offline setting. In Section 4.5, we discuss the implementation in the MPC model.
4.1 DFS Sequence via Leaf Sampling
In the following, we review the leaf sampling algorithmic framework proposed by [4] for finding the DFS sequence of a rooted tree.
Theorem 4.1 (Leaf sampling algorithm [4]).
Consider a set of parent pointers on a set of vertices. Suppose has a unique root. For any and any constant , if both step 4 and step 6 in LeafSampling can be implemented in the -MPC model with parallel time, then the leaf sampling algorithm with parameter on input can be implemented in the -MPC model. Furthermore, with probability at least , LeafSampling can output the DFS sequence of in parallel time. If the algorithm fails, then it returns FAIL.
By Theorem 4.1, we only need to give a linear total space MPC algorithm for the LCA problem and the path generation problem to design an efficient DFS sequence algorithm in the -MPC model.
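For orientation, the object being computed can be sketched sequentially. The following is a minimal illustration, not the leaf sampling algorithm itself. It assumes that the DFS sequence is the Euler-tour style sequence in which a vertex is recorded on first entry and again after returning from each child, and that the root points to itself in the parent map. The name `dfs_sequence` and the choice to visit children in sorted order are hypothetical.

```python
from collections import defaultdict


def dfs_sequence(par):
    """Return an Euler-tour style DFS sequence of a rooted tree.

    par maps each vertex to its parent; the root maps to itself
    (an assumed convention for this sketch).
    """
    children = defaultdict(list)
    root = None
    for v, p in par.items():
        if p == v:
            root = v
        else:
            children[p].append(v)

    seq = [root]
    # Iterative DFS: each stack entry is (vertex, iterator over its children).
    stack = [(root, iter(sorted(children[root])))]
    while stack:
        v, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:
                seq.append(stack[-1][0])  # record the return to the parent
        else:
            seq.append(child)
            stack.append((child, iter(sorted(children[child]))))
    return seq
```

Under these assumptions a tree on n vertices yields a sequence of length 2n - 1, and the subtree of a vertex occupies the contiguous range between its first and last appearance.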
The authors of [4] proposed using doubling algorithms to compute the LCA and generate the paths. Since they need to store every -th ancestor for each vertex, the total space needed is . We will show that it suffices to apply the doubling algorithm to a compressed tree instead of the original tree.
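The space cost of doubling can be seen in a small sequential sketch of the standard binary-lifting technique, where a table stores the $2^i$-th ancestor of every vertex for each level $i$. This is an illustration of the generic technique under stated assumptions, not the MPC procedure of [4]; the function names and the convention that the root is its own parent are hypothetical.

```python
def compute_depths(par):
    """Depth of every vertex; the root (par[r] == r) has depth 0."""
    dep = {}
    for v in par:
        path = []
        u = v
        # Climb until a vertex with known depth or the root is reached.
        while u not in dep and par[u] != u:
            path.append(u)
            u = par[u]
        if u not in dep:
            dep[u] = 0
        base = dep[u]
        for w in reversed(path):
            base += 1
            dep[w] = base
    return dep


def build_ancestor_table(par, max_depth):
    """up[i][v] is the 2**i-th ancestor of v (the root is its own ancestor)."""
    levels = max(1, max_depth.bit_length())
    up = [dict(par)]
    for _ in range(1, levels):
        prev = up[-1]
        up.append({v: prev[prev[v]] for v in par})
    return up


def lca(u, v, up, dep):
    """Least common ancestor of u and v using the doubling table."""
    if dep[u] < dep[v]:
        u, v = v, u
    # Lift the deeper vertex to the depth of the other one.
    diff = dep[u] - dep[v]
    i = 0
    while diff:
        if diff & 1:
            u = up[i][u]
        diff >>= 1
        i += 1
    if u == v:
        return u
    # Lift both vertices while their ancestors still differ.
    for i in reversed(range(len(up))):
        if up[i][u] != up[i][v]:
            u, v = up[i][u], up[i][v]
    return up[0][u]
```

The table holds one entry per vertex per level, which is the source of the superlinear total space that the compressed tree is meant to avoid.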
4.2 Compressed Rooted Tree
Given a set of parent pointers , we will show how to compress the rooted tree represented by .
Lemma 4.2 (Properties of a compressed rooted tree).
Let be a set of parent pointers on a vertex set with , and has a unique root. Let and let Compress. Then it has the following properties:
1. .
2. .
3. such that .
Proof 4.3.
Consider the first property. For each , we define a set
[TABLE]
we have . Since , , we have . Furthermore, it is easy to show that , . Thus, . On the other hand, since , we know that . Therefore . To conclude, .
Consider the second property. If is a root vertex, . For a non-root vertex , . Since , we have which means that . Now we prove by induction. Suppose , then .
Consider the third property. For , such that . Since and , we know that . Since , the property holds.
4.3 Least Common Ancestor
Given a rooted tree represented by a set of parent pointers on a vertex set , and a set of queries where , we show a space efficient algorithm which can output the LCA of each queried pair of vertices. Notice that the assumption that queries only contain leaves is without loss of generality: we can attach an additional child vertex to each non-leaf vertex . Thus, is a leaf vertex. When a query contains , we can use to replace in the query, and the result will not change.
Before we analyze the algorithm LCA, let us discuss some details of the algorithm.
1. We pre-compute and for every and .
2. To implement step 3a, we first check whether . If it is not true, we can set to be directly. Otherwise, according to Lemma 4.2, there is a such that . Since , . We initialize to be . For , if (i.e., ), we set . Due to Lemma 4.2 again, the final must satisfy and . This step takes time .
Lemma 4.4 (LCA algorithm).
Let be a set of parent pointers on a vertex set . has a unique root. Let be a set of pairs of vertices where . Let be the output of LCA. For , satisfies that is the LCA of , are ancestors of respectively, and are children of .
Proof 4.5.
Without loss of generality, we can assume . After step 3a, satisfies and . Notice that the LCA of in is the same as the LCA of in . In step 3b, if we find the LCA of , then the lemma holds for . Otherwise, the depth of the LCA of is smaller than . By combining with Lemma 4.2, neither of nor in step 3c can be the LCA of in . Thus, the LCA of in is the same as the LCA of in . According to step 3d, are ancestors of respectively in both and , but neither of nor is the common ancestor of . Furthermore, is the LCA of in . Thus, is a common ancestor of in . By combining with Lemma 4.2, we know that there exists such that is the LCA of in . In step 3e, we can find the LCA of in and thus the LCA of .
4.4 Multi-Paths Generation
Consider a rooted tree represented by a set of parent pointers on a vertex set and a set of vertex-ancestor pairs where is an ancestor of . We show a space efficient algorithm MultiPaths which can generate all the paths .
Before we analyze the correctness of the algorithm, let us discuss some details.
1. In step 3a, if the length of the path is at most , then we can generate the path in rounds. In the -th round, we can find the vertex .
2. In step 3b, we find as follows. We initialize as . For , if (i.e., ), we set .
Lemma 4.6 (Generation of multiple paths).
Let be a set of parent pointers on a vertex set . has a unique root. Let be a set of pairs of vertices where is an ancestor of in . Let be the output of MultiPaths. Then , i.e., is a sequence which denotes a path from to in .
Proof 4.7.
Consider a pair . If , then will be the path from to in by step 3a.
We only need to consider the case when . According to Lemma 4.2, such that . Thus, can be found by step 3b. Then can be found. is an ancestor of . is an ancestor of . is an ancestor of . In step 3d, the initialization of should be By Lemma 4.2, the initialization of is also . Then by step 3e, the final sequence will be which denotes the path from to in .
4.5 Implementation of the DFS Sequence Algorithm in MPC
Here, we discuss how to implement the subroutines mentioned in Sections 4.2, 4.3, and 4.4 in the MPC model. See Section 2.2 for the organization of the data in the MPC model and basic MPC operations.
Compressed rooted tree. Consider the implementation of Compress (Section 4.2) in the MPC model. The input size is . In the first step, we need to compute the depth of every vertex in . As shown by [4], this can be computed in the MPC model with total space and local memory size per machine for any constant in time. In the next step, can be computed in time. Finally, we can simultaneously compute for every vertex . Since for , it takes time. Therefore, Compress can be implemented in the -MPC model for any constant in time.
Least common ancestor. Consider the implementation of LCA (Section 4.3) in the MPC model. The input size is . The first step computes a compressed rooted tree . As discussed in the previous paragraph, this only requires total space and local memory per machine for any constant . Before the next step, we need to compute the depth of each vertex in and the depth of each vertex in . Since , it takes time. In step 2, as shown in [4], for can be computed in the MPC model with total space and local memory per machine for any constant in time. According to Lemma 4.2, . Thus, step 2 only needs total space and takes time . For step 3, we can handle all the queries in simultaneously. For step 3a, we can use time to check whether . If it is true, we can use time to find a such that . Then, we apply an exponential search by using to find . This takes time. Step 3b checks whether is the LCA for every . Thus, it takes time. In step 3c, according to Lemma 4.2, there exists such that . Thus, we only need time to find . Similarly, we only need time to find . In step 3d, by [4], the LCA of each in can be computed simultaneously in the MPC model with total space in time. The last step checks whether for each . Thus it requires time. To conclude, LCA can be implemented in the -MPC model for any constant in parallel time.
Multiple paths generation. Consider the implementation of MultiPaths (Section 4.4) in the MPC model. The first two steps are the same as the first two in the LCA subroutine mentioned in the previous paragraph. They can be implemented in the MPC model with total space and local memory per machine for any constant in time. We compute the depth of each vertex in and the depth of each vertex in in time before the next step. In step 3, all the queries can be handled simultaneously. In step 3a, if , the length of the path from to is at most , and thus can be computed in time. In step 3b, we can use time to find the minimum such that . Then we can apply exponential search to find by using in time. In step 3c, by [4], each path in can be generated simultaneously in the MPC model with total space in time. Consider the initialization of in step 3d. should be and should be . By Lemma 4.2, , . Thus, the number of repetitions in the final step is at most . To conclude, MultiPaths can be implemented in the MPC model with total space linear in and local memory size per machine for any constant in time.
DFS sequence in the MPC model. Consider LeafSampling where and is an arbitrary constant from . For step 4 of LeafSampling, we run our LCA algorithm (Section 4.3). The correctness of our LCA algorithm is guaranteed by Lemma 4.4. According to [4], the total number of queries generated in step 4 of LeafSampling is at most with high probability. By the discussion in the previous paragraphs, step 4 of LeafSampling can be implemented in the -MPC model for any constant in time. For step 6 of LeafSampling, we run our multiple paths generation algorithm (Section 4.4). The correctness of our multiple paths generation algorithm is guaranteed by Lemma 4.6. Notice that the total length of all the queried paths in step 6 of LeafSampling is at most the length of the DFS sequence which is . According to the discussion in the previous paragraphs, step 6 of LeafSampling can be implemented in the -MPC model for any constant in time. Together with Theorem 4.1, we conclude Theorem 1.3.
5 2-Edge Connectivity and Biconnectivity in MPC
In this section, we discuss how to implement the 2-edge connectivity algorithm and the biconnectivity algorithm in the MPC model. We first describe how to implement a subroutine called range minimum query (RMQ) in the MPC model.
5.1 Parallel Range Minimum Query in Linear Total Space
The range minimum query (RMQ) problem is as follows. Given a sequence and a set of queries where , we want to find the value for each query . [4] gives an MPC algorithm which requires total space and takes parallel time for solving the RMQ problem. Its total space is not linear in the input size. In this section, we show that if every query satisfies , then we can solve such an RMQ problem in the MPC model with total space in parallel time. The offline description is shown in the algorithm RMQ.
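To make the problem statement concrete, here is a minimal sequential sketch of the classical sparse-table approach to RMQ. It is a generic doubling technique shown only for illustration, not the algorithm RMQ of this section. It assumes 0-indexed inclusive query ranges, and the function names are hypothetical.

```python
def build_sparse_table(a):
    """table[j][i] is the minimum of a[i : i + 2**j]."""
    n = len(a)
    table = [list(a)]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        # A window of length 2**j is the union of two windows of length 2**(j-1).
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_min(table, l, r):
    """Minimum of a[l..r] (inclusive, 0-indexed), assuming l <= r."""
    k = (r - l + 1).bit_length() - 1
    # Two overlapping windows of length 2**k cover the whole range.
    return min(table[k][l], table[k][r - (1 << k) + 1])
```

The table keeps roughly one row per power of two up to the sequence length, so its size grows faster than linearly in the length of the sequence, which is the kind of overhead a linear total space algorithm has to avoid.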
Lemma 5.1 (Range minimum query).
Let be a sequence of numbers and where . Let be the output of RMQ. Then , In addition, RMQ can be implemented in the -MPC model for any constant in parallel time.
Proof 5.2.
Firstly, let us consider the correctness of RMQ. Let . For a query , since , the found by the step 3a will satisfy . If , then and . Otherwise, by step 3b, . By step 3c, .
Let us analyze the total space required and the parallel time for running RMQ in the MPC model. According to Theorem 2.3, the sorting takes time and requires linear total space. Notice that is a constant and each machine has local memory. We can sort by their indexes and number of duplicates of some elements in such that are on the machine. Therefore, the first two steps of RMQ can be implemented in the MPC model with total space and in time . For step 3, we can handle all the queries simultaneously. Step 3a only requires local computations. Step 3b needs to handle at most RMQ on the sequence . Due to [4], this can be implemented in the MPC model with total space and parallel time. Step 3c can be done in time. To conclude, RMQ can be implemented in the -MPC model for any constant and the parallel time is .
5.2 MPC Implementation of 2-Edge Connectivity and Biconnectivity
The input is a connected undirected graph . has vertices and edges. Thus, the input size is . Consider the -MPC model for and an arbitrary constant . The total space in the system should be and the local memory size of each machine is . There is an efficient algorithm for the connected components and spanning tree problems.
Theorem 5.3 ([4]).
For any and any constant , there is a randomized -MPC algorithm which outputs the connected components together with a rooted spanning forest of an undirected graph with vertices and edges in parallel time. Furthermore, the depth of the spanning forest is at most . The success probability is at least . If the algorithm fails, then it returns FAIL.
2-Edge connectivity. In the first step of Bridges (Section 3.1), according to Theorem 5.3, with probability , the rooted spanning tree of can be computed in the MPC model with total space in time, and the depth of the spanning tree is at most . In step 2, to compute for each , we can query the LCA of in for each edge . We can use our LCA algorithm (Section 4.3) as the subroutine for this purpose. It takes the total space and the running time (Section 4.5). In step 3, with probability at least , the DFS sequence can be computed using total space in time (Theorem 1.3). In step 4, we can use sorting to find the first appearance and the last appearance in the DFS sequence of each vertex , and corresponds to a range minimum query. If the size of the subtree of is at most , the corresponding RMQ can be solved by local computation. Otherwise, we use our RMQ algorithm (Section 5.1) to handle the corresponding RMQ of . By Lemma 5.1, this step only takes time and requires space. To conclude, Bridges only takes total space and has parallel time .
Since the correctness of Bridges (Section 3.1) is guaranteed by Lemma 3.1, we can conclude Theorem 1.2.
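The pipeline described above (spanning tree, LCA depths of non-tree edges, and a minimum over each subtree) can be sketched sequentially. This is a hedged illustration under assumptions, since the steps of Bridges are specified in Section 3.1 and not reproduced here: the graph is assumed connected and simple, the spanning tree is a BFS tree rooted at vertex 0, the LCA is found by naive climbing, and the subtree minimum is accumulated bottom-up in place of an RMQ over the DFS sequence. All names are hypothetical.

```python
from collections import deque


def find_bridges(n, edges):
    """Bridges of a connected simple undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Rooted spanning tree (BFS from vertex 0) with depths.
    par = [-1] * n
    dep = [0] * n
    par[0] = 0
    order = []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if par[w] == -1:
                par[w] = u
                dep[w] = dep[u] + 1
                queue.append(w)

    def lca(u, v):
        # Naive climb, used here only to keep the sketch short.
        while u != v:
            if dep[u] < dep[v]:
                u, v = v, u
            u = par[u]
        return u

    # low[v]: smallest LCA depth over non-tree edges incident to v.
    low = dep[:]
    for u, v in edges:
        if par[u] == v or par[v] == u:
            continue  # tree edge
        d = dep[lca(u, v)]
        low[u] = min(low[u], d)
        low[v] = min(low[v], d)

    # Subtree minimum of low; reversed BFS order visits children first.
    sub = low[:]
    for v in reversed(order):
        if v != 0:
            sub[par[v]] = min(sub[par[v]], sub[v])

    # Tree edge (par[v], v) is a bridge iff no non-tree edge leaves subtree(v).
    return sorted((min(v, par[v]), max(v, par[v]))
                  for v in range(1, n) if sub[v] >= dep[v])
```

The test in the last line relies on the fact that a non-tree edge with one endpoint inside the subtree of a vertex and the other outside has its LCA strictly above that vertex.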
Biconnectivity. The first three steps of Biconn (Section 3.2) are the same as the first three steps of Bridges (Section 3.1). Thus, the success probability of the first three steps is at least . The total space used is at most and the running time is at most . Step 5 of Biconn corresponds to the RMQ problem which is almost the same as the step 4 of Bridges. Thus, it takes total space and parallel time. Step 6 requires LCA queries. We can run our LCA algorithm (Section 4.3) for this step. It takes space and time (Section 4.5). By Lemma 3.3, we have . According to Theorem 5.3, with probability at least , the connected components of can be computed in step 7, the total space needed is , and the running time is . To conclude, the total space needed is at most , and the parallel running time is .
Since the correctness of Biconn (Section 3.2) is guaranteed by Lemma 3.3, we can conclude Theorem 1.1.
6 Hardness of Biconnectivity in MPC
There is a conjectured hardness result which is widely used in the MPC literature [26, 11, 29, 35, 41].
Conjecture 6.1 (One cycle vs. two cycles).
For any and any constant , distinguishing the following two graph instances in the -MPC model requires parallel time:
1. a single cycle containing vertices;
2. two disjoint cycles, each containing vertices.
Under the above conjecture, we show that parallel time is necessary to compute the biconnected components of . This claim is true even for the constant diameter graph , i.e., .
Theorem 6.2 (Hardness of biconnectivity in MPC).
For any and any constant , unless the one cycle vs. two cycles conjecture (Conjecture 6.1) is false, any -MPC algorithm requires parallel time for testing whether a graph with a constant diameter is biconnected.
Proof 6.3.
For and an arbitrary constant , suppose there is a -MPC algorithm which can determine whether an arbitrary constant diameter graph is biconnected in parallel time. Then we give a -MPC algorithm for solving the one cycle vs. two cycles problem as follows:
1. For a one cycle vs. two cycles instance -vertex graph , construct a new graph : .
2. Run on . If is not biconnected, contains two cycles. Otherwise is a single cycle.
It is easy to see that the diameter of is . If is a single cycle, then is biconnected and . If contains two cycles, then contains two biconnected components and .
The first step of the above algorithm takes parallel time and only requires linear total space. The graph has vertices and edges. Thus, the above algorithm is also a -MPC algorithm. The parallel time of the above algorithm is the same as the time needed for running on which is . Thus the existence of the algorithm implies that the one cycle vs. two cycles conjecture (Conjecture 6.1) is false.
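The reduction can be illustrated with a small sequential sketch. The exact construction of the new graph is not legible in this text, so the sketch assumes one natural choice that matches the stated properties (constant diameter, biconnected exactly when the input is a single cycle): add one hub vertex adjacent to every original vertex. The biconnectivity test below is a brute-force stand-in for the hypothetical MPC algorithm, and all function names are hypothetical.

```python
def add_hub(n, edges):
    """Assumed construction: new vertex n adjacent to every vertex 0..n-1."""
    hub = n
    return n + 1, list(edges) + [(v, hub) for v in range(n)]


def build_adj(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def connected_without(adj, removed=None):
    """Is the graph connected after deleting the vertex `removed`?"""
    alive = [v for v in range(len(adj)) if v != removed]
    if not alive:
        return True
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def is_biconnected(adj):
    """Brute force: connected, and no single vertex disconnects the graph."""
    return connected_without(adj) and all(
        connected_without(adj, v) for v in range(len(adj)))


def count_cycles(n, edges):
    """Decide one cycle vs. two cycles via the assumed hub reduction."""
    n2, e2 = add_hub(n, edges)
    return 1 if is_biconnected(build_adj(n2, e2)) else 2


def cycle_edges(vertices):
    k = len(vertices)
    return [(vertices[i], vertices[(i + 1) % k]) for i in range(k)]
```

With the hub in place, every pair of vertices is within distance two, and in the two-cycle case the hub is the only vertex joining the two cycles, so removing it disconnects the graph.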
References
- [1] Kook Jin Ahn and Sudipto Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. ACM Transactions on Parallel Computing (TOPC), 4(4):17, 2018.
- [2] Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms, 7(4):567–583, 1986.
- [3] Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 574–583. ACM, 2014.
- [4] Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. Parallel graph connectivity in log diameter rounds. In FOCS. https://arxiv.org/pdf/1805.03055, 2018.
- [5] Sepehr Assadi, Mohammad Hossein Bateni, Aaron Bernstein, Vahab Mirrokni, and Cliff Stein. Coresets meet EDCS: algorithms for matching and vertex cover on massive graphs. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1616–1635. SIAM, 2019.
- [6] Sepehr Assadi and Sanjeev Khanna. Randomized composable coresets for matching and vertex cover. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 3–12. ACM, 2017.
- [7] Sepehr Assadi, Xiaorui Sun, and Omri Weinstein. Massively parallel algorithms for finding well-connected components in sparse graphs. arXiv preprint. https://arxiv.org/pdf/1805.02974, 2018.
- [8] Giorgio Ausiello, Donatella Firmani, Luigi Laura, and Emanuele Paracone. Large-scale graph biconnectivity in MapReduce. Department of Computer and System Sciences Antonio Ruberti Technical Reports, 4(4), 2012.
