Pseudo-Separation for Assessment of Structural Vulnerability of a Network
Alan Kuhnle, Tianyi Pan, Victoria G. Crawford, Md Abdul Alim, and My T. Thai

Department of Computer & Information Science & Engineering
University of Florida
Gainesville, Florida, USA
Email: {kuhnle, tianyi, crawford, alim, mythai}@cise.ufl.edu
Abstract
Based upon the idea that network functionality is impaired if two nodes in a network are sufficiently separated in terms of a given metric, we introduce two combinatorial pseudocut problems generalizing the classical min-cut and multi-cut problems. We expect the pseudocut problems will find broad relevance to the study of network reliability. We comprehensively analyze the computational complexity of the pseudocut problems and provide three approximation algorithms for these problems.
Motivated by applications in communication networks with strict Quality-of-Service (QoS) requirements, we demonstrate the utility of the pseudocut problems by proposing a targeted vulnerability assessment for the structure of communication networks using QoS metrics; we perform experimental evaluations of our proposed approximation algorithms in this context.
I Introduction
The concept of connectivity, or the existence of a path between two nodes, is vital for any network. Whatever functionality a network may provide to a pair of nodes is usually absent if the pair is disconnected. As a result, many studies of network vulnerability, or the degree to which the functionality of a network may be disrupted by failures, have incorporated connectivity as a fundamental measure of network functionality [1, 2, 3, 4]. Recognition of the importance of connectivity has led to the study of many combinatorial problems related to connectivity [5, 6, 7], perhaps the most well-known of which is the minimum cut problem (CUT), of determining the minimum number of edges (vertices) to remove in order to disconnect a pair of vertices in a graph. CUT was shown to be solvable in polynomial time via the celebrated maximum flow minimum cut relationship [8].
However, the functionality a network provides may break down even when elements of the network remain connected. For example, suppose $G$ is a communication network, with edge lengths representing the transmission delay over each edge. For nodes $s$ and $t$ to communicate, the total delay on the routing path by which they communicate must remain below some threshold $T$. If the shortest-path distance between $s$ and $t$ exceeds $T$, communication breaks down, even though $s$ and $t$ are topologically connected within the network. Another example is the shipping of a perishable item through a transportation network: if the item reaches its destination after it has perished, it is of no use to the recipient. Therefore, instead of considering network failure to occur only when elements of the network are topologically separated, we propose a more general measure of network failure: network functionality is impaired after the $T$-separation of elements in a network, where $T$ is a real number. Two nodes $s$ and $t$ are $T$-separated if the weighted shortest-path distance $d(s, t)$ exceeds $T$.
As we demonstrate in this work, the $T$-separation analogue (defined below) of the classical CUT problem cannot be reduced to CUT unless $P = NP$. Given a constant $T$, the minimum $T$-pseudocut (T-PCUT) problem takes as input a directed graph $G = (V, E)$, a targeted pair $(s, t)$, and a distance function $w$ on the edges of $G$. The problem asks for a minimum-size set $A$ of vertices (edges) to remove from $G$ such that, after the removal of $A$, the $w$-shortest-path distance satisfies $d(s, t) > T$. To illustrate the differences between CUT and T-PCUT, consider the following example. Let $G$ be the network shown in Fig. 1 with targeted pair $(s, t)$, let $w(e) = 1$ for each edge $e \in E$, and let $T$ be the threshold specified in Fig. 1. An optimal solution to the vertex version of CUT (also known as minimum vertex separator [6]) must contain three nodes, while removing fewer nodes (see Fig. 1) yields an optimal solution to this instance of T-PCUT; after this removal, $d(s, t) > T$. Observe that the naive approach of eliminating all vertices at distance greater than $T$ from $s$ and then solving CUT on the new graph does not work, since every node $v$ in $G$ initially satisfies $d(s, v) \le T$.
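To make the contrast between CUT and T-PCUT concrete, here is a small brute-force sketch in Python. It does not reproduce Fig. 1; the graph (three internally disjoint $s$-$t$ paths with unit lengths), the helper names `shortest_dist` and `min_vertex_removal`, and the threshold $T = 2$ are hypothetical choices. Distances are computed with Dijkstra's algorithm, and the search tries vertex subsets (excluding $s$ and $t$) in order of increasing size, so it is exponential and only meant for tiny graphs.

```python
from itertools import combinations
import heapq
import math

def shortest_dist(edges, s, t, removed=frozenset()):
    """Dijkstra distance from s to t in a directed graph with the vertices in `removed` deleted."""
    if s in removed or t in removed:
        return math.inf
    adj = {}
    for (u, v), length in edges.items():
        if u not in removed and v not in removed:
            adj.setdefault(u, []).append((v, length))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length in adj.get(u, []):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

def min_vertex_removal(edges, s, t, separated):
    """Smallest vertex set (excluding s, t) whose removal makes separated(d(s, t)) true."""
    candidates = sorted({x for e in edges for x in e} - {s, t})
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if separated(shortest_dist(edges, s, t, frozenset(subset))):
                return set(subset)
    return None  # infeasible, e.g. when (s, t) is itself an edge

# Hypothetical example: three internally disjoint s-t paths, unit lengths.
edges = {(u, v): 1 for u, v in [("s", "a"), ("a", "t"),
                                 ("s", "b"), ("b", "c"), ("c", "t"),
                                 ("s", "e"), ("e", "f"), ("f", "t")]}
T = 2
pcut = min_vertex_removal(edges, "s", "t", lambda d: d > T)        # T-PCUT
cut = min_vertex_removal(edges, "s", "t", lambda d: d == math.inf)  # vertex CUT
print(pcut, len(cut))  # {'a'} 3
```

Removing the single vertex `a` raises $d(s, t)$ from 2 to 3, while disconnecting $s$ from $t$ needs one vertex from each of the three paths. Every vertex in this toy graph lies within distance 2 of `s`, so the naive pruning idea would remove nothing.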
Although the new combinatorial problems we propose in this work should be broadly applicable, the application in which we are most interested is structural vulnerability with respect to additive Quality-of-Service (QoS) metrics on communication networks. For example, the total time delay, jitter, or packet loss between two nodes in a communication network are additive QoS metrics (packet loss can be converted to an additive metric, as described in Lemma 1). For a given additive QoS metric $q$, the threshold $T$ for this metric (its highest acceptable value) is a constant independent of any particular communication network, although it varies with the desired communication application, such as voice or video calls, process control, or machine control.
I-A Our contributions
- We introduce $T$-separation analogues of the following two classical combinatorial problems: the CUT problem defined above, and the MULTI-CUT problem [5], in which a given set of pairs must be disconnected with the minimum number of edges (nodes) removed. Collectively, we refer to these new formulations as pseudocut problems; they are, respectively, T-PCUT and T-MULTI-PCUT, and are formally defined in Section II.
- Computational complexity: We show that with arbitrary edge weights, T-PCUT is NP-complete. With uniform edge weights, we show that T-MULTI-PCUT is inapproximable within a factor of 1.3606 unless $P = NP$, by an approximation-preserving reduction from the minimum vertex cover problem.
- Approximation algorithms: For the T-PCUT and T-MULTI-PCUT problems with uniform edge weights, we provide GEN, an $O(\log n)$-approximation algorithm (where $n$ is the number of vertices), and FEN, a $(T+1)$-approximation algorithm. In addition, we provide GEST, an efficient randomized algorithm with a probabilistic performance guarantee: with a guaranteed probability, GEST returns a feasible solution whose cost is within a bounded ratio of optimal, where the ratio depends on the number of pairs to $T$-separate, the maximum degree of the graph, and a user-defined parameter. The time complexity of GEST also depends on this parameter, which therefore gives the user control of the trade-off between performance and running time.
- Vulnerability assessment: Finally, we use the pseudocut problems to formulate a vulnerability assessment for an arbitrary additive QoS metric on communication networks, and we perform extensive experimental evaluations of our algorithms in the framework of this vulnerability assessment.
I-B Related work
The theoretical results for min-cut, multi-cut, and partial multi-cut vary depending on whether the edge or vertex version of the problem is considered, and whether the graph is undirected or directed. Table I shows the current status of the best-known approximation ratios for each version of the problem, and the references where a proof of this ratio may be found. In contrast, our algorithms work equally well in undirected or directed graphs and for the vertex or edge version of the pseudocut problem. To the best of our knowledge, we are the first to consider the pseudocut problems.
The seminal work of Ford and Fulkerson showed that the max-flow and min-cut values are equal for the CUT problem [8]. Leighton and Rao showed an analogous result for the multi-cut problem [5] using multicommodity max-flow, which gives an $O(\log k)$-approximation algorithm for the edge version of the multi-cut problem in undirected graphs, where $k$ is the number of pairs. For the node version of multi-cut in undirected graphs, Garg et al. [9] gave an $O(\log k)$-approximation algorithm. For the edge version of multi-cut in directed graphs, Cheriyan et al. [10] gave an approximation ratio sublinear in the number of vertices $n$; Gupta [11] improved this ratio to $O(\sqrt{n})$, and finally Agarwal et al. [12] improved the ratio to $\tilde{O}(n^{11/23})$. For multi-cut in trees, Garg et al. [13] provided another max-flow min-cut relationship, giving a 2-approximation for multi-cut in trees.
A QoS-aware vulnerability assessment has been considered by Xuan et al. [14]; however, the complexity of their assessment lies above even the class NP, as a valid solution cannot even be checked in polynomial time. A problem related to the single-pair T-PCUT was studied by Israeli and Wood [15]: in this problem (MSP), given a fixed budget and a pair $(s, t)$, a set of edges within the budget is sought whose removal maximizes the shortest-path distance between $s$ and $t$. Israeli and Wood seek exact solutions using a bilevel optimization model, and this problem has been used as the basis for the detection of critical infrastructure and network vulnerability [16, 17]. However, we emphasize the difference between T-PCUT and MSP: in T-PCUT, it is the size (or cost) of the critical set that must be minimized; furthermore, MSP is formulated for edge interdiction only, while we primarily consider node interdiction. Finally, we have found only expensive exact methods to solve MSP; to the best of our knowledge, no efficient algorithms for MSP with a performance guarantee exist.
I-C Organization
The rest of this paper is organized as follows. In Section II, we define the pseudocut problems, discuss motivating applications, define the QoS vulnerability assessment, and formulate the pseudocut problems as integer programs. In Section III, we analyze the computational complexity of the pseudocut problems. In Section IV, we present our three approximation algorithms. In Section V, we experimentally evaluate our algorithms in the context of the QoS vulnerability assessments. Finally, in Section VI, we summarize our contributions and discuss future work.
II Problem definitions
In this section, we introduce the vertex versions of the pseudocut problems; the edge versions are presented in Appendix A. Let $T$ be an arbitrary but fixed constant throughout this section. The problems take as input a triple $(G, c, w)$, where $G = (V, E)$ is a directed graph; $c$ is a cost function on vertices representing the difficulty of removing each node; and $w$ is a length function on edges. For example, $w(e)$ could be the latency or packet loss on edge $e$. Although both $c$ and $w$ may be considered weight functions, we use cost for $c$ and length for $w$ to avoid confusion. The case when $c(v) = 1$ for all vertices $v$ is referred to as uniform cost, and the case when $w(e) = 1$ for all edges $e$ is referred to as uniform length. The distance $d(u, v)$ between two vertices $u$ and $v$ is the length of the $w$-weighted, directed shortest path from $u$ to $v$; the cost $c(A)$ of a set $A$ of vertices is the sum of the costs of the individual vertices in $A$.
Problem 1 (Minimum $T$-pseudocut (T-PCUT)).
Given a triple $(G, c, w)$ and a pair of vertices $(s, t)$ of $G$, determine a minimum cost set $A \subseteq V \setminus \{s, t\}$ of vertices such that $d(s, t) > T$ after the removal of $A$ from $G$.
Notice that in the formulation of T-PCUT, we disallow choosing the pair endpoints $s$ and $t$ in the solution. For the non-uniform cost version this restriction is unnecessary, since the endpoints could be assigned a sufficiently high cost; however, we include it because otherwise the optimal solution would be trivial in the uniform cost version.
Problem 2 (Minimum $T$-multi-pseudocut (T-MULTI-PCUT)).
Given a triple $(G, c, w)$ and a target set $S = \{(s_1, t_1), \ldots, (s_k, t_k)\}$ of pairs of vertices of $G$, determine a minimum cost set $A \subseteq V$ of vertices such that $d(s_i, t_i) > T$ for all $(s_i, t_i) \in S$ after the removal of $A$ from $G$.
In contrast to T-PCUT, we allow members of pairs in $S$ to be picked into the solution of T-MULTI-PCUT; thus, there is always a feasible solution of size at most $k$ (one endpoint of each pair). If a vertex $v$ is removed from $G$, we adopt the convention that $d(u, v) = d(v, u) = \infty$ for all vertices $u$.
In the above two formulations, we emphasize again that the threshold $T$ is a fixed constant independent of the input; in addition, we introduce versions of these problems in which $T$ is part of the input, and we refer to them as PCUT and MULTI-PCUT, respectively. Finally, the algorithms in Section IV generalize to the edge versions of the problems as well, as discussed in Appendix B.
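A feasibility check for T-MULTI-PCUT follows directly from the definition and the removal convention. The Python sketch below is a hypothetical helper, not code from the paper: `is_multi_pseudocut` runs Dijkstra's algorithm from each source with the removed vertices deleted and treats a deleted endpoint as being at distance infinity. The toy instance also illustrates that taking one endpoint of every pair is always feasible.

```python
import heapq
import math

def distances_from(edges, source, removed):
    """Single-source Dijkstra on a directed graph with the vertices in `removed` deleted."""
    if source in removed:
        return {}
    adj = {}
    for (u, v), length in edges.items():
        if u not in removed and v not in removed:
            adj.setdefault(u, []).append((v, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj.get(u, []):
            if d + length < dist.get(v, math.inf):
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist

def is_multi_pseudocut(edges, pairs, removed, T):
    """True if every pair (s, t) satisfies d(s, t) > T after deleting `removed`.
    A deleted endpoint counts as distance infinity, following the convention above."""
    removed = frozenset(removed)
    for s, t in pairs:
        if distances_from(edges, s, removed).get(t, math.inf) <= T:
            return False
    return True

edges = {("x", "y"): 1.0, ("y", "z"): 1.5, ("x", "z"): 4.0}
pairs = [("x", "z"), ("y", "z")]
T = 3.0
print(is_multi_pseudocut(edges, pairs, set(), T))                  # False: d(x, z) = 2.5
print(is_multi_pseudocut(edges, pairs, {"y"}, T))                  # True: d(x, z) = 4.0
print(is_multi_pseudocut(edges, pairs, {s for s, _ in pairs}, T))  # True: one endpoint per pair
```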
II-A Motivation and applications for the pseudocut problems
In this section, we give brief overviews of two potential applications of the pseudocut problems. Motivated by these examples, we next provide the vulnerability assessment for QoS on communication networks.
II-A1 Industrial Internet of Things
An emerging application for pseudocut problems is the Industrial Internet of Things (IIoT). As everyday objects become increasingly equipped with means for electronic identification and communication, from Radio Frequency Identification (RFID) to smarter communication capabilities, new applications and scenarios have emerged in the Internet of Things [18, 19].
As surveyed in [20], an emerging trend is to integrate communication capabilities into industrial production systems. Such cyberphysical systems (CPS) in the production process are connected to conventional business IT networks. Integrated CPS allow extensive monitoring and control of production facilities in real time. However, the QoS requirements for control of production systems are very strict, and special routing protocols have been formulated to guarantee acceptable QoS conditions [21]. An IEEE task group on Time-Sensitive Networking (TSN) [22] is currently chartered to provide specifications to allow time-synchronized low latency streaming services through 802 networks. Critical data streams are guaranteed certain end-to-end QoS by resource reservation; this service is intended for industrial applications such as process control, machine control, and vehicles; and for audio/video streams.
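As an example application of T-PCUT, consider two nodes in the IIoT described above: $s$, a control node, and $t$, a lower-level node. Further, suppose that the acceptable level of packet loss ratio between $s$ and $t$ is at most $p_0$. Then, the problem instance of T-PCUT is the IIoT network $G$, with edges weighted by the metric defined in Lemma 1 below and threshold $T = -\ln(1 - p_0)$. A solution to the T-PCUT problem for $(s, t)$ identifies the most critical vertices, whose proper functioning is required to ensure $\rho(s, t) \le p_0$, where $\rho(s, t)$ is the cumulative packet loss ratio between $s$ and $t$.
To convert the packet error rate between nodes into an additive metric, we define the following transformation. Given network $G = (V, E)$, let $p(e) \in [0, 1)$ represent the packet error rate of each edge $e \in E$. Then, the transformation is
$$w(e) = -\ln\bigl(1 - p(e)\bigr), \qquad e \in E. \tag{1}$$
Lemma 1.
Let $p(e)$ represent the packet error rate of each edge $e \in E$. Then the transformation (1) yields an additive metric $w$ such that $1 - e^{-d(s, t)}$ is the lowest cumulative packet error rate between nodes $s$ and $t$ over all possible routing paths.
Proof.
Let $G = (V, E)$ with packet error rate $p(e)$ be given for each $e \in E$, and let $w$ be defined by (1). Let $s, t \in V$, and let $\mathcal{P}$ be the set of all paths in $G$ from $s$ to $t$. Then
$$d(s, t) = \min_{P \in \mathcal{P}} \sum_{e \in P} -\ln\bigl(1 - p(e)\bigr) = -\ln \max_{P \in \mathcal{P}} \prod_{e \in P} \bigl(1 - p(e)\bigr).$$
Now, assuming that losses on distinct edges are independent, $\prod_{e \in P} (1 - p(e))$ is the probability that a packet is successfully transmitted along path $P$. Thus, maximizing this probability over all paths minimizes both $d(s, t)$ and the cumulative packet error rate between $s$ and $t$, which equals $1 - e^{-d(s, t)}$.
Furthermore, if a packet error rate threshold $p_0$ is given, then by similar reasoning
$$\rho(s, t) \le p_0 \iff d(s, t) \le -\ln(1 - p_0),$$
where $\rho(s, t)$ is the cumulative packet error rate between $s$ and $t$. ∎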
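Transformation (1) and the threshold $T = -\ln(1 - p_0)$ can be checked numerically. The sketch below is illustrative only: the function names and the three-edge network are hypothetical, and, as in the proof, it assumes that losses on different edges are independent. It runs Dijkstra's algorithm on the transformed lengths and converts the distance back into an error rate.

```python
import heapq
import math

def to_additive(error_rate):
    """Transformation (1): w(e) = -ln(1 - p(e)), assuming 0 <= p(e) < 1."""
    return -math.log1p(-error_rate)

def lowest_cumulative_error(error_rates, s, t):
    """Dijkstra on w = -ln(1 - p); returns 1 - exp(-d(s, t)), the best end-to-end error rate."""
    adj = {}
    for (u, v), p in error_rates.items():
        adj.setdefault(u, []).append((v, to_additive(p)))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return 1.0 - math.exp(-dist.get(t, math.inf))  # 1.0 if t is unreachable

rates = {("s", "a"): 0.01, ("a", "t"): 0.02, ("s", "t"): 0.04}
best = lowest_cumulative_error(rates, "s", "t")
print(round(best, 4))  # 0.0298, via s-a-t: 1 - 0.99 * 0.98
p0 = 0.03
T = -math.log1p(-p0)            # threshold on the additive scale
print(-math.log1p(-best) <= T)  # True: s and t still meet the QoS requirement
```

The direct edge has a lower hop count but a higher error rate (0.04), so the two-hop route wins once rates are made additive.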
II-A2 Military communications networks
Next-generation military communications networks will be multilayer, interdependent networks [23, 24, 25] comprising wired fiber-optic and wireless components, including satellite communications. For example, consider the proposed Army Warfighter Information Network-Tactical (WIN-T) network, the theory of operation for which is contained in [24]. WIN-T comprises interdependent wireless and wired components organized into layers; the WIN-T multi-tiered architecture is as follows: (1) the space layer, utilizing military satellite communications (MILSATCOM) and commercial satellite bands, (2) the airborne layer, consisting of unmanned aerial vehicles (UAVs), and (3) the ground layer, which contains many different kinds of nodes. These nodes communicate with each other and with nodes in the other layers in a variety of ways, including wired LANs, wireless WANs, and satellite communications.
To ensure QoS in WIN-T, traffic is only admitted to the WAN when the network infrastructure and congestion state offer a high probability that the traffic can be delivered within the QoS requirements specified in the WIN-T Baseline Requirements Document. Thus, communication failure between a pair of nodes $s, t$ may occur despite the existence of a routing path between $s$ and $t$ in the network, if any of the QoS metrics exceeds a threshold $T$.
Therefore, the T-PCUT problem would identify the nodes most critical to communication between a given pair of nodes $(s, t)$. For example, $s$ could be a commanding node attempting to send an order to infantry unit $t$. If communication between $s$ and $t$ is a high priority, the critical nodes identified by T-PCUT would be especially important to protect against an adversarial attack.
II-A3 Vulnerability assessment on communication networks
Motivated by the above two examples, we present a vulnerability assessment for communication networks in this section. Let $G = (V, E)$ represent a communication network. We fix an additive QoS metric $q$ on the edges of $G$. Since the QoS metric is additive, we define the QoS metric on a path $P$ as
$$q(P) = \sum_{e \in P} q(e).$$
Furthermore, we denote the metric between a pair $(s, t)$ as $q(s, t)$, the shortest-path distance between $s$ and $t$ when the weight of each edge $e$ in the network is $q(e)$. Clearly, no routing path can provide better QoS with respect to $q$ than the $q$-shortest path. Let $T$ be a constant representing the threshold such that if $q(s, t) > T$, then communication between $s$ and $t$ is no longer possible. Notice that since the value of $q$ on each edge is determined by network parameters, it has a minimum value $m > 0$, which is a constant independent of the network size.
Next, we define the problem of identifying the most critical elements of the network with respect to the metric $q$, the threshold $T$, and a given targeted set $S$ of pairs in the network, with respect to $T$-separation.
Problem 3 (Targeted Communication Vulnerability Assessment (TCVA)).
Given a communication network $G = (V, E)$, an additive quality of service metric $q$, a threshold $T$ indicating the highest acceptable value of $q$ for communication between a pair of nodes in $G$, a targeted set $S$ of pairs of nodes, and a cost function $c$ on $V$, determine $A \subseteq V$ of minimum cost such that if $A$ is removed from $G$, then $q(s, t) > T$ for all $(s, t) \in S$.
Notice that TCVA is exactly the T-MULTI-PCUT problem with the edge length function equal to the QoS value on the edge.
II-B Integer programming formulations
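In this section, we formulate the pseudocut problems as integer programs. We state the formulations for the pseudocut versions in which $T$ is an input, but the same formulations apply when $T$ is a constant. Let an instance $(G, c, w, S, T)$ of MULTI-PCUT be given. We consider simple paths $P$, that is, paths containing no cycles. For $(s, t) \in S$, let $P_T(s, t)$ denote the set of simple paths $P$ from $s$ to $t$ that satisfy $w(P) \le T$, where $w(P)$ is the sum of the edge lengths on $P$, and let $P_T = \bigcup_{(s, t) \in S} P_T(s, t)$. If a vertex $v$ lies on path $P$, we write $v \in P$. The following lemma, which is needed for the integer programming formulation, relates the optimal solution of MULTI-PCUT to the minimum-cost hitting set of $P_T$.
Lemma 2.
Let $A^*$ be an optimal solution to an instance of MULTI-PCUT. Let $B^*$ be a minimum cost set of vertices satisfying $B^* \cap P \neq \emptyset$ for all $P \in P_T(s, t)$, for all $(s, t) \in S$. Then, $c(A^*) = c(B^*)$.
Proof.
Since $A^*$ is a solution to the MULTI-PCUT problem, we have $d(s, t) > T$ for all $(s, t) \in S$ after the removal of $A^*$. Any path $P \in P_T(s, t)$ between a pair $(s, t) \in S$ must therefore satisfy $P \cap A^* \neq \emptyset$, for otherwise $d(s, t) \le w(P) \le T$ after the removal. Thus, $c(B^*) \le c(A^*)$.
Conversely, after the removal of $B^*$ from $G$, no simple path of length at most $T$ between a targeted pair survives; since a shortest path with nonnegative edge lengths can be taken to be simple, $d(s, t) > T$ for all $(s, t) \in S$, hence $c(A^*) \le c(B^*)$. ∎
As a consequence of Lemma 2, we can formulate MULTI-PCUT as a covering integer program. Let the vertex set of $G$ be $V = \{v_1, \ldots, v_n\}$. For each $P \in P_T$, let $a_{Pj} = 1$ if vertex $v_j$ lies on path $P$, and $a_{Pj} = 0$ otherwise. Also, let the variable $x_j = 1$ if vertex $v_j$ is chosen into the set of vertices $A$, and $x_j = 0$ otherwise. Finally, denote the cost of choosing vertex $v_j$ as $c_j = c(v_j)$, and let $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{c} = (c_1, \ldots, c_n)$. Then, the covering integer program formulation is as follows.
Integer Program 1 (IP 1).
$$\begin{aligned} \min \quad & \mathbf{c}^{\top} \mathbf{x} \\ \text{s.t.} \quad & \sum_{j=1}^{n} a_{Pj}\, x_j \ge 1 \quad \forall P \in P_T, && (2) \\ & x_j \in \{0, 1\} \quad j = 1, \ldots, n. && (3) \end{aligned}$$
The constraints (2) ensure that for each path $P \in P_T$, we choose at least one node $v_j \in P$. By Lemma 2, the optimal solution to IP 1 corresponds to an optimal solution of MULTI-PCUT. The linear relaxation of IP 1 is designated LP 1, in which each constraint (3) is replaced by $0 \le x_j \le 1$. Finally, we remark that since PCUT is a special case of MULTI-PCUT, IP 1 and all solutions we discuss apply to PCUT as well.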
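For intuition, IP 1 can be built and solved exhaustively on a toy instance. This Python sketch is an illustration under stated assumptions, not the paper's implementation: the hypothetical `simple_paths_within` enumerates $P_T(s, t)$ by depth-first search (assuming nonnegative lengths), `build_ip1` forms the 0/1 rows $a_P$, and `solve_ip1_brute_force` minimizes $\mathbf{c}^\top\mathbf{x}$ over all 0/1 vectors, which is only practical for a handful of vertices. By Lemma 2, its optimum equals the MULTI-PCUT optimum.

```python
from itertools import combinations

def simple_paths_within(edges, s, t, T):
    """All simple s-t paths (vertex tuples) of total length <= T; assumes nonnegative lengths."""
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, []).append((v, length))
    found, stack = [], [(s, (s,), 0.0)]
    while stack:
        u, path, length = stack.pop()
        if u == t:
            found.append(path)
            continue
        for v, w in adj.get(u, []):
            if v not in path and length + w <= T:
                stack.append((v, path + (v,), length + w))
    return found

def build_ip1(edges, pairs, T):
    """Return the vertex order and one 0/1 row a_P per path P in P_T."""
    vertices = sorted({x for e in edges for x in e})
    paths = [P for s, t in pairs for P in simple_paths_within(edges, s, t, T)]
    rows = [[1 if v in P else 0 for v in vertices] for P in paths]
    return vertices, rows

def solve_ip1_brute_force(vertices, rows, cost):
    """Minimize c^T x subject to sum_j a_Pj x_j >= 1 for every row, x binary (tiny inputs only)."""
    best, best_cost = None, float("inf")
    for size in range(len(vertices) + 1):
        for chosen in combinations(range(len(vertices)), size):
            if all(any(row[j] for j in chosen) for row in rows):
                total = sum(cost[vertices[j]] for j in chosen)
                if total < best_cost:
                    best, best_cost = {vertices[j] for j in chosen}, total
    return best, best_cost

# Hypothetical instance: one targeted pair, T = 3, expensive endpoints.
edges = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 2, ("s", "t"): 5}
cost = {"s": 10, "t": 10, "a": 1, "b": 2}
vertices, rows = build_ip1(edges, [("s", "t")], T=3)
best, best_cost = solve_ip1_brute_force(vertices, rows, cost)
print(sorted(best), best_cost)  # ['a', 'b'] 3
```

The direct edge of length 5 never enters $P_T$, so it needs no cover; hitting both short paths through `a` and `b` is cheaper than deleting an endpoint.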
II-B1 Discussion
Notice that if we let $T$ become large enough (for example, $T \ge \sum_{e \in E} w(e)$), the classical problems CUT and MULTI-CUT are recovered from PCUT and MULTI-PCUT.
If $T$ is an input, IP 1 above is superpolynomial in size, since there could be exponentially many constraints (2). The analogous integer program for MULTI-CUT can also have exponentially many constraints, but it has a polynomial-time separation oracle that enables its linear relaxation to be solved in polynomial time by the ellipsoid method [6]. This separation oracle does not work for the linear relaxation of IP 1, which in general may not be solvable in polynomial time. However, the IP formulations above also hold when $T$ is a constant; when, in addition, edge lengths are bounded below by a positive constant (as assumed in Section IV), IP 1 is polynomial in size for T-MULTI-PCUT.
Finally, notice that not all instances of PCUT admit a valid solution; suppose the input is a graph consisting of a single edge $(s, t)$ with $w(s, t) \le T$. PCUT is formulated to disallow choosing $s$ or $t$; hence, there is no solution. Whether a feasible solution exists can easily be detected in polynomial time (remove every vertex other than $s$ and $t$ and check whether $d(s, t) > T$), so unless otherwise stated, we assume that a feasible problem instance is given in our analysis.
III Computational complexity
In this section, we present our results on the computational complexity of the pseudocut problems.
III-A T-PCUT
We give polynomial-time algorithms for certain cases of T-PCUT with uniform lengths. However, T-PCUT with arbitrary edge lengths and uniform vertex costs is shown to be NP-hard.
Proposition 1.
For $T \le 3$, T-PCUT with uniform lengths and costs is solvable in polynomial time.
Proof.
Let $(G, c, w, s, t)$ be an instance of T-PCUT. First consider the case $T = 2$. Since edge lengths are uniform, every path of length 2 from $s$ to $t$ has exactly three vertices: $s, v, t$ for some $v$. Therefore, no two such paths share an intermediate vertex unless they are identical. So, to ensure $d(s, t) > 2$, one must simply remove all intermediate vertices $v$ of paths of length 2 between $s$ and $t$.
Next, suppose $T = 3$. Let $P_3$ be a path of length 3 from $s$ to $t$, and let $P_2$ be a path of length 2 that intersects $P_3$. In order to satisfy $d(s, t) > 3$, $P_2$ must be broken, which can happen in only one way (removing its intermediate vertex) and necessarily breaks $P_3$ as well. Hence, in the first step we break all paths of length 2 in the same way as for the case $T = 2$, and denote the modified graph by $G'$. The remaining paths of length 3 do not intersect paths of length 2. Two distinct paths of length 3 can share at most one intermediate vertex. Let $X$ be the set of all nodes that appear as the second node (after $s$) on a path of length 3; similarly, let $Y$ be the set of nodes appearing as the third node on a path of length 3. Notice that $X \cap Y = \emptyset$, because otherwise a path of length 2 would still be extant in $G'$, but all such paths were removed in the first step.
Thus, the relevant subgraph has the form exemplified in Fig. 2. Notice that an edge $(x, x')$ with $x, x' \in X$ is irrelevant to the solution, as the only way to create a path of length 3 using $(x, x')$ would be to add the edge $(x', t)$ as well; but this creates the path $s, x', t$, which is of length 2, so $x'$ would have been chosen in the first step. If we delete $s$ and $t$ from $G'$, we see that the problem reduces to a vertex cover problem on the bipartite graph with parts $X$ and $Y$, which is solvable in polynomial time; the second step consists of computing an optimal solution to this problem. The final solution is the union of the vertices chosen in the first and second steps. ∎
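The two steps of the proof translate into a short procedure. The Python sketch below is one possible rendering, with its assumptions stated: vertices are hashable labels, `adj` (a hypothetical input format) maps each vertex to its set of out-neighbours, $T$ is an integer with $T \le 3$, and there is no edge $(s, t)$ (otherwise the instance is infeasible). The bipartite vertex cover is obtained from a maximum matching via König's theorem, a standard technique that the proof itself does not prescribe.

```python
from collections import deque

def pcut_small_T(adj, s, t, T):
    """Optimal vertex T-PCUT for integer T <= 3 with uniform lengths and costs.
    adj maps each vertex to the set of its out-neighbours; assumes no edge (s, t)."""
    if T < 2:
        return set()  # only a direct edge could have length <= 1
    succ_s = set(adj.get(s, ())) - {s, t}
    pred_t = {u for u, outs in adj.items() if t in outs} - {s, t}
    # Step 1: each length-2 path s -> v -> t can only be broken at its middle vertex v.
    step1 = succ_s & pred_t
    if T == 2:
        return step1
    # Step 2 (T = 3): surviving length-3 paths s -> x -> y -> t form a bipartite graph X-Y.
    X, Y = succ_s - step1, pred_t - step1
    bip = {x: [y for y in adj.get(x, ()) if y in Y] for x in X}
    match_y = {}  # y -> x; a maximum matching built by augmenting paths (Kuhn's algorithm)

    def augment(x, seen):
        for y in bip[x]:
            if y not in seen:
                seen.add(y)
                if y not in match_y or augment(match_y[y], seen):
                    match_y[y] = x
                    return True
        return False

    for x in X:
        augment(x, set())
    # Koenig: alternate from unmatched X vertices; cover = (X not reached) + (Y reached).
    reach_x = X - set(match_y.values())
    reach_y = set()
    queue = deque(reach_x)
    while queue:
        x = queue.popleft()
        for y in bip[x]:
            if y not in reach_y:
                reach_y.add(y)
                mate = match_y.get(y)
                if mate is not None and mate not in reach_x:
                    reach_x.add(mate)
                    queue.append(mate)
    return step1 | (X - reach_x) | reach_y

adj = {"s": {"1", "2", "3"}, "1": {"t"}, "2": {"5", "6"}, "3": {"6"}, "5": {"t"}, "6": {"t"}}
print(sorted(pcut_small_T(adj, "s", "t", 3)))  # ['1', '2', '3']
```

In this example, vertex `1` is forced by step 1, and the bipartite graph on $X = \{2, 3\}$, $Y = \{5, 6\}$ has a perfect matching, so König's construction returns all of $X$ as the cover.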
Proposition 2.
Let $D$ be a constant, and let $(G, c, w, s, t)$ be an instance of T-PCUT for some constant $T$ with uniform lengths and uniform costs. If the maximum degree $\Delta$ of $G$ satisfies $\Delta \le D$, then the optimal solution is computable in polynomial time.
Proof.
Consider all distinct paths of length at most $T$ starting at $s$ and ending at $t$. The number of distinct vertices on these paths is $O(D^T)$; let us call this set $R$. Vertices outside $R$ lie on no such path, so an optimal solution can be assumed to be a subset of $R$. The number of possible subsets of $R$ is a constant bounded by $2^{O(D^T)}$, and since each subset can be checked in polynomial time, the optimal solution can be found by checking each possible subset of $R$. ∎
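The proof of Proposition 2 amounts to an exhaustive search over a small candidate set. In the hypothetical Python sketch below, the candidates are computed as $\{v : d(s, v) + d(v, t) \le T\}$ with two breadth-first searches (one on the reversed graph); this is a superset of $R$ that still lies within distance $T$ of $s$, so its size stays $O(D^T)$ under the degree bound, and vertices outside it can never help. Subsets are tried in order of increasing size, so the first feasible one is optimal for uniform costs.

```python
from collections import deque
from itertools import combinations

def bfs(adj, source, removed=frozenset()):
    """Hop distances from source in a directed graph, skipping removed vertices."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def exact_pcut_bounded_degree(adj, s, t, T):
    """Exhaustive T-PCUT (uniform lengths and costs) over candidates with d(s,v) + d(v,t) <= T."""
    rev = {}
    for u, outs in adj.items():
        for v in outs:
            rev.setdefault(v, set()).add(u)
    from_s, to_t = bfs(adj, s), bfs(rev, t)
    R = sorted(v for v in from_s
               if v in to_t and from_s[v] + to_t[v] <= T and v not in (s, t))
    for size in range(len(R) + 1):
        for A in combinations(R, size):
            if bfs(adj, s, frozenset(A)).get(t, float("inf")) > T:
                return set(A)
    return None  # infeasible instance

adj = {"s": {"a", "b", "e"}, "a": {"t"}, "b": {"c"}, "c": {"t"}, "e": {"f"}, "f": {"t"}}
print(exact_pcut_bounded_degree(adj, "s", "t", 2))  # {'a'}
```

For $T = 2$ only `a` qualifies as a candidate, so the search is a single subset test; for $T = 3$ all five inner vertices qualify and the optimum has size 3.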
Theorem 1.
Consider the decision version of 1-PCUT with uniform costs and arbitrary lengths; that is, given a problem instance $(G, c, w, s, t)$ of 1-PCUT with uniform costs and arbitrary lengths, and given a constant $k$, determine whether a solution $A$ exists with $c(A) \le k$. This problem is NP-complete.
Proof.
For clarity, we first prove the theorem for the edge version of 1-PCUT (where edges have both cost and length functions) with arbitrary edge costs; next, we discuss how to modify the proof for the uniform cost function and the vertex version of PCUT. The decision problem is clearly in NP. To show NP-hardness, we first reduce the Knapsack problem to an instance of Pseudocut with non-uniform costs; then we discuss how to modify the reduction for uniform costs. A problem instance of Knapsack is specified as follows: a set of objects with sizes and profits, a "knapsack capacity", and a desired profit. The decision version of the problem is to find a subset of objects with total profit at least the desired profit and total size bounded by the knapsack capacity.
Given a Knapsack instance, we construct an instance of the pseudocut problem in the following way. For each item , we add nodes and edges , , and . We also set the following cost and values: , , , , , and . Fig 3 illustrates this construction.
Then, letting , , we have an instance of the 1-PCUT, the decision version of which is whether there exists a set of edges of total cost at most such that . Notice that including edge into a solution incurs cost and adds to . Furthermore, edges and will not be chosen since these edges have infinite cost. So choosing edge exactly corresponds to adding item into the knapsack, and solutions to the Knapsack instance and the Pseudocut instance are in one-to-one correspondence, with corresponding solutions having the same cost. Also, iff the corresponding solution to the Knapsack problem has profit at least .
Modification for vertex version: To obtain the NP-hardness of the uniform cost vertex 1-PCUT problem, we discuss how to modify the above reduction. The first modification is to replace each vertex in the construction with a clique of vertices. Edges and are replaced by edges matching clique with and with , respectively. Instead of a single edge we add vertices between and , connecting each vertex in cliques to each . Distinct nodes are added and is connected to each vertex in first clique , and to each node in clique . Thus, in order to add to the distance , it is necessary to pick all vertices . ∎
III-B T-MULTI-PCUT
In this section, we show that uniform length and cost T-MULTI-PCUT is inapproximable within a factor of 1.3606 unless $P = NP$.
Theorem 2.
Let $T \ge 1$. Consider the decision version of T-MULTI-PCUT with uniform lengths and costs; that is, given a problem instance $(G, c, w, S)$ of T-MULTI-PCUT with uniform lengths and costs, and a constant $k$, determine whether a solution $A$ exists with $c(A) \le k$. This problem is NP-complete.
Proof.
The feasibility of a solution $A$ satisfying $c(A) \le k$ can easily be checked in polynomial time, so T-MULTI-PCUT $\in$ NP. We give an approximation-preserving reduction [6] from the vertex cover problem to T-MULTI-PCUT. Let $H$ be an instance of the vertex cover problem, and let the vertex set of $H$ be $V$. An instance of T-MULTI-PCUT is constructed as follows. Let $G$ be the complete graph on $V$, and let the target set $S$ be the edge set of $H$.
Then, there is a natural one-to-one, cost-preserving correspondence between solutions of the two instances, namely the identity mapping. If $C$ is a vertex cover of size $k$, then $C$ is also a feasible solution of size $k$ to the T-MULTI-PCUT instance, since $(u, v) \in S$ implies $(u, v) \in E(H)$, which implies $u \in C$ or $v \in C$ since $C$ is a vertex cover, which finally implies $d(u, v) = \infty > T$ in $G$ after the removal of $C$ (by the convention discussed in Section II). Conversely, if $A$ is a solution to T-MULTI-PCUT, then for each $(u, v) \in S$, $d(u, v) > T$ after the removal of $A$. Since the edge $(u, v)$ is in $G$ and $T \ge 1$, $u$ or $v$ is in $A$, so $A$ is a vertex cover. ∎
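The identity correspondence in this proof can be checked exhaustively on a small graph. The Python sketch below is illustrative, with hypothetical names: `reduce_vertex_cover` builds the complete directed graph on $V(H)$ with unit lengths and target set $S = E(H)$, and the loop confirms, for every candidate set $C$ on a 4-cycle $H$ with $T = 1$, that $C$ is a vertex cover exactly when it is a feasible T-MULTI-PCUT solution.

```python
from collections import deque
from itertools import combinations

def hop_distance(edges, s, t, removed):
    """BFS hop distance from s to t after deleting `removed` (infinite if s or t is deleted)."""
    if s in removed or t in removed:
        return float("inf")
    adj = {}
    for u, v in edges:
        if u not in removed and v not in removed:
            adj.setdefault(u, []).append(v)
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist.get(t, float("inf"))

def reduce_vertex_cover(vertices, h_edges):
    """Theorem 2 construction: complete directed graph on V(H); target pairs S = E(H)."""
    g_edges = [(u, v) for u in vertices for v in vertices if u != v]
    return g_edges, list(h_edges)

V = ["1", "2", "3", "4"]
H = [("1", "2"), ("2", "3"), ("3", "4"), ("4", "1")]  # hypothetical H: a 4-cycle
G, S = reduce_vertex_cover(V, H)
T = 1
feasible_sizes = []
for size in range(len(V) + 1):
    for C in combinations(V, size):
        removed = set(C)
        is_cover = all(u in removed or v in removed for u, v in H)
        is_pcut = all(hop_distance(G, u, v, removed) > T for u, v in S)
        assert is_cover == is_pcut  # the identity map preserves feasibility
        if is_pcut:
            feasible_sizes.append(size)
print(min(feasible_sizes))  # 2, the minimum vertex cover of a 4-cycle
```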
Corollary 1.
Unless $P = NP$, there is no polynomial-time approximation algorithm for uniform length and cost T-MULTI-PCUT within a factor of 1.3606, for $T \ge 1$.
Proof.
This corollary follows from the proof of Theorem 2 and the inapproximability of vertex cover [26]. ∎
IV Approximation algorithms
In this section, we present three approximation algorithms for arbitrary vertex cost T-MULTI-PCUT when the length function on edges is bounded below: $w(e) \ge m$ for some constant $m > 0$. In this case, we call the edge lengths bounded. Recall from Section II-A that edge lengths are bounded when the edge length function is an additive QoS metric. For the case of bounded edge lengths, we let the constant $l = \lfloor T/m \rfloor$. If the length function is uniform, then of course $l = T$. For bounded edge length, arbitrary vertex cost T-MULTI-PCUT, we present GEN, an $O(\log n)$-approximation algorithm, and FEN, an $(l+1)$-approximation algorithm, in Section IV-A. Although these algorithms run in polynomial time since $l$ is constant, their running time may suffer if $l$ is large for some application. Hence, we also present a randomized algorithm with a probabilistic performance guarantee in Section IV-B, capable of running efficiently even for large $l$.
IV-A Approximations for T-MULTI-PCUT
First, we present two approximation algorithms for the constant-$T$ problems T-PCUT and T-MULTI-PCUT, based upon Lemma 2 and IP 1, when edge lengths have a lower bound $m$. The idea is as follows: for each path $P$ of vertices between a pair of the target set $S$ with $w(P) \le T$, we must select at least one node belonging to the path into the solution. Thus, we formulate the problem in a covering framework, where each node covers a subset of paths. Both algorithms require the following enumeration of paths.
IV-A1 Path enumeration
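This enumeration can be accomplished in polynomial time in the following way. Let $l = \lfloor T/m \rfloor$; then each path in $P_T$ has at most $l$ edges, and hence at most $l + 1$ nodes. Thus, we may iterate through all sequences of nodes of length at most $l + 1$ and test whether the path produced is in $P_T$; that is, for some $(s, t) \in S$, the path must start at $s$, terminate at $t$, and satisfy $w(P) \le T$. Since there are $O(n^{l+1})$ such sequences and each test takes polynomial time, this procedure runs in polynomial time for constant $l$. Using these paths, we can construct the matrices in IP 1.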
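The enumeration can be sketched in Python as follows. Instead of iterating over all vertex sequences of length at most $l + 1$, this hypothetical `enumerate_P_T` helper extends paths by depth-first search and prunes as soon as the length budget $T$ is exceeded; it returns the same set $P_T$ of simple paths. The input check encodes the bounded-length assumption $w(e) \ge m > 0$.

```python
import math

def enumerate_P_T(edges, pairs, T, m):
    """Enumerate P_T: simple paths between target pairs whose total length is at most T.
    Assumes every edge length is at least m > 0, so a path in P_T has at most
    l = floor(T/m) edges and hence at most l + 1 vertices."""
    if m <= 0 or any(w < m for w in edges.values()):
        raise ValueError("edge lengths must be bounded below by m > 0")
    l = math.floor(T / m)
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    paths = []
    for s, t in pairs:
        stack = [(s, (s,), 0.0)]
        while stack:
            u, path, length = stack.pop()
            if u == t:
                paths.append(path)
                continue
            if len(path) - 1 == l:  # already l edges: no extension fits the budget
                continue
            for v, w in adj.get(u, []):
                if v not in path and length + w <= T:
                    stack.append((v, path + (v,), length + w))
    return l, paths

edges = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.5, ("b", "t"): 1.5, ("a", "b"): 1.0}
l, paths = enumerate_P_T(edges, [("s", "t")], T=3.0, m=1.0)
print(l, sorted(paths))  # 3 [('s', 'a', 't'), ('s', 'b', 't')]
```

The path `s, a, b, t` has length 3.5 and is pruned. Depth-first pruning usually visits far fewer than $O(n^{l+1})$ sequences, but the worst case is the same.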
IV-A2 -approximation
The first approximation algorithm for MULTI-PCUT is given in Alg. 1. The general approach is as follows. After the enumeration of all paths in , the algorithm greedily selects the node that intersects the largest number of paths normalized by the vertex cost until all paths in have been covered. By the proof of Lemma 2, when all such paths in are covered, we have a feasible solution .
An explicit description of the algorithm is given in Alg. 1. In lines 1–3, the enumeration described above is performed. Next, the algorithm initializes , the set of vertices chosen, and , the set of paths covered by , to in line 4. The while loop on line 5 tests whether any paths satisfying still exist in the network. If so, on line 11 it adds to the set the node that covers the most such remaining paths (normalized by vertex cost), and on line 12 it updates accordingly.
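The greedy rule amounts to weighted set cover over the enumerated paths. The Python sketch below assumes each path is supplied as the collection of vertices that may be removed from it (whether the terminals themselves are eligible is not stated in this excerpt, so the caller decides) and that all vertex costs are positive. The function name is hypothetical; this illustrates the greedy rule rather than reproducing the authors' implementation.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List


def greedy_path_cover(paths: Iterable[Iterable[Hashable]],
                      cost: Dict[Hashable, float]) -> List[Hashable]:
    """Repeatedly pick the vertex that hits the most uncovered paths per unit cost."""
    uncovered: List[FrozenSet[Hashable]] = [frozenset(p) for p in paths]
    if any(not p for p in uncovered):
        raise ValueError("a path with no removable vertex can never be covered")
    chosen: List[Hashable] = []
    while uncovered:
        # Count, for every vertex, how many still-uncovered paths it lies on.
        hits: Dict[Hashable, int] = {}
        for p in uncovered:
            for v in p:
                hits[v] = hits.get(v, 0) + 1
        # Best coverage-to-cost ratio; max() keeps the first vertex counted on ties.
        best = max(hits, key=lambda v: hits[v] / cost[v])
        chosen.append(best)
        # Every path through the chosen vertex is now covered (broken).
        uncovered = [p for p in uncovered if best not in p]
    return chosen
```

With costs x = 10 and y = z = 1, the paths {x, y} and {x, z} are covered by {y, z} at cost 2 rather than by x at cost 10, which is the effect of normalizing by cost.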
Theorem 3.
Alg. 1 achieves a performance guarantee of with respect to the optimal solution with running time bounded by . Furthermore, for each , there exists an instance of the single pair PCUT problem where Alg. 1 returns a solution of cost greater than a factor of the optimal.
Proof.
The performance ratio of follows from the fact that IP 1 is a covering integer program corresponding to the set cover problem with at most elements (the paths) for which the greedy algorithm has the ratio [6].
Next, we construct a tight example for Alg. 1, which holds even in the case of single-pair T-PCUT, for . At the beginning of the construction, contains two isolated nodes, . Add nodes and edges for each . Next, add nodes to the graph, along with edges . Then, for each , add disjoint paths of length 2 between and , and similar paths between and . Let for all edges in . For , see Fig. 4 in the Appendix for a depiction of the construction. Then Alg. 1 will select nodes in that order, while the optimal solution is . ∎
IV-A3 -approximation
Next, we present FEN in Alg. 2, a frequency-based rounding algorithm for LP 1. FEN first enumerates and constructs LP 1. In this covering program, each path intersects at most nodes, as discussed above. The algorithm next solves LP 1 to obtain an optimal fractional solution . Then, an integral solution is obtained by rounding
[TABLE]
The feasibility of follows from the fact that, for each and , constraint holds, so at least one term in the sum must satisfy , since the sum has at most nonzero terms. Furthermore, the optimal fractional solution costs at most as much as the optimal integral solution, and the cost of is within a factor of of ; hence FEN is an -approximation algorithm.
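The rounding step can be sketched in Python as follows. It assumes a fractional solution of LP 1 has already been obtained from some LP solver (solving the LP is outside the scope of the sketch) and takes the frequency f to be the largest number of vertices on any enumerated path. The function name and the round-off tolerance `eps` are illustrative assumptions.

```python
from typing import Dict, Hashable, Sequence, Set


def frequency_round(x: Dict[Hashable, float],
                    paths: Sequence[Set[Hashable]],
                    eps: float = 1e-9) -> Set[Hashable]:
    """Keep every vertex whose fractional value is at least 1/f.

    Each covering constraint sum_{v in p} x[v] >= 1 has at most f terms, so
    some term is >= 1/f and every path keeps a selected vertex. Raising each
    kept x[v] to 1 multiplies its cost contribution by at most f.
    """
    f = max(len(p) for p in paths)  # frequency: most vertices on any one path
    # eps absorbs floating-point round-off from the LP solver
    return {v for v, value in x.items() if value >= 1.0 / f - eps}
```

With unit costs, the feasible fractional point a = b = c = 0.5 for paths {a, b} and {b, c} rounds to all three vertices; the resulting cost of 3 meets the bound f times the fractional cost (2 times 1.5) with equality.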
IV-B Probabilistic approximation algorithm
In this section, we propose another approximation algorithm for T-PCUT and T-MULTI-PCUT when the length function is bounded below. This algorithm, GEST, is intended to handle large values of more easily than the algorithms in the preceding section. The key to GEST is a procedure that efficiently estimates the number of paths between of length at most that each vertex lies upon; these estimates guide the greedy selection of nodes. Our theoretical analysis shows that GEST is not only efficient but also has a probabilistic performance guarantee.
IV-B1 Algorithm overview and key results
The GEST algorithm is detailed in Alg. 3. As an overview, GEST iteratively selects nodes for removal based upon its estimation procedure, until the distance between all pairs exceeds . Define as the number of paths in , that intersects, respectively and , as corresponding estimators. From the definition, we have and . In each iteration of GEST, the node that maximizes will be added to , the set of selected nodes. The details of the estimator and the path sampling method are discussed in Sections IV-B2 and IV-B3, respectively.
In the following, we will prove Theorem 4, which establishes the key results on the probabilistic approximation ratio and time complexity of GEST. Before the proof, we introduce Lemma 3 on the number of samples for each pair to guarantee the accuracy of . The proof of Lemma 3 is provided in Section IV-B4. The parameter in can be used to balance running time and accuracy of the algorithm.
Lemma 3.
Let the number of paths sampled for each be at least . Then, given a set and as the maximum degree in , the inequality holds with probability at least .
Theorem 4.
Given an instance of uniform vertex cost T-MULTI-PCUT whose length function is bounded below, let be the maximum degree in . With probability at least , Alg. 3 returns a feasible solution with cost within ratio of optimal. The running time of Alg. 3 is .
Proof.
Let ; then for any , observe that
[TABLE]
We will apply Lemma 3 and assume that the inequality therein always holds; later, we bound the probability that it fails for some application. Let and apply Lemma 3. By (5), we have:
[TABLE]
Observe that Alg. 3 at each iteration picks such that . Let be the choice of Alg. 3 after iterations, and let be the final solution returned by the algorithm. Let be the size of an optimal solution satisfying , where is the number of paths in ; notice that is determined in Alg. 3 by testing if all pairs in satisfy after removal of . Then
[TABLE]
Therefore, Then
[TABLE]
From here, there exists an such that the following differences satisfy
[TABLE]
Thus, by inequalities (8) and (9), and By inequality (10) and the assumption on the termination of the algorithm, the greedy algorithm adds at most more elements, so In Alg. 3, we require the guarantee from Lemma 3 for all nodes for all iterations, which can happen times in the worst case. Therefore, by union bound, the probability of having the desired approximation ratio is at least . The running time follows from the choice of . Alg. 3 needs to sample sets of samples per iteration and in the worst case, there can be iterations. ∎
IV-B2 The estimators
Let , and let be the set of all paths between satisfying the distance constraint and additionally containing vertex . We want to efficiently estimate the quantity for all . To achieve this estimation, we adapt the approach of Roberts et al. [27]; their estimators are for the total number of simple paths in a graph, while we require an estimate of the number of simple paths each vertex lies upon, where the length of each path is restricted to be at most .
To define an estimator , we proceed in the following way. Let be any simple path between and ; we will define a probability distribution on paths satisfying if ; the distribution is defined in Section IV-B3 and will have domain , a set of simple paths starting from . We will then independently sample paths from and define the estimator
[TABLE]
where is an indicator random variable that takes value if and , and [math] otherwise.
Lemma 4.
The estimator defined above is an unbiased estimator of .
Proof.
Let be the random variable
[TABLE]
for . Then the expectation of is
[TABLE]
From here, the lemma follows by linearity of expectation, since the estimator averages independent copies of this random variable. ∎
IV-B3 Definition of and path sampling
Next, we define the probability distribution on , the set of all simple paths starting from and ending at or ending at another vertex and is maximal; that is, adding any vertex to creates a cycle or causes the length of the path to exceed . We define the probability of a path sequentially: Notice that since is always chosen as the starting vertex. Furthermore, is a uniform distribution over the vertices available to be chosen as the next vertex of the path; that is, does not create a cycle and .
The definition of lends itself to the following sequential sampling algorithm, shown in Alg. 4. In line 1, the algorithm chooses with probability . Let be the set of neighbors of not previously chosen into the path . If or , the algorithm terminates. Otherwise, is chosen from uniformly with probability , and the value of is updated accordingly.
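The Python sketch below pairs the sequential sampler described above with an importance-sampling estimate of the per-vertex path counts. It assumes the estimator has the standard form used by Roberts et al.: each sampled path contributes 1/g(P) to every vertex on it when it is a valid s-t path of length at most T, and 0 otherwise, averaged over all samples (the exact formula is not legible in this copy). Because g(P) is a product of 1/(number of candidates) over the steps, 1/g(P) is tracked as an integer product. Function names and the adjacency-dict representation are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # u -> {v: length of edge (u, v)}


def sample_path(graph: Graph, s: Hashable, t: Hashable, T: float,
                rng: random.Random) -> Tuple[List[Hashable], int, bool]:
    """Grow a simple path from s one uniformly chosen vertex at a time.

    Returns (path, inv_prob, reached_t), where inv_prob = 1 / g(path) is the
    product of the candidate-set sizes seen along the way.
    """
    path, on_path = [s], {s}
    length, inv_prob, u = 0.0, 1, s
    while u != t:
        # Candidates: unvisited neighbors that keep the length within T.
        candidates = [v for v, w in graph.get(u, {}).items()
                      if v not in on_path and length + w <= T]
        if not candidates:
            return path, inv_prob, False  # maximal path that missed t
        inv_prob *= len(candidates)
        v = rng.choice(candidates)
        length += graph[u][v]
        path.append(v)
        on_path.add(v)
        u = v
    return path, inv_prob, True


def estimate_path_counts(graph: Graph, s: Hashable, t: Hashable, T: float,
                         n_samples: int, seed: int = 0) -> Dict[Hashable, float]:
    """Unbiased estimate, per vertex, of the number of valid s-t paths through it."""
    rng = random.Random(seed)
    totals: Dict[Hashable, float] = {}
    for _ in range(n_samples):
        path, inv_prob, reached_t = sample_path(graph, s, t, T, rng)
        if reached_t:  # samples that miss t contribute 0 to every vertex
            for v in path:
                totals[v] = totals.get(v, 0.0) + inv_prob
    return {v: total / n_samples for v, total in totals.items()}
```

Every valid path has positive probability under this sampler, since each of its next vertices is always among the candidates; this is what makes the weighted average unbiased. On a four-vertex graph where s and t are both adjacent to a and b, and a and b are adjacent, with unit lengths and T = 3, all four valid paths are drawn with probability 1/4, so the estimate for s is exactly 4 and the estimates for a and b concentrate around 3.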
IV-B4 Bound on number of samples required
In this section, we prove Lemma 3, which bounds how many path samples are required to ensure . To this end, we require Hoeffding’s inequality.
Theorem (Hoeffding’s inequality).
Suppose are independent random variables in . Let . Then the probability
Proof for Lemma 3.
Consider , where is the random variable defined in (12). Let , which is the maximum value of , and . Next, we require the probability bound from Hoeffding’s inequality to be less than . Solving for the number of samples yields Therefore, when the number of samples is at least , we can guarantee for one pair with probability . Then, the inequality holds for all with probability by union bound. Since and are the summations, is at most when all the inequalities hold. ∎
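The sample-size calculation inverts Hoeffding's bound. In its standard two-sided form, for $N$ independent samples in $[a, b]$ with empirical mean $\bar{X}$, $\Pr[|\bar{X} - \mathbb{E}\bar{X}| \ge \epsilon] \le 2\exp(-2N\epsilon^2/(b-a)^2)$; requiring the right-hand side to be at most $\delta$ gives $N \ge (b-a)^2 \ln(2/\delta)/(2\epsilon^2)$. The helper below is a Python sketch of that calculation. The range, tolerance, and failure probability actually used in Lemma 3 are not legible in this copy, so all three are parameters, and the function name is hypothetical.

```python
import math


def hoeffding_sample_size(value_range: float, tolerance: float,
                          failure_prob: float) -> int:
    """Smallest N with 2 * exp(-2 N tolerance^2 / value_range^2) <= failure_prob.

    value_range is b - a for samples confined to [a, b]; for a nonnegative
    per-sample variable this is simply its largest possible value.
    """
    if value_range <= 0 or tolerance <= 0 or not 0 < failure_prob < 1:
        raise ValueError("need value_range > 0, tolerance > 0 and 0 < failure_prob < 1")
    return math.ceil(value_range ** 2 * math.log(2.0 / failure_prob)
                     / (2.0 * tolerance ** 2))
```

To make the guarantee hold simultaneously for k events through a union bound, as in the proof above, one can pass `failure_prob / k`.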
IV-B5 Further modification to GEST
In this section, we discuss a simple modification to GEST; this modification, GESTA, improves performance for the T-MULTI-PCUT problem.
GESTA: In practice, valid path samples in become harder to obtain as GEST progresses toward a solution to T-MULTI-PCUT, because most valid paths originally in the network have already been broken. Therefore, we propose GESTA, a modification to Alg. 3 as follows: if GESTA performs samples, as in line 5 of GEST, and obtains no valid paths in for any , then GESTA computes a shortest path between a randomly chosen pair in for which . The algorithm then adds the cheapest node on this path to its solution and continues with the while loop on line 2 of GEST.
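A minimal Python sketch of this fallback step follows, assuming positive edge lengths (so Dijkstra's algorithm applies) and assuming that only interior vertices of the path are eligible for removal; the text above says only that the cheapest node on the path is chosen. All names, including the `removed` set of already-selected vertices, are hypothetical.

```python
import heapq
import math
import random
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # u -> {v: length of edge (u, v)}


def shortest_path(graph: Graph, s: Hashable, t: Hashable,
                  removed: Set[Hashable]) -> Tuple[float, Optional[List[Hashable]]]:
    """Dijkstra's algorithm from s to t, skipping vertices in `removed`."""
    if s in removed or t in removed:
        return math.inf, None
    dist: Dict[Hashable, float] = {s: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    heap = [(0.0, 0, s)]
    tie = 1  # insertion counter so the heap never compares vertices directly
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry superseded by a shorter distance
        if u == t:
            path = [t]
            while path[-1] != s:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if v not in removed and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, tie, v))
                tie += 1
    return math.inf, None


def gesta_fallback_vertex(graph: Graph, pairs: Sequence[Tuple[Hashable, Hashable]],
                          T: float, cost: Dict[Hashable, float],
                          removed: Set[Hashable], rng: random.Random) -> Optional[Hashable]:
    """Cheapest interior vertex on a shortest path of a random unseparated pair.

    Returns None when every pair already has distance greater than T.
    """
    short_paths = []
    for s, t in pairs:
        d, path = shortest_path(graph, s, t, removed)
        if d <= T:  # pair not yet pseudo-separated
            short_paths.append(path)
    if not short_paths:
        return None
    path = rng.choice(short_paths)
    interior = path[1:-1]
    if not interior:
        raise ValueError("pair is joined directly; no interior vertex to remove")
    return min(interior, key=lambda v: cost[v])
```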
V Experimental evaluation
In this section, we experimentally evaluate our proposed algorithms on the QoS vulnerability assessment TCVA. Section V-A describes the methodology of our evaluation, and Section V-B presents the results.
V-A Datasets and methodology
Synthesized datasets: To generate topologies, we used the well-known Internet topology generator BRITE [28], which we employed to generate (1) Flat Router-Level (RL) only, (2) Flat Autonomous System level (AS) only, and (3) hierarchical top-down datasets, consisting of AS and RL, with each AS divided into routers. We also used topologies generated according to Erdős–Rényi (ER) random graphs. To simulate a QoS metric, edge weights were drawn uniformly at random from the interval , following [29, 14]. The dataset statistics are as follows: ER1, an ER graph with , ; RL1, a router-level graph with , , generated by BRITE with default parameters and the Waxman model; RL2, same as RL1 except ; RL3, same as RL1 except , ; AS1, an AS-level graph generated by BRITE with default parameters and ; and finally, H1, a hierarchical BRITE top-down graph with 200 autonomous systems and 100 routers per AS, with .
Algorithms for TCVA: For TCVA, we compared the following algorithms with GEN (Alg. 1), FEN (Alg. 2), and GESTA (Section IV-B5):
- OPT: the optimal solution of IP 1, which was implemented using the IP solver included in the open-source GNU Linear Programming Kit (GLPK) [30];
- MC: the classical minimum-cut algorithm implemented with the Goldberg-Tarjan algorithm [31] for maximum flow, only employed when the size of the target set ; and
The cost function on vertices employed for TCVA is specified in each section; when cost is uniform, we report the size of the solution returned by each algorithm. The path enumeration required for GEN, FEN, and OPT was parallelized, using at most 25 threads. This parallelization was accomplished by assigning distinct initial segments of paths to distinct threads. Also, when , enumerations for distinct pairs were assigned to distinct threads. Total computation time is the sum of the computation time over all threads. Algorithms were terminated after one hour of wall-clock time; depending on the level of parallelization, this could correspond to considerably more than one hour of total computation time. All times shown in the results are total computation time. All experiments were performed on a machine with Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz and 392 GB RAM.
V-B Evaluation for Targeted Assessment (TCVA)
V-B1 On choice of target set
In order to evaluate the algorithms for TCVA, it is necessary to choose the target set ; in practice, this choice is entirely up to the user. First, we discuss the motivation and effectiveness of choosing the target sets uniformly randomly; next, we observe how restricting the elements of the target set based upon their degree affects the size of the optimal solution.
Uniformly random: One method of evaluating the performance of our algorithms for TCVA is to measure the average size (or cost) of the solution over all possible choices of the target set . To avoid the large computation time involved in running each algorithm on each possible choice of , we approximated this value by averaging over uniformly random choices of . To justify this approximation, we show in Fig. 5(a) the average cost of the solution returned by each algorithm versus on the RL1 dataset, with and , together with the sample standard deviation of the cost. While the mean fluctuates, these fluctuations are less than despite the huge number , the number of possible choices of . Qualitatively similar results were found for the other datasets and values. Therefore, in the remainder of this section we average results over uniformly random choices of unless otherwise stated, which we found sufficient to identify trends in the results.
By degree: Next, we observed how restricting the choice of the target set by degree affects the size of the optimal solution. For the purposes of this assessment, let , and let be the maximum degree in graph ; define the following two sets of vertices: , . Then we may require a source or target node to be chosen uniformly at random from one of these sets. We consider four different schemes of choosing the target set based upon : HL, HH, LL, and RR. In HL, for each pair , is chosen uniformly at random from , and is chosen uniformly at random from . HH and LL are defined analogously, and RR chooses both nodes of each pair uniformly at random from the entire vertex set, as in the uniformly random scheme above.
In Fig. 5(b), we plot the size of the optimal solution to TCVA versus for each scheme of target set selection, averaged over choices of . The results for LL and RR are as expected; RR shows no dependence on , and LL is approximately equal to RR for low values of before decreasing monotonically as approaches . However, HH and HL initially increase before decreasing below RR; this behavior is explained by the cardinalities of and in addition to the restriction upon the degree. As increases, the cardinalities of decrease; as these cardinalities decrease, it becomes more likely that an element from one pair in the target set appears in another pair, even though all pairs in the target set are distinct. As the fraction of nodes appearing in multiple pairs increases, it becomes easier to pseudo-separate the target set. This effect counteracts the fact that higher-degree nodes are more difficult to pseudo-separate.
V-B2 Size of target set
In this section, we fixed a constant for each dataset, let vertices have uniform cost, and observed the behavior of the algorithms when was incremented from to . The only algorithm able to run on all datasets and values was GESTA, and it demonstrated good performance (always within a factor of 2 of OPT in solution cost) while running faster than the other algorithms by a factor of more than 10. Representative results are shown in the first two columns of Fig. 6. GEN outperforms GESTA and is consistently the closest to OPT whenever both run. The second-best algorithm is GESTA on RL1 and FEN on AS1. For each dataset, at some value, OPT exceeds one hour of computation time and is no longer included in the results. Notice that on our largest dataset H1, with , neither GEN nor FEN can run after . Both of these algorithms require the enumeration of , which was unable to complete after this value of on this dataset. However, on RL1 and AS1, GEN and FEN continue to finish within one hour throughout the experiment; the running times shown in Fig. 6(e) indicate that, for fixed , the running time of GEN grows linearly in , consistent with Theorem 3.
V-B3 Varying threshold
In this section, we consider two choices of : , and . We then observed the behavior of the algorithms when was incremented; representative results are shown in the last two columns of Fig. 6. When , we compared the performance of our algorithms to the classical MC algorithm (Fig. 6(c)); as expected, MC returned a result independent of , which demonstrates the inadequacy of solutions to the classical cutting problems for our assessments: for example, at , MC returns a solution of size more than four times the optimal, and it does comparatively worse for lower values of . We also observe experimentally that as increases, we recover the classical version of our problem: past , GESTA completely separates the input pair and returns a solution of size similar to that of MC.
As in the previous section, the only algorithm able to run for all parameter values was GESTA, which maintained performance within a factor of 2 of OPT. Although not as scalable as GESTA, GEN consistently outperformed the other algorithms in solution size. On ER1, shown in Fig. 6(c), GEN was limited by the path enumeration time after , and FEN and OPT were unable to finish solving LP 1; this LP solution is required both by the rounding step of FEN and by the integer solver of GLPK. Indeed, the running time of GEN and FEN increased exponentially with (Fig. 6(h)), as expected.
V-B4 Discussion
Throughout the TCVA experiments, GEN consistently came closest to the optimal solution, and it was able to run in many situations where OPT could not finish. Furthermore, GEN scales well with the size of the target set . However, as the threshold value becomes relatively large, the number of paths to enumerate grows rapidly and LP 1 becomes much larger and more difficult to solve; for these reasons, GEN and FEN were unable to finish when became large. In these cases, we demonstrated that the approach of GESTA scales well with both the size of and the threshold value , while maintaining good performance with respect to the optimal.
VI Conclusions and Future Work
In this work, we introduced three new combinatorial pseudocut problems. We analyzed the computational complexity of these problems, and we provided three approximation algorithms. We used the pseudocut problems to formulate a vulnerability assessment TCVA with respect to an arbitrary additive QoS metric on a communications network. Future work would include extending this assessment to incorporate more than one QoS metric. This is likely to be difficult, as the problem of finding a routing path satisfying two or more QoS constraints is NP-hard, although approximation algorithms do exist for this problem [29]. In addition, the computational complexity of the uniform edge length version of our simplest problem, T-PCUT, is left open; our NP-hardness proof required nonuniform edge lengths, and we provided polynomial-time algorithms only for special cases.
In our experimental evaluation, we found that our -approximation GEN for T-MULTI-PCUT consistently returned the solution closest to the optimal value, although its asymptotic ratio is worse than that of FEN. However, for applications that demand a high value for , our experiments showed that GEN and FEN may be unsuitable, despite the ease with which path enumeration can be parallelized; in this case, minor modifications to our probabilistic algorithm GEST were shown to give good performance in practice. The modifications to GEST were necessary because of the difficulty of obtaining valid path samples when GEST is close to a feasible solution; future work would include boosting the ability of GEST to obtain valid samples of paths between a terminal pair , so that the heuristic modification GESTA becomes unnecessary.
-A Edge versions
Let be an arbitrary but fixed constant throughout this section. The problems will take as input a triple , where is a directed graph ; is a cost function on edges representing the difficulty of removing each edge; and is a length function on edges. Although both and may be considered weight functions, we use cost for and length for to avoid confusion. The distance between two vertices is the length of the shortest -weighted directed path from to ; the cost of a set of edges is the sum of the costs of the individual edges in .
Problem 4 (Minimum -pseudocut (edge version)).
Given triple and a pair of vertices of , determine a minimum cost set of edges such that after the removal of from .
Problem 5 (Minimum -multi-pseudocut (edge version)).
Given triple , and a target set of pairs of vertices of , , determine a minimum cost set of edges such that for all after the removal of from .
-B Algorithms for edge versions
If paths from to are defined as sequences of edges instead of vertices, then, to approximate the edge versions, we can define approximation algorithms analogous to GEN, FEN, GEST, and ENBI with analogous performance bounds. For example, we define an analogous program to IP 1 for the edge version of MULTI-PCUT below.
We will consider simple paths ; that is, paths containing no cycles. Let denote the set of simple paths between that satisfy the condition . If an edge lies on path , we write . Consider the edge set of to be . Let if edge lies on path , where . If , let . Also, let variable if edge is to be chosen into the set of edges , and [math] otherwise. Finally, denote the cost of choosing edge as , and let vectors and . Then, the covering integer program formulation is as follows.
Integer Program 2 (Edge MULTI-PCUT).
[TABLE]
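The constraint matrix of IP 2 can be assembled from the enumerated paths as in the Python sketch below, which treats each path as a vertex sequence in a directed graph and reads off its consecutive edges. Since the program itself is not legible in this copy, the sketch assumes its constraints take the covering form A x ≥ 1 over 0/1 edge variables described in the preceding paragraph. Columns are created only for edges lying on at least one path; any other edge has an all-zero column and can be fixed to 0. The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_cover_matrix(paths: Sequence[Sequence[Hashable]]) -> Tuple[List[Edge], List[List[int]]]:
    """One row per path, one column per edge; entry 1 iff the edge lies on the path."""
    edges: List[Edge] = []
    column: Dict[Edge, int] = {}
    for p in paths:
        for e in zip(p, p[1:]):  # consecutive vertices give the directed edges of p
            if e not in column:
                column[e] = len(edges)
                edges.append(e)
    matrix: List[List[int]] = []
    for p in paths:
        row = [0] * len(edges)
        for e in zip(p, p[1:]):
            row[column[e]] = 1
        matrix.append(row)
    return edges, matrix


def satisfies_cover(matrix: List[List[int]], x: Sequence[int]) -> bool:
    """Check the covering constraints A x >= 1 for a 0/1 edge-selection vector x."""
    return all(sum(a * xi for a, xi in zip(row, x)) >= 1 for row in matrix)
```

For the two paths s-a-t and s-b-t, removing (s, a) and (b, t) breaks both paths, while removing (s, a) and (a, t) leaves s-b-t intact.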
[1] Tony H Grubesic, Timothy C Matisziw, Alan T Murray, and Diane Snediker. Comparative Approaches for Assessing Network Vulnerability. International Regional Science Review, 31(1):88–112, 2008.
[2] Ashwin Arulselvan, Clayton W. Commander, Lily Elefteriadou, and Panos M. Pardalos. Detecting critical nodes in sparse graphs. Computers and Operations Research, 36(7):2193–2200, 2009.
[3] Thang N. Dinh, Ying Xuan, My T. Thai, Panos M. Pardalos, and Taieb Znati. On new approaches of assessing network vulnerability: Hardness and approximation. IEEE/ACM Transactions on Networking, 20(2):609–619, 2012.
[4] Thang N. Dinh and My T. Thai. Network under joint node and link attacks: Vulnerability assessment methods and analysis. IEEE/ACM Transactions on Networking, 23(3):1001–1011, 2015.
[5] Tom Leighton and Satish Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM, 46(6):787–832, 1999.
[6] Vijay V Vazirani. Approximation Algorithms. 2013.
[7] C J Colbourn. The Combinatorics of Network Reliability. 1987.
[8] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Canad. J. Math, 8:399–404, 1956.
