Pseudo-Separation for Assessment of Structural Vulnerability of a Network

Alan Kuhnle, Tianyi Pan, Victoria G. Crawford, Md Abdul Alim, My T. Thai

arXiv:1704.04555·cs.DS·April 18, 2017


TL;DR

This paper introduces pseudocut problems as a new way to assess network vulnerability by analyzing how network functionality is affected when nodes are sufficiently separated, with applications in communication networks.

Contribution

It generalizes classical cut problems, analyzes their computational complexity, and provides approximation algorithms with practical evaluation for network vulnerability assessment.

Findings

01. Pseudocut problems are broadly relevant to network reliability.

02. Three approximation algorithms are proposed for pseudocut problems.

03. Experimental evaluation demonstrates the utility of the algorithms in communication networks.

TABLE I: Approximation results

Problem	Undirected	Directed
CUT (both)	1	1
M-CUT (edge)	$O (\log k)$ [6]	$O (n^{11 / 23})$ [12]
M-CUT (vertex)	$O (\log k)$ [9]	-
T-PCUT (both)	$T + 1$	$T + 1$
T-M-PCUT (both)	$T + 1$	$T + 1$

Equations

$$p_{uv}\to-\log(1-p_{uv}).$$

$$\begin{aligned}d(s,t)&=\min_{p\in\mathscr{P}}\sum_{e\in p}-\log(1-p_{er}(e))\\&=-\max_{p\in\mathscr{P}}\log\prod_{e\in p}(1-p_{er}(e))\end{aligned}$$

$$d(s,t)<-\log(1-P)\iff p_{er}(s,t)<P,$$

$$Q(p)=\sum_{i=1}^{l}Q(p_{i-1},p_{i}).$$

$$\min\ c\cdot w\quad\text{such that}$$

$$\sum_{i=1}^{n}A^{(u,v)}_{p,i}w_{i}\geq 1,\quad\forall p\in\mathcal{P}(u,v),\ \forall(u,v)\in\mathcal{S}$$

$$w_{i}\in\{0,1\},\quad\forall i\in\{1,\ldots,n\}.$$

$$\hat{w}_{i}=\begin{cases}1&\bar{w}_{i}\geq\frac{1}{T_{0}+1}\\0&\text{otherwise}\end{cases}$$

$$\Delta_{x}\tau(S)\geq\Delta_{x}\tau(T).$$

$$\Delta_{x}\sigma(S)\geq\Delta_{x}\sigma(T)-\epsilon.$$

$$\begin{aligned}P-\sigma(A_{i})&=\sum_{j=1}^{o}\Delta_{c_{j}}\sigma(A_{i}\cup\{c_{1},\ldots,c_{j-1}\})\\&\leq\sum_{j=1}^{o}\Delta_{c_{j}}\sigma(A_{i})+o\epsilon\quad\text{(by the inequality }\Delta_{x}\sigma(S)\geq\Delta_{x}\sigma(T)-\epsilon\text{)}\\&\leq o\cdot[\sigma(A_{i+1})-\sigma(A_{i})+\epsilon].\end{aligned}$$

$$P-\sigma(A_{i})\leq P\left(1-\frac{1}{o}\right)^{i}+\epsilon o.$$

$$P-\sigma(A_{i}),\qquad P-\sigma(A_{i+1})$$

$$\sigma_{uv}(W\cup\{i\})=\frac{1}{L}\sum_{l=1}^{L}\frac{I\left(q_{l}\in\bigcup_{j\in W\cup\{i\}}P^{j}(u,v)\right)}{h(q_{l})},$$

$$Y(q):=\frac{I\left(q\in\bigcup_{j\in W\cup\{i\}}P^{j}(u,v)\right)}{h(q)},$$

$$E(Y(q))=\left|\bigcup_{j\in W\cup\{i\}}P^{j}(u,v)\right|=\tau_{uv}(W\cup\{i\}).$$


Taxonomy

Topics: Reliability and Maintenance Optimization · Formal Methods in Verification · Software Reliability and Analysis Research

Full text

Pseudo-Separation for Assessment of Structural Vulnerability of a Network

Alan Kuhnle, Tianyi Pan, Victoria G. Crawford, Md Abdul Alim, and My T. Thai

Department of Computer & Information Science & Engineering

University of Florida

Gainesville, Florida, USA

Email: {kuhnle, tianyi, crawford, alim, mythai}@cise.ufl.edu

Abstract

Based upon the idea that network functionality is impaired if two nodes in a network are sufficiently separated in terms of a given metric, we introduce two combinatorial pseudocut problems generalizing the classical min-cut and multi-cut problems. We expect the pseudocut problems will find broad relevance to the study of network reliability. We comprehensively analyze the computational complexity of the pseudocut problems and provide three approximation algorithms for these problems.

Motivated by applications in communication networks with strict Quality-of-Service (QoS) requirements, we demonstrate the utility of the pseudocut problems by proposing a targeted vulnerability assessment for the structure of communication networks using QoS metrics; we perform experimental evaluations of our proposed approximation algorithms in this context.

I Introduction

The concept of connectivity, or the existence of a path between two nodes, is vital for any network. Whatever functionality a network may provide to a pair of nodes is usually absent if the pair is disconnected. As a result, many studies of network vulnerability, or the degree to which the functionality of a network may be disrupted by failures, have incorporated connectivity as a fundamental measure of network functionality [1, 2, 3, 4]. Recognition of the importance of connectivity has led to the study of many combinatorial problems related to connectivity [5, 6, 7], perhaps the most well-known of which is the minimum cut problem (CUT), of determining the minimum number of edges (vertices) to remove in order to disconnect a pair $(s,t)$ of vertices in a graph. CUT was shown to be solvable in polynomial time via the celebrated maximum flow minimum cut relationship [8].

However, the functionality a network provides may break down even when elements of a network remain connected. For example, suppose $G$ is a communication network, with edge lengths representing transmission time delay over that edge. For nodes $s,t$ to communicate, it is necessary that the total time-delay on the routing path by which they communicate remain below some threshold $T$ . If the shortest-path distance between $s,t$ exceeds $T$ , communication breaks down, despite the fact that $s$ and $t$ are topologically connected within the network. Another example is the shipping of a perishable item through a transportation network. If the item reaches its destination after it has perished, it is of no use to the recipient. Therefore, instead of considering network failure to occur if elements of the network are topologically separated, we propose a more general measure of network failure: network functionality is impaired after the $T$ -separation of elements in a network, where $T$ is a real number. Two nodes $s,t$ are $T$ -separated if the weighted shortest-path distance exceeds $T$ .

As we demonstrate in this work, the $T$ -separation analogue (defined below) to the classical CUT problem cannot be reduced to CUT unless $P=NP$ . Given a constant $T$ , the minimum $T$ -pseudocut (T-PCUT) problem takes as input a directed graph $G$ , targeted pair $(s,t)$ , and distance function $d$ on the edges of $G$ . The problem asks for the minimum-size set of vertices (edges) $W$ to remove from $G$ , such that after the removal of $W$ , the $d$ -shortest paths distance $d(s,t)>T$ . To demonstrate the differences between CUT and T-PCUT, consider the following example. Let $G$ be the network shown in Fig. 1, let $s=0,t=12$ , and consider $d(e)=1$ for each edge $e\in G$ ; finally, set $T=5$ . An optimal solution to the vertex version of CUT (also known as minimum vertex separator [6]) must contain three nodes, while the removal of $W=\{5,7\}$ is an optimal solution to this instance of $5$ -PCUT; after removal of $W$ , $d(s,t)=6>T$ . Observe that the naive proposal of eliminating all vertices of distance greater than $T$ from $s$ and then solving CUT on the new graph does not work, since every node $v$ in $G$ initially satisfies $d(s,v)\leq 4$ .

Although the new combinatorial problems we propose in this work should be broadly applicable, the application in which we are most interested is structural vulnerability with respect to additive Quality-of-Service (QoS) metrics on communication networks. For example, the total time-delay, jitter, or packet-loss (which can be converted to an additive metric, as described in Lemma 1) between two nodes in a communication network are additive QoS metrics. For a given additive QoS metric $Q$, the minimum acceptable threshold $T_{Q}$ for this metric is a constant independent of any particular communication network, although it will vary with the desired communication application, such as voice or video call, process control, or machine control.

I-A Our contributions

• We introduce $T$-separation analogues to the following two classical combinatorial problems: the CUT problem defined above, and the MULTI-CUT problem [5], in which $k$ pairs $(s_{1},t_{1}),\ldots,(s_{k},t_{k})$ must be disconnected by removing a minimum number of edges (nodes). Collectively, we refer to these new formulations as pseudocut problems, and they are respectively T-PCUT and T-MULTI-PCUT; these problems are formally defined in Section II.

• Computational complexity: We show that with arbitrary edge weights, $T$-PCUT is $NP$-complete. With uniform edge weights, we show $T$-MULTI-PCUT is inapproximable within a factor of 1.3606 by approximation-preserving reduction from the minimum vertex cover problem.

• Approximation algorithms: For the T-PCUT and T-MULTI-PCUT problems with uniform edge weights, we provide GEN, an $O(\log n)$-approximation algorithm; and FEN, a $(T+1)$-approximation algorithm. In addition, we provide GEST, an efficient, randomized algorithm with a probabilistic performance guarantee: with probability $1-1/n$, GEST returns a feasible solution with cost within ratio $O(\alpha\delta^{T}+\log k)$ of optimal, where $k$ is the number of pairs to $T$-separate, $\delta$ is the maximum degree in the graph, and $\alpha$ is a user-defined parameter in $(0,1)$. The time complexity of GEST is $O(k^{3}n\log(2n^{2})/2\alpha^{2})$, so $\alpha$ gives the user control of the trade-off between performance and running time.

• Vulnerability assessment: Finally, we utilize the pseudocut problems to formulate a vulnerability assessment for an arbitrary additive QoS metric on communication networks. We then perform extensive experimental evaluations of our algorithms in the framework of this vulnerability assessment.

I-B Related work

The theoretical results for min-cut, multi-cut, and partial multi-cut vary depending on whether the edge or vertex version of the problem is considered, and whether the graph is undirected or directed. Table I shows the current status of the best-known approximation ratios for each version of the problem, and the references where a proof of this ratio may be found. In contrast, our algorithms work equally well in undirected or directed graphs and for the vertex or edge version of the pseudocut problem. To the best of our knowledge, we are the first to consider the pseudocut problems.

The seminal work of Ford and Fulkerson showed that the max-flow and min-cut values are equal for the CUT problem [8]. Leighton and Rao showed an analogous result for the multi-cut problem [5] using multicommodity max-flow, which gives an $O(\log k)$-approximation algorithm for the edge version of the multi-cut problem in undirected graphs. For the node version of multi-cut in undirected graphs, Garg et al. [9] gave an $O(\log k)$-approximation algorithm. For the edge version of multi-cut in directed graphs, Cheriyan et al. [10] gave an $O(\sqrt{n\log k})$-approximation; Gupta [11] improved this ratio to $O(\sqrt{n})$, and finally Agarwal et al. [12] improved the ratio to $O(n^{11/23})$. For multi-cut in trees, Garg et al. [13] provided another max-flow min-cut relationship, giving a $2$-approximation for multi-cut in trees.

A QoS-aware vulnerability assessment has been considered in Xuan et al. [14]; however, the complexity of their assessment lies above even the $NP$ class, as a valid solution cannot even be checked in polynomial time. A related problem to the single-pair T-PCUT was studied by Israeli and Wood [15]; in this problem (MSP), given a fixed budget $k$ and pair $(s,t)$, a set of $k$ edges is sought whose removal maximizes the shortest-path distance between $s$ and $t$. Israeli and Wood seek exact solutions using a bilevel optimization model, and this problem has been used as the basis for the detection of critical infrastructure and network vulnerability [16, 17]. However, we emphasize the difference between T-PCUT and MSP: in T-PCUT, it is the size (or cost) of the critical set that must be minimized; furthermore, MSP is formulated for edge interdiction only, while we primarily consider node interdiction. Finally, we have found only expensive exact methods to solve MSP; to the best of our knowledge, no efficient solutions to MSP with a performance guarantee exist.

I-C Organization

The rest of this paper is organized as follows. In Section II, we define the pseudocut problems, discuss motivating applications, define the QoS vulnerability assessment, and formulate the pseudocut problems as integer programs. In Section III, we analyze the computational complexity of the pseudocut problems. In Section IV, we present our three approximation algorithms. In Section V, we experimentally evaluate our algorithms in the context of the QoS vulnerability assessments. Finally, in Section VI, we summarize our contributions and discuss future work.

II Problem definitions

In this section, we introduce the vertex versions of the pseudocut problems; the edge versions are presented in Appendix -A. Let $T$ be an arbitrary but fixed constant throughout this section. The problems will take as input a triple $(G,c,d)$, where $G$ is a directed graph $G=(V,E)$; $c:V\to\mathbf{R}^{+}$ is a cost function on vertices representing the difficulty of removing each node; and $d:E\to\mathbf{R}^{+}$ is a length function on edges. For example, $d(e)$ could be the latency or packet loss on edge $e$. Although both $c$ and $d$ may be considered weight functions, we use cost for $c$ and length for $d$ to avoid confusion. The case when $c(v)=1$ for all vertices is referred to as uniform cost, and the case when $d(e)=1$ for all edges is referred to as uniform length. The distance $d(u,v)$ between two vertices is the length of the shortest $d$-weighted directed path from $u$ to $v$; the cost $c(W)$ of a set $W$ of vertices is the sum of the costs of the individual vertices in $W$.

Problem 1 (Minimum $T$ -pseudocut (T-PCUT)).

Given triple $(G,c,d)$ and a pair $(s,t)$ of vertices of $G$ , determine a minimum cost set $W\subset V\backslash\{s,t\}$ of vertices such that $d(s,t)>T$ after the removal of $W$ from $G$ .

Notice that in the formulation of T-PCUT, we disallow the pair endpoints to be chosen in the solution – for the non-uniform cost version, this restriction is unnecessary since the endpoints could be assigned higher cost; however, we include this restriction since otherwise the optimal solution would be trivial in the uniform cost version.
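To make Problem 1 concrete, the following is a minimal Python sketch, not one of the algorithms proposed in this paper. It computes $d(s,t)$ with Dijkstra's algorithm and finds a minimum-cost $T$-pseudocut by exhaustive search, which is only practical for tiny graphs. The function names and the five-vertex example graph are illustrative assumptions.

```python
import heapq
from itertools import combinations

def shortest_distance(adj, s, t, removed=frozenset()):
    """Dijkstra on a directed graph {u: {v: length}}, skipping removed vertices."""
    if s in removed or t in removed:
        return float("inf")
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if u == t:
            return d_u
        if d_u > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj.get(u, {}).items():
            if v in removed:
                continue
            d_v = d_u + length
            if d_v < dist.get(v, float("inf")):
                dist[v] = d_v
                heapq.heappush(heap, (d_v, v))
    return float("inf")

def min_pseudocut_bruteforce(adj, cost, s, t, T):
    """Try every subset W that avoids s and t; keep the cheapest with d(s,t) > T."""
    candidates = [v for v in cost if v not in (s, t)]
    best, best_cost = None, float("inf")
    for r in range(len(candidates) + 1):
        for W in combinations(candidates, r):
            c = sum(cost[v] for v in W)
            if c < best_cost and shortest_distance(adj, s, t, frozenset(W)) > T:
                best, best_cost = set(W), c
    return best, best_cost

# Hypothetical example: one 2-hop route s-a-t and one 3-hop route s-b-c-t.
adj = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"c": 1}, "c": {"t": 1}}
cost = {v: 1 for v in "sabct"}          # uniform cost

cut_T2 = min_pseudocut_bruteforce(adj, cost, "s", "t", 2)
cut_T3 = min_pseudocut_bruteforce(adj, cost, "s", "t", 3)
```

With $T=2$ a single vertex suffices because the surviving route has length 3, whereas with $T=3$ both routes must be broken, which coincides with the ordinary vertex cut on this graph.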

Problem 2 (Minimum $T$ -multi-pseudocut (T-MULTI-PCUT)).

Given triple $(G,c,d)$ , and a target set of pairs of vertices of $G$ , $\mathcal{S}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{k},t_{k})\}$ , determine a minimum cost set $W$ of vertices such that $d(s_{i},t_{i})>T$ for all $i$ after the removal of $W$ from $G$ .

In contrast to T-PCUT, we allow picking members of pairs in $\mathcal{S}$ into the solution of T-MULTI-PCUT; thus, there is always a feasible solution of size at most $k$ . If a vertex $v$ is removed from $G$ , we adopt the convention that $d(v,w)=\infty$ for all vertices $w\in G$ .

In the above two formulations, we emphasize again that the threshold $T$ is a fixed constant independent of the input; in addition, we introduce versions of these problems where $T$ is part of the input. We will refer to the versions of these problems where $T$ is an input as PCUT and MULTI-PCUT, respectively. Finally, the algorithms in Section IV generalize to the edge versions of the problems as well, as discussed in Appendix -B.

II-A Motivation and applications for the pseudocut problems

In this section, we give brief overviews of two potential applications of the pseudocut problems. Motivated by these examples, we next provide the vulnerability assessment for QoS on communication networks.

II-A1 Industrial Internet of Things

An emerging application for pseudocut problems is the Industrial Internet of Things (IIoT). As everyday objects become increasingly equipped with means for electronic identification and communication, from Radio Frequency Identification (RFID) to smarter communication capabilities, new applications and scenarios have emerged in the Internet of Things [18, 19].

As surveyed in [20], an emerging trend is to integrate communication capabilities into industrial production systems. Such cyberphysical systems (CPS) in the production process are connected to conventional business IT networks. Integrated CPS allow extensive monitoring and control of production facilities in real time. However, the QoS requirements for control of production systems are very strict, and special routing protocols have been formulated to guarantee acceptable QoS conditions [21]. An IEEE task group on Time-Sensitive Networking (TSN) [22] is currently chartered to provide specifications to allow time-synchronized low latency streaming services through 802 networks. Critical data streams are guaranteed certain end-to-end QoS by resource reservation; this service is intended for industrial applications such as process control, machine control, and vehicles; and for audio/video streams.

As an example application for the T-PCUT, consider two nodes in IIoT as described above: $s$, a control node, and $t$, a lower-level node. Further, suppose that an acceptable level of packet loss ratio between $s,t$ is $10^{-10}$. Then, the problem instance of T-PCUT is the IIoT network $G$, with edges $e$ weighted by the metric $d$ defined in Lemma 1 below. A solution to the $10^{-10}$-PCUT problem for $(s,t)$ identifies the most critical vertices whose proper functioning is required to ensure $p(s,t)<10^{-10}$, where $p(s,t)$ is the cumulative packet loss ratio between $s$ and $t$.

To convert the packet error rate between nodes to an additive metric, we define the following transformation. Given network $G=(V,E)$ , let $p_{uv}\in[0,1]$ represent packet error rate for each edge $(u,v)\in E$ . Then, the transformation is

$$p_{uv}\to-\log(1-p_{uv}).\tag{1}$$

Lemma 1.

Let $p_{uv}$ represent packet error rate between each $(u,v)\in E$ . Then the transformation (1) yields an additive metric $d$ such that $1-\exp\left(-d(s,t)\right)$ is the lowest cumulative packet error rate between nodes $s,t$ over all possible routing paths.

Proof.

Let $G=(V,E)$ with packet error rate $p_{er}(e)\in(0,1)$ be given for each $e\in E$ . Let $d(e)=-\log(1-p_{er}(e))$ . Let $s,t\in G$ , and $\mathscr{P}$ be the set of all paths in $G$ from $s$ to $t$ . Then

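$$\begin{aligned}d(s,t)&=\min_{p\in\mathscr{P}}\sum_{e\in p}-\log(1-p_{er}(e))\\&=-\max_{p\in\mathscr{P}}\log\prod_{e\in p}(1-p_{er}(e)).\end{aligned}$$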

Now, $\prod_{e\in p}(1-p_{er}(e))$ is the probability a packet is successfully transmitted along path $p$ . Thus, maximizing this probability over all paths minimizes both $d(s,t)$ and the cumulative packet error rate between $s,t$ .

Furthermore, if packet error rate threshold $P$ is given, then by similar reasoning

$$d(s,t)<-\log(1-P)\iff p_{er}(s,t)<P,$$

where $p_{er}(s,t)$ is the cumulative packet error rate between $s,t$ . ∎
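A small numerical check of Lemma 1 can be sketched as follows. The per-edge error rates and the threshold are made-up values, and the helper names are illustrative; the sketch only confirms that summing the transformed lengths along one path and applying $1-\exp(-d)$ recovers the directly computed cumulative packet error rate, and that the threshold equivalence holds for that path.

```python
import math

def loss_to_length(p):
    """Transformation (1): packet error rate p in [0, 1) -> additive edge length."""
    return -math.log1p(-p)              # equals -log(1 - p)

def path_error_rate(edge_rates):
    """Cumulative packet error rate along a path, computed directly."""
    success = 1.0
    for p in edge_rates:
        success *= 1.0 - p              # packet must survive every edge
    return 1.0 - success

rates = [0.01, 0.02, 0.005]             # hypothetical per-edge error rates
d_path = sum(loss_to_length(p) for p in rates)
recovered = -math.expm1(-d_path)        # equals 1 - exp(-d_path)

P = 0.05                                # hypothetical error-rate threshold
below_in_d = d_path < loss_to_length(P)
below_in_p = path_error_rate(rates) < P
```

Using `math.log1p` and `math.expm1` avoids loss of precision when the error rates are very small, as in the $10^{-10}$ example above.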

II-A2 Military communications networks

Next-generation military communications networks will be multilayer, interdependent networks [23, 24, 25] comprising wired fiber-optic and wireless components, including satellite communications. For example, consider the proposed Army Warfighter Information Network-Tactical (WIN-T) network, the theory of operation for which is contained in [24]. WIN-T comprises interdependent wireless and wired components that are organized into layers; the WIN-T multi-tiered architecture is organized as follows: (1) the space layer, utilizing military satellite communications (MILSATCOM) and commercial satellite bands, (2) the airborne layer, consisting of unmanned aerial vehicles (UAVs), and (3) the ground layer, which contains many different kinds of nodes. These nodes communicate with each other and with nodes in the other layers in a variety of ways including wired LANs, wireless WANs, and satellite communications.

To ensure QoS in WIN-T, traffic is only admitted to the WAN network when the network infrastructure and congestion state offer a high probability that the traffic can be delivered within the QoS requirements specified in the WIN-T Baseline Requirements Document. Thus, communication failure between a pair $s,t$ of nodes in the network may occur despite the existence of a routing path between $s$ and $t$ in the network, if any of the QoS metrics are greater than a threshold $T$.

Therefore, the T-PCUT problem would identify the most critical nodes for communication between a given pair of nodes $(s,t)$. For example, $s$ could be a commanding node attempting to send an order to infantry unit $t$. If communication between $s$ and $t$ is a high priority, critical nodes identified by T-PCUT would be especially important to protect against an adversarial attack.

II-A3 Vulnerability assessment on communication networks

Motivated by the above two examples, we present a vulnerability assessment for communication networks in this section. Let $C=(V,E)$ represent a communication network. We fix an additive QoS metric $Q$ on the edges of $C$ . Since the QoS metric $Q$ is additive, we define the QoS metric on the path $p=p_{0}p_{1}\cdots p_{l}\in C$ as

$$Q(p)=\sum_{i=1}^{l}Q(p_{i-1},p_{i}).$$

Furthermore, we denote the metric between a pair $s,t$ as $Q(s,t)$ , the shortest-path distance between $s,t$ , where the weight of each edge in the network is $Q(u,v)$ . Clearly, no routing path could provide better QoS with respect to $Q$ than the $Q$ -shortest path. Let $T$ be a constant representing the threshold such that if $Q(s,t)>T$ then communication between $s$ and $t$ is no longer possible. Notice that since the value of $Q$ on each edge is determined by network parameters, it has a minimum value $q_{min}$ which is a constant independent of the network size.

Next, we define the problems of identification of the most critical elements of the network with respect to the metric $Q$ and threshold $T$ , and a given targeted set of pairs $\mathcal{S}$ in the network, with respect to $T$ -separation.

Problem 3 (Targeted Communication Vulnerability Assessment (TCVA)).

Given communication network $C=(V,E)$, an additive quality of service metric $Q$, a threshold $T$ for $Q$ indicating the highest acceptable value of $Q$ for communication between a pair of nodes in $C$, a targeted set $\mathcal{S}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{k},t_{k})\}$, and a cost function $c$ on $C$, determine $W\subset V$ of minimum cost such that if $W$ is removed from $C$, then for all $(u,v)\in\mathcal{S}$, $Q(u,v)>T$.

Notice that TCVA is exactly the T-MULTI-PCUT problem with the edge length function equal to the QoS value on the edge.

II-B Integer programming formulations

In this section, we formulate the pseudocut problems as integer programs. We will state the formulations for the pseudocut versions where $T$ is an input, but the same formulations apply when $T$ is a constant. We formulate PCUT and MULTI-PCUT as integer programs in the following way. Let an instance $(G,c,d,\mathcal{S},T)$ of MULTI-PCUT be given. We will consider simple paths $p=p_{0}p_{1}\ldots p_{l}\in G$ ; that is, paths containing no cycles. Let $\mathcal{P}(s_{i},t_{i})$ denote the set of simple paths $p$ between $(s_{i},t_{i})\in\mathcal{S}$ that satisfy the condition $d(p)\leq T$ . If a vertex $u$ lies on path $p$ , we write $u\in p$ . The following lemma relates the optimal solution to MULTI-PCUT to the minimum-size hitting set of $\mathcal{P}=\bigcup_{i=1}^{k}\mathcal{P}(s_{i},t_{i}),$ which is necessary for the integer programming formulation.

Lemma 2.

Let $W^{*}$ be an optimal solution to an instance of MULTI-PCUT. Let $W^{\prime}$ be a minimum cost set of vertices satisfying $W^{\prime}\cap p\neq\emptyset$ for all $p\in\mathcal{P}(s_{i},t_{i})$ for all $(s_{i},t_{i})\in\mathcal{S}$ . Then, $c(W^{\prime})=c(W^{*})$ .

Proof.

Since $W^{*}$ is a solution to the MULTI-PCUT problem, we have $d(u,v)>T$ for all $(u,v)\in\mathcal{S}$ after the removal of $W^{*}$. Any path $p$ in $G$ between a pair $(u,v)\in\mathcal{S}$ satisfying $d(p)\leq T$ must therefore satisfy $p\cap W^{*}\neq\emptyset$, for otherwise $d(u,v)\leq T$. Hence $W^{*}$ is itself a hitting set for $\mathcal{P}$, and since $W^{\prime}$ has minimum cost among all hitting sets, $c(W^{\prime})\leq c(W^{*})$.

Similarly, the removal of $W^{\prime}$ from $G$ ensures $d(u,v)>T$ for all $(u,v)\in\mathcal{S}$ , hence $c(W^{*})\leq c(W^{\prime})$ . ∎

As a consequence of Lemma 2, we can formulate MULTI-PCUT as a covering integer program. Consider the vertex set of $G$ to be $\{1,\ldots,n\}$. Let $A^{(u,v)}_{p,i}=1$ if vertex $i$ lies on path $p\in\mathcal{P}(u,v)$, where $(u,v)\in\mathcal{S}$. If $i\not\in p$, let $A^{(u,v)}_{p,i}=0$. Also, let variable $w_{i}=1$ if vertex $i$ is to be chosen into the set of vertices $W$, and $w_{i}=0$ otherwise. Finally, denote the cost of choosing vertex $i$ as $c_{i}$, and let vectors $w=(w_{1},\ldots,w_{n})$ and $c=(c_{1},\ldots,c_{n})$. Then, the covering 0-1 integer program formulation is as follows.

Integer Program 1 (IP 1).

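$$\min\ c\cdot w\quad\text{such that}$$

$$\sum_{i=1}^{n}A^{(u,v)}_{p,i}w_{i}\geq 1,\quad\forall p\in\mathcal{P}(u,v),\ \forall(u,v)\in\mathcal{S}\tag{2}$$

$$w_{i}\in\{0,1\},\quad\forall i\in\{1,\ldots,n\}.\tag{3}$$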

The constraints (2) ensure that for each path $p\in\mathcal{P}(u,v)$ , we choose at least one node $i\in p$ . By Lemma 2, the optimal solution to IP 1 corresponds to an optimal solution of MULTI-PCUT. The linear relaxation of IP 1 is designated LP 1, in which each constraint (3) is replaced by $w_{i}\in[0,1]$ . Finally, we remark that since PCUT is a special case of MULTI-PCUT, IP 1 and all solutions we discuss apply to PCUT as well.
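The covering structure of IP 1 can be illustrated with a short Python sketch. It enumerates the simple paths of length at most $T$ for each target pair, builds one constraint row per path, and then solves the resulting hitting-set problem by exhaustive search in place of an IP solver. The function names, the example graph, and the target pairs are illustrative assumptions, and the exhaustive search is exponential in the number of vertices.

```python
from itertools import combinations

def short_simple_paths(adj, s, t, T):
    """All simple s-t paths p with d(p) <= T, as vertex tuples (depth-first search)."""
    paths = []
    def dfs(u, length, path):
        if u == t:
            paths.append(tuple(path))
            return
        for v, d_uv in adj.get(u, {}).items():
            if v not in path and length + d_uv <= T:
                path.append(v)
                dfs(v, length + d_uv, path)
                path.pop()
    dfs(s, 0, [s])
    return paths

def constraint_matrix(paths, vertices):
    """Row for path p has a 1 in column i exactly when vertex i lies on p."""
    return [[1 if v in p else 0 for v in vertices] for p in paths]

def min_hitting_set(paths, cost):
    """Exhaustive optimum of the covering program: cheapest W meeting every path."""
    vertices = sorted(cost)
    best, best_cost = None, float("inf")
    for r in range(len(vertices) + 1):
        for W in combinations(vertices, r):
            c = sum(cost[v] for v in W)
            if c < best_cost and all(set(W) & set(p) for p in paths):
                best, best_cost = set(W), c
    return best, best_cost

# Hypothetical MULTI-PCUT instance with two target pairs and T = 2.
adj = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"c": 1}, "c": {"t": 1}}
cost = {v: 1 for v in "sabct"}
pairs = [("s", "t"), ("b", "t")]
paths = [p for (u, v) in pairs for p in short_simple_paths(adj, u, v, 2)]
A = constraint_matrix(paths, sorted(cost))
W, W_cost = min_hitting_set(paths, cost)
```

In this instance the optimum is the shared endpoint `t`, which is permitted because MULTI-PCUT allows members of target pairs in the solution.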

II-B1 Discussion

Notice that if we let $T$ become large enough, the classical problems CUT and MULTI-CUT are recovered from PCUT and MULTI-PCUT.

If $T$ is an input, IP 1 above is superpolynomial in size: there could be $n^{T}$ constraints (2). The analogous integer program for MULTI-CUT also could have exponentially many constraints but has a polynomial-time separation oracle that enables the linear relaxation to be solved in polynomial time by the ellipsoid method [6]. However, this separation oracle does not work for the linear relaxation of IP 1; in general, the linear relaxation may not be solvable in polynomial time. However, the IP formulations above hold when $T$ is a constant. Thus, IP 1 is polynomial in size when T-MULTI-PCUT is considered.

Finally, notice that not all instances of PCUT admit a valid solution: suppose the input is a graph consisting of a single edge $(s,t)$. PCUT is formulated to disallow choosing $s$ or $t$; hence, there is no solution. Whether a feasible solution exists can easily be detected in polynomial time, so unless otherwise stated, we assume that a feasible problem instance is given in our analysis.

III Computational complexity

In this section, we present our results on the computational complexity of the pseudocut problems.

III-A T-PCUT

We give polynomial-time algorithms for certain cases of the version of T-PCUT with uniform lengths. However, T-PCUT with arbitrary edge lengths and uniform vertex costs is shown to be $NP$ -hard.

Proposition 1.

For $T\leq 3$ , T-PCUT with uniform lengths and costs is solvable in polynomial time.

Proof.

Let $G,(s,t)$ be an instance of T-PCUT. First consider the case $T=2$ . Since edge lengths are uniform, all paths $p$ of length $2$ from $s$ to $t$ have exactly three vertices: $p=sxt$ for some $x\in V$ . Therefore, no such paths can intersect unless they are identically equal. So to ensure $d(s,t)>2$ , one must simply remove all intermediate vertices between $s$ and $t$ .

Next, suppose $T=3$. Let $p_{1}=sxyt$ be a path of length 3 from $s$ to $t$, and let $p_{2}$ be a path of length 2 that intersects $p_{1}$. In order to satisfy $d(s,t)>3$, $p_{2}$ must be broken, which can happen in only one way and necessarily breaks $p_{1}$ as well. Hence, in the first step we break all paths of length 2 in the same way as for the $T=2$ case, and denote the modified graph as $G^{\prime}$. The remaining paths of length 3 do not intersect paths of length 2. Two distinct paths of length 3 can intersect each other in a maximum of one vertex. Let $X=\{x_{1},x_{2},x_{3},\ldots\}$ be the set of all nodes that appear as the second node (after $s$) on a path of length 3; similarly, let $Y=\{y_{1},y_{2},y_{3},\ldots\}$ be the set of nodes appearing as the third node on a path of length 3. Notice that $X\cap Y=\emptyset$, because otherwise a path of length 2 would still be extant in the graph, but all such paths were removed in the first step.

Thus, the relevant subgraph $G^{\prime}$ will appear of the form exemplified in Fig. 2. Notice that an edge $(x_{2},x_{1})$ would have no relevance to the solution, as the only way to create a path of length 3 using $(x_{2},x_{1})$ would be to add $(x_{1},t)$ as well; but this process creates the path $sx_{1}t$ , which is of length 2; so $x_{1}$ would have been chosen in the first step. If we delete $s$ and $t$ from the graph $G^{\prime}$ , we see that our problem reduces to a bipartite vertex cover problem, which is solvable in polynomial time; the second step will consist of the optimal solution to this problem. The final solution is the union of vertices chosen in the first and second steps. ∎
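The two-step argument for $T=3$ can be sketched in Python as follows. The first step removes every vertex $x$ on a path $sxt$; the second step solves the bipartite vertex cover between $X$ and $Y$ by computing a maximum matching with augmenting paths and then applying König's theorem. The choice of matching routine, the function name, and the example graph are assumptions made for illustration; the proof only requires that some polynomial-time bipartite vertex cover method be used.

```python
def pcut_t3_uniform(adj, s, t):
    """Two-step procedure for T = 3 with uniform lengths and costs.

    adj maps each vertex to the set of its out-neighbours.
    Returns (step1, step2); step1 alone solves the T = 2 case.
    """
    def succ(u):
        return adj.get(u, set())

    if t in succ(s):
        raise ValueError("edge (s, t) exists, so no feasible solution")

    # Step 1: a path s-x-t can only be broken by removing x.
    step1 = {x for x in succ(s) if t in succ(x)}

    # Step 2: surviving paths s-x-y-t define a bipartite graph on X and Y.
    X = [x for x in succ(s) if x not in step1 and x != s]
    edges = {
        x: [y for y in succ(x)
            if y not in step1 and y not in (s, t) and t in succ(y)]
        for x in X
    }

    match_y = {}                        # y -> matched x
    def try_augment(x, seen):
        for y in edges[x]:
            if y in seen:
                continue
            seen.add(y)
            if y not in match_y or try_augment(match_y[y], seen):
                match_y[y] = x
                return True
        return False
    for x in X:
        try_augment(x, set())

    # Koenig's theorem: explore alternating paths from unmatched X vertices.
    visited_x = {x for x in X if x not in set(match_y.values())}
    visited_y = set()
    stack = list(visited_x)
    while stack:
        x = stack.pop()
        for y in edges[x]:
            if y not in visited_y:
                visited_y.add(y)
                x2 = match_y.get(y)
                if x2 is not None and x2 not in visited_x:
                    visited_x.add(x2)
                    stack.append(x2)
    step2 = (set(X) - visited_x) | visited_y
    return step1, step2

# Hypothetical example: one 2-hop path through a, four 3-hop paths through x1, x2.
adj = {"s": {"a", "x1", "x2"}, "a": {"t"},
       "x1": {"y1", "y2"}, "x2": {"y1", "y3"},
       "y1": {"t"}, "y2": {"t"}, "y3": {"t"}}
step1, step2 = pcut_t3_uniform(adj, "s", "t")
solution = step1 | step2
```

Here the bipartite graph has edges $x_1y_1$, $x_1y_2$, $x_2y_1$, $x_2y_3$, whose unique minimum vertex cover is $\{x_1,x_2\}$, so the final solution has three vertices.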

Proposition 2.

Let $D$ be a constant, and let T-PCUT $(G,(s,t))$ be an instance of T-PCUT for some constant $T$ with uniform lengths and uniform costs. If the maximum degree $\delta$ in $G$ satisfies $\delta\leq D$, then the optimal solution $W$ is computable in polynomial time.

Proof.

Consider all distinct paths of length at most $T$ starting from $s$ and ending at $t$ . The number of distinct vertices on these paths is $O(\delta^{T})=O(D^{T})$ ; let us call this set $V^{\prime}$ . Therefore, the number of possible subsets of these vertices is a constant bounded by $O(2^{D^{T}})$ . Since each subset can be checked in polynomial time, the optimal solution can be found by checking each possible subset of $V^{\prime}$ . ∎

Theorem 1.

Consider the decision version of 1-PCUT with uniform costs and arbitrary lengths; that is, given problem instance 1-PCUT $(G,(s,t))$ with uniform costs and arbitrary lengths, and given constant $D>0$ , determine if a solution $W\subset V$ exists with $|W|\leq D$ . This problem is NP-complete.

Proof.

For clarity, we first prove the theorem for the edge version of 1-PCUT (where edges $e\in G$ have both cost and length functions), with arbitrary costs of edges; next, we discuss how to modify the proof for the uniform cost function and the vertex version of PCUT. The decision problem is clearly in $NP$ . To show $NP$ -hardness, we first reduce the Knapsack problem to an instance of Pseudocut with non-uniform costs; then we discuss how to modify the reduction for uniform costs. A problem instance of Knapsack is specified as follows. Let $S=\{a_{1},\ldots,a_{n}\}$ be a set of objects with sizes $w(a_{i})\in\mathbf{Z}^{+}$ and profits $p(a_{i})\in\mathbf{Z}^{+}$ , and a “knapsack capacity” $W$ , and desired profit $P$ . The decision version of the problem is to find a subset of objects with total profit at least $P$ and total size bounded by $W$ .

Given a Knapsack instance, we construct an instance of the pseudocut problem in the following way. For each item $a_{i}$ , we add nodes $u_{i},v_{i}$ and edges $e_{i}=(u_{i},u_{i+1})$ , $f_{i}=(u_{i},v_{i})$ , and $g_{i}=(v_{i},u_{i+1})$ . We also set the following cost and $d$ values: $c(e_{i}):=w(a_{i})$ , $d(e_{i}):=0$ , $c(f_{i}):=\infty$ , $d(f_{i}):=0$ , $c(g_{i}):=\infty$ , and $d(g_{i}):=p(a_{i})/P$ . Fig 3 illustrates this construction.

Then, letting $s=u_{1}$, $t=u_{n+1}$, we have an instance of 1-PCUT, the decision version of which is whether there exists a set of edges of total cost at most $W$ such that $d(s,t)\geq 1$. Notice that including edge $e_{i}$ into a solution incurs cost $c(e_{i})=w(a_{i})$ and adds $p(a_{i})/P$ to $d(s,t)$. Furthermore, edges $f_{i}$ and $g_{i}$ will not be chosen since these edges have infinite cost. So choosing edge $e_{i}$ exactly corresponds to adding item $a_{i}$ into the knapsack, and solutions to the Knapsack instance and the Pseudocut instance are in one-to-one correspondence, with corresponding solutions having the same cost. Also, $d(s,t)\geq 1$ iff the corresponding solution to the Knapsack problem has profit at least $P$.
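The correspondence can be checked numerically on a small instance. The following Python sketch builds the edge instance described above and computes $d(s,t)$ after the edges $e_{i}$ of a chosen item set are removed; the function names and the tuple encoding of vertices are assumptions made for illustration, and exact rational arithmetic is used so that the comparison with 1 is not affected by rounding.

```python
import heapq
from fractions import Fraction


def build_instance(weights, profits, P):
    """Edges (tail, head, cost, length) of the reduction, items indexed from 1."""
    edges = []
    for i, (w, p) in enumerate(zip(weights, profits), start=1):
        u_i, v_i, u_next = ("u", i), ("v", i), ("u", i + 1)
        edges.append((u_i, u_next, w, Fraction(0)))                 # e_i
        edges.append((u_i, v_i, float("inf"), Fraction(0)))         # f_i
        edges.append((v_i, u_next, float("inf"), Fraction(p, P)))   # g_i
    return edges


def st_distance(edges, s, t):
    """Dijkstra's algorithm on the directed edge list."""
    adj = {}
    for tail, head, _cost, length in edges:
        adj.setdefault(tail, []).append((head, length))
    dist = {s: Fraction(0)}
    heap = [(Fraction(0), s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, length in adj.get(x, ()):
            nd = d + length
            if y not in dist or nd < dist[y]:
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist.get(t, float("inf"))


def reduction_outcome(weights, profits, P, chosen):
    """Cost of removing e_i for i in `chosen`, and the resulting d(s, t)."""
    n = len(weights)
    kept = [e for e in build_instance(weights, profits, P)
            if not (e[0][0] == "u" and e[1][0] == "u" and e[0][1] in chosen)]
    cost = sum(weights[i - 1] for i in chosen)
    return cost, st_distance(kept, ("u", 1), ("u", n + 1))
```

For sizes $(2,3,4)$, profits $(3,4,5)$ and $P=7$, removing $e_{1}$ and $e_{2}$ costs 5 and yields $d(s,t)=3/7+4/7=1$, matching a knapsack solution of size 5 and profit 7.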

Modification for vertex version: To obtain the $NP$-hardness of the uniform cost vertex 1-PCUT problem, we modify the above reduction as follows. The first modification is to replace each vertex in the construction with a clique of $W+1$ vertices. Edges $f_{i}$ and $g_{i}$ are replaced by $W+1$ edges matching clique $v_{i}$ with $u_{i}$ and with $u_{i+1}$, respectively. Instead of a single edge $e_{i}$, we add $w(a_{i})$ vertices $w_{ij}$ between $u_{i}$ and $u_{i+1}$, connecting each vertex in cliques $u_{i},u_{i+1}$ to each $w_{ij}$. Distinct nodes $s,t$ are added; $s$ is connected to each vertex in the first clique $u_{1}$, and $t$ to each vertex in the last clique $u_{n+1}$. Thus, in order to add $p(a_{i})/P$ to the distance $d(s,t)$, it is necessary to pick all $w(a_{i})$ vertices $w_{ij}$. ∎

III-B T-MULTI-PCUT

In this section, we show uniform length and cost T-MULTI-PCUT to be inapproximable within a factor of $1.3606$ .

Theorem 2.

Let $T\geq 1$. Consider the decision version of T-MULTI-PCUT with uniform lengths and costs; that is, given problem instance T-MULTI-PCUT $(G,\mathcal{S})$ with uniform lengths and costs, and given constant $D>0$, determine if a solution $W\subset V$ exists with $|W|\leq D$. This problem is NP-complete.

Proof.

The feasibility of a solution $W$ satisfying $|W|\leq D$ can easily be checked in polynomial time, so T-MULTI-PCUT $\in NP$. We give an approximation-preserving reduction [6] from the vertex cover problem to T-MULTI-PCUT. Let $H$ be an instance of the vertex cover problem; let the vertex set of $H$ be $V=\{1,2,\ldots,n\}$. An instance of T-MULTI-PCUT is constructed as follows. Let $G$ be a complete graph on $\{1,2,\ldots,n\}$, and $\mathcal{S}$ be the edge set of $H$.

Then, there is a natural one-to-one, cost-preserving correspondence between solutions of the two instances; namely the identity mapping: if $W\subset V$ is a vertex cover of size $l$ , $W$ is also a feasible solution to the T-MULTI-PCUT instance of size $l$ , since $(u,v)\in\mathcal{S}$ implies $(u,v)\in H$ , which implies $u\in W$ or $v\in W$ since $W$ is a vertex cover, which finally implies $d(u,v)=\infty>T$ in $G$ (by the convention discussed in Section II). If $W\subset V$ is a solution to T-MULTI-PCUT, then for each $(u,v)\in H$ , $d(u,v)=\infty$ after removal of $W$ . Since the edge $(u,v)$ is in $G$ , $u$ or $v$ is in $W$ , so that $W$ is a vertex cover. ∎

Corollary 1.

Unless $P=NP$ , there is no polynomial-time approximation to uniform length, cost T-MULTI-PCUT within a factor of $1.3606$ , for $T\geq 1$ .

Proof.

This corollary follows from the proof of Theorem 2 and the inapproximability of vertex cover [26]. ∎

IV Approximation algorithms

In this section, we present three approximation algorithms for arbitrary vertex cost T-MULTI-PCUT, when the length function on edges is bounded below: $d(e)>q_{min}$ for some constant $q_{min}>0$ . In this case, we call the edge lengths bounded. Recall from Section II-A that edge lengths are bounded when the edge length function $d$ is an additive QoS metric. For the case of bounded edge lengths, we let constant $T_{0}=T/q_{min}$ . If the length function is uniform, then of course $T_{0}=T$ . For bounded edge length, arbitrary vertex cost T-MULTI-PCUT, we present GEN, an $O(\log n)$ -approximation algorithm, and FEN, a $(T_{0}+1)$ -approximation algorithm in Section IV-A. Although these algorithms run in polynomial time since $T_{0}$ is constant, their running time may suffer if $T_{0}$ is large for some application. Hence, we also present a randomized algorithm with probabilistic performance guarantee in Section IV-B, capable of running efficiently even for large $T_{0}$ .

IV-A Approximations for T-MULTI-PCUT

First, we present two approximation algorithms for the constant $T$ problems T-PCUT and T-MULTI-PCUT, based upon Lemma 2 and IP 1, when edge lengths have a lower bound $q_{min}>0$. The idea is as follows: for each path of vertices $p=v_{1}\ldots v_{l}$ between a pair of the target set $\mathcal{S}$ with $d(p)=\sum_{i=2}^{l}d(v_{i-1},v_{i})\leq T$, we must select at least one node belonging to the path into the solution. Thus, we formulate the problem into a covering framework, where each node covers a subset of paths. Both algorithms require the following enumeration of paths.

IV-A1 Path enumeration

This enumeration can be accomplished in polynomial time in the following way: let $T_{0}=T/q_{min}$; then each path $p\in\mathcal{P}=\bigcup_{(u,v)\in\mathcal{S}}\mathcal{P}(u,v)$ has at most $T_{0}$ edges and hence at most $T_{0}+1$ nodes. Thus, we may iterate through all sequences of at most $T_{0}+1$ nodes, and test if the path produced is in $\mathcal{P}$; that is, for some $(u,v)\in\mathcal{S}$, the path must start at $u$, terminate at $v$, and satisfy $d(p)\leq T$. This procedure can be accomplished in time $O(n^{T_{0}})$. Using these paths, we can construct the matrices $A^{(u,v)}$ in IP 1.
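As an illustration, the enumeration for one pair can be sketched in Python with a depth-first search that prunes any prefix whose length already exceeds $T$, rather than iterating over all node sequences; this is a substitute technique with the same output, not necessarily the implementation used in our experiments. The adjacency format (`adj[x]` is a list of `(neighbor, length)` pairs) and the name `enumerate_paths` are assumptions.

```python
def enumerate_paths(adj, u, v, T):
    """All simple u-v paths p with d(p) <= T, each as a tuple of vertices."""
    found = []

    def extend(path, on_path, length):
        x = path[-1]
        if x == v:
            found.append(tuple(path))
            return
        for y, d_xy in adj.get(x, ()):
            # Skip vertices that would close a cycle or exceed the threshold.
            if y not in on_path and length + d_xy <= T:
                path.append(y)
                on_path.add(y)
                extend(path, on_path, length + d_xy)
                path.pop()
                on_path.remove(y)

    extend([u], {u}, 0)
    return found
```

Because every edge has length greater than $q_{min}$, the recursion depth is at most $T_{0}$, so the search visits at most $O(\delta^{T_{0}})$ prefixes per pair, where $\delta$ is the maximum degree.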

IV-A2 $O(\log n)$ -approximation

The first approximation algorithm for MULTI-PCUT is given in Alg. 1. The general approach is as follows. After the enumeration of all paths in $\mathcal{P}$ , the algorithm greedily selects the node that intersects the largest number of paths normalized by the vertex cost until all paths in $\mathcal{P}$ have been covered. By the proof of Lemma 2, when all such paths in $\mathcal{P}$ are covered, we have a feasible solution $W$ .

An explicit description of the algorithm is given in Alg. 1. In lines 1 – 3, the enumeration described above is performed. Next, the algorithm initializes $W$ , the set of vertices chosen, and $C$ , the set of paths covered by $W$ to $\emptyset$ in line 4. The while loop on line 5 tests whether any paths satisfying $d(p)\leq T$ still exist in the network. If so, it chooses the node $i^{*}$ which covers the most such extant paths into the set $W$ on line 11 and updates $C$ accordingly on line 12.
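The greedy selection can be sketched in Python as follows, given the enumerated paths as tuples of vertices. The sketch assumes that only vertices listed in the `cost` dictionary may be removed (so terminals can be excluded by omitting them) and that ties are broken by vertex name; both conventions, and the name `greedy_cover`, are illustrative assumptions rather than part of Alg. 1.

```python
def greedy_cover(paths, cost):
    """Greedy weighted cover: repeatedly pick the vertex maximizing
    (number of uncovered paths it lies on) / (its cost)."""
    uncovered = set(range(len(paths)))
    chosen = []
    while uncovered:
        hits = {}
        for idx in uncovered:
            for x in set(paths[idx]):
                if x in cost:
                    hits[x] = hits.get(x, 0) + 1
        if not hits:
            raise ValueError("some path contains no removable vertex")
        best = max(sorted(hits), key=lambda x: hits[x] / cost[x])
        chosen.append(best)
        # Every path through the chosen vertex is now broken.
        uncovered = {idx for idx in uncovered if best not in paths[idx]}
    return chosen
```

For example, with paths `("s","a","t")`, `("s","b","t")`, `("s","a","c","t")` and unit costs on `a`, `b`, `c`, the sketch first picks `a`, which lies on two paths, and then `b`.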

Theorem 3.

Alg. 1 achieves a performance guarantee of $O(\log n)$ with respect to the optimal solution with running time bounded by $O(kn^{T_{0}})$ . Furthermore, for each $n$ , there exists an instance of the single pair PCUT problem where Alg. 1 returns a solution of cost greater than a factor $\Omega(\log n)$ of the optimal.

Proof.

The performance ratio of $O(\log n)$ follows from the fact that IP 1 is a covering integer program corresponding to the set cover problem with at most $O(n^{T_{0}+1})$ elements (the paths), for which the greedy algorithm has the ratio $O(T_{0}\log n)$ [6]; since $T_{0}$ is a constant, this ratio is $O(\log n)$.

Next, we construct a tight example for Alg. 1, which holds even in the case of the single pair T-PCUT, for $T=5$. At the beginning of the construction, $G$ contains two isolated nodes, $s,t$. Add nodes $g_{1},\ldots,g_{k}$ and edges $(s,g_{i})$ for each $g_{i}$. Next, add nodes $o_{1},o_{2}$ to the graph, along with edges $(o_{1},t),(o_{2},t)$. Then, for each $g_{i}$, add $2^{i-1}$ disjoint paths of length 2 between $g_{i}$ and $o_{1}$, and similar paths between $g_{i}$ and $o_{2}$. Let $d(u,v)=1$ for all edges in $G$. For $k=3$, see Fig. 4 in the Appendix for a depiction of the construction. Then Alg. 1 will select nodes $g_{k},\ldots,g_{1}$ in that order, while the optimal solution is $\{o_{1},o_{2}\}$. ∎

IV-A3 $(T_{0}+1)$ -approximation

Next, we present FEN in Alg. 2, a frequency-based rounding algorithm for LP 1. FEN first enumerates $\mathcal{P}$ and constructs LP 1. In this covering program, each path intersects at most $T_{0}+1$ nodes, as discussed above. Hence, the algorithm next solves LP 1 to obtain optimal fractional solution $\bar{w}$. Next, an integral solution $\hat{w}$ is obtained by rounding

$$\hat{w}_{i}=\begin{cases}1&\text{if }\bar{w}_{i}\geq 1/(T_{0}+1),\\0&\text{otherwise.}\end{cases}$$

That $\hat{w}$ is a feasible solution follows from the fact that for each $(u,v)\in\mathcal{S}$ and $p\in\mathcal{P}(u,v)$, the constraint $\sum_{i=1}^{n}A_{p,i}^{(u,v)}\bar{w}_{i}\geq 1$ holds, so at least one $\bar{w}_{i}$ in the sum must satisfy $\bar{w}_{i}\geq 1/(T_{0}+1)$, since the sum has at most $T_{0}+1$ nonzero terms. Furthermore, since the optimal fractional solution has cost at most the cost of the optimal integral solution, and the cost of $\hat{w}$ is within factor $T_{0}+1$ of $\bar{w}$, it follows that FEN is a $(T_{0}+1)$-approximation algorithm.
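The rounding step can be sketched in Python under the assumption, taken from the feasibility argument above, that a vertex is kept exactly when its fractional value is at least $1/(T_{0}+1)$. The fractional solution is assumed to come from any LP solver and is passed in as a dictionary; the small tolerance `eps` and the names `round_fractional` and `is_cover` are illustrative assumptions.

```python
def round_fractional(w_bar, T0, eps=1e-9):
    """Keep every vertex whose fractional value reaches 1 / (T0 + 1)."""
    threshold = 1.0 / (T0 + 1)
    return {i for i, value in w_bar.items() if value >= threshold - eps}


def is_cover(paths, W):
    """True if every path contains at least one vertex of W."""
    return all(any(x in W for x in p) for p in paths)
```

For instance, with $T_{0}=2$, paths `("a","b","c")` and `("c","d")`, and the feasible fractional solution $\bar{w}=(0.1,0.2,0.7,0.3)$ on `a`, `b`, `c`, `d`, only `c` reaches the threshold $1/3$, and `c` alone covers both paths.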

IV-B Probabilistic approximation algorithm

In this section, we propose another approximation algorithm, for T-PCUT and T-MULTI-PCUT when the length function is bounded below. This algorithm, GEST, is intended to more easily handle large values of $T_{0}$ than the algorithms in the preceding section. The key for GEST is a procedure to efficiently estimate the number of paths between $(u,v)$ of length at most $T$ that each vertex $i\in V$ lies upon, which will guide the greedy selection of nodes. By theoretical analysis, we demonstrate that GEST is not only efficient, but also has a probabilistic performance guarantee.

IV-B1 Algorithm overview and key results

The GEST algorithm is detailed in Alg. 3. As an overview, GEST iteratively selects nodes for removal based upon its estimation procedure, until the distance between all pairs $(u,v)\in\mathcal{S}$ exceeds $T$. Define $\tau(S)$ and $\tau_{uv}(S)$ as the number of paths in $\cup_{(u,v)\in\mathcal{S}}\mathcal{P}(u,v)$ and in $\mathcal{P}(u,v)$, respectively, that $S$ intersects, and $\sigma(S)$, $\sigma_{uv}(S)$ as the corresponding estimators. From the definition, we have $\tau(S)=\sum_{(u,v)\in\mathcal{S}}\tau_{uv}(S)$ and $\sigma(S)=\sum_{(u,v)\in\mathcal{S}}\sigma_{uv}(S)$. In each iteration of GEST, the node that maximizes $\sigma(W\cup\{i\}),i\in V\backslash W$ will be added to $W$, the set of selected nodes. The details of the estimator $\sigma(S)$ and the path sampling method are discussed in Sections IV-B2 and IV-B3, respectively.

In the following, we will prove Theorem 4, which establishes the key results on the probabilistic approximation ratio and time complexity of GEST. Before the proof, we introduce Lemma 3 on the number of samples $L$ for each pair to guarantee the accuracy of $\sigma(S)$ . The proof of Lemma 3 is provided in Section IV-B4. The parameter $\alpha$ in $L$ can be used to balance running time and accuracy of the algorithm.

Lemma 3.

Let the number of paths sampled for each $(u,v)\in\mathcal{S}$ be at least $L=3k^{2}\log(2n^{2})/(2\alpha^{2})$. Then, given a set $S\subset V$ and $\delta$ as the maximum degree in $G$, the inequality $|\tau(S)-\sigma(S)|<\alpha\delta^{T_{0}}$ holds with probability at least $1-1/n^{3}$.

Theorem 4.

Given an instance $(G,c,d,\mathcal{S})$ of uniform vertex cost T-MULTI-PCUT whose length function $d$ is bounded below, let $\delta$ be the maximum degree in $G$. With probability at least $1-1/n$, Alg. 3 returns a feasible solution $W$ with cost within ratio $O\left(\alpha\delta^{T_{0}}+\log|\mathcal{S}|\right)$ of optimal. The running time of Alg. 3 is $O(k^{3}n\log(2n^{2})/(2\alpha^{2}))$.

Proof.

Let $\Delta_{x}\tau(S)=\tau\left(S\cup\{x\}\right)-\tau(S),\forall S\subseteq V,\forall x\in V$ ; then for any $S\subset T$ , observe that

[TABLE]

We will apply Lemma 3 and consider that the inequality therein always holds; later, we will consider the probability that the inequality in Lemma 3 does not hold for some application. Let $\varepsilon=4\alpha\delta^{T_{0}}$ and apply Lemma 3. By (5), we have:

[TABLE]

Observe that Alg. 3 at each iteration picks $a_{i}$ such that $a_{i}=\operatorname*{arg\,max}\Delta_{a_{i}}\sigma(\{a_{1},\ldots,a_{i-1}\})$ . Let $A_{i}=\{a_{1},\ldots,a_{i}\}$ be the choice of Alg. 3 after $i$ iterations, and let $A_{g}$ be the final solution returned by the algorithm. Let $o=OPT$ be the size of an optimal solution $C=\{c_{1},\ldots,c_{o}\}$ satisfying $\sigma(C)\geq P$ , where $P$ is the number of paths in $\mathcal{P}$ ; notice that $\sigma(S)\geq P$ is determined in Alg. 3 by testing if all pairs in $\mathcal{S}$ satisfy $d(s,t)>T$ after removal of $S$ . Then

[TABLE]

Therefore, $P-\sigma(A_{i+1})-\varepsilon\leq\left(1-\frac{1}{o}\right)(P-\sigma(A_{i})).$ Then

[TABLE]

From here, there exists an $i$ such that the following differences satisfy

[TABLE]

Thus, by inequalities (8) and (9), $o\leq P\exp\left(\frac{-i}{o}\right),$ and $i\leq o\log\left(\frac{P}{o}\right).$ By inequality (10) and the assumption on the termination of the algorithm, the greedy algorithm adds at most $o(1+\varepsilon)$ more elements, so $g\leq i+o(1+\varepsilon)\leq o\left(1+\varepsilon+\log\left(\frac{P}{o}\right)\right).$ In Alg. 3, we require the guarantee from Lemma 3 for all nodes $i\in V\backslash W$ for all iterations, which can happen $n^{2}$ times in the worst case. Therefore, by union bound, the probability of having the desired approximation ratio is at least $1-1/n$ . The running time follows from the choice of $L$ . Alg. 3 needs to sample $k$ sets of $L$ samples per iteration and in the worst case, there can be $n$ iterations. ∎

IV-B2 The estimators

Let $u,v\in V$, and let $\mathcal{P}^{i}(u,v)$ be the set of all paths $p$ between $u,v$ satisfying the distance constraint $d(p)\leq T$ and additionally vertex $i\in p$. We want to efficiently estimate the quantity $\tau_{uv}(W\cup\{i\}):=|\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)|$ for all $i\in V\backslash W$. To achieve this estimation, we adapt the approach of Roberts et al. [27]; their estimators are for the total number of simple paths in a graph, while we require an estimate of the number of simple paths each vertex $v\in G$ lies upon, where the length of each path is restricted to be at most $T$.

To define an estimator $\sigma_{uv}(W\cup\{i\})$ , we proceed in the following way. Let $q$ be any simple path between $u$ and $v$ ; we will define a probability distribution $h(q)$ on paths $q$ satisfying $h(q)\neq 0$ if $q\in\mathcal{P}(u,v)$ ; the distribution $h(q)$ is defined in Section IV-B3 and will have domain $\mathcal{R}(u,v)$ , a set of simple paths starting from $u$ . We will then independently sample paths $q_{1},\ldots,q_{L}$ from $h(q)$ and define the estimator

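$$\sigma_{uv}(W\cup\{i\})=\frac{1}{L}\sum_{l=1}^{L}\frac{I\left(q_{l}\in\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)\right)}{h(q_{l})},$$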

where $I\left(q_{l}\in\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)\right)$ is an indicator random variable that takes value $1$ if $(W\cup\{i\})\cap q_{l}\neq\emptyset$ and $q_{l}\in\mathcal{P}(u,v)$, and $0$ otherwise.

Lemma 4.

$\sigma_{uv}(W\cup\{i\})$ is an unbiased estimator of $\tau_{uv}(W\cup\{i\})$.

Proof.

Let $Y(q)$ be the random variable

$$Y(q)=\frac{I\left(q\in\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)\right)}{h(q)}\qquad(12)$$

for $q\in\mathcal{R}(u,v)$. Then the expectation of $Y(q)$ is

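$$\mathbf{E}\left[Y(q)\right]=\sum_{q\in\mathcal{R}(u,v)}h(q)\,\frac{I\left(q\in\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)\right)}{h(q)}=\left|\cup_{j\in W\cup\{i\}}\mathcal{P}^{j}(u,v)\right|=\tau_{uv}(W\cup\{i\}).$$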

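From here, the lemma follows by linearity of expectation, since $\sigma_{uv}(W\cup\{i\})$ is the average of the independent samples $Y(q_{1}),\ldots,Y(q_{L})$. ∎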
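A small Python sketch may help make the estimator concrete. It assumes that each sample is a pair `(path, h)` consisting of a sampled path and its sampling probability $h(q)$, that a sample is valid exactly when it ends at $v$ (the sampler is assumed to enforce $d(q)\leq T$), and that the estimator averages the indicator divided by $h(q)$ over the samples, as unbiasedness requires; the name `estimate_tau` is hypothetical.

```python
from fractions import Fraction


def estimate_tau(samples, v, S):
    """Average of I(q ends at v and meets S) / h(q) over the sampled paths."""
    total = Fraction(0)
    for path, h in samples:
        if path[-1] == v and any(x in S for x in path):
            total += 1 / Fraction(h)
    return total / len(samples)
```

As a check, suppose $\mathcal{R}(u,v)$ consists of the paths $sat$, $sbt$, and the dead end $sbc$, with probabilities $1/2$, $1/4$, $1/4$. A sample multiset in which each path appears in proportion to its probability gives estimates equal to the true counts: one valid path through $a$, one through $b$, and two through $\{a,b\}$.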

IV-B3 Definition of $h(q)$ and path sampling

Next, we define the probability distribution $h(q)$ on $\mathcal{R}(u,v)$, the set of all simple paths $q=u_{0}u_{1}\ldots u_{l}$ that start from $u$ and either end at $v$ or end at another vertex $v^{\prime}$ and are maximal; that is, adding any vertex $u_{l+1}$ to $q$ creates a cycle or causes the length of the path to exceed $T$. We define the probability of a path $q\in\mathcal{R}(u,v)$ sequentially: $h(q):=\prod_{i=1}^{l}h(u_{i}|u_{0}u_{1}\ldots u_{i-1}).$ Notice that $h(u_{0})=h(u)=1$ since $u$ is always chosen as the starting vertex. Furthermore, $h(u_{i}|u_{0}\ldots u_{i-1})$ is the uniform distribution over the vertices available to be chosen as the next vertex of the path; that is, those $u_{i}$ that do not create a cycle and satisfy $d(u_{0}\ldots u_{i})\leq T$.

The definition of $h$ lends itself to the following sequential sampling algorithm, shown in Alg. 4. In line 1, the algorithm chooses $u_{0}=u$ with probability $h=1$. Let $N(u_{i})$ be the set of neighbors of $u_{i}$ not previously chosen into the path $q$. If $N(u_{i})=\emptyset$ or $u_{i}=v$, the algorithm terminates. Otherwise $u_{i+1}$ is chosen from $N(u_{i})$ uniformly with probability $1/|N(u_{i})|$ and the value of $h$ is updated accordingly.
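The sequential sampling procedure can be sketched in Python as follows. The sketch assumes the adjacency format `adj[x] = [(neighbor, length), ...]`, restricts the available neighbors to those that keep the path simple and within length $T$ (combining the two conditions stated above), and takes the random source `rng` as a parameter so that runs are reproducible; the name `sample_path` is hypothetical.

```python
import random
from fractions import Fraction


def sample_path(adj, u, v, T, rng):
    """Draw one path from u by uniform sequential extension.

    Returns (path, h), where h is the probability of drawing this path."""
    path, on_path, length, h = [u], {u}, 0, Fraction(1)
    while path[-1] != v:
        options = [(y, d) for y, d in adj.get(path[-1], ())
                   if y not in on_path and length + d <= T]
        if not options:
            break  # maximal path that does not reach v
        y, d = rng.choice(options)
        h /= len(options)  # uniform choice among the available neighbors
        path.append(y)
        on_path.add(y)
        length += d
    return path, h
```

A sample is valid for the pair $(u,v)$ when the returned path ends at $v$; otherwise it is a maximal path in $\mathcal{R}(u,v)$ that contributes zero to the estimator.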

IV-B4 Bound on number of samples required

In this section, we prove Lemma 3, which bounds how many path samples are required to ensure $|\tau_{uv}(S)-\sigma_{uv}(S)|\leq\alpha\delta^{T_{0}}/k$. To this end, we require Hoeffding’s inequality.

Theorem (Hoeffding’s inequality).

Suppose $Y_{1},\ldots,Y_{L}$ are independent random variables in $[0,K]$ . Let $Y=\frac{1}{L}\sum_{i=1}^{L}Y_{i}$ . Then the probability $\mathbf{P}\left(\left|Y-\mathbf{E}(Y)\right|\geq t\right)\leq 2\exp\left(\frac{-2Lt^{2}}{K^{2}}\right).$

Proof for Lemma 3.

Consider $Y_{i}=Y(q_{i})$, where $Y(q)$ is the random variable defined in (12). Let $K\leq\delta^{T_{0}}$, which is the maximum value of $Y_{i}$, and $t=\alpha\delta^{T_{0}}/k$. Next, we require the probability bound from Hoeffding’s inequality to be less than $\frac{1}{n^{3}k}$. Solving for the number of samples yields $L\geq 3k^{2}\log(2n^{2})/(2\alpha^{2}).$ Therefore, when the number of samples is at least $L$, we can guarantee $|\tau_{uv}(S)-\sigma_{uv}(S)|\leq\alpha\delta^{T_{0}}/k$ for one pair $(u,v)\in\mathcal{S}$ with probability $1-\frac{1}{n^{3}k}$. Then, the inequality holds for all $(u,v)\in\mathcal{S}$ with probability $1-1/n^{3}$ by union bound. Since $\tau(S)$ and $\sigma(S)$ are the summations, $|\tau(S)-\sigma(S)|$ is at most $\alpha\delta^{T_{0}}$ when all the inequalities hold. ∎
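The step of solving for the number of samples can be spelled out as follows. This is a sketch that assumes natural logarithms and uses $k\leq n^{2}$, which holds because $\mathcal{S}$ is a set of pairs of vertices.

```latex
\begin{align*}
2\exp\!\left(\frac{-2Lt^{2}}{K^{2}}\right)\leq\frac{1}{n^{3}k}
  &\iff \frac{2Lt^{2}}{K^{2}}\geq\log\!\left(2n^{3}k\right)
   \iff L\geq\frac{K^{2}}{2t^{2}}\log\!\left(2n^{3}k\right).\\
\intertext{With $K\leq\delta^{T_{0}}$ and $t=\alpha\delta^{T_{0}}/k$ we have $K^{2}/t^{2}\leq k^{2}/\alpha^{2}$, so it suffices that}
L&\geq\frac{k^{2}}{2\alpha^{2}}\log\!\left(2n^{3}k\right).\\
\intertext{Finally, $k\leq n^{2}\leq 4n^{3}$ gives $2n^{3}k\leq 8n^{6}=(2n^{2})^{3}$, hence}
\frac{k^{2}}{2\alpha^{2}}\log\!\left(2n^{3}k\right)
  &\leq\frac{3k^{2}\log\!\left(2n^{2}\right)}{2\alpha^{2}}.
\end{align*}
```

Thus the sample count stated in Lemma 3 is sufficient, with some slack, for the per-pair failure probability $\frac{1}{n^{3}k}$.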

IV-B5 Further modification to GEST

In this section, we discuss a simple modification to GEST; this modification, GESTA, improves performance for the T-MULTI-PCUT problem.

GESTA: In practice, valid path samples in $\mathcal{P}$ become harder to obtain as GEST progresses nearer to a solution to T-MULTI-PCUT; this fact results from most valid paths originally in the network having already been broken. Therefore, we propose GESTA, a modification to Alg. 3 as follows: if GESTA performs $L$ samples, as in line 5 of GEST, and obtains no valid paths in $\mathcal{P}(u,v)$ for any $(u,v)\in\mathcal{S}$ , then GESTA computes a shortest path between a randomly chosen pair $(u,v)$ in $\mathcal{S}$ for which $d(u,v)\leq T$ . The algorithm then chooses the cheapest node on this path into its solution, and continues with the while loop on line 2 of GEST.
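The fallback step of GESTA can be sketched in Python as follows. The sketch assumes that the shortest path is computed with Dijkstra's algorithm in the graph with the current solution $W$ removed, and that every vertex on the path, including its endpoints, is eligible to be chosen; the latter is an assumption not fixed by the description above. The function names are hypothetical.

```python
import heapq
import random


def shortest_path(adj, u, v, removed=frozenset()):
    """Dijkstra's algorithm avoiding `removed`; returns (distance, path)."""
    if u in removed or v in removed:
        return float("inf"), None
    dist, prev = {u: 0}, {}
    heap = [(0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        if x == v:
            break
        for y, d_xy in adj.get(x, ()):
            if y in removed:
                continue
            nd = d + d_xy
            if y not in dist or nd < dist[y]:
                dist[y], prev[y] = nd, x
                heapq.heappush(heap, (nd, y))
    if v not in dist:
        return float("inf"), None
    path = [v]
    while path[-1] != u:
        path.append(prev[path[-1]])
    return dist[v], path[::-1]


def gesta_fallback(adj, cost, pairs, T, W, rng):
    """Pick the cheapest vertex on a shortest path of a random pair
    that is still within distance T; None if no such pair remains."""
    close = [(u, v) for (u, v) in pairs if shortest_path(adj, u, v, W)[0] <= T]
    if not close:
        return None
    u, v = rng.choice(close)
    _, path = shortest_path(adj, u, v, W)
    return min(path, key=lambda x: cost[x])
```

Since at least one pair with $d(u,v)\leq T$ exists whenever the current solution is infeasible, this step always makes progress when sampling fails.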

V Experimental evaluation

In this section, we experimentally evaluate our proposed algorithms on the QoS vulnerability assessment TCVA in Section V-B. In Section V-A, we discuss the methodology of our evaluation.

V-A Datasets and methodology

Synthesized datasets: To generate topologies, we used the well-known Internet topology generator BRITE [28], which we employed to generate (1) Flat Router-Level (RL) only, (2) Flat Autonomous System level (AS) only, and (3) hierarchical top-down datasets, consisting of AS and RL, with each AS divided into routers. We also used topologies generated according to Erdős-Rényi (ER) random graphs. To simulate a QoS metric, edges were weighted uniformly in the interval $[1,10]$, following [29, 14]. The dataset statistics are as follows: ER1, an ER graph with $n=1000$, $m=49995$; RL1, router-level graph with $n=5000$, $m=250000$, generated by BRITE with default parameters and Waxman model; RL2, same as RL1 except $n=1000,m=2000$; RL3, same as RL1 except $n=100$, $m=200$; AS1, an AS-level graph generated by BRITE with default parameters and $n=10000,m=498725$; and finally, H1, a hierarchical BRITE top-down graph with 200 autonomous systems and 100 routers per AS, with $n=20000,m=660604$.

Algorithms for TCVA: For TCVA, we compared the following algorithms with GEN (Alg. 1), FEN (Alg. 2), and GESTA (Section IV-B5):

• OPT: the optimal solution of IP 1, which was implemented using the IP solver included in the open-source GNU Linear Programming Kit (GLPK) [30]; and

• MC: the classical minimum-cut algorithm implemented with the Goldberg-Tarjan algorithm [31] for maximum flow, only employed when the size of the target set $|\mathcal{S}|=1$.

The cost function on vertices employed for TCVA is specified in each section; when cost is uniform, we refer to the size of the solution returned by each algorithm. The path enumeration required for GEN, FEN, and OPT was parallelized, using at most 25 threads. This parallelization was accomplished by assigning distinct initial segments of paths to distinct threads. Also, when $k>1$ , enumerations for distinct pairs were assigned to distinct threads. Total computation time is the sum of the computation time over all threads. Algorithms were limited to one hour of wall-clock time before termination; this could be much more computation time than one hour depending on the level of parallelization. All times shown in the results are total computation time. All experiments were performed on a machine with Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz and 392 GB RAM.

V-B Evaluation for Targeted Assessment (TCVA)

V-B1 On choice of target set

In order to evaluate the algorithms for TCVA, it is necessary to choose the target set $\mathcal{S}$ ; in practice, this choice is entirely up to the user. First, we discuss the motivation and effectiveness of choosing the target sets $\mathcal{S}$ uniformly randomly; next, we observe how restricting the elements of the target set based upon their degree affects the size of the optimal solution.

Uniformly random: One method of evaluating the performance of our algorithms for TCVA is to measure the average size (or cost) of the solution over all possible choices of the target set $\mathcal{S}$ . To avoid the large computation time involved in running each algorithm on each possible choice of $\mathcal{S}$ , we approximated this value by averaging over $N$ uniformly random choices of $\mathcal{S}$ . To justify this approximation, we show in Fig. 5(a) the average cost of the solution returned by each algorithm versus $N$ on the RL1 dataset, with $k=|\mathcal{S}|=1000$ and $T=4$ . Also shown is the sample standard deviation of the $N$ values for the cost. While the value of the mean fluctuates, the value of these fluctuations is less than $10\%$ despite the huge number ${5000\choose 1000}$ , the number of possible choices of $\mathcal{S}$ . Qualitatively similar results were found for the other datasets and $k$ values. Therefore, in the remainder of this section we average results over $N=10$ uniformly random choices of $\mathcal{S}$ unless otherwise stated, which we found sufficient to identify trends in the results.

By degree: Next, we observed how restricting the choice of the target set by degree impacts the size of the optimal solution. For the purposes of this assessment, let $\zeta\in(0,1)$, and let $\delta$ be the maximum degree in graph $G=(V,E)$; define the following two sets of vertices: $H=\{v\in V:d(v)\geq\zeta\delta\}$, $L=\{v\in V:d(v)\leq(1-\zeta)\delta\}$. Then we may restrict a source or target node to lie uniformly randomly within one of these sets. We consider four different schemes of choosing the target set based upon $H,L$: HL, HH, LL, and RR. In HL, for each pair $(s,t)\in\mathcal{S}$, $s$ is chosen uniformly randomly from $H$, and $t$ is chosen uniformly randomly from $L$. HH and LL are defined analogously, and RR chooses both nodes of each pair uniformly randomly from the entire vertex set, as in the previous section.
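The degree-based schemes can be sketched in Python as follows, assuming an undirected graph stored as an adjacency dictionary in which $d(v)$ is the number of neighbors of $v$. The helper `sample_pairs` draws $k$ distinct pairs uniformly from all admissible pairs, which is one simple way to keep the pairs in $\mathcal{S}$ distinct and is an assumption about the sampling details; both function names are hypothetical.

```python
import random


def degree_sets(adj, zeta):
    """Return (H, L): vertices of degree >= zeta*delta and <= (1-zeta)*delta."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    delta = max(deg.values())
    H = {v for v, d in deg.items() if d >= zeta * delta}
    L = {v for v, d in deg.items() if d <= (1 - zeta) * delta}
    return H, L


def sample_pairs(sources, targets, k, rng):
    """Draw k distinct pairs (s, t) with s in sources, t in targets, s != t."""
    candidates = [(s, t) for s in sorted(sources) for t in sorted(targets) if s != t]
    if k > len(candidates):
        raise ValueError("not enough distinct pairs")
    return rng.sample(candidates, k)
```

Under these assumptions, the HL scheme corresponds to `sample_pairs(H, L, k, rng)`, HH to `sample_pairs(H, H, k, rng)`, LL to `sample_pairs(L, L, k, rng)`, and RR to passing the full vertex set twice.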

In Fig. 5(b), we plot the size of the optimal solution to TCVA versus $\zeta$ for each scheme of target set selection, averaged over $N=10$ choices of $\mathcal{S}$ . The results for LL and RR are as expected; RR shows no dependence on $\zeta$ , and LL is approximately equal to RR for low values of $\zeta$ before decreasing monotonically as $\zeta$ approaches

However, HH and HL initially increase before decreasing below RR; this behavior is explained by the cardinality of $H$ and $L$ in addition to the restriction upon the degree. As $\zeta$ increases, the cardinalities of $H$ and $L$ decrease; as these cardinalities decrease, it becomes more likely that an element from one pair in the target set appears in another pair, even though all pairs in the target set $\mathcal{S}$ are distinct. As the fraction of nodes appearing in multiple pairs increases, it becomes easier to pseudo-separate the target set. This effect counteracts the fact that higher degree nodes are more difficult to pseudo-separate.

V-B2 Size of target set

In this section, we fixed a constant $T$ for each dataset, let vertices have uniform cost, and observed the behavior of the algorithms when $k=|\mathcal{S}|$ was incremented from $k=200$ to $2000$ . The only algorithm able to run on all datasets and $k$ values was GESTA, and it demonstrated good performance (always within a factor of 2 in solution cost) in comparison with OPT while running faster than the other algorithms by a factor of more than 10. Representative results are shown in the first two columns of Fig. 6. GEN outperforms GESTA and is the algorithm consistently the closest in performance to OPT when both run. Second best alternates between GESTA and FEN on RL1 and AS1, respectively. For each dataset, at some $k$ value, OPT exceeds one hour of computation time and is no longer included in the results. Notice on our largest dataset H1, with $T=10$ , neither GEN nor FEN can run after $k=600$ . Both of these algorithms require the enumeration of $\mathcal{P}$ , which was unable to complete after this value of $k$ on this dataset. However, on RL1 and AS1, GEN and FEN continue to finish within one hour throughout the experiment; notice from the running time shown in Fig. 6(e) that the asymptotic behavior of the running time for fixed $T$ of GEN is linear in $k$ , consistent with Theorem 3. In practice, GESTA runs faster than GEN and FEN by a constant factor of more than 10 on all inputs.

V-B3 Varying threshold $T$

In this section, we consider two choices of $k$ : $k=1$ , and $k=1000$ . We then observed the behavior of the algorithms when $T$ was incremented; representative results are shown in the last two columns of Fig. 6. When $k=1$ , we compared the performance of our algorithms to the classical MC algorithm (Fig. 6(c)); as expected, MC returned a result independent of $T$ , which demonstrates the inadequacy of solutions to the classical cutting problems for our assessments: for example, at $T=7$ , MC is returning a solution of size more than four times the optimal, and it does comparatively worse for lower values of $T$ . Also, we observe experimentally that as $T$ increases, we recover the classical version of our problem: past $T=13$ , GESTA is completely separating the input pair, and returning a solution of size similar to MC.

As in the previous section, the only algorithm able to run for all parameter values was GESTA, which maintained performance within factor 2 of OPT. Although not as scalable as GESTA, GEN consistently outperformed the other algorithms in size of solution. On ER1, shown in Fig. 6(c), GEN was limited by the path enumeration time after $T=11$, and FEN and OPT were unable to finish solving LP 1; this LP solution is necessary for the rounding of FEN and the integer solver of GLPK. Indeed, the running time of GEN and FEN increased exponentially with $T$ (Fig. 6(h)) as expected.

V-B4 Discussion

Throughout the TCVA experiments, we consistently observed the best performance compared to the optimal by GEN, which was able to run in many situations where OPT could not finish. Furthermore, GEN scales well with the size of the target set $|\mathcal{S}|$. However, as the threshold value $T$ becomes relatively large, LP 1 becomes much larger and thus more difficult to solve; for this reason, GEN was unable to finish when $T$ became large. In these cases, we demonstrated that the approach of GESTA scales well with both the target set size $|\mathcal{S}|$ and the threshold value $T$, while maintaining good performance with respect to the optimal.

VI Conclusions and Future Work

In this work, we introduced three new combinatorial pseudocut problems. We analyzed the computational complexity of these problems, and we provided three approximation algorithms. We used the pseudocut problems to formulate a vulnerability assessment TCVA with respect to an arbitrary additive QoS metric on a communications network. Future work would include extending this assessment to incorporate more than one QoS metric. This extension is likely to be difficult, as the problem of finding a routing path satisfying two or more QoS constraints is NP-hard, although approximation algorithms do exist for this problem [29]. In addition, the computational complexity of the uniform edge length version of our simplest problem, T-PCUT, is left open; our NP-hardness proof required nonuniform edge lengths and we provided polynomial-time algorithms only for special cases.

In our experimental evaluation, we found our $O(\log n)$-approximation GEN for T-MULTI-PCUT to consistently return the solution closest to the optimal value, although its asymptotic ratio is worse than the $(T_{0}+1)$ ratio of FEN. However, for applications that demand a high value for $T$, our experiments showed that GEN and FEN may be unsuitable, despite the ease with which path enumeration may be parallelized; for this case, minor modifications to our probabilistic algorithm GEST were shown to give good performance in practice. The modifications to GEST were necessary because of the difficulty of obtaining valid path samples when GEST is close to a feasible solution; future work would include boosting the ability of GEST to obtain valid samples of paths between a terminal pair $(u,v)\in\mathcal{S}$, so that the heuristic modification GESTA becomes unnecessary.

-A Edge versions

Let $T$ be an arbitrary but fixed constant throughout this section. The problems will take as input a triple $(G,c,d)$, where $G$ is a directed graph $G=(V,E)$; $c:E\to\mathbf{R}^{+}$ is a cost function on edges representing the difficulty of removing each edge; and $d:E\to\mathbf{R}^{+}$ is a length function on edges. Although both $c$ and $d$ may be considered weight functions, we use cost for $c$ and length for $d$ to avoid confusion. The distance $d(u,v)$ between two vertices is the length of the $d$-weighted, directed, and shortest path between $u$ and $v$; the cost $c(W)$ of a set $W$ of edges is the sum of the costs of individual edges in $W$.

Problem 4 (Minimum $T$ -pseudocut (edge version)).

Given triple $(G,c,d)$ and a pair $(s,t)$ of vertices of $G$ , determine a minimum cost set $W\subset E$ of edges such that $d(s,t)>T$ after the removal of $W$ from $G$ .

Problem 5 (Minimum $T$ -multi-pseudocut (edge version)).

Given triple $(G,c,d)$ , and a target set of pairs of vertices of $G$ , $\mathcal{S}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{k},t_{k})\}$ , determine a minimum cost set $W$ of edges such that $d(s_{i},t_{i})>T$ for all $i$ after the removal of $W$ from $G$ .

-B Algorithms for edge versions

If paths from $u$ to $v$ are defined as sequences of edges instead of vertices, then, to approximate the edge versions, we can define analogous approximation algorithms to GEN, FEN, GEST, and ENBI with analogous performance bounds. For example, we define an analogous program to IP 1 for the edge version of MULTI-PCUT below.

We will consider simple paths $p=p_{0}p_{1}\ldots p_{l}$ with each $p_{j}\in E$ ; that is, paths containing no cycles. Let $\mathcal{P}(s_{i},t_{i})$ denote the set of simple paths $p$ between $(s_{i},t_{i})\in\mathcal{S}$ that satisfy the condition $d(p)\leq T$ . If an edge $e$ lies on path $p$ , we write $e\in p$ . Index the edge set of $G$ as $\{1,\ldots,n\}$ . Let $A^{(u,v)}_{p,i}=1$ if edge $i$ lies on path $p\in\mathcal{P}(u,v)$ , where $(u,v)\in\mathcal{S}$ . If $i\not\in p$ , let $A^{(u,v)}_{p,i}=0$ . Also, let variable $w_{i}=1$ if edge $i$ is to be chosen into the set of edges $W$ , and $w_{i}=0$ otherwise. Finally, denote the cost of choosing edge $i$ as $c_{i}$ , and let vectors $w=(w_{1},\ldots,w_{n})$ and $c=(c_{1},\ldots,c_{n})$ . Then, the covering 0-1 integer program formulation is as follows.
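To make the definitions of $\mathcal{P}(u,v)$ and $A^{(u,v)}_{p,i}$ concrete, the sketch below enumerates the simple edge paths with $d(p)\leq T$ by depth-first search and builds one 0-1 incidence row per path. It is illustrative only: the function names `short_simple_paths` and `incidence_rows` are hypothetical, edges are indexed from 0 rather than 1, and the enumeration is exponential in the worst case.

```python
def short_simple_paths(lengths, s, t, T):
    """Enumerate simple s-t paths, as tuples of edges, with total length <= T."""
    adj = {}
    for (u, v), length in lengths.items():
        adj.setdefault(u, []).append((v, length))
    paths = []

    def extend(node, visited, path, total):
        if node == t:
            paths.append(tuple(path))
            return
        for v, length in adj.get(node, []):
            # Prune revisited vertices (keeps paths simple) and paths that
            # already exceed the threshold T.
            if v not in visited and total + length <= T:
                visited.add(v)
                path.append((node, v))
                extend(v, visited, path, total + length)
                path.pop()
                visited.remove(v)

    extend(s, {s}, [], 0.0)
    return paths

def incidence_rows(paths, edge_index):
    """One 0-1 row per path p: entry i is 1 exactly when edge i lies on p."""
    rows = []
    for p in paths:
        row = [0] * len(edge_index)
        for e in p:
            row[edge_index[e]] = 1
        rows.append(row)
    return rows

# Illustrative instance (assumed data, not from the paper).
lengths = {("s", "a"): 1, ("a", "t"): 1,
           ("s", "b"): 2, ("b", "t"): 2, ("s", "t"): 5}
edge_index = {e: i for i, e in enumerate(lengths)}
paths = short_simple_paths(lengths, "s", "t", T=4)
A = incidence_rows(paths, edge_index)
```

With $T=4$ the direct edge of length 5 is excluded, so only the two-edge routes through `a` and `b` contribute rows.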

Integer Program 2 (Edge MULTI-PCUT).

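$$
\begin{aligned}
\min \quad & \sum_{i=1}^{n} c_{i}w_{i} \\
\text{such that} \quad & \sum_{i=1}^{n} A^{(u,v)}_{p,i}w_{i}\geq 1, \quad \forall (u,v)\in\mathcal{S},\ \forall p\in\mathcal{P}(u,v), \\
& w_{i}\in\{0,1\}, \quad i=1,\ldots,n.
\end{aligned}
$$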
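A covering program with one constraint per short path can be attacked with the textbook greedy heuristic for weighted set cover: repeatedly pick the edge that hits the most still-uncovered paths per unit cost. The sketch below illustrates that generic technique only; it is not claimed to reproduce GEN or any other algorithm of this paper, and the name `greedy_cover` and the input layout (each path a non-empty list of edges) are assumptions.

```python
def greedy_cover(paths, costs):
    """Greedy weighted set cover over paths.

    paths: iterable of non-empty edge collections, one per short path.
    costs: dict mapping each edge to a positive removal cost.
    Returns a set of edges that intersects every path.
    """
    uncovered = [set(p) for p in paths]
    chosen = set()
    while uncovered:
        # Count how many uncovered paths each edge would hit.
        hits = {}
        for p in uncovered:
            for e in p:
                hits[e] = hits.get(e, 0) + 1
        # Pick the edge with the best coverage-to-cost ratio.
        best = max(hits, key=lambda e: hits[e] / costs[e])
        chosen.add(best)
        uncovered = [p for p in uncovered if best not in p]
    return chosen

# Illustrative instance (assumed data, not from the paper): three short paths,
# two of which share the edge ("s", "a").
paths = [
    [("s", "a"), ("a", "t")],
    [("s", "b"), ("b", "t")],
    [("s", "a"), ("a", "u")],
]
costs = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1,
         ("b", "t"): 1, ("a", "u"): 1}
W = greedy_cover(paths, costs)
```

Every path in the input contains at least one chosen edge, which is exactly the covering constraint; the chosen set is not guaranteed to be of minimum cost.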

Bibliography

[1] Tony H. Grubesic, Timothy C. Matisziw, Alan T. Murray, and Diane Snediker. Comparative Approaches for Assessing Network Vulnerability. International Regional Science Review, 31(1):88–112, 2008.
[2] Ashwin Arulselvan, Clayton W. Commander, Lily Elefteriadou, and Panos M. Pardalos. Detecting critical nodes in sparse graphs. Computers and Operations Research, 36(7):2193–2200, 2009.
[3] Thang N. Dinh, Ying Xuan, My T. Thai, Panos M. Pardalos, and Taieb Znati. On new approaches of assessing network vulnerability: Hardness and approximation. IEEE/ACM Transactions on Networking, 20(2):609–619, 2012.
[4] Thang N. Dinh and My T. Thai. Network under joint node and link attacks: Vulnerability assessment methods and analysis. IEEE/ACM Transactions on Networking, 23(3):1001–1011, 2015.
[5] Tom Leighton and Satish Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM, 46(6):787–832, 1999.
[6] Vijay V. Vazirani. Approximation Algorithms. 2013.
[7] C. J. Colbourn. The Combinatorics of Network Reliability. 1987.
[8] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Canad. J. Math, 8:399–404, 1956.
