A Game of Drones: Cyber-Physical Security of Time-Critical UAV Applications with Cumulative Prospect Theory Perceptions and Valuations

Anibal Sanjab, Walid Saad, and Tamer Başar

arXiv:1902.03506·cs.GT·February 11, 2020

TL;DR

This paper develops a mathematical game-theoretic framework incorporating cumulative prospect theory to analyze the cyber-physical security of time-critical UAV operations, revealing how subjective perceptions influence strategic interactions.

Contribution

It introduces a novel security game model for UAVs that integrates PT to account for bounded rationality, providing new analytical tools and algorithms for equilibrium analysis.

Findings

01. PT significantly affects the equilibrium strategies.

02. Bounded rationality disadvantages the UAV operator.

03. Algorithms effectively compute game equilibria.

Tables

Table I: Summary of main notations.

Notation	Description
$𝒢 (𝒩, ℰ)$	Directed security graph
$O, D \in 𝒩$	$O$: Origin node, $D$: Destination node
$ℋ$	Set of $O$-to-$D$ paths over $𝒢$
$t (i, j) : ℰ \to ℝ$	Travel time from node $i$ to $j$ over $e_{k} = (i, j) \in ℰ$
$p_{n}$	Attack success probability at $n \in 𝒩$
$t_{a}$	Re-handling time
$f^{h} (n)$	Travel time from $O$ to $n \in 𝒩$ following $h \in ℋ$
$𝒵 = \{I, U\}$	Set of players: $I$ (interdictor), $U$ (UAV operator)
$𝒙 \in 𝒳$	Generic mixed-strategy interdiction
$E_{d} (n, h)$	Expected delivery time for pure-strategy interdiction at $n$ and UAV path $h$
$M (i, j; (𝒙, k))$	MDP transition probability from state $i$ to $j$ for a mixed-strategy interdiction $𝒙$ and $U$’s action $k$
$r (i, j; (𝒙, k))$	Instantaneous cost/reward of the MDP state transition from $i$ to $j$
$π_{𝒙} \in 𝒫$	Path selection policy for MDP defined by $𝒙 \in 𝒳$
$h_{π_{𝒙}}$	$O$-to-$D$ path resulting from policy $π_{𝒙}$
$E_{π_{𝒙}} (O; 𝒙)$	Expected delivery time under policy $π_{𝒙}$
$V_{i} (n, h)$	PT valuation by $i \in 𝒵$ of strategy pair $(n \in 𝒩, h \in ℋ)$
$Ξ_{i} (𝒙, h)$	PT valuation by $i \in 𝒵$ of strategy pair $(𝒙 \in 𝒳, h \in ℋ)$

Equations

$T_{k} = f^{h}(D) + k\,[f^{h}(n) + t_{a}],$

$\tau_{k} = (1 - q_{n})^{k} q_{n} = p_{n}^{k} q_{n},$

$E_{d}(n,h) = \begin{cases} f^{h}(D), & n \notin h, \\ \frac{p_{n}}{1-p_{n}}\big(f^{h}(n)+t_{a}\big) + f^{h}(D), & n \in h. \end{cases}$

$E_{d}\big(n^{*}, h^{*} = \rho(n^{*})\big) \geq E_{d}\big(n, \rho(n)\big) \quad \forall n \in \mathcal{N},$ and

$\rho(n) = \arg\min_{h \in \mathcal{H}} E_{d}(n, h),$

n^{*} =

$n_{1} = \arg\max_{n \in \mathcal{N}_{h_{s}}} \frac{p_{n}}{1-p_{n}}\big(f^{h_{s}}(n) + t_{a}\big) + f^{h_{s}}(D),$

$n_{2} = \arg\max_{n \in h_{s} \setminus \mathcal{N}_{h_{s}}} f^{h_{n}}(D),$ and

$\mathcal{N}_{h_{s}} = \Big\{ n \in h_{s} \;\Big|\; \frac{p_{n}}{1-p_{n}}\big(f^{h_{s}}(n) + t_{a}\big) + f^{h_{s}}(D) \leq f^{h_{n}}(D) \Big\}.$

h^{*} = \rho(n^{*}) =

E_{d}(n^{*}, h^{*}) =

M\big(i,j;(\boldsymbol{x},k)\big) =

r\Big(i,j;\big(\boldsymbol{x},k\in\mathcal{N}_{g}(i)\big)\Big) =

$E_{\pi_{\boldsymbol{x}}}(s;\boldsymbol{x}) = \sum_{s^{\prime}\in\{\pi_{\boldsymbol{x}}(s),O\}} M\Big(s,s^{\prime};\big(\boldsymbol{x},\pi_{\boldsymbol{x}}(s)\big)\Big)\Big[r\big(s,s^{\prime};(\boldsymbol{x},\pi_{\boldsymbol{x}}(s))\big) + E_{\pi_{\boldsymbol{x}}}(s^{\prime};\boldsymbol{x})\Big].$

$E_{\pi_{\boldsymbol{x}}}(n_{i};\boldsymbol{x}) = (1-x_{n_{j}}p_{n_{j}})\big(t(n_{i},n_{j}) + E_{\pi_{\boldsymbol{x}}}(n_{j};\boldsymbol{x})\big) + x_{n_{j}}p_{n_{j}}\big(t(n_{i},n_{j}) + t_{a} + E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x})\big).$

$E_{\pi^{*}_{\boldsymbol{x}}}(s;\boldsymbol{x}) = \min_{k\in\mathcal{N}_{g}(s)} \sum_{s^{\prime}\in\{k,O\}} M\big(s,s^{\prime};(\boldsymbol{x},k)\big)\Big[r\big(s,s^{\prime};(\boldsymbol{x},k)\big) + E_{\pi^{*}_{\boldsymbol{x}}}(s^{\prime};\boldsymbol{x})\Big].$

$E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x}) = t(n_{m}, D)$

$+ \frac{1}{1-x_{n_{2}}p_{n_{2}}}\Big(g(O,n_{1},n_{2}) + \frac{1}{1-x_{n_{1}}p_{n_{1}}}\,g(O,n_{1})\underbrace{\big)\Big)\ldots\bigg)\Bigg]}_{m\text{ brackets}},$

$\boldsymbol{x}^{*} = \arg\max_{\boldsymbol{x} \in \mathcal{X}} E_{\pi_{\boldsymbol{x}}^{*}}(O;\boldsymbol{x}),$ where

$\pi_{\boldsymbol{x}}^{*} = \arg\min_{\pi_{\boldsymbol{x}} \in \mathcal{P}} E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x}).$

$h^{*} = \arg\min_{h \in \mathcal{H}} E_{h}(O;\boldsymbol{x}).$

$T^{\prime}(k_{n_{1}}, k_{n_{2}}, \ldots, k_{n_{m}}) = f^{h}(D) + \sum_{i=1}^{m} k_{n_{i}}\big[f^{h}(n_{i}) + t_{a}\big],$

$\tau^{\prime}(k_{n_{1}}, k_{n_{2}}, \ldots, k_{n_{m}}) = \Big[\prod_{i=1}^{m}(1-x_{n_{i}}p_{n_{i}})\Big]\big[x_{n_{1}}p_{n_{1}}\big]^{k_{n_{1}}}\Big[\prod_{j=2}^{m}(\xi_{n_{j}})^{k_{n_{j}}}\Big],$

$\xi_{n_{j}} = \Big[\prod_{r=1}^{j-1}(1-x_{n_{r}}p_{n_{r}})\Big]x_{n_{j}}p_{n_{j}}.$

v(\phi_{i}) =

$V(g^{+}) = \sum_{i=1}^{\kappa} \pi_{i}^{+} v(\phi_{i}),$ and $V(g^{-}) = \sum_{i=-m}^{-1} \pi_{i}^{-} v(\phi_{i}),$

$\pi_{i}^{+} = \omega^{+}\Big(\sum_{j=i}^{\kappa}\eta_{j}\Big) - \omega^{+}\Big(\sum_{j=i+1}^{\kappa}\eta_{j}\Big), \quad \pi_{i}^{-} = \omega^{-}\Big(\sum_{j=-m}^{i}\eta_{j}\Big) - \omega^{-}\Big(\sum_{j=-m}^{i-1}\eta_{j}\Big),$

$\omega^{+}(\eta) = \frac{\eta^{\gamma^{+}}}{\big(\eta^{\gamma^{+}} + (1-\eta)^{\gamma^{+}}\big)^{1/\gamma^{+}}}, \quad \omega^{-}(\eta) = \frac{\eta^{\gamma^{-}}}{\big(\eta^{\gamma^{-}} + (1-\eta)^{\gamma^{-}}\big)^{1/\gamma^{-}}},$
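The probability weighting function and the cumulative decision weights listed above can be explored numerically. The following is a minimal Python sketch rather than the authors' implementation: the function names `tk_weight` and `gain_decision_weights`, the example probabilities, and the value 0.61 used for the exponent are illustrative assumptions. Gain outcomes are assumed to be ordered from the smallest to the largest gain, so that each weight is the difference between the weighted probability of doing at least as well and the weighted probability of doing strictly better.

```python
from typing import List, Sequence


def tk_weight(eta: float, gamma: float) -> float:
    """Weighting function w(eta) = eta^g / (eta^g + (1 - eta)^g)^(1/g)."""
    num = eta ** gamma
    return num / (num + (1.0 - eta) ** gamma) ** (1.0 / gamma)


def gain_decision_weights(probs: Sequence[float], gamma: float) -> List[float]:
    """Cumulative decision weights for gains.

    probs[i] is the objective probability of the i-th gain outcome, with
    outcomes ordered from the smallest to the largest gain (an assumption
    of this sketch). Each weight is w(P(at least as good)) - w(P(strictly better)).
    """
    weights = []
    tail = 0.0  # probability of all strictly better outcomes
    for eta in reversed(probs):
        upper = min(tail + eta, 1.0)  # guard against rounding above 1
        weights.append(tk_weight(upper, gamma) - tk_weight(tail, gamma))
        tail = upper
    return list(reversed(weights))


# Illustrative values only.
example_probs = [0.2, 0.3, 0.5]
rational = gain_decision_weights(example_probs, gamma=1.0)    # no distortion
distorted = gain_decision_weights(example_probs, gamma=0.61)  # distorted perception
```

With an exponent of 1 the weighting function is the identity, so the decision weights coincide with the objective probabilities; smaller exponents overweight small probabilities, which is the kind of distorted risk perception the PT analysis is meant to capture.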

Full text

A Game of Drones: Cyber-Physical Security of Time-Critical UAV Applications with Cumulative Prospect Theory Perceptions and Valuations

Anibal Sanjab1,2, Walid Saad1, and Tamer Başar3

1 Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA,

Emails: {anibals,walids}@vt.edu

2 Flemish Institute for Technological Research, VITO/EnergyVille, Genk, Belgium

3 Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, IL, USA

Abstract

In this paper, a novel mathematical framework is introduced for modeling and analyzing the cyber-physical security of time-critical UAV applications. A general UAV security network interdiction game is formulated to model interactions between a UAV operator and an interdictor, each of which can be benign or malicious. In this game, the interdictor chooses the optimal location(s) from which to target the drone system by interdicting the potential paths of the UAVs. Meanwhile, the UAV operator responds by finding an optimal path selection policy that enables its UAVs to evade attacks and minimize their mission completion time. New notions from cumulative prospect theory (PT) are incorporated into the game to capture the operator’s and interdictor’s subjective valuations of mission completion times and perceptions of the risk levels facing the UAVs. The equilibrium of the game, with and without PT, is then analytically characterized and studied. Novel algorithms are then proposed to reach the game’s equilibria under both PT and classical game theory. Simulation results show the properties of the equilibrium for both the rational and PT cases. The results show that the operator’s and interdictor’s bounded rationality is more likely to be disadvantageous to the UAV operator.

Index Terms:

Unmanned Aerial Vehicles, Cyber-Physical Systems, Security, Network Interdiction Games, Game Theory, Cumulative Prospect Theory.

I Introduction

Recent developments in unmanned aerial vehicle (UAV) technology have led to its adoption in various applications such as telecommunications, surveillance, delivery systems, rescue operations, and intelligence missions [1, 2, 3, 4, 5, 6]. Due to their ability to reach relatively inaccessible locations (such as natural disaster sites, remote mountains, valleys, and forests) and their capacity to travel without being restricted to predefined pathways, UAVs can effectively carry out time-critical missions [1, 7, 8, 9].

I-A Time Critical UAV Applications and Security Challenges

One prominent time-critical UAV application is drone delivery systems [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], which can be used to deliver consumer parcels [10, 6, 11, 12] (with Amazon Prime Air [10] and Google’s Project Wing [6] being key examples) as well as emergency medical products [7, 8, 9]. However, the practical deployment of drone delivery systems can be hindered by their vulnerability to a myriad of cyber and physical attacks [16, 17, 18, 19, 20, 21, 22]. On the physical side, to avoid conflict with manned and commercial aviation, the altitude of UAVs is typically limited to around 400 ft [23], putting them within range of hunting rifles and firearms. Moreover, UAVs are vulnerable to a variety of cyber threats as demonstrated in [16, 17, 18, 19, 20, 21, 22]. For example, the work in [16] provided a general overview of cyber attacks which can target the confidentiality, integrity, and availability of UAV systems. The authors in [17] focused on the security of the communication links between ground control and unmanned aircraft. Moreover, the authors in [18] successfully launched a man-in-the-middle attack against a typical UAV used by law enforcement agencies for critical applications. Meanwhile, the authors in [19] and [20] investigated GPS spoofing attacks to manipulate the trajectory of an autonomous UAV, while the work in [21] considered jamming, spoofing, and eavesdropping attacks which can target UAV systems. In addition, the authors in [22] surveyed various detection and localization techniques as well as cyber-physical attacks which can be used against UAVs.

On the other hand, the ability of drones to reach secure or private locations has raised concerns regarding their possible usage for executing malicious missions, as in the recent real-world incidents at Gatwick airport in the UK [23]. For instance, a number of recent works, such as [22] and [24], studied the risks of potentially using UAVs to execute nefarious missions such as targeting a public, political, or military figure in a secure perimeter, intruding into a military secure perimeter, smuggling illicit products, or gaining unauthorized access to personal property. This has led to the development of what are known as anti-drone systems, whose goal is to defend against intruding drones as discussed in [22] and [24]. The interactions between intruding drones and anti-drone systems are clearly another highly time-critical application of UAVs, beyond delivery systems.

Security analyses of these two time-critical UAV applications involve: a) a UAV aiming to achieve a mission (benign or malicious) in the shortest possible time and b) an interdictor (malicious, e.g., in drone delivery systems, or benign, e.g., in anti-drone systems) whose goal is to interdict and delay the UAV and compromise its mission. The highly intertwined decision making processes of these two scenarios motivate the need for a holistic strategic analysis which can capture these underlying interdependent decision making processes and identify optimal interdiction and security strategies. However, beyond our preliminary work in [25] on the security of drone delivery systems, which was limited to a static analysis (see Footnote 1), prior art [16, 17, 18, 20, 19, 21, 22, 24], and references therein, have somewhat remarkably ignored such interactive time-critical situations and, instead, have either provided qualitative analyses or focused on specific and isolated security experiments, rather than on a comprehensive study.

Footnote 1: Our current work advances and generalizes our preliminary results presented in [25]. Our preliminary work [25] considered a static environment while the current work treats a general setting in which the UAV performs repeated path selection decisions aiming at minimizing a cumulative mission completion time. In addition, the results in [25] mainly relied on numerical simulations while the current work presents rigorous analytical derivations and results.

I-B Summary of Contributions

The main contribution of this paper is to develop the first comprehensive framework for the modeling and analysis of the cyber-physical security of time-critical UAV applications. We pose the general problem as a network interdiction game with a leader-follower structure between an interdictor (malicious or benign) and a UAV operator (benign or malicious). In this game, the interdictor (i.e. the leader) chooses the optimal attack locations along the area which can be traversed by the UAV to interdict the UAV, via a cyber or physical attack, with the goal of delaying the UAV and compromising its mission. On the other hand, the UAV (i.e. the follower) acts as an evader that chooses the best path selection policy from its origin to its destination, while evading attacks and minimizing its total expected travel time (hereinafter called the expected delivery time) needed to complete the mission. We consider both deterministic and probabilistic interdiction strategies. First, with deterministic interdiction strategies, we derive and analyze the Stackelberg equilibrium (SE) of the game. We then show that a probabilistic interdiction strategy gives rise to a game structure in which the UAV’s problem corresponds to finding an optimal policy in a Markov Decision Process (MDP) and the interdictor’s problem corresponds to setting the parameters of this MDP. In this regard, we characterize the SE of the game with mixed interdiction strategies, and propose practical algorithms to solve the underlying UAV operator’s and interdictor’s problems.

The aforementioned analysis captures the decision making processes of the agents considering that they are fully rational, i.e., they assess delivery times and perceive risk levels objectively. In order to capture wider practical application settings, our work also considers the interdictor’s and UAV operator’s potential subjectivity, i.e. bounded rationality. For instance, time-critical UAV applications aim at strictly accomplishing a mission within a target delivery time as delays in such applications can have tragic consequences. Given this time criticality, the merit of an achieved delivery time can be valued relative to the target delivery time, rather than as an absolute quantity, and this valuation can be performed subjectively and differently by the UAV operator and the interdictor. In addition, the choices of interdiction and path selection strategies are influenced by various underlying uncertainties which stem, for example, from the probabilistic risk levels of a certain path and the likelihood with which a carried out cyber-physical attack is successful. Hence, due to these uncertainties, the likelihood of achieving a certain delivery time can be perceived and assessed differently by the interdictor and the UAV operator (see Footnote 2). Classical game theory does not capture such subjective valuations and perceptions as it assumes full rationality of the players, which for our game implies that both players assess delivery times and their probability of occurrence objectively and similarly. Thus, to capture these bounded rationality factors in our game, we extend our analysis by using tools from cumulative prospect theory (PT) [26] (see Footnote 3). In this respect, we consider both deterministic and probabilistic strategies in the PT game analysis. We derive closed-form analytical expressions of the PT valuations of the interdictor and the UAV operator, and prove their convergence. Then, we analytically derive the SE of the deterministic PT game, and propose solution algorithms that deliver numerically the SE of the PT game with mixed interdiction strategies.

Footnote 2: The subjective valuation of outcomes and distorted perception of probabilities in decision making under risk have been repeatedly observed and quantified in various empirical analyses such as in [26] and [27].

Footnote 3: Cumulative prospect theory [26] provides a refinement and generalization of traditional prospect theory [27, 28, 29, 30] allowing it to accommodate a large number of outcomes as needed in this work.

We complement our theoretical analysis with extensive simulations, where our results provide key insights into the effects of PT on the equilibrium strategies and achieved delivery times. For example, the numerical results show that the PT bounded rationality of the players is in general disadvantageous to the UAV operator, leading to expected delivery times that exceed the pre-set target delivery times and highlighting the need for proper PT game modeling when specifying such target times.

The rest of this paper is organized as follows. Section II presents the system model and formulates the proposed network interdiction game with fully rational players. Section III and Section IV study the game under deterministic and probabilistic interdiction strategies, respectively. Section V studies the PT game. Numerical results are presented in Section VI, while conclusions and future directions are discussed in Section VII. A summary of our main notations is given in Table I.

II System Model and Problem Formulation

II-A System Model

Consider a drone system in which a UAV, controlled by an operator, executes a time-critical mission requiring it to travel from a source location $O$ to a destination location $D$ in minimum time, referred to as the delivery time. Meanwhile, an interdictor seeks to interdict the UAV’s flight by choosing a certain area or location, among a number of “danger points” along its path from $O$ to $D$, to launch a cyber-physical attack. The interdictor’s attacks [16, 17, 18, 19, 22] include physical attacks against the UAV (such as using rifles or a military defense system) as well as cyber attacks (such as de-authentication or GPS spoofing attacks) which cause the UAV operator to lose control of the drone. Our model readily captures two time-critical UAV use cases: a) the drone delivery system case in which the UAV is a benign player and the interdictor is malicious, and b) the anti-drone scenario in which the interdictor is an anti-drone system seeking to stop a rogue (or malicious) drone from reaching its destination.

A danger point represents a location (or area) along the possible paths between $O$ and $D$, from which the UAV is exposed to possible cyber-physical attacks. Such points can represent locations of high altitude, which allow line-of-sight and spatial proximity (e.g., high hills, high-rise buildings, etc.) between a potential attacker and the UAV. As a result, the set of danger points between $O$ and $D$ corresponds to inevitable locations along the drone’s flight paths that are susceptible to attacks by a malicious interdictor or an anti-drone system. The set of danger points between $O$ and $D$ defines a security network represented by a directed graph $\mathcal{G}(\mathcal{N},\mathcal{E})$, as shown in Fig. 1, in which the set of vertices, $\mathcal{N}$, is the set of $N$ danger points between $O$ and $D$, and the set of edges, $\mathcal{E}$, such that $|\mathcal{E}|=E$, is the set of connections between these danger points. Given that, in practice, the UAV’s travel from origin to destination may not be restricted to predefined airways, there can be an infinite number of paths which connect $O$ to $D$. However, each one of these paths will go through a number of danger points that may be shared among different paths. This infinite set of possible $O$-to-$D$ paths can, from a security viewpoint, be represented by the set of danger points that each path traverses. Given the time-critical nature of the considered UAV applications, the defined set of edges $\mathcal{E}$ in the security graph $\mathcal{G}$ will comprise the shortest paths between each two danger points. For two neighboring points $i$ and $j$ connected by edge $e_{k}\in\mathcal{E}$, we let $t(i,j)$, $t(\cdot):\mathcal{E}\rightarrow\mathbb{R}$, be the time that the UAV needs to travel from $i$ to $j$ over $e_{k}$.

We let $p_{n}$ be the probability with which an attack launched from point $n\in\mathcal{N}$ is successful. Without loss of generality, we consider that for any $n\in\mathcal{N}\setminus\{O,D\}$, $p_{n}\neq 0$; and for $n^{\prime}\in\{O,D\}$, $p_{n^{\prime}}=0$. We define $\mathcal{H}$ to be the set of $H$ paths (containing no repeated vertices; see Footnote 4) from the origin, $O$, to destination, $D$, over the security graph $\mathcal{G}$. For each path $h\in\mathcal{H}$ (see Footnote 5), we define a distance function $f^{h}(\cdot):h\rightarrow\mathbb{R}$, which takes an input node $n\in h$ and returns the time needed by the UAV to reach $n\in h$ from $O$ following path $h\in\mathcal{H}$. For example, in Fig. 1, $f^{h^{\prime}}(5)=t_{2}+t_{6}$ where $h^{\prime}\triangleq(1,3,5,8,10)$.

Footnote 4: Cycles are naturally dismissed by a UAV operator aiming to minimize delivery time.

Footnote 5: An $O$-to-$D$ path $h\in\mathcal{H}$ is represented by its sequence of nodes connecting $O$ to $D$. Hence, we use the notation $n\in h$ to represent a node $n$ that is in $h$.
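To make the path set $\mathcal{H}$ and the distance function $f^{h}(\cdot)$ concrete, the following is a minimal Python sketch under stated assumptions. The four-node graph, its travel times, and the names `simple_paths` and `travel_time_to` are hypothetical and are not taken from the paper or from Fig. 1; node 1 plays the role of $O$ and node 4 the role of $D$.

```python
from typing import Dict, List, Tuple

Path = Tuple[int, ...]


def simple_paths(adj: Dict[int, List[int]], origin: int, dest: int) -> List[Path]:
    """Enumerate all origin-to-dest paths with no repeated vertices (iterative DFS)."""
    found: List[Path] = []
    stack: List[Tuple[int, Path]] = [(origin, (origin,))]
    while stack:
        node, path = stack.pop()
        if node == dest:
            found.append(path)
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:  # skip cycles
                stack.append((nxt, path + (nxt,)))
    return sorted(found)


def travel_time_to(path: Path, n: int, t: Dict[Tuple[int, int], float]) -> float:
    """f^h(n): time needed to reach node n from the origin along path h."""
    if n not in path:
        raise ValueError(f"node {n} is not on path {path}")
    idx = path.index(n)
    return sum(t[(path[i], path[i + 1])] for i in range(idx))


# Hypothetical security graph: outgoing neighbors and edge travel times.
ADJ = {1: [2, 3], 2: [3, 4], 3: [4]}
T = {(1, 2): 2, (1, 3): 3, (2, 3): 1, (2, 4): 4, (3, 4): 2}

PATHS = simple_paths(ADJ, origin=1, dest=4)
```

Here `travel_time_to(h, 4, T)` gives the unattacked delivery time $f^{h}(D)$ of a path, while `travel_time_to(h, n, T)` gives the time lost on the way to an intermediate danger point $n$.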

On this security graph $\mathcal{G}$ , the interdictor aims at finding the best interdiction strategy (a choice of danger points from which to launch an attack) to intercept/delay the travel of the UAV while the UAV acts as an evader who aims at finding the best travel policy, and as a result a path selection strategy, to reach $D$ from $O$ in a minimum delivery time.

II-B Game-Theoretic Problem Formulation

The UAV operator, denoted by player $U$, must find the best possible path for the UAV to take over graph $\mathcal{G}$ to reach $D$ from $O$ in minimum time while accounting for the presence of the interdictor (player $I$). If the UAV is successfully compromised by the interdictor from a node $n\in\mathcal{N}$, $U$ will have to resend a new UAV with the same mission from node $O$, which leads to both financial losses and a delayed delivery time. A successful attack by $I$ at node $n$ can therefore be mathematically modeled as if the UAV had returned to the point of origin, from which it needs to travel again to its destination. As a result, with the goal of minimizing delivery time, $U$ may not always choose the shortest $O$-to-$D$ path if this path is suspected to be risky: the path selection strategy must account for possible interdiction strategies so as to successfully accomplish the $O$-to-$D$ mission in a minimum delivery time. Similarly, the interdiction strategy must anticipate the possible paths that may be taken by the UAV to maximize this delivery time. To model and analyze these intertwined decision making processes of the interdictor and the UAV operator, we next introduce a novel time-critical network interdiction game.

In this game, the set of players is $\mathcal{Z}\triangleq\{U,I\}$. $I$ first chooses an interdiction strategy $\boldsymbol{x}\in\mathcal{X}$, which is a probability distribution over the set of danger points, $\mathcal{N}$, where $x_{n}$ (i.e. element $n$ of vector $\boldsymbol{x}$) specifies the probability with which to launch an attack from node $n\in\mathcal{N}$ while satisfying $\sum_{n\in\mathcal{N}}x_{n}=1$. We refer to this probabilistic choice of $\boldsymbol{x}$ as a mixed interdiction strategy. A special case consists of restricting $\boldsymbol{x}$ to pure interdiction strategies, in which case $x_{n}=1$ for some $n=m\in\mathcal{N}$ and $x_{n}=0$ for $n\in\mathcal{N}\setminus\{m\}$. On the other hand, $U$ chooses a travel policy (i.e. a path selection strategy), which specifies the node $n^{\prime}\in\mathcal{N}_{g}(n)$ to go to from each possible node $n\in\mathcal{N}$, where $\mathcal{N}_{g}(n)$ is the set of outgoing neighbor nodes of $n$ in graph $\mathcal{G}$. Such a policy will result in a certain $O$-to-$D$ path. Hence, the goal of $I$ is to choose the best interdiction strategy $\boldsymbol{x}$, while anticipating the path selection policy that could be taken by $U$, to maximize the expected delivery time, while the goal of $U$ is to respond to $\boldsymbol{x}$ by choosing the best possible path $h\in\mathcal{H}$ to minimize the expected delivery time. This gives rise to a leader-follower (with $I$ as the leader and $U$ as the follower) hierarchical time-critical network interdiction game. We next separately study the games under pure interdiction and mixed interdiction strategies.
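For a fixed $O$-to-$D$ path and a given mixed interdiction strategy $\boldsymbol{x}$, the expected delivery time can be evaluated with the one-step recursion listed in the Equations section, $E(n_{i}) = (1-x_{n_{j}}p_{n_{j}})\big(t(n_{i},n_{j})+E(n_{j})\big)+x_{n_{j}}p_{n_{j}}\big(t(n_{i},n_{j})+t_{a}+E(O)\big)$. The Python sketch below is one possible way to do so and is not the paper's algorithm: it assumes the boundary condition $E(D)=0$, writes every $E(n_{i})$ as $A_{i}+B_{i}E(O)$, and solves for $E(O)$. The function name and the numerical values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def expected_delivery_time_on_path(
    path: Sequence[int],
    t: Dict[Tuple[int, int], float],
    x: Dict[int, float],
    p: Dict[int, float],
    t_a: float,
) -> float:
    """Expected O-to-D time along a fixed path under mixed interdiction x.

    Each E(n_i) is written as A_i + B_i * E(O); sweeping backward from the
    destination (where A = B = 0 by assumption) yields E(O) = A_O / (1 - B_O).
    """
    A, B = 0.0, 0.0
    for i in range(len(path) - 2, -1, -1):
        n_i, n_j = path[i], path[i + 1]
        a = x.get(n_j, 0.0) * p.get(n_j, 0.0)  # prob. of a successful attack at n_j
        A = t[(n_i, n_j)] + (1.0 - a) * A + a * t_a
        B = (1.0 - a) * B + a
    if B >= 1.0:
        raise ValueError("the UAV can never get through: expected time is unbounded")
    return A / (1.0 - B)


# Hypothetical data: path O=1 -> 2 -> D=4.
T = {(1, 2): 2.0, (2, 4): 4.0}
P = {2: 0.5}
no_attack = expected_delivery_time_on_path((1, 2, 4), T, {}, P, t_a=1.0)
pure_at_2 = expected_delivery_time_on_path((1, 2, 4), T, {2: 1.0}, P, t_a=1.0)
half_at_2 = expected_delivery_time_on_path((1, 2, 4), T, {2: 0.5}, P, t_a=1.0)
```

In this toy setting, putting all interdiction probability on node 2 reproduces the pure-strategy value $\frac{p_{n}}{1-p_{n}}(f^{h}(n)+t_{a})+f^{h}(D)$, which is a useful sanity check when experimenting with other strategies.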

III Game under Pure Interdiction Strategies

III-A Game Formulation under Pure Strategies

Under pure strategies, $I$ chooses to be located at node $n$ (the action space of $I$ is, hence, $\mathcal{N}$) while the UAV seeks to choose an $O$-to-$D$ path $h\in\mathcal{H}$. If $h\in\mathcal{H}$ contains node $n$, then, when traveling from $O$ to $D$ along path $h$, the UAV will traverse all danger points $n^{\prime}\in h,\,n^{\prime}\neq n$ without any risk of being attacked. However, when the UAV reaches danger point $n$, it may continue its path with probability $1-p_{n}$, i.e., the probability with which the attack launched from $n$ is not successful, or it may be sent back to $O$ with probability $p_{n}$, i.e., the probability with which the attack launched from $n$ is successful. Let $t_{a}$ be the re-handling time, which is the time needed by the operator to send a new UAV, if the original one was compromised, captured, or destroyed. In other words, $t_{a}$ is the time span between the instant at which the drone is compromised or destroyed and the instant at which a new replacement drone is sent from $O$. This time span would include the time delay for the operator to detect that an attack has taken place (see Footnote 6) and the time the operator needs to prepare a new replacement drone. Then, the possible delivery times which can occur when $n\in h$ and their probability of occurrence will be:

$T_{k} = f^{h}(D) + k\,[f^{h}(n) + t_{a}], \qquad (1)$

$\tau_{k} = (1 - q_{n})^{k} q_{n} = p_{n}^{k} q_{n}, \qquad (2)$

for $k\in\mathbb{N}_{0}$, where $q_{n}=1-p_{n}$, $T_{k}$ is the $k$th possible delivery time, and $\tau_{k}$ is the probability of occurrence of $T_{k}$. Hence, based on the possible delivery times and their likelihood, defined respectively in (1) and (2), the expected delivery time, $E_{d}(n,h)$ (see Footnote 7), when the interdictor is located at $n$ and the UAV takes path $h$ is given in Proposition 1.

Footnote 6: We consider that when the UAV is attacked, $U$ can eventually detect (with a possible delay accounted for as part of $t_{a}$) that the UAV has been destroyed/compromised. Hence, the inclusion of $t_{a}$ allows our model to accommodate attack types which might not be promptly detected by $U$.

Footnote 7: We also use the notations $E_{d}(n\in h)$ and $E_{d}(n\notin h)$ to highlight whether or not path $h$ contains node $n$ in the computed expected delivery time.

Proposition 1

The expected delivery time for an interdiction and path selection strategy pair, $(n,h)$ , is given by:

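$E_{d}(n\notin h) = f^{h}(D), \qquad (3)$

$E_{d}(n\in h) = \frac{p_{n}}{1-p_{n}}\big(f^{h}(n)+t_{a}\big) + f^{h}(D). \qquad (4)$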

Proof:

First, we consider the case in which $n\notin h$. If the chosen path $h$ does not contain $n$, then the UAV cannot be successfully attacked, which yields $E_{d}(n\notin h)=f^{h}(D)$. Second, we consider the case in which $h$ contains node $n$, i.e. $n\in h$. From (1), one can see that $f^{h}(D)$ appears in every possible delivery time outcome, while $(f^{h}(n)+t_{a})$ is multiplied by the number of times the UAV had been successfully attacked at $n$ before it was successfully able to traverse $n$. This latter component of (1) corresponds to the number of failures that the UAV experiences before the first success in traversing $n$. Consider being successfully attacked at $n$ to be a failure of the UAV in traversing $n$, which can occur with probability $p_{n}$, and consider traversing $n$ to be a success for the UAV, which can occur with probability $q_{n}=1-p_{n}$; then, the expected delivery time will be: $E_{d}(n\in h) = (\text{expected number of failures before the first success})\,(f^{h}(n)+t_{a}) + f^{h}(D).$ The number of failures before the first success follows a geometric distribution whose mean is given by $\mu=\frac{1-q_{n}}{q_{n}}=\frac{p_{n}}{1-p_{n}}$. As a result, $E_{d}(n\in h) = \frac{p_{n}}{1-p_{n}}(f^{h}(n)+t_{a})+f^{h}(D).$

∎

Hence, the $\frac{p_{n}}{1-p_{n}}(f^{h}(n)+t_{a})$ term in (4) can be viewed as a delay penalty, which the UAV would endure for taking the risk of traversing a risky danger point at which the interdictor is located. The goal of the interdictor is to maximize this expected delivery time, $E_{d}(n,h)$ , while the goal of the UAV operator is to minimize it, leading to a zero-sum game.
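As a numerical sanity check on Proposition 1, the following minimal sketch compares the closed form with a truncated version of the series $\sum_{k}T_{k}\tau_{k}$. It assumes $T_{k}=f^{h}(D)+k(f^{h}(n)+t_{a})$ and takes $\tau_{k}=p_{n}^{k}q_{n}$, the geometric form implied by the proof; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
def expected_delivery_time(p_n, f_h_n, f_h_D, t_a, n_on_path=True):
    """Closed-form expected delivery time for a strategy pair (n, h)."""
    if not n_on_path:
        # The UAV never meets the interdictor, so it simply flies path h.
        return f_h_D
    # Delay penalty (mean number of failed traversals times the time lost) plus path length.
    return p_n / (1.0 - p_n) * (f_h_n + t_a) + f_h_D


def truncated_series(p_n, f_h_n, f_h_D, t_a, terms=2000):
    """Partial sum of T_k * tau_k, assuming tau_k = p_n**k * (1 - p_n)."""
    q_n = 1.0 - p_n
    return sum(
        (f_h_D + k * (f_h_n + t_a)) * (p_n ** k) * q_n
        for k in range(terms)
    )


# Illustrative (hypothetical) inputs: risky node reached after 9 time units on a path of length 13.
closed_form = expected_delivery_time(p_n=0.8, f_h_n=9.0, f_h_D=13.0, t_a=5.0)
series_value = truncated_series(p_n=0.8, f_h_n=9.0, f_h_D=13.0, t_a=5.0)
```

With these inputs the two quantities agree to numerical precision, which is consistent with the geometric-distribution argument used in the proof.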

III-B Equilibrium in Pure Strategies

For each choice $n\in\mathcal{N}$ by the interdictor, $U$ can identify the optimal reaction strategy $h=\rho(n)$ specifying the best path to take when $I$ chooses $n$ . The equilibrium concept of this hierarchical game structure is known as the Stackelberg equilibrium [31] and is defined as follows:

Definition 1

A strategy pair $(n^{*},h^{*})$ constitutes a Stackelberg equilibrium (SE) of the network interdiction game if

[TABLE]

where $E_{d}(n,h)$ is as given in (3) and (4).

Denoting a shortest $O$ -to- $D$ path by $h_{s}$ and a shortest $O$ -to- $D$ path not containing a node $n$ by $h_{n}$ , the SE of our network interdiction game can be analytically characterized.

Theorem 1

The interdictor’s SE strategy, $n^{*}$ , is given by:

[TABLE]

where

[TABLE]

The UAV operator’s SE strategy is given by:

[TABLE]

In addition, the resulting SE expected delivery time is:

[TABLE]

Proof:

The proof is presented in Appendix A. ∎

The SE highlights that, from a delivery time perspective, selecting the shortest path may still be optimal since it may result in an expected delivery time that is lower than all other alternative paths. This, in particular, occurs when $n^{*}=n_{1}$ . However, in general, as shown in Theorem 1, the optimal path selection strategy goes beyond simply considering the shortest $O$ -to- $D$ path, as is, for example, the case when $n^{*}=n_{2}$ . (Footnote 8: The SE of the game is not necessarily unique. However, given the hierarchical structure of the game [31], all possible SEs will lead to an equal expected delivery time. This equally applies to the equilibria which we will derive for the games that ensue.)
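The leader-follower logic behind the SE can also be illustrated by direct enumeration. The sketch below is not the closed-form characterization of Theorem 1; it is a brute-force search, under the assumption that the follower's reaction $\rho(n)$ minimizes $E_{d}(n,h)$ and the leader then maximizes $E_{d}(n,\rho(n))$. The four-node graph, its travel times, and all function names are hypothetical.

```python
def arrival_times(path, t):
    """Attack-free cumulative travel time f^h(n) to every node n of path h."""
    times, total = {path[0]: 0.0}, 0.0
    for a, b in zip(path, path[1:]):
        total += t[(a, b)]
        times[b] = total
    return times


def expected_time(n, path, t, p, t_a):
    """Expected delivery time E_d(n, h) for interdiction node n and path h."""
    f = arrival_times(path, t)
    length = f[path[-1]]
    if n not in f:
        return length
    return p[n] / (1.0 - p[n]) * (f[n] + t_a) + length


def stackelberg_brute_force(nodes, paths, t, p, t_a):
    """Enumerate leader choices and follower best responses."""
    best = None
    for n in nodes:
        # Follower (operator): best-response path to interdiction at n.
        h = min(paths, key=lambda path: expected_time(n, path, t, p, t_a))
        value = expected_time(n, h, t, p, t_a)
        # Leader (interdictor): keep the node whose best response costs the most time.
        if best is None or value > best[2]:
            best = (n, h, value)
    return best


# Hypothetical toy instance: two parallel routes through danger points a and b.
t = {("O", "a"): 2.0, ("a", "D"): 2.0, ("O", "b"): 3.0, ("b", "D"): 3.0}
p = {"a": 0.5, "b": 0.2}
paths = [("O", "a", "D"), ("O", "b", "D")]
n_star, h_star, e_star = stackelberg_brute_force(["a", "b"], paths, t, p, t_a=1.0)
```

In this toy instance the interdictor targets node `a` on the shortest path, and the operator's best reaction is to deviate to the longer path through `b`, which mirrors the case discussed above in which the optimal path is not the shortest one.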

IV Game under Mixed Interdiction Strategies

IV-A Game Formulation with Mixed-Strategy Interdiction

We now analyze the time-critical network interdiction game under a more general probabilistic choice of interdiction. (Footnote 9: Given the hierarchical structure of our game, considering mixed path selection policies by $U$ would not yield any advantage regarding the achieved expected delivery time as compared to the optimal deterministic path selection policy [31, 32]. Thus, we limit our analysis to deterministic path selection.) Here, the interdictor may prefer to choose a probabilistic (i.e. mixed) interdiction strategy to possibly prevent $U$ from predicting their exact actions and, hence, potentially achieving a better outcome. In this case, $I$ ’s mixed-strategy vector, $\boldsymbol{x}=[x_{1},x_{2},...,x_{N}]\in\mathcal{X}$ specifies the probability with which $I$ plans to launch an attack on the UAV from the nodes in $\mathcal{N}$ . Hence, under mixed-strategy interdiction, the UAV can be subject to successive probabilistic attacks from multiple nodes. Next, we show that when $I$ chooses a mixed interdiction strategy $\boldsymbol{x}$ , $U$ ’s choice of optimal path becomes an MDP problem whose transition probabilities result from the choice $\boldsymbol{x}$ by $I$ .

Consider the case in which $I$ has chosen strategy $\boldsymbol{x}\in\mathcal{X}$ and the UAV is at node $i$ at time $t_{0}$ and then decides to go to a neighboring node $j\in\mathcal{N}_{g}(i)$ . By reaching node $j$ (i.e. the proximity of danger point $j$ ) at time $t_{0}+t(i,j)$ , the UAV could be subject to an attack. The probability with which the UAV is successfully attacked at node $j$ is equal to $x_{j}p_{j}$ . Hence, if the UAV has reached node $i$ at time $t_{0}$ and then decided to go to node $j$ next, it can either reach node $j$ at time $t_{0}+t(i,j)$ and not be successfully attacked at $j$ (with probability $1-x_{j}p_{j}$ ), or it can be brought back to the origin when reaching node $j$ (if subject to a successful attack) with probability $x_{j}p_{j}$ . This latter case implies that the UAV would reach node $O$ at time $t_{0}+t(i,j)+t_{a}$ with probability $x_{j}p_{j}$ . This security problem can be modeled as an MDP [32] whose transition probabilities depend on the security graph, $\mathcal{G}$ , and on the choice $\boldsymbol{x}$ of $I$ . We define the set of states of this MDP to be the set of nodes $\mathcal{N}$ of $\mathcal{G}$ . $U$ can then decide to go from a node $n$ to any of its neighboring nodes (i.e. next potential states). However, its transition to this state is stochastic because, if the attack is successful, instead of transitioning to a neighboring node, the UAV transitions to state $O$ .

The state transition probabilities, $M\big{(}i,j;(\boldsymbol{x},k)\big{)}$ , specify the probability of transitioning from state $i$ to state $j$ when $I$ chooses strategy $\boldsymbol{x}$ and $U$ chooses action $k$ when at $i$ (choosing action $k$ refers to choosing to move from node $i$ to node $k\in\mathcal{N}_{g}(i)$ ). (Footnote 10: $M\big{(}i,j;(\boldsymbol{x},k)\big{)}$ can be alternatively represented as $M\big{(}i,j;(\boldsymbol{x},i\rightarrow k)\big{)}$ to explicitly indicate that the action of $U$ is to move the UAV from node $i$ to node $k$ . However, for ease of notation, and since it is given that the UAV is initially at state $i$ , rather than using $i\rightarrow k$ , we use only the end node $k$ to indicate the operator’s action.) $M\big{(}i,j;(\boldsymbol{x},k)\big{)}$ is defined as:

[TABLE]

Here, we note the fundamental difference between the attempted action, $k$ , by $U$ and the MDP state $j$ to which the UAV transitions from state $i$ . In fact, in both (15) and (16), the attempted action is to move the UAV from node $i$ to node $k$ . However, the MDP state to which the UAV transitions is either $j=O$ or $j=k$ depending on whether or not the UAV is successfully attacked. The instantaneous cost to $U$ (reward to $I$ ) from a state transition from $i$ to $j$ , when $I$ chooses $\boldsymbol{x}$ and $U$ chooses to move to node $k$ , can be expressed as follows:

[TABLE]

For every transition between two states, the UAV accumulates additional delivery time as expressed in (18) and (19), until the UAV reaches $D$ and the game ends. The goal of $U$ is hence to minimize this expected cumulative delivery time. Therefore, the choice of a mixed strategy by the interdictor, $\boldsymbol{x}$ , defines an MDP whose set of states is $\mathcal{N}$ with transition probabilities as defined in (15)-(17) and instantaneous reward/cost structure as shown in (18) and (19). (Footnote 11: Hence, hereinafter, we refer to this MDP as the MDP induced by $\boldsymbol{x}$ .) The goal of $U$ is to choose the best MDP policy to minimize its expected accumulated delivery time, where a policy $\pi_{\boldsymbol{x}}$ specifies, for each node $n\in\mathcal{N}\setminus\{D\}$ , the next node $n^{\prime}\in\mathcal{N}_{g}(n)$ to which to go. We let $\mathcal{P}$ be the set of all policies. We note that, given the state transitions in (15)-(17), a policy $\pi_{\boldsymbol{x}}$ practically results in one realizable $O$ -to- $D$ path denoted by $h_{\pi_{\boldsymbol{x}}}$ . This is due to the fact that under the MDP policy $\pi_{\boldsymbol{x}}$ , only the nodes of a certain path will ever be reached. Hence, a policy reduces to a path selection strategy. Given the equivalence between a policy $\pi_{\boldsymbol{x}}$ and its resulting $O$ -to- $D$ path $h_{\pi_{\boldsymbol{x}}}$ , we next use the two notations interchangeably depending on whether the emphasis is on a general policy $\pi_{\boldsymbol{x}}$ or on its resulting path $h_{\pi_{\boldsymbol{x}}}$ . We define $E_{\pi_{\boldsymbol{x}}}(s;\boldsymbol{x})$ to be the value of the state $s\in\mathcal{N}$ when $U$ follows policy $\pi_{\boldsymbol{x}}$ for the MDP induced by the mixed strategy, $\boldsymbol{x}$ , of player $I$ . In other words, $E_{\pi_{\boldsymbol{x}}}(s;\boldsymbol{x})$ is the expected time that the UAV needs to reach $D$ from $s$ when policy $\pi_{\boldsymbol{x}}$ is followed. 
Based on (15)-(19), we can express the values of the states, for a given policy $\pi_{\boldsymbol{x}}$ , recursively; as follows:

[TABLE]

As such, the values of each two consecutive nodes, $n_{i}$ and $n_{j}$ ( $n_{j}$ being reached from $n_{i}$ based on $\pi_{\boldsymbol{x}}$ ), are such that:

[TABLE]

Of particular interest to our analysis is the value at the origin, $E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x})$ , which constitutes the expected delivery time when following policy $\pi_{\boldsymbol{x}}$ . For a given choice $\boldsymbol{x}$ by $I$ , the goal of $U$ is to find a policy $\pi^{*}_{\boldsymbol{x}}$ which minimizes $E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x})$ . The optimal values, $E_{\pi^{*}_{\boldsymbol{x}}}(s;\boldsymbol{x})$ , at each state $s$ – i.e. the minimum expected time for the UAV to reach $D$ from $s$ – are interdependent in a recursive manner following from the Bellman equation:

[TABLE]
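To make the Bellman recursion concrete, the following is a minimal value-iteration sketch for the MDP induced by $\boldsymbol{x}$. The update rule is an assumption reconstructed from the transition description above: moving from $i$ to $k$ costs $t(i,k)$, the UAV continues from $k$ with probability $1-x_{k}p_{k}$, and it is sent back to $O$ at an extra cost $t_{a}$ with probability $x_{k}p_{k}$. The function name `value_iteration` and the four-node instance are hypothetical.

```python
def value_iteration(neighbors, t, x, p, t_a, origin, dest, tol=1e-12, max_iter=10000):
    """Approximate the minimum expected time to reach dest from every state."""
    E = {s: 0.0 for s in neighbors}  # E[dest] stays at 0 throughout

    def q_value(i, k):
        # Expected cost of attempting the move i -> k under the current values.
        hit = x[k] * p[k]
        return t[(i, k)] + hit * (t_a + E[origin]) + (1.0 - hit) * E[k]

    for _ in range(max_iter):
        delta = 0.0
        for i in neighbors:
            if i == dest:
                continue
            best = min(q_value(i, k) for k in neighbors[i])
            delta = max(delta, abs(best - E[i]))
            E[i] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        i: min(neighbors[i], key=lambda k: q_value(i, k))
        for i in neighbors
        if i != dest
    }
    return E, policy


# Hypothetical toy instance with two parallel routes and a mixed interdiction strategy.
neighbors = {"O": ["a", "b"], "a": ["D"], "b": ["D"], "D": []}
t = {("O", "a"): 2.0, ("a", "D"): 2.0, ("O", "b"): 3.0, ("b", "D"): 3.0}
p = {"O": 0.0, "a": 0.5, "b": 0.2, "D": 0.0}
x = {"O": 0.0, "a": 0.5, "b": 0.5, "D": 0.0}
values, policy = value_iteration(neighbors, t, x, p, t_a=1.0, origin="O", dest="D")
```

For this instance the value at the origin converges to 5 time units and the greedy policy routes the UAV through `a`, since randomizing the attack halves the effective risk at that node.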

Based on the recursive definition in (21), the value at the origin for an interdiction strategy $\boldsymbol{x}$ and an MDP policy $\pi_{\boldsymbol{x}}$ , inducing a path $h_{\pi_{\boldsymbol{x}}}=(O,n_{1},n_{2},n_{3},...,n_{r},n_{l},n_{k},n_{m},D)$ containing $m+2$ nodes with ordered indices, is given in Proposition 2.

Proposition 2

The expected delivery time, $E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x})$ , for a mixed interdiction strategy $\boldsymbol{x}$ and MDP policy $\pi_{\boldsymbol{x}}$ , inducing path $h_{\pi_{\boldsymbol{x}}}=(O,n_{1},n_{2},n_{3},...,n_{r},n_{l},n_{k},n_{m},D)$ , is given by:

[TABLE]

where $g(.)$ is a function which takes either $2$ or $3$ inputs ( $2$ or $3$ consecutive nodes of a path $h_{\pi_{\boldsymbol{x}}}$ , respectively) and which we define as $g(k,m,n)=x_{n}p_{n}(t(m,n)+t_{a})+t(k,m)$ , and $g(m,n)=x_{n}p_{n}(t(m,n)+t_{a})$ , considering $k$ , $m$ , and $n$ to be three consecutive nodes of a path $h_{\pi_{\boldsymbol{x}}}$ .

Proof:

The proof follows directly from (21) and from the fact that $E_{\pi_{\boldsymbol{x}}}(D;\boldsymbol{x})=0$ for any possible policy, since the expected delivery time starting from $D$ is equal to $0$ . Details are omitted due to space limitations. ∎

To solve the game, we define the SE with mixed-strategy interdiction (Footnote 12: The MSE in Definition 2 is a saddle point of our underlying zero-sum game. An alternative approach for studying the equilibrium of the zero-sum game is to identify its corresponding saddle point in mixed strategies (i.e. considering mixed strategies for both players), where these saddle-point mixed strategies can be computed by solving a linear program [31]. However, the MSE in Definition 2 is tailored to the structure of our game, introduced in Section II, and does not follow a brute-force approach.):

Definition 2

A strategy pair $(\boldsymbol{x}^{*},\pi^{*}_{\boldsymbol{x}^{*}})$ constitutes a mixed interdiction Stackelberg equilibrium (MSE) of the network interdiction game if

[TABLE]

This MSE can be also equivalently defined in terms of $\boldsymbol{x}^{*}$ and the optimal path induced by $\pi^{*}_{\boldsymbol{x}^{*}}$ , i.e., $(\boldsymbol{x}^{*},h^{*}=h_{\pi^{*}_{\boldsymbol{x}^{*}}})$ .

IV-B Game Equilibrium under Mixed-Strategy Interdiction

$U$ ’s problem consists of computing the optimal policy (or optimal path) for the MDP induced by $\boldsymbol{x}$ . This can be achieved using known methods such as value iteration and policy iteration methods [32]. Indeed, for obtaining the values at each state (i.e. node) resulting from a policy $\pi_{\boldsymbol{x}}$ (known as policy evaluation), $E_{\pi_{\boldsymbol{x}}}(O;\boldsymbol{x})$ can be computed as shown in (23) and then used to find $E_{\pi_{\boldsymbol{x}}}(s;\boldsymbol{x})$ for each $s\in\mathcal{S}$ by starting from $D$ (whose value is $E_{\pi_{\boldsymbol{x}}}(D;\boldsymbol{x})=0$ ) and moving backwards while applying (21). As such, using policy iteration [32], starting from a certain MDP policy, policy evaluation and policy improvement steps can be sequentially taken to converge to the optimal policy.

In their traditional form, value and policy iteration methods seek to find an optimal policy specifying the best action to take from every state in the state space. However, as stated in our game formulation, a certain policy leads to a unique resulting $O$ -to- $D$ path resulting in a certain value at the origin as shown in (23). Next, we propose an alternative method for identifying $U$ ’s problem solution which does not seek to find the optimal action to be taken from each possible state, but rather an optimal $O$ -to- $D$ path. This method is dubbed the all-paths method and can be carried out by the following steps:

1. Find all possible paths, $\mathcal{H}$ , from $O$ to $D$ ,

2. Evaluate $E_{h}(O;\boldsymbol{x})$ for each path $h\in\mathcal{H}$ using (23),

3. Find the optimal path $h^{*}$ which solves:

[TABLE]

Note that after computing $E_{h}(O;\boldsymbol{x})$ , and given that $E_{h}(D;\boldsymbol{x})=0$ , the resulting optimal values at the nodes of $h^{*}$ can be computed following (21).
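The three steps of the all-paths method can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the expression in (23): the value at the origin is obtained by writing $E_{h}(n_{i};\boldsymbol{x})=a_{i}+b_{i}E_{h}(O;\boldsymbol{x})$ , propagating $(a_{i},b_{i})$ backward from $D$ using the transition description of Section IV-A, and solving the resulting linear equation at $O$ . The function names and the toy graph are hypothetical.

```python
def all_paths(neighbors, origin, dest):
    """Step 1: enumerate every origin-to-destination path by depth-first search."""
    stack, found = [(origin,)], []
    while stack:
        path = stack.pop()
        if path[-1] == dest:
            found.append(path)
            continue
        for k in neighbors[path[-1]]:
            if k not in path:  # never revisit a node within one path
                stack.append(path + (k,))
    return found


def path_value(path, t, x, p, t_a):
    """Step 2: expected delivery time at the origin when following a fixed path."""
    a, b = 0.0, 0.0  # value at D is 0, i.e. a = b = 0
    for i, j in zip(reversed(path[:-1]), reversed(path[1:])):
        hit = x[j] * p[j]  # probability of a successful attack when reaching j
        a = t[(i, j)] + hit * t_a + (1.0 - hit) * a
        b = hit + (1.0 - hit) * b
    return a / (1.0 - b)  # solves E(O) = a + b * E(O)


def all_paths_method(neighbors, t, x, p, t_a, origin, dest):
    """Step 3: return the path with the smallest value at the origin."""
    candidates = all_paths(neighbors, origin, dest)
    best = min(candidates, key=lambda h: path_value(h, t, x, p, t_a))
    return best, path_value(best, t, x, p, t_a)


# Hypothetical toy instance (two parallel routes, mixed interdiction).
neighbors = {"O": ["a", "b"], "a": ["D"], "b": ["D"], "D": []}
t = {("O", "a"): 2.0, ("a", "D"): 2.0, ("O", "b"): 3.0, ("b", "D"): 3.0}
p = {"O": 0.0, "a": 0.5, "b": 0.2, "D": 0.0}
x = {"O": 0.0, "a": 0.5, "b": 0.5, "D": 0.0}
best_path, best_value = all_paths_method(neighbors, t, x, p, 1.0, "O", "D")
```

Each candidate path is evaluated with arithmetic operations only, and no per-state minimization is needed. Setting $x_{a}=1$ in this instance recovers the pure-strategy value of Proposition 1 for the path through `a`.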

Remark 1

The all-paths method is guaranteed to find a solution to $U$ ’s problem, given in (25), in $|\mathcal{H}|=H$ iterations. By its definition, the all-paths method searches over all possible $O$ -to- $D$ paths. Due to the equivalence between a certain policy and its resulting path in terms of the achieved value at the origin, searching over all possible paths $\mathcal{H}$ , requiring $H$ iterations, will guarantee obtaining the solution to (25).

The all-paths method can be considered an informed exhaustive search method. In fact, rather than searching over all possible policies, $\mathcal{P}$ , whose size can be computed as $|\mathcal{P}|=\prod_{n\in\mathcal{N}\setminus\{D\}}|\mathcal{N}_{g}(n)|\geq H$ , the all-paths method leverages the policy-path equivalence to search only over the set of possible $O$ -to- $D$ paths, $\mathcal{H}$ . If the security graph, $\mathcal{G}$ , can be split into phases where each two consecutive phases form a complete bipartite graph (as is the case in Fig. 1 and Fig. 2), $H$ grows linearly in the number of nodes, $N_{i}$ , in a given phase. (Footnote 13: We refer to such graphs as phase-connected graphs, which reflect the practical case in which the UAV goes from one set of danger points to the other (e.g. between sets of hills and sets of high-rise buildings) with relatively safe conditions in between.) Indeed, in a phase-connected graph with $A$ phases, the total number of $O$ -to- $D$ paths is given by $H=\prod_{i=1}^{A}N_{i}$ .

For example, in Fig. 2, $A=5$ and $H=18$ while $|\mathcal{P}|=216$ ; the latter is the number of iterations needed for a standard exhaustive search. Hence, the all-paths method requires fewer iterations than the exhaustive search method, and in contrast to policy and value iterations, each iteration of the all-paths method is search-free (that is, it does not require a minimization step) and is only limited to arithmetic operations which can be efficiently performed.
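The two counts quoted above can be reproduced with a short computation. The sketch assumes a phase-connected graph with phase sizes $[1,3,2,3,1]$ , which is inferred from the path labeling used in Section VI (nodes $2$ to $4$ , $5$ to $6$ , and $7$ to $9$ between $O$ and $D$ ); the function names are hypothetical.

```python
from math import prod


def count_paths(phase_sizes):
    """H: one node is picked per phase, so the counts multiply."""
    return prod(phase_sizes)


def count_policies(phase_sizes):
    """|P|: every node of phase i independently picks one node of phase i+1."""
    return prod(
        nxt ** cur for cur, nxt in zip(phase_sizes, phase_sizes[1:])
    )


phases = [1, 3, 2, 3, 1]  # assumed phase sizes: O, {2,3,4}, {5,6}, {7,8,9}, D
H = count_paths(phases)
P = count_policies(phases)
```

Under this assumption the computation gives $H=18$ and $|\mathcal{P}|=3\cdot 2^{3}\cdot 3^{2}=216$ , matching the figures stated in the text.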

From the interdictor’s side, after predicting the reaction $\pi^{*}_{\boldsymbol{x}}$ for a chosen interdiction strategy $\boldsymbol{x}\in\mathcal{X}$ , $I$ aims at solving the optimization problem defined in (24). The main challenge with solving this problem resides in the discontinuous changes in the objective function which can be induced by a slight modification to the chosen strategy $\boldsymbol{x}$ . This is due to the fact that a minimal change to the chosen $\boldsymbol{x}$ can lead to a complete modification of the resulting optimal reaction MDP policy of $U$ . Hence, due to the discontinuity of the objective function in (24), finding an exact globally optimal solution to the interdictor’s problem may not be guaranteed. The search for such a global optimum can be done using heuristic methods such as pattern search based methods [33]. By using pattern search based methods, an achievable solution to the interdictor’s problem can be obtained which leads to what we consider an achievable MSE. As such, the proposed all-paths method and pattern search are two complementary methods, which when combined, allow computing an MSE of the network interdiction game.

V Game Analysis under PT

As established in Section III and Section IV, the choices of interdiction and path selection strategies are carried out under uncertainty. Indeed, every chosen interdiction strategy and path selection strategy give rise to a prospect: A set of possible achievable delivery times each of which can occur with a certain probability. In fact, when $I$ chooses $\boldsymbol{x}$ and $U$ chooses path $h=(O,n_{1},n_{2},n_{3},...,n_{r},n_{l},n_{k},n_{m},D)$ , and if we let $k_{n_{i}}\in\mathds{N}_{0}$ be the number of times the UAV is successfully attacked at node $n_{i}\in h\setminus\{O,D\}$ , then the possible achieved delivery times $T^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ and their associated probabilities of occurrence, $\tau^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ , will be given by the expressions below. (Footnote 14: The expressions in (27) and (28) reduce, respectively, to (1) and (2) when considering pure-strategy interdiction.)

[TABLE]

where

[TABLE]

The previous analyses in Section III and Section IV had considered the situation where the uncertainty is managed by $I$ and $U$ in a fully rational and objective manner. In other words, the possible delivery times, in (27), and the probabilities of their occurrence, in (28), are similarly and objectively perceived by $I$ and $U$ , leading the players to assess a pair of strategies based on an expected value of their resulting prospect. However, given the time criticality of the studied drone applications (which must execute certain missions within a target time period), a certain achieved delivery time can be assessed subjectively and differently by $U$ and $I$ with respect to their chosen target delivery times. In addition, the perception of probabilities by $U$ and $I$ can be distorted, which makes them deviate from the rational objective perception, leading each player to assess the risk level of a certain path differently. Indeed, as has been shown in a number of psychological empirical studies, as in [26] and [27], when faced with risk and uncertainty (similarly to our time-critical network interdiction game), the decision making processes of individuals can significantly deviate from full rationality. Essentially, individuals have been found to subjectively evaluate outcomes and perceive probabilities [27, 26], hence assessing a certain prospect not based on its expected value but based on a subjective valuation assigned to this prospect.

To capture the interdictor’s and UAV operator’s potential subjective perceptions (i.e. bounded rationality), we incorporate the principles of cumulative prospect theory [26] in our game formulation. (Footnote 15: Although the proposed game policy will be implemented autonomously by the drone, the design of the game-theoretic policies is performed by a human operator whose perceptions are subjective and whose rationality is bounded.) PT is a Nobel prize-winning theory which has been shown to successfully model and predict decision makers’ subjective behaviors, preferences, and valuations. Indeed, using PT, the subjective perception of the likelihood of occurrence of a probabilistic delivery time and the subjective evaluation of this delivery time with respect to a reference point become central to the decision making processes of $I$ and $U$ . Consider a prospect $g(\phi_{i},\eta_{i})$ , listing each possible outcome $\phi_{i}$ and its probability of occurrence $\eta_{i}$ . Each $\phi_{i}$ is a possible delivery time $T^{\prime}$ in (27) and $\eta_{i}$ is its corresponding probability, $\tau^{\prime}_{i}$ , in (28). Under PT, for a maximizer, the value of an outcome $\phi_{i}$ , denoted by $v(\phi_{i})$ , with respect to a reference point $R$ is given by [26]:

[TABLE]

where $\lambda$ is known as the loss multiplier and $\beta^{+}$ and $\beta^{-}$ are constant parameters which shape the value function. Based on the sign of $v(\phi_{i})$ , $g$ can be split into a negative prospect $g^{-}$ and positive prospect $g^{+}$ . The values in $g^{-}$ correspond to losses and the values in $g^{+}$ correspond to gains. Consider that $g^{-}$ contains $m$ terms, indexed from $-m$ to $-1$ , and $g^{+}$ contains $\kappa$ terms, indexed from $1$ to $\kappa$ . In addition, consider that each of the two prospects are ranked in ascending order based on the values, $v(\phi_{i})$ . Under PT, the valuations of the positive and negative prospects, $V(g^{+})$ and $V(g^{-})$ , are given by[26]:

[TABLE]

resulting in the valuation, $V(g)=V(g^{+})+V(g^{-})$ , of prospect $g$ . $\pi_{i}^{+}$ and $\pi_{i}^{-}$ are decision weights defined based on the cumulative probability of occurrence of outcome $\phi_{i}$ :

[TABLE]

where $\omega^{+}$ and $\omega^{-}$ are the weighting functions associated with the positive and negative prospects, respectively, and are defined as follows (for a certain objective probability $\eta$ )[26]:

[TABLE]

where $\gamma^{+}\in(0,1]$ and $\gamma^{-}\in(0,1]$ are known as the rationality parameters. The higher the value of the rationality parameter, the closer are $\omega^{+}(\eta)$ and $\omega^{-}(\eta)$ to the rational probability $\eta$ .
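The role of the value and weighting functions can be illustrated numerically. Since (30), (31), and (34) are not reproduced here, the sketch below assumes the standard Tversky-Kahneman forms from [26]: a power value function with loss multiplier $\lambda$ around the reference point $R$ , and the weighting function $\omega(\eta)=\eta^{\gamma}/(\eta^{\gamma}+(1-\eta)^{\gamma})^{1/\gamma}$ . The function names are hypothetical, and the parameter values mirror the defaults of Section VI.

```python
def pt_value(phi, R, lam, beta_plus, beta_minus):
    """Assumed power value function of a maximizer around reference point R."""
    if phi >= R:
        return (phi - R) ** beta_plus  # gain
    return -lam * (R - phi) ** beta_minus  # loss, amplified by the loss multiplier


def pt_weight(eta, gamma):
    """Assumed Tversky-Kahneman weighting of an objective probability eta."""
    num = eta ** gamma
    return num / (num + (1.0 - eta) ** gamma) ** (1.0 / gamma)


# Defaults from Section VI: R = 20, lambda = 2.5, beta = 0.6, gamma = 0.5.
gain = pt_value(25.0, R=20.0, lam=2.5, beta_plus=0.6, beta_minus=0.6)
loss = pt_value(15.0, R=20.0, lam=2.5, beta_plus=0.6, beta_minus=0.6)
small = pt_weight(0.1, gamma=0.5)  # a low probability is overweighted
large = pt_weight(0.9, gamma=0.5)  # a high probability is underweighted
```

Under these assumed forms, a deviation of five time units below the reference point is valued $2.5$ times as strongly as the same deviation above it, and setting $\gamma=1$ returns the objective probability, in line with the interpretation of the rationality parameter given above.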

The expressions in (33) showcase the way decision weights are formed from cumulative probabilities of outcomes in a prospect. In fact, $\sum_{j=i}^{\kappa}\eta_{j}$ corresponds to the probability that the outcome is at least as good as $\phi_{i}$ while $\sum_{j=i+1}^{\kappa}\eta_{j}$ corresponds to the probability that the outcome is strictly better than $\phi_{i}$ . Equivalently, $\sum_{j=-m}^{i}\eta_{j}$ corresponds to the probability that the outcome is at least as bad as $\phi_{i}$ while $\sum_{j=-m}^{i-1}\eta_{j}$ corresponds to the probability that the outcome is strictly worse than $\phi_{i}$ .
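The cumulative construction described in this paragraph can be sketched as a difference of weighted cumulative probabilities. The code below is illustrative: the function name is hypothetical, the weighting function is passed in as an argument, and `math.sqrt` is used only as a stand-in distortion, not as the weighting function in (34).

```python
import math


def decision_weights(probs, weight, side):
    """Decision weights for outcomes listed in ascending order of value.

    side "+": gains, using P(at least as good) and P(strictly better).
    side "-": losses, using P(at least as bad) and P(strictly worse).
    """
    weights = []
    for i in range(len(probs)):
        if side == "+":
            at_least, strictly = sum(probs[i:]), sum(probs[i + 1:])
        else:
            at_least, strictly = sum(probs[:i + 1]), sum(probs[:i])
        weights.append(weight(at_least) - weight(strictly))
    return weights


gain_probs = [0.5, 0.3, 0.2]  # hypothetical gain prospect, worst gain first
rational = decision_weights(gain_probs, lambda eta: eta, "+")
distorted = decision_weights(gain_probs, math.sqrt, "+")
```

With the identity weighting the decision weights coincide with the objective probabilities, whereas under the stand-in distortion the best and least likely gain receives a weight of about $0.45$ instead of $0.2$ .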

Next, we formulate our network interdiction game under PT, which we call the PT game. We also split our analysis of the PT game into pure and mixed interdiction cases. Here, we note that the notations of the constants used in (30), (31), and (34), i.e. $\lambda,R,\,\beta^{+},\,\beta^{-},\,\gamma^{+}$ , and $\gamma^{-}$ , will be consistently used in the analyses that ensue but will be indexed by $I$ and $U$ depending on the player to which they refer.

V-A PT Game under Pure Interdiction Strategies

As discussed in Section III-A, when $U$ chooses path $h$ and $I$ is located on node $n\in h$ , the possible outcomes, $T_{k}$ , and their associated probability of occurrence, $\tau_{k}$ , for $k\in\mathds{N}_{0}$ , are as described, respectively, in (1) and (2). Hence, the $(n,h)$ strategy pair gives rise to a prospect, $g(n\in h)$ , in which the outcomes are ordered from lowest to highest, and is expressed as:

[TABLE]

As PT predicts, the interdictor and the UAV operator evaluate each possible outcome of this prospect subjectively, as shown in (30) and (31). In this regard, the valuation, $v_{k}^{I}$ , that the interdictor gives to the $k^{\textrm{th}}$ possible outcome, $T_{k}=f^{h}(D)+k(f^{h}(n)+t_{a})$ , is as follows:

[TABLE]

where

[TABLE]

Given that the interdictor aims at maximizing the expected delivery time, $\Delta I_{k}\geq 0$ is seen as a gain while $\Delta I_{k}<0$ is seen as a loss. Equivalently, the valuation, $v_{k}^{U}$ , that the UAV operator gives to the $k^{\textrm{th}}$ possible outcome, $T_{k}$ , is as follows:

[TABLE]

where

[TABLE]

Since $U$ aims at minimizing the expected delivery time, $\Delta U_{k}>0$ is evaluated as a loss while $\Delta U_{k}\leq 0$ is viewed as a gain.

Using PT principles, we derive the valuations that $I$ and $U$ assign to each possible choice of the pair of pure interdiction and path selection strategies $(n,h)$ . We denote these valuations by $V_{I}(n,h)$ and $V_{U}(n,h)$ for, respectively, $I$ and $U$ .

Theorem 2

The PT valuation that $I$ assigns to a strategy pair $(n,h)$ is given by

[TABLE]

where

[TABLE]

where $k_{I}^{-}$ and $k_{I}^{+}$ are such that: $\Delta I_{k}<0$ for $k\leq k_{I}^{-}$ , $\Delta I_{k}>0$ for $k>k_{I}^{+}$ , and $k_{I}^{+}=k_{I}^{-}+1$ .

Proof:

The proof is presented in Appendix B.

∎

Theorem 3

The PT valuation that $U$ assigns to a strategy pair $(n,h)$ is given by

[TABLE]

where

[TABLE]

where $k_{U}^{-}$ and $k_{U}^{+}$ are such that: $\Delta U_{k}<0$ for $k\leq k_{U}^{-}$ , $\Delta U_{k}>0$ for $k\geq k_{U}^{+}$ , and $k_{U}^{+}=k_{U}^{-}+1$ .

Proof:

This proof follows steps similar to those in the proof of Theorem 2 while accounting for the valuations that $U$ assigns to each possible outcome given in (39)-(41).

∎

As shown in (45) and (49), $V_{I}(g_{I}(n\in h))$ and $V_{U}(g_{U}(n\in h))$ correspond to infinite summations, i.e. infinite series. Hence, to be able to compare between possible pairs of strategies $(n,h)$ , based on their valuations $V_{I}(n,h)$ and $V_{U}(n,h)$ , and to identify the equilibrium strategy pair, it is necessary for these sums to converge. We next show in Proposition 3 and Proposition 4 that $V_{I}(g_{I}(n\in h))$ and $V_{U}(g_{U}(n\in h))$ are convergent series.

Proposition 3

$V_{I}(g_{I}(n\in h))$ is a convergent series.

Proof:

Toward proving the convergence of $V_{I}(g_{I}(n\in h))$ , we first prove that $V_{I}(g_{I}^{+}(n\in h))$ , defined in (68) and composed of positive terms, converges using what is known as the ratio test. Following the ratio test, for a series $\sum_{n=1}^{\infty}a_{n}$ with positive terms $a_{n}$ , $L$ is defined as $L=\lim_{n\rightarrow\infty}\left|\frac{a_{n+1}}{a_{n}}\right|$ . If $L<1$ , then $\sum_{n=1}^{\infty}a_{n}$ converges. As such, we refer to the $k^{\textrm{th}}$ term of $V_{I}(g_{I}^{+}(n\in h))$ by $V^{I^{+}}_{k}$ , which is given by $V^{I^{+}}_{k}=(\Delta I_{k})^{\beta_{I}^{+}}\Big{[}\omega_{I}^{+}\big{(}(p_{n})^{k}\big{)}-\omega_{I}^{+}\big{(}(p_{n})^{k+1}\big{)}\Big{]}$ , while $\omega_{I}^{+}(p_{n}^{k})$ follows from (34). In this respect,

[TABLE]

∎

Proposition 4

$V_{U}(g_{U}(n\in h))$ is a convergent series.

Proof:

The proof follows steps similar to those in the proof of Proposition 3. ∎
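The convergence established in Propositions 3 and 4 can be observed numerically by truncating the series. The sketch below is an illustration under several assumptions: $\Delta I_{k}=T_{k}-R_{I}$ with $T_{k}=f^{h}(D)+k(f^{h}(n)+t_{a})$ , the gain terms follow the form $V^{I^{+}}_{k}$ used in the proof of Proposition 3, the loss terms are built analogously from the probability of doing at least as badly, and the weighting function is the standard Tversky-Kahneman form. It is not the closed form of Theorem 2. Function names and the values of $f^{h}(n)$ and $f^{h}(D)$ are hypothetical; the PT parameters are the defaults of Section VI.

```python
def tk_weight(eta, gamma):
    """Assumed Tversky-Kahneman probability weighting function."""
    num = eta ** gamma
    return num / (num + (1.0 - eta) ** gamma) ** (1.0 / gamma)


def interdictor_valuation(p_n, f_h_n, f_h_D, t_a, R, lam, beta, gamma, terms):
    """Partial sum of the interdictor's PT valuation of the prospect g(n in h)."""
    total = 0.0
    for k in range(terms):
        delta = f_h_D + k * (f_h_n + t_a) - R  # assumed Delta I_k = T_k - R_I
        if delta >= 0:
            # Gain: P(T >= T_k) = p_n**k and P(T > T_k) = p_n**(k + 1).
            w = tk_weight(p_n ** k, gamma) - tk_weight(p_n ** (k + 1), gamma)
            total += delta ** beta * w
        else:
            # Loss: P(T <= T_k) = 1 - p_n**(k + 1) and P(T < T_k) = 1 - p_n**k.
            w = tk_weight(1.0 - p_n ** (k + 1), gamma) - tk_weight(1.0 - p_n ** k, gamma)
            total -= lam * (-delta) ** beta * w
    return total


params = dict(p_n=0.8, f_h_n=9.0, f_h_D=13.0, t_a=5.0,
              R=20.0, lam=2.5, beta=0.6, gamma=0.5)
v_200 = interdictor_valuation(terms=200, **params)
v_400 = interdictor_valuation(terms=400, **params)
```

For these inputs the partial sums with $200$ and $400$ terms differ by a negligible amount, which is the behavior the ratio test predicts: the decision weights decay geometrically while $(\Delta I_{k})^{\beta_{I}^{+}}$ grows only polynomially in $k$ .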

Under PT, the pure-strategy equilibrium of the game is based on the subjective valuations, $V_{I}(n,h)$ and $V_{U}(n,h)$ , that $I$ and $U$ respectively assign to the prospect resulting from the choice of strategy pair $(n,h)$ . As such, under PT, the game becomes a nonzero-sum game whose SE is analyzed next.

As in the analysis in Section III-B, $U$ can optimally react to a decision $n$ that had been taken by $I$ . However, for the PT game, this optimal reaction is based on the valuation $V_{U}(n,h)$ rather than the expected delivery time $E_{d}(n,h)$ . In this PT game, we denote the choice of a path $h\in\mathcal{H}$ by $U$ , as an optimal reaction to a node $n\in\mathcal{N}$ that had been chosen by $I$ , by $\rho^{\textrm{PT}}(n)$ , which is formally defined as:

[TABLE]

where $V_{U}(n,h)$ is as given in Theorem 3.

Paralleling the SE for the fully rational game in Definition 1, an SE for the PT game (SE-PT) is defined as follows.

Definition 3

A strategy pair $(\tilde{n}^{*},\tilde{h}^{*})$ constitutes a Stackelberg equilibrium of the PT game if

[TABLE]

where $V_{I}(n,h)$ is as defined in Theorem 2, and $\rho^{\textrm{PT}}(n)$ is as defined in (50).

$I$ ’s problem corresponds, then, to choosing $\tilde{n}^{*}$ which solves

[TABLE]

Following a similar logic as in the derivation of the SE in Theorem 1, the SE-PT can be analytically characterized.

Theorem 4

The interdictor’s SE-PT strategy, $\tilde{n}^{*}$ , is given by:

[TABLE]

where

[TABLE]

and $h_{n}$ is the shortest $O$ -to- $D$ path not containing node $n$ .

The resulting UAV operator’s SE-PT strategy is given by

[TABLE]

Proof:

Due to space limitations, only a sketch of the proof is provided. $U$ ’s response to a choice $n\in h_{s}$ by $I$ will either be $h_{s}$ or $h_{n}$ . $I$ always has an incentive to choose $n\in h_{s}$ , since otherwise, $\rho^{\textrm{PT}}(n)=h_{s}$ , which results in the worst possible $V_{I}(n,h)$ for $I$ . However, choosing an $n\in h_{s}$ might also lead $U$ to deviate from $h_{s}$ to the best alternative $h_{n}$ . Hence, $I$ can split the nodes in $h_{s}$ into two sets, $\mathcal{M}_{h_{s}}$ and $\mathcal{N}\setminus\mathcal{M}_{h_{s}}$ , where the former set consists of nodes of $h_{s}$ which when attacked would not lead $U$ to deviate from $h_{s}$ , while the latter set consists of nodes which when attacked will lead to deviations to the best alternative. Hence, $m_{1}$ and $m_{2}$ in (54) and (55) represent the best two alternatives for $I$ . As such, $\tilde{n}^{*}$ in (53) corresponds to choosing the best of these two alternatives, and $\tilde{h}^{*}$ in (57) and (58) corresponds to choosing the best reaction $\rho^{\textrm{PT}}$ by $U$ to the choice made by $I$ . ∎

Theorem 4 analytically characterizes the SE of the PT game, which can be compared to the SE of the game with full rationality derived in Theorem 1. This comparison enables us to analyze the effect of the players’ subjective PT valuations and perceptions on their chosen equilibrium strategies. A main component of the choice of the SE and SE-PT strategies is the characterization of sets $\mathcal{N}_{h_{s}}$ , in (10), and $\mathcal{M}_{h_{s}}$ , in (56). By comparing (10) and (56), we can see that $\mathcal{N}_{h_{s}}$ relies on the comparison between $\frac{p_{n}}{1-p_{n}}(f^{h_{s}}(n)+t_{a})+f^{h_{s}}(D)$ and $f^{h_{n}}(D)$ for each $n\in h_{s}$ ; while $\mathcal{M}_{h_{s}}$ relies on comparing $V_{U}(g_{U}(n\in h_{s}))$ , which can be obtained from (49), with $V_{U}(g_{U}(n\notin h_{n}))$ , which can be obtained from (48). This difference between $\mathcal{N}_{h_{s}}$ and $\mathcal{M}_{h_{s}}$ enables possible deviation of the SE-PT strategies from the SE strategies.

V-B PT Game under Mixed-Strategy Interdiction

Consider the case where $I$ chooses $\boldsymbol{x}$ and $U$ chooses a policy that induces path $h=(O,n_{1},n_{2},n_{3},...,n_{r},n_{l},n_{k},n_{m},D)$ . Then, the resulting possible delivery times, $T^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ , and their associated probabilities of occurrence, $\tau^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ , are given by (27) and (28), where $k_{n_{i}}\in\mathds{N}_{0}$ is the number of times the UAV is successfully attacked at a node $n_{i}\in h\setminus\{O,D\}$ . Hence, the interdiction strategy $\boldsymbol{x}$ , by $I$ , and response path $h$ , by $U$ , result in a prospect $\Gamma(\boldsymbol{x},h)$ in which each outcome $T^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ occurs with probability $\tau^{\prime}(k_{n_{1}},k_{n_{2}},...,k_{n_{m}})$ . Under PT, to compare strategy pairs $(\boldsymbol{x},h)\in\mathcal{X}\times\mathcal{H}$ , each of $I$ and $U$ generates a personal valuation of this prospect. As a result, their choices of optimal mixed interdiction and path selection strategies are based on these PT valuations. Given (27)-(29) and the value and weighting functions introduced in (30)-(34), we can generate the valuations assigned by $I$ and $U$ , $\Xi_{I}(\boldsymbol{x},h)$ and $\Xi_{U}(\boldsymbol{x},h)$ , to prospect $\Gamma(\boldsymbol{x},h)$ by following steps similar to those in Section V-A. Based on $\Xi_{I}(\boldsymbol{x},h)$ and $\Xi_{U}(\boldsymbol{x},h)$ , the equilibrium of the PT game with mixed interdiction strategies can be characterized.

In this regard, the definition of the SE-PT equilibrium introduced in Definition 3 can be extended to the mixed-strategy interdiction case as follows:

Definition 4

A strategy pair $(\tilde{\boldsymbol{x}}^{*},\tilde{h}^{*}_{\tilde{\boldsymbol{x}}^{*}})$ constitutes a PT mixed-strategy interdiction Stackelberg equilibrium (MSE-PT) of the network interdiction game if

[TABLE]

where $\tilde{\rho}^{\textrm{PT}}(\boldsymbol{x})$ is the optimal reaction of $U$ to $\boldsymbol{x}$ and is given by:

[TABLE]

Our solution approach presented in Section IV-B, which delivered the MSE of the game (under full rationality), also applies here to derive the MSE-PT of the PT game. Indeed, characterizing the MSE-PT requires solving $U$ ’s problem in (60) as well as $I$ ’s problem given in (59). The all-paths method proposed in Section IV-B can guarantee solving $U$ ’s problem.

Remark 2

*The all-paths method is guaranteed to find $\tilde{\rho}^{\textrm{PT}}(\boldsymbol{x})$ for each interdiction strategy $\boldsymbol{x}\in\mathcal{X}$. Finding $\tilde{\rho}^{\textrm{PT}}(\boldsymbol{x})$ corresponds to identifying the path $h$ obtained as $h=\underset{h\in\mathcal{H}}{\arg\!\min}\,\Xi_{U}(\boldsymbol{x},h)$. As such, by following steps $1$ to $3$ of the all-paths method, and considering $\Xi_{U}(\boldsymbol{x},h)$ instead of $E_{h}(O;\boldsymbol{x})$, the all-paths method performs a complete search over all possible $O$-to-$D$ paths and returns the path $h$ that results in the minimum $\Xi_{U}(\boldsymbol{x},h)$, hence determining $\tilde{\rho}^{\textrm{PT}}(\boldsymbol{x})$.*
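The complete search described in Remark 2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-list encoding, the helper names `all_paths` and `best_response`, and the toy graph are assumptions made here, and the callable `valuation` merely stands in for $\Xi_{U}(\boldsymbol{x},h)$ at a fixed $\boldsymbol{x}$.

```python
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple

Node = Hashable
Path = Tuple[Node, ...]


def all_paths(adj: Dict[Node, Sequence[Node]], origin: Node, dest: Node) -> Iterator[Path]:
    """Yield every simple origin-to-dest path using an explicit DFS stack."""
    stack: List[Path] = [(origin,)]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == dest:
            yield path
            continue
        for nxt in adj.get(last, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append(path + (nxt,))


def best_response(adj, origin, dest, valuation: Callable[[Path], float]) -> Path:
    """Return the path with the smallest valuation (exhaustive search)."""
    return min(all_paths(adj, origin, dest), key=valuation)


# Hypothetical toy graph and edge costs, used only to exercise the search.
toy_adj = {"O": ["a", "b"], "a": ["D"], "b": ["a", "D"]}
toy_cost = {("O", "a"): 4.0, ("O", "b"): 1.0, ("a", "D"): 1.0,
            ("b", "a"): 1.0, ("b", "D"): 5.0}


def toy_valuation(path: Path) -> float:
    # Placeholder for Xi_U(x, h): here simply the total travel time.
    return sum(toy_cost[e] for e in zip(path, path[1:]))


chosen = best_response(toy_adj, "O", "D", toy_valuation)
```

Because the search is exhaustive, its cost grows with the number of $O$-to-$D$ paths, which is acceptable for small security graphs such as the one used in the numerical results.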

The interdictor’s problem corresponds to solving the following optimization problem:

[TABLE]

As in $I$ ’s problem in Section IV, obtaining an exact global solution to (61) cannot be guaranteed due to the non-convexity and discontinuity of the objective function stemming from the sudden changes to $\tilde{\rho}^{\textrm{PT}}(\boldsymbol{x})$ which can be triggered by minimal changes to $\boldsymbol{x}$ . Hence, for obtaining a solution to (61), we propose using a pattern search based method, as discussed in Section IV-B.
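To illustrate why a derivative-free method suits an objective of this kind, the following is a minimal sketch of a generic compass-style pattern search over the probability simplex. It is not the specific variant of Section IV-B, whose details are not restated here: the function name `pattern_search_simplex`, the polling rule (moving probability mass between pairs of nodes), the step-halving schedule, and the toy objective are all assumptions.

```python
from typing import Callable, List, Sequence


def pattern_search_simplex(
    objective: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 0.25,
    min_step: float = 1e-3,
) -> List[float]:
    """Maximize `objective` over probability vectors by polling mass transfers."""
    x = list(x0)
    best = objective(x)
    n = len(x)
    while step >= min_step:
        improved = False
        for i in range(n):        # donor coordinate
            for j in range(n):    # receiver coordinate
                if i == j or x[i] <= 0.0:
                    continue
                delta = min(step, x[i])  # never leave the simplex
                trial = list(x)
                trial[i] -= delta
                trial[j] += delta
                value = objective(trial)
                if value > best + 1e-12:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the mesh when no poll point improves
    return x


def toy_objective(x: Sequence[float]) -> float:
    # Hypothetical leader objective: the follower picks the cheaper of two
    # routes, so the leader's payoff is a non-smooth minimum of two lines.
    return min(10.0 + 8.0 * x[0], 12.0 + 8.0 * x[1])


x_star = pattern_search_simplex(toy_objective, [0.5, 0.5])
```

The method only compares objective values, so kinks and jumps caused by a change in the follower's best response do not break it, although, as noted above, it offers no guarantee of reaching a global optimum.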

VI Numerical Results

For our numerical analysis, we provide a tractable set of examples which showcase the different contributions of the derived analytical results and highlight the effects that the various PT parameters can have on the equilibrium strategies and achieved expected delivery times. For these simulation-based numerical analyses, we consider the graph shown in Fig. 1 composed of $N=10$ nodes and $E=18$ edges. We label the $18$ paths, from $1$ to $18$, as follows: $[1, 2, ..., 18]\triangleq[(2,5,7), (2,5,8), (2,5,9), (2,6,7), (2,6,8), (2,6,9), (3,5,7), (3,5,8), (3,5,9), (3,6,7), (3,6,8), (3,6,9), (4,5,7), (4,5,8), (4,5,9), (4,6,7), (4,6,8), (4,6,9)]$. Given that node 1 ($O$) and node 10 ($D$) are part of each path, a path $(1, i, j, k, 10)$ is, for convenience, referred to by $(i,j,k)$. In addition, the travel times $t_{i}$, for $i\in\{1,...,18\}$, in Fig. 1 are drawn from a uniform distribution in the interval $[2,8]$, yielding $[t_{1},t_{2},...,t_{18}]\triangleq[6.89, 3.46, 7.58, 4.1, 3.18, 3.51, 5.7, 4.84, 4.11, 6.99, 5.51, 5.3, 7.5, 3.72, 6.54, 6.52, 4.28, 5.41]$. We then choose the attack success probabilities as $\boldsymbol{p}=[0, 0.3, 0.5, 0.4, 0.6, 0.3, 0.4, 0.8, 0.4,$ [math] $]$. The length of each path $h$, $f^{h}(D)$, and the risk probability at each node, $p_{n}$, are shown in Fig. 3. Fig. 3 shows that path $8$, i.e. $(3,5,8)$, is the shortest path, followed by path $11$, i.e. $(3,6,8)$, and path $9$, i.e. $(3,5,9)$, while node $8$ is the most risky node, followed by nodes $5$ and $3$, respectively. The re-handling and processing time is considered to be $t_{a}=5$. For the PT parameters of $I$ and $U$, unless stated otherwise, we consider $R_{I}=R_{U}=20$, $\lambda_{I}=\lambda_{U}=2.5$, $\beta_{I}^{-}=\beta_{I}^{+}=\beta_{U}^{-}=\beta_{U}^{+}=0.6$, and $\gamma_{I}^{-}=\gamma_{I}^{+}=\gamma_{U}^{-}=\gamma_{U}^{+}=0.5$.
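The path lengths of this setup can be recomputed with a short script. The sketch below rests on one assumption that cannot be confirmed without Fig. 1: that the graph is layered as $\{1\}\to\{2,3,4\}\to\{5,6\}\to\{7,8,9\}\to\{10\}$ and that the travel times $t_{1},...,t_{18}$ are assigned to edges in lexicographic order of (tail, head). Under this assumed numbering, the three shortest paths come out as paths $8$, $11$, and $9$, in agreement with the ranking stated above.

```python
from itertools import product

# Travel times t_1..t_18 as listed in the text.
t = [6.89, 3.46, 7.58, 4.1, 3.18, 3.51, 5.7, 4.84, 4.11,
     6.99, 5.51, 5.3, 7.5, 3.72, 6.54, 6.52, 4.28, 5.41]

# ASSUMPTION: layered graph with edges numbered lexicographically by (tail, head).
layers = [[1], [2, 3, 4], [5, 6], [7, 8, 9], [10]]
edges = [(u, v) for a, b in zip(layers, layers[1:]) for u in a for v in b]
travel_time = dict(zip(edges, t))

# Paths 1..18 in the order used in the text: (2,5,7), (2,5,8), ..., (4,6,9).
paths = {k: (1, i, j, m, 10)
         for k, (i, j, m) in enumerate(product([2, 3, 4], [5, 6], [7, 8, 9]), start=1)}


def length(path):
    """f^h(D): total travel time along the path."""
    return sum(travel_time[e] for e in zip(path, path[1:]))


lengths = {k: round(length(h), 2) for k, h in paths.items()}
ranking = sorted(lengths, key=lengths.get)  # path labels, shortest first
```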

We will first take the reference points (which represent, for example, a target delivery time) of both players to be equal, $R_{I}=R_{U}=R$, and ranging from $10$ to $35$. The resulting equilibrium interdiction strategies (i.e. $I$’s equilibrium strategies) are shown in Fig. 4, and $U$’s equilibrium strategies are shown in Fig. 5. Fig. 4 shows that the MSE interdiction strategy, $\boldsymbol{x}^{*}$, focuses solely on nodes $5$, $8$, and $9$ ($x^{*}_{5}=0.48$, $x^{*}_{8}=0.31$, and $x_{9}^{*}=0.21$), each of which is part of at least one of the three shortest paths (paths $8$, $11$, and $9$). In addition, $U$’s MSE strategy, $h^{*}$, corresponds to choosing path $12$, which is composed of nodes $3$, $6$, and $9$. Given that nodes $3$ and $6$ are not attacked by $I$ at the MSE and that $p_{9}=0.4$ and $x^{*}_{9}=0.21$, path $12$ is a relatively safe path. The players’ MSE strategies lead to an MSE expected delivery time that is equal to around $23$, as shown in Fig. 6.
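The reported MSE expected delivery time of about $23$ can be sanity-checked by hand. The sketch below is a rough check under stated assumptions, not a reproduction of (26): it assumes that, since node $9$ is the only attacked node on path $12$, the single-node expression $\frac{p}{1-p}(f^{h}(n)+t_{a})+f^{h}(D)$ applies with an effective success probability $x^{*}_{9}p_{9}$, and that the edge travel times along path $12$ are $3.46$, $5.7$, $6.54$, and $5.41$ (which follows from an assumed lexicographic edge numbering of the graph in Fig. 1).

```python
t_a = 5.0                                # re-handling and processing time
path_12_edges = [3.46, 5.7, 6.54, 5.41]  # (1,3), (3,6), (6,9), (9,10), assumed numbering
f_D = sum(path_12_edges)                 # f^h(D): length of path 12
f_9 = sum(path_12_edges[:3])             # f^h(9): travel time from O to node 9
p_eff = 0.21 * 0.4                       # x*_9 * p_9, assumed effective success probability

# Each successful attack costs f^h(9) + t_a; the expected number of
# successful attacks before getting through is p_eff / (1 - p_eff).
expected_time = f_D + p_eff / (1.0 - p_eff) * (f_9 + t_a)
```

Under these assumptions the result is approximately $23.0$, consistent with the value read from Fig. 6.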

Fig. 4 shows the difference between $I$ ’s MSE-PT interdiction strategies, $\tilde{\boldsymbol{x}}^{*}$ , and the MSE interdiction strategies for different values of $R$ . Fig. 4 shows the shift in the PT interdiction strategy, $\tilde{\boldsymbol{x}}^{*}$ , from mainly targeting the incoming neighbor nodes of $D$ (i.e. nodes $7$ , $8$ , and $9$ ), at $R=10$ , to a more spread out interdiction strategy targeting a larger number of nodes, at $R=35$ . At small values of $R$ , such as $R=10$ , all possible delivery times fall above $R$ . Hence, all possible outcomes are valued by $I$ as gains. Since the PT value function, $v_{I}(.)$ in (30) and (31), leads $I$ to be risk averse in gains, choosing nodes $7$ , $8$ , and $9$ is appealing since any $O$ -to- $D$ path is guaranteed to pass by at least one of these nodes. Clearly, this choice of $\tilde{\boldsymbol{x}}^{*}$ is a risk averse choice that guarantees a sure gain. However, when $R$ increases, some of the possible delivery times will fall below $R$ . Hence, for a choice $\boldsymbol{x}$ by $I$ , and $h$ by $U$ , some of the outcomes will correspond to gains and some to losses leading $I$ to drift away from a mere risk averse strategy. In Fig. 5, we show the different MSE-PT strategies of $U$ as $R$ varies. Fig. 5 shows that at $R=10$ , $U$ chooses the shortest path $8$ at the MSE-PT. This is due to the fact that, for this small reference point $R$ , all possible delivery times are seen as losses by $U$ . The concavity of the value function for outcomes greater than $R_{U}$ renders $U$ risk seeking in losses. Hence, taking the shortest path (even if it is risky up to a certain extent) becomes more appealing to $U$ . When $R$ increases, $U$ ’s MSE-PT strategy will drift away from the shortest path, particularly at values of $R$ that are high enough to enable certain possible delivery times to fall below the reference delivery time, $R$ , leading to outcomes that are valued as gains.

Fig. 6a shows the resulting expected delivery times, at the MSE and MSE-PT, for the different values of $R$. Clearly, for low values of $R$, the MSE-PT results in a lower expected delivery time than the MSE. However, for relatively high values of $R$, the MSE-PT results in an expected delivery time that is higher than the expected delivery time at the MSE. As shown in Fig. 6a, the percentage difference in expected delivery time at the MSE-PT compared to the MSE is $-7.5\%$ at $R=15$ and $+14.4\%$ at $R=30$. Indeed, at low values of $R$, $I$ takes a risk-averse, non-aggressive attack strategy, as shown in Fig. 4, and $U$ chooses a risk-seeking shortest path, as shown in Fig. 5. This leads to a relatively short expected delivery time since this shortest path (i.e. path $8$) is not heavily targeted by $I$ at the MSE-PT. However, for higher values of $R$, $I$ considers more aggressive interdiction strategies and $U$ considers safer paths, which results in expected delivery times that are higher at the MSE-PT than at the MSE. In addition, the results in Fig. 6a show that at the MSE-PT, except for $R=30$ and $R=35$, $U$ was not able to achieve an expected delivery time that is below its target reference delivery time. However, at the MSE, $U$’s expected delivery time is lower than its target delivery time for $R\geq 25$. Hence, selecting strategies based on PT valuations is, based on this comparison, disadvantageous to $U$. In addition, Fig. 6a shows the expected delivery time achieved when $U$ chooses the shortest path (i.e. path $8$) and $I$ chooses either its fully rational MSE interdiction strategy or its prospect-theoretic MSE-PT interdiction strategy (these strategies are shown in Fig. 4), labeled, respectively, “Shortest path vs. Interdiction MSE” and “Shortest path vs. Interdiction MSE-PT”, for different values of $R$. Fig. 6a shows that under full rationality, unilaterally deviating from the MSE path (i.e. path $12$ as shown in Fig. 5) to the shortest path (i.e. path $8$) results in an increase in the expected delivery time, which is not advantageous to $U$. Under PT, deviating from the MSE-PT path to the shortest path results in a worse (i.e. higher) expected delivery time for $R\leq 20$, while it results in a better (i.e. lower) expected delivery time for $R\geq 25$. Indeed, under PT, $U$ aims at minimizing its PT valuation of the expected delivery time, $\Xi_{U}(\boldsymbol{x},h)$, as shown in (60), rather than the objective expected delivery time. However, minimizing $\Xi_{U}(\boldsymbol{x},h)$ may not lead to achieving the minimum possible expected delivery time. In fact, Fig. 6b shows $\Xi_{U}(\tilde{\boldsymbol{x}}^{*},\tilde{h}^{*}_{\tilde{\boldsymbol{x}}^{*}})$ and $\Xi_{U}(\tilde{\boldsymbol{x}}^{*},8)$, i.e., the PT valuation achieved by $U$ when choosing its MSE-PT strategy vs. $I$’s MSE-PT strategy ($\tilde{\boldsymbol{x}}^{*}$) as compared to choosing the shortest path $8$ vs. $\tilde{\boldsymbol{x}}^{*}$. As shown in Fig. 6b, $U$’s valuation of choosing its equilibrium MSE-PT strategy is lower than the valuation achieved when choosing the shortest path. However, Fig. 6a shows that the deviation to the shortest path would have been advantageous to $U$ for $R\geq 25$. This, hence, highlights the effect of the subjective PT perceptions of $U$, which may lead to a worse expected delivery time as compared to the expected delivery time which could have been achieved by a mere choice of a non-strategic shortest path.

Hereinafter, to characterize the effect of the various PT parameters on the resulting equilibrium strategies and outcomes, we consider the interdictor to be fully rational (i.e. $R_{I}=0$ , $\lambda_{I}=1$ , $\beta_{I}^{-}=\beta_{I}^{+}=1$ , and $\gamma_{I}^{-}=\gamma_{I}^{+}=1$ ), while $U$ values outcomes and performs probability weighting following PT, with PT parameters similar to the ones used in the previous simulations, unless stated otherwise. We first study the effect of varying the rationality parameters of $U$ , i.e. $\gamma_{U}^{-}$ and $\gamma_{U}^{+}$ , on the MSE-PT and then study the effects of varying $U$ ’s loss parameter $\lambda_{U}$ . First, we consider $\gamma_{U}=\gamma_{U}^{-}=\gamma_{U}^{+}$ , and we let $\gamma_{U}$ take the following values: $0.25$ , $0.3$ , $0.35$ , $0.5$ , $0.75$ , and $0.9$ .

Fig. 7 shows that the MSE-PT interdiction strategy approaches its MSE strategy at higher values of $\gamma_{U}$. However, one can see that $I$’s MSE-PT strategy does not completely coincide with its MSE even for high values of $\gamma_{U}$. This is due to the fact that even when $U$’s probability weighting is closer to full rationality, the way $U$ values the possible game outcomes (i.e. the possible delivery times) is based on its reference point $R_{U}$ and value function. Hence, even with a nearly rational probability weighting, $U$’s MSE-PT may not equal its MSE strategy. This can, indeed, be seen from Fig. 8, which shows that even for $\gamma_{U}=0.9$, $U$’s MSE-PT strategy is different from its MSE strategy. Fig. 8 shows how $U$’s MSE-PT strategy changes with an increase in $\gamma_{U}$. At lower values of $\gamma_{U}$, $U$’s MSE-PT strategy consists of path $9$, i.e. $(3,5,9)$, while at higher values of $\gamma_{U}$, $U$’s MSE-PT strategy shifts to choosing path $11$, i.e. $(3,6,8)$. As shown in Fig. 7, at lower values of $\gamma_{U}$, $I$’s optimal strategy is focused on nodes $5$ and $8$, making path $9$, chosen by $U$ at the MSE-PT, highly risky. However, $U$ still chooses this path, at the MSE-PT, since at such low values of $\gamma_{U}$, $U$’s valuation of probabilities is highly distorted. In fact, the weighting functions $\omega_{U}^{+}(.)$ and $\omega_{U}^{-}(.)$ flatten for lower values of $\gamma_{U}$. Hence, $U$ would assess different paths as almost equally risky, leading $U$ to choose path $9$. However, when $\gamma_{U}$ increases, $U$’s perception of probabilities becomes more rational. Hence, for these values of $\gamma_{U}$, $U$ can observe that path $9$ is highly risky and chooses instead the safer path $11$, composed of nodes $(3,6,8)$ which are not attacked with a high probability by $I$ at the MSE-PT.
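The flattening of the weighting functions at low $\gamma_{U}$ can be made concrete with a small numerical sketch. It assumes that $\omega_{U}^{\pm}$ take the standard Tversky-Kahneman form $\omega(p)=p^{\gamma}/\big(p^{\gamma}+(1-p)^{\gamma}\big)^{1/\gamma}$; the exact definitions are those in (32)-(34), which are not restated in this section, and the name `tk_weight` is introduced here only for illustration.

```python
def tk_weight(p: float, gamma: float) -> float:
    """Tversky-Kahneman probability weighting (assumed form of omega_U)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


probs = [0.2, 0.4, 0.6, 0.8]
# Weighted probabilities for several values of gamma_U; gamma = 1 is undistorted.
weights = {g: [tk_weight(p, g) for p in probs] for g in (0.25, 0.5, 0.9, 1.0)}
# Spread between the largest and smallest perceived probability for each gamma.
spread = {g: max(w) - min(w) for g, w in weights.items()}
```

With this form, the perceived probabilities for $p\in\{0.2,0.4,0.6,0.8\}$ are squeezed into a much narrower band at $\gamma=0.25$ than at $\gamma=0.9$, which is the sense in which differently risky paths can look almost equally risky to $U$.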

Fig. 9a shows the resulting expected delivery times at the MSE and at the MSE-PT for various values of $\gamma_{U}$. From Fig. 9a, we can see that the MSE-PT strategies result in expected delivery times that are longer than the expected delivery time achieved at the MSE. Indeed, for $\gamma_{U}=0.25$, the percentage difference between the expected delivery time at the MSE-PT and that at the MSE goes up to $+21.5\%$. The reason is that, as shown in Fig. 8, for low values of $\gamma_{U}$, $U$ admits a risky MSE-PT strategy leading to high expected delivery times. However, as $\gamma_{U}$ increases, the shift in $U$’s MSE-PT strategy allows achieving better expected delivery times, which are, however, still longer than the MSE expected delivery time. Fig. 9a also shows an expected delivery time labeled “Rational response”. This corresponds to $U$ choosing a rational strategy in response to $I$’s MSE-PT strategy. In other words, rational response corresponds to choosing the path strategy $h^{*}$ which solves (26) for $\boldsymbol{x}=\tilde{\boldsymbol{x}}^{*}$. In this scenario, $I$ assumes that $U$ admits PT valuations and would, hence, choose its MSE-PT strategy, $\tilde{\boldsymbol{x}}^{*}$. However, if $U$ is rather rational, it can take advantage of its knowledge of $\tilde{\boldsymbol{x}}^{*}$ to achieve a better expected delivery time. Indeed, the rational response of $U$ consists of choosing path $11$, for $\gamma_{U}=0.25$ and $\gamma_{U}=0.3$, and path $12$, for the higher values of $\gamma_{U}$, which result in achieving expected delivery times that are shorter than the expected delivery times at the MSE-PT and the MSE, as shown in Fig. 9a. In fact, as can be seen from Fig. 9a, at $\gamma_{U}=0.25$, choosing the rational response strategy (which corresponds to choosing path $11$) allows $U$ to achieve an expected delivery time that is $30.3\%$ lower than the expected delivery time achieved at the MSE-PT. Fig. 9a also shows the resulting expected delivery time when $U$ chooses the shortest $O$-to-$D$ path and $I$ chooses its MSE-PT interdiction strategy, labeled “Shortest Path vs. Interdictor MSE-PT”, for different values of $\gamma_{U}$. As can be seen in Fig. 9a, a deviation from the MSE-PT path to the shortest path would have been advantageous to $U$ as it would lead to a lower expected delivery time for the entire investigated range of $\gamma_{U}$. However, as $U$ subjectively assesses expected delivery times under PT, the choice of the MSE-PT path is valued to be better than choosing the shortest path, as shown in Fig. 9b, as the MSE-PT path leads to a lower PT valuation. Hence, this further highlights the negative effect that the subjective PT perception of $U$ can have on its achieved expected delivery time. The rational response as well as the MSE strategies both lead to a better expected delivery time than the shortest path and the MSE-PT strategies, as shown in Fig. 9a.

Fig. 10a shows the resulting expected delivery times at the MSE and at the MSE-PT, for the various values of $\lambda_{U}\in\{1,2.5,5\}$ . Fig. 10a also shows the expected delivery time achieved when $U$ plays the rational response strategy, or the shortest path, as a reaction to $I$ choosing its MSE-PT strategy. Fig. 10a shows that the MSE-PT strategies chosen at different values of $\lambda_{U}$ result in an expected delivery time that is only slightly higher than the one achieved at the MSE. At higher values of $\lambda_{U}$ , this difference in expected delivery times decreases. Indeed, at $\lambda_{U}=1$ , the percentage difference between the MSE-PT and the MSE expected delivery times is $+4.14\%$ while this difference drops to only $1.3\%$ at $\lambda_{U}=5$ . However, when $U$ plays a rational response strategy, in response to $I$ ’s MSE-PT strategy (which consists of choosing path $12$ for all the three values of $\lambda_{U}$ , i.e. $1$ , $2.5$ and $5$ ), $U$ can achieve an expected delivery time that is up to $11\%$ lower than the expected delivery time achieved at the MSE. Choosing the shortest path by $U$ would lead to a better expected delivery time only for $\lambda_{U}=1$ . Fig. 10b shows $U$ ’s valuation of the MSE-PT strategies as compared to choosing the shortest path, which highlights the reason for which a deviation from the MSE-PT path to the shortest path is not valued to be advantageous by $U$ as it leads to an increase in the valuation. In all cases, choosing the rational response is the most advantageous to $U$ , as shown in Fig. 10a.

VII Conclusion and Future Outlook

In this paper, we have introduced a novel mathematical framework for studying the cyber-physical security of time-critical UAV applications, such as drone delivery systems and anti-drone systems. We have provided a formulation of the problem using the framework of a network interdiction game between the UAV operator and the interdictor, while viewing either of them as malicious and the other one as benign. In addition, we have incorporated principles from cumulative prospect theory in the game formulation to account for the players’ potential bounded rationality. We have characterized Stackelberg (leader-follower) equilibria of the various types of games and studied their properties. Simulation results have shown that the subjectivity of the players can lead to delays in the expected delivery time.

This work paves the way for various future research steps. Indeed, the introduced time-critical network interdiction game can be studied in the presence of multiple UAVs and multiple adversaries as well as considering dynamically changing security graphs. In addition, the introduced time-critical model can be leveraged beyond the analysis of UAVs, by focusing on any autonomous system performing a time-critical mission. Each studied application yields different types of security graphs over which the game can be formulated and analyzed.

Appendix A Proof of Theorem 1

Proof:

We first prove that choosing a node $n\notin h_{s}$ is a dominated strategy for the interdictor. In fact, if $n\notin h_{s}$, then $\rho(n)=h_{s}$, so that $E_{d}(n\notin h_{s},\rho(n))=f^{h_{s}}(D)\leq E_{d}(n,\rho(n))$ $\forall n\in\mathcal{N}$, since $f^{h_{s}}(D)$ is the shortest possible expected delivery time. Hence, the interdictor should always choose a node $n$ that is part of a shortest $O$-to-$D$ path, $h_{s}$. Now, based on (3) and (4), for $n\in h_{s}$,

[TABLE]

where condition (62) reflects that, even when the interdictor is located at $n\in h_{s}$, the shortest path, $h_{s}$, results in a shorter expected delivery time than the best alternative, i.e., $h_{n}$. When this condition is not met, a deviation from $h_{s}$ to the best alternative, $h_{n}$, leads to a shorter expected delivery time as captured in (63). In this respect, we let $\mathcal{N}_{h_{s}}$ denote the set of nodes that are part of $h_{s}$ but are such that $\frac{p_{n}}{1-p_{n}}(f^{h_{s}}(n)+t_{a})+f^{h_{s}}(D)\leq f^{h_{n}}(D)$. $\mathcal{N}_{h_{s}}$ is formally defined in (10). Hence, the two possible alternatives for the optimal choice of $I$ are $n_{1}$ and $n_{2}$ defined as:

[TABLE]

which result, respectively, in expected delivery times:

[TABLE]

The interdictor’s SE strategy consists, hence, of choosing the best of the two alternatives, $n_{1}$ and $n_{2}$ :

[TABLE]

which will result in SE strategies for $U$ and expected delivery times as stated in (11)-(14). ∎
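The logic of this proof can be cross-checked numerically by brute force: for each interdiction node, compute the follower's best path, then let the leader pick the node with the largest resulting expected delivery time. The sketch below is an illustration on a hypothetical two-path graph, not the closed form in (11)-(14); the function names, node labels, and numbers are all assumptions.

```python
from typing import Dict, Tuple


def expected_time(path: Tuple[str, ...], n: str, t: Dict[Tuple[str, str], float],
                  p: Dict[str, float], t_a: float) -> float:
    """Expected delivery time on `path` when the interdictor sits at node n."""
    edges = list(zip(path, path[1:]))
    f_D = sum(t[e] for e in edges)
    if n not in path:
        return f_D  # the UAV is never attacked on this path
    f_n = sum(t[e] for e in edges[:path.index(n)])  # travel time from O to n
    return f_D + p[n] / (1.0 - p[n]) * (f_n + t_a)


def stackelberg(paths, nodes, t, p, t_a):
    """Leader picks n to maximize the follower's minimized expected time."""
    best = None
    for n in nodes:
        response = min(paths, key=lambda h: expected_time(h, n, t, p, t_a))
        value = expected_time(response, n, t, p, t_a)
        if best is None or value > best[2]:
            best = (n, response, value)
    return best


# Hypothetical instance: shortest path via "a" (length 4), alternative via "b" (length 6).
toy_t = {("O", "a"): 2.0, ("a", "D"): 2.0, ("O", "b"): 3.0, ("b", "D"): 3.0}
toy_p = {"a": 0.5, "b": 0.5}
toy_paths = [("O", "a", "D"), ("O", "b", "D")]
se = stackelberg(toy_paths, ["a", "b"], toy_t, toy_p, t_a=1.0)
```

In this toy instance, attacking node "b" (off the shortest path) leaves the delivery time at $f^{h_{s}}(D)=4$, illustrating the dominated choice, whereas attacking "a" makes staying on $h_{s}$ cost $7$, so the follower deviates to the alternative path of length $6$.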

Appendix B Proof of Theorem 2

Proof:

We start by considering the case in which $n\in h$ . In this case, incorporating $I$ ’s valuation of each possible outcome, based on (36)-(38), in prospect $g(n\in h)$ , leads to the following prospect, $g_{I}(n\in h)$ :

[TABLE]

such that $\Delta I_{k}<0$, for $k\leq k_{I}^{-}$, and $\Delta I_{k}>0$, for $k\geq k_{I}^{+}$; while $\Delta I_{k}$ is as defined in (38) for $k\!\in\!\{0,1,...,k_{I}^{-},k_{I}^{+},...,\infty\}$. $g_{I}(n\in h)$ can be further split into a negative prospect, $g^{-}_{I}(n\in h)$, which includes the elements of $g_{I}(n\in h)$ with $\Delta I_{k}<0$ (i.e. for $k\in\{0,...,k_{I}^{-}\}$), and a positive prospect, $g^{+}_{I}(n\in h)$, which includes the elements of $g_{I}(n\in h)$ with $\Delta I_{k}>0$ (i.e. for $k\geq k_{I}^{+}$). The negative and positive prospects include, respectively, the outcomes that $I$ values as losses and the outcomes that $I$ values as gains. $g^{-}_{I}(n\in h)$ and $g^{+}_{I}(n\in h)$ are expressed as:

[TABLE]

We next consider the way $I$ values this prospect by incorporating not only its subjective valuation of outcomes but also its cumulative weighting of the probability of occurrence of each of these outcomes. We let $V_{I}(g_{I}(n\in h))$ denote the PT value that $I$ gives to prospect $g_{I}(n\in h)$ , which results from the PT valuation of the negative and positive components of $g_{I}(n\in h)$ ,

[TABLE]

Here,

[TABLE]

where $\Delta I_{i}$ is as defined in (38) for $i\in\{0,1,...,k_{I}^{-}\}$ . Hence,

[TABLE]

However, based on geometric series, $\sum_{j=0}^{i}q_{n}(p_{n})^{j}=1-p_{n}^{i+1}$ . Then,

[TABLE]

A similar analysis can be carried out to obtain the expression of $V_{I}(g_{I}^{+}(n\in h))$ . In this regard,

[TABLE]

In addition, based on geometric series, $\sum_{j=i}^{\infty}(p_{n})^{j}q_{n}=q_{n}\Big(\sum_{j=0}^{\infty}(p_{n})^{j}-\sum_{j=0}^{i-1}(p_{n})^{j}\Big)=p_{n}^{i}$, which results in

[TABLE]
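For completeness, the two geometric-series identities used in this proof can be written out in one place. This worked step assumes $q_{n}=1-p_{n}$ and $0\leq p_{n}<1$, which is what the stated results imply:

```latex
\sum_{j=0}^{i} q_{n}\,p_{n}^{j}
  = (1-p_{n})\,\frac{1-p_{n}^{i+1}}{1-p_{n}}
  = 1-p_{n}^{i+1},
\qquad
\sum_{j=i}^{\infty} q_{n}\,p_{n}^{j}
  = (1-p_{n})\,\frac{p_{n}^{i}}{1-p_{n}}
  = p_{n}^{i}.
```

The first expression is the probability of at most $i$ successful attacks (the cumulative probability weighted on the loss side), and the second is the probability of at least $i$ successful attacks (the decumulative probability weighted on the gain side).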

Hence, based on (66), (67), and (68),

[TABLE]

where $k_{I}^{+}=k_{I}^{-}+1$ .
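A numerical sketch of this valuation may help in checking the closed form. It is built on assumptions rather than on (36)-(38) verbatim: the outcome for $k$ successful attacks is taken as $f^{h}(D)+k\,(f^{h}(n)+t_{a})$ with probability $q_{n}p_{n}^{k}$, $\Delta I_{k}$ is taken as that outcome minus $R_{I}$, the value function is a power function with loss multiplier $\lambda_{I}$, the weighting is of Tversky-Kahneman form, and a single $\beta$ and $\gamma$ are used for gains and losses. The infinite sum is truncated at `terms`.

```python
def value(delta: float, beta: float, lam: float) -> float:
    """Power value function: gains as delta**beta, losses scaled by lam (assumed form)."""
    return delta ** beta if delta >= 0 else -lam * (-delta) ** beta


def weight(p: float, gamma: float) -> float:
    """Tversky-Kahneman probability weighting (assumed form)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def interdictor_value(f_D, f_n, t_a, p_n, R, beta, lam, gamma, terms=2000):
    """Truncated cumulative PT value of the prospect g(n in h) for I."""
    total = 0.0
    for k in range(terms):
        delta = f_D + k * (f_n + t_a) - R  # assumed Delta I_k: delay relative to R_I
        if delta >= 0:
            # Gain for I: decision weight from the tail P(K >= k) = p_n**k.
            pi = weight(p_n ** k, gamma) - weight(p_n ** (k + 1), gamma)
        else:
            # Loss for I: decision weight from the head P(K <= k) = 1 - p_n**(k+1).
            pi = weight(1 - p_n ** (k + 1), gamma) - weight(1 - p_n ** k, gamma)
        total += pi * value(delta, beta, lam)
    return total


# Fully rational parameters (R = 0, beta = lam = gamma = 1) should recover
# the expected delivery time f_D + p/(1-p) * (f_n + t_a).
rational = interdictor_value(16.76, 12.48, 5.0, 0.8, 0.0, 1.0, 1.0, 1.0)
```

The fully rational check mirrors the parameter choice $R_{I}=0$, $\lambda_{I}=1$, $\beta_{I}^{\pm}=1$, $\gamma_{I}^{\pm}=1$ under which the PT valuation reduces to the plain expectation.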

Next, we consider the case of $n\notin h$ . When the chosen path $h$ does not include the interdiction node $n$ , the resulting delivery time does not result in a probabilistic prospect but is rather deterministic and equal to $f^{h}(D)$ with a probability equal to $1$ , i.e., $g(n\notin h)=(f^{h}(D),1)$ . As such, $g(n\notin h)$ is valued by $I$ depending on whether $f^{h}(D)$ is higher or lower than $R_{I}$ (i.e. a gain or a loss scenario). Hence, the value, $V_{I}(g_{I}(n\notin h))$ , that $I$ associates to prospect $g_{I}(n\notin h)$ , is:

[TABLE]

∎
