Attacking the V: On the Resiliency of Adaptive-Horizon MPC
Scott A. Smolka (1), Ashish Tiwari (2), Lukas Esterle (3), Anna Lukina, (3), Junxing Yang (1), and Radu Grosu (1, 3) ((1) Department of Computer, Science, Stony Brook University, USA, (2) SRI International, USA, (3), Cyber-Physical Systems Group, Technische Universit\"at Wien

TL;DR
This paper introduces a new adaptive-horizon model predictive control method for flocking dynamics, analyzing its resilience against various strategic attacks through statistical model checking.
Contribution
It proposes Adaptive-Horizon MPC (AMPC) for flocking control and evaluates its robustness against different attacker strategies using extensive simulations.
Findings
AMPC enables controllers to almost always achieve V-formation under ideal conditions.
Resource constraints significantly increase attacker success probability.
Intelligent AMPC-enabled attackers outperform naive strategies in disrupting flocking.
Abstract
We introduce the concept of a V-formation game between a controller and an attacker, where controller's goal is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize a new formulation of model-predictive control we call Adaptive-Horizon MPC (AMPC), giving them extraordinary power: we prove that under certain controllability assumptions, an AMPC controller is able to attain V-formation with probability 1. We define several classes of attackers, including those that in one move can remove R birds from the flock, or introduce random displacement into flock dynamics. We consider both naive attackers, whose strategies are purely probabilistic, and AMPC-enabled attackers, putting them on par strategically with the controllers. While an AMPC-enabled…
| Ctrl. success rate, % | Avg. convergence duration | Avg. horizon | |||||||
| Bird 4 | |||||||||
| Bird 3 | |||||||||
| Bird 2 |
| Ctrl. success rate, % | Avg. convergence duration | Avg. horizon | |||||||
| Birds 2 and 3 | |||||||||
| Birds 2 and 4 | |||||||||
| Birds 2 and 5 | |||||||||
| Birds 2 and 6 | |||||||||
| Birds 3 and 4 | |||||||||
| Birds 3 and 5 |
| Range of noise | Ctrl. success rate, % | Avg. convergence duration | Avg. horizon | ||||||
| Random displacement game | |||||||||
| AMPC game | |||||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Control Systems Optimization · Fault Detection and Control Systems · Simulation Techniques and Applications
11institutetext: Department of Computer Science, Stony Brook University, USA
22institutetext: SRI International, USA
33institutetext: Cyber-Physical Systems Group, Technische Universität Wien, Austria
Attacking the V:
On the Resiliency of Adaptive-Horizon MPC
Scott A. Smolka 11
Ashish Tiwari 22
Lukas Esterle 33
Anna Lukina
Junxing Yang 3311
Radu Grosu 1133
Abstract
We introduce the concept of a V-formation game between a controller and an attacker, where controller’s goal is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize a new formulation of model-predictive control we call Adaptive-Horizon MPC (AMPC), giving them extraordinary power: we prove that under certain controllability assumptions, an AMPC controller is able to attain V-formation with probability 1.
We define several classes of attackers, including those that in one move can remove birds from the flock, or introduce random displacement into flock dynamics. We consider both naive attackers, whose strategies are purely probabilistic, and AMPC-enabled attackers, putting them on par strategically with the controllers. While an AMPC-enabled controller is expected to win every game with probability 1, in practice, it is resource-constrained: its maximum prediction horizon and the maximum number of game execution steps are fixed. Under these conditions, an attacker has a much better chance of winning a V-formation game.
Our extensive performance evaluation of V-formation games uses statistical model checking to estimate the probability an attacker can thwart the controller. Our results show that for the bird-removal game with , the controller almost always wins (restores the flock to a V-formation). For , the game outcome critically depends on which two birds are removed. For the displacement game, our results again demonstrate that an intelligent attacker, i.e. one that uses AMPC in this case, significantly outperforms its naive counterpart that randomly executes its attack.
1 Introduction
Many Cyber-Physical Systems (CPSs) are highly distributed in nature, comprising a multitude of computing agents that can collectively exhibit emergent behavior. A compelling example of such a distributed CPS is the drone swarm, which are beginning to see increasing application in battlefield surveillance and reconnaisance [3]. The emergent behavior they exhibit is that of flight formation.
A particularly interesting form of flight formation is V-formation, especially for long-range missions where energy conservation is key. V-formation is emblematic of migratory birds such as Canada geese, where a bird flying in the upwash region of the bird in front of it can enjoy significant energy savings. The V-formation also offers a clear view benefit, as no bird’s field of vision is obstructed by another bird in the formation. Because of the V-formation’s intrinsic appeal, it is important to quantify the resiliency of the control algorithms underlying this class of multi-agent CPSs to various kinds of cyber-attacks. This question provides the motivation for the investigation put forth in this paper.
Problem Statement and Summary of Results.
We introduce the concept of V-formation games, where the goal of the controller is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize a new formulation of model-predictive control we call Adaptive Receding-Horizon MPC (AMPC), giving them extraordinary power: we prove that under certain controllability conditions, an AMPC controller can attain V-formation with probability 1.
We define several classes of attackers, including those that in one move can remove a small number of birds from the flock, or introduce random displacement (perturbation) into the flock dynamics, again by selecting a small number of victim agents. We consider both naive attackers, whose strategies are purely probabilistic, and AMPC-enabled attackers, putting them on par strategically with the controllers. The architecture of a V-formation game with an AMPC-enabled attacker is shown in Figure 1. While an AMPC-enabled controller is expected to win every game with probability 1, in practice, it is resource-constrained: its maximum prediction horizon and the maximum number of game execution steps are fixed in advance. Under these conditions, an attacker has a much better chance of winning a V-formation game.
AMPC is a key contribution of the work presented in this paper. Traditional MPC uses a fixed finite prediction horizon to determine the optimal control action. Hence, it may get stuck in local minima. The AMPC procedure chooses it dynamically. Thus, AMPC can adapt to the severity of the action played by its adversary by choosing its own horizon accordingly. The AMPC procedure is inspired by an adaptive optimization procedure recently presented in [9].
Our extensive performance evaluation of V-formation games uses statistical model checking to estimate the probability that an attacker can thwart the controller. Our results show that for the bird-removal game with , the controller almost always wins (restores the flock to a V-formation). For , the game outcome critically depends on which two birds are removed. For the displacement game, our results again demonstrate that an intelligent attacker, i.e. one that uses AMPC in this case, significantly outperforms its naive counterpart that randomly executes its attack.
Traditional feedback control is, by design, resilient to noise, and also certain kinds of attacks; as our results show, however, it may not be resilient against smart attacks. Adaptive-horizon control helps to guard against a larger class of attacks, but it can still falter due to limited resources. Our results also demonstrate that statistical model checking represents a promising approach toward the evaluation CPS resilience against a wide range of attacks.
The rest of the paper is organized as follows. Section 2 introduces our dynamic model of V-formation in a flock of autonomous agents, and Section 3 defines our controller-attacker stochastic games. Section 4 presents AMPC, and Section 5 shows how AMPC is used in the V-formation games we consider. Section 6 gives a critical analysis of our results, and Section 7 discusses related work. Section 8 offers our concluding remarks and directions for future work.
2 V-Formation
We consider the problem of bringing a flock of birds from a random initial configuration to an organized V-formation. Recently, Lukina et al. [9] have modeled this problem as a deterministic Markov Decision Process (MDP) , where the goal was to generate actions that caused to reach a desired state.
In our case, is an MDP. The state of each bird in the flock is modeled using 4 variables: a 2-dimensional vector denoting the position of the bird in a 2D space, and a 2-dimensional vector denoting the velocity of the bird. Thus, the state space of is representing a flock of birds. The control actions of each bird are 2-dimensional accelerations and 2-dimensional position displacements (see discussion of and below). Both are random variables.
Let , and denote the position, velocity, acceleration, and displacement of the -th bird at time , respectively. Then, the transition relation of the MDP is given as follows:
[TABLE]
Once the current acceleration and displacement are sampled, the next state is uniquely determined by (2) from the current state in [9].
The problem of whether we can go from a random flock to a V-formation is a reachability question. The reachability goal is the set of states representing a V-formation. A key assumption in [9] was that the reachability goal can be specified using a fitness function , which assigns a non-negative real (fitness) value to each state in .
The fitness of a state was determined by the following three terms:
- •
Clear View (). A bird’s visual field is a cone with angle that can be blocked by the wings of other birds. The clear-view metric is defined by accumulating the percentage of a bird’s visual field that is blocked by other birds. The CV for the flock is the sum of the clear-view metric of all birds. The minimum value of is , and this value is attained in a perfect V-formation where all birds have clear view.
- •
Velocity Matching (). is defined as the difference between the velocity of a given bird and all other birds, summed up over all birds in the flock. The minimum value for is , and this value is attained in a perfect V-formation where all birds have the same velocity.
- •
Upwash Benefit (). The trailing upwash is generated near the wingtips of a bird, while downwash is generated near the center of a bird. An upwash measure is defined on the 2D space using a Gaussian-like model that peaks at the appropriate upwash and downwash regions. For bird with upwash , the upwash-benefit metric is defined as , and for the flock is the sum of all . The upwash benefit of a flock in V-formation is , as all birds, except for the leader, have minimum upwash-benefit metric (), while the leader has upwash-benefit metric of ().
Let be a state of a flock with birds. Given the above metrics, the overall fitness (cost) metric is of a sum-of-squares combination of , , and defined as follows:
[TABLE]
A state is considered to be a V-formation whenever , for a certain small threshold .
Given the above flocking model, the goal is to bring the flock from any configuration to a V-formation. Recall that we had two sets of control variables: accelerations and displacements for each bird of the flock. We consider the scenario where the accelerations are under the control of one agent (the controller), and the displacements (position perturbations) are under the control of a second malicious agent (the attacker). This partition of the actions of the MDP into disjoint sets gives rise to a stochastic game on an MDP, which is described next.
3 Controller-Attacker Games: Problem Definition
We are interested in games between a controller and an attacker, where the goal of the controller is to take the system to a desired set of states, and the goal of the attacker is to keep the system outside these states. We formulate our problem by using a Markov Decision Processes (MDP) such that the controller and the attacker jointly determine the transition probabilities.
Definition 1
A Markov Decision Process (MDP) is a tuple consisting of: (1) a set of states, (2) a set of actions, (3) a function , where is the probability of transitioning from state to state under action , and (4) a function , where is the reward (fitness) associated to state .
In a stochastic game [15], the transition probability from state to state is controlled jointly by two players, a controller and an attacker in our case. To view an MDP as a stochastic game, we assume that the set of actions is given as a product , where the controller chooses the -component of an action and the attacker chooses the -component of . We assume that the game is played in parallel by the controller and the attacker; i.e., they both take the state of the system at time , compute their respective actions and , and then use the composed action to determine the next state of the system (based on the transition function ).
We consider randomized strategies for both the controller and the attacker. A randomized strategy is a mapping taking every state to a probability distribution over the (available) actions. Once we fix a randomized strategy for the controller, and a randomized strategy for the attacker, the MDP reduces to a Markov chain on the state space . Thus, the controller and attacker jointly fix the probability of transitioning from a state to a state .
In this paper, we consider reachability games only. In other words, we are given a set of “good” states and the goal of the controller is to reach a state in . Let be a sequence of states (a run of the system). The controller wins on this run if , and the attacker wins otherwise.
We are interested in discrete-time continuous-space dynamical systems. Formally, the state space is and the action space is in . In the bird flocking example, , where is the number of birds. We have four state variables and four action variables, respectively for each bird. They represent the - and the -components of the position , velocity , acceleration , and displacement of each bird , respectively.
A classical problem in the study of games pertains to determining the existence of an optimal winning strategy (e.g. a Nash equilibrium) for a player. We are not concerned with such problems in this paper. Due to the uncountably many states in the state- and action-space, solving such problems for our games of interest is extremely challenging. Instead, we focus on the problem of determining the likely winner of a game where the strategy of the two players is fixed. Since we consider randomized strategies, determining the likely winner is a statistical model checking problem. In other words, we want to evaluate the resilience of certain controllers under certain attack models.
We are now ready to formally define the problem we would like to solve.
Definition 2 (Stochastic-game verification problem)
Let be an MDP, where , and randomized strategies and mapping states to probability distributions over and . The stochastic-game verification problem is to determine the probability of reaching a state in in steps, for a given , starting from an initial state (taken from a given probability distribution) in the underlying Markov chain induced by strategies on the MDP .
Let us specify the randomized strategies. For a strategy , we assume that we are given a randomized algorithm that takes a state and returns an action consistent with the probability distribution .
Our main interest here is in evaluating the resilience of a specific controller algorithm . The key assumption that the controller and the attacker algorithms make is the existence of a fitness function such that
[TABLE]
Given such a fitness metric , the controller works by minimizing the fitness of states reachable, in one or more steps, as it is done in model-predictive control (MPC). Since the fitness function is highly nonlinear, the controller uses an optimization procedure based on randomization to search for a minimum. Hence, our controller is a randomized procedure. One possible attack strategy we consider (for an advanced attacker) is based on the fitness function as well: the attacker tries to maximize the fitness of reachable states.
A key contribution of our work is an adaptive MPC procedure called AMPC. Recall that traditional MPC uses a fixed finite horizon to determine the best control action. The AMPC procedure chooses it dynamically. Thus, AMPC can adapt to the severity of the action played by its adversary by choosing its own horizon accordingly. The AMPC procedure is inspired by an adaptive optimization procedure recently presented in [9], which dynamically changes the amount of the effort it uses to search for a better solution in each step. The motivation for adaptation in [9] however was different, namely to take the optimizer out of a local minimum, and thus, ensure convergence to a global optimum.
4 The Adaptive-Horizon MPC Algorithm
We now present our new adaptive-horizon model-predictive-control algorithm, we call AMPC. We will use this algorithm as the controller strategy in the stochastic game on MDPs. We will also consider attack strategies that use AMPC. Since AMPC is an adaptive MPC procedure based on particle-swarm optimization (PSO), we first briefly present background material on MPC and PSO.
4.1 Background on Model-Predictive Control
Model-predictive control (MPC) determines the control action at current time by looking steps into the future and finding the best -length sequence of control actions that can take the system from its current state to a new state that has the lowest fitness. (Since we assume existence of a fitness metric that we are trying to minimize, we specialize the description of MPC to this case.) If denotes the state reached from state in time following the actions of length , then in the MPC approach, at each time step , the following minimization is performed to find the optimal set of actions
[TABLE]
Since the model is an approximation of the system, only the first action {\boldsymbol{a}}(t)=\textbf{opt-{\boldsymbol{a}}}^{1}(t) is applied as the action at time , and the remaining future actions found by the optimizer are ignored. After the control action is applied, the system is left to evolve, and the process is repeated at and so on.
The MPC approach can be used for achieving a V-formation, as was outlined in [17, 16]. These earlier works, however, did not use an adaptive dynamic window, and did not consider the adversarial control problem.
In MPC, optimization problem (3) is additionally subject to constraints that bound the set of possible actions and states. For example, in our flocking model, the magnitude of velocity and acceleration for each of the birds is bounded: , where is a predefined constant and .
We use a particle-swarm-optimization algorithm to solve the optimization problems generated by the MPC procedure.
4.2 Background on Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a randomized approximation algorithm for determining the parameters that minimize a possibly nonlinear and possibly discontinuous cost (or fitness) function. PSO was first introduced by [8]. In an interesting twist of events, PSO took its original inspiration from bird flocking.
The PSO procedure is best described using the metaphor of a swarm of insects collaboratively trying to find the location of food. The insects, also called particles, live in the space defined by all possible valuations of the unknown parameters (of the optimization problem). The food is located at the position where the objective function is minimized. PSO works by having a swarm of particles, which have the same goal of finding food (the reward) without knowing its location. Each particle is informed about its distance to the food (value of the objective function). The PSO algorithm repeatedly redistributes each particle towards the one closest to the food, with a speed proportional to the distance separating them, until all particles converge to the same position.
AMPC employs Matlab’s toolbox particleswarm, which performs the classical version of PSO. A swarm of particles is sampled uniformly at random within a given bound on their positions and velocities. In the bird flocking example, if we try to find acceleration vectors by optimization over horizon , then one “particle” represents 2-dimensional vectors for each of the birds, along with a vector of values that determine how these acceleration vectors will be updated. After choosing a neighborhood of random size for each particle , , PSO computes the value of the given fitness function for each particle, and stores two vectors for each particle : its so-far personal-best position , and the position of its fittest neighbor . The positions and velocities of the particle swarm are updated the following way:
[TABLE]
where is an inertia weight, which quantifies the trade-off between global and local exploration of the swarm (the value of is proportional to the exploration range); and are the self adjustment and the social adjustment, respectively; are random variables; and is the vector dot product, that is, random vector : .
If the value of the fitness computed at each step of the PSO for falls below the one for , then is reassigned to . A global best for the next iteration is determined as the particle with the best fitness among . The stopping criterion of the PSO algorithm is either reaching the maximum number of iterations set in advance, or reaching the set time bound, or satisfying the minimum criterion.
PSO can be used to solve any optimization problem. We use it to solve the optimization problem generated in the MPC approach. In a V-formation game, it can be used to obtain the birds’ best accelerations, or even the best displacements, at each time step – depending on whether MPC/PSO is being used by the controller or the attacker.
Remark. We assume that PSO is fair, in the sense that it has a chance to sample all the points in the parameter space, and therefore it has the chance to find the optimal solution with probability one, given enough time.
4.3 The Main Algorithm of AMPC
We propose the main algorithm of AMPC. This algorithm performs step-by-step control of a given MDP by looking steps ahead and predicting the next best state to move to. We use PSO to identify the potentially best actions in the current state achieving the optimal value of the fitness function in the next state. For bird flocking, the fitness function, Fitness of is defined as the minimum fitness metric obtained within steps by applying on . Formally, we have
[TABLE]
where is the state after apply the th action of on . For horizon , PSO searches for the best sequence of 2-dimensional acceleration vector of length , thus having parameters to be optimized. The number of particles used in PSO is proportional to the number of parameters, i.e., .
The pseudocode for the AMPC algorithm is given in Algorithm 1. A novel feature of AMPC is that, unlike classical MPC that uses a fixed horizon , AMPC adaptively chooses an depending on whether it is able to reach a fitness value that is lower than the current fitness by our chosen quanta , .
AMPC is hence an adaptive MPC procedure that uses level-based horizons. It employs PSO to identify the potentially best next actions. If the chosen actions improve (decrease) the fitness of the next state , , in comparison to the fitness of the previous state by the predefined , the controller considers these actions to be worthy of leading the flock towards or keeping it in the V-formation.111We focus our attention on bird flocking, since the details generalize naturally to other MDPs that come with a fitness metric.
In this case, the controller applies the actions to each bird and transitions to the next state of the MDP. The threshold determines the next level of the algorithm, where is the horizon with the best fitness. The prediction horizon is increased iteratively if the fitness has not been decreased enough. Upon reaching a new level, the horizon is reset to one (see Algorithm 1). Having a horizon means it will take multiple transitions in the MDP in order to reach a solution with improved fitness. However, when finding such a solution with , we only apply the first action to transition the MDP to the next state. This is explained by the need to allow the other player (environment or an adversary) to apply their action before we obtain the actual next state. If no new level is reached within horizons, the first action of the best using horizon is applied.
The dynamic threshold is defined as in [9]. Its initial value is obtained by dividing the fitness range to be covered into equal parts, that is, , where and . Subsequently, is determined by the previously reached level , as . This way AMPC advances only if is at least apart from .
This approach allows us to force PSO to escape from a local minimum, even if this implies passing over a bump, by gradually increasing the exploration horizon . We assume that the MDP is controllable and that the set of good states is not empty, which means, that from any state, it is possible to reach a state whose fitness decreased by at least . Algorithm 1 illustrates our approach.
Theorem 4.1 (AMPC Convergence)
Given an MDP with positive and continuous fitness function , and a nonempty set of target states with . If the transition relation is controllable with actions in , then there is a finite maximum horizon and a finite number of execution steps , such that AMPC is able to find a sequence of actions that brings a state in to a state in with probability one.
Proof
In each (macro) step of horizon length , from level to level , AMPC decreases the distance to by , where is fixed by the number of steps chosen in advance. Hence, AMPC converges to a state in in a finite number of steps, for a properly chosen . AMPC is able to decrease the fitness in a macro step by by the controllability assumption and the fairness assumption about the PSO algorithm. Since AMPC is a randomized algorithm, the result is probabilistic.
Note that AMPC is a general procedure that performs adaptive MPC using PSO for dynamical systems that are controllable, come with a fitness metric, and have at least one optimal solution. In an adversarial situation two players have opposing objectives. The question arises what one player assumes about the other when computing its own action, which we discuss next.
5 Stochastic Games for V-Formation
We describe the specialization of the stochastic-game verification problem to V-formation. In particular, we present the AMPC-based control strategy for reaching a V-formation, and the various attacker strategies against which we evaluate the resilience of our controller.
The MDP for V-formation was presented in Section 2. The state variables of the MDP are the positions and velocities of the birds, and the control variables (defining the actions) are the accelerations and displacements. In the transition relation given in equation (1), the attacker chooses the displacement it needs to manipulate the position of the birds, whereas the controller chooses the acceleration to apply. Together, the pair defines the action that transforms one MDP state to another. We now define the controller’s and attacker’s strategies.
5.1 Controller’s Adaptive Strategies
Given current state , the controller’s strategy returns a probability distribution on the space of all possible accelerations (for all birds). As mentioned above, this probability distribution is specified implicitly via a randomized algorithm that returns an actual acceleration (again for all birds). This randomized algorithm is the AMPC algorithm, which inherits its randomization from the randomized PSO procedure it deploys.
When the controller computes an acceleration, it assumes that the attacker does not introduce any disturbances; i.e., the controller uses the following model:
[TABLE]
where is the only control variable. Note that the controller chooses its next action based on the current configuration of the flock using MPC. The current configuration may have been influenced by the disturbance introduced by the attacker in the previous time step. Hence, the current state need not be the state predicted by the controller when performing MPC in step . Moreover, depending on the severity of the attacker action , the AMPC procedure dynamically adapts its behavior, i.e. the choice of horizon , in order to enable the controller to pick the best control action in response.
5.2 Attacker’s Strategies
We are interested in evaluating the resilience of our V-formation controller when it is threatened by an attacker that can remove a certain number of birds from the flock, or manipulate a certain number of birds by taking control of their actuators (modeled by the displacement term in equation (2)). We assume that the attack lasts for a limited amount of time, after which the controller attempts to bring the system back into the good set of states. When there is no attack, the system behavior is the one given by equation (6).
Note that there can be many different criteria for evaluating the success of an attack, but in our experiments, the controller is declared the winner if it can bring the flock to V-formation. We consider three attack strategies (but see the future work discussion in Section 8), each of which defines a V-formation game.
Remove Birds Game.
In an RBG, the attacker selects a subset of birds, where , and removes them from the flock. The removal of bird from the flock at time can be simulated in our framework by allowing the attacker to set the displacement for bird to . We assume that the flock is in a V-formation at time . Thus, the goal of the controller is to bring the flock back into a V-formation consisting of birds. In an RBG, the attacker plays only one move. When picking birds, the attacker is able to decide which birds will have the greatest negative impact on the flock’s fitness when removed from the flock. Apart from seeing if the controller can bring the flock back to a V-formation, we also analyze the time it takes the controller to do so.
Random Displacement Game.
In an RDG, the attacker chooses the displacement vector for a fixed number of birds uniformly from the space . This means that the magnitude of the displacement vector is picked from the interval , and the direction of the displacement vector is picked from the interval . We vary in our experiments. The birds that are picked in different steps are not necessarily the same, as the attacker makes this choice uniformly at random at runtime as well. The game starts from an initial V-formation. The attacker is allowed a fixed number of moves, say , after which the displacement vector is identically [math] for all birds. The controller, which has been running in parallel with the attacker, is then tasked with moving the flock back to a V-formation, if necessary.
AMPC Game.
An AMPC game is similar to an RDG except that the attacker does not use a uniform distribution to determine the displacement vector. The attacker is advanced and calculates the displacement (that will be the worst for the controller) using the AMPC procedure. See Figure 1. In detail, the attacker applies AMPC, but assumes the controller applies zero acceleration. Thus, the attacker uses the following model of the flock dynamics:
[TABLE]
Note that the attacker is still allowed to have be nonzero for a small number of birds. However, it can choose which birds it picks in each step. It uses the AMPC procedure to simultaneously pick the birds and their displacements.
Theorem 5.1 (AMPC resilience in a C-A game)
Given a controller-attacker game, there is a finite maximum horizon and a finite maximum number of game-execution steps such that AMPC controller will win the controller-attacker game in steps with probability one.
Proof
Since the flock MDP (defined by Equation 6) is controllable, the PSO algorithm we use is fair, and the attack has a bounded duration, the proof of the theorem follows from Theorem 4.1.
Remark 1
While Theorem 5.1 states that the controller is expected to win with probability one, we expect winning probability to be possibly lower than one in many cases because: (1) the maximum horizon is fixed in advance, and so is (2) the maximum number of execution steps ; (3) the underlying PSO algorithm is also run with bounded number of particles and time.
6 Statistical MC Evaluation of V-Formation Games
As discussed in Section 3, the stochastic-game verification problem we address in the context of the V-formation-AMPC algorithm is formulated as follows. Given a flock MDP (we consider the case of birds), acceleration actions of the controller, displacement actions of the attacker, the randomized strategy of the controller (the AMPC algorithm), and a randomized strategy for the attacker, determine the probability of reaching a state where the fitness function (V-formation in a 7-bird flock), starting from an initial state (in this case this is a V-formation), in the underlying Markov chain induced by strategies , on .
Since the exact solution to this reachability is intractable due to the infinite/continuous space of states and actions, we solve it approximately with classical statistical model-checking (SMC). The particular SMC procedure we use is described in [6] and based on an additive or absolute-error -Monte-Carlo-approximation scheme. This technique requires running i.i.d. game executions, each for a given maximum time horizon, computing if these executions reach a V-formation, and returning the average number of times this occurs.
The i.i.d. experiments determine the random variables , where the sample mean is assumed to be sufficiently greater than 0. In this case, one can exploit the Bernstein’s inequality and fix to . This results in an additive-error -approximation scheme:
[TABLE]
where approximates with absolute error and probability . In our case, each is a Bernoulli random variable, where 1 means that the execution ends in a V-formation, and 0 means the opposite:
[TABLE]
This allows us to use the Chernoff-Hoeffding instantiation of the Bernstein’s inequality, and fix the proportionality constant to , as in [7].
Each of the games described in Section 5 is executed 2,000 times. For a confidence ratio , we thus obtain an additive error of .
We use the following parameters in the game executions: number of birds , threshold on the fitness , maximum horizon , number of particles in PSO . In RBG, the controller is allowed to run for a maximum of steps. In RDG and AMPC game, the attacker and the controller run in parallel for steps, after which the displacement becomes [math], and the controller has a maximum of more steps to restore the flock to a V-formation.
To perform SMC evaluation of our AMPC approach we designed the above experiments in C and ran them on the Intel Core i7-5820K CPU with 3.30 GHz and with 32GB RAM available.
6.1 Discussion of the Results
To demonstrate the resilience of our adaptive controller, for each game introduced in Section 5, we performed a number of experiments to estimate the probability of the controller winning. Moreover, for the runs where the controller wins, the average number of steps required by the controller to bring the flock to a V-formation is reported as average convergence duration, and the average length of the horizon used by AMPC is reported as average horizon.
The numbering of the birds in Tables 1 and 2 is given in Figure 2. Bird-removal scenarios that are symmetric with the ones in the tables are omitted. The results presented in Table 1 are for the RBG game with . In this case, the controller is almost always able to bring the flock back to a V-formation, as is evident from Table 1. Note that removing Bird (or ) is a trivial case that results in a V-formation.
In the case when , shown in Table 2, the success rate of the controller depends on which two birds are removed. Naturally, there are cases where dropping two birds does not break the V-formation; for example, after dropping Birds 1 and 2, the remaining birds continue to be in a V-formation. Such trivial cases are not shown in Table 2. Note that the scenario of removing Bird (or ) and one other bird can be viewed as removing one bird in flock of birds, thus not considered in this table. Among the other nontrivial cases, the success rate of controller drops slightly in four cases, and drops drastically in remaining two cases. This suggests that attacker of a CPS system can incur more damage by being prudent in the choice of the attack.
Impressively, whenever the controller wins, the controller needs about the same number of steps to get back to V-formation (as in the one-bird removal case). On average, removal of two birds results in a configuration that has worse fitness compared to an RBG with . Hence, the adaptive controller is able to make bigger improvements (in each step) when challenged by worse configurations. Furthermore, among the four cases where the controller win rate is high, experimental results demonstrate that removing two birds positioned asymmetrically with respect to the leader poses a stronger, however, still manageable threat to the formation. For instance, the scenarios of removing birds 2 and 6 or 3 and 5 give the controller a significantly higher chance to recover from the attack, and , respectively.
Table 3 explores the effect of making the attacker smarter. Compared to an attacker that makes random changes in displacement, an attacker that uses AMPC to pick its action is able to win more often. This again shows that an attacker of a CPS system can improve its chances by cleverly choosing the attack. For example, the probability of success for the controller to recover drops from to when the attacker uses AMPC to pick displacements with magnitude in and direction in . The entries in the other two columns in Table 3 reveal two even more interesting facts.
First, in the cases when the controller wins, we clearly see that the controller uses a longer look-ahead when facing a more challenging attack. This follows from the observation that the average horizon value increases with the strength of attack. This gives evidence for the fact that the adaptive component of our AMPC plays a pivotal role in providing resilience against sophisticated attacks. Second, the average horizon still being in the range -, means that the adaptation in our AMPC procedure also helps it perform better than a fixed-horizon MPC procedure, where usually the horizon is fixed to . When a low value of (say ) suffices, the AMPC procedure avoids unnecessary calculation that using a fixed might incur.
In the cases where success rate was low (Row 5 in Table 2 and Row 6 in Table 3), we observed improved success rate ( and respectively across runs) when we increased to and to . This shows that success rate of AMPC improves as it is given more resources, as predicted by Theorem 4.1.
7 Related Work
In the field of CPS security, one of the most widely studied attacks is sensor spoofing. When sensors measurements are compromised, state estimation becomes challenging, which inspired a considerable amount of work on attack-resilient state estimation [5, 14, 12, 13, 4]. In these approaches, resilience to attacks is typically achieved by assuming the presence of redundant sensors, or coding sensor outputs. In our work, we do not consider sensor spoofing attacks, but assume the attacker gets control of the displacement vectors (for some of the birds/drones). We have not explicitly stated the mechanism by which an attacker obtains this capability, but it is easy to envision ways (radio controller, attack via physical medium, or other channels [2]) for doing so.
Adaptive control, and its special case of adaptive model predictive control, typically refers to the aspect of the controller updating its process model that it uses to compute the control action. The field of adaptive control is concerned with the discrepancy between the actual process and its model used by the controller. In our adaptive-horizon MPC, we adapt the lookahead horizon employed by the MPC, and not the model itself. Hence, the work in this paper is orthogonal to what is done in adaptive control [10, 1].
A key focus in CPS security has also been detection of attacks. For example, recent work considers displacement-based attacks on formation flight [11], but it primarily concerned with detecting which UAV was attacked using an unknown-input-observer based approach. We are not concerned with detecting attacks, but establishing that the adaptive nature of our controller provides attack-resilience for free. Moreover, in our setting, for both the attacker the and controller the state of the plant is completely observable.
We are unaware of any work that uses statistical model checking to evaluate the resilience of adaptive controllers against (certain classes of) attacks.
8 Conclusions
We have introduced AMPC, a new model-predictive controller that unlike MPC, comes with provable convergence guarantees. The key innovation of AMPC is that it dynamically adapts its receding horizon (RH) to get out of local minima. In each prediction step, AMPC calls PSO with an optimal RH and corresponding number of particles. We used AMPC as a bird-flocking controller whose goal is to achieve V-formation despite various forms of attacks, including bird-removal, bird-position-perturbation, and advanced AMPC-based attacks. We quantified the resiliency of AMPC to such attacks using statistical model checking. Our results show that AMPC is able to adapt to the severity of an attack by dynamically changing its horizon size and the number of particles used by PSO to completely recover from the attack, given a sufficiently long horizon and execution time (ET). The intelligence of an attacker, however, makes a difference in the outcome of a game if RH and ET are bounded before the game begins.
Future work includes the consideration of additional forms of attacks, including: Energy attack, when the flock is not traveling in a V-formation for a certain amount of time; Collisions, when two birds are dangerously close to each other due to sensor spoofing or adversarial birds; and Heading change, when the flock is diverted from its original destination (mission target) by a certain degree.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Adetola, V., De Haan, D., Guay, M.: Adaptive model predictive control for constrained nonlinear systems. Systems & Control Letters 58(5), 320–326 (2009)
- 2[2] Checkoway, S., Mc Coy, D., Kantor, B., Anderson, D., Shacham, H., Savage, S., Koscher, K., an, A.C., Roesner, F., Kohno, T.: Comprehensive experimental analyses of automotive attack surfaces. In: USENIX Security (2011)
- 3[3] Condliffe, J.: A 100-drone swarm, dropped from jets, plans its own moves. MIT Technology Review (Jan 2017)
- 4[4] Davidson, D., Wu, H., Jellinek, R., Ristenpart, T., Singh, V.: Controlling UA Vs with sensor input spoofing attacks. In: Proceedings of WOOT’16, 10th USENIX Workshop on Offensive Technologies. Austin, TX (Aug 2016)
- 5[5] Fawzi, H., Tabuada, P., Diggavi, S.N.: Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Trans. Automat. Contr. 59(6), 1454–1467 (2014), http://dx.doi.org/10.1109/TAC.2014.2303233
- 6[6] Grosu, R., Peled, D., Ramakrishnan, C.R., Smolka, S.A., Stoller, S.D., Yang, J.: Using statistical model checking for measuring systems. In: Proceedings of the International Symposium Leveraging Applications of Formal Methods, Verification and Validation. LNCS, vol. 8803, pp. 223–238. Springer (2014)
- 7[7] Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Proceedings of the International Conference on Verification, Model Checking, and Abstract Interpretation (2004)
- 8[8] Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of 1995 IEEE International Conference on Neural Networks. pp. 1942–1948 (1995)
