A Dynamic Game Framework for Rational and Persistent Robot Deception With an Application to Deceptive Pursuit-Evasion
Linan Huang, Quanyan Zhu

TL;DR
This paper develops a dynamic game framework for rational and persistent deception among robots, using a Bayesian approach and Riccati equations, with applications to pursuit-evasion scenarios and new metrics for deception effectiveness.
Contribution
It introduces a novel PBNE computation method as a stochastic control problem, derives Riccati equations under LQ assumptions, and proposes metrics for evaluating deception strategies.
Findings
PBNE can be characterized by extended Riccati equations.
Receding-horizon algorithm efficiently computes PBNE.
Numerical case study validates the framework and metrics.
Table I: Summary of main notations.
| Variable | Meaning |
|---|---|
| | Set of players in the dynamic game |
| | Set of discrete stages in the dynamic game |
| | Set of possible types for player |
| | Type of player |
| | players’ joint type |
| | Set of types of all players except for player |
| | Types of all players except for player |
| | Set of probability distributions over set |
| | Probability distribution of player ’s type |
| | Probability distribution of the joint type |
| | Probability distribution of noise |
| | System state of dimension at stage |
| | Player ’s state of dimension at stage |
| | Reference trajectory for player of type |
| | Player ’s belief state at stage |
| | players’ joint belief state at stage |
| | State history |
| | State transition function at stage |
| | Player ’s belief transition function at stage |
| | Player ’s cost function at stage |
| | Player ’s PBNE cost |
| | Player ’s PBNE cost when all players’ types are common knowledge |
| | Player ’s action of dimension at stage |
| | players’ joint action at stage |
| | Player ’s action sequence from to |
| | Player ’s and all other players’ control sequences from stage to |
| | Player ’s belief at stage , i.e., the probability of other players’ types being based on player ’s available information of |
This paper has been accepted for publication in IEEE Transactions on Automation Science and Engineering. This research is partially supported by awards ECCS-1847056, CNS-1544782, CNS-2027884, and SES-1541164 from the National Science Foundation (NSF), and grant W911NF-19-1-0041 from the Army Research Office (ARO). L. Huang and Q. Zhu are with the Department of Electrical and Computer Engineering, New York University, 370 Jay Street, Brooklyn, NY 11201, USA; E-mail: {lh2328,qz494}@nyu.edu. Digital Object Identifier 10.1109/TASE.2021.3097286
Abstract
This paper studies rational and persistent deception among intelligent robots to enhance security and operational efficiency. We present an N-player K-stage game with an asymmetric information structure where each robot’s private information is modeled as a random variable or its type. The deception is persistent as each robot’s private type remains unknown to other robots for all stages. The deception is rational as robots aim to achieve their deception goals at minimum cost. Each robot forms a dynamic belief of others’ types based on intrinsic or extrinsic information. Perfect Bayesian Nash Equilibrium (PBNE) is a natural solution concept for dynamic games of incomplete information. Due to its requirements of sequential rationality and belief consistency, PBNE provides a reliable prediction of players’ actions, beliefs, and expected cumulative costs over the entire K stages. The contribution of this work is fourfold. First, we identify the PBNE computation as a nonlinear stochastic control problem and characterize the structures of players’ actions and costs under PBNE. We further derive a set of extended Riccati equations with cognitive coupling under the linear-quadratic setting and extrinsic belief dynamics. Second, we develop a receding-horizon algorithm with low temporal and spatial complexity to compute PBNE under intrinsic belief dynamics. Third, we investigate a deceptive pursuit-evasion game as a case study and use numerical experiments to corroborate the results. Finally, we propose metrics, such as deceivability, reachability, and the price of deception, to evaluate the strategy design and the system performance under deception.
Note to Practitioners
Recent advances in automation and adaptive control in multi-agent systems enable robots to use deception to accomplish their objectives. Deception involves intentional information hiding to compromise the security and operational efficiency of robotic systems. This work proposes a dynamic game framework to quantify the impact of deception, understand the robots’ behaviors and intentions, and design cost-efficient strategies under deception that persists over stages. Existing research on robot deception has relied on experiments, whereas this work aims to lay a theoretical foundation of deception with quantitative metrics, such as deceivability and the price of deception. The proposed model has wide applications, including cooperative robots, pursuit and evasion, and human-robot teaming. Pursuit-evasion games are used as case studies to show how the deceiver can amplify the deception by belief manipulation and how the deceived robots can reduce the negative impact of deception by enhanced maneuverability and Bayesian learning. Future work will focus on designing cooperative deception in swarm robotics and robotic systems that are robust to or further benefit from deception.
Index Terms:
Robot deception, perfect Bayesian equilibrium, pursuit-evasion, linear-quadratic games, discrete-time Riccati equations
I Introduction
Deception is a ubiquitous phenomenon in biology [1], military [2], politics and media [3], and cyberspace [4]. In particular, deception plays an increasingly significant role in cyber-physical systems, including autonomous vehicles and robots driven by artificial intelligence (AI). Recent advances in these AI-enabled technologies have not only allowed robots to adapt to the dynamic environment via real-time observations, but also made them deceivable. A deceiver can intentionally hide or reveal selected information to alter the beliefs and behaviors of the target robots for a higher reward. Since deception has many forms and delivery methods, understanding deception in a unified and quantitative framework is an indispensable step toward assessing the outcomes, measuring the impact, and designing strategies. This work aims to design robots that can interact with others efficiently under deceptive environments.
We identify the following challenges and features of robot deception. First, by definition, deception involves at least two participants interacting with each other. An intelligent robot should further consider other participants’ rationality, predict their potential deceptive behaviors, and adjust its actions accordingly to alleviate the negative effect of deception. Second, due to the robots’ dynamic nature, one-shot deception can exert a subsequent influence. The participating robots need to form long-term objectives to deceive or counter-deceive other robots. The multi-stage interactions also make it possible for the deceiver to apply deception at different stages. Third, each robot contains heterogeneous private information, which results in an asymmetric cognition structure; i.e., robots can form different beliefs over the same piece of unknown information. Thus, besides the couplings of state dynamics and costs, the multi-agent system further has cognitive coupling; i.e., each robot’s behaviors are not only affected by its own belief but also the beliefs of the others.
To capture these features, we model the deceptive interaction between strategic robots as a dynamic game of incomplete information. During the finite stages of interaction, robots accomplish non-cooperative tasks such as pursuit-evasion in the battlefield [5] or cooperative tasks such as collective towing [6]. Robots introduce deception in the above interacting scenarios due to antagonism, selfishness, and privacy concerns. Following Harsanyi’s approach [7], we capture each robot’s private information by a random variable. The realization of the random variable, which is called the robot’s type, is known only to itself, while the support of the random variable, which contains all its possible types, is known to all robots. Take the pursuit-evasion scenario as an example. Due to the constraints of weather, terrain, and weapon, both the evading and the pursuing robots know the feasible beachheads for the evader to land on. However, the evader chooses only one beachhead as his true target, and the evader’s choice, i.e., his type, is unknown to the pursuer. The pursuer in the battlefield knows the existence of the deception and learns to counter the deception by forming and updating her belief based on real-time observations. Since these tasks are usually time-constrained, robots cannot wait and freeze until they have learned the true type. Instead, they have to take concurrent actions while the deceiver’s type remains uncertain.
We consider two classes of belief dynamics based on whether robots exploit the intrinsic information, such as the prediction of other robots’ actions, or the extrinsic information to update their beliefs. Each robot aims to minimize its expected cumulative cost over the K stages. Since the expectation involves its K-stage belief sequence of other players’ private types, its actions should be sequentially rational under its belief sequence and the belief sequence should be consistent with the belief dynamics as well. These two requirements lead to the solution concept of Perfect Bayesian Nash Equilibrium (PBNE) where a player’s unilateral deviation from the equilibrium increases his long-run cost. By appending the belief state (i.e., all players’ beliefs under all possible types) to the system state, the PBNE computation is equivalent to a multi-agent nonlinear stochastic control problem and the method of dynamic programming applies. Without loss of generality, we characterize the structure of the action and the cost under PBNE as a feedback function of the belief state and the system state at the current stage. To provide an offline evaluation metric of the equilibrium cost under incomplete information, we use the expected equilibrium cost under complete information as a benchmark and define the Price of Deception (PoD).
Due to their tractability and generality, we focus on incomplete-information Linear-Quadratic (LQ) games with extrinsic belief dynamics to obtain the PBNE action that is unique and affine in the system state. We obtain a set of extended Riccati equations, which explicitly characterizes the coupling in the state dynamics, costs, and cognition of all robots. Under proper decoupling structures, the extended Riccati equations degenerate to the classical Riccati equations for the problems of optimal control or complete-information LQ games. Under the incomplete-information LQ games with intrinsic belief dynamics, the equilibrium action is in general not an affine feedback of the system state. Thus, we adopt a receding-horizon approach to provide a reasonable approximation of PBNE; i.e., instead of offline planning of all K-stage actions before the game starts, players recompute their actions based on the real-time observations and their updated beliefs at each new stage during the interaction.
Finally, we investigate a target protection problem where an evader aims to deceptively reach one of the possible targets and simultaneously evade the pursuer. The game has double-sided asymmetric information. The evader’s private or hidden information is his true target while the pursuer’s private information is her capability to maneuver, i.e., her maneuverability. We propose multi-dimensional metrics, including the stage of truth revelation and the endpoint distance, to assess the deception impact. We define the concept of deceivability to characterize the fundamental limits of deception and investigate how it is affected by the distinguishability of the private information. We compare the proposed control policy with two heuristic policies to demonstrate its efficacy in countering deception at a much lower cost. We show that Bayesian learning can significantly reduce the impact of initial belief manipulation and result in a win-win situation for some cases. The increase of the pursuer’s maneuverability improves her control performance under deception yet has a marginal effect. We also find that applying deception to counter deception is not always effective; e.g., it can be beneficial for a less maneuverable pursuer to disguise as a more maneuverable pursuer but not vice versa. The numerical results corroborate that PoD can exceed ; i.e., deception among players may not only benefit the deceiver but also the deceivee.
I-A Related Works
The secure and efficient operation of robots, autonomous vehicles, and industrial control systems is vital for recent advances in technologies. Many works [8, 9, 10] have investigated how to protect these systems from various attacks on sensor measurements [11], communication channels [12], and control signals [13, 14]. Deception is a key feature of sophisticated attacks with a focus on intentionally hiding private information [15, 16], introducing randomness [17], and manipulating other players’ beliefs [18, 19]. Deception in robotic systems can be conducted through visual displays [20], facial expressions and body gestures [21], and trajectories [22, 15]. Existing works on robot deception are largely based on experimental approaches [15, 23, 24]. There is a need for a formal and quantitative framework to assess the deception impact, understand the fundamental limit and tradeoff of deception, and determine real-time strategies. Compared to the theoretical works of deceptive path planning and goal recognition [25, 26], which focus on identifying the true target behind deception, our work further determines optimal and cost-effective control policies to counteract deception and physically protect the true target; e.g., the pursuer adopts the action sequence of minimum cost to reach and protect the true beachhead selected by the evader. Compared to control-theoretic deception frameworks based on Markov decision processes [17, 18] and stochastic games [27], we adopt a state-space representation to better characterize the physical dynamics of robots and autonomous vehicles.
Game models such as hypergames [28], dynamic Bayesian games [16], partially observable stochastic games [19, 29], and games that involve signaling mechanisms [30, 31] have been adopted as natural analytic paradigms to understand deception between intelligent players. The computation of equilibrium solutions for dynamic games of incomplete information, especially ones with non-classical information structure [32], is often a challenging task. Previous works have adopted conjugate prior assumptions to simplify Bayesian update and decouple the forward type estimation and backward action optimization under a finite state space and a continuous type space [33, 34]. To solve the coupling between players’ belief dynamics and the multi-agent optimal control problem in the context of robotic systems where states are continuous and constrained by physical dynamics with noises, we adopt a receding-horizon approach to compute PBNE, which yields computationally tractable online strategies for the players. Similar receding-horizon approaches have been used in other contexts, including cyber-physical systems [35], military air operation [36], and autonomous racing [37].
I-B Notations and Organization of the Paper
Calligraphic letter defines a set and represents its cardinality. Define as the set of elements in but not in . The Euclidean norm of a vector is represented by . Let denote the expectation of over random variable whose probability distribution is . Let ′ represent matrix transpose and represent a block diagonal matrix with possibly non-square matrices , on its diagonal. Define as a set of elements, as block matrices of the same number of rows arranged in one row vector, and as block matrices of the same number of columns arranged in one column vector. Let be the identity matrix and the zero matrix, respectively. The superscript is the stage index and the subscript is the player index. We omit a function’s arguments when there is no ambiguity, e.g., . A piece of information for a group of players is called common knowledge if all players know it, all players know that all players know it, and so on ad infinitum. We summarize main notations in Table I.
The rest of the paper is organized as follows. Section II introduces the dynamic game of incomplete information and the solution concept of PBNE. To obtain explicit and practical solutions, we consider a class of linear-quadratic problems in Section III and obtain a set of extended Riccati equations. We present a case study of deceptive pursuit-evasion in Section IV, and Section V concludes the paper.
II Dynamic Game with Private Types
We model deception as a K-stage game consisting of N robots as players, where each robot has asymmetric information. Let be the set of players and be the set of discrete stages. Private information of player , i.e., his type , is modeled as the realization of a discrete random variable with a finite support and a prior probability distribution . Hence, is the number of possible types for player and is the probability that player ’s type is . Define shorthand notation and let be the set of types of all players except for player . Each player knows the value of his own type , but does not know the values of other players’ types , throughout the K stages of the game. The system state dynamics under players’ joint action , joint type , and an additive external noise are shown in (1):
[Equation (1): system state dynamics]
The dynamics in (1) can have different interpretations based on applications. In the pursuit-evasion scenario as in [5], represents robot ’s local states such as its location and speed. The system state can be explicitly represented by robots’ joint state with . In the application where robots cooperatively transport a payload, e.g., [38, 6], system state represents the payload’s location and posture, which does not explicitly relate to robots’ local states. The noise sequence is assumed to be independent with probability density function , i.e., . The noise is not necessarily Gaussian distributed but is assumed to have a zero mean, i.e., . We assume that system dynamics (1) are multi-agent controllable as defined in Definition 1 so that players can design their deceptive actions to reach the entire state space in finite stages.
Definition 1 (Multi-Agent Controllability).
System dynamics (1) are called multi-agent controllable if for any target state at stage , initial state , and joint type , there exists a sequence of finite joint actions that drive the system state from to in expectation.
II-A Forward Belief Dynamics
At each stage , the information available to player comprises all players’ state history as well as his own type value . Define as the set of probability distributions over set . Each player at stage forms a belief based on his available information. Thus, is a probability measure of other players’ types, i.e., .
Define a vector
[Equation: definition of the belief-state vector]
as player ’s belief state at stage . We assume that the set of belief states is independent of stages, i.e., . Then, we can represent player ’s belief dynamics as
[Equation (2): belief dynamics]
Note that the belief transition function can be different for each and , i.e., players’ belief updates can be heterogeneous and time-varying. Define . In this work, we assume that the initial beliefs of all players of all types and the belief update rules , are common knowledge. In the next two subsections, we provide two specific forms of that rely on intrinsic and extrinsic information, respectively.
II-A1 Bayesian Belief Dynamics
The most common belief update rule in (2) for player at stage uses Bayesian inference. Given the knowledge of the sequential state observations and all players’ actions , each player of type at stage can update his belief as follows: ,
[Equation (3): Bayesian belief update]
In (3), we use the Markov property, i.e., . The denominator is positive as .
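The Bayesian update described above can be illustrated with a minimal Python sketch over a finite set of candidate types. It is not the paper's equation (3); it only shows the generic prior-times-likelihood-then-normalize step. The names `bayes_belief_update` and `gaussian_pdf`, the scalar state, the per-type predicted next states, and the Gaussian noise density are all assumptions made for illustration (the paper only requires zero-mean noise).

```python
import numpy as np

def bayes_belief_update(belief, likelihoods):
    """One Bayesian step over a finite set of candidate types.

    belief[j]      : current probability assigned to candidate type j
    likelihoods[j] : density of the observed next state if the type were j
                     (hypothetical input for this sketch)
    """
    belief = np.asarray(belief, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    unnormalized = belief * likelihoods
    total = unnormalized.sum()
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under every type")
    return unnormalized / total

def gaussian_pdf(residual, sigma):
    """Assumed noise density, used only to build example likelihoods."""
    return np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Two candidate types with hypothetical predicted next states +1 and -1.
prior = np.array([0.5, 0.5])
predicted_next_state = np.array([1.0, -1.0])
observed_next_state = 0.8

likelihoods = gaussian_pdf(observed_next_state - predicted_next_state, sigma=1.0)
posterior = bayes_belief_update(prior, likelihoods)
```

Because the observation lies closer to the first prediction, the posterior shifts toward the first type. The explicit check on `total` mirrors the requirement that the denominator of the update be positive.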
Remark 1 (Actions Reveal Type Information).
Even if the state dynamics in (1) are independent of , player can still learn player ’s type via (3) as player ’s action is a function of his type . (Footnote 1: Each player’s action is a function of his type because his cost depends on his type and the action aims to minimize his cost.)
II-A2 Markov-Chain Belief Dynamics
In Section II-A1, we assume that players can exploit the intrinsic information of state dynamics , state observations , and the prediction of all players’ actions . Since the above intrinsic information may not be available in practice, we consider the belief dynamics with extrinsic information in this subsection. In particular, we assume that each player ’s belief dynamics , are a discrete-time Markov chain where the extrinsic information at stage is characterized by the transition function . Note that the transition function only characterizes how players update their beliefs at each stage yet does not guarantee that a player can learn the true types of others. The following example illustrates a class of players whose belief dynamics exhibit the confirmation bias [39] where players tend to ignore intrinsic evidence such as and preserve their belief update rules at each stage .
Example 1.
Consider a two-person game where the first player has two types and the second player only has one type . The second player’s belief state toward the first player’s type belongs to a finite set . The transition function is independent of : if the current belief state is , then the belief at the next stage is or with probability , respectively. If the current belief state is (resp. ), then the belief at the next stage is (resp. ) or with probability and , respectively. The above transition function means that the second player tends to interpret the extrinsic information of the first player’s type based on his current belief. If the second player already believes that the first player is of type with a high probability of at stage , i.e., , then the second player is more inclined to reinforce his current belief; i.e., his belief state at the next stage, , will remain with a high probability of . The above transition function represents the phenomena of attitude polarization and confirmation bias, where players preserve their existing beliefs and the disagreement becomes more extreme at each stage even when players are exposed to the same evidence.
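The structure of Example 1 can be sketched as a three-state Markov chain over belief values. The specific numbers below (belief values 0.1, 0.5, 0.9, a "stay" probability of 0.9 at the extreme beliefs, and an even split from the middle belief) are hypothetical, since the example's numerical values are not available in this text; only the shape of the transition structure follows the description.

```python
import numpy as np

# Hypothetical belief values: probability assigned to the first player's first type.
belief_values = np.array([0.1, 0.5, 0.9])

# Hypothetical probability of keeping an extreme belief at the next stage.
stay = 0.9

# Row s gives the distribution of the next belief state given current state s.
transition = np.array([
    [stay, 1.0 - stay, 0.0],   # low belief: mostly stays low, else moves to middle
    [0.5, 0.0, 0.5],           # middle belief: moves to either extreme
    [0.0, 1.0 - stay, stay],   # high belief: mostly stays high, else moves to middle
])

def belief_distribution(initial, transition, stages):
    """Distribution over belief states after a number of stages."""
    dist = np.asarray(initial, dtype=float)
    for _ in range(stages):
        dist = dist @ transition
    return dist

# Start from the undecided (middle) belief and propagate 20 stages.
dist = belief_distribution([0.0, 1.0, 0.0], transition, stages=20)
```

Under these assumed numbers, most of the probability mass ends up on the two extreme beliefs, which is one way to read the polarization effect described in the example. The chain never uses observations of the state or actions, which matches the extrinsic nature of these belief dynamics.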
II-B Nonzero-Sum Cost Function and Equilibrium Concept
At non-terminal stage , player ’s cost function is . The final stage cost is . Define as player ’s action sequence from stage to and as player ’s and all other players’ action sequences from stage to . Player ’s expected cumulative cost from arbitrary initial stage to the terminal stage is defined as
[Equation (4): expected cumulative cost]
The expectations are taken first over the external noise sequence and then over other players’ internal type uncertainty.
We cannot exchange the order of these two expectations as is a function of . Each player at stage aims to minimize by choosing only his action sequence but not other players’ action sequence . The following definition of sequential rationality in Definition 2 guarantees that each player has no motivation to deviate from the sequentially rational action at any stage during the interaction if all other players adopt the sequentially rational actions.
Definition 2 (Sequential Rationality).
An action sequence is called sequentially rational for player under the belief sequence , state , and type , if for any state at stage , player does not benefit from taking any other action sequence , i.e., .
Since players’ actions may affect their future beliefs as captured by the belief dynamics in (2), we further require the equilibrium action in Definition 2 to be consistent with the belief dynamics, which leads to the following definition of Perfect Bayesian Nash Equilibrium (PBNE).
Definition 3 (Perfect Bayesian Nash Equilibrium).
Consider the N-player dynamic game of private types and asymmetric information defined by the state dynamics (1) and the expected cumulative cost (4). The action sequence of the N players over the K stages constitutes the Perfect Bayesian Nash Equilibrium (PBNE) if, regardless of each player ’s type , the following statements hold.
1. Sequential rationality: is sequentially rational for each player under his belief sequence ;
2. Belief consistency: each player ’s belief sequence is consistent with (2) under .
Proposition 1.
It is sufficient to represent player ’s equilibrium cost under the PBNE action at stage as a function of , and , which is defined as . Under the boundary condition , the following holds for all and all , i.e.,
[Equation (5): dynamic programming recursion for the equilibrium cost]
where and satisfy (2) and (1), respectively.
Proof.
According to the definition of PBNE, at the second-to-last stage , each player ’s equilibrium action is in general a function of . Due to the coupling between and , we need to solve a set of system equations for all and . Then, will be a function of and we obtain (5) at stage . We can repeat the above procedure from to to obtain the recursive form in (5). ∎
Proposition 1 characterizes the structure of the equilibrium action and the equilibrium cost for each player of type under the solution concept of PBNE; i.e., both terms are feedback functions of the belief state , the physical state , and the player’s type . Although is a function of beliefs over all the remaining stages, only depends on the belief state at the current stage .
If all players’ types are common knowledge, PBNE still applies and we can define a new function to represent the resulting equilibrium cost for all without loss of generality.
II-C Offline Evaluation of Equilibrium Cost
If each player ’s initial belief conforms to the prior distribution of other players’ types, i.e., , then each player at system state with belief state can use his expected equilibrium cost over his type uncertainty as an offline performance measure of the equilibrium action . As a comparison, player ’s expected equilibrium cost under the complete information game serves as a benchmark. Note that player does not need to know the realization of the joint type to compute . Due to the coupling in dynamics, costs, and cognition among players, obtaining more information and knowing the type of another player may not always improve player ’s performance; i.e., there is no guarantee that . Besides the above performance evaluation for an individual player under deception, we may also aim to evaluate the overall performance of multiple players or all players. We define the Price of Deception (PoD) in Definition 4 with a set of coefficients . Since the equilibrium cost can be negative, we let be the normalizing constant to guarantee that is non-negative for all chosen coefficients .
Definition 4 (Price of Deception).
For a given set of coefficients , the Price of Deception (PoD) of the N-player K-stage game defined by (1), (4), and (2) under the prior probability distribution is
[Equation: definition of the Price of Deception]
The PoD is a crucial evaluation and design metric. We can endow PoD with different meanings by properly choosing the weighting coefficients . For example, suppose that, besides the N players, there is a central planner who aims to minimize the total cost of all players under their deceptive interaction. Then, we can pick , to represent the overall system performance. Although the central planner cannot control players’ state dynamics, costs, and belief dynamics directly, he can still affect their deceptive interaction if he can design the prior probability distribution of the joint type . If the central planner instead only aims to reduce the cost of one player , then we can pick and . Given the weighting parameters , a larger value of indicates a better accomplishment of the above goals. Note that individual deception may improve the system performance, i.e., .
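To make the role of the weighting coefficients concrete, here is a small Python sketch. The exact formula of Definition 4 is not available in this text, so the ratio used below is an assumption: weighted, offset expected equilibrium cost under complete information divided by the same quantity under incomplete information, chosen only because it agrees with the statement that a larger PoD indicates a better outcome under deception. The function name `price_of_deception`, the additive `offset`, and all cost values are hypothetical.

```python
def price_of_deception(weights, cost_complete, cost_incomplete, offset=0.0):
    """Assumed ratio form of a weighted Price of Deception.

    weights         : one non-negative coefficient per player
    cost_complete   : expected equilibrium costs when types are common knowledge
    cost_incomplete : expected equilibrium costs under private types
    offset          : constant added to every cost so that both sums stay positive
                      (an assumed stand-in for the normalizing constant)
    """
    numerator = sum(w * (c + offset) for w, c in zip(weights, cost_complete))
    denominator = sum(w * (c + offset) for w, c in zip(weights, cost_incomplete))
    if denominator <= 0.0:
        raise ValueError("offset too small: weighted incomplete-information cost is not positive")
    return numerator / denominator

# Hypothetical expected equilibrium costs for two players.
complete = [4.0, 6.0]
incomplete = [5.0, 5.0]

system_level = price_of_deception([1.0, 1.0], complete, incomplete)   # all players weighted equally
player_one = price_of_deception([1.0, 0.0], complete, incomplete)     # only the first player counts
```

In this made-up instance the equally weighted value is 1 (total cost unchanged by deception) while the first player alone is worse off, which shows how the choice of coefficients changes what the metric measures.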
III Linear-Quadratic Specification
Linear-Quadratic (LQ) games are an important class of dynamic games. They can also be applied iteratively to approximate nonlinear stochastic systems with general cost functions and obtain equilibrium actions [40]. In the following sections, we consider linear state dynamics
[Equation (6): linear state dynamics]
with stage-varying matrices , .
Remark 2.
System (6) is multi-agent controllable if and only if matrices , are of full rank as noise has zero mean and we can obtain by induction.
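The "in expectation" reading of controllability under zero-mean noise can be checked with a short simulation. The sketch below assumes a common form of multi-player linear dynamics (next state equals a state matrix times the state, plus each player's input matrix times that player's action, plus additive noise); the actual structure of (6), including any type dependence, is not available here. The names `step` and `rollout`, the matrices, the actions, and the uniform noise are all illustrative assumptions.

```python
import numpy as np

def step(x, actions, A, B_list, noise):
    """One stage of the assumed linear dynamics: A x + sum_i B_i u_i + w."""
    control = sum(B @ u for B, u in zip(B_list, actions))
    return A @ x + control + noise

def rollout(x0, action_seq, A, B_list, rng=None, half_width=0.0):
    """Apply a joint action sequence; noise is uniform on [-half_width, half_width]."""
    x = np.asarray(x0, dtype=float)
    for actions in action_seq:
        if rng is None:
            w = np.zeros_like(x)          # noise-free (nominal) trajectory
        else:
            w = rng.uniform(-half_width, half_width, size=x.shape)
        x = step(x, actions, A, B_list, w)
    return x

# Hypothetical two-player system with a two-dimensional state.
A = np.eye(2)
B_list = [np.eye(2), 0.5 * np.eye(2)]
action_seq = [(np.array([1.0, 0.0]), np.array([0.0, 2.0]))] * 5
x0 = np.zeros(2)

nominal = rollout(x0, action_seq, A, B_list)

rng = np.random.default_rng(0)
samples = np.array([rollout(x0, action_seq, A, B_list, rng, 0.5) for _ in range(5000)])
mean_state = samples.mean(axis=0)
```

The noise here is deliberately non-Gaussian but zero-mean, so the sample mean of the final state should sit close to the noise-free endpoint, which is the property the remark relies on.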
Each player ’s cost is quadratic in both and ; i.e.,
[Equation (7): quadratic stage cost]
where is a known type-dependent reference trajectory for player and is a known function of . The cost matrices , are symmetric. At the final stage, . We introduce the following three sets of notations for the belief matrix, the extended Riccati equations, and the matrix-form equilibrium action, respectively.
Belief Matrix
With a little abuse of notation, we can define the marginal probability , as the player ’s belief toward the player ’s type at stage . Define the belief matrix for all , as
[Equation (8): belief matrix]
where each block element . Since all its elements are nonnegative and all rows sum to one, the belief matrix is a right stochastic matrix.
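The right-stochastic property is easy to verify numerically. The sketch below is a generic check, not the paper's construction of the belief matrix; the function name `is_right_stochastic` and the example matrix are hypothetical.

```python
import numpy as np

def is_right_stochastic(matrix, tol=1e-9):
    """True if every entry is nonnegative and every row sums to one."""
    matrix = np.asarray(matrix, dtype=float)
    nonnegative = np.all(matrix >= -tol)
    rows_sum_to_one = np.allclose(matrix.sum(axis=1), 1.0, atol=tol)
    return bool(nonnegative and rows_sum_to_one)

# Hypothetical matrix whose rows are probability distributions over three types.
belief_matrix = np.array([
    [0.7, 0.3, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
])

ok = is_right_stochastic(belief_matrix)
product_ok = is_right_stochastic(belief_matrix @ belief_matrix)
bad = is_right_stochastic([[0.5, 0.6], [1.0, 0.0]])
```

The product of two right stochastic matrices is again right stochastic, which the second check confirms for this example.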
Extended Riccati Equations
Let a sequence of symmetric matrices , vectors , and scalars satisfy the following extended Riccati equations for all :
[Equations (9), (10), and (11): extended Riccati equations]
where functions , are defined below. The boundary conditions of the extended Riccati equations are
[Equation (12): boundary conditions of the extended Riccati equations]
Equilibrium Action in Matrix Form
We need to represent the equilibrium action of all players under all types in matrix form as each player’s action is coupled with other players’ actions under PBNE. Since each player has different equilibrium actions under different types, with a little abuse of notation, we write each player ’s action as a function of his type and define two action vectors and . For all , define a series of -by- square matrices
[Equation: definitions of the square matrices]
Let be -by- block matrices and be -by- block matrices. Finally, define parameter matrices , , and for any . Their elements are given as follows; i.e., ,
[Equation: elements of the parameter matrices]
Let matrix , be the truncated row block, i.e., from row to , of matrix . Define shorthand notations and .
III-A Extrinsic Belief Dynamics and Extended Riccati Equations
In this section, we focus on the extrinsic belief dynamics where is independent of players’ actions for all . The proof of Theorem 1 generalizes the one of classical LQ games (e.g., Chapter and in [41]) where we further incorporate players’ asymmetric belief dynamics into their objective functions to minimize their expected costs under deception. We apply dynamic programming from stage backward to stage [math] to obtain a closed-form solution of PBNE.
**Theorem 1.**
An -player -stage LQ game of incomplete information defined by (6), (7), and extrinsic belief dynamics , admits a unique state-feedback PBNE
[TABLE]
if and only if is positive definite and is non-singular for all . The equilibrium cost is quadratic in , i.e.,
[TABLE]
Proof.
We use backward induction to prove the result. At the final stage , the value function is quadratic in and we obtain the boundary conditions for in (12) by matching the RHS of (14). At any stage , if (14) is true at stage , we can expand by plugging in the state dynamics and the belief dynamics . Then, the Right-Hand Side (RHS) of (5) is quadratic in for each player . If the coefficient matrix of the quadratic form is positive definite, then the first-order necessary conditions for minimization are also sufficient and we obtain the following unique set of equations for the equilibrium action by differentiating the RHS of (5) and setting it to zero, i.e., ,
[TABLE]
Due to the coupling in players’ actions and beliefs, we rewrite (15) in matrix form, i.e., , to solve the set of equations. Given the existence of , each player ’s equilibrium action is an affine function in , i.e., . Note that the coefficients for player are functions of , i.e., the beliefs of all players under all types at stage .
Finally, after substituting the equilibrium action into the RHS of (5) and representing in the Left-Hand Side (LHS) in its quadratic form of , we can match the coefficients of quadratic, linear, and constant terms in the LHS and RHS to obtain the extended Riccati equations (9), (10), and (11). ∎
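The step in the proof where the coupled first-order conditions are stacked and solved as one linear system can be illustrated on a toy problem. The sketch below is not the paper's equation (15): it assumes a hypothetical static game with two players, scalar actions, and costs of the form J_i = 0.5 r_i u_i^2 + d_i u_i u_j + c_i x u_i, so that each first-order condition reads r_i u_i + d_i u_j + c_i x = 0. All symbols and the function name are illustrative assumptions. The toy has no reference trajectory, so the solution is linear rather than affine in the state.

```python
def solve_coupled_first_order_conditions(r, d, c, x):
    """Solve r_i*u_i + d_i*u_j + c_i*x = 0 for both players of a toy game.

    r, d, c are pairs of hypothetical scalar weights; x is the observed state.
    """
    r1, r2 = r
    d1, d2 = d
    c1, c2 = c
    # Each player's cost must be strictly convex in his own action.
    if r1 <= 0 or r2 <= 0:
        raise ValueError("own-action weights must be positive")
    # The stacked coupling matrix [[r1, d1], [d2, r2]] must be invertible.
    det = r1 * r2 - d1 * d2
    if abs(det) < 1e-12:
        raise ValueError("coupling matrix is singular; no unique solution")
    # Cramer's rule gives state-feedback gains; actions are linear in x.
    gain1 = -(r2 * c1 - d1 * c2) / det
    gain2 = -(r1 * c2 - d2 * c1) / det
    return gain1 * x, gain2 * x


u1, u2 = solve_coupled_first_order_conditions((2.0, 3.0), (0.5, -1.0), (1.0, 2.0), 4.0)
print(u1, u2)
```

The two checks mirror the two conditions of the theorem in miniature: convexity in each player's own action makes the first-order conditions sufficient, and invertibility of the stacked matrix makes the solution unique.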
**Remark 3 (Positive Definiteness).**
If and , are positive definite for all , then is positive definite for all , because the linear combination of positive definite matrices in (9) preserves positive definiteness. Note that the above condition is only a sufficient condition; i.e., and do not need to be positive definite to make positive definite as shown in Section IV.
**Remark 4 (Cognitive Coupling).**
Compared with the classical LQ games (e.g., Chapter in [41]), the deception of players’ types results in a unique feature of cognitive coupling represented by the belief matrix in (8); i.e., each player’s action hinges on not only his own belief but also all other players’ beliefs as these beliefs can affect their actions and further the outcome of the interaction. Thus, player can change other players’ actions by manipulating their beliefs of his type , i.e., or making them believe that his belief on their types has changed.
We introduce matrix block partitions as follows. For each type , we divide into -by- blocks where the block is , respectively. The - row block of is , respectively. The - row block of is .
When the system state can be represented by players’ joint states , Corollary 1 shows that the LQ game of asymmetric information degenerates to an LQ control problem if players have decoupled cost and state dynamics defined as follows.
**Definition 5 (Decoupled Dynamics and Cost).**
Player has decoupled dynamics if for all , , while all other elements in the - row block and the - column block of are [math]. Besides, all elements of except for the row block are required to be [math]. Player has a decoupled cost if for all stages , , and all elements of equal [math] except for .
**Corollary 1 (Degeneration to LQ Control).**
If for all stages and player has both decoupled cost and state dynamics, then his action under PBNE is independent of other players’ actions, types, and beliefs, i.e., , where , , , and .
Proof.
We show by induction that satisfy the sparsity condition that only the block of and the - row block of are nonzero. At stage , and satisfy the above condition. At stage , if satisfy the sparsity condition, becomes a diagonal block matrix where and for all . Then, satisfy the condition based on (9) and (10). ∎
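For intuition about the single-player problem that the game degenerates to, the sketch below runs the textbook backward Riccati recursion for a scalar, noise-free, finite-horizon LQ regulator. It is the classical recursion, not the extended equations (9) to (11): there is no reference trajectory, no type, and no belief. The dynamics x_{k+1} = a x_k + b u_k, the stage cost q x^2 + r u^2, the terminal cost q_final x^2, and all names are assumptions made for illustration.

```python
def scalar_lq_backward_pass(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar finite-horizon LQ regulator."""
    p = [0.0] * (horizon + 1)   # cost-to-go coefficients: V_k(x) = p[k] * x**2
    gains = [0.0] * horizon     # feedback gains: u_k = -gains[k] * x_k
    p[horizon] = q_final        # boundary condition at the final stage
    for k in range(horizon - 1, -1, -1):
        denom = r + b * b * p[k + 1]
        gains[k] = a * b * p[k + 1] / denom
        p[k] = q + a * a * p[k + 1] - (a * b * p[k + 1]) ** 2 / denom
    return p, gains


def simulate_cost(a, b, q, r, q_final, gains, x0):
    """Roll the closed loop forward and accumulate the realized cost."""
    x, total = x0, 0.0
    for gain in gains:
        u = -gain * x
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total + q_final * x * x


p, gains = scalar_lq_backward_pass(a=1.0, b=0.5, q=1.0, r=0.1, q_final=5.0, horizon=10)
x0 = 2.0
print(p[0] * x0 * x0, simulate_cost(1.0, 0.5, 1.0, 0.1, 5.0, gains, x0))
```

The two printed numbers agree up to rounding, which is the scalar analogue of the equilibrium cost being quadratic in the state.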
III-B Intrinsic Belief Dynamics and Receding-Horizon Control
If there exists a player whose belief dynamics depend on intrinsic information at some stage as shown in (2), then the equilibrium action is in general a nonlinear function of and the equilibrium cost is not quadratic in even under the LQ setting of (6) and (7). Besides the static cognitive coupling among players in Remark 4, the intrinsic information of in the belief update introduces another dynamic cognitive coupling between the forward belief dynamics via (2) and the backward equilibrium computation via (5), which makes it challenging to compute PBNE. To reduce the computational complexity and further obtain implementable actions, we adopt a receding-horizon approach that computes the sequentially rational action sequence of all the future stages at current stage assuming , yet only implements the current-stage action . Then, at the new stage , each player observes the new system state and updates the belief to and recomputes the entire action sequence under assumption of , yet still only implements the new current-stage action . Players repeat the above procedure until they reach the final stage of the interaction.
Compared with PBNE, which produces an offline planning for all future stages under all possible scenarios before the game has taken place, the receding-horizon approach enables an online replanning of their actions repeatedly at the beginning of each new stage as the interaction continues.
Although we assume that players’ beliefs at the future stages are the same as the current beliefs during the phase of equilibrium computation, players can correct and update their beliefs and actions based on the online observation of during each replanning phase. Thus, the receding-horizon approach provides a reasonable approximation of the PBNE action and is more adaptive to unexpected environmental changes of the state dynamics and cost structure .
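The plan, implement, observe, and update cycle described above can be sketched as a generic loop. This is only a structural illustration and is not Algorithm 1 or Algorithm 2: the planner, the dynamics, and the belief rule are passed in as callables, and the three `toy_*` functions are hypothetical stand-ins (a belief-weighted target planner, a noise-free integrator, and a placeholder learning rule that is not Bayesian).

```python
def receding_horizon_run(initial_state, initial_belief, horizon,
                         plan, step, update_belief):
    """Replan at every stage with the belief frozen; apply only the first action."""
    state, belief = initial_state, initial_belief
    history = []
    for stage in range(horizon):
        # Computation phase: plan all remaining stages as if the belief never changes.
        planned_actions = plan(state, belief, stage, horizon)
        action = planned_actions[0]  # implement only the current-stage action
        history.append((stage, state, belief, action))
        # Implementation phase: the state evolves, then the belief is revised.
        next_state = step(state, action)
        belief = update_belief(belief, state, action, next_state)
        state = next_state
    return state, belief, history


# --- Hypothetical stand-ins, chosen only to make the loop runnable ---
TARGET_IF_TRUE, TARGET_IF_FALSE = 10.0, -10.0


def toy_plan(state, belief, stage, horizon):
    """Head for the belief-weighted target in equal steps over the remaining stages."""
    expected_target = belief * TARGET_IF_TRUE + (1.0 - belief) * TARGET_IF_FALSE
    remaining = horizon - stage
    return [(expected_target - state) / remaining] * remaining


def toy_step(state, action):
    """Noise-free single-integrator dynamics."""
    return state + action


def toy_update_belief(belief, state, action, next_state):
    """Placeholder learning rule (not Bayes): halve the gap to certainty."""
    return belief + 0.5 * (1.0 - belief)


final_state, final_belief, history = receding_horizon_run(
    0.0, 0.5, 5, toy_plan, toy_step, toy_update_belief)
print(final_state, final_belief)
```

Passing the three components as callables keeps the loop independent of any particular game model, so a different planner or belief rule can be substituted without touching the replanning logic.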
Under the LQ specification in (6) and (7) and Bayesian belief dynamics in (3), we summarize the computation phase and online implementation phase in Algorithm 1 and 2, respectively. To investigate the scalability of our algorithms, we analyze the temporal and spatial complexity concerning , and . To simplify the notation and enhance readability, we focus on the symmetric setting where . For each player of type at the beginning of the interaction, i.e., , he needs to store the game parameters , and the belief matrix for all , which are common knowledge. The spatial complexity to store the game parameters and the belief matrix is and , respectively. Note that in general, player has coupled cognition as shown in Remark 4 and has to keep track of not only his belief , but also other players’ beliefs , to decide his equilibrium action under deception at each stage . During the -stage interaction, each player of type observes the system state and computes his equilibrium action at stage based on Algorithm 1. After all players implement their equilibrium actions at stage , the system state evolves to . Based on the new state observation , each player updates the belief matrix in (8) via (3). Since player can delete the game parameters and the belief matrices of previous stages, the spatial complexity remains the same as the real-time stage index increases. Thus, our algorithm can handle the interaction of long duration. All players repeat the above procedure stated in lines - of Algorithm 2 until reaching the terminal stage .
The computational complexity of the belief matrix update in the line of Algorithm 2 is . For any , the term has computational complexity , which is determined by the belief matrix update and the matrix chain multiplication of , respectively. Then, the computational complexity of and is and , respectively. Given and , the computational complexity of in (9) is , which hinges on the computational complexity of (or ), , and the matrix chain multiplication in (9). Similarly, and both have computational complexity of . Therefore, player ’s temporal complexity at each stage is
[TABLE]
The temporal complexity has the maximum value of at the initial stage where each player has to predict the entire future stages to act optimally under the deception. Since the temporal complexity decreases as the real-time stage index increases, a player who can compute the equilibrium action within the required time at the initial stage is guaranteed to meet the real-time requirement in the following stages of interaction. If the number of types and agents are on the same scale, e.g., , then and the computation of belief matrix update plays a dominant role as each player keeps track of all players’ beliefs to obtain the equilibrium action under deception. If , e.g., , then and the inverse of becomes the most time-consuming operation due to the coupling in dynamics, costs, and cognition.
Effective deception can prevent or delay other players from learning the deceiver’s private type. We define the criterion of successful learning of the deceiver’s type in Definition 6 and -deceivability and -learnability in Definition 7.
**Definition 6 (Stage of Truth Revelation).**
Consider two players with type and , respectively. Stage is said to be player ’s truth-revealing stage with accuracy if it satisfies the following two conditions. (Footnote 2: Since the belief mismatch does not reduce to [math] in finite stages with initial belief , the accuracy threshold .)
- *The bounded mismatch condition*: player ’s belief mismatch remains less than after stage , i.e.,
[TABLE]
- *The first-hitting-time condition*: is the first stage satisfying (16), i.e.,
If there does not exist that satisfies (16), we define . If there are only two players , we write as without ambiguity.
Due to deceivers’ deceptive actions and the external noises, the belief sequence may fluctuate; i.e., there can exist such that . Thus, as shown in Definition 6, a player should only claim a successful learning of other players’ types if his belief mismatch remains less than for the remaining stages.
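The two conditions of Definition 6 can be evaluated on a recorded sequence of belief mismatches by scanning from the last stage backward. The sketch below is an illustration under stated assumptions: stages are identified with list indices, "after stage" is read as including that stage itself, and `None` is returned when no stage qualifies, because the value the paper assigns in that case is not reproduced here. The function name and the sample data are hypothetical.

```python
def truth_revealing_stage(mismatch, accuracy):
    """Return the first index k with mismatch[j] < accuracy for every j >= k.

    Returns None when the mismatch at the final stage is not below the threshold.
    """
    candidate = None
    # Walk backward while the mismatch stays below the threshold.
    for k in range(len(mismatch) - 1, -1, -1):
        if mismatch[k] < accuracy:
            candidate = k
        else:
            break  # an earlier dip below the threshold does not count
    return candidate


# Hypothetical fluctuating mismatch: it dips below 0.1 at index 2, then rebounds.
samples = [0.5, 0.3, 0.08, 0.2, 0.09, 0.05, 0.01]
print(truth_revealing_stage(samples, 0.1))
```

In this example the dip at index 2 is ignored because the mismatch rises again at index 3, so the function reports index 4, the start of the final run that stays below the threshold.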
**Definition 7 (Deceivability and Learnability).**
Consider players with type and , thresholds , and a given stage index . Player is -stage -deceivable if the probability , or equivalently , is not greater than for all . If the above does not hold, player ’s type is said to be -stage -learnable by player .
Since robot deception involves only a finite number of stages, it is essential that the deceived robot can learn the deceiver’s type as quickly as possible so that he has sufficient stages to plan and to mitigate the impact of deception from the previous stages. Therefore, the definition of learnability, i.e., non-deceivability in Definition 7, requires the deceived player not only to be capable of learning the deceiver’s private information but also to learn it at a desirable rate, i.e., within stages. Due to the external noise, is a random variable. Thus, the definition of learnability requires ; i.e., player has a large probability of correctly learning the type of player before stage .
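Because the truth-revealing stage is a random variable, the probability in Definition 7 can be estimated from repeated simulation runs. The sketch below is a rough illustration and rests on several assumptions: each run contributes one truth-revealing stage (or `None` if the type was never learned), "before stage" is read as "at or before the stage limit", and only a single setting is evaluated rather than the full quantifier in the definition. The function names and the sample data are hypothetical.

```python
def empirical_learning_probability(revealing_stages, stage_limit):
    """Fraction of runs whose truth-revealing stage is at or before stage_limit."""
    hits = sum(1 for s in revealing_stages if s is not None and s <= stage_limit)
    return hits / len(revealing_stages)


def is_deceivable(revealing_stages, stage_limit, threshold):
    """Deceivable if the estimated learning probability does not exceed threshold."""
    return empirical_learning_probability(revealing_stages, stage_limit) <= threshold


# Hypothetical truth-revealing stages from eight noisy runs; None means never learned.
runs = [4, 6, None, 3, 9, 5, None, 4]
print(empirical_learning_probability(runs, 5))
print(is_deceivable(runs, 5, 0.6), is_deceivable(runs, 5, 0.4))
```

With more runs the frequency estimate becomes more reliable, and the same data can be reused to sweep over stage limits and thresholds.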
IV Dynamic Target Protection under Deception
We investigate a pursuit-evasion scenario that contains two UAVs with the decoupled linear time-invariant state dynamics, i.e., . We use ‘she’ for UAV , the pursuer, and ‘he’ for UAV , the evader. UAV ’s state represents ’s location in the space, and action affects ’s speed in and directions.
UAV as the evader selects either the harbor in ‘Normandy’ or ‘Calais’ as his final location based on his type . He aims to reach ‘Normandy’ located at in stages if his type is , otherwise ‘Calais’ located at if his type is . UAV as the pursuer can make interfering signals and aims to be close to UAV at the final stage to protect the harbor targeted by the evader, i.e., , where penalizes her distance from the evader at stage , prevents her from a high action cost, and incites her opponent, i.e., the evader, to take costly actions. We classify UAV into two types, i.e., , based on her maneuverability represented by the value of . Given higher maneuverability , the pursuer of type can obtain a higher speed under the same action and thus cover a longer distance.
The evader’s goals of deceptive target reaching and pursuit evasion are incorporated into the cost structure .
Similar to the pursuer’s cost parameters, represents the evader’s level of evasion determination to keep a distance from the pursuer along the trajectory. The action costs of the evader and the pursuer are regulated by and , respectively. The parameters and represent the evader’s attempt to head toward ‘Normandy’ and ‘Calais’, respectively, at stage under type . We use the ratio to represent the evader’s level of trajectory deception. Since the pursuer can learn the evader’s type based on the real-time observations of state , the evader attempts to make his target -ambiguous at all previous stages, i.e., , and reveal his true target only at the final stage, i.e., and . The evader chooses a small and achieves the maximum ambiguity when . Two blue lines in Fig. 1(a) illustrate how the evader manages to remain ambiguous in a cost-effective manner from two different initial locations. Instead of keeping an equal distance to both potential targets, the evader heads toward the midpoint at the early stages to confuse the pursuer. However, the evader starts to head toward the true target at around half of stages rather than the last few stages so that he can reach the target with a moderate control cost . Fig. 1(a) also shows that for a given initial location, the evader who adopts a higher level of trajectory deception heads more toward the misleading target at the early stages.
In this case study, we suppose that the evader’s true target is Calais and let be his true type and be the misleading type.
The following two ratios capture the evader’s tradeoff of being deceptive, effective, and evasive. On one hand, the ratio , reflects the evader’s tradeoff between applying deception along the trajectory and staying close to the true target at the final stage.
Fig. 1(b) shows that as the evader focuses more on a deceptive trajectory represented by a larger value of , his trajectory remains ambiguous for longer stages while his final location is farther away from the true target. On the other hand, the ratio , reflects the evader’s tradeoff between evasion and target-reaching. As the evader focuses more on keeping a distance from the pursuer along the trajectory, he takes a bigger detour and stays farther away from his true target at the final stage as shown in Fig. 1(c).
Finally, we transform UAV ’s coupled cost into the matrix form given in Section III, i.e., ,
[TABLE]
, .
IV-A Deceptive Evader with Decoupled Cost Structure
We first investigate the scenario where the evader has a decoupled cost structure defined in Definition 5, i.e., . (Footnote 3: This paper has supplementary downloadable materials available at http://ieeexplore.ieee.org, provided by the authors. This includes a video demo of two UAVs’ trajectories and belief updates under the decoupled structure.) According to Corollary 1, the evader’s trajectory is then independent of the pursuer’s action, type, and belief. Fig. 2 visualizes the pursuer’s trajectories. Although the pursuer only aims to be close to the evader at the final stage, she also takes proactive actions in the previous stages to be cost-efficient. If the pursuer knows the evader’s type, then she can head toward the true target directly and will not be misled by the evader’s trajectory ambiguity at the early stages as illustrated by the black dashed line in Fig. 2. If the evader’s type is private, then a larger initial belief mismatch makes the pursuer head more toward the misleading target at the early stages as illustrated by the three solid lines in Fig. 2. However, due to the pursuer’s online learning, which is compatible, efficient, and robust as shown in Section IV-A1, she manages to approach the evader at the final stage regardless of her initial belief mismatch. Fig. 3 shows the pursuer’s -stage belief variation. The evader’s ambiguous trajectory results in belief fluctuations at the early stages, yet the pursuer can quickly reduce the belief mismatch when the evader starts to head toward the true target. After the pursuer has corrected her initial belief mismatch at around stage , she can head toward the true target in a cost-efficient way; i.e., she attempts to keep a uniform linear motion under the external noise as shown in the upper right region of Fig. 2.
IV-A1 Finite-Horizon Analysis of Bayesian Update
In this subsection, we illustrate the compatibility, efficiency, and robustness of the finite-horizon Bayesian update in (3) to reduce the initial belief mismatch. The pursuer is of high-maneuverability and the evader’s true type is . Define the likelihood function of and as and , respectively. As , and are positive. With an initial belief and a finite likelihood ratio , we can represent (3) in the following form with three properties:
[TABLE]
1. (Compatibility): For all , the belief update at stage is compatible with the evidence represented by the ratio . In particular, if , then ; if , then ; if , then .
2. (Efficiency): If the evidence of state observation indicates that the type is more likely to be the true type , i.e., , then the function at stage is monotonically decreasing over . If the evidence indicates that the type is more likely to be the misleading type , i.e., , then the function is monotonically increasing over .
3. (Robustness): The order of the evidence sequence , has no impact on the belief .
Property one shows that although the external noise can result in the fluctuations of the belief update, the belief mismatch, i.e., , will decrease when , regardless of the prior belief . Property two shows the efficiency of the belief update. The belief changes more under a larger belief mismatch, which results in a quick correction. Property three shows the robustness of the belief update. The erroneous belief update caused by a heavy noise can be corrected in the later stages when the noise fades.
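The compatibility and robustness properties can be checked on the standard two-hypothesis form of Bayes' rule. The sketch below is not the paper's equation (3) or (17): it assumes the belief is a single number (the probability assigned to the true type) and that each stage supplies a pair of strictly positive likelihoods for the observation under the true and the misleading type. The function name and the evidence values are hypothetical.

```python
def bayes_update(belief_true, likelihood_true, likelihood_false):
    """One-step posterior probability of the true type for a two-type problem."""
    numerator = belief_true * likelihood_true
    return numerator / (numerator + (1.0 - belief_true) * likelihood_false)


def run_updates(prior, evidence):
    """Apply the update for each (likelihood_true, likelihood_false) pair in turn."""
    belief = prior
    for likelihood_true, likelihood_false in evidence:
        belief = bayes_update(belief, likelihood_true, likelihood_false)
    return belief


prior = 0.2
# Hypothetical evidence: supports the true type, then the misleading type, then neutral.
evidence = [(0.6, 0.2), (0.1, 0.3), (0.5, 0.5)]

# Compatibility: the belief moves in the direction the evidence points.
print(bayes_update(prior, 0.6, 0.2) > prior)   # evidence favors the true type
print(bayes_update(prior, 0.1, 0.3) < prior)   # evidence favors the misleading type

# Robustness: reordering the evidence leaves the final belief unchanged.
print(run_updates(prior, evidence), run_updates(prior, list(reversed(evidence))))
```

Order invariance follows because the posterior odds equal the prior odds multiplied by the product of the likelihood ratios, and a product does not depend on the order of its factors. In this example the first two pieces of evidence cancel exactly, so the final belief returns to the prior.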
IV-A2 Comparison with Heuristic Policies
We compare the proposed pursuer’s control policy with two heuristic ones to demonstrate its efficacy in counter-deception. (Footnote 4: The supplementary materials include a video demo that compares the proposed policy’s trajectory and performance with two heuristic policies.) The first heuristic policy is to repeat the evader’s trajectory with a one-stage delay; i.e., the pursuer applies the action so that . The pursuer does not need to apply Bayesian learning and we call this policy direct following. The second heuristic policy for the pursuer is to stay at the initial location until her truth-revealing stage and then head toward the evader’s expected final-stage location in the remaining stages. The second policy is conservative because the pursuer does not take proactive actions until she identifies the evader’s type.
Let player ’s ex-post cumulative cost , be a real-time evaluation of the online algorithm. Although a pursuer under both heuristic policies manages to stay close to the evader at the final stage, Fig. 4 shows that both heuristic policies are more costly than the proposed equilibrium strategy in the long run.
The conservative policy avoids potential trajectory deviations under deception but leaves the pursuer fewer planning stages to achieve the capture goal. We visualize the accumulation of the pursuer’s cost in Fig. 4(c). The red lines show that the pursuer who adopts the conservative policy incurs no action cost before the truth-revealing stage , i.e., , but huge costs in the remaining stages to fulfill her capture goal. The total cumulative cost at the final stage increases exponentially with the value of as shown in Fig. 4(b). The black line in Fig. 4(c) illustrates the accumulation of when the pursuer directly follows the evader’s trajectory. Only under extreme deception scenarios where , does the direct following policy result in a lower cost than the conservative policy. Since the initial belief affects both the truth-revealing stage and the proposed policy, we plot versus under the conservative policy and the proposed policy in Fig. 4(a). When there is no belief mismatch , we have and the conservative policy is equivalent to the proposed policy. As the belief mismatch increases, the cost under the proposed policy (resp. the conservative policy) increases due to the larger deviation along the -axis (resp. the larger ). The proposed policy always results in a lower cost than the conservative policy does. The results in Fig. 4 lead to the following two principles for the pursuer’s behavior under deception. First, Bayesian learning is a more effective countermeasure than directly following the evader’s deceptive trajectory. Second, if learning the evader’s type takes a long time, the pursuer is better off acting proactively based on her current belief than delaying actions until the truth-revealing stage.
IV-B Dynamic Game for Deception and Counter-Deception
In this section, the evader has a coupled cost defined in Definition 5 and the level of evasion determination increases with a constant rate ; i.e., . (Footnote 5: A video demo of two UAVs’ real-time trajectories and belief updates under the coupled structure is included in the supplementary materials.) The evader deceives the pursuer by hiding his true target. The pursuer can adopt the following two countermeasures to reduce her cost under the evader’s deception. Section IV-B1 investigates the effectiveness of adaptive learning. We find that the pursuer manages to approach the true target at the final stage by updating her belief and taking actions accordingly based on the real-time trajectory observation. Section IV-B2 further allows the pursuer to introduce additional deception, i.e., obfuscate her maneuverability, to counteract the evader’s information advantage and his deception impact.
IV-B1 Pursuer with a Public Type
When the pursuer’s type is common knowledge, we plot both UAVs’ trajectories under two initial beliefs and two types of pursuers in Fig. 5.
The solid lines show that the evader with the coupled cost detours to stay farther from the pursuer. The initial belief mismatch causes a deviation along the -axis for both high- and low-maneuverability pursuers as shown in red and blue, respectively. However, the deviation has a smaller magnitude and a shorter duration than the one represented by the red line in Fig. 2 due to the coupled cost structure of the evader. The pursuer with a high maneuverability stays closer to the evader at the final stage.
IV-B2 Deception to Counteract Deception
When the pursuer’s type is also private, Fig. 6 shows that she can manipulate the evader’s initial belief to obtain a smaller and a belief update with less fluctuation. The red line with stars is the same as the one in Fig. 3. It shows that the pursuer’s belief learning is slower and fluctuates more when she interacts with the evader who has a decoupled cost. The reason is that her manipulation of the initial belief does not affect the evader’s decision making as shown in Corollary 1.
A comparison between Fig. 6(a) and Fig. 6(b) shows that it is beneficial for a low-maneuverability pursuer to disguise as a high-maneuverability pursuer but not vice versa. Thus, introducing additional deception to counteract existing deception is not always effective.
IV-C Multi-Dimensional Deception Metrics
The impact of the evader’s deception can be measured by metrics such as the endpoint distance between the evader and the true target, the endpoint distance between two UAVs, both UAVs’ truth-revealing stages , and their ex-post cumulative costs . In this pursuit-evasion case study, we define -reachability and -capturability in Definition 8. Although , is a random variable, we can obtain a good estimate of the reachability and capturability due to the negligible variance of as shown in Fig. 7(a) and Fig. 8(a).
**Definition 8 (Reachability and Capturability).**
Consider the proposed pursuit-evasion scenario with a given , a threshold , and all initial beliefs . The target is said to be -reachable if . The evader is said to be -capturable if .
In Section IV-C1, we investigate how the evader can manipulate the pursuer’s initial belief to influence the deception. In Section IV-C2, we investigate how the pursuer’s maneuverability plays a role in deception. In both sections, the evader has a coupled cost structure. The pursuer either applies the Bayesian update or not, which is denoted by blue and red lines, respectively, in both Fig. 7 and Fig. 8. In Section IV-C3, we study other metrics, such as deceivability, distinguishability, and PoD.
IV-C1 The Impact of the Evader’s Belief Manipulation
Both UAVs determine their initial beliefs based on the intelligence collected before their interactions. By falsifying the pursuer’s intelligence, the evader can manipulate the pursuer’s initial belief and further influence the deception as shown in Fig. 7.
On the -axis, an initial belief closer to indicates a smaller belief mismatch. Fig. 7(a) shows that the pursuer’s distance to the evader at the final stage decreases as the belief mismatch decreases regardless of the existence of Bayesian learning. However, the initial belief manipulation has much less influence on the endpoint distance when Bayesian learning is applied. Fig. 7(b) shows that for each realization of the noise sequence , the pursuer’s truth-revealing stage steps down as the belief mismatch decreases when Bayesian update is applied. Fig. 7(c) illustrates the pursuer’s ex-post cumulative cost and at the last and the second-to-last stage, respectively. Without Bayesian update, the evader’s deception significantly increases the pursuer’s cost at the second-to-last stage due to the large endpoint distance . The red lines show that the cost increase is higher under a larger belief mismatch. Fig. 7(d) illustrates the evader’s ex-post cumulative cost at the last stage. If the pursuer does not apply Bayesian learning, then the evader can decrease his cost by increasing the pursuer’s belief mismatch. If the pursuer applies Bayesian learning, then the evader’s cost increases slightly if the pursuer’s belief mismatch is increased. When the belief mismatch is small (i.e., ), we observe a win-win situation; i.e., Bayesian learning reduces not only the pursuer’s ex-post cumulative cost but also the evader’s.
IV-C2 The Impact of the Pursuer’s Maneuverability
The pursuer’s maneuverability can also affect deception as shown in Fig. 8.
The pursuer has an initial belief and the evader knows the pursuer’s type. Fig. 8(a) illustrates that the pursuer can exponentially decrease her distance to the evader at the final stage as her maneuverability increases. Fig. 8(b) demonstrates that an increase in maneuverability decreases the pursuer’s ex-post cumulative cost at the final stage and increases the evader’s. The variance grows as maneuverability decreases because the pursuer’s trajectory becomes more strongly affected by the external noise. In both figures, we observe a diminishing marginal effect; i.e., the change rates of both the endpoint distance and the cost decrease as the maneuverability increases. Thus, we conclude that higher maneuverability can improve the pursuer’s performance under the evader’s deception as measured by the distance and the cost . Moreover, the improvement rate is higher when maneuverability is low.
IV-C3 Deceivability, Distinguishability, and PoD
Deceivability defined in Definition 7 is closely related to the distinguishability among different types. In this case study, a larger distance between targets, i.e., , makes it easier for the pursuer to distinguish between evaders of type and type . A larger maneuverability difference makes it easier for the evader to distinguish between pursuers of type and type . We visualize two UAVs’ truth-revealing stages versus the distance between targets and the maneuverability difference in Fig. 9. The evader has a coupled cost and both players’ initial belief mismatches are . The dashed black line indicates .
When the maneuverability difference is negligible , the pursuer’s type cannot be learned correctly in stages; i.e., the pursuer is -stage [math]-deceivable. When the maneuverability difference is small, i.e., , yet not negligible, i.e., , the variance of is large.
Let be common knowledge and assume that the evader’s belief conforms to the prior distribution of the pursuer’s type for all stages, i.e., . Then, Fig. 10 illustrates how the prior distribution of the pursuer’s type affects the value of PoD under three scenarios:
- , i.e., the central planner only evaluates UAV ’s performance under deception.
- , i.e., the central planner only evaluates UAV ’s performance under deception.
- , i.e., the central planner evaluates the average performance of two UAVs under deception.
When the pursuer’s type is also common knowledge, i.e., (i.e., the pursuer has type ) and (i.e., the pursuer has type ), the game is of complete information and the value of PoD equals . Since PoD takes continuous values over and has a value of at two endpoints for all feasible , we refer to the plots in Fig. 10 as jump rope plots.
They corroborate that the PoD can be bigger than ; i.e., deception among players may not only benefit the deceiver but also the deceivee.
V Conclusion and Future Work
We have investigated a novel class of rational robot deception problems where intelligent robots hide their heterogeneous private information to achieve their objectives in finite stages with minimum costs. We have proposed an -player dynamic game framework to quantify the impact of deception and design long-term optimal actions for deception and counter-deception. Robots form their own initial beliefs on others’ private information and update their beliefs at each stage based on extrinsic or intrinsic information. Satisfying the properties of sequential rationality and belief consistency, perfect Bayesian Nash equilibrium can be used to predict robots’ actions and costs over the stages. We have studied a class of games in the linear-quadratic form with extrinsic belief dynamics to obtain a unique affine state-feedback control policy and a set of extended Riccati equations. The cognitive coupling resulting from the deception of types demonstrates a distinct feature of rational deception where each player’s action hinges on not only his own belief but also all other players’ beliefs. The concepts of deceivability, distinguishability, and reachability have been defined to characterize the fundamental limits of deception. Meanwhile, the price of deception serves as a crucial evaluation and design metric.
We have investigated a target protection problem where the evader aims to deceptively reach the true target and the pursuer keeps her maneuverability as private information. The pursuer achieves a lower ex-post cumulative cost under the proposed policy than under the direct-following and conservative policies. We have proposed multi-dimensional metrics such as the stage of truth revelation and the endpoint distance to measure the deception impact throughout stages. We have concluded that Bayesian learning can largely reduce the impact of initial belief manipulation and sometimes result in a win-win situation. The increase of the pursuer’s maneuverability can also reduce the endpoint distance and her ex-post cumulative cost, yet with a diminishing marginal effect. A robot is more deceivable, i.e., less learnable, when its potential types are less distinguishable. Finally, we have found that introducing additional deception to counteract existing deception is not always effective. Moreover, deception among multiple players may not only benefit the deceiver but also the deceivee.
- [1] D. L. Smith, Why We Lie: The Evolutionary Roots of Deception and the Unconscious Mind. Macmillan, 2004.
- [2] M. Howard and M. E. Howard, Strategic Deception in the Second World War. WW Norton & Company, 1995, vol. 5.
- [3] L. Cowen, T. Ideker, B. J. Raphael, and R. Sharan, “Network propagation: a universal amplifier of genetic associations,” Nature Reviews Genetics, vol. 18, no. 9, p. 551, 2017.
- [4] E. Al-Shaer, J. Wei, K. W. Hamlen, and C. Wang, “Dynamic Bayesian games for adversarial and defensive cyber deception,” in Autonomous Cyber Deception. Springer, 2019, pp. 75–97.
- [5] D. Li and J. B. Cruz, “Defending an asset: A linear quadratic game approach,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1026–1044, 2011.
- [6] K. Sreenath and V. Kumar, “Dynamics, control and planning for cooperative manipulation of payloads suspended by cables from multiple quadrotor robots,” in Robotics: Science and Systems, 2013.
- [7] J. C. Harsanyi, “Games with incomplete information played by “Bayesian” players, I–III. Part I. The basic model,” Management Science, vol. 14, no. 3, pp. 159–182, 1967.
- [8] V. L. L. Thing and J. Wu, “Autonomous vehicle security: A taxonomy of attacks and defences,” in 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2016, pp. 164–170.
