A Dynamic Game Framework for Rational and Persistent Robot Deception With an Application to Deceptive Pursuit-Evasion

Linan Huang; Quanyan Zhu

arXiv:1907.00459·eess.SY·July 29, 2021


TL;DR

This paper develops a dynamic game framework for rational and persistent deception among robots, using a Bayesian approach and Riccati equations, with applications to pursuit-evasion scenarios and new metrics for deception effectiveness.

Contribution

It formulates PBNE computation as a nonlinear stochastic control problem, derives extended Riccati equations under linear-quadratic (LQ) assumptions, and proposes metrics for evaluating deception strategies.

Findings

1. PBNE can be characterized by extended Riccati equations.
2. A receding-horizon algorithm computes PBNE with low temporal and spatial complexity under intrinsic belief dynamics.
3. A numerical case study validates the framework and metrics.


Tables

Table I: Summary of variables and their meanings.

Variable	Meaning
$\mathcal{N} := \{1, 2, \dots, N\}$	Set of $N$ players in the dynamic game
$\mathcal{K} := \{0, 1, 2, \dots, K\}$	Set of $K$ discrete stages in the dynamic game
$\Theta_{i} := \{\theta_{i}^{1}, \theta_{i}^{2}, \dots, \theta_{i}^{N_{i}}\}$	Set of $N_{i}$ possible types for player $i \in \mathcal{N}$
$\theta_{i} \in \Theta_{i}$	Type of player $i \in \mathcal{N}$
$\theta := [\theta_{1}, \dots, \theta_{N}]$	$N$ players’ joint type
$\Theta_{-i} := \prod_{j \in \mathcal{N} \setminus \{i\}} \Theta_{j}$	Set of types of all players except for player $i$
$\theta_{-i} := [\theta_{j}]_{j \in \mathcal{N} \setminus \{i\}} \in \Theta_{-i}$	Types of all players except for player $i$
$\Delta(\Theta_{-i})$	Set of probability distributions over set $\Theta_{-i}$
$\Xi_{i}(\cdot)$	Probability distribution of player $i$’s type
$\Xi = [\Xi_{i}]_{i \in \mathcal{N}}$	Probability distribution of the joint type $\theta$
$\Xi_{w}(\cdot)$	Probability distribution of noise $w^{k}, \forall k \in \mathcal{K}$
$x^{k} \in \mathbb{R}^{n \times 1}$	System state of dimension $n$ at stage $k$
$x_{i}^{k} \in \mathbb{R}^{n_{i} \times 1}$	Player $i$’s state of dimension $n_{i}$ at stage $k$
$[\hat{x}_{i}^{k}(\theta_{i})]_{k \in \mathcal{K}}$	Reference trajectory for player $i$ of type $\theta_{i}$
$\beta_{i}^{k} \in \Lambda_{i} \subseteq [0, 1]^{|\Theta_{-i}| \times |\Theta_{i}|}$	Player $i$’s belief state at stage $k$
$\beta^{k} = [\beta_{i}^{k}]_{i \in \mathcal{N}} \in \Lambda$	$N$ players’ joint belief state at stage $k$
$h^{k} := [x^{0}, \dots, x^{k}] \in \mathcal{H}^{k}$	State history
$f^{k}$	State transition function at stage $k$
$\Gamma_{i}^{k}$	Player $i$’s belief transition function at stage $k$
$g_{i}^{k}$	Player $i$’s cost function at stage $k$
$V_{i}^{k}(\beta^{k}, x^{k}, \theta_{i})$	Player $i$’s PBNE cost
$\bar{V}_{i}^{k}(x^{k}, \theta)$	Player $i$’s PBNE cost when all players’ types are common knowledge
$u_{i}^{k} \in \mathbb{R}^{m_{i} \times 1}$	Player $i$’s action of dimension $m_{i}$ at stage $k$
$u^{k} := [u_{1}^{k}, \dots, u_{N}^{k}]$	$N$ players’ joint action at stage $k$
$u_{i}^{k_{0}:K} := [u_{i}^{k_{0}}, \dots, u_{i}^{K}]$	Player $i$’s action sequence from stage $k_{0}$ to $K$
$u^{k_{0}:K} := [u_{i}^{k_{0}:K}, u_{-i}^{k_{0}:K}]$	Player $i$’s and all other players’ action sequences from stage $k_{0}$ to $K$
$l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i})$	Player $i$’s belief at stage $k$, i.e., the probability that the other players’ types are $\theta_{-i}$ given player $i$’s available information $h^{k}$ and $\theta_{i}$

Equations

x^{k+1} = f^{k}(x^{k}, u_{1}^{k}, \dots, u_{N}^{k}, \theta_{1}, \dots, \theta_{N}) + w^{k}, \quad k \in \mathcal{K} \setminus \{K\}.

\beta_{i}^{k} := \big[ l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i}^{1}), l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i}^{2}), \dots, l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i}^{N_{i}}) \big]_{\theta_{-i} \in \Theta_{-i}}

\beta_{i}^{k+1} := \Gamma_{i}^{k}(\beta_{i}^{k}, u^{k}, w^{k}, \theta_{i}), \quad \forall k \in \{0, \dots, K-1\}.

l_{i}^{k+1}(\theta_{-i} \mid h^{k+1}, \theta_{i}) = \frac{l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i}) \Pr(x^{k+1} \mid \theta_{-i}, x^{k}, \theta_{i})}{\sum_{\bar{\theta}_{-i} \in \Theta_{-i}} l_{i}^{k}(\bar{\theta}_{-i} \mid h^{k}, \theta_{i}) \Pr(x^{k+1} \mid \bar{\theta}_{-i}, x^{k}, \theta_{i})}.
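With finitely many types, the Bayesian update above is just "multiply the prior belief by the likelihood of the observed transition, then renormalize over $\Theta_{-i}$" for a fixed own type $\theta_{i}$. The sketch below assumes, purely for illustration, a scalar state and Gaussian noise, so that $\Pr(x^{k+1} \mid \theta_{-i}, x^{k}, \theta_{i})$ is a Gaussian density centered at a type-dependent predicted next state (the paper does not require Gaussian noise). The names `bayes_update` and `predict` and the two-target model are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a scalar Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_update(belief, x_k, x_next, predict, noise_std):
    """One step from l_i^k to l_i^{k+1} for a fixed own type theta_i.

    belief:  dict mapping each candidate theta_{-i} to l_i^k(theta_{-i} | h^k, theta_i)
    predict: hypothetical callback giving the noise-free next state when the
             other players' type is theta_{-i} (assumed Gaussian-noise model)
    """
    # Numerator of the update: prior belief times likelihood of observed x^{k+1}
    weights = {t: p * gaussian_pdf(x_next, predict(x_k, t), noise_std)
               for t, p in belief.items()}
    total = sum(weights.values())  # denominator: sum over all theta_bar_{-i}
    if total == 0.0:
        # Observation has zero likelihood under every type; keep the prior.
        return dict(belief)
    return {t: w / total for t, w in weights.items()}

# Hypothetical example: the other player heads to one of two targets, and the
# assumed model moves the state halfway toward the chosen target each stage.
targets = {"left": -1.0, "right": 1.0}
predict = lambda x, t: x + 0.5 * (targets[t] - x)
belief = {"left": 0.5, "right": 0.5}
belief = bayes_update(belief, x_k=0.0, x_next=0.45, predict=predict, noise_std=0.2)
print(belief)  # nearly all mass moves to "right"
```

Because the denominator is the same normalizing sum for every $\theta_{-i}$, the code never needs it explicitly until the final division.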

J_{i}^{k_{0}}(l_{i}^{k_{0}:K-1}, u^{k_{0}:K-1}, x^{k_{0}}, \theta_{i}) = \mathbb{E}_{w^{K-1} \sim \Xi_{w}}[g_{i}^{K}(x^{K}, \theta_{i})] + \sum_{k=k_{0}}^{K-1} \mathbb{E}_{w^{k-1} \sim \Xi_{w}}\Big[\mathbb{E}_{\theta_{-i} \sim l_{i}^{k}}[g_{i}^{k}(x^{k}, u^{k}, \theta_{i})]\Big].

V_{i}^{k}(\beta^{k}, x^{k}, \theta_{i}) = \min_{u_{i}^{k}} \sum_{\theta_{-i} \in \Theta_{-i}} l_{i}^{k}(\theta_{-i} \mid h^{k}, \theta_{i}) \Big\{ g_{i}^{k}(x^{k}, u^{k}, \theta_{i}) + \mathbb{E}_{w^{k} \sim \Xi_{w}}[V_{i}^{k+1}(\beta^{k+1}, x^{k+1}, \theta_{i})] \Big\}, \quad \forall \theta_{i} \in \Theta_{i}, \forall i \in \mathcal{N},

p^{\eta}(\Xi) := \frac{\sum_{i \in \mathcal{N}} \eta_{i} \mathbb{E}_{\theta \sim \Xi}[\bar{V}_{i}^{0}(x^{0}, \theta)] + \eta_{0}(\Xi)}{\sum_{i \in \mathcal{N}} \eta_{i} \mathbb{E}_{\theta_{i} \sim \Xi_{i}}[V_{i}^{0}(\beta^{0}, x^{0}, \theta_{i})] + \eta_{0}(\Xi)} \in [0, \infty).
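The Price of Deception is a weighted ratio of expected equilibrium costs: complete information ($\bar{V}$) in the numerator, incomplete information ($V$) in the denominator, so a value above $1$ means the weighted cost under deception is lower than the complete-information benchmark. The sketch below assumes the per-player expected costs have already been computed (for instance by averaging over types) and treats $\eta_{0}(\Xi)$ as a given number; the function name and the example costs are hypothetical.

```python
def price_of_deception(eta, complete_costs, incomplete_costs, eta0=0.0):
    """Weighted ratio of expected equilibrium costs (complete / incomplete information).

    eta:              per-player weights eta_i
    complete_costs:   per-player E_{theta ~ Xi}[Vbar_i^0(x^0, theta)]
    incomplete_costs: per-player E_{theta_i ~ Xi_i}[V_i^0(beta^0, x^0, theta_i)]
    eta0:             the offset eta_0(Xi), treated here as a given number
    """
    if not (len(eta) == len(complete_costs) == len(incomplete_costs)):
        raise ValueError("need one weight and two expected costs per player")
    numerator = sum(e * c for e, c in zip(eta, complete_costs)) + eta0
    denominator = sum(e * c for e, c in zip(eta, incomplete_costs)) + eta0
    if denominator <= 0:
        raise ValueError("denominator must be positive; adjust eta0")
    return numerator / denominator

# Hypothetical expected costs for two players: deception raises player 1's cost
# but lowers player 2's, and the weighted total falls, so PoD exceeds 1.
pod = price_of_deception([1.0, 1.0], [10.0, 20.0], [12.0, 13.0])
print(pod)
```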

f^{k}(x^{k}, u^{k}, \theta) := A^{k}(\theta) x^{k} + \sum_{i=1}^{N} B_{i}^{k}(\theta_{i}) u_{i}^{k},

g_{i}^{k}(x^{k}, u^{k}, \theta_{i}) = (x^{k} - \hat{x}_{i}^{k}(\theta_{i}))^{\prime} D_{i}^{k}(\theta_{i}) (x^{k} - \hat{x}_{i}^{k}(\theta_{i})) + \hat{f}_{i}^{k}(\hat{x}_{i}^{k}(\theta_{i})) + \sum_{j=1}^{N} (u_{j}^{k})^{\prime} F_{ij}^{k}(\theta_{i}) u_{j}^{k}, \quad \forall k \in \mathcal{K},

\mathbf{L}_{ij}^{k} := \begin{bmatrix} l_{i}^{k}(\theta_{j}^{1} \mid h^{k}, \theta_{i}^{1}) & \cdots & l_{i}^{k}(\theta_{j}^{N_{j}} \mid h^{k}, \theta_{i}^{1}) \\ l_{i}^{k}(\theta_{j}^{1} \mid h^{k}, \theta_{i}^{2}) & \cdots & l_{i}^{k}(\theta_{j}^{N_{j}} \mid h^{k}, \theta_{i}^{2}) \\ \vdots & \ddots & \vdots \\ l_{i}^{k}(\theta_{j}^{1} \mid h^{k}, \theta_{i}^{N_{i}}) & \cdots & l_{i}^{k}(\theta_{j}^{N_{j}} \mid h^{k}, \theta_{i}^{N_{i}}) \end{bmatrix},

\begin{split}S_{i}^{k}=&D_{i}^{k}+\mathbb{E}_{\theta_{-i}\sim l_{i}^{k}}\bigg{[}(A^{k}+\sum_{j=1}^{N}B^{k}_{j}\Psi^{1,k}_{j})^{\prime}\mathbb{E}_{w^{k}\sim\Xi_{w}}[S_{i}^{k+1}]\\ &\cdot(A^{k}+\sum_{j=1}^{N}B^{k}_{j}\Psi^{1,k}_{j})+\sum_{j=1}^{N}(\Psi^{1,k}_{j})^{\prime}F_{ij}^{k}\Psi^{1,k}_{j}\bigg{]},\end{split}

\begin{split}N_{i}^{k}=&-2D^{k}_{i}\hat{x}_{i}^{k}+\mathbb{E}_{\theta_{-i}\sim l_{i}^{k}}\bigg{[}(\sum_{j=1}^{N}B^{k}_{j}\Psi^{1,k}_{j}+A^{k})^{\prime}(\mathbb{E}_{w^{k}\sim\Xi_{w}}[N_{i}^{k+1}]\\ &+2\mathbb{E}_{w^{k}\sim\Xi_{w}}[S_{i}^{k+1}]\sum_{j=1}^{N}B^{k}_{j}\Psi^{2,k}_{j})+2\sum_{j=1}^{N}(\Psi^{1,k}_{j})^{\prime}F_{ij}^{k}\Psi^{2,k}_{j}\bigg{]},\end{split}

\begin{split}q_{i}^{k}=&(\hat{x}_{i}^{k})^{\prime}D^{k}_{i}\hat{x}_{i}^{k}+\hat{f}_{i}^{k}(\hat{x}_{i}^{k})+\mathbb{E}_{w^{k}\sim\Xi_{w}}[(w^{k})^{\prime}S_{i}^{k+1}w^{k}+q_{i}^{k+1}]\\ &+\mathbb{E}_{\theta_{-i}\sim l_{i}^{k}}\bigg{[}(\sum_{j=1}^{N}B^{k}_{j}\Psi^{2,k}_{j})^{\prime}\mathbb{E}_{w^{k}\sim\Xi_{w}}[S_{i}^{k+1}]\sum_{j=1}^{N}B^{k}_{j}\Psi^{2,k}_{j}\\ &+(\sum_{j=1}^{N}B^{k}_{j}\Psi^{2,k}_{j})^{\prime}\mathbb{E}_{w^{k}\sim\Xi_{w}}[N_{i}^{k+1}]+\sum_{j=1}^{N}(\Psi^{2,k}_{j})^{\prime}F_{ij}^{k}\Psi^{2,k}_{j}\bigg{]},\end{split}

S_{i}^{K} = D_{i}^{K}; \quad N_{i}^{K} = -2 D_{i}^{K} \hat{x}_{i}^{K}; \quad q_{i}^{K} = (\hat{x}_{i}^{K})^{\prime} D_{i}^{K} \hat{x}_{i}^{K} + \hat{f}_{i}^{K}(\hat{x}_{i}^{K}).
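With a single player and a single type, every belief expectation is trivial and the extended recursions for $S$, $N$, and $q$ reduce to a classical tracking Riccati recursion. The scalar sketch below assumes, for illustration only, time-invariant scalars $A$, $B$, $D$, $F$, a constant reference $\hat{x}$, and zero-mean noise with variance `noise_var`; it starts from the terminal conditions above and steps backward. The gains $\Psi^{1,k}$ and $\Psi^{2,k}$ come from the first-order condition $(F + B S^{k+1} B) u = -(B S^{k+1} A x + \tfrac{1}{2} B N^{k+1})$. The function name is hypothetical.

```python
def scalar_tracking_riccati(A, B, D, F, xhat, K, noise_var=0.0, fhat=0.0):
    """Backward recursion for (S^k, N^k, q^k) and gains (Psi1^k, Psi2^k) in the
    scalar, single-player, single-type case (illustrative assumption): time-invariant
    A, B, D, F, a constant reference xhat, and zero-mean noise with variance noise_var.
    Returns lists indexed by stage k (S, N, q have K+1 entries; gains has K)."""
    # Terminal conditions: S^K = D, N^K = -2 D xhat, q^K = xhat D xhat + fhat
    S, Nk, q = D, -2.0 * D * xhat, D * xhat * xhat + fhat
    S_seq, N_seq, q_seq, gains = [S], [Nk], [q], []
    for _ in range(K):                      # stages k = K-1, ..., 0
        R = F + B * S * B                   # R^k = F + B' S^{k+1} B
        psi1 = -(B * S * A) / R             # feedback gain on x^k
        psi2 = -(0.5 * B * Nk) / R          # offset driven by N^{k+1}
        M = A + B * psi1                    # closed-loop coefficient A + B Psi1
        c = B * psi2                        # closed-loop offset B Psi2
        S_new = D + M * S * M + psi1 * F * psi1
        N_new = -2.0 * D * xhat + M * (Nk + 2.0 * S * c) + 2.0 * psi1 * F * psi2
        q_new = (xhat * D * xhat + fhat + noise_var * S + q
                 + c * S * c + c * Nk + psi2 * F * psi2)
        S, Nk, q = S_new, N_new, q_new
        S_seq.append(S)
        N_seq.append(Nk)
        q_seq.append(q)
        gains.append((psi1, psi2))
    # Lists were built backward in time; reverse so index k means stage k.
    return S_seq[::-1], N_seq[::-1], q_seq[::-1], gains[::-1]

# Usage with hypothetical numbers: first-stage action u^{*,0} = Psi1^0 x^0 + Psi2^0,
# and the cost-to-go V^0 = q^0 + N^0 x^0 + S^0 (x^0)^2.
S_seq, N_seq, q_seq, gains = scalar_tracking_riccati(A=1.0, B=0.5, D=2.0, F=1.0, xhat=3.0, K=10)
x0 = 0.0
psi1_0, psi2_0 = gains[0]
print("u0 =", psi1_0 * x0 + psi2_0, "V0 =", q_seq[0] + N_seq[0] * x0 + S_seq[0] * x0 ** 2)
```

For a one-stage horizon, the value $q^{0} + N^{0}x + S^{0}x^{2}$ can be checked against a direct minimization of the quadratic stage cost plus terminal cost, which is a useful sanity test before moving to the multi-player, multi-type case.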

R_{i}^{k}(\beta^{k}, \theta_{i}) := F_{ii}^{k}(\theta_{i}) + (B_{i}^{k}(\theta_{i}))^{\prime} S_{i}^{k+1}(\beta^{k}, \theta_{i}) B_{i}^{k}(\theta_{i}).

\begin{split}&W^{1,k}_{i}(\beta^{k})=\bigg{[}(B^{k}_{i}(\theta^{1}_{i}))^{\prime}S_{i}^{k+1}(\beta^{k},\theta^{1}_{i})\mathbb{E}_{{\theta}_{-i}\sim{l}_{i}^{k}}[A^{k}(\theta^{1}_{i},\theta_{-i})];\\ &\quad\quad\quad\quad\cdots;(B^{k}_{i}(\theta^{N_{i}}_{i}))^{\prime}S_{i}^{k+1}(\beta^{k},\theta^{N_{i}}_{i})\mathbb{E}_{{\theta}_{-i}\sim{l}_{i}^{k}}[A^{k}(\theta^{N_{i}}_{i},\theta_{-i})]\bigg{]},\\ &W^{2,k}_{i}(\beta^{k})=\frac{1}{2}\bigg{[}(B^{k}_{i}(\theta^{1}_{i}))^{\prime}N_{i}^{k+1}(\beta^{k},\theta^{1}_{i});\\ &\quad\quad\quad\quad\cdots;(B^{k}_{i}(\theta^{N_{i}}_{i}))^{\prime}N_{i}^{k+1}(\beta^{k},\theta^{N_{i}}_{i})\bigg{]},\\ &W_{ii}^{0,k}(\beta^{k})=\operatorname{Diag}[R^{k}_{i}(\beta^{k},\theta_{i}^{1}),\cdots,R^{k}_{i}(\beta^{k},\theta_{i}^{N_{i}})],\\ &W_{ij}^{0,k}(\beta^{k})=(\mathbf{B}_{i}^{k})^{\prime}\mathbf{S}_{i}^{k+1}(\beta^{k})\mathbf{L}_{ij}^{k}\mathbf{B}_{j}^{k},\forall j\in\mathcal{N}\setminus\{i\}.\end{split}

u_{i}^{*,k}(\beta^{k}, x^{k}, \theta_{i}) = \Psi_{i}^{1,k}(\beta^{k}, \theta_{i}) x^{k} + \Psi_{i}^{2,k}(\beta^{k}, \theta_{i}),

V_{i}^{k}(\beta^{k}, x^{k}, \theta_{i}) = q_{i}^{k}(\beta^{k}, \theta_{i}) + (x^{k})^{\prime} N_{i}^{k}(\beta^{k}, \theta_{i}) + (x^{k})^{\prime} S_{i}^{k}(\beta^{k}, \theta_{i}) x^{k}, \quad \forall i \in \mathcal{N}, k \in \mathcal{K}.

-R_{i}^{k} u_{i}^{*,k}(\theta_{i}) = (B_{i}^{k})^{\prime} S_{i}^{k+1} \mathbb{E}_{\theta_{-i} \sim l_{i}^{k}}[A^{k}] x^{k} + \frac{1}{2} (B_{i}^{k})^{\prime} N_{i}^{k+1} + (B_{i}^{k})^{\prime} S_{i}^{k+1} \sum_{j \neq i} \mathbb{E}_{\theta_{j} \sim l_{i}^{k}}[B_{j}^{k}(\theta_{j}) u_{j}^{*,k}(\theta_{j})], \quad \forall i \in \mathcal{N}.
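The coupling in the equation above is easiest to see in its smallest instance. If each player has a single type, the beliefs are degenerate, the expectations drop out, and for two scalar players the two first-order conditions form a $2 \times 2$ linear system in $(u_{1}, u_{2})$. The sketch below assumes $S_{i}^{k+1}$ and $N_{i}^{k+1}$ are already known (the numbers are hypothetical) and solves the system by Cramer's rule; with several types per player, the same idea leads to the larger stacked system built from the $W$ matrices, which is not reproduced here.

```python
def coupled_scalar_actions(A, B1, B2, S1, S2, N1, N2, F11, F22, x):
    """Solve the coupled first-order conditions for two scalar players whose
    types are common knowledge (one type each), at a given state x.

    Player i's condition: -R_i u_i = B_i S_i A x + 0.5 B_i N_i + B_i S_i B_j u_j,
    with R_i = F_ii + B_i S_i B_i and S_i, N_i taken at stage k+1.
    """
    R1 = F11 + B1 * S1 * B1
    R2 = F22 + B2 * S2 * B2
    # Rearranged as the linear system [[m11, m12], [m21, m22]] [u1, u2]^T = [r1, r2]^T
    m11, m12 = R1, B1 * S1 * B2
    m21, m22 = B2 * S2 * B1, R2
    r1 = -(B1 * S1 * A * x + 0.5 * B1 * N1)
    r2 = -(B2 * S2 * A * x + 0.5 * B2 * N2)
    det = m11 * m22 - m12 * m21
    if abs(det) < 1e-12:
        raise ValueError("coupled system is singular at this stage")
    u1 = (r1 * m22 - m12 * r2) / det   # Cramer's rule
    u2 = (m11 * r2 - m21 * r1) / det
    return u1, u2

# Hypothetical stage-(k+1) quantities for two players sharing the same state.
print(coupled_scalar_actions(A=1.0, B1=1.0, B2=-0.5, S1=2.0, S2=1.5,
                             N1=-4.0, N2=1.0, F11=1.0, F22=2.0, x=0.3))
```

Each player's action depends on the other's through the off-diagonal terms $B_{i} S_{i} B_{j}$, which is the dynamic coupling; with incomplete information, these terms are additionally weighted by beliefs, which adds the cognitive coupling.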

O\big((K-k) \cdot N_{0} N \cdot \max(N_{0}^{N} N, N_{0}^{3} N^{3})\big).
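The complexity order grows linearly in the remaining horizon $K-k$, but the $\max$ switches from a polynomial term to an exponential term as the number of players grows. The sketch below simply evaluates the expression with constants dropped and finds where the exponential term takes over; it assumes $N_{0}$ denotes the number of types per player (our reading of the symbol), and both function names are hypothetical.

```python
def pbne_cost_order(K, k, N0, N):
    """Evaluate (K - k) * N0 * N * max(N0**N * N, N0**3 * N**3), constants dropped."""
    return (K - k) * N0 * N * max(N0 ** N * N, N0 ** 3 * N ** 3)

def crossover_players(N0, N_max=64):
    """Smallest N at which the exponential term N0**N * N exceeds the cubic term N0**3 * N**3."""
    for N in range(1, N_max + 1):
        if N0 ** N * N > N0 ** 3 * N ** 3:
            return N
    return None

# Hypothetical setting: 20 remaining stages, 2 types per player.
for N in (2, 5, 10, 15):
    print(N, pbne_cost_order(K=20, k=0, N0=2, N=N))
print("exponential term dominates from N =", crossover_players(2))
```

For two types per player, the cubic term dominates up to nine players, which is consistent with the claim of low complexity for small teams of robots.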

1 - l_{i}^{k}(\theta_{j} \mid h^{k}, \theta_{i}) \leq \delta, \quad \forall k \geq k_{i,j}^{tr}.
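One way to compute the truth-revelation stage from a recorded belief trajectory: reading the inequality above with $\theta_{j}$ as player $j$'s true type, $k_{i,j}^{tr}$ is the earliest stage from which player $i$'s belief in that true type stays within $\delta$ of $1$ for the rest of the recorded horizon. The sketch below scans backward from the last stage; the function name and the belief values are hypothetical.

```python
def truth_revelation_stage(belief_in_true_type, delta):
    """Smallest stage k_tr such that 1 - l^k <= delta for every k >= k_tr.

    belief_in_true_type[k] is l_i^k(theta_j | h^k, theta_i) evaluated at player j's
    true type. Returns None if the condition fails at the last recorded stage.
    """
    k_tr = None
    # Walk backward: k_tr must satisfy the bound at all later stages too.
    for k in range(len(belief_in_true_type) - 1, -1, -1):
        if 1.0 - belief_in_true_type[k] <= delta:
            k_tr = k
        else:
            break
    return k_tr

# Hypothetical belief trajectory: crosses 0.95 at k = 2, dips at k = 3, then recovers.
print(truth_revelation_stage([0.5, 0.7, 0.96, 0.93, 0.97, 0.99], delta=0.05))  # prints 4
```

A backward scan is used because a single early crossing of the threshold does not count if the belief later falls back, as at stage $3$ in the example.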

D_{2}^{k}(\theta_{2})=\begin{bmatrix}-d_{21}^{k}&0&d_{21}^{k}&0\\ 0&-d_{21}^{k}&0&d_{21}^{k}\\ d_{21}^{k}&0&d_{2,b}^{k}+d_{2,g}^{k}-d_{21}^{k}&0\\ 0&d_{21}^{k}&0&d_{2,b}^{k}+d_{2,g}^{k}-d_{21}^{k}\end{bmatrix},

l_{1}^{k+1} = \frac{l_{1}^{k} \cdot a^{k}}{l_{1}^{k} \cdot a^{k} + (1 - l_{1}^{k}) \cdot c^{k}} = \frac{1}{1 + \big(\frac{1}{l_{1}^{0}} - 1\big) \prod_{\bar{k}=0}^{k} e^{\bar{k}}} \in (0, 1).
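The closed form follows by dividing the numerator and denominator by $l_{1}^{k} a^{k}$, which gives $\frac{1}{l_{1}^{k+1}} - 1 = \big(\frac{1}{l_{1}^{k}} - 1\big)\frac{c^{k}}{a^{k}}$, and then telescoping; this requires $e^{\bar{k}} = c^{\bar{k}}/a^{\bar{k}}$, which is the reading assumed here. We also read $a^{k}$ and $c^{k}$ as the (positive) likelihoods of the observed transition under the two types, matching the Bayesian update. The sketch checks numerically that iterating the recursion and evaluating the closed form agree; the likelihood sequences are hypothetical.

```python
def belief_recursive(l0, a, c):
    """Iterate l^{k+1} = l^k a^k / (l^k a^k + (1 - l^k) c^k) for k = 0, 1, ..."""
    path = [l0]
    for ak, ck in zip(a, c):
        lk = path[-1]
        path.append(lk * ak / (lk * ak + (1.0 - lk) * ck))
    return path

def belief_closed_form(l0, a, c):
    """l^{k+1} = 1 / (1 + (1/l^0 - 1) * prod_{kb<=k} e^{kb}), assuming e^{kb} = c^{kb} / a^{kb}."""
    path, prod = [l0], 1.0
    for ak, ck in zip(a, c):
        prod *= ck / ak
        path.append(1.0 / (1.0 + (1.0 / l0 - 1.0) * prod))
    return path

# Hypothetical positive likelihood sequences a^k (first type) and c^k (other type).
a = [0.8, 0.6, 0.9, 0.4]
c = [0.3, 0.5, 0.2, 0.7]
rec, closed = belief_recursive(0.4, a, c), belief_closed_form(0.4, a, c)
print(max(abs(x - y) for x, y in zip(rec, closed)))  # round-off level only
```

The closed form makes the $(0,1)$ bound transparent: for $l_{1}^{0} \in (0,1)$ and positive likelihoods, the product is positive and finite, so the belief never reaches exactly $0$ or $1$ in finitely many stages.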


Full text


Linan Huang, Quanyan Zhu. This paper has been accepted for publication in IEEE Transactions on Automation Science and Engineering. This research is partially supported by awards ECCS-1847056, CNS-1544782, CNS-2027884, and SES-1541164 from the National Science Foundation (NSF), and grant W911NF-19-1-0041 from the Army Research Office (ARO). L. Huang and Q. Zhu are with the Department of Electrical and Computer Engineering, New York University, 370 Jay Street, Brooklyn, NY 11201, USA; E-mail: {lh2328,qz494}@nyu.edu. Digital Object Identifier 10.1109/TASE.2021.3097286

Abstract

This paper studies rational and persistent deception among intelligent robots to enhance security and operational efficiency. We present an N-player K-stage game with an asymmetric information structure where each robot’s private information is modeled as a random variable or its type. The deception is persistent as each robot’s private type remains unknown to other robots for all stages. The deception is rational as robots aim to achieve their deception goals at minimum cost. Each robot forms a dynamic belief of others’ types based on intrinsic or extrinsic information. Perfect Bayesian Nash Equilibrium (PBNE) is a natural solution concept for dynamic games of incomplete information. Due to its requirements of sequential rationality and belief consistency, PBNE provides a reliable prediction of players’ actions, beliefs, and expected cumulative costs over the entire K stages. The contribution of this work is fourfold. First, we identify the PBNE computation as a nonlinear stochastic control problem and characterize the structures of players’ actions and costs under PBNE. We further derive a set of extended Riccati equations with cognitive coupling under the linear-quadratic setting and extrinsic belief dynamics. Second, we develop a receding-horizon algorithm with low temporal and spatial complexity to compute PBNE under intrinsic belief dynamics. Third, we investigate a deceptive pursuit-evasion game as a case study and use numerical experiments to corroborate the results. Finally, we propose metrics, such as deceivability, reachability, and the price of deception, to evaluate the strategy design and the system performance under deception.

Note to Practitioners

Recent advances in automation and adaptive control in multi-agent systems enable robots to use deception to accomplish their objectives. Deception involves intentional information hiding to compromise the security and operational efficiency of robotic systems. This work proposes a dynamic game framework to quantify the impact of deception, understand the robots’ behaviors and intentions, and design cost-efficient strategies under deception that persists over multiple stages. Existing research on robot deception has relied on experiments, whereas this work aims to lay a theoretical foundation of deception with quantitative metrics, such as deceivability and the price of deception. The proposed model has wide applications, including cooperative robots, pursuit and evasion, and human-robot teaming. Pursuit-evasion games are used as case studies to show how the deceiver can amplify the deception by belief manipulation and how the deceived robots can reduce the negative impact of deception through enhanced maneuverability and Bayesian learning. Future work will focus on designing cooperative deception in swarm robotics and robotic systems that are robust to, or further benefit from, deception.

Index Terms:

Robot deception, perfect Bayesian equilibrium, pursuit-evasion, linear-quadratic games, discrete-time Riccati equations

I Introduction

Deception is a ubiquitous phenomenon in biology [1], military [2], politics and media [3], and cyberspace [4]. In particular, deception plays an increasingly significant role in cyber-physical systems, including autonomous vehicles and robots driven by artificial intelligence (AI). Recent advances in these AI-enabled technologies have not only allowed robots to adapt to the dynamic environment via real-time observations, but also made them deceivable. A deceiver can intentionally hide or reveal selected information to alter the beliefs and behaviors of the target robots for a higher reward. Since deception has many forms and delivery methods, understanding deception in a unified and quantitative framework is an indispensable step toward assessing the outcomes, measuring the impact, and designing strategies. This work aims to design robots that can interact with others efficiently under deceptive environments.

We identify the following challenges and features of robot deception. First, by definition, deception involves at least two participants interacting with each other. An intelligent robot should further consider other participants’ rationality, predict their potential deceptive behaviors, and adjust its actions accordingly to alleviate the negative effect of deception. Second, because robots are dynamic systems, even one-shot deception can influence subsequent stages. The participating robots need to form long-term objectives to deceive or counter-deceive other robots. The multi-stage interactions also make it possible for the deceiver to apply deception at different stages. Third, each robot holds heterogeneous private information, which results in an asymmetric cognition structure; i.e., robots can form different beliefs over the same piece of unknown information. Thus, besides the couplings of state dynamics and costs, the multi-agent system further has cognitive coupling; i.e., each robot’s behaviors are affected not only by its own belief but also by the beliefs of the others.

To capture these features, we model the deceptive interaction between $N$ strategic robots as a dynamic game of incomplete information. During the finite $K$ stages of interaction, $N$ robots accomplish non-cooperative tasks such as pursuit-evasion on the battlefield [5] or cooperative tasks such as collective towing [6]. Robots introduce deception in these scenarios due to antagonism, selfishness, and privacy concerns. Following Harsanyi’s approach [7], we capture each robot’s private information by a random variable. The realization of the random variable, which is called the robot’s type, is known only to the robot itself, while the support of the random variable, which contains all its possible types, is known to all robots. Take the pursuit-evasion scenario as an example. Due to the constraints of weather, terrain, and weaponry, both the evading and the pursuing robots know the feasible beachheads for the evader to land on. However, the evader chooses only one beachhead as his true target, and this choice, i.e., his type, is unknown to the pursuer. The pursuer on the battlefield knows that deception exists and learns to counter it by forming and updating her belief based on real-time observations. Since these tasks are usually time-constrained, robots cannot wait and freeze until they have learned the true type. Instead, they have to take concurrent actions while the deceiver’s type remains uncertain.

We consider two classes of belief dynamics based on whether robots exploit intrinsic information, such as the prediction of other robots’ actions, or extrinsic information to update their beliefs. Each robot aims to minimize its expected cumulative cost over $K$ stages. Since the expectation involves its $K$-stage belief sequence of other players’ private types, its actions should be sequentially rational under its belief sequence, and the belief sequence should in turn be consistent with the belief dynamics. These two requirements lead to the solution concept of Perfect Bayesian Nash Equilibrium (PBNE), under which no player can reduce his long-run expected cost by unilaterally deviating from the equilibrium. By appending the belief state (i.e., all players’ beliefs under all possible types) to the system state, the PBNE computation becomes equivalent to a multi-agent nonlinear stochastic control problem to which dynamic programming applies. Without loss of generality, we characterize the structure of the action and the cost under PBNE as a feedback function of the belief state and the system state at the current stage. To provide an offline evaluation metric of the equilibrium cost under incomplete information, we use the expected equilibrium cost under complete information as a benchmark and define the Price of Deception (PoD).

Due to their tractability and generality, we focus on incomplete-information Linear-Quadratic (LQ) games with extrinsic belief dynamics, for which the PBNE action is unique and affine in the system state. We obtain a set of extended Riccati equations that explicitly characterize the coupling in the state dynamics, costs, and cognition of all robots. Under proper decoupling structures, the extended Riccati equations degenerate to the classical Riccati equations for optimal control problems or complete-information LQ games. In incomplete-information LQ games with intrinsic belief dynamics, the equilibrium action is in general not an affine feedback of the system state. Thus, we adopt a receding-horizon approach to provide a reasonable approximation of PBNE; i.e., instead of planning all $K$-stage actions offline before the game starts, players recompute their actions at each new stage based on real-time observations and their updated beliefs.

Finally, we investigate a target protection problem where an evader aims to deceptively reach one of the possible targets while evading the pursuer. The game has double-sided asymmetric information: the evader’s private information is his true target, while the pursuer’s private information is her capability to maneuver, i.e., her maneuverability. We propose multi-dimensional metrics, including the stage of truth revelation and the endpoint distance, to assess the deception impact. We define the concept of deceivability to characterize the fundamental limits of deception and investigate how it is affected by the distinguishability of the private information. We compare the proposed control policy with two heuristic policies to demonstrate its efficacy in countering deception at a much lower cost. We show that Bayesian learning can significantly reduce the impact of initial belief manipulation and, in some cases, result in a win-win situation. Increasing the pursuer’s maneuverability improves her control performance under deception, but only marginally. We also find that applying deception to counter deception is not always effective; e.g., it can be beneficial for a less maneuverable pursuer to disguise herself as a more maneuverable pursuer, but not vice versa. The numerical results corroborate that PoD can exceed $1$; i.e., deception among players may benefit not only the deceiver but also the deceivee.

I-A Related Works

The secure and efficient operation of robots, autonomous vehicles, and industrial control systems is vital for recent advances in technologies. Many works [8, 9, 10] have investigated how to protect these systems from various attacks on sensor measurements [11], communication channels [12], and control signals [13, 14]. Deception is a key feature of sophisticated attacks with a focus on intentionally hiding private information [15, 16], introducing randomness [17], and manipulating other players’ beliefs [18, 19]. Deception in robotic systems can be conducted through visual displays [20], facial expressions and body gestures [21], and trajectories [22, 15]. Existing works on robot deception are largely based on experimental approaches [15, 23, 24]. There is a need for a formal and quantitative framework to assess the deception impact, understand the fundamental limit and tradeoff of deception, and determine real-time strategies. Compared to the theoretical works of deceptive path planning and goal recognition [25, 26], which focus on identifying the true target behind deception, our work further determines optimal and cost-effective control policies to counteract deception and physically protect the true target; e.g., the pursuer adopts the action sequence of minimum cost to reach and protect the true beachhead selected by the evader. Compared to control-theoretic deception frameworks based on Markov decision processes [17, 18] and stochastic games [27], we adopt a state-space representation to better characterize the physical dynamics of robots and autonomous vehicles.

Game models such as hypergames [28], dynamic Bayesian games [16], partially observable stochastic games [19, 29], and games that involve signaling mechanisms [30, 31] have been adopted as natural analytic paradigms to understand deception between intelligent players. The computation of equilibrium solutions for dynamic games of incomplete information, especially ones with non-classical information structure [32], is often a challenging task. Previous works have adopted conjugate prior assumptions to simplify Bayesian update and decouple the forward type estimation and backward action optimization under a finite state space and a continuous type space [33, 34]. To solve the coupling between players’ belief dynamics and the multi-agent optimal control problem in the context of robotic systems where states are continuous and constrained by physical dynamics with noises, we adopt a receding-horizon approach to compute PBNE, which yields computationally tractable online strategies for the players. Similar receding-horizon approaches have been used in other contexts, including cyber-physical systems [35], military air operation [36], and autonomous racing [37].

I-B Notations and Organization of the Paper

Calligraphic letter $\mathcal{A}$ defines a set and $|\mathcal{A}|$ represents its cardinality. Define $\mathcal{B}\setminus\mathcal{A}$ as the set of elements in $\mathcal{B}$ but not in $\mathcal{A}$ . The Euclidean norm of a vector $x$ is represented by $||x||_{2}$ . Let $\mathbb{E}_{a\sim A}[f(a)]$ denote the expectation of $f(a)$ over random variable $a$ whose probability distribution is $A$ . Let ′ represent matrix transpose and $\operatorname{Diag}[a_{1},\cdots,a_{N}]$ represent a block diagonal matrix with possibly non-square matrices $a_{i},i\in\mathcal{N}$ , on its diagonal. Define $\{a_{i}\}_{i\in\mathcal{N}}:=\{a_{1},\cdots,a_{N}\}$ as a set of $N$ elements, $[a_{i}]_{i\in\mathcal{N}}:=[a_{1},\cdots,a_{N}]$ as $N$ block matrices of the same number of rows arranged in one row vector, and $[a_{1};\cdots;a_{N}]=[a_{1},\cdots,a_{N}]^{\prime}$ as $N$ block matrices of the same number of columns arranged in one column vector. Let $\mathbf{I}_{r},\mathbf{0}_{m,n}$ be the $r\times r$ identity matrix and the $m\times n$ zero matrix, respectively. The superscript $k\in\mathcal{K}$ is the stage index and the subscript $i\in\mathcal{N}$ is the player index. We omit a function’s arguments when there is no ambiguity, e.g., $S_{i}^{k}:=S_{i}^{k}({\beta}_{i}^{k},\theta_{i})$ . A piece of information for a group of players is called common knowledge if all players know it, all players know that all players know it, and so on ad infinitum. We summarize main notations in Table I.

The rest of the paper is organized as follows. Section II introduces the dynamic game of incomplete information and the solution concept of PBNE. To obtain explicit and practical solutions, we consider a class of linear-quadratic problems in Section III and obtain a set of extended Riccati equations. We present a case study of deceptive pursuit-evasion in Section IV, and Section V concludes the paper.

II Dynamic Game with Private Types

We model deception as a $K$-stage game consisting of $N$ robots as players, each of which has asymmetric information. Let $\mathcal{N}:=\{1,\cdots,N\}$ be the set of $N$ players and $\mathcal{K}:=\{0,1,2,\cdots,K\}$ be the set of $K$ discrete stages. Private information of player $i\in\mathcal{N}$, i.e., his type $\theta_{i}$, is modeled as the realization of a discrete random variable with a finite support $\Theta_{i}:=\{\theta_{i}^{1},\theta_{i}^{2},\cdots,\theta_{i}^{N_{i}}\}$ and a prior probability distribution $\Xi_{i}(\cdot)$. Hence, $N_{i}$ is the number of possible types for player $i$ and $\Xi_{i}(\theta_{i})$ is the probability that player $i$’s type is $\theta_{i}$. Define shorthand notation $\Xi:=[\Xi_{i}]_{i\in\mathcal{N}}$ and let $\Theta_{-i}:=\prod_{j\in\mathcal{N}\setminus\{i\}}\Theta_{j}$ be the set of types of all players except for player $i\in\mathcal{N}$. Each player $i$ knows the value of his own type $\theta_{i}$, but does not know the values of other players’ types $\theta_{-i}:=[\theta_{j}]_{j\in\mathcal{N}\setminus\{i\}}\in\Theta_{-i}$ throughout the $K$ stages of the game. The system state dynamics under $N$ players’ joint action $u^{k}:=[u_{1}^{k},\cdots,u_{N}^{k}]$, joint type $\theta:=[\theta_{1},\cdots,\theta_{N}]$, and an additive external noise $w^{k}\in\mathbb{R}^{n\times 1}$ are shown in (1):

$$x^{k+1}=f^{k}(x^{k},u^{k},\theta)+w^{k},\quad k\in\{0,\cdots,K-1\}. \tag{1}$$

The dynamics in (1) admit different interpretations depending on the application. In the pursuit-evasion scenario as in [5], $x_{i}^{k}\in\mathbb{R}^{n_{i}\times 1}$ represents robot $i$’s local states such as its location and speed, and the system state $x^{k}\in\mathbb{R}^{n\times 1}$ can be explicitly represented by $N$ robots’ joint state $[x_{1}^{k},\cdots,x_{N}^{k}]$ with $n=\sum_{i=1}^{N}n_{i}$. In the application where $N$ robots cooperatively transport a payload, e.g., [38, 6], the system state $x^{k}\in\mathbb{R}^{n\times 1}$ represents the payload’s location and posture, which are not explicitly related to the robots’ local states. The noise sequence $[w^{k}]_{k\in\mathcal{K}}$ is assumed to be independent across stages with probability density function $\Xi_{w}(\cdot)$. The noise is not necessarily Gaussian but is assumed to have zero mean, i.e., $\mathbb{E}_{w^{k}\sim\Xi_{w}}[w^{k}]=0,\forall k\in\mathcal{K}$; together with independence, this implies $\mathbb{E}_{w^{k},w^{h}\sim\Xi_{w}}[w^{k}(w^{h})^{\prime}]=0,\forall k\in\mathcal{K},h\in\mathcal{K}\setminus\{k\}$. We assume that the system dynamics (1) are multi-agent controllable as defined in Definition 1, so that players can design their deceptive actions to reach the entire state space in a finite number of stages.

Definition 1 (Multi-Agent Controllability).

System dynamics (1) are called multi-agent controllable if, for any target state $x^{k}\in\mathbb{R}^{n\times 1}$ at stage $k\in\mathcal{K}\setminus\{0\}$, initial state $x^{0}\in\mathbb{R}^{n\times 1}$, and joint type $\theta\in\Theta$, there exists a finite sequence of joint actions $u^{0:k-1}$ that drives the system state from $x^{0}$ to $x^{k}$ in expectation.

II-A Forward Belief Dynamics

At each stage $k\in\mathcal{K}$, the information available to player $i$ comprises the state history $h^{k}:=[x^{0},\cdots,x^{k}]\in\mathcal{H}^{k}$ as well as his own type value $\theta_{i}$. Define $\Delta(\Theta_{-i})$ as the set of probability distributions over set $\Theta_{-i}$. Each player $i$ at stage $k$ forms a belief $l_{i}^{k}:\mathcal{H}^{k}\times\Theta_{i}\mapsto\Delta(\Theta_{-i})$ based on his available information. Thus, $l_{i}^{k}(\cdot|h^{k},\theta_{i})$ is a probability measure of other players’ types, i.e., $\sum_{{\theta}_{-i}\in\Theta_{-i}}l_{i}^{k}({\theta}_{-i}|h^{k},\theta_{i})=1,\forall h^{k}\in\mathcal{H}^{k},\theta_{i}\in\Theta_{i}$.

Define a vector

[TABLE]

as player $i$ ’s belief state at stage $k\in\mathcal{K}$ . We assume that the set of belief states is independent of stages, i.e., $\beta_{i}^{k}\in\Lambda_{i}\subseteq[0,1]^{|\Theta_{-i}|\times|\Theta_{i}|}$ . Then, we can represent player $i$ ’s belief dynamics as

[TABLE]

Note that the belief transition function $\Gamma^{k}_{i}$ can be different for each $i$ and $k$ , i.e., players’ belief updates can be heterogeneous and time-varying. Define $\beta^{k}:=[\beta_{i}^{k}]_{i\in\mathcal{N}}\in\Lambda:=\prod_{i\in\mathcal{N}}\Lambda_{i}$ . In this work, we assume that the initial beliefs of all players of all types $\beta^{0}$ and the belief update rules $\Gamma_{i}^{k},\forall i\in\mathcal{N},\forall k\in\{0,\cdots,K-1\}$ , are common knowledge. In the next two subsections, we provide two specific forms of $\Gamma^{k}_{i}$ that rely on intrinsic and extrinsic information, respectively.

II-A1 Bayesian Belief Dynamics

The most common belief update rule $\Gamma^{k}_{i}$ in (2) for player $i$ at stage $k+1$ uses Bayesian inference. Given the knowledge of the sequential state observations $x^{k},x^{k+1}$ and all players’ actions $u^{k}$ , each player $i$ of type $\theta_{i}\in\Theta_{i}$ at stage $k+1$ can update his belief as follows: $\forall\theta_{-i}\in\Theta_{-i}$ ,

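$$l_{i}^{k+1}(\theta_{-i}|h^{k+1},\theta_{i})=\frac{l_{i}^{k}(\theta_{-i}|h^{k},\theta_{i})\Pr(x^{k+1}|\theta_{-i},h^{k},\theta_{i})}{\sum_{\bar{\theta}_{-i}\in\Theta_{-i}}l_{i}^{k}(\bar{\theta}_{-i}|h^{k},\theta_{i})\Pr(x^{k+1}|\bar{\theta}_{-i},h^{k},\theta_{i})}. \tag{3}$$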

In (3), we use the Markov property, i.e., $\Pr(x^{k+1}|\theta_{-i},h^{k},\theta_{i})=\Pr(x^{k+1}|\theta_{-i},x^{k},\theta_{i})=\Xi_{w}(x^{k+1}-f^{k}(x^{k},u^{k},\theta))$. The denominator is positive because the noise $w^{k}$ is supported on the entire space $\mathbb{R}^{n\times 1}$, so every observed $x^{k+1}$ has a positive likelihood under each type.
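As an illustration of the Bayesian update in (3), the following minimal Python sketch assumes isotropic Gaussian noise for $\Xi_{w}$ (the text only requires a zero-mean density) and stores player $i$’s belief as a dictionary over $\theta_{-i}$. The helper `predict` and the two opponent types "slow" and "fast" are hypothetical; `predict` stands for $f^{k}(x^{k},u^{k},\theta)$ with player $i$’s own type and the predicted actions already folded in.

```python
import math


def gaussian_pdf(v, sigma):
    """Density of an isotropic zero-mean Gaussian vector (assumed form of Xi_w)."""
    dim = len(v)
    sq_norm = sum(c * c for c in v)
    return math.exp(-sq_norm / (2 * sigma**2)) / (2 * math.pi * sigma**2) ** (dim / 2)


def bayes_update(belief, x_k, x_next, predict, sigma):
    """One step of (3): posterior over theta_{-i} after observing x^{k+1}.

    belief  : dict theta_{-i} -> l_i^k(theta_{-i} | h^k, theta_i)
    predict : (x^k, theta_{-i}) -> f^k(x^k, u^k, theta), a hypothetical helper
    """
    unnormalized = {}
    for theta_minus_i, prior in belief.items():
        residual = [a - b for a, b in zip(x_next, predict(x_k, theta_minus_i))]
        # Markov property: the likelihood is the noise density at x^{k+1} - f^k(.)
        unnormalized[theta_minus_i] = prior * gaussian_pdf(residual, sigma)
    evidence = sum(unnormalized.values())  # denominator of (3)
    return {th: p / evidence for th, p in unnormalized.items()}


def predict(x, theta_minus_i):
    """Hypothetical scalar model: the opponent's type sets the drift of the state."""
    drift = 0.5 if theta_minus_i == "slow" else 1.5
    return [x[0] + drift]


posterior = bayes_update({"slow": 0.5, "fast": 0.5}, [0.0], [1.4], predict, sigma=0.5)
print(round(posterior["fast"], 3))  # -> 0.832
```

Over long horizons, repeated products of densities can underflow to zero; working with log-likelihoods and a log-sum-exp normalizer is the usual remedy.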

Remark 1 (Actions Reveal Type Information).

Even if the state dynamics $f^{k}$ in (1) are independent of $\theta_{j},\forall j\in\mathcal{N}\setminus\{i\}$, player $i\in\mathcal{N}$ can still learn player $j$’s type via (3) because player $j$’s action $u^{k}_{j}$ is a function of his type $\theta_{j}$. (Each player’s action is a function of his type because his cost depends on his type and the action aims to minimize that cost.)

II-A2 Markov-Chain Belief Dynamics

In Section II-A1, we assume that players can exploit the intrinsic information of the state dynamics $f^{k}$, the state observations $x^{k},x^{k+1}$, and the prediction of all players’ actions $u^{k}$. Since such intrinsic information may not be available in practice, we consider belief dynamics with extrinsic information in this subsection. In particular, we assume that each player $i$’s belief dynamics $\beta_{i}^{k+1}:=\Gamma^{k}_{i}(\beta_{i}^{k},w^{k},\theta_{i}),\forall k\in\{0,\cdots,K-1\}$, are a discrete-time Markov chain, where the extrinsic information at stage $k$ is characterized by the transition function $\Gamma^{k}_{i}(\cdot,w^{k},\theta_{i})$. Note that the transition function only characterizes how players update their beliefs at each stage and does not guarantee that a player can learn the true types of others. The following example illustrates a class of players whose belief dynamics exhibit confirmation bias [39], where players tend to ignore intrinsic evidence such as $u^{k}$ and preserve their belief update rules $\Gamma_{i}^{k}$ at each stage $k$.

Example 1.

Consider a two-person game ($N=2$) where the first player has two types, $N_{1}=2,\Theta_{1}=\{\theta_{1}^{1},\theta_{1}^{2}\}$, and the second player has only one type, $N_{2}=1,\Theta_{2}=\{\theta_{2}^{1}\}$. The second player’s belief state $\beta_{2}^{k}=[l_{2}^{k}(\theta_{1}^{1}|\theta_{2}^{1}),l_{2}^{k}(\theta_{1}^{2}|\theta_{2}^{1})]$ about the first player’s type belongs to a finite set $\Lambda_{2}=\{[0.2,0.8],[0.5,0.5],[0.8,0.2]\}$. The transition function $\Gamma_{2}^{k}$ is independent of $k$: if the current belief state is $[0.5,0.5]$, then the belief at the next stage is $[0.2,0.8]$, $[0.5,0.5]$, or $[0.8,0.2]$ with probability $0.4$, $0.2$, and $0.4$, respectively. If the current belief state is $[0.8,0.2]$ (resp. $[0.2,0.8]$), then the belief at the next stage is $[0.8,0.2]$ (resp. $[0.2,0.8]$) or $[0.5,0.5]$ with probability $0.9$ and $0.1$, respectively. This transition function means that the second player tends to interpret the extrinsic information about the first player’s type based on his current belief. If the second player already believes at stage $k$ that the first player is of type $\theta_{1}^{1}$ with a high probability of $0.8$, i.e., $\beta_{2}^{k}=[0.8,0.2]$, then he is inclined to reinforce this belief: his belief state at the next stage, $\beta_{2}^{k+1}$, remains $[0.8,0.2]$ with a high probability of $0.9$. The above transition function represents the phenomena of attitude polarization and confirmation bias, where players preserve their existing beliefs and the disagreement becomes more extreme at each stage even when players are exposed to the same evidence.
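To make the polarization in Example 1 concrete, the sketch below (Python, standard library only) encodes the transition function $\Gamma_{2}^{k}$ as a 3-by-3 row-stochastic matrix and propagates the distribution of the second player’s belief state. Starting from the neutral belief $[0.5,0.5]$, the probability mass settles at the stationary distribution $(4/9,1/9,4/9)$: most of the mass ends on the two extreme beliefs, while the averaged belief stays at $0.5$. The horizon of 500 stages is an arbitrary choice that is long enough for convergence.

```python
# Belief states of player 2 in Example 1: [l(theta_1^1), l(theta_1^2)].
STATES = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
# P[s][t] = Pr(beta_2^{k+1} = STATES[t] | beta_2^k = STATES[s]); each row sums to one.
P = [
    [0.9, 0.1, 0.0],
    [0.4, 0.2, 0.4],
    [0.0, 0.1, 0.9],
]


def propagate(dist, steps):
    """Push a distribution over belief states through the stage-invariant chain Gamma_2."""
    for _ in range(steps):
        dist = [sum(dist[s] * P[s][t] for s in range(len(P))) for t in range(len(P))]
    return dist


def mean_belief(dist):
    """Expected belief that player 1 is of type theta_1^1."""
    return sum(p * state[0] for p, state in zip(dist, STATES))


# Start from the neutral belief [0.5, 0.5] with certainty and run many stages.
dist = propagate([0.0, 1.0, 0.0], 500)
print([round(p, 4) for p in dist])   # -> [0.4444, 0.1111, 0.4444]
print(round(mean_belief(dist), 4))   # -> 0.5
```

The stationary distribution can also be checked by hand: balance at the first state gives $0.1\pi_{1}=0.4\pi_{2}$, so $\pi_{1}=\pi_{3}=4\pi_{2}$ and normalization yields $\pi_{2}=1/9$.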

II-B Nonzero-Sum Cost Function and Equilibrium Concept

At non-terminal stage $k\in\mathcal{K}\setminus\{K\}$, player $i$’s cost function is $g_{i}^{k}:\mathbb{R}^{n\times 1}\times\prod_{j=1}^{N}\mathbb{R}^{m_{j}\times 1}\times\Theta_{i}\mapsto\mathbb{R}$. The final-stage cost is $g_{i}^{K}:\mathbb{R}^{n\times 1}\times\Theta_{i}\mapsto\mathbb{R}$. Define $u_{i}^{k_{0}:K-1}:=[u_{i}^{k_{0}},\cdots,u_{i}^{K-1}]$ as player $i$’s action sequence from stage $k_{0}$ to $K-1$ and $u^{k_{0}:K-1}:=[u_{i}^{k_{0}:K-1},u_{-i}^{k_{0}:K-1}]$ as the action sequences of player $i$ and all other players from stage $k_{0}$ to $K-1$. Player $i$’s expected cumulative cost from an arbitrary initial stage $k_{0}\in\mathcal{K}$ to the terminal stage $K$ is defined as

[TABLE]

The expectations are taken first over the external noise sequence $w^{k}$ and then over other players’ internal type uncertainty.

We cannot exchange the order of these two expectations because $l_{i}^{k}$ is a function of $w^{k-1}$. Each player $i$ at stage $k_{0}\in\mathcal{K}$ aims to minimize $J_{i}^{k_{0}}$ by choosing only his own action sequence $u_{i}^{k_{0}:K-1}$, not other players’ action sequences $u_{-i}^{k_{0}:K-1}$. Sequential rationality, defined in Definition 2, guarantees that each player $i$ has no incentive to deviate from the sequentially rational action at any stage $k\in\{k_{0},\cdots,K-1\}$ of the interaction if all other players adopt their sequentially rational actions.

Definition 2 (Sequential Rationality).

An action sequence $u^{*,k_{0}:K-1}:=\{u_{i}^{*,k_{0}:K-1},u_{-i}^{*,k_{0}:K-1}\}$ is called sequentially rational for player $i$ under the belief sequence $l_{i}^{k_{0}:K-1}$, state $x^{k_{0}}$, and type $\theta_{i}$ if, for any state $x^{k}$ at stage $k\in\{k_{0},\cdots,K-1\}$, player $i$ does not benefit from taking any other action sequence $u_{i}^{k:K-1}$, i.e., $J_{i}^{{k}}(l_{i}^{k:K-1},u_{i}^{*,k:K-1},u_{-i}^{*,k:K-1},x^{k},\theta_{i})\leq J_{i}^{{k}}(l_{i}^{k:K-1},u_{i}^{k:K-1},u_{-i}^{*,k:K-1},x^{k},\theta_{i}),\forall u_{i}^{k:K-1}$.

Since players’ actions may affect their future beliefs as captured by the belief dynamics $\Gamma_{i}^{k}$ in (2), we further require the equilibrium action $u^{*,k_{0}:K-1}$ in Definition 2 to be consistent with the belief dynamics, which leads to the following definition of Perfect Bayesian Nash Equilibrium (PBNE).

Definition 3 (Perfect Bayesian Nash Equilibrium).

Consider the $N$-player dynamic game of private types and asymmetric information defined by the state dynamics (1) and the expected cumulative cost (4). The action sequence $u^{*,0:K-1}:=\{u^{*,0:K-1}_{i},u^{*,0:K-1}_{-i}\}$ of $N$ players over $K$ stages constitutes a Perfect Bayesian Nash Equilibrium (PBNE) if, regardless of each player $i$’s type $\theta_{i}\in\Theta_{i}$, the following statements hold.

1. Sequential rationality: $u^{*,0:K-1}$ is sequentially rational for each player $i\in\mathcal{N}$ under his belief sequence $l_{i}^{*,0:K-1}$;
2. Belief consistency: each player $i$’s belief sequence $l_{i}^{*,0:K-1}$ is consistent with (2) under $u^{*,0:K-1}$.

Proposition 1.

It is sufficient to represent player $i$ ’s equilibrium cost $J_{i}^{{k}}(l_{i}^{*,k:K-1},u^{*,k:K-1},x^{k},\theta_{i})$ under the PBNE action $u^{*,k:K-1}$ at stage $k\in\mathcal{K}$ as a function of $\beta^{k}$ , $x^{k}$ and $\theta_{i}$ , which is defined as $V_{i}^{k}(\beta^{k},x^{k},\theta_{i})$ . Under the boundary condition $V_{i}^{K}(\beta^{K},x^{K},\theta_{i}):=g_{i}^{K}(x^{K},\theta_{i})$ , the following holds for all $k\in\{0,\cdots,K-1\}$ and all $x^{k}\in\mathbb{R}^{n\times 1},\beta^{k}\in\Lambda$ , i.e.,

[TABLE]

where $\beta^{k+1}$ and $x^{k+1}$ satisfy (2) and (1), respectively.

Proof.

According to the definition of PBNE, at the second-to-last stage $k=K-1$, each player $i$’s equilibrium action $u_{i}^{*,k}=\arg\min_{u_{i}^{k}}\mathbb{E}_{\theta_{-i}\sim l_{i}^{k}}[g_{i}^{k}(x^{k},u^{k},\theta_{i})]+\mathbb{E}_{w^{k}\sim\Xi_{w}}[g_{i}^{K}(x^{K},\theta_{i})]$ is, in general, a function of $\theta_{i},x^{k},l_{i}^{*,k},u_{-i}^{*,k}$. Because $u_{i}^{*,k}$ and $u_{-i}^{*,k}$ are coupled, we need to solve a system of equations for all $i\in\mathcal{N}$ and $\theta_{i}\in\Theta_{i}$ simultaneously. Then, $u_{i}^{*,k}$ becomes a function of $\beta^{k},x^{k},\theta_{i}$, and we obtain (5) at stage $k=K-1$. Repeating the above procedure from $k=K-2$ to $k=0$ yields the recursive form in (5). ∎

Proposition 1 characterizes the structure of the equilibrium action $u_{i}^{*,k}$ and the equilibrium cost $V_{i}^{k}(\beta^{k},x^{k},\theta_{i})$ for each player $i$ of type $\theta_{i}$ under the solution concept of PBNE; i.e., both are feedback functions of the belief state $\beta^{k}$, the physical state $x^{k}$, and the player’s type $\theta_{i}$. Although $J_{i}^{k}$ is a function of the beliefs $l_{i}^{k:K-1}$ over all the remaining stages, $V_{i}^{k}(\beta^{k},x^{k},\theta_{i})$ depends only on the belief state at the current stage $k$.

If all players’ types are common knowledge, PBNE still applies and we can define a new function $\bar{V}_{i}^{k}(x^{k},\theta)$ to represent the resulting equilibrium cost ${V}_{i}^{k}(\beta^{k},x^{k},\theta_{i})$ for all $k\in\mathcal{K}$ without loss of generality.

II-C Offline Evaluation of Equilibrium Cost

If each player $i$’s initial belief conforms to the prior distribution of other players’ types, i.e., $l_{i}^{0}(\theta_{j}|x^{0},\theta_{i})=\Xi_{j}(\theta_{j}),\forall\theta_{i}\in\Theta_{i},j\in\mathcal{N}\setminus\{i\},\theta_{j}\in\Theta_{j},\forall x^{0}$, then each player $i$ at system state $x^{0}$ with belief state $\beta^{0}$ can use his expected equilibrium cost $\mathbb{E}_{\theta_{i}\sim\Xi_{i}}[{V}_{i}^{0}(\beta^{0},x^{0},\theta_{i})]$ over his type uncertainty $\Xi_{i}$ as an offline performance measure of the equilibrium action $u^{*,0:K-1}$. For comparison, player $i$’s expected equilibrium cost $\mathbb{E}_{\theta\sim\Xi}[\bar{V}_{i}^{0}(x^{0},\theta)]$ under the complete-information game serves as a benchmark. Note that player $i$ does not need to know the realization of the joint type $\theta$ to compute $\mathbb{E}_{\theta\sim\Xi}[\bar{V}_{i}^{0}(x^{0},\theta)]$. Due to the coupling in dynamics, costs, and cognition among the $N$ players, obtaining more information, i.e., knowing the type of another player $j\in\mathcal{N}\setminus\{i\}$, may not always improve player $i$’s performance; i.e., there is no guarantee that $\mathbb{E}_{\theta_{i}\sim\Xi_{i}}[{V}_{i}^{0}(\beta^{0},x^{0},\theta_{i})]\geq\mathbb{E}_{\theta\sim\Xi}[\bar{V}_{i}^{0}(x^{0},\theta)]$. Besides the above performance evaluation for an individual player $i\in\mathcal{N}$ under deception, we may also want to evaluate the overall performance of multiple players or of all $N$ players. We define the Price of Deception (PoD) in Definition 4 with a set of coefficients $\eta_{i}\in[0,1],\forall i\in\mathcal{N},\sum_{i\in\mathcal{N}}\eta_{i}=1$.
Since the equilibrium cost can be negative, we let $\eta_{0}(\Xi):=-\min(0,\{\mathbb{E}_{\theta_{i}\sim\Xi_{i}}[{V}_{i}^{0}(\beta^{0},x^{0},\theta_{i})]\}_{i\in\mathcal{N}},\{\mathbb{E}_{\theta\sim\Xi}[\bar{V}_{i}^{0}(x^{0},\theta)]\}_{i\in\mathcal{N}})$ be the normalizing constant that guarantees that $p^{\eta}(\Xi)$ is non-negative for all chosen coefficients $\eta_{i},i\in\mathcal{N}$.

Definition 4 (Price of Deception).

For a given set of coefficients $\eta:=\{\eta_{i}\}_{i\in\mathcal{N}\cup\{0\}}$ , the Price of Deception (PoD) of the $N$ -player $K$ -stage game defined by (1), (4), and (2) under the prior probability distribution $\Xi=[\Xi_{i}]_{i\in\mathcal{N}}$ is

[TABLE]

The PoD is a crucial evaluation and design metric. We can endow PoD with different meanings by properly choosing the weighting coefficients $\eta_{i},i\in\mathcal{N}$. For example, suppose that, besides the $N$ players, a central planner aims to minimize the total cost of all $N$ players under their deceptive interaction. Then, we can pick $\eta_{i}=1/N,i\in\mathcal{N}$, to represent the overall system performance. Although the central planner cannot control players’ state dynamics, costs, and belief dynamics directly, he can still affect their deceptive interaction if he can design the prior probability distribution $\Xi$ of the joint type $\theta$. If the central planner instead only aims to reduce the cost of one player $j\in\mathcal{N}$, then we can pick $\eta_{j}=1$ and $\eta_{h}=0,\forall h\in\mathcal{N}\setminus\{j\}$. For given weighting coefficients $\eta$, a larger value of $p^{\eta}(\Xi)$ indicates better accomplishment of the above goals. Note that individual deception may improve the system performance, i.e., $p^{\eta}(\Xi)>1$ is possible.

III Linear-Quadratic Specification

Linear-quadratic (LQ) games are an important class of dynamic games. They can also be applied iteratively to approximate nonlinear stochastic systems with general cost functions and to obtain equilibrium actions [40]. In the following sections, we consider the linear state dynamics

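$$x^{k+1}=A^{k}(\theta)x^{k}+\sum_{j=1}^{N}B_{j}^{k}(\theta_{j})u_{j}^{k}+w^{k},\quad k\in\{0,\cdots,K-1\}, \tag{6}$$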

with stage-varying matrices $A^{k}(\theta)\in\mathbb{R}^{n\times n}$ , $B^{k}_{i}(\theta_{i})\in\mathbb{R}^{n\times m_{i}}$ .

Remark 2.

System (6) is multi-agent controllable if and only if, for all $\theta\in\Theta$ and all $k\in\mathcal{K}\setminus\{0\}$, the concatenated matrix $[H_{1}^{k}(\theta),\cdots,H_{N}^{k}(\theta)]$ has full row rank $n$, where $H_{i}^{k}(\theta):=[B_{i}^{k-1}(\theta_{i}),\cdots,\prod_{h=2}^{k-1}A^{h}(\theta)B_{i}^{1}(\theta_{i}),\prod_{h=1}^{k-1}A^{h}(\theta)B_{i}^{0}(\theta_{i})],\forall i\in\mathcal{N}$, and each product is taken with the highest stage index on the left. The condition follows because the noise $w^{k}$ has zero mean, so induction gives $\mathbb{E}[x^{k}]=\prod_{h=0}^{k-1}A^{h}(\theta)x^{0}+\sum_{r=1}^{N}H_{r}^{k}(\theta)[u_{r}^{k-1};\cdots;u_{r}^{0}]$, and every target $x^{k}$ is reachable in expectation from every $x^{0}$ exactly when the joint input matrix spans $\mathbb{R}^{n\times 1}$.
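For a fixed joint type $\theta$, this reachability condition can be checked numerically. Because $\mathbb{E}[x^{k}]$ depends on all players’ stacked actions through the concatenated matrix $[H_{1}^{k}(\theta),\cdots,H_{N}^{k}(\theta)]$, every target is reachable in expectation from every $x^{0}$ exactly when that matrix has rank $n$. The Python/NumPy sketch below builds each $H_{i}^{k}$ with products ordered as $A^{k-1}\cdots A^{h+1}B_{i}^{h}$ and tests the rank for every $k\geq 1$; the double-integrator matrices are hypothetical, and the full definition would require repeating the check for every $\theta\in\Theta$.

```python
import numpy as np


def reach_matrix(A_seq, B_seq):
    """H_i^k for one player; A_seq[h] = A^h(theta), B_seq[h] = B_i^h(theta_i), h < k."""
    k = len(B_seq)
    n = A_seq[0].shape[0]
    blocks = []
    for h in range(k - 1, -1, -1):        # columns multiplying u_i^{k-1}, ..., u_i^0
        Phi = np.eye(n)
        for r in range(h + 1, k):
            Phi = A_seq[r] @ Phi          # builds A^{k-1} ... A^{h+1}
        blocks.append(Phi @ B_seq[h])
    return np.hstack(blocks)


def multi_agent_controllable(A_seq, B_seqs, K):
    """Rank test of [H_1^k, ..., H_N^k] for every stage k = 1..K at a fixed theta."""
    n = A_seq[0].shape[0]
    for k in range(1, K + 1):
        H = np.hstack([reach_matrix(A_seq[:k], B[:k]) for B in B_seqs])
        if np.linalg.matrix_rank(H) < n:
            return False
    return True


# Hypothetical example: player 1 actuates position, player 2 actuates velocity.
K = 3
A_seq = [np.array([[1.0, 0.1], [0.0, 1.0]])] * K
B1_seq = [np.array([[1.0], [0.0]])] * K
B2_seq = [np.array([[0.0], [1.0]])] * K

both = multi_agent_controllable(A_seq, [B1_seq, B2_seq], K)   # True
alone = multi_agent_controllable(A_seq, [B1_seq], K)           # False: H_1^1 = B_1^0 has rank 1
```

The example shows why the joint matrix matters: neither player alone can reach an arbitrary target in one stage, yet together they can.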

Each player $i$ ’s cost is quadratic in both $x^{k}$ and $u^{k}$ ; i.e.,

[TABLE]

where $[\hat{x}_{i}^{k}(\theta_{i})]_{k\in\mathcal{K}}$ is a known type-dependent reference trajectory for player $i\in\mathcal{N}$ and $\hat{f}_{i}^{k}$ is a known function of $\hat{x}_{i}^{k}(\theta_{i})$. The cost matrices $D_{i}^{k}(\theta_{i})\in\mathbb{R}^{n\times n},F_{ij}^{k}(\theta_{i})\in\mathbb{R}^{m_{i}\times m_{i}},\forall i,j\in\mathcal{N},k\in\mathcal{K}$, are symmetric. At the final stage, $F^{K}_{ij}(\theta_{i})\equiv\mathbf{0}_{m_{i},m_{i}},\forall i,j\in\mathcal{N},\forall\theta_{i}\in\Theta_{i}$. We introduce the following three sets of notation for the belief matrix, the extended Riccati equations, and the matrix-form equilibrium action, respectively.

Belief Matrix

With a slight abuse of notation, we define the marginal probability $l_{i}^{k}({\theta}_{j}|h^{k},\theta_{i}):=\sum_{\theta_{r}\in\Theta_{r},r\in\mathcal{N}\setminus\{i,j\}}l_{i}^{k}(\theta_{-i}|h^{k},\theta_{i}),\forall j\in\mathcal{N}\setminus\{i\}$, as player $i$’s belief about player $j$’s type at stage $k$. Define the belief matrix for all $i\in\mathcal{N},j\in\mathcal{N}\setminus\{i\},k\in\{0,\cdots,K-1\}$, as

[TABLE]

where each block element $\mathbf{L}_{i}^{k}(\theta_{j}^{r}|h^{k},\theta_{i}^{h})=\operatorname{Diag}[{l}_{i}^{k}(\theta_{j}^{r}|h^{k},\theta_{i}^{h}),\cdots,{l}_{i}^{k}(\theta_{j}^{r}|h^{k},\theta_{i}^{h})]\in\mathbb{R}^{n\times n},\forall r\in\{1,\cdots,N_{j}\},\forall h\in\{1,\cdots,N_{i}\}$. Since all its elements are non-negative and each row sums to one, the belief matrix $\mathbf{L}_{ij}^{k}$ is a right stochastic matrix.
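Because every block of $\mathbf{L}_{ij}^{k}$ is a scalar multiple of $\mathbf{I}_{n}$, the belief matrix is a Kronecker product of an $N_{i}$-by-$N_{j}$ row-stochastic matrix of marginal beliefs with $\mathbf{I}_{n}$. The Python/NumPy sketch below computes the marginal $l_{i}^{k}(\theta_{j}|h^{k},\theta_{i})$ from a joint belief over $\theta_{-i}$ and assembles $\mathbf{L}_{ij}^{k}$ that way; the dictionary layout and the numbers (player $i$ with two types, opponents $j$ and $r$ with two and three types) are hypothetical.

```python
import numpy as np


def marginal_belief(joint, j_pos, n_types_j):
    """l_i^k(theta_j | h^k, theta_i) obtained by summing the joint belief over other opponents.

    joint maps tuples of opponents' type indices to probabilities; j_pos is
    player j's position inside those tuples.
    """
    marg = [0.0] * n_types_j
    for types, p in joint.items():
        marg[types[j_pos]] += p
    return marg


def belief_matrix(joint_by_own_type, j_pos, n_types_j, n):
    """L_ij^k: block (h, r) equals l_i^k(theta_j^r | h^k, theta_i^h) * I_n."""
    P = np.array([marginal_belief(joint, j_pos, n_types_j) for joint in joint_by_own_type])
    return np.kron(P, np.eye(n))


# Hypothetical joint beliefs over (theta_j, theta_r), one dict per own type theta_i^h.
joint_by_own_type = [
    {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1, (1, 0): 0.3, (1, 1): 0.2, (1, 2): 0.1},
    {(0, 0): 0.25, (1, 2): 0.75},
]
L = belief_matrix(joint_by_own_type, j_pos=0, n_types_j=2, n=2)
```

Storing only the small matrix of marginals and forming the Kronecker product on demand avoids keeping the many zero entries of $\mathbf{L}_{ij}^{k}$.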

Extended Riccati Equations

Let a sequence of symmetric matrices $S_{i}^{k}(\beta^{k},\theta_{i})\in\mathbb{R}^{n\times n}$ , vectors $N_{i}^{k}(\beta^{k},\theta_{i})\in\mathbb{R}^{n\times 1}$ , and scalars $q_{i}^{k}(\beta^{k},\theta_{i})\in\mathbb{R}$ satisfy the following extended Riccati equations for all $\beta^{k}\in\Lambda,i\in\mathcal{N},\theta_{i}\in\Theta_{i},k\in\{0,\cdots,K-1\}$ :

[TABLE]

where functions $\Psi^{1,k}_{i},\Psi^{2,k}_{i},\forall i\in\mathcal{N}$ , are defined below. The boundary conditions of the extended Riccati equations are

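$$S_{i}^{K}(\beta^{K},\theta_{i})=D_{i}^{K}(\theta_{i}),\quad N_{i}^{K}(\beta^{K},\theta_{i})=-2D_{i}^{K}(\theta_{i})\hat{x}_{i}^{K}(\theta_{i}),\quad q_{i}^{K}(\beta^{K},\theta_{i})=(\hat{x}_{i}^{K}(\theta_{i}))^{\prime}D_{i}^{K}(\theta_{i})\hat{x}_{i}^{K}(\theta_{i})+\hat{f}_{i}^{K}(\hat{x}_{i}^{K}(\theta_{i})). \tag{12}$$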

Equilibrium Action in Matrix Form

We need to represent the equilibrium actions of all players under all types in matrix form because each player’s action is coupled with other players’ actions under PBNE. Since each player $i$ has different equilibrium actions under different types, with a slight abuse of notation, we write each player $i$’s action as a function of his type $\theta_{i}$ and define two action vectors $\mathbf{u}_{i}^{k}:=[u_{i}^{k}(\theta^{1}_{i}),\cdots,u_{i}^{k}(\theta^{N_{i}}_{i})]^{\prime}\in\mathbb{R}^{m_{i}N_{i}\times 1}$ and $\mathbf{u}^{k}:=[\mathbf{u}_{1}^{k},\mathbf{u}_{2}^{k},\cdots,\mathbf{u}_{N}^{k}]^{\prime}\in\mathbb{R}^{\sum_{r=1}^{N}m_{r}N_{r}\times 1}$. For all $i\in\mathcal{N},l_{i}^{k},\theta_{i}\in\Theta_{i},k\in\{0,\cdots,K-1\}$, define a series of $m_{i}$-by-$m_{i}$ square matrices

[TABLE]

Let $\mathbf{B}_{i}^{k}:=\operatorname{Diag}[B_{i}^{k}(\theta_{i}^{1}),\cdots,B_{i}^{k}(\theta_{i}^{N_{i}})]$ be the $(N_{i}n)$-by-$(N_{i}m_{i})$ block-diagonal matrix and $\mathbf{S}_{i}^{k}(\beta^{k}):=\operatorname{Diag}[S_{i}^{k}(\beta^{k},\theta_{i}^{1}),\cdots,S_{i}^{k}(\beta^{k},\theta_{i}^{N_{i}})]$ be the $(N_{i}n)$-by-$(N_{i}n)$ block-diagonal matrix. Finally, define the parameter matrices $\mathbf{W}^{1,k}(\beta^{k})=[W^{1,k}_{1}(\beta^{k});\cdots;W^{1,k}_{N}(\beta^{k})]\in\mathbb{R}^{\sum_{r=1}^{N}m_{r}N_{r}\times n}$, $\mathbf{W}^{2,k}(\beta^{k})=[W^{2,k}_{1}(\beta^{k});\cdots;W^{2,k}_{N}(\beta^{k})]\in\mathbb{R}^{\sum_{r=1}^{N}m_{r}N_{r}\times 1}$, and $\mathbf{W}^{0,k}(\beta^{k}):=[W^{0,k}_{ij}(\beta^{k})\in\mathbb{R}^{m_{i}N_{i}\times m_{j}N_{j}}]_{i,j\in\mathcal{N}}$ for any $\beta^{k}\in\Lambda$. Their elements are given as follows, $\forall i\in\mathcal{N},\forall k\in\{0,\cdots,K-1\}$:

[TABLE]

Let matrix $\mathbf{M}_{i}^{k}(\beta^{k},\theta_{i}^{l})\in\mathbb{R}^{m_{i}\times\sum_{r=1}^{N}m_{r}N_{r}},l\in\{1,2,\cdots,N_{i}\},i\in\mathcal{N},k\in\{0,\cdots,K-1\}$, be the truncated row block of matrix $(-\mathbf{W}^{0,k}(\beta^{k}))^{-1}$ consisting of rows $\sum_{r=1}^{i-1}m_{r}N_{r}+m_{i}(l-1)+1$ through $\sum_{r=1}^{i-1}m_{r}N_{r}+m_{i}l$. Define the shorthand notations $\Psi^{1,k}_{i}(\beta^{k},\theta_{i}):=\mathbf{M}_{i}^{k}(\beta^{k},\theta_{i})\mathbf{W}^{1,k}(\beta^{k})$ and $\Psi^{2,k}_{i}(\beta^{k},\theta_{i}):=\mathbf{M}_{i}^{k}(\beta^{k},\theta_{i})\mathbf{W}^{2,k}(\beta^{k})$.
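The following Python/NumPy sketch shows how the row block $\mathbf{M}_{i}^{k}$ is cut out of $(-\mathbf{W}^{0,k})^{-1}$ with 0-based indexing, and checks that $\Psi^{1,k}_{i}x^{k}+\Psi^{2,k}_{i}$ reproduces the corresponding entries of the stacked solution of $-\mathbf{W}^{0,k}\mathbf{u}^{k}=\mathbf{W}^{1,k}x^{k}+\mathbf{W}^{2,k}$. The sizes and the random matrices standing in for $\mathbf{W}^{0,k},\mathbf{W}^{1,k},\mathbf{W}^{2,k}$ at a fixed $\beta^{k}$ are hypothetical; only the indexing and the algebra are the point.

```python
import numpy as np


def block_start(m, Nt, i, l):
    """0-based first row for player i (0-based) of type theta_i^l (l is 1-based)."""
    return sum(m[r] * Nt[r] for r in range(i)) + m[i] * (l - 1)


def psi_gains(W0, W1, W2, m, Nt, i, l):
    """Psi_i^{1,k} = M_i^k W^{1,k} and Psi_i^{2,k} = M_i^k W^{2,k}."""
    M_full = np.linalg.inv(-W0)            # (-W^{0,k}(beta^k))^{-1}
    s = block_start(m, Nt, i, l)
    M = M_full[s:s + m[i], :]              # truncated row block M_i^k(beta^k, theta_i^l)
    return M @ W1, M @ W2


# Hypothetical sizes: N = 2 players, m = (1, 2) inputs, N_i = (2, 3) types, n = 3.
m, Nt, n = [1, 2], [2, 3], 3
dim = sum(mi * ni for mi, ni in zip(m, Nt))                 # 8 stacked action entries
rng = np.random.default_rng(0)
W0 = rng.standard_normal((dim, dim)) - 8.0 * np.eye(dim)    # generic, non-singular
W1 = rng.standard_normal((dim, n))
W2 = rng.standard_normal((dim, 1))
x = rng.standard_normal((n, 1))

u_all = np.linalg.solve(-W0, W1 @ x + W2)                   # stacked equilibrium actions
P1, P2 = psi_gains(W0, W1, W2, m, Nt, i=1, l=2)
s = block_start(m, Nt, 1, 2)                                # rows of player 2, type 2
```

In practice one would factor $-\mathbf{W}^{0,k}$ once (or call `np.linalg.solve`) rather than form the explicit inverse; the inverse is kept here only to mirror the definition of $\mathbf{M}_{i}^{k}$.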

III-A Extrinsic Belief Dynamics and Extended Riccati Equations

In this subsection, we focus on extrinsic belief dynamics, where $\Gamma_{i}^{k}$ is independent of players’ actions $u^{k}$ for all $i\in\mathcal{N},k\in\{0,\cdots,K-1\}$. The proof of Theorem 1 generalizes that for classical LQ games (e.g., Chapters $5.5$ and $6.2$ in [41]), where we further incorporate players’ asymmetric belief dynamics into their objective functions to minimize their expected costs under deception. We apply dynamic programming from stage $K-1$ backward to stage $0$ to obtain a closed-form solution of PBNE.

Theorem 1.

An $N$ -player $K$ -stage LQ game of incomplete information defined by (6), (7), and extrinsic belief dynamics $\beta^{k+1}_{i}=\Gamma_{i}^{k}(\beta^{k}_{i},w^{k},\theta_{i}),\forall i\in\mathcal{N},\forall k\in\{0,\cdots,K-1\}$ , admits a unique state-feedback PBNE

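$$u_{i}^{*,k}(\beta^{k},x^{k},\theta_{i})=\Psi^{1,k}_{i}(\beta^{k},\theta_{i})x^{k}+\Psi^{2,k}_{i}(\beta^{k},\theta_{i}),\quad\forall i\in\mathcal{N},\theta_{i}\in\Theta_{i},k\in\{0,\cdots,K-1\}, \tag{13}$$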

if and only if $R_{i}^{k}(\beta^{k},\theta_{i})$ is positive definite and $\mathbf{W}^{0,k}(\beta^{k})$ is non-singular for all $\beta^{k}\in\Lambda,i\in\mathcal{N},\theta_{i}\in\Theta_{i},k\in\{0,\cdots,K-1\}$ . The equilibrium cost $V_{i}^{k}$ is quadratic in $x^{k}$ , i.e.,

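$$V_{i}^{k}(\beta^{k},x^{k},\theta_{i})=(x^{k})^{\prime}S_{i}^{k}(\beta^{k},\theta_{i})x^{k}+(x^{k})^{\prime}N_{i}^{k}(\beta^{k},\theta_{i})+q_{i}^{k}(\beta^{k},\theta_{i}),\quad\forall k\in\mathcal{K}. \tag{14}$$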

Proof.

We use backward induction to prove the result. At the final stage $K$, the value function $V_{i}^{K}(\beta^{K},x^{K},\theta_{i})=(x^{K}-\hat{x}_{i}^{K}(\theta_{i}))^{\prime}D_{i}^{K}(\theta_{i})(x^{K}-\hat{x}_{i}^{K}(\theta_{i}))+\hat{f}_{i}^{K}(\hat{x}_{i}^{K}(\theta_{i}))$ is quadratic in $x^{K}$, and we obtain the boundary conditions for $S_{i}^{K},N_{i}^{K},q_{i}^{K}$ in (12) by matching it with the Right-Hand Side (RHS) of (14). At any stage $k\in\{0,\cdots,K-1\}$, if (14) holds at stage $k+1$, we can expand $\mathbb{E}_{w^{k}\sim\Xi_{w}}[V_{i}^{k+1}(\beta^{k+1},x^{k+1},\theta_{i})]$ by plugging in the state dynamics $x^{k+1}=A^{k}(\theta)x^{k}+\sum_{j=1}^{N}B^{k}_{j}(\theta_{j})u_{j}^{k}+w^{k}$ and the belief dynamics $\beta^{k+1}_{i}=\Gamma_{i}^{k}(\beta^{k}_{i},w^{k},\theta_{i})$. Then, the RHS of (5) is quadratic in $u_{i}^{k}$ for each player $i$. If the coefficient matrix $R_{i}^{k}$ of the quadratic form $(u_{i}^{k})^{\prime}R_{i}^{k}u_{i}^{k}$ is positive definite, then the first-order necessary conditions for minimization are also sufficient, and we obtain the following unique set of equations for the equilibrium action $u^{*,k}$ by differentiating the RHS of (5) with respect to $u_{i}^{k}$ and setting the derivative to zero, i.e., $\forall\theta_{i}\in\Theta_{i}$,

[TABLE]

Due to the coupling in players’ actions and beliefs, we rewrite (15) in matrix form, i.e., $-\mathbf{W}^{0,k}(\beta^{k})\mathbf{u}^{*,k}=\mathbf{W}^{1,k}(\beta^{k})x^{k}+\mathbf{W}^{2,k}(\beta^{k})$, to solve the set of equations. Given the existence of $(-\mathbf{W}^{0,k}(\beta^{k}))^{-1}$, each player $i$’s equilibrium action is an affine function of $x^{k}$, i.e., $u_{i}^{*,k}(\beta^{k},x^{k},\theta_{i})=\Psi^{1,k}_{i}(\beta^{k},\theta_{i})x^{k}+\Psi^{2,k}_{i}(\beta^{k},\theta_{i})$. Note that the coefficients $\Psi^{1,k}_{i},\Psi^{2,k}_{i}$ for player $i$ are functions of $\beta^{k}$, i.e., of the beliefs of all players under all types at stage $k$.

Finally, after substituting the equilibrium action $u_{i}^{*,k}(\beta^{k},x^{k},\theta_{i})=\Psi^{1,k}_{i}(\beta^{k},\theta_{i})x^{k}+\Psi^{2,k}_{i}(\beta^{k},\theta_{i})$ into the RHS of (5) and representing $V_{i}^{k}$ in the Left-Hand Side (LHS) in its quadratic form of $x^{k}$ , we can match the coefficients of quadratic, linear, and constant terms in the LHS and RHS to obtain the extended Riccati equations (9), (10), and (11). ∎

Remark 3 (Positive Definiteness).

If $D_{i}^{k}(\theta_{i})$ and $F_{ij}^{k}(\theta_{i}),\forall j\in\mathcal{N}$, are positive definite for all $k\in\mathcal{K}$, then $R_{i}^{k}(\beta^{k},\theta_{i})$ is positive definite for all $k\in\mathcal{K},\beta^{k}\in\Lambda$, because the non-negative combination of positive definite matrices in (9) preserves positive definiteness. Note that the above condition is only a sufficient condition; i.e., $D_{i}^{k}$ and $F_{ij}^{k}$ do not need to be positive definite to make $R_{i}^{k}$ positive definite, as shown in Section IV.

Remark 4 (Cognitive Coupling).

Compared with classical LQ games (e.g., Chapter $6$ in [41]), deception about players’ types results in a unique feature of cognitive coupling represented by the belief matrix in (8); i.e., each player’s action hinges not only on his own belief but also on all other players’ beliefs, as these beliefs affect their actions and, in turn, the outcome of the interaction. Thus, player $i$ can change other players’ actions by manipulating their beliefs about his type $\theta_{i}$, i.e., $l_{j}^{k},\forall j\in\mathcal{N}\setminus\{i\}$, or by making them believe that his belief $l_{i}^{k}$ about their types $\theta_{-i}$ has changed.

We introduce matrix block partitions as follows. For each type $\theta_{i}\in\Theta_{i}$, we divide $A^{k}(\theta),D_{i}^{k}(\theta_{i}),S_{i}^{k}(\theta_{i})$ into $N$-by-$N$ blocks, where the $(i,i)$ blocks are $A_{i}^{k}(\theta),\bar{D}_{i}^{k}(\theta_{i}),\bar{S}_{i}^{k}(\theta_{i})\in\mathbb{R}^{n_{i}\times n_{i}}$, respectively. The $i$-th row blocks of $N_{i}^{k}(\theta_{i}),\hat{x}_{i}^{k}(\theta_{i})$ are $\bar{N}_{i}^{k}(\theta_{i}),\bar{x}_{i}^{k}(\theta_{i})\in\mathbb{R}^{n_{i}\times 1}$, respectively. The $i$-th row block of $B_{i}^{k}(\theta_{i})$ is $\bar{B}_{i}^{k}(\theta_{i})\in\mathbb{R}^{n_{i}\times m_{i}}$.

When the system state $x^{k}$ can be represented by players’ joint states $[x_{i}^{k}]_{i\in\mathcal{N}}$ , Corollary 1 shows that the LQ game of asymmetric information degenerates to an LQ control problem if players have decoupled cost and state dynamics defined as follows.

Definition 5 (Decoupled Dynamics and Cost).

Player $i\in\mathcal{N}$ has decoupled dynamics if, for all $k\in\mathcal{K}$, $A_{i}^{k}(\theta)=\bar{A}_{i}^{k}(\theta_{i}),\forall\theta\in\Theta$, while all other elements in the $i$-th row block and the $i$-th column block of $A^{k}(\theta)$ are zero. Besides, all elements of $B_{i}^{k}(\theta_{i})$ except for the row block $\bar{B}_{i}^{k}(\theta_{i})$ are required to be zero. Player $i\in\mathcal{N}$ has a decoupled cost if, for all stages $k\in\mathcal{K}$, $F^{k}_{ij}(\theta_{i})=\mathbf{0}_{m_{i},m_{i}},\forall\theta_{i}\in\Theta_{i},j\in\mathcal{N}\setminus\{i\}$, and all elements of $D_{i}^{k}(\theta_{i})$ equal zero except for those in $\bar{D}_{i}^{k}(\theta_{i})$.

Corollary 1 (Degeneration to LQ Control).

If $x^{k}=[x_{i}^{k}]_{i\in\mathcal{N}}$ for all stages $k\in\mathcal{K}$ and player $i$ has both a decoupled cost and decoupled state dynamics, then his action under PBNE is independent of other players’ actions, types, and beliefs, i.e., $u_{i}^{*,k}=-(R_{i}^{k})^{-1}(\bar{B}_{i}^{k})^{\prime}\bar{S}_{i}^{k+1}A_{i}^{k}x_{i}^{k}-\frac{1}{2}(R_{i}^{k})^{-1}(\bar{B}_{i}^{k})^{\prime}\bar{N}_{i}^{k+1}$, where $R_{i}^{k}=F_{ii}^{k}+(\bar{B}_{i}^{k})^{\prime}\bar{S}_{i}^{k+1}\bar{B}_{i}^{k}$, $(G_{i}^{k})^{\prime}=\mathbf{I}_{n_{i}}-\bar{S}_{i}^{k+1}\bar{B}_{i}^{k}(R_{i}^{k})^{-1}(\bar{B}_{i}^{k})^{\prime}$, $\bar{S}_{i}^{k}=(A_{i}^{k})^{\prime}(G_{i}^{k})^{\prime}\bar{S}_{i}^{k+1}A_{i}^{k}+\bar{D}_{i}^{k}$, and $\bar{N}_{i}^{k}=(A_{i}^{k})^{\prime}(G_{i}^{k})^{\prime}\bar{N}_{i}^{k+1}-2\bar{D}_{i}^{k}\bar{x}^{k}_{i}$.

Proof.

We show by induction that $S_{i}^{k},N_{i}^{k},\forall k\in\mathcal{K}$, satisfy the sparsity condition that only the $(i,i)$ block of $S_{i}^{k}$ and the $i$-th row block of $N_{i}^{k}$ are nonzero. At stage $K$, $S_{i}^{K}=D_{i}^{K}$ and $N_{i}^{K}=-2D_{i}^{K}\hat{x}_{i}^{K}$ satisfy this condition. At stage $k\in\{0,\cdots,K-1\}$, if $S_{i}^{k+1},N_{i}^{k+1}$ satisfy the sparsity condition, then $\mathbf{W}^{0,k}(\beta^{k})$ becomes a block-diagonal matrix, i.e., $W^{0,k}_{ij}(\beta^{k})=\mathbf{0}_{m_{i}N_{i},m_{j}N_{j}}$ for all $j\in\mathcal{N}\setminus\{i\}$, and $\mathbf{M}_{i}^{k}(\beta^{k},\theta_{i})=-(R_{i}^{k}(\beta^{k},\theta_{i}))^{-1}$ for all $\beta^{k}\in\Lambda$. Then, $S_{i}^{k},N_{i}^{k}$ satisfy the condition by (9) and (10). ∎
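Under the conditions of Corollary 1, each player’s PBNE action reduces to a standard LQ tracking controller. The Python/NumPy sketch below implements the backward recursion for $\bar{S}_{i}^{k},\bar{N}_{i}^{k}$ and the gains in the corollary (the scalar terms are omitted because they do not affect the action), then checks numerically, on a hypothetical noise-free double-integrator player, that perturbing the first action increases the cumulative cost. The model matrices, horizon, and weights are illustrative assumptions, and the reference term $\hat{f}_{i}^{k}$ is taken to be zero; because the noise is additive with zero mean, the gains are the same with or without noise.

```python
import numpy as np


def lq_backward(A, B, D, F, xbar):
    """Backward recursion of Corollary 1 for one decoupled player.

    A[k], B[k], F[k] for k = 0..K-1; D[k], xbar[k] for k = 0..K.
    Returns gains with u^k = -K1[k] @ x^k - k2[k].
    """
    K = len(A)
    n = A[0].shape[0]
    S = D[K]                       # boundary: S^K = D^K
    N = -2.0 * D[K] @ xbar[K]      # boundary: N^K = -2 D^K xbar^K
    K1, k2 = [None] * K, [None] * K
    for k in range(K - 1, -1, -1):
        R = F[k] + B[k].T @ S @ B[k]
        RinvBt = np.linalg.solve(R, B[k].T)        # R^{-1} B'
        K1[k] = RinvBt @ S @ A[k]
        k2[k] = 0.5 * RinvBt @ N
        Gt = np.eye(n) - S @ B[k] @ RinvBt         # (G^k)'
        # Both updates use S^{k+1}, N^{k+1} on the right-hand side.
        S, N = A[k].T @ Gt @ S @ A[k] + D[k], A[k].T @ Gt @ N - 2.0 * D[k] @ xbar[k]
    return K1, k2


def rollout_cost(A, B, D, F, xbar, x0, K1, k2, first_offset=0.0):
    """Noise-free cost of the feedback law, optionally perturbing the first action."""
    x, cost = x0, 0.0
    for k in range(len(A)):
        u = -K1[k] @ x - k2[k]
        if k == 0:
            u = u + first_offset
        e = x - xbar[k]
        cost += (e.T @ D[k] @ e + u.T @ F[k] @ u).item()
        x = A[k] @ x + B[k] @ u
    e = x - xbar[-1]
    return cost + (e.T @ D[-1] @ e).item()


# Hypothetical double-integrator player steering toward position 1 in K = 20 stages.
K = 20
A = [np.array([[1.0, 0.1], [0.0, 1.0]])] * K
B = [np.array([[0.0], [0.1]])] * K
F = [np.array([[0.01]])] * K
D = [np.diag([1.0, 0.1])] * K + [10.0 * np.eye(2)]
xbar = [np.array([[1.0], [0.0]])] * (K + 1)
x0 = np.zeros((2, 1))

K1, k2 = lq_backward(A, B, D, F, xbar)
best = rollout_cost(A, B, D, F, xbar, x0, K1, k2)
```

Since $R_{i}^{k}$ is positive definite here, any nonzero perturbation of the first action followed by the optimal feedback law yields a strictly larger cost.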

III-B Intrinsic Belief Dynamics and Receding-Horizon Control

If there exists a player $i\in\mathcal{N}$ whose belief dynamics $\Gamma_{i}^{k}$ in (2) depend on intrinsic information at some stage $k\in\{0,\cdots,K-1\}$, then the equilibrium action $u_{i}^{*,k}$ is in general a nonlinear function of $x^{k}$, and the equilibrium cost $V_{i}^{k}$ is not quadratic in $x^{k}$, even under the LQ setting of (6) and (7). Besides the static cognitive coupling among the $N$ players in Remark 4, the intrinsic information of $u^{k}$ in the belief update introduces a dynamic cognitive coupling between the forward belief dynamics via (2) and the backward equilibrium computation via (5), which makes PBNE challenging to compute. To reduce the computational complexity and obtain implementable actions, we adopt a receding-horizon approach: at the current stage $k\in\{0,\cdots,K-1\}$, players compute the sequentially rational action sequence $u^{*,k:K-1}$ for all future stages under the assumption that $\beta^{\bar{k}}=\beta^{k},\forall\bar{k}\in\{k,\cdots,K-1\}$, yet only implement the current-stage action $u^{*,k}$. Then, at the new stage $k+1$, each player observes the new system state $x^{k+1}$, updates the belief to $\beta^{k+1}$, and recomputes the entire action sequence $u^{*,k+1:K-1}$ under the assumption that $\beta^{\bar{k}}=\beta^{k+1},\forall\bar{k}\in\{k+1,\cdots,K-1\}$, yet still only implements the new current-stage action $u^{*,k+1}$. Players repeat the above procedure until they reach the final stage of the interaction.

Compared with PBNE, which produces an offline plan for all future stages under all possible scenarios before the game takes place, the receding-horizon approach lets players replan their actions online at the beginning of each new stage as the interaction continues.

Although we assume that players’ beliefs at the future stages are the same as the current beliefs during the phase of equilibrium computation, players can correct and update their beliefs and actions based on the online observation of $x^{k}$ during each replanning phase. Thus, the receding-horizon approach provides a reasonable approximation of the PBNE action and is more adaptive to unexpected environmental changes of the state dynamics $f^{k}$ and cost structure $g^{k}_{i},\forall i\in\mathcal{N}$ .
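The receding-horizon loop can be sketched as follows in Python (standard library only). This is not Algorithm 1 or 2: the function `plan` is a deliberately simple, hypothetical stand-in for the frozen-belief equilibrium computation, and the scalar model, in which an opponent’s unknown type sets a drift of $\pm 1$, is an assumption for illustration. What the sketch does show is the loop structure: plan the whole remaining sequence with $\beta^{\bar{k}}=\beta^{k}$ frozen, implement only the first action, observe $x^{k+1}$, update the belief by Bayes’ rule as in (3), and replan.

```python
import math
import random

# Hypothetical toy model (not from the paper): scalar state, one controlling player,
# and an opponent whose unknown type sets a constant drift:
#   x^{k+1} = x^k + u^k + DRIFT[theta] + w^k,   w^k ~ N(0, SIGMA^2).
DRIFT = {"left": -1.0, "right": 1.0}
SIGMA = 0.3


def plan(x, belief, stages_left):
    """Hypothetical frozen-belief planner standing in for the equilibrium computation.

    Treats the current belief as fixed over the remaining horizon and returns a whole
    action sequence that cancels the expected drift, driving the predicted state to 0.
    """
    d_bar = sum(p * DRIFT[th] for th, p in belief.items())
    seq, x_pred = [], x
    for _ in range(stages_left):
        u = -(x_pred + d_bar)
        seq.append(u)
        x_pred = x_pred + u + d_bar  # noise-free prediction under the frozen belief
    return seq


def bayes(belief, x, u, x_next):
    """Bayesian update as in (3); the Gaussian normalizing constant cancels."""
    post = {th: p * math.exp(-(x_next - x - u - DRIFT[th]) ** 2 / (2 * SIGMA**2))
            for th, p in belief.items()}
    z = sum(post.values())
    return {th: p / z for th, p in post.items()}


def receding_horizon(x0, belief, true_type, K, seed=0):
    rng = random.Random(seed)
    x, traj = x0, [x0]
    for k in range(K):
        u = plan(x, belief, K - k)[0]                  # implement only the first action
        x_next = x + u + DRIFT[true_type] + rng.gauss(0.0, SIGMA)
        belief = bayes(belief, x, u, x_next)           # observe x^{k+1}, update belief
        x = x_next
        traj.append(x)
    return traj, belief


traj, final_belief = receding_horizon(5.0, {"left": 0.5, "right": 0.5}, "right", K=20)
```

Replacing `plan` with a solver for the frozen-belief game (for example, one built from the extended Riccati equations) would keep the same outer loop.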

Under the LQ specification in (6) and (7) and the Bayesian belief dynamics in (3), we summarize the computation phase and the online implementation phase in Algorithms 1 and 2, respectively. To investigate the scalability of our algorithms, we analyze their temporal and spatial complexity with respect to $N$, $K$, and $N_{i}$. To simplify the notation and enhance readability, we focus on the symmetric setting where $N_{i}=N_{0}\in\mathbb{Z}^{+},\forall i\in\mathcal{N}$. At the beginning of the interaction, i.e., $k=0$, each player $i\in\mathcal{N}$ of type $\theta_{i}\in\Theta_{i}$ needs to store the game parameters $A^{0},B_{{r}}^{0}({\theta}_{{r}}),D_{{r}}^{0}({\theta}_{{r}}),F_{{r}h}^{0}({\theta}_{{r}}),\forall{\theta}_{{r}}\in\Theta_{{r}}$, and the belief matrix $\mathbf{L}_{{r}h}^{0}$ for all ${r},h\in\mathcal{N}$, which are common knowledge. The spatial complexity of storing the game parameters and the belief matrix is $O(N^{2}N_{0})$ and $O(N^{2}N_{0}^{2})$, respectively. Note that, in general, player $i$ has coupled cognition as shown in Remark 4 and must keep track of not only his own beliefs $\mathbf{L}_{i,j}^{k},\forall j\in\mathcal{N}$, but also other players’ beliefs $\mathbf{L}_{{r},h}^{k},\forall{r}\in\mathcal{N}\setminus\{i\},h\in\mathcal{N}$, to decide his equilibrium action under deception at each stage $k$. During the $K$-stage interaction, each player $i\in\mathcal{N}$ of type $\theta_{i}\in\Theta_{i}$ observes the system state $x^{k}$ and computes his equilibrium action $u_{i}^{*,k}(\beta^{k},x^{k},\theta_{i})$ at stage $k$ based on Algorithm 1. After all players implement their equilibrium actions at stage $k$, the system state evolves to $x^{k+1}$. Based on the new state observation $x^{k+1}$, each player $i$ updates the belief matrix in (8) via (3). Since player $i$ can delete the game parameters and belief matrices of previous stages, the spatial complexity does not grow as the real-time stage index $k$ increases. Thus, our algorithm can handle interactions of long duration. All players repeat the procedure in lines $14$-$17$ of Algorithm 2 until reaching the terminal stage $k=K$.

The computational complexity of the belief matrix update in line $15$ of Algorithm 2 is $O(N_{0}^{N}N)$. For any $\beta^{k}$, computing $\mathbf{W}^{0,k}(\beta^{k})$ has complexity $O(N_{0}^{N}N)+O(N_{0}^{3}N^{2})$, where the two terms come from the belief matrix update and the matrix chain multiplication in $W_{ij}^{0,k}(\beta^{k})$, respectively. The computational complexity of $(\mathbf{W}^{0,k}(\beta^{k}))^{-1}$ and $\mathbf{W}^{1,k}(\beta^{k})$ is then $O(N_{0}^{N}N)+O(N_{0}^{3}N^{3})$ and $O(N_{0}^{N}N)+O(N_{0}^{3}N^{2})$, respectively. Given $\beta^{k}$ and $\theta_{i}$, the computational complexity of $S_{i}^{k}(\beta^{k},\theta_{i})$ in (9) is $O(N_{0}^{N}N)+O(N_{0}^{3}N^{3})+O(N_{0}^{3}N^{2})+O(N_{0}N)=O(\max{(N_{0}^{N}N,N_{0}^{3}N^{3})})$, which is driven by the computation of $\mathbf{M}_{i}^{k}(\beta^{k},\theta_{i})$ (or $(\mathbf{W}^{0,k}(\beta^{k}))^{-1}$), $\mathbf{W}^{1,k}(\beta^{k})$, and the matrix chain multiplication in (9). Similarly, $N_{i}^{k}(\beta^{k},\theta_{i})$ and $W^{2,k}(\beta^{k})$ both have computational complexity $O(N_{0}^{N}N)+O(N_{0}N)$. Therefore, player $i$’s temporal complexity at each stage $k\in\{0,1,\cdots,K-1\}$ is

$$O\big((K-k)\cdot\max\{N_{0}^{N+1}N^{2},N_{0}^{4}N^{4}\}\big).$$

The temporal complexity reaches its maximum $O(K\cdot\max{\{N_{0}^{N+1}N^{2},N_{0}^{4}N^{4}\}})$ at the initial stage $k=0$, where each player has to plan over all $K$ future stages to act optimally under deception. Since the temporal complexity decreases as the real-time stage index $k$ increases, a player who can compute the equilibrium action within the required time at the initial stage $k=0$ is guaranteed to meet the real-time requirement in all subsequent stages of the interaction. If the number of types and the number of agents are on the same scale, e.g., $N_{0}=N$, then $(N_{0}^{N+1}N^{2})/(N_{0}^{4}N^{4})\rightarrow\infty$ as $N\rightarrow\infty$, and the belief matrix update dominates the computation because each player keeps track of all players’ beliefs to obtain the equilibrium action under deception. If $N_{0}\ll N$, e.g., $N_{0}=N^{1/N}$, then $(N_{0}^{N+1}N^{2})/(N_{0}^{4}N^{4})\rightarrow 0$ as $N\rightarrow\infty$, and inverting $\mathbf{W}^{0,k}(\beta^{k})$ becomes the most time-consuming operation due to the coupling in dynamics, costs, and cognition.
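To see which of the two terms inside $\max\{N_{0}^{N+1}N^{2},N_{0}^{4}N^{4}\}$ dominates for concrete sizes, one can simply compare them; the common factor $K$ cancels. The sketch below uses exact integer arithmetic, and the function name is hypothetical.

```python
def dominant_term(N: int, N0: int) -> str:
    """Return which term of max{N0^(N+1) N^2, N0^4 N^4} is larger.

    The first term comes from the belief-matrix update, the second from
    inverting W^{0,k}(beta^k). Python ints have arbitrary precision, so
    no overflow occurs even for large N. Ties are reported as "matrix inverse".
    """
    if N < 1 or N0 < 1:
        raise ValueError("N and N0 must be positive integers")
    belief_update = N0 ** (N + 1) * N ** 2
    matrix_inverse = N0 ** 4 * N ** 4
    return "belief update" if belief_update > matrix_inverse else "matrix inverse"
```

For example, `dominant_term(8, 2)` returns `"matrix inverse"` (32768 versus 65536), whereas `dominant_term(10, 2)` returns `"belief update"` (204800 versus 160000). Since the ratio of the two terms is $N_{0}^{N-3}/N^{2}$, the belief-update term eventually dominates for any fixed integer $N_{0}\geq 2$; the matrix inverse stays dominant only when $N_{0}$ is close to $1$, as in the example $N_{0}=N^{1/N}$.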

Effective deception can prevent or delay other players from learning the deceiver’s private type. We define the criterion for successfully learning the deceiver’s type in Definition 6, and $\epsilon$-deceivability and $\epsilon$-learnability in Definition 7.

Definition 6 (Stage of Truth Revelation).

Consider two players $i,j\in\mathcal{N}$ with types $\theta_{i}$ and $\theta_{j}$, respectively. Stage $k_{i,j}^{tr}\in\mathcal{K}\cup\{K+1\}$ is said to be player $i$’s truth-revealing stage with accuracy $\delta\in(0,1]$ if it satisfies the following two conditions. (Since the belief mismatch does not reduce to $0$ in finitely many stages when the initial belief satisfies $l_{i}^{0}\in(0,1)$, the accuracy threshold must satisfy $\delta\neq 0$.)

• The bounded mismatch condition: player $i$’s belief mismatch does not exceed $\delta$ from stage $k_{i,j}^{tr}\in\mathcal{K}$ onward, i.e.,

$$1-l_{i}^{k}(\theta_{j}|h^{k},\theta_{i})\leq\delta,\quad\forall k\in\mathcal{K},\ k\geq k_{i,j}^{tr}.\qquad(16)$$

• The first-hitting-time condition: $k_{i,j}^{tr}\in\mathcal{K}$ is the first stage satisfying (16), i.e., $1-l^{k_{i,j}^{tr}-1}_{i}(\theta_{j}|h^{k_{i,j}^{tr}-1},\theta_{i})>\delta$ whenever $k_{i,j}^{tr}>1$.

If no $k_{i,j}^{tr}\in\mathcal{K}$ satisfies (16), we define $k_{i,j}^{tr}:=K+1$. If there are only two players, i.e., $N=2$, we write $k_{i,j}^{tr}$ as $k_{i}^{tr}$ without ambiguity.

Due to the deceivers’ deceptive actions and the external noise, the belief sequence may fluctuate; i.e., there can exist $k<k_{i,j}^{tr}$ such that $1-l^{k}_{i}(\theta_{j}|h^{k},\theta_{i})\leq\delta$. Thus, as Definition 6 specifies, a player should claim to have learned another player’s type only if his belief mismatch stays within $\delta$ for all remaining stages.
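Definition 6 can be evaluated on one realized belief sequence by scanning backward from the last stage: $k_{i,j}^{tr}$ is the start of the longest final run of stages whose mismatch stays within $\delta$. The sketch below assumes an indexing convention that is not fixed by the text: the candidate stages are $k=1,\dots,K$, and position $k-1$ of the input list stores the mismatch $1-l_{i}^{k}(\theta_{j}|h^{k},\theta_{i})$ at stage $k$.

```python
from typing import Sequence


def truth_revealing_stage(mismatch: Sequence[float], delta: float) -> int:
    """Return k^tr for one realization of the belief sequence.

    mismatch[k - 1] holds 1 - l_i^k(theta_j | h^k, theta_i) for k = 1, ..., K
    (this stage indexing is an assumption of the sketch).
    """
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    K = len(mismatch)
    k_tr = K + 1  # default: condition (16) never holds through the final stage
    # Walk backward; stop at the first stage (from the end) whose mismatch exceeds delta.
    for k in range(K, 0, -1):
        if mismatch[k - 1] <= delta:
            k_tr = k
        else:
            break
    return k_tr
```

With $\delta=0.1$ and mismatches `[0.6, 0.05, 0.3, 0.08, 0.04]`, the function returns `4`: the mismatch dips below $\delta$ at stage 2 but rises again at stage 3, which is exactly the kind of fluctuation that the first-hitting-time condition rules out.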

Definition 7 (Deceivability and Learnability).

Consider players $i,j\in\mathcal{N}$ with types $\theta_{i}$ and $\theta_{j}$, thresholds $\delta\in(0,1],\epsilon\in[0,1]$, and a given stage index $\tilde{k}\in\mathcal{K}\cup\{K+1\}$. Player $i$ is $\tilde{k}$-stage $\epsilon$-deceivable if the probability $\Pr(k_{i,j}^{tr}<\tilde{k})$, or equivalently $\Pr(l_{i}^{\tilde{k}}(\theta_{j}|x^{\tilde{k}},\theta_{i})>1-\delta)$, is not greater than $\epsilon$ for all $l_{i}^{0}\in(0,1)$. If this does not hold, player $j$’s type is said to be $\tilde{k}$-stage $\epsilon$-learnable by player $i$.

Since robot deception involves only a finite number of stages, the deceived robot should learn the deceiver’s type as quickly as possible, so that he has sufficient stages left to plan and to mitigate the impact of deception from the previous stages. Therefore, learnability, i.e., non-deceivability in Definition 7, requires the deceived player not only to be capable of learning the deceiver’s private information but also to learn it at a desirable rate, i.e., within $\tilde{k}$ stages. Due to the external noise, $k_{i,j}^{tr}$ is a random variable. Thus, learnability requires $\Pr(k_{i,j}^{tr}<\tilde{k})>\epsilon$; i.e., player $i$ correctly learns the type of player $j$ before stage $\tilde{k}$ with a probability greater than $\epsilon$.
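Because $k_{i,j}^{tr}$ is random, Definition 7 can be checked empirically from repeated noise realizations. The sketch below is a hypothetical Monte Carlo check rather than part of the framework: it takes sampled truth-revealing stages grouped by initial belief $l_{i}^{0}$ and tests $\Pr(k_{i,j}^{tr}<\tilde{k})\leq\epsilon$ for each group. A finite grid of initial beliefs only approximates the requirement “for all $l_{i}^{0}\in(0,1)$”.

```python
from typing import Mapping, Sequence


def empirical_prob_learned(k_tr_samples: Sequence[int], k_tilde: int) -> float:
    """Fraction of runs with k^tr < k_tilde, i.e., a Monte Carlo estimate of Pr(k^tr < k_tilde)."""
    if not k_tr_samples:
        raise ValueError("need at least one sample")
    return sum(k_tr < k_tilde for k_tr in k_tr_samples) / len(k_tr_samples)


def is_deceivable(samples_by_prior: Mapping[float, Sequence[int]], k_tilde: int, eps: float) -> bool:
    """Empirical k_tilde-stage eps-deceivability on a finite grid of initial beliefs l_i^0.

    samples_by_prior maps each tested l_i^0 to the k^tr values observed over
    independent noise realizations; every tested prior must satisfy the bound.
    """
    return all(empirical_prob_learned(s, k_tilde) <= eps for s in samples_by_prior.values())
```

If, for each tested prior, player $i$ learns before stage $\tilde{k}$ in one of four runs, e.g., `{0.3: [41, 41, 12, 41], 0.7: [41, 10, 41, 41]}` with `k_tilde=20`, he is $\tilde{k}$-stage $0.25$-deceivable on that grid but not $0.2$-deceivable, so player $j$’s type is $\tilde{k}$-stage $0.2$-learnable there.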

IV Dynamic Target Protection under Deception

We investigate a pursuit-evasion scenario with two UAVs that have decoupled linear time-invariant state dynamics, i.e., $A^{k}(\theta)=\mathbf{I}_{4},\bar{B}_{i}^{k}(\theta_{i})=[\tilde{B}_{i}(\theta_{i}),0;0,\tilde{B}_{i}(\theta_{i})]\in\mathbb{R}^{2\times 2},\forall k\in\mathcal{K}$. We use ‘she’ for UAV $1$, the pursuer, and ‘he’ for UAV $2$, the evader. UAV $i$’s state $x_{i}^{k}:=[x^{k}_{i,x},x^{k}_{i,y}]^{\prime}\in\mathbb{R}^{2\times 1}$ represents $i$’s location $(x^{k}_{i,x},x^{k}_{i,y})$ in the 2D plane, and the action $u_{i}^{k}=[u_{i,x}^{k},u_{i,y}^{k}]^{\prime}\in\mathbb{R}^{2\times 1}$ affects $i$’s speed in the $x$ and $y$ directions.

UAV $2$, the evader, selects either the harbor in ‘Normandy’ or the harbor in ‘Calais’ as his final location based on his type $\theta_{2}\in\{\theta_{2}^{g},\theta_{2}^{b}\}$. He aims to reach ‘Normandy’, located at $\gamma(\theta_{2}^{g}):=(x^{g},y^{g})$, in $K=40$ stages if his type is $\theta_{2}^{g}$, and ‘Calais’, located at $\gamma(\theta_{2}^{b}):=(x^{b},y^{b})$, if his type is $\theta_{2}^{b}$. UAV $1$, the pursuer, can emit interfering signals and aims to be close to UAV $2$ at the final stage to protect the harbor targeted by the evader, i.e., $g_{1}^{k}(x^{k},u^{k},\theta_{1})=d_{12}^{k}(\theta_{1})((x^{k}_{2,y}-x^{k}_{1,y})^{2}+(x^{k}_{2,x}-x^{k}_{1,x})^{2})+f_{11}^{k}(\theta_{1})((u_{1,x}^{k})^{2}+(u_{1,y}^{k})^{2})-f_{12}^{k}(\theta_{1})((u_{2,x}^{k})^{2}+(u_{2,y}^{k})^{2}),\forall k\in\mathcal{K}$, where $d_{12}^{k}(\theta_{1})\in\mathbb{R}_{\geq 0}$ penalizes her distance from the evader at stage $k\in\mathcal{K}$, $f_{11}^{k}(\theta_{1})\in\mathbb{R}_{\geq 0}$ penalizes her own action cost, and $f_{12}^{k}(\theta_{1})\in\mathbb{R}_{\geq 0}$ rewards her when her opponent, i.e., the evader, takes costly actions. We classify UAV $1$ into two types, $\Theta_{1}=\{\theta_{1}^{H},\theta_{1}^{L}\}$, based on her maneuverability, represented by the value of $\tilde{B}_{1}(\theta_{1})$. Since $\tilde{B}_{1}(\theta_{1}^{H})>\tilde{B}_{1}(\theta_{1}^{L})$, the pursuer of type $\theta_{1}^{H}$ obtains a higher speed under the same action $u_{1}^{k}$ and thus covers a longer distance.

The evader’s goals of deceptive target reaching and pursuit evasion are incorporated into the cost structure $g_{2}^{k}(x^{k},u^{k},\theta_{2})=d_{2,b}^{k}(\theta_{2})((x^{k}_{2,y}-y^{b})^{2}+(x^{k}_{2,x}-x^{b})^{2})+d_{2,g}^{k}(\theta_{2})((x^{k}_{2,y}-y^{g})^{2}+(x^{k}_{2,x}-x^{g})^{2})-d_{21}^{k}(\theta_{2})((x^{k}_{1,y}-x^{k}_{2,y})^{2}+(x^{k}_{1,x}-x^{k}_{2,x})^{2})+f_{22}^{k}(\theta_{2})((u_{2,x}^{k})^{2}+(u_{2,y}^{k})^{2})-f_{21}^{k}(\theta_{2})((u_{1,x}^{k})^{2}+(u_{1,y}^{k})^{2}),\forall k\in\mathcal{K}$ .

Similar to the pursuer’s cost parameters, $d_{21}^{k}(\theta_{2})\in\mathbb{R}_{\geq 0}$ represents the evader’s level of evasion determination, i.e., how strongly he wants to keep his distance from the pursuer along the trajectory. The action costs of the evader and the pursuer are regulated by $f_{22}^{k}(\theta_{2})\in\mathbb{R}_{\geq 0}$ and $f_{21}^{k}(\theta_{2})\in\mathbb{R}_{\geq 0}$, respectively. The parameters $d_{2,b}^{k}(\theta_{2})$ and $d_{2,g}^{k}(\theta_{2})$ represent the evader’s attempt to head toward ‘Calais’ and ‘Normandy’, respectively, at stage $k\in\mathcal{K}$ under type $\theta_{2}\in\Theta_{2}$. We use the ratio $d_{2,g}^{k}(\theta_{2})/d_{2,b}^{k}(\theta_{2})$ to represent the evader’s level of trajectory deception. Since the pursuer can learn the evader’s type from real-time observations of the state $x_{2}^{k}$, the evader attempts to keep his target $\epsilon_{0}$-ambiguous at all stages before the last, i.e., $|d_{2,b}^{k}(\theta_{2})/d_{2,g}^{k}(\theta_{2})-1|\leq\epsilon_{0},\forall\theta_{2},\forall k\neq K$, and to reveal his true target only at the final stage, i.e., $d_{2,g}^{K}(\theta_{2}^{b})=0$ and $d_{2,b}^{K}(\theta_{2}^{g})=0$. The evader chooses a small $\epsilon_{0}\geq 0$ and achieves maximum ambiguity when $\epsilon_{0}=0$. The two blue lines in Fig. 1(a) illustrate how the evader remains ambiguous in a cost-effective manner from two different initial locations. Instead of keeping an equal distance to both potential targets, the evader heads toward the midpoint $((x^{g}+x^{b})/2,(y^{g}+y^{b})/2)$ at the early stages to confuse the pursuer. However, the evader starts to head toward the true target at around half of the $K$ stages rather than in the last few stages, so that he can reach the target with a moderate control cost $(u^{k}_{2})^{\prime}F^{k}_{22}(\theta_{2})u^{k}_{2}$. Fig. 1(a) also shows that, for a given initial location, an evader who adopts a higher level of trajectory deception heads more toward the misleading target at the early stages.

In this case study, we suppose that the evader’s true target is Calais and let $\theta_{2}^{b}$ be his true type and $\theta_{2}^{g}$ be the misleading type.

The following two ratios capture the evader’s tradeoffs among being deceptive, effective, and evasive. On the one hand, the ratio $d_{2,b}^{k}(\theta_{2}^{b})/d_{2,b}^{K}(\theta_{2}^{b}),k\neq K$, reflects the evader’s tradeoff between applying deception along the trajectory and staying close to the true target at the final stage. Fig. 1(b) shows that as the evader focuses more on a deceptive trajectory, represented by a larger value of $d_{2,b}^{k}(\theta_{2}^{b})/d_{2,b}^{K}(\theta_{2}^{b}),k\neq K$, his trajectory remains ambiguous for more stages while his final location ends up farther from the true target. On the other hand, the ratio $d_{21}^{k}(\theta_{2}^{b})/d_{2,b}^{K}(\theta_{2}^{b}),k\neq K$, reflects the evader’s tradeoff between evasion and target reaching. As the evader focuses more on keeping his distance from the pursuer along the trajectory, he takes a bigger detour and ends farther from his true target at the final stage, as shown in Fig. 1(c).

Finally, we transform UAV $i$ ’s coupled cost $g_{i}^{k}$ into the matrix form given in Section III, i.e., $\hat{x}_{1}^{k}(\theta_{1})=\mathbf{0}_{4,1},\hat{f}_{1}^{k}(\hat{x}_{1}^{k}(\theta_{1}))=0,F^{k}_{ii}(\theta_{1})=f^{k}_{ii}(\theta_{1})\cdot\mathbf{I}_{2},F^{k}_{ij}(\theta_{1})=-f^{k}_{ij}(\theta_{1})\cdot\mathbf{I}_{2},j\neq i,D_{1}^{k}(\theta_{1})=d_{12}^{k}(\theta_{1})\cdot[1,0,-1,0;0,1,0,-1;-1,0,1,0;0,-1,0,1]$ ,

[TABLE]

$\hat{x}_{2}^{k}(\theta_{2})=\frac{1}{d_{2,b}^{k}+d_{2,g}^{k}}\cdot[d_{2,b}^{k}x^{b}+d_{2,g}^{k}x^{g}\>;\>d_{2,b}^{k}y^{b}+d_{2,g}^{k}y^{g}\>;\>d_{2,b}^{k}x^{b}+d_{2,g}^{k}x^{g}\>;\>d_{2,b}^{k}y^{b}+d_{2,g}^{k}y^{g}]$ , $\hat{f}_{2}^{k}(\hat{x}_{2}^{k}(\theta_{2}))=\frac{d_{2,b}^{k}d_{2,g}^{k}((x^{b}-x^{g})^{2}+(y^{b}-y^{g})^{2})}{d_{2,b}^{k}+d_{2,g}^{k}}$ .
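The matrix forms above can be sanity-checked numerically. The sketch below uses plain Python with hypothetical helper names and assumes the stacked state ordering $x^{k}=[x^{k}_{1,x},x^{k}_{1,y},x^{k}_{2,x},x^{k}_{2,y}]^{\prime}$. It checks two identities: the pattern matrix in $D_{1}^{k}(\theta_{1})$ turns $(x^{k})^{\prime}D_{1}^{k}x^{k}$ into $d_{12}^{k}$ times the squared inter-UAV distance, and the evader’s two target terms combine, by completing the square, into $(d_{2,b}^{k}+d_{2,g}^{k})$ times the squared distance to the weighted target $\hat{x}_{2}^{k}$ plus the constant $\hat{f}_{2}^{k}$. The second identity requires $d_{2,b}^{k}+d_{2,g}^{k}>0$.

```python
from typing import List, Sequence

# Pattern matrix multiplying d_12^k in D_1^k(theta_1); state order [x1x, x1y, x2x, x2y].
D1_PATTERN: List[List[float]] = [
    [1, 0, -1, 0],
    [0, 1, 0, -1],
    [-1, 0, 1, 0],
    [0, -1, 0, 1],
]


def quad_form(M: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """x' M x for a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[r] * M[r][c] * x[c] for r in range(n) for c in range(n))


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def target_terms_completed(x2, b, g, d_b: float, d_g: float) -> float:
    """(d_b + d_g) * ||x2 - xhat||^2 + fhat, with xhat the d-weighted average of the targets."""
    total = d_b + d_g
    if total <= 0:
        raise ValueError("d_b + d_g must be positive")
    xhat = [(d_b * bi + d_g * gi) / total for bi, gi in zip(b, g)]
    fhat = d_b * d_g * sq_dist(b, g) / total
    return total * sq_dist(x2, xhat) + fhat
```

For example, with $x_{2}=(1,-2)$, targets $(5,1)$ and $(-3,4)$, and weights $0.7$ and $1.3$, the completed-square form agrees with the direct sum $0.7\cdot 25+1.3\cdot 52=85.1$ up to floating-point rounding.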

IV-A Deceptive Evader with Decoupled Cost Structure

We first investigate the scenario where the evader has a decoupled cost structure defined in Definition 5, i.e., $d^{k}_{21}(\theta_{2})=0,\forall\theta_{2}\in\Theta_{2},\forall k\in\mathcal{K}$. (Supplementary downloadable materials provided by the authors are available at http://ieeexplore.ieee.org, including a video demo of the two UAVs’ trajectories and belief updates under the decoupled structure.) According to Corollary 1, the evader’s trajectory is then independent of the pursuer’s action, type, and belief. Fig. 2 visualizes the pursuer’s trajectories. Although the pursuer only aims to be close to the evader at the final stage, she also takes proactive actions in the previous stages to be cost-efficient. If the pursuer knows the evader’s type, she can head toward the true target directly and is not misled by the evader’s trajectory ambiguity at the early stages, as illustrated by the black dashed line in Fig. 2. If the evader’s type is private, a larger initial belief mismatch $1-l_{1}^{0}(\theta_{2}^{b}|x^{0},\theta_{1}^{H})$ makes the pursuer head more toward the misleading target at the early stages, as illustrated by the three solid lines in Fig. 2. However, owing to the pursuer’s online learning, which is compatible, efficient, and robust as shown in Section IV-A1, she manages to approach the evader at the final stage regardless of her initial belief mismatch. Fig. 3 shows the pursuer’s belief over the $K$ stages. The evader’s ambiguous trajectory causes belief fluctuations at the early stages, yet the pursuer quickly reduces the belief mismatch once the evader starts to head toward the true target. After the pursuer has corrected her initial belief mismatch at around stage $k=16$, she heads toward the true target in a cost-efficient way; i.e., she attempts to maintain uniform linear motion under the external noise, as shown in the upper right region of Fig. 2.

IV-A1 Finite-Horizon Analysis of Bayesian Update

In this subsection, we illustrate the compatibility, efficiency, and robustness of the finite-horizon Bayesian update in (3) in reducing the initial belief mismatch. The pursuer has high maneuverability and the evader’s true type is $\theta_{2}^{b}$. Define the likelihood functions of $\theta_{2}^{b}$ and $\theta_{2}^{g}$ as $a^{k}:=\Pr(x^{k+1}|\theta_{2}^{b},x^{k},\theta_{1}^{H})$ and $c^{k}:=\Pr(x^{k+1}|\theta_{2}^{g},x^{k},\theta_{1}^{H})$, respectively. Since the noise $w^{k}$ takes values in the whole space $\mathbb{R}^{n\times 1}$, $a^{k}$ and $c^{k}$ are positive. With an initial belief $l_{1}^{0}\in(0,1)$ and a finite likelihood ratio $e^{k}:=c^{k}/a^{k}\in(0,\infty)$, we can represent (3) in the following form, which has three properties:

$$l_{1}^{k+1}=\frac{a^{k}l_{1}^{k}}{a^{k}l_{1}^{k}+c^{k}(1-l_{1}^{k})}=\frac{l_{1}^{k}}{l_{1}^{k}+(1-l_{1}^{k})e^{k}}=\frac{l_{1}^{0}}{l_{1}^{0}+(1-l_{1}^{0})\prod_{\bar{k}=0}^{k}e^{\bar{k}}},$$

where $l_{1}^{k}$ abbreviates $l_{1}^{k}(\theta_{2}^{b}|x^{k},\theta_{1}^{H})$.

1. (Compatibility): For all $l_{1}^{k}\in(0,1)$, the belief update at stage $k$ is compatible with the evidence represented by the ratio $e^{k}$. In particular, if $e^{k}<1$, then $l_{1}^{k+1}>l_{1}^{k}$; if $e^{k}>1$, then $l_{1}^{k+1}<l_{1}^{k}$; if $e^{k}=1$, then $l_{1}^{k+1}=l_{1}^{k}$.

2. (Efficiency): If the evidence from the state observation $x^{k+1}$ indicates that the type is more likely to be the true type $\theta_{2}^{b}$, i.e., $e^{k}<1$, then the function $l_{1}^{k+1}/l_{1}^{k}=1/(l_{1}^{k}+(1-l_{1}^{k})e^{k})$ at stage $k$ is monotonically decreasing in $l_{1}^{k}$. If the evidence indicates that the type is more likely to be the misleading type $\theta_{2}^{g}$, i.e., $e^{k}>1$, then $l_{1}^{k+1}/l_{1}^{k}$ is monotonically increasing in $l_{1}^{k}$.

3. (Robustness): The order of the evidence sequence $e^{\bar{k}},\bar{k}=0,\cdots,k$, has no impact on the belief $l_{1}^{k+1}$.

Property 1 shows that although the external noise can make the belief update fluctuate, the belief mismatch $1-l_{1}^{k}$ decreases whenever $e^{k}<1$, regardless of the prior belief $l_{1}^{k}\in(0,1)$. Property 2 shows the efficiency of the belief update: the belief changes more, in relative terms, under a larger belief mismatch, which results in a quick correction. Property 3 shows the robustness of the belief update: an erroneous update caused by heavy noise can be corrected in later stages when the noise fades.
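The three properties can be checked numerically with the two-type update written in terms of the likelihood ratio $e^{k}$. The sketch below is illustrative; the function names are hypothetical and the evidence values are arbitrary rather than generated from the UAV dynamics.

```python
import math
from typing import Iterable


def bayes_step(l: float, e: float) -> float:
    """One stage of the two-type update: l^{k+1} = l^k / (l^k + (1 - l^k) e^k)."""
    if not (0.0 <= l <= 1.0) or e <= 0.0:
        raise ValueError("need l in [0, 1] and a positive likelihood ratio e")
    return l / (l + (1.0 - l) * e)


def run_updates(l0: float, evidence: Iterable[float]) -> float:
    """Apply the update sequentially for e^0, ..., e^k and return l^{k+1}."""
    l = l0
    for e in evidence:
        l = bayes_step(l, e)
    return l


def closed_form(l0: float, evidence: Iterable[float]) -> float:
    """l^{k+1} = l^0 / (l^0 + (1 - l^0) * prod(e)); depends on the evidence only through its product."""
    return l0 / (l0 + (1.0 - l0) * math.prod(evidence))
```

Because the belief depends on the evidence only through $\prod_{\bar{k}}e^{\bar{k}}$, reversing or shuffling the evidence sequence leaves $l_{1}^{k+1}$ unchanged (robustness); a single update with $e^{k}<1$ raises the belief (compatibility); and for $e^{k}=0.5$ the ratio $l_{1}^{k+1}/l_{1}^{k}$ is about $1.67$ at $l_{1}^{k}=0.2$ but only about $1.11$ at $l_{1}^{k}=0.8$ (efficiency).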

IV-A2 Comparison with Heuristic Policies

We compare the proposed control policy of the pursuer with two heuristic ones to demonstrate its efficacy in counter-deception. (The supplementary materials include a video demo that compares the trajectory and performance of the proposed policy with those of the two heuristic policies.) The first heuristic policy is to repeat the evader’s trajectory with a one-stage delay; i.e., the pursuer applies the action such that $x_{1}^{k+1}=x_{2}^{k},\forall k\in\mathcal{K}\setminus\{K\}$. The pursuer does not need to apply Bayesian learning, and we call this policy direct following. The second heuristic policy is for the pursuer to stay at her initial location until her truth-revealing stage $k_{1}^{tr}$ and then head toward the evader’s expected final-stage location in the remaining stages. The second policy is conservative because the pursuer does not take proactive actions until she identifies the evader’s type.

Let player $i$’s ex-post cumulative cost $\hat{V}_{i}^{0:k}:=\sum_{h=0}^{k}g_{i}^{h},\forall k\in\mathcal{K}$, serve as a real-time evaluation of the online algorithm. Although the pursuer manages to stay close to the evader at the final stage under both heuristic policies, Fig. 4 shows that both heuristic policies are more costly than the proposed equilibrium strategy in the long run.
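As a sketch of the direct-following heuristic and of the ex-post cumulative cost, assume the noise-free pursuer dynamics $x_{1}^{k+1}=x_{1}^{k}+\tilde{B}_{1}(\theta_{1})u_{1}^{k}$, which follow from $A^{k}=\mathbf{I}_{4}$ and the block-diagonal $\bar{B}_{1}^{k}$ above once the noise is dropped. The function names are hypothetical, and `action_cost` covers only the $f_{11}^{k}\|u_{1}^{k}\|^{2}$ part of $g_{1}^{k}$.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def direct_following_action(x1: Sequence[float], x2: Sequence[float], b1: float) -> Tuple[float, ...]:
    """Action u_1^k giving x_1^{k+1} = x_2^k under x_1^{k+1} = x_1^k + b1 * u_1^k (noise ignored)."""
    if b1 == 0:
        raise ValueError("maneuverability b1 must be nonzero")
    return tuple((target - pos) / b1 for pos, target in zip(x1, x2))


def action_cost(u: Sequence[float], f11: float) -> float:
    """The f_11 * ||u_1||^2 part of the pursuer's stage cost g_1^k."""
    return f11 * sum(c * c for c in u)


def ex_post_cumulative(stage_costs: Sequence[float]) -> List[float]:
    """[V^{0:0}, V^{0:1}, ...] with V^{0:k} = sum of g^h over h = 0..k."""
    return list(accumulate(stage_costs))
```

Because the direct-following action scales with $1/\tilde{B}_{1}$, a less maneuverable pursuer needs proportionally larger actions to close the same gap, so the quadratic action cost of this heuristic grows with $1/\tilde{B}_{1}^{2}$.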

The conservative policy avoids potential trajectory deviations under deception but leaves the pursuer fewer stages for planning to achieve the capture goal. We visualize the accumulation of the pursuer’s cost in Fig. 4(c). The red lines show that the pursuer who adopts the conservative policy incurs no action cost before the truth-revealing stage $k_{1}^{tr}$, i.e., $(u^{k}_{1})^{\prime}F^{k}_{11}(\theta_{1})u^{k}_{1}=0,\forall k\leq k_{1}^{tr}$, but huge costs in the remaining stages to fulfill her capture goal. The total cumulative cost $\hat{V}_{1}^{0:K}$ at the final stage increases exponentially with $k_{1}^{tr}$, as shown in Fig. 4(b). The black line in Fig. 4(c) illustrates the accumulation of $\hat{V}_{1}^{0:k}$ when the pursuer directly follows the evader’s trajectory. Only under extreme deception scenarios where $k_{1}^{tr}>34$ does the direct-following policy result in a lower cost than the conservative policy. Since the initial belief $l_{1}^{0}$ affects both the truth-revealing stage and the proposed policy, we plot $\hat{V}_{1}^{0:K}$ versus $l_{1}^{0}$ under the conservative policy and the proposed policy in Fig. 4(a). When there is no belief mismatch, i.e., $l_{1}^{0}(\theta_{2}^{b}|x^{0},\theta_{1}^{H})=1$, we have $k_{1}^{tr}=1$ and the conservative policy is equivalent to the proposed policy. As the belief mismatch increases, the cost $\hat{V}_{1}^{0:K}$ under the proposed policy (resp. the conservative policy) increases due to the larger deviation along the $x$-axis (resp. the larger $k_{1}^{tr}$). The proposed policy always results in a lower cost $\hat{V}_{1}^{0:K}$ than the conservative policy. The results in Fig. 4 lead to two principles for how the pursuer should behave under deception. First, Bayesian learning is a more effective countermeasure than directly following the evader’s deceptive trajectory. Second, if learning the evader’s type takes a long time, the pursuer is better off acting proactively based on her current belief than delaying actions until the truth-revealing stage.

IV-B Dynamic Game for Deception and Counter-Deception

In this section, the evader has a coupled cost structure defined in Definition 5, and his level of evasion determination increases at a constant rate $\alpha>0$; i.e., $d^{k}_{21}(\theta_{2})=\alpha k,\forall\theta_{2}\in\Theta_{2},\forall k\in\mathcal{K}$. (A video demo of the two UAVs’ real-time trajectories and belief updates under the coupled structure is included in the supplementary materials.) The evader deceives the pursuer by hiding his true target. The pursuer can adopt the following two countermeasures to reduce her cost under the evader’s deception. Section IV-B1 investigates the effectiveness of adaptive learning: the pursuer manages to approach the true target at the final stage by updating her belief and acting accordingly based on real-time trajectory observations. Section IV-B2 further allows the pursuer to introduce additional deception, i.e., to obfuscate her maneuverability, to counteract the evader’s information advantage and the impact of his deception.

IV-B1 Pursuer with a Public Type

When the pursuer’s type is common knowledge, we plot both UAVs’ trajectories under two initial beliefs and two types of pursuers in Fig. 5.

The solid lines show that the evader with the coupled cost detours to stay farther from the pursuer. The initial belief mismatch causes a deviation along the $x$-axis for both high- and low-maneuverability pursuers, shown in red and blue, respectively. However, due to the evader’s coupled cost structure, the deviation is smaller in magnitude and shorter in duration than the one represented by the red line in Fig. 2. The pursuer with high maneuverability stays closer to the evader at the final stage.

IV-B2 Deception to Counteract Deception

When the pursuer’s type is also private, Fig. 6 shows that she can manipulate the evader’s initial belief $l_{2}^{0}$ to obtain a smaller $k_{1}^{tr}$ and a belief update with less fluctuation. The red line with stars is the same as the one in Fig. 3. It shows that the pursuer’s belief learning is slower and fluctuates more when she interacts with the evader who has a decoupled cost. The reason is that her manipulation of the initial belief $l_{2}^{0}$ does not affect the evader’s decision making as shown in Corollary 1.

A comparison between Fig. 6(a) and Fig. 6(b) shows that it is beneficial for a low-maneuverability pursuer to disguise as a high-maneuverability pursuer but not vice versa. Thus, introducing additional deception to counteract existing deception is not always effective.

IV-C Multi-Dimensional Deception Metrics

The impact of the evader’s deception can be measured by metrics such as the endpoint distance $x_{2}^{fd}:=||x_{2}^{K}-\gamma(\theta_{2})||_{2}$ between the evader and the true target, the endpoint distance $x_{1}^{fd}:=||x_{2}^{K}-x_{1}^{K}||_{2}$ between two UAVs, both UAVs’ truth-revealing stages $k_{i}^{tr}$ , and their ex-post cumulative costs $\hat{V}_{i}^{0:k},\forall k\in\mathcal{K}$ . In this pursuit-evasion case study, we define $\epsilon$ -reachability and $\epsilon$ -capturability in Definition 8. Although $x_{i}^{fd},\forall i\in\{1,2\}$ , is a random variable, we can obtain a good estimate of the reachability and capturability due to the negligible variance of $x_{i}^{fd}$ as shown in Fig. 7(a) and Fig. 8(a).

Definition 8 (Reachability and Capturability).

Consider the proposed pursuit-evasion scenario with a given $\epsilon\geq 0$ , a threshold $\bar{x}^{fd}\geq 0$ , and all initial beliefs $l_{i}^{0}\in(0,1)$ . The target is said to be $\epsilon$ -reachable if $\Pr(x_{2}^{fd}\geq\bar{x}^{fd})\leq\epsilon$ . The evader is said to be $\epsilon$ -capturable if $\Pr(x_{1}^{fd}\geq\bar{x}^{fd})\leq\epsilon$ .
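Both notions in Definition 8 are tail probabilities of an endpoint distance, so they can be estimated from simulated final positions. The sketch below is a hypothetical empirical check for one initial belief; since the definition quantifies over all $l_{i}^{0}\in(0,1)$, one would repeat it on a grid of initial beliefs in practice.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def tail_fraction(distances: Sequence[float], x_bar: float) -> float:
    """Empirical Pr(x^fd >= x_bar) over simulated runs."""
    if not distances:
        raise ValueError("need at least one sample")
    return sum(d >= x_bar for d in distances) / len(distances)


def is_eps_reachable(evader_ends: Sequence[Point], target: Point, x_bar: float, eps: float) -> bool:
    """Uses x_2^fd = ||x_2^K - gamma(theta_2)||_2 for each run."""
    return tail_fraction([math.dist(p, target) for p in evader_ends], x_bar) <= eps


def is_eps_capturable(evader_ends: Sequence[Point], pursuer_ends: Sequence[Point],
                      x_bar: float, eps: float) -> bool:
    """Uses x_1^fd = ||x_2^K - x_1^K||_2 for each paired run."""
    if len(evader_ends) != len(pursuer_ends):
        raise ValueError("runs must be paired")
    distances = [math.dist(p, q) for p, q in zip(evader_ends, pursuer_ends)]
    return tail_fraction(distances, x_bar) <= eps
```

With three runs whose evader endpoints lie $0$, $0.2$, and $1.0$ away from the target and $\bar{x}^{fd}=0.5$, one run in three exceeds the threshold, so the target is $\epsilon$-reachable for $\epsilon=0.34$ but not for $\epsilon=0.3$.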

In Section IV-C1, we investigate how the evader can manipulate the pursuer’s initial belief $l_{1}^{0}(\theta_{2}^{b}|x^{0},\theta_{1}^{H})$ to influence the deception. In Section IV-C2, we investigate how the pursuer’s maneuverability plays a role in deception. In both sections, the evader has a coupled cost structure. The pursuer either applies the Bayesian update or not, which is denoted by blue and red lines, respectively, in both Fig. 7 and Fig. 8. In Section IV-C3, we study other metrics, such as deceivability, distinguishability, and PoD.

IV-C1 The Impact of the Evader’s Belief Manipulation

Both UAVs determine their initial beliefs based on the intelligence collected before their interactions. By falsifying the pursuer’s intelligence, the evader can manipulate the pursuer’s initial belief $l_{1}^{0}$ and further influence the deception as shown in Fig. 7.

On the $x$-axis, an initial belief $l_{1}^{0}(\theta_{2}^{b}|x^{0},\theta_{1}^{H})$ closer to $1$ indicates a smaller belief mismatch. Fig. 7(a) shows that the pursuer’s distance to the evader at the final stage decreases as the belief mismatch decreases, regardless of whether Bayesian learning is applied. However, the initial belief manipulation has much less influence on the endpoint distance $x_{1}^{fd}$ when Bayesian learning is applied. Fig. 7(b) shows that, for each realization of the noise sequence $w^{k}$, the pursuer’s truth-revealing stage steps down as the belief mismatch decreases when the Bayesian update is applied. Fig. 7(c) illustrates the pursuer’s ex-post cumulative costs $\hat{V}_{1}^{0:K}$ and $\hat{V}_{1}^{0:K-1}$ at the last and the second-to-last stages, respectively. Without the Bayesian update, the evader’s deception significantly increases the pursuer’s cost from the second-to-last stage to the last stage due to the large endpoint distance $x_{1}^{fd}$. The red lines show that the cost increase is larger under a larger belief mismatch. Fig. 7(d) illustrates the evader’s ex-post cumulative cost at the last stage. If the pursuer does not apply Bayesian learning, the evader can decrease his cost by increasing the pursuer’s belief mismatch. If the pursuer applies Bayesian learning, the evader’s cost increases slightly as the pursuer’s belief mismatch increases. When the belief mismatch is small (i.e., $1-l_{1}^{0}\in(0,0.35)$), we observe a win-win situation; i.e., Bayesian learning reduces not only the pursuer’s ex-post cumulative cost but also the evader’s.

IV-C2 The Impact of the Pursuer’s Maneuverability

The pursuer’s maneuverability can also affect deception as shown in Fig. 8.

The pursuer has an initial belief $l_{1}^{0}(\theta_{2}^{b}|x^{0},\theta_{1}^{H})=0.5$, and the evader knows the pursuer’s type. Fig. 8(a) illustrates that the pursuer’s distance to the evader at the final stage decreases exponentially as her maneuverability increases. Fig. 8(b) demonstrates that increasing the maneuverability decreases the pursuer’s and increases the evader’s ex-post cumulative costs at the final stage. The variance grows as the maneuverability decreases because the pursuer’s trajectory becomes more strongly affected by the external noise. In both figures, we observe a diminishing marginal effect; i.e., the rates of change of both the endpoint distance $x_{1}^{fd}$ and the cost $\hat{V}_{i}^{0:K}$ decrease as the maneuverability increases. Thus, we conclude that higher maneuverability improves the pursuer’s performance under the evader’s deception, as measured by the distance $x_{1}^{fd}$ and the cost $\hat{V}_{1}^{0:K}$, and that the improvement rate is higher at low maneuverability.

IV-C3 Deceivability, Distinguishability, and PoD

Deceivability, defined in Definition 7, is closely related to the distinguishability among different types. In this case study, a larger distance between the targets, i.e., $||\gamma(\theta_{2}^{g})-\gamma(\theta_{2}^{b})||_{2}$, makes it easier for the pursuer to distinguish between evaders of type $\theta_{2}^{b}$ and type $\theta_{2}^{g}$. A larger maneuverability difference $|\tilde{B}_{1}(\theta_{1}^{H})-\tilde{B}_{1}(\theta_{1}^{L})|$ makes it easier for the evader to distinguish between pursuers of type $\theta_{1}^{H}$ and type $\theta_{1}^{L}$. We visualize the two UAVs’ truth-revealing stages $k_{i}^{tr}$ versus the distance between the targets and the maneuverability difference in Fig. 9. The evader has a coupled cost, and both players’ initial belief mismatches are $0.5$. The dashed black line indicates $\tilde{B}_{1}(\theta_{1}^{L})=0.3$.

When the maneuverability difference is negligible, i.e., $\tilde{B}_{1}(\theta_{1}^{H})\in(0.26,0.36)$, the pursuer’s type cannot be learned correctly in $K$ stages; i.e., the pursuer is $(K+1)$-stage $0$-deceivable. When the maneuverability difference is small, i.e., $\tilde{B}_{1}(\theta_{1}^{H})\in(0.1,0.5)$, yet not negligible, i.e., $\tilde{B}_{1}(\theta_{1}^{H})\notin(0.26,0.36)$, the variance of $k_{2}^{tr}$ is large.

Let $\theta_{2}=\theta_{2}^{b}$ be common knowledge and assume that the evader’s belief conforms to the prior distribution of the pursuer’s type at all stages, i.e., $l_{2}^{k}(\theta_{1}|h^{k},\theta_{2}^{b})=\Xi_{1}(\theta_{1}),\forall\theta_{1}\in\Theta_{1},\forall k\in\mathcal{K}$. Then, Fig. 10 illustrates how the prior distribution of the pursuer’s type affects the value of PoD under three scenarios:

• $\eta_{1}=1$, i.e., the central planner only evaluates UAV $1$’s performance under deception.

• $\eta_{1}=0$, i.e., the central planner only evaluates UAV $2$’s performance under deception.

• $\eta_{1}=0.5$, i.e., the central planner evaluates the average performance of the two UAVs under deception.

When the pursuer’s type is also common knowledge, i.e., $\Xi_{1}(\theta_{1}^{H})=0$ (the pursuer has type $\theta_{1}^{L}$) or $\Xi_{1}(\theta_{1}^{H})=1$ (the pursuer has type $\theta_{1}^{H}$), the game is of complete information and the value of PoD equals $1$. Since PoD varies continuously over $\Xi_{1}(\theta_{1}^{H})\in[0,1]$ and equals $1$ at both endpoints for all feasible $\eta_{1}$, we refer to the plots in Fig. 10 as jump rope plots. They corroborate that PoD can be greater than $1$; i.e., deception among players may benefit not only the deceiver but also the deceivee.

V Conclusion and Future Work

We have investigated a novel class of rational robot deception problems in which intelligent robots hide their heterogeneous private information to achieve their objectives over finite stages at minimum cost. We have proposed an $N$-player dynamic game framework to quantify the impact of deception and to design long-term optimal actions for deception and counter-deception. Robots form their own initial beliefs about others' private information and update these beliefs at each stage based on extrinsic or intrinsic information. Because it satisfies sequential rationality and belief consistency, the perfect Bayesian Nash equilibrium can be used to predict the $N$ robots' actions and costs over the $K$ stages. We have studied a class of linear-quadratic games with extrinsic belief dynamics and obtained a unique affine state-feedback control policy together with a set of extended Riccati equations. The cognitive coupling resulting from the deception of types is a distinct feature of rational deception: each player's action hinges not only on his own belief but also on all other players' beliefs. The concepts of deceivability, distinguishability, and reachability have been defined to characterize the fundamental limits of deception, while the price of deception serves as a crucial evaluation and design metric.
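The extended Riccati equations of Theorem 1 are not reproduced in this section. As a point of reference only, the following minimal sketch implements the standard finite-horizon discrete-time LQ recursion, the single-agent case to which, as the title of Corollary 1 (Degeneration to LQ Control) indicates, the game reduces under decoupled dynamics and cost. It regulates the state to the origin, so the resulting policy is linear; the affine policy mentioned above arises when the cost penalizes deviation from a target, which this sketch omits. The function name `lq_riccati` and the double-integrator example are hypothetical illustrations.

```python
import numpy as np


def lq_riccati(A, B, Q, R, Qf, K):
    """Finite-horizon discrete-time LQ regulator via backward Riccati recursion.

    Dynamics x_{k+1} = A x_k + B u_k; stage cost x'Qx + u'Ru; terminal cost x'Qf x.
    Returns feedback gains G_k (so that u_k = -G_k x_k) and cost-to-go matrices P_k.
    """
    P = [None] * (K + 1)
    gains = [None] * K
    P[K] = Qf
    for k in range(K - 1, -1, -1):
        Pn = P[k + 1]
        S = R + B.T @ Pn @ B
        # Solve S G = B' Pn A instead of forming an explicit inverse.
        G = np.linalg.solve(S, B.T @ Pn @ A)
        gains[k] = G
        P[k] = Q + A.T @ Pn @ (A - B @ G)
    return gains, P


# Usage: a double integrator (position, velocity) regulated to the origin over 20 stages.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
gains, P = lq_riccati(A, B, np.eye(2), np.array([[1.0]]), np.eye(2), 20)
x = np.array([5.0, 0.0])
for G in gains:
    x = A @ x + B @ (-G @ x)  # apply the time-varying feedback u_k = -G_k x_k
print(np.round(x, 3))
```

The gains are computed once, backward in time, and then applied forward as time-varying state feedback; in the game setting, each player's gains would additionally depend on the belief terms that create the cognitive coupling described above.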

We have investigated a target protection problem in which the evader aims to deceptively reach the true target and the pursuer keeps her maneuverability as private information. The pursuer achieves a lower ex-post cumulative cost under the proposed policy than under the direct-following and conservative policies. We have proposed multi-dimensional metrics, such as the stage of truth revelation and the endpoint distance, to measure the impact of deception throughout the stages. We have concluded that Bayesian learning can largely reduce the impact of initial belief manipulation and sometimes results in a win-win situation. Increasing the pursuer's maneuverability can also reduce the endpoint distance and her ex-post cumulative cost, although the effect is marginal. A robot is more deceivable, i.e., less learnable, when its potential types are less distinguishable. We have also found that introducing additional deception to counteract existing deception is not always effective. Finally, deception among multiple players may benefit not only the deceiver but also the deceivee.
