Centralized Versus Decentralized Detection of Attacks in Stochastic Interconnected Systems

Rajasekhar Anguluri, Vaibhav Katewa, and Fabio Pasqualetti

arXiv:1903.10109 · math.OC · March 26, 2019 · IEEE Trans. Autom. Control.

TL;DR

This paper compares centralized and decentralized attack detectors for stochastic interconnected systems, gives conditions under which the decentralized detector outperforms the centralized one, and designs attacks that maximally degrade system performance while maintaining a pre-specified degree of detectability.

Contribution

It characterizes the performance of both detection schemes, showing the surprising potential for decentralized detectors to outperform centralized ones under certain conditions.

Findings

01

Decentralized detectors can outperform centralized detectors depending on system parameters.

02

A method to design attacks that maximally degrade system performance while maintaining a pre-specified degree of detectability.

03

Validation through numerical studies on an electric power system.

Equations

$$\begin{aligned}x_{i}(k+1)&=A_{ii}x_{i}(k)+B_{i}u_{i}(k)+w_{i}(k),\\ y_{i}(k)&=C_{i}x_{i}(k)+v_{i}(k),\end{aligned}$$

$$x_{i}(k+1)=A_{ii}x_{i}(k)+B_{i}u_{i}(k)+B_{i}^{a}u_{i}^{a}(k)+w_{i}(k),$$

$$\begin{aligned}x(k+1)&=Ax(k)+B^{a}u^{a}(k)+w(k),\\ y(k)&=Cx(k)+v(k),\end{aligned}$$

$$A=\begin{bmatrix}A_{11}&\cdots&A_{1N}\\ \vdots&\ddots&\vdots\\ A_{N1}&\cdots&A_{NN}\end{bmatrix},\qquad B^{a}=\begin{bmatrix}B_{1}^{a}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&B_{N}^{a}\end{bmatrix},$$

$$Y_{i}=\begin{bmatrix}y_{i}^{\mathsf{T}}(1)&y_{i}^{\mathsf{T}}(2)&\cdots&y_{i}^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}},$$

$$Y_{c}=\begin{bmatrix}y^{\mathsf{T}}(1)&y^{\mathsf{T}}(2)&\cdots&y^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}},$$

$$\mathcal{F}_{i}^{(u)}=\begin{bmatrix}C_{i}B_{i}&\cdots&0\\ \vdots&\ddots&\vdots\\ C_{i}A_{ii}^{T-1}B_{i}&\cdots&C_{i}B_{i}\end{bmatrix},\qquad \mathcal{F}_{i}^{(w)}=\begin{bmatrix}C_{i}&\cdots&0\\ \vdots&\ddots&\vdots\\ C_{i}A_{ii}^{T-1}&\cdots&C_{i}\end{bmatrix}.$$

$$\begin{aligned}\widetilde{Y}_{i}&=N_{i}Y_{i}=N_{i}\left[\mathcal{F}_{i}^{(a)}U_{i}^{a}+\mathcal{F}_{i}^{(w)}W_{i}+V_{i}\right],\\ \widetilde{Y}_{c}&=N_{c}Y_{c}=N_{c}\left[\mathcal{F}_{c}^{(a)}U^{a}+\mathcal{F}_{c}^{(w)}W+V\right],\end{aligned}$$

$$\widetilde{Y}_{i}\sim\mathcal{N}(\beta_{i},\Sigma_{i})\ \text{ for all } i\in\{1,\dots,N\},\quad\text{and}\quad \widetilde{Y}_{c}\sim\mathcal{N}(\beta_{c},\Sigma_{c}),$$

$$\begin{aligned}\beta_{i}&=N_{i}\mathcal{F}_{i}^{(a)}U_{i}^{a},\\ \beta_{c}&=N_{c}\mathcal{F}_{c}^{(a)}U^{a},\\ \Sigma_{i}&=N_{i}\left[\mathcal{F}_{i}^{(w)}\left(I_{T}\otimes\Sigma_{w_{i}}\right)(\mathcal{F}_{i}^{(w)})^{\mathsf{T}}+\left(I_{T}\otimes\Sigma_{v_{i}}\right)\right]N_{i}^{\mathsf{T}},\\ \Sigma_{c}&=N_{c}\left[\mathcal{F}_{c}^{(w)}\left(I_{T}\otimes\Sigma_{w}\right)(\mathcal{F}_{c}^{(w)})^{\mathsf{T}}+\left(I_{T}\otimes\Sigma_{v}\right)\right]N_{c}^{\mathsf{T}}.\end{aligned}$$

$$\Lambda_{i}\triangleq\widetilde{Y}_{i}^{\mathsf{T}}\Sigma_{i}^{-1}\widetilde{Y}_{i}\;\underset{H_{0}}{\overset{H_{1}}{\gtrless}}\;\tau_{i},$$

$$P_{i}^{F}=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i}\mid H_{0}\right]\quad\text{and}\quad P_{i}^{D}=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i}\mid H_{1}\right].$$

$$\Lambda_{c}\triangleq\widetilde{Y}_{c}^{\mathsf{T}}\Sigma_{c}^{-1}\widetilde{Y}_{c}\;\underset{H_{0}}{\overset{H_{1}}{\gtrless}}\;\tau_{c},$$

$$\begin{aligned}P_{i}^{F}&=Q(\tau_{i};p_{i},0),&P_{i}^{D}&=Q(\tau_{i};p_{i},\lambda_{i}),\quad\text{and}\\ P_{c}^{F}&=Q(\tau_{c};p_{c},0),&P_{c}^{D}&=Q(\tau_{c};p_{c},\lambda_{c}),\end{aligned}$$

$$\begin{aligned}p_{i}&=\mathrm{Rank}(\Sigma_{i}),&p_{c}&=\mathrm{Rank}(\Sigma_{c}),\\ \lambda_{i}&=(U_{i}^{a})^{\mathsf{T}}M_{i}(U_{i}^{a}),&\lambda_{c}&=(U^{a})^{\mathsf{T}}M_{c}(U^{a}),\end{aligned}$$

$$\begin{aligned}M_{i}&=(N_{i}\mathcal{F}_{i}^{(a)})^{\mathsf{T}}\Sigma_{i}^{-1}(N_{i}\mathcal{F}_{i}^{(a)}),\\ M_{c}&=(N_{c}\mathcal{F}_{c}^{(a)})^{\mathsf{T}}\Sigma_{c}^{-1}(N_{c}\mathcal{F}_{c}^{(a)}).\end{aligned}$$

$$\begin{aligned}P_{d}^{F}&=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i},\ \text{for some } i\in\{1,\dots,N\}\mid H_{0}\right],\\ P_{d}^{D}&=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i},\ \text{for some } i\in\{1,\dots,N\}\mid H_{1}\right],\end{aligned}$$

$$P_{d}^{F}=1-\prod_{i=1}^{N}\left(1-P_{i}^{F}\right),\quad\text{and}\quad P_{d}^{D}=1-\prod_{i=1}^{N}\left(1-P_{i}^{D}\right).$$

$$P_{c}^{F}=1-\prod_{i=1}^{N}\left(1-P_{i}^{F}\right).$$

$$\tau_{c}\leq p_{c}+\lambda_{c}-\sqrt{4N(p_{c}+2\lambda_{c})\ln\left(\frac{1}{1-P^{D}_{\max}}\right)},$$

$$\tau_{c}\geq p_{c}+\lambda_{c}+\sqrt{4(p_{c}+2\lambda_{c})\ln\left(\frac{1}{1-(1-P^{D}_{\min})^{N}}\right)}+2\ln\left(\frac{1}{1-(1-P^{D}_{\min})^{N}}\right),$$

$$\mu_{c}\triangleq\mathbb{E}[\Lambda_{c}]=\lambda_{c}+p_{c},\quad\text{and}\quad \sigma_{c}\triangleq\mathrm{SD}[\Lambda_{c}]=\sqrt{2(p_{c}+2\lambda_{c})}.$$

$$\max_{U^{a}}\ \cdots$$

$$P_{c}^{D}\leq\delta_{c},$$

$$x(k+1)=Ax(k)+B^{a}u^{a}(k)+w(k),$$

Full text

Centralized Versus Decentralized Detection of Attacks in Stochastic Interconnected Systems

Rajasekhar Anguluri, Student Member, IEEE, Vaibhav Katewa, Member, IEEE, and Fabio Pasqualetti, Member, IEEE

This material is based upon work supported by the awards ARO 71603NSYIP, NSF ECCS1405330, and UCOP LFR-18-548175. The authors are with the Department of Mechanical Engineering, University of California, Riverside, {rangu003,vkatewa,fabiopas}@engr.ucr.edu.

Abstract

We consider a security problem for interconnected systems governed by linear, discrete, time-invariant, stochastic dynamics, where the objective is to detect exogenous attacks by processing the measurements at different locations. We consider two classes of detectors, namely centralized and decentralized detectors, which differ primarily in their knowledge of the system model. In particular, a decentralized detector has a model of the dynamics of the isolated subsystems, but is unaware of the interconnection signals that are exchanged among subsystems. Instead, a centralized detector has a model of the entire dynamical system. We characterize the performance of the two detectors and show that, depending on the system and attack parameters, each of the detectors can outperform the other. In particular, it may be possible for the decentralized detector to outperform its centralized counterpart, despite having less information about the system dynamics, and this surprising property is due to the nature of the considered attack detection problem. To complement our results on the detection of attacks, we propose and solve an optimization problem to design attacks that maximally degrade the system performance while maintaining a pre-specified degree of detectability. Finally, we validate our findings via numerical studies on an electric power system.

1 Introduction

Cyber-physical systems are becoming increasingly complex and interconnected. In fact, different cyber-physical systems typically operate in a connected environment, where the performance of each system is greatly affected by neighboring units. An example is the smart grid, which arises from the interconnection of smaller power systems at different geographical locations, and whose performance depends on other critical infrastructures including the transportation network and the water system. Given the interconnected nature of large cyber-physical systems, and the fact that each subsystem usually has only partial knowledge or measurements of other interconnected units, the security question arises as to whether sophisticated attackers can hide their actions from the individual subsystems while inducing system-wide critical perturbations.

In this work we investigate whether, and to what extent, coordination among different subsystems and knowledge of the global system dynamics is necessary to detect attacks in interconnected systems. In fact, while existing approaches for the detection of faults and attacks typically rely on a centralized detector [1, 2, 3], the use of local detectors would not only be computationally convenient, but it would also prevent the subsystems from disclosing private information about their plants. As a counterintuitive result, we will show that local and decentralized detectors can, in some cases, outperform a centralized detector, thus supporting the development of distributed and localized theories and tools for the security of cyber-physical systems.

Related work: Centralized attack detectors, where the detector has complete knowledge of the system dynamics and all measurements, have been the subject of extensive research in recent years [4, 5, 6, 7, 8, 9, 10, 11, 12]. These studies use techniques from various disciplines including game theory, information theory, fault detection, and signal processing, and have a wide variety of applications [2]. Instead, decentralized attack detectors, where each local detector decides on attacks based on partial information and measurements about the system, and local detectors cooperate to improve their detection capabilities, have received only limited and recent attention [13, 14, 15, 16, 17].

Decentralized detection schemes have also been studied for fault detection and isolation (FDI). In such schemes, multiple local detectors make inferences about either the global or local process, and transmit their local decisions to a central entity, which uses appropriate fusion rules to make the global decision [18, 19, 20, 21, 22]. Methods to improve the detection performance by exchanging information among the local detectors have also been proposed [23, 24, 25]. These decentralized algorithms are typically complex [1], their effectiveness in detecting unknown and unmeasurable attacks is difficult to characterize, and their performance is believed to be inferior when compared to their centralized counterparts. To the best of our knowledge, a rigorous comparison of centralized and decentralized attack detection schemes is still lacking, which prevents us from assessing whether, and to what extent, decentralized and distributed schemes should be employed for attack detection and identification.

Main contributions (footnote 1): This paper features three main contributions. First, we propose centralized and decentralized schemes to detect unknown and unmeasurable sensor attacks in stochastic interconnected systems. Our detection schemes are based on a statistical decision-theoretic framework that falls under the category of simple versus composite hypothesis testing. We characterize the probability of false alarm and the probability of detection for both detectors as functions of the system and attack parameters. Second, we compare the performance of the centralized and decentralized detectors, and show that each detector can outperform the other for certain system and attack configurations. We argue that this counterintuitive phenomenon is inherent to the simple versus composite nature of the considered attack detection problem, and provide numerical examples of this behavior. Third, we formulate and solve an optimization problem to design attacks against interconnected systems that maximally affect the system performance, as measured by the mean square deviation of the state, while remaining undetected by the centralized and decentralized detectors with a pre-selected probability. Finally, we validate our theoretical findings on the IEEE RTS-96 power system model.

Footnote 1: In a preliminary version of this paper [26], we used asymptotic approximations to compare the detectors' performance. Instead, in this paper we provide stronger, tight, and non-asymptotic results without using any approximations. Further, this paper contains new results on the design of optimal undetectable attacks, and a characterization of the performance degradation induced by such attacks. In addition, the results are illustrated on an electrical power grid.

Paper organization: The rest of the paper is organized as follows. Section 2 contains our problem formulation. In Section 3, we present our local, decentralized, and centralized detectors, and characterize their performance. Section 4 contains our main results regarding the comparison of the performance of centralized and decentralized detectors. Section 5 contains the design of optimal undetectable attacks. Finally, Section 6 contains our numerical studies, and Section 7 concludes the paper.

Mathematical notation: The following notation will be adopted throughout the paper. Let $X_{1},\ldots,X_{N}$ be arbitrary sets; then $\bigcup_{i=1}^{N}X_{i}$ and $\bigcap_{i=1}^{N}X_{i}$ denote the union and intersection of the sets, respectively. $\mathrm{Trace}(\cdot)$, $\mathrm{Rank}(\cdot)$, and $\mathrm{Null}(\cdot)$ denote the trace, rank, and null space of a matrix, respectively. $Q>0$ ($Q\geq 0$) denotes that $Q$ is a positive definite (positive semidefinite) matrix. $\otimes$ denotes the Kronecker product for matrices. $\mathrm{blkdiag}(A_{1},\cdots,A_{N})$ denotes the block diagonal matrix with $A_{1},\cdots,A_{N}$ as diagonal blocks. The identity matrix is denoted by $I$ (or $I_{\text{dim}}$ to denote its dimension explicitly). $\mathrm{Pr}[\mathcal{E}]$ denotes the probability of the event $\mathcal{E}$. The mean and covariance of a real or vector valued random variable $Y$ are denoted by $\mathbb{E}[Y]$ and $\mathrm{Cov}[Y]$. Further, for a real valued random variable $Y$, we denote the standard deviation by $\mathrm{SD}[Y]$. If $Y$ follows a Gaussian distribution, we denote it by $Y\sim\mathcal{N}\left(\mathbb{E}[Y],\mathrm{Cov}[Y]\right)$. Instead, if $Y$ follows a noncentral chi-squared distribution, we denote it by $Y\sim\chi^{2}(p,\lambda)$, where $p$ is the number of degrees of freedom and $\lambda$ is the non-centrality parameter. For $Y\sim\chi^{2}(p,\lambda)$ and $\tau\geq 0$, $Q(\tau;p,\lambda)$ denotes the complementary cumulative distribution function of $Y$.

2 Problem setup and preliminary notions

We consider an interconnected system with $N$ subsystems, where each subsystem obeys the discrete-time linear dynamics

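$$\begin{aligned}x_{i}(k+1)&=A_{ii}x_{i}(k)+B_{i}u_{i}(k)+w_{i}(k),\\ y_{i}(k)&=C_{i}x_{i}(k)+v_{i}(k),\end{aligned}\tag{1}$$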

with $i\in\{1,\ldots,N\}$. In the above equation, the vectors $x_{i}\in\mathbb{R}^{n_{i}}$ and $y_{i}\in\mathbb{R}^{r_{i}}$ are the state and measurement of the $i$-th subsystem, respectively. The process noise $w_{i}(k)\sim\mathcal{N}(0,\Sigma_{w_{i}})$ and the measurement noise $v_{i}(k)\sim\mathcal{N}(0,\Sigma_{v_{i}})$ are independent stochastic processes, and $w_{i}$ is assumed to be independent of $v_{i}$, for all $k\geq 0$. Further, the noise vectors across different subsystems are assumed to be independent at all times. The $i$-th subsystem is coupled with the other subsystems through the term $B_{i}u_{i}$, which takes the form

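$$B_{i}=\begin{bmatrix}A_{i1}&\cdots&A_{i,i-1}&A_{i,i+1}&\cdots&A_{iN}\end{bmatrix},\qquad u_{i}=\begin{bmatrix}x_{1}^{\mathsf{T}}&\cdots&x_{i-1}^{\mathsf{T}}&x_{i+1}^{\mathsf{T}}&\cdots&x_{N}^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}.$$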

The input $B_{i}u_{i}=\sum_{j\neq i}^{N}A_{ij}x_{j}$ represents the cumulative effect of subsystems $j$ on subsystem $i$. Hence, we refer to $B_{i}$ as the interconnection matrix, and to $u_{i}$ as the interconnection signal.

We allow for the presence of attacks compromising the dynamics of the subsystems, and model such attacks as exogenous unknown inputs. In particular, the dynamics of the $i$-th subsystem under the attack $u^{a}_{i}$ with matrix $B^{a}_{i}$ read as

$$x_{i}(k+1)=A_{ii}x_{i}(k)+B_{i}u_{i}(k)+B_{i}^{a}u_{i}^{a}(k)+w_{i}(k),\tag{2}$$

where $u_{i}^{a}\in\mathbb{R}^{m_{i}}$. In vector form, the dynamics of the interconnected system under attack read as

$$\begin{aligned}x(k+1)&=Ax(k)+B^{a}u^{a}(k)+w(k),\\ y(k)&=Cx(k)+v(k),\end{aligned}\tag{3}$$

where $\phi=\begin{bmatrix}\phi_{1}^{\mathsf{T}}&\ldots&\phi_{N}^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}$, with $\phi$ standing for $x\in\mathbb{R}^{n}$, $w\in\mathbb{R}^{n}$, $u^{a}\in\mathbb{R}^{m}$, $y\in\mathbb{R}^{r}$, $v\in\mathbb{R}^{r}$, $n=\sum_{i=1}^{N}n_{i}$, $m=\sum_{i=1}^{N}m_{i}$, and $r=\sum_{i=1}^{N}r_{i}$. Moreover, as the components of the vectors $w$ and $v$ are independent and Gaussian, $w\sim\mathcal{N}(0,\Sigma_{w})$ and $v\sim\mathcal{N}(0,\Sigma_{v})$, respectively, where $\Sigma_{w}=\mathrm{blkdiag}\left(\Sigma_{w_{1}},\ldots,\Sigma_{w_{N}}\right)$ and $\Sigma_{v}=\mathrm{blkdiag}\left(\Sigma_{v_{1}},\ldots,\Sigma_{v_{N}}\right)$. Further,

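$$A=\begin{bmatrix}A_{11}&\cdots&A_{1N}\\ \vdots&\ddots&\vdots\\ A_{N1}&\cdots&A_{NN}\end{bmatrix},\qquad B^{a}=\begin{bmatrix}B_{1}^{a}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&B_{N}^{a}\end{bmatrix},$$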

and $C=\mathrm{blkdiag}\left(C_{1},\ldots,C_{N}\right)$ .
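The block structure of the global model can be checked numerically. The sketch below assumes that $B_{i}$ places the off-diagonal blocks $A_{ij}$, $j\neq i$, side by side and that $u_{i}$ stacks the corresponding neighbor states, which is one arrangement consistent with $B_{i}u_{i}=\sum_{j\neq i}A_{ij}x_{j}$. The subsystem sizes, the random matrices, and the helper names are illustrative placeholders, not values or code from the paper.

```python
import numpy as np

def interconnection_matrix(A_blocks, i):
    """B_i: the off-diagonal blocks A_ij (j != i) of block row i, side by side."""
    return np.hstack([A_blocks[i][j] for j in range(len(A_blocks)) if j != i])

def interconnection_signal(x_parts, i):
    """u_i: states of all subsystems other than i, stacked in the same order as B_i."""
    return np.concatenate([x_parts[j] for j in range(len(x_parts)) if j != i])

# Illustrative data (assumption): three subsystems with 2, 1 and 2 states.
sizes = [2, 1, 2]
rng = np.random.default_rng(1)
A_blocks = [[0.3 * rng.normal(size=(ni, nj)) for nj in sizes] for ni in sizes]
x_parts = [rng.normal(size=n) for n in sizes]

A = np.block(A_blocks)                 # global state matrix
x_next = A @ np.concatenate(x_parts)   # noise-free, attack-free global update
offsets = np.cumsum([0] + sizes)

# Largest gap between the global update and the local updates A_ii x_i + B_i u_i.
max_gap = 0.0
for i in range(len(sizes)):
    B_i = interconnection_matrix(A_blocks, i)
    u_i = interconnection_signal(x_parts, i)
    local = A_blocks[i][i] @ x_parts[i] + B_i @ u_i
    block = x_next[offsets[i]:offsets[i + 1]]
    max_gap = max(max_gap, float(np.max(np.abs(local - block))))
```

The local and global updates agree up to floating-point error, so the two descriptions differ only in what each detector is assumed to know, not in the dynamics themselves.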

We assume that each subsystem is equipped with a local detector, which uses the local measurements and knowledge of the local dynamics to detect the presence of local attacks. In particular, the $i$-th local detector has access to the measurements $y_{i}$ in (1), knows the matrices $A_{ii}$, $B_{i}$, and $C_{i}$, and the statistical properties of the noise vectors $w_{i}$ and $v_{i}$. Yet, the $i$-th local detector does not know or measure the interconnection input $u_{i}$, or the attack parameters $B_{i}^{a}$ and $u_{i}^{a}$. Based on this information, the $i$-th local detector aims to detect whether $B_{i}^{a}u_{i}^{a}\neq 0$. The decisions of the local detectors are then processed by a decentralized detector, which aims to detect the presence of attacks against the whole interconnected system based on the local decisions. Finally, we assume the presence of a centralized detector, which has access to the measurements $y$ in (3), and knows the matrix $A$ and the statistical properties of the overall noise vectors $w$ and $v$. Similarly to the local detectors, the centralized detector does not know or measure the attack parameters $B^{a}$ and $u^{a}$, and aims to detect whether $B^{a}u^{a}\neq 0$. We postpone a detailed description of our detectors to Section 3. To conclude this section, note that the decentralized and centralized detectors have access to the same measurements. Yet, these detectors differ in their knowledge of the system dynamics, which determines their performance as explained in Section 4.

Remark 1

(Control input and initial state) The system setup in (2) and (3) typically includes a control input. However, assuming that each subsystem knows its control input, it can be omitted without affecting generality. Further, as the detectors do not have information about the initial state, we assume without loss of generality, that the initial state is deterministic and unknown to the detectors. $\square$

3 Local, decentralized, and centralized detectors

In this section we formally describe our local, decentralized, and centralized detectors, and characterize their performance as a function of the available measurements and knowledge of the system dynamics. To this aim, let $T>0$ be an arbitrary time horizon and define the vectors

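$$Y_{i}=\begin{bmatrix}y_{i}^{\mathsf{T}}(1)&y_{i}^{\mathsf{T}}(2)&\cdots&y_{i}^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}},\tag{4}$$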

which contains the measurements available to the $i$-th detector, and

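$$Y_{c}=\begin{bmatrix}y^{\mathsf{T}}(1)&y^{\mathsf{T}}(2)&\cdots&y^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}},\tag{5}$$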

which contains the measurements available to the centralized detector. Both the local and centralized detectors perform the following three operations in order:

1. Collect measurements as in (4) and (5), respectively;

2. Process measurements to filter unknown variables; and

3. Perform statistical hypothesis testing to detect attacks (locally or globally) using the processed measurements.

The decisions of the local detectors are then used by the decentralized detector, which triggers an alarm if any of the local detectors does so. We next characterize how the detectors process their measurements and perform attack detection via statistical hypothesis testing.

3.1 Processing of measurements

The measurements (4) and (5) depend on parameters that are unknown to the detectors, namely, the system initial state and the interconnection signal (although the process and measurement noises are also unknown, the detectors know their statistical properties). Thus, to test for the presence of attacks, the detectors first process the measurement vectors to eliminate their dependency on the unknown parameters. To do so, using equations (1) and (2), define the observability matrix and the attack, interconnection, and noise forced response matrices of the $i$-th subsystem as

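$$\mathcal{O}_{i}=\begin{bmatrix}C_{i}A_{ii}\\ \vdots\\ C_{i}A_{ii}^{T}\end{bmatrix},\qquad \mathcal{F}_{i}^{(a)}=\begin{bmatrix}C_{i}B_{i}^{a}&\cdots&0\\ \vdots&\ddots&\vdots\\ C_{i}A_{ii}^{T-1}B_{i}^{a}&\cdots&C_{i}B_{i}^{a}\end{bmatrix},$$

$$\mathcal{F}_{i}^{(u)}=\begin{bmatrix}C_{i}B_{i}&\cdots&0\\ \vdots&\ddots&\vdots\\ C_{i}A_{ii}^{T-1}B_{i}&\cdots&C_{i}B_{i}\end{bmatrix},\qquad \mathcal{F}_{i}^{(w)}=\begin{bmatrix}C_{i}&\cdots&0\\ \vdots&\ddots&\vdots\\ C_{i}A_{ii}^{T-1}&\cdots&C_{i}\end{bmatrix}.$$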

Analogously, for the system model (3) define the matrices $\mathcal{O}_{c}$, $\mathcal{F}^{(a)}_{c}$, and $\mathcal{F}^{(w)}_{c}$, which are constructed as above by replacing $A_{ii}$, $B_{i}^{a}$, and $C_{i}$ with $A$, $B^{a}$, and $C$, respectively. The measurements (4) and (5) can be written as follows:

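$$Y_{i}=\mathcal{O}_{i}x_{i}(0)+\mathcal{F}_{i}^{(u)}U_{i}+\mathcal{F}_{i}^{(a)}U_{i}^{a}+\mathcal{F}_{i}^{(w)}W_{i}+V_{i},\tag{6}$$

$$Y_{c}=\mathcal{O}_{c}x(0)+\mathcal{F}_{c}^{(a)}U^{a}+\mathcal{F}_{c}^{(w)}W+V,\tag{7}$$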

where $U_{i}=\begin{bmatrix}u_{i}^{\mathsf{T}}(0)&u_{i}^{\mathsf{T}}(1)&\cdots&u_{i}^{\mathsf{T}}(T-1)\end{bmatrix}^{\mathsf{T}}$. The vectors $U^{a}_{i}$, $U^{a}$, $W_{i}$, and $W$ are the time-aggregated signals of $u^{a}_{i}$, $u^{a}$, $w_{i}$, and $w$, respectively, and are defined similarly to $U_{i}$. Instead, $V_{i}=\begin{bmatrix}v_{i}^{\mathsf{T}}(1)&v_{i}^{\mathsf{T}}(2)&\cdots&v_{i}^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}}$, and $V$ is defined similarly to $V_{i}$. To eliminate the dependency on the unknown variables, let $N_{i}$ and $N_{c}$ be bases of the left null spaces of the matrices $\begin{bmatrix}\mathcal{O}_{i}&\mathcal{F}^{(u)}_{i}\end{bmatrix}$ and $\mathcal{O}_{c}$, respectively, and define the processed measurements as

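$$\begin{aligned}\widetilde{Y}_{i}&=N_{i}Y_{i}=N_{i}\left[\mathcal{F}_{i}^{(a)}U_{i}^{a}+\mathcal{F}_{i}^{(w)}W_{i}+V_{i}\right],\\ \widetilde{Y}_{c}&=N_{c}Y_{c}=N_{c}\left[\mathcal{F}_{c}^{(a)}U^{a}+\mathcal{F}_{c}^{(w)}W+V\right],\end{aligned}\tag{8}$$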

where the expressions for $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$ follow from (6) and (7). Notice that, in the absence of attacks ($U^{a}=0$), the measurements $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$ depend only on the system noise. Instead, in the presence of attacks, such measurements also depend on the attack vector, which may leave a signature for the detectors (footnote 2). We now characterize the statistical properties of $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$.

Footnote 2: If $\mathrm{Im}({B}_{i}^{a})\subseteq\mathrm{Im}({B}_{i})$, then $N_{i}\mathcal{F}^{(a)}_{i}=0$ and the processed measurements do not depend on the attack. Thus, our local detection technique can only be successful against attacks that do not satisfy this condition.
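The processing step can be sketched for a single subsystem as follows. The code assumes that the observability matrix stacks $C_{i}A_{ii}^{k}$ for $k=1,\ldots,T$ and that the forced response matrices are block lower triangular with blocks $C_{i}A_{ii}^{k-1-j}B$, which is what the measurement window $y_{i}(1),\ldots,y_{i}(T)$ implies. The numerical values, the attack direction, and the function names are hypothetical, and the left null space is obtained from a singular value decomposition, which is one of several possible choices.

```python
import numpy as np

def observability(A, C, T):
    """Stack C A^k for k = 1..T (measurements start at time 1)."""
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(1, T + 1)])

def forced_response(A, B, C, T):
    """Block lower-triangular map from inputs at times 0..T-1 to outputs at times 1..T."""
    r, m = C.shape[0], B.shape[1]
    F = np.zeros((T * r, T * m))
    for row in range(T):
        for col in range(row + 1):
            block = C @ np.linalg.matrix_power(A, row - col) @ B
            F[row * r:(row + 1) * r, col * m:(col + 1) * m] = block
    return F

def left_null_basis(M, tol=1e-10):
    """Rows form an orthonormal basis of the left null space of M, so N @ M = 0."""
    U, s, _ = np.linalg.svd(M)
    return U[:, int(np.sum(s > tol)):].T

# Illustrative subsystem (assumption): 2 states, both measured, horizon T = 3.
A_ii = np.array([[0.9, 0.1], [0.0, 0.8]])
B_i = np.array([[1.0], [0.0]])   # interconnection enters the first state
B_a = np.array([[0.0], [1.0]])   # attack direction outside Im(B_i)
C_i, T = np.eye(2), 3

O_i = observability(A_ii, C_i, T)
F_u, F_a, F_w = (forced_response(A_ii, B, C_i, T) for B in (B_i, B_a, np.eye(2)))
N_i = left_null_basis(np.hstack([O_i, F_u]))

# Simulate one trajectory under attack and compare both sides of (8).
rng = np.random.default_rng(0)
x = rng.normal(size=2)            # initial state, unknown to the detector
U = rng.normal(size=(T, 1))       # interconnection signal, unknown to the detector
U_a = np.ones((T, 1))             # constant attack input
W = 0.1 * rng.normal(size=(T, 2))
V = 0.1 * rng.normal(size=(T, 2))
Y = []
for k in range(T):
    x = A_ii @ x + B_i @ U[k] + B_a @ U_a[k] + W[k]
    Y.append(C_i @ x + V[k])      # V[k] stores v(k + 1)
processed = N_i @ np.concatenate(Y)
predicted = N_i @ (F_a @ U_a.ravel() + F_w @ W.ravel() + V.ravel())
```

In this example the projection removes the initial state and the interconnection signal exactly, while the attack survives because its direction is not contained in the image of the interconnection matrix.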

Lemma 3.1

(Statistical properties of the processed measurements) The processed measurements $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$ satisfy

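$$\widetilde{Y}_{i}\sim\mathcal{N}(\beta_{i},\Sigma_{i})\ \text{ for all } i\in\{1,\dots,N\},\quad\text{and}\quad \widetilde{Y}_{c}\sim\mathcal{N}(\beta_{c},\Sigma_{c}),\tag{9}$$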

where

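$$\begin{aligned}\beta_{i}&=N_{i}\mathcal{F}_{i}^{(a)}U_{i}^{a},\\ \beta_{c}&=N_{c}\mathcal{F}_{c}^{(a)}U^{a},\\ \Sigma_{i}&=N_{i}\left[\mathcal{F}_{i}^{(w)}\left(I_{T}\otimes\Sigma_{w_{i}}\right)(\mathcal{F}_{i}^{(w)})^{\mathsf{T}}+\left(I_{T}\otimes\Sigma_{v_{i}}\right)\right]N_{i}^{\mathsf{T}},\\ \Sigma_{c}&=N_{c}\left[\mathcal{F}_{c}^{(w)}\left(I_{T}\otimes\Sigma_{w}\right)(\mathcal{F}_{c}^{(w)})^{\mathsf{T}}+\left(I_{T}\otimes\Sigma_{v}\right)\right]N_{c}^{\mathsf{T}}.\end{aligned}\tag{10}$$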

A proof of Lemma 3.1 is postponed to the Appendix. From Lemma 3.1, the mean vectors $\beta_{i}$ and $\beta_{c}$ depend on the attack vector, while the covariance matrices $\Sigma_{i}$ and $\Sigma_{c}$ are independent of the attack. This observation motivates us to develop a detection mechanism based on the mean of the processed measurements, rather than on the covariance matrices.

3.2 Statistical hypothesis testing framework

In this section we detail our attack detection mechanism, which we assume to be the same for all local and centralized detectors, and we characterize its false alarm and detection probabilities. We start by analyzing the test procedure of the $i$-th local detector. Let $H_{0}$ be the null hypothesis, where $\beta_{i}=0$ and the system is not under attack, and let $H_{1}$ be the alternative hypothesis, where $\beta_{i}\neq 0$ and the system is under attack. To decide which hypothesis is true, or equivalently whether the mean value of the processed measurements is zero, we resort to the generalized log-likelihood ratio test (GLRT):

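$$\Lambda_{i}\triangleq\widetilde{Y}_{i}^{\mathsf{T}}\Sigma_{i}^{-1}\widetilde{Y}_{i}\;\underset{H_{0}}{\overset{H_{1}}{\gtrless}}\;\tau_{i},\tag{11}$$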

where the threshold $\tau_{i}\geq 0$ is selected based on the desired false alarm probability of the test (11) [27]. For a statistical hypothesis testing problem, the false alarm probability equals the probability of deciding for $H_{1}$ when $H_{0}$ is true, while the detection probability equals the probability of deciding for $H_{1}$ when $H_{1}$ is true. While the former is used for tuning the threshold, the latter is used for measuring the performance of the test. Formally, the false alarm and detection probabilities of (11) are the probabilities that are conditioned on the hypothesis $H_{0}$ and $H_{1}$ , respectively, and are symbolically denoted as

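$$P_{i}^{F}=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i}\mid H_{0}\right]\quad\text{and}\quad P_{i}^{D}=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i}\mid H_{1}\right].$$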

Similarly, the centralized detector test is defined as

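$$\Lambda_{c}\triangleq\widetilde{Y}_{c}^{\mathsf{T}}\Sigma_{c}^{-1}\widetilde{Y}_{c}\;\underset{H_{0}}{\overset{H_{1}}{\gtrless}}\;\tau_{c},\tag{12}$$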

where $\tau_{c}\geq 0$ is a preselected threshold, and its false alarm and detection probabilities are denoted as $P^{F}_{c}$ and $P^{D}_{c}$ . We next characterize the false alarm and detection probabilities of the detectors with respect to the system and attack parameters.
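The moments in Lemma 3.1 and the quadratic test statistic can be sketched numerically. The matrices below are random stand-ins for $N_{i}$, $\mathcal{F}^{(a)}_{i}$, and $\mathcal{F}^{(w)}_{i}$ (an assumption made only to keep the example short), the noise covariances are illustrative, and the function names are hypothetical. The Monte Carlo average is included as a sanity check that the statistic has mean equal to the dimension of the processed measurement plus $\beta^{\mathsf{T}}\Sigma^{-1}\beta$, which is the mean of a noncentral chi-squared variable.

```python
import numpy as np

def processed_moments(N, F_a, F_w, Sigma_w, Sigma_v, U_a, T):
    """Mean and covariance of the processed measurements, following Lemma 3.1."""
    noise_cov = F_w @ np.kron(np.eye(T), Sigma_w) @ F_w.T + np.kron(np.eye(T), Sigma_v)
    return N @ F_a @ U_a, N @ noise_cov @ N.T

def glrt_statistic(Y_tilde, Sigma):
    """Quadratic form Y^T Sigma^{-1} Y, using a linear solve instead of an explicit inverse."""
    return float(Y_tilde @ np.linalg.solve(Sigma, Y_tilde))

rng = np.random.default_rng(0)
T, n, r = 2, 2, 2
N = rng.normal(size=(2, T * r))         # stand-in for a left null-space basis
F_a = rng.normal(size=(T * r, T))       # stand-in attack forced response (one attack input)
F_w = rng.normal(size=(T * r, T * n))   # stand-in noise forced response
Sigma_w, Sigma_v = 0.04 * np.eye(n), 0.01 * np.eye(r)
U_a = np.array([0.5, -0.5])             # illustrative attack sequence

beta, Sigma = processed_moments(N, F_a, F_w, Sigma_w, Sigma_v, U_a, T)
M = (N @ F_a).T @ np.linalg.solve(Sigma, N @ F_a)
lam = float(U_a @ M @ U_a)              # equals beta^T Sigma^{-1} beta

# Monte Carlo check: draw the raw noises, process them, and average the statistic.
samples = 20000
W = 0.2 * rng.normal(size=(samples, T * n))   # standard deviation sqrt(0.04)
V = 0.1 * rng.normal(size=(samples, T * r))   # standard deviation sqrt(0.01)
Y_tilde = beta + (W @ F_w.T + V) @ N.T
stats = np.sum(Y_tilde * np.linalg.solve(Sigma, Y_tilde.T).T, axis=1)
```

Because the covariance does not depend on the attack, the same `Sigma` is used under both hypotheses; only the mean `beta` changes.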

Lemma 3.2

(False alarm and detection probabilities of local and centralized detectors) The false alarm and the detection probabilities of the tests (11) and (12) are, respectively,

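$$\begin{aligned}P_{i}^{F}&=Q(\tau_{i};p_{i},0),&P_{i}^{D}&=Q(\tau_{i};p_{i},\lambda_{i}),\quad\text{and}\\ P_{c}^{F}&=Q(\tau_{c};p_{c},0),&P_{c}^{D}&=Q(\tau_{c};p_{c},\lambda_{c}),\end{aligned}\tag{13}$$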

where

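$$\begin{aligned}p_{i}&=\mathrm{Rank}(\Sigma_{i}),&p_{c}&=\mathrm{Rank}(\Sigma_{c}),\\ \lambda_{i}&=(U_{i}^{a})^{\mathsf{T}}M_{i}(U_{i}^{a}),&\lambda_{c}&=(U^{a})^{\mathsf{T}}M_{c}(U^{a}),\end{aligned}\tag{14}$$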

and

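$$\begin{aligned}M_{i}&=(N_{i}\mathcal{F}_{i}^{(a)})^{\mathsf{T}}\Sigma_{i}^{-1}(N_{i}\mathcal{F}_{i}^{(a)}),\\ M_{c}&=(N_{c}\mathcal{F}_{c}^{(a)})^{\mathsf{T}}\Sigma_{c}^{-1}(N_{c}\mathcal{F}_{c}^{(a)}).\end{aligned}\tag{15}$$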

Lemma 3.2, whose proof is postponed to the Appendix, allows us to compute the false alarm and detection probabilities of the detectors using the decision thresholds, the system parameters, and the attack vector. Moreover, for fixed $P^{F}_{i}$ and $P^{F}_{c}$, the detection thresholds are computed as $\tau_{c}=Q^{-1}(P^{F}_{c};p_{c},0)$ and $\tau_{i}=Q^{-1}(P^{F}_{i};p_{i},0)$, where $Q^{-1}(\cdot)$ is the inverse of the complementary cumulative distribution function (CDF) of a central chi-squared distribution. The parameters $p_{i}$, $p_{c}$ and $\lambda_{i}$, $\lambda_{c}$ in Lemma 3.2 are referred to as the degrees of freedom and the non-centrality parameters of the detectors, respectively.
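The threshold computation can be sketched without external dependencies. In practice a library routine (for instance SciPy's `scipy.stats.chi2.isf`) would normally be used; the version below is an illustrative alternative that evaluates the central complementary CDF for integer degrees of freedom through a standard recursion and inverts it by bisection. The function names and the values $p_{i}=2$, $p_{c}=6$, and $P^{F}=0.05$ are assumptions chosen for the example.

```python
import math

def q_central(tau, p):
    """Q(tau; p, 0): complementary CDF of a central chi-squared variable, integer p >= 1."""
    if tau <= 0.0:
        return 1.0
    half = tau / 2.0
    if p % 2 == 0:
        value, dof = math.exp(-half), 2               # Q(tau; 2, 0)
    else:
        value, dof = math.erfc(math.sqrt(half)), 1    # Q(tau; 1, 0)
    while dof < p:
        # Q(tau; d + 2, 0) = Q(tau; d, 0) + (tau/2)^(d/2) exp(-tau/2) / Gamma(d/2 + 1)
        value += math.exp(0.5 * dof * math.log(half) - half - math.lgamma(0.5 * dof + 1.0))
        dof += 2
    return value

def threshold(p_false, p, iterations=200):
    """tau = Q^{-1}(P^F; p, 0), found by bisection because Q is decreasing in tau."""
    lo, hi = 0.0, 1.0
    while q_central(hi, p) > p_false:   # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if q_central(mid, p) > p_false:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tau_local = threshold(0.05, 2)     # hypothetical local detector with p_i = 2
tau_central = threshold(0.05, 6)   # hypothetical centralized detector with p_c = 6
```

For the same false alarm probability, the detector with more degrees of freedom needs a larger threshold, which is one ingredient of the trade-off discussed in Remark 2.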

Remark 2

(System theoretic interpretation of detection probability parameters) The degrees of freedom and the non-centrality parameters quantify the knowledge of the detectors about the system dynamics and the energy of the attack signal contained in the processed measurements. In particular:

(Degrees of freedom $p_{i}$) The detection probability and the false alarm probability are both increasing functions of the degrees of freedom $p_{i}$, because the $Q$ function in (13) is an increasing function of $p_{i}$. Thus, increasing $p_{i}$ by, for instance, increasing the number of sensors or the horizon $T$, does not necessarily lead to an improvement of the detector performance.

(Non-centrality parameter $\lambda_{i}$) The non-centrality parameter $\lambda_{i}$ measures the energy of the attack signal contained in the processed measurements. In the communication and signal processing literature, the non-centrality parameter is often referred to as the signal-to-noise ratio (SNR) [27]. For fixed $\tau_{i}$ and $p_{i}$, the detection probability increases monotonically with $\lambda_{i}$, and approaches the false alarm probability as $\lambda_{i}$ tends to zero.

(Decision threshold $\tau_{i}$) For fixed $\lambda_{i}$ and $p_{i}$, the probability of detection and the false alarm probability are monotonically decreasing functions of the detection threshold $\tau_{i}$. This is due to the fact that the complementary CDFs, which define the false alarm and detection probabilities, are decreasing functions of $\tau_{i}$. As we show later, because of the contrasting behaviors of the false alarm and detection probabilities with respect to all individual parameters, the decentralized detector can outperform the centralized detector. $\square$
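The three monotonicity properties in Remark 2 can be illustrated with a small Monte Carlo experiment. The sketch relies on the standard fact that a $\chi^{2}(p,\lambda)$ variable is the sum of squares of $p$ independent unit-variance Gaussians whose means have squared norm $\lambda$. The threshold $9.4877$ is approximately the 5% threshold for $p=4$; the sample size, seed, and function name are arbitrary choices for the example.

```python
import math
import random

def exceedance_probability(tau, p, lam, samples=20000, seed=0):
    """Monte Carlo estimate of Q(tau; p, lam) = Pr[chi2(p, lam) >= tau]."""
    rng = random.Random(seed)
    shift = math.sqrt(lam)   # place all of the non-centrality on the first coordinate
    hits = 0
    for _ in range(samples):
        stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        stat += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p - 1))
        hits += stat >= tau
    return hits / samples

tau = 9.4877                                              # about 5% false alarm for p = 4
false_alarm = exceedance_probability(tau, 4, 0.0)
detect_weak = exceedance_probability(tau, 4, 4.0)         # small attack energy
detect_strong = exceedance_probability(tau, 4, 16.0)      # larger attack energy
false_alarm_more_dof = exceedance_probability(tau, 8, 0.0)
detect_high_tau = exceedance_probability(12.0, 4, 4.0)    # same attack, larger threshold
```

With the threshold held fixed, adding degrees of freedom raises the false alarm rate as well, so a fair comparison has to re-tune the threshold, as done in Section 4.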

We now state a result that relates the degrees of freedom ($p_{i}$ and $p_{c}$) and the non-centrality parameters ($\lambda_{i}$ and $\lambda_{c}$) of the local and the centralized detectors. This result plays a central role in comparing the performance of the centralized and decentralized detectors.

Lemma 3.3

(Degrees of freedom and non-centrality parameters) Let $p_{i}$, $p_{c}$ and $\lambda_{i}$, $\lambda_{c}$ be the degrees of freedom and non-centrality parameters of the $i$-th local and centralized detectors, respectively. Then, $p_{i}\!\leq\!p_{c}$ and $\lambda_{i}\!\leq\!\lambda_{c}$ for all $i\in\{1,\dots,N\}$.

A proof of Lemma 3.3 is postponed to the Appendix. Loosely speaking, given the interpretation of the degrees of freedom and non-centrality parameters in Remark 2, Lemma 3.3 states that a centralized detector has more knowledge about the system dynamics ($p_{i}\leq p_{c}$) and its measurements contain a stronger attack signature ($\lambda_{i}\leq\lambda_{c}$) than any local detector. Despite these properties, we will show that the decentralized detector can outperform the centralized one.

4 Comparison of centralized and decentralized detection of attacks

In this section we characterize the detection probabilities of the decentralized and centralized detectors, and we derive sufficient conditions for each detector to outperform the other. Recall that the decentralized detector triggers an alarm if any of the local detectors triggers an alarm. In other words,

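$$\begin{aligned}P_{d}^{F}&=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i},\ \text{for some } i\in\{1,\dots,N\}\mid H_{0}\right],\\ P_{d}^{D}&=\mathrm{Pr}\left[\Lambda_{i}\geq\tau_{i},\ \text{for some } i\in\{1,\dots,N\}\mid H_{1}\right],\end{aligned}\tag{16}$$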

where $P^{F}_{d}$ and $P^{D}_{d}$ denote the false alarm and detection probabilities of the decentralized detector, respectively.

Lemma 4.1

(Performance of the decentralized detector) The false alarm and detection probabilities in (16) satisfy

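$$P_{d}^{F}=1-\prod_{i=1}^{N}\left(1-P_{i}^{F}\right),\quad\text{and}\quad P_{d}^{D}=1-\prod_{i=1}^{N}\left(1-P_{i}^{D}\right).$$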

A proof of Lemma 4.1 is postponed to the Appendix. As shown in Fig. 1, for the case when $P^{F}_{i}=P^{F}_{j}$ for all $i,j\in\{1,\ldots,N\}$, $P^{F}_{d}$ increases with both $P^{F}_{i}$ and $N$. To allow for a fair comparison between the decentralized and centralized detectors, we assume that $P_{c}^{F}=P^{F}_{d}$. Consequently, for a fixed false alarm probability $P_{c}^{F}$, the probabilities $P_{i}^{F}$ satisfy

$$P_{c}^{F}=1-\prod_{i=1}^{N}\left(1-P_{i}^{F}\right).$$
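The combination rule of Lemma 4.1 and the matching of false alarm rates can be sketched in a few lines. The example assumes that all local detectors use the same false alarm probability, which is only one of many ways to satisfy $P^{F}_{c}=P^{F}_{d}$; the function names and the values $N=4$ and $P^{F}_{c}=0.05$ are illustrative.

```python
import math

def decentralized_probability(local_probs):
    """Probability that at least one local detector fires: 1 - prod_i (1 - P_i)."""
    return 1.0 - math.prod(1.0 - p for p in local_probs)

def equal_local_false_alarm(p_c_false, n_subsystems):
    """Common local false alarm rate that yields P_d^F = P_c^F (equal-rate assumption)."""
    return 1.0 - (1.0 - p_c_false) ** (1.0 / n_subsystems)

n_subsystems = 4
p_local = equal_local_false_alarm(0.05, n_subsystems)
p_decentralized = decentralized_probability([p_local] * n_subsystems)
```

Each local detector must operate at a stricter false alarm rate than the centralized detector, and hence at a different point of its own $Q$ curve, which is why the comparison of detection probabilities is not settled by Lemma 3.3 alone.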

We now derive a sufficient condition for the centralized detector to outperform the decentralized detector.

Theorem 4.2

(Sufficient condition for $P^{D}_{c}\geq P^{D}_{d}$ ) Let $P^{F}_{c}=P_{d}^{F}$ , and assume that the following condition is satisfied:

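$$\tau_{c}\leq p_{c}+\lambda_{c}-\sqrt{4N(p_{c}+2\lambda_{c})\ln\left(\frac{1}{1-P^{D}_{\max}}\right)},\tag{18}$$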

where $P^{D}_{\text{max}}=\max\{P_{1}^{D},\dots,P_{N}^{D}\}$ . Then, $P^{D}_{c}\geq P^{D}_{d}$ .

A proof of Theorem 4.2 is postponed to the Appendix. We next derive a sufficient condition for the decentralized detector to outperform the centralized detector.

Theorem 4.3

(Sufficient condition for $P^{D}_{d}\geq P^{D}_{c}$ ) Let $P^{F}_{c}=P_{d}^{F}$ , and assume that the following condition is satisfied:

[TABLE]

where $P^{D}_{\text{min}}=\min\{P_{1}^{D},\dots,P_{N}^{D}\}$ . Then $P^{D}_{d}\geq P^{D}_{c}$ .

A proof of Theorem 4.3 is postponed to the Appendix. Theorems 4.2 and 4.3 provide sufficient conditions on the detectors and attack parameters that result in one detector outperforming the other. In particular, from (18) and (19) we note that, depending on the decision threshold $\tau_{c}$ , a centralized detector may or may not outperform a decentralized detector. This is intuitive as the $Q$ function, which quantifies the detection probability, is a decreasing function of the detection threshold (see Remark 2). To clarify the effect of attack and detection parameters on the performance trade-offs of the detectors, we now express (18) and (19) using the mean and standard deviation of the test statistic $\Lambda_{c}$ in (12). Let

[TABLE]

where the expectation and standard deviation (SD) of $\Lambda_{c}$ follow from the fact that under $H_{1}$ , $\Lambda_{c}\sim\chi^{2}(p_{c},\lambda_{c})$ (see proof of Lemma 3.2). Hence, (18) and (19) can be rewritten, respectively, as

[TABLE]

From (20a) and (20b) we note that a centralized detector outperforms the decentralized one if $\tau_{c}$ is at least $\kappa_{c}$ standard deviations smaller than the mean $\mu_{c}$ . Instead, a decentralized detector outperforms the centralized detector if $\tau_{c}$ is at least $\kappa_{d}$ standard deviations larger than the mean $\mu_{c}$ . See Fig. 2 for a graphical illustration of this interpretation.
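To make the "standard deviations from the mean" reading concrete, the following sketch computes $\mu_{c}=p_{c}+\lambda_{c}$ and $\sigma_{c}=\sqrt{2(p_{c}+2\lambda_{c})}$ and reports the signed offset $(\tau_{c}-\mu_{c})/\sigma_{c}$ . The detection probability it prints is only a moment-matched Gaussian approximation of the noncentral chi-squared tail, not the exact $Q$ function, and the function names and the sample values of $\lambda_{c}$ are illustrative assumptions.

```python
import math
from statistics import NormalDist


def ncx2_mean_sd(p, lam):
    """Mean and standard deviation of a chi^2(p, lam) random variable."""
    return p + lam, math.sqrt(2.0 * (p + 2.0 * lam))


def threshold_offset_in_sd(tau, p, lam):
    """Signed distance (tau - mu) / sigma of the threshold from the mean."""
    mu, sd = ncx2_mean_sd(p, lam)
    return (tau - mu) / sd


def approx_detection_probability(tau, p, lam):
    """Gaussian (moment-matched) approximation of Pr[Lambda > tau]; not exact."""
    return 1.0 - NormalDist().cdf(threshold_offset_in_sd(tau, p, lam))


# p_c and tau_c as reported in Section 6 for T = 125; the lam values are illustrative.
p_c, tau_c = 6755, 6947.3
for lam in (0.0, 200.0, 500.0):
    z = threshold_offset_in_sd(tau_c, p_c, lam)
    print(lam, round(z, 3), round(approx_detection_probability(tau_c, p_c, lam), 4))
```

As $\lambda_{c}$ grows, the mean moves past the threshold and the offset changes sign, which is the regime change that separates conditions (20a) and (20b).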

Theorems 4.2 and 4.3 are illustrated in Fig. 3 as a function of the noncentrality parameters. It can be observed that (i) each of the detectors can outperform the other depending on the values of the noncentrality parameters, (ii) the provided bounds qualitatively capture the actual performance of the centralized and decentralized detectors as the noncentrality parameters increase, and (iii) the provided bounds are rather tight over a large range of noncentrality parameters. In Fig. 4 we show that the difference of the detection probabilities of the centralized and decentralized detectors can be large, especially when the noncentrality parameters are small and satisfy $\lambda_{c}\approx\lambda_{i}$ , as evident in panel (a) of Fig. 4.

5 Design of optimal attacks

In this section we consider the problem of designing attacks that deteriorate the performance of the interconnected system (1) while remaining undetected by the centralized and decentralized detectors. We measure the degradation induced by an attack with the expected value of the deviation of the state trajectory from the origin. We assume that the attack is a deterministic signal, and thus independent of the noise affecting the system dynamics and measurements. In particular, for a fixed value of the probability $P_{c}^{F}$ and a threshold $P_{c}^{F}\leq\delta_{c}\leq 1$ , we consider the optimization problem

[TABLE]

where $U^{a}$ is the deterministic attack input over time horizon $T$ (see (7)). Notice that, because the attack is deterministic, the objective function in (P.1) can be simplified by bringing the expectation inside the summation, and replacing the state equation constraint with the mean state response. Further, because the system parameters and $P_{c}^{F}$ are fixed, $\tau_{c}$ and $p_{c}$ are also fixed, which ensures that $P^{D}_{c}$ depends only on the noncentrality parameter. This observation, along with the fact that $Q(\cdot)$ is an increasing function of the noncentrality parameter (see Remark 2), allows us to express the detection constraint in terms of $\lambda_{c}$ . Specifically, the optimization problem (P.1) can be rewritten as

[TABLE]

where we have used that $\mathrm{Cov}\left[x(k)\right]$ is independent of the attack $u^{a}(k)$ , and

[TABLE]

with $\overline{x}(k)=\mathbb{E}[x(k)]$ . Further, we have $\widetilde{\delta}_{c}=Q^{-1}_{p_{c},\tau_{c}}(\delta_{c})$ , where $Q^{-1}_{p_{c},\tau_{c}}(\alpha):[0,1]\to[0,\infty]$ denotes the inverse of $Q(\tau_{c};p_{c},\lambda_{c})$ for fixed $p_{c}$ and $\tau_{c}$ , and $\lambda_{c}=(U^{a})^{\mathsf{T}}M_{c}(U^{a})$ , with $M_{c}$ as in (15). It should be noticed that the attack constraint in (P.2) essentially limits the (weighted) energy of the attack signal. We next characterize the solution to the optimization problem (P.2).
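The map from the tolerance $\delta_{c}$ to the energy budget $\widetilde{\delta}_{c}$ can be sketched numerically by exploiting the monotonicity of $Q$ in the noncentrality parameter. The sketch below is an assumption-laden illustration: it replaces the exact $Q$ function with a Gaussian moment-matched approximation (so the returned value is approximate), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_q(tau, p, lam):
    """Gaussian approximation of Q(tau; p, lam) = Pr[chi^2(p, lam) > tau]."""
    mu = p + lam
    sd = math.sqrt(2.0 * (p + 2.0 * lam))
    return 1.0 - NormalDist().cdf((tau - mu) / sd)


def max_noncentrality(delta, tau, p, tol=1e-9):
    """Largest lam with approx_q(tau, p, lam) <= delta, found by bisection."""
    if approx_q(tau, p, 0.0) > delta:
        raise ValueError("delta is below the false alarm level; no feasible lam")
    lo, hi = 0.0, 1.0
    while approx_q(tau, p, hi) < delta:  # grow the bracket until it contains delta
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if approx_q(tau, p, mid) <= delta:
            lo = mid
        else:
            hi = mid
    return lo


# p_c and tau_c as reported in Section 6 for T = 125; delta_c = 0.5 is illustrative.
delta_tilde = max_noncentrality(0.5, 6947.3, 6755)
print(delta_tilde)
```

Any attack with $(U^{a})^{\mathsf{T}}M_{c}U^{a}$ below the returned budget would then satisfy the (approximate) detection constraint.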

Theorem 5.1

(Optimal attack vectors) Let $U^{*}_{c}$ be any solution of (P.2). Then, there exists a $\gamma_{c}>0$ such that the pair $(U^{*}_{c},\gamma_{c})$ solves the following optimality equations:

[TABLE]

where

[TABLE]

A proof of Theorem 5.1 is postponed to the Appendix. Theorem 5.1 not only guarantees the existence of optimal attacks, but it also provides us with necessary conditions to verify if an attack is (locally) optimal. When the system initial state is zero, we can also quantify the performance degradation induced by an optimal attack. Let $\rho_{\text{max}}(A,B)$ and $\nu_{\text{max}}(A,B)$ denote a largest generalized eigenvalue of a matrix pair $(A,B)$ and one of its associated generalized eigenvectors [28].

Lemma 5.2

(System degradation with zero initial state) Let $x(0)=0$ . Then, the optimal solution to (P.2) is

[TABLE]

and its associated optimal cost is

[TABLE]

where $\nu^{*}=\nu_{\text{max}}\left(\mathcal{B}_{a}^{\mathsf{T}}\mathcal{B}_{a},M_{c}\right)$ .

A proof of Lemma 5.2 is postponed to the Appendix. From (24), notice that the system degradation caused by an optimal attack depends on the detector’s tolerance, as measured by $\widetilde{\delta}_{c}$ , and the system dynamics, as measured by $\rho_{\text{max}}\left(\cdot\right)$ . See Remark 4 for the influence of the noise level in the processed measurements on the system degradation due to optimal attacks.
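The construction described in the proof of Lemma 5.2 (scale a generalized eigenvector $\nu^{*}$ so that the energy constraint is active) can be sketched as follows. This is a minimal sketch under the assumption that $M_{c}$ is positive definite, so that a Cholesky factorization exists; the matrices `Ba` and `M` below are small hypothetical stand-ins for $\mathcal{B}_{a}$ and $M_{c}$ , and the returned cost is only the attack-dependent part $(U^{a})^{\mathsf{T}}\mathcal{B}_{a}^{\mathsf{T}}\mathcal{B}_{a}U^{a}$ of the objective.

```python
import numpy as np


def optimal_attack(Ba, M, delta_tilde):
    """Return (U, J): scaled top generalized eigenvector of (Ba^T Ba, M) and its cost.

    Assumes M is symmetric positive definite (hypothetical simplification).
    """
    A = Ba.T @ Ba
    L = np.linalg.cholesky(M)            # M = L L^T
    Linv = np.linalg.inv(L)
    S = Linv @ A @ Linv.T                # symmetric, same eigenvalues as the pair (A, M)
    evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    rho_max = evals[-1]
    nu = Linv.T @ evecs[:, -1]           # satisfies A nu = rho_max M nu
    k = np.sqrt(delta_tilde / (nu @ M @ nu))
    return k * nu, delta_tilde * rho_max


# Hypothetical data for illustration only.
Ba = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
M = np.array([[2.0, 0.5], [0.5, 1.0]])
U_opt, J_opt = optimal_attack(Ba, M, delta_tilde=3.0)
print(U_opt, J_opt)
```

The returned vector meets the constraint with equality, consistent with Case (ii) in the proof of Theorem 5.1, and its cost equals $\widetilde{\delta}_{c}$ times the largest generalized eigenvalue.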

Remark 3

(Optimal attack vector against decentralized detector) To characterize the performance degradation of the system analytically, we consider a relaxed form of the detection constraint. Specifically, we design optimal attacks subject to $\overline{P}^{D}_{d}\leq\delta_{d}$ instead of $P^{D}_{d}\leq\delta_{d}$ , where $\overline{P}^{D}_{d}$ is an upper bound on $P^{D}_{d}$ (see Lemma 1.3). The design of optimal attacks that are undetectable by the decentralized detector can be formulated in the following way:

[TABLE]

where the summation in the detectability constraint follows from Lemma 1.3 and the fact that $\overline{P}^{D}_{d}\leq\delta_{d}$ becomes equivalent to $\sum_{i=1}^{N}\lambda_{i}\leq\widetilde{\delta}_{d}$ , where $\widetilde{\delta}_{d}=Q^{-1}_{p_{\text{sum}},\tau_{\text{min}}}(\delta_{d})$ , $p_{\text{sum}}=\sum_{i=1}^{N}p_{i}$ , and $\tau_{\text{min}}=\min\limits_{1\leq i\leq N}\tau_{i}$ . Let $\Pi_{i}$ be a permutation matrix such that $U_{i}^{a}=\Pi_{i}U^{a}$ , and let $\Pi=\left[\Pi_{1}^{\mathsf{T}},\ldots,\Pi_{N}^{\mathsf{T}}\right]^{\mathsf{T}}$ and $M_{d}=\Pi^{\mathsf{T}}\mathrm{blkdiag}(M_{1},\ldots,M_{N})\Pi$ . For any solution $U^{*}_{d}$ of (P.3), there exists $\gamma_{d}>0$ such that the pair $\left(U^{*}_{d},\gamma_{d}\right)$ solves the following optimality equations:

[TABLE]

Further, if $x(0)=0$ , then the largest degradation is $J^{*}_{d}=\widetilde{\delta}_{d}\,\rho_{\text{max}}\left(\mathcal{B}_{a}^{\mathsf{T}}\mathcal{B}_{a},M_{d}\right)$ . $\square$

Remark 4

(Maximum degradation of the system performance with respect to system noise) To see the effect of the noise level in the processed measurements on the system degradation, we consider the following covariance matrices: $\Sigma_{w_{i}}=\sigma^{2}I_{n_{i}}$ and $\Sigma_{v_{i}}=\sigma^{2}I_{r_{i}}$ , for $i\in\{1,\ldots,N\}$ . Then, from (24) we have

[TABLE]

where $\widetilde{M}_{c}=\left(N_{c}\mathcal{F}_{c}^{(a)}\right)^{\mathsf{T}}\left[\mathcal{F}_{c}^{(w)}\left(\mathcal{F}_{c}^{(w)}\right)^{\mathsf{T}}+I\right]^{-1}\left(N_{c}\mathcal{F}_{c}^{(a)}\right)$ . From (25) we note that the system degradation increases with the increase in the noise level, i.e., $\sigma^{2}$ . $\square$
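The dependence on $\sigma^{2}$ can be checked numerically with a small sketch. It assumes the relation $M_{c}=\widetilde{M}_{c}/\sigma^{2}$ , which is what the covariance choice in this remark suggests, and uses hypothetical $2\times 2$ matrices in place of $\mathcal{B}_{a}^{\mathsf{T}}\mathcal{B}_{a}$ and $\widetilde{M}_{c}$ .

```python
import numpy as np


def rho_max(A, M):
    """Largest generalized eigenvalue of the pair (A, M), assuming M is invertible."""
    return np.linalg.eigvals(np.linalg.solve(M, A)).real.max()


A = np.array([[2.0, 1.0], [1.0, 3.0]])        # stands in for B_a^T B_a (hypothetical)
M_tilde = np.array([[4.0, 1.0], [1.0, 2.0]])  # stands in for M~_c (hypothetical)

for sigma2 in (0.5, 1.0, 2.0):
    M_c = M_tilde / sigma2                    # assumed relation M_c = M~_c / sigma^2
    print(sigma2, rho_max(A, M_c) / rho_max(A, M_tilde))
```

Under this assumption the largest generalized eigenvalue, and therefore the degradation for a fixed $\widetilde{\delta}_{c}$ , scales linearly with $\sigma^{2}$ .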

6 Numerical comparison of centralized and decentralized detectors

In this section, we demonstrate our theoretical findings on the IEEE RTS-96 power network model [29], which we partition into three subregions as shown in Fig. 5. We followed the approach in [30] to obtain a linear time-invariant model of the power network, and then discretized it using a sampling time of $0.01$ seconds. For a false alarm probability $P^{F}_{c}=P^{F}_{d}=0.05$ , we consider the family of attacks $U^{a}=\sqrt{\theta/(\boldsymbol{1}^{\mathsf{T}}M_{c}\boldsymbol{1})}\boldsymbol{1}$ , where $\boldsymbol{1}$ is the vector of all ones and $\theta>0$ . It can be shown that the noncentrality parameters satisfy $\lambda_{c}=\theta$ and $\lambda_{i}=\theta(\boldsymbol{1}^{\mathsf{T}}M_{i}\boldsymbol{1})/(\boldsymbol{1}^{\mathsf{T}}M_{c}\boldsymbol{1})$ . Moreover, the choice of the vector $\boldsymbol{1}$ is arbitrary and does not affect the following results.
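The scaling of this attack family can be verified with a short sketch. The matrices below are hypothetical stand-ins for $M_{c}$ and $M_{1}$ (they are not derived from the RTS-96 model), and the helper names are assumptions made for illustration.

```python
import numpy as np


def scaled_ones_attack(M_c, theta):
    """U^a = sqrt(theta / (1^T M_c 1)) * 1, so that (U^a)^T M_c U^a = theta."""
    ones = np.ones(M_c.shape[0])
    return np.sqrt(theta / (ones @ M_c @ ones)) * ones


def local_noncentrality(M_i, M_c, theta):
    """lambda_i = theta * (1^T M_i 1) / (1^T M_c 1), with ones sized to each matrix."""
    ones_i = np.ones(M_i.shape[0])
    ones_c = np.ones(M_c.shape[0])
    return theta * (ones_i @ M_i @ ones_i) / (ones_c @ M_c @ ones_c)


# Hypothetical positive definite matrices standing in for M_c and M_1.
M_c = np.array([[3.0, 0.5, 0.0], [0.5, 2.0, 0.2], [0.0, 0.2, 1.0]])
M_1 = np.array([[1.0, 0.1], [0.1, 0.5]])
theta = 173.0

U = scaled_ones_attack(M_c, theta)
lambda_c = U @ M_c @ U
lambda_1 = local_noncentrality(M_1, M_c, theta)
print(lambda_c, lambda_1)
```

The normalization makes $\theta$ the single knob that sets $\lambda_{c}$ , which is why the figures in this section are parameterized by $\theta$ alone.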

(Illustration of Theorem 4.2) For a measurement horizon of $T=100$ seconds, the values of $p_{c}$ and $\tau_{c}$ are $5130$ and $5480.6$ , respectively. Fig. 6 shows that the detection probabilities of the centralized and decentralized detectors increase monotonically with the attack parameter $\theta$ . As predicted by the sufficient condition (20a) and shown in Fig. 6, the centralized detector is guaranteed to outperform the decentralized detector when $\theta>173$ . This figure also shows that our condition is conservative, because $P^{D}_{c}\geq P^{D}_{d}$ for all values of $\theta$ .

(Illustration of Theorem 4.3) Contrary to the previous example, by letting $T=125$ seconds, we obtain $p_{c}=6755$ and $\tau_{c}=6947.3$ . For this choice of parameters, the decentralized detector is guaranteed to outperform the centralized detector when $\theta\leq 511$ . This behavior is predicted by our sufficient condition (20b), and it is illustrated in Fig. 7. As in the previous example, the estimate provided by our condition (20b) is conservative, as illustrated in Fig. 7.

(Illustration of Lemma 5.2) In Fig. 8 we compare the performance degradation induced by the optimal attacks designed according to the optimization problems (P.2) and (P.3) with zero initial conditions. In particular, we plot the optimal costs $J^{*}_{c}$ and $J^{*}_{d}$ against the tolerance levels $\widetilde{\delta}_{c}$ and $\widetilde{\delta}_{d}$ , respectively. As expected, the performance degradation is proportional to the tolerance levels and, for the considered setup, it is larger in the case of the decentralized detector.

7 Conclusion

In this work we compare the performance of centralized and decentralized schemes for the detection of attacks in stochastic interconnected systems. In addition to quantifying the performance of each detection scheme, we prove the counterintuitive result that the decentralized scheme can, at times, outperform its centralized counterpart, and that this behavior results from the simple versus composite nature of the attack detection problem. We illustrate our findings through academic examples and a case study based on the IEEE RTS-96 power system. Several questions remain of interest for future investigation, including the characterization of optimal detection schemes, an analytical comparison of the degradation induced by undetectable attacks as a function of the detection scheme, and the analysis of iterative detection strategies.

APPENDIX

Proof of Lemma 3.1:

Since the attack vectors $U^{a}_{i}$ and $U^{a}$ are deterministic, and $W_{i}$ , $V_{i}$ , $V$ , and $W$ are zero mean random vectors, from the linearity of the expectation operator it follows from (8) that

[TABLE]

Further, from the properties of $\mathrm{Cov}[\cdot]$ , we have the following:

[TABLE]

where (a) follows because the measurement and process noises are independent of each other. Instead, (b) is due to the fact that the noise vectors are independent and identically distributed. A similar analysis yields the expression of $\Sigma_{c}$ , and hence the details are omitted. Finally, by invoking the fact that linear transformations preserve Gaussianity, the distributions of $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$ are Gaussian as well. $\square$

Proof of Lemma 3.2:

From the statistics and distributional form of $\widetilde{Y}_{i}$ and $\widetilde{Y}_{c}$ (see (9)), and threshold tests defined in (11) and (12), it follows from [31, Theorem 3.3.3] that, under

1. null hypothesis $H_{0}$ : $\Lambda_{i}\sim\chi^{2}(p_{i})$ and $\Lambda_{c}\sim\chi^{2}(p_{c})$ , where $p_{i}$ and $p_{c}$ are defined in (14).

2. alternative hypothesis $H_{1}$ : $\Lambda_{i}\sim\chi^{2}(p_{i},\lambda_{i})$ and $\Lambda_{c}\sim\chi^{2}(p_{c},\lambda_{c})$ , where $\lambda_{i}=\beta_{i}^{\mathsf{T}}\Sigma_{i}^{-1}\beta_{i}$ and $\lambda_{c}=\beta_{c}^{\mathsf{T}}\Sigma_{c}^{-1}\beta_{c}$ .

By substituting $\beta_{i}=N_{i}\mathcal{F}^{(a)}_{i}U^{a}_{i}$ and $\beta_{c}=N_{c}\mathcal{F}^{(a)}_{c}U^{a}_{c}$ (see Lemma 3.1) and rearranging the terms, we get the expressions of $\lambda_{i}$ and $\lambda_{c}$ in (14). Finally, from the aforementioned distributional forms of $\Lambda_{i}$ and $\Lambda_{c}$ , it now follows that the false alarm and the detection probabilities of the tests (11) and (12) are the right tail probabilities (represented by $Q(\cdot)$ function) of the central and noncentral chi-squared distributions, respectively. Hence, the expressions in (13) follow. $\square$

Proof of Lemma 3.3:

Without loss of generality let $i=1$ . Thus, it suffices to show that a) $p_{1}\leq p_{c}$ and b) $\lambda_{1}\leq\lambda_{c}$ .

Case (a): For brevity, define

[TABLE]

and note that $\widetilde{\Sigma}_{i}>0$ and $\widetilde{\Sigma}_{c}>0$ . From Lemma 3.1, Lemma 3.2, and (26), we have

[TABLE]

Similarly, $p_{1}=\text{Rank}\left(N_{1}\right)$ . Since $N_{1}^{\mathsf{T}}$ and $N_{c}^{\mathsf{T}}$ are bases of the null spaces $\mathcal{N}_{1}^{L}$ and $\mathcal{N}_{c}^{L}$ (see (37)), respectively, it follows from Proposition A.1 that $p_{1}\leq p_{c}$ .

Case (b): As the proof of this result is rather long, we break it down into multiple steps:

• Step 1: Express $\lambda_{1}$ and $\lambda_{c}$ using the statistics of a permuted version of $Y_{c}$ .

• Step 2: Obtain a lower bound on $\lambda_{c}$ , which depends on the statistics of the measurements pertaining to Subsystem 1.

• Step 3: Show that $\lambda_{1}$ is no greater than the bound in Step 2.

Step 1 (alternative form of $\lambda_{1}$ and $\lambda_{c}$ ):

Notice that $\lambda_{1}$ and $\lambda_{c}$ in (14) can be expressed as $\lambda_{1}=\beta_{1}^{\mathsf{T}}\Sigma_{1}^{-1}\beta_{1}$ and $\lambda_{c}=\beta_{c}^{\mathsf{T}}\Sigma_{c}^{-1}\beta_{c}$ , respectively, where $\beta_{1}$ , $\beta_{c}$ , $\Sigma_{1}$ , and $\Sigma_{c}$ are defined in Lemma 3.1. For convenience, we express $\lambda_{1}$ and $\lambda_{c}$ in an alternative way. Let $i\in\{1,\ldots,N\}$ and consider the $i$-th sensor measurements of (3)

[TABLE]

Also, define $Y_{c,i}=\begin{bmatrix}y_{c,i}^{\mathsf{T}}(1)&\ldots&y_{c,i}^{\mathsf{T}}(T)\end{bmatrix}^{\mathsf{T}}$ and $\widehat{Y}_{c}=\begin{bmatrix}Y_{c,1}^{\mathsf{T}}&\ldots&Y^{\mathsf{T}}_{c,N}\end{bmatrix}^{\mathsf{T}}$ . Now, from (27) and the state equation in (3), $Y_{c,i}$ can be expanded as

[TABLE]

where the matrices $\mathcal{O}_{c,i}$ , $\mathcal{F}^{(a)}_{c,i}$ , and $\mathcal{F}^{(w)}_{c,i}$ are similar to the matrices defined in Section II-A. By substituting the above decomposition of $Y_{c,i}$ in $\widehat{Y}_{c}$ we have

[TABLE]

Moreover, from the distributional assumptions on $W$ and $V$ , it readily follows that (similarly to the proof of Lemma 3.1),

[TABLE]

where $\Sigma=\left(\widehat{\mathcal{F}}^{w}_{c}\right)\left(I_{T}\otimes\Sigma_{w}\right)\left(\widehat{\mathcal{F}}^{w}_{c}\right)^{\mathsf{T}}+\left(I_{T}\otimes\Sigma_{v}\right)$ , and $\Sigma_{w}$ and $\Sigma_{v}$ are defined as in Lemma 3.1.

Now, consider the measurement equation $y_{i}(k)$ in (1) and note that $C_{c,i}x(k)=C_{i}x_{i}(k)$ . Thus, $y_{i}(k)=y_{c,i}(k)$ , for all $i\in\{1,\ldots,N\}$ and $k\in\mathbb{N}$ . From this observation it follows that $Y_{i}=Y_{c,i}=\Pi_{i}\widehat{Y}_{c}$ , where $\Pi_{i}$ is a selection matrix. Let $\widetilde{N}_{i}=N_{i}\Pi_{i}$ and note that $\widetilde{N}_{i}\widehat{\mathcal{O}}=N_{i}\mathcal{O}_{c,i}$ . Further from Proposition A.1 we have $N_{i}\mathcal{O}_{c,i}=0$ . With these facts in place, from Lemma 3.1 we now have

[TABLE]

Similarly, since $\widehat{Y}_{c}$ is just a rearrangement of $Y_{c}$ (see (5)), there exists a permutation matrix $Q$ such that $Y_{c}=Q\widehat{Y}_{c}$ , and, ultimately $\widetilde{Y}_{c}=N_{c}Y_{c}=N_{c}Q\widehat{Y}_{c}$ . Thus,

[TABLE]

Let $z=\widehat{\mathcal{F}}^{a}_{c}U^{a}$ . From (29) and (30) we have

[TABLE]

Step 2 (lower bound on $\lambda_{c}$ ):

Since $\widetilde{Y}_{c}=N_{c}Y_{c}=N_{c}Q\widehat{Y}_{c}$ , it follows that $N_{c}Q$ is a basis of the left null space of $\widehat{\mathcal{O}}_{c}$ . Further, the row vectors of $\mathcal{O}_{c,i}$ and $\mathcal{O}_{c,j}$ are linearly independent, whenever $i\neq j$ . Using these facts we can define $N_{c,i}=\begin{bmatrix}N_{c,i}^{1}&\cdots&N_{c,i}^{N}\end{bmatrix}$ such that $N_{c}Q=\begin{bmatrix}N_{c,1}^{\mathsf{T}}&\cdots&N_{c,N}^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}$ , where $N_{c,i}^{i}\mathcal{O}_{c,i}=0$ . Let $P_{1}=\begin{bmatrix}\left(N_{c,2}\right)^{\mathsf{T}}&\cdots&\left(N_{c,N}\right)^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}$ and note that

[TABLE]

Let $S_{1}=N_{c,1}\Sigma N_{c,1}^{\mathsf{T}}$ . Since $\Sigma>0$ , it follows that both the matrices $S_{1}$ and $P_{1}^{\mathsf{T}}\Sigma P_{1}$ are invertible. Hence, from Schur’s complement, there exists a matrix $X\geq 0$ such that

[TABLE]

Similarly, consider the following partition of $\Sigma$ :

[TABLE]

where $\Sigma_{11}>0$ and $\Sigma_{22}>0$ , and let $S_{2}=(N^{1}_{c,1})\Sigma_{11}(N^{1}_{c,1})^{\mathsf{T}}$ . Invoking Schur’s complement, we have the following:

[TABLE]

where $Y\geq 0$ . Substituting (32) and (33) in (31), we have

[TABLE]

Instead, $\lambda_{1}$ in (31) can be shown as

[TABLE]

where we used the fact that $\widetilde{N}_{1}=N_{1}\Pi_{1}$ .

Step 3 ( $\lambda_{c}\geq\lambda_{1}$ ):

For $\lambda_{c}\geq\lambda_{1}$ to hold true, it suffices to show the following:

[TABLE]

By invoking Proposition A.1, we note that there exists a full row rank matrix $F_{1}$ , such that $N_{1}=F_{1}N_{c,1}^{1}$ . Since $F_{1}^{\mathsf{T}}$ is a full column rank matrix, we can define an invertible matrix $\widetilde{F}_{1}^{\mathsf{T}}\triangleq\left[F_{1}^{\mathsf{T}}\,M_{1}^{\mathsf{T}}\right]$ , where $M_{1}$ forms a basis for null space of $F_{1}$ , such that the following holds

[TABLE]

By invoking Schur’s complement, it follows that

[TABLE]

where $Z\geq 0$ . Hence,

[TABLE]

By substituting $\widetilde{F}_{1}^{\mathsf{T}}=[F_{1}^{\mathsf{T}}\,M_{1}^{\mathsf{T}}]$ in the above expression, and rearranging the terms we have

[TABLE]

The required inequality follows by substituting $S_{2}=(N_{c,1}^{1})\Sigma_{11}(N_{c,1}^{1})^{\mathsf{T}}$ and $N_{1}=F_{1}N_{c,1}^{1}$ , and recalling the fact that the sum of two positive semidefinite matrices is greater than or equal to either of the matrices. $\square$

Proof of Lemma 4.1:

Let $\mathcal{E}_{i}$ be the event that the $i$-th local detector decides $H_{1}$ when the true hypothesis is $H_{0}$ . Then, $P^{F}_{i}=\mathrm{Pr}\left[\mathcal{E}_{i}\right]$ . Let $\mathcal{E}_{i}^{\complement}$ be the complement of $\mathcal{E}_{i}$ . Then, from (16) it follows that

[TABLE]

where for (a) we used the fact that the events $\mathcal{E}_{i}$ are mutually independent for all $i\in\{1,\ldots,N\}$ . To see this fact, notice that the event $\mathcal{E}_{i}$ is defined on $\widetilde{Y}_{i}$ (see (8)). Further, $\widetilde{Y}_{i}$ depends only on the deterministic attack signal $U^{a}_{i}$ and the noise vectors $V_{i}$ and $W_{i}$ , but not on the interconnection signal $U_{i}$ (see (6)). Now, by invoking the fact that the noise variables across different subsystems are independent, it follows that the events $\mathcal{E}_{i}$ are mutually independent. A similar procedure leads to the analogous expression for $P^{D}_{d}$ , and hence the details are omitted. $\square$

Proof of Theorem 4.2:

Let $\mu_{c}=p_{c}+\lambda_{c}$ and $\sigma_{c}=\sqrt{2(p_{c}+2\lambda_{c})}$ , and assume that (18) holds true. Then, from the monotonicity property of the CDF associated with the test statistic $\Lambda_{c}$ , which follows $\chi^{2}(p_{c},\lambda_{c})$ , we have the following inequality

[TABLE]

From the inequality (41b), it now follows that

[TABLE]

where for the last inequality we used the fact that $P^{D}_{i}\leq P^{D}_{\text{max}}$ for all $i\in\{1,\ldots,N\}$ . By using the above inequality and Lemma 3.2, under hypothesis $H_{1}$ , we have

[TABLE]

Proof of Theorem 4.3:

Let $\mu_{c}=p_{c}+\lambda_{c}$ and $\sigma_{c}=\sqrt{2(p_{c}+2\lambda_{c})}$ , and assume that (19) holds true. Then, from the monotonicity property of the CDF associated with the test statistic $\Lambda_{c}$ , which follows $\chi^{2}(p_{c},\lambda_{c})$ , we have the following inequality

[TABLE]

From the inequality (41a), it now follows that

[TABLE]

The result follows by substituting $P^{D}_{c}=1-\mathrm{Pr}\left[\Lambda_{c}\leq\tau_{c}|H_{1}\right]$ in the above inequality. $\square$

Proof of Theorem 5.1:

By recursively expanding the equality constraint of the optimization problem (P.2) we have

[TABLE]

By using the above identity, (P.2) can also be expressed as

[TABLE]

From the first-order necessary conditions [32] we now have

[TABLE]

where the gradient $\nabla$ is with respect to $U^{a}$ .

Case (i): Suppose $(U^{*}_{c})^{\mathsf{T}}M_{c}(U^{*}_{c})<\widetilde{\delta}_{c}$ . Then $\gamma=0$ must hold to satisfy the complementary slackness condition (36b). Using these observations in the KKT conditions we now have $\nabla f(U^{*}_{c})=0$ . Further, since $f(U^{a})$ is a convex function of $U^{a}$ , by evaluating the second derivative of $f(U^{a})$ at $U^{*}_{c}$ , it can be easily seen that the obtained $U^{*}_{c}$ results in the minimum value of (P.2) rather than the maximum. Thus, for any solution $U^{*}_{c}$ of (P.2), the condition $(U^{*}_{c})^{\mathsf{T}}M_{c}(U^{*}_{c})<\widetilde{\delta}_{c}$ cannot hold true.

Case (ii): Suppose $(U^{*}_{c})^{\mathsf{T}}M_{c}(U^{*}_{c})=\widetilde{\delta}_{c}$ . Then the KKT conditions can be simplified to the following:

[TABLE]

The result now follows by evaluating the derivative on the left hand side of the first equality. $\square$

Proof of Lemma 5.2:

By substituting $x(0)=0$ in (21a), we note that any optimal attack $U^{*}_{c}$ is of the form $k\nu$ , where $\nu$ is a generalized eigenvector of the pair $(\mathcal{B}^{\mathsf{T}}_{a}\mathcal{B}_{a},M_{c})$ [28], and the scalar $k=\sqrt{\widetilde{\delta}_{c}/(\nu^{\mathsf{T}}M_{c}\nu)}$ is obtained from (21b). Let $J_{c}$ be the optimal cost associated with an attack of the form $U^{*}_{c}=k\nu$ . Then,

[TABLE]

where the first equality follows from the fact that the objective function $\sum_{k=1}^{T}\overline{x}^{\mathsf{T}}(k)\overline{x}(k)$ in (P.2) can be expressed as $(U^{*}_{c})^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}_{a}\mathcal{B}_{a}(U^{*}_{c})$ , and the second equality follows from (21a). Since $\nu$ is a generalized eigenvector of the pair $(\mathcal{B}^{\mathsf{T}}_{a}\mathcal{B}_{a},M_{c})$ , it follows that $\gamma$ is the eigenvalue corresponding to $\nu$ and hence, $J_{c}$ is maximized when $\gamma$ is maximum, which is obtained for $\nu=\nu^{*}$ . The result follows since $\gamma=\rho_{\text{max}}$ for $\nu=\nu^{*}$ . $\square$

Proposition A.1

Let $\mathcal{O}_{i}$ and $\mathcal{F}_{i}^{(u)}$ be the observability and impulse response matrices defined in (6). Define

[TABLE]

where $\mathcal{O}_{c,i}=\begin{bmatrix}\left(C_{c,i}A\right)^{\mathsf{T}}&\cdots&\left(C_{c,i}A^{T}\right)^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}$ and $C_{c,i}=\begin{bmatrix}0&\cdots&C_{i}&\cdots&0\end{bmatrix}$ . Then, $\mathcal{N}^{L}_{i}\subseteq\mathcal{N}^{L}_{c,i}\subseteq\mathcal{N}^{L}_{c}$ , for all $i\in\{1,\ldots,N\}$ .

Proof 1.2.

Without loss of generality, let $i=1$ . By definition, the set inclusion $\mathcal{N}^{L}_{c,1}\subseteq\mathcal{N}^{L}_{c}$ is trivial. For the other inclusion, consider the system defined in (3) without the attack and noise, i.e., $x(k+1)=Ax(k)$ . Let $x(k)=\begin{bmatrix}x_{1}^{\mathsf{T}}(k)&u_{1}^{\mathsf{T}}(k)\end{bmatrix}^{\mathsf{T}}$ , where $x_{1}(k)$ and $u_{1}(k)$ are the state and the interconnection signal of Subsystem $1$ . Also, let

[TABLE]

Notice that, $x(k+1)=Ax(k)$ can be decomposed as

[TABLE]

By letting $\widetilde{C}_{1}=\begin{bmatrix}C_{1}A_{11}&C_{1}B_{1}\end{bmatrix}$ and recursively expanding $x_{1}(k)$ using (39), we have

[TABLE]

where the second, third, and fourth equalities follow from (38), system $x(k+1)=Ax(k)$ , and (39), respectively. By recalling that $\mathcal{O}_{c,1}x(0)=\begin{bmatrix}\left(C_{c,1}A\right)^{\mathsf{T}}&\cdots&\left(C_{c,1}A^{T}\right)^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}x(0)$ , it follows from (1.2) that

[TABLE]

Let $z$ be any vector such that $z^{\mathsf{T}}\begin{bmatrix}\mathcal{O}_{1}&\mathcal{F}^{(u)}_{1}\end{bmatrix}=0^{\mathsf{T}}$ . Then, $z$ also satisfies $z^{\mathsf{T}}\mathcal{O}_{c,1}=0^{\mathsf{T}}$ . Thus, $\mathcal{N}^{L}_{1}\subseteq\mathcal{N}^{L}_{c,1}$ .

Lemma 1.3.

(Upper bound on $P^{D}_{d}$ ) Let $p_{i}$ and $\lambda_{i}$ be defined as in (14), and $\tau_{i}$ be defined as in (11). Let $p_{\text{sum}}=\sum_{i=1}^{N}p_{i}$ , $\lambda_{\text{sum}}=\sum_{i=1}^{N}\lambda_{i}$ , and $\tau_{\text{min}}=\min\limits_{1\leq i\leq N}\tau_{i}$ . Then,

[TABLE]

where $S_{d}\sim\chi^{2}(p_{\text{sum}},\lambda_{\text{sum}})$ .

Proof 1.4.

Consider the following events:

[TABLE]

where the event $\mathcal{V}_{i}$ is associated with the $i$-th local detector’s threshold test. From the definition of the above events, it is easy to note that $\bigcup_{i=1}^{N}\mathcal{V}_{i}\subseteq\mathcal{V}$ . By the monotonicity of the probability measures, it follows that

[TABLE]

From the reproducibility property of the noncentral chi-squared distribution [33], it now follows that $\sum_{i=1}^{N}\widetilde{Y}_{i}^{\mathsf{T}}\Sigma^{-1}_{i}\widetilde{Y}_{i}$ equals $S_{d}$ in distribution and hence, $\mathrm{Pr}[\mathcal{V}|H_{1}]=\mathrm{Pr}[S_{d}>\tau_{\text{min}}]$ .
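The event inclusion behind this bound can be illustrated with a small Monte Carlo sketch. It assumes that $\mathcal{V}_{i}$ is the event $\{\Lambda_{i}>\tau_{i}\}$ and $\mathcal{V}$ is the event $\{\sum_{i}\Lambda_{i}>\tau_{\text{min}}\}$ (the displayed definitions are not reproduced here), that the local statistics are independent, and it uses hypothetical values for $p_{i}$ , $\lambda_{i}$ , and $\tau_{i}$ .

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local detector parameters: degrees of freedom, noncentrality, threshold.
p = np.array([4.0, 6.0, 5.0])
lam = np.array([1.0, 2.5, 0.5])
tau = np.array([12.0, 15.0, 13.0])

n_samples = 200_000
# Column i holds independent draws of Lambda_i ~ chi^2(p_i, lambda_i) under H_1.
samples = rng.noncentral_chisquare(p, lam, size=(n_samples, 3))

any_local_alarm = (samples > tau).any(axis=1)   # union of the events V_i
sum_exceeds = samples.sum(axis=1) > tau.min()   # event V

p_d_detect = any_local_alarm.mean()             # empirical estimate of P_d^D
upper_bound = sum_exceeds.mean()                # empirical estimate of Pr[S_d > tau_min]
print(p_d_detect, upper_bound)
```

Because every $\Lambda_{i}$ is nonnegative, each sample that triggers a local alarm also satisfies the sum condition, so the inclusion holds sample by sample and not only in probability.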

Lemma 1.5.

(Exponential bounds on the tails of $\chi^{2}(p,\lambda)$ ) Let $Y\sim\chi^{2}(p,\lambda)$ , $\mu=p+\lambda$ , $\sigma=\sqrt{2(p+2\lambda)}$ . For all $x>0$ ,

[TABLE]

Proof 1.6.

See [34].

Bibliography

[1] F. Pasqualetti, F. Dörfler, and F. Bullo, “Attack detection and identification in cyber-physical systems,” IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715–2729, Nov 2013.
[2] Y. Lun, A. D’Innocenzo, I. Malavolta, M. Domenica, and D. Benedetto, “Cyber-physical systems security: a systematic mapping study,” arXiv, 2016, available at https://arxiv.org/pdf/1605.09641.pdf.
[3] J. Chen and R. Patton, Robust Model-Based Fault Diagnosis for Dynamic Systems. Springer-Verlag New York, 1999.
[4] Y. Yuan, Q. Zhu, F. Sun, Q. Wang, and T. Basar, “Resilient control of cyber-physical systems against denial-of-service attacks,” in 2013 6th International Symposium on Resilient Control Systems (ISRCS), Aug 2013, pp. 54–59.
[5] H. Fawzi, P. Tabuada, and S. Diggavi, “Secure estimation and control for cyber-physical systems under adversarial attacks,” IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454–1467, 2014.
[6] S. Sridhar and M. Govindarasu, “Model-based attack detection and mitigation for automatic generation control,” IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 580–591, March 2014.
[7] L. Liu, M. Esmalifalak, Q. Ding, V. A. Emesih, and Z. Han, “Detecting false data injection attacks on power grid by sparse optimization,” IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 612–621, March 2014.
[8] H. Zhang, P. Cheng, L. Shi, and J. Chen, “Optimal denial-of-service attack scheduling with energy constraint,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 3023–3028, Nov 2015.
