Metrics Towards Measuring Cyber Agility

Jose David Mireles; Eric Ficke; Jin-Hee Cho; Patrick Hurley; Shouhuai Xu

arXiv:1906.05395·cs.CR·June 14, 2019

TL;DR

This paper introduces a novel metric framework to quantify cyber agility by analyzing the dynamic evolution of cyber attacks and defenses, transforming static security metrics into meaningful measures of strategic effectiveness.

Contribution

It presents the first systematic framework for measuring cyber agility, applicable to various static metrics, validated through real-world case studies.

Findings

01

Framework successfully quantifies cyber agility in case studies.

02

Transforms static security metrics into dynamic measures.

03

Highlights limitations and future research directions.

Abstract

In cyberspace, evolutionary strategies are commonly used by both attackers and defenders. For example, an attacker's strategy often changes over the course of time, as new vulnerabilities are discovered and/or mitigated. Similarly, a defender's strategy changes over time. These changes may or may not be in direct response to a change in the opponent's strategy. In any case, it is important to have a set of quantitative metrics to characterize and understand the effectiveness of attackers' and defenders' evolutionary strategies, which reflect their cyber agility. Despite its clear importance, few systematic metrics have been developed to quantify the cyber agility of attackers and defenders. In this paper, we propose the first metric framework for measuring cyber agility in terms of the effectiveness of the dynamic evolution of cyber attacks and defenses. The proposed framework is…

Tables

Table I: Summary of key notations and their meanings.

Notation	Description
$[0, T]$	Time horizon, $t \in \{0, 1, \dots, T\}$
$\mathcal{D}_{t}$	Defense at time $t \in [0, T]$
$\mathcal{A}_{t^{\prime}}$	Attack at time $t^{\prime} \in [0, T]$
$\mathcal{X}$	Attacker or defender
$\ell(\mathcal{X})$	Number of generations made by $\mathcal{X}$ during $[0, T]$
$\mathcal{M}$	Universe of static security metrics for measuring defense effectiveness, scaled in $[0, 1]$
$\mathcal{D}_{t}(\mathcal{A}_{t^{\prime}}, M)$	Effectiveness of defense $\mathcal{D}_{t}$ at time $t$ against attack $\mathcal{A}_{t^{\prime}}$ at time $t^{\prime}$ in terms of metric $M \in \mathcal{M}$
$\text{GT}(\mathcal{D}, t)$	Defender’s Generation-Time at time $t$
$\text{GT}(\mathcal{A}, t^{\prime})$	Attacker’s Generation-Time at time $t^{\prime}$
$\text{EGT}(\mathcal{D}, t)$	Defender’s Effective-Generation-Time at time $t$
$\text{EGT}(\mathcal{A}, t^{\prime})$	Attacker’s Effective-Generation-Time at time $t^{\prime}$
$\text{TT}(\mathcal{D}, t)$	Defender’s Triggering-Time at time $t$
$\text{TT}(\mathcal{A}, t^{\prime})$	Attacker’s Triggering-Time at time $t^{\prime}$
$\text{LBT}(\mathcal{X})$	Lagging-Behind-Time of $\mathcal{X}$
$\text{EE}(\mathcal{D}, t)$	Defender’s Evolutionary-Effectiveness at time $t$
$\text{EE}(\mathcal{A}, t^{\prime})$	Attacker’s Evolutionary-Effectiveness at time $t^{\prime}$
$\text{RGI}(\mathcal{X})$	Relative-Generational-Impact of $\mathcal{X}$
$\text{AGI}(\mathcal{X})$	Aggregated-Generational-Impact of $\mathcal{X}$

Equations

$\text{GT}(\mathcal{D}, i) = t_{i+1} - t_{i}$ for $i = 0, 1, \dots, \ell - 1$.

$\text{GT}(\mathcal{A}, j) = t^{\prime}_{j+1} - t^{\prime}_{j}$ for $j = 0, 1, \dots, k - 1$.

$\text{GT}(\mathcal{D}, 0) = t_{1} - t_{0}, \dots, \text{GT}(\mathcal{D}, \ell - 1) = t_{\ell} - t_{\ell - 1},$

$\text{GT}(\mathcal{A}, 0) = t^{\prime}_{1} - t^{\prime}_{0}, \dots, \text{GT}(\mathcal{A}, k - 1) = t^{\prime}_{k} - t^{\prime}_{k - 1},$

$\text{EGT}(\mathcal{D}, i) = t_{i^{*}} - t_{i}$

$\mathcal{D}_{t_{i} + \Delta t}(\mathcal{A}_{t_{i}}, M) \leq \mathcal{D}_{t_{i}}(\mathcal{A}_{t_{i}}, M)$ for any $0 < \Delta t < \text{EGT}(\mathcal{D}, i)$

$\mathcal{D}_{t_{i} + \text{EGT}(\mathcal{D}, i)}(\mathcal{A}_{t_{i}}, M) = \mathcal{D}_{t_{i^{*}}}(\mathcal{A}_{t_{i}}, M) > \mathcal{D}_{t_{i}}(\mathcal{A}_{t_{i}}, M);$

$\text{EGT}(\mathcal{A}, j) = t^{\prime}_{j^{*}} - t^{\prime}_{j}$

$\mathcal{D}_{t^{\prime}_{j}}(\mathcal{A}_{t^{\prime}_{j} + \Delta t}, M) \geq \mathcal{D}_{t^{\prime}_{j}}(\mathcal{A}_{t^{\prime}_{j}}, M)$ for any $0 < \Delta t < \text{EGT}(\mathcal{A}, j)$

$\mathcal{D}_{t^{\prime}_{j}}(\mathcal{A}_{t^{\prime}_{j} + \text{EGT}(\mathcal{A}, j)}, M) = \mathcal{D}_{t^{\prime}_{j}}(\mathcal{A}_{t^{\prime}_{j^{*}}}, M) < \mathcal{D}_{t^{\prime}_{j}}(\mathcal{A}_{t^{\prime}_{j}}, M);$

$\text{GT}(\mathcal{D}, t) \leq \text{EGT}(\mathcal{D}, t)$ and $\text{GT}(\mathcal{A}, t) \leq \text{EGT}(\mathcal{A}, t)$

$t^{\prime} = \arg\max_{0 \leq t^{\prime} < t_{i}} \left[\mathcal{D}_{t_{i}}(\mathcal{A}_{t^{\prime}}, M) - \mathcal{D}_{t_{i-1}}(\mathcal{A}_{t^{\prime}}, M)\right]$, where $\mathcal{D}_{t_{i}}(\mathcal{A}_{t^{\prime}}, M) > \mathcal{D}_{t_{i-1}}(\mathcal{A}_{t^{\prime}}, M)$.

$t = \arg\min_{0 \leq t < t^{\prime}_{j}} \left[\mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j}}, M) - \mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j-1}}, M)\right]$, where $\mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j}}, M) < \mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j-1}}, M)$.

$\mathcal{D}_{t}(\mathcal{A}_{t - \lambda}, M) \geq \varepsilon$ for $\lambda = 0, \dots, T$ and $t \geq \lambda$

$\text{LBT}(\mathcal{D}) = \min\{\lambda : \mathcal{D}_{t}(\mathcal{A}_{t - \lambda}, M) \geq \varepsilon,\ \lambda = 0, \dots, T,\ t \geq \lambda\}$.

$\mathcal{D}_{t}(\mathcal{A}_{t + \lambda}, M) \geq \varepsilon$ for $\lambda = 0, \dots, T$ and $t \leq T - \lambda$,

$\text{LBT}(\mathcal{A}) = \max\{\lambda : \mathcal{D}_{t}(\mathcal{A}_{t + \lambda}, M) \geq \varepsilon, \text{ where } \lambda = 0, \dots, T,\ t \leq T - \lambda\}$.

$\frac{1}{T - \lambda + 1} \sum_{t = \lambda}^{T} \mathcal{D}_{t}(\mathcal{A}_{t - \lambda}, M) \geq \varepsilon$.

$\text{EE}(\mathcal{D}, j) = \frac{1}{T + 1} \sum_{t = 0}^{T} \left[\mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j}}, M)\right]$.

$\text{EE}(\mathcal{A}, i) = \frac{1}{T + 1} \sum_{t^{\prime} = 0}^{T} \left[\mathcal{D}_{t_{i}}(\mathcal{A}_{t^{\prime}}, M)\right]$.

$\text{EE}(\mathcal{D}, j) = \frac{1}{T - t^{\prime}_{j} + 1} \sum_{t = t^{\prime}_{j}}^{T} \left[\mathcal{D}_{t}(\mathcal{A}_{t^{\prime}_{j}}, M)\right]$.

$\text{RGI}(t) = \mathcal{D}_{t}(\mathcal{A}_{t}, M) - \mathcal{D}_{t - 1}(\mathcal{A}_{t - 1}, M)$.

$G_{s}(i)$

$\text{AGI}(\mathcal{D}) = \frac{1}{T} \sum_{i = 1}^{T} G_{s}(i)$.

$\text{AGI}(T) = \frac{1}{T} \sum_{i = 1}^{z} G_{s}(i)$

$\mathcal{D}_{t}(\mathcal{A}_{t}, M) - \mathcal{D}_{t}(\mathcal{A}_{t^{\prime}}, M) > \tau$.

Full text

Metrics Towards Measuring Cyber Agility

Jose David Mireles, Eric Ficke, Jin-Hee Cho, Patrick Hurley, and Shouhuai Xu

J. Mireles, E. Ficke, and S. Xu are with the Department of Computer Science, University of Texas at San Antonio. J.H. Cho is with the Department of Computer Science, Virginia Tech. P. Hurley is with the U.S. Air Force Research Laboratory, Rome, NY. Correspondence: [email protected]

Abstract

In cyberspace, evolutionary strategies are commonly used by both attackers and defenders. For example, an attacker’s strategy often changes over the course of time, as new vulnerabilities are discovered and/or mitigated. Similarly, a defender’s strategy changes over time. These changes may or may not be in direct response to a change in the opponent’s strategy. In any case, it is important to have a set of quantitative metrics to characterize and understand the effectiveness of attackers’ and defenders’ evolutionary strategies, which reflect their cyber agility. Despite its clear importance, few systematic metrics have been developed to quantify the cyber agility of attackers and defenders. In this paper, we propose the first metric framework for measuring cyber agility in terms of the effectiveness of the dynamic evolution of cyber attacks and defenses. The proposed framework is generic and applicable to transform any relevant, quantitative, and/or conventional static security metrics (e.g., false positives and false negatives) into dynamic metrics to capture dynamics of system behaviors. In order to validate the usefulness of the proposed framework, we conduct case studies on measuring the evolution of cyber attacks and defenses using two real-world datasets. We discuss the limitations of the current work and identify future research directions.

Index Terms:

Security metrics, agility metrics, cyber agility, cyber maneuverability, measurements, attack, defense

I Introduction

In order to maximize the effectiveness of cyber attacks or defenses, both cyber attackers and defenders frequently evolve their strategies. The rule-of-thumb is that cyber attackers are more agile in adapting their strategies than cyber defenders, because cyber defenders often tend to take reactive responses to new attacks. Accordingly, cyber attack incidents are frequently reported in news media. However, the state-of-the-art technology does not provide quantitative metrics that can measure how well cyber attackers or defenders are able to adapt or update their resources over time. We call this problem measuring cyber agility. Cyber agility and its quantification have recently been recognized as critical cybersecurity issues that are little understood [1, 2, 3, 4, 5].

In this paper, we take a first step towards tackling the problem of measuring cyber agility. We propose a systematic set of quantitative metrics to measure cyber attack and defense evolution generations (or simply generations), which are cyber attack and defense updates that can be considered as “building-blocks” or “atomic moves” used by cyber attackers and defenders in their operational practice. The notion of generations is important because cyber attacks and defenses often evolve over time. It is also important to see that the causality of evolution generations may differ from one generation to another. For example, some evolution generations are caused by specific opponent moves, but others are not (e.g., moving-target defense is not necessarily caused by any specific attacks). Intuitively, we can quantify the relative agility of cyber attackers and defenders by characterizing their evolving strategies and measuring their effectiveness during the course of interplay between cyber attackers and cyber defenders.

The aforementioned dynamic view of cybersecurity metrics, which we pursue in the present paper, contrasts with the conventional static view of cybersecurity metrics as follows: the dynamic view reflects a system’s evolution over a period of time, while the static view captures measurements of metrics at a certain time point or in an aggregated way. To the best of our knowledge, this is the first work that defines systematic metrics to quantify the effectiveness of cyber attack and defense evolution, geared towards measuring cyber agility.

I-A Key Contributions

This work makes the following key contributions:

•

Development of a system-level evolutionary metric framework: We develop a dynamic metric framework that can deliver more information about system behaviors than static metrics. There exist some time-dependent metrics that measure the outcome of attack-defense interactions at every time point $t$ [6, 7, 8, 9, 10, 11]. However, our proposed metric framework takes one step beyond those time-dependent metrics because it is capable of measuring, at time $t$ , the system state at both a past time $t^{\prime}$ (where $t^{\prime}<t$ ) and a future time $t^{\prime\prime}$ (where $t^{\prime\prime}>t$ ). This means that the framework can be used for retrospective security analysis (e.g., identifying what a defender did right or wrong in the past). In contrast, the time-dependent metrics [6, 7, 8, 9, 10, 11] only measure the defense effectiveness at $t$ when the measurement is made at $t$ .

•

Transformation of static metrics into dynamic metrics: We transform conventional static metrics into dynamic metrics to measure attack and defense effectiveness at each evolution generation. For example, static metrics (e.g., false-positive or false-negative rate) can be transformed into dynamic metrics to capture a dynamic sequence of evolution generations by attackers or defenders. Thus, the framework can be viewed as a “compiler” that transforms static metrics into dynamic metrics.

•

Validation of the proposed metric framework using real datasets: In order to validate the framework, we apply it to analyze two real-world datasets: one is the network traffic collected at a honeypot instrument [12], and the other corresponds to the DEFCON Capture The Flag (CTF) exercise [13]. We use the Snort intrusion detection system (IDS) [14] as a defense mechanism, whose detection capabilities are frequently updated (reflecting defense generations that may or may not be caused by specific attack evolution generations).

In the present study, we focus on proposing a systematic framework with clear definitions of cyber agility metrics. Although our case study is limited by the datasets we have access to, we hope that researchers with access to semantically richer datasets can apply our framework to their datasets to draw deeper insights towards taming cyber agility.

The remainder of the paper is organized as follows. Section II discusses the background and related prior studies. Section III presents the proposed metric framework. Section IV discusses the insights obtained from Section III. Section V presents the case study on using the framework to analyze two real datasets. Section VI discusses limitations of this study and future research directions. Section VII concludes this paper.

II Background & Related Work

II-A Concept of Agility and Agility Metrics

Agility has been recognized as one of the key system metrics, but has been studied only in an ad-hoc manner [1, 15]. It has been investigated in multiple domains [16, 17, 18, 19, 20, 21, 22]. In an enterprise system, agility is defined as the ability to deal with sudden changes in an environment (e.g., the latency of response to sudden or unexpected changes [1, 16, 17, 18, 22]). In the systems engineering domain, agility measures a system’s capability to respond reactively or proactively to sudden environmental changes [23]. In military settings, agility refers to the ability of an entity to take effective action in dynamic, unexpected environments that may threaten the system’s sustainability [19]. For example, the qualitative notion of agility quotient is proposed to accommodate six attributes, including responsiveness, versatility, flexibility, resilience, adaptiveness, and innovativeness [21].

In the cybersecurity domain, agility often refers to reasoned changes to a system or environment in response to functional or security needs, but it has not received due attention until very recently [15, 1, 2, 3, 4, 5]. While it would be intuitive to understand agility as how fast (e.g., response ability [23]) and how effectively a system can adapt its configuration to unexpected attacks against it (e.g., considering the cost incurred by the degraded system performance or the cost incurred by response actions [1, 24]), the concept of agility is, like other metrics, elusive to formalize. Indeed, there is no rigorous or quantitative definition of agility in the cybersecurity domain, and existing attempts to model agility have not been able to produce concrete measurements [2, 25, 26]. Despite their apparent importance, systematic and quantitative agility metrics have not been studied or understood in depth.

In this paper, we tackle the problem of measuring agility in the cybersecurity domain and propose an agility metric framework based on evolution generations of cyber attacks and defenses. To the best of our knowledge, this is the first systematic metric framework for understanding and measuring cyber agility. Our framework is both general and flexible because it can accommodate attack and defense evolution generations that may or may not be incurred by specific opponent evolution generations.

II-B Dynamic Security Metrics vs. Agility Metrics

The importance of security metrics and the challenges of developing useful security metrics have been recognized by security communities [27, 28, 29, 30, 31]. Most metrics proposed in the literature are static in nature because they are often defined without considering dynamics over time [4, 5, 32]. This implies that static metrics can capture either a system’s snapshot at a particular time or a system’s overall behavior/state for a period of time; this static view can easily overlook the evolution of attacks and/or defenses over the time horizon. For example, the effectiveness of anti-malware tools (e.g., the false positive rate or false negative rate) is often measured based on malware samples collected during a period of time (e.g., one year), while ignoring their instantaneous evolution over time. Taking one step further from static metrics, time-dependent security metrics have been studied to characterize and quantify system states at different times, such as the proportion of compromised computers in a network [6, 7, 8, 9, 10, 11, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42].

The dynamic metrics proposed in this paper aim to capture a system’s state, covering both a previous time $t^{\prime}$ and a future time $t^{\prime\prime}$ , when a measurement is made at $t$ where $t^{\prime}<t<t^{\prime\prime}$ . On the other hand, the time-dependent metrics measure the system’s current state at time $t$ , as mentioned in Section I-A. For example, the effectiveness of an IDS can be characterized by its true-positive rate and/or false-negative rate, which may not stay the same over time because its decision engine (e.g., the rule set in the case of Snort) is frequently updated.

There have been other proposals for dynamic security metrics [4], including (i) metrics for measuring the strength of preventive defense (e.g., reaction time between the observation of an adversarial entity at time $t$ and the blacklisting of the adversarial entity at time $t^{\prime}$ [43]); (ii) metrics for measuring the strength of reactive defense (e.g., detection time between the time $t$ at which a compromised computer starts to wage attacks and the time $t^{\prime}$ at which the attack is first observed by some cyber defense instrument [44, 45]); (iii) metrics for measuring the strength of overall defense (e.g., penetration resistance for measuring the level of effort that is imposed on a red team in order to penetrate into a cyber system [46, 47]); and (iv) metrics for measuring and forecasting cyber threats and incidents [48, 49, 50, 51, 52, 53]. Although these metrics are related to time, they are geared towards individual security events. In contrast, our framework is systematic and correlates attacks and defenses over the time horizon.

III The Metrics Framework

The proposed framework aims to define metrics to measure the effectiveness of attack and defense generations during the course of attack-defense interactions.

III-A Guiding Principles

The framework is designed under the following principles:

•

Leveraging static security metrics: Although most existing security metrics measure the static aspects of a system’s security, some of them are well defined and commonly accepted, such as detection errors (e.g., false-positives or false-negatives) for an IDS or anti-malware system. The framework aims to accommodate static metrics that were defined in the past or may be defined in the future.

•

Considering the evolution of both attack and defense behaviors: The framework aims to understand and improve defenders’ evolution over the course of time. To this end, we will answer the following research questions:

–

To what extent is an attacker or defender evolving based on its opponent’s new strategy?

–

Which evolutionary strategy is more effective from an attacker’s or defender’s perspective?

–

Which party (i.e., an attacker or defender) is more active over the course of attack-defense interactions?

•

Identifying the core metrics to measure systems security: Since the effectiveness of attack and defense generations may not be adequately reflected by a single metric, we consider a suite of metrics that measure evolution generations from multiple perspectives. These metrics may be then aggregated using an appropriate method (e.g., a weighted average).

•

Coping with new or zero-day attacks: Current defenses have very limited power to detect new or zero-day attacks. For a clear understanding of the effectiveness of attack and defense evolutions, we measure generations by tracing recorded network traffic and/or computer execution. This allows us to characterize defense failures in retrospect.

III-B Representation of Attacks and Defenses over Time

In this paper, the term “target system” is used to represent a range of systems, from a single computer (or device) to an enterprise network or a cloud. A target system is defended by human defenders who can use a variety of defense tools. Therefore, “attackers” and “defenders” refer to either humans, automated systems, or a combination of them (depending on the practice). The term “attack and defense generations” is used to refer to the atomic evolution of behaviors by attackers and defenders.

Specifically, we consider time horizon $t\in[0,T]$ , where $T$ can be infinite in theory (i.e., $T=\infty$ ) but is often finite in practice (i.e., $T<\infty$ ). Since most metrics are often measured at discrete times (e.g., daily or hourly), we consider discrete time over $t=0,1,\ldots,T$ . That is, we treat each generation as if it happens at an instant at the beginning of each time unit (e.g., day or hour). In practice, it is possible that attacks and defenses are respectively observed over time intervals, say, $[t_{1},t_{2}]$ and $[t^{\prime}_{1},t^{\prime}_{2}]$ , where $t_{1}\neq t^{\prime}_{1}$ and $t_{2}\neq t^{\prime}_{2}$ . In this case, we can treat $\min(t_{1},t^{\prime}_{1})$ as time 0 and $\max(t_{2},t^{\prime}_{2})$ as time $T$ .
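The alignment of two observation intervals onto one discrete horizon can be sketched as follows. This is a minimal illustration, not part of the framework itself: the function name `common_horizon` and the example interval endpoints are hypothetical, and time stamps are assumed to be integers already expressed in the chosen time unit.

```python
def common_horizon(interval_a, interval_b):
    """Map two observation intervals onto one discrete horizon 0..T.

    interval_a and interval_b are (start, end) pairs in the same time unit,
    e.g. the attack and defense observation windows (hypothetical inputs).
    Returns T and a function that shifts a raw time stamp onto the horizon.
    """
    start_a, end_a = interval_a
    start_b, end_b = interval_b
    origin = min(start_a, start_b)   # treated as time 0
    end = max(end_a, end_b)          # treated as time T
    horizon_length = end - origin

    def shift(raw_time):
        # Re-express a raw time stamp relative to the common origin.
        return raw_time - origin

    return horizon_length, shift


# Illustrative numbers only: one window covers days 5..20, the other days 3..18.
T, shift = common_horizon((5, 20), (3, 18))
print(T, shift(5))
```

With these illustrative inputs, the earlier start (day 3) becomes time 0 and the later end (day 20) becomes time $T$.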

We use the term “defense” at discrete time $t$ to refer to the defense tool and the human defender(s) employed at time $t$ . This is important because defense tools may be updated with newer versions and human defenders may join/leave a defense team at any point in time. Such a change in defense produces a new defense “generation”. We denote the defense at time $t$ by ${\mathcal{D}}_{t}$ , where ${\mathcal{D}}_{0},{\mathcal{D}}_{1},\ldots,{\mathcal{D}}_{T}$ represent the evolution of the defense over time $t=0,1,\ldots,T$ . Note that ${\mathcal{D}}_{t}={\mathcal{D}}_{t+1}$ indicates that the defense at time $t$ and the defense at time $t+1$ are the same and therefore belong to the same defense generation. Note also that ${\mathcal{D}}_{t}$ can be described by a nominal scale, such as a version number for a defense tool or an action by a human defender. In this way, we can check whether or not the defense at time $t$ belongs to a different generation than the defense at time $t+1$ . For example, suppose a target system’s defense is started at $t=0$ (e.g., Jan. 1). Suppose the time unit is a day and the defense at $t=0$ consists of a human defender and an attack-detection tool with version 10.1.2. Suppose the attack-detection tool is updated on the first day of each month. This means that ${\mathcal{D}}_{0}={\mathcal{D}}_{1}=\ldots={\mathcal{D}}_{30}\neq{\mathcal{D}}_{31}$ , where ${\mathcal{D}}_{31}$ refers to the same human defender but the attack-detection tool is version (say) 10.1.3 on Feb. 1.

Similarly, we use the term “attack” to describe attack tools, attack tactics, or attack vectors as well as the human attackers exploiting these attack tools, attack tactics, or attack vectors. Let ${\mathcal{A}}_{t}$ denote an attack against a target system at time $t$ , and ${\mathcal{A}}_{0},{\mathcal{A}}_{1},\ldots,{\mathcal{A}}_{T}$ represent the evolution of the attack over $t=0,1,\ldots,T$ . Note also that ${\mathcal{A}}_{t}$ can be measured by a nominal scale. Hence, we can detect if two attacks performed at two different points in time belong to the same generation.
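Because ${\mathcal{D}}_{t}$ and ${\mathcal{A}}_{t}$ are described on a nominal scale, the times at which a new generation starts can be recovered by comparing consecutive labels. The sketch below assumes each time step carries one comparable label (for instance a version string); the function name `generation_times` and the label values are illustrative assumptions.

```python
def generation_times(labels):
    """Return the times t at which a new generation starts.

    labels[t] is a nominal description of the defense (or attack) at time t,
    e.g. a tool version string. Time 0 always starts the first generation.
    """
    if not labels:
        return []
    times = [0]
    for t in range(1, len(labels)):
        # A change of label between t-1 and t marks a new generation at t.
        if labels[t] != labels[t - 1]:
            times.append(t)
    return times


# The monthly-update example: version 10.1.2 during January (t = 0..30),
# version 10.1.3 from Feb. 1 (t = 31) onwards.
defense_labels = ["10.1.2"] * 31 + ["10.1.3"] * 28
print(generation_times(defense_labels))  # [0, 31]
```

The same function applies unchanged to a sequence of attack labels.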

Remark. In the discrete-time model, attack and defense generations are assumed to evolve at discrete, deterministic time points. In practice, generations can evolve over stochastic time, implying that the time resolution should be sufficiently small. When the monitoring time interval is infinitely small, a continuous-time model should be used. In reality, the highest time resolution is what can be measured by a computer clock. The investigation on whether to consider a continuous-time model or not is beyond the scope of this paper.

III-C Representation of Defense Effectiveness over Time

The effectiveness of defense ${\mathcal{D}}_{t}$ against attack ${\mathcal{A}}_{t}$ at time $t\in[0,T]$ is measured by some static metrics (e.g., false-positive rate or false-negative rate). A metric $M$ is a mathematical function that maps the target system (or a particular property or attribute of the target system) at time $t$ to a value in a range (e.g., false-positive rate) [4]. In order to make the presentation succinct, we assume that the range of any metric $M\in{\mathcal{M}}$ can be normalized to $[0,1]$ , where a larger value is more desirable from a defender’s point of view (e.g., a larger value means a higher level of security). Some metrics with an opposite meaning (e.g., a smaller false-positive rate, denoted by $fp$ , is better) can be adjusted to be consistent with the scale in $M$ (e.g., using $1-fp$ instead of $fp$ ).

Let ${\mathcal{M}}$ denote the universe of static metrics, including both existing metrics and metrics that are yet to be defined. For a metric $M\in{\mathcal{M}}$ , we use ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ to denote the effectiveness of defense ${\mathcal{D}}_{t}$ against attack ${\mathcal{A}}_{t}$ at time $t\in[0,T]$ in terms of metric $M$ . By considering metric $M\in{\mathcal{M}}$ over time $t=0,1,\ldots,T$ , we obtain a sequence of effectiveness measurements, which can be leveraged to define cyber defense agility as described below.
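As a small illustration of the normalization convention, the sketch below converts a sequence of static measurements into a defender-oriented effectiveness sequence in $[0,1]$, using $1-fp$ for a false-positive rate. The function name `normalize` and the daily rates are hypothetical values chosen only for illustration, not measurements from the paper.

```python
def normalize(value, higher_is_better=True):
    """Map a rate in [0, 1] onto the defender-oriented scale (larger is better)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("metric value must lie in [0, 1]")
    # Metrics where smaller is better (e.g. false-positive rate) are flipped.
    return value if higher_is_better else 1.0 - value


# Hypothetical false-positive rates observed at t = 0, 1, 2, 3.
fp_rates = [0.10, 0.10, 0.25, 0.05]

# Sequence of effectiveness values, one per time step, all on the same scale.
effectiveness = [normalize(fp, higher_is_better=False) for fp in fp_rates]
print(effectiveness)
```

Keeping every metric on the same "larger is better" scale is what allows the later definitions to compare effectiveness values across time with a single inequality direction.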

Table I summarizes the main notations used in this paper.

III-D Example Scenario

Fig. 1 shows an example used throughout the rest of this section. Fig. 1 (a) illustrates that a defender evolved (e.g., updated a security software version) at $t=0,3$ and $4$ while making no changes at $t=1,2,5$ and $6$ . An attacker evolved (e.g., changed an attack strategy) at $t=0,4$ and $6$ while making no changes at $t=1,2,3$ and $5$ . Fig. 1 (b) views attack-defense generations at the same time scale axis, by demonstrating how the attacker’s evolution times may or may not coincide with the defender’s evolution times.

III-E Overview of Metrics

At a high level, we consider two dimensions of evolution: timeliness and effectiveness. Timeliness reflects the time it takes to evolve new generations while effectiveness reflects impacts of these generations. However, timeliness-oriented metrics can use effectiveness as a reference, and effectiveness-oriented metrics can use time as a reference. Fig. 2 summarizes these metrics and the structural relationship between them.

III-E1 Timeliness-Oriented Metrics

This suite of metrics measures how quickly one adversarial party (i.e., an attacker or defender) evolves its strategies with or without considering the resulting effectiveness. This suite contains 4 metrics, which are equally applicable to both an attacker and defender, leading to 8 metrics in total. The 4 metrics are as follows:

•

Generation-Time (GT) measures the time between two consecutive generations of strategies that are observed by the measuring party (i.e., an attacker or defender).

•

Effective-Generation-Time (EGT) measures the time it takes for a party to evolve a generation which indeed increases the effectiveness against the opponent.

•

Triggering-Time (TT) measures the length of time since the opponent’s reference generation that (if observed) may have triggered a particular generation.

•

Lagging-Behind-Time (LBT) measures how long a party lags behind its opponent with respect to a reference time.

These 4 metrics are random variables in nature, sampled over the time horizon $[0,T]$ .

III-E2 Effectiveness-Oriented Metrics

This suite of metrics measures the effectiveness of generations over the course of the evolution. This suite contains 3 metrics, which are equally applicable to both an attacker and defender, leading to 6 metrics in total. The 3 metrics are as follows:

•

Evolutionary-Effectiveness (EE) measures the overall effectiveness of generations with respect to the opponent’s generation. This is a random variable over $t\in[0,T]$ .

•

Relative-Generational-Impact (RGI) measures the effectiveness gained by generation $i$ over that of generation $i-1$ .

•

Aggregated-Generational-Impact (AGI) measures the gain in the effectiveness of all generations $t\in[0,T]$ .

III-E3 Relationship between the Metrics

Fig. 3 systematizes the relationship between the aforementioned 14 metrics. These metrics are organized in two dimensions: time ( $x$ -axis) and effectiveness ( $y$ -axis). The metrics in the upper half of the plane (i.e., $y>0$ ) represent the defender’s perspective; the metrics in the lower half of the plane (i.e., $y<0$ ) represent the attacker’s perspective. The metrics in the right-hand half of the plane (i.e., $x>0$ ) look forward in time based on some reference point; the metrics in the left-hand half of the plane (i.e., $x<0$ ) look backward in time based on some reference point.

A metric closer to the $x$ -axis indicates a more time-oriented perspective, while the other metrics are more oriented towards an effectiveness perspective. A metric on the $x$ -axis is defined primarily based on the time dimension, including GT and LBT. Effectiveness-oriented metrics are RGI and AGI, where ${\text{AGI}}({\mathcal{D}})=-{\text{AGI}}({\mathcal{A}})$ . The metrics defined from the defender’s perspective and the metrics defined from the attacker’s perspective are symmetric across the $x$ -axis. ${\text{EE}}({\mathcal{D}})$ and ${\text{EE}}({\mathcal{A}})$ are defined over the entire plane because they look both backward and forward in time.

III-F Timeliness-Oriented Metrics

III-F1 Generation-Time (GT)

This metric measures the time it takes for a party to evolve its strategy, which may or may not be induced by the opponent’s evolution in strategy. GT is a random variable because generations often evolve based on some stochastic events.

Defenders’ GT, denoted by ${\text{GT}}({\mathcal{D}})$ : Suppose the defense is evolved at $t_{0}=0,t_{1},\ldots,t_{\ell}\leq T$ , namely $\{t_{0},t_{1},\ldots,t_{\ell}\}\subseteq[0,T]$ . The defender’s GT, namely random variable ${\text{GT}}({\mathcal{D}})$ , is sampled by ${\text{GT}}({\mathcal{D}},i)$ ’s, where

${\text{GT}}({\mathcal{D}},i)=t_{i+1}-t_{i}~~\text{for}~~i=0,1,\ldots,\ell-1.$

This implies that ${\mathcal{D}}_{t_{i}+\Delta t}={\mathcal{D}}_{t_{i}}$ for any $\Delta t<t_{i+1}-t_{i}$ and ${\mathcal{D}}_{t_{i}+{\text{GT}}({\mathcal{D}},i)}={\mathcal{D}}_{t_{i+1}}\neq{\mathcal{D}}_{t_{i}}$ because the defense is not evolved until time $t_{i+1}$ . Consider the example in Fig. 1 (a), where the defense generations evolve at $t=0,3$ and $4$ , meaning $t_{0}=0$ , $t_{1}=3$ , $t_{2}=4$ , ${\mathcal{D}}_{0}={\mathcal{D}}_{1}={\mathcal{D}}_{2}$ , and ${\mathcal{D}}_{4}={\mathcal{D}}_{5}={\mathcal{D}}_{6}$ . Therefore, the defender’s GT is a random variable sampled by ${\text{GT}}({\mathcal{D}},0)=t_{1}-t_{0}=3$ and ${\text{GT}}({\mathcal{D}},1)=t_{2}-t_{1}=1$ in this toy example.

Attackers’ GT, denoted by ${\text{GT}}({\mathcal{A}})$ : Suppose the attack evolves at $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}\leq T$ , namely $\{t^{\prime}_{0},t^{\prime}_{1},\ldots,t^{\prime}_{k}\}\subseteq[0,T]$ , where the notation $t^{\prime}$ (rather than $t$ ) highlights the attacker’s perspective. Then, the attacker’s GT, namely random variable ${\text{GT}}({\mathcal{A}})$ , is sampled by ${\text{GT}}({\mathcal{A}},j)$ ’s, where

$${\text{GT}}({\mathcal{A}},j)=t^{\prime}_{j+1}-t^{\prime}_{j}\quad\text{for }j=0,1,\ldots,k-1. \tag{2}$$

This means that ${\mathcal{A}}_{{t^{\prime}_{j}}+\Delta t}={\mathcal{A}}_{t^{\prime}_{j}}$ for any $\Delta t<t^{\prime}_{j+1}-t^{\prime}_{j}$ and that ${\mathcal{A}}_{t^{\prime}_{j}+{\text{GT}}({\mathcal{A}},j)}={\mathcal{A}}_{t^{\prime}_{j+1}}\neq{\mathcal{A}}_{t^{\prime}_{j}}$ . Consider the example illustrated in Fig. 1 (a), where the attack is evolved at $t^{\prime}=0,4$ and $6$ . This means $t^{\prime}_{0}=0$ , $t^{\prime}_{1}=4$ , $t^{\prime}_{2}=6$ , ${\mathcal{A}}_{0}={\mathcal{A}}_{1}={\mathcal{A}}_{2}={\mathcal{A}}_{3}$ , and ${\mathcal{A}}_{4}={\mathcal{A}}_{5}$ . The attacker’s GT is a random variable sampled by ${\text{GT}}({\mathcal{A}},0)=t^{\prime}_{1}-t^{\prime}_{0}=4$ and ${\text{GT}}({\mathcal{A}},1)=t^{\prime}_{2}-t^{\prime}_{1}=2$ in this toy example.

Summarizing the preceding discussion, we have:

Definition 1

(GT)* The defender’s GT is defined as random variable ${\text{GT}}({\mathcal{D}})$ , sampled by:*

$${\text{GT}}({\mathcal{D}},i)=t_{i+1}-t_{i}\quad\text{for }i=0,1,\ldots,\ell-1, \tag{3}$$

where $t_{0}=0,t_{1},\ldots,t_{\ell}\leq T$ is the sequence of points in time at which the defense evolves. The attacker’s GT is defined as a random variable ${\text{GT}}({\mathcal{A}})$ , sampled by:

$${\text{GT}}({\mathcal{A}},j)=t^{\prime}_{j+1}-t^{\prime}_{j}\quad\text{for }j=0,1,\ldots,k-1, \tag{4}$$

where $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}\leq T$ is the sequence of points in time at which the attack evolves.
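As a minimal sketch of Definition 1, the GT samples are simply the gaps between consecutive evolution times. The function name `generation_times` is an illustrative assumption and is not part of the framework; the inputs reproduce the toy example of Fig. 1 (a).

```python
from typing import List, Sequence


def generation_times(evolution_times: Sequence[int]) -> List[int]:
    """Return the GT samples t_{i+1} - t_i for consecutive evolution times."""
    times = list(evolution_times)
    # Evolution times are expected to be strictly increasing within [0, T].
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("evolution times must be strictly increasing")
    return [later - earlier for earlier, later in zip(times, times[1:])]


defender_gt = generation_times([0, 3, 4])  # defense evolves at t = 0, 3, 4
attacker_gt = generation_times([0, 4, 6])  # attack evolves at t' = 0, 4, 6
print(defender_gt, attacker_gt)
```

The two lists are the samples of ${\text{GT}}({\mathcal{D}})$ and ${\text{GT}}({\mathcal{A}})$ discussed in the toy example; any summary statistic (mean, distribution) can then be computed over them.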

III-F2 Effective-Generation-Time (EGT)

This metric considers the attack-defense generations as a whole by measuring the time to make effective generations by an attacker or defender. Note that EGT is different from GT because the latter only focuses on timeliness. Moreover, GT may not be able to reveal a relationship with respect to the opponent’s generations because not every generation is the result of adversarial action. Measuring EGT will allow a party to characterize in retrospect the effectiveness of its strategies over the time horizon.

Defenders’ EGT, denoted by ${\text{EGT}}({\mathcal{D}})$ : Suppose defense generations evolve at $t_{0}=0,t_{1},\ldots,t_{\ell}\leq T$ , namely $\{t_{0},t_{1},\ldots,t_{\ell}\}\subseteq[0,T]$ . The defender’s EGT is a random variable, denoted by ${\text{EGT}}({\mathcal{D}})$ , because the evolution of defense generations is stochastic in nature. The random variable ${\text{EGT}}({\mathcal{D}})$ is sampled by ${\text{EGT}}({\mathcal{D}},i)$ ’s for $i=0,\ldots,\ell-1$ such that ${\mathcal{D}}_{t_{i}+{\text{EGT}}({\mathcal{D}},i)}$ is the nearest future generation that leads to a defense effectiveness higher than ${\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t_{i}},M)$ . Formally, ${\text{EGT}}({\mathcal{D}},i)$ is defined as

$${\text{EGT}}({\mathcal{D}},i)=t_{i^{*}}-t_{i} \tag{5}$$

when there exists some $t_{i^{*}}\in\{t_{i+1},\ldots,t_{\ell}\}$ such that

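$${\mathcal{D}}_{t_{i^{*}}}({\mathcal{A}}_{t_{i}},M)>{\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t_{i}},M) \tag{6}$$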

and

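$${\mathcal{D}}_{t_{i^{\prime}}}({\mathcal{A}}_{t_{i}},M)\leq{\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t_{i}},M)\quad\text{for every }t_{i^{\prime}}\in\{t_{i+1},\ldots,t_{i^{*}-1}\}; \tag{7}$$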

otherwise, we define ${\text{EGT}}({\mathcal{D}},i)=\infty$ , indicating that from the defender’s perspective, no further effective defense generation is made against attack ${\mathcal{A}}_{t_{i}}$ after $t_{i}$ .

Let us continue to use the example in Fig. 1, where the three defense generations are respectively evolved at $t_{0}=0$ , $t_{1}=3$ , and $t_{2}=4<T=6$ . Fig. 4 (a) describes the defense effectiveness of ${\mathcal{D}}_{t}({\mathcal{A}}_{0},M)$ for $t=0,1,2,3,4$ . Since ${\mathcal{D}}_{3}({\mathcal{A}}_{0},M)<{\mathcal{D}}_{0}({\mathcal{A}}_{0},M)$ , the defense generation at $t=3$ is not effective. Since ${\mathcal{D}}_{4}({\mathcal{A}}_{0},M)>{\mathcal{D}}_{0}({\mathcal{A}}_{0},M)$ , the defense generation at $t=4$ is effective. Therefore, we have ${\text{EGT}}({\mathcal{D}},0)=t_{2}-t_{0}=4>{\text{GT}}({\mathcal{D}},0)=3$ . Moreover, suppose ${\mathcal{D}}_{4}({\mathcal{A}}_{3},M)>{\mathcal{D}}_{3}({\mathcal{A}}_{3},M)$ , meaning that the generation evolved at time $t_{2}=4$ is more effective than the previous generation evolved at time $t_{1}=3$ . Then, we have ${\text{EGT}}({\mathcal{D}},1)=t_{2}-t_{1}=1$ ; otherwise, we have ${\text{EGT}}({\mathcal{D}},1)=\infty$ , indicating that no more effective defense evolution is made against attack ${\mathcal{A}}_{t_{1}}$ after time $t_{1}=3$ .

Attackers’ EGT, denoted by ${\text{EGT}}({\mathcal{A}})$ : Suppose the attack generations evolve at $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}\leq T$ , namely $\{t^{\prime}_{0},t^{\prime}_{1},\ldots,t^{\prime}_{k}\}\subseteq[0,T]$ . The attacker’s EGT is defined as a random variable ${\text{EGT}}({\mathcal{A}})$ because the evolution events are stochastic. The random variable ${\text{EGT}}({\mathcal{A}})$ is sampled by ${\text{EGT}}({\mathcal{A}},j)$ ’s for $j=0,\ldots,k-1$ such that ${\mathcal{A}}_{t^{\prime}_{j}+{\text{EGT}}({\mathcal{A}},j)}$ is the nearest future attack generation that leads to a defense effectiveness smaller than ${\mathcal{D}}_{t^{\prime}_{j}}({\mathcal{A}}_{t^{\prime}_{j}},M)$ in terms of a metric $M\in{\mathcal{M}}$ . Formally, ${\text{EGT}}({\mathcal{A}},j)$ is defined as

$${\text{EGT}}({\mathcal{A}},j)=t^{\prime}_{j^{*}}-t^{\prime}_{j} \tag{8}$$

when there exists $t^{\prime}_{j^{*}}\in\{t^{\prime}_{j+1},\ldots,t^{\prime}_{k}\}$ such that

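$${\mathcal{D}}_{t^{\prime}_{j}}({\mathcal{A}}_{t^{\prime}_{j^{*}}},M)<{\mathcal{D}}_{t^{\prime}_{j}}({\mathcal{A}}_{t^{\prime}_{j}},M) \tag{9}$$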

and

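$${\mathcal{D}}_{t^{\prime}_{j}}({\mathcal{A}}_{t^{\prime}_{j^{\prime}}},M)\geq{\mathcal{D}}_{t^{\prime}_{j}}({\mathcal{A}}_{t^{\prime}_{j}},M)\quad\text{for every }t^{\prime}_{j^{\prime}}\in\{t^{\prime}_{j+1},\ldots,t^{\prime}_{j^{*}-1}\}; \tag{10}$$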

otherwise, we define ${\text{EGT}}({\mathcal{A}},j)=\infty$ , meaning that no effective attack generation is evolved against ${\mathcal{D}}_{t^{\prime}_{j}}$ after $t^{\prime}_{j}$ .

Let us continue to use the example described in Fig. 1, where the three attack generations are respectively evolved at time $t^{\prime}_{0}=0$ , $t^{\prime}_{1}=4$ , and $t^{\prime}_{2}=T=6$ . Fig. 4 (b) describes the defense effectiveness ${\mathcal{D}}_{0}({\mathcal{A}}_{t^{\prime}},M)$ for $t^{\prime}=0,\ldots,6$ . Since ${\mathcal{D}}_{0}({\mathcal{A}}_{4},M)>{\mathcal{D}}_{0}({\mathcal{A}}_{0},M)$ , the attack generation at time $t^{\prime}=4$ is not effective from the attacker’s perspective. Since ${\mathcal{D}}_{0}({\mathcal{A}}_{6},M)<{\mathcal{D}}_{0}({\mathcal{A}}_{0},M)$ , the attack generation at time $t^{\prime}=6$ against ${\mathcal{D}}_{0}$ is effective from the attacker’s perspective. Therefore, we have ${\text{EGT}}({\mathcal{A}},0)=t^{\prime}_{2}-t^{\prime}_{0}=6>{\text{GT}}({\mathcal{A}},0)=4$ .

Summarizing the preceding discussion, we have:

Definition 2

(EGT)* Let ${\mathcal{M}}$ be the universe of metrics measuring static defense effectiveness discussed above. Suppose the defense generations evolve at $t_{0}=0,\ldots,t_{\ell}$ where $t_{\ell}\leq T$ . The defender’s EGT is defined as a random variable ${\text{EGT}}({\mathcal{D}})$ sampled by the ${\text{EGT}}({\mathcal{D}},i)$ ’s, where $i=0,1,\ldots,\ell-1$ , that satisfy the conditions in Eqs. (6) and (7).*

Suppose the attack generations evolve at $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}\leq T$ . The attacker’s EGT is defined as a random variable ${\text{EGT}}({\mathcal{A}})$ sampled by the ${\text{EGT}}({\mathcal{A}},j)$ ’s, where $j=0,1,\ldots,k-1$ , that satisfy conditions in Eqs. (9) and (10).
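The following sketch illustrates how the EGT samples of Definition 2 could be computed. It assumes a hypothetical callable `eff(d, a)` that returns ${\mathcal{D}}_{d}({\mathcal{A}}_{a},M)$ for a fixed metric $M$, and it reads "effective" as a strict improvement (defender) or strict degradation (attacker) relative to the baseline, taking the nearest such generation. The numeric values are invented solely to mirror the qualitative toy example of Fig. 4; they are not measurements.

```python
import math
from typing import Callable, List, Sequence

# eff(d, a) stands for D_d(A_a, M): the effectiveness of the defense in place
# at time d against the attack of time a, for one fixed static metric M.
Effectiveness = Callable[[int, int], float]


def defender_egt(def_times: Sequence[int], eff: Effectiveness) -> List[float]:
    """EGT(D, i) for i = 0..l-1; math.inf if no later generation is effective."""
    samples: List[float] = []
    for i, t_i in enumerate(def_times[:-1]):
        baseline = eff(t_i, t_i)  # D_{t_i}(A_{t_i}, M)
        # Earliest later defense generation that strictly beats the baseline.
        hit = next((t for t in def_times[i + 1:] if eff(t, t_i) > baseline), None)
        samples.append(math.inf if hit is None else hit - t_i)
    return samples


def attacker_egt(att_times: Sequence[int], eff: Effectiveness) -> List[float]:
    """EGT(A, j) for j = 0..k-1; math.inf if no later generation is effective."""
    samples: List[float] = []
    for j, t_j in enumerate(att_times[:-1]):
        baseline = eff(t_j, t_j)  # D_{t'_j}(A_{t'_j}, M)
        # Earliest later attack generation that strictly lowers effectiveness.
        hit = next((t for t in att_times[j + 1:] if eff(t_j, t) < baseline), None)
        samples.append(math.inf if hit is None else hit - t_j)
    return samples


# Hypothetical effectiveness values, chosen only to mirror the toy example.
table = {
    (0, 0): 0.6, (3, 0): 0.5, (4, 0): 0.8,  # D_3 worse, D_4 better, against A_0
    (3, 3): 0.5, (4, 3): 0.7,               # D_4 improves on D_3 against A_3
    (0, 4): 0.7, (0, 6): 0.4,               # A_4 ineffective, A_6 effective vs D_0
    (4, 4): 0.6, (4, 6): 0.6,               # A_6 gains nothing against D_4
}
eff = lambda d, a: table[(d, a)]
print(defender_egt([0, 3, 4], eff))
print(attacker_egt([0, 4, 6], eff))
```

With these invented values the defender's samples are 4 and 1, matching ${\text{EGT}}({\mathcal{D}},0)=4$ and ${\text{EGT}}({\mathcal{D}},1)=1$ in the text, while the attacker's second sample is infinite because no later attack generation degrades ${\mathcal{D}}_{4}$.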

Remark. When comparing Definitions 1 and 2, we can derive:

[TABLE]

for any $t$ . In Definition 2, it is possible that ${\text{EGT}}({\mathcal{D}},i)=\infty$ for some $i\in\{0,\ldots,\ell-1\}$ , indicating that the attack generation ${\mathcal{A}}_{t_{i}}$ that occurred at time $t_{i}$ is not addressed or countered by any later defense generation. Similarly, ${\text{EGT}}({\mathcal{A}},j)=\infty$ for some $j\in\{0,\ldots,k-1\}$ indicates that the defense generation ${\mathcal{D}}_{t^{\prime}_{j}}$ that occurred at time $t^{\prime}_{j}$ is not countered by the attacker in any later attack generation.

III-F3 Triggering-Time (TT)

This metric aims to answer the intuitive question “which generations may have caused or triggered which of the opponent’s generations”. This would offer valuable insights into the opponent’s operational process, especially a sense of responsiveness. However, the measurement result does not necessarily represent the causal triggering of a given generation (e.g., a moving-target defense may not be causally related to any specific attack but may be used to increase the attacker’s attack effort or cost).

Defender’s TT, denoted by ${\text{TT}}({\mathcal{D}})$ : Fig. 5 is obtained by splitting Fig. 1 (a) into two pictures to explain how TT is measured. Recall that defense generations are evolved at time $t=0,3,4$ , but here it suffices to consider only the two defense generations at $t=0$ and $t=3$ as an example. In Fig. 5(a), the defense generation at $t=3$ , namely ${\mathcal{D}}_{3}$ , may be triggered by some of the attack generations ${\mathcal{A}}_{0}$ , ${\mathcal{A}}_{1}$ , and ${\mathcal{A}}_{2}$ that have been made by the attacker. We may define the triggering-event of ${\mathcal{D}}_{3}$ as ${\mathcal{A}}_{j}$ for some $j\in[0,2]$ such that ${\mathcal{D}}_{3}({\mathcal{A}}_{j},M)$ has the greatest positive change in defense effectiveness when compared to ${\mathcal{D}}_{0}({\mathcal{A}}_{j},M)$ , where ${\mathcal{D}}_{0}$ is considered because it represents the previous defense generation at $t=0$ , and $j\in[0,2]$ is considered because ${\mathcal{A}}_{0},{\mathcal{A}}_{1}$ and ${\mathcal{A}}_{2}$ represent the entire history of attacks prior to $t=3$ . Suppose ${\mathcal{D}}_{3}({\mathcal{A}}_{j},M)-{\mathcal{D}}_{0}({\mathcal{A}}_{j},M)$ is maximized for some $j\in[0,2]$ , suggesting that the defense generation may be triggered by the attack generation ${\mathcal{A}}_{j}$ . This leads us to define the Triggering-Time (TT) for defense generation ${\mathcal{D}}_{3}$ as $3-j$ , which is a particular sample, denoted by ${\text{TT}}({\mathcal{D}},3)$ , of the random variable ${\text{TT}}({\mathcal{D}})$ that is defined over the defense generations except ${\mathcal{D}}_{0}$ (because every sample needs a previous generation as a reference for comparison).

Attacker’s TT, denoted by ${\text{TT}}({\mathcal{A}})$ : For attack generations evolved at time $t=0$ and $4$ , as shown in Fig. 5 (b), attack generation ${\mathcal{A}}_{4}$ may be triggered by ${\mathcal{D}}_{0}$ , ${\mathcal{D}}_{1}$ , ${\mathcal{D}}_{2}$ , or ${\mathcal{D}}_{3}$ . We may define the triggering-event of ${\mathcal{A}}_{4}$ as ${\mathcal{D}}_{j}$ for some $j\in[0,3]$ such that ${\mathcal{D}}_{j}({\mathcal{A}}_{4},M)$ has the greatest negative change in defense effectiveness when compared to ${\mathcal{D}}_{j}({\mathcal{A}}_{0},M)$ , where $j\in[0,3]$ is considered because ${\mathcal{D}}_{0},{\mathcal{D}}_{1},{\mathcal{D}}_{2}$ and ${\mathcal{D}}_{3}$ represent the history of defense generations up to time $t=4$ , and ${\mathcal{A}}_{0}$ refers to the previous attack generation (prior to ${\mathcal{A}}_{4}$ ). Suppose ${\mathcal{D}}_{j}({\mathcal{A}}_{4},M)-{\mathcal{D}}_{j}({\mathcal{A}}_{0},M)<0$ is minimized (i.e., maximized in its absolute value) at $j$ ; then we can define TT for attack generation ${\mathcal{A}}_{4}$ as $4-j$ , which is a particular sample, denoted by ${\text{TT}}({\mathcal{A}},4)$ , of the random variable ${\text{TT}}({\mathcal{A}})$ that is defined over the attack generations.

Summarizing the preceding discussion, we have:

Definition 3

(TT)* Suppose defense generations are evolved at $t_{0}=0,t_{1},\ldots,t_{\ell}\leq T$ and attack generations are evolved at $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}\leq T$ . The triggering-event for defense generation ${\mathcal{D}}_{t_{i}}$ , where $i\in[1,\ell]$ , is defined to be the attack ${\mathcal{A}}_{t^{\prime}}$ that leads to the greatest positive change in defense effectiveness relative to ${\mathcal{D}}_{t_{i-1}}$ in terms of metric $M\in{\mathcal{M}}$ , namely*

[TABLE]

If such $t^{\prime}$ exists, we define ${\text{TT}}({\mathcal{D}},i)=t_{i}-t^{\prime}$ ; otherwise, we define ${\text{TT}}({\mathcal{D}},i)=\infty$ , meaning that the defense generation is not triggered by any past attack within the time horizon. The TT of defense generations is a random variable, denoted by ${\text{TT}}({\mathcal{D}})$ , that is sampled by ${\text{TT}}({\mathcal{D}},1),\ldots,{\text{TT}}({\mathcal{D}},\ell)$ .

Similarly, the triggering-event for attack generation ${\mathcal{A}}_{t^{\prime}_{j}}$ , where $j\in[1,k]$ , is defined to be the defense ${\mathcal{D}}_{t}$ that leads to the greatest negative change in defense effectiveness relative to ${\mathcal{A}}_{t^{\prime}_{j-1}}$ in terms of metric $M\in{\mathcal{M}}$ , namely

[TABLE]

If such $t$ exists, we define ${\text{TT}}({\mathcal{A}},j)=t^{\prime}_{j}-t$ ; otherwise, we define ${\text{TT}}({\mathcal{A}},j)=\infty$ , meaning that the attack generation is not triggered by any past defense within the time horizon. The TT of attack generations is a random variable, denoted by ${\text{TT}}({\mathcal{A}})$ , that is sampled by ${\text{TT}}({\mathcal{A}},1),\ldots,{\text{TT}}({\mathcal{A}},k)$ .
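A minimal sketch of the defender's side of Definition 3 follows. Several details are assumptions made only for illustration: the callable `eff(d, a)` standing for ${\mathcal{D}}_{d}({\mathcal{A}}_{a},M)$, the restriction of candidate attacks to those strictly before $t_{i}$ (as in the example with ${\mathcal{A}}_{0},{\mathcal{A}}_{1},{\mathcal{A}}_{2}$), and breaking ties in favor of the earliest attack. The attacker's TT can be sketched symmetrically by swapping roles and looking for the greatest negative change.

```python
import math
from typing import Callable, List, Sequence

Effectiveness = Callable[[int, int], float]  # eff(d, a) ~ D_d(A_a, M)


def defender_tt(def_times: Sequence[int],
                attack_times: Sequence[int],
                eff: Effectiveness) -> List[float]:
    """TT(D, i) for i = 1..l, one sample per defense generation after the first."""
    samples: List[float] = []
    for prev, cur in zip(def_times, def_times[1:]):
        best_gain, trigger = 0.0, None
        for a in attack_times:
            if a >= cur:          # assumption: only attacks strictly before t_i
                continue
            gain = eff(cur, a) - eff(prev, a)
            if gain > best_gain:  # strict comparison: ties keep the earliest attack
                best_gain, trigger = gain, a
        # No positive change against any past attack means no triggering event.
        samples.append(math.inf if trigger is None else cur - trigger)
    return samples


# Hypothetical values for generations D_0 and D_3 against A_0, A_1, A_2.
table = {(0, 0): 0.5, (3, 0): 0.6,
         (0, 1): 0.4, (3, 1): 0.7,
         (0, 2): 0.5, (3, 2): 0.6}
print(defender_tt([0, 3], [0, 1, 2], lambda d, a: table[(d, a)]))
```

In this invented example the largest improvement of ${\mathcal{D}}_{3}$ over ${\mathcal{D}}_{0}$ is against ${\mathcal{A}}_{1}$, so the sample is $3-1=2$.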

Remark. The preceding definition of TT can be adapted in many ways. In the preceding definition, we propose using the maximization of ${\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t^{\prime}},M)-{\mathcal{D}}_{t_{i-1}}({\mathcal{A}}_{t^{\prime}},M)$ in Eq. (3) as the criterion for identifying the triggering event. Alternatively, the definition can be adapted to maximize, for example, ${\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t^{\prime}},M)$ , meaning that defense ${\mathcal{D}}_{t_{i}}$ is most effective against attack ${\mathcal{A}}_{t^{\prime}}$ . In the use case where both parties’ evolution generations are completely known, the TT metric reflects the responsiveness of a party. In the case where the opponent’s evolution generations are not completely known (but the party’s own evolution generations are naturally known), the TT metric can be used to identify, in retrospect, a party’s evolution generation that may be the result of non-adversarial changes that have a security effect (e.g., new feature releases or patching of vulnerabilities). Such a retrospective security analysis is important because it helps the defender identify effective defense activities that would otherwise go unnoticed; these possibly unconscious defense decisions can offer insights into effective defense (e.g., best practice).

III-F4 Lagging-Behind Time (LBT)

This metric aims to measure how far one party is behind its opponent. Although professionals have often said that the defender lags behind the attacker, we define this metric from both the defender’s and attacker’s perspectives because they could provide ways for proactive defenses ahead of actions by the attacker.

Defender’s LBT, denoted by ${\text{LBT}}({\mathcal{D}})$ : For a security metric $M\in{\mathcal{M}}$ of interest, let $\varepsilon$ , where $0\leq\varepsilon\leq 1$ , represent the acceptable defense effectiveness. The LBT metric considers ${\mathcal{D}}_{t}({\mathcal{A}}_{t-\lambda},M)$ for $\lambda=0,\ldots,T$ and $t\geq\lambda$ . Fig. 6 (a) illustrates ${\mathcal{D}}_{t}({\mathcal{A}}_{t-\lambda},M)$ for $\lambda=1$ . Then, we define ${\text{LBT}}({\mathcal{D}})$ to be the minimum $\lambda$ such that

[TABLE]

if such $\lambda$ exists; otherwise, we define ${\text{LBT}}({\mathcal{D}})=-\infty$ , meaning that defenses lag behind attacks at least for time $T$ . In other words, we have

[TABLE]

Note that $\varepsilon$ may vary across attack-defense settings (e.g., malware detection vs. intrusion detection). Note also that ${\text{LBT}}({\mathcal{D}})=0$ means that the defender always keeps pace with the attacker.
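The search for the defender's LBT can be sketched as below. The sketch assumes that the acceptability condition is ${\mathcal{D}}_{t}({\mathcal{A}}_{t-\lambda},M)\geq\varepsilon$ for every $t\in[\lambda,T]$, which is one reading of the requirement that defense effectiveness is always acceptable; `eff(d, a)` is a hypothetical callable for ${\mathcal{D}}_{d}({\mathcal{A}}_{a},M)$. When no lag qualifies, the function returns `None` and leaves the choice of sentinel value to the caller.

```python
from typing import Callable, Optional

Effectiveness = Callable[[int, int], float]  # eff(d, a) ~ D_d(A_a, M)


def defender_lbt(horizon: int, eff: Effectiveness, epsilon: float) -> Optional[int]:
    """Smallest lag such that D_t(A_{t-lag}, M) >= epsilon for all t in [lag, T]."""
    for lag in range(horizon + 1):
        # Assumed condition: the defense at time t handles attacks that are
        # `lag` time units old, at every time t where such an attack exists.
        if all(eff(t, t - lag) >= epsilon for t in range(lag, horizon + 1)):
            return lag
    return None  # no lag in 0..T satisfies the condition


def toy_eff(d: int, a: int) -> float:
    # Hypothetical defense that only handles attacks at least one step old.
    return 1.0 if d - a >= 1 else 0.5


print(defender_lbt(3, toy_eff, epsilon=0.8))
```

In this toy setting the defense never reaches the threshold against same-day attacks but always does against attacks one time unit old, so the smallest qualifying lag is 1.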

Attacker’s LBT, denoted by ${\text{LBT}}({\mathcal{A}})$ : For a security metric $M\in{\mathcal{M}}$ of interest, recall that $\varepsilon$ , where $0\leq\varepsilon\leq 1$ , represents the acceptable defense effectiveness. This metric considers ${\mathcal{D}}_{t}({\mathcal{A}}_{t+\lambda},M)$ for $\lambda=0,\ldots,T$ . Fig. 6 (b) illustrates ${\mathcal{D}}_{t}({\mathcal{A}}_{t+\lambda},M)$ for $\lambda=1$ . Then, we define ${\text{LBT}}({\mathcal{A}})$ to be the maximum $\lambda$ such that

[TABLE]

if such $\lambda$ exists; otherwise, we define ${\text{LBT}}({\mathcal{A}})=-\infty$ , meaning that attacks lag behind defenses at least for time $T$ . In other words, we have

[TABLE]

Note that ${\text{LBT}}({\mathcal{A}})=0$ means that the attacker does not lag behind the defender.

Summarizing the preceding discussion, we have:

Definition 4

${\text{LBT}}({\mathcal{D}})$ * is defined by Eq. (16) with the minimum $\lambda$ that satisfies Eq. (15); ${\text{LBT}}({\mathcal{A}})$ is defined by Eq. (20) with the maximum $\lambda$ that satisfies Eq. (17).*

Remark. Definition 4 can be relaxed by adjusting Eq. (15) such that:

[TABLE]

This relaxation is to demand that the average defense effectiveness is acceptable, rather than to demand that the defense effectiveness is always acceptable. Note that ${\text{LBT}}({\mathcal{D}})$ and ${\text{LBT}}({\mathcal{A}})$ are “dual” to each other only in the sense that ${\text{LBT}}({\mathcal{D}})$ looks backward in time while ${\text{LBT}}({\mathcal{A}})$ looks forward in time.

III-G Effectiveness-Oriented Metrics

III-G1 Evolutionary Effectiveness (EE)

This metric measures each generation with respect to a reference generation. This is a random variable for each generation, sampled by the opponent’s generations.

Definition 5

(EE)* Suppose defense generations are evolved at time $t_{0}=0,t_{1},\ldots,t_{\ell}$ and attack generations are evolved at time $t^{\prime}_{0}=0,t^{\prime}_{1},\ldots,t^{\prime}_{k}$ . Defender’s EE is defined as a random variable, denoted by ${\text{EE}}({\mathcal{D}})$ , which is sampled by ${\text{EE}}({\mathcal{D}},j)$ for $j\in[1,k]$ , where*

[TABLE]

With respect to a reference defense generation ${\mathcal{D}}_{t}$ , the attacker’s EE is defined as a random variable, denoted by ${\text{EE}}({\mathcal{A}})$ , which is sampled by ${\text{EE}}({\mathcal{A}},i)$ for $i\in[1,\ell]$ , where

[TABLE]

Remark. Definition 5 can be adapted by replacing Eq. (22) with, for example,

[TABLE]

III-G2 Relative-Generational-Impact (RGI)

As illustrated in Fig. 7, we propose comparing ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ and ${\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M)$ for $t=1,\ldots,T$ . At a specific point in time $t\in[1,T]$ , there are three possible scenarios:

(a)

When ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)={\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M)$ , the attacker’s maneuver and the defender’s maneuver at $t$ are equal.

(b)

When ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)>{\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M)$ , the defender is out-maneuvering the attacker at $t$ .

(c)

When ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)<{\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M)$ , the attacker is out-maneuvering the defender at $t$ .

Definition 6

(RGI)* Defender’s RGI is a random variable, denoted by ${\text{RGI}}({\mathcal{D}})$ and sampled by ${\text{RGI}}({\mathcal{D}},t)$ for $t=1,\ldots,T$ , where*

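$${\text{RGI}}({\mathcal{D}},t)={\mathcal{D}}_{t}({\mathcal{A}}_{t},M)-{\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M).$$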

Note that unlike the metrics mentioned above, we omit an attacker’s RGI because it would hold that ${\text{RGI}}({\mathcal{A}},t)=-{\text{RGI}}({\mathcal{D}},t)$ for $t=1,\ldots,T$ .
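As an illustration, the RGI samples can be computed from the sequence ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ for $t=0,\ldots,T$. The sketch assumes that ${\text{RGI}}({\mathcal{D}},t)$ is the difference ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)-{\mathcal{D}}_{t-1}({\mathcal{A}}_{t-1},M)$, which is consistent with scenarios (a) to (c) and with the sign symmetry between the two parties; the function names and the input values are hypothetical.

```python
from typing import List, Sequence


def defender_rgi(diagonal: Sequence[float]) -> List[float]:
    """diagonal[t] = D_t(A_t, M) for t = 0..T; returns RGI(D, t) for t = 1..T."""
    return [cur - prev for prev, cur in zip(diagonal, diagonal[1:])]


def scenario(rgi: float) -> str:
    """Map one RGI sample to scenario (a), (b), or (c) of the text."""
    if rgi > 0:
        return "defender out-maneuvers attacker"
    if rgi < 0:
        return "attacker out-maneuvers defender"
    return "maneuvers are equal"


samples = defender_rgi([0.5, 0.75, 0.75, 0.25])  # hypothetical D_t(A_t, M)
for t, value in enumerate(samples, start=1):
    print(t, value, scenario(value))
```

Negating each sample gives the corresponding attacker-side value, which is why a separate attacker RGI is not needed.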

III-G3 Aggregated-Generational-Impact (AGI)

This metric aims to measure the overall security gained by the defender over time horizon $[0,T]$ . We propose measuring the security gain over time interval $[t_{i-1},t_{i}]$ , denoted by $\mathcal{G}_{s}(i)$ , for $i=1,\ldots,T$ . As Fig. 7 describes, we assume that a straight line links $(t_{i-1},{\mathcal{D}}_{t_{i}-1}({\mathcal{A}}_{t_{i}-1},M))$ and $(t_{i},{\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t_{i}},M))$ . Then, $\mathcal{G}_{s}(i)$ is defined as the area of the triangle, with a sign that depends on the sign of $[{\mathcal{D}}_{t_{i}}({\mathcal{A}}_{t_{i}},M)-{\mathcal{D}}_{t_{i}-1}({\mathcal{A}}_{t_{i}-1},M)]$ . More specifically, we define

[TABLE]

where $t_{i}-t_{i-1}=1$ . This leads to:

Definition 7

(AGI)* Defender’s AGI over $[0,T]$ , denoted by ${\text{AGI}}({\mathcal{D}})$ , is defined by*

[TABLE]

An attacker’s AGI is omitted because its definition is trivial as ${\text{AGI}}({\mathcal{A}})=-{\text{AGI}}({\mathcal{D}})$ .
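The per-interval security gain can be sketched as a signed triangle area: half the interval length times the change in ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ across the interval. The sketch below is an assumption-laden illustration; in particular, the plain sum `total_security_gain` is shown only to make the aggregation idea concrete and does not claim to reproduce the exact normalization used in Definition 7.

```python
from typing import List, Sequence


def security_gains(times: Sequence[float], diagonal: Sequence[float]) -> List[float]:
    """Signed triangle areas G_s(i) over [t_{i-1}, t_i]; diagonal[i] = D_{t_i}(A_{t_i}, M)."""
    if len(times) != len(diagonal):
        raise ValueError("times and diagonal must have the same length")
    points = list(zip(times, diagonal))
    # Area of a right triangle with base (t1 - t0) and signed height (y1 - y0).
    return [0.5 * (t1 - t0) * (y1 - y0)
            for (t0, y0), (t1, y1) in zip(points, points[1:])]


def total_security_gain(times: Sequence[float], diagonal: Sequence[float]) -> float:
    """Unnormalized sum of the per-interval gains (illustrative aggregation only)."""
    return sum(security_gains(times, diagonal))


unit_steps = security_gains([0, 1, 2, 3], [0.5, 0.75, 0.75, 0.25])
uneven_steps = security_gains([0, 2, 3], [0.5, 0.75, 0.25])
print(unit_steps, uneven_steps)
```

Because the base of each triangle is the interval length, the same function covers time intervals of unequal length.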

Remark. Note that the length of each time interval is $t_{i}-t_{i-1}=1$ for $i=1,\ldots,T$ because the time horizon is defined as $t=0,1,\ldots,T$ . As such, one may observe that $\mathcal{G}_{s}(i)$ as shown in Eq. (31) does not have to be interpreted as using the areas of triangles corresponding to individual time intervals; instead, it can be interpreted directly as $\sum_{i=1}^{T}\mathcal{G}_{s}(i)$ while ignoring the constant $\frac{1}{2}$ . Nevertheless, using the areas of the triangles to define the security gain $\mathcal{G}_{s}(i)$ has two advantages: (i) this definition remains equally applicable when the time intervals do not have the same length, as illustrated in Fig. 8; and (ii) when there is a need to interpolate a continuous and smooth curve of ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ over $[0,T]$ , such as the curve $f(x)$ in Fig. 8, we can divide the curve $f(x)$ into $z$ segments such that within each segment, $f(x)$ is strictly monotonic. In the example of Fig. 8, $f(x)$ is divided into $z=5$ segments, $f_{1}(x),\ldots,f_{z}(x)$ . Accordingly, we can extend Definition 7 to define AGI over $[0,T]$ as:

[TABLE]

where

[TABLE]

IV Discussion

In this section, we discuss use cases of the proposed metric framework. Since our goal is to help defenders, the discussion below will be from a defender’s point of view. We focus on two issues: the number of attack evolution generations that are observed by the defender; and the number of attackers vs. the number of defenders.

IV-A Use Case with respect to the Amount of Attack Generations Observed

We differentiate two scenarios: all vs. some or no attack evolution generations being observed.

IV-A1 Use Cases When All Evolution Generations Are Observed

This is the ideal case and is possible when considering (for example) white-hat attack defense experiments over a period of time. This use case is also possible for retrospective attack-defense analysis. This can happen because some attacks (e.g., new attacks or even zero-day attacks) that take place at time $t$ may not be recognized until time $t^{\prime}$ , where $t<t^{\prime}$ . If all cyber activities (e.g., network traffic and host execution) are properly recorded, the metrics defined above can be used to measure attack and defense evolution in retrospect. The measurement results tell a defender about its agility and may lead to insights into explaining why the defense failed and how the failure may be fixed in the future.

IV-A2 Use Case When Some or No Evolution Generations Are Observed

In real-world cyber attack-defense practice, the defender may only observe some or no attack evolution generations. In this case, the defender can identify “probable” attack evolution generations as follows. First, the defender treats each attack ${\mathcal{A}}_{0},\ldots,{\mathcal{A}}_{T}$ at time $t\in[0,T]$ as an evolution generation, even though some attacks are not evolution generations. Then, the defender can identify the attacks that disrupt the existing defense most as an approximation of attack evolution generations (i.e., “probable” generations as an approximation to the unknown “ground-truth” generations). For example, this can be done as follows: Given a threshold $\tau$ where $0<\tau<1$ , attack ${\mathcal{A}}_{t^{\prime}}$ can be treated as an evolution generation if there exists $t$ , where $0\leq t<t^{\prime}$ , such that

[TABLE]

As a result, the defender can identify the probable generations and then use them to measure the metrics proposed in the paper.
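The thresholding idea can be sketched as follows. The criterion used here is an assumption made for illustration only: attack ${\mathcal{A}}_{t^{\prime}}$ is flagged when some earlier defense ${\mathcal{D}}_{t}$ is more than $\tau$ less effective against ${\mathcal{A}}_{t^{\prime}}$ than against its contemporary attack ${\mathcal{A}}_{t}$. The callable `eff(d, a)` for ${\mathcal{D}}_{d}({\mathcal{A}}_{a},M)$ and the toy values are hypothetical.

```python
from typing import Callable, List

Effectiveness = Callable[[int, int], float]  # eff(d, a) ~ D_d(A_a, M)


def probable_generations(horizon: int, eff: Effectiveness, tau: float) -> List[int]:
    """Times t' in 1..T whose attack disrupts some earlier defense by more than tau."""
    flagged: List[int] = []
    for a in range(1, horizon + 1):
        # Assumed criterion: D_t(A_t, M) - D_t(A_{t'}, M) > tau for some t < t'.
        if any(eff(t, t) - eff(t, a) > tau for t in range(a)):
            flagged.append(a)
    return flagged


def toy_eff(d: int, a: int) -> float:
    # Hypothetical attacker that becomes much harder to detect from time 2 on.
    return 0.9 if a < 2 else 0.5


print(probable_generations(3, toy_eff, tau=0.3))
```

Under this particular criterion every attack after the change is flagged, not only the first one, so a practical variant might compare against the most recent flagged attack instead; that refinement is left open here.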

From a conceptual point of view, the preceding use case is reminiscent of the ground truth in supervised machine learning. In principle, the training data in supervised machine learning should be 100% accurate or correct (e.g., 0% false-positives and 0% false-negatives). In practice, this is hard to achieve. Still, supervised machine learning is useful and successful even if the ground truth of the training data is not guaranteed. In this sense, the preceding method for approximately identifying attack evolution generations might be as useful as supervised machine learning with an approximate ground truth.

IV-B Use Cases with respect to the Number of Attackers vs. the Number of Defenders

There are 4 scenarios: one attacker against one defender; one attacker against multiple defenders; multiple attackers against one defender; and multiple attackers against multiple defenders. Since the description in Section III focused on the scenario of one attacker against one defender, in what follows we discuss how the metrics framework can be used in the other three scenarios.

IV-B1 Use Case with One Attacker against Multiple Defenders

This scenario is interesting when evaluating the collective effectiveness and failures of multiple defenders. From a defender’s point of view, this is a straightforward extension to the preceding “one attacker against one defender” case because the defense generations that are evolved by each defender are known. In order to measure the collective defense effectiveness and failures, we can treat the collection of defenders as a single virtual defender.

IV-B2 Use Case with Multiple Attackers against One Defender

This scenario is perhaps what happens in the real world where a defender (of an enterprise) needs to cope with multiple attackers. In this scenario, the multiple attackers can be represented by a single virtual attacker. This makes sense when the attackers are coordinated because the coordinator can be seen as the attacker (while noting that this insight has been widely used in cryptographic models). This treatment also makes sense even if the attackers are not coordinated with each other because in the real world each defender (of an enterprise network, for example) is indeed likely dealing with multiple attackers. In this case, the attacks waged by different attackers can be superimposed over each other, leading to attack generations that are a superset of the generations that are evolved by individual attackers. As a result, the metrics framework is equally applicable in this scenario. This generality of the framework can be attributed to the fact that agility is about the attacks rather than the identities of the attackers.
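The superimposition of several attackers into one virtual attacker can be sketched as a set union over their generation times. The function name and the example times are hypothetical; the sketch only shows that the virtual attacker's generations form a superset of each individual attacker's generations.

```python
from typing import Iterable, List


def superimpose(*attackers: Iterable[int]) -> List[int]:
    """Merge the generation times of several attackers into one sorted sequence."""
    # Duplicate time points collapse: two attackers evolving at the same time
    # contribute a single generation of the virtual attacker.
    return sorted(set().union(*attackers))


attacker_a = [0, 4, 6]   # hypothetical generation times of one attacker
attacker_b = [2, 4, 9]   # hypothetical generation times of another attacker
virtual_attacker = superimpose(attacker_a, attacker_b)
print(virtual_attacker)
```

The merged sequence can then be fed to the same metric computations that are used for a single attacker.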

We stress that the preceding discussion does not mean that effort should not be made to distinguish the attackers. This is because there is a spectrum of situations: at one end of the spectrum, the defender cannot tell the attackers apart; at the other end of the same spectrum, the defender can tell all of the attackers apart. In the former case, the defender has to treat all of the attackers as a single entity or coordinated one. In the latter case, the defender can measure cyber agility with respect to each recognized attacker, which allows the defender to measure which attacker(s) are more agile than the other attackers and therefore possibly prioritize defense resources against these more agile attackers. Therefore, the defender should always strive to distinguish the attacks waged or coordinated by different attackers. In any case, our metrics framework is equally applicable.

IV-B3 Use Case with Multiple Attackers against Multiple Defenders

Similarly, this case can be seen as a simple extension to the case of “one attacker against multiple defenders” or the case of “multiple attackers against one defender.”

V Case Study

In this section, we show the results from our case study by applying the proposed metrics to two real datasets.

V-A Experimental Setup

V-A1 Defense Tool

The case study is based on replaying network traffic, which contains attacks, against the Snort intrusion detection system [54]. We used six versions of Snort (v2.9.4 - v.9.8) released between Dec. 2012 and Dec. 2016. For each version (e.g., v2.9.4), there are sub-versions (e.g., v2.9.4.1). Each sub-version is counted as a defense generation, leading to 18 versions or defense generations, denoted by ${\mathcal{D}}_{t_{0}},\ldots,{\mathcal{D}}_{t_{17}}$ . These 18 generations span 1,294 days, meaning that the time horizon for the defender is $[t_{0},t_{17}]=[1,1294]$ . These Snort versions are tested on Virtual Machines (VMs) running the Ubuntu 14 operating system, after making a few changes to the default settings in the snort.conf file to fit this experiment.

V-A2 Datasets

The 18 versions of Snort are tested against the following two datasets:

•

Honeypots dataset: This dataset was collected at a low-interaction honeypot of approximately 7,000 IP addresses. This dataset contains traffic spanning Feb. 2013 - Dec. 2015. The low-interaction honeypot consists of programs, including Dionaea, Mwcollector, Amun, and Nepenthes. These programs simulate services (e.g., SMB, NetBIOS, HTTP, MySQL, and SSH) to some extent. The dataset was collected over 1,029 days, with a time horizon for the attacker $[t^{\prime}_{0},t^{\prime}_{17}]=[80,1108]$ . This length is important because many datasets do not span such a long period, especially with continuous (or mostly continuous) collection. Due to the absence of well-defined attack generations, we treat the attacks at each time unit (i.e., each day) as an attack generation. This changes the measurement of some metrics, namely LBT and EE. Some data are missing for short periods within this time horizon because the data collection system was occasionally shut down for various reasons. Although the dataset is not ideal because the attacker’s and defender’s time horizons are not exactly the same, which often happens in practice, the metrics framework can still be applied by adjusting the time span accordingly.

•

DEFCON CTF dataset: DEFCON is one of the world’s premier hacker-type conferences. We use the publicly available $pcap$ files collected from DEFCON 21 (2013), DEFCON 22 (2014) and DEFCON 23 (2015), each of which corresponds to a single day. This dataset has well-defined attack generations (one per year). In terms of the metrics framework, the attacker’s generations occur at $t^{\prime}_{0}=238$ , $t^{\prime}_{1}=609$ , and $t^{\prime}_{2}=973$ . The dataset consists of $pcaps$ from 20 teams. We randomly selected a single team’s data for our experiment, and followed the team through all three DEFCON CTFs. Although DEFCON datasets are available for a number of years, we only consider these three because they correspond to the period of time during which Snort was considered as discussed above. It is critical that these years are distinct, because this allows us to clearly see the difference between attack generations; this is not the case in the aforementioned honeypot dataset.

In our experiments, we replay the network traffic against Snort, log the alerts generated by Snort’s preprocessor and detection engine, and calculate the true-positive rate, $tp$ . This is possible because the datasets are composed of malicious traffic. In terms of the notations used above, the static metric $M$ used in the notation ${\mathcal{D}}_{t}({\mathcal{A}}_{t^{\prime}},M)$ is the true-positive rate $tp$ . It is worth mentioning that as in any data-driven study, the insights derived from a specific dataset may not be arbitrarily generalized. When not all attack generations are observed, the resulting insights may be specific to the specification of generations used in the analysis in question.

V-B Case Study with the Honeypot Dataset

Fig. 9 plots the measured evolution metrics for the Honeypot dataset. Recall that the defender’s time horizon is $[t_{0},t_{17}]=[1,1294]$ and the attacker’s time horizon is $[t^{\prime}_{0},t^{\prime}_{17}]=[80,1108]$ . This means that there are missing attack generations in the defender’s time horizon, for which some metrics cannot be measured. We measure the proposed metrics by focusing on a defender’s point of view. To be specific, we do not consider the lagging-behind time (LBT) and evolutionary-effectiveness (EE) metrics because these require both attack and defense generations to be well defined. The remainder of the metrics are well defined, given the constraints of this dataset.

Analysis on Fig. 9 (a): This figure plots the sample of a defender’s Generation Time, ${\text{GT}}({\mathcal{D}})$ , and the sample of the defender’s Effective GT, ${\text{EGT}}({\mathcal{D}})$ . Note that some blue bars representing ${\text{GT}}({\mathcal{D}})$ are not accompanied by red bars representing ${\text{EGT}}({\mathcal{D}})$ due to the missing attack traffic data for the corresponding dates. Graphing both ${\text{GT}}({\mathcal{D}})$ and ${\text{EGT}}({\mathcal{D}})$ together allows us to see any disparity between these two metrics. We observe that the mean ratio of ${\text{EGT}}({\mathcal{D}})$ to ${\text{GT}}({\mathcal{D}})$ is 3.00. This implies that, on average, an effective generation takes three times as long to appear as an ordinary generation. Individual generations also differ in responsiveness: some defense generations are much less responsive than others. For example, attack generation ${\mathcal{A}}_{0}$ is not effectively responded to by a defense generation until 208 days, as shown in ${\text{EGT}}({\mathcal{D}},0)=208$ .
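A small sketch of how such a mean ratio could be computed when some EGT samples are missing is given below. It assumes that the ratio is taken per generation and then averaged, and that GT and EGT samples are aligned by index; both are assumptions, and the sample values are hypothetical rather than taken from Fig. 9 (a).

```python
from statistics import mean


def mean_egt_to_gt_ratio(gt, egt):
    """Mean of per-generation EGT/GT ratios, skipping generations with no EGT."""
    ratios = [e / g for g, e in zip(gt, egt) if e is not None and g > 0]
    if not ratios:
        raise ValueError("no generation has both GT and EGT")
    return mean(ratios)


# Hypothetical samples in days; None marks missing attack traffic.
gt_days = [20, 35, 50, 40]
egt_days = [60, None, 100, 160]
ratio = mean_egt_to_gt_ratio(gt_days, egt_days)
```

Skipping the generations without an EGT sample mirrors the blue bars in the figure that have no accompanying red bar.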

Analysis on Fig. 9 (b): This figure shows the defender’s Triggering-Time (TT) metric. The red curve indicates the hypothetical worst-case scenario in which every defense generation is triggered by the very first attack, ${\mathcal{A}}_{t^{\prime}_{0}}={\mathcal{A}}_{80}$ . The blue curve corresponds to ${\text{TT}}({\mathcal{D}},t)$ as sampled at $t=88,123,138,208,235,284,346,501,586,685$ , and $825$ . Note that ${\text{TT}}({\mathcal{D}},88)$ and ${\text{TT}}({\mathcal{D}},123)$ are not shown because they are not obtainable from the dataset as they reach $\infty$ , based on their definition. Note that ${\text{TT}}({\mathcal{D}},t)$ for $t>825$ is not plotted because of missing attack traffic data at $t=825$ . We observe that most defense generations respond to relatively old attacks, except ${\text{TT}}({\mathcal{D}},235)=65$ and ${\text{TT}}({\mathcal{D}},586)=59$ , indicating that these defense generations respond to attacks that are almost two months old.

Analysis on Fig. 9 (c): This figure plots the RGI metric, where there are some $t$ ’s at which ${\text{RGI}}({\mathcal{D}},t)$ is missing because of missing attack traffic. From Fig. 9 (c), we notice that the true-positive rate is observed with a typical range of $\pm 10\%$ . The aggregated generational impact (AGI) of the defender (i.e., Snort) is very poor at $0.01\%$ , implying little evolution of significance relative to the attacker during the given times.

Results Analysis: Our results show that Snort has a history of being responsive to attacks in evolving its defense in a timely manner. However, the attackers also evolved, offsetting the previous defense gains. This explains why ${\text{AGI}}({\mathcal{D}})\approx 0.01\%$ , indicating that the Snort community is in a stalemate with the attacker. We notice that the static defense effectiveness metric, ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ , is low with a mean of $8.11\%$ . The cause of the low effectiveness is the low-interaction nature of the honeypot, which makes it not as semantically rich as we would like.

In summary, the defender’s agility is comparable to the attacker’s. In addition, frequent evolution does not necessarily result in effective evolution. This means that the defender’s agility cannot be strictly tied to overall defense effectiveness. Therefore, we need to measure cyber agility separately from defense effectiveness.

V-C Case Study with DEFCON Dataset

Fig. 10 demonstrates the measurement of evolution metrics when replaying the DEFCON dataset against Snort. Since we have three distinct DEFCON CTF attack traffic datasets (i.e., one capture per year), ${\mathcal{A}}_{t^{\prime}_{j}}$ , for $j=0,1,2$ , where $t^{\prime}_{0}=238$ , $t^{\prime}_{1}=609$ , and $t^{\prime}_{2}=973$ , we can naturally treat each of them as an attack generation.

Analysis on Fig. 10 (a): This figure plots the sample of the defender’s GT and the sample of the defender’s EGT corresponding to the data available, where each release of Snort is treated as a generation. Since there are only three attack generations where each generation serves as a reference point for defining the corresponding metric ${\text{EGT}}({\mathcal{D}},t^{\prime}_{j})$ for $j=0,1$ and $2$ , we have only three measurements of ${\text{EGT}}({\mathcal{D}},t)$ for $t=238,609,973$ , showing only three red bars. Notice that the blue bars are the same as in Fig. 9 (a). Defense evolution actions responding to ${\mathcal{A}}_{238}$ and ${\mathcal{A}}_{973}$ took relatively little generation time, 49 days and 69 days, respectively. However, the defense evolution took more than one year (i.e., 441 days) to respond to ${\mathcal{A}}_{609}$ . This disparity may be an artifact of the small sample, which is too small to draw conclusions about the evolution rates of these two parties.

Analysis on Fig. 10 (b): This figure shows the defender’s LBT. In Fig. 10 (b), the purple curve indicates the average of ${\text{LBT}}({\mathcal{D}})$ samples, which are shown in green, red, and blue. The security threshold, $\varepsilon$ (shown in teal), is set at a true-positive rate of 12%; in general, $\varepsilon$ is the minimum acceptable value for the metric chosen (i.e., true-positive rate). The average ${\text{LBT}}({\mathcal{D}})$ falls below $\varepsilon$ at $x=1.14$ years, meaning that for the threshold chosen, the defender lags behind the attacker by 1.14 years.
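The threshold-crossing step described above can be sketched as follows. The sketch assumes that the averaged curve is available as sampled points and that linear interpolation between adjacent samples is acceptable; the function name and the sample values are hypothetical and do not reproduce the curve in Fig. 10 (b).

```python
def first_crossing_below(xs, ys, threshold):
    """Return the first x at which the piecewise-linear curve drops below threshold.

    Returns None if the curve never drops below the threshold.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if ys and ys[0] < threshold:
        return xs[0]
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= threshold > y1:
            # Linear interpolation between the two samples around the crossing.
            return x0 + (y0 - threshold) * (x1 - x0) / (y0 - y1)
    return None


# Hypothetical averaged curve: lag in years vs. true-positive rate in percent.
lag_years = [0.0, 1.0, 2.0]
avg_tp_percent = [20.0, 13.0, 8.0]
epsilon = 12.0  # security threshold, in percent
lag = first_crossing_below(lag_years, avg_tp_percent, epsilon)
```

With these illustrative inputs the curve crosses the threshold between the samples at 1 and 2 years; a curve that never falls below $\varepsilon$ yields `None`, which corresponds to the defender not lagging behind for the chosen threshold.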

Analysis on Fig. 10 (c): This figure plots the sample of the defender’s EE based on the three attack generations with $t^{\prime}=238,609,973$ , resulting in three curves, each corresponding to one of the three ${\mathcal{A}}_{t^{\prime}}$ ’s. We expect that for a fixed reference attack ${\mathcal{A}}_{t^{\prime}}$ , the defense effectiveness ${\mathcal{D}}_{t}({\mathcal{A}}_{t^{\prime}},M)$ should increase over time. However, this is not universally true because the Snort rule set does not grow monotonically (i.e., some rules are occasionally deleted in order to reduce false-positives or replace poorly written rules). This explains why the effectiveness is not monotonically increasing against each given attack generation.
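One way to locate the defense generations at which effectiveness regresses against a fixed attack generation is sketched below. The function name and the sample values are hypothetical; the input is assumed to be the sequence ${\mathcal{D}}_{t}({\mathcal{A}}_{t^{\prime}},M)$ ordered by defense release time.

```python
def effectiveness_drops(effectiveness):
    """Return the indices i where effectiveness[i] is lower than effectiveness[i - 1]."""
    return [
        i
        for i in range(1, len(effectiveness))
        if effectiveness[i] < effectiveness[i - 1]
    ]


# Hypothetical true-positive rates (percent) of successive defense generations
# measured against one fixed attack generation.
ee_curve = [10.0, 12.5, 12.5, 11.0, 14.0]
drops = effectiveness_drops(ee_curve)
```

Each reported index marks a release that performed worse than its predecessor against the same attack, which is a natural starting point for checking whether rules were removed or rewritten in that release.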

Results Analysis: Based on the result in Fig. 10 (c), we examine the alert types from DEFCON 22. We learn that the attackers in DEFCON 22 primarily attacked protocols for which the Snort Preprocessor Plugins had more reliable effectiveness, such as HTTP Inspect pre-processors (relative to those in DEFCON 21). The subsequent year, DEFCON 23, shows the worst detection across all defensive generations. Because this attack generation was more effective than the corresponding defensive generations, we can conclude that the attacker outmaneuvers the defenders (i.e., the Snort community) in this case.

From the defender’s EE with reference to DEFCON 22, we also observe that the EE for ${\mathcal{A}}_{609}$ (DEFCON 22) increases sharply when defense ${\mathcal{D}}_{208}$ is released. In this case, the defender predicted the attack generation. One situation which could cause this is that the defender (i.e., the Snort community) saw a proof-of-concept exploit, and then evolved in order to mitigate this exploit. Later, the exploit became widespread in the wild, and was used by the attackers. However, the exploit was not successful when the attackers finally adopted it because the defender had already prepared for it. The exploit eventually fell out of popularity, so it ceased to show up in the next attack generation. This example would explain why the change appears in DEFCON 22 data, but not DEFCON 21 or DEFCON 23 data.

In summary, Snort exhibits a lower responsiveness to human-launched attacks, which are presumably most prevalent in DEFCON CTF competitions. However, the static effectiveness, interpreted as a static metric sequence ${\mathcal{D}}_{t}({\mathcal{A}}_{t},M)$ for $t=0,1,\ldots,T$ , is actually higher than its counterpart for the dataset observed by the honeypot, which is more likely to contain mostly automated attacks. This is partly because Snort has some proactive defense capability. These observations indicate that cyber agility and static effectiveness need to be investigated separately.

V-D Discussion

For the honeypot dataset, the mean of ${\mathcal{D}}_{t}({\mathcal{A}}_{t},tp)$ over the time horizon is $8.11\%$ . For the DEFCON dataset, the mean of ${\mathcal{D}}_{t}({\mathcal{A}}_{t},tp)$ over $t\in\{238,609,973\}$ is $17.67\%$ . This means that Snort is not effective overall. However, we suspect that the lower true-positive rate of Snort against the honeypot dataset is largely because these low-interaction honeypots limit the depth to which the attacker may penetrate. If the dataset were collected by a high-interaction honeypot, the true-positive rate would have been higher. Snort appears to be more effective in detecting reconnaissance activities (mainly shown in the honeypot dataset) than detecting exploitation activities (mainly shown in the DEFCON CTF activities). This is why Snort appears to evolve more efficiently against the attacks observed by the honeypot.

In summary, from the perspective of static effectiveness, Snort is more effective in detecting attacks largely launched by human attackers than in detecting attacks observed by honeypots. One caveat is that the attacks observed by low-interaction honeypots are not semantically rich enough because they capture only a limited interaction with attackers. Defense generations appear to be more effective in offsetting attacks observed by honeypots than in offsetting attacks launched by human attackers. Nevertheless, Snort exhibits a potential proactive defense capability, as discussed earlier. Therefore, cyber agility needs to be investigated separately from static effectiveness. Another application of these metrics is the following: an attacker may be interested in predicting how long a zero-day attack will be usable before an effective defense is deployed. For example, the attacker can calculate the expected EGT in response to zero-day attacks. This also helps the defender to estimate the attacker’s expectation of EGT, which may be leveraged to launch advanced defense (e.g., deception).
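As a rough illustration of the expected-EGT estimate mentioned above, the sketch below uses the three EGT values reported in Section V-C for the DEFCON attack generations (49, 441, and 69 days). Treating the empirical mean as the estimate of the expected EGT is an assumption made here; with only three samples the estimate is illustrative at best.

```python
from statistics import mean, median

# EGT samples in days, keyed by attack generation time t' (values from Section V-C).
egt_days = {238: 49, 609: 441, 973: 69}

# Empirical mean as a naive estimate of the expected EGT (an assumption of this sketch).
expected_egt = mean(egt_days.values())

# The median is less sensitive to the single 441-day outlier.
typical_egt = median(egt_days.values())
```

The mean is roughly 186 days while the median is 69 days, which shows how strongly one slow response dominates an estimate drawn from so few generations.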

VI Limitations

In this section, we discuss the limitations of our study. Addressing these limitations will guide us to strengthen and refine the current metrics framework and other related metrics research.

First, the metrics require the defender to record the network traffic and/or computer execution traces in order to measure ${\mathcal{A}}_{t}({\mathcal{D}}_{t^{\prime}})$ in retrospect, where $t<t^{\prime}$ , $t=t^{\prime}$ or $t>t^{\prime}$ . This may not always be feasible, especially for high speed networks that generate a large volume of network traffic or complex applications that may incur concurrent executions. Nevertheless, this appears to be the only way to measure the response to new or zero-day attacks.

Second, the datasets used are not ideal because they lack rich semantics (i.e., the low-interaction honeypot dataset) or continuity over a long period of time (i.e., the DEFCON dataset). Nevertheless, we are able to validate most metrics using one dataset or the other (with the exception of LBT). Because the datasets complement each other, these experiments sufficiently demonstrate the usefulness of the framework.

Third, the TT metric aims to correlate the effects of adversarial actions between the two parties. In the datasets used, this may not accurately represent the causality of their evolution. For example, in the DEFCON dataset, Snort was not updated during a competition, so evolution by the attacker may not have been caused by a change in Snort’s rules. Nevertheless, an approximation to the “ground-truth” evolution generation may, as discussed in Section IV, be practically useful enough.

Fourth, our study represents only the first step towards the ultimate goal of measuring cyber agility. Even if many limitations exist as mentioned above, our case study clearly shows that a detection tool, like Snort, evolves effectively and has proactive defense capability. These have not been studied in the literature.

Fifth, the proposed framework contains a set of metrics that may only capture some aspects of cyber attack and defense evolution. The framework may require further investigation to fully establish a sense of completeness, which is important because any security metric of interest can be derived from a complete set of metrics.

VII Conclusion

We have presented a suite of metrics to measure cyber agility by estimating the degree of attack and defense generational evolution. The proposed set of metrics includes generation time, effective generation time, triggering time (from detection to performing an adaptive defense), evolutionary effectiveness, lagging-behind time, relative generational impact, and aggregated generational impact. These metrics mainly focus on measuring timeliness and effectiveness in order to capture the core concepts of agility. We demonstrated the measurement of these metrics using two real-world datasets (i.e., honeypots and DEFCON), which were tested using Snort. We discussed the underlying meanings of these metrics as well as their implications.

The metrics proposed in this paper can provide valuable insights for defense strategy because they allow a defender to measure several aspects of cyber agility. These aspects include the defender’s own responsiveness to attacks over time, the identification of which defense changes have been targeted by attackers, and the ongoing effectiveness of the defender against a single attacker or a set of attackers. Insights derived from these metrics will enable the defender to become more responsive, more targeted and more effective in their competition to outmaneuver attackers. As one example, the ability to measure security effectiveness in retrospect makes it possible to characterize why the defense failed against new or zero-day attacks. This may lead to insights into how the failure may be prevented in the future. As another example, being able to tell the attackers apart would allow the defender to use our metrics to tell which attackers are more agile than others. This would allow the defender to prioritize the defense correspondingly (e.g., paying special attention to the more agile attackers). As yet another example, the defender can use our metrics to tell which defense changes have been particularly targeted by attackers. This would prompt the defender to use more advanced defense techniques (e.g., deception) to offset the attacker’s agility against these defense changes.

Acknowledgment. We thank the reviewers for their insightful comments that guided us in improving the paper. For example, the term generation was suggested to replace our original term of adaptation because the former can more broadly accommodate attack and defense updates that are not necessarily incurred by a specific opponent move, effectively making the metric framework more widely applicable.

This research was supported in part by the US Department of Defense (DoD) through the office of the Assistant Secretary of Defense for Research and Engineering (ASD (R&E)), ARO Grant #W911NF-17-1-0566, ARL Grant #W911NF-17-2-0127, and NSF Grant #1814825. The views and opinions of the authors do not reflect those of the US DoD, ASD (R&E), Air Force Research Laboratory, US Army, or NSF. Approved for Public Release; Distribution Unlimited: 88ABW-2019-1731 Dated 15 April 2019.

Bibliography

[1] P. McDaniel, T. Jaeger, T. F. La Porta, N. Papernot, R. J. Walls, A. Kott, L. Marvel, A. Swami, P. Mohapatra, S. V. Krishnamurthy, and I. Neamtiu, “Security and science of agility,” in Proceedings of the First ACM Workshop on Moving Target Defense, MTD’14, pp. 13–19, 2014.
[2] L. M. Marvel, S. Brown, I. Neamtiu, R. Harang, D. Harman, and B. Henz, “A framework to evaluate cyber agility,” in Military Communications Conference, MILCOM 2015-2015 IEEE, pp. 31–36, IEEE, 2015.
[3] J.-H. Cho, P. Hurley, and S. Xu, “Metrics and measurement of trustworthy systems,” in IEEE Military Communication Conference (MILCOM 2016), 2016.
[4] M. Pendleton, R. Garcia-Lebron, J.-H. Cho, and S. Xu, “A survey on systems security metrics,” ACM Comput. Surv., vol. 49, pp. 62:1–62:35, Dec. 2016.
[5] J. Cho, S. Xu, P. Hurley, M. Mackay, T. Benjamin, and M. Beaumont, “Stram: Measuring the trustworthiness of computer-based systems,” ACM Comput. Surv. (accepted for publication), 2018.
[6] S. Xu, W. Lu, and L. Xu, “Push- and pull-based epidemic spreading in arbitrary networks: Thresholds and deeper insights,” ACM Transactions on Autonomous and Adaptive Systems (ACM TAAS), vol. 7, no. 3, pp. 32:1–32:26, 2012.
[7] S. Xu, W. Lu, and Z. Zhan, “A stochastic model of multivirus dynamics,” IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 1, pp. 30–45, 2012.
[8] M. Xu and S. Xu, “An extended stochastic model for quantitative security analysis of networked systems,” Internet Mathematics, vol. 8, no. 3, pp. 288–320, 2012.