Control Behavior Integrity for Distributed Cyber-Physical Systems

Sridhar Adepu; Ferdinand Brasser; Luis Garcia; Michael Rodler; Lucas Davi; Ahmad-Reza Sadeghi; Saman Zonouz

arXiv:1812.08310·cs.CR·December 21, 2018

TL;DR

This paper introduces Scadman, a system designed to ensure control behavior integrity in distributed cyber-physical systems, effectively detecting various cyberattacks without false positives, especially in safety-critical environments.

Contribution

The paper presents a novel approach to verifying control behavior integrity in cyber-physical systems, addressing limitations of existing security solutions for safety-critical ICS environments.

Findings

01. Successfully detects a wide range of attacks including malware and code-reuse.

02. No false positives observed during nominal operation.

03. Effective in real-world water treatment ICS testbed.

Sridhar Adepu∗1, Ferdinand Brasser†2, Luis Garcia‡3, Michael Rodler⊕4, Lucas Davi⊕5, Ahmad-Reza Sadeghi†6, Saman Zonouz∘7

∗Singapore University of Technology and Design Singapore; †Technische Universität Darmstadt;

‡University of California, Los Angeles; ⊕University of Duisburg-Essen; ∘Rutgers University

[email protected]; {ferdinand.brasser2,ahmad.sadeghi6}.trust.tu-darmstadt.de; [email protected];

{michael.rodler4, lucas.davi5}@uni-due.de [email protected]

Abstract

Cyber-physical control systems, such as industrial control systems (ICS), are increasingly targeted by cyberattacks. Such attacks can potentially cause tremendous damage, affect critical infrastructure or even jeopardize human life when the system does not behave as intended. Cyberattacks, however, are not new and decades of security research have developed plenty of solutions to thwart them. Unfortunately, many of these solutions cannot be easily applied to safety-critical cyber-physical systems. Further, the attack surface of ICS is quite different from what can be commonly assumed in classical IT systems.

We present Scadman, a system with the goal to preserve the Control Behavior Integrity (CBI) of distributed cyber-physical systems. By observing the system-wide behavior, the correctness of individual controllers in the system can be verified. This allows Scadman to detect a wide range of attacks against controllers, such as programmable logic controllers (PLCs), including malware attacks, code-reuse attacks, and data-only attacks. We implemented and evaluated Scadman based on a real-world water treatment testbed for research and training on ICS security. Our results show that we can detect a wide range of attacks, including attacks that have previously been undetectable by typical state estimation techniques, while causing no false-positive warnings for nominal threshold values.

I Introduction

Industrial control systems (ICS) are used in a multitude of control systems across industrial sectors and critical infrastructures, including electric power transmission and distribution, oil and natural gas production, refinery operations, water treatment systems, wastewater collection systems, as well as pipeline transport systems [64]. ICS typically consist of interconnected embedded systems, called programmable logic controllers (PLCs). In a distributed ICS, multiple PLCs jointly control a physical process or the physical environment. Using a series of sensors and actuators, PLCs can monitor the physical system’s state and control the system behavior. This makes the correct functioning of PLCs crucial for the correct and safe operation of these systems.

This critical role of the PLCs makes them a valuable target for adversaries aiming to interfere with any of these systems [9]. Past incidents show that such attacks are applied in practice, often remaining undetected over a long period of time. Examples include the infamous Stuxnet worm [29] against Iranian uranium enrichment facilities as well as the BlackEnergy crimeware [28] against the Ukrainian railway and electric power industries. These attacks demonstrate impressively that targeted attacks on critical infrastructure can evade traditional cybersecurity detection and cause catastrophic failures with substantive impact. The discoveries of Duqu [21] and Havex [61] show that such attacks are not isolated cases, as they infected ICS in more than eight countries. Nation-state ICS malware has typically targeted either the control programs of PLCs or the central control infrastructure (e.g., operator workstations). However, academic research has demonstrated even more sophisticated attacks against ICS and PLCs that can circumvent existing defense mechanisms by manipulating the PLC’s firmware and incorporating physics-aware models into the attack code [31].

A comprehensive defense against ICS attacks needs to protect against various attack vectors. (1) The software determining a PLC’s behavior could be replaced by a malicious program [29, 44, 16]. Updating the PLC control program over the network is an intended functionality of PLCs to allow central management. However, the control program can also be manipulated if an attacker gains physical access to a PLC. (2) The PLC firmware (which includes the OS) could be manipulated or replaced either via the network or through physical access [31, 15]. (3) The attacker can exploit a memory corruption vulnerability (e.g., a buffer overflow [62]) in the PLC’s control programs and/or firmware for code injection or to launch run-time attacks such as return-oriented programming (ROP) [60] to manipulate a PLC’s behavior. (4) Memory corruption vulnerabilities can be exploited to launch data-only attacks [38] against a PLC to manipulate its behavior. For instance, the initiation of a trigger-response may be inhibited by manipulating the associated control parameters [59], e.g., a threshold value that determines whether the action must be started.

For all attack vectors enumerated above, a common goal of the adversary is to modify the physical behavior of the system. As long as the attacked device behaves correctly, the overall system will continue to operate correctly. Therefore, the ultimate goal of the attacker will always be to change a device’s behavior.

Previous work on defending against ICS attacks focuses only on a subset of the attack vectors listed above. These approaches generally fall into two categories: defenses that focus on verifying the integrity of the software running on a PLC, and defenses that verify the behavior of the overall ICS based on models that abstract control decisions of the PLC software. In the former case, PLC-based verification solutions typically cannot account for attacks that replace and/or modify either the application-layer programs or the underlying firmware. For instance, ECFI [4] provides protection against run-time attacks targeting PLC control programs, but does not protect against data-only attacks or maliciously modified or replaced control programs or firmware. Orpheus [20] monitors the behavior of a device’s control program based on the invoked system calls. Attacks are detected based on a finite-state machine (FSM) representing the control program’s benign system call behavior. Orpheus’ behavior monitor is placed inside the device’s OS; hence, a compromised OS can disable and circumvent its protection mechanism.

Similarly, solutions that enforce compliance via state estimation [10] or cyber-physical access control [27] from within the PLC could be circumvented as well. Zeus [37] uses side-channel analysis to verify the software control flow of programs running on a PLC, but cannot defend against firmware modifications or sensor data attacks. By extension, offline static analysis of control programs being loaded onto PLCs [53, 23] provides even fewer run-time guarantees. For ICS-based verification techniques, it has been shown that state estimation can be used to infer the control commands issued by distributed controllers [26] or to detect false data injection attacks [51] based on the sensor data. Such protection mechanisms may be circumvented via physics-aware attacks [31, 32]. Further, supervised machine learning has been used to characterize physical invariants of the CPS [19]. However, such approaches depend on the training data to include all corner cases of the system execution and are not based on the control flow of the software.

We present Scadman, the first control behavior integrity (CBI) solution for distributed industrial control systems. Unlike previous state estimation approaches, Scadman does not abstract the behavior of the cyber-components (i.e., PLCs). Instead, Scadman precisely simulates the state of all PLCs. By monitoring the input and output behavior of the entire ICS, Scadman can detect inconsistencies within the actions of PLCs. To enable a global view of the entire ICS, a consolidated control program of all PLCs in the system is generated to resolve functional dependencies between individual programs. The consolidated control program, in conjunction with a physical state estimator, is used to determine a set of acceptable states at any particular point in time. For that, Scadman needs means to analyze which control-flow paths are valid given the current system state. Based on this context-aware control-flow path analysis, Scadman determines benign resulting states. Comparing the set of benign states against the reported sensor readings and actuation commands from the ICS allows Scadman to detect anomalies in the system behavior. This makes Scadman agnostic to the various attack techniques listed above that can be used to cause a PLC to deviate from its intended behavior, and makes Scadman a powerful tool to protect ICS against a wide range of attack vectors.

We evaluated Scadman on real-world industrial control system equipment [40], which is the quasi-standard for security research validation in the context of ICS [42, 49, 19, 18, 35, 45, 68, 67, 39, 65, 57]. Simulation-based evaluation does not provide a viable option for Scadman. In general, simulations are based on models of an ICS, similar to Scadman. Hence, such an evaluation would only validate the accuracy of our models against the model of the simulator, leading to no meaningful results.

We make the following contributions:

• We present Scadman, the first control behavior integrity system for distributed ICS based on a model comprising cyber and physical components.

• Our solution does not require any changes to the hardware or software of the PLCs, making it independent of the PLC manufacturers. Furthermore, leaving the PLCs unmodified is important for safety certifications to remain valid in the presence of Scadman.

• We provide an automated solution that allows Scadman to consolidate the control programs of all PLCs in an ICS. This allows us to comprehensively simulate the control behavior of the entire system, which is important for detecting inconsistent behavior across the borders of individual PLCs.

• We implemented Scadman using the MATIEC compiler from the OpenPLC project for automated generation of the consolidated PLC control program and LLVM for instrumentation of the consolidated PLC control program.

• We evaluated Scadman on real-world ICS network equipment. Using its run-time control behavior monitoring, Scadman was able to detect all the attacks of different types against the platform in a timely manner.

The rest of the paper is structured as follows. First, we provide background on the most relevant topics related to our work (industrial control systems, control-flow integrity, and cyber-physical system modeling) in Section II. We define the assumptions and system model underlying our work in Section III. In Section IV we explain the design and main ideas of Scadman. We detail our implementation in Section V. In Section VI we discuss Scadman’s security for various attack scenarios, and we present our evaluation results in Section VII. Relevant related work is discussed in Section VIII. Section IX proposes future work directions, while Section X concludes.

II Background

In this section we first provide background on industrial control systems (ICS) in general. Afterwards we introduce two concepts, control-flow integrity (CFI) and cyber-physical systems modeling, which have been used in the past in an attempt to secure ICS; however, neither of them is sufficient to solve this challenge.

Industrial Control Systems. Programmable logic controllers (PLCs) are cyber-physical systems that are used to control industrial appliances. PLCs feature input and output modules, which translate physical inputs (in most cases current on a wire) into digital values and vice versa, to interact with physical appliances like sensors and actuators.

PLCs can convert sensor readings into digital values, process the readings with the built-in computing unit, and forward the outputs to actuators to manipulate the physical world. Based on the available information about the system state, a PLC calculates the next actuations to steer the system towards a desired state. The program running on the PLC, called control logic, defines the control algorithm used to decide actuations. The control logic program(s) of a PLC are Turing complete and programmable using the development environments provided by the PLC manufacturers. The target system state towards which the PLC is working can be fixed in the control logic or could be set dynamically over the network by the ICS operator.

Control logic programs can be loaded onto PLCs and run on top of a privileged software layer like a real-time operating system (RTOS). This privileged software layer contained in the PLC’s firmware provides services to the control logic programs (e.g., networking, storage) and manages the programs’ updates and execution. The control logic programs are executed repeatedly in fixed intervals, called scan cycles. Furthermore, in a distributed ICS, the physical process is jointly controlled by multiple PLCs. To do so, PLCs are usually connected through a computer network, allowing them to share information like sensor readings or internal states.
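The scan-cycle execution described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper: the read-inputs, execute-logic, write-outputs structure is the common shape of a PLC scan cycle, while `control_logic`, the `level` input, the `valve_open` output, and the setpoint value are hypothetical names chosen for the example.

```python
def control_logic(inputs, setpoint=0.8):
    """Hypothetical control logic: keep the inlet valve open below the setpoint."""
    return {"valve_open": inputs["level"] < setpoint}


def run_scan_cycles(sensor_samples, logic=control_logic):
    """Run one scan cycle per sensor sample and collect the outputs.

    Each cycle (1) latches the inputs, (2) executes the control logic on
    that snapshot, and (3) emits the resulting actuator outputs.
    """
    outputs = []
    for sample in sensor_samples:
        inputs = dict(sample)      # 1. input scan: snapshot of the sensor values
        result = logic(inputs)     # 2. program scan: run the control logic once
        outputs.append(result)     # 3. output scan: values sent to actuators
    return outputs


if __name__ == "__main__":
    print(run_scan_cycles([{"level": 0.5}, {"level": 0.9}]))
```

A real PLC runs this loop at a fixed period enforced by its firmware; the sketch only iterates over pre-recorded samples to keep the example deterministic.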

The PLCs in an ICS are usually managed and monitored through central management systems, called Supervisory Control and Data Acquisition (SCADA). Typical components of a SCADA system are historians, which are databases logging data from all control devices in the ICS, IT infrastructure servers that connect the ICS to other systems such as a supply chain management system, human machine interfaces (HMI), which allow an operator to interactively control the system, and operator workstations that provide interactive control as well as PLC reprogramming.

The control logic running on the PLCs is highly application specific and is usually programmed by the plant operator itself. In particular, while the firmware and development tools for PLCs are usually closed source, the operator has full access to the control logic of the PLCs. In order to design and set up a control system, the operator usually needs knowledge, i.e., a model, of the physical processes in the system.

Control-Flow Integrity. Control-flow integrity (CFI) is a defense mechanism against run-time attacks. Modern run-time attacks do not inject or modify the code of a system. Instead, they reuse the existing code by hijacking the control flow of a program in order to cause unintended, malicious program behavior [60]. These attacks have been demonstrated on various platforms and devices, including embedded architectures like ARM [46], SPARC [17] and Atmel AVR [30].

CFI enforces that a program’s control flow does not deviate from the developer-intended flow. The integrity of the program flow is ensured by validating for each control-flow decision if the executed path lies within the program’s control-flow graph [1, 3].
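To make the check concrete, the following sketch validates a recorded sequence of basic blocks against a control-flow graph. It is an illustration only: the block names and the `CFG` dictionary are hypothetical, and real CFI schemes enforce the policy inline at each indirect branch rather than by inspecting a trace after the fact.

```python
# Hypothetical control-flow graph: each basic block maps to its allowed successors.
CFG = {
    "read_sensor": {"check_level"},
    "check_level": {"open_valve", "close_valve"},
    "open_valve": {"write_output"},
    "close_valve": {"write_output"},
    "write_output": set(),
}


def first_cfi_violation(trace, cfg=CFG):
    """Return the first (src, dst) transition in trace that is not an edge
    of cfg, or None if every transition stays inside the graph."""
    for src, dst in zip(trace, trace[1:]):
        if dst not in cfg.get(src, set()):
            return (src, dst)
    return None


if __name__ == "__main__":
    benign = ["read_sensor", "check_level", "open_valve", "write_output"]
    hijacked = ["read_sensor", "open_valve", "write_output"]
    print(first_cfi_violation(benign))
    print(first_cfi_violation(hijacked))
```

Note that both branches out of `check_level` are edges of the graph, so a check of this kind accepts either one regardless of the actual sensor value.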

In the context of ICS, the guarantees provided by CFI are not sufficient. In particular, data-only attacks like data-oriented programming (DOP) [38] pose a severe threat to PLCs. Simple modifications like changing a threshold value can have catastrophic consequences, e.g., in the attack against a steel mill, the blast furnace could not be turned off due to compromised controllers, resulting in massive damage (footnote 1: https://www.wired.com/2015/01/german-steel-mill-hack-destruction/).

Cyber-Physical Systems Modeling. ICS comprise a class of cyber-physical systems that can be modeled as hybrid systems, i.e., systems whose continuous dynamics (physical equations) evolve based on the discrete-state transitions (controller actuations) of the system [13].

For instance, in Figure 1 a simplified example of two PLCs controlling the mixing and filling of colors is shown. Four input colors are mixed and filled into cans. The input of each color is controlled by PLC1, which controls the respective valves. PLC2 controls the conveyor belt, using a scale to determine when the current can is full and the next one has to be placed under the mixer. The pseudo code and control-flow graph show the relation between the actions of PLC1 and the readings of PLC2, i.e., the operations of PLC1 determine the physical behavior observed by PLC2. In such a hybrid system, the closing of the valve is a discrete event. The time required to fill a single can, on the other hand, will increase gradually (evolve continuously).
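A toy version of this hybrid behavior can be simulated as follows. The sketch is an assumption-laden illustration rather than the model from Figure 1: the flow rate, time step, and full-can weight are invented values, the four color valves are collapsed into a single inflow controlled by PLC1, and the continuous filling is approximated by fixed time steps.

```python
def simulate_filling(steps, flow_rate=0.5, dt=1.0, full_weight=2.0):
    """Simulate can filling as a hybrid system.

    Continuous part: the weight on the scale grows with the inflow that
    PLC1 allows through its valves (flow_rate, a hypothetical value).
    Discrete part: when the scale read by PLC2 reaches full_weight, the
    conveyor advances and an empty can is placed under the mixer.
    Returns (number of cans filled, weight of the can currently filling).
    """
    weight = 0.0
    cans_filled = 0
    for _ in range(steps):
        weight += flow_rate * dt      # continuous evolution, discretized
        if weight >= full_weight:     # discrete event observed by PLC2
            cans_filled += 1
            weight = 0.0              # next (empty) can
    return cans_filled, weight


if __name__ == "__main__":
    print(simulate_filling(10))                   # nominal inflow
    print(simulate_filling(10, flow_rate=0.25))   # PLC1 throttles the valves
```

Halving the inflow on the PLC1 side changes how often PLC2 sees a full can, which is the kind of physical coupling between controllers that the example in the text describes.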

In the context of ICS, state estimation techniques have been leveraged to model physical dynamics for particular discrete events of the system [25, 27]. However, the actuation of the associated controlling devices is typically abstracted to reduce the complexity of the model, neglecting the underlying control-flow behavior of any running programs. This simplification opens up these systems to motivated adversaries that exploit such abstractions to launch stealthy attacks [43].

III Models and Assumptions

In this section, we present the system model and adversary model considered in this paper.

III-A System Model

We consider large distributed industrial control systems (ICS) with a centralized monitoring system (SCADA). This is the predominant system design [64] for large-scale industrial plants. The ICS consists of networked controllers (PLCs) that jointly control a (complex) physical process, where the actions of the individual PLCs are interdependent. In particular, actuations initiated by one PLC affect the system state, which will be represented in the sensor readings of other PLCs. This means that all PLCs are indirectly connected with each other through the physical dynamics of the controlled physical system. (Footnote 2: Note that the system does not need to be “fully” connected, i.e., the actuations of one PLC are not required to be observed by all other PLCs.)

Each PLC is connected to its own local array of sensors and actuators. (Footnote 3: For simplicity we assume local sensors and actuators; however, remote I/O devices, i.e., networked sensors and actuators, can be modeled in our system as “PLCs without computations”.) These sensors and actuators are directly interfacing with the physical system, and associated discrete sampled values are accessed by the PLCs.

In addition to the indirect connection between PLCs, all components of a distributed ICS are also connected explicitly. That is, all PLCs are connected to each other and to SCADA over a computer network, e.g., Ethernet. By means of the computer network, input and output data of all PLCs are reported to the SCADA system, where data is recorded in a historian database and can be viewed by the operator.

III-B Adversary Model

The adversary’s goal is to cause misbehavior of the ICS while remaining undetected. The behavior of the system refers to the actions that influence the physical process controlled by the ICS. In particular, the adversary alters control commands sent to physical appliances (actuators) that can change the state of the physical process. Passive attacks that do not alter the system behavior, e.g., attacks that exfiltrate data, are out of scope in this work.

ICS usually have built-in safety functions that will be triggered by rapid changes of the system state or control commands that set system parameters far outside of the valid range. Therefore, the adversary needs to make sure not to trigger these safety mechanisms. Similarly, ICS are usually monitored by a human operator. The adversary has to make sure that her manipulations do not cause suspicion on the operator’s side [31], i.e., the attack needs to be stealthy. This precludes naïve attacks like denial-of-service (DoS) on devices or the network.

Number of Compromised PLCs:

The adversary can compromise one (or a small subset) of the PLCs in a distributed ICS. The attacker in our adversary model has knowledge about the attacked system, so we have to assume a compromised PLC will report legitimate sensor values. Furthermore, we assume that the attacker is not able to compromise all PLCs in the ICS. We argue that this is a realistic assumption for several reasons. For instance, in a geographically distributed ICS, the adversary might have physical access only to some PLCs in a remote station of the plant. This is relevant if the attacker compromises PLCs via physical access, e.g., by updating the control logic via USB, replacing storage media like an SD memory card, or through debugging interfaces like JTAG [34]. Furthermore, PLCs often have physical switches that deactivate the remote update functionality, i.e., an adversary has to have physical access to a PLC before being able to modify its software remotely. Other reasons why an adversary cannot compromise all PLCs of a plant include systems that consist of heterogeneous PLCs, i.e., PLCs with different hardware or firmware versions, different models, or even from different vendors. If the adversary has knowledge about a vulnerability in one of the PLC variants, she can compromise these but not the other PLCs of the system. Also, PLCs might be isolated in different network segments, exposing only a subset to a remote attacker.

We assume that the adversary has complete control over the compromised PLC, i.e., she can compromise the firmware and the control logic of the PLC. The adversary can gain control over a PLC leveraging static or dynamic attack techniques. In a static attack, the adversary replaces the software (firmware or control logic) of a PLC, e.g., via a malicious software update. Dynamic attacks are based on injecting new code at run-time, manipulating the behavior of existing code by means of return-oriented programming (ROP) [60], or data-oriented programming (DOP) [38].

Scadman:

We assume that Scadman itself is not compromised. Scadman executes on a separate system that is isolated from all other system components, including operator workstations and PLCs. Run-time attacks that can compromise both a PLC and Scadman at the same time are particularly hard to find due to the architectural differences between them. PLCs are typically ARM- or MIPS-based systems, while Scadman is typically executed on an x86-based computer. Further, we assume that the Scadman system is hardened against cyber attacks using well-known defense mechanisms (e.g., CFI [2], or SoftBound [55] and CETS [56]) and protected against physical attacks, e.g., by being placed in a physically protected environment. Malicious updates of the Scadman system, e.g., through compromised operator workstations, can be countered with standard methods such as digital signatures and two-factor authentication. (Footnote 4: Such methods cannot be easily retrofitted to protect software updates of the PLCs themselves due to resource limitations and legacy compliance requirements.)

Network Attacks:

For the sake of simplicity we consider network attacks out of scope. We assume a secure, i.e., integrity protected and authenticated channel between the controllers and Scadman. We discuss network attacks and defense mechanisms for settings without secure channels in Section VI-C.

IV Our Design

Before we describe our Scadman design and framework, we discuss important challenges that we had to tackle for Scadman.

IV-A Challenges

Important limitations in ICS stem from closed source, proprietary software. Control software, firmware, and compilers are usually manufacturer specific and cannot be modified by the customer. Thus, modification of the software running on the PLC is not feasible as it would require cooperation of the manufacturer. The control logic, on the other hand, is usually developed by the plant operator, i.e., the operator has full access to its source code.

Modifications of the PLC software can also lead to undesirable implications that will hinder adoption in practice. Safety and reliability are paramount in ICS. Hence, all modifications that could impact them are unlikely to be adopted. In particular, in systems that require safety certification, modifications of the PLC software would void the certification, i.e., solutions that rely on the modification of control components cannot be used in highly sensitive environments.

Finally, having a comprehensive view of the system is necessary to detect the various attacks mentioned before. Although a single PLC may have access to monitor the values of other sensors/actuators of the ICS, the maintenance of state estimation of the physical processes will incur significant overhead in the PLC scan cycle. Even if a PLC had the memory resources for such state estimation, the computation of the state estimation may cause a violation of the real-time constraints.

IV-B Scadman Design

The goal of Scadman is to ensure the correct behavior of a distributed industrial control system (ICS). The correct behavior can be violated by different types of attacks, as discussed before in Section III-B. As a result, Scadman must provide a general mechanism that can counter all possible attacks that result in an incorrect control behavior of the system. Control behavior includes any action taken by any of the PLCs that modifies the overall system state. We consider the control behavior as correct if it fits the behavior intended by the system operator. An attack can result in incorrect control behavior when the attacker makes one of the components perform a different action than was intended. For example, a PLC is supposed to close a valve when a certain threshold is reached. The attacker then forces the PLC to keep the valve open, contrary to the original programming of the PLC. Scadman ensures the control behavior integrity (CBI) of an ICS. A violation of CBI is a deviation from the intended behavior of any of the PLCs within the ICS.

The system can also deviate from the intended state for other reasons like faults, e.g., a faulty sensor reporting incorrect values. Scadman can detect these situations, allowing the operator to repair the system.

Figure 2 shows the concept of Scadman. In a distributed ICS, multiple PLCs interact independently with a physical process. However, the actions of one PLC influence the overall state of the physical system. This is reflected in the sensor readings of other PLCs. We exploit this interdependency to detect the misbehavior of a compromised PLC. For example, a compromised PLC cannot stealthily open a valve because a second trustworthy PLC, which is not under attacker control, would measure the change in inflow. This discrepancy between expected sensor readings and the actual system state is used by Scadman to detect deviations in the control behavior.

Scadman-Monitor:

All PLCs report their actuation commands and sensor readings to a central entity, which we call Scadman-Monitor. The Scadman-Monitor is a program that interacts with a simulated physical system and allows Scadman to calculate the expected state of the overall ICS. Note that centrally reporting and logging all operations is very common in ICS [64], e.g., for reporting to HMI components. Based on the retrieved data, the Scadman-Monitor will subsequently check whether any of the PLCs have been deviating from the intended behavior, i.e., it checks that all PLCs have been following a small set of valid control-flow paths given the current system state. This check requires two components of the Scadman-Monitor: (1) a consolidated control logic code of all PLCs, and (2) a model of the physical process allowing Scadman-Monitor to determine the interdependencies of the PLCs’ inputs and outputs, shown on the right in Figure 2.

Scadman generates the consolidated control logic, which combines the control logic of all PLCs into a single large program that represents the entire control actions of the ICS. This code is executed on the Scadman-Monitor to determine valid actions of the PLCs. Based on the current state of the overall system and the model of the system’s physical state, Scadman dynamically derives the legitimate control-flow paths through the PLC code in a physics-aware manner. This means that Scadman does not accept all possible, benign control-flow paths in the control logic’s control-flow graph as valid, as is the case with CFI [1, 3], but only those that are valid at any given time in the current state of the CPS. This approach limits the set of allowed control-flow paths and thus the adversary’s actions.
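The idea of accepting only those control-flow paths that are feasible in the current physical state can be illustrated with the valve-and-threshold example from the beginning of this section. The sketch below is not Scadman's implementation: the branch names, the threshold, and the representation of the estimated level as a closed interval are assumptions made for illustration.

```python
def feasible_branches(level_interval, threshold):
    """Return the branches of `if level >= threshold` that are feasible when
    the true level is only known to lie within level_interval = (lo, hi).

    "close_valve" is the branch taken when the threshold is reached;
    "keep_open" is the branch taken otherwise (both names are hypothetical).
    """
    lo, hi = level_interval
    branches = set()
    if hi >= threshold:    # some admissible level satisfies the condition
        branches.add("close_valve")
    if lo < threshold:     # some admissible level violates the condition
        branches.add("keep_open")
    return branches


if __name__ == "__main__":
    print(feasible_branches((0.2, 0.4), threshold=0.8))
    print(feasible_branches((0.9, 1.0), threshold=0.8))
    print(feasible_branches((0.7, 0.9), threshold=0.8))
```

A plain control-flow graph contains both branches at all times. Under this sketch, a PLC that keeps the valve open while the estimated level lies entirely above the threshold has taken a path that is in the graph but not in the currently feasible set.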

The physical process model allows the Scadman-Monitor to estimate the influence of control commands sent by PLCs on the expected sensor readings. As the adversary cannot influence the physical model of the system (the laws of physics cannot be altered), an inconsistency between actuation commands and sensor readings implies that either the PLC controlling the actuation or the PLC controlling the sensors must behave incorrectly, i.e., issue wrong actuation commands or report forged sensor readings. Scadman can tolerate imprecise and incomplete models. The model quality largely determines the detection precision. However, an imprecise model, noisy sensors, and other factors impacting the state estimation are handled by Scadman as described below and in more detail in Section V-C.
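A minimal consistency check between a reported actuation command and reported sensor readings might look as follows. The single-tank model, the inflow rate, and the tolerance are hypothetical values standing in for a real process model, and the sketch is not taken from the paper.

```python
def consistent(valve_open, level_before, level_after,
               inflow_rate=0.5, dt=1.0, tolerance=0.1):
    """Check whether the level change reported by the sensing PLC matches
    the valve command reported by the actuating PLC.

    The expected change follows a hypothetical tank model: the level rises
    by inflow_rate * dt while the valve is open and stays constant while it
    is closed. The tolerance absorbs model error and sensor noise.
    """
    expected_change = inflow_rate * dt if valve_open else 0.0
    observed_change = level_after - level_before
    return abs(observed_change - expected_change) <= tolerance


if __name__ == "__main__":
    # Valve reported open and the level rose accordingly.
    print(consistent(True, level_before=1.0, level_after=1.5))
    # Valve reported closed, yet the level still rose.
    print(consistent(False, level_before=1.0, level_after=1.5))
```

As the text notes, a failed check of this kind only shows that one of the two PLCs behaves incorrectly; by itself it does not identify which one.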

IV-C Scadman Framework

Our Scadman framework leverages a compiler-based approach to automatically generate the consolidated PLC code and connect it to the physical state estimator. Both are executed in Scadman-Monitor, as shown in Figure 3.

Physical State Estimation:

The state estimator uses physical models of the physical processes that are controlled by the PLCs of the ICS. The state estimator simulates the evolution of the physical system. Based on the current state of the physical system and the actuation inputs to the system, the state estimator determines the following state of the system.

Physical systems usually evolve continuously; however, for Scadman only the system state at the sampling points is relevant, i.e., at the points in time when a PLC reads the system state using its sensors. Models of the physical processes in ICS are usually known by the operator of the plant. Additionally, several recent works have developed methods to extract and generate such models, which can be used with Scadman [25, 27].

Scadman-Monitor Operation:

Scadman runs in parallel with the CPS (on the left in Figure 3) and compares the sensor readings and actuation commands it receives over the network from all PLCs to a set of valid states determined by Scadman-Monitor, which allows it to validate the behavior of the CPS (i.e., the distributed ICS).

Scadman-Monitor works in iterations, similar to the scan-cycle-based operation of PLCs. During each iteration, the current system state and actuation commands are fed into the state estimator of Scadman-Monitor. In parallel, the physical process in the CPS evolves based on the current system state and past actuation commands. The state estimator calculates the state space into which the system should have evolved based on its history; in Figure 3 the state variables $S_{1}\dots S_{3}$ have been predicted to lie within some given interval of possible values. The fuzziness of the state space can be, for instance, due to imprecisions of the physical model used in the state estimator or due to (small) errors in the input values. However, Scadman can tolerate these imprecisions.

The actual physical state in the CPS is read by the PLCs’ sensors and processed by them. In parallel, Scadman-Monitor executes the consolidated PLC code using the estimated state as input. Since the input state can be fuzzy, the execution of Scadman-Monitor has to account for it. By using a technique called error-margin multi-execution, the execution is performed over the entire range of possible input values (see Section V-C for details). By executing all control-flow paths in the consolidated PLC code that are valid in the current system state context, Scadman-Monitor determines the set of possible outputs. Again, the outputs ($O_{1}\dots O_{3}$ in Figure 3) are represented as intervals or sets of allowed values.

When Scadman-Monitor receives the sensor readings and actuation commands of the PLCs from the CPS, it performs the consistency check. If all PLCs were executing correctly, and all sensors and actuators operated correctly, the reported values must be a subset of the outputs determined by Scadman-Monitor. Any deviation indicates an inconsistency within the behavior of the CPS, and Scadman raises an alarm.

If the system was found to be correct, the system state reported by the PLCs is accepted as the current system state and serves as input for the next iteration of Scadman-Monitor. This is necessary to prevent the system state calculated by the state estimator from gradually deviating from the actual system state. (Footnote 5: To detect and counter slowly evolving attacks, Scadman can be adapted to simulate the system state over multiple iterations at the cost of added imprecision in the state estimation; see Section IX.)

V Scadman Implementation

Scadman introduces the Scadman-Monitor process, which is responsible for receiving all sensor values and actuations, and validating them using the consolidated PLC code and state estimation. We propose a generic approach to build the Scadman-Monitor. We will first discuss how Scadman consolidates the control programs for a distributed PLC network along with the necessary assumptions for timing and functional correctness. We then demonstrate how the consolidated code will be compiled and instrumented into an executable that can receive the state of the ICS network, e.g., the state of the sensors, as input for each scan cycle and update the estimated state of the system. We also introduce a novel approach, so-called error-margin multi-execution that allows Scadman to account for cases where the model of the physical system deviates from the real system. Finally, we describe how these components of Scadman can be combined with the physical state estimation to detect any compromised components in the ICS.

Example. For the purpose of clarity, we will provide a simplified representation for a single process control for two PLCs from the system in Figure 1. Figure 4 shows two PLC programs, plc1 and plc2 that are consolidated by Scadman into a single PLC representation, master. PLC1 is responsible for controlling a valve, YellowValve, associated with the yellow color dispenser. The valve will open if an input amount is greater than 0. PLC2 is responsible for moving the conveyor belt if the current can is full and if the YellowValve is not open. Descriptive variable names have been used in the code. This example will be used to explain each component of the implementation.

V-A PLC-Code Consolidation

The premise of generating a Scadman-Monitor representation is to first merge the control program code of all the ICS PLCs into a single PLC program representation. This consolidated representation is necessary for two reasons. First, in order to monitor the distributed processes, we need access to all of the system parameters in order to successfully simulate the physical model of the overall system, i.e., we cannot simulate the physics of a process with partial sensor data. In theory, the consolidation would not be necessary for a subset of the PLCs that do not have any cyber-physical interdependencies. However, these dependencies are difficult to derive. As such, Scadman automatically generates models that incorporate these cyber-physical interdependencies as long as the distributed system conforms to the assumptions required to ensure functional and timing correctness, which are discussed at the end of this subsection.

In this paper, we consider PLC control programs that conform to the IEC 61131 standard [41]. According to the standard, programs are typically composed of three types of program organization units (POUs): programs, functions, and function blocks. A program is the “main program” of the PLC that includes I/O assignments, variable definitions, and access paths. A function is a programming block that returns a value given input and output variables, in a similar vein to function definitions in other procedural programming languages such as C. A function block is a data structure that has the same functionality as a function but retains the associated values in memory across executions. As such, the code consolidation process appends all of the function and function block definitions and merges the main programs of all PLCs. This allows us to retrieve the state of all sensors and actuators of the ICS, feed the values through this consolidated representation, and observe how the actuators are updated. Figure 4 illustrates how the main programs of two PLC programs will be merged.

However, it is common practice to define different components for a single PLC control program using different programming languages. The IEC 61131 standard enumerates five programming languages: (1) ladder diagrams (LD) – a graphical programming language to design logic circuits, (2) sequential function charts (SFC) – another graphical programming language to define sequential state operations, (3) function block diagrams (FBD) – a graphical representation of function blocks, (4) instruction lists (IL) – an assembly-like textual programming language, and (5) structured text (ST) – a textual programming language similar to Pascal. The heterogeneity of a PLC program significantly increases the complexity of any form of static code analysis as compilation and simulation rules would have to be defined for each language. As such, Scadman first converts all programs to a single programming language representation. Previous works have formally proven that the structured text (ST) programming language can be used to represent the other four languages [24] and therefore serves as our base programming language for the Scadman-Monitor.

We will now discuss the correctness of our consolidation process and the necessary assumptions.

Timing correctness. The correctness of consolidating the control logic of all PLCs depends on the required sampling time of the ICS. PLC tasks can be executed either continuously or periodically for some interval. For a distributed network of $N$ PLCs that are configured to run programs at varying sample times, $T_{sample}(i)$ for a $\mathrm{PLC}_{i}$, the consolidation is valid if and only if the sum of the execution times, $T_{execution}(i)$, of all controller programs is less than the smallest task interval, i.e.,

$\sum_{i=1}^{N}T_{execution}(i)<\min_{\forall{i}\in N}T_{sample}(i)$ .

For continuously executing PLC configurations (i.e., event-driven control), the cumulative scanning time must be less than the shortest duration of an input or an output signal [54]. The continuous scan cycle time of PLCs ranges from microseconds to tenths of a second. However, because Scadman is implemented on a standard computer with substantially more computing power than typical PLCs, the “scan cycle” for Scadman’s consolidated PLC code is much faster and, hence, the only bottleneck is the sampling time over network communication. In the system shown in Figure 4, the programs plc1 and plc2 are shown to have the same execution interval timing, which is reflected in the consolidated master program.
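The periodic-task condition above can be checked mechanically. The following is a minimal sketch, assuming execution and sampling times are available as plain numbers in the same unit; the function name and the example timings are hypothetical and not taken from the paper.

```python
from typing import Sequence


def consolidation_is_timing_valid(exec_times: Sequence[float],
                                  sample_times: Sequence[float]) -> bool:
    """Check sum_i T_execution(i) < min_i T_sample(i) for N periodic PLC tasks."""
    if not exec_times or len(exec_times) != len(sample_times):
        raise ValueError("need one execution time and one sample time per PLC")
    return sum(exec_times) < min(sample_times)


# Hypothetical timings in milliseconds for three PLCs.
fits = consolidation_is_timing_valid([2.0, 3.5, 1.5], [10.0, 20.0, 50.0])
too_slow = consolidation_is_timing_valid([6.0, 5.0], [10.0, 20.0])
```

In the first case the summed execution time (7 ms) stays below the shortest sampling interval (10 ms); in the second it does not (11 ms against 10 ms), so sequential consolidation would miss samples.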

With respect to clock drift between PLCs, we assume that the design of the overall ICS accounts for clock drift, as such a hindrance would be a pre-existing condition.

Functional correctness. We also consider the functional correctness of combining multiple control logic programs sequentially into a single control logic program. A PLC’s scan cycle can be abstracted into three components: the scanning of the inputs, the propagation of the inputs through a logic circuit, and the updating of all associated outputs at the end of the scan cycle. The Scadman-Monitor program combines the scanning of inputs and updating of outputs for all PLC programs as these actions are atomic in nature. Previous work has shown that separate processes update the values of inputs/outputs to/from memory independent of the scan cycle process [31].

Because the process of propagating the inputs through a logic circuit to update the associated outputs is parallel in nature across PLCs, we can inductively claim the correctness of merging based on the timing correctness of our assumptions. Furthermore, the ordering of the merging process is arbitrary, as any unsatisfied dependencies or race conditions amongst PLCs would be a pre-existing nuisance in the design of the system. For instance, for the system in Figure 4, the ordering of plc1 and plc2 is arbitrary in the context of the master program. If there was a race condition and/or ordering dependency where both programs were writing to the actuator YellowValve, this would be a flaw by design of the overall ICS.
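The three-phase scan cycle and the order-independence argument can be illustrated with a small sketch. It is written in Python rather than structured text and does not reproduce the code of Figure 4: the logic follows the prose description of plc1 and plc2, while the variable names `InputAmount`, `CanFull`, and `Conveyor` and the function `master_scan_cycle` are assumptions made for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def plc1(inputs: State) -> State:
    # Open the yellow valve while a positive amount is requested.
    return {"YellowValve": 1.0 if inputs["InputAmount"] > 0 else 0.0}


def plc2(inputs: State) -> State:
    # Move the conveyor when the can is full and the valve is not open.
    move = inputs["CanFull"] == 1.0 and inputs["YellowValve"] == 0.0
    return {"Conveyor": 1.0 if move else 0.0}


def master_scan_cycle(state: State,
                      programs: List[Callable[[State], State]]) -> State:
    snapshot = dict(state)          # 1. scan all inputs once
    outputs: State = {}
    for program in programs:        # 2. propagate inputs through each program
        outputs.update(program(snapshot))
    new_state = dict(state)
    new_state.update(outputs)       # 3. update all outputs at the end of the cycle
    return new_state


state = {"InputAmount": 0.0, "CanFull": 1.0, "YellowValve": 0.0, "Conveyor": 0.0}
forward = master_scan_cycle(state, [plc1, plc2])
reverse = master_scan_cycle(state, [plc2, plc1])
```

Because every program reads from the same input snapshot and no two programs write the same output, `forward` and `reverse` are identical. If both programs wrote `YellowValve`, the result would depend on the order, which is the design flaw the text refers to.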

V-B Compilation and Instrumentation

To compute the updated values of actuators at the end of a scan cycle, we need to execute the consolidated PLC code given the state of the sensors. We integrate the consolidated control code into the Scadman-Monitor and record the actuations performed by the control code. Figure 5 provides an overview of the implementation and shows how the consolidated code is incorporated into the Scadman-Monitor. We use our extended MATIEC compiler (Footnote 6: https://github.com/thiagoralves/OpenPLC_v2/) to compile the consolidated PLC code to C code. We modified the MATIEC compiler to automatically generate functions that allow easy access and modification of the internal state of the generated C code. This is used by the Scadman-Monitor process to simulate access to sensor values and actuators.

As a second step, we compile the C code into an intermediate representation using the LLVM [47] compiler framework. Operating on the LLVM intermediate code allows us to perform analysis and instrumentation without having to analyze structured text or C code directly. We perform instrumentation on the generated LLVM intermediate code to introduce an execution mode that draws from the ideas of symbolic execution, abstract interpretation and interval arithmetic. We call this execution mode error-margin multi-execution. We use this execution mode to reduce the number of false positives by introducing the notion of an error-margin to accessed sensor values. We discuss details of this approach in the following subsection.

The final step is to produce a runnable executable. We compile the instrumented PLC code to native code and link it with our support library, providing various utility functions. The C code generated by the MATIEC compiler is intended to be linked to a userspace driver that implements hardware access. Instead we link the generated code with our framework such that all hardware accesses are intercepted and forwarded to the state estimation and attack detection.

V-C Error-margin Multi-execution

The model of a physical system may deviate from the real system for various reasons. For instance, a physical process might evolve slower or faster than expected in the model. These minor differences between the model-based estimated state and the real system state can lead to inconsistencies that Scadman would incorrectly report as an attack. These false positives can occur, for instance, if the PLCs perform an actuation depending on whether a sensor value is above or below a threshold, as shown in the example in Figure 5. In the real system, the sensor might be above the threshold, while in the simulated physical system, the sensor is still below the threshold. In this case, the Scadman-Monitor performs an actuation, while the real PLC does not, or vice versa. For instance, this can happen when the weight of a can increases at a slightly higher rate than estimated by the physical system model. The reason for this can be a valve that controls the inflow of a color into a can and does not necessarily close within one scan cycle. In our system model, the actuations are assumed to be immediate, i.e., the valve will be closed and our system model will reflect an inflow rate of 0. In reality it may only be partially closed with a nonzero inflow rate. These slight deviations will be propagated to the associated physical model.

To tackle this problem and reduce false positives, we check whether the Scadman-Monitor behaves differently in terms of actuation when assuming an error in the sensor readings. We introduce error-margin multi-execution to detect differences in actuation. First, we define error-margins for sensors. Second, we detect whether a PLC performs different actions when executed with an error applied to the sensor value. A difference in the performed actions is only observed when the PLC takes a different control-flow path through the program. Therefore, we need to detect whether the control flow of the PLC depends on a sensor value (cf. code snippet in Figure 5).

We define an error-margin, $\pm\epsilon$, for each of the sensors. We then check whether the Scadman-Monitor performs different actuations when applying $\pm\epsilon$ to the sensor reading, which we denote as $s$. Using interval arithmetic, one can propagate the error-margin through the executed program. However, if a branching condition depends on the sensor value, both branches may have to be executed if the decision is inconclusive. For example, if the branching condition is $(s<N)$, then the execution could take both branches if $s+\epsilon\geq N$ and $s-\epsilon<N$. Therefore, we need to execute multiple paths through the control program. Symbolic execution would allow us to use symbolic sensor values, constrain them to the error-margin, and execute multiple paths at once. However, current symbolic execution engines have known limitations when it comes to solving constraints for floating point operations [48], and sensor values are typically represented as floating point types. To overcome this limitation, we introduce multi-execution, which operates solely on concrete floating point values within the applied error-margin and can execute multiple branches in parallel.

We integrate this error application into the consolidated PLC code simulation at the LLVM level. Whenever a conditional branch instruction depends on a sensor value $s$, we introduce instrumentation that forks the execution of the PLC code. In one fork we continue without an error, so $s^{\prime}=s$. In the second and third fork we continue with the upper bound of the error-margin, $s^{\prime}=s+\epsilon$, and the lower bound, $s^{\prime}=s-\epsilon$, respectively. Using only the upper and lower bound of the error interval $[s-\epsilon,s+\epsilon]$ is not sufficient: to be able to evaluate equality comparisons, we also need to continue one fork of the code on $s$ (without applying any error). At the end of the scan cycle we merge all forks and continue without any error applied in the next scan cycle. We create $\mathcal{O}(3^{\#sensors})$ forks per scan cycle. While this is a significant overhead in the worst case, our evaluation shows the practicality of this approach. We can use several optimizations in practice to reduce the number of concurrent forks. For example, if two forks take the same control-flow path, we can stop executing one of the two forks. Most basic blocks have only two outgoing edges; therefore, we can usually kill one of the forks directly after they have taken the branch. In fact, we do not need to use the multi-execution approach until we detect a discrepancy in the actuations. We can then selectively re-execute only the violating scan cycle in multi-execution mode to get a more accurate result on the actuation.
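The effect of forking on $s-\epsilon$, $s$, and $s+\epsilon$ can be sketched outside of LLVM. The example below is a simplification, not the instrumentation described above: instead of forking at each sensor-dependent branch, it enumerates all three candidate values per sensor up front and runs a plain Python function once per combination. The threshold logic, the sensor name `CanWeight`, and all numeric values are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Set, Tuple

Sensors = Dict[str, float]
Actuation = FrozenSet[Tuple[str, int]]


def control_logic(sensors: Sensors) -> Dict[str, int]:
    """Hypothetical threshold logic: close the valve once the can is heavy enough."""
    return {"YellowValve": 0 if sensors["CanWeight"] >= 100.0 else 1}


def multi_execute(logic: Callable[[Sensors], Dict[str, int]],
                  sensors: Sensors,
                  eps: Dict[str, float]) -> Set[Actuation]:
    """Run the logic on s - eps, s, and s + eps for every sensor (3^#sensors runs)."""
    names = sorted(sensors)
    candidates = [(sensors[n] - eps[n], sensors[n], sensors[n] + eps[n])
                  for n in names]
    results: Set[Actuation] = set()
    for combo in product(*candidates):
        outputs = logic(dict(zip(names, combo)))
        results.add(frozenset(outputs.items()))  # identical outcomes collapse
    return results


near_threshold = multi_execute(control_logic, {"CanWeight": 99.5}, {"CanWeight": 1.0})
far_from_threshold = multi_execute(control_logic, {"CanWeight": 50.0}, {"CanWeight": 1.0})
```

Close to the threshold both valve states are acceptable, so a real PLC that closes the valve one cycle earlier than the model predicts is not flagged. Far from the threshold only one actuation remains in the set.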

Instead of producing one value for an actuator, we now get a set of values for each of the actuators. If the actuation of the real system is not in the set reported by the consolidated PLC code, we have detected an inconsistency that is beyond the error-margins and report an attack. Incorporating state estimation errors allows Scadman to minimize false positives that would arise with slight errors in the model of the physical system.

To detect whether a branch condition depends on a sensor value, we perform backwards data-flow analysis, starting from the condition of the branch. We use the static single assignment (SSA) form of LLVM intermediate code to perform intra-procedural data-flow analysis. Inter-procedural analysis is not implemented in our current prototype as the code generated by MATIEC does not require inter-procedural data-flow analysis for most sensors. We search the resulting data-flow graph for load instructions that read sensor values. Because the load instruction is contained in the backwards data-flow graph, we know it will affect the branching condition and must be instrumented to incorporate the check for forking the process based on the given error-margins.
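The backward walk over SSA definitions can be sketched on a toy representation. This is not the LLVM API: the dictionary-based instruction format, the opcode strings, and the names `@CanWeight` and `@Counter` are assumptions chosen only to show the idea of following operands from a branch condition back to sensor loads.

```python
from typing import Dict, List, Set, Tuple

# Toy SSA form: each value name maps to (opcode, operand names).
Instr = Tuple[str, List[str]]


def sensor_loads_feeding(cond: str,
                         defs: Dict[str, Instr],
                         sensor_addrs: Set[str]) -> Set[str]:
    """Return the load instructions of sensor values that reach `cond`."""
    found: Set[str] = set()
    visited: Set[str] = set()
    worklist = [cond]
    while worklist:
        value = worklist.pop()
        if value in visited or value not in defs:
            continue  # constants and globals have no defining instruction
        visited.add(value)
        opcode, operands = defs[value]
        if opcode == "load" and operands[0] in sensor_addrs:
            found.add(value)
        worklist.extend(operands)  # follow the definitions of all operands
    return found


defs = {
    "%w": ("load", ["@CanWeight"]),
    "%t": ("fadd", ["%w", "0.5"]),
    "%c": ("fcmp_olt", ["%t", "100.0"]),
    "%k": ("load", ["@Counter"]),
    "%d": ("icmp_eq", ["%k", "3"]),
}
sensors = {"@CanWeight"}
depends = sensor_loads_feeding("%c", defs, sensors)
independent = sensor_loads_feeding("%d", defs, sensors)
```

Only the branch on `%c` would be instrumented here, since `%d` depends on a load that does not read a sensor.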

Our error-margin multi-execution introduces some imprecision into the system. An attacker might try to exploit this imprecision to evade detection; we discuss this scenario in more detail in Section VI.

V-D Attack Detection

To detect attacks, the Scadman-Monitor performs two steps, where the results from the $n$-th scan cycle are used to predict and verify the $(n+1)$-th scan cycle of the system. First, the Scadman-Monitor compares sensor values received in scan cycle $n$ with the sensor values estimated based on the inputs from scan cycle $n-1$. When the received sensor values are verified, i.e., fall within the set of predicted values, the Scadman-Monitor uses them as input to execute the consolidated PLC code. This results in a set of acceptable actuation operations for scan cycle $n$. Scadman-Monitor compares this set against the actuation operations reported by the real PLCs. If the actuations of the real PLCs are verified correctly, they serve as input to the state estimator of the physical system, which will predict the sensor values for the next scan cycle $n+1$.
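The two verification steps for one scan cycle can be sketched as follows. This is a minimal illustration under stated assumptions: predicted sensor values are modeled as closed intervals, the consolidated PLC code is stood in for by a hypothetical function `toy_consolidated_code`, and the sensor name, actuator name, and thresholds are invented for the example.

```python
from typing import Callable, Dict, Set, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]


def verify_scan_cycle(
    reported_sensors: Dict[str, float],
    predicted_sensors: Dict[str, Interval],
    reported_actuations: Dict[str, int],
    allowed_actuations: Callable[[Dict[str, float]], Dict[str, Set[int]]],
) -> str:
    # Step 1: sensors of cycle n must match the prediction made from cycle n-1.
    for name, value in reported_sensors.items():
        low, high = predicted_sensors[name]
        if not (low <= value <= high):
            return f"sensor mismatch: {name}"
    # Step 2: run the consolidated code on the verified sensor values and
    # compare the resulting set of acceptable actuations with the reports.
    allowed = allowed_actuations(reported_sensors)
    for name, value in reported_actuations.items():
        if value not in allowed[name]:
            return f"actuation mismatch: {name}"
    return "ok"


def toy_consolidated_code(sensors: Dict[str, float]) -> Dict[str, Set[int]]:
    """Hypothetical stand-in: the pump runs only while the level is below 500."""
    return {"Pump": {1} if sensors["TankLevel"] < 500.0 else {0}}


predicted = {"TankLevel": (495.0, 505.0)}
nominal = verify_scan_cycle({"TankLevel": 500.2}, predicted, {"Pump": 0}, toy_consolidated_code)
forged_sensor = verify_scan_cycle({"TankLevel": 520.0}, predicted, {"Pump": 0}, toy_consolidated_code)
wrong_actuation = verify_scan_cycle({"TankLevel": 500.2}, predicted, {"Pump": 1}, toy_consolidated_code)
```

Only when the result is `"ok"` would the verified actuations be passed on to the state estimator to predict the sensor values of the next cycle.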

Incomplete data.

Scadman can also be used on systems that cannot provide complete data or that involve (sub)processes for which no accurate state estimation is possible. This can be due to various reasons, e.g., if a sensor state depends on human interactions with the system, the Scadman-Monitor cannot predict the state of that sensor in a meaningful way. However, the state of other sensors and actuators of the system that are not directly affected by such external influences can still be validated by Scadman.

VI Security Considerations

In this section we consider different kinds of attacks in the context of ICS and how Scadman can detect them. According to our adversary model (cf. Section III) we consider a subset of PLCs to be compromised, i.e., $k$ out of $n$ PLCs are compromised, where $k<n$.

Afterwards, we discuss attack scenarios that go beyond our adversary model and show that Scadman is valuable in these scenarios as well.

VI-A k-out-of-n Compromised PLCs

As discussed before, for various reasons the attacker might have compromised a subset of PLCs in an ICS. The adversary aims to act stealthily; hence, we assume the adversary lets the compromised PLCs report sensor readings and actuation commands that meet the expectations of the operator as well as Scadman. However, the reported values will not match the expected values relative to the values reported by the non-compromised PLCs.

In particular, as long as one PLC that is physically interconnected with the compromised PLCs reports correct values, a discrepancy will emerge. Scadman will detect this discrepancy and will raise an alarm. While Scadman will not be able to identify which PLCs are compromised, it can still warn the operator, who can then start an in-depth investigation of the system. In the next section we evaluate Scadman on a large set of ICS attacks implemented for a real-world industrial control network (SWaT [40]) and show that it can detect all of these attacks.

Slow evolving attacks.

Scadman is based on a closed-loop approach where, for each scan cycle, the system is analyzed for anomalies. The system continues when no anomaly is detected. By continuing, the current state of the system is accepted as benign and serves as the basis for estimating the system’s future state. An adversary could try to exploit this by slowly pushing the system towards a false state. On each iteration, the adversary would manipulate the system within the error margins of Scadman. However, ICS are usually designed to include safety measures programmed into the PLCs that prevent the system from being steered to an unsafe state. While the attacker can slowly modify the system within the safety boundaries of the system without being detected, the system cannot be pushed to an unsafe state. Scadman would detect any deviations in the control-flow path of the PLC, e.g., if an adversary pushes the system outside of the safety boundaries enforced by a safety check within the control flow of the original PLC program. The best the adversary can do is to leverage the simulation error margin used by Scadman to get the system slightly outside of its safety boundaries. However, the safety boundaries are chosen such that the system remains safe even in the presence of small errors, e.g., due to sensor measurement noise.

We discuss additional countermeasures in Section IX.

VI-B All PLCs Compromised

Scadman provides security based on the assumption that physical interdependencies of controllers (PLCs) enable the detection of misbehavior. In the simple case that the entire system is controlled by a single PLC, an adversary would control all of the data (e.g., sensor readings) available to Scadman if this PLC is compromised. Hence, an intelligent adversary can provide a consistent view of the system [31] towards Scadman and remain undetected.

For distributed ICS, the adversary needs to control all PLCs to provide a coherent view of all actuation commands and all sensor readings reported to Scadman-Monitor. This means the adversary has to simulate the expected behavior of the entire system and synchronize the actions of all PLCs. While this might be feasible for very simple and static ICS, the attacker’s limited resources (PLCs have limited computation power and memory) significantly increase the difficulty of stealthy attacks on dynamic ICS.

VI-C Network Attacks

In this work we assume a secure channel between Scadman-Monitor and the PLCs, i.e., an adversary cannot launch network attacks by impersonating other devices. However, some legacy systems do not provide secure network channels, in which case the adversary might try to overcome Scadman by manipulating network packets.

The first option for the adversary is to manipulate or suppress network packets of other PLCs, i.e., PLCs not controlled by the adversary whose reports would reveal the adversary’s manipulations. This is not possible in commonly used switched networks, i.e., network packets of an uncompromised PLC will never be routed to a compromised PLC but directly to Scadman-Monitor.

The second option for the adversary is to impersonate another PLC, i.e., to send packets to Scadman-Monitor that pretend to originate from an uncompromised PLC, e.g., by modifying the source IP address of a packet. However, as discussed before, the adversary cannot suppress packets sent by the benign PLC; hence, Scadman-Monitor will receive both types of packets: those with benign sensor readings and those with manipulated values reported by the adversary. This mixture of input values will lead to inconsistencies, which will trigger a security alarm of Scadman.

VII Evaluations

In this section, we provide an overview of our experimental evaluation of Scadman. We first introduce the real-world industrial control network that we used for our evaluation. We then discuss how Scadman implements code consolidation for the proprietary PLCs used in our evaluation. Afterwards we describe the choice of physical state estimation equations used for our attack detection. Finally, we evaluate Scadman against a set of attacks that were enumerated by previous works.

VII-A Evaluation Environment and Dataset

The study reported here was conducted on data from a real distributed industrial control system [40] as shown below.

The network is a 6-stage water treatment plant that produces 5 gallons/minute of treated water. The plant can operate non-stop 24/7 in fully autonomous mode. The sub-process of each stage is controlled by an individual PLC (cf. Figure 6). In total, the plant contains 68 sensors and actuators; some actuators serve as standbys and are intended to be used only when the primary actuator fails.

ICS operation: Operation of the plant is initiated by an operator at the SCADA workstation and, when needed, can be controlled. State information can be viewed at the workstation or at the HMI, and is recorded in the historian.

Plant supervision and control: A Supervisory Control and Data Acquisition (SCADA) workstation is located in the plant control room. Data or control access to nearly all plant components is available via this workstation. A plant operator can view process state and set process parameters via the workstation. A historian is available for recording process state as well as network packet flows at preset time intervals.

Communications: A multi-layer network enables communications (as shown in Figure 7) across all components of the network. The ring network at each stage at level 0 enables PLCs to communicate with sensors and actuators at the corresponding stage. A star network at level 1 enables communications across PLCs, SCADA, HMI and the historian. PLCs communicate with each other through the L1 network, and with centralized Supervisory Control and Data Acquisition (SCADA) system and Human-Machine Interface (HMI), through the Level 2 network.

Dataset.

We evaluated Scadman using data generated by the industrial control network shown in Figure 6. The dataset includes both normal operations, to evaluate the false positive rate of Scadman, and attacks, to evaluate the detection performance of Scadman. The attack data, covering a total of thirty-six attacks, were generated independently of our work, were modeled after Adepu et al. [6, 8], and were used in previous works to evaluate ICS security solutions [35].

The dataset contains data collected during seven days of continuous operation of the ICS network. It contains $496,800$ data points, each point representing the system state using $53$ features of the system, e.g., sensor values and actuator states. The sensor data indicates the states of various plant components including tanks, valves, pumps, and meters, as well as data on chemical properties including pH, conductivity, and the Oxidation Reduction Potential (ORP).

VII-B Scadman State Estimation

We now describe how we evaluated each component of Scadman’s implementation in order to generate the cyber-physical state estimator.

ICS network PLC code consolidation.

The network consists of six Allen Bradley PLCs using proprietary code and development tools. Each PLC was programmed individually using the proprietary Rockwell Automation Studio 5000 development environment. In order to perform the code consolidation for each PLC, we first needed to translate the heterogeneously programmed controller projects to a single IEC 61131-3 standard structured text format. To do so, we extracted the L5X project files for each PLC [14]. The L5X format is an XML format used for importing/exporting projects to and from the Studio 5000 environment. We then built a translation tool, L5X2IEC, using an existing l5x Python library (Footnote 7: https://pypi.python.org/pypi/l5x/1.2) that provides accessors for the XML elements within the L5X files. Although the Allen Bradley PLC programming languages conform to the IEC standards, we needed to provide translations for the proprietary extensions of each language.

Physical state estimation.

Scadman provides physical state estimation for sensors whose sensor and actuation dependencies are satisfied. For instance, we cannot predict the value of a water tank if we do not have access to the corresponding flow rate sensor. We provide generic physical state estimators for the water tank level sensors, the flow rate level sensors, as well as the status indicators for pumps and valves. However, models for other components of the system can be added in future work.

For water tanks, we used the same estimation and threshold values provided in prior analyses of the ICS network platform [10, 7]. Scadman implements the following closed-loop state estimation models:

$TankLevel=TankLevel+(Inflow-Outflow)*F_{c}$ .

Here, Inflow and Outflow are the inflow and outflow rates of the tank, and $F_{c}$ is a conversion constant for the flow rate.
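One step of the tank level model and the corresponding deviation check can be written down directly. The sketch below assumes levels in millimeters and uses the 5 mm nominal deviation threshold mentioned in the evaluation; the value of the conversion constant, the flow rates, and the function names are hypothetical.

```python
def estimate_tank_level(level: float, inflow: float, outflow: float, f_c: float) -> float:
    """One step of TankLevel = TankLevel + (Inflow - Outflow) * F_c."""
    return level + (inflow - outflow) * f_c


def level_is_consistent(reported: float, estimated: float, threshold: float) -> bool:
    """Accept the reported level if it is within the deviation threshold."""
    return abs(reported - estimated) <= threshold


# Hypothetical values: level in mm, F_c converting flow rate to mm per cycle.
estimated = estimate_tank_level(level=500.0, inflow=2.5, outflow=0.5, f_c=0.25)
accepted = level_is_consistent(reported=503.0, estimated=estimated, threshold=5.0)
rejected = level_is_consistent(reported=510.0, estimated=estimated, threshold=5.0)
```

With these numbers the estimate rises by 0.5 mm in one step, a reading of 503 mm is accepted, and a reading of 510 mm exceeds the threshold.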

For flow rates, we derived a closed-loop model that incorporates any actuators that may open/close the flow of water:

$FlowRate=FlowRate*\prod_{n=1}^{N}Actuator_{n}$

Here, $Actuator_{n}$ represents any pump or valve whose value is 0 (for off) or 1 (for on). For our analyses, we use the plant invariants that capture the state of the system at any point in time [7]. Each model is then invoked automatically when a particular variable needs to be estimated. In addition to providing generic models for these subsystems, the models for the binary values of the actuator states are automatically generated by our Scadman-Monitor executable.
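The flow rate model multiplies the previous flow rate by the binary states of all actuators on the path, so a single closed pump or valve forces the estimate to zero. A minimal sketch, with a hypothetical function name and example values:

```python
from math import prod
from typing import Iterable


def estimate_flow_rate(flow_rate: float, actuators: Iterable[int]) -> float:
    """FlowRate = FlowRate * product of binary actuator states (0 = off, 1 = on)."""
    states = list(actuators)
    if any(state not in (0, 1) for state in states):
        raise ValueError("actuator states must be 0 or 1")
    return flow_rate * prod(states)


all_open = estimate_flow_rate(2.5, [1, 1])    # pump on, valve open
valve_closed = estimate_flow_rate(2.5, [1, 0])  # pump on, valve closed
```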

VII-C Attack Detection

We successfully detected all attacks enumerated in the attack data set. We further evaluated Scadman against the record-and-replay attacks enumerated in a previous case study, in which sensor values were recorded and replayed to the HMI to spoof sensor readings, as was done by the Stuxnet malware [10]. In the non-optimized evaluation, i.e., the implementation without multi-execution, Scadman produced no false negatives and a false positive rate of only 0.36% at the nominal water tank level deviation threshold. The false positives were due to the cases mentioned in Section V, where an actuator may open/close a tick too early or too late based on our estimated sensor values.

False positive pruning. All false positives were pruned by our error-margin multi-execution implementation for the nominal water tank level deviation threshold. We show the associated ROC curves for varying water tank level deviation thresholds, for both the normal execution and the error-margin multi-execution, in Figure 8. False positives only exist for very small threshold values; for example, a water tank level threshold of less than 1mm inevitably produces some false positives. The nominal threshold values were based on the threshold values used for state estimation in previous work [10]. The nominal threshold value of 5mm obtained from our ROC curve confirmed the choice made in that work.
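To make the threshold sweep behind such an ROC curve concrete, the sketch below computes the true and false positive rates for a given deviation threshold. The deviation values and labels are synthetic placeholders chosen for illustration, not measurements from the testbed, and the function name is hypothetical.

```python
def rates_at_threshold(deviations, is_attack, threshold):
    """Return (true positive rate, false positive rate) for one threshold."""
    tp = fp = tn = fn = 0
    for deviation, attack in zip(deviations, is_attack):
        alarm = deviation > threshold
        if alarm and attack:
            tp += 1
        elif alarm and not attack:
            fp += 1
        elif attack:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Synthetic |estimate - measurement| values in mm and their ground truth.
    deviations = [0.2, 0.8, 1.5, 12.0, 30.0]
    is_attack = [False, False, False, True, True]
    for threshold in (0.5, 1.0, 5.0):
        print(threshold, rates_at_threshold(deviations, is_attack, threshold))
```

Plotting the resulting pairs over a range of thresholds yields one ROC curve; repeating the sweep with a different detector configuration yields a second curve for comparison.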

Performance. We performed our evaluation on a system equipped with an Intel Core i7-4710MQ Processor at $2.50\,GHz$ and $16\,GB$ of RAM, running Linux v4.4.0-112-generic. Running Scadman on the entire dataset of seven days took $30$ hours for the single-threaded deviation checking and $51$ hours for the multi-threaded error-margin multi-execution, using our current prototype implementation, which is not optimized for performance. This shows that Scadman can “keep up” when running in parallel with the real system even on a desktop-grade computer.
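As a back-of-the-envelope check of this claim (derived from the figures above, not an additional measurement), the seven-day dataset spans $7\times 24=168$ hours of plant operation, which gives the following speed-up factors over real time:

$168/30=5.6$ for the single-threaded deviation checking, and $168/51\approx 3.3$ for the error-margin multi-execution.

Both factors are greater than 1, i.e., the prototype processes recorded data faster than the plant produces it.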

Memory.

The memory usage of Scadman was $36\,MB$ on average with a peak memory requirement of $149\,MB$ , with multi-execution turned off. This shows that Scadman can be used to constantly monitor the system behavior using standard server equipment.

Communication.

By default, all PLCs communicate with the SCADA system to display the operational process data and to store it in a historian. Scadman can retrieve its data from the historian, causing no communication overhead in the PLC network.

VIII Related Work

The previous works on ICS security can be categorized into cyber-physical security mechanisms that are implemented within the ICS controller to enforce code integrity and monitoring solutions that abstract the control of PLCs to verify the overall cyber-physical system.

Internal CPS control security. ECFI [4] provides a control-flow integrity (CFI) solution for PLCs, where the code running on the PLC is instrumented to validate whether indirect branches follow a legitimate path in the control-flow graph (CFG). In contrast, Scadman does not need any modifications of the code running on the PLC; it monitors the overall behavior of the PLC, which reflects its entire software (including the OS). Furthermore, Scadman provides context-sensitive control-flow checking, i.e., the set of allowed CFG paths is further restricted based on the current system state.

Control-Flow Attestation (C-FLAT) enables a prover device to attest the exact control-flow path of an executed program to a remote verifier [5]. However, it cannot be applied to existing systems that do not have the necessary hardware security extensions such as ARM TrustZone. PyCRA [63] uses physical challenge-response authentication to protect active sensing systems against cyber-physical attacks. PyCRA focuses on active sensors; hence, it is applicable neither to passive sensors nor to actuators, both of which are also common in ICS. Orpheus [20] monitors the behavior of a program based on executed system calls and checks whether a system call is legitimate in the given context. The decision is made based on a finite-state machine (FSM) representing the program's system call behavior, i.e., system calls are only allowed to be executed in sequences for which valid transitions exist within the FSM. Orpheus requires an FSM of the monitored system, which needs to be constructed in a learning phase. Scadman does not require such a model of the overall system but only models for the individual subprocesses of the physical system. Also, Orpheus performs detection on the device and relies on an uncompromised OS and on untampered physical event reports. Scadman does not require any modifications to the monitored devices and does not require a trusted channel to input sensors.

Zeus [37] monitors the control flow of a PLC control program by analyzing the electromagnetic emission side channels of the PLC with a neural network model. Such a defense does not protect against data attacks and can further be circumvented via firmware modification attacks. Furthermore, Zeus cannot verify the other networked components, as the Scadman framework does.

State estimation has been used within the PLCs to detect if any of the invariant properties of the system have been violated [10, 7, 11]. This enforcement resides in the application layer of a single PLC, which can be circumvented if the PLC is compromised. Furthermore, the physical invariants and their dependencies are specified manually. Scadman automatically enforces the checking of discrete-state transitions by analyzing the consolidated PLC code. Similarly, on-device runtime verification has been proposed for PLCs with coupled hypervisors [33]. The hypervisor resides above the firmware and relies on the integrity of the PLC control logic.

TSV [53] verifies the integrity of any program being loaded onto a PLC by lifting the associated binary to an intermediate language to symbolically execute the program and verify that it is not violating any of the provided infrastructural safety requirements. The safety requirements are enforced within the PLC by extension of the guarantees provided by TSV. Similarly, PLCVerif [23] provides a framework for checking safety properties of PLC code against finite state automata. These solutions are offline analyses that do not provide any runtime guarantees and only verify the control logic application code.

External CPS control security. Previous works have proposed means of detecting stealthy attacks in the context of ICS. David et al. [66] reported on limiting the impact of stealthy attacks on industrial control systems. Liu et al. [50, 52] presented false data injection attacks against state estimation in electric power grids. Their work is implemented mainly in a simulation environment and considers stealthy attacks on smart meters. In Scadman, we also consider stealthy attacks on multiple sensors and actuators, using real-time operational data.

Yuqi et al. [19, 18] proposed an approach for learning physical invariants that combines machine learning with ideas from mutation testing. Initial models are learned using support vector machines. These learned models are used for code attestation and identifying standard network attacks. Configuration-based intrusion detection systems have also been proposed for Advanced Metering Infrastructure (AMI) [12]. The AMI behavior is modeled using event logs collected at smart meters. Event logs are modeled using Markov chains and linear temporal logic for the verification of specifications. However, such models depend on the completeness of the training data set used for the learned models. A water control system was modeled using an autoregressive model in order to monitor the physics of the system [36]. For distributed systems with complex cyber-physical interdependencies, it is infeasible to assume that all discrete states of the system will be traversed. Scadman automatically contains a discrete-state model of the entire ICS and depends only on the accuracy of the physical state estimation. In a similar vein, the idea of detecting attacks by monitoring the physics of the ICS [58, 22] using invariants has been applied. However, in these instances the invariants were derived manually based on domain knowledge. Scadman automatically derives these cyber-physical invariants and significantly reduces the probability of human error during the modelling phase of complex systems.

IX Future Work

In this section we propose possible extensions of Scadman to improve its security and functionalities.

Simulation interval.

The current implementation of Scadman uses a closed-loop approach in which the system state $s$ after each scan cycle serves as the basis for the next round. However, Scadman can be extended to use state $s$ only after every $n$ scan cycles. This means that the state estimation and multi-execution performed by Scadman must cover $n$ scan cycles, which could lead to larger errors in the state estimation and, in turn, negatively impact the error-margin multi-execution of Scadman. On the other hand, this approach can make slowly evolving attacks (see Section VI) even more complicated to carry out, further increasing the security of Scadman.
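To illustrate why covering $n$ scan cycles can enlarge the estimation error, the sketch below iterates the tank-level model from Section VII for several cycles without re-synchronizing to the reported state. The function name, the numeric values, and the constant inflow bias are illustrative assumptions; with this additive model, a constant per-cycle bias grows linearly with $n$.

```python
def estimate_without_resync(level, flows, fc):
    """Apply level + (inflow - outflow) * fc once per scan cycle, feeding
    each estimate into the next cycle instead of the reported state."""
    for inflow, outflow in flows:
        level = level + (inflow - outflow) * fc
    return level


if __name__ == "__main__":
    n = 4
    fc = 0.5
    true_flows = [(2.0, 1.0)] * n
    # Hypothetical constant inflow bias of 0.25 in every cycle.
    biased_flows = [(2.25, 1.0)] * n
    reference = estimate_without_resync(500.0, true_flows, fc)
    estimate = estimate_without_resync(500.0, biased_flows, fc)
    print(estimate - reference)  # accumulated drift after n cycles
```

Any error margin used for multi-execution would have to grow accordingly, which is the trade-off described above.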

Automated invariant generation.

Scadman can serve not only as a security solution but can also help improve the functional correctness and safety of an ICS. Our modeling framework can be used to determine interdependencies of system variables. This information is useful when programming an ICS, as it helps to identify conditions and safety checks that need to be included in the PLCs for the ICS to operate correctly.

X Conclusions and Summary

Industrial control systems (ICS) are ubiquitous and increasingly deployed in critical infrastructures. In fact, recent large-scale cyber attacks (e.g., Stuxnet, BlackEnergy, and Duqu, to name a few) exploit vulnerabilities in these systems. Building a generic defense mechanism against the various ICS attack flavors is highly challenging. However, we observe that all these attacks influence the physics of these devices. As a result, we developed Scadman, a system that preserves the Control Behavior Integrity (CBI) of distributed cyber-physical systems. Scadman provides real-time monitoring for intrusion detection and sensor fault detection by maintaining a cyber-physical state estimation of the system, based on a novel control code consolidation as well as state estimation equations of the physical processes. Scadman enforces the correctness of individual controllers in the system by verifying the actuation values sent from the PLCs as well as the associated changes that propagate through the physical dynamics of the system. We evaluated Scadman against an enumerated set of attacks on a real water treatment testbed. Our results show that we can detect a wide range of attacks in a timely fashion with zero false positives for nominal threshold values.

Bibliography

[1] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow Integrity,” in Conference on Computer and Communications Security, ser. CCS, 2005.
[2] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow integrity,” in Proceedings of the 12th ACM conference on Computer and communications security. ACM, 2005, pp. 340–353.
[3] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow Integrity Principles, Implementations, and Applications,” ACM Trans. Inf. Syst. Secur., vol. 13, no. 1, Nov. 2009.
[4] A. Abbasi, T. Holz, E. Zambon, and S. Etalle, “ECFI: Asynchronous Control Flow Integrity for Programmable Logic Controllers,” in Proceedings of the 33rd Annual Computer Security Applications Conference, ser. ACSAC, 2017.
[5] T. Abera, N. Asokan, L. Davi, J.-E. Ekberg, T. Nyman, A. Paverd, A.-R. Sadeghi, and G. Tsudik, “C-FLAT: Control-Flow Attestation for Embedded Systems Software,” in Conference on Computer and Communications Security, ser. CCS, 2016.
[6] S. Adepu and A. Mathur, “An investigation into the response of a water treatment system to cyber attacks,” in Proceedings of the 17th IEEE High Assurance Systems Engineering Symposium, Orlando, January 2016, pp. 141–148.
[7] S. Adepu and A. Mathur, “Distributed detection of single-stage multipoint cyber attacks in a water treatment plant,” in Proceedings of the 11th ACM Asia Conference on Computer and Communications Security. New York, NY: ACM, May 2016, pp. 449–460.
[8] S. Adepu and A. Mathur, “Generalized attacker and attack models for Cyber-Physical Systems,” in Proceedings of the 40th Annual International Computers, Software & Applications Conference, Atlanta, USA. Washington, D.C., USA: IEEE, June 2016, pp. 283–292.