Leveraging SDN to Monitor Critical Infrastructure Networks in a Smarter Way

Roberto di Lallo; Federico Griscioli; Gabriele Lospoto; Habib Mostafaei; Maurizio Pizzonia; Massimo Rimondini

arXiv:1701.04293·cs.NI·January 17, 2017

TL;DR

This paper presents a novel SDN-based methodology and architecture for efficiently monitoring critical infrastructure networks by intelligently forwarding traffic to IDS locations, improving security and resource utilization.

Contribution

It introduces a new SDN-driven approach for traffic replication and monitoring in ICS networks, leveraging network stability and predictability to route traffic replicas towards an IDS placed at an arbitrary location.

Findings

1. Effective traffic forwarding with minimal packet loss

2. Utilization of spare bandwidth for monitoring

3. Validated on real network topologies

Table 1. TABLE I: Elements of a substation with the bandwidth of the streams used for the evaluation.

Element	Qty	Bandwidth from SCADA	Bandwidth to SCADA
Voltage Meter	2	10 Kbps	100 Kbps
Circuit Switches	2	1.5 Kbps	
Breakers	2	1.5 Kbps	
Current Meters	2	10 Kbps	100 Kbps
Power Transformer	1	50 Kbps	500 Kbps
HMI	1	30000 Kbps	3000 Kbps
Historian DB	1	30000 Kbps	3000 Kbps

Table 2. TABLE II: Data about original topologies, and topologies used in the experimentation.

	From Topology Zoo					Input for experiments
	Name	$|N|$	$|E|$	min bw (bps)	max bw (bps)	$q$	$|N|+|M|$	$|E|$	num. streams
1	Cesnet	10	9	200M	600M	35	501	920	770
2	AttMpls	25	56	1G	1G	50	726	1357	1100
3	Agis	25	30	45M	155M	42	614	1123	924
4	Uninet	74	101	1G	1G	95	1405	2572	2090

Table 3. TABLE III: Results of the experimentation for the off-line routing solver.

Results (off-line)
	Gurobi execution time	number of observed streams	max %bw on edge
1	12s	764	97.795%
2	30s	1100	62.060%
3	33s	869	98.058%
4	421s	2087	99.455%

Equations

Stream routing variable:
$$x_{\sigma}^{e}=\begin{cases}1 & \text{if stream }\sigma\text{ is being routed through link }e\\ 0 & \text{otherwise}\end{cases}$$

Outgoing flow:
$$\mathrm{Out}_{\sigma}(v)=\sum_{(v,w)\in E}x_{\sigma}^{(v,w)}$$

Incoming flow:
$$\mathrm{In}_{\sigma}(v)=\sum_{(u,v)\in E}x_{\sigma}^{(u,v)}$$

Vertex flow imbalance:
$$F_{\sigma}(v)=\mathrm{Out}_{\sigma}(v)-\mathrm{In}_{\sigma}(v)$$

Capacity constraints:
$$\forall e\in E:\quad C(e)-\sum_{\sigma\in\mathit{Crit}}\left(x_{\sigma}^{e}+x_{\bar{\sigma}}^{e}\right)\cdot B_{\sigma}\geq 0$$

Flow conservation and demand constraints for critical streams:
$$\begin{array}{ll}\forall\sigma\in\mathit{Crit}: & \forall v\in V-\{s_{\sigma},t_{\sigma}\}:\ F_{\sigma}(v)=0\\ & \mathrm{Out}_{\sigma}(s_{\sigma})=1\\ & \mathrm{In}_{\sigma}(t_{\sigma})=1\end{array}$$

Flow conservation and demand constraints for replica streams:
$$\begin{array}{ll}\forall\sigma\in\mathit{Crit}: & \forall v\in N-L_{\sigma}:\ F_{\bar{\sigma}}(v)=0\\ & \forall v\in L_{\sigma}:\ F_{\bar{\sigma}}(v)\leq x_{\sigma}^{(v,t)}\\ & \forall e\in E\text{ exiting }d:\ x_{\bar{\sigma}}^{e}=0\end{array}$$

$$\begin{array}{l}\forall\sigma\in\mathit{Crit},\ \forall v\in M-\{s_{\sigma},t_{\sigma}\},\ \forall e\text{ adjacent to }v:\ x_{\sigma}^{e}=0\\ \forall\bar{\sigma}\in\mathit{Rep},\ \forall v\in M-\{d\},\ \forall e\text{ adjacent to }v:\ x_{\bar{\sigma}}^{e}=0\end{array}$$

$$\max\ \sum_{\sigma\in\mathit{Crit}}\sum_{e\in E}\frac{C(e)-B_{\sigma}\cdot\left(x_{\sigma}^{e}+x_{\bar{\sigma}}^{e}\right)}{C(e)}+\sum_{\sigma\in\mathit{Crit}}K\rho_{\sigma}\,\mathrm{In}_{\bar{\sigma}}(d)$$

$$\sum_{\bar{\sigma}\in\mathit{Rep}}\mathrm{In}_{\bar{\sigma}}(d)\leq B_{d}$$

$$\begin{array}{ll}\forall\sigma\in\mathit{Crit}: & \forall v\in N-L_{\sigma}:\ F_{\sigma,d}(v)=0\\ & \forall v\in L_{\sigma}:\ \sum_{d\in D}F_{\sigma,d}(v)\leq x_{\sigma}^{(v,t)}\\ & \forall d\in D,\ \forall e\in E\text{ exiting }d:\ x_{\sigma,d}^{e}=0\end{array}$$

$$\forall\sigma\in\mathit{Crit},\ \forall d\in D,\ \forall v\in M-\{d\},\ \forall e\text{ adjacent to }v:\ x_{\sigma,d}^{e}=0$$

$$\max\ \sum_{\sigma\in\mathit{Crit}}\left(K\rho_{\sigma}\sum_{d\in D}\mathrm{In}_{\bar{\sigma}}(d)+\sum_{e\in E}\frac{C(e)-B_{\sigma}\cdot\left(x_{\sigma}^{e}+\sum_{d\in D}x_{\sigma,d}^{e}\right)}{C(e)}\right)$$

$$\forall v\in N:\quad\sum_{\sigma\in\mathit{Crit}}\left(\mathrm{Out}_{\sigma}(v)+\sum_{d\in D}\mathrm{Out}_{\sigma,d}(v)\right)\leq\mathit{FT}(v)$$

Taxonomy

Topics: Software-Defined Networks and 5G · Network Security and Intrusion Detection · Smart Grid Security and Resilience

Full text

Leveraging SDN to Monitor Critical Infrastructure Networks in a Smarter Way

Work partially supported by EU FP7 project “Preemptive: Preventive Methodologies and Tools to Protect Utilities”, grant no. 607093.

Roberto di Lallo, Federico Griscioli, Gabriele Lospoto, Habib Mostafaei, Maurizio Pizzonia and Massimo Rimondini

Roma Tre University, Department of Engineering

Via della Vasca Navale 79, 00146 Rome, Italy

{dilallo,griscioli,lospoto,mostafae,pizzonia,rimondini}@ing.uniroma3.it

Abstract

In critical infrastructures, communication networks are used to exchange vital data among elements of Industrial Control Systems (ICSes). Due to the criticality of such systems and the increase in cybersecurity risks in these contexts, best practices recommend the adoption of Intrusion Detection Systems (IDSes) as monitoring facilities. The placement of IDSes is crucial for monitoring as many streams of data traffic as possible. This is especially true given the traffic patterns of ICS networks, in which traffic is mostly confined within many subnetworks that are geographically distributed and largely autonomous. We introduce a methodology and a software architecture that allow an ICS operator to use the spare bandwidth that might be available in over-provisioned networks to forward replicas of traffic streams towards a single IDS placed at an arbitrary location. We leverage certain characteristics of ICS networks, such as topology stability and the predictability of bandwidth needs, and make use of the Software-Defined Networking (SDN) paradigm. We fulfill strict requirements about packet loss, for both functional and security aspects. Finally, we evaluate our approach on network topologies derived from real networks.

Index Terms:

Critical infrastructure (CI), Software Defined Network (SDN), Industrial Control Systems (ICSes), Intrusion Detection System (IDS)

I Introduction

ICSes are the core of critical infrastructures. They are composed of many elements that interact by means of a communication network, which we call the ICS network. The main elements of an ICS are embedded devices that control actuators or gather data from sensors. Special servers are in charge of collecting data from these embedded devices, showing them to control room operators, recording them in a database, changing settings according to operators' requests, etc. While the data that flow in an ICS network are very specific, standard networking technologies can be adopted for its implementation.

In the past decade, a growth in cyber-attacks directed toward ICSes has been observed [1]. For the security of ICS networks, best practices suggest deploying network-based IDSes [2]. In regular networks, it is acceptable to observe traffic at a small number of relevant points. In ICSes, however, Supervisory Control And Data Acquisition (SCADA) servers are placed close to sensors and actuators for reliability reasons, hence traffic is mostly local. Further, attacks on ICSes are potentially carried out by organizations (e.g., governments, intelligence agencies, terrorist groups) that can have insiders and that can carefully design attacks so that they pass unobserved by sparsely deployed IDSes. Tapping traffic close to all embedded devices and servers can easily lead to prohibitive costs. Certain solutions [3] make it possible to route traffic replicas over the ICS network itself towards one, or a few, IDSes, but they are not able to guarantee the successful delivery of critical ICS traffic in all cases.

In this paper, we present a methodological approach and an architecture to

(i) allow an operator to choose which traffic has to be observed within an ICS network without installing new hardware,

(ii) enable the use of the spare bandwidth in the network to forward the traffic to be observed toward an IDS, while avoiding packet loss for regular traffic, and

(iii) guarantee that the IDS receives all the traffic that the operator configured to be observed in order not to introduce false negatives due to packet loss.

Our solution takes advantage of the fact that topology and bandwidth usage are quite stable in ICS networks (see for example [4]). This allows us to assume advance knowledge of the ICS network's traffic, since it derives from the ICS design, and to perform a global off-line optimization of switching paths. Furthermore, we support the usage of the ICS network for additional and occasional traffic, which is always considered potentially dangerous. We assume that this traffic can be served with a best-effort approach, while maximizing the effort to observe it. We propose an architecture that exploits the Software-Defined Networking (SDN) approach as prescribed by the OpenFlow specifications [5]. We evaluated our methodology against four network topologies derived from real topologies and augmented with realistic networks in the domain of electrical distribution. Our experiments show that our optimization problem can be solved in reasonable time for those scenarios and that our approach makes efficient use of the bandwidth when the topology allows it.

The rest of the paper is organized as follows. In Section II, we describe the state of the art. In Section III, we describe the context of ICSes and introduce basic terminology. In Section IV, we formally state the requirements that our solution should fulfill. In Section V, we describe our methodology and our proposed architecture. Section VI introduces the ILP formulation for our off-line optimization problem, and in Section VII we show the on-line algorithm for occasional traffic. In Section VIII, we evaluate our approach against realistic scenarios. In Section IX, we extend our approach in order to relax some simplifying assumptions and handle special cases. Conclusions are drawn in Section X.

II State of the Art and Background

ICS networks make use of proprietary protocols, as shown in [2]. Those protocols (e.g., Modbus [6]) are typically application-layer protocols that allow communication among ICS devices. In many cases, proprietary protocols are also used to compute routing [7], but this does not limit the adoption of different link-layer technologies [8], and new installations tend to be based on widely adopted standards like Ethernet. Protocols adopted in ICS networks do not consider security aspects; hence, well-known recommendations (e.g., [2]) suggest, among several other countermeasures, the adoption of IDSes. Forcing network traffic to cross the IDS is not straightforward, especially if a network administrator needs flexibility in selecting the traffic to be observed. Some flexibility can be gained by adopting proprietary protocols (like ERSPAN [3]), which, however, offer a cumbersome solution and do not guarantee that the rest of the traffic is not affected.

In recent years, a new centralized approach called Software-Defined Networking (SDN) has been attracting the attention of the research community due to its promising benefits and, in particular, its flexibility in the selection of the paths used to route packets [9]. There have been many attempts to exploit SDN in security contexts. Some works [10, 11] propose to implement the IDS as an SDN controller module. We argue that such an approach poses strong scalability issues and is not advisable in the critical infrastructure context. A different approach consists in exploiting SDN to forward traffic towards one or more IDSes, as shown in [12, 13] for cloud computing. These solutions cannot be directly adopted in the ICS context, since they do not provide any guarantee about the delivery of regular traffic.

A relevant aspect of our approach is traffic engineering. In [14], the authors show that knowing the traffic matrix allows traffic engineering problems to be easily solved. Usually, the traffic engineering problem is treated as a multicommodity flow problem, whose solution is described in [15]. Proposals that are specific to traffic engineering for SDN can be found in [16, 17, 18]. To the best of our knowledge, our approach is the first attempt to apply traffic engineering to the specific context of IDS-based traffic monitoring, leveraging the characteristics of topologies and traffic in ICS networks.

III Application Context and Terminologies

For the sake of simplicity, we assume the ICS network to be isolated from the corporate network. While this is not completely true in general, isolation (physical or by means of a firewall) is still the best practice [2]. Hence, in the rest of the paper, we address traffic monitoring and management solely in the context of ICS networks. ICS networks connect several kinds of devices. For the purpose of our discussion, we divide them into two categories. We call the first category essential: devices in this category can be very diverse in nature, but they are essential for the correct operation of the ICS, are part of the ICS design, and are always connected to the ICS network. To help the reader understand the application context, we provide a more concrete description, distinguishing between embedded devices and servers. Embedded devices (for readers acquainted with the ICS context, we are referring to Programmable Logic Controllers, Remote Terminal Units, Intelligent Electronic Devices, etc.) control actuators, gather data from sensors, and realize closed-loop control for restricted parts of the industrial system. They can send gathered data to servers and can be remotely controlled or configured, for example by asking them to open/close a circuit switcher or by setting values, called set-points, that are the targets of the closed-loop control, such as the target temperature of a heater. Typically, servers are

(i) the SCADA, which gathers data from embedded devices and processes them, for example, to detect industrial process faults,

(ii) the Human-Machine Interfaces (HMI) that show to control room operators the current status of the ICS and allow the operator to specify commands or new set-points for embedded devices, and

(iii) the historian DB, which stores gathered data for future off-line analysis.

We call the second category non-essential: occasionally, other devices can be attached to the ICS network, for example operators’ notebooks to perform maintenance of ICS devices or to perform firmware updates.

We call stream a communication between two devices on the ICS network. We identify it by its source and its destination, specified by IP addresses. Even though communications are usually bidirectional, throughout this paper we consider a stream to be unidirectional, which means that a full communication between two devices generally encompasses two streams. A stream can be critical or standard. In a critical stream, source and destination are essential devices and the properties of the stream are known in the ICS design phase; in particular, its bandwidth demand, source, and destination are known. Reliable delivery of critical streams is considered fundamental for the proper working of the ICS, and substantial resources are available to guarantee it, in terms of design effort, equipment, etc. A standard stream is not essential for the current functioning of the ICS and is not known in advance. It usually involves at least one non-essential device, but it can also be an occasional communication between two essential devices. Supporting standard streams is important to enable occasional use of the ICS network for maintenance or other non-critical activities, hence best-effort delivery is enough for this kind of stream.

From the point of view of security, both kinds of streams are equally important, since attacks may involve either of them with an equal chance of disruptive effects. An attack on the ICS network consists of any action that introduces unexpected traffic or unexpected changes to standard traffic. More precisely, it consists of a source of malicious traffic (e.g., malware or a rogue device) or of tampering with any critical or standard stream. We assume that switches cannot be tampered with; the security of switching devices is out of the scope of this paper. We suppose there exists a centralized Intrusion Detection System (IDS) in the ICS network, which is able to recognize malicious traffic and properly raise alarms.

The goal of this paper is to provide a flexible way to use a centralized IDS. To achieve this, we assume that a stream $\sigma$ to be observed is duplicated, generating a replica stream; the duplication is performed at a network node that we call the observation point. Each replica stream $\bar{\sigma}$, associated with $\sigma$, originates at the observation point and ends at the IDS. The extension to several IDSes requires minimal effort and is discussed in Section IX.

IV Requirements

In this section, we list the requirements that our methodology should fulfill. We also point out the limitations of the current practice.

1. Observation Points – Our methodology should be able to support the observation of potentially any stream in the network, independently of topology and IDS placement. For security reasons, we prefer observation points close to the destination of streams.

Concerning current practice, certain switches make it possible to remotely mirror a port and also to tunnel the traffic of the replica (see for example the ERSPAN technology). However, this approach provides no control over the bandwidth occupation on each link and depends on vendor-specific support.

2. Reliable Replica Forwarding – Our methodology should guarantee no packet loss for replica streams associated with critical or standard streams. This is important in order for the IDS to inspect all observed traffic and avoid false negatives due to packet loss.

Concerning current practice, the adoption of remote mirroring technologies implies that the replica is delivered with a best-effort approach. To overcome this, in principle, traffic engineering and QoS techniques might be applied. However, this considerably increases the architectural complexity. Further, centralized management, like the one described in Section V, is needed anyway.

3. Reliable Critical Streams Forwarding – Our methodology should be able to configure the ICS network so that, for critical streams, no packet loss can occur due to congestion.

This requirement is motivated by the fact that, due to Requirement 1, replica streams may easily overload some links and make the usual over-provisioning strategies ineffective. Up to a certain extent, forwarding reliability can actually be achieved by adopting reliable transport protocols like TCP. However, TCP support cannot be taken for granted on certain embedded devices. Further, retransmission could introduce a delay that is not acceptable in the ICS context, and no bandwidth guarantee is provided. The adoption of QoS and traffic engineering exhibits the same drawbacks as discussed for Requirement 2.

4. Standard Streams Usability – Our methodology should allow operators to use the ICS network for occasional tasks, which results in injecting new standard streams. While the presence of these streams should not adversely impact the fulfillment of the other requirements, we expect standard streams to be treated fairly by the ICS network. Therefore, usage of the ICS network for occasional tasks should produce the same outcome for all occasional users and applications.

We also consider the well-founded technological constraint that streams must not be split. In fact, if packets of the same stream take different paths, uncontrolled reordering can happen, which is detrimental to TCP performance at best and can change the semantics of datagram-based communications at worst.

V A Methodology and an Architecture

In this section, we describe a methodology and architecture that solve the problem described in Section III with the aim of satisfying the requirements described in Section IV.

Our methodology assumes that the network is made of SDN switches that are compliant with the OpenFlow standard [5]. We exploit the OpenFlow features to:

(i) configure network switches to forward critical streams on the basis of globally optimized paths,

(ii) configure network switches to forward standard streams on the basis of paths chosen by an on-line greedy approach,

(iii) instruct certain network switches (observation points) to duplicate traffic, for the streams that have to be observed (either critical or standard), and perform the first forwarding step of replica streams towards the IDS,

(iv) configure network switches to forward replicas of critical streams towards the IDS along paths that are globally optimized by our off-line approach,

(v) configure network switches to forward replicas of standard streams along paths that are dynamically selected by our on-line greedy algorithm, and

(vi) configure shaping of all streams at ingress network switches.

To meet Requirements 2 and 3, we configure the SDN network to shape each stream at its ingress node, so that packets enter the network at a specified constant rate and all packets exceeding the configured bandwidth are discarded. For critical streams, the configured maximum bandwidth is determined during the design as described below, so no packet drop should happen. For standard streams, this early limiting avoids congestion of internal nodes that could adversely impact critical streams. The shaping configuration exploits the meter feature of the OpenFlow specifications.
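OpenFlow meters are configured on the switch, so the following is only a behavioral model, not switch code. It is a minimal Python sketch of the ingress policing described above, assuming a single-rate token bucket with a drop band; the class name `DropMeter` and the `burst_bytes` parameter are illustrative assumptions, and the actual burst semantics vary across switch implementations.

```python
class DropMeter:
    """Hypothetical model of an OpenFlow meter with a single drop band.

    Packets are admitted while tokens (bytes) are available; tokens refill
    at the configured rate. Excess packets are discarded, never queued.
    """

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_Bps = rate_bps / 8.0       # refill rate in bytes per second
        self.burst = burst_bytes             # bucket depth (assumed parameter)
        self.tokens = float(burst_bytes)     # start with a full bucket
        self.last_t = 0.0

    def admit(self, t: float, size_bytes: int) -> bool:
        # Refill tokens for the time elapsed since the previous packet.
        self.tokens = min(self.burst, self.tokens + (t - self.last_t) * self.rate_Bps)
        self.last_t = t
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # packet exceeds the configured bandwidth: drop it


# A 10 Kbps stream sending 125-byte packets every 0.1 s stays within its rate ...
m = DropMeter(rate_bps=10_000, burst_bytes=250)
print(all(m.admit(0.1 * i, 125) for i in range(100)))  # True

# ... while the same meter drops part of a 20 Kbps stream.
m2 = DropMeter(rate_bps=10_000, burst_bytes=250)
admitted = [m2.admit(0.05 * i, 125) for i in range(100)]
print(sum(admitted) < 100)  # True
```

Discarding at the ingress switch, rather than queueing, is what prevents a misbehaving standard stream from congesting internal links that also carry critical streams.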

Our methodology encompasses a design phase and an operation phase (see Fig. 1). In the design phase, we require an ICS designer to determine the network topology and to list the critical streams along with their maximal required bandwidth. These data are provided as input to an off-line routing solver, which computes the configuration of the SDN switches for critical streams. More specifically, the input of the off-line routing solver encompasses

(i) the network topology,

(ii) the location of essential devices,

(iii) the location of the IDS, and

(iv) for each critical stream its source, its destination and its bandwidth requirement.

The off-line solver produces, for each critical stream,

(i) a forwarding path,

(ii) an observation point, and

(iii) a forwarding path for the corresponding replica stream starting at the observation point and ending at the IDS.

The off-line solver is based on an ILP formulation, which is described in detail in Section VI.
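The inputs and outputs of the off-line routing solver listed above can be captured with plain data types. The Python sketch below is an assumed representation: the names `CriticalStream`, `OfflineInput`, `CriticalStreamPlan`, and `plan_is_consistent` are hypothetical and do not come from the authors' software. It also shows the structural checks that one solver output should pass, taking the observation point as the last switch before the destination, as Section VI requires.

```python
from dataclasses import dataclass, field

Node = str


@dataclass(frozen=True)
class CriticalStream:
    src: Node
    dst: Node
    bandwidth_bps: float


@dataclass
class OfflineInput:
    # (u, v, capacity_bps) for each physical link of the topology
    links: list[tuple[Node, Node, float]]
    essential_devices: set[Node]   # location of essential devices
    ids: Node                      # location of the IDS
    streams: list[CriticalStream] = field(default_factory=list)


@dataclass
class CriticalStreamPlan:
    path: list[Node]               # forwarding path from src to dst
    observation_point: Node        # switch that duplicates the stream
    replica_path: list[Node]       # from the observation point to the IDS


def plan_is_consistent(s: CriticalStream, p: CriticalStreamPlan, ids: Node) -> bool:
    """Structural checks on one solver output; capacity is checked separately."""
    return (
        len(p.path) >= 3                          # device -> switch(es) -> device
        and p.path[0] == s.src
        and p.path[-1] == s.dst
        and p.path[-2] == p.observation_point     # last switch before dst
        and p.replica_path[0] == p.observation_point
        and p.replica_path[-1] == ids
    )


stream = CriticalStream("rtu1", "scada", 10_000)
plan = CriticalStreamPlan(["rtu1", "sw1", "sw2", "scada"], "sw2", ["sw2", "sw3", "ids"])
print(plan_is_consistent(stream, plan, "ids"))  # True
```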

In the operation phase, we mandate the adoption of a special architecture (shown in Fig. 1) in which an SDN-controller is in charge of configuring forwarding paths and meters to implement shaping. Its configuration is divided into two parts: one for critical streams and one for standard streams. The part related to critical streams is configured on the basis of the result of the off-line solver and does not change during operation. The part related to standard streams changes dynamically during operation, to adapt the configuration of the ICS network when the set of active standard streams changes. A control room operator can monitor the status of the ICS network during production time to have a clear picture of which streams are currently replicated and processed by the IDS. During operation, any new packet reaching a network switch that does not match any of the rules configured in the switch to forward critical streams is treated as the first packet of a standard stream $\sigma$. This packet is forwarded to the SDN-controller as in the classical SDN approach. To compute the forwarding path for $\sigma$, the SDN-controller takes advantage of an on-line routing solver. This solver shares with the controller the network topology and the currently available bandwidth on each link, derived from the currently allocated paths. It takes as input the source $s$ and destination $t$ of $\sigma$ and computes

(i) a forwarding path $P$ for $\sigma$,

(ii) an observation point $op\in P$ (preferably close to $t$, according to Requirement 1),

(iii) a forwarding path $Q$ from $op$ to the IDS, and

(iv) a new assignment of bandwidth for all standard streams, including $\sigma$.

The details of the on-line routing solver are described in Section VII. This information is used by the controller to reconfigure the shaping for all standard streams except $\sigma$. The new standard stream $\sigma$ is configured only after a short time $\tau$, dimensioned so that packets of previous standard streams that were admitted into the network under the old bandwidth allocation are guaranteed to reach their destination.

Concerning path selection, our algorithm follows a greedy approach that keeps all previously allocated paths unchanged, for both kinds of streams. There are several reasons for this choice:

(i) sophisticated optimization techniques, like those used in Section VI, may take a considerable amount of time, which can easily exceed the lifespan of the new stream and impair the usability of the network for occasional activities,

(ii) modifying the path of a current stream can introduce temporary routing inconsistencies that can lead to packet loss, which violates Requirements 2 and 3, and

(iii) since standard streams usually have a short lifespan, our main goal is to support them within the requirements listed in Section IV, keeping the optimization of their resource usage as a secondary goal.
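The actual on-line routing solver is described in Section VII. Purely to make the greedy idea concrete, the following Python sketch shows one possible choice, which is an assumption and not the authors' algorithm. It runs a breadth-first search for a fewest-hop path whose links all have enough residual bandwidth for a fixed demand, takes the last switch before the destination as observation point, and then searches a path from there to the IDS. Existing allocations are never moved; only residual capacity shrinks. The sketch omits step (iv), the bandwidth re-assignment among standard streams, and simply reports a stream as unobserved when no replica path fits. All function names are hypothetical.

```python
from collections import deque


def bidir(links):
    """Residual capacity per directed edge, from (u, v, capacity) physical links."""
    residual = {}
    for u, v, cap in links:
        residual[(u, v)] = cap
        residual[(v, u)] = cap
    return residual


def bfs_path(residual, switches, src, dst, demand):
    """Fewest-hop path src -> dst over edges with residual >= demand.
    Only switches may act as transit nodes (devices do not forward)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        if u != src and u not in switches:
            continue
        for (a, b), cap in residual.items():
            if a == u and b not in prev and cap >= demand:
                prev[b] = u
                queue.append(b)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def allocate(residual, path, demand):
    for u, v in zip(path, path[1:]):
        residual[(u, v)] -= demand


def route_standard_stream(residual, switches, s, t, ids, demand):
    p = bfs_path(residual, switches, s, t, demand)
    if p is None:
        return None                       # cannot admit the stream at this rate
    allocate(residual, p, demand)
    op = p[-2]                            # last switch before t
    q = bfs_path(residual, switches, op, ids, demand)
    if q is None:
        return p, None, None              # routed, but not observable
    allocate(residual, q, demand)
    return p, op, q


residual = bidir([("h1", "s1", 100), ("s1", "s2", 100), ("s2", "h2", 100),
                  ("s2", "s3", 100), ("s3", "ids", 100)])
result = route_standard_stream(residual, {"s1", "s2", "s3"}, "h1", "h2", "ids", 10)
print(result)  # (['h1', 's1', 's2', 'h2'], 's2', ['s2', 's3', 'ids'])
```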

VI Problem Formulation for the Off-Line Routing Solver

In this section, we present the ILP formulation at the basis of the off-line routing solver introduced in Section V. For the sake of simplicity, we make a number of assumptions; Section IX relaxes many of them and describes several extensions. Our formulation finds, for each critical stream $\sigma$, a forwarding path $P_{\sigma}$, an observation point $op_{\sigma}$, and the forwarding path of the replica stream $\bar{\sigma}$ from $op_{\sigma}$ to the IDS $d$. Our formulation is a variation of the well-known multicommodity flow problem [15]. In the following, the role of commodities is played by streams, and we call flow the part of our solution that pertains to a certain critical stream. In this section, all streams are critical unless otherwise specified. Our variation takes into account the following aspects:

(i) streams are unsplittable, i.e., it is not allowed for a flow to bifurcate (see Section IV),

(ii) flow demands (i.e., stream bandwidth) are fixed and all critical streams must be routed,

(iii) each stream can generate a new replica stream originating at its observation point, which must be the last node traversed before the destination,

(iv) nodes of the network that represent embedded devices and servers do not have switching capabilities.

Since replica streams can take up a lot of bandwidth, we make the observation of a stream optional by introducing a relevance parameter $\rho_{\sigma}$ for each stream $\sigma$, which indicates how important it is for $\sigma$ to be observed.

In our formulation, we use the following notation. The network is represented by a directed graph $G=(V,E)$, where $V$ is a set of vertices and $E$ is a set of directed edges. Each physical link between $v$ and $w$ corresponds to two oppositely directed edges $(v,w)$ and $(w,v)$. Each edge $e\in E$ has a capacity $C(e)$ that corresponds to the available bandwidth of the link in the corresponding direction. The set of vertices $V$ is partitioned into two subsets: $N$, representing network switches, and $M$, representing devices with no switching capabilities (e.g., embedded devices and servers). We assume that there is no connection among vertices in $M$. The IDS is denoted by $d\in M$. For the sake of simplicity, we do not include the SDN-controller in this model, assuming that connectivity between the SDN-controller and the network switches is obtained either by a dedicated out-of-band network or by reserving part of the bandwidth of the SDN network through proper configuration. A stream is a quadruple $\sigma=(s_{\sigma},t_{\sigma},B_{\sigma},\rho_{\sigma})$ containing its source, its destination, its bandwidth demand, and its relevance, respectively. A corresponding replica stream is a triple ${\bar{\sigma}}=(op_{\sigma},d,B_{\bar{\sigma}})$, where $op_{\sigma}$ is its source (such that $(op_{\sigma},t_{\sigma})\in E$), $d$ is its destination, and $B_{\bar{\sigma}}=B_{\sigma}$ is its bandwidth demand. The set of critical streams is denoted $\mathit{Crit}$, and the set of the corresponding replica streams is denoted $\mathit{Rep}$.
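The notation above maps naturally onto a few Python types. In this sketch, the names `Stream`, `Replica`, `build_edges`, and `make_replica` are hypothetical. Each physical link is expanded into two directed edges with their own capacity, and a replica is only created if its source $op_{\sigma}$ is a switch with $(op_{\sigma},t_{\sigma})\in E$, as the definition requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:       # sigma = (s, t, B, rho)
    s: str
    t: str
    B: float
    rho: float


@dataclass(frozen=True)
class Replica:      # sigma_bar = (op, d, B_bar), with B_bar = B_sigma
    op: str
    d: str
    B: float


def build_edges(links):
    """Each physical link {v, w} yields directed edges (v, w) and (w, v).
    `links` holds (v, w, cap_vw, cap_wv): available bandwidth per direction."""
    C = {}
    for v, w, cap_vw, cap_wv in links:
        C[(v, w)] = cap_vw
        C[(w, v)] = cap_wv
    return C


def make_replica(sigma, op, d, C, N):
    """Replica of sigma, checking that op is a switch with (op, t_sigma) in E."""
    if op not in N or (op, sigma.t) not in C:
        raise ValueError("op must be a switch adjacent to the destination of sigma")
    return Replica(op, d, sigma.B)


C = build_edges([("rtu", "sw1", 1e6, 1e6), ("sw1", "sw2", 1e9, 1e9),
                 ("sw2", "scada", 1e6, 1e6), ("sw1", "ids", 1e9, 1e9)])
N = {"sw1", "sw2"}
sigma = Stream("rtu", "scada", 10e3, 1.0)
rep = make_replica(sigma, "sw2", "ids", C, N)
print(len(C), rep)  # 8 Replica(op='sw2', d='ids', B=10000.0)
```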

For each critical stream $\sigma\in\mathit{Crit}$ and each edge $e\in E$, we define a binary variable $x_{\sigma}^{e}\in\{0,1\}$ with the following meaning:

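$$x_{\sigma}^{e}=\begin{cases}1 & \text{if the path assigned to } \sigma \text{ traverses } e,\\ 0 & \text{otherwise.}\end{cases}$$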

Analogously, variables $x_{\bar{\sigma}}^{e}$ are defined for the replica stream $\bar{\sigma}$ associated with $\sigma$. If a stream $\sigma$ is not observed, then $x_{\bar{\sigma}}^{e}=0$ for all $e\in E$.

We now define a few convenience functions. We give the definitions for a critical stream $\sigma\in\mathit{Crit}$ and a vertex $v\in V$; the corresponding definitions for replica streams $\bar{\sigma}\in\mathit{Rep}$ are analogous.

[TABLE]
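The convenience functions referred to in Section IX as Equations 1, 2, and 3 ($\textrm{Out}_{\sigma}(v)$, $\textrm{In}_{\sigma}(v)$, and $F_{\sigma}(v)$) are not reproduced in this text. The following Python sketch assumes their usual meaning: $\textrm{Out}_{\sigma}(v)$ and $\textrm{In}_{\sigma}(v)$ sum $x_{\sigma}^{e}$ over the edges leaving and entering $v$, and $F_{\sigma}(v)$ is taken to be the net outflow $\textrm{Out}_{\sigma}(v)-\textrm{In}_{\sigma}(v)$ (this sign convention is an assumption). The helper `path_to_x` is hypothetical and only builds an assignment from a path.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Assignment = Dict[Edge, int]  # x_sigma^e in {0, 1}; edges not present mean 0


def path_to_x(path: List[str]) -> Assignment:
    """Hypothetical helper: 0/1 assignment of an unsplittable stream routed on `path`."""
    return {(path[i], path[i + 1]): 1 for i in range(len(path) - 1)}


def out_flow(x: Assignment, v: str) -> int:
    """Out_sigma(v): sum of x_sigma^e over the edges e = (v, w) leaving v."""
    return sum(val for (tail, _head), val in x.items() if tail == v)


def in_flow(x: Assignment, v: str) -> int:
    """In_sigma(v): sum of x_sigma^e over the edges e = (w, v) entering v."""
    return sum(val for (_tail, head), val in x.items() if head == v)


def net_flow(x: Assignment, v: str) -> int:
    """F_sigma(v), assumed here to be the net outflow Out_sigma(v) - In_sigma(v)."""
    return out_flow(x, v) - in_flow(x, v)
```

For a stream routed along `plc -> s1 -> s2 -> scada`, `net_flow` is 1 at the source, -1 at the destination, and 0 at every other vertex, which is the usual source/sink pattern of flow conservation.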

The bandwidth consumed by the critical and replica streams must comply with link capacities:

Capacity constraints.

[TABLE]
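The capacity constraints are not reproduced above. Assuming they require, on every edge $e$, that the total demand of the critical and replica streams routed through $e$ not exceed $C(e)$ (with $B_{\bar{\sigma}}=B_{\sigma}$), a feasibility check for a given routing can be sketched as follows; the `Routing` layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]
# stream name -> (B_sigma, path of sigma, path of its replica or None if not observed)
Routing = Dict[str, Tuple[float, List[str], Optional[List[str]]]]


def edges_of(path: List[str]) -> List[Edge]:
    """Directed edges traversed by a path given as a vertex list."""
    return list(zip(path, path[1:]))


def edge_loads(routing: Routing) -> Dict[Edge, float]:
    """Bandwidth consumed on each edge by critical streams and their replicas."""
    load: Dict[Edge, float] = defaultdict(float)
    for bw, path, replica in routing.values():
        for e in edges_of(path):
            load[e] += bw
        if replica is not None:  # the replica demand equals B_sigma
            for e in edges_of(replica):
                load[e] += bw
    return dict(load)


def capacity_violations(routing: Routing, capacity: Dict[Edge, float]) -> List[Edge]:
    """Edges whose load exceeds C(e); an empty list means the routing is feasible."""
    return sorted(e for e, used in edge_loads(routing).items() if used > capacity.get(e, 0.0))
```

Dropping the replica of a stream (setting it to `None`) is the code-level counterpart of not observing that stream, which is how the relevance mechanism trades observation for bandwidth.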

For each critical or replica stream, we need to express flow conservation. Since flows are unsplittable, each stream generates (consumes) one unit of flow at its source (destination). Conservation is expressed separately for each stream:

Flow conservation and demand constraints for critical streams.

[TABLE]

We now need to express similar constraints for replica streams. Let $L_{\sigma}$ be the set of the possible observation points for $\sigma$, i.e., $L_{\sigma}=\{v\in N\mid(v,t_{\sigma})\in E\}$. Flows should be balanced for all vertices in $N-L_{\sigma}$; each vertex in $L_{\sigma}$ can produce a unit of replica flow only if it is the last hop of the path assigned to $\sigma$ (which is unique, since flows are unsplittable); and the IDS cannot be a source of flow.

Flow conservation and demand constraints for replica streams.

[TABLE]

The above constraints also imply that $\textrm{In}_{\bar{\sigma}}(d)\leq 1$, since, by the unsplittable flow property, only one variable $x_{\sigma}^{(v,t)}$ can be equal to one for each $\sigma$.
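The definition of $L_{\sigma}$ and the last-hop rule can be illustrated with a small Python sketch. It computes $L_{\sigma}$ from the edge set and returns the only vertex allowed to emit the replica flow of $\sigma$; the function names are hypothetical, and paths are assumed to be given as vertex lists from $s_{\sigma}$ to $t_{\sigma}$.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def observation_points(edges: Iterable[Edge], switches: Set[str], t: str) -> Set[str]:
    """L_sigma = { v in N | (v, t_sigma) in E }: switches with an edge into t_sigma."""
    return {v for (v, w) in edges if w == t and v in switches}


def replica_source(path: List[str], edges: Iterable[Edge], switches: Set[str]) -> str:
    """Return the last hop before t_sigma on the path of sigma.

    Only this vertex of L_sigma may emit the unit of replica flow; because the
    flow is unsplittable, the path (and therefore its last hop) is unique.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least a source and a destination")
    op = path[-2]
    if op not in observation_points(edges, switches, path[-1]):
        raise ValueError(f"{op!r} is not a switch adjacent to {path[-1]!r}")
    return op
```

In the example below, $L_{\sigma}$ for destination `scada` contains both `s2` and `s3`, but only `s2`, the last hop of the assigned path, can originate the replica.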

As stated above, only vertices in $N$ have switching capabilities. Hence, on their adjacent edges, all nodes in $M$ must carry zero flow for every stream except the streams of which they are the source or the destination:

[TABLE]

Our objective function consists of two parts: the first one expresses the residual capacity on all the links, while the second states the preference for observing the streams.

[TABLE]

Overall, we would like to maximize both parts, but the formulation gives precedence to the second one: observing streams is preferred over leaving more residual bandwidth. To enforce this, we multiply the second part by a large constant $K$. We also require $\rho_{\sigma}$ to be an integer greater than or equal to one, and $K$ to be larger than the range of values that the first part can take, namely $K>|E|\cdot|\mathit{Crit}|$.
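Why this choice of $K$ enforces precedence can be seen with a short derivation. Write $f_1$ for the first part of the objective and $f_2$ for the second, and assume (this is not stated explicitly above) that $f_2$ only takes integer values, as it does if it sums the integer relevances $\rho_{\sigma}$ of observed streams. Let $R=|E|\cdot|\mathit{Crit}|$ be the bound on the range of $f_1$ given in the text. For two feasible solutions $A$ and $B$ with $f_2(A)\geq f_2(B)+1$:

$$
\begin{aligned}
K f_2(A) + f_1(A) &\geq K f_2(B) + K + f_1(A) \\
&> K f_2(B) + R + f_1(A) && (K > R)\\
&\geq K f_2(B) + f_1(B) && (f_1(B) - f_1(A) \leq R).
\end{aligned}
$$

Under these assumptions, an optimal solution first maximizes the observation term and only then the residual capacity.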

VII Standard streams: methodology and algorithm

In this section, we describe our on-line algorithm for routing standard streams and their related replica streams. The algorithm takes as input a new standard stream $\sigma=(s,t)$, where $s$ is its source and $t$ is its destination, and, based on the network topology, the available bandwidth on the links, and the previously allocated paths and bandwidth, produces as output:

(i) a path $P$ to be used to forward the packets belonging to $\sigma$ ,

(ii) a switch $op\in P$ (observation point) where the traffic of $\sigma$ is duplicated,

(iii) a path $Q$ to be used to forward the replica stream of the traffic of $\sigma$ from $op$ to the IDS,

(iv) an assignment of bandwidth for all currently active standard streams, including $\sigma$, to be configured in the ICS network as explained in Section V, so that all streams are forwarded in compliance with Requirements 2 and 4.

Once the path for the new standard stream is computed, our algorithm reassigns the bandwidth to all standard streams in order to fulfill Requirement 4. Bandwidth reduction entails reconfiguring rate limiting and shaping, and we assume this operation can be performed without any packet loss. However, to avoid packet loss during the transition, we must ensure that no queue grows because of the simultaneous presence of packet bursts sent under the previous bandwidth configuration and packets of the new stream $\sigma$, which together may exceed the capacity of one of the links.

To address this issue, the new stream is admitted into the network only after a short time $\tau$, which ensures that all packets injected under the previous bandwidth configuration have been delivered. The parameter $\tau$ should be greater than the maximum delivery latency of any packet, which, however, is quite small and irrelevant for the vast majority of usage scenarios. The algorithm is formally described in Figure 2. As motivated in Section IV, the algorithm selects observation points as close as possible to $t$ and, secondarily, tries to allocate the largest possible bandwidth. The latter choice relies on the standard WidestPath() function [20], which performs a depth-first search with backtracking looking for the path with the widest bottleneck. The bandwidths used by WidestPath() are computed in the first step of the algorithm: to account for the reassignment of bandwidth to previously allocated standard streams, we estimate the bandwidth available for $\sigma$ as the total bandwidth available for standard streams divided by the number of streams after the allocation of $\sigma$.

Then, the algorithm enumerates the candidate observation points $op$ in order of increasing distance from $t$. Among candidates at the same distance, the $op$ that allows the widest bandwidth $b$ is chosen: once $b$ has been computed, it is compared with $b_{best}$, which it replaces if and only if $b>b_{best}$ (lines 14–19). At this point, our algorithm recomputes all bandwidth assignments using the Water Filling (WF) technique [19] (lines 22–24), which finds the maximum amount of bandwidth that can be assigned to each stream. We implement WF as follows. Suppose the SDN-controller keeps a data structure that associates with each edge $e$ the set $S(e)$ of streams passing through $e$, and let $c(e)$ be the bandwidth of $e$ available for standard streams. WF looks for the edge $\bar{e}$ that minimizes $c(e)/|S(e)|$ and considers it a bottleneck: all streams in $S(\bar{e})$ are assigned bandwidth $c(\bar{e})/|S(\bar{e})|$ and discarded. The remaining bandwidths $c(e)$ are then recomputed for all edges, and the search is repeated until all streams have been discarded and their bandwidth assigned. In this way, our algorithm computes:

i) $P_{best}$ , namely the best available path;

ii) $op$ , namely the starting vertex for replica streams;

iii) $Q_{best}$ , namely the best path for replica stream;

iv) a new bandwidth assignment for $S$ and $\sigma$.
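The WF procedure described above can be sketched in Python as follows, assuming `capacity[e]` holds $c(e)$ and `streams_on[e]` holds $S(e)$ (both names are hypothetical). Only edges that still carry unassigned streams are candidates for the bottleneck $\bar{e}$.

```python
from typing import Dict, Set


def water_filling(capacity: Dict[str, float],
                  streams_on: Dict[str, Set[str]]) -> Dict[str, float]:
    """Bandwidth assignment by repeated bottleneck search (water filling).

    capacity[e]:   c(e), bandwidth of edge e available for standard streams
    streams_on[e]: S(e), standard streams routed through edge e
    """
    c = dict(capacity)
    remaining = {e: set(s) for e, s in streams_on.items()}
    edges_of: Dict[str, Set[str]] = {}
    for e, ss in remaining.items():
        for s in ss:
            edges_of.setdefault(s, set()).add(e)

    alloc: Dict[str, float] = {}
    while any(remaining.values()):
        # Bottleneck: edge minimizing c(e)/|S(e)| among edges with unassigned streams.
        e_bar = min((e for e, ss in remaining.items() if ss),
                    key=lambda e: c[e] / len(remaining[e]))
        share = c[e_bar] / len(remaining[e_bar])
        for s in list(remaining[e_bar]):
            alloc[s] = share
            # Discard s everywhere and subtract its share from every edge it uses.
            for e in edges_of[s]:
                remaining[e].discard(s)
                c[e] -= share
    return alloc
```

With `e1` (capacity 10, streams `a`, `b`) and `e2` (capacity 12, streams `b`, `c`), edge `e1` is the first bottleneck, so `a` and `b` get 5 each; the 7 units left on `e2` then go entirely to `c`, more than an equal split of `e2` would give it.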

The complexity of the WidestPath() function is $O(|E|)$, as it is based on a BFS, and it is run a constant number of times for each vertex. Hence, the observation point is found in $O(|V||E|)$ time. WF takes $O(|E||S|)$ time. Therefore, the overall worst-case time complexity of our on-line algorithm is $O(|E|(|V|+|S|))$. In the most common cases, however, we expect $op$ to be found after examining far fewer than $|V|$ candidates, so the time complexity can often be regarded as $O(|E||S|)$.
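The WidestPath() function [20] is not listed here. As a stand-in, the following sketch uses a different but standard technique, a Dijkstra-style maximum-bottleneck search, rather than the depth-first search with backtracking described above; the adjacency-map representation of the graph is an assumption.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # graph[u][v] = bandwidth available on edge (u, v)


def widest_path(graph: Graph, src: str, dst: str) -> Tuple[float, Optional[List[str]]]:
    """Return (bottleneck bandwidth, path) of a maximum-bottleneck path src -> dst.

    Vertices are settled in order of decreasing bottleneck width, so the first
    time dst is popped its width is optimal. Returns (0.0, None) if dst cannot
    be reached over edges with positive bandwidth.
    """
    best: Dict[str, float] = {src: float("inf")}
    prev: Dict[str, str] = {}
    settled = set()
    heap = [(-best[src], src)]  # max-heap emulated with negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == dst:
            break
        for v, cap in graph.get(u, {}).items():
            width = min(-neg_width, cap)  # bottleneck of the path extended by (u, v)
            if width > best.get(v, 0.0):
                best[v] = width
                prev[v] = u
                heapq.heappush(heap, (-width, v))
    if dst not in settled:
        return 0.0, None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

With a binary heap this variant runs in $O((|V|+|E|)\log|V|)$ time, slightly more than the $O(|E|)$ stated above for the function used in the paper; edge weights would be the per-stream bandwidth estimates computed in the first step of the algorithm.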

VIII Evaluation

We validated our approach from three points of view:

(i) we assess the efficiency of our implementation with respect to computation time on realistic instances, inspired by the electricity distribution domain, for both on-line and off-line routing solvers,

(ii) we show the efficiency of the bandwidth allocation of the on-line routing solver for standard streams, and

(iii) we discuss the ability of our solution to meet requirements listed in Section IV.

We built four realistic topologies as follows. We selected four large topologies from topology-zoo.org that either provide real link bandwidths or are fairly meshed. When link bandwidths were not available, 1 Gbps links were assumed. We considered each node $n$ to be a router associated with a city. We equipped each city with a number of electrical substations whose ICS networks are connected to $n$. Let $B_{n}$ be the sum of the bandwidths of all links incident to node $n$. The node with the largest value of $B_{n}$ is also equipped with one IDS serving the whole network.

The city associated with node $n$ is equipped with $q_{n}$ identical substations, so the total number of substations in the network is $q=\sum_{n}q_{n}$; the dimensioning of $q_{n}$ is described below. The network of a substation is designed on the basis of information freely available on the Internet (each substation is modeled following the Wikipedia description at https://en.wikipedia.org/wiki/Electrical_substation). Figure 3 shows the topology of a single substation with its connection to the router, and Table I lists the devices it contains. Industrial process data are sent from the embedded devices to the local SCADA system and, in turn, to the HMI and to the DB. The bandwidth required by these communications is shown in Table I, which also shows the number of sensors/actuators of each type. We always set the relevance to 1. We equip each city with a number $q_{n}$ of substations according to a decreasing power-law distribution: nodes $n$ are sorted by their value of $B_{n}$; for the $n$ with the largest $B_{n}$, we set $q_{n}=10$, and for $n$ in position $i$, $q_{n}=\left\lfloor 10/i^{\alpha}\right\rfloor$, where $\alpha$ is chosen between 0.7 and 1. When setting the capacities of the edges, we reserved $5\%$ of the bandwidth for standard streams. Data about the topologies used are shown in Table II.
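The dimensioning rule can be expressed as a short Python sketch. It assumes the topology is given as a list of `(u, v, bandwidth)` links and breaks ties in $B_{n}$ by node name, a detail the text does not specify; the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def substations_per_city(links: Iterable[Tuple[str, str, float]], alpha: float = 1.0,
                         q_max: int = 10) -> Tuple[Dict[str, int], str]:
    """Return (q_n for every node, node hosting the IDS)."""
    b: Dict[str, float] = defaultdict(float)  # B_n: total bandwidth of links incident to n
    for u, v, bw in links:
        b[u] += bw
        b[v] += bw
    # Position i = 1 is the node with the largest B_n (ties broken by name).
    ranked = sorted(b, key=lambda n: (-b[n], n))
    q = {n: math.floor(q_max / i ** alpha) for i, n in enumerate(ranked, start=1)}
    return q, ranked[0]
```

With $\alpha=1$, $\lfloor 10/i\rfloor$ drops to 0 from position $i=11$ onward, so lower-ranked cities receive no substations under this formula.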

To validate our off-line routing solver, we instantiated the ILP problem for our four topologies and solved the instances using the Gurobi Optimizer, version 6.5. The formulation was set up through its Python API. The corresponding code is available online [21]. The computation ran on a workstation equipped with 8 Intel Xeon 2.8 GHz processors. Results for the off-line solver are shown in Table III. The evaluation shows that the formulation of Section VI can be used in practice. Considering that the formulation is intended to be used at design time, running times are quite small, which leads us to believe that our approach could be successfully used even in much larger scenarios. Even though solving times are small, they are not suitable for on-line use. This justifies the introduction of the ad-hoc on-line solver whose algorithm was presented in Section VII.

To validate the on-line routing solver, for each network, we randomly generated a sequence of events (available at [21]) as follows. We suppose that standard streams are initiated by (human) operators, whose number is proportional to the network size; we chose to have as many operators as substations (i.e., $q$). Each operator $u$ is attached to a switch $s\in N$ chosen uniformly at random and generates a sequence containing two kinds of events:

(i) $\mathrm{begin}(c,u,t)$: operator $u$ starts a connection, identified by $c$, with machine $t\in M$; and (ii) $\mathrm{end}(c)$: connection $c$ ends.

The interarrival time between the beginnings of connections is exponentially distributed with mean $1/\lambda$. The duration of each connection is exponentially distributed with mean $3/\lambda$ (i.e., on average, each operator is connected to 3 machines at the same time). We set $1/\lambda=5$ minutes; the sequence spans about 10 minutes (from 176 to 576 streams, depending on the network).
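A minimal sketch of this event generation, using Python's `random.expovariate`. The function name, the fixed seed, and the choice of the target machine uniformly at random are assumptions, not details taken from the published generator [21]; begin and end events also carry the operator and machine for convenience.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (time in minutes, kind, connection id c, operator u, machine t)
Event = Tuple[float, str, int, int, str]


def generate_events(num_operators: int, switches: Sequence[str], machines: Sequence[str],
                    mean_interarrival: float = 5.0, horizon: float = 10.0,
                    seed: int = 0) -> Tuple[Dict[int, str], List[Event]]:
    """Hypothetical generator of begin/end events for standard streams."""
    rng = random.Random(seed)
    lam = 1.0 / mean_interarrival  # lambda, so that 1/lambda = 5 minutes by default
    # Each operator is attached to a switch chosen uniformly at random.
    attach = {u: rng.choice(switches) for u in range(num_operators)}
    events: List[Event] = []
    conn = 0
    for u in range(num_operators):
        t = rng.expovariate(lam)  # interarrival times with mean 1/lambda
        while t < horizon:
            machine = rng.choice(machines)  # assumption: uniform target machine
            duration = rng.expovariate(lam / 3.0)  # durations with mean 3/lambda
            events.append((t, "begin", conn, u, machine))
            events.append((t + duration, "end", conn, u, machine))
            conn += 1
            t += rng.expovariate(lam)
    events.sort()  # chronological order
    return attach, events
```

The "3 machines at the same time" reading follows from Little's law: an operator starting connections at rate $\lambda$ that last $3/\lambda$ on average has $\lambda\cdot 3/\lambda=3$ open connections in steady state.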

We initialized the state of the solver with the output of the off-line solver for critical streams. Then, for each network, we ran the on-line solver on the sequence of events generated as described above. Figure 4 shows a density diagram with possible bandwidth values on the x-axis and, on the y-axis, the fraction of streams that were assigned that bandwidth in our experiments. In our experiments, the assigned bandwidth is always very close to the maximum backbone bandwidth. Sometimes, when the source and the destination of a stream are close to each other, the assigned bandwidth can be larger (cf. Table II).

The off-line optimization, together with the traffic shaping approach described in Section V, ensures compliance with Requirements 1, 2, and 3. Further, standard streams are admitted only by using the spare bandwidth of each link, thus protecting critical and replica streams from packet loss due to congestion (see Section VII). Requirement 4 encompasses two essential aspects: fairness of bandwidth allocation and response time. Our approach handles all standard streams by always assigning the same bandwidth to all of them and dynamically adapting it to the current needs. This ensures fairness at the expense of some wasted bandwidth, since certain streams may not use the whole bandwidth assigned to them. To improve this aspect, dynamic polling of bandwidth usage would have to be adopted [16]; however, we believe that in the ICS context this approach may not be worth the effort. Response time mostly depends on the internal architecture of the SDN-controller. A further aspect is the time $\tau$ the controller has to wait to be sure that no packet loss occurs when the bandwidth of certain streams has to be reduced (see Section V). Since $\tau$ should be greater than the time a packet takes to traverse the network, we expect it to be no more than a few milliseconds, which should be negligible for any application reasonably used in the ICS context.

IX Possible Variations and Improvements

In this section we discuss possible variations to the approach described in Sections V, VI, and VII.

Bandwidth Reservation for Standard Streams. Our approach statically allocates bandwidth to critical streams and their replica streams and uses the spare bandwidth for standard streams. However, our formulation can easily be used to explicitly reserve some bandwidth for this purpose at design time, by artificially reducing the capacities $C(e)$ in Constraint 4.

Dynamicity. In the description of our approach, we suppose that the needs for monitoring the critical streams are known in advance and embodied in the relevance parameters $\rho_{\sigma}$. However, there are situations in which we may want to dynamically choose which streams the IDS has to analyze. For example, when an anomaly is detected, we may want the IDS analysis to focus on the devices close to it, possibly giving up, temporarily, the inspection of the traffic of other devices to free up network and IDS resources. This can be supported by equipping the controller with the capability to switch off the observation of critical streams upon request of the control room operator. Further, the operator may explicitly ask for the observation of a critical stream $\sigma$ that is currently not observed. To implement this operation, a widest-path search from the last hop before $t_{\sigma}$ to the IDS has to be performed. If the available bandwidth on the resulting widest path is at least $B_{\sigma}$, the SDN-controller sets up the rules for duplication and forwarding toward the IDS; otherwise, the search can be repeated from the preceding nodes, moving backward along the path of $\sigma$ from $t_{\sigma}$ to $s_{\sigma}$. Since this somewhat relaxes the support for Requirement 1, an alternative is to use the bottlenecks identified by the widest path algorithm to suggest a set of streams whose observation can be switched off to free up enough network resources to satisfy the operator's request.
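The on-demand observation procedure can be sketched as follows, under several assumptions: `graph` holds residual bandwidth only on edges usable by the replica (switch-to-switch edges and edges into the IDS); the `widest_path` helper is a Dijkstra-style maximum-bottleneck search rather than the function of [20]; and `request_observation` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # graph[u][v] = residual bandwidth on edge (u, v)


def widest_path(graph: Graph, src: str, dst: str) -> Tuple[float, Optional[List[str]]]:
    """Maximum-bottleneck path src -> dst; (0.0, None) if dst is unreachable."""
    best: Dict[str, float] = {src: float("inf")}
    prev: Dict[str, str] = {}
    settled = set()
    heap = [(-best[src], src)]
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == dst:
            break
        for v, cap in graph.get(u, {}).items():
            width = min(-neg_width, cap)
            if width > best.get(v, 0.0):
                best[v], prev[v] = width, u
                heapq.heappush(heap, (-width, v))
    if dst not in settled:
        return 0.0, None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]


def request_observation(graph: Graph, path: List[str], ids: str,
                        demand: float) -> Optional[Tuple[str, List[str]]]:
    """Try observation points from the last hop before t_sigma backward toward s_sigma.

    path runs from s_sigma (path[0]) to t_sigma (path[-1]); both endpoints are
    devices without switching capabilities, so only path[1:-1] are candidates.
    Returns (observation point, replica path) or None if no candidate offers
    at least `demand` (B_sigma) of bandwidth toward the IDS.
    """
    for op in reversed(path[1:-1]):
        width, replica = widest_path(graph, op, ids)
        if replica is not None and width >= demand:
            return op, replica
    return None
```

When `None` is returned, the bottleneck edges of the last widest paths found are natural candidates for the alternative mentioned above, i.e., suggesting streams whose observation could be switched off.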

Limited IDS Resources. In our description, we supposed that the IDS has unlimited computational power. While this might be reasonable if the IDS is based on cloud technologies, the designer often has to deal with the limits of the IDS. If the IDS is known to scale up to a certain bandwidth $B_{d}$, the formulation of Section VI can account for it by simply introducing the following constraint:

[TABLE]

However, special care should be taken in handling standard streams. In fact, during the off-line optimization, some IDS bandwidth should be reserved for the analysis of the replicas of standard streams. Further, the on-line routing solver must consider the IDS bandwidth when calculating the new bandwidth assignment for all standard streams in the WF phase. Essentially, both the on-line and the off-line solver can address the problem as if the IDS were reachable only through a link of capacity $B_{d}$.
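The "virtual link" view can be made concrete by a simple graph transformation: every edge entering the IDS is redirected to an auxiliary gate vertex, and a single edge of capacity $B_{d}$ connects the gate to the IDS. A minimal sketch follows; the function name and the gate vertex name are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def add_ids_gate(capacity: Dict[Edge, float], ids: str, b_d: float,
                 gate: str = "ids_gate") -> Dict[Edge, float]:
    """Return a copy of the capacity map in which the IDS is reachable only via `gate`.

    Every edge (u, ids) becomes (u, gate) with the same capacity, and one
    virtual edge (gate, ids) of capacity B_d is added, so that the total
    bandwidth delivered to the IDS cannot exceed B_d.
    """
    if any(gate in e for e in capacity):
        raise ValueError(f"vertex name {gate!r} already in use")
    gated = {((u, gate) if v == ids else (u, v)): c for (u, v), c in capacity.items()}
    gated[(gate, ids)] = b_d
    return gated
```

On the transformed graph, the capacity constraints and the WF phase need no changes: the edge `(gate, ids)` is simply one more edge shared by all replica streams.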

Support for Multiple IDSes. For the sake of simplicity, in our description, we assumed that only one IDS is present in the ICS network. However, there are situations in which it might be convenient to have several IDSes $d_{1},\dots,d_{k}\in D$ distributed across the ICS network, so that a stream can be observed by any of them. The formulation of Section VI can be adapted to support this as follows. Variables $x_{\bar{\sigma}}^{e}$ are replaced by distinct sets of variables $x_{\sigma,d}^{e}$, one for each IDS $d\in D$. The functions $\textrm{Out}_{\sigma,d}(v)$, $\textrm{In}_{\sigma,d}(v)$, and $F_{\sigma,d}(v)$ are defined for each $d\in D$ as obvious variations of Equations 1, 2, and 3. In Constraint 4, $x_{\bar{\sigma}}^{e}$ should be replaced by $\sum_{d\in D}x_{\sigma,d}^{e}$. Constraints 6 should be replaced by

[TABLE]

Since only one variable among $x_{\sigma}^{(v,t)}$ can be greater than zero (by the unsplittability of flows), the second inequality implies that only one IDS is involved in the observation of $\sigma$. The second of Constraints 7 should be replaced by

[TABLE]

Finally, the objective function should be changed into

[TABLE]

With these changes, the formulation automatically assigns streams to IDSes so that the objective function is maximized.

Flow Table Size Control. In SDN networks, the number of rules configured in each switch is a concern, since rules occupy entries in flow tables of limited size. Since the SDN-controller configures a rule for each stream leaving a switch, flow table limits can be taken into account by the following constraints, where $FT(v)$ is the maximum number of rules that can be configured in switch $v$:

[TABLE]
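The constraints themselves are not reproduced above. The check below counts, for a given routing, the rules implied by "one rule per outgoing stream", treating critical and replica paths uniformly and assuming simple paths. At an observation point it therefore counts two rules, one for the original stream and one for its replica, which is one possible reading of the text; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def rules_per_switch(paths: Iterable[List[str]], switches: Set[str]) -> Counter:
    """Count one rule per stream leaving each switch (critical and replica paths alike)."""
    count: Counter = Counter()
    for path in paths:
        for v in path[:-1]:  # every vertex of a simple path except the last has an outgoing edge
            if v in switches:
                count[v] += 1
    return count


def flow_table_violations(paths: Iterable[List[str]], switches: Set[str],
                          ft: Dict[str, int]) -> List[str]:
    """Switches v whose number of rules exceeds FT(v)."""
    return sorted(v for v, n in rules_per_switch(paths, switches).items() if n > ft[v])
```

In the example below, switch `s2` forwards a critical stream, originates a replica toward the IDS, and forwards a second stream, so it needs three rules.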

X Conclusions

We proposed a methodology and an architecture that enable the flexible adoption of one IDS (or a few of them), while keeping the possibility to mirror any stream in the network and forward it toward the IDS independently of its deployment location. While we think that our approach can be useful in many contexts, we tailored it for use within ICS networks, where most traffic flows are critical and known in advance, and occasional traffic can be handled with a best-effort approach. We based our work on SDN technology, which allowed us to keep the network configuration simple and centrally managed. In Section IX, we presented several low-effort extensions to the basic approach. The integration into our architecture of a distributed SDN-controller, like the one presented in [22], may be the subject of further research. Further, our solution statically assigns bandwidth to all critical streams, disregarding cases in which traffic is not stable over time; taking such variability into account could lead to better bandwidth usage.

References


[1] Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), Control Systems Security Program, “ICS-CERT incident response summary report 2009-2011,” ICS-CERT, Tech. Rep., 2011.
[2] K. Stouffer, S. Lightman, V. Pillitteri, M. Abrams, and A. Hahn, “Guide to industrial control systems (ICS) security – NIST special publication (SP) 800-82 revision 2,” NIST, Tech. Rep., 2015.
[3] “Cisco Nexus 7000 series NX-OS system management configuration guide,” Cisco Systems Inc., Tech. Rep., 2011.
[4] S. Tom, D. Christiansen, and D. Berrett, “Recommended practice for patch management of control systems,” DHS Control System Security Program (CSSP) Recommended Practice, 2008.
[5] “OpenFlow switch specification, version 1.3.3 (wire protocol 0x04),” Sept. 2013.
[6] MODICON, Inc., Industrial Automation Systems, “Modbus protocol – reference guide,” Tech. Rep., 1996.
[7] ODVA, Inc., “The common industrial protocol (CIP),” https://www.odva.org/Technology-Standards/Common-Industrial-Protocol-CIP/.
[8] M. Herrero Collantes and A. López Padilla, “Protocols and network security in ICS infrastructures,” INCIBE, Tech. Rep., 2015.