CANet: An Unsupervised Intrusion Detection System for High Dimensional CAN Bus Data

Markus Hanselmann; Thilo Strauss; Katharina Dormann; Holger Ulmer

arXiv:1906.02492 · cs.CR · March 26, 2020


TL;DR

CANet is an unsupervised neural network designed to detect both known and unknown intrusions in high-dimensional CAN bus data. It operates in real time on individual messages with different IDs.

Contribution

It introduces the first deep-learning-based IDS that evaluates individual CAN messages in real time, handling messages with varying IDs and frequencies.

Findings

(1) Outperforms previous machine learning methods significantly

(2) Effective on real and synthetic CAN data

(3) First to evaluate messages with different IDs in real time


Tables

Table 1. Schematic representation of CAN bus data after preprocessing its bytes into signals. Note that at each time stamp only the signal values of a single ID are transmitted. Different IDs may contain a different number of signals, e.g. ID $A$ consists of six signals whereas ID $D$ has one signal. The time stamp is given in milliseconds. In CANet, time is discretized.

CAN Bus Data after Preprocessing
Time Stamp	ID	Signals of $A$						Signals of $B$			Signals of $C$			Signals of $D$
1.045	B	-	-	-	-	-	-	54.71	0	7.24	-	-	-	-
3.102	D	-	-	-	-	-	-	-	-	-	-	-	-	31.47
4.978	A	12	44.15	38.02	2	0	1	-	-	-	-	-	-	-
7.014	C	-	-	-	-	-	-	-	-	-	17.79	7	2	-
8.993	B	-	-	-	-	-	-	55.02	1	7.21	-	-	-	-
9.750	A	13	44.01	39.67	1	0	2	-	-	-	-	-	-	-
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

Table 2. CANet architecture for detecting intrusions on the CAN bus. A total number of $N$ signals is considered. The parameter $h_{scale}\in\mathbb{N}$ determines the computational power of the neural network and is specified in the evaluation part.

CANet Architecture
Layer Type	Size
LSTM per ID	Number of signals of ID $\times$ $h_{scale}$
Joint Latent Vector	$N \times h_{scale}$
Fully Connected (ELU)	$N \times h_{scale} / 2$
Fully Connected (ELU)	$N - 1$
Fully Connected (ELU)	$N$

Table 3. Summary of experimental results on real and synthetic CAN data. For normal test data, the accuracy is recorded. For data with attacks, the true positive rate (i.e. the rate of attacks that are successfully detected) and the true negative rate (i.e. the rate of normal data correctly identified as normal) are recorded. Note that this is a pointwise evaluation criterion.

Evaluation Table
Synthetic Data
Model Specification		Accuracy	True Positive Rate / True Negative Rate
Method	$h_{scale}$	No Attack	Plateau	Continuous	Playback	Suppress	Flooding
CANet	5	0.991	0.896 / 0.980	0.740 / 0.994	0.896 / 0.997	0.496 / 0.996	0.900 / 0.997
CANet	10	0.990	0.955 / 0.975	0.765 / 0.994	0.905 / 0.996	0.613 / 0.996	0.901 / 0.996
CANet	20	0.992	0.885 / 0.993	0.771 / 0.996	0.906 / 0.997	0.581 / 0.995	0.884 / 0.997
CANet	30	0.995	0.707 / 0.996	0.446 / 0.997	0.630 / 0.998	0.184 / 0.997	0.625 / 0.997
Predictive	-	0.996	0.330 / 0.974	0.015 / 0.994	0.020 / 0.996	0.003 / 0.993	0.644 / 0.994
Autoencoder	-	0.983	0.355 / 0.927	0.016 / 0.975	0.029 / 0.995	0.001 / 0.993	0.688 / 0.995
Real Data
Model Specification		Accuracy	True Positive Rate / True Negative Rate
Method	$h_{scale}$	No Attack	Plateau	Continuous	Playback	Suppress	Flooding
CANet	5	0.996	0.937 / 0.963	0.792 / 0.975	0.852 / 0.911	0.082 / 0.997	0.808 / 0.943
CANet	10	0.994	0.913 / 0.988	0.701 / 0.985	0.878 / 0.977	0.176 / 0.989	0.802 / 0.996
CANet	20	0.995	0.936 / 0.968	0.724 / 0.988	0.862 / 0.954	0.254 / 0.991	0.761 / 0.992
CANet	30	0.992	0.936 / 0.970	0.740 / 0.983	0.903 / 0.940	0.240 / 0.983	0.833 / 0.994
Predictive	-	0.995	0.269 / 0.949	0.577 / 0.985	0.134 / 0.964	0.001 / 0.998	0.182 / 0.998
Autoencoder	-	0.999	0.055 / 0.992	0.491 / 0.983	0.079 / 0.977	0.007 / 0.995	0.627 / 0.996

Equations

$$R_{t}(s_{t,A_{i}}) = \big(Rec_{t}(s_{t,A_{1}}), \dots, Rec_{t}(s_{t,A_{K}})\big),$$

$$loss(s_{t,A_{i}}) = \lVert Rec_{t}(s_{t,A_{i}}) - s_{t,A_{i}} \rVert_{\ell_{2}}^{2}.$$


Full text


Markus Hanselmann, ETAS GmbH, Bosch Group, Stuttgart, Germany

Thilo Strauss, ETAS GmbH, Bosch Group, Stuttgart, Germany

Katharina Dormann, Robert Bosch GmbH, Ludwigsburg, Germany

Holger Ulmer, ETAS GmbH, Bosch Group, Stuttgart, Germany

Abstract.

We propose a novel neural network architecture for detecting intrusions on the CAN bus. The Controller Area Network (CAN) is the standard communication method between the Electronic Control Units (ECUs) of automobiles. However, CAN lacks security mechanisms, and it has recently been shown that it can be attacked remotely. Hence, it is desirable to monitor CAN traffic to detect intrusions. To detect both known and unknown intrusion scenarios, we consider a novel unsupervised learning approach, which we call CANet. To our knowledge, this is the first deep-learning-based intrusion detection system (IDS) that takes individual CAN messages with different IDs and evaluates them at the moment they are received. This is a significant advancement because messages with different IDs are typically sent at different times and with different frequencies. Our method is evaluated on real and synthetic CAN data. For reproducibility, our synthetic data is publicly available. A comparison with previous machine-learning-based methods shows that CANet outperforms them by a significant margin.


1. Introduction

Automobiles are becoming increasingly connected through technologies such as Bluetooth, Wi-Fi, or smartphone plug-ins. While this simplifies the driver’s life, it simultaneously opens new paths for remote attacks on the Electronic Control Units (ECUs) of cars. Hijacking an ECU can allow attackers to place messages on the vehicle-internal communication network and thus, for example, to invoke sudden braking or turn off the engine, which can potentially cause traffic accidents (Checkoway et al., 2011; Miller and Valasek, 2015). The consequences may be even more disastrous in autonomous vehicles. Hence, detecting attempted attacks on in-vehicle networks is in the interest of traffic safety.

In this paper, we focus on the CAN bus as it is the most common vehicle bus standard. Typically, CAN messages are used to transmit signals between ECUs. For example, an ECU can send information about objects on the road so that the brake assist can react accordingly. An extensive overview of previous work on CAN intrusion detection systems can be found in (Tomlinson et al., 2018a). Much of this work focuses on rule-based and statistical methods for detecting known attack scenarios. While these approaches detect many types of intrusions efficiently, configuring such an IDS is time-consuming and requires domain expertise, and it is unlikely to detect unknown attack scenarios. Moreover, it is challenging to write rules that capture the underlying behavior of signals or the physical dependencies between them.

With the advances in deep learning in recent years (Krizhevsky et al., 2012; LeCun et al., 2015; Hinton et al., 2012), new tools have become available that have the potential to detect unknown attacks. Prior work on intrusion detection with neural networks on single CAN signals can be found in the literature (Taylor et al., 2016; Weber et al., 2018b, a). However, to the best of our knowledge, there is no neural network architecture that can handle the CAN bus data structure in the signal space. On the CAN bus, at most one message is transmitted at any point in time. As a result, the CAN traffic consists of consecutive messages with different IDs, and these messages contain different kinds of signals (see Table 1). This data structure makes it difficult to feed CAN bus data directly into any standard neural network.

The contributions of this manuscript are the following: We introduce CANet, a novel neural network architecture tailored to the signal space of CAN data, and we show that it outperforms baseline methods by a significant margin. For each message ID, we introduce a separate long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) subnetwork whose input dimension equals the number of signals in messages with that ID. The outputs of the LSTM subnetworks are concatenated into a single latent vector that encodes the current state of the entire CAN traffic. This vector is followed by several fully connected layers in an autoencoder setting that finally reconstruct the payload of the input message. We describe how this architecture can be trained in an unsupervised manner and evaluated such that a variety of unknown attack types can be detected while normal data is identified correctly. We show that our method is especially strong at finding certain signal manipulations that are difficult to detect with classical approaches (e.g. the continuous change of a signal towards a desired value). Here, we exploit the fact that, by processing the joint latent vector, the network is able to learn the functional dependencies between the signals under consideration.

We point out that anomaly detection in the signal space of CAN bus data may have applications beyond intrusion detection, e.g. early detection of technical failures.

This document is organized as follows: In Section 2, the background of the CAN bus and the relevant literature on CAN IDS are briefly covered. In Section 3, the proposed network architecture and its training process are described. In Section 4, the method is evaluated on real and synthetic CAN data and compared to related work. Finally, in Section 5, we present our conclusions and future work.

2. Relevant Background

2.1. Terminology

The Controller Area Network (CAN) is a vehicle bus standard designed to allow automotive Electronic Control Units (ECUs) to communicate with each other.

A CAN message is characterized by a time stamp, an ID and typically an 8-byte payload field. The ID represents the type of the current message and the payload field is used to communicate current values of vehicle signals. Each ID is associated with a set of signals and the payload of messages carrying this ID provides their current values. In general, messages with different IDs contain a different set of signals. The encoding of signal values ranges from single bits to several bytes of the payload. A so-called CAN matrix provides information for each ID, specifying which payload bits encode which signal. In this paper, we assume that the decoding of raw payload bits into signal values is already performed (see Table 1).
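To make the decoded data structure of Table 1 concrete, the following is a minimal sketch, not the authors’ preprocessing code. It assumes messages are already decoded into signal values and uses a hypothetical `SIGNALS_PER_ID` map that mirrors the Table 1 example (ID A with six signals, B and C with three, D with one). The helper `to_table_row` is also hypothetical: it spreads one message over all signal columns and fills the columns of the other IDs with `None`, which plays the role of the “-” entries in Table 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical ID -> number-of-signals map mirroring the Table 1 example.
SIGNALS_PER_ID: Dict[str, int] = {"A": 6, "B": 3, "C": 3, "D": 1}


@dataclass
class CanMessage:
    """One decoded CAN message: time stamp in ms, ID, and its signal values."""
    timestamp_ms: float
    can_id: str
    signals: Sequence[float]


def to_table_row(msg: CanMessage,
                 signals_per_id: Dict[str, int] = SIGNALS_PER_ID) -> List[Optional[float]]:
    """Spread one message over all signal columns; None marks signals of other IDs."""
    expected = signals_per_id[msg.can_id]
    if len(msg.signals) != expected:
        raise ValueError(f"ID {msg.can_id} carries {expected} signals")
    row: List[Optional[float]] = []
    for can_id, n_signals in signals_per_id.items():  # dicts preserve insertion order
        if can_id == msg.can_id:
            row.extend(msg.signals)
        else:
            row.extend([None] * n_signals)
    return row


print(to_table_row(CanMessage(1.045, "B", [54.71, 0, 7.24])))
# [None, None, None, None, None, None, 54.71, 0, 7.24, None, None, None, None]
```

Each row is mostly empty, which is exactly why the stream cannot be fed into a standard fixed-input network without further structure.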

2.2. Objective

The objective is to detect intrusions into the CAN bus communication. We assume that the attacker has already gained access to the CAN bus, e.g. by hijacking one of the connected ECUs. This mirrors previously demonstrated security-relevant attacks (Miller and Valasek, 2015). We further assume that the attacker then tries to influence the vehicle behavior by manipulating messages.

Our goal is to detect signals deviating from their “normal” behavior or signals breaking out of physical relationships. These physical relationships are typically complex, potentially unknown and hard to derive by rule based intrusion detection systems for CAN.

2.3. Related Research

In this section, we discuss previous approaches for intrusion detection on CAN bus data. In (Taylor et al., 2016), an LSTM network structure for predicting the next payload of a single ID is proposed. Other deep-learning-based results include (Weber et al., 2018b), where an autoencoder-like network architecture is applied to a sliding window over the appearances of a specific signal. In (Weber et al., 2018a), a lightweight on-line detector of anomalies (LODA) is proposed. While these methods show promising results, they all build a model for a single time series containing the values of one signal or the signals of one ID. We extend this by enabling the network to work with all signals of multiple CAN IDs simultaneously. This gives CANet the advantage that it can detect intrusions by inferring functional or physical dependencies between signals on the CAN bus.

Non-neural-network-based methods for anomaly detection on the CAN bus payload include signature-based methods (Studnia et al., 2014), fingerprinting (Cho and Shin, 2016), clustering methods (Tomlinson et al., 2018b), fuzzy logic (Martinelli et al., 2017), Hidden-Markov-Model-based methods (Narayanan et al., 2016), and entropy-based methods (Müter and Asaj, 2011; Marchetti et al., 2016). A comprehensive review of the strengths and weaknesses of these methods, as well as of non-payload-based methods, can be found in (Tomlinson et al., 2018a).

3. CANet

The basic idea of CANet is to handle the challenging structure of CAN data by introducing several independent recurrent neural networks on the input side. This enables the network to learn temporal dependencies in the signals. In detail, we introduce a separate LSTM for each ID that receives the signal values associated with this ID as inputs and stores the current state of the processed data. The state of the whole CAN traffic is then represented by a joint latent vector, realized as the concatenation of the current states of all input networks. This vector does not contain any information about which ID was processed last. To enable unsupervised learning, the joint latent vector is fed into a subnetwork of consecutive fully connected layers in an autoencoder setting. That is, at each time step, this subnetwork has to reconstruct the signal values of every possible input message solely from the current joint latent vector. The deviation between the signal values of the true input message and its reconstruction is then used as a measure of the normality of the input message. Since the network is trained solely on normal data, this reconstruction error is expected to be small on normal data and large on anomalous data.

3.1. Network Architecture

To describe the network architecture concisely, we first introduce some notation. Let ${\mathbf{A}=\{A_{1},\dots,A_{K}\}}$ be the ordered set of all $K\in\mathbb{N}$ considered IDs. For each ID $A\in\mathbf{A}$, we take into account $n_{A}$ corresponding signals encoded in the payload. We denote the total number of signals by $N$, i.e. $N=\sum_{A\in\mathbf{A}} n_{A}$. Whenever an ID $A\in\mathbf{A}$ is observed at time step $t$, we denote the vector containing the corresponding signal values by $s_{t,A}\in\mathbb{R}^{n_{A}}$. Note that time is thereby discretized.

A visualization of the architecture can be found in Figure 1 and the corresponding layer specifications in Table 2. The network architecture consists of an input LSTM subnetwork for every ID ${A\in\mathbf{A}}$. The $i$-th LSTM subnetwork is associated with ID $A_{i}$ and has $n_{A_{i}}$ inputs and a hidden dimension of size ${n_{A_{i}}\cdot h_{scale}}$. Here, ${h_{scale}\in\mathbb{N}}$ controls the computational power of CANet; in the evaluation, the performance for different $h_{scale}$ values is compared (see Section 4.4). Whenever a new payload of an ID ${A\in\mathbf{A}}$ is fed through the corresponding LSTM subnetwork, its output of size ${n_{A}\cdot h_{scale}}$ is used to update the corresponding memory in the joint latent vector. Hence, the joint latent vector has length ${N\cdot h_{scale}}$; it represents and stores the current state of the CAN traffic. The joint latent vector is followed by a set of fully connected layers, where the penultimate layer has strictly fewer neurons than the output layer, which has $N$ neurons. The task of the output layer is to reconstruct all potential current input signals from all IDs.
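The bookkeeping described above (layer sizes from Table 2 and the per-ID memory inside the joint latent vector) can be sketched without any deep learning library. This is an illustration under stated assumptions, not the authors’ PyTorch code: the ID-to-signal-count map is hypothetical (it mirrors Table 1), `N * h_scale / 2` is floored to an integer (an assumption, since Table 2 does not say how odd values are rounded), and a plain list of floats stands in for a real LSTM output. The functions `canet_layer_sizes`, `latent_slots`, and `update_latent` are hypothetical names.

```python
from typing import Any, Dict, List


def canet_layer_sizes(signals_per_id: Dict[str, int], h_scale: int) -> Dict[str, Any]:
    """Layer sizes as listed in Table 2 for a given ID -> signal-count map."""
    n_total = sum(signals_per_id.values())  # N
    return {
        "lstm_hidden": {a: n_a * h_scale for a, n_a in signals_per_id.items()},
        "joint_latent": n_total * h_scale,
        # Table 2 gives N * h_scale / 2; flooring to an integer is an assumption.
        "fc1": (n_total * h_scale) // 2,
        "fc2": n_total - 1,  # penultimate layer, strictly smaller than the output
        "output": n_total,
    }


def latent_slots(signals_per_id: Dict[str, int], h_scale: int) -> Dict[str, slice]:
    """Fixed region of the joint latent vector that each ID's LSTM output occupies."""
    slots: Dict[str, slice] = {}
    offset = 0
    for a, n_a in signals_per_id.items():
        slots[a] = slice(offset, offset + n_a * h_scale)
        offset += n_a * h_scale
    return slots


def update_latent(latent: List[float], slots: Dict[str, slice],
                  can_id: str, lstm_output: List[float]) -> None:
    """Overwrite only the slot of the ID just processed; all other slots keep their state."""
    slot = slots[can_id]
    if len(lstm_output) != slot.stop - slot.start:
        raise ValueError("LSTM output size does not match the slot size")
    latent[slot] = lstm_output


signals = {"A": 6, "B": 3, "C": 3, "D": 1}  # hypothetical, mirrors Table 1
sizes = canet_layer_sizes(signals, h_scale=5)
print(sizes["joint_latent"], sizes["fc1"], sizes["fc2"], sizes["output"])  # 65 32 12 13

latent = [0.0] * sizes["joint_latent"]
slots = latent_slots(signals, h_scale=5)
update_latent(latent, slots, "D", [1.0] * 5)  # stand-in LSTM output; D owns latent[60:65]
```

Because each ID writes only into its own slot, the latent vector always holds the most recent state of every ID, regardless of which ID arrived last.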

During training, at each time step $t\in\mathbb{N}$, the payload of an ID, say $A_{i}$, is fed through its corresponding LSTM input model. The LSTM output is then used to update the joint latent vector, from which the payloads of all $A\in\mathbf{A}$ at time step $t$ are reconstructed. Formally, the reconstruction $R_{t}$ of the payload for time step $t$ is denoted by

$$R_{t}(s_{t,A_{i}}) = \big(Rec_{t}(s_{t,A_{1}}), \dots, Rec_{t}(s_{t,A_{K}})\big),$$

where $Rec_{t}(s_{t,A_{j}})$ denotes the reconstruction of the payload associated with ID $A_{j}$.

We then compare the true signal values from the payload of the current ID $A_{i}$ with their reconstructions $Rec_{t}(s_{t,A_{i}})$ . We use the quadratic error loss function, given by

$$loss(s_{t,A_{i}}) = \lVert Rec_{t}(s_{t,A_{i}}) - s_{t,A_{i}} \rVert_{\ell_{2}}^{2}. \tag{1}$$

In detail, when backpropagating this loss, only the gradients of the weights of the LSTM subnetwork of ID $A_{i}$ and of the weights that connect the joint latent vector with the output $Rec_{t}(s_{t,A_{i}})$ have to be computed.
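The point that only the current ID’s part of the output enters the loss can be illustrated with a small sketch. It assumes, hypothetically, that the $N$ outputs are ordered ID by ID in the same order as $\mathbf{A}$; the helpers `signal_offsets` and `current_id_loss` are illustrative names, not from the paper. In an autograd framework, outputs that never enter the loss receive no gradient from it, which is what the text above describes.

```python
from typing import Dict, Sequence


def signal_offsets(signals_per_id: Dict[str, int]) -> Dict[str, slice]:
    """Position of each ID's signals inside the N-dimensional output vector (assumed ID-major order)."""
    offsets: Dict[str, slice] = {}
    start = 0
    for a, n_a in signals_per_id.items():
        offsets[a] = slice(start, start + n_a)
        start += n_a
    return offsets


def current_id_loss(reconstruction: Sequence[float], can_id: str,
                    true_signals: Sequence[float],
                    offsets: Dict[str, slice]) -> float:
    """Squared l2 error between s_{t,A_i} and Rec_t(s_{t,A_i}), as in the loss above.

    Reconstructed values of all other IDs are ignored, so they do not contribute to the loss.
    """
    rec = list(reconstruction[offsets[can_id]])
    if len(rec) != len(true_signals):
        raise ValueError("signal count mismatch")
    return sum((r - s) ** 2 for r, s in zip(rec, true_signals))


offsets = signal_offsets({"A": 2, "B": 1})
print(current_id_loss([0.5, 0.25, 0.75], "B", [0.25], offsets))  # 0.25
```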

Since the temporal dependencies are stored for each ID separately in the corresponding LSTM subnetwork, the training process has the advantage that the model as a whole is not sensitive to the exact order of consecutive message IDs. In fact, in real CAN data, there is some variability in the order of IDs, even within the same data set.

3.2. Anomaly Score

The quadratic error between a signal and its reconstruction, as defined in Equation 1, can be used to predict whether a signal at time step $t\in\mathbb{N}$ is anomalous by testing whether the error exceeds a fixed threshold. Because we consider an unsupervised learning problem, the threshold must be chosen solely based on normal data. It is computed for each signal separately and given by the $99.99$th percentile of all corresponding quadratic errors on a validation data set. During evaluation, an individual anomaly indicator is stored for every signal. Every time an ID is processed by the model, the anomaly indicators of its signals are updated: each is set to 1 if the reconstruction error exceeds the corresponding $99.99$th percentile and to 0 otherwise. The global anomaly score at time step $t$ is 1 if and only if at least one of the stored signal anomaly indicators is 1, and 0 otherwise.

Note that this anomaly score is only feasible if the number of signals does not become too large, because it suffers from effects similar to those in multiple testing problems: as the number of signals grows, the probability that a normal data point is identified as such (true negative) decreases.

While this anomaly score suffices to demonstrate the capabilities of our method, it may need to be refined for in-vehicle use, where a true negative rate of 100% is required because the cost of each false positive is relatively high. Depending on the response mechanism, a false positive might, for example, lead to advising the driver to stop or to see a mechanic because of a suspected attack.
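The thresholding scheme above can be sketched in a few lines. This is a hedged illustration, not the authors’ evaluation code: the percentile uses linear interpolation between order statistics (one common convention; the paper does not specify which), and the class `AnomalyScorer` as well as the signal names `s1`, `s2` are hypothetical. The key behavior is that indicators are stored per signal and persist until that signal’s ID is processed again.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class AnomalyScorer:
    """Per-signal thresholds, stored per-signal indicators, and a global OR score."""

    def __init__(self, validation_errors: Dict[str, List[float]], q: float = 99.99):
        # One threshold per signal, computed on normal validation data only.
        self.thresholds = {sig: percentile(errs, q) for sig, errs in validation_errors.items()}
        self.indicators = {sig: 0 for sig in self.thresholds}

    def update(self, errors: Dict[str, float]) -> int:
        """Refresh the indicators of the signals carried by the current ID; return the global score."""
        for sig, err in errors.items():
            self.indicators[sig] = int(err > self.thresholds[sig])
        return int(any(self.indicators.values()))


scorer = AnomalyScorer({"s1": [0.0, 1.0], "s2": [0.0, 1.0]})
print(scorer.update({"s1": 2.0}))  # 1
print(scorer.update({"s2": 0.5}))  # 1, because the stored indicator of s1 is still 1
print(scorer.update({"s1": 0.1}))  # 0
```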
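A back-of-the-envelope calculation makes the multiple-testing effect concrete. It rests on a simplifying assumption that is not from the paper: the $N$ stored indicators are treated as independent, and each fires on normal data with probability $\alpha$ (a $99.99$th-percentile threshold corresponds to $\alpha = 10^{-4}$ only if validation and test errors follow the same distribution). Under this assumption:

$$P(\text{true negative}) = (1-\alpha)^{N}, \qquad (1-10^{-4})^{20} \approx 1 - 20\cdot 10^{-4} = 0.998, \qquad (1-10^{-4})^{1000} \approx e^{-0.1} \approx 0.905.$$

Real signals are correlated and their distributions may shift between validation and test data, so measured true negative rates (Table 3) need not match these idealized figures. The calculation only shows the direction of the effect as $N$ grows.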

4. Experiments

In this section, we evaluate our method on both real and synthetic CAN data.

The real data was collected on a test vehicle. In our experiments, 13 IDs with a total number of 20 signals are taken into consideration. The signals are chosen in such a way that they contain physical values and that, for each signal, there is at least one other signal with a functional dependency to it. We divide about 13 hours of recorded data into 12.5 hours of training and 0.5 hours of test data. We only consider data representing the normal driving mode. Hence, we exclude e.g. starting and turning off the engine. All payloads are preprocessed into their signal value space (see Table 1).

For the synthetic data, we consider a data set consisting of 10 different message IDs, each with a different number of signals and a different noisy transmission frequency. The total number of signals is 20. Based on our experience, the data is created to resemble real CAN traffic. The data contains physical values, counters, and signals that depend on one or more other signals. We use a training data set of about 16.5 hours and a test data set of about 7.5 hours of CAN traffic. The data set is available at https://github.com/etas/SynCAN.

4.1. Simulated Attacks

In both the real and the synthetic data set, the test data is divided into six subsets of equal duration. We use one subset to evaluate our model on normal data. The other five subsets are used to evaluate our model on the following attack types:

(1) Plateau attack: A single signal is overwritten with a constant value over a period of time, i.e. the signal jumps or freezes. We only consider jumps within the typical signal range, since larger jumps are an obvious attack; for example, a car cannot speed up from $20$ km/h to $100$ km/h within 10 ms. Such obvious attacks might be detected just by considering the respective signal, whereas smaller jumps or freezing might only be detected by considering a set of signals that are correlated in some way.

(2) Continuous change attack: A signal is overwritten so that it slowly drifts away from its true value. This assumes that the attacker wishes to set a signal to a concrete value while trying to fool the IDS with realistic, small changes in the signal.

(3) Playback attack: A signal value is overwritten over a period of time with a recorded time series of values of that signal. The attacker hopes to trick the IDS by sending completely real-looking signal values from a different traffic situation.

(4) Suppress attack: The attacker prevents an ECU from sending messages, for example by turning it off. As a result, messages of a particular ID do not appear in the CAN traffic for some period of time.

(5) Flooding attack: The attacker sends messages of a particular ID with high frequency to the CAN bus. This attack is easier to perform in practice than the aforementioned ones, since the attacker does not need to control an ECU. It only requires sending additional messages to the CAN bus in order to “overwrite” the real message values.
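The five attack types can be sketched as simple transformations of a signal series or a message list. This is a hypothetical illustration, not how the SynCAN attacks or the real-data attacks were actually generated: all function names, parameters (interval indices, target value, flooding period), and the `(time stamp, ID, signal values)` message tuple are assumptions chosen for readability. The continuous change variant blends linearly from the true value towards the target, which is one of several possible drift schemes.

```python
from typing import List, Sequence, Tuple

Message = Tuple[float, str, Tuple[float, ...]]  # (time stamp in ms, ID, signal values)


def plateau(signal: Sequence[float], start: int, end: int, value: float) -> List[float]:
    """Overwrite signal[start:end] with a constant value (a jump or a freeze)."""
    out = list(signal)
    out[start:end] = [value] * (end - start)
    return out


def continuous_change(signal: Sequence[float], start: int, end: int,
                      target: float) -> List[float]:
    """Blend linearly from the true values towards `target`, reaching it at index end-1."""
    out = list(signal)
    steps = end - start
    for k in range(steps):
        frac = (k + 1) / steps
        out[start + k] = signal[start + k] + frac * (target - signal[start + k])
    return out


def playback(signal: Sequence[float], start: int, end: int,
             recorded: Sequence[float]) -> List[float]:
    """Replace signal[start:end] with a recorded segment of the same signal."""
    if len(recorded) < end - start:
        raise ValueError("recorded segment too short")
    out = list(signal)
    out[start:end] = list(recorded[: end - start])
    return out


def suppress(messages: Sequence[Message], can_id: str,
             t_start: float, t_end: float) -> List[Message]:
    """Drop all messages of `can_id` whose time stamp lies in [t_start, t_end)."""
    return [m for m in messages if not (m[1] == can_id and t_start <= m[0] < t_end)]


def flood(messages: Sequence[Message], can_id: str, values: Tuple[float, ...],
          t_start: float, t_end: float, period_ms: float) -> List[Message]:
    """Insert extra messages of `can_id` every `period_ms` and re-sort by time stamp."""
    extra: List[Message] = []
    t = t_start
    while t < t_end:
        extra.append((t, can_id, tuple(values)))
        t += period_ms
    return sorted(list(messages) + extra, key=lambda m: m[0])


print(plateau([1, 2, 3, 4, 5], 1, 3, 9))               # [1, 9, 9, 4, 5]
print(continuous_change([0.0] * 4, 0, 4, target=8.0))  # [2.0, 4.0, 6.0, 8.0]
```

Note that the first three attacks alter signal values only, while suppress and flooding alter the message stream itself; this is why frequency-based rules catch the latter two relatively easily.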

The length of a typical attack interval, in both the real and the synthetic data set, is between 2 and 4 seconds. Each synthetic test data set contains about $100$ attack intervals of the corresponding type, and each real test data set about $10$. The suppress and flooding attacks can be detected relatively easily by a rule-based method that analyzes the frequencies of the signals. However, the plateau, continuous change, and playback attacks are rather difficult to detect with a rule-based system. Furthermore, for all three attacks it is often not sufficient to consider each signal separately. For example, in the playback attack it is mostly impossible to detect the majority of an attack interval without access to correlated signals.

4.2. Network Training Details

In this section, we present the training details of our method to make the results reproducible. All code for training and evaluation is written with PyTorch (Paszke et al., 2017). We use the network architecture described in Table 2 for different $h_{scale}$ values. The optimizer of choice is Adam (Kingma and Ba, 2015) with an initial learning rate of $0.01$. The data is scaled to the range [0, 1] signal-wise. We train the network for 1000 epochs with batch size 25. Every element in a batch is a series of 5000 consecutive messages starting at a random position in the training data. At the beginning of each epoch, the hidden and cell state vectors of all LSTM models are initialized with zeros. During a single epoch, backpropagation is performed every $250$ iterations (time steps) to update the network weights. For more robust training, the loss function is multiplied by a fixed, ID-dependent scalar that becomes linearly smaller the more frequently the corresponding ID appears in the training data set. This ensures that frequently appearing IDs do not receive more weight than less frequent IDs during training.
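Three of the training details above lend themselves to a short sketch: signal-wise 0-1 scaling, ID-dependent loss scalars, and the schedule of weight updates every 250 time steps. This is a hedged illustration with hypothetical function names, not the authors’ code. In particular, the paper’s “linearly smaller the more frequent” is read here as weights inversely proportional to ID frequency, normalized so the mean per-message weight is 1; other readings are possible. In practice the scaling minimum and maximum would come from the training data and be reused on test data; here they are taken from the input for brevity.

```python
from collections import Counter
from typing import Dict, List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Signal-wise 0-1 scaling; a constant signal is mapped to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def id_loss_weights(id_sequence: Sequence[str]) -> Dict[str, float]:
    """Per-ID loss scalars, ASSUMED inversely proportional to how often each ID occurs.

    Normalized so that the average weight per message is 1; every ID then
    contributes the same total weight over the training data.
    """
    counts = Counter(id_sequence)
    total, n_ids = len(id_sequence), len(counts)
    return {can_id: total / (n_ids * c) for can_id, c in counts.items()}


def bptt_boundaries(seq_len: int = 5000, every: int = 250) -> List[range]:
    """Time-step windows; one backward pass and weight update after each window."""
    return [range(s, min(s + every, seq_len)) for s in range(0, seq_len, every)]


print(min_max_scale([2.0, 4.0, 6.0]))           # [0.0, 0.5, 1.0]
print(id_loss_weights(["A", "A", "A", "B"]))    # A occurs 3x as often, so it gets 1/3 of B's weight
print(len(bptt_boundaries()))                   # 20
```

With 5000-message sequences and an update every 250 steps, each sequence yields 20 weight updates; carrying the LSTM states across these windows while cutting the gradient at window boundaries is the usual way to implement such a schedule, but the paper does not spell this out.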

The training is performed on the training data set after removing a small portion of the data that is used to compute the thresholds for the anomaly score.

4.3. Comparison with Related Research

CANet is the first approach capable of handling the structure of CAN bus data with multiple CAN IDs simultaneously within a single neural network model. Therefore, a one-to-one comparison with an existing method is not possible. To compare CANet with a baseline, we adapted the following methods:

(1) Predictive Baseline: In (Taylor et al., 2016), the basic idea is to learn a separate model for each ID. At each time step, the model predicts the payload of the next occurrence of its associated ID. The network directly processes the bit representation of the payload. As a preprocessing step, the raw data is fed to a subnetwork of linear layers. The output is then processed by a combination of LSTM and linear layers that perform the prediction. The difference between the true value and the prediction is used for the anomaly score. We train one predictive model per ID. Since we have access to the signal representation of the payload, we omit the preprocessing subnetwork and feed the signal values directly into the LSTM layers, which are followed by a set of linear layers.

(2) Autoencoder Baseline: In (Weber et al., 2018b), an autoencoder model for a single signal is used. That is, at time step $t$ the network has to reconstruct an input vector consisting of the signal values in a sliding window over the time steps $(t-7,\dots,t)$. We use their network architecture to obtain one model for each signal.

Following our approach in Section 3.2, we compute signal-wise thresholds as the $99.99$th percentile of the quadratic errors on a validation set and combine the resulting indicators into a final anomaly score.
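The two baselines differ mainly in how their training examples are formed, which can be sketched as follows. These helpers are hypothetical illustrations of the input/target construction only (the networks themselves are omitted): `sliding_windows` builds the length-8 windows $(t-7,\dots,t)$ that the autoencoder baseline reconstructs, and `next_occurrence_pairs` pairs each payload of one ID with the next payload of that same ID, which is the prediction target of the predictive baseline.

```python
from typing import List, Sequence, Tuple

Payload = Tuple[float, ...]


def sliding_windows(values: Sequence[float], width: int = 8) -> List[Tuple[float, ...]]:
    """Windows (x_{t-7}, ..., x_t) for every t with a full history; input and target of the autoencoder."""
    return [tuple(values[t - width + 1 : t + 1]) for t in range(width - 1, len(values))]


def next_occurrence_pairs(messages: Sequence[Tuple[str, Payload]],
                          can_id: str) -> List[Tuple[Payload, Payload]]:
    """(current payload, next payload of the same ID) pairs for the predictive baseline."""
    payloads = [p for a, p in messages if a == can_id]
    return list(zip(payloads[:-1], payloads[1:]))


stream = [("A", (1.0,)), ("B", (5.0,)), ("A", (2.0,)), ("A", (3.0,))]
print(next_occurrence_pairs(stream, "A"))  # [((1.0,), (2.0,)), ((2.0,), (3.0,))]
```

Both constructions look at one ID or one signal at a time, which is precisely the restriction that prevents the baselines from exploiting dependencies between signals of different IDs.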

4.4. Evaluation

In this section, we describe the numerical findings for CANet and compare them with related research. For both the real and the synthetic data, we evaluate models of different sizes, i.e. $h_{scale}\in\{5,10,20,30\}$ (see Table 2). Note that, for comparability, the network architecture is not tuned further for the different $h_{scale}$ values. A summary of the numerical results can be found in Table 3. The accuracies, true positive rates (i.e. the rate of attacks that were successfully detected), and true negative rates (i.e. the rate of normal data that was found to be normal) presented in the table are computed pointwise. At the end of this section, we introduce an interval-based evaluation criterion that might be more meaningful for real applications.

We find that, on both the real and the synthetic data, the models reliably identify normal data, with an accuracy typically larger than $0.99$. The plateau, continuous change, and playback attacks are detected reliably. The plateau and playback attacks show particularly high detection rates, typically in the range $[0.85,0.955]$. The continuous change attack usually has a detection rate larger than $0.70$. This is a strong result, given that the evaluation is done pointwise; that is, we usually detect the vast majority of each attack interval (see Figure 2). Our approach outperforms the predictive and the autoencoder baselines by a significant margin. Typically, the baseline approaches only detect the first few attacked messages of an attack interval but classify the rest of the interval as normal (see Figure 3). As expected, the baseline approaches perform particularly poorly on the playback attack. This is because, if only a single signal of the CAN traffic is taken into account, deviations within a group of physically dependent signals cannot be exploited.

For comparison, we also evaluate our models on two other common attack types: suppress and flooding. These attacks can be detected by a rule-based approach in a straightforward way, e.g. by analyzing the frequency of each ID. Although our model does not have access to the time stamps and is therefore not specifically designed to find such attacks, it still detects flooding attacks with a high true positive rate, whereas it struggles to detect suppress attacks. This is expected, since the values injected at high frequency during a flooding attack are much easier to detect than the gradual change of the network state caused by a certain ID not being sent at all. When comparing with the baselines, we find that CANet is superior in all aspects. Nevertheless, both the predictive and the autoencoder baselines show relatively good results on the flooding attack. This is because normal data points are still processed in between the messages of the flooding attack, so that many anomalously large jumps in signal values occur during a single attack interval.
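The pointwise metrics reported in Table 3 follow directly from per-message labels and predictions. A minimal sketch, assuming hypothetical 0/1 label and prediction lists where 1 means “attacked” or “flagged”; `pointwise_rates` is an illustrative name. On attack-free test data there are no positives, so the accuracy equals the true negative rate and the true positive rate is undefined (returned as NaN here).

```python
from typing import Dict, Sequence


def pointwise_rates(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Pointwise accuracy, TPR (attacked points flagged) and TNR (normal points passed)."""
    if len(labels) != len(predictions):
        raise ValueError("length mismatch")
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return {
        "accuracy": (tp + tn) / len(labels),
        "tpr": tp / pos if pos else float("nan"),
        "tnr": tn / neg if neg else float("nan"),
    }


print(pointwise_rates([0, 0, 1, 1], [0, 1, 1, 0]))  # {'accuracy': 0.5, 'tpr': 0.5, 'tnr': 0.5}
```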

We find that the different choices for the parameter $h_{scale}$ have a relatively low effect on the performance of the models. Even small models with $h_{scale}=5$ perform reasonably well. This is especially interesting for a potential use of such models on an embedded device where memory and computational power are limited. Nevertheless, for the synthetic data set, we see a significant decrease in performance in the case $h_{scale}=30$ . We believe this is due to overfitting. Of course, more parameters of the network architecture could be changed, e.g. the size of the autoencoder bottleneck, to prevent overfitting for large $h_{scale}$ values.

When comparing the models on the real and synthetic data, we find that the performance is in a similar range in most cases. We believe that the synthetic data set is a good benchmark to test models for a CAN IDS, even if the data looks somewhat “cleaner” than in the real case.

Some of the synthetic data is visualized in Figure 2, which contains the plots of four different signals on the same time interval. In the signal of the upper right plot $B$ , a playback attack has been injected. The attack interval is visualized by the shade in all plots. We observe that for all signals the reconstruction on normal data is usually really accurate, while during the attack interval deviations between the true data and its reconstruction can be found. Note that this deviation also appears in signals that are not explicitly attacked but only correlated in some way with the attacked signal, whereas signals without any correlations stay unaffected. Furthermore, over an attack interval typically not the entire attack is detected as such. This is expected, because an attacked signal and its original counterpart may have similar values in some parts of the attack window. For example, in case of a continuous change attack the modified signal values lie in a realistic range at the beginning of the attack. As a consequence, the model detects an attack only after the deviation between the original and the modified signal exceeds a certain threshold.

We believe that in real applications, detecting a large number of attack intervals is more important than a high pointwise accuracy on attacks (i.e. a high true positive rate). We therefore redefine what it means for an attack to be found (see Figure 4): an attack is the entire period of time during which it is performed, i.e. an attack interval, and we compute the percentage of attack intervals that are detected. An interval is identified as anomalous if at least $P\%$ of its points are detected pointwise as anomalies. Figure 4 shows that, under this definition, CANet finds most anomaly intervals as long as $P\%$ is not too large. This holds for both the real and the synthetic data. In contrast, both baseline methods have very low detection rates of anomaly intervals even for small $P\%$ (see Figure 4).
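The interval-based criterion can be written down in a few lines. The following Python sketch is a minimal illustration, assuming pointwise boolean predictions and attack intervals given as non-empty half-open index ranges; the function name and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def interval_detection_rate(flags: Sequence[bool],
                            intervals: List[Tuple[int, int]],
                            p_percent: float) -> float:
    """Fraction of attack intervals in which at least p_percent % of points are flagged.

    Each interval is a half-open index range [start, stop) into flags.
    """
    if not intervals:
        return 0.0
    detected = 0
    for start, stop in intervals:
        if stop <= start:
            raise ValueError(f"empty interval [{start}, {stop})")
        window = flags[start:stop]
        share = 100.0 * sum(bool(f) for f in window) / len(window)
        if share >= p_percent:
            detected += 1
    return detected / len(intervals)


# Toy example: three attack intervals with 50 %, 10 % and 0 % of points flagged.
flags = [True] * 5 + [False] * 5 + [True] + [False] * 9 + [False] * 10
intervals = [(0, 10), (10, 20), (20, 30)]

for p in (5, 20, 60):
    print(p, interval_detection_rate(flags, intervals, p))
```

In this toy example the rate is 2/3 for $P=5$, 1/3 for $P=20$ and 0 for $P=60$. By construction the rate can only decrease as $P$ grows, since the criterion becomes stricter.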

Summing up the results, we find that CANet is capable of reliably detecting attacks on signals that have functional dependencies, while performing solidly on normal data. Our main findings are:

(1) The presented architecture is the first method that is capable of handling the difficult data structure from signals of multiple CAN IDs in a single model.

(2) CANet outperforms the selected baseline CAN IDS methods by a significant margin on all selected evaluation criteria.

(3) Our model has an excellent true negative rate and is capable of detecting unknown intrusions robustly.

4.5. Risks and Benefits of Neural Network Based IDS Models

The biggest advantage of machine learning based approaches, such as the one presented in this manuscript, is that they are potentially capable of detecting unknown intrusions. That is, they can succeed in a task in which most other methods fail. Classically, a defense mechanism must be designed for each possible attack scenario. This process is highly time-consuming and requires a significant amount of CAN bus domain expertise for successful detection. Here, neural networks significantly reduce both the development time and the required CAN domain knowledge.

On the other hand, the output of machine learning based methods can be difficult to analyze, which makes it hard to execute an automatic response once an intrusion is detected. Furthermore, neural networks require a large amount of training data and are typically more computationally expensive than many other approaches. Another potential risk of using neural networks in a security setting is that they can be sensitive to adversarial attacks (Szegedy et al., 2014; Tramèr et al., 2018; Strauss et al., 2017).

4.6. Reproducibility

Typically, CAN bus based intrusion detection methods are tested on real data. However, publishing real CAN traffic and the corresponding CAN matrix is usually not possible, since most car manufacturers consider it intellectual property. Hence, to the best of our knowledge, there is no standard data set for comparing methods. We try to close this gap by evaluating our model on both real and synthetic data, and we make the synthetic data publicly available at https://github.com/etas/SynCAN. We hope that this makes it easier for future researchers to compare their work against a baseline.

5. Conclusion

Cars are becoming increasingly connected, which opens new ways to attack the CAN bus of automobiles remotely. Since such attacks can have a major impact on traffic safety, it is desirable to detect them in a robust manner.

We present CANet, a novel neural network architecture that is trained in an unsupervised manner to detect intrusions and anomalies on the CAN bus. Furthermore, it is the first model in the literature capable of processing messages with different IDs simultaneously. The trained models have a high true negative rate, typically above $0.99$, which is necessary for real-world applications. Along with this high true negative rate, we are able to detect a large share of the unknown attacks on both real and synthetic data. Our method is the only one in the literature capable of reliably finding anomalies such as the replay attack. Although the results demonstrate the capabilities of the method, further steps might be necessary before applying it in real applications, for example tuning the network architecture or redefining the anomaly score.

For reproducibility of the method and in order to have a benchmark set for forthcoming approaches, our synthetic data is published at https://github.com/etas/SynCAN.

Acknowledgements.

We thank Jens Gramm, Michael Oechsle and the ETAS Machine Learning group for the useful discussions. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, views or statements, either expressed or implied, of the affiliated organizations of the authors.

Bibliography
Checkoway et al. (2011) Stephen Checkoway, Damon McCoy, Brian Kantor, Danny Anderson, Hovav Shacham, Stefan Savage, Karl Koscher, Alexei Czeskis, Franziska Roesner, Tadayoshi Kohno, et al. 2011. Comprehensive experimental analyses of automotive attack surfaces. In USENIX Security Symposium. San Francisco, 77–92.
Cho and Shin (2016) Kyong-Tak Cho and Kang G. Shin. 2016. Fingerprinting Electronic Control Units for Vehicle Intrusion Detection. In USENIX Security Symposium. 911–927.
Hinton et al. (2012) Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations (ICLR 2015).
Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.