# Anomaly-based Intrusion Detection in Industrial Data with SVM and Random Forests

**Authors:** Simon D. Duque Anton, Sapna Sinha, and Hans Dieter Schotten

arXiv: 1907.10374 · 2019-07-25

## TL;DR

This paper explores machine learning techniques, specifically SVM and Random Forest, for anomaly detection in industrial network data to identify cyber-attacks on critical infrastructure.

## Contribution

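The paper evaluates SVM and Random Forest classifiers on two industrial network data sets, Modbus-based gas pipeline control traffic and OPC UA-based batch processing traffic. It also addresses feature extraction, feature selection, and the handling of missing data.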
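Comparing two detectors on labelled traffic comes down to scoring each one's predictions against the ground truth. The sketch below is a minimal illustration of that step, not the authors' evaluation code: the function name `score_detector`, the label encoding (1 for attack, 0 for normal), and the toy labels are all assumptions made for the example, and this summary does not state which metrics the paper reports.

```python
from dataclasses import dataclass


@dataclass
class DetectionScores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_detector(y_true, y_pred, attack_label=1):
    """Score binary attack/normal predictions against ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == attack_label and p == attack_label)
    fp = sum(1 for t, p in pairs if t != attack_label and p == attack_label)
    fn = sum(1 for t, p in pairs if t == attack_label and p != attack_label)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a detector never raises an alarm
    # or the data contains no attacks.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return DetectionScores(accuracy, precision, recall, f1)


# Toy labels, invented for illustration only (not results from the paper).
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
pred_a = [0, 0, 1, 1, 0, 0, 1, 0]  # hypothetical detector A
pred_b = [0, 0, 0, 1, 1, 0, 0, 0]  # hypothetical detector B

scores_a = score_detector(y_true, pred_a)
scores_b = score_detector(y_true, pred_b)
```

Attack traffic is usually rare compared to normal operation, so accuracy alone can look high even for a detector that misses most attacks. That is why the sketch reports precision, recall, and F1 alongside it.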

## Key findings

- Random Forest slightly outperforms SVM in detection accuracy
- Both algorithms effectively identify anomalies in industrial network traffic
- Feature extraction and missing data handling are crucial for performance
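As a rough illustration of missing-data handling, the sketch below fills gaps in numeric features with the per-feature median and records which values were filled. This summary does not say which strategy the paper uses, so median imputation is an assumption chosen for the example. The function name `impute_with_median` and the feature names `pressure` and `setpoint` are likewise hypothetical.

```python
from statistics import median


def impute_with_median(rows, feature_names):
    """Fill missing (None or absent) values with the feature's median.

    Also adds a 0/1 '<name>_missing' indicator per feature, so a downstream
    classifier can still see that a value was absent.
    """
    medians = {}
    for name in feature_names:
        observed = [row[name] for row in rows if row.get(name) is not None]
        # Fall back to 0.0 if a feature was never observed (arbitrary choice).
        medians[name] = median(observed) if observed else 0.0

    completed = []
    for row in rows:
        new_row = {}
        for name in feature_names:
            value = row.get(name)
            new_row[name] = medians[name] if value is None else value
            new_row[name + "_missing"] = 1 if value is None else 0
        completed.append(new_row)
    return completed, medians


# Hypothetical records; the values are made up for illustration.
rows = [
    {"pressure": 10.0, "setpoint": 20.0},
    {"pressure": None, "setpoint": 20.0},
    {"pressure": 14.0, "setpoint": None},
    {"pressure": 12.0},  # 'setpoint' field absent entirely
]

completed, medians = impute_with_median(rows, ["pressure", "setpoint"])
```

In a real train/test setup the medians would be computed on the training split only and then reused for the test split, so that no information leaks from the evaluation data.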

## Abstract

Attacks on industrial enterprises are increasing in number as well as in effect. Since the introduction of industrial control systems in the 1970s, industrial networks have been the target of malicious actors. More recently, the political and warfare aspects of attacks on industrial and critical infrastructure are becoming more relevant. In contrast to classic home and office IT systems, industrial IT, so-called OT systems, have an effect on the physical world. Furthermore, industrial devices have long operation times, sometimes several decades. Updates and fixes are tedious and often not possible. The threats to industry, combined with the legacy requirements of industrial environments, create the need for efficient intrusion detection that can be integrated into existing systems. In this work, the network data containing industrial operation is analysed with machine learning- and time series-based anomaly detection algorithms in order to discover the attacks introduced to the data. Two different data sets are used, one Modbus-based gas pipeline control traffic and one OPC UA-based batch processing traffic. In order to detect attacks, two machine learning-based algorithms are used, namely SVM and Random Forest. Both perform well, with Random Forest slightly outperforming SVM. Furthermore, extracting and selecting features as well as handling missing data are addressed in this work.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10374/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.10374/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1907.10374/full.md

---
Source: https://tomesphere.com/paper/1907.10374