# Comparison of System Call Representations for Intrusion Detection

**Authors:** Sarah Wunderlich, Markus Ring, Dieter Landes, Andreas Hotho

arXiv: 1904.07118 · 2019-05-29

## TL;DR

This paper compares four ways of preprocessing system call sequences for neural network-based intrusion detection, based on one-hot encoding, word2vec, GloVe, and a mapping of system calls to kernel modules. All four are usable, but replacing system calls with kernel modules alone loses too much information.

## Contribution

The paper evaluates four options for turning system call sequences into neural network input, using the classification of sequences as benign or malicious as the test scenario, and checks whether each preprocessing step discards relevant information.

## Key findings

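- All four preprocessing methods are valid options for preparing system call input data.
- Using kernel modules alone, as a replacement for the system calls themselves, is not recommended because the mapping loses too much information.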
- Embedding-based methods perform well in classification tasks.
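
The information loss from kernel module mapping can be illustrated with a minimal sketch. The mapping table below is hypothetical and chosen only for illustration (the paper's actual mapping is not reproduced in this summary), and the helper names `to_modules` and `enhance` are assumptions. It contrasts the two uses named in the abstract: (a) replacing system calls with their modules and (b) enhancing each system call with its module.

```python
# Hypothetical mapping from system call names to kernel modules.
# Illustrative only; not the mapping used in the paper.
SYSCALL_TO_MODULE = {
    "open": "fs", "read": "fs", "write": "fs", "close": "fs",
    "mmap": "mm", "brk": "mm",
    "socket": "net", "connect": "net",
}


def to_modules(seq):
    """Option (a): replace every system call with its kernel module."""
    return [SYSCALL_TO_MODULE[call] for call in seq]


def enhance(seq):
    """Option (b): keep the system call and attach its module as context."""
    return [(call, SYSCALL_TO_MODULE[call]) for call in seq]


seq_a = ["open", "read", "close"]
seq_b = ["open", "write", "close"]

# After replacement, two different behaviors become indistinguishable.
collapsed = to_modules(seq_a) == to_modules(seq_b)

# With enhancement, the original calls are still present, so they differ.
still_distinct = enhance(seq_a) != enhance(seq_b)
```

In this toy case a read and a write end up in the same module, so a classifier that only sees modules cannot tell the two sequences apart, whereas the enhanced representation keeps them distinct.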

## Abstract

Over the years, artificial neural networks have been applied successfully in many areas including IT security. Yet, neural networks can only process continuous input data. This is particularly challenging for security-related non-continuous data like system calls. This work focuses on four different options to preprocess sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and learning word2vec or GloVe representations of system calls. As an additional option, we analyze if the mapping of system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enhancing system call data with additional information regarding their context. However, when performing such preprocessing steps it is important to ensure that no relevant information is lost during the process. The overall objective of system call based intrusion detection is to categorize sequences of system calls as benign or malicious behavior. Therefore, this scenario is used to evaluate the different input options as a classification task. The results show that each of the four different methods is a valid option when preprocessing input data, but the use of kernel modules only is not recommended because too much information is being lost during the mapping process.
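
As a rough illustration of the simplest input option mentioned above, the following sketch one-hot encodes a system call sequence in plain Python. The function names `build_vocab` and `one_hot` and the example traces are assumptions made for this illustration, not the authors' code.

```python
def build_vocab(sequences):
    """Assign each distinct system call an index in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for call in seq:
            if call not in vocab:
                vocab[call] = len(vocab)
    return vocab


def one_hot(seq, vocab):
    """Turn a sequence of system call names into a list of one-hot vectors."""
    vectors = []
    for call in seq:
        vec = [0.0] * len(vocab)
        vec[vocab[call]] = 1.0  # exactly one position is set per call
        vectors.append(vec)
    return vectors


# Hypothetical traces, used only to build a small vocabulary.
traces = [["open", "read", "close"], ["open", "write"]]
vocab = build_vocab(traces)
encoded = one_hot(["read", "write"], vocab)
```

Each vector has as many dimensions as there are distinct system calls and carries no notion of similarity between calls. Learned representations such as word2vec or GloVe instead assign each call a dense vector; training such embeddings is not shown here.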

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.07118/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.07118/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1904.07118/full.md

---
Source: https://tomesphere.com/paper/1904.07118