# Predicting Malicious Insider Threat Scenarios Using Organizational Data and a Heterogeneous Stack-Classifier

**Authors:** Adam James Hall, Nikolaos Pitropakis, William J Buchanan, Naghmeh Moradpoor

arXiv: 1907.10272 · 2019-07-25

## TL;DR

This paper develops a machine learning ensemble approach to predict a specific malicious insider threat scenario, the uploading of sensitive data before leaving an organization. The resulting meta-classifier achieves 96.2% accuracy and an area under the ROC curve of 0.988 on the CERT r4.2 dataset.

## Contribution

It introduces a methodology for processing organizational log data into daily summaries and combines multiple classifiers into a meta-classifier for improved threat prediction.
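As a rough illustration of what collapsing log data into daily summaries can look like, the sketch below aggregates raw events into one feature row per user per day. The event tuples, the event type names, the `after_hours_count` feature, and the working-hours window are all assumptions made for this example; they are not the CERT r4.2 schema or the paper's actual feature set.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (ISO timestamp, user id, event type).
# Field layout and event types are illustrative, not the CERT r4.2 schema.
events = [
    ("2010-01-04T08:01:00", "u1", "logon"),
    ("2010-01-04T09:15:00", "u1", "http"),
    ("2010-01-04T09:20:00", "u1", "http"),
    ("2010-01-04T23:40:00", "u1", "usb"),
    ("2010-01-05T08:05:00", "u1", "logon"),
    ("2010-01-04T08:30:00", "u2", "logon"),
]


def daily_summaries(events, work_start=8, work_end=18):
    """Collapse raw events into one feature row per (user, day)."""
    rows = defaultdict(lambda: defaultdict(int))
    for ts, user, kind in events:
        t = datetime.fromisoformat(ts)
        row = rows[(user, t.date().isoformat())]
        # Count each event type seen for this user on this day.
        row[kind + "_count"] += 1
        # Assumed extra feature: activity outside the working-hours window.
        if not (work_start <= t.hour < work_end):
            row["after_hours_count"] += 1
    return {key: dict(row) for key, row in rows.items()}


summaries = daily_summaries(events)
```

Each value in `summaries` is a plain dictionary of counts keyed by `(user, date)`, which can then be turned into a fixed-length feature vector for a classifier.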

## Key findings

- Meta-classifier achieves 96.2% accuracy
- ROC area under curve is 0.988
- Ensemble approach outperforms individual models
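For readers unfamiliar with the two headline metrics, the following minimal sketch computes a confusion matrix, accuracy, and the area under the ROC curve from scratch. The labels and scores are made-up toy values, not results from the paper, and the AUC is computed with the standard pairwise-ranking formulation (the fraction of positive/negative pairs where the positive example receives the higher score).

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = malicious day) and classifier scores, for illustration only.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
scores = [0.1, 0.3, 0.2, 0.9, 0.6, 0.7, 0.8, 0.4]
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen decision threshold (0.5 here, an arbitrary choice), whereas the AUC summarises ranking quality across all thresholds, which is why the two are typically reported together.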

## Abstract

Insider threats continue to present a major challenge for the information security community. Despite constant research in this area, a substantial gap still exists between the requirements of this community and the solutions that are currently available. This paper uses the CERT dataset r4.2 along with a series of machine learning classifiers to predict the occurrence of a particular malicious insider threat scenario: the uploading of sensitive information to WikiLeaks before leaving the organization. These algorithms are aggregated into a meta-classifier which has stronger predictive performance than its constituent models. The paper also defines a methodology for pre-processing organizational log data into daily user summaries for classification, and this pre-processed data is used to train multiple classifiers. Boosting is also applied to optimise classifier accuracy. Overall, the models are evaluated through analysis of their associated confusion matrices and Receiver Operating Characteristic (ROC) curves, and the best-performing classifiers are aggregated into an ensemble classifier. This meta-classifier has an accuracy of **96.2%** with an area under the ROC curve of **0.988**.
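The idea of aggregating classifiers into a meta-classifier can be sketched in a few lines. The example below is a deliberately simplified holdout-stacking (blending) scheme, not the paper's pipeline: the base learners are single-feature decision stumps, the meta-learner is a lookup table over their combined outputs, and the two features (after-hours activity count, upload count) and all data values are hypothetical. This summary does not specify the paper's actual base classifiers, boosting step, or meta-learner.

```python
from collections import Counter, defaultdict


def fit_stump(X, y, j):
    """Base learner: threshold on feature j that maximises training accuracy."""
    best_thr, best_acc = None, -1.0
    for thr in sorted({x[j] for x in X}):
        acc = sum((x[j] >= thr) == bool(t) for x, t in zip(X, y)) / len(y)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return lambda x, thr=best_thr: int(x[j] >= thr)


def fit_stack(X, y, n_features):
    """Fit stumps on the first half, then a meta-learner on the second half."""
    half = len(X) // 2
    base = [fit_stump(X[:half], y[:half], j) for j in range(n_features)]
    # Meta-learner: majority label for each tuple of base predictions,
    # learned on data the base learners did not see.
    votes = defaultdict(Counter)
    for x, t in zip(X[half:], y[half:]):
        votes[tuple(clf(x) for clf in base)][t] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}

    def predict(x):
        key = tuple(clf(x) for clf in base)
        if key in table:
            return table[key]
        # Unseen combination: fall back to a plain majority vote.
        return int(sum(key) * 2 >= len(key))

    return predict


# Hypothetical daily summaries: [after_hours_count, upload_count].
# A day is labelled malicious (1) only when both counts are high.
X = [[0, 0], [5, 0], [0, 3], [5, 3],   # used to fit the base stumps
     [1, 0], [6, 0], [0, 4], [6, 4]]   # used to fit the meta-learner
y = [0, 0, 0, 1,
     0, 0, 0, 1]

predict = fit_stack(X, y, n_features=2)
```

On this toy data each stump alone flags a day such as `[7, 0]` or `[0, 5]` incorrectly, while the stacked predictor learns that both signals must fire together. That is the general intuition behind a meta-classifier outperforming its constituent models.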

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10272/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10272/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1907.10272/full.md

---
Source: https://tomesphere.com/paper/1907.10272