# Open Set Intrusion Recognition for Fine-Grained Attack Categorization

**Authors:** Steve Cruz, Cora Coleman, Ethan M. Rudd, and Terrance E. Boult

arXiv: 1703.02244 · 2017-03-08

## TL;DR

This paper evaluates open set intrusion recognition methods on the KDDCUP'99 dataset, highlighting the effectiveness of W-SVMs in identifying novel attack types in a fine-grained, realistic setting.

## Contribution

The paper introduces a fine-grained open set protocol for intrusion recognition on KDDCUP'99 and compares Gaussian RBF kernel SVMs with W-SVMs, showing that the W-SVM performs better at recognizing attack types unseen during training.
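To make the fine-grained open set idea concrete, here is a minimal sketch of how such a split could be built. It is an illustration under stated assumptions, not the authors' protocol: the function `open_set_split`, the `UNKNOWN` label, and the toy records are all hypothetical, and the paper's actual choice of known and unknown classes is not reproduced here.

```python
UNKNOWN = "unknown"  # hypothetical label for classes unseen in training


def open_set_split(train, test, known_labels):
    """Keep only known classes for training; relabel unseen test classes.

    `train` and `test` are lists of (features, fine_grained_label) pairs.
    """
    known = set(known_labels)
    # Training data contains only the classes declared as known.
    train_known = [(x, y) for x, y in train if y in known]
    # At query time, any class outside the known set counts as "unknown".
    test_open = [(x, y if y in known else UNKNOWN) for x, y in test]
    return train_known, test_open


# Toy records with invented feature values and fine-grained labels.
train = [([0.1, 0.0], "normal"), ([0.9, 0.2], "smurf"), ([0.8, 0.7], "sendmail")]
test = [([0.2, 0.1], "normal"), ([0.7, 0.9], "snmpguess"), ([0.9, 0.1], "smurf")]

train_known, test_open = open_set_split(train, test, ["normal", "smurf"])
train_labels = [y for _, y in train_known]
test_labels = [y for _, y in test_open]
print(train_labels)
print(test_labels)
```

In this sketch the labels are individual attack names rather than broad categories such as "DoS" or "R2L", which is what makes the split fine-grained; a classifier trained on `train_known` never sees the held-out attack types.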

## Key findings

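- W-SVMs outperform Gaussian RBF kernel SVMs under the open set regime, particularly as the cost of misclassifying unknown classes increases.
- Recognition of individual intrusion types is feasible with the proposed protocol.
- The paper reports the performance tradeoff with respect to the cost of unknowns and discusses its ramifications in an operational setting.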

## Abstract

Confidently distinguishing a malicious intrusion over a network is an important challenge. Most intrusion detection system evaluations have been performed in a closed set protocol in which only classes seen during training are considered during classification. Thus far, there has been no realistic application in which novel types of behaviors unseen at training -- unknown classes as it were -- must be recognized for manual categorization. This paper comparatively evaluates malware classification using both closed set and open set protocols for intrusion recognition on the KDDCUP'99 dataset. In contrast to much of the previous work, we employ a fine-grained recognition protocol, in which the dataset is loosely open set -- i.e., recognizing individual intrusion types -- e.g., "sendmail", "snmp guess", etc., rather than more general attack categories (e.g., "DoS", "Probe", "R2L", "U2R", "Normal"). We also employ two different classifier types -- Gaussian RBF kernel SVMs, which are not theoretically guaranteed to bound open space risk, and W-SVMs, which are theoretically guaranteed to bound open space risk. We find that the W-SVM offers superior performance under the open set regime, particularly as the cost of misclassifying unknown classes at query time (i.e., classes not present in the training set) increases. Results of performance tradeoff with respect to cost of unknown as well as discussion of the ramifications of these findings in an operational setting are presented.
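The role of the "cost of unknown" can be illustrated with a small sketch. Everything here is an assumption made for illustration: the score-thresholding rule stands in for a classifier that can reject inputs (it is not the W-SVM), and `weighted_error` is a hypothetical cost-weighted error rate, not the metric reported in the paper. The scores are invented.

```python
UNKNOWN = "unknown"  # hypothetical label for classes unseen in training


def predict_with_rejection(scores, threshold):
    """Return the top-scoring class, or UNKNOWN if its score is too low."""
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else UNKNOWN


def weighted_error(y_true, y_pred, unknown_cost):
    """Error rate where each truly unknown sample carries weight unknown_cost."""
    total = 0.0
    wrong = 0.0
    for t, p in zip(y_true, y_pred):
        weight = unknown_cost if t == UNKNOWN else 1.0
        total += weight
        if t != p:
            wrong += weight
    return wrong / total


# Invented per-class scores for four query samples and their true labels.
queries = [
    {"normal": 0.90, "smurf": 0.10},
    {"normal": 0.20, "smurf": 0.80},
    {"normal": 0.55, "smurf": 0.45},  # truly an unseen attack type
    {"normal": 0.58, "smurf": 0.42},
]
y_true = ["normal", "smurf", UNKNOWN, "normal"]

for threshold in (0.5, 0.6):
    y_pred = [predict_with_rejection(s, threshold) for s in queries]
    for cost in (1.0, 5.0):
        print(threshold, cost, weighted_error(y_true, y_pred, cost))
```

With the lower threshold the unseen attack is absorbed into a known class, so the weighted error grows as the unknown cost rises; with the higher threshold the unseen attack is rejected at the price of one rejected known sample, and the weighted error shrinks instead. This is the kind of tradeoff the abstract refers to, shown here only on toy numbers.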

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.02244/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1703.02244/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/1703.02244/full.md

---
Source: https://tomesphere.com/paper/1703.02244