Intrusion Detection: Machine Learning Baseline Calculations for Image Classification
Erik Larsen, Korey MacVittie, John Lilly

TL;DR
This paper explores machine learning techniques for malware image classification. It finds that simpler models outperform convolutional networks and that most models stay below 80% accuracy, which indicates the need for more advanced methods.
Contribution
It provides a baseline comparison of various machine learning models for malware image classification, highlighting the limitations of convolutional networks in this context.
Findings
Light Gradient Boosting Machine, Random Forest, and Extra Trees are the most promising models
Convolutional networks are outperformed by a simple, fully connected architecture
Most models fail to exceed 80% categorical accuracy
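The findings are stated in terms of categorical accuracy, and the abstract also reports low F1 scores. The two metrics can diverge on multi-class data, so it may help to see how each is computed. The sketch below is a plain-Python illustration under the assumption that F1 is macro-averaged (the unweighted mean of per-class F1). The paper summary does not say which averaging the authors used, and the function names are hypothetical.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme).

    Per-class F1 is computed as 2*TP / (2*TP + FP + FN) for every label
    that appears in either the true or the predicted labels.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy three-class example (made-up labels, not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(categorical_accuracy(y_true, y_pred), 4))  # 0.6667
print(round(macro_f1(y_true, y_pred), 4))              # 0.6556
```

In this toy case, accuracy is 4/6 while macro F1 is lower. The errors concentrate on classes 0 and 2, and macro averaging weights every class equally regardless of its size.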
Abstract
Cyber security can be enhanced through the application of machine learning by recasting network attack data into an image format, then applying supervised computer vision and other machine learning techniques to detect malicious specimens. Exploratory data analysis reveals little correlation and few distinguishing characteristics between the ten classes of malware used in this study. A general model comparison demonstrates that the most promising candidates for consideration are Light Gradient Boosting Machine, Random Forest Classifier, and Extra Trees Classifier. Convolutional networks fail to deliver their usual strong classification performance and are surpassed by a simple, fully connected architecture. Most tests fail to break 80% categorical accuracy and present low F1 scores, indicating more sophisticated approaches (e.g., bootstrapping, random samples, and feature selection) may be…
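The abstract describes recasting attack data into an image format but does not specify the mapping. One common approach in malware imaging reads raw bytes as 8-bit grayscale pixels laid out row by row at a fixed width. The sketch below assumes that scheme, and it is not confirmed as the authors' pipeline. The width of 16 and the helper names `bytes_to_grayscale` and `flatten` are hypothetical. The flattened form is shown because tree-based models such as the ones compared in the paper typically consume a flat feature vector, whereas a convolutional network would take the 2D grid.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay raw bytes out row by row as 8-bit grayscale intensities (assumed scheme).

    Each byte (0-255) becomes one pixel; the last row is zero-padded so
    every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))  # zero-pad the tail
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def flatten(image: list[list[int]]) -> list[int]:
    """Row-major flattening into a single feature vector for non-CNN models."""
    return [px for row in image for px in row]


# Toy input: 21 bytes repeating 0x00, 0x10, 0xFF (illustrative only).
img = bytes_to_grayscale(b"\x00\x10\xff" * 7, width=8)
print(len(img), len(img[0]))  # 3 8
print(img[2])                 # [16, 255, 0, 16, 255, 0, 0, 0]
print(len(flatten(img)))      # 24
```

The fixed width is a design choice. Any width works for a flat vector, but it changes which bytes end up as vertical neighbours, which is the spatial structure a convolutional network would try to exploit.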
Taxonomy
Topics: Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
