Classification and Online Clustering of Zero-Day Malware
Olha Jurečková, Martin Jureček, Mark Stamp, Fabio Di Troia, Róbert Lórencz

TL;DR
This paper presents an online method for classifying known malware and clustering new malware families using machine learning and self-organizing maps, demonstrating high accuracy and promising clustering purity.
Contribution
It introduces an integrated approach combining multilayer perceptron classification with self-organizing maps for real-time malware family identification and clustering of zero-day malware.
Findings
97.21% of streaming malware data classified, with a balanced accuracy of 95.33%
Clustering purity up to 77.68% for ten clusters
Effective differentiation between known and new malware families
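The clustering purity figure above can be read as the fraction of samples that belong to the majority family of their assigned cluster. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `purity` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def purity(cluster_ids, family_labels):
    """Standard cluster purity: for each cluster, count the samples of its
    most frequent family, sum those counts, and divide by the sample total.

    cluster_ids and family_labels are parallel sequences (hypothetical inputs).
    """
    if len(cluster_ids) != len(family_labels):
        raise ValueError("cluster_ids and family_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty sample set")

    # Tally family labels separately inside each cluster.
    per_cluster = {}
    for cluster, family in zip(cluster_ids, family_labels):
        per_cluster.setdefault(cluster, Counter())[family] += 1

    # Sum the size of the majority family in every cluster.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in per_cluster.values()
    )
    return majority_total / len(cluster_ids)


# Toy example: cluster 0 holds two "a" and one "b", cluster 1 holds two "c".
example = purity([0, 0, 0, 1, 1], ["a", "a", "b", "c", "c"])
```

Purity tends to rise as the number of clusters grows, so a value is only meaningful together with the cluster count it was measured at (ten clusters in the finding above).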
Abstract
A large amount of new malware is constantly being generated, and it must not only be distinguished from benign samples but also classified into malware families. For this purpose, it is necessary to investigate how existing malware families evolve and to examine emerging families. This paper focuses on the online processing of incoming malicious samples to assign them to existing families or, in the case of samples from new families, to cluster them. We experimented with seven prevalent malware families from the EMBER dataset: four in the training set and three additional new families in the test set. Based on the classification score of the multilayer perceptron, we determined which samples would be classified and which would be clustered into new malware families. We classified 97.21% of streaming data with a balanced accuracy of 95.33%. Then, we clustered the remaining data…
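The routing step described in the abstract, where the classifier's score decides whether a sample is assigned to a known family or set aside for clustering, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes the score is the largest class probability produced by the multilayer perceptron, and the function name `route_sample` and the threshold value 0.9 are hypothetical.

```python
def route_sample(class_probs, threshold=0.9):
    """Decide what to do with one incoming sample.

    class_probs: per-family probabilities from a classifier (assumed to be
                 the MLP's output scores; any sequence of floats works).
    threshold:   hypothetical confidence cutoff, not a value from the paper.

    Returns ("classify", family_index) when the top score reaches the
    threshold, otherwise ("cluster", None) so the sample can be handed
    to a clustering stage.
    """
    if len(class_probs) == 0:
        raise ValueError("class_probs must contain at least one score")

    # Index of the highest-scoring known family.
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])

    if class_probs[best] >= threshold:
        return ("classify", best)
    return ("cluster", None)


# A confident prediction is classified; an ambiguous one is sent to clustering.
confident = route_sample([0.02, 0.95, 0.02, 0.01])
ambiguous = route_sample([0.40, 0.30, 0.20, 0.10])
```

Raising the cutoff sends more samples to clustering and lowers the share that is classified directly, so the threshold trades coverage against the accuracy of the classified portion.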
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
