Online Clustering of Known and Emerging Malware Families
Olha Jurečková, Martin Jureček, Mark Stamp

TL;DR
This paper presents a machine learning model for online clustering of malware samples, enabling rapid identification of known and emerging malware families with high cluster purity, based on static analysis of Windows executables.
Contribution
It introduces a novel online clustering approach combining weighted k-NN and online k-means for malware family classification, improving response time and cluster purity.
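To make the online k-means half of this combination concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `OnlineKMeans`, the `new_cluster_threshold` parameter, and the distance-threshold rule for opening a new cluster are all assumptions made for illustration, since this summary does not spell out the paper's clustering decision rule. The weighted k-NN component is omitted. The centroid update is the standard sequential (running-mean) k-means step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineKMeans:
    """Sequential k-means with a hypothetical novelty threshold.

    Assumption: a sample farther than `new_cluster_threshold` from every
    centroid is treated as the seed of a new (emerging) cluster. The paper's
    actual decision rule may differ.
    """

    def __init__(self, centroids, new_cluster_threshold):
        self.centroids = [list(c) for c in centroids]
        # Each initial centroid is counted as one observed sample.
        self.counts = [1] * len(self.centroids)
        self.threshold = new_cluster_threshold

    def partial_fit(self, x):
        """Assign one streamed sample and return its cluster index."""
        distances = [euclidean(x, c) for c in self.centroids]
        nearest = min(range(len(distances)), key=distances.__getitem__)

        if distances[nearest] > self.threshold:
            # Too far from every known cluster: open a new one.
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Running-mean update: move the centroid toward x by 1/n.
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]
        self.centroids[nearest] = [
            c + rate * (xi - c) for c, xi in zip(self.centroids[nearest], x)
        ]
        return nearest
```

With two starting centroids at `[0, 0]` and `[10, 10]` and a threshold of `3.0`, streaming `[1, 1]` updates the first centroid to `[0.5, 0.5]`, while streaming `[20, 20]` opens a third cluster.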
Findings
Cluster purity ranges from 90.20% to 93.34%.
Effective separation of known and emerging malware families.
Speeds up the malware analysis process.
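For readers unfamiliar with the metric, cluster purity is commonly computed by counting, for each cluster, the samples that belong to its most frequent true class, summing those counts, and dividing by the total number of samples. The sketch below shows that standard definition. The function name `cluster_purity` and the toy family labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, family_labels):
    """Return purity: the fraction of samples in their cluster's majority family."""
    if len(cluster_ids) != len(family_labels):
        raise ValueError("cluster_ids and family_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each family appears inside each cluster.
    members = defaultdict(Counter)
    for cluster, family in zip(cluster_ids, family_labels):
        members[cluster][family] += 1

    # Sum the size of the majority family in every cluster.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(cluster_ids)


# Toy example with hypothetical labels: one sample sits in the wrong cluster.
clusters = [0, 0, 0, 1, 1]
families = ["famA", "famA", "famB", "famB", "famB"]
print(cluster_purity(clusters, families))  # 0.8
```

Purity rewards clusters dominated by a single family, but it also rises when there are many small clusters, so it is usually read alongside the number of clusters produced.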
Abstract
Malware attacks have become significantly more frequent and sophisticated in recent years. Therefore, malware detection and classification are critical components of information security. Due to the large number of malware samples available, it is essential to categorize malware samples according to their malicious characteristics. Clustering algorithms are thus becoming more widely used in computer security to analyze the behavior of malware variants and discover new malware families. Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats. This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families. Streaming data is divided according to the clustering decision rule into samples from known and new emerging malware families. The streaming data is classified…
Taxonomy
Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Network Security and Intrusion Detection
