Cluster Analysis and Concept Drift Detection in Malware
Aniket Mishra, Mark Stamp

TL;DR
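This paper presents a clustering-based method that uses MiniBatch K-Means and the silhouette coefficient to detect concept drift in malware data. Retraining only when drift is detected is more accurate than a static model and nearly as accurate as periodic retraining, while being more efficient.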
Contribution
The paper introduces a novel clustering approach that uses silhouette analysis to detect concept drift automatically in malware classification tasks.
Findings
Drift-aware retraining outperforms static models in accuracy.
The proposed method is nearly as accurate as periodic retraining.
The approach is more efficient than periodic retraining.
Abstract
Concept drift refers to gradual or sudden changes in the properties of data that affect the accuracy of machine learning models. In this paper, we address the problem of concept drift detection in the malware domain. Specifically, we propose and analyze a clustering-based approach to detecting concept drift. Using a subset of the KronoDroid dataset, malware samples are partitioned into temporal batches and analyzed using MiniBatch K-Means clustering. The silhouette coefficient is used as a metric to identify points in time where concept drift has likely occurred. To verify our drift detection results, we train learning models under three realistic scenarios, which we refer to as static training, periodic retraining, and drift-aware retraining. In each scenario, we consider four supervised classifiers, namely, Multilayer Perceptron (MLP), Support Vector Machine (SVM), Random Forest,…
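The batch-wise clustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic two-dimensional data in place of KronoDroid features, a clustering model fitted on the first batch only, and a hypothetical rule that flags a batch when its silhouette coefficient falls more than a fixed tolerance below the first batch's score. The function names, the number of clusters, and the tolerance are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def silhouette_per_batch(batches, n_clusters=2, seed=0):
    """Fit MiniBatch K-Means on the first batch, then score every batch.

    Assumption: the reference clustering comes from batch 0 only.
    """
    model = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    model.fit(batches[0])
    scores = []
    for X in batches:
        labels = model.predict(X)
        # silhouette_score requires at least two distinct labels.
        if len(np.unique(labels)) < 2:
            scores.append(float("nan"))
        else:
            scores.append(float(silhouette_score(X, labels)))
    return scores


def flag_drift(scores, tolerance=0.2):
    """Hypothetical rule: flag batches scoring more than `tolerance`
    below the first batch (or with an undefined score)."""
    baseline = scores[0]
    return [i for i, s in enumerate(scores)
            if np.isnan(s) or baseline - s > tolerance]


# Synthetic stand-in for temporal batches (not real malware features).
rng = np.random.default_rng(0)


def two_blobs(n=100):
    # Two well-separated clusters, mimicking a stable data distribution.
    a = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n, 2))
    b = rng.normal(loc=(10.0, 10.0), scale=1.0, size=(n, 2))
    return np.vstack([a, b])


def merged_blob(n=200):
    # One diffuse cluster: the earlier cluster structure has dissolved.
    return rng.normal(loc=(5.0, 5.0), scale=3.0, size=(n, 2))


batches = [two_blobs(), two_blobs(), two_blobs(), merged_blob()]
scores = silhouette_per_batch(batches)
drifted = flag_drift(scores)
print("silhouette per batch:", [round(s, 3) for s in scores])
print("batches flagged as drift:", drifted)
```

In a drift-aware retraining setup of the kind the abstract describes, the flagged batch indices would mark the points in time at which a classifier is retrained, rather than retraining at every fixed interval.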
Taxonomy
Topics: Network Security and Intrusion Detection · Data Stream Mining Techniques · Advanced Malware Detection Techniques
