Cluster Analysis and Concept Drift Detection in Malware

Aniket Mishra; Mark Stamp

arXiv:2502.14135 · cs.LG · March 17, 2026


TL;DR

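This paper presents a clustering-based method that uses MiniBatch K-Means and the silhouette coefficient to detect concept drift in malware data. Retraining classifiers at the detected drift points is more accurate than keeping a static model and more efficient than retraining periodically.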

Contribution

The paper introduces a clustering approach, based on silhouette analysis, for automated concept drift detection in malware classification tasks.

Findings

1. Drift-aware retraining outperforms static models in accuracy.
2. The proposed method is nearly as accurate as periodic retraining.
3. The approach is more efficient than periodic retraining.
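The efficiency comparison in these findings comes down to how often a classifier is retrained. The sketch below is one plausible reading of the three scenarios, not the authors' code: `retrain_points`, `model_in_use`, the `period` parameter, and the example drift point are all hypothetical names and values chosen for illustration.

```python
def retrain_points(scenario, n_batches, drift_points=(), period=1):
    """Return the batch indices at which the classifier is (re)trained.

    Assumed semantics (not taken from the paper):
      static      -> train once, on the first batch
      periodic    -> retrain every `period` batches
      drift_aware -> train on the first batch, then only at detected drift
    """
    if scenario == "static":
        return [0]
    if scenario == "periodic":
        return list(range(0, n_batches, period))
    if scenario == "drift_aware":
        return sorted({0} | {d for d in drift_points if 0 < d < n_batches})
    raise ValueError(f"unknown scenario: {scenario!r}")


def model_in_use(t, points):
    """Index of the most recent training batch at or before batch t."""
    return max(p for p in points if p <= t)


n = 6  # hypothetical number of temporal batches
for scenario in ("static", "periodic", "drift_aware"):
    pts = retrain_points(scenario, n, drift_points=[3], period=1)
    # Show how many trainings each scenario costs and which model serves each batch.
    print(scenario, len(pts), [model_in_use(t, pts) for t in range(n)])
```

Under these assumptions, periodic retraining costs one training run per period, while drift-aware retraining costs one run per detected drift point plus the initial one. That difference is the sense in which the third finding calls the approach more efficient.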

Abstract

Concept drift refers to gradual or sudden changes in the properties of data that affect the accuracy of machine learning models. In this paper, we address the problem of concept drift detection in the malware domain. Specifically, we propose and analyze a clustering-based approach to detecting concept drift. Using a subset of the KronoDroid dataset, malware samples are partitioned into temporal batches and analyzed using MiniBatch $K$-Means clustering. The silhouette coefficient is used as a metric to identify points in time where concept drift has likely occurred. To verify our drift detection results, we train learning models under three realistic scenarios, which we refer to as static training, periodic retraining, and drift-aware retraining. In each scenario, we consider four supervised classifiers, namely, Multilayer Perceptron (MLP), Support Vector Machine (SVM), Random Forest,…


Taxonomy

Topics: Network Security and Intrusion Detection · Data Stream Mining Techniques · Advanced Malware Detection Techniques