Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection
Ahmed Sabbah, Radi Jarrar, Samer Zein, David Mohaisen

TL;DR
This paper investigates how concept drift affects Android malware detection models, revealing its widespread impact and the limited effectiveness of current mitigation strategies, including large language models, across various feature types and data environments.
Contribution
It provides an empirical evaluation of concept drift across multiple datasets, algorithms, and feature types, highlighting the challenges and limitations of existing mitigation approaches.
Findings
Concept drift significantly impacts model performance.
Feature types and data environments influence drift effects.
Default algorithms and LLMs do not fully mitigate drift.
Abstract
Despite outstanding results, machine learning-based Android malware detection models struggle with concept drift, where rapidly evolving malware characteristics degrade model effectiveness. This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types (static, dynamic, hybrid, semantic, and image-based) were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and…
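The degradation the abstract describes is commonly measured with a time-aware split: train on samples from an early period, then score the frozen model on each later period. The sketch below illustrates that idea only and is not the authors' pipeline. The toy data, the single numeric feature, the midpoint-threshold classifier, and all function names (`make_toy_samples`, `fit_threshold`, `evaluate_over_time`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """F1 for the positive class (malware = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def make_toy_samples(n_periods=4):
    """Hypothetical data as (period, feature, label) tuples.

    Benign apps keep the same feature range in every period, while the
    malware feature shifts toward the benign range over time. This is an
    assumed, simplified stand-in for evolving malware.
    """
    samples = []
    for period in range(n_periods):
        for k in range(4):
            samples.append((period, 0.1 * k, 0))                       # benign
            samples.append((period, 1.0 - 0.3 * period + 0.1 * k, 1))  # malware
    return samples


def fit_threshold(train):
    """Toy classifier: midpoint between the two class means."""
    benign = [x for _, x, y in train if y == 0]
    malware = [x for _, x, y in train if y == 1]
    return (sum(benign) / len(benign) + sum(malware) / len(malware)) / 2


def evaluate_over_time(samples, train_period=0):
    """Fit on one period, then report F1 separately for every period."""
    threshold = fit_threshold([s for s in samples if s[0] == train_period])
    by_period = defaultdict(list)
    for period, x, y in samples:
        by_period[period].append((x, y))
    scores = {}
    for period in sorted(by_period):
        y_true = [y for _, y in by_period[period]]
        y_pred = [1 if x > threshold else 0 for x, _ in by_period[period]]
        scores[period] = f1_score(y_true, y_pred)
    return scores


scores = evaluate_over_time(make_toy_samples())
for period, f1 in scores.items():
    print(f"period {period}: F1 = {f1:.2f}")
```

On this toy data the model is perfect on its training period and misses every malware sample by the last period, even though the classes stay balanced throughout. That matches the abstract's point that fixing class imbalance does not by itself address drift.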
