ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection
Md Tanvirul Alam, Aritran Piplai, Nidhi Rastogi

TL;DR
This paper introduces ADAPT, a pseudo-labeling semi-supervised approach that effectively adapts malware detection models to concept drift, reducing reliance on costly annotations and outperforming existing methods across diverse datasets.
Contribution
The paper presents a novel, model-agnostic pseudo-labeling semi-supervised algorithm for handling concept drift in malware detection, validated through extensive experiments.
Findings
ADAPT outperforms baseline models and existing methods.
It is effective across Android, Windows, and PDF malware datasets.
It reduces the need for frequent, costly annotations.
Abstract
Machine learning models are commonly used for malware classification; however, they suffer from performance degradation over time due to concept drift. Adapting these models to changing data distributions requires frequent updates, which rely on costly ground-truth annotations. While active learning can reduce the annotation burden, leveraging unlabeled data through semi-supervised learning remains a relatively underexplored approach in the context of malware detection. In this research, we introduce ADAPT, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift. Our model-agnostic method can be applied to various machine learning models, including neural networks and tree-based algorithms. We conduct extensive experiments on five diverse malware detection datasets spanning Android, Windows, and PDF domains. The results demonstrate that our method…
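To make the idea of pseudo-labeling concrete, here is a minimal sketch of one generic, confidence-thresholded pseudo-labeling round in Python. It is not the ADAPT algorithm, whose details are not given in this summary. The names `CentroidClassifier` and `pseudo_label_round`, the toy two-feature data, and the 0.8 threshold are all assumptions made for illustration. The only thing the loop asks of the model is a `fit` method and a `predict_proba` method, which is one simple way to read "model-agnostic".

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class CentroidClassifier:
    """Toy stand-in model (hypothetical): scores a sample by its relative
    distance to the benign (0) and malicious (1) class centroids.
    Assumes both classes are present in the training data."""

    def fit(self, X: List[Vector], y: List[int]) -> None:
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict_proba(self, x: Vector) -> float:
        """Return a score in [0, 1]; higher means closer to the malicious centroid."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        if d0 + d1 == 0:
            return 0.5
        return d0 / (d0 + d1)


def pseudo_label_round(model, X_lab, y_lab, X_unlab, threshold=0.8):
    """One generic pseudo-labeling round (an illustrative sketch, not ADAPT):
    score the unlabeled samples, keep only the confident predictions as
    pseudo-labels, and retrain on labeled plus pseudo-labeled data.
    Returns the number of pseudo-labeled samples added."""
    X_new, y_new = list(X_lab), list(y_lab)
    added = 0
    for x in X_unlab:
        p = model.predict_proba(x)
        if p >= threshold:            # confidently malicious
            X_new.append(x)
            y_new.append(1)
            added += 1
        elif p <= 1 - threshold:      # confidently benign
            X_new.append(x)
            y_new.append(0)
            added += 1
        # otherwise the sample is too uncertain and is skipped
    model.fit(X_new, y_new)
    return added


# Toy data: two labeled benign samples, two labeled malicious samples,
# and three unlabeled samples collected later.
X_labeled = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_labeled = [0, 0, 1, 1]
X_unlabeled = [[6.0, 7.0], [0.5, 0.2], [2.5, 3.0]]

model = CentroidClassifier()
model.fit(X_labeled, y_labeled)
n_added = pseudo_label_round(model, X_labeled, y_labeled, X_unlabeled)
```

In this toy run the ambiguous sample `[2.5, 3.0]` sits equally far from both centroids and is skipped, while the other two are pseudo-labeled and pull the centroids toward the newer data. A threshold that is too low risks reinforcing wrong pseudo-labels, which is the usual trade-off in this family of methods.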
