TL;DR
This paper introduces a deep reinforcement learning approach for malware detection that adapts to concept drift, improving long-term performance and resilience in dynamic Android malware environments.
Contribution
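The paper formulates malware detection as a one-step Markov Decision Process and trains a deep reinforcement learning (DRL) agent that jointly optimizes sample classification and the decision to defer high-risk samples to manual labeling under concept drift.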
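To make the one-step formulation concrete, the sketch below treats each sample as a single-step episode in which the agent picks one of three actions: label the sample benign, label it malware, or reject it for manual labeling. The action names, the `reward` function, and the `reject_cost` value are illustrative assumptions, not the reward design used in the paper.

```python
from enum import IntEnum


class Action(IntEnum):
    BENIGN = 0   # classify the sample as benign
    MALWARE = 1  # classify the sample as malware
    REJECT = 2   # defer the sample to manual labeling


def reward(action, true_label, reject_cost=0.2):
    """Hypothetical reward: +1 if correct, -1 if wrong, small fixed cost to reject.

    `true_label` is 0 for benign and 1 for malware. The numeric values are
    placeholders chosen for illustration only.
    """
    if action == Action.REJECT:
        return -reject_cost
    return 1.0 if int(action) == true_label else -1.0


def one_step_episode(policy, features, true_label):
    """Run a single-step episode: observe features, act once, then terminate.

    Because the episode ends after one action, the return equals the
    immediate reward and no bootstrapping over future states is needed.
    """
    action = policy(features)
    return action, reward(action, true_label)


def always_reject(features):
    """Trivial stand-in policy; a trained DRL agent would replace this."""
    return Action.REJECT
```

With a reward of this shape, rejecting is only worthwhile when the agent's chance of misclassifying is high enough that the expected classification reward falls below `-reject_cost`, which is how a single objective can trade detection accuracy against labeling effort.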
Findings
DRMD, the paper's DRL-based detector, outperforms standard classifiers on Area Under Time (AUT) metrics.
DRMD improves AUT by up to 10.90 over those baselines.
The results demonstrate that DRL is effective for malware detection under concept drift.
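AUT summarizes how a per-period metric (for example, monthly F1) holds up across the test timeline. A minimal sketch, assuming the common trapezoidal definition in which AUT is the normalized area under the metric-versus-time curve; the function name `aut` and the sample scores are hypothetical, and this summary does not state the scale on which the 10.90 figure is reported.

```python
def aut(scores):
    """Area Under Time for a sequence of per-period metric values.

    Applies the trapezoidal rule over consecutive periods and divides by the
    number of intervals, so a metric in [0, 1] yields an AUT in [0, 1].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("AUT needs at least two time periods")
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Hypothetical monthly scores for a detector that degrades under drift.
monthly_scores = [1.0, 0.5, 0.0]
print(aut(monthly_scores))
```

A detector that holds its score steady over time gets an AUT equal to that score, while one that decays is penalized in proportion to how early and how far it drops.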
Abstract
Malware detection in real-world settings must deal with evolving threats, limited labeling budgets, and uncertain predictions. Traditional classifiers, without additional mechanisms, struggle to maintain performance under concept drift in malware domains, as their supervised learning formulation cannot optimize when to defer decisions to manual labeling and adaptation. Modern malware detection pipelines combine classifiers with monthly active learning (AL) and rejection mechanisms to mitigate the impact of concept drift. In this work, we develop a novel formulation of malware detection as a one-step Markov Decision Process and train a deep reinforcement learning (DRL) agent, simultaneously optimizing sample classification performance and rejecting high-risk samples for manual labeling. We evaluated the joint detection and drift mitigation policy learned by the DRL-based Malware…