TL;DR
This paper evaluates how concept drift affects malware detection models over time and proposes a novel adaptive data stream pipeline to improve detection accuracy in evolving malware landscapes.
Contribution
It introduces a comprehensive analysis of concept drift in malware detection, compares drift detection methods, and proposes an adaptive pipeline that outperforms existing approaches.
Findings
Concept drift significantly degrades malware classifier performance over a nine-year span.
The proposed adaptive pipeline maintains higher detection accuracy amidst evolving malware.
Certain drift detectors and feature extractors outperform others in real-world scenarios.
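To make the idea of a drift detector concrete, the following is a minimal sketch in the style of the Drift Detection Method (DDM), which watches a classifier's running error rate and flags drift when it rises well above its historical minimum. It is an illustration only and is not claimed to be the detector or pipeline used in the paper; the names `ErrorRateDriftDetector` and `first_drift_index`, and the threshold defaults, are assumptions made for this example.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style monitor over a stream of 0/1 prediction errors (illustrative sketch)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        # All three defaults are assumptions for this sketch.
        self.min_samples = min_samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after the model has been retrained."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Consume one outcome (1 = misclassified, 0 = correct).

        Returns "stable", "warning", or "drift".
        """
        self.n += 1
        self.errors += error
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)  # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) operating point seen so far.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


def first_drift_index(error_stream):
    """Return the index of the first sample at which drift is flagged, or None."""
    detector = ErrorRateDriftDetector()
    for i, error in enumerate(error_stream):
        if detector.update(error) == "drift":
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: 5% errors for 1000 samples, then 50% errors.
    before = [1 if i % 20 == 0 else 0 for i in range(1000)]
    after = [1 if i % 2 == 0 else 0 for i in range(1000)]
    print("first drift flagged at index:", first_drift_index(before + after))
```

In an adaptive pipeline, a "drift" signal would typically trigger retraining or replacement of the classifier on recent samples, while "warning" can be used to start buffering data for that retraining.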
Abstract
Malware is a major threat to computer systems and poses many challenges to cybersecurity. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase in malware infections has motivated popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples' features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (about 130K apps) and a subset of AndroZoo (about 285K apps). We used these datasets to train an…
