Robust Machine Learning for Malware Detection over Time
Daniele Angioni, Luca Demetrio, Maura Pintor, Battista Biggio

TL;DR
This paper introduces a framework for analyzing concept drift in Android malware detection data and proposes SVM-CB, a time-aware SVM classifier whose performance degrades more slowly over time than that of standard classifiers.
Contribution
It presents a novel drift-analysis framework and a time-aware SVM classifier that together improve malware detection robustness against concept drift.
Findings
SVM-CB outperforms standard classifiers over time
The drift-analysis framework identifies key data characteristics causing drift
Time-aware models better resist distribution changes in malware data
Abstract
The presence and persistence of Android malware is an ongoing threat that plagues this information era, and machine learning technologies are now extensively used to deploy more effective detectors that can block the majority of these malicious programs. However, these algorithms have not been developed to track the natural evolution of malware, and their performance degrades significantly over time because of such concept drift. Currently, state-of-the-art techniques only focus on detecting the presence of such drift, or they address it by relying on frequent model updates. Hence, there is a lack of knowledge regarding the cause of the concept drift, and ad-hoc solutions that can counter the passing of time are still under-investigated. In this work, we begin to address these issues as we propose (i) a drift-analysis framework to identify which characteristics of data are…
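The abstract is cut off before it describes how the drift-analysis framework works, so the following is only a minimal illustrative sketch of the general idea of asking which data characteristics change over time. It is not the authors' framework. It assumes each app is represented as a set of binary features and compares how often each feature appears in malware from two time windows. The feature names, the two yearly windows, and the helper functions `feature_prevalence` and `prevalence_shift` are all hypothetical.

```python
from collections import defaultdict


def feature_prevalence(samples):
    """Return the fraction of samples in which each binary feature is present."""
    counts = defaultdict(int)
    for features in samples:
        for name in features:
            counts[name] += 1
    total = len(samples)
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def prevalence_shift(reference, later):
    """Return, per feature, prevalence in `later` minus prevalence in `reference`."""
    ref = feature_prevalence(reference)
    new = feature_prevalence(later)
    names = set(ref) | set(new)
    return {name: new.get(name, 0.0) - ref.get(name, 0.0) for name in names}


# Toy, hand-made data (assumption): each sample is the set of features it exhibits.
# The feature names are placeholders and are not taken from the paper.
malware_early = [
    {"perm.SEND_SMS", "api.getDeviceId"},
    {"perm.SEND_SMS", "api.getDeviceId"},
    {"perm.SEND_SMS"},
    {"api.getDeviceId", "perm.INTERNET"},
]
malware_late = [
    {"perm.INTERNET", "api.DexClassLoader"},
    {"perm.INTERNET", "api.DexClassLoader"},
    {"perm.INTERNET", "api.getDeviceId"},
    {"perm.SEND_SMS", "perm.INTERNET"},
]

shift = prevalence_shift(malware_early, malware_late)

# Rank features by absolute change; ties are broken by name for stable output.
ranked = sorted(shift.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
for name, delta in ranked:
    print(f"{name:22s} {delta:+.2f}")
```

Features with a large absolute shift are the ones a classifier trained on the early window is most likely to misjudge later. A real analysis would use timestamped samples and would also need to account for how much weight the classifier assigns to each feature.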
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
