Continuous Learning for Android Malware Detection
Yizheng Chen, Zhoujie Ding, David Wagner

TL;DR
This paper introduces a novel combination of contrastive learning and active learning to address concept drift in Android malware detection, significantly improving classifier robustness and maintaining performance over multiple years.
Contribution
It proposes a hierarchical contrastive learning scheme and a new sample selection method to enhance active learning for malware classifiers, effectively combating concept drift.
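The hierarchical idea can be illustrated with a pairwise loss over embeddings. The sketch below is a hypothetical NumPy illustration, not the authors' loss. It assumes Euclidean embeddings, integer family IDs, and two made-up margins (`m_family`, `m_label`). It encodes a three-level hierarchy: apps from the same family are pulled together, same-label apps from different families are kept at least `m_family` apart, and benign/malware pairs are pushed at least `m_label` apart. The larger benign-vs-malware margin is what makes the structure hierarchical.

```python
import numpy as np


def hierarchical_contrastive_loss(emb, is_mal, family, m_family=1.0, m_label=2.0):
    """Illustrative three-level pairwise contrastive loss (hypothetical sketch).

    emb:    (n, d) embeddings
    is_mal: (n,) binary labels, 0 = benign, 1 = malware
    family: (n,) integer family ids (benign apps can share a single id)
    """
    emb = np.asarray(emb, dtype=float)
    is_mal = np.asarray(is_mal)
    family = np.asarray(family)
    n = emb.shape[0]

    # Euclidean distance between every pair of embeddings, shape (n, n).
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)

    same_label = is_mal[:, None] == is_mal[None, :]
    same_family = family[:, None] == family[None, :]
    off_diag = ~np.eye(n, dtype=bool)  # exclude each sample paired with itself

    # Level 1: same label and same family -> pull the pair together.
    pull = same_label & same_family & off_diag
    # Level 2: same label, different family -> keep at least m_family apart.
    apart = same_label & ~same_family & off_diag
    # Level 3: benign vs. malware -> keep at least m_label apart (m_label > m_family).
    push = ~same_label

    loss = (
        np.sum(dist[pull] ** 2)
        + np.sum(np.maximum(0.0, m_family - dist[apart]) ** 2)
        + np.sum(np.maximum(0.0, m_label - dist[push]) ** 2)
    )
    # Average over all ordered pairs (i, j) with i != j.
    return loss / off_diag.sum()
```

If every margin is already satisfied, the loss is 0.0. If all embeddings collapse onto one point, every level-2 and level-3 hinge term becomes active and the loss is positive.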
Findings
False negative rate reduced from 14% to 9%.
False positive rate decreased from 0.86% to 0.48%.
Performance remains stable over seven years.
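The two rates above, and the F1 score quoted in the abstract, are derived from confusion-matrix counts with malware treated as the positive class. The following small helper shows the arithmetic. The counts in the usage line are hypothetical, chosen only to reproduce a 9% FNR and a 0.48% FPR, and are not the paper's data.

```python
from typing import Tuple


def detection_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (FNR, FPR, F1) with malware as the positive class."""
    fnr = fn / (fn + tp)  # share of malware the detector misses
    fpr = fp / (fp + tn)  # share of benign apps wrongly flagged as malware
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall = 1 - FNR
    f1 = 2 * precision * recall / (precision + recall)
    return fnr, fpr, f1


# Hypothetical confusion counts for 1,000 malware and 10,000 benign test apps.
fnr, fpr, f1 = detection_rates(tp=910, fp=48, tn=9952, fn=90)
print(f"FNR={fnr:.2%}  FPR={fpr:.2%}  F1={f1:.3f}")  # FNR=9.00%  FPR=0.48%  F1=0.930
```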
Abstract
Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective as both malware and benign apps evolve. Our research finds that, after an Android malware classifier is trained on one year's worth of data, its F1 score quickly drops from 0.99 to 0.76 within 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning classifiers must be deployed continuously, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with…
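The active-learning loop described in the abstract can be sketched in NumPy. This is a hypothetical illustration and not the paper's selection method. It uses raw feature vectors in place of a contrastively trained encoder and a 1-nearest-neighbor rule in place of the classifier. It also assumes a simple similarity-based uncertainty score: the fraction of the `k` nearest training neighbors that disagree with the predicted label, plus their mean distance. Samples that lie far from every training sample score high and go to analysts first. The names `knn_uncertainty`, `active_learning_round`, `label_fn`, and `budget` are illustrative; `label_fn` stands in for the human analysts.

```python
import numpy as np


def knn_uncertainty(train_emb, train_y, test_emb, test_pred, k=5):
    """Hypothetical similarity-based uncertainty: higher means less certain."""
    # Distance from every test embedding to every training embedding: (n_test, n_train).
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest training samples
    # Fraction of neighbors whose label disagrees with the predicted label.
    disagree = (train_y[nn] != test_pred[:, None]).mean(axis=1)
    # Mean distance to those neighbors; large for samples unlike anything seen in training.
    mean_dist = np.take_along_axis(d, nn, axis=1).mean(axis=1)
    return disagree + mean_dist


def active_learning_round(train_X, train_y, new_X, label_fn, budget, k=5):
    """One round: predict, score, send the top-`budget` samples to analysts, grow the training set."""
    # Stand-in classifier: 1-nearest neighbor on raw features (identity "encoder").
    d = np.linalg.norm(new_X[:, None, :] - train_X[None, :, :], axis=-1)
    pred = train_y[np.argmin(d, axis=1)]
    scores = knn_uncertainty(train_X, train_y, new_X, pred, k)
    picked = np.argsort(-scores)[:budget]  # most uncertain samples first
    new_y = label_fn(new_X[picked])  # analysts provide ground-truth labels
    # Append the newly labeled samples; the caller retrains on the result.
    return np.vstack([train_X, new_X[picked]]), np.concatenate([train_y, new_y])
```

Each call returns the enlarged training set. In a continuous-learning setup, the encoder and classifier would be retrained on it before the next batch of apps arrives.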
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
Methods: Test · Contrastive Learning
