Continuous Learning for Android Malware Detection

Yizheng Chen; Zhoujie Ding; David Wagner

arXiv:2302.04332 · cs.CR · June 16, 2023 · 5 cites

TL;DR

This paper introduces a novel combination of contrastive learning and active learning to address concept drift in Android malware detection, significantly improving classifier robustness and maintaining performance over multiple years.

Contribution

It proposes a hierarchical contrastive learning scheme and a new sample selection method to enhance active learning for malware classifiers, effectively combating concept drift.

Findings

01 The false negative rate fell from 14% to 9%.

02 The false positive rate fell from 0.86% to 0.48%.

03 Performance remains stable over seven years.
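For readers less familiar with the metrics quoted in the findings, the sketch below shows how false negative rate, false positive rate, and F1 are computed from confusion-matrix counts. The function name `detection_rates` and the example counts are hypothetical: the counts were chosen only so that the rates match the 9% and 0.48% figures above, and they are not taken from the paper's data.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (FNR, FPR, F1) from confusion-matrix counts.

    tp: malware correctly flagged      fn: malware missed
    fp: benign apps wrongly flagged    tn: benign apps correctly passed
    """
    fnr = fn / (fn + tp)            # share of malware that was missed
    fpr = fp / (fp + tn)            # share of benign apps that were flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)         # recall = 1 - FNR
    f1 = 2 * precision * recall / (precision + recall)
    return fnr, fpr, f1


# Hypothetical counts (assumption, not from the paper):
# 1,000 malicious and 10,000 benign test apps.
fnr, fpr, f1 = detection_rates(tp=910, fp=48, tn=9952, fn=90)
print(f"FNR={fnr:.2%}  FPR={fpr:.2%}  F1={f1:.3f}")
```

Because benign apps typically far outnumber malicious ones, a small change in the false positive rate can translate into a large change in the absolute number of false alarms.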

Abstract

Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective as both malware apps and benign apps evolve. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning techniques need to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with…
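The active learning loop described in the abstract (select uncertain samples, have an analyst label them, add them to the training set, retrain) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' method: it swaps in a nearest-centroid classifier for the paper's contrastive model, uses a simple distance margin as a stand-in for a similarity-based uncertainty score, and replaces the human analyst with a hypothetical `oracle` function. All function names and the 2-D feature vectors are invented for the example.

```python
import math
import random


def fit(samples, labels):
    """Toy model: one centroid per class (0 = benign, 1 = malicious)."""
    model = {}
    for c in set(labels):
        pts = [x for x, y in zip(samples, labels) if y == c]
        model[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return model


def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda c: math.dist(x, model[c]))


def uncertainty(model, x):
    """Higher when x is about equally far from the two nearest centroids."""
    d = sorted(math.dist(x, c) for c in model.values())
    return -(d[1] - d[0])


def active_learning_round(train_x, train_y, pool_x, oracle, budget):
    """Label the `budget` most uncertain pool samples, then retrain."""
    model = fit(train_x, train_y)
    ranked = sorted(range(len(pool_x)),
                    key=lambda i: uncertainty(model, pool_x[i]),
                    reverse=True)
    chosen = set(ranked[:budget])
    for i in chosen:
        train_x.append(pool_x[i])
        train_y.append(oracle(pool_x[i]))   # stand-in for an analyst's label
    remaining = [x for i, x in enumerate(pool_x) if i not in chosen]
    return fit(train_x, train_y), remaining


# Toy usage with made-up 2-D features and a made-up labeling rule.
random.seed(0)
train_x = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
pool_x = [[random.random(), random.random()] for _ in range(20)]
oracle = lambda x: int(x[0] + x[1] > 1.0)

model, pool_x = active_learning_round(train_x, train_y, pool_x, oracle, budget=5)
```

In a deployment setting this round would be repeated periodically (for example, once per month of new apps), with the labeling budget reflecting how many samples analysts can realistically review.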

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications

Methods: Test · Contrastive Learning