CITADEL: A Semi-Supervised Active Learning Framework for Malware Detection Under Continuous Distribution Drift
Md Ahsanul Haque, Md Mahmuduzzaman Kamol, Suresh Kumar Amalapuram, Vladik Kreinovich, Mohammad Saidur Rahman

TL;DR
CITADEL is a semi-supervised active learning framework that effectively detects Android malware under evolving distributions, reducing labeling costs and computational effort while maintaining high detection accuracy.
Contribution
It introduces malware-specific augmentations and a multi-criteria active learning strategy to improve malware detection under concept drift with limited labeled data.
Findings
Outperforms prior methods on four benchmarks, with F1 score gains of roughly 1% to 14%.
Achieves 24x faster training and 13x fewer operations compared to previous approaches.
Effectively adapts to evolving malware distributions with only 40% labeled samples.
Abstract
Android malware detection systems suffer severe performance degradation over time due to concept drift caused by evolving malicious and benign app behaviors. Although recent methods leverage active learning and hierarchical contrastive loss to address drift, they remain fully supervised, computationally expensive, and ineffective on long-term real-world benchmarks. Moreover, expert labeling does not scale to the monthly emergence of nearly 300K new Android malware samples, leaving most data unlabeled and underutilized. To address these challenges, we propose CITADEL, a semi-supervised active learning framework for Android malware detection. Existing semi-supervised methods assume continuous and semantically meaningful input transformations, and fail to generalize well to high-dimensional binary malware features. We bridge this gap with malware-specific augmentations, Bernoulli bit…
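The point about continuous transformations and binary features can be illustrated with a small sketch. The abstract is cut off before it names the augmentations, so the code below is not a reconstruction of CITADEL's method. It is a generic illustration, under assumed names and parameters (`gaussian_jitter`, `random_bit_flip`, `sigma`, `p` are all hypothetical), of why additive noise leaves the binary feature space while an independent per-feature flip stays inside it.

```python
import numpy as np

def gaussian_jitter(x, sigma=0.1, rng=None):
    """Continuous augmentation common for images/audio (hypothetical helper).

    Adds real-valued noise, so the result is no longer a 0/1 vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

def random_bit_flip(x, p=0.05, rng=None):
    """Generic binary augmentation (assumption, not the paper's definition).

    Each feature is flipped independently with probability p, so the
    output remains a valid 0/1 feature vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < p      # one Bernoulli(p) draw per feature
    return np.where(mask, 1 - x, x)     # flip where the mask is True

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=16)         # toy binary feature vector

noisy = gaussian_jitter(x, rng=rng)     # real-valued, leaves {0, 1}
flipped = random_bit_flip(x, rng=rng)   # still binary

print("original:", x)
print("jittered stays binary?", bool(np.all(np.isin(noisy, [0, 1]))))
print("flipped stays binary? ", bool(np.all(np.isin(flipped, [0, 1]))))
```

A flip probability that is too high would change the semantics of the sample (for example, toggling a permission or API-call feature that defines the behavior), which is one reason a domain-specific design matters; the value `p=0.05` here is arbitrary.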
Taxonomy
Topics: Advanced Malware Detection Techniques · Data Stream Mining Techniques · Spam and Phishing Detection
