Quantifying the Generalization Gap: A New Benchmark for Out-of-Distribution Graph-Based Android Malware Classification
Ngoc N. Tran, Anwar Said, Waseem Abbas, Tyler Derr, Xenofon D. Koutsoukos

TL;DR
This paper introduces a new benchmark and semantic enrichment framework to evaluate and improve the robustness of graph-based Android malware classifiers under distribution shifts, addressing a critical gap for real-world deployment.
Contribution
It presents a benchmarking suite for out-of-distribution scenarios and a semantic augmentation method that enhances feature richness for better generalization.
Findings
A data-centric approach improves robustness under distribution shift
Semantic enrichment enhances classifier performance
Benchmark datasets facilitate future research in resilient malware detection
Abstract
While graph-based Android malware classifiers achieve over 94% accuracy on standard benchmarks, they exhibit a significant generalization gap under distribution shift, suffering up to 45% performance degradation when encountering unseen malware variants from known families. This work systematically investigates this critical yet overlooked challenge for real-world deployment by introducing a benchmarking suite designed to simulate two prevalent scenarios: MalNet-Tiny-Common for covariate shift, and MalNet-Tiny-Distinct for domain shift. Furthermore, we identify an inherent limitation in existing benchmarks: their inputs are structure-only function call graphs, which fail to capture the latent semantic patterns necessary for robust generalization. To verify this, we construct a semantic enrichment framework that augments the original topology with function-level attributes, including…
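The generalization gap described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' evaluation code; it assumes the gap is measured as the drop in accuracy between an in-distribution test split and an out-of-distribution split. The excerpt does not say whether the reported 45% degradation is absolute or relative, so both are computed. The names `GapReport`, `accuracy`, and `generalization_gap`, and the toy labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GapReport:
    in_dist_acc: float    # accuracy on the in-distribution test split
    ood_acc: float        # accuracy on the out-of-distribution split
    absolute_drop: float  # in_dist_acc - ood_acc
    relative_drop: float  # absolute_drop / in_dist_acc


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def generalization_gap(
    id_true: Sequence[int],
    id_pred: Sequence[int],
    ood_true: Sequence[int],
    ood_pred: Sequence[int],
) -> GapReport:
    """Compare accuracy on in-distribution and out-of-distribution splits."""
    id_acc = accuracy(id_true, id_pred)
    ood_acc = accuracy(ood_true, ood_pred)
    absolute = id_acc - ood_acc
    # Guard against division by zero when the in-distribution accuracy is 0.
    relative = absolute / id_acc if id_acc > 0 else 0.0
    return GapReport(id_acc, ood_acc, absolute, relative)


# Toy labels for five hypothetical classes (not data from the paper).
id_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
id_pred = [0, 1, 2, 3, 4, 0, 1, 2, 3, 0]   # 9 of 10 correct
ood_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
ood_pred = [0, 1, 2, 3, 4, 1, 2, 3, 4, 0]  # 5 of 10 correct

report = generalization_gap(id_true, id_pred, ood_true, ood_pred)
print(f"in-distribution accuracy: {report.in_dist_acc:.2f}")
print(f"out-of-distribution accuracy: {report.ood_acc:.2f}")
print(f"absolute drop: {report.absolute_drop:.2f}")
print(f"relative drop: {report.relative_drop:.2f}")
```

Reporting both figures avoids ambiguity: the same pair of accuracies gives a noticeably different number depending on which definition is used.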
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Spam and Phishing Detection
