Dataset Bias in Android Malware Detection
Yan Lin, Tianming Liu, Wei Liu, Zhigaoyuan Wang, Li Li, Guoai Xu, Haoyu Wang

TL;DR
This paper investigates how dataset bias affects Android malware detection performance, revealing that dataset variability can significantly mislead evaluation results and emphasizing the need for careful experimental controls.
Contribution
It systematically analyzes the impact of dataset bias in Android malware detection, highlighting how data selection and usage methods influence performance evaluation.
Findings
Detection performance varies by over 40% depending on dataset handling.
The method used to flag malware samples directly affects detection accuracy.
The malware family composition of a dataset affects which detection approach performs best.
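The finding on family composition can be illustrated with simple arithmetic. A detector's overall detection rate on a malware set is its per-family recall weighted by each family's share of the samples. As a result, two detectors can swap rank when the family mix changes. The following is a minimal sketch of that weighting. The detector names, family names, and recall values are hypothetical and are not results from the paper.

```python
from typing import Dict


def overall_detection_rate(per_family_recall: Dict[str, float],
                           composition: Dict[str, int]) -> float:
    """Detection rate over a malware set: per-family recall weighted by
    each family's share of the samples."""
    total = sum(composition.values())
    if total == 0:
        raise ValueError("composition must contain at least one sample")
    return sum(per_family_recall[family] * count / total
               for family, count in composition.items())


def best_detector(detectors: Dict[str, Dict[str, float]],
                  composition: Dict[str, int]) -> str:
    """Name of the detector with the highest overall detection rate."""
    return max(detectors,
               key=lambda name: overall_detection_rate(detectors[name], composition))


if __name__ == "__main__":
    # Hypothetical per-family recall values, NOT results from the paper.
    detectors = {
        "A": {"family_x": 0.95, "family_y": 0.60},
        "B": {"family_x": 0.80, "family_y": 0.90},
    }
    mostly_x = {"family_x": 900, "family_y": 100}
    mostly_y = {"family_x": 100, "family_y": 900}
    print(best_detector(detectors, mostly_x))  # A (0.915 vs 0.810)
    print(best_detector(detectors, mostly_y))  # B (0.890 vs 0.635)
```

Neither detector's per-family behaviour changes between the two runs. Only the dataset's family mix changes, yet the apparent winner flips.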
Abstract
Researchers have proposed many kinds of malware detection methods to counter the explosive growth of mobile security threats. We argue that reported experimental results are inflated by research bias introduced by the variability of malware datasets. We explore the impact of bias in Android malware detection in three aspects: the method used to flag the ground truth, the distribution of malware families in the dataset, and the ways the dataset is used. We conduct a set of experiments with different VirusTotal (VT) thresholds and find that the method used to flag malware data directly affects malware detection performance. We further compare in detail the impact of malware family types and composition on malware detection. The superiority of each approach differs under various combinations of malware families. Through our extensive experiments, we show that the ways the dataset is used can have a…
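Flagging ground truth with a VT threshold typically means counting how many antivirus engines flag an app and calling it malware once that count reaches the threshold. The following is a minimal sketch of that idea. It assumes a common convention that is not confirmed by the abstract: apps with zero detections are benign, and apps flagged by some engines but fewer than the threshold are dropped as ambiguous. The `AppScan` type and its fields are hypothetical, and the scan counts are invented toy data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AppScan:
    """One app's scan summary. Field names are hypothetical."""
    sha256: str      # app identifier
    positives: int   # number of VirusTotal engines that flagged the app


def label_apps(scans: List[AppScan], threshold: int) -> Dict[str, Optional[int]]:
    """Assign ground-truth labels from VT positives (assumed labeling rule):
    1 = malware if positives >= threshold,
    0 = benign if no engine flags the app,
    None = grey zone (flagged, but by fewer than threshold engines), excluded.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    labels: Dict[str, Optional[int]] = {}
    for scan in scans:
        if scan.positives >= threshold:
            labels[scan.sha256] = 1
        elif scan.positives == 0:
            labels[scan.sha256] = 0
        else:
            labels[scan.sha256] = None
    return labels


def summarize(labels: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Count how many apps end up in each class for a given labeling."""
    counts = {"malware": 0, "benign": 0, "excluded": 0}
    for value in labels.values():
        if value == 1:
            counts["malware"] += 1
        elif value == 0:
            counts["benign"] += 1
        else:
            counts["excluded"] += 1
    return counts


if __name__ == "__main__":
    # Toy scan results, invented for illustration only.
    scans = [AppScan("a", 0), AppScan("b", 1), AppScan("c", 4),
             AppScan("d", 12), AppScan("e", 30)]
    for t in (1, 5, 15):
        print(t, summarize(label_apps(scans, t)))
    # 1 {'malware': 4, 'benign': 1, 'excluded': 0}
    # 5 {'malware': 2, 'benign': 1, 'excluded': 2}
    # 15 {'malware': 1, 'benign': 1, 'excluded': 3}
```

The same five apps produce three different malware sets depending on the threshold. A detector trained and evaluated on each set is therefore measured against a different task.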
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
