Comparison of Multiple Classifiers for Android Malware Detection with Emphasis on Feature Insights Using CICMalDroid 2020 Dataset
Md Min-Ha-Zul Abedin, Tazqia Mehrub

TL;DR
This study compares multiple classifiers for Android malware detection using a comprehensive dataset, finding gradient boosting methods with hybrid features to be highly accurate and interpretable for practical deployment.
Contribution
It introduces a rigorous evaluation of classifiers on CICMalDroid 2020, highlighting the effectiveness of gradient boosting with hybrid features for malware detection.
Findings
Gradient boosting achieved over 97% accuracy.
PCA reduced model performance significantly.
Key decision drivers include package name and main activity.
Abstract
Accurate Android malware detection was critical for protecting users at scale. Signature scanners lagged behind fast release cycles on public app stores. We aimed to build a trustworthy detector by pairing a comprehensive dataset with a rigorous, transparent evaluation, and to identify interpretable drivers of decisions. We used CICMalDroid2020, which contained 17,341 apps across Benign, Adware, Banking, SMS malware, and Riskware. We extracted 301 static and 263 dynamic features into a 564 dimensional hybrid vector, then evaluated seven classifiers under three schemes, original features, principal component analysis, PCA, and linear discriminant analysis, LDA, with a 70 percent training and 30 percent test split. Results showed that gradient boosting on the original features performed best. XGBoost achieved 0.9747 accuracy, 0.9703 precision, 0.9731 recall, and 0.9716 F1, and the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Malware Detection Techniques · Software Engineering Research · Spam and Phishing Detection
