Machine Learning With Feature Selection Using Principal Component Analysis for Malware Detection: A Case Study
Jason Zhang

TL;DR
This study enhances malware detection by integrating PCA-based feature selection with neural networks, significantly reducing features and training time while maintaining high detection accuracy on PDF malware datasets.
Contribution
The paper introduces a novel combination of PCA with neural networks for malware detection, demonstrating improved efficiency and comparable accuracy relative to traditional methods.
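To make the PCA step more concrete, here is a minimal sketch of reducing a feature matrix to its leading principal components. It is not the paper's implementation: it uses plain power iteration with deflation in pure Python, the function names (`center`, `covariance`, `top_component`, `pca`) are hypothetical, and the toy data is invented so that one feature is an exact multiple of another. A real pipeline would more likely rely on a numerical library and feed the projected features to the neural network.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of zero-mean rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def top_component(cov, iters=500, seed=0):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(cov))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
    return eigenvalue, v


def pca(rows, k):
    """Project rows onto the top-k components; also return explained-variance ratios."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    if total_var == 0.0:
        raise ValueError("all features are constant")
    components, eigenvalues = [], []
    for _ in range(k):
        lam, v = top_component(cov)
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: remove the variance already captured by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    explained = [lam / total_var for lam in eigenvalues]
    return projected, explained


# Invented toy data: the second feature is exactly twice the first (redundant).
rows = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.1],
    [3.0, 6.0, 0.9],
    [4.0, 8.0, 0.3],
    [5.0, 10.0, 0.7],
]
projected, explained = pca(rows, 2)
print("explained variance ratios:", explained)
```

Because one of the three toy features is redundant, two components capture essentially all of the variance. That mirrors, on a very small scale, the idea of dropping redundant features before training.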
Findings
PCA reduces the feature set by 33% with minimal information loss.
The model with 32 principal components achieves a 93.17% true positive rate (TPR).
The proposed approach outperforms seven commercial antivirus scanners in detection rate.
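For readers unfamiliar with the metric, the true positive rate is the fraction of malicious samples that the detector flags. The short sketch below shows the calculation; the function name `true_positive_rate`, the label encoding (1 = malicious, 0 = benign), and the sample values are assumptions made for illustration and are not taken from the paper.

```python
def true_positive_rate(labels, predictions):
    """TPR = TP / (TP + FN), assuming 1 = malicious and 0 = benign."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("labels contain no malicious samples")
    return tp / (tp + fn)


# Invented example: four malicious files, three of them detected.
labels = [1, 1, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 0, 1]
print(true_positive_rate(labels, predictions))  # 0.75
```

Note that false positives (the benign file flagged in the last position) do not affect TPR; they are counted by the false positive rate instead.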
Abstract
Cyber security threats have grown significantly in both volume and sophistication over the past decade, which makes malware detection very challenging without considerable automation. In this paper, we propose a novel approach that extends our recently suggested artificial neural network (ANN) based model with feature selection using the principal component analysis (PCA) technique for malware detection. The effectiveness of the approach is demonstrated through its application to PDF malware detection. A varying number of principal components is examined in the comparative study. Our evaluation shows that the model with PCA can significantly reduce feature redundancy and learning time with minimal information loss, as confirmed by both training and testing results based on around 105,000 real-world PDF documents. Of the evaluated models…
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Principal Components Analysis
