MLPdf: An Effective Machine Learning Based Approach for PDF Malware Detection
Jason Zhang

TL;DR
This paper introduces MLPdf, a multilayer perceptron (MLP) neural network that detects PDF malware with high accuracy and outperforms commercial antivirus scanners at identifying malicious PDFs.
Contribution
The paper presents a novel multilayer perceptron (MLP) approach to PDF malware detection. It uses a set of high-quality features extracted from real-world PDF documents and outperforms existing antivirus tools.
Findings
Achieved a true positive rate of 95.12%.
Maintained a false positive rate of only 0.08%.
Outperformed eight well-known commercial antivirus scanners.
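The two rates above are standard detection metrics. With malicious PDFs as the positive class, TPR = TP / (TP + FN) is the share of malicious files flagged, and FPR = FP / (FP + TN) is the share of benign files wrongly flagged. A 95.12% TPR therefore means roughly 95 of every 100 malicious PDFs are caught, and a 0.08% FPR means roughly 8 of every 10,000 benign PDFs are misflagged. The sketch below computes both from label lists; the function name `tpr_fpr` and the example labels are hypothetical and not data from the paper.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR), treating label 1 as malicious and 0 as benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical labels for illustration only: 4 malicious, 5 benign files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
print(tpr_fpr(y_true, y_pred))  # prints (0.75, 0.2)
```

Reporting the pair, rather than accuracy alone, matters for malware detection because benign documents usually far outnumber malicious ones, so a low FPR is what keeps false alarms manageable in practice.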
Abstract
Due to the popularity of the Portable Document Format (PDF) and the increasing number of vulnerabilities in major PDF viewer applications, malware writers continue to use PDF documents to deliver malware via web downloads, email attachments, and other channels, in both targeted and non-targeted attacks. How to effectively block malicious PDF documents has attracted considerable research interest in both the cyber security industry and academia, with no sign of slowing down. In this paper, we propose a novel approach based on a multilayer perceptron (MLP) neural network model, termed MLPdf, for the detection of PDF-based malware. More specifically, the MLPdf model uses a backpropagation algorithm with stochastic gradient descent search for model updates. A group of high-quality features is extracted from two real-world datasets comprising around 105,000 benign and malicious PDF documents. Evaluation results…
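The abstract names the training recipe (an MLP updated by backpropagation with stochastic gradient descent) but not the network's exact architecture. Below is a minimal pure-Python sketch of that general recipe, assuming one sigmoid hidden layer, a single sigmoid output read as the probability that a document is malicious, and binary cross-entropy loss. The class name `TinyMLP`, the layer sizes, the learning rate, and the two-feature toy vectors are all hypothetical; the paper's real inputs are the features it extracts from its two datasets.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class TinyMLP:
    """Hypothetical 1-hidden-layer MLP: label 0 = benign, 1 = malicious."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then the output probability of "malicious".
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def sgd_step(self, x, target, lr):
        h, y = self.forward(x)
        # Sigmoid output + binary cross-entropy gives dL/dz_out = y - target.
        d_out = y - target
        # Backpropagate to the hidden layer using the pre-update output weights.
        d_hidden = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        # Gradient-descent update of the output layer.
        for j in range(len(self.w2)):
            self.w2[j] -= lr * d_out * h[j]
        self.b2 -= lr * d_out
        # Gradient-descent update of the hidden layer.
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hidden[j] * x[i]
            self.b1[j] -= lr * d_hidden[j]

    def fit(self, data, epochs=200, lr=0.5, seed=0):
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)  # stochastic: one example at a time, random order
            for x, t in data:
                self.sgd_step(x, t, lr)
        return self

    def predict(self, x, threshold=0.5):
        return 1 if self.forward(x)[1] >= threshold else 0


# Hypothetical, hand-made feature vectors (NOT from the paper's datasets).
toy_data = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([0.0, 0.0], 0),
            ([1.0, 0.9], 1), ([0.9, 1.0], 1), ([1.0, 1.0], 1)]
model = TinyMLP(n_in=2, n_hidden=4, seed=1).fit(toy_data, epochs=500, lr=0.5, seed=1)
print([model.predict(x) for x, _ in toy_data])
```

Updating after every single example, rather than after a full pass over the data, is what makes this descent "stochastic"; a production system would more likely use a framework with mini-batches and vectorized math.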
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
