MEADE: Towards a Malicious Email Attachment Detection Engine
Ethan M. Rudd, Richard Harang, and Joshua Saxe

TL;DR
This paper applies machine learning, including deep neural networks and gradient boosted decision trees, to statically detect malicious email attachments across diverse file types such as Microsoft Office documents and Zip archives, reaching AUCs above 0.99.
Contribution
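It extends static machine-learning malware detection, previously applied successfully to portable executable (PE) files, to the heterogeneous file types found in email attachments, and evaluates classifier performance on large datasets, demonstrating high detection accuracy.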
Findings
Achieved > 0.99 AUC with neural networks and gradient boosting.
Analyzed detection performance on large real-world datasets.
Discussed deployment considerations in anti-malware systems.
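For context on the AUC figure in the findings: the area under the ROC curve equals the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counted as one half. The Python sketch below computes it that way from two lists of classifier scores. The function name `roc_auc` and the example scores are illustrative assumptions, not the paper's evaluation code. Sorting the benign scores and using binary search keeps the cost at O((n + m) log m) instead of comparing every pair, which matters at the multi-million-sample scale the abstract describes.

```python
from bisect import bisect_left, bisect_right


def roc_auc(malicious_scores, benign_scores):
    """Area under the ROC curve via its rank-statistic (Mann-Whitney) form.

    AUC = P(score of a random malicious sample > score of a random benign sample),
    with ties contributing one half. Illustrative sketch, not the paper's code.
    """
    if not malicious_scores or not benign_scores:
        raise ValueError("need at least one score from each class")
    benign_sorted = sorted(benign_scores)
    wins = 0.0
    for s in malicious_scores:
        below = bisect_left(benign_sorted, s)         # benign scores strictly less than s
        at_or_below = bisect_right(benign_sorted, s)  # benign scores less than or equal to s
        wins += below + 0.5 * (at_or_below - below)   # ties count as half a win
    return wins / (len(malicious_scores) * len(benign_sorted))
```

For example, `roc_auc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5])` returns 8/9 (about 0.889), because 8 of the 9 malicious/benign pairs are ranked correctly; only 0.4 versus 0.5 is misordered.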
Abstract
Malicious email attachments are a growing delivery vector for malware. While machine learning has been successfully applied to portable executable (PE) malware detection, we ask whether similar approaches can be extended to detect malware across the heterogeneous file types commonly found in email attachments. In this paper, we explore the feasibility of applying machine learning as a static countermeasure to detect several types of malicious email attachments including Microsoft Office documents and Zip archives. To this end, we collected a dataset of over 5 million malicious/benign Microsoft Office documents from VirusTotal for evaluation as well as a dataset of benign Microsoft Office documents from the Common Crawl corpus, which we use to provide more realistic estimates of thresholds for false positive rates on in-the-wild data. We also collected a dataset of approximately 500k malicious/benign…
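The idea of using a benign corpus such as Common Crawl to set false positive thresholds can be illustrated with a small sketch: score documents assumed to be benign, pick the decision threshold that flags at most a target fraction of them, then measure what fraction of malicious samples that threshold catches. The function names `threshold_for_fpr` and `detection_rate`, the "flag if score is strictly above the threshold" convention, and the toy scores are hypothetical; the paper's actual thresholding procedure may differ.

```python
import math


def threshold_for_fpr(benign_scores, target_fpr):
    """Return a threshold such that at most target_fpr of benign_scores lie strictly above it.

    Assumed convention for this sketch: a sample is flagged as malicious
    when its score is strictly greater than the threshold.
    """
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(benign_scores, reverse=True)  # most suspicious benign scores first
    # Number of benign samples we can afford to flag at this false positive rate.
    max_false_positives = math.floor(target_fpr * len(ranked))
    # At most max_false_positives benign scores can be strictly greater than this value.
    return ranked[max_false_positives]


def detection_rate(malicious_scores, threshold):
    """Fraction of malicious samples flagged (score strictly above the threshold)."""
    if not malicious_scores:
        raise ValueError("need at least one malicious score")
    flagged = sum(1 for s in malicious_scores if s > threshold)
    return flagged / len(malicious_scores)


# Toy data: 100 benign scores evenly spaced from 0.00 to 0.99.
benign = [i / 100 for i in range(100)]
t = threshold_for_fpr(benign, 0.05)
print(t, detection_rate([0.99, 0.97, 0.5, 0.2], t))  # prints 0.94 0.5
```

With a 5% target, exactly five benign scores (0.95 to 0.99) sit above the 0.94 threshold. Calibrating on a large in-the-wild benign corpus rather than on the evaluation set is what makes such a threshold a more realistic estimate of deployed false positive behavior.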
