MEADE: Towards a Malicious Email Attachment Detection Engine

Ethan M. Rudd, Richard Harang, and Joshua Saxe

arXiv:1804.08162 · cs.CR · April 24, 2018

TL;DR

This paper investigates applying machine learning, including deep neural networks and gradient boosted decision trees, to statically detect malicious email attachments across diverse file types such as Microsoft Office documents and Zip archives, achieving high detection accuracy.

Contribution

It extends static, machine-learning-based malware detection, previously applied successfully to portable executable (PE) files, to the heterogeneous file types found in email attachments, and evaluates classifier performance on large datasets.

Findings

01

Achieved an AUC above 0.99 using deep neural networks and gradient boosted decision trees.

02

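Analyzed detection performance on large real-world datasets, including over 5 million Microsoft Office documents from VirusTotal and benign Office documents from the Common Crawl corpus.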

03

Discussed deployment considerations in anti-malware systems.
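The AUC figure reported above is the area under the ROC curve. One standard way to read it: it equals the probability that a randomly chosen malicious file receives a higher classifier score than a randomly chosen benign one, with ties counted as half. The following is a minimal sketch of that pairwise definition, not the paper's evaluation code; the function name `roc_auc` and the toy scores are hypothetical.

```python
from itertools import product

def roc_auc(benign_scores, malicious_scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition: the fraction of
    (malicious, benign) pairs in which the malicious file scores higher,
    counting ties as half a win."""
    if not benign_scores or not malicious_scores:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for m, b in product(malicious_scores, benign_scores):
        if m > b:
            wins += 1.0   # malicious ranked above benign
        elif m == b:
            wins += 0.5   # tie counts as half
    return wins / (len(malicious_scores) * len(benign_scores))

# Hypothetical toy scores: higher means "more likely malicious".
benign = [0.01, 0.05, 0.20, 0.40]
malicious = [0.30, 0.90, 0.95, 0.99]
print(roc_auc(benign, malicious))  # prints 0.9375 (15 of 16 pairs ordered correctly)
```

The pairwise loop is O(n·m) and is meant only for illustration. At the scale of millions of documents, a rank-based computation (for example, scikit-learn's `roc_auc_score`) would be the practical choice.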

Abstract

Malicious email attachments are a growing delivery vector for malware. While machine learning has been successfully applied to portable executable (PE) malware detection, we ask: can we extend similar approaches to detect malware across heterogeneous file types commonly found in email attachments? In this paper, we explore the feasibility of applying machine learning as a static countermeasure to detect several types of malicious email attachments including Microsoft Office documents and Zip archives. To this end, we collected a dataset of over 5 million malicious/benign Microsoft Office documents from VirusTotal for evaluation as well as a dataset of benign Microsoft Office documents from the Common Crawl corpus, which we use to provide more realistic estimates of thresholds for false positive rates on in-the-wild data. We also collected a dataset of approximately 500k malicious/benign…
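The abstract mentions using benign Common Crawl documents to estimate thresholds for a given false positive rate (FPR). A minimal sketch of that idea follows, assuming the classifier outputs one score per file (higher means more likely malicious) and that a file is flagged when its score is strictly above the threshold. The function names `threshold_for_fpr` and `detection_rate` and the toy scores are hypothetical; this is not the authors' code.

```python
import math

def threshold_for_fpr(benign_scores, target_fpr):
    """Return a threshold t such that at most target_fpr of the benign
    scores are strictly greater than t (files with score > t are flagged)."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    ranked = sorted(benign_scores, reverse=True)
    # How many benign files we may flag without exceeding the target rate.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # a 100% FPR budget permits flagging everything
    # Only the `allowed` highest benign scores can exceed this value;
    # ties at the threshold are not flagged, so the realized FPR may be lower.
    return ranked[allowed]

def detection_rate(malicious_scores, threshold):
    """Fraction of malicious files flagged at the given threshold."""
    return sum(s > threshold for s in malicious_scores) / len(malicious_scores)

# Hypothetical toy scores standing in for benign in-the-wild documents
# and for labeled malicious samples.
benign = [0.01, 0.02, 0.03, 0.05, 0.10, 0.15, 0.20, 0.40, 0.60, 0.80]
malicious = [0.30, 0.70, 0.90, 0.95, 0.99]

t = threshold_for_fpr(benign, target_fpr=0.1)
print(t, detection_rate(malicious, t))  # prints 0.6 0.8
```

Because the threshold is fitted on benign data alone, a larger and more representative benign sample tends to give a more stable estimate, especially at very low FPR targets where only a handful of benign files may be flagged.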