On Training Robust PDF Malware Classifiers

Yizheng Chen; Shiqi Wang; Dongdong She; Suman Jana

arXiv:1904.03542 · cs.CR · December 4, 2019 · 6 cites


TL;DR

This paper introduces a method for training PDF malware classifiers that are verifiably robust against evasion attacks, improving security while maintaining high accuracy.

Contribution

It proposes a new distance metric and robustness properties for PDFs, enabling formal verification of classifier robustness and enhancing resistance to evasion tactics.

Findings

01. Achieves 92.27% verified robust accuracy on three properties
02. Maintains 99.74% overall accuracy with 0.56% false positive rate
03. Robust models require significantly larger modifications for evasion

Abstract

Although state-of-the-art PDF malware classifiers can be trained with almost perfect test accuracy (99%) and extremely low false positive rate (under 0.1%), it has been shown that even a simple adversary can evade them. A practically useful malware classifier must be robust against evasion attacks. However, achieving such robustness is an extremely challenging task. In this paper, we take the first steps towards training robust PDF malware classifiers with verifiable robustness properties. For instance, a robustness property can enforce that no matter how many pages from benign documents are inserted into a PDF malware, the classifier must still classify it as malicious. We demonstrate how the worst-case behavior of a malware classifier with respect to specific robustness properties can be formally verified. Furthermore, we find that training classifiers that satisfy formally verified…
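The worst-case reasoning behind an insertion property can be illustrated with a minimal sketch. It is not the authors' verification method: it assumes a hypothetical linear scorer over binary features (1 if some structure is present in the PDF, 0 otherwise), and all names (`worst_case_score`, `is_verified_robust`, the weights, the `insertable` index set) are made up for illustration. Under the assumption that an attacker can only insert content, so that features in `insertable` can flip from 0 to 1 but never back, the lowest reachable score has a closed form: every absent insertable feature with a negative weight is switched on.

```python
from typing import Sequence


def worst_case_score(
    weights: Sequence[float],
    bias: float,
    x: Sequence[int],
    insertable: Sequence[int],
) -> float:
    """Lowest score w.x + b reachable when any feature listed in
    `insertable` that is currently 0 may be set to 1 (insertion only)."""
    # Score of the unmodified sample.
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    # An attacker lowers the score by inserting exactly those absent
    # features whose weight is negative; other insertions cannot help.
    for i in insertable:
        if x[i] == 0 and weights[i] < 0:
            score += weights[i]
    return score


def is_verified_robust(
    weights: Sequence[float],
    bias: float,
    x: Sequence[int],
    insertable: Sequence[int],
) -> bool:
    """True if the sample stays on the malicious side (score > 0)
    for every allowed insertion, not just the ones we happened to try."""
    return worst_case_score(weights, bias, x, insertable) > 0


# Hypothetical malicious sample with two features present.
x = [1, 0, 0, 1]
insertable = [1, 2]  # assumed: only these features can be added

# Small negative weights on insertable features: property holds.
print(is_verified_robust([2.0, -0.5, -1.0, 1.5], -1.0, x, insertable))

# One large negative weight: an insertion can flip the prediction.
print(is_verified_robust([2.0, -2.0, -1.0, 1.5], -1.0, x, insertable))
```

The point of the sketch is the difference between testing and verifying: the bound covers every combination of insertions at once, so a positive worst-case score is a guarantee rather than an observation about sampled attacks.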


Code & Models

Repositories

surrealyz/pdfclassifier (tf, Official)


Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection