MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers
Robert J. Joyce, Edward Raff, Charles Nicholas, James Holt

TL;DR
This paper introduces MalDICT, a collection of benchmark datasets for classifying malware by behavior, platform, exploited vulnerability, and packer. The labels are produced by ClarAVy, a new antivirus (AV) label parser.
Contribution
The paper presents ClarAVy, a novel AV label parser, and releases the first large-scale benchmark datasets for malware classification across four under-explored attribute categories.
Findings
Created ClarAVy to parse 882 AV label formats.
Released nearly 5.5 million malware samples with detailed labels.
Provided the first datasets for malware platform and packer classification.
Abstract
Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files and classifying malware by family. However, malware can be categorized according to many other types of attributes, and the ability to identify these attributes in newly-emerging malware using machine learning could provide significant value to analysts. In particular, we have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibit, platforms that malware run on, vulnerabilities that malware exploit, and packers that malware are packed with. To obtain labels for training and evaluating ML classifiers on these tasks, we created an antivirus (AV) tagging tool called ClarAVy. ClarAVy's sophisticated AV label parser distinguishes itself from prior AV-based taggers, with the ability to accurately parse…
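To make the idea of AV-based tagging concrete, the following is a minimal sketch of how tokens in AV labels could be mapped to the four attribute categories named in the abstract. It is not ClarAVy's implementation: the vocabulary in `TOKEN_CATEGORIES`, the function names `tag_label` and `tag_sample`, the `min_votes` threshold, and the example labels are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary (an assumption, not ClarAVy's taxonomy).
TOKEN_CATEGORIES = {
    "ransom": "behavior",
    "keylogger": "behavior",
    "downloader": "behavior",
    "win32": "platform",
    "linux": "platform",
    "android": "platform",
    "upx": "packer",
    "themida": "packer",
}

# CVE identifiers are matched before tokenizing, because splitting on
# punctuation would break "CVE-2017-0199" into separate pieces.
CVE_PATTERN = re.compile(r"cve[-_]?(\d{4})[-_]?(\d{4,})", re.IGNORECASE)


def tag_label(label):
    """Return the set of (category, value) tags found in one AV label."""
    tags = set()
    for match in CVE_PATTERN.finditer(label):
        tags.add(("vulnerability", f"CVE-{match.group(1)}-{match.group(2)}"))
    for token in re.split(r"[^A-Za-z0-9]+", label.lower()):
        category = TOKEN_CATEGORIES.get(token)
        if category:
            tags.add((category, token))
    return tags


def tag_sample(labels, min_votes=2):
    """Keep tags that at least `min_votes` AV labels agree on.

    The voting threshold is an illustrative assumption; each label
    contributes at most one vote per tag.
    """
    votes = Counter()
    for label in labels:
        votes.update(tag_label(label))
    return {tag for tag, count in votes.items() if count >= min_votes}


# Made-up labels in the style of AV engine output.
example_labels = [
    "Trojan.Win32.Ransom.abc",
    "Ransom:Win32/Locky.A",
    "Exploit.CVE-2017-0199.Gen",
    "Packed.UPX",
]
agreed_tags = tag_sample(example_labels)
all_tags = tag_sample(example_labels, min_votes=1)
```

In this sketch, `agreed_tags` keeps only the tags that two labels share, while `all_tags` also includes tags seen in a single label. A real parser has to handle many vendor-specific label formats, which a fixed vocabulary lookup like this one does not attempt.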
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
