MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers

Robert J. Joyce; Edward Raff; Charles Nicholas; James Holt

arXiv:2310.11706 · cs.CR · October 19, 2023 · 1 citation

TL;DR

This paper introduces MalDICT, a set of benchmark datasets for classifying malware by behavior, platform, exploited vulnerability, and packer. The labels come from ClarAVy, a new antivirus (AV) label parser and tagging tool.

Contribution

The paper presents ClarAVy, a novel AV label parser, and releases the first large-scale benchmark datasets for malware classification across four under-explored attribute categories.

Findings

01. Created ClarAVy to parse 882 AV label formats.
02. Released nearly 5.5 million malware samples with detailed labels.
03. Provided the first datasets for malware platform and packer classification.

Abstract

Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files and classifying malware by family. However, malware can be categorized according to many other types of attributes, and the ability to identify these attributes in newly-emerging malware using machine learning could provide significant value to analysts. In particular, we have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibit, platforms that malware run on, vulnerabilities that malware exploit, and packers that malware are packed with. To obtain labels for training and evaluating ML classifiers on these tasks, we created an antivirus (AV) tagging tool called ClarAVy. ClarAVy's sophisticated AV label parser distinguishes itself from prior AV-based taggers, with the ability to accurately parse…
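As a rough illustration of what AV-based tagging involves, the sketch below splits raw AV labels into tokens, maps known tokens to the four attribute categories named in the abstract, and keeps only the tags that several AV labels agree on. It is a toy example and not ClarAVy's parser: the vocabulary `TOKEN_CATEGORIES`, the functions `tag_label` and `tag_sample`, the `min_votes` threshold, and the sample labels are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary mapping label tokens to attribute categories.
# This is an assumption for illustration, not ClarAVy's taxonomy.
TOKEN_CATEGORIES = {
    "ransom": "behavior",
    "downloader": "behavior",
    "win32": "platform",
    "linux": "platform",
    "upx": "packer",
    "themida": "packer",
}

# CVE identifiers span several tokens, so they are matched on the raw label.
CVE_PATTERN = re.compile(r"cve[-_]?(\d{4})[-_]?(\d{4,})", re.IGNORECASE)


def tag_label(label):
    """Return the set of (category, tag) pairs found in one AV label."""
    tags = set()
    for match in CVE_PATTERN.finditer(label):
        tags.add(("vulnerability", f"cve-{match.group(1)}-{match.group(2)}"))
    # Split on any run of non-alphanumeric characters (".", ":", "/", ...).
    for token in re.split(r"[^A-Za-z0-9]+", label.lower()):
        category = TOKEN_CATEGORIES.get(token)
        if category:
            tags.add((category, token))
    return tags


def tag_sample(labels, min_votes=2):
    """Keep tags reported by at least `min_votes` of a sample's AV labels."""
    votes = Counter()
    for label in labels:
        votes.update(tag_label(label))  # each label votes once per tag
    return {tag for tag, count in votes.items() if count >= min_votes}


# Made-up AV labels for a single hypothetical sample.
labels = [
    "Trojan.Win32.Ransom.abc",
    "Ransom:Win32/Locky",
    "Exploit.CVE-2017-0144.UPX",
    "Linux/Downloader",
]
print(tag_sample(labels))
```

With these made-up labels, only the `ransom` behavior and the `win32` platform are reported by two labels, so they are the only tags that survive the vote. A fixed vocabulary and a simple vote threshold are the simplest possible choices here; a real tagger has to cope with many vendor-specific label formats, which is the problem the paper says ClarAVy addresses.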

Code & Models

Repositories

FutureComputing4AI/ClarAVy

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection