EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers

Robert J. Joyce; Gideon Miller; Phil Roth; Richard Zak; Elliott Zaresky-Williams; Hyrum Anderson; Edward Raff; James Holt

arXiv:2506.05074 · cs.CR · June 6, 2025

TL;DR

EMBER2024 is a malware dataset designed for holistic evaluation of malware classifiers. It includes evasive malware samples, supports multiple classification tasks, and aims to advance malware detection research.

Contribution

The paper introduces EMBER2024, a large, multi-format dataset with new features and challenge samples that enables more realistic and thorough evaluation of malware classifiers.

Findings

01. Covers more than 3.2 million files across six file formats.
02. Supports seven malware classification tasks.
03. Contains a set of evasive malware samples for robustness testing.
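One way to use a set of evasive samples for robustness testing is to fix a detection threshold using benign files only, then compare how often ordinary test malware and the evasive challenge samples are caught at that threshold. The sketch below assumes a classifier that outputs one maliciousness score per file; the function names, the strict "score above threshold" flagging rule, and the 10% false positive rate in the demo are hypothetical choices for illustration, not the paper's evaluation protocol.

```python
from typing import Sequence


def threshold_at_fpr(benign_scores: Sequence[float], target_fpr: float) -> float:
    """Pick a threshold so that at most target_fpr of benign files score strictly above it.

    Hypothetical helper: a file counts as flagged when score > threshold.
    """
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    ranked = sorted(benign_scores, reverse=True)
    # How many benign files may be flagged without exceeding the target FPR.
    allowed = int(target_fpr * len(ranked))
    if allowed >= len(ranked):
        # Every benign file may be flagged, so return a value below the minimum.
        return ranked[-1] - 1.0
    # Only scores strictly above the allowed-th highest benign score get flagged.
    return ranked[allowed]


def detection_rate(malicious_scores: Sequence[float], threshold: float) -> float:
    """Fraction of malicious files whose score is strictly above the threshold."""
    if not malicious_scores:
        raise ValueError("need at least one malicious score")
    detected = sum(1 for s in malicious_scores if s > threshold)
    return detected / len(malicious_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from EMBER2024.
    benign = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.95]
    test_malware = [0.99, 0.97, 0.90, 0.85, 0.70]
    challenge = [0.98, 0.65, 0.55, 0.40, 0.10]

    t = threshold_at_fpr(benign, target_fpr=0.1)
    print(f"threshold: {t}")                                               # threshold: 0.6
    print(f"test set detection: {detection_rate(test_malware, t):.2f}")    # 1.00
    print(f"challenge set detection: {detection_rate(challenge, t):.2f}")  # 0.40
```

Holding the threshold fixed across both malware sets keeps the comparison fair: any drop in detection on the challenge samples reflects their evasiveness rather than a change in how aggressively the classifier flags files.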

Abstract

A lack of accessible data has historically restricted malware analysis research, and practitioners have relied heavily on datasets provided by industry sources to advance. Existing public datasets are limited by narrow scope: most include files targeting a single platform, have labels supporting just one type of malware classification task, and make no effort to capture the evasive files that make malware detection difficult in practice. We present EMBER2024, a new dataset that enables holistic evaluation of malware classifiers. Created in collaboration with the authors of EMBER2017 and EMBER2018, the EMBER2024 dataset includes hashes, metadata, feature vectors, and labels for more than 3.2 million files from six file formats. Our dataset supports the training and evaluation of machine learning models on seven malware classification tasks, including malware detection, malware family…
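The abstract describes each entry as a file hash together with metadata, a feature vector, and labels, with one dataset serving several classification tasks. The sketch below is a hypothetical illustration of that idea only: the `FileRecord` fields, the format tag values, the family name, and the helper functions are invented for this example and do not reflect EMBER2024's actual schema or tooling. It shows how the same records can yield training pairs for malware detection and for malware family classification, the two tasks named in the abstract excerpt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FileRecord:
    """Hypothetical shape of one dataset entry; field names are illustrative only."""
    sha256: str                   # file hash, used as a stable identifier
    file_format: str              # format tag (hypothetical values such as "pe")
    first_seen: str               # metadata, e.g. an ISO 8601 collection date
    features: List[float]         # precomputed feature vector
    is_malicious: Optional[bool]  # detection label; None when unlabeled
    family: Optional[str] = None  # family label, only for some malicious files


def detection_examples(records: List[FileRecord]) -> List[Tuple[List[float], int]]:
    """(features, 0/1 label) pairs for the detection task; unlabeled files are skipped."""
    return [(r.features, int(r.is_malicious)) for r in records if r.is_malicious is not None]


def family_examples(records: List[FileRecord]) -> List[Tuple[List[float], str]]:
    """(features, family) pairs for family classification, malicious files only."""
    return [(r.features, r.family) for r in records if r.is_malicious and r.family is not None]


if __name__ == "__main__":
    # Toy records; hashes, dates, features, and "familyA" are made up.
    records = [
        FileRecord("aa" * 32, "pe", "2024-01-15", [0.1, 0.7], True, "familyA"),
        FileRecord("bb" * 32, "pe", "2024-02-03", [0.9, 0.2], False),
        FileRecord("cc" * 32, "pe", "2024-03-09", [0.4, 0.4], True),
    ]
    print(len(detection_examples(records)))  # 3
    print(family_examples(records))          # [([0.1, 0.7], 'familyA')]
```

Keeping every label on the same record means a file with no family label still contributes to detection, while the family task simply filters to the subset that has one.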

Code & Models

Models

joyce8/EMBER2024-benchmark-models (Hugging Face model)

Taxonomy

Methods: Sparse Evolutionary Training