A Benchmark API Call Dataset for Windows PE Malware Classification
Ferhat Ozgur Catak, Ahmet Faruk Yazı

TL;DR
This paper introduces a dataset of Windows PE malware API call sequences, enabling researchers to develop and evaluate malware classification algorithms using a standardized benchmark.
Contribution
The authors provide a comprehensive dataset of 7107 malware samples with API call sequences, facilitating standardized malware classification research.
Findings
Dataset includes 7107 malware samples from various families.
Enables benchmarking of different classification algorithms.
Provides a structured format for malware analysis.
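To illustrate what a structured API-call format makes possible, the sketch below assumes each sample is stored as an ordered list of Windows API call names paired with a family label, and maps the names to integer IDs so sequence models can consume them. The record layout, the helpers `build_vocab` and `encode_sequences`, and the example call lists are hypothetical and are not the dataset's documented schema.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption): (ordered API call names, family label).
Sample = Tuple[List[str], str]


def build_vocab(samples: List[Sample]) -> Dict[str, int]:
    """Assign each distinct API call name an integer ID, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for calls, _label in samples:
        for call in calls:
            if call not in vocab:
                vocab[call] = len(vocab)
    return vocab


def encode_sequences(samples: List[Sample], vocab: Dict[str, int]) -> List[List[int]]:
    """Replace each API call name with its integer ID, preserving call order."""
    return [[vocab[call] for call in calls] for calls, _label in samples]


if __name__ == "__main__":
    # Invented toy samples, for illustration only.
    samples: List[Sample] = [
        (["LoadLibraryA", "GetProcAddress", "CreateFileW"], "trojan"),
        (["CreateFileW", "WriteFile", "CloseHandle"], "virus"),
    ]
    vocab = build_vocab(samples)
    print(encode_sequences(samples, vocab))  # → [[0, 1, 2], [2, 3, 4]]
```

Keeping call order in the encoding matters because the analysis described here treats the recorded calls as a sequence rather than an unordered set.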
Abstract
Using operating system API calls is a promising approach to detecting PE-type malware on the Windows operating system. The task is typically framed as running the malware in an isolated sandbox environment, recording the API calls it makes to the Windows operating system, and analyzing these calls sequentially. Here, we analyzed 7107 distinct malware samples belonging to various families, such as viruses, backdoors, and trojans, in an isolated sandbox environment and transformed the analysis results into a format suitable for different classification algorithms and methods. First, we explain how we obtained the malware samples and how we grouped them into families. Finally, for researchers who will use the dataset we have created, we describe how to perform malware classification tasks using different computational methods.
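The abstract leaves the choice of classifier open. As one hedged illustration of a computational method over recorded call sequences, the sketch below turns each sequence into counts of consecutive API-call pairs (bigrams) and assigns a family by nearest centroid under cosine similarity. Both the feature choice and the classifier are our assumptions for illustration, not the authors' method; the helper names and toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def bigrams(calls: List[str]) -> Counter:
    """Count consecutive API-call pairs, so call order contributes to the features."""
    return Counter(zip(calls, calls[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fit_centroids(train: List[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Sum bigram counts per family; cosine ignores scale, so each sum acts as a centroid."""
    centroids: Dict[str, Counter] = defaultdict(Counter)
    for calls, family in train:
        centroids[family].update(bigrams(calls))
    return dict(centroids)


def predict(centroids: Dict[str, Counter], calls: List[str]) -> str:
    """Return the family whose centroid is most similar to the sample's bigram profile."""
    feats = bigrams(calls)
    return max(centroids, key=lambda fam: cosine(feats, centroids[fam]))


if __name__ == "__main__":
    # Invented toy training data, for illustration only.
    train = [
        (["LoadLibraryA", "GetProcAddress", "VirtualAlloc"], "trojan"),
        (["CreateFileW", "WriteFile", "CloseHandle"], "virus"),
    ]
    model = fit_centroids(train)
    print(predict(model, ["LoadLibraryA", "GetProcAddress", "CreateFileW"]))  # → trojan
```

A nearest-centroid baseline is chosen here only because it is short and dependency-free; a benchmark study would normally compare it against stronger classifiers on the same features.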
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
