A Benchmark API Call Dataset for Windows PE Malware Classification

Ferhat Ozgur Catak; Ahmet Faruk Yazı

arXiv:1905.01999 · cs.CR · February 23, 2021 · 23 cites

TL;DR

This paper introduces a dataset of Windows PE malware API call sequences, enabling researchers to develop and evaluate malware classification algorithms using a standardized benchmark.

Contribution

The authors provide a dataset of API call sequences recorded from 7107 malware samples, intended as a common benchmark for malware classification research.

Findings

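1. The dataset includes 7107 malware samples from various families, such as viruses, backdoors, and trojans.
2. It allows different classification algorithms to be benchmarked on the same data.
3. It provides the sandbox analysis results in a structured format suitable for malware analysis.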

Abstract

Operating system API calls are a promising basis for detecting PE-type malware on the Windows operating system. The task is formally defined as running malware in an isolated sandbox environment, recording the API calls it makes to the Windows operating system, and analyzing these calls sequentially. Here, we analyzed 7107 malware samples belonging to various families, such as viruses, backdoors, and trojans, in an isolated sandbox environment and transformed the analysis results into a format to which different classification algorithms and methods can be applied. We first explain how the malware samples were obtained and then how they were grouped into families. Finally, for researchers who will use the dataset we have created, we describe how to perform malware classification tasks with different computational methods.
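The abstract describes recording each sample's API calls in order and then applying classification methods to those sequences. One common way to do this, shown here only as a minimal sketch, is to turn each sequence into counts of consecutive API-call pairs (bigrams) and label a new sample by its nearest neighbour under cosine similarity. The record format (a space-separated string of API call names paired with a family label), the toy samples, and the helpers `ngram_counts`, `cosine`, and `knn_predict` are hypothetical illustrations; they are not the dataset's actual schema or the authors' method.

```python
from collections import Counter
import math

# Hypothetical toy records: each sample is an ordered API call sequence
# (as a sandbox might log it) paired with a family label. These are NOT
# taken from the dataset; they only illustrate the shape of the data.
TRAIN = [
    ("NtCreateFile NtWriteFile NtClose RegSetValueExW", "trojan"),
    ("NtCreateFile NtWriteFile RegSetValueExW NtClose", "trojan"),
    ("socket connect send recv closesocket", "backdoor"),
    ("socket connect recv send closesocket", "backdoor"),
]


def ngram_counts(sequence: str, n: int = 2) -> Counter:
    """Count contiguous n-grams of API calls, keeping local call order."""
    calls = sequence.split()
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(sequence: str, train, k: int = 1, n: int = 2) -> str:
    """Label a sequence by majority vote of its k most similar training samples."""
    query = ngram_counts(sequence, n)
    scored = sorted(
        ((cosine(query, ngram_counts(seq, n)), label) for seq, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A short network-style sequence should land nearest the backdoor samples.
    print(knn_predict("socket connect send closesocket", TRAIN))
```

Running the script prints `backdoor`. Bigrams retain some of the call ordering that a plain bag-of-calls count would discard; replacing `knn_predict` with any other classifier over the same count vectors is one way to compare different methods on the same sequences.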

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications