Static analysis of executable files by machine learning methods

Nikolay Prudkovskiy

arXiv:2007.07501·cs.CR·July 16, 2020

Open Access

TL;DR

This paper presents a machine learning approach to static analysis of executable files that detects malicious content through data preprocessing, feature selection, and ensemble classification.

Contribution

The paper combines categorical feature encoding, dimensionality reduction, and ensemble classifiers into a single static analysis framework for malware detection.

Findings

01

Effective feature encoding and selection improve detection accuracy.

02

Ensemble classifiers outperform individual models.

03

The classifier ensemble is intended as the basis for a later system that detects malicious files in an uninsulated (that is, non-isolated) environment.

Abstract

The paper describes how to detect malicious executable files based on static analysis of their binary content. The stages of pre-processing and cleaning data extracted from different areas of executable files are analyzed. Methods of encoding categorical attributes of executable files are considered, as are ways to reduce the feature field dimension and select characteristic features in order to effectively represent samples of binary executable files for further training classifiers. An ensemble training approach was applied in order to aggregate forecasts from each classifier, and an ensemble of classifiers of various feature groups of executable file attributes was created in order to subsequently develop a system for detecting malicious files in an uninsulated environment.
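
The stages named in the abstract (encoding categorical attributes, selecting features, and aggregating per-group classifier outputs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the attribute names (`subsystem`, `machine`, `num_sections`), the feature-group names, the use of one-hot encoding, the drop-constant-columns selection rule, and the averaging ("soft voting") rule are stand-ins for whatever the author actually used.

```python
from statistics import mean


def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 column per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def drop_constant_features(rows):
    """Very simple feature selection: drop columns with a single value in all samples."""
    keep = [k for k in rows[0] if len({row[k] for row in rows}) > 1]
    return [{k: row[k] for k in keep} for row in rows]


def soft_vote(group_probabilities, threshold=0.5):
    """Average the per-group malicious probabilities and threshold the mean."""
    score = mean(group_probabilities.values())
    return score, score >= threshold


# Hypothetical attributes extracted from three executable files.
rows = [
    {"subsystem": "gui", "machine": "x86", "num_sections": 4},
    {"subsystem": "console", "machine": "x86", "num_sections": 7},
    {"subsystem": "gui", "machine": "x86", "num_sections": 5},
]

encoded = one_hot_encode(rows, "subsystem")
selected = drop_constant_features(encoded)  # "machine" is constant, so it is dropped

# Hypothetical outputs of three classifiers, each trained on one feature group.
score, is_malicious = soft_vote({"headers": 1.0, "imports": 0.75, "sections": 0.5})
print(selected[1], score, is_malicious)
```

Averaging probabilities is only one way to aggregate predictions; majority voting or a trained meta-classifier would fit the abstract's description equally well.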

Taxonomy

Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Software Engineering Research