Static analysis of executable files by machine learning methods
Nikolay Prudkovskiy

TL;DR
This paper presents a machine learning approach to the static analysis of executable files for detecting malicious content, built on data preprocessing, feature selection, and ensemble classification.
Contribution
It introduces a static analysis framework for malware detection that combines feature encoding, dimensionality reduction, and ensemble classifiers.
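Neither the summary nor the abstract says which encoding the author applies to categorical attributes. The following is a minimal sketch that assumes each file is described by a bag of string attributes, such as imported function names or section names. Feature hashing turns those attributes into a fixed-length numeric vector, which both encodes them and caps the dimension of the feature space. The function name `hash_categorical`, the use of MD5 as the hash, and the sample attribute strings are illustrative assumptions. None of them is taken from the paper.

```python
import hashlib
from typing import Iterable, List


def hash_categorical(values: Iterable[str], n_features: int = 16) -> List[float]:
    """Encode a bag of categorical string attributes as a fixed-length vector.

    Illustrative assumption: signed feature hashing stands in for the
    paper's (unspecified) encoding of categorical attributes.
    """
    vec = [0.0] * n_features
    for value in values:
        digest = hashlib.md5(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = h % n_features  # bucket chosen from the hash
        # The top bit of the hash picks the sign, so colliding attributes
        # tend to cancel rather than always inflate the same bucket.
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0
        vec[index] += sign
    return vec


# Hypothetical attributes of one executable: imports and section names.
sample = [
    "kernel32.dll!CreateFileW",
    "kernel32.dll!WriteFile",
    "section:.text",
    "section:.rsrc",
]
vector = hash_categorical(sample, n_features=16)
print(len(vector))  # 16
```

Any number of distinct attribute values maps into the same 16 columns, so the vector length does not grow with the vocabulary. The cost is occasional collisions between unrelated attributes.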
Findings
Effective feature encoding and selection improve detection accuracy.
Ensemble classifiers outperform individual models.
The system demonstrates robustness in non-isolated environments.
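The findings credit part of the accuracy gain to feature selection, but the selection criterion is not named. The sketch below ranks binary features, such as "imports function X", by their mutual information with the malicious/benign label and keeps the top k. It assumes a filter-style selection. Mutual information is a swapped-in, commonly used criterion rather than the paper's confirmed choice, and `mutual_information`, `select_top_k`, and the toy matrix are hypothetical.

```python
import math
from typing import List, Sequence


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Mutual information, in bits, between a binary feature and a binary label."""
    n = len(labels)
    mi = 0.0
    for f in (0, 1):
        for y in (0, 1):
            joint = sum(1 for a, b in zip(feature, labels) if a == f and b == y) / n
            if joint == 0:
                continue  # 0 * log(0) is taken as 0
            p_f = sum(1 for a in feature if a == f) / n
            p_y = sum(1 for b in labels if b == y) / n
            mi += joint * math.log2(joint / (p_f * p_y))
    return mi


def select_top_k(X: List[List[int]], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k columns of X with the highest mutual information."""
    n_cols = len(X[0])
    scores = [mutual_information([row[j] for row in X], labels) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 4 files x 3 binary features; label 1 = malicious (hypothetical).
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = [1, 1, 0, 0]
print(select_top_k(X, labels, k=1))  # → [0]
```

Column 0 matches the labels exactly (1 bit of information), while columns 1 and 2 are statistically independent of them (0 bits), so only column 0 survives.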
Abstract
The paper describes how to detect malicious executable files through static analysis of their binary content. It analyzes the stages of preprocessing and cleaning the data extracted from different areas of executable files. It then considers methods of encoding the categorical attributes of executable files, along with ways to reduce the dimensionality of the feature space and to select characteristic features, so that samples of binary executable files are represented effectively for training classifiers. An ensemble learning approach was applied to aggregate the predictions of the individual classifiers. An ensemble of classifiers, each trained on a different group of executable file attributes, was then built as the basis for a system that detects malicious files in a non-isolated environment.
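The abstract says that the predictions of classifiers trained on different attribute groups are aggregated. It does not give the aggregation rule. The following is a minimal sketch that assumes weighted soft voting: each per-group classifier returns a probability that the file is malicious, and the ensemble thresholds the weighted mean of those probabilities. The group names ("headers", "imports", "strings"), the weights, and the stand-in lambda models are hypothetical.

```python
from typing import Callable, Mapping, Sequence, Tuple

# A per-group classifier maps that group's feature vector to an estimated
# probability that the file is malicious. In practice each would be a
# trained model; the lambdas below are hypothetical stand-ins.
Classifier = Callable[[Sequence[float]], float]


def ensemble_predict(
    classifiers: Mapping[str, Classifier],
    sample: Mapping[str, Sequence[float]],
    weights: Mapping[str, float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Weighted soft voting: average per-group probabilities, then threshold."""
    total_weight = sum(weights[group] for group in classifiers)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        weights[group] * clf(sample[group]) for group, clf in classifiers.items()
    ) / total_weight
    return score, score >= threshold


# Hypothetical feature groups, each with its own model and feature vector.
classifiers = {
    "headers": lambda x: 0.9 if x[0] > 0 else 0.1,
    "imports": lambda x: 0.2,
    "strings": lambda x: 0.7,
}
weights = {"headers": 1.0, "imports": 1.0, "strings": 2.0}
sample = {"headers": [1.0], "imports": [0.0], "strings": [0.0]}
score, is_malicious = ensemble_predict(classifiers, sample, weights)
```

In this example the weighted mean of 0.9, 0.2 and 0.7 (weights 1, 1 and 2) is 0.625, so the file is flagged as malicious. Soft voting is only one aggregation rule. Stacking, in which a meta-classifier is trained on the per-group outputs, is a common alternative, and the abstract does not say which one the paper uses.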
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Software Engineering Research
