Sandbox Sample Classification Using Behavioral Indicators of Compromise
M. Andrecut

TL;DR
This paper presents a machine learning approach to classify sandbox samples as malicious or benign based on behavioral indicators of compromise, utilizing traditional and Monte Carlo-inspired methods with real-world data.
Contribution
It introduces a novel classification approach combining traditional ML methods with Monte Carlo-inspired techniques for analyzing sandbox behavioral data.
Findings
Effective classification of sandbox samples achieved
Monte Carlo-inspired method shows promising results
Validated on ThreatGRID and ReversingLabs datasets
Abstract
Behavioral Indicators of Compromise (BICs) are associated with various automated methods used to extract a sample's behavior by observing the system function calls it performs in a virtual execution environment. Thus, every sample is described by the set of BICs that its behavior triggers in the sandbox environment. Here we discuss a Machine Learning approach to classifying sandbox samples as MALICIOUS or BENIGN, based on the list of triggered BICs. Besides more traditional methods such as Logistic Regression and Naive Bayes Classification, we also discuss a different approach inspired by statistical Monte Carlo methods. The numerical results are illustrated using ThreatGRID and ReversingLabs data.
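To make the setup concrete, the following is a minimal sketch of one of the traditional methods named in the abstract: a Bernoulli Naive Bayes classifier over sets of triggered BICs. It is not the paper's implementation. The BIC names, the toy training data, the function names `train_bernoulli_nb` and `predict`, and the Laplace smoothing parameter `alpha` are all assumptions made for illustration, and the Monte Carlo-inspired method is not reproduced here because the abstract does not describe it.

```python
import math
from collections import Counter


def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on sets of triggered BICs.

    samples: list of sets of BIC names (hypothetical names, for illustration)
    labels:  list of class labels, e.g. "MALICIOUS" / "BENIGN"
    alpha:   Laplace smoothing constant (an assumption, not from the paper)
    """
    vocab = sorted(set().union(*samples))
    n_class = Counter(labels)
    # counts[c][bic] = number of class-c samples that triggered bic
    counts = {c: Counter() for c in n_class}
    for bics, y in zip(samples, labels):
        counts[y].update(bics)

    model = {"vocab": vocab, "log_prior": {}, "p": {}}
    for c in n_class:
        model["log_prior"][c] = math.log(n_class[c] / len(labels))
        # Smoothed probability that a class-c sample triggers each BIC
        model["p"][c] = {
            b: (counts[c][b] + alpha) / (n_class[c] + 2 * alpha) for b in vocab
        }
    return model


def predict(model, bics):
    """Return the class with the highest log-posterior for a set of BICs."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for b in model["vocab"]:
            p = model["p"][c][b]
            # Each BIC contributes whether it was triggered or not
            score += math.log(p) if b in bics else math.log(1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data with made-up BIC names; real sandbox reports would supply these.
train_samples = [
    {"modifies_registry_run_key", "injects_into_process"},
    {"injects_into_process", "contacts_unknown_host"},
    {"modifies_registry_run_key", "contacts_unknown_host"},
    {"creates_window"},
    {"creates_window", "reads_config_file"},
    {"reads_config_file"},
]
train_labels = ["MALICIOUS"] * 3 + ["BENIGN"] * 3

model = train_bernoulli_nb(train_samples, train_labels)
print(predict(model, {"injects_into_process", "contacts_unknown_host"}))
print(predict(model, {"creates_window"}))
```

On this toy data the first query is classified as MALICIOUS and the second as BENIGN. The Bernoulli variant is used because each sample is a set, so a BIC is either triggered or not, and the absence of a BIC also carries evidence.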
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Logistic Regression
