Command-line Obfuscation Detection using Small Language Models
Vojtech Outrata, Michael Adam Polak, Martin Kopp

TL;DR
This paper presents a scalable NLP-based detection method using small transformer language models to identify command-line obfuscation in execution logs, outperforming signature-based approaches across diverse real-world environments.
Contribution
The authors develop a custom-trained small transformer model for obfuscation detection, demonstrating high precision and effectiveness on real-world telemetry data, including unseen obfuscated samples.
Findings
High-precision detection on diverse telemetry data
Outperforms signature-based methods on obfuscated malware
Detects previously unseen obfuscated command-line samples
Abstract
To avoid detection, adversaries often use command-line obfuscation. Numerous obfuscation techniques exist, all designed to alter the command-line syntax without affecting its original functionality. This variability forces most security solutions to maintain an exhaustive enumeration of signatures for even a single pattern. Instead of relying on signatures, we have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model and can be applied to any source of execution logs. Evaluation on real-world telemetry demonstrates that our approach yields high-precision detections even on high-volume telemetry from a diverse set of environments, ranging from universities and businesses to healthcare and finance. The practical value is demonstrated in a case study of real-world samples detected by our model.…
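The brittleness of signatures described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the signature pattern, the helper names, and the sample command lines (including the `<payload>` placeholder) are hypothetical. The example relies on one well-known cmd.exe behavior, namely that an unquoted caret (`^`) is an escape character and is dropped before the command runs, so carets can be scattered through a command without changing what it does.

```python
import re

# Hypothetical signature (an assumption for illustration): a naive pattern
# for a PowerShell launch that uses an encoded command.
SIGNATURE = re.compile(r"powershell(\.exe)?\s+-enc", re.IGNORECASE)


def matches_signature(command_line: str) -> bool:
    """Return True if the naive signature matches the raw command line."""
    return SIGNATURE.search(command_line) is not None


def strip_carets(command_line: str) -> str:
    """Very rough normalization: remove every caret.

    Simplification: real cmd.exe parsing also handles quoting and
    doubled carets, which this helper ignores.
    """
    return command_line.replace("^", "")


# Hypothetical samples; "<payload>" is a placeholder, not real data.
plain = "powershell -enc <payload>"
obfuscated = "p^ow^er^sh^ell -e^nc <payload>"

print(matches_signature(plain))                     # True
print(matches_signature(obfuscated))                # False
print(matches_signature(strip_carets(obfuscated)))  # True
```

The de-obfuscation helper only covers a single technique. Each further technique would need its own rule or signature variant, which is the enumeration burden that motivates a learned detector.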
Taxonomy
Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Digital Media Forensic Detection
Methods: Sparse Evolutionary Training
