Learning the Language of NVMe Streams for Ransomware Detection
Barak Bringoltz, Elisha Halperin, Ran Feraru, Evgeny Blaichman, Amit Berman

TL;DR
This paper introduces transformer-based language models to detect ransomware in NVMe command streams, significantly improving detection accuracy and data loss prevention over existing methods.
Contribution
It presents novel transformer models and tokenization schemes tailored for NVMe command sequences, advancing ransomware detection techniques.
Findings
Up to 24% reduction in missed-detection rate relative to state-of-the-art tabular methods
Up to 66% improvement in data loss prevention
Up to 84% improvement in identifying data accessed by ransomware
Abstract
We apply language modeling techniques to detect ransomware activity in NVMe command sequences. We design and train two types of transformer-based models: the Command-Level Transformer (CLT) performs in-context token classification to determine whether individual commands are initiated by ransomware, and the Patch-Level Transformer (PLT) predicts the volume of data accessed by ransomware within a patch of commands. We present both model designs and the corresponding tokenization and embedding schemes and show that they improve over state-of-the-art tabular methods by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.
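The two prediction targets described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's scheme: the `NvmeCommand` fields, the one-token-per-command vocabulary (opcode letter plus a log2 size bucket), and the fixed-length, non-overlapping patches are all hypothetical, and the LBA is left out of the token for brevity. The sketch only shows how a command stream might yield per-command labels (the kind of target a CLT-style classifier predicts) and a per-patch volume (the kind of target a PLT-style model predicts).

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified view of an NVMe command; real commands carry more fields.
@dataclass(frozen=True)
class NvmeCommand:
    opcode: str          # "read" or "write" in this sketch
    lba: int             # starting logical block address (unused in the token below)
    num_blocks: int      # number of logical blocks transferred
    is_ransomware: bool  # ground-truth label, available only in labeled training data


def size_bucket(num_blocks: int) -> int:
    """Floor of log2 of the transfer size, to keep the vocabulary small."""
    return max(num_blocks, 1).bit_length() - 1


def tokenize(cmd: NvmeCommand) -> str:
    """Map one command to one token (assumed scheme, not the paper's)."""
    return f"{cmd.opcode[0].upper()}{size_bucket(cmd.num_blocks)}"


def make_patches(cmds: List[NvmeCommand], patch_len: int) -> List[List[NvmeCommand]]:
    """Split the stream into consecutive, non-overlapping patches (an assumption)."""
    return [cmds[i:i + patch_len] for i in range(0, len(cmds), patch_len)]


def command_level_targets(cmds: List[NvmeCommand]) -> List[int]:
    """One 0/1 label per command: the target of per-token classification."""
    return [int(c.is_ransomware) for c in cmds]


def patch_level_target(patch: List[NvmeCommand]) -> int:
    """Blocks accessed by ransomware within one patch: a volume-style target."""
    return sum(c.num_blocks for c in patch if c.is_ransomware)


# Made-up stream for illustration only.
stream = [
    NvmeCommand("read", 0, 8, False),
    NvmeCommand("read", 1024, 64, True),
    NvmeCommand("write", 1024, 64, True),
    NvmeCommand("write", 8, 1, False),
]

tokens = [tokenize(c) for c in stream]
labels = command_level_targets(stream)
volumes = [patch_level_target(p) for p in make_patches(stream, patch_len=2)]
```

In this sketch, `tokens` would feed an embedding layer, `labels` would supervise a command-level classifier, and `volumes` would supervise a patch-level regressor; the model architectures themselves are not shown.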
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
