Malware Detection with LSTM using Opcode Language
Renjie Lu

TL;DR
This paper introduces a static malware detection method that uses LSTM neural networks to analyze opcode sequences, reaching high detection performance (an average AUC of 0.99) without running samples in an isolated environment or relying on signatures.
Contribution
It presents a novel approach that models malware as a language and applies a two-stage LSTM model to static analysis, which is more efficient and effective than traditional signature-based and behavior-based methods.
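To make the LSTM part concrete, here is a minimal pure-Python sketch of one standard LSTM cell run over a sequence of opcode embedding vectors. It is not the authors' model. The two-stage arrangement is not described in this summary and is not reproduced here. The weights are random and untrained. The names `LSTMCell`, `GATES` and `matvec` are hypothetical. A real system would use a deep learning framework and trained parameters.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


GATES = ("input", "forget", "output", "candidate")


class LSTMCell:
    """A standard LSTM cell with untrained, randomly initialised weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias vector per gate, applied to [x_t; h_{t-1}].
        self.weights = {
            name: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for name in GATES
        }
        self.biases = {name: [0.0] * hidden_size for name in GATES}

    def step(self, x, h, c):
        """One time step: returns the new hidden state h_t and cell state c_t."""
        z = list(x) + list(h)
        pre = {
            name: [a + b for a, b in zip(matvec(self.weights[name], z), self.biases[name])]
            for name in GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # c_t = f * c_{t-1} + i * candidate;  h_t = o * tanh(c_t)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Feed a sequence of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


# Usage: two made-up 4-dimensional "opcode embedding" vectors.
cell = LSTMCell(input_size=4, hidden_size=3)
final_h = cell.run([[0.1, 0.0, -0.2, 0.3], [0.0, 0.5, 0.1, -0.1]])
```

In a detector, the final hidden state would typically feed a small output layer, for example a sigmoid giving a malware probability. That head is an assumption here and is not taken from the paper.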
Findings
Achieved an average AUC of 0.99 for malware detection.
Achieved an average AUC of 0.987 for malware classification.
Demonstrated the feasibility of language modeling for static malware analysis.
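The AUC figures above can be read as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below computes binary AUC in that pairwise (Mann-Whitney) form and a macro one-vs-rest average for the multi-class case. The summary does not say how the reported averages were computed, for example over folds or over classes, so the macro averaging here is an assumption. The function names are hypothetical.

```python
def binary_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels, class_scores, classes):
    """Unweighted mean of one-vs-rest AUCs; class_scores[i][k] scores sample i for classes[k]."""
    aucs = []
    for k, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in labels]
        aucs.append(binary_auc(binary, [row[k] for row in class_scores]))
    return sum(aucs) / len(aucs)


print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples; it is chosen for clarity, and a sort-based rank computation gives the same value faster on large test sets.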
Abstract
Nowadays, with the booming development of the Internet and the software industry, more and more malware variants are designed to perform various malicious activities. Traditional signature-based detection methods cannot detect variants of malware. In addition, most behavior-based methods require a secure and isolated environment to perform malware detection, and such an environment is vulnerable to contamination. In this paper, drawing on natural language processing, we propose a novel and efficient approach to static malware analysis that automatically learns the opcode sequence patterns of malware. We propose modeling malware as a language and assess the feasibility of this approach. First, we use the disassembly tool IDA Pro to obtain the opcode sequences of malware. Then a word embedding technique is used to learn a feature vector representation of each opcode. Finally, we propose a two-stage LSTM…
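The preprocessing steps in the abstract can be sketched as follows, assuming the disassembly has already been exported as plain-text instruction lines such as `mov eax, [ebp+8]`. That text format is an assumption, and no IDA Pro API is used. The sketch keeps each instruction's mnemonic, maps opcodes to integer ids, and builds (center, context) pairs of the kind a skip-gram word embedding is trained on. Skip-gram is named here only as one common embedding technique; the abstract does not say which one the authors used. All function names are hypothetical.

```python
from collections import Counter


def extract_opcodes(instructions):
    """Keep only the mnemonic (first token) of each disassembled instruction."""
    opcodes = []
    for line in instructions:
        parts = line.strip().split()
        if parts:
            opcodes.append(parts[0].lower())
    return opcodes


def build_vocab(opcode_sequences, min_count=1):
    """Map opcodes to integer ids, most frequent first; id 0 is reserved for unknowns."""
    counts = Counter(op for seq in opcode_sequences for op in seq)
    vocab = {"<unk>": 0}
    for op, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(opcodes, vocab):
    """Turn an opcode sequence into ids, mapping unseen opcodes to <unk>."""
    return [vocab.get(op, 0) for op in opcodes]


def skipgram_pairs(ids, window=2):
    """(center, context) training pairs within `window` positions of each opcode."""
    pairs = []
    for i, center in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, ids[j]))
    return pairs


listing = ["push ebp", "mov ebp, esp", "mov eax, [ebp+8]",
           "xor eax, eax", "pop ebp", "retn"]
ops = extract_opcodes(listing)  # ['push', 'mov', 'mov', 'xor', 'pop', 'retn']
vocab = build_vocab([ops])
ids = encode(ops, vocab)        # [3, 1, 1, 5, 2, 4]
pairs = skipgram_pairs(ids, window=1)
```

The resulting id sequences, once each id is replaced by its learned embedding vector, are the kind of input an LSTM over opcodes would consume.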
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Digital and Cyber Forensics
