Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
Pei Yan, Shunquan Tan, Miaohui Wang, Jiwu Huang

TL;DR
This paper introduces a novel malware detection approach that leverages GPT-4 for API call explanation and BERT for representation, resulting in improved generalization and detection performance across multiple datasets.
Contribution
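The method combines GPT-4 prompt engineering with BERT: GPT-4 produces an explanatory text for each API call, and BERT encodes that text into the call's representation. Because the representations are derived from text rather than learned from the malware dataset, generating them requires no dataset training, which improves both detection accuracy and generalization.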
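The explain-then-encode idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the function names, and the placeholder explanations are hypothetical, and a hashed bag-of-words vector stands in for BERT so the example runs without network access or model downloads. It is not the authors' implementation.

```python
import hashlib
import math
from typing import Callable, Dict, List

# Hypothetical prompt wording; the paper's actual prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "Explain what the Windows API call '{api}' does "
    "and how it could be used by malware."
)


def build_prompt(api_name: str) -> str:
    """Build the text that would be sent to the language model for one API call."""
    return PROMPT_TEMPLATE.format(api=api_name)


def embed_text(text: str, dim: int = 16) -> List[float]:
    """Stand-in for a BERT encoder: a deterministic hashed bag-of-words vector.

    Each token is hashed to one of `dim` buckets with a +1/-1 sign, then the
    vector is L2-normalised (left as-is if it happens to be all zeros).
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = digest[0] % dim
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def represent_sequence(
    api_sequence: List[str],
    explain: Callable[[str], str],
    dim: int = 16,
) -> List[List[float]]:
    """Map each API call to a vector via its explanation text.

    `explain` receives the prompt and returns explanatory text. In a real
    pipeline it would query GPT-4; here it is injected so the sketch stays
    offline and testable.
    """
    return [embed_text(explain(build_prompt(api)), dim) for api in api_sequence]


# Hand-written placeholder explanations (NOT GPT-4 output), keyed by API name.
PLACEHOLDER_EXPLANATIONS: Dict[str, str] = {
    "CreateFileW": "Opens or creates a file or device and returns a handle.",
    "WriteProcessMemory": "Writes data into the memory of another process.",
}


def fake_explain(prompt: str) -> str:
    """Offline stand-in for the GPT-4 call: look up a canned explanation."""
    for api, text in PLACEHOLDER_EXPLANATIONS.items():
        if api in prompt:
            return text
    # An API call with no canned text still yields some text, hence a vector.
    return prompt


sequence = ["CreateFileW", "WriteProcessMemory", "SomeUnseenApiCall"]
representations = represent_sequence(sequence, fake_explain)
```

The point of the sketch is the last entry in `sequence`: an API call that never appeared in any training set still receives a representation, because the vector is computed from text rather than looked up in a learned vocabulary.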
Findings
Outperforms state-of-the-art TextCNN in detection accuracy.
Achieves nearly 100% recall in malware detection.
Demonstrates strong generalization in cross-database and few-shot scenarios.
Abstract
Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated malware, thereby preventing them from invading computers. As a significant representation of dynamic malware behavior, the API (Application Programming Interface) sequence, comprised of consecutive API calls, has progressively become the dominant feature of dynamic analysis methods. Though there have been numerous deep learning models for malware detection based on API sequences, the quality of API call representations produced by those models is limited. These models cannot generate representations for unknown API calls, which weakens both the detection performance and the generalization. Further, the concept drift phenomenon of API calls is prominent. To tackle these issues, we introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create…
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Transformer · Linear Layer · Dense Connections
