CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls
Ahmed Bensaoud, Jugal Kalita

TL;DR
This paper introduces a novel malware classification system combining CNN-LSTM and transfer learning models that utilize opcode sequences and API calls, achieving state-of-the-art accuracy on a large dataset.
Contribution
The study presents a new hybrid CNN-LSTM model with transfer learning for malware classification based on opcode and API call features, demonstrating superior performance.
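This page does not describe the exact layout of the authors' CNN-LSTM. The sketch below is only a generic illustration of how a convolutional front end is often chained to an LSTM for sequence classification. It is a NumPy forward pass with no training. Every name and size in it is a hypothetical assumption, not a value from the paper: vocabulary of 50 opcodes, 16-dim embeddings, 32 filters of width 3, a 24-unit LSTM, and 9 output classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Valid' 1-D convolution. x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, w.shape[2]))
    for t in range(t_out):
        # Contract the (K, C_in) window with the (K, C_in, C_out) kernel -> (C_out,)
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out

def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis (drops any remainder)."""
    t_out = x.shape[0] // size
    return x[: t_out * size].reshape(t_out, size, x.shape[1]).max(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(x, W, U, b):
    """Single-layer LSTM over x: (T, D); returns the final hidden state (H,).
    W: (D, 4H), U: (H, 4H), b: (4H,), gates ordered input, forget, cell, output."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def cnn_lstm_forward(token_ids, params):
    """Hypothetical CNN-LSTM: embed -> conv + ReLU -> max-pool -> LSTM -> softmax."""
    x = params["emb"][token_ids]                                       # (T, E)
    x = np.maximum(conv1d(x, params["conv_w"], params["conv_b"]), 0)  # local opcode patterns
    x = max_pool1d(x, 2)                                               # shorten the sequence
    h = lstm_last_hidden(x, params["W"], params["U"], params["b"])     # long-range order
    return softmax(h @ params["out_w"] + params["out_b"])              # class probabilities

# Hypothetical sizes (not from the paper).
V, E, K, F, H, NC = 50, 16, 3, 32, 24, 9
params = {
    "emb": rng.normal(0, 0.1, (V, E)),
    "conv_w": rng.normal(0, 0.1, (K, E, F)),
    "conv_b": np.zeros(F),
    "W": rng.normal(0, 0.1, (F, 4 * H)),
    "U": rng.normal(0, 0.1, (H, 4 * H)),
    "b": np.zeros(4 * H),
    "out_w": rng.normal(0, 0.1, (H, NC)),
    "out_b": np.zeros(NC),
}
probs = cnn_lstm_forward(rng.integers(0, V, size=40), params)
print(probs.shape, probs.sum())
```

The convolution picks up short local opcode patterns, and the LSTM models their order over the pooled sequence. Whether the authors use this ordering, pooling, or gate layout is not stated here.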
Findings
Achieved 99.91% accuracy with 8-gram sequences.
CNN-LSTM outperforms several recent deep learning architectures.
Swin-T and Sequencer2D-L architectures also achieved high accuracy.
Abstract
In this paper, we propose a novel malware classification model based on Application Programming Interface (API) calls and opcodes to improve classification accuracy. The system uses a novel design that combines a Convolutional Neural Network with a Long Short-Term Memory network (CNN-LSTM). We extract opcode sequences and API calls from Windows malware samples for classification. We then transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples achieve a high accuracy of 99.91% using 8-gram sequences. Our method significantly improves malware classification performance across a wide range of recent deep learning architectures and yields state-of-the-art results. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S,…
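The N-gram step in the abstract turns a sequence of opcodes or API calls into overlapping windows of N consecutive tokens. A minimal sketch of that idea follows, assuming a sliding window with stride 1 and simple frequency counts as features. The paper's exact tokenization and feature encoding are not given here. The function names and the example opcode trace are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence (sliding window, stride 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n yields no n-grams (empty range).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs (a bag-of-n-grams feature)."""
    return Counter(ngrams(tokens, n))

# Hypothetical opcode trace from a disassembled sample (illustrative only).
# An API-call trace such as ["CreateFileW", "WriteFile", "CloseHandle"] works the same way.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]
for n in (2, 3):
    print(n, ngram_counts(opcodes, n).most_common(2))
```

Larger N captures longer instruction or call patterns. The trade-off is a much larger and sparser feature vocabulary.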
Taxonomy
Methods: Depthwise Convolution · Batch Normalization · 1x1 Convolution · Pointwise Convolution · Depthwise Separable Convolution · Inverted Residual Block · EfficientNetV2
