A Natural Language Processing Approach to Malware Classification
Ritik Mehta, Olha Jurečková, Mark Stamp

TL;DR
This paper introduces a hybrid, NLP-inspired malware classification method: HMMs are trained on opcode sequences, and the resulting hidden state sequences are used as feature vectors for downstream classifiers. The authors report that this approach outperforms other popular techniques on a challenging dataset.
Contribution
The paper proposes a hybrid approach in which HMM hidden state sequences serve as engineered features for malware classification, analogous to feature engineering techniques common in NLP, and reports improved results over other popular techniques.
Findings
HMM-Random Forest achieved the best classification accuracy.
NLP-inspired feature engineering outperforms traditional methods.
Hybrid approach enhances malware detection performance.
Abstract
Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks. In this research, we consider a hybrid architecture, where HMMs are trained on opcode sequences, and the resulting hidden states of these trained HMMs are used as feature vectors in various classifiers. In this context, extracting the HMM hidden state sequences can be viewed as a form of feature engineering that is somewhat analogous to techniques that are commonly employed in Natural Language Processing (NLP). We find that this NLP-based approach outperforms other popular techniques on a…
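The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes an HMM has already been trained; the two-state parameters (`START_P`, `TRANS_P`, `EMIT_P`), the four-opcode vocabulary, and the helper names are hypothetical toy values, not the paper's. Viterbi decoding is used here as one common way to recover a hidden state sequence; the abstract does not say which decoding the authors use. The padded state sequence would then be passed to a classifier such as a Random Forest, which is not shown.

```python
import math

# Hypothetical toy vocabulary and HMM parameters (assumed, not from the paper).
OPCODES = {"mov": 0, "push": 1, "call": 2, "pop": 3}
START_P = [0.5, 0.5]
TRANS_P = [[0.8, 0.2], [0.2, 0.8]]             # TRANS_P[r][s] = P(state s | state r)
EMIT_P = [[0.45, 0.45, 0.05, 0.05],            # EMIT_P[s][o] = P(opcode o | state s)
          [0.05, 0.05, 0.45, 0.45]]
PAD = -1                                        # marks positions past the end of a sample


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for obs, computed in log space."""
    if not obs:
        return []
    n = len(start_p)
    # score[s] = best log-probability of any state path ending in state s
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def hidden_state_features(opcodes, length):
    """Decode an opcode sequence and truncate or pad it to a fixed length."""
    obs = [OPCODES[op] for op in opcodes]
    states = viterbi(obs, START_P, TRANS_P, EMIT_P)[:length]
    return states + [PAD] * (length - len(states))


print(hidden_state_features(["mov", "push", "call", "pop"], 6))
```

Fixing the vector length by truncation and padding is a design choice made for this sketch so that every sample yields the same number of features; other fixed-length encodings would work equally well.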
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
