# Malware Detection using Machine Learning and Deep Learning

**Authors:** Hemant Rathore, Swati Agarwal, Sanjay K. Sahay, Mohit Sewak

arXiv: 1904.02441 · 2019-04-05

## TL;DR

This paper applies machine learning and deep learning to malware detection, using opcode frequency as the feature vector. It compares several algorithms and reports that Random Forest outperforms a Deep Neural Network on this feature.

## Contribution

The study evaluates multiple ML and DL models for malware detection on opcode frequency features. It finds that Random Forest outperforms the Deep Neural Network, and that a simple feature reduction method, Variance Threshold, performs better than Deep Auto-Encoders on the dataset.

## Key findings

- Random Forest outperforms Deep Neural Network in malware classification
- Variance Threshold feature reduction is more effective than auto-encoders
- Opcode frequency is a useful feature for malware detection
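
The Variance Threshold finding can be illustrated with a minimal sketch, assuming the usual definition of the technique: drop every feature whose variance across samples does not exceed a threshold. The function name `variance_threshold`, the threshold of `0.0`, and the opcode counts below are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`.

    `rows` is a list of equal-length feature vectors (one per sample).
    Returns the indices of the kept columns and the reduced rows.
    """
    if not rows:
        return [], []
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return kept, reduced


# Hypothetical data: each row is one sample, each column the count of one opcode.
samples = [
    [5, 1, 3, 1],
    [5, 3, 1, 1],
    [5, 2, 2, 1],
]

# Columns 0 and 3 are constant across samples, so they carry no signal
# for a classifier and are removed.
kept, reduced = variance_threshold(samples)
print(kept)
print(reduced)
```

A filter like this needs no training, which is one plausible reason a simple method can compete with a learned reduction such as an auto-encoder on a small feature set.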

## Abstract

Research shows that over the last decade, malware has been growing exponentially, causing substantial financial losses to various organizations. Different anti-malware companies have been proposing solutions to defend against attacks from this malware. The velocity, volume, and complexity of malware are posing new challenges to the anti-malware community. Current state-of-the-art research shows that researchers and anti-virus organizations have recently started applying machine learning and deep learning methods for malware analysis and detection. We have used opcode frequency as a feature vector and applied unsupervised learning in addition to supervised learning for malware classification. The focus of this tutorial is to present our work on detecting malware with 1) various machine learning algorithms and 2) deep learning models. Our results show that Random Forest outperforms the Deep Neural Network with opcode frequency as a feature. Also, in feature reduction, Deep Auto-Encoders are overkill for the dataset, and an elementary function like Variance Threshold performs better than the others. In addition to the proposed methodologies, we will also discuss additional issues and unique challenges in the domain, open research problems, limitations, and future directions.
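
As a rough illustration of the opcode frequency feature vector mentioned in the abstract, the sketch below counts opcodes against a fixed vocabulary. The vocabulary, the function name `opcode_frequency_vector`, and the sample opcode sequence are all assumptions made for illustration; this summary does not state the paper's opcode set or its disassembly pipeline.

```python
from collections import Counter

# Hypothetical vocabulary; a real one would cover the full instruction set
# observed in the training corpus.
VOCABULARY = ["mov", "push", "pop", "call", "jmp", "add"]


def opcode_frequency_vector(opcodes, vocabulary=VOCABULARY):
    """Map a sequence of opcode mnemonics to a fixed-length count vector.

    Mnemonics are lower-cased before counting; opcodes outside the
    vocabulary are ignored.
    """
    counts = Counter(op.lower() for op in opcodes)
    return [counts[op] for op in vocabulary]


# Hypothetical opcode sequence, as it might come out of a disassembler.
disassembly = ["push", "mov", "mov", "call", "pop", "MOV", "xor"]

vector = opcode_frequency_vector(disassembly)
print(vector)
```

Each sample becomes one fixed-length row, so a set of samples forms the matrix that a classifier or a feature reduction step would consume.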

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.02441/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.02441/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1904.02441/full.md

---
Source: https://tomesphere.com/paper/1904.02441