Optimized Deep Learning Models for Malware Detection under Concept Drift

William Maillet; Benjamin Marais

arXiv:2308.10821 · cs.CR · August 2, 2024

Open Access

TL;DR

This paper introduces a model-agnostic protocol that makes a neural network malware detector more resilient to concept drift. It combines feature reduction, validation on the most recent data available, and a new loss function, Drift-Resilient Binary Cross-Entropy. The improved model detects 15.2% more malware than the baseline.

Contribution

It proposes a model-agnostic protocol with a new loss function and feature reduction techniques to enhance neural network robustness against concept drift in malware detection.

Findings

01. Detects 15.2% more malware than the baseline
02. Effective against concept drift in recent datasets
03. Highlights the importance of recent validation data

Abstract

Despite the promising results of machine learning models in malicious file detection, these models face the problem of concept drift because malicious files constantly evolve. Performance declines over time as the data distribution of new files diverges from the training distribution, which requires frequent model updates. In this work, we propose a model-agnostic protocol to improve a baseline neural network against drift. We show the importance of feature reduction and of training with the most recent validation set possible, and we propose a loss function named Drift-Resilient Binary Cross-Entropy, an improvement on the classical Binary Cross-Entropy that is more effective against drift. We train our model on the EMBER dataset, published in 2018, and evaluate it on a dataset of recent malicious files collected between 2020 and 2023. Our improved model shows promising results, detecting 15.2% more malware than…
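
The idea of validating on the most recent data can be illustrated with a chronological split. The following is a minimal sketch, not the authors' implementation: it assumes each sample carries a first-seen timestamp, and the names `temporal_split`, `timestamps`, and `val_fraction` are hypothetical. The abstract does not define Drift-Resilient Binary Cross-Entropy, so only the classical Binary Cross-Entropy it builds on is shown.

```python
import numpy as np


def temporal_split(timestamps, val_fraction=0.2):
    """Split sample indices chronologically.

    The newest `val_fraction` of samples becomes the validation set, so
    model selection is driven by the most recent data available.
    (Illustrative helper; name and parameters are assumptions.)
    """
    timestamps = np.asarray(timestamps)
    if timestamps.size < 2:
        raise ValueError("need at least two samples to split")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    # Stable sort keeps the original order among equal timestamps.
    order = np.argsort(timestamps, kind="stable")
    n_val = int(round(timestamps.size * val_fraction))
    n_val = min(max(n_val, 1), timestamps.size - 1)
    return order[:-n_val], order[-n_val:]


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Classical Binary Cross-Entropy, averaged over samples."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


if __name__ == "__main__":
    # Toy first-seen dates expressed as fractional years (made-up values).
    first_seen = [2018.5, 2017.1, 2018.9, 2017.6, 2018.0]
    train_idx, val_idx = temporal_split(first_seen, val_fraction=0.4)
    print("train indices:", train_idx)
    print("validation indices:", val_idx)
    print("BCE on a toy batch:", binary_cross_entropy([1, 0], [0.9, 0.2]))
```

A random split would mix old and new samples in both sets, which hides drift during model selection; the chronological split keeps every validation sample at least as recent as every training sample.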

Taxonomy

Topics: Advanced Malware Detection Techniques · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications