MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification

Jimin Park; AHyun Ji; Minji Park; Mohammad Saidur Rahman; Se Eun Oh

arXiv:2501.01110·cs.CR·January 3, 2025

TL;DR

This paper introduces MalCL, a GAN-based continual learning system that mitigates catastrophic forgetting in malware classification. It generates synthetic malware samples with a GAN trained using feature matching loss and selects replay samples based on the model's hidden representations, reaching 55% accuracy on Windows malware, 28% higher than previous methods.

Contribution

MalCL is the first to combine GANs with feature matching loss and novel sample selection schemes for malware continual learning, enhancing performance on evolving malware datasets.
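
To illustrate what a replay sample selection scheme based on hidden representations could look like, here is a minimal sketch. It is not the authors' implementation: the function names, the use of L1 distance, and the choice of a per-class centroid as the reference point are all assumptions made for illustration. The idea shown is to keep the generated samples whose hidden representation lies closest to the mean representation of real samples.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_replay_samples(candidate_feats, reference_feats, k):
    """Return indices of the k candidates closest to the reference centroid.

    candidate_feats: hidden representations of generated samples (hypothetical input).
    reference_feats: hidden representations of real samples (hypothetical input).
    """
    n = len(reference_feats)
    # Centroid = column-wise mean of the real samples' representations.
    centroid = [sum(col) / n for col in zip(*reference_feats)]
    ranked = sorted(
        range(len(candidate_feats)),
        key=lambda i: l1_distance(candidate_feats[i], centroid),
    )
    return ranked[:k]


# Toy data: the centroid of the reference set is [1.0, 1.0].
reference = [[0.0, 0.0], [2.0, 2.0]]
candidates = [[5.0, 5.0], [1.0, 1.5], [0.0, 1.0], [1.0, 1.0]]
chosen = select_replay_samples(candidates, reference, k=2)
```

In this toy run, `chosen` holds the indices of the two candidates nearest the centroid. In a real system the representations would come from an intermediate layer of the classifier rather than hand-written lists.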

Findings

01. Achieves 55% accuracy on Windows malware, 28% higher than previous methods.

02. Demonstrates effective mitigation of catastrophic forgetting in class-incremental malware learning.

03. Provides practical insights and a publicly available implementation for future research.

Abstract

Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario --…
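
The feature matching loss mentioned above can be sketched as follows. This shows the commonly used formulation, in which the generator is trained to match the mean of an intermediate discriminator layer's features on real and generated batches; whether MalCL uses exactly this form, and which layer it uses, is not stated in the text here, so treat this as an assumption. The function names are hypothetical, and plain Python lists stand in for framework tensors.

```python
from statistics import fmean


def mean_features(batch):
    """Column-wise mean of a batch of equal-length feature vectors."""
    return [fmean(col) for col in zip(*batch)]


def feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between mean real and mean generated features."""
    mu_real = mean_features(real_feats)
    mu_fake = mean_features(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))


# Toy batches of 2-dimensional intermediate features.
real = [[1.0, 2.0], [3.0, 4.0]]   # mean is [2.0, 3.0]
fake = [[2.0, 2.0], [2.0, 2.0]]   # mean is [2.0, 2.0]
loss = feature_matching_loss(real, fake)
```

The loss is zero when the two batches have identical mean features, which is why it pushes the generator toward the statistics of real data rather than toward merely fooling the discriminator's final output.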

Code & Models

Repositories

malwarereplaygan/malcl (PyTorch, official)

Taxonomy

Topics: Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection